Email the Author
You can use this page to email Matthew Powers about Testing Spark Applications.
About the Book
The book covers Scala testing basics with the ScalaTest framework. It uses the spark-fast-tests library to demonstrate column equality testing and DataFrame equality testing. Spark tests can run slowly, so the book provides several practical workflows to keep them running quickly. Spark code frequently reads from and writes to disk, and the book covers how to write tests for code with I/O. Configuring a test suite properly can make it around 70% faster, and the book explains the configuration options you should have on your radar. Complex transformations (e.g. aggregations) and column types (e.g. MapType, ArrayType, StructType, BinaryType) have special testing considerations that are addressed in separate chapters.
This book has a heavy emphasis on software engineering best practices and will teach you skills that are useful for any language or framework.
About the Author