Testing Spark Applications
Minimum price: $29.99
Suggested price: $39.99

Last updated on 2020-03-25

About the Book

The book covers Scala testing basics with the ScalaTest framework. It uses the spark-fast-tests library to demonstrate column equality testing and DataFrame equality testing. Spark tests can run slowly, so the book provides several practical workflows to keep them fast. Spark code frequently reads from and writes to disk, and the book covers how to write tests for code with I/O. Configuring a test suite properly can make it around 70% faster, and the book explains the configuration options you should have on your radar. Complex transformations (e.g. aggregations) and column types (e.g. MapType, ArrayType, StructType, BinaryType) have special testing considerations that are addressed in separate chapters.

This book has a heavy emphasis on software engineering best practices and will teach you skills that are useful for any language or framework.

Table of Contents

  • Introduction
    • Messy data
    • Nightmare deploys
    • Empower refactoring
    • Tests encourage code that doesn’t have side effects
    • Identifying code bottlenecks
    • Test suites document behavior
    • Technologies used
  • Testing Scala with Scalatest
    • Writing a simple test
    • Directory organization
    • build.sbt
    • More tests
    • Running tests and configuring output
    • assertThrows
    • assertDoesNotCompile
    • Other assertions
    • Other test formats
    • Test library alternatives
    • Testing Spark applications
    • Next steps
  • Column Equality Tests
    • Custom DataFrame Transformation Refresher
    • Spark project setup
    • assertColumnEquality with spark-fast-tests
    • Conclusion
  • Quieting Test Output
    • Customizing test suite output
  • Creating DataFrames for Tests
    • toDF
    • createDataFrame
    • createDF
    • Including spark-daria in your projects
    • Next steps
  • DataFrame Equality Tests
    • Simple example
    • assertSmallDataFrameEquality error messages
    • Next steps
  • Running Tests
    • Running from the SBT console
    • Running a single test file
    • Running a single test
    • Best workflow
  • Approximate Equality
    • Difference between double, float and decimal
    • When assertColumnEquality falls short
    • assertFloatTypeColumnEquality to the rescue
    • assertApproximateDataFrameEquality
    • Conclusion
  • Testing User Defined Functions
    • Creating a UDF
    • Testing a UDF
    • Check the UDF fails with null input
    • The billion dollar mistake
    • Verifying test failure in the test suite
    • Next steps
  • Testing Spark Column Functions
    • Simple example
    • How Spark functions handle null
    • Important takeaway
    • Why print DataFrames from the test suite?
    • Next steps
  • Testing Filesystem Reads
    • Untestable code
    • Setting the path as a param
    • Testing with the config pattern
    • Elegant testing with dependency injection
    • Abstracting custom transformation to a separate function
    • Next steps
  • Testing Filesystem Writes
    • Simple example
    • Rude tests leave garbage behind
    • Performance considerations
    • Next steps
  • Identifying Bottlenecks
    • Let’s find the bottleneck
    • Benchmarking individual transformations
    • Contrived but representative
    • Conclusion
  • Organizing Tests
    • Some pure Scala
    • Poorly organized Spark tests
    • Test suite rules to follow
    • Good Spark test organization
    • Tests should be descriptive and document behavior
    • Quantifying performance difference
    • Next steps
  • Test Suite Configuration
    • Shuffle partitions
    • javaOptions
    • Conclusion
  • Testing Aggregations
    • groupBy refresher
    • groupBy with two columns
    • groupBy with filters
    • Conclusions
