Testing Spark Applications

Writing Spark code is hard, and writing well-designed, performant Spark tests is even harder. A robust test suite lets you identify performance bottlenecks in your code and refactor with ease. This book teaches you how to write a beautiful test suite and how to run the tests whenever code is pushed to the master branch.

About the Book

The book covers Scala testing basics with the ScalaTest framework. It uses the spark-fast-tests library to demonstrate column equality testing and DataFrame equality testing. Spark tests can run slowly, so the book provides several practical workflows to keep them fast. Spark code frequently reads from and writes to disk, and the book covers how to write tests for code with I/O. Configuring a test suite properly can make it around 70% faster, and the book explains the configuration options you should have on your radar. Complex transformations (e.g. aggregations) and column types (e.g. MapType, ArrayType, StructType, BinaryType) have special testing considerations that are addressed in separate chapters.

This book has a heavy emphasis on software engineering best practices and will teach you skills that are useful for any language or framework.
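
To give a feel for the kind of test described above, here is a minimal sketch of a column equality test. It is not taken from the book. It assumes ScalaTest 3.1 or later and spark-fast-tests are on the test classpath, and the names `SparkSessionTestWrapper`, `UpperNameSpec`, and the column names are hypothetical, chosen only for illustration. Lowering `spark.sql.shuffle.partitions` is one example of the kind of test suite configuration that can matter for speed; the value used here is an assumption, not a recommendation from the book.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, upper}
import org.scalatest.funspec.AnyFunSpec
import com.github.mrpowers.spark.fast.tests.ColumnComparer

// Hypothetical helper trait: one local SparkSession shared by the test suite.
trait SparkSessionTestWrapper {
  lazy val spark: SparkSession = SparkSession
    .builder()
    .master("local")
    .appName("spark-test-example")
    // Spark defaults to 200 shuffle partitions, which is wasteful
    // for the tiny datasets used in tests (assumed setting: 1).
    .config("spark.sql.shuffle.partitions", "1")
    .getOrCreate()
}

// Hypothetical spec: checks that a computed column matches an expected column.
class UpperNameSpec
    extends AnyFunSpec
    with SparkSessionTestWrapper
    with ColumnComparer {

  import spark.implicits._

  it("upper-cases the name column") {
    // Each row holds the input value and the value we expect to compute.
    val df = Seq(
      ("alice", "ALICE"),
      ("bob", "BOB")
    ).toDF("name", "expected_name")
      .withColumn("upper_name", upper(col("name")))

    // Fails with a row-by-row diff if the two columns differ.
    assertColumnEquality(df, "upper_name", "expected_name")
  }
}
```

Putting the expected value next to the input in each row keeps the test readable: the rows double as documentation of the behavior under test.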

Table of Contents

  • Introduction
    • Messy data
    • Nightmare deploys
    • Empower refactoring
    • Tests encourage code that doesn’t have side effects
    • Identifying code bottlenecks
    • Test suites document behavior
    • Technologies used
  • Testing Scala with Scalatest
    • Writing a simple test
    • Directory organization
    • build.sbt
    • More tests
    • Running tests and configuring output
    • assertThrows
    • assertDoesNotCompile
    • Other assertions
    • Other test formats
    • Test library alternatives
    • Testing Spark applications
    • Next steps
  • Column Equality Tests
    • Custom DataFrame Transformation Refresher
    • Spark project setup
    • assertColumnEquality with spark-fast-tests
    • Conclusion
  • Quieting Test Output
    • Customizing test suite output
  • Creating DataFrames for Tests
    • toDF
    • createDataFrame
    • createDF
    • Including spark-daria in your projects
    • Next steps
  • DataFrame Equality Tests
    • Simple example
    • assertSmallDataFrameEquality error messages
    • Next steps
  • Running Tests
    • Running from the SBT console
    • Running a single test file
    • Running a single test
    • Best workflow
  • Approximate Equality
    • Difference between double, float and decimal
    • When assertColumnEquality falls short
    • assertFloatTypeColumnEquality to the rescue
    • assertApproximateDataFrameEquality
    • Conclusion
  • Testing User Defined Functions
    • Creating a UDF
    • Testing a UDF
    • Check the UDF fails with null input
    • The billion dollar mistake
    • Verifying test failure in the test suite
    • Next steps
  • Testing Spark Column Functions
    • Simple example
    • How Spark functions handle null
    • Important takeaway
    • Why print DataFrames from the test suite?
    • Next steps
  • Testing Filesystem Reads
    • Untestable code
    • Setting the path as a param
    • Testing with the config pattern
    • Elegant testing with dependency injection
    • Abstracting custom transformation to a separate function
    • Next steps
  • Testing Filesystem Writes
    • Simple example
    • Rude tests leave garbage behind
    • Performance considerations
    • Next steps
  • Identifying Bottlenecks
    • Let’s find the bottleneck
    • Benchmarking individual transformations
    • Contrived but representative
    • Conclusion
  • Organizing Tests
    • Some pure Scala
    • Poorly organized Spark tests
    • Test suite rules to follow
    • Good Spark test organization
    • Tests should be descriptive and document behavior
    • Quantifying performance difference
    • Next steps
  • Test Suite Configuration
    • Shuffle partitions
    • javaOptions
    • Conclusion
  • Testing Aggregations
    • groupBy refresher
    • groupBy with two columns
    • groupBy with filters
    • Conclusions