Modern Data Pipelines Testing Techniques

A Visual Guide

About the Book

Just run it in prod already. Common starting point. Don't let it be your end point. Evolve. You'll thank yourself later.

Any software product deteriorates rapidly without disciplined testing.

However, testing data pipelines is a hellish experience for new data developers.

Unfortunately, existing training on data pipeline testing gives a scattered view of the available techniques. This book offers a full view of modern data pipeline testing techniques in a highly visual and coherent body of work. I hope it helps you in your career.

Why bother testing data pipelines? Billions of budget dollars regularly rely on the excellence of the data scientists, data engineers, and machine learning engineers behind the countless software data pipelines that inform critical business decisions.

Check out the table of contents below to see how this book can help you evolve your data practices.

Unsure? Here is a blog post to get you started.



  • Categories

    • Automated Software Testing
    • Databases
    • DevOps
    • Testing
    • Software Architecture
    • Enterprise Management

About the Author

Moussa Taifi

Data science platform architect focused on data science productivity, reliability, performance, and cost.

Working on designing and implementing large-scale AI products through data collection, analysis, and warehousing.

Passionate about building scalable machine learning pipeline architectures with high business impact.

Aspiring author.



Table of Contents

  • Chapter 1: Testing Your Patience
    • Data Pipeline Transitive Failure Modes: The Reality Check
    • Bad Data Devs Lifestyle
    • TDD + CICD to the rescue?
    • Objections to TDD for Data Work
    • Sources of Data Validation Complexity
    • The Data Product Promise No One Can Keep
    • Fighting Against The Manual Auto-Pilot
    • Observability vs. Testing vs. Monitoring
    • Test-Driven Theater vs Continuous Delivery Theater
  • Chapter 2: Core Types of Data Pipeline Tests
    • Discovering Holistic Testing
    • Types of Tests: Test Boundaries
    • Types of Tests: Test Sizes
    • Types of Tests: Data Product Testing Quadrant
    • Types of Tests: Write-Audit-Publish
    • Types of Tests: Testing Grid
    • Types of Tests: Code Scale vs Data Scale Testing Grid
    • Types of Tests: Structuring Data Quality Tests
    • Types of Tests: Pointwise vs Pairwise vs Composite
    • Types of Tests: Testing SQL Queries
    • Types of Tests: Assembling The Testing Parts + Bug Tests
    • Feedback Levels vs. Testing Scales
    • Test Pyramids and Test Summits
  • Chapter 3: Supporting Components for Data Pipelines Tests
    • Supporting Pattern: Static vs Dynamic Test Data Generation
    • Supporting Pattern: Data Copies, Clones, and Snapshots
    • Supporting Pattern: Reverse Data Plane to Support Testing
    • Supporting Pattern: Parallel Dev-Test Data Streams
  • Chapter 4: Testing Legacy Data Pipelines
    • Legacy Testing Pattern I: Before Touching Anything -- End to End Characterization Tests
    • Legacy Testing Pattern III: Semantic Monitoring
    • Legacy Testing Pattern IV: Data Processing Platform Alerts
    • Legacy Testing Pattern V: Co-Control Data Contracts
    • Legacy Testing Pattern VI: Legacy Pipelines Golden Rule
  • Chapter 5: Design for Testability
    • Designing Hidden Data Pipelines
    • Designing Temporally Decoupled Data Pipelines
    • Designing Debuggable Data Pipelines
    • Designing Encapsulated Data Pipelines
    • Designing Right-Tool-For-The-Job Data Pipelines
    • Designing Feature Engineering Data Pipelines
    • Designing Iceberg Data Pipelines
  • Chapter 6: Data-oriented Development Environments
    • What Can You Do From Your Laptop?
    • Optimal Data Development Environment
    • Fundamental Data Dev Repo Components
    • Coding Timeline vs Data Job Timeline
  • Chapter 7: Deploying Data Pipelines
    • Useful CICD workflows for Data pipelines
    • Data Pipeline Release lifecycle
    • Testable Scheduled Jobs CICD Workflow
    • Database Schema Versioning Rationale
    • Database Schema Versioning Golden Rule
    • Database Schema Migrations - Fields Strategies
    • Database Schema Migrations - Hidden Things To Test
  • Chapter 8: Tips for Data Organizations
    • Data Organization Testability Score Cards vs Your Average Data Dev
    • When To Give Up On Testing Data Pipelines
    • Actors In A Data Product
    • Organizational Friction To Disable Data Pipelines Testability
    • Organizational Changes To Enable Data Pipelines Testability
  • Chapter 9: Is This It?
    • With Great Responsibility Comes Great Capped Autonomy
    • Data Dev Autonomy Destruction Cookbook
    • The Fear Of Obsolescence
  • Outro
    • References
    • Release Notes

Causes Supported


You reforest the world

Tree-Nation is the largest reforestation platform enabling citizens and companies to plant trees around the world.

Tree-Nation’s mission is to reforest the world. The organisation offers people the possibility to select from over 300 different tree species which can be planted in 50 different reforestation projects located in 6 different continents. In every project, the species are carefully selected according to the specific benefits they bring for the environment and to the local population. Each tree planted on the Tree-Nation platform is assigned its own unique URL, which means the species, the location, the plantation project information and the CO2 compensation values can be tracked throughout the lifetime of the tree. Since its inception in 2006, more than 130,000 users and more than 2,200 companies have planted 5 million trees using its platform.
