Modern Data Pipelines Testing Techniques
A Visual Guide
About the Book
Just run it in prod already. Common starting point. Don't let it be your end point. Evolve. You'll thank yourself later.
Any software product deteriorates rapidly without disciplined testing.
However, testing data pipelines is a hellish experience for new data developers.
Unfortunately, existing training on data pipeline testing gives only a scattered view of the available techniques. This book offers a full view of modern data pipeline testing techniques in a highly visual and coherent body of work. I hope it helps you in your career.
Why bother testing data pipelines? Billions of budget dollars regularly rely on the excellence of the data scientists, data engineers, and machine learning engineers behind the countless software data pipelines that inform critical business decisions.
Check out the table of contents below to see how this book can help you evolve your data practices.
Unsure? Here is a blog post to get you started.
Best,
Moussa
Table of Contents
Chapter 1: Testing Your Patience
- Data Pipeline Transitive Failure Modes: The Reality Check
- Bad Data Devs Lifestyle
- TDD + CICD to the rescue?
- Objections to TDD for Data Work
- Sources of Data Validation Complexity
- The Data Product Promise No One Can Keep
- Fighting Against The Manual Auto-Pilot
- Observability vs. Testing vs. Monitoring
- Test-Driven Theater vs Continuous Delivery Theater
Chapter 2: Core Types of Data Pipeline Tests
- Discovering Holistic Testing
- Types of Tests: Test Boundaries
- Types of Tests: Test Sizes
- Types of Tests: Data Product Testing Quadrant
- Types of Tests: Write-Audit-Publish
- Types of Tests: Testing Grid
- Types of Tests: Code Scale vs Data Scale Testing Grid
- Types of Tests: Structuring Data Quality Tests
- Types of Tests: Pointwise vs Pairwise vs Composite
- Types of Tests: Testing SQL Queries
- Types of Tests: Assembling The Testing Parts + Bug Tests
- Feedback Levels vs. Testing Scales
- Test Pyramids and Test Summits
Chapter 3: Supporting Components for Data Pipelines Tests
- Supporting Pattern: Static vs Dynamic Test Data Generation
- Supporting Pattern: Data Copies, Clones, and Snapshots
- Supporting Pattern: Reverse Data Plane to Support Testing
- Supporting Pattern: Parallel Dev-Test Data Streams
Chapter 4: Testing Legacy Data Pipelines
- Legacy Testing Pattern I: Before Touching Anything -- End to End Characterization Tests
- Legacy Testing Pattern III: Semantic Monitoring
- Legacy Testing Pattern IV: Data Processing Platform Alerts
- Legacy Testing Pattern V: Co-Control Data Contracts
- Legacy Testing Pattern VI: Legacy Pipelines Golden Rule
Chapter 5: Design for Testability
- Designing Hidden Data Pipelines
- Designing Temporally Decoupled Data Pipelines
- Designing Debuggable Data Pipelines
- Designing Encapsulated Data Pipelines
- Designing Right-Tool-For-The-Job Data Pipelines
- Designing Feature Engineering Data Pipelines
- Designing Iceberg Data Pipelines
Chapter 6: Data-oriented Development Environments
- What Can You Do From Your Laptop?
- Optimal Data Development Environment
- Fundamental Data Dev Repo Components
- Coding Timeline vs Data Job Timeline
Chapter 7: Deploying Data Pipelines
- Useful CICD workflows for Data pipelines
- Data Pipeline Release lifecycle
- Testable Scheduled Jobs CICD Workflow
- Database Schema Versioning Rationale
- Database Schema Versioning Golden Rule
- Database Schema Migrations - Fields Strategies
- Database Schema Migrations - Hidden Things To Test
Chapter 8: Tips for Data Organizations
- Data Organization Testability Score Cards vs Your Average Data Dev
- When To Give Up On Testing Data Pipelines
- Actors In A Data Product
- Organizational Friction To Disable Data Pipelines Testability
- Organizational Changes To Enable Data Pipelines Testability
Chapter 9: Is This It?
- With Great Responsibility Comes Great Capped Autonomy
- Data Dev Autonomy Destruction Cookbook
- The Fear Of Obsolescence
Outro
- References
- Release Notes
Causes Supported
Tree-Nation
You reforest the world
https://tree-nation.com
Tree-Nation is the largest reforestation platform enabling citizens and companies to plant trees around the world.