DATABRICKS FOR PRACTITIONERS: Volume 1

The Production Lakehouse Playbook: Platform, Governance, and Data Engineering

The Databricks platform and data-engineering playbook for the engineers who own pipelines, govern catalogs, and keep workloads on schedule. Sixteen chapters on Unity Catalog, Lakeflow, identity, observability, and performance. Azure examples; concepts mapped to AWS and GCP.

Minimum price: $19.00

Suggested price: $24.00

Also available for 1 book credit with a Reader Membership.

Format: PDF

About the Book

A practical, depth-first guide to running Databricks in production.

The complete platform-and-data-engineering playbook for the engineers who own pipelines, govern catalogs, and keep workloads on schedule. Current as of 2026, with examples on Azure Databricks; the Databricks-side concepts carry over unchanged to AWS and GCP.

Key Features

  • Build the governed Databricks platform layer by layer, from workspaces and compute to Unity Catalog, identity, and access control
  • Ship production data pipelines with Lakeflow Spark Declarative Pipelines, Jobs, Declarative Automation Bundles, and CI/CD from Git
  • Tune for cost and performance with Photon, Adaptive Query Execution, Liquid Clustering, and the Query Profile UI

Examples run on Azure Databricks. What happens inside Databricks is identical on AWS and GCP; where the cloud seams differ (identity, storage, secrets, networking), chapters name the AWS and GCP equivalents explicitly.

What you will learn

- Architect a governed Databricks workspace from metastore to volume

- Configure Unity Catalog with privileges, ABAC, Governed Tags, and lineage

- Integrate Microsoft Entra ID identity, SCIM, and service principals

- Build ingestion with Lakeflow Connect, Auto Loader, and streaming tables

- Author bronze-silver-gold pipelines with Lakeflow Spark Declarative Pipelines

- Deploy from Git with Declarative Automation Bundles and GitHub Actions

- Observe billing, audit, query history, and lineage through system tables

- Tune performance with Photon, AQE, Liquid Clustering, and the Query Profile UI

Who this book is for

Data engineers, platform engineers, and architects who already know PySpark and now need to run it on Databricks at production depth. A working knowledge of PySpark, Spark SQL, and Delta Lake is expected. Readers new to Spark should start with Volumes 1 and 2 of the series.

Table of Contents

1. Databricks: The Platform on Top of Spark. Why Databricks exists, what it adds on top of open-source Spark, and how the layers stack so you can see the platform clearly.

2. Workspaces, Notebooks, and Git Folders. The workspace surface, multiplayer notebook ergonomics, Git Folders, and the seven traps that ruin notebook-driven development.

3. Compute: Classic, Serverless, and Cluster Policies. Six compute types with one decision tree. Photon eligibility, cluster policies that prevent runaway costs, the cold-start math.

4. Unity Catalog Architecture. Metastore, catalogs, schemas, tables, volumes, the three-grant cascade. How every securable connects, traced end to end with a single SELECT query.

5. Access Control: Privileges, ABAC, and Governed Tags. The grant model, attribute-based access for cross-table policies, tag-driven row filters and column masks that scale to 400 tables.

6. Identity: Entra ID, SCIM, and Service Principals. Users, groups, SPs; OAuth flows; the anti-patterns the auditor will catch, including the JDBC-string trap and the printed-secret trap.

7. Managed Tables: Delta, Iceberg, and UniForm. The default Delta path, Iceberg v3 features, REST Catalog access from DuckDB, and when to use which format.

8. Liquid Clustering and Predictive Optimization. CLUSTER BY versus PARTITION BY, the four-column limit, Predictive Optimization, and when PO does what you would otherwise script.

9. System Tables and Platform Observability. Billing, audit, query history, lineage. The SQL surface for everything the UI shows, so you can put a dashboard on top of any of it.

10. Ingestion: Lakeflow Connect, Auto Loader, and Streaming Tables. Auto Loader's two file-discovery modes (file notification vs directory listing), CDC pipelines, partner connectors, and the four trigger modes.

11. Lakeflow Spark Declarative Pipelines. The CREATE PIPELINE shape, bronze/silver/gold patterns, the DLT-to-SDP migration path, the validation contracts that catch bad data early.

12. Lakeflow Jobs and Scheduling. Tasks, dependencies, retries, file-arrival triggers, repair-and-rerun. The Job that runs the nightly bronze-silver-gold pipeline, fully traced.

13. Declarative Automation Bundles. Project structure, targets, deploy/run/destroy, the YAML decoded clause by clause, the staging-vs-production override patterns.

14. CI/CD with GitHub Actions. OIDC federation (no client secrets), six PR validations, staging vs production gates, the full promote-the-pipeline worked example with rollback.

15. Performance: Photon, AQE, and the Query Profile UI. Four EXPLAIN flavors, reading the plan bottom-up, the salting decision tree, the cache vs MV vs raw-query choice.

16. Metric Views and the Bridge to Volume 4. The CREATE METRIC VIEW shape, composing metrics from metrics, the certified-semantics layer every dashboard in Volume 4 reads through.
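Chapter 4's "three-grant cascade" is assumed here to mean the standard Unity Catalog rule: to read a table, a principal needs USE CATALOG on the parent catalog, USE SCHEMA on the parent schema, and SELECT on the table itself. The sketch below is illustrative and not taken from the book. It prints those three statements for a hypothetical table `main.sales.orders` and a hypothetical group `data-analysts`, and it assumes simple catalog, schema, and table names that need no quoting.

```python
# Illustrative sketch (not from the book): emit the three Unity Catalog grants
# a principal needs before SELECT on catalog.schema.table can succeed.
# The table name and group name used below are hypothetical examples.


def select_grants(full_table_name: str, principal: str) -> list[str]:
    """Return GRANT statements top-down: catalog, then schema, then table."""
    parts = full_table_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected a three-level name: catalog.schema.table")
    catalog, schema, table = parts
    # Backtick-quote the principal, since group names often contain dashes
    # or spaces. Assumes catalog/schema/table names are simple identifiers.
    who = f"`{principal}`"
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {who}",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {who}",
        f"GRANT SELECT ON TABLE {catalog}.{schema}.{table} TO {who}",
    ]


if __name__ == "__main__":
    for stmt in select_grants("main.sales.orders", "data-analysts"):
        print(stmt + ";")
```

Granting SELECT on the table alone is the common mistake: for a principal that does not own the objects, the query still fails until both parent grants are in place.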

About the Author

Ritesh Modi

Ritesh Modi is Head of AI at MarketOnce and a former Forward Deployed Engineer at Microsoft. He has spent more than a decade building and shipping production systems across cloud, distributed computing, and applied machine learning, working with organizations ranging from global enterprises to fast-moving startups. His recent work focuses on applied large language models, designing systems that turn pretrained models into reliable, task-specific tools.

Ritesh has authored multiple technology books and speaks regularly at industry conferences on AI, cloud architecture, and software engineering. His writing philosophy rests on a simple belief: the best technical books are written by practitioners who still remember what it felt like not to understand something, not by experts who have forgotten. Every explanation in this book was tested against that standard: if it would not have made sense to him when he was first learning the material, it was rewritten until it did.

He writes, shares ideas, and connects with readers at www.riteshmodi.com. When he is not writing or building AI systems, he can be found mentoring engineers, exploring new architectures, or debugging a training run that should have converged three hours ago.

The Leanpub 60 Day 100% Happiness Guarantee

Within 60 days of purchase you can get a 100% refund on any Leanpub purchase, in two clicks.

Free Updates. DRM Free.

If you buy a Leanpub book, you get free updates for as long as the author updates the book! Many authors use Leanpub to publish their books in-progress, while they are writing them. All readers get free updates, regardless of when they bought the book or how much they paid (including free).

Most Leanpub books are available in PDF (for computers) and EPUB (for phones, tablets and Kindle). The formats that a book includes are shown at the top right corner of this page.

Finally, Leanpub books don't have any DRM copy-protection nonsense, so you can easily read them on any supported device.
