Leanpub | Store

From Query Fundamentals to Production-Grade Tuning Across Database Platforms

Steve Publications

Slow SQL queries waste time, increase costs, and limit scalability. Mastering SQL Performance Optimization teaches you how to analyze execution plans, optimize queries and indexes, and solve real-world performance bottlenecks across today's leading database platforms.

Build a SQL Database Engine in C++

Through Challenges --- Storage and Query Processing (Parts I & II)

Hatem M.

Build a working SQL database engine in C++20 -- from raw pages to a real query executor -- one compilable, measured challenge at a time. Every performance claim is a benchmark you run yourself; every unit ends with a real bug, caught red-handed and fixed. Nothing asserted. Everything demonstrated.

The DuckDB Handbook

A Comprehensive Guide to In-Memory Analytical Processing

Steve Publications

DuckDB has changed how people work with data by bringing fast analytical queries to a lightweight embedded database. Whether you are exploring Parquet files, building data pipelines or embedding analytics into your application, this handbook shows you how to get the most out of DuckDB with practical examples and real-world techniques.

Databricks for Practitioners: Volume 2

The AI Lakehouse and Agentic Playbook: Analytics, Mosaic AI, Agents, and Lakebase

Ritesh Modi

RAG, Agent Bricks, the Multi-Agent Supervisor with MCP, Lakebase, MLflow 3, Lakehouse Monitoring, Feature Store, Vector Search. Every AI surface Databricks shipped at GA in 2025 and 2026, taught by a practitioner, current to 2026.

What you will learn

- Build RAG pipelines with Vector Search, embedding models, and citation grounding
- Ship Agent Bricks for classification and information extraction
- Orchestrate specialist agents with the Multi-Agent Supervisor and MCP
- Use Lakebase as the operational Postgres layer for AI apps and agents
- Detect data and model drift with Lakehouse Monitoring; wire alerts to retraining
- Manage the ML lifecycle with MLflow 3 and the UC Model Registry
- Govern features across training and serving with Feature Store (offline + online)
- Serve foundation and custom models with AI Gateway controls

Who this book is for

Data engineers, ML engineers, and AI/ML architects who know PySpark and the Databricks platform and now need to ship production AI. Volume 3 is the recommended prerequisite.

Table of Contents

1. Databricks SQL in Production. Warehouses, materialized views, three latency signals (admission, compilation, execution), the full dashboard backend wiring.
2. External BI: Tableau, Power BI, dbt. Performance tips that take a dashboard from sluggish to instant, dbt configuration at incremental scale, the seam between BI and the lakehouse.
3. AI/BI Dashboards. Anatomy of a Lakeview dashboard, draft vs published flow, the Dashboard Agent's reliable patterns, the five-grant permission model.
4. Genie: Natural-Language Analytics. Grounding sources, the priority rule, the SQL Genie actually writes, the questions Genie answers cleanly versus the ones that confuse it.
5. AI SQL Functions. ai_query, ai_parse_document, ai_extract for PDFs and HTML, univariate forecasts, the daily cost math for production AI SQL pipelines.
6. Model Serving. Endpoints, the three fields that decide capacity and cost, the chat-completion payload, the five moving pieces of a production recommender.
7. Foundation Models. Five major providers, the External Models config, the vendor-swap pattern (Claude to Gemini in hours, not weeks), the three habits that keep swap cost low.
8. Vector Search and RAG. Six delta-sync arguments, three chunking strategies compared, the RAG function your app imports, end-to-end answer evaluation with traces.
9. MLflow 3 and UC Model Registry. Versions, aliases, tags (and what each is not for), five tracking calls and what each one writes, the experiment-to-production lifecycle.
10. Feature Store. Why SDP is the right producer, the six-file project layout, four parity-failure classes between offline and online stores and what causes each.
11. MLOps as a Practice. Seven sources every incident reads from, three deploy patterns (canary, shadow, blue-green), three retrain strategies, five golden signals for an ML endpoint.
12. Lakehouse Monitoring: Drift Detection. Six monitor parameters, the loop from drift alert to retraining, what to do when the baseline table is missing.
13. Distributed Deep Learning. Three signals that force distributed training, picking the flavor (data, model, hybrid) from the bottleneck, four pieces of GPU memory worked out for a 7B model.
14. Agent Bricks. Declarative classification and information-extraction agents, eval-set ingredients, the pre-compute pattern that makes small seed sets work.
15. Multi-Agent Supervisor and MCP. The supervisor build, synthetic-turn evaluation, three real conversations end to end, the auth-passthrough chain across child agents.
16. Lakebase: Operational Postgres for AI. Five alternatives compared, sub-10ms reads for AI apps, the lineage from Delta source through SDP into Postgres and onward to the endpoint.
17. Capstone: Retail Intelligence App. Ten stages, each anchored to an earlier chapter. The smoke test that confirms every stage of the platform is reachable, the new-data path through the recommender.
18. Certification and What's Next. The certification paths that actually map to the book, and the reading list the on-call team uses when something breaks.

Databricks for Practitioners: Volume 1

The Production Lakehouse Playbook: Platform, Governance, and Data Engineering

Ritesh Modi

The Databricks platform and data-engineering playbook for the engineers who own pipelines, govern catalogs, and keep workloads on schedule. Sixteen chapters on Unity Catalog, Lakeflow, identity, observability, and performance. Azure examples; concepts mapped to AWS and GCP.

Production SQLite

Architecting Reliable Data Systems with the World's Most Deployed Database

Steve Publications

SQLite is everywhere, but running it well in production takes more than flipping a few settings. This book goes beyond the basics to show how SQLite really works, then walks through the patterns, tools and tradeoffs behind reliable, high-performance deployments. Packed with practical examples, it's built for engineers who want confidence in production.

SQL and Database Testing: Query Your Way to Quality

From First SELECT to Schema Migrations, NoSQL, and GDPR Compliance

Yuri Syuganov

The bugs live in the data. From your first SELECT to migration testing, NoSQL, and GDPR-compliant masking — with exercises all the way.

Aprende SQL de 0 a Analista de Datos

Learn, practice, and solve SQL data challenges like a real data analyst

Adrian Rodriguez

No Description Available

SQL Mastery Series

A Problem-Solving Workbook — 84 Solved SQL Challenges from Beginner to Interview-Ready

Hatem M.

84 hand-picked SQL problems, fully solved and explained — from your first SELECT to interview-ready queries. Seven volumes of pure practice: a problem, the data, the answer, and why it works. No theory chapters, no filler, every query tested against a real database. Solve first, read second.

Spark 4.0 from Scratch

Advanced Processing & Production Mastery

Ritesh Modi

Structured Streaming, MLlib, GraphFrames, performance tuning, testing and CI, and the lakehouse. Eleven chapters that take a competent PySpark user from "the job runs" to "the on-call team trusts the job."

Spark 4.0 from Scratch

Foundations: From Your First DataFrame to Production-Ready Joins and Aggregations

Ritesh Modi

PySpark from page one. Ten chapters that take a Python user who knows pandas and turn them into someone who can write, read, and debug production PySpark, without a three-chapter detour through distributed-computing theory.
