Distributed AI with Spark

We build scalable AI pipelines with Spark and add Dask where Python-first parallelism fits best. We then instrument everything with OpenTelemetry so distributed systems become observable, measurable, and debuggable.

Bought separately: $89.97
Minimum price: $79.99
Suggested price: $89.99
You pay: $89.99
Author earns: $71.99
...Or Buy With Credits!

You can get credits with a paid monthly or annual Reader Membership, or you can buy them here.

The following 3 books are included in this bundle...

These books have a total suggested price of $89.97. Get them now for only $79.99!
About the Bundle

This bundle targets teams building data and AI workloads that must scale and stay observable. We use Spark to develop scalable AI pipelines, add Dask for Python-first parallelism where it fits best, and then instrument the whole system with OpenTelemetry practices so we can trace, measure, and troubleshoot distributed behavior with confidence.

About the Books

Parallel Python with Dask

Make code reusable and deployable for high-performance web apps

Unlock the Power of Parallel Python with Dask: A Perfect Learning Guide for Aspiring Data Scientists

Dask has revolutionized parallel computing for Python, empowering data scientists to accelerate their workflows. This comprehensive guide unravels the intricacies of Dask to help you harness its capabilities for machine learning and data analysis.

Across 10 chapters, you'll master Dask's fundamentals, architecture, and integration with Python's scientific computing ecosystem. Step-by-step tutorials demonstrate parallel mapping, task scheduling, and leveraging Dask arrays for NumPy workloads. You'll discover how Dask seamlessly scales Pandas, Scikit-Learn, PyTorch, and other libraries for large datasets.
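As a taste of the parallel-mapping and task-scheduling material, here is a minimal sketch using Dask's `delayed` API (the `square` and `total` functions are illustrative, not code from the book):

```python
import dask

@dask.delayed
def square(x):
    # Each call becomes a lazy task in Dask's graph rather than running immediately.
    return x * x

@dask.delayed
def total(values):
    # Aggregation task that depends on all the square() tasks.
    return sum(values)

# Build the task graph: nothing has executed yet.
tasks = [square(n) for n in range(10)]
result = total(tasks)

# Trigger execution; Dask schedules the independent tasks in parallel.
answer = result.compute()
print(answer)  # 0 + 1 + 4 + ... + 81 = 285
```

The same pattern scales from a laptop's thread pool to a distributed cluster without changing the task definitions.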

Dedicated chapters explore scaling regression, classification, hyperparameter tuning, feature engineering, and more with clear examples. You'll also learn to tap into the power of GPUs with Dask, RAPIDS, and Google JAX for orders of magnitude speedups.

This book places special emphasis on practical use cases related to scalability and distributed computing. You'll learn Dask patterns for cluster computing, efficient resource management, and robust data pipelines. The advanced chapters on Dask-ML and deep learning showcase how to build scalable models with PyTorch and TensorFlow.

With this book, you'll gain practical skills to:

  • Accelerate Python workloads with parallel mapping and task scheduling
  • Speed up NumPy, Pandas, Scikit-Learn, PyTorch, and other libraries
  • Build scalable machine learning pipelines for large datasets
  • Leverage GPUs efficiently via Dask, RAPIDS and JAX
  • Manage Dask clusters and workflows for distributed computing
  • Streamline deep learning models with Dask-ML and DL frameworks

Packed with hands-on examples and expert insights, this book provides the complete toolkit to harness Dask's capabilities. It will empower Python programmers, data scientists, and machine learning engineers to achieve faster workflows and operationalize parallel computing.

Table of Contents

  1. Introduction to Dask
  2. Dask Fundamentals
  3. Batch Data Parallel Processing with Dask
  4. Distributed Systems and Dask
  5. Advanced Dask: APIs and Building Blocks
  6. Dask with Pandas
  7. Dask with Scikit-learn
  8. Dask and PyTorch
  9. Dask with GPUs
  10. Scaling Machine Learning Projects with Dask

OpenTelemetry Cookbook

Proven approaches for real-time monitoring and observability on cloud, AI, and modern infrastructures

A hands-on, recipe-driven book that puts OpenTelemetry into immediate use. This cookbook is for IT practitioners: developers, Linux admins, cloud engineers, backend pros, networking experts, and security specialists. It's for anyone who wants a proven, hands-on way to monitor, trace, and understand modern systems.

This book gives you easy, step-by-step solutions to everyday observability challenges, so you can integrate, configure, and operate OpenTelemetry in dynamic environments. Each chapter focuses on solving problems directly relevant to production teams: installing and bootstrapping the Collector on Linux, wiring telemetry pipelines for traces, metrics, logs, and baggage, and integrating with the platforms that organizations trust for analysis and alerting.
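To give a flavor of the pipeline recipes, a minimal Collector configuration wires an OTLP receiver through a batch processor to a debug exporter (a sketch only; your receivers, endpoints, and exporters will differ):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
```

Swapping the `debug` exporter for Prometheus, Jaeger, or a vendor backend is a matter of adding an exporter block and listing it in the pipeline.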

Key Features
  • Get the OpenTelemetry Collector up and running on Linux, Docker, and Kubernetes.
  • Build and tune pipelines to collect, process, and export different telemetry signals.
  • Instrument applications across Python, Go, Java, and Node.js.
  • Enrich signals with semantic attributes and resource detectors for more context.
  • Integrate with AWS CloudWatch, X-Ray, Elastic, Prometheus, Splunk, Datadog, New Relic, Grafana Tempo, Loki, and Jaeger.
  • Visualize telemetry data and set up real-time alerts.

There's no need to get lost in theoretical jargon because OpenTelemetry Cookbook gets right to the meat and potatoes of implementation. Every recipe gives you a clear problem statement, a step-by-step solution, and practical validation. If you're just starting out with observability or want to level up your skills, this book's got you covered with clear steps to understand distributed, cloud-native, and hybrid systems.

This book builds a solid foundation for robust, observable infrastructure and application setups, one step at a time. It isn't about quick fixes or magic solutions. It gives you a full set of tools and techniques that help professionals improve visibility, performance, and reliability in their own technical landscapes.

Table of Contents
  1. Bootstrapping OpenTelemetry
  2. Building Telemetry Pipelines
  3. Instrumenting Applications with OpenTelemetry SDKs
  4. Code-Based Instrumentation and Auto-Instrumentation
  5. Telemetry with Attributes and Resource Detectors
  6. Advanced Signal Processing and Filtering
  7. Observability in Kubernetes
  8. Integrations with Cloud and Observability Platforms
  9. Visualizing and Alerting Telemetry Data

Private AI with Spark

Design, package, and operate private AI locally using Apache Spark, batch pipelines, and vLLM acceleration

For those who want to build controlled, reproducible AI systems entirely within their own infrastructure, this book is a practical, implementation-focused guide. Instead of relying on external APIs or cloud-hosted intelligence services, it demonstrates how Apache Spark can orchestrate data preparation, model training, batch inference, reporting, and LLM acceleration in a disciplined and transparent way.

The book opens by defining private AI: no external AI calls, full ownership of datasets and model assets, and repeatable runs with traceable outputs. A realistic sample dataset then drives an end-to-end workflow that ingests raw data, normalizes it into a stable schema, trains a baseline classifier, extracts keywords, generates summaries, and produces structured reports. Each step is implemented with clarity and attention to maintainability, with logging, manifests, and monitoring embedded from the start. We implement classic machine learning techniques, vLLM acceleration, performance measurement, batch processing patterns, quarantine handling, and structured metrics to make private AI usable and competitive with cloud-based AI.
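The keyword-extraction step described above can be as simple as stopword filtering plus frequency counting. This stdlib-only sketch illustrates the idea (the stopword list and `extract_keywords` helper are illustrative; the book runs the equivalent logic per record inside Spark):

```python
import re
from collections import Counter

# A tiny illustrative stopword list; real pipelines use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "also"}

def extract_keywords(text, top_k=3):
    # Normalize: lowercase and keep alphabetic tokens only.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Drop stopwords, then rank the remaining tokens by frequency.
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_k)]

doc = "Spark pipelines scale batch inference; Spark also schedules batch reporting jobs."
print(extract_keywords(doc))  # 'spark' and 'batch' rank first
```

Because the logic is deterministic and local, there is no hallucination risk and every output is reproducible from the input text.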

Beyond experimentation, the book transitions into packaging and routine execution. It teaches you to bundle multiple stages into a single-command workflow, schedule daily or weekly runs, generate compact run reports, and adapt the architecture to new datasets without redesigning the system. It does not promise instant transformation or one-click AI solutions. Instead, it provides a structured path to building a sustainable private AI backbone with Spark as the orchestration layer.
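The single-command packaging described above can be sketched as a small driver that chains named stages and records a manifest. The stage functions and `run_pipeline` helper below are illustrative stand-ins for Spark jobs, not the book's actual code:

```python
# Minimal sketch of a single-command, multi-stage pipeline driver.

def ingest(state):
    # Stand-in for a Spark ingestion job.
    state["records"] = [{"id": 1, "text": " Hello "}, {"id": 2, "text": "World"}]
    return state

def normalize(state):
    # Stand-in for schema normalization into a stable, canonical form.
    for rec in state["records"]:
        rec["text"] = rec["text"].strip().lower()
    return state

def report(state):
    # Stand-in for compact run reporting.
    state["report"] = f"processed {len(state['records'])} records"
    return state

STAGES = [ingest, normalize, report]

def run_pipeline(stages=STAGES):
    # Each stage receives and returns shared run state; the manifest records
    # which stages ran, which is what makes runs traceable and repeatable.
    state = {"manifest": []}
    for stage in stages:
        state = stage(state)
        state["manifest"].append(stage.__name__)
    return state

result = run_pipeline()
print(result["report"])    # processed 2 records
print(result["manifest"])  # ['ingest', 'normalize', 'report']
```

Adapting the architecture to a new dataset then means swapping stage implementations, not redesigning the driver.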

Key Learnings
  • No external AI calls and full control over data, models, and repeatable runs.
  • Stable canonical schema with downstream ML and reusable reporting.
  • Apply classic ML with Spark without introducing LLM complexity.
  • Carry out extractive summaries without hallucination risk.
  • Complete traceability through manifests, prompt versions, and run logs.
  • Implement data and batch flow, along with fast inference using vLLM.
  • Extract inspectable data and surface hidden errors using quarantine tables.
  • Measure and store performance for every run with stakeholder reporting.
  • Design a single-command pipeline with clear configs to build repeatable AI runs.
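The quarantine-table pattern from the list above can be sketched in a few lines: rows that fail validation are routed to a separate table for inspection rather than silently dropped (the `is_valid` rule and `split_quarantine` helper are illustrative, not the book's code):

```python
def is_valid(row):
    # Illustrative validation rule: require an id and a numeric value.
    return row.get("id") is not None and isinstance(row.get("value"), (int, float))

def split_quarantine(rows):
    # Route each row to the clean table or the quarantine table.
    clean, quarantine = [], []
    for row in rows:
        (clean if is_valid(row) else quarantine).append(row)
    return clean, quarantine

rows = [
    {"id": 1, "value": 10},
    {"id": None, "value": 5},    # missing id -> quarantined
    {"id": 3, "value": "oops"},  # non-numeric value -> quarantined
]
clean, quarantine = split_quarantine(rows)
print(len(clean), len(quarantine))  # 1 2
```

Because bad rows are preserved rather than discarded, hidden data-quality errors surface in the quarantine table where they can be inspected and fixed.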

Table of Contents
  1. Up and Running with Private AI
  2. Data Workflows using Spark DataFrames
  3. Powerful NLP without LLM
  4. Batch Inference and Practical Outputs
  5. Smart Summaries
  6. Boosting with vLLM Integration
  7. Packaging Private AI

The Leanpub 60 Day 100% Happiness Guarantee

Within 60 days of purchase you can get a 100% refund on any Leanpub purchase, in two clicks.

Now, this is technically risky for us, since you'll have the book or course files either way. But we're so confident in our products and services, and in our authors and readers, that we're happy to offer a full money back guarantee for everything we sell.

You can only find out how good something is by trying it, and because of our 100% money back guarantee there's literally no risk to do so!

So, there's no reason not to click the Add to Cart button, is there?

See full terms...

Earn $8 on a $10 Purchase, and $16 on a $20 Purchase

We pay 80% royalties on purchases of $7.99 or more, and 80% royalties minus a 50 cent flat fee on purchases between $0.99 and $7.98. You earn $8 on a $10 sale, and $16 on a $20 sale. So, if we sell 5000 non-refunded copies of your book for $20, you'll earn $80,000.

(Yes, some authors have already earned much more than that on Leanpub.)

In fact, authors have earned over $14 million writing, publishing and selling on Leanpub.

Learn more about writing on Leanpub

Free Updates. DRM Free.

If you buy a Leanpub book, you get free updates for as long as the author updates the book! Many authors use Leanpub to publish their books in-progress, while they are writing them. All readers get free updates, regardless of when they bought the book or how much they paid (including free).

Most Leanpub books are available in PDF (for computers) and EPUB (for phones, tablets and Kindle). The formats that a book includes are shown at the top right corner of this page.

Finally, Leanpub books don't have any DRM copy-protection nonsense, so you can easily read them on any supported device.

Learn more about Leanpub's ebook formats and where to read them

Write and Publish on Leanpub

You can use Leanpub to easily write, publish and sell in-progress and completed ebooks and online courses!

Leanpub is a powerful platform for serious authors, combining a simple, elegant writing and publishing workflow with a store focused on selling in-progress ebooks.

Leanpub is a magical typewriter for authors: just write in plain text, and to publish your ebook, just click a button. (Or, if you are producing your ebook your own way, you can even upload your own PDF and/or EPUB files and then publish with one click!) It really is that easy.

Learn more about writing on Leanpub