Retrieval-Augmented Generation

Name: Retrieval-Augmented Generation
Brand: Leanpub
Price: 19.00 USD
Availability: InStock

An Engineer's Guide to Building RAG Systems with Your Own Data

This book is 100% complete. Last updated on 2026-07-07.

Jeroen Herczeg

The engineer's guide to RAG systems that survive a deploy.

Minimum price

$19.00

$29.00

Formats: PDF, EPUB, WEB, APP

About the Book

Most teams trying to ship a RAG system stall at the prototype stage. The notebook works and the demo wins the meeting, but the system never reaches users at scale. The gap between "this works on my laptop" and "this runs reliably in production" is wide and full of engineering challenges. This book is about that gap.

It's written for engineers who need to ship something real. Not for researchers writing benchmarks, not for managers picking vendors. For the person at the keyboard who needs to make decisions about chunking strategy, vector store choice, evaluation methodology, and production operations, and who's tired of vendor-shaped blog posts and examples that don't survive a deploy.

Each chapter pairs concept with implementation. Real code on a real corpus, runnable end to end. The seven failure points of a RAG pipeline are introduced in chapter 1 and traced through every subsequent chapter, so you learn to recognize *where* things break, not just patch them when they do.

The book

Why standalone LLMs fail on private data, what RAG actually is, and the building blocks underneath: embeddings, chunking strategies, vector storage (FAISS vs pgvector vs Qdrant with measured benchmarks), and a complete ingestion pipeline that handles the messiness of real documents.

Wiring retrieval into generation. Sparse vs dense retrieval, BM25, hybrid search with reciprocal rank fusion, reranking with cross-encoders, query transformation patterns (multi-query, sub-question decomposition, HyDE). Every chapter measures the improvement instead of just describing it.
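As a rough illustration of the hybrid search idea mentioned above, here is a minimal sketch of reciprocal rank fusion in Python. It is not taken from the book: the function name, the document ids, and the constant k=60 (a commonly used default) are assumptions chosen for the example. Each document scores 1 / (k + rank) in every ranked list it appears in, and the scores are summed across lists.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list.

    Hypothetical helper for illustration. Rank 1 is the best hit in
    each input list; k dampens the influence of top ranks.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Made-up results from a keyword retriever and a dense retriever.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
dense_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print(fused)
```

Documents that both retrievers rank highly rise to the top, while a document found by only one retriever still stays in the candidate pool. The fusion uses ranks only, so the raw BM25 and similarity scores never need to be put on a common scale.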

Evaluation done right (separate retrieval and generation metrics, RAGAS, ablation testing). Hardening the pipeline (observability, semantic caching, citation systems, embedding staleness, cost optimization, load testing). Advanced retrieval patterns (GraphRAG, Corrective RAG, Self-RAG) with honest takes on when each earns its keep. Then agentic RAG with realistic guardrails for production.

By the end you'll be able to

Choose a chunking strategy on retrieval evidence, not intuition
Pick FAISS, pgvector, or Qdrant based on your actual constraints
Build a RAG pipeline that handles real PDFs with OCR artifacts, encoding issues, and dirty markdown
Evaluate retrieval quality separately from generation quality, and prove your changes help
Add reranking, hybrid search, and query transformation when (and only when) they earn it
Catch the seven failure points before they reach production
Scale, monitor, and cost-optimize a RAG system that survives a deploy

About the Author

Jeroen Herczeg

Jeroen Herczeg is a senior software engineer who builds AI systems for production.

He has 20 years of engineering experience across software platforms, distributed systems, microservices, Kubernetes, and product teams. His current work focuses on retrieval-augmented generation, AI agent orchestration, and practical AI engineering.

Most recently, he built the orchestrator agent for the Google + BBC AI Agents demo at IBC2025, winner of the Broadcast Tech Innovation Award. His interest in AI goes back to 2017, when he completed Udacity’s Artificial Intelligence Nanodegree. Today, that work has evolved into a focus on production RAG systems and AI agent orchestration.

He writes about practical AI engineering at herczeg.be/blog and lives in Belgium.

Table of Contents

Preface

Who I am
Who it is for
How to read it

The problem RAG solves

What an LLM can and cannot do
Limitations of a standalone LLM
The RAG mental model
The RAG pipeline end-to-end
RAG vs. fine-tuning vs. long-context prompting
The seven failure points
Common misconceptions
Seeing the difference: standalone LLM vs. RAG
Summary

Embeddings

From words to vectors
The bi-encoder architecture
Generating embeddings locally
Generating embeddings via API
Cosine similarity and distance metrics
Visualizing embedding space with UMAP
Choosing an embedding model
Similarity search from scratch
Summary

Chunking strategies

The chunk size tradeoff
Fixed-size chunking
Recursive character splitting
Semantic chunking
Document-structure-aware chunking
Contextual chunking
Comparing strategies: A retrieval test
Summary

Vector storage and indexing

Exact vs. approximate nearest neighbor
The speed-accuracy-memory tradeoff
Choosing a vector store
How HNSW works
Building a FAISS index from scratch
pgvector: Vectors in PostgreSQL
Qdrant: A purpose-built vector database
Tuning index parameters
Putting it all together: the comparison benchmark
Summary

Building the ingestion pipeline

The ingestion flow
Parsing real-world documents
Text cleaning and normalization
The full pipeline: Parse, clean, chunk, embed, store
Metadata extraction and storage
Idempotent re-ingestion
Running the complete pipeline
Summary

Hybrid retrieval

Keyword retrieval and the BM25 mental model
Adding a search vector to the chunks table
Side by side: each retriever fails the other’s queries
Hybrid retrieval as candidate generation
Filters as candidate-set scoping
Putting it together: hybrid retrieval over the corpus
Summary

Your first RAG pipeline

Selecting context from the candidate pool
Building the prompt
The complete pipeline
Five queries: where the pipeline succeeds and fails
The failure catalog
What this pipeline cannot do yet
Summary

Reranking

Why first-stage retrieval optimizes for recall
Bi-encoder versus cross-encoder
Adding a local reranker with bge-reranker-v2-m3
Choosing K and N
Latency and the cost of cross-encoders
Did reranking actually improve answers?
When reranking is not worth it
The pipeline so far
Summary

Query transformation

Where query transformation belongs
Query rewriting
HyDE: search with a hypothetical answer
Multi-query expansion
Decomposition
When transformation hurts
A technique hierarchy
Summary

Evaluating RAG systems

Two evaluation surfaces
Building an evaluation set
Retrieval metrics
Generation metrics
The ablation table
Regression tracking
What evaluation will not tell you
Summary

Hardening for production

Stage-level observability
Tracing across stages
Failure modes and graceful degradation
Configuration and secrets
Model versioning and the silent-rebuild trap
Security boundaries in RAG systems
Deploying changes safely
The production baseline
Summary

Advanced retrieval patterns

Parent-document retrieval
Contextual retrieval
Graph-based retrieval
ColBERT and late interaction
The complexity test
Summary

Agentic RAG

Retrieval as a tool call
Multi-step reasoning loops
Bounding agentic loops
Observability for agents
When agentic RAG is worth it
Summary

Closing

What stays true
What does not work
What to do next

The Leanpub 60 Day 100% Happiness Guarantee

Within 60 days of purchase you can get a 100% refund on any Leanpub purchase, in two clicks.

Free Updates. DRM Free.

If you buy a Leanpub book, you get free updates for as long as the author updates the book! Many authors use Leanpub to publish their books in-progress, while they are writing them. All readers get free updates, regardless of when they bought the book or how much they paid (including free).

Most Leanpub books are available in PDF (for computers) and EPUB (for phones, tablets and Kindle). The formats that a book includes are shown at the top right corner of this page.

Finally, Leanpub books don't have any DRM copy-protection nonsense, so you can easily read them on any supported device.