June 10, 2026

Retrieval-Augmented Generation

Jeroen Herczeg

Description

Welcome to the Leanpub Launch video for Retrieval-Augmented Generation: An Engineer's Guide to Building RAG Systems with Your Own Data https://leanpub.com/retrieval-augmented-generation by Jeroen Herczeg!

00:00 Introduction to Retrieval-Augmented Generation
00:15 Jeroen's background in software engineering and AI
01:09 The challenge of moving AI systems into production
02:08 Why the book focuses on production-ready RAG
02:35 Core concepts: retrieval, augmentation, and generation
03:20 Why engineering RAG systems is difficult
03:59 The structure of the book and the RAG pipeline
05:15 Diagnosing failures across the RAG pipeline
05:45 Building and evaluating each stage independently
06:21 From basic RAG to advanced and agentic approaches
06:36 What is Agentic RAG?
07:43 When agentic workflows are worth the complexity
07:50 Security, governance, and cost considerations
08:52 Final thoughts and book recommendation

About the Book

Most teams trying to ship a RAG system stall at the prototype stage. The notebook works, the demo wins the meeting, the system never reaches users at scale. The gap between "this works on my laptop" and "this runs reliably in production" is wide and full of engineering challenges. This book is about that gap.

It's written for engineers who need to ship something real. Not for researchers writing benchmarks, not for managers picking vendors. For the person at the keyboard who needs to make decisions about chunking strategy, vector store choice, evaluation methodology, and production operations, and who's tired of vendor-shaped blog posts and examples that don't survive a deploy.

Each chapter pairs concept with implementation. Real code on a real corpus, runnable end to end. The seven failure points of a RAG pipeline are introduced in chapter 1 and traced through every subsequent chapter, so you learn to recognize *where* things break, not just patch them when they do.

The book

Why standalone LLMs fail on private data, what RAG actually is, and the building blocks underneath: embeddings, chunking strategies, vector storage (FAISS vs pgvector vs Qdrant with measured benchmarks), and a complete ingestion pipeline that handles the messiness of real documents.

Wiring retrieval into generation. Sparse vs dense retrieval, BM25, hybrid search with reciprocal rank fusion, reranking with cross-encoders, query transformation patterns (multi-query, sub-question decomposition, HyDE). Every chapter measures the improvement instead of just describing it.

Evaluation done right (separate retrieval and generation metrics, RAGAS, ablation testing). Hardening the pipeline (observability, semantic caching, citation systems, embedding staleness, cost optimization, load testing). Advanced retrieval patterns (GraphRAG, Corrective RAG, Self-RAG) with honest takes on when each earns its keep. Then agentic RAG with realistic guardrails for production.

By the end you'll be able to

Choose a chunking strategy on retrieval evidence, not intuition
Pick FAISS, pgvector, or Qdrant based on your actual constraints
Build a RAG pipeline that handles real PDFs with OCR artifacts, encoding issues, and dirty markdown
Evaluate retrieval quality separately from generation quality, and prove your changes help
Add reranking, hybrid search, and query transformation when (and only when) they earn it
Catch the seven failure points before they reach production
Scale, monitor, and cost-optimize a RAG system that survives a deploy

About the Author

Jeroen Herczeg is a senior software engineer who builds AI systems for production. He has 20 years of engineering experience across software platforms, distributed systems, microservices, Kubernetes, and product teams. His current work focuses on retrieval-augmented generation, AI agent orchestration, and practical AI engineering.
Most recently, he built the orchestrator agent for the Google + BBC AI Agents demo at IBC2025, winner of the Broadcast Tech Innovation Award. His interest in AI goes back to 2017, when he completed Udacity’s Artificial Intelligence Nanodegree. Today, that work has evolved into a focus on production RAG systems and AI agent orchestration. He writes about practical AI engineering at herczeg.be/blog and lives in Belgium.

Follow the author here! https://x.com/jeroenherczeg

Thank you for watching, please like and leave a comment, we'd love to hear from you!

Please Subscribe and Follow!

YouTube: https://www.youtube.com/leanpub
X: https://x.com/leanpub
Instagram: https://www.instagram.com/leanpub
Facebook: https://www.facebook.com/leanpub

Create Your Own Leanpub Book!

You can create your own book anytime here: https://leanpub.com/create/book

Here's the tutorial showing how to write and publish a Leanpub book in your browser (it's free!): http://help.leanpub.com/en/articles/2932527-getting-started-writing-a-book-in-the-web-browser-writing-mode

If you're a Leanpub author and you'd like to submit your own Launch video for us to publish, or if you'd like to record a Launch video with Len, please go here: https://leanpub.com/launch.

#books #leanpublishing #selfpublishing #leanpub #writing #ai #softwarearchitecture #softwareengineering