Table of Contents
Building Production-Ready Gen AI Systems
Architecture, Patterns & Governance for LLM Systems
5 Chapters · 8 Patterns · 12 Principles · 34 Checklist Items · 56 Pages

Front Matter
Preface
Why Gen AI systems fail differently from traditional software, the scope of this book, and how to use it.
A Note on Build vs Buy
Foundation model access models, self-hosting breakeven calculation, and how to evaluate providers beyond accuracy.
- Managed API vs self-hosted open-weight vs hybrid routing
- Breakeven: 500M–2B tokens/month for a single A100 (with derivation)
- Provider evaluation: support SLAs, rate limits, data commitment terms, model version stability
- Fine-tuning vs prompt engineering cost crossover criteria
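The breakeven bullet above can be sketched with simple arithmetic. The figures below are placeholder assumptions for illustration (an A100 at roughly $2/hour and blended API prices between $0.75 and $3.00 per million tokens), not quotes from any provider; the book's own derivation may use different inputs.

```python
# Illustrative breakeven sketch: self-hosting one A100 vs a managed API.
# All dollar figures are assumptions chosen for illustration only.

A100_HOURLY_USD = 2.0            # assumed cloud rate for a single A100
HOURS_PER_MONTH = 730
fixed_monthly_cost = A100_HOURLY_USD * HOURS_PER_MONTH   # ~= $1,460/month

def breakeven_tokens(api_price_per_million: float) -> float:
    """Monthly token volume at which self-hosting cost equals API spend."""
    return fixed_monthly_cost / api_price_per_million * 1_000_000

for price in (3.00, 1.50, 0.75):  # assumed blended $/1M-token API prices
    millions = breakeven_tokens(price) / 1e6
    print(f"${price:.2f}/1M tokens -> breakeven ~ {millions:,.0f}M tokens/month")
```

Under these assumptions the breakeven lands between roughly 500M tokens/month (at $3/1M) and 2B tokens/month (at $0.75/1M), which is the shape of the range quoted above.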
The AI Maturity Model
Assessing organisational readiness before committing to architecture.
- L1 Foundational — direct API, no gateway, no evaluation
- L2 Contextual — RAG pipeline, prompt registry, gateway layer
- L3 Agentic — tool calling, full observability, CI eval harness
- L4 Adaptive — fine-tuned specialists, multi-agent mesh, continuous feedback
- Capability coverage table across all four levels
- Pre-engagement maturity assessment framework
Canonical AI System Architecture
The six-layer reference blueprint for production AI systems.
- Layer 1: Clients — Web/Mobile UI, CLI/API consumers, Event Bus triggers
- Layer 2: Gateway — AI Gateway, Guardrail Engine (input and output)
- Layer 3: Orchestration — Prompt Registry, Agent Runtime, Memory Manager
- Layer 4: Models — Foundation model, Specialist/Fine-tuned, Embedding model
- Layer 5: Data — Vector store trade-offs, Structured store, Object storage, Chunking strategy
- Layer 6: Eval & Ops — Evaluation harness, Observability stack, Feedback loop
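The layer boundaries above can be sketched in a few lines. This is a toy illustration with hypothetical interfaces (the names `gateway`, `orchestrate`, and `PROMPT_REGISTRY` are invented for this sketch, not an API from the book): application code talks only to the gateway, the gateway runs the input gate, and only the orchestration layer resolves prompts and touches a model.

```python
# Toy sketch of the six-layer separation; interfaces are illustrative only.
from dataclasses import dataclass
from typing import Callable

@dataclass
class GatewayRequest:          # Layer 1: what a client is allowed to send
    tenant_id: str
    prompt_name: str
    user_input: str

def input_guardrail(text: str) -> str:
    # Layer 2: placeholder input gate (a real engine does far more than this)
    if "ignore previous instructions" in text.lower():
        raise ValueError("blocked by input guardrail")
    return text

PROMPT_REGISTRY = {  # Layer 3: versioned prompt registry (toy example)
    "summarise@v1": "Summarise the following for an analyst:\n{input}",
}

def orchestrate(req: GatewayRequest, model: Callable[[str], str]) -> str:
    # Layer 3 resolves the versioned prompt; only this layer calls the model (Layer 4)
    template = PROMPT_REGISTRY[f"{req.prompt_name}@v1"]
    return model(template.format(input=req.user_input))

def gateway(req: GatewayRequest, model: Callable[[str], str]) -> str:
    safe = input_guardrail(req.user_input)       # input gate before orchestration
    out = orchestrate(GatewayRequest(req.tenant_id, req.prompt_name, safe), model)
    return out  # a production gateway would also run an output guardrail here

echo_model = lambda p: f"[model saw {len(p)} chars]"   # stand-in for Layer 4
print(gateway(GatewayRequest("t1", "summarise", "Q3 revenue grew 12%."), echo_model))
```

The point of the sketch is the dependency direction: nothing in Layer 1 can reach a model without passing through Layers 2 and 3, which is what P02 below formalises.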
Design Patterns
Eight reusable architectures from simple retrieval to multi-agent systems. Each pattern includes When to Use criteria, Strengths, Limitations, and a code skeleton.
- Pattern 1: Naive RAG · Retrieval · L2 · Low complexity
- Pattern 2: Advanced RAG · Hybrid search, cross-encoder re-ranking · L2–L3 · Medium
- Pattern 3: ReAct Agent · Reason + Act loop with step budget · L3 · High
- Pattern 4: Plan-and-Execute Agent · DAG decomposition · L3 · High
- Pattern 5: Structured Output Pipeline · Schema-validated extraction · L2–L3
- Pattern 6: Fine-Tuning / PEFT · LoRA/QLoRA specialists · L4 · Very High
- Pattern 7: Evaluation-Driven Iteration · All levels · Quality discipline
- Pattern 8: Multi-Agent Collaboration · Specialist mesh · L4 only
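As a flavour of the code skeletons promised above, here is one way Pattern 1 (Naive RAG) might look. The embedder and generator are deliberately trivial stand-ins, and every name is illustrative rather than taken from the book: embed the query, rank chunks by cosine similarity, stuff the top-k into the prompt.

```python
# Sketch of Pattern 1 (Naive RAG): embed query, cosine top-k, stuff context.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query_vec, index, k=2):
    """index: list of (chunk_text, chunk_vec) pairs. Returns top-k chunks."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def naive_rag(question, embed, index, generate, k=2):
    chunks = retrieve(embed(question), index, k)
    prompt = ("Answer using only this context:\n"
              + "\n---\n".join(chunks)
              + f"\n\nQ: {question}")
    return generate(prompt)

# Toy components so the skeleton runs end to end.
embed = lambda t: [t.count("a"), t.count("e"), len(t)]   # stand-in embedder
index = [("Policy A covers fire.", embed("Policy A covers fire.")),
         ("Policy B covers flood.", embed("Policy B covers flood."))]
generate = lambda p: p.splitlines()[-1]                  # stand-in generator
print(naive_rag("What does Policy A cover?", embed, index, generate))
```

Patterns 2 through 8 layer onto this same shape: hybrid search and re-ranking replace `retrieve`, agent loops replace the single `generate` call.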
Architecture Principles
Twelve non-negotiable standards distilled from production failures. Each principle includes an example and an anti-pattern.
- P01 Version Everything — prompts, models, embeddings, datasets
- P02 Separate Concerns by Layer — no direct model API calls from application code
- P03 Design for Failure — retries, circuit breakers, model drift, hallucination cascade, tool schema drift
- P04 Guardrails Are Not Optional — direct and indirect prompt injection, input/output gates
- P05 Evaluate Before You Ship — golden set sizing, statistical confidence, held-out test sets
- P06 Instrument Every Call — OpenTelemetry, LangSmith, cost attribution
- P07 Minimise Context Surface — context budget, compression, large-window trade-offs
- P08 Scope Agent Authority — least-privilege manifests, step budgets, human-in-the-loop
- P09 Own the Embedding Contract — index immutability, dual-write migration
- P10 Isolate Tenant Data — collection-level isolation, application-layer anti-pattern
- P11 Capture Feedback Continuously — ground truth expansion, PII stripping
- P12 Document Architecture Decisions — AIADRs, regulatory classification
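To make P03 concrete, here is a minimal sketch of retries behind a circuit breaker: fail fast once a model endpoint has failed repeatedly, rather than hammering it. Thresholds, the class name, and the half-open probe logic are all illustrative assumptions, not the book's implementation.

```python
# Sketch of P03 (Design for Failure): retry with exponential backoff behind
# a simple circuit breaker. All thresholds are illustrative.
import time

class CircuitBreaker:
    def __init__(self, max_failures=3, cooldown_s=30.0):
        self.max_failures, self.cooldown_s = max_failures, cooldown_s
        self.failures, self.opened_at = 0, None

    def call(self, fn, *args, retries=2, backoff_s=0.01):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown_s:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at, self.failures = None, 0   # half-open: allow a probe
        for attempt in range(retries + 1):
            try:
                result = fn(*args)        # e.g. the model API call
                self.failures = 0         # success closes the circuit
                return result
            except Exception:
                self.failures += 1
                if self.failures >= self.max_failures:
                    self.opened_at = time.monotonic()  # trip the breaker
                    raise RuntimeError("circuit open: failing fast")
                if attempt == retries:
                    raise
                time.sleep(backoff_s * (2 ** attempt))  # exponential backoff
```

In the layered architecture above, this logic belongs in the gateway (Layer 2), so that every client inherits the same failure behaviour; the same fail-fast idea extends to the P08 step budgets that bound agent loops.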
The 34-Point Engagement Checklist
Quality gates from pre-engagement through production go-live. Every item requires evidence, not acknowledgement.
- Phase 1: Pre-Engagement (items 01–06) — objectives, data inventory, maturity level, model selection, compliance, eval golden set
- Phase 2: Architecture & Design (items 07–14) — gateway, guardrails, chunking strategy, tenant isolation, agent step budget, prompt registry, memory TTL, fallback
- Phase 3: Delivery & Handover (items 15–34) — CI eval harness, observability, feedback capture, AIADRs, load testing, security review, runbook, rollback, data retention, cost alerts, DR test, UAT, production readiness sign-off
Appendix A: Glossary of Key Terms (25 terms)
AIADR, Agent Runtime, Circuit Breaker, Cross-Encoder, DAG, Embedding Model, Evaluation Harness, Foundation Model, Gen AI, Guardrail Engine, Golden Eval Set, LoRA/QLoRA, Namespace Isolation, OpenTelemetry, PEFT, PII, Prompt Registry, RAG, RAGAS, ReAct, Step Budget, Tenant Isolation, TTL, Vector Store, and more.
Appendix B: Regulatory Mapping — Quick Reference (12 principles)
All twelve architecture principles mapped to NIST AI RMF functions, ISO 42001 clauses, and EU AI Act articles. Includes compliance depth note and EU AI Act risk tier guidance.
Appendix C: Case Study: Autonomous M&A Due Diligence Agent (full worked example)
Every framework section applied to a single complex agentic system for a mid-market private equity firm processing 5,000–40,000 documents per deal.
- A1: Scenario brief — stakeholders, constraints, success metrics
- A2: Maturity assessment — L2 current state, L3 target, L4 roadmap
- A3: Architecture decisions — all six layers with rationale
- A4: Pattern selection — four adopted, two explicitly rejected with reasons
- A5: Principles in practice — five most critical with implementation evidence
- A6: Completed 34-point checklist — all items evidenced; one item marked DEFERRED with rationale