Claude Code: Building Production Agents That Actually Scale

The practitioner's guide to Claude Code in production. Thirty chapters covering the agent loop, tools, hooks, MCP, permissions, evals, observability, and cost engineering for AI agents that actually scale.

Minimum price: $9.99

Formats: PDF, EPUB, WEB

466 pages

About the Book

Claude Code is Anthropic's framework for building AI agents that use tools, delegate to subagents, and run autonomously. Most tutorials stop at "hello world." This book covers what happens after that: when your Claude Code agent needs to run reliably in production, at scale, in environments where failure has real consequences.

Written by an engineer who builds Claude Code agent systems for regulated financial institutions, it walks through the full stack: the agent loop, context management, tools, hooks, MCP servers, the permissions model, sandboxing, secrets handling, evals, LLM-as-judge, observability, cost engineering, and team workflows. Thirty chapters, five practical appendices, and code extracted from production systems.

If you are a senior engineer, technical lead, or architect evaluating Claude Code for production use, this is the reference that will save you months of trial and error.

About the Author

Thomas De Vos

Thomas De Vos has twenty-five years of experience as a software engineer and has spent over a decade building AI systems for regulated financial institutions. He leads AI strategy and implementation for global banking, insurance, and fintech clients, where production reliability is not optional. He writes and speaks frequently on building autonomous systems that hold up under real-world constraints.

Table of Contents

Preface

  1. Why This Book
  2. Who This Book Is For
  3. How to Read This Book
  4. Acknowledgements

Chapter 1. The agent loop, seriously

  1. Learning objectives
  2. 1.1 The minimal loop
  3. 1.2 What Claude Code adds on top
  4. 1.3 Where state actually lives
  5. 1.4 Worked example: an adverse media screening agent
  6. 1.5 The ten-minute whiteboard version
  7. 1.6 A note on vocabulary
  8. Summary
  9. Exercises
  10. Notes

Chapter 2. What Claude Code is in April 2026

  1. 2.1 The CLI as ground truth
  2. 2.2 IDE surfaces: VS Code, JetBrains, and the new wave
  3. 2.3 CI surfaces: GitHub Actions
  4. 2.4 Chat surfaces: Slack
  5. 2.5 Other surfaces and what this book does not cover
  6. 2.6 How versions propagate
  7. 2.7 What you need to know before committing to a surface
  8. Summary
  9. Exercises
  10. Notes

Chapter 3. The model family and why it matters for the loop

  1. 3.1 The three sizes
  2. 3.2 Context windows and how they fail
  3. 3.3 Prompt caching and why it changes the game
  4. 3.4 Cost shape, not cost per token
  5. 3.5 The model selection matrix
  6. 3.6 Performance and the SWE-Bench baseline
  7. Summary
  8. Exercises
  9. Notes

Chapter 4. Context as a first-class resource

  1. 4.1 The anatomy of a turn
  2. 4.2 CLAUDE.md as the system of truth
  3. 4.3 Auto-memory and when it lies
  4. 4.4 The session transcript and context accumulation
  5. 4.4a Configuring auto-memory and compaction
  6. 4.5 Context budgeting worksheet
  7. 4.6 Worked example: A compliance monitoring agent
  8. Summary
  9. Exercises
  10. Notes

Chapter 5. The production definition

  1. 5.1 Five properties of a production agent
  2. 5.2 Observable: Audit trails and reasoning
  3. 5.3 Reversible: Undo and correction
  4. 5.4 Evaluable: Metrics and measurement
  5. 5.5 Governable: Change control and accountability
  6. 5.6 The readiness rubric
  7. 5.7 Worked example: The adverse media screening agent
  8. 5.8 What this book will and will not help with
  9. Summary
  10. Exercises
  11. Notes

Part II: The Primitive Stack

Chapter 6. Tools as an API surface

  1. Learning objectives
  2. 6.1 What a tool actually is
  3. 6.2 The three categories of tools
  4. 6.3 Error handling and what the model learns on failure
  5. 6.4 Versioning tools without breaking agents
  6. 6.5 Rate limiting and cost attribution
  7. 6.6 Worked example: the screening toolkit
  8. 6.7 Tool schemas as governance
  9. Summary
  10. Exercises
  11. Notes

Chapter 7. Subagents and the delegation model

  1. Learning objectives
  2. 7.1 What a subagent is and is not
  3. 7.2 The parent-child contract
  4. 7.3 When to spawn vs. when to inline
  5. 7.4 Context isolation and why it matters
  6. 7.5 The fan-out pattern
  7. 7.6 Subagent configuration
  8. 7.7 Common misuses
  9. 7.8 When to use what: a decision guide across the primitive stack
  10. 7.9 Worked example: a compliance inquiry agent with subagents
  11. Summary
  12. Exercises
  13. Notes

Chapter 8. Hooks, the only supervisor you have

  1. Learning objectives
  2. 8.1 What a hook is and what it is not
  3. 8.2 Hook matching patterns
  4. 8.3 Writing a PreToolUse hook
  5. 8.4 Writing a PostToolUse hook
  6. 8.5 The four-eyes hook pattern
  7. 8.6 Where hooks live
  8. 8.7 Hooks as a debugging tool
  9. 8.8 Worked example: a four-eyes compliance screening agent
  10. Summary
  11. Exercises
  12. Notes

Chapter 9. Slash commands and skills

  1. Learning objectives
  2. Learning objectives (continued)
  3. 9.1 What a skill is
  4. 9.2 The skill file anatomy
  5. 9.3 Writing instructions that agents can follow
  6. 9.4 Auto-invoke versus manual invoke
  7. 9.5 Skill versioning and audit trails
  8. 9.6 Worked example: a screening skill
  9. 9.7 Skills, tools, and hooks
  10. Summary
  11. Exercises
  12. Notes

Chapter 10. MCP and the integration plane

  1. Learning objectives
  2. 10.1 What MCP actually is
  3. 10.2 How MCP works
  4. 10.3 Transport mechanisms: stdio and HTTP
  5. 10.4 Writing an MCP server
  6. 10.5 MCP security: authentication, rate limiting, and PII
  7. 10.6 When MCP is the right choice
  8. 10.7 Worked example: an MCP server for adverse media
  9. Summary
  10. Exercises
  11. Notes

Chapter 11. Plugins and Marketplaces

  1. Learning objectives
  2. 11.1 What a plugin contains
  3. 11.2 Installation and lifecycle
  4. 11.3 Trust and provenance
  5. 11.4 The marketplace model
  6. 11.5 Building a plugin for your team
  7. 11.6 The FS compliance plugin: a worked example
  8. 11.7 Summary
  9. Exercises
  10. Footnotes

Chapter 12. Putting the stack together: a compliance monitoring agent end to end

  1. Learning objectives
  2. 12.1 The workload in detail
  3. 12.2 The agent architecture
  4. 12.3 Session walk-through: screening a customer
  5. 12.4 The governance stack in action
  6. 12.5 Multi-entity orchestration
  7. 12.6 Cost attribution and budget management
  8. 12.7 Audit trail example
  9. 12.8 What this chapter does not cover (and where to find it)
  10. Summary
  11. Exercises
  12. Notes

Chapter 13. When to leave the CLI, and how to embed it when you do

  1. Learning objectives
  2. 13.1 The five good reasons
  3. 13.2 The fifteen false positives
  4. 13.2.2 Performance and latency (two variations)
  5. 13.2.3 Integration (four variations)
  6. 13.2.4 Governance (two variations)
  7. 13.2.5 Multi-tenancy, queuing, and workflows (two final ones)
  8. 13.3 The subprocess pattern: CLI from the SDK
  9. 13.4 Decision flowchart as prose
  10. 13.5 Worked example: a screening inquiry handler with audit
  11. 13.6 Enterprise deployment: from CLI to Kubernetes
  12. Summary
  13. Exercises
  14. Footnotes

Chapter 14. The SDK in one sitting

  1. Learning objectives
  2. 14.1 The query function
  3. 14.2 Options and configuration
  4. 14.3 Message types
  5. 14.4 Tool registration and schemas
  6. 14.5 The minimal working agent in fifty lines (Python)
  7. 14.6 The same agent in TypeScript
  8. 14.7 What the SDK does not do
  9. Summary
  10. Exercises
  11. Footnotes

Chapter 15. Tools, hooks, and subagents from the SDK side

  1. Learning objectives
  2. 15.1 Registering tools with schemas
  3. 15.2 PreToolUse and PostToolUse semantics
  4. 15.3 Hook matching in the SDK
  5. 15.4 Spawning subagents
  6. 15.5 Worked example: compliance inquiry handler
  7. Summary
  8. Exercises
  9. Footnotes

Chapter 16. Sessions, state, and durability

  1. Learning objectives
  2. 16.1 The session model
  3. 16.2 Storage options and tradeoffs
  4. 16.3 Durability patterns
  5. 16.4 Session replay for debugging
  6. 16.5 Worked example: inquiry handling across analyst shifts
  7. Summary
  8. Exercises
  9. Footnotes

Chapter 17. The permissions model, read it twice

  1. Learning objectives
  2. The three permission modes
  3. Per-tool permissions
  4. Permissions at scale: containerized deployments
  5. File path restrictions
  6. The permission escalation problem
  7. Default-deny as the only sane starting point
  8. Permission auditing
  9. Worked Financial Services example: the sanctions list incident
  10. Summary
  11. Exercises
  12. Footnotes

Chapter 18. Sandboxing and the blast radius question

  1. Learning objectives
  2. OS-level sandboxing: filesystem, process, and network boundaries
  3. Container-based isolation
  4. The blast radius framework
  5. Sandbox configuration across environments
  6. Reducing blast radius: practical design patterns
  7. Measuring blast radius
  8. Worked Financial Services example: blast radius analysis for a compliance agent with customer PII access
  9. Summary
  10. Exercises
  11. Footnotes

Chapter 19. Network egress, secrets, and data boundaries

  1. Learning objectives
  2. Network allowlisting
  3. Secret injection patterns: where and how to manage API keys, tokens, and credentials
  4. Environment variable hygiene
  5. Data classification and the agent
  6. PII in context windows
  7. The screening workload stress test
  8. Worked Financial Services example: a screening agent that logs PII to an observability endpoint
  9. Summary
  10. Exercises
  11. Footnotes

Chapter 20. Policy as code: managed settings, audit, and lineage

  1. Learning objectives
  2. The managed-settings.json anatomy
  3. Deploying settings at scale
  4. Versioning policy: the policy repo
  5. Audit logging: what to capture, where to send it
  6. Lineage: tracing a decision back through the agent’s tool calls
  7. The forensic question: reconstructing what the agent did
  8. Worked Financial Services example: responding to a compliance audit with lineage
  9. Summary
  10. Exercises
  11. Footnotes

Chapter 21. The supply chain: plugins, skills, MCP servers

  1. Learning objectives
  2. The trust problem
  3. Provenance and signing
  4. Version pinning
  5. The audit before you install
  6. Dependency sprawl
  7. Building an internal registry
  8. The vendor plugin problem in regulated industries
  9. Worked Financial Services example: the sanctions plugin that was phoning home
  10. Summary
  11. Exercises
  12. Footnotes

Chapter 22. The eval mindset

  1. Learning objectives
  2. Why evals are non-negotiable
  3. The three eval types
  4. What to eval in an agent system versus a completion system
  5. The eval-first development loop
  6. Building eval into CI
  7. Continuous evals and the AgentOps connection
  8. When evals fail: kill switches, canary rollouts, and automated responses
  9. The eval debt trap
  10. FS example: Screening agents and outcome drift
  11. Summary
  12. Exercises

Chapter 23. Designing evals for agentic workloads

  1. Learning objectives
  2. Trajectory versus answer grading
  3. Building golden datasets for agents
  4. The label leakage problem
  5. Eval harness architecture
  6. How Claude Code fits into the eval workflow
  7. Tracing agent trajectories and decision points
  8. Determinism traps
  9. Statistical validity for small eval sets
  10. FS example: Evaluating adverse media screening without label leakage
  11. Summary
  12. Exercises

Chapter 24. LLM-as-judge, done seriously

  1. Learning objectives
  2. What a judge model is
  3. Writing judge prompts that actually discriminate
  4. Calibrating judges against human ratings
  5. Judge drift and how to detect it
  6. Multi-judge panels
  7. The circularity question
  8. The cost of judging
  9. Skills 2.0 judges
  10. FS example: Calibrating a judge on inquiry triage
  11. Summary
  12. Exercises

Chapter 25. Observability: traces, spans, and the agent timeline

  1. Learning objectives
  2. What to instrument
  3. The trace model for agents: turns, tool calls, subagents
  4. The observability stack and where Claude Code fits
  5. OpenTelemetry integration
  6. MLflow for experiment tracking
  7. Opik for agent-specific traces
  8. Datadog for production monitoring
  9. Building dashboards that matter
  10. Agent cards and KPI-driven responses
  11. Alert design for agents
  12. FS example: Instrumentation for a KYC agent
  13. Summary
  14. Exercises

Chapter 26. Failure modes and reliability engineering

  1. Learning objectives
  2. The failure taxonomy
  3. Detecting failure modes
  4. SLOs for agents
  5. Error budgets
  6. Rollback strategies for agents
  7. Where the SRE playbook breaks for agents
  8. Circuit breakers and fallback patterns
  9. Incident response for agent failures
  10. FS example: Sycophantic convergence in a screening agent
  11. Summary
  12. Exercises

Chapter 27. Cost engineering

  1. Learning objectives
  2. 27.1 The four cost types
  3. 27.2 Token cost mechanics and optimisation levers
  4. 27.3 Time cost: why wall-clock matters more than tokens
  5. 27.4 Review cost: the hidden tax on every agent output
  6. 27.5 Context-switch cost: what it costs the human to re-engage
  7. 27.6 Building a cost model
  8. 27.7 Cost monitoring and alerts
  9. 27.8 The cost-quality tradeoff curve
  10. 27.9 Worked example: cost model for a screening workload
  11. Summary
  12. Exercises
  13. Notes

Chapter 28. Team workflows and the agent factory

  1. Learning objectives
  2. 28.1 The pair: one developer, one agent
  3. 28.2 The pool: shared agents, team governance
  4. 28.3 The platform: agents as infrastructure
  5. 28.4 When each pattern fits
  6. 28.5 The agent factory: provisioning, monitoring, and lifecycle
  7. 28.6 The golden path for new agent workloads
  8. 28.7 Responsibilities: the platform team vs the developer
  9. 28.8 Worked example: scaling from pair to platform
  10. Summary
  11. Exercises
  12. Notes

Chapter 29. Anti-patterns from the field, and how to migrate legacy automation without regret

  1. Learning objectives
  2. 29.1 Anti-pattern 1: The decision loop
  3. 29.2 Anti-pattern 2: The inference loop
  4. 29.3 Anti-pattern 3: Prompt brittleness
  5. 29.4 Anti-pattern 4: The tool trap
  6. 29.5 Anti-pattern 5: The model cult
  7. 29.6 Anti-pattern 6: Context overload
  8. 29.7 Anti-pattern 7: The approval bottleneck
  9. 29.8 Anti-pattern 8: Runaway cost
  10. 29.9 The judgment test: deciding what should be an agent
  11. 29.10 Migrating legacy automation without regret
  12. 29.11 Worked example: the compliance team that regretted converting their entire pipeline
  13. 29.12 Things that should never be agents
  14. Summary
  15. Exercises
  16. Notes

Chapter 30. The road ahead, with minimum hype

  1. Learning objectives
  2. 30.1 What has landed and what is coming
  3. 30.2 What is plausible but unconfirmed
  4. 30.3 What is not happening in 2026
  5. 30.4 Three things to build now that will pay off regardless
  6. 30.5 Three bets to avoid
  7. 30.6 An architect’s preparation checklist
  8. 30.7 The next conversations
  9. Summary
  10. An architect’s preparation checklist
  11. Exercises
  12. Notes

Appendix A. The Ninety-Day Production-Readiness Checklist

  1. Phase 1: Foundations (Days 1-30)
  2. Phase 2: Governance (Days 31-60)
  3. Phase 3: Evals and Operationalisation (Days 61-90)
  4. Metrics to track

Appendix B. The Eval Starter Kit

  1. Canonical Evals (All Agents)
  2. Financial Services Evals
  3. How to Run These Evals
  4. Recording Results

Appendix C. managed-settings.json Reference with Annotations

  1. Field-by-Field Explanation
  2. How to Use This Config

Appendix D. MCP Server Audit Template

  1. MCP Server Audit Checklist
  2. Scoring
  3. After Audit: Ongoing Monitoring

Appendix E. Glossary

  1. Patterns and Anti-Patterns
  2. Governance and Compliance
  3. Common Abbreviations
  4. When This Glossary Gets Out of Sync
