Name: Claude Code: Building Production Agents That Actually Scale
Brand: Leanpub
Price: 14.97 USD
Availability: InStock

A best-selling book on Leanpub for weeks since its release. An essential guide to building production AI agents with Claude.

This book is for engineers who need Claude Code agents that run in real systems with tools, permissions, MCP servers, evals, observability, and cost controls.

Most Claude Code tutorials stop at "hello world." This book covers what happens after that: when your agent needs to run reliably in production, at scale, in environments where failure has real consequences.

Written by an AI engineer who builds Claude Code agent systems for regulated financial institutions, it walks through the full production stack.

Part I covers the agent loop, context management, and model selection.

Part II builds the primitive layer: tools, hooks, skills, MCP servers, and plugins.

Part III bridges the CLI to the Claude Agent SDK, covering the dispatch loop, session management, and tool registration for headless deployment.

Part IV tackles governance: permissions, sandboxing, secrets, audit trails, and managed settings.

Part V covers evals, LLM-as-judge patterns, observability, cost engineering, and failure modes.

Part VI puts it together with team workflows, deployment patterns, multi-agent orchestration, and a full walkthrough of Anthropic's open-source financial services reference agents (the production-grade agent templates Anthropic released in May 2026).

Thirty-one chapters, five practical appendices (including a ninety-day production-readiness checklist, an eval starter kit, and an MCP server audit template), and code extracted from production systems.

If you are a senior AI engineer, technical lead, or architect evaluating Claude Code for production use, this is the reference that will save you months of trial and error.

Preface

Why This Book
Who This Book Is For
How to Read This Book
Acknowledgements

Chapter 1. The agent loop, seriously

Learning objectives
1.1 The minimal loop
1.2 What Claude Code adds on top
1.3 Where state actually lives
1.4 Worked example: an adverse media screening agent
1.5 The ten-minute whiteboard version
1.6 A note on vocabulary
Summary
Exercises
Notes

Chapter 2. What Claude Code is in April 2026

2.1 The CLI as ground truth
2.2 IDE surfaces: VS Code, JetBrains, and the new wave
2.3 CI surfaces: GitHub Actions
2.4 Chat surfaces: Slack
2.5 Other surfaces and what this book does not cover
2.6 How versions propagate
2.7 What you need to know before committing to a surface
Summary
Exercises
Notes

Chapter 3. The model family and why it matters for the loop

3.1 The model family
3.2 Context windows and how they fail
3.3 Prompt caching and why it changes the game
3.4 Cost shape, not cost per token
3.5 The model selection matrix
3.6 Effort controls: a new dimension in model selection
3.7 Performance and the SWE-Bench baseline
Summary
Exercises
Notes

Chapter 4. Context as a first-class resource

4.1 The anatomy of a turn
4.2 CLAUDE.md as the system of truth
4.3 Auto-memory and when it lies
4.4 The session transcript and context accumulation
4.4a Configuring auto-memory and compaction
4.4b Context that lives outside the window
4.5 Context budgeting worksheet
4.6 Worked example: A compliance monitoring agent
Summary
Exercises
Notes

Chapter 5. The production definition

5.1 Five properties of a production agent
5.2 Observable: Audit trails and reasoning
5.3 Reversible: Undo and correction
5.4 Evaluable: Metrics and measurement
5.5 Governable: Change control and accountability
5.6 The readiness rubric
5.7 Worked example: The adverse media screening agent
5.8 What this book will and will not help with
Summary
Exercises
Notes

Part II: The Primitive Stack

Chapter 6. Tools as an API surface

Learning objectives
6.1 What a tool actually is
6.2 The three categories of tools
6.3 Error handling and what the model learns on failure
6.4 Versioning tools without breaking agents
6.5 Rate limiting and cost attribution
6.6 Worked example: the screening toolkit
6.7 Design principles that compound
6.8 Tool schemas as governance
Summary
Exercises
Notes

Chapter 7. Subagents and the delegation model

Learning objectives
7.1 What a subagent is and is not
7.2 The parent-child contract
7.3 When to spawn vs. when to inline
7.4 Context isolation and why it matters
7.5 The fan-out pattern
7.6 Dynamic workflows: Opus 4.8 and the orchestrator model
7.7 Subagent configuration
7.8 Common misuses
7.9 When to use what: a decision guide across the primitive stack
7.10 Worked example: a compliance inquiry agent with subagents
Summary
Exercises
Notes

Chapter 8. Hooks, the only supervisor you have

Learning objectives
8.1 What a hook is and what it is not
8.2 Hook matching patterns
8.3 Writing a PreToolUse hook
8.4 Writing a PostToolUse hook
8.5 The four-eyes hook pattern
8.6 Where hooks live
8.7 Hooks as a debugging tool
8.8 Worked example: a four-eyes compliance screening agent
Summary
Exercises
Notes

Chapter 9. Slash commands and skills

Learning objectives
Learning objectives (continued)
9.1 What a skill is
9.2 The skill file anatomy
9.3 Writing instructions that agents can follow
9.4 Auto-invoke versus manual invoke
9.5 Skill versioning and audit trails
9.6 Worked example: a screening skill
9.7 Skills, tools, and hooks
Summary
Exercises
Notes

Chapter 10. MCP and the integration plane

Learning objectives
10.1 What MCP actually is
10.2 How MCP works
10.3 Transport mechanisms: stdio and HTTP
10.4 Writing an MCP server
10.5 MCP security: authentication, rate limiting, and PII
10.6 When MCP is the right choice
10.7 Worked example: an MCP server for adverse media
Summary
Exercises
Notes

Chapter 11. Plugins and Marketplaces

Learning objectives
11.1 What a plugin contains
11.2 Installation and lifecycle
11.3 Trust and provenance
11.4 The marketplace model
11.5 Building a plugin for your team
11.6 The FS compliance plugin: a worked example
11.7 Summary
Exercises
Footnotes

Chapter 12. Putting the stack together: a compliance monitoring agent end to end

Learning objectives
12.1 The workload in detail
12.2 The agent architecture
12.3 Session walk-through: screening a customer
12.4 The governance stack in action
12.5 Multi-entity orchestration
12.6 Cost attribution and budget management
12.7 Audit trail example
12.8 From custom build to packaged agent
12.9 What this chapter does not cover (and where to find it)
Summary
Exercises
Notes

Chapter 13. When to leave the CLI, and how to embed it when you do

Learning objectives
13.1 The five good reasons
13.2 The fifteen false positives
13.2.2 Performance and latency (two variations)
13.2.3 Integration (four variations)
13.2.4 Governance (two variations)
13.2.5 Multi-tenancy, queuing, and workflows (two final ones)
13.3 The subprocess pattern: CLI from the SDK
13.4 Decision flowchart as prose
13.5 Worked example: a screening inquiry handler with audit
13.6 Enterprise deployment: from CLI to Kubernetes
13.7 Managed Agents: Anthropic hosts the loop
Summary
Exercises
Footnotes

Chapter 14. The SDK in one sitting

Learning objectives
14.1 The query function
14.2 Options and configuration
14.3 Message types
14.4 Tool registration and schemas
14.5 The minimal working agent in fifty lines (Python)
14.6 The same agent in TypeScript
14.7 What the SDK does not do
Summary
Exercises
Footnotes

Chapter 15. Tools, hooks, and subagents from the SDK side

Learning objectives
15.1 Registering tools with schemas
15.2 PreToolUse and PostToolUse semantics
15.3 Hook matching in the SDK
15.4 Spawning subagents
15.5 Worked example: compliance inquiry handler
Summary
Exercises
Footnotes

Chapter 16. Sessions, state, and durability

Learning objectives
16.1 The session model
16.2 Storage options and tradeoffs
16.3 Durability patterns
16.4 Session replay for debugging
16.5 Worked example: inquiry handling across analyst shifts
16.6 Managed Agents and the session event log
Summary
Exercises
Footnotes

Chapter 17. The permissions model, read it twice

Learning objectives
The three permission modes
Per-tool permissions
Permissions at scale: containerized deployments
File path restrictions
The permission escalation problem
Default-deny as the only sane starting point
Permission auditing
Worked Financial Services example: the sanctions list incident
Summary
Exercises
Footnotes

Chapter 18. Sandboxing and the blast radius question

Learning objectives
OS-level sandboxing: filesystem, process, and network boundaries
Container-based isolation
The blast radius framework
Sandbox configuration across environments
Reducing blast radius: practical design patterns
Measuring blast radius
Worked Financial Services example: blast radius analysis for a compliance agent with customer PII access
Summary
Exercises
Footnotes

Chapter 19. Network egress, secrets, and data boundaries

Learning objectives
Network allowlisting
Secret injection patterns: where and how to manage API keys, tokens, and credentials
Environment variable hygiene
Data classification and the agent
PII in context windows
The screening workload stress test
Worked Financial Services example: a screening agent that logs PII to an observability endpoint
Summary
Exercises
Footnotes

Chapter 20. Policy as code: managed settings, audit, and lineage

Learning objectives
The managed-settings.json anatomy
Deploying settings at scale
Versioning policy: the policy repo
Audit logging: what to capture, where to send it
Lineage: tracing a decision back through the agent’s tool calls
The forensic question: reconstructing what the agent did
Worked Financial Services example: responding to a compliance audit with lineage
Summary
Exercises
Footnotes

Chapter 21. The supply chain: plugins, skills, MCP servers

Learning objectives
The trust problem
Provenance and signing
Version pinning
The audit before you install
Dependency sprawl
Building an internal registry
The vendor plugin problem in regulated industries
Worked Financial Services example: the sanctions plugin that was phoning home
Summary
Exercises
Footnotes

Chapter 22. The eval mindset

Learning objectives
Why evals are non-negotiable
The three eval types
What to eval in an agent system versus a completion system
The eval-first development loop
Building eval into CI
Continuous evals and the AgentOps connection
When evals fail: kill switches, canary rollouts, and automated responses
The eval debt trap
FS example: Screening agents and outcome drift
Summary
Exercises

Chapter 23. Designing evals for agentic workloads

Learning objectives
Trajectory versus answer grading
Building golden datasets for agents
The label leakage problem
Eval harness architecture
How Claude Code fits into the eval workflow
Tracing agent trajectories and decision points
Determinism traps
Statistical validity for small eval sets
FS example: Evaluating adverse media screening without label leakage
Summary
Exercises

Chapter 24. LLM-as-judge, done seriously

Learning objectives
What a judge model is
Writing judge prompts that actually discriminate
Calibrating judges against human ratings
Judge drift and how to detect it
Multi-judge panels
The circularity question
The cost of judging
Skills 2.0 judges
FS example: Calibrating a judge on inquiry triage
Summary
Exercises

Chapter 25. Observability: traces, spans, and the agent timeline

Learning objectives
What to instrument
The trace model for agents: turns, tool calls, subagents
The observability stack and where Claude Code fits
OpenTelemetry integration
MLflow for experiment tracking
Opik for agent-specific traces
Datadog for production monitoring
Building dashboards that matter
Agent cards and KPI-driven responses
Alert design for agents
FS example: Instrumentation for a KYC agent
Summary
Exercises

Chapter 26. Failure modes and reliability engineering

Learning objectives
The failure taxonomy
Detecting failure modes
SLOs for agents
Error budgets
Rollback strategies for agents
Where the SRE playbook breaks for agents
Circuit breakers and fallback patterns
Incident response for agent failures
FS example: Sycophanticity in a screening agent
Cross-agent injection: a real-world defence
Auto mode and the permission fan-out problem
Summary
Exercises

Chapter 27. Cost engineering

Learning objectives
27.1 The four cost types
27.2 Token cost mechanics and optimisation levers
27.3 Time cost: why wall-clock matters more than tokens
27.4 Review cost: the hidden tax on every agent output
27.5 Context-switch cost: what it costs the human to re-engage
27.6 Infrastructure cost: the line item everyone forgets
27.7 Building a cost model
27.8 Cost monitoring and alerts
27.9 The cost-quality tradeoff curve
27.10 Worked example: cost model for a screening workload
Summary
Exercises
Notes

Chapter 28. Team workflows and the agent factory

Learning objectives
28.1 The pair: one developer, one agent
28.2 The pool: shared agents, team governance
28.3 The platform: agents as infrastructure
28.4 When each pattern fits
28.5 The agent factory: provisioning, monitoring, and lifecycle
28.6 Scaling architecture: brains, hands, and burst patterns
28.7 The golden path for new agent workloads
28.8 Responsibilities: the platform team vs the developer
28.9 Worked example: scaling from pair to platform
Summary
Exercises
Notes

Chapter 29. Anti-patterns from the field, and how to migrate legacy automation without regret

Learning objectives
29.1 Anti-pattern 1: The decision loop
29.2 Anti-pattern 2: The inference loop
29.3 Anti-pattern 3: Prompt brittleness
29.4 Anti-pattern 4: The tool trap
29.5 Anti-pattern 5: The model cult
29.6 Anti-pattern 6: Context overload
29.7 Anti-pattern 7: The approval bottleneck
29.8 Anti-pattern 8: Runaway cost
29.9 The judgment test: deciding what should be an agent
29.10 Migrating legacy automation without regret
29.11 Worked example: the compliance team that regretted converting their entire pipeline
29.12 Things that should never be agents
29.13 Community plugins and supply chain risk
Summary
Exercises
Notes

Chapter 30. The road ahead, with minimum hype

Learning objectives
30.1 What has landed and what is coming
30.2 What is plausible but unconfirmed
30.3 What is not happening in 2026
30.4 Three things to build now that will pay off regardless
30.5 Three bets to avoid
30.6 An architect’s preparation checklist
30.7 The next conversations
Summary
An architect’s preparation checklist
Exercises
Notes

Chapter 31. Anthropic’s financial services reference agents

Learning objectives
31.1 What is in the repository
31.2 The dual-deployment model
31.3 Skills: authored once, propagated everywhere
31.4 MCP connectors: centralised data access
31.5 Cross-agent orchestration
31.6 Partner integrations
31.7 What this means for your team
31.8 Build versus configure
Summary
Exercises
Notes

Appendix A. The Ninety-Day Production-Readiness Checklist

Phase 1: Foundations (Days 1-30)
Phase 2: Governance (Days 31-60)
Phase 3: Evals and Operationalisation (Days 61-90)
Metrics to track

Appendix B. The Eval Starter Kit

Canonical Evals (All Agents)
Financial Services Evals
How to Run These Evals
Recording Results

Appendix C. managed-settings.json Reference with Annotations

Field-by-Field Explanation
How to Use This Config

Appendix D. MCP Server Audit Template

MCP Server Audit Checklist
Scoring
After Audit: Ongoing Monitoring

Appendix E. Glossary

Patterns and Anti-Patterns
Governance and Compliance
Common Abbreviations
When This Glossary Gets Out of Sync

Bundle

Enterprise AI Agents in Production: Build and Secure Them

$23.99
