Copyright
About the Author
Introduction
- I Started This on a Train
- What Changed My Mind
- Who This Book Is For
- What You Won’t Find Here
- How This Book Is Organised
- A Note on Version Currency
- How to Use This Book
- Grounding
Chapter 1. What Is Codex CLI?
- Learning Objectives
- Before Codex: A Brief History
- The Problem with Autocomplete
- The Agentic Landscape
- Defining Codex CLI
- Why the Terminal Matters
- The Core Proposition
- Summary
- Exercises
Chapter 2. Getting Started with Codex CLI
- Learning Objectives
- What a First Session Looks Like
- The Bigger Picture: Four Surfaces, One Agent
- Prerequisites
- Installation
- Authentication
- Your First Session
- Three Week-One Mistakes
- codex vs. codex exec
- Where Configuration Lives
- AGENTS.md in one minute
- Where this leaves you
- Summary
- Exercises
Chapter 3. Prompting Codex CLI Effectively
- Learning Objectives
- Why Codex CLI Prompting Is Different
- The Anatomy of an Effective Prompt
- Reasoning Effort
- Iterative Prompting and Mid-Session Corrections
- Prompt Patterns for Common Tasks
- Documentation Generation
- Moving Durable Context Out of Prompts
- Output Control: Shaping How the Agent Communicates
- Summary
- Exercises
Chapter 4. AGENTS.md: Patterns and Pitfalls
- Learning Objectives
- What AGENTS.md Is and How Codex CLI Reads It
- Essential Sections: Commits, Testing, Style
- AGENTS.md for Monorepos and Multi-Service Repos
- Common Mistakes
- The File Map Pattern
- Production Patterns: The openai/codex Case Study
- Beyond AGENTS.md: Extending the Pattern
- Where this leaves you
- Summary
- Exercises
- Guardrails
Chapter 5. Approval Modes and Trust Boundaries
- Learning Objectives
- The Trust Model: What Codex CLI Can Touch
- The Three Approval Modes
- Sandboxing: Filesystem and Network Restrictions
- Kernel-Level Sandboxing
- Named Profiles
- The Permission Ladder
- Where this leaves you
- Summary
- Exercises
Chapter 6. Model Selection and Reasoning Effort
- Learning Objectives
- Model Roles, Not Model Names
- Reasoning Effort
- Named Profiles
- The Model Routing Decision Framework
- Multi-Model Workflows
- Cloud and Local Model Providers
- Multi-Provider Resilience
- Monitoring Model Deprecations
- Where this leaves you
- Summary
- Exercises
Chapter 7. Context Window Management
- Learning Objectives
- The Quadratic Growth Problem
- Thread Resume and Fork: Preserving Context Without Restarting
- What Consumes Context (and What Doesn’t)
- The /compact Command and Automatic Summarisation
- Sub-Agent Delegation as Context Management
- Strategies for Large Codebases
- Monitoring Context Usage
- How Compaction Actually Works
- Plan Mode and Fresh-Context Implementation
- Prompt Caching: Economics of Long Sessions
- Persistent Context: The Two-Phase Memory Pipeline
- MCP Memory Servers: Persistent Cross-Session Context
- Team Memory: The Gap Beyond Individual Recall
- Context Failure Modes: A Taxonomy
- Finding Past Sessions
- Where this leaves you
- Summary
- Exercises
Chapter 8. MCP: Consuming and Serving
- Learning Objectives
- MCP in 60 Seconds
- What MCP Is (and What It Is Not)
- The Architecture: Hosts, Clients and Servers
- Tool Annotations: Risk Vocabulary
- Connecting to Common Servers
- The Context Cost of MCP
- Enterprise MCP: Authentication, Scoping and Restrictions
- Building a Simple MCP Server
- Codex CLI on Both Sides of MCP
- Running MCP in Production
- Where this leaves you
- Summary
- Exercises
Chapter 9. Hooks: Intercepting the Agent Lifecycle
- Learning Objectives
- The Hook System: Overview and Events
- Lifecycle Events
- Writing Robust Hooks
- Patterns: Enforcement, Audit, and Notification
- Tooling
- Summary
- Exercises
- Scaling Up
Chapter 10. The Skills Ecosystem: Using and Writing Skills
- Learning Objectives
- The Consumer’s View: Using and Browsing the Ecosystem
- The Producer’s View: Writing Your Own Skills
- Plugins: Packaging and Distribution
- Case studies
- Summary
- Exercises
Chapter 11. Sub-Agents and Parallel Execution
- Learning Objectives
- The Sub-Agent Model
- Task Decomposition
- CSV Fan-Out
- Scaling Beyond One Session
- Operational Considerations
- Summary
- Exercises
Chapter 12. Multi-Agent Orchestration Patterns
- Learning Objectives
- Composition, Not New Primitives
- Starting Point: the Ralph Loop
- Pattern 1: Sequential Gated Chain
- Pattern 2: Wave-Based Hybrid
- Pattern 3: Cross-Model Review Loop
- Pattern 4: Iterative Repair Loop
- Choosing Among the Four Patterns
- Conversation Branching: Supervisor Patterns Without a Harness
- /goal: The Persistent Objective
- External Orchestration: codex remote-control
- Debugging Orchestration Failures
- Anti-patterns
- Summary
- Exercises
Chapter 13. Worktrees and Isolated Execution
- Learning Objectives
- Git Worktrees: A Brief Recap
- Why Worktrees Matter for Agentic Workflows
- One Agent, One Worktree: The Isolation Principle
- CLI Worktree Workflows
- Merging Agent Work Back to Main
- Worktree Lifecycle in CI
- Worktree Workflow Patterns by Team Size
- Summary
- Exercises
Chapter 14. Cost Management and Quota Strategy
- Learning Objectives
- How Codex CLI Quota Works
- Estimating Team Costs
- Configuring Cost Controls
- Monitoring and Alerting with Hooks
- Cost-Quality Decision Matrix
- Per-Reasoning-Effort Token Consumption
- Prompt Caching: the Largest Single Cost Lever
- Context-Usage Visibility from Plan Mode
- Goal Mode and Long-Horizon Cost
- Alternative Cloud Billing Paths
- Token Compression with a Proxy
- On-Premises Inference: GB10 Break-Even
- Summary
- Exercises
Chapter 15. CI/CD Integration
- Learning Objectives
- Running Codex CLI Headless
- Deterministic, Hermetic Runs
- The openai/codex-action GitHub Action
- GitLab CI/CD Integration
- From Automation to Autonomy: Self-Healing Pipelines
- Pipeline Observability: Token Usage, Steering Metadata, and Thread Context
- CI/CD Session Reliability: WebSocket Keepalive and Remote Connections
- Build System Integration: Bazel and Dagger
- Empirical Evidence: What 33,000 Agentic PRs Reveal About Pipeline Design
- Microsoft’s Agentic DevOps Playbook
- CI/CD Pattern Catalogue
- Summary
- Exercises
- Production
Chapter 16. Security Hardening
- Learning Objectives
- The Threat Model for Agentic Systems
- Prompt Injection: Attack Patterns and Defences
- Filesystem Restrictions and Sandboxing
- Automated Review and a Trust Ladder for Adoption
- CI Isolation
- Network Allowlisting
- Secret Management
- Audit and Compliance
- Security Checklist
- Summary
- Exercises
Chapter 17. Enterprise Deployment
- Learning Objectives
- What Enterprise Deployment Is Actually About
- Distributing Configuration at Scale
- Managed Policies: requirements.toml
- RBAC and Access Control
- AGENTS.override.md: Enforcing Policy Across Teams
- Onboarding an Engineering Team
- Measuring ROI
- Multi-Cloud Provider Strategy
- Cloud vs Self-Hosted
- Governance Frameworks
- How Enterprise Rollouts Actually Fail
- Rollout Checklist
- Summary
- Exercises
Chapter 18. Debugging and Diagnosing Agent Failures
- Learning Objectives
- Reading the Session Transcript
- Using Approval Mode as a Diagnostic
- Diagnosing AGENTS.md Failures
- Testing Commands Under the Sandbox
- Context Overflow Symptoms
- Recovering a Runaway Session
- Structured Logging with RUST_LOG
- Model Catalog Introspection with codex debug models
- Detecting Specification Drift with SLUMP
- A Diagnostic Workflow Checklist
- The Three-Level Observability Stack
- Summary
- Exercises
Chapter 19. Testing and Evaluation Strategy for Agentic Workflows
- Learning Objectives
- What Makes a Test Suite Agent-Friendly
- Designing for Agent Execution
- The Feedback Signal Problem
- Using Codex CLI to audit your test suite
- Evaluation Beyond Unit Tests
- Building an Evaluation Harness
- TDD as an Agent Feedback Loop
- The 4-File Durable Memory Pattern for Long-Horizon Evaluation
- Agent Test Quality: The Over-Mocking Problem and Review Benchmarks
- Evaluation Frameworks: CocoaBench, HiL-Bench, and AAR
- Testing MCP Servers
- Summary
- Exercises
- Practitioner Guides
Chapter 20. AI Code Review
- Learning Objectives
- Why AI Code Review Works (and Where It Doesn’t)
- Configuring Codex for Code Review
- The /review Command
- PR Integration: Automated Review on Every PR
- Structured Output Code Review with codex exec --output-schema
- Writing Review Checklists in AGENTS.md
- Human-AI Review Collaboration Patterns
- The Review-Fix Loop: A Three-Level Maturity Model
- Summary
- Exercises
Chapter 21. Codebase Migration and Modernisation
- Learning Objectives
- Why Codex CLI Excels at Migration Work
- Understanding the Legacy System First
- Planning the Migration with Codex CLI
- Incremental Migration Patterns
- Validation Strategies During Migration
- Framework and Language Version Migrations
- Decomposing a Monolith
- Beyond the Tests Passing: Verifying a Large Rewrite
- Summary
- Exercises
Chapter 22. Backend Engineering
- Learning Objectives
- The Service We Are Building
- Why Python Is Hard for Agents
- Python-Specific AGENTS.md
- Pytest Integration and Test Generation
- Type Hints and Docstring Automation
- uv, ruff, and Modern Python Toolchain
- Service and Async Conventions
- Summary
- Exercises
Chapter 23. Frontend Engineering
- Learning Objectives
- Frontend-Specific AGENTS.md Configuration
- Create the Frontend Project
- Run the Backend and Define the Contract
- Component Generation and Scaffolding
- Building the Board
- Test Generation for React Components
- The Explorer/Worker Sub-Agent Pattern
- Accessibility Audit Automation
- Design-to-Code Workflows
- Codex Browser Use: Visual Verification
- Summary
- Exercises
Chapter 24. Infrastructure as Code
- Learning Objectives
- Why Infrastructure Code Resists Agents
- AGENTS.md for Infrastructure Repositories
- Closing the Loop with Deterministic Tools
- Terraform Workflows
- Deploying to Cloud Run
- Hardening Generation with the Terraform MCP Server
- GitOps: Making the Apply Boundary Structural
- Safety Guardrails
- Day-Two Operations: Drift Detection and Runbooks
- Other Infrastructure Tools
- Document the Stack: A Project README
- Summary
- Exercises
- The Bigger Picture
Chapter 25. Benchmarks and Real-World Performance
- Learning Objectives
- The Benchmark Landscape
- SWE-bench: Gold Standard to Cautionary Tale
- Terminal-Bench 2.0: CLI-First Benchmarking
- The Scaffolding Effect
- What the Numbers Actually Mean for Your Team
- Running Your Own Benchmarks
- April 2026 Benchmarks: CocoaBench, HiL-Bench, and AAR
- SlopCodeBench: Measuring Quality, Not Just Completion
- The Benchmark Hierarchy
- The Other Side of Benchmarks: Agent Code Quality in Production
- Summary
- Exercises
Chapter 26. Competing Tools and When to Use Each
- Learning Objectives
- The Decision Framework
- CLI Agents
- IDE Agents
- The Convergence Layer
- Migrating from Claude Code to Codex CLI
- Cross-Agent Migration Tooling
- The Multi-Tool Pattern
- The Sandbox Landscape
- Summary
- Exercises
Chapter 27. Harness Engineering for Long-Running Agents
- Learning Objectives
- What Harness Engineering Is
- The WORKFLOW.md Pattern
- State Persistence Across Agent Runs
- Error Recovery and Resumption
- The Proof-of-Work Principle
- Harness Components You Can Add Today
- Harnesses in Practice
- The Evidence: Why the Harness Beats the Model
- The Bigger Picture
- Summary
- Exercises
Chapter 28. The Agentic Engineering Pod
- Learning Objectives
- Why Three Roles?
- The Context Architect: Human Role
- The Value Engineer: Human Role
- The Quality Engineer: Human Role
- The Pod in Practice: A Feature Lifecycle
- Failure Modes and Pod Anti-Patterns
- Distributing the Pod with Plugins
- Closing
- Summary
- Exercises
Conclusion: The Agentic Engineer
A Note from the Author, Or Rather, the Tool
- How This Book Was Made
- What This Means for You
- The Book as Artefact
- An Invitation
- A Face for the Voice
Bibliography
- OpenAI Documentation and Releases
- Academic Papers and Research
- Industry Articles and Commentary
- Standards, Specifications, and Foundations
- Tools and Open-Source Projects
- Author’s Published Articles
- Other Sources