Codex CLI

Agentic Engineering from First Principles

This book is 100% complete. Last updated on 2026-06-21.

The definitive guide to agentic software engineering with Codex CLI, from prompting and AGENTS.md fundamentals to multi-agent orchestration, CI/CD integration, security hardening, and enterprise deployment across 28 hands-on chapters.

Minimum price: $19.99

Suggested price: $29.99

Also available for 1 book credit with a Reader Membership

Formats: PDF, EPUB, Web, App

About the Book

Codex CLI: Agentic Engineering from First Principles is the most comprehensive guide to agentic software engineering with OpenAI's command-line coding agent. Across 28 chapters in six parts, you'll move from first principles to production workflows and team-scale practice: prompting and AGENTS.md configuration, approval modes and kernel-level sandboxing, model selection, context and cost management, MCP servers, hooks, skills, sub-agents and orchestration, worktrees, CI/CD integration, security hardening, and enterprise deployment. Later chapters cover debugging and testing agentic workflows, AI code review, practical engineering guides (codebase migration, backend, frontend, and infrastructure as code), and the bigger picture: benchmarks, competing tools, harness engineering, and how to structure an agentic engineering team.

Whether you're a solo developer looking to multiply your output or an engineering lead rolling out agentic workflows across a team, this book gives you the mental models and practical techniques to work effectively with AI coding agents.

Written by Daniel Vaughan and drawing on real-world experience and community insights, the book gives every chapter learning objectives, worked examples, and hands-on exercises.

About the Author

Daniel Vaughan

Daniel Vaughan is a technology leader and software architect based in the United Kingdom, specialising in agentic AI. He has spent nearly thirty years across enterprise, startup, and academic settings, with a career-long focus on engineering quality and developer productivity. His work now centres on the point at which AI stops being an experiment and becomes core to how organisations build software.

Daniel is Head of Forward Deployed Engineering at HCLTech AI Labs, where he leads a global practice embedding engineers and coding agents inside complex enterprise environments to take production AI from prototype to live systems. He built the practice from the ground up, running lean regional pods in which a shared architect, a small number of engineers, and a team of agents working through tools such as Codex, Claude Code, Cursor, and GitHub Copilot deliver the output of a much larger team.

Before HCLTech AI Labs, Daniel was director of software engineering at Mastercard in London, leading cloud strategy and architecture for real-time payment products in a highly regulated financial services environment. Earlier, he spent eight years at the European Bioinformatics Institute in Cambridge, moving from software engineer into engineering leadership, working on the same problems of software quality and developer productivity that he now solves with agentic tooling.

He is the author of Cloud Native Development with Google Cloud (O'Reilly, 2024) and Ext GWT 2.0: Beginner's Guide (Packt, 2010), a Google Developer Expert, and a Green Software Champion. He writes about agentic engineering and AI-assisted development at blog.danielvaughan.com.

The Leanpub Podcast

Episode 329

An Interview with Daniel Vaughan

Table of Contents

Copyright

About the Author

Introduction

  1. I Started This on a Train
  2. What Changed My Mind
  3. Who This Book Is For
  4. What You Won’t Find Here
  5. How This Book Is Organised
  6. A Note on Version Currency
  7. How to Use This Book

Part I. Grounding

Chapter 1. What Is Codex CLI?

  1. Learning Objectives
  2. Before Codex: A Brief History
  3. The Problem with Autocomplete
  4. The Agentic Landscape
  5. Defining Codex CLI
  6. Why the Terminal Matters
  7. The Core Proposition
  8. Summary
  9. Exercises

Chapter 2. Getting Started with Codex CLI

  1. Learning Objectives
  2. What a First Session Looks Like
  3. The Bigger Picture: Four Surfaces, One Agent
  4. Prerequisites
  5. Installation
  6. Authentication
  7. Your First Session
  8. Three Week-One Mistakes
  9. codex vs. codex exec
  10. Where Configuration Lives
  11. AGENTS.md in one minute
  12. Where this leaves you
  13. Summary
  14. Exercises

Chapter 3. Prompting Codex CLI Effectively

  1. Learning Objectives
  2. Why Codex CLI Prompting Is Different
  3. The Anatomy of an Effective Prompt
  4. Reasoning Effort
  5. Iterative Prompting and Mid-Session Corrections
  6. Prompt Patterns for Common Tasks
  7. Documentation Generation
  8. Moving Durable Context Out of Prompts
  9. Output Control: Shaping How the Agent Communicates
  10. Summary
  11. Exercises

Chapter 4. AGENTS.md: Patterns and Pitfalls

  1. Learning Objectives
  2. What AGENTS.md Is and How Codex CLI Reads It
  3. Essential Sections: Commits, Testing, Style
  4. AGENTS.md for Monorepos and Multi-Service Repos
  5. Common Mistakes
  6. The File Map Pattern
  7. Production Patterns: The openai/codex Case Study
  8. Beyond AGENTS.md: Extending the Pattern
  9. Where this leaves you
  10. Summary
  11. Exercises

Part II. Guardrails

Chapter 5. Approval Modes and Trust Boundaries

  1. Learning Objectives
  2. The Trust Model: What Codex CLI Can Touch
  3. The Three Approval Modes
  4. Sandboxing: Filesystem and Network Restrictions
  5. Kernel-Level Sandboxing
  6. Named Profiles
  7. The Permission Ladder
  8. Where this leaves you
  9. Summary
  10. Exercises

Chapter 6. Model Selection and Reasoning Effort

  1. Learning Objectives
  2. Model Roles, Not Model Names
  3. Reasoning Effort
  4. Named Profiles
  5. The Model Routing Decision Framework
  6. Multi-Model Workflows
  7. Cloud and Local Model Providers
  8. Multi-Provider Resilience
  9. Monitoring Model Deprecations
  10. Where this leaves you
  11. Summary
  12. Exercises

Chapter 7. Context Window Management

  1. Learning Objectives
  2. The Quadratic Growth Problem
  3. Thread Resume and Fork: Preserving Context Without Restarting
  4. What Consumes Context (and What Doesn’t)
  5. The /compact Command and Automatic Summarisation
  6. Sub-Agent Delegation as Context Management
  7. Strategies for Large Codebases
  8. Monitoring Context Usage
  9. How Compaction Actually Works
  10. Plan Mode and Fresh-Context Implementation
  11. Prompt Caching: Economics of Long Sessions
  12. Persistent Context: The Two-Phase Memory Pipeline
  13. MCP Memory Servers: Persistent Cross-Session Context
  14. Team Memory: The Gap Beyond Individual Recall
  15. Context Failure Modes: A Taxonomy
  16. Finding Past Sessions
  17. Where this leaves you
  18. Summary
  19. Exercises

Chapter 8. MCP: Consuming and Serving

  1. Learning Objectives
  2. MCP in 60 Seconds
  3. What MCP Is (and What It Is Not)
  4. The Architecture: Hosts, Clients and Servers
  5. Tool Annotations: Risk Vocabulary
  6. Connecting to Common Servers
  7. The Context Cost of MCP
  8. Enterprise MCP: Authentication, Scoping and Restrictions
  9. Building a Simple MCP Server
  10. Codex CLI on Both Sides of MCP
  11. Running MCP in Production
  12. Where this leaves you
  13. Summary
  14. Exercises

Chapter 9. Hooks: Intercepting the Agent Lifecycle

  1. Learning Objectives
  2. The Hook System: Overview and Events
  3. Lifecycle Events
  4. Writing Robust Hooks
  5. Patterns: Enforcement, Audit, and Notification
  6. Tooling
  7. Summary
  8. Exercises

Part III. Scaling Up

Chapter 10. The Skills Ecosystem: Using and Writing Skills

  1. Learning Objectives
  2. The Consumer’s View: Using and Browsing the Ecosystem
  3. The Producer’s View: Writing Your Own Skills
  4. Plugins: Packaging and Distribution
  5. Case studies
  6. Summary
  7. Exercises

Chapter 11. Sub-Agents and Parallel Execution

  1. Learning Objectives
  2. The Sub-Agent Model
  3. Task Decomposition
  4. CSV Fan-Out
  5. Scaling Beyond One Session
  6. Operational Considerations
  7. Summary
  8. Exercises

Chapter 12. Multi-Agent Orchestration Patterns

  1. Learning Objectives
  2. Composition, Not New Primitives
  3. Starting Point: the Ralph Loop
  4. Pattern 1: Sequential Gated Chain
  5. Pattern 2: Wave-Based Hybrid
  6. Pattern 3: Cross-Model Review Loop
  7. Pattern 4: Iterative Repair Loop
  8. Choosing Among the Four Patterns
  9. Conversation Branching: Supervisor Patterns Without a Harness
  10. /goal: The Persistent Objective
  11. External Orchestration: codex remote-control
  12. Debugging Orchestration Failures
  13. Anti-patterns
  14. Summary
  15. Exercises

Chapter 13. Worktrees and Isolated Execution

  1. Learning Objectives
  2. Git Worktrees: A Brief Recap
  3. Why Worktrees Matter for Agentic Workflows
  4. One Agent, One Worktree: The Isolation Principle
  5. CLI Worktree Workflows
  6. Merging Agent Work Back to Main
  7. Worktree Lifecycle in CI
  8. Worktree Workflow Patterns by Team Size
  9. Summary
  10. Exercises

Chapter 14. Cost Management and Quota Strategy

  1. Learning Objectives
  2. How Codex CLI Quota Works
  3. Estimating Team Costs
  4. Configuring Cost Controls
  5. Monitoring and Alerting with Hooks
  6. Cost-Quality Decision Matrix
  7. Per-Reasoning-Effort Token Consumption
  8. Prompt Caching: the Largest Single Cost Lever
  9. Context-Usage Visibility from Plan Mode
  10. Goal Mode and Long-Horizon Cost
  11. Alternative Cloud Billing Paths
  12. Token Compression with a Proxy
  13. On-Premises Inference: GB10 Break-Even
  14. Summary
  15. Exercises

Chapter 15. CI/CD Integration

  1. Learning Objectives
  2. Running Codex CLI Headless
  3. Deterministic, Hermetic Runs
  4. The openai/codex-action GitHub Action
  5. GitLab CI/CD Integration
  6. From Automation to Autonomy: Self-Healing Pipelines
  7. Pipeline Observability: Token Usage, Steering Metadata, and Thread Context
  8. CI/CD Session Reliability: WebSocket Keepalive and Remote Connections
  9. Build System Integration: Bazel and Dagger
  10. Empirical Evidence: What 33,000 Agentic PRs Reveal About Pipeline Design
  11. Microsoft’s Agentic DevOps Playbook
  12. CI/CD Pattern Catalogue
  13. Summary
  14. Exercises

Part IV. Production

Chapter 16. Security Hardening

  1. Learning Objectives
  2. The Threat Model for Agentic Systems
  3. Prompt Injection: Attack Patterns and Defences
  4. Filesystem Restrictions and Sandboxing
  5. Automated Review and a Trust Ladder for Adoption
  6. CI Isolation
  7. Network Allowlisting
  8. Secret Management
  9. Audit and Compliance
  10. Security Checklist
  11. Summary
  12. Exercises

Chapter 17. Enterprise Deployment

  1. Learning Objectives
  2. What Enterprise Deployment Is Actually About
  3. Distributing Configuration at Scale
  4. Managed Policies: requirements.toml
  5. RBAC and Access Control
  6. AGENTS.override.md: Enforcing Policy Across Teams
  7. Onboarding an Engineering Team
  8. Measuring ROI
  9. Multi-Cloud Provider Strategy
  10. Cloud vs Self-Hosted
  11. Governance Frameworks
  12. How Enterprise Rollouts Actually Fail
  13. Rollout Checklist
  14. Summary
  15. Exercises

Chapter 18. Debugging and Diagnosing Agent Failures

  1. Learning Objectives
  2. Reading the Session Transcript
  3. Using Approval Mode as a Diagnostic
  4. Diagnosing AGENTS.md Failures
  5. Testing Commands Under the Sandbox
  6. Context Overflow Symptoms
  7. Recovering a Runaway Session
  8. Structured Logging with RUST_LOG
  9. Model Catalog Introspection with codex debug models
  10. Detecting Specification Drift with SLUMP
  11. A Diagnostic Workflow Checklist
  12. The Three-Level Observability Stack
  13. Summary
  14. Exercises

Chapter 19. Testing and Evaluation Strategy for Agentic Workflows

  1. Learning Objectives
  2. What Makes a Test Suite Agent-Friendly
  3. Designing for Agent Execution
  4. The Feedback Signal Problem
  5. Using Codex CLI to audit your test suite
  6. Evaluation Beyond Unit Tests
  7. Building an Evaluation Harness
  8. TDD as an Agent Feedback Loop
  9. The 4-File Durable Memory Pattern for Long-Horizon Evaluation
  10. Agent Test Quality: The Over-Mocking Problem and Review Benchmarks
  11. Evaluation Frameworks: CocoaBench, HiL-Bench, and AAR
  12. Testing MCP Servers
  13. Summary
  14. Exercises

Part V. Practitioner Guides

Chapter 20. AI Code Review

  1. Learning Objectives
  2. Why AI Code Review Works (and Where It Doesn’t)
  3. Configuring Codex for Code Review
  4. The /review Command
  5. PR Integration: Automated Review on Every PR
  6. Structured Output Code Review with codex exec --output-schema
  7. Writing Review Checklists in AGENTS.md
  8. Human-AI Review Collaboration Patterns
  9. The Review-Fix Loop: A Three-Level Maturity Model
  10. Summary
  11. Exercises

Chapter 21. Codebase Migration and Modernisation

  1. Learning Objectives
  2. Why Codex CLI Excels at Migration Work
  3. Understanding the Legacy System First
  4. Planning the Migration with Codex CLI
  5. Incremental Migration Patterns
  6. Validation Strategies During Migration
  7. Framework and Language Version Migrations
  8. Decomposing a Monolith
  9. Beyond the Tests Passing: Verifying a Large Rewrite
  10. Summary
  11. Exercises

Chapter 22. Backend Engineering

  1. Learning Objectives
  2. The Service We Are Building
  3. Why Python Is Hard for Agents
  4. Python-Specific AGENTS.md
  5. Pytest Integration and Test Generation
  6. Type Hints and Docstring Automation
  7. uv, ruff, and Modern Python Toolchain
  8. Service and Async Conventions
  9. Summary
  10. Exercises

Chapter 23. Frontend Engineering

  1. Learning Objectives
  2. Frontend-Specific AGENTS.md Configuration
  3. Create the Frontend Project
  4. Run the Backend and Define the Contract
  5. Component Generation and Scaffolding
  6. Building the Board
  7. Test Generation for React Components
  8. The Explorer/Worker Sub-Agent Pattern
  9. Accessibility Audit Automation
  10. Design-to-Code Workflows
  11. Codex Browser Use: Visual Verification
  12. Summary
  13. Exercises

Chapter 24. Infrastructure as Code

  1. Learning Objectives
  2. Why Infrastructure Code Resists Agents
  3. AGENTS.md for Infrastructure Repositories
  4. Closing the Loop with Deterministic Tools
  5. Terraform Workflows
  6. Deploying to Cloud Run
  7. Hardening Generation with the Terraform MCP Server
  8. GitOps: Making the Apply Boundary Structural
  9. Safety Guardrails
  10. Day-Two Operations: Drift Detection and Runbooks
  11. Other Infrastructure Tools
  12. Document the Stack: A Project README
  13. Summary
  14. Exercises

Part VI. The Bigger Picture

Chapter 25. Benchmarks and Real-World Performance

  1. Learning Objectives
  2. The Benchmark Landscape
  3. SWE-bench: Gold Standard to Cautionary Tale
  4. Terminal-Bench 2.0: CLI-First Benchmarking
  5. The Scaffolding Effect
  6. What the Numbers Actually Mean for Your Team
  7. Running Your Own Benchmarks
  8. April 2026 Benchmarks: CocoaBench, HiL-Bench, and AAR
  9. SlopCodeBench: Measuring Quality, Not Just Completion
  10. The Benchmark Hierarchy
  11. The Other Side of Benchmarks: Agent Code Quality in Production
  12. Summary
  13. Exercises

Chapter 26. Competing Tools and When to Use Each

  1. Learning Objectives
  2. The Decision Framework
  3. CLI Agents
  4. IDE Agents
  5. The Convergence Layer
  6. Migrating from Claude Code to Codex CLI
  7. Cross-Agent Migration Tooling
  8. The Multi-Tool Pattern
  9. The Sandbox Landscape
  10. Summary
  11. Exercises

Chapter 27. Harness Engineering for Long-Running Agents

  1. Learning Objectives
  2. What Harness Engineering Is
  3. The WORKFLOW.md Pattern
  4. State Persistence Across Agent Runs
  5. Error Recovery and Resumption
  6. The Proof-of-Work Principle
  7. Harness Components You Can Add Today
  8. Harnesses in Practice
  9. The Evidence: Why the Harness Beats the Model
  10. The Bigger Picture
  11. Summary
  12. Exercises

Chapter 28. The Agentic Engineering Pod

  1. Learning Objectives
  2. Why Three Roles?
  3. The Context Architect: Human Role
  4. The Value Engineer: Human Role
  5. The Quality Engineer: Human Role
  6. The Pod in Practice: A Feature Lifecycle
  7. Failure Modes and Pod Anti-Patterns
  8. Distributing the Pod with Plugins
  9. Closing
  10. Summary
  11. Exercises

Conclusion: The Agentic Engineer

A Note from the Author, Or Rather, the Tool

  1. How This Book Was Made
  2. What This Means for You
  3. The Book as Artefact
  4. An Invitation
  5. A Face for the Voice

Bibliography

  1. OpenAI Documentation and Releases
  2. Academic Papers and Research
  3. Industry Articles and Commentary
  4. Standards, Specifications, and Foundations
  5. Tools and Open-Source Projects
  6. Author’s Published Articles
  7. Other Sources

Get the free sample chapters

Free sample chapters are available in PDF and EPUB, or to read online.

The Leanpub 60 Day 100% Happiness Guarantee

Within 60 days of purchase you can get a 100% refund on any Leanpub purchase, in two clicks.

Earn $8 on a $10 Purchase, and $16 on a $20 Purchase

We pay 80% royalties on purchases of $7.99 or more, and 80% royalties minus a 50 cent flat fee on purchases between $0.99 and $7.98. You earn $8 on a $10 sale, and $16 on a $20 sale. So, if we sell 5000 non-refunded copies of your book for $20, you'll earn $80,000.

(Yes, some authors have already earned much more than that on Leanpub.)

In fact, authors have earned over $15 million writing, publishing and selling on Leanpub.

Free Updates. DRM Free.

If you buy a Leanpub book, you get free updates for as long as the author updates the book! Many authors use Leanpub to publish their books in-progress, while they are writing them. All readers get free updates, regardless of when they bought the book or how much they paid (including free).

Most Leanpub books are available in PDF (for computers) and EPUB (for phones, tablets and Kindle).

Finally, Leanpub books don't have any DRM copy-protection nonsense, so you can easily read them on any supported device.

Write and Publish on Leanpub

You can use Leanpub to easily write, publish and sell in-progress and completed ebooks and online courses!

Leanpub is a powerful platform for serious authors, combining a simple, elegant writing and publishing workflow with a store focused on selling in-progress ebooks.

Leanpub is a magical typewriter for authors: just write in plain text, and to publish your ebook, just click a button. (Or, if you are producing your ebook your own way, you can even upload your own PDF and/or EPUB files and then publish with one click!) It really is that easy.