Codex CLI [Leanpub PDF/iPad/Kindle]

Copyright and Trademarks

Trademarks
Disclaimer

Introduction

I Started This on a Train
What Changed My Mind
Who This Book Is For
What You Won’t Find Here
How This Book Is Organised
A Note on Version Currency
How to Use This Book
Getting Your Bearings

Chapter 1. What Is Codex CLI?

Learning Objectives
Before Codex: A Brief History
The Problem with Autocomplete
The 2026 Agentic Landscape
Defining Codex CLI
Why the Terminal Matters
The Core Proposition
Summary
Exercises

Chapter 2. Getting Started with Codex CLI

Learning Objectives
The Bigger Picture: Four Surfaces, One Agent
Prerequisites
Installation
Authentication
Your First Session
Interactive REPL vs. codex exec
Where Configuration Lives
Summary
Exercises

Chapter 3. What’s New in Codex CLI

Learning Objectives
The Three Themes of 2026
The Model Landscape
Approval Mode Changes
New CLI Features
Enterprise Features
Staying Current
Summary
Exercises

Chapter 4. Benchmarks and Real-World Performance

Learning Objectives
The Benchmark Landscape
SWE-bench: Gold Standard to Cautionary Tale
Terminal-Bench 2.0: CLI-First Benchmarking
The Scaffolding Effect
What the Numbers Actually Mean for Your Team
Running Your Own Benchmarks
April 2026 Benchmarks: CocoaBench, HiL-Bench, and AAR
SlopCodeBench: Measuring Quality, Not Just Completion
The Benchmark Hierarchy
The Other Side of Benchmarks: Agent Code Quality in Production
Summary
Exercises

Chapter 5. Competing Tools and When to Use Each

Learning Objectives
The Decision Framework
Claude Code
Cursor
Gemini CLI (Deprecated, Sunset 18 June 2026)
OpenCode
GitHub Copilot
Kiro CLI
Grok Build
Desktop Workstations: Codexia and the GUI Wrapper Trend
Zed 1.0 Agent Mode
Google Antigravity 2.0
VS Code Multi-Agent Architecture
The Multi-Tool Pattern
Summary
Exercises

Chapter 6. Codex in the Wild: Interfaces and Community

Learning Objectives
Usage Limits: The Hidden Variable
What the Benchmarks Actually Tell You
The Core Personality Difference
Practical Handoff Patterns
The Power Stack: Using Both
Team Adoption Patterns
Interfaces: Desktop, CLI, and IDE
Summary
Exercises

Foundations

Chapter 7. Prompting Codex CLI Effectively

Learning Objectives
Why Codex CLI Prompting Is Different
The Anatomy of an Effective Prompt
Task Scoping and Reasoning Effort
Iterative Prompting and Mid-Session Corrections
Prompt Patterns for Common Tasks
Moving Durable Context Out of Prompts
Output Control: Shaping How the Agent Communicates
Summary
Exercises

Chapter 8. AGENTS.md: Patterns and Pitfalls

Learning Objectives
What AGENTS.md Is and How Codex CLI Reads It
Essential Sections: Commits, Testing, Style
Project-Specific Context: What to Include and Omit
Common Mistakes and How They Manifest
AGENTS.md for Monorepos and Multi-Service Repos
Testing and Validating Your AGENTS.md
What Not to Do: Common AGENTS.md Mistakes
Starter Template
The File Map Pattern: Addressing the Navigation Failure Mode
Production AGENTS.md Patterns: The openai/codex Case Study
Empirical Adoption Data: What 2,853 Repositories Reveal
The .md File Ecosystem: Beyond AGENTS.md
Summary
Exercises

Chapter 9. Approval Modes and Trust Boundaries

Learning Objectives
The Trust Model: What Codex CLI Can Touch
The Four Approval Modes
Auto-Approve: When to Use It and When Not To
Sandboxing: Filesystem and Network Restrictions
Kernel-Level vs. Hook-Based Sandboxing
Approval Mode Strategy for Teams
Programmatic Approval with PermissionRequest Hooks
Guardian Failure Modes and Escalation Patterns
Deny-Read Glob Policies (v0.122.0)
Smart Approvals: Adaptive Command Policies (v0.120.0)
CI Reproducibility Flags (v0.122.0)
Granular Approval Policies (v0.122–v0.125)
Summary
Exercises

Chapter 10. Debugging and Diagnosing Agent Failures

Learning Objectives
Reading the Session Transcript
Using Approval Mode as a Diagnostic
Diagnosing AGENTS.md Failures
Testing Commands Under the Sandbox
Context Overflow Symptoms
Recovering a Runaway Session
Structured Logging and --debug
Model Catalog Introspection with codex debug models
Codex Journal: Conversation History as Analysable Data (Preview, PR #20199)
Detecting Specification Drift with SLUMP
A Diagnostic Workflow Checklist
The Three-Level Observability Stack
Summary
Exercises

Chapter 11. Model Selection and Reasoning Effort

Learning Objectives
The Available Models
Multi-Provider Resilience
Reasoning Effort: The Second Knob
Task Taxonomy: Matching Model and Effort to Task
Cost Modelling: Estimating Monthly Spend
GPT-5.5 Prompting Patterns: Outcome-First Instructions
Model Selection in Automated Pipelines
Part 2 Summary
Beyond Local Configuration
Summary
Exercises

The Extension Stack

Chapter 12. MCP: Consuming and Serving

Learning Objectives
MCP in 60 Seconds
What MCP Is (and What It Is Not)
The Architecture: Hosts, Clients, and Servers
MCP and A2A: Complementary Protocol Layers
MCP outputSchema: Typed Tool Outputs
MCP Tool Annotations: Risk Vocabulary for Approval Policy
Parallel Tool Calls and External Event Injection
Connecting Codex CLI to MCP Servers
Connecting to Common Servers (GitHub, Browser, Database)
The Context Cost of MCP: What Gets Loaded
Enterprise MCP: Authentication, Scoping, and Restrictions
MCP Elicitations
Common MCP Gotchas
Building a Simple MCP Server
Serving MCP from Codex
Codex CLI as an MCP Server
Beyond Read-Only: Write-Back Integration
Ticketing System Integration (Jira, Linear)
Communication Platform Integration (Slack, Teams)
Bidirectional Database Patterns
Sandbox-Aware MCP Tools
MCP Resilience and Governance Hardening
Safety Boundaries for Write-Enabled Agents
Remote MCP Executor Stack
Docker MCP Toolkit: Containerised Tool Servers
Remote HTTP MCP Transport
MCP OAuth 2.1: Authenticating Remote Tool Servers
MCP Turn Metadata (PR #21219)
Context Fragments as MCP Injection Points
MCP Debugging and Diagnostics
Parallel Tool Calls (v0.121.0)
Sandbox-State Metadata (v0.125.0)
Building Custom MCP Servers: TypeScript and Python Patterns
MCP Namespace Cleanup (PR #21442, proposed)
OpenAI Developer Docs MCP Server
AWS MCP Server GA
Summary
Testing MCP Servers
Exercises

Chapter 13. Hooks: Intercepting the Agent Lifecycle

Learning Objectives
The Hook System: Overview and Events
SessionStart: Configuring the Environment
UserPromptSubmit: Shaping Input Before the Agent Acts
PreToolUse: Intercepting Shell Commands
PostToolUse: Observing Without Blocking
Stop: Teardown and Cleanup
Writing Robust Hooks
Real Hook Patterns: Enforcement, Audit, and Notification
Summary
Exercises

Chapter 14. The Skills Ecosystem: Using and Writing Skills

Learning Objectives
Part 1: The Consumer’s View: Using and Browsing the Ecosystem
Part 2: The Producer’s View: Writing Your Own Skills
The Customisation Stack: Five Layers of Agent Configuration
The Plugin System: From Skills to Distributable Packages
Summary
Exercises

Scale and Automation

Chapter 15. Context Window Management

Learning Objectives
The Quadratic Growth Problem
Thread Resume and Fork: Preserving Context Without Restarting
What Consumes Context (and What Doesn’t)
The /compact Command and Automatic Summarisation
Background Prefix Compaction
Sub-Agent Delegation as Context Management
Strategies for Large Codebases
Monitoring Context Usage
Context Compaction Architectures: Cross-Tool Comparison
The Model Lineage Context Compaction Breakthrough
Plan Mode and Fresh-Context Implementation
Context Fragments Architecture
Prompt Caching: Economics of Long Sessions
Persistent Context: The Two-Phase Memory Pipeline
MCP Memory Servers: Persistent Cross-Session Context
The Built-In Memory Pipeline
Memory Lifecycle Management
Team Memory: The Gap Beyond Individual Recall
Context Failure Modes: A Taxonomy
ThreadStore and Session Persistence (v0.129.0-alpha.2, pre-release)
GPT-5.5 Compaction Failures: Diagnosis and Mitigation
Strategic Forgetting: Application-Level Context Management
Specification Drift in Long-Context Sessions
Finding Past Sessions (v0.134.0)
Summary
Exercises

Chapter 16. Sub-Agents and Parallel Execution

Learning Objectives
The Sub-Agent Model
The TOML Subagent Definition Format
Task Decomposition: What to Parallelise
spawn_agents_on_csv: Fan-Out Patterns
Aggregating Results and Handling Failures
Path-Based Sub-Agent Addressing
Beyond Built-In Subagents: External Swarm Orchestration
Named Exec Environments: Per-Agent Sandbox Isolation (Proposed)
Exec Policy Propagation to Sub-Agents
Cascade Thread Archive Lifecycle
Agent Graph Store (#19229)
Multi-Agent v2 Depth Unlocking (#20180)
Multi-Agent V2 Fork Semantics Hardening (May 2026)
Multi-Agent Architecture Patterns Beyond Codex
Known Failure Mode: MCP Process Tree Leak
Metric Freedom: When to Use Multi-Agent vs Single-Agent
When Not to Parallelise
Summary
Exercises

Chapter 17. Cost Management and Quota Strategy

Learning Objectives
How Codex CLI Quota Works
Estimating Team Costs
Configuring Cost Ceilings
Monitoring and Alerting with Hooks
Cost-Quality Decision Matrix
Per-Reasoning-Effort Token Consumption
Prompt Caching: the Largest Single Cost Lever
Context-Usage Visibility from Plan Mode
Goal Mode: Cost-Aware Long-Horizon Tasks (v0.133 GA)
Bedrock Pricing as an Alternative to Direct API
Token Compression: RTK and the Cost Impact
On-Premises Cost: GB10 Break-Even Analysis
Summary
Exercises

Chapter 18. Multi-Agent Orchestration Patterns

Learning Objectives
The Three-Tier Orchestration Landscape
Pattern 1: Sequential Gated Chain
Pattern 2: Parallel Worker Swarm
Pattern 3: Wave-Based Hybrid
Choosing the Right Pattern
Pattern 4: Cross-Model Review Loop
Agentmaxxing: Human-Coordinated Cross-Vendor Parallelism
Conversation Branching as a Supervisor Pattern
Goal Mode: From Task Execution to Objective Tracking
Debugging Orchestration Failures
Orchestration Anti-Patterns to Avoid
MultiAgentV2: Custom Roles and Thread Orchestration (v0.128.0)
Remote-Control Command: External Orchestration of Running Sessions (v0.130.0)
Pattern 5: Iterative Repair Loops (Review-Repair-Validate)
Multi-Agent Desktop Workspaces
Summary
Exercises

Chapter 19. Worktrees and Isolated Execution

Learning Objectives
Git Worktrees: A Brief Recap
Why Worktrees Matter for Agentic Workflows
One Agent, One Worktree: The Isolation Principle
Worktrees in the Codex Desktop App
CLI Worktree Workflows
Merging Agent Work Back to Main
Worktree Lifecycle in CI
Worktree Workflow Patterns by Team Size
Summary
Exercises

Chapter 20. CI/CD Integration

Learning Objectives
Running Codex CLI in Non-Interactive Mode
The openai/codex-action GitHub Action
Automated Code Review on Every PR
Test Generation on Merge
Dependency Update Agents
GitLab CI/CD Integration
Analytics Pipeline: CI/CD Observability Layer
Self-Healing CI/CD: From Observation to Autonomous Remediation
Project-Level Skills: The codex-pr-body Pattern
Stacked PRs as a Review-Scaling Pattern
Structured Output and Session Resume
Safety Strategies for CI Agents
Remote Sandbox Configuration: Hostname-Pattern Policies
Hermetic Execution Patterns
Microsoft’s Agentic DevOps Playbook
Non-Interactive Pipeline Patterns: exec, resume, and Structured Output
codex exec CI/CD Patterns Reference
AutoLoop: Bounded Optimisation in CI
Empirical Evidence: What 33,000 Agentic PRs Reveal About Pipeline Design
CI/CD Session Reliability: WebSocket Keepalive and Remote Connections
Build System Integration: Bazel and Dagger
Summary
Exercises

Chapter 21. Security Hardening

Learning Objectives
The Threat Model for Agentic Systems
Prompt Injection: Attack Patterns and Defences
Filesystem Restrictions and Sandboxing
Network Allowlisting
Secret Management for Agent Environments
Audit Logging and Observability
MCP as an Attack Surface
Supply Chain Attacks: The Axios Incident and Binary Verification
Package Registry Compromises: PyPI and npm (April–May 2026)
The TanStack Supply Chain Attack: npm Worm and Sandbox Defence (May 2026)
Windows Sandbox Engineering: The Four-Layer Architecture
Indirect AGENTS.md Injection: Supply-Chain Poisoning of Agent Instructions
Automated Security Tools: Claude Security and Cursor Security Review
Model-Level Risks: The Goblin Incident and Reward Signal Leakage
CVE-2026-26268 and Comment-and-Control: Spring 2026 Agent Vulnerabilities
Enterprise Defence-in-Depth: The Five-Layer Security Model
Running Codex Safely: OpenAI’s Own Deployment Model
Empirical Security Findings: The Amplifying.ai Benchmark
Compliance Considerations
Offline Security Validation Tools
Summary
Exercises

Chapter 22. Enterprise Deployment

Learning Objectives
Distributing config.toml at Scale
RBAC: Three Admin Roles and Access Control
Managed Policies: requirements.toml
AGENTS.override.md: Enforcing Policy Across Teams
Onboarding an Engineering Team
Measuring ROI: Metrics That Work
Codex Cloud vs Self-Hosted
Multi-Cloud Provider Strategy
Dell AI Factory: On-Premises Codex Deployment
Agent Identity Authentication
Governance Frameworks for Agentic AI
Governance APIs and Data Pipelines
The Compliance API
Rollout Checklist
Remote Connections: Operational Guide
Hooks GA and Enterprise Governance Mapping
External Governance Frameworks: Microsoft AGT and Forrester ADS
Summary
Exercises

Chapter 23. Testing and Evaluation Strategy for Agentic Workflows

Learning Objectives
What Makes a Test Suite Agent-Friendly
Designing for Agent Execution
The Feedback Signal Problem
Using Codex CLI to Audit Your Test Suite
Evaluation Beyond Unit Tests
Building an Evaluation Harness
TDD as an Agent Feedback Loop
The 4-File Durable Memory Pattern for Long-Horizon Evaluation
Agent Test Quality: The Over-Mocking Problem and Review Benchmarks
April 2026 Evaluation Frameworks: CocoaBench, HiL-Bench, and AAR
Testing MCP Servers
Summary
Exercises

Specialised Workflows

Chapter 24. AI Code Review

Learning Objectives
Why AI Code Review Works (and Where It Doesn’t)
Configuring Codex for Code Review
The /review Command
PR Integration: Automated Review on Every PR
Structured Output Code Review with codex exec --output-schema
Writing Review Checklists in AGENTS.md
Human-AI Review Collaboration Patterns
The Review-Fix Loop: A Three-Level Maturity Model
Summary
Exercises

Chapter 25. Frontend Engineering with React and TypeScript

Learning Objectives
Frontend-Specific AGENTS.md Configuration
Component Generation and Scaffolding
Test Generation for React Components
The Explorer/Worker Sub-Agent Pattern
Accessibility Audit Automation
Design-to-Code Workflows
Codex Browser Use: Visual Verification
Summary
Exercises

Chapter 26. Python Team Workflows

Learning Objectives
Python-Specific AGENTS.md
Pytest Integration and Test Generation
Type Hints and Docstring Automation
uv, ruff, and Modern Python Toolchain
Data Pipeline Code Generation
Multi-Service Python Workflows
Summary
Exercises

Chapter 27. Web Search and Research Agents

Learning Objectives
Enabling Web Search in Codex CLI
Research-to-Code Workflows
Dependency Research and Evaluation
Staying Current with API Changes
Knowledge-Augmented Agents: MCP Knowledge Servers
Combining Web Search with Sub-Agents
Summary
Exercises

Chapter 28. Codebase Migration

Learning Objectives
Why Codex CLI Excels at Migration Work
Planning the Migration with Codex CLI
Incremental Migration Patterns
Validation Strategies During Migration
Migrating from Claude Code to Codex CLI
Framework and Language Version Migrations
External Agent Migration: The Detect/Import API
Summary
Exercises

Architecture and Vision

Chapter 29. The Agents SDK

Learning Objectives
What the Agents SDK Adds
The SDK Architecture: Agents, Handoffs, Tools, and Guardrails
Two Orchestration Patterns: Handoffs vs Agents-as-Tools
State Persistence: Four Strategies
Function Tools and Hosted Tools
Codex CLI as an MCP Server in SDK Pipelines
Building a Designer-Developer-Tester Pipeline
Tracing and Observability
SDK vs CLI: Choosing the Right Level
TypeScript SDK
Cross-Language Orchestration and Heterogeneous Model Routing
Python SDK MessageRouter: Internal Concurrency Plumbing
The openai-codex Python SDK: Embedding the Agent Runtime
The OpenAI Cookbook: Canonical Tutorial Path
Streaming and Approval Pauses
Model Configuration and Routing
Recent SDK Developments (April–May 2026)
Summary
Exercises

Chapter 30. Agentic Primitives Compared

Learning Objectives
The Four Primitives: Agents, Handoffs, Tools, Guardrails
How Codex CLI Implements Each Primitive
LangChain and LangGraph
AutoGen and CrewAI
Google Gemini Agents and ADK
Choosing a Primitives Model
Codex CLI’s Custom Agent TOML in Practice
What Happens When You Type codex
Context Health Monitoring
Runtime Architecture: Remote Control and State Management
Extending the Primitives: Custom Agents in TOML vs Code
Summary
Exercises

Chapter 31. Harness Engineering for Long-Running Agents

Learning Objectives
What Harness Engineering Is
The WORKFLOW.md Pattern
State Persistence Across Agent Runs
Error Recovery and Resumption
The Proof-of-Work Principle
Symphony: A Harness in Practice
Summary
Exercises

Chapter 32. The Agentic Engineering Pod

Learning Objectives
Section 1: Why Three Roles?
Section 2: The Context Architect: Human Role
Section 3: The Value Engineer: Human Role
Section 4: The Quality Engineer: Human Role
Section 5: The Pod in Practice: A Feature Lifecycle
Section 6: The Agentic Pod Principles
Section 7: Failure Modes and Pod Anti-Patterns
Section 8: Distributing the Pod with Plugins
Metric Freedom and Benchmark Failure Modes: Tools for the Pod
Closing
Exercises

Conclusion: The Agentic Engineer

A Note from the Author, Or Rather, the Tool

How This Book Was Made
What This Means for You
The Book as Artefact
An Invitation
A Face for the Voice

Bibliography

1. Research Papers
2. OpenAI Sources
3. Standards and Security
4. Frameworks and Protocols
5. Benchmarks and Evaluations
6. Developer Tooling
7. Industry Reports and Blogs
