Why We Still Suck At Resilience

Organizational Dynamics

Your organization does all the right things. It practices chaos engineering, GameDays, and load testing. It conducts incident reviews and operational readiness reviews. Yet the same types of incidents keep recurring. This book examines why resilience practices so often fail to build resilience, revealing the organizational dynamics that systematically transform learning mechanisms into compliance theater, and what you can do to navigate them consciously.

Minimum price: $29.00

Suggested price: $49.00

Buying multiple copies for your team? See below for a discount!

Available formats: PDF, EPUB, Web

About the Book

Why We Still Suck At Resilience: Organizational Dynamics

Your organization has invested heavily in resilience. You've implemented chaos engineering, incident analysis, GameDays, load testing, and operational readiness reviews. You've followed the frameworks, hired experts, and built the programs. Everything looks good on paper.

Yet when production fails in ways that matter, you discover that all that investment hasn't built the adaptive capacity you expected. The same types of incidents keep recurring. Teams struggle with novel failures despite months of practice. The gap between impressive-sounding programs and actual resilience persists.

If this sounds familiar, this book is for you.

This book focuses on the organizational and cultural dynamics of resilience work. It examines why your technically excellent practices aren't building the capability you need, and how those dynamics systematically transform learning mechanisms into compliance theater.

Through more than a decade of consulting work across hundreds of organizations, I have noticed that resilience practices fail not because they're poorly implemented, but because organizational forces (efficiency pressure, performance demands, control orientation, short-term focus, and heroism culture) systematically undermine the conditions that enable learning.

At the heart of every system failure is a gap between Work-as-Imagined (how we think systems work) and Work-as-Done (how they actually work). This gap is inevitable in complex systems. Resilience comes from building organizational capacity to discover and navigate this gap through continuous learning. Most organizations instead try to eliminate the gap through controls and documentation, drifting into brittleness while feeling increasingly safe.

What you'll learn from this book:

  • Why the gap between Work-as-Imagined and Work-as-Done is where fragility hides, and why learning capacity matters more than perfect planning
  • The five organizational tensions that systematically undermine resilience practices, and how to navigate them consciously rather than drift unconsciously
  • What determines whether practices function as genuine learning mechanisms or become expensive theater
  • How to make organizational drift visible before it degrades adaptive capacity
  • Why psychological safety, appropriate incentives, and leadership support are prerequisites for any practice to build capability

The approach is philosophical rather than prescriptive. This book doesn't offer a five-step plan to fix resilience. It offers frameworks for understanding what's actually happening in your organization, vocabulary for discussing dynamics that usually stay invisible, and principles for navigating tensions that cannot be eliminated.

Who this book is for:

  • Engineering leaders and SREs who know their resilience practices aren't working but can't articulate why
  • Teams doing all the right things yet still experiencing repeated incidents
  • Anyone frustrated that significant investment hasn't built the adaptive capacity they need
  • Leaders who need to explain to their organization why resilience work keeps failing despite genuine effort

What makes this book different:

This book combines academic resilience engineering research (Hollnagel, Woods, Dekker, Edmondson, and more) with practical consulting experience, examining not just how practices should work in theory, but why they fail in practice. It's grounded in operational reality while being intellectually rigorous. It's honest about complexity rather than offering simple solutions that don't work.

If you're looking for a checklist to follow, this isn't your book. If you want to understand why your resilience work keeps failing and how to navigate the organizational dynamics that undermine it, keep reading.

Team Discounts

Get a team discount on this book!

  • Up to 5 members: minimum price $120.00, suggested price $210.00
  • Up to 10 members: minimum price $230.00, suggested price $400.00
  • Up to 25 members: minimum price $500.00, suggested price $875.00
  • Up to 50 members: minimum price $900.00, suggested price $1,500.00
  • Up to 100 members: minimum price $1,600.00, suggested price $2,700.00

About the Author

Adrian Hornsby

Adrian Hornsby has over twenty years of experience in software systems engineering and operations, spanning organizations from large enterprises to small startups. He spent nine years at Amazon Web Services (AWS), including the last four years as a Principal Engineer, where he helped shape resilience strategies for some of the world’s largest and most critical systems. Today, he is the Founder and CEO of Resilium Labs, helping organizations build technical and cultural resilience to thrive through disruption. He holds a Master’s Degree in Networks and Telecommunications from Telecom St-Etienne, pursued doctoral studies in Networks and Telecommunications at Tampere University of Technology, and earlier earned a Bachelor-Technician Degree in Electronics from Lycée Portes de l’Oisans.

Table of Contents

Preface

  1. What This Book Offers
  2. Who This Book Is For
  3. How to Read This Book
  4. A Note on Examples
  5. A Note on Interpretation
  6. Acknowledgments
  7. What I Hope You Take Away
  8. Thank You

Part 1: The Problem

Chapter 1: How Complex Systems Fail

  1. The Mechanism: Why Everything Fails
  2. Learning Is The Only Option
  3. What We Mean by Resilience
  4. What This Book Examines
  5. How Organizations Drift

Chapter 2: What Learning Actually Means

  1. Organizational Learning and Learning Organizations
  2. Three Types of Learning
  3. The Learning Cycle
  4. The Five Practices and the Learning Cycle
  5. The Visibility Prerequisite
  6. The Vocabulary Barrier

Chapter 3: Why Practices Fail to Build Learning

  1. The Definition Problem
  2. The Five Primary Tensions
  3. The Cascade Model
  4. How This Manifests

Chapter 4: The Bedrock

  1. Psychological Safety
  2. Appropriate Incentives
  3. Leadership Support
  4. Maintaining the Bedrock Under Pressure
  5. Why the Bedrock Matters

Part 2: The Practices

Chapter 5: Operational Readiness Reviews

  1. What Readiness Reviews Actually Reveal
  2. Readiness Reviews as Theater
  3. Why Checklists Fail Silently
  4. Designing for Discovery
  5. Who Should Conduct Reviews
  6. What Readiness Reviews Require from the Bedrock
  7. Continuous Operational Review

Chapter 6: Load Testing

  1. What Load Tests Actually Reveal
  2. Why Capacity Assumptions Are Systematically Wrong
  3. Load Testing as Theater
  4. Designing for Discovery
  5. The Cliff
  6. Interpreting Results for Learning
  7. What Load Testing Requires from the Bedrock

Chapter 7: Chaos Engineering

  1. What Chaos Engineering Actually Reveals
  2. Chaos Engineering as Theater
  3. Designing for Discovery
  4. Chaos Engineering Under Load
  5. What Chaos Engineering Requires from the Bedrock
  6. Double-Loop Learning Through Chaos Engineering
  7. Chaos Engineering and the WAI-WAD Gap

Chapter 8: GameDays

  1. What GameDays Actually Reveal
  2. GameDays as Theater
  3. Designing for Discovery
  4. The Debrief Is Where Learning Happens
  5. Unscripted Versus Scripted Scenarios
  6. What GameDays Require from the Bedrock
  7. GameDays and Knowledge Distribution
  8. GameDays and the WAI-WAD Gap

Chapter 9: Incident Analysis

  1. What Incidents Actually Reveal
  2. Incident Analysis as Theater
  3. Designing for Learning
  4. Blameless Analysis in Practice
  5. What Incident Analysis Requires from the Bedrock
  6. Learning Loops from Incidents
  7. Incident Analysis and the WAI-WAD Gap

Part 3: Navigating the Gap

Chapter 10: Designing for Adaptability

  1. Learning from Successful Resilience
  2. Who Designs for Adaptability
  3. Communication Patterns That Enable Learning
  4. Hiring and Cultural Transmission
  5. Knowledge Distribution Systems
  6. The Bedrock at Organizational Scale
  7. The Resilience That Isn’t
  8. The Limits of Design

Chapter 11: Making Drift Visible

  1. Recognizing the Pattern of Drift
  2. Making Trade-offs Explicit
  3. What Conscious Navigation Looks Like
  4. The Role of Leadership in Making Drift Visible
  5. The Limits of Visibility

Chapter 12: The Prevention Paradox

  1. The Asymmetry
  2. What Gets Cut, and What Gets Lost
  3. The Tensions Resurface
  4. The Delay Obscures Causation
  5. Why Leaders Can’t See It
  6. The Pattern Repeats Across Domains
  7. Navigating the Paradox

Chapter 13: Organizing Resilience Work

  1. Why Organizational Structure Matters for Learning
  2. Three Models That Struggle
  3. The SRE Promise
  4. The SRE Reality
  5. The Enabling Team Model
  6. Unified Ownership Enables Connections
  7. The Enabling Team’s Job
  8. Staffing the Enabling Team
  9. Measuring What Matters

Chapter 14: The Future of Resilience

  1. Automation’s Learning Problem
  2. When Machines Do the Learning
  3. AI as Learning Mechanism
  4. Tooling Has Gotten It Wrong
  5. Starting Where You Are

Part 4: Resources

Appendix: Operational Readiness Review Template

  1. How to Use This Template
  2. Can You Have the Right Answers to All Questions?
  3. Who Should Conduct an ORR?
  4. When Should You Conduct an ORR?
  5. How Does an ORR Differ from Architecture Reviews?
  6. Template Structure
  7. 1 - Service Definition and Goals
  8. 2 - Architecture
  9. 3 - Failures, Impact & Adaptive Capacity
  10. 4 - Risk Assessment
  11. 5 - Learning & Adaptation
  12. 6 - Monitoring, Metrics & Alarms
  13. 7 - Testing & Experimentation
  14. 8 - Deployment
  15. 9 - Operations & Adaptive Capacity
  16. 10 - Disaster Recovery
  17. 11 - Organizational Learning

Appendix: Incident Analysis Template

  1. How to Use This Template
  2. Moving Beyond Root Cause Analysis
  3. Language That Promotes Learning
  4. Developing Precise Vocabulary
  5. Interview Approach
  6. Template Structure
  7. 1 - Title
  8. 2 - Incident Details
  9. 3 - Owner & Review Committee
  10. 4 - Classification
  11. 5 - Executive Summary
  12. 6 - Supporting Data
  13. 7 - Customer Impact
  14. 8 - Incident Response Analysis
  15. 9 - Post-Incident Analysis
  16. 10 - Timeline
  17. 11 - Contributing Factors Analysis
  18. 12 - Surprises & Learning
  19. 13 - Action Items
  20. 14 - Learning Loops & Knowledge Sharing
  21. Quality Checklist

References

  1. Works Cited
  2. Further Reading

Glossary

  1. A
  2. B
  3. C
  4. D
  5. E
  6. F
  7. G
  8. H
  9. I
  10. L
  11. M
  12. N
  13. O
  14. P
  15. R
  16. S
  17. T
  18. U
  19. V
  20. W
  21. Y

About the Author
