How Software Fails
Minimum price: $24.00
Suggested price: $29.00

The Hidden Laws of Complex Systems

About the Book

How Software Fails: A Field Guide to Understanding Complex System Disasters

Why Systems Break in Ways Their Creators Never Imagined.

Software failures aren't accidents; they're inevitabilities. In a universe governed by probability rather than certainty, even cosmic rays from distant stars can flip bits in computer memory, causing election machines to miscount votes or video game characters to jump impossibly high. But cosmic interference is just the beginning of how our most critical systems fail in spectacular and unpredictable ways.
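To make the bit-flip idea concrete, here is a minimal Python sketch of what a single flipped bit does to a stored integer. It is not taken from the book. The tally value and the `flip_bit` helper are hypothetical. Flipping bit k changes the number by exactly 2^k, so a flip in a high-order bit turns a small tally into a wildly wrong one. Flipping a bit that was already set subtracts from the value instead of adding to it.

```python
def flip_bit(value: int, bit: int) -> int:
    """Return `value` with one bit inverted, mimicking a single-event upset."""
    # XOR with a one-hot mask toggles exactly that bit and leaves the rest alone.
    return value ^ (1 << bit)


votes = 514  # hypothetical vote tally held in memory (binary 1000000010)

for bit in (0, 1, 12):
    corrupted = flip_bit(votes, bit)
    print(f"bit {bit:2d} flipped: {votes} -> {corrupted} (error {corrupted - votes:+d})")
```

Running it shows errors of +1, -2 and +4096. The size of the damage depends only on which bit is hit, not on the value being stored.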

Through gripping real-world case studies, this field guide reveals the hidden laws governing complex system disasters. Discover how Knight Capital lost $460 million in 45 minutes due to a single misplaced software flag. Learn why the Therac-25 radiation machine killed patients despite passing every safety test. Understand how a 40-kilobyte configuration file crashed 8.5 million computers worldwide, grounding flights and shuttering hospitals across the globe.

What You'll Learn

Based on safety researcher Richard Cook's groundbreaking principles, you'll discover:

  • Why testing can never guarantee perfection, and what to do instead
  • How "reasonable" decisions combine to create unreasonable disasters
  • Why complex systems always run in degraded mode, and why that's actually normal
  • How scale turns rare events into daily certainties
  • Why the search for "root causes" consistently leads us astray
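The arithmetic behind "rare becomes routine" fits in a few lines. Assume each operation fails independently with a hypothetical probability p. The chance of at least one failure in n operations is then 1 - (1 - p)^n, and the expected number of failures is n·p. The per-operation rate, the daily volumes and the `p_at_least_one` helper below are illustrative assumptions, not figures from the book.

```python
import math


def p_at_least_one(p: float, n: int) -> float:
    """Probability that at least one of n independent operations fails, each with probability p."""
    # Equivalent to 1 - (1 - p)**n, but log1p/expm1 stay accurate when p is tiny.
    return -math.expm1(n * math.log1p(-p))


p = 1e-9  # hypothetical: a one-in-a-billion chance of failure per operation

for n in (1_000, 1_000_000, 1_000_000_000, 10_000_000_000):
    print(f"{n:>14,} ops/day: expected failures {n * p:g}, "
          f"P(at least one) = {p_at_least_one(p, n):.6f}")
```

With p at one in a billion, a system doing a thousand operations a day will almost never see the failure. A system doing ten billion a day should expect about ten failures daily. `math.log1p` and `math.expm1` are used because computing `(1 - p) ** n` directly loses precision when p is this small.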

From Understanding to Action

But this isn't just about understanding failure; it's about building resilience. Explore practical strategies from organizations that have learned to thrive in chaos:

  • NASA's Mars rovers that adapt and learn from component failures, operating for years beyond their planned lifetimes
  • The internet's routing protocols that automatically heal themselves when damaged
  • Netflix's chaos engineering, which deliberately breaks its own servers to build antifragile systems
  • The ethical frameworks for deciding what level of failure is acceptable when lives are at stake

Who This Book Is For

Whether you're a software engineer debugging production issues, a manager trying to prevent the next catastrophic outage, or simply curious about why technology fails in seemingly impossible ways, this book will forever change how you think about the complex systems that run our world.

  • Categories

    • Computers and Programming
    • Testing
    • Ethics & Technology
    • Software
    • Software Engineering
    • System Integration

About the Author

Engin Yöyen

I am a Software Engineer, dad, humble home chef (emphasis on humble), motorcycle enthusiast, and currently living the Berlin life with my wife and our two tiny humans.

I've dabbled in everything from the Internet of Things to telecommunications, smart classrooms, CMS platforms, and more. So yes, I do build software. I’ve even done it at places you might’ve heard of… like Microsoft and eBay. (Name drop: achieved.)

I have a background in everything but the kitchen sink: Computer Science + Psychology (yes, really), Business Administration, and a Master’s in Embedded Software Engineering.

I like tinkering with tech, writing things (sometimes just to see if anyone’s reading), experimenting with new ideas, and learning weird, wonderful stuff no one asked for, but someone, someday, might need.

Table of Contents

    • Preliminaries
      • Errata & Suggestion
    Part I: Foundation
    • Chapter 1: When Cosmic Rays Attack
      • What Do a Video Game, an Election, and an Airplane Have in Common?
      • The Bit That Changed Everything
      • The Cosmic Connection
      • The Scale Problem: When Rare Becomes Routine
      • Making Sense of the Impossible
      • Chapter 1 Key Takeaways
    • Chapter 2: The Rules of the Game
      • How Faults Become Failures
      • What Makes a System “Complex”?
      • How Complex Systems Actually Fail
      • Designing for Inevitable Failure
      • Your Diagnostic Toolkit: Using Cook’s Rules
      • Chapter 2 Key Takeaways
    Part II: Failure Patterns
    • Chapter 3: The Blame Game
      • Before the Fall: The Choices That Mattered
      • The Unraveling: When Simple Solutions Create Complex Problems
      • The Detective Work: Following the Trail
      • The Pattern: How Complex Systems Fail
      • The Aftermath: Learning the Wrong Lessons
      • The Broader Truth: Why Root Cause Analysis Falls Short
      • Chapter 3 Key Takeaways
    • Chapter 4: Death by a Thousand Cuts
      • The Drift: When Reasonable Becomes Catastrophic
      • The Aftermath: Learning from Drift
      • The Psychology Behind Drift: How Normalization Enables Failure
      • The Pre-Mortem: Imagining Failure to Prevent It
      • Chapter 4 Key Takeaways
    • Chapter 5: When Humans and Machines Have Communication Problems
      • The Human Cost of Interface Failure
      • The Critical Development Failures That Created Catastrophe
      • The Unraveling: When “Normal” Means “Catastrophic”
      • The Design Problem: When Context Gets Lost in Translation
      • Chapter 5 Key Takeaways
    • Chapter 6: The Butterfly Effect Has Trust Issues
      • When Everything Connects to Everything
      • When Redundancy Becomes a Single Point of Failure
      • The Anatomy of a $370 Million Failure
      • The Pattern: How Tight Coupling Amplifies Small Failures
      • A Warmer Example: When Your House Becomes Too Smart
      • The Ripple Effect: How Local Failures Become Global Problems
      • The False Promise of Redundancy
      • The Uncomfortable Truth About Connection
      • Chapter 6 Key Takeaways
    • Chapter 7: The Potemkin Village of Code
      • The Hidden Vulnerability: How Excellence Concealed Danger
      • The Performance Paradox: When Excellence Creates Vulnerability
      • The Pattern: How Industries Optimize Into Vulnerability
      • Why We Can’t See What’s Hidden
      • Chapter 7 Key Takeaways
    • Chapter 8: Murphy’s Law Meets Moore’s Law
      • The Mathematics of Scale Amplification
      • The CrowdStrike Cascade: Anatomy of Global Failure
      • The Pattern: Scale as the Ultimate Amplifier
      • Why Scale Makes Recovery Impossible
      • Designing for Scale Resilience
      • Chapter 8 Key Takeaways
    Part III: Making Peace with Chaos
    • Chapter 9: Breaking Things to Fix Them
      • The Setup: The Traditional Approach to Not Breaking
      • The Problem with Perfect Components
      • The Chaos Engineering Philosophy
      • The Unraveling That Works
      • Learning from Controlled Chaos
      • The Pattern: Antifragile Design
      • From Fighting Cook’s Rules to Embracing Them
      • The Psychological Safety Requirement
      • Chapter 9 Key Takeaways
    • Chapter 10: The Internet Routes Around Everything
      • The Setup: From Network Research to Robust Design
      • The Routing Philosophy: Simple Rules, Complex Behavior
      • How the Internet Healed Itself
      • Lessons for Building Adaptive Systems
      • The Routing Paradox
      • Chapter 10 Key Takeaways
    • Chapter 11: Failing Fast, Failing Forward, Failing Better
      • The Setup: Building for Breakdown
      • The Philosophy of Graceful Degradation
      • The Unraveling That Wasn’t
      • The Fast Failure Philosophy
      • The Recovery Time Objective
      • The Culture of Resilient Failure
      • The Resilience Investment Paradox
      • Chapter 11 Key Takeaways
    • Chapter 12: When Good Enough Isn’t
      • Who Decides What Level of Broken Is Acceptable?
      • The Invisible Moral Layer
      • The Degraded Mode Dilemma
      • The Gambler’s Ethics
      • Making the Invisible Visible
      • Chapter 12 Key Takeaways
    • Epilogue: Living with Inevitable Failure
      • The Pattern Beneath the Patterns
      • The Moral Thread
      • Building for the World We Actually Live In
      • The Future of Failure
      • The Wisdom of Inevitable Failure
    • Technical Appendix
      • Therac-25 Race Condition
      • Spectre Attack
      • CrowdStrike Channel File 291 Incident
      • Border Gateway Protocol (BGP) Technical Details

The Leanpub 60 Day 100% Happiness Guarantee

Within 60 days of purchase you can get a 100% refund on any Leanpub purchase, in two clicks.

Now, this is technically risky for us, since you'll have the book or course files either way. But we're so confident in our products and services, and in our authors and readers, that we're happy to offer a full money back guarantee for everything we sell.

You can only find out how good something is by trying it, and because of our 100% money back guarantee there's literally no risk to do so!

So, there's no reason not to click the Add to Cart button, is there?

Earn $8 on a $10 Purchase, and $16 on a $20 Purchase

We pay 80% royalties on purchases of $7.99 or more, and 80% royalties minus a 50 cent flat fee on purchases between $0.99 and $7.98. You earn $8 on a $10 sale, and $16 on a $20 sale. So, if we sell 5000 non-refunded copies of your book for $20, you'll earn $80,000.

(Yes, some authors have already earned much more than that on Leanpub.)

In fact, authors have earned over $14 million writing, publishing and selling on Leanpub.

Free Updates. DRM Free.

If you buy a Leanpub book, you get free updates for as long as the author updates the book! Many authors use Leanpub to publish their books in-progress, while they are writing them. All readers get free updates, regardless of when they bought the book or how much they paid (including free).

Most Leanpub books are available in PDF (for computers) and EPUB (for phones, tablets and Kindle). The formats that a book includes are shown at the top right corner of this page.

Finally, Leanpub books don't have any DRM copy-protection nonsense, so you can easily read them on any supported device.

Write and Publish on Leanpub

You can use Leanpub to easily write, publish and sell in-progress and completed ebooks and online courses!

Leanpub is a powerful platform for serious authors, combining a simple, elegant writing and publishing workflow with a store focused on selling in-progress ebooks.

Leanpub is a magical typewriter for authors: just write in plain text, and to publish your ebook, just click a button. (Or, if you are producing your ebook your own way, you can even upload your own PDF and/or EPUB files and then publish with one click!) It really is that easy.
