Phase 0 - Orientation
Learning How to Think About Design
Before we talk about systems, we need to talk about how you think.
Most engineers approach design the way they approach code: gather requirements, choose tools, apply patterns, and move forward. This works—until it doesn’t. At some point, the systems you build stop behaving like the code you wrote. Problems emerge that cannot be traced to a single function, service, or decision. When that happens, adding more knowledge does not help. Changing how you reason does.
This phase exists to interrupt familiar instincts.
Phase 0 is not about teaching you what to design. It is about exposing the assumptions you already carry—about correctness, scale, failure, and simplicity—and showing where they quietly break down. These chapters will feel less concrete than the ones that follow. That is intentional. You cannot reason well about systems until you are comfortable with uncertainty, trade-offs, and incomplete explanations.
If you rush through this phase looking for answers, it will feel unsatisfying. If you sit with it, it will change how the rest of the book lands. The goal of Phase 0 is simple: by the time you reach Phase 1, you should no longer be asking, “What framework should I use?” You should be asking, “What kind of problem is this, really?”
Chapter 1: What Is System Design
(and What It Is Not)
A System That Worked Yesterday
The system worked yesterday.
The code hadn’t changed. Tests were green. No alerts fired overnight.
And yet this morning, requests were timing out.
Someone suggested adding more instances. Someone else blamed the database. A third person said, “It works on my machine.”
All reasonable guesses. All wrong.
By noon, a temporary fix was deployed. By evening, the incident was “resolved.” By next week, it happened again—somewhere else.
Nothing was broken. And that was the problem.
The Question Nobody Asked
After incidents like this, teams usually ask familiar questions:
- Which service caused it?
- Was it a bug or a configuration issue?
- Do we need better monitoring?
These are safe questions. They have owners. They lead to tickets.
The dangerous question is the one nobody asks:
What assumption did we design this system around—and when did it stop being true?
That question does not map to a single component. It does not have a Jira ticket. It cannot be fixed with a patch.
And yet, most system failures quietly begin there.
When Code Stops Explaining Behavior
As a developer, you are trained to reason locally.
Input goes in. Logic executes. Output comes out.
When something breaks, you look for the line of code responsible.
Systems do not behave this way.
In systems:
- Failures emerge from interactions
- Latency appears without slow functions
- Load exposes assumptions you forgot you made
Every component can behave “correctly” while the system misbehaves.
This is not an edge case. This is the default state of non-trivial systems.
The Moment Code Becomes a System
There is a moment—often unnoticed—when local reasoning stops working.
The moment when:
- Understanding one service is no longer enough
- Fixing one issue creates another elsewhere
- Behavior depends on timing, not logic
- Confidence drops even though competence hasn’t
Nothing announces this transition. No role change marks it. No diagram captures it cleanly.
But once it happens, the rules change.
System design begins after this moment. Most teams never notice it; they only feel its consequences.