Distributed System Illustrated

Master distributed systems through clear diagrams instead of dense academic papers: from clock drift and CAP to Paxos, Raft, and distributed transactions.

Minimum price

$19.90

$29.90

Also available for 1 book credit with a Reader Membership

PDF
About This Book

Distributed System Illustrated: A Visual Guide for Engineers

Building reliable systems out of unreliable components — that's the essential challenge of distributed computing. Networks drop packets, clocks drift, nodes crash, and yet we expect 24/7 availability from every modern service.

This book takes a visual, first-principles approach to distributed systems. Instead of drowning you in formal proofs or assuming you already know the jargon, it uses clear diagrams and intuitive reasoning to peel back the layers, from why physical clocks can't be trusted all the way to how Google Spanner uses TrueTime (backed by GPS and atomic clocks) to achieve external consistency on a global scale.

What Makes This Book Different
  • Derive, don't memorize. The Paxos algorithm isn't presented as a finished artifact to be accepted on faith. Instead, you'll watch it emerge step by step from simple quorum reads and writes, understanding why each piece is necessary before moving to the next.
  • Engineering depth, not just theory. Raft is covered in full — not just leader election and log replication, but also cluster membership changes (single-step and joint consensus), liveness problems under network partitioning (Pre-Vote, CheckQuorum, Leader Lease), log compaction, and implementing linearizable reads.
  • Visual-first. Every concept is illustrated. Consistency models, log replication flows, two-phase commit sequences, TrueTime commit wait — all accompanied by diagrams that make the abstract concrete.
  • Full-stack coverage. From the physics of quartz clocks and NTP's flawed symmetry assumption, through Lamport clocks and vector clocks, replication and CAP, Paxos and Raft, consistent hashing and Dynamo, all the way to distributed transactions, Spanner's TrueTime, TCC, and SAGA.
What You'll Learn
  • Why physical clocks fail in distributed systems, and how logical clocks (Lamport, Vector) capture causality instead
  • The complete consistency model spectrum — from linearizability to eventual consistency — and when to choose which
  • How Paxos works, derived from first principles by evolving simple quorum reads and writes
  • Raft in production-grade detail: election, replication, safety, membership changes, liveness, snapshots, and linearizable reads
  • Consistent hashing, virtual nodes, and how Dynamo handles partitioning at scale
  • ACID internals (Undo/Redo logs, 2PL, OCC, MVCC, Write Skew), distributed transactions (2PC/3PC), and how Spanner combines TrueTime + Paxos + 2PC for global external consistency
  • When to use strong consistency (2PC/Spanner) vs. eventual consistency (TCC/SAGA), and the trade-offs involved
Who This Book Is For
  • Backend engineers and architects who want to go beyond surface-level understanding of distributed systems
  • Developers preparing for system design interviews who need solid command of Paxos, Raft, CAP, and consistency models
  • Students looking for a reference that's more visual than a textbook and more rigorous than a blog post
  • Anyone curious about how large-scale internet systems actually work under the hood

Basic computer science knowledge (data structures, operating systems, networking) is assumed. No prior distributed systems experience is required.

Chapter Overview
  1. Overview of Distributed Systems — Challenges, fallacies, and the mindset shift from single-machine to distributed thinking
  2. Distributed System Models — Two Generals, Byzantine Generals, communication/failure/timing models
  3. Time and Order: Physical clock limitations, the happened-before relation, partial vs. total order, Lamport clocks, vector clocks
  4. Replication — Replication modes, quorum read-write, consistency models (linearizability to eventual), CAP and PACELC
  5. Distributed Consensus Algorithms — FLP impossibility, Paxos derived from first principles, Raft in full engineering detail
  6. Partitioning — Consistent hashing, Chord, virtual nodes, Amazon Dynamo
  7. Transactions — ACID internals, concurrency control (2PL, OCC, MVCC), distributed transactions (2PC/3PC), Google Spanner, TCC, SAGA

About the Author

lichuang

Lichuang is a software engineer with extensive experience building distributed systems, databases, and storage engines. His work spans consensus protocols, replication, and cloud-native infrastructure. He is an active open-source contributor — most notably a committer on openraft, the leading Raft consensus library in the Rust ecosystem.

He is the author of Lua-Source-Internal, a widely read deep dive into the Lua VM, compiler, and runtime, later published as the book Lua: Design and Implementation in both Simplified and Traditional Chinese. He shares more of his technical writing at codedump.info. His approach: cut through academic jargon, lean on diagrams, and help engineers grasp why systems behave the way they do, not just memorize how.

Table of Contents

  • 1. Overview of Distributed Systems
    • 1.1 What Is a Distributed System
    • 1.2 Challenges of Distributed Systems
      • 1.2.1 Unreliable Networks
      • 1.2.2 Clock and Ordering Problems
      • 1.2.3 Partial Failure
      • 1.2.4 Data Consistency
    • 1.3 Mindset Shift
    • 1.4 Chapter Summary
  • 2. Models of Distributed Systems
    • 2.1 Two Generals' Problem
    • 2.2 Byzantine Generals' Problem
    • 2.3 System Models
      • 2.3.1 Communication Models
      • 2.3.2 Failure Models
        • 2.3.2.1 Crash Model
        • 2.3.2.2 Omission Model
        • 2.3.2.3 Byzantine Model
      • 2.3.3 Timing Models
    • 2.4 Chapter Summary
  • 3. Time and Order in Distributed Systems
    • 3.1 State, Events, and Snapshots
    • 3.2 Physical Clocks
      • 3.2.1 Physical Time Sources and Representation
      • 3.2.2 Wall Clocks and Monotonic Clocks
      • 3.2.3 Physical Time Synchronization
      • 3.2.4 Deficiencies of Physical Clocks
    • 3.3 Causality and Event Ordering
    • 3.4 Total Order and Partial Order
    • 3.5 Causality and the Happened-before Relation
    • 3.6 Logical Clocks
      • 3.6.1 Lamport Clocks
      • 3.6.2 Vector Clocks
    • 3.7 Global Snapshots in Distributed Systems
    • 3.8 Chapter Summary
  • 4. Replication
    • 4.1 Primary-Backup Replication
      • 4.1.1 Data Replication Modes
      • 4.1.2 Quorum
      • 4.1.3 Client Request Routing
      • 4.1.4 Replicating Data
      • 4.1.5 Node Failures
    • 4.2 Consistency Models
      • 4.2.1 What Is a Consistency Model
      • 4.2.2 Consistency Model Diagram Conventions
      • 4.2.3 Sequential Consistency
      • 4.2.4 Linearizability
      • 4.2.5 Causal Consistency
      • 4.2.6 Eventual Consistency
        • 4.2.6.1 Safety and Liveness
        • 4.2.6.2 Eventual Consistency
    • 4.3 CAP Theorem
    • 4.4 Client-Centric Consistency Models
      • 4.4.1 Consistent Prefix
      • 4.4.2 Monotonic Reads
      • 4.4.3 Read Your Writes
      • 4.4.4 Baseball Game Example
    • 4.5 Leaderless Replication
      • 4.5.1 Overview
      • 4.5.2 Read-Write Quorum Mechanism
      • 4.5.3 Data Conflict Problem
      • 4.5.4 Cluster Membership Management
      • 4.5.5 Data Repair
    • 4.6 Chapter Summary
  • 5. Distributed Consensus Algorithms
    • 5.1 Introduction to Consensus Algorithms
      • 5.1.1 Overview
      • 5.1.2 Difficulties of Consensus Algorithms
      • 5.1.3 Consensus and Consistency
    • 5.2 FLP Impossibility Theorem
    • 5.3 Paxos Algorithm
      • 5.3.1 Birth of the Paxos Algorithm
      • 5.3.2 Intuitive Explanation of Paxos
        • 5.3.2.1 Imperfect Replication Strategies
        • 5.3.2.2 From Quorum Reads and Writes to Paxos
      • 5.3.3 Paxos Algorithm Description
        • 5.3.3.1 Prepare Phase
        • 5.3.3.2 Accept Phase
    • 5.4 Multi-Paxos Algorithm
    • 5.5 Raft Algorithm
      • 5.5.1 Basic Concepts
      • 5.5.2 Leader Election
      • 5.5.3 Log Replication
      • 5.5.4 Safety
      • 5.5.5 Cluster Membership Changes
        • 5.5.5.1 Safety
        • 5.5.5.2 Single-Step Membership Change
        • 5.5.5.3 Joint Consensus
      • 5.5.6 Liveness Problems During Network Partitioning
      • 5.5.7 Log Compaction
      • 5.5.8 Implementing Linearizable Reads
    • 5.6 Chapter Summary
  • 6. Partitioning
    • 6.1 Partitioning Strategies
      • 6.1.1 Range Partitioning
      • 6.1.2 Hash Partitioning
      • 6.1.3 Consistent Hashing
    • 6.2 Data Migration Process
    • 6.3 Request Routing
      • 6.3.1 Request Routing Patterns
      • 6.3.2 Request Routing Implementation
    • 6.4 Chapter Summary
  • 7. Transactions
    • 7.1 Understanding ACID in Depth
      • 7.1.1 Atomicity and Durability
      • 7.1.2 Isolation
      • 7.1.3 Consistency
    • 7.2 Concurrency Control
      • 7.2.1 Two-Phase Locking
      • 7.2.2 Optimistic Concurrency Control
      • 7.2.3 Multi-Version Concurrency Control
    • 7.3 Distributed Transactions
      • 7.3.1 Two-Phase Commit
      • 7.3.2 Three-Phase Commit
      • 7.3.3 Google Spanner
        • 7.3.3.1 Overall Architecture
        • 7.3.3.2 TrueTime
        • 7.3.3.3 Distributed Transactions
        • 7.3.3.4 Summary
      • 7.3.4 TCC Transactions
      • 7.3.5 SAGA Transactions
    • 7.4 Chapter Summary

Get the free Community Edition

You can get the free Community Edition in PDF or EPUB just by sharing your name and email address with the author, or you can read a shorter sample online.


The Leanpub 60 Day 100% Happiness Guarantee

Within 60 days of purchase you can get a 100% refund on any Leanpub purchase, in two clicks.

Free Updates. DRM Free.

If you buy a Leanpub book, you get free updates for as long as the author updates the book! Many authors use Leanpub to publish their books in-progress, while they are writing them. All readers get free updates, regardless of when they bought the book or how much they paid (including free).

Most Leanpub books are available in PDF (for computers) and EPUB (for phones, tablets and Kindle).

Finally, Leanpub books don't have any DRM copy-protection nonsense, so you can easily read them on any supported device.
