LLM Quantization

From the Bits Up

This book is 100% complete. Last updated on 2026-06-29.

Anyone can run INT4 and read off the accuracy drop. This book explains why that number is what it is — building every quantization method from scratch, breaking it on purpose, and measuring the result. Quantization, from the bits up.

Minimum price: $18.00

Also available for 1 book credit with a Reader Membership

Formats: PDF, EPUB

About the Book

Most explanations of model quantization hand you a recipe and a number: "use INT4, you'll lose about 1% accuracy." This book refuses to stop there. It asks why — and answers from the bits up.

Starting from a single weight and the question of how few bits can represent it, the book builds the entire field from first principles. Every mechanism is coded from scratch in tested Python and PyTorch, deliberately pushed until it breaks, and measured with real numbers produced by executed code. Nothing is asserted; everything is demonstrated. When the book says GPTQ makes 3-bit weights free, there is a measured perplexity behind it. When it says one outlier channel costs 40 dB, there is a derivation and a verification.
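The 40 dB figure is the book's own, and its derivation is not reproduced on this page. As a rough illustration only, here is one reading under stated assumptions: a single shared (per-tensor) scale, symmetric round-to-nearest, and one outlier 100 times larger than every other value. The outlier stretches the step size 100-fold, so the noise power on the ordinary values grows by 100², which is 20·log10(100) = 40 dB. The names `absmax_quantize`, `sqnr_db`, and `outlier_ratio` are hypothetical, and the sketch uses 12 bits so the grid stays fine enough for the uniform-noise approximation to hold after stretching.

```python
import math
import random

def absmax_quantize(values, scale_from, bits=12):
    """Symmetric round-to-nearest with one shared scale set by max |scale_from|."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in scale_from) / qmax
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]

def sqnr_db(ref, approx):
    """Signal-to-quantization-noise ratio in decibels."""
    signal = sum(r * r for r in ref)
    noise = sum((r - a) ** 2 for r, a in zip(ref, approx))
    return 10 * math.log10(signal / noise)

rng = random.Random(0)
normal = [rng.uniform(-1.0, 1.0) for _ in range(50_000)]

# Hypothetical outlier: one value 100x larger than the largest ordinary value.
outlier_ratio = 100.0
with_outlier = normal + [outlier_ratio * max(abs(v) for v in normal)]

# Quantize the same ordinary values twice: once with a scale fitted to them,
# once with a scale dictated by the outlier.
clean = sqnr_db(normal, absmax_quantize(normal, normal))
hurt = sqnr_db(normal, absmax_quantize(normal, with_outlier))
loss_db = clean - hurt

print(f"SQNR without outlier: {clean:.1f} dB")
print(f"SQNR with outlier:    {hurt:.1f} dB")
print(f"loss:                 {loss_db:.1f} dB (20*log10({outlier_ratio:g}) = "
      f"{20 * math.log10(outlier_ratio):.1f} dB)")
```

The measured loss should land close to the 40 dB predicted by the step-size argument. Whether this matches the scenario the book analyzes is an assumption.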


What you'll build and measure
  • The Δ²/12 noise law and the 6.02 dB-per-bit rule — derived, then verified in code
  • A proof of why output error, not weight error, is the objective that matters
  • GPTQ, built from the Optimal Brain Surgeon equations, rescuing 3-bit weights
  • AWQ, SmoothQuant, and Hadamard rotation — implemented and compared
  • The GGUF K-quant byte layout that powers llama.cpp, reconstructed byte for byte
  • The quality cliff where low-bit quantization collapses — and exactly why no algorithm can save it
  • A capstone that quantizes a real model end to end and explains every number in the final size × accuracy × speed table
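The first item in this list can be checked with a few lines of standard-library Python. The sketch below is not the book's code: it assumes a signal uniformly distributed on [-1, 1] and a uniform quantizer that reconstructs at bin centers, and the names `quantize_uniform` and `measure` are hypothetical. Under those assumptions the rounding error is uniform on [-Δ/2, Δ/2], so its mean square is Δ²/12, and each extra bit halves Δ, which raises the signal-to-noise ratio by 20·log10(2) ≈ 6.02 dB.

```python
import math
import random

def quantize_uniform(x, bits, lo=-1.0, hi=1.0):
    """Map x to the center of one of 2**bits equal-width bins on [lo, hi]."""
    levels = 2 ** bits
    step = (hi - lo) / levels
    idx = min(int((x - lo) / step), levels - 1)  # clamp the x == hi edge case
    return lo + (idx + 0.5) * step, step

def measure(bits, n=100_000, seed=0):
    """Return (measured MSE, predicted step**2 / 12, measured SQNR in dB)."""
    rng = random.Random(seed)
    err_power = 0.0
    sig_power = 0.0
    step = 0.0
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        q, step = quantize_uniform(x, bits)
        err_power += (x - q) ** 2
        sig_power += x * x
    sqnr_db = 10 * math.log10(sig_power / err_power)
    return err_power / n, step ** 2 / 12, sqnr_db

if __name__ == "__main__":
    previous = None
    for bits in (3, 4, 5, 6):
        mse, predicted, sqnr = measure(bits)
        gain = "" if previous is None else f"  (+{sqnr - previous:.2f} dB)"
        print(f"{bits} bits: mse={mse:.3e}  step^2/12={predicted:.3e}  "
              f"sqnr={sqnr:.2f} dB{gain}")
        previous = sqnr
```

Real weight distributions are not uniform, so the absolute SQNR will differ from this idealized case. The per-bit slope is the part that carries over whenever the step size is small relative to the spread of the data.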

Who it's for

Engineers and researchers who want to understand quantization, not just apply it — people who would rather know why INT4 works than memorize that it does. If you can read Python and basic linear algebra, you can follow every derivation and reproduce every experiment.


How it's taught

The reference model throughout is TinyGPT, a small GPT-style decoder trained from scratch on Shakespeare — chosen so every experiment runs on a single CPU core and every result is reproducible. Seventeen chapters, six parts, fifty-plus figures, and not a single invented number.

This is quantization as a small set of principles — representation, noise, propagation, outliers, algorithms, kernels — that together let you predict exactly what happens when you take the bits away.

Build it, break it, measure it.


About the Author

Hatem M.

Hatem M. is a programmer and author. He is the creator of C++ Algorithmic Mastery: 1000 Challenges from Beginner to Legendary...

Building on that challenge-driven philosophy, he now explores a deeper engineering challenge: creating an LLM inference engine from scratch in C++. LLM Quantization: From the Bits Up comes out of that work — turning the same build-it-from-scratch, measure-everything lens on the quantization techniques that make modern inference small and fast.

He is also the author of C++ Autopsy...


Table of Contents

LLM Quantization: From the Bits Up (Book Four), by Hatem M.

A build-it, break-it, measure-it approach: every concept built from scratch, every number measured.

17 chapters, 6 parts, 50+ figures, 100% measured

Part I: Foundations

  0. Notation, Tools, and the Baseline
  1. The Economics of Bits
     the memory wall, the roofline, what shrinking actually buys
  2. The Quantization Map
     the affine map, symmetric vs asymmetric, rounding and clipping
  3. Granularity

Part II: The Mathematics of Loss

  4. Quantization Noise
     the Δ²/12 law and 6.02 dB per bit, derived and verified
  5. Error Propagation
     why output error, not weight error, is the right objective

Part III: The Outlier Problem

  6. Weights versus Activations
  7. Emergent Outliers
     why one bad channel destroys a whole layer
  8. Taming Outliers: LLM.int8() and SmoothQuant

Part IV: The PTQ Algorithms

  9. RTN and Calibration
  10. GPTQ
      derived from Optimal Brain Surgeon, implemented from scratch
  11. AWQ and Rotation

Part V: Representation and Kernels

  12. GGUF K-Quants, Byte by Byte
      the llama.cpp format reconstructed byte for byte
  13. Dequantization and Direct Block Matmul

Part VI: Measuring and Breaking

  14. The Quality Cliff
      where low-bit quantization collapses, and exactly why
  15. KV-Cache Quantization
  16. Capstone: Quantizing a Model End to End
      the full size × accuracy × speed table, every number explained
