LLM Quantization

Name: LLM Quantization
Brand: Leanpub
Price: 18.00 USD
Availability: InStock

From the Bits Up

This book is 100% complete. Last updated on 2026-06-29.

Hatem M.

Anyone can run INT4 and read off the accuracy drop. This book explains why that number is what it is — building every quantization method from scratch, breaking it on purpose, and measuring the result. Quantization, from the bits up.

Minimum price

$18.00

Buying multiple copies for your team? See below for a discount!

PDF

EPUB

About the Book

Most explanations of model quantization hand you a recipe and a number: "use INT4, you'll lose about 1% accuracy." This book refuses to stop there. It asks why — and answers from the bits up.

Starting from a single weight and the question of how few bits can represent it, the book builds the entire field from first principles. Every mechanism is coded from scratch in tested Python and PyTorch, deliberately pushed until it breaks, and measured with real numbers produced by executed code. Nothing is asserted; everything is demonstrated. When the book says GPTQ makes 3-bit weights free, there is a measured perplexity behind it. When it says one outlier channel costs 40 dB, there is a derivation and a verification.
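The description does not show the derivation behind the 40 dB figure. One plausible reading, assumed here, is a single outlier 100 times larger than the rest of a tensor that shares one absmax scale: the step size grows 100x, so the noise power on every other weight grows by 20·log10(100) = 40 dB. The sketch below checks that reading using only the Python standard library. The function names, the 12-bit width, and the uniformly distributed weights are illustrative assumptions, not the book's code.

```python
import math
import random

def absmax_quantize(values, bits):
    """Symmetric round-to-nearest with one scale shared by the whole tensor."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax
    return [round(v / scale) * scale for v in values]

def noise_power(values, bits, count):
    """Mean squared quantization error over the first `count` entries only."""
    deq = absmax_quantize(values, bits)
    return sum((v - d) ** 2 for v, d in zip(values[:count], deq[:count])) / count

def outlier_penalty_db(outlier=100.0, bits=12, n=50_000, seed=0):
    """Extra noise (in dB) on the ordinary weights after one outlier joins the tensor."""
    rng = random.Random(seed)
    body = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # assumed "ordinary" weights
    clean = noise_power(body, bits, n)
    spiked = noise_power(body + [outlier], bits, n)    # same weights, one extra outlier
    return 10 * math.log10(spiked / clean)

if __name__ == "__main__":
    predicted = 20 * math.log10(100.0)  # step grows 100x, noise power grows 100**2
    print(f"predicted {predicted:.1f} dB, measured {outlier_penalty_db():.1f} dB")
```

The 12-bit width keeps the inflated step small compared with the spread of the ordinary weights, so the uniform-noise model still applies. At coarser widths the measured penalty can drift somewhat from the 40 dB prediction.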

What you'll build and measure

The Δ²/12 noise law and the 6.02 dB-per-bit rule — derived, then verified in code
A proof of why output error, not weight error, is the objective that matters
GPTQ, built from the Optimal Brain Surgeon equations, rescuing 3-bit weights
AWQ, SmoothQuant, and Hadamard rotation — implemented and compared
The GGUF K-quant byte layout that powers llama.cpp, reconstructed byte for byte
The quality cliff where low-bit quantization collapses — and exactly why no algorithm can save it
A capstone that quantizes a real model end to end and explains every number in the final size × accuracy × speed table
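The first item in the list above can be checked in a few lines. The sketch below assumes inputs drawn uniformly from [-1, 1) and a uniform grid of 2^bits bins with midpoint reconstruction. Under those assumptions the error is uniform over one step Δ, its variance is Δ²/12, and each extra bit halves Δ, which adds 20·log10(2) ≈ 6.02 dB of SNR. The names `quantize` and `measure` are hypothetical helpers, not taken from the book.

```python
import math
import random

def quantize(x, bits, lo=-1.0, hi=1.0):
    """Map x to the midpoint of its bin on a uniform grid of 2**bits bins."""
    step = (hi - lo) / 2 ** bits
    idx = int((x - lo) / step)             # bin index (x >= lo, so int() floors)
    idx = min(max(idx, 0), 2 ** bits - 1)  # clamp the x == hi edge case
    return lo + (idx + 0.5) * step

def measure(bits, n=50_000, seed=0):
    """Return (measured noise power, predicted step**2 / 12, measured SNR in dB)."""
    rng = random.Random(seed)
    step = 2.0 / 2 ** bits
    signal = noise = 0.0
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        err = x - quantize(x, bits)
        signal += x * x
        noise += err * err
    return noise / n, step ** 2 / 12, 10 * math.log10(signal / noise)

if __name__ == "__main__":
    for bits in (4, 6, 8):
        measured, predicted, snr_db = measure(bits)
        print(f"{bits} bits: noise {measured:.3e} vs {predicted:.3e}, SNR {snr_db:.2f} dB")
```

Comparing the SNR at two bit widths and dividing by the difference in bits should land close to 6.02 dB per bit. Non-uniform inputs such as Gaussian weights shift the absolute SNR, but under the same high-resolution assumption the per-bit slope stays the same.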

Who it's for

Engineers and researchers who want to understand quantization, not just apply it — people who would rather know why INT4 works than memorize that it does. If you can read Python and basic linear algebra, you can follow every derivation and reproduce every experiment.

How it's taught

The reference model throughout is TinyGPT, a small GPT-style decoder trained from scratch on Shakespeare — chosen so every experiment runs on a single CPU core and every result is reproducible. Seventeen chapters, six parts, fifty-plus figures, and not a single invented number.

This is quantization as a small set of principles — representation, noise, propagation, outliers, algorithms, kernels — that together let you predict exactly what happens when you take the bits away.

Build it, break it, measure it.

Team Discounts

Get a team discount on this book!

Up to 3 members: minimum price $45.00, suggested price $45.00
Up to 5 members: minimum price $72.00, suggested price $72.00
Up to 10 members: minimum price $126.00, suggested price $126.00
Up to 15 members: minimum price $180.00, suggested price $180.00
Up to 25 members: minimum price $250.00, suggested price $250.00

Bundles

Bundles that include this book

Build, Debug, Infer, Quantize — A C++ & LLM Bundle (4 books): minimum price $44.00, suggested price $44.00, bought separately $62.60
Built and Proven — C++, SQL, LLMs & Claude (8 books): minimum price $69.00, suggested price $69.00, bought separately $107.80

About the Author

Hatem M.

Hatem M. is a programmer and author. He is the creator of C++ Algorithmic Mastery: 1000 Challenges from Beginner to Legendary...

Building on that challenge-driven philosophy, he now explores a deeper engineering challenge: creating an LLM inference engine from scratch in C++. LLM Quantization: From the Bits Up comes out of that work — turning the same build-it-from-scratch, measure-everything lens on the quantization techniques that make modern inference small and fast.

He is also the author of C++ Autopsy...

Table of Contents

LLM Quantization: From the Bits Up (Book Four), by Hatem M.

A build-it, break-it, measure-it approach — every concept built from scratch, every number measured.

17 chapters • 6 parts • 50+ figures • 100% measured

Part I Foundations

0. Notation, Tools, and the Baseline
1. The Economics of Bits: the memory wall, the roofline, what shrinking actually buys
2. The Quantization Map: the affine map, symmetric vs asymmetric, rounding and clipping
3. Granularity

Part II The Mathematics of Loss

4. Quantization Noise: the Δ²/12 law and 6.02 dB per bit, derived and verified
5. Error Propagation: why output error, not weight error, is the right objective

Part III The Outlier Problem

6. Weights versus Activations
7. Emergent Outliers: why one bad channel destroys a whole layer
8. Taming Outliers: LLM.int8() and SmoothQuant

Part IV The PTQ Algorithms

9. RTN and Calibration
10. GPTQ: derived from Optimal Brain Surgeon, implemented from scratch
11. AWQ and Rotation

Part V Representation and Kernels

12. GGUF K-Quants, Byte by Byte: the llama.cpp format reconstructed byte for byte
13. Dequantization and Direct Block Matmul

Part VI Measuring and Breaking

14. The Quality Cliff: where low-bit quantization collapses, and exactly why
15. KV-Cache Quantization
16. Capstone: Quantizing a Model End to End: the full size × accuracy × speed table, every number explained

Get the free sample chapters

Free sample chapters are available in PDF and EPUB, and can also be read online.

The Leanpub 60 Day 100% Happiness Guarantee

Within 60 days of purchase you can get a 100% refund on any Leanpub purchase, in two clicks.

Free Updates. DRM Free.

If you buy a Leanpub book, you get free updates for as long as the author updates the book! Many authors use Leanpub to publish their books in-progress, while they are writing them. All readers get free updates, regardless of when they bought the book or how much they paid (including free).

Most Leanpub books are available in PDF (for computers) and EPUB (for phones, tablets and Kindle). The formats that a book includes are shown at the top right corner of this page.

Finally, Leanpub books don't have any DRM copy-protection nonsense, so you can easily read them on any supported device.
