Build, Debug, Infer, Quantize — A C++ & LLM Bundle

Four books. One complete arc: master C++, debug like a forensic investigator, build an LLM inference engine from scratch, and understand quantization down to the bit level. Everything in one place, for engineers who want to understand — not just use.


Also available for 2 book credits with a Reader Membership

These books have a total suggested price of $56.00. Get them now for only $40.00!

About the Bundle

Four books for engineers who want to understand — not just use.

From first-principles algorithms to LLM inference and quantization: a library built for the kind of developer who reads the source code when the docs aren't enough.

C++ Algorithmic Mastery: 1000 Challenges from Beginner to Legendary. One thousand problems across eight progressive volumes, each with a hint, a tested solution, complexity analysis, and a plain-language explanation. A complete, structured path from your first line of C++ to the frontiers of competitive programming.

C++ Autopsy. Ten forensic investigations into programs that compile cleanly, run without crashing, and quietly return the wrong answer. Each case unfolds like a real investigation: evidence, suspects, real diagnostic tool output, and a verdict backed by the C++ standard. Undefined behavior, memory corruption, concurrency bugs, optimizer assumptions: the kind of defects that reach production.

Build an LLM Inference Engine in C++. You don't truly understand how large language models run until you've built the engine yourself. Starting from a blank C++20 project, you'll build a complete working inference engine (strided tensors, BPE tokenizer, Transformer with Flash Attention and KV Cache, int4/int8 quantization, GGUF loading with mmap) until it runs a real Llama-family model on CPU.

LLM Quantization. Starting from a single weight, this book builds the entire field of model quantization from first principles. GPTQ, AWQ, SmoothQuant, Hadamard rotation, the GGUF K-quant byte layout: every mechanism implemented in Python and PyTorch, every claim verified with measured numbers. Nothing asserted. Everything demonstrated.

Who this is for: CS students who want more than their curriculum gives them. Engineers preparing for systems-level technical interviews. AI/ML practitioners who want to understand what actually happens inside an LLM runtime. Anyone who has ever fixed a bug by guessing, and wants to stop.

About the Books

C++ Algorithmic Mastery: 1000 Challenges from Beginner to Legendary

Solved and Tested — A Complete 8-Volume Journey from Beginner to Advanced Algorithms

1000 problems. Eight volumes. One complete journey from your very first line of C++ to the frontiers of modern algorithms.

This is the complete collection: 1000 carefully designed C++ problems, organized into 8 progressive volumes that take you from absolute beginner to advanced algorithmic mastery — with every single solution compiled and tested on g++.

What you get:

- 1000 problems, carefully ordered by difficulty across 8 volumes

- A complete, working C++ solution for every problem

- Clear explanations, hints, and complexity analysis throughout

- A structured path, so you never have to guess what to learn next

Every problem includes:

- A clear statement with examples and constraints

- A hint to point you in the right direction

- A full, tested C++ solution

- Time and space complexity

- A plain-language explanation of how it works

The 8 volumes:

1. Absolute Beginner — I/O, Variables, Conditions, Loops, Functions, Arrays

2. Beginner — Strings, Recursion, Basic Math, Simple Data Structures

3. Elementary — Sorting, Searching, Core Algorithms, the STL

4. Upper-Intermediate — Data Structures, Two Pointers, Prefix Sums, Greedy

5. Advanced — Graphs, Dynamic Programming, Trees, Shortest Paths

6. Expert — Advanced Strings, Heavy Graph Machinery, Computational Geometry

7. Master — Suffix Structures, Flow & Matching, Number Theory, Combinatorics

8. Genius — Polynomials & Transforms, Advanced Graphs, Capstone Challenges

Who it's for:

Anyone learning C++ who wants a structured, guided path — students, self-taught programmers, and anyone preparing for coding interviews or competitive programming who wants every step explained, not just a list of problems.

One thousand problems. Eight volumes. A complete roadmap from foundations to mastery.

Build an LLM Inference Engine in C++

A Challenge-Driven Guide to Building a CPU-First Inference Engine in C++20

Build an LLM Inference Engine in C++ — Through Challenges

You don't truly understand how large language models run until you've built the engine yourself.

This book takes you from a blank C++ project to a complete, working inference engine that loads a real Llama-family model and generates text, one challenge at a time.

What you'll build:

- A strided tensor system with zero-copy views and arena allocation

- Math kernels: RMSNorm, SwiGLU, softmax, GEMM with SIMD

- A byte-level BPE tokenizer

- A full Transformer: RoPE, GQA, Flash Attention, KV Cache

- int8/int4 quantization with direct block multiplication

- GGUF model loading with mmap

- Sampling, streaming, speculative decoding, and continuous batching

- An optional CUDA capstone for the heaviest kernels
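
To make the quantization item in the list above concrete, here is a minimal sketch of one plausible reading of "direct block multiplication": weights and activations are stored as int8 codes with one float scale per block, and a dot product is accumulated on the integer codes before the scales are applied. The book itself builds its engine in C++20; this sketch uses Python only for brevity. The block size of 32 and the names `quantize_blocks` and `dot_quantized` are assumptions made for illustration, not taken from the book.

```python
import random

BLOCK = 32  # assumed block size, chosen only for illustration


def quantize_blocks(values, block=BLOCK):
    """Split values into blocks; keep one float scale plus int8 codes per block."""
    blocks = []
    for start in range(0, len(values), block):
        chunk = values[start:start + block]
        amax = max(abs(v) for v in chunk)
        scale = amax / 127.0 if amax > 0 else 1.0  # map the largest value to +/-127
        codes = [max(-127, min(127, round(v / scale))) for v in chunk]
        blocks.append((scale, codes))
    return blocks


def dot_quantized(a_blocks, b_blocks):
    """Dot product without dequantizing element by element:
    integer multiply-accumulate inside each block, then one float
    multiply by the two block scales."""
    total = 0.0
    for (scale_a, codes_a), (scale_b, codes_b) in zip(a_blocks, b_blocks):
        acc = sum(x * y for x, y in zip(codes_a, codes_b))  # pure integer math
        total += scale_a * scale_b * acc
    return total


random.seed(0)
w = [random.gauss(0.0, 1.0) for _ in range(256)]
x = [random.gauss(0.0, 1.0) for _ in range(256)]

exact = sum(a * b for a, b in zip(w, x))
approx = dot_quantized(quantize_blocks(w), quantize_blocks(x))
print(f"exact {exact:.4f}  quantized {approx:.4f}")
```

The design point is that the inner loop touches only small integers, which is what makes this style of kernel attractive on CPU; the floating-point work is one multiply per block rather than one per element.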

By Unit 14, the engine runs a real model on CPU. Every concept earns its place right after you've built the thing it improves.

Who this is for: C++20 developers comfortable with algorithms and memory layout who want to understand what actually happens inside an LLM runtime, not by reading but by building.

C++ AUTOPSY

Ten Investigations into Code that Compiled, Ran, and Lied

Every bug in this book passed the compiler. Every bug ran to completion. Every bug returned exit code zero. None of them told the truth.

Most C++ books teach you how to write code. This one teaches you how to investigate it.

C++ Autopsy presents ten forensic investigations into programs that appear perfectly healthy: they compile cleanly, execute normally, and quietly produce the wrong result. No syntax errors. No crashes. No obvious clues. Just evidence waiting to be examined.

Each case unfolds like a real investigation. You begin at the crime scene with a minimal, reproducible program. You examine the evidence, question the suspects—including one deliberate red herring—inspect real diagnostic output from professional tools, identify the true cause, and verify the fix. Every conclusion is backed by the C++ standard, compiler behavior, or observable runtime evidence.

Along the way, you'll uncover some of the language's most deceptive pitfalls: undefined behavior, lifetime errors, memory corruption, concurrency bugs, numerical surprises, optimizer assumptions, and subtle violations of the Standard Library's contracts. Some cases are caught immediately by modern tools. Others pass every warning, every sanitizer, and every test—exactly the kinds of defects that reach production.

Whether you write systems software, libraries, game engines, or high-performance applications, this book will change the way you debug C++. Instead of asking, "Why did my program crash?" you'll learn to ask the more dangerous question:

Why did it appear to work?

For intermediate and advanced C++ developers. Every example compiles with C++20. Every case is real. Every verdict is earned.

LLM Quantization

From the Bits Up

Most explanations of model quantization hand you a recipe and a number: "use INT4, you'll lose about 1% accuracy." This book refuses to stop there. It asks why — and answers from the bits up.

Starting from a single weight and the question of how few bits can represent it, the book builds the entire field from first principles. Every mechanism is coded from scratch in tested Python and PyTorch, deliberately pushed until it breaks, and measured with real numbers produced by executed code. Nothing is asserted; everything is demonstrated. When the book says GPTQ makes 3-bit weights free, there is a measured perplexity behind it. When it says one outlier channel costs 40 dB, there is a derivation and a verification.


What you'll build and measure
  • The Δ²/12 noise law and the 6.02 dB-per-bit rule — derived, then verified in code
  • A proof of why output error, not weight error, is the objective that matters
  • GPTQ, built from the Optimal Brain Surgeon equations, rescuing 3-bit weights
  • AWQ, SmoothQuant, and Hadamard rotation — implemented and compared
  • The GGUF K-quant byte layout that powers llama.cpp, reconstructed byte for byte
  • The quality cliff where low-bit quantization collapses — and exactly why no algorithm can save it
  • A capstone that quantizes a real model end to end and explains every number in the final size × accuracy × speed table
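
As a taste of the first item in the list above, the two rules can be checked in a few lines. This is a minimal sketch, not the book's code: it assumes a uniform rounding quantizer with step Δ applied to samples drawn uniformly from [-1, 1], and the function names are illustrative. Under those assumptions the rounding error is uniform on [-Δ/2, Δ/2], so its mean square should sit close to Δ²/12, and halving Δ (one extra bit) should raise the signal-to-noise ratio by about 20·log10(2) ≈ 6.02 dB.

```python
import math
import random


def quantize(x, step):
    """Round x to the nearest multiple of step (uniform quantizer, no clipping)."""
    return step * round(x / step)


def noise_power(samples, step):
    """Mean squared quantization error over the samples."""
    return sum((x - quantize(x, step)) ** 2 for x in samples) / len(samples)


def sqnr_db(samples, bits, full_scale=1.0):
    """Signal-to-quantization-noise ratio in dB for a quantizer whose
    2**bits steps span [-full_scale, +full_scale]."""
    step = 2.0 * full_scale / (2 ** bits)
    signal = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(signal / noise_power(samples, step))


random.seed(0)
samples = [random.uniform(-1.0, 1.0) for _ in range(200_000)]

step = 2.0 / 2 ** 8                  # 8-bit step over [-1, 1]
measured = noise_power(samples, step)
predicted = step ** 2 / 12.0         # the noise law under test
print(f"measured noise {measured:.3e}, predicted {predicted:.3e}")

gain = sqnr_db(samples, 9) - sqnr_db(samples, 8)
print(f"SQNR gain for one extra bit: {gain:.2f} dB")
```

Real weight distributions are not uniform and real quantizers clip, so the agreement here is the idealized case; the interesting part is where and why it stops holding.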

Who it's for

Engineers and researchers who want to understand quantization, not just apply it — people who would rather know why INT4 works than memorize that it does. If you can read Python and basic linear algebra, you can follow every derivation and reproduce every experiment.


How it's taught

The reference model throughout is TinyGPT, a small GPT-style decoder trained from scratch on Shakespeare — chosen so every experiment runs on a single CPU core and every result is reproducible. Seventeen chapters, six parts, fifty-plus figures, and not a single invented number.

This is quantization as a small set of principles — representation, noise, propagation, outliers, algorithms, kernels — that together let you predict exactly what happens when you take the bits away.

Build it, break it, measure it.

The Leanpub 60 Day 100% Happiness Guarantee

Within 60 days of purchase you can get a 100% refund on any Leanpub purchase, in two clicks.

Free Updates. DRM Free.

If you buy a Leanpub book, you get free updates for as long as the author updates the book! Many authors use Leanpub to publish their books in-progress, while they are writing them. All readers get free updates, regardless of when they bought the book or how much they paid (including free).

Most Leanpub books are available in PDF (for computers) and EPUB (for phones, tablets and Kindle). The formats that a book includes are shown at the top right corner of this page.

Finally, Leanpub books don't have any DRM copy-protection nonsense, so you can easily read them on any supported device.
