Build an LLM Inference Engine in C++

Name: Build an LLM Inference Engine in C++
Brand: Leanpub
Price: 24.00 USD
Availability: InStock

A Challenge-Driven Guide to Building a CPU-First Inference Engine in C++20

This book is 100% complete. Last updated on 2026-07-11.

Hatem M.

Build a complete LLM inference engine in C++, from a blank project to a working Transformer that loads a real model and generates text. Forged one challenge at a time, with tests that prove every piece works before you move on.

Minimum price

$24.00

Buying multiple copies for your team? See below for a discount!

PDF

EPUB

About the Book

Build an LLM Inference Engine in C++ — Through Challenges

You don't truly understand how large language models run until you've built the engine yourself.

This book takes you from a blank C++ project to a complete, working inference engine that loads a real Llama-family model and generates text, one challenge at a time.

What you'll build:

- A strided tensor system with zero-copy views and arena allocation

- Math kernels: RMSNorm, SwiGLU, softmax, GEMM with SIMD

- A byte-level BPE tokenizer

- A full Transformer: RoPE, GQA, Flash Attention, KV Cache

- int8/int4 quantization with direct block multiplication

- GGUF model loading with mmap

- Sampling, streaming, speculative decoding, and continuous batching

- An optional CUDA capstone for the heaviest kernels
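
To give a feel for the "math kernels" item in the list above, here is a minimal scalar sketch of two of the named kernels, RMSNorm and softmax. It is not taken from the book: the function names `rmsnorm` and `softmax`, their signatures, and the default `eps` are assumptions chosen for illustration, and a real engine would likely work on strided tensor views and use SIMD rather than `std::vector`.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the book's API): RMSNorm over a single row.
// out[i] = x[i] / sqrt(mean(x^2) + eps) * weight[i]
// Assumes x and weight have the same, non-zero length.
inline std::vector<float> rmsnorm(const std::vector<float>& x,
                                  const std::vector<float>& weight,
                                  float eps = 1e-5f) {
    float sum_sq = 0.0f;
    for (float v : x) sum_sq += v * v;
    const float mean_sq = sum_sq / static_cast<float>(x.size());
    const float inv_rms = 1.0f / std::sqrt(mean_sq + eps);

    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i] = x[i] * inv_rms * weight[i];
    }
    return out;
}

// Hypothetical helper (not the book's API): in-place softmax.
// Subtracting the maximum before exponentiating keeps exp() from
// overflowing on large logits without changing the result.
inline void softmax(std::vector<float>& logits) {
    if (logits.empty()) return;
    const float max_v = *std::max_element(logits.begin(), logits.end());
    float sum = 0.0f;
    for (float& v : logits) {
        v = std::exp(v - max_v);
        sum += v;
    }
    for (float& v : logits) v /= sum;
}
```

Calling `softmax` on a vector of raw logits turns it into probabilities that sum to 1, which is the form a sampling step would consume. `rmsnorm` returns a new vector so the input row stays untouched.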

By Unit 14, the engine runs a real model on CPU. Every concept earns its place right after you've built the thing it improves.

Who this is for: C++20 developers comfortable with algorithms and memory layout who want to understand what actually happens inside an LLM runtime, not by reading but by building.

Team Discounts

Get a team discount on this book!

- Up to 3 members: minimum price $60.00, suggested price $60.00

- Up to 5 members: minimum price $96.00, suggested price $96.00

- Up to 10 members: minimum price $168, suggested price $168

- Up to 15 members: minimum price $240, suggested price $240

- Up to 25 members: minimum price $360, suggested price $360

Bundles

Bundles that include this book

- Build, Debug, Infer, Quantize — A C++ & LLM Bundle (4 books): minimum price $44.00, suggested price $44.00, bought separately $62.60

- Built and Proven — C++, SQL, LLMs & Claude (8 books): minimum price $69.00, suggested price $69.00, bought separately $107.80

Author

About the Author

Hatem M.

Hatem M. is a programmer and technical author whose work focuses on modern C++, large language models, and AI systems.

His books combine first-principles explanations with complete implementations and reproducible experiments. They include C++ Algorithmic Mastery, an eight-volume series on algorithms and problem solving; Build an LLM Inference Engine in C++, which constructs a GPT-style inference engine from scratch; LLM Quantization: From the Bits Up, which develops the theory and practice of neural network quantization from the bit level upward; and C++ Autopsy, a forensic investigation of ten subtle C++ bugs that compiled successfully, ran correctly, and still produced the wrong answers.

Table of Contents

Unit 0 — Foundation & Setup
Unit 1 — Tensor & Memory
Unit 2 — Math Kernels
Unit 3 — Matrix Multiplication
Unit 4 — Tokenizer
Unit 5 — Transformer
Unit 6 — Mixture of Experts
Unit 7 — KV Cache
Unit 8 — Flash Attention
Unit 9 — Quantization
Unit 10 — Model Loading
Unit 11 — Sampling & Generation
Unit 12 — Speculative Decoding
Unit 13 — Serving & Batching
Unit 14 — Performance (Engine Complete)
Unit 15 — Capstone: GPU/CUDA

Get the free sample chapters

Free sample chapters are available in PDF and EPUB, and the sample can also be read online.

The Leanpub 60 Day 100% Happiness Guarantee

Within 60 days of purchase you can get a 100% refund on any Leanpub purchase, in two clicks.

Free Updates. DRM Free.

If you buy a Leanpub book, you get free updates for as long as the author updates the book! Many authors use Leanpub to publish their books in-progress, while they are writing them. All readers get free updates, regardless of when they bought the book or how much they paid (including free).

Most Leanpub books are available in PDF (for computers) and EPUB (for phones, tablets and Kindle). The formats that a book includes are shown at the top right corner of this page.

Finally, Leanpub books don't have any DRM copy-protection nonsense, so you can easily read them on any supported device.
