Inside Large Language Models, Volumes I and II

Most LLM books pick a side. Either they explain the math without showing the code, or they show the code without explaining the math.

This is both. Two volumes. Eighteen chapters. From your first dot product in Chapter 5 to your fourth fine-tuning project in Chapter 17.

Volume I builds the transformer from scratch. Volume II takes it into production. Together they are the only LLM set that walks the full arc from "what is attention" to "ship a fine-tuned model on a budget."

If you have ever found a prompt trick that just worked and wished you knew why, or stared at a fine-tuning bill and wondered if you were doing it wrong, this is the set.

These books have a total suggested price of $73.98. Get them now for only $29.00!

About the Bundle

Inside Large Language Models is a two-volume, first-principles education in modern AI. Eighteen chapters. Two cover artworks. One companion codebase. Written for the engineer who has shipped with the OpenAI or Claude API and now wants to understand what is actually happening underneath, and to take that understanding into production.

Most LLM resources pick a side. The popular-science books explain the ideas without ever showing the math. The academic textbooks bury the ideas under a wall of notation. The blog posts hand-wave attention as "the model decides what to focus on." The library tours teach you which Hugging Face button to click without explaining why. None of these resources walks the full arc, and none of them speaks the language of an engineer who has work to ship.

These two volumes are that arc.

**Volume I: Foundations and the Complete Transformer (Chapters 1 to 9)**

Volume I takes you from the very first question, "what is a large language model, really?" to building and training a complete GPT-style model from scratch in PyTorch. Tokenisation. Embeddings. Positional encoding. Single-head and multi-head attention. Residual connections. Layer normalisation. Feed-forward networks. The language-modelling head. Every component gets a chapter that explains the problem it solves, the math behind it, and a working PyTorch implementation you can run on your laptop. The math is taught with arithmetic that a motivated high-school student can follow on paper. No graduate-level prerequisites. No "it can be shown that." No skipping the steps.

By the end of Volume I, you have written a transformer in PyTorch, trained it on real text, and watched it generate language. The mystery is gone. What is left is mastery.

**Volume II: Inference, Optimisation, and Fine-Tuning (Chapters 10 to 18)**

Volume II picks up where Volume I left off. The model is built. Now what? The next nine chapters answer that question end to end: how inference actually works token by token, how to align a model with human preferences using RLHF, how to fine-tune billion-parameter models on consumer hardware with LoRA and QLoRA, how to make production inference ten times cheaper, and how to build four real applied systems on real data. Contract classification. Legal-document QLoRA. Text-to-SQL. Function calling for AI agents and agentic workflows.

By the end of Volume II, you have fine-tuned four models on real domains, optimised them for production, and shipped a function-calling system that works.

**Across both volumes you will:**

**See every concept worked out by hand.** When the book introduces attention, you compute attention scores between three tokens with three-dimensional vectors and a calculator. When it introduces the KV cache, you watch a real attention computation grow token by token and see exactly which tensors get cached and which get recomputed. There is no hand-waving, no "the framework handles it," no skipping the math.

**Build the transformer block piece by piece, then take it into production.** Each component of the model gets a chapter in Volume I. Each production technique gets a chapter in Volume II. By the end you have a working understanding of every layer from token embedding to deployed function-calling agent, with running code throughout.

**Pick the right fine-tuning method for any problem.** Full fine-tuning, LoRA, QLoRA, parameter-efficient tuning, instruction tuning, RLHF with PPO. Each method gets a chapter that explains the problem it solves, the math behind it, the cost trade-off, and a working PyTorch implementation. By the end you can look at a new problem and pick the cheapest method that will actually solve it.

**Ship four end-to-end applied projects on real data.** A contract-type classifier (Chapter 14). A legal-document assistant fine-tuned with QLoRA on a real legal corpus (Chapter 15). A text-to-SQL system that translates natural language into working database queries (Chapter 16). A function-calling system that powers the AI agents and agentic workflows everyone is building right now (Chapter 17). Every project includes runnable code, a real dataset, and a step-by-step walkthrough from data preparation to deployed model.

**Prepare for an LLM-engineer interview.** Each chapter ends with ten to fifteen scenario-based questions modelled on real interviews at FAANG, AI labs, and AI-first startups. Across the two volumes you finish with 150-plus questions you can use to prepare for an LLM-engineer or AI-engineer interview, with the technical depth to answer them properly.

**Who these books are for:**

  • Software engineers who have shipped with the OpenAI or Claude API and want to understand the system underneath, and to start fine-tuning their own models for a fraction of the price.
  • Machine-learning engineers who need to take an open-weights model and adapt it to a specific business domain on a real budget, on real hardware.
  • Practitioners building AI agents and agentic systems who need to understand function calling at the level beneath the framework abstractions.
  • Engineering managers and tech leads who make build-versus-buy decisions about LLM features and want the technical depth to defend those decisions in front of a CFO.
  • Students and self-taught practitioners who want a real, technical foundation without a PhD-level prerequisite.

**What makes this bundle different:**

Most LLM resources cover either the foundations or the production techniques. Few cover both. None cover both in the same voice with the same example codebase. Inside Large Language Models is the only set that walks the complete arc, from your first dot product through to your fourth deployed fine-tuning project, in a single coherent voice with a single coherent companion repository.

The two volumes are written so each can stand alone. Read Volume I first if you want to understand the technology underneath. Read Volume II first if you have a deadline and need to ship a fine-tuned model this quarter. Either way, the second volume reads more deeply once both are in hand. Buying the bundle is the most economical way to acquire the complete journey, and the discount versus buying the two volumes separately is built into the price below.

Companion code: every listing across both volumes is available as a runnable Python file at https://github.com/ritesh-modi/inside-llm. Clone it. Run it. Modify it. Fine-tune it on your own data. Break it. Fix it.

About the author: Ritesh Modi is Head of AI at MarketOnce and a former Principal Forward Deployed Engineer at Microsoft. He writes about LLMs at https://www.riteshmodi.com.

About the Books

Inside Large Language Models for absolute beginners: Volume I

Simple Arithmetic and beginners Python based approach

What if you could understand how ChatGPT actually works, with nothing more than high-school algebra and a working laptop?

Inside Large Language Models, Volume I is the book the field has been missing: a plain-English, math-light, code-first introduction to the technology behind every modern AI assistant. No prior machine learning experience is assumed. No graduate-level mathematics is required. Every concept is walked through with simple arithmetic that a motivated high-school student can follow on paper.

Volume I takes you from the very first question, "what is a large language model, really?" to building and training a complete GPT-style model from scratch in Python. Along the way you will:

  • See every step worked out by hand. When the book introduces attention, you compute attention scores between three actual tokens with three-dimensional vectors and a calculator. When it introduces softmax, you apply softmax to a tiny list of numbers and watch the probabilities come out correctly. There is no hand-waving, no "it can be shown that," no skipping the math.
  • Build the transformer block, piece by piece. Single-head attention. Multi-head attention. Residual connections. Layer normalisation. Feed-forward networks. The language-modelling head. Every component gets a chapter that explains the problem it solves, the math behind it, and a working PyTorch implementation you can run on your laptop.
  • Learn the math the way it should be taught. The dot product is presented as a similarity score with a worked example. Softmax is presented as a soft winner-take-all rule with a four-row computation. Backpropagation is walked through a tiny one-weight network with arithmetic at every step before scaling to a 96-layer transformer. If you can multiply two numbers, you can follow this book.
  • Train your own GPT. The final chapter assembles everything into a complete, runnable Python implementation that trains on a small text corpus and generates new text. You will run it. You will modify it. You will understand exactly what every line does.
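
To give a flavour of the by-hand computation described above, here is a minimal sketch in plain Python. It is not a listing from the book. The token names and vector values are made up for illustration, and each vector stands in for both the query and the key; a real transformer would first apply learned projections.

```python
import math

def dot(a, b):
    """Dot product: a similarity score between two vectors."""
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical three-dimensional vectors for three tokens (illustrative values).
tokens = ["the", "cat", "sat"]
vectors = [
    [1.0, 0.0, 1.0],
    [0.0, 2.0, 0.0],
    [1.0, 1.0, 0.0],
]

def attention_weights(query_index):
    """Scaled dot-product attention weights of one token over all three."""
    q = vectors[query_index]
    scale = math.sqrt(len(q))  # divide by sqrt of the vector dimension
    scores = [dot(q, k) / scale for k in vectors]
    return softmax(scores)

for i, tok in enumerate(tokens):
    print(tok, [round(w, 3) for w in attention_weights(i)])
```

Each printed row is one token's attention distribution over the three tokens, so every row sums to 1. With these particular values, "cat" puts most of its weight on itself because its raw dot product with its own vector (4) is the largest in its row.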

Who this book is for:

  • Software engineers who want to move beyond calling APIs and actually understand the systems they ship.
  • Students who are tired of textbooks that hide the math behind notation and want to see every step.
  • Curious readers with a high-school background who have heard about transformers and want a real, technical understanding without a PhD-level prerequisite.
  • Practitioners moving into AI roles who need a foundation that goes deeper than online tutorials.

What makes this book different:

Most LLM books fall into one of two camps: the popular-science books that explain the ideas without ever showing the math, and the academic textbooks that bury the ideas under a wall of notation. Inside Large Language Models takes a third path. It treats the reader as a serious adult who wants the real machinery, but it refuses to require any background the reader does not already have. Every formula is preceded by a plain-English paragraph that explains what the formula is doing. Every code listing is followed by a line-by-line table that explains what each line is doing. Every concept is paired with a concrete numerical example you can verify on paper.

Volume I is the foundation: tokenisation, embeddings, positional encoding, attention in all its forms, the complete transformer block, training, and a from-scratch GPT. Volume II takes those foundations into production: inference, alignment, fine-tuning, and four end-to-end fine-tuning projects.

By the end of Volume I, you will not just know how a transformer works. You will have built one yourself, trained it, and watched it generate text. The mystery will be gone. What is left is mastery.

Companion code: every listing in the book is available as a runnable Python file at https://github.com/ritesh-modi/inside-llm. Clone it, run it, modify it, break it, fix it.

Inside Large Language Models for absolute beginners: Volume II

Simple Arithmetic and beginners Python based approach

What if you could turn any open-weights language model into a domain expert that knows your contracts, your databases, and your tools, and ship it without a research lab budget?

Inside Large Language Models, Volume II is the book that takes the foundation built in Volume I and turns it into a working production system. It is the book for the engineer who has stopped wondering how attention works and started wondering why their fine-tuning bill is bigger than their server bill, why a seven-billion-parameter model is the largest they can fit on their hardware, and how the production teams shipping LLMs to millions of users do it without a research lab budget.

Volume II picks up where Volume I left off. The transformer is built. The model is trained. Now what? The next nine chapters answer that question end to end: how inference actually works token by token, how to align a model with human preferences using RLHF, how to fine-tune billion-parameter models on consumer hardware with LoRA and QLoRA, how to make production inference ten times cheaper, and how to build four real applied systems on real data.

Along the way you will:

See every production technique worked out the same way the math was in Volume I. When the book introduces the KV cache, you watch a concrete attention computation grow token by token and see exactly which tensors get cached and which get recomputed. When it introduces quantisation, you take a real weight matrix from FP32 down to INT4 and check that the dequantised version still produces sensible outputs. There is no "this is an industry standard," no "the framework handles it." You see the bytes.

Understand fine-tuning at the level where you can pick the right tool. Full fine-tuning, LoRA, QLoRA, parameter-efficient tuning, instruction tuning, RLHF with PPO. Each method gets a chapter that explains the problem it solves, the math behind it, the cost trade-off, and a working PyTorch implementation you can run on your laptop. By the end you can look at a new problem and pick the cheapest method that will actually solve it, rather than reaching for whatever was in the last tutorial you read.

Build four end-to-end applied projects on real data.

  • A contract-type classifier trained on real legal documents (Chapter 14).
  • A legal-document assistant fine-tuned with QLoRA on a real legal corpus (Chapter 15).
  • A text-to-SQL system that translates natural language into working database queries (Chapter 16).
  • A function-calling system that teaches an LLM to use your APIs and powers the AI agents and agentic workflows everyone is building right now (Chapter 17).

Every project has runnable code, a real dataset, and a step-by-step walkthrough from data preparation to a deployed model.

Make production inference fast and cheap. The KV cache. Prefix caching. Quantisation. Continuous batching. Speculative decoding. Each one is broken down with concrete examples, real numbers, and the reasoning behind why it works. You will understand why putting variables at the end of your prompt makes API calls ten times cheaper, why a 70-billion-parameter model fits on a single consumer GPU after QLoRA, and why the same prompt sometimes produces different outputs at temperature zero.
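
The KV cache idea mentioned above can be sketched in a few lines of plain Python. This is an illustrative toy, not the book's implementation: `to_key` and `to_value` are hypothetical stand-ins for learned projections, the token vectors are invented, and the whole sequence is supplied up front rather than generated one token at a time.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Weighted average of values, weighted by query-key similarity."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    return [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))]

# Hypothetical stand-ins for the learned key and value projections.
def to_key(x):
    return [2.0 * c for c in x]

def to_value(x):
    return [c + 1.0 for c in x]

def decode_without_cache(tokens):
    """Re-project every earlier token at every step."""
    outputs, projected = [], 0
    for t in range(len(tokens)):
        prefix = tokens[: t + 1]
        keys = [to_key(x) for x in prefix]
        values = [to_value(x) for x in prefix]
        projected += len(prefix)  # tokens projected at this step
        outputs.append(attend(tokens[t], keys, values))
    return outputs, projected

def decode_with_cache(tokens):
    """Project each token once and keep its key and value for later steps."""
    outputs, projected = [], 0
    key_cache, value_cache = [], []
    for x in tokens:
        key_cache.append(to_key(x))
        value_cache.append(to_value(x))
        projected += 1  # only the newest token is projected
        outputs.append(attend(x, key_cache, value_cache))
    return outputs, projected

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
slow, slow_count = decode_without_cache(tokens)
fast, fast_count = decode_with_cache(tokens)
print(slow == fast)            # True: caching does not change the outputs
print(slow_count, fast_count)  # 10 4
```

For four tokens the uncached loop projects 1 + 2 + 3 + 4 = 10 tokens while the cached loop projects 4, and the gap grows quadratically with sequence length. The price is the memory needed to hold the cached keys and values.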

Who this book is for:

Software engineers who have shipped with the OpenAI or Claude API and are tired of paying for capabilities they could fine-tune themselves for a fraction of the price.

Machine-learning engineers who need to take an open-weights model and adapt it to a specific business domain, on a real budget, on real hardware.

Practitioners building agents and agentic systems who need to understand function calling at the level beneath the framework abstractions.

Engineering managers and tech leads who need to make build-versus-buy decisions about LLM features and want the technical depth to defend those decisions in front of a CFO.

What makes this book different:

Most fine-tuning content online is either toy examples on famous datasets (which never transfer to real work) or library tours that teach you which Hugging Face button to click without explaining why.

Inside Large Language Models, Volume II takes the third path. It teaches the underlying mechanics of every production technique, then walks through four real applied projects from data preparation to deployed model. You finish the book with code you can actually use, models you have actually trained, and the judgement to know which technique fits your next problem before you have written a single line of code.
