
Inside LLM

Bought separately: $73.98
Minimum price: $29.00
Suggested price: $34.00
You pay: $34.00
Author earns: $27.20
You can also buy this bundle with 2 book credits. Get book credits with a Reader Membership or an Organization Membership for your team.
These books have a total suggested price of $73.98. Get them now for only $29.00!
About the Books

Inside Large Language Models for absolute beginners: Volume I

Simple arithmetic and a beginner's Python-based approach

What if you could understand how ChatGPT actually works, with nothing more than high-school algebra and a working laptop?

Inside Large Language Models, Volume I is the book the field has been missing: a plain-English, math-light, code-first introduction to the technology behind every modern AI assistant. No prior machine learning experience is assumed. No graduate-level mathematics is required. Every concept is walked through with simple arithmetic that a motivated high-school student can follow on paper.

Volume I takes you from the very first question, "what is a large language model, really?" to building and training a complete GPT-style model from scratch in Python. Along the way you will:

  • See every step worked out by hand. When the book introduces attention, you compute attention scores between three actual tokens with three-dimensional vectors and a calculator. When it introduces softmax, you apply softmax to a tiny list of numbers and watch the probabilities come out correctly. There is no hand-waving, no "it can be shown that," no skipping the math.
  • Build the transformer block, piece by piece. Single-head attention. Multi-head attention. Residual connections. Layer normalisation. Feed-forward networks. The language modeling head. Every component gets a chapter that explains the problem it solves, the math behind it, and a working PyTorch implementation you can run on your laptop.
  • Learn the math the way it should be taught. The dot product is presented as a similarity score with a worked example. Softmax is presented as a soft winner-take-all rule with a four-row computation. Backpropagation is walked through a tiny one-weight network with arithmetic at every step before scaling to a 96-layer transformer. If you can multiply two numbers, you can follow this book.
  • Train your own GPT. The final chapter assembles everything into a complete, runnable Python implementation that trains on a small text corpus and generates new text. You will run it. You will modify it. You will understand exactly what every line does.
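The kind of by-hand computation described above can be sketched in a few lines of plain Python. This is an illustrative example, not a listing from the book; the three tokens and their three-dimensional vectors are made-up values:

```python
import math

# Three hypothetical tokens, each with a made-up 3-dimensional vector.
tokens = {"the": [1.0, 0.0, 1.0],
          "cat": [0.0, 2.0, 0.0],
          "sat": [1.0, 1.0, 0.0]}

def dot(a, b):
    """Dot product as a similarity score between two vectors."""
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Soft winner-take-all: exponentiate, then normalise to sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Attention scores of "sat" against every token, scaled by sqrt(dimension).
query = tokens["sat"]
scores = [dot(query, v) / math.sqrt(3) for v in tokens.values()]
weights = softmax(scores)

print([round(w, 3) for w in weights])
```

The three weights come out summing to 1, which is exactly the property the book's softmax walkthrough has you verify on paper.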

Who this book is for:

  • Software engineers who want to move beyond calling APIs and actually understand the systems they ship.
  • Students who are tired of textbooks that hide the math behind notation and want to see every step.
  • Curious readers with a high-school background who have heard about transformers and want a real, technical understanding without a PhD-level prerequisite.
  • Practitioners moving into AI roles who need a foundation that goes deeper than online tutorials.

What makes this book different:

Most LLM books fall into one of two camps: the popular-science books that explain the ideas without ever showing the math, and the academic textbooks that bury the ideas under a wall of notation. Inside Large Language Models takes a third path. It treats the reader as a serious adult who wants the real machinery, but it refuses to require any background the reader does not already have. Every formula is preceded by a plain-English paragraph that explains what the formula is doing. Every code listing is followed by a line-by-line table that explains what each line is doing. Every concept is paired with a concrete numerical example you can verify on paper.

Volume I is the foundation: tokenisation, embeddings, positional encoding, attention in all its forms, the complete transformer block, training, and a from-scratch GPT. Volume II takes those foundations into production: inference, alignment, fine-tuning, and four end-to-end fine-tuning projects.

By the end of Volume I, you will not just know how a transformer works. You will have built one yourself, trained it, and watched it generate text. The mystery will be gone. What is left is mastery.

Companion code: every listing in the book is available as a runnable Python file at https://github.com/ritesh-modi/inside-llm. Clone it, run it, modify it, break it, fix it.

Inside Large Language Models for absolute beginners: Volume II

Simple arithmetic and a beginner's Python-based approach

What if you could turn any open-weights language model into a domain expert that knows your contracts, your databases, and your tools, and ship it without a research lab budget?

Inside Large Language Models, Volume II is the book that takes the foundation built in Volume I and turns it into a working production system. It is the book for the engineer who has stopped wondering how attention works and started wondering why their fine-tuning bill is bigger than their server bill, why a seven-billion-parameter model is the largest they can fit on their hardware, and how the production teams shipping LLMs to millions of users do it without a research-lab budget.

Volume II picks up where Volume I left off. The transformer is built. The model is trained. Now what? The next nine chapters answer that question end to end: how inference actually works token by token, how to align a model with human preferences using RLHF, how to fine-tune billion-parameter models on consumer hardware with LoRA and QLoRA, how to make production inference ten times cheaper, and how to build four real applied systems on real data.

Along the way you will:

See every production technique worked out the same way the math was in Volume I. When the book introduces the KV cache, you watch a concrete attention computation grow token by token and see exactly which tensors get cached and which get recomputed. When it introduces quantisation, you take a real weight matrix from FP32 down to INT4 and check that the dequantised version still produces sensible outputs. There is no "this is an industry standard," no "the framework handles it." You see the bytes.
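As a rough sketch of the quantisation round-trip that paragraph describes (made-up weights and simple symmetric scaling; real INT4 schemes add per-group scales and bit-packing):

```python
# Toy symmetric quantisation: FP32 weights -> 4-bit integers -> back again.
weights = [0.82, -0.41, 0.05, -0.97, 0.33]   # made-up FP32 weights

# Signed INT4 covers [-8, 7]; the scale maps the largest weight onto it.
scale = max(abs(w) for w in weights) / 7

quantised = [max(-8, min(7, round(w / scale))) for w in weights]
dequantised = [q * scale for q in quantised]

for w, q, d in zip(weights, quantised, dequantised):
    print(f"{w:+.2f} -> {q:+d} -> {d:+.3f}")
```

Each dequantised value lands within half a quantisation step of the original, which is the "still produces sensible outputs" check done with actual numbers.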

Understand fine-tuning at the level where you can pick the right tool. Full fine-tuning, LoRA, QLoRA, parameter-efficient tuning, instruction tuning, RLHF with PPO. Each method gets a chapter that explains the problem it solves, the math behind it, the cost trade-off, and a working PyTorch implementation you can run on your laptop. By the end you can look at a new problem and pick the cheapest method that will actually solve it, rather than reaching for whatever was in the last tutorial you read.
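The core idea behind LoRA, a frozen weight matrix plus a trainable low-rank update A·B, can be sketched without any framework. The sizes below are illustrative toys, not the book's code:

```python
import random

random.seed(0)

d, r = 4, 1  # hypothetical model width and LoRA rank (r << d)

# Frozen pretrained weight matrix W (d x d), never updated during fine-tuning.
W = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(d)]

# Trainable low-rank factors: A is d x r, B is r x d.
A = [[random.uniform(-0.1, 0.1) for _ in range(r)] for _ in range(d)]
B = [[random.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(r)]

def matmul(X, Y):
    """Plain-Python matrix multiply."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

# Effective weight at inference time: W + A @ B.
delta = matmul(A, B)
W_eff = [[W[i][j] + delta[i][j] for j in range(d)] for i in range(d)]

# Trainable-parameter count: full fine-tuning vs LoRA.
full_params = d * d
lora_params = d * r + r * d
print(full_params, lora_params)  # 16 vs 8 trainable parameters
```

At realistic sizes (say d = 4096, r = 8) the same arithmetic gives roughly 16.8 million versus 65 thousand trainable parameters per matrix, which is the cost trade-off the chapter quantifies.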

Build four end-to-end applied projects on real data. A contract-type classifier trained on real legal documents (Chapter 14). A legal-document assistant fine-tuned with QLoRA on a real legal corpus (Chapter 15). A text-to-SQL system that translates natural language into working database queries (Chapter 16). A function-calling system that teaches an LLM to use your APIs and powers the AI agents and agentic workflows everyone is building right now (Chapter 17). Every project has runnable code, a real dataset, and a step-by-step walkthrough from data preparation to a deployed model.

Make production inference fast and cheap. The KV cache. Prefix caching. Quantisation. Continuous batching. Speculative decoding. Each one is broken down with concrete examples, real numbers, and the reasoning behind why it works. You will understand why putting variables at the end of your prompt makes API calls ten times cheaper, why a 70-billion-parameter model fits on a single consumer GPU after QLoRA, and why the same prompt sometimes produces different outputs at temperature zero.
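Why the KV cache pays off can be seen just by counting key/value computations per generated token; a toy sketch under simplified assumptions, not any framework's actual implementation:

```python
# Toy token-by-token generation showing what a KV cache saves.
# Without a cache, every step recomputes keys/values for the whole prefix;
# with one, each step computes only the newest token's key/value pair.

prompt_len, new_tokens = 5, 10

# No cache: step t recomputes keys/values for prompt_len + t tokens.
no_cache = sum(prompt_len + t for t in range(1, new_tokens + 1))

# With cache: compute the prompt once, then one new key/value per step.
with_cache = prompt_len + new_tokens

print(no_cache, with_cache)  # 105 vs 15 key/value computations
```

The gap widens quadratically with sequence length, which is why the cache is the first optimisation every production inference stack applies.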

Who this book is for:

  • Software engineers who have shipped with the OpenAI or Claude API and are tired of paying for capabilities they could fine-tune themselves for a fraction of the price.
  • Machine-learning engineers who need to take an open-weights model and adapt it to a specific business domain, on a real budget, on real hardware.
  • Practitioners building agents and agentic systems who need to understand function calling at the level beneath the framework abstractions.
  • Engineering managers and tech leads who need to make build-versus-buy decisions about LLM features and want the technical depth to defend those decisions in front of a CFO.

What makes this book different:

Most fine-tuning content online is either toy examples on famous datasets (which never transfer to real work) or library tours that teach you which Hugging Face button to click without explaining why.

Inside Large Language Models, Volume II takes a third path. It teaches the underlying mechanics of every production technique, then walks through four real applied projects from data preparation to deployed model. You finish the book with code you can actually use, models you have actually trained, and the judgement to know which technique fits your next problem before you have written a single line of code.


The Leanpub 60 Day 100% Happiness Guarantee

Within 60 days of purchase you can get a 100% refund on any Leanpub purchase, in two clicks.

Now, this is technically risky for us, since you'll have the book or course files either way. But we're so confident in our products and services, and in our authors and readers, that we're happy to offer a full money back guarantee for everything we sell.

You can only find out how good something is by trying it, and because of our 100% money back guarantee there's literally no risk in doing so!

So, there's no reason not to click the Add to Cart button, is there?

See full terms...

Earn $8 on a $10 Purchase, and $16 on a $20 Purchase

We pay 80% royalties on purchases of $7.99 or more, and 80% royalties minus a 50 cent flat fee on purchases between $0.99 and $7.98. You earn $8 on a $10 sale, and $16 on a $20 sale. So, if we sell 5000 non-refunded copies of your book for $20, you'll earn $80,000.

(Yes, some authors have already earned much more than that on Leanpub.)

In fact, authors have earned over $15 million writing, publishing and selling on Leanpub.

Learn more about writing on Leanpub

Free Updates. DRM Free.

If you buy a Leanpub book, you get free updates for as long as the author updates the book! Many authors use Leanpub to publish their books in-progress, while they are writing them. All readers get free updates, regardless of when they bought the book or how much they paid (including free).

Most Leanpub books are available in PDF (for computers) and EPUB (for phones, tablets and Kindle). The formats that a book includes are shown at the top right corner of this page.

Finally, Leanpub books don't have any DRM copy-protection nonsense, so you can easily read them on any supported device.

Learn more about Leanpub's ebook formats and where to read them

Write and Publish on Leanpub

You can use Leanpub to easily write, publish and sell in-progress and completed ebooks and online courses!

Leanpub is a powerful platform for serious authors, combining a simple, elegant writing and publishing workflow with a store focused on selling in-progress ebooks.

Leanpub is a magical typewriter for authors: just write in plain text, and to publish your ebook, just click a button. (Or, if you are producing your ebook your own way, you can even upload your own PDF and/or EPUB files and then publish with one click!) It really is that easy.

Learn more about writing on Leanpub