Inside Large Language Models for absolute beginners: Volume II

A simple-arithmetic and beginner's-Python approach

Most books about ChatGPT explain the magic. This one shows you the math. Inside Large Language Models, Volume I takes a curious beginner from "what is an LLM" to a complete, trained GPT, with nothing more than high-school algebra, a working laptop, and a willingness to read carefully. Every formula is walked through by hand. Every line of code comes with a plain-English explanation. By the end you will have built, trained, and run your own transformer from scratch, and you will know exactly what is happening inside.

No PhD or data science background required. No prior machine learning needed. Just curiosity and a calculator.

Minimum price

$19.00

$24.00

You pay

$24.00

Author earns

$19.20
You can also buy this book with 1 book credit. Get book credits with a Reader Membership or an Organization Membership for your team.
PDF

About the Book

What if you could turn any open-weights language model into a domain expert that knows your contracts, your databases, and your tools, and ship it without a research lab budget?

Inside Large Language Models, Volume II is the book that takes the foundation built in Volume I and turns it into a working production system. It is the book for the engineer who has stopped wondering how attention works and started wondering why their fine-tuning bill is bigger than their server bill, why a seven-billion-parameter model is the largest they can fit on their hardware, and how the production teams shipping LLMs to millions of users do it without a research lab budget.

Volume II picks up where Volume I left off. The transformer is built. The model is trained. Now what? The next nine chapters answer that question end to end: how inference actually works token by token, how to align a model with human preferences using RLHF, how to fine-tune billion-parameter models on consumer hardware with LoRA and QLoRA, how to make production inference ten times cheaper, and how to build four real applied systems on real data.

Along the way you will:

See every production technique worked out the same way the math was in Volume I. When the book introduces the KV cache, you watch a concrete attention computation grow token by token and see exactly which tensors get cached and which get recomputed. When it introduces quantisation, you take a real weight matrix from FP32 down to INT4 and check that the dequantised version still produces sensible outputs. There is no "this is an industry standard," no "the framework handles it." You see the bytes.
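The FP32-to-INT4 round trip described above can be sketched in a few lines of plain Python. The matrix values and the single per-matrix absmax scale below are illustrative assumptions, not the book's actual worked example:

```python
# A hypothetical 2x4 weight matrix in FP32 (illustrative values, not from the book).
W = [[0.42, -1.37, 0.05, 0.91],
     [-0.66, 0.13, -1.02, 0.77]]

# Absmax quantisation to signed INT4: map [-max|w|, +max|w|] onto integers -7..7.
max_abs = max(abs(w) for row in W for w in row)
scale = max_abs / 7.0

def quantise(w):
    q = round(w / scale)
    return max(-7, min(7, q))   # clamp into the 4-bit signed range

W_int4 = [[quantise(w) for w in row] for row in W]

# Dequantise: multiply the stored integers back by the scale.
W_deq = [[q * scale for q in row] for row in W_int4]

# The round trip loses at most scale/2 per weight.
max_err = max(abs(w - d) for row, drow in zip(W, W_deq)
              for w, d in zip(row, drow))
print(W_int4)    # -> [[2, -7, 0, 5], [-3, 1, -5, 4]]
print(max_err)   # small relative to the weights themselves
```

Real INT4 schemes add refinements (per-block scales, zero points, packing two values per byte), but the core arithmetic is exactly this.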

Understand fine-tuning at the level where you can pick the right tool. Full fine-tuning, LoRA, QLoRA, parameter-efficient tuning, instruction tuning, RLHF with PPO. Each method gets a chapter that explains the problem it solves, the math behind it, the cost trade-off, and a working PyTorch implementation you can run on your laptop. By the end you can look at a new problem and pick the cheapest method that will actually solve it, rather than reaching for whatever was in the last tutorial you read.
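As a taste of what the LoRA chapter builds, here is a minimal sketch of the LoRA idea in plain Python; the dimensions and values are made up for illustration, and the book's actual implementations use PyTorch:

```python
# LoRA in one idea: instead of training the full d x d weight matrix W,
# train two small matrices A (r x d) and B (d x r) and use
#   W_eff = W + (alpha / r) * B @ A
d, r, alpha = 8, 2, 4

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

# Frozen pretrained weight (zeros here just to keep the arithmetic visible).
W = [[0.0] * d for _ in range(d)]
# LoRA factors: B starts at zero, so training begins from the pretrained model.
A = [[0.1] * d for _ in range(r)]
B = [[0.0] * r for _ in range(d)]

delta = matmul(B, A)                       # d x d update with rank <= r
W_eff = [[W[i][j] + (alpha / r) * delta[i][j] for j in range(d)]
         for i in range(d)]

full_params = d * d                        # what full fine-tuning would train
lora_params = d * r + r * d                # what LoRA actually trains
print(full_params, lora_params)            # -> 64 32
```

At toy scale the saving is 2x; at d = 4096 with r = 8 it is roughly 16.8 million parameters down to about 65 thousand per matrix, which is why billion-parameter models become trainable on consumer hardware.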

Build four end-to-end applied projects on real data. A contract-type classifier trained on real legal documents (Chapter 14). A legal-document assistant fine-tuned with QLoRA on a real legal corpus (Chapter 15). A text-to-SQL system that translates natural language into working database queries (Chapter 16). A function-calling system that teaches an LLM to use your APIs and powers the AI agents and agentic workflows everyone is building right now (Chapter 17). Every project has runnable code, a real dataset, and a step-by-step walkthrough from data preparation to a deployed model.
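The core loop of a function-calling system like the Chapter 17 project can be sketched in a few lines. The tool name, its signature, and the model output below are hypothetical stand-ins for illustration, not the book's actual code:

```python
import json

# Toy "tools" the model is allowed to call. In a real system these would be
# your own APIs; this function and its name are invented for the sketch.
def get_order_status(order_id: str) -> str:
    return f"order {order_id}: shipped"

TOOLS = {"get_order_status": get_order_status}

# Pretend this JSON is what the fine-tuned model generated instead of prose.
model_output = '{"name": "get_order_status", "arguments": {"order_id": "A123"}}'

call = json.loads(model_output)      # parse the structured call
fn = TOOLS[call["name"]]             # look up the requested tool
result = fn(**call["arguments"])     # execute it with the model's arguments
print(result)                        # -> order A123: shipped
# The result is then fed back to the model so it can compose a final answer.
```

Everything a framework adds on top, such as retries, schema validation, and multi-step tool use, is scaffolding around this parse, dispatch, execute, feed-back cycle.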

Make production inference fast and cheap. The KV cache. Prefix caching. Quantisation. Continuous batching. Speculative decoding. Each one is broken down with concrete examples, real numbers, and the reasoning behind why it works. You will understand why putting variables at the end of your prompt makes API calls ten times cheaper, why a 70-billion-parameter model fits on a single consumer GPU after QLoRA, and why the same prompt sometimes produces different outputs at temperature zero.
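The token-by-token growth of the KV cache described above can be sketched with a toy single-head attention in plain Python; the vectors and dimensions are invented for illustration:

```python
import math

# Toy single-head attention with a KV cache. Each "token" already has
# query/key/value vectors (dimension 2, values made up for the sketch).
qkv = [  # (q, k, v) per token, as produced by the projection layers
    ([1.0, 0.0], [1.0, 0.0], [0.5, 0.5]),
    ([0.0, 1.0], [0.0, 1.0], [0.2, 0.8]),
    ([1.0, 1.0], [1.0, 1.0], [0.9, 0.1]),
]

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

K_cache, V_cache = [], []
for step, (q, k, v) in enumerate(qkv, 1):
    K_cache.append(k)                 # only the NEW token's k and v are
    V_cache.append(v)                 # computed; earlier ones are reused
    scores = softmax([sum(qi * ki for qi, ki in zip(q, kc))
                      for kc in K_cache])
    out = [sum(w * vc[d] for w, vc in zip(scores, V_cache)) for d in range(2)]
    print(f"step {step}: cache holds {len(K_cache)} keys, output {out}")
```

Without the cache, every step would recompute all earlier keys and values from scratch, which is exactly the quadratic waste that makes naive decoding slow.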

Who this book is for:

Software engineers who have shipped with the OpenAI or Claude API and are tired of paying for capabilities they could fine-tune themselves for a fraction of the price.

Machine-learning engineers who need to take an open-weights model and adapt it to a specific business domain, on a real budget, on real hardware.

Practitioners building agents and agentic systems who need to understand function calling at the level beneath the framework abstractions.

Engineering managers and tech leads who need to make build-versus-buy decisions about LLM features and want the technical depth to defend those decisions in front of a CFO.

What makes this book different:

Most fine-tuning content online is either toy examples on famous datasets (which never transfer to real work) or library tours that teach you which Hugging Face button to click without explaining why.

Inside Large Language Models, Volume II takes the third path. It teaches the underlying mechanics of every production technique, then walks through four real applied projects from data preparation to deployed model. You finish the book with code you can actually use, models you have actually trained, and the judgement to know which technique fits your next problem before you have written a single line of code.

Bundle

Bundles that include this book

Author

About the Author

Ritesh Modi

Ritesh Modi is Head of AI at MarketOnce and a former Forward Deployed Engineer at Microsoft. He has spent more than a decade building and shipping production systems across cloud, distributed computing, and applied machine learning, working with organizations ranging from global enterprises to fast-moving startups. His recent work focuses on applied large language models, designing systems that turn pretrained models into reliable, task-specific tools.

Ritesh has authored multiple technology books and speaks regularly at industry conferences on AI, cloud architecture, and software engineering. His writing philosophy rests on a simple belief: the best technical books are written by practitioners who still remember what it felt like to not understand something, not by experts who have forgotten. Every explanation in this book was tested against that standard: if it would not have made sense to him when he was first learning this material, it was rewritten until it did.

He writes, shares ideas, and connects with readers at www.riteshmodi.com. When he is not writing or building AI systems, he can be found mentoring engineers, exploring new architectures, or debugging a training run that should have converged three hours ago.

Get the free sample chapters

Click the buttons to get the free sample in PDF or EPUB, or read the sample online here

The Leanpub 60 Day 100% Happiness Guarantee

Within 60 days of purchase you can get a 100% refund on any Leanpub purchase, in two clicks.

Now, this is technically risky for us, since you'll have the book or course files either way. But we're so confident in our products and services, and in our authors and readers, that we're happy to offer a full money back guarantee for everything we sell.

You can only find out how good something is by trying it, and because of our 100% money back guarantee there's literally no risk in doing so!

So, there's no reason not to click the Add to Cart button, is there?

See full terms...

Earn $8 on a $10 Purchase, and $16 on a $20 Purchase

We pay 80% royalties on purchases of $7.99 or more, and 80% royalties minus a 50 cent flat fee on purchases between $0.99 and $7.98. You earn $8 on a $10 sale, and $16 on a $20 sale. So, if we sell 5000 non-refunded copies of your book for $20, you'll earn $80,000.

(Yes, some authors have already earned much more than that on Leanpub.)

In fact, authors have earned over $15 million writing, publishing and selling on Leanpub.

Learn more about writing on Leanpub

Free Updates. DRM Free.

If you buy a Leanpub book, you get free updates for as long as the author updates the book! Many authors use Leanpub to publish their books in-progress, while they are writing them. All readers get free updates, regardless of when they bought the book or how much they paid (including free).

Most Leanpub books are available in PDF (for computers) and EPUB (for phones, tablets and Kindle). The formats that a book includes are shown at the top right corner of this page.

Finally, Leanpub books don't have any DRM copy-protection nonsense, so you can easily read them on any supported device.

Learn more about Leanpub's ebook formats and where to read them

Write and Publish on Leanpub

You can use Leanpub to easily write, publish and sell in-progress and completed ebooks and online courses!

Leanpub is a powerful platform for serious authors, combining a simple, elegant writing and publishing workflow with a store focused on selling in-progress ebooks.

Leanpub is a magical typewriter for authors: just write in plain text, and to publish your ebook, just click a button. (Or, if you are producing your ebook your own way, you can even upload your own PDF and/or EPUB files and then publish with one click!) It really is that easy.

Learn more about writing on Leanpub