Table of Contents
Chapter 0 — Building from Scratch
Sets expectations for the book. Explains what "from scratch" truly means, what it does not mean, and what prerequisites the reader needs. Introduces the overall journey of building a small language model from first principles.
Chapter 1 — Understanding Neural Networks: The Foundations of Modern AI
Covers the core building blocks of neural networks: neurons, weights, biases, activation functions, forward and backward propagation, loss functions, optimizers, and training challenges. This chapter builds the intuition needed before diving into transformers.
Chapter 2 — PyTorch Fundamentals: The Building Blocks of Deep Learning
Introduces tensors, tensor operations, reshaping, indexing, GPU support, and key PyTorch APIs. Builds the practical foundation needed to implement neural network components later in the book.
Chapter 3 — GPUs: The Computational Engine Behind LLM Training
Explains CPU vs. GPU architecture, VRAM, tensor cores, FLOPS, monitoring GPU memory, avoiding OOM errors, and understanding how deep learning workloads run on hardware. Provides context for training performance and hardware choices.
Chapter 4 — Where Intelligence Comes From: A Deep Look at Data
Focuses on why data quality matters more than architecture. Explores real-world datasets like Common Crawl, Books, Wikipedia, StackExchange, and GitHub. Discusses scaling laws, data curation, deduplication, and multi-stage training datasets.
Chapter 5 — Understanding Language Models: From Foundations to Small-Scale Design
Explains what a language model is mathematically, why scaling matters, emergent abilities, transformer basics, and why building smaller custom models remains valuable. Sets the stage for designing your own LLM.
Chapter 6 — Tokenizer: How Language Models Break Text into Meaningful Units
Introduces character, word, and subword tokenization. Explains why tokenization exists, how it affects downstream model performance, and why small models must choose their vocabulary carefully.
Chapter 7 — Understanding Embeddings, Positional Encodings, and RoPE
Discusses embeddings as dense vector representations, positional encodings (integer, binary, sinusoidal), their limitations, and why RoPE (Rotary Position Embedding) became the modern standard. Includes intuitive and mathematical explanations.
Chapter 8 — Understanding Attention: From Self-Attention to Multi-Head Attention
Covers the attention mechanism step by step: queries, keys, values, dot products, scaling, softmax, causal masks, multi-head attention, and detailed PyTorch-style breakdowns. Builds intuition for how transformers process context.
Chapter 9 — Making Inference Fast: KV Cache, Multi-Query, and Grouped-Query Attention
Explains the inference loop, KV caching, why only the last token matters at each decoding step, and how cache size affects memory and speed. Introduces Multi-Query Attention (MQA) and Grouped-Query Attention (GQA) to reduce KV cache memory costs while preserving performance.
Chapter 10 — Inside the Transformer Block: RMSNorm, SwiGLU, and Residual Connections
Breaks down the internal block structure, normalization layers, why RMSNorm is preferred, how SwiGLU works, and why residual connections enable deeper networks and improve gradient flow. Prepares the reader to assemble full transformer blocks.
Chapter 11 — Building Qwen from Scratch
A hands-on implementation chapter covering tokenization, dataset preparation (TinyStories), RoPE, RMSNorm, GQA, SwiGLU, transformer blocks, causal masks, loss computation, the generation loop, and the training loop, culminating in a full Qwen-style model.
Chapter 12 — Quantization
Explains how LLM weights are stored, numerical precision, integer vs. floating-point formats, 8-bit/4-bit quantization, BitsAndBytes usage, perplexity evaluation, and how quantization affects performance and accuracy.
Chapter 13 — Mixture of Experts
Introduces the MoE architecture, sparse activation, expert routing, top-k gating, load balancing, and the historical evolution of MoE from 1990s research to modern implementations like DeepSeek. Includes conceptual and mathematical explanations.
Chapter 14 — Training Small Language Models: A Practical Journey
Covers architectural choices, tokenizer selection, dataset curation, debugging, GPU selection, memory optimization, training loops, and evaluation strategies. Wraps up the end-to-end pipeline for training effective small language models.