Building A Small Language Model from Scratch: A Practical Guide
About the Book
The Illustrated Guide to Building LLMs from Scratch
Most books teach you how to use language models. This one teaches you how to build them yourself, from the ground up.
By the end of this comprehensive 854-page guide, you'll have implemented a complete 283M-parameter Qwen3-style model trained on real data. You'll understand every component, write every line of code, and train a working language model that generates coherent text.
Why This Book is Different
Unlike tutorials that show you how to use existing models, this book takes you through building every component yourself:
- Implement attention mechanisms from scratch
- Build positional encodings (RoPE) yourself
- Create feed-forward networks and normalization layers
- Write the complete transformer architecture
- Train a real model on the TinyStories dataset
What You'll Build
By working through this book, you'll create a complete Qwen3-based language model (see the configuration sketch after this list) with:
- 283M parameters
- Modern architecture: Grouped Query Attention (GQA), RoPE, RMSNorm, SwiGLU
- 32,768 token context length
- Full training pipeline from data preprocessing to model evaluation
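To make these specs concrete, here is a minimal configuration sketch in PyTorch-style Python. Apart from the 32,768-token context length, every number below (vocabulary size, width, depth, head counts, feed-forward width, RoPE base) is an illustrative assumption, not the book's published hyperparameters.

```python
from dataclasses import dataclass

# Illustrative configuration only: field values other than the context length
# are assumptions for the sake of example, not the book's actual settings.
@dataclass
class SmallQwen3Config:
    vocab_size: int = 32_000               # assumed tokenizer vocabulary
    hidden_size: int = 1024                # assumed model width
    num_layers: int = 24                   # assumed depth
    num_attention_heads: int = 16          # query heads
    num_kv_heads: int = 4                  # fewer K/V heads than query heads -> GQA
    intermediate_size: int = 2816          # assumed SwiGLU feed-forward width
    max_position_embeddings: int = 32_768  # context length from the spec above
    rope_theta: float = 1_000_000.0        # assumed RoPE base frequency
```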
What's Inside (14 Comprehensive Chapters)
1. Neural Networks - Build a solid foundation from first principles
2. PyTorch - Master the deep learning framework
3. GPU Computing - Optimize for performance
4. Data - Collect, process, and prepare training data
5. Model Scale - Understand the relationship between size and capability
6. Tokenization & Embeddings - Process text for language models
7. Positional Encodings - Implement RoPE and understand alternatives
8. Attention Mechanisms - Build the heart of transformers
9. KV Cache, MQA, GQA - Optimize attention for efficiency
10. Building Blocks - RMSNorm, SwiGLU, and modern components
11. Building Qwen from Scratch - Complete model implementation
12. Quantization - Make models efficient and deployable
13. Mixture of Experts - Scale with efficiency
14. Training Small Language Models - Complete training pipeline
Key Features
✅ Complete Implementation - Every component built from scratch, no black boxes
✅ Modern Architecture - State-of-the-art techniques (GQA, RoPE, RMSNorm, SwiGLU)
✅ Real Training - Train on the TinyStories dataset with full training loops
✅ Production-Ready Code - All examples work on Google Colab or your local GPU
✅ Comprehensive Coverage - From neural network basics to advanced topics
✅ Hands-On Learning - Understand by doing, not just reading
Perfect For
- Developers who want to understand transformers at a fundamental level
- Researchers building custom language models
- Students learning deep learning and NLP
- Engineers who need to modify and optimize language models
- Anyone tired of using models without understanding how they work
What You'll Gain
- Deep understanding of how transformers work internally
- Practical skills in data processing, training loops, and optimization
- Ability to modify, optimize, and adapt models for your needs
- Real implementation experience with a working trained model
- Foundation that scales from small models to large systems
Technical Details
- Model Size: 283M parameters
- Training Time: ~5-6 hours on NVIDIA A100 (longer on consumer GPUs)
- Memory: ~8GB VRAM required (works on RTX 3060, RTX 3070, RTX 4090; see the back-of-envelope estimate after this list)
- Dataset: TinyStories (2.14M examples, ~1GB)
- Framework: PyTorch with Python 3.8+
- Platform: Google Colab compatible (free GPU access)
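As a rough sanity check on the ~8GB figure, the following back-of-envelope estimates the fixed memory cost of mixed-precision AdamW training for a 283M-parameter model. The bytes-per-parameter breakdown is a common rule of thumb, not a figure from the book, and activation memory (which depends on batch size and sequence length) is excluded.

```python
# Rule-of-thumb VRAM budget for mixed-precision AdamW training of ~283M params.
params = 283e6

bytes_weights = params * 2   # bf16/fp16 working weights
bytes_master  = params * 4   # fp32 master copy of the weights
bytes_adam    = params * 8   # two fp32 optimizer moments (m and v)
bytes_grads   = params * 2   # bf16/fp16 gradients

total_gb = (bytes_weights + bytes_master + bytes_adam + bytes_grads) / 1e9
print(f"~{total_gb:.1f} GB before activations")   # roughly 4.5 GB
```

The remaining headroom on an 8GB card is what activations, the data pipeline, and CUDA overhead consume, which is why larger batch sizes need more VRAM.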
Includes
- Complete source code for all implementations
- Google Colab notebooks for easy setup
- Detailed explanations of every design decision
- Training scripts and optimization techniques
- Troubleshooting guides and best practices
Prerequisites
- Basic Python programming
- Fundamental machine learning concepts
- No prior transformer experience required—we build that knowledge together
Start Building Today
Stop using language models as black boxes. Start understanding them from the inside out. By the end of this book, you'll have built a working language model yourself—and you'll understand every component that makes it work.
Note: This is a comprehensive 854-page guide. We recommend taking your time with each chapter, especially the foundational early chapters, to build a solid understanding before moving to advanced topics.
Table of Contents
Chapter 0 — Building from Scratch
Sets expectations for the book. Explains what "from scratch" truly means, what it does not mean, and what prerequisites the reader needs. Introduces the overall journey of building a small language model from first principles.
Chapter 1 — Understanding Neural Networks: The Foundations of Modern AI
Covers the core building blocks of neural networks: neurons, weights, biases, activation functions, forward and backward propagation, losses, optimizers, and training challenges. This chapter builds the intuition needed before diving into transformers.
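As a taste of those core ideas, here is a toy example of a single neuron trained with plain gradient descent; the input, target, and learning rate are arbitrary.

```python
import torch

# A single neuron trained on one example with vanilla gradient descent:
# forward pass, squared-error loss, backward pass, manual parameter update.
x, y_true = torch.tensor([2.0]), torch.tensor([1.0])
w = torch.randn(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

for step in range(100):
    y_pred = torch.sigmoid(w * x + b)          # forward: weight, bias, activation
    loss = ((y_pred - y_true) ** 2).mean()     # squared-error loss
    loss.backward()                            # backward: autograd fills w.grad, b.grad
    with torch.no_grad():                      # "optimizer": a plain SGD update
        w -= 0.5 * w.grad
        b -= 0.5 * b.grad
        w.grad.zero_()
        b.grad.zero_()

print(float(loss))                             # the loss shrinks toward 0 as training proceeds
```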
Chapter 2 — PyTorch Fundamentals: The Building Blocks of Deep Learning
Introduces tensors, operations, reshaping, indexing, GPU support, and key PyTorch APIs. Builds the practical foundation needed to implement neural network components later in the book.
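A quick taste of the tensor operations covered here; the shapes and values are arbitrary.

```python
import torch

# Tensor basics: creation, reshaping, indexing, matrix multiplication,
# and moving work to a GPU when one is available.
a = torch.arange(12, dtype=torch.float32)    # 1-D tensor of 12 values
m = a.reshape(3, 4)                          # reshape into a 3x4 matrix
row = m[1]                                   # indexing: second row
col = m[:, 2]                                # slicing: third column
prod = m @ m.T                               # matrix multiplication -> 3x3

device = "cuda" if torch.cuda.is_available() else "cpu"
m = m.to(device)                             # move the tensor to the GPU if present
```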
Chapter 3 — GPUs: The Computational Engine Behind LLM Training
Explains CPU vs GPU architecture, VRAM, tensor cores, FLOPS, monitoring GPU memory, avoiding OOM errors, and understanding how deep learning workloads run on hardware. Provides context for training performance and hardware choices.
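For example, PyTorch exposes simple counters for tracking GPU memory, which is the kind of monitoring this chapter leans on.

```python
import torch

# Peeking at GPU memory with PyTorch's built-in CUDA statistics -- the kind of
# monitoring used to stay clear of out-of-memory (OOM) errors.
if torch.cuda.is_available():
    x = torch.randn(4096, 4096, device="cuda")          # ~64 MB of fp32 values
    x = x @ x                                            # a matmul bumps the peak usage
    print(f"allocated: {torch.cuda.memory_allocated() / 1e6:.0f} MB")      # live tensors
    print(f"reserved:  {torch.cuda.memory_reserved() / 1e6:.0f} MB")       # held by the caching allocator
    print(f"peak:      {torch.cuda.max_memory_allocated() / 1e6:.0f} MB")  # high-water mark
else:
    print("No CUDA device available")
```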
Chapter 4 — Where Intelligence Comes From: A Deep Look at Data
Focuses on why data quality matters more than architecture. Explores real-world datasets like Common Crawl, Books, Wikipedia, StackExchange, and GitHub. Discusses scaling laws, data curation, deduplication, and multi-stage training datasets.
Chapter 5 — Understanding Language Models: From Foundations to Small-Scale Design
Explains what a language model is mathematically, why scaling matters, emergent abilities, transformer basics, and why building smaller custom models remains valuable. Sets the stage for designing your own LLM.
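For reference, the standard autoregressive factorization behind "what a language model is mathematically" (general background, not a formula quoted from the book) is:

$$
P(x_1, \dots, x_T) \;=\; \prod_{t=1}^{T} P\bigl(x_t \mid x_1, \dots, x_{t-1}\bigr)
$$

Training minimizes the negative log of the right-hand side over the training corpus, i.e. the cross-entropy of next-token prediction.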
Chapter 6 — Tokenizer: How Language Models Break Text into Meaningful Units
Introduces character, word, and subword tokenization. Explains why tokenization exists, how it affects downstream model performance, and why small models must optimize vocabulary carefully.
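A toy comparison of the three granularities; the subword split is hand-written for illustration, since a real BPE tokenizer learns its merges from data.

```python
# Character-, word-, and subword-level views of the same text.
text = "unbelievable stories"

char_tokens = list(text)                       # character-level: long sequences, tiny vocabulary
word_tokens = text.split()                     # word-level: short sequences, huge vocabulary
subword_tokens = ["un", "believ", "able", " stories"]  # illustrative BPE-style split

print(len(char_tokens), len(word_tokens), len(subword_tokens))  # 20, 2, 4
```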
Chapter 7 — Understanding Embeddings, Positional Encodings, and RoPE
Discusses embeddings as dense vector representations, positional encodings (integer, binary, sinusoidal), their limitations, and why RoPE (Rotary Position Embedding) became the modern standard. Includes intuitive and mathematical explanations.
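A minimal RoPE sketch, following the common "rotate half the channels" convention with a base frequency of 10,000; this is an illustration, not necessarily the book's exact implementation.

```python
import torch

# Rotary Position Embedding in miniature: each pair of channels in a query or
# key vector is rotated by an angle proportional to the token's position, so
# relative positions show up in the dot products attention computes.
def apply_rope(x: torch.Tensor, base: float = 10_000.0) -> torch.Tensor:
    # x: (seq_len, head_dim) for a single head; head_dim must be even
    seq_len, head_dim = x.shape
    half = head_dim // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)     # per-pair frequencies
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs  # (seq_len, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]              # "rotate half" pairing of channels
    return torch.cat([x1 * cos - x2 * sin,
                      x1 * sin + x2 * cos], dim=-1)

q = torch.randn(8, 64)        # 8 positions, head dimension 64
q_rot = apply_rope(q)         # same shape, now position-dependent
```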
Chapter 8 — Understanding Attention: From Self-Attention to Multi-Head Attention
Covers the attention mechanism step-by-step: queries, keys, values, dot products, scaling, softmax, causal masks, multi-head attention, and detailed PyTorch-like breakdowns. Builds intuition for how transformers process context.
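For a single head, the whole mechanism fits in a few lines; the sketch below follows the steps listed above with illustrative shapes.

```python
import torch
import torch.nn.functional as F

# Causal scaled dot-product attention for a single head, step by step:
# dot products, scaling, causal mask, softmax, weighted sum of values.
def causal_attention(q, k, v):
    # q, k, v: (seq_len, head_dim)
    seq_len, head_dim = q.shape
    scores = q @ k.T / head_dim ** 0.5                                   # scaled dot products
    mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))                     # block attention to future tokens
    weights = F.softmax(scores, dim=-1)                                  # one distribution per query
    return weights @ v                                                   # mix the value vectors

q = k = v = torch.randn(5, 16)
out = causal_attention(q, k, v)   # (5, 16)
```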
Chapter 9 — Making Inference Fast: KV Cache, Multi-Query, and Grouped-Query Attention
Explains the inference loop, KV caching, why only the last token matters, and how cache size affects memory and speed. Introduces Multi-Query Attention (MQA) and Grouped-Query Attention (GQA) to reduce KV cache memory costs while preserving performance.
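A back-of-envelope cache-size calculation shows why fewer K/V heads matter; every shape here (layers, heads, head dimension, context length) is an assumption chosen for illustration, not the book's configuration.

```python
# Rough KV-cache size for a full context, and how GQA shrinks it.
def kv_cache_gb(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    # 2x for keys and values, stored at every layer for every cached position
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value / 1e9

full_mha = kv_cache_gb(num_layers=24, num_kv_heads=16, head_dim=64, seq_len=32_768)
gqa      = kv_cache_gb(num_layers=24, num_kv_heads=4,  head_dim=64, seq_len=32_768)
print(f"MHA cache: {full_mha:.2f} GB, GQA cache: {gqa:.2f} GB")   # GQA is 4x smaller here
```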
Chapter 10 — Inside the Transformer Block: RMSNorm, SwiGLU, and Residual Connections
Breaks down the internal block structure, normalization layers, why RMSNorm is preferred, how SwiGLU works, and why residual connections improve depth and gradient flow. Prepares the reader to assemble full transformer blocks.
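Minimal sketches of the two components, using the common LLaMA/Qwen-style gate/up/down naming convention rather than necessarily the book's exact code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# RMSNorm rescales by the root-mean-square of the features (no mean
# subtraction, no bias); SwiGLU is a gated feed-forward layer.
class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight

class SwiGLU(nn.Module):
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden_dim, bias=False)
        self.up = nn.Linear(dim, hidden_dim, bias=False)
        self.down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))  # SiLU-gated projection

x = torch.randn(2, 8, 512)
y = SwiGLU(512, 1408)(RMSNorm(512)(x))   # (2, 8, 512)
```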
Chapter 11 — Building Qwen from Scratch
A hands-on implementation chapter covering tokenization, dataset preparation (TinyStories), RoPE, RMSNorm, GQA, SwiGLU, transformer blocks, causal masks, loss computation, generation loop, and training loop, culminating in a full Qwen-style model.
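For orientation, here is a self-contained sketch of the pre-norm residual wiring such a block uses. For brevity it substitutes nn.LayerNorm and nn.MultiheadAttention for the RMSNorm and GQA modules the chapter builds, so treat it as a structural outline only.

```python
import torch
import torch.nn as nn

# A pre-norm decoder block: normalize -> attention -> residual,
# then normalize -> feed-forward -> residual.
class TinyBlock(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.SiLU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        T = x.shape[1]
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal)
        x = x + attn_out                       # residual connection around attention
        x = x + self.mlp(self.norm2(x))        # residual connection around the MLP
        return x

out = TinyBlock()(torch.randn(2, 10, 256))     # (2, 10, 256)
```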
Chapter 12 — Quantization
Explains how LLM weights are stored, numerical precision, integer vs floating-point formats, 8-bit/4-bit quantization, BitsAndBytes usage, perplexity evaluation, and how quantization affects performance and accuracy.
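The core idea can be shown with a toy symmetric 8-bit quantizer that stores int8 values plus one floating-point scale; real libraries such as bitsandbytes quantize per block and with far more care.

```python
import torch

# Symmetric int8 quantization of a weight tensor: one scale for the tensor,
# int8 storage, dequantization on the fly.
w = torch.randn(256, 256)

scale = w.abs().max() / 127.0                      # map the largest weight to +/-127
w_int8 = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
w_dequant = w_int8.float() * scale                 # approximate reconstruction

error = (w - w_dequant).abs().mean()
print(f"mean absolute rounding error: {error:.5f}")
```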
Chapter 13 — Mixture of Experts
Introduces the MoE architecture, sparse activation, expert routing, top-k gating, load balancing, and the historical evolution of MoE from 1990s research to modern implementations like DeepSeek. Includes conceptual and mathematical explanations.
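A bare-bones top-k router illustrates the routing idea; load-balancing losses and capacity limits are omitted, and all sizes are made up for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Top-2 expert routing in miniature: a router scores every expert per token,
# the two highest-scoring experts run, and their outputs are mixed by the
# renormalized gate scores.
num_experts, top_k, dim = 8, 2, 64
tokens = torch.randn(10, dim)                                   # 10 tokens
router = nn.Linear(dim, num_experts, bias=False)
experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_experts)])

logits = router(tokens)                                         # (10, num_experts)
gate_scores, expert_ids = logits.topk(top_k, dim=-1)            # pick top-2 experts per token
gate_scores = F.softmax(gate_scores, dim=-1)                    # renormalize over the chosen 2

output = torch.zeros_like(tokens)
for t in range(tokens.shape[0]):                                # sparse: only 2 of 8 experts run per token
    for slot in range(top_k):
        e = int(expert_ids[t, slot])
        output[t] += gate_scores[t, slot] * experts[e](tokens[t])
```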
Chapter 14 — Training Small Language Models: A Practical Journey
Covers architectural choices, tokenizer selection, dataset curation, debugging, GPU selection, memory optimization, training loops, and evaluation strategies. Wraps up the end-to-end pipeline for training effective small language models.
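The skeleton of such a next-token-prediction training loop looks roughly like this; `model` and `dataloader` are stand-ins (a model mapping token ids of shape (B, T) to logits of shape (B, T, vocab_size), and batches of shape (B, T+1)).

```python
import torch
import torch.nn.functional as F

# Skeleton of a next-token-prediction training loop with AdamW,
# gradient clipping, and periodic loss logging.
def train(model, dataloader, max_steps, lr=3e-4):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device).train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

    for step, batch in zip(range(max_steps), dataloader):
        batch = batch.to(device)
        inputs, targets = batch[:, :-1], batch[:, 1:]          # predict each next token
        logits = model(inputs)                                 # (B, T, vocab_size)
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               targets.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # a common stability measure
        optimizer.step()
        if step % 100 == 0:
            print(f"step {step}: loss {loss.item():.3f}")
```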