Most LLM tutorials stop at GPT-2. This book doesn't.
My Adventures with Large Language Models walks you through building five real LLM architectures from scratch in PyTorch, starting from a vanilla encoder-decoder Transformer and ending at DeepSeek's Multi-Head Latent Attention and Mixture-of-Experts.
Every chapter has runnable, end-to-end code. No pseudocode, no hand-waving. You type it, you run it, you understand it.
What you'll build:
Chapter 1: A vanilla encoder-decoder Transformer for English-to-Hindi translation. The fundamentals, implemented from the ground up.
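The fundamental operation underlying that vanilla Transformer is scaled dot-product attention. As a taste of the from-scratch style (this is an illustrative sketch, not the book's actual code):

```python
import torch

def scaled_dot_product_attention(q, k, v):
    # The core Transformer operation: softmax(QK^T / sqrt(d_k)) V.
    d_k = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    return torch.softmax(scores, dim=-1) @ v

q = torch.randn(2, 5, 16)  # (batch, seq_len, d_k)
k = torch.randn(2, 5, 16)
v = torch.randn(2, 5, 16)
out = scaled_dot_product_attention(q, k, v)
```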
Chapter 2: GPT-2 (124M parameters) from scratch, then load real OpenAI pretrained weights to verify your implementation works.
Chapter 3: Llama 3.2-3B by swapping exactly four components of your GPT-2. LayerNorm becomes RMSNorm. Learned positional encodings become RoPE. GELU becomes SwiGLU. Multi-Head Attention becomes Grouped-Query Attention. Then load Meta's pretrained weights.
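To give a flavour of the first of those four swaps: RMSNorm drops LayerNorm's mean subtraction and bias, rescaling activations by their root mean square alone. A minimal sketch (illustrative, not the book's code; `eps` value is an assumption):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    # Root-mean-square normalisation as used in Llama: no mean
    # subtraction and no bias, just a learned per-dimension scale.
    def __init__(self, dim, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x / rms)

x = torch.randn(2, 4, 8)
out = RMSNorm(8)(x)
```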
Chapter 4: KV cache, Multi-Query Attention, and Grouped-Query Attention for inference optimisation.
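The KV cache idea in one picture: during autoregressive decoding, each new token's query attends over keys and values that were already computed for the prefix, so we store them instead of recomputing. A single-head sketch (illustrative only; the dict-based cache and weight names are assumptions, not the book's interface):

```python
import torch

torch.manual_seed(0)
d = 8
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))

def attend(x, cache):
    # x: (1, d), the newest token's embedding.
    k_new, v_new = x @ Wk, x @ Wv
    # Append the new key/value instead of recomputing K, V
    # for the whole prefix at every decoding step.
    cache["k"] = k_new if cache["k"] is None else torch.cat([cache["k"], k_new])
    cache["v"] = v_new if cache["v"] is None else torch.cat([cache["v"], v_new])
    q = x @ Wq
    scores = (q @ cache["k"].T) / d ** 0.5
    return torch.softmax(scores, dim=-1) @ cache["v"]

cache = {"k": None, "v": None}
tokens = torch.randn(5, d)
outs = torch.cat([attend(t.unsqueeze(0), cache) for t in tokens])
```

Step-by-step cached decoding produces exactly the same outputs as full causal attention over the whole sequence, which is what makes the optimisation safe.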
Chapter 5: DeepSeek's full architecture. Multi-Head Latent Attention (with the absorption trick and decoupled RoPE), DeepSeekMoE (shared experts, fine-grained segmentation, auxiliary-loss-free load balancing), Multi-Token Prediction, and FP8 quantisation.
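The core of any MoE layer is top-k gating: each token is routed to a few experts and their outputs are combined by renormalised gate scores. A toy sketch of that routing step (illustrative only; dimensions, `top_k`, and the per-token loop are simplifying assumptions, not DeepSeekMoE's implementation):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d, n_experts, top_k = 16, 4, 2

experts = nn.ModuleList(nn.Linear(d, d) for _ in range(n_experts))
gate = nn.Linear(d, n_experts, bias=False)

def moe_forward(x):
    # x: (tokens, d). Pick each token's top-k experts by gate score,
    # renormalise those scores, and mix the chosen experts' outputs.
    scores = torch.softmax(gate(x), dim=-1)
    weights, idx = scores.topk(top_k, dim=-1)
    weights = weights / weights.sum(dim=-1, keepdim=True)
    out = torch.zeros_like(x)
    for t in range(x.shape[0]):
        for slot in range(top_k):
            out[t] += weights[t, slot] * experts[idx[t, slot]](x[t])
    return out

x = torch.randn(6, d)
y = moe_forward(x)
```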
The code repository is open source: https://github.com/S1LV3RJ1NX/mal-code
This book is for ML engineers, researchers, and senior developers who know Python and PyTorch and want to understand modern LLMs at the level of code, not slides or blog posts. If you've read Raschka or watched Karpathy and want to go further, into Llama, GQA, MLA, and MoE, this is that book.