Deep Learning with PyTorch Step-by-Step

A Beginner's Guide

About the Book

UPDATE (July 19th, 2022): The Spanish version of Part I, Fundamentals, was published today:

UPDATE (February 23rd, 2022): The paperback edition is available now (the book had to be split into 3 volumes for printing). For more details, please check

UPDATE (February 13th, 2022): The latest revised edition (v1.1.1) was published today to address small changes to Chapters 9 and 10 that weren't included in the previous revision.

UPDATE (January 23rd, 2022): The revised edition (v1.1) was published today - better graphics, improved formatting, larger page size (thus reducing page count from 1187 to 1045 pages - no content was removed!). If you already bought the book, you can download the new version at any time!

If you're looking for a book where you can learn about Deep Learning and PyTorch without having to spend hours deciphering cryptic text and code, and that's easy and enjoyable to read, this is it :-)

The book covers everything from the basics of gradient descent all the way up to fine-tuning large NLP models (BERT and GPT-2) using HuggingFace. It is divided into four parts:

  • Part I: Fundamentals (gradient descent, training linear and logistic regressions in PyTorch)
  • Part II: Computer Vision (deeper models and activation functions, convolutions, transfer learning, initialization schemes)
  • Part III: Sequences (RNN, GRU, LSTM, seq2seq models, attention, self-attention, transformers)
  • Part IV: Natural Language Processing (tokenization, embeddings, contextual word embeddings, ELMo, BERT, GPT-2)

This is not a typical book: most tutorials start with some nice and pretty image classification problem to illustrate how to use PyTorch. It may seem cool, but I believe it distracts you from the main goal: how does PyTorch work? In this book, I present a structured, incremental, first-principles approach to learning PyTorch (and get to the pretty image classification problem in due time).

Moreover, this is not a formal book in any way: I am writing this book as if I were having a conversation with you, the reader. I will ask you questions (and give you answers shortly afterward) and I will also make (silly) jokes.

My job here is to make you understand the topic, so I will avoid fancy mathematical notation as much as possible and spell it out in plain English.

In this book, I will guide you through the development of many models in PyTorch, showing you why PyTorch makes it much easier and more intuitive to build models in Python: autograd, dynamic computation graph, model classes and much, much more.

We will build, step-by-step, not only the models themselves but also your understanding as I show you both the reasoning behind the code and how to avoid some common pitfalls and errors along the way.
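
As a rough illustration of that step-by-step style, here is a minimal sketch of gradient descent for a linear regression in plain Python, loosely following the steps named in the table of contents (random initialization, predictions, loss, gradients, parameter update). It is not taken from the book: the function names, the synthetic data, and the hyperparameters are all assumptions chosen for illustration, and in PyTorch the hand-written gradients would be computed by autograd instead.

```python
import random


def generate_data(n=100, true_b=1.0, true_w=2.0, noise=0.1, seed=42):
    """Hypothetical synthetic data: y = true_b + true_w * x + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [true_b + true_w * x + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def train(xs, ys, lr=0.1, epochs=1000, seed=0):
    """Fit y = b + w * x with batch gradient descent (illustrative only)."""
    rng = random.Random(seed)
    n = len(xs)

    # Step 0: random initialization of the parameters
    b, w = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    loss = None

    for _ in range(epochs):
        # Step 1: compute the model's predictions
        y_hat = [b + w * x for x in xs]

        # Step 2: compute the loss (mean squared error)
        errors = [p - y for p, y in zip(y_hat, ys)]
        loss = sum(e * e for e in errors) / n

        # Step 3: compute the gradients of the loss w.r.t. b and w
        b_grad = 2.0 * sum(errors) / n
        w_grad = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n

        # Step 4: update the parameters against the gradient
        b -= lr * b_grad
        w -= lr * w_grad
        # Step 5: rinse and repeat (next iteration of the loop)

    return b, w, loss


xs, ys = generate_data()
b, w, loss = train(xs, ys)
print(f"b={b:.3f}, w={w:.3f}, loss={loss:.4f}")
```

With these assumed settings, the learned `b` and `w` should land close to the values used to generate the data (1.0 and 2.0), with the remaining loss coming mostly from the injected noise.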

I wrote this book for beginners in general - not only PyTorch beginners. Every now and then I will spend some time explaining some fundamental concepts which I believe are key to having a proper understanding of what's going on in the code.

Maybe you already know some of those concepts well: if this is the case, you can simply skip them, since I've made those explanations as independent as possible from the rest of the content.

  • Categories

    • Artificial Intelligence
    • Machine Learning
    • Data Science

About the Author

Daniel Voigt Godoy

Daniel has been teaching machine learning and distributed computing technologies at Data Science Retreat, the longest-running Berlin-based bootcamp, for more than three years, helping more than 150 students advance their careers.

He writes regularly for Towards Data Science. His blog post "Understanding PyTorch with an example: a step-by-step tutorial" has reached more than 220,000 views since it was published.

The positive feedback from the readers resulted in an invitation to speak at the Open Data Science Conference (ODSC) Europe in 2019. It also motivated him to write the book "Deep Learning with PyTorch Step-by-Step", which covers a broader range of topics.

Daniel is also the main contributor to two Python packages: HandySpark and DeepReplay.

His professional background includes 20 years of experience working for companies in several industries: banking, government, fintech, retail and mobility.

Reader Testimonials

Mahmud Hasan

Machine Learning Engineer at Micron Technology, Smart Manufacturing and AI

I am usually really picky in choosing books about ML/DL but I have to tell you, this book was one of the best books I have ever invested in. I cannot thank you enough for writing a book that gives so much clarity on the explanations of the inner workings of many DL techniques. Thank you so much and I hope you come up with even better books on other ML topics in the future.

Nipun Nayan Sadvilkar

Lead Data Scientist & Author, DL & NLP Workshop

As an author myself who've co-authored two books in Deep Learning & NLP space, I'm extremely impressed by Daniel's step-by-step pedagogical approach. Starting with a toy problem and gradually building abstractions on top of each other massively helps beginner to understand the nuts and bolts of each models and neural architectures be it basic or advanced! Daniel has justified "step-by-step" part from the title in a true sense. Highly recommended!

Table of Contents

  • Preface
  • About the Author
  • Frequently Asked Questions (FAQ)
    • Why PyTorch?
    • Why this book?
    • Who should read this book?
    • What do I need to know?
    • How to read this book?
    • What’s Next?
  • Setup Guide
    • Official Repository
    • Environment
      • Google Colab
      • Binder
      • Local Installation
    • Moving On
  • Part I: Fundamentals
  • Chapter 0: Visualizing Gradient Descent
    • Visualizing Gradient Descent
    • Model
    • Data Generation
    • Step 0: Random Initialization
    • Step 1: Compute Model’s Predictions
    • Step 2: Compute the Loss
    • Step 3: Compute the Gradients
    • Step 4: Update the Parameters
    • Step 5: Rinse and Repeat!
  • Chapter 1: A Simple Regression Problem
    • A Simple Regression Problem
    • Data Generation
    • Gradient Descent
    • Linear Regression in Numpy
    • PyTorch
    • Autograd
    • Dynamic Computation Graph
    • Optimizer
    • Loss
    • Model
  • Chapter 2: Rethinking the Training Loop
    • Rethinking the Training Loop
    • Dataset
    • DataLoader
    • Evaluation
    • TensorBoard
    • Saving and Loading Models
  • Chapter 2.1: Going Classy
    • Going Classy
      • The Class
      • The Constructor
      • Training Methods
      • Saving and Loading Methods
      • Visualization Methods
      • The Full Code
    • Classy Pipeline
      • Model Training
      • Making Predictions
      • Checkpointing
      • Resuming Training
  • Chapter 3: A Simple Classification Problem
    • A Simple Classification Problem
    • Data Generation
    • Data Preparation
    • Model
    • Loss
      • BCELoss
      • BCEWithLogitsLoss
      • Imbalanced Dataset
    • Model Configuration
    • Model Training
    • Decision Boundary
    • Classification Threshold
      • Confusion Matrix
      • Metrics
      • Trade-offs and Curves
  • Part II: Computer Vision
  • Chapter 4: Classifying Images
    • Classifying Images
    • Torchvision
    • Data Preparation
      • Dataset Transforms
      • SubsetRandomSampler
      • Data Augmentation Transforms
      • WeightedRandomSampler
      • Seeds and more (seeds)
      • Putting It Together
      • Pixels as Features
    • Shallow Model
    • Deep-ish Model
    • Activation Functions
    • Deep Model
  • Bonus Chapter: Feature Space
    • Two-Dimensional Feature Space
    • Transformations
    • A Two-Dimensional Model
    • Decision Boundary, Activation Style!
    • More Functions, More Boundaries
    • More Layers, More Boundaries
    • More Dimensions, More Boundaries
  • Chapter 5: Convolutions
    • Spoilers
    • Jupyter Notebook
    • Convolutions
      • Filter/Kernel
      • Convolving
      • Moving Around
      • Shape
      • Convolving in PyTorch
      • Striding
      • Padding
      • A REAL Filter
    • Pooling
    • Flattening
    • Dimensions
    • Typical Architecture
    • A Multiclass Classification Problem
      • Data Generation
      • Data Preparation
      • Loss
      • Classification Losses Showdown!
      • Model Configuration
      • Model Training
    • Visualizing Filters and More!
      • Static Method
      • Visualizing Filters
      • Hooks
      • Visualizing Feature Maps
      • Visualizing Classifier Layers
      • Accuracy
      • Loader Apply
    • Putting It All Together
    • Recap
  • Chapter 6: Rock, Paper, Scissors
    • Rock, Paper, Scissors...
    • Data Preparation
      • ImageFolder
      • Standardization
      • The Real Datasets
    • Three-Channel Convolutions
    • Fancier Model
    • Dropout
    • Model Configuration
    • Model Training
    • Learning Rates
      • Finding LR
      • Adaptive Learning Rate
      • Stochastic Gradient Descent (SGD)
        • Flavors of SGD
      • Learning Rate Schedulers
      • Adaptive vs Cycling
  • Chapter 7: Transfer Learning
    • Transfer Learning
    • ImageNet
    • ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
    • Transfer Learning in Practice
      • Pre-Trained Model
      • Model Configuration
      • Data Preparation
      • Model Training
      • Generating a Dataset of Features
      • Top Model
    • Auxiliary Classifiers (Side-Heads)
    • 1x1 Convolutions
    • Inception Modules
    • Batch Normalization
      • Running Statistics
      • Evaluation Phase
      • Momentum
      • BatchNorm2d
      • Other Normalizations
      • Small Summary
    • Residual Connections
      • Learning the Identity
      • The Power of Shortcuts
      • Residual Blocks
    • Putting It All Together
      • Fine-Tuning
      • Feature Extraction
  • Extra Chapter: Vanishing and Exploding Gradients
    • Vanishing Gradients
    • Initialization Schemes
    • Batch Normalization
    • Exploding Gradients
    • Gradient Clipping
      • Value Clipping
      • Norm Clipping
      • Clipping with Hooks
  • Part III: Sequences
  • Chapter 8: Sequences
    • Sequences
    • Data Generation
    • Recurrent Neural Networks (RNNs)
      • RNN Cell
      • RNN Layer
      • Shapes
      • Stacked RNN
      • Bidirectional RNN
      • Square Model
      • Visualizing the Model
        • Transformed Inputs
        • Hidden States
        • The Journey of a Hidden State
    • Gated Recurrent Units (GRUs)
    • Long Short-Term Memory (LSTM)
    • Variable-Length Sequences
      • Padding
      • Packing
      • Unpacking (to padded)
      • Packing (from padded)
      • Variable-Length Dataset
      • Collate Function
    • 1D Convolutions
      • Shapes
      • Multiple Features or Channels
      • Dilation
  • Chapter 9: Sequence-to-Sequence
    • Sequence-to-Sequence
    • Encoder-Decoder Architecture
      • Teacher Forcing
    • Attention
      • "Values"
      • "Keys" and "Queries"
      • Computing the Context Vector
      • Scoring Method
      • Attention Scores
      • Scaled Dot Product
      • Attention Mechanism
      • Source Mask
      • Decoder
      • Encoder + Decoder + Attention
      • Multi-Headed Attention
    • Self-Attention
      • Encoder
      • Cross-Attention
      • Decoder
        • Subsequent Inputs and Teacher Forcing
        • Target Mask
    • Positional Encoding (PE)
  • Chapter 10: Transform and Roll Out
    • Transform and Roll Out
    • Narrow Attention
      • Chunking
      • Multi-Headed Attention
    • Stacking Encoders and Decoders
    • Wrapping "Sub-Layers"
    • Transformer Encoder
    • Transformer Decoder
    • Layer Normalization
      • Batch vs Layer
      • Projections or Embeddings
    • The Transformer
    • The PyTorch Transformer
    • Vision Transformer
      • Patches
      • Special Classifier Token
  • Part IV: Natural Language Processing
  • Chapter 11: Down the Yellow Brick Rabbit Hole
    • Down the Yellow Brick Rabbit Hole
    • Building a Dataset
      • Sentence Tokenization
      • HuggingFace's Dataset
    • Word Tokenization
      • Vocabulary
      • HuggingFace's Tokenizer
    • Before Word Embeddings
      • One-Hot Encoding (OHE)
      • Bag-of-Words (BoW)
      • Language Models
      • N-grams
      • Continuous Bag-of-Words (CBoW)
    • Word Embeddings
      • Word2Vec
      • Global Vectors (GloVe)
      • Using Word Embeddings
      • Model I - GloVe + Classifier
      • Model II - GloVe + Transformer
    • Contextual Word Embeddings
      • ELMo
      • BERT
      • Document Embeddings
      • Model III - Preprocessed Embeddings
    • BERT
      • Tokenization
      • Input Embeddings
      • Pretraining Tasks
      • Model IV - Classifying using BERT
    • Fine-Tuning with HuggingFace
      • Sequence Classification (or Regression)
      • Trainer
      • Pipelines
    • GPT-2
