Mathematics of Reinforcement Learning

Master the mathematics behind modern artificial intelligence. This complete two-volume series takes you from Bellman equations and Markov Decision Processes to Q-Learning, Deep Q-Networks, Policy Gradients, Actor-Critic architectures, and advanced reinforcement learning research. It is intended for students, researchers, and AI professionals seeking both theoretical depth and practical understanding.

These books have a total suggested price of $48.99. Get them now for only $29.00!

About the Bundle

The Mathematics of Reinforcement Learning Complete Series (Vol-I & Vol-II) is a comprehensive mathematical and algorithmic guide to one of the most influential fields in modern Artificial Intelligence. Written by Anshuman Mishra, this two-volume series takes readers on a rigorous journey from the fundamental principles of Markov Decision Processes and Bellman Equations to advanced topics such as Deep Q-Networks, Policy Gradient Methods, Actor-Critic Architectures, and modern Reinforcement Learning research.

Reinforcement Learning (RL) is the science of learning through interaction, rewards, and optimal decision-making under uncertainty. It has contributed to some of the most remarkable achievements in AI, including work on autonomous robots, recommendation systems, self-driving vehicles, game-playing agents, and large-scale adaptive systems.

Unlike conventional books that emphasize coding implementations alone, this bundle focuses on the mathematical foundations that make reinforcement learning work. Every major concept is derived systematically using probability theory, linear algebra, optimization, stochastic processes, and dynamic programming.

What You'll Learn

Volume I – Foundations of Reinforcement Learning
  • Introduction to Reinforcement Learning
  • Agents, Environments, States, Actions, and Rewards
  • Markov Processes and Markov Decision Processes (MDPs)
  • Mathematical Foundations of RL
  • Linear Algebra and Probability for RL
  • Bellman Expectation and Optimality Equations
  • Dynamic Programming Methods
  • Policy Evaluation and Policy Iteration
  • Value Iteration Algorithms
  • Monte Carlo Methods
  • Temporal Difference Learning
  • Eligibility Traces and TD(λ)
  • Sarsa and Expected Sarsa
  • Q-Learning and Off-Policy Learning
Volume II – Advanced Reinforcement Learning and Research Perspectives
  • Policy Gradient Theorem
  • REINFORCE Algorithms
  • Actor-Critic Architectures
  • Entropy-Regularized Reinforcement Learning
  • Constrained and Safe Reinforcement Learning
  • Deep Q-Networks (DQN)
  • Double DQN and Dueling Networks
  • Prioritized Experience Replay
  • Function Approximation Methods
  • Proximal Policy Optimization (PPO)
  • Advanced Exploration Strategies
  • Convergence Analysis and Stability Proofs
  • Explainable Reinforcement Learning
  • Quantum Reinforcement Learning
  • Current Research Challenges and Future Directions

Why This Bundle Is Different

Most reinforcement learning books focus primarily on implementation and software libraries. This series takes a different approach by explaining the mathematical principles that govern learning, convergence, optimization, and decision-making.

Readers will learn:

  • Why Bellman equations form the foundation of optimal sequential decision-making.
  • How dynamic programming evolves into modern RL algorithms.
  • Why, and under which conditions, Q-learning converges to the optimal action-value function and hence to an optimal policy.
  • How policy gradients are mathematically derived.
  • How neural networks integrate with reinforcement learning.
  • How modern systems such as AlphaGo and autonomous robots learn from experience.
  • How theoretical guarantees and convergence proofs support practical AI systems.

Every concept is explained through mathematical derivations, numerical examples, graphical intuition, and algorithmic implementation.

Key Topics Covered

  • Dynamic Programming
  • Bellman Equations
  • Markov Decision Processes
  • Monte Carlo Estimation
  • Temporal Difference Learning
  • Q-Learning
  • Sarsa
  • Policy Optimization
  • Actor-Critic Methods
  • Deep Reinforcement Learning
  • Deep Q-Networks (DQN)
  • PPO Algorithms
  • Function Approximation
  • Stochastic Optimization
  • Safe Reinforcement Learning
  • Explainable AI
  • Quantum Reinforcement Learning

Who Should Read This Bundle?

This bundle is ideal for:

  • B.Tech, BCA, MCA, M.Tech, and M.Sc. AI students
  • Artificial Intelligence and Data Science learners
  • Machine Learning Engineers
  • Research Scholars and PhD Candidates
  • University Faculty and Educators
  • Applied Mathematicians
  • Robotics and Autonomous Systems Researchers
  • Industry Professionals seeking deeper theoretical understanding of AI

Educational Value

The bundle is designed to serve as:

  • A university-level textbook
  • A reference for advanced AI courses
  • A guide for reinforcement learning research
  • A preparation resource for graduate studies and interviews
  • A mathematical foundation for modern AI development

Each chapter contains mathematical derivations, proofs, solved numerical examples, pseudocode, conceptual discussions, and research-oriented exercises.

Learning Outcomes

After completing this series, readers will be able to:

  • Understand reinforcement learning from first principles.
  • Derive Bellman equations mathematically.
  • Implement value-based and policy-based algorithms.
  • Analyze convergence and stability of RL methods.
  • Develop deep reinforcement learning systems.
  • Apply RL concepts to robotics, autonomous agents, and optimization problems.
  • Read and understand modern reinforcement learning research papers with confidence.

This complete series transforms reinforcement learning from a collection of algorithms into a mathematically elegant science of intelligent decision-making under uncertainty.

Books

About the Books

Mathematics of Reinforcement Learning VOL-1

Mathematics of Reinforcement Learning: From Bellman Equations to Q-Learning VOL-1
A Mathematical Journey through Dynamic Programming and Optimal Decision-Making
Author: Anshuman Mishra, M.Tech (Computer Science)
Assistant Professor, Doranda College, Ranchi University

COPYRIGHT PAGE

© 2025 Anshuman Mishra, M.Tech (Computer Science)
All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means—electronic, mechanical, photocopying, recording, or otherwise—without the prior written permission of the author or publisher, except for brief quotations used in reviews, academic references, or scholarly works.

First Edition: 2025

DISCLAIMER

This book is designed to provide academic and research-based knowledge on the mathematics of reinforcement learning, including the principles of dynamic programming, Bellman equations, Q-learning, and related computational models. The information contained herein is intended solely for educational purposes, for students, teachers, and researchers in computer science, mathematics, and artificial intelligence.

While every effort has been made to ensure the accuracy of the contents, the author and publisher make no representations or warranties with respect to the accuracy or completeness of the contents of this book. The examples, algorithms, and derivations have been thoroughly checked, but errors may still exist. The author and publisher shall not be liable for any damages arising from the use of the material contained herein.

The mathematical examples and algorithms are for educational and illustrative purposes only.
Readers implementing algorithms for research or practical projects are encouraged to verify results independently and consult additional resources as needed.

All trademarks, trade names, or logos mentioned belong to their respective owners. Any resemblance of examples or case studies to actual data, individuals, or organizations is purely coincidental.

BOOK DESCRIPTION

Title: Mathematics of Reinforcement Learning: From Bellman Equations to Q-Learning VOL-1
Subtitle: A Mathematical Journey through Dynamic Programming and Optimal Decision-Making

Author: Anshuman Mishra, M.Tech (Computer Science)
Assistant Professor, Doranda College, Ranchi University

About the Book

The 21st century marks a revolutionary transformation in artificial intelligence (AI), where machines are not only learning from data but are also learning how to act intelligently in dynamic environments. Among the various branches of AI, Reinforcement Learning (RL) provides the mathematical and conceptual framework that allows computers and robots to make autonomous decisions through trial and error, guided by reward.

This book, Mathematics of Reinforcement Learning, serves as a bridge between mathematical theory and practical algorithms, enabling readers to deeply understand the mathematical intuition behind learning systems that think, adapt, and optimize behavior.

Unlike traditional AI books that focus only on algorithmic implementation, this book develops the complete mathematical foundation, from Bellman equations and dynamic programming to Monte Carlo methods, temporal-difference learning, and Q-learning. Each topic is derived mathematically, explained systematically, and complemented with step-by-step numerical examples and proofs.

This book is written specifically for:

·        Undergraduate and postgraduate students (B.Tech, BCA, MCA, M.Sc. AI, Data Science)

·        Teachers and researchers in artificial intelligence and applied mathematics

·        Industry professionals and developers seeking deeper theoretical clarity in RL

Philosophy Behind the Book

Most introductory books on reinforcement learning explain algorithms but rarely delve into why these algorithms work or how their mathematical properties guarantee convergence, stability, and optimality. This book aims to unveil the mathematics that drives intelligence, presenting reinforcement learning not as a set of black-box algorithms but as a beautifully structured mathematical framework grounded in linear algebra, probability, optimization, and dynamic programming.

Each chapter begins with fundamental theory and builds toward algorithmic application, showing how every step—from expectation computation to Bellman optimization—can be rigorously formulated using mathematical logic.

The goal is to empower readers not only to use reinforcement learning but also to understand it and innovate upon it.

Structure and Organization

This book is divided into seven modules and twenty comprehensive chapters, organized in an intuitive learning sequence.

Module I: Foundations of Reinforcement Learning

This module begins with the basic building blocks (agents, environments, states, actions, and rewards) and introduces readers to the concept of learning through interaction.
Chapters 1 to 3 explore:

·        The mathematical definitions of Markov Processes and Decision Models

·        The essential linear algebra and probability theory underlying reinforcement learning

·        The formal structure of Markov Decision Processes (MDPs) and Bellman equations

By the end of this module, the reader understands the theoretical backbone of RL, paving the way for algorithmic exploration.

Module II: Bellman Equations and Dynamic Programming

Here, the mathematics of optimality takes center stage. The Bellman equations are explored in full depth—both expectation and optimality formulations—along with proofs of convergence and computational methods.
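
For reference, the expectation and optimality forms are commonly written as follows for a finite MDP with dynamics p(s', r | s, a), policy π, and discount factor γ. This is standard textbook notation, assumed here for illustration; the book's own notation may differ.

```latex
v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,\bigl[ r + \gamma\, v_\pi(s') \bigr]

v_*(s) = \max_{a} \sum_{s', r} p(s', r \mid s, a)\,\bigl[ r + \gamma\, v_*(s') \bigr]

q_*(s, a) = \sum_{s', r} p(s', r \mid s, a)\,\bigl[ r + \gamma \max_{a'} q_*(s', a') \bigr]
```

The first equation is linear in v_π and defines policy evaluation; the max in the second and third makes them nonlinear and is what value iteration and Q-learning exploit.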

Dynamic programming methods such as policy evaluation, policy iteration, and value iteration are introduced with complete derivations and worked-out numerical examples. The connection between dynamic programming and reinforcement learning is clearly established, showing how each step in the algorithm emerges from a recursive mathematical structure.
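
As a small illustration of value iteration, the sketch below repeatedly applies the Bellman optimality backup to a toy two-state MDP and then extracts a greedy policy. The dictionary encoding P[state][action] = [(probability, next_state, reward), ...], the helper names q_value and value_iteration, and the toy MDP itself are assumptions made for this example, not taken from the book.

```python
from typing import Dict, List, Tuple

# Hypothetical MDP encoding: P[state][action] -> list of (probability, next_state, reward)
MDP = Dict[int, Dict[int, List[Tuple[float, int, float]]]]


def q_value(P: MDP, V: Dict[int, float], s: int, a: int, gamma: float) -> float:
    """Expected one-step return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P: MDP, gamma: float = 0.9, tol: float = 1e-8):
    """Apply the Bellman optimality backup until the largest change is below tol."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place (Gauss-Seidel style) sweep
        if delta < tol:
            break
    # Extract a greedy policy with respect to the converged values
    policy = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}
    return V, policy


# Toy two-state MDP (illustrative only): in state 0, action 1 earns 1 and moves to
# state 1; in state 1, the single action earns 2 on every step forever.
P: MDP = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 1, 2.0)]},
}
V, policy = value_iteration(P)
```

For this toy MDP, V[1] should approach 2 / (1 - 0.9) = 20 and V[0] should approach 1 + 0.9 * 20 = 19, with the greedy policy choosing action 1 in state 0. Policy iteration would reach the same fixed point by alternating full policy evaluation with greedy improvement.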

Module III: Monte Carlo and Temporal-Difference Learning

This module blends probability, sampling, and prediction. It explains how learning can happen from experience through Monte Carlo estimation and Temporal Difference (TD) learning.
Readers learn the relationships between bias, variance, convergence speed, and data efficiency. The transition from offline to online learning is demonstrated through examples like the Blackjack problem and Random Walk prediction.
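
To make the contrast with Monte Carlo concrete, here is a hedged sketch of tabular TD(0) prediction on the classic five-state random walk (states A to E, each episode starting in the middle, reward +1 only when exiting on the right, 0 otherwise). The function name, step size, episode count, and seed are illustrative assumptions; the book's own Random Walk example may be set up differently. Unlike Monte Carlo, which waits for the episode's final return, TD(0) updates after every step by bootstrapping from the current estimate of the next state.

```python
import random
from typing import List


def td0_random_walk(episodes: int = 5000, alpha: float = 0.02,
                    gamma: float = 1.0, seed: int = 0) -> List[float]:
    """Tabular TD(0) prediction for the uniform random policy on a 5-state random walk."""
    rng = random.Random(seed)
    n = 5                      # non-terminal states 0..4 (A..E)
    V = [0.5] * n              # initial estimates; both terminals have value 0
    for _ in range(episodes):
        s = n // 2             # every episode starts in the middle state (C)
        while True:
            s_next = s + rng.choice((-1, 1))
            done = s_next < 0 or s_next >= n
            reward = 1.0 if s_next >= n else 0.0
            # One-step TD target: bootstrap from V(s') unless s' is terminal
            target = reward + (0.0 if done else gamma * V[s_next])
            V[s] += alpha * (target - V[s])
            if done:
                break
            s = s_next
    return V


V = td0_random_walk()
true_values = [(i + 1) / 6 for i in range(5)]   # 1/6, 2/6, ..., 5/6
```

With a constant step size the estimates keep fluctuating slightly, but they should settle close to the true values 1/6 through 5/6.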

Eligibility traces and TD(λ) methods are explained rigorously with mathematical equivalence proofs, bridging theory with implementation.
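
Eligibility traces can be sketched in the same setting. The following is a minimal backward-view TD(λ) with accumulating traces on the same hypothetical five-state random walk; the trace type, λ = 0.6, the step size, and the function name are illustrative assumptions rather than the book's choices.

```python
import random
from typing import List


def td_lambda_random_walk(episodes: int = 5000, alpha: float = 0.01, gamma: float = 1.0,
                          lam: float = 0.6, seed: int = 1) -> List[float]:
    """Backward-view tabular TD(lambda) with accumulating eligibility traces."""
    rng = random.Random(seed)
    n = 5
    V = [0.5] * n
    for _ in range(episodes):
        e = [0.0] * n                      # traces are reset at the start of each episode
        s = n // 2
        while True:
            s_next = s + rng.choice((-1, 1))
            done = s_next < 0 or s_next >= n
            reward = 1.0 if s_next >= n else 0.0
            v_next = 0.0 if done else V[s_next]
            delta = reward + gamma * v_next - V[s]   # one-step TD error
            e[s] += 1.0                              # accumulating trace for the current state
            for i in range(n):
                V[i] += alpha * delta * e[i]         # credit every state in proportion to its trace
                e[i] *= gamma * lam                  # traces decay by gamma * lambda
            if done:
                break
            s = s_next
    return V


V_lambda = td_lambda_random_walk()
true_values = [(i + 1) / 6 for i in range(5)]
```

Setting lam=0 makes every trace vanish except the current state's, which reduces the update to TD(0); values near 1 make it behave like an online approximation of every-visit Monte Carlo.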

Module IV: Control Algorithms — From Sarsa to Q-Learning

The heart of reinforcement learning—learning to control—is covered in this section.
Starting with on-policy control (Sarsa) and progressing to off-policy control (Q-Learning), readers explore the mathematical mechanisms that enable agents to learn optimal strategies.

The derivation of the Q-learning update rule from the Bellman optimality principle is shown step-by-step, providing a strong conceptual understanding of how agents converge to optimal policies.
Comparisons between different approaches (Sarsa, Expected Sarsa, and Q-Learning) are backed with numerical and graphical examples.
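
The three control methods compared here differ mainly in the bootstrap target used to update Q(s, a). The sketch below assumes a tabular Q stored as a list of lists indexed Q[state][action] and an ε-greedy behaviour policy; the function names and the small example table are hypothetical, chosen only to show the three targets side by side.

```python
from typing import List

QTable = List[List[float]]  # hypothetical tabular layout: Q[state][action]


def epsilon_greedy_probs(q_row: List[float], epsilon: float) -> List[float]:
    """Action probabilities of an epsilon-greedy policy for one state's Q-values."""
    n = len(q_row)
    greedy = max(range(n), key=lambda a: q_row[a])
    probs = [epsilon / n] * n
    probs[greedy] += 1.0 - epsilon
    return probs


def sarsa_target(r: float, gamma: float, Q: QTable, s_next: int, a_next: int) -> float:
    # On-policy: bootstrap from the action the behaviour policy actually takes in s'
    return r + gamma * Q[s_next][a_next]


def expected_sarsa_target(r: float, gamma: float, Q: QTable, s_next: int, epsilon: float) -> float:
    # Average Q(s', .) under the epsilon-greedy policy instead of sampling one action
    probs = epsilon_greedy_probs(Q[s_next], epsilon)
    return r + gamma * sum(p * q for p, q in zip(probs, Q[s_next]))


def q_learning_target(r: float, gamma: float, Q: QTable, s_next: int) -> float:
    # Off-policy: bootstrap from the greedy action, as in the Bellman optimality equation
    return r + gamma * max(Q[s_next])


def td_update(Q: QTable, s: int, a: int, target: float, alpha: float) -> None:
    """Shared update rule: move Q(s, a) a step of size alpha toward the target."""
    Q[s][a] += alpha * (target - Q[s][a])


# Illustrative transition: reward 1, gamma 0.5, next state 1, next action 0
q_table: QTable = [[0.0, 0.0], [1.0, 3.0]]
sarsa = sarsa_target(1.0, 0.5, q_table, s_next=1, a_next=0)                 # 1 + 0.5 * 1.0
expected = expected_sarsa_target(1.0, 0.5, q_table, s_next=1, epsilon=0.2)  # 1 + 0.5 * (0.1*1 + 0.9*3)
greedy = q_learning_target(1.0, 0.5, q_table, s_next=1)                     # 1 + 0.5 * 3.0
```

All three methods share the same update rule; only the target changes. Expected Sarsa removes the sampling noise of the next action, and Q-learning replaces the expectation with a max, which is what makes it off-policy.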

 

 

Module V: Advanced Mathematical Tools and Extensions

At this point, the book transitions from classical reinforcement learning to advanced formulations.
Topics include:

·        Policy Gradient Theorem and its derivation

·        Actor-Critic architecture with detailed gradient calculations

·        Regularization and constrained optimization for safe and stable learning

·        Entropy and KL-Divergence based formulations for robust policy optimization
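
For orientation, a common statement of the policy gradient theorem and the REINFORCE update it motivates is given below, for the episodic case without discounting, with a parameterized policy π(a | s, θ), on-policy state distribution μ, and return G_t. This is standard notation assumed for illustration; the book's derivation and symbols may differ.

```latex
\nabla_\theta J(\theta) \;\propto\; \sum_{s} \mu(s) \sum_{a} q_{\pi}(s, a)\, \nabla_\theta \pi(a \mid s, \theta)
  \;=\; \mathbb{E}_{\pi}\!\left[ q_{\pi}(S_t, A_t)\, \nabla_\theta \ln \pi(A_t \mid S_t, \theta) \right]

\theta_{t+1} = \theta_t + \alpha\, G_t\, \nabla_\theta \ln \pi(A_t \mid S_t, \theta_t)
```

Actor-critic methods keep the same form but replace the Monte Carlo return G_t with an estimate produced by a learned critic, such as a TD error, which typically lowers variance at the cost of some bias.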

Readers are introduced to Lagrangian optimization in RL, showing how constraints, such as safety or cost limits, can be imposed mathematically on the policy optimization objective.
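
A standard way to write a constrained RL problem of this kind uses a reward objective J_R(π), a cost objective J_C(π), and a cost budget d; these symbols are introduced here for illustration. The Lagrange multiplier is written λ below (not to be confused with the TD(λ) trace parameter).

```latex
\max_{\pi} \; J_R(\pi) \quad \text{subject to} \quad J_C(\pi) \le d

\mathcal{L}(\pi, \lambda) = J_R(\pi) - \lambda \bigl( J_C(\pi) - d \bigr), \qquad \lambda \ge 0

\min_{\lambda \ge 0} \; \max_{\pi} \; \mathcal{L}(\pi, \lambda)
```

In primal-dual methods the policy ascends the Lagrangian while λ is raised when the constraint is violated (J_C(π) > d) and lowered, down to zero, when it is satisfied.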

Module VI: Deep and Approximate Reinforcement Learning

This section connects traditional reinforcement learning to deep neural networks and function approximation.
The mathematical underpinnings of Deep Q-Networks (DQN) are derived, explaining loss functions, gradient backpropagation, and the role of target networks.
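
The loss usually meant here is the squared TD error against a target network whose parameters θ⁻ are periodically copied from the online parameters θ, with transitions drawn from a replay buffer D. The formulation below follows the common DQN presentation and is not necessarily the book's exact notation.

```latex
y = \begin{cases}
  r, & \text{if } s' \text{ is terminal} \\
  r + \gamma \max_{a'} Q(s', a'; \theta^{-}), & \text{otherwise}
\end{cases}

L(\theta) = \mathbb{E}_{(s, a, r, s') \sim \mathcal{D}} \Bigl[ \bigl( y - Q(s, a; \theta) \bigr)^{2} \Bigr]

\nabla_\theta L(\theta) = -2\, \mathbb{E}\Bigl[ \bigl( y - Q(s, a; \theta) \bigr)\, \nabla_\theta Q(s, a; \theta) \Bigr]
  \quad \text{($y$ treated as a constant)}
```

Holding y fixed during differentiation, and computing it with the slowly changing θ⁻, is what keeps the regression target from shifting on every gradient step.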

Extensions such as Double DQN, Dueling Networks, and Prioritized Experience Replay, along with policy-optimization methods such as Proximal Policy Optimization (PPO), are also presented with mathematical clarity.
Through carefully designed examples, the book shows how deep learning integrates with reinforcement learning, a combination that has contributed to modern AI systems such as AlphaGo and autonomous robots.
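
The central change Double DQN makes to the DQN target fits in a few lines: the online network selects the next action and the target network evaluates it, which is intended to reduce the overestimation caused by taking a max over noisy estimates. The sketch below works on plain Python lists of precomputed next-state Q-values for a hypothetical batch; the function names and batch layout are assumptions, and in practice these values would come from the online and target networks.

```python
from typing import List, Sequence


def dqn_targets(rewards: Sequence[float], dones: Sequence[bool],
                q_target_next: Sequence[Sequence[float]], gamma: float) -> List[float]:
    """Standard DQN target: max over the target network's next-state values."""
    return [
        r + (0.0 if done else gamma * max(q_next))
        for r, done, q_next in zip(rewards, dones, q_target_next)
    ]


def double_dqn_targets(rewards: Sequence[float], dones: Sequence[bool],
                       q_online_next: Sequence[Sequence[float]],
                       q_target_next: Sequence[Sequence[float]], gamma: float) -> List[float]:
    """Double DQN target: online network picks a', target network evaluates it."""
    targets = []
    for r, done, q_on, q_tg in zip(rewards, dones, q_online_next, q_target_next):
        if done:
            targets.append(r)  # no bootstrapping from terminal states
        else:
            a_star = max(range(len(q_on)), key=lambda a: q_on[a])  # action selection
            targets.append(r + gamma * q_tg[a_star])               # action evaluation
    return targets


# Hypothetical batch of two transitions; the second one ends the episode
rewards = [1.0, 0.0]
dones = [False, True]
q_online_next = [[2.0, 1.0], [0.0, 0.0]]
q_target_next = [[1.5, 3.0], [5.0, 5.0]]
plain = dqn_targets(rewards, dones, q_target_next, gamma=0.5)
double = double_dqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.5)
```

In the first transition the two networks disagree about the best next action, so the plain DQN target uses the target network's own maximum (3.0) while Double DQN evaluates the online network's choice (action 0, worth 1.5), giving a smaller target.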

Module VII: Theoretical and Research Perspectives

The final section consolidates all mathematical insights, focusing on proofs, convergence theorems, and future research directions.
It contains:

·        Rigorous proofs of TD and Q-learning convergence

·        Stability analysis using stochastic approximation theory

·        Exploration of open challenges such as safe RL, explainable RL, and quantum RL
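
The stochastic approximation results mentioned here typically rest on Robbins-Monro step-size conditions. For example, tabular Q-learning in a finite MDP with bounded rewards and γ < 1 converges to q_* with probability 1 if every state-action pair is updated infinitely often and the step sizes satisfy:

```latex
\sum_{t=0}^{\infty} \alpha_t(s, a) = \infty, \qquad \sum_{t=0}^{\infty} \alpha_t^{2}(s, a) < \infty

Q_{t+1}(s, a) = Q_t(s, a) + \alpha_t(s, a) \Bigl[ r + \gamma \max_{a'} Q_t(s', a') - Q_t(s, a) \Bigr]
```

A schedule such as α_t = 1/t satisfies both conditions, whereas a constant step size violates the second one, which is why constant-step methods only hover around the solution rather than converging to it exactly.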

This section encourages teachers and researchers to extend the theoretical boundaries of reinforcement learning.

Pedagogical Features

To ensure clarity and academic depth, each chapter includes:

·        Conceptual Explanation: Theoretical context and motivation

·        Mathematical Derivation: Step-by-step proofs and equations

·        Algorithm Design: Pseudocode for each major algorithm

·        Numerical Examples: Solved problems for classroom and self-practice

·        Visual Illustrations: Graphical understanding of value functions and convergence

·        Exercises and Research Notes: For deeper investigation

This structure makes the book equally useful for students learning the subject, teachers designing course material, and researchers developing new models.

Why This Book Is Unique

1.      Mathematical Depth: Every equation is derived and explained, not merely presented.

2.      Pedagogical Precision: Structured for both classroom teaching and independent study.

3.      Balanced Approach: Covers both classical RL (Bellman, DP, Q-learning) and modern RL (DQN, PPO, Actor-Critic).

4.      Research Orientation: Provides open problems, mathematical proofs, and advanced theoretical questions.

5.      Language Clarity: Written in simple, academic English with minimal jargon.

While most books treat RL as a subset of machine learning, this book presents RL as a pure mathematical science of decision-making under uncertainty.

Mathematics of Reinforcement Learning VOL-2

