
Modern GPU Architecture and Programming Complete Bundle

A 7-book GPU collection covering architecture, CUDA, assembly, PTX, SASS, and parallel computing. Learn how to move from high-level programming to low-level execution and optimize performance across modern GPU systems.

Bought separately: $203.00
Minimum price: $87.00
You pay: $197.00
Author earns: $157.60

...Or Buy With Credits!

You can get credits with a paid monthly or annual Reader Membership, or you can buy them here.
These books have a total suggested price of $203.00. Get them now for only $87.00!

About the Bundle

Achieve full-spectrum mastery of GPU systems with the Modern GPU Architecture and Programming Complete Bundle—a comprehensive 7-book collection covering architecture, programming, and low-level optimization.

This bundle is designed as a complete progression from hardware foundations to instruction-level control, giving you the ability to understand, analyze, and optimize GPU performance across every layer of the stack.

It integrates architectural insight with practical programming models and low-level instruction analysis—bridging the gap between CUDA development, GPU assembly, and real hardware behavior.

Included in this collection:

  • Advanced GPU Assembly Programming (Second Edition) – Low-level control across NVIDIA and AMD architectures
  • Advanced CUDA Programming – High-performance computing techniques and GPU acceleration
  • Modern GPU Architecture (Volumes One & Two, Second Edition) – Graphics pipeline, compute systems, tensor cores, and hardware design
  • Mastering PTX and SASS (Volumes One & Two) – Instruction-level execution, optimization, and hardware mapping
  • GPU Parallel Computing (Second Edition) – Core parallel programming principles and performance scaling

What this bundle delivers:

  • Complete coverage from architecture to assembly-level execution
  • Deep understanding of how code maps to hardware behavior
  • Advanced optimization strategies for HPC and AI workloads
  • A unified view of GPU programming across abstraction layers

This is a full-stack GPU engineering library—built for developers and engineers who want to operate beyond frameworks and gain precise control over performance.


About the Books

GPU Parallel Computing

From Basics to Breakthroughs in GPU Programming

GPU Parallel Computing: From Basics to Breakthroughs — A Technical Guide to GPU Programming

If you want to understand how modern GPUs work and how to use them effectively for high-performance workloads, this book provides the technical foundation required.

This book assumes no prior exposure to GPU internals; however, a working knowledge of electronics and general computer architecture is recommended.

It is written for students, engineers, researchers, and data scientists who are new to GPU architecture and parallel programming and want a rigorous introduction before progressing into optimization and large-scale GPU systems.

If you are already an experienced CUDA performance engineer or low-level GPU architect seeking a specialized microarchitectural reference manual, this book is not positioned for that purpose.

What You Will Learn

GPU Architecture Fundamentals

  • Streaming multiprocessors and SIMT execution
  • Warp scheduling and instruction flow
  • GPU memory hierarchy and bandwidth considerations

GPU Programming Models

  • CUDA programming principles
  • OpenCL fundamentals
  • Kernel structure and execution behavior

Performance Optimization

  • Memory access patterns and coalescing
  • Warp divergence and latency hiding
  • Occupancy principles and kernel configuration

Real-World Applications

  • Scientific simulations
  • Machine learning workloads
  • Graphics and visualization pipelines

Advanced Topics

  • Multi-GPU communication
  • Tensor cores and mixed precision
  • Profiling, debugging, and performance analysis

The early chapters establish architectural clarity and programming fundamentals. Later chapters address optimization strategies, scalability, and applied GPU workloads.

Who This Book Is For

  • Students entering GPU computing
  • Engineers transitioning into parallel architecture
  • Researchers and data scientists adopting GPU acceleration

This is a technical book. It builds understanding from architectural principles upward and focuses on performance-oriented reasoning rather than superficial overview.

Why This Book

Many GPU resources either assume too much prior knowledge or remain overly abstract. This book emphasizes structured technical understanding:

  • How GPUs execute threads
  • Why performance bottlenecks occur
  • How architectural constraints shape results
  • How programming decisions map to hardware behavior

Clear explanations. Practical code examples. Architectural context.
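The occupancy principles covered in the Performance Optimization chapters can be sketched numerically. The snippet below is an illustrative model only — the streaming-multiprocessor limits (register file size, shared memory, maximum resident threads) are rough Ampere-class figures assumed for the example, not values taken from the book:

```python
# Sketch: how per-thread register use and per-block shared memory
# bound the number of resident blocks on one SM, and thus occupancy.
# Hardware limits below are illustrative (roughly Ampere-class).

REGS_PER_SM = 65536        # 32-bit registers per SM
SMEM_PER_SM = 100 * 1024   # bytes of shared memory per SM
MAX_THREADS_PER_SM = 2048
MAX_BLOCKS_PER_SM = 32

def occupancy(threads_per_block, regs_per_thread, smem_per_block):
    # Each resource imposes its own cap on resident blocks;
    # the tightest cap wins.
    by_regs = REGS_PER_SM // (regs_per_thread * threads_per_block)
    by_smem = (SMEM_PER_SM // smem_per_block) if smem_per_block else MAX_BLOCKS_PER_SM
    by_threads = MAX_THREADS_PER_SM // threads_per_block
    blocks = min(by_regs, by_smem, by_threads, MAX_BLOCKS_PER_SM)
    return blocks * threads_per_block / MAX_THREADS_PER_SM

# A lean kernel keeps the SM full...
print(occupancy(256, 32, 0))    # 1.0
# ...while heavy register use cuts resident threads to a quarter.
print(occupancy(256, 128, 0))   # 0.25
```

The same resource-budget reasoning is what profilers report as "achieved occupancy"; the book develops it with real hardware limits per architecture.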

Modern GPU Architecture Second Edition

Volume Two: Compute Acceleration, Tensor Cores, and Advanced Systems


Modern GPU Architecture Second Edition

Volume One: Graphics Pipeline Design and Hardware Implementation

Modern GPU Architecture Second Edition — Volume One
Graphics Pipeline Design and Hardware Implementation

Modern GPUs are the most complex and efficient parallel processors ever created—and this book shows you exactly how they work at the hardware level.

Unlike typical graphics or programming guides, this volume takes you inside the GPU itself:
how instructions flow through pipelines, how memory hierarchies sustain bandwidth, how shader cores and fixed-function units cooperate to render billions of pixels per second.

You’ll explore every major stage of the graphics pipeline in depth—geometry, rasterization, shading, texturing, and render output—all supported by clear mathematical models and synthesizable Verilog examples. This is not “theory for theory’s sake”; it’s engineering detail you can apply directly in design, simulation, or hardware verification.

By reading this book, you’ll gain:

  • Architectural intuition — understand how throughput, latency, and bandwidth interact in real GPUs.
  • Practical RTL-level insight — see how each stage can be implemented with clean, synthesizable Verilog.
  • A foundation for advanced design — build the knowledge required for AI acceleration, compute architectures, or FPGA-based GPU prototyping.
  • Confidence to analyze real silicon — reason about performance, bottlenecks, and tradeoffs like a hardware architect.

Every chapter bridges concept and implementation, making it invaluable for anyone designing graphics hardware, studying computer architecture, or seeking mastery of parallel computation systems.

Dense, detailed, and unapologetically technical, this book is written for those who want to understand modern GPUs—not just use them.

⚠️ This isn’t entertainment. It’s engineering.
If that excites you, welcome aboard.
If it intimidates you, this book isn’t for you.

From the Editor at Burst Books — Gareth Thomas

A Smarter Kind of Learning Has Arrived — Thinking on Its Own.

Forget tired textbooks from years past. These AI-crafted STEM editions advance at the speed of discovery. Each page is built by intelligence trained on thousands of trusted sources, delivering crystal-clear explanations, flawless equations, and functional examples — all refreshed through the latest breakthroughs.

Best of all, these editions cost a fraction of traditional texts yet surpass expectations. You’re gaining more than a book — you’re enhancing the mind’s performance.

Explore BurstBooksPublishing on GitHub to find technical samples, infographics, and additional study material — a complete hub that supports deeper, hands-on learning.

In this age of AI, leave the past behind and learn directly from tomorrow.

Advanced GPU Assembly Programming Second Edition

A Technical Reference for NVIDIA and AMD Architectures

Uncover the fundamentals of GPU architecture and assembly programming with Advanced GPU Assembly Programming, a resource designed for enthusiasts and professionals who want to explore the intricate workings of modern GPUs. This book is not a step-by-step manual but a gateway to understanding GPU architecture and assembly programming at a foundational level. It’s ideal for readers who are ready to invest their own effort to experiment and grow their expertise.

What You’ll Gain:

1. Deep Insights into GPU Architecture
  • Explore the fundamental differences between GPUs and CPUs, with a focus on parallelism, memory hierarchies, and threading models.
  • Learn the principles underlying the instruction set architectures (ISAs) of NVIDIA and AMD GPUs.
2. Foundations of Assembly Programming
  • Delve into the mechanics of low-level GPU programming, including execution models, memory access optimization, and pipeline management.
  • Understand the core concepts of assembly programming while preparing to apply them with additional resources and practice.
3. Tools and Techniques
  • Get an overview of key debugging and profiling tools such as NVIDIA Nsight and AMD Radeon GPU Profiler.
  • Gain the contextual knowledge to optimize GPU performance through careful analysis and tuning.
4. Future-Focused Knowledge
  • Stay ahead of emerging trends in GPU technology, from next-generation architectures to AI-driven optimization tools.

Who This Book is For:

  • Assembly Enthusiasts: Those eager to understand GPUs at their core and explore low-level programming.
  • Developers and Engineers: Professionals optimizing GPU-driven systems in gaming, AI, and scientific computing.
  • Researchers and Students: Anyone seeking a foundational understanding of GPU architectures and programming approaches.

What This Book is Not:

This is not a hands-on, step-by-step guide. Instead, it provides a conceptual framework and architectural insights to set readers on the right path. It encourages further exploration and learning through personal effort and experimentation.

Whether you’re a developer, researcher, or assembly enthusiast, Advanced GPU Assembly Programming will give you the knowledge needed to deeply understand GPU architecture and programming. Equip yourself with the foundational tools to explore, experiment, and achieve mastery in the fascinating world of GPU assembly.

Order your copy today and take your first step into the realm of GPU programming mastery!

UPDATE: This book now has a GitHub repository with all source code samples, infographics, an exercise manual, and more.


Mastering PTX and SASS

Volume I — The PTX Language and Architecture Foundations

If you’ve ever wondered why your GPU code hits a wall long before the hardware’s limits, this book tells you why—and how to break through it.

Most programmers stop where the compiler starts. They trust nvcc to make the right decisions, to manage registers, to schedule instructions, and to use memory efficiently. But the compiler doesn’t know your problem. It guesses. And in GPU computing, guessing costs performance.

Mastering PTX and SASS – Volume I pulls back the curtain on NVIDIA’s virtual machine—the PTX instruction set that every CUDA kernel becomes before it touches silicon. You’ll learn how threads, warps, and memory really behave at the hardware level, how each instruction interacts with caches and pipelines, and how to read, write, and reason about PTX like an architect, not just a coder.

This isn’t a surface-level “how-to.” It’s a deep, methodical tour through the machinery of modern GPUs—built for professionals who want measurable, repeatable speedups, not guesswork. You’ll discover how the compiler transforms your high-level logic into executable reality, and where you can step in to take control.

By the time you finish, you won’t be relying on compiler magic. You’ll understand it, improve it, and surpass it.

Mastering PTX and SASS – Volume I gives you the foundation; Volume II takes you to the bleeding edge of optimization. Together, they turn GPU performance from a mystery into a science.

Mastering PTX and SASS

Volume II — Optimization, SASS, and Advanced Techniques

You’ve mastered the architecture—now it’s time to own the performance.

Every GPU developer hits the same wall: the profiler says you’re close to peak, but you know there’s still headroom. What’s missing isn’t another compiler flag—it’s visibility into the hardware’s final truth. That truth lives in SASS, the real machine code running on NVIDIA GPUs.

Mastering PTX and SASS – Volume II takes you past theory into the territory where nanoseconds matter. Here you’ll learn how to read, analyze, and tune instruction streams with surgical precision. You’ll uncover how schedulers pair ops, how register pressure throttles throughput, and how to turn your kernels into clock-cycle-balanced engines of pure efficiency.

This book is for engineers who refuse to settle for “good enough.” It turns profiling, disassembly, and optimization into a repeatable process—one grounded in data, not superstition. From tensor cores to warp shuffles, from atomic operations to multi-GPU scaling, you’ll learn how real experts bend hardware to their will.

Volume I built the foundation; Volume II shows you how to weaponize it.
If you’re ready to squeeze every drop of performance from your GPU—and understand exactly how you did it—this is the manual you’ve been waiting for.

Advanced CUDA Programming

High Performance Computing with GPUs

NOTICE: All code for this book, and many more, is available in the BurstBooksPublishing repository on GitHub.

Advanced CUDA Programming: High-Performance Computing with GPUs is the ultimate guide to unlocking the full power of modern GPU computing. Whether you're developing AI models, optimizing scientific simulations, or pushing real-time applications to their limits, this book delivers the advanced techniques and expert insights you need to achieve peak CUDA performance.

GPU programming is no longer optional—it's a necessity in today's world of deep learning, AI acceleration, and high-performance computing. But simply writing CUDA kernels isn’t enough. To truly optimize GPU applications, you need a deep understanding of GPU architecture, memory hierarchies, execution models, and performance tuning strategies. This book takes you beyond the fundamentals and into the world of advanced CUDA programming, where efficiency, scalability, and raw computational power define success.

What You’ll Learn:

  • Deep GPU Architecture Insights – Explore the Ampere and Hopper architectures, including streaming multiprocessors, warp scheduling, and memory controller design.
  • Memory Optimization Techniques – Implement coalesced memory access, shared memory tuning, cache optimizations, and unified memory strategies for peak performance.
  • Asynchronous Execution & CUDA Streams – Master multi-stream processing, event-based synchronization, and pinned memory usage to maximize parallelism.
  • High-Performance Kernel Development – Learn thread block optimization, warp-level programming, and dynamic parallelism for efficient kernel execution.
  • AI & Deep Learning Acceleration – Optimize GEMM, convolution operations, mixed precision training, and inference using tensor cores.
  • Multi-GPU & Distributed Computing – Scale workloads across GPUs with P2P communication, NVLink, workload distribution, and MPI integration.
  • Real-Time Processing & Low-Latency Optimization – Develop real-time applications with deterministic execution, deadline scheduling, and pipeline optimizations.
  • Debugging & Profiling Mastery – Use Nsight Compute, CUDA-GDB, memory checking tools, and roofline analysis to fine-tune CUDA applications.
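The coalesced-access idea in the memory-optimization bullet can be illustrated with simple transaction counting. The warp width and 32-byte sector size are standard NVIDIA figures; the model is a deliberate simplification of real cache behavior:

```python
# Sketch: count 32-byte memory sectors touched when one 32-thread
# warp loads 4-byte elements, comparing unit-stride (coalesced)
# access against a large-stride pattern.
WARP_SIZE = 32
SECTOR_BYTES = 32

def sectors_touched(stride_elems, elem_bytes=4):
    addrs = [lane * stride_elems * elem_bytes for lane in range(WARP_SIZE)]
    return len({addr // SECTOR_BYTES for addr in addrs})

print(sectors_touched(1))   # coalesced: 4 sectors cover all 128 bytes
print(sectors_touched(32))  # strided: 32 sectors, one per lane
```

An 8x difference in memory traffic for the same logical load is exactly the kind of gap the book's memory-optimization chapters teach you to find and close.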

Why This Book?

This isn’t just another CUDA guide—it’s a masterclass in performance optimization. Packed with real-world case studies, hands-on techniques, and cutting-edge strategies, it delivers everything you need to develop fast, scalable, and production-ready GPU applications.

If you're ready to take your CUDA skills to the next level and maximize GPU performance like never before, this book is your roadmap. Don't leave performance on the table—start optimizing today.

The Leanpub 60 Day 100% Happiness Guarantee

Within 60 days of purchase you can get a 100% refund on any Leanpub purchase, in two clicks.

Now, this is technically risky for us, since you'll have the book or course files either way. But we're so confident in our products and services, and in our authors and readers, that we're happy to offer a full money back guarantee for everything we sell.

You can only find out how good something is by trying it, and because of our 100% money back guarantee there's literally no risk to do so!

So, there's no reason not to click the Add to Cart button, is there?

See full terms...

Earn $8 on a $10 Purchase, and $16 on a $20 Purchase

We pay 80% royalties on purchases of $7.99 or more, and 80% royalties minus a 50 cent flat fee on purchases between $0.99 and $7.98. You earn $8 on a $10 sale, and $16 on a $20 sale. So, if we sell 5000 non-refunded copies of your book for $20, you'll earn $80,000.
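The arithmetic above is easy to verify. This is a sketch of the stated royalty rules only, not an official Leanpub calculator:

```python
# Sketch of the royalty rules stated above: 80% on purchases of
# $7.99 or more; 80% minus a $0.50 flat fee between $0.99 and $7.98.
def royalty(price):
    if price >= 7.99:
        return round(price * 0.80, 2)
    if price >= 0.99:
        return round(price * 0.80 - 0.50, 2)
    return 0.0

print(royalty(10.00))         # 8.0
print(royalty(20.00))         # 16.0
print(5000 * royalty(20.00))  # 80000.0
```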

(Yes, some authors have already earned much more than that on Leanpub.)

In fact, authors have earned over $14 million writing, publishing and selling on Leanpub.

Learn more about writing on Leanpub

Free Updates. DRM Free.

If you buy a Leanpub book, you get free updates for as long as the author updates the book! Many authors use Leanpub to publish their books in-progress, while they are writing them. All readers get free updates, regardless of when they bought the book or how much they paid (including free).

Most Leanpub books are available in PDF (for computers) and EPUB (for phones, tablets and Kindle). The formats that a book includes are shown at the top right corner of this page.

Finally, Leanpub books don't have any DRM copy-protection nonsense, so you can easily read them on any supported device.

Learn more about Leanpub's ebook formats and where to read them

Write and Publish on Leanpub

You can use Leanpub to easily write, publish and sell in-progress and completed ebooks and online courses!

Leanpub is a powerful platform for serious authors, combining a simple, elegant writing and publishing workflow with a store focused on selling in-progress ebooks.

Leanpub is a magical typewriter for authors: just write in plain text, and to publish your ebook, just click a button. (Or, if you are producing your ebook your own way, you can even upload your own PDF and/or EPUB files and then publish with one click!) It really is that easy.

Learn more about writing on Leanpub