Advanced GPU Assembly Programming Second Edition

A Technical Reference for NVIDIA and AMD Architectures

Uncover the fundamentals of GPU architecture and assembly programming with Advanced GPU Assembly Programming, a resource designed for enthusiasts and professionals who want to explore the intricate workings of modern GPUs. This book is not a step-by-step manual but a gateway to understanding GPU architecture and assembly programming at a foundational level. It’s ideal for readers who are ready to invest their own effort to experiment and grow their expertise.

Minimum price: $19.00

About the Book

What You’ll Gain:

1. Deep Insights into GPU Architecture
  • Explore the fundamental differences between GPUs and CPUs, with a focus on parallelism, memory hierarchies, and threading models.
  • Learn the principles underlying the instruction set architectures (ISAs) of NVIDIA and AMD GPUs.
2. Foundations of Assembly Programming
  • Delve into the mechanics of low-level GPU programming, including execution models, memory access optimization, and pipeline management.
  • Understand the core concepts of assembly programming while preparing to apply them with additional resources and practice.
3. Tools and Techniques
  • Get an overview of key debugging and profiling tools such as NVIDIA Nsight and AMD Radeon GPU Profiler.
  • Gain the contextual knowledge to optimize GPU performance through careful analysis and tuning.
4. Future-Focused Knowledge
  • Stay ahead of emerging trends in GPU technology, from next-generation architectures to AI-driven optimization tools.

Who This Book is For:

  • Assembly Enthusiasts: Those eager to understand GPUs at their core and explore low-level programming.
  • Developers and Engineers: Professionals optimizing GPU-driven systems in gaming, AI, and scientific computing.
  • Researchers and Students: Anyone seeking a foundational understanding of GPU architectures and programming approaches.

What This Book is Not:

This is not a hands-on, step-by-step guide. Instead, it provides a conceptual framework and architectural insights to set readers on the right path. It encourages further exploration and learning through personal effort and experimentation.

Whether you’re a developer, researcher, or assembly enthusiast, Advanced GPU Assembly Programming will give you the knowledge needed to deeply understand GPU architecture and programming. Equip yourself with the foundational tools to explore, experiment, and achieve mastery in the fascinating world of GPU assembly.

Order your copy today and take your first step into the realm of GPU programming mastery!

UPDATE: This book now has a GitHub repository with all source code samples, infographics, an exercise manual, and more.

From the Editor at Burst Books — Gareth Thomas

A Smarter Kind of Learning Has Arrived — Thinking on Its Own.

Forget tired textbooks from years past. These AI-crafted STEM editions advance at the speed of discovery. Each page is built by intelligence trained on thousands of trusted sources, delivering crystal-clear explanations, flawless equations, and functional examples — all refreshed through the latest breakthroughs.

Best of all, these editions cost a fraction of traditional texts yet surpass expectations. You’re gaining more than a book — you’re enhancing the mind’s performance.

Explore BurstBooksPublishing on GitHub to find technical samples, infographics, and additional study material — a complete hub that supports deeper, hands-on learning.

In this age of AI, leave the past behind and learn directly from tomorrow.

About the Author

Gareth Thomas

Gareth Morgan Thomas is a qualified specialist with expertise across multiple STEM fields. He holds six university diplomas spanning electronics, software development, web development, and project management, along with qualifications in computer networking, CAD, diesel engineering, well drilling, and welding, a background that gives him a broad foundation of technical knowledge.

Educated in Auckland, New Zealand, he also spent three years serving in the New Zealand Army, where he honed his discipline and problem-solving skills. With years of technical training, he is now dedicated to sharing his understanding of science, technology, engineering, and mathematics through a series of specialized books aimed at both beginners and advanced learners.


Table of Contents

Chapter 1. GPU Assembly Fundamentals

Section 1. GPU ISA Architecture Deep Dive

  • Binary encoding and instruction formats
  • Microarchitectural pipeline stages
  • Vector and scalar execution units
  • Hardware thread scheduling mechanisms
  • Clock domains and synchronization barriers

Section 2. Memory System Architecture

  • Memory controller design and protocols
  • Cache line states and coherency protocols
  • Memory fence operations and atomics
  • Page table structures and TLB organization
  • Memory compression algorithms

Section 3. Execution Model Implementation

  • Warp/wavefront scheduling algorithms
  • Instruction issue and dispatch logic
  • Branch prediction and speculation
  • Predication and mask operations
  • Hardware synchronization primitives

Chapter 2. Assembly Language Specifics

Section 1. Instruction Set Deep Dive

  • Opcode formats and encoding schemes
  • Immediate value handling
  • Predicate registers and condition codes
  • Special function unit instructions
  • Vector mask operations

Section 2. Register Architecture

  • Register file organization
  • Register bank conflicts
  • Register allocation algorithms
  • Spill/fill optimization techniques
  • Vector register partitioning

Section 3. Memory Access Patterns

  • Cache line alignment requirements
  • Stride pattern optimization
  • Bank conflict avoidance
  • Scatter/gather operation implementation
  • Atomic operation mechanics

Chapter 3. AMD GPU Assembly Architecture

Section 1. GCN/RDNA ISA Technical Details

  • Instruction word encoding formats
  • Scalar and vector ALU implementations
  • Local Data Share architecture
  • Wave32/Wave64 execution models
  • Hardware scheduler implementation

Section 2. AMD Memory System

  • L0/L1/L2 cache architectures
  • Memory controller interface specs
  • Cache coherency protocols
  • Page table walker implementation
  • Memory view hierarchy

Section 3. AMD Performance Optimization

  • VGPR/SGPR allocation strategies
  • Instruction bundling techniques
  • Cache bypass mechanisms
  • Memory barrier optimization
  • Wave item permutation techniques

Chapter 4. NVIDIA GPU Assembly Architecture

Section 1. PTX/SASS Technical Implementation

  • PTX instruction encoding
  • SASS optimization patterns
  • Predication implementation
  • Branch synchronization mechanics
  • Warp shuffle operation details

Section 2. NVIDIA Memory Architecture

  • Shared memory bank organization
  • L1/TEX cache implementation
  • Global memory coalescing rules
  • Memory consistency model
  • Atomic operation implementation

Section 3. NVIDIA Performance Engineering

  • Register dependency chains
  • Instruction latency hiding
  • Memory transaction coalescing
  • Warp scheduling optimization
  • Tensor core matrix operation details

Chapter 5. Cross-Vendor Techniques

Section 1. Comparative Analysis

  • Key architectural differences between AMD and NVIDIA GPUs
  • ISA-level comparisons
  • Execution model trade-offs

Section 2. Portable Assembly Code

  • OpenCL, Vulkan, and SPIR-V
  • Adapting AMD optimizations for NVIDIA GPUs (and vice versa)
  • Strategies for platform-specific gains

Section 3. Cross-Vendor Debugging and Profiling

  • Using RenderDoc and GDB for cross-platform analysis
  • Bottleneck identification and resolution
  • Ensuring performance parity across GPUs

Chapter 6. Low-Level Optimization Strategies

Section 1. Memory System Optimization

  • Cache line state manipulation
  • TLB optimization techniques
  • Memory controller queue management
  • Memory barrier minimization
  • Atomic operation alternatives

Section 2. Instruction Scheduling

  • Dependency chain analysis
  • Resource conflict avoidance
  • Instruction reordering techniques
  • Loop unrolling strategies
  • Software pipelining methods

Section 3. Register Optimization

  • Register pressure analysis
  • Live range splitting
  • Register coalescing techniques
  • Spill code optimization
  • Register renaming strategies

Chapter 7. Practical Applications

Section 1. Scientific Computing

  • FFT optimization techniques
  • Stencil computation methods
  • Sparse matrix optimization
  • Random number generation

Section 2. Real-Time Graphics

  • Ray tracing at the assembly level
  • Optimizing Vulkan shaders
  • Texture sampling techniques

Section 3. Machine Learning

  • Convolution implementation
  • Batch normalization techniques
  • Gradient computation optimization

Chapter 8. Performance Analysis Techniques

Section 1. Performance Counters

  • Hardware counter interpretation
  • Event sampling methods
  • Pipeline stall analysis
  • Cache miss classification
  • Memory bandwidth analysis

Section 2. Optimization Methodology

  • Static code analysis
  • Dynamic execution tracing
  • Bottleneck identification
  • Resource utilization analysis
  • Latency/throughput optimization

Chapter 9. Emerging Trends in GPU Assembly

Section 1. Next-Generation Architectures

  • Upcoming trends in GPU ISA design (RDNA3, Hopper)
  • Unified memory and ray tracing implications
  • Specialized hardware accelerators (tensor cores, AI chips)

Section 2. Future of Low-Level Programming

  • AI-driven code generation and profiling
  • Opportunities for low-level developers
  • Evolution of tools and techniques

Chapter 10. Advanced Development Tools

Section 1. Assembly Development Tools

  • Binary analysis techniques
  • Disassembly methods
  • Code generation tools
  • Performance modeling
  • Debugging techniques

Section 2. Profiling Implementation

  • Sampling methods
  • Trace collection and visualization
  • Bottleneck analysis and optimization validation
