Advanced GPU Assembly Programming, Second Edition

A Technical Reference for NVIDIA and AMD Architectures

Advanced GPU Assembly Programming is a resource for enthusiasts and professionals who want to explore the inner workings of modern GPUs. It is not a step-by-step manual: it lays out the foundations of GPU architecture and assembly programming and leaves the experimentation to the reader. It suits readers who are ready to invest their own effort to build expertise.

Gareth Thomas

Minimum price

$19.00

$29.00

PDF

About the Book

What You’ll Gain:

1. Deep Insights into GPU Architecture

Explore the fundamental differences between GPUs and CPUs, with a focus on parallelism, memory hierarchies, and threading models.
Learn the principles underlying the instruction set architectures (ISAs) of NVIDIA and AMD GPUs.

2. Foundations of Assembly Programming

Delve into the mechanics of low-level GPU programming, including execution models, memory access optimization, and pipeline management.
Understand the core concepts of assembly programming while preparing to apply them with additional resources and practice.

3. Tools and Techniques

Get an overview of key debugging and profiling tools such as NVIDIA Nsight and AMD Radeon GPU Profiler.
Gain the contextual knowledge to optimize GPU performance through careful analysis and tuning.

4. Future-Focused Knowledge

Stay ahead of emerging trends in GPU technology, from next-generation architectures to AI-driven optimization tools.

Who This Book is For:

Assembly Enthusiasts: Those eager to understand GPUs at their core and explore low-level programming.
Developers and Engineers: Professionals optimizing GPU-driven systems in gaming, AI, and scientific computing.
Researchers and Students: Anyone seeking a foundational understanding of GPU architectures and programming approaches.

What This Book is Not:

This is not a hands-on, step-by-step guide. Instead, it provides a conceptual framework and architectural insights to set readers on the right path. It encourages further exploration and learning through personal effort and experimentation.

Whether you’re a developer, researcher, or assembly enthusiast, Advanced GPU Assembly Programming will give you the knowledge needed to deeply understand GPU architecture and programming. Equip yourself with the foundational tools to explore, experiment, and achieve mastery in the fascinating world of GPU assembly.

Order your copy today and take your first step into the realm of GPU programming mastery!

UPDATE: This book now has a GitHub repository with all source code samples, infographics, an exercise manual, and more.

From the Editor at Burst Books — Gareth Thomas

A Smarter Kind of Learning Has Arrived — Thinking on Its Own.

Forget tired textbooks from years past. These AI-crafted STEM editions advance at the speed of discovery. Each page is built by intelligence trained on thousands of trusted sources, delivering crystal-clear explanations, flawless equations, and functional examples — all refreshed through the latest breakthroughs.

Best of all, these editions cost a fraction of traditional texts yet surpass expectations. You’re gaining more than a book — you’re enhancing the mind’s performance.

Explore BurstBooksPublishing on GitHub to find technical samples, infographics, and additional study material — a complete hub that supports deeper, hands-on learning.

In this age of AI, leave the past behind and learn directly from tomorrow.

Bundles that include this book

Modern GPU Architecture and Programming Complete Bundle (4 books)
Minimum price: $37.00
Suggested price: $97.00
Bought separately: $116

Modern GPU Architecture and Programming Complete Bundle (7 books)
Minimum price: $87.00
Suggested price: $197
Bought separately: $203

About the Author

Gareth Morgan Thomas has broad expertise across multiple STEM fields. He holds six university diplomas in electronics, software development, web development, and project management, along with qualifications in computer networking, CAD, diesel engineering, well drilling, and welding, which together give him a broad technical foundation.

Educated in Auckland, New Zealand, he also spent three years serving in the New Zealand Army, where he honed his discipline and problem-solving skills. With years of technical training behind him, he now shares his understanding of science, technology, engineering, and mathematics through a series of specialized books aimed at both beginners and advanced learners.

Table of Contents

Chapter 1. GPU Assembly Fundamentals

Section 1. GPU ISA Architecture Deep Dive

Binary encoding and instruction formats
Microarchitectural pipeline stages
Vector and scalar execution units
Hardware thread scheduling mechanisms
Clock domains and synchronization barriers

Section 2. Memory System Architecture

Memory controller design and protocols
Cache line states and coherency protocols
Memory fence operations and atomics
Page table structures and TLB organization
Memory compression algorithms

Section 3. Execution Model Implementation

Warp/wavefront scheduling algorithms
Instruction issue and dispatch logic
Branch prediction and speculation
Predication and mask operations
Hardware synchronization primitives

Chapter 2. Assembly Language Specifics

Section 1. Instruction Set Deep Dive

Opcode formats and encoding schemes
Immediate value handling
Predicate registers and condition codes
Special function unit instructions
Vector mask operations

Section 2. Register Architecture

Register file organization
Register bank conflicts
Register allocation algorithms
Spill/fill optimization techniques
Vector register partitioning

Section 3. Memory Access Patterns

Cache line alignment requirements
Stride pattern optimization
Bank conflict avoidance
Scatter/gather operation implementation
Atomic operation mechanics

Chapter 3. AMD GPU Assembly Architecture

Section 1. GCN/RDNA ISA Technical Details

Instruction word encoding formats
Scalar and vector ALU implementations
Local Data Share architecture
Wave32/Wave64 execution models
Hardware scheduler implementation

Section 2. AMD Memory System

L0/L1/L2 cache architectures
Memory controller interface specs
Cache coherency protocols
Page table walker implementation
Memory view hierarchy

Section 3. AMD Performance Optimization

VGPR/SGPR allocation strategies
Instruction bundling techniques
Cache bypass mechanisms
Memory barrier optimization
Wave item permutation techniques

Chapter 4. NVIDIA GPU Assembly Architecture

Section 1. PTX/SASS Technical Implementation

PTX instruction encoding
SASS optimization patterns
Predication implementation
Branch synchronization mechanics
Warp shuffle operation details

Section 2. NVIDIA Memory Architecture

Shared memory bank organization
L1/TEX cache implementation
Global memory coalescing rules
Memory consistency model
Atomic operation implementation

Section 3. NVIDIA Performance Engineering

Register dependency chains
Instruction latency hiding
Memory transaction coalescing
Warp scheduling optimization
Tensor core matrix operation details

Chapter 5. Cross-Vendor Techniques

Section 1. Comparative Analysis

Key architectural differences between AMD and NVIDIA GPUs
ISA-level comparisons
Execution model trade-offs

Section 2. Portable Assembly Code

OpenCL, Vulkan, and SPIR-V
Adapting AMD optimizations for NVIDIA GPUs (and vice versa)
Strategies for platform-specific gains

Section 3. Cross-Vendor Debugging and Profiling

Using RenderDoc and GDB for cross-platform analysis
Bottleneck identification and resolution
Ensuring performance parity across GPUs

Chapter 6. Low-Level Optimization Strategies

Section 1. Memory System Optimization

Cache line state manipulation
TLB optimization techniques
Memory controller queue management
Memory barrier minimization
Atomic operation alternatives

Section 2. Instruction Scheduling

Dependency chain analysis
Resource conflict avoidance
Instruction reordering techniques
Loop unrolling strategies
Software pipelining methods

Section 3. Register Optimization

Register pressure analysis
Live range splitting
Register coalescing techniques
Spill code optimization
Register renaming strategies

Chapter 7. Practical Applications

Section 1. Scientific Computing

FFT optimization techniques
Stencil computation methods
Sparse matrix optimization
Random number generation

Section 2. Real-Time Graphics

Ray tracing at the assembly level
Optimizing Vulkan shaders
Texture sampling techniques

Section 3. Machine Learning

Convolution implementation
Batch normalization techniques
Gradient computation optimization

Chapter 8. Performance Analysis Techniques

Section 1. Performance Counters

Hardware counter interpretation
Event sampling methods
Pipeline stall analysis
Cache miss classification
Memory bandwidth analysis

Section 2. Optimization Methodology

Static code analysis
Dynamic execution tracing
Bottleneck identification
Resource utilization analysis
Latency/throughput optimization

Chapter 9. Emerging Trends in GPU Assembly

Section 1. Next-Generation Architectures

Upcoming trends in GPU ISA design (RDNA3, Hopper)
Unified memory and ray tracing implications
Specialized hardware accelerators (tensor cores, AI chips)

Section 2. Future of Low-Level Programming

AI-driven code generation and profiling
Opportunities for low-level developers
Evolution of tools and techniques

Chapter 10. Advanced Development Tools

Section 1. Assembly Development Tools

Binary analysis techniques
Disassembly methods
Code generation tools
Performance modeling
Debugging techniques

Section 2. Profiling Implementation

Sampling methods
Trace collection and visualization
Bottleneck analysis and optimization validation
