If you’ve ever wondered why your GPU code hits a wall long before the hardware’s limits, this book tells you why—and how to break through it.
Most programmers stop where the compiler starts. They trust nvcc to make the right decisions, to manage registers, to schedule instructions, and to use memory efficiently. But the compiler doesn’t know your problem. It guesses. And in GPU computing, guessing costs performance.
Mastering PTX and SASS – Volume I pulls back the curtain on PTX, NVIDIA’s virtual instruction set: the intermediate form every CUDA kernel is compiled to before it is translated into SASS, the machine code that actually runs on silicon. You’ll learn how threads, warps, and memory really behave at the hardware level, how each instruction interacts with caches and pipelines, and how to read, write, and reason about PTX like an architect, not just a coder.
This isn’t a surface-level “how-to.” It’s a deep, methodical tour through the machinery of modern GPUs—built for professionals who want measurable, repeatable speedups, not guesswork. You’ll discover how the compiler transforms your high-level logic into executable reality, and where you can step in to take control.
By the time you finish, you won’t be relying on compiler magic. You’ll understand it, improve it, and surpass it.
Mastering PTX and SASS – Volume I gives you the foundation; Volume II takes you to the bleeding edge of optimization. Together, they turn GPU performance from a mystery into a science.