
GPU Parallel Computing

From Basics to Breakthroughs in GPU Programming

If you want to understand how modern GPUs work and how to use them effectively for high-performance workloads, this book provides the technical foundation required.

This book assumes no prior exposure to GPU internals; however, a working knowledge of electronics and general computer architecture is recommended.

It is written for students, engineers, researchers, and data scientists who are new to GPU architecture and parallel programming and want a rigorous introduction before progressing into optimization and large-scale GPU systems.

Minimum price: $19.00

Suggested price: $29.00

Format: PDF
About the Book

If you are already an experienced CUDA performance engineer or a low-level GPU architect seeking a specialized microarchitectural reference manual, this book is not intended for you.

What You Will Learn

GPU Architecture Fundamentals

  • Streaming multiprocessors and SIMT execution
  • Warp scheduling and instruction flow
  • GPU memory hierarchy and bandwidth considerations

GPU Programming Models

  • CUDA programming principles
  • OpenCL fundamentals
  • Kernel structure and execution behavior

Performance Optimization

  • Memory access patterns and coalescing
  • Warp divergence and latency hiding
  • Occupancy principles and kernel configuration

Real-World Applications

  • Scientific simulations
  • Machine learning workloads
  • Graphics and visualization pipelines

Advanced Topics

  • Multi-GPU communication
  • Tensor cores and mixed precision
  • Profiling, debugging, and performance analysis

The early chapters establish architectural clarity and programming fundamentals. Later chapters address optimization strategies, scalability, and applied GPU workloads.

Who This Book Is For

  • Students entering GPU computing
  • Engineers transitioning into parallel architecture
  • Researchers and data scientists adopting GPU acceleration

This is a technical book. It builds understanding from architectural principles upward and focuses on performance-oriented reasoning rather than a superficial overview.

Why This Book

Many GPU resources either assume too much prior knowledge or remain overly abstract. This book emphasizes structured technical understanding:

  • How GPUs execute threads
  • Why performance bottlenecks occur
  • How architectural constraints shape results
  • How programming decisions map to hardware behavior

Clear explanations. Practical code examples. Architectural context.
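As a taste of the kind of practical code the book works through, here is a minimal CUDA vector-addition kernel in the style introduced in the kernel-programming chapters. The example is an illustrative sketch written for this page, not an excerpt from the book: each thread computes one output element, and the launch configuration shows the thread-block sizing and grid dimensioning discussed under kernel configuration.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// SIMT mapping: each thread handles one element of the vectors,
// replacing the loop body of a serial vector addition.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)               // guard: the grid may cover more threads than n
        c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);   // unified memory, accessible from host and device
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    int block = 256;                     // threads per block
    int grid = (n + block - 1) / block;  // enough blocks to cover all n elements
    vecAdd<<<grid, block>>>(a, b, c, n);
    cudaDeviceSynchronize();             // wait for the kernel before reading c

    printf("c[0] = %f\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

Adjacent threads here read adjacent elements, so the global-memory loads coalesce; access patterns like this are exactly what the optimization chapters analyze.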

About the Author

Gareth Morgan Thomas

Gareth Morgan Thomas holds six university diplomas spanning electronics, software development, web development, and project management, along with qualifications in computer networking, CAD, diesel engineering, well drilling, and welding, giving him a broad foundation of technical knowledge.

Educated in Auckland, New Zealand, he also spent three years in the New Zealand Army, where he honed his discipline and problem-solving skills. He is now dedicated to sharing his understanding of science, technology, engineering, and mathematics through a series of specialized books aimed at both beginners and advanced learners.


Table of Contents

GPU Parallel Computing — Outline

Chapter 1. Introduction to GPU Parallel Computing

Section 1. Evolution of Parallel Computing

  • Historical Development of GPUs
  • From Graphics to General-Purpose Computing
  • Modern GPU Computing Landscape

Section 2. Key Applications of GPU Computing

  • High-Performance Computing Domains
  • Real-time Processing Applications
  • Emerging GPU Computing Fields

Section 3. Benefits of GPU Parallelism

  • Performance Advantages
  • Energy Efficiency Considerations
  • Cost-Benefit Analysis for Different Workloads

Chapter 2. GPU Architecture Fundamentals

Section 1. Streaming Multiprocessors

  • SM Architecture and Components
  • Thread Block Scheduling
  • SIMT Execution Model

Section 2. Memory Subsystems

  • Memory Hierarchy Overview
  • Cache Architecture
  • Memory Controllers and Bandwidth

Section 3. Instruction Execution and Scheduling

  • Warp Scheduling Mechanisms
  • Instruction Pipeline
  • Latency Hiding Techniques

Chapter 3. Programming Models for GPU Parallelism

Section 1. CUDA Programming Overview

  • CUDA Programming Model
  • Kernel Programming Basics
  • CUDA Runtime API vs Driver API

Section 2. OpenCL Fundamentals

  • Platform and Device Models
  • Memory Model
  • Programming Pattern Differences

Section 3. Comparing Programming Models

  • CUDA vs OpenCL Trade-offs
  • DirectCompute and Other APIs
  • Choosing the Right Framework

Chapter 4. GPU Memory Management

Section 1. GPU Memory Types

  • Global Memory Management
  • Shared Memory Utilization
  • Constant and Texture Memory
  • Register Usage Strategies

Section 2. Coalesced Memory Access

  • Memory Access Patterns
  • Alignment Requirements
  • Bank Conflict Resolution

Section 3. Memory Allocation Techniques

  • Dynamic Memory Management
  • Unified Memory
  • Zero-Copy Memory

Chapter 5. Parallel Algorithm Design

Section 1. Task Parallelism vs. Data Parallelism

  • Identifying Parallelization Opportunities
  • Decomposition Strategies
  • Hybrid Approaches

Section 2. Workload Partitioning

  • Data Distribution Techniques
  • Load Balancing Strategies
  • Granularity Considerations

Section 3. Algorithm Scalability

  • Strong vs Weak Scaling
  • Amdahl's Law in Practice
  • Scalability Bottlenecks

Chapter 6. Optimizing GPU Kernels

Section 1. Block and Grid Configuration

  • Occupancy Optimization
  • Thread Block Sizing
  • Grid Dimensioning

Section 2. Warp-Level Efficiency

  • Warp Divergence Mitigation
  • Warp Shuffle Operations
  • Thread Coarsening

Section 3. Reducing Register Pressure

  • Register Usage Analysis
  • Variable Scope Optimization
  • Spill Prevention Techniques

Chapter 7. Synchronization and Communication

Section 1. Managing Threads and Warps

  • Thread Synchronization Primitives
  • Atomic Operations
  • Race Condition Prevention

Section 2. Inter-Thread Communication

  • Shared Memory Communication
  • Warp-Level Primitives
  • Global Memory Synchronization

Section 3. Barriers and Synchronization Techniques

  • Block-Level Synchronization
  • Grid-Level Synchronization
  • Cooperative Groups

Chapter 8. Multi-GPU Programming

Section 1. Peer-to-Peer Communication

  • Direct GPU Communication
  • NVIDIA NVLink
  • PCIe Communication

Section 2. Load Balancing Across GPUs

  • Work Distribution Strategies
  • Dynamic Load Balancing
  • Multi-GPU Synchronization

Section 3. Distributed GPU Systems

  • MPI Integration
  • Remote Memory Access
  • Cluster Programming

Chapter 9. Advanced Techniques in GPU Programming

Section 1. Tensor Core Optimization

  • Matrix Operation Acceleration
  • Mixed Precision Computing
  • Tensor Core Programming

Section 2. Mixed Precision Computing

  • FP16/FP32/FP64 Trade-offs
  • Automatic Mixed Precision
  • Numerical Stability

Section 3. Dynamic Parallelism

  • Nested Kernel Launch
  • Parent-Child Synchronization
  • Resource Management

Chapter 10. Real-World Applications of GPU Computing

Section 1. Scientific Simulations

  • N-body Simulations
  • Fluid Dynamics
  • Molecular Dynamics

Section 2. Machine Learning Workloads

  • Deep Learning Training
  • Inference Optimization
  • Data Processing Pipelines

Section 3. Graphics and Visualization

  • Ray Tracing
  • Volume Rendering
  • Real-time Graphics

Chapter 11. Performance Profiling and Debugging

Section 1. Profiling Tools

  • Nsight Systems Usage
  • Nsight Compute Analysis
  • Visual Profiler Techniques

Section 2. Debugging GPU Kernels

  • CUDA-GDB Usage
  • Memory Checker Tools
  • Common Debug Patterns

Section 3. Optimizing Performance Metrics

  • Metrics Collection
  • Performance Analysis
  • Optimization Strategies

Chapter 12. GPU Accelerated AI and Machine Learning

Section 1. Training Neural Networks on GPUs

  • Data Parallelism Strategies
  • Model Parallelism
  • Pipeline Parallelism

Section 2. Real-Time Inference

  • Inference Optimization
  • Batch Processing
  • Low Latency Techniques

Section 3. Mixed Precision for AI

  • Training with Mixed Precision
  • Inference Optimization
  • Accuracy vs Performance

Chapter 13. Emerging Trends in GPU Computing

Section 1. Advances in GPU Hardware

  • Next-Generation Architectures
  • Specialized Computing Units
  • Memory Technology Evolution

Section 2. Innovations in Programming Models

  • Modern API Developments
  • Unified Memory Advances
  • Programming Abstractions

Section 3. Exascale Computing Challenges

  • Power Efficiency
  • Resilience and Reliability
  • Programming Model Scaling

Chapter 14. Best Practices and Case Studies

Section 1. Writing Efficient GPU Code

  • Performance Optimization Patterns
  • Memory Access Patterns
  • Kernel Design Patterns

Section 2. Real-World GPU Implementations

  • Industry Case Studies
  • Research Applications
  • Performance Analysis

Section 3. Lessons from Industry and Academia

  • Common Pitfalls
  • Success Stories
  • Future Directions
