GPU Parallel Computing

Name: GPU Parallel Computing
Brand: Leanpub
Price: 19.00 USD
Availability: InStock

From Basics to Breakthroughs in GPU Programming

This book is 100% complete. Last updated on 2026-03-06.

Gareth Thomas

Minimum price: $19.00

Suggested price: $29.00

Format: PDF

About the Book

GPU Parallel Computing: From Basics to Breakthroughs
A Technical Guide to GPU Programming

If you want to understand how modern GPUs work and how to use them effectively for high-performance workloads, this book provides the technical foundation required.

This book assumes no prior exposure to GPU internals; however, a working knowledge of electronics and general computer architecture is recommended.

It is written for students, engineers, researchers, and data scientists who are new to GPU architecture and parallel programming and want a rigorous introduction before progressing into optimization and large-scale GPU systems.

If you are already an experienced CUDA performance engineer or low-level GPU architect seeking a specialized microarchitectural reference manual, this book is not positioned for that purpose.

What You Will Learn

GPU Architecture Fundamentals

Streaming multiprocessors and SIMT execution
Warp scheduling and instruction flow
GPU memory hierarchy and bandwidth considerations

GPU Programming Models

CUDA programming principles
OpenCL fundamentals
Kernel structure and execution behavior

Performance Optimization

Memory access patterns and coalescing
Warp divergence and latency hiding
Occupancy principles and kernel configuration

Real-World Applications

Scientific simulations
Machine learning workloads
Graphics and visualization pipelines

Advanced Topics

Multi-GPU communication
Tensor cores and mixed precision
Profiling, debugging, and performance analysis

The early chapters establish architectural clarity and programming fundamentals. Later chapters address optimization strategies, scalability, and applied GPU workloads.

Who This Book Is For

Students entering GPU computing
Engineers transitioning into parallel architecture
Researchers and data scientists adopting GPU acceleration

This is a technical book. It builds understanding from architectural principles upward and focuses on performance-oriented reasoning rather than superficial overview.

Why This Book

Many GPU resources either assume too much prior knowledge or remain overly abstract. This book emphasizes structured technical understanding:

How GPUs execute threads
Why performance bottlenecks occur
How architectural constraints shape results
How programming decisions map to hardware behavior

Clear explanations. Practical code examples. Architectural context.

Bundles that include this book

Modern GPU Architecture and Programming Complete Bundle (4 Books)
Minimum price: $37.00
Suggested price: $97.00
Bought separately: $116

Modern GPU Architecture and Programming Complete Bundle (7 Books)
Minimum price: $87.00
Suggested price: $197
Bought separately: $203

About the Author

Gareth Thomas

Gareth Morgan Thomas is a qualified expert across multiple STEM fields. Holding six university diplomas in electronics, software development, web development, and project management, along with qualifications in computer networking, CAD, diesel engineering, well drilling, and welding, he has built a robust foundation of technical knowledge.

Educated in Auckland, New Zealand, Gareth Morgan Thomas also spent three years serving in the New Zealand Army, where he honed his discipline and problem-solving skills. With years of technical training, Gareth Morgan Thomas is now dedicated to sharing his deep understanding of science, technology, engineering, and mathematics through a series of specialized books aimed at both beginners and advanced learners.

Table of Contents

GPU Parallel Computing — Outline

Chapter 1. Introduction to GPU Parallel Computing

Section 1. Evolution of Parallel Computing

Historical Development of GPUs
From Graphics to General-Purpose Computing
Modern GPU Computing Landscape

Section 2. Key Applications of GPU Computing

High-Performance Computing Domains
Real-time Processing Applications
Emerging GPU Computing Fields

Section 3. Benefits of GPU Parallelism

Performance Advantages
Energy Efficiency Considerations
Cost-Benefit Analysis for Different Workloads

Chapter 2. GPU Architecture Fundamentals

Section 1. Streaming Multiprocessors

SM Architecture and Components
Thread Block Scheduling
SIMT Execution Model

Section 2. Memory Subsystems

Memory Hierarchy Overview
Cache Architecture
Memory Controllers and Bandwidth

Section 3. Instruction Execution and Scheduling

Warp Scheduling Mechanisms
Instruction Pipeline
Latency Hiding Techniques

Chapter 3. Programming Models for GPU Parallelism

Section 1. CUDA Programming Overview

CUDA Programming Model
Kernel Programming Basics
CUDA Runtime API vs Driver API

Section 2. OpenCL Fundamentals

Platform and Device Models
Memory Model
Programming Pattern Differences

Section 3. Comparing Programming Models

CUDA vs OpenCL Trade-offs
DirectCompute and Other APIs
Choosing the Right Framework

Chapter 4. GPU Memory Management

Section 1. GPU Memory Types

Global Memory Management
Shared Memory Utilization
Constant and Texture Memory
Register Usage Strategies

Section 2. Coalesced Memory Access

Memory Access Patterns
Alignment Requirements
Bank Conflict Resolution

Section 3. Memory Allocation Techniques

Dynamic Memory Management
Unified Memory
Zero-Copy Memory

Chapter 5. Parallel Algorithm Design

Section 1. Task Parallelism vs. Data Parallelism

Identifying Parallelization Opportunities
Decomposition Strategies
Hybrid Approaches

Section 2. Workload Partitioning

Data Distribution Techniques
Load Balancing Strategies
Granularity Considerations

Section 3. Algorithm Scalability

Strong vs Weak Scaling
Amdahl's Law in Practice
Scalability Bottlenecks

Chapter 6. Optimizing GPU Kernels

Section 1. Block and Grid Configuration

Occupancy Optimization
Thread Block Sizing
Grid Dimensioning

Section 2. Warp-Level Efficiency

Warp Divergence Mitigation
Warp Shuffle Operations
Thread Coarsening

Section 3. Reducing Register Pressure

Register Usage Analysis
Variable Scope Optimization
Spill Prevention Techniques

Chapter 7. Synchronization and Communication

Section 1. Managing Threads and Warps

Thread Synchronization Primitives
Atomic Operations
Race Condition Prevention

Section 2. Inter-Thread Communication

Shared Memory Communication
Warp-Level Primitives
Global Memory Synchronization

Section 3. Barriers and Synchronization Techniques

Block-Level Synchronization
Grid-Level Synchronization
Cooperative Groups

Chapter 8. Multi-GPU Programming

Section 1. Peer-to-Peer Communication

Direct GPU Communication
NVIDIA NVLink
PCIe Communication

Section 2. Load Balancing Across GPUs

Work Distribution Strategies
Dynamic Load Balancing
Multi-GPU Synchronization

Section 3. Distributed GPU Systems

MPI Integration
Remote Memory Access
Cluster Programming

Chapter 9. Advanced Techniques in GPU Programming

Section 1. Tensor Core Optimization

Matrix Operation Acceleration
Mixed Precision Computing
Tensor Core Programming

Section 2. Mixed Precision Computing

FP16/FP32/FP64 Trade-offs
Automatic Mixed Precision
Numerical Stability

Section 3. Dynamic Parallelism

Nested Kernel Launch
Parent-Child Synchronization
Resource Management

Chapter 10. Real-World Applications of GPU Computing

Section 1. Scientific Simulations

N-body Simulations
Fluid Dynamics
Molecular Dynamics

Section 2. Machine Learning Workloads

Deep Learning Training
Inference Optimization
Data Processing Pipelines

Section 3. Graphics and Visualization

Ray Tracing
Volume Rendering
Real-time Graphics

Chapter 11. Performance Profiling and Debugging

Section 1. Profiling Tools

Nsight Systems Usage
Nsight Compute Analysis
Visual Profiler Techniques

Section 2. Debugging GPU Kernels

CUDA-GDB Usage
Memory Checker Tools
Common Debug Patterns

Section 3. Optimizing Performance Metrics

Metrics Collection
Performance Analysis
Optimization Strategies

Chapter 12. GPU Accelerated AI and Machine Learning

Section 1. Training Neural Networks on GPUs

Data Parallelism Strategies
Model Parallelism
Pipeline Parallelism

Section 2. Real-Time Inference

Inference Optimization
Batch Processing
Low Latency Techniques

Section 3. Mixed Precision for AI

Training with Mixed Precision
Inference Optimization
Accuracy vs Performance

Chapter 13. Emerging Trends in GPU Computing

Section 1. Advances in GPU Hardware

Next-Generation Architectures
Specialized Computing Units
Memory Technology Evolution

Section 2. Innovations in Programming Models

Modern API Developments
Unified Memory Advances
Programming Abstractions

Section 3. Exascale Computing Challenges

Power Efficiency
Resilience and Reliability
Programming Model Scaling

Chapter 14. Best Practices and Case Studies

Section 1. Writing Efficient GPU Code

Performance Optimization Patterns
Memory Access Patterns
Kernel Design Patterns

Section 2. Real-World GPU Implementations

Industry Case Studies
Research Applications
Performance Analysis

Section 3. Lessons from Industry and Academia

Common Pitfalls
Success Stories
Future Directions

The Leanpub 60 Day 100% Happiness Guarantee

Within 60 days of purchase you can get a 100% refund on any Leanpub purchase, in two clicks.

See full terms...

Earn $8 on a $10 Purchase, and $16 on a $20 Purchase

We pay 80% royalties on purchases of $7.99 or more, and 80% royalties minus a 50 cent flat fee on purchases between $0.99 and $7.98. You earn $8 on a $10 sale, and $16 on a $20 sale. So, if we sell 5000 non-refunded copies of your book for $20, you'll earn $80,000.

(Yes, some authors have already earned much more than that on Leanpub.)

In fact, authors have earned over $15 million writing, publishing and selling on Leanpub.
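The royalty figures quoted above can be checked with a few lines of arithmetic. The sketch below is an illustration only, not Leanpub's actual payout logic: the function name `royalty_cents` is hypothetical, it assumes the 50 cent fee is subtracted after taking 80% of the price, and it rounds down to whole cents. The page states no rule for prices below $0.99, so the sketch rejects them.

```python
def royalty_cents(price_cents: int) -> int:
    """Author royalty in cents for one non-refunded sale (hypothetical helper).

    Assumptions not confirmed by the page: the 50 cent flat fee is taken
    off after the 80% share is computed, and fractions of a cent are
    rounded down.
    """
    if price_cents >= 799:
        # $7.99 or more: 80% royalty.
        return price_cents * 80 // 100
    if price_cents >= 99:
        # $0.99 to $7.98: 80% royalty minus a 50 cent flat fee.
        return price_cents * 80 // 100 - 50
    raise ValueError("no royalty rule is stated for prices below $0.99")


if __name__ == "__main__":
    # Work in integer cents to avoid floating-point rounding surprises.
    for price in (1000, 2000):
        print(f"${price / 100:.2f} sale -> ${royalty_cents(price) / 100:.2f} royalty")
    total = 5000 * royalty_cents(2000)
    print(f"5000 sales at $20.00 -> ${total / 100:,.2f}")
```

Under these assumptions the sketch reproduces the $8, $16, and $80,000 figures given in the text.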

Learn more about writing on Leanpub

Free Updates. DRM Free.

If you buy a Leanpub book, you get free updates for as long as the author updates the book! Many authors use Leanpub to publish their books in-progress, while they are writing them. All readers get free updates, regardless of when they bought the book or how much they paid (including free).

Most Leanpub books are available in PDF (for computers) and EPUB (for phones, tablets and Kindle). The formats that a book includes are shown at the top right corner of this page.

Finally, Leanpub books don't have any DRM copy-protection nonsense, so you can easily read them on any supported device.

Learn more about Leanpub's ebook formats and where to read them

Write and Publish on Leanpub

You can use Leanpub to easily write, publish and sell in-progress and completed ebooks and online courses!

Leanpub is a powerful platform for serious authors, combining a simple, elegant writing and publishing workflow with a store focused on selling in-progress ebooks.

Leanpub is a magical typewriter for authors: just write in plain text, and to publish your ebook, just click a button. (Or, if you are producing your ebook your own way, you can even upload your own PDF and/or EPUB files and then publish with one click!) It really is that easy.

