Programming Massively Parallel Processors: A Hands-on Approach


This edition doesn't have a description yet.

Publisher: Morgan Kaufmann
Pages: 514


Edition Availability

  • Programming Massively Parallel Processors: A Hands-On Approach. 2022, Elsevier Science & Technology, Morgan Kaufmann. In English.
  • Programming Massively Parallel Processors: A Hands-on Approach. Dec 28, 2012, Morgan Kaufmann.
  • Programming Massively Parallel Processors. December 28, 2012, Morgan Kaufmann.
  • Programming Massively Parallel Processors: A Hands-On Approach. 2010, Elsevier Science & Technology Books. In English.


Book Details


Table of Contents

Preface (p. xiii)
Acknowledgements (p. xix)
Dedication (p. xxi)
Chapter 1. Introduction (p. 1)
  1.1. Heterogeneous Parallel Computing (p. 2)
  1.2. Architecture of a Modern GPU (p. 8)
  1.3. Why More Speed or Parallelism? (p. 10)
  1.4. Speeding Up Real Applications (p. 12)
  1.5. Parallel Programming Languages and Models (p. 14)
  1.6. Overarching Goals (p. 16)
  1.7. Organization of the Book (p. 17)
  References (p. 21)
Chapter 2. History of GPU Computing (p. 23)
  2.1. Evolution of Graphics Pipelines (p. 23)
    The Era of Fixed-Function Graphics Pipelines (p. 24)
    Evolution of Programmable Real-Time Graphics (p. 28)
    Unified Graphics and Computing Processors (p. 31)
  2.2. GPGPU: An Intermediate Step (p. 32)
  2.3. GPU Computing (p. 34)
    Scalable GPUs (p. 35)
    Recent Developments (p. 36)
    Future Trends (p. 37)
  References and Further Reading (p. 37)
Chapter 3. Introduction to Data Parallelism and CUDA C (p. 41)
  3.1. Data Parallelism (p. 42)
  3.2. CUDA Program Structure (p. 43)
  3.3. A Vector Addition Kernel (p. 45)
  3.4. Device Global Memory and Data Transfer (p. 48)
  3.5. Kernel Functions and Threading (p. 53)
  3.6. Summary (p. 58)
    Function Declarations (p. 59)
    Kernel Launch (p. 59)
    Predefined Variables (p. 60)
    Runtime API (p. 60)
  3.7. Exercises (p. 60)
  References (p. 62)
Chapter 4. Data-Parallel Execution Model (p. 63)
  4.1. CUDA Thread Organization (p. 64)
  4.2. Mapping Threads to Multidimensional Data (p. 68)
  4.3. Matrix-Matrix Multiplication—A More Complex Kernel (p. 74)
  4.4. Synchronization and Transparent Scalability (p. 81)
  4.5. Assigning Resources to Blocks (p. 83)
  4.6. Querying Device Properties (p. 85)
  4.7. Thread Scheduling and Latency Tolerance (p. 87)
  4.8. Summary (p. 91)
  4.9. Exercises (p. 91)
Chapter 5. CUDA Memories (p. 95)
  5.1. Importance of Memory Access Efficiency (p. 96)
  5.2. CUDA Device Memory Types (p. 97)
  5.3. A Strategy for Reducing Global Memory Traffic (p. 105)
  5.4. A Tiled Matrix-Matrix Multiplication Kernel (p. 109)
  5.5. Memory as a Limiting Factor to Parallelism (p. 115)
  5.6. Summary (p. 118)
  5.7. Exercises (p. 119)
Chapter 6. Performance Considerations (p. 123)
  6.1. Warps and Thread Execution (p. 124)
  6.2. Global Memory Bandwidth (p. 132)
  6.3. Dynamic Partitioning of Execution Resources (p. 141)
  6.4. Instruction Mix and Thread Granularity (p. 143)
  6.5. Summary (p. 145)
  6.6. Exercises (p. 145)
  References (p. 149)
Chapter 7. Floating-Point Considerations (p. 151)
  7.1. Floating-Point Format (p. 152)
    Normalized Representation of M (p. 152)
    Excess Encoding of E (p. 153)
  7.2. Representable Numbers (p. 155)
  7.3. Special Bit Patterns and Precision in IEEE Format (p. 160)
  7.4. Arithmetic Accuracy and Rounding (p. 161)
  7.5. Algorithm Considerations (p. 162)
  7.6. Numerical Stability (p. 164)
  7.7. Summary (p. 169)
  7.8. Exercises (p. 170)
  References (p. 171)
Chapter 8. Parallel Patterns: Convolution (p. 173)
  8.1. Background (p. 174)
  8.2. 1D Parallel Convolution—A Basic Algorithm (p. 179)
  8.3. Constant Memory and Caching (p. 181)
  8.4. Tiled 1D Convolution with Halo Elements (p. 185)
  8.5. A Simpler Tiled 1D Convolution—General Caching (p. 192)
  8.6. Summary (p. 193)
  8.7. Exercises (p. 194)
Chapter 9. Parallel Patterns: Prefix Sum (p. 197)
  9.1. Background (p. 198)
  9.2. A Simple Parallel Scan (p. 200)
  9.3. Work Efficiency Considerations (p. 204)
  9.4. A Work-Efficient Parallel Scan (p. 205)
  9.5. Parallel Scan for Arbitrary-Length Inputs (p. 210)
  9.6. Summary (p. 214)
  9.7. Exercises (p. 215)
  Reference (p. 216)
Chapter 10. Parallel Patterns: Sparse Matrix-Vector Multiplication (p. 217)
  10.1. Background (p. 218)
  10.2. Parallel SpMV Using CSR (p. 222)
  10.3. Padding and Transposition (p. 224)
  10.4. Using Hybrid to Control Padding (p. 226)
  10.5. Sorting and Partitioning for Regularization (p. 230)
  10.6. Summary (p. 232)
  10.7. Exercises (p. 233)
  References (p. 234)
Chapter 11. Application Case Study: Advanced MRI Reconstruction (p. 235)
  11.1. Application Background (p. 236)
  11.2. Iterative Reconstruction (p. 239)
  11.3. Computing FHD (p. 241)
    Step 1: Determine the Kernel Parallelism Structure (p. 243)
    Step 2: Getting Around the Memory Bandwidth Limitation (p. 249)
    Step 3: Using Hardware Trigonometry Functions (p. 255)
    Step 4: Experimental Performance Tuning (p. 259)
  11.4. Final Evaluation (p. 260)
  11.5. Exercises (p. 262)
  References (p. 264)
Chapter 12. Application Case Study: Molecular Visualization and Analysis (p. 265)
  12.1. Application Background (p. 266)
  12.2. A Simple Kernel Implementation (p. 268)
  12.3. Thread Granularity Adjustment (p. 272)
  12.4. Memory Coalescing (p. 274)
  12.5. Summary (p. 277)
  12.6. Exercises (p. 279)
  References (p. 279)
Chapter 13. Parallel Programming and Computational Thinking (p. 281)
  13.1. Goals of Parallel Computing (p. 282)
  13.2. Problem Decomposition (p. 283)
  13.3. Algorithm Selection (p. 287)
  13.4. Computational Thinking (p. 293)
  13.5. Summary (p. 294)
  13.6. Exercises (p. 294)
  References (p. 295)
Chapter 14. An Introduction to OpenCL (p. 297)
  14.1. Background (p. 297)
  14.2. Data Parallelism Model (p. 299)
  14.3. Device Architecture (p. 301)
  14.4. Kernel Functions (p. 303)
  14.5. Device Management and Kernel Launch (p. 304)
  14.6. Electrostatic Potential Map in OpenCL (p. 307)
  14.7. Summary (p. 311)
  14.8. Exercises (p. 312)
  References (p. 313)
Chapter 15. Parallel Programming with OpenACC (p. 315)
  15.1. OpenACC Versus CUDA C (p. 315)
  15.2. Execution Model (p. 318)
  15.3. Memory Model (p. 319)
  15.4. Basic OpenACC Programs (p. 320)
    Parallel Construct (p. 320)
    Loop Construct (p. 322)
    Kernels Construct (p. 327)
    Data Management (p. 331)
    Asynchronous Computation and Data Transfer (p. 335)
  15.5. Future Directions of OpenACC (p. 336)
  15.6. Exercises (p. 337)
Chapter 16. Thrust: A Productivity-Oriented Library for CUDA (p. 339)
  16.1. Background (p. 339)
  16.2. Motivation (p. 342)
  16.3. Basic Thrust Features (p. 343)
    Iterators and Memory Space (p. 344)
    Interoperability (p. 345)
  16.4. Generic Programming (p. 347)
  16.5. Benefits of Abstraction (p. 349)
  16.6. Programmer Productivity (p. 349)
    Robustness (p. 350)
    Real-World Performance (p. 350)
  16.7. Best Practices (p. 352)
    Fusion (p. 353)
    Structure of Arrays (p. 354)
    Implicit Ranges (p. 356)
  16.8. Exercises (p. 357)
  References (p. 358)
Chapter 17. CUDA Fortran (p. 359)
  17.1. CUDA Fortran and CUDA C Differences (p. 360)
  17.2. A First CUDA Fortran Program (p. 361)
  17.3. Multidimensional Array in CUDA Fortran (p. 363)
  17.4. Overloading Host/Device Routines With Generic Interfaces (p. 364)
  17.5. Calling CUDA C Via Iso_C_Binding (p. 367)
  17.6. Kernel Loop Directives and Reduction Operations (p. 369)
  17.7. Dynamic Shared Memory (p. 370)
  17.8. Asynchronous Data Transfers (p. 371)
  17.9. Compilation and Profiling (p. 377)
  17.10. Calling Thrust from CUDA Fortran (p. 378)
  17.11. Exercises (p. 382)
Chapter 18. An Introduction to C++ AMP (p. 383)
  18.1. Core C++ AMP Features (p. 384)
  18.2. Details of the C++ AMP Execution Model (p. 391)
    Explicit and Implicit Data Copies (p. 391)
    Asynchronous Operation (p. 393)
    Section Summary (p. 395)
  18.3. Managing Accelerators (p. 395)
  18.4. Tiled Execution (p. 398)
  18.5. C++ AMP Graphics Features (p. 401)
  18.6. Summary (p. 405)
  18.7. Exercises (p. 405)
Chapter 19. Programming a Heterogeneous Computing Cluster (p. 407)
  19.1. Background (p. 408)
  19.2. A Running Example (p. 408)
  19.3. MPI Basics (p. 410)
  19.4. MPI Point-to-Point Communication Types (p. 414)
  19.5. Overlapping Computation and Communication (p. 421)
  19.6. MPI Collective Communication (p. 431)
  19.7. Summary (p. 432)
  19.8. Exercises (p. 433)
  Reference
Chapter 20. CUDA Dynamic Parallelism (p. 435)
  20.1. Background (p. 436)
  20.2. Dynamic Parallelism Overview (p. 438)
  20.3. Important Details (p. 439)
    Launch Environment Configuration (p. 439)
    API Errors and Launch Failures (p. 439)
    Events (p. 439)
    Streams (p. 440)
    Synchronization Scope (p. 441)
  20.4. Memory Visibility (p. 442)
    Global Memory (p. 442)
    Zero-Copy Memory (p. 442)
    Constant Memory (p. 442)
    Texture Memory (p. 443)
  20.5. A Simple Example (p. 444)
  20.6. Runtime Limitations (p. 446)
    Memory Footprint (p. 446)
    Nesting Depth (p. 448)
    Memory Allocation and Lifetime (p. 448)
    ECC Errors (p. 449)
    Streams (p. 449)
    Events (p. 449)
    Launch Pool (p. 449)
  20.7. A More Complex Example (p. 449)
    Linear Bezier Curves (p. 450)
    Quadratic Bezier Curves (p. 450)
    Bezier Curve Calculation (Predynamic Parallelism) (p. 450)
    Bezier Curve Calculation (with Dynamic Parallelism) (p. 453)
  20.8. Summary (p. 456)
  Reference (p. 457)
Chapter 21. Conclusion and Future Outlook (p. 459)
  21.1. Goals Revisited (p. 459)
  21.2. Memory Model Evolution (p. 461)
  21.3. Kernel Execution Control Evolution (p. 464)
  21.4. Core Performance (p. 467)
  21.5. Programming Environment (p. 467)
  21.6. Future Outlook (p. 468)
  References (p. 469)
Appendix A: Matrix Multiplication Host-Only Version Source Code (p. 471)
Appendix B: GPU Compute Capabilities (p. 481)
Index (p. 487)

Classifications

Library of Congress: QA76.58, QA76.642 .K57 2013eb

Edition Identifiers

Open Library: OL26838891M
Internet Archive: programmingmassi0000kirk
ISBN 10: 0124159923
ISBN 13: 9780124159921
OCLC/WorldCat: 841331948

Work Identifiers

Work ID: OL25666151W

