Parallel programming in C with MPI and OpenMP

Publish Date:
Language: English
Pages: 529

Edition Availability

  • Parallel programming in C with MPI and OpenMP. 2004, McGraw-Hill Higher Education. In English.
  • Parallel Programming in C with MPI and OpenMP. September 1, 2003, McGraw-Hill Education (ISE Editions). In English.
  • Parallel Programming in C with MPI and OpenMP. September 1, 2003, McGraw-Hill Education (ISE Editions). Paperback, in English.
  • Parallel Programming in C with MPI and OpenMP. June 5, 2003, McGraw-Hill Science/Engineering/Math. Hardcover, in English. 1st edition.
  • Parallel Programming in C with MPI and OpenMP. June 5, 2003, McGraw-Hill Science/Engineering/Math. In English.

Book Details


Table of Contents

Preface
Page xiv
Chapter 1. Motivation and History
Page 1
1.1. Introduction
Page 1
1.2. Modern Scientific Method
Page 3
1.3. Evolution of Supercomputing
Page 4
1.4. Modern Parallel Computers
Page 5
1.4.1. The Cosmic Cube
Page 6
1.4.2. Commercial Parallel Computers
Page 6
1.4.3. Beowulf
Page 7
1.4.4. Advanced Strategic Computing Initiative
Page 8
1.5. Seeking Concurrency
Page 9
1.5.1. Data Dependence Graphs
Page 9
1.5.2. Data Parallelism
Page 10
1.5.3. Functional Parallelism
Page 10
1.5.4. Pipelining
Page 12
1.5.5. Size Considerations
Page 13
1.6. Data Clustering
Page 14
1.7. Programming Parallel Computers
Page 17
1.7.1. Extend a Compiler
Page 17
1.7.2. Extend a Sequential Programming Language
Page 18
1.7.3. Add a Parallel Programming Layer
Page 19
1.7.4. Create a Parallel Language
Page 19
1.7.5. Current Status
Page 21
1.8. Summary
Page 21
1.9. Key Terms
Page 22
1.10. Bibliographic Notes
Page 22
1.11. Exercises
Page 23
Chapter 2. Parallel Architectures
Page 27
2.1. Introduction
Page 27
2.2. Interconnection Networks
Page 28
2.2.1. Shared versus Switched Media
Page 28
2.2.2. Switch Network Topologies
Page 29
2.2.3. 2-D Mesh Network
Page 29
2.2.4. Binary Tree Network
Page 30
2.2.5. Hypertree Network
Page 31
2.2.6. Butterfly Network
Page 32
2.2.7. Hypercube Network
Page 33
2.2.8. Shuffle-exchange Network
Page 35
2.2.9. Summary
Page 36
2.3. Processor Arrays
Page 37
2.3.1. Architecture and Data-parallel Operations
Page 37
2.3.2. Processor Array Performance
Page 39
2.3.3. Processor Interconnection Network
Page 40
2.3.4. Enabling and Disabling Processors
Page 40
2.3.5. Additional Architectural Features
Page 42
2.3.6. Shortcomings of Processor Arrays
Page 42
2.4. Multiprocessors
Page 43
2.4.1. Centralized Multiprocessors
Page 43
2.4.2. Distributed Multiprocessors
Page 45
2.5. Multicomputers
Page 49
2.5.1. Asymmetrical Multicomputers
Page 49
2.5.2. Symmetrical Multicomputers
Page 51
2.5.3. Which Model Is Best for a Commodity Cluster?
Page 52
2.5.4. Differences between Clusters and Networks of Workstations
Page 53
2.6. Flynn's Taxonomy
Page 54
2.6.1. SISD
Page 54
2.6.2. SIMD
Page 55
2.6.3. MISD
Page 55
2.6.4. MIMD
Page 56
2.7. Summary
Page 58
2.8. Key Terms
Page 59
2.9. Bibliographic Notes
Page 59
2.10. Exercises
Page 60
Chapter 3. Parallel Algorithm Design
Page 63
3.1. Introduction
Page 63
3.2. The Task/Channel Model
Page 63
3.3. Foster's Design Methodology
Page 64
3.3.1. Partitioning
Page 65
3.3.2. Communication
Page 67
3.3.3. Agglomeration
Page 68
3.3.4. Mapping
Page 70
3.4. Boundary Value Problem
Page 73
3.4.1. Introduction
Page 73
3.4.2. Partitioning
Page 75
3.4.3. Communication
Page 75
3.4.4. Agglomeration and Mapping
Page 76
3.4.5. Analysis
Page 76
3.5. Finding the Maximum
Page 77
3.5.1. Introduction
Page 77
3.5.2. Partitioning
Page 77
3.5.3. Communication
Page 77
3.5.4. Agglomeration and Mapping
Page 81
3.5.5. Analysis
Page 82
3.6. The n-Body Problem
Page 82
3.6.1. Introduction
Page 82
3.6.2. Partitioning
Page 83
3.6.3. Communication
Page 83
3.6.4. Agglomeration and Mapping
Page 85
3.6.5. Analysis
Page 85
3.7. Adding Data Input
Page 86
3.7.1. Introduction
Page 86
3.7.2. Communication
Page 87
3.7.3. Analysis
Page 88
3.8. Summary
Page 89
3.9. Key Terms
Page 90
3.10. Bibliographic Notes
Page 90
3.11. Exercises
Page 90
Chapter 4. Message-Passing Programming
Page 93
4.1. Introduction
Page 93
4.2. The Message-Passing Model
Page 94
4.3. The Message-Passing Interface
Page 95
4.4. Circuit Satisfiability
Page 96
4.4.1. Function `MPI_Init`
Page 99
4.4.2. Functions `MPI_Comm_rank` and `MPI_Comm_size`
Page 99
4.4.3. Function `MPI_Finalize`
Page 101
4.4.4. Compiling MPI Programs
Page 102
4.4.5. Running MPI Programs
Page 102
4.5. Introducing Collective Communication
Page 104
4.5.1. Function `MPI_Reduce`
Page 105
4.6. Benchmarking Parallel Performance
Page 108
4.6.1. Functions `MPI_Wtime` and `MPI_Wtick`
Page 108
4.6.2. Function `MPI_Barrier`
Page 108
4.7. Summary
Page 110
4.8. Key Terms
Page 110
4.9. Bibliographic Notes
Page 110
4.10. Exercises
Page 111
Chapter 5. The Sieve of Eratosthenes
Page 115
5.1. Introduction
Page 115
5.2. Sequential Algorithm
Page 115
5.3. Sources of Parallelism
Page 117
5.4. Data Decomposition Options
Page 117
5.4.1. Interleaved Data Decomposition
Page 118
5.4.2. Block Data Decomposition
Page 118
5.4.3. Block Decomposition Macros
Page 120
5.4.4. Local Index versus Global Index
Page 120
5.4.5. Ramifications of Block Decomposition
Page 121
5.5. Developing the Parallel Algorithm
Page 121
5.5.1. Function `MPI_Bcast`
Page 122
5.6. Analysis of Parallel Sieve Algorithm
Page 122
5.7. Documenting the Parallel Program
Page 123
5.8. Benchmarking
Page 128
5.9. Improvements
Page 129
5.9.1. Delete Even Integers
Page 129
5.9.2. Eliminate Broadcast
Page 130
5.9.3. Reorganize Loops
Page 131
5.9.4. Benchmarking
Page 131
5.10. Summary
Page 133
5.11. Key Terms
Page 134
5.12. Bibliographic Notes
Page 134
5.13. Exercises
Page 134
Chapter 6. Floyd's Algorithm
Page 137
6.1. Introduction
Page 137
6.2. The All-Pairs Shortest-Path Problem
Page 137
6.3. Creating Arrays at Run Time
Page 139
6.4. Designing the Parallel Algorithm
Page 140
6.4.1. Partitioning
Page 140
6.4.2. Communication
Page 141
6.4.3. Agglomeration and Mapping
Page 142
6.4.4. Matrix Input/Output
Page 143
6.5. Point-to-Point Communication
Page 145
6.5.1. Function `MPI_Send`
Page 146
6.5.2. Function `MPI_Recv`
Page 147
6.5.3. Deadlock
Page 148
6.6. Documenting the Parallel Program
Page 149
6.7. Analysis and Benchmarking
Page 151
6.8. Summary
Page 154
6.9. Key Terms
Page 154
6.10. Bibliographic Notes
Page 154
6.11. Exercises
Page 154
Chapter 7. Performance Analysis
Page 159
7.1. Introduction
Page 159
7.2. Speedup and Efficiency
Page 159
7.3. Amdahl's Law
Page 161
7.3.1. Limitations of Amdahl's Law
Page 164
7.3.2. The Amdahl Effect
Page 164
7.4. Gustafson-Barsis's Law
Page 164
7.5. The Karp-Flatt Metric
Page 167
7.6. The Isoefficiency Metric
Page 170
7.7. Summary
Page 174
7.8. Key Terms
Page 175
7.9. Bibliographic Notes
Page 175
7.10. Exercises
Page 176
Chapter 8. Matrix-Vector Multiplication
Page 178
8.1. Introduction
Page 178
8.2. Sequential Algorithm
Page 179
8.3. Data Decomposition Options
Page 180
8.4. Rowwise Block-Striped Decomposition
Page 181
8.4.1. Design and Analysis
Page 181
8.4.2. Replicating a Block-Mapped Vector
Page 183
8.4.3. Function `MPI_Allgatherv`
Page 184
8.4.4. Replicated Vector Input/Output
Page 186
8.4.5. Documenting the Parallel Program
Page 187
8.4.6. Benchmarking
Page 187
8.5. Columnwise Block-Striped Decomposition
Page 189
8.5.1. Design and Analysis
Page 189
8.5.2. Reading a Columnwise Block-Striped Matrix
Page 191
8.5.3. Function `MPI_Scatterv`
Page 191
8.5.4. Printing a Columnwise Block-Striped Matrix
Page 193
8.5.5. Function `MPI_Gatherv`
Page 193
8.5.6. Distributing Partial Results
Page 195
8.5.7. Function `MPI_Alltoallv`
Page 195
8.5.8. Documenting the Parallel Program
Page 196
8.5.9. Benchmarking
Page 198
8.6. Checkerboard Block Decomposition
Page 199
8.6.1. Design and Analysis
Page 199
8.6.2. Creating a Communicator
Page 202
8.6.3. Function `MPI_Dims_create`
Page 203
8.6.4. Function `MPI_Cart_create`
Page 204
8.6.5. Reading a Checkerboard Matrix
Page 205
8.6.6. Function `MPI_Cart_rank`
Page 205
8.6.7. Function `MPI_Cart_coords`
Page 207
8.6.8. Function `MPI_Comm_split`
Page 207
8.6.9. Benchmarking
Page 208
8.7. Summary
Page 210
8.8. Key Terms
Page 211
8.9. Bibliographic Notes
Page 211
8.10. Exercises
Page 211
Chapter 9. Document Classification
Page 216
9.1. Introduction
Page 216
9.2. Parallel Algorithm Design
Page 217
9.2.1. Partitioning and Communication
Page 217
9.2.2. Agglomeration and Mapping
Page 217
9.2.3. Manager/Worker Paradigm
Page 218
9.2.4. Manager Process
Page 219
9.2.5. Function `MPI_Abort`
Page 220
9.2.6. Worker Process
Page 221
9.2.7. Creating a Workers-only Communicator
Page 223
9.3. Nonblocking Communications
Page 223
9.3.1. Manager's Communication
Page 224
9.3.2. Function `MPI_Irecv`
Page 224
9.3.3. Function `MPI_Wait`
Page 225
9.3.4. Workers' Communications
Page 225
9.3.5. Function `MPI_Isend`
Page 225
9.3.6. Function `MPI_Probe`
Page 225
9.3.7. Function `MPI_Get_count`
Page 226
9.4. Documenting the Parallel Program
Page 226
9.5. Enhancements
Page 232
9.5.1. Assigning Groups of Documents
Page 232
9.5.2. Pipelining
Page 232
9.5.3. Function `MPI_Testsome`
Page 234
9.6. Summary
Page 235
9.7. Key Terms
Page 236
9.8. Bibliographic Notes
Page 236
9.9. Exercises
Page 236
Chapter 10. Monte Carlo Methods
Page 239
10.1. Introduction
Page 239
10.1.1. Why Monte Carlo Works
Page 240
10.1.2. Monte Carlo and Parallel Computing
Page 243
10.2. Sequential Random Number Generators
Page 243
10.2.1. Linear Congruential
Page 244
10.2.2. Lagged Fibonacci
Page 245
10.3. Parallel Random Number Generators
Page 245
10.3.1. Manager-Worker Method
Page 246
10.3.2. Leapfrog Method
Page 246
10.3.3. Sequence Splitting
Page 247
10.3.4. Parameterization
Page 248
10.4. Other Random Number Distributions
Page 248
10.4.1. Inverse Cumulative Distribution Function Transformation
Page 249
10.4.2. Box-Muller Transformation
Page 250
10.4.3. The Rejection Method
Page 251
10.5. Case Studies
Page 253
10.5.1. Neutron Transport
Page 253
10.5.2. Temperature at a Point Inside a 2-D Plate
Page 255
10.5.3. Two-Dimensional Ising Model
Page 257
10.5.4. Room Assignment Problem
Page 259
10.5.5. Parking Garage
Page 262
10.5.6. Traffic Circle
Page 264
10.6. Summary
Page 268
10.7. Key Terms
Page 269
10.8. Bibliographic Notes
Page 269
10.9. Exercises
Page 270
Chapter 11. Matrix Multiplication
Page 273
11.1. Introduction
Page 273
11.2. Sequential Matrix Multiplication
Page 274
11.2.1. Iterative, Row-Oriented Algorithm
Page 274
11.2.2. Recursive, Block-Oriented Algorithm
Page 275
11.3. Rowwise Block-Striped Parallel Algorithm
Page 277
11.3.1. Identifying Primitive Tasks
Page 277
11.3.2. Agglomeration
Page 278
11.3.3. Communication and Further Agglomeration
Page 279
11.3.4. Analysis
Page 279
11.4. Cannon's Algorithm
Page 281
11.4.1. Agglomeration
Page 281
11.4.2. Communication
Page 283
11.4.3. Analysis
Page 284
11.5. Summary
Page 286
11.6. Key Terms
Page 287
11.7. Bibliographic Notes
Page 287
11.8. Exercises
Page 287
Chapter 12. Solving Linear Systems
Page 290
12.1. Introduction
Page 290
12.2. Terminology
Page 291
12.3. Back Substitution
Page 292
12.3.1. Sequential Algorithm
Page 292
12.3.2. Row-Oriented Parallel Algorithm
Page 293
12.3.3. Column-Oriented Parallel Algorithm
Page 295
12.3.4. Comparison
Page 295
12.4. Gaussian Elimination
Page 296
12.4.1. Sequential Algorithm
Page 296
12.4.2. Parallel Algorithms
Page 298
12.4.3. Row-Oriented Algorithm
Page 299
12.4.4. Column-Oriented Algorithm
Page 303
12.4.5. Comparison
Page 303
12.4.6. Pipelined, Row-Oriented Algorithm
Page 304
12.5. Iterative Methods
Page 306
12.6. The Conjugate Gradient Method
Page 309
12.6.1. Sequential Algorithm
Page 309
12.6.2. Parallel Implementation
Page 310
12.7. Summary
Page 313
12.8. Key Terms
Page 314
12.9. Bibliographic Notes
Page 314
12.10. Exercises
Page 314
Chapter 13. Finite Difference Methods
Page 318
13.1. Introduction
Page 318
13.2. Partial Differential Equations
Page 320
13.2.1. Categorizing PDEs
Page 320
13.2.2. Difference Quotients
Page 321
13.3. Vibrating String
Page 322
13.3.1. Deriving Equations
Page 322
13.3.2. Deriving the Sequential Program
Page 323
13.3.3. Parallel Program Design
Page 324
13.3.4. Isoefficiency Analysis
Page 327
13.3.5. Replicating Computations
Page 327
13.4. Steady-State Heat Distribution
Page 329
13.4.1. Deriving Equations
Page 329
13.4.2. Deriving the Sequential Program
Page 330
13.4.3. Parallel Program Design
Page 332
13.4.4. Isoefficiency Analysis
Page 332
13.4.5. Implementation Details
Page 334
13.5. Summary
Page 334
13.6. Key Terms
Page 335
13.7. Bibliographic Notes
Page 335
13.8. Exercises
Page 335
Chapter 14. Sorting
Page 338
14.1. Introduction
Page 338
14.2. Quicksort
Page 339
14.3. A Parallel Quicksort Algorithm
Page 340
14.3.1. Definition of Sorted
Page 340
14.3.2. Algorithm Development
Page 341
14.3.3. Analysis
Page 341
14.4. Hyperquicksort
Page 343
14.4.1. Algorithm Description
Page 343
14.4.2. Isoefficiency Analysis
Page 345
14.5. Parallel Sorting by Regular Sampling
Page 346
14.5.1. Algorithm Description
Page 346
14.5.2. Isoefficiency Analysis
Page 347
14.6. Summary
Page 349
14.7. Key Terms
Page 349
14.8. Bibliographic Notes
Page 350
14.9. Exercises
Page 350
Chapter 15. The Fast Fourier Transform
Page 353
15.1. Introduction
Page 353
15.2. Fourier Analysis
Page 353
15.3. The Discrete Fourier Transform
Page 355
15.3.1. Inverse Discrete Fourier Transform
Page 357
15.3.2. Sample Application: Polynomial Multiplication
Page 357
15.4. The Fast Fourier Transform
Page 360
15.5. Parallel Program Design
Page 363
15.5.1. Partitioning and Communication
Page 363
15.5.2. Agglomeration and Mapping
Page 365
15.5.3. Isoefficiency Analysis
Page 365
15.6. Summary
Page 367
15.7. Key Terms
Page 367
15.8. Bibliographic Notes
Page 367
15.9. Exercises
Page 367
Chapter 16. Combinatorial Search
Page 369
16.1. Introduction
Page 369
16.2. Divide and Conquer
Page 370
16.3. Backtrack Search
Page 371
16.3.1. Example
Page 371
16.3.2. Time and Space Complexity
Page 374
16.4. Parallel Backtrack Search
Page 374
16.5. Distributed Termination Detection
Page 377
16.6. Branch and Bound
Page 380
16.6.1. Example
Page 380
16.6.2. Sequential Algorithm
Page 382
16.6.3. Analysis
Page 385
16.7. Parallel Branch and Bound
Page 385
16.7.1. Storing and Sharing Unexamined Subproblems
Page 386
16.7.2. Efficiency
Page 387
16.7.3. Halting Conditions
Page 387
16.8. Searching Game Trees
Page 388
16.8.1. Minimax Algorithm
Page 388
16.8.2. Alpha-Beta Pruning
Page 392
16.8.3. Enhancements to Alpha-Beta Pruning
Page 395
16.9. Parallel Alpha-Beta Search
Page 395
16.9.1. Parallel Aspiration Search
Page 396
16.9.2. Parallel Subtree Evaluation
Page 396
16.9.3. Distributed Tree Search
Page 397
16.10. Summary
Page 399
16.11. Key Terms
Page 400
16.12. Bibliographic Notes
Page 400
16.13. Exercises
Page 401
Chapter 17. Shared-Memory Programming
Page 404
17.1. Introduction
Page 404
17.2. The Shared-Memory Model
Page 405
17.3. Parallel `for` Loops
Page 407
17.3.1. `parallel for` Pragma
Page 408
17.3.2. Function `omp_get_num_procs`
Page 410
17.3.3. Function `omp_set_num_threads`
Page 410
17.4. Declaring Private Variables
Page 410
17.4.1. `private` Clause
Page 411
17.4.2. `firstprivate` Clause
Page 412
17.4.3. `lastprivate` Clause
Page 412
17.5. Critical Sections
Page 413
17.5.1. `critical` Pragma
Page 415
17.6. Reductions
Page 415
17.7. Performance Improvements
Page 417
17.7.1. Inverting Loops
Page 417
17.7.2. Conditionally Executing Loops
Page 418
17.7.3. Scheduling Loops
Page 419
17.8. More General Data Parallelism
Page 421
17.8.1. `parallel` Pragma
Page 422
17.8.2. Function `omp_get_thread_num`
Page 423
17.8.3. Function `omp_get_num_threads`
Page 425
17.8.4. `for` Pragma
Page 425
17.8.5. `single` Pragma
Page 427
17.8.6. `nowait` Clause
Page 427
17.9. Functional Parallelism
Page 428
17.9.1. `parallel sections` Pragma
Page 429
17.9.2. `section` Pragma
Page 429
17.9.3. `sections` Pragma
Page 429
17.10. Summary
Page 430
17.11. Key Terms
Page 432
17.12. Bibliographic Notes
Page 432
17.13. Exercises
Page 433
Chapter 18. Combining MPI and OpenMP
Page 436
18.1. Introduction
Page 436
18.2. Conjugate Gradient Method
Page 438
18.2.1. MPI Program
Page 438
18.2.2. Functional Profiling
Page 442
18.2.3. Parallelizing Function `matrix_vector_product`
Page 442
18.2.4. Benchmarking
Page 443
18.3. Jacobi Method
Page 444
18.3.1. Profiling MPI Program
Page 444
18.3.2. Parallelizing Function `find_steady_state`
Page 444
18.3.3. Benchmarking
Page 446
18.4. Summary
Page 448
18.5. Exercises
Page 448
Appendix A. MPI Functions
Page 450
Appendix B. Utility Functions
Page 485
B.1. Header File `MyMPI.h`
Page 485
B.2. Source File `MyMPI.c`
Page 486
Appendix C. Debugging MPI Programs
Page 505
C.1. Introduction
Page 505
C.2. Typical Bugs in MPI Programs
Page 505
C.2.1. Bugs Resulting in Deadlock
Page 505
C.2.2. Bugs Resulting in Incorrect Results
Page 506
C.2.3. Advantages of Collective Communications
Page 507
C.3. Practical Debugging Strategies
Page 507
Appendix D. Review of Complex Numbers
Page 509
Appendix E. OpenMP Functions
Page 513
Bibliography
Page 515
Author Index
Page 520
Subject Index
Page 522

Edition Notes

Includes bibliographical references (p. 515-519) and indexes

Published in: Boston

Classifications

Library of Congress: QA76.73.C15 Q55 2004, QA76.73.C15 Q55 2003

The Physical Object

Pagination: xiv, 529 p.
Number of pages: 529

Edition Identifiers

Open Library: OL18143256M
Internet Archive: parallelprogramm0000quin
ISBN 10: 0072822562
LCCN: 2003046371
OCLC/WorldCat: 51892577
LibraryThing: 3190613
Goodreads: 1746675

Work Identifiers

Work ID: OL3233335W

Excerpts

Are you one of those people for whom "fast" isn't fast enough?

