Parallel Programming in C with MPI and OpenMP

Author: Michael Quinn
Publisher: Tsinghua University Press
Series: Renowned Foreign Textbooks for University Computer Education
Tags: programming theory
ISBN: unknown | Publication date: not available | Binding: not available | Format: unknown | Pages: not available | Word count: not available

About the Author

No author biography is available for this edition of Parallel Programming in C with MPI and OpenMP.

About the Book

This book grew out of a parallel programming course that Professor Michael J. Quinn of Oregon State University taught for many years. It introduces parallel programming in C using MPI and OpenMP, covering parallel architectures, parallel algorithm design, message-passing programming, the sieve of Eratosthenes, Floyd's algorithm, performance analysis, matrix-vector multiplication, document classification, Monte Carlo methods, matrix multiplication, solving linear systems, finite difference methods, sorting, the fast Fourier transform, combinatorial search, shared-memory programming, and combining MPI and OpenMP, plus five appendices. The chapters are organized to follow the course: a four-step design methodology of partitioning, communication, agglomeration, and mapping is applied to a range of practical problems, so that readers acquire a systematic approach to parallel program design and learn to develop efficient parallel programs. Beyond its use as a textbook on parallel programming, the book is a valuable reference for practitioners in related fields.
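To give a flavor of the C/MPI/OpenMP style the book teaches, here is a minimal hybrid sketch; it is not taken from the book, and the program, constants, and toy workload are illustrative only. Each MPI process takes an interleaved share of a sum, OpenMP threads split the local loop, and MPI_Reduce combines the partial results.

/* Hybrid MPI + OpenMP sketch in C (illustrative, not from the book). */
#include <stdio.h>
#include <mpi.h>
#include <omp.h>

#define N 1000000                 /* placeholder problem size */

int main(int argc, char *argv[])
{
    int id, p;
    double local = 0.0, global = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    /* Interleaved decomposition: process id handles i = id, id+p, id+2p, ...
       OpenMP threads share the local loop and reduce into 'local'. */
    #pragma omp parallel for reduction(+:local)
    for (int i = id; i < N; i += p)
        local += 1.0 / (double)(i + 1);   /* toy workload: partial harmonic sum */

    /* Combine per-process partial sums on process 0. */
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (id == 0)
        printf("Harmonic sum of %d terms: %f\n", N, global);

    MPI_Finalize();
    return 0;
}

A program like this would typically be built with an MPI wrapper compiler plus an OpenMP flag (for example, mpicc -fopenmp) and launched with mpiexec; chapters 4, 17, and 18 of the book cover MPI, OpenMP, and their combination in detail.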

Table of Contents

CHAPTER 1

Motivation and History 1

1.1 Introduction 1

1.2 Modern Scientific Method 3

1.3 Evolution of Supercomputing

1.4 Modern Parallel Computers

1.4.1 The Cosmic Cube 6

1.4.2 Commercial Parallel Computers 6

1.4.3 Beowulf 7

1.4.4 Advanced Strategic Computing Initiative 8

1.5 Seeking Concurrency 9

1.5.1 Data Dependence Graphs 9

1.5.2 Data Parallelism 10

1.5.3 Functional Parallelism 10

1.5.4 Pipelining 12

1.5.5 Size Considerations 13

1.6 Data Clustering 14

1.7 Programming Parallel Computers 17

1.7.1 Extend a Compiler 17

1.7.2 Extend a Sequential Programming Language 18

1.7.3 Add a Parallel Programming Layer 19

1.7.4 Create a Parallel Language 19

1.7.5 Current Status 21

1.8 Summary 21

1.9 Key Terms 22

1.10 Bibliographic Notes 22

1.11 Exercises 23

CHAPTER 2

Parallel Architectures 27

2.1 Introduction 27

2.2 Interconnection Networks 28

2.2.1 Shared versus Switched Media 28

2.2.2 Switch Network Topologies 29

2.2.3 2-D Mesh Network 29

2.2.4 Binary Tree Network 30

2.2.5 Hypertree Network 31

2.2.6 Butterfly Network 32

2.2.7 Hypercube Network 33

2.2.8 Shuffle-exchange Network 35

2.2.9 Summary 36

2.3 Processor Arrays 37

2.3.1 Architecture and Data-parallel Operations 37

2.3.2 Processor Array Performance 39

2.3.3 Processor Interconnection Network 40

2.3.4 Enabling and Disabling Processors 40

2.3.5 Additional Architectural Features 42

2.3.6 Shortcomings of Processor Arrays 42

2.4 Multiprocessors 43

2.4.1 Centralized Multiprocessors 43

2.4.2 Distributed Multiprocessors 45

2.5 Multicomputers 49

2.5.1 Asymmetrical Multicomputers 49

2.5.2 Symmetrical Multicomputers 51

2.5.3 Which Model Is Best for a Commodity Cluster? 52

2.5.4 Differences between Clusters and Networks of Workstations 53

2.6 Flynn's Taxonomy 54

2.6.1 SISD 54

2.6.2 SIMD 55

2.6.3 MISD 55

2.6.4 MIMD 56

2.7 Summary 58

2.8 Key Terms 59

2.9 Bibliographic Notes 59

2.10 Exercises 60

CHAPTER 3

Parallel Algorithm Design 63

3.1 Introduction 63

3.2 The Task/Channel Model 63

3.3 Foster's Design Methodology 64

3.3.1 Partitioning 65

3.3.2 Communication 67

3.3.3 Agglomeration 68

3.3.4 Mapping 70

3.4 Boundary Value Problem 73

3.4.1 Introduction 73

3.4.2 Partitioning 75

3.4.3 Communication 75

3.4.4 Agglomeration and Mapping 76

3.4.5 Analysis 76

3.5 Finding the Maximum 77

3.5.1 Introduction 77

3.5.2 Partitioning 77

3.5.3 Communication 77

3.5.4 Agglomeration and Mapping 81

3.5.5 Analysis 82

3.6 The n-Body Problem 82

3.6.1 Introduction 82

3.6.2 Partitioning 83

3.6.3 Communication 83

3.6.4 Agglomeration and Mapping 85

3.6.5 Analysis 85

3.7 Adding Data Input 86

3.7.1 Introduction 86

3.7.2 Communication 87

3.7.3 Analysis 88

3.8 Summary 89

3.9 Key Terms 90

3.10 Bibliographic Notes 90

3.11 Exercises 90

CHAPTER 4

Message-Passing Programming 93

4.1 Introduction 93

4.2 The Message-Passing Model 94

4.3 The Message-Passing Interface 95

4.4 Circuit Satisfiability 96

4.4.1 Function MPI_Init 99

4.4.2 Functions MPI_Comm_rank and MPI_Comm_size 99

4.4.3 Function MPI_Finalize 101

4.4.4 Compiling MPI Programs 102

4.4.5 Running MPI Programs 102

4.5 Introducing Collective Communication 104

4.5.1 Function MPI_Reduce 105

4.6 Benchmarking Parallel Performance 108

4.6.1 Functions MPI_Wtime and MPI_Wtick 108

4.6.2 Function MPI_Barrier 108

4.7 Summary 110

4.8 Key Terms 110

4.9 Bibliographic Notes 110

4.10 Exercises 111

CHAPTER 5

The Sieve of Eratosthenes 115

5.1 Introduction 115

5.2 Sequential Algorithm 115

5.3 Sources of Parallelism 117

5.4 Data Decomposition Options 117

5.4.1 Interleaved Data Decomposition 118

5.4.2 Block Data Decomposition 118

5.4.3 Block Decomposition Macros 120

5.4.4 Local Index versus Global Index 120

5.4.5 Ramifications of Block Decomposition 121

5.5 Developing the Parallel Algorithm 121

5.5.1 Function MPI_Bcast 122

5.6 Analysis of Parallel Sieve Algorithm 122

5.7 Documenting the Parallel Program 123

5.8 Benchmarking 128

5.9 Improvements 129

5.9.1 Delete Even Integers 129

5.9.2 Eliminate Broadcast 130

5.9.3 Reorganize Loops 131

5.9.4 Benchmarking 131

5.10 Summary 133

5.11 Key Terms 134

5.12 Bibliographic Notes 134

5.13 Exercises 134

CHAPTER 6

Floyd's Algorithm 137

6.1 Introduction 137

6.2 The All-Pairs Shortest-Path Problem 137

6.3 Creating Arrays at Run Time 139

6.4 Designing the Parallel Algorithm 140

6.4.1 Partitioning 140

6.4.2 Communication 141

6.4.3 Agglomeration and Mapping 142

6.4.4 Matrix Input/Output 143

6.5 Point-to-Point Communication 145

6.5.1 Function MPI_Send 146

6.5.2 Function MPI_Recv 147

6.5.3 Deadlock 148

6.6 Documenting the Parallel Program 149

6.7 Analysis and Benchmarking 151

6.8 Summary 154

6.9 Key Terms 154

6.10 Bibliographic Notes 154

6.11 Exercises 154

CHAPTER 7

Performance Analysis 159

7.1 Introduction 159

7.2 Speedup and Efficiency 159

7.3 Amdahl's Law 161

7.3.1 Limitations of Amdahl's Law 164

7.3.2 The Amdahl Effect 164

7.4 Gustafson-Barsis's Law 164

7.5 The Karp-Flatt Metric 167

7.6 The Isoefficiency Metric 170

7.7 Summary 174

7.8 Key Terms 175

7.9 Bibliographic Notes 175

7.10 Exercises 176

CHAPTER 8

Matrix-Vector Multiplication 178

8.1 Introduction 178

8.2 Sequential Algorithm 179

8.3 Data Decomposition Options 180

8.4 Rowwise Block-Striped Decomposition 181

8.4.1 Design and Analysis 181

8.4.2 Replicating a Block-Mapped Vector 183

8.4.3 Function MPI_Allgatherv 184

8.4.4 Replicated Vector Input/Output 186

8.4.5 Documenting the Parallel Program 187

8.4.6 Benchmarking 187

8.5 Columnwise Block-Striped Decomposition 189

8.5.1 Design and Analysis 189

8.5.2 Reading a Columnwise Block-Striped Matrix 191

8.5.3 Function MPI_Scatterv 191

8.5.4 Printing a Columnwise Block-Striped Matrix 193

8.5.5 Function MPI_Gatherv 193

8.5.6 Distributing Partial Results 195

8.5.7 Function MPI_Alltoallv 195

8.5.8 Documenting the Parallel Program 196

8.5.9 Benchmarking 198

8.6 Checkerboard Block Decomposition 199

8.6.1 Design and Analysis 199

8.6.2 Creating a Communicator 202

8.6.3 Function MPI_Dims_create 203

8.6.4 Function MPI_Cart_create 204

8.6.5 Reading a Checkerboard Matrix 205

8.6.6 Function MPI_Cart_rank 205

8.6.7 Function MPI_Cart_coords 207

8.6.8 Function MPI_Comm_split 207

8.6.9 Benchmarking 208

8.7 Summary 210

8.8 Key Terms 211

8.9 Bibliographic Notes 211

8.10 Exercises 211

CHAPTER 9

Document Classification 216

9.1 Introduction 216

9.2 Parallel Algorithm Design 217

9.2.1 Partitioning and Communication 217

9.2.2 Agglomeration and Mapping 217

9.2.3 Manager/Worker Paradigm 218

9.2.4 Manager Process 219

9.2.5 Function MPI_Abort 220

9.2.6 Worker Process 221

9.2.7 Creating a Workers-only Communicator 223

9.3 Nonblocking Communications 223

9.3.1 Manager's Communication 224

9.3.2 Function MPI_Irecv 224

9.3.3 Function MPI_Wait 225

9.3.4 Workers' Communications 225

9.3.5 Function MPI_Isend 225

9.3.6 Function MPI_Probe 225

9.3.7 Function MPI_Get_count 226

9.4 Documenting the Parallel Program 226

9.5 Enhancements 232

9.5.1 Assigning Groups of Documents 232

9.5.2 Pipelining 232

9.5.3 Function MPI_Testsome 234

9.6 Summary 235

9.7 Key Terms 236

9.8 Bibliographic Notes 236

9.9 Exercises 236

CHAPTER 10

Monte Carlo Methods 239

10.1 Introduction 239

10.1.1 Why Monte Carlo Works 240

10.1.2 Monte Carlo and Parallel Computing 243

10.2 Sequential Random Number Generators 243

10.2.1 Linear Congruential 244

10.2.2 Lagged Fibonacci 245

10.3 Parallel Random Number Generators 245

10.3.1 Manager-Worker Method 246

10.3.2 Leapfrog Method 246

10.3.3 Sequence Splitting 247

10.3.4 Parameterization 248

10.4 Other Random Number Distributions 248

10.4.1 Inverse Cumulative Distribution Function Transformation 249

10.4.2 Box-Muller Transformation 250

10.4.3 The Rejection Method 251

10.5 Case Studies 253

10.5.1 Neutron Transport 253

10.5.2 Temperature at a Point Inside a 2-D Plate 255

10.5.3 Two-Dimensional Ising Model 257

10.5.4 Room Assignment Problem 259

10.5.5 Parking Garage 262

10.5.6 Traffic Circle 264

10.6 Summary 268

10.7 Key Terms 269

10.8 Bibliographic Notes 269

10.9 Exercises 270

CHAPTER 11

Matrix Multiplication 273

11.1 Introduction 273

11.2 Sequential Matrix Multiplication 274

11.2.1 Iterative, Row-Oriented Algorithm 274

11.2.2 Recursive, Block-Oriented Algorithm 275

11.3 Rowwise Block-Striped Parallel Algorithm 277

11.3.1 Identifying Primitive Tasks 277

11.3.2 Agglomeration 278

11.3.3 Communication and Further Agglomeration 279

11.3.4 Analysis 279

11.4 Cannon's Algorithm 281

11.4.1 Agglomeration 281

11.4.2 Communication 283

11.4.3 Analysis 284

11.5 Summary 286

11.6 Key Terms 287

11.7 Bibliographic Notes 287

11.8 Exercises 287

CHAPTER 12

Solving Linear Systems 290

12.1 Introduction 290

12.2 Terminology 291

12.3 Back Substitution 292

12.3.1 Sequential Algorithm 292

12.3.2 Row-Oriented Parallel Algorithm 293

12.3.3 Column-Oriented Parallel Algorithm 295

12.3.4 Comparison 295

12.4 Gaussian Elimination 296

12.4.1 Sequential Algorithm 296

12.4.2 Parallel Algorithms 298

12.4.3 Row-Oriented Algorithm 299

12.4.4 Column-Oriented Algorithm 303

12.4.5 Comparison 303

12.4.6 Pipelined, Row-Oriented Algorithm 304

12.5 Iterative Methods 306

12.6 The Conjugate Gradient Method 309

12.6.1 Sequential Algorithm 309

12.6.2 Parallel Implementation 310

12.7 Summary 313

12.8 Key Terms 314

12.9 Bibliographic Notes 314

12.10 Exercises 314

CHAPTER 13

Finite Difference Methods 318

13.1 Introduction 318

13.2 Partial Differential Equations 320

13.2.1 Categorizing PDEs 320

13.2.2 Difference Quotients 321

13.3 Vibrating String 322

13.3.1 Deriving Equations 322

13.3.2 Deriving the Sequential Program 323

13.3.3 Parallel Program Design 324

13.3.4 Isoefficiency Analysis 327

13.3.5 Replicating Computations 327

13.4 Steady-State Heat Distribution 329

13.4.1 Deriving Equations 329

13.4.2 Deriving the Sequential Program 330

13.4.3 Parallel Program Design 332

13.4.4 Isoefficiency Analysis 332

13.4.5 Implementation Details 334

13.5 Summary 334

13.6 Key Terms 335

13.7 Bibliographic Notes 335

13.8 Exercises 335

CHAPTER 14

Sorting 338

14.1 Introduction 338

14.2 Quicksort 339

14.3 A Parallel Quicksort Algorithm 340

14.3.1 Definition of Sorted 340

14.3.2 Algorithm Development 341

14.3.3 Analysis 341

14.4 Hyperquicksort 343

14.4.1 Algorithm Description 343

14.4.2 Isoefficiency Analysis 345

14.5 Parallel Sorting by Regular Sampling 346

14.5.1 Algorithm Description 346

14.5.2 Isoefficiency Analysis 347

14.6 Summary 349

14.7 Key Terms 349

14.8 Bibliographic Notes 350

14.9 Exercises 350

CHAPTER 15

The Fast Fourier Transform 353

15.1 Introduction 353

15.2 Fourier Analysis 353

15.3 The Discrete Fourier Transform 355

15.3.1 Inverse Discrete Fourier Transform 357

15.3.2 Sample Application: Polynomial Multiplication 357

15.4 The Fast Fourier Transform 360

15.5 Parallel Program Design 363

15.5.1 Partitioning and Communication 363

15.5.2 Agglomeration and Mapping 365

15.5.3 Isoefficiency Analysis 365

15.6 Summary 367

15.7 Key Terms 367

15.8 Bibliographic Notes 367

15.9 Exercises 367

CHAPTER 16

Combinatorial Search 369

16.1 Introduction 369

16.2 Divide and Conquer 370

16.3 Backtrack Search 371

16.3.1 Example 371

16.3.2 Time and Space Complexity 374

16.4 Parallel Backtrack Search 374

16.5 Distributed Termination Detection 377

16.6 Branch and Bound 380

16.6.1 Example 380

16.6.2 Sequential Algorithm 382

16.6.3 Analysis 385

16.7 Parallel Branch and Bound 385

16.7.1 Storing and Sharing Unexamined Subproblems 386

16.7.2 Efficiency 387

16.7.3 Halting Conditions 387

16.8 Searching Game Trees 388

16.8.1 Minimax Algorithm 388

16.8.2 Alpha-Beta Pruning 392

16.8.3 Enhancements to Alpha-Beta Pruning 395

16.9 Parallel Alpha-Beta Search 395

16.9.1 Parallel Aspiration Search 396

16.9.2 Parallel Subtree Evaluation 396

16.9.3 Distributed Tree Search 397

16.10 Summary 399

16.11 Key Terms 400

16.12 Bibliographic Notes 400

16.13 Exercises 401

CHAPTER 17

Shared-Memory Programming 404

17.1 Introduction 404

17.2 The Shared-Memory Model 405

17.3 Parallel for Loops 407

17.3.1 parallel for Pragma 408

17.3.2 Function omp_get_num_procs 410

17.3.3 Function omp_set_num_threads 410

17.4 Declaring Private Variables 410

17.4.1 private Clause 411

17.4.2 firstprivate Clause 412

17.4.3 lastprivate Clause 412

17.5 Critical Sections 413

17.5.1 critical Pragma 415

17.6 Reductions 415

17.7 Performance Improvements 417

17.7.1 Inverting Loops 417

17.7.2 Conditionally Executing Loops 418

17.7.3 Scheduling Loops 419

17.8 More General Data Parallelism 421

17.8.1 parallel Pragma 422

17.8.2 Function omp_get_thread_num 423

17.8.3 Function omp_get_num_threads 425

17.8.4 for Pragma 425

17.8.5 single Pragma 427

17.8.6 nowait Clause 427

17.9 Functional Parallelism 428

17.9.1 parallel sections Pragma 429

17.9.2 section Pragma 429

17.9.3 sections Pragma 429

17.10 Summary 430

17.11 Key Terms 432

17.12 Bibliographic Notes 432

17.13 Exercises 433

CHAPTER 18

Combining MPI and OpenMP 436

18.1 Introduction 436

18.2 Conjugate Gradient Method 438

18.2.1 MPI Program 438

18.2.2 Functional Profiling 442

18.2.3 Parallelizing Function matrix_vector_product 442

18.2.4 Benchmarking 443

18.3 Jacobi Method 444

18.3.1 Profiling MPI Program 444

18.3.2 Parallelizing Function find_steady_state 444

18.3.3 Benchmarking 446

18.4 Summary 448

18.5 Exercises 448

APPENDIX A

MPI Functions 450

APPENDIX B

Utility Functions 485

B.1 Header File MyMPI.h 485

B.2 Source File MyMPI.c 486

APPENDIX C

Debugging MPI Programs 505

C.1 Introduction 505

C.2 Typical Bugs in MPI Programs 505

C.2.1 Bugs Resulting in Deadlock 505

C.2.2 Bugs Resulting in Incorrect Results 506

C.2.3 Advantages of Collective Communications 507

C.3 Practical Debugging Strategies 507

APPENDIX D

Review of Complex Numbers 509

APPENDIX E

OpenMP Functions 513

Bibliography 515