"If you build it, they will come."
And so we built them. Multiprocessor workstations, massively parallel supercomputers, a cluster in
every department ... and they haven't come. Programmers haven't come to program these wonderful
machines. Oh, a few programmers in love with the challenge have shown that most types of problems
can be force-fit onto parallel computers, but general programmers, especially professional
programmers who "have lives", ignore parallel computers.
And they do so at their own peril. Parallel computers are going mainstream. Multithreaded
microprocessors, multicore CPUs, multiprocessor PCs, clusters, parallel game consoles ... parallel
computers are taking over the world of computing. The computer industry is ready to flood the market
with hardware that will only run at full speed with parallel programs. But who will write these
programs?
This is an old problem. Even in the early 1980s, when the "killer micros" started their assault on
traditional vector supercomputers, we worried endlessly about how to attract normal programmers.
We tried everything we could think of: high-level hardware abstractions, implicitly parallel
programming languages, parallel language extensions, and portable message-passing libraries. But
after many years of hard work, the fact of the matter is that "they" didn't come. The overwhelming
majority of programmers will not invest the effort to write parallel software.
A common view is that you can't teach old programmers new tricks, so the problem will not be solved
until the old programmers fade away and a new generation takes over.
But we don't buy into that defeatist attitude. Programmers have shown a remarkable ability to adopt
new software technologies over the years. Look at how many old Fortran programmers are now
writing elegant Java programs with sophisticated object-oriented designs. The problem isn't with old
programmers. The problem is with old parallel computing experts and the way they've tried to create a
pool of capable parallel programmers.
And that's where this book comes in. We want to capture the essence of how expert parallel
programmers think about parallel algorithms and communicate that essential understanding in a way
professional programmers can readily master. The technology we've adopted to accomplish this task is
a pattern language. We made this choice not because we started the project as devotees of design
patterns looking for a new field to conquer, but because patterns have been shown to work in ways that
would be applicable in parallel programming. For example, patterns have been very effective in the
field of object-oriented design. They have provided a common language experts can use to talk about
the elements of design and have been extremely effective at helping programmers master object-
oriented design.
This book contains our pattern language for parallel programming. The book opens with a couple of
chapters to introduce the key concepts in parallel computing. These chapters focus on the parallel
computing concepts and jargon used in the pattern language as opposed to being an exhaustive
introduction to the field.
The pattern language itself is presented in four parts corresponding to the four phases of creating a
parallel program:
* Finding Concurrency. The programmer works in the problem domain to identify the available
concurrency and expose it for use in the algorithm design.
* Algorithm Structure. The programmer works with high-level structures for organizing a parallel
algorithm.
* Supporting Structures. We shift from algorithms to source code and consider how the parallel
program will be organized and the techniques used to manage shared data.
* Implementation Mechanisms. The final step is to look at specific software constructs for
implementing a parallel program.
The patterns making up these four design spaces are tightly linked. You start at the top (Finding
Concurrency), work through the patterns, and by the time you get to the bottom (Implementation
Mechanisms), you will have a detailed design for your parallel program.
If the goal is a parallel program, however, you need more than just a parallel algorithm. You also need
a programming environment and a notation for expressing the concurrency within the program's
source code. Programmers used to be confronted by a large and confusing array of parallel
programming environments. Fortunately, over the years the parallel programming community has
converged around three programming environments.
* OpenMP. A simple language extension to C, C++, or Fortran to write parallel programs for
shared-memory computers (a minimal example appears just after this list).
* MPI. A message-passing library used on clusters and other distributed-memory computers.
* Java. An object-oriented programming language with language features supporting parallel
programming on shared-memory computers and standard class libraries supporting distributed
computing.
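To give a flavor of the simplest of these notations, here is a minimal OpenMP sketch in C. It is an
illustration only, not one of the book's examples: a single compiler directive asks that the iterations
of a loop be divided among the threads of a shared-memory machine. Compile with an OpenMP-aware
compiler, for example gcc -fopenmp.

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N 1000000

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));

    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

    /* The directive asks the compiler to divide the loop iterations
       among the threads of a shared-memory machine. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[N-1] = %.0f, computed by up to %d threads\n",
           c[N - 1], omp_get_max_threads());

    free(a); free(b); free(c);
    return 0;
}

Removing the directive, or compiling without OpenMP support, leaves an ordinary sequential program,
which is one reason OpenMP is attractive for adding parallelism to existing code a little at a time.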
Many readers will already be familiar with one or more of these programming notations, but for
readers completely new to parallel computing, we've included a discussion of these programming
environments in the appendixes.
In closing, we have been working for many years on this pattern language. Presenting it as a book so
people can start using it is an exciting development for us. But we don't see this as the end of this
effort. We expect that others will have their own ideas about new and better patterns for parallel
programming. We've assuredly missed some important features that really belong in this pattern
language. We embrace change and look forward to engaging with the larger parallel computing
community to iterate on this language. Over time, we'll update and improve the pattern language until
it truly represents the consensus view of the parallel programming community. Then our real work
will begin—using the pattern language to guide the creation of better parallel programming
environments and helping people to use these technologies to write parallel software. We won't rest
until the day sequential software is rare.
ACKNOWLEDGMENTS
We started working together on this pattern language in 1998. It's been a long and twisted road,
starting with a vague idea about a new way to think about parallel algorithms and finishing with this
book. We couldn't have done this without a great deal of help.
Mani Chandy, who thought we would make a good team, introduced Tim to Beverly and Berna. The
National Science Foundation, Intel Corp., and Trinity University have supported this research at
various times over the years. Help with the patterns themselves came from the people at the Pattern
Languages of Programs (PLoP) workshops held in Illinois each summer. The format of these
workshops and the resulting review process was challenging and sometimes difficult, but without
them we would have never finished this pattern language. We would also like to thank the reviewers
who carefully read early manuscripts and pointed out countless errors and ways to improve the book.
Finally, we thank our families. Writing a book is hard on the authors, but that is to be expected. What
we didn't fully appreciate was how hard it would be on our families. We are grateful to Beverly's
family (Daniel and Steve), Tim's family (Noah, August, and Martha), and Berna's family (Billie) for
the sacrifices they've made to support this project.
— Tim Mattson, Olympia, Washington, April 2004
— Beverly Sanders, Gainesville, Florida, April 2004
— Berna Massingill, San Antonio, Texas, April 2004
Chapter 1. A Pattern Language for Parallel Programming
Section 1.1. INTRODUCTION
Section 1.2. PARALLEL PROGRAMMING
Section 1.3. DESIGN PATTERNS AND PATTERN LANGUAGES
Section 1.4. A PATTERN LANGUAGE FOR PARALLEL PROGRAMMING
Chapter 2. Background and Jargon of Parallel Computing
Section 2.1. CONCURRENCY IN PARALLEL PROGRAMS VERSUS OPERATING SYSTEMS
Section 2.2. PARALLEL ARCHITECTURES: A BRIEF INTRODUCTION
Section 2.3. PARALLEL PROGRAMMING ENVIRONMENTS
Section 2.4. THE JARGON OF PARALLEL COMPUTING
Section 2.5. A QUANTITATIVE LOOK AT PARALLEL COMPUTATION
Section 2.6. COMMUNICATION
Section 2.7. SUMMARY
Chapter 3. The Finding Concurrency Design Space
Section 3.1. ABOUT THE DESIGN SPACE
Section 3.2. THE TASK DECOMPOSITION PATTERN
Section 3.3. THE DATA DECOMPOSITION PATTERN
Section 3.4. THE GROUP TASKS PATTERN
Section 3.5. THE ORDER TASKS PATTERN
Section 3.6. THE DATA SHARING PATTERN
Section 3.7. THE DESIGN EVALUATION PATTERN
Section 3.8. SUMMARY
Chapter 4. The Algorithm Structure Design Space
Section 4.1. INTRODUCTION
Section 4.2. CHOOSING AN ALGORITHM STRUCTURE PATTERN
Section 4.3. EXAMPLES
Section 4.4. THE TASK PARALLELISM PATTERN
Section 4.5. THE DIVIDE AND CONQUER PATTERN
Section 4.6. THE GEOMETRIC DECOMPOSITION PATTERN
Section 4.7. THE RECURSIVE DATA PATTERN
Section 4.8. THE PIPELINE PATTERN
Section 4.9. THE EVENT-BASED COORDINATION PATTERN
Chapter 5. The Supporting Structures Design Space
Section 5.1. INTRODUCTION
Section 5.2. FORCES
Section 5.3. CHOOSING THE PATTERNS
Section 5.4. THE SPMD PATTERN
Section 5.5. THE MASTER/WORKER PATTERN
Section 5.6. THE LOOP PARALLELISM PATTERN
Section 5.7. THE FORK/JOIN PATTERN
Section 5.8. THE SHARED DATA PATTERN
Section 5.9. THE SHARED QUEUE PATTERN
Section 5.10. THE DISTRIBUTED ARRAY PATTERN
Section 5.11. OTHER SUPPORTING STRUCTURES
Chapter 6. The Implementation Mechanisms Design Space
Section 6.1. OVERVIEW
Section 6.2. UE MANAGEMENT
Section 6.3. SYNCHRONIZATION
Section 6.4. COMMUNICATION
Endnotes
Appendix A: A Brief Introduction to OpenMP
Section A.1. CORE CONCEPTS
Section A.2. STRUCTURED BLOCKS AND DIRECTIVE FORMATS
Section A.3. WORKSHARING
Section A.4. DATA ENVIRONMENT CLAUSES
Section A.5. THE OpenMP RUNTIME LIBRARY
Section A.6. SYNCHRONIZATION
Section A.7. THE SCHEDULE CLAUSE
Section A.8. THE REST OF THE LANGUAGE
Appendix B: A Brief Introduction to MPI
Section B.1. CONCEPTS
Section B.2. GETTING STARTED
Section B.3. BASIC POINT-TO-POINT MESSAGE PASSING
Section B.4. COLLECTIVE OPERATIONS
Section B.5. ADVANCED POINT-TO-POINT MESSAGE PASSING
Section B.6. MPI AND FORTRAN
Section B.7. CONCLUSION
Appendix C: A Brief Introduction to Concurrent Programming in Java
Section C.1. CREATING THREADS
Section C.2. ATOMICITY, MEMORY SYNCHRONIZATION, AND THE volatile KEYWORD
Section C.3. SYNCHRONIZED BLOCKS
Section C.4. WAIT AND NOTIFY
Section C.5. LOCKS
Section C.6. OTHER SYNCHRONIZATION MECHANISMS AND SHARED DATA STRUCTURES
Section C.7. INTERRUPTS
Glossary
Bibliography
About the Authors
Index
Chapter 1. A Pattern Language for Parallel Programming
1.1 INTRODUCTION
1.2 PARALLEL PROGRAMMING
1.3 DESIGN PATTERNS AND PATTERN LANGUAGES
1.4 A PATTERN LANGUAGE FOR PARALLEL PROGRAMMING
1.1. INTRODUCTION
Computers are used to model physical systems in many fields of science, medicine, and engineering.
Modelers, whether trying to predict the weather or render a scene in the next blockbuster movie, can
usually use whatever computing power is available to make ever more detailed simulations. Vast
amounts of data, whether customer shopping patterns, telemetry data from space, or DNA sequences,
require analysis. To deliver the required power, computer designers combine multiple processing
elements into a single larger system. These so-called parallel computers run multiple tasks
simultaneously and solve bigger problems in less time.
Traditionally, parallel computers were rare and available for only the most critical problems. Since the
mid-1990s, however, the availability of parallel computers has changed dramatically. With
multithreading support built into the latest microprocessors and the emergence of multiple processor
cores on a single silicon die, parallel computers are becoming ubiquitous. Now, almost every
university computer science department has at least one parallel computer. Virtually all oil companies,
automobile manufacturers, drug development companies, and special effects studios use parallel
computing.
For example, in computer animation, rendering is the step where information from the animation files,
such as lighting, textures, and shading, is applied to 3D models to generate the 2D image that makes
up a frame of the film. Parallel computing is essential to generate the needed number of frames (24
per second) for a feature-length film. Toy Story, the first completely computer-generated feature-
length film, released by Pixar in 1995, was processed on a "renderfarm" consisting of 100 dual-
processor machines [PS00]. By 1999, for Toy Story 2, Pixar was using a 1,400-processor system with
the improvement in processing power fully reflected in the improved details in textures, clothing, and
atmospheric effects. Monsters, Inc. (2001) used a system of 250 enterprise servers each containing 14
processors for a total of 3,500 processors. It is interesting that the amount of time required to generate
a frame has remained relatively constant—as computing power (both the number of processors and
the speed of each processor) has increased, it has been exploited to improve the quality of the
animation.
The biological sciences have taken dramatic leaps forward with the availability of DNA sequence
information from a variety of organisms, including humans. One approach to sequencing, championed
and used with success by Celera Corp., is called the whole genome shotgun algorithm. The idea is to
break the genome into small segments, experimentally determine the DNA sequences of the segments,
and then use a computer to construct the entire sequence from the segments by finding overlapping
areas. The computing facilities used by Celera to sequence the human genome included 150 four-way
servers plus a server with 16 processors and 64GB of memory. The calculation involved 500 million
trillion base-to-base comparisons [Ein00].
The SETI@home project [SET, ACK+02] provides a fascinating example of the power of parallel
computing. The project seeks evidence of extraterrestrial intelligence by scanning the sky with the
world's largest radio telescope, the Arecibo Telescope in Puerto Rico. The collected data is then
analyzed for candidate signals that might indicate an intelligent source. The computational task is
beyond even the largest supercomputer, and certainly beyond the capabilities of the facilities available
to the SETI@home project. The problem is solved with public resource computing, which turns PCs
around the world into a huge parallel computer connected by the Internet. Data is broken up into work
units and distributed over the Internet to client computers whose owners donate spare computing time
to support the project. Each client periodically connects with the SETI@home server, downloads the
data to analyze, and then sends the results back to the server. The client program is typically
implemented as a screen saver so that it will devote CPU cycles to the SETI problem only when the
computer is otherwise idle. A work unit currently requires an average of between seven and eight
hours of CPU time on a client. More than 205,000,000 work units have been processed since the start
of the project. More recently, similar technology to that demonstrated by SETI@home has been used
for a variety of public resource computing projects as well as internal projects within large companies
utilizing their idle PCs to solve problems ranging from drug screening to chip design validation.
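The structure of such a client is simple. The toy sketch below shows the fetch-analyze-report cycle
described above; every name, type, and number in it is a hypothetical stand-in, not the actual
SETI@home code.

#include <stdio.h>

/* Toy sketch of the fetch/analyze/report cycle described above.  All names,
   types, and numbers here are hypothetical stand-ins, not the real client. */

typedef struct { int id; double samples[4]; } WorkUnit;
typedef struct { int id; double score;      } Result;

static WorkUnit fetch_work_unit(int id)        /* stands in for the download */
{
    WorkUnit wu = { id, { 0.1 * id, 0.2, 0.3, 0.4 } };
    return wu;
}

static Result analyze(const WorkUnit *wu)      /* stands in for signal analysis */
{
    Result r = { wu->id, 0.0 };
    for (int i = 0; i < 4; i++)
        r.score += wu->samples[i];
    return r;
}

static void report_result(const Result *r)     /* stands in for the upload */
{
    printf("work unit %d: candidate score %.2f\n", r->id, r->score);
}

int main(void)
{
    for (int id = 0; id < 3; id++) {           /* the real client repeats indefinitely */
        WorkUnit wu = fetch_work_unit(id);
        Result r = analyze(&wu);
        report_result(&r);
    }
    return 0;
}

A real client also checks that the machine is otherwise idle, handles network failures, and loops
indefinitely rather than over a fixed number of work units.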
Although computing in less time is beneficial, and may enable problems to be solved that couldn't be
solved otherwise, it comes at a cost. Writing software to run on parallel computers can be difficult. Only a
small minority of programmers have experience with parallel programming. If all these computers
designed to exploit parallelism are going to achieve their potential, more programmers need to learn
how to write parallel programs.
This book addresses this need by showing competent programmers of sequential machines how to
design programs that can run on parallel computers. Although many excellent books show how to use
particular parallel programming environments, this book is unique in that it focuses on how to think
about and design parallel algorithms. To accomplish this goal, we will be using the concept of a
pattern language. This highly structured representation of expert design experience has been heavily
used in the object-oriented design community.
The book opens with two introductory chapters. The first gives an overview of the parallel computing
landscape and background needed to understand and use the pattern language. This is followed by a
more detailed chapter in which we lay out the basic concepts and jargon used by parallel
programmers. The book then moves into the pattern language itself.
1.2. PARALLEL PROGRAMMING
The key to parallel computing is exploitable concurrency. Concurrency exists in a computational
problem when the problem can be decomposed into subproblems that can safely execute at the same
time. To be of any use, however, it must be possible to structure the code to expose and later exploit
the concurrency and permit the subproblems to actually run concurrently; that is, the concurrency
must be exploitable.
Most large computational problems contain exploitable concurrency. A programmer works with
exploitable concurrency by creating a parallel algorithm and implementing the algorithm using a
parallel programming environment. When the resulting parallel program is run on a system with
multiple processors, the amount of time we have to wait for the results of the computation is reduced.
In addition, multiple processors may allow larger problems to be solved than could be done on a
single-processor system.
As a simple example, suppose part of a computation involves computing the summation of a large set
of values. If multiple processors are available, instead of adding the values together sequentially, the
set can be partitioned and the summations of the subsets computed simultaneously, each on a different
processor. The partial sums are then combined to get the final answer. Thus, using multiple processors
to compute in parallel may allow us to obtain a solution sooner. Also, if each processor has its own
memory, partitioning the data between the processors may allow larger problems to be handled than
could be handled on a single processor.
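To make the example concrete, here is a minimal sketch of that summation, written in C with OpenMP
and assuming a shared-memory machine (it is an illustration, not a program from the text). Each thread
forms a partial sum over its share of the values, and the partial sums are then combined to produce the
final answer.

#include <stdio.h>
#include <omp.h>

#define N 1000000

static double x[N];

/* Each thread sums its own share of the values; the partial sums are
   then combined into the final answer inside a critical section. */
double parallel_sum(const double *v, int n)
{
    double total = 0.0;

    #pragma omp parallel
    {
        double partial = 0.0;        /* one private partial sum per thread */

        #pragma omp for
        for (int i = 0; i < n; i++)
            partial += v[i];

        #pragma omp critical         /* combine the partial sums one at a time */
        total += partial;
    }
    return total;
}

int main(void)
{
    for (int i = 0; i < N; i++) x[i] = 1.0;
    printf("sum = %.1f\n", parallel_sum(x, N));   /* expect 1000000.0 */
    return 0;
}

The same result can be obtained more compactly with OpenMP's reduction(+:total) clause; the explicit
partial sums are shown here only because they match the partition-and-combine description above.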
This simple example shows the essence of parallel computing. The goal is to use multiple processors
to solve problems in less time and/or to solve bigger problems than would be possible on a single
processor. The programmer's task is to identify the concurrency in the problem, structure the
algorithm so that this concurrency can be exploited, and then implement the solution using a suitable
programming environment. The final step is to solve the problem by executing the code on a parallel
system.
Parallel programming presents unique challenges. Often, the concurrent tasks making up the problem
include dependencies that must be identified and correctly managed. The order in which the tasks
execute may change the answers of the computations in nondeterministic ways. For example, in the
parallel summation described earlier, a partial sum cannot be combined with others until its own
computation has completed. The algorithm imposes a partial order on the tasks (that is, they must
complete before the sums can be combined). More subtly, the numerical value of the summations may
change slightly depending on the order of the operations within the sums because floating-point
arithmetic is not associative.
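A tiny example makes the point. The following sketch (an illustration, not from the text) adds the same
three values in two different groupings and prints two different answers, because the small terms are
rounded away when they are added to the large value one at a time:

#include <stdio.h>

int main(void)
{
    double big = 1.0e16, small = 1.0;

    /* Same three values, two groupings, two different results. */
    double combine_left  = (big + small) + small;   /* small terms rounded away */
    double combine_right = big + (small + small);   /* small terms accumulate first */

    printf("%.1f\n%.1f\n", combine_left, combine_right);
    return 0;
}

When partial sums are combined in whatever order the tasks happen to finish, the same effect can make
a parallel summation differ from run to run in its last few digits.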