PART D
Grid applications
35
Application overview for the book: Grid computing – making the global infrastructure a reality
Fran Berman,1,2 Geoffrey Fox,3 and Tony Hey4,5
1 San Diego Supercomputer Center, and Department of Computer Science and Engineering, University of California, San Diego, California, United States, 2 Indiana University, Bloomington, Indiana, United States, 3 EPSRC, Swindon, United Kingdom, 4 University of Southampton, Southampton, United Kingdom
35.1 INTRODUCTION
This book, Grid Computing: Making the Global Infrastructure a Reality, is divided into
four parts. This short chapter introduces the last part, Part D, on applications for the Grid.
All the chapters in the book contain material relevant for Grid applications, but in this
part the focus is the applications themselves. Some of the previous chapters also cover
applications as part of an overview or to illustrate a technological issue.
Rather than merely reviewing a list of applications in this introduction, we abstract
some general principles about the features of different types of applications well-suited
for the Grid. Note that in addition to Chapters 37 to 43 devoted to applications in Part D,
applications are found in Chapters 1, 2, 11, 12, 16, 23, 24, 28, 30 and 33 from Parts A,
B and C and we abstract this material as well here.
Grid Computing – Making the Global Infrastructure a Reality. Edited by F. Berman, A. Hey and G. Fox
© 2003 John Wiley & Sons, Ltd ISBN: 0-470-85319-0
35.2 GRID APPLICATIONS
Exactly what types of applications are suitable for Grid computing is still an active
research area but some initial discussion is provided in Chapters 1 (Section 1.5), 2 (intro-
duction), and 23 (Section 23.4). One can identify three broad problem architectures for
which Grids have been successful:
1. Megacomputing problems: These correspond to problems that can be divided up into
large numbers of independent parts and are often termed pleasingly or embarrassingly
parallel in the parallel computing domain [1]. This area is addressed in detail with
several examples in Chapters 1, 11, 12, 24 and 33. Further, the data analysis part of
particle physics in Chapter 39 and many biological computations (Chapters 40 and
41) fall into this class. Chapter 12 discusses in detail bioinformatics, molecular mod-
eling and finance applications of this architecture. This class also covers ‘parameter
sweep’ applications, an important and popular application paradigm for the Grid. Such
applications can use tools such as Condor and AppLeS Parameter Sweep Template
(APST), discussed in Chapters 11 and 33.
2. Mega and seamless access problems: These correspond to use of Grids to integrate
the access and use of multiple data and compute resources. This underlies all the
‘data deluge’ applications such as those of Chapters 36 and 38 to 43. The scientific
collections described in Chapters 16 and 36 are of this class.
3. Loosely coupled nets: These correspond to functionally decomposed problems (such
as get data, compute, visualize or simulate ocean and simulate atmosphere) where
synchronized (possibly pipelined) operation on the Grid is possible. See discussion of
category 6 below.
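The 'parameter sweep' pattern of architecture 1 can be illustrated with a minimal sketch. This is not how Condor or APST work internally; it only shows the defining property that every task is independent of the others. The function `simulate`, its parameters and the use of a local thread pool in place of Grid nodes are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def simulate(params):
    """Hypothetical stand-in for one independent simulation run."""
    temperature, pressure = params
    return temperature * pressure


def parameter_sweep(temperatures, pressures, max_workers=4):
    """Run simulate() once per parameter combination.

    No task reads another task's result, so the work can be farmed out
    in any order. A local thread pool stands in for remote Grid nodes.
    """
    grid = list(product(temperatures, pressures))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(simulate, grid))  # map preserves input order
    return dict(zip(grid, results))


if __name__ == "__main__":
    for point, value in sorted(parameter_sweep([100, 200], [1, 2]).items()):
        print(point, value)
```

Because the tasks share no state, the same structure carries over when each call to `simulate` is replaced by a job submitted to a remote resource; only the scheduling layer changes.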
In Chapter 23, a rather different view is taken of Grid problem classification. This
chapter uses the motivation or style of computing involved in the Grid to identify four
categories of Grid applications:
4. Community centric: The Grid is used to knit organizations together for collaboration
and would often correspond to problem architecture 2 above. Education in Chapter 43
is clearly of this class. The virtual observatory in Chapter 38 has features of this as it
integrates observations from different instruments.
5. Data-centric: This corresponds closely to problem architecture 2 above and reflects
the ‘data deluge’.
6. Compute-centric: This case is limited in applicability owing to the high latency of Grid
connections, but certain loosely coupled applications and seamless access to multiple
back-end hosts (architectures 3 and 2 above) make the Grid attractive for this category.
There have been several problem-solving environments (see Chapters 24, 28 and 30
for example) built using Grid portals that support the many different loosely coupled
stages of scientific computation with linked data and compute modules.
7. Interaction-centric: This corresponds to problems requiring real-time response and is
not an area where there is much experience so far except perhaps in the real-time mili-
tary simulations illustrated by Synthetic Forces Express [2] discussed in Chapter 1.
Further applications involving data-compute-visualization pipelines (Section 1.1 of
Chapter 1) are of this type as are the data navigation examples in Chapter 37. Control
of scientific instruments (Chapters 1, 28 and 37) also has this flavor.
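The functionally decomposed, pipelined style mentioned above (get data, compute, visualize) can be sketched as a chain of stages, each consuming the previous stage's output item by item. This is only an illustration under stated assumptions: the stage names `get_data`, `compute` and `visualize` and their bodies are hypothetical, and in-process Python generators stand in for what would be separate, loosely coupled services on a Grid.

```python
def get_data(n):
    """Hypothetical data source: yields n samples one at a time."""
    for i in range(n):
        yield float(i)


def compute(samples):
    """Hypothetical compute stage: squares each sample as it arrives."""
    for x in samples:
        yield x * x


def visualize(values):
    """Hypothetical visualization stage: renders each value as a text bar."""
    return [f"{v:.1f} {'#' * int(v)}" for v in values]


def run_pipeline(n):
    """Connect the stages; each item flows through without waiting for the full data set."""
    return visualize(compute(get_data(n)))


if __name__ == "__main__":
    for line in run_pipeline(4):
        print(line)
```

The stages communicate only through the stream of items passed between them, which is what makes this kind of decomposition tolerant of the high latency of Grid connections.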
We stress that the above categories overlap and our examples given above are not
meant as precise classifications. For example, seamless access and community integration
(Classes 2 and 4 above) are often supported in conjunction with other cases.
Chapter 1 describes several applications including a general discussion of the e-Science
applications. These include synchrotron data analysis, astronomical virtual observatory,
megacomputing, aircraft design and real-time engine data analysis, satellite operation,
particle physics, combinatorial chemistry and bioinformatics. Chapter 37 gives a historical
perspective with the 1992 vision of this area and gives visualization pipeline, instrument
control and data navigation examples.
As in categories 2 and 5 above, data-intensive applications are expected to be of
growing importance in the next decade as new instruments come online in a variety of
fields. These include basic research in high-energy physics and astronomy, which are
perhaps leading the use of the Grid for coping with the so-called data deluge described
in Chapter 36. This chapter also describes data-centric applications in bioinformatics,
environmental science, medicine and health, social sciences and digital libraries. The
virtual observatory of Chapter 38 describes a new type of astronomy using the Grid to
analyze the data from multiple instruments observing at different wavelengths. High-
energy physics described in Chapter 39 is preparing for the wealth of data (100 petabytes
by 2007) expected from the new Large Hadron Collider at CERN with a carefully designed
distributed computer and data architecture.
Biology and chemistry, as we have discussed, may actually stress the Grid even more
with the growing number of commodity instruments, with ultimately, for instance, per-
sonal gene measurements enabling new approaches to healthcare. Aspects of this grand
biology/chemistry vision are described in Chapters 40, 41 and 42. Chapter 40 describes a
variety of biology applications involving both distributed gene sequencing and parameter
sweep style simulations. Chapter 41 describes a different important ‘e-health-care’ prob-
lem – using the Grid to manage a federated database of distributed mammograms. The
use of metadata and the implications for Grid-enabled medicine are stressed. Chapter 42
describes the importance of the Grid for combinatorial chemistry – with new instruments
producing and analyzing compounds in parallel. Here the Grid will both manage individual
laboratories and enable their world- or company-wide integration.
Early Grid applications are naturally focused primarily on academic research but the
Grid will soon be valuable in supporting enterprise IT systems and make an impact
in the broader community. Chapter 43 describes the Grid supporting the collaboration
between teachers, students and the public – a community Grid for education and outreach.
It introduces this vision with a general discussion of the impact of web services on
enterprise computing.
All of these applications indicate both the current and future promise of the Grid. As
Grid software becomes more robust and sophisticated, applications will be able to better
utilize the Grid for adaptive applications, real-time data analysis and interaction, more
tightly coupled applications and ‘poly-applications’ that can adapt algorithm structure of
individual application components to available Grid resources. Ultimately, applications
are key to the perceived success or failure of Grid technologies and are critical to drive
technology forward. In Part D of this book, we describe current applications visions
enabled by the Grid.
REFERENCES
1. Fox, G. (2002) Chapter 4, in Dongarra, J., Foster, I., Fox, G., Gropp, W., Kennedy, K.,
Torczon, L. and White, A. (eds) The Sourcebook of Parallel Computing. San Francisco: Morgan
Kaufmann Publishers, ISBN 1-55860-871-0.
2. Synthetic Forces Express, http://www.cacr.caltech.edu/SFExpress/.