40 The new biology and the Grid
Kim Baldridge and Philip E. Bourne
University of California, San Diego, California, United States
40.1 INTRODUCTION
Computational biology is undergoing a revolution from a traditionally compute-intensive
science conducted by individuals and small research groups to a high-throughput, data-
driven science conducted by teams working in both academia and industry. It is this new
biology as a data-driven science in the era of Grid Computing that is the subject of this
chapter. This chapter is written from the perspective of bioinformatics specialists who
seek to fully capitalize on the promise of the Grid and who are working with computer
scientists and technologists developing biological applications for the Grid.
To understand what has been developed and what is proposed for utilizing the Grid
in the new biology era, it is useful to review the ‘first wave’ of computational biology
and computational chemistry application models, which we describe in the next section.
40.1.1 The first wave: compute-driven biology applications
The first computational models for biology and chemistry were developed for the clas-
sical von Neumann machine model, that is, for sequential, scalar processors. With the
emergence of parallel computing, biological applications were developed that could take
advantage of multiple processor architectures with distributed or shared memory and
locally located disk space to execute a collection of tasks. Applications that compute
molecular structure or electronic interactions of a protein fragment are examples of pro-
grams developed to take advantage of emerging computational technologies.
As distributed memory parallel architectures became more prevalent, computational
biologists became familiar with message passing library toolkits, first with Parallel Virtual
Machine (PVM) and more recently with Message Passing Interface (MPI). This enabled
biologists to take advantage of distributed computational models as a target for executing
applications whose structure is that of a pipelined set of stages, each dependent on the
completion of a previous stage. In pipelined applications, the computation involved for
each stage can be relatively independent from the others. For example, one computer may
perform molecular computations and immediately stream results to another computer for
visualization and analysis of the data generated. Another application scenario is that of
a computer used to collect data from an instrument (say a tilt series from an electron
microscope), which is then transferred to a supercomputer with a large shared memory
to perform a volumetric reconstruction, which is then rendered on yet a different high-
performance graphic engine. The distribution of the application pipeline is driven by the
number and the type of different tasks to be performed, the available architectures that
can support each task, and the I/O requirements between tasks.
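As a concrete, deliberately simplified illustration of such a pipeline, the sketch below strings three stages together with Python's multiprocessing queues on a single machine; the stage functions, frame counts, and data shapes are invented placeholders standing in for the instrument, reconstruction, and rendering machines described above.

```python
# Minimal sketch of a three-stage pipeline (acquire -> reconstruct -> render),
# using multiprocessing queues on one machine as a stand-in for the separate
# computers described in the text. Stage functions are illustrative placeholders.
from multiprocessing import Process, Queue

def acquire(out_q, n_frames=5):
    for i in range(n_frames):               # e.g. tilt-series frames from a microscope
        out_q.put({"frame": i, "data": [i] * 4})
    out_q.put(None)                          # sentinel: no more data

def reconstruct(in_q, out_q):
    while (item := in_q.get()) is not None:  # volumetric reconstruction stand-in
        item["volume"] = sum(item["data"])
        out_q.put(item)
    out_q.put(None)

def render(in_q):
    while (item := in_q.get()) is not None:  # rendering/visualization stand-in
        print("rendered frame", item["frame"], "volume", item["volume"])

if __name__ == "__main__":
    q1, q2 = Queue(), Queue()
    stages = [Process(target=acquire, args=(q1,)),
              Process(target=reconstruct, args=(q1, q2)),
              Process(target=render, args=(q2,))]
    for p in stages:
        p.start()
    for p in stages:
        p.join()
```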
While the need to support these applications continues to be very important in com-
putational biology, an emerging challenge is to support a new generation of applications
that analyze and/or process immense amounts of input/output data. In such applications,
the computation on each of a large number of data points can be relatively small, and
the ‘results’ of an application are provided by the analysis and often visualization of the
input/output data. For such applications, the challenge to infrastructure developers is to
provide a software environment that promotes application performance and can leverage
large numbers of computational resources for simultaneous data analysis and processing.
In this chapter, we consider these new applications that are forming the next wave of
computational biology.
40.1.2 The next wave: data-driven applications
The next wave of computational biology is characterized by high-throughput, high technol-
ogy, data-driven applications. The focus on genomics, exemplified by the human genome
project, will engender new science impacting a wide spectrum of areas from crop produc-
tion to personalized medicine. And this is just the beginning. The amount of raw DNA
sequence being deposited in the public databases doubles every 6 to 8 months. Bioinfor-
matics and Computational Biology have become a prime focus of academic and industrial
research. The core of this research is the analysis and synthesis of immense amounts of
data resulting in a new generation of applications that require information technology as
a vehicle for the next generation of advances.
Bioinformatics grew out of the human genome project in the early 1990s. The requests
for proposals for the physical and genetic mapping of specific chromosomes called for
developments in informatics and computer science, not just for data management but for
innovations in algorithms and application of those algorithms to synergistically improve
the rate and accuracy of the genetic mapping. A new generation of scientists was born,
for whom demand still significantly outweighs supply, and who have been brought
up on commodity hardware architectures and fast turnaround. This is a generation that
contributed significantly to the fast adoption of the Web by biologists, that wants instant
gratification, and that makes a strong distinction between wall clock and CPU
time. It makes no difference if an application runs 10 times as fast on a high-performance
architecture (minimizing execution time) if you have to wait 10 times as long for a
result by sitting in a long queue (maximizing turnaround time). In data-driven biology,
turnaround time is important in part because of sampling: a partial result is generally
useful while the full result is being generated. We will see specific examples of this
subsequently; for now, let us better grasp the scientific field we wish the Grid to support.
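The distinction between turnaround and total run time is easy to see in code. The sketch below is illustrative rather than taken from any application discussed here: it streams results back as soon as each small per-data-point computation finishes, so a useful hit can be acted on long before the full batch completes.

```python
# Sketch of the 'sampling' point: partial results are consumed as they arrive
# rather than after the whole batch finishes. The scoring function and the
# inputs are placeholders, not part of the original text.
from multiprocessing import Pool

def compare(item):
    # stand-in for a small per-data-point computation (e.g. one pairwise comparison)
    return item, item % 97

if __name__ == "__main__":
    work = range(1000)
    with Pool(processes=4) as pool:
        for n, (item, score) in enumerate(pool.imap_unordered(compare, work), 1):
            if score == 0:
                print(f"hit on item {item} after only {n} results")  # usable before the run ends
```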
The complexity of new biology applications reflects exponential growth rates at dif-
ferent levels of biological complexity. This is illustrated in Figure 40.1, which highlights
representative activities at each of these levels. While bioinformatics
is currently focusing on the molecular level, this is just the beginning. Molecules form
complexes that are located in different parts of the cell. Cells differentiate into different
types forming organs like the brain and liver.

Figure 40.1 From biological data comes knowledge and discovery. The figure charts biological
complexity (sequence, structure, assembly, sub-cellular, cellular, organ, higher life) and technology
(computing power, sequencing technology, data volume, number of people per Web site) against
the years 1990 to 2005, marking milestones such as the E. coli, yeast, C. elegans, and human
genomes, ESTs, gene chips, virus and ribosome structures, a model metabolic pathway of E. coli,
genetic circuits, neuronal and cardiac modeling, and brain mapping. Underlying the chart is the
progression from biological experiment to data, information, knowledge, and discovery through
collection, characterization, comparison, modeling, and inference.

Increasingly complex biological systems
generate increasingly large and complex biological data sets. If we do not solve the prob-
lems of processing the data at the level of the molecule, we will not solve problems of
higher order biological complexity.
Technology has catalyzed the development of the new biology as shown on the right
vertical axis of Figure 40.1. To date, Moore’s Law has at least allowed data processing
to keep approximate pace with the rate of data produced. Moreover, the cost of disks,
as well as the communication access revolution brought about by the Web, has enabled
the science to flourish. Today, it costs approximately 1% of what it did 10 to 15 years
ago to sequence one DNA base pair. With the current focus on genomics, data rates
are anticipated to far outpace Moore’s Law in the near future, making Grid and cluster
technologies more critical for the new biology to flourish.
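A rough back-of-the-envelope comparison makes the point; the five-year horizon and the exact doubling periods below are illustrative assumptions consistent with the figures quoted above.

```python
# Back-of-the-envelope comparison of the growth rates mentioned above
# (sequence data doubling roughly every 7 months vs. Moore's-law doubling
# roughly every 18 months). The 5-year horizon is an illustrative assumption.
months = 5 * 12
data_growth = 2 ** (months / 7)      # sequence data, doubling ~every 7 months
compute_growth = 2 ** (months / 18)  # processing power, doubling ~every 18 months
print(f"data: x{data_growth:.0f}, compute: x{compute_growth:.0f}, "
      f"gap: x{data_growth / compute_growth:.0f}")
# -> data grows roughly 380x while compute grows roughly 10x; the widening gap
#    is what pushes the analysis onto Grid and cluster resources.
```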
Now and in the near future, a critical class of new biology applications will involve
large-scale data production, data analysis and synthesis, and access through the Web
and/or advanced visualization tools to the processed data from high-performance databases
ideally federated with other types of data. In the next section, we illustrate in more detail
two new biology applications that fit this profile.
40.2 BIOINFORMATICS GRID APPLICATIONS TODAY
The two applications in this section require large-scale data analysis and management,
wide access through Web portals, and visualization. In the sections below, we describe
CEPAR (Combinatorial Extension in PARallel), a computational biology application, and
CHEMPORT, a computational chemistry framework.
40.2.1 Example 1: CEPAR and CEPort – 3D protein structure comparison
The human genome and the less advertised but very important 800 other genomes that
have been mapped encode genes. Those genes are the blueprints for the proteins that
are synthesized by reading the genes. It is the proteins that are considered the building
blocks of life. Proteins control all cellular processes and define us as a species and as
individuals. A step on the way to understanding protein function is protein structure – the
3D arrangement that recognizes other proteins, drugs, and so on. The growth in the
number and complexity of protein structures has undergone the same revolution as shown
in Figure 40.1, and can be observed in the evolution of the Protein Data Bank (PDB;
http://www.pdb.org), the international repository for protein structure data.
A key element to understanding the relationship between biological structure and func-
tion is to characterize all known protein structures. From such a characterization comes
the ability to infer the function of a protein once its structure has been
determined, since similar structure implies similar function. High-throughput structure
determination is now happening in what is known as structure genomics – a follow-on to
the human genome project in which one objective is to determine all protein structures
encoded by the genome of an organism. While a typical protein consists of 300 of one of
20 different amino acids – a total of 20^300 possibilities – more than all the atoms in the
universe – nature has performed her own reduction, both in the number of sequences and
in the number of protein structures as defined by discrete folds. The number of unique
protein folds is currently estimated at between 1000 and 10 000. These folds need to be
characterized and all new structures tested to see whether they conform to an existing
fold or represent a new fold. In short, characterization of how all proteins fold requires
that they be compared in 3D to each other in a pairwise fashion.
With approximately 30 000 protein chains currently available in the PDB, and with
each pair taking 30 s to compare on a typical desktop processor using any one of several
algorithms, we have a (30 000 × 30 000/2) × 30 s problem to compute all pairwise com-
parisons, that is, a total of approximately 428 CPU years on one processor. Using a combination of data
reduction (a pre-filtering step that permits one structure to represent a number of similar
structures), data organization optimization, and efficient scheduling, this computation was
performed on 1000 processors of the 1.7 Teraflop IBM Blue Horizon in a matter of days
using our Combinatorial Extension (CE) algorithm for pairwise structure comparison [1].
The result is a database of comparisons that is used by a worldwide community of users 5
to 10 000 times per month and has led to a number of interesting discoveries cited in over
80 research papers. The resulting database is maintained by the San Diego Supercom-
puter Center (SDSC) and is available at http://cl.sdsc.edu/ce.html [2]. The procedure to
compute and update this database as new structures become available is equally amenable
to Grid and cluster architectures, and a Web portal to permit users to submit their own
structures for comparison has been established.
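The cost estimate above is easy to reproduce; the short calculation below uses only the numbers quoted in the text.

```python
# Quick check of the cost estimate quoted above: ~30 000 chains, all-against-all,
# ~30 s per pairwise comparison on one processor.
chains = 30_000
seconds_per_pair = 30
pairs = chains * chains // 2                       # (30 000 * 30 000) / 2
cpu_years = pairs * seconds_per_pair / (3600 * 24 * 365)
print(f"{pairs:.2e} pairs -> {cpu_years:.0f} CPU years")   # ~428 CPU years
# Spread over ~1000 processors, and reduced further by the pre-filtering step,
# this drops to the order of days, as reported for Blue Horizon.
```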
In the next section, we describe the optimizations used to reduce execution time
and increase the applicability of the CE application to distributed and Grid resources. The
result is a new version of the CE algorithm we refer to as CEPAR. CEPAR distributes
each 3D comparison of two protein chains to a separate processor for analysis. Since
each pairwise comparison represents an independent calculation, this is an embarrassingly
parallel problem.
40.2.1.1 Optimizations of CEPAR
The optimization of CEPAR involves structuring CE as an efficient and scalable mas-
ter/worker algorithm. While initially implemented on 1024 processors of Blue Horizon,
the algorithm and optimization undertaken can execute equally well on a Grid platform.
The addition of resources available on demand through the Grid is an important next step
for problems requiring computational and data integration resources of this magnitude.
We have employed algorithmic and optimization strategies based on numerical studies on
CEPAR that have made a major impact on performance and scalability. To illustrate what
can be done in distributed environments, we discuss them here. The intent is to familiarize
the reader with one approach to optimizing a bioinformatics application for the Grid.
Using a trial version of the algorithm without optimization (Figure 40.2), performance
bottlenecks were identified. The algorithm was then redesigned and implemented with the
following optimizations:
1. The assignment packets (chunks of data to be worked on) are buffered in advance.
2. The master processor algorithm prioritizes incoming messages from workers since such
messages influence the course of further calculations.
3. Workers processing a data stream that no longer poses any interest (based on a result
from another worker) are halted. We call this early stopping.
4. Standard single-processor optimization techniques are applied to the master processor.

Figure 40.2 Scalability of CEPAR running on a sample database of 3422 data points (protein
chains), plotting speedup against the number of processors (up to 1024). The circles show the
performance of the trial version of the code. The triangles show the improved performance after
improvements 1, 2, and 4 were added to the trial version. The squares show the performance based
on timing obtained with an early stopping criterion (improvement 3). The diamonds show ideal
scaling.
With these optimizations, the scalability of CEPAR was significantly improved, as can
be seen from Figure 40.2.
The MPI imple-
mentation on the master processor is straightforward, but it was essential to use buffered
sends (or another means such as asynchronous sends) in order to avoid communica-
tion channel congestion. In summary, with 1024 processors the CEPAR algorithm out-
performs CE (no parallel optimization) by 30 to 1 and scales well. It is anticipated that
this scaling would continue even on a larger number of processors.
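The sketch below gives a flavor of this master/worker scheme using the Python MPI bindings (mpi4py); it is not the CEPAR source. The chunk size, the work() stand-in, and the 95% early-stopping rule are invented for illustration, and the advance buffering of assignments and prioritization of control messages described above are reduced to a single primed packet per worker.

```python
# Minimal mpi4py sketch of the master/worker scheme described above: the master
# hands out "assignment packets" (chunks of comparisons), collects results as
# they arrive, and stops early once enough of the database has been processed.
# Run with, for example:  mpirun -n 8 python cepar_sketch.py
from mpi4py import MPI

CHUNK = 50
EARLY_STOP_FRACTION = 0.95            # stop once 95% of the chunks are processed

def work(chunk):                       # stand-in for a batch of structure alignments
    return sum(chunk)

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

if rank == 0:                                          # master
    todo = [list(range(i, i + CHUNK)) for i in range(0, 5000, CHUNK)]
    total, processed, active = len(todo), 0, 0
    for dest in range(1, size):                        # prime each worker with one packet
        if todo:
            comm.send(todo.pop(), dest=dest)
            active += 1
        else:
            comm.send(None, dest=dest)                 # nothing to do: release the worker
    status = MPI.Status()
    while active:
        comm.recv(source=MPI.ANY_SOURCE, status=status)
        processed += 1
        stop_early = processed >= EARLY_STOP_FRACTION * total
        if todo and not stop_early:
            comm.send(todo.pop(), dest=status.Get_source())
        else:                                          # early stop, or no work left
            comm.send(None, dest=status.Get_source())
            active -= 1
    print(f"processed {processed}/{total} chunks before stopping")
else:                                                  # worker
    while (chunk := comm.recv(source=0)) is not None:
        comm.send(work(chunk), dest=0)                 # report result, request more work
```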
One final point concerns the end-process load imbalance. That is, a large number of
processors can remain idle while the final few do their job. We chose to handle this by
breaking the runs involving a large number of processors down into two separate runs. The
first run does most of the work and exits when the early stopping criterion is met. Then
the second run completes the task for the outliers using a small number of processors,
thus freeing these processors for other users. Ease of use of the software is maintained
through an automatic two-step job processing utility.
CEPAR has been developed to support our current research efforts on PDB structure
similarity analysis on the Grid. CEPAR software uses MPI, which is a universal standard
for interprocessor communication. Therefore, it is suitable for running in any parallel envi-
ronment that has an implementation of MPI, including PC clusters or Grids. There is no
dependence on the particular structural alignment algorithm or on the specific application.
The CEPAR design provides a framework that can be applied to other problems facing
computational biologists today where large numbers of data points need to be processed in
an embarrassingly parallel way. Pairwise sequence comparison as described subsequently
is an example. Researchers and programmers working on parallel software for these prob-
lems might find useful the information on the bottlenecks and optimization techniques
used to overcome them, as well as the general approach of using numerical studies to
aid algorithm design, reported briefly here and in more detail in Reference [1]. But
what of the naive user wishing to take advantage of high-performance Grid computing?
40.2.1.2 CEPAR portals
One feature of CEPAR is the ability to allow users worldwide to provide their own structures
for comparison and alignment against the existing database of structures. This service cur-
rently runs on a Sun Enterprise server as part of the CE Website (http://cl.sdsc.edu/ce.html)
outlined above. Each computation takes on average three hours of CPU time for a sin-
gle user request. On occasion, this service must be turned off as the number of requests for
structure comparisons far outweighs what can be processed on a Sun Enterprise server. To
overcome this shortage of compute resources, a Grid portal has been established to han-
dle this situation (https://gridport.npaci.edu/CE/) using SDSC’s GridPort technology [3].
The portal allows this computation to be done using additional resources when available.
Initial target compute resources for the portal are the IBM Blue Horizon, a 64-node Sun
Enterprise server and a Linux PC cluster of 64 nodes.
The GridPort Toolkit [3] is composed of a collection of modules that are used to
provide portal services running on a Web server and template Web pages needed to
implement a Web portal. The function of GridPort is simply to act as a Web frontend to
Globus services [4], which provide a virtualization layer for distributed resources. The
only requirements for adding a new high-performance computing (HPC) resource to the
portal are that the CE program is recompiled on the new architecture, and that Globus
services are running on it. Together, these technologies allowed the development of a
portal with the following capabilities:
• Secure and encrypted access for each user to his/her high-performance computing
(HPC) accounts, allowing submission, monitoring, and deletion of jobs and file man-
agement;
• Separation of client application (CE) and Web portal services onto separate servers;
• A single, common point of access to multiple heterogeneous compute resources;
• Availability of real-time status information on each compute machine;
• Easily adaptable (e.g. addition of newly available compute resources, modification of
user interfaces etc.).
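A minimal sketch of the job-submission flow behind such a portal is given below, assuming the Globus 2-era command-line clients (globus-url-copy, globus-job-submit, globus-job-status) and a valid Grid proxy. The host name, jobmanager contact string, and file paths are invented, and GridPort itself drives Globus through its own server-side modules rather than shell calls like these.

```python
# Hedged sketch of a portal-side job handler: stage the user's structure file to
# the chosen HPC resource and hand the CE run to the Globus jobmanager there.
import subprocess

RESOURCE = "horizon.sdsc.edu/jobmanager-loadleveler"    # hypothetical contact string
CE_BINARY = "/users/portal/ce/bin/CE"                   # hypothetical install path

def submit_ce_job(local_pdb, remote_workdir):
    # stage the input file with GridFTP (assumes a valid proxy from grid-proxy-init)
    subprocess.run(["globus-url-copy", f"file://{local_pdb}",
                    f"gsiftp://horizon.sdsc.edu{remote_workdir}/query.pdb"], check=True)
    # submit the comparison run; globus-job-submit returns a job contact URL
    contact = subprocess.run(
        ["globus-job-submit", RESOURCE, CE_BINARY, f"{remote_workdir}/query.pdb"],
        check=True, capture_output=True, text=True).stdout.strip()
    return contact                                      # poll later with job_state()

def job_state(contact):
    return subprocess.run(["globus-job-status", contact],
                          check=True, capture_output=True, text=True).stdout.strip()
```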
40.2.1.3 Work in progress
While the CE portal is operational, much work remains to be done. A high priority is
the implementation of a distributed file system for the databases, user input files, jobs
in progress, and results. A single shared, persistent file space is a key component of the
distributed abstract machine model on which GridPort was built. At present, files must
be explicitly transferred from the server to the compute machine and back again; while
this process is invisible to the user, from the point of view of portal development and
administration, it is not the most elegant solution to the problem. Furthermore, the present
system requires that the all-against-all database be stored locally on the file system of
each compute machine. This means that database updates must be carried out individually
on each machine.
These problems could be solved by placing all user files, along with the databases, in
a shared file system that is available to the Web server and all HPC machines. Adding
Storage Resource Broker (SRB) [5] capability to the portal would achieve this. Work
is presently ongoing on automatically creating an SRB collection for each registered
GridPort user; once this is complete, SRB will be added to the CE portal.
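The sketch below indicates, under the assumption that the SRB client ‘Scommands’ (Sinit, Sput, Sget) are installed on the portal and compute hosts, how a single shared copy of the all-against-all database could be published once and staged on demand; the collection path and file names are invented, and exact Scommand options vary between SRB releases.

```python
# Hedged sketch: keep one copy of the all-against-all database in a shared SRB
# collection instead of per-machine copies. Assumes the SRB Scommands are
# installed and a session has been opened with Sinit; names are illustrative.
import subprocess

COLLECTION = "/home/ceportal.sdsc/db"          # hypothetical SRB collection

def publish_database(local_db="pdb_allpairs.db"):
    # one upload from the machine where the update is built...
    subprocess.run(["Sput", local_db, COLLECTION], check=True)

def stage_database(dest="/scratch/ce/pdb_allpairs.db"):
    # ...then every compute machine stages the same copy on demand
    subprocess.run(["Sget", f"{COLLECTION}/pdb_allpairs.db", dest], check=True)
    return dest
```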
Another feature that could be added to the portal is the automatic selection of compute
machine. Once ‘real-world’ data on CPU allocation and turnaround time becomes avail-
able, it should be possible to write scripts that inspect the queue status on each compute
machine and allocate each new CE search to the machine expected to produce results in
the shortest time.
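Such a script could be as simple as the sketch below; the per-machine runtimes and queue depths are invented placeholders for the ‘real-world’ measurements mentioned above.

```python
# Sketch of the machine-selection idea: estimate turnaround on each resource from
# its current queue depth and a measured per-job runtime, then pick the minimum.
RESOURCES = {                       # hypothetical mean CE runtime per machine (s)
    "blue-horizon": 900,
    "sun-e10k": 10800,
    "linux-cluster": 3600,
}

def queued_jobs(machine):
    # placeholder: in practice this would come from the portal's queue monitors
    return {"blue-horizon": 40, "sun-e10k": 2, "linux-cluster": 7}[machine]

def pick_machine():
    def expected_turnaround(machine):
        return (queued_jobs(machine) + 1) * RESOURCES[machine]
    return min(RESOURCES, key=expected_turnaround)

print(pick_machine())               # -> the machine expected to finish soonest
```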
Note that the current job status monitoring system could also be improved. Work is
underway to add an event daemon to the GridPort system, such that compute machines
could notify the portal directly when, for example, searches are scheduled, started, and finished.
This would alleviate the reliance of the portal on intermittent inspection of the queue of
each HPC machine and provide near-instantaneous status updates. Such a system would
also allow the portal to be regularly updated with other information, such as warnings
when compute machines are about to go down for scheduled maintenance, broadcast
messages from HPC system administrators and so on.
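A minimal version of the receiving end of such an event daemon is sketched below; the port, URL scheme, and JSON payload are invented for illustration and are not GridPort's actual event interface.

```python
# Illustrative sketch of the event idea: compute machines POST job-state changes
# ("scheduled", "started", "finished", maintenance notices) to a small daemon on
# the portal host instead of the portal polling every queue.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

JOB_STATUS = {}                                         # job id -> latest reported state

class EventHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        event = json.loads(self.rfile.read(length))     # e.g. {"job": "ce-123", "state": "finished"}
        JOB_STATUS[event["job"]] = event["state"]
        self.send_response(204)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8042), EventHandler).serve_forever()
```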
40.2.2 Example 2: Chemport – a quantum mechanical biomedical framework
The successes of highly efficient, composite software for molecular structure and dynam-
ics prediction have driven the proliferation of computational tools and the development of
first-generation cheminformatics for data storage, analysis, mining, management, and pre-
sentations. However, these first-generation cheminformatics tools do not meet the needs
of today’s researchers. Massive volumes of data spanning the molecular scale are now
routinely being created, both experimentally and computationally, and are available to an
expanding scope of research. What is required to continue progress is
the integration of individual ‘pieces’ of the methodologies involved and the facilitation
of the computations in the most efficient manner possible.
Towards meeting these goals, applications and technology specialists have made con-
siderable progress towards solving some of the problems associated with integrating the
algorithms to span the molecular scale computationally and through the data, as well
as providing infrastructure to remove the complexity of logging on to an HPC system
in order to submit jobs, retrieve results, and supply ‘hooks’ into other codes. In this
section, we give an example of a framework that serves as a working environment for
researchers, which demonstrates new uses of the Grid for computational chemistry and
biochemistry studies.
Figure 40.3 The job submission page from the SDSC GAMESS portal.
Using GridPort technologies [3] as described for CEPAR, our efforts began with the
creation of a portal for carrying out chemistry computations for understanding various
details of structure and property for molecular systems – the General Atomic and Molecular
Electronic Structure System (GAMESS) [6] quantum chemistry portal (http://gridport.
npaci.edu/gamess). The GAMESS software has been deployed on a variety of com-
putational platforms, including both distributed and shared memory architectures. The job
submission page from the GAMESS portal is shown in Figure 40.3. The portal uses Grid
technologies such as the SDSC’s GridPort toolkit [3], the SDSC SRB [5] and Globus [7]
to assemble and monitor jobs, as well as store the results. One goal in the creation
of a new architecture is to improve the user experience by streamlining job creation
and management.
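From the user's point of view, a portal run reduces to an input deck plus a resource request. The sketch below shows a minimal GAMESS input for a single-point RHF/6-31G calculation on water together with a hypothetical submit_to_portal() helper; the helper, its defaults, and the job identifier it returns are invented, and only the general shape of the input deck follows GAMESS conventions.

```python
# Sketch of what a portal submission reduces to for the user: a GAMESS input deck
# plus a job request handed to the portal, which in turn drives GridPort, Globus,
# and the SRB as described in the text.
GAMESS_INPUT = """\
 $CONTRL SCFTYP=RHF RUNTYP=ENERGY $END
 $BASIS  GBASIS=N31 NGAUSS=6 $END
 $DATA
Water, RHF/6-31G single point submitted via the portal
C1
O   8.0   0.000   0.000   0.000
H   1.0   0.000   0.757   0.587
H   1.0   0.000  -0.757   0.587
 $END
"""

def submit_to_portal(input_deck, machine="blue-horizon", cpus=8):
    """Hypothetical portal call: hand the deck and resource request to the
    GAMESS portal, which assembles and monitors the underlying Globus job."""
    payload = {"code": "GAMESS", "input": input_deck, "machine": machine, "cpus": cpus}
    print("would submit to", payload["machine"], "on", payload["cpus"], "cpus")
    return "job-0001"                      # portal-side job identifier (illustrative)

job_id = submit_to_portal(GAMESS_INPUT)
```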
Portals for related molecular sequence, structure, and property software have been created using
similar frameworks, including the AMBER [8] classical molecular dynamics portal, the
EULER [9] genetic sequencing program, and the Adaptive Poisson-Boltzmann Solver
(APBS) [10] program for calculating electrostatic potential surfaces around biomolecules.
Each type of molecular computational software provides a level of understanding of
molecular structure that can be used for a larger-scale understanding of function.
What is needed next are strategies to link the molecular scale technologies through
the data and/or through novel algorithmic approaches. Both involve additional Grid
technologies.
Development of portal infrastructure has enabled considerable progress towards the
integration across scale from molecules to cells, linking the wealth of ligand-based
data present in the PDB, and detailed molecular scale quantum chemical structure and
property data.

Figure 40.4 Topology of the QM-PDB framework. A quantum mechanical compute engine (e.g.
GAMESS) and the QMView builder, launcher, and visualization tools (3D structure, vibrations,
molecular orbitals, electrostatic and solvent surfaces, reaction paths, and biopolymer properties)
are linked over the Internet with the Protein Data Bank (PDB) and a prototype Quantum Mechanical
Database (QM-DB) to form the Quantum Mechanical Biomedical Framework (QM-BF). Computational
modeling within the framework spans quantum mechanics (highly accurate, small molecules),
semi-empirical methods (moderate accuracy, moderate-size molecules), and empirical methods
(low accuracy, large complexes), alongside experimental characterization.

As such, accurate quantum mechanical data that has been hitherto under-
utilized will be made accessible to the nonexpert for integrated molecule to cell stud-
ies, including visualization and analysis, to support more detailed and more reliable studies
of molecular recognition and interaction than are currently possible. The resulting QM-PDB
framework (Figure 40.4) integrates robust computational
quantum chemistry software (e.g. GAMESS) with the associated visualization and analysis
toolkit QMView [11] and a prototype Quantum Mechanical (QM) database
facility, together with the PDB. Educational tools and models are also integrated into
the framework.
With the creation of Grid-based toolkits and associated environment spaces, researchers
can begin to ask more complex questions in a variety of contexts over a broader range
of scales, using seamless, transparent access to computing. As more realistic molecular com-
putations are enabled, extending well into the nanosecond and even microsecond range
at a faster turnaround time, and as problems that simply could not fit within the phys-
[...]