Grid Applications – Case Studies

LEARNING OBJECTIVES

In this chapter, we will introduce Grid applications that have applied the core technologies presented in the previous chapters. This chapter will help show:

• Where and how to apply Grid technologies.
• The problem domains that the Grid can be applied to.
• The benefits the Grid can bring to distributed applications.

CHAPTER OUTLINE

9.1 Introduction
9.2 GT3 Use Cases
9.3 OGSA-DAI Use Cases
9.4 Resource Management Case Studies
9.5 Grid Portal Use Cases
9.6 Workflow Management – Discovery Net Use Cases
9.7 Semantic Grid – myGrid Use Case
9.8 Autonomic Computing – AutoMate Use Case
9.9 Conclusions

The Grid: Core Technologies. Maozhen Li and Mark Baker. © 2005 John Wiley & Sons, Ltd.

9.1 INTRODUCTION

In the previous chapters, we have discussed and explored core Grid technologies such as security, OGSA/WSRF, portals, monitoring, resource management and scheduling, and workflow. We have also reviewed a number of projects related to each of these core technology areas. The projects reviewed in those chapters focus on Grid infrastructure rather than applications. In this chapter, we present some representative Grid applications that have applied, or are applying, the core technologies discussed earlier, and we describe their make-up and how they are being used to solve real-life problems.

The remainder of this chapter is organized as follows. In Section 9.2, we present GT3 applications in the areas of broadcasting, software reuse and bioinformatics. In Section 9.3, we present two projects that have employed OGSA-DAI. In Section 9.4, we present a Condor pool being used at University College London (UCL) and introduce three use cases of the Sun Grid Engine (SGE). In Section 9.5, we give two use cases of Grid portals. In Section 9.6, we present the use of workflow in the Discovery Net project for solving domain-related problems. In Section 9.7, we present one use case of the myGrid project. In Section 9.8, we present AutoMate for self-optimizing oil reservoir applications.

9.2 GT3 USE CASES

As highlighted in Chapter 2, OGSA has become the de facto standard for building service-oriented Grids, and currently most OGSA-based systems have been implemented with GT3. The OGSA standard introduces the concept of Grid services, which are Web services with three major extensions:

• Grid services can be transient services implemented as instances, which are created by persistent service factories.
• Grid services are stateful and are associated with service data elements.
• Notification can be associated with a Grid service, which can be used to notify clients of the events they are interested in.

Compared with systems implemented with distributed object technologies, such as Java RMI, CORBA and DCOM, service-oriented Grid systems can bring the following benefits:

• Services can be published, discovered and used by a wide user community by using WSDL and UDDI.
• Services can be created dynamically, used for a certain time and then destroyed (a minimal client-side sketch of this lifecycle is given at the end of this introduction).
• A service-oriented system is potentially more resilient than an object-oriented system because, if a service being used fails, an alternative service can be discovered and used automatically by searching a UDDI registry.

In this section, we present GT3 applications from two areas, one related to broadcasting large amounts of data and the other involving software reuse.
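To make the create–use–destroy lifecycle concrete, the fragment below sketches the client-side pattern in plain Java. It is a minimal illustration under assumed names only: ServiceFactory, RenderService, createInstance and the stub resolution logic are hypothetical placeholders, not the literal GT3/OGSI client API.

```java
// Minimal sketch of the transient-service lifecycle described above.
// All type and method names are hypothetical placeholders, not the GT3 API.
public final class TransientServiceClient {

    interface ServiceFactory {                 // persistent factory service
        RenderService createInstance();        // creates a transient instance
    }

    interface RenderService {                  // transient, stateful instance
        String renderFrame(String stackFile);  // operates on per-instance state
        String getServiceData(String name);    // read a service data element
        void destroy();                        // explicit end of the instance's lifetime
    }

    public static void main(String[] args) {
        ServiceFactory factory = locateFactory("http://example.org/services/RenderFactory");
        RenderService service = factory.createInstance();            // create
        try {
            String result = service.renderFrame("frame-001.stack");  // use
            System.out.println("Rendered: " + result);
            System.out.println("Status: " + service.getServiceData("jobStatus"));
        } finally {
            service.destroy();                                       // destroy when done
        }
    }

    private static ServiceFactory locateFactory(String uri) {
        // Stand-in for handle resolution: returns a local stub so the sketch runs.
        System.out.println("Resolving factory at " + uri);
        return () -> new RenderService() {
            public String renderFrame(String stackFile) { return stackFile + ".rendered"; }
            public String getServiceData(String name)   { return "DONE"; }
            public void destroy()                       { System.out.println("instance destroyed"); }
        };
    }
}
```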
9.2.1 GT3 in broadcasting

The multi-media broadcasting sector is a fast-evolving and reactive industry that presents many challenges to its infrastructure, including:

• The storage, management and distribution of large media files. As mentioned in Harmer et al. [1], a typical one-hour television programme requires about 25 GB of storage, and this could be 100–200 GB in production. In the UK, the BBC needs to distribute approximately 1 PB of material per year to satisfy its broadcasting needs. In addition, the volume of broadcast material is increasing every year.
• The management of broadcast content and metadata.
• The secure access of valuable broadcast content.
• A resilient infrastructure for high levels of quality of service.

A Grid infrastructure can meet these broadcasting challenges in a cost-effective manner. To this end, the BBC and the Belfast e-Science Centre (BeSC) have started the GridCast project [2], which involves the storage, management and secure distribution of media files. GT3 has been applied in the project to define broadcast services that integrate existing BBC broadcast scheduling, automation and planning tools in a Grid environment. A prototype has been built with 1-Gbps connections between the BBC Northern Ireland station at Belfast, the BBC R&D site in London and BeSC. Various GT3 services have been implemented for:

• the transport of files between sites;
• the management of replicas of stored files;
• the discovery of sites and services on GridCast.

A service-oriented design with GT3 fits the project well because the broadcast infrastructure is by its nature service oriented.

9.2.2 GT3 in software reuse

GT3 can be used to expose legacy codes that normally execute on a single computer as Grid services that can be published, discovered and reused in a distributed environment. In addition, the mechanisms provided in GT3 to dynamically create a service, use it for a certain amount of time and then destroy it are well suited to offering these programs as services for hire. In this section, we introduce two projects that wrap legacy codes as GT3-based Grid services.

9.2.2.1 GSLab

GSLab [3] is a toolkit for automatically wrapping legacy codes as GT3-based Grid services. The development of GSLab was motivated by the following observations:

• Manually wrapping legacy codes as GT3-based Grid services is a time-consuming and error-prone process.
• To wrap a legacy code as a Grid service, the legacy code developer also needs expertise in GT3, which may typically be beyond their current area of expertise.

Two components have been implemented in GSLab: GSFWrapper and GSFAccessor. GSFWrapper is used to automatically wrap legacy codes as Grid services and then deploy them in a container for service publication. GSFAccessor is used to discover Grid services and automatically generate clients to access the discovered services wrapped from legacy codes via GSFWrapper. To achieve high throughput when running the large number of tasks generated from a wrapped Grid service, SGE version 5.3 has been employed with GSLab to dispatch the generated tasks to an SGE cluster. The architecture of GSLab is shown in Figure 9.1.

Figure 9.1 The architecture of GSLab
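Conceptually, what GSFWrapper has to generate for each legacy code is glue that turns a service invocation into a command-line run of the original program. The plain-Java fragment below sketches that idea only; the class name, paths and argument layout are invented for illustration and are not GSLab's actual generated code.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: the kind of generated glue a wrapper must produce.
// Names and paths are hypothetical, not GSLab's generated code.
public class LegacyCodeRunner {

    private final String executable;   // e.g. /opt/legacy/q3D

    public LegacyCodeRunner(String executable) {
        this.executable = executable;
    }

    /** Build a command line from the service parameters and run the legacy code. */
    public int run(String inputFile, String outputFile) throws IOException, InterruptedException {
        List<String> cmd = new ArrayList<>();
        cmd.add(executable);
        cmd.add(inputFile);        // e.g. a stack file describing one frame
        cmd.add(outputFile);       // e.g. the rendered frame

        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.directory(new File("/tmp"));   // working directory for the job
        pb.redirectErrorStream(true);     // merge stderr into stdout for logging
        Process p = pb.start();
        return p.waitFor();               // exit code of the legacy code
    }
}
```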
The process of wrapping legacy codes as Grid services involves three stages: service publication, discovery and access.

• Publication: GSFWrapper takes a legacy code as input (step 1) and generates all the code needed to wrap the legacy application as a Grid Service Factory (GSF), then deploys the wrapped GSF into a Grid service container for publishing (step 2). Once the Grid service container is started, the wrapped GSF is automatically published in an SGE cluster environment and the jobs generated by the GSF will be scheduled on the SGE cluster.
• Discovery: A user browses the GSFs registered in a Grid service container via GSFAccessor (step 3) and discovers a GSF to use.
• Access: The user submits a job request to GSFAccessor via its GUI (step 4). Once GSFAccessor receives a job submission request, it automatically generates a Grid service client (step 5) to request the GSF (step 6) to create a Grid service instance (step 7). The Grid service client then accesses the created instance (step 8) to generate tasks in the form of SGE scripts, which are passed to an SGE server (step 9) that dispatches them to the SGE cluster. One SGE script is generated for each task in GSLab (an illustrative script is sketched below).

A case study, based on a legacy code called q3D [4], has been used to test GSLab. q3D is a C code for rendering 3D-like frames using either 2D geometric shapes or raster images as input primitives, which are organized in layers called cels. q3D has basic 3D features such as lighting, perspective projection and 3D movement, and it can handle hidden-surface elimination (cel intersection) when rendering cels. Figure 9.2 shows four frames taken from an animation rendered by q3D. In the animation, the balloon gradually approaches the camera and the background becomes darker. Each frame in the animation has two cels: a balloon cel and a lake cel. Each frame is rendered individually from an input file called a stack, which contains the complete description of the frame, such as the 3D locations of the cels involved. These stack files are generated by makeStacks, a C program developed for q3D, from a script that describes the animation, including the camera path, cel paths and lighting.

Figure 9.2 Four frames rendered by q3D using two cels

To wrap a legacy code as a Grid service, a user provides the parameters needed to execute the legacy code in the GSFWrapper GUI, as shown in Figure 9.3. GSFWrapper then automatically generates the related code and deploys the service into a GT3 Grid service container.

Figure 9.3 The GSFWrapper GUI

Once a service is published, the client uses the GSFAccessor GUI, as shown in Figure 9.4, to specify the parameters needed to execute the legacy code, e.g. the input data file name, the number of jobs to run and the output data file name. Once invoked, GSFAccessor generates the related code to call the Grid service that is deployed in an SGE-managed cluster and requests its services.

Figure 9.4 The GSFAccessor GUI
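As a concrete illustration of the per-task scripts mentioned above, an SGE submission script for rendering a single q3D frame might look roughly like the following. This is a hand-written sketch, not output captured from GSLab; the job name, paths and q3D invocation are assumptions.

```sh
#!/bin/sh
# Illustrative SGE script for a single GSLab-style task (one q3D frame).
# Job name, paths and the q3D command line are assumptions, not GSLab output.
#$ -N q3d_frame_0042          # job name
#$ -cwd                       # run in the submission directory
#$ -o logs/frame_0042.out     # standard output
#$ -e logs/frame_0042.err     # standard error

/opt/legacy/q3D stacks/frame_0042.stack frames/frame_0042.out
```

In a standard SGE installation, each such script is handed to the scheduler with qsub, which matches GSLab's approach of generating one script per task.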
Figure 9.5 shows the performance of GSLab when wrapping the q3D legacy code as a Grid service accessed on an SGE cluster with five nodes, each of which has a Pentium IV 2.6-GHz processor and 512 MB RAM, running Red Hat Linux.

Figure 9.5 The performance of GSLab: time to render frames (seconds) against the number of tasks (frames), comparing a single Gq3D instance running multiple tasks on the SGE cluster with sequential runs of the q3D legacy code on one computer

9.2.2.2 GEMLCA

The Grid Execution Management for Legacy Code Architecture (GEMLCA) [5] provides a solution for wrapping legacy codes as GT3-based Grid services without re-engineering the original codes. The wrapped GT3 services can then be deployed in a Condor-managed pool of computers. To use GEMLCA, a user needs to write a Legacy Code Interface Description (LCID) file, an XML file that describes how to execute the legacy code, e.g. the name of the legacy code and its main binary files, and the job manager to use (e.g. UNIX fork or Condor). Once deployed in GEMLCA, the legacy code becomes a Grid service that can be discovered and reused. Job submission is based on the GT3 Master Managed Job Factory Service (MMJFS), as described in Chapter 2. A legacy code called MadCity [6], a discrete time-based microscopic simulator for traffic simulations, has been wrapped as a GT3 service and its performance has been demonstrated as a GEMLCA application. The GEMLCA client has been integrated within the P-GRADE portal [7] to provide a GUI that supports workflow enactment. Each legacy code deployed in GEMLCA [5, 8] can be discovered in the GUI, and multiple published legacy codes can be composed to form a composite application.

9.2.3 A GT3 bioinformatics application

The Basic Local Alignment Search Tool (BLAST) [9] has been widely used in bioinformatics to compare a query sequence to a set of target sequences, with the intention of finding similar sequences in the target set. However, BLAST searches are computationally intensive. Bayer et al. [10] present a BLAST Grid service based on GT3 to speed up the search process, in which the BLAST service interacts with back-end ScotGRID [11] computing resources. ScotGRID is a three-site (LHC Tier-2) centre consisting of an IBM 200-CPU Monte Carlo production facility run by the Glasgow Particle Physics Experimental (PPE) group [12], an IBM 24-TByte data store and associated high-performance server run by EPCC [13], and a 100-CPU farm based at the Durham University Institute for Particle Physics Phenomenology (IPPP) [14]. Once deployed as a Grid service, the BLAST service can be accessed by a broad range of users.

9.3 OGSA-DAI USE CASES

A number of projects have adopted OGSA-DAI [15]. In this section, we introduce two of them: eDiaMoND and ODD-Genes.

9.3.1 eDiaMoND

The eDiaMoND project [16] is a collaborative project between Oxford University, IBM, Mirada Solutions Ltd and a group of clinical partners. It aims to build a Grid-based system to support the diagnosis of breast cancer by facilitating the breast-screening process. Traditional mammograms (film) and paper records will be replaced with digital data. Each mammogram image is about 32 MB in size, and about 250 TB of data will need to be stored every year. OGSA-DAI has been used in the eDiaMoND project to access these large, geographically distributed data sets. The work carried out so far has shown the flexibility of OGSA-DAI and the granularity of the tasks that can be written.
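Both eDiaMoND and ODD-Genes reach their databases through OGSA-DAI, which exposes each database as a Grid data service and executes activity requests whose core is usually an ordinary SQL statement. Purely as an illustration of the kind of query carried in such a request, and with an invented table and column layout (this is not the eDiaMoND or ODD-Genes schema), a gene-identifier lookup of the sort described in the next subsection might be:

```sql
-- Hypothetical annotation table; not the actual ODD-Genes schema.
SELECT gene_id, symbol, description, chromosome
FROM   gene_annotation
WHERE  gene_id IN ('AF000001', 'AF000002');
```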
9.3.2 ODD-Genes

ODD-Genes [17] is a genetics data analysis application built on SunDCG [18] and OGSA-DAI running on Globus. ODD-Genes allows researchers at the Scottish Centre for Genomic Technology and Informatics (GTI) in Edinburgh, UK, to automate important micro-array data analysis tasks securely and seamlessly using remote high-performance computing resources at EPCC. ODD-Genes performs queries on gene identifiers against remote, independently managed databases, enriching the information available on individual genes. Using OGSA-DAI, the ODD-Genes application supports automated data discovery and uniform access to arbitrary databases on the Grid.

9.4 RESOURCE MANAGEMENT CASE STUDIES

In Chapter 6, we introduced resource management and scheduling systems, namely Condor, SGE, PBS and LSF. In this section, we first introduce a Condor pool running at University College London (UCL) and then introduce three SGE use cases.

9.4.1 The UCL Condor pool

A production-level Condor pool has been running at UCL since October 2003 [19]. In August 2004, the pool had 940 nodes across more than 30 clusters within the university. Roughly 1 500 000 hours of computational time have been obtained from Windows Terminal Service (WTS) workstations since then, with virtually no perturbation to normal workstation usage. On average, 20 000 jobs are submitted every month. The deployment of the Globus 2.4 toolkit as a gatekeeper to UCL-Condor allows users to access the pool via Globus certificates and the e-minerals mini-grid [20].
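For readers who have not seen Condor job descriptions, the sketch below shows the general shape of a submit description file used to feed a pool of this kind. It is a generic, hand-written example: the executable, file names and job count are invented, and it is not taken from the UCL configuration.

```text
# Illustrative Condor submit description; not taken from the UCL pool.
universe                = vanilla
executable              = simulate.exe
arguments               = input_$(Process).dat
output                  = out/run_$(Process).out
error                   = out/run_$(Process).err
log                     = condor.log
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
queue 100
```

Running condor_submit on such a file queues 100 independent jobs; Condor then matches each job to an idle machine in the pool, which is how a pool like UCL's can harvest large amounts of otherwise unused workstation time.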
[...] frames from overnight to 1–2 hours, eliminating bottlenecks in the animation process and increasing server utilization rates to almost 95%.

9.5 GRID PORTAL USE CASES

9.5.1 Chiron

Chiron [24] is a Grid portal that facilitates the description and discovery of virtual data products, the integration of virtual data systems with data-intensive applications, and the configuration [...]

9.6 WORKFLOW MANAGEMENT – DISCOVERY NET USE CASES

[...] around Europe and the USA.

9.6.2 SARS virus evolution analysis

In 2003, SARS spread rapidly from its site of origin in Guangdong Province, in Southern China, to a large number of countries throughout the world. Discovery Net has been used for the analysis of the evolution of the SARS virus to establish the relationship between observed genomic variations in strains taken [...]

[...] range of 1.5–8.1 m.

9.7 SEMANTIC GRID – MYGRID USE CASE

We have briefly introduced myGrid in Chapters 3 and 7. It is a UK e-Science pilot project that is developing middleware infrastructure specifically to support in silico experiments in biology. myGrid provides semantic workflow registration and discovery. In this section, we briefly describe the application of myGrid to the study of Williams–Beuren Syndrome [...]

9.8 AUTONOMIC COMPUTING – AUTOMATE USE CASE

[...] During this process, information from sensors and actuators located on the oil wells in the field can be fed back into the simulation environment to further control and tune the model and improve the simulator's accuracy. The locations of wells in oil and environmental applications significantly affect the productivity and environmental/economic benefits of a subsurface [...]

9.9 CONCLUSIONS

[...] chapter, we introduced Chiron and GENIUS for portal applications. Regarding workflow management, we described the application of Discovery Net to the areas of distributed genome annotation, SARS virus evolution analysis, urban air pollution monitoring and geo-hazard modelling. As one of the leading projects in the Semantic Grid, myGrid has recently been applied to the study of [...]

[...] of the Grid and the right direction in which the Grid community is currently moving.

9.10 REFERENCES

[1] Harmer, T.J., Donachy, P., Perrott, R.H., Chambers, C., Craig, S., Mallon, B. and Wright, C. GridCast – Using the Grid in Broadcast Infrastructures. Proceedings of the UK e-Science All Hands Meeting 2003 (AHM '03), Nottingham, UK, 2003.
[2] GridCast, http://www.qub.ac.uk/escience/projects/gridcast/
[...]
[7] … Gombás, G. P-GRADE: A Grid Programming Environment. Journal of Grid Computing, 1(2): 171–197 (2003).
[8] GEMLCA, http://www.cpc.wmin.ac.uk/ogsitestbed/GEMLCA/
[9] BLAST, http://www.ncbi.nlm.nih.gov/BLAST/
[10] Bayer, M., Campbell, A. and Virdee, D. A GT3-based BLAST Grid Service for Biomedical Research. Proceedings of the UK e-Science All Hands Meeting, Nottingham, UK, 2004.
[11] ScotGRID, http://www.scotgrid.ac.uk/
[...]
[44] … Exploring Williams–Beuren Syndrome Using myGrid. Bioinformatics, 20(Suppl 1): i303–i310 (2003).
[45] Morris, C. The Natural History of Williams Syndrome: Physical Characteristics. Journal of Paediatrics, 113: 318–326 (1988).
[46] Matossian, V., Bhat, V., Parashar, M., Peszynska, M., Sen, M., Stoffa, P. and Wheeler, M.F. Autonomic Oil Reservoir Optimization on the Grid. Concurrency [...]

GLOSSARY (EXCERPT)

GGF – Global Grid Forum. GGF is a standards body for Grid technologies. URL: http://www.gridforum.org/
GIS – Grid Information Service. GIS is the part of the Globus Toolkit used to manage resource information.
Globus Toolkit – The Globus Toolkit provides middleware technologies for building Grid systems. URL: http://www.globus.org
GMA – Grid Monitoring Architecture. [...]
Goal-oriented SLA Scheduling – [...]
GridSphere – GridSphere is a framework for building Web portals with portlets. URL: http://www.gridsphere.org
GSFL – Grid Services Flow Language. GSFL is a WSFL-based workflow language for OGSA-compliant Grid service composition.
GSH – Grid Service Handle. A GSH is a globally unique URI for a Grid service or a Grid service instance.
GSI – Grid Security Infrastructure. GSI provides [...]
