Document: Grid Computing P30 (PDF)

30 Distributed object-based Grid computing environments

Tomasz Haupt (1) and Marlon E. Pierce (2)
(1) Mississippi State University, Starkville, Mississippi, United States
(2) Indiana University, Bloomington, Indiana, United States

Grid Computing – Making the Global Infrastructure a Reality. Edited by F. Berman, A. Hey and G. Fox. © 2003 John Wiley & Sons, Ltd. ISBN: 0-470-85319-0

30.1 INTRODUCTION

Computational Grid technologies hold the promise of providing global scale distributed computing for scientific applications. The goal of projects such as Globus [1], Legion [2], Condor [3], and others is to provide some portion of the infrastructure needed to support ubiquitous, geographically distributed computing [4, 5]. These metacomputing tools provide such services as high-throughput computing, single login to resources distributed across multiple organizations, and common Application Programming Interfaces (APIs) and protocols for information, job submission, and security services across multiple organizations. This collection of services forms the backbone of what is popularly known as the computational Grid, or just the Grid.

The service-oriented architecture of the Grid, with its complex client tools and programming interfaces, is difficult for application developers and end users to use. The perception of complexity arises because Grid services often address issues at levels (in terms of API and protocol stacks) that are too low for application developers. Consequently, there are not many Grid-enabled applications, and in general, the Grid adoption rate among end users is low.

By way of contrast, industry has undertaken enormous efforts to develop easy user interfaces that hide the complexity of underlying systems.
Through Web portals the user has access to a wide variety of services such as weather forecasts, stock market quotes and online trading, calendars, e-mail, auctions, air travel reservations and ticket purchasing, and many others yet to be imagined. It is the simplicity of the user interface, which hides all implementation details from the user, that has contributed to the unprecedented success of the idea of a Web browser.

Grid computing environments (GCEs) such as computational Web portals are an extension of this idea. GCEs are used for aggregating, managing, and delivering grid services to end users, hiding these complexities behind user-friendly interfaces. Computational Web portals take advantage of the technologies and standards developed for Internet computing such as HTTP, HTML, XML, CGI, Java, CORBA [6, 7], and Enterprise JavaBeans (EJB) [8], using them to provide browser-based access to High Performance Computing (HPC) systems (both on the Grid and off). A further potential advantage of these environments is that they may be merged with more mainstream Internet technologies, such as information delivery and archiving and collaboration.

Besides simply providing a good user interface, computing portals designed around distributed object technologies provide the concept of persistent state to the Grid. The Grid infrastructure is implemented as a bag of services. Each service performs a particular transaction following a client-server model. Each transaction is either stateless or supports only a conversational state. This model closely resembles the HTTP-based Web transaction model: the user makes a request by pointing the Web browser to a particular URL, and a Web server responds with the corresponding, possibly dynamically generated, HTML page. However, the very early Web developers found this model too restrictive.
Nowadays, most Web servers utilize object- or component-oriented technologies, such as EJB or CORBA, for session management, multistep transaction processing, persistence, and user profiles, for providing enterprise-wide access to resources including databases, and for incorporating third-party services. There is a remarkable similarity between the current capabilities of the Web servers (the Web technologies), augmented with Application Servers (the Object and Component Technologies), and the required functionality of a Grid Computing Environment.

This paper provides an overview of Gateway and the Mississippi Computational Web Portal (MCWP). These projects are being developed separately at Indiana University and Mississippi State University, respectively, but they share a common design heritage. The key features of both MCWP and Gateway are the use of XML for describing portal metadata and the use of distributed object technologies in the control tier.

30.2 DEPLOYMENT AND USE OF COMPUTING PORTALS

In order to make concrete the discussion presented in the introduction, we describe below our deployed portals. These provide short case studies on the types of portal users and the services that they require.

30.2.1 DMEFS: an application of the Mississippi Computational Web Portal

The Distributed Marine Environment Forecast System (DMEFS) [9] is a project of the Mississippi State team that is funded by the Office of Naval Research. DMEFS’s goal is to provide an open framework to simulate the littoral environments across many temporal and spatial scales in order to accelerate the evolution of timely and accurate forecasting. DMEFS is expected to provide a means for substantially reducing the time to develop, prototype, test, validate, and transition simulation models to operation, as well as to support a genuine, synergistic collaboration among the scientists, the software engineers, and the operational users.
In other words, the resulting system must provide an environment for model development, including model coupling, model validation and data analysis, routine runs of a suite of forecasts, and decision support.

Such a system has several classes of users. The model developers are expected to be computer-savvy domain specialists. On the other hand, operational users who routinely run the simulations to produce daily forecasts have only a limited knowledge of how the simulations actually work, while decision-support users are typically interested only in accessing the end results. The first type of user typically benefits from services such as archiving and data pedigree as well as support for testing and validation. The second type benefits from an environment that simplifies the complicated task of setting up and running the simulations, while the third type needs ways of obtaining and organizing results.

DMEFS is in its initial deployment phase at the Naval Oceanographic Office Major Shared Resource Center (MSRC). In the next phase, DMEFS will develop and integrate metadata-driven access to heterogeneous, distributed data sources (databases, data servers, scientific instruments). It will also provide support for data quality assessment, data assimilation, and model validation.

30.2.2 Gateway support for commodity codes

The Gateway computational Web portal is deployed at the Army Research Laboratory MSRC, with additional deployment approved for the Aeronautical Systems Center MSRC. Gateway’s initial focus has been on simplifying access to commercial codes for novice HPC users. These users are assumed to understand the preprocessing and postprocessing tools of their codes on their desktop PC or workstation but not to be familiar with common HPC tasks such as queue script writing and job submission and management.
Problems using HPC systems are often aggravated by the use of different queuing systems between and even within the same center, poor access for remote users caused by slow network speeds at peak hours, changing locations for executables, and licensing issues for commercial codes. Gateway attempts to hide or manage as many of these details as possible, while providing a browser front end that encapsulates sets of commands into relatively few portal actions. Currently, Gateway supports job creation, submission, monitoring, and archiving for ANSYS, ZNS, and Fluent, with additional support planned for CTH. Gateway interfaces to these codes are currently being tested by early users.

Because Gateway must deal with applications with restricted source codes, we wrap these codes in generic Java proxy objects that are described in XML. The interfaces for the invocation of these services likewise are expressed in XML, and we are in the process of converting our legacy service description to the Web service standard Web Services Description Language (WSDL) [10].

Gateway also provides secure file transfer, job monitoring and job management through a Web browser interface. These are currently integrated with the application interfaces but have proven popular on their own and so will be provided as stand-alone services in the future.

Future plans for Gateway include integration with the Interdisciplinary Computing Environment (ICE) [11], which provides visualization tools and support for light code coupling through a common data format. Gateway will support secure remote job creation and management for ICE-enabled codes, as well as secure, remote, sharable visualization services.

30.3 COMPUTING PORTAL SERVICES

One may build computational environments such as the ones above out of a common set of core services.
We list the following as the base set of abstract service definitions, which may be (but are not necessarily) implemented more or less directly with typical Grid technologies in the portal middle tier.

1. Security: Allow access only to authenticated users, give them access only to authorized areas, and keep all communications private.
2. Information resources: Inform the user about available codes and machines.
3. Queue script generation: On the basis of the user’s choice of code and host, create a script to run the job for the appropriate queuing system.
4. Job submission: Through a proxy process, submit the job with the selected resources for the user.
5. Job monitoring: Inform the user of the status of submitted jobs, and more generally provide events that allow loosely coupled applications to be staged.
6. File transfer and management: Allow the user to transfer files between a desktop computer and a remote system and to transfer files between remote systems.

Going beyond the initial core services above, both MCWP and Gateway have identified the following GCE-specific services, which are implemented or in the process of being implemented.

1. Metadata-driven resource allocation and monitoring: While indispensable for acquiring adequate resources for an application, allocation of remote resources adds to the complexity of all user tasks. To simplify this chore, one requires a persistent and platform-independent way to express computational tasks. This can be achieved by the introduction of application metadata. This user service combines standard authentication, information, resource allocation, and file transfer Grid services with GCE services: metadata discovery, retrieval and processing, metadata-driven Resource Specification Language (RSL) (or batch script) generation, resource brokerage, access to remote file systems and data servers, logging, and persistence.
2. Task composition or workflow specification and management: This user service automates mundane user tasks with data preprocessing and postprocessing, file transfers, format conversions, scheduling, and so on. It replaces the nonportable ‘spaghetti’ shell scripts currently widely used. It requires task composition tools capable of describing the workflow in a platform-independent way, since some parts of the workflow may be performed on remote systems. The workflow is built hierarchically from reusable modules (applications), and it supports different mechanisms for triggering execution of modules: from static sequences with branches to data flow to event-driven systems. The workflow manager combines information, resource brokers, events, resource allocation and monitoring, file transfer, and logging services.
3. Metadata-driven, real-time data access service: Certain simulation types perform assimilation of observational data or analyze experimental data in real time. These data are available from many different sources in a variety of formats. Built on top of the metadata, file transfer and persistence services, this user service closely interacts with the resource allocation and monitoring or workflow management services.
4. User space, persistency, and pedigree service: This user service provides support for reuse and sharing of applications and their configuration, as well as for preserving the pedigree of all jobs submitted by the user. The pedigree information allows the user to reproduce any previous result on the one hand and to localize the product of any completed job on the other. It collects data generated by other services, in particular, by the resource allocation and workflow manager.

30.4 GRID PORTAL ARCHITECTURE

A computational Web portal is implemented as a multitier system composed of clients running on the users’ desktops or laptops, portal servers providing user level services (i.e. portal middleware), and backend servers providing access to the computing resources.

30.4.1 The user interface

The user interacts with the portal through either a Web browser, a client application, or both. The central idea of both the Gateway and the MCWP user interfaces is to allow users to organize their work into problem contexts, which are then subdivided into session contexts in Gateway terminology, or projects and tasks using MCWP terms. Problems (or projects) are identified by a descriptive name handle provided by the user, with sessions automatically created and time-stamped to give them unique names. Within a particular session (or task), the user chooses applications to run and selects computing resources to use. This interface organization is mapped to components in the portal middleware (user space, persistency, and pedigree services) described below. In both cases, the Web browser-based user interface is developed using JavaServer Pages (JSP), which allow us to dynamically generate Web content and interface easily with our Java-based middleware.

The Gateway user interface provides three tracks: code selection, problem archive, and administration. The code selection track allows the user to start a new problem, make an initial request for resources, and submit the job request to the selected host’s queuing system. The problem archive allows users to revisit and edit old problem sessions so that they can submit a job to a different machine, use a different input file, and so forth. Changes to a particular session are stored under a newly generated session name. The administration track allows privileged users to add applications and host computers to the portal, modify the properties of these entities, and verify their installation. This information is stored in an XML data record, described below.
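The problem and session organization described above can be illustrated with a minimal Python sketch. It is not the Gateway or MCWP implementation (both are Java-based); the class names, the naming scheme, and the host names `hpc1` and `hpc2` are assumptions made purely for illustration. It shows sessions receiving automatically generated, time-stamped names and an edit being stored under a newly generated session name rather than overwriting the archived one.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Session:
    """One time-stamped session (a 'task' in MCWP terms). Hypothetical structure."""
    name: str
    settings: dict


@dataclass
class ProblemContext:
    """A user-named problem (a 'project' in MCWP terms) that owns its sessions."""
    handle: str
    sessions: dict = field(default_factory=dict)
    _counter: int = 0

    def _new_name(self) -> str:
        # A time stamp plus a running counter keeps names unique even when
        # two sessions are created within the same second.
        self._counter += 1
        stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
        return f"{self.handle}-{stamp}-{self._counter:03d}"

    def new_session(self, **settings) -> Session:
        session = Session(self._new_name(), dict(settings))
        self.sessions[session.name] = session
        return session

    def revise(self, old_name: str, **changes) -> Session:
        # Edits never overwrite the archived session: the changed settings
        # are stored under a newly generated session name.
        merged = {**self.sessions[old_name].settings, **changes}
        return self.new_session(**merged)


# Hypothetical usage: rerun an archived session on a different machine.
problem = ProblemContext("wing-stress")
first = problem.new_session(code="ANSYS", host="hpc1", input_file="wing.inp")
second = problem.revise(first.name, host="hpc2")
```

Keeping the original session untouched is what makes the archive usable for pedigree: every submitted configuration remains retrievable under its own name.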
The MCWP user interface provides five distinct views of the system, depending on the user role: developer, analyst, operator, customer, and administrator. The developer view combines the selection and archive tracks. The analyst view provides tools for data selection and visualization. The operator view allows tasks to be scheduled in advance for routine runs (similar to creating a cron table). The customer view allows access to routinely generated and postprocessed results (plots, maps, and so forth). Finally, the administrator view allows configuration and control of all operations performed by the portal.

30.4.2 Component-based middleware

The portal middleware naturally splits into two layers: the actual implementation of the user services and the presentation layer responsible for providing mechanisms for the user interactions with the services. The presentation layer accepts the user requests and returns the service responses. Depending on the implementation strategy for the client, the services’ responses are directly displayed in the Web browser or consumed by the client-side application.

A key feature of both Gateway and MCWP is that they provide a container-based middle tier that holds and manages the (distributed) proxy wrappers for basic services like those listed above. This allows us to build user interfaces to services without worrying about the implementation of those services. Thus, for example, we may implement the portal using standard service implementations from the Globus toolkit, we may implement some core services ourselves for stand-alone resources, or we may implement the portal as a mixture of these different service implementation styles.

The Gateway middle tier consists of two basic sections: a Web server running a servlet engine and a distributed CORBA-based middle tier (WebFlow). This is illustrated in Figure 30.1.
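The idea of a container that holds interchangeable service proxies can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the WebFlow or EJB code: the interface `JobSubmitter`, the two stand-in implementations, and the service name `"job-submission"` are all hypothetical, and neither implementation contacts a real Grid or queuing system.

```python
from typing import Protocol


class JobSubmitter(Protocol):
    """Interface that the user-interface code programs against (hypothetical)."""

    def submit(self, script: str, host: str) -> str: ...


class GridSubmitter:
    """Stand-in for a proxy backed by a Grid toolkit; no real Grid call is made."""

    def submit(self, script: str, host: str) -> str:
        return f"grid:{host}:{len(script)}"


class LocalSubmitter:
    """Stand-in for a portal-provided service on a stand-alone resource."""

    def submit(self, script: str, host: str) -> str:
        return f"local:{host}:{len(script)}"


class ServiceContainer:
    """Holds named service proxies so that callers never construct them."""

    def __init__(self) -> None:
        self._services: dict[str, object] = {}

    def register(self, name: str, proxy: object) -> None:
        self._services[name] = proxy

    def lookup(self, name: str):
        return self._services[name]


def submit_from_ui(container: ServiceContainer, script: str, host: str) -> str:
    # The caller knows only the service name and its interface, so the
    # implementation behind the name can be swapped per installation.
    return container.lookup("job-submission").submit(script, host)


container = ServiceContainer()
container.register("job-submission", GridSubmitter())
grid_ticket = submit_from_ui(container, "echo", "hpc1")

container.register("job-submission", LocalSubmitter())
local_ticket = submit_from_ui(container, "echo", "hpc1")
```

The calling function is identical in both cases; only the registration changes, which is the property the text attributes to the container-based middle tier.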
The Web server typically runs a single Java Virtual Machine (JVM) on a single server host that contains local JavaBean components. These components may implement specific local services or they may act as proxies for WebFlow-distributed components running in different JVMs on a nest of host computers. WebFlow servers consist of a top-level master server and any number of child servers. The master server acts as a gatekeeper and manages the life cycle of the children. These child servers can in turn provide access to remote backend services such as HPCs running Portable Batch System (PBS) or Load Sharing Facility (LSF) queuing systems, a Condor flock, a Globus grid, and data storage devices. By running different WebFlow child servers on different hosts, we may easily span organizational barriers in a lightweight fashion. For more information on the WebFlow middleware, see References [12, 13, 14]. For a general overview of the role of commodity technologies in computational Grids, see Reference [15].

[Figure 30.1 The Gateway computational portal is implemented in a multitiered architecture. Elements shown: Web browser and client applications, connected over HTTP(S) to the Web server and servlet engine (a JVM holding JavaBean local services and JavaBean service proxies); a WebFlow master server and WebFlow child servers, connected over SECIOP; backend resources reached over RSH/SSH: data storage, Condor flock, HPC + PBS, HPC + LSF, Globus grid.]

The MCWP application server is implemented using EJB. The user space is a hierarchy of entities: users, projects, tasks, and applications. The abstract application metadata tree is implemented as entity beans as well, with the host-independent information as one database table and host-dependent information as another one. Finally, there are two entities related to job status: a job entity (with the unique jobId as the key in the job table) and a host entity that describes the target machine’s properties (metadata). It is important to note that all metadata beans (i.e. application, hosts, and data sets) are implemented using a hybrid technology, EJB and XML; that is, a database is used to store many short XML files.

The MCWP services are implemented as EJB session beans, and their relationship is depicted in Figure 30.2. The bottom-layer services are clients to the low-level Grid services, the upper-layer services are user level services, and the middle-layer services provide a mapping between the two.

[Figure 30.2 MCWP services are implemented as EJB session beans. Services shown inside the EJB container: Metadata, Task composition, Scripting tools, User space, Task repository, Advance scheduling; Security, Logging, Cron, RSL and script generator, Job table, Workflow manager, Resource broker; Status, Resource allocation, File transfer, Access to remote file systems, Access to data servers and databases.]

The task composition service provides a high-level interface for the metadata-driven resource allocation and monitoring. The knowledge about the configuration of each component of the computational task is encompassed in the application metadata and presented to the user in the form of a GUI. The user does not need to know anything about the low-level Globus interfaces, the syntax of RSL, or the batch schedulers on the remote systems. In addition, the user is given either the default values of parameters for all constituent applications that comprise the task or the values of parameters used in any of the previous runs. The application metadata are accessible through the metadata service. A configured task, that is, the application parameters for all components and the relationship between the components (e.g. workflow specification), is transparently saved in the user space (or application context) for later reuse.
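A minimal Python sketch of this metadata-driven approach follows. It is an illustration only, not the MCWP RSL and script generator: the parameter names, the default values, and the choice of directives are assumptions, and real sites typically require additional or different scheduler options. The sketch shows parameters resolved from defaults, a previous run, and new user input, and then rendered into scheduler-specific directives (`#PBS` for PBS, `#BSUB` for LSF) so that the user never writes them by hand.

```python
# Hypothetical default parameters that application metadata might supply.
DEFAULTS = {
    "job_name": "forecast",
    "processors": 4,
    "walltime": "01:00",  # hours:minutes
    "queue": "standard",
}


def resolve_parameters(defaults, previous_run=None, user_input=None):
    """Start from metadata defaults, overlay a previous run, then new input."""
    params = dict(defaults)
    params.update(previous_run or {})
    params.update(user_input or {})
    return params


def render_batch_script(params, scheduler, command):
    """Render the same abstract request for a PBS or an LSF queuing system."""
    if scheduler == "pbs":
        directives = [
            f"#PBS -N {params['job_name']}",
            f"#PBS -q {params['queue']}",
            f"#PBS -l ncpus={params['processors']}",
            # PBS expects hours:minutes:seconds.
            f"#PBS -l walltime={params['walltime']}:00",
        ]
    elif scheduler == "lsf":
        directives = [
            f"#BSUB -J {params['job_name']}",
            f"#BSUB -q {params['queue']}",
            f"#BSUB -n {params['processors']}",
            # LSF accepts hours:minutes for the run limit.
            f"#BSUB -W {params['walltime']}",
        ]
    else:
        raise ValueError(f"unsupported scheduler: {scheduler}")
    return "\n".join(["#!/bin/sh", *directives, command]) + "\n"


# Hypothetical usage: reuse a previous run's queue, change the processor count.
params = resolve_parameters(DEFAULTS, {"queue": "debug"}, {"processors": 8})
pbs_script = render_batch_script(params, "pbs", "./model input.dat")
lsf_script = render_batch_script(params, "lsf", "./model input.dat")
```

Because the request is held as platform-independent parameters, the same saved task can be rerun on a host with a different queuing system by changing only the `scheduler` argument.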
Optionally, the user may choose to publish the configured task to be used by others through the task repository service.

The scripting tool is similar to the task composition service. If several steps are to be executed on the same machine running in batch mode, it is much more efficient to generate a shell script that orchestrates these steps in a single run than to submit several batch jobs under control of the workflow manager. The advance scheduling service allows an operator to schedule a selected application to run routinely at specified times, say every day at 2 p.m.

The services in the middle and bottom layers have self-describing names. The job table is an EJB entity that keeps track of all jobs submitted through MCWP and is used for reconnection, monitoring, and preserving the task pedigree. The cron service reproduces the functionality of the familiar Unix service to run commands at predefined times, and it is closely related to the advance scheduling user service. The security service is responsible for delegation of the user credentials. For a Globus-based implementation, it is a client of the myProxy server [16], which stores the user’s temporary certificates. For Kerberos-based systems, it serializes the user tickets.

For both MCWP and Gateway, it is natural to implement clients as stand-alone Java applications built as a collection of (Enterprise) JavaBean clients. However, this approach has several drawbacks if applied to the ‘World Wide Grids’. CORBA and EJB technologies are well suited for distributed, enterprise-wide applications but not for the Internet. First, going beyond the enterprise boundaries, there is always a problem with client software distribution and in particular the upgrades of the service interfaces.
Secondly, the protocols employed for the client-server communication are not associated with standard ports and are often filtered by firewalls, making it impossible for external users to access the services. Finally, in the case of EJB, currently available containers do not implement robust security mechanisms for extra-enterprise method invocation.

An alternative solution is to restrict the scope of the client application to the application server and to provide access to it through the World Wide Web, as shown in Figure 30.1 for Gateway and Figure 30.3 for MCWP. Here, the clients are implemented as server-side JavaBeans, and these beans are accessed by JSP to dynamically generate the user interface as HTML forms. This approach solves the problem of client software distribution as well as the problem of secure access to the Grid resources.

[Figure 30.3 EJB clients may be implemented as JavaBeans and accessed through JavaServer Pages. Elements shown: Web browser (user graphical interface); Web server integrated with the EJB container, holding JavaServer Pages and JavaBeans acting as EJB clients; the EJB container with the services of Figure 30.2 (Metadata bean, Task composition, Scripting tools, User space, Task repository, Advance scheduling, Resource broker, Workflow manager, Job table, RSL and script generator, Cron, Logging, Security, Status, Resource allocation, File transfer, Access to remote file systems, Access to data servers and databases); the Grid; the Kerberized universe.]

With the Web browser-server communications secured using the HTTPS protocol, and using the myProxy server to store the user’s Globus (proxy) certificate, the MCWP services are capable of securely allocating services, transferring files, and accessing data using the Globus grid services. Finally, the server-side JavaBeans acting as EJB clients can be easily converted into Web services (Simple Object Access Protocol/Web Services Description Language (SOAP/WSDL)) [17].
Therefore, the MCWP can be implemented as a stand-alone application, deployed using Java WebStart technology, acting as a WSDL client as opposed to an EJB client.

30.4.3 Resource tier

Computing resources constitute the final tier of the portal. These again are accessed through standard protocols, such as the Globus protocols for Grid-enabled computing resources, and also protocols such as Java Database Connectivity (JDBC) for database connections. There is always the problem that the computing resource may not be using a grid service, so the transport mechanism for delivering commands from the middle tier to the backend must be pluggable. We implement this in the job submission proxy service in the middle tier, which constructs and invokes commands on the backend either through secure remote shell invocations or else through something such as a globusrun command. The actual command to use in a particular portal installation is configured.

30.5 APPLICATION DESCRIPTORS

One may view the middle tier core services as being generic building blocks for assembling portals. A specific portal, on the other hand, includes a collection of metadata about the services it provides. We refer to this metadata as descriptors, which we define in XML. Both MCWP and Gateway define these metadata as a container hierarchy of XML schemas: applications contain host computing resources, which contain specific entities like queuing systems.

Descriptors are divided into two types: abstract and instance descriptors. Abstract application descriptors contain the ‘static’ information about how to use a particular application. Instance descriptors are used to collect information about a particular run by a particular user, which can be reused later by an archiving service and for pedigree. XML descriptors are used to describe data records that should remain long-lived or static.
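The container hierarchy and the abstract/instance distinction can be sketched in Python as follows. The XML element and attribute names, the host names, and the paths below are invented for illustration; they are not the MCWP or Gateway schemas, which this text does not reproduce. The sketch parses a hypothetical abstract descriptor (application containing hosts, each containing a queue) and derives an instance record for one run by one user.

```python
import xml.etree.ElementTree as ET

# Hypothetical abstract application descriptor (not the real portal schema).
DESCRIPTOR = """
<application name="ANSYS">
  <host name="hpc1.example.org">
    <executable>/opt/ansys/bin/ansys</executable>
    <workspace>/scratch</workspace>
    <queue system="PBS" name="standard"/>
  </host>
  <host name="hpc2.example.org">
    <executable>/usr/local/ansys/ansys</executable>
    <workspace>/work</workspace>
    <queue system="LSF" name="normal"/>
  </host>
</application>
"""


def load_application(xml_text):
    """Read the 'static' description: application -> hosts -> queue."""
    root = ET.fromstring(xml_text.strip())
    hosts = {}
    for host in root.findall("host"):
        queue = host.find("queue")
        hosts[host.get("name")] = {
            "executable": host.findtext("executable"),
            "workspace": host.findtext("workspace"),
            "queue_system": queue.get("system"),
            "queue_name": queue.get("name"),
        }
    return {"name": root.get("name"), "hosts": hosts}


def make_instance(abstract, user, host, input_file):
    """Record one particular run by one particular user for archiving."""
    chosen = abstract["hosts"][host]
    return {
        "application": abstract["name"],
        "user": user,
        "host": host,
        "input_file": input_file,
        "executable": chosen["executable"],
        "queue_system": chosen["queue_system"],
    }


app = load_application(DESCRIPTOR)
instance = make_instance(app, "alice", "hpc1.example.org", "wing.inp")
```

The abstract record changes only when an administrator edits the portal configuration, whereas a new instance record is produced for every run, which is why the two are kept separate.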
As an example, an application descriptor contains the information needed to run a particular code: the number of input and output files that must be specified on the command line, the method that the application uses for input and output, the machines that the code is installed on, and so on. Machine, or host, descriptors describe specific computing resources, including the queuing systems used, the locations of application executables, and the location of the host’s workspace. Taken together, these descriptors provide a general framework for building requests for specific resources that can be used to generate batch queue scripts. Applications may be further chained together into a workflow.

The GCE Application Metadata Working Group has been proposed as a forum for different groups to exchange ideas and examples of using application metadata, which may potentially lead to a standardization of some of the central concepts.

[...]

... Virtual Computer, http://www.cs.virginia.edu/legion, July 20, 2001.
Condor: High Throughput Computing, http://www.cs.wisc.edu/condor, July 20, 2001.
Foster, I. and Kesselman, C. (eds) (1999) The Grid: Blueprint for a New Computing Infrastructure. San Francisco, CA: Morgan Kaufmann Publishers.
Global Grid Forum, http://www.gridforum.org, July 20, 2001.
Orfali, R. and Harkey, D. (1998) Client/Server Programming with ...
... grain distributed computing. Concurrency and Computation: Practice and Experience, 9(6), 555–577.
Akarsu, E. (1999) Integrated Three-Tier Architecture for High-Performance Commodity Metacomputing, Ph.D. Dissertation, Syracuse University, Syracuse, 1999.
Fox, G. and Furmanski, W. (1999) High performance commodity computing, in Foster, I. and Kesselman, C. (eds) The Grid: Blueprint for a New Computing Infrastructure ...
... accessing remote, distributed high-performance computing resources. We have the capability to provide bridges to different grid infrastructure services, and where required implement these services ourselves. The so-called Web services model and particularly the proposed Open Grid Services Architecture (OGSA) [23] represent an important future development for computing portals and their services. This is ...

... compliant with the existing security requirements of the centers where they are run. Both the Gateway and the MCWP are funded in part to be deployed at Department of Defense (DoD) computing centers that require Kerberos [20] for authentication, data transmission integrity, and privacy [21]. In the following section we describe some general security issues ...

... persistency, and data pedigree access will remain important services that must be implemented at the application level, although they may use underlying core OGSA services. Web services are an important development for computational portals because they promote the development of interoperable and reusable services with well-defined interfaces ...

30.6 SAMPLE SERVICE IMPLEMENTATIONS

In previous sections we outlined several core services. In this section, we look in some detail at two service implementations: batch script ...

... which the request was created.

30.7 KERBEROS SECURITY REQUIREMENTS IN MULTITIERED ARCHITECTURES

Security is of utmost importance when grid resources are made available through the Internet. Security solutions of commercial Web sites are typically inadequate for computational grids: customers and retailers are protected by third parties such as credit card companies and banks, the company sponsoring the ...
... for a New Computing Infrastructure. San Francisco, CA: Morgan Kaufmann.
Novotny, J., Tuecke, S. and Welch, V. (2001) An online credential repository for the grid: myproxy, Proceedings of the Tenth International Symposium on High Performance Distributed Computing. IEEE Press.
SOAP Version 1.2, http://www.w3c.org/TR/soap12, July 20, 2001.
Gamma, E., Helm, R., Johnson, R. and Vlissides, J. (1995) Design Patterns: ...
... service for computational portals. Proceedings of the International Conference on Communications in Computing, 2002.
Neuman, C. and Tso, T. (1994) Kerberos: an authentication service for computer networks. IEEE Communications, 32(9), 33–38.
21. Department of Defense High Performance Computing Modernization Program Security Issues, http://www.hpcmo.hpc.mil/Htdocs/Security
22. ...
... Technology, http://java.sun.com/products/ejb
Haupt, T., Bangalore, P. and Henley, G. (2001) A computational web portal for distributed marine environment forecast system. Proceedings of the High-Performance Computing and Networking, HPCN-Europe 2001, 2001, pp. 104–114.
Web Services Description Language (WSDL) 1.1, http://www.w3c.org/TR/wsdl, July 8, 2002.
ICE Home Page, http://www.arl.hpc.mil/ice/, July 8, 2002.

Uploaded: 24/12/2013, 13:16
