University of California Grid Project
Prepared by the UC Research Computing Group

I. Summary

This white paper has been developed by the UC Research Computing Group (UCRCG) in response to the directive by the Information Technology Guidance Committee's (ITGC) High-Performance Computing Work Group that a secure Computing Grid be developed to link together hardware resources and autonomous organizations at the University of California. The Grid is to provide networked resources: computing, storage, and network technology resources in support of research. The first task is to provide a Grid infrastructure that exposes existing computing resources to the UC research community and facilitates the use of those resources as appropriate to existing research needs and funding. The UCRCG proposes to provide this infrastructure by:

- Creating a Campus Grid at each University of California campus. We propose to do this by deploying the UCLA Grid Portal (UGP) software and applying the UCLA Grid architecture, which integrates computing resources into a Grid through the attachment of Grid Appliances. The use of Grid Appliances allows independently owned compute clusters to be attached to the Grid without changing the way the administrators of those clusters do business. UGP provides a single, intuitive, web-based interface to all of the resources in a Grid. UGP and the Grid Appliances were developed by UCLA Academic Technology Services (ATS), which has been successfully running a Grid at UCLA since June 2004. Each Campus Grid will expose independently operated campus resources to all researchers belonging to that campus.

- Creating a UC-wide Grid called the UC Grid. The UC Grid will allow for the sharing of resources from different campuses. Every user of a Campus Grid will also be able to access the UC Grid to use multi-campus resources. The UC Grid will deploy the same UGP software as the Campus Grids, thus providing the same user interface as the Campus Grids. It will connect to the Grid Appliances already installed on the campuses as part of the Campus Grids; additional Grid Appliances will not be required.

- Deploying the Grid Certificate Authority (CA) for all the Campus Grids and the UC Grid at the UC Grid level. This will provide each user with a single credential that will be recognized by all the Grids, campus and UC, thus making the sharing of computing resources between campuses possible with single sign-on. Grid certificates meet the X.509 certificate standard [1]. Adoption of the Grid CA and certificates will not prevent the adoption of another UC-wide authentication service at a later date.

- Using resource pools to provide researchers with the most appropriate compute resources and software anywhere in the UC system according to their compute requirements. This will allow each Grid Portal to manage resource allocation within its Grid in order to optimize performance and utilization.

This initial deployment will fulfill the following mandates:

- To develop a secure Grid of computing and storage resources in support of research.
- To augment UC's research technology infrastructure in an interoperable fashion to facilitate the sharing of resources.
- To optimize performance and utilization.
- To deliver services that address researchers' needs while encouraging behavior that benefits the common good.

It will also provide easy access to a very large number of users without having to create individual user login ids for them on all of the clusters. After deployment, parameters and codes can be optimized and tuned to maximize
performance, stability, security, and resource utilization while at the same time ensuring the fastest turnaround for users.

Funding models and infrastructure at the campus level will have to be addressed in order to create the Campus Grids. Currently, different funding models for computing are in use at the different campuses. The creation of the Campus Grids will require each campus to provide:

- A Grid Administrator to install and maintain the UGP software; to install and provide the Grid Appliances to the administrators of the compute resources on that campus; and to provide some user services, such as adding and approving users, keeping the Grid software infrastructure up to date, etc.
- Computers and a storage area to act as the Campus Grid Portal. Additional computers will be required to provide load balancing when usage increases.
- One computer (the Grid Appliance) for each computational resource (usually a compute cluster) that joins the Campus Grid.

In addition to leveraging short-term funding opportunities initially, funding models will have to be developed in the long term that can sustain the Grid services and the technologies behind them.

Currently, the emphasis in developing the UCLA Grid Architecture and the UGP software has been on joining computational clusters together into a Grid. Extending the UC Grid concept to enable the creation of a California Grid for use within K-12 education will require that Grid services be expanded to provide other services in addition to batch computing. This will require an assessment of needs as well as the development necessary to a) connect the kinds of compute resources that meet those needs into the Grid and b) add the user interfaces for those kinds of resources into the UGP software. This will be addressed in a later phase of Grid development.

II. Grids

A Grid is a collection of independently owned and administered resources which have been joined together by a software and hardware infrastructure that interacts with the resources, and with the users of the resources, to provide coordinated, dynamic resource sharing in a dependable and consistent way according to policies that have been agreed to by all parties. Because of the large number of resources available on a Grid, at any given time an individual researcher can always be provided with the best resources for his/her needs, and overall resource utilization can be distributed for maximum efficiency.

In 1969 Leonard Kleinrock, the UCLA professor who was one of the founders of the ARPANET (now the Internet), stated: "We will probably see the spread of 'computer utilities', which, like present electric and telephone utilities, will service individual homes and offices across the country."
A Grid is such a "computer utility". It presents a uniform interface to resources that exist within different autonomous administrative domains. The Globus Alliance [2], a consortium with contributing members from universities, research centers, and government agencies, conducts research and does software development to provide the fundamental technologies behind the Grid. The Globus ToolKit [3] software, developed by the Globus Alliance, forms the underpinning of most Grids. This toolkit implements a command-line interface and is thus not recommended for end users because of its detailed command syntax and long learning time. The UCLA Grid Portal (UGP) software, built on top of Globus ToolKit 4.0 and GridSphere [4], uses Java portlets and AJAX technology [5] to provide an easy-to-use web-based interface for Grids.
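As an illustration of the portlet approach just mentioned, the following is a minimal sketch of a JSR 168 portlet of the kind a GridSphere-based portal can host. It is not taken from the UGP source; the class name and the rendered text are hypothetical, and a real UGP portlet would render cluster status, file listings, or job information obtained through Globus services.

    import java.io.IOException;
    import java.io.PrintWriter;
    import javax.portlet.GenericPortlet;
    import javax.portlet.PortletException;
    import javax.portlet.RenderRequest;
    import javax.portlet.RenderResponse;

    // Hypothetical example portlet; UGP's real portlets render cluster
    // status, file listings, and job information instead of static text.
    public class ClusterStatusPortlet extends GenericPortlet {
        protected void doView(RenderRequest request, RenderResponse response)
                throws PortletException, IOException {
            response.setContentType("text/html");
            PrintWriter out = response.getWriter();
            out.println("<p>Cluster status would be rendered here.</p>");
        }
    }

Portlets of this kind are packaged into the portal and plugged into the GridSphere container, which is what allows UGP's interface to be assembled from small, self-contained user interface components.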
III. The UCLA Grid Architecture and the UCLA Grid Portal (UGP) Software

UGP and the UCLA Grid Architecture bring computational clusters together into a Grid. The hardware resources making up the Grid consist entirely of computational clusters, each of which consists of a head node, compute nodes, storage nodes, network resources, software, and data resources. Individual computational clusters can be quite large, containing hundreds of nodes. By incorporating the concepts of pooled resources and Pool Users, UGP facilitates the sharing of resources among users. Administrative overhead is reduced because there is no longer a need to add individual user login ids on multiple clusters. UGP:

- Provides a single through-the-web interface to all of the clusters in a Grid. This interface hides user interface and scheduler differences among clusters, and it makes it easy to work with multiple clusters at once.
- Provides a single login for users. A user logs into the Grid Portal, not into each of the individual clusters that the user will use.
- Provides resources both to users who have login ids on individual clusters (Cluster Users) and to users who do not (Pool-Only Users). Any person with campus affiliation can easily gain access to resources throughout the Grid by becoming a Pool-Only User.
- Is secure to the extent possible with up-to-date technology. Clusters can sit behind firewalls if that is their policy. A Grid Appliance is open only to the cluster to which it is attached and to the Grid Portal. Proxy certificates are used for authentication at every step of the way (between Grid Portal and Grid Appliances). Users never handle their certificates.

While the UGP presents a uniform appearance to users, the UCLA Grid Architecture provides for a Grid made up of diverse computing environments (hardware, operating systems, job schedulers) and autonomous administrative domains. Local organizations own and maintain control of the resources involved, and local administrative procedures and security solutions take precedence. Each of the participating clusters is attached to the Grid Portal via a Grid Appliance, provided by the organization that administers the Grid and maintained by the Grid administrator, which serves as the gateway between the Grid Portal and that cluster. The addition of a Grid Appliance to a cluster in no way modifies policy decisions at the cluster level. Any participating cluster can always also be used directly by users who log in to the cluster head node, without them having to go through the Grid Portal.

A. Architecture

The UCLA Grid Architecture is depicted in the figure below.

Figure: UCLA Grid Architecture

In the figure, a user connects, via a web browser, to a Grid Portal. Three additional machines are joined to the Grid Portal to provide 1) storage for user certificates, 2) storage space for user files, and 3) through-the-web visualization of users' data. Two computational clusters are depicted at the right side of the figure. Each cluster consists of compute nodes and a head node. (In the absence of a Grid Portal, users normally log on to a cluster via its head node.) The Grid Appliance, which acts like an alternative head node (and submission host for the job scheduler) for the Grid Portal only, connects the Grid Portal to the cluster. Users' home directories from the cluster to which it is attached must be cross-mounted on the Grid Appliance. Both the Grid Portal and the Grid Appliances run the Globus ToolKit (which has been customized on the Appliances). The Grid Portal additionally runs the Apache Tomcat web server, MySQL [6], GridSphere, and the UGP software (by ATS).

B. User Types on the Campus Grids

Two types of users are supported by UGP:

- Cluster Users. A cluster user has a login id on one or more of the clusters participating in the Grid. A cluster user can get this login id by being a member of a research group that owns one of the clusters. Someone with computational needs can normally also apply for a login id on any cluster that is provided as a campus service.
  o Cluster users have home directories on each of the clusters they can access. They use their home directories to store files.
  o Cluster users can use the Grid Portal to access files on, and submit jobs to, the clusters they have access to.
  o Cluster users can also submit jobs to resource pools as a Pool User.
- Pool-Only Users. Students, staff, and faculty members who do not have login ids on any of the clusters can easily sign up on the Grid Portal to be Pool-Only Users.
  o Each Pool-Only User is assigned a storage area on the Storage Server connected to the Grid Portal.
  o The Pool-Only User can submit jobs to resource pools.

C. The Resource Pool

Clusters which have contractual obligations with their granting agencies (NSF, NIH, etc.) to provide a fixed percentage of their compute cycles to the campus can share those cycles with the campus community by joining the campus resource pool. Clusters that are provided solely as campus resources can also join the resource pool. Clusters contribute both cycles and applications to the resource pool, and a cluster administrator can determine which of the applications available on that cluster to contribute to the pool. (The Grid administrator does not take any responsibility for application updates or maintenance on individual clusters; that is the responsibility of each cluster administrator.)

Pooled resources are available for use by anyone who can log in to the Grid Portal. Currently, pooled resources run applications only. When a user submits a pool job, UGP selects, from among the clusters that contribute the requested application to the pool, the cluster that will give that job the best turnaround.
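The cluster-selection step can be pictured with the following sketch. It is not UGP code: the class and field names are hypothetical, and the portal's actual best-fit algorithm weighs more factors than the single estimated-wait figure used here.

    import java.util.Comparator;
    import java.util.List;
    import java.util.Optional;

    // Hypothetical model of choosing a pool cluster for a job.
    class PoolCluster {
        String name;
        List<String> pooledApplications;  // applications contributed to the pool
        double estimatedWaitMinutes;      // stand-in for "best turnaround"

        PoolCluster(String name, List<String> apps, double wait) {
            this.name = name;
            this.pooledApplications = apps;
            this.estimatedWaitMinutes = wait;
        }
    }

    class PoolScheduler {
        // Pick the contributing cluster expected to give the best turnaround.
        static Optional<PoolCluster> select(List<PoolCluster> clusters, String application) {
            return clusters.stream()
                    .filter(c -> c.pooledApplications.contains(application))
                    .min(Comparator.comparingDouble(c -> c.estimatedWaitMinutes));
        }
    }

For example, a Mathematica pool job would first be restricted to the clusters contributing Mathematica and then sent to the one with the shortest expected wait.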
D. Services Provided by UGP

Services currently provided by UGP include:

- Resources: allows one to see, at a glance, the status of all the clusters. Both summarized status information and detailed status information are provided.
- Data Manager: allows one to:
  o List and manage files on the clusters and the Storage Server, including all services normally provided for files in compute environments: create, remove, move, copy, change permissions, create directory, compress/uncompress, etc.
  o View and edit text files, view image files, and visualize computational results.
  o Copy files and directories between a cluster or the Storage Server and the user's local machine (upload/download).
  o Copy files and directories between clusters or between a cluster and the Storage Server.
- Job Services: allows one to submit a job and view the job status and results. Special application services provide easy access to all popular applications. Cluster users can submit jobs to specific clusters. All users can submit jobs to the resource pool. When a job is submitted to the resource pool, UGP selects the cluster to run it using a best-fit algorithm and stages all the input files to that cluster from any accessible cluster or Storage Server. Once the job has completed, it is the user's responsibility to transfer the output files from the cluster on which the job has run to a more permanent location.
- Other Grids: provides Data Manager and Job Services for clusters that are not part of the Grid connected to the Grid Portal the user is using, but which are part of other Grids that are open and not behind firewalls. The MyProxy Server [7] for the other Grid must also be available to UGP. Currently, service is provided to several clusters that are part of the TeraGrid. To use a cluster on another Grid, a user must enter his/her certificate username/passphrase on that Grid into a form provided by UGP. UGP then retrieves the user's proxy certificate from a MyProxy server on that other Grid and uses that proxy certificate to access the requested outside cluster.
- Grid Development Environment: provides a development environment for the editing and compilation of C/C++ and Fortran source codes on the clusters.

IV. Expanding the UCLA Grid Architecture to Encompass the University of California

The diagram in the next figure is a simplified version of the architecture shown in the previous figure. In it, a box labeled "C" represents an entire cluster and a box labeled "A" represents a Grid Appliance. The ION Visualization Server is not shown because not all campuses may have an ION Visualization Server. The campus Grid Portal includes a CA (Certificate Authority) for the Grid. This is the way the Grid Portal at UCLA is currently configured. With the advent of the UC Grid, the Grid Portal at each campus will no longer include a CA, as the CA for all of the University of California Grids, both Campus Grids and the UC Grid, will be at the UC Grid Portal.

Figure: Single-Campus Architecture

The next figure depicts the Multi-Campus Grid architecture for the University of California. It shows the Campus Grids for three campuses and the UC Grid. The Campus Grid shown for each campus is identical to the one shown in the single-campus figure except that a CA is not included at the campus level.
A single CA for the Grid is included as part of the UC Grid, and a special service, the UC Register Service, has been added to the UC Grid Portal. Note also that each Grid Appliance must be open to both the Campus Grid Portal and the UC Grid Portal.

Figure: Multi-Campus Architecture for the University of California

This design allows:

- Each user of a Campus Grid to also use the UC Grid Portal.
- Each Cluster User to access the Campus Grid Portal of each campus whose Grid includes clusters on which that user has a login id.
- The UC Grid Portal to access every cluster that belongs to each of the Campus Grids, i.e., every cluster that participates in a Campus Grid also participates in the UC Grid.
- Clusters at the campus level to contribute both cycles and applications to the UC resource pool in addition to the campus resource pool of the local campus. The clusters that contribute cycles to the UC resource pool, and the applications they contribute to that pool, do not have to be the same as the ones that contribute to the campus resource pool. Contributing to the resource pools is not a requirement for a cluster to join the Grid.

When a cluster administrator wants to join his/her cluster to the Campus Grid, he/she must also join his/her cluster to the UC Grid. This is a requirement.

A. Grid Certificate Authority, Grid Certificates and MyProxy Servers

The Globus ToolKit uses public key cryptography. The UC Grid Portal has a Simple Certificate Authority for the Grid (Grid CA) associated with it. When a user requests a username in order to access one of the Campus Grids in the UC system, that user will be issued a certificate signed by the UC Grid's CA. The certificate consists of two parts, the public key and the private key. With the UCLA Grid Architecture, these are never returned to the user. Instead, the certificate is automatically digitally signed by the CA, and the public and private keys are stored in two MyProxy servers, one at the UC Grid Portal and the other at the Campus Grid Portal. The digital signature of the CA guarantees that the certificate has not been tampered with. The user never handles the certificate and may not even know that a certificate exists.

To use a Grid Portal, a user must log in by providing his/her username and passphrase. This provides access to the user's private key. Once the UC Grid has been set up, when a user logs into the UC Grid Portal, that portal will look up the user in its MyProxy Server; when a user logs into a Campus Grid Portal, that Grid Portal will look up the user in its own MyProxy Server. If for some reason its MyProxy Server is unavailable or the user is not found there, the Campus Grid Portal can look for the user in the MyProxy Server belonging to the UC Grid. Once the user has been validated, UGP will retrieve a proxy certificate for the user from the MyProxy Server. The proxy certificate has a limited lifespan, normally one day. The Grid Portal uses that proxy certificate on the user's behalf every time it contacts one of the clusters, via its Grid Appliance, to perform a service for that user. The proxy certificate is destroyed once the user logs out.
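The life cycle of the short-lived proxy just described can be sketched as follows. This is a conceptual model only, not the Globus or MyProxy client API; the class name is hypothetical, and only the one-day default lifetime comes from the description above.

    import java.time.Duration;
    import java.time.Instant;

    // Conceptual model of a short-lived proxy credential retrieved at login
    // and discarded at logout; not an actual Globus/MyProxy API.
    class ProxyCredential {
        private final Instant issuedAt;
        private final Duration lifetime;

        ProxyCredential(Duration lifetime) {
            this.issuedAt = Instant.now();
            this.lifetime = lifetime;   // normally about one day
        }

        boolean isValid() {
            return Instant.now().isBefore(issuedAt.plus(lifetime));
        }

        static ProxyCredential retrieveAtLogin() {
            // In the real system this step contacts a MyProxy server.
            return new ProxyCredential(Duration.ofDays(1));
        }
    }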
B. User Types on the UC Grid

The UC Grid will have two types of users:

- Cluster Users. A Cluster User is a user that has a login id on at least one cluster at at least one campus.
- Pool-Only Users. There is no need to assign a storage area on the Storage Server connected to the UC Grid Portal; the UC Grid Portal can access the user's files that are on the Storage Server at the user's local Campus Grid Portal.

Use of the UC Grid Portal is the best choice for Cluster Users with access to clusters on different campuses, as all clusters UC-wide that that user can access will be accessible from the UC Grid Portal. Use of the UC Grid Portal will be advantageous for Pool-Only Users only to the extent that the UC Portal can solicit cluster administrators UC-wide to contribute to its resource pool. The UC Portal is the only Grid Portal from which users can submit jobs to the UC resource pool.

C. Workflow to Add a User

The workflow required to add a user to the Grid always begins at the Campus Grid Portal, because it is at the Campus Grid level where the user has the strongest affiliation and is known. The workflow, depicted in the two figures below, always results in a user who has been added to both his/her Campus Grid Portal and the UC Grid Portal.

Figure: Workflow to Add a User, Part 1

Figure: Workflow to Add a User, Part 2

In both figures, the part in the upper right, above the horizontal line, represents what happens at the UC Grid Portal, while the rest of the figure represents what happens at the Campus Grid Portal.

To log in to a Grid Portal a user needs:

- A Grid certificate.
- A GridSphere account on that Portal.
- Additionally:
  o A Cluster User must be added to the gridmap file on the Grid Appliance connected to one of the clusters on which he/she has a login id as part of the creation process.
  o A Pool-Only User needs to be assigned a storage area on the Grid Portal's Storage Server.

A GridSphere account is required for each Grid Portal because UGP is built on top of the GridSphere Portal Framework and that framework requires the accounts. GridSphere is a portlet container that implements the JSR 168 standard. The use of GridSphere allowed UGP to be written in the form of portlets, the small, self-contained, pluggable user interface components that provide the functionality of the UGP.

The UC Register Service, shown in the figures at the UC level (along with the clients that communicate with it at the campus level), is currently (as of August 2006) under development at UCLA ATS. It is required to make the single UC Grid CA and the single UC Grid username space work for the UC Grid and the Campus Grids.

The workflow starts when the user goes to the home page of his/her Campus Grid Portal and clicks the link that says "Apply for Grid Access". This link will be absent from the UC Grid Portal. The first step will be for the user to authenticate. There are a number of ways that the user can do this:

- If the user is to be a Cluster User, ssh authentication can be used to prove that the user can log in to a cluster. The cluster used for ssh authentication is automatically added to the user's cluster access list.
- At UCLA, ISIS authentication is used for Pool-Only Users. ISIS is the campus authentication method that knows about all UCLA-affiliated individuals: students, faculty, staff, etc.
- At other campuses, whatever authentication method the campus is using can be used.
- If, at some time in the future, a UC-wide authentication method such as Shibboleth is initiated, it can be used.

After the user has authenticated and proved that he/she is eligible to join the Grid, the user will be presented with a form asking for his/her name, organization, and other identifying information, as well as for a proposed Grid username and passphrase. This is the username and passphrase that the user will use to log in to the Grid Portals. A client to the UC Register Service will then contact that service to make sure that the proposed username is unique. If not, the user will be prompted for another selection until a unique username is found.
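The uniqueness check can be sketched as a simple retry loop. The RegisterServiceClient interface below is hypothetical, standing in for the actual UC Register Service client, whose interface is not described in this paper.

    import java.util.function.Supplier;

    // Hypothetical stand-in for the UC Register Service client.
    interface RegisterServiceClient {
        boolean isUsernameAvailable(String proposedUsername);
        void reservePending(String username, String passphrase, String userInfo);
    }

    class GridSignup {
        // Keep prompting until the UC Register Service reports the name is unique.
        static String chooseUsername(RegisterServiceClient register, Supplier<String> promptUser) {
            String candidate = promptUser.get();
            while (!register.isUsernameAvailable(candidate)) {
                candidate = promptUser.get();   // ask the user for another selection
            }
            return candidate;
        }
    }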
The UC Register Service will then add that username and passphrase, along with the information about the user, to its database and mark the record pending. Pending records are purged from the database after a short period of time if they do not become permanent.

What happens next depends upon the user type.

If the user is a Cluster User, a message is automatically sent to the cluster administrator of the cluster that the user used for ssh authentication, requesting that the cluster administrator add the user into the gridmap file on that cluster. The gridmap file is the link between the user's Grid username and the user's login id on that particular cluster. Included in the message is the Distinguished Name (DN) that will be in the certificate when it is created for that user. Every certificate has a Distinguished Name (DN), a string that includes the name of the issuing organization and unit, and the user's common name (the user's actual full name). The cluster administrator must add the user's DN and local login id as a record in the gridmap file for that cluster (a sketch of such a record appears at the end of this subsection). Without this record, the Grid Portal cannot take any action on the user's behalf on that cluster: listing files, submitting jobs, etc. Once the cluster administrator has done this, he/she clicks on a link in the message he/she received. That takes him/her to the Campus Grid Portal, where the cluster administrator must first authenticate and then click on a button indicating that he/she has approved the user and taken the requisite action. This causes a message to be sent to the Grid administrator for the campus.

Pool-Only Users cannot use a cluster in their own right. The step described above is skipped and a message is sent directly to the Grid administrator for the campus. The Grid administrator must take the additional step of creating a storage area for the Pool-Only User on the Storage Server associated with the Grid Portal.

If the Grid administrator for the campus approves the user, he/she clicks on a link in the message he/she has received and is taken to the Campus Grid Portal, where the Grid administrator authenticates and then clicks on a button to indicate that he/she has approved the user. This automatically creates the GridSphere account on that portal and causes the UC Register Client to communicate to the UC Register Service on the UC Grid Portal that the user has been processed and approved. The UC Register Service takes the following actions:

- Removes the pending mark from the user's record in its database.
- Creates and signs the certificate for the user.
- Pushes the certificate to the UC MyProxy server.
- Pushes the certificate to the Campus MyProxy server.

The user can now log in to both the Campus Grid Portal and the UC Grid Portal.
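To make the gridmap record concrete, the sketch below builds a line in the general form used by Globus grid-mapfiles, pairing a certificate DN with a local login id. The DN and login id shown are hypothetical; the exact DN fields are whatever the UC Grid CA puts into its certificates.

    // Hypothetical illustration of the DN-to-local-login-id record that a
    // cluster administrator adds for a new Cluster User.
    class GridmapEntry {
        static String format(String distinguishedName, String localLoginId) {
            // Globus grid-mapfile lines generally take the form: "DN" localid
            return "\"" + distinguishedName + "\" " + localLoginId;
        }

        public static void main(String[] args) {
            String dn = "/O=University of California/OU=UC Grid/CN=Jane Researcher";  // hypothetical DN
            System.out.println(format(dn, "jresearch"));
            // Prints: "/O=University of California/OU=UC Grid/CN=Jane Researcher" jresearch
        }
    }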
D. What Happens When a Cluster User Has Login Ids on Clusters Belonging to Different Campuses

There will always be users who have login ids on clusters that are physically located on different campuses. For example, the CNSI is a joint institute that maintains clusters on both the UCLA and UCSB campuses. Suppose Professor A, a member of the CNSI, is at UCLA and has access to all the CNSI clusters. He can start by becoming a Cluster User of the UCLA Grid. Once Professor A has his certificate and can log in to both the UCLA and UC Grid Portals, he can go to the UCSB Grid Portal, authenticate by entering his Grid username and passphrase, and add the CNSI cluster at UCSB to the clusters he can access. The act of adding the cluster will also add his GridSphere account at the UCSB Grid Portal, thus enabling him to log on to that portal. The workflow for adding a cluster at a different campus is shown in the two figures below.

Professor A can use the CNSI clusters at UCLA from the UCLA Grid Portal, the CNSI clusters at UCSB from the UCSB Grid Portal, or all the CNSI clusters, as well as any other clusters he has access to on any of the campuses, from the UC Grid Portal.

Figure: Adding a Cluster on Another Campus, Part 1

Figure: Adding a Cluster on Another Campus, Part 2

V. Pools

UGP currently incorporates resource pools. To summarize the current state of pools:

- Clusters can contribute resources and applications to the resource pool on each Grid.
- All users of a Grid Portal can use the pooled resources and applications available there by submitting jobs to the resource pool.
- When a job is submitted to the resource pool, the Grid Portal selects the cluster on which the job is to run and stages all the input files for that job over to that cluster.

To make pools work, each cluster that contributes to the pool must:

- Create a guest login id under which the pool jobs run.
- Put the guest login id in its gridmap file.
- Create a GridSphere account and certificate for that guest login id.
- Provide a mechanism that allocates resources to pool jobs in a way that prevents pool jobs from going over the allotment of resources that that cluster has contributed to the resource pool (a sketch of one such mechanism appears at the end of this section).

The use of pools provides a mechanism for automatically spreading work around to available resources and for equalizing and balancing loads. Pools also provide a mechanism for sharing application software.

In the future we plan to implement specialty pools in addition to campus pools. The cluster administrators of different clusters on a campus will be able to join their clusters together in a specialty pool by contributing resources and applications to that pool. Likewise, cluster administrators of clusters on different campuses can form resource pools at the UC Grid level. The clusters forming a specialty pool must assign someone to act as the pool administrator for that pool. As before, all users of a Grid Portal will belong to that portal's default pool, but to join a specialty pool, a user will have to perform an "Add a Pool" operation and be approved by the pool administrator.

The figure below shows how specialty pools will work. In it we see clusters from UCLA and the TeraGrid contributing resources and applications to the default resource pool at the UCLA Grid, and clusters from UCLA and UCI contributing resources and applications to the default pool at the UC Grid. Two specialty pools have also been formed at UCLA and one at UC.

If Professor B is a Pool-Only User from UCLA, she can use Matlab, Amber, Q-Chem, Mathematica, and Fluent when logged onto the UCLA Grid Portal. A Mathematica job she submits could run on either of the two clusters that are contributing Mathematica to the default pool at UCLA. Likewise, an Amber job could run either at UCLA or on a cluster at the TeraGrid. When Professor B logs onto the UC Grid Portal, she has her choice of Matlab, Amber, Q-Chem, or Mathematica, but if she submits a Mathematica job, it will run on a cluster at UC Irvine.

Professor A is part of the CNSI. He is also in one of the specialty pools at UCLA and the specialty pool on the UC Grid. If he logs onto the UC Grid Portal, he can submit a pool job that runs any of the following applications: Matlab, Amber, Q-Chem, Mathematica, Gromacs, Jaguar, or Vasp. He cannot submit a Vasp pool job from the UCLA Grid Portal, as Vasp is contributed to the resource pool at UC by a CNSI cluster from UCSB.

Figure: Specialty Pools
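The last requirement in the list above, keeping pool jobs within the cycles a cluster has contributed, could be approached as in the sketch below. This is not part of UGP or any scheduler; the class and the slot-counting policy are hypothetical, and in practice a cluster would enforce the limit through its own scheduler's queue or quota configuration.

    import java.util.concurrent.atomic.AtomicInteger;

    // Hypothetical guard that keeps concurrent pool jobs within the share of
    // a cluster contributed to the resource pool.
    class PoolAllotment {
        private final int contributedSlots;
        private final AtomicInteger runningPoolJobs = new AtomicInteger(0);

        PoolAllotment(int contributedSlots) {
            this.contributedSlots = contributedSlots;
        }

        boolean tryStartPoolJob() {
            int current;
            do {
                current = runningPoolJobs.get();
                if (current >= contributedSlots) {
                    return false;           // contributed allotment is fully in use
                }
            } while (!runningPoolJobs.compareAndSet(current, current + 1));
            return true;
        }

        void finishPoolJob() {
            runningPoolJobs.decrementAndGet();
        }
    }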
VI. Future Phases

The following have to be addressed in future phases:

- Augmentation of UC's research technology infrastructure in an interoperable fashion to facilitate the sharing of resources.
- Strategies to manage resource allocation within the Grid Architecture and the UGP software to optimize performance and utilization.
- Addressing needs that are not met by the proposed Grid Architecture and the UGP software during Phase 1; also, addressing the needs of K-12 education for the creation of a California Grid.
- Funding models to sustain these services.

A. Augmentation of UC's Research Technology Infrastructure

The UC Grid is an infrastructure-enabling technology and framework that promotes resource sharing and pooling, and cross-discipline and cross-campus collaboration, in addition to being a method for efficiently planning and utilizing computational resources within the UC system. It is envisioned that the UC Grid infrastructure will enhance the ability of UC researchers to win grants that require a robust cyberinfrastructure, because much of the complexity and many of the barriers to access in using HPC resources can be eliminated through the Grid infrastructure. Additionally, time otherwise required to maintain HPC resources could be significantly reduced or eliminated from grant proposals. Through the UC Grid, access to a significant amount of computing power and data storage, along with access to advanced visualization capabilities, can provide tools to researchers that significantly enhance the quality of grant submittals and subsequent grant activities. Researchers who could not otherwise afford or justify their own personal clusters will be able to access cluster resources through Grid pools. Campuses can plan enhancements to their infrastructure more effectively by better understanding their requirements through the monitoring of Grid usage and investing accordingly. Ultimately, the Grid is a way to allow local control of resources while still maintaining campus-level visibility and resource sharing.

B. Optimizing Performance and Utilization

After the Phase 1 implementation, the developers at ATS plan to monitor and keep statistics on the load on each of the clusters, which cluster has been selected to run particular pool jobs, the ratio of pool jobs to cluster jobs submitted, the turnaround time for both cluster and pool jobs, etc. They will use the statistics that they gather to adjust the algorithms used by UGP when selecting a cluster to run each specific pool job. In addition, the UCRCG will use the statistics to address such issues as educating Cluster Users on whether cluster jobs or pool jobs are more advantageous, creating publicity about the Grid for potential pool users, and deciding whether the Campus Grid/UC Grid model to be implemented in Phase 1 is working optimally or whether it needs to be adjusted.

C. Addressing Needs and Possible Extensions to the Grid Architecture and UGP

After Phase 1, work will have to be undertaken to query researchers about unfulfilled needs. Starting now, the committee can talk with those behind the California Grid for K-12 education to determine what is envisioned for that grid and what needs that grid will address, and see how the capabilities required mesh with what the architecture described in this white paper and the UGP currently supply. There are areas that the current Grid Architecture and UGP either don't address or address poorly. These may have to be addressed in the future according to the needs assessment. They are discussed here in no particular order.
i. UGP is insufficient for very large files unless high-bandwidth networks, such as 10G Ethernet, are universally deployed across the UC system. The insufficiencies are as follows:

- In order to run a job on a cluster, the data for that job must be on that cluster. Cluster Users are responsible for using the tools provided in the UGP Data Manager to move the files to a particular cluster. For pool jobs, the Grid Portal must stage the data over to the target cluster as part of job submission. For very large files, 1) the stage-in may take a very long time, 2) the target cluster may not have enough disk space for the staged-in file(s), and 3) by the time the data is staged in, the cluster selected to run the job may no longer be the optimum cluster to run the job.
- Currently, the only way to retrieve the output of a pool job is to download it to the machine on which the user is running his/her web browser, i.e., to the user's desktop. That machine may not have enough disk space for the output if the output is very large, and the download may take a very long time.
- To visualize a result, the file has to be copied to the ION Visualization Server. If the file is very big, this could take an unacceptably long time.
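A rough transfer-time calculation illustrates why network bandwidth dominates the large-file problem described above. The file size and link speeds below are illustrative assumptions, not measurements from any UC network.

    // Back-of-the-envelope stage-in time for a large file at two link speeds.
    class StageInEstimate {
        static double hoursToTransfer(double fileGigabytes, double linkGigabitsPerSecond) {
            double gigabits = fileGigabytes * 8.0;          // bytes to bits
            double seconds = gigabits / linkGigabitsPerSecond;
            return seconds / 3600.0;
        }

        public static void main(String[] args) {
            // Hypothetical 500 GB data set, ignoring protocol overhead and contention.
            System.out.printf("1 Gb/s:  %.1f hours%n", hoursToTransfer(500, 1));   // about 1.1 hours
            System.out.printf("10 Gb/s: %.1f hours%n", hoursToTransfer(500, 10));  // about 0.1 hours
        }
    }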
ii. The visualization currently provided is limited. The limitations are as follows:

- Visualization currently provided includes:
  o The ability to view images.
  o The ability to visualize chemistry results.
  o The ability to visualize 2D and 3D rectilinearly gridded data.
  While this satisfies the needs of many researchers, not all kinds of data are addressed. Also, multiple-time-step data files are not addressed. Currently, to visualize other kinds of data (scattered data, data on non-rectangular grids, medical data, etc.) in the Grid environment would require the user to run some visualization program in batch and save the images that result. The Grid Portal could display images created this way.
- All interactive, through-the-web visualization is done on a single Visualization Server. Very large data would require a cluster dedicated to visualization for appropriate performance.
- Highly interactive, VR-style visualization is not supported.

iii. The Grid Architecture and the UGP are designed to provide for batch computing only. Batch computing is built directly into the design of the Grid Architecture and the UGP: in the UCLA Grid Architecture, the Grid Portal communicates with a Grid Appliance, which submits a job directly to the scheduler running on the cluster to which the Grid Appliance is attached. Many interactive applications, such as Matlab, Mathematica, Tecplot, etc., present the user with a GUI when run interactively and can also be run in batch by providing a command file. It is much more convenient and intuitive to run these applications via their GUIs, and much easier to make a mistake when creating a command file for a batch run. Using these interactive applications in batch is inconvenient and may not be viable for a K-12 user community. On the other hand, we should stress that the Grid is provided for applications that have large compute requirements. These are normally parallel applications. If an application can run on a local desktop machine, it is best to make use of the interactivity of the local desktop, and use of the Grid is not necessary.

iv. The Grid Architecture and the UGP are designed to run each job on a single cluster. Running jobs that span multiple clusters is possible, but there are several factors that would limit this capability:

- Latency and processor performance differences between clusters would prohibit anything but serial jobs, or parallel jobs that spawn multiple serial jobs, from being run across multiple clusters. Running tightly coupled jobs across clusters would result in the inefficient use of compute resources.
- Differences in processor architecture among clusters would require "smart queues" to ensure that the architecture a given code was compiled on is matched to the same architecture at runtime.
- Uniform deployment of compilers and associated libraries would be required in pooled clusters to ensure compatibility for cross-cluster job submittal.
- Firewalls and internal networks used for cluster compute nodes would prohibit the inter-node communication required for tightly coupled parallel jobs.
- Schedulers such as the Sun N1 Grid Engine (SGE) are for scheduling jobs on a single cluster. A metascheduler such as Condor would be needed to implement multi-cluster scheduling.

v. Support is only for compute clusters. The UGP architecture currently provides support for compute clusters only. Running instruments through the Grid, sharing large databases through the Grid, and other activities are currently not supported, but they have been demonstrated with other Grid implementations such as the Astro Portal Stacking Service used with the Sloan Digital Sky Survey [8], the Earth System Grid [9], and the International Virtual Data Grid Laboratory [10].

Each of these deficiencies may or may not need to be addressed, but that can only be determined after a needs assessment has been performed.

D. Funding Models

The proposed UC Grid architecture is such that resources related to a given Grid component can be easily identified and funded discretely. Additionally, an individual campus Grid capability can be grown in an evolutionary fashion based on utilization.
An estimate of cost, suggested architecture, and funding sources associated with deployment of the UC Grid and Campus Grids, along with the recommended funding, are as follows:

- UC Grid: The UC Grid architecture requires four systems as a minimum to ensure uptime and performance. The equipment consists of an Apache web system, two Tomcat/MySQL/UGP systems, and one MyProxy system. Approximate cost is $22,000-25,000 depending on vendor and configuration. UC or pooled campus funding.
- Campus Grid: A campus Grid deployment consists of three systems as a minimum. Note that there is no redundancy in this configuration. The equipment consists of an Apache web/Tomcat/MySQL/UGP system, a MyProxy system, and a Storage Server. Approximate cost is $12,000-25,000 depending on vendor and configuration (particularly the storage server). Campus-level funding.
- Appliance Node: Each cluster connected to a Campus Grid (and the UC Grid) requires an appliance node. Approximate cost is $900-1,100 depending on vendor and configuration. Campus or grant funding.

In addition to the system recommendations above, a small percentage of FTE time is required to handle UC and Campus Grid administrative tasks; this would be funded by the respective organization. It is expected that the initial Grid startup will require a larger time commitment than the sustaining effort necessary to maintain the Grid once it is in operation. The FTE time required varies according to the number of users, usage of the Grid, robustness of the Grid systems, and other factors, and can range from a small fraction of an FTE up to roughly 0.25 FTE. Startup time would be on the higher end of this range but should only last for 2-4 weeks. Additionally, there is a requirement for individual cluster administrators to enable Grid access for their cluster users; the time required to perform this task is on the order of minutes per user. The assumption is that a Programmer Analyst level III or IV will perform these tasks, depending on the complexity and usage of the Grid installation.

References

1. http://www.globus.org/toolkit/docs/4.0/security
2. http://www.globus.org
3. http://www.globus.org/toolkit/citations.html
4. http://www.gridsphere.org/gridsphere/gridsphere
5. https://bpcatalog.dev.java.net/nonav/ajax/index.html
6. http://www.mysql.com
7. http://grid.ncsa.uiuc.edu/myproxy
8. http://www.sdss.org/
9. http://www.earthsystemgrid.org/
10. http://www.ivdgl.org/

UCLA Grid Portal: https://grid.ucla.edu

For the Globus Toolkit in general:

- Globus Toolkit Version 4: Software for Service-Oriented Systems. I. Foster. IFIP International Conference on Network and Parallel Computing, Springer-Verlag LNCS 3779, pp. 2-13, 2005. This paper is an excellent introduction to the Globus Toolkit 4.0 and its use.
- Globus: A Metacomputing Infrastructure Toolkit. I. Foster, C. Kesselman. Intl. J. Supercomputer Applications, 11(2):115-128, 1997.
- A Globus Toolkit Primer. I. Foster, 2005.

For Grid concepts and architecture:

- The Anatomy of the Grid: Enabling Scalable Virtual Organizations. I. Foster, C. Kesselman, S. Tuecke. International J. Supercomputer Applications, 15(3), 2001.
- The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration. I. Foster, C. Kesselman, J. Nick, S. Tuecke, 2002.
- The Grid: Blueprint for a New Computing Infrastructure. I. Foster and C. Kesselman (Eds.), Morgan Kaufmann, 2005.