Spatial Cloud Computing: A Practical Approach

Chaowei Yang
Qunying Huang

With the collaboration of
Zhenlong Li
Chen Xu
Kai Liu

Boca Raton London New York
CRC Press is an imprint of the Taylor & Francis Group, an informa business

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2014 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Version Date: 20131017

International Standard Book Number-13: 978-1-4665-9317-6 (eBook - PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged, please write and let us know so we may rectify it in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

Contents

Preface
Acknowledgments

Part I Introduction to cloud computing for geosciences

1 Geoscience application challenges to computing infrastructures
1.1 Challenges and opportunities for geoscience applications in the 21st century
1.1.1 Energy
1.1.2 Emergency response
1.1.3 Climate change
1.1.4 Sustainable development
1.2 The needs of a new computing infrastructure
1.2.1 Providing enough computing power
1.2.2 Responding in real time
1.2.3 Saving energy
1.2.4 Saving the budget
1.2.5 Improving accessibility
1.3 The birth of cloud computing
1.3.1 Distributed computing
1.3.2 On-demand services
1.3.3 Computing sharing and cost savings
1.3.4 Reliability
1.3.5 The emergence of cloud computing
1.4 The advantages and disadvantages of cloud computing for geoscience applications
1.4.1 Advantages of cloud computing
1.4.2 Problems
1.5 Summary
1.6 Problems
References

2 Cloud computing architecture, concepts, and characteristics
2.1 Concepts
2.2 Cloud computing architecture
2.3 Characteristics
2.4 Service models
2.5 Deployment models and cloud types
2.6 Review of cloud computing resources
2.6.1 Commercial clouds
2.6.2 Open-source cloud solutions
2.7 Summary
2.8 Problems
References

3 Enabling technologies
3.1 Hardware advancements
3.1.1 Multicore and many-core technologies
3.1.2 Networking
3.1.3 Storage
3.1.4 Smart devices
3.2 Computing technologies
3.2.1 Distributed computing paradigm
3.2.2 Computing architecture model
3.3 Virtualization
3.3.1 Virtualization implementation
3.3.2 Virtualization solutions
3.4 Distributed file system
3.4.1 Introduction to the distributed file system
3.4.2 Google File System
3.4.3 Apache Hadoop Distributed File System
3.5 Web x.0
3.5.1 Web services
3.5.2 Service-oriented architecture
3.6 Conclusion
3.7 Summary
3.8 Problems
References

Part II Deploying applications onto cloud services

4 How to use cloud computing
4.1 Popular cloud services
4.1.1 Introduction
4.1.2 Amazon AWS and Windows Azure
4.2 Use case: A simple Web Application
4.2.1 HTML design for the Hello Cloud Web application
4.2.2 Web servers
4.3 Deploying the Web application onto cloud services
4.3.1 Amazon Web Services
4.3.2 Windows Azure
4.4 Conclusion and discussion
4.5 Summary
4.6 Problems
References

5 Cloud-enabling geoscience applications
5.1 Common components for geoscience applications
5.1.1 Server-side programming
5.1.2 Database
5.1.3 High performance computing
5.2 Cloud-enabling geoscience applications
5.3 Use cases
5.3.1 Database-driven Web applications
5.3.2 Typical HPC applications
5.4 Summary
5.5 Problems
References

6 How to choose cloud services: Toward a cloud computing cost model
6.1 The importance and challenges of selecting cloud services
6.2 The factors impacting cloud service selection
6.2.1 Cloud service capacity provisioning and measurements
6.2.2 Cloud platform pricing rules
6.2.3 Application features and requirements
6.3 Selecting cloud services using the Earth Science Information Partners (ESIP) Cloud Adoption Advisory Tool as an example
6.3.1 Architecture of the advisory tool
6.3.2 The general workflow for cloud service selection
6.3.3 Use case
6.4 In-depth considerations in cloud service selection and the development of advisory tools
6.4.1 Correctness and accuracy of evaluation models
6.4.2 Up-to-date information of cloud services
6.4.3 Interactivity and visualization functions of the advisory tool
6.5 Summary
6.6 Problems
References

Part III Cloud-enabling geoscience projects

7 ArcGIS in the cloud
7.1 Introduction
7.1.1 Why a geographical information system needs the cloud
7.1.2 GIS examples that need the cloud
7.2 ArcGIS in the cloud
7.2.1 ArcGIS Online
7.2.1.1 Functionalities
7.2.2 ArcGIS for Server
7.2.2.1 Functionalities
7.2.3 GIS software as a service
7.2.3.1 Functionalities
7.2.4 Mobile GIS service
7.2.5 Section summary
7.3 Use cases
7.3.1 Regional analysis of Oregon using ArcGIS Online
7.3.2 Use cases of ArcGIS for Server
7.3.2.1 Brisbane City Council Flood Common Operating Picture
7.3.2.2 Pennsylvania State Parks Viewer
7.3.3 Section summary
7.4 Summary
7.5 Problems
References

8 Cloud-enabling GEOSS Clearinghouse
8.1 GEOSS Clearinghouse: Background and challenges
8.1.1 Background
8.1.2 Challenges
8.2 Deployment and optimization
8.2.1 General deployment workflow
8.2.2 Special considerations
8.2.2.1 Data backup
8.2.2.2 Load balancing
8.2.2.3 Auto-scaling
8.2.3 The differences from the general steps in Chapter 5
8.3 System demonstration
8.3.1 Local search
8.3.2 Remote search
8.4 Conclusion
8.4.1 Economic advantages
8.4.2 Technical advantages
8.5 Summary
8.6 Problems
Appendix 8.1 Template for creating an auto-scaling function
References

9 Cloud-enabling Climate@Home
9.1 Climate@Home: Background and challenges
9.1.1 Background
9.1.2 Challenges
9.2 Deployment and optimizations
9.2.1 General deployment workflow
9.2.1.1 Deploying the spatial Web portal
Figure 4.1 Search trends among Amazon Web Services (AWS) and Windows Azure (series: Amazon AWS, Windows Azure; x-axis: month, 2008-11 to 2013-01)
Figure 5.1 General steps for cloud-enabling geoscience applications
Figure 5.2 The procedure for deploying the Drupal site onto EC2 (blue boxes indicate the additional steps for the deployment)
Figure 5.9 The process for configuring an HPC system to run the DEM interpolation on EC2 (blue boxes indicate the additional steps for configuring a virtual HPC environment)
Figure 6.5 (a) Minimum Fee and (b) Maximum Fee charts (fee components per solution: VM fee, storage fee, data transfer fee)
Figure 6.6 Virtual machine configuration comparison (CPU cores, CPU units, CPU speed (GHz), RAM (GB), bandwidth (Gbps), local disk (100 GB))
Figure 7.1 Regional Analysis of Oregon map
Figure 7.2 Base maps and data offered by ArcGIS Online
Figure 7.3 Symbol modification of the Primary and Secondary Roads
Figure 8.6 Search results for global "Annual Sum NDVI Annual Rainfall."
Figure 8.7 Search results for global "Annual Sum NDVI Annual Rainfall" from the GEO Portal
Figure 9.6 Seasonal mean analysis using model outputs
Figure 9.7 Visualizing volunteered computing nodes with Google Maps
Figure 10.4 Create a placement group and place the instances within the same group
Figure 10.5 Photograph of a haboob that hit Phoenix, Arizona, on July 1, 2007 (Courtesy of Osha Gray Davidson, Mother Jones, San Francisco, CA at http://www.motherjones.com/blue-marble/2009/09/australias-climate-chaos.)
Figure 10.6 Comparison of the simulation results by ETA-8bin and NMM-dust for AOI 10, 11, 12, and 13 at … a.m., July 2, 2007 (panels: ETA-8bin, 50 km GMU-ETA-8bin dust load (g/m2); NMM-dust, 22 km GMU-NMM-dust dust load (g/m2); initial time 2007.07.01 00 UTC, valid time +24 h, 2007.07.02 00 UTC)
Figure 10.7 AOI distribution identified by the ETA-8bin (x-axis: width, longitude degree; y-axis: length, latitude degree)
Figure 12.4 EC2 scalability with up to 1, 2, and … instances (x-axis: concurrent number; y-axis: average response time (s))
Figure 13.2 Architecture of CloudStack (Adapted from CloudStack Architecture at http://www.slideshare.net/cloudstack/cloudstack-architecture.)
Figure 13.3 Architecture of Eucalyptus (See Eucalyptus at http://en.wikipedia.org/wiki/File:Eucalyptus_Platform_Architecture,_February_2013.jpg.)
Figure 13.4 (a) OpenNebula internal architecture (see OpenNebula architecture at http://opennebula.org/documentation:archives:rel2.0:architecture) and (b) interfaces (see OpenNebula scalable architecture at http://opennebula.org/documentation:rel3.8:introapis)
Figure 15.1 GeoCloud Goals, Activities, and Outcomes (goals: Architectures, Cost Models, Certification, Comparisons)
Figure 15.2 GeoCloud architectural framework (layers: Applications, Platform as a Service (PaaS), Infrastructure)
Figure 15.4 GeoCloud platform creation and customization for federal geospatial applications (From Nebert, 2010, www.fgdc.gov/ngac/meetings/december-2010/geocloud-briefing.pptx.)
Figure 16.5 Global content delivery to handle global user access
Figure 16.7 Spatial indexing to improve the access speed (query time (s) versus feature number, 1,000 to 10,000,000, for no-index and R-tree queries)

… for readers who wish to get a sense of spatial cloud computing, adopt cloud computing for their applications, or conduct further research in spatial cloud computing.

HOW DID WE WRITE THE BOOK?

In … knowledge of spatial cloud computing through practical examples in 17 chapters from aspects including: (a) What are the essential cloud computing concepts, and why do geosciences need cloud computing?