This document is created with a trial version of CHM2PDF Pilot http://www.colorpilot.com

Designing Enterprise Solutions with Sun™ Cluster 3.0
By Richard Elling, Tim Read

Publisher: Prentice Hall PTR
Pub Date: December 01, 2001
ISBN: 0-13-008458-1
Pages: 302

Designing Enterprise Solutions with Sun Cluster 3.0 is an introduction to architecting highly available systems with Sun servers, storage, and the Sun Cluster 3.0 software. Three recurring themes are used throughout the book: failures, synchronization, and arbitration. These themes occur throughout all levels of system design. The first chapter deals with understanding these relationships and recognizing failure modes associated with synchronization and arbitration. The second and third chapters review the building blocks and describe the Sun Cluster 3.0 software environment in detail. The remaining chapters discuss management servers and provide hypothetical case studies in which enterprise solutions are designed using Sun technologies. Appendices provide a checklist for designing clustered solutions, additional information on Sun technologies used in many different types of clusters, guidelines for data center design best practices, and a brief description of some failure analysis tools used by Sun systems designers and architects.

Table of Contents

Copyright
Figures
Tables
Preface
    Sun BluePrints Program
    Who Should Use This Book
    Before You Read This Book
    How This Book Is Organized
    Ordering Sun Documents
    Accessing Sun Documentation Online
    Related Books
    Typographic Style
    Shell Prompts in Command Examples
Acknowledgements
    Richard Elling
    Tim Read
Chapter 1 Cluster and Complex System Design Issues
    Business Reasons for Clustered Systems
    Failures in Complex Systems
    Data Synchronization
    Arbitration Schemes
    Data Caches
    Timeouts
    Failures in Clustered Systems
    Summary
Chapter 2 Enterprise Cluster Computing Building Blocks
    Data Repositories and Infrastructure Services
    Business Logic and Application Service
    User Access Services: Web Farms
    Compute Clusters
    Technologies for Building Distributed Applications
Chapter 3 Sun Cluster 3.0 Architecture
    System Architecture
    Kernel Infrastructure
    System Features
    Cluster Failures
    Synchronization
    Arbitration
Chapter 4 Management Server
    Design Goals
    Services
    Console Services
    Sun Ray Server
    Sun StorEdge SAN Surfer
    Sun Explorer Data Collector
    Sun Remote Services
    Software Stack
    Hardware Components
    Network Configuration
    Systems Management
    Backup, Restore, and Recovery
    Summary
Chapter 5 Case Study 1—File Server Cluster
    Firm Description
    Design Goals
    Cluster Software
    Recommended Hardware Configuration
    Summary
Chapter 6 Case Study 2—Database Cluster
    Company Description
    Information Technology Organization
    Design Goals
    Business Case Requirements
    Design Priorities
    Cluster Software
    Recommended Hardware Configuration
    Summary
Appendix A Sun Cluster 3.0 Design Checklists
    Business Case Considerations
    Personnel Considerations
    Top-Level Design Documentation
    Environmental Design
    Server Design
    Shared Storage Design
    Network Design
    Software Environment Design
    Security Considerations
    Systems Management Requirements
    Testing Requirements
Appendix B Sun Cluster Technology History And Perspective
    SPARCcluster PDB x and SPARCcluster HA x History
    Sun Cluster x
    Sun Cluster 2.2 and 3.0 Feature Comparison
Appendix C Data Center Guidelines
    Hardware Platform Stability
    Server Consolidation in a Common Rack
    System Component Identification
    AC / DC Power
    System Cooling
    Network Infrastructure
    Security
    System Installation and
      Configuration Documentation
    Change Control Practices
    Maintenance and Patch Strategy
    Component Spares
    New Release Upgrade Process
    Support Agreement and Associated Response Time
    Backup-and-Restore Testing
    Cluster Recovery Procedures
    Summary
Appendix D Tools
    Fault Tree Analysis
    Reliability Block Diagram Analysis
    Failure Modes and Effects Analysis
    Event Tree Analysis
Acronyms, Abbreviations, and Glossary
    A B C D E F G H I J K L M N O P Q R S T U V W X
Bibliography

Copyright © 2002 Sun Microsystems, Inc., 901 San Antonio Road, Palo Alto, CA 94303-4900 U.S.A. All rights reserved.

This product or document is distributed under licenses restricting its use, copying, distribution, and decompilation. No part of this product or document may be reproduced in any form by any means without prior written authorization of Sun and its licensors, if any. Third-party software, including font technology, is copyrighted and licensed from Sun suppliers.

Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S. and other countries, exclusively licensed through X/Open Company, Ltd.

Sun, Sun Microsystems, Sun BluePrints, SunUP, the Sun logo, AnswerBook, AnswerBook2, DLT, docs.sun.com, IPC, Solaris, Solstice Backup, Trusted Solaris, SunDocs, Sun Quad FastEthernet, SunFastEthernet, Sun StorEdge, SunPlex, OpenBoot, Sun Enterprise, Sun Enterprise Network Array, Sun Enterprise SyMON, Sun Fire, Sun HighGround, Starfire, iPlanet, Netra, SunTone, JumpStart, Solstice, Solstice DiskSuite, Solstice Backup, Solstice SyMON, Ultra Enterprise, Java, Jiro, JavaServer Pages, JSP, J2EE, JDBC, Sun Ray, Sun RSM Array, SunSwift, Enterprise JavaBeans, EJB, Sun HPC ClusterTools, SPARCcenter, SPARCcluster, SPARCserver, SPARCstorage, Sun Professional Services, SunSolve, SPARCcluster, PDB, Prism, RSM, and Write Once, Run Anywhere are
trademarks, registered trademarks, or service marks of Sun Microsystems, Inc. in the United States and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc. CMS is a trademark or registered trademark of Eastman Kodak Company in the United States and other countries. ORACLE is a registered trademark of Oracle Corporation. Netscape is a trademark or registered trademark of Netscape Communications Corporation in the United States and other countries. Legato NetWorker is a registered trademark of Legato Systems, Inc. Adobe is a registered trademark of Adobe Systems, Incorporated.

The OPEN LOOK and Sun Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun's licensees who implement OPEN LOOK GUIs and otherwise comply with Sun's written license agreements.

Federal Acquisitions: Commercial Software—Government Users Subject to Standard License Terms and Conditions.

DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.

Editorial/production supervisor: Nicholas Radhuber
Cover design director: Jerry Votta
Cover designer: Kavish & Kavish Digital Publishing Design
Manufacturing manager: Alexis R. Heydt
Marketing manager: Debby vanDijk
Acquisitions
manager: Gregory G. Doench

Sun Microsystems Press
Marketing manager: Michael Llwyd Alread
Publisher: Rachel Borden

Figures

FIGURE 1-1  HA System Failure Response
FIGURE 1-2  Cache Latency Versus Cost
FIGURE 1-3  Nested Timing Diagram—Example
FIGURE 1-4  Nested Timing Diagram With Timeout
FIGURE 1-5  Stable System With Timeout
FIGURE 1-6  Unstable System With Timeout
FIGURE 2-1  SAP R/3 Application Architecture
FIGURE 2-2  Multitiered Model
FIGURE 2-3  iPlanet Application Server Overview
FIGURE 2-4  iPlanet Application Server Processes
FIGURE 2-5  Web Hosting Service Ladder Approach—Load Balanced and Load Balancing
FIGURE 2-6  Sun Parallel File System, High-Level View
FIGURE 3-1  I/O Overhead of Campus Clustering Versus Replication
FIGURE 3-2  Sun Cluster 3.0 System Hardware Diagram
FIGURE 3-3  Relationship Between Sun Cluster Components
FIGURE 3-4  Mini-Transaction Replicated Object Sequence
FIGURE 3-5  Clustered Pair Topology
FIGURE 3-6  N+1 Topology
FIGURE 3-7  Pair+M Topology
FIGURE 3-8  Local and Dual-Hosted Devices
FIGURE 3-9  DID Numbering for a Three-Node Cluster
FIGURE 3-10  Global File Service Architecture
FIGURE 3-11  CFS Read Mechanism
FIGURE 3-12  CFS Write Mechanism
FIGURE 3-13  Four-Node Cluster With Two Subnets and Two Global IP Addresses
FIGURE 3-14  Switch-Connected and Back-to-Back Private Interconnects
FIGURE 3-15  Failover Resource Group Configuration—Example
FIGURE 4-1  SRS High-Level Architectural Block Diagram
FIGURE 4-2  Environment Logical Connectivity
FIGURE 4-3  Management Server Network Configuration
FIGURE 5-1  NFS Cluster Logical Configuration
FIGURE 5-2  NFS Cluster Network Configuration Options
FIGURE 5-3  NFS Cluster Rack Configuration
FIGURE 5-4  Expected Backup Times
FIGURE 6-1  Top-Level Architecture for Oracle 9i RAC Database System
FIGURE 6-2  Active Users
FIGURE 6-3  Oracle 9i RAC
      Architecture
FIGURE 6-4  Oracle 9i RAC Logical Cache
FIGURE 6-5  Two-Node Cache Fusion
FIGURE 6-6  Sun Cluster Configuration
FIGURE 6-7  Sun StorEdge D240 Media Tray—Front View
FIGURE 6-8  Sun StorEdge D240 Media Tray—SCSI Bus Configurations
FIGURE 6-9  Production Server Boot Disk Environment
FIGURE 6-10  Multiple Failure Scenario
FIGURE 6-11  Alternate Design for Production Server Boot Disk Environment
FIGURE 6-12  The Company's Network Design
FIGURE 6-13  Sun Fire Cabinet Power Distribution
FIGURE B-1  Sun Cluster 2.2 Logical Host
FIGURE C-1  Data Center Floor Plan Grid Diagram—System xyz Is Located at Grid Coordinate b4
FIGURE D-1  AND Gate Structure Analysis
FIGURE D-2  OR Gate Structure Analysis
FIGURE D-3  Functional Block Structure Analysis
FIGURE D-4  Fault Tree Analysis of Boot, Root, and Swap Disk Subsystem
FIGURE D-5  Mirrored Disk Subsystem Reliability Block Diagram
FIGURE D-6  Mirrored Sun StorEdge D240 Media Tray—Event Tree Analysis

Tables

TABLE 1-1  Common 10BASE-T Ethernet Failure Modes
TABLE 1-2  Reported and Correctable Errors
TABLE 3-1  Global File Service Benefits for Application Executables
TABLE 3-2  CFS and NFS Differences
TABLE 3-3  Resource Type Properties
TABLE 5-1  Major Feature Priorities
TABLE 5-2  High-Level Features of NFS Versions and
TABLE 5-3  Management Server and Support Components Parts List
TABLE 5-4  NFS Cluster Node Parts List
TABLE 5-5  NFS Cluster Node Network Port Assignments
TABLE 6-1  Planned Service Schedule
TABLE 6-2  Major Feature Priorities
TABLE 6-3  Oracle 9i RAC Cluster Software Stack
TABLE 6-4  Sun Fire 4800 Server Node Parts List
TABLE 6-5  Sun Fire 4800 Node I/O Configuration
TABLE 6-6  Oracle 9i RAC Shared Storage Parts List
TABLE 6-7  Sun StorEdge T3 Array With Sun Cluster Configurations
TABLE B-1  Sun Cluster 2.2 and 3.0 Terminology Differences
TABLE D-1  FMEA Table
      Example
TABLE D-2  Key for FMEA Severity, Likelihood, and Detection

Q

Quorum    The number, usually a majority of officers or members of a body, that when duly assembled is legally competent to transact business.

R

RAC    Oracle 9i Real Application Cluster
RAM    random access memory
Reliability    An abstract term defined as the probability that a product or system performs its intended function for a specified time period when operating under normal environmental conditions. Reliability differs from availability in that reliability involves only one event, failure, whereas availability takes into account two events: failure and recovery. A system can be highly available yet experience frequent periods of inoperability as long as the length of each period is short.
RBAC    role-based access control
RBD    reliability block diagram
RDBMS    relational database management system
RM-API    Resource Management API
RM    replica managers
RMA    replica manager agents
RMM    replica manager manager
RPN    risk priority number
RSC    remote service controller
RSM    Remote Shared Memory
RTR    resource type registration
RTS    redundant transfer switch
RTU    redundant transfer unit

S

S3L    Sun Scalable Scientific Subroutine Library
SAN    storage area network
SC    system controller
SCI    Scalable Coherent Interface
SCN    system change number
SCSI    Small Computer System Interface
SCSL    Sun Community Software License
SGA    system global area
SLA    service level agreement
SMAD    switch management agent daemon
SMB    Now called the Common Internet File System
SMON    system monitor
SMP    symmetric multiprocessor
SPA    service point architecture
SPOF    single point of failure
SRS    Sun Remote Services
SSP    system service processor
SVM    Solaris Volume Manager. Formerly known as Solstice DiskSuite.
SQE    software query enable
Split brain    Condition in which a cluster forms multiple partitions, with each partition forming without knowledge of the existence of any other partition.
SRAM    static random access memory
Systems engineering    The engineering discipline concerned with the design of the whole as distinct from the design of the parts.

T

TC    terminal concentrator
TCP/IP    Transmission Control Protocol/Internet Protocol

U

UDP    User Datagram Protocol. Built on top of IP at the transport layer, UDP provides a datagram-based service.
UDLM    UNIX distributed lock manager
UFS    UNIX file system
UPS    uninterruptible power supply
UTC    universal time coordinated
UTP    unshielded twisted pair

V

Vote    A usually formal expression of opinion or will in response to a proposed decision.
VPN    virtual private network
VxFS    Veritas File System
VxVM    Veritas Volume Manager, which includes a licensable CVM

W

WAN    wide area network
WBEM    web-based enterprise management
WERO    write-exclusive, registrants only

X

XAS    executive application server
XCS    C++ server
XJS    executive Java server

Bibliography

ANOL01    Alex Noordergraaf, "Building a JumpStart Infrastructure," Sun BluePrints Online, April 2001, http://www.sun.com/blueprints
Bern98    Peter L. Bernstein, Against the Gods, the Remarkable Story of Risk, 1998, 0-47129563-9
CORBAhist    CORBA history: http://cgi.omg.org/corba/corbahistory.html
Dijkstra65    E. W. Dijkstra, "Solution of a Problem in Concurrent Programming Control," Communications of the ACM, Vol. 8, 1965
DNS    Domain Name Service: http://www.polyserve.com/support/whitepapers/wp_load_balancing.html (DNS round-robin)
Hayes98    John P. Hayes, Computer Architecture and Organization, 3rd edition, 1998, 0-07027355-3
HPcod2    David A. Patterson and John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 2nd edition, 1998, 1-55860-428-6
iMS51    iPlanet Message Server 5.1 manuals: http://docs.iplanet.com/docs/manuals/messaging/ims51/ig/unix/overview.htm#23546
JMRM00    Jim Mauro and Richard McDougall, Solaris Internals Core Kernel Architecture, Prentice Hall, 2000, 0-13-022496-0
JHAN01    John S. Howard and Alex Noordergraaf, JumpStart Technology: Effective Use in the Solaris Operating Environment, 2001, 0-13-062154-4
JXTAover    Project JXTA: A Technology Overview, by Li Gong, Sun Microsystems (whitepaper)
Lamport74    Leslie Lamport, "A New Solution of Dijkstra's Concurrent Programming Problem," Communications of the ACM, 17(8):453-455, August 1974
Laprie85    Jean-Claude Laprie, Dependable Computing and Fault Tolerance: Concepts and Terminology, 1985, IEEE 0731-3071/85/0000/0002
Lyu95    Michael R. Lyu, editor, Handbook of Software Reliability Engineering, 1995, Tupper and Love/David McKay Company
Madron89    Thomas W. Madron, LANS: Applications of IEEE/ANSI 802 Standards, 1989, 0471-62049-1
ORBimpl    ORB Implementations: http://www.yy.cs.keio.ac.jp/~suzuki/object/dist_comp.html#corba-r
PHcaqa2    John L. Hennessy and David A. Patterson, Computer Architecture: A Quantitative Approach, 2nd edition, 1996, 1-55860-329-8
PL01    Peter Lees, "Writing Scalable Service with Sun Cluster 3.0," Proceedings of the Sun Users Performance Group, April 2001, http://www.sun.com/datacenter/superg
Ramo65    Simon Ramo, "The Design of the Whole—Systems Engineering," in Listen to Leaders in Engineering, edited by Albert Love and James Saxon Childers, 1965, Library of Congress Catalog Card Number: 64-23488
SKB00    Stan Stringfellow, Miroslav Klivansky, and Michael Barto, Backup and Restore Practices for Sun Enterprise Servers, 2000, 0-13-089401-X
SunMPG99    Multithreaded Programming Guide, Sun Microsystems, Part Number 806-525710, http://docs.sun.com
SunSCE01    Enrique Vargas, Joseph Bianco, and David Deeths, Sun Cluster Environment Sun Cluster 2.2, 2001, 0-13-041870-6
SunSIG99    System Interface Guide, Sun Microsystems, Part Number 806-4750-10, http://docs.sun.com
SWuios96    W. David Schwaderer and Andrew W. Wilson, Jr., Understanding I/O Subsystems, 1st edition, 1996, 0-9651911-0-9
vahalia96    Uresh Vahalia, UNIX Internals: the New Frontiers, 1996, 0-13-101908-2
VargasOL00    Enrique Vargas, "High Availability Fundamentals," Sun BluePrints Online, November 2000, http://www.sun.com/blueprints
Vargas01    Enrique Vargas, "Cluster Platform 220/1000 Architecture—A Product from the SunTone Platforms Portfolio," Sun BluePrints Online, August 2001, http://www.sun.com/blueprints
VBD01    Sun™ Cluster Environment: Sun Cluster 2.2, Sun BluePrints™, 2001, 0-13-041870-6
Webster87    Webster's Ninth New Collegiate Dictionary, 1987, 0-87779-508-8

...
Cable unplugged    Physical    NIC    Yes, unless Software Query Enable (SQE) is enabled
Cable shorted    Physical    NIC    Yes
Cable wired in reverse polarity    Physical    NIC    Yes
Cable too long    Physical    NIC    (in...
Arwood, Joseph Bianco, Michael Byrne, Ralph Campbell, Al Clepper, David Deeths, Tom Cox, Steven Docy, Andrew Hisgen, Mark Kampe, Yousef Khalidi, Peter Lees, James MacFarlane, James McPherson, Kim...

... sufficient. Physical failures are a bounded set. They are often detected by the network interface card (NIC). However, not all physical failures can be detected by a single NIC, nor can all physical