Blueprints for High Availability
Second Edition

Evan Marcus
Hal Stern

Executive Publisher: Robert Ipsen
Executive Editor: Carol Long
Development Editor: Scott Amerman
Editorial Manager: Kathryn A. Malm
Production Editor: Vincent Kunkemueller
Text Design & Composition: Wiley Composition Services

Copyright © 2003 by Wiley Publishing, Inc., Indianapolis, Indiana. All rights reserved.

Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8700. Requests to the Publisher for permission should be addressed to the Legal Department, Wiley Publishing, Inc., 10475 Crosspoint Blvd., Indianapolis, IN 46256, (317) 572-3447, fax (317) 572-4447, E-mail: permcoordinator@wiley.com.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages. For general
information on our other products and services please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Trademarks: Wiley, the Wiley Publishing logo and related trade dress are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates in the United States and other countries, and may not be used without written permission. All other trademarks are the property of their respective owners. Wiley Publishing, Inc., is not associated with any product or vendor mentioned in this book.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Library of Congress Cataloging-in-Publication Data is available from the publisher.

ISBN: 0-471-43026-9

Printed in the United States of America

For Carol, Hannah, Madeline, and Jonathan
—Evan Marcus

For Toby, Elana, and Benjamin
—Hal Stern

Contents

Preface
  For the Second Edition
    From Evan Marcus
    From Hal Stern
  Preface from the First Edition
    From Evan Marcus
    From Hal Stern
About the Authors

Chapter 1: Introduction
  Why an Availability Book?
  Our Approach to the Problem
  What’s Not Here
  Our Mission
  The Availability Index
  Summary
  Organization of the Book
  Key Points

Chapter 2: What to Measure
  Measuring Availability
  The Myth of the Nines
  Defining Downtime
  Causes of Downtime
  What Is Availability?
  M Is for Mean
  What’s Acceptable?
  Failure Modes
  Hardware
  Environmental and Physical Failures
  Network Failures
  File and Print Server Failures
  Database System Failures
  Web and Application Server Failures
  Denial-of-Service Attacks
  Confidence in Your Measurements
  Renewability
  Sigmas and Nines
  Key Points

Chapter 3: The Value of Availability
  What Is High Availability?
  The Costs of Downtime
  Direct Costs of Downtime
  Indirect Costs of Downtime
  The Value of Availability
  Example 1: Clustering Two Nodes
  Example 2: Unknown Cost of Downtime
  The Availability Continuum
  The Availability Index
  The Lifecycle of an Outage
  Downtime
  Lost Data
  Degraded Mode
  Scheduled Downtime
  Key Points

Chapter 4: The Politics of Availability
  Beginning the Persuasion Process
  Start Inside
  Then Go Outside
  Legal Liability
  Cost of Downtime
  Start Building the Case
  Find Allies
  Which Resources Are Vulnerable?
  Develop a Set of Recommendations
  Your Audience
  Obtaining an Audience
  Know Your Audience
  Delivering the Message
  The Slide Presentation
  The Report
  After the Message Is Delivered
  Key Points

Chapter 5: 20 Key High Availability Design Principles
  #20: Don’t Be Cheap
  #19: Assume Nothing
  #18: Remove Single Points of Failure (SPOFs)
  #17: Enforce Security
  #16: Consolidate Your Servers
  #15: Watch Your Speed
  #14: Enforce Change Control
  #13: Document Everything
  #12: Employ Service Level Agreements
  #11: Plan Ahead
  #10: Test Everything
  #9: Separate Your Environments
  #8: Learn from History
  #7: Design for Growth
  #6: Choose Mature Software
  #5: Choose Mature, Reliable Hardware
  #4: Reuse Configurations
  #3: Exploit External Resources
  #2: One Problem, One Solution
  #1: K.I.S.S. (Keep It Simple)
  Key Points

Chapter 6: Backups and Restores
  The Basic Rules for Backups
  Do Backups Really Offer High Availability?
  What Should Get Backed Up?
  Back Up the Backups
  Getting Backups Off-Site
  Backup Software
  Commercial or Homegrown?
  Examples of Commercial Backup Software
  Commercial Backup Software Features
  Backup Performance
  Improving Backup Performance: Find the Bottleneck
  Solving for Performance
  Backup Styles
  Incremental Backups
  Incremental Backups of Databases
  Shrinking Backup Windows
  Hot Backups
  Have Less Data, Save More Time (and Space)
  Hierarchical Storage Management
  Archives
  Synthetic Fulls
  Use More Hardware
  Host-Free Backups
  Third-Mirror Breakoff
  Sophisticated Software Features
  Copy-on-Write Snapshots
  Multiplexed Backups
  Fast and Flash Backup
  Handling Backup Tapes and Data
  General Backup Security
  Restores
  Disk Space Requirements for Restores
  Summary
  Key Points

Chapter 7: Highly Available Data Management
  Four Fundamental Truths
  Likelihood of Failure of Disks
  Data on Disks
  Protecting Data
  Ensuring Data Accessibility
  Six Independent Layers of Data Storage and Management
  Disk Hardware and Connectivity Terminology
  SCSI
  Fibre Channel
  Multipathing
  Multihosting
  Disk Array
  Hot Swapping
  Logical Units (LUNs) and Volumes
  JBOD (Just a Bunch of Disks)
  Hot Spares
  Write Cache
  Storage Area Network (SAN)
  RAID Technology
  RAID Levels
  RAID-0: Striping
  RAID-1: Mirroring
  Combining RAID-0 and RAID-1
  RAID-2: Hamming Encoding
  RAID-3, -4, and -5: Parity RAID
  Other RAID Variants
  Hardware RAID
  Disk Arrays
  Software RAID
  Logical Volume Management
  Disk Space and Filesystems
  Large Disks or Small Disks?
  What Happens When a LUN Fills Up?
  Managing Disk and Volume Availability
  Filesystem Recovery
  Key Points

Chapter 8: SAN, NAS, and Virtualization
  Storage Area Networks (SANs)
  Why SANs?
  Storage Centralization and Consolidation
  Sharing Data
  Reduced Network Loads
  More Efficient Backups
  A Brief SAN Hardware Primer
  Network-Attached Storage (NAS)
  SAN or NAS: Which Is Better?
  Storage Virtualization
  Why Use Virtual Storage?
  Types of Storage Virtualization
  Filesystem Virtualization
  Block Virtualization
  Virtualization and Quality of Service
  Key Points

Chapter 9: Networking
  Network Failure Taxonomy
  Network Reliability Challenges
  Network Failure Modes
  Physical Device Failures
  IP Level Failures
  IP Address Configuration
  Routing Information
  Congestion-Induced Failures
  Network Traffic Congestion
  Design and Operations Guidelines
  Building Redundant Networks
  Virtual IP Addresses
  Redundant Network Connections
  Redundant Network Attach
  Multiple Network Attach
  Interface Trunking
  Configuring Multiple Networks
  IP Routing Redundancy
  Dynamic Route Recovery
  Static Route Recovery with VRRP
  Routing Recovery Guidelines
  Choosing Your Network Recovery Model
  Load Balancing and Network Redirection
  Round-Robin DNS
  Network Redirection
  Dynamic IP Addresses

Preface
For the Second Edition

The strong positive response to the first edition of Blueprints for High Availability was extremely gratifying. It was very encouraging to see that our message about high ... completed our work on the first edition of Blueprints for High Availability, and in that time, a great many things have changed. The biggest personal change for me is that my family has