CHAPTER 18 SQL Server High Availability

What’s New in High Availability

In general, a couple of Microsoft SQL Server 2008 configuration options offer a very strong database engine foundation that can be highly available (24 hours a day, 7 days a week, 365 days a year). Microsoft’s sights are set on achieving five-nines reliability with almost everything it builds. An internal breakthrough introduced with SQL Server 2005, called “copy-on-write” technology, has enabled Microsoft to greatly enhance several of its database high-availability options.

Here are a few of the most significant enhancements and new features that have direct or indirect effects on increasing high availability for a SQL Server 2008-based implementation:

• Increased number of nodes in a SQL cluster: You can create a SQL cluster of up to 16 nodes on Windows Server 2008 Datacenter.

• Enhancements to unattended cluster setup: Instead of having to use wizards to set up SQL clustering, you can use the Unattended Cluster Setup mode. This is very useful for fast re-creation or remote creation of SQL clustering configurations.

• All SQL Server 2008 services as cluster-managed resources: All SQL Server 2008 services are now cluster aware.

• SQL Server 2008 database mirroring: Database mirroring creates an automatic failover capability to a “hot” standby server. (Chapter 20, “Database Mirroring,” covers this topic in detail.)

• SQL Server 2008 peer-to-peer replication: This option of data replication uses a publisher-to-publisher model (hence peer-to-peer).

• SQL Server 2008 automatic corruption recovery from mirror: This enhancement in database mirroring recognizes and corrects corrupt pages during mirroring.

• SQL Server 2008 mirroring transaction record compression: This feature compresses the transaction log records used in database mirroring to increase the speed of transmission to the mirror.

• SQL Server 2008 fast recovery: Administrators can reconnect to a recovering database after the transaction log has been rolled forward (and before the rollback processing has finished).

• Online restore: Database administrators can perform a restore operation while the database is still online.

• Online indexing: The online index option allows concurrent modifications (updates, deletes, and inserts) to the underlying table or clustered index data and any associated indexes while an index is being created.

• Database snapshot: SQL Server 2008 allows for the generation and use of a read-only, stable view of a database. The database snapshot is created without the overhead of creating a complete copy of the database or having completely redundant storage.

• Hot additions: This feature allows memory and CPUs to be added while the server is running.

• Addition of a snapshot isolation level: A snapshot isolation (SI) level is provided at the database level. With SI, users can access the last committed row, using a transactionally consistent view of the database.

• Dedicated administrator connection: SQL Server 2008 supports a dedicated administrator connection that administrators can use to access a running server even if the server is locked or otherwise unavailable. This capability enables administrators to troubleshoot problems on a server by executing diagnostic functions or Transact-SQL statements without having to take down the server.

At the operating system (OS) level, Virtual Server 2005 has firmly established virtualization for both development and production environments. It allows entire application and database stacks to run on a completely virtual operating system footprint that will never bring down the physical server.

NOTE
Microsoft has announced that log shipping will be deprecated soon. Although it has been functionally replaced with database mirroring, log shipping remains available in SQL Server 2008. However, you should plan to move off log shipping as soon as you can.

Keep in mind that Microsoft already has an extensive capability in support of high availability. The new HA features add significant gains to the already feature-rich offering.

What Is High Availability?

The availability continuum depicted in Figure 18.1 shows a general classification of availability based on the amount of downtime an application can tolerate without impacting the business. You would write your service-level agreements (SLAs) to support and try to achieve one of these continuum categories.

Topping the chart is the extreme availability category, so named to indicate that this is the least tolerant category and is essentially a zero (or near zero) downtime requirement (that is, sustained 99.5% to 100% availability). The mythical five-nines falls at the high end of this category. Next is the high availability category, which has a minimal tolerance for downtime (that is, sustained 95% to 99.4% availability). Most “critical” applications would fit into this category of availability need. Then comes the standard availability category, with a more normal type of operation (that is, sustained 83% to 94% availability). The acceptable availability category is for applications that are deemed noncritical to a company’s business, such as online employee benefit package self-service applications. These applications can tolerate much lower availability ranges (sustained 70% to 82% availability) than the more critical services. Finally, the marginal availability category is for nonproduction custom applications, such as marketing mailing label applications, that can tolerate significant downtime (that is, sustained 0% to 69% availability).

Again, remember that availability is measured against the planned hours of operation of the application, not against the calendar.
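The category boundaries above are easier to reason about when they are converted into a downtime budget. The following is a minimal sketch in Python, not part of SQL Server or any Microsoft tool. The function names are illustrative, a 24x7 schedule of 8,760 planned hours per year is assumed, and values that fall in the small gaps between the published ranges (for example, 99.45%) are assigned to the lower category, which is an assumption of this sketch rather than a rule from the continuum.

```python
# Minimal sketch: turn a sustained availability percentage into a yearly
# downtime budget and map it onto the continuum categories described above.
# Assumption: 24x7x365 planned operation (8,760 hours per year).

PLANNED_HOURS_PER_YEAR = 365 * 24

# (lower bound in percent, category name), ordered from strictest to loosest.
CATEGORIES = [
    (99.5, "extreme availability"),
    (95.0, "high availability"),
    (83.0, "standard availability"),
    (70.0, "acceptable availability"),
    (0.0, "marginal availability"),
]


def _check(availability_pct):
    """Reject percentages outside the meaningful 0-100 range."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")


def downtime_hours_per_year(availability_pct, planned_hours=PLANNED_HOURS_PER_YEAR):
    """Return the hours of downtime per year that the availability allows."""
    _check(availability_pct)
    return planned_hours * (100.0 - availability_pct) / 100.0


def classify(availability_pct):
    """Return the first category whose lower bound the value meets."""
    _check(availability_pct)
    for lower_bound, name in CATEGORIES:
        if availability_pct >= lower_bound:
            return name


if __name__ == "__main__":
    for pct in (99.999, 99.5, 95.0, 83.0, 70.0):
        hours = downtime_hours_per_year(pct)
        print(f"{pct}% allows {hours * 60:.2f} min/yr "
              f"({hours / 24:.2f} days/yr): {classify(pct)}")
```

If your application is planned for fewer hours (say, business hours only), pass the smaller figure as `planned_hours`; the same percentage then allows proportionally less downtime.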
Availability category     Characteristic                 Availability range   Downtime per year
Extreme availability      Near zero downtime             99.5%-100%           1.8 days/yr to 5.26 min/yr
High availability         Minimal downtime               95%-99.4%            18 days/yr to 2.2 days/yr
Standard availability     With some downtime tolerance   83%-94%
Acceptable availability   Noncritical applications       70%-82%
Marginal availability     Nonproduction applications     up to 69%

Availability range describes the percentage of time relative to the “planned” hours of operation: 8,760 hours/year, 168 hours/week, 24 hours/day (525,600 minutes/year, 10,080 minutes/week, 1,440 minutes/day).

FIGURE 18.1 Availability continuum.

NOTE
Another featured book from Sams Publishing, called Microsoft SQL Server High Availability, can take you to the depths of high availability from every angle. This landmark offering provides a complete guide to high availability, beginning with ways to gather and understand your HA requirements, assess your HA needs, and completely build out high-availability implementations for the most common business scenarios in the industry. Pick up this book if you are serious about achieving five-nines of reliability.

Achieving the mythical five-nines (that is, a sustained 99.999% availability) falls into the extreme availability category (which tolerates between 5.26 minutes and 1.8 days of downtime per year). In general, the computer industry calls this high availability, but we push this type of near-zero downtime requirement into its own extreme category. Most applications can only dream about this level of availability because of the costs involved, the high level of operational support required, the specialized hardware that must be in place, and many other extreme factors.

The Fundamentals of HA

Every minute of downtime you have today translates into losses that you cannot well afford. You must fully understand how the hardware and software components work together and how, if one component fails, the others will be affected. High availability of an application is a function of all the components together, not just one by itself. Therefore, the best approach for moving into supporting high availability is to work on shoring up the basic foundation components of hardware, backup/recovery, operating system upgrading, ample vendor agreements, sufficient training, extensive quality assurance/testing, rigorous standards and procedures, and some overall risk-mitigating strategies, such as spreading out critical applications over multiple servers. By addressing these first, you add a significant amount of stability and high-availability capability across your hardware/system stack. In other words, you are moving up to a necessary level before you completely jump into a particular high-availability solution. If you do nothing further from this point, you have already achieved a portion of your high-availability goals.

Hardware Factors

You need to start by addressing your basic hardware issues for high availability and fault tolerance. This includes redundant power supplies, UPS systems, redundant network connections, and error-correcting code (ECC) memory. Also available are “hot-swappable” components, such as disks, CPUs, and memory. In addition, most servers now use multiple CPUs, fault-tolerant disk systems such as RAID, mirrored disks, storage area networks (SANs), network attached storage (NAS), redundant fans, and so on.

Cost may drive the full extent of what you choose to build out. However, you should start with the following:

• Redundant power supplies (and UPSs)

• Redundant fan systems

• Fault-tolerant disks, such as RAID (1 through 10), preferably “hot swappable”

• ECC memory

• Redundant Ethernet connections

Backup Considerations

After you consider hardware, you need to look at the basic techniques and frequency of your disk backups and database backups. For many companies, the backup plan isn’t what it needs to be to guarantee recoverability and even the basic level of high availability. At many sites, database backups are not being run, are corrupted, or aren’t even considered necessary. You would be shocked by the list of Fortune 1000 companies where this occurs.

Operating System Upgrades

You need to make sure that all upgrades to your OS are applied and also that the configuration of all options is correct. This includes making sure you have antivirus software installed (if applicable), along with the appropriate firewalls for external-facing systems.

Vendor Agreements Followed

Vendor agreements come in the form of software licenses, software support agreements, hardware service agreements, and both hardware and software service-level agreements. Essentially, you are trying to make sure you can get all software upgrades and patches for your OS and for your application software at any time, and that support agreements and SLAs for both software and hardware are in place to guarantee a level of service within a defined period of time.

Training Kept Up to Date

Training is multifaceted: it can be for software developers, to make sure the code they write is optimal; for system administrators, who need to administer applications; and even for end users themselves, to make sure they use the system correctly. All these types of training play into the ultimate goal of achieving high availability.

Quality Assurance Done Well

Testing as much as possible, and doing it in a very formal way, is a great way to protect a system’s availability. Dozens of studies over the years have clearly shown that the more thoroughly you test (and the more formal your QA procedures), the fewer software problems you will have. Many companies foolishly skimp on testing, which has a huge impact on system reliability and availability.

Standards/Procedures Followed

Standards and procedures are interlaced tightly with training and QA. Coding standards, code walkthroughs, naming standards, formal system development life cycles, protection of tables from being dropped, use of governors, and so on all contribute to more stable and potentially more highly available systems.

Server Instance Isolation

By design, you may want to isolate applications (such as SQL Server’s applications and their databases) from each other to mitigate the risk of one application causing another to fail. Plain and simple, you should never put applications in each other’s way if you don’t have to. The only things that might force you to load up a single server with all your applications would be expensive licensing costs for each server’s software and perhaps hardware scarcity (strict limitations on the number of servers available for all applications).

A classic example occurs when a company loads up a single SQL Server instance with between two and eight applications and their associated databases. The problem is that the applications are sharing memory, CPUs, and internal work areas, such as tempdb. Figure 18.2 shows an overloaded SQL Server instance that is being asked to service seven major applications (Appl 1 DB through Appl 7 DB).

[Figure: one Windows 2003 server (4 processors, 8GB memory/cache, RAID disk array) running a single SQL Server 2008 instance that hosts the master, msdb, and tempdb databases along with Appl 1 DB through Appl 7 DB.]

FIGURE 18.2 High risk: Many applications sharing a single SQL Server 2008 instance.

The single SQL Server instance in Figure 18.2 is sharing memory (cache) and critical internal working areas, such as tempdb, with all seven major applications. Everything runs fine until one of these applications submits a runaway query, and all other applications being serviced by that SQL Server instance come to a grinding halt. Most of this built-in risk could be avoided by simply putting each application (or perhaps two applications) onto its own SQL Server instance, as shown in Figure 18.3. This fundamental design approach greatly reduces the risk of one application affecting another.

[Figure: three Windows 2003 servers (each with 4 processors, 8GB memory/cache, and a RAID disk array), each running its own SQL Server 2008 instance with its own master and tempdb databases. The first hosts Appl 1 DB and Appl 5 DB; the second hosts Appl 2 DB and Appl 3 DB; the third hosts Appl 4 DB, Appl 6 DB, and Appl 7 DB.]

FIGURE 18.3 Mitigated risk: Isolating critical applications away from each other.

Many companies make this fundamental error. The trouble is that they keep adding new applications to their existing server instance without a full understanding of the shared resources that underpin the environment. It is often too late when they finally realize that they are hurting themselves “by design.” You have now been given proper warning of the risks. If other factors, such as cost or hardware availability, dictate otherwise, then at least it is a calculated risk that is entered into knowingly (and is properly documented as well).
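The isolation argument in this section can be made concrete by asking which applications are exposed when one of them misbehaves. The following Python sketch is only an illustration of that reasoning. The two layouts mirror the placements shown in Figures 18.2 and 18.3, but the instance names and the `affected_by` helper are hypothetical, and the model assumes that a runaway query affects every application on the same instance and nothing else.

```python
# Sketch of the "blast radius" of a runaway query under two placements.
# Instance names and the helper function are hypothetical; the application
# placements follow Figures 18.2 and 18.3.

SHARED_LAYOUT = {
    "instance_1": ["Appl 1", "Appl 2", "Appl 3", "Appl 4",
                   "Appl 5", "Appl 6", "Appl 7"],
}

ISOLATED_LAYOUT = {
    "instance_1": ["Appl 1", "Appl 5"],
    "instance_2": ["Appl 2", "Appl 3"],
    "instance_3": ["Appl 4", "Appl 6", "Appl 7"],
}


def affected_by(layout, runaway_app):
    """Return the other applications that share an instance with runaway_app."""
    for apps in layout.values():
        if runaway_app in apps:
            return [app for app in apps if app != runaway_app]
    raise KeyError(f"{runaway_app} is not placed on any instance")


if __name__ == "__main__":
    for name, layout in (("shared", SHARED_LAYOUT), ("isolated", ISOLATED_LAYOUT)):
        victims = affected_by(layout, "Appl 3")
        print(f"{name}: a runaway query in Appl 3 also stalls {victims}")
```

In the shared layout every other application is exposed, whereas in the isolated layout only the one application that shares an instance with the offender is. Documenting placements in a form like this also makes the “calculated risk” mentioned above explicit.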
Building Solutions with One or More HA Options

When you have the fundamental foundation in place, as described in the preceding section, you can move on to building a tailored software-driven high-availability solution. Which HA option(s) you should use really depends on your HA requirements. The following high-availability options are used both individually and, very often, together to achieve different levels of HA:

• Microsoft Cluster Services (non-SQL Server based)

• SQL clustering

• Data replication (including peer-to-peer configurations)

• Log shipping

• Database mirroring

All these options are readily available “out of the box” from Microsoft, from the Windows Server family of products and from Microsoft SQL Server 2008.

It is important to understand that some of these options can be used together, but not all go together. For example, you might use Microsoft Cluster Services (MSCS) along with Microsoft SQL Server 2008’s SQL Clustering to implement the SQL clustering database configuration, whereas you wouldn’t necessarily need to use MSCS with database mirroring.

Microsoft Cluster Services (MSCS)

MSCS could actually be considered a part of the basic HA foundation components described earlier, except that it’s possible to build a high-availability system without it (for example, a system that uses numerous redundant hardware components and disk mirroring or RAID for its disk subsystem). Microsoft has made MSCS the cornerstone of its clustering capabilities, and MSCS is utilized by applications that are cluster enabled. A prime example of a cluster-enabled technology is Microsoft SQL Server 2008.

MSCS is the advanced Windows operating system configuration that defines and manages between 2 and 16 servers as “nodes” in a cluster. These nodes are aware of each other and can be set up to take over cluster-aware applications from any node that fails (for example, a failed server). This cluster configuration also shares and controls one or more disk subsystems as part of its high-availability capability. Figure 18.4 illustrates a basic two-node MSCS configuration.

[Figure: Node A and Node B, each running Windows 2008 R2 Enterprise Edition with local binaries on C:, attached over SCSI to a shared disk (D:).]

FIGURE 18.4 Basic two-node MSCS configuration.

MSCS is available only with the Enterprise and Datacenter editions of the Microsoft Windows Server operating system. Don’t be alarmed, though. If you are looking at a high-availability system to begin with, there is a great probability that your applications are already running with these enterprise-level OS versions.

MSCS can be set up in an active/passive or active/active mode. Essentially, in an active/passive mode, one server sits idle (that is, is passive) while the other is doing the work (that is, is active). If the active server fails, the passive one takes over the shared disk and the cluster-aware applications with minimal interruption.

SQL Clustering

If you want a SQL Server instance to be clustered for high availability, you are essentially asking that this SQL Server instance (and the database) be completely resilient to a server failure and completely available to the application without the end user ever even noticing that there was a failure (or at least with minimal interruption). Microsoft provides this capability through the SQL Clustering option. SQL Clustering is built on top of MSCS for its underlying detection of a failed server and for its availability of the databases on the shared disk (which is controlled by MSCS). SQL Server is said to be a “cluster-aware/enabled” technology.

A clustered SQL Server instance is created by defining a virtual SQL Server instance that is known to the application (the constant in the equation) and then two physical SQL Server instances that share one set of databases. In an active/passive configuration, only one SQL Server instance is active at a time and just goes along and does its work. If that active server fails (and with it, the physical SQL Server instance), the passive server (and the physical SQL Server instance on that server) simply takes over. This is possible because MSCS also controls the shared disk where the databases are. The end user and application never really know which physical SQL Server instance they are on or whether one failed. Figure 18.5 illustrates a typical SQL Clustering configuration built on top of MSCS.

[Figure: two Windows 2008 Enterprise Edition servers, each with a physical SQL Server 2008 instance (SQL A and SQL B) and local binaries on C:, presented to SQL connections as one virtual SQL Server. A shared SCSI disk (D:) holds the master, tempdb, and Appl 1 DB databases.]

FIGURE 18.5 Basic SQL Clustering two-node configuration (active/passive).

Setup and management of this type of configuration are much easier than you might think. More and more often, SQL Clustering is the method chosen for high-availability solutions. Later in this chapter, you see that other methods may also be viable for achieving high availability (based on the application’s HA requirements). Chapter 21, “SQL Server Clustering,” covers this topic in more detail.

Extending the clustering model to include Network Load Balancing (NLB) pushes this particular solution even further into higher availability, from client traffic high availability to back-end SQL Server high availability. Figure 18.6 shows a four-host NLB cluster architecture acting as a virtual server to handle the network traffic, coupled with a two-node SQL cluster on the back end. This setup is resilient from top to bottom.

[Figure: NLB hosts on a front-end LAN; on the back-end LAN, a virtual SQL cluster (IP 100.122.134.32) made up of CL Node A (IP 100.122.134.33, SQL A) and CL Node B (IP 100.122.134.34, SQL B), each with local binaries and SCSI-attached to shared instance data.]

FIGURE 18.6 An NLB host cluster with a two-node server cluster.

The four NLB hosts work together, distributing the work efficiently. NLB automatically detects the failure of a server and repartitions client traffic among the remaining servers.

The following apply to SQL Clustering in SQL Server 2008:

• Full SQL Server 2008 Services as cluster-managed resources: All SQL Server 2008 services, including the following, are cluster aware:

  • SQL Server DBMS engine
  • SQL Server Agent
  • SQL Server Full-Text Search
  • Analysis Services
  • Integration Services
  • Notification Services
  • Reporting Services
  • Service Broker
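The active/passive behavior described in this section (a virtual name that stays constant while ownership moves between physical nodes) can be pictured with a toy model. The Python sketch below is purely conceptual and does not reflect how MSCS or SQL Clustering is implemented. The class, its attributes, and the names `VSQL`, `Node A`, and `Node B` are hypothetical, and details such as heartbeat detection, quorum, and shared-disk arbitration are deliberately left out.

```python
# Toy model of an active/passive cluster: clients use one virtual name,
# and exactly one healthy physical node is active at a time.
# This is NOT the MSCS implementation; all names here are hypothetical.

class ActivePassiveCluster:
    def __init__(self, virtual_name, nodes):
        if len(nodes) < 2:
            raise ValueError("a cluster needs at least two nodes")
        self.virtual_name = virtual_name            # what applications connect to
        self.healthy = {node: True for node in nodes}
        self.active = nodes[0]                      # current owner of the shared disk

    def fail(self, node):
        """Mark a node as failed and fail over if it was the active node."""
        self.healthy[node] = False
        if node == self.active:
            survivors = [n for n, ok in self.healthy.items() if ok]
            # The first healthy node (in configuration order) takes over.
            self.active = survivors[0] if survivors else None

    def connect(self):
        """Resolve the virtual name to whichever node is currently active."""
        if self.active is None:
            raise RuntimeError("no healthy node is available")
        return f"{self.virtual_name} (served by {self.active})"


if __name__ == "__main__":
    cluster = ActivePassiveCluster("VSQL", ["Node A", "Node B"])
    print(cluster.connect())
    cluster.fail("Node A")       # simulate losing the active server
    print(cluster.connect())     # same virtual name, different physical node
```

The point of the model is that `connect()` never takes a physical node name. That is the sense in which the virtual SQL Server instance is “the constant in the equation” for the application.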