CHAPTER 18 SQL Server High Availability

Now, you can extend this fault-tolerant solution to embrace more SQL Server instances and all of SQL Server’s related services. This is a big deal because things like Analysis Services previously had to be handled with separate techniques to achieve near high availability. Not anymore; each SQL Server service is now cluster aware.

Data Replication

The next technology option that can be utilized to achieve high availability is data replication. Originally, data replication was created to offload processing from a very busy server (such as an OLTP application that must also support a big reporting workload) or to geographically distribute data for different, very distinct user bases (such as worldwide product ordering applications). As data replication (transactional replication) became more stable and reliable, it started to be used to create “warm” (almost “hot”) standby SQL Servers that could also be used to fulfill basic reporting needs. If the primary server ever failed, the reporting users would still be able to work (hence a higher degree of availability achieved for them), and the replicated reporting database could be used as a substitute for the primary server, if needed (hence a warm-standby SQL Server). When doing transactional replication in the “instantaneous replication” mode, all data changes were replicated to the replicate servers extremely quickly. With SQL Server 2000, updating subscribers allowed for even greater distribution of the workload and, overall, increased the availability of the primary data and distributed the update load across the replication topology. There are plenty of issues and complications involved in using the updating subscribers approach (for example, conflict handlers and queues). With SQL Server 2005, Microsoft introduced peer-to-peer replication, which is not a publisher/subscriber model, but a publisher-to-publisher model (hence peer-to-peer).
It is a lot easier to configure and manage than other replication topologies, but it still has its nuances to deal with. This peer-to-peer model allows excellent availability for this data and great distribution of workload along geographic (or other) lines. This may fit some companies’ availability requirements and also fulfill their distributed reporting requirements.

The top of Figure 18.7 shows a typical SQL data replication configuration of a central publisher/subscriber using continuous transactional replication. This can serve as a basis for high availability and also fulfills a reporting server requirement at the same time. The bottom of Figure 18.7 shows a typical peer-to-peer continuous transactional replication model that is also viable.

The downside of data replication comes into play if ever the subscriber (or the other peer) needs to become the primary server (that is, take over the work from the original server). This takes a bit of administration that is not transparent to the end user. Connection strings have to be changed, ODBC data sources need to be updated, and so on. But this process may take minutes as opposed to hours of database recovery time, and it may well be tolerable to end users. Peer-to-peer configurations handle recovery a bit better in that much of the workload is already distributed to either of the nodes. So, at most, only part of the user base will be affected if one node goes down. Those users can easily be redirected to the other node (peer), with the same type of connection changes described earlier.
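As a rough illustration of why a peer failure affects only part of the user base, the following Python sketch models a two-node peer-to-peer topology as a routing table. The peer names (`PEER_EAST`, `PEER_WEST`), the user groups and their sizes, and the connection-string format are all hypothetical. The sketch does not touch SQL Server at all; it models only the manual redirection step described above: users homed on the failed peer get a new connection string pointing at a surviving peer, and everyone else is left alone.

```python
from dataclasses import dataclass

# Hypothetical peers in a two-node peer-to-peer topology.
PEERS = ["PEER_EAST", "PEER_WEST"]


@dataclass
class UserGroup:
    name: str
    users: int
    home_peer: str  # the peer that normally serves this group


def connection_string(server: str, database: str = "AdventureWorks") -> str:
    # Simplified connection-string format, for illustration only.
    return f"Server={server};Database={database};Integrated Security=SSPI"


def redirect(groups, failed_peer, peers=PEERS):
    """Return (number of affected users, connection string per group).

    Only groups homed on the failed peer are affected; they are pointed
    at the first surviving peer. All other groups keep their home peer.
    """
    survivors = [p for p in peers if p != failed_peer]
    if not survivors:
        raise RuntimeError("no surviving peer to redirect users to")
    affected = 0
    routes = {}
    for group in groups:
        target = group.home_peer
        if group.home_peer == failed_peer:
            affected += group.users
            target = survivors[0]
        routes[group.name] = connection_string(target)
    return affected, routes


groups = [
    UserGroup("americas", 400, "PEER_EAST"),
    UserGroup("europe", 250, "PEER_WEST"),
]
affected, routes = redirect(groups, failed_peer="PEER_WEST")
total = sum(g.users for g in groups)
print(f"{affected} of {total} users affected")  # 250 of 650 users affected
print(routes["europe"])  # Server=PEER_EAST;Database=AdventureWorks;Integrated Security=SSPI
```

In a real deployment the redirection is done by updating connection strings or ODBC data sources, as the text notes; the point of the model is only that the impact is limited to the failed peer’s share of the users.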
Download from www.wowebook.com
Building Solutions with One or More HA Options

FIGURE 18.7 Basic data replication configurations for HA. (Panels: Central Publisher/Subscriber, showing a SQL Server 2008 publication server, a distribution server, and a hot spare (failover) server linked by “continuous” transactional replication; Peer-to-Peer, showing two SQL Server 2008 publication servers, each with an AdventureWorks database.)

With either the publisher/subscriber or peer-to-peer replication approach, there is a risk of not having all the transactions from the publishing server. However, a company is often willing to live with this small risk in favor of availability. Remember that a replicated database is an approximate image of the primary database (up to the point of the last update that was successfully distributed), which makes it very attractive as a warm standby. For publishing databases that are primarily read-only, using a warm standby is a great way to distribute the load and mitigate the risk of any one server failing. Chapter 19, “Replication,” covers data replication and all the various implementation scenarios that you might ever need to use.

Log Shipping

Another, more direct, method of creating a completely redundant database image is to utilize log shipping. Microsoft “certifies” log shipping as a method of creating an “almost hot” spare. Some folks even use log shipping as an alternative to data replication (it has been referred to as “the poor man’s data replication”). There’s just one problem: Microsoft has formally announced that log shipping (as we know and love it) will be deprecated in the near future. The reasons are many, but the primary one is that it is being replaced by database mirroring (referred to as real-time log shipping when it was first being conceived). If you still want to use log shipping, it is perfectly viable, for now.
Log shipping does three primary things:

• Makes an exact image copy of a database on one server from a database dump
• Creates a copy of that database on one or more other servers from that dump
• Continuously applies transaction log dumps from the original database to the copy

In other words, log shipping effectively replicates the data of one server to one or more other servers via transaction log dumps. Figure 18.8 shows a source/destination SQL Server pair that has been configured for log shipping.

FIGURE 18.8 Log shipping in support of high availability.

Log shipping is a great solution when you have to create one or more failover servers. It turns out that, to some degree, log shipping fits the requirement of creating a read-only subscriber as well. The following are the gating factors for using log shipping as a method of creating and maintaining a redundant database image:

• Data latency lag: the time between the transaction log dumps on the source database and when these dumps are applied to the destination databases.
• Sources and destinations must be the same SQL Server version.
• Data is read-only on the destination SQL Server until the log shipping pairing is broken (as it should be, to guarantee that the transaction logs can be applied to the destination SQL Server).

The data latency restriction might quickly disqualify log shipping as an instantaneous high-availability solution (if you need rapid availability of the failover server). However, log shipping might be adequate for certain situations. If a failure ever occurs on the primary SQL Server, a destination SQL Server that was created and maintained via log shipping can be swapped into use fairly quickly.
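The data latency gating factor can be made concrete with a little arithmetic. The Python sketch below rests on simplifying assumptions, not on how SQL Server actually schedules its log shipping jobs: log dumps are taken every `interval` minutes, each dump is copied and applied on the destination a fixed `delay` minutes later, and the function names are hypothetical. It reports how many minutes of source changes the destination would be missing if the primary failed at minute `t`.

```python
def latest_applied_dump(t, interval, delay):
    """Time (minutes) of the newest log dump already applied on the destination.

    Simplifying assumptions: log dumps are taken at interval, 2*interval, ...,
    each dump is copied and restored exactly `delay` minutes after it is taken,
    and the initial full database dump (time 0) is already restored.
    """
    k = (t - delay) // interval  # number of dumps whose restore has completed by t
    return max(k, 0) * interval


def data_latency(t, interval, delay):
    """Minutes of source changes missing from the destination at time t."""
    return t - latest_applied_dump(t, interval, delay)


def worst_case_latency(interval, delay):
    # The lag approaches this value just before each restore completes.
    return interval + delay


# Example: dumps every 15 minutes, each applied 5 minutes after it is taken.
for failure_time in (34, 35, 50):
    print(failure_time, data_latency(failure_time, 15, 5))
print("worst case:", worst_case_latency(15, 5))
```

Under these assumptions the exposure peaks at just under `interval + delay` minutes (20 minutes in this example), immediately before each restore completes. Shortening the dump interval is the main lever for shrinking that window, and the result is the number to weigh against how much data loss you can tolerate.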
The destination SQL Server would contain exactly what was on the source SQL Server (right down to every user ID, table, index, and file allocation map, except for any changes to the source database that occurred after the last log dump was applied). This directly achieves a level of high availability. It is still not completely transparent, though, because the SQL Server instance names are different, and the end user may be required to log in again to the new server instance.

NOTE
Log shipping is not covered further in this book because of its limited life going forward. The SQL Server 2000 Unleashed version of this book covers log shipping in extensive detail. Remember that log shipping is not data replication and uses a completely different technology than data replication.

Database Mirroring

Another failover option with SQL Server is database mirroring. Database mirroring essentially extends the old log shipping feature of SQL Server and creates an automatic failover capability to a “hot” standby server. Database mirroring is being billed as creating a fault-tolerant database that is an “instant” standby (ready for use in less than three seconds).

At the heart of database mirroring is the “copy-on-write” technology. Copy-on-write means that transactional changes are shipped to another server as the logs are written. All logged changes to the database instance become immediately available for copying to another location. As you can see in Figure 18.9, database mirroring utilizes a witness server as well as client components to insulate the client applications from any knowledge of a server failure.

FIGURE 18.9 SQL Server 2008 database mirroring high-availability configuration. (A SQL Server 2008 principal server and a SQL Server 2008 mirror server, each holding the AdventureWorks database and its transaction log, plus a witness server; clients connect over the network.)
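The witness’s job can be pictured as a majority vote among the three servers in Figure 18.9. This Python sketch is a simplified model, not SQL Server code: it assumes a synchronous (high-safety) session with a witness, ignores timeouts and client redirection, and the names `Links`, `principal_can_serve`, and `mirror_takes_over` are hypothetical. It captures two rules: the principal keeps serving only while it can reach at least one other member, and the mirror takes over automatically only when it and the witness agree that the principal is gone and the mirror copy is synchronized.

```python
from dataclasses import dataclass


@dataclass
class Links:
    """Which members of a mirroring session can currently reach each other."""
    principal_mirror: bool
    principal_witness: bool
    mirror_witness: bool


def principal_can_serve(links: Links) -> bool:
    # The principal keeps the database online only while it still has
    # quorum, i.e. it can reach at least one other member.
    return links.principal_mirror or links.principal_witness


def mirror_takes_over(links: Links, synchronized: bool) -> bool:
    # Automatic failover: the mirror and the witness have both lost the
    # principal, they can still see each other, and the mirror is synchronized.
    return (
        not links.principal_mirror
        and not links.principal_witness
        and links.mirror_witness
        and synchronized
    )


scenarios = {
    "principal server crashes": Links(False, False, True),
    "principal-mirror link breaks": Links(False, True, True),
    "witness goes down": Links(True, False, False),
}
for name, links in scenarios.items():
    print(f"{name}: principal serves={principal_can_serve(links)}, "
          f"failover={mirror_takes_over(links, synchronized=True)}")
```

The second scenario shows why the witness matters: a broken link between the two partners alone does not trigger a failover, because the witness can still see the principal, so the session never ends up with two principals.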
Chapter 20 dives much more deeply into database mirroring setup, configuration, and architecture. Suffice it to say here that with database mirroring, an application can possibly be failed over to the mirrored database in 3 seconds or less, with nearly complete client transparency. You can also leverage this mirrored database for offloading reporting by creating a snapshot from it. Again, this topic is covered in Chapter 20.

Combining Failover with Scale-Out Options

SQL Server 2008 encourages combining these options to achieve higher availability levels. A prime example would be combining data replication with database mirroring to provide maximum availability of data, scalability to users, and fault tolerance via failover, potentially at each node in the replication topology. Starting with the publisher, and perhaps the distributor, you configure each of them as a database mirroring failover configuration. Building up a combination of both options together is essentially the best of both worlds: the super-low latency of database mirroring for fault tolerance, and high availability (and scalability) of data through replication. Check out Chapter 20 for more details on this creative configuration.

Other HA Techniques That Yield Great Results

Microsoft has been revisiting (and re-architecting) several operations that previously required a table or whole database to be offline. For several critical database operations (such as recovery operations, restores, indexing, and others), Microsoft has either made the data in the database available earlier in the execution of the operation or made the data in the database completely available simultaneously with the operation. The following primary areas are now addressed:

• Fast recovery: This faster recovery option directly improves the availability of SQL Server databases.
Administrators can reconnect to a recovering database after the transaction log has been rolled forward (and before the rollback processing has finished). Figure 18.10 illustrates how Microsoft makes a SQL Server 2008 database available earlier than SQL Server 2000 would. In particular, a SQL Server 2008 database becomes available once committed transaction log entries have been rolled forward (termed redo); it no longer has to wait for the “in-flight” transactions to be rolled back (termed undo).

FIGURE 18.10 SQL Server 2008 databases become available earlier than databases with SQL Server 2000 database recovery (fast recovery). (Restart-stage timeline: a SQL Server 2000 database is available only after transactions have been both rolled forward and rolled back; a SQL Server 2005 or 2008 database is available as soon as transactions have been rolled forward.)

• Online restore: Database administrators can perform a restore operation while the database is still online. Online restore improves the availability of SQL Server because only the data being restored is unavailable; the rest of the database remains online and available to users. In addition, the granularity of the restore has changed to be at the filegroup level and even at the page level, if needed.
• Online indexing: Concurrent modifications (updates, deletes, and inserts) to the underlying table or clustered index data and any associated indexes can now be done during index creation. For example, while a clustered index is being rebuilt, you can continue to make updates to the underlying data and perform queries against the data.
• Database snapshots: You can now create a read-only, stable view of a database. A database snapshot is created without the overhead of creating a complete copy of the database or having completely redundant storage.
A database snapshot is simply a reference point of the pages used in the database (defined in the system catalog). When pages are updated, a new page chain is started that contains the data pages changed since the database snapshot was taken, as illustrated in Figure 18.11. As the original database diverges from the snapshot, the snapshot gets its own copy of original pages when they are modified. The snapshot can even be used to recover from an accidental change to a database by simply reapplying the pages from the snapshot back to the original database. The copy-on-write technology used for database mirroring also enables database snapshots. When a database snapshot is created on a database, all writes first check the system catalog of “changed pages”; if the page is not there, the original page is copied (using the copy-on-write technique) and stored where the database snapshot can reference it (because this snapshot must be kept intact). In this way, the database snapshot and the original database share the data pages that have not changed.

FIGURE 18.11 Database snapshots and the original database share pages and are managed within the system catalog of SQL Server 2008. (Snapshot users issue SELECT queries against the AdventureWorks snapshot, which draws on the source AdventureWorks data pages, the system catalog of changed pages, and sparse file pages.)

• Data partitioning improvements: Data partitioning has been enhanced with native table and index partitioning. It essentially allows you to manage large tables and indexes at a lower level of granularity. In other words, a table can be defined to identify distinct partitions (such as by date or by a range of key values). This approach effectively defines a group of data rows that are unique to a partition. These partitions can be taken offline, restored, or loaded independently while the rest of the table is available.
• Addition of a snapshot isolation level: This snapshot isolation (SI) level is a database-level capability that allows users to access the last committed row, using a transactionally consistent view of the database. This capability provides improved scalability and availability because readers are no longer blocked from this previously unavailable data state. This new isolation level essentially allows data reading requests to see the last committed version of data rows, even if they are currently being updated as part of a transaction (that is, readers see the rows as they were at the start of their transaction without being blocked by the writers, and the writers are not blocked by readers because the readers do not lock the data). This isolation level is probably best used for databases that are read-mostly (with few writes/updates) due to the potential overhead in maintaining this isolation level.
• Dedicated administrator connection: This feature introduces a dedicated administrator connection that administrators can use to access a running server even if the server is locked or otherwise unavailable. This capability enables administrators to troubleshoot problems on a server by executing diagnostic functions or Transact-SQL statements without having to take down the server.

High Availability from the Windows Server Family Side

To enhance system uptimes, numerous system architecture enhancements that directly reduce unplanned downtime, such as improved memory management and driver verification, were made in Windows 2000, Windows Server 2003, and Windows Server 2008 R2. New file protection capabilities prevent new software installations from replacing essential system files and causing failures. In addition, device driver signatures identify drivers that may destabilize a system. And, perhaps another major step toward stabilization is the use of virtual servers.
Microsoft Virtual Server 2005

Virtual Server 2005 is a cost-effective virtual machine solution designed on top of Windows Server 2008 to increase operational efficiency in software testing and development, application migration, and server consolidation scenarios. Virtual Server 2005 is designed to increase hardware efficiency and help boost administrator productivity, and it is a key Microsoft deliverable toward the Dynamic Systems Initiative (eliminating reboots of servers, which directly affects downtime!). As shown in Figure 18.12, the host operating system (Windows Server 2008 in this case) manages the host system (at the bottom of the stack). Virtual Server 2005 provides a Virtual Machine Monitor (VMM) virtualization layer that manages virtual machines and provides the software infrastructure for hardware emulation. As you move up the stack, each virtual machine consists of a set of virtualized devices, the virtual hardware for each virtual machine.

FIGURE 18.12 Microsoft Virtual Server 2005 server architecture. (Layers, top to bottom: virtual machines, each running an operating system and applications on its own virtual hardware; Virtual Server 2005; Windows Server 2008; any x86/x64 (32/64 bit) server.)

A guest operating system and applications run in the virtual machine, unaware, for example, that the network adapter they interact with through Virtual Server is only a software simulation of a physical Ethernet device. When a guest operating system is running, the special-purpose VMM kernel takes mediated control over the CPU and hardware during virtual machine operations, creating an isolated environment in which the guest operating system and applications run close to the hardware at the highest possible performance.

Virtual Server 2005 is a multithreaded application that runs as a system service, with each virtual machine running in its own thread of execution; I/O occurs in child threads. Virtual Server derives two core functions from the host operating system: the underlying host operating system kernel schedules CPU resources, and the device drivers of the host operating system provide access to system devices. The Virtual Server VMM provides the software infrastructure to create virtual machines, manage instances, and interact with guest operating systems. An often-discussed example of leveraging Virtual Server 2005 capabilities would be to use it in conjunction with a disaster recovery implementation.

Virtual Server 2005 and Disaster Recovery

Virtual Server 2005 enables a form of server consolidation for disaster recovery. Rather than maintaining redundancy with costly physical servers, customers can use Virtual Server 2005 to back up their mission-critical functionality in a cost-effective way by means of virtual machines. The Virtual Machine Monitor (VMM) and Virtual Hard Disk (VHD) technologies in Virtual Server 2005, coupled with the comprehensive COM API, can be used to create failover functionality similar to that of standard, hardware-driven disaster recovery solutions. Customers can then use the Virtual Server COM API to script periodic duplication of physical hard disks containing vital business applications to virtual machine VHDs. Additional scripts can switch to the virtual machine backup in the event of catastrophic failure. In this way, a failing device can be taken offline to troubleshoot, or the application or database can be moved to another physical or virtual machine. Moreover, because VHDs are a core Virtual Server technology, they can be used as a disaster recovery agent, wherein business functionality and data can be easily archived, duplicated, or moved to other physical machines.
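The “additional scripts” that switch to the virtual machine backup often boil down to a health-check loop. The Python sketch below is hypothetical throughout: it does not call the Virtual Server COM API, and the `check` and `activate_standby` callables stand in for whatever probe and switch-over script you would actually write. It switches only after several consecutive failed checks, so that a single transient glitch does not trigger a failover.

```python
from typing import Callable


class StandbySwitch:
    """Hypothetical controller that fails over to a standby virtual machine.

    `check` returns True when the primary physical server is healthy;
    `activate_standby` performs the switch (for example, starting the VM
    built from the most recent VHD copy). Both are placeholders.
    """

    def __init__(self, check: Callable[[], bool],
                 activate_standby: Callable[[], None], threshold: int = 3):
        self.check = check
        self.activate_standby = activate_standby
        self.threshold = threshold
        self.consecutive_failures = 0
        self.on_standby = False

    def poll(self) -> bool:
        """Run one health check; return True once running on the standby."""
        if self.on_standby:
            return True  # already switched; nothing more to do
        if self.check():
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.threshold:
                self.activate_standby()
                self.on_standby = True
        return self.on_standby


# Simulated health-check results: one blip, then a sustained outage.
results = iter([True, False, True, False, False, False])
switch = StandbySwitch(check=lambda: next(results),
                       activate_standby=lambda: print("switching to standby VM"))
print([switch.poll() for _ in range(6)])
```

Requiring consecutive failures trades a slightly slower switch-over for fewer false alarms; the threshold and the polling interval are choices you would tune against how much downtime you can tolerate.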
Summary

As you come to completely understand and assess your application’s high-availability requirements, you can create a matching high-availability solution that will serve you well for years to come. The crux of high availability is laying a fundamentally sound foundation that you can count on when failures occur and then, when failures do occur, determining how much data loss you can tolerate, how much downtime is possible, and what the downtime is costing you.

The overall future seems to be improving greatly in all the basic areas of your Microsoft platform footprint, including:

• Cheaper and more reliable hardware components that are highly swappable
• The advent of virtual server capabilities (with Microsoft Virtual Server 2005) to keep software failures from affecting hardware
• Enhancements that Microsoft is making to SQL Server 2008 that address availability

The critical enhancements to the cornerstone availability capabilities of SQL Clustering will help this fault-tolerant architecture grow more reliable for years to come. The big bonuses come with the features of database mirroring as another fault-tolerant solution at the database level and the database snapshots feature to make data more available to more users more quickly than the older method of log shipping.

To top it all off, Microsoft is making great strides in the areas of online maintenance operations (online restores, online index creates, and so on) and leaping into the realm of one or more virtual server machines (with Virtual Server 2005) that will not bring down a physical server that houses them (which is very UNIX-like).

Chapter 19 delves into the complexities of the major data replication options available with SQL Server 2008.