Server Cluster Deployment Options

When you use either the single quorum device model or the MNS model, there are a variety of ways that you can configure your clustered applications to act during a failover operation. The choices vary with the number of nodes in your server cluster, and each has advantages and disadvantages. These deployment options are not always mutually exclusive. In a server cluster with several nodes and multiple cluster groups, it is possible that some groups will use one deployment option while other groups use a different one. Consider these options carefully when you design larger server clusters.

N-Node Failover Pairs

The N-node failover pairs deployment option specifies that two nodes, and only two nodes, may run the application. This is the simplest option and is, in essence, the only option available in a two-node server cluster. If this option is configured in a larger server cluster with three or more nodes, the application will stop functioning if both of its designated nodes fail, even if other nodes in the cluster remain operational. In larger server clusters made up of nodes with different processing capabilities or capacities, you can use this option to limit an application to running on only the nodes capable of adequately servicing it.

An N-node failover pair is configured by specifying the two nodes in the Possible Owners property for the cluster resource, as shown in Figure 6.4. You can set the Possible Owners property using the server cluster administrative tools described in the "Server Cluster Administration" section later in this chapter. Every cluster resource has a Possible Owners property that can be configured or left blank.

Figure 6.4 Setting the Possible Owners Property

Figure 6.5 illustrates an N-node failover configuration in a server cluster with four nodes (A, B, C, and D) in its normal operational state. Nodes A and B are configured as a failover pair, and nodes C and D are also a failover pair. Assorted virtual servers are active and are spread among the nodes.

Figure 6.5 N-Node Failover, Initial State

Figure 6.6 shows the same server cluster as Figure 6.5, but after two of the nodes failed. As you can see, node B has taken ownership of the virtual servers that were operating on its failover partner (node A). Node C has also taken ownership of node D's virtual servers. Note that Figures 6.5 and 6.6 depict a single quorum device server cluster; an MNS server cluster with four nodes could not operate with two failed nodes. The storage devices and interconnects have been removed from the images for clarity.

Figure 6.6 N-Node Failover, Failed State
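The Possible Owners list can also be managed from the command line with the Cluster.exe utility covered later in this chapter. The following is only a rough sketch; the cluster, resource, and node names (MYCLUSTER, "Virtual 1 IP Address," NodeC, NodeD) are hypothetical. Because nodes are typically on a resource's Possible Owners list by default, creating an A/B failover pair in a four-node cluster is largely a matter of removing the other two nodes:

cluster /cluster:MYCLUSTER resource "Virtual 1 IP Address" /removeowner:NodeC
cluster /cluster:MYCLUSTER resource "Virtual 1 IP Address" /removeowner:NodeD
cluster /cluster:MYCLUSTER resource "Virtual 1 IP Address" /listowners

With only nodes A and B left on the list, the resource can never be brought online on nodes C or D, which is the failover pair behavior shown in Figures 6.5 and 6.6.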
Hot-Standby Server/N+I

The hot-standby server/N+I deployment option is possible on server clusters with two or more nodes and is sometimes referred to as an active/passive design. In this design, you specify one node in the server cluster as a hot spare. This hot-spare node is normally idle or lightly loaded, and it acts as the failover destination for other nodes in the cluster.

The main advantage of this option is cost savings. If a two-node server cluster is configured with one node running the application(s) (the N, or active, node) and one node standing idle, waiting for a failover (the I, or passive, node), the overhead cost in hardware is 50 percent. In an eight-node server cluster with seven N (active) nodes and one I (passive) node, the overhead cost drops to 12.5 percent.

This option is not limited to a single hot-spare node. An eight-node server cluster could be configured with one N node and seven I nodes, or any other possible combination. In such configurations, the overhead cost savings would be much smaller or nonexistent.

Configure this option by setting the Preferred Owners property of the group to the N node(s), as shown in Figure 6.7, and the Possible Owners property of the resources to the N and I nodes. As mentioned earlier, the Possible Owners property is a property of the individual resource. The Preferred Owners property, however, applies only to cluster groups. Both the Possible Owners and Preferred Owners properties are configured via the server cluster administrative tools, which are covered in the "Server Cluster Administration" section later in this chapter.

Figure 6.7 Setting the Preferred Owners Property

Figure 6.8 illustrates a four-node server cluster configured with three active (N) nodes and one passive (I) node in its normal operational state. Each active node supports various virtual servers.

Figure 6.8 Hot-Standby/N+I Configuration, Initial State

Figure 6.9 shows the same server cluster as Figure 6.8, but after the failure of two of the nodes. The virtual servers that were operating on the failed nodes have failed over to the I node. Again, if this were an MNS server cluster, there would not be enough nodes operating to support the server cluster. The MNS cluster would have failed when the second node failed, but the virtual servers from the first failed node would have been successfully failed over to the I node. Again, note that the storage devices and interconnects have been removed from both images.

Figure 6.9 Hot-Standby/N+I Configuration, Failed State
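A similar command-line sketch applies to the hot-standby design just described; again, the cluster, group, and node names are hypothetical. The Preferred Owners list is set on the group rather than on individual resources, with the active (N) node listed before the hot spare (I):

cluster /cluster:MYCLUSTER group "Virtual 1 Group" /setowners:NodeA,NodeD
cluster /cluster:MYCLUSTER group "Virtual 1 Group" /listowners

Here, NodeA is the active node and NodeD is the hot spare. The order of the Preferred Owners list is significant, and that ordering is the basis of the failover ring option discussed next.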
Failover Ring

A failover ring is mainly used when all nodes in a server cluster are active. When a failover occurs, applications are moved to the next node in line. This mode is possible only if all nodes in the server cluster have enough excess capacity to support additional applications beyond what they normally run. If a node is operating at peak utilization, a failover to that node may reduce performance for all applications running on that node after the failover.

The order of failover is defined by the order in which the nodes appear in the Preferred Owners list (see Figure 6.7). The default node for the application is listed first. A failover will attempt to move the cluster group to each node on the list, in order, until the group successfully starts.

It is possible to limit the size of the failover ring by not specifying all the cluster nodes on the Preferred Owners list. In effect, this combines the N+I and failover ring options to produce a hybrid option. This hybrid option reduces the N+I overhead cost to zero, but you need to make sure that enough capacity is present to support your applications.

Figure 6.10 illustrates an eight-node server cluster in a failover ring configuration in its initial state. To simplify the diagram, each node is running one virtual server. (The configuration of the failover ring in this scenario is very simple: each node fails over to the next node, with the last node set to fail over to the first.) Storage devices and interconnects have been removed for clarity.

Figure 6.10 Failover Ring Configuration, Initial State

Figure 6.11 illustrates the failover ring configuration after the server cluster has experienced a failure of half of its nodes. Notice how node F has picked up the virtual servers from nodes D and E, and how node A has picked up the virtual server from node H. Again, if this were an MNS server cluster, there would not be enough nodes left operational for the server cluster to function. Again, storage devices and interconnects have been removed from the image for clarity.

Figure 6.11 Failover Ring Configuration, Failed State

Random

The random deployment option lets the cluster service determine the destination for a failover. This option is used in large server clusters where each node is active and it is difficult to specify an order of failover because of the needs and complexity of the environment. When adopting this option, it is important to make sure that each node has sufficient excess capacity to handle additional load. Otherwise, a failover may reduce performance for applications running on a node that is at or near peak capacity.

This mode is configured by not defining a Preferred Owners list for the group. The cluster service will attempt to determine a suitable node for the application in the event of a failover.

Figure 6.12 illustrates a random failover configuration in the initial state. It shows a server cluster of eight nodes, each supporting two virtual servers, in its normal operating mode.

Figure 6.12 Random Configuration, Initial State

Figure 6.13 shows the same configuration after this server cluster has experienced a failure of three of its nodes. Notice how the virtual servers have been distributed seemingly at random among the surviving nodes. If this were an MNS server cluster, it would still be functioning, because five of the eight nodes (a majority) remain operational.

Figure 6.13 Random Configuration, Failed State
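To make the contrast between the failover ring and random options concrete, here is a hedged sketch using the Cluster.exe utility covered in the next section (the names are again hypothetical). A failover ring is expressed as an ordered Preferred Owners list on each group; for example, a group whose default node is A and whose ring continues through B and C might be configured as:

cluster /cluster:MYCLUSTER group "Virtual 1 Group" /setowners:NodeA,NodeB,NodeC

The random option, by contrast, is expressed by leaving the Preferred Owners list empty. Listing the owners is a quick way to confirm that no preference is defined:

cluster /cluster:MYCLUSTER group "Virtual 3 Group" /listowners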
Server Cluster Administration

After a server cluster is operational, it must be administered. Two tools are provided to accomplish this: Cluster Administrator, an interactive graphical utility, and Cluster.exe, for use at the command line and in scripts or batch files.

Using the Cluster Administrator Tool

To access the Cluster Administrator utility, select Start | Administrative Tools | Cluster Administrator. The Cluster Administrator utility, shown in Figure 6.14, allows you to create a new server cluster, add nodes to an existing server cluster, and perform administrative tasks on a server cluster.

Figure 6.14 The Cluster Administrator Window

At the Open Connection to Cluster dialog box, shown in Figure 6.15, you can enter the name of a server cluster or browse for it. If you wish to create a new server cluster, select Create new cluster in the Action drop-down list box and click OK. This will start the New Server Cluster Wizard, which will step you through the process of creating a new server cluster. Selecting Add nodes to cluster in the Action drop-down list will start the Add Nodes Wizard. This wizard lets you add nodes to an existing server cluster.

Figure 6.15 The Open Connection Dialog Box

Using Command-Line Tools

Cluster.exe is the command-line utility you can use to create or administer a server cluster. It has all of the capabilities of the Cluster Administrator graphical utility and more. Cluster.exe has numerous options. Figure 6.16 shows the syntax of the cluster.exe command and the options you can use with it.

Figure 6.16 Cluster.exe Command Options

CLUSTER /LIST[:domain-name]

CLUSTER /CHANGEPASS[WORD] /?
CLUSTER /CHANGEPASS[WORD] /HELP

CLUSTER /CLUSTER:clustername1[,clustername2[, ]]
        /CHANGEPASS[WORD][:newpassword[,oldpassword]] <options>
  <options> = [/FORCE] [/QUIET] [/SKIPDC] [/TEST] [/VERB[OSE]]
              [/UNATTEND[ED]] [/?] [/HELP]

CLUSTER [/CLUSTER:]cluster-name <options>
  <options> =
  /CREATE [/NODE:node-name] [/VERB[OSE]] [/UNATTEND[ED]] [/MIN[IMUM]]
          /USER:domain\username | username@domain [/PASS[WORD]:password]
          /IPADDR[ESS]:xxx.xxx.xxx.xxx[,xxx.xxx.xxx.xxx,network-connection-name]
  /ADD[NODES][:node-name[,node-name ]] [/VERB[OSE]] [/UNATTEND[ED]]
          [/MIN[IMUM]] [/PASSWORD:service-account-password]

CLUSTER [[/CLUSTER:]cluster-name] <options>
  <options> =
  /CREATE [/NODE:node-name] /WIZ[ARD] [/MIN[IMUM]]
          [/USER:domain\username | username@domain] [/PASS[WORD]:password]
          [/IPADDR[ESS]:xxx.xxx.xxx.xxx]
  /ADD[NODES][:node-name[,node-name ]] /WIZ[ARD] [/MIN[IMUM]]
          [/PASSWORD:service-account-password]
  /PROP[ERTIES] [<prop-list>]
  /PRIV[PROPERTIES] [<prop-list>]
  /PROP[ERTIES][:propname[,propname ] /USEDEFAULT]
  /PRIV[PROPERTIES][:propname[,propname ] /USEDEFAULT]
  /REN[AME]:cluster-name
  /QUORUM[RESOURCE][:resource-name] [/PATH:path] [/MAXLOGSIZE:max-size-kbytes]
  /SETFAIL[UREACTIONS][:node-name[,node-name ]]
  /LISTNETPRI[ORITY]
  /SETNETPRI[ORITY]:net[,net ]
  /REG[ADMIN]EXT:admin-extension-dll[,admin-extension-dll ]
  /UNREG[ADMIN]EXT:admin-extension-dll[,admin-extension-dll ]
  /VER[SION]
  NODE [node-name] node-command
  GROUP [group-name] group-command
  RES[OURCE] [resource-name] resource-command
  {RESOURCETYPE|RESTYPE} [resourcetype-name] resourcetype-command
  NET[WORK] [network-name] network-command
  NETINT[ERFACE] [interface-name] interface-command

  <prop-list> = name=value[,value ][:<format>] [name=value[,value ][:<format>] ]
  <format> = BINARY|DWORD|STR[ING]|EXPANDSTR[ING]|MULTISTR[ING]|SECURITY|ULARGE

CLUSTER /?
CLUSTER /HELP

Note: With the /CREATE, /ADDNODES, and /CHANGEPASSWORD options, you will be prompted for passwords not provided on the command line unless you also specify the /UNATTENDED option.

The following are some of the tasks that are impossible to do with Cluster Administrator or are easier to perform with Cluster.exe:

■ Changing the password on the cluster service account
■ Creating a server cluster or adding a node to a server cluster from a script
■ Creating a server cluster as part of an unattended setup of Windows Server 2003
■ Performing operations on multiple server clusters at the same time
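As an example of the first and last items on this list, the /CHANGEPASSWORD syntax shown in Figure 6.16 can update the cluster service account password on more than one server cluster in a single operation. The cluster names and passwords below are, of course, placeholders:

cluster /cluster:CLUSTER1,CLUSTER2 /changepass:NewP@ssw0rd,OldP@ssw0rd /verbose

As the note in Figure 6.16 indicates, if you omit the passwords, Cluster.exe prompts for them (unless /unattended is specified), which keeps them out of batch files and command histories.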
Recovering from Cluster Node Failure

It is reasonable to assume that on any server cluster, you will have a component failure or need to take part of the server cluster offline for service. A properly designed and maintained server cluster should handle these events without losing service. But what if something causes a node to fail? For example, if a local hard disk in the node crashes, how do you recover?

Many of the same basic administrative tasks performed on nonclustered servers apply to clustered ones. Following the same practices will help prevent unplanned downtime and assist in restoring service when it is lost:

■ Have good documentation. Proper and complete documentation is the greatest asset you can have when trying to restore service. Configuration and contact information should also be included in your documentation.

■ Perform regular backups and periodically test restores. Clusters need to be backed up just like any other computer system. Periodically testing a restore will help keep the process fresh and help protect against hardware, media, and some software failures.

■ Perform Automated System Recovery (ASR) backups. When performing an ASR backup on your server cluster, make sure that one node owns the quorum resource during the ASR backup. If you ever need an ASR restore, this will be a critical component.

■ Develop performance baselines. A performance baseline should be developed for each node and for the server cluster as a whole. This will help you determine whether your server cluster is not performing properly or is being outgrown.

If a node experiences a failure, any groups that were on the failed node should be moved to another node (unless you are using the single-node model). You should then repair the failed components in the node in the same way that you would repair any computer system. If repairing the node involves the replacement of the boot and/or system drives, you may need to do an ASR restore. As a precaution, you should physically disconnect the node from the cluster's shared storage devices first. Once the ASR restore is complete, shut down the node, reconnect it to the shared storage devices, and boot the node.

Server Clustering Best Practices

There are many ways to accomplish the setup and operation of a server cluster, but some methods are more reliable than others. Microsoft has published a number of "Best Practices" documents relating to its products and technologies, and server clusters are no exception.

Hardware Issues

The foundation of your server cluster is the hardware. It is critical to build reliable nodes at the hardware level. You cannot build high availability from unreliable or unknown components.