Compatibility List

Microsoft's position, ever since it first shipped clustering technology, has been that both the hardware components used in a server cluster and the entire server cluster configuration itself must be listed on the Hardware Compatibility List (HCL) in order to receive support. With the introduction of Windows XP, Microsoft changed from the HCL to the Windows Catalog. Windows Server 2003-compatible hardware is listed in the Windows Server Catalog, but the concept and the support requirements remain the same as they were with the HCL. To receive technical support from Microsoft, ensure that your entire hardware configuration is listed as compatible in the Windows Server Catalog. Using unlisted hardware does not mean you cannot make the hardware work; it simply means that you cannot call Microsoft for help if the need arises.

Network Interface Controllers

A server cluster requires at least two network interfaces to function: one for the public network (where client requests originate) and one for the private interconnect network (which carries the heartbeat). Because a single private interconnect would present a single point of failure, it is good practice to have at least two interconnects. Do not use a teamed configuration for the interconnects. A teamed configuration binds two or more physical interfaces together into one logical interface, so the team itself remains a single point of failure.

Network controllers should be identical. This includes not only the manufacturer and model, but also the firmware and drivers. Using identical controllers will also simplify the design of your server cluster and make troubleshooting easier.

Change the default name of each network interface to a descriptive name. Use Heartbeat, Interconnect, or Private for the interconnect interface. Similarly, use Public, Primary, or some similar name for the public interfaces. Configure these names identically on each node. Following this procedure will make identifying and troubleshooting network issues much easier.
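If you prefer to script the renaming rather than work through the Network Connections folder on every node, netsh can do it. The following is a minimal sketch; the default connection names shown are assumptions, so substitute the names your hardware actually received, and if your build of netsh does not accept the newname parameter, rename the connections in the Network Connections folder instead.

    rem Rename the default connections to descriptive, matching names
    rem ("Local Area Connection" and "Local Area Connection 2" are assumptions)
    netsh interface set interface name="Local Area Connection" newname="Public"
    netsh interface set interface name="Local Area Connection 2" newname="Heartbeat"

Run the same commands on every node so that the names match across the cluster.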
Storage Devices

No single resource in a server cluster requires more planning and preparation than shared storage, and poor planning can make management tasks quite difficult. Planning cluster disk resources requires attention to numerous details. First, plan thoroughly for the acquisition of the shared disk hardware: develop capacity requirements and design the disk layouts. Dynamic disks, volume sets, remote storage, removable storage, and software-based RAID cannot be used for shared cluster disks. Plan on using hardware RAID, and purchase extra hard disks for use as RAID hot spares.

If a single RAID controller is part of the design (likely in a single-node cluster), make sure that you keep an identical spare RAID controller on hand. The spare should be the exact brand and model, and have the same firmware version, as your production RAID controller.

If you are using Fibre Channel-based controllers, consider using multiple Fibre Channel host bus adapters (HBAs) configured in either a load-balanced or a failover configuration. This will increase the cost of the cluster, but it will also increase fault tolerance. Before purchasing redundant HBAs, make sure that they are of the same brand, model, and firmware version. Also, ensure that the hardware vendor includes any drivers or software necessary to support the redundant HBA configuration.

If you are using SCSI-based controllers, ensure that each SCSI adapter on the shared storage bus is configured with a different SCSI ID. Also ensure that the shared SCSI bus is properly terminated. If either of these tasks is not done properly, data could be lost, hardware could be damaged, or the second cluster node may not join the cluster properly.

Use caution with write caching on shared disks. If power fails or a failover occurs before data is completely written to disk, data can be corrupted or lost. Disable write caching in Device Manager by clearing the Enable write caching on the disk check box on the Policies tab of the drive's Properties, as shown in Figures 6.17 and 6.18. If the RAID controller supports write caching, either disable the feature or ensure that battery backup for the cache, or an alternate power supply for the controller, is available.

Figure 6.17 Accessing Disk Drive Properties in Device Manager
Figure 6.18 Disabling Write Caching on a Drive through Device Manager

When starting the installation of the first node, ensure that the first node is the only node on the shared storage bus. This must be done in order to partition and format the drives in the shared storage properly. Until the cluster service is installed, other nodes can access the shared disks and cause data corruption.

If you are using a sophisticated disk system for shared cluster storage, use the features of the system to create the logical drives that your nodes will access. This step is necessary because the disk is the smallest unit of storage that is recognized as a cluster resource; all of the partitions on a disk move with the disk between cluster nodes.

Once the first node is booted, format your shared drives. Only the NTFS file system is supported on clustered disks. Create the quorum drive first. Assign a minimum of 500MB to the quorum drive, and do not place any applications on it. Partition and format the rest of your clustered drives as planned. Assign drive letters as you normally would, as shown in Figure 6.19, and document them. You can assign any drive letters that are not already in use, but it is a good idea to adopt the convention of assigning the quorum drive the same drive letter each time you create a cluster; Q (for quorum) is a good choice. Once you have assigned drive letters, you will need to match these drive-letter assignments on each node in the cluster. (A command-line sketch of this step appears at the end of this discussion.)

Figure 6.19 Configuring Clustered Disks in Disk Management

In addition to drive-letter assignments, you also have the option of using NTFS mounted drives. A mounted drive does not use a drive letter; it appears as a folder on an existing drive. Mounted drives on clustered storage must be mounted to a drive that resides on shared storage in the same cluster group, and they are dependent on this "root" disk.

Planning a sufficient allocation of disk space for your applications is critical. Since you cannot use dynamic disks on shared storage without third-party tools, it is difficult to increase the size of clustered disks later. Be sure to allow for data growth when initially sizing your partitions. This is a situation where it is better to allocate a few megabytes too many than a few kilobytes too few.
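As an illustration of the quorum-drive step above, here is a minimal sketch using diskpart and format. The disk number, the partition size, and the Q: letter are assumptions; verify your own disk numbering with diskpart's list disk command before running anything like this against shared storage, and remember that only the first node should be attached to the shared bus at this point.

    rem Contents of quorum.txt (disk number 1 is an assumption):
    rem   select disk 1
    rem   create partition primary
    rem   assign letter=Q
    rem Run the script, then format the new partition as NTFS:
    diskpart /s quorum.txt
    format Q: /FS:NTFS /V:Quorum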
If you plan on using the generic script resource type, make sure that the script file resides on a local disk, not on a shared disk. An errant script can itself be the cause of a failover, and if the script resides on a clustered disk, it "disappears" from under the node executing it. By keeping scripts on a local disk, they remain available to the node at all times, and the appropriate error-checking logic can be applied when errors are encountered.

Power-Saving Features

Windows Server 2003 includes power-management features that allow you to reduce the power consumed by your servers. This is very useful on laptop computers and some small servers, but it can cause serious problems on clustered servers. If more than one node were to enter a standby or hibernation state, the server cluster could fail.

The power-saving options in Windows Server 2003 must be disabled for server clusters. Configure the nodes to use the Always On power scheme, as shown in Figure 6.20. To access this option, select Start | Control Panel | Power Options. Using this power scheme prevents the system from shutting down its hard drives and from attempting to enter a standby or hibernation state.

Figure 6.20 Enabling the Always On Power Scheme
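If you manage several nodes, the same setting can be applied from the command line with powercfg. A minimal sketch, assuming the built-in scheme still carries its default name of Always On:

    rem List the available power schemes, then activate Always On
    powercfg /list
    powercfg /setactive "Always On"

Run this on every node, and confirm the active scheme afterward with powercfg /query.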
Cluster Network Configuration

Communications are a critical part of server cluster operations. Nodes must communicate with each other directly over the interconnects in order to determine each other's health and, if necessary, initiate a failover. Nodes must also communicate with client systems over the public network to provide services. Both of these networks require proper planning. When referring to server clusters, there are four types of networks:

■ Internal cluster communications only (private network) Used by the nodes to handle their own communication requirements only. No clients are present on this network. This network should be physically separated from other networks and must have good response times (less than 500 ms) in order to avoid availability problems.

■ Client access only (public network) Used to service client requests only. No internal cluster communication occurs over this network.

■ All communications (mixed network) Can handle both categories of communications traffic. Normally, this network acts as a backup to a private network, but that is not required.

■ Nonclustered network (disabled) Unavailable for use by the cluster, either for servicing clients or for internal communications.

When you create the server cluster through the New Server Cluster Wizard, it detects the different networks configured in the server, and you will be asked to select the role each network will have in the server cluster. Select Internal cluster communications only (private network) for the interconnect(s), as shown in Figure 6.21, instead of accepting the default value (which mixes the server cluster heartbeat traffic with client communication traffic). If you are using only a single interconnect, you should configure at least one public network interface with the All communications (mixed network) setting, as shown in Figure 6.22. This gives the server cluster a backup path for internal communications, should one be needed. If you have multiple interconnects configured, you should set the public interfaces to the Client access only (public network) setting.

Figure 6.21 Configuring Interconnect Networks
Figure 6.22 Configuring Public Networks

Multiple Interconnections

At least one interconnect between nodes is required. Node status messages are passed over this communication path, and if the path becomes unavailable, a failover may be initiated. For this reason, multiple interconnects are recommended. With multiple interconnects configured, the reliability of the interconnects goes up: if a heartbeat message on one interconnect path goes unanswered, the node will attempt the other interconnect paths before initiating a failover. As with most components in a high-availability system, redundancy is good.

When using multiple interconnects, follow the same configuration rules stated previously, but try to avoid using multiple ports on the same multiport network interface card (NIC); if the card fails, you lose the interconnect. If you are using two dual-port cards, try to configure the system to use one port on each card for the interconnects and the other port on each card for your public network.

Node-to-Node Communication

The interconnects are used by the nodes to determine each other's status. This communication is unencrypted and frequent. Normal client activity does not occur on this network, so you should not have client-type services bound to the network interfaces used for the interconnects. Windows Server 2003 normally attaches the following services to each network interface:

■ Client for Microsoft Networks
■ Network Load Balancing
■ File and Printer Sharing for Microsoft Networks
■ Internet Protocol (TCP/IP)

You should uncheck the first three services on each interconnect interface (the properties of a network interface are accessible via Start | Control Panel | Network Connections). Only TCP/IP should be bound. Figure 6.23 shows a properly configured interconnect interface.

You should also make sure that the Network Priority property of the server cluster is configured with the interconnect(s) given the highest priority, as shown in Figure 6.24. This ensures that internal cluster communication is attempted on the interconnects first. To access this property, right-click the server cluster name in Cluster Administrator and select Properties.

Figure 6.23 Configuring an Interconnect Interface
Figure 6.24 Setting the Network Priority Property of the Cluster
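Network roles can also be inspected and adjusted after the fact with the cluster.exe command-line tool. A minimal sketch follows, assuming a cluster named MyCluster and networks named Heartbeat and Public (both hypothetical names). The Role values 1, 2, and 3 correspond to private, public, and mixed; treat the exact syntax as an assumption and check cluster network /? on your system.

    rem Show the current properties (including Role) of each cluster network
    cluster /cluster:MyCluster network /prop
    rem Make Heartbeat internal-only (1) and Public client-only (2)
    cluster /cluster:MyCluster network "Heartbeat" /prop Role=1
    cluster /cluster:MyCluster network "Public" /prop Role=2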
Binding Order

Binding is the process of linking the various communications components together, in the proper order, to establish the communications path. To configure the binding order of protocols and services to the network interfaces, select Start | Control Panel | Network Connections, click the Advanced menu, and select Advanced Settings.... When establishing the order of network connections, ensure that the public interfaces appear highest on the list, followed by the interconnects, and then any other interfaces. Figure 6.25 shows this binding order.

Figure 6.25 Setting the Proper Binding Order of Interfaces

Adapter Settings

All network interfaces in a server cluster should be manually set for speed and duplex mode. Do not allow the network adapters to auto-negotiate these settings; if the controllers negotiate differently, your communications can be disrupted. Also, in many cases a crossover cable is used on the interconnects, and in those cases auto-negotiation may fail entirely, so the interconnect may never be established, affecting cluster operation.

As mentioned earlier, teamed network adapters must not be used for the interconnects. However, they are perfectly acceptable for the public network interfaces, where a failover or load-balanced configuration increases redundancy and reliability.

TCP/IP Settings

Static IP addresses (along with the relevant DNS and WINS information) should be used on the public network interfaces. For the interconnects, you must use static IP addresses.

It is also a good practice to assign private IP addresses on the interconnects from a different address class than your public addresses. For example, if you are using class A addresses (10.x.x.x) on your public interfaces, you could use class C addresses (192.168.x.x) on your interconnects. Following this practice lets you identify which type of network you are troubleshooting just by looking at the address class. Using addresses this way is not required, but it does prove useful.

Finally, you should not configure IP gateway, DNS, or WINS addresses on your interconnect interfaces. Name resolution is usually not required on the interconnects and, if configured, could cause conflicts with name resolution on your public interfaces. All public interfaces must reside on the same IP subnet. Likewise, all interconnect interfaces must reside on the same IP subnet.
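The interconnect addressing just described can be scripted with netsh. A minimal sketch, assuming a connection named Heartbeat and the 192.168.10.0/24 range (both assumptions; pick addresses that fit your own plan). Note that no default gateway or DNS server is configured:

    rem Static address only; no default gateway is specified
    netsh interface ip set address "Heartbeat" static 192.168.10.1 255.255.255.0
    rem Clear any DNS server assignment on the interconnect
    netsh interface ip set dns name="Heartbeat" source=static addr=none

Use a matching address on the second node (for example, 192.168.10.2), and verify connectivity between the interconnects with ping.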
The Default Cluster Group

Every server cluster has at least one cluster group: the default. This group contains the following resources:

■ Quorum disk (which contains the quorum resource and logs)
■ Cluster IP address
■ Cluster name (which creates the virtual server)

When designing your server cluster, do not plan on using these resources for anything other than system administration. If this group is offline for any reason, cluster operation can be compromised. Do not install applications on the quorum drive or in the default cluster group.

Security

Security is a consideration for any computer system, and server clusters are no exception. In fact, because they often contain critical information, they should usually be more closely guarded than a standard server.

Physical Security

Nodes should be kept in controlled environments and behind locked doors. More downtime is caused by accident than by intent. It is obvious that you would not want an unhappy employee or an ex-employee to have access to your computer systems, but what about the merely curious user? Both can lead to the same end. When setting up physical security, do not forget to include the power systems, network switches and routers, keyboards, mice, and monitors. Unauthorized access to any of these can lead to an unexpected outage.

Public/Mixed Networks

It is a good idea to isolate critical server clusters behind firewalls if possible. A properly configured firewall also allows you to control the network traffic your server cluster encounters. If there are infrastructure servers (DNS, WINS, and so on) that clients rely on to reach the server cluster, make sure that those servers are secured as well. If, for example, name resolution fails, clients may be unable to access the server cluster even though it is fully operational.

Private Networks

The traffic on the private interconnect networks is meant to be used and accessed by the nodes only. If high traffic levels disrupt or delay heartbeat messages, the server cluster may interpret this as a node failure and initiate a failover. For this reason, it is a good idea to place the interconnects on their own switch or virtual LAN (VLAN) and not to mix heartbeats with other traffic. Do not place infrastructure servers (DNS, WINS, DHCP, and so on) on the same subnet as the interconnects. These services are not used by the interconnects, and their traffic may cause exactly the conflicts you want to avoid.

Remote Administration of Cluster Nodes

Administration of your server cluster should be limited to a few controlled and trusted machines. The administrative tools are quite powerful and could be used, intentionally or accidentally, to cause failovers, service stoppages, resource stoppages, or node evictions. The use of Terminal Services on nodes is debatable. Terminal Services works fine on nodes and actually brings some benefits. Evaluate your administrative, security, and operational needs to determine whether installing Terminal Services on your nodes is appropriate for your situation.

The Cluster Service Account

The account that the cluster service uses must be a domain-level account and must be a member of the local Administrators group on each node. This account should not be a member of the Domain Admins group; using an account with elevated domain-level privileges would present a serious security risk if the cluster service account were ever compromised.

Do not use the cluster service account for administration, and configure it so that it can log on only to the cluster nodes. Use a different cluster service account for each cluster in your environment; this limits the scope of a security breach in the event that one occurs. If any of the applications running on your server cluster require accounts for operation, create and assign accounts specifically for those applications. Do not use the cluster service account for running applications; doing so would make your cluster vulnerable to a malfunctioning application.

If you are required to permanently evict (forcibly remove) a node from a server cluster, manually remove the cluster service account from the appropriate local security groups on the evicted node; the cluster administrative tools will not remove this account automatically. Leaving the account with elevated permissions on an evicted node exposes both the evicted node and your domain to security risks.

Another possible method of securing a server cluster is to create a domainlet: a domain created just to host a server cluster, in which each node is a domain controller. A domainlet allows you to better define and control the security boundary of the cluster. There are advantages and disadvantages to this approach. (For more information about domainlets, visit Microsoft's Web site.)
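Two of the housekeeping tasks above lend themselves to one-line commands. A minimal sketch, assuming a cluster named MyCluster and a service account CONTOSO\clustersvc (both hypothetical names); the /changepass switch was introduced with Windows Server 2003, so treat its availability and exact syntax as assumptions and check cluster /? before relying on it:

    rem Rotate the cluster service account password cluster-wide
    cluster /cluster:MyCluster /changepass:NewP@ssw0rd,OldP@ssw0rd
    rem On a permanently evicted node, strip the account's local rights
    net localgroup Administrators CONTOSO\clustersvc /delete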
Client Access

Use the security features built into Windows Server 2003 and Active Directory (AD) to secure the applications and data on your server cluster. Turn on and use the auditing features of the operating system to see what activity is occurring on your server cluster.

Administrative Access

In larger organizations, it may be possible to have a different group of personnel responsible for administering clusters than those who perform other administrative tasks. Evaluate this possibility in your organization. If this strategy is adopted, assign these cluster administrators to a domain group created for that purpose, and grant that group administrative rights on the cluster nodes only.
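Creating such a group is a one-liner. A minimal sketch, using a hypothetical group name ClusterAdmins and a hypothetical user jsmith; run it on, or against, a domain controller:

    rem Create a domain group for cluster administrators and add a member
    net group "ClusterAdmins" /add /domain
    net group "ClusterAdmins" jsmith /add /domain

You can then grant the group permissions on the cluster itself through the cluster's Properties in Cluster Administrator, which include a Security tab for this purpose.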