As with ClusterXL load sharing and HA New mode, the best test you can perform is a real-world test. In load sharing, a simple test consists of starting a connection through the cluster and monitoring the cluster to determine which member the connection has gone through. If the test connection is the only connection, you might be able to see this from the “Work assigned” value in the cluster monitoring facility in Voyager. Alternatively, you could use the FireWall-1 NG FP3 SmartView Tracker (with a hotfix applied to show the origin IP address of the member in the cluster) or SmartView Monitor.
Figure 6.69 Both Members Are Online as Part of the Cluster
In this example, we have started an FTP session through the cluster, and we are using SmartView Monitor to monitor the traffic through the cluster. When we initiate the FTP session through the cluster and start downloading data, we can see that all the load is on member fw2 in the cluster (see Figure 6.72).
Figure 6.71 One Member Only in Cluster
Figure 6.72 Display of Traffic Through SmartView Monitor
As we can see in Figure 6.72, the FTP session was started at 11:52:30, and failure occurred at 11:52:48 (actually, we pulled the internal interface connector out of member fw2). Figure 6.73 shows that member fw1 took over the session.
Note that the timeline shows that member fw1 did not take over the load for 3 seconds.
Command-Line Stats
We saw earlier that ClusterXL uses the cphaprob command to determine status of the cluster. We can use a similar Nokia command-line tool to check the status of a Nokia cluster.
On the Nokia platform, we use the Command Line Interface Shell (known as clish). This is an interactive command line, although a single command can be executed using the -c "command" option. Once in the shell, you can use the command show clusters to determine the status of the members in the cluster (see Figure 6.74).
Figure 6.74 Example of Use of the clish Command to Check the Cluster Status
fw2[admin]# clish
Nokia> show clusters
CID 130
  Cluster State up
  Member ID 1
  Protocol State master
  System Uptime At Join 1:02:58:57
  Performance Rating 275
  Failure Interval 4000
  Cold Start Delay 30
  Number of Interfaces 3
  Primary Interface eth-s2p3c0
  Interface eth-s2p1c0
    IP Address 195.166.16.132/24
    Cluster IP Address 195.166.16.130
    Hash NAT-external
  Interface eth-s2p3c0
    IP Address 192.168.12.132/24
    Cluster IP Address 192.168.12.130
    Hash default
  Interface eth-s2p4c0
    IP Address 192.168.1.132/24
    Cluster IP Address 192.168.1.130
    Hash NAT-internal
Member(s) information
  Number of Member(s) 2
  Member 1 (master)
    IP Address 192.168.12.132
    HostName(Platform) fw2(IP400)
    OS Release 3.6-FCS4
    Rating 275
    Time Since Join 0:19:20:57
    Cluster Uptime At Join 0:00:00:00
    Work Assigned 50%
  Member 2 (member)
    IP Address 192.168.12.131
    HostName(Platform) fw1(IP400)
    OS Release 3.6-FCS4
    Rating 275
    Time Since Join 0:19:14:34
    Cluster Uptime At Join 0:00:06:22
    Work Assigned 50%
Nokia> show cluster securemote
yes
Nokia> show cluster vpn-tunnels
VPN tunnel(s) configured
Network/Mask Destination
192.168.254.0/24 194.155.13.33
Nokia> exit
Goodbye..
Figure 6.73 Display of Traffic Through Member fw1 When fw2 Fails
Many commands are variations of the show cluster command. See the Nokia Command Line Reference Guide for further information. You can use the cphaprob command on the Nokia platform if you like, but the information that it will tell you is limited. For example, it can’t tell you which interfaces are up or down. It can tell you if the state table synchronization is working or not.
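Because clish can run a single command non-interactively, the per-member figures in its output lend themselves to scripting. The following Python sketch pulls each member's role, host name, and "Work Assigned" value out of text shaped like Figure 6.74. The function name, the regular expression, and the assumption that the fields always appear in this order are illustrative only; they are not taken from Nokia documentation.

```python
import re

# Matches one member block, e.g. "Member 1 (master) ... HostName(Platform) fw2(IP400) ... Work Assigned 50%".
# Assumption: HostName and Work Assigned always follow the "Member N (role)" line in this order.
MEMBER_RE = re.compile(
    r"Member\s+(?P<id>\d+)\s+\((?P<role>\w+)\)"
    r".*?HostName\(Platform\)\s+(?P<host>\w+)\("
    r".*?Work Assigned\s+(?P<work>\d+)%",
    re.DOTALL,
)


def parse_members(text):
    """Return one dict per cluster member found in the given output text."""
    return [
        {
            "id": int(m["id"]),
            "role": m["role"],
            "host": m["host"],
            "work": int(m["work"]),
        }
        for m in MEMBER_RE.finditer(text)
    ]


# Member section copied from Figure 6.74.
SAMPLE = """\
Member(s) information
Number of Member(s) 2
Member 1 (master)
IP Address 192.168.12.132
HostName(Platform) fw2(IP400)
OS Release 3.6-FCS4
Rating 275
Time Since Join 0:19:20:57
Cluster Uptime At Join 0:00:00:00
Work Assigned 50%
Member 2 (member)
IP Address 192.168.12.131
HostName(Platform) fw1(IP400)
OS Release 3.6-FCS4
Rating 275
Time Since Join 0:19:14:34
Cluster Uptime At Join 0:00:06:22
Work Assigned 50%
"""

for member in parse_members(SAMPLE):
    print(member["host"], member["role"], member["work"])
```

In a load-sharing test such as the FTP example, comparing these values before and after a failure is one way to confirm that the surviving member has picked up the load.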
How Nokia Clustering Works
Nokia clustering has many similarities to the Check Point ClusterXL load-sharing solution, but because the clustering is not part of the Check Point product, you do get some differences that are significant. We can draw some parallels between ClusterXL load sharing and Nokia clustering as follows:
■ Both ClusterXL load sharing and Nokia clustering use a VIP address and a multicast MAC address, so devices on the local subnet do not see any difference when initiating connections through the cluster. On a Nokia cluster, there is always a host that is assigned master in the cluster, and this member will respond to ARP requests.
■ Both ClusterXL and Nokia clustering have a method for each member to tell the other members its status in the cluster. However, the ways that they do this are different. ClusterXL does this using the CPHA protocol, which is sent from each interface of the cluster member to all other cluster members. Nokia uses a dedicated network to communicate using its own protocols: IP protocol 0x90 (144 decimal), which is sent to a multicast MAC and IP destination address, and two TCP services (ports 11003 and 11004). Note that the protocol 0x90 traffic bypasses the firewall, so no policy rules are required.
■ Both systems have a load-sharing hashing method that can be altered by the user. On Nokia, this method is set up in Voyager, based on whether your interface is external or internal (or a VPN gateway); on Check Point ClusterXL, this is based on three choices: IP addresses, ports, and SPI (VPN negotiation); IP addresses and ports; or just IP addresses.
■ As with ClusterXL, connections through the Nokia cluster are directed through one member in the cluster on a per-connection basis. The load-sharing algorithm avoids asymmetric routing. Connections would still work if asymmetric routing did occur, but some sessions could be dropped as they initiate, because the reply from the remote host can arrive before the state tables have had an opportunity to synchronize between the cluster members.
■ Just like ClusterXL, the Nokia members still have valid IP addresses that you can connect to.
Let’s walk through an example of how a connection would work through a Nokia cluster. In our example, host 192.168.1.200 will initiate a Telnet session through the Nokia cluster to our ISP router on IP address 195.166.16.129. As in our ClusterXL HA New mode example, we will hide the connection behind the cluster external IP address of 195.166.16.130, using a hide rule in our firewall NAT Rule Base.
When the Telnet session is initiated, the host 192.168.1.200 sends out an ARP request for 192.168.1.130, which is the default gateway on the network 192.168.1.0.
The ARP response will contain a multicast MAC address, that is, a MAC address that applies to all members of our cluster for the internal interface. The Nokia member that is the master will always send the ARP response. (More on the master later.) In our example, the MAC address returned is 01:50:5a:a8:01:82. Our host on 192.168.1.200 then sends a TCP SYN packet with a high source port, destination IP address 195.166.16.129, and destination MAC address 01:50:5a:a8:01:82 (the default gateway MAC address).
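The two cluster MAC addresses that appear in this example (01:50:5a:a8:01:82 for the internal VIP 192.168.1.130, and 01:50:5a:a6:10:82 for the external VIP 195.166.16.130 in Figure 6.75) share a visible pattern: the prefix 01:50:5a followed by the last three octets of the VIP in hexadecimal. The short Python sketch below checks that observation. It is only a pattern read off these two addresses, not a documented rule, and the function name is made up for illustration.

```python
import ipaddress


def observed_cluster_mac(vip):
    """Build a MAC from the pattern seen in this example: 01:50:5a + last three VIP octets."""
    octets = ipaddress.IPv4Address(vip).packed  # the four raw bytes of the address
    return ":".join(f"{b:02x}" for b in b"\x01\x50\x5a" + octets[1:])


print(observed_cluster_mac("192.168.1.130"))   # internal VIP
print(observed_cluster_mac("195.166.16.130"))  # external VIP
```

If the pattern holds on your own cluster, it offers a quick sanity check when reading packet captures taken on the cluster subnets.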
All members in the cluster will receive this packet, but only one of them will do anything with it. Which member picks up the packet is decided by the load-sharing algorithm. The member that will deal with the connection passes the packet up through the IP stack to the Check Point FireWall-1 NG FP3 kernel for the incoming interface. The TCP SYN packet will pass through the Rule Base of the firewall and, providing everything is fine, the member will then send the packet out of its external interface with the source IP address of 195.166.16.130 (the external cluster IP address), the source MAC address of the member that is taking the connection (in our example, the source MAC address is 00:c0:95:e2:b1:40, which corresponds with member fw2 external interface eth-s2p1c0), and the destination IP address 195.166.16.129 (see Figure 6.75).
If the Telnet daemon is listening when the packet reaches the ISP router on 195.166.16.129, it will produce a response. Again, the ISP router will issue an ARP request for IP address 195.166.16.130, which is the VIP of the cluster. The master member will respond to the ARP request, sending the multicast MAC address as the MAC address associated with IP 195.166.16.130 (but it will keep the source MAC address of the ARP reply as that of its own physical external interface; this is one way to see which of your Nokia members is the master without using Voyager). Host 195.166.16.129 will then send a SYN,ACK TCP packet: the source IP will be 195.166.16.129, the source port will be 23, and the destination MAC will be the multicast MAC address of the VIP 195.166.16.130, which is 01:50:5a:a6:10:82 in our example.
Figure 6.75 Description of a Connection Through a Nokia Load-Sharing Cluster
[Network diagram. Labels recoverable from the figure:]
External Network 195.166.16.0/24, VIP = 195.166.16.130, VMAC = 01:50:5a:a6:10:82
  ISP Router 195.166.16.129 (out to the Internet)
  fw1 195.166.16.131, eth-s1p1c0, MAC = 00:c0:95:e0:15:dc
  fw2 195.166.16.132, eth-s2p1c0, MAC = 00:c0:95:e2:b1:40
State sync Network 192.168.11.0/24
  fw1 192.168.11.131, eth-s1p2c0, MAC = 00:c0:95:e0:15:dd
  fw2 192.168.11.132, eth-s2p2c0, MAC = 00:c0:95:e2:b1:41
Internal Network 192.168.1.0/24, VIP = 192.168.1.130, VMAC = 01:50:5a:a8:01:82, Domain = london.com
  fw1 192.168.1.131, eth-s1p4c0, MAC = 00:c0:95:e0:15:df
  fw2 192.168.1.132, eth-s2p4c0, MAC = 00:c0:95:e2:b1:43
  Host 192.168.1.200, default route = 192.168.1.130
Other labels in the figure: cpmgr, PDC, 195.166.16.134, and three hubs
1. TCP SYN packet sent to multicast MAC.
2. Load sharing hash calculates that fw2 should take the connection.
3. Packet is HIDE address translated behind VIP.
4. SYN ACK packet is sent to multicast MAC 01:50:5a:a6:10:82.
5. Reply packet is accepted by fw2 based on hashing algorithm, and address translated.
Again, the reply packet is received by all members in the cluster, and the hashing algorithm selected for that interface picks the member that took the original SYN packet for the connection.
NOTE
It is important to understand the meaning of the various hashing algorithms. Whether the reply packets get sent back through the same member depends on which hashing algorithms you select. For example, if you use Hide NAT when initiating a connection that leaves through the external interface, you have to pick hashing methods that take the NAT into account: NAT_EXT for the external interface, NAT_INT for the internal interface. Not doing this could cause the load-sharing algorithm to let the wrong member in the cluster accept the reply packets, ending up with asymmetric routing. In some complex NAT configurations, there will be conflicts as to which hashing algorithms should be used (for example, where “double NAT” takes place). If these configurations cannot be avoided, other measures should be taken to avoid asymmetric routing, such as static routing via members. This could well lead to imbalances in load sharing and lack of resilience for some connections.
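To see why the hash has to take NAT into account, consider the toy model below. The hash functions Nokia actually uses are not described here, so everything in this Python sketch is an assumption made for illustration: the CRC-based hash, the function names, and the idea that a NAT-aware hash keys on the remote host's address (the destination on the internal interface, the source on the external interface). The addresses are the ones from the Telnet example.

```python
import zlib

MEMBERS = ["fw1", "fw2"]


def pick_member(key_ip):
    """Toy hash: map one IP address string to a cluster member."""
    return MEMBERS[zlib.crc32(key_ip.encode()) % len(MEMBERS)]


def member_for_internal(src_ip, dst_ip):
    # Assumed NAT-aware choice on the internal interface: key on the
    # destination address, which Hide NAT leaves unchanged.
    return pick_member(dst_ip)


def member_for_external(src_ip, dst_ip):
    # Assumed NAT-aware choice on the external interface: key on the
    # source of the arriving packet, which is the same remote host.
    return pick_member(src_ip)


outbound = ("192.168.1.200", "195.166.16.129")   # SYN arriving on the internal interface
reply = ("195.166.16.129", "195.166.16.130")     # SYN,ACK arriving on the external interface

# Both directions key on 195.166.16.129, so they always agree on a member.
same_member = member_for_internal(*outbound) == member_for_external(*reply)

# A hash that keyed on the full address pair would see two different pairs,
# because Hide NAT replaced 192.168.1.200 with the VIP.
naive_keys_match = sorted(outbound) == sorted(reply)

print(same_member, naive_keys_match)  # True False
```

The point of the sketch is only that the two interfaces must derive their hash from something that NAT does not change. When no such value exists, as in the double NAT case, the two directions can land on different members.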
The packet then leaves the internal interface of member fw2 in our example; the source IP is the 195.166.16.129 IP address of the ISP router, the source MAC address is the internal interface MAC address 00:c0:95:e2:b1:43 of fw2, and the destination IP is now 192.168.1.200 (it has been address translated by FireWall-1).
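The address translation in this walkthrough can be summarized with a small model of a Hide NAT state table. This Python sketch is a simplification under stated assumptions: real Hide NAT normally rewrites the source port as well, the source port 1025 is a made-up stand-in for the "high source port" in the example, and the function names are hypothetical.

```python
VIP_EXTERNAL = "195.166.16.130"

# Maps (remote IP, remote port, local port) to the hidden internal address.
state_table = {}


def translate_outbound(src_ip, src_port, dst_ip, dst_port):
    """Hide the internal source behind the external VIP and remember the mapping."""
    state_table[(dst_ip, dst_port, src_port)] = src_ip
    return (VIP_EXTERNAL, src_port, dst_ip, dst_port)


def translate_reply(src_ip, src_port, dst_ip, dst_port):
    """Restore the internal destination for a reply addressed to the VIP."""
    internal_ip = state_table[(src_ip, src_port, dst_port)]
    return (src_ip, src_port, internal_ip, dst_port)


# SYN from the internal host to the Telnet port on the ISP router.
syn_out = translate_outbound("192.168.1.200", 1025, "195.166.16.129", 23)

# SYN,ACK from the ISP router back to the cluster VIP.
syn_ack_in = translate_reply("195.166.16.129", 23, "195.166.16.130", 1025)

print(syn_out)
print(syn_ack_in)
```

The lookup in translate_reply fails on a member that has no entry for the connection, which is why the reply must reach the member that handled the SYN, or a member whose state table has already been synchronized.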
Nokia Cluster Failover
In the event of a failure condition, network traffic handled by the failed member needs to be taken over by another member in the cluster. This is done on the cluster control network. Again, the key is the cluster control protocol that uses this network.
The master member uses the Nokia cluster control protocol to send the status of the cluster to all other members in the cluster. The master member is usually the first member that is made active when you create a Nokia cluster. If the master fails, another member will take over and become master. There is only one master member in any cluster, but the member that is master can change depending on failures in the cluster.
When the master member in the cluster communicates with the other members in the cluster, it uses the Nokia cluster control protocol, which is IP protocol 0x90 (144 decimal). The cluster control network is used exclusively (unlike the CPHA protocol used in ClusterXL). When the master communicates with the other members in the cluster, the source MAC address is the real MAC address of the master on the control network, the source IP address is the real IP address of the master, the destination MAC address is a multicast MAC address, and the destination IP address is a multicast IP address. For example, if member fw1 were the master, it would send out a packet with source MAC 00:c0:95:e0:15:de, source IP 192.168.12.131, destination MAC 01:00:5e:00:01:90, and destination IP address 224.0.1.144. All members that receive the packet will often respond, with their real source MAC and IP address, to the real destination MAC and IP address of the master.
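The destination MAC in this example is not arbitrary. It follows the standard mapping of an IPv4 multicast group onto Ethernet, in which the fixed prefix 01:00:5e is followed by the low 23 bits of the group address. The Python sketch below applies that mapping to 224.0.1.144; the function name is chosen for illustration.

```python
import ipaddress


def ipv4_multicast_mac(group):
    """Map an IPv4 multicast group to its Ethernet MAC (01:00:5e + low 23 bits)."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    low23 = int(addr) & 0x7FFFFF            # keep only the low 23 bits of the group
    mac = (0x01005E << 24) | low23          # prepend the fixed 01:00:5e prefix
    return ":".join(f"{b:02x}" for b in mac.to_bytes(6, "big"))


print(ipv4_multicast_mac("224.0.1.144"))  # 01:00:5e:00:01:90
print(0x90)                               # 144, the protocol number in decimal
```

Knowing the mapping makes it easier to build a capture filter for the cluster control traffic: filter on IP protocol 144 or on this destination MAC.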
In our example, if member fw1 were the master and member fw1 failed, fw2 would become the master. You would notice that fw2 would start to issue IP protocol 0x90 packets from its real IP, and the destination IP would be the multicast IP for the other members in the cluster. This is another method you can use to determine which member in the Nokia cluster thinks it is the master. Note that when a new master is chosen, it will stay the master until it fails and cannot be the master any longer. You will also see Nokia cluster control connections on TCP ports 11003 and 11004 on the cluster control network.
Failover from the point of view of the networking devices on the same local subnet as the VIPs is transparent because the MAC address used by the cluster does not change. There will be a short delay during failover as the load-sharing algorithm determines which member in the cluster will take over the connections of the failed member. This process can take up to 4 seconds.
Nokia Failover Conditions
Failure of a Nokia cluster member is determined when one of the following occurs:
■ IP forwarding fails or is stopped (e.g., by cpstop).
■ The FireWall-1 process fwd dies.
■ An interface goes down.
All these scenarios are monitored by the clusterd process on each Nokia member.
When a failover occurs, the clusterd process logs the event in the Nokia system logs (/var/log/messages file).
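Since clusterd writes failover events to /var/log/messages, a simple filter can pull its entries out of the system log. The Python sketch below is minimal and rests on one assumption: that the entries contain the string "clusterd". The sample lines are placeholders invented for the demonstration and do not reproduce real clusterd messages.

```python
def clusterd_events(lines):
    """Return the log lines that mention the clusterd process."""
    return [line.rstrip("\n") for line in lines if "clusterd" in line]


# Placeholder lines for demonstration only; real entries may look different.
sample = [
    "Jan  1 11:52:48 fw1 clusterd: placeholder failover message\n",
    "Jan  1 11:52:49 fw1 sshd: unrelated entry\n",
]
print(clusterd_events(sample))

# On a cluster member, the same filter could be fed the file named in the text:
# with open("/var/log/messages") as log:
#     events = clusterd_events(log)
```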
Special Considerations for Nokia Clusters
We have talked a little about how the Nokia clustering solution works. We now need to take the effects of that technology into account when setting up our cluster and the Rule Base we are likely to use.
Network Address Translation
As with all clusters, the way you decide to implement your NAT rules needs to be taken into account. In ClusterXL in HA New mode, we noted that you cannot use manual proxy ARP entries in the OS. In ClusterXL in Load-Sharing mode, we stated that all methods of NAT and proxy ARP should work fine.
In a Nokia cluster, you cannot use Check Point’s own Automatic ARP setting in the Policy | Global Properties | NAT – Network Address Translation | Automatic Rules for Automatic ARP Configuration menu.
The reason for this is that each member will proxy ARP using its own real MAC address rather than the multicast MAC address of the cluster.
You can enter proxy ARP entries into Voyager for NATed IP addresses, using the multicast MAC address of the cluster interface. You can also use static routes on the ISP router to route traffic to the VIP address of the cluster for the NATed IP address.
If you plan to use proxy ARPs for multicast MAC addresses on the Nokia platform, you need to enable Accept Multicast reply to ARP on the ARP page of the Voyager interface. You need to do this for all members that make up your cluster.
NOTE
Accept Multicast reply to ARP must be enabled for the cluster to work properly.
Defining the Cluster Object Topology
When defining the gateway cluster object for the Nokia cluster, it is possible to define the cluster topology, listing the VIPs. However, this apparently harmless change results in a significant change in FireWall-1 behavior. Connections that originate from individual cluster members are subject to implicit Hide NAT behind the outgoing cluster VIP. This will affect traffic such as DNS lookups and outgoing FTP connections originating from cluster members. This is the same behavior we saw under ClusterXL. As with ClusterXL, once FP3 Hot Fix 1 is applied, packets routed back to the wrong member will be routed onward via the sync link. Check Point ClusterXL makes allowances for this when handling this traffic, dealing with it gracefully. A Nokia clustering solution will not deal with it as well, and the traffic involved will not be reliable.
This behavior will also cause a problem with traffic between external interfaces of members. For these reasons, defining the cluster topology is not recommended when you’re using a Nokia solution. Possibly this configuration will be made workable in future releases of NG.