JUNOS software configuration

hannes@Frankfurt> show configuration
[…]
protocols {
    isis {
        interface at-4/0/0.100 {
            lsp-interval 50;
        }
    }
}
[…]

LSP throttling by use of the lsp-interval command is a powerful mechanism to control the flooding pace to neighbouring routers so as not to overload them. There is another issue that has not yet been discussed: control traffic (LSP and related packets) may “push back” the user traffic (information packets) because control traffic always has precedence in terms of scheduling on the router interface cards. Unfortunately, the control traffic transmission rate does not get lower on low-bandwidth interfaces such as DS0 or fractional T1/E1 lines; it stays the same. You can easily imagine that on a low-bandwidth circuit transmitting 30 full-MTU sized packets per second does not leave much room for other types of packets. So it would be nice if there were a way to tell the router to use only a certain percentage of the interface bandwidth for control traffic.

In IOS, you can configure the bandwidth <bw> statement on a (sub-)interface so that the router makes sure that no more than 50 per cent (for instance) of the interface bandwidth is used for LSP transmission. This is the recommended option for low-bandwidth circuits.

IOS configuration

In IOS, LSP throttling is calculated automatically by setting the bandwidth parameter in interface configuration mode – this makes sure that not more than 50 per cent (for example) of the configured interface bandwidth is dedicated to the routing protocol.
This example sets the total bandwidth available for IS-IS traffic to 256 Kbps, which might be only a fraction of the total bandwidth available on the link (perhaps 2 Mbps):

London# show running-config
[…]
interface Serial1/2
 ip router isis
 bandwidth 256
[…]

JUNOS software does not support automated calculation of LSP throttling because the lowest-speed interface cards on a Juniper Networks router start at T1/E1 speeds (1.5 and 2 Mbps) and it is assumed that even with an LSP pacing of 20 ms this will not consume more than 50 per cent of the interface bandwidth. However, there may be fractional T1/E1 circuits (less than the full bandwidth) configured as well, where LSP pacing might have to be adjusted.

In practice, the JUNOS software lsp-interval knob helps to solve two problems: regulating the control-traffic-to-user-traffic ratio, and protecting neighbours during transient situations. So the lack of direct bandwidth control is not really an issue: the same knob can be used to solve both problems.

Note that the traffic subject to this pacing is non-self-originated traffic, that is, traffic that has been originated by other routers, not the local router. In the next section, you learn about pacing of self-originated LSPs that come from the local router.

6.6.2 LSP-generation-interval

Routers need to limit how fast they announce changes to the network. A router does not just send an LSP and move on. Sending an LSP to the network essentially requests a replication service from the network to flood the LSP. So any LSP sent consumes tremendous resources from the network. The LSP sent may be replicated by hundreds of routers over thousands of links.

By inserting pacing rules on the individual routers, you can make sure that the network does not melt down once more than one router has something to say.
The ISO 10589 specification describes an architectural constant called minimumLSPGenerationInterval that serves this purpose. In vendor documentation this is sometimes referred to as LSP holddown. The IS-IS specification recommends setting this value to 30 seconds. Higher intervals may lead to routers that are not responsive to changes in the network, whereas lower values may generate too much churn in the network. For a long time, IOS has implemented a 5 second holddown interval to keep a good balance between the two extremes. Today, the frequency of LSP origination can be controlled using the lsp-gen-interval <holddown> [<initial-wait> <minimum-holddown>] configuration command. The first argument specifies the time between LSP builds. This is the timer that ISO 10589 mentions and that was discussed previously.

The interesting thing about LSP build holddown is that it is not enforced statically today. Modern implementations take a dynamic approach and try to strike a balance between responsiveness and stability. So there are two LSP holddown timers: a fast holddown timer and a slow holddown timer. Depending on how busy the network is, a router switches from fast behaviour to slow behaviour. The first couple of LSP builds are scheduled very quickly, without regard to the slow LSP build holddown. However, if more LSP builds are requested, then the router is probably in trouble and it backs off to the normal slow LSP origination behaviour.

The initial-wait timer specifies how fast the router fires off an LSP after first building it. In transient situations a router probably needs to update its LSP a few times, and this initial-wait timer helps by accumulating a few builds. The minimum-holddown timer controls the LSP build holddown in the fast phase. How many LSPs need to be built until IOS switches from fast to slow behaviour? IOS uses a technique called exponential back off to move gradually between the two modes. Consider the IOS configuration snippet shown here.
In IOS, there are three timers to control LSP holddown. The first timer specifies the LSP holddown in the slow phase, expressed in seconds. The second timer specifies how many milliseconds to wait before sending the LSP. The third timer specifies the LSP holddown in the fast phase, expressed in milliseconds.

IOS configuration

London# show running-config
[…]
router isis
 lsp-gen-interval 5 200 1000
[…]

Figure 6.17 shows the timing behaviour of the exponential back off algorithm. After the first LSP is built, it is delayed for 200 ms (the second value given) until it gets sent. Next, the holddown timer kicks in, so the second LSP originated will be delayed for at least 1000 ms (a full second), as specified in the third argument of the lsp-gen-interval configuration command. All subsequent LSP builds will be delayed by twice the previous holddown time: 2 seconds for the third LSP, 4 seconds for the fourth, and so on. The holddown time is limited to the first argument (5 seconds) of the lsp-gen-interval command so that the interval does not grow without bound. So for every fast build the LSP origination interval gets larger until it hits the ceiling of 5 seconds. After a particular router has stopped issuing LSPs for 20 seconds, the LSP holddown is reset. This means that from here on any further LSP originations will receive fast holddowns again, but only for the first couple of LSPs.

The JUNOS software scheme has a two-step rate limit. First, there is a global LSP throttling similar to the one specified in ISO 10589. All the LSPs are paced using a 20 ms timer. In addition, there is logic that damps adjacencies and makes sure that an adjacency is reliably up for some time before it is advertised. The global LSP gating is hard-coded; there is no user interface knob to change the value.
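The IOS exponential back off described for the lsp-gen-interval 5 200 1000 example can be sketched in a few lines of Python. This is only an illustration of the doubling-with-ceiling rule as the text states it, not IOS code; the function name holddown_schedule and its parameter names are hypothetical, and the 20-second reset to fast behaviour is deliberately not modelled.

```python
def holddown_schedule(max_holddown_ms=5000, initial_wait_ms=200,
                      min_holddown_ms=1000, builds=6):
    """Return the delay (in ms) applied to each successive LSP.

    The defaults mirror 'lsp-gen-interval 5 200 1000':
    first argument  -> ceiling of the holddown (5 s, slow phase),
    second argument -> wait before the first LSP is sent (200 ms),
    third argument  -> first holddown in the fast phase (1000 ms).
    """
    delays = [initial_wait_ms]           # first LSP: sent 200 ms after build
    holddown = min_holddown_ms
    for _ in range(builds - 1):
        # each later LSP waits for the current holddown, capped at the ceiling
        delays.append(min(holddown, max_holddown_ms))
        holddown *= 2                    # exponential back off: double each time
    return delays


if __name__ == "__main__":
    print(holddown_schedule())
    # -> [200, 1000, 2000, 4000, 5000, 5000]
```

With the example values the holddown reaches the 5-second ceiling at the fifth LSP, which matches the sequence of 1000 ms, 2000 ms, 4000 ms and 5000 ms holddowns described for Figure 6.17.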
The slow LSP holddown value is a base value of 10 seconds with 25 per cent jitter (timing variation) applied. That means that subsequent LSP builds will be randomly delayed between 7.5 and 10 seconds. Jittering a timer means that the event always happens earlier, but never later, than the original base value. This variation is useful to avoid global synchronization and the associated LSP storms and router churn. Recall that a new LSP makes all routers do several things at the same time (flooding, SPF calculation, and more), which in turn synchronizes the CPU peaks in a network. Smearing the CPU peaks across routers by adding some timer jitter helps to avoid churn across all routers.

In JUNOS software, there are also a number of fast builds, currently hard-coded to three fast builds of LSPs. The initial wait timer is hard-coded to 20 ms before the LSP is sent. The reason why there are no configuration knobs is that the JUNOS software has adjacency holddown logic to make sure that the root cause of dynamic LSP changes (adjacency changes) is damped (suppressed).

FIGURE 6.17. Exponential holddown gradually suppresses LSP generation. (Time axis t in ms: the first LSP is sent 200 ms after it is built; the second, third and fourth LSP builds are followed by holddowns of 1000 ms, 2000 ms and 4000 ms; the holddown then stays at the 5000 ms maximum; after 20 s the router falls back to fast behaviour.)

Exactly how does this adjacency holddown logic work? After a successful three-way handshake, the router does not declare the adjacency Up immediately. The router will wait to see if it can sustain the LSP stress generated from the new adjacency. Each new adjacency can generate a lot of LSPs. Just think of a partitioned network that starts to heal. The healing router brings up the adjacency and is exposed to a massive amount of new LSPs sent to it from the new peer.
In Chapter 8 you will acquire more insight as to just how IS-IS exchanges LSPs and the mechanisms that synchronize link-state databases. Can the router sustain the stress generated from all the new LSPs hammering at it? The router does not know yet. Does it make sense to advertise a new LSP if the network is in flux? Probably not, so the router delays its own LSPs until the network is quieter. Just to be safe, the JUNOS software waits at least 20 seconds after an adjacency is declared Up before doing anything further with the to-be-generated LSP. Next, the router starts to measure the arrival rate of LSPs to see if things have stabilized. JUNOS software still holds the adjacency down until the LSP reception rate has gone down to 5 LSPs per 5 seconds. At the latest after the maximum holddown period of 60 seconds, which begins after the IS-IS three-way handshake, the adjacency is finally advertised in the LSP.

This two-level approach (LSP gating plus adjacency holddowns) has proven to be a good mechanism that works in a variety of networking environments. The Juniper Networks development engineers felt that it was not necessary to expose a knob to change this behaviour to the user. (Knobs are good – but the knobs that I do not need are even better.)

6.6.3 Retransmission Interval

According to ISO 10589, each IS-IS router has to acknowledge LSPs within a five-second window or else the neighbouring router will retransmit that new LSP. A router that is in trouble may not be able to respond within the five seconds. Therefore it makes sense to increase that retransmission timer to higher values for lower-powered, CPU-based routers. In JUNOS software, the five-second retransmission interval is hard-coded and cannot be changed. In Cisco IOS the retransmission interval is configurable and can be controlled on a per-interface basis.

IOS configuration

In IOS, the retransmission timer is configurable.
Setting the isis retransmit-interval <interval> command in interface configuration mode controls this timer, as shown in the following:

London# show running-config
[…]
router isis
 isis retransmit-interval 5
[…]

In Cisco IOS, you can also control how fast LSPs are sent once a router is in the retransmission window. This is another mechanism that helps a busy neighbour and makes sure that a sender does not overwhelm the receiving router with LSPs once the sender starts retransmitting LSPs. Here the router takes a non-acknowledgement of an LSP previously sent as a sign of trouble and therefore throttles down the LSP transmission rate. Recall that the default LSP transmission rate in Cisco IOS is 33 ms between LSPs. The default retransmission-throttling interval increases that value by a factor of 3, up to 100 ms. That should be sufficient to back off from a troubled router. It is not recommended to go beyond 333 ms because the LSP pacing gets so slow that the network becomes unresponsive in terms of reaction to changes.

In IOS, the retransmission-throttling timer is configurable. Setting the isis retransmit-throttle-interval <interval> command in interface configuration mode controls this timer.

IOS configuration

London# show running-config
[…]
router isis
 isis retransmit-throttle-interval 200
[…]

6.7 Conclusion

The way in which an IS-IS implementation handles LSP dynamics separates amateur enthusiast code from professional developers' routing code. LSP dynamics is perhaps the most important feature to focus on when evaluating IS-IS vendors. Interestingly, there is almost nothing in the ISO 10589 specification that tells you how to implement IS-IS in a scalable and robust manner. For many router startups, the lack of experience in how to do this right has been a barrier to entry in the high-end router market, and it probably still is.
Ironically, in the world of open specifications, there are still barely a dozen routing protocol software engineers who have the necessary experience to get the IS-IS code right the first time. Do not be misled. I am not asserting that no other engineers but these few can ever get IS-IS right. With enough time, and with customers willing to take the pain to obtain that operational experience with regard to what works and what does not, sooner or later every implementation of IS-IS can get to a level of what is called Carrier-Class-Code. There are a number of interesting routing software approaches used by other vendors, but these are not discussed in this book. Time and operational experience will tell which implementation of IS-IS will finally prevail in the Internet.

7 Pseudonodes and Designated Routers

Historically, routers were used to network local sub-nets to each other. Routing protocols are optimized to run in a wide area network (WAN) environment, which typically consists of point-to-point links such as serial lines, Frame Relay or ATM. Due to the popularity of Ethernet since the mid-1980s, routing protocols are also required to operate and scale on broadcast circuits like Ethernet. Broadcast networks allow multiple devices to see each other. For link-state routing protocols like IS-IS, multipoint capability means additional forms of stress in the domains of Hello processing, database storage size, and dynamics such as link-state database churn.

In this chapter you will learn how LAN circuits are different from p2p circuits, and what scaling challenges there are on LAN circuits. You will learn about the pseudonode concept, its nodal representation in the IS-IS link-state database and its implications for the SPF algorithm. Finally, the purpose of a Designated Intermediate System (DIS) and its election, pre-emption and timing details will be highlighted.
7.1 Scaling Adjacencies on Large LANs

Whenever there is a large number of routers on a LAN, a lot of care must be taken. There are several aspects of the protocol to worry about: first, if there is a large number of speakers on the LAN there is a large number of Hellos to process. Just imagine a LAN with 100 IS-IS speakers generating in total 300 Hellos per second. If those 300 Hellos are evenly spread at roughly one Hello every 3 milliseconds, as illustrated in Figure 7.1, there is no problem: this won’t stress the internal scheduling of the router OS too much. However, the environment, especially when it comes to routing protocols, is not nice and is far from ideal. Therefore we may never assume ideal working conditions.

7.1.1 The Self-synchronization Problem

Murphy’s Law dictates “If things can go wrong they will go wrong”. The worst case scenario is that 99 Hellos hit the control plane of the receiving router at once, as shown in Figure 7.2. Although the average CPU stress remains moderate if all the Hellos are evenly spread, there could be a short-term shortage of resources (buffer memory and CPU) if a large number of Hellos arrives at once. The last line of defence in a peak load situation is to drop incoming Hellos. Arguably the buffers should be made big enough to absorb any peak load condition. So how big is big enough? One needs to make a trade-off here as well. For stability reasons a router should not buffer an almost infinite queue of incoming protocol packets. Processing very large queues may keep the router busy with updates that are withdrawn a few packets later. On the other hand, there should be some minimum buffer to absorb short bursts.

The worst case was previously described as “one router hit by all Hellos of 99 routers at once”, and at first sight this might seem an unrealistic, artificial scenario. The reality is that without precautions in the routing code that generates Hellos there will be a resulting effect called self-synchronization.
Self-synchronization means that a router immediately answers network events such as adjacency changes and new neighbours with a Hello. This behaviour tends to add up across all the speakers on the LAN, and as a side-effect all the Hellos are scheduled at the same point in time, which artificially generates an unwanted form of peak stress followed by seconds of silence, as illustrated in Figure 7.2.

FIGURE 7.1. Evenly spread Hello arrival times are an ideal, desired environment. (Time axis t in ms: Hellos from 1921.6800.1001 to 1921.6800.1006 arrive 3 ms apart.)

FIGURE 7.2. A lot of Hellos hitting the control plane CPU at the same time may exhaust resources. (Time axis t in ms: Hellos from 1921.6800.1001 to 1921.6800.1006 arrive at the same instant.)

7.1.2 Scheduling Hellos

How is the Hello scheduled? This depends on the Hold timer, which controls adjacency expiration. In order to avoid adjacency expiration each neighbouring router sends Hellos to reset the Hold timer before it expires. In every implementation of IS-IS there is an internal constant called the Hello-Multiplier. The Hello Interval is calculated by dividing the Hold timer by the Hello-Multiplier. The resetting of the Hold timer on receipt of a Hello is illustrated in Figure 5.3 in Chapter 5, “Neighbour Discovery and Handshaking”. For example, a Hold timer of 30 s and a Hello-Multiplier of 3 result in a Hello Interval of 10 s. If the system dispatches a Hello exactly every 10 s, then there is a risk that the system starts to self-synchronize and, after some local network events, all routers on the LAN will generate their Hellos at the same point in time.
To avoid the effect of self-synchronization, ISO 10589 mandates that the timers for scheduling Hellos be jittered.

7.1.3 Applying Jitter to Timers

What does applying a jitter to timers mean and how does it attempt to solve the self-synchronization problem? Applying a jitter means scheduling a Hello before it must be sent. The trick is that each router on a LAN deducts a random time from the original Hello timer. Because each router computes its own independent random number, it becomes very unlikely that routers send Hellos at the same point in time. ISO 10589 mandates a 25 per cent jitter on Hellos. The 25 per cent means that a random number between the 0 and 25 per cent mark of the original timer is computed. The random number should be truly random in the sense that the numbers the random generator produces have a uniform distribution over the entire space that it covers. For example, a 25 per cent jitter on an underlying 10 s Hello timer would result in a random time between 0 and 2.5 seconds. Finally, the jitter is subtracted from the original timer. The jitter calculation is illustrated in Figure 7.3.

Both IOS and JUNOS apply a 25 per cent jitter to their Hello timer before scheduling the Hello for transmission. In the following tcpdump output you can see that the timestamps are not spaced in discrete 10 s intervals; the spacing varies and is always a little less than 10 s.
Tcpdump output

00:11:39.391338 OSI, IS-IS, L1 Lan IIH, src-id 0000.0000.0002, lan-id 0000.0000.0001.02, prio 65, length 74
00:11:48.951503 OSI, IS-IS, L1 Lan IIH, src-id 0000.0000.0002, lan-id 0000.0000.0001.02, prio 65, length 74
00:11:57.061652 OSI, IS-IS, L1 Lan IIH, src-id 0000.0000.0002, lan-id 0000.0000.0001.02, prio 65, length 74
00:12:05.451811 OSI, IS-IS, L1 Lan IIH, src-id 0000.0000.0002, lan-id 0000.0000.0001.02, prio 65, length 74
00:12:14.671953 OSI, IS-IS, L1 Lan IIH, src-id 0000.0000.0002, lan-id 0000.0000.0001.02, prio 65, length 74

Applying a jitter to the timers offers a good distribution of the scheduled Hellos among the LAN routers over time. It is used in many other places as well. IOS and JUNOS go much further than required by ISO 10589. For almost every one-time and periodic event the system applies jitter. Virtually all IS-IS packet dispatching routines apply between 5 per cent and 25 per cent jitter for Hellos (IIHs), Sequence Number PDUs (SNPs) and link-state PDUs (LSPs).

As soon as the router maintains a high number of adjacencies on the LAN circuit it needs to advertise them in its link-state PDU. A large number of LAN adjacencies raises the question of how to represent all the router-to-router relationships in the link-state database.

7.2 Pseudonodes

See Figure 7.4 for an illustration of six routers that are located on the same LAN. The LAN is transitive; this means that all the routers can see each other. Each of the routers generates an LSP and tells the world that it has five neighbours on the LAN by explicitly listing them inside the IS Reachability TLV #2 or #22. Any-to-any connectivity makes the size of the link-state database grow in the order of O(N²). This is often referred to as the N² problem.

7.2.1 The N² Problem

Figure 7.5 illustrates the relationship between the size of IS-reach information in the link-state database and the number of routers on a LAN.
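The growth can be checked with simple arithmetic. The following Python sketch assumes, as in the six-router example, that every router lists each of its N − 1 LAN neighbours in its own LSP; the function name full_mesh_is_reach_entries is hypothetical and counts IS-reach entries only, not bytes.

```python
def full_mesh_is_reach_entries(n_routers):
    """IS-reach entries in the link-state database when every router
    on the LAN lists all of its (N - 1) neighbours in its own LSP."""
    return n_routers * (n_routers - 1)


if __name__ == "__main__":
    for n in (6, 10, 50, 100):
        print(f"{n:>3} routers -> {full_mesh_is_reach_entries(n):>5} entries")
```

For the six routers of Figure 7.4 this gives 6 × 5 = 30 entries; for 100 routers it is already 9900, which is the quadratic growth that the term N² problem refers to.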
Arguably the absolute size of the link-state database is a moderate problem compared to the dynamic effects of a full-mesh advertisement. Every time a new router N gets on the LAN, all the other routers (N − 1) that have been on the LAN previously need to update their LSPs to list the adjacency to the new router. This results in a massive LSP update storm because all the routers on the LAN need to tell the network that there has been a change in adjacencies. The same update storm happens if a router is disconnected from the LAN. The dynamic component (routers joining or leaving the sub-net) is a more important problem than database storage size.

FIGURE 7.3. A 25 per cent jitter on the basis of a 10 s timer results in a random Hello between 7.5 and 10 s. (Time axis t in s from 0 to 10: the random jitter covers the last 2.5 s of the 10 s Hello timer.)
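The Hello scheduling rule from Sections 7.1.2 and 7.1.3 (Hello Interval = Hold timer / Hello-Multiplier, minus a uniformly distributed jitter of up to 25 per cent) can be sketched as follows. This is a minimal illustration under the assumptions stated in the text, not vendor code; the names hello_interval and jittered_interval are hypothetical.

```python
import random


def hello_interval(hold_timer_s, hello_multiplier):
    """Base Hello Interval: Hold timer divided by the Hello-Multiplier."""
    return hold_timer_s / hello_multiplier


def jittered_interval(base_s, jitter_fraction=0.25, rng=random):
    """Subtract a uniformly distributed random jitter from the base timer.

    The jitter lies between 0 and jitter_fraction * base_s, so the
    result is never later than the base value, only earlier.
    """
    jitter = rng.uniform(0.0, jitter_fraction * base_s)
    return base_s - jitter


if __name__ == "__main__":
    base = hello_interval(hold_timer_s=30, hello_multiplier=3)   # 10.0 s
    rng = random.Random(1)        # seeded only to make the demo repeatable
    for _ in range(5):
        print(f"next Hello in {jittered_interval(base, rng=rng):.2f} s")
```

Every value printed falls between 7.5 and 10 seconds, the range given in the caption of Figure 7.3. Because each router draws its own random numbers, the Hellos of different routers drift apart instead of lining up.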