The Complete IS-IS Routing Protocol- P49 ppt

The forwarding state change of tens of thousands of routes may stress several sub-systems of an Internet core router. It turns out that changing forwarding state is one of the most expensive operations in a router. Meanwhile, both Juniper and Cisco have found a way to pass third-party next-hop information on to the line-cards and retain the dependency chain from BGP routes to IS-IS speakers to forwarding interfaces. More on passing on third-party next-hop information, and why it is not always a good idea to attempt to fully resolve a route to its forwarding next-hop, can be found in Chapter 10, “SPF and Route Calculation”.

16. Network Design

[Figure 16.3: topology of Pennsauken, Frankfurt, London, Washington, New York and Paris with link metrics of 1, 2 and 4; the nodes carry 40 K, 25 K, 30 K, 15 K, 20 K and 10 K active BGP routes]

FIGURE 16.3. The resolver needs to track and map BGP next-hops to the shortest path resulting from the SPF calculation

16.2.4 CPU and Memory Usage

The two things that use the most CPU in an IS-IS router are the SPF calculation and the resolver. The SPF calculation puts a short burden on the system, but even in large topologies that burden does not last more than 200 ms on modern route processors. As discussed in the previous section, the far bigger CPU hog is the resolver, which maps BGP routes to forwarding next-hops. SPF execution runtime is ultimately a non-issue; however, the burden that the resolver can put on the system needs to be carefully examined.

In the 1990s, during the explosive growth of the Internet, routers were constantly short of memory. Since then, network service providers have been cautious about the memory usage of their routing protocols. There is almost no IS-IS-related documentation regarding memory consumption. The majority of IS-IS implementations use memory in three areas:

1. Link-state database
2. SPF result table
3. Storing neighbour information

The link-state database size is the easiest to predict. It contains mostly raw data that was extracted from the TLVs in an IS-IS PDU. There are also overhead and index structures so that the IS-IS software can quickly traverse the database when it is looking for a certain LSP. As a rough guideline, the size of the link-state database is about double the size that the individual LSPs consume on the wire. For example, if the network knows about 100 LSPs with an average length of 400 bytes each, then the size needed to store this information in the router software is 100 * 400 * 2 = 80 KB.

The size of the SPF result table depends largely on how many IP prefixes are known to IS-IS inside the network. A good estimate is that each prefix consumes about 70 bytes. For example, if you have 1600 IS-IS prefixes in your network, then the memory consumption on the control plane is 1600 * 70 = 112 KB.

The neighbour table is the most complex one to calculate, as all the flooding state and the retransmission lists need to be kept on a per-adjacency basis. That structure also depends on the size of the link-state database, because all the flooding state is tied to both the LSP and the adjacency. There is a lot of clever pointer work involved here, and the overhead needed for efficient flooding is enormous. A good approximate figure is that this table is about 50 times the average LSP size multiplied by the number of active adjacencies. For example, if the average LSP is about 400 bytes and the number of adjacencies is eight, then the memory consumption is 400 * 50 * 8 = 160 KB.

If you sum up the three memory areas, then the result for a large network is unlikely to exceed 4–5 MB in total. The memory consumption of IS-IS is minimal given that most route processors deployed in the field have 256 MB–2 GB of memory.
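
The three rules of thumb above can be combined into a small calculator. The following is a minimal Python sketch, not a model of any vendor's implementation: the multipliers are the rough guidelines quoted in the text, and the names (estimate_isis_memory, IsisMemoryEstimate and the constants) are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass

# Rule-of-thumb multipliers quoted in the text. They are rough guidelines,
# not measurements of a particular IS-IS implementation.
LSDB_OVERHEAD_FACTOR = 2   # LSDB is about twice the on-the-wire LSP size
BYTES_PER_PREFIX = 70      # SPF result table cost per IS-IS prefix
NEIGHBOUR_FACTOR = 50      # neighbour table: 50 x average LSP size per adjacency


@dataclass
class IsisMemoryEstimate:
    """Estimated memory use in bytes for the three areas named in the text."""
    lsdb_bytes: int
    spf_table_bytes: int
    neighbour_table_bytes: int

    @property
    def total_bytes(self) -> int:
        return self.lsdb_bytes + self.spf_table_bytes + self.neighbour_table_bytes


def estimate_isis_memory(lsp_count: int, avg_lsp_bytes: int,
                         prefix_count: int, adjacency_count: int) -> IsisMemoryEstimate:
    """Apply the three rules of thumb to one router (hypothetical helper)."""
    return IsisMemoryEstimate(
        lsdb_bytes=lsp_count * avg_lsp_bytes * LSDB_OVERHEAD_FACTOR,
        spf_table_bytes=prefix_count * BYTES_PER_PREFIX,
        neighbour_table_bytes=avg_lsp_bytes * NEIGHBOUR_FACTOR * adjacency_count,
    )


# The worked examples from the text: 100 LSPs of 400 bytes, 1600 prefixes,
# eight adjacencies.
estimate = estimate_isis_memory(lsp_count=100, avg_lsp_bytes=400,
                                prefix_count=1600, adjacency_count=8)
print(estimate.lsdb_bytes)             # 80000
print(estimate.spf_table_bytes)        # 112000
print(estimate.neighbour_table_bytes)  # 160000
print(estimate.total_bytes)            # 352000
```

With the example figures the three areas add up to roughly 350 KB, which is consistent with the claim that even a much larger network stays within a few megabytes.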
Interestingly, there are large overhead structures in the LSP databases to increase LSP lookup speed and to keep flooding state even for large numbers of adjacencies. This is just more evidence that memory consumption for IS-IS networks with big core routers is a non-issue.

16.3 Design Recommendations

Through years of designing large IS-IS networks, and based on the experience of NOC engineers and software engineers at the big router vendors, the authors have come up with the following tips for designing truly scalable networks. These recommendations are not rigid; that is, you do not need to follow them all to the letter. To be a good network designer, you have to find a healthy balance between what the products can do and what you want to achieve. The rest of this chapter draws on many of the topics and ideas discussed throughout this book. There is no need to repeat more than the basics of the discussions, however, so we don’t present all of the gory details all over again.

16.3.1 Separate Topology and IP Reachability Data

Perhaps the most important rule is to keep topology and IP reachability data separate. You saw that IGPs are not very good at transporting large numbers of routes, so just avoid it and pass the job to BGP. In large networks (more than 1000 routers per level) you may even decide to advertise directly connected routes in BGP as well. Given that an average IS-IS core router has about five or six directly attached sub-nets, you clearly want to avoid those extra 2500–3000 prefixes at the IS-IS level in order to keep convergence times within an upper bound. An ideal IS-IS LSP contains just a single IP prefix, which is the router’s loopback IP address, plus Extended IS Reach TLVs that point to neighbouring routers.

Tcpdump output
An ideal LSP just conveys a single IP prefix per router and passes all other routing information via BGP.
12:36:45.587565 OSI, IS-IS, length: 405
    hlen: 27, v: 1, pdu-v: 1, sys-id-len: 6 (0), max-area: 3 (0)
    L2 LSP, lsp-id: 2092.1113.4009-00, seq: 0x000002fd, lifetime: 1198s
    chksum: 0xe984 (correct), PDU length: 185, Flags: [ L1L2 IS ]
    Area address(es) TLV #1, length: 4
        Area address (length: 3): 49.0001
    Protocols supported TLV #129, length: 1
        NLPID(s): IPv4
    IPv4 Interface address(es) TLV #132, length: 4
        IPv4 interface address: 192.168.1.1
    Hostname TLV #137, length: 10
        Hostname: Washington
    Extended IS Reachability TLV #22, length: 99
        IS Neighbor: 1921.6800.1077.00, Metric: 4, sub-TLVs present (12)
            IPv4 interface address (subTLV #6), length: 4, 172.17.1.6
            IPv4 neighbor address (subTLV #8), length: 4, 172.16.1.5
        IS Neighbor: 1921.6800.1043.00, Metric: 4, sub-TLVs present (12)
            IPv4 interface address (subTLV #6), length: 4, 172.16.33.38
            IPv4 neighbor address (subTLV #8), length: 4, 172.16.33.37
        IS Neighbor: 1921.6800.1018.00, Metric: 4, sub-TLVs present (12)
            IPv4 interface address (subTLV #6), length: 4, 172.16.33.25
            IPv4 neighbor address (subTLV #8), length: 4, 172.16.33.26
    Extended IPv4 reachability TLV #135, length: 9
        IPv4 prefix: 192.168.1.1/32, Distribution: up, Metric: 0
    Authentication TLV #10, length: 17
        HMAC-MD5 password: 68e18feb2e29257113e4bb6580169310

16.3.2 Keep the Number of Active BGP Routes per Node Low

Vendors have come up with smart representations of BGP routes and of how those routes depend on IS-IS routes. However, there is one fault condition where even smart route representations inside a router do not gain us much. If an entire BGP speaker goes down, then the BGP control plane needs to re-route all the prefixes that pointed to that speaker, which of course takes time. The more active routes the failed router was carrying, the proportionally longer the re-routing takes. Figure 16.4 shows that, on the left-hand side, Washington is a “hotspot” BGP speaker that carries the majority of BGP routes.
If this speaker goes down, then you need to re-route all 120 K routes, which can cause a network-wide outage of up to 3 minutes. The logical step is to spread those 120 K routes among several routers, as shown on the right-hand side of Figure 16.4. In well-developed peering meshes, the average number of routes per border router is not more than 10 K. In our example, because of a lack of routers, we still did not put more than 30 K routes on any node. In practice, if you receive more than 10 K routes per peer, then you may need to consider a redundant router and spread the incoming prefixes over the two redundant routers. Re-routing 10 K prefixes if the active router breaks down can be done in a matter of 5–10 seconds.

16.3.3 Avoid LSP Fragmentation

IS-IS has plenty of space in the distributed database (precisely 375,040 bytes per LSP, summed over all of its fragments). Despite this vast amount of information that an individual IS-IS speaker can originate, you typically do not want to use that storage size, ever. You should try to accommodate all the information that you need in maxLSPsize (1492) – LSP header (27) = 1465 bytes. There may be a number of additional LSP updates if you cross an LSP boundary and have to break things up into another segment. Consider Figure 16.5 to see what happens if you are at the edge of Fragment 0 and an additional adjacency comes up. Router 1921.6800.1018 decides that it needs to start another fragment, so it generates the fragment and floods it. The trouble starts if any of the router’s other sub-nets or adjacencies become unavailable. Assume that Adjacency #4 goes down: all the TLVs that follow this particular adjacency get shifted, and some of them may move into another fragment.
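
The space figures for a single LSP fragment and for a complete LSP can be checked with a few lines of arithmetic. The following is a minimal Python sketch under stated assumptions: the constant and function names are hypothetical, the fragment count of 256 follows from the one-byte fragment number at the end of the LSP-ID, and fragments_needed deliberately ignores TLV boundaries and any reserved headroom, so it only gives a lower bound.

```python
MAX_LSP_SIZE = 1492     # maxLSPsize in bytes, as given in the text
LSP_HEADER_SIZE = 27    # LSP header in bytes, as given in the text
FRAGMENT_COUNT = 256    # one-byte fragment number in the LSP-ID: 0x00 to 0xff


def payload_per_fragment() -> int:
    """Bytes available for TLVs in a single LSP fragment."""
    return MAX_LSP_SIZE - LSP_HEADER_SIZE


def total_lsp_space() -> int:
    """Bytes available for TLVs across all fragments of one router's LSP."""
    return FRAGMENT_COUNT * payload_per_fragment()


def fragments_needed(tlv_bytes: int) -> int:
    """Lower bound on the number of fragments needed for tlv_bytes of TLV data.

    Simplifying assumption: TLVs pack perfectly, which real implementations
    cannot do because a TLV is never split across two fragments.
    """
    if tlv_bytes < 0 or tlv_bytes > total_lsp_space():
        raise ValueError("TLV data does not fit into one router's LSP space")
    per_fragment = payload_per_fragment()
    return -(-tlv_bytes // per_fragment)  # ceiling division


print(payload_per_fragment())   # 1465
print(total_lsp_space())        # 375040
print(fragments_needed(1465))   # 1
print(fragments_needed(1466))   # 2
```

The jump from one to two fragments for a single extra byte is exactly the boundary case that the design recommendation tries to stay away from.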

[Figure 16.4: two views of the BGP speakers Frankfurt, London, New York, Pennsauken, Washington and Paris. Left: Washington is the hotspot carrying 120 K active routes. Right: the routes are spread out so that no node carries more than 30 K active routes]

FIGURE 16.4. In a well-developed peering mesh the BGP routes are almost evenly distributed over the entire network

[Figure 16.5: fragments 1921.6800.1018.00-00 and 1921.6800.1018.00-01 in two steps. Extended IS Reach entries for Neighbours #1 to #11 move between the two fragments, the sequence numbers go from 0x1 to 0x2, and fragment 00-01 ends up with an empty TLV block]

FIGURE 16.5. IS-IS fragmentation may cause excess LSP updates if adjacencies wander across several fragments

Considering the example in Figure 16.5, there is no need to use Fragment #1 now, as everything would easily fit into Fragment #0. Fragment #1 is tossed using a network-wide purge.
The trouble here is that a single change in a router’s adjacencies may cause several fragments to be re-aligned. ISO 10589 recommends sparing the top 10 per cent of LSP space for problem scenarios like this. That is, when an LSP is built, only the first 1318 bytes (1465 – 10 per cent) are used for data. The top 10 per cent is reserved to take up “wandering adjacencies” from higher fragments as those fragments shrink below a 146-byte fill level. There are a lot of clever heuristics involved (you could even pad lost adjacencies using the Padding TLV #8 in order to avoid fragment shifts); however, most implementations keep those heuristics to a minimum. In order to avoid fragment shifts, the best approach is to avoid fragmentation altogether.

Tcpdump output
An adjacency carrying full TE extensions consumes 75 bytes on the wire.

Extended IS Reachability TLV #22, length: 75
    IS Neighbor: 2092.1113.4007.00, Metric: 5, sub-TLVs present (64)
        IPv4 interface address (subTLV #6), length: 4, 172.16.1.6
        IPv4 neighbor address (subTLV #8), length: 4, 172.16.1.5
        Unreserved bandwidth (subTLV #11), length: 32
            priority level 0: 9953.280 Mbps
            priority level 1: 9953.280 Mbps
            priority level 2: 9953.280 Mbps
            priority level 3: 9953.280 Mbps
            priority level 4: 9953.280 Mbps
            priority level 5: 9953.280 Mbps
            priority level 6: 9953.280 Mbps
            priority level 7: 9953.280 Mbps
        Reservable link bandwidth (subTLV #10), length: 4, 9953.280 Mbps
        Maximum link bandwidth (subTLV #9), length: 4, 9953.280 Mbps
        Administrative groups (subTLV #3), length: 4, 0x00000000

If you consider that you need almost no space for IP Reachability-related TLVs, there is space for approximately 18 full-blown adjacencies of 75 bytes each, using the full set of TE sub-TLVs, which ought to be enough even for larger core routers.

16.3.4 Reduce Background Noise

IS-IS has the nice advantage over OSPF that it can control its own LSP refresh rate. In IS-IS the max-LSP-age is a countdown function, which is user configurable.
That is, each router is required to refresh its LSP (refresh just means bumping the sequence number and leaving the contents unchanged) in less than max-LSP-age. The recommended value for implementers is to set the max-LSP-age refresh timer to a value less than 300 seconds, but this is very low. The default value of the max-LSP-age is 1200 seconds, which is also the recommended value mentioned in ISO 10589. If you keep the default value, or use the 300 value, you end up tolerating a lot of “refresh noise” because of the relatively small interval of 1200 seconds (20 minutes). For example, in a network consisting of 400 routers, this means that on average every 3 seconds (1200 / 400) some router floods an LSP network-wide, even when the network is quiet (there are no link flaps, no topology changes, and so on).

Both IOS and JUNOS allow you to change the default value of 1200 seconds to reduce the amount of refresh noise in your network. The recommended value is to set the max-LSP-age timer to 65,535 seconds, which extends the refresh period to 18.2 hours and therefore reduces the refresh noise by a factor of roughly 50. There are no side-effects of changing the default value, and it remains an open question for router vendors as to why this higher value is not made the default, because every service provider changes it to this value anyway. Keep in mind that in IOS you need to set both the lsp-age timer and the lsp-refresh timer, with the refresh timer 300 seconds below the age timer, to get proper refreshing. JUNOS internally calculates a “sane” timer based on the configured lsp-age.

16.3.5 Rely on the Link-layer for Fault Detection

Many service providers believe that the key to sub-second convergence is to tweak all the timers in a router, particularly the Hello and Hold timers. Unfortunately, some implementations of routing protocols today are not real-time capable.
If you make your non-real-time capable IS-IS implementation generate a Hello every 333 ms on hundreds of adjacencies, this may cause some side-effects. Consider the processing of a big BGP batch run, during which the router may not be able to revisit the code that sends the Hellos, which in turn may cause network-wide churn due to missed Hellos. Considering that not all vendors support real-time control planes for IS-IS, we have to go down the road of the lowest common denominator. In many router implementations, the generation of link-layer messages like keep-alives is handled by the forwarding complex, which typically does run a real-time OS (or at least a tweaked OS that is close enough). In order to get real-time detection, we offload this task to the forwarding complex.

Fault detection works reasonably well on certain interface technologies like SONET/SDH. No surprise here! SONET/SDH has the best liveness protocol you can think of. Among the SONET/SDH overhead are bytes (K1/K2, K3, K4) that carry Remote Defect Indicator (RDI) bits, which are set immediately if there is a problem along the SONET/SDH link. Due to SONET/SDH requirements, that message will be sent, worst case, within 50 ms of a failure and will travel through every node along the path.

In the ATM world, end-to-end fault detection is performed by operation and maintenance (OAM) cells that are inserted by routers at both ends of a Virtual Connection (VC). The OAM cells are a nice liveness protocol that can perform fault detection for IS-IS as well.

The only remaining problem is Ethernet. Because of its inherent simplicity, there is no link-layer protocol in which you could embed Ethernet keep-alive messages. Historically, there was never any possibility of getting quick fault detection on Ethernet except through tuning IS-IS Hold timers. But now there is a solution for this purpose, called Bidirectional Forwarding Detection (BFD).
BFD is described in draft-katz-ward-bfd-00.txt, and the protocol and its mechanisms are simple: the idea is to set up a high-frequency (<100 ms) exchange of UDP packets. If that exchange is disrupted, there must be a problem with the underlying media and the link can be declared down. As soon as there are interoperable BFD implementations, it will become the method of choice as a liveness protocol for Ethernet. Table 16.2 shows a short summary of the preferred fault-detection protocol for each interface media type.

TABLE 16.2. For every interface media type there is a high-frequency fault-detection protocol available.

Interface media type    Liveness protocol
SONET/SDH               SONET/SDH RDI
ATM                     OAM cells
Ethernet                Bidirectional Forwarding Detection (BFD)

For every major interface type there is a high-frequency fault-detection protocol available, so there is no need to abuse IS-IS to provide that function. It is our recommendation to use the per-interface, media type-dependent fault-detection protocols and leave IS-IS with its default Hello timers.

16.3.6 Simple Loopback IP Address to System-ID Conversion Schemes

The 6-byte System-ID field has an inherent drawback. For administering System-IDs there are almost no address management tools available that can cope with 6-byte address entities. For the network service operator there are two choices:

1. Develop a custom address management tool for 6-byte System-IDs
2. Do not manage System-IDs; rather, auto-derive them from IPv4 loopback addresses

Typically, network service providers do not want to maintain yet another list of addresses, and therefore there are very simple mapping concepts for converting IPv4 loopback addresses to System-IDs. It is recommended to keep these schemes as simple as possible. The simplest form is the binary coded decimal (BCD) conversion, where each octet of the IP address is written as three decimal digits and the resulting twelve digits make up the System-ID. See Figure 16.6 for a few conversion examples.

IP Address          System-ID
192.168.13.1        1921.6801.3001
193.83.223.237      1930.8322.3237
172.1.14.18         1720.0101.4018

FIGURE 16.6. The best conversion tool is a simple binary coded decimal (BCD) conversion

Simple System-ID schemes also have the advantage that when you need to troubleshoot complex synchronization and flooding problems, it is easy to spot particular routers.

Tcpdump output
When you are (for example) troubleshooting a synchronization problem, it is handy if you can easily derive the IPv4 addresses of routers by use of a simple mapping scheme.

21:14:07.712478 OSI, IS-IS, length: 1478
    L2 CSNP, hlen: 33, v: 1, pdu-v: 1, sys-id-len: 6 (0), max-area: 3 (0)
    source-id: 6b01.c219.07fa.00, PDU length: 275
    start lsp-id: 1921.6800.1001.00-00
    end lsp-id: 1921.6800.1039.00-00
    LSP entries TLV #9, length: 240
        lsp-id: 1921.6800.1001.00-00, seq: 0x00000562, lifetime: 5014s, chksum: 0x03dc
        lsp-id: 1921.6800.1003.00-00, seq: 0x0000073a, lifetime: 31107s, chksum: 0xdb8b
        lsp-id: 1921.6800.1005.00-00, seq: 0x0000050c, lifetime: 5205s, chksum: 0xa8bf
        lsp-id: 1921.6800.1006.00-00, seq: 0x00000d20, lifetime: 30639s, chksum: 0x2699
        lsp-id: 1921.6800.1007.00-00, seq: 0x0000089f, lifetime: 52194s, chksum: 0x74ad
        lsp-id: 1921.6800.1011.00-00, seq: 0x00000319, lifetime: 61707s, chksum: 0xc69e
        lsp-id: 1921.6800.1011.00-01, seq: 0x0000008e, lifetime: 44126s, chksum: 0x6e4d
        lsp-id: 1921.6800.1013.00-00, seq: 0x000002c0, lifetime: 36610s, chksum: 0xb05d
        lsp-id: 1921.6800.1013.00-01, seq: 0x000000b0, lifetime: 5052s, chksum: 0x0e21
        lsp-id: 1921.6800.1013.00-03, seq: 0x0000029f, lifetime: 11790s, chksum: 0x5bfa
        lsp-id: 1921.6800.1033.00-00, seq: 0x00000318, lifetime: 11255s, chksum: 0xbb6e
        lsp-id: 1921.6800.1034.00-00, seq: 0x000006f4, lifetime: 48962s, chksum: 0x634f
        lsp-id: 1921.6800.1037.00-00, seq: 0x000005bf, lifetime: 44818s, chksum: 0x4701
        lsp-id: 1921.6800.1038.00-00, seq: 0x000013fc, lifetime: 8664s, chksum: 0x93d4
        lsp-id: 1921.6800.1039.00-00, seq: 0x000014b9, lifetime: 17862s, chksum: 0x2894

Particularly when you need to parse packet dumps like the above using network analyzers, and you do not have the name cache ready, then simple conversion logic makes
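
The BCD mapping between IPv4 loopback addresses and System-IDs can be sketched in a few lines of Python. This is only an illustration of the scheme shown in Figure 16.6: the function names ipv4_to_system_id and system_id_to_ipv4 are hypothetical and not taken from any router software, and the reverse direction assumes that the System-ID was produced by this exact scheme.

```python
import ipaddress


def ipv4_to_system_id(address: str) -> str:
    """Convert an IPv4 address to a BCD-style System-ID (hypothetical helper).

    Each octet is written as three decimal digits; the resulting twelve
    digits are regrouped into three blocks of four.
    """
    octets = ipaddress.IPv4Address(address).packed  # also validates the input
    digits = "".join(f"{octet:03d}" for octet in octets)
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))


def system_id_to_ipv4(system_id: str) -> str:
    """Reverse the mapping, assuming the System-ID follows the BCD scheme."""
    digits = system_id.replace(".", "")
    if len(digits) != 12 or not digits.isdigit():
        raise ValueError(f"not a BCD-style System-ID: {system_id!r}")
    octets = [int(digits[i:i + 3]) for i in range(0, 12, 3)]
    # IPv4Address rejects octets above 255, for example hexadecimal-looking
    # or otherwise non-BCD System-IDs that happen to contain only digits.
    return str(ipaddress.IPv4Address(".".join(str(octet) for octet in octets)))


# The three conversion examples from Figure 16.6.
print(ipv4_to_system_id("192.168.13.1"))    # 1921.6801.3001
print(ipv4_to_system_id("193.83.223.237"))  # 1930.8322.3237
print(ipv4_to_system_id("172.1.14.18"))     # 1720.0101.4018

# Reading an lsp-id from the CSNP dump back into a loopback address.
print(system_id_to_ipv4("1921.6800.1039"))  # 192.168.1.39
```

A System-ID such as the source-id 6b01.c219.07fa in the dump above contains hexadecimal digits and is rejected by system_id_to_ipv4, which is a quick way to tell that the router in question does not follow the BCD scheme.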
