     key 1
      key-string 0 secret789
    !
    interface POS4/1
     description "to Frankfurt so-0/2/0"
     ip address 172.16.33.17 255.255.255.252
     ip router isis
     encapsulation ppp
     crc 16
     clock source internal
     pos scramble-atm
     isis circuit-type level-1
    !
    router isis
     net 49.0010.1921.6800.0012.00
     authentication mode md5
     authentication key-chain MY-ISIS-PASSWORD
     metric-style wide
     is-type level-1
     passive-interface Loopback0
    !

Let’s see if the two configurations are working. Nope, neither router sees the other. What could be wrong?

    London#sh clns neighbors
    System Id      Interface   SNPA                State  Holdtime  Type Protocol

    hannes@Frankfurt> show isis adjacency

    hannes@Frankfurt> show isis interface
    IS-IS interface database:
    Interface             L CirID Level 1 DR        Level 2 DR        L1/L2 Metric
    lo0.0                 0   0x1 Disabled          Passive                 0/0

15.3.1.1 Missing PPP-OSICP Configuration

In our phased troubleshooting approach, we first check the underlying physical and logical interfaces.

IOS command output
Only IPCP is open; OSICP is still in the Listen state.

    London#show interfaces pos4/1
    POS4/1 is up, line protocol is up
      Hardware is Packet over SONET
      Description: "to Frankfurt so-0/2/0"
      Internet address is 172.16.33.13/30
      MTU 4470 bytes, BW 155000 Kbit, DLY 100 usec, rely 255/255, load 1/255
      Encapsulation PPP, crc 16, loopback not set
      Keepalive set (10 sec)
      Scramble enabled
      LCP Open
      Listen: CDPCP, OSICP
      Open: IPCP
    […]

JUNOS command output
On the JUNOS side we do not even attempt to open OSICP because the router is not configured to do so!
    hannes@Frankfurt> show interfaces so-0/2/0
    Physical interface: so-0/2/0, Enabled, Physical link is Up
      Interface index: 148, SNMP ifIndex: 66
      Description: to London POS4/1
      Link-level type: PPP, MTU: 4474, Clocking: Internal, FCS: 16, Payload scrambler: Enabled
      Device flags   : Present Running
      Interface flags: Point-To-Point SNMP-Traps
      Link flags     : Keepalives
      Keepalive settings: Interval 10 seconds, Up-count 1, Down-count 3
      Keepalive: Input: 291 (00:00:04 ago), Output: 296 (00:00:03 ago)
      LCP state: Opened
      NCP state: inet: Opened, inet6: Not-configured, iso: Not-configured, mpls: Not-configured
      CHAP state: Not-configured
    […]

On the IOS side, we encounter a circuit that is eager to speak OSICP but does not receive any OSI frames from the other side. We can also rule out physical problems at this point, as the Link Control Protocol (LCP) and the IP Control Protocol (IPCP) are both up and running. On the JUNOS side, the output tells us that OSI support is not even configured. Checking the configuration reveals that we forgot to set the family iso keyword at the logical interface level (you’d be surprised how often this happens).

JUNOS configuration change
We forgot the family iso statement on the SONET interface on the JUNOS side, so PPP-OSICP never got started.

    hannes@Frankfurt# show | compare
    [edit interfaces so-0/2/0 unit 0]
    +    family iso;

After adding the family iso statement at the logical interface level, OSICP comes up, but our adjacency is still down. What else could be wrong?

15.3.1.2 Non-matching Level

Next, we check whether there is a mismatch in our Level configuration by inspecting the debug logfiles.

JUNOS configuration/debug output
The JUNOS trace log reveals that there is a Level mismatch.
    hannes@Frankfurt> show configuration protocols isis
    traceoptions {
        file isis-trace.log;
        flag hello detail;
        flag error;
    }
    […]

    *** isis-trace.log ***
    Nov 21 23:53:11 Received PTP IIH, source id 1921.6800.0012 on so-0/2/0.0
    Nov 21 23:53:11 intf index 69
    Nov 21 23:53:11 max area 0, circuit type l1, packet length 4469
    Nov 21 23:53:11 hold time 30, circuit id 1
    Nov 21 23:53:11 ERROR: IIH from 1921.6800.0012 with no matching level, interface so-0/2/0.0, circuit type 1

The Frankfurt router complains that it received a Hello from London and reports a circuit-type mismatch.

IOS debug output
IOS does not detect any Level mismatches.

    London#debug isis adj-packets
    IS-IS Adjacency related packets debugging is on
    *Nov 22 00:49:12: ISIS-Adj: Rec serial IIH from *PPP* (POS4/1), cir type L2, cir id 01, length 67
    *Nov 22 00:49:12: ISIS-Adj: rcvd state DOWN, old state INIT, new state INIT
    *Nov 22 00:49:12: ISIS-Adj: Action = GOING DOWN

Note that the IOS debug output gives no indication of a Level mismatch. But checking the JUNOS configuration, we find that somebody must have set the level 1 disable knob on the interface, which prevents the routers from finding a common Level during the adjacency establishment process.

JUNOS configuration diff
Clearing the level 1 disable flag makes the circuit an L1L2 circuit so that both peers have a common circuit type.

    hannes@Frankfurt# show | compare
    [edit protocols isis interface so-0/2/0.0]
    -    level 1 disable;

Changing the pure L2 circuit into an L1L2 or L1 circuit gives the routers a common Level; however, there are still other caveats to overcome before our adjacency will come up. For example, on a Level 1 adjacency, the Areas have to match.

15.3.1.3 Non-matching Area-ID

Whether the Area-IDs need to match depends on the IS-IS circuit type. For L1 adjacencies at least one of the Area-IDs must match, but for L2 adjacencies the Area-ID is not relevant.
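The Level and Area-ID rules described so far can be condensed into a small decision function. The following Python sketch is only an illustration of the rules as stated in the text, not of how IOS or JUNOS implement the check; the function name common_adjacency_levels and its parameters are hypothetical.

```python
def common_adjacency_levels(local_levels, remote_levels, local_areas, remote_areas):
    """Return the set of Levels (1 and/or 2) on which an adjacency could form.

    Hypothetical helper for illustration only. Levels are sets such as
    {1}, {2} or {1, 2}; areas are sets of Area-ID strings such as {"49.0001"}.
    """
    usable = set()
    shared_levels = set(local_levels) & set(remote_levels)
    # A Level 1 adjacency needs at least one Area-ID in common.
    if 1 in shared_levels and set(local_areas) & set(remote_areas):
        usable.add(1)
    # A Level 2 adjacency ignores the Area-ID.
    if 2 in shared_levels:
        usable.add(2)
    return usable


# Frankfurt with "level 1 disable" (L2 only) against London (L1 only): no common Level.
print(common_adjacency_levels({2}, {1}, {"49.0100"}, {"49.0001"}))     # set()
# Frankfurt as L1L2, but the Area-IDs differ: still nothing usable.
print(common_adjacency_levels({1, 2}, {1}, {"49.0100"}, {"49.0001"}))  # set()
# Matching Area-IDs: a Level 1 adjacency becomes possible.
print(common_adjacency_levels({1, 2}, {1}, {"49.0100"}, {"49.0100"}))  # {1}
```

An empty result corresponds to the "no matching level" and "no matching areas" situations in this case study; the real implementations apply further checks (authentication, IP sub-net) afterwards.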
JUNOS debug output

    hannes@Frankfurt> show configuration protocols isis
    traceoptions {
        file isis-trace.log;
        flag hello detail;
        flag error;
    }
    […]

    *** isis-trace.log ***
    Nov 22 00:09:25 Received PTP IIH, source id 1921.6800.0012 on so-0/2/0.0
    Nov 22 00:09:25 intf index 69
    Nov 22 00:09:25 max area 0, circuit type l1, packet length 4469
    Nov 22 00:09:25 hold time 30, circuit id 1
    Nov 22 00:09:25 17 bytes of authentication data
    Nov 22 00:09:25 restart RR reset RA reset holdtime 0
    Nov 22 00:09:25 ptp adjacency tlv length 1
    Nov 22 00:09:25 neighbor state initializing
    Nov 22 00:09:25 speaks IP
    Nov 22 00:09:25 area address 49.0001 (3)
    Nov 22 00:09:25 IP address 172.16.33.13
    Nov 22 00:09:25 4371 bytes of total padding
    Nov 22 00:09:25 ERROR: IIH from London with no matching areas, interface so-0/2/0.0, our area 49.0100

JUNOS notes that there is no common Area-ID. It checks the Area-ID because the circuit type is set to L1.

IOS debug output
The last change moved the circuit type from L2 to L1L2; however, there are still no matching areas.

    London#debug isis adj-packets
    IS-IS Adjacency related packets debugging is on
    *Nov 22 01:05:34: ISIS-Adj: Rec serial IIH from *PPP* (POS4/1), cir type L1L2, cir id 01, length 48
    *Nov 22 01:05:34: ISIS-Adj: rcvd state DOWN, old state INIT, new state INIT
    *Nov 22 01:05:34: ISIS-Adj: No matching areas
    *Nov 22 01:05:34: ISIS-Adj: Action = GOING DOWN

IOS makes a similar log entry in the debug output. As we have no matching areas, we have two options: either we change the circuit type to Level 2, or we change the Area-ID. In our case, we discover that the circuit type cannot be changed on the London router, so we have to change the Area-ID accordingly.

IOS configuration change

    London#configure terminal
    Enter configuration commands, one per line. End with CNTL/Z.
    London(config)#router isis
    London(config-router)# no net 49.0001.1921.6800.0012.00
    London(config-router)# net 49.0100.1921.6800.0012.00

As the adjacency is still not up (take our word for it), we next check for an authentication match.

15.3.1.4 Non-matching Authentication

Before troubleshooting authentication information, we first need to find out which PDU types are authenticated.

JUNOS debug output
JUNOS reports an IIH authentication failure because per-level configuration authenticates all PDUs, including IIHs. Because authentication is always symmetric, the JUNOS router also expects all received IIHs to be authenticated, but that is not the case.

    hannes@Frankfurt> show configuration protocols isis
    traceoptions {
        file isis-trace.log;
        flag hello detail;
        flag error;
    }
    […]

    *** isis-trace.log ***
    Nov 22 00:23:01 Received PTP IIH, source id 1921.6800.0012 on so-0/2/0.0
    Nov 22 00:23:01 intf index 69
    Nov 22 00:23:01 max area 0, circuit type l1, packet length 4469
    Nov 22 00:23:01 hold time 30, circuit id 1
    Nov 22 00:23:01 17 bytes of authentication data
    Nov 22 00:23:01 ERROR: IIH authentication failure

The JUNOS router reports an authentication error for IIHs, in contrast to the IOS router, which does not report one. However, the IS-IS adjacency gets stuck in the Initializing state.

IOS debug output
On the IOS side, no authentication error is logged because IOS does not expect its Hellos to be authenticated.

    London#debug isis authentication information
    IS-IS authentication information debugging is on
    London#debug isis adj-packets
    IS-IS Adjacency related packets debugging is on
    *Nov 22 01:19:34: ISIS-Adj: Rec serial IIH from *PPP* (POS4/1), cir type L1L2, cir id 01, length 67
    *Nov 22 01:19:34: ISIS-Adj: rcvd state DOWN, old state INIT, new state INIT
    *Nov 22 01:19:34: ISIS-Adj: Action = GOING UP, new type = L1

Although IOS has enabled MD5 authentication, it authenticates only LSPs and SNPs and not IIHs.
But JUNOS does authenticate all PDU types and also expects authentication from others, which breaks the adjacency in this case. There are two strategies to overcome this.

IOS configuration change

    London#conf t
    Enter configuration commands, one per line. End with CNTL/Z.
    London(config)#int pos4/1
    London(config-if)# isis authentication key-chain MY-ISIS-PASSWORD
    London(config-if)# isis authentication mode md5

The first is to configure additional IIH authentication on the IOS interface.

JUNOS configuration change

    hannes@Frankfurt# show | compare
    [edit protocols isis level 1]
    +    no-hello-authentication;

The second is to suppress IIH authentication on the JUNOS side. Most network administrators are too lazy to maintain an additional IS-IS configuration statement, so we decide to suppress the authentication of IIH PDUs with the no-hello-authentication statement on the JUNOS router.

15.3.1.5 Non-matching IP Sub-net

As our adjacency is still not up, we check the IP sub-net information using show commands. Additionally, we keep an eye on the debug output.

JUNOS debug output

    hannes@Frankfurt> show configuration protocols isis
    traceoptions {
        file isis-trace.log;
        flag hello detail;
        flag error;
    }
    […]

    *** isis-trace.log ***
    Nov 22 00:52:45 Received PTP IIH, source id London on so-0/2/0.0
    Nov 22 00:52:45 intf index 69
    Nov 22 00:52:45 max area 0, circuit type l1, packet length 4469
    Nov 22 00:52:45 hold time 30, circuit id 1
    Nov 22 00:52:45 17 bytes of authentication data
    Nov 22 00:52:45 restart RR reset RA reset holdtime 0
    Nov 22 00:52:45 ptp adjacency tlv length 1
    Nov 22 00:52:45 neighbor state down
    Nov 22 00:52:45 speaks IP
    Nov 22 00:52:45 area address 49.0100 (3)
    Nov 22 00:52:45 IP address 172.16.33.13
    Nov 22 00:52:45 4371 bytes of total padding
    Nov 22 00:52:45 ERROR: IIH from 1921.6800.2001 without matching addresses, interface so-0/2/0.0

JUNOS refuses an adjacency if there is no common IP sub-net.
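The sub-net test behind this error can be approximated in a few lines of Python with the standard ipaddress module. This is a sketch of the idea only, not the vendors' implementation; shares_subnet is a hypothetical helper. The address 172.16.33.13 is the one London advertises in its IIH, and 172.16.33.17/30 is the address configured on Frankfurt's so-0/2/0 at this point in the case study.

```python
import ipaddress


def shares_subnet(local_interface, neighbor_address):
    """Return True if neighbor_address lies inside the local interface's sub-net.

    Hypothetical helper: local_interface is given as "address/prefix-length",
    neighbor_address as a plain address taken from the received IIH.
    """
    local = ipaddress.ip_interface(local_interface)
    return ipaddress.ip_address(neighbor_address) in local.network


# Frankfurt's current address sits in 172.16.33.16/30, London's in 172.16.33.12/30.
print(shares_subnet("172.16.33.17/30", "172.16.33.13"))  # False
# An address from London's /30 on the Frankfurt side would pass the check.
print(shares_subnet("172.16.33.14/30", "172.16.33.13"))  # True
```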
IOS debug output

    London#debug isis adj-packets
    IS-IS Adjacency related packets debugging is on
    *Nov 22 01:40:52: ISIS-Adj: Rec serial IIH from *PPP* (POS4/1), cir type L1L2, cir id 01, length 1492
    *Nov 22 01:40:52: ISIS-Adj: No usable IP interface addresses in serial IIH from POS4/1

IOS also checks whether the address in the Interface Address TLV falls within its own sub-net.

JUNOS configuration change

    hannes@Frankfurt# show | compare
    [edit interfaces so-0/2/0 unit 0 family inet]
    +    address 172.16.33.14/30;
    -    address 172.16.33.17/30;

In our example, two different IP sub-nets were configured. Changing one side back to what was originally allocated should do the trick and at last bring the adjacency up. (Don’t worry: it’s not usually this hard in the real world to get adjacencies up, even in multi-vendor environments.)

IOS debug output

    *Nov 22 01:54:50: ISIS-Adj: Rec serial IIH from *PPP* (POS4/1), cir type L1L2, cir id 01, length 1492
    *Nov 22 01:54:50: ISIS-Adj: rcvd state INIT, old state DOWN, new state INIT
    *Nov 22 01:54:50: ISIS-Adj: Action = GOING UP, new type = L1
    *Nov 22 01:54:50: ISIS-Adj: New serial adjacency
    *Nov 22 01:54:50: ISIS-Adj: Sending serial IIH on POS4/1, length 4469
    *Nov 22 01:54:50: ISIS-Adj: Rec serial IIH from *PPP* (POS4/1), cir type L1L2, cir id 01, length 58
    *Nov 22 01:54:50: ISIS-Adj: rcvd state UP, old state INIT, new state UP
    *Nov 22 01:54:50: ISIS-Adj: Action = GOING UP, new type = L1
    *Nov 22 01:54:50: ISIS-Adj: L1 adj count 1

The debug output shows the state transition from the Down to the Up state. Once we are there, our routers can talk to their neighbours and exchange LSPs. Sometimes routers exchange a few too many LSPs, which is undesirable as well; the following case study takes a closer look at this problem.

15.3.2 Injecting Full Internet Routes into IS-IS

It is the nightmare of every network operations engineer: getting paged in the middle of the night because all routers are unreachable.
The iBGP mesh is collapsing and the network is literally falling to pieces. Particularly on JUNOS routers, there was a dangerous trap that many service providers ran into. In 2002, the Juniper Technical Assistance Center (JTAC) noticed several incidents of the type we describe here: through human error, a router attempts to inject the full set of Internet routes into IS-IS Level 2. The resulting flooding and processing load eventually melts down the entire network. During a network-wide failure, it is hard to determine where to look first for traces and clues. A good place is the central syslog server. Often a few syslog messages are logged just before the network goes haywire, and these provide a good starting point.

Syslog server logfile
The Munich router is logging every second that its IS-IS database is overloaded.

    […]
    Nov 21 18:22:22 Munich rpd[2235]: RPD_ISIS_OVERLOAD: IS-IS database overload
    Nov 21 18:22:23 Munich rpd[2235]: RPD_ISIS_OVERLOAD: IS-IS database overload
    Nov 21 18:22:24 Munich rpd[2235]: RPD_ISIS_OVERLOAD: IS-IS database overload
    Nov 21 18:22:25 Munich rpd[2235]: RPD_ISIS_OVERLOAD: IS-IS database overload
    Nov 21 18:22:26 Munich rpd[2235]: RPD_ISIS_OVERLOAD: IS-IS database overload
    […]

Inspecting the syslog server, we spot a set of log entries that indicate a database overload on a router running JUNOS. Consulting the documentation (System Log Messages Reference), we find that the RPD_ISIS_OVERLOAD message is logged when the router has no memory (!) or is running out of LSP fragments. Next, we inspect the link-state database on any router to verify whether the Munich router has run out of fragments.

JUNOS command output
The show isis database output reveals that the Munich router is generating 256 LSP fragments that have already been purged.
    hannes@Munich> show isis database
    […]
    IS-IS level 2 link-state database:
    LSP ID              Sequence Checksum Lifetime Attributes
    Munich.00-00            0x11   0x6dac      982 L1 Overload
    Munich.00-01            0xca        0        0 L1
    Munich.00-02            0xca        0        0 L1
    Munich.00-03            0xca        0        0 L1
    Munich.00-04            0xca        0        0 L1
    Munich.00-05            0xca        0        0 L1
    Munich.00-06            0xca        0        0 L1
    Munich.00-07            0xca        0        0 L1
    Munich.00-08            0xca        0        0 L1
    Munich.00-09            0xca        0        0 L1
    Munich.00-0a            0xca        0        0 L1
    Munich.00-0b            0xca        0        0 L1
    Munich.00-0c            0xca        0        0 L1
    Munich.00-0d            0xc9        0        0 L1
    Munich.00-0e            0xc9        0        0 L1
    Munich.00-0f            0xc9        0        0 L1
    […]
    Munich.00-f8             0x1        0        0 L1
    Munich.00-f9             0x1        0        0 L1
    Munich.00-fa             0x1        0        0 L1
    Munich.00-fb             0x1        0        0 L1
    Munich.00-fc             0x1        0        0 L1
    Munich.00-fd             0x1        0        0 L1
    Munich.00-fe             0x1        0        0 L1
    Munich.00-ff             0x1        0        0 L1
    Pennsauken.00-00        0x33   0xed52     1193 L1
    London.00-00           0x3af   0x865e      744 L1
    Frankfurt.00-00         0x19   0x8612      980 L1
      268 LSPs
    […]

The output looks odd: the Munich router is generating 256 fragments in total, and the router is overloaded. Why? Inspecting the non-zero fragments reveals another interesting clue.

JUNOS command output
All the Munich non-zero LSP fragments have the garbage collection timer set.

    hannes@Frankfurt> show isis database Munich.00-01 extensive
    IS-IS level 2 link-state database:
    […]
    Munich.00-01 Sequence: 0xca, Checksum: 0, Lifetime: 0 secs
      Header: LSP ID: Munich.00-01, Length: 40 bytes
        Allocated length: 284 bytes, Router ID: 0.0.0.0
        Remaining lifetime: 0 secs, Level: 1, Interface: 64
        Estimated free bytes: 209, Actual free bytes: 244
        Garbage collection timer expires in: 1134 secs
    […]

The LSP fragments no longer contain any data. All of them remain in the database for their maximum LSP lifetime to avoid re-learning them in case the network gets partitioned. For our further troubleshooting, this means that somebody has purged the Munich LSP fragments. In IS-IS the only router that purges LSPs is the originating router. So we next inspect the Munich router and check its configuration file.
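Scanning several hundred database rows by eye is tedious, so a short script can count the purged fragments per originating router. The Python sketch below assumes the row layout of the show isis database output in this case study (LSP ID, sequence number, checksum, remaining lifetime) and treats a row with zero checksum and zero lifetime as purged; the function name purged_fragments is hypothetical. Because the fragment number is the last octet of the LSP ID, a single router can originate at most 256 fragments (00 through ff).

```python
import re
from collections import Counter

# Matches rows such as "Munich.00-01 0xca 0 0 L1":
# system name, pseudonode ID, fragment number, sequence, checksum, lifetime.
ROW = re.compile(
    r"^\s*(\S+)\.([0-9a-f]{2})-([0-9a-f]{2})\s+(0x[0-9a-f]+)\s+(\S+)\s+(\d+)",
    re.IGNORECASE,
)


def purged_fragments(show_output):
    """Count LSP fragments with zero checksum and zero lifetime per system."""
    purged = Counter()
    for line in show_output.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # header lines, elisions and the LSP total are skipped
        system = match.group(1)
        checksum = int(match.group(5), 16)
        lifetime = int(match.group(6))
        if checksum == 0 and lifetime == 0:
            purged[system] += 1
    return purged


sample = """\
Munich.00-00 0x11 0x6dac 982 L1 Overload
Munich.00-01 0xca 0 0 L1
Munich.00-02 0xca 0 0 L1
London.00-00 0x3af 0x865e 744 L1
"""
print(purged_fragments(sample))  # Counter({'Munich': 2})
```

A router whose count approaches 255 while its fragment 00 carries the Overload attribute matches the pattern seen for Munich here.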
JUNOS configuration
The IS-IS configuration looks alright, but there is also an export policy configured which should be further inspected.

    hannes@Munich> show configuration
    […]
    protocols {
        isis {
            export static-to-isis;
            level 2 {
                wide-metric-only;
            }
            interfaces {
                […]
                lo0.0;
            }
            […]
        }
    }