[Figure 14.7: a 40-byte TCP ACK with LLC header (3 bytes), SNAP header (5 bytes) and AAL-5 trailer (8 bytes) comes to 56 bytes, which exceeds the 48-byte payload of a single cell. The frame is padded with 40 bytes and carried in two 53-byte ATM cells, so only 40 of 106 bytes, about 37%, are payload.]

FIGURE 14.7. The worst case scenario is a 40-byte payload size, which requires two ATM cells

14.3 Traffic Engineering by MPLS

The first big application for MPLS is traffic engineering. Service providers should be able to guide traffic between any two points inside their network. To deviate from the prevailing hop-by-hop routing paradigm, which always guides traffic along the shortest path through a network, a new forwarding paradigm had to be introduced.

14.3.1 Introduction to MPLS

The forwarding decisions in ATM and Frame Relay networks are truly independent of the Network Layer protocol. The forwarding engine itself does not see the Network Layer protocol; all it sees is the ATM cell or Frame Relay header. Based on the VPI/VCI or DLCI field, the Layer-2 switch looks up the outgoing port and, just as importantly, an outgoing VPI/VCI or DLCI. Based on this information, the VPI/VCI or DLCI is rewritten before the cell or the packet leaves the switch. The VPI/VCI or DLCI field has purely local meaning and is valid only on the interface towards the downstream receiver.

The concept of label swapping comes from the original ability of ATM and Frame Relay switches to change the VPI/VCI or DLCI descriptor as the traffic leaves the chassis. It was clear to the designers of the new MPLS suite of protocols that each IP packet and frame had to be preceded by an MPLS header in order to support label swapping in the IP protocol family. The big question today is at what layer the MPLS header needs to exist.
There are roughly two, unfortunately fundamentally different, views in the industry:

• MPLS is a Layer-2 technology
• MPLS is a Layer-3 technology

In order not to further confuse readers with multiple layering terminologies (ISO layers, ATM layers, IP layers, MPLS layers), this book will typically use the term cell-based MPLS for the ATM switch vendors' view of MPLS as a Layer-2 technology, and packet-based MPLS for the router vendors' view of MPLS as a Layer-3 technology.

14.3.1.1 Cell-based MPLS

The proponents of MPLS as a Layer-2 technology argue that it offers the smoothest transition path for existing ATM networks. The vision is that an existing ATM network, which runs Q.2931 for signalling and PNNI for internal routing, has its control plane replaced by an IP stack of routing and signalling protocols. Figure 14.8 illustrates the control plane transformation of an ATM network to running IS-IS together with one or both of the two major signalling protocols used with MPLS: LDP and RSVP-TE. Just as the combination of Q.2931 and PNNI established VCIs in the ATM world, an IP stack now sets up the labels in MPLS.

So the main question here is: what is a label in the cell-based MPLS world? In the control plane, much has changed, because the prevailing ATM control plane is radically replaced with an IP one. On the forwarding plane, however, almost nothing has changed. Figure 14.8 shows that cell-based MPLS makes use of the VPI/VCI fields for MPLS as well. The label is written into the VPI/VCI fields and the forwarding paradigm stays the same. The label lookup determines the outgoing port and the outgoing label, with which the cell is rewritten upon transmission.

14. Traffic Engineering and MPLS

[Figure 14.8: in the control plane, PNNI routing and Q.2931 signalling are replaced by OSPF/IS-IS and RSVP/LDP. In the cell header, the fields GFC, VPI, VCI, PT, CLP, HEC become GFC, Label, PT, CLP, HEC.]

FIGURE 14.8. In the cell-based MPLS world, IP protocols do the label setup

Although cell-based MPLS may sound like a panacea for smoothly rolling over to MPLS, there are important caveats to consider. First, cell-based MPLS cores still need a segmentation and reassembly (SAR) function at the ingress point of the network for converting IP frames to cells, and the semiconductor supplier market still lacks SAR chips faster than OC-12/STM-4. The conclusion is that cell-based MPLS rules itself out as a backbone technology for high-speed cores.

The second important point is that, due to the finite size of the cell header, there is no possibility for label stacking, where multiple path labels are "pushed" onto and "popped" off packets as they flow through an MPLS network. In large networks it turned out that label stacking is the foundation for scaling the distribution mesh for MPLS-based services. As a brief example, consider Figure 14.9, where for each customer VPN a set of LSPs is set up in the core. To add another customer, another distribution mesh that connects all the Provider Edge devices through the core network is needed. Although ATM vendors tune their control planes to process thousands of label setups per second, the system does not scale in the long run because of the explosion of label forwarding state in the core. Consider, for example, 64 customers needing 10 applications, which results in more than 20,000 connected paths (read as ten full meshes among 64 endpoints, that is 10 × 64 × 63 / 2 = 20,160 paths). That number of customers and paths will stress the control plane severely and in the long run will exhaust the label space, which is limited by the finite cell-header size.

Today, cell-based networks are rarely used for IP transport. Network operators mainly share the router vendors' view that MPLS and its stacking ability are the foundation for scaling services across the network.

14.3.1.2 Packet-based MPLS

In the packet-based MPLS world, MPLS is a fully fledged protocol type that runs on top of link-layers such as ATM, C-HDLC, Ethernet and PPP.
Figure 14.10 shows examples of how MPLS is encapsulated on those link-layer protocols. After the Link-Layer Header, a 4-byte MPLS "shim header" is added. Interestingly, the MPLS shim header can also be present on ATM frames. The nature of MPLS is packet based, however the link-layer is ATM. Note that packet-based MPLS does not modify the VPI/VCI labels of the ATM header. The only information that a packet-based MPLS router modifies is the shim header.

The MPLS shim header consists of a 20-bit label value plus the EXP, S and TTL fields. The label inside the MPLS shim header is rewritten at every hop along the switching path, as in ATM or Frame Relay switch networks. The TTL field carries the same semantics as the IP TTL field; its main purpose is to prevent harm resulting from persistent forwarding loops. The Experimental (EXP) bits typically carry CoS-related information such as forwarding class and drop probability. The last piece of information is only a single bit, but it gives MPLS its scaling abilities. The Bottom of Stack (S) bit, if set, indicates that the payload (typically the IP packet) follows directly after this MPLS shim header. Conversely, if the Bottom of Stack bit is not set, another MPLS shim header follows. In other words, MPLS supports label stacking. Those stacking capabilities are used for a variety of applications such as VPNs, and also for LDP over RSVP-TE tunnelling, which is typically used in large-scale networks.

In order for IP to take advantage of MPLS, IP packets need to be wrapped in MPLS packets by prepending an MPLS shim header to the IP packet. Adding an MPLS shim to the potential stack of MPLS shims is called a push operation. Consequently, taking a label off the MPLS shim stack is called a pop operation. Changing a label value without adding a label to or removing a label from the stack is called a swap operation.

[Figure 14.9: two panels, "Shared LSP distribution mesh" and "Per application distribution mesh", each drawn over the nodes Pennsauken, Frankfurt, Washington, New York, Paris and London.]

FIGURE 14.9. Cell-based MPLS requires for each service a dedicated label switched path mesh

[Figure 14.10: the 32-bit MPLS shim header (Label 20 bits, EXP 3 bits, S 1 bit, TTL 8 bits) shown after four link-layer headers: Ethernet (destination MAC 48 bits, source MAC 48 bits, Ethertype 0x8848), Cisco HDLC (HDLC 0x00FF, protocol 0x8848), PPP (HDLC 0xFF03, protocol 0x0281) and ATM SNAP (Proto-ID 0xAAAA03, OUI 0x0, subProto-ID 0x8848).]

FIGURE 14.10. The MPLS shim header is treated like an OSI-RM Layer-3 protocol and can be run over a variety of link-layer protocols

Figure 14.11 shows where those three operations are applied to IP transit traffic.

[Figure 14.11: a label switched path across the nodes Frankfurt, London, Pennsauken, New York and Washington (Paris is also shown), with label 397, label 512, label 438 and label 0 on successive hops.]

FIGURE 14.11. Each MPLS router along a label switched path changes (swaps) the label

Consider an MPLS label switched path from Frankfurt to Washington DC. Frankfurt is the Ingress or Head End of the label switched path. The Ingress router performs a push operation, which adds label #397 to the IP payload, and passes the packet to the next downstream router, which is London. London's lookup table maps incoming label #397 to the outgoing port facing Pennsauken and swaps label #397 to label #512. Pennsauken maps label #512 to its outgoing port facing New York and swaps the label to #438. New York is the penultimate router on the forwarding path and therefore it has to tell the egress router to unwrap the packet.
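The shim layout (20-bit label, 3-bit EXP, 1-bit S, 8-bit TTL) and the push, swap and pop operations can be made concrete with a minimal Python sketch. The label values follow Figure 14.11; the per-router table `LFIB`, the function names, the initial TTL of 64 and the per-hop TTL decrement are assumptions made for illustration, not details given in the text.

```python
import struct

def pack_shim(label, exp, s, ttl):
    """Pack one 4-byte shim header: 20-bit label, 3-bit EXP, 1-bit S, 8-bit TTL."""
    if not (0 <= label < 2**20 and 0 <= exp < 8 and s in (0, 1) and 0 <= ttl < 256):
        raise ValueError("shim field out of range")
    return struct.pack("!I", (label << 12) | (exp << 9) | (s << 8) | ttl)

def unpack_shim(data):
    """Return (label, exp, s, ttl) from the first 4 bytes of data."""
    (word,) = struct.unpack("!I", data[:4])
    return word >> 12, (word >> 9) & 0x7, (word >> 8) & 0x1, word & 0xFF

# Hypothetical label tables, one per router:
# incoming label -> (operation, outgoing label, next hop)
LFIB = {
    "London":     {397: ("swap", 512, "Pennsauken")},
    "Pennsauken": {512: ("swap", 438, "New York")},
    "New York":   {438: ("swap", 0, "Washington")},
    "Washington": {0:   ("pop", None, None)},
}

def forward(ingress_label, first_hop, ttl=64):
    """Trace a packet along the LSP; the top of the label stack is stack[-1]."""
    # Push at the ingress router (Frankfurt); S=1 marks the bottom of the stack.
    stack = [pack_shim(ingress_label, 0, 1, ttl)]
    trace = [("Frankfurt", "push", ingress_label)]
    router = first_hop
    while router is not None:
        label, exp, s, ttl = unpack_shim(stack[-1])
        op, out_label, next_hop = LFIB[router][label]
        if op == "swap":
            # Rewrite only the top shim; TTL decrement is an assumed convention here.
            stack[-1] = pack_shim(out_label, exp, s, ttl - 1)
        else:
            stack.pop()  # pop: the IP packet is exposed once the stack is empty
        trace.append((router, op, out_label))
        router = next_hop
    return trace, stack
```

Calling `forward(397, "London")` yields a trace of push 397, swap to 512, swap to 438, swap to 0 and a final pop, with an empty stack left for the regular IP lookup.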
The penultimate router does so by swapping the label to zero and forwarding the packet to Washington. The Egress or Tail End router in Washington now knows to pop the top label off the stack and do a regular IP lookup on the packet inside.

14.4 MPLS Signalling Protocols

Now the next big question is: how are label switched paths established? As in the routing protocol world, there are generally two ways to bring up label switched paths:

• Static setup
• Dynamic (signalled) setup

Static setups have no real practical relevance: they are difficult to coordinate and set up, and cumbersome to maintain. Additionally, they cannot re-route traffic if the primary path fails. The majority of network operators have deployed signalled setup of label switched paths using one or both of the following protocols: LDP and RSVP-TE.

Path control is one of the prime necessities of traffic engineering. LDP is not directly related to traffic engineering because it lacks support for traffic path control. There is an extension to LDP, called CR-LDP, that allows constraint-based traffic path control, but it never materialized in real networks, and CR-LDP was eventually abandoned by the IETF. Today's protocol of choice for traffic-path control is RSVP, augmented with a Traffic Engineering extension, called RSVP-TE. This chapter covers only LDP basics, and only to provide a better understanding of how LDP fits in with more advanced concepts like LDP over RSVP-TE tunnelling, which is related to traffic engineering and IS-IS.

14.4.1 RSVP-TE

RSVP was originally defined in RFC 2205, although with intentions completely different from traffic engineering. Originally it was thought of as the tool to make the Internet CoS aware. The applications running on End Systems were to be made CoS aware and signal bandwidth and delay requirements to the network, which was expected to provide for these requirements.
The RSVP message travels throughout the network and, if the receiver confirms that it is willing to accept traffic according to the flow request, admission is granted. All routers along the path are then required to set up per-flow schedulers to guarantee that the individual application can transmit traffic with the requirements granted by the network. The per-flow model failed due to the inherent scaling problems of implementing it in hardware. This was to some degree comparable to ATM networks, where a similar mistake was made: dynamic signalling of ATM VCs and the subsequent introduction of forwarding state does not scale. Considering that an OC-192c circuit today typically transports millions of flows, the limits of the design are immediately apparent. A hardware-scheduling engine that operates at such high speeds on so many flows cannot be built. RSVP was considered dead by the mid-1990s and there was no broader deployment of flow-aware networks. Finally, the vision of a flow-aware Internet was abandoned for the time being.

However, there were three things about RSVP that still attracted interest within the developer community.

1. Extensibility. First of all, RSVP is a very extensible protocol. As in IS-IS, the RSVP header is quite generic and all the information is encoded using TLV containers called Objects. The advantages of TLV encoding were discussed in Chapter 13. Virtually all successful networking protocols have a TLV orientation. RSVP is actually a very good example of a protocol that, if it is just extensible enough, can be used for a totally different purpose many years later. All that is required is to define a different set of TLVs, and functionality is added as developers move forward with the protocol.

2. Forwarding State Model. RSVP uses two basic messages for requesting and granting forwarding state: the PATH and the RESV messages.
The PATH message describes what the sender wants to transmit to the receiver, and the RESV message describes what the receiver is willing to accept. The PATH message travels hop-by-hop downstream to the receiver and the RESV message travels upstream from the receiver to the sender along the path established. The receiver can set up forwarding state in a step-by-step fashion and, as soon as the RESV message arrives at the requester, everything is ready and the forwarding path can be used immediately. That property is a wide deviation from the usual "signalling" and routing paradigm found in IP networks. Routing typically does not get any feedback; at best, a routing protocol tells its neighbour that it has received the routing update by sending back an Acknowledgement. However, the router cannot tell if the path will ever be used. RSVP is different. The router that requests a certain forwarding state from the network also gets immediate feedback that the network has set up the requested state and that it is ready for use. For fast converging networks especially, fast feedback about whether a path can be used or not is imperative.

3. Unidirectional Notion. A flow was originally thought of as a unidirectional path between two nodes. Routed paths in the IP world are also always unidirectional relationships. The IETF therefore similarly defined a label switched path as a unidirectional relationship. As the RSVP flow-based model implied unidirectional operation as well, it was a natural choice for setting up label forwarding state between a pair of routers.

14.4.2 Simple Traffic Engineering with RSVP-TE

RSVP had a lot of interesting ingredients to serve as the protocol for setting up label switched paths across the Internet. However, a few changes and extra objects had to be defined before RSVP could be used to set up label switched paths. The most important change was that RSVP is not run between a pair of End Systems.
Rather, RSVP-TE for MPLS is run between a pair of routers. The next evolutionary step was to get rid of some of the per-flow objects and to define a set of new objects that could be used for traffic engineering purposes. In RSVP, the term Object is used interchangeably with TLV. Table 14.1 shows the additional RSVP-TE objects that are defined in RFC 3209 and used with MPLS.

All the objects in Table 14.1 are used for signalling Traffic Engineering LSPs. Most of them appear in RSVP RESV or PATH messages. The tcpdump output shows how these attributes look on the wire.

Tcpdump output

In the tcpdump output you see the contents of a PATH and RESV message of an RSVP call that requests and assigns a label. Many TE objects are embedded in the two messages.

12:35:47.351675 IP 209.211.134.9 > 209.211.134.8: RSVP v: 1, msg-type: Path, length: 216, ttl: 255, checksum: 0x4406
    Session Object (1) Flags: [reject if unknown], Class-Type: Tunnel IPv4 (7), length: 16
        IPv4 Tunnel EndPoint: 209.211.134.8, Tunnel ID: 0x0011, Extended Tunnel ID: 209.211.134.9
    RSVP Hop Object (3) Flags: [reject if unknown], Class-Type: IPv4 (1), length: 12
        Previous/Next Interface: 10.154.1.6, Logical Interface Handle: 0x0853f4c8
    Time Values Object (5) Flags: [reject if unknown], Class-Type: 1 (1), length: 8
        Refresh Period: 120000ms
    Session Attribute Object (207) Flags: [ignore and forward if unknown], Class-Type: Tunnel IPv4 (7), length: 28
        Session Name: juncore02-juncore01
        Setup Priority: 7, Holding Priority: 0, Flags: [none]
    Sender Template Object (11) Flags: [reject if unknown], Class-Type: Tunnel IPv4 (7), length: 12
        IPv4 Tunnel Sender Address: 209.211.134.9, LSP-ID: 0x0007
    Sender TSpec Object (12) Flags: [reject if unknown], Class-Type: IntServ (2), length: 36
        Msg-Version: 0, length: 28
        Service Type: Default/Global Information (1), break bit not set, Service length: 24
            Parameter ID: Token Bucket TSpec (127), length: 20, Flags: [0x00]
                Token Bucket Rate: 0 Mbps
                Token Bucket Size: 0 bytes
                Peak Data Rate: inf Mbps
                Minimum Policed Unit: 20 bytes
                Maximum Packet Size: 1500 bytes

TABLE 14.1. The major traffic engineering objects for RSVP-TE.

Code point    Object name
16            Label object
19            Label request object
20            Explicit route object
21            Record route object
22            Hello
207           Session attribute object
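How the code points of Table 14.1 and the per-object flags in the tcpdump output relate to the bytes on the wire can be sketched in Python. The sketch assumes the generic RSVP object header of RFC 2205 (16-bit length, 8-bit Class-Num, 8-bit C-Type) and its rule that the two high-order bits of Class-Num select the handling of unknown classes. The names `TE_OBJECTS`, `unknown_class_action` and `iter_objects` are hypothetical, the label "ignore if unknown" is this sketch's own wording, and the sample bytes are constructed for illustration rather than taken from the capture.

```python
import struct

# Code points from Table 14.1
TE_OBJECTS = {
    16: "Label",
    19: "Label request",
    20: "Explicit route",
    21: "Record route",
    22: "Hello",
    207: "Session attribute",
}

def unknown_class_action(class_num):
    """Handling of an unrecognised object, derived from the top bits of Class-Num."""
    if class_num & 0x80 == 0:      # 0bbbbbbb
        return "reject if unknown"
    if class_num & 0x40 == 0:      # 10bbbbbb
        return "ignore if unknown"
    return "ignore and forward if unknown"  # 11bbbbbb

def iter_objects(body):
    """Yield (class_num, c_type, length, name, action) for each object in body.

    body is the part of an RSVP message that follows the common header.
    """
    offset = 0
    while offset + 4 <= len(body):
        length, class_num, c_type = struct.unpack_from("!HBB", body, offset)
        if length < 4 or length % 4 or offset + length > len(body):
            raise ValueError("malformed object at offset %d" % offset)
        yield (class_num, c_type, length,
               TE_OBJECTS.get(class_num),      # None if not listed in Table 14.1
               unknown_class_action(class_num))
        offset += length                        # length includes the 4-byte header

# Two constructed objects with zeroed contents:
# a Session attribute object (207, C-Type 7, 28 bytes) and
# a Label request object (19, C-Type 1, 8 bytes).
sample = bytes.fromhex("001ccf07") + bytes(24) + bytes.fromhex("00081301") + bytes(4)
for obj in iter_objects(sample):
    print(obj)
```

With code point 207 the two high-order bits are both set, which is consistent with the "ignore and forward if unknown" flag that tcpdump prints for the Session Attribute Object, while the objects with code points below 128 show "reject if unknown".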