10.8 Path MTU Discovery with UDP

Let us examine the interaction between an application using UDP and the path MTU discovery mechanism (PMTUD) [RFC1191]. For a protocol such as UDP, in which the calling application generally controls the outgoing datagram size, it is useful to have some way to determine an appropriate datagram size if fragmentation is to be avoided. Conventional PMTUD uses ICMP PTB messages (see Chapter 8) to determine the largest packet size that can be used along a routing path without inducing fragmentation. These messages are typically processed below the UDP layer and are not directly visible to an application. Either the application uses an API call to learn the current best estimate of the path MTU for each destination with which it has communicated, or the IP layer performs PMTUD independently without the application knowing. The IP layer often caches PMTUD information on a per-destination basis and times it out if it is not refreshed.
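A minimal sketch of the API-based approach follows, assuming Linux. The IP_MTU_DISCOVER and IP_MTU socket options used here are Linux extensions (see the ip(7) manual page), not portable POSIX calls, and the function name query_path_mtu is hypothetical. The sketch sets DF on every datagram (IP_PMTUDISC_DO), connects the UDP socket so the kernel associates it with a single destination, and then reads the kernel's cached path MTU estimate for that destination. Passing IP_PMTUDISC_DONT instead would disable PMTUD for just this one socket.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (Linux-specific): returns the kernel's current
 * path MTU estimate toward dst_ip (dotted quad), or -1 on any error. */
static int query_path_mtu(const char *dst_ip, unsigned short dst_port)
{
    struct sockaddr_in dst;
    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    dst.sin_port = htons(dst_port);
    if (inet_pton(AF_INET, dst_ip, &dst.sin_addr) != 1)
        return -1;

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    /* Always set DF on outgoing datagrams so the kernel performs PMTUD
     * rather than fragmenting locally. */
    int mode = IP_PMTUDISC_DO;
    int mtu = -1;
    socklen_t len = sizeof mtu;

    /* IP_MTU is only meaningful on a connected socket; connect() on a
     * UDP socket just records the peer and sends nothing on the wire. */
    if (setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &mode, sizeof mode) < 0 ||
        connect(fd, (struct sockaddr *)&dst, sizeof dst) < 0 ||
        getsockopt(fd, IPPROTO_IP, IP_MTU, &mtu, &len) < 0)
        mtu = -1;

    close(fd);
    return mtu;
}
```

Because connecting a UDP socket sends no packets, the value returned simply reflects whatever the IP layer currently has cached for that destination, which is typically the outgoing interface MTU until a PTB message lowers it.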

10.8.1 Example

In the following example, we use the sock program to create a UDP datagram that produces a 1501-byte IPv4 datagram. Both our host system and the attached LAN support an MTU larger than 1500 bytes, but the outgoing link to the Internet at the router does not. The command attempts to send three UDP messages to the echo service (UDP port 7) in quick succession.
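The sizes in this example follow from simple header arithmetic. The sketch below assumes a 20-byte IPv4 header with no options plus the fixed 8-byte UDP header; the helper names are illustrative only.

```c
#include <stddef.h>

/* Assumed header sizes: IPv4 without options, and UDP's fixed header. */
enum { IPV4_HDR_LEN = 20, UDP_HDR_LEN = 8 };

/* Total IPv4 datagram length produced by a UDP payload of this size. */
static size_t ipv4_udp_datagram_len(size_t payload)
{
    return payload + UDP_HDR_LEN + IPV4_HDR_LEN;
}

/* Largest UDP payload that fits in one unfragmented IPv4 datagram. */
static size_t max_udp_payload(size_t mtu)
{
    return mtu > IPV4_HDR_LEN + UDP_HDR_LEN
               ? mtu - IPV4_HDR_LEN - UDP_HDR_LEN
               : 0;
}
```

A 1473-byte payload therefore yields 1473 + 8 + 20 = 1501 bytes, one byte more than a 1500-byte MTU allows; 1472 bytes is the largest UDP payload that would have fit.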

Linux% sock -u -i -n 3 -w1473 www.cs.berkeley.edu echo

Listing 10-6 illustrates the corresponding packet trace we can see using tcpdump at the sender (some lines are wrapped for clarity).

Listing 10-6 tcpdump output illustrating ICMP PTB message. The suggested MTU is included.

1 14:42:18.359366 IP (tos 0x0, ttl 64, id 18331, offset 0, flags [DF], proto UDP (17), length 1501)
    12.46.129.28.33954 > 128.32.244.172.7: UDP, length 1473
2 14:42:18.359384 IP (tos 0x0, ttl 64, id 18332, offset 0, flags [DF], proto UDP (17), length 1501)
    12.46.129.28.33954 > 128.32.244.172.7: UDP, length 1473
3 14:42:18.359402 IP (tos 0x0, ttl 64, id 18333, offset 0, flags [DF], proto UDP (17), length 1501)
    12.46.129.28.33954 > 128.32.244.172.7: UDP, length 1473
4 14:42:18.360156 IP (tos 0x0, ttl 255, id 23457, offset 0, flags [none], proto ICMP (1), length 56)
    12.46.129.1 > 12.46.129.28: ICMP 128.32.244.172 unreachable - need to frag (mtu 1500), length 36
        IP (tos 0x0, ttl 63, id 18331, offset 0, flags [DF], proto UDP (17), length 1501)
        12.46.129.28.33954 > 128.32.244.172.7: UDP, length 1473

In Listing 10-6 we see three UDP datagrams, each carrying 1473 bytes of UDP (application) payload. Each produces a 1501-byte (unfragmented) IPv4 datagram. Each of these datagrams has the IPv4 DF bit set (the default on this system), so when one of them reaches the router (IPv4 address 12.46.129.1), an ICMPv4 PTB message is produced, which includes the suggested next-hop MTU of 1500 bytes.
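The suggested MTU travels in the ICMPv4 header itself. In the layout defined by [RFC1191] for a Destination Unreachable message (type 3) with code 4 ("fragmentation needed and DF set"), bytes 6 and 7 carry the next-hop MTU in network byte order. The sketch below extracts it from a raw ICMP message; the function name is hypothetical, and obtaining the raw bytes (for example, from a raw socket) is outside its scope.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the next-hop MTU from an ICMPv4 "fragmentation needed" message,
 * or 0 if the buffer is not one. icmp points at the ICMP header (just past
 * the outer IPv4 header). A router predating RFC 1191 leaves the field
 * zero, which this sketch also reports as 0. */
static unsigned icmp_ptb_next_hop_mtu(const uint8_t *icmp, size_t len)
{
    if (len < 8 || icmp[0] != 3 || icmp[1] != 4)
        return 0;
    /* Bytes 4-5 are unused; bytes 6-7 hold the MTU, big-endian. */
    return ((unsigned)icmp[6] << 8) | icmp[7];
}
```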

We may also observe that the ICMPv4 message contains the IPv4 header and the first 8 bytes of payload (here, exactly the UDP header) from our discarded ("offending") datagram. In this example, our sock program sent its datagrams so quickly (in under a millisecond) that it completed its execution before any of the ICMP messages were returned and processed.
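The 56-byte length of the ICMPv4 message in packet 4 breaks down as 20 bytes of outer IPv4 header, 8 bytes of ICMP header, and 28 bytes quoted from the offending datagram (its 20-byte IPv4 header plus its first 8 payload bytes). As a sketch, assuming the caller already holds the ICMP portion of the message in a buffer, the hypothetical function below recovers the IPv4 identifier and UDP ports from the quoted part. This is the kind of information a host can use to associate the error with the socket that sent the datagram.

```c
#include <stddef.h>
#include <stdint.h>

struct quoted_udp {
    unsigned ip_id;    /* IPv4 Identification of the offending datagram */
    unsigned src_port; /* UDP source port (the sender's) */
    unsigned dst_port; /* UDP destination port */
};

static unsigned be16(const uint8_t *p) { return ((unsigned)p[0] << 8) | p[1]; }

/* Parses the datagram excerpt carried in an ICMPv4 error message.
 * icmp points at the ICMP header. Returns 0 on success, or -1 if the
 * buffer is truncated or the quoted datagram is not UDP. */
static int parse_quoted_udp(const uint8_t *icmp, size_t len, struct quoted_udp *out)
{
    const size_t icmp_hdr = 8;
    if (len < icmp_hdr + 20)
        return -1;
    const uint8_t *ip = icmp + icmp_hdr;          /* quoted IPv4 header */
    size_t ihl = (size_t)(ip[0] & 0x0F) * 4;      /* its header length */
    if (ihl < 20 || ip[9] != 17 || len < icmp_hdr + ihl + 8) /* 17 = UDP */
        return -1;
    const uint8_t *udp = ip + ihl;                /* the "first 8 bytes" */
    out->ip_id = be16(ip + 4);
    out->src_port = be16(udp);
    out->dst_port = be16(udp + 2);
    return 0;
}
```

The test bytes used to check this sketch mirror packet 4: a quoted datagram with ID 18331 sent from port 33954 to port 7.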

Note

The 1500-byte MTU is now a common minimum MTU among ISPs. Some ISPs that rely on PPPoE for address assignment and management use a smaller, 1492-byte MTU. The PPPoE header (see Chapter 3) comprises 6 bytes, and the PPP header that follows it is 2 bytes, leaving 1500 - 6 - 2 = 1492 bytes for the encapsulated datagram.
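The 6-byte PPPoE header can be written down as a C structure following the PPPoE session header layout of RFC 2516. The structure and helper names below are illustrative, not taken from any particular library; the subtraction then reproduces the 1492-byte figure.

```c
#include <stdint.h>

/* PPPoE session header (RFC 2516 layout): 6 bytes on the wire. */
struct pppoe_hdr {
    uint8_t  ver_type;   /* version (high 4 bits) and type (low 4 bits), both 1 */
    uint8_t  code;       /* 0x00 for session data */
    uint16_t session_id; /* network byte order */
    uint16_t length;     /* PPPoE payload length, network byte order */
};
_Static_assert(sizeof(struct pppoe_hdr) == 6, "PPPoE header must be 6 bytes");

/* The PPP Protocol field that follows, e.g. 0x0021 for IPv4. */
enum { PPP_PROTO_LEN = 2 };

/* Room left for the IP datagram inside an Ethernet payload of eth_mtu bytes. */
static unsigned pppoe_ip_mtu(unsigned eth_mtu)
{
    return eth_mtu - (unsigned)sizeof(struct pppoe_hdr) - PPP_PROTO_LEN;
}
```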

If we use another destination host (one about which we have no path MTU history) and add additional delay between writes, we can observe different behavior. Using the sock command with the -p 2 option, which adds 2 seconds of delay between successive sends, we run the following two (identical) commands:

Linux% sock -u -i -n 3 -w1473 -p 2 www.wisc.edu echo
write returned -1, expected 1473: Message too long
Linux% sock -u -i -n 3 -w1473 -p 2 www.wisc.edu echo

The tcpdump output for these commands, captured with a different version of tcpdump, is given in Listing 10-7 (some lines are wrapped for clarity).

Listing 10-7 Illustration of successful Path MTU discovery on 3000-byte MTU link adapting to 1500-byte path MTU

1 17:22:16.331023 IP (tos 0x0, ttl 64, id 58648, offset 0, flags [DF], proto: UDP (17), length: 1501)
    12.46.129.28.33955 > 144.92.9.185.7: UDP, length 1473
2 17:22:16.331581 IP (tos 0x0, ttl 255, id 38518, offset 0, flags [none], proto: ICMP (1), length: 56)
    12.46.129.1 > 12.46.129.28: ICMP 144.92.9.185 unreachable - need to frag (mtu 1500), length 36
        IP (tos 0x0, ttl 63, id 58648, offset 0, flags [DF], proto: UDP (17), length: 1501)
        12.46.129.28.33955 > 144.92.9.185.7: UDP, length 1473
3 17:22:24.284866 IP (tos 0x0, ttl 64, id 53776, offset 0, flags [+], proto: UDP (17), length: 1500)
    12.46.129.28.33955 > 144.92.9.185.7: UDP, length 1473
4 17:22:24.284873 IP (tos 0x0, ttl 64, id 53776, offset 1480, flags [none], proto: UDP (17), length: 21)
    12.46.129.28 > 144.92.9.185: udp
5 17:22:26.293554 IP (tos 0x0, ttl 64, id 53777, offset 0, flags [+], proto: UDP (17), length: 1500)
    12.46.129.28.33955 > 144.92.9.185.7: UDP, length 1473
6 17:22:26.293559 IP (tos 0x0, ttl 64, id 53777, offset 1480, flags [none], proto: UDP (17), length: 21)
    12.46.129.28 > 144.92.9.185: udp
7 17:22:28.301469 IP (tos 0x0, ttl 64, id 53778, offset 0, flags [+], proto: UDP (17), length: 1500)
    12.46.129.28.33955 > 144.92.9.185.7: UDP, length 1473
8 17:22:28.301474 IP (tos 0x0, ttl 64, id 53778, offset 1480, flags [none], proto: UDP (17), length: 21)
    12.46.129.28 > 144.92.9.185: udp

In Listing 10-7 we can see that the first run sent only one datagram (packet 1) before ending with an error caused by the ICMPv4 PTB message (packet 2). The delay within and between runs gives the PTB message time to reach the sending host and gives the resulting error condition time to be delivered back to the sending application.
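On the application side, this error surfaces as a failed send with errno set to EMSGSIZE ("Message too long"). The hypothetical helper below sketches one way a Linux application might react: when send() fails with EMSGSIZE, it reads the kernel's updated path MTU estimate through the Linux-specific IP_MTU option so the caller can shrink later datagrams. It assumes a UDP socket that has already been connected to its destination.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: sends one datagram on a connected UDP socket and
 * returns the bytes sent, or -1 with errno set. On EMSGSIZE, *pmtu
 * receives the kernel's current path MTU estimate (Linux IP_MTU). */
static ssize_t send_or_learn_pmtu(int fd, const void *buf, size_t len, int *pmtu)
{
    ssize_t n = send(fd, buf, len, 0);
    if (n < 0 && errno == EMSGSIZE) {
        int mtu = 0;
        socklen_t optlen = sizeof mtu;
        if (getsockopt(fd, IPPROTO_IP, IP_MTU, &mtu, &optlen) == 0)
            *pmtu = mtu;
        errno = EMSGSIZE; /* report the original error to the caller */
    }
    return n;
}
```

A caller would typically subtract the 28 bytes of IPv4 and UDP headers from the new estimate and resize or split subsequent messages accordingly.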

Interestingly, when we run the program a second time, the path MTU has already been discovered to be 1500 bytes, and the system is able to send the program's three datagrams using fragmentation (packets 3, 5, and 7 are the first fragments of the three datagrams, and packets 4, 6, and 8 each carry the single remaining byte). After 15 minutes (not illustrated), the path MTU information is considered stale, the next datagram is sent unfragmented, another ICMPv4 PTB message is returned, and the process repeats.
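The fragment sizes in Listing 10-7 can be reproduced with a short calculation. The IPv4 payload being fragmented is the 1481-byte UDP datagram (8-byte header plus 1473 bytes of data). With a 1500-byte MTU and a 20-byte IPv4 header, each fragment can carry at most 1480 bytes, which is already a multiple of 8, so the first fragment holds 1480 bytes (total length 1500, MF set) and the second holds the remaining byte (total length 21, offset 1480). The sketch assumes headers without options; the names are illustrative.

```c
#include <stddef.h>

enum { IPV4_BASE_HDR_LEN = 20 }; /* no IP options assumed */

struct ipv4_frag {
    size_t offset;      /* byte offset of this fragment's data (tcpdump "offset") */
    size_t total_len;   /* IPv4 Total Length: header plus fragment data */
    int more_fragments; /* MF bit, shown by tcpdump as flags [+] */
};

/* Splits an IPv4 payload of payload_len bytes for a link with the given
 * MTU. Writes at most max_out fragments and returns how many were written. */
static size_t fragment_ipv4(size_t payload_len, size_t mtu,
                            struct ipv4_frag *out, size_t max_out)
{
    if (mtu <= IPV4_BASE_HDR_LEN)
        return 0;
    /* Non-final fragments carry a multiple of 8 bytes because the
     * Fragment Offset field counts 8-byte units. */
    size_t per_frag = (mtu - IPV4_BASE_HDR_LEN) / 8 * 8;
    if (per_frag == 0)
        return 0;
    size_t n = 0;
    for (size_t off = 0; off < payload_len && n < max_out; off += per_frag, n++) {
        size_t chunk = payload_len - off < per_frag ? payload_len - off : per_frag;
        out[n].offset = off;
        out[n].total_len = IPV4_BASE_HDR_LEN + chunk;
        out[n].more_fragments = off + chunk < payload_len;
    }
    return n;
}
```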

Note

[RFC1191] recommends that a PMTU estimate obtained through PMTUD be considered stale after 10 minutes. Path MTU discovery can sometimes cause problems because firewalls and filtering gateways may drop ICMP traffic indiscriminately, which can harm the PMTU discovery algorithm. Because of this, it is possible to disable PMTU discovery on a system-wide or finer-grained basis. On Linux, writing a 1 to the file /proc/sys/net/ipv4/ip_no_pmtu_disc disables the feature. On Windows, it involves setting the registry entry HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters\EnablePMTUDiscovery to the value 0. An alternative to conventional PMTUD that does not use ICMP has also been developed [RFC4821]; we will discuss it in the context of TCP in Chapter 15.
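The per-destination caching and aging of PMTU information can be sketched as a small table. This is a hypothetical, simplified design (a direct-mapped array keyed by IPv4 address, with the current time passed in explicitly), not how any particular kernel stores its route or PMTU state. It uses the 10-minute interval recommended by [RFC1191]; the system traced in Listing 10-7 appeared to age its entries more slowly.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define PMTU_TIMEOUT_SECS (10 * 60) /* RFC 1191's suggested aging interval */
#define PMTU_CACHE_SLOTS 64         /* arbitrary size for this sketch */

struct pmtu_entry {
    uint32_t dst;   /* IPv4 destination, host byte order */
    unsigned mtu;   /* path MTU learned from a PTB message */
    time_t learned; /* when that PTB message was processed */
    int valid;
};

static struct pmtu_entry pmtu_cache[PMTU_CACHE_SLOTS];

/* Records a learned MTU. A destination mapping to an occupied slot simply
 * evicts the previous entry. */
static void pmtu_learn(uint32_t dst, unsigned mtu, time_t now)
{
    struct pmtu_entry *e = &pmtu_cache[dst % PMTU_CACHE_SLOTS];
    e->dst = dst;
    e->mtu = mtu;
    e->learned = now;
    e->valid = 1;
}

/* Returns the cached path MTU for dst, or link_mtu if nothing is cached
 * or the entry has gone stale (which lets the larger size be tried again). */
static unsigned pmtu_lookup(uint32_t dst, unsigned link_mtu, time_t now)
{
    struct pmtu_entry *e = &pmtu_cache[dst % PMTU_CACHE_SLOTS];
    if (!e->valid || e->dst != dst)
        return link_mtu;
    if (now - e->learned >= PMTU_TIMEOUT_SECS) {
        e->valid = 0; /* aged out */
        return link_mtu;
    }
    return e->mtu;
}
```

When an entry ages out, the lookup falls back to the first-hop MTU (3000 bytes for the link in Listing 10-7), so the next datagram goes out at the larger size with DF set and either succeeds or elicits a fresh PTB message.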
