16.9 Delay-Based Congestion Control


The approaches to congestion control we have seen so far are usually triggered by packet loss, detected using some combination of ACKs or SACKs, ECN (if available), and expiration of a retransmission timer. ECN (see Section 16.11) allows a sending TCP to be informed about congestion prior to the need for the network to drop packets, but this requires participation from routers within the network that may not be available. However, even without ECN it is still possible to try to determine from a host whether congestion is about to occur within the network.

One clue that congestion may be forming is an increase in measured RTT as the sender injects more packets into the network. We saw this situation in Figure 16-8, where additional packets were being queued rather than delivered, contributing to a higher measured RTT (until packets were ultimately discarded). Several congestion control techniques depend on this observation. They are called delay-based congestion control algorithms, as opposed to the loss-based congestion control algorithms we have seen so far.

16.9.1 Vegas

In 1994, TCP Vegas was introduced [BP95]. It was the first delay-based congestion control approach for TCP published and tested by the community of TCP developers. Vegas operates by estimating the amount of data it expects to transfer in a certain amount of time and comparing this with the amount of data it is actually able to transfer. If the requisite amount of data is not transferred, it is likely to be held up in a router queue along the path. If this condition persists, the Vegas sender slows down. This is in contrast to the standard TCP approach, which forces a packet drop to occur in order to determine the point at which the network is congested.

While in its congestion avoidance phase, during each RTT, Vegas measures the amount of data transferred and divides this number by the minimum delay observed across the connection. It maintains two thresholds, α and β (where α < β). When the difference in expected throughput (window size divided by the smallest RTT observed) versus achieved throughput is less than α, the congestion window is increased; when it is greater than β, the congestion window is decreased. Otherwise, it is left as is. All changes to the congestion window are linear, meaning the scheme is an additive increase/additive decrease (AIAD) congestion control scheme.
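To make the decision rule concrete, here is a minimal sketch of one Vegas congestion avoidance step in Python. The function name and the packet-based accounting are illustrative rather than taken from any real implementation; α and β are given the smallest values of interest discussed next.

# Illustrative Vegas congestion-avoidance step (not from any real stack).
# cwnd is in packets; base_rtt is the minimum RTT observed; rtt is the
# latest per-RTT measurement. ALPHA and BETA are in packets.

ALPHA = 1   # increase cwnd if fewer than ALPHA packets appear queued
BETA = 3    # decrease cwnd if more than BETA packets appear queued

def vegas_ca_step(cwnd, base_rtt, rtt):
    expected = cwnd / base_rtt               # throughput if nothing is queued
    actual = cwnd / rtt                      # throughput actually achieved
    queued = (expected - actual) * base_rtt  # estimated packets held in queues
    if queued < ALPHA:
        return cwnd + 1    # additive increase: path looks underutilized
    if queued > BETA:
        return cwnd - 1    # additive decrease: a queue is building
    return cwnd            # between the thresholds: hold steady (damping)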

The authors describe α and β in terms of buffer utilization at a bottleneck link.

The smallest values of interest are 1 for α and 3 for β. The reasoning behind these values is as follows: At least one packet buffer should be occupied in the network path (i.e., at the queue in the router incident with the minimum-bandwidth link on the path) to keep the network busy. If extra bandwidth becomes available, occupying two additional buffers (up to 3, the value for β) obviates the need to wait an extra RTT in order to inject more, which would be required if Vegas tried to maintain only one buffer full. Furthermore, having the region (β − α) as the operating range leaves some room for minor changes in throughput without causing an immediate change in the window, a form of damping that aims to reduce rate oscillations.

With a slight modification, this approach can also be applied to the slow start period. Here, increasing cwnd by 1 for each good ACK is allowed only every other RTT. For those RTTs when it is not increased, a measurement is made to ensure that throughput is increasing. If not, the sender switches to the Vegas congestion avoidance scheme.
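A sketch of that modified slow start follows, again in illustrative Python. The state handling and function signature are assumptions; only the every-other-RTT growth and the fallback check come from the description above.

# Illustrative Vegas-style slow start, called once per RTT. The
# two-state structure here is assumed for clarity.

def vegas_slow_start_rtt(cwnd, rtt_number, throughput_grew):
    if rtt_number % 2 == 0:
        # Growth RTT: +1 per good ACK roughly doubles cwnd over one RTT
        return 2 * cwnd, "slow_start"
    # Measurement RTT: hold cwnd and check that throughput is still rising
    if not throughput_grew:
        return cwnd, "congestion_avoidance"  # fall back to the Vegas CA scheme
    return cwnd, "slow_start"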

Under certain circumstances, Vegas can be “fooled” into believing that the forward-direction delay is higher than it really is. This happens when there is significant congestion in the reverse direction (recall that the paths in the two directions of a TCP connection may be different and have different states of congestion). In such cases, packets (ACKs) returning to the sending TCP are delayed, even though the sender is not really contributing to the (reverse-path) congestion.

This causes Vegas to reduce the congestion window even though such an adjustment is not really necessary. This is a potential pitfall for most techniques based on measuring RTT as a basis for congestion control decisions. Indeed, significant traffic in the reverse direction can cause the ACK clock (Figure 16-1) to be significantly perturbed [M92].

Vegas is fair relative to other Vegas TCPs sharing the same path because each pushes the network to hold only a minimal amount of data. However, Vegas and standard TCP flows do not share paths equally. A standard TCP sender tends to fill queues in the network, whereas Vegas tends to keep them nearly empty.

Consequently, as the standard sender injects more packets, the Vegas sender sees increased delay and slows down. Ultimately, this leads to an unfair bias in favor of the standard TCP. Vegas is supported by Linux but not enabled by default. For kernels prior to 2.6.13, the Boolean sysctl variable net.ipv4.tcp_vegas_cong_avoid determines whether it is used (default 0). The variables net.ipv4.tcp_vegas_alpha (default 2) and net.ipv4.tcp_vegas_beta (default 6) correspond to the α and β described previously but are expressed in half-packet units (i.e., 6 corresponds to 3 packets). The variable net.ipv4.tcp_vegas_gamma (default 2) configures how many half-packets Vegas should attempt to keep outstanding during slow start. For kernels after 2.6.13, Vegas must be loaded as a separate kernel module and enabled by setting net.ipv4.tcp_congestion_control to vegas.
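For example, on a post-2.6.13 kernel, the following commands (run as root) would be one common way to load the module and select it; the prompt is shown for illustration:

Linux# modprobe tcp_vegas
Linux# sysctl -w net.ipv4.tcp_congestion_control=vegas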

16.9.2 FAST

FAST TCP was developed with particular attention to operations in high-speed environments with large bandwidth-delay products [WJLH06]. Similar to Vegas in spirit, it adjusts the window based on the difference between an expected throughput rate and an experienced rate. It differs from Vegas by adjusting the window based not only on the window size, but also on the difference between the current and expected performance. It updates the sending rate every other RTT using a rate-pacing technique. If the measured delay is significantly below a threshold, the window is updated aggressively, followed by a period when the increase is less aggressive. When the delay increases, the reverse takes place. FAST differs from the other approaches we have discussed because it is the subject of several patents and is being commercialized independently. It has received somewhat less scrutiny from the research community, but an independent evaluation [S09] has shown it to have good stability and fairness properties.
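As a rough illustration, the periodic window update published in [WJLH06] has a multiplicative-smoothing form along the following lines. The parameter values below are placeholders (they are not given in this text), and the sketch omits the rate-pacing machinery entirely.

# Rough sketch of a FAST-style periodic window update. gamma is a
# smoothing factor and alpha a target number of queued packets; both
# values here are placeholders. Windows are in packets.

def fast_window_update(w, base_rtt, rtt, alpha=100, gamma=0.5):
    # Window that would keep about alpha packets queued at the bottleneck
    target = (base_rtt / rtt) * w + alpha
    # Move a fraction gamma toward the target, never more than doubling
    return min(2 * w, (1 - gamma) * w + gamma * target)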

16.9.3 TCP Westwood and Westwood+

TCP Westwood (TCPW) and TCP Westwood+ (TCPW+) aim at handling large bandwidth-delay-product paths by modifying a conventional TCP NewReno sender. TCPW+ is a correction to the original TCPW algorithm, so we will just refer to either as TCPW. In TCPW, the sender’s eligible rate estimate (ERE) is an estimate of the bandwidth available on the connection. It is continuously computed in a fashion somewhat similar to Vegas (based upon the difference between an expected and an achieved rate), but with a variable measurement interval for the rates based on the dynamics of ACK arrivals. When congestion is low, the measurement interval is small, and vice versa. When a packet loss is detected, instead of reducing cwnd by half, TCPW computes an estimated BDP (ERE times the minimum RTT observed) and uses this as the new value for ssthresh. Agile probing [WYSG05] adaptively and repeatedly sets ssthresh when a connection would otherwise operate in slow start. This causes cwnd to grow exponentially in cases where ssthresh has been increased (by initiating slow start). Westwood can be enabled in Linux kernels after 2.6.13 by loading a TCPW module and setting net.ipv4.tcp_congestion_control to westwood.
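The distinctive step is the reaction to loss. A minimal sketch follows, assuming ERE is available in bytes per second and ssthresh is kept in packets; both bookkeeping choices are assumptions, not details from any real implementation.

# Illustrative TCPW reaction to loss: ssthresh becomes the estimated
# BDP (ERE times the minimum RTT) instead of half the current window.
# Units are assumptions made for the sketch.

def westwood_on_loss(ere_bytes_per_sec, min_rtt_sec, mss_bytes):
    bdp_packets = (ere_bytes_per_sec * min_rtt_sec) / mss_bytes
    return max(2, int(bdp_packets))   # new ssthresh, floored at 2 packets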

16.9.4 Compound TCP

Starting with Windows Vista, it is possible to choose which congestion control procedure (“provider”) TCP should use, in a way similar to Linux’s pluggable congestion avoidance modules. One such option (but not the default, except for Windows Server 2008) is called Compound TCP (CTCP) [TSZS06]. CTCP makes window adjustments based upon packet loss, but also based on measured delays.

In some sense it is a combination of standard TCP and Vegas, but with the scalability features of HSTCP.

The authors begin by recounting a number of results shown in the Vegas and FAST research that suggest that delay-based congestion control schemes tend to have better utilization, less self-induced packet loss, faster convergence (to the correct operating point), plus better RTT fairness and stabilization. However, as mentioned previously, delay-based approaches tend to lose bandwidth when competing with loss-based congestion control approaches. CTCP attempts to address this situation by combining a delay-based approach with a loss-based approach.

To do this, CTCP introduces a new window control variable called dwnd (the “delay window”). The usable window W then becomes

W = min(cwnd + dwnd, awnd)

The handling of cwnd is similar to that of standard TCP, but the addition of dwnd may allow additional packets to be sent if the delay conditions are appropriate.

When ACKs arrive during congestion avoidance, cwnd is updated as follows:

cwnd = cwnd + 1/(cwnd + dwnd)
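Expressed as code, one per-ACK step of the loss-based component might look as follows; the fractional, packet-based cwnd accounting is an assumption made for clarity.

# Illustrative per-ACK handling of CTCP's standard (loss-based)
# component during congestion avoidance.

def ctcp_on_ack(cwnd, dwnd, awnd):
    cwnd += 1.0 / (cwnd + dwnd)       # cwnd = cwnd + 1/(cwnd + dwnd)
    usable = min(cwnd + dwnd, awnd)   # W = min(cwnd + dwnd, awnd)
    return cwnd, usable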

The management of dwnd is based on Vegas and is nonzero only during congestion avoidance (CTCP uses conventional slow start). As a connection operates, the minimum RTT measured is maintained in the variable baseRTT. Then, the difference in expected data outstanding versus the actual amount, diff, is computed as follows:

diff = W * (1 - (baseRTT/RTT))

where RTT is the current (smoothed) RTT estimate. The value of diff estimates the number of packets (or bytes) queued in the network. CTCP, like most delay-based schemes, attempts to keep diff at a certain threshold, called γ, in order to ensure that the network remains utilized but not congested. Given this goal, the control process for dwnd is then expressed as follows:

dwnd(t + 1) = (dwnd(t) + α * win(t)^k - 1)+,   if diff < γ
            = (dwnd(t) - ζ * diff)+,           if diff ≥ γ
            = (win(t) * (1 - β) - cwnd/2)+,    if loss detected

where (x)+ means max(x, 0). Note that dwnd can never be negative. Rather, it may be zero, in which case CTCP behaves like standard TCP.

In the first case, where the network may be underutilized, CTCP grows dwnd according to the polynomial α * win(t)^k. This is a form of binomial increase and accounts for the way CTCP can be made more aggressive (similar to HSTCP) when the buffer occupancy is estimated to be less than γ. In the second case, where the buffer occupancy appears to be growing beyond the desired threshold γ, the constant ζ dictates how quickly the delay-based component should be reduced (but recall that dwnd is always added to cwnd). This is what contributes to CTCP’s RTT and TCP fairness. When loss is detected, dwnd has its own multiplicative decrease factor β applied.

As can be seen, CTCP can be tuned using the parameters k, α, β, γ, and ζ. The value of k affects the level of aggressiveness. A value of about 0.8 was desired to be similar to HSTCP, but 0.75 was chosen for implementation reasons. The values of α and β affect smoothness and responsiveness. The default values are 0.125 and 0.5, respectively. For γ, the authors suggest a default value of 30 packets based on empirical evaluation. If this value is too small, there may not be enough packets outstanding to obtain good delay measurements. Conversely, values that are too large could result in undesirable persistent congestion.
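Putting the pieces together, here is a sketch of the dwnd control process in illustrative Python. The defaults for k, α, β, and γ are those just mentioned; ζ has no default in this text, so the value below is a placeholder.

# Illustrative CTCP dwnd update, run once per RTT (or when loss is
# detected). Windows are in packets; ZETA's value is a placeholder.

K, ALPHA, BETA, GAMMA, ZETA = 0.75, 0.125, 0.5, 30, 0.1

def ctcp_dwnd_update(dwnd, cwnd, base_rtt, rtt, loss_detected):
    win = cwnd + dwnd                    # current window W
    diff = win * (1 - base_rtt / rtt)    # estimated packets queued
    if loss_detected:
        nxt = win * (1 - BETA) - cwnd / 2
    elif diff < GAMMA:
        nxt = dwnd + ALPHA * win**K - 1  # binomial (HSTCP-like) increase
    else:
        nxt = dwnd - ZETA * diff         # drain the delay-based component
    return max(nxt, 0.0)                 # (x)+: dwnd is never negative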

CTCP is relatively new, so further experimentation and evaluation will no doubt be performed to see how well and how fairly it competes with standard TCP, and how well it is able to adapt to significant changes in available bandwidth. In a simulation study [W08], the author noted that CTCP can perform poorly when network buffers are small (i.e., smaller than γ). The study also suggests that CTCP can fall victim to some of the problems with Vegas, including rerouting (adapting to new paths with different delays) and persistent congestion. Finally, it observes that if many CTCP flows, each trying to keep γ packets in flight, share the same bottleneck link, performance can be poor.

As mentioned previously, CTCP is not enabled by default on most versions of Windows. However, the following command can be used to select CTCP as the congestion provider:

C:\> netsh interface tcp set global congestionprovider=ctcp

It can be disabled by selecting a different provider (or none). CTCP has also been ported to Linux as a pluggable congestion avoidance module but is not included by default.
