18
CONGESTION CONTROL FOR
SELF-SIMILAR NETWORK TRAFFIC
TSUNYI TUAN AND KIHONG PARK
Network Systems Lab, Department of Computer Sciences, Purdue University,
West Lafayette, IN 47907
18.1 INTRODUCTION
Recent measurements of local-area and wide-area traffic [8, 28, 42] have shown that
network traffic exhibits variability at a wide range of scales. What is striking is the
ubiquitousness of the phenomenon, which has been observed in diverse networking
contexts, from Ethernet to ATM, LAN and WAN, compressed video, and HTTP-based
WWW traffic [8, 15, 23, 42]. Such scale-invariant variability is in strong
contrast to traditional models of network traffic, which show burstiness at short time
scales but are essentially smooth at large time scales; that is, they lack long-range
dependence. Since scale-invariant burstiness can exert a significant impact on
network performance, understanding the causes and effects of traffic self-similarity
is an important problem.
In previous work [33, 34], we have investigated the causal and performance
aspects of traffic self-similarity, and we have shown that self-similar traffic flow is an
intrinsic property of networked client/server systems with heavy-tailed file size
distributions, and that conjoint provision of low delay and high throughput is adversely
affected by scale-invariant burstiness. From a queueing theory perspective, the
principal distinguishing characteristic of long-range-dependent (LRD) traffic is that
the queue length distribution decays much more slowly (that is, polynomially)
vis-à-vis short-range-dependent (SRD) traffic sources such as Poisson sources, which
exhibit exponential decay. A number of performance studies [1, 2, 11, 29, 32, 34]
have shown that self-similarity has a detrimental effect on network performance,
leading to increased delay and packet loss rate. In Grossglauser and Bolot [18] and
Ryu and Elwalid [37], the point is advanced that for small buffer sizes or short time
scales, long-range dependence has only a marginal impact. This is, in part, due to a
saturation effect that arises when resources are overextended, whereby the burstiness
associated with short-range-dependent traffic is sufficient (and, in many cases,
dominant) to cause significant buffer overflow.
Self-Similar Network Traffic and Performance Evaluation, Edited by Kihong Park and Walter Willinger
Copyright © 2000 by John Wiley & Sons, Inc.
Print ISBN 0-471-31974-0 Electronic ISBN 0-471-20644-X
What is still in its infancy, however, is the problem of controlling self-similar
network traffic. By the control of self-similar traffic, we mean the problem of
modulating traffic flow such that network performance including throughput is
optimized. Scale-invariant burstiness introduces new complexities into the picture,
which make the task of providing quality of service (QoS) while achieving high
utilization significantly more difficult. First and foremost, scale-invariant burstiness
implies the existence of concentrated periods of high activity at a wide range of time
scales, which adversely affects congestion control. Burstiness at fine time scales is
commensurate with burstiness observed for traditional short-range-dependent traffic
models. The distinguishing feature is burstiness at coarser time scales, which
induces extended periods of either overload or underutilization and degrades overall
performance. However, on the flip side, long-range dependence, by definition,
implies the existence of nontrivial correlation structure, which may be exploitable
for congestion control purposes, information to which current algorithms are
impervious.
In this chapter, we show the feasibility of "predicting the future" under
self-similar traffic conditions with sufficient reliability that the information can be
effectively utilized for congestion control purposes. First, we show that long-range
dependence can be detected on-line to predict future traffic levels and contention at
time scales above and beyond the time scale of the feedback congestion control.
Second, we present a traffic modulation mechanism based on the multiple time scale
congestion control (MTSC) framework [46] and show that it is able to exploit
this information effectively to improve network performance, in particular, throughput.
The congestion control mechanism works by selectively applying aggressiveness
using the predicted future when it is warranted, throttling the data rate upward if the
predicted future contention level is low, and being more aggressive the lower the
predicted contention level. We show that the selective aggressiveness mechanism is
of benefit even for short-range-dependent traffic; however, it is significantly more
effective for long-range-dependent traffic, leading to comparatively large
performance gains. We also show that as the number of connections engaging in selective
aggressiveness control (SAC) increases, both fairness and efficiency are preserved.
The latter refers to the total throughput achieved across all SAC-controlled
connections.
The rest of the chapter is organized as follows. In Section 18.2, we give a brief
overview of self-similar network traffic and the specific setup employed in this
chapter. In Section 18.3, we describe the predictability mechanism and its efficacy at
extracting the correlation structure present in long-range-dependent traffic. This is
followed by Section 18.4, where we describe the SAC protocol and a refinement of
the predictability mechanism for on-line, per-connection estimation. In Section 18.5
we show performance results of SAC and show its efficacy under different
long-range dependence conditions and when the number of SAC connections is varied.
We conclude with a discussion of current results and future work.
18.2 PRELIMINARIES
18.2.1 Self-Similar Traffic: Basic Definitions
Let $(X_t)_{t \in \mathbb{Z}}$ be a time series, which, for example, represents the trace of data flow at
a bottleneck link measured at some fixed time granularity. We define the aggregated
series $X_i^{(m)}$ as

$$X_i^{(m)} = \frac{1}{m}\left(X_{im-m+1} + \cdots + X_{im}\right).$$

That is, $X_t$ is partitioned into blocks of size $m$, their values are averaged, and $i$ is used
to index these blocks.
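The block averaging above can be sketched in a few lines of Python. This is only an illustration, assuming NumPy is available; the function name `aggregate`, the sample trace, and the choice to drop an incomplete trailing block are assumptions made here, not part of the chapter.

```python
import numpy as np

def aggregate(x, m):
    """Return the aggregated series X^(m): means of consecutive blocks of size m."""
    if m < 1:
        raise ValueError("block size m must be a positive integer")
    x = np.asarray(x, dtype=float)
    n_blocks = len(x) // m  # assumption: an incomplete trailing block is dropped
    return x[: n_blocks * m].reshape(n_blocks, m).mean(axis=1)

# Hypothetical trace (e.g., bytes per time slot); the trailing 99 does not fill a block.
trace = [2, 4, 6, 8, 10, 12, 99]
print(aggregate(trace, 3))  # block means of (2, 4, 6) and (8, 10, 12)
```

With m = 1 the function returns the original series unchanged, which is a convenient sanity check.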
Let $r(k)$ and $r^{(m)}(k)$ denote the autocorrelation functions of $X_t$ and $X_i^{(m)}$,
respectively. $X_t$ is self-similar (more precisely, asymptotically second-order
self-similar) if the following conditions hold:

$$r(k) \sim \mathrm{const} \cdot k^{-\beta}, \tag{18.1}$$

$$r^{(m)}(k) \sim r(k), \tag{18.2}$$

for $k$ and $m$ large, where $0 < \beta < 1$. That is, $X_t$ is "self-similar" in the sense that the
correlation structure is preserved with respect to time aggregation (relation (18.2)),
and $r(k)$ behaves hyperbolically with $\sum_{k=0}^{\infty} r(k) = \infty$, as implied by Eq. (18.1). The
latter property is referred to as long-range dependence.
Let $H = 1 - \beta/2$. $H$ is called the Hurst parameter, and by the range of $\beta$,
$1/2 < H < 1$. It follows from Eq. (18.1) that the farther $H$ is from $1/2$, the more
long-range dependent $X_t$ is, and vice versa. Thus, the Hurst parameter acts as an
indicator of the degree of self-similarity.
A test for long-range dependence can be obtained by checking whether $H$
deviates significantly from $1/2$. We use two methods for testing this condition.
The first method, the variance-time plot, is based on the slowly decaying variance of
a self-similar time series. The second method, the R/S plot, uses the fact that for a
self-similar time series, the rescaled range or R/S statistic grows according to a
power law with exponent $H$ as a function of the number of points included. Thus, the
plot of R/S against this number on a log-log scale has a slope that is an estimate of
$H$. A comprehensive discussion of the estimation methods can be found in Beran [4]
and Taqqu et al. [39].
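The variance-time method can be illustrated with a short Python sketch. It relies on the standard property that, for a series satisfying Eq. (18.1), the variance of the aggregated series decays roughly like $m^{-\beta}$, so the slope of log variance against log $m$ estimates $-\beta$ and $H = 1 - \beta/2$. The function name, the block sizes, and the least-squares fit via NumPy are assumptions of this sketch, not the estimation procedure used for the chapter's figures.

```python
import numpy as np

def variance_time_hurst(x, block_sizes):
    """Estimate H from the slope of log Var(X^(m)) versus log m."""
    x = np.asarray(x, dtype=float)
    log_m, log_var = [], []
    for m in block_sizes:
        n_blocks = len(x) // m
        if n_blocks < 2:
            continue  # too few blocks to estimate a variance
        agg = x[: n_blocks * m].reshape(n_blocks, m).mean(axis=1)
        log_m.append(np.log10(m))
        log_var.append(np.log10(agg.var()))
    if len(log_m) < 2:
        raise ValueError("need at least two usable block sizes")
    slope, _intercept = np.polyfit(log_m, log_var, 1)  # straight-line fit
    beta = -slope
    return 1.0 - beta / 2.0

# Sanity check on independent noise, which has no long-range dependence:
# the variance of block means decays like 1/m, so the estimate should be near 1/2.
rng = np.random.default_rng(0)
iid = rng.normal(size=100_000)
h_iid = variance_time_hurst(iid, [1, 2, 4, 8, 16, 32, 64, 128, 256])
print(f"estimated H for i.i.d. noise: {h_iid:.2f}")
```

An estimate well above 1/2 on a measured trace would point toward long-range dependence, although in practice the fit should be restricted to a range of $m$ over which the plot is roughly linear.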
A random variable $X$ has a heavy-tailed distribution if

$$\Pr\{X > x\} \sim x^{-\alpha}$$

as $x \to \infty$, where $0 < \alpha < 2$. That is, the asymptotic shape of the tail of the
distribution obeys a power law. The Pareto distribution,

$$p(x) = \alpha k^{\alpha} x^{-\alpha - 1},$$

with parameters $\alpha > 0$, $k > 0$, $x \ge k$, has the distribution function

$$\Pr\{X \le x\} = 1 - (k/x)^{\alpha},$$

and hence is clearly heavy tailed.
It is not difficult to check that for $\alpha \le 2$ heavy-tailed distributions have infinite
variance, and for $\alpha \le 1$, they also have infinite mean. Thus, as $\alpha$ decreases, a large
portion of the probability mass is located in the tail of the distribution. In practical
terms, a random variable that follows a heavy-tailed distribution can take on
extremely large values with nonnegligible probability.
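To make the Pareto tail concrete, the following Python sketch draws samples by inverting the distribution function $1 - (k/x)^{\alpha}$ and compares the empirical tail probability with the model value. The function name, the parameter values, and the sample size are illustrative assumptions.

```python
import numpy as np

def sample_pareto(alpha, k, size, rng):
    """Draw Pareto(alpha, k) samples by inverting F(x) = 1 - (k/x)**alpha."""
    u = 1.0 - rng.random(size)      # uniform on (0, 1], so no division by zero
    return k / u ** (1.0 / alpha)   # X > x exactly when u < (k/x)**alpha

rng = np.random.default_rng(1)
alpha, k = 1.05, 1.0                # hypothetical shape and location parameters
samples = sample_pareto(alpha, k, 200_000, rng)

x = 10.0
empirical_tail = float(np.mean(samples > x))
model_tail = (k / x) ** alpha       # Pr{X > x} under the Pareto model
print(f"Pr(X > {x}): empirical {empirical_tail:.4f}, model {model_tail:.4f}")
print(f"largest sample: {samples.max():.1f}")
```

With $\alpha$ this close to 1, a handful of samples are typically orders of magnitude larger than the bulk, which is the practical meaning of "extremely large values with nonnegligible probability."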
18.2.2 Structural Causality
In Park et al. [33], we show that aggregate traffic self-similarity is an intrinsic
property of networked client/server systems where the size of the objects (e.g., files)
being accessed is heavy-tailed. In particular, there exists a linear relationship
between the heavy-tailedness measure of file size distributions as captured by $\alpha$
(the shape parameter of the Pareto distribution) and the Hurst parameter of the
resultant multiplexed traffic streams. That is, the aggregate network traffic that is
induced by hosts exchanging files with heavy-tailed sizes over a generic network
environment running "regular" protocol stacks (e.g., TCP, flow-controlled UDP) is
self-similar, being more bursty (in the scale-invariant sense) the more heavy-tailed
the file size distribution is. This relationship is shown in Fig. 18.1. The relationship
is robust with respect to changes in network resources (bandwidth, buffer capacity),
topology, the influence of cross-traffic, and the distribution of interarrival times. We
call this relationship between the traffic pattern observed at the network layer and the
structural property of a distributed, networked system in terms of its high-level
object sizes structural causality [33]. $H = (3 - \alpha)/2$ is the theoretical value
predicted by the on/off model [42] (a 0/1 renewal process with heavy-tailed on
or off periods), assuming independent traffic sources with no interactions due to
sharing of network resources.
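As a small worked example of the relation $H = (3 - \alpha)/2$, the sketch below tabulates the predicted Hurst parameter for a few shape parameters in the range covered by Fig. 18.1. The function name is hypothetical, and restricting $\alpha$ to the open interval (1, 2) is an assumption of this sketch, chosen so that the result stays within $1/2 < H < 1$.

```python
def hurst_from_alpha(alpha):
    """Predicted Hurst parameter H = (3 - alpha) / 2 from the on/off model."""
    if not 1.0 < alpha < 2.0:
        # Assumption of this sketch: keep alpha where 1/2 < H < 1 holds.
        raise ValueError("alpha is expected to lie strictly between 1 and 2")
    return (3.0 - alpha) / 2.0

for alpha in (1.05, 1.35, 1.65, 1.95):
    print(f"alpha = {alpha:.2f}  ->  predicted H = {hurst_from_alpha(alpha):.3f}")
```

The heavier the tail (smaller $\alpha$), the closer the predicted $H$ is to 1; as $\alpha$ approaches 2, $H$ approaches 1/2.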
Structural causality is of import to self-similar traffic control since (1) it provides
an environment where self-similar traffic conditions are easily facilitated (just
simulate a client/server network), (2) the degree of self-similar burstiness can be
intimately controlled by the application layer parameter $\alpha$, and (3) the self-similar
network traffic induced already incorporates the actions and modulating influence of
the protocol stack since the observed traffic pattern is a direct consequence of hosts
exchanging files whose transport was mediated through protocols (e.g., TCP,
flow-controlled UDP) in the protocol stack. This provides us with a natural environment
where the impact of control actions by a congestion control protocol can be
discerned and evaluated under self-similar traffic conditions.
18.3 PREDICTABILITY OF SELF-SIMILAR TRAFFIC
18.3.1 Predictability Setup
In this section, we show that the correlation structure present in long-range
dependent (LRD) traffic can be detected and used to predict the future over time
scales relevant to congestion control. Time series analysis and prediction theory have
long histories with techniques spanning a number of domains from estimation theory
to regression theory to neural network based techniques, to mention a few [3, 17, 22,
40, 44, 45, 49]. In many senses, it is an "art form" with different methods giving
variable performance depending on the context and modeling assumptions. Our goal
is not to perform optimal time series prediction but rather to choose a simple,
easy-to-implement scheme, and use it as a reference for studying congestion control
techniques and their efficacy at exploiting the correlation structure present in LRD
traffic for improving network performance. Our prediction method, which is
described next, is a time domain technique and can be viewed as an instance of
conditional expectation estimation.
Fig. 18.1 Hurst parameter estimates (R/S and variance-time) for $\alpha$ varying from 1.05 to
1.95.
Assume we are given a wide-sense stationary stochastic process $(x_t)_{t \in \mathbb{Z}}$ and two
numbers $T_1, T_2 > 0$. At time $t$, we have at our disposal

$$a = \sum_{i \in [t - T_1,\, t)} q_i,$$

where $q_i$ is a sample path of $x_t$ over the time interval $[t - T_1, t)$. For notational clarity, let

$$V_1 = \sum_{i \in [t - T_1,\, t)} x_i, \qquad V_2 = \sum_{i \in [t,\, t + T_2)} x_i.$$

$a$ may be thought of as the aggregate traffic observed over the "recent past"
$[t - T_1, t)$, and $V_1$, $V_2$ are composite random variables denoting the recent past and
near future. We are interested in computing the conditional probability

$$\Pr\{V_2 = b \mid V_1 = a\} \tag{18.3}$$

for $b$ in the range of $V_2$. For example, if $a$ represented a "high" traffic volume, then
we may be interested in knowing what the probability of encountering yet another
high traffic volume in the near future would be. Let

$$V^t_{\max} = \max_{t'} \sum_{i \in [t' - T_1,\, t')} q_i, \qquad V^t_{\min} = \min_{t'} \sum_{i \in [t' - T_1,\, t')} q_i,$$

where $t'$ ranges over the window end points spaced $T_1$ apart (indexed by $k = 0, 1, \ldots$)
up to time $t$. $V^t_{\max}$ and $V^t_{\min}$ denote the highest and lowest traffic
volume seen so far at time $t$, respectively.
To make sense of "high" and "low," we will partition the range between $V^t_{\max}$ and
$V^t_{\min}$ into $h$ levels with quantization step $\mu = (V^t_{\max} - V^t_{\min})/h$:

$$[0,\, V^t_{\min} + \mu),\ [V^t_{\min} + \mu,\, V^t_{\min} + 2\mu),\ [V^t_{\min} + 2\mu,\, V^t_{\min} + 3\mu),\ \ldots,$$
$$[V^t_{\min} + (h-2)\mu,\, V^t_{\min} + (h-1)\mu),\ [V^t_{\min} + (h-1)\mu,\, \infty).$$
We will define two new random variables $L_1$, $L_2$ where

$$
\begin{aligned}
L_k = 1 &\iff V_k \in [0,\, V^t_{\min} + \mu), \\
L_k = 2 &\iff V_k \in [V^t_{\min} + \mu,\, V^t_{\min} + 2\mu), \\
&\ \ \vdots \\
L_k = h - 1 &\iff V_k \in [V^t_{\min} + (h-2)\mu,\, V^t_{\min} + (h-1)\mu), \\
L_k = h &\iff V_k \in [V^t_{\min} + (h-1)\mu,\, \infty).
\end{aligned}
$$

In other words, $L_k$ is a function of $V_k$, $L_k = L_k(V_k)$, and it performs a certain
quantization. Thus if $L_k \approx 1$ then the traffic level is "low" relative to the mean, and
if $L_k \approx h$, then it is "high."
In our case, eight levels ($h = 8$) were found to be sufficiently granular for
prediction purposes. In practice, $V^t_{\max}$ and $V^t_{\min}$ are determined by applying a 3%
threshold to the previously observed traffic volumes; that is, the outliers corresponding
to extraordinarily large or small data points are dropped to make the classification
reasonable.
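The quantization into $h$ levels can be sketched as follows. The level function follows the interval definitions given above; the outlier handling is only one possible reading of the "3% threshold" (here, taking the 3rd and 97th percentiles of past volumes as the range), which is an assumption of this sketch rather than the chapter's stated procedure. All function and variable names are hypothetical.

```python
import numpy as np

def trimmed_range(volumes, trim_percent=3.0):
    """One possible reading of the 3% threshold (an assumption of this sketch):
    use the 3rd and 97th percentiles of past volumes as V_min and V_max."""
    lo, hi = np.percentile(np.asarray(volumes, dtype=float),
                           [trim_percent, 100.0 - trim_percent])
    return float(lo), float(hi)

def quantize_level(v, v_min, v_max, h=8):
    """Map an aggregate volume v to a level in 1..h.

    Level j covers [v_min + (j-1)*mu, v_min + j*mu); level 1 also absorbs
    everything below v_min + mu, and level h everything from v_min + (h-1)*mu up.
    """
    if v_max <= v_min:
        raise ValueError("v_max must exceed v_min")
    mu = (v_max - v_min) / h                      # quantization step
    level = int(np.floor((v - v_min) / mu)) + 1
    return min(max(level, 1), h)                  # clamp into 1..h

# Hypothetical history of past aggregate volumes.
history = list(range(101))
v_min, v_max = trimmed_range(history)
print(v_min, v_max)
print([quantize_level(v, 10.0, 90.0) for v in (0, 10, 20, 79.9, 80, 1000)])
```

Clamping the level into 1..h reproduces the open-ended first and last intervals, so volumes outside the trimmed range still receive a valid level.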
Returning to Eq. (18.3) and prediction, for certain values of $T_1$, $T_2$, we are
interested in knowing the conditional probability densities

$$\Pr\{L_2 \mid L_1 = l\}$$

for $l \in [1, 8]$. If $\Pr\{L_2 \mid L_1 = 8\}$ were concentrated toward $L_2 = 8$, and $\Pr\{L_2 \mid L_1 = 1\}$
were concentrated toward $L_2 = 1$, then this information could be potentially
exploited for congestion control purposes.
18.3.2 Estimation of Conditional Probability Density
To explore and quantify the potential predictability of self-similar network traffic, we
use TCP traffic traces used in Park et al. [33] whose Hurst parameter estimates are
shown in Fig. 18.1 as the main reference point. First, we use off-line estimation of
aggregate throughput traffic, which is then refined to on-line estimation of aggregate
traffic using per-connection traffic when performing predictive congestion control.
Other traces including those collected from flow-controlled UDP runs yield similar
results. The traces used are each 10,000 seconds long at 10 ms granularity. They
represent the aggregate traffic of 32 concurrent TCP Reno connections recorded at a
bottleneck router.
We observe that the aggregate throughput series exhibit correlation structure at
several time scales from 250 ms to 20 s and higher. To estimate $\Pr\{L_2 \mid L_1 = l\}$ from
the aggregate throughput series $X_t$, we segment $X_t$ into

$$N = \frac{10{,}000\ \text{seconds}}{(T_1 + T_2)\ \text{seconds}}$$

contiguous nonoverlapping blocks of length $T_1 + T_2$ (except possibly for the last
block), and for each block $j \in [1, N]$ compute the aggregate traffic $V_1$, $V_2$ over the
subintervals of length $T_1$, $T_2$.
For $l, l' \in [1, 8]$, let $h_l \in [0, N]$ denote the total number of blocks such that
$L_1(V_1) = l$ and let $h_{l'} \in [0, h_l]$ denote the size of the subset of those blocks such that
$L_2(V_2) = l'$. Then

$$\Pr\{L_2 = l' \mid L_1 = l\} = \frac{h_{l'}}{h_l}.$$

Figure 18.2 shows the estimated conditional probability densities for $\alpha = 1.05$, 1.95
traffic for time scales 500 ms, 1 s, and 5 s. In the following, $T_1 = T_2$.
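The counting estimate above can be sketched in Python as follows. This is an off-line illustration under several stated assumptions: window lengths `t1` and `t2` are given in samples rather than seconds, a short final block is dropped, and the level range is taken from the minimum and maximum of the observed recent-past volumes without outlier trimming. The function name and the synthetic input are hypothetical.

```python
import numpy as np

def conditional_level_matrix(x, t1, t2, h=8):
    """Estimate Pr{L2 = l' | L1 = l} as an h-by-h matrix (row l-1, column l'-1)."""
    x = np.asarray(x, dtype=float)
    block = t1 + t2
    n_blocks = len(x) // block                 # assumption: drop a short final block
    if n_blocks == 0:
        raise ValueError("series is shorter than one block of length t1 + t2")
    blocks = x[: n_blocks * block].reshape(n_blocks, block)
    v1 = blocks[:, :t1].sum(axis=1)            # recent-past volume per block
    v2 = blocks[:, t1:].sum(axis=1)            # near-future volume per block

    # Assumption: level range from the observed V1 values, no outlier trimming.
    v_min, v_max = v1.min(), v1.max()
    if v_max <= v_min:
        raise ValueError("recent-past volumes are constant; cannot form levels")
    mu = (v_max - v_min) / h

    def level(v):
        return np.clip(np.floor((v - v_min) / mu).astype(int) + 1, 1, h)

    l1, l2 = level(v1), level(v2)
    counts = np.zeros((h, h))
    np.add.at(counts, (l1 - 1, l2 - 1), 1)     # count blocks per (l, l') pair
    totals = counts.sum(axis=1, keepdims=True) # h_l: blocks with L1 = l
    # Rows with no observations are left as all zeros.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

# Synthetic check: traffic alternates between long quiet and long busy stretches,
# so a low (high) recent past is always followed by a low (high) near future.
x = np.tile([1, 1, 1, 1, 9, 9, 9, 9], 50)
probs = conditional_level_matrix(x, t1=2, t2=2)
print(probs[0, 0], probs[7, 7])
```

Each row of the returned matrix with at least one observation sums to 1; a diagonal skew of the kind described for the long-range-dependent traces would show up as mass concentrated near the main diagonal.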
Fig. 18.2 Top row: Probability densities with $L_2$ conditioned on $L_1$ for $\alpha = 1.05$. Bottom row: Probability
densities with $L_2$ conditioned on $L_1$ for $\alpha = 1.95$.
For the aggregate throughput traces with $\alpha = 1.05$ (Fig. 18.2, top row), the
three-dimensional (3D) conditional probability densities can be seen to be skewed
diagonally from the lower left side toward the upper right side. This indicates that if
the current traffic level $L_1$ is low, say, $L_1 = 1$, chances are that $L_2$ will be low as well.
That is, the probability mass of $\Pr\{L_2 \mid L_1 = 1\}$ is concentrated toward 1. Conversely,
the plots show that $\Pr\{L_2 \mid L_1 = 8\}$ is concentrated toward 8. This is more clearly seen
in Fig. 18.3(a), which shows two cross sections, that is, 2D projections, reflecting
$\Pr\{L_2 \mid L_1 = 1\}$ and $\Pr\{L_2 \mid L_1 = 8\}$.
For the aggregate throughput traces with $\alpha = 1.95$ (Fig. 18.2, bottom row), on
the other hand, the shape of the distribution does not change as the conditioning
variable $L_1$ is varied. This is more clearly seen in the projections of $\Pr\{L_2 \mid L_1 = 1\}$
and $\Pr\{L_2 \mid L_1 = 8\}$ shown in Fig. 18.3(b). This indicates that for $\alpha = 1.95$ traffic,
observing the past (over the time scales considered) does not help much in predicting
the future beyond the information conveyed by the fixed a priori distribution. Given
the definition of $L_k$, the Gaussian shape of the marginal densities is consistent with
short-range correlations, making the central limit theorem approximately applicable
over larger time scales. In both cases ($\alpha = 1.05$, 1.95), the shape of the distribution
stays relatively constant across a wide range of time scales, from 500 ms to 20 s. For
$\alpha = 1.35$, 1.65 the predictability structure lies "in between" (not shown here).
18.3.3 Predictability and Time Scale
An important issue is how time scale affects predictability when traffic is long-range
dependent. Going back to Fig. 18.2 (top row), one subtle effect that is not easily
discernible is that as time scale is increased the conditional probability densities
$\Pr\{L_2 \mid L_1 = l\}$ become more concentrated. Given that $\Pr\{L_2 \mid L_1 = l\}$ is a function of
$T_1$, $T_2$, we would like to determine at what time scale predictability is maximized.
Fig. 18.3 (a) Shifting effect of conditional probability densities $P(L_2 \mid L_1 = 1)$ and
$P(L_2 \mid L_1 = 8)$ for $\alpha = 1.05$. (b) For $\alpha = 1.95$, the corresponding probabilities remain invariant.

One way to measure the "information content" (that is, in the sense of
randomness or unstructuredness) in a probability distribution is to compute its
entropy. For a discrete probability density $p_i$, its entropy $S(p_i)$ is defined as
$S(p_i) = \sum_i p_i \log(1/p_i)$. In the case of our conditional density $\Pr\{L_2 \mid L_1 = l\}$,

$$S_l = -\sum_{l'=1}^{8} \Pr\{L_2 = l' \mid L_1 = l\} \log \Pr\{L_2 = l' \mid L_1 = l\}.$$

Thus, entropy is maximal when the distribution is uniform and it is minimal if the
distribution is concentrated at a single point. Since we are given a set of eight
conditional probability densities, one for each $L_1 = 1, 2, \ldots, 8$, we define the
average entropy $\bar{S}$ as

$$\bar{S} = \sum_{l=1}^{8} S_l / 8.$$
The average entropy remains a function of $T_1$, $T_2$; that is, $\bar{S} = \bar{S}(T_1, T_2)$.
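The entropy measure can be sketched in Python as below. The logarithm base is not stated in the text; base 2 (entropy in bits) is an assumption of this sketch, as are the function names and the two test matrices. Terms with zero probability are taken to contribute zero.

```python
import numpy as np

def entropy(p):
    """Entropy sum_i p_i * log2(1 / p_i); base 2 is an assumption of this sketch."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]                       # terms with p_i = 0 contribute nothing
    return float(-(nz * np.log2(nz)).sum())

def average_entropy(cond):
    """Average of the per-row entropies S_l of a conditional probability matrix.

    Row l-1 of `cond` is assumed to hold Pr{L2 = l' | L1 = l} for l' = 1..h.
    An all-zero row (a level never observed) contributes zero entropy.
    """
    return float(np.mean([entropy(row) for row in cond]))

uniform = np.full((8, 8), 1.0 / 8.0)    # no predictability: every row is uniform
diagonal = np.eye(8)                    # perfect predictability: point masses
print(average_entropy(uniform), average_entropy(diagonal))
```

The uniform matrix gives the maximum of 3 bits for eight levels, and the diagonal matrix gives zero; conditional densities estimated from traces at different values of $T_1$ fall between these two extremes.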
Fig. 18.4 Average entropy $\bar{S}(T_1)$ plot for $\alpha = 1.05$ traffic as a function of time scale $T_1$
(horizontal axis: aggregation level in seconds).

Figure 18.4 plots $\bar{S}(T_1, T_2) = \bar{S}(T_1)$ (recall that $T_1 = T_2$) for the $\alpha = 1.05$
throughput series as a function of time scale or aggregation level $T_1$. Entropy is
highest for small time scales in the range of about 250 ms, and it drops monotonically as $T_1$
is increased. Eventually, $\bar{S}(T_1)$ begins to flatten out near the 3 to 5 second mark,
reaching saturation, and stays so as time scale is further increased. From our analysis
of various long-range-dependent traffic traces, we find that the "knee" of the entropy
curve is in the range of 1 to 5 seconds. Note that increasing $T_1$ further and further to
gain small decreases in entropy brings with it an important problem, namely, if
prediction is done over a "too long" time interval, then the information may not be
effectively exploitable by various congestion control strategies. In the next section,