17 Self-similar Traffic
play it again, Sam

SELF-SIMILARITY AND LONG-RANGE-DEPENDENT TRAFFIC
The queueing models and solutions we have presented, developed and applied in this book are very useful and have wide applicability. However, one of the most significant recent findings for the design and performance evaluation of networks has been the discovery of self-similarity and long-range dependence (LRD) in a variety of traffic types [17.1]. Why is it significant? Well, the essence of self-similarity is that a time-varying process behaves in a similar way over all time scales. The observations made on a variety of traffic types in different network technologies show bursty behaviour over a wide range of time scales. And, as we have seen in previous chapters, bursty behaviour has a much greater impact on finite resources.

Let's take a memoryless process first, and see how that scales with time. Figure 17.1 shows the results of simulating traffic for 10 000 seconds. The first 100 seconds of the arrival process are shown as a thin grey line, and here we see typical variable behaviour around a mean value of about 25 arrivals per second. The thick black line shows the process scaled by 100, i.e. the number of arrivals is averaged over every 100 seconds, and so the 100 scaled time units cover the full 10 000 seconds of the simulation. This averaging clearly shows a reduction in the variability of the process when viewed on the longer time scale; the mean value of 25 arrivals per second is evident.

Figure 17.2 takes a self-similar process and plots it in the same way. In this case we can see the high variability of the process even after scaling. However, it is not self-similarity which is the underlying phenomenon, but rather the presence of many basic communications processes which have heavy-tailed sojourn-time distributions. In these distributions, the tail probabilities decay as a power law, rather than exponentially.
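The memoryless case of Figure 17.1 can be reproduced in outline with a short simulation. This is a sketch under stated assumptions, not the simulation used for the figure (whose details are not given): it assumes Poisson arrivals at 25 per second, 10 000 one-second intervals, and averaging over non-overlapping blocks of 100 seconds. The helper names poisson and block_means are hypothetical.

```python
import math
import random
import statistics

def poisson(rng, lam):
    """Poisson(lam) variate by Knuth's multiplication method (adequate for lam = 25)."""
    limit = math.exp(-lam)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k

def block_means(series, m):
    """Average non-overlapping blocks of m consecutive values (the 'x m' time scale)."""
    return [sum(series[i:i + m]) / m for i in range(0, len(series) - m + 1, m)]

rng = random.Random(1)                                     # fixed seed for repeatability
arrivals = [poisson(rng, 25.0) for _ in range(10_000)]     # arrivals in each second
scaled = block_means(arrivals, 100)                        # 100 scaled time units

for label, series in (("x1", arrivals), ("x100", scaled)):
    print(f"{label:>4}: mean {statistics.fmean(series):6.2f}, "
          f"variance {statistics.pvariance(series):7.3f}")
```

For independent arrivals, averaging over 100 seconds should cut the variance by a factor of roughly 100 (from about 25 to about 0.25) while leaving the mean unchanged; that is the smoothing visible in Figure 17.1. A self-similar process keeps much more of its variance under the same aggregation.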
Introduction to IP and ATM Design Performance: With Applications Analysis Software, Second Edition. J M Pitts, J A Schormans Copyright © 2000 John Wiley & Sons Ltd ISBNs: 0-471-49187-X (Hardback); 0-470-84166-4 (Electronic)

[Figure 17.1. Scaling Behaviour of a Memoryless Process: normalized arrival rate against time (scaled units), at scales ×1 and ×100]

[Figure 17.2. Scaling Behaviour of a Self-Similar Process: normalized arrival rate against time (scaled units), at scales ×1 and ×100]

Another way of expressing this is that the process has long-term (slowly decaying) correlations. So, an individual communications process with heavy-tailed sojourn times exhibits long-range dependence, and the aggregation of LRD sources produces a traffic stream with self-similar characteristics.

So, how do we model and analyse the impact of this traffic? There have been claims that 'traditional' approaches to teletraffic modelling no longer apply. Much research effort has been, and is being, spent on developing new teletraffic models, such as Fractional Brownian Motion (FBM) processes (e.g. [17.2]) and non-linear chaotic maps (e.g. [17.3]). However, because of their mathematical complexity, assessing their impact on network resources is not a simple task, although good progress is being made. In this chapter we take a different approach: with a little effort we can usefully re-apply what we already know about traffic engineering, and generate results for these new scenarios quickly. Indeed, this is in line with our approach throughout this book.

THE PARETO MODEL OF ACTIVITY
A distribution is heavy-tailed if

$$\Pr\{X > x\} = 1 - F(x) \sim \frac{1}{x^{\alpha}} \quad \text{as } x \to \infty$$

noting that $\alpha > 0$ (usually $\alpha$ takes on values in the range 1 to 2).
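One practical reading of this definition is that, on log-log axes, the slope of $\Pr\{X > x\}$ settles to a constant $-\alpha$. The sketch below measures that slope between x and 2x for a pure power-law tail and for an exponential tail. The helper names, the choice $\alpha = 1.5$ and the exponential mean of 2 are illustrative assumptions, not values from the text; log-probabilities are used so that the exponential tail does not underflow.

```python
import math

def loglog_slope(log_survivor, x):
    """Slope of log Pr{X > x} against log x, measured between x and 2x."""
    return (log_survivor(2 * x) - log_survivor(x)) / math.log(2)

def power_law_log_tail(x, alpha=1.5):
    """log Pr{X > x} for a pure power-law tail Pr{X > x} = x**(-alpha)."""
    return -alpha * math.log(x)

def exponential_log_tail(x, mean=2.0):
    """log Pr{X > x} for an exponential tail Pr{X > x} = exp(-x/mean)."""
    return -x / mean

for x in (10, 100, 1000):
    print(f"x = {x:4d}: power law {loglog_slope(power_law_log_tail, x):7.3f}, "
          f"exponential {loglog_slope(exponential_log_tail, x):9.1f}")
```

For the power-law tail the slope stays at $-\alpha$ at every scale. For the exponential it equals $-x/(2 \ln 2)$ and so becomes steeper without limit, which is why an exponential is not heavy-tailed.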
The Pareto distribution is one of the class of distributions that are 'heavy-tailed', and is defined as

$$\Pr\{X > x\} = \left(\frac{\delta}{x}\right)^{\alpha}$$

where $\delta$ is the parameter which specifies the minimum value that the distribution can take, i.e. $x \ge \delta$. For example, if $\delta = 25$, then $\Pr\{X > 25\} = 1$, i.e. X cannot be less than or equal to 25. For our purposes it is often convenient to set $\delta = 1$. The cumulative distribution function is

$$F(x) = 1 - \left(\frac{\delta}{x}\right)^{\alpha}$$

and the probability density function is given by

$$f(x) = \frac{\alpha}{\delta} \cdot \left(\frac{\delta}{x}\right)^{\alpha + 1}$$

The mean value of the Pareto distribution is

$$E[x] = \delta \cdot \frac{\alpha}{\alpha - 1}$$

Note that for this formula to be correct, $\alpha > 1$ is essential; otherwise the Pareto has an infinite mean.

Let's put some numbers in to get an idea of the effect of moving to heavy-tailed distributions. Assume that we have a queue with a time-slotted arrival process of packets or cells. The load is 0.5, and batches arrive as a Bernoulli process, such that

$$\Pr\{\text{there is a batch in a time slot}\} = 0.25$$

thus the mean number of arrivals in any batch is 0.5/0.25 = 2. We calculate the probability of having more than x arrivals in any time slot in two cases: for an exponentially distributed batch size, and for a Pareto-distributed batch size. In the former case, we have

$$\Pr\{\text{batch size} > x\} = e^{-x/2}$$

so

$$\Pr\{> 10 \text{ arrivals in any time slot}\} = \Pr\{\text{batch size} > 10\} \times \Pr\{\text{there is a batch in a time slot}\} = e^{-10/2} \times 0.25 = 0.001\,684$$

In the latter case, we have (with $\delta = 1$)

$$E[x] = 1 \cdot \frac{\alpha}{\alpha - 1} = 2 \quad \text{so} \quad \alpha = \frac{E[x]}{E[x] - 1} = 2$$

hence

$$\Pr\{\text{batch size} > x\} = \left(\frac{1}{x}\right)^{2}$$

giving

$$\Pr\{> 10 \text{ arrivals in any time slot}\} = \left(\frac{1}{10}\right)^{2} \times 0.25 = 0.0025$$

Thus for a batch size of greater than 10 arrivals there is not that much difference between the two distributions: the probability is of the same order of magnitude.
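The arithmetic of this example is easy to check in a few lines. The sketch assumes $\delta = 1$, as in the text; the function and constant names are illustrative only. It inverts $E[x] = \delta\alpha/(\alpha - 1)$ to get $\alpha = E[x]/(E[x] - \delta)$, then multiplies each tail probability by the probability that a batch arrives.

```python
import math

P_BATCH = 0.25     # Pr{there is a batch in a time slot}
MEAN_BATCH = 2.0   # mean batch size, so the load is 0.25 * 2 = 0.5

def pareto_alpha_for_mean(mean, delta=1.0):
    """Invert E[X] = delta * alpha / (alpha - 1); requires mean > delta."""
    return mean / (mean - delta)

def exp_tail(x, mean=MEAN_BATCH):
    """Pr{batch size > x} for an exponentially distributed batch size."""
    return math.exp(-x / mean)

def pareto_tail(x, mean=MEAN_BATCH, delta=1.0):
    """Pr{batch size > x} for a Pareto batch size with the same mean (x >= delta)."""
    alpha = pareto_alpha_for_mean(mean, delta)
    return (delta / x) ** alpha

for x in (10, 100):
    print(f"Pr{{> {x} arrivals}}: exponential {exp_tail(x) * P_BATCH:.4g}, "
          f"Pareto {pareto_tail(x) * P_BATCH:.4g}")
```

Mathematically the two cases give 0.001684 and 0.0025 at x = 10, and $4.822 \times 10^{-23}$ and $2.5 \times 10^{-5}$ at x = 100, in line with the hand calculations.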
However, if we try again for more than 100 arrivals we obtain

$$\Pr\{> 100 \text{ arrivals in any time slot}\} = e^{-100/2} \times 0.25 = 4.822 \times 10^{-23}$$

in the exponential case, and

$$\Pr\{> 100 \text{ arrivals in any time slot}\} = \left(\frac{1}{100}\right)^{2} \times 0.25 = 2.5 \times 10^{-5}$$

in the Pareto case. This is a significant difference, and clearly illustrates the problems associated with highly variable traffic, i.e. non-negligible probabilities for large batch sizes, or long sojourn times.

[Figure 17.3. Comparison of Exponential and Pareto Distributions, and the Mathcad Code to Generate x, y Values for Plotting the Graph: Pr{X>x} (logarithmic, 10^-8 to 10^0) against batch size x (linear, 0 to 200) for Pareto, E(x) = 10; Pareto, E(x) = 2; Exponential, E(x) = 10; Exponential, E(x) = 2. The Mathcad code is:
    exponential(λ, t) := e^(−λ·t)
    Pareto(k, α, x) := (k/x)^α
    i := 1 .. 1000
    x_i := i
    y1_i := exponential(0.5, x_i)
    y2_i := exponential(0.1, x_i)
    y3_i := Pareto(1, 2/(2 − 1), x_i)
    y4_i := Pareto(1, 10/(10 − 1), x_i)]

[Figure 17.4. Comparison of Exponential and Pareto Distributions, with Logarithmic Scale for x: Pr{X>x} (10^-8 to 10^0) against batch size x (10^0 to 10^3) for the same four distributions]

Figure 17.3 compares the exponential and Pareto distributions for two different mean batch sizes, plotting x on a linear scale. For the exponential distribution (which we have used extensively for sojourn times in state-based models) the logarithm of the probability falls away linearly with increasing x. But for the Pareto, the distribution 'bends back', so that much longer values have much more significant probabilities than they would under an exponential distribution. In fact we can see, in Figure 17.4, that when both axes have a logarithmic scale, there is a straight-line relationship for the Pareto. We can see from these figures that the Pareto distribution has an increasing, not a constant, decay rate.
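For readers without Mathcad, the listing in Figure 17.3 translates almost line for line into Python. The sketch also adds a hypothetical decay_rate helper that measures the decay rate as the ratio of successive tail probabilities, $\Pr\{X > x+1\}/\Pr\{X > x\}$; treating "decay rate" this way is our assumption, chosen to match how the term is used for state probabilities.

```python
import math

# Direct translation of the Mathcad definitions in Figure 17.3.
def exponential(lam, t):
    """Pr{X > t} for an exponential distribution with rate lam (mean 1/lam)."""
    return math.exp(-lam * t)

def pareto(k, alpha, x):
    """Pr{X > x} for a Pareto distribution with minimum value k (x >= k)."""
    return (k / x) ** alpha

xs = list(range(1, 1001))                        # i := 1..1000, x_i := i
y1 = [exponential(0.5, x) for x in xs]           # exponential, E(x) = 2
y2 = [exponential(0.1, x) for x in xs]           # exponential, E(x) = 10
y3 = [pareto(1, 2 / (2 - 1), x) for x in xs]     # Pareto, E(x) = 2 (alpha = 2)
y4 = [pareto(1, 10 / (10 - 1), x) for x in xs]   # Pareto, E(x) = 10 (alpha = 10/9)

def decay_rate(tail, x):
    """Hypothetical helper: ratio of successive tail probabilities."""
    return tail(x + 1) / tail(x)

for x in (10, 100, 1000):
    r_exp = decay_rate(lambda t: exponential(0.5, t), x)
    r_par = decay_rate(lambda t: pareto(1, 2.0, t), x)
    print(f"x = {x:4d}: exponential {r_exp:.4f}, Pareto {r_par:.4f}")
```

For the exponential with λ = 0.5 the ratio is $e^{-0.5} \approx 0.607$ at every x, whereas for the Pareto with α = 2 it is $(x/(x+1))^2$, which climbs towards 1 (about 0.83 at x = 10 and 0.98 at x = 100).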
This is very important for our analysis; for example, as the ON period continues, the probability of the ON period coming to an end diminishes. This is completely different from the exponential model, and the effect on buffer content is predictably dramatic.

IMPACT OF LRD TRAFFIC ON QUEUEING BEHAVIOUR
In previous queueing analysis we have been able to use memoryless distributions, such as the exponential or geometric, in the traffic models, resulting in constant decay rates for the queueing behaviour. The effect of using a Pareto distribution is that, as the buffer fill becomes very large, the decay rate of the buffer-state probabilities tends to 1. This has an important practical outcome: above a certain level, there is no practical value in adding more buffer space to that already available. This is clearly both important and very different from the queueing systems we have already studied. The queue with Pareto-distributed input is then one of those examples (referred to previously in Chapter 14) which are not covered by the rule of asymptotically constant decay rates – except that it will always eventually be the case that the decay rate tends to 1!

The Geo/Pareto/1 queue
In order to explore the effects of introducing heavy-tailed distributions into the analysis, we can re-use the queueing analysis developed in Chapter 7. Let's assume a queue model in which batches of packets arrive at random, i.e. as a Bernoulli process, and the number of packets in a batch is Pareto-distributed. The Bernoulli process has a basic time unit (e.g. the time to serve an average-length packet), and a probability, q, that a batch arrives during the time unit. This is illustrated in Figure 17.5.

In order to use the queueing analysis from Chapter 7, we need to calculate the batch arrivals distribution. The probability that there are k arrivals in any time unit is denoted a(k). Thus we write

$$a(0) = 1 - q$$
$$a(1) = q \cdot b(1)$$
$$a(2) = q \cdot b(2)$$
$$\vdots$$
$$a(k) = q \cdot b(k)$$

where b(k) is the probability that an arriving batch has k packets. Note that this is a discrete distribution, whereas the Pareto, as defined earlier, is a continuous distribution. We use the cumulative form

$$F(x) = 1 - \left(\frac{1}{x}\right)^{\alpha}$$

to compute a discrete version of the Pareto distribution. In order to calculate b(k), we use the interval [k − 0.5, k + 0.5] on the continuous distribution, i.e.

$$b(k) = F(k + 0.5) - F(k - 0.5) = \left(\frac{1}{k - 0.5}\right)^{\alpha} - \left(\frac{1}{k + 0.5}\right)^{\alpha}$$

Note that F(1) = 0, i.e. the probability that an arriving batch is less than or (exactly) equal to 1 packet is zero. Remember this is for a continuous distribution; so, for the discrete case of a batch size of one packet, we have

$$b(1) = F(1.5) - F(1) = 1 - \left(\frac{1}{1.5}\right)^{\alpha}$$

[Figure 17.5. Model of Arriving Batches of Packets: a geometrically distributed period of time between arriving batches, a Pareto-distributed number of packets in each arriving batch, and the packet departure process, along the time axis]

[Figure 17.6. Discrete Version of Batch Pareto Input Distributions: probability (10^-8 to 10^0) against batch size (0 to 200) for α = 1.9 and α = 1.1. The Mathcad code is:
    BatchPareto(q, k, α) := 1 − q                                 if k = 0
                            (1 − (1/1.5)^α)·q                     if k = 1
                            q·((1/(k − 0.5))^α − (1/(k + 0.5))^α)  otherwise
    maxX := 1000
    k := 0 .. maxX
    l := 0 .. 1
    α := (1.9, 1.1)
    B_l := α_l/(α_l − 1)
    λ := 0.25
    q_l := λ/B_l
    x_k := k
    y1_k := BatchPareto(q_0, k, α_0)
    y2_k := BatchPareto(q_1, k, α_1)]

[Figure 17.7. State Probability Distributions with Pareto Distributed Batch Input: state probability (10^-6 to 10^0) against queue size (10^0 to 10^3) for α = 1.9 and α = 1.1. The Mathcad code repeats the definitions of maxX, k, l, α, B_l, λ and q_l from Figure 17.6, and then computes:
    aP1_k := BatchPareto(q_0, k, α_0)
    aP2_k := BatchPareto(q_1, k, α_1)
    x_k := k
    y1 := infiniteQ(maxX, aP1, λ)
    y2 := infiniteQ(maxX, aP2, λ)]

So, b(k) is the conditional probability distribution for the number of packets arriving in a time unit, i.e.
given that there is a batch; and a(k) is the unconditional probability distribution for the number of packets arriving in a time unit, i.e. whether there is an arriving batch or not. Intuitively, we can see that the probability that there are no arrivals at all will probably be the biggest single value in the distribution: most of the time there will be zero arrivals, but when packets do arrive, watch out, because there are likely to be a lot of them!

Figure 17.6 shows some example distributions for batch Pareto input, with α = 1.1 and 1.9. The figure is plotted on a linear axis for the batch size, so that we can see the probability of no arrivals. Note that the mean batch sizes, B = α/(α − 1), are 11 and 2.111 packets respectively. The mean number of packets per time unit, λ, is set to 0.25; thus the probability of there being a batch is

$$q = \frac{\lambda}{B} = \frac{0.25}{B}$$

giving q = 0.023 and 0.118 respectively.

[Figure 17.8. Comparison of Power-Law Decays for Arrival (Thin) and Queue-State (Thick) Probability Distributions: probability (10^-8 to 10^0) against size (10^0 to 10^3) for α = 1.9 and α = 1.1]

[...] Effect of Truncated Power-Law Decays for Arrival (Thin) and Queue-State (Thick) Probability Distributions

Now that we have prepared the arrival distribution, we can put this directly into the queueing analysis from Chapter 7. Figure 17.7 shows the resulting queue-state probabilities for both α = 1.1 and 1.9. Note that the queue-state probabilities have power-law decay similar to, [...]

[...] set X = 500. Because of the truncation, the load is reduced to 0.115 and 0.242 respectively. The figure shows both the truncated arrival distributions and the resulting queue-state distributions. For the latter, it is clear that the power-law decay begins to change, even before the truncation limit, towards an exponential decay. So, we can see that it is important to know the actual limit of the ON period [...]
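The route from Figure 17.6 to Figure 17.7 can be sketched in Python. batch_pareto below is a direct translation of the Mathcad BatchPareto function. The Chapter 7 routine (infiniteQ) is not reproduced in this section, so queue_state_probs is an assumed stand-in: a standard forward recursion for a slotted queue that serves one packet per time unit, $Q' = \max(Q - 1, 0) + A$, starting from $s(0) = 1 - E[A]$ and solving each balance equation $s(k) = s(0)a(k) + \sum_{i=1}^{k+1} s(i)\,a(k-i+1)$ for $s(k+1)$. All names here are hypothetical.

```python
LAM = 0.25      # mean number of packets per time unit (lambda in the text)
MAX_X = 1000    # largest batch size represented, as maxX in the Mathcad listing

def batch_pareto(q, k, alpha):
    """a(k): probability of k packet arrivals in a time unit, for Bernoulli
    batches (probability q) with a discretized Pareto (delta = 1) batch size."""
    if k == 0:
        return 1.0 - q
    if k == 1:
        return q * (1.0 - (1.0 / 1.5) ** alpha)
    return q * ((1.0 / (k - 0.5)) ** alpha - (1.0 / (k + 0.5)) ** alpha)

def queue_state_probs(a):
    """State probabilities s(0..len(a)-1) for Q' = max(Q - 1, 0) + A.
    Assumed stand-in for the Chapter 7 analysis, not the book's own code."""
    mean_arrivals = sum(k * ak for k, ak in enumerate(a))
    s = [1.0 - mean_arrivals]
    for k in range(len(a) - 1):
        # all terms of the balance equation for state k except s(k+1)a(0)
        known = s[0] * a[k] + sum(s[i] * a[k - i + 1] for i in range(1, k + 1))
        s.append((s[k] - known) / a[0])
    return s

results = {}
for alpha in (1.9, 1.1):
    q = LAM / (alpha / (alpha - 1.0))   # q = lambda / B, with B = alpha/(alpha - 1)
    a = [batch_pareto(q, k, alpha) for k in range(MAX_X + 1)]
    s = queue_state_probs(a)
    results[alpha] = (q, a, s)
    print(f"alpha = {alpha}: q = {q:.3f}, a(0) = {a[0]:.3f}, s(0) = {s[0]:.3f}")
```

Plotting s against queue size on log-log axes should give curves of the kind shown in Figure 17.7. The sketch takes E[A] from the array actually fed in, rather than assuming it equals λ = 0.25. This is a deliberate choice: discretizing the Pareto and cutting it off at MAX_X both shift the mean slightly, and the boundary condition s(0) = 1 − E[A] must use the mean of the distribution being analysed.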
[...] the queue-state probabilities have power-law decay similar to, but not the same as, the arrival distributions. This is illustrated in Figure 17.8, which shows the arrival probabilities as thin lines and the queue-state probabilities as thick lines. From these results it appears that the advantage of having a large buffer is somewhat diminished by having to cope with LRD traffic: no buffer would seem to be large [...]