The relevant proofs for Theorems 4.1 and 4.2 are shown in the article by Tang et al.
[2005].
THEOREM 4.1: The average number of rungs in a 1-epoch LadderQ is bounded by a constant provided the bucketwidth δ is of O(1/N).
The next theorem, Theorem 4.2, provides an alternative proof that the average number of rungs in the 1-epoch LadderQ is bounded by a constant. While Theorem 4.1 already provides an explicit bound (see Equation (4.9)) on the average number of rungs, it requires a constraint on the 1-epoch LadderQ’s δ1 parameter, namely δ1 ~ O(1/N). Theorem 4.2, on the other hand, does not require any constraint on the δ1 parameter, but it does not yield an explicit bound value as Theorem 4.1 does.
THEOREM 4.2: The average number of rungs in a 1-epoch LadderQ is bounded by a constant regardless of the Rung[1] bucketwidth δ1.
COROLLARY 4.2: The 1-epoch LadderQ has O(1) average time complexity.
PROOF: The proof is obtained by combining Proposition 4.1 with either Theorem 4.1 or Theorem 4.2. □
COROLLARY 4.3: The 1-epoch LadderQ has O(N) total memory usage.
PROOF: As shown in Equation (4.1), Ladder’s first rung requires N+1 (≈ N) buckets on a transfer of events from Top, and thus the first rung has O(N) memory usage. Each subsequent child rung requires O(THRES) memory space. Since the average number of rungs is bounded (see Theorem 4.1 or Theorem 4.2), the 1-epoch LadderQ’s memory consumption is therefore bounded by O(N). □
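The memory tally in this proof can be sketched in a few lines of Python. The helper below and its `thres` default are our own illustrative assumptions, not the thesis code:

```python
# Illustrative memory tally under the assumptions of Corollary 4.3:
# Rung[1] uses N+1 buckets; each of the remaining rungs uses THRES buckets.
# thres=50 is an arbitrary illustrative value, not a value from the thesis.
def ladderq_buckets(n_events, n_rungs, thres=50):
    """Total buckets for a 1-epoch LadderQ with n_rungs rungs (hypothetical)."""
    return (n_events + 1) + (n_rungs - 1) * thres
```

With the rung count bounded by a constant (Theorems 4.1/4.2), the total grows linearly in N: doubling N roughly doubles the bucket count, while the child-rung term stays fixed.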
COROLLARY 4.4: The average amortized DeleteMin cost incurred when events are transferred from Top to the multi-rung Ladder structure and then to the Bottom structure is O(1).
PROOF: Assume that the average number of rungs spawned is C. The worst-case total cost (note: the total cost, not the amortized cost) is then O(N(C+1)), where all the events in Top traverse C rungs before reaching Bottom (see Proposition 4.1). Hence, the cost incurred per event is O(C+1) = O(1), since C is a constant independent of N, as given in Theorems 4.1 and 4.2. □
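The accounting in this proof can be illustrated with a small sketch (the function and its parameters are hypothetical, not taken from the LadderQ implementation):

```python
# Hypothetical accounting sketch: each of the N events copied out of Top
# passes through at most C rungs before reaching Bottom, so the total
# transfer cost is N*(C+1) bucket operations.
def amortized_transfer_cost(n_events, c_rungs):
    """Worst-case cost per event when every event traverses all C rungs."""
    total_cost = n_events * (c_rungs + 1)  # one copy per rung, plus into Bottom
    return total_cost / n_events           # amortized cost per event

# The per-event cost depends only on C, not on N:
assert amortized_transfer_cost(10, 3) == amortized_transfer_cost(10**6, 3) == 4
```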
Proof for the Conventional LadderQ’s O(1) Amortized Complexity

For formality, we now have:
COROLLARY 4.5: The conventional (multi-epoch) LadderQ is, theoretically, a true O(1) priority event queue structure.
PROOF: Combine the results of Corollary 4.1 and Corollary 4.2. □
LadderQ, with its theoretical O(1) amortized complexity, is clearly and significantly more robust than previously proposed O(1) priority queue structures. It should be noted that the UCQ considered by Erickson et al. [2000] is O(1) only provided the bucketwidth can be kept at O(1/N). Therefore, for the UCQ to maintain O(1) performance in a dynamic queue situation where N varies, costly bucketwidth resizes must be initiated, and hence the UCQ has never been widely used as the priority queue structure for practical simulators.

In comparison with the more widely implemented SCQ, the LadderQ is also superior. Firstly, in the area of amortized complexity, the SCQ can at best be described as having expected O(1) complexity, meaning that there is no known theoretical proof of O(1) behavior, only evidence from a number of simulation examples. In fact, simulation studies conducted on the SCQ have shown that for certain scenarios where the priority increment is highly unstable, the SCQ exhibits O(N) characteristics due to either under- or over-triggering of resize operations.

To demonstrate the usefulness and superiority of the LadderQ for practical implementation, Section 4.8 provides simulation studies comparing the performance of the LadderQ with previously proposed priority queue structures. It should be noted that most of the scenarios chosen for the simulation studies are those previously proposed by well-known researchers on priority queues (see Section 2.7); the remaining scenarios incorporate more stringent tests. In fact, the LadderQ does not require any special scenarios to show its superiority, since it already has the distinction of being O(1) theoretically, irrespective of the bucketwidth parameter (and hence of N). We note that in the numerical studies presented in Section 4.8, the maximum number of rungs spawned never exceeded four.
Simulation Studies
We employ performance measurement techniques similar to those discussed in Section 2.7, consisting of two different access pattern models, the Classic Hold and Up/Down models, with queue sizes ranging from 10 to 1 million. In addition to the ten different event distributions, we have included the Pareto distribution. The Pareto(C, D) distribution is heavy-tailed and is an excellent model for event timestamps with high variance. C is the shape parameter and D is the scale parameter; D sets the range, in that every random number generated is ≥ D. In our analysis, we set D = 1. C affects the mean and variance: Type 1 (infinite mean and infinite variance), 0 < C ≤ 1; Type 2 (finite mean and infinite variance), 1 < C ≤ 2; and Type 3 (finite mean and finite variance), C > 2. We have included the first two types, Pareto(1) and Pareto(1.5), since their mean and/or variance differ from those of the other distributions in Section 2.7. The expression used to compute Pareto(C) is pow(1/(1-rand()), 1/C). The benchmarks were run on a similar hardware platform, an Intel Pentium 4 with cache disabled, and the operating system is identical to the one stated in Chapter 3.
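The inverse-transform expression pow(1/(1-rand()), 1/C) can be sketched as follows (function name and the verification at the end are ours, for illustration only):

```python
import random

def pareto(c, d=1.0, rng=random.random):
    """Inverse-transform sample from Pareto(C, D), where F(x) = 1 - (d/x)**c
    for x >= d.  With d == 1 this matches the thesis expression
    pow(1/(1-rand()), 1/C)."""
    u = rng()
    return d * (1.0 / (1.0 - u)) ** (1.0 / c)

random.seed(42)
samples = [pareto(1.5) for _ in range(100_000)]
# Every sample lies at or above the scale parameter D = 1.
assert min(samples) >= 1.0
```

For 0 < C ≤ 1 the sample mean never settles (infinite mean), while for 1 < C ≤ 2 the mean converges but the sample variance keeps growing, which is exactly what makes Pareto(1) and Pareto(1.5) stressful test inputs.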
Results on Intel Pentium 4 (Cache Disabled)
Figure 4-3: Mean access time for Ladder Queue under Classic Hold experiments.
Figure 4-4: Mean access time for Ladder Queue under Up/Down experiments.
Figures 4-3 and 4-4 demonstrate the reliable O(1) characteristics of LadderQ under the Pareto(1) and Pareto(1.5) distributions, which allowed us to test the jump parameter with infinite mean and infinite variance, and with finite mean and infinite variance, respectively. The LadderQ also exhibits similar O(1) behavior for the other ten distributions; the performance graphs are shown in the article by Tang et al. [2005]. In Section 4.7, we presented theoretical justifications that the LadderQ has a true O(1) amortized time complexity, based on the assumption that the jump parameter is finite and greater than zero, regardless of its variance. The empirical studies of 12 different priority increment distributions, under the Classic Hold and Up/Down experiments and for queue sizes ranging from 10 to 1 million, confirm the theoretical prediction that LadderQ is indeed O(1). In fact, the results are stronger than the prediction, because LadderQ exhibits O(1) behavior even under Pareto(1.0), where the mean and variance of the jump parameter are infinite.
Table 4-2 gives a summary of the normalized performance of all the new sequential priority queues contributed in this thesis. Table 4-3 shows a summary of the maximum number of rungs spawned in the experiments. The number of rungs did not exceed four under all distributions and queue sizes.
Table 4-2: Relative Average Performance* for All Distributions on Intel Pentium 4
Model         Queue Size   Henriksen   Splay Tree   Skew Heap   Linked List   RCB    Ladder
                           with Twol   with Twol    with Twol   with Twol            Queue
Classic Hold  10           1.42        1.94         1.45        1.00          1.09   1.07
              10^2         1.21        1.78         1.35        1.00          1.07   1.08
              10^3         1.18        1.87         1.39        1.37          1.10   1.00
              10^4         1.16        1.87         1.42        4.92          1.13   1.00
              10^5         1.19        1.85         1.43        1.17          1.22   1.00
              10^6         1.20        1.89         1.49        1.20          1.25   1.00
              Avg          1.23        1.87         1.42        1.78          1.14   1.03
Up/Down       10           1.27        1.59         1.32        1.00          1.07   1.22
              10^2         1.15        1.57         1.27        1.00          1.08   1.14
              10^3         1.09        1.52         1.26        1.03          1.05   1.00
              10^4         1.13        1.68         1.41        1.15          1.15   1.00
              10^5         1.18        1.75         1.49        1.22          1.22   1.00
              10^6         1.18        1.75         1.49        1.22          1.23   1.00
              Avg          1.17        1.64         1.37        1.10          1.13   1.06
Total Avg                  1.20        1.76         1.40        1.44          1.14   1.04
*Normalized with respect to the fastest access time, where the higher the number, the slower it is.
Table 4-3: Maximum number of rungs utilized in Classic Hold and Up/Down experiments
Distribution Classic Hold Up/Down
Exponential (1) 2 2
Uniform(0,2) 1 1
Uniform(0.9,1.1) 1 1
Bimodal 2 1
Triangular(0,1.5) 1 1
NegativeTriangular(0, 1000) 2 1
ExponentialMix 2 3
Camel(0,1000,0.001, 0.999) 3 3
Change(exp(1), Triangular(90000,100000), 2000) 4 3
Change(Triangular(90000,100000), exp(1), 10000) 3 3
Pareto (1.0) 3 3
Pareto (1.5) 2 3
Effect of Bucketwidth on the Performance of LadderQ
In this section, we demonstrate that the bucketwidth δ1 does not affect the O(1) characteristic of the LadderQ, as established in Theorem 4.2. We note in Figure 4-5 that when δ1 is varied from 1/(100N) to 100/N, the mean access time remains relatively constant across all queue sizes. The LadderQ with a 1/(100N) bucketwidth results in slightly poorer performance (but still retains the O(1) constant mean access time characteristic). This is due to the additional cost of skipping more empty buckets, since with δ1 at 1/(100N) there are 100 times more buckets created in Rung[1]. In distinct contrast, when the bucketwidth of the UCQ varies by a few orders of magnitude, the mean access time varies by several factors (see Figure 1 in [Erickson et al. 2000]).
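The arithmetic behind the "100 times more buckets" remark can be sketched directly (the helper and the numbers are illustrative assumptions, not measurements from the thesis):

```python
import math

def n_buckets(span, bucketwidth):
    """Number of Rung[1] buckets needed to cover a timestamp span (sketch)."""
    return math.ceil(span / bucketwidth)

# A power-of-two queue size keeps the floating-point division exact here.
span, n = 1.0, 1024
assert n_buckets(span, span / n) == n               # one bucket per event
assert n_buckets(span, span / (128 * n)) == 128 * n # finer width: 128x buckets
```

The event count is unchanged, so shrinking δ1 only adds empty buckets that DeleteMin must skip over, which explains the mild (but still constant) slowdown in Figure 4-5.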
Figure 4-5: Mean access time of LadderQ (with widely-varying bucketwidth) for Classic Hold experiments and Exponential distribution. Maximum number of rungs
(MaxRungs) used during the experiments are also shown.
In this chapter, we presented a new priority queue structure called the Ladder Queue (or LadderQ), which is made up of a sorted linked list as its front-end and an improved Twol back-end. To achieve its O(1) characteristic, the LadderQ uses a unique bucket-by-bucket copy operation via the rung-spawning mechanism. LadderQ’s true O(1) amortized time complexity is theoretically justified under the sole assumption that the jump parameter is finite and greater than zero, i.e. even when the jump parameter varies, and regardless of its variance. The empirical studies on the performance of the LadderQ were presented, and they confirmed the theoretical prediction that LadderQ is indeed O(1). Because LadderQ’s O(1) complexity applies even when the jump parameter varies, it is expected to handle efficiently many more distributions than those included in this chapter, compared with the Conventional+Twol and RCB+Twol. Furthermore, we have shown empirically that LadderQ exhibits O(1) behavior even when the mean and variance of the jump parameter are infinite.
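The bucket-by-bucket copy idea behind rung spawning can be sketched as follows; the names, threshold, and redistribution details below are our own toy model, not the thesis implementation:

```python
# Hypothetical sketch of rung spawning: when a bucket holds more than THRES
# events, its events are copied, bucket by bucket, into a finer-grained
# child rung, refining the sort one level at a time.
THRES = 4  # illustrative threshold

def spawn_rung(events, start, width, n_buckets):
    """Redistribute one overflowing bucket into n_buckets finer buckets."""
    child_width = width / n_buckets
    rung = [[] for _ in range(n_buckets)]
    for ts in events:
        i = min(int((ts - start) / child_width), n_buckets - 1)
        rung[i].append(ts)
    return rung, child_width

bucket = [0.91, 0.13, 0.55, 0.47, 0.72]       # overflows THRES = 4
rung, w = spawn_rung(bucket, start=0.0, width=1.0, n_buckets=len(bucket))
# Concatenating the finer buckets in order yields a coarsely sorted sequence
# that Bottom can finish sorting cheaply.
flat = [ts for b in rung for ts in sorted(b)]
assert flat == sorted(bucket)
```

Each spawn only touches the one overflowing bucket, which is why the per-event transfer cost stays proportional to the (constant) number of rungs traversed.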
This concludes our contributions of high performance sequential priority queues. In the following chapter, we shall present an efficient, scalable and dynamic lock-free parallel access priority queue.
Chapter 5
Lock-Free Twol Priority Queue
Parallel access structures are mainly categorized as either blocking or non-blocking.
Blocking algorithms are commonly implemented using mutex (mutual exclusion) locks. Though mutex locks seem simple and straightforward, the technique often exhibits drawbacks like deadlock, priority inversion, scaling difficulties and relatively mediocre performance. Non-blocking algorithms, on the other hand, address all the negative aspects of blocking algorithms, thereby providing robustness and scalability.
In this chapter, we introduce a novel lock-free multi-tier Twol structure which is non-blocking, efficient and dynamic. This is the first known structure that transfers information between tiers in an efficient and entirely lock-free fashion. The new structure is also linearizable, an important correctness property for Twol to be practically useful as a priority queue structure. In addition, empirical results show that it outperforms the most recent lock-free algorithms. Lock-free Twol is indeed pioneering work that paves the way for more efficient multi-tier lock-free structures in the near future.
Organization of Chapter 5
This chapter is organized as follows. In Section 5.2, we provide an introduction to lock-free queue structures and the atomic operations associated with these structures.
In addition, we illustrate some concepts such as process helping and pointer marking which are employed in the lock-free Twol structure. We reveal the basic structure of the lock-free Twol in Section 5.3 and the subtle features of this new priority queue are explained in Section 5.4. Section 5.5 presents the numerical analysis which shows that the lock-free Twol outperforms the most recent lock-free algorithms. Note that the lock-free Twol algorithm is explained explicitly in Appendix A and the proof of its correctness is given in Appendix B.
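The pointer-marking concept mentioned above can be illustrated with a small sketch (a toy model of ours, not the lock-free Twol code): on machines where nodes are aligned to at least 2 bytes, the low-order bit of a pointer is always zero, so it can be stolen to flag a node as logically deleted.

```python
# Toy model of pointer marking using integer "addresses": the low bit of an
# aligned pointer is free, so it can carry a "logically deleted" flag that
# travels atomically with the pointer itself.
def mark(ptr):
    """Set the deletion mark in the pointer's low bit."""
    return ptr | 1

def unmark(ptr):
    """Recover the real (aligned) address."""
    return ptr & ~1

def is_marked(ptr):
    return ptr & 1 == 1

addr = 0x7f3a10  # an even, i.e. aligned, address
assert not is_marked(addr)
assert is_marked(mark(addr))
assert unmark(mark(addr)) == addr
```

In a real lock-free list the mark and the pointer are updated together with a single compare-and-swap, which is what prevents a concurrent insertion after a node that is being deleted.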