Chapter 4 Parallel Sort and GroupBy
may also be used. Apart from these basic functions, most commercial relational
database management systems (RDBMSs) also include other advanced functions,
such as statistical functions. From a query processing point of view,
these functions take a set of records (i.e., a table) as their input and produce a single
value as the result.
4.1.3 GroupBy
An example of a GroupBy query is "retrieve the number of students for each degree".
The student records are grouped according to their degrees, and for each group
the number of records is counted. These counts then represent the number
of students in each degree program. The SQL for this query is given below.
Query 4.5:
Select Sdegree, COUNT(*)
From STUDENT
Group By Sdegree;
It is also worth mentioning that the input table may have been filtered by a
Where clause (in both scalar aggregate and GroupBy queries); additionally,
for GroupBy queries, the results of the grouping may be further filtered by a
Having clause.
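The interplay of Where, Group By, and Having can be seen by running a variant of Query 4.5 with both filters added. The sketch below uses Python's sqlite3 module; the STUDENT schema and sample rows are illustrative assumptions, not data from the text.

```python
import sqlite3

# Hypothetical STUDENT table; schema and rows are assumptions for
# illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (Sname TEXT, Sdegree TEXT, Syear INTEGER)")
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?)",
    [("Ann", "BCS", 1), ("Bob", "BCS", 2), ("Cal", "BIT", 1),
     ("Dee", "MIT", 1), ("Eve", "BCS", 3)],
)

# Where filters individual records before grouping;
# Having filters whole groups after aggregation.
rows = conn.execute(
    "SELECT Sdegree, COUNT(*) FROM STUDENT "
    "WHERE Syear <= 2 "
    "GROUP BY Sdegree "
    "HAVING COUNT(*) > 1"
).fetchall()
print(rows)  # [('BCS', 2)]
```

Only the BCS group survives: the Where clause removes Eve's record before grouping, and the Having clause then discards the single-record BIT and MIT groups.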
4.2 SERIAL EXTERNAL SORTING METHOD
Serial external sorting is external sorting in a uniprocessor environment. The most
common serial external sorting algorithm is based on sort-merge. The underlying
principle of sort-merge algorithm is to break the file up into unsorted subfiles, sort
the subfiles, and then merge the sorted subfiles into larger and larger sorted subfiles
until the entire file is sorted. Note that the first stage involves sorting the first lot of
subfiles, whereas the second stage is actually the merging phase. In this scenario,
it is important to determine the size of the first lot of subfiles that are to be sorted.
Normally, each of these subfiles must be small enough to fit into the main memory,
so that sorting of these subfiles can be done in the main memory with any internal
sorting technique. In other words, the size of these subfiles is usually determined
by the buffer size in main memory, which is to be used for sorting each subfile
internally. A typical algorithm for external sorting using B buffers is presented in
Figure 4.1.
The algorithm presented in Figure 4.1 is divided into two phases: sort and
merge. The merge phase consists of loops, and each run in the outer loop is called
a pass; consequently, the merge phase contains passes i = 1, 2, .... For
consistency, the sort phase is named pass 0.
To explain the sort phase, consider the following example. Assume the size of
the file to be sorted is 108 pages and we have 5 buffer pages available (B = 5
Algorithm: Serial External Sorting

// Sort phase
Pass 0
1. Read B pages at a time into memory
2. Sort them, and write out a sub-file
3. Repeat steps 1-2 until all pages have been processed

// Merge phase
Pass i = 1, 2, ...
4. While the number of sub-files at end of previous pass is > 1
5.   While there are sub-files to be merged from previous pass
6.     Choose B-1 sorted sub-files from the previous pass
7.     Read each sub-file into an input buffer, one page at a time
8.     Merge these sub-files into one bigger sub-file
9.     Write to the output buffer one page at a time

Figure 4.1 External sorting algorithm based on sort-merge
pages). First read 5 pages from the file, sort them, and write them as one subfile
into the disk. Then read, sort, and write another 5 pages. In the last run, read, sort,
and write 3 pages only. As a result of this sort phase, we obtain ⌈108/B⌉ = 22 subfiles,
where the first 21 subfiles are 5 pages each and the last subfile is only 3 pages long.
Once the sorting of subfiles is completed, the merge phase starts. Continuing the
example above, we will use B − 1 buffers (i.e., 4 buffers) for input and 1 buffer
for output. The merging process is as follows. In pass 1, we first read 4 sorted
subfiles that were produced in the sort phase. Then we perform a 4-way merging
(because only 4 buffers are used as input). This 4-way merging is actually a
k-way merging with k = 4, since the number of input buffers is 4 (i.e.,
B − 1 = 4 buffers). An algorithm for k-way merging is explained in
Figure 4.2.
The above 4-way merging is repeated until all subfiles (e.g., the 22 subfiles from
pass 0) are processed. This process is called pass 1, and it produces ⌈22/4⌉ = 6
subfiles of 20 pages each, except for the last one, which is only 8 pages long.
The next pass, pass 2, repeats the 4-way merging to merge the 6 subfiles produced
in pass 1. We first read 4 subfiles of 20 pages each and perform a 4-way
merge. This results in a subfile 80 pages long. Then we read the last 2 subfiles, one
of which is 20 pages long while the other is only 8 pages long, and merge them to
become the second subfile of this pass. As a result, pass 2 produces ⌈6/4⌉ = 2
subfiles.
Finally, the final pass, pass 3, merges the 2 subfiles produced in pass 2 to
produce one sorted file. The process stops as there are no more subfiles to merge.
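The two phases just described can be sketched in a few lines of Python. Sub-files are modelled as in-memory lists rather than disk files, so only the pass structure of the algorithm is captured; the function and its names are illustrative, not from the text.

```python
import heapq

def external_sort(pages, B):
    """Sort-merge external sort over a list of pages (each page is a
    list of keys), using B buffer pages. Returns the sorted keys and
    the total number of passes (pass 0 plus the merge passes)."""
    # Pass 0 (sort phase): read B pages at a time, sort, write a sub-file.
    subfiles = [sorted(k for page in pages[i:i + B] for k in page)
                for i in range(0, len(pages), B)]
    passes = 1
    # Merge phase: (B-1)-way merge until a single sub-file remains.
    while len(subfiles) > 1:
        subfiles = [list(heapq.merge(*subfiles[i:i + B - 1]))
                    for i in range(0, len(subfiles), B - 1)]
        passes += 1
    return subfiles[0], passes
```

With a 108-page file and B = 5, this reproduces the example: 22 sub-files after pass 0 and 4 passes in total.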
In the above example, using a 108-page file and 5 buffer pages, we need
4 passes, where pass 0 is the sort phase and passes 1 to 3 are the merge phase. The
Algorithm: k-way merging
input files f1, f2, ..., fn;
output file fo
/* Sort files f1, f2, ..., fn, based on the attribute a1
   of all files */
1. Open files f1, f2, ..., fn.
2. Read a record from each of files f1, f2, ..., fn.
3. Find the smallest value among attributes a1 of the
   records from step 2. Store this value to ax and the
   file to fx (f1 <= fx <= fn).
4. Write ax to the output file fo.
5. Read a record from file fx.
6. Repeat steps 3-5 until there are no more records in
   all files f1, f2, ..., fn.

Figure 4.2 k-Way merging algorithm
number of passes can be calculated as follows. The number of passes needed to sort
a file with B buffers available is ⌈log_(B−1) ⌈file size / B⌉⌉ + 1, where ⌈file size / B⌉ is
the number of subfiles produced in pass 0 and ⌈log_(B−1) ⌈file size / B⌉⌉ is the number
of passes in the merge phase. This can be seen as follows. In general, the number of
passes x in the merge phase of α items satisfies the relationship ⌈α / (B − 1)^x⌉ = 1,
from which we obtain x = ⌈log_(B−1) α⌉.
In each pass, we read and write all the pages (e.g., 108 pages). Therefore,
the total I/O cost for the overall serial external sorting can be calculated
as 2 × file size × number of passes = 2 × 108 × 4 = 864 pages. More comprehensive
cost models for serial external sort are explained below in
Section 4.4.
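The pass-count and I/O formulas can be checked numerically. The helper below is a sketch under the stated cost model (every pass reads and writes the whole file once); it assumes B >= 3 so that the merge fan-in B − 1 is a valid logarithm base.

```python
import math

def num_passes(file_pages, B):
    """ceil(log_(B-1)(ceil(file/B))) + 1 : merge passes plus pass 0."""
    runs = math.ceil(file_pages / B)   # sub-files produced in pass 0
    merge_passes = math.ceil(math.log(runs, B - 1)) if runs > 1 else 0
    return merge_passes + 1

def total_io(file_pages, B):
    """Each pass reads and writes every page once."""
    return 2 * file_pages * num_passes(file_pages, B)
```

For the 108-page file with B = 5 this gives 4 passes and 864 page I/Os, matching the worked example, and it reproduces the entries of Table 4.1 below.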
As shown in the above example, an important aspect of serial external sorting is
the buffer size, which determines whether each subfile fits comfortably into main
memory. The bigger the buffer (main memory) size, the fewer passes are needed to
sort a file, resulting in a performance gain. Table 4.1 illustrates how performance
improves as the number of buffers increases.
In terms of total I/O cost, the number of passes is a key determinant. For
example, to sort 1 billion pages, using 129 buffers is 6 times more efficient than
using 3 buffers (i.e., 30/5 = 6).
There are a number of variations to the serial external sort-merge explained
above, such as using a double buffering technique or a blocked I/O method. As
our concern is not with the serial part of external sorting, our assumption of serial
external sorting is based on the above sort-merge technique using B buffers.
As stated in the beginning, serial external sort is the basis for parallel exter-
nal sort. Particularly in a shared-nothing environment, each processor has its own
Table 4.1 Number of passes in serial external sorting as the number of buffers increases

R             B=3   B=5   B=9   B=17   B=129   B=257
100             7     4     3      2       1       1
1,000          10     5     4      3       2       2
10,000         13     7     5      4       2       2
100,000        17     9     6      5       3       3
1 million      20    10     7      5       3       3
10 million     23    12     8      6       4       3
100 million    26    14     9      7       4       4
1 billion      30    15    10      8       5       4
data, and sorting this data locally in each processor is done as per serial external
sort explained above. Therefore, the main concern in parallel external sort is not on
the local sort but on when the local sort is carried out (i.e., local sort is done first
or later) and how merging is performed. The next section describes different meth-
ods of parallel external sort by basically considering the two factors mentioned
above.
4.3 ALGORITHMS FOR PARALLEL EXTERNAL SORT
In this section, five parallel external sort methods for parallel database systems
are explained: (i) parallel merge-all sort, (ii) parallel binary-merge sort, (iii) parallel
redistribution binary-merge sort, (iv) parallel redistribution merge-all sort, and
(v) parallel partitioned sort. Each of these will be described in more detail in the
following.
4.3.1 Parallel Merge-All Sort
The parallel merge-all sort method is a traditional approach, which has been
adopted as the basis for implementing sorting operations in several database
machine prototypes (e.g., Gamma) and some commercial parallel DBMSs. Parallel
merge-all sort is composed of two phases: local sort and final merge. The local
sort phase is carried out independently in each processor. Local sorting in each
processor is performed by the normal serial external sorting mechanism. Serial
external sorting is used because it is assumed that the data to be sorted in each processor
is very large and cannot fit into main memory, and hence external sorting
(as opposed to internal sorting) is required in each processor.
After the local sort phase has been completed, the second phase, the final merge
phase, starts. In this final merge phase, the results from the local sort phase are
[Figure: records from the child operator are locally sorted by processors 1-4; the host then performs the final merge, producing one fully sorted list.]

Figure 4.3 Parallel merge-all sort
transferred to the host for final merging. The final merge phase is carried out by
one processor, namely, the host. An algorithm for a k-way merging is explained in
Figure 4.2.
Figure 4.3 illustrates a parallel merge-all sort process. For simplicity, a list of
numbers is used and this list is to be sorted. In the real world, the list of numbers
is actually a list of records from very large tables.
Figure 4.3 shows that a parallel merge-all sort is simple, because it is a one-level
tree. Load balancing in each processor at the local sort phase is relatively easy
to achieve, especially if a round-robin data placement technique is used in the
initial data partitioning. It is also easy to predict the outcome of the process, as
performance modeling of such a process is relatively straightforward.
Despite its simplicity, the parallel merge-all sort method incurs an obvious problem,
particularly in the final merging phase, as merging in one processor is heavy.
This is true especially if the number of processors is large and there is a limit to
the number of files to be merged (i.e., a limitation on the number of files that can
be opened). Another factor in merging is the buffer size, as mentioned above in
the discussion of serial external sorting.
Another problem with parallel merge-all sort is network contention, as all temporary
results from each processor in the local sort phase are passed to the host.
The problem of merging by one host is tackled by the next sorting scheme,
where merging is not done by one processor but is shared by multiple processors
in the form of hierarchical merging.
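The two phases of parallel merge-all sort can be sketched as follows; processors are simulated sequentially over a list of data partitions, so this shows the structure rather than actual parallel execution, and the names are illustrative.

```python
import heapq

def parallel_merge_all_sort(partitions):
    """Phase 1: each processor sorts its own partition (local sort).
    Phase 2: the host k-way merges all locally sorted runs."""
    local_runs = [sorted(p) for p in partitions]   # in parallel in a real system
    return list(heapq.merge(*local_runs))          # final merge at the host
```

With four partitions holding the values 1-16, the host merges four runs into one sorted list; with N processors the host must keep N files open at once, which is the bottleneck discussed above.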
4.3.2 Parallel Binary-Merge Sort
The first phase of parallel binary-merge sort is a local sort, similar to that of
parallel merge-all sort. The second phase, the merging phase, is pipelined instead of
concentrated on one processor. The merging phase works by taking
the results from two processors and then merging the two in one processor. As
this merging technique uses only two processors, this merging is called “binary
merging.” The result of the merging between two processors is passed on to the
next level until one processor (the host) is left. Subsequently, the merging process
forms a hierarchy. Figure 4.4 illustrates the process.
The main reason for using parallel binary-merge sort is that the merging workload
is spread over a pipeline of processors instead of one processor. It is true,
however, that the final merging still has to be done by one processor.
Some of the benefits of parallel binary-merge sort are similar to those of parallel
merge-all sort. For instance, balancing in local sort can be done if a round-robin
[Figure: records from the child operator are locally sorted by four processors; the sorted runs are then combined by two-level hierarchical merging using (N−1) nodes in a pipeline, ending at the host with one sorted list.]

Figure 4.4 Parallel binary-merge sort
[Figure: in parallel merge-all sort the host performs a k-way merge; in parallel binary-merge sort the host performs a binary merge.]

Figure 4.5 Binary-merge vs. k-way merge in the merging phase
data placement is initially used for the raw data to be sorted. Another benefit, as
stated above, is that the merging workload is now shared among processors.
However, problems relating to the heavy merging workload in the host still exist,
even though now the final merging merges only a pair of lists of sorted data and is
not a k-way merging like that in parallel merge-all sort. Binary merging can still be
time consuming, particularly if the two lists to be merged are very large. Figure 4.5
illustrates binary-merge versus k-way merge, which is carried out by the host.
The main difference between k-way merging and binary merging is that
k-way merging involves a search during the merge; that is, it searches for
the smallest value among all values being compared at the same time. In binary
merging, this search reduces to a single comparison between two values.
Regarding system requirements, k-way merging requires a sufficient number
of files to be opened at the same time. This requirement is trivial in binary merging,
as it requires a maximum of only two files to be opened, and this is easily satisfied
by any operating system.
The pipeline system, as in the binary merging, will certainly produce extra work
through the pipe itself. The pipeline mechanism also produces a higher tree, not
a one-level tree as with the previous method. However, if there is a limit to the
number of opened files permitted in the k-way merging, parallel merge-all sort
will incur merging overheads.
In parallel binary-merge sort, there is still no true parallelism in the merging
because only a subset, not all, of the available processors are used.
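The pipelined merging phase can be sketched by merging the sorted runs pairwise, level by level; again processors are simulated sequentially, and the names are illustrative.

```python
import heapq

def parallel_binary_merge_sort(partitions):
    """Phase 1: local sort in each processor.
    Phase 2: pairwise (binary) merges, level by level, until one sorted
    list remains at the host; the merge tree has height ceil(log2(N))."""
    runs = [sorted(p) for p in partitions]
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + 2]))   # binary merge of a pair
                for i in range(0, len(runs), 2)]
    return runs[0]
```

The trade-off discussed above is visible here: each level opens at most two inputs per merge, but the number of levels grows with the number of processors, unlike the one-level merge-all scheme.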
In the next three sections, three possible alternatives using the concept of redistribution
or repartitioning are described. The first approach is a modification of
parallel binary-merge sort that incorporates redistribution in the pipeline hierarchy
of merging. The second approach is an alteration of parallel merge-all sort, also
through the use of redistribution. The third approach differs from the others in that
local sorting is delayed until after partitioning is done.
4.3.3 Parallel Redistribution Binary-Merge Sort
Parallel redistribution binary-merge sort is motivated by parallelism at all levels in
the pipeline hierarchy. Therefore, it is similar to parallel binary-merge sort, because
both methods use a pipeline hierarchy for merging local sort results, but they differ
in the number of processors involved in the pipe. With parallel redistribution
binary-merge sort, all processors are used at each level in the hierarchy of merging.
The steps of parallel redistribution binary-merge sort can be described as follows.
First, carry out a local sort in each processor, as in the previous sorting
methods. Second, redistribute the results of the local sort to the same pool of
processors. Third, merge using the same pool of processors. Finally, repeat the
above two steps until the final merging. The final result is the union of all temporary
results obtained in each processor. Figure 4.6 illustrates the parallel redistribution
binary-merge sort method.
[Figure: records from the child operator are locally sorted by four processors; the sorted runs are range-redistributed, intermediate merges produce lists that are sorted among and within files, and after a further range redistribution (ranges 1-5, 6-10, 11-15, 16-20) each processor performs a final merge, yielding the sorted list.]

Figure 4.6 Parallel redistribution binary-merge sort
Note from the illustration that in the final merge phase, some of the boxes are
empty (i.e., gray boxes). This indicates that they do not receive any values from the
designated processors. For example, the first box on the left is gray because there
are no values ranging from 1 to 5 from processor 2. Practically, in this example,
processor 1 performs the final merging of two lists, because the other two lists are
empty.
Also, note that the results produced by the intermediate merging in the above
example are sorted within and among processors. This means that, for example,
processors 1 and 2 produce a sorted list each, and the union of these results is also
sorted where the results from processor 2 are preceded by those from processor
1. This is applied to other pairs of processors. Each pair of processors in this case
forms a pool of processors. At the next level of merging, two pools of processors
use the same strategy as in the previous level. Finally, in the final merging, all
processors will form one pool, and therefore results produced in each processor
are sorted, and these results united together are then sorted based on the processor
order. In some systems, this is already a final result. If there is a need to place the
results in one processor, results transfers are then carried out.
The apparent benefit of this method is that merging becomes lighter compared
with the methods without redistribution, because merging is now shared by multiple
processors, not monopolized by just one. Parallelism is therefore accomplished
at all levels of merging, even though the performance benefits of this
mechanism are restricted.
One problem of the redistribution method still remains, namely the height of the
tree, which is due to the fact that merging is done in a pipeline format.
Another problem raised by the redistribution is skew. Although the initial placement
on each disk is balanced through the use of round-robin data partitioning, redistribution
in the merging process is likely to produce skew, as shown in Figure 4.6.
Like the merge-all sort method, final merging in the redistribution method is also
dependent upon the maximum number of files that can be opened.
4.3.4 Parallel Redistribution Merge-All Sort
Parallel redistribution merge-all sort is motivated by two factors, namely, reducing
the height of the tree while maintaining parallelism at the merging stage. This can
be achieved by exploiting the features of the parallel merge-all and parallel redistribution
binary-merge methods. In other words, parallel redistribution merge-all sort is a
two-phase method (local sort and final merging) like parallel merge-all sort, but it
performs a redistribution based on a range partitioning. Figure 4.7 gives an illustration
of parallel redistribution merge-all sort.
As shown in Figure 4.7, parallel redistribution merge-all sort is a two-phase
method, where in phase one, local sort is carried out as is done with other methods,
and in phase two, results from local sort are redistributed to all processors based
on a range partitioning, and merging is then performed by each processor.
Similar to parallel redistribution binary-merge sort, empty (gray) boxes are
actually empty lists resulting from data redistribution. In the above example, processor
[Figure: records from the child operator are locally sorted by four processors; the sorted runs are range-redistributed (ranges 1-5, 6-10, 11-15, 16-20) to all processors in a single step, and each processor then performs the final merge of the fragments it receives, yielding the sorted list.]

Figure 4.7 Parallel redistribution merge-all sort
4 has three empty lists coming from processors 2, 3, and 4, as they do not have
values ranging from 16 to 20 as specified by the range partitioning function.
Also, note that the final results produced in the final merging phase in each
processor are sorted, and these are also sorted among all processors based on the
order of the processors specified by the range partitioning function.
The advantages of this method are the same as those of parallel redistribution
binary-merge sort, including true parallelism in the merging process. However,
the tree of parallel redistribution merge-all sort is not a tall tree as in parallel
redistribution binary-merge sort. It is, in fact, a one-level tree, the same as in
parallel merge-all sort.
Not only do the advantages of parallel redistribution merge-all sort mirror those
of parallel merge-all sort and parallel redistribution binary-merge sort, but so also do
the problems. Skew problems found in parallel redistribution binary-merge sort
also exist with this method. Consequently, skew modeling needs some simplifying
assumptions as well. Additionally, a bottleneck problem in merging, similar
to that of parallel merge-all sort, is also common here, especially if the number
of processors is large and exceeds the limit on the number of files that can be
opened at once.
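Parallel redistribution merge-all sort can be sketched in the same style: local sort, a single range redistribution, then an independent merge in each processor. The range boundaries and helper names are illustrative assumptions.

```python
import heapq

def redistribution_merge_all_sort(partitions, upper_bounds):
    """Phase 1: local sort. Phase 2: each sorted run is split by the
    range partitioning (upper_bounds[i] is the inclusive upper bound of
    processor i's range) and sent to the owning processor, which merges
    the fragments it receives. Concatenating the per-processor results
    in processor order yields the globally sorted list."""
    runs = [sorted(p) for p in partitions]
    inbox = [[] for _ in upper_bounds]     # fragments received per processor
    for run in runs:
        lo = float("-inf")
        for dest, hi in enumerate(upper_bounds):
            inbox[dest].append([v for v in run if lo < v <= hi])
            lo = hi
    # final merge, independently in each processor (some fragments empty)
    return [list(heapq.merge(*frags)) for frags in inbox]
```

With ranges 5, 10, 15, 20 and partitions like those of Figure 4.7, the processor owning range 16-20 receives empty fragments from three of the four processors: the skew produced by redistribution that the text notes above.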
High-Performance Parallel Database Processing and Grid Databases, by David Taniar, Clement Leung, Wenny Rahayu, and Sushant Goel. Copyright 2008 John Wiley & Sons, Inc.