Chapter 4 Parallel Sort and GroupBy
may also be used. Apart from these basic functions, most commercial relational
database management systems (RDBMSs) also include other advanced functions,
such as statistical functions. From a query processing point of view,
these functions take a set of records (i.e., a table) as their input and produce a single
value as the result.
4.1.3 GroupBy
An example of a GroupBy query is "retrieve the number of students for each degree".
The student records are grouped according to their degrees, and for each group
the number of records is counted. These counts then represent the number
of students in each degree program. The SQL for this query is given below.
Query 4.5:
Select Sdegree, COUNT(*)
From STUDENT
Group By Sdegree;
It is also worth mentioning that the input table may have been filtered by a
Where clause (in both scalar aggregate and GroupBy queries); additionally,
for GroupBy queries, the results of the grouping may be further filtered by a
Having clause.
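The interplay of Where, Group By, and Having can be seen by running a variant of Query 4.5 with both filters added. The sketch below uses Python's sqlite3 module; the STUDENT schema and sample rows are illustrative assumptions, not data from the text.

```python
import sqlite3

# Hypothetical STUDENT table; schema and rows are assumptions for
# illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (Sname TEXT, Sdegree TEXT, Syear INTEGER)")
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?)",
    [("Ann", "BCS", 1), ("Bob", "BCS", 2), ("Cal", "BIT", 1),
     ("Dee", "MIT", 1), ("Eve", "BCS", 3)],
)

# Where filters individual records before grouping;
# Having filters whole groups after aggregation.
rows = conn.execute(
    "SELECT Sdegree, COUNT(*) FROM STUDENT "
    "WHERE Syear <= 2 "
    "GROUP BY Sdegree "
    "HAVING COUNT(*) > 1"
).fetchall()
print(rows)  # [('BCS', 2)]
```

Only the BCS group survives: the Where clause removes Eve's record before grouping, and the Having clause then discards the single-record BIT and MIT groups.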
4.2 SERIAL EXTERNAL SORTING METHOD
Serial external sorting is external sorting in a uniprocessor environment. The most
common serial external sorting algorithm is based on sort-merge. The underlying
principle of sort-merge algorithm is to break the file up into unsorted subfiles, sort
the subfiles, and then merge the sorted subfiles into larger and larger sorted subfiles
until the entire file is sorted. Note that the first stage involves sorting the first lot of
subfiles, whereas the second stage is actually the merging phase. In this scenario,
it is important to determine the size of the first lot of subfiles that are to be sorted.
Normally, each of these subfiles must be small enough to fit into the main memory,
so that sorting of these subfiles can be done in the main memory with any internal
sorting technique. In other words, the size of these subfiles is usually determined
by the buffer size in main memory, which is to be used for sorting each subfile
internally. A typical algorithm for external sorting using B buffers is presented in
Figure 4.1.
The algorithm presented in Figure 4.1 is divided into two phases: sort and
merge. The merge phase consists of loops, and each run in the outer loop is called
a pass; consequently, the merge phase contains passes i = 1, 2, .... For
consistency, the sort phase is named pass 0.
To explain the sort phase, consider the following example. Assume the size of
the file to be sorted is 108 pages and we have 5 buffer pages available (B = 5
Algorithm: Serial External Sorting

// Sort phase
Pass 0
1. Read B pages at a time into memory
2. Sort them, and write out a sub-file
3. Repeat steps 1-2 until all pages have been processed

// Merge phase
Pass i = 1, 2, ...
4. While the number of sub-files at end of previous pass is > 1
5.   While there are sub-files to be merged from previous pass
6.     Choose B-1 sorted sub-files from the previous pass
7.     Read each sub-file into an input buffer, one page at a time
8.     Merge these sub-files into one bigger sub-file
9.     Write to the output buffer one page at a time

Figure 4.1 External sorting algorithm based on sort-merge
pages). First read 5 pages from the file, sort them, and write them as one subfile
into the disk. Then read, sort, and write another 5 pages. In the last run, read, sort,
and write 3 pages only. As a result of this sort phase, we obtain ⌈108/B⌉ = 22 subfiles,
where the first 21 subfiles are 5 pages each and the last subfile is only 3 pages long.
Once the sorting of subfiles is completed, the merge phase starts. Continuing the
example above, we will use B − 1 buffers (i.e., 4 buffers) for input and 1 buffer
for output. The merging process is as follows. In pass 1, we first read 4 sorted
subfiles that were produced in the sort phase. Then we perform a 4-way merging
(because only 4 buffers are used as input). This 4-way merging is actually a
k-way merging with k = 4, since the number of input buffers is 4 (i.e.,
B − 1 = 4 buffers). An algorithm for k-way merging is explained in
Figure 4.2.
The above 4-way merging is repeated until all subfiles (e.g., the 22 subfiles from
pass 0) are processed. This process is called pass 1, and it produces ⌈22/4⌉ = 6
subfiles of 20 pages each, except for the last one, which is only 8 pages long.
The next pass, pass 2, repeats the 4-way merging to merge the 6 subfiles produced
in pass 1. We first read 4 subfiles of 20 pages each and perform a 4-way
merge. This results in a subfile 80 pages long. Then we read the last 2 subfiles, one
of which is 20 pages long while the other is only 8 pages long, and merge them to
become the second subfile of this pass. As a result, pass 2 produces ⌈6/4⌉ = 2
subfiles.
Finally, the final pass, pass 3, merges the 2 subfiles produced in pass 2 to
produce one sorted file. The process stops as there are no more subfiles to merge.
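The two phases just described can be sketched in a few lines of Python. Sub-files are modelled as in-memory lists rather than disk files, so only the pass structure of the algorithm is captured; the function and its names are illustrative, not from the text.

```python
import heapq

def external_sort(pages, B):
    """Sort-merge external sort over a list of pages (each page is a
    list of keys), using B buffer pages. Returns the sorted keys and
    the total number of passes (pass 0 plus the merge passes)."""
    # Pass 0 (sort phase): read B pages at a time, sort, write a sub-file.
    subfiles = [sorted(k for page in pages[i:i + B] for k in page)
                for i in range(0, len(pages), B)]
    passes = 1
    # Merge phase: (B-1)-way merge until a single sub-file remains.
    while len(subfiles) > 1:
        subfiles = [list(heapq.merge(*subfiles[i:i + B - 1]))
                    for i in range(0, len(subfiles), B - 1)]
        passes += 1
    return subfiles[0], passes
```

With a 108-page file and B = 5, this reproduces the example: 22 sub-files after pass 0 and 4 passes in total.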
In the above example, using a 108-page file and 5 buffer pages, we need
4 passes, where pass 0 is the sort phase and passes 1 to 3 are the merge phase. The
Algorithm: k-way merging
input files f1, f2, ..., fn;
output file fo
/* Sort files f1, f2, ..., fn, based on the attribute a1
   of all files */
1. Open files f1, f2, ..., fn.
2. Read a record from each of files f1, f2, ..., fn.
3. Find the smallest value among attributes a1 of the
   records from step 2. Store this value to ax and the
   file to fx (f1 <= fx <= fn).
4. Write ax to the output file fo.
5. Read a record from file fx.
6. Repeat steps 3-5 until there are no more records in
   all files f1, f2, ..., fn.

Figure 4.2 k-Way merging algorithm
number of passes can be calculated as follows. The number of passes needed to sort
a file with B buffers available is ⌈log_(B−1) ⌈file size / B⌉⌉ + 1, where ⌈file size / B⌉ is
the number of subfiles produced in pass 0 and ⌈log_(B−1) ⌈file size / B⌉⌉ is the number
of passes in the merge phase. This can be seen as follows. In general, the number of
passes x in the merge phase of α items satisfies the relationship ⌈α / (B − 1)^x⌉ = 1,
from which we obtain x = ⌈log_(B−1) α⌉.
In each pass, we read and write all the pages (e.g., 108 pages). Therefore,
the total I/O cost for the overall serial external sorting can be calculated
as 2 × file size × number of passes = 2 × 108 × 4 = 864 pages. More comprehensive
cost models for serial external sort are explained below in
Section 4.4.
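The pass-count and I/O formulas can be checked numerically. The helper below is a sketch under the stated cost model (every pass reads and writes the whole file once); it assumes B >= 3 so that the merge fan-in B − 1 is a valid logarithm base.

```python
import math

def num_passes(file_pages, B):
    """ceil(log_(B-1)(ceil(file/B))) + 1 : merge passes plus pass 0."""
    runs = math.ceil(file_pages / B)   # sub-files produced in pass 0
    merge_passes = math.ceil(math.log(runs, B - 1)) if runs > 1 else 0
    return merge_passes + 1

def total_io(file_pages, B):
    """Each pass reads and writes every page once."""
    return 2 * file_pages * num_passes(file_pages, B)
```

For the 108-page file with B = 5 this gives 4 passes and 864 page I/Os, matching the worked example, and it reproduces the entries of Table 4.1 below.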
As shown in the above example, an important aspect of serial external sorting is
the buffer size, which determines whether each subfile fits comfortably into main
memory. The bigger the buffer (main memory) size, the fewer passes are needed to
sort a file, resulting in a performance gain. Table 4.1 illustrates how performance
improves as the number of buffers increases.
In terms of total I/O cost, the number of passes is a key determinant. For
example, to sort 1 billion pages, using 129 buffers is 6 times more efficient than
using 3 buffers (i.e., 30/5 = 6).
There are a number of variations to the serial external sort-merge explained
above, such as using a double buffering technique or a blocked I/O method. As
our concern is not with the serial part of external sorting, our assumption of serial
external sorting is based on the above sort-merge technique using B buffers.
As stated in the beginning, serial external sort is the basis for parallel exter-
nal sort. Particularly in a shared-nothing environment, each processor has its own
Table 4.1 Number of passes in serial external sorting as the number of buffers increases

R             B=3   B=5   B=9   B=17   B=129   B=257
100             7     4     3      2       1       1
1,000          10     5     4      3       2       2
10,000         13     7     5      4       2       2
100,000        17     9     6      5       3       3
1 million      20    10     7      5       3       3
10 million     23    12     8      6       4       3
100 million    26    14     9      7       4       4
1 billion      30    15    10      8       5       4
data, and sorting this data locally in each processor is done as per serial external
sort explained above. Therefore, the main concern in parallel external sort is not on
the local sort but on when the local sort is carried out (i.e., local sort is done first
or later) and how merging is performed. The next section describes different meth-
ods of parallel external sort by basically considering the two factors mentioned
above.
4.3 ALGORITHMS FOR PARALLEL EXTERNAL SORT
In this section, five parallel external sort methods for parallel database systems
are explained: (i) parallel merge-all sort, (ii) parallel binary-merge sort, (iii) parallel
redistribution binary-merge sort, (iv) parallel redistribution merge-all sort, and
(v) parallel partitioned sort. Each of these will be described in more detail in the
following.
4.3.1 Parallel Merge-All Sort
The parallel merge-all sort method is a traditional approach, which has been
adopted as the basis for implementing sorting operations in several database
machine prototypes (e.g., Gamma) and some commercial parallel DBMSs. Parallel
merge-all sort is composed of two phases: local sort and final merge. The local
sort phase is carried out independently in each processor. Local sorting in each
processor is performed by the normal serial external sorting mechanism. Serial
external sorting is used because it is assumed that the data to be sorted in each processor
is very large and cannot fit into main memory, and hence external sorting
(as opposed to internal sorting) is required in each processor.
After the local sort phase has been completed, the second phase, the final merge
phase, starts. In this final merge phase, the results from the local sort phase are
[Figure: records from the child operator are locally sorted by processors 1-4; the host then performs the final merge, producing one fully sorted list.]

Figure 4.3 Parallel merge-all sort
transferred to the host for final merging. The final merge phase is carried out by
one processor, namely, the host. An algorithm for a k-way merging is explained in
Figure 4.2.
Figure 4.3 illustrates a parallel merge-all sort process. For simplicity, a list of
numbers is used and this list is to be sorted. In the real world, the list of numbers
is actually a list of records from very large tables.
Figure 4.3 shows that a parallel merge-all sort is simple, because it is a one-level
tree. Load balancing in each processor at the local sort phase is relatively easy
to achieve, especially if a round-robin data placement technique is used in the
initial data partitioning. It is also easy to predict the outcome of the process, as
performance modeling of such a process is relatively straightforward.
Despite its simplicity, the parallel merge-all sort method incurs an obvious problem,
particularly in the final merging phase, as merging in one processor is heavy.
This is true especially if the number of processors is large and there is a limit to
the number of files to be merged (i.e., a limitation on the number of files that can
be opened). Another factor in merging is the buffer size, as mentioned above in
the discussion of serial external sorting.
Another problem with parallel merge-all sort is network contention, as all temporary
results from each processor in the local sort phase are passed to the host.
The problem of merging by one host is tackled by the next sorting scheme,
where merging is not done by one processor but is shared by multiple processors
in the form of hierarchical merging.
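The two phases of parallel merge-all sort can be sketched as follows; processors are simulated sequentially over a list of data partitions, so this shows the structure rather than actual parallel execution, and the names are illustrative.

```python
import heapq

def parallel_merge_all_sort(partitions):
    """Phase 1: each processor sorts its own partition (local sort).
    Phase 2: the host k-way merges all locally sorted runs."""
    local_runs = [sorted(p) for p in partitions]   # in parallel in a real system
    return list(heapq.merge(*local_runs))          # final merge at the host
```

With four partitions holding the values 1-16, the host merges four runs into one sorted list; with N processors the host must keep N files open at once, which is the bottleneck discussed above.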
4.3.2 Parallel Binary-Merge Sort
The first phase of parallel binary-merge sort is a local sort, similar to that of
parallel merge-all sort. The second phase, the merging phase, is pipelined instead of
concentrated on one processor. The merging phase works by taking
the results from two processors and then merging the two in one processor. As
this merging technique uses only two processors, this merging is called “binary
merging.” The result of the merging between two processors is passed on to the
next level until one processor (the host) is left. Subsequently, the merging process
forms a hierarchy. Figure 4.4 illustrates the process.
The main reason for using parallel binary-merge sort is that the merging workload
is spread over a pipeline of processors instead of one processor. It is true,
however, that the final merging still has to be done by one processor.
Some of the benefits of parallel binary-merge sort are similar to those of parallel
merge-all sort. For instance, balancing in local sort can be done if a round-robin
[Figure: records from the child operator are locally sorted by four processors; the sorted runs are then combined by two-level hierarchical merging using (N−1) nodes in a pipeline, ending at the host with one sorted list.]

Figure 4.4 Parallel binary-merge sort
[Figure: in parallel merge-all sort the host performs a k-way merge; in parallel binary-merge sort the host performs a binary merge.]

Figure 4.5 Binary-merge vs. k-way merge in the merging phase
data placement is initially used for the raw data to be sorted. Another benefit, as
stated above, is that the merging workload is now shared among processors.
However, problems relating to the heavy merging workload in the host still exist,
even though now the final merging merges only a pair of lists of sorted data and is
not a k-way merging like that in parallel merge-all sort. Binary merging can still be
time consuming, particularly if the two lists to be merged are very large. Figure 4.5
illustrates binary-merge versus k-way merge, which is carried out by the host.
The main difference between k-way merging and binary merging is that
k-way merging involves a search during the merge; that is, it searches for
the smallest value among all values being compared at the same time. In binary
merging, this search reduces to a single comparison between two values.
Regarding system requirements, k-way merging requires a sufficient number
of files to be opened at the same time. This requirement is trivial in binary merging,
as it requires a maximum of only two files to be opened, and this is easily satisfied
by any operating system.
The pipeline system, as in the binary merging, will certainly produce extra work
through the pipe itself. The pipeline mechanism also produces a higher tree, not
a one-level tree as with the previous method. However, if there is a limit to the
number of opened files permitted in the k-way merging, parallel merge-all sort
will incur merging overheads.
In parallel binary-merge sort, there is still no true parallelism in the merging
because only a subset, not all, of the available processors are used.
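The pipelined merging phase can be sketched by merging the sorted runs pairwise, level by level; again processors are simulated sequentially, and the names are illustrative.

```python
import heapq

def parallel_binary_merge_sort(partitions):
    """Phase 1: local sort in each processor.
    Phase 2: pairwise (binary) merges, level by level, until one sorted
    list remains at the host; the merge tree has height ceil(log2(N))."""
    runs = [sorted(p) for p in partitions]
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + 2]))   # binary merge of a pair
                for i in range(0, len(runs), 2)]
    return runs[0]
```

The trade-off discussed above is visible here: each level opens at most two inputs per merge, but the number of levels grows with the number of processors, unlike the one-level merge-all scheme.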
In the next three sections, three possible alternatives using the concept of redistribution
or repartitioning are described. The first approach is a modification of
parallel binary-merge sort that incorporates redistribution in the pipeline hierarchy
of merging. The second approach is an alteration of parallel merge-all sort, also
through the use of redistribution. The third approach differs from the others in that
local sorting is delayed until after partitioning is done.
4.3.3 Parallel Redistribution Binary-Merge Sort
Parallel redistribution binary-merge sort is motivated by parallelism at all levels in
the pipeline hierarchy. Therefore, it is similar to parallel binary-merge sort, because
both methods use a pipeline hierarchy for merging local sort results, but they differ
in the number of processors involved in the pipe. With parallel redistribution
binary-merge sort, all processors are used at each level in the hierarchy of merging.
The steps of parallel redistribution binary-merge sort can be described as follows.
First, carry out a local sort in each processor, as in the previous sorting
methods. Second, redistribute the results of the local sort to the same pool of
processors. Third, merge using the same pool of processors. Finally, repeat the
above two steps until the final merging. The final result is the union of all temporary
results obtained in each processor. Figure 4.6 illustrates the parallel redistribution
binary-merge sort method.
[Figure: records from the child operator are locally sorted by four processors; the sorted runs are range-redistributed, intermediate merges produce lists that are sorted among and within files, and after a further range redistribution (ranges 1-5, 6-10, 11-15, 16-20) each processor performs a final merge, yielding the sorted list.]

Figure 4.6 Parallel redistribution binary-merge sort
Note from the illustration that in the final merge phase, some of the boxes are
empty (i.e., gray boxes). This indicates that they do not receive any values from the
designated processors. For example, the first box on the left is gray because there
are no values ranging from 1 to 5 from processor 2. Practically, in this example,
processor 1 performs the final merging of two lists, because the other two lists are
empty.
Also, note that the results produced by the intermediate merging in the above
example are sorted within and among processors. This means that, for example,
processors 1 and 2 produce a sorted list each, and the union of these results is also
sorted where the results from processor 2 are preceded by those from processor
1. This is applied to other pairs of processors. Each pair of processors in this case
forms a pool of processors. At the next level of merging, two pools of processors
use the same strategy as in the previous level. Finally, in the final merging, all
processors will form one pool, and therefore results produced in each processor
are sorted, and these results united together are then sorted based on the processor
order. In some systems, this is already a final result. If there is a need to place the
results in one processor, results transfers are then carried out.
The apparent benefit of this method is that merging becomes lighter compared
with the methods without redistribution, because merging is now shared by multiple
processors, not monopolized by just one. Parallelism is therefore accomplished
at all levels of merging, even though the performance benefits of this
mechanism are restricted.
One problem of the redistribution method still remains, namely the height of the
tree, which is due to the fact that merging is done in a pipeline format.
Another problem raised by the redistribution is skew. Although the initial placement
on each disk is balanced through the use of round-robin data partitioning, redistribution
in the merging process is likely to produce skew, as shown in Figure 4.6.
Like the merge-all sort method, final merging in the redistribution method is also
dependent upon the maximum number of files that can be opened.
4.3.4 Parallel Redistribution Merge-All Sort
Parallel redistribution merge-all sort is motivated by two factors, namely, reducing
the height of the tree while maintaining parallelism at the merging stage. This can
be achieved by exploiting the features of the parallel merge-all and parallel redistribution
binary-merge methods. In other words, parallel redistribution merge-all sort is a
two-phase method (local sort and final merging) like parallel merge-all sort, but it
performs a redistribution based on a range partitioning. Figure 4.7 gives an illustration
of parallel redistribution merge-all sort.
As shown in Figure 4.7, parallel redistribution merge-all sort is a two-phase
method, where in phase one, local sort is carried out as is done with other methods,
and in phase two, results from local sort are redistributed to all processors based
on a range partitioning, and merging is then performed by each processor.
Similar to parallel redistribution binary-merge sort, empty (gray) boxes are
actually empty lists resulting from data redistribution. In the above example, processor
[Figure: records from the child operator are locally sorted by four processors; the sorted runs are range-redistributed (ranges 1-5, 6-10, 11-15, 16-20) to all processors in a single step, and each processor then performs the final merge of the fragments it receives, yielding the sorted list.]

Figure 4.7 Parallel redistribution merge-all sort
4 has three empty lists coming from processors 2, 3, and 4, as they do not have
values ranging from 16 to 20 as specified by the range partitioning function.
Also, note that the final results produced in the final merging phase in each
processor are sorted, and these are also sorted among all processors based on the
order of the processors specified by the range partitioning function.
The advantages of this method are the same as those of parallel redistribution
binary-merge sort, including true parallelism in the merging process. However,
the tree of parallel redistribution merge-all sort is not a tall tree as in parallel
redistribution binary-merge sort. It is, in fact, a one-level tree, the same as in
parallel merge-all sort.
Not only do the advantages of parallel redistribution merge-all sort mirror those
of parallel merge-all sort and parallel redistribution binary-merge sort, but so also do
the problems. Skew problems found in parallel redistribution binary-merge sort
also exist with this method. Consequently, skew modeling needs some simplifying
assumptions as well. Additionally, a bottleneck problem in merging, similar
to that of parallel merge-all sort, is also common here, especially if the number
of processors is large and exceeds the limit on the number of files that can be
opened at once.
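Parallel redistribution merge-all sort can be sketched in the same style: local sort, a single range redistribution, then an independent merge in each processor. The range boundaries and helper names are illustrative assumptions.

```python
import heapq

def redistribution_merge_all_sort(partitions, upper_bounds):
    """Phase 1: local sort. Phase 2: each sorted run is split by the
    range partitioning (upper_bounds[i] is the inclusive upper bound of
    processor i's range) and sent to the owning processor, which merges
    the fragments it receives. Concatenating the per-processor results
    in processor order yields the globally sorted list."""
    runs = [sorted(p) for p in partitions]
    inbox = [[] for _ in upper_bounds]     # fragments received per processor
    for run in runs:
        lo = float("-inf")
        for dest, hi in enumerate(upper_bounds):
            inbox[dest].append([v for v in run if lo < v <= hi])
            lo = hi
    # final merge, independently in each processor (some fragments empty)
    return [list(heapq.merge(*frags)) for frags in inbox]
```

With ranges 5, 10, 15, 20 and partitions like those of Figure 4.7, the processor owning range 16-20 receives empty fragments from three of the four processors: the skew produced by redistribution that the text notes above.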
High-Performance Parallel Database Processing and Grid Databases, by David Taniar, Clement Leung, Wenny Rahayu, and Sushant Goel. Copyright 2008 John Wiley & Sons, Inc.