High-Performance Parallel Database Processing and Grid Databases – P11

Chapter 17 Parallel Clustering and Classification

Rec#  Weather       Temperature  Time    Day      Jog (Target Class)
1     Fine          Mild         Sunset  Weekend  Yes
2     Fine          Hot          Sunset  Weekday  Yes
3     Shower        Mild         Midday  Weekday  No
4     Thunderstorm  Cool         Dawn    Weekend  No
5     Shower        Hot          Sunset  Weekday  Yes
6     Fine          Hot          Midday  Weekday  No
7     Fine          Cool         Dawn    Weekend  No
8     Thunderstorm  Cool         Midday  Weekday  No
9     Fine          Cool         Midday  Weekday  Yes
10    Fine          Mild         Midday  Weekday  Yes
11    Shower        Hot          Dawn    Weekend  No
12    Shower        Mild         Dawn    Weekday  No
13    Fine          Cool         Dawn    Weekday  No
14    Thunderstorm  Mild         Sunset  Weekend  No
15    Thunderstorm  Hot          Midday  Weekday  No

Figure 17.11 Training data set

The possible categorical values for weather are fine, shower, and thunderstorm, whereas the possible values for temperature are hot, mild, and cool. Continuous values are real numbers (e.g., the height of a person in centimetres).

Figure 17.11 shows the training data set for the decision tree shown previously. This training data set consists of only 15 records. For simplicity, only categorical attributes are used in this example. Examining the first record and matching it with the decision tree in Figure 17.10, the target is a Yes for fine weather and mild temperature, disregarding the other two attributes. This is because all records in this training data set follow this rule (see records 1 and 10). Other records, such as records 9 and 13, use all four attributes.

17.3.2 Decision Tree Classification: Processes

Decision Tree Algorithm

There are many different algorithms for constructing a decision tree, such as ID3, C4.5, SPRINT, etc. Constructing a decision tree is generally a recursive process. At the start, all training records are at the root node. The algorithm then partitions the training records recursively by choosing one attribute at a time. The process is repeated for the partitioned data set. The recursion stops when a stopping condition is reached, which is when all of the training records in the partition have the same target class label.

Figure 17.12 shows an algorithm for constructing a decision tree. The decision tree construction algorithm uses a divide-and-conquer method. It constructs the tree in a depth-first fashion. Branching can be binary (only 2 branches) or multiway (more than 2 branches).

Algorithm: Decision Tree Construction
Input: training dataset D
Output: decision tree T
Procedure DTConstruct(D):
1.  T ← Ø
2.  Determine best splitting attribute
3.  T ← create root node and label with splitting attribute
4.  T ← add arc to root node for each split predicate with label
5.  For each arc do
6.      D ← dataset created by applying splitting predicate to D
7.      If stopping point reached for this path Then
8.          T' ← create leaf node and label with appropriate class
9.      Else
10.         T' ← DTConstruct(D)
11.     T ← add T' to arc

Figure 17.12 Decision tree algorithm

Note that in the algorithm shown in Figure 17.12, the key element is the splitting attribute selection (line 2). The splitting attribute is the attribute chosen to split the training data set into a number of partitions. The splitting attribute step is also often known as feature selection, because the algorithm needs to select a feature (or an attribute) of the training data set to create a node. As mentioned earlier, choosing a different attribute as a splitting attribute will cause the resulting decision tree to be different. The difference between the decision trees produced by an algorithm lies in how the features, or input attributes, are positioned within the tree.
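The pseudocode in Figure 17.12 maps naturally onto a short recursive function. The sketch below is a minimal illustration and is not the book's code: the dictionary-based tree representation, the record layout (a dict of attribute values plus a "class" key), the majority-class fallback when no attributes remain, and the gain_fn parameter are all assumptions introduced here.

```python
# Minimal sketch of the DTConstruct algorithm in Figure 17.12 (illustrative only).
# A record is a dict of attribute -> value pairs plus a "class" key for the target class.

def build_tree(records, attributes, gain_fn):
    classes = {r["class"] for r in records}

    # Stopping condition: all records in this partition share one target class.
    if len(classes) == 1:
        return {"leaf": classes.pop()}

    # Assumed fallback: if no attributes remain, label the leaf with the majority class.
    if not attributes:
        majority = max(classes, key=lambda c: sum(r["class"] == c for r in records))
        return {"leaf": majority}

    # Line 2: determine the best splitting attribute (e.g., by information gain).
    split_attr = max(attributes, key=lambda a: gain_fn(records, a))
    node = {"attribute": split_attr, "branches": {}}

    # Lines 5-11: one arc per distinct value of the splitting attribute.
    for value in {r[split_attr] for r in records}:
        partition = [r for r in records if r[split_attr] == value]
        remaining = [a for a in attributes if a != split_attr]
        node["branches"][value] = build_tree(partition, remaining, gain_fn)

    return node
```

Here gain_fn can be any scoring function over (records, attribute); using the information gain developed below yields the tree constructed in the walk-through later in this section. How the "best splitting attribute" in line 2 is chosen is exactly the feature selection problem discussed next.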
Hence, choosing a splitting attribute that will result in an optimum decision tree is desirable. The way in which a splitting attribute is determined at a node is described in greater detail in the following.

Splitting Attributes or Feature Selection

When constructing a decision tree, it is necessary to have a means of determining the importance of the attributes for the classification. Hence, a calculation is needed to find the best splitting attribute at a node. All possible splitting attributes are evaluated with a feature selection criterion to find the best attribute. The feature selection criterion does not guarantee the best decision tree, however, because it also relies on the completeness of the training data set and on whether or not the training data set provides enough information.

The main aim of feature selection, or choosing the right splitting attribute at some point in a decision tree, is to create a tree that is as simple as possible and gives the correct classification. Consequently, poor selection of an attribute can result in a poor decision tree.

At each node, the available attributes are evaluated on the basis of separating the classes of the training records. For example, looking at the training records in Figure 17.11, we note that if Time = Dawn, then the answer is always No (see records 4, 7, 11–13). This means that if Time is chosen as the first splitting attribute, at the next stage we do not need to process these 5 records (records 4, 7, 11–13). We need to process only those records with Time = Sunset or Midday (10 records altogether), making the gain for choosing attribute Time as a splitting attribute quite high and hence desirable.

Let us look at another possible attribute, namely, Weather. Notice that when Weather = Thunderstorm, the target class is always No (see records 4, 8, 14–15). If attribute Weather is chosen as the splitting attribute in the beginning, these four records (records 4, 8, 14–15) will not be processed in the next stage; we need to process only the other 11 records. So the gain in choosing attribute Weather as a splitting attribute is not bad, but not as good as that of attribute Time, because Time prunes out a higher number of records.

Therefore, the main goal in choosing the best splitting attribute is to choose the attribute that will prune out as many records as possible at the early stage, so that fewer records need to be processed in the subsequent stages. We can also say that the best splitting attribute is the one that will result in the smallest tree.

There are various kinds of feature selection criteria for determining the best splitting attribute. The basic feature selection criterion is called the gain criterion, which was designed for one of the original decision tree algorithms (i.e., ID3/C4.5). Heuristically, the best splitting attribute produces the "purest" nodes. A popular impurity criterion is information gain. Information gain increases with the average purity of the subsets that an attribute produces. Therefore, the strategy is to choose the attribute that results in the greatest information gain. The gain criterion basically consists of four important calculations.

• Given a probability distribution, the information required to predict an event is the distribution's entropy.
Entropy for the given probability distribution of the target classes, p1, p2, ..., pn, where Σ_{i=1..n} p_i = 1, can be calculated as follows:

entropy(p1, p2, ..., pn) = Σ_{i=1..n} p_i log(1/p_i)   (17.2)

(The numerical examples below use base-10 logarithms.)

Let us use the training data set in Figure 17.11. There are two target classes: Yes and No. Of the 15 records in the training data set, 5 records have target class Yes and the other 10 records have target class No. The probability of falling into Yes is 5/15, whereas the No probability is 10/15. Entropy for the given probability of the two target classes is then calculated as follows:

entropy(Yes, No) = 5/15 × log(15/5) + 10/15 × log(15/10) = 0.2764   (17.3)

At the next iteration, when the training data set is partitioned into smaller subsets, we need to calculate the entropy based on the number of training records in the partition, not the total number of records in the original training data set.

• For each of the possible attributes to be chosen as a splitting attribute, we need to calculate the entropy value for each of the possible values of that particular attribute. Equation 17.2 can be used, but the number of records is not the total number of training records, but rather the number of records possessing the attribute value whose entropy is being calculated. For example, for Weather = Fine, there are 4 records with target class Yes and 3 records with No. Hence the entropy for Weather = Fine is:

entropy(Weather = Fine) = 4/7 × log(7/4) + 3/7 × log(7/3) = 0.2966   (17.4)

For Weather = Shower, there is only 1 record with target class Yes and 3 records with No. Hence the entropy for Weather = Shower is:

entropy(Weather = Shower) = 1/4 × log(4/1) + 3/4 × log(4/3) = 0.2442   (17.5)

Note that the entropy calculations in the two examples above use a different total number of records. For Weather = Fine the number of records is 7, whereas for Weather = Shower the number of records is only 4. This number of records is important, because it affects the probability of having a target class. For example, for target class Yes in Fine weather the probability is 4/7, whereas for the same target class Yes in Shower weather the probability is only 1/4.

For each of the attribute values, we need to calculate the entropy. In other words, for attribute Weather, because there are three attribute values (i.e., Fine, Shower, and Thunderstorm), each of these three values must have an entropy value. For attribute Temperature, for instance, we need an entropy calculated for the values Hot, Mild, and Cool.

• The entropy values for each attribute must be combined with a weighted sum. The aim is that each attribute must have one entropy value. Because each attribute value has an individual entropy value (e.g., attribute Weather has three entropy values, one for each weather condition), and the entropy of each attribute value is based on a different probability distribution, when we combine all the entropy values from the same attribute, their individual weights must be considered. To calculate the weighted sum, each entropy value must be multiplied by the proportion of records having that value out of the total number of training records in the partition. For example, the weighted entropy value for Fine weather is 7/15 × 0.2966.
There are 7 records out of the 15 records with Fine weather, and the entropy for Fine weather is 0.2966, as calculated earlier (see equation 17.4). Using the same method, the weighted entropy for Shower weather is 4/15 × 0.2442, as there are only 4 records out of the 15 records in the training data set with Shower weather, and the entropy for Shower, as calculated in equation 17.5, is 0.2442.

After each individual entropy value has been weighted, we can sum them for each individual attribute. For example, the weighted sum for attribute Weather is:

weighted sum entropy(Weather) = weighted entropy(Fine) + weighted entropy(Shower) + weighted entropy(Thunderstorm)
                              = 7/15 × 0.2966 + 4/15 × 0.2442 + 4/15 × 0
                              = 0.2035   (17.6)

• Finally, the gain for an attribute can be calculated by subtracting the weighted sum of the attribute entropy from the overall entropy. For example, the gain for attribute Weather is:

gain(Weather) = entropy(training data set D) − weighted sum entropy(attribute Weather)
              = 0.2764 − 0.2035 = 0.0729   (17.7)

The first part of equation 17.7 was previously calculated in equation 17.3, whereas the second part of the equation comes from equation 17.6.

After all attributes have their gain values, the attribute that has the highest gain value is chosen as the splitting attribute. After an attribute has been chosen as the splitting attribute, the training data set is partitioned into a number of partitions according to the number of distinct values in the splitting attribute. Once the training data set has been partitioned, the same process as above is repeated for each partition, until all records in a partition fall into the same target class, at which point the process for that partition terminates (refer to Figure 17.12 for the algorithm).

A Walk-Through Example

Using the sample training data set in Figure 17.11, the following gives a complete walk-through of the process of creating a decision tree.

Step 1: Calculate the entropy for the training data set in Figure 17.11. The result was previously calculated as 0.2764 (see equation 17.3).

Step 2: Process attribute Weather
• Calculate the weighted sum entropy of attribute Weather:
  entropy(Fine) = 0.2966 (equation 17.4)
  entropy(Shower) = 0.2442 (equation 17.5)
  entropy(Thunderstorm) = 0 + 4/4 × log(4/4) = 0
  weighted sum entropy(Weather) = 0.2035 (equation 17.6)

• Calculate the information gain for attribute Weather:
  gain(Weather) = 0.0729 (equation 17.7)

Step 3: Process attribute Temperature

• Calculate the weighted sum entropy of attribute Temperature:
  entropy(Hot) = 2/5 × log(5/2) + 3/5 × log(5/3) = 0.2923
  entropy(Mild) = entropy(Hot)
  entropy(Cool) = 1/5 × log(5/1) + 4/5 × log(5/4) = 0.2173
  weighted sum entropy(Temperature) = 5/15 × 0.2923 + 5/15 × 0.2923 + 5/15 × 0.2173 = 0.2674

• Calculate the information gain for attribute Temperature:
  gain(Temperature) = 0.2764 − 0.2674 = 0.009   (17.8)

Step 4: Process attribute Time

• Calculate the weighted sum entropy of attribute Time:
  entropy(Dawn) = 0 + 5/5 × log(5/5) = 0
  entropy(Midday) = 2/6 × log(6/2) + 4/6 × log(6/4) = 0.2764
  entropy(Sunset) = 3/4 × log(4/3) + 1/4 × log(4/1) = 0.2443
  weighted sum entropy(Time) = 5/15 × 0 + 6/15 × 0.2764 + 4/15 × 0.2443 = 0.1757

• Calculate the information gain for attribute Time:
  gain(Time) = 0.2764 − 0.1757 = 0.1007   (17.9)

Step 5: Process attribute Day

• Calculate the weighted sum entropy of attribute Day:
  entropy(Weekday) = 4/10 × log(10/4) + 6/10 × log(10/6) = 0.2923
  entropy(Weekend) = 1/5 × log(5/1) + 4/5 × log(5/4) = 0.2173
  weighted sum entropy(Day) = 10/15 × 0.2923 + 5/15 × 0.2173 = 0.2674

• Calculate the information gain for attribute Day:
  gain(Day) = 0.2764 − 0.2674 = 0.009   (17.10)

Figure 17.13 Attribute Time as the root node (Time branches to Dawn: leaf No; Midday: partition D1; Sunset: partition D2)

Comparing equations 17.7, 17.8, 17.9, and 17.10 for the gains of the four attributes (Weather, Temperature, Time, and Day), the biggest gain is that of Time, with gain value = 0.1007 (see equation 17.9); as a result, attribute Time is chosen as the first splitting attribute. A partial decision tree with the root node Time is shown in Figure 17.13.

The next stage is to process partition D1, consisting of the records with Time = Midday. Training data set partition D1 consists of 6 records, with record numbers 3, 6, 8, 9, 10, and 15. The next task is to determine the splitting attribute for partition D1, whether it is Weather, Temperature, or Day. The process, similar to the above, of calculating the entropy and information gain is summarized as follows (a short code check of these figures follows Step 3 below):

Step 1: Calculate the entropy for training data set partition D1.
  entropy(D1) = 2/6 × log(6/2) + 4/6 × log(6/4) = 0.2764   (17.11)

Step 2: Process attribute Weather

• Calculate the weighted sum entropy of attribute Weather:
  entropy(Fine) = 2/3 × log(3/2) + 1/3 × log(3/1) = 0.2764
  entropy(Shower) = entropy(Thunderstorm) = 0
  weighted sum entropy(Weather) = 3/6 × 0.2764 = 0.1382

• Calculate the information gain for attribute Weather:
  gain(Weather) = 0.2764 − 0.1382 = 0.1382   (17.12)

Step 3: Process attribute Temperature

• Calculate the weighted sum entropy of attribute Temperature:
  entropy(Hot) = 0
  entropy(Mild) = entropy(Cool) = 1/2 × log(2/1) + 1/2 × log(2/1) = 0.3010
  weighted sum entropy(Temperature) = 2/6 × 0.3010 + 2/6 × 0.3010 = 0.2006

• Calculate the information gain for attribute Temperature:
  gain(Temperature) = 0.2764 − 0.2006 = 0.0758   (17.13)
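Before completing the walk-through for partition D1 with attribute Day, note that every entropy and gain figure above can be reproduced with a few lines of code. The following Python sketch is illustrative and not from the book; the record layout and the helper names entropy and gain are assumptions, and base-10 logarithms are used because that is what the book's numerical results imply.

```python
import math

# Training data set from Figure 17.11: (Weather, Temperature, Time, Day, Jog).
RECORDS = [
    ("Fine", "Mild", "Sunset", "Weekend", "Yes"), ("Fine", "Hot", "Sunset", "Weekday", "Yes"),
    ("Shower", "Mild", "Midday", "Weekday", "No"), ("Thunderstorm", "Cool", "Dawn", "Weekend", "No"),
    ("Shower", "Hot", "Sunset", "Weekday", "Yes"), ("Fine", "Hot", "Midday", "Weekday", "No"),
    ("Fine", "Cool", "Dawn", "Weekend", "No"), ("Thunderstorm", "Cool", "Midday", "Weekday", "No"),
    ("Fine", "Cool", "Midday", "Weekday", "Yes"), ("Fine", "Mild", "Midday", "Weekday", "Yes"),
    ("Shower", "Hot", "Dawn", "Weekend", "No"), ("Shower", "Mild", "Dawn", "Weekday", "No"),
    ("Fine", "Cool", "Dawn", "Weekday", "No"), ("Thunderstorm", "Mild", "Sunset", "Weekend", "No"),
    ("Thunderstorm", "Hot", "Midday", "Weekday", "No"),
]
ATTRS = {"Weather": 0, "Temperature": 1, "Time": 2, "Day": 3}  # column index of each attribute

def entropy(records):
    """Equation 17.2 with base-10 logarithms: sum of p_i * log(1/p_i) over the target classes."""
    total = len(records)
    counts = {}
    for r in records:
        counts[r[4]] = counts.get(r[4], 0) + 1
    return sum((c / total) * math.log10(total / c) for c in counts.values())

def gain(records, attr):
    """Information gain = entropy of the partition minus the weighted-sum entropy of attr."""
    col, total = ATTRS[attr], len(records)
    weighted = 0.0
    for value in {r[col] for r in records}:
        subset = [r for r in records if r[col] == value]
        weighted += (len(subset) / total) * entropy(subset)
    return entropy(records) - weighted

print(round(entropy(RECORDS), 4))          # 0.2764  (equation 17.3)
for a in ATTRS:                            # Weather 0.0729, Temperature 0.0091,
    print(a, round(gain(RECORDS, a), 4))   # Time 0.1007, Day 0.0091
d1 = [r for r in RECORDS if r[2] == "Midday"]
print(round(gain(d1, "Weather"), 4))       # 0.1382  (equation 17.12)
```

Run as is, the sketch prints 0.2764 for the overall entropy, gains of 0.0729, 0.0091, 0.1007, and 0.0091 for Weather, Temperature, Time, and Day (matching equations 17.7–17.10 after rounding), and 0.1382 for the gain of Weather on partition D1 (equation 17.12).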
Step 4: Process attribute Day

• Calculate the weighted sum entropy of attribute Day:
  entropy(Weekday) = 2/6 × log(6/2) + 4/6 × log(6/4) = 0.2764
  entropy(Weekend) = 0
  weighted sum entropy(Day) = 0.2764

• Calculate the information gain for attribute Day:
  gain(Day) = 0.2764 − 0.2764 = 0   (17.14)

The best splitting attribute for partition D1 is therefore attribute Weather, with an information gain value of 0.1382 (see equation 17.12). Continuing from Figure 17.13, Figure 17.14 shows the temporary decision tree.

Figure 17.14 Attribute Weather as the next splitting attribute (under Midday, Weather branches to Fine: partition D11; Shower: No; Thunderstorm: No; the Dawn branch is No and the Sunset branch is partition D2)

For partition D2, the splitting attribute is also Weather. The entropy and information gain calculations are summarized as follows:

  entropy(D2) = 0.2443
  weighted sum entropy(Weather) = 0
  gain(Weather) = 0.2443  ⇒ highest information gain
  weighted sum entropy(Temperature) = 0.1505
  gain(Temperature) = 0.0938
  weighted sum entropy(Day) = 0.1505
  gain(Day) = 0.0938

And for partition D11, the splitting attribute is Temperature. The entropy and information gain calculations are summarized as follows:

  entropy(D11) = 0.2764
  weighted sum entropy(Temperature) = 0
  gain(Temperature) = 0.2764  ⇒ highest information gain
  weighted sum entropy(Day) = 0.2764
  gain(Day) = 0

Because each of the partitions now has branches that reach a target class node, a complete decision tree has been generated. Figure 17.15 shows the final decision tree.

Figure 17.15 Final decision tree (Time is the root: Dawn leads to No; Midday leads to Weather, where Fine leads to Temperature with Hot: No, Mild: Yes, Cool: Yes, while Shower and Thunderstorm lead to No; Sunset leads to Weather, where Fine and Shower lead to Yes and Thunderstorm leads to No)

Note that the decision tree in Figure 17.15 looks different from the decision tree in Figure 17.10, and yet both correctly represent all rules from the training data set in Figure 17.11. The decision tree in Figure 17.15 is more compact and is better than the one previously shown in Figure 17.10. Also note that Figure 17.15 does not use attribute Day as a splitting attribute at all (as the training data set is limited); all rules can be generated without the need for attribute Day.

17.3.3 Decision Tree Classification: Parallel Processing

Since the structure of a decision tree is similar to that of a query tree in query optimization, parallelization of a decision tree is quite similar to subqueries execution scheduling in parallel query optimization (refer to Chapter 9). In subqueries execution scheduling for query tree optimization, there are serial subqueries execution scheduling and parallel subqueries execution scheduling, whereas for parallel data mining, this chapter introduces data parallelism and result parallelism. A parallel decision tree combines both concepts, subqueries execution scheduling and parallel data mining, because both deal with tree parallelism. Data parallelism for a decision tree is basically similar to serial subqueries execution scheduling, whereas result parallelism is identical to parallel subqueries execution scheduling. Both data parallelism and result parallelism for a decision tree are described below.
Data Parallelism for Decision Tree

Many terms are used to describe data parallelism for a decision tree, including synchronous tree construction, feature/attribute partitioning, or intra-tree node parallelism. All of these basically describe data parallelism from a different angle. As we discuss data parallelism for a decision tree, it will become clear how each of these names arises.

Data parallelism is created by data partitioning. Previously, particularly in parallel association rules, parallel sequential patterns, and parallel clustering, data parallelism employed horizontal data partitioning, whereby different records from the data set are distributed to different processors. Each processor holds a disjoint partitioned data set, each of which consists of a number of records with the complete set of attributes.

Data parallelism for a decision tree employs another type of data partitioning, namely, vertical data partitioning. Note that basic data partitioning, covering horizontal and vertical data partitioning, was explained in Chapter 3 on the parallel search operation (or parallel selection operation). For a parallel decision tree using data parallelism, the training data set is vertically partitioned, so that each partition will have one or more feature attributes, the target class, and the record number. In other words, the feature attributes are vertically partitioned, but the record number and target class are replicated to all partitions. Figure 17.16 illustrates the vertical data partitioning of a training data set. The target class needs to be replicated to all partitions because only by having the target class can the partitions be glued together. The record numbers will be used in the subsequent iterations of building the tree, as the partition size will shrink because of further partitioning of each partition.

In data parallelism for a decision tree, as in any other data parallelism, the complete temporary result, in this case the decision tree, is maintained in each processor. In other words, at the end of each stage of building the decision tree, the same temporary decision tree will exist in all processors. This is the same as in any other data parallelism, such as data parallelism for association rules, where in count distribution, at the end of each iteration, the frequent itemset is the same for each processor. This is also the same as in data parallelism for k-means clustering, where each processor will have the same clusters at the end of each iteration.

Figure 17.17 shows an illustration of data parallelism for a decision tree. At level 1, the root node is processed and determined. At the end of level 1, each processor will have the same root node. At level 2, if the root node has n branches, there will be n sublevels of level 2. In the example shown in Figure 17.17, there are 3 branches from the root node. Consequently, there will be levels 2a, 2b, and 2c. Each sublevel of level 2 will be [...]
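As a rough illustration of the vertical partitioning described above (not the book's Figure 17.16, which is not reproduced in this excerpt), the following sketch splits the Figure 17.11 training set column-wise across processors, replicating the record number and target class in every partition. The round-robin attribute-to-processor assignment and the data structures are assumptions made for the example.

```python
# Illustrative sketch of vertical data partitioning for a parallel decision tree.
# Each "processor" receives a subset of the feature attributes, plus the record
# number and the target class, which are replicated to every partition.

def vertical_partition(records, feature_columns, num_processors):
    """records: list of dicts with 'rec', feature attributes, and 'class' keys."""
    # Assign feature attributes to processors round-robin (an assumed strategy).
    assignment = {p: [] for p in range(num_processors)}
    for i, col in enumerate(feature_columns):
        assignment[i % num_processors].append(col)

    partitions = {}
    for p, cols in assignment.items():
        partitions[p] = [
            {"rec": r["rec"], **{c: r[c] for c in cols}, "class": r["class"]}
            for r in records
        ]
    return partitions

# Example with the first two records of Figure 17.11 and two processors:
sample = [
    {"rec": 1, "Weather": "Fine", "Temperature": "Mild", "Time": "Sunset",
     "Day": "Weekend", "class": "Yes"},
    {"rec": 2, "Weather": "Fine", "Temperature": "Hot", "Time": "Sunset",
     "Day": "Weekday", "class": "Yes"},
]
parts = vertical_partition(sample, ["Weather", "Temperature", "Time", "Day"], 2)
# Processor 0 holds (rec, Weather, Time, class); processor 1 holds (rec, Temperature, Day, class).
```

With partitions of this shape, each processor can evaluate split quality only for its own attributes and exchange the resulting gain values, after which every processor attaches the same chosen node to its local copy of the tree, which is broadly how the synchronous tree construction described above keeps the same temporary tree on every processor.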