180 Chapter 7 Parallel Indexing
the FRI-1 structure. Note that the global index is replicated to the three processors.
In the diagram, the data pointers are not shown. However, one can imagine that
each key in the leaf nodes has a data pointer going to the correct record, and each
record will have three incoming data pointers.
FRI-3 is quite similar to PRI-1, except that in FRI-3 the table is not partitioned on the indexed attribute. For example, the table partitioning is based on
the Name field and uses range partitioning, whereas the index is on the ID field.
The similarity is that the index is fully replicated, and each of the records
will also have n incoming data pointers, where n is the number of replicas of the
index. Figure 7.13 shows an example of FRI-3. Once again, the data pointers
are not shown in the diagram.
It is clear from the two variations discussed above (i.e., FRI-1 and FRI-3) that
variation 2 is not applicable to the FRI structures, because the index is fully replicated.
The other variation-2 structures (i.e., NRI-2 and PRI-2) exist because the index
is partitioned, and the part of the global index on a particular processor is built upon
the records located at that processor. If the index is fully replicated, no such
structure can exist, because the index located at a processor cannot be built
purely from the records located at that processor alone. This is why FRI-2 does not
exist.
7.3 INDEX MAINTENANCE
In this section, we examine the various issues and complexities related to main-
taining different parallel index structures. Index maintenance covers insertion and
deletion of index nodes. The general steps for index maintenance are as follows:
- Insert/delete a record to the table (carried out in processor p1),
- Insert/delete an index node to/from the index tree (carried out in processor p2), and
- Update the data pointers.
In the last step above, if it is an insertion operation, a data pointer is created
from the new index key to the new inserted record. If it is a deletion operation, a
deletion of the data pointer takes place.
Parallel index maintenance essentially concerns the following two issues:
- Whether p1 = p2. This relates to the data pointer complexity.
- Whether maintaining an index (insert or delete) involves multiple processors. This issue relates to the restructuring of the index tree itself.
The simplest form of index maintenance is where p1 = p2 and the insertion/deletion of an index node involves a single processor only. These two issues
for each of the parallel indexing structures are discussed next.
[Figure 7.13 FRI-3 (index partitioning attribute ≠ table partitioning attribute): three processors, table range-partitioned on the Name field, with the full index on the ID field replicated to each processor. Figure not reproduced.]
7.3.1 Maintaining a Parallel Nonreplicated Index
Maintenance of the NRI structures basically involves a single processor. Hence,
the question is really whether p1 is equal to p2. For the NRI-1 and NRI-2 structures,
p1 = p2. Accordingly, these two parallel indexing structures are the simplest
form of parallel index. The mechanism of index maintenance for these two parallel
indexing structures is carried out as per normal index maintenance on sequential
processors. The insertion and deletion procedures are summarized as follows.
After a new record has been inserted into the appropriate processor, a new index
key is inserted into the index tree at the same processor. The index key insertion
steps are as follows. First, search the index tree for the appropriate leaf node for
the new key. Then insert the new key entry into this leaf node if there is still
space in the node. However, if the node is already full, the leaf node must be
split into two leaf nodes: the first half of the entries are kept in the original leaf
node, and the remaining entries are moved to a new leaf node. The last entry of
the first of the two leaf nodes is copied to the nonleaf parent node. Furthermore, if
the nonleaf parent node is also full, it too has to be split into two nonleaf nodes,
similar to what occurred with the leaf nodes. The only difference is that the last
entry of the first node is not copied to the parent node, but is moved. Finally, a
data pointer is established from the new key in the leaf node to the record located
at the same processor.
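The split-on-overflow step described above can be sketched as follows. This is a simplified, hypothetical model of a single B+-tree leaf node; the node capacity, function names, and the parent-update step are illustrative assumptions, not the book's implementation:

```python
import bisect

CAPACITY = 4  # assumed maximum number of keys per leaf node

def insert_into_leaf(leaf, key):
    """Insert a key into a sorted leaf; return (leaf, new_leaf, copied_up).

    On overflow the leaf is split in two, and the last key of the first
    half is *copied* up to the (not modeled here) nonleaf parent node.
    """
    bisect.insort(leaf, key)
    if len(leaf) <= CAPACITY:          # still fits: no split needed
        return leaf, None, None
    mid = (len(leaf) + 1) // 2         # first half keeps ceil(n/2) entries
    first, second = leaf[:mid], leaf[mid:]
    return first, second, first[-1]    # last key of first half goes to parent

first, second, up = insert_into_leaf([18, 23, 37, 46], 21)
print(first, second, up)   # [18, 21, 23] [37, 46] 23
```

Note that for a nonleaf split the separator key would be moved up rather than copied, exactly as the text points out.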
The deletion process is similar to that for insertion. First, delete the record, and
then delete the desired key from the leaf node in the index tree (the data pointer is
deleted as well). When deleting the key from a leaf node, the node may underflow
after the deletion. In this case, try to find a sibling leaf node (a leaf node directly
to the left or right of the underflowing node) and redistribute the entries between
the node and its sibling so that both are at least half full; otherwise, the node is
merged with its sibling and the number of leaf nodes is reduced.
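The redistribute-or-merge choice on underflow can be sketched as follows, again as a minimal model of one leaf and one sibling (capacity and fill thresholds are assumptions for illustration):

```python
CAPACITY = 4
MIN_FILL = CAPACITY // 2   # each node should stay at least half full

def delete_from_leaf(leaf, sibling, key):
    """Delete a key; on underflow, merge with the sibling if the combined
    entries fit in one node, otherwise redistribute them evenly."""
    leaf = [k for k in leaf if k != key]
    if len(leaf) >= MIN_FILL:               # no underflow: done
        return leaf, sibling
    combined = sorted(leaf + sibling)
    if len(combined) <= CAPACITY:           # merge: one fewer leaf node
        return combined, None
    mid = len(combined) // 2                # redistribute between the two
    return combined[:mid], combined[mid:]

print(delete_from_leaf([18], [23, 37], 18))              # merged
print(delete_from_leaf([18], [23, 37, 46, 59, 60], 18))  # redistributed
```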
Maintenance of the NRI-3 structure is more complex because p1 ≠ p2. This
means that the location of the record to be inserted/deleted may be different from
that of the index node insertion/deletion. The complexity of this kind of index maintenance
is that the data pointer crosses the processor boundary. So, after both the
record and the index entry (key) have been inserted, the data pointer from the new
index entry in p1 has to be established to the record in p2. Similarly, in the deletion,
after the record and the index entry have been deleted (and the index tree
is restructured), the data pointer from p1 to p2 has to be deleted as well. Despite
some degree of complexity, there is only one data pointer from each entry in the leaf
nodes to the actual record.
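The cross-processor data pointer idea can be made concrete with a small sketch. Both partitioning functions below are illustrative assumptions (range partitioning of IDs 0-99 for the index, first-letter ranges for the table), not the book's definitions; the point is only that the two functions place the index entry and the record on different processors:

```python
N_PROCS = 3

def index_processor(key):
    """Assumed range partitioning of the index on ID (keys 0-99)."""
    return min(key * N_PROCS // 100, N_PROCS - 1)

def record_processor(name):
    """Assumed range partitioning of the table on the Name field."""
    return (ord(name[0].upper()) - ord('A')) * N_PROCS // 26

def insert(key, name):
    p_record = record_processor(name)   # p1: processor holding the record
    p_index = index_processor(key)      # p2: processor holding the index entry
    data_pointer = (p_record, key)      # stored with the index entry on p2
    return p_record, p_index, data_pointer

p1, p2, ptr = insert(21, "Larry")
print(p1, p2, p1 != p2)   # the pointer crosses the boundary when p1 != p2
```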
7.3.2 Maintaining a Parallel Partially Replicated Index
Following the first issue on p1 = p2 mentioned in the previous section, maintenance
of the PRI-1 and PRI-2 structures is similar to that of NRI-1 and NRI-2, where
p1 = p2. Hence, there is no additional difficulty in data pointer maintenance. PRI-3
is also similar to NRI-3; that is, p1 ≠ p2. In other words, data pointer
maintenance for PRI-3 has the same complexity as that for NRI-3, where the data
pointer may cross from one processor (index node) to another processor
(record).
The main difference between the PRI and NRI structures is very much related to
the second issue on single/multiple processors being involved in index restructur-
ing. Unlike the NRI structures, where only single processors are involved in index
maintenance, the PRI structures require multiple processors to be involved. Hence,
the complexity of index maintenance for the PRI structures is now moved to index
restructuring, not so much on data pointers.
To understand the complexity of index restructuring for the PRI structures, con-
sider the insertion of entry 21 to the existing index (assume the PRI-1 structure is
used). In this example, we show three stages of the index insertion process. The
stages are (i) the initial index tree and the desired insertion of the new entry to the
existing index tree, (ii) the splitting node mechanism, and (iii) the restructuring of
the index tree.
The initial index tree position is shown in Figure 7.14(a). When a new entry of
21 is inserted, the first leaf node becomes overflow. A split of the overflow leaf
node is then carried out. The split action also causes the nonleaf parent node to be
overflow, and subsequently, a further split must be performed to the parent node
(see Fig. 7.14(b)).
Note that when splitting the leaf node, the two split leaf nodes are replicated to
processors 1 and 2, although the first leaf node after the split contains entries of the
first processor only (18 and 21; the range of processor 1 is 1-30). This is because
the original leaf node (18, 23, 37) had already been replicated to both processors 1
and 2. The two new leaf nodes have a node pointer linking them together.
When splitting the nonleaf node (37, 48, 60) into two nonleaf nodes (21; 48,
60), processor 3 is involved because the root node is also replicated to processor 3.
In the implementation, this can be tricky as processor 3 needs to be informed that
it must participate in the splitting process. An algorithm is presented at the end of
this section.
The final step is the restructuring step. This step is necessary because we need to
ensure that each node has been allocated to the correct processors. Figure 7.14(c)
shows a restructuring process. In this restructuring, the processor allocation is
updated. This is done by performing an in-order traversal of the tree, finding the
range of the node (min, max), determining the correct processor(s), and reallocat-
ing to the designated processor(s). When reallocating the nodes to processor(s),
each processor will also update the node pointers, pointing to its local or neighbor-
ing child nodes. Note that in the example, as a result of the restructuring, leaf node
(18, 21) is now located in processor 1 only (instead of processors 1 and 2).
Next, we present an example of a deletion process, which affects the index
structure. In this example, we would like to delete entry 21, expecting to get the
original tree structure shown previously before entry 21 is inserted. Figure 7.15
shows the current tree structure and the merge and collapse processes.
[Figure 7.14 Index entry insertion in the PRI structures: (a) initial tree, (b) split of the leaf node and of the nonleaf node after inserting 21, (c) restructure (processor reallocation). Figure not reproduced.]
As shown in Figure 7.15(a), after the deletion of entry 21, leaf node (18)
becomes underflow. A merging with its sibling leaf node needs to be carried out.
When merging two nodes, the processor(s) that own the new node are the union
of all processors owning the two old nodes. In this case, since node (18) is located
in processor 1 and node (23, 37) is in processors 1 and 2, the new merged node
[Figure 7.15 Index entry deletion in PRI structures: (a) initial tree, (b) merge of leaf nodes after deleting 21, (c) collapse of nonleaf nodes. Figure not reproduced.]
(18, 23, 37) should be located in processors 1 and 2. Also, as a consequence of the
merging, the immediate nonleaf parent node entry has to be modified in order to
identify the maximum value of the leaf node, which is now 37, not 21. As shown
in Figure 7.15(b), the right node pointer of the nonleaf parent node (37) becomes
void. Because nonleaf node (37) has the same entry as its parent node (root node
(37)), they have to be collapsed together, and consequently a new nonleaf node
(37, 48, 60) is formed (see Fig. 7.15(c)).
The restructuring process is the same as for the insertion process. In this
example, however, processor allocation has been done correctly and hence,
restructuring is not needed.
Maintenance Algorithms
As described above, maintenance of the PRI structures relates to splitting and
merging nodes when performing an insertion or deletion operation and to restruc-
turing and reallocating nodes after a split/merge has been done. The insertion and
deletion of a key from an index tree are preceded by a searching of the node where
the desired key is located. Algorithm find_node illustrates a key searching procedure
on an index tree. The find_node algorithm is a recursive algorithm. It basically
starts from a root node and traces down to the desired leaf node either at the local
or a neighboring processor, by recursively calling find_node and passing a child
subtree on the same processor or following the trace to a different processor.
Once the node has been found, an insert or delete operation can be performed.
After an operation has been carried out on a designated leaf node, if the node
overflows (in the case of insertion) or underflows (in the case of deletion), a split
or a merge operation must be performed on the node. Splitting or merging nodes is
performed in the same manner as in single-processor systems (i.e., single-processor
B+-trees).
The difficult part of the find_node algorithm is that when splitting/merging
nonleaf nodes, sometimes more processors need to be involved in addition to those
initially used. For example, in Figure 7.14(a) and (b), at first processors 1 and 2
are involved in inserting key 21 into the leaf nodes. Inserting entry 21 into the root
node involves processor 3 as well, since the root node is also replicated to processor 3.
The problem is how processor 3 is notified to perform such an operation
when only processors 1 and 2 were involved in the beginning. This is solved by
activating the find_node algorithm in each processor. Processor 1 will ultimately
find the desired leaf node (18, 23, 37) in the local processor, and so will processor 2.
Processor 3, however, will pass the operation to processor 2, as the desired leaf
node (18, 23, 37) located in processor 2 is referenced by the root node in processor 3.
After the insertion operation (and the split operation) on the leaf nodes
(18, 23, 37) located at processors 1 and 2 has been completed, program control
is passed back to the root node. This is due to the nature of a recursive algorithm,
where the initial copy of the algorithm resumes when the child copy of the
process has completed. Since all processors were activated at the beginning of
the find_node operation, each processor can now perform a split process (because of
the overflow at the root node). In other words, there is no special process whereby
an additional processor (in this case processor 3) needs to be invited or notified
to be involved in the splitting of the root node. Everything is a consequence of the
recursive nature of the algorithm, which was initiated in each processor. Figure 7.16
lists the find_node algorithm.
After the find_node algorithm (with an appropriate operation: insert or delete),
it is sometimes necessary to restructure the index tree (as shown in Fig. 7.14(c)).
The restructure algorithm (Fig. 7.17) is composed of three algorithms. The
main restructure algorithm calls the inorder algorithm, where the traversal
is done. The inorder traversal is a modified version of the traditional inorder
traversal, because an index tree is not a binary tree.
For each visit to a node in the inorder algorithm, the proc_alloc algorithm
is called to check whether the right processor has been allocated to each node.
The check in the proc_alloc algorithm basically determines whether or not the
current node should be located at the current processor. If not, the node is deleted
(in the case of a leaf node). If it is a nonleaf node, a more careful check must be
done, because even when the range of (min, max) is not
Algorithm: Find a node (initiated in each processor)
find_node (tree, key, operation)
1. if (key is in the range of local node)
2.   if (local node is leaf)
3.     execute operation insert or delete on local node
4.     if (node is overflow or underflow)
5.       perform split or merge on leaf
6.   else
7.     locate child tree
8.     perform find_node (child, key, operation)
9.     if (node is overflow or underflow)
10.      perform split or collapse on non-leaf
11. else
12.   locate child tree in neighbour
13.   perform find_node (neighbour, key, operation)
Figure 7.16 Find a node algorithm
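The local part of the find_node idea can be sketched in runnable form as follows. The node layout is an assumption made for illustration; a real implementation would replace the `local` flag and the `None` return with message passing to the neighboring processor:

```python
class Node:
    """Minimal index-tree node: a leaf has no children; `local` marks
    whether this node's copy is stored on the current processor."""
    def __init__(self, keys, children=None, local=True):
        self.keys = keys            # sorted separator keys / leaf keys
        self.children = children    # None for a leaf node
        self.local = local

def find_node(node, key, operation):
    """Trace from the root to the leaf covering `key`; apply the operation
    locally, or return None to signal a hand-off to another processor."""
    if node.children is None:               # reached a leaf
        if node.local:
            operation(node)                 # the insert/delete itself
            return node
        return None                         # leaf lives on another processor
    for i, sep in enumerate(node.keys):     # descend into the covering child
        if key <= sep:
            return find_node(node.children[i], key, operation)
    return find_node(node.children[-1], key, operation)

leaf1 = Node([18, 23, 37])
leaf2 = Node([46, 48], local=False)
root = Node([37], [leaf1, leaf2])
found = find_node(root, 21, lambda n: (n.keys.append(21), n.keys.sort()))
print(found.keys)   # [18, 21, 23, 37]
```

Because the recursion unwinds back to the root after the leaf operation, overflow handling at upper levels happens on the way out, matching the behavior described in the text.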
Algorithm: Index restructuring algorithms
restructure (tree) // Restructure in each local processor
1. perform inorder (tree)

inorder (tree) // Inorder traversal for non-binary trees (like B+-trees)
1. if (local tree is not null)
2.   for i = 1 to number of node pointers
3.     perform inorder (tree -> node pointer i)
4.   perform proc_alloc (node)

proc_alloc (node) // Processor allocation
1. if (node is leaf)
2.   if ((min, max) is not within the range)
3.     delete node
4. if (node is non-leaf)
5.   if (all node pointers are either void or point to non-local nodes)
6.     delete node
7.   if (a node pointer is void)
8.     re-establish node pointer to a neighbor
Figure 7.17 Index restructuring algorithms
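The leaf-level check inside proc_alloc can be sketched as follows. The range values mirror the running example (processor 1 owning keys 1-30); the function names and the overlap test are illustrative assumptions:

```python
def in_range(node_min, node_max, proc_lo, proc_hi):
    """Does the node's (min, max) key range overlap the processor's range?"""
    return node_min <= proc_hi and node_max >= proc_lo

def proc_alloc_leaf(keys, proc_lo, proc_hi):
    """Keep a leaf only if its key range overlaps the processor's range;
    return None to indicate the leaf should be deleted locally."""
    if in_range(min(keys), max(keys), proc_lo, proc_hi):
        return keys
    return None

# Processor 1 is assumed to own the range 1-30, as in the running example:
print(proc_alloc_leaf([18, 21], 1, 30))    # kept -> [18, 21]
print(proc_alloc_leaf([46, 48], 1, 30))    # deleted -> None
```

For nonleaf nodes the decision is subtler, as the surrounding text explains: a nonleaf node survives as long as any of its child pointers still leads to a locally allocated node.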
exactly within the range of the current processor, the node should not necessarily
be removed from this processor, as its child nodes may have been correctly
allocated to it. Only when the current nonleaf node has no child nodes on this
processor should the nonleaf node be deleted; otherwise, a correct node
pointer should be reestablished.
7.3.3 Maintaining a Parallel Fully Replicated Index
As an index is fully replicated to all processors, the main difference between NRI
and FRI structures is that in FRI structures, the number of data pointers coming
from an index leaf node to the record is equivalent to the number of processors.
This certainly increases the complexity of maintenance of data pointers.
In regard to involving multiple processors in index maintenance, it is not as
complicated as in the PRI structures, because in the FRI structures the index in
each processor is totally isolated and is not coupled as in the PRI structures. As a
result, any extra complication relating to index restructuring in the PRI structures
does not exist here. In fact, index maintenance of the FRI structures is similar to
that of the NRI structures, as all indexes are local to each processor.
7.3.4 Complexity Degree of Index Maintenance
The order of the complexity of parallel index maintenance, from the simplest to
the most complex, is as follows.
- The simplest forms are the NRI-1 and NRI-2 structures, as p1 = p2 and only a single processor is involved in index maintenance (insert/delete).
- The next complexity level concerns data pointer maintenance, especially when the index node location is different from the base data location. The simpler case is the NRI-3 structure, where the data pointer from an index entry to the record is 1-to-1. The more complex case is the FRI structures, where the data pointers are N-to-1 (from N index nodes to 1 record).
- The highest complexity level concerns index restructuring. This applies to all three PRI structures.
7.4 INDEX STORAGE ANALYSIS
Even though disk technology and disk capacity are expanding, it is important to
analyze space requirements of each parallel indexing structure. When examining
index storage capacity, we cannot exclude record storage capacity. Therefore, it
becomes important to include a discussion on the capacity of the base table, and to
allow a comparative analysis between index and record storage requirement.
In this section, the storage cost models for uniprocessors are first described. These
models are very important, as they will be used as a foundation for the index
storage models for parallel processors. The storage model for each of the three
parallel indexing structures is described next.
7.4.1 Storage Cost Models for Uniprocessors
There are two storage cost models for uniprocessors: one for the record and the
other for the index.
Record Storage
There are two important elements in calculating the space required to store records
of a table. The first is the length of each record, and the second is the blocking fac-
tor. Based on these two elements, we can calculate the number of blocks required
to store all records.
The length of each record is the sum of the lengths of all fields, plus one byte for
a deletion marker (Equation 7.1). The latter is used by the DBMS to mark records
that have been logically deleted but not yet physically removed, so that a
rollback operation can easily be performed by removing the deletion code of that
record.

Record length = Sum of all field lengths + 1 byte (deletion marker)   (7.1)
The storage unit used by a disk is a block. A blocking factor indicates the maximum
number of records that can fit into a block (Equation 7.2).

Blocking factor = floor(Block size / Record length)   (7.2)

Given the number of records in each block (i.e., the blocking factor), the number of
blocks required to store all records can be calculated as follows:

Total blocks for all records = ceiling(Number of records / Blocking factor)   (7.3)
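Equations 7.1-7.3 can be worked through with a small example. The field sizes, block size, and record count below are assumptions chosen for illustration (e.g., ID, Name, and Address fields totalling 99 bytes, 4 KB blocks, 100,000 records):

```python
import math

def record_length(field_lengths):
    """Eq. 7.1: sum of field lengths plus a 1-byte deletion marker."""
    return sum(field_lengths) + 1

def blocking_factor(block_size, rec_len):
    """Eq. 7.2: floor of block size over record length."""
    return block_size // rec_len

def total_blocks(n_records, bfr):
    """Eq. 7.3: ceiling of record count over blocking factor."""
    return math.ceil(n_records / bfr)

rec_len = record_length([4, 45, 50])              # assumed field sizes
bfr = blocking_factor(4096, rec_len)
print(rec_len, bfr, total_blocks(100_000, bfr))   # 100 40 2500
```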
Index Storage
There are two main parts of an index tree, namely leaf nodes and nonleaf nodes.
Storage cost models for leaf nodes are as follows. First, we need to identify the
number of entries in a leaf node. Then, the total number of blocks for all leaf nodes
can be determined. Each leaf node consists of a number of indexed attributes (i.e.,
key), and each key in a leaf node has a data pointer pointing to the corresponding
record. Each leaf node has also one node pointer pointing to the next leaf node.
Each leaf node is normally stored in one disk block. Therefore, it is important to
find out the number of keys (and their data pointers) that can fit into one disk block
(or one leaf node). Equation 7.4 shows the relationship between number of keys in
a leaf node and the size of each leaf node.
(p_leaf × (Key size + Data pointer size)) + Node pointer size ≤ Block size   (7.4)
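Solving Equation 7.4 for the largest p_leaf gives the leaf-node capacity directly. The key and pointer sizes below are assumptions for illustration:

```python
def max_keys_per_leaf(block_size, key_size, data_ptr, node_ptr):
    """Largest p_leaf with p*(key + data pointer) + node pointer <= block size
    (Eq. 7.4 solved for p_leaf)."""
    return (block_size - node_ptr) // (key_size + data_ptr)

# Assumed sizes: 4 KB block, 8-byte key, 6-byte data and node pointers.
p_leaf = max_keys_per_leaf(block_size=4096, key_size=8,
                           data_ptr=6, node_ptr=6)
print(p_leaf)   # 292
```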
[...]
7.5 PARALLEL PROCESSING OF SEARCH QUERIES
USING INDEX
In this section, we consider the parallel processing of search queries involving
index predicates on multiple indexed attributes.
7.5.1 Parallel One-Index Search Query Processing
Parallel processing of a one-index selection query exists in