O'Reilly Network For Information About's Book, part 140

Free at Last: An Efficient Method of Handling Variable-Length Records

Introduction

In this chapter we will develop an algorithm that uses the quantum file access method to handle a file containing a large number of records, each of which can vary dynamically in size. In order to appreciate the power of this access method, we will start by considering the much simpler problem of access to fixed-length records.

Algorithm Discussed

The Quantum File Access Method

A Harmless Fixation

Let us suppose that we want to access a number of fixed-length customer records by their record numbers. [1] Given the record number, we can locate that customer's record by multiplying the record number by the length of a record, which gives us the offset into the file where that record will be found. We can then read or write that record as needed.

Of course, a real application needs to reuse records of customers who have become inactive, to prevent the customer file from growing indefinitely. To handle this problem, we could set a limit on the file size and, when it is reached, start reusing records that haven't been referenced for a long time, making sure to correct or delete any records in other files that refer to the deleted customer.

This fixed-length record system is not very difficult to implement, but it has significant drawbacks; address fields, for example, tend to vary greatly in length, with some records needing 25 or 30 characters for a city name or street address and others needing only 10. If we allocate enough storage for the extreme case, the records become much longer than if we had to handle only the average case. However, allocating enough for only the average case will leave those customers whose names or addresses won't fit into the allotted space quite unhappy, as I know from personal experience as a software developer! The obvious solution is to allow the fields (and therefore the records) to vary in length as necessary.
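The offset calculation for fixed-length records described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 32-byte RECORD_LENGTH, the function names, the space padding, and the use of an in-memory io.BytesIO object in place of a disk file are all hypothetical choices, not taken from the text.

```python
import io

RECORD_LENGTH = 32  # assumed fixed record size in bytes (illustrative only)


def record_offset(record_number: int, record_length: int = RECORD_LENGTH) -> int:
    """Offset of a record in the file: record number times record length."""
    return record_number * record_length


def write_record(f, record_number: int, data: bytes) -> None:
    """Write data into its fixed-length slot, padded with spaces (an assumption)."""
    if len(data) > RECORD_LENGTH:
        # This is the drawback discussed above: long fields simply do not fit.
        raise ValueError("data does not fit in a fixed-length record")
    f.seek(record_offset(record_number))
    f.write(data.ljust(RECORD_LENGTH))


def read_record(f, record_number: int) -> bytes:
    """Read one slot and strip the padding (also strips trailing spaces in the data)."""
    f.seek(record_offset(record_number))
    return f.read(RECORD_LENGTH).rstrip()
```

Because every slot has the same size, records can be written in any order, for example write_record(f, 2, b"Steve Heller") on a fresh io.BytesIO() followed by read_record(f, 2). The ValueError branch shows where the scheme breaks down: a field longer than the slot has nowhere to go.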
Checking Out of the Hotel Procrustes

Unfortunately, variable-length records are much more difficult to deal with than fixed-length ones. The most obvious reason, as discussed above, is that determining where fixed-length records are stored requires only a simple calculation; this is not true of variable-length records. However, we could remedy this fairly easily by maintaining a record index consisting of an array of structures containing the starting location and length of each record, as depicted in Figure recindex. [2]

A sample record index array and data for variable-length records (Figure recindex)

Record index array:

Index   Starting Address   Length
0       0                  12
1       12                 12
2       24                 7
3       31                 2
4       33                 5

Record data:    Steve HellerP.O.Box 0335BaldwinNY11510
Record offset:  0000000000111111111122222222223333333333
                0123456789012345678901234567890123456789

We encounter a more serious difficulty when we want to delete a record and reuse the space it occupied. [3] In some situations we can sidestep the problem by adding the new version of the record to the end of the file and changing the record pointer to refer to the new location of the record; however, in the case of an actively updated file such an approach would cause the file to grow rapidly.

But how can we reuse the space vacated by a deleted record of some arbitrary size? The chances of a new record being exactly the same size as any specific deleted one are relatively small, especially if the records average several hundred bytes each, as is not at all unusual in customer data files. A possible solution is to keep a separate free list for each record size and reuse a record of the correct size. However, there is a very serious problem with this approach: a new record may need 257 bytes, for example, and there may be no available record of exactly that size.
Even though half of the records in the file might be deleted, none of them could be reused, and we would still be forced to extend the file. The attempt to solve this difficulty by using a record that is somewhat larger than necessary leads to many unusably small areas being left in the file (a situation known as fragmentation).

However, there is a relatively unknown way to make variable-length records more tractable: the quantum file access method. [4] The key is to combine them into groups of fixed length, which can then be allocated and deallocated in an efficient manner.

The Quantum File Access Method

Before the following discussion will make much sense to you, I will need to explain in general terms what we're trying to accomplish: building a virtual memory system that can accommodate records of varying lengths in an efficient manner. This means that even though at any given time we are storing most of our data on the disk rather than maintaining it all in memory, we will provide access to all the data as though it were in memory. To do this, we have to arrange that any actively used data is actually in memory when it is needed. In the present application, our data is divided into fixed-size blocks called quanta (plural of quantum), [5] so the task of our virtual memory system is to ensure that the correct blocks are in memory as needed by the user. [6]

The quanta in the file are generally divided into a number of addressable units called items. [7] When adding a record to the file, we search a free space list, looking for a quantum with enough free space for the new record. When we find one, we add the record to that quantum and store the record's location in the item reference array, or IRA, which replaces the record index in Figure recindex; this array consists of entries of the form "quantum number, item number".
[8] The item number refers to an entry in the item index stored at the beginning of the quantum; the items are stored in the quantum in order of their item index entries, which allows the size of an item to be calculated rather than having to be stored. For example, if we were to create an array of variable-length strings, some of its item references might look like those illustrated in Figure itemref1.

Sample IRA, item index, and data, before deletions (Figure itemref1)

Item reference array (IRA):

Index   Quantum Number   Item Number
0       3                1
1       3                2
2       3                3
3       3                4
4       3                5

Item index for quantum 3:

Item #   Offset   Type        Index
1        12       VARSTRING   0
2        24       VARSTRING   1
3        31       VARSTRING   2
4        33       VARSTRING   3
5        38       VARSTRING   4
6        38       UNUSED      0
7        38       UNUSED      0
8        38       UNUSED      0
9        38       UNUSED      0
10       38       UNUSED      0

Quantum data:    11510NYBaldwinP.O.Box 0335Steve Heller
Quantum offset:  333333333222222222211111111110000000000
                 876543210987654321098765432109876543210

When we delete an item from a quantum, we have to update the free space list entry for that quantum to reflect the amount freed, so the space can be reused the next time an item is to be added to the file. We also have to slide the remaining items in the quantum together so that the free space is in one contiguous block, rather than in slivers scattered throughout the quantum. With a record index like the one in Figure recindex, we would have to change the record index entries for all the records that were moved.
Since the records that were moved might be anywhere in the record index, this could impose unacceptable overhead on deletions; to avoid this problem, we will leave the item index entry for a deleted item empty rather than sliding the other entries down in the quantum, so that the IRA is unaffected by changes in the position of records within a quantum. If we delete element 1 from the array, the resulting quantum looks like Figure itemref2.

Sample IRA, item index, and data (Figure itemref2)

Item reference array (IRA):

Index   Quantum Number   Item Number
0       3                1
1       NONE             0
2       3                3
3       3                4
4       3                5

Item index for quantum 3:

Item #   Offset   Type        Index
1        12       VARSTRING   0
2        12       UNUSED      0
3        19       VARSTRING   2
4        21       VARSTRING   3
5        26       VARSTRING   4
6        38       UNUSED      0
7        38       UNUSED      0
8        38       UNUSED      0
9        38       UNUSED      0
10       38       UNUSED      0
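The deletion scheme described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the book's implementation: the names Quantum, IndexEntry and QUANTUM_SIZE, the 64-byte quantum size, and the reduction of the Type column to a used flag are all hypothetical. The sketch also assumes, as Figure itemref1 suggests, that items are stored from the end of the quantum backward and that each offset is measured from the end of the quantum to the start of the item.

```python
from dataclasses import dataclass

QUANTUM_SIZE = 64  # assumed quantum size in bytes (illustrative only)


@dataclass
class IndexEntry:
    offset: int  # distance from the end of the quantum to the item's start
    used: bool   # stands in for the VARSTRING / UNUSED type column


class Quantum:
    def __init__(self, size: int = QUANTUM_SIZE):
        self.data = bytearray(size)
        self.index = []  # item index; item number n is self.index[n - 1]

    def _end_offset(self, item_number: int) -> int:
        """An item ends where the previous index entry starts (0 for item 1)."""
        return self.index[item_number - 2].offset if item_number > 1 else 0

    def free_space(self) -> int:
        used = self.index[-1].offset if self.index else 0
        return len(self.data) - used

    def add(self, item: bytes) -> int:
        """Append an item and return its item number."""
        if len(item) > self.free_space():
            raise ValueError("not enough free space in this quantum")
        start = (self.index[-1].offset if self.index else 0) + len(item)
        size = len(self.data)
        self.data[size - start:size - start + len(item)] = item
        self.index.append(IndexEntry(start, True))
        return len(self.index)

    def get(self, item_number: int) -> bytes:
        entry = self.index[item_number - 1]
        if not entry.used:
            raise KeyError(item_number)
        size = len(self.data)
        # The item's length is calculated from adjacent offsets, not stored.
        return bytes(self.data[size - entry.offset:size - self._end_offset(item_number)])

    def delete(self, item_number: int) -> None:
        entry = self.index[item_number - 1]
        if not entry.used:
            raise KeyError(item_number)
        end = self._end_offset(item_number)
        length = entry.offset - end
        size = len(self.data)
        # Later items sit at lower addresses; slide them over the freed bytes.
        src_start = size - self.index[-1].offset
        src_end = size - entry.offset
        self.data[src_start + length:src_end + length] = self.data[src_start:src_end]
        self.data[src_start:src_start + length] = bytes(length)
        # Leave the index entry in place but unused, so item numbers stay valid.
        entry.used = False
        entry.offset = end
        for later in self.index[item_number:]:
            later.offset -= length
```

Adding the five strings from Figure itemref1 and then calling delete(2) changes the offsets of items 3 through 5 inside the quantum, but their item numbers stay the same. An IRA held as a list of (quantum number, item number) pairs therefore needs only the deleted element's entry cleared, which is the point of leaving the unused slot in the item index.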
