The normal constructor for the PersistentArrayUlongRef class (from quantum\persist.cpp) (Figure persist.06)

codelist/persist.06

This constructor sets m_PAU to a reference to p_PAU and m_Index to the value of p_Index, so the object returned from the operator[] call will be a PersistentArrayUlongRef with those values. Therefore, the compiler-generated code looks something like this, so far:

    PersistentArrayUlongRef T(Save,1000000L);
    T = 1234567L;

where T is an arbitrary name for a temporary object. The second of these lines is then translated into a call to PersistentArrayUlongRef::operator= (Figure persist.07) to do the assignment.

The PersistentArrayUlongRef::operator= function (from quantum\persist.cpp) (Figure persist.07)

codelist/persist.07

The generated code now looks something like this:

    PersistentArrayUlongRef T(Save,1000000L);
    T.operator=(1234567L);

The operator= code, as you can see from the figure, calls the StoreElement operation for the object m_PAU, which as we noted above is a reference to the original object Save; the arguments to that call are m_Index, a copy of the index supplied in the original operator[] call, and p_Element, the value specified in the operator= call. Thus, the result is the same as that of the statement Save.StoreElement(1000000L,1234567L);, while the notation is that of a "normal" array access.

However, we've only handled the case where we're updating an element of the array. We also need to be able to retrieve values once they've been stored. To see how that works, let's follow the translation of the following line:

    TestValue = Save[1000000L];

The process is fairly similar to what we've already done. The definition of PersistentArrayUlong::operator[] causes the compiler to generate code somewhat like the following:

    PersistentArrayUlongRef T(Save,1000000L);
    TestValue = T;

This time, however, rather than translating the second line into a call to PersistentArrayUlongRef::operator=, the compiler translates it into a call to PersistentArrayUlongRef::operator Ulong, a "conversion function" that allows a PersistentArrayUlongRef to be used where a Ulong is expected (Figure persist.08).

The PersistentArrayUlongRef::operator Ulong function (from quantum\persist.cpp) (Figure persist.08)

codelist/persist.08

Therefore, the final generated code comes out something like this:

    PersistentArrayUlongRef T(Save,1000000L);
    TestValue = Ulong(T);

As should be evident from the code for that conversion function, it merely calls the GetElement operation of the object m_PAU, which as we noted above is a reference to the original object Save, with the argument m_Index, a copy of the original element index. Thus, the result is the same as that of the statement TestValue = Save.GetElement(1000000L);, while the notation is that of a "normal" array access.

Before we move on to our next topic, I have a word of warning for you. If you use these "synthetic arrays" frequently, you may be tempted to inline the definitions of the auxiliary functions that make the array notation possible. I recommend that you don't do it, at least not without a lot of testing to make sure it works correctly. When I tried this with Borland C++ 3.1, the result appeared to work but generated terrible memory leaks; as far as I could determine, the memory that wasn't being freed belonged to the temporary objects that were created during the operation of these functions.
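To make the mechanics of this proxy-object technique concrete, here is a minimal, self-contained sketch of the two classes as described above. The real classes live in quantum\persist.cpp and operate on the quantum file; in this sketch a std::map stands in for the persistent storage, and the member names (m_PAU, m_Index, StoreElement, GetElement) simply mirror the text, so treat it as an illustration of the pattern rather than the book's actual code.

    // Sketch of the proxy-object technique: operator[] returns a small
    // reference object; operator= stores, operator Ulong fetches.
    #include <cstdio>
    #include <map>

    typedef unsigned long Ulong;

    class PersistentArrayUlong; // forward declaration for the proxy

    class PersistentArrayUlongRef
    {
    public:
        PersistentArrayUlongRef(PersistentArrayUlong& p_PAU, Ulong p_Index);
        PersistentArrayUlongRef& operator=(Ulong p_Element); // store path
        operator Ulong();                                    // fetch path
    private:
        PersistentArrayUlong& m_PAU; // reference to the "array" object
        Ulong m_Index;               // copy of the element index
    };

    class PersistentArrayUlong
    {
    public:
        void StoreElement(Ulong p_Index, Ulong p_Element)
            { m_Data[p_Index] = p_Element; }
        Ulong GetElement(Ulong p_Index)
            { return m_Data[p_Index]; }
        PersistentArrayUlongRef operator[](Ulong p_Index)
            { return PersistentArrayUlongRef(*this, p_Index); }
    private:
        std::map<Ulong, Ulong> m_Data; // stand-in for the quantum file
    };

    PersistentArrayUlongRef::PersistentArrayUlongRef(
        PersistentArrayUlong& p_PAU, Ulong p_Index)
        : m_PAU(p_PAU), m_Index(p_Index) {}

    PersistentArrayUlongRef& PersistentArrayUlongRef::operator=(Ulong p_Element)
    {
        // equivalent to Save.StoreElement(1000000L,1234567L);
        m_PAU.StoreElement(m_Index, p_Element);
        return *this;
    }

    PersistentArrayUlongRef::operator Ulong()
    {
        // equivalent to TestValue = Save.GetElement(1000000L);
        return m_PAU.GetElement(m_Index);
    }

    int main()
    {
        PersistentArrayUlong Save;
        Save[1000000L] = 1234567L;        // update via the temporary proxy
        Ulong TestValue = Save[1000000L]; // retrieval via operator Ulong
        std::printf("%lu\n", TestValue);
        return 0;
    }

Note that both statements in main compile down to exactly the two-step temporary-object sequences traced above; the proxy exists only long enough to forward one store or one fetch.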
Some Fine Details

Now let's look at some of the other implementation details and problems I encountered in the process of getting the dynamic hashing algorithm and the hashtest.cpp test program to work.

During development of the test program, I discovered that running the tests on large numbers of records took a fairly large amount of time; to generate a quantum file with 500,000 records in it takes about 45 minutes, doing almost continual disk accesses the whole time.10 Although this is actually very fast when one considers the amount of data being processed, I still didn't want to have to execute such a program more often than necessary. Therefore, the test program needed the capability of incrementally adding records to an existing quantum file. However, this meant that the test program had to be able to start somewhere in the middle of the input file; adding the same records again and again wouldn't work, because the algorithm isn't designed to handle records with duplicate keys.

As the test program is implemented, only two pieces of information need to be saved between runs to allow the addition of previously unused records to the quantum file: the index number in the SaveKeys array where the next key should be saved, and the offset into the input file where the next record begins. We don't have to save the names of the input file or the ASCII output file, since those are compiled into the program; of course, this would not be appropriate for a real application program, but in our test program we don't want to change the name of the input file between runs. Clearly, if the input file could change from one run to the next, the information about where we stopped reading records wouldn't be of much use. Since both of the data items that need to be preserved happen to be representable by an unsigned long value, I decided to use a PersistentArrayUlong named Save to store them between runs. As a result, the only difference between starting a new quantum file and adding records to an existing file is that when we start a new quantum file, the offset in the input file and the starting position for keys to be added to the SaveKeys array have to be reset to their initial values.

This ability to build on a previously created quantum file turned out to be quite useful in fixing an interesting bug that I ran into during capacity testing of the 16-bit version of this program. The theoretical capacity of a quantum file with the original size parameters, when used to support the dynamic hashing algorithm, could be calculated as 64K (the maximum number of hash slots in the 16-bit implementation) * 6 (the average number of records per hash slot), or 393,216 records. Although I didn't necessarily need to demonstrate that this exact number of records could actually be stored and retrieved successfully, especially in view of the number of hours my machine would spend doing continual random disk accesses, I felt that it was important to check that a very large number of records could be accommodated. I selected 250,000 as a nice round number that wouldn't take quite so long to test, and started to run tests for every multiple of 50,000 records up to 250,000.11 As each test finished, I copied the resulting quantum file to a new name to continue the testing at the next higher multiple of 50,000 records. Everything went along very nicely through the test that created a 150,000-record quantum file.
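The restart logic just described is simple enough to sketch. The following fragment reuses the illustrative PersistentArrayUlong from the earlier sketch; the element numbers (1 and 2) and the function itself are assumptions made for this example, since the excerpt doesn't specify which elements of Save hold which value.

    // Hypothetical restart helper: the two unsigned long values that
    // survive between runs live in the persistent array named Save.
    void ResumeOrRestart(PersistentArrayUlong& Save, bool StartingNewFile,
                         Ulong& NextKeyIndex, Ulong& InputFileOffset)
    {
        if (StartingNewFile)
        {
            // New quantum file: reset both values to their initial state.
            NextKeyIndex = 0;
            InputFileOffset = 0;
            Save[1] = NextKeyIndex;    // assumed element for the key index
            Save[2] = InputFileOffset; // assumed element for the file offset
        }
        else
        {
            // Existing file: pick up exactly where the previous run stopped.
            NextKeyIndex = Save[1];
            InputFileOffset = Save[2];
        }
    }

Everything else about an incremental run is identical to a fresh run, which is exactly why only these two values need to persist.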
Since adding 50,000 records to a file that already contained 150,000 records should have taken about 45 minutes, I started the next test, which should have generated a 200,000-record file, and went away to do some chores. Imagine my surprise when I came back and found that the program had aborted somewhere between elements 196,000 and 197,000, due to the failure of an assert that checks that all the records in a hash slot being split have the correct hash code to be put into either the old slot or the new slot.

Upon investigating the reason for this bug, I discovered that the problem was that m_CurrentMaxSlotCount, which is used in DynamicHashArray::CalculateHash to calculate the hash code according to Larson's algorithm, was an unsigned rather than an unsigned long. As a result, when 32K slots were occupied and it was time to double the value of m_CurrentMaxSlotCount, its new value, which should have been 64K, was 0. This caused the hashing function to generate incorrect values and thus caused the assert to fail. Changing the type of m_CurrentMaxSlotCount to unsigned long solved the problem.

However, there was one other place where the code had to change to implement this solution, and that was in the destructor for DynamicHashArray. The reason for this change is that the dynamic hashing algorithm needs to store some state information in a persistent form. To be specific, we need to keep track of the number of active slots, the current maximum slot number, and the number of elements remaining before the next slot is to be activated. With these data items saved in the file, the next time we open the file, we're ready to add more records while keeping the previously stored ones accessible. Unfortunately, we don't have a convenient persistent numeric array to save these values in, and it doesn't make much sense to create one just for three values. However, we do have a persistent string array, which we're using to store the hashed records, and we can use the first element of that array to store ASCII representations of the three values that must be maintained in the file.12 This is handled during the normal constructor for the DynamicHashArray type (Figure dynhash.00), which calls the Open function to do the actual work (Figure dynhash.01).13

The normal constructor for the DynamicHashArray class (from quantum\dynhash.cpp) (Figure dynhash.00)

codelist/dynhash.00

The DynamicHashArray::Open function (from quantum\dynhash.cpp) (Figure dynhash.01)

codelist/dynhash.01

The Open function retrieves these values from the first element of the array if it has already been created, and the destructor (Figure dynhash.02) stores them into the first element of the array.

The destructor for the DynamicHashArray class (from quantum\dynhash.cpp) (Figure dynhash.02)

codelist/dynhash.02

After the change to the type of m_CurrentMaxSlotCount, the sprintf format specifier for that variable also had to change, from "%6u" to "%6lu", so that the whole value of the variable would be written out. If I hadn't made this change, the data would have been written to the parameter string incorrectly, and the parameters wouldn't have been read back in correctly the next time the file was opened.
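Both halves of this fix are easy to demonstrate. The sketch below models the 16-bit unsigned of Borland C++ 3.1 with unsigned short (so it misbehaves the same way on a modern compiler), and then formats the three persistent state values as ASCII; the exact layout of the state string is an assumption for illustration, but the "%6lu" specifier for the widened m_CurrentMaxSlotCount is the point of the fix.

    /* Demonstrates the wraparound bug and the matching sprintf fix. */
    #include <stdio.h>

    int main()
    {
        unsigned short OldSlotCount = 32768U; /* 16-bit unsigned, as in the bug */
        unsigned long  NewSlotCount = 32768UL; /* the corrected type */

        OldSlotCount *= 2; /* 65536 doesn't fit in 16 bits: wraps to 0 */
        NewSlotCount *= 2; /* holds 65536 as expected */

        printf("16-bit doubling: %u\n", (unsigned)OldSlotCount); /* prints 0 */
        printf("32-bit doubling: %lu\n", NewSlotCount);          /* prints 65536 */

        /* Storing the three state values as ASCII in one string, as the
           destructor does with the first element of the persistent string
           array; names and field order here are illustrative only. */
        char Param[32];
        unsigned long m_CurrentMaxSlotCount = 65536UL;
        unsigned ActiveSlotCount = 100;
        unsigned ElementsBeforeExpansion = 6;
        sprintf(Param, "%6u%6lu%6u", ActiveSlotCount,
                m_CurrentMaxSlotCount, ElementsBeforeExpansion);
        printf("state string: \"%s\"\n", Param);
        return 0;
    }

With the old "%6u" specifier, only the low-order half of the widened slot count would have been formatted, so the value read back on the next Open would have been wrong even though the in-memory arithmetic was now correct.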
Overflow Handling

The previous version of this program, in the second edition of this book, did not have any means of handling the situation where the total size of the records with the same hash code is so large that the element used to store those records will no longer fit into a quantum. Therefore, in the event that too many records were put into a particular element, the program would fail. While this is acceptable in a program intended only to demonstrate the dynamic hashing algorithm, it is unacceptable for commercial use and therefore should really be addressed in a book intended for use in the real world. I've thought about this problem off and on for some time without coming up with a really good solution. However, a few months ago I did figure out how to solve it in a reasonably easy and efficient way, which is illustrated in the code for DynamicHashArray::StoreElement (Figure dynhash.04).
