pointers to each one (temp_zip_pointer) so that we can copy the ZIP codes from the keys as we select each record. Once that bookkeeping is out of the way, we can copy the ZIP code for the current record into the ZIP block via the temp_zip_pointer pointer array and increment the index of the next pointer in the array to be used (current_zip_entry). Now we're at the end of the processing for one record in a processing batch. Once all the records in the batch have been processed, we fall off the end of the inner loop. At this point, we can add the number of items we have read in this batch to total_items_read. If we have read fewer than a full batch of records, we must have reached the end of the file, so we are finished with this phase of processing.

Getting Ready to Sort

Now we have all the information needed to sort the keys so that we will be able to read the records in ZIP code order on the second pass. However, the sort function, Megasort (Figure megac.00), requires that information in a different form, so we have some preparation to do before we can call it.

The Megasort function (mail\megac.cpp) (Figure megac.00) codelist/megac.00

Looking at the parameters for this function, we see that the data to be sorted are described by an array of string pointers (unsigned char **PtrArray), whereas what we have is a number of blocks of ZIP codes. The record numbers, which the sort function will rearrange according to the order of the ZIP codes, must be passed as an array of unsigned values (unsigned *RecNums), rather than as the bitmap in which they are stored by our search function. To set up the first argument to Megasort, we need to produce an array of pointers to each of the ZIP codes we have stored in the blocks, which we do in the portion of Figure mail.02 between the function call that reports the "Pass 1 time" and the call to the Megasort function. We start by allocating a block of ZIP code pointers (zip_pointer) that can hold all the pointers we will need (items_found).
Then, for each block of ZIP codes we have created (zip_block[i]), we calculate the address of each ZIP code in the block and store it in an entry of our zip_pointer array. That takes care of the pointers to the ZIP codes. Now we have to turn the bitmap into an array of record numbers to be rearranged. First we allocate the array of unsigned values, with items_found entries. Then we call testbit (Figure bitfunc.01).

The testbit function definition (from mail\bitfunc.cpp) (Figure bitfunc.01) codelist/bitfunc.01

We call testbit once for every record number in the file, and every time we find a bit set, we store that record number in the next available space in the record_number array. The testbit function is very similar to setbit, described above. The difference is that we don't want to set the bit, but to find out whether it is already set; therefore, we don't modify the bitmap array. Now we are finally ready to call Megasort (Figure megac.00).

The Megasort Function

The main variables involved are the BucketCount array, which keeps track of the number of strings that have each character in the current sorting position (the number of A's, B's, etc.), and the BucketPosition array, which maintains the current position in the output array for each possible character (where the next A string should go, the next B string, etc.). We also have to allocate storage to the TempPtrArray variable for a copy of the pointers to the strings, and to the TempRecNums variable for a copy of the record numbers to be rearranged. All of this setup code having been executed, the main sorting loop is ready to go. You may be surprised to see how few lines of code are involved in the main loop. If we disregard blank lines and those that contain only a starting or ending brace, only 17 lines contain any executable code. Yet in many common applications this simple algorithm will outperform all of the complex methods favored by the computer science textbooks.
The outer loop is executed once for each character position in the sort. That is, if we are sorting a number of ten-character strings, it will be executed ten times. Its role is to select which character of the strings to be sorted will be used in the current pass through the strings. The reason that the loop index i decreases for each pass through the loop was discussed above: each character position is treated as a separate "key", so we have to sort on the least significant "key" first and work our way up to the most significant one. Each pass of this sort leaves strings in the same order as they were in before the current pass if they have the same character in the current position. The first operation in the main loop is to use the memset function to clear the BucketCount array, which maintains the counts for each possible ASCII character code. In the first pass through the input data, we step through PtrArray, selecting the ith character as the current character to be sorted from each element of the array. The ASCII value of that character, m, is used to select the element of BucketCount to be incremented, so that when we reach the end of the pointer array, we will know the number of strings that have that character in the current position being sorted. Once all the counts have been accumulated, we can use that information to fill in the BucketPosition array, which indicates the next available slot for each possible character position. This serves the function of directing traffic from the input pointer array PtrArray to the output one, TempPtrArray. Each element in the BucketPosition array corresponds to the next output position to be used in TempPtrArray for a given character. For example, suppose that the lowest character in the current position of the array to be sorted is 'A', and there are five strings that have an 'A' in the current position. 
In that case, the initial value of BucketPosition['A'] would be 0, and the initial value of BucketPosition['B'] would be 5, since the first 'A' key should go into output slot 0 and the first 'B' key should go into output slot 5. After filling in the BucketPosition array, we are ready to rearrange the pointers to the keys by copying these pointers into the temporary array TempPtrArray. That is done in the second inner loop through all the keys. First, we calculate the appropriate index into the BucketPosition array by retrieving the character from the input string, as in the first inner loop above. Then we use that index to retrieve the appropriate element of the BucketPosition array, which we use to copy both the key pointer and the record number of the current record into the output arrays, TempPtrArray and TempRecNums, respectively. Then we increment the element of BucketPosition that we have used, because the next key that has that particular character in the current position should go in the next position of the output. In our example, after the first 'A' key has been copied into the output array, BucketPosition['A'] would be incremented to 1, meaning that the next 'A' key should go into output slot 1. Similarly, after the first 'B' key has been copied into the output array, BucketPosition['B'] would be incremented to 6, meaning that the next 'B' key should go into output slot 6. After copying all of the pointers from PtrArray into TempPtrArray, and all of the record numbers from RecNums into TempRecNums in sorted order, we copy them back into the original arrays for use in the next sorting pass, if there is one, or as the final result if we are done.

Finishing Up

When we are done with the sort, we can return the sorted keys and record numbers to process (Figure mail.02), so that it can retrieve and print the first and last name and the ZIP code for each customer in order by their ZIP codes.
We then free the dynamic storage used in the function and return to the main program, which calls terminate to close the files, then exits.

Performance

You may be wondering how much performance improvement we have obtained by using distribution sorting in our example program, mail.cpp. In order to answer this question, I have created another program called mailx.cpp that is identical to mail.cpp except that mailx.cpp uses qsort rather than the distribution counting sort. I've included calls to timing functions in both versions and have run each of them four times with different numbers of keys to see how they perform; Figures promailx.00 and promail.00 summarize the results of those runs.

Performance figures for mailx.cpp (Figure promailx.00) codelist/promailx.00

Performance figures for mail.cpp (Figure promail.00) codelist/promail.00

While the improvement in performance increases with the number of elements, even at the low end we've multiplied the performance of the sorting part of the task by about 43 times. At the high end, the difference in sorting speed is a phenomenal 316 to 1, and the overall performance in that case is more than 50 times what it is using qsort.

Is optimization always that simple?

This would be a good time for me to mention an attempted optimization that actually did not help performance in the final analysis: the idea of "strip mining", which can most simply be described as dividing a file into two parts, one of which is the "key" portion and the other the "data" portion. Theoretically, it should be faster to read just the keys on the first pass of our mailing list program, then read the data on the second pass only for those records that have been selected. Unfortunately, when I tried this "optimization", it actually slowed the program down. I found this quite surprising, because in earlier tests on other machines it did help somewhat. However, it doesn't help on my current machine, so I've omitted it.
Moral Support

Is there a moral to this real-life optimization story that summarizes both its successes and failures? I think there are several. First, we've seen that a very simple algorithm can grossly outperform a standard library function from a well-known compiler vendor. Second, and more important, optimization is a tricky business. The common thread of these lessons is that no matter how carefully you plan a programming task, the unexpected is very likely to occur. You have to be willing to examine all your preconceptions and discard them if necessary in the face of new evidence. Perhaps this willingness, along with the ability to handle the unexpected when it occurs, is the difference between a novice and an expert.

A Cautionary Tale

Before we leave this chapter, I should mention that the programs described here do not work properly when compiled with the DJGPP compiler, but produce general protection faults at apparently random times during their execution. This is quite surprising, because they work perfectly with the Microsoft C++ version 5 compiler. Unfortunately, I haven't been able to track down and eliminate the source of this problem; however, at least I can tell you about it before you discover it for yourself. As to the cause of the problem: I can only guess that it is caused by bugs in the most recent release of DJGPP.

Summary

In this chapter, we have covered the use of a bitmap to keep track of one attribute of a number of records, sorting via the distribution counting sort, and how to gain rapid access to data according to criteria specified at run time. In the next chapter we will examine the arithmetic coding method of data compression.

Problems

What modifications to the distribution counting sort would:

1. Arrange the output in descending order rather than ascending order?