digits of the numeric key in the previous input file, and change the number of logical buffers to correspond to this modification. So how does this sixth version of the program actually perform? Figure timings.06 answers that question.

Performance of Zensort version 6 (Figure timings.06)

zensort/timings.06

Unfortunately, we can't directly compare the results of this series of tests with the previous one, because the file sizes are different. However, we can establish a lower bound on our improvements if we compare the throughput on the one million record file: because throughput generally decreases as the size of the file increases, we can be fairly sure that the throughput figures on the 100 MB file would be lower with the old algorithm than the throughput figures on the 63 MB file with that same algorithm.

As usual, the relative improvement is greatest on the smallest file: the new version is almost twice as fast as the previous program on the smaller memory configuration and over twice as fast on the larger configuration. However, the improvement in throughput on the larger files with the larger memory configuration is also quite substantial. The better key distribution has enabled us to improve performance on the large file under the large memory configuration by 80 percent over our previous best, with minimal code changes.

The Final Version

There's one more change we can make that will improve performance significantly on a file with fixed-length records, such as the one used in the sorting contest. Of course, if that were the only "application" using such records, they wouldn't be worth our attention. However, this is far from the case; in fact, most traditional data processing systems rely heavily on files with fixed-length records, which is one of the main reasons they were chosen for the sorting contest in the first place. Therefore, optimizing the code for such records is worthwhile.

It turned out that the changes needed to handle fixed-length records more efficiently were not very difficult, as you can see by looking at the next version of the program, shown in Figure zen07.cpp.

Zensort version 7 (Zensort\zen07.cpp) (Figure zen07.cpp)

zensort/zen07.cpp

Basically, these changes consisted of: using the read function rather than the getline function to read a fixed-length record into an input buffer; changing the calculation of the record length to use the fixed length; and stepping through the buffer in units of the fixed record length when searching for the next record, rather than searching for the next newline character as the end-of-record marker. I also took advantage of the fixed-length records to avoid clearing the big buffer on all passes after the first one: since I know exactly where each record (and therefore each key) starts in the buffer, as well as how many records are in the buffer, records left over from the previous pass can safely be ignored.

So how does this final version of the program actually perform? Figure timings.07 answers that question.

Performance of Zensort version 7 (Figure timings.07)

zensort/timings.07

I'm pretty happy with these figures, especially the ones for the 10 and 25 million byte files with the large memory configuration! However, even at the 100 million byte mark, the large memory configuration results are quite good, exceeding our previous mark by more than 40 percent.
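To make the fixed-length record handling described above more concrete, here is a minimal sketch of reading records with the read function and stepping through the input buffer in record-sized units. It is not the code from zen07.cpp; the record length, key length, buffer capacity, and file name are all illustrative assumptions.

// Sketch only: reads fixed-length records with read() and steps through the
// buffer in record-sized units instead of scanning for newline characters.
// RECORD_LENGTH, KEY_LENGTH, the buffer capacity, and the file name are
// illustrative assumptions, not values taken from zen07.cpp.
#include <cstddef>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

int main()
{
    const std::size_t RECORD_LENGTH = 100;        // assumed fixed record size
    const std::size_t KEY_LENGTH = 10;            // assumed key size at the start of each record
    const std::size_t RECORDS_PER_BUFFER = 4096;  // assumed input buffer capacity

    std::ifstream in("input.dat", std::ios::binary);
    if (!in)
        return 1;

    std::vector<char> buffer(RECORD_LENGTH * RECORDS_PER_BUFFER);
    std::size_t total_records = 0;

    for (;;)
    {
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        const std::size_t bytes_read = static_cast<std::size_t>(in.gcount());
        if (bytes_read == 0)
            break;

        // Every record is exactly RECORD_LENGTH bytes, so the next record is
        // found by arithmetic rather than by searching for an end-of-record
        // marker, and any leftover bytes from a previous fill can be ignored.
        const std::size_t records_in_buffer = bytes_read / RECORD_LENGTH;
        for (std::size_t i = 0; i < records_in_buffer; ++i)
        {
            const char* record = buffer.data() + i * RECORD_LENGTH;
            std::string key(record, KEY_LENGTH);
            // ... here the record would be routed to an output buffer by key ...
            ++total_records;
        }
    }

    std::cout << "Processed " << total_records << " records\n";
    return 0;
}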
As for the contest: although the commercial sorting programs that have been entered in the sorting contest are noticeably faster than this one on the largest file, somehow I doubt that their authors will give you the source code. Therefore, if you need an algorithm that you can tinker with yourself, and whose inner workings are explained in detail, I don't think you'll find anything faster than this program.

Summary

In this chapter, we have developed an algorithm for sorting large files from rather humble beginnings to quite respectable performance by a series of small, incremental steps. This is a good example of how such algorithm development occurs in the real world, in addition to being potentially useful in its own right. In all, compared to our first attempt, we have improved performance on large files by more than 65 to 1 in the large memory configuration, and by a factor of over 20 even on a relatively small 64 MB machine.

Problems

1. How would you modify the "divide-and-conquer" version of the sorting program so that it would handle uneven key distributions in a reasonable manner, rather than running extremely slowly or failing to execute at all? (You can find suggested approaches to problems in Chapter artopt.htm.)

Footnotes

1. The displacement and total displacement arrays play the same roles in this program as the bucket count and bucket position arrays did in the distribution counting program in Chapter mail.htm. (A sketch of that distribution-counting idea follows these footnotes.)

2. This special handling for short records is required so that we don't accidentally pick up garbage characters past the end of the key if the record we are handling is shorter than the number of characters on which we wish to sort.

3. Notes on the performance tables:
   1. All times are in seconds.
   2. All tests were run on a machine with a Pentium II processor running at 233 MHz, with either 64 or 192 MB of RAM, and a Western Digital model 35100 5.1 GB Ultra DMA hard drive. The disk partition on which the tests were run was defragmented before each test was run.
   3. The programs were run in a DOS session under Windows 95.
   4. I know that the entries in some of the figures look suspicious, as the timings for the small and large memory configurations are sometimes identical on the 100,000 record case. However, the entries are correct; I guess it's just a fluke of testing.

4. All times are in seconds.

5. You may be wondering how much of the improvement in performance was due to the better buffer allocation strategy and how much to the other minor improvements, such as keeping track of the buffer contents ourselves rather than calling strlen. Unfortunately, I can't give you that information, because I did not run tests with each of those factors isolated; there are just too many combinations for me to test in a reasonable amount of time. However, because you have the source code for all the different versions of this program, you can find that out for yourself. I would be interested in receiving the results of any comparative tests you might run.

6. In case you were wondering how much of the performance improvement was due to using one large buffer rather than many small buffers, there was little or no performance improvement from that change. However, according to the first law of optimization, that doesn't make any difference, because the program didn't work properly until I made that change.
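As footnote 1 notes, the displacement arrays serve the same purpose as the bucket count and bucket position arrays in distribution counting. The following is a minimal sketch of that general idea, assuming the simplest possible case in which records are distributed by the first character of their keys; the names and sample data are illustrative and are not taken from the book's source code.

// Sketch of the general distribution-counting idea referred to in footnote 1:
// a count per bucket is converted into a starting displacement per bucket,
// so each record can be dropped directly into its final position.
// All names and data here are illustrative, not taken from the book's code.
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

int main()
{
    std::vector<std::string> records = { "delta", "alpha", "charlie", "bravo", "apple" };
    const int BUCKETS = 256;

    // Count how many records fall into each bucket (here: first character of the key).
    std::vector<std::size_t> count(BUCKETS, 0);
    for (const auto& r : records)
        ++count[static_cast<unsigned char>(r[0])];

    // Convert counts into displacements: the position where each bucket begins.
    std::vector<std::size_t> displacement(BUCKETS, 0);
    std::size_t running_total = 0;
    for (int b = 0; b < BUCKETS; ++b)
    {
        displacement[b] = running_total;
        running_total += count[b];
    }

    // Place each record at its bucket's next free slot; the placement is stable.
    std::vector<std::string> sorted(records.size());
    for (const auto& r : records)
        sorted[displacement[static_cast<unsigned char>(r[0])]++] = r;

    for (const auto& r : sorted)
        std::cout << r << '\n';
    return 0;
}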
Mozart, No. Would You Believe Gershwin?

Introduction

In this final chapter we will summarize the characteristics of the algorithms we have encountered in previous chapters (Figures ioopt-processoropt), discuss the future of the art of optimization, and examine approaches to the problems posed in previous chapters.

Summary of Characteristics

Characteristics of file access time reduction techniques (Figure ioopt)

Standard disk-based hashing 1 (Chapter superm.htm)
o Excellent time efficiency
o Extra storage needed to handle collisions (usually 25% extra is enough)
o Appropriate for tables of a few hundred or more entries
o Data can be accessed by a unique key which can be assigned arbitrarily

Dynamic hashing (Chapter dynhash.htm)
o Excellent time efficiency
o Table expands as needed
o Appropriate for tables of almost unlimited size
o Data can be accessed by a unique key which can be assigned arbitrarily

Caching 2 (Chapter superm.htm)
o Excellent time efficiency
o Memory utilization can be adjusted according to availability
o Useful when the same data are read repeatedly

Characteristics of quantum file access method (Figure quantumfile)

Quantum file access method (Chapters quantum.htm and dynhash.htm)
o Excellent time efficiency
o Memory utilization can be adjusted according to availability
o Allows random access to records whose length can vary dynamically
o Provides array notation for ease of integration

Characteristics of data compression techniques (Figure datacomp)

Radix40 (Chapters prologue.htm and superm.htm)
o Predictable output size
o Good time efficiency
o Moderate compression ratio: output size = .67 * input size
o Character set limited to 40 distinct characters

BCD (Chapter superm.htm)
o Predictable output size
o Good time efficiency
o Good compression ratio: output size = .5 * input size
o Character set limited to 16 distinct characters

Bitmaps (Chapter mail.htm)
o Predictable output size
o Good time efficiency
o Excellent compression ratio: output size = .125 * input size
o Character set limited to two distinct characters

Arithmetic coding (Chapter compress.htm)
o Unpredictable output size
o Fair time efficiency
o Very good compression ratio: output size typically ranges from .3 to .5 * input size
o Character set not artificially restricted

Characteristics of processor time reduction techniques (Figure processoropt)

Hash coding (Chapter superm.htm)
o See entry in Figure ioopt

Lookup tables (Chapters prologue.htm and compress.htm)
o Excellent time efficiency
o May use somewhat more memory than searching a list
o Appropriate for tables of a few hundred or thousand items
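As an illustration of where the Radix40 ratio in the figure above comes from: with a 40-character set, three characters can be packed into one 16-bit word, because 40^3 = 64,000, which is less than 65,536; three input bytes therefore become two output bytes, giving output size of about .67 * input size. The sketch below shows only the packing arithmetic; the particular 40-character alphabet and the code mapping are assumptions for illustration, not the book's actual Radix40 tables.

// Sketch of the arithmetic behind the Radix40 compression ratio: three
// characters from a 40-character set fit in one 16-bit word, so every
// 3 input bytes become 2 output bytes (output size ~= .67 * input size).
// The alphabet and mapping below are illustrative assumptions.
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// An illustrative 40-character alphabet: space, A-Z, 0-9, and three punctuation marks.
const std::string ALPHABET = " ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.,-";

std::uint16_t pack3(char a, char b, char c)
{
    auto code = [](char ch) -> std::uint16_t {
        std::size_t pos = ALPHABET.find(ch);
        return pos == std::string::npos ? 0 : static_cast<std::uint16_t>(pos);
    };
    // Treat the three character codes as digits of a base-40 number.
    return static_cast<std::uint16_t>(code(a) * 1600 + code(b) * 40 + code(c));
}

int main()
{
    std::string text = "HELLO WORLD";
    while (text.size() % 3 != 0)     // pad to a multiple of three characters
        text += ' ';

    std::vector<std::uint16_t> packed;
    for (std::size_t i = 0; i < text.size(); i += 3)
        packed.push_back(pack3(text[i], text[i + 1], text[i + 2]));

    std::cout << text.size() << " input bytes -> "
              << packed.size() * sizeof(std::uint16_t) << " packed bytes\n";
    return 0;
}

Unpacking simply reverses the arithmetic, dividing each 16-bit word by 1600 and 40 to recover the three character codes, which is why the output size of this kind of encoding is predictable.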