programs and system programs, as they attempted to add more and more features to those available in the previous versions. [5] This illustrates what I call the Iron Law of Programming: whenever more resources are made available, a more ambitious project is attempted. This means that optimal use of these resources is important no matter how fast or capacious the machine.

Why Optimization Is Often Neglected

In view of this situation, why has optimization of application programs not gained more attention? I suspect that a major reason is the tremendous and continuing change in the relative costs of programming time and computer hardware. To illustrate this, let us examine two situations where the same efficiency improvement is gained, but the second example occurs after twenty years of technological improvement.

In the early 1970's a programmer's starting salary was about $3 per hour, and an hour of timesharing connect time cost about $15. Therefore, if a program originally took one hour to run every day, and the programmer spent a week to reduce this time by 20%, in about eight weeks the increased speed would have paid for the programmer's time.

In the late 1990's, the starting salary is in the vicinity of $15 per hour, and if we assume that the desktop computer costs $2500 and is amortized over three years, the weekly cost of running the unoptimized program is about $2.00. Holding the other assumptions constant, the optimization payback time is about 30 years! [6]

This long-term trend seems to favor hardware solutions for performance problems over the investment of programming effort. However, the appropriate strategy for performance improvement of a program depends on how much control you have over the hardware on which the program will run. The less control you have over the hardware, the greater the advantage of software solutions to performance problems, as illustrated by the situations below.
Considering a Hardware Solution

While every optimization problem is different, here are some general guidelines that can help you decide whether adding hardware to help you with your performance problems makes sense.

1. If you are creating a system that includes both software and hardware, and you can improve the functioning of the program, ease maintenance, or speed up development at a small additional expense for extra hardware resources, you should almost certainly do so.

A few years ago, I was involved in a classic example of this situation. The project was to create a point-of-sale system, including specialized hardware and programs, that tied a number of terminals into a common database. Since the part of the database that had to be accessed rapidly could be limited to about 1 1/2 megabytes, I suggested that we put enough memory in the database server machine to hold the entire database. That way, the response time would be faster than even a very good indexing system could provide, and the programming effort would be greatly reduced. The additional expense came to about $150 per system, which was less than 3% of the price of the system. Of course, just adding hardware doesn't always mean that no software optimization is needed. In this case, as you will see below, adding the hardware was only the beginning of the optimization effort.

2. If you are writing programs for your own use on one computer, and you can afford to buy a machine powerful enough to perform adequately, then you might very well purchase such a machine rather than optimizing the program. [7]

3. If you are writing a program that will be running on more than one computer, even if it is only for internal use at your company, the expense and annoyance of requiring the other users to upgrade their computers may outweigh the difficulty of optimization.

4. If your program is to run on many computers at your company, it is almost always a good idea to try to optimize it rather than to require all those users to upgrade their hardware.

5. If your program is to be sold in the open market to run on standard hardware, you must try to optimize it. Otherwise, the users are likely to reject the program.

An excellent example of what happens when you try to make a number of users upgrade their computers is the relatively disappointing sales record (as of this writing) of the Windows NT™ operating system. In order to get any reasonable performance under version 4.0 of Windows NT (the latest extant as of this writing), you need at least 64 megabytes of memory and a Pentium II/233 processor. Since almost the only people who have such machines are professional software developers, the sales of this program to other users have been much smaller than expected.

Categories of Optimization

There are two broad categories of optimization: using a better algorithm, and improving the implementation of the algorithm we are already using.

Generally, we should replace an algorithm by a better one as soon as possible, as this usually does not hinder understanding or later modification of the program. An example is the use of a distribution counting sort (see Chapter mail.htm) in preference to Quicksort. Of course, if we employ these efficient algorithms from the start of our programming effort, then we are not optimizing in the strict sense of our definition (changing a working program to make it more efficient). However, the end result is still a better program than we would have obtained otherwise. [8]

The second category of optimization is the modification of the implementation of an existing algorithm to take advantage of peculiarities of the environment in which it is running or of the characteristics of the data to which it is applied.
This type of modification often has the unfortunate side effect of making the algorithm much harder to understand or modify in the future; it also impairs portability among different hardware and software architectures. Therefore, such an optimization should be postponed until the last possible moment, in order to reduce its negative effects on the development and maintenance of the program. This is an application of the First Law of Optimization: don't try to optimize a program (in the strict sense of modifying an existing program) until it is working correctly.

Finding the Critical Resource

It may seem obvious that, before you can optimize a program, you have to know what is making it inefficient. Of course, if you run out of memory or disk space while executing the program, this determination becomes much simpler.

Depending on which language and machine you are using, there may be "profiling" tools available which allow you to determine where your program is spending most of its time. These, of course, are most useful when the problem is CPU time, but even if that is not the problem, you may still be able to find out that, e.g., your program is spending 95% of its time in the disk reading and/or writing routines. This is a valuable clue to where the problem lies.

However, even if no profiler is available for your system, it isn't hard to gather some useful information yourself. One way to do this is to insert a call to a system timer routine (such as clock() in ANSI C) at the beginning of the segment to be timed and another call at the end of the segment and subtract the two times. Depending on the resolution of the timer, the length of the routine, and the speed of your processor, you may have to execute the segment more than once to gather any useful information. This is illustrated in the real-life example below.

Determining How Much Optimization Is Needed

Sometimes your task is simply to make the program run as fast (or take as little memory) as possible.
In this case, you must use the most effective optimization available, regardless of the effort involved. However, you often have (or can get) a specific target, such as a memory budget of 1.4 megabytes for your data. If you can achieve this goal by relatively simple means, it would be a waste of your time to try to squeeze the last few kilobytes out of the data by a fancier compression algorithm. In fact, it may be worse than a waste of time; the simpler algorithm may also have other desirable characteristics (other than its raw performance).

A good example of such a simple algorithm is the Radix40 data compression method (see Chapter superm.htm, Figures radix40.00 through radix40.03), which is, on average, considerably less effective at reducing the size of a data file than a number of other data compression algorithms. On the other hand, it is quite fast, requires very little storage, and always produces the same amount of output for the same amount of input, which means that its compression efficiency can be calculated exactly in advance. The (statistically) more effective routines such as those using arithmetic coding take more memory and more time, and they generally produce different amounts of output for a given amount of input (depending on what has gone before), so that their compression efficiency cannot be predicted exactly. (In fact, in rare circumstances, they produce output that is larger than the input.) This context dependence also means that they are more difficult to use in applications where random access to compressed data is needed.

The moral is that you should use the simplest algorithm that will meet your needs. If you can define your needs precisely, you probably won't have to implement as sophisticated an algorithm to solve your problem, which will leave you more time to work on other areas that need improvement.
A Real-Life Example

The point-of-sale database program I mentioned earlier is an excellent example of the fact that optimization rarely follows a straight-line path. The first problem was the speed of access to the database by multiple users. Since the software is supplied with specialized hardware, it was reasonable to solve this problem by adding enough memory to hold the portion of the database that requires rapid access. My employer determined that 15,000 invoice records and 5000 customer records would be sufficient, resulting in a memory requirement of about 1.25 megabytes. The expense of this amount of memory was within reason.

Unfortunately, that part of memory (conventional memory) which allows normal program access is limited to 640 kilobytes on IBM-compatible systems running the MS-DOS operating system, the environment in which this program operated. While our actual hardware allowed for more memory, it could be referenced only as expanded memory, which cannot be allocated as simply as conventional memory. Luckily, the problem of using expanded memory for data storage has been addressed by libraries of routines which allow any particular 16 Kbyte "page" of expanded memory to be loaded when the data in it are required.

Storing the records in expanded memory solved the speed problem by eliminating excessive disk I/O. However, the amount of expanded memory required was very close to the total available. In the very likely event of adding even a single field to the record definition, there would not be enough room for all the records. Therefore, I had to consider ways to reduce the space taken by these records.

The makeup of the database is important to the solution of this new problem. It consists almost entirely of 15,000 invoice records of approximately 35 bytes each and 5000 customer records of approximately 145 bytes each.
Fortunately, the majority of the fields in the customer record contained only uppercase alphabetic characters, numeric digits, a few special characters (".", ",", and "-"), and spaces. This limited character set allows the use of Radix40 compression, which packs three characters into two bytes. (See Chapter superm.htm for more details on this algorithm.) However, the time required to convert these fields from ASCII to Radix40 representation seemed excessive. Some testing disclosed that converting 5000 records containing 6 fields of 12 characters each from ASCII to Radix40 on a 33 MHz i386 took about 40 seconds! [9] Although the most common operations in this system do not require wholesale conversion, it is required in such cases as importing an old-style database into this new system, and such inefficiency was unacceptable. So my space problem had become a speed problem again.