The first test determines whether the highest frequency value allocated to our message is less than one-half of the total frequency range. If it is, we know that the first output bit is 0, so we call the bit_plus_follow_0 macro (Figure arenc.07) to output that bit. Let's take a look at that macro. First we call output_0, which adds a 0 bit to the output buffer and writes it out if it is full. Then, in the event that bits_to_follow is greater than 0, we call output_1 and decrement bits_to_follow until it reaches 0. Why do we do this?

The bit_plus_follow_0 macro (from compress\arenc.cpp) (Figure arenc.07)

codelist/arenc.07

The reason is that bits_to_follow indicates the number of bits that could not be output up to now because we didn't know the first bit to be produced ("deferred" bits). For example, if the range of codes for the message had been 011 through 100, we would be unable to output any bits until the first bit was decided. However, once we have enough information to decide that the first bit is 0, we can output three bits, "011". The value of bits_to_follow would be 2 in that case, since we have two deferred bits. Of course, if the first bit turns out to be 1, we would emit "100" instead. The reason that we know that the following bits must be the opposite of the initial bit is that the only time we have to defer bits is when the code range is split between codes starting with 0 and those starting with 1; if both low and high started with the same bit, we could send that bit out.

The values of HALF, FIRST_QTR, and THIRD_QTR are all based on TOP_VALUE (Figure arith.00). In our example, FIRST_QTR is 64, HALF is 128, and THIRD_QTR is 192. The current value of high is 255, which is more than HALF, so we continue with our tests. Assuming that we haven't obtained the current output bit yet, we continue by testing the complementary condition.
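The following is a minimal C++ sketch of the deferred-bit logic. It is not the macro from compress\arenc.cpp. The BitSink type and its bit_plus_follow member are hypothetical names. The member takes the decided bit as a parameter, so it plays the role of both bit_plus_follow_0 and bit_plus_follow_1. Bits are collected in a std::string rather than packed into the program's output buffer.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the output side of bit_plus_follow_0/_1:
// bits are appended as '0'/'1' characters instead of being packed into bytes.
struct BitSink {
    std::string bits;         // bits emitted so far
    long bits_to_follow = 0;  // deferred bits still waiting to be written

    // Write the decided bit, then each deferred bit, which must be its opposite.
    void bit_plus_follow(int bit) {
        assert(bit == 0 || bit == 1);
        bits.push_back(bit ? '1' : '0');
        while (bits_to_follow > 0) {
            bits.push_back(bit ? '0' : '1');
            --bits_to_follow;
        }
    }
};
```

With bits_to_follow set to 2, calling bit_plus_follow(0) leaves "011" in bits, and calling bit_plus_follow(1) leaves "100". These are the two possible outcomes of the 011 through 100 example.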
If low is greater than or equal to HALF, we know that the entire frequency range allocated to our message so far is in the top half of the total frequency range; therefore, the first bit of the message is 1. If this occurs, we output a 1 via bit_plus_follow_1. The next two lines reduce high and low by HALF, since we know that both are above that value. In our example, low is 223, which is more than HALF. Therefore, we can call bit_plus_follow_1 to output a 1 bit. Then we adjust both low and high by subtracting HALF, to account for the 1 bit we have just produced; low is now 95 and high is 127.

If we haven't passed either of the tests to output a 0 or a 1 bit, we continue by testing whether the range is small enough but in the wrong position to provide output at this time. This is the situation labeled "Defer" in Figure initcode. We know that the first two bits of the output will be either 01 or 10; we just don't know which of these two possibilities it will be. Therefore, we defer the output, incrementing bits_to_follow to indicate that we have done so; we also reduce both low and high by FIRST_QTR, since we know that both are above that value. If this seems mysterious, remember that the encoding information we want is contained in the differences between low and high, so we want to remove any redundancy in their values.

If we get to the break statement near the end of the loop, we still don't have any idea what the next output bit(s) will be. This means that the range of frequencies corresponding to our message still straddles HALF and covers more than a quarter of the maximum possible frequency. If this happens on the first pass through the loop, we must have encoded a frequent character, since it hasn't contributed even one bit! In that case, we break out of the output loop; obviously, we have nothing to declare. In any of the other three cases, we now have arrived at the statement low <<= 1; with the values of low and high guaranteed to be less than half the maximum possible frequency.
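The three tests, the break, and the final shifts can be gathered into one loop. The C++ sketch below makes several assumptions:

- It uses an 8-bit code range with TOP_VALUE equal to 255.
- It uses the conventional definitions FIRST_QTR = TOP_VALUE / 4 + 1, HALF = 2 * FIRST_QTR, and THIRD_QTR = 3 * FIRST_QTR. These give the 64, 128, and 192 used in the example.
- The Coder type and its member names are hypothetical, and bits go into a std::string.
- The narrow() member is also an assumption. It only reproduces the arithmetic quoted later in the walkthrough for 'B': range is taken as high - low, and 63 is the divisor. The actual statements in Figure arenc.05 may differ.

```cpp
#include <cassert>
#include <string>

// Sketch of encode_symbol's output loop for an 8-bit code range.
// Coder and its members are hypothetical names for this illustration.
struct Coder {
    static constexpr long TOP_VALUE = 255;                // assumed 8-bit range
    static constexpr long FIRST_QTR = TOP_VALUE / 4 + 1;  // 64
    static constexpr long HALF = 2 * FIRST_QTR;           // 128
    static constexpr long THIRD_QTR = 3 * FIRST_QTR;      // 192

    long low = 0;
    long high = TOP_VALUE;
    long bits_to_follow = 0;
    std::string out;  // emitted bits as '0'/'1' characters

    // Emit a decided bit followed by any deferred (opposite) bits.
    void bit_plus_follow(int bit) {
        out.push_back(bit ? '1' : '0');
        for (; bits_to_follow > 0; --bits_to_follow) out.push_back(bit ? '0' : '1');
    }

    // Assumed narrowing step that reproduces the worked figures for 'B':
    // range = high - low, then both ends rescaled by cumulative counts.
    void narrow(long prev_cum, long cum, long total) {
        assert(0 <= prev_cum && prev_cum < cum && cum <= total);
        long range = high - low;
        high = low + (range * cum) / total - 1;  // uses the old value of low
        low = low + (range * prev_cum) / total;
    }

    // Output every bit that the current [low, high] range already determines.
    void renormalize() {
        for (;;) {
            if (high < HALF) {
                bit_plus_follow(0);  // range entirely in lower half
            } else if (low >= HALF) {
                bit_plus_follow(1);  // range entirely in upper half
                low -= HALF;
                high -= HALF;
            } else if (low >= FIRST_QTR && high < THIRD_QTR) {
                ++bits_to_follow;    // straddles HALF: defer a bit
                low -= FIRST_QTR;
                high -= FIRST_QTR;
            } else {
                break;               // next bit still unknown
            }
            low <<= 1;               // shift a 0 into low
            high = (high << 1) + 1;  // shift a 1 into high
        }
    }
};
```

The sketch reproduces the walkthrough's values step by step:

1. Starting from low = 223 and high = 255, renormalize() emits "11" and leaves low at 124 and high at 255.
2. narrow(2, 54, 63) then gives low = 128 and high = 235.
3. A second renormalize() emits one more 1 bit and leaves low at 0 and high at 215.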
Therefore, we shift the values of low and high up one bit to make room for our next pass through the loop. One more detail: we increment high after shifting it because the range represented by high actually extends almost to the next frequency value; we have to shift a 1 bit in rather than a 0 to keep this relationship. In our example, low is 95 and high is 127, which represents a range of frequencies from 95 to slightly less than 128. The shifts give us 190 for low and 255 for high, which represents a range from 190 to slightly less than 256. If we hadn't added 1 to high, the range would have been from 190 to slightly less than 255.

Since we have been over the code in this loop once already, we can continue directly with the example. We start out with low at 190 and high at 255. Since high is not less than HALF (128), we proceed to the second test, where low turns out to be greater than HALF. So we call bit_plus_follow_1 again as on the first pass and then reduce both low and high by HALF, producing 62 for low and 127 for high. At the bottom of the loop, we shift a 0 into low and a 1 into high, resulting in 124 and 255, respectively. On the next pass through the loop, high is not less than HALF, low isn't greater than or equal to HALF, and high isn't less than THIRD_QTR (192), so we hit the break and exit the loop. We have sent two bits to the output buffer.

We are finished with encode_symbol for this character. Now we will start processing the next character of our example message, which is 'B'. This character has a symbol value of 1, as it is the second character in our character set. First, we set prev_cum to 0. The frequency accumulation loop in Figure arenc.03 will not be executed at all, since symbol/2 evaluates to 0; we fall through to the adjustment code in Figure arenc.04 and select the odd path after the else, since the symbol code is odd. We set current_pair to (1,3), since that is the first entry in the frequency table.
Then we set total_pair_weight to the corresponding entry in the both_weights table, which is 54. Next, we set cum to 0 + 54, or 54. The high part of the current pair is 1, so high_half_weight becomes entry 1 in the translate table, or 2; we add this to prev_cum, which becomes 2 as well.

Now we have reached the first line in Figure arenc.05. Since the current value of low is 124 and the current value of high is 255, the value of range becomes 131. Next, we recalculate high as 124 + (131*54)/63 - 1, or 235. The new value of low is 124 + (131*2)/63, or 128.

We are ready to enter the output loop. First, high is not less than HALF, so the first test fails. Next, low is equal to HALF, so the second test succeeds. Therefore, we call bit_plus_follow_1 to output a 1 bit; it would also output any deferred bits that we might have been unable to send out before, although there aren't any at present. We also adjust low and high by subtracting HALF, to account for the bit we have just sent; their new values are 0 and 107, respectively. Next, we proceed to the statements beginning at low <<= 1;, where we shift low and high up, injecting a 0 and a 1, respectively; the new values are 0 for low and 215 for high. On the next pass through the loop we will discover that these values are too far apart to emit any more bits, and we will break out of the loop and return to the main function.

We could continue with a longer message, but I imagine you get the idea. So let's return to the main function (Figure encode.00).

update_model

The next function called is update_model (Figure adapt.01), which adjusts the frequencies in the current frequency table to account for having seen the most recent character.
The update_model function (from compress\adapt.cpp) (Figure adapt.01)

codelist/adapt.01

The arguments to this function are:

- symbol, the internal value of the character just encoded.
- oldch, the previous character encoded, which indicates the frequency table that was used to encode that character.

What is the internal value of the character? In the current version of the program, it is the same as the ASCII code of the character. However, near the end of the chapter we will employ an optimization that involves translating characters from ASCII to an internal code to speed up translation.

This function starts out by adding the character's ASCII code to char_total. This is a hash total used in the simple pseudorandom number generator that helps us decide when to upgrade a character's frequency to the next frequency index code. We use the symbol_translation table to get the ASCII value of the character before adding it to char_total; this is present for compatibility with our final version, which employs character translation.

The next few lines initialize some variables:

- old_weight_code, which we set when changing a frequency index code or "weight code", so that we can update the frequency total for this frequency table.
- temp_freq_info, a pointer to the frequency table structure for the current character.
- freq_ptr, the address of the frequency table itself.

Next, we compute the index into the frequency table for the weight code we want to examine. Each byte of the table holds two weight codes. If the symbol value is even, its weight code is in the high part of that byte. In this case, we execute the code in the true branch of the if statement "if (symbol % 2 == 0)". This starts by setting temp_freq to the high four bits of the table entry. If the result is 0, this character has the lowest possible frequency value; we assume that this is because it has never been encountered before and set its frequency index code to INIT_INDEX.
Then we update the total_freq element in the frequency table. However, if temp_freq is not 0, we have to decide whether to upgrade this character's frequency index code to the next level. The probability of this upgrade is inversely proportional to the ratio of the next frequency to the current frequency; the larger the gap between two frequency code values, the lower the probability of the upgrade. So we compare char_total to the entry in upgrade_threshold. If char_total is greater, we want to do the upgrade, so we record the previous frequency code in old_weight_code and add HIGH_INCREMENT to the byte containing the frequency index for the current character. We have to use HIGH_INCREMENT rather than 1 to adjust the frequency index, since the frequency code for the current character occupies the high four bits of its byte.

Of course, the character is just as likely to be in the low part of its byte. In that case, we execute the code in the false branch of that if statement, which does the same work on the low four bits. In either case, we follow up with the if statement whose condition is "(old_weight_code != -1)", which tests whether a frequency index code was incremented. If it was, we add the difference between the new code and the old one to the total_freq entry in the current frequency table. The exception is a character that previously had a frequency index code of 0; in that case, we have already adjusted the total_freq entry.

The last operation in update_model is to make sure that the total of all frequencies in the frequency table does not exceed the limit of MAX_FREQUENCY. If the total did exceed it, more than one character might map into the same value between high and low, so that unambiguous decoding would become impossible. Therefore, if temp_total_freq exceeds MAX_FREQUENCY, we have to reduce the frequency indexes until this is no longer the case.
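The packed layout that update_model works with can be sketched as follows. This is a minimal illustration, not the code from compress\adapt.cpp:

- get_weight_code and upgrade_weight_code are hypothetical helpers.
- HIGH_INCREMENT is taken to be 16, the value that adds 1 to the high four bits of a byte.
- The decision of whether to upgrade (the char_total and upgrade_threshold comparison) is left out.
- The total_freq bookkeeping is also left out.

```cpp
#include <cassert>
#include <cstdint>

// Two 4-bit weight codes share each byte: even symbols use the high four
// bits, odd symbols the low four bits. Helper names are hypothetical.
constexpr unsigned HIGH_INCREMENT = 1u << 4;  // adds 1 to the high nibble

// Read the weight code for 'symbol' from the packed table.
unsigned get_weight_code(const std::uint8_t* freq_ptr, unsigned symbol) {
    std::uint8_t pair = freq_ptr[symbol / 2];
    return (symbol % 2 == 0) ? (pair >> 4) : (pair & 0x0F);
}

// Raise the weight code for 'symbol' by one level; returns the old code.
unsigned upgrade_weight_code(std::uint8_t* freq_ptr, unsigned symbol) {
    unsigned old_code = get_weight_code(freq_ptr, symbol);
    assert(old_code < 15);  // a 4-bit code cannot go above 15
    unsigned increment = (symbol % 2 == 0) ? HIGH_INCREMENT : 1u;
    freq_ptr[symbol / 2] = static_cast<std::uint8_t>(freq_ptr[symbol / 2] + increment);
    return old_code;
}
```

Take a byte holding the pair (1,3), the first entry in the example's frequency table. Symbol 0 reads code 1 and symbol 1 reads code 3. Upgrading symbol 0 turns the byte into (2,3).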
The while loop whose continuation expression is "(temp_total_freq > MAX_FREQUENCY)" takes care of this problem in the following way:

1. We initialize temp_total_freq to 0, as we will use it to accumulate the frequencies as we modify them.
2. We set freq_ptr to the address of the first entry in the frequency table to be modified.
3. We step through all the bytes in the frequency table, testing for each one whether both indexes in the current byte are 0. If so, we can't reduce them, so we just add the frequency value corresponding to the translation of two 0 indexes (BOTH_WEIGHTS_ZERO) to temp_total_freq.
4. Otherwise, we copy the current index pair into freq. If the high index is nonzero, we decrement it. Similarly, if the low index is nonzero, we decrement it. After handling either or both of these cases, we add the translation of the new index pair to temp_total_freq.
5. After we have processed all of the index values in this way, we retest the while condition. When temp_total_freq is no longer out of range, we store it back into the frequency table and return to the main program.

Finally, we have returned to the main function (Figure encode.00), where we copy ch to oldch, so that the current character will be used to select the frequency table for the next character to be encoded. Then we continue in the main loop until all characters have been processed. When we reach EOF in the input file, the main loop terminates; we use encode_symbol to encode EOF_SYMBOL, which tells the receiver to stop.
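The rescaling loop in update_model can be sketched the same way. This C++ sketch is an illustration rather than the chapter's code, and it makes several assumptions:

- Each byte of the table holds two 4-bit weight codes.
- rescale and pair_weight are hypothetical names.
- The translate table that maps a code to its frequency is a placeholder supplied by the caller. The program's actual translate and both_weights tables are not reproduced here.

As in update_model, a zero code is left alone, so a byte with both codes 0 simply contributes its fixed weight. In the program, that weight is BOTH_WEIGHTS_ZERO.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Frequency assigned to each 4-bit weight code (placeholder values supplied
// by the caller; not the program's actual translate table).
using WeightTable = std::array<unsigned, 16>;

// Combined frequency of the two codes packed into one byte.
unsigned pair_weight(std::uint8_t pair, const WeightTable& translate) {
    return translate[pair >> 4] + translate[pair & 0x0F];
}

// Lower every nonzero code by one level, repeating until the table's total
// frequency is at most max_frequency; returns the new total.
unsigned rescale(std::vector<std::uint8_t>& table, const WeightTable& translate,
                 unsigned max_frequency) {
    // An all-zero table must fit, or the loop below could never finish.
    assert(table.size() * 2 * translate[0] <= max_frequency);

    unsigned temp_total_freq = 0;  // the program already keeps this total
    for (std::uint8_t pair : table) temp_total_freq += pair_weight(pair, translate);

    while (temp_total_freq > max_frequency) {
        temp_total_freq = 0;
        for (std::uint8_t& pair : table) {
            if (pair != 0) {  // both codes 0: nothing to reduce
                unsigned hi = pair >> 4;
                unsigned lo = pair & 0x0Fu;
                if (hi != 0) --hi;
                if (lo != 0) --lo;
                pair = static_cast<std::uint8_t>((hi << 4) | lo);
            }
            temp_total_freq += pair_weight(pair, translate);
        }
    }
    return temp_total_freq;
}
```

As a worked case, use a placeholder table in which code n has frequency n + 1:

- The table bytes (2,3) and (1,0) total 10.
- Rescaling to a limit of 6 takes two passes.
- The result is (0,1) and (0,0), with a total of 5.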