hardware engineer about the nature of the problem. However, it is visible only if we are running the test program in a debugger or emulator. The best way to proceed is to assume the best, download the test program, and let it run to completion. Then, only if the red LED comes on do you need to use the debugger to step through the program and examine the return codes and the contents of memory to see which test failed and why.

#include "led.h"
#include "memtest.h"     /* declares memTestDataBus(), memTestAddressBus(), and memTestDevice() */

#define BASE_ADDRESS   (volatile datum *) 0x10000000
#define NUM_BYTES      0x10000

/**********************************************************************
 *
 * Function:    main()
 *
 * Description: Test the second 64 KB bank of SRAM.
 *
 * Notes:
 *
 * Returns:     0 on success.
 *              Otherwise -1 indicates failure.
 *
 **********************************************************************/
int
main(void)
{
    if ((memTestDataBus(BASE_ADDRESS) != 0) ||
        (memTestAddressBus(BASE_ADDRESS, NUM_BYTES) != NULL) ||
        (memTestDevice(BASE_ADDRESS, NUM_BYTES) != NULL))
    {
        toggleLed(LED_RED);
        return (-1);
    }
    else
    {
        toggleLed(LED_GREEN);
        return (0);
    }

}   /* main() */

Unfortunately, it is not always possible to write memory tests in a high-level language. For example, C and C++ both require the use of a stack, and a stack itself requires working memory. This restriction might be reasonable in a system that has more than one memory device. For example, you might create the stack in an area of RAM that is already known to be working while testing another memory device. In one common scenario, a small SRAM is first tested from assembly language; once it passes, the stack is placed there, and a larger block of DRAM can then be tested using a more thorough algorithm, like the one shown earlier. If you cannot assume enough working RAM for the stack and data needs of the test program, then you will need to rewrite these memory test routines entirely in assembly language.

Another option is to run the memory test program from an emulator. In that case, you could place the stack in an area of the emulator's own internal memory. By moving the emulator's internal memory around in the target memory map, you could systematically test each memory device on the target.

The need for memory testing is perhaps most apparent during product development, when the reliability of the hardware and its design are still unproven. However, memory is one of the most critical resources in any embedded system, so it might also be desirable to include a memory test in the final release of your software. In that case, the memory test and other hardware confidence tests should be run each time the system is powered on or reset. Together, this initial test suite forms a set of hardware diagnostics. If one or more of the diagnostics fail, a repair technician can be called in to diagnose the problem and repair or replace the faulty hardware.
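To make the idea of an initial diagnostics suite a bit more concrete, here is a minimal sketch. The function name runDiagnostics and the failure bitmask encoding are illustrative assumptions, not part of the routines shown earlier; only the three memory test calls correspond to the functions used in main above.

#include "led.h"
#include "memtest.h"     /* memory test routines and the datum type, as used above */

#define BASE_ADDRESS   (volatile datum *) 0x10000000
#define NUM_BYTES      0x10000

/* One bit per diagnostic; this encoding is purely illustrative. */
#define DIAG_PASS        0x00
#define DIAG_SRAM_FAIL   0x01
/* ... flags for the other confidence tests would follow ... */

/*
 * Run the power-on hardware diagnostics and return a bitmask of failures,
 * which a technician can later inspect with a debugger.
 */
unsigned char
runDiagnostics(void)
{
    unsigned char failures = DIAG_PASS;

    if ((memTestDataBus(BASE_ADDRESS) != 0) ||
        (memTestAddressBus(BASE_ADDRESS, NUM_BYTES) != NULL) ||
        (memTestDevice(BASE_ADDRESS, NUM_BYTES) != NULL))
    {
        failures |= DIAG_SRAM_FAIL;
    }

    /* ... run the remaining hardware confidence tests here ... */

    toggleLed((failures == DIAG_PASS) ? LED_GREEN : LED_RED);

    return (failures);

}   /* runDiagnostics() */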
6.3 Validating Memory Contents

It does not usually make sense to perform the type of memory testing described earlier when dealing with ROM and hybrid memory devices. ROM devices cannot be written at all, and hybrid devices usually contain data or programs that cannot be overwritten. However, it should be clear that the same sorts of memory problems can occur with these devices. A chip might be missing or improperly inserted, physically or electrically damaged, or there could be an electrical wiring problem. Rather than just assuming that these nonvolatile memory devices are functioning properly, you would be better off having some way to confirm that each device is working and that the data it contains is valid. That's where checksums and cyclic redundancy codes come in.

6.3.1 Checksums

How can we tell if the data or program stored in a nonvolatile memory device is still valid? One of the easiest ways is to compute a checksum of the data when it is known to be good—prior to programming the ROM, for example. Then, each time you want to confirm the validity of the data, you need only recalculate the checksum and compare the result to the previously computed value. If the two checksums match, the data is assumed to be valid. By carefully selecting the checksum algorithm, we can increase the probability that specific types of errors will be detected.

The simplest checksum algorithm is to add up all the data bytes (or, if you prefer a 16-bit checksum, words), discarding carries along the way. A noteworthy weakness of this algorithm is that if all of the data (including the stored checksum) is accidentally overwritten with 0's, the corruption will go undetected; the sum of a large block of zeros is also zero. The simplest way to overcome this weakness is to add a final step to the checksum algorithm: invert the result. That way, if the data and checksum are somehow overwritten with 0's, the test will fail because the proper checksum would be FFh.

Unfortunately, a simple sum-of-data checksum like this one cannot detect many of the most common data errors. Clearly, if one bit of data is corrupted (switched from 1 to 0, or vice versa), the error would be detected. But what if two bits in the very same "column" happened to be corrupted in opposite directions (the first switches from 1 to 0, the other from 0 to 1)? The proper checksum does not change, and the error would not be detected. If bit errors can occur, you will probably want to use a better checksum algorithm. We'll see one of these in the next section.

After computing the expected checksum, we'll need a place to store it. One option is to compute the checksum ahead of time and define it as a constant in the routine that verifies the data. This method is attractive to the programmer but has several shortcomings. Foremost among them is the possibility that the data—and, as a result, the expected checksum—might change during the lifetime of the product. This is particularly likely if the data being tested is actually embedded software that will be periodically updated as bugs are fixed or new features added.

A better idea is to store the checksum at some fixed location in memory. For example, you might decide to use the very last location of the memory device being verified. This makes insertion of the checksum easy—just compute the checksum and insert it into the memory image prior to programming the memory device. When you recalculate the checksum, you simply skip over the location that contains the expected result and compare the new result to the value stored there. Another good place to store the checksum is in another nonvolatile memory device. Both of these solutions work very well in practice.
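Here is a minimal sketch of the sum-and-invert checksum just described, assuming the expected value is stored in the very last byte of the device image. The function names computeChecksum and checksumOk are illustrative, not taken from any standard library.

/*
 * Add up every data byte except the final one (which holds the expected
 * checksum), discarding carries, then invert the result so that an
 * all-zero device fails the check.  nBytes includes the checksum byte
 * and must be at least 1.
 */
unsigned char
computeChecksum(volatile unsigned char const * pData, unsigned long nBytes)
{
    unsigned char sum = 0;
    unsigned long offset;

    for (offset = 0; offset < nBytes - 1; offset++)
    {
        sum += pData[offset];          /* carries are discarded automatically */
    }

    return (unsigned char) ~sum;

}   /* computeChecksum() */

/*
 * Return 1 if the recomputed checksum matches the value stored in the
 * last byte of the device, 0 otherwise.
 */
int
checksumOk(volatile unsigned char const * pData, unsigned long nBytes)
{
    return (computeChecksum(pData, nBytes) == pData[nBytes - 1]);

}   /* checksumOk() */

Because the stored checksum byte is skipped during the summation, the value can be inserted into the last location of the memory image prior to programming, exactly as suggested above.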
6.3.2 Cyclic Redundancy Codes

A cyclic redundancy code (CRC) is a specific checksum algorithm that is designed to detect the most common data errors. The theory behind the CRC is quite mathematical and beyond the scope of this book. However, cyclic redundancy codes are frequently useful in embedded applications that require the storage or transmission of large blocks of data. What follows is a brief explanation of the CRC technique and some source code that shows how it can be done in C. Thankfully, you don't need to understand why CRCs detect data errors—or even how they are implemented—to take advantage of their ability to detect errors.

Here's a very brief explanation of the mathematics. When computing a CRC, you consider the set of data to be a very long string of 1's and 0's (called the message). This binary string is divided—in a rather peculiar way—by a smaller fixed binary string called the generator polynomial. The remainder of this binary long division is the CRC checksum. By carefully selecting a generator polynomial with certain desirable mathematical properties, you can use the resulting checksum to detect most (but never all) errors within the message. The strongest of these generator polynomials are able to detect all single and double bit errors, and all odd-length strings of consecutive error bits. In addition, greater than 99.99% of all burst errors—defined as a sequence of bits that has one error at each end—can be detected. Together, these types of errors account for a large percentage of the possible errors within any stored or transmitted binary message.

Those generator polynomials with the very best error-detection capabilities are frequently adopted as international standards. Three such standards are parameterized in Table 6-4. Associated with each standard are its width (in bits), the generator polynomial, a binary representation of the polynomial called the divisor, an initial value for the remainder, and a value to XOR (exclusive or) with the final remainder. [2]

Table 6-4. International Standard Generator Polynomials

    CCITT
        Checksum size (width):  16 bits
        Generator polynomial:   x^16 + x^12 + x^5 + 1
        Divisor (polynomial):   0x1021
        Initial remainder:      0xFFFF
        Final XOR value:        0x0000

    CRC16
        Checksum size (width):  16 bits
        Generator polynomial:   x^16 + x^15 + x^2 + 1
        Divisor (polynomial):   0x8005
        Initial remainder:      0x0000
        Final XOR value:        0x0000

    CRC32
        Checksum size (width):  32 bits
        Generator polynomial:   x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
        Divisor (polynomial):   0x04C11DB7
        Initial remainder:      0xFFFFFFFF
        Final XOR value:        0xFFFFFFFF

The code that follows can be used to compute any CRC formula that has a similar set of parameters. [3] To make this as easy as possible, I have defined all of the CRC parameters as constants. To change to the CRC16 standard, simply change the values of the three constants. For CRC32, change the three constants and redefine width as type unsigned long.

/*
 * The CRC parameters.  Currently configured for CCITT.
 * Simply modify these to switch to another CRC standard.
 */
#define POLYNOMIAL          0x1021
#define INITIAL_REMAINDER   0xFFFF
#define FINAL_XOR_VALUE     0x0000

/*
 * The width of the CRC calculation and result.
 * Modify the typedef for an 8- or 32-bit CRC standard.
 */
typedef unsigned short  width;

#define WIDTH    (8 * sizeof(width))
#define TOPBIT   (1 << (WIDTH - 1))

The function crcInit should be called first. It implements the peculiar binary division required by the CRC algorithm. It will precompute the remainder for each of the 256 possible values of a byte of the message data. These intermediate results are stored in a global lookup table that can be used by the crcCompute function. By doing it this way, the CRC of a large message can be computed a byte at a time rather than bit by bit. This reduces the CRC calculation time significantly.
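Since the full listing is not reproduced here, the following is a minimal sketch of how a table-driven crcInit and crcCompute typically look, built on the constants and the width typedef defined above. Treat it as an illustration of the technique rather than the exact listing that accompanies the original text.

width  crcTable[256];      /* global lookup table filled in by crcInit() */

/*
 * Precompute the CRC remainder of each possible 8-bit message chunk,
 * using bit-by-bit modulo-2 "long division" by the generator polynomial.
 */
void
crcInit(void)
{
    width  remainder;
    int    dividend;
    int    bit;

    for (dividend = 0; dividend < 256; dividend++)
    {
        /* Start with this byte moved into the top of the remainder. */
        remainder = (width) (dividend << (WIDTH - 8));

        /* Divide out each of the eight bits in turn. */
        for (bit = 0; bit < 8; bit++)
        {
            if (remainder & TOPBIT)
            {
                remainder = (width) ((remainder << 1) ^ POLYNOMIAL);
            }
            else
            {
                remainder = (width) (remainder << 1);
            }
        }

        crcTable[dividend] = remainder;
    }

}   /* crcInit() */

/*
 * Compute the CRC of a message one byte at a time, by table lookup.
 */
width
crcCompute(unsigned char const * message, unsigned int nBytes)
{
    width          remainder = INITIAL_REMAINDER;
    unsigned int   offset;
    unsigned char  data;

    for (offset = 0; offset < nBytes; offset++)
    {
        data = (unsigned char) (message[offset] ^ (remainder >> (WIDTH - 8)));
        remainder = (width) (crcTable[data] ^ (remainder << 8));
    }

    return (width) (remainder ^ FINAL_XOR_VALUE);

}   /* crcCompute() */

A typical use is to call crcInit once at startup and then compare the result of crcCompute over the stored data against the expected CRC saved at a known location, just as with the simple checksum described earlier.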