Digital Logic Testing and Simulation, Second Edition, by Alexander Miczo
ISBN 0-471-43995-9 Copyright © 2003 John Wiley & Sons, Inc.

CHAPTER 10
Memory Test

10.1 INTRODUCTION

Memory is pervasive in digital products. Consider, for example, the personal computer (PC). It has main memory, video memory, translation ROMs, shadow ROMs, scratchpad memory, hard disk, floppy disk, CDROM, and various other kinds of storage distributed throughout. In addition, the die that contains the microprocessor may also contain one or more levels of cache. A typical PC is depicted in the block diagram of Figure 10.1. It is basically a memory hierarchy connected by several buses and adapters and controlled by a CPU.

The purpose of much of the hierarchy is to combine two or more storage systems with divergent capacities, speeds, and costs such that the combined system has almost the speed of the smaller, faster, more expensive memory at almost the cost and storage capacity of the larger, slower, less expensive memory. Clearly, not all storage devices are part of this hierarchy. The CDROM may be used to deliver programs and/or data to an end user, and video memory is dedicated to the display console. The central processing unit (CPU) accesses many of these auxiliary memory devices through a peripheral component interconnect (PCI) bus, which regulates the flow of data through the system.

Unlike the random logic that has been considered up to this point, memory storage devices are characterized by a high degree of regularity. For example, a semiconductor memory is organized as an array of cells, while storage on a hard drive is organized into cylinders. This regularity of semiconductor memories permits much denser packing of transistors on a die. For example, in the PowerPC MPC750, memory accounts for 85% of the transistors but only 44% of the die area [1]. In the Alpha 21164, 80% of the 9.6 million transistors are used for three on-chip caches, but the remaining 20% of the transistors occupy a majority of the physical die area [2].

The various storage devices in Figure 10.1 employ different kinds of circuits for storing and retrieving data, and different kinds of media for retaining data; hence they have unique failure mechanisms that require different test strategies. These memories may also employ varying levels of redundancy to detect and/or correct errors during operation.

Figure 10.1 Memory distribution in a typical PC architecture.

10.2 SEMICONDUCTOR MEMORY ORGANIZATION

Because semiconductor memories are characterized by a high degree of regularity, it is easy to devise algorithms to test them. However, because of the growing capacity of memories, many of the tests will run for unacceptably long periods of time. A significant problem then, when testing memories, is to identify the kinds of faults that are most likely to occur and determine the most efficient tests for those faults.

Semiconductor memories can be characterized according to the following properties:

    Serial or random access
    Volatile or nonvolatile
    Static or dynamic
    Destructive or nondestructive readout

Serial access memories are those in which data are accessed in a fixed, predetermined sequence. Magnetic tape units are an example of serial access. To read a record it is necessary to read the entire tape up to the point where the desired data exists. By way of contrast, a random access memory (RAM) permits reading of data at any specific location without first reading other data. When performing a read of a FIFO (first-in, first-out) memory, the first location stored is the first to be read out. These memories act as buffers when transferring data between functional units with different data rates. A stack in a computer, often used to save data and return addresses, is an example of a LIFO (last-in, first-out) memory.
The last data pushed onto the stack is the first data to become available when the stack contents are popped from the stack.

Memories can be categorized according to whether or not they can retain information when power is removed. A nonvolatile memory retains information when power is removed. Examples of nonvolatile memories include magnetic cores, magnetic tapes, disks, MROMs, EPROMs, EEPROMs, and flash memories. Volatile memory devices lose information when power is removed.

Volatile memories can be further broken down into static and dynamic memories. A static memory retains information as long as power is applied, while a dynamic memory can lose information even when power is continuously applied. Static RAMs (SRAMs) are flip-flops that, with their two stable states, can remain in a given state indefinitely, without need for refresh, as long as power is applied; that is, they are static but volatile.

The dynamic RAM (DRAM), illustrated in Figure 10.2, is an example of a dynamic memory. The cell is chosen if decoding the memory address causes its word-line to be selected. It is basically a capacitor that can either be discharged onto the bit-line or recharged from the bit-line. Since it is a capacitor, the charge can leak away over time. The memory system must employ refresh circuitry that periodically reads the cells and writes back a suitably amplified version of the signal.

Figure 10.2 Dynamic memory cell.

If the contents of a memory device are destroyed by a read operation, it is classified as a destructive readout (DRO) device; otherwise it is a nondestructive readout (NDRO) device.
DRAMs must be refreshed when their contents are read out, since a read causes the capacitor to discharge.

Programmable read-only memories (PROMs) are slightly more complicated to characterize. They are static and nonvolatile. Mask programmable ROMs and fuse programmable ROMs are programmed once and thereafter can only be read. EPROMs (erasable PROMs) can be erased by means of ultraviolet light, which involves physically removing them from the system in which they are installed. For all practical purposes, they are programmed only once because it is quite inconvenient to erase and reprogram them, unless they are being used to emulate a new design for the purposes of debugging that design.

EEPROMs (electrically erasable PROMs) can be reprogrammed after being installed in a system, but their response time is slower than that of DRAMs or SRAMs; hence they are confined to applications where nonvolatility is required. Flash memories are structurally almost identical to EPROMs, but they can be reprogrammed in a system and are denser than EEPROMs. However, EEPROMs can be programmed a bit at a time, whereas flash memories are erased a block at a time before being reprogrammed. The Venn diagram in Figure 10.3 illustrates this distribution of properties among the various kinds of semiconductor memories [3].

Figure 10.3 Semiconductor memory properties.

Semiconductor memories usually employ an organization called 2-D. In this organization a 2^m × 1 memory with m address lines is organized into a matrix with 2^N rows and 2^M columns (N + M = m). The address lines are split into two groups such that N lines go to a row decoder and M lines go to a column decoder. This is illustrated in Figure 10.4. The row decoder selects one of the 2^N rows, and with it the 2^M memory cells on that row, and the column decoder selects one of those cells to be read out of or written into memory.
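
The row and column split can be sketched in software. The following Python fragment is a minimal illustration rather than anything taken from the text: the function name `split_address` is hypothetical, and it assumes that the N low-order address bits feed the row decoder and the M high-order bits feed the column decoder. A real device may assign the address bits differently.

```python
def split_address(addr, n_row_bits, m_col_bits):
    """Split an m-bit cell address into (row, column) indices.

    Assumption for illustration only: the N low-order bits select the row
    and the M high-order bits select the column.
    """
    total_bits = n_row_bits + m_col_bits          # m = N + M
    if not 0 <= addr < (1 << total_bits):
        raise ValueError("address out of range")
    row = addr & ((1 << n_row_bits) - 1)          # one of 2^N word lines
    col = addr >> n_row_bits                      # one of 2^M columns
    return row, col


# A 64 x 1 memory (m = 6) arranged as 8 rows by 8 columns (N = M = 3).
print(split_address(0b101011, 3, 3))              # prints (3, 5)
```

Every one of the 2^m addresses maps to a distinct (row, column) pair, which is the property that the address uniqueness tests in this chapter are meant to confirm in hardware.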
This idealized organization is the subject of numerous modifications whose purpose is to permit faster operation and/or faster test. One of the more significant changes is the division of the memory array into several smaller arrays. This reduces loading on the bit lines. As we shall see, it also permits multiple cells to be tested simultaneously.

Figure 10.4 A semiconductor memory organization.

10.3 MEMORY TEST PATTERNS

In this section some classical, or legacy, memory test algorithms will be examined. Memory test algorithms fall into two categories: functional and dynamic. A functional test targets defects within a memory cell, as well as failures that occur when cell contents are altered by a read or write to another cell. A dynamic test attempts to find access time failures. The All 1s or All 0s tests are examples of functional tests. These tests write 1s or 0s into all memory cells in order to detect individual cell defects including shorts and opens. However, these tests are not effective at finding other failure types.

A memory test pattern that tests for address nonuniqueness and other functional faults in memories, as well as some dynamic faults, is the GALPAT (GALloping PATtern), sometimes referred to as a ping-pong pattern. This pattern accesses each address repeatedly using, at some point, every other cell as a previous address. It starts by writing a background of zeroes into all memory cells. Then the first cell becomes the test cell. It is complemented and read alternately with every other cell in memory. Each succeeding cell then becomes the test cell in turn and the entire read process is repeated. All data are complemented and the entire test is repeated.
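
The ping-pong access order just described can be modeled in a few lines of software. The Python sketch below is an illustration under stated assumptions, not the chapter's own code: the names `FaultyRam` and `galpat` are hypothetical, the memory is a simple list of bits, and the optional fault is a single cell stuck at 1. It also assumes that the test cell is restored to the background value before the next cell becomes the test cell.

```python
class FaultyRam:
    """Bit-wide memory model; one cell may optionally be stuck at 1."""

    def __init__(self, n_cells, stuck_at_1=None):
        self.cells = [0] * n_cells
        self.stuck_at_1 = stuck_at_1

    def write(self, addr, value):
        # A stuck-at-1 cell ignores the data written to it.
        self.cells[addr] = 1 if addr == self.stuck_at_1 else value

    def read(self, addr):
        return self.cells[addr]


def galpat(ram, n_cells):
    """Run a ping-pong GALPAT; return (mismatches, number of reads)."""
    errors, reads = [], 0
    for background in (0, 1):                     # second pass uses complemented data
        for addr in range(n_cells):
            ram.write(addr, background)
        for test in range(n_cells):
            ram.write(test, 1 - background)       # complement the test cell
            for other in range(n_cells):
                if other == test:
                    continue
                # Read the other cell, then ping-pong back to the test cell.
                for addr, expected in ((other, background), (test, 1 - background)):
                    actual = ram.read(addr)
                    reads += 1
                    if actual != expected:
                        errors.append((addr, expected, actual))
            ram.write(test, background)           # restore before moving on
    return errors, reads


good_errors, good_reads = galpat(FaultyRam(64), 64)
bad_errors, _ = galpat(FaultyRam(64, stuck_at_1=27), 64)
print(len(good_errors), good_reads, sorted({addr for addr, _, _ in bad_errors}))
```

For N cells this model performs 2 × N × (N − 1) × 2 = 4N(N − 1) reads, so the read count grows as roughly 4N^2.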
If each read and compare is counted as one operation, then GALPAT has an execution time proportional to 4N^2, where N is the number of cells. It is effective for finding cell opens, shorts, address uniqueness faults, sense amplifier interaction, and access time problems.

The following Verilog code illustrates the operation of the GALPAT test. First, a RAM module of size "memdepth" × 1 bit is described. The RAM model contains code used to insert a stuck-at fault at memory location 27. The RAM model is followed by a testbench that executes the GALPAT test. The line of code that instantiates the RAM passes parameters into the RAM from the testbench in order to override the RAM size.

    module ram(addr, datai, datao, wen, oen);
      parameter log2_memdepth = 8, memdepth = 256;
      input [log2_memdepth-1:0] addr;
      input datai, wen, oen;
      output datao;
      reg ramcore[memdepth-1:0];
      reg datao;

      // read port: drive the addressed bit while the output is
      // enabled (oen low) and no write is in progress (wen high)
      always @(oen or wen or addr)
      begin
        if (!oen && wen)
          datao = ramcore[addr];
        else if (oen)
          datao = 1'bz;
        else
          datao = 1'bx;
      end

      // write port: data are stored on the falling edge of wen
      always @(negedge wen)
      begin
        if (addr == 27)          // inject a fault at location 27
          ramcore[addr] = 1'b1;
        else
          ramcore[addr] = datai;
      end
    endmodule

    module testbench;
      parameter log2_memdepth = 6;
      parameter memdepth = 64;
      reg [log2_memdepth-1:0] addr;
      reg datain, wen, oen, memval;
      wire dataout;
      integer e, i, j;

      ram #(log2_memdepth,memdepth) U1(addr,datain,dataout,wen, oen);

      always
      begin
        for(e = 0; e <= 1; e = e+1)
        begin
          for(i = 0; i < memdepth; i = i+1)
            write(e,i);          // write background of e, e in {0,1}
          for(i = 0; i < memdepth; i = i+1)
          begin
            write(!e,i);
            for(j = 0; j < memdepth; j = j+1)
              if(j != i)
              begin              // check all mem. loc. except loc. i
                read(memval, j);
                if(memval != e)
                  $display("Mem. Error at loc. %d\n",j);
              end                // for j
            read(memval,i);      // loc. i should not change
            if(memval != !e)
              $display("Mem. Error at loc. %d\n", i);  // report test cell i
            write(e,i);          // restore value at loc. i
          end                    // for i
        end                      // for e
        $finish;
      end                        // always

      task write;                // write to memory
        input data, adval;
        integer adval;
        begin
          datain = data;
          addr = adval;
          #1 wen = 0;
          #1 wen = 1;
        end
      endtask

      task read;                 // read from memory
        output data;
        input adval;
        integer adval;
        begin
          addr = adval;
          #1 oen = 0;
          #0.5 data = dataout;
          #0.5 oen = 1;
        end
      endtask
    endmodule

Walking Pattern is similar to the GALPAT except that the test cell is read once and then all other cells are read. To create a Walking Pattern from the GALPAT program, omit the second read operation in the testbench. The Walking Pattern has an execution time proportional to 2N^2. It checks memory for cell opens and shorts and address uniqueness.

March, like most of the algorithms, begins by writing a background of zeroes. Then it reads the data at the first location and writes a 1 to that address. It continues this read/write procedure sequentially with each address in memory. When the end of memory is reached, each cell is read and changed back to zero in reverse order. The test is then repeated using complemented data. Execution time is of order N. It can find cell opens, shorts, address uniqueness, and some cell interactions.

Galloping Diagonal is similar to GALPAT in that a 1 is moved through memory. However, it is moved diagonally, checking both row and column decoders simultaneously. It is of order 4N^(3/2). Row and column GALPATs of order 4N^(3/2) also exist.

Sliding Diagonal (see Figure 10.5) writes a complete diagonal of 1s against a background of 0s and then, after reading all memory cells, it shifts the diagonal horizontally. This continues until the diagonal of 1s has passed through all memory locations. The Diagonal test, of order N, will verify address uniqueness at a significant speed enhancement over the Walk or GALPAT.

Checkerboard Test writes 1s and 0s into alternate memory locations in a checkerboard pattern.
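
The checkerboard background can be generated directly from the row and column indices. The short Python sketch below is illustrative only: the helper name `checkerboard` is hypothetical, and it assumes that the logical row and column indices match the physical layout of the array, which is not guaranteed on a real device where addresses may be scrambled.

```python
def checkerboard(rows, cols, invert=False):
    """Return a rows x cols grid in which adjacent cells hold opposite values."""
    start = 1 if invert else 0
    return [[(r + c + start) % 2 for c in range(cols)] for r in range(rows)]


for row in checkerboard(4, 4):
    print("".join(str(bit) for bit in row))
# prints:
# 0101
# 1010
# 0101
# 1010
```

Calling the helper with `invert=True` gives the complementary pattern, so that every cell is exercised with both data values.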
After a time delay, which may be several seconds, the pattern is read from memory. This pattern is used to evaluate data retention in static RAMs.

Figure 10.5 The sliding diagonal test.

Surround Read Disturb starts by creating a background of all 0s. Then, each cell in turn becomes the test cell. The test cell is complemented and the eight physically adjacent cells are repeatedly read. After a number of iterations the test cell is read to determine if it has been affected by the read of its neighbors. The operation is then repeated for a background of 1s. The intent is to find disturbances caused by adjacent cell operations. Execution time depends on the number of read cycles but is of the order N.

Surround Write Disturb is identical to the Surround Read Disturb except that a write rather than a read is performed.

Write Recovery writes a background of 0s. Then the first cell is established as the test cell. A 1 is written into the second cell and the first (test) cell is read. The second cell is restored to 0 and the test cell is read again. This is repeated for the test cell and every other cell. Every cell then becomes the test cell in turn. The entire process is repeated using complemented data. This is an N^2 test that is directed at write recovery type faults. It also detects faults that are detected by GALPAT.

Address Test writes a unique value into each memory location. Typically, this could be the address of that memory cell; that is, the value n is written into memory location n. After writing all memory locations, the data are read back. The purpose of this test is to check for address uniqueness. This algorithm requires that the number of bits in each memory word equal or exceed the number of address bits.

Moving Inversions test [4] inverts a memory filled with 0s to 1s and conversely. After initially filling the memory with 0s, a word is read. Then a single bit is changed to a 1, and the word is read again.
This is repeated until all bits in the word are set to 1 and then repeated for every word in memory. The operation is then reversed, setting bits to 0 and working from high memory to low memory.

For a memory with n address bits the process is repeated n times. However, on each repetition, a different bit of the address is taken as the least significant bit for incrementing through all possible addresses. An overflow generates an end-around carry so all addresses are generated, but the method increments through addresses by 1s, 2s, 4s, and so on. For example, on the second time through, bit 1 (when regarding bit 0 as least significant bit, LSB) is treated as the LSB so all even addresses are generated out to the end of memory. After incrementing to address 111...110, the next address generated is address 000...001, and then all consecutive odd addresses are generated out to the end of memory. The pattern of memory address generation (read the addresses vertically, most significant bit in the top row) for the second iteration is as follows:

    0000 . . . 1111
       .   .   .
    0000 . . . 1111
    0011 . . . 0011
    0101 . . . 0101
    0000 . . . 1111

The Moving Inversions test pattern has 12BN log2(N) patterns, where B is the number of bits in a memory word. It detects addressing failures and cell opens and shorts. It is also effective for checking access times.

Array contents shown in Figure 10.5 (sliding diagonal):

    1 0 0 0 0 0 0 0
    0 1 0 0 0 0 0 0
    0 0 1 0 0 0 0 0
    0 0 0 1 0 0 0 0
    0 0 0 0 1 0 0 0
    0 0 0 0 0 1 0 0
    0 0 0 0 0 0 1 0
    0 0 0 0 0 0 0 1

10.4 MEMORY FAULTS

As memories grow larger, with more memory cells packed into an ever-shrinking die area, the cost to manufacture a die remains fairly constant, while the time it takes to apply test programs increases exponentially. It is variously estimated that the cost to test a memory chip runs from 50% to 70% of the total cost of the finished product [5]. The first step in reducing the cost of memory test is to understand what fault mechanisms are most likely to occur and then develop test programs that target those faults. With this approach, the manufacturer and the end-user can determine their priorities, balancing cost against the DPM (defects per million) that they can tolerate in their applications.

A number of different failure types can occur in semiconductor memories, affecting memory cell contents, cell addressing, and the time required to read out data. Some of the more common failures include the following [6]:

    Cell opens or shorts
    Address nonuniqueness
    Cell/column/row disturb sensitivity
    Sense amplifier interaction
    Slow access time
    Slow write recovery
    Data sensitivity
    Refresh sensitivity
    Static data losses

Opens and shorts within semiconductor memory cells may occur because of faulty processing, including misaligned masks or imperfect metallization. These failures are characterized by a general randomness in their nature. Opens and shorts may also occur at the chip connections to a printed circuit board (PCB). In a km × n memory system containing km words of n bits each, and made up of memory chips of size m × 1, a fault that occurs in bit position i of m consecutive bits is indicative of either a totally failed chip or one in which an open or short exists between the chip and the PCB on which it is mounted.

Address nonuniqueness results from address decoder failures that may cause either the same memory cell to be accessed by several different addresses or several cells to be addressed during a single access. These failures often cause some cells to be physically inaccessible. An effective test must ensure that each read or write operation accesses one, and only one, memory cell.

Disturb sensitivity between adjacent cells or between cells in the same row or column can result from capacitive coupling.
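
Address nonuniqueness can be illustrated with a small software model. The Python sketch below is an illustration under stated assumptions rather than a procedure from the text: the class `AliasedRam` and its `stuck_at_0_bit` parameter are hypothetical and model a decoder in which one address line is stuck at 0, so that two addresses reach the same word. Writing each word's own address into it and then reading everything back exposes the aliasing. The model uses unbounded Python integers, so the requirement that a word be at least as wide as the address is satisfied automatically.

```python
class AliasedRam:
    """Word-wide memory model whose decoder may have one address line stuck at 0."""

    def __init__(self, n_addr_bits, stuck_at_0_bit=None):
        self.n_addr_bits = n_addr_bits
        self.stuck_at_0_bit = stuck_at_0_bit
        self.words = [0] * (1 << n_addr_bits)

    def _decode(self, addr):
        # With a stuck line, addresses differing only in that bit collide.
        if self.stuck_at_0_bit is None:
            return addr
        return addr & ~(1 << self.stuck_at_0_bit)

    def write(self, addr, value):
        self.words[self._decode(addr)] = value

    def read(self, addr):
        return self.words[self._decode(addr)]


def address_test(ram):
    """Write value n to location n, read back, return the failing addresses."""
    size = 1 << ram.n_addr_bits
    for addr in range(size):
        ram.write(addr, addr)
    return [addr for addr in range(size) if ram.read(addr) != addr]


print(address_test(AliasedRam(3)))                    # prints []
print(address_test(AliasedRam(3, stuck_at_0_bit=1)))  # prints [0, 1, 4, 5]
```

In the faulty case the words at addresses 2, 3, 6, and 7 are never reached, which matches the observation that such failures leave some cells physically inaccessible.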
Slow access time can be caused by slow decoders, overloaded sense amplifiers, or an excessive capacitive charge on output circuits. Slow write recovery may indicate a saturated sense amplifier that cannot recover from a write operation in time to perform a subsequent read operation.

A memory cell can be affected by the contents of neighboring cells. Worse still, the cell may be affected only by particular combinations of values in neighboring cells. This problem grows more serious as the distance between neighboring cells shrinks. Refresh sensitivity in dynamic RAMs may be induced by a combination of data sensitivity and temperature or voltage fluctuations. Static RAM cells are normally able to retain their state indefinitely. However, data may become lost due to leakage current or opens in resistors or feedback paths.

Recall from Section 3.4, when discussing faults in random logic, that fault models other than the stuck-at model were examined. The one trait these models had in common was a susceptibility to combinatorial explosion. Even for very small circuits, the number of faults grew so quickly that it was simply not feasible to consider them. Memory circuits, because of their density and the close proximity of cells to one another, exhibit this problem of combinatorial explosion to a far greater degree. Hence, it becomes necessary to restrict consideration to faults that are most likely to occur.

The first step is to group the faults into three broad categories: address decoder faults, memory array faults, and read/write logic faults. From there we use the fact, demonstrated by Nair, Thatte, and Abraham [7], that faults in memory addressing and read/write logic, which includes sense amplifiers, write drivers, and other supporting logic, can be mapped onto functionally equivalent faults in the memory array. This makes it possible to concentrate on faults in the memory array and to develop tests directed at the functionality of the memory array.
First consider faults in the address decode logic. A fault may cause multiple cells to be accessed, or no cell may be accessed, or the wrong cell may be addressed. In the case of multiple cells being addressed, the fault may be viewed as a coupling fault between cells. If no cell is addressed, then, depending on the logic, the response from the read logic may appear as a stuck-at-1 or a stuck-at-0. If the wrong cell is addressed, then, given the presence of the opposite value in that cell, it appears as a stuck-at fault.

[...]

... memory test algorithms detect the faults

10.7 Suppose that a particular die is made up of 55% memory and 45% random logic. Assume that in shipped parts, memory has 2 DPM (defects per million) and that the logic has 1100 DPM. What is the overall DPM for the chip? If process yield for the logic is 70%, what fault coverage is needed to have less than 500 DPM for the shipped parts?

10.8 Create the (8,4) SEC-DED ...

... faulty operation can be caused by mask imperfections or pinhole defects that would not have caused errors in a die with larger feature sizes. In a die populated with random logic, there is no predictable order to the placement of logic cells, and faulty die are normally discarded. However, since a significant portion of the faulty die contain only a few faulty memory cells [17], it is possible to take advantage ...

... circuits to generate memory addresses in some predetermined order. With minor modifications to the diagram, the same test generator could be used to generate the expected response, by way of the control logic, in addition to the test pattern sequence. In fact, the test generator could generate data to first fill all of memory with some desired pattern, then the same test generator could generate the expected ...
... same one used previously (Section 10.3). It is easily altered to model various fault mechanisms.

Figure 10.6 Generic BIST scheme.

    `timescale 1ns / 100ps
    module DFF(Q, CLK, set);
      input CLK, set;
      output Q;
      reg Q;
      always @(posedge CLK or posedge set)
        if(set) Q = 1'b1;
        else ...

... immediately detectable by virtue of the fact that they will cause an even number of flip-flops to be turned on. A parity check on these flip-flops reveals stuck-at faults not only in the flip-flops, but in the logic that controls the state transitions.

10.5.4 Parallel Test for Memories

Conceptually, it is inviting to think of a memory as being composed of a single, monolithic array. This is in part due to the ...

A fault in the read/write logic may cause an output line to be stuck-at-0 or stuck-at-1. In either case, the corresponding cell may be considered to be stuck-at-0 or stuck-at-1. If there are shorts or capacitive coupling between data ...

... reduced charge densities [19]. Helium nuclei from impurities found in the semiconductor packaging materials can migrate toward the charge area and neutralize enough of the charge in a memory cell to cause a logic 1 to be changed to a 0. These soft errors, intermittent in nature, are growing more prevalent as chip densities increase. One solution is to employ a parity bit with each memory word to aid in the ...

... reliability improvement. When should ECC be employed? The answer to this question depends on the application and the extent to which it can tolerate memory bit failures. ECC requires extra memory bits and logic and introduces extra delay in a memory cycle; furthermore, it is not a cure for all memory problems since it cannot correct address line failures and, in memories where data can be stored as bytes ...
... constantly upgrade their testers, with a resultant increase in cost. Another problem that must be faced is the inability to access many of the memories because they are embedded in ICs, surrounded by random logic. Gaining access to the address, data and control pins and controlling them with dedicated memory test algorithms is often impossible. As a result of these growing difficulties, memory built-in self-test ...