Hardware and Computer Organization - P16

Chapter 16: Future Trends and Reconfigurable Hardware

Local Clocks

The last future trend that I want to discuss is the concept of local clocks. Before we look at this phenomenon, we should spend some time looking into the problem that we are trying to solve. First, let's try to scope the problem. At this writing (August 2004), the fastest microprocessor clock frequencies are approximately 3.5 GHz. Predictions are that we will easily be at 5 GHz in the next year or so, and that 10 GHz is not far behind. A 5 GHz clock rate corresponds to a clock period of 200 picoseconds (ps). The speed of light is roughly 12 inches per nanosecond in free space and about 6 inches per nanosecond through a wire, so in 200 ps a signal in a wire can travel only about 1.2 inches. A modern microprocessor is about ¾ of an inch on a side, so roughly 62% of the clock period (0.75 in ÷ 1.2 in) will be wasted just getting the clock signal from one edge of the chip to the other. Since our microprocessor is a fully synchronous machine, this is a very serious problem.

We call this problem clock skew. Clock skew is the difference in arrival time (a phase difference) of the same clock edge at different parts of the chip, caused by the difficulty of distributing the clock to all portions of the chip simultaneously. In Teramac, clock skew was a major design issue that had to be factored into all elements of the machine design. Also, the original Cray supercomputer controlled clock skew by adjusting the lengths of the coaxial cables carrying the clock to various circuit boards in the machine. Another potential problem is that not all transistors switch in exactly the same way. There can be slight differences in the switching characteristics of the clock circuitry at various portions of the chip; measurements have shown these differences to be as large as about 180 ps [17]. Thus, as chips get bigger and faster, our ability to keep the clock uniformly distributed across the chip becomes more problematic.
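The arithmetic behind these figures can be reproduced in a few lines. This is a minimal Python sketch that assumes the 6 inches-per-nanosecond wire speed and the ¾-inch die quoted above; the function and constant names are illustrative only.

```python
# Sketch: clock period, how far a signal travels in one period, and what
# fraction of the period is spent crossing the die.
# Assumptions (taken from the text): ~6 inches/ns signal speed in wiring
# and a die about 0.75 inch on a side. Names are illustrative.

WIRE_SPEED_IN_PER_NS = 6.0   # roughly half the 12 in/ns free-space figure
DIE_WIDTH_IN = 0.75


def clock_period_ps(freq_ghz: float) -> float:
    """Clock period in picoseconds for a clock frequency given in GHz."""
    return 1000.0 / freq_ghz


def reach_in(period_ps: float, speed_in_per_ns: float = WIRE_SPEED_IN_PER_NS) -> float:
    """Distance in inches a signal covers during one clock period."""
    return speed_in_per_ns * period_ps / 1000.0


def fraction_spent_crossing(die_in: float, freq_ghz: float) -> float:
    """Fraction of one clock period needed to get from one die edge to the other."""
    return die_in / reach_in(clock_period_ps(freq_ghz))


if __name__ == "__main__":
    for f_ghz in (3.5, 5.0, 10.0):
        period = clock_period_ps(f_ghz)
        print(f"{f_ghz:4.1f} GHz: period {period:6.1f} ps, "
              f"reach {reach_in(period):4.2f} in, "
              f"die crossing uses {fraction_spent_crossing(DIE_WIDTH_IN, f_ghz):.1%} of the period")
```

At 5 GHz the die crossing takes 62.5% of the period (the roughly 62% quoted above); at 10 GHz it takes 125%, more than a whole clock period.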
Today, most clock distribution networks are hierarchical. Figure 16.11 shows a typical clock distribution network. The circuit block labeled phase-locked loop (PLL) represents the method used in modern computers to multiply the external clock input up to a higher internal clock frequency. For example, if your external clock frequency is 200 MHz, the multiplier, which you might set in the BIOS or which may be locked into the chip, could be a factor of 11. The internal clock frequency is then 2200 MHz, or 2.2 GHz. As you can see, simple variations in IC process parameters could lead to clock skew problems as the clock is distributed to all of the synchronous circuitry on the chip.

Figure 16.11: Synchronous clock distribution network. (Blocks: external clock input, phase-locked loop (PLL), global clocks, major clocks, local clocks.)

Recall that the modern processor is a pipeline-driven device, with different combinatorial logic circuits functioning within the various stages of the pipe. All of the stages are driven from the same synchronous clock, as shown in Figure 16.12. Here we can see why limiting clock skew is so critical. Each stage of the pipeline must complete its work before the clock arrives to latch the result into the next stage of the pipeline. The combinatorial logic within each pipeline stage depends upon the time budget it has to complete its work before the next clock edge comes along. Skewing of the clock edges means that some pipeline stages will be clocked sooner than others, destroying the synchronicity of the pipeline. Now, let's modify the architecture slightly to allow each combinatorial block to execute at its own pace. Figure 16.13 shows a schematic diagram of an asynchronously clocked pipeline. The system clock is used to drive local clock controllers for each stage of the pipeline.
However, each pipeline stage is autonomous, and its local clock is not synchronized with the clock of either the previous stage or the next stage of the pipeline. When the combinatorial logic of a particular stage has completed its work, the stage logic outputs a request to the local clock controller to latch the result into the D register that feeds the next stage. When the data is latched into the input register for the next stage, the local clock controller issues an acknowledge signal to the next stage, indicating that valid data is now available to work with. The net effect is that we've created a pipeline with handshake control between the stages. Each stage must request a data transfer, and the latch mechanism responds with an acknowledgment of the transfer to the next stage. The drawback of this scheme is that, because the local clocks are not synchronized, the handshake may miss a clock edge, and the data may have to wait another clock cycle before the transfer to the next stage can occur. Since each stage is waiting for the previous one to complete, this delay in the pipe could easily propagate back and stall the pipe. However, the advantages of such a scheme could far outweigh the disadvantages when we are asking our processors to run at clock speeds in excess of 10 GHz. Assuming that we can still build digital logic circuits capable of running at such high clock rates, local clocking of the system is probably the only solution.

Figure 16.12: Pipeline with a synchronous clock. (Three combinatorial logic blocks, each followed by a D register; all D registers share one clock.)

Figure 16.13: Pipeline with an asynchronous clocking architecture. (Three stages, each with combinatorial logic, a D register and a local clock control block exchanging Request and Acknowledge signals; the local clock controls are driven by the system clock.)
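The request/acknowledge hand-off, and the cost of missing a local clock edge, can be illustrated with a small timing model. The Python sketch below is a hypothetical model, not the circuit of Figure 16.13: each stage latches its result on the first edge of its own local clock at or after its request, register setup time is ignored, and back-pressure from a busy downstream stage is not modeled. `Stage`, `next_edge` and `run_item` are illustrative names, and the delays are made-up numbers.

```python
# Hypothetical timing model of a locally clocked pipeline (times in integer ps).
# A stage raises "request" when its logic finishes; its local clock controller
# latches the result on the next local clock edge, then acknowledges the next
# stage, which starts work at that moment. Setup time and back-pressure are ignored.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stage:
    logic_delay_ps: int   # time for the stage's combinatorial logic
    clock_period_ps: int  # period of this stage's local clock
    clock_phase_ps: int   # offset of this stage's local clock edges


def next_edge(t: int, period: int, phase: int) -> int:
    """Earliest local clock edge at or after time t (edges at phase + k*period)."""
    k = -((phase - t) // period)  # integer ceil((t - phase) / period)
    return phase + k * period


def run_item(stages: List[Stage], start_ps: int = 0) -> Tuple[int, List[Tuple[int, int, int, int]]]:
    """Push one data item through the pipeline.

    Returns the time the item leaves the last stage and, per stage,
    (index, request time, latch/acknowledge time, time spent waiting for an edge).
    """
    t = start_ps
    trace = []
    for i, s in enumerate(stages):
        request = t + s.logic_delay_ps                      # logic done: raise request
        latch = next_edge(request, s.clock_period_ps, s.clock_phase_ps)
        trace.append((i, request, latch, latch - request))  # wait = missed-edge penalty
        t = latch                                           # acknowledge: next stage starts
    return t, trace


if __name__ == "__main__":
    # Three stages with equal 200 ps local clocks but unrelated phases.
    pipe = [Stage(150, 200, 0), Stage(120, 200, 70), Stage(180, 200, 130)]
    done, trace = run_item(pipe)
    for i, req, latch, wait in trace:
        print(f"stage {i}: request at {req} ps, latched at {latch} ps (waited {wait} ps)")
    print(f"item leaves at {done} ps; pure logic time is "
          f"{sum(s.logic_delay_ps for s in pipe)} ps")
```

In this example the item needs 450 ps of logic but leaves the pipeline at 730 ps: 280 ps are spent waiting for local clock edges, which is the missed-edge penalty described above.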
This raises an interesting question: “Why use clocks at all?” Can we build a completely asynchronous (clockless) computer? According to Marculescu et al. [17], fully asynchronous designs are probably still some way off. The computer-aided design (CAD) tools used for the design and verification of modern processors have not yet reached a level of sophistication that would allow them to deal with a fully asynchronous design. Also, there’s the problem of inertia: we just don’t design computers this way. However, the local clock remains a viable compromise solution to the problem of clock skew.

Several start-up companies have already formed to exploit the idea of a fully asynchronous microprocessor design. Fulcrum Microsystems [18] grew out of work done at Caltech. Figure 16.14 illustrates one of the potential advantages of asynchronous processors. With an asynchronous system, the data in the pipeline flows through at its own rate. Additional circuitry is needed to prevent the runaway condition that clocks and registers are used to prevent in traditional clocked microprocessor systems. This concept is similar to the use of local clocks, but in this case, additional logic is necessary to detect when a stage has completed its work so that the next stage in the pipeline may be enabled. This is shown in Figure 16.15.

Figure 16.14: Advantage of clockless logic over traditionally clocked logic. Courtesy of Fulcrum Microsystems. (Labels: cycle time of clocked logic; logic; manufacturing margin; clock jitter, skew margin; worst case − average case (logic execution time); cycle time of clockless logic; time axis.)

Summary of Chapter 16

In Chapter 16, we covered:
• The architecture of programmable logic devices
• The architecture of field programmable gate arrays
• The development of reconfigurable computing machines based upon arrays of field programmable gate arrays
• Future trends in molecular computing, local clocks and clockless computers.

Figure 16.15: Clockless pipeline.
Courtesy of Fulcrum Microsystems. (Stages A, B and C, each built from dual-rail domino logic with a control block; input completion detection and output completion detection.)

Chapter 16: Endnotes
1. http://www.datio.com.
2. http://www.xilinx.com.
3. http://www.actel.com.
4. http://www.xilinx.com/company/press/kits/v2pro/backgrounder.pdf.
5. “Inside Intel: It’s Moving at Double-Time to Head Off Competitors,” Business Week, June 1, 1992.
6. Greg Snider, Philip Kuekes, W. Bruce Culbertson, Richard J. Carter, Arnold S. Berger, Rick Amerson, “The Teramac Configurable Computer Engine,” Proceedings of the 5th International Workshop on Field-Programmable Logic and Applications, edited by Will Moore and Wayne Luk, Oxford, UK, September 1995, p. 44.
7. B. S. Landman and R. L. Russo, IEEE Transactions on Computers, C-20, 1469, 1971.
8. Rick Amerson, Richard J. Carter, W. Bruce Culbertson, Philip Kuekes, Greg Snider, Lyle Albertson, “Plasma: An FPGA for Million Gate Systems,” FPGA ’96: Proceedings of the 1996 Fourth International Symposium on Field Programmable Gate Arrays, February 11–13, 1996, Monterey, CA, USA, ACM, 1996, pp. 10–16.
9. B. Culbertson, R. Amerson, R. Carter, P. Kuekes, G. Snider, “The Teramac Custom Computer: Extending the Limits with Defect Tolerance,” IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, November 1996.
10. Barry Shackleford, HP Labs, private communication.
11. http://www.triscend.com.
12. Daniel Tynan, “Silicon is Slow,” Popular Science, June 2002, p. 25.
13. http://setiathome.ssl.berkeley.edu/.
14. Gordon E. Moore, “Cramming More Components onto Integrated Circuits,” Electronics, Volume 38, Number 8, April 19, 1965.
15. http://www.intel.com/research/silicon/mooreslaw.htm.
16. Mark A. Reed and James M. Tour, “Computing with Molecules,” Scientific American, June 2000, p. 89.
17. Diana Marculescu, Dave Albonesi, Alper Buyuktosunoglu, “Tutorial: Partially Asynchronous Microprocessors,” MICRO-35, Istanbul, Turkey, Nov. 18, 2002.
18. http://www.fulcrummicro.com.

Exercises for Chapter 16

1. Consider the circuit for a portion of a PLD as shown below. A “blown” fuse is indicated by a solid black interconnect and an intact connection by an open white circle. Make a copy of the diagram and “program” the device by filling in the interconnect circles of the fuses that you want to blow. Program the logical equation: X = (A ⊕ B) + C * D
Figure: PLD fuse array with inputs A, B, C and D, input/invert buffers providing each input and its complement, and an OR gate driving output X. (Legend: open circle = intact fuse; filled circle = blown fuse.)

2. Does the circuit shown below obey Rent’s Rule?
Figure: circuit with inputs A, B and C and output OUT, built from NOR, AND, OR, XOR and NOT gates.

3. Circuits similar to the circuit shown below, consisting of 16–32 stages, are used to detect defective interconnects or defective logic elements in defect-tolerant computing machines. Why is this circuit a particularly good choice for such a task?

4. Suppose that you want to design a synchronous CPU with a 10 GHz clock rate. The worst-case propagation delay through the logic gates is 28 picoseconds. No stage of the pipeline has more than three levels of logic circuitry. You also need to maintain a safety margin of 10 picoseconds to allow for manufacturing uncertainties, device set-up times, and differences between the switching characteristics of the devices in the circuitry. Approximately what is the largest difference in the length of the clock paths that this design can tolerate?

APPENDIX A

Chapter 1: Solutions for Odd-Numbered Problems

1. Moore’s Law states that the number of transistors on an integrated circuit die doubles approximately every 18 months. Since the number of transistors that circuit designers can place on a single die is constantly going up, the complexity of the computers and memories that they can design is also going up.
Also, since the number of transistors is increasing, the size of the transistors is decreasing, so transistors are being packed more closely and the distance that the electrical signals have to travel goes down. This means that circuits can run faster. Thus, there are two effects going on. Computers can achieve higher performance in areas such as bus bandwidth and complexity because we can take advantage of the number of circuits we can place on a single die. Also, these complex designs can run faster. Finally, complex circuit designs allow even more complex software applications to run because we have memories with higher speed and capacity to implement the algorithms.

3. An advantage of the abstraction layer concept is that you can hide the details and differences of the lower levels, so that programs at the upper level need only be written once and will be able to run on a wide range of different machines. A disadvantage is that you may lose efficiency, as calls to the lower-level functions must pass through the different layers and be translated at each step.

5. On average, semiconductor memory is 34,286 times faster than the hard drive.

7. Convert the following hexadecimal numbers to decimal:
(i) 0xFE57 = 65,111
(j) 0xA3011 = 667,665
(k) 0xDE01 = 56,833
(l) 0x3AB2 = 15,026

9. 545 microfeet per second, or $545 \times 10^{-6}$ feet per second.

[Solutions to the even-numbered problems are available through the instructor’s resource website at http://www.elsevier.com/0750678860.]

Chapter 2: Solutions for Odd-Numbered Problems

1. The AND circuit becomes an OR circuit and the OR circuit becomes an AND circuit.

3.

5. The truth table is shown on the right.

7. The circuit is shown below:

Part a
a b c | F
0 0 0 | 0
1 0 0 | 1
0 1 0 | 1
1 1 0 | 0
0 0 1 | 0
1 0 1 | 0
0 1 1 | 1
1 1 1 | 0

Part b
a b c | F
0 0 0 | 0
1 0 0 | 0
0 1 0 | 0
1 1 0 | 1
0 0 1 | 1
1 0 1 | 0
0 1 1 | 1
1 1 1 | 1

a b c d | X
0 0 0 0 | 1
1 0 0 0 | 0
0 1 0 0 | 0
1 1 0 0 | 1
0 0 1 0 | 0
1 0 1 0 | 1
0 1 1 0 | 1
1 1 1 0 | 0
0 0 0 1 | 0
1 0 0 1 | 1
0 1 0 1 | 1
1 1 0 1 | 0
0 0 1 1 | 1
1 0 1 1 | 0
0 1 1 1 | 0
1 1 1 1 | 1

Figure: circuit diagram (labels: X, a, b).

Chapter 3: Solutions for Odd-Numbered Problems

1. The truth table and K-maps are shown below:

A B Cin | SUM Cout
0 0 0   |  0    0
1 0 0   |  1    0
0 1 0   |  1    0
1 1 0   |  0    1
0 0 1   |  1    0
1 0 1   |  0    1
0 1 1   |  0    1
1 1 1   |  1    1

Karnaugh map for SUM
        A̅B̅   A̅B   AB   AB̅
Cin      1     0    1    0
C̅in      0     1    0    1

Karnaugh map for Cout
        A̅B̅   A̅B   AB   AB̅
Cin      0     1    1    1
C̅in      0     0    1    0

$$\mathrm{SUM} = \overline{A}\cdot\overline{B}\cdot C_{in} + \overline{A}\cdot B\cdot\overline{C_{in}} + A\cdot\overline{B}\cdot\overline{C_{in}} + A\cdot B\cdot C_{in}$$

$$\mathrm{SUM} = C_{in}\cdot(\overline{A}\cdot\overline{B} + A\cdot B) + \overline{C_{in}}\cdot(\overline{A}\cdot B + A\cdot\overline{B})$$

We can simplify the second term by realizing that $\overline{A}\cdot B + A\cdot\overline{B}$ is just the equation for the Exclusive OR (XOR) gate, $A \oplus B$. Also, the first term, $\overline{A}\cdot\overline{B} + A\cdot B$, is just the complement of the Exclusive OR function. Thus, there are two nested XOR terms:

$$\mathrm{SUM} = C_{in} \oplus (A \oplus B)$$

We can use the Karnaugh map to simplify the logic for Cout. There are three loops:

$$C_{out} = B\cdot C_{in} + A\cdot C_{in} + A\cdot B$$

Following is the logic circuitry for SUM and Cout.
Figure: two cascaded XOR gates generate SUM from A, B and Cin; Cout is generated from A, B and Cin.

[...]... shown below:
Figure: Karnaugh map for Z over a, b, x and y.
This gives us three loops: Z = b*y + a*y + a*x*y
The gate diagram is shown below:
Figure: gate diagram with inputs a and b (and their complements via NOT gates), AND/OR logic feeding D flip-flops D0/Q0 (x), D1/Q1 (y) and D2/Q2 (z), and a common clock input.

7. After the RESET, all of the outputs are zero. This guarantees that the machine starts from a known state. The state of...
returns to S0 and dispenses the merchandise. We can show this as the following conditions:

a b x y | X Y Z
0 0 1 0 | 1 0 0
0 1 1 0 | 0 1 0
1 0 1 0 | 0 0 1

Now, assume that we’re in state S2 (S2 → X = 0, Y = 1). The possibilities are:
1. No coin is deposited; it stays in S2.
2. A dime is deposited; it transitions to S0 and dispenses merchandise.
3. A quarter is deposited; it returns to S0 and dispenses the merchandise.
We... heater, h
Figure: solution and alternative solution circuits with inputs A, B, E and F, AND, OR and NOT gates, and outputs f and h.
Thus, in the above circuit there are three AND conditions for the heater to be turned on:
1. The key switch (E) must be enabled,
2. The pump must be on (B + F), and
3. The temperature is low (A).
The alternative solution leads to a simpler arrangement. Only the key switch AND low temperature are required...

... assume that we’re in state S3 (S3 → X = 1, Y = 1). The possibilities are:
1. No coin is deposited; it stays in S3.
2. A dime is deposited; it transitions to S0 and dispenses merchandise.
3. A quarter is deposited; it returns to S0 and dispenses the merchandise.
We can show this as the following conditions:

a b x y | X Y Z
0 0 1 1 | 1 1 0
0 1 1 1 | 0 0 1
1 0 1 1 | 0 0 1

That covers all the possibilities. Let’s now fill...

... 007FFFF 00FFFFF 017FFFF
5a. Direct memory access: A method of improving the efficiency of data transfers between a peripheral device and the computer’s memory. The DMA process allows a peripheral device to take control of the memory bus while the processor idles, and the peripheral handles the data transfer directly to memory, bypassing the processor.
5b. Tri-state logic: A circuit design that adds an additional...

... MOVE.W A6,D8 Illegal: D8 is not a valid register.
7e. AND.L $4000,$55AA Illegal: A data register must, at least, be the source or destination operand of the AND operation.
9. The logical operation of an XOR instruction is to do a bitwise “EXCLUSIVE OR” of the bits. Thus, any bit pair that is both 1 will give a 0 result, and any bit pair that is a 1 and a 0 will give a 1 result. The FFFF word has the effect...

... S3, so we need two variables, X and Y, to provide the outputs to the register and to provide two inputs to the truth table. Thus, we can make the following assertions:
S0 → X = 0, Y = 0
S1 → X = 1, Y = 0
S2 → X = 0, Y = 1
S3 → X = 1, Y = 1
Let’s first analyze the system in words. Once we do that, we can begin to fill in the truth table. Suppose that the system is in state S0 and no money is deposited. It just...

... low, the pump would not automatically start the pump motor and the heater. Another possible interpretation is that a low temperature would automatically start the pump motor and the heater. The circuitry for the pump shows both options for the solution.
a. Pump motor: The pump motor is on (f = 1) when the timer (B) is on OR the manual switch (F) is on, AND the key switch (E) is on. Note in the alternative solution...

... flows into the processor and out to memory on the same bus signals. The status bus is heterogeneous: some signals are input only, some are output only and others are bidirectional. The status bus carries all of the housekeeping signals of the processor.
7a. The circuit for the memory decoder is shown, right:
Figure: memory decoding circuit (U6, “CHIP”) with inputs A14, A15 and ADVAL and output CS(ROM), built from OR, NOT and NAND gates.
7b. The net list is shown...

... don’t have to worry about the pump because A also turns it on.
Figure: alternative solution, E AND A.
c. Blower: The air blower (g) is pretty simple. The key switch must be on (E = 1) AND the blower switch must be on (D = 1) to turn on the soothing bubbles after a hard day of solving homework problem sets. The solution is shown, right:
Figure: blower circuit, E AND D driving g.
9. The circuit is shown below:
Figure: circuit with inputs A, B, C and D.
