2009 dce
Computer Architecture, CS2009
BK TP.HCM
Faculty of Computer Science and Engineering
Department of Computer Engineering
Võ Tấn Phương
http://www.cse.hcmut.edu.vn/~vtphuong/KTMT
©2009, CE Department, 11/17/2009

Chapter 3: The Processor
Adapted from Computer Organization and Design, 4th Edition, Patterson & Hennessy, © 2008

The Five Classic Components of a Computer
[Figure]

Introduction
• CPU performance factors
  – Instruction count
    • Determined by ISA and compiler
  – CPI and cycle time
    • Determined by CPU hardware
• We will examine two MIPS implementations
  – A simplified version
  – A more realistic pipelined version
• Simple subset, shows most aspects
  – Memory reference: lw, sw
  – Arithmetic/logical: add, sub, and, or, slt
  – Control transfer: beq, j

Instruction Execution
• PC → instruction memory, fetch instruction
• Register numbers → register file, read registers
• Depending on instruction class
  – Use ALU to calculate
    • Arithmetic result
    • Memory address for load/store
    • Branch target address
  – Access data memory for load/store
  – PC ← target address or PC + 4

CPU Overview
[Figure]

Multiplexers
• Can't just join wires together
  – Use multiplexers

Control
[Figure]

Logic Design Basics
• Information encoded in binary
  – Low voltage = 0, high voltage = 1
  – One wire per bit
  – Multi-bit data encoded on multi-wire buses
• Combinational element
  – Operate on data
  – Output is a function of input
• State (sequential) elements
  – Store information

Combinational Elements
• AND-gate
  – Y = A & B
• Adder
  – Y = A + B
• Multiplexer
  – Y = S ? I1 : I0
• Arithmetic/Logic Unit
  – Y = F(A, B)

[...]

... Longest delay determines clock period

Building a Datapath
• Datapath
  – Elements that process data and addresses in the CPU
    • Registers, ALUs, muxes, memories, …
• We will build a MIPS datapath incrementally
  – Refining the overview design

Instruction Fetch
[Figure annotations: 32-bit register; increment by 4 for next instruction]

...

Sequential Elements
• Register: stores data in a circuit
  – Uses a clock signal to determine when to update the stored value
  – Edge-triggered: update when Clk changes from 0 to 1
[Figure: register with inputs D and Clk, output Q]

Sequential Elements
• Register with write control
  – Only updates on clock edge when ...

... + 4
• Already calculated by instruction fetch

Branch Instructions
[Figure annotations: shift just re-routes wires; sign-bit wire replicated]

Composing the Elements
• First-cut datapath does an instruction in one clock cycle
  – Each datapath element can only do one function at a time
  – Hence, we need separate instruction and data memories
• Use multiplexers ...

ALU control (table fragment; the rows above the branch row are cut off in the source)

opcode   ALUOp   Operation          funct    ALU function       ALU control
...              ... equal          XXXXXX   subtract           0110
R-type   10      add                100000   add                0010
                 subtract           100010   subtract           0110
                 AND                100100   AND                0000
                 OR                 100101   OR                 0001
                 set-on-less-than   101010   set-on-less-than   0111

The Main Control Unit
• Control signals derived from instruction

R-type       0          rs      rt      rd      shamt   funct
             31:26      25:21   20:16   15:11   10:6    5:0
Load/Store   35 or 43   rs      rt      address
             31:26      25:21   20:16   15:0
Branch       4          rs      rt      ...
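
As a rough illustration of how control signals can be derived from the instruction word, the following Python sketch splits a 32-bit word into the fields listed above and looks up the ALU control code for R-type instructions from the funct values in the ALU control table. The names `decode`, `instruction_class`, and `alu_control_for_rtype` are hypothetical and not part of the course material. Treating the low 16 bits of a branch as an `address` field is an assumption, since the slide is cut off at that point.

```python
# Sketch only: function and table names are illustrative, not from the slides.

# Opcode values taken from the instruction-format slide.
INSTRUCTION_CLASS = {0: "R-type", 35: "load", 43: "store", 4: "branch"}

# funct -> 4-bit ALU control, copied from the R-type rows of the table.
ALU_CONTROL_BY_FUNCT = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # subtract
    0b100100: 0b0000,  # AND
    0b100101: 0b0001,  # OR
    0b101010: 0b0111,  # set-on-less-than
}


def decode(instr: int) -> dict:
    """Split a 32-bit MIPS instruction word into named bit fields."""
    fields = {
        "opcode": (instr >> 26) & 0x3F,  # bits 31:26
        "rs": (instr >> 21) & 0x1F,      # bits 25:21
        "rt": (instr >> 16) & 0x1F,      # bits 20:16
    }
    if fields["opcode"] == 0:
        # R-type: three more fields in the low 16 bits.
        fields["rd"] = (instr >> 11) & 0x1F    # bits 15:11
        fields["shamt"] = (instr >> 6) & 0x1F  # bits 10:6
        fields["funct"] = instr & 0x3F         # bits 5:0
    else:
        # Load/store use bits 15:0 as an address offset.
        # Assumption: branches are handled the same way here.
        fields["address"] = instr & 0xFFFF     # bits 15:0
    return fields


def instruction_class(instr: int) -> str:
    """Classify an instruction by its opcode field."""
    return INSTRUCTION_CLASS.get(decode(instr)["opcode"], "unknown")


def alu_control_for_rtype(instr: int) -> int:
    """Return the 4-bit ALU control code for an R-type instruction."""
    fields = decode(instr)
    if fields["opcode"] != 0:
        raise ValueError("not an R-type instruction")
    return ALU_CONTROL_BY_FUNCT[fields["funct"]]


if __name__ == "__main__":
    add_word = 0x012A4020  # add $t0, $t1, $t2
    lw_word = 0x8D280004   # lw  $t0, 4($t1)
    print(instruction_class(add_word), decode(add_word))
    print(instruction_class(lw_word), decode(lw_word))
    print(format(alu_control_for_rtype(add_word), "04b"))
```

In hardware these lookups are combinational logic rather than dictionary accesses; the dictionaries only mirror the truth tables.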
  – Critical path: load instruction
  – Instruction memory → register file → ALU → data memory → register file
• Not feasible to vary period for different instructions
• Violates design principle
  – Making the common case fast
• We will improve performance by pipelining

Pipelining Analogy
• Pipelined laundry: overlapping execution
  – Parallelism improves performance
• Four loads: ...

... Access memory operand
WB: Write result back to register

Pipeline Performance
• Assume time for stages is
  – 100ps for register read or write
  – 200ps for other stages
• Compare pipelined datapath with single-cycle datapath

Instr   Instr fetch   Register read   ALU op   Memory access   Register write   Total time
lw      200ps         100ps           200ps    200ps           100ps            800ps
sw      ...
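
The comparison can be worked through numerically. The sketch below is a minimal illustration, not the course's own method: it takes the `lw` row of the table as the stage latencies and assumes `lw` is the slowest instruction (consistent with the critical-path bullet above), that the pipeline has one stage per table column, and that there are no stalls or hazards. All function names are hypothetical.

```python
# Sketch only: idealized timing model with no hazards or stalls.

# Stage latencies in picoseconds, from the lw row of the table.
LW_STAGE_PS = {
    "instr_fetch": 200,
    "register_read": 100,
    "alu_op": 200,
    "memory_access": 200,
    "register_write": 100,
}


def single_cycle_period(stage_ps: dict) -> int:
    """Clock period when one instruction must finish in one cycle."""
    return sum(stage_ps.values())


def pipelined_period(stage_ps: dict) -> int:
    """Clock period when each stage gets one cycle: the slowest stage."""
    return max(stage_ps.values())


def single_cycle_total(n: int, stage_ps: dict) -> int:
    """Time to run n instructions back to back on the single-cycle design."""
    return n * single_cycle_period(stage_ps)


def pipelined_total(n: int, stage_ps: dict) -> int:
    """Time for n instructions: fill the pipeline, then one finishes per cycle."""
    cycles = len(stage_ps) + n - 1
    return cycles * pipelined_period(stage_ps)


if __name__ == "__main__":
    for n in (3, 1_000_000):
        single = single_cycle_total(n, LW_STAGE_PS)
        piped = pipelined_total(n, LW_STAGE_PS)
        print(n, single, piped, round(single / piped, 2))
```

Under these assumptions, three loads take 2400ps on the single-cycle datapath and 1400ps when pipelined. For long instruction sequences the speedup approaches 800/200 = 4 rather than 5, because the stages are not balanced.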