4 THE MICROARCHITECTURE LEVEL

CuuDuongThanCong.com https://fb.com/tailieudientucntt

Figure 4-1. The data path of the example microarchitecture used in this chapter.
  Labels: registers MAR, MDR, PC, MBR, SP, LV, CPP, TOS, OPC, H; B bus and C bus; ALU with inputs A and B and outputs N and Z; shifter; memory control registers (to and from main memory). Control signals: enable onto B bus, write C bus to register, ALU control, shifter control.

Figure 4-2. Useful combinations of ALU signals and the function performed.
  Columns: F0, F1, ENA, ENB, INVA, INC, Function.
  Functions listed: A, B, NOT A, NOT B, A+B, A+B+1, A+1, B+1, B−A, B−1, −A, A AND B, A OR B, −1.

Figure 4-3. Timing diagram of one data path cycle.
  Subcycles: ∆w, set up signals to drive data path; ∆x, drive H and B bus; ∆y, ALU and shifter; ∆z, propagation from shifter to registers.
  Annotations: cycle starts here; shifter output stable; registers loaded instantaneously from C bus and memory on rising edge of clock; MPC available here; new MPC used to load MIR with next microinstruction here.

Figure 4-4. Mapping of the bits in MAR to the address bus.
  Labels: 32-bit MAR (counts in words); 32-bit address bus (counts in bytes); "Discarded"; "0".

Figure 4-5. The microinstruction format for the Mic-1.
  Fields: NEXT_ADDRESS (Addr); JAM (JMPC, JAMN, JAMZ); ALU (SLL8, SRA1, F0, F1, ENA, ENB, INVA, INC); C (H, OPC, TOS, CPP, LV, SP, PC, MDR, MAR); Mem (WRITE, READ, FETCH); B (B bus).
  B bus registers: 0 = MDR, 1 = PC, 2 = MBR, 3 = MBRU, 4 = SP, 5 = LV, 6 = CPP, 7 = TOS, 8 = OPC, 9-15 none.

Figure 4-6. The complete block diagram of our example microarchitecture, the Mic-1.
  Labels: the data path of Figure 4-1; memory control signals (rd, wr, fetch); 4-to-16 decoder; MPC; 512 × 36-bit control store for holding the microprogram; MIR with fields Addr, J, ALU, C, M, B; JMPC; JAMN/JAMZ; high bit; 1-bit flip-flops N and Z.

Figure 4-7. A microinstruction with JAMZ set to 1 has two potential successors.
  Example: the microinstruction at address 0x75 has Addr = 0x92 and JAM = 001 (JAMZ bit set); either 0x92 or 0x192 follows 0x75, depending on Z.

Figure 4-8. Use of a stack for storing local variables. (a) While A is active. (b) After A calls B. (c) After B calls C. (d) After C and B return and A calls D.
  Labels: SP, LV; locals a1-a3, b1-b4, c1-c2, d1-d5; addresses 100, 104, 108.

Figure 4-9. Use of an operand stack for doing an arithmetic computation.
  Labels: SP, LV; locals a1, a2, a3; operand stack holding a2, then a3 and a2, then a2 + a3.

Figure 4-41. (a) A 1-bit branch history. (b) A 2-bit branch history. (c) A mapping between branch instruction address and target address.
  Table columns: valid, slot, branch address/tag, branch/no branch bit (a); prediction bits (b); prediction bits and target address (c).

Figure 4-42. A 2-bit finite-state machine for branch prediction.
  States: 00, predict no branch; 01, predict no branch one more time; 10, predict branch one more time; 11, predict branch. Transitions are labeled "Branch" and "No branch".

Figure 4-43. Operation of a superscalar CPU with in-order issue and in-order completion.
  Columns: Cy, #, Decoded, Iss, Ret, registers being read, registers being written.
  Instructions decoded: R3=R0*R1, R4=R0+R2, R5=R0+R1, R6=R1+R4, R7=R1*R2, R1=R0−R2, R3=R3*R1, R1=R4+R4.

Figure 4-44. Operation of a superscalar CPU with out-of-order issue and out-of-order completion.
  Columns: Cy, #, Decoded, Iss, Ret, registers being read, registers being written.
  Instructions decoded: R3=R0*R1, R4=R0+R2, R5=R0+R1, R6=R1+R4, R7=R1*R2, S1=R0−R2, R3=R3*S1, S2=R4+R4.

Figure 4-45. (a) A program fragment. (b) The corresponding basic block graph.

    evensum = 0;
    oddsum = 0;
    i = 0;
    while (i < limit) {
        k = i * i * i;
        if (((i/2) * 2) == i)
            evensum = evensum + k;
        else
            oddsum = oddsum + k;
        i = i + 1;
    }

  Basic block graph edges: the loop test exits when i >= limit; the if test branches on T (add k to evensum) or F (add k to oddsum).

Figure 4-46. The Pentium II microarchitecture.
  Labels: bus interface unit (to level 2 cache; local bus to PCI bridge); level 1 I-cache; level 1 D-cache; Fetch/Decode unit; Dispatch/Execute unit; Retire unit; micro-operation pool (ROB).

Figure 4-47. Internal structure of the Fetch/Decode unit (simplified).
  Pipeline stages: IFU0, IFU1, IFU2, ID0, ID1, RAT, ROB.
  Blocks: level 1 I-cache; cache line fetcher; next IP; instruction length decoder; dynamic branch predictor; instruction aligner; micro-operation queuer; micro-operation sequencer; static branch predictor; register allocator. Micro-operations go in the ROB.

Figure 4-48. The Dispatch/Execute unit.
  Labels: reservation station (from/to ROB); two ports, each with an MMX execution unit, a floating-point execution unit, and an integer execution unit; one port with the load unit (loads); two ports with store units (stores).

Figure 4-49. The UltraSPARC II microarchitecture.
  Labels: memory interface unit (to main memory); external cache unit; level 2 cache; prefetch/dispatch unit; level 1 I-cache; grouping logic; integer execution unit (integer registers, two ALUs); floating-point unit (FP registers, two FP ALUs, graphics unit); load/store unit (level 1 D-cache, load store, store queue).

Figure 4-50. The UltraSPARC II's pipeline.
  Stages: Fetch, Decode, Group; integer pipeline: Execute, Cache, N1, N2, N3, Write; floating-point/graphics pipeline: Register, X1, X2, X3.

Figure 4-51. The block diagram of the picoJava II with both level 1 caches and the floating-point unit. This is the configuration of the microJava 701.
  Labels: memory and I/O bus interface unit; 0-16 KB instruction cache; 0-16 KB data cache; prefetch, decode, and folding unit; execution control unit; integer and floating-point unit; 64 32-bit registers for holding the top 64 words of the stack.

Figure 4-52. The picoJava II has a six-stage pipeline.
  Stages: fetch from I-cache; decode and fold; fetch operands from stack; execute instruction; access data cache; write results to stack.

Figure 4-53. (a) Execution of a four-instruction sequence to compute n = k + m. (b) The same sequence folded to one instruction.
  Without folding: Start; after ILOAD k; after ILOAD m; after IADD; after ISTORE n. With folding: Start; after folded instruction.

Figure 4-54. JVM instruction groups for folding purposes.

  Group  Description                                   Example
  NF     Nonfoldable instructions                      GOTO
  LV     Pushing a word onto the stack                 ILOAD
  MEM    Popping a word and storing it in memory       ISTORE
  BG1    Operations using one stack operand            IFEQ
  BG2    Operations using two stack operands           IF_CMPEQ
  OP     Computations on two operands with one result  IADD

Figure 4-55. Some of the JVM instruction sequences that can be folded.

  Instruction sequence  Example
  LV LV OP MEM          ILOAD, ILOAD, IADD, ISTORE
  LV LV OP              ILOAD, ILOAD, IADD
  LV LV BG2             ILOAD, ILOAD, IF_CMPEQ
  LV BG1                ILOAD, IFEQ
  LV BG2                ILOAD, IF_CMPEQ
  LV MEM                ILOAD, ISTORE
  OP MEM                IADD, ISTORE

...

Figure 4-17. The microprogram for the Mic-1 (part of 3).

Figure 4-18. The BIPUSH instruction format.
  Fields: BIPUSH (0x10), BYTE.

...

Figure 4-10. The various parts of the IJVM memory.

  Hex: 0x10, 0x59, 0xA7, 0x60, 0x7E, 0x99, 0x9B, 0x9F, 0x84, 0x15, 0xB6, 0x80, 0xAC, 0x36, 0x64, 0x13, 0x00...

... (b) The corresponding Java assembly language. (c) The IJVM program in hexadecimal.

  Stack contents: j; k, j; j+k; j; j; j−1 ...

Figure 4-15 The
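
The 2-bit finite-state machine for branch prediction in Figure 4-42 can be sketched in a few lines. This is a minimal illustration, not the book's implementation: the state labels come from the figure, but the arrows did not survive in the text, so the transition table is an assumption that reads the high bit as the current prediction and flips it only after two consecutive mispredictions. The names `TRANSITIONS`, `predict`, and `run` are hypothetical.

```python
# 2-bit branch predictor sketch. State labels follow Figure 4-42:
#   00 predict no branch, 01 predict no branch one more time,
#   10 predict branch one more time, 11 predict branch.
# ASSUMPTION: the transition table below is inferred from the state labels,
# not copied from the figure.
TRANSITIONS = {
    0b00: {False: 0b00, True: 0b01},
    0b01: {False: 0b00, True: 0b11},
    0b10: {False: 0b00, True: 0b11},
    0b11: {False: 0b10, True: 0b11},
}


def predict(state):
    """Return True if the predictor expects the branch to be taken."""
    return state >= 0b10  # the high bit of the state is the prediction


def run(outcomes, state=0b00):
    """Feed a sequence of actual outcomes (True = branch taken).

    Returns the final state and the number of correct predictions.
    """
    correct = 0
    for taken in outcomes:
        if predict(state) == taken:
            correct += 1
        state = TRANSITIONS[state][taken]
    return state, correct


if __name__ == "__main__":
    # A loop branch that is taken three times and then falls through, twice.
    final_state, hits = run([True, True, True, False] * 2)
    print(f"final state {final_state:02b}, {hits} of 8 predicted correctly")
```

Under this assumed table, the single not-taken outcome at each loop exit moves the predictor from 11 to 10 without changing the prediction, so the next pass through the loop is still predicted as taken.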