Chapter 4.2: Pipelined Processor

COMPUTER ARCHITECTURE
BK TP.HCM, Faculty of Computer Science and Engineering, Department of Computer Engineering
Vo Tan Phuong (http://www.cse.hcmut.edu.vn/~vtphuong)

Chapter 4.2: Pipelined Processor Design
Computer Architecture – Chapter 4.2, ©Fall 2017, CSE

Contents
• Pipelined versus sequential execution
• Pipelined datapath and control
• Hazards in pipelined execution
• Data hazards and forwarding
• Load delay, hazard detection, and stalls
• Control hazards

Pipelining Example
• A laundry service with 3 steps: wash, dry, fold
• Each step takes 30 minutes
• There are 4 loads: A, B, C, D

Sequential Laundry
[Figure: timeline in 30-minute slots; each load is washed, dried, and folded before the next load starts.]
• Finishing the 4 loads takes 6 hours
• Clearly, this approach can be improved

Pipelined Laundry
[Figure: the same 4 loads overlapped in 30-minute slots; a new load starts washing as soon as the previous one moves to the dryer.]
• The 4 loads take 3 hours
• Throughput is 2 times higher for the 4 loads
• The time to process a single load is unchanged (90 minutes)

Pipeline Performance
• Each task requires k stages
• Let $t_i$ be the time of stage $S_i$
• Clock cycle: $t = \max(t_i)$, the time of the longest stage
• Clock frequency: $f = 1/t = 1/\max(t_i)$
• Time to process n tasks: $(k + n - 1)\,t$
    • k cycles to complete the first task
    • n - 1 more cycles to complete the remaining n - 1 tasks
• Ideal speedup:
  $S_k = \dfrac{\text{cycles without pipelining}}{\text{cycles with pipelining}} = \dfrac{nk}{k + n - 1}$, so $S_k \to k$ as n becomes large

Pipelined MIPS Processor
• Five stages, each taking one clock cycle:
    • IF: Instruction Fetch
    • ID: Instruction Decode
    • EX: Execute (perform the operation)
    • MEM: Memory access (access the data memory)
    • WB: Write Back (write the result to the destination register)

Single-Cycle versus Pipelined
• Assume the stages of instruction execution take the following times:
    • Instruction fetch (IF) = ALU operation (ALU) = data memory access (MEM) = 200 ps
    • Register read (RegR) = register write (RegW) = 150 ps
• What is the clock cycle of the single-cycle processor (Ts)?
• What is the clock cycle of the pipelined processor (Tp)?
• What is the speedup?
• Solution: Ts = 200 + 150 + 200 + 200 + 150 = 900 ps
[Figure: two instructions on the single-cycle processor, each passing through IF, Reg, ALU, MEM, Reg within one 900 ps cycle.]

Single-Cycle versus Pipelined (continued)
• Tp = max(200, 150) = 200 ps
[Figure: three overlapped instructions on the pipelined processor, each stage IF, Reg, ALU, MEM, Reg occupying one 200 ps cycle.]
• CPI of the pipelined processor = 1
    • Assuming a large number of instructions
• Speedup of the pipelined processor = 900 ps / 200 ps = 4.5
    • Assuming IC and CPI are the same in both cases
• The speedup is less than 5 (the number of stages)
    • Because the stage times are unbalanced

Load Delay
• Unfortunately, not all data hazards can be forwarded
    • A load has a delay that cannot be eliminated by forwarding
• In the example shown below:
    • The LW instruction does not read the data until the end of CC4
    • ADD needs the value at the start of CC4 (its ALU stage), so the data would have to be forwarded at the end of CC3, which is NOT possible
• However, a load can forward data to the second instruction after it and to later instructions

Program order        CC1    CC2    CC3    CC4    CC5    CC6    CC7    CC8
lw  $s2, 20($t1)     IF     Reg    ALU    DM     Reg
add $s4, $s2, $t5           IF     Reg    ALU    DM     Reg
or  $t6, $t3, $s2                  IF     Reg    ALU    DM     Reg
and $t7, $s2, $t4                         IF     Reg    ALU    DM     Reg

Detecting RAW Hazard after Load
• Detecting a RAW hazard after a load instruction:
    • The load instruction is in the EX stage
    • The instruction that depends on the load data is in the decode stage
• Condition for stalling the pipeline:
    if ((EX.MemRead == 1)                      // Detect load in EX stage
        and (ForwardA == 1 or ForwardB == 1))  // RAW hazard
        Stall
• Insert a bubble into the EX stage after a load instruction
    • A bubble is a no-op that wastes one clock cycle
    • It delays the instruction that depends on the load by one cycle because of the RAW hazard
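The arithmetic in the pipeline-performance and single-cycle-versus-pipelined slides can be checked with a short Python sketch. Like the slides, it assumes that every stage is clocked at the slowest stage time and that no stalls occur. The function names are hypothetical, chosen only for illustration.

```python
def pipelined_time(k: int, n: int, t: float) -> float:
    """Time for n tasks in a k-stage pipeline with cycle time t: (k + n - 1) * t."""
    return (k + n - 1) * t


def ideal_speedup(k: int, n: int) -> float:
    """S_k = n*k / (k + n - 1), assuming equal stage times and no stalls."""
    return (n * k) / (k + n - 1)


def single_cycle_vs_pipelined(stage_times: list[float], n: int) -> tuple[float, float, float]:
    """Return (Ts, Tp, speedup) for n instructions.

    Ts: single-cycle clock period = sum of all stage times.
    Tp: pipelined clock period = longest stage time.
    The speedup compares n cycles of Ts with (k + n - 1) cycles of Tp.
    """
    k = len(stage_times)
    ts = sum(stage_times)
    tp = max(stage_times)
    return ts, tp, (n * ts) / pipelined_time(k, n, tp)


# Laundry: 3 steps of 30 minutes each, 4 loads (A-D).
laundry_sequential = 4 * 3 * 30                # 360 minutes = 6 hours
laundry_pipelined = pipelined_time(3, 4, 30)   # 180 minutes = 3 hours
print(laundry_sequential, laundry_pipelined, ideal_speedup(3, 4))  # prints: 360 180 2.0

# MIPS: IF, RegR, ALU, MEM, RegW times in picoseconds, many instructions.
ts, tp, s = single_cycle_vs_pipelined([200, 150, 200, 200, 150], n=1_000_000)
print(ts, tp, round(s, 3))  # prints: 900 200 4.5
```

With a million instructions, the k - 1 fill cycles become negligible. The speedup therefore approaches Ts / Tp = 4.5 rather than the stage count of 5.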
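The load-use stall condition can also be modeled in software. This minimal Python sketch is not the hardware from the slides. It assumes an in-order 5-stage pipeline with full forwarding, and it compares register names directly instead of reusing the slide's ForwardA/ForwardB signals, which is an assumption about how those signals are derived. The `Instr` class and both functions are hypothetical. Total cycles are counted as k + n - 1 plus one cycle per stall.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    """Simplified MIPS instruction: destination and source registers."""
    text: str
    dest: Optional[str]      # register written (None for sw or branches)
    srcs: tuple[str, ...]    # registers read in the ID stage
    mem_read: bool = False   # True only for loads (lw)


def load_use_stall(ex: Instr, id_: Instr) -> bool:
    """Stall if the instruction in EX is a load whose destination register
    is read by the instruction in ID (the RAW case forwarding cannot cover)."""
    return (ex.mem_read
            and ex.dest is not None
            and ex.dest != "$zero"
            and ex.dest in id_.srcs)


def count_cycles(program: list[Instr], k: int = 5) -> tuple[int, int]:
    """Return (stalls, total_cycles). Only a load followed immediately by a
    dependent instruction costs one bubble; later readers get forwarded data."""
    stalls = sum(load_use_stall(prev, cur) for prev, cur in zip(program, program[1:]))
    return stalls, k + len(program) - 1 + stalls


# The Load Delay example: only add is adjacent to the load of $s2.
lw = Instr("lw $s2, 20($t1)", "$s2", ("$t1",), mem_read=True)
add = Instr("add $s4, $s2, $t5", "$s4", ("$s2", "$t5"))
or_ = Instr("or $t6, $t3, $s2", "$t6", ("$t3", "$s2"))
and_ = Instr("and $t7, $s2, $t4", "$t7", ("$s2", "$t4"))
print(count_cycles([lw, add, or_, and_]))  # prints: (1, 9)
```

Checking only adjacent pairs is enough here. A stall delays the dependent instruction and everything behind it by the same amount, so instructions two or more slots after the load can always receive the data through forwarding.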
Stall the Pipeline for One Cycle
• The ADD instruction depends on LW, so it stalls at CC3
    • Allow the load instruction in the ALU stage to proceed
    • Freeze the PC and instruction registers (NO instruction is fetched)
    • Introduce a bubble into the ALU stage (a bubble is a NO-OP)
• After the delay, the load can forward data to the next instruction

Program order        CC1    CC2    CC3    CC4    CC5    CC6    CC7    CC8
lw  $s2, 20($s1)     IM     Reg    ALU    DM     Reg
(bubble)                                  bubble bubble bubble
add $s4, $s2, $t5           IM     stall  Reg    ALU    DM     Reg
or  $t6, $s3, $s2                         IM     Reg    ALU    DM     Reg

Showing Stall Cycles
• Stall cycles can be shown on an instruction-time diagram
• The hazard is detected in the decode stage
• A stall indicates that the instruction is delayed
• Instruction fetching is also delayed after a stall
• Example (in the original figure, data forwarding is shown with green arrows):

                     CC1    CC2    CC3    CC4    CC5    CC6    CC7    CC8    CC9    CC10
lw  $s1, ($t5)       IF     ID     EX     MEM    WB
lw  $s2, 8($s1)             IF     Stall  ID     EX     MEM    WB
add $v0, $s2, $t3                         IF     Stall  ID     EX     MEM    WB
sub $v1, $s2, $v0                                       IF     ID     EX     MEM    WB

Hazard Detect, Forward, and Stall
[Figure: pipelined datapath extended with a "Hazard Detect, Forward, & Stall" unit; labeled signals include ForwardA, ForwardB, MemRead, RegWrite, Disable PC, Stall, and Bubble = 0.]

Code Scheduling to Avoid Stalls
• Compilers reorder code to avoid load stalls
• Consider the translation of the following statements:
    A = B + C; D = E - F;   // A through F are in memory

Slow code:
    lw  $t0, 4($s0)      # &B = 4($s0)
    lw  $t1, 8($s0)      # &C = 8($s0)
    add $t2, $t0, $t1    # stall cycle
    sw  $t2, 0($s0)      # &A = 0($s0)
    lw  $t3, 16($s0)     # &E = 16($s0)
    lw  $t4, 20($s0)     # &F = 20($s0)
    sub $t5, $t3, $t4    # stall cycle
    sw  $t5, 12($s0)     # &D = 12($s0)

Fast code (no stalls):
    lw  $t0, 4($s0)
    lw  $t1, 8($s0)
    lw  $t3, 16($s0)
    lw  $t4, 20($s0)
    add $t2, $t0, $t1
    sw  $t2, 0($s0)
    sub $t5, $t3, $t4
    sw  $t5, 12($s0)

Name Dependence: Write After Read
• Instruction J should write its result after it is read by I
• Called anti-dependence by compiler writers
    I: sub $t4, $t1, $t3   # $t1 is read
    J: add $t1, $t2, $t3   # $t1 is written
• It results from reuse of the name $t1
• NOT a data hazard in the 5-stage pipeline because:
    • Reads always occur in stage 2,
    • Writes always occur in stage 5, and
    • Instructions are processed in order
• Anti-dependence can be eliminated by renaming
    • Use a different destination register for add (e.g., $t5)

Name Dependence: Write After Write
• The same destination register is written by two instructions
• Called output dependence in compiler terminology
    I: sub $t1, $t4, $t3   # $t1 is written
    J: add $t1, $t2, $t3   # $t1 is written again
• NOT a data hazard in the 5-stage pipeline because:
    • All writes are ordered and always take place in stage 5
• However, it can be a hazard in more complex pipelines:
    • If instructions are allowed to complete out of order, and
    • Instruction J completes and writes $t1 before instruction I
• Output dependence can be eliminated by renaming $t1
• Read After Read is NOT a name dependence

Control Hazards
• Jumps and branches can cause a large performance loss
• A jump instruction needs only the jump target address
• A branch instruction needs two things:
    • Branch result: Taken or Not Taken
    • Branch target address:
        • PC + 4 if the branch is NOT taken
        • PC + 4 + 4 × immediate if the branch is taken
• Jump and branch targets are computed in the ID stage
    • By that point, a new instruction is already being fetched
• Jump instruction: 1-cycle delay
• Branch: 2-cycle delay for the branch result (taken or not taken)

2-Cycle Branch Delay
• Control logic detects a branch instruction in the 2nd stage
• The ALU computes the branch outcome in the 3rd stage
• The Next1 and Next2 instructions are fetched anyway
• Convert Next1 and Next2 into bubbles if the branch is taken

                     cc1    cc2    cc3    cc4    cc5    cc6    cc7
Beq $t1,$t2,L1       IF     Reg    ALU
Next1                       IF     Reg    Bubble Bubble Bubble
Next2                              IF     Bubble Bubble Bubble Bubble
L1: target instr.                         IF     Reg    ALU    DM
(The branch target address computed in Beq's ALU stage is used to fetch L1.)

Implementing Jump and Branch
[Figure: datapath with Next PC logic (inputs J, Beq, Bne, zero) and a PCSrc multiplexer selecting the jump or branch target. The branch target and outcome are computed in the ALU stage; branch delay = 2 cycles; Bubble = 0 is applied to the control signals.]

Predict Branch NOT Taken
• Branches can be predicted to be NOT taken
• If the branch outcome is NOT taken, then:
    • The Next1 and Next2 instructions can be executed
    • Do not convert Next1 and Next2 into bubbles
    • No cycles are wasted

                     cc1    cc2    cc3    cc4    cc5    cc6    cc7
Beq $t1,$t2,L1       IF     Reg    ALU (NOT taken)
Next1                       IF     Reg    ALU    DM     Reg
Next2                              IF     Reg    ALU    DM     Reg

Reducing the Delay of Branches
• Branch delay can be reduced from 2 cycles to just 1 cycle
• Branches can be resolved earlier, in the decode stage
    • A comparator in the decode stage determines whether the branch is taken or not
    • Because of forwarding, the delay of the second stage increases, which also increases the clock cycle
• Only one instruction following the branch is fetched
• If the branch is taken, only one instruction is flushed
• A bubble should be inserted after a jump or taken branch
    • This converts the next instruction into a NOP

Reducing Branch Delay to 1 Cycle
[Figure: datapath in which register data is forwarded and then compared in the decode stage, which makes the clock cycle longer. Next PC logic driven by J, Beq, Bne, and the Zero flag selects the jump or branch target. A Reset signal converts the next instruction after a jump or taken branch into a bubble.]
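The benefit of the reordering in the Code Scheduling to Avoid Stalls slide can be checked by counting load-use stalls. This minimal Python sketch assumes the same in-order 5-stage pipeline with full forwarding, where only an `lw` immediately followed by a reader of its destination register costs one cycle. The tiny parser handles only `lw`, `sw`, and three-register ALU instructions. All names are hypothetical.

```python
import re
from typing import Optional

LOADS = {"lw"}
STORES = {"sw"}


def parse(line: str) -> tuple[str, Optional[str], set[str]]:
    """Return (opcode, destination register, source registers)."""
    op, rest = line.split(None, 1)
    regs = re.findall(r"\$\w+", rest)
    if op in LOADS:                    # lw rt, off(rs): writes rt, reads rs
        return op, regs[0], {regs[1]}
    if op in STORES:                   # sw rt, off(rs): reads rt and rs
        return op, None, set(regs)
    return op, regs[0], set(regs[1:])  # add/sub rd, rs, rt


def load_use_stalls(code: list[str]) -> int:
    """Count one-cycle stalls: a lw whose destination is read by the next instruction."""
    parsed = [parse(line) for line in code]
    return sum(1 for (op, dest, _), (_, _, srcs) in zip(parsed, parsed[1:])
               if op in LOADS and dest in srcs)


slow = ["lw $t0, 4($s0)", "lw $t1, 8($s0)", "add $t2, $t0, $t1", "sw $t2, 0($s0)",
        "lw $t3, 16($s0)", "lw $t4, 20($s0)", "sub $t5, $t3, $t4", "sw $t5, 12($s0)"]
fast = ["lw $t0, 4($s0)", "lw $t1, 8($s0)", "lw $t3, 16($s0)", "lw $t4, 20($s0)",
        "add $t2, $t0, $t1", "sw $t2, 0($s0)", "sub $t5, $t3, $t4", "sw $t5, 12($s0)"]

for name, code in (("slow", slow), ("fast", fast)):
    stalls = load_use_stalls(code)
    # Total cycles for a 5-stage pipeline: k + n - 1 + stalls
    print(name, stalls, 5 + len(code) - 1 + stalls)
# prints:
# slow 2 14
# fast 0 12
```

Treating both registers of `sw` as sources is conservative: a real design could forward load data to a store in the MEM stage. None of the stores in this example follows a load directly, so the counts are unaffected.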
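One way to compare the 2-cycle branch delay with the 1-cycle version obtained by resolving branches in decode is an effective CPI. This is a minimal sketch under several assumptions: an ideal CPI of 1, branches predicted NOT taken so that only a taken branch flushes `delay` instructions, and branch and taken frequencies that are purely hypothetical inputs, not measurements from the slides.

```python
from fractions import Fraction


def effective_cpi(branch_freq: Fraction, taken_rate: Fraction, delay: int) -> Fraction:
    """Ideal CPI of 1 plus the bubbles caused by taken branches.

    Assumes predict-not-taken: a not-taken branch costs nothing, and a taken
    branch turns `delay` already-fetched instructions into bubbles.
    """
    return 1 + branch_freq * taken_rate * delay


# Hypothetical workload: 20% of instructions are branches, half of them taken.
f_branch, f_taken = Fraction(1, 5), Fraction(1, 2)
print(effective_cpi(f_branch, f_taken, delay=2))  # outcome in the ALU stage: prints 6/5
print(effective_cpi(f_branch, f_taken, delay=1))  # outcome in the decode stage: prints 11/10
```

CPI alone leaves out the longer clock cycle that the slide attributes to the decode-stage comparator. Execution time is IC × CPI × clock cycle, so the lower CPI has to be weighed against the longer cycle.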

Posted: 08/04/2023, 06:21