Parallel Computing (Thoai Nam), Parallel Processing 07: Pipeline, sinhvienzone.com

Pipeline
Thoai Nam
Khoa Công Nghệ Thông Tin, Đại Học Bách Khoa Tp.HCM (Faculty of Information Technology, Ho Chi Minh City University of Technology)
SinhVienZone.com, https://fb.com/sinhvienzonevn

Outline
• Pipelining concepts
• The DLX architecture
• A simple DLX pipeline
• Pipeline hazards and solutions to overcome them
Reference: Computer Architecture: A Quantitative Approach, John L. Hennessy & David A. Patterson, Chapter ...

Concepts
• A technique to make fast CPUs by overlapping the execution of multiple instructions

Instruction #     Cycles: 1   2   3   4   5   6   7   8
Instruction i             S1  S2  S3  S4
Instruction i+1               S1  S2  S3  S4
Instruction i+2                   S1  S2  S3  S4
Instruction i+3                       S1  S2  S3  S4
Instruction i+4                           S1  S2  S3  S4

Concepts (cont'd)
• Pipeline throughput
  – Determined by how often an instruction exits the pipeline
  – Depends on the overhead of clock skew and setup
  – Depends on the time required for the slowest pipe stage
• Pipeline stall
  – Delays the execution of some instruction and all succeeding instructions
  – "Slows down" the pipeline
• Pipeline designer's goals
  – Balance the length of the pipeline stages
  – Reduce or avoid pipeline stalls

Concepts (cont'd)

$$\text{Pipeline speedup} = \frac{\text{Average instruction time without pipeline}}{\text{Average instruction time with pipeline}} = \frac{\text{CPI without pipelining} \times \text{Clock cycle without pipelining}}{\text{CPI with pipelining} \times \text{Clock cycle with pipelining}}$$

(CPI = number of Cycles Per Instruction)

$$\text{CPI without pipelining} = \text{Ideal CPI} \times \text{Pipeline depth}$$

$$\text{CPI with pipelining} = \text{Ideal CPI} + \text{Pipeline stall clock cycles per instruction}$$

Assuming the clock cycle is the same with and without pipelining:

$$\text{Pipeline speedup} = \frac{\text{Ideal CPI} \times \text{Pipeline depth}}{\text{Ideal CPI} + \text{Pipeline stall clock cycles per instruction}}$$

The DLX Architecture
• A mythical computer whose architecture is based on the primitives most frequently used in programs
• Used to demonstrate and study computer architecture organizations and techniques
• A DLX instruction consists of 5 execution stages
  – IF: instruction fetch
  – ID: instruction decode and register fetch
  – EX: execution and effective address calculation
  – MEM: memory access
  – WB: write back

A Simple DLX Pipeline
• Fetch a new instruction on each clock cycle
• An instruction step = a pipe stage

Instruction #     Cycles: 1    2    3    4    5    6    7    8
Instruction i             IF   ID   EX   MEM  WB
Instruction i+1                IF   ID   EX   MEM  WB
Instruction i+2                     IF   ID   EX   MEM  WB
Instruction i+3                          IF   ID   EX   MEM  WB

Pipeline Hazards
• Situations that prevent the next instruction in the instruction stream from executing during its designated cycle
• Lead to pipeline stalls
• Reduce pipeline performance
• Are classified into 3 types
  – Structural hazards
  – Data hazards
  – Control hazards

Structural Hazard
• Due to resource conflicts
• Instances of structural hazards
  – Some functional unit is not fully pipelined
    » a sequence of instructions that all use that unit cannot be initiated back to back
  – Some resource has not been duplicated enough, e.g.:
    » Only one register-file write port when two writes are needed in a cycle
    » A single memory pipeline for both data and instructions
• Why do we allow this type of hazard?
  – To reduce cost
  – To reduce the latency of the unit

Data Hazard
• Occurs when the pipeline changes the order of access to operands, making data unavailable to the next instruction
• Example: consider these instructions
  ADD R1, R2, R3    (R2 + R3 → R1)
  SUB R4, R1, R5    (R1 – R5 → R4)

Instruction #     Cycles: 1    2    3    4    5    6
ADD instruction           IF   ID   EX   MEM  WB
SUB instruction                IF   ID   EX   MEM  WB

  – Data is written in the WB stage of ADD (cycle 5)
  – Data is read in the ID stage of SUB (cycle 3)
  → The SUB instruction must stall until the data is available

Hardware Solution to Data Hazard
• Forwarding (bypassing / short-circuiting) techniques
  – Reduce the delay between dependent instructions
  – The ALU result is fed back to the ALU input latches
  – Forwarding hardware checks for the necessary result and forwards it to the ALU input of the following instructions

Instruction          Cycles: 1    2    3    4    5    6    7    8    9
ADD R1, R2, R3               IF   ID   EX   MEM  WB
SUB R4, R1, R5                    IF   ID   EX   MEM  WB
AND R6, R1, R7                         IF   ID   EX   MEM  WB
OR  R8, R1, R9                              IF   ID   EX   MEM  WB
XOR R1, R10, R11                                 IF   ID   EX   MEM  WB

No stalls are needed.

Types of Data Hazards
(Instruction i precedes instruction j in program order.)
• RAW (Read After Write)
  – Instruction j tries to read a source before instruction i writes it
  – The most common type
• WAR (Write After Read)
  – Instruction j tries to write a destination before instruction i reads it
  – Cannot happen in the DLX pipeline. Why?
• WAW (Write After Write)
  – Instruction j tries to write an operand before instruction i writes it
  – The writes end up in the wrong order
• Is RAR (Read After Read) a hazard?

Software Solution to Data Hazard
• Pipeline scheduling (instruction scheduling)
  – Use the compiler to rearrange the generated code to eliminate hazards
• Example

Source code:
  c = a + b
  d = e - f

Generated code (data hazards)     Rearranged code (no hazard)
LW  Ra, a                         LW  Ra, a
LW  Rb, b                         LW  Rb, b
ADD Rc, Ra, Rb                    LW  Re, e
SW  c, Rc                         ADD Rc, Ra, Rb
LW  Re, e                         LW  Rf, f
LW  Rf, f                         SW  c, Rc
SUB Rd, Re, Rf                    SUB Rd, Re, Rf
SW  d, Rd                         SW  d, Rd

Control/Branch Hazard
• Occurs when a branch/jump instruction is taken
• Causes great performance loss
• Example:

Instruction #       1     2     3     4     5     6     7     8     9     10
Branch instruction  IF    ID    EX    MEM   WB
Instruction i+1           IF    stall stall IF    ID    EX    MEM   WB
Instruction i+2                 stall stall stall IF    ID    EX    MEM   WB
Instruction i+3                       stall stall stall IF    ID    EX    MEM
Instruction i+4                             stall stall stall IF    ID    EX
Instruction i+5                                   stall stall stall IF    ID
Instruction i+6                                         stall stall stall IF

  – The first IF of instruction i+1 (cycle 2) loads an unnecessary instruction
  – The PC register is changed in the MEM stage of the branch (cycle 4)

Reducing Control Hazard Effects
• Predict whether the branch is taken or not
• Compute the branch target address earlier
• Use many schemes
  – Pipeline freezing
  – Predict-not-taken scheme
  – Predict-taken scheme (N/A in DLX)
  – Delayed branch

Pipeline Freezing
• Hold any instruction after the branch until the branch destination is known
• Simple but not efficient
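The cost of freezing can be estimated with the speedup formula from the Concepts slides. The following is only a rough sketch: the function names are hypothetical, the branch fraction of 20% is an assumed value chosen for illustration, and the 3-cycle penalty is read off the stall cycles drawn in the Control/Branch Hazard diagram.

```python
def pipeline_speedup(ideal_cpi: float, depth: int, stall_cpi: float) -> float:
    """Speedup = (Ideal CPI * Pipeline depth) / (Ideal CPI + stall cycles per instruction).

    Assumes the clock cycle is the same with and without pipelining.
    """
    return (ideal_cpi * depth) / (ideal_cpi + stall_cpi)


def branch_stall_cpi(branch_fraction: float, penalty_cycles: float) -> float:
    """Average stall cycles per instruction caused by branches alone."""
    return branch_fraction * penalty_cycles


IDEAL_CPI = 1.0          # one instruction completes per cycle when nothing stalls
DEPTH = 5                # IF, ID, EX, MEM, WB
BRANCH_FRACTION = 0.20   # ASSUMPTION: illustrative value, not taken from the slides
FREEZE_PENALTY = 3       # stall cycles per branch, as drawn in the hazard diagram

no_stall = pipeline_speedup(IDEAL_CPI, DEPTH, 0.0)
frozen = pipeline_speedup(
    IDEAL_CPI, DEPTH, branch_stall_cpi(BRANCH_FRACTION, FREEZE_PENALTY)
)

print(f"speedup without stalls: {no_stall:.3f}")
print(f"speedup with freezing:  {frozen:.3f}")
```

Under these assumed numbers, branch stalls alone pull the speedup well below the pipeline depth, which is why the schemes on the following slides try to shrink the per-branch penalty.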
Predict-Not-Taken Scheme
• Predict the branch as not taken and allow execution to continue
  – Must not change the machine state until the branch outcome is known
• If the branch is not taken: no penalty
• If the branch is taken:
  – Restart the fetch at the branch target
  – Stall one cycle

Predict-Not-Taken Scheme (cont'd)
• Example

Instruction #             1     2     3     4     5     6     7     8     9
Taken branch instruction  IF    ID    EX    MEM   WB
Instruction i+1                 IF    IF    ID    EX    MEM   WB
Instruction i+2                       stall IF    ID    EX    MEM   WB
Instruction i+3                             stall IF    ID    EX    MEM   WB
Instruction i+4                                   stall IF    ID    EX    MEM

  – The instruction fetch is restarted at the second IF of instruction i+1 (cycle 3), where the right instruction is fetched

Delayed Branch
• Change the order of execution so that the instruction after the branch is always valid and useful
• "From before" approach

  ADD R1, R2, R3                     If R2=0 then
  If R2=0 then          becomes          ADD R1, R2, R3   (in the delay slot)
      [delay slot]

Delayed Branch (cont'd)
• "From target" approach

  SUB R4, R5, R6   (branch target)   ADD R1, R2, R3
  ADD R1, R2, R3                     If R1=0 then
  If R1=0 then          becomes          SUB R4, R5, R6   (in the delay slot)
      [delay slot]

Delayed Branch (cont'd)
• "From fall through" approach

  ADD R1, R2, R3                     ADD R1, R2, R3
  If R1=0 then                       If R1=0 then
      [delay slot]      becomes          SUB R4, R5, R6   (in the delay slot)
  SUB R4, R5, R6
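The compiler scheduling idea from the Software Solution to Data Hazard slide can be checked mechanically. The sketch below is a simplified model, not the DLX hardware: `Instr` and `count_load_use_stalls` are hypothetical names, and it assumes that forwarding is present so that the only remaining stall is a load immediately followed by an instruction that reads the loaded register.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Instr(NamedTuple):
    op: str                  # mnemonic, e.g. "LW", "ADD", "SW"
    dest: Optional[str]      # register written, or None (stores write memory)
    srcs: Tuple[str, ...]    # registers read


def count_load_use_stalls(program: Sequence[Instr]) -> int:
    """Count adjacent pairs where a load feeds the very next instruction.

    ASSUMPTION: with forwarding, each such pair costs one stall cycle and
    no other data hazard stalls occur.
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        if prev.op == "LW" and prev.dest in cur.srcs:
            stalls += 1
    return stalls


# Code for: c = a + b; d = e - f (register names as on the slide)
generated = [
    Instr("LW", "Ra", ()), Instr("LW", "Rb", ()),
    Instr("ADD", "Rc", ("Ra", "Rb")), Instr("SW", None, ("Rc",)),
    Instr("LW", "Re", ()), Instr("LW", "Rf", ()),
    Instr("SUB", "Rd", ("Re", "Rf")), Instr("SW", None, ("Rd",)),
]
rearranged = [
    Instr("LW", "Ra", ()), Instr("LW", "Rb", ()), Instr("LW", "Re", ()),
    Instr("ADD", "Rc", ("Ra", "Rb")), Instr("LW", "Rf", ()),
    Instr("SW", None, ("Rc",)),
    Instr("SUB", "Rd", ("Re", "Rf")), Instr("SW", None, ("Rd",)),
]

print("generated:", count_load_use_stalls(generated), "stall(s)")
print("rearranged:", count_load_use_stalls(rearranged), "stall(s)")
```

In this model the generated order stalls after `LW Rb` and after `LW Rf`, while the rearranged order places an independent instruction after each of those loads. The same idea of moving an independent instruction into an otherwise wasted cycle is what the delayed branch approaches apply to the delay slot.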
