Advanced Computer Architecture - Lecture 11: Computer hardware design. This lecture will cover the following: pipeline and instruction level parallelism; structural hazards; data hazards; control hazards; pipelining the R-type and load instruction; branch prediction; multiple streams;...
CS 704 Advanced Computer Architecture
Lecture 11: Computer Hardware Design (Pipeline and Instruction Level Parallelism)
Prof Dr M Ashraf Chughtai

Today's Topics
- Recap of Lecture 10
- Structural hazards
- Data hazards
- Control hazards

MAC/VU-Advanced Computer Architecture Lecture 11 – Computer Hardware Design (5)

Recap: Lecture 10
- Multi-cycle datapath versus pipelined datapath
- Key components of the pipelined datapath
- Performance enhancement due to pipelining
- Introduction to hazards in the pipelined datapath

Structural Hazards
A structural hazard is an attempt to use the same resource in two different ways at the same time. For example, a single memory port that is accessed for an instruction fetch and a data read in the same clock cycle is a structural hazard.

Single Memory is a Structural Hazard
[Figure: pipeline diagram, time (clock cycles) against instruction order, showing a Load followed by further instructions passing through the Mem, Reg, ALU, Mem, Reg stages]
Two memory read operations occur in the 4th cycle: the LOAD instruction accesses memory to read data while the 4th instruction is fetched from the same memory.

Single Memory is a Structural Hazard (with stall)
[Figure: the same pipeline diagram with a stall (bubble) inserted]
Insert a stall (bubble) to avoid the memory structural hazard.

Structural Hazards
A structural hazard also exists when the single write port of the register file is accessed for two WB operations in the same clock cycle.
- This situation does not arise in the 5-stage pipeline.
- But it may exist in a multi-cycle pipeline whose instructions take different numbers of stages, as explained next.
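The single-memory conflict described above can be checked with a small model. The following is a minimal sketch, not the lecture's own method: it assumes the classic five-stage order (IF, ID, EX, MEM, WB) with one stage per cycle, and the opcode names and function names are hypothetical, chosen only for illustration.

```python
# Assumption: 5-stage pipeline IF, ID, EX, MEM, WB, one stage per clock cycle,
# so an instruction fetched in cycle c uses data memory (if at all) in cycle c + 3.
MEM_OFFSET = 3
MEMORY_OPS = {"load", "store"}  # hypothetical opcode names


def memory_conflicts(ops):
    """Return {cycle: users} for cycles in which the single memory port is requested twice.

    Instructions are fetched back to back, starting in cycle 1.
    """
    users = {}
    for index, op in enumerate(ops):
        fetch_cycle = index + 1
        users.setdefault(fetch_cycle, []).append((index, "IF"))
        if op in MEMORY_OPS:
            users.setdefault(fetch_cycle + MEM_OFFSET, []).append((index, "MEM"))
    return {cycle: u for cycle, u in users.items() if len(u) > 1}


def fetch_cycles_with_stalls(ops):
    """Return the fetch cycle of each instruction when a fetch is delayed
    (a bubble is inserted) whenever the memory port is busy with a MEM access."""
    mem_busy = set()  # cycles already claimed by a load/store data access
    fetch_cycles = []
    cycle = 1
    for op in ops:
        while cycle in mem_busy:
            cycle += 1  # stall: the port is serving a data access
        fetch_cycles.append(cycle)
        if op in MEMORY_OPS:
            mem_busy.add(cycle + MEM_OFFSET)
        cycle += 1
    return fetch_cycles


program = ["load", "add", "add", "add", "add"]
print(memory_conflicts(program))          # {4: [(0, 'MEM'), (3, 'IF')]}
print(fetch_cycles_with_stalls(program))  # [1, 2, 3, 5, 6]
```

With one load at the head of the program, the model reports the conflict in cycle 4 between the load's data access and the fetch of the fourth instruction, and the stalled schedule delays that fetch by one cycle.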
Pipelining the Load Instruction
[Figure: timing diagram over clock cycles 1 to 7; the 1st, 2nd and 3rd lw each pass through Ifetch, Reg/Dec, Exec, Mem, Wr]
- The five independent functional units in the pipeline datapath are: instruction fetch (Ifetch), decode/register read (Reg/Dec), the ALU for Exec, data memory (Mem), and the register file's write port for the Wr stage.
- Here, the register file has separate read and write ports, so registers can be read and written at the same time.
- Each functional unit is used once.

The Four Stages of R-type
[Figure: R-type instruction over cycles 1 to 4: Ifetch, Reg/Dec, Exec, Wr]
- An R-type instruction does not access data memory, so it takes only four clock cycles (four stages) to complete.
- Here, the ALU operates on the register operands.
- The result is written into the register file during the WB stage.

Pipelining the R-type and Load Instruction
[Figure: timing diagram over clock cycles 1 to 9 for the sequence R-type, R-type, Load, R-type, R-type, annotated "Oops! We have a problem!"]
We have a pipeline conflict, or structural hazard:
– Two instructions try to write to the register file at the same time!
– Only one write port

Solution: Redo Fetch after Branch
[Figure: pipeline diagram showing the Branch, the Branch successor and the instruction after it passing through IFetch, Dcd, Exec, Mem, WB]
Once a branch has been detected during the instruction decode/register read stage, the next instruction fetch cycle should essentially be a stall, if we assume that the branch is taken.

Solution: Redo Fetch after Branch (cont'd)
- However, the instruction fetched in this cycle never performs useful work and is ignored.
- Therefore, re-fetching the branch successor instruction provides the correct instruction.
- Indeed, the second fetch is not essential if the branch is not taken.
- Impact: … clock cycles per branch instruction if the branch is untaken.

Solution: Delayed Branch (S/W method)
[Figure: pipeline diagram, time (clock cycles) against instruction order, showing Add, Beq, Misc and Load instructions]
- Redefine the branch behavior so that the branch takes place after the next instruction, by introducing another instruction (possibly a no-op) that is always executed.
- Impact: … clock cycles per branch instruction if an instruction can be found to put in the "slot" (about 50% of the time).

Solution #4: Prediction
This technique suggests that for a branch instruction we guess one direction of the branch to begin with, then back up if the guess is wrong. The two possible predictions are:
- Predict branch not taken
- Predict branch taken

Branch Prediction Flowchart
[Figure: branch prediction flowchart]

Predict: Branch Not Taken
- This scheme is implemented by assuming that every branch is not taken.
So the processor treats the branch like a normal instruction and continues to fetch in sequence.

Sequence when the branch is not taken:
Branch Inst 'i'    IF   ID   EX   MEM  WB
Inst 'i+1'              IF   ID   EX   MEM  WB
Inst 'i+2'                   IF   ID   EX   MEM  WB
Inst 'i+3'                        IF   ID   EX   MEM  WB

Predict: Branch Not Taken (cont'd)
- When the decision has been made and the branch is taken, the fetch operations are turned into no-ops and the fetch is restarted at the target address.

Sequence when the branch is taken:
Taken Branch Inst 'i'   IF   ID   EX   MEM  WB
Inst 'i+1'                   IF   Idle Idle Idle Idle
Branch target                     IF   ID   EX   MEM  WB
Branch target + 1                      IF   ID   EX   MEM  WB

Predict: Branch Taken
- An alternative is to treat every branch as taken.
- As soon as the target address is computed, we assume that the branch will be taken and start fetching and executing at the target.
- In a five-stage pipeline the target address and the condition evaluation are available at the same time, so this technique is of no use there.
- Consider the following LOOP example to explain the concept:

Predict: Branch Taken (example)
i = 0
Loop: ……
      i = i + 1
      IF i ≠ 1001 THEN Loop
……
- Here the branch is taken 1000 times and falls through only once, on loop exit. The prediction "branch taken" therefore fails only once, and there is no misprediction stall for the 1000 taken branches.
- Further, the compiler can improve performance by organizing the code so that the most frequent path matches the hardware's choice.

Solution #5: Multiple Streams
- Have two pipelines
- Pre-fetch each branch into a separate pipeline
- Use the appropriate pipeline
Results:
- Leads to bus and register contention
- Multiple branches lead to further pipelines being needed

Solution: Pre-fetch Branch Target
The target of the branch is pre-fetched in addition to
the instructions following the branch.
- Keep the target until the branch is executed
- Used by the IBM 360/91

Summary
Types of hazards in the pipelined datapath:
- Structural hazards occur when the same resource is accessed by more than one instruction at the same time, e.g. one memory port or one register write port.
- They can be removed either by using multiple resources or by inserting a stall.
- A stall degrades the pipeline performance.

Summary (cont'd)
- Data hazards occur when an attempt is made to read invalid data.
- Data hazards can be removed by using stall and forwarding techniques.
- Control hazards occur when an attempt is made to branch before the condition has been evaluated.
- There are four ways to handle control hazards.

Summary: ways to handle control hazards
1: Stall until the branch direction is clear
2: Predict branch not taken
   - Execute successor instructions in sequence
   - "Squash" instructions in the pipeline if the branch is actually taken
   - PC+4 is already calculated, so use it to get the next instruction
3: Predict branch taken
4: Delayed branch
   - Define the branch to take place AFTER a following instruction
   - The slot delay allows the proper decision and the branch target address to be determined in the pipeline

Assalam-u-Alaikum and Allah Hafiz
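The loop example from the "Predict: Branch Taken" slide can be checked with a short simulation. This is a sketch under stated assumptions, not part of the lecture: the function names are hypothetical, and the one-cycle misprediction penalty is an illustrative parameter rather than a figure given in the slides.

```python
def count_branch_outcomes(limit=1001):
    """Run the lecture's loop (i = 0; i = i + 1; IF i != limit THEN Loop)
    and count how often the closing branch is taken or not taken."""
    i = 0
    taken = 0
    not_taken = 0
    while True:
        i = i + 1
        if i != limit:
            taken += 1      # branch back to Loop
        else:
            not_taken += 1  # fall through: loop exit
            break
    return taken, not_taken


def misprediction_cycles(taken, not_taken, predict_taken, penalty=1):
    """Cycles lost to wrong guesses under a fixed (static) prediction.

    `penalty` is an assumed cost per misprediction, in clock cycles.
    """
    wrong_guesses = not_taken if predict_taken else taken
    return wrong_guesses * penalty


taken, not_taken = count_branch_outcomes()
print(taken, not_taken)                                              # 1000 1
print(misprediction_cycles(taken, not_taken, predict_taken=True))    # 1
print(misprediction_cycles(taken, not_taken, predict_taken=False))   # 1000
```

The counts show why a static "taken" guess suits backward loop branches: it is wrong only on loop exit. The sketch counts wrong guesses only; whether a correct "taken" guess saves cycles still depends on when the target address becomes available, which the five-stage pipeline described above does not provide early.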
Solution 2: Delay R-type's Write by One Cycle
Delay R-type's register write by one cycle:
– Now R-type instructions