Advanced Computer Architecture - Lecture 18: Instruction level parallelism. This lecture covers the following: hardware-based speculation and exceptions; speculating on the outcome of branches; extensions to Tomasulo’s hardware; handling exceptions; modified hardware including ROB;...
CS 704 Advanced Computer Architecture
Lecture 18
Instruction Level Parallelism (Hardware-based speculation and exceptions)
Prof Dr M Ashraf Chughtai

Today's Topics
Recap
Hardware-based Speculation
- Speculating on the outcome of branches
- Extension of Tomasulo’s hardware
- Handling exceptions
Summary

MAC/VU-Advanced Computer Architecture Lecture 18 – Instruction Level Parallelism -Dynamic (7)

Recap: Lecture 17
Last time we discussed three basic concepts used to achieve multiple instruction issue:
- Branch Target Buffer
- Integrated Instruction Fetch Units
- Return Address Predictors

Recap: Lecture 17 … Cont’d
The branch-target buffer provides the predicted branch-target address during the IF stage. Its variation, branch folding, buffers the actual target instruction instead of, or along with, the target address. Both help minimize branch-hazard stalls, allowing multiple instructions to issue in one clock cycle.

Recap: Lecture 17 … Cont’d
The Integrated Instruction Fetch Unit (IIFU) integrates the following three functions into a single step:
- Branch prediction
- Instruction prefetch
- Instruction memory access and buffering

Recap: Lecture 17 … Cont’d
The return-address predictor predicts indirect jumps, i.e., jumps whose destination address varies at run time. Such jumps arise from indirect procedure calls and select or case statements, although most of them come from procedure returns.

Recap: Lecture 17 … Cont’d
Then we discussed the features of:
- Superscalar processors
- VLIW processors
In superscalar pipelined processors, the multiple instructions issued in one clock cycle can be scheduled using either static or dynamic scheduling techniques.

Recap: Lecture 17 … Cont’d
VLIW-based processors, in contrast, schedule the multiple instructions issued in one clock cycle using only static scheduling approaches.
Then we discussed the performance enhancement, and the factors limiting performance, in statically and dynamically scheduled superscalar pipelines.

Today’s Focus
Last time, in the loop-based example, we observed that control hazards, which prevent us from starting the next iteration before we know whether the branch was correctly predicted, cause a one-cycle penalty on every loop iteration.
Today we will focus on hardware-based speculation to address this limitation.

Hardware-based Speculation: Introduction
Hardware-based speculation offers several advantages:
- It can incorporate hardware-based branch prediction
- It does not require additional bookkeeping code
- It does not depend on the compiler

Explanation … Cont’d
Note that the L.D following the BNE cannot start execution earlier, because it must wait until the branch outcome is determined. This type of program, with data-dependent branches that cannot be resolved early, shows … allow multiple instructions to execute in the same clock cycle.

[Table: issue, execution and write-result clock cycles for the dual-issue pipeline, without and with speculation]

Explanation … Cont’d
The second table shows the clock cycles of issue, execution and writing the result for a dual-issue version of our pipeline with speculation. Note that the L.D following the BNE can start execution early, because it is speculative.

Explanation … Cont’d
Comparing the two tables, note that the third branch executes in clock cycle 13 in the speculative processor, but in clock cycle 19 in the non-speculative processor. That is, the non-speculative pipeline falls behind the issue rate rapidly.

Exceptions to Hardware-based speculation: Extended Discussion
So far, we have discussed performance enhancement using the structure of Tomasulo’s algorithm, extended to handle speculation, for ILP in single-issue and multiple-issue processors.
Here, we observed that the store buffer of Tomasulo’s structure is eliminated and a Re-Order Buffer (ROB) is included that takes over the function of the store buffer.
The structure is then further extended to handle multiple issue by making the CDB wider.

Exceptions to Hardware-based speculation
Now we will talk about the exceptional situations that may arise when executing a program using dynamic scheduling, and how the structure with hardware-based speculation deals with these exceptions.
We know that dynamic scheduling without speculation allows instructions to complete out of order, whereas the structure with speculation hardware commits in order.

Exceptions to Hardware-based speculation
Therefore, if an exception occurs while executing an instruction, the ROB in the structure with speculation does not commit that instruction; the exception is handled instead.
Let us reconsider the execution of our first example program using Tomasulo’s structure with and without speculation (Table 3.30).

Exceptions to Hardware-based speculation
Here, the instructions SUB.D
and ADD.D, which follow the incomplete instruction MUL.D but were executed earlier, do not commit until MUL.D completes and commits. If MUL.D causes an exception (interrupt), it is handled as follows: we wait until MUL.D reaches the head of the ROB, take the exception there, and flush every pending instruction from the ROB, so the speculation is undone.

Exceptions to Hardware-based speculation
In contrast, with dynamic scheduling without speculation, registers F8 (written by SUB.D) and F6 (written by ADD.D) could be overwritten out of order, before MUL.D completes, so the interrupt could not be handled precisely.

Exceptions to Hardware-based speculation
Furthermore, an exception is not recognized until the instruction that caused it is ready to commit. This may be explained by considering our earlier example of the execution of a loop:

    Loop: L.D    F0,0(R1)
          MUL.D  F4,F0,F2
          S.D    F4,0(R1)
          DADDUI R1,R1,#-8
          BNE    R1,R2,Loop   ;branch if R1 != R2

Exceptions to Hardware-based speculation
Here, if an exception arises, say an interrupt from MUL.D, the exception is only recorded in the ROB. If the speculated branch (BNE) turns out to have been mispredicted, and the excepting instruction lies on the wrongly speculated path, the exception is flushed, together with the speculated instructions that should not have been executed, when the ROB is cleared.

Summary
The focus of today’s discussion has been the modification of Tomasulo’s hardware to handle execution using speculation, i.e., speculating on the outcome of branches to avoid control hazards, which prevent us from starting the next iteration before we know whether the branch was correctly predicted.
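The loop case described above, an exception recorded for MUL.D on a path that a mispredicted BNE should never have reached, can be illustrated with a minimal Python sketch of ROB commit. It is a simplified model, not the lecture’s hardware: the `RobEntry` fields, the `commit` function, the register values and the exception name are all hypothetical, and a branch is simply assumed to be marked as mispredicted by the time it reaches the head of the ROB.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class RobEntry:
    """A hypothetical, simplified re-order buffer (ROB) entry."""
    instr: str
    dest: Optional[str] = None        # register written only at commit
    value: Any = None                 # result held in the ROB until commit
    done: bool = False                # execution (or branch resolution) finished
    exception: Optional[str] = None   # recorded here, not raised yet
    mispredicted: bool = False        # for branches: the prediction was wrong


def commit(rob: deque, regs: dict) -> list:
    """Retire entries from the ROB head in program order; return an event log."""
    log = []
    while rob and rob[0].done:
        head = rob.popleft()
        if head.exception is not None:
            # The exception is recognized only when its instruction is at the head.
            log.append(f"exception '{head.exception}' taken at {head.instr}")
            rob.clear()  # flush all younger (speculative) instructions
            break
        if head.dest is not None:
            regs[head.dest] = head.value
        log.append(f"commit {head.instr}")
        if head.mispredicted:
            # Everything behind a mispredicted branch was on the wrong path:
            # discard it, including any exception it recorded.
            log.append(f"flush {len(rob)} wrong-path entries")
            rob.clear()
            break
    return log


# Hypothetical scenario; register values are made up. BNE was predicted taken,
# so the next iteration's L.D and MUL.D issued speculatively and MUL.D recorded
# an exception, but the loop should actually have exited.
regs = {"R1": 16, "F0": 0.0, "F4": 0.0}
rob = deque([
    RobEntry("DADDUI R1,R1,#-8", dest="R1", value=8, done=True),
    RobEntry("BNE R1,R2,Loop", done=True, mispredicted=True),
    RobEntry("L.D F0,0(R1)", dest="F0", value=2.5, done=True),
    RobEntry("MUL.D F4,F0,F2", dest="F4", done=True, exception="FP overflow"),
])
for event in commit(rob, regs):
    print(event)
print(regs)
```

Running it commits DADDUI and BNE, then flushes the two wrong-path entries; F0 and F4 keep their old values and the recorded exception is never taken. Checking for exceptions at the head of the ROB, rather than when the instruction executes, is what allows the flush to discard it.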
Summary
The main idea is to allow the instructions following a branch predicted taken to execute, such that there are no consequences if the branch is not actually taken.
Further, we do not want a speculative instruction to cause exceptions that stop the program. Software-generated interrupts and memory violations are typical examples of such exceptions.

Summary
We found that this can be achieved:
- by including a buffer that holds the results and exceptions of instructions until it is known whether each instruction should actually have executed
- such a buffer is called the Re-Order Buffer (ROB)
- the ROB is used only to track commits
- the ROB is flushed if the speculation turns out to be wrong or an exception is found

Assalam-u-Alaikum and Allah Hafiz (peace be upon you, and may God protect you)

…
Modified hardware including ROB
… allow an instruction:
- to execute and
- to bypass its result to other instructions without …
… speculation is to allow instructions to execute out of order but force them to commit in order
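The principle that instructions execute out of order but commit in order, and why it makes the MUL.D exception precise while plain dynamic scheduling does not, can be sketched by replaying the same completion order with and without a ROB. This is a hypothetical Python model: the lecture only states that SUB.D writes F8 and ADD.D writes F6, so the operand fields, register values, completion order and the names `PROGRAM`, `run_without_rob` and `run_with_rob` are assumptions, and the other instructions of the example are omitted.

```python
# Hypothetical sketch: SUB.D and ADD.D finish before the long-latency MUL.D,
# and MUL.D raises an exception. Operand fields and values are assumptions.
PROGRAM = [  # (instruction, destination, result, raises exception?)
    ("MUL.D F0,F2,F4", "F0", 99.0, True),
    ("SUB.D F8,F6,F2", "F8", 3.0, False),
    ("ADD.D F6,F8,F2", "F6", 5.0, False),
]
COMPLETION_ORDER = [1, 2, 0]  # indices into PROGRAM, in the order they finish executing


def run_without_rob(regs: dict) -> tuple:
    """Dynamic scheduling without speculation: results are written as soon as they finish."""
    regs = dict(regs)
    for i in COMPLETION_ORDER:
        instr, dest, value, faults = PROGRAM[i]
        if faults:
            return regs, instr  # younger instructions have already overwritten registers
        regs[dest] = value
    return regs, None


def run_with_rob(regs: dict) -> tuple:
    """With a ROB: execute out of order, but commit to registers in program order."""
    regs = dict(regs)
    done = [False] * len(PROGRAM)
    head = 0  # index of the oldest uncommitted instruction
    for i in COMPLETION_ORDER:
        done[i] = True  # result (or exception) is recorded in the ROB only
        while head < len(PROGRAM) and done[head]:
            instr, dest, value, faults = PROGRAM[head]
            if faults:
                # Precise exception: no younger instruction has committed;
                # the remaining ROB entries would be flushed here.
                return regs, instr
            regs[dest] = value
            head += 1
    return regs, None


initial = {"F0": 0.0, "F6": 1.0, "F8": 2.0}
print(run_without_rob(initial))  # F6 and F8 already overwritten
print(run_with_rob(initial))     # F6 and F8 still hold their old values
```

In the first run F8 and F6 have already been overwritten when the exception is detected, so the earlier state cannot be restored; in the second run both registers still hold their old values, and SUB.D and ADD.D simply remain uncommitted in the ROB, ready to be flushed.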