Advanced Computer Architecture - Lecture 24: Instruction level parallelism. This lecture will cover the following: concluding instruction level parallelism; compile time H/W support; to preserve exceptions - typical examples; for memory reference speculation; speculation mechanism;...
CS 704 Advanced Computer Architecture
Lecture 24: Instruction Level Parallelism (Concluding Instruction Level Parallelism)
Prof Dr M Ashraf Chughtai
MAC/VU-Advanced Computer Architecture Lecture 24 – Instruction Level Parallelism-Static (5)

Today’s Topics
– Recap
– Compile Time H/W Support:
  – To Preserve Exceptions: Typical Examples
  – For Memory Reference Speculation
– Speculation Mechanism: H/W vs S/W
– Summary

Recap: Compile Time H/W Support
Last time we discussed methods of providing hardware support for exposing more parallelism at compile time. We introduced the extension of the instruction set with conditional, or predicated, instructions.

We found that such instructions can be used to eliminate branches and to convert a control dependence into a data dependence, which is relatively simple to handle. This improves processing performance.

We also introduced the hardware- and software-based abilities required to move speculated instructions before the branch condition is evaluated while preserving the exception behavior.

We further introduced four methods to support speculation, that is, to move instructions such that a mispredicted speculated sequence is not used in the final execution, while the exception behavior is preserved so that exceptions can still be taken later if they turn out to be needed.

To study these methods, we distinguish between two kinds of exceptions:
– those that indicate a program error and normally cause termination;
– those that are handled, after which the program resumes normally.
A typical example of an exception that indicates a program error and terminates the program is a memory protection violation. The result of a program that gets such an exception is not well defined. Therefore, if such an exception arises in a speculated instruction, we cannot take the exception while the instruction is still speculative, and the exception behavior of such an instruction need not be preserved.

A typical example of an exception that can be handled, after which the program resumes normally, is a page fault in virtual memory.

Example (Cont’d)
Here we discussed that the execution time improves if the LW immediately following the branch is converted to the predicated version of load-word (LWC) and moved up by two slots. The load is assumed to occur unless the third operand (R10) is zero.

First instruction slot      Second instruction slot
LW   R1,40(R2)              ADD R3,R4,R5
LWC  R8,0(R10),R10          ADD R6,R3,R7
BEQZ R10,L
LW   R9,0(R8)

Here the branch is written to skip the LW instruction if R10 = 0. If the instruction LW R8,0(R10) is instead executed unconditionally, it is likely to cause a protection exception that should not occur. Now let us see how the code can be rewritten using conditional move instructions.

The code must be written so that loads that are no longer control dependent cannot raise an exception when they should not have been executed; that is, the protection that the branch gives against a memory access violation must be retained.
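The difference between the plain load and its predicated version can be sketched in a few lines of Python. This is only an illustrative model, not the lecture's code: the register file is a dictionary, memory is a dictionary of mapped addresses, an unmapped address stands in for a protection violation, and the helper names lw and lwc are hypothetical.

```python
def lw(regs, mem, dest, base, offset):
    """Plain load-word: always accesses memory (assumed model)."""
    addr = regs[base] + offset
    if addr not in mem:
        # An unmapped address models a memory protection violation.
        raise MemoryError(f"protection violation at address {addr}")
    regs[dest] = mem[addr]


def lwc(regs, mem, dest, base, offset, cond):
    """Predicated load-word: behaves as a no-op when regs[cond] is zero."""
    if regs[cond] != 0:
        lw(regs, mem, dest, base, offset)


def demo():
    mem = {2000: 42}               # only address 2000 is mapped
    regs = {"R8": 11, "R10": 0}
    lwc(regs, mem, "R8", "R10", 0, "R10")   # R10 = 0: nothing happens
    skipped = regs["R8"]
    regs["R10"] = 2000
    lwc(regs, mem, "R8", "R10", 0, "R10")   # R10 != 0: load is performed
    return skipped, regs["R8"]
```

With R10 = 0 the predicated load leaves R8 untouched and raises nothing, whereas the plain lw on the same state would raise the modelled violation. This is the behavior that the rewritten code has to reproduce with conditional moves.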
Example (Cont’d)
In this example code a violation would terminate the program. Therefore, while rewriting the program we must ensure that if the instruction LW R8,0(R10) is moved before the branch, the effective address is not zero.

To guard the load with conditional move instructions, we need two unassigned registers:
– One register must contain a safe address for the load instruction LW R8,0(R10) (let us use register R29)
– The other register must save the original contents of R8 (let us use register R30 for this purpose)

Example (Cont’d): Revised code

DADDI   R29,R0,#1000   ;initialize R29 to a safe address (R29 is unused)
LW      R1,40(R2)      ;first load instruction of original code
MOV     R30,R8         ;save R8 in unused R30
CMOVNZ  R29,R10,R10    ;R29 <- R10 if R10 contains a safe address (≠ 0)
LW      R8,0(R29)      ;speculative load
CMOVZ   R8,R30,R10     ;if R10 = 0 the load was incorrectly speculated, so restore R8
BEQZ    R10,L
LW      R9,0(R8)

The load after the branch can also be speculated using one more conditional move, for which one more unused register is needed. However, the conditional instructions add significant overhead. Using R31 as the unused register, the branch-free code is as follows:

ADDI    R29,R0,#1000
LW      R1,40(R2)
MOV     R30,R8
MOV     R31,R9         ;save R9 in unused R31
CMOVNZ  R29,R10,R10
LW      R8,0(R29)
LW      R9,0(R8)       ;load speculated
CMOVZ   R8,R30,R10
CMOVZ   R9,R31,R10     ;restore R9, if needed
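The effect of the branch-free sequence can be checked with a small Python model. This is a sketch under stated assumptions rather than a definitive implementation: registers and memory are dictionaries, an unmapped address stands in for a protection violation, the helper names (lw, cmovnz, cmovz, branch_free, make_state) are hypothetical, and the word stored at the safe address 1000 is assumed to be itself a mapped address so that the second speculated load cannot fault.

```python
def lw(regs, mem, dest, base, offset):
    """Load-word; an unmapped address models a protection violation."""
    addr = regs[base] + offset
    if addr not in mem:
        raise MemoryError(f"protection violation at address {addr}")
    regs[dest] = mem[addr]


def cmovnz(regs, dest, src, cond):
    """Conditional move: dest <- src if regs[cond] != 0."""
    if regs[cond] != 0:
        regs[dest] = regs[src]


def cmovz(regs, dest, src, cond):
    """Conditional move: dest <- src if regs[cond] == 0."""
    if regs[cond] == 0:
        regs[dest] = regs[src]


def branch_free(regs, mem):
    """Mirror of the nine-instruction branch-free listing, in order."""
    regs["R29"] = regs["R0"] + 1000        # ADDI   R29,R0,#1000
    lw(regs, mem, "R1", "R2", 40)          # LW     R1,40(R2)
    regs["R30"] = regs["R8"]               # MOV    R30,R8
    regs["R31"] = regs["R9"]               # MOV    R31,R9
    cmovnz(regs, "R29", "R10", "R10")      # CMOVNZ R29,R10,R10
    lw(regs, mem, "R8", "R29", 0)          # LW     R8,0(R29)
    lw(regs, mem, "R9", "R8", 0)           # LW     R9,0(R8)
    cmovz(regs, "R8", "R30", "R10")        # CMOVZ  R8,R30,R10
    cmovz(regs, "R9", "R31", "R10")        # CMOVZ  R9,R31,R10
    return regs


def make_state(r10):
    """Hypothetical initial state; address 0 is deliberately unmapped."""
    mem = {140: 5, 1000: 1000, 2000: 3000, 3000: 77}
    regs = {"R0": 0, "R1": 0, "R2": 100, "R8": 11, "R9": 22, "R10": r10}
    return regs, mem
```

Running branch_free(*make_state(0)) leaves R8 and R9 at their saved values 11 and 22, while branch_free(*make_state(2000)) ends with R8 = 3000 and R9 = 77, the same results the original branching code would give. The sketch also makes one subtlety visible: when R10 = 0, the second load uses whatever word is stored at the safe address, so that word has to be a valid address as well.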
Summary: Hardware versus software speculation
In concluding our discussion of exploiting ILP with hardware and software techniques, we can say that both approaches have certain limitations. These limitations can be summarized as follows:

1) To speculate extensively, we must be able to disambiguate memory references, which is difficult to do at compile time. In hardware-based schemes, memory addresses are resolved dynamically at run time using Tomasulo’s pipelined structure, which allows loads to be moved past stores at run time.

2) Hardware-based speculation works better when control flow is unpredictable, and when hardware-based branch prediction is superior to software-based branch prediction, which is done at compile time.

3) Hardware-based speculation maintains a completely precise exception model for speculated instructions.

4) Hardware-based speculation does not require the compensation code that ambitious software speculation mechanisms need.

5) Compiler-based approaches can see further ahead in the code sequence and may therefore provide better code scheduling than a purely hardware-driven approach, for example through the use of conditional move instructions.

6) Hardware-based speculation with dynamic scheduling does not require different code sequences to achieve good performance on different implementations of an architecture.

However, the major disadvantage of hardware support for speculation is the extra hardware resources it needs and the complexity of the structure.

Assalam-u-Alaikum and Allah Hafiz

... 2: Speculative-instructions method
3: Poison-bit register method
4: Hardware (re-order) buffering
... with the poison bit turned ON, the instruction causes a fault
Resulting Behavior: Poison-bits Approach
... a special instruction is included in the instruction set to set/save the state of the poison bit
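The poison-bit idea mentioned in the fragments above can be illustrated with a short Python sketch. The excerpt only states that an instruction faults when it meets a poison bit that is turned on; the remaining details here follow the usual textbook description of the scheme and are assumptions, as are the class and method names.

```python
class PoisonFault(Exception):
    """Raised when a normal instruction reads a poisoned register."""


class PoisonRegisterFile:
    def __init__(self):
        self.value = {}
        self.poison = {}   # register name -> poison bit

    def speculative_load(self, mem, dest, addr):
        """Speculated load: a faulting access sets the poison bit
        on the destination instead of raising the exception."""
        if addr in mem:
            self.value[dest] = mem[addr]
            self.poison[dest] = False
        else:
            self.value[dest] = 0          # value is meaningless when poisoned
            self.poison[dest] = True

    def speculative_add(self, dest, src1, src2):
        """Speculated ALU operation: poison propagates to the destination."""
        self.poison[dest] = self.poison.get(src1, False) or self.poison.get(src2, False)
        self.value[dest] = self.value[src1] + self.value[src2]

    def normal_read(self, reg):
        """Non-speculative use: faults if the poison bit is on."""
        if self.poison.get(reg, False):
            raise PoisonFault(reg)
        return self.value[reg]
```

In this model a wrongly speculated load that would have faulted costs nothing as long as its result is never used by a normal instruction; the fault is deferred to the first non-speculative use of the poisoned register.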