Advanced Computer Architecture - Lecture 23: Instruction level parallelism. This lecture will cover the following: hardware support at compile time; conditional/predicated instructions; H/W based compiler speculation; conditional move instruction; predicated load instructions;...
CS 704 Advanced Computer Architecture
Lecture 23
Instruction Level Parallelism (Hardware Support at Compile Time)
Prof Dr M Ashraf Chughtai
MAC/VU-Advanced Computer Architecture Lecture 23 – Instruction Level Parallelism-Static (4)

Today’s Topics
Recap
H/W Support at Compile Time
– Conditional/Predicated Instructions
– H/W based Compiler Speculation
Summary

Recap: H/W and S/W Exploitation
We have studied both dynamic and static scheduling techniques that exploit ILP, for single or multiple instruction issue per clock cycle, to enhance processor performance.
The dynamic approaches rely on hardware modifications, as in superscalar processors; VLIW processors also issue multiple instructions per cycle, but depend on static (compiler) scheduling.
Furthermore, pipeline structure enhancements help as follows:
– Tomasulo’s pipeline helps to overcome the structural and data hazards
– Branch predictors minimize the stalls due to control hazards
The static scheduling approaches include:
– Loop unrolling
– Software Pipelining
– Trace Scheduling
– Superblock Scheduling
These techniques aim to increase ILP on processors that issue more than one instruction every cycle.
They give better performance when the behavior of the branches is correctly predictable at compile time.
Otherwise, the parallelism cannot be completely exposed at compile time, for the following two reasons:
– Control dependences limit the amount of parallelism that can be exploited
– Dependences between memory reference instructions can prevent the code movement necessary to increase parallelism

Hardware Support for VLIW
These limitations, particularly for VLIW processors, can be overcome by providing hardware support for the compiler.
Today, we will introduce some hardware-support-based techniques that help:
– to overcome these limitations; and
– to expose more parallelism at compile time
The most commonly used such techniques, as listed in today’s topics, are conditional or predicated instructions and hardware-based compiler speculation.

Hardware Support for Speculation
Furthermore, in most cases we would have to move the speculated instructions before the condition evaluation.
This cannot be done by predication.
Rather, it motivates the following capabilities, which allow ambitious speculation.

Compiler speculation with hardware support
1) The ability to find instructions that can be speculatively moved without affecting the program data flow
2) The ability to ignore exceptions in speculated instructions until we know that such exceptions should really occur
3) The ability to speculatively interchange loads and stores, or stores and stores, which may have address conflicts
Note that the first is a capability of the compiler, whereas the other two can be achieved with hardware support.
The hardware speculation approach supports reordering loads and stores.
Hardware-based speculative movement of instructions is done by checking for potential address conflicts at runtime. It allows the compiler to reorder loads and stores when it suspects that they do not conflict.

Methods to preserve exceptions
The following are four hardware methods that support more ambitious speculation without introducing erroneous exception behavior.
The key observation behind these methods is that the results of a mispredicted speculated sequence will not be used in the final computation, so such a sequence should not be allowed to cause an exception; this is how exception behavior is preserved.

1) The hardware and operating system cooperatively ignore exceptions for speculative instructions.
Here, the exception behavior of a correct program is preserved, while that of an incorrect program is not.
This approach is used as a “fast mode” under program control.
An example of an exception that indicates a program error and normally causes termination is a memory protection violation.
An example of an exception that is handled, after which the program is normally resumed, is a page fault.

2) Speculative instructions that never raise exceptions are used, and checks are introduced to determine when an exception should occur.

3) A set of bits called “poison bits” is attached to the result registers.
These bits are written by speculated instructions when the instructions cause exceptions.
The poison bits cause a fault when a normal instruction attempts to use the register.
This approach tracks exceptions as they occur but postpones any terminating exception until a value is actually used.
The scheme adds a poison bit to every register and another bit to every instruction to indicate whether the instruction is speculative.
The poison bit of the destination register is set whenever a speculative instruction results in a terminating exception; all other exceptions are handled immediately.
If a speculative instruction uses a register whose poison bit is on, the destination register has its poison bit turned on.
If a normal instruction attempts to use a source register whose poison bit is on, the instruction causes a fault.

4) A mechanism is provided to indicate that an instruction is speculative, and the hardware buffers the instruction result until it is certain that the instruction is no longer speculative.

Summary
Both hardware and software mechanisms provide approaches to exploit ILP.
There are certain limitations on both mechanisms.
We will discuss these limitations next time.

Assalam-u-Alaikum and Allah Hafiz
...
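The poison-bit method (method 3 among the methods to preserve exceptions) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design from the lecture: the names RegisterFile, PoisonFault and execute are hypothetical, a Python ArithmeticError stands in for a terminating exception, and clearing the poison bit on a successful write is an assumption the slides do not spell out.

```python
class PoisonFault(Exception):
    """Raised when a normal (non-speculative) instruction reads a poisoned register."""


class RegisterFile:
    """Hypothetical model: one value and one poison bit per register."""

    def __init__(self, n=32):
        self.value = [0] * n
        self.poison = [False] * n

    def execute(self, dest, srcs, op, speculative):
        """Run one instruction: dest <- op(values of srcs)."""
        if any(self.poison[s] for s in srcs):
            if not speculative:
                # A normal instruction that uses a poisoned source faults.
                raise PoisonFault(f"poisoned source among registers {srcs}")
            # A speculative instruction only propagates the poison bit.
            self.poison[dest] = True
            return
        try:
            result = op(*(self.value[s] for s in srcs))
        except ArithmeticError:
            # ArithmeticError stands in for a terminating exception.
            if not speculative:
                raise
            # Postpone the exception: mark the destination instead.
            self.poison[dest] = True
            return
        self.value[dest] = result
        # Assumption: a successful write clears the destination's poison bit.
        self.poison[dest] = False


rf = RegisterFile(4)
rf.value[1], rf.value[2] = 10, 0
# Speculative divide by zero: no exception yet, R3 becomes poisoned.
rf.execute(3, [1, 2], lambda a, b: a // b, speculative=True)
# Speculative use of R3 propagates the poison bit to R0.
rf.execute(0, [3], lambda a: a + 1, speculative=True)
```

If the speculation turns out to be wrong, R3 and R0 are never read by a normal instruction and the postponed exception disappears; if a normal instruction does read R0, the model raises PoisonFault at that point.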
cycle Let us see:

2-Issue Instruction Code
First instruction slot    Second instruction slot
LW ADD... to the BEQZ R10, L

2-Issue Instruction Code
First instruction slot    Second instruction slot
LW R1,40(R2)... conditional instruction resolve the dependence where the register-write occurs

Conditional or predicated Instructions
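The idea behind a conditional move, one of the conditional or predicated instructions this lecture covers, can be sketched in a few lines. This is a behavioral model only, under assumptions: the function names cmovz, with_branch and with_cmov are hypothetical, and the move-if-zero condition is chosen for illustration rather than taken from the slides.

```python
def cmovz(dest_value, src_value, cond_value):
    """Model of a conditional move: the result is src_value when
    cond_value is zero, otherwise the old destination value is kept."""
    return src_value if cond_value == 0 else dest_value


def with_branch(a, s, t):
    """Branching form of: if (a == 0) s = t."""
    if a == 0:  # control dependence on the branch
        s = t
    return s


def with_cmov(a, s, t):
    """Same computation with the branch replaced by a conditional move,
    turning the control dependence into a data dependence on a."""
    return cmovz(s, t, a)
```

Both functions compute the same value for every input; the second form always executes the move instruction and lets the condition decide only whether the register write takes effect, which is what removes the branch from the instruction stream.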