Advanced Computer Architecture - Lecture 21: Instruction level parallelism. This lecture will cover the following: static multiple issue: VLIW approach; detecting and enhancing loop level parallelism; software pipelining; multiple-issue overheads; VLIW/EPIC processor;...
CS 704 Advanced Computer Architecture
Lecture 21: Instruction Level Parallelism (Static Scheduling – Multiple Issue Processor)
Prof Dr M Ashraf Chughtai

Today’s Topics: Recap: Static Scheduling and Branch Prediction; Static Multiple Issue: VLIW Approach; Detecting and enhancing loop level parallelism; Software pipelining; Summary

MAC/VU-Advanced Computer Architecture Lecture 21 – Instruction Level Parallelism-Static (2)

Recap: Static Scheduling
Last time, we began our discussion of static scheduling techniques for exploiting ILP in a pipelined datapath. We noted that inserting stalls is the basic compiler approach to avoiding data and control hazards.
However, because stalls degrade performance, the compiler schedules instructions to avoid hazards and to reduce or eliminate stalls. Furthermore, we observed that loops are unrolled to improve performance and reduce stalls.
The number of stalls is reduced further when the unrolled loop is scheduled so that the copies of each instruction from the unrolled iterations are placed one after another, with additional registers keeping those copies independent.
Finally, we discussed the impact of static branch prediction on the performance of the scheduled and unrolled loops. We also observed that, in a superscalar processor with multiple issue, static branch prediction lowers the misprediction rate more than dynamic branch prediction does; here, the misprediction rate ranges from 4% to 15%.

Today’s Discussion - Scheduling in VLIW Processors
We know that Very Long Instruction Word (VLIW)
processors issue multiple instructions per cycle using only static scheduling. Today, we will extend our discussion of static scheduling as it is used in VLIW processors.

Review of VLIW format
A VLIW contains a fixed set of instructions, say, 4 to 16 instructions. A VLIW is formatted either as one large instruction or as a fixed instruction packet with explicit parallelism among the instructions in the set.

VLIW / EPIC Processor
Since there is explicit parallelism among the instructions, VLIW is also referred to as Explicitly Parallel Instruction Computing (EPIC). It can initiate multiple instructions in a cycle because the compiler places the operations into a wide template, or packet. A packet may contain 64 to 128 bytes.

Multiple-Issue Overheads: VLIW vs Superscalar
In a superscalar processor, the overhead of the issue hardware that checks dependences among simultaneously issued instructions grows with the issue width. For a two-issue processor this overhead is minimal, and for a four-issue processor it is manageable. For VLIW, the overhead does not grow with the issue width, because the compiler, not the hardware, groups independent instructions into each packet.

Detecting and enhancing loop level parallelism
A recurrence exists when a variable is defined based on the value of that variable in an earlier iteration. Detecting recurrences is important for two reasons: 1) some architectures have special support for executing recurrences, and 2) some recurrences can be the source of a reasonable amount of parallelism.
Now consider
one more loop, for(i=6; i