Advanced Computer Architecture - Lecture 22: Instruction level parallelism. This lecture will cover the following: software pipelining and trace scheduling; eliminating dependent computations; superblocks; reducing dependent computations; uncovering instruction level parallelism;...
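As a concrete picture of what "uncovering instruction level parallelism" by unrolling means, here is a minimal C sketch. It assumes the loop being scheduled in this lecture is a dot-product style accumulation, which is inferred from the L.D / MUL.D / ADD.D pattern in the assembly schedule rather than stated in the slides. The names dot_simple and dot_unrolled2 are hypothetical. The sketch indexes upward, whereas the assembly counts its pointers down.

```c
#include <stddef.h>

/* Reference loop: one load pair, one multiply, one add per iteration.
 * Each add depends on the previous one through sum. */
double dot_simple(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Unrolled by two (assumed unroll factor, matching the two MUL.D
 * instructions per loop body in the schedule). The two products p0 and
 * p1 are independent, so a scheduler may overlap their loads and
 * multiplies; only the two adds into sum remain serialized. */
double dot_unrolled2(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    size_t i = 0;

    for (; i + 1 < n; i += 2) {
        double p0 = a[i] * b[i];         /* independent of p1 */
        double p1 = a[i + 1] * b[i + 1]; /* independent of p0 */
        sum += p0;
        sum += p1;
    }

    /* Cleanup iteration when n is odd. */
    if (i < n)
        sum += a[i] * b[i];

    return sum;
}
```

Both functions add the products in the same order, so they should return the same value. The gain from unrolling is that the loop overhead (pointer update and branch) is paid once per two elements, and the independent multiplies give the scheduler work to fill latency slots.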
MAC/VU - Advanced Computer Architecture, Lecture 22 – Instruction Level Parallelism - Static (3)

Part b) The unrolled and scheduled code for the transformed code. The loop body takes 10 cycles.

        Integer Inst           FP Inst            Clock Cycle
foo:    L.D    F0,0(R1)                           1
        L.D    F6,-8(R1)                          2
        L.D    F4,0(R2)                           3
        L.D    F8,-8(R2)                          4
        DADDUI R1,R1,#-16      MUL.D F0,F0,F4     5
        DADDUI R2,R2,#-16      MUL.D F6,F6,F8     6
        stall                                     7
        stall                                     8
        BNEZ   R1,foo          ADD.D F2,F0,F2     9
                               ADD.D F2,F0,F2     10
        …
Bar:                           ADD.D F2,F0,F2     14

Problem #
Consider a code
For (i=2; i