Lecture: VLSI Digital Signal Processing Systems. Chapter 5 discusses unfolding. The main contents of this chapter are the algorithm for unfolding, applications of unfolding, sample period reduction, and parallel processing.
Chapter 5: Unfolding
Keshab K. Parhi

• Unfolding ≡ Parallel Processing
[Figure: a DFG with nodes A and B (1 u.t. each) in a loop containing 2D, computing A0→B0 ⇒ A2→B2 ⇒ A4→B4 ⇒ … and A1→B1 ⇒ A3→B3 ⇒ A5→B5 ⇒ …, with $T_\infty = (1+1)/2 = 1$ u.t.; and its 2-unfolded DFG, in which the loop A0→B0 (iterations 0, 2, 4, …) and the loop A1→B1 (iterations 1, 3, 5, …) each contain one delay D, so $T'_\infty = 2$ u.t. for two samples, i.e. $2/2 = 1$ u.t. per sample.]

• In a J-unfolded system each delay is J-slow: if the input to a delay element is the signal $x(kJ + m)$, the output is $x((k-1)J + m) = x(kJ + m - J)$.

• Algorithm for unfolding:
➢ For each node U in the original DFG, draw the J nodes $U_0, U_1, U_2, \ldots, U_{J-1}$.
➢ For each edge U → V with w delays in the original DFG, draw the J edges $U_i \to V_{(i+w)\,\%\,J}$ with $\lfloor (i+w)/J \rfloor$ delays, for $i = 0, 1, \ldots, J-1$.

Example: an edge U → V with 37D, unfolded by J = 4. With w = 37, $\lfloor (i+w)/4 \rfloor = 9$ for i = 0, 1, 2 and $= 10$ for i = 3, which gives the edges U0 → V1 (9D), U1 → V2 (9D), U2 → V3 (9D), U3 → V0 (10D).

➢ For w < J, unfolding an edge with w delays in the original DFG produces J - w edges with no delays and w edges with 1 delay in the J-unfolded DFG.
➢ Unfolding preserves the precedence constraints of a DSP program.

[Figure: a DFG with nodes U, V, T and edge delays D, 2D, 5D, 6D, and its 3-unfolded DFG with nodes U0, U1, U2, V0, V1, V2, T0, T1, T2.]

• Properties of unfolding:
➢ Unfolding preserves the number of delays in a DFG. This can be stated as follows: $\lfloor w/J \rfloor + \lfloor (w+1)/J \rfloor + \cdots + \lfloor (w+J-1)/J \rfloor = w$.
➢ J-unfolding of a loop l with $w_l$ delays in the original DFG leads to $\gcd(w_l, J)$ loops in the unfolded DFG, and each of these $\gcd(w_l, J)$ loops contains $w_l / \gcd(w_l, J)$ delays and $J / \gcd(w_l, J)$ copies of each node that appears in l.
➢ Unfolding a DFG with iteration bound $T_\infty$ results in a J-unfolded DFG with iteration bound $J T_\infty$.

• Applications of Unfolding
➢ Sample Period Reduction
➢ Parallel Processing

• Sample Period Reduction
➢ Case 1: A node in the DFG has computation time greater than $T_\infty$.
➢ Case 2: The iteration bound is not an integer.
➢ Case 3: The longest node computation time is larger than the iteration bound $T_\infty$, and $T_\infty$ is not an integer.

• Case 1:
➢ The original DFG cannot have a sample period equal to the iteration bound because a node computation time is larger than the iteration bound.
➢ If the computation time of a node U, $t_u$, is greater than the iteration bound $T_\infty$, then $\lceil t_u / T_\infty \rceil$-unfolding should be used.
➢ In the example, $t_u = 4$ and $T_\infty = 3$, so $\lceil 4/3 \rceil$-unfolding, i.e., 2-unfolding, is used.

• Case 2:
➢ The original DFG cannot have a sample period equal to the iteration bound because the iteration bound is not an integer.
➢ If a critical loop bound is of the form $t_l / w_l$, where $t_l$ and $w_l$ are mutually co-prime, then $w_l$-unfolding should be used.
➢ In the example, $t_l = 60$ and $w_l = 45$, so $t_l / w_l$ should be written as 4/3 and 3-unfolding should be used.

• Case 3: In this case the minimum unfolding factor that allows the iteration period to equal the iteration bound is the smallest J such that $J T_\infty$ is an integer and is greater than or equal to the longest node computation time.

• Parallel Processing:
➢ Word-level parallel processing
➢ Bit-level parallel processing
❖ Bit-serial processing
❖ Bit-parallel processing
❖ Digit-serial processing

• Bit-Level Parallel Processing
[Figure: 4-bit operands a3 a2 a1 a0 and b3 b2 b1 b0 shown in bit-serial, digit-serial (digit size = 2) and bit-parallel form; and a bit-serial adder producing s3 s2 s1 s0, with a delay D and a switch operating at 4l+0 and 4l+1,2,3.]

• The following assumptions are made when unfolding an edge U → V that contains a switch:
➢ The wordlength W is a multiple of the unfolding factor J, i.e., $W = W'J$.
➢ All edges into and out of the switch have no delays.

• With the above two assumptions, an edge U → V can be unfolded as follows:
➢ Write the switching instance as $Wl + u = J(W'l + \lfloor u/J \rfloor) + (u\,\%\,J)$.
➢ Draw an edge with no delays in the unfolded graph from the node $U_{u\,\%\,J}$ to the node $V_{u\,\%\,J}$, which is switched at time instance $W'l + \lfloor u/J \rfloor$.

• Example: an edge U → V switched at $12l + 1, 7, 9, 11$, unfolded by 3. To unfold the DFG by J = 3, the switching instances are written as follows:
$12l + 1 = 3(4l + 0) + 1$
$12l + 7 = 3(4l + 2) + 1$
$12l + 9 = 3(4l + 3) + 0$
$12l + 11 = 3(4l + 3) + 2$
[Figure: the unfolded DFG, with U1 → V1 switched at 4l + 0, 2, and U0 → V0 and U2 → V2 each switched at 4l + 3.]

• Unfolding a DFG containing an edge that has a switch and a positive number of delays is done by introducing a dummy node.
[Figure: a DFG with nodes A, B, C, a switched edge and 2D; the same DFG after inserting a dummy node D; and its unfolded version with nodes A0, A1, A2, B0, B1, B2, C0, C1, C2, D0, D1, D2.]
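The unfolding algorithm stated earlier in this chapter can be sketched in a few lines of Python. This is only an illustrative sketch: the `Edge` tuple, the `unfold` function and the convention of naming copy i of node U as `U<i>` are assumptions made for the example and do not come from the slides.

```python
from collections import namedtuple

# Hypothetical edge representation: source node, destination node, delay count w.
Edge = namedtuple("Edge", ["src", "dst", "delays"])

def unfold(edges, J):
    """Return the edges of the J-unfolded DFG.

    For each edge U -> V with w delays, emit the J edges
    U_i -> V_((i + w) % J) with floor((i + w) / J) delays, i = 0..J-1.
    """
    unfolded = []
    for e in edges:
        for i in range(J):
            unfolded.append(Edge(
                src=f"{e.src}{i}",
                dst=f"{e.dst}{(i + e.delays) % J}",
                delays=(i + e.delays) // J,  # integer (floor) division
            ))
    return unfolded

# The 37D edge from the slides, unfolded by J = 4.
example = unfold([Edge("U", "V", 37)], J=4)
for e in example:
    print(f"{e.src} -> {e.dst} : {e.delays}D")

# The two-node loop A -> B -> (2D) -> A, unfolded by J = 2.
loop = unfold([Edge("A", "B", 0), Edge("B", "A", 2)], J=2)
```

Summing the `delays` field over `example` gives 37 again, which is the delay-preservation property, and `loop` splits into gcd(2, 2) = 2 separate loops with one delay each.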
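The three sample-period-reduction cases reduce to one rule: pick the smallest J for which J times the iteration bound is an integer that is at least the longest node computation time. A minimal sketch of that search follows; the function name `min_unfolding_factor` and the node-time lists passed to it are hypothetical, and the iteration bound is assumed to be supplied as an exact, positive `Fraction`.

```python
from fractions import Fraction

def min_unfolding_factor(iteration_bound, node_times):
    """Smallest J such that J * T_inf is an integer >= the longest node time.

    iteration_bound: a positive Fraction (T_inf of the original DFG).
    node_times: computation times of the nodes (assumed integers).
    """
    t_max = max(node_times)
    J = 1
    while True:
        scaled = J * iteration_bound
        if scaled.denominator == 1 and scaled >= t_max:
            return J
        J += 1

# Case 1 from the slides: t_u = 4, T_inf = 3.
case1 = min_unfolding_factor(Fraction(3), [4])
# Case 2 from the slides: T_inf = 60/45 = 4/3 (node time 1 is a made-up value).
case2 = min_unfolding_factor(Fraction(60, 45), [1])
# Case 3 with made-up values: T_inf = 4/3 and a longest node time of 6.
case3 = min_unfolding_factor(Fraction(4, 3), [6])
print(case1, case2, case3)
```

The loop always terminates because every multiple of the denominator of the iteration bound makes the product an integer, and the product grows without bound.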
• If the word-length W is not a multiple of the unfolding factor J, then expand the switching instances with periodicity $\mathrm{lcm}(W, J)$.

• Example: Consider W = 4, J = 3. Then $\mathrm{lcm}(4, 3) = 12$. For this case,
$4l = 12l + \{0, 4, 8\}$
$4l + 1 = 12l + \{1, 5, 9\}$
$4l + 2 = 12l + \{2, 6, 10\}$
$4l + 3 = 12l + \{3, 7, 11\}$
The new switching period, 12, is a multiple of J = 3, so the edge can be unfolded as in the case $W = W'J$.
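The switch-unfolding rule, including the lcm expansion for word-lengths that are not a multiple of J, can be checked with a short sketch. The function `unfold_switch` and its return format are assumptions made for illustration; it only computes which unfolded copy carries each switching instance and when, and does not model the DFG itself.

```python
from math import gcd

def unfold_switch(W, instances, J):
    """Unfold switching instances W*l + u (u in `instances`) by a factor J.

    Returns (W_new, mapping), where mapping[c] lists the offsets v such that
    the edge U_c -> V_c is switched at time W_new*l + v in the unfolded DFG.
    """
    period = W * J // gcd(W, J)  # lcm(W, J); equals W when J divides W
    # Expand each instance to the new period: W*l + u -> period*l + (u + k*W).
    expanded = sorted(u + k * W for u in instances for k in range(period // W))
    W_new = period // J
    mapping = {}
    for u in expanded:
        # period*l + u = J*(W_new*l + u // J) + (u % J)
        mapping.setdefault(u % J, []).append(u // J)
    return W_new, mapping

# Example from the slides: 12l + {1, 7, 9, 11}, unfolded by J = 3.
slide_example = unfold_switch(12, [1, 7, 9, 11], 3)
# W = 4 is not a multiple of J = 3: the instance 4l expands to 12l + {0, 4, 8}.
lcm_example = unfold_switch(4, [0], 3)
print(slide_example)
print(lcm_example)
```

For the slide example this yields a new period of 4 with copy 1 switched at offsets 0 and 2 and copies 0 and 2 switched at offset 3, which agrees with the decomposition $12l + u = 3(4l + \lfloor u/3 \rfloor) + (u\,\%\,3)$.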