
CHAPTER 12 Applications of Discrete-Time Signals and Systems

12.2 Application to Digital Signal Processing

12.2.3 General Approach of FFT Algorithms

Although there are many algorithmic approaches to the FFT, the initial idea was to represent the finite one-dimensional signal x[n] as a two-dimensional array. This can be done by representing the length N of x[n] as the product of smaller integers, provided N is not prime. If N is prime, the DFT computation is done with the conventional formula, as the FFT does not provide any simplification.

However, in that case we could append zeros to the signal (if the signal is not periodic) to increase its length to a nonprime number. This factorization approach has historical significance, as it was the technique used by Cooley and Tukey, the authors of the FFT.
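A minimal MATLAB sketch of this padding idea (the prime length 101 and the use of nextpow2 are illustrative assumptions, not from the text):

% Pad a signal of prime length so that a composite (here power-of-two) length is used.
x = randn(1,101);             % example signal whose length, 101, is prime
N = 2^nextpow2(length(x));    % next power of two, here N = 128
X = fft(x,N);                 % fft zero-pads x to length N before transforming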

Suppose N can be factored as N = pq; then the frequency and time indices k and n in the direct DFT can be written as

k = k_1 p + k_0    for k_0 = 0,...,p−1,  k_1 = 0,...,q−1
n = n_1 q + n_0    for n_0 = 0,...,q−1,  n_1 = 0,...,p−1

The values of k range from 0 (when k_0 = k_1 = 0) to N−1 (when k_1 = q−1 and k_0 = p−1, then k = k_1 p + k_0 = (q−1)p + (p−1) = qp − 1 = N−1). Likewise for n. The direct DFT

X[k] = \sum_{n=0}^{N-1} x[n] W_N^{nk}

can be written to reflect the dependence on the new indices as

X[k_0, k_1] = \sum_{n_0=0}^{q-1} \sum_{n_1=0}^{p-1} x[n_0, n_1] W_N^{(n_1 q + n_0)(k_1 p + k_0)}        (12.9)

giving a two-dimensional array.
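As a numerical check of Eq. (12.9), the following MATLAB sketch (the values N = 12, p = 3, q = 4 and the brute-force loops are our own illustration, not from the text) evaluates the double sum and compares it with fft:

% Evaluate Eq. (12.9) by brute force for N = pq and compare with fft.
p = 3; q = 4; N = p*q;
x = randn(1,N);
WN = exp(-1j*2*pi/N);                    % W_N
X = zeros(1,N);
for k1 = 0:q-1
  for k0 = 0:p-1
    k = k1*p + k0;
    for n1 = 0:p-1
      for n0 = 0:q-1
        % x[n0,n1] is the sample x[n] with n = n1*q + n0
        X(k+1) = X(k+1) + x(n1*q + n0 + 1)*WN^((n1*q + n0)*(k1*p + k0));
      end
    end
  end
end
max(abs(X - fft(x)))                     % difference is at round-off level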

The decimation-in-time FFT presented before may be viewed in this framework by letting q = 2 and p = N/2, where as before N = 2^γ is even. We then have, using W_N^{2 n_1 (k_1 p + k_0)} = W_{N/2}^{n_1 (k_1 p + k_0)},

X[k_0, k_1] = \sum_{n_0=0}^{1} W_N^{n_0 (k_1 p + k_0)} \sum_{n_1=0}^{N/2-1} x[n_0, n_1] W_{N/2}^{n_1 (k_1 p + k_0)}

            = \sum_{n_1=0}^{N/2-1} x[0, n_1] W_{N/2}^{n_1 (k_1 p + k_0)} + W_N^{k_1 p + k_0} \sum_{n_1=0}^{N/2-1} x[1, n_1] W_{N/2}^{n_1 (k_1 p + k_0)}

            = Y[k] + W_N^{k_1 p + k_0} Z[k],    k_0 = 0,...,N/2−1,  k_1 = 0, 1        (12.10)

where when n_0 = 0 the even terms (x[0, n_1] = x[q n_1] = x[2 n_1]) of the input are being transformed, while when n_0 = 1 the odd terms (x[1, n_1] = x[q n_1 + 1] = x[2 n_1 + 1]) of the input are being transformed. Since k = k_1 p + k_0, the final equation is Y[k] + W_N^{k} Z[k], which we obtained in the decimation-in-time approach (see Eq. 12.3).

Factoring N = 2 × N/2 corresponds to one step of the decimation-in-time method. If we factor N/2 as N/2 = (2)(N/4), we obtain the second step of the decimation-in-time algorithm. If N = 2^γ, this process is repeated γ times or until the length of each transform is 2.
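A minimal MATLAB sketch of this single split, Eq. (12.10) (assuming N = 8 and using fft for the two half-length DFTs; the check itself is not part of the text):

% One decimation-in-time step: combine two N/2-point DFTs into the N-point DFT.
N = 8; x = randn(1,N);
Y = fft(x(1:2:end));                 % N/2-point DFT of the even samples x[2 n_1]
Z = fft(x(2:2:end));                 % N/2-point DFT of the odd samples x[2 n_1 + 1]
k = 0:N-1;
WN = exp(-1j*2*pi/N);
X = Y(mod(k,N/2)+1) + (WN.^k).*Z(mod(k,N/2)+1);   % Y[k] and Z[k] have period N/2
max(abs(X - fft(x)))                 % difference is at round-off level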

Remark: A dual of the decimation-in-time FFT algorithm is the decimation-in-frequency method.
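For reference, one step of that dual split can be sketched in MATLAB as follows (the relations X[2r] = DFT_{N/2}{x[n] + x[n+N/2]} and X[2r+1] = DFT_{N/2}{(x[n] − x[n+N/2]) W_N^n} are the standard decimation-in-frequency decomposition; they are not derived in this section, so this is only an assumed illustration):

% One decimation-in-frequency step for N = 8.
N = 8; x = randn(1,N); n = 0:N/2-1;
a = x(1:N/2) + x(N/2+1:N);                       % feeds the even-indexed values X[2r]
b = (x(1:N/2) - x(N/2+1:N)).*exp(-1j*2*pi*n/N);  % feeds the odd-indexed values X[2r+1]
X = zeros(1,N);
X(1:2:N) = fft(a);
X(2:2:N) = fft(b);
max(abs(X - fft(x)))                             % difference is at round-off level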

The Modern FFT

A paper by James Cooley, an IBM researcher, and Professor John Tukey from Princeton University [15] describing an algorithm for the machine calculation of complex Fourier series appeared in Mathematics of Computation in 1965. Cooley, a mathematician, and Tukey, a statistician, had in fact developed an efficient algorithm to compute the discrete Fourier transform (DFT), which would come to be called the FFT. Their result was a turning point in digital signal processing: the proposed algorithm was able to compute the DFT of a sequence of length N using N log N arithmetic operations, much smaller than the N^2 operations that had blocked the practical use of the DFT. As Cooley indicated in his paper "How the FFT Gained Acceptance" [14], his interest in the problem came from a suggestion by Tukey of letting N be a composite number, which would allow a reduction in the number of operations in the DFT computation.

The FFT algorithm was a great achievement for which the authors received deserved recognition; it also benefited the new digital signal processing area and motivated further research on the FFT. But as in many areas of research, Cooley and Tukey were not the only ones who had developed an algorithm of this class. Many other researchers before them had developed similar procedures. In particular, Danielson and Lanczos, in a paper published in the Journal of the Franklin Institute in 1942 [19], proposed an algorithm that came very close to Cooley and Tukey's results. Danielson and Lanczos showed that a DFT of length N could be represented as a sum of two N/2 DFTs, proceeding recursively under the condition that N = 2^γ. Interestingly, they mention that (remember this was in 1942!):

Adopting these improvements the approximation time for Fourier analysis are: 10 minutes for 8 coefficients, 25 minutes for 16 coefficients, 60 minutes for 32 coefficients, and 140 minutes for 64 coefficients.

Example 12.2

Consider computing the FFT of a signal of length N = 2^3 = 8 using the decimation-in-time algorithm.

Solution

Letting N = qp = 2 × 4, we then have that

X[k_0, k_1] = \sum_{n_0=0}^{1} W_8^{n_0 (k_1 p + k_0)} \sum_{n_1=0}^{3} x[n_0, n_1] W_4^{n_1 (k_1 p + k_0)}

            = \sum_{n_1=0}^{3} x[0, n_1] W_4^{n_1 (k_1 p + k_0)} + W_8^{k_1 p + k_0} \sum_{n_1=0}^{3} x[1, n_1] W_4^{n_1 (k_1 p + k_0)}

where

k = 4 k_1 + k_0    for k_0 = 0,...,3,  k_1 = 0, 1
n = 2 n_1 + n_0    for n_0 = 0, 1,  n_1 = 0,...,3

FIGURE 12.2
Lattice for n_0 = 0, 1 and n_1 = 0,...,3 (the values in parentheses are the indices of the samples). Notice the ordering in the two columns.

            n_0 = 0    n_0 = 1
n_1 = 0       (0)        (1)
n_1 = 1       (2)        (3)
n_1 = 2       (4)        (5)
n_1 = 3       (6)        (7)

Figure 12.2 displays a lattice for n_0 and n_1, and the indices of the samples are in parentheses. By replacing k = 4 k_1 + k_0, we get the first step of the decimation-in-time algorithm.
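In MATLAB, the same lattice can be generated with a single reshape (our own illustration, not part of the text), since n = 2 n_1 + n_0 fills the array column by column:

reshape(0:7, 2, 4)   % rows correspond to n_0 = 0, 1; columns to n_1 = 0,...,3
% ans =
%      0     2     4     6
%      1     3     5     7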

If we let y[n] = x[2n] and z[n] = x[2n + 1], n = 0,...,3, we can then repeat the above procedure by factoring 4 = pq = 2 × 2 and expressing Y[k] and Z[k] as we did for X[k]. Thus, we have

Y[k_0, k_1] = \sum_{n_0=0}^{1} W_4^{n_0 (k_1 p + k_0)} \sum_{n_1=0}^{1} y[n_0, n_1] W_2^{n_1 (k_1 p + k_0)}

            = \sum_{n_1=0}^{1} y[0, n_1] W_2^{n_1 (k_1 p + k_0)} + W_4^{k_1 p + k_0} \sum_{n_1=0}^{1} y[1, n_1] W_2^{n_1 (k_1 p + k_0)}

and

Z[k_0, k_1] = \sum_{n_1=0}^{1} z[0, n_1] W_2^{n_1 (k_1 p + k_0)} + W_4^{k_1 p + k_0} \sum_{n_1=0}^{1} z[1, n_1] W_2^{n_1 (k_1 p + k_0)}

where now

k = 2 k_1 + k_0    for k_0 = 0, 1,  k_1 = 0, 1
n = 2 n_1 + n_0    for n_0 = 0, 1,  n_1 = 0, 1

If we replace k = 2 k_1 + k_0, we obtain

Y[k] = I[k] + W_4^{k} G[k]
Z[k] = H[k] + W_4^{k} F[k]

where

I[k] = \sum_{n_1=0}^{1} y[0, n_1] W_2^{n_1 k} = \sum_{n_1=0}^{1} y[2 n_1] W_2^{n_1 k} = \sum_{n_1=0}^{1} x[4 n_1] W_2^{n_1 k}


G[k] = \sum_{n_1=0}^{1} y[1, n_1] W_2^{n_1 k} = \sum_{n_1=0}^{1} y[2 n_1 + 1] W_2^{n_1 k} = \sum_{n_1=0}^{1} x[4 n_1 + 2] W_2^{n_1 k}

Likewise,

H[k] = \sum_{n_1=0}^{1} z[0, n_1] W_2^{n_1 k} = \sum_{n_1=0}^{1} z[2 n_1] W_2^{n_1 k} = \sum_{n_1=0}^{1} x[4 n_1 + 1] W_2^{n_1 k}

F[k] = \sum_{n_1=0}^{1} z[1, n_1] W_2^{n_1 k} = \sum_{n_1=0}^{1} z[2 n_1 + 1] W_2^{n_1 k} = \sum_{n_1=0}^{1} x[4 n_1 + 3] W_2^{n_1 k}

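The decomposition of this example can be verified numerically. The following MATLAB sketch (the random test signal and the use of fft for the 2-point DFTs are our own assumptions) rebuilds the 8-point DFT from I[k], G[k], H[k], and F[k]:

% Rebuild the 8-point DFT of x from the four 2-point DFTs of Example 12.2.
x = randn(1,8);
I = fft(x(1:4:end));  G = fft(x(3:4:end));   % 2-point DFTs of x[4 n_1], x[4 n_1 + 2]
H = fft(x(2:4:end));  F = fft(x(4:4:end));   % 2-point DFTs of x[4 n_1 + 1], x[4 n_1 + 3]
k = 0:7; W8 = exp(-1j*2*pi/8); W4 = exp(-1j*2*pi/4);
Y = I(mod(k,2)+1) + (W4.^k).*G(mod(k,2)+1);  % Y[k] = I[k] + W_4^k G[k]
Z = H(mod(k,2)+1) + (W4.^k).*F(mod(k,2)+1);  % Z[k] = H[k] + W_4^k F[k]
X = Y + (W8.^k).*Z;                          % X[k] = Y[k] + W_8^k Z[k]
max(abs(X - fft(x)))                         % difference is at round-off level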

Example 12.3

In this example we wish to compare the efficiency of the FFT algorithm with that of our algorithm dft.m, which computes the DFT using its definition. Consider the computation of the FFT and the DFT of a signal consisting of ones of increasing lengths N = 2^r, r = 8,...,11, or 256 to 2048.

Solution

To compare the algorithms we use the following script. The MATLAB function cputime measures the time it takes for each of the algorithms to compute the DFT of the sequence of ones.

%%%%%%%%%%%%%%
% example 12.3
% fft vs dft
%%%%%%%%%%%%%%
clf; clear all
time = zeros(1,4); time1 = time;
for r = 8:11,
   N(r) = 2^r;
   i = r - 7;
   t = cputime;
   fft(ones(1,N(r)),N(r));
   time(i) = cputime - t;
   t = cputime;
   dft(ones(N(r),1),N(r));
   time1(i) = cputime - t;
end

%%%%%%%%%%%
% function dft
%%%%%%%%%%%
function X = dft(x,N)
n = 0:N - 1;
W = ones(1,N);
for k = 1:N - 1,
   W = [W; exp(-j*2*pi*n*k/N)];
end
X = W*x;

FIGURE 12.3
CPU times for the fft and the dft functions used in computing the DFT of sequences of ones of lengths N = 256 to 2048 (corresponding to n = 8,...,11). The CPU time for the FFT is multiplied by 10^4. (Plot: CPU Time (sec) versus n = log2(N); curves: fft time × 10^4 and dft time.)

The results of the comparison are shown in Figure 12.3. Notice that to make the central processing unit (CPU) time for the FFT comparable with that of the dft algorithm, it is multiplied by 10^4, illustrating how much faster the FFT is compared to the computation of the DFT from its definition.

Example 12.4

The convolution sum is computationally very expensive. Compare the CPU time used by the MATLAB function conv, which computes the convolution sum in the time domain, with the CPU time used by an implementation of the convolution sum in the frequency domain using the FFT. Recall that the frequency implementation requires computing the DFT of the signals being convolved, multiplying these DFTs, and finally computing the IDFT to obtain the convolution result.

Solution

To illustrate the efficiency in computation provided by the FFT in computing the convolution sum, we compare the CPU times used by the conv function and the implementation of the convolution sum using the FFT. As indicated before, the convolution of two signals x[n] and y[n] of lengths N and M is obtained in the frequency domain by following these three steps:

- Compute the DFTs X[k], Y[k] of x[n] and y[n], each of length M + N − 1.
- Multiply these complex DFTs to get X[k]Y[k] = U[k].
- Compute the IDFT of U[k], corresponding to the convolution x[n] ∗ y[n].

Implementing the DFT and the IDFT with the FFT algorithm, it can be shown that the computational complexity of the above three steps is much smaller than that of computing the convolution sum directly using the conv function.
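A compact function implementing the three steps might look as follows (a minimal sketch; the name fftconv, its interface, and the real-part cleanup are our own choices, not from the text):

function y = fftconv(x, h)
% FFTCONV  Linear convolution of x and h computed in the frequency domain.
L = length(x) + length(h) - 1;   % length M + N - 1 of the linear convolution
X = fft(x, L);                   % step 1: DFTs of both signals, zero-padded to length L
H = fft(h, L);
U = X.*H;                        % step 2: multiply the DFTs
y = ifft(U);                     % step 3: IDFT gives the convolution of x and h
if isreal(x) && isreal(h), y = real(y); end   % discard round-off imaginary parts
end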

To demonstrate the efficiency of the FFT implementation we consider the convolution of a signal with itself, for increasing lengths. The signal is a sequence of ones whose length increases from 1000 to 10,000 samples. The CPU times used by the function conv and by the FFT three-step procedure are measured and compared for each of the lengths. The CPU time used by conv is divided by 10 so that it can be plotted together with the CPU time of the FFT-based procedure, as in the following script. The results are shown in Figure 12.4.

%%%%%%%%%%%%%
% example 12.4
% conv vs fft
%%%%%%%%%%%%%
time1 = zeros(1,10); time2 = time1;
for i = 1:10,
   NN = 1000*i;
   x = ones(1,NN);
   M = 2*NN - 1;
   t0 = cputime;
   y = conv(x,x);                            % convolution using conv
   time1(i) = cputime - t0;
   t1 = cputime;
   X = fft(x,M); X1 = fft(x,M); Y = X.*X1; y1 = ifft(Y);   % convolution using fft
   time2(i) = cputime - t1;
   sum(y - y1)                               % check conv and fft results coincide
   pause                                     % check for small difference
end

FIGURE 12.4
CPU times for the fft and the conv functions when computing the convolution of sequences of ones of lengths N = 1000 to 10,000. The CPU time used by conv is divided by 10. (Plot: CPU Time (sec) versus length of the convolution sum; curves: fft time and conv time/10.)

Gauss and the FFT

Going back to the sources used by the FFT researchers, it was discovered that many well-known mathematicians had developed similar algorithms for different values of N. But that an algorithm similar to the modern FFT had been developed and used by Carl Gauss, the German mathematician, probably in 1805, predating even Fourier's work on harmonic analysis in 1807, was an interesting discovery, although not a surprising one [31]. Gauss has been called the "prince of mathematicians" for his prodigious work in so many areas of mathematics, and for his dedication to his work. His motto was Pauca sed matura (few, but ripe); he would not disclose any of his work until he was very satisfied with it. Moreover, as was customary in his time, his treatises were written in Latin using a difficult mathematical notation, which kept his results from being known or understood by modern researchers. Gauss's treatise describing the algorithm was not published in his lifetime, but appeared later in his collected works. He, however, deserves the paternity of the FFT algorithm.

The developments leading to the FFT, as indicated by Cooley [14], point out two important concepts in numerical analysis (the first of which applies to research in other areas): (1) the divide-and-conquer approach, that is, it pays to break a problem into smaller pieces of the same structure; and (2) the asymptotic behavior of the number of operations. Cooley's final recommendations in his paper are worth serious consideration by researchers in technical areas:

- Prompt publication of significant achievements is essential.
- Review of old literature can be rewarding.
- Communication among mathematicians, numerical analysts, and workers in a wide range of applications can be fruitful.
- Do not publish papers in neoclassic Latin.
