Feig, E. "Complexity Theory of Transforms in Signal Processing," in Digital Signal Processing Handbook, Ed. Vijay K. Madisetti and Douglas B. Williams, Boca Raton: CRC Press LLC, 1999. © 1999 by CRC Press LLC.

9 Complexity Theory of Transforms in Signal Processing

Ephraim Feig
IBM Corporation, T.J. Watson Research Center

9.1 Introduction
9.2 One-Dimensional DFTs
9.3 Multidimensional DFTs
9.4 One-Dimensional DCTs
9.5 Multidimensional DCTs
9.6 Nonstandard Models and Problems
References

9.1 Introduction

Complexity theory of computation attempts to determine how "inherently" difficult certain tasks are. For example, how inherently complex is the task of computing an inner product of two vectors of length N? Certainly one can compute the inner product $\sum_{j=1}^{N} x_j y_j$ by computing the N products $x_j y_j$ and then summing them. But can one compute this inner product with fewer than N multiplications? The answer is no, but the proof of this assertion is no trivial matter. One first abstracts and defines the notions of the algorithm and its components (such as addition and multiplication); then a theorem is proven that any algorithm for computing a bilinear form which uses K multiplications can be transformed to a quadratic algorithm (an algorithm of a very special form, which uses no divisions, and whose multiplications only compute quadratic forms) which uses at most K multiplications [20]; and finally a proof by induction on the length N of the summands in the inner product is made to obtain the lower bound result [6, 13, 22, 25]. We will not present the details here; we just want to let the reader know that the process of proving even what seems to be an intuitive result is quite involved.

Consider next the more complex task of computing the product of an N-point vector by an M × N matrix. This corresponds to the task of computing M separate inner products of N-point vectors. It is tempting to jump to the conclusion that this task requires MN multiplications. But we should not jump to conclusions too fast. First, the M inner products are separate, but not independent (the term is used loosely, and not in any linear algebra sense). After all, the second factor in the M inner products is always the same. It turns out [6, 22, 25] that, indeed, our intuition this time is correct again. And the proof is really not much more difficult than the proof of the complexity result for inner products. In fact, once the general machinery is built, the proof is a slight extension of the previous case.

So far intuition proved accurate. In complexity theory one learns early on to be skeptical of intuitions. An early surprising result in complexity theory — and to date still one of its most remarkable — contradicts the intuitive guess that computing the product of two 2 × 2 matrices requires 8 multiplications. Remarkably, Strassen [21] has shown that it can be done with 7 multiplications. His algorithm is very nonintuitive; I am not aware of any good algebraic explanation for it except for the assertion that the mathematical identities which define the algorithm are indeed valid. It can also be shown [15] that 7 is the minimum number of multiplications required for the task.

The consequences of Strassen's algorithm for general matrix multiplication tasks are profound. The task of computing the product of two 4 × 4 matrices with real entries can be viewed as a task of computing the product of two 2 × 2 matrices whose entries are themselves 2 × 2 matrices.
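Strassen's identities are short enough to state directly as code. The following sketch is a plain transcription of the standard identities, using exactly 7 multiplications, with a numerical check against the ordinary product:

```python
import numpy as np

def strassen_2x2(A, B):
    """Multiply two 2x2 matrices with 7 multiplications (Strassen [21])."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

A, B = np.random.rand(2, 2), np.random.rand(2, 2)
assert np.allclose(strassen_2x2(A, B), A @ B)
```

Note that none of the seven products relies on commutativity of the entries, which is exactly what permits the entries to be matrices themselves.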
Each of the 7 multiplications in Strassen's algorithm now becomes a 2 × 2 matrix multiplication requiring 7 real multiplications plus a number of additions; and each addition in Strassen's algorithm becomes an addition of 2 × 2 matrices, which can be done with 4 real additions. This process of obtaining algorithms for large problems, which are built up of smaller ones in a structured manner, is called the "nesting" procedure [25]. It is a very powerful tool in both complexity theory and algorithm design. It is a special form of recursion.

The set of N × N matrices forms a noncommutative algebra. A branch of complexity theory called "multiplicative complexity theory" is quite well established for a certain relatively small class of algebras, and wide open for the rest. In this theory complexity is measured by the number of "essential multiplications." Given an algebra over a field F, an algorithm is a sequence of arithmetic operations in the algebra. A multiplication is called essential if neither factor is an element of F. If one of the factors in a multiplication is an element of F, the operation is called a scaling.

Consider an algebra of dimension N over a field F, with basis $b_1, \ldots, b_N$. An algorithm for computing the product of two elements $\sum_{j=1}^{N} f_j b_j$ and $\sum_{j=1}^{N} g_j b_j$ with $f_j, g_j \in F$ is called bilinear if every multiplication in the algorithm is of the form $L_1(f_1, \ldots, f_N) * L_2(g_1, \ldots, g_N)$, where $L_1$ and $L_2$ are linear forms and $*$ is the product in the algebra, and it uses no divisions. Because none of the arithmetic operations in bilinear algorithms rely on the commutative nature of the underlying field, these algorithms can be used to build, recursively via the nesting process, algorithms for noncommutative algebras of increasingly large dimensions, which are built from the smaller algebras via the tensor product. For example, the algebra of 4 × 4 matrices (over some field F; I will stop adding this necessary assumption, as it will be obvious from context) is isomorphic to the tensor product of the algebra of 2 × 2 matrices with itself. Likewise, the algebra of 16 × 16 matrices is isomorphic to the tensor product of the algebra of 4 × 4 matrices with itself. And this proceeds to higher and higher dimensions.

Suppose we have a bilinear algorithm for computing the product in an algebra $T_1$ of dimension D, which uses M multiplications, A additions (including subtractions), and S scalings. The algebra $T_2 = T_1 \otimes T_1$ has dimension $D^2$. By the nesting procedure we can obtain an algorithm for computing the product in $T_2$ which uses M multiplications of elements in $T_1$, A additions of elements in $T_1$, and S scalings of elements in $T_1$. Each multiplication in $T_1$ requires M multiplications, A additions, and S scalings; each addition in $T_1$ requires D additions; and each scaling in $T_1$ requires D scalings. Hence, the total computational requirement for this new algorithm is $M^2$ multiplications, $A(M + D)$ additions, and $S(M + D)$ scalings. If the nesting procedure is continued to yield an algorithm for the product in the $D^4$-dimensional algebra $T_4 = T_2 \otimes T_2$, then its computational requirements would be $M^4$ multiplications, $A(M + D)(M^2 + D^2)$ additions, and $S(M + D)(M^2 + D^2)$ scalings. One more iteration would yield an algorithm for the $D^8$-dimensional algebra $T_8 = T_4 \otimes T_4$, which uses $M^8$ multiplications, $A(M + D)(M^2 + D^2)(M^4 + D^4)$ additions, and $S(M + D)(M^2 + D^2)(M^4 + D^4)$ scalings. The general pattern should be apparent by now.
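The bookkeeping above is easy to mechanize. A minimal sketch follows; the starting values D = 4, M = 7, S = 0 come from the text (the 2 × 2 matrix algebra has dimension 4 as a vector space), while the addition count A = 18 for Strassen's algorithm is an assumed input used here only for illustration — the exact count depends on the variant of the identities used:

```python
def nest_counts(D, M, A, S, steps):
    """Iterate the nesting recurrences from the text:
    M -> M^2, A -> A(M + D), S -> S(M + D), D -> D^2 per step."""
    for _ in range(steps):
        # Right-hand sides are evaluated with the old values.
        M, A, S, D = M * M, A * (M + D), S * (M + D), D * D
    return D, M, A, S

# Three nestings of Strassen's 2x2 algorithm (D=4, M=7, A=18, S=0):
print(nest_counts(4, 7, 18, 0, 3))  # counts for the 256-dimensional algebra
```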
We see that the growth of the number of operations (the high-order term, that is) is governed by M and not by A or S. A major goal of complexity theory is the understanding of computational requirements as problem sizes increase, and nesting is the natural way of building algorithms for larger and larger problems. We see one reason why counting multiplications (as opposed to all arithmetic operations) became so important in complexity theory. (Historically, in the early days multiplications were indeed much more expensive than additions.)

Algebras of polynomials are important in signal processing; filtering can be viewed as polynomial multiplication. The product of two polynomials of degrees $d_1$ and $d_2$ can be computed with $d_1 + d_2 - 1$ multiplications. Furthermore, it is rather easy to prove (a straightforward dimension argument) that this is the minimal number of multiplications necessary for this computation. Algorithms which compute these products with these numbers of multiplications (so-called optimal algorithms) are obtained using Lagrange interpolation techniques. For even moderate values of $d_j$ they use inordinately many additions and scalings. Indeed, they use $(d_1 + d_2 - 3)(d_1 + d_2 - 2)$ additions, and half as many scalings. So these algorithms are not very practical, but they are of theoretical interest. Also of interest is the asymptotic complexity of polynomial products. They can be computed by embedding them in cyclic convolutions of sizes at most twice as long. Using FFT techniques, these can be achieved with on the order of $D \log D$ arithmetic operations, where D is the maximum of the degrees. With optimal algorithms, while the number of (essential) multiplications is linear, the total number of operations is quadratic. If nesting is used, then the asymptotic behavior of the number of multiplications is also quadratic.

Convolution algebras are derived from algebras of polynomials. Given a polynomial P(u) of degree D, one can define an algebra of dimension D whose elements are all polynomials of degree less than D, with addition defined in the standard way, and multiplication taken modulo P(u). Such algebras are called convolution algebras. For polynomials $P(u) = u^D - 1$, the algebras are cyclic convolutions of dimension D. For polynomials $P(u) = u^D + 1$, these algebras are called signed-cyclic convolutions. The product of two polynomials modulo P(u) can be obtained from the product of the two polynomials without any extra essential multiplications. Hence, if the degree of P(u) is D, then the product modulo P(u) can be done with 2D − 1 multiplications. But can it be done with fewer multiplications?

Whereas complexity theory has huge gaps in almost all areas, it has triumphed in convolution algebras. The minimum number of multiplications required to compute a product in an algebra is called the multiplicative complexity of the algebra. The multiplicative complexity of convolution algebras (over infinite fields) is completely determined [22]. If P(u) factors (over the base field; the role of the field will be discussed in greater detail soon) into a product of k irreducible polynomials, then the multiplicative complexity of the algebra is 2D − k. So if P(u) is irreducible, then the answer to the question in the previous paragraph is no. Otherwise, it is yes.
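Before turning to the sharpness of these bounds, here is the FFT route to cyclic convolution mentioned above, as a runnable sketch. This is the $O(D \log D)$ total-operation method, not a minimal-multiplication algorithm:

```python
import numpy as np

def cyclic_convolution(f, g):
    """Product of two polynomials modulo u^D - 1, computed via the FFT
    in O(D log D) arithmetic operations (coefficients in ascending order)."""
    assert len(f) == len(g)
    return np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)))

# A plain polynomial product of degree-3 polynomials, embedded in a
# cyclic convolution of twice the length (D = 4 coefficients -> size 8):
f = [1, 2, 3, 4, 0, 0, 0, 0]
g = [5, 6, 7, 8, 0, 0, 0, 0]
print(np.round(cyclic_convolution(f, g), 6))
```

Zero-padding to twice the length ensures the cyclic wrap-around never touches nonzero coefficients, so the cyclic convolution reproduces the ordinary polynomial product.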
The above complexity result for convolution algebras is a sharp bound. It is a lower bound in that every algorithm for computing the product in the algebra requires at least 2D − k multiplications, where k is the number of irreducible factors of the defining polynomial P(u). It is also an upper bound, in that there are algorithms which actually achieve it. Let us factor $P(u) = \prod_j P_j(u)$ into a product of irreducible polynomials (here we see the role of the field; more about this soon). Then the convolution algebra modulo P(u) is isomorphic to a direct sum of the algebras modulo $P_j(u)$; the isomorphism is via the Chinese remainder theorem. The multiplicative complexities of the direct summands are $2d_j - 1$, where $d_j$ are the degrees of $P_j(u)$; these are sharp bounds. The algorithm for the algebra modulo P(u) is derived from these smaller algorithms; because of the isomorphism, putting them all together requires no extra multiplications. The proof that this is a lower bound, first given by Winograd [23], is quite complicated.

The above result is an example of a "direct sum theorem." If an algebra is decomposable into a direct sum of subalgebras, then clearly the multiplicative complexity of the algebra is less than or equal to the sum of the multiplicative complexities of the summands. In some (relatively rare) circumstances equality can be shown. The example of convolution algebras is such a case. The results for convolution algebras are very strong. Winograd has shown that every minimal algorithm for computing products in a convolution algebra is bilinear and is a direct sum algorithm. The latter means that the algorithm actually computes a minimal algorithm for each direct summand and then combines these results without any extra essential multiplications to yield the product in the algebra itself.

Things get interesting when we start considering algebras which are tensor products of convolution algebras (these are called multidimensional convolution algebras). A simple example is already enlightening. Consider the algebra C of polynomial multiplication modulo $u^2 + 1$ over the rationals Q; this algebra is called the Gaussian rationals. The polynomial $u^2 + 1$ is irreducible over Q (the algebra is a field), so by the previous result, its multiplicative complexity is 3. The nesting procedure would yield an algorithm for the product in C ⊗ C which uses 9 multiplications. But it can in fact be computed with 6 multiplications. The reason is due to an old theorem, probably due to Kronecker (though I cannot find the original proof); the reference I like best is Adrian Albert's book [1]. The theorem asserts that the tensor product of fields is isomorphic to a direct sum of fields, and the proof of the theorem is actually a construction of this isomorphism. For our example, the theorem yields that the tensor product C ⊗ C is isomorphic to a direct sum of two copies of C. The product in C ⊗ C can, therefore, be computed by computing separately the product in each of the two direct summands, each with 3 multiplications, and the final result can be obtained without any more essential multiplications. The explicit isomorphism was presented to the complexity theory community by Winograd [22]. Since the example is sufficiently simple to work out, and the results are so fundamental to much of our later discussion, we will present it here explicitly.
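Concretely, multiplicative complexity 3 for the Gaussian rationals means a product $(a + bu)(c + du)$ modulo $u^2 + 1$ can be done with 3 essential multiplications instead of the obvious 4. A minimal sketch, using one of several possible 3-multiplication groupings:

```python
def gaussian_product(a, b, c, d):
    """(a + b*u)(c + d*u) modulo u^2 + 1 with 3 essential multiplications."""
    t1 = a * c
    t2 = b * d
    t3 = (a + b) * (c + d)
    # Real part and coefficient of u, using t3 - t1 - t2 = a*d + b*c.
    return t1 - t2, t3 - t1 - t2

print(gaussian_product(1.0, 2.0, 3.0, 4.0))  # (1+2u)(3+4u) = -5 + 10u
```

The trade is the usual one in this theory: a multiplication is saved at the cost of extra additions.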
Consider A, the polynomial ring modulo $u^2 + 1$ over Q. This is a field of dimension 2 over Q, and it has the matrix representation (called its regular representation) given by

$$\rho(a + bu) = \begin{pmatrix} a & -b \\ b & a \end{pmatrix}. \qquad (9.1)$$

While for $b \neq 0$ the matrix above is not diagonalizable over Q, the field (algebra) is diagonalizable over the complexes. Namely,

$$\begin{pmatrix} 1 & i \\ 1 & -i \end{pmatrix} \begin{pmatrix} a & -b \\ b & a \end{pmatrix} \begin{pmatrix} 1 & i \\ 1 & -i \end{pmatrix}^{-1} = \begin{pmatrix} a + ib & 0 \\ 0 & a - ib \end{pmatrix}. \qquad (9.2)$$

The elements 1 and i of A correspond (in the regular representation) in the tensor algebra A ⊗ A to the matrices

$$\rho(1) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \qquad (9.3)$$

and

$$\rho(i) = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \qquad (9.4)$$

respectively. Hence, the 4 × 4 matrix

$$R = \begin{pmatrix} \rho(1) & \rho(i) \\ \rho(1) & \rho(-i) \end{pmatrix} \qquad (9.5)$$

diagonalizes the algebra A ⊗ A. Explicitly, we can compute

$$
\begin{pmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & -1 & 0 \end{pmatrix}
\begin{pmatrix} x_0 & -x_1 & -x_2 & x_3 \\ x_1 & x_0 & -x_3 & -x_2 \\ x_2 & -x_3 & x_0 & -x_1 \\ x_3 & x_2 & x_1 & x_0 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & -1 & 0 \end{pmatrix}^{-1}
= \begin{pmatrix} y_0 & -y_1 & 0 & 0 \\ y_1 & y_0 & 0 & 0 \\ 0 & 0 & y_2 & -y_3 \\ 0 & 0 & y_3 & y_2 \end{pmatrix}, \qquad (9.6)
$$

where $y_0 = x_0 - x_3$, $y_1 = x_1 + x_2$, $y_2 = x_0 + x_3$, and $y_3 = x_1 - x_2$. A simple way to derive this is by setting $X_0$ to be the top left 2 × 2 minor of the matrix with $x_j$ entries in the above equation, $X_1$ to be its bottom left 2 × 2 minor, and observing that

$$R \begin{pmatrix} X_0 & -X_1 \\ X_1 & X_0 \end{pmatrix} R^{-1} = \begin{pmatrix} \rho(1)X_0 + \rho(i)X_1 & 0 \\ 0 & \rho(1)X_0 - \rho(i)X_1 \end{pmatrix}. \qquad (9.7)$$

The algorithmic implications are straightforward. The product in A ⊗ A can be computed with fewer multiplications than the nesting process would yield. Straightforward extensions of the above construction yield recipes for obtaining minimal algorithms for products in algebras which are tensor products of convolution algebras. The example also highlights the role of the base field. The complexity of A as an algebra over Q is 3; the complexity of A as an algebra over the complexes is 2, as over the complexes this algebra diagonalizes.

Historically, multiplicative complexity theory generalized in two ways (and in various combinations of the two). The first addressed the question: what happens when one of the factors in the product is not an arbitrary element but a fixed element not in the base field? The second addressed: what is the complexity of semilinear systems — those in which several products are to be computed, and one factor is arbitrary but fixed, while the others are arbitrary?

Computing an arbitrary product in an n-dimensional algebra can be thought of (via the regular representation) as computing a product of a matrix A(X) times a vector Y, where the entries in the matrix A(X) are linear combinations of n indeterminates $x_1, \ldots, x_n$ and Y is a vector of n indeterminates $y_1, \ldots, y_n$. When one factor is a fixed element in an extension field, the entries in A(X) are now elements of some extension field of the base field, and they may satisfy algebraic relations. For example, consider

$$G = \begin{pmatrix} \gamma(1,8) & -\gamma(3,8) \\ \gamma(3,8) & \gamma(1,8) \end{pmatrix} \qquad (9.8)$$

where $\gamma(m, n) = \cos(\pi m/n)$, so that γ(1,8) = cos(π/8) and γ(3,8) = cos(3π/8). The numbers γ(1,8) and γ(3,8) are linearly independent over Q, but they satisfy the algebraic relation γ(1,8)/γ(3,8) = √2 + 1. This algebraic relation gives a relation of the two numbers to the rationals, namely $(\gamma(1,8) - \gamma(3,8))^2 / \gamma(3,8)^2 = 2$. Now this is not a linear relation; linear independence over Q has complexity ramifications. But this algebraic relation also has algorithmic ramifications. The linear independence implies that the multiplicative complexity of multiplying an arbitrary vector by G is 3. But because of the algebraic relation, it is not true (as is the case for quadratic extensions by indeterminates) that all minimal algorithms for this product are quadratic. A nonquadratic minimal algorithm is given via the factorization

$$G = \begin{pmatrix} \gamma(1,8) & 0 \\ 0 & \gamma(1,8) \end{pmatrix} \begin{pmatrix} 1 & 1 - \sqrt{2} \\ \sqrt{2} - 1 & 1 \end{pmatrix}. \qquad (9.9)$$
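The block diagonalization (9.5)–(9.7) is easy to verify numerically. A short sketch, building the regular representation of A ⊗ A exactly as in the text:

```python
import numpy as np

rho1 = np.eye(2)
rhoi = np.array([[0., -1.], [1., 0.]])

x0, x1, x2, x3 = np.random.rand(4)
X0 = x0 * rho1 + x1 * rhoi           # top-left 2x2 block
X1 = x2 * rho1 + x3 * rhoi           # bottom-left 2x2 block
X = np.block([[X0, -X1], [X1, X0]])  # regular representation in A ⊗ A

R = np.block([[rho1, rhoi], [rho1, -rhoi]])  # the matrix of (9.5)
Y = R @ X @ np.linalg.inv(R)

y0, y1, y2, y3 = x0 - x3, x1 + x2, x0 + x3, x1 - x2
expected = np.block([
    [y0 * rho1 + y1 * rhoi, np.zeros((2, 2))],
    [np.zeros((2, 2)),      y2 * rho1 + y3 * rhoi],
])
assert np.allclose(Y, expected)  # two 2x2 rotation blocks, as in (9.6)
```

The two diagonal blocks are the two direct-summand copies of C, each of which costs 3 essential multiplications, for 6 in total.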
As for computing the products of G with k distinct vectors, theory has it that the multiplicative complexity is 3k [5]. In other words, a direct sum theorem holds for this case. This result and its generalizations, due to Auslander and Winograd [5], are very deep; the proof is very complicated. But it yields great rewards. The multiplicative complexity of all DFTs and DCTs is established using this result. The key to obtaining multiplicative complexity results for DFTs and DCTs is to find the appropriate block diagonalizations that transform these linear operators into such direct sums, and then to invoke this fundamental theorem. We will next cite this theorem, and then describe explicitly how we apply it to DFTs and DCTs.

Fundamental Theorem (Auslander-Winograd): Let $P_j$ be polynomials of degrees $d_j$, respectively, over a field φ. Let $F_j$ denote polynomials of degree $d_j - 1$ with complex coefficients (that is, their coefficients are complex numbers). For non-negative integers $k_j$, let $T(k_j, F_j, P_j)$ denote the task of computing $k_j$ products of arbitrary polynomials by $F_j$ modulo $P_j$. Let $\bigoplus_j T(k_j, F_j, P_j)$ denote the task of simultaneously computing all of these products. If the coefficients span a vector space of dimension $\sum_j d_j$ over φ, then the multiplicative complexity of $\bigoplus_j T(k_j, F_j, P_j)$ is $\sum_j k_j (2d_j - 1)$. In other words, if the dimension assumption holds, then so does the direct sum theorem for this case.

Multiplicative complexity results for DFTs and DCTs assert that their computation is linear in the size of the input. The measure is the number of nonrational multiplications. More specifically, in all cases (arbitrary input sizes, arbitrary dimensions), the number of nonrational multiplications necessary for computing these transforms is always less than twice the size of the input. The exact numbers are interesting, but more important is the algebraic structure of the transforms which leads to these numbers. This is what will be emphasized in the remainder of this chapter. Some special cases will be discussed in greater detail; general results will be reviewed rather briefly.

The following notation will be convenient. If A, B are matrices with real entries, and R, S are invertible rational matrices such that A = RBS, then we will say that A is rationally equivalent (or more plainly, equivalent) to B and write A ≈ B. The multiplicative complexity of A is the same as that of B.

9.2 One-Dimensional DFTs

We will build up the theory for the DFT in stages. The one-dimensional DFT on input size N is a linear operator whose matrix is given by $F_N = (w^{jk})$, where $w = e^{2\pi i/N}$ and j, k index the rows and columns of the matrix, respectively. The first row and first column of $F_N$ have all entries equal to 1, so the multiplicative complexity of $F_N$ is the same as that of its "core" $C_N$, the minor comprising its last N − 1 rows and N − 1 columns.

The first results were for one-dimensional DFTs on input sizes which are prime [24]. For p a prime integer, the nonzero integers modulo p form a cyclic group under multiplication. It was shown by Rader [19] that there exist permutations of the rows and columns of the core that bring it to the cyclic convolution $(w^{g^{j+k}})$, where g is any generator of the cyclic group described above. Using the decomposition for cyclic convolutions described above, we decompose the core into a direct sum of convolutions modulo the irreducible factors of $u^{p-1} - 1$. This decomposition into cyclotomic polynomials is well known [18]. There are τ(p − 1) irreducible factors, where τ(n) is the number of positive divisors of the positive integer n. One direct summand is the 1 × 1 matrix corresponding to the factor u − 1, and its entry is −1 (in particular, rational). Also, the coefficients of the other polynomials comprising the direct summands are all linearly independent over Q; hence the fundamental theorem (in its weakest form) applies. It yields that the multiplicative complexity of $F_p$, for p a prime, is 2p − τ(p − 1) − 3.
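Rader's reindexing is easy to see in code. A small sketch for p = 7: reindexing rows by $g^a$ and columns by $g^b$ makes the core entry $w^{g^a g^b} = w^{g^{a+b}}$ depend only on $(a + b) \bmod (p-1)$, which is exactly a cyclic convolution structure:

```python
import numpy as np

def primitive_root(p):
    """Smallest generator of the multiplicative group mod p (p prime)."""
    for g in range(2, p):
        if len({pow(g, k, p) for k in range(p - 1)}) == p - 1:
            return g

p = 7
g = primitive_root(p)
w = np.exp(2j * np.pi / p)

# Core of F_p with rows indexed by g^a and columns by g^b:
core = np.array([[w ** (pow(g, a, p) * pow(g, b, p) % p)
                  for b in range(p - 1)] for a in range(p - 1)])

# Each entry depends only on (a + b) mod (p - 1):
for a in range(p - 1):
    for b in range(p - 1):
        assert np.isclose(core[a, b], w ** pow(g, (a + b) % (p - 1), p))
```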
Next is the case $N = p^k$, where p is an odd prime and the integer k is greater than 1. The group of units, comprising those integers between 0 and $p^k$ which are relatively prime to p, under multiplication modulo $p^k$, is of order $p^k - p^{k-1}$. A Rader-like permutation [24] brings the sub-core, whose rows and columns are indexed by the entries in this group of units, to a cyclic convolution. The group of units, when multiplied by p, forms an orbit of order $p^{k-1} - p^{k-2}$ (p elements in the group of units map to the same element in the orbit), and the Rader-like permutation induces a permutation on the orbit, which yields cyclic convolutions of the sizes of the orbits. This proceeds until the final orbit of size p − 1. These cyclic convolutions are decomposed via the Chinese remainder theorem, and (after much cancellation and rearrangement) it can be shown that the core $C_N$ in this case reduces to k direct summands, each of which is a semi-direct sum of $(p-1)(p^{k-j} - p^{k-j-1})$-dimensional convolutions modulo irreducible polynomials, $j = 1, 2, \ldots, k$. Also, the dimension of the coefficients of the polynomials is precisely $\sum_{j=1}^{k} (p-1)(p^{k-j} - p^{k-j-1})$. These are precisely the conditions sufficient to invoke the fundamental theorem. This algebraic decomposition yields minimal algorithms. When one adds all these up, the numerical result is that the multiplicative complexity of the DFT on $p^k$ points, where p is an odd prime and k a positive integer, is $2p^k - k - 2 - \frac{k^2 + k}{2}\,\tau(p-1)$.

The case of the one-dimensional DFT on $N = 2^n$ points is most familiar. In this case,

$$F_N = P_N \left( F_{N/2} \oplus G_{N/2} \right) R_N \qquad (9.10)$$

where $P_N$ is the permutation matrix which rearranges the output to even entries followed by odd entries, $R_N$ is a rational matrix computing the so-called "butterfly additions," and $G_{N/2} = D_{N/2} F_{N/2}$, where $D_{N/2}$ is a diagonal matrix whose entries are the so-called "twiddle factors." This leads to the classical divide-and-conquer algorithm called the FFT. For our purposes, $G_{N/2}$ is equivalent to a direct sum of two polynomial products modulo $u^{2^j} + 1$, $j = 0, \ldots, n-3$. It is routine to proceed inductively, and then show that the hypothesis of the fundamental theorem is satisfied. Without details, the final result is that the complexity of the DFT on $N = 2^n$ points is $2^{n+1} - n^2 - n - 2$. Again, the complexity is below 2N.

For the general one-dimensional DFT case, we start with the equivalence $F_{mn} \approx F_m \otimes F_n$ whenever m and n are relatively prime, where ⊗ denotes the tensor product. If m and n are of the form $p^k$ for some prime p and positive integer k, then from the above, both $F_m$ and $F_n$ are equivalent to direct sums of polynomial products modulo irreducible polynomials. Applying the theorem of Kronecker/Albert, which states that the tensor product of algebraic extension fields is isomorphic to a direct sum of fields, we conclude that $F_{mn}$ is, therefore, equivalent to a direct sum of polynomial products modulo irreducible polynomials. When one follows the construction suggested by the theorem and counts the dimensionality of the coefficients, one can show that this direct sum system satisfies the hypothesis of the fundamental theorem. This argument extends to the general one-dimensional case of $F_N$, where $N = \prod_j p_j^{k_j}$ with the $p_j$ distinct primes.
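To close this section, here is a numerical check of the splitting (9.10) for N = 8. One caveat on conventions: the sketch applies the twiddle factors to the inputs of the half-size DFT ($F_{N/2} D_{N/2}$), which is the transposed form of the text's $D_{N/2} F_{N/2}$ and has the same multiplicative cost:

```python
import numpy as np

def F(n):  # DFT matrix with the text's convention w = e^{2πi/n}
    j = np.arange(n)
    return np.exp(2j * np.pi / n) ** np.outer(j, j)

N = 8
half = N // 2
I = np.eye(half)
R_N = np.block([[I, I], [I, -I]])                       # butterfly additions
D = np.diag(np.exp(2j * np.pi * np.arange(half) / N))   # twiddle factors
G = F(half) @ D                                         # scaled half-size DFT

S = np.zeros((N, N), dtype=complex)
S[:half, :half], S[half:, half:] = F(half), G           # F_{N/2} ⊕ G_{N/2}

# Rows of F_N sorted even-then-odd equal (F_{N/2} ⊕ G_{N/2}) R_N,
# which is (9.10) with the permutation moved to the left-hand side:
even_then_odd = list(range(0, N, 2)) + list(range(1, N, 2))
assert np.allclose(F(N)[even_then_odd], S @ R_N)
```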
9.3 Multidimensional DFTs

The k-dimensional DFT on $N_1 \times \cdots \times N_k$ points is equivalent to the tensor product $F_{N_1} \otimes \cdots \otimes F_{N_k}$. Directly from the theorem of Kronecker/Albert, this is equivalent to a direct sum of polynomial products modulo irreducible polynomials. It can be shown that this system satisfies the hypothesis of the fundamental theorem, so that complexity results can be directly invoked for the general multidimensional DFT. Details can be found in [4].

More interesting than the general case are some special cases with unique properties. The k-dimensional DFT on $p \times \cdots \times p$ points, where p is an odd prime, is quite remarkable. The core of this transform is a cyclic convolution modulo $u^{p^k - 1} - 1$. The core of the matrix corresponding to $F_p \otimes \cdots \otimes F_p$, which is the entire matrix minus its first row and column, can be brought into this large cyclic convolution by a permutation derived from a generator of the group of units of the field with $p^k$ elements. The details are in [2]. Even more remarkably, in the two-dimensional case this large cyclic convolution is equivalent to a direct sum of p + 1 copies of the cyclic convolution obtainable from the core of the one-dimensional DFT on p points. In other words, the two-dimensional DFT on p × p points, where p is an odd prime, is equivalent to a direct sum of p + 1 copies of the one-dimensional DFT on p points. In particular, its multiplicative complexity is (p + 1)(2p − τ(p − 1) − 3).

Another particularly interesting case is the k-dimensional DFT on $N \times \cdots \times N$ points, where $N = 2^n$. This transform is equivalent to the k-fold tensor product $F_N \otimes \cdots \otimes F_N$, and we have seen above the recursive decomposition of $F_N$ into a direct sum of $F_{N/2}$ and $G_{N/2}$. The semi-simple Abelian construction [3, 8] yields that $F_{N/2} \otimes G_{N/2}$ is equivalent to N/2 copies of $G_{N/2}$, and likewise that $G_{N/2} \otimes G_{N/2}$ is equivalent to N/2 copies of $G_{N/2}$. Hence, $F_N \otimes F_N$ is equivalent to 3N/2 copies of $G_{N/2}$ plus $F_{N/2} \otimes F_{N/2}$. This leads recursively to a complete decomposition of the two-dimensional DFT into a direct sum of polynomial products modulo irreducible polynomials (of the form $u^{2^m} + 1$ in this case). The extensions to arbitrary dimensions are quite detailed but straightforward.
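The tensor-product identity underlying this section is worth seeing numerically: the multidimensional DFT acting on the flattened input is exactly the Kronecker product of the one-dimensional DFT matrices. A sketch using numpy's sign convention:

```python
import numpy as np

def F(n):  # DFT matrix with numpy's convention w = e^{-2πi/n}
    j = np.arange(n)
    return np.exp(-2j * np.pi / n) ** np.outer(j, j)

m, n = 3, 5
X = np.random.rand(m, n)

# The 2-D DFT equals the tensor (Kronecker) product acting on the
# row-major flattening of the input:
lhs = np.kron(F(m), F(n)) @ X.reshape(-1)
rhs = np.fft.fft2(X).reshape(-1)
assert np.allclose(lhs, rhs)
```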
9.4 One-Dimensional DCTs

As in the case of DFTs, DCTs are also all equivalent to direct sums of polynomial multiplications modulo irreducible polynomials, and they satisfy the hypothesis of the fundamental theorem. In fact, some instances are easier to handle. A fast way to see the structure of the DCT is by relating it to the DFT. Let $C_N$ denote the one-dimensional DCT on N points; recall we defined $F_N$ to be the one-dimensional DFT on N points. It can be shown [14] that $F_{4N}$ is equivalent to a direct sum of two copies of $C_N$ plus one copy of $F_{2N}$. This is sufficient to yield complexity results for all one-dimensional DCTs. But for some special cases, direct derivations are more revealing. For example, when $N = 2^k$, $C_N$ is equivalent to a direct sum of polynomial products modulo $u^{2^j} + 1$, for $j = 1, \ldots, k-1$. This is a much simpler form than the corresponding one for the DFT on $2^k$ points. It is then straightforward to check that this direct sum system satisfies the hypothesis of the fundamental theorem, and then that the multiplicative complexity of $C_{2^k}$ is $2^{k+1} - k - 2$.

Another (not so) special case is when N is an odd integer. Then $C_N$ is equivalent to $F_N$, from which complexity results follow directly. Another useful result is that, as in the case of the DFT, $C_{pq}$ is equivalent to $C_p \otimes C_q$ when p and q are relatively prime [26]. We can then use the theorem of Kronecker/Albert [10] to build direct sum structures for DCTs of composite sizes given direct sums for the various components.

9.5 Multidimensional DCTs

Here too, once the one-dimensional DCT structures are known, their extensions to multiple dimensions via tensor products, utilizing the theorem of Kronecker/Albert, are straightforward. This leads to the appropriate direct sum structures; proving that the coefficients satisfy the hypothesis of the fundamental theorem does require some careful applications of elementary number theory. This is done in [10]. A most interesting special case is the multidimensional DCT on input sizes which are powers of 2 in each dimension. If the input is k-dimensional with size $2^{j_1} \times \cdots \times 2^{j_k}$, and $j_1 \le j_i$ for $i = 2, \ldots, k$, then the multidimensional DCT is equivalent to $2^{j_2} \cdots 2^{j_k}$ copies of the one-dimensional DCT on $2^{j_1}$ points [11]. This is a much more straightforward result than the corresponding one for multidimensional DFTs.

9.6 Nonstandard Models and Problems

DCTs have become popular because of their role in compression. In such roles, the DCT is usually followed by quantization. Therefore, in such applications one need not actually compute the DCT but rather a scaled version of it, and then absorb the scaling into the quantization step. For the one-dimensional case this means that one can replace the computation of a product by C with a product by a matrix DC, where D is diagonal. It turns out [9, 16] that for propitious choices of D, the computation of the product by DC is easier than that by C. The question naturally arises — what is the minimum number of steps required to compute a product of the form DC, where D can be any diagonal matrix? Our ability to answer such a question is very limited. All we can say today is that if we can compute a scaled DCT on N points with m multiplications, then certainly we can compute a DCT on N points with m + N multiplications. Since we know the complexity of DCTs, this gives a lower bound on the complexity of scaled DCTs. For example, the one-dimensional DCT on 8 points (the most popular applied case) requires 12 multiplications. (The reader may see the number 11 in the literature; this is for the case of the "unnormalized DCT," in which the DC component is scaled. The unnormalized DCT is not orthogonal.) Suppose a scaled DCT on 8 points can be done with m multiplications. Then 8 + m ≥ 12, or m ≥ 4. An algorithm for the scaled DCT on 8 points which uses 5 multiplications is known [9, 16]. It is an open question whether one can actually do it with 4 multiplications or not. Similarly, the two-dimensional DCT on 8 × 8 points can be done with 54 multiplications [9, 12], and theory says that at least 24 are needed [11]. The gap is very wide, and I know of no stronger results as of this writing.

Machines whose primitive operations are fused multiply-accumulates are becoming very popular, especially in the higher-end workstation arena. Here a single cycle can yield a result of the form ab + c for arbitrary floating point numbers a, b, c; we call such an operation a "multiply/add." The number of multiply/adds required for a computation is obviously bounded below by lower bounds on the number of multiplications alone and on the number of additions alone. The latter is a wide open subject.
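Returning for a moment to scaled DCTs: the absorption of the diagonal into quantization is one line of algebra, since $(DCx)_i / (D q)_i = (Cx)_i / q_i$ for any invertible diagonal D. A numerical sketch, using a directly constructed orthonormal DCT-II matrix:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix on n points."""
    k, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    C = np.cos(np.pi * k * (2 * j + 1) / (2 * n)) * np.sqrt(2.0 / n)
    C[0, :] /= np.sqrt(2.0)
    return C

n = 8
C = dct_matrix(n)
d = np.random.rand(n) + 0.5   # an arbitrary invertible diagonal scaling
q = np.random.rand(n) + 0.5   # a quantization table
x = np.random.rand(n)

# Quantizing the true DCT equals quantizing the scaled DCT with a
# rescaled table, so the cheaper scaled transform suffices:
lhs = np.round((C @ x) / q)
rhs = np.round((np.diag(d) @ C @ x) / (d * q))
assert np.allclose(lhs, rhs)
```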
A simple yet instructive example involves multiplication by a 4 × 4 Hadamard matrix. It is well known that, in general, multiplication by an N × N Hadamard matrix, where N is a power of 2, can be done with $N \log_2 N$ additions. Recently it was shown [7] that the 4 × 4 case can be done with 7 multiply/add operations. This result has not been extended, and it may in fact be rather hard to extend except in the most trivial (and uninteresting) ways. Upper bounds for DFTs have also been obtained. It was shown in [17] that a complex DFT on $N = 2^k$ points can be done with $\frac{8}{3}Nk - \frac{16}{9}N + 2 - \frac{2}{9}(-1)^k$ real multiply/adds. For real input, an upper bound of $\frac{4}{3}Nk - \frac{17}{9}N + 3 - \frac{2}{9}(-1)^k$ real multiply/adds was given. These were later improved slightly using the results of the Hadamard transform computation. Similar multidimensional results were also obtained.

In the past several years new, more powerful processors have been introduced. Sun and HP have incorporated new vector instructions. Intel has introduced its aggressive MMX architecture. And new MSPs (multimedia signal processors) from Philips, Samsung, and Chromatic are pushing similar designs even more aggressively. These will lead to new models of computation. Astounding (though probably not surprising) upper bounds will be announced; lower bounds are sure to continue to baffle.
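For concreteness, here is the classical $N \log_2 N$-addition Hadamard computation mentioned above (the fast Walsh-Hadamard transform in Sylvester ordering); this is the baseline that the 7-operation result of [7] improves upon for the 4 × 4 case:

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform: log2(N) stages of N additions each."""
    x = np.asarray(x, dtype=float).copy()
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            for j in range(i, i + h):
                x[j], x[j + h] = x[j] + x[j + h], x[j] - x[j + h]
        h *= 2
    return x

# Check against the explicit 4x4 Hadamard matrix (8 additions total):
H4 = np.array([[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]])
v = np.random.rand(4)
assert np.allclose(fwht(v), H4 @ v)
```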
References

[1] Albert, A.A., Structure of Algebras, AMS Colloquium Publications, Vol. 24, 1939.
[2] Auslander, L., Feig, E., and Winograd, S., New algorithms for the multidimensional discrete Fourier transform, IEEE Trans. Acoust. Speech Signal Process., ASSP-31(2): 388–403, Apr. 1983.
[3] Auslander, L., Feig, E., and Winograd, S., Abelian semi-simple algebras and algorithms for the discrete Fourier transform, Adv. Appl. Math., 5: 31–55, Mar. 1984.
[4] Auslander, L., Feig, E., and Winograd, S., The multiplicative complexity of the discrete Fourier transform, Adv. Appl. Math., 5: 87–109, Mar. 1984.
[5] Auslander, L. and Winograd, S., The multiplicative complexity of certain semilinear systems defined by polynomials, Adv. Appl. Math., 1(3): 257–299, 1980.
[6] Brockett, R.W. and Dobkin, D., On the optimal evaluation of a set of bilinear forms, Linear Algebra Appl., 19(3): 207–235, 1978.
[7] Coppersmith, D., Feig, E., and Linzer, E., Hadamard transforms on multiply/add architectures, IEEE Trans. Signal Processing, 42(4): 969–970, Apr. 1994.
[8] Feig, E., New algorithms for the 2-dimensional discrete Fourier transform, IBM RC 8897 (No. 39031), June 1981.
[9] Feig, E., A fast scaled-DCT algorithm, Proc. SPIE Image Processing Algorithms and Techniques, 1244: 2–13, 1990.
[10] Feig, E. and Linzer, E., The multiplicative complexity of discrete cosine transforms, Adv. Appl. Math., 13: 494–503, 1992.
[11] Feig, E. and Winograd, S., On the multiplicative complexity of discrete cosine transforms, IEEE Trans. Inf. Theory, 38(4): 1387–1391, July 1992.
[12] Feig, E. and Winograd, S., Fast algorithms for the discrete cosine transform, IEEE Trans. Signal Processing, 40(9), Sept. 1992.
[13] Fiduccia, C.M. and Zalcstein, Y., Algebras having linear multiplicative complexities, J. ACM, 24(2): 311–331, 1977.
[14] Heideman, M.T., Multiplicative Complexity, Convolution, and the DFT, Springer-Verlag, New York, 1988.
[15] Hopcroft, J. and Kerr, L., On minimizing the number of multiplications necessary for matrix multiplication, SIAM J. Appl. Math., 20: 30–36, 1971.
[16] Arai, Y., Agui, T., and Nakajima, M., A fast DCT-SQ scheme for images, Trans. IEICE, E-71(11): 1095–1097, Nov. 1988.
[17] Linzer, E. and Feig, E., Modified FFTs for fused multiply-add architectures, Math. Comput., 60(201): 347–361, Jan. 1993.
[18] Niven, I. and Zuckerman, H.S., An Introduction to the Theory of Numbers, John Wiley & Sons, New York, 1980.
[19] Rader, C.M., Discrete Fourier transforms when the number of data samples is prime, Proc. IEEE, 56(6): 1107–1108, June 1968.
[20] Strassen, V., Vermeidung von Divisionen, J. Reine Angew. Math., 264: 184–202, 1973.
[21] Strassen, V., Gaussian elimination is not optimal, Numer. Math., 13: 354–356, 1969.
[22] Winograd, S., On the number of multiplications necessary to compute certain functions, Commun. Pure Appl. Math., 23: 165–179, 1970.
[23] Winograd, S., Some bilinear forms whose multiplicative complexity depends on the field of constants, Math. Syst. Theory, 10(2): 169–180, 1977.
[24] Winograd, S., On the multiplicative complexity of the discrete Fourier transform, Adv. Math., 32(2): 83–117, May 1979.
[25] Winograd, S., Arithmetic Complexity of Computations, CBMS-NSF Regional Conference Series in Applied Mathematics, SIAM, 1980.
[26] Yang, P.P.N. and Narasimha, M.J., Prime factor decomposition of the discrete cosine transform and its hardware realization, Proc. ICASSP, 1985.