
10 Fast Matrix Computations


DOCUMENT INFORMATION

Basic information

Format
Number of pages: 10
File size: 177.96 KB

Contents

Yagle, A.E., "Fast Matrix Computations," in Digital Signal Processing Handbook, Ed. Vijay K. Madisetti and Douglas B. Williams, Boca Raton: CRC Press LLC, 1999.

10 Fast Matrix Computations

Andrew E. Yagle, University of Michigan

10.1 Introduction
10.2 Divide-and-Conquer Fast Matrix Multiplication
    Strassen Algorithm • Divide-and-Conquer • Arbitrary Precision Approximation (APA) Algorithms • Number Theoretic Transform (NTT) Based Algorithms
10.3 Wavelet-Based Matrix Sparsification
    Overview • The Wavelet Transform • Wavelet Representations of Integral Operators • Heuristic Interpretation of Wavelet Sparsification
References

10.1 Introduction

This chapter presents two major approaches to fast matrix multiplication. We restrict our attention to matrix multiplication, excluding matrix addition and matrix inversion, since matrix addition admits no fast algorithm structure (save for the obvious parallelization), and matrix inversion (i.e., solution of large linear systems of equations) is generally performed by iterative algorithms that require repeated matrix-matrix or matrix-vector multiplications. Hence, matrix multiplication is the real problem of interest.

We present two major approaches to fast matrix multiplication. The first is the divide-and-conquer strategy made possible by Strassen's [1] remarkable reformulation of non-commutative 2 × 2 matrix multiplication. We also present the APA (arbitrary precision approximation) algorithms, which improve on Strassen's result at the price of approximation, and a recent result that reformulates matrix multiplication as convolution and applies number theoretic transforms.

The second approach is to use a wavelet basis to sparsify the representation of Calderon-Zygmund operators as matrices. Since electromagnetic Green's functions are Calderon-Zygmund operators, this has proven to be useful in solving integral equations in electromagnetics. The sparsified matrix representation is used in an iterative algorithm to solve the linear system of equations associated with the integral equation, greatly reducing the computation. We also present some new insights that make the wavelet-induced sparsification seem less mysterious.

10.2 Divide-and-Conquer Fast Matrix Multiplication

10.2.1 Strassen Algorithm

It is not obvious that there should be any way to perform matrix multiplication other than using the definition of matrix multiplication, for which multiplying two N × N matrices requires N^3 multiplications and additions (N for each of the N^2 elements of the resulting matrix). However, in 1969 Strassen [1] made the remarkable observation that the product of two 2 × 2 matrices

    \begin{pmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \end{pmatrix}
    \begin{pmatrix} b_{1,1} & b_{1,2} \\ b_{2,1} & b_{2,2} \end{pmatrix}
    =
    \begin{pmatrix} c_{1,1} & c_{1,2} \\ c_{2,1} & c_{2,2} \end{pmatrix}        (10.1)

may be computed using only seven multiplications (fewer than the obvious eight), as

    m_1 = (a_{1,2} - a_{2,2})(b_{2,1} + b_{2,2}); \quad m_2 = (a_{1,1} + a_{2,2})(b_{1,1} + b_{2,2})
    m_3 = (a_{1,1} - a_{2,1})(b_{1,1} + b_{1,2}); \quad m_4 = (a_{1,1} + a_{1,2})\, b_{2,2}
    m_5 = a_{1,1}(b_{1,2} - b_{2,2}); \quad m_6 = a_{2,2}(b_{2,1} - b_{1,1}); \quad m_7 = (a_{2,1} + a_{2,2})\, b_{1,1}

    c_{1,1} = m_1 + m_2 - m_4 + m_6; \quad c_{1,2} = m_4 + m_5
    c_{2,1} = m_6 + m_7; \quad c_{2,2} = m_2 - m_3 + m_5 - m_7        (10.2)

A vital feature of (10.2) is that it is non-commutative, i.e., it does not depend on the commutative property of multiplication. This can be seen easily by noting that each of the m_i is the product of a linear combination of the elements of A by a linear combination of the elements of B, in that order, so that it is never necessary to use, say, a_{2,2} b_{2,1} = b_{2,1} a_{2,2}. We note there exist commutative algorithms for 2 × 2 matrix multiplication that require even fewer operations, but they are of little practical use.

The significance of noncommutativity is that the noncommutative algorithm (10.2) may be applied as is to block matrices. That is, if the a_{i,j}, b_{i,j}, and c_{i,j} in (10.1) and (10.2) are replaced by block matrices, (10.2) is still true. Since matrix multiplication can be subdivided into block submatrix operations (i.e., (10.1) is still true if a_{i,j}, b_{i,j}, and c_{i,j} are replaced by block matrices), this immediately leads to a divide-and-conquer fast algorithm.
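The seven products in (10.2) translate directly into code. The following is a minimal sketch (not from the handbook itself) of the 2 × 2 case in Python/NumPy. Because no product ever swaps the order of a factor taken from A with a factor taken from B, the same formulas apply verbatim when the scalar entries are replaced by square blocks and the products by matrix products.

```python
import numpy as np

def strassen_2x2(A, B):
    """Multiply two 2x2 matrices with the seven products of Eq. (10.2)."""
    a11, a12, a21, a22 = A[0, 0], A[0, 1], A[1, 0], A[1, 1]
    b11, b12, b21, b22 = B[0, 0], B[0, 1], B[1, 0], B[1, 1]

    m1 = (a12 - a22) * (b21 + b22)
    m2 = (a11 + a22) * (b11 + b22)
    m3 = (a11 - a21) * (b11 + b12)
    m4 = (a11 + a12) * b22
    m5 = a11 * (b12 - b22)
    m6 = a22 * (b21 - b11)
    m7 = (a21 + a22) * b11

    return np.array([[m1 + m2 - m4 + m6, m4 + m5],
                     [m6 + m7,           m2 - m3 + m5 - m7]])

A = np.array([[2., 4.], [3., 5.]])
B = np.array([[9., 8.], [7., 6.]])
assert np.allclose(strassen_2x2(A, B), A @ B)   # agrees with the definition
```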
10.2.2 Divide-and-Conquer

To see this, consider the 2^n × 2^n matrix multiplication AB = C, where A, B, C are all 2^n × 2^n matrices. Using the usual definition, this requires (2^n)^3 = 8^n multiplications and additions. But if A, B, C are subdivided into 2^{n-1} × 2^{n-1} blocks a_{i,j}, b_{i,j}, c_{i,j}, then AB = C becomes (10.1), which can be implemented with (10.2), since (10.2) does not require the products of subblocks of A and B to commute. Thus the 2^n × 2^n matrix multiplication AB = C can actually be implemented using only seven matrix multiplications of 2^{n-1} × 2^{n-1} subblocks of A and B. And these subblock multiplications can in turn be broken down by using (10.2) to implement them as well. The end result is that the 2^n × 2^n matrix multiplication AB = C can be implemented using only 7^n multiplications, instead of 8^n.

The computational savings grow as the matrix size increases. For n = 5 (32 × 32 matrices) the savings is about 50%. For n = 12 (4096 × 4096 matrices) the savings is about 80%. The savings as a fraction can be made arbitrarily close to unity by taking sufficiently large matrices. Another way of looking at this is to note that N × N matrix multiplication requires O(N^{log_2 7}) = O(N^{2.807}) < N^3 multiplications using Strassen.

Of course, we are not limited to subdividing into 2 × 2 = 4 subblocks. Fast non-commutative algorithms for 3 × 3 matrix multiplication requiring only 23 < 3^3 = 27 multiplications were found by exhaustive search in [2] and [3]; 23 is now known to be optimal. Repeatedly subdividing AB = C into 3 × 3 = 9 subblocks computes a 3^n × 3^n matrix multiplication in 23^n < 27^n multiplications; N × N matrix multiplication then requires O(N^{log_3 23}) = O(N^{2.854}) multiplications, so this is not quite as good as using (10.2). A fast noncommutative algorithm for 5 × 5 matrix multiplication requiring only 102 < 5^3 = 125 multiplications was found in [4]; this also seems to be optimal. Using this algorithm, N × N matrix multiplication requires O(N^{log_5 102}) = O(N^{2.874}) multiplications, so this is even worse. Of course, the idea is to write N = 2^a 3^b 5^c for some a, b, c and subdivide into 2 × 2 = 4 subblocks a times, then subdivide into 3 × 3 = 9 subblocks b times, etc. The total number of multiplications is then 7^a 23^b 102^c < 8^a 27^b 125^c = N^3.
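A recursive sketch of this divide-and-conquer, again purely illustrative: it applies (10.2) to half-size blocks all the way down to 1 × 1 blocks and tallies the scalar multiplications, which come out to 7^n rather than 8^n. (In practice one would stop the recursion at a moderate block size and use the definition below that, since the extra additions dominate for small blocks.)

```python
import numpy as np

def strassen(A, B, counter):
    """Recursive Strassen multiplication of 2^n x 2^n matrices.

    `counter` is a one-element list used to tally scalar multiplications.
    """
    n = A.shape[0]
    if n == 1:
        counter[0] += 1
        return A * B
    h = n // 2
    a11, a12, a21, a22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    b11, b12, b21, b22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    m1 = strassen(a12 - a22, b21 + b22, counter)
    m2 = strassen(a11 + a22, b11 + b22, counter)
    m3 = strassen(a11 - a21, b11 + b12, counter)
    m4 = strassen(a11 + a12, b22, counter)
    m5 = strassen(a11, b12 - b22, counter)
    m6 = strassen(a22, b21 - b11, counter)
    m7 = strassen(a21 + a22, b11, counter)
    C = np.empty_like(A)
    C[:h, :h] = m1 + m2 - m4 + m6
    C[:h, h:] = m4 + m5
    C[h:, :h] = m6 + m7
    C[h:, h:] = m2 - m3 + m5 - m7
    return C

n = 5                                   # 32 x 32 matrices
A = np.random.rand(2**n, 2**n)
B = np.random.rand(2**n, 2**n)
count = [0]
C = strassen(A, B, count)
assert np.allclose(C, A @ B)
print(count[0], 7**n, 8**n)             # 16807 scalar multiplications = 7^5 < 8^5 = 32768
```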
Note that we have not mentioned additions. Readers familiar with nesting fast convolution algorithms will know why; we now review why reducing multiplications is much more important than reducing additions when nesting algorithms. The reason is that at each nesting stage (reversing the divide-and-conquer to build up algorithms for multiplying large matrices from (10.2)), each scalar addition is replaced by a matrix addition (which requires N^2 additions for N × N matrices), and each scalar multiplication is replaced by a matrix multiplication (which requires N^3 multiplications and additions for N × N matrices). Although we are reducing N^3 to about N^{2.8}, it is clear that as we nest, each multiplication produces more multiplications and additions than each addition does. So reducing the number of multiplications from eight to seven in (10.2) is well worth the extra additions incurred. In fact, the number of additions is also O(N^{2.807}).

The design of these base algorithms has been based on the theory of bilinear and trilinear forms. The review paper [5] and book [6] of Pan are good introductions to this theory. We note that reducing the exponent of N in N × N matrix multiplication is an area of active research. This exponent has been reduced to below 2.5; a known lower bound is two. However, the resulting algorithms are too complicated to be useful.

10.2.3 Arbitrary Precision Approximation (APA) Algorithms

APA algorithms are noncommutative algorithms for 2 × 2 and 3 × 3 matrix multiplication that require even fewer multiplications than the Strassen-type algorithms, but at the price of requiring longer word lengths. Proposed by Bini [7], the APA algorithm for multiplying two 2 × 2 matrices is this:

    p_1 = (a_{2,1} + \epsilon a_{1,2})(b_{2,1} + \epsilon b_{1,2}); \quad p_2 = (-a_{2,1} + \epsilon a_{1,1})(b_{1,1} + \epsilon b_{1,2})
    p_3 = (a_{2,2} - \epsilon a_{1,2})(b_{2,1} + \epsilon b_{2,2}); \quad p_4 = a_{2,1}(b_{1,1} - b_{2,1}); \quad p_5 = (a_{2,1} + a_{2,2})\, b_{2,1}

    c_{1,1} = (p_1 + p_2 + p_4)/\epsilon - \epsilon (a_{1,1} + a_{1,2})\, b_{1,2}; \quad c_{2,1} = p_4 + p_5
    c_{2,2} = (p_1 + p_3 - p_5)/\epsilon - \epsilon a_{1,2}(b_{1,2} - b_{2,2})        (10.3)

If we now let ε → 0, the second terms in (10.3) become negligible next to the first terms, and so they need not be computed. Hence, three of the four elements of C = AB may be computed using only five multiplications. c_{1,2} may be computed using a sixth multiplication, so that, in fact, two 2 × 2 matrices may be multiplied to arbitrary accuracy using only six multiplications. The APA 3 × 3 matrix multiplication algorithm requires 21 multiplications. Note that the APA algorithms improve on the exact Strassen-type algorithms (6 < 7, 21 < 23).

The APA algorithms are often described as being numerically unstable, due to roundoff error as ε → 0. We believe that an electrical engineering perspective on these algorithms puts them in a light different from that of the mathematical perspective. In a fixed-point implementation, the computation AB = C can be scaled to operations on integers, and the p_i can be bounded. Then it is easy to set ε to a sufficiently small (negative) power of two to ensure that the second terms in (10.3) do not overlap the first terms, provided that the wordlength is long enough. Thus, the reputation for instability is undeserved. However, the requirement of large wordlengths to be multiplied seems also to have escaped notice; this may be a more serious problem in some architectures.

The divide-and-conquer and resulting nesting of APA algorithms work the same way as for the Strassen-type algorithms. N × N matrix multiplication using (10.3) requires O(N^{log_2 6}) = O(N^{2.585}) multiplications, which improves on the O(N^{2.807}) multiplications using (10.2). But the wordlengths are longer.
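The behavior of (10.3) as ε → 0 is easy to check numerically. The sketch below is only an illustration in floating point, not the fixed-point integer implementation argued for above; the commented error terms are the exact corrections appearing in (10.3).

```python
def apa_2x2(A, B, eps):
    """Approximate c11, c21, c22 of C = A B with the five products of Eq. (10.3).

    The approximation error is O(eps); c21 is exact, and c12 would take a sixth product.
    """
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    p1 = (a21 + eps * a12) * (b21 + eps * b12)
    p2 = (-a21 + eps * a11) * (b11 + eps * b12)
    p3 = (a22 - eps * a12) * (b21 + eps * b22)
    p4 = a21 * (b11 - b21)
    p5 = (a21 + a22) * b21
    c11 = (p1 + p2 + p4) / eps      # equals c11 + eps*(a11 + a12)*b12
    c21 = p4 + p5                   # exact
    c22 = (p1 + p3 - p5) / eps      # equals c22 + eps*a12*(b12 - b22)
    return c11, c21, c22

A = [[2.0, 4.0], [3.0, 5.0]]
B = [[9.0, 8.0], [7.0, 6.0]]
for eps in (1e-2, 1e-4, 1e-6):
    print(eps, apa_2x2(A, B, eps))   # approaches (46, 62, 54) as eps shrinks
```

Pushing ε much smaller in floating point eventually trades the O(ε) approximation error for roundoff error, which is the instability discussed above; the fixed-point view avoids this by keeping the two terms in disjoint bit fields.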
A design methodology for fast matrix multiplication algorithms by grouping terms has been proposed in a series of papers by Pan (see references [5] and [6]). While this has proven quite fruitful, the methodology of grouping terms is somewhat ad hoc.

10.2.4 Number Theoretic Transform (NTT) Based Algorithms

An approach similar in flavor to the APA algorithms, but more flexible, has been taken recently in [8]. First, matrix multiplication is reformulated as a linear convolution, which can be implemented as the multiplication of two polynomials using the z-transform. Second, the variable z is scaled, producing a scaled convolution, which is then made cyclic. This aliases some quantities, but they are separated by a power of the scaling factor. Third, the scaled convolution is computed using pseudo-number-theoretic transforms. Finally, the various components of the product matrix are read off of the convolution, using the fact that the elements of the product matrix are bounded. This can be done without error if the scaling factor is sufficiently large.

This approach yields algorithms that require the same number of multiplications as APA, or fewer, for 2 × 2 and 3 × 3 matrices. The multiplicands are again sums of scaled matrix elements, as in APA. However, the design methodology is quite simple and straightforward, and the reason why the fast algorithm exists is now clear, unlike for the APA algorithms. Also, the integer computations inherent in this formulation make possible the engineering insights into APA noted above.

We reformulate the product of two N × N matrices as the linear convolution of a sequence of length N^2 and a sparse sequence of length N^3 − N + 1. This results in a sequence of length N^3 + N^2 − N, from which the elements of the product matrix may be obtained. For convenience, we write the linear convolution as the product of two polynomials. This result (of [8]) seems to be new, although a similar result is briefly noted in ([3], p. 197). Define, for 0 ≤ i, j ≤ N − 1,

    a_{i,j} = a_{i+jN}, \qquad b_{i,j} = b_{N-1-i+jN},

so that

    \left( \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} a_{i+jN}\, x^{i+jN} \right)
    \left( \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} b_{N-1-i+jN}\, x^{N(N-1-i+jN)} \right)
    = \sum_{i=0}^{N^3+N^2-N-1} c_i\, x^i,
    \qquad c_{i,j} = c_{N^2-N+i+jN^2}, \quad 0 \le i, j \le N-1        (10.4)

Note that the coefficients of all three polynomials are read off of the matrices A, B, C column by column (each column of B is reversed), and the result is noncommutative. For example, the 2 × 2 matrix multiplication (10.1) becomes

    \left( a_{1,1} + a_{2,1} x + a_{1,2} x^2 + a_{2,2} x^3 \right)
    \left( b_{2,1} + b_{1,1} x^2 + b_{2,2} x^4 + b_{1,2} x^6 \right)
    = \ast + \ast x + c_{1,1} x^2 + c_{2,1} x^3 + \ast x^4 + \ast x^5 + c_{1,2} x^6 + c_{2,2} x^7 + \ast x^8 + \ast x^9        (10.5)

where ∗ denotes an irrelevant quantity. In (10.5), substitute x = sz and take the result mod (z^6 − 1). This gives

    \left( a_{1,1} + a_{2,1} s z + a_{1,2} s^2 z^2 + a_{2,2} s^3 z^3 \right)
    \left( (b_{2,1} + b_{1,2} s^6) + b_{1,1} s^2 z^2 + b_{2,2} s^4 z^4 \right)
    = (\ast + c_{1,2} s^6) + (\ast s + c_{2,2} s^7) z + (c_{1,1} s^2 + \ast s^8) z^2 + (c_{2,1} s^3 + \ast s^9) z^3 + \ast z^4 + \ast z^5
    \pmod{z^6 - 1}        (10.6)

If |c_{i,j}|, |∗| < s^6, then the ∗ and the c_{i,j} may be separated without error, since both are known to be integers. If s is a power of two, c_{1,2} may be obtained by discarding the 6 log_2 s least significant bits in the binary representation of ∗ + c_{1,2} s^6. The polynomial multiplication mod (z^6 − 1) can be computed using number-theoretic transforms [9] using six multiplications. Hence, 2 × 2 matrix multiplication requires six multiplications. Similarly, 3 × 3 matrices may be multiplied using 21 multiplications.

Note that these are the same numbers required by the APA algorithms, the quantities multiplied are again sums of scaled matrix elements, and the results are again sums in which a quantity of interest is partitioned from another quantity that is of no interest. However, this approach is more flexible than the APA approach (see [8]). As an extreme case, setting z = 1 (i.e., evaluating (10.5) at x = s) computes a 2 × 2 matrix multiplication using ONE (very long wordlength) multiplication! For example, using s = 100,

    \begin{pmatrix} 2 & 4 \\ 3 & 5 \end{pmatrix}
    \begin{pmatrix} 9 & 8 \\ 7 & 6 \end{pmatrix}
    =
    \begin{pmatrix} 46 & 40 \\ 62 & 54 \end{pmatrix}        (10.7)

becomes the single scalar multiplication

    (5,040,302)(8,000,600,090,007) = 40,325,440,634,862,462,114.        (10.8)

This is useful in optical computing architectures for multiplying large numbers.
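The z = 1 special case can be reproduced in a few lines: with s = 100, every coefficient of the product polynomial in (10.5) occupies its own pair of decimal digits (all coefficients of this small example are below 100), so the entire 2 × 2 product is read out of one large-integer multiplication. This minimal sketch mimics (10.7) and (10.8); it is not the pseudo-number-theoretic-transform algorithm of [8].

```python
# Evaluate the two polynomials of Eq. (10.5) at x = s = 100 and read the
# product matrix out of the digits of a single large-integer multiplication.
A = [[2, 4], [3, 5]]
B = [[9, 8], [7, 6]]
s = 100

# A read column-by-column: a11 + a21*s + a12*s^2 + a22*s^3
pa = A[0][0] + A[1][0]*s + A[0][1]*s**2 + A[1][1]*s**3          # 5 040 302
# B read column-by-column with each column reversed, spread by s^2:
# b21 + b11*s^2 + b22*s^4 + b12*s^6
pb = B[1][0] + B[0][0]*s**2 + B[1][1]*s**4 + B[0][1]*s**6       # 8 000 600 090 007

prod = pa * pb                                                   # 40 325 440 634 862 462 114

def coeff(k):
    """Coefficient of s^k in the product (valid because every coefficient < s)."""
    return (prod // s**k) % s

C = [[coeff(2), coeff(6)],
     [coeff(3), coeff(7)]]
print(C)        # [[46, 40], [62, 54]], matching Eq. (10.7)
```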
10.3 Wavelet-Based Matrix Sparsification

10.3.1 Overview

A common application of solving large linear systems of equations is the solution of integral equations arising in, say, electromagnetics. The integral equation is transformed into a linear system of equations using Galerkin's method, so that the entries in the matrix and in the vectors of knowns and unknowns are coefficients of basis functions used to represent the continuous functions in the integral equation. Intelligent selection of the basis functions results in a sparse (mostly zero entries) system matrix. The sparse linear system of unknowns is then usually solved using an iterative algorithm, which is where the sparseness becomes an advantage (iterative algorithms require repeated multiplication of the system matrix by the current approximation to the vector of unknowns).

Recently, wavelets have been recognized as a good choice of basis function for a wide variety of applications, especially in electromagnetics. This is true because in electromagnetics the kernel of the integral equation is a 2-D or 3-D Green's function for the wave equation, and these are Calderon-Zygmund operators. Using wavelets as basis functions makes the matrix representation of the kernel drop off rapidly away from the main diagonal, more rapidly than discretization of the integral equation alone would produce.

Here we quickly review the wavelet transform as a representation of continuous functions and show how it sparsifies Calderon-Zygmund integral operators. We also provide some insight into why this happens and present some alternatives that make the sparsification less mysterious. We present our results in terms of continuous (integral) operators, rather than discrete matrices, since this is the proper presentation for applications, and also since similar results can be obtained for the explicitly discrete case.

10.3.2 The Wavelet Transform

We will not attempt to present even an overview of the rich subject of wavelets. The reader is urged to consult the many papers and textbooks (e.g., [10]) now being published on the subject. Instead, we restrict our attention to the aspects of wavelets essential to sparsification of matrix operator representations.

The wavelet transform of an L^2 function f(x) is defined as

    f_i(n) = 2^{i/2} \int_{-\infty}^{\infty} f(x)\, \psi(2^i x - n)\, dx; \qquad f(x) = \sum_i \sum_n f_i(n)\, \psi(2^i x - n)\, 2^{i/2}        (10.9)

where {ψ(2^i x − n), i, n ∈ Z} is a complete orthonormal basis for L^2. That is, L^2 (the space of square-integrable functions) is spanned by dilations (scalings) and translations of a wavelet basis function ψ(x). Constructing this ψ(x) is nontrivial, but has been done extensively in the literature.

Since the summations must be truncated to finite intervals in practice, we define the wavelet scaling function φ(x), whose translations on a given scale span the space spanned by the wavelet basis function ψ(x) at all translations and at all scales coarser than the given scale. Then we can write

    f(x) = 2^{I/2} \sum_n c_I(n)\, \phi(2^I x - n) + \sum_{i=I}^{\infty} \sum_n f_i(n)\, \psi(2^i x - n)\, 2^{i/2}
    c_I(n) = 2^{I/2} \int_{-\infty}^{\infty} f(x)\, \phi(2^I x - n)\, dx        (10.10)

So the projection c_I(n) of f(x) on the scaling function φ(x) at scale I replaces the projections f_i(n) on the basis function ψ(x) at scales coarser (smaller i) than I. The scaling function φ(x) is orthogonal to its translations, but (unlike the basis function ψ(x)) it is not orthogonal between scales. Truncating the summation at the upper end approximates f(x) at the resolution defined by the finest (largest) scale i; this is somewhat analogous to truncating a Fourier series expansion and neglecting high-frequency components.
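As a concrete discrete counterpart of (10.9) and (10.10), the sketch below computes an orthonormal Haar decomposition of a sampled signal into coarse scaling coefficients plus wavelet (detail) coefficients at the finer scales, and then reconstructs the signal exactly. Haar is the simplest admissible choice of ψ and is used here purely for illustration.

```python
import numpy as np

def haar_decompose(f, levels):
    """Orthonormal Haar decomposition of a length-2^n sample vector.

    Returns (coarse, details): scaling coefficients at the coarsest scale kept,
    and a list of detail (wavelet) coefficient arrays, finest scale last.
    This is a discrete analogue of truncating (10.10) at a finite range of scales.
    """
    c = np.asarray(f, dtype=float)
    details = []
    for _ in range(levels):
        avg  = (c[0::2] + c[1::2]) / np.sqrt(2)   # projections on phi (scaling)
        diff = (c[0::2] - c[1::2]) / np.sqrt(2)   # projections on psi (wavelet)
        details.append(diff)
        c = avg
    return c, details[::-1]

def haar_reconstruct(coarse, details):
    """Invert haar_decompose (perfect reconstruction, since the basis is orthonormal)."""
    c = coarse.copy()
    for d in details:
        out = np.empty(2 * c.size)
        out[0::2] = (c + d) / np.sqrt(2)
        out[1::2] = (c - d) / np.sqrt(2)
        c = out
    return c

x = np.linspace(0, 1, 64, endpoint=False)
f = np.sin(2 * np.pi * x) + (x > 0.5)             # smooth part plus a jump
coarse, details = haar_decompose(f, levels=4)
assert np.allclose(haar_reconstruct(coarse, details), f)
```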
We also define the 2-D wavelet transform of f(x, y) as

    f_{i,j}(m, n) = 2^{i/2}\, 2^{j/2} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\, \psi(2^i x - m)\, \psi(2^j y - n)\, dx\, dy
    f(x, y) = \sum_{i,j,m,n} f_{i,j}(m, n)\, \psi(2^i x - m)\, \psi(2^j y - n)\, 2^{i/2}\, 2^{j/2}        (10.11)

However, it is more convenient to use the 2-D counterpart of (10.10), which is

    c_I(m, n) = 2^I \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\, \phi(2^I x - m)\, \phi(2^I y - n)\, dx\, dy
    f^1_i(m, n) = 2^i \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\, \phi(2^i x - m)\, \psi(2^i y - n)\, dx\, dy
    f^2_i(m, n) = 2^i \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\, \psi(2^i x - m)\, \phi(2^i y - n)\, dx\, dy
    f^3_i(m, n) = 2^i \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\, \psi(2^i x - m)\, \psi(2^i y - n)\, dx\, dy

    f(x, y) = \sum_{m,n} c_I(m, n)\, \phi(2^I x - m)\, \phi(2^I y - n)\, 2^I
            + \sum_{i=I}^{\infty} \sum_{m,n} f^1_i(m, n)\, \phi(2^i x - m)\, \psi(2^i y - n)\, 2^i
            + \sum_{i=I}^{\infty} \sum_{m,n} f^2_i(m, n)\, \psi(2^i x - m)\, \phi(2^i y - n)\, 2^i
            + \sum_{i=I}^{\infty} \sum_{m,n} f^3_i(m, n)\, \psi(2^i x - m)\, \psi(2^i y - n)\, 2^i        (10.12)

Once again, the projection c_I(m, n) on the scaling function at scale I replaces all projections on the basis functions at scales coarser than I. Some examples of wavelet scaling and basis functions:

    Scaling function: pulse, B-spline, sinc, soft sinc, Daubechies
    Wavelet:          Haar, Battle-Lemarie, Paley-Littlewood, Meyer, Daubechies

An important property of the wavelet basis function ψ(x) is that its first k moments can be made zero, for any integer k [10]:

    \int_{-\infty}^{\infty} x^i\, \psi(x)\, dx = 0, \qquad i = 0, 1, \ldots, k        (10.13)
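The 2-D decomposition (10.12) is separable: filtering the rows and then the columns of a sampled kernel with the scaling and wavelet filters yields, at each scale, one coarse block (the c_I terms) and three detail blocks (the f^1_i, f^2_i, f^3_i terms). A single-level Haar version of this split, again only an illustrative sketch:

```python
import numpy as np

def haar_split_2d(K):
    """One level of a separable 2-D Haar transform.

    Returns the four sub-bands of Eq. (10.12) at a single scale:
    cc (scaling x scaling), cd (scaling in x, wavelet in y),
    dc (wavelet in x, scaling in y), and dd (wavelet x wavelet).
    K must have even dimensions.
    """
    # filter along columns (the y direction)
    lo_y = (K[:, 0::2] + K[:, 1::2]) / np.sqrt(2)
    hi_y = (K[:, 0::2] - K[:, 1::2]) / np.sqrt(2)
    # then along rows (the x direction)
    cc = (lo_y[0::2, :] + lo_y[1::2, :]) / np.sqrt(2)
    dc = (lo_y[0::2, :] - lo_y[1::2, :]) / np.sqrt(2)
    cd = (hi_y[0::2, :] + hi_y[1::2, :]) / np.sqrt(2)
    dd = (hi_y[0::2, :] - hi_y[1::2, :]) / np.sqrt(2)
    return cc, cd, dc, dd

K = np.random.rand(8, 8)
cc, cd, dc, dd = haar_split_2d(K)
# The transform is orthonormal, so energy is preserved across the four bands.
assert np.isclose(sum(np.sum(b**2) for b in (cc, cd, dc, dd)), np.sum(K**2))
```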
10.3.3 Wavelet Representations of Integral Operators

We wish to use wavelets to sparsify the L^2 integral operator K(x, y) in

    g(x) = \int_{-\infty}^{\infty} K(x, y)\, f(y)\, dy        (10.14)

A common situation: (10.14) is an integral equation with known kernel K(x, y) and known g(x), in which the goal is to compute the unknown function f(y). Often the kernel K(x, y) is the Green's function (spatial impulse response) relating an observed wave field or signal g(x) to an unknown source field or signal f(y). For example, the Green's function for Laplace's equation in free space is

    G(r) = -\frac{1}{2\pi} \log r \ \ (\text{2-D}); \qquad G(r) = \frac{1}{4\pi r} \ \ (\text{3-D})        (10.15)

where r is the distance separating the points of source and observation. Now consider a line source in an infinite 2-D homogeneous medium, with observations made along the same line. The observed field strength g(x) at position x is

    g(x) = -\frac{1}{2\pi} \int_{-\infty}^{\infty} \log|x - y|\, f(y)\, dy        (10.16)

where f(y) is the source strength at position y.

Using Galerkin's method, we expand f(y) and g(x) as in (10.9) and K(x, y) as in (10.11). Using the orthogonality of the basis functions yields

    \sum_j \sum_n K_{i,j}(m, n)\, f_j(n) = g_i(m)        (10.17)

Expanding f(y) and g(x) as in (10.10) and K(x, y) as in (10.12) leads to another system of equations, which is difficult to write out notationally in general but can clearly be done in individual applications. We note here that the entries in the system matrix in this latter case can be rapidly generated using the fast wavelet algorithm of Mallat (see [10]).

The point of using wavelets is as follows. K(x, y) is a Calderon-Zygmund operator if

    \left| \frac{\partial^k}{\partial x^k} K(x, y) \right| + \left| \frac{\partial^k}{\partial y^k} K(x, y) \right| \le \frac{C_k}{|x - y|^{k+1}}        (10.18)

for some k ≥ 1. Note in particular that the Green's functions in (10.15) are Calderon-Zygmund operators. Then the representation (10.12) of K(x, y) has the property [11]

    |f^1_i(m, n)| + |f^2_i(m, n)| + |f^3_i(m, n)| \le \frac{C_k}{1 + |m - n|^{k+1}}, \qquad |m - n| > 2k        (10.19)

if the wavelet basis function ψ(x) has its first k moments zero (10.13). This means that using wavelets satisfying (10.13) sparsifies the matrix representation of the kernel K(x, y). For example, a direct discretization of the 3-D Green's function in (10.15) decays as 1/|m − n| as one moves away from the main diagonal m = n in its matrix representation. However, using wavelets, we can attain the much faster decay rate 1/(1 + |m − n|^{k+1}) far away from the main diagonal. By neglecting matrix entries smaller than some threshold (typically 1% of the largest entry), a sparse and mostly banded matrix is obtained. This greatly speeds up the following matrix computations:

1. Multiplication by the matrix, for solving the forward problem of computing the response to a given excitation (as in (10.16));
2. Fast solution of the linear system of equations, for solving the inverse problem of reconstructing the source from a measured response (solving (10.16) as an integral equation). This is typically performed using an iterative algorithm such as the conjugate gradient method; sparsification is essential for convergence in a reasonable time.

A typical sparsified matrix from an electromagnetics application is shown in Figure 6 of [12]. Battle-Lemarie wavelet basis functions were used to sparsify the Galerkin method matrix in an integral equation for planar dielectric millimeter-wave waveguides, and a 1% threshold was applied (see [12] for details). Note that the matrix is not only sparse but (mostly) banded.
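The decay (10.19) can be observed numerically. The sketch below samples the logarithmic kernel of (10.16) on staggered grids (so that x ≠ y), applies a separable multi-level Haar transform (a discrete counterpart of the tensor-product form (10.11)), and thresholds at 1% of the largest coefficient, as described above. Haar has only one vanishing moment, so the decay is milder than with higher-order wavelets such as Daubechies or Battle-Lemarie, but the reduction in retained entries relative to the direct discretization is already evident. This is an illustrative experiment, not the Galerkin discretization of [12].

```python
import numpy as np

def haar_1d(v):
    """Full orthonormal Haar transform of a length-2^n vector (on a copy)."""
    v = v.astype(float).copy()
    n = v.size
    while n > 1:
        avg  = (v[0:n:2] + v[1:n:2]) / np.sqrt(2)
        diff = (v[0:n:2] - v[1:n:2]) / np.sqrt(2)
        v[:n // 2], v[n // 2:n] = avg, diff
        n //= 2
    return v

def haar_2d(K):
    """Separable 2-D Haar transform: transform rows, then columns."""
    W = np.apply_along_axis(haar_1d, 1, K)
    return np.apply_along_axis(haar_1d, 0, W)

# Sample the logarithmic kernel of (10.16) on staggered grids so that x != y.
N = 256
x = np.arange(N)
y = np.arange(N) + 0.5
K = -np.log(np.abs(x[:, None] - y[None, :])) / (2 * np.pi)

W = haar_2d(K)
thresh = 0.01 * np.max(np.abs(W))           # the 1% threshold mentioned above
kept = np.sum(np.abs(W) > thresh)
direct_kept = np.sum(np.abs(K) > 0.01 * np.max(np.abs(K)))
print(f"wavelet domain keeps {kept} of {N*N} entries ({100.0*kept/(N*N):.1f}%)")
print(f"direct discretization keeps {direct_kept} of {N*N} entries")
```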
10.3.4 Heuristic Interpretation of Wavelet Sparsification

Why does this sparsification happen? Considerable insight can be gained using (10.13). Let \hat{\psi}(\omega) be the Fourier transform of the wavelet basis function ψ(x). Since the first k moments of ψ(x) are zero by (10.13), we can expand \hat{\psi}(\omega) in a power series around ω = 0:

    \hat{\psi}(\omega) \approx \omega^k, \qquad |\omega| \ll 1        (10.20)

This shows that for small |ω|, taking the wavelet transform of f(x) is roughly equivalent to taking the k-th derivative of f(x). This is confirmed by the observation that many wavelet basis functions bear a striking resemblance to the impulse responses of regularized differentiators. Since K(x, y) is assumed to be a Calderon-Zygmund operator, its k-th derivatives in x and y drop off as 1/|x − y|^{k+1}. Thus, it is not surprising that the wavelet transform of K(x, y), which roughly amounts to taking k-th derivatives, should drop off as 1/|m − n|^{k+1}. Of course there is more to it, but this is why it happens.

It is not surprising that K(x, y) can be sparsified by taking advantage of its derivatives being small. To see a more direct way of accomplishing this, apply integration by parts to (10.14) and take the partial derivative with respect to x. This gives

    \frac{dg(x)}{dx} = -\int_{-\infty}^{\infty} \left[ \frac{\partial}{\partial x} \frac{\partial}{\partial y} K(x, y) \right] \left[ \int_{-\infty}^{y} f(y')\, dy' \right] dy        (10.21)

which will likely sparsify a smooth K(x, y). Of course, higher derivatives can be used until a condition like (10.18) is reached. The operations of integrating f(y) k times and of integrating \partial^k g/\partial x^k k times (to get g(x)) can be accomplished using nk ≪ n^2 additions, so considerable savings can result. This is different from using wavelets, but in the same spirit.

References

[1] Strassen, V., "Gaussian elimination is not optimal," Numer. Math., 13: 354–356, 1969.
[2] Laderman, J.D., "A noncommutative algorithm for multiplying 3 × 3 matrices using 23 multiplications," Bull. Am. Math. Soc., 82: 127–128, 1976.
[3] Johnson, R.W. and McLoughlin, A.M., "Noncommutative bilinear algorithms for 3 × 3 matrix multiplication," SIAM J. Comput., 15: 595–603, 1986.
[4] Makarov, O.M., "A noncommutative algorithm for multiplying 5 × 5 matrices using 102 multiplications," Inform. Proc. Lett., 23: 115–117, 1986.
[5] Pan, V., "How can we speed up matrix multiplication?" SIAM Rev., 26(3): 393–415, 1984.
[6] Pan, V., How Can We Multiply Matrices Faster?, Springer-Verlag, New York, 1984.
[7] Bini, D., Capovani, M., Lotti, G., and Romani, F., "O(n^{2.7799}) complexity for matrix multiplication," Inform. Proc. Lett., 8: 234–235, 1979.
[8] Yagle, A.E., "Fast algorithms for matrix multiplication using pseudo number theoretic transforms," IEEE Trans. Signal Process., 43: 71–76, 1995.
[9] Nussbaumer, H.J., Fast Fourier Transforms and Convolution Algorithms, Springer-Verlag, Berlin, 1982.
[10] Daubechies, I., Ten Lectures on Wavelets, SIAM, Philadelphia, PA, 1992.
[11] Beylkin, G., Coifman, R., and Rokhlin, V., "Fast wavelet transforms and numerical algorithms I," Comm. Pure Appl. Math., 44: 141–183, 1991.
[12] Sabetfakhri, K. and Katehi, L.P.B., "Analysis of integrated millimeter-wave and submillimeter-wave waveguides using orthonormal wavelet expansions," IEEE Trans. Microwave Theory Tech., 42: 2412–2422, 1994.

Date posted: 25/10/2013, 02:15

