The Matrix Cookbook
[ http://matrixcookbook.com ]

Kaare Brandt Petersen
Michael Syskind Pedersen

Version: November 15, 2012

Introduction

What is this? These pages are a collection of facts (identities, approximations, inequalities, relations, ...) about matrices and matters relating to them. It is collected in this form for the convenience of anyone who wants a quick desktop reference.

Disclaimer: The identities, approximations and relations presented here were obviously not invented but collected, borrowed and copied from a large amount of sources. These sources include similar but shorter notes found on the internet and appendices in books - see the references for a full list.

Errors: Very likely there are errors, typos, and mistakes for which we apologize and would be grateful to receive corrections at cookbook@2302.dk.

It's ongoing: The project of keeping a large repository of relations involving matrices is naturally ongoing and the version will be apparent from the date in the header.

Suggestions: Your suggestion for additional content or elaboration of some topics is most welcome at cookbook@2302.dk.

Keywords: Matrix algebra, matrix relations, matrix identities, derivative of determinant, derivative of inverse matrix, differentiate a matrix.

Acknowledgements: We would like to thank the following for contributions and suggestions: Bill Baxter, Brian Templeton, Christian Rishøj, Christian Schröppel, Dan Boley, Douglas L. Theobald, Esben Hoegh-Rasmussen, Evripidis Karseras, Georg Martius, Glynne Casteel, Jan Larsen, Jun Bin Gao, Jürgen Struckmeier, Kamil Dedecius, Karim T. Abou-Moustafa, Korbinian Strimmer, Lars Christiansen, Lars Kai Hansen, Leland Wilkinson, Liguo He, Loic Thibaut, Markus Froeb, Michael Hubatka, Miguel Barão, Ole Winther, Pavel Sakov, Stephan Hattinger, Troels Pedersen, Vasile Sima, Vincent Rabaud, Zhaoshui He. We would also like to thank The Oticon Foundation for funding our PhD studies.

Contents

1 Basics
  1.1 Trace
  1.2 Determinant
  1.3 The Special Case 2x2
2 Derivatives
  2.1 Derivatives of a Determinant
  2.2 Derivatives of an Inverse
  2.3 Derivatives of Eigenvalues
  2.4 Derivatives of Matrices, Vectors and Scalar Forms
  2.5 Derivatives of Traces
  2.6 Derivatives of vector norms
  2.7 Derivatives of matrix norms
  2.8 Derivatives of Structured Matrices
3 Inverses
  3.1 Basic
  3.2 Exact Relations
  3.3 Implication on Inverses
  3.4 Approximations
  3.5 Generalized Inverse
  3.6 Pseudo Inverse
4 Complex Matrices
  4.1 Complex Derivatives
  4.2 Higher order and non-linear derivatives
  4.3 Inverse of complex sum
5 Solutions and Decompositions
  5.1 Solutions to linear equations
  5.2 Eigenvalues and Eigenvectors
  5.3 Singular Value Decomposition
  5.4 Triangular Decomposition
  5.5 LU decomposition
  5.6 LDM decomposition
  5.7 LDL decompositions
6 Statistics and Probability
  6.1 Definition of Moments
  6.2 Expectation of Linear Combinations
  6.3 Weighted Scalar Variable
7 Multivariate Distributions
  7.1 Cauchy
  7.2 Dirichlet
  7.3 Normal
  7.4 Normal-Inverse Gamma
  7.5 Gaussian
  7.6 Multinomial
  7.7 Student's t
  7.8 Wishart
  7.9 Wishart, Inverse
8 Gaussians
  8.1 Basics
  8.2 Moments
  8.3 Miscellaneous
  8.4 Mixture of Gaussians
9 Special Matrices
  9.1 Block matrices
  9.2 Discrete Fourier Transform Matrix, The
  9.3 Hermitian Matrices and skew-Hermitian
  9.4 Idempotent Matrices
  9.5 Orthogonal matrices
  9.6 Positive Definite and Semi-definite Matrices
  9.7 Singleentry Matrix, The
  9.8 Symmetric, Skew-symmetric/Antisymmetric
  9.9 Toeplitz Matrices
  9.10 Transition matrices
  9.11 Units, Permutation and Shift
  9.12 Vandermonde Matrices
10 Functions and Operators
  10.1 Functions and Series
  10.2 Kronecker and Vec Operator
  10.3 Vector Norms
  10.4 Matrix Norms
  10.5 Rank
  10.6 Integral Involving Dirac Delta Functions
  10.7 Miscellaneous
A One-dimensional Results
  A.1 Gaussian
  A.2 One Dimensional Mixture of Gaussians
B Proofs and Details
  B.1 Misc Proofs

Notation and Nomenclature

A           Matrix
A_{ij}      Matrix indexed for some purpose
A_i         Matrix indexed for some purpose
A^{ij}      Matrix indexed for some purpose
A^n         Matrix indexed for some purpose or the n.th power of a square matrix
A^{-1}      The inverse matrix of the matrix A
A^+         The pseudo inverse matrix of the matrix A (see Sec. 3.6)
A^{1/2}     The square root of a matrix (if unique), not elementwise
(A)_{ij}    The (i, j).th entry of the matrix A
A_{ij}      The (i, j).th entry of the matrix A
[A]_{ij}    The ij-submatrix, i.e. A with i.th row and j.th column deleted
a           Vector (column-vector)
a_i         Vector indexed for some purpose
a_i         The i.th element of the vector a
a           Scalar
Re z        Real part of a scalar
Re z        Real part of a vector
Re Z        Real part of a matrix
Im z        Imaginary part of a scalar
Im z        Imaginary part of a vector
Im Z        Imaginary part of a matrix
det(A)      Determinant of A
Tr(A)       Trace of the matrix A
diag(A)     Diagonal matrix of the matrix A, i.e. (diag(A))_{ij} = δ_{ij} A_{ij}
eig(A)      Eigenvalues of the matrix A
vec(A)      The vector-version of the matrix A (see Sec. 10.2.2)
sup         Supremum of a set
||A||       Matrix norm (subscript if any denotes what norm)
A^T         Transposed matrix
A^{-T}      The inverse of the transposed and vice versa, A^{-T} = (A^{-1})^T = (A^T)^{-1}
A^*         Complex conjugated matrix
A^H         Transposed and complex conjugated matrix (Hermitian)
A ∘ B       Hadamard (elementwise) product
A ⊗ B       Kronecker product
0           The null matrix. Zero in all entries.
I           The identity matrix
J^{ij}      The single-entry matrix, 1 at (i, j) and zero elsewhere
Σ           A positive definite matrix
Λ           A diagonal matrix
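The notation above is purely symbolic. As an illustration only (not part of the original cookbook), the following minimal Python/NumPy sketch shows one common way of computing several of the quantities defined above for a concrete matrix; the variable names are arbitrary choices for this example.

import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])

trace_A = np.trace(A)                # Tr(A), sum of diagonal entries
det_A   = np.linalg.det(A)           # det(A)
diag_A  = np.diag(np.diag(A))        # diag(A): keep only the diagonal
eig_A   = np.linalg.eigvals(A)       # eig(A)
vec_A   = A.reshape(-1, order="F")   # vec(A): stack the columns (column-major)
had     = A * B                      # A ∘ B, Hadamard (elementwise) product
kron    = np.kron(A, B)              # A ⊗ B, Kronecker product
A_H     = A.conj().T                 # A^H, transposed and complex conjugated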
1 Basics

(AB)^{-1} = B^{-1} A^{-1}                                   (1)
(ABC...)^{-1} = ...C^{-1} B^{-1} A^{-1}                     (2)
(A^T)^{-1} = (A^{-1})^T                                     (3)
(A + B)^T = A^T + B^T                                       (4)
(AB)^T = B^T A^T                                            (5)
(ABC...)^T = ...C^T B^T A^T                                 (6)
(A^H)^{-1} = (A^{-1})^H                                     (7)
(A + B)^H = A^H + B^H                                       (8)
(AB)^H = B^H A^H                                            (9)
(ABC...)^H = ...C^H B^H A^H                                 (10)

1.1 Trace

Tr(A) = Σ_i A_{ii}                                          (11)
Tr(A) = Σ_i λ_i,   λ_i = eig(A)                             (12)
Tr(A) = Tr(A^T)                                             (13)
Tr(AB) = Tr(BA)                                             (14)
Tr(A + B) = Tr(A) + Tr(B)                                   (15)
Tr(ABC) = Tr(BCA) = Tr(CAB)                                 (16)
a^T a = Tr(a a^T)                                           (17)

1.2 Determinant

Let A be an n × n matrix.

det(A) = Π_i λ_i,   λ_i = eig(A)                            (18)
det(cA) = c^n det(A),   if A ∈ R^{n×n}                      (19)
det(A^T) = det(A)                                           (20)
det(AB) = det(A) det(B)                                     (21)
det(A^{-1}) = 1/det(A)                                      (22)
det(A^n) = det(A)^n                                         (23)
det(I + uv^T) = 1 + u^T v                                   (24)

For n = 2:

det(I + A) = 1 + det(A) + Tr(A)                             (25)

For n = 3:

det(I + A) = 1 + det(A) + Tr(A) + (1/2)Tr(A)^2 − (1/2)Tr(A^2)                        (26)

For n = 4:

det(I + A) = 1 + det(A) + Tr(A) + (1/2)Tr(A)^2 − (1/2)Tr(A^2)
             + (1/6)Tr(A)^3 − (1/2)Tr(A)Tr(A^2) + (1/3)Tr(A^3)                       (27)

For small ε, the following approximation holds

det(I + εA) ≅ 1 + det(A) + εTr(A) + (1/2)ε^2 Tr(A)^2 − (1/2)ε^2 Tr(A^2)              (28)
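As a quick numerical sanity check, added here as an illustration and not part of the original text, the following NumPy sketch verifies a few of the identities above, namely (14), (21), (24) and the n = 3 expansion (26), for random matrices.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
u = rng.standard_normal(3)
v = rng.standard_normal(3)

# (14): Tr(AB) = Tr(BA)
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
# (21): det(AB) = det(A) det(B)
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))
# (24): det(I + u v^T) = 1 + u^T v
assert np.isclose(np.linalg.det(np.eye(3) + np.outer(u, v)), 1 + u @ v)
# (26), n = 3: det(I + A) = 1 + det(A) + Tr(A) + (Tr(A)^2 - Tr(A^2))/2
lhs = np.linalg.det(np.eye(3) + A)
rhs = 1 + np.linalg.det(A) + np.trace(A) + 0.5 * (np.trace(A) ** 2 - np.trace(A @ A))
assert np.isclose(lhs, rhs)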
1.3 The Special Case 2x2

Consider the matrix A

A = [ A_{11}  A_{12}
      A_{21}  A_{22} ]

Determinant and trace:

det(A) = A_{11}A_{22} − A_{12}A_{21}                        (29)
Tr(A) = A_{11} + A_{22}                                     (30)

Eigenvalues:

λ^2 − λ·Tr(A) + det(A) = 0

λ_1 = ( Tr(A) + √(Tr(A)^2 − 4 det(A)) ) / 2,    λ_2 = ( Tr(A) − √(Tr(A)^2 − 4 det(A)) ) / 2

λ_1 + λ_2 = Tr(A),    λ_1 λ_2 = det(A)

Eigenvectors:

v_1 ∝ [ A_{12} ;  λ_1 − A_{11} ],    v_2 ∝ [ A_{12} ;  λ_2 − A_{11} ]

Inverse:

A^{-1} = (1/det(A)) [  A_{22}  −A_{12}
                      −A_{21}   A_{11} ]                    (31)

2 Derivatives

This section covers differentiation of a number of expressions with respect to a matrix X. Note that it is always assumed that X has no special structure, i.e. that the elements of X are independent (e.g. not symmetric, Toeplitz, positive definite). See section 2.8 for differentiation of structured matrices. The basic assumptions can be written in a formula as

∂X_{kl}/∂X_{ij} = δ_{ik} δ_{lj}                             (32)

that is, for e.g. vector forms,

[∂x/∂y]_i = ∂x_i/∂y,    [∂x/∂y]_i = ∂x/∂y_i,    [∂x/∂y]_{ij} = ∂x_i/∂y_j

The following rules are general and very useful when deriving the differential of an expression ([19]):

∂A = 0   (A is a constant)                                  (33)
∂(αX) = α ∂X                                                (34)
∂(X + Y) = ∂X + ∂Y                                          (35)
∂(Tr(X)) = Tr(∂X)                                           (36)
∂(XY) = (∂X)Y + X(∂Y)                                       (37)
∂(X ∘ Y) = (∂X) ∘ Y + X ∘ (∂Y)                              (38)
∂(X ⊗ Y) = (∂X) ⊗ Y + X ⊗ (∂Y)                              (39)
∂(X^{-1}) = −X^{-1}(∂X)X^{-1}                               (40)
∂(det(X)) = Tr(adj(X) ∂X)                                   (41)
∂(det(X)) = det(X) Tr(X^{-1} ∂X)                            (42)
∂(ln(det(X))) = Tr(X^{-1} ∂X)                               (43)
∂X^T = (∂X)^T                                               (44)
∂X^H = (∂X)^H                                               (45)

2.1 Derivatives of a Determinant

2.1.1 General form

∂det(Y)/∂x = det(Y) Tr[ Y^{-1} ∂Y/∂x ]                      (46)

Σ_k ∂det(X)/∂X_{ik} X_{jk} = δ_{ij} det(X)                  (47)

∂^2 det(Y)/∂x^2 = det(Y) [ Tr[ Y^{-1} ∂(∂Y/∂x)/∂x ]
                           + Tr[ Y^{-1} ∂Y/∂x ] Tr[ Y^{-1} ∂Y/∂x ]
                           − Tr[ (Y^{-1} ∂Y/∂x)(Y^{-1} ∂Y/∂x) ] ]        (48)

2.1.2 Linear forms

∂det(X)/∂X = det(X)(X^{-1})^T                               (49)
Σ_k ∂det(X)/∂X_{ik} X_{jk} = δ_{ij} det(X)                  (50)
∂det(AXB)/∂X = det(AXB)(X^{-1})^T = det(AXB)(X^T)^{-1}      (51)

2.1.3 Square forms

If X is square and invertible, then

∂det(X^T A X)/∂X = 2 det(X^T A X) X^{-T}                    (52)

If X is not square but A is symmetric, then

∂det(X^T A X)/∂X = 2 det(X^T A X) A X (X^T A X)^{-1}        (53)

If X is not square and A is not symmetric, then

∂det(X^T A X)/∂X = det(X^T A X)( A X (X^T A X)^{-1} + A^T X (X^T A^T X)^{-1} )       (54)

2.1.4 Other nonlinear forms

Some special cases are (see [9, 7])

∂ ln det(X^T X)/∂X = 2(X^+)^T                               (55)
∂ ln det(X^T X)/∂X^+ = −2X^T                                (56)
∂ ln |det(X)|/∂X = (X^{-1})^T = (X^T)^{-1}                  (57)
∂ det(X^k)/∂X = k det(X^k) X^{-T}                           (58)

2.2 Derivatives of an Inverse

From [27] we have the basic identity

∂Y^{-1}/∂x = −Y^{-1} (∂Y/∂x) Y^{-1}                         (59)

from which it follows

∂(X^{-1})_{kl}/∂X_{ij} = −(X^{-1})_{ki}(X^{-1})_{jl}        (60)
∂(a^T X^{-1} b)/∂X = −X^{-T} a b^T X^{-T}                   (61)
∂ det(X^{-1})/∂X = −det(X^{-1})(X^{-1})^T                   (62)
∂ Tr(A X^{-1} B)/∂X = −(X^{-1} B A X^{-1})^T                (63)
∂ Tr((X + A)^{-1})/∂X = −((X + A)^{-1}(X + A)^{-1})^T       (64)

From [32] we have the following result: Let A be an n × n invertible square matrix, let W be the inverse of A, and let J(A) be an n × n-variate function, differentiable with respect to A; then the partial differentials of J with respect to A and W satisfy

∂J/∂A = −A^{-T} (∂J/∂W) A^{-T}

2.3 Derivatives of Eigenvalues

∂/∂X Σ eig(X) = ∂/∂X Tr(X) = I                              (65)
∂/∂X Π eig(X) = ∂/∂X det(X) = det(X) X^{-T}                 (66)

If A is real and symmetric, λ_i and v_i are distinct eigenvalues and eigenvectors of A (see (276)) with v_i^T v_i = 1, then [33]

∂λ_i = v_i^T ∂(A) v_i                                       (67)
∂v_i = (λ_i I − A)^+ ∂(A) v_i                               (68)

2.4 Derivatives of Matrices, Vectors and Scalar Forms

2.4.1 First Order

∂x^T a/∂x = ∂a^T x/∂x = a                                   (69)
∂a^T X b/∂X = a b^T                                         (70)
∂a^T X^T b/∂X = b a^T                                       (71)
∂a^T X a/∂X = ∂a^T X^T a/∂X = a a^T                         (72)
∂X/∂X_{ij} = J^{ij}                                         (73)
∂(XA)_{ij}/∂X_{mn} = δ_{im}(A)_{nj} = (J^{mn} A)_{ij}       (74)
∂(X^T A)_{ij}/∂X_{mn} = δ_{in}(A)_{mj} = (J^{nm} A)_{ij}    (75)
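The determinant derivative (49) can be spot-checked numerically. The sketch below is an illustrative addition, not part of the original collection; it compares a central finite-difference gradient of det(X) with the closed form det(X)(X^{-1})^T.

import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 4))
eps = 1e-6

# Numerical gradient of det(X) with respect to each entry X_ij
num_grad = np.zeros_like(X)
for i in range(4):
    for j in range(4):
        Xp = X.copy(); Xp[i, j] += eps
        Xm = X.copy(); Xm[i, j] -= eps
        num_grad[i, j] = (np.linalg.det(Xp) - np.linalg.det(Xm)) / (2 * eps)

# (49): d det(X)/dX = det(X) (X^{-1})^T
analytic = np.linalg.det(X) * np.linalg.inv(X).T
assert np.allclose(num_grad, analytic, atol=1e-5)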
10 Functions and Operators

10.1 Functions and Series

10.1.1 Finite Series

(X^n − I)(X − I)^{-1} = I + X + X^2 + ... + X^{n−1}         (487)

10.1.2 Taylor Expansion of Scalar Function

Consider some scalar function f(x) which takes the vector x as an argument. This we can Taylor expand around x_0:

f(x) ≅ f(x_0) + g(x_0)^T (x − x_0) + (1/2)(x − x_0)^T H(x_0)(x − x_0)                (488)

where

g(x_0) = ∂f(x)/∂x |_{x_0},    H(x_0) = ∂^2 f(x)/∂x∂x^T |_{x_0}

10.1.3 Matrix Functions by Infinite Series

As for analytical functions in one dimension, one can define a matrix function for square matrices X by an infinite series

f(X) = Σ_{n=0}^∞ c_n X^n                                    (489)

assuming the limit exists and is finite. If the coefficients c_n fulfil Σ_n c_n x^n < ∞, then one can prove that the above series exists and is finite, see [1]. Thus for any analytical function f(x) there exists a corresponding matrix function f(X) constructed by the Taylor expansion. Using this one can prove the following results:

1) A matrix A is a zero of its own characteristic polynomial [1]:

p(λ) = det(Iλ − A) = Σ_n c_n λ^n   ⇒   p(A) = 0             (490)

2) If A is square it holds that [1]

A = UBU^{-1}   ⇒   f(A) = U f(B) U^{-1}                     (491)

3) A useful fact when using power series is that

A^n → 0 for n → ∞   if |A| < 1                              (492)

10.1.4 Identity and commutations

It holds for an analytical matrix function f(X) that

f(AB)A = A f(BA)                                            (493)

see B.1.2 for a proof.

10.1.5 Exponential Matrix Function

In analogy to the ordinary scalar exponential function, one can define exponential and logarithmic matrix functions:

e^A ≡ Σ_{n=0}^∞ (1/n!) A^n = I + A + (1/2)A^2 + ...                                  (494)
e^{−A} ≡ Σ_{n=0}^∞ (1/n!) (−1)^n A^n = I − A + (1/2)A^2 − ...                        (495)
e^{tA} ≡ Σ_{n=0}^∞ (1/n!) (tA)^n = I + tA + (1/2)t^2 A^2 + ...                       (496)
ln(I + A) ≡ Σ_{n=1}^∞ ((−1)^{n−1}/n) A^n = A − (1/2)A^2 + (1/3)A^3 − ...             (497)

Some of the properties of the exponential function are [1]

e^A e^B = e^{A+B}   if AB = BA                              (498)
(e^A)^{-1} = e^{−A}                                         (499)
d/dt e^{tA} = A e^{tA} = e^{tA} A,   t ∈ R                  (500)
d/dt Tr(e^{tA}) = Tr(A e^{tA})                              (501)
det(e^A) = e^{Tr(A)}                                        (502)

10.1.6 Trigonometric Functions

sin(A) ≡ Σ_{n=0}^∞ (−1)^n A^{2n+1}/(2n+1)! = A − (1/3!)A^3 + (1/5!)A^5 − ...         (503)
cos(A) ≡ Σ_{n=0}^∞ (−1)^n A^{2n}/(2n)! = I − (1/2!)A^2 + (1/4!)A^4 − ...             (504)
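The exponential matrix function of 10.1.5 is easy to exercise numerically. The sketch below is only an illustration (it assumes SciPy is available for expm); it checks the truncated series (494) against scipy.linalg.expm and verifies properties (499) and (502).

import numpy as np
from scipy.linalg import expm   # assumes SciPy is available

rng = np.random.default_rng(2)
A = 0.1 * rng.standard_normal((4, 4))

# Truncated power series (494); 30 terms is plenty for this small A
E = np.zeros_like(A)
term = np.eye(4)
for n in range(30):
    E = E + term
    term = term @ A / (n + 1)

assert np.allclose(E, expm(A))
# (502): det(e^A) = e^{Tr(A)}
assert np.isclose(np.linalg.det(expm(A)), np.exp(np.trace(A)))
# (499): (e^A)^{-1} = e^{-A}
assert np.allclose(np.linalg.inv(expm(A)), expm(-A))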
10.2 Kronecker and Vec Operator

10.2.1 The Kronecker Product

The Kronecker product of an m × n matrix A and an r × q matrix B is the mr × nq matrix A ⊗ B defined as

A ⊗ B = [ A_{11}B  A_{12}B  ...  A_{1n}B
          A_{21}B  A_{22}B  ...  A_{2n}B
            ...
          A_{m1}B  A_{m2}B  ...  A_{mn}B ]                  (505)

The Kronecker product has the following properties (see [19])

A ⊗ (B + C) = A ⊗ B + A ⊗ C                                 (506)
A ⊗ B ≠ B ⊗ A   in general                                  (507)
A ⊗ (B ⊗ C) = (A ⊗ B) ⊗ C                                   (508)
(α_A A) ⊗ (α_B B) = α_A α_B (A ⊗ B)                         (509)
(A ⊗ B)^T = A^T ⊗ B^T                                       (510)
(A ⊗ B)(C ⊗ D) = AC ⊗ BD                                    (511)
(A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}                              (512)
(A ⊗ B)^+ = A^+ ⊗ B^+                                       (513)
rank(A ⊗ B) = rank(A) rank(B)                               (514)
Tr(A ⊗ B) = Tr(A)Tr(B) = Tr(Λ_A ⊗ Λ_B)                      (515)
det(A ⊗ B) = det(A)^{rank(B)} det(B)^{rank(A)}              (516)
{eig(A ⊗ B)} = {eig(B ⊗ A)}   if A, B are square            (517)
{eig(A ⊗ B)} = {eig(A) eig(B)^T}   if A, B are symmetric and square                  (518)
eig(A ⊗ B) = eig(A) ⊗ eig(B)                                (519)

where {λ_i} denotes the set of values λ_i, that is, the values in no particular order or structure, and Λ_A denotes the diagonal matrix with the eigenvalues of A.

10.2.2 The Vec Operator

The vec-operator applied on a matrix A stacks the columns into a vector, i.e. for a 2 × 2 matrix

A = [ A_{11}  A_{12}
      A_{21}  A_{22} ]      vec(A) = [ A_{11}  A_{21}  A_{12}  A_{22} ]^T

Properties of the vec-operator include (see [19])

vec(AXB) = (B^T ⊗ A) vec(X)                                 (520)
Tr(A^T B) = vec(A)^T vec(B)                                 (521)
vec(A + B) = vec(A) + vec(B)                                (522)
vec(αA) = α · vec(A)                                        (523)
a^T X B X^T c = vec(X)^T (B^T ⊗ c a^T) vec(X)               (524)

See B.1.1 for a proof for Eq. 524.

10.3 Vector Norms

10.3.1 Examples

||x||_1 = Σ_i |x_i|                                         (525)
||x||_2^2 = x^H x                                           (526)
||x||_p = [ Σ_i |x_i|^p ]^{1/p}                             (527)
||x||_∞ = max_i |x_i|                                       (528)

Further reading in e.g. [12, p. 52].

10.4 Matrix Norms

10.4.1 Definitions

A matrix norm is a mapping which fulfils

||A|| ≥ 0                                                   (529)
||A|| = 0 ⇔ A = 0                                           (530)
||cA|| = |c| ||A||,   c ∈ R                                 (531)
||A + B|| ≤ ||A|| + ||B||                                   (532)

10.4.2 Induced Norm or Operator Norm

An induced norm is a matrix norm induced by a vector norm by the following

||A|| = sup{ ||Ax|| : ||x|| = 1 }                           (533)

where || · || on the left side is the induced matrix norm, while || · || on the right side denotes the vector norm. For induced norms it holds that

||I|| = 1                                                   (534)
||Ax|| ≤ ||A|| · ||x||,   for all A, x                      (535)
||AB|| ≤ ||A|| · ||B||,   for all A, B                      (536)

10.4.3 Examples

||A||_1 = max_j Σ_i |A_{ij}|                                (537)
||A||_2 = √( max eig(A^H A) )                               (538)
||A||_p = max_{||x||_p = 1} ||Ax||_p                        (539)
||A||_∞ = max_i Σ_j |A_{ij}|                                (540)
||A||_F = √( Σ_{ij} |A_{ij}|^2 ) = √( Tr(A A^H) )   (Frobenius)                      (541)
||A||_max = max_{ij} |A_{ij}|                               (542)
||A||_KF = ||sing(A)||_1   (Ky Fan)                         (543)

where sing(A) is the vector of singular values of the matrix A.

10.4.4 Inequalities

E. H. Rasmussen has in yet unpublished material derived and collected the following inequalities. They are collected in a table as below, assuming A is an m × n matrix and d = rank(A).

            ||A||_max   ||A||_1   ||A||_∞   ||A||_2   ||A||_F   ||A||_KF
||A||_max               1         1         1         1         1
||A||_1     m                     m         √m        √m        √m
||A||_∞     n           n                   √n        √n        √n
||A||_2     √(mn)       √n        √m                  1         1
||A||_F     √(mn)       √n        √m        √d                  1
||A||_KF    √(mnd)      √(nd)     √(md)     d         √d

which are to be read as, e.g.

||A||_2 ≤ √m · ||A||_∞                                      (544)

10.4.5 Condition Number

The 2-norm of A equals √(max(eig(A^T A))) [12, p. 57]. For a symmetric, positive definite matrix, this reduces to max(eig(A)). The condition number based on the 2-norm thus reduces to

||A||_2 ||A^{-1}||_2 = max(eig(A)) max(eig(A^{-1})) = max(eig(A)) / min(eig(A))      (545)
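As an added illustration (not part of the original text), the following NumPy sketch spot-checks the Kronecker and vec identities (511), (520) and (521), one entry of the inequality table read as in (544), and the condition-number expression (545) for a symmetric positive definite matrix built for this purpose.

import numpy as np

rng = np.random.default_rng(3)

# vec / Kronecker identities from Sec. 10.2
A = rng.standard_normal((3, 4))
X = rng.standard_normal((4, 5))
B = rng.standard_normal((5, 2))
C = rng.standard_normal((4, 3))
D = rng.standard_normal((5, 2))
vec = lambda M: M.reshape(-1, order="F")    # stack columns (column-major)

assert np.allclose(np.kron(A, X) @ np.kron(C, D), np.kron(A @ C, X @ D))   # (511)
assert np.allclose(vec(A @ X @ B), np.kron(B.T, A) @ vec(X))               # (520)
assert np.isclose(np.trace(A.T @ A), vec(A) @ vec(A))                      # (521), with B = A

# One entry of the inequality table, read as in (544), and Eq. (545)
m, n = 5, 7
M = rng.standard_normal((m, n))
norm_2   = np.linalg.norm(M, 2)             # largest singular value, (538)
norm_inf = np.abs(M).sum(axis=1).max()      # maximum row sum, (540)
assert norm_2 <= np.sqrt(m) * norm_inf

S = M @ M.T + np.eye(m)                     # symmetric positive definite by construction
eigs = np.linalg.eigvalsh(S)
assert np.isclose(np.linalg.cond(S), eigs.max() / eigs.min())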
10.5 Rank

10.5.1 Sylvester's Inequality

If A is m × n and B is n × r, then

rank(A) + rank(B) − n ≤ rank(AB) ≤ min{rank(A), rank(B)}                             (546)

10.6 Integral Involving Dirac Delta Functions

Assuming A to be square, then

∫ p(s) δ(x − As) ds = (1/det(A)) p(A^{-1}x)                                          (547)

Assuming A to be "underdetermined", i.e. "tall", then

∫ p(s) δ(x − As) ds = { (1/√det(A^T A)) p(A^+ x)   if x = AA^+ x
                      { 0                          elsewhere                          (548)

See [9].

10.7 Miscellaneous

For any A it holds that

rank(A) = rank(A^T) = rank(AA^T) = rank(A^T A)                                       (549)

It holds that

A is positive definite ⇔ ∃B invertible, such that A = BB^T                           (550)

A One-dimensional Results

A.1 Gaussian

A.1.1 Density

p(x) = (1/√(2πσ^2)) exp( −(x − µ)^2/(2σ^2) )                                         (551)

A.1.2 Normalization

∫ e^{−(s−µ)^2/(2σ^2)} ds = √(2πσ^2)                                                  (552)
∫ e^{−(ax^2+bx+c)} dx = √(π/a) exp[ (b^2 − 4ac)/(4a) ]                               (553)
∫ e^{c_2 x^2 + c_1 x + c_0} dx = √(π/−c_2) exp[ (c_1^2 − 4c_2 c_0)/(−4c_2) ]         (554)

A.1.3 Derivatives

∂p(x)/∂µ = p(x) (x − µ)/σ^2                                                          (555)
∂ ln p(x)/∂µ = (x − µ)/σ^2                                                           (556)
∂p(x)/∂σ = p(x) (1/σ) [ (x − µ)^2/σ^2 − 1 ]                                          (557)
∂ ln p(x)/∂σ = (1/σ) [ (x − µ)^2/σ^2 − 1 ]                                           (558)

A.1.4 Completing the Squares

c_2 x^2 + c_1 x + c_0 = −a(x − b)^2 + w

with  −a = c_2,   b = −c_1/(2c_2),   w = c_0 − c_1^2/(4c_2),

or

c_2 x^2 + c_1 x + c_0 = −(1/(2σ^2))(x − µ)^2 + d

with  µ = −c_1/(2c_2),   σ^2 = −1/(2c_2),   d = c_0 − c_1^2/(4c_2)

A.1.5 Moments

If the density is expressed by

p(x) = (1/√(2πσ^2)) exp( −(x − µ)^2/(2σ^2) )   or   p(x) = C exp(c_2 x^2 + c_1 x)    (559)

then the first few basic moments are

⟨x⟩   = µ                        = −c_1/(2c_2)
⟨x^2⟩ = σ^2 + µ^2                = −1/(2c_2) + ( c_1/(2c_2) )^2
⟨x^3⟩ = 3σ^2 µ + µ^3             = ( c_1/(2c_2)^2 ) [ 3 − c_1^2/(2c_2) ]
⟨x^4⟩ = µ^4 + 6µ^2 σ^2 + 3σ^4    = ( c_1/(2c_2) )^4 + 6( c_1/(2c_2) )^2 ( −1/(2c_2) ) + 3( 1/(2c_2) )^2

and the central moments are

⟨(x − µ)⟩   = 0
⟨(x − µ)^2⟩ = σ^2    = −1/(2c_2)
⟨(x − µ)^3⟩ = 0
⟨(x − µ)^4⟩ = 3σ^4   = 3( 1/(2c_2) )^2

A kind of pseudo-moments (un-normalized integrals) can easily be derived as

∫ exp(c_2 x^2 + c_1 x) x^n dx = Z⟨x^n⟩ = √(π/−c_2) exp[ c_1^2/(−4c_2) ] ⟨x^n⟩        (560)

From the un-centralized moments one can derive other entities like

⟨x^2⟩ − ⟨x⟩^2     = σ^2              = −1/(2c_2)
⟨x^3⟩ − ⟨x^2⟩⟨x⟩  = 2σ^2 µ           = 2c_1/(2c_2)^2
⟨x^4⟩ − ⟨x^2⟩^2   = 2σ^4 + 4µ^2 σ^2  = ( 2/(2c_2)^2 ) [ 1 − c_1^2/c_2 ]

A.2 One Dimensional Mixture of Gaussians

A.2.1 Density and Normalization

p(s) = Σ_k^K ( ρ_k/√(2πσ_k^2) ) exp( −(s − µ_k)^2/(2σ_k^2) )                         (561)

A.2.2 Moments

A useful fact of MoG is that

⟨x^n⟩ = Σ_k ρ_k ⟨x^n⟩_k                                                              (562)

where ⟨·⟩_k denotes the average with respect to the k.th component. We can calculate the first four moments from the densities

p(x) = Σ_k ρ_k (1/√(2πσ_k^2)) exp( −(x − µ_k)^2/(2σ_k^2) )                           (563)
p(x) = Σ_k ρ_k C_k exp( c_{k2} x^2 + c_{k1} x )                                      (564)

as

⟨x⟩   = Σ_k ρ_k µ_k
⟨x^2⟩ = Σ_k ρ_k (σ_k^2 + µ_k^2)
⟨x^3⟩ = Σ_k ρ_k (3σ_k^2 µ_k + µ_k^3)
⟨x^4⟩ = Σ_k ρ_k (µ_k^4 + 6µ_k^2 σ_k^2 + 3σ_k^4)

where, in the exponential parameterization (564), the corresponding expressions follow by substituting µ_k = −c_{k1}/(2c_{k2}) and σ_k^2 = −1/(2c_{k2}), as in Sec. A.1.5.

If all the Gaussians are centered, i.e. µ_k = 0 for all k, then

⟨x⟩   = 0
⟨x^2⟩ = Σ_k ρ_k σ_k^2    = Σ_k ρ_k ( −1/(2c_{k2}) )
⟨x^3⟩ = 0
⟨x^4⟩ = Σ_k ρ_k 3σ_k^4   = Σ_k ρ_k 3( 1/(2c_{k2}) )^2

From the un-centralized moments one can derive other entities like

⟨x^2⟩ − ⟨x⟩^2     = Σ_{k,k'} ρ_k ρ_{k'} [ µ_k^2 + σ_k^2 − µ_k µ_{k'} ]
⟨x^3⟩ − ⟨x^2⟩⟨x⟩  = Σ_{k,k'} ρ_k ρ_{k'} [ 3σ_k^2 µ_k + µ_k^3 − (σ_k^2 + µ_k^2) µ_{k'} ]
⟨x^4⟩ − ⟨x^2⟩^2   = Σ_{k,k'} ρ_k ρ_{k'} [ µ_k^4 + 6µ_k^2 σ_k^2 + 3σ_k^4 − (σ_k^2 + µ_k^2)(σ_{k'}^2 + µ_{k'}^2) ]
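Purely as an added numerical illustration of Sec. A.1.5 (not part of the original appendix), the sketch below evaluates the un-normalized density exp(c_2 x^2 + c_1 x) on a fine grid for one arbitrary choice of c_1 and c_2 < 0 and checks the first four moment formulas by simple numerical integration.

import numpy as np

c2, c1 = -0.7, 1.3                        # arbitrary choice with c2 < 0
mu, sigma2 = -c1 / (2 * c2), -1 / (2 * c2)

dx = 1e-3
x = np.arange(-20.0, 20.0, dx)
w = np.exp(c2 * x**2 + c1 * x)            # un-normalized density C exp(c2 x^2 + c1 x)
Z = w.sum() * dx                          # simple Riemann-sum normalization

def moment(k):
    # <x^k> under the normalized density, by numerical integration
    return (w * x**k).sum() * dx / Z

assert np.isclose(moment(1), mu)
assert np.isclose(moment(2), sigma2 + mu**2)
assert np.isclose(moment(3), 3 * sigma2 * mu + mu**3)
assert np.isclose(moment(4), mu**4 + 6 * mu**2 * sigma2 + 3 * sigma2**2)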
A.2.3 Derivatives

Defining p(s) = Σ_k ρ_k N_s(µ_k, σ_k^2), we get for a parameter θ_j of the j.th component

∂ ln p(s)/∂θ_j = ( ρ_j N_s(µ_j, σ_j^2) / Σ_k ρ_k N_s(µ_k, σ_k^2) ) ∂ ln( ρ_j N_s(µ_j, σ_j^2) )/∂θ_j        (565)

that is,

∂ ln p(s)/∂ρ_j = ( ρ_j N_s(µ_j, σ_j^2) / Σ_k ρ_k N_s(µ_k, σ_k^2) ) (1/ρ_j)                                 (566)
∂ ln p(s)/∂µ_j = ( ρ_j N_s(µ_j, σ_j^2) / Σ_k ρ_k N_s(µ_k, σ_k^2) ) (s − µ_j)/σ_j^2                         (567)
∂ ln p(s)/∂σ_j = ( ρ_j N_s(µ_j, σ_j^2) / Σ_k ρ_k N_s(µ_k, σ_k^2) ) (1/σ_j) [ (s − µ_j)^2/σ_j^2 − 1 ]       (568)

Note that ρ_k must be constrained to be proper ratios. Defining the ratios by ρ_j = e^{r_j} / Σ_k e^{r_k}, we obtain

∂ ln p(s)/∂r_j = Σ_l ( ∂ ln p(s)/∂ρ_l )( ∂ρ_l/∂r_j )   where   ∂ρ_l/∂r_j = ρ_l (δ_{lj} − ρ_j)              (569)

B Proofs and Details

B.1 Misc Proofs

B.1.1 Proof of Equation 524

The following proof is the work of Florian Roemer. Note that the vectors and matrices below can be complex and that the notation X^H is used for the transposed and conjugated matrix, while X^T is only the transpose of the complex matrix.

Define the row vector y = a^H X B and the column vector z = X^H c. Then

a^T X B X^T c = yz = z^T y^T

Note that y can be rewritten as vec(y)^T, which is the same as

vec(conj(y))^H = vec(a^T conj(X) conj(B))^H

where "conj" means complex conjugated. Applying the vec rule for linear forms, Eq. 520, we get

y = ( (B^H ⊗ a^T) vec(conj(X)) )^H = vec(X)^T (B ⊗ conj(a))

where we have also used the rule for the transpose of Kronecker products. For y^T this yields (B^T ⊗ a^H) vec(X). Similarly we can rewrite z, which is the same as vec(z^T) = vec(c^T conj(X)). Applying again Eq. 520, we get

z = (I ⊗ c^T) vec(conj(X))

where I is the identity matrix. For z^T we obtain vec(X)^H (I ⊗ c). Finally, the original expression is z^T y^T, which now takes the form

vec(X)^H (I ⊗ c)(B^T ⊗ a^H) vec(X)

The final step is to apply the rule for products of Kronecker products and by that combine the Kronecker products. This gives

vec(X)^H (B^T ⊗ c a^H) vec(X)

which is the desired result.

B.1.2 Proof of Equation 493

For any analytical function f(X) of a matrix argument X, it holds that

f(AB)A = ( Σ_{n=0}^∞ c_n (AB)^n ) A
       = Σ_{n=0}^∞ c_n (AB)^n A
       = Σ_{n=0}^∞ c_n A (BA)^n
       = A Σ_{n=0}^∞ c_n (BA)^n
       = A f(BA)

B.1.3 Proof of Equation 91

Essentially we need to calculate

∂(X^n)_{kl}/∂X_{ij}
  = ∂/∂X_{ij} Σ_{u_1,...,u_{n−1}} X_{k,u_1} X_{u_1,u_2} ... X_{u_{n−1},l}
  = δ_{k,i} δ_{u_1,j} X_{u_1,u_2} ... X_{u_{n−1},l}
    + X_{k,u_1} δ_{u_1,i} δ_{u_2,j} ... X_{u_{n−1},l}
    + ...
    + X_{k,u_1} X_{u_1,u_2} ... δ_{u_{n−1},i} δ_{l,j}
  = Σ_{r=0}^{n−1} (X^r)_{ki} (X^{n−1−r})_{jl}
  = Σ_{r=0}^{n−1} (X^r J^{ij} X^{n−1−r})_{kl}

Using the properties of the single entry matrix found in Sec. 9.7.4, the result follows easily.

B.1.4 Details on Eq. 571

∂ det(X^H A X) = det(X^H A X) Tr[ (X^H A X)^{-1} ∂(X^H A X) ]
  = det(X^H A X) Tr[ (X^H A X)^{-1} ( ∂(X^H) A X + X^H ∂(A X) ) ]
  = det(X^H A X) ( Tr[ (X^H A X)^{-1} ∂(X^H) A X ] + Tr[ (X^H A X)^{-1} X^H ∂(A X) ] )
  = det(X^H A X) ( Tr[ A X (X^H A X)^{-1} ∂(X^H) ] + Tr[ (X^H A X)^{-1} X^H A ∂(X) ] )

First, the derivative is found with respect to the real part of X:

∂ det(X^H A X)/∂ℜX
  = det(X^H A X) ( ∂ Tr[ A X (X^H A X)^{-1} ∂(X^H) ]/∂ℜX + ∂ Tr[ (X^H A X)^{-1} X^H A ∂(X) ]/∂ℜX )
  = det(X^H A X) ( A X (X^H A X)^{-1} + ( (X^H A X)^{-1} X^H A )^T )

Through the calculations, (100) and (240) were used. In addition, by use of (241), the derivative is found with respect to the imaginary part of X:

i ∂ det(X^H A X)/∂ℑX
  = i det(X^H A X) ( ∂ Tr[ A X (X^H A X)^{-1} ∂(X^H) ]/∂ℑX + ∂ Tr[ (X^H A X)^{-1} X^H A ∂(X) ]/∂ℑX )
  = det(X^H A X) ( A X (X^H A X)^{-1} − ( (X^H A X)^{-1} X^H A )^T )

Hence, the derivative yields

∂ det(X^H A X)/∂X = (1/2) ( ∂ det(X^H A X)/∂ℜX − i ∂ det(X^H A X)/∂ℑX )
                  = det(X^H A X) ( (X^H A X)^{-1} X^H A )^T

and the complex conjugate derivative yields

∂ det(X^H A X)/∂X* = (1/2) ( ∂ det(X^H A X)/∂ℜX + i ∂ det(X^H A X)/∂ℑX )
                   = det(X^H A X) A X (X^H A X)^{-1}

Notice, for real X, A, the sum of (249) and (250) is reduced to (54). Similar calculations yield

∂ det(X A X^H)/∂X = (1/2) ( ∂ det(X A X^H)/∂ℜX − i ∂ det(X A X^H)/∂ℑX )
                  = det(X A X^H) ( A X^H (X A X^H)^{-1} )^T                          (570)

and

∂ det(X A X^H)/∂X* = (1/2) ( ∂ det(X A X^H)/∂ℜX + i ∂ det(X A X^H)/∂ℑX )
                   = det(X A X^H) (X A X^H)^{-1} X A                                 (571)
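As an added numerical illustration of Eq. 493, proved in B.1.2 above and not part of the original proofs, the sketch below checks f(AB)A = Af(BA) for the polynomial f(X) = I + 2X + 3X^2 and rectangular A and B of compatible sizes.

import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 5))
B = rng.standard_normal((5, 3))

def f(X):
    # f(X) = I + 2 X + 3 X^2, an analytic (polynomial) matrix function
    n = X.shape[0]
    return np.eye(n) + 2 * X + 3 * X @ X

# (493): f(AB) A = A f(BA); note that f(AB) is 3x3 while f(BA) is 5x5
assert np.allclose(f(A @ B) @ A, A @ f(B @ A))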
References

[1] Karl Gustav Andersson and Lars-Christer Boiers. Ordinära differentialekvationer. Studentlitteratur, 1992.
[2] Jörn Anemüller, Terrence J. Sejnowski, and Scott Makeig. Complex independent component analysis of frequency-domain electroencephalographic data. Neural Networks, 16(9):1311–1323, November 2003.
[3] S. Barnett. Matrices: Methods and Applications. Oxford Applied Mathematics and Computing Science Series. Clarendon Press, 1990.
[4] Christopher Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.
[5] Robert J. Boik. Lecture notes: Statistics 550. Online, April 22, 2002. Notes.
[6] D. H. Brandwood. A complex gradient operator and its application in adaptive array theory. IEE Proceedings, Pts. F and H, 130(1):11–16, February 1983.
[7] M. Brookes. Matrix Reference Manual, 2004. Website, May 20, 2004.
[8] K. Conradsen. En introduktion til statistik. IMM lecture notes, 1984.
[9] Mads Dyrholm. Some matrix results, 2004. Website, August 23, 2004.
[10] F. A. Nielsen. Formula. Neuro Research Unit and Technical University of Denmark, 2002.
[11] A. B. Gelman, J. S. Carlin, H. S. Stern, and D. B. Rubin. Bayesian Data Analysis. Chapman and Hall / CRC, 1995.
[12] Gene H. Golub and Charles F. van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, 3rd edition, 1996.
[13] Robert M. Gray. Toeplitz and circulant matrices: A review. Technical report, Information Systems Laboratory, Department of Electrical Engineering, Stanford University, Stanford, California 94305, August 2002.
[14] Simon Haykin. Adaptive Filter Theory. Prentice Hall, Upper Saddle River, NJ, 4th edition, 2002.
[15] Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, 1985.
[16] K. V. Mardia, J. T. Kent, and J. M. Bibby. Multivariate Analysis. Academic Press Ltd., 1979.
[17] Mathpages on "Eigenvalue Problems and Matrix Invariants", http://www.mathpages.com/home/kmath128.htm
[18] Carl D. Meyer. Generalized inversion of modified matrices. SIAM Journal of Applied Mathematics, 24(3):315–323, May 1973.
[19] Thomas P. Minka. Old and new matrix algebra useful for statistics, December 2000. Notes.
[20] Daniele Mortari. Ortho-Skew and Ortho-Sym Matrix Trigonometry. John Lee Junkins Astrodynamics Symposium, AAS 03-265, May 2003. Texas A&M University, College Station, TX.
[21] L. Parra and C. Spence. Convolutive blind separation of non-stationary sources. In IEEE Transactions on Speech and Audio Processing, pages 320–327, May 2000.
[22] Kaare Brandt Petersen, Jiucang Hao, and Te-Won Lee. Generative and filtering approaches for overcomplete representations. Neural Information Processing - Letters and Reviews, vol. 8(1), 2005.
[23] John G. Proakis and Dimitris G. Manolakis. Digital Signal Processing. Prentice-Hall, 1996.
[24] Laurent Schwartz. Cours d'Analyse, volume II. Hermann, Paris, 1967. As referenced in [14].
[25] Shayle R. Searle. Matrix Algebra Useful for Statistics. John Wiley and Sons, 1982.
[26] G. Seber and A. Lee. Linear Regression Analysis. John Wiley and Sons, 2002.
[27] S. M. Selby. Standard Mathematical Tables. CRC Press, 1974.
[28] Inna Stainvas. Matrix algebra in differential calculus. Neural Computing Research Group, Information Engineering, Aston University, UK, August 2002. Notes.
[29] P. P. Vaidyanathan. Multirate Systems and Filter Banks. Prentice Hall, 1993.
[30] Max Welling. The Kalman Filter. Lecture note.
[31] Wikipedia on minors: "Minor (linear algebra)", http://en.wikipedia.org/wiki/Minor_(linear_algebra)
[32] Zhaoshui He, Shengli Xie, et al. Convolutive blind source separation in frequency domain based on sparse representation. IEEE Transactions on Audio, Speech and Language Processing, vol. 15(5):1551–1563, July 2007.
[33] Karim T. Abou-Moustafa. On Derivatives of Eigenvalues and Eigenvectors of the Generalized Eigenvalue Problem. McGill Technical Report, October 2010.
[34] Mohammad Emtiyaz Khan. Updating Inverse of a Matrix When a Column is Added/Removed. CS, UBC, February 27, 2008.

Index

Anti-symmetric · Block matrix · Chain rule · Cholesky-decomposition · Co-kurtosis · Co-skewness · Condition number · Cramers Rule
Derivative of a complex matrix · Derivative of a determinant · Derivative of a trace · Derivative of an inverse · Derivative of symmetric matrix · Derivatives of Toeplitz matrix · Dirichlet distribution
Eigenvalues · Eigenvectors · Exponential Matrix Function
Gaussian, conditional · Gaussian, entropy · Gaussian, linear combination · Gaussian, marginal · Gaussian, product of densities · Generalized inverse
Hadamard inequality · Hermitian · Idempotent · Kronecker product
LDL decomposition · LDM-decomposition · Linear regression · LU decomposition · Lyapunov Equation
Moore-Penrose inverse · Multinomial distribution · Nilpotent · Norm of a matrix · Norm of a vector · Normal-Inverse Gamma distribution · Normal-Inverse Wishart distribution
Orthogonal · Power series of matrices · Probability matrix · Pseudo-inverse
Schur complement · Single entry matrix · Singular Valued Decomposition (SVD) · Skew-Hermitian · Skew-symmetric · Stochastic matrix · Student-t · Sylvester's Inequality · Symmetric
Taylor expansion · Toeplitz matrix · Transition matrix · Trigonometric functions
Unipotent · Vandermonde matrix · Vec operator · Wishart distribution · Woodbury identity