MATRIX ALGEBRA FOR ECONOMETRICS

Mei-Yuan Chen
Department of Finance
National Chung Hsing University
July 2, 2003
© Mei-Yuan Chen. The LaTeX source file is mat-alg1.tex.

Contents

1 Vectors
1.1 Vector Operations
1.2 Inner Products
1.3 Unit Vectors
1.4 Direction Cosines
1.5 Statistical Applications

2 Vector Spaces
2.1 The Dimension of a Vector Space
2.2 The Sum and Direct Sum of Vector Spaces
2.3 Orthogonal Basis Vectors
2.4 Orthogonal Projection of a Vector
2.5 Statistical Applications

3 Matrices
3.1 Matrix Operations
3.2 Matrix Scalar Functions
3.3 Matrix Rank
3.4 Matrix Inversion
3.5 Statistical Applications

4 Linear Transformations and Systems of Linear Equations
4.1 Systems of Linear Equations
4.2 Linear Transformations

5 Special Matrices
5.1 Symmetric Matrices
5.2 Quadratic Forms and Definite Matrices
5.3 Differentiation Involving Vectors and Matrices
5.4 Idempotent Matrices
5.5 Orthogonal Matrices
5.6 Projection Matrices
5.7 Partitioned Matrices
5.8 Statistical Applications

6 Eigenvalues and Eigenvectors
6.1 Eigenvalues and Eigenvectors
6.2 Some Basic Properties of Eigenvalues and Eigenvectors
6.3 Diagonalization
6.4 Orthogonal Diagonalization

1 Vectors

An n-dimensional vector in $\mathbb{R}^n$ is a collection of $n$ real numbers. An n-dimensional row vector is written as $(u_1, u_2, \ldots, u_n)$, and the corresponding column vector is $(u_1, u_2, \ldots, u_n)'$. Clearly, a vector reduces to a scalar when $n = 1$. Note that a vector is completely determined by its magnitude and direction, e.g., force and velocity. This is in contrast with quantities such as area and length, for which direction is of no relevance. A vector can also be interpreted as a point in a system of coordinate axes, so that its components $u_i$ represent the corresponding coordinates. In what follows, vectors are denoted by English and Greek letters in boldface.

[Figure: two vectors $u = (u_1, u_2)$ and $v = (v_1, v_2)$ in the plane, their difference $u - v = (u_1 - v_1, u_2 - v_2)$, and the angle $\theta$ between them.]

1.1 Vector Operations

Consider vectors u, v, and w in $\mathbb{R}^n$ and scalars h and k. Two vectors u and v are said to be equal if they are the same componentwise, i.e., $u_i = v_i$, $i = 1, \ldots, n$. The sum of u and v is defined as $u + v = (u_1 + v_1, u_2 + v_2, \ldots, u_n + v_n)$, and the scalar multiple of u is $hu = (hu_1, hu_2, \ldots, hu_n)$. Moreover,
$u + v = v + u$;
$u + (v + w) = (u + v) + w$;
$h(ku) = (hk)u$;
$h(u + v) = hu + hv$;
$(h + k)u = hu + ku$.
The zero vector 0 is the vector with all elements zero, so that for any vector u, $u + 0 = u$. The negative (additive inverse) of a vector u is $-u$, so that $u + (-u) = 0$.

1.2 Inner Products

The Euclidean inner product of u and v is
$u \cdot v = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n.$
Inner products have the following properties:
$u \cdot v = v \cdot u$;
$(u + v) \cdot w = u \cdot w + v \cdot w$;
$(hu) \cdot v = h(u \cdot v)$;
$u \cdot u \ge 0$, and $u \cdot u = 0$ if and only if $u = 0$.
The Euclidean norm (or 2-norm) of u is defined as
$\|u\| = (u_1^2 + u_2^2 + \cdots + u_n^2)^{1/2} = (u \cdot u)^{1/2}.$
The Euclidean distance between u and v is
$d(u, v) = \|u - v\| = [(u_1 - v_1)^2 + (u_2 - v_2)^2 + \cdots + (u_n - v_n)^2]^{1/2}.$
Note that "norm" is a generalization of the usual notion of "length". In addition, the term "inner" indicates that the dimension of the two vectors is reduced to $1 \times 1$ after the inner product is taken.
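These operations translate directly into array code. The following NumPy sketch is not part of the original notes; the vectors are arbitrary illustrative values.

```python
import numpy as np

u = np.array([1.0, 2.0, 2.0])   # illustrative vectors, not from the notes
v = np.array([3.0, 0.0, -4.0])
h = 2.5

print(u + v)                      # vector sum
print(h * u)                      # scalar multiple
print(np.dot(u, v))               # Euclidean inner product u . v
print(np.linalg.norm(u))          # Euclidean norm ||u|| = (u . u)^(1/2)
print(np.linalg.norm(u - v))      # Euclidean distance d(u, v)

# a few of the listed identities
assert np.isclose(np.dot(u, v), np.dot(v, u))            # u.v = v.u
assert np.isclose(np.dot(u, u), np.linalg.norm(u) ** 2)  # u.u = ||u||^2
assert np.allclose(h * (u + v), h * u + h * v)           # h(u+v) = hu + hv
```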
Some other commonly used vector norms are as follows. The sum norm (or 1-norm) of u is defined by
$\|u\|_1 = \sum_{i=1}^n |u_i|.$
Both the sum norm and the Euclidean norm are members of the family of p-norms, defined as
$\|u\|_p = \Big( \sum_{i=1}^n |u_i|^p \Big)^{1/p},$
where $p \ge 1$. The infinity norm (or max norm) is given by
$\|u\|_\infty = \max_{1 \le i \le n} |u_i|.$
Note that for any norm and any scalar h, $\|hu\| = |h|\,\|u\|$.

1.3 Unit Vectors

A vector is said to be a unit vector if it has norm one. Two vectors are said to be orthogonal if their inner product is zero (see also Section 1.4). For example, (1, 0, 0), (0, 1, 0), (0, 0, 1), and (0.267, 0.534, 0.802) are all unit vectors in $\mathbb{R}^3$, but only the first three are mutually orthogonal. Orthogonal unit vectors are also known as orthonormal vectors. It is also easy to see that any non-zero vector can be normalized to unit length, i.e., for any $u \ne 0$, $u/\|u\|$ has norm one. Any n-dimensional vector can be represented as a linear combination of n orthonormal vectors:
$u = (u_1, u_2, \ldots, u_n) = (u_1, 0, \ldots, 0) + (0, u_2, 0, \ldots, 0) + \cdots + (0, \ldots, 0, u_n) = u_1 (1, 0, \ldots, 0) + u_2 (0, 1, \ldots, 0) + \cdots + u_n (0, \ldots, 0, 1).$
Hence, orthonormal vectors can be viewed as orthogonal coordinate axes of unit length. We could, of course, change the coordinate system without affecting the vector. For example, we can express $u = (1, 1/2)$ in terms of the two orthogonal vectors (2, 0) and (0, 3); its coordinates in that system are (1/2, 1/6), since $u = (1/2)(2, 0) + (1/6)(0, 3)$.

1.4 Direction Cosines

Given u in $\mathbb{R}^n$, let $\theta_i$ denote the angle between u and the ith axis. The direction cosines are $\cos\theta_i = u_i/\|u\|$, $i = 1, \ldots, n$. Clearly,
$\sum_{i=1}^n \cos^2\theta_i = \sum_{i=1}^n u_i^2/\|u\|^2 = 1.$
Note that for any non-zero scalar h, hu has direction cosines $\cos\theta_i = hu_i/\|hu\| = \pm(u_i/\|u\|)$, $i = 1, \ldots, n$. That is, direction cosines are independent of vector magnitude; only the sign of h (the direction) matters.

Let u and v be two vectors in $\mathbb{R}^n$ with direction cosines $r_i$ and $s_i$, $i = 1, \ldots, n$, respectively. Also let $\theta$ denote the angle between u and v. Then by the law of cosines,
$\cos\theta = \frac{\|u\|^2 + \|v\|^2 - \|u - v\|^2}{2\,\|u\|\,\|v\|}.$
Using the definition of direction cosines, the numerator can be expressed as
$\|u\|^2 + \|v\|^2 - \sum_{i=1}^n u_i^2 - \sum_{i=1}^n v_i^2 + 2\sum_{i=1}^n u_i v_i = \|u\|^2 + \|v\|^2 - \|u\|^2 \sum_{i=1}^n r_i^2 - \|v\|^2 \sum_{i=1}^n s_i^2 + 2\sum_{i=1}^n u_i v_i = 2\sum_{i=1}^n u_i v_i.$
Hence,
$\cos\theta = \frac{u_1 v_1 + u_2 v_2 + \cdots + u_n v_n}{\|u\|\,\|v\|}.$
We have proved:

Theorem 1.1 For two vectors u and v in $\mathbb{R}^n$, $u \cdot v = \|u\|\,\|v\| \cos\theta$, where $\theta$ is the angle between u and v.

An alternative proof of this theorem is as follows. Write $u/\|u\| = (\cos\alpha, \sin\alpha)$ and $v/\|v\| = (\cos\beta, \sin\beta)$. Then the inner product of these two unit vectors is
$(u/\|u\|) \cdot (v/\|v\|) = \cos\alpha\cos\beta + \sin\alpha\sin\beta = \cos(\beta - \alpha),$
where $\beta > \alpha$, so that $\beta - \alpha$ is the angle between $u/\|u\|$ and $v/\|v\|$.

When $\theta = 0$ (or $\pi$), u and v lie on the same "line" and have the same (or opposite) direction. In this case, u and v are said to be linearly dependent (collinear) and $u = hv$ for some scalar h. When $\theta = \pi/2$, u and v are said to be orthogonal. Therefore, two non-zero vectors u and v are orthogonal if and only if $u \cdot v = 0$. As $-1 \le \cos\theta \le 1$, we immediately have from Theorem 1.1:

Theorem 1.2 (Cauchy-Schwarz Inequality) Given two vectors u and v, $|u \cdot v| \le \|u\|\,\|v\|$; the equality holds when u and v are linearly dependent.

By the Cauchy-Schwarz inequality,
$\|u + v\|^2 = (u \cdot u) + (v \cdot v) + 2(u \cdot v) \le \|u\|^2 + \|v\|^2 + 2|u \cdot v| \le \|u\|^2 + \|v\|^2 + 2\|u\|\,\|v\| = (\|u\| + \|v\|)^2.$
This establishes the following inequality.

Theorem 1.3 (Triangle Inequality) Given two vectors u and v, $\|u + v\| \le \|u\| + \|v\|$; the equality holds when $u = hv$ with $h > 0$.

If u and v are orthogonal, we have
$\|u + v\|^2 = \|u\|^2 + \|v\|^2,$
the generalized Pythagoras theorem.
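Theorem 1.1 and the two inequalities are easy to confirm numerically. A minimal NumPy sketch, not part of the original notes, with arbitrary illustrative vectors:

```python
import numpy as np

u = np.array([2.0, 1.0, -1.0])   # arbitrary illustrative vectors
v = np.array([1.0, 3.0, 2.0])

nu, nv = np.linalg.norm(u), np.linalg.norm(v)

# Theorem 1.1: u.v = ||u|| ||v|| cos(theta)
cos_theta = np.dot(u, v) / (nu * nv)
theta = np.arccos(cos_theta)
print("angle between u and v (radians):", theta)
assert np.isclose(np.dot(u, v), nu * nv * np.cos(theta))

# Theorem 1.2 (Cauchy-Schwarz) and Theorem 1.3 (triangle inequality)
assert abs(np.dot(u, v)) <= nu * nv + 1e-12
assert np.linalg.norm(u + v) <= nu + nv + 1e-12

# direction cosines of u sum to one in squares
print(np.sum((u / nu) ** 2))   # = 1
```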
1.5 Statistical Applications

Given a random variable X with n observations $x_1, \ldots, x_n$, different statistics can be used to summarize the information contained in this sample. An important statistic is the sample average of the $x_i$, which shows the "central tendency" of these observations:
$\bar{x} := \frac{1}{n}\sum_{i=1}^n x_i = \frac{1}{n}(x \cdot \mathbf{1}),$
where x is the vector of the n observations and $\mathbf{1}$ is the vector of ones. Another important statistic is the sample variance of the $x_i$, which describes the "dispersion" of these observations. Let $x^* = x - \bar{x}\mathbf{1}$ be the vector of deviations from the sample average. Then the sample variance is
$s_x^2 := \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})^2 = \frac{\|x^*\|^2}{n-1}.$
In contrast with the sample second moment of x,
$\frac{1}{n}\sum_{i=1}^n x_i^2 = \frac{\|x\|^2}{n},$
$s_x^2$ is invariant with respect to scalar addition. Also note that the sample variance is divided by $n - 1$ rather than n. This is because
$\sum_{i=1}^n (x_i - \bar{x}) = 0,$
so that any component of $x^*$ depends on the remaining $n - 1$ components. The square root of $s_x^2$ is called the standard deviation of x, denoted $s_x$. For two random variables X and Y with observation vectors x and y, their sample covariance characterizes the co-variation of these observations:
$s_{x,y} := \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) = \frac{1}{n-1}(x^* \cdot y^*),$
where $y^* = y - \bar{y}\mathbf{1}$. This statistic is again invariant with respect to scalar addition. Both the sample variance and covariance are not invariant with respect to constant multiplication. The sample covariance normalized by the corresponding standard deviations is called the sample correlation coefficient:
$r_{x,y} := \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{[\sum_{i=1}^n (x_i - \bar{x})^2]^{1/2}\,[\sum_{i=1}^n (y_i - \bar{y})^2]^{1/2}} = \frac{x^* \cdot y^*}{\|x^*\|\,\|y^*\|} = \cos\theta^*,$
where $\theta^*$ is the angle between $x^*$ and $y^*$. Clearly, $r_{x,y}$ is scale invariant and bounded between $-1$ and 1.

5 Special Matrices

In what follows, we treat an n-dimensional vector as an $n \times 1$ matrix and use these terms interchangeably. We also let span(A) denote the column space of A; thus, span(A') is the row space of A.

5.1 Symmetric Matrices

We have learned that a matrix A is symmetric if $A = A'$. Let A be an $n \times k$ matrix. Then A'A is a $k \times k$ symmetric matrix with (i, j)th element $a_{\cdot i}' a_{\cdot j}$, where $a_{\cdot i}$ is the ith column of A. If $A'A = 0$, then all the main diagonal elements are $a_{\cdot i}' a_{\cdot i} = \|a_{\cdot i}\|^2 = 0$. It follows that all columns of A are the zero vector and hence $A = 0$. As A'A is symmetric, its row space and column space are the same. If $x \in \mathrm{span}(A'A)^\perp$, i.e., $(A'A)x = 0$, then $x'(A'A)x = (Ax)'(Ax) = 0$, so that $Ax = 0$. That is, x is orthogonal to every row vector of A, and $x \in \mathrm{span}(A')^\perp$. This shows $\mathrm{span}(A'A)^\perp \subset \mathrm{span}(A')^\perp$. Conversely, if $Ax = 0$, then $(A'A)x = 0$, so $\mathrm{span}(A')^\perp \subset \mathrm{span}(A'A)^\perp$. We have established:

Theorem 5.1 Let A be an $n \times k$ matrix. Then the row space of A is the same as the row space of A'A. Similarly, the column space of A is the same as the column space of AA'.

It follows from Theorem 3.2 and Theorem 5.1 that:

Theorem 5.2 Let A be an $n \times k$ matrix. Then rank(A) = rank(A'A) = rank(AA').

In particular, if A is of rank $k < n$, then A'A is of full rank k, so that A'A is nonsingular, but AA' is not of full rank and hence is a singular matrix.
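Theorem 5.2 is easy to check numerically. A NumPy sketch, not part of the original notes, using an arbitrary full-column-rank matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 3
A = rng.standard_normal((n, k))      # generic n x k matrix, rank k < n

# Theorem 5.2: rank(A) = rank(A'A) = rank(AA')
print(np.linalg.matrix_rank(A),
      np.linalg.matrix_rank(A.T @ A),
      np.linalg.matrix_rank(A @ A.T))             # 3 3 3

# A'A (k x k) is nonsingular, while AA' (n x n) is singular when k < n
print(np.linalg.det(A.T @ A))                      # nonzero
print(np.isclose(np.linalg.det(A @ A.T), 0.0))     # True
```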
5.2 Quadratic Forms and Definite Matrices

Recall that a second-order polynomial in the variables $x_1, \ldots, x_n$ is
$\sum_{i=1}^n \sum_{j=1}^n a_{ij} x_i x_j,$
which can be expressed as a quadratic form x'Ax, where x is $n \times 1$ and A is $n \times n$. We know that an arbitrary square matrix A can be written as the sum of a symmetric matrix S and a skew-symmetric matrix S*. It is easy to verify that $x'S^*x = 0$. (Check!) It is therefore without loss of generality to consider quadratic forms x'Ax with a symmetric A.

The quadratic form x'Ax is said to be positive definite (semi-definite) if and only if $x'Ax > 0$ ($\ge 0$) for all $x \ne 0$. A square matrix A is said to be positive definite if its quadratic form is positive definite. Similarly, a matrix A is said to be negative definite (semi-definite) if and only if $x'Ax < 0$ ($\le 0$) for all $x \ne 0$. A matrix that is not definite or semi-definite is indefinite. A symmetric and positive semi-definite matrix is also known as a Grammian matrix.

It can be shown that A is positive definite if and only if all the leading principal minors,
$\det(a_{11}), \quad \det\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \quad \det\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}, \ \ldots, \ \det(A),$
are positive; A is negative definite if and only if the leading principal minors alternate in sign:
$\det(a_{11}) < 0, \quad \det\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} > 0, \quad \det\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} < 0, \ \ldots$
Thus, a positive (negative) definite matrix must be nonsingular, but a positive determinant is not sufficient for a positive definite matrix. The difference between a positive (negative) definite matrix and a positive (negative) semi-definite matrix is that the latter may be singular. (Why?)

Theorem 5.3 Let A be positive definite and B be nonsingular. Then B'AB is also positive definite.

Proof: For any $n \times 1$ vector $y \ne 0$, there exists $x \ne 0$ such that $B^{-1}x = y$. Hence, $y'(B'AB)y = x'(B^{-1})'(B'AB)B^{-1}x = x'Ax > 0$. □

It follows that if A is positive definite, then $A^{-1}$ exists and, since $A^{-1} = A^{-1}A(A')^{-1}$ is of the form B'AB with $B = (A')^{-1}$ nonsingular, $A^{-1}$ is also positive definite. It can be shown that a symmetric matrix is positive definite if and only if it can be factored as P'P, where P is a nonsingular matrix. Let A be a symmetric and positive definite matrix, so that $A = P'P$ and $A^{-1} = P^{-1}(P')^{-1}$.

5.3 Differentiation Involving Vectors and Matrices

Let x be an $n \times 1$ vector, f(x) a real-valued function, and $\mathbf{f}(x)$ a vector-valued function with elements $f_1(x), \ldots, f_m(x)$. Then
$\nabla_x f(x) = \begin{pmatrix} \partial f(x)/\partial x_1 \\ \partial f(x)/\partial x_2 \\ \vdots \\ \partial f(x)/\partial x_n \end{pmatrix}, \qquad \nabla_x \mathbf{f}(x) = \begin{pmatrix} \partial f_1(x)/\partial x_1 & \partial f_2(x)/\partial x_1 & \cdots & \partial f_m(x)/\partial x_1 \\ \partial f_1(x)/\partial x_2 & \partial f_2(x)/\partial x_2 & \cdots & \partial f_m(x)/\partial x_2 \\ \vdots & \vdots & & \vdots \\ \partial f_1(x)/\partial x_n & \partial f_2(x)/\partial x_n & \cdots & \partial f_m(x)/\partial x_n \end{pmatrix}.$
Some particular examples are:
$f(x) = a'x$: $\nabla_x f(x) = a$.
$f(x) = x'Ax$ with A an $n \times n$ symmetric matrix: as
$x'Ax = \sum_{i=1}^n a_{ii} x_i^2 + 2\sum_{i=1}^n \sum_{j=i+1}^n a_{ij} x_i x_j,$
$\nabla_x f(x) = 2Ax$.
$\mathbf{f}(x) = Ax$ with A an $m \times n$ matrix: $\nabla_x \mathbf{f}(x) = A'$.
$f(X) = \mathrm{trace}(X)$ with X an $n \times n$ matrix: $\nabla_X f(X) = I_n$.
$f(X) = \det(X)$ with X an $n \times n$ matrix: $\nabla_X f(X) = \det(X)(X^{-1})'$.
When $f(x) = x'Ax$ with A a symmetric matrix, the matrix of second-order derivatives, also known as the Hessian matrix, is
$\nabla^2_x f(x) = \nabla_x(\nabla_x f(x)) = \nabla_x(2Ax) = 2A.$
Analogous to the standard optimization problem, a necessary condition for maximizing (minimizing) the quadratic form $f(x) = x'Ax$ is $\nabla_x f(x) = 0$, and a sufficient condition for a maximum (minimum) is that the Hessian matrix is negative (positive) definite.
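The leading-principal-minor test and the gradient formula $\nabla_x(x'Ax) = 2Ax$ can both be checked numerically. A short sketch, not part of the original notes; the symmetric matrix below is an arbitrary illustrative choice.

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])   # arbitrary symmetric matrix used for illustration

# leading principal minors: all positive <=> A is positive definite
minors = [np.linalg.det(A[:r, :r]) for r in range(1, A.shape[0] + 1)]
print(minors, all(m > 0 for m in minors))

# gradient of f(x) = x'Ax is 2Ax; compare with a central finite difference
x = np.array([1.0, -2.0, 0.5])
f = lambda z: z @ A @ z
eps = 1e-6
num_grad = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                     for e in np.eye(3)])
print(np.allclose(num_grad, 2 * A @ x, atol=1e-4))   # True
```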
5.4 Idempotent Matrices

A square matrix A is said to be idempotent if $A = A^2$. Let B and C be two $n \times k$ matrices with rank $k < n$. An idempotent matrix can be constructed as $B(C'B)^{-1}C'$; in particular, $B(B'B)^{-1}B'$ and I are idempotent. We observe that if an idempotent matrix A is nonsingular, then
$I = AA^{-1} = A^2 A^{-1} = A(AA^{-1}) = A.$
That is, all idempotent matrices are singular except the identity matrix I. It is also easy to see that if A is idempotent, then so is $I - A$.

5.5 Orthogonal Matrices

A square matrix A is orthogonal if $A'A = AA' = I$, i.e., $A^{-1} = A'$. Clearly, when A is orthogonal, $a_i'a_j = 0$ for $i \ne j$ and $a_i'a_i = 1$. That is, the column (row) vectors of an orthogonal matrix are orthonormal. For example, the rotation matrices from Section 4.2,
$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}, \qquad \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix},$
are orthogonal matrices. Note that when A is an orthogonal matrix,
$1 = \det(I) = \det(A)\det(A') = (\det(A))^2,$
so that $\det(A) = \pm 1$. Hence, if A is orthogonal, $\det(ABA') = \det(B)$ for any square matrix B. If A is orthogonal and B is idempotent, then ABA' is also idempotent because
$(ABA')(ABA') = AB(A'A)BA' = ABBA' = ABA'.$
That is, pre- and post-multiplying a matrix by orthogonal matrices A and A' preserves determinant and idempotency. Also, the product of orthogonal matrices is again an orthogonal matrix. (Verify!)

Given two vectors u and v, consider their orthogonal transformations Au and Av. It is easy to see that $(Au)'(Av) = u'A'Av = u'v$ and that $\|Au\| = \|u\|$. Hence, orthogonal transformations preserve inner products, norms, angles, and distances. These in turn preserve sample variances, covariances, and correlation coefficients.

5.6 Projection Matrices

Let $V = V_1 \oplus V_2$ be a vector space. From Corollary 2.7, we can write y in V as $y = y_1 + y_2$, where $y_1 \in V_1$ and $y_2 \in V_2$. For a matrix P, the transformation $Py = y_1$ is called the projection of y onto $V_1$ along $V_2$ if and only if $Py_1 = y_1$. The matrix P is called a projection matrix. The projection is said to be an orthogonal projection if and only if $V_1$ and $V_2$ are orthogonal complements. In this case, P is called an orthogonal projection matrix, which projects vectors orthogonally onto the subspace $V_1$.

Theorem 5.4 P is a projection matrix if and only if P is idempotent.

Proof: Let y be a non-zero vector. Given that P is a projection matrix, $Py = y_1 = Py_1 = P^2 y$, so that $(P - P^2)y = 0$. As y is arbitrary, we have $P = P^2$, an idempotent matrix. Conversely, if $P = P^2$, then $Py_1 = P^2 y = Py = y_1$. Hence, P is a projection matrix. □

Let $p_i$ denote the ith column of P. By idempotency of P, $Pp_i = p_i$, so that $p_i$ must be in $V_1$, the space onto which the projection is made.

Theorem 5.5 A matrix P is an orthogonal projection matrix if and only if P is symmetric and idempotent.

Proof: Note that $y_2 = y - y_1 = (I - P)y$. If $y_1$ is the orthogonal projection of y, then
$0 = y_1'y_2 = y'P'(I - P)y.$
Hence, $P'(I - P) = 0$, so that $P' = P'P$ and $P = P'P$. This shows that P is symmetric. Idempotency follows from the proof of Theorem 5.4. Conversely, if P is symmetric and idempotent,
$y_1'y_2 = y'P'(y - y_1) = y'(Py - P^2 y) = y'(P - P^2)y = 0.$
This shows that the projection is orthogonal. □

It is readily verified that $P = A(A'A)^{-1}A'$ is an orthogonal projection matrix, where A is an $n \times k$ matrix with full column rank. Moreover, we have the following result; see Exercise 5.7.

Theorem 5.6 Given an $n \times k$ matrix A with full column rank, $P = A(A'A)^{-1}A'$ orthogonally projects vectors in $\mathbb{R}^n$ onto span(A).

Clearly, if P is an orthogonal projection matrix, then so is $I - P$, which orthogonally projects vectors in $\mathbb{R}^n$ onto $\mathrm{span}(A)^\perp$. While span(A) is k-dimensional, $\mathrm{span}(A)^\perp$ is $(n - k)$-dimensional by Theorem 2.6.
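The properties of $P = A(A'A)^{-1}A'$ from Sections 5.4-5.6 (symmetry, idempotency, orthogonality of the residual part) are straightforward to verify in code. A NumPy sketch, not part of the original notes, with an arbitrary matrix A:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 5, 2
A = rng.standard_normal((n, k))                 # full column rank (almost surely)
P = A @ np.linalg.inv(A.T @ A) @ A.T            # orthogonal projection onto span(A)

print(np.allclose(P, P.T))        # symmetric
print(np.allclose(P, P @ P))      # idempotent
print(np.allclose((np.eye(n) - P) @ (np.eye(n) - P), np.eye(n) - P))  # I - P is idempotent too

y = rng.standard_normal(n)
y1, y2 = P @ y, (np.eye(n) - P) @ y
print(np.allclose(y, y1 + y2))    # decomposition y = y1 + y2
print(np.allclose(A.T @ y2, 0))   # y2 is orthogonal to every column of A
```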
5.7 Partitioned Matrices

A matrix may be partitioned into sub-matrices. Operations for partitioned matrices are analogous to standard matrix operations. Let A and B be $n \times n$ and $m \times m$ matrices, respectively. The direct sum of A and B is defined to be the $(n + m) \times (n + m)$ block-diagonal matrix
$A \oplus B = \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix};$
this can be generalized to the direct sum of finitely many matrices. Clearly, the direct sum is associative but not commutative, i.e., $A \oplus B \ne B \oplus A$ in general. Consider the following partitioned matrix:
$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}.$
When $A_{11}$ is nonsingular, $\det(A) = \det(A_{11}) \det(A_{22} - A_{21} A_{11}^{-1} A_{12})$; when $A_{22}$ is nonsingular, $\det(A) = \det(A_{22}) \det(A_{11} - A_{12} A_{22}^{-1} A_{21})$. If $A_{11}$ and $A_{22}$ are nonsingular, let $Q = A_{11} - A_{12} A_{22}^{-1} A_{21}$ and $R = A_{22} - A_{21} A_{11}^{-1} A_{12}$. The inverse of the partitioned matrix A can be computed as
$A^{-1} = \begin{pmatrix} Q^{-1} & -Q^{-1} A_{12} A_{22}^{-1} \\ -A_{22}^{-1} A_{21} Q^{-1} & A_{22}^{-1} + A_{22}^{-1} A_{21} Q^{-1} A_{12} A_{22}^{-1} \end{pmatrix} = \begin{pmatrix} A_{11}^{-1} + A_{11}^{-1} A_{12} R^{-1} A_{21} A_{11}^{-1} & -A_{11}^{-1} A_{12} R^{-1} \\ -R^{-1} A_{21} A_{11}^{-1} & R^{-1} \end{pmatrix}.$
In particular, if A is block-diagonal, so that $A_{12}$ and $A_{21}$ are zero matrices, then $Q = A_{11}$, $R = A_{22}$, and
$A^{-1} = \begin{pmatrix} A_{11}^{-1} & 0 \\ 0 & A_{22}^{-1} \end{pmatrix}.$

5.8 Statistical Applications

Consider now the problem of explaining the behavior of the dependent variable y using k linearly independent explanatory variables $X = [x_1, x_2, \ldots, x_k]$. Suppose that these variables contain $n > k$ observations, so that X is an $n \times k$ matrix with rank k. The least squares method is to compute a regression "hyperplane", $\hat{y} = X\beta$, that best fits the data (y, X). Write $y = X\beta + e$, where e is the vector of residuals. Let
$f(\beta) = (y - X\beta)'(y - X\beta) = y'y - 2y'X\beta + \beta'X'X\beta.$
Our objective is to minimize $f(\beta)$, the sum of squared residuals. As X is of rank k, X'X is nonsingular. In view of Section 5.3, the first-order condition is
$\nabla_\beta f(\beta) = -2X'y + 2(X'X)\beta = 0,$
which yields the solution $\beta = (X'X)^{-1}X'y$. The matrix of second-order derivatives is $2(X'X)$, a positive definite matrix. Hence, the solution $\beta$ minimizes $f(\beta)$ and is referred to as the ordinary least squares estimator. Note that $X\beta = X(X'X)^{-1}X'y$ and that $X(X'X)^{-1}X'$ is an orthogonal projection matrix. The fitted regression hyperplane is in fact the orthogonal projection of y onto the column space of X. It is also easy to see that
$e = y - X(X'X)^{-1}X'y = (I - X(X'X)^{-1}X')y,$
which is orthogonal to the column space of X. The fitted hyperplane is thus the best approximation of y, in terms of the Euclidean norm, based on the information in X.
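The least squares algebra of Section 5.8 can be mirrored numerically. A minimal sketch, not part of the original notes, using simulated data with made-up coefficients; it compares the normal-equation solution with a library solver and checks the residual orthogonality.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 100, 3
X = rng.standard_normal((n, k))
beta_true = np.array([1.0, -2.0, 0.5])           # illustrative coefficients
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# ordinary least squares via the normal equations: beta = (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)

# same answer from a library solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))         # True

# residuals e = (I - X(X'X)^{-1}X')y are orthogonal to the columns of X
e = y - X @ beta_hat
print(np.allclose(X.T @ e, 0.0, atol=1e-10))     # True
```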
6 Eigenvalues and Eigenvectors

In many applications it is important to transform a large matrix into a matrix of simpler structure that preserves important properties of the original matrix.

6.1 Eigenvalues and Eigenvectors

As mentioned in the previous section, the linear transformation Ax usually changes the direction of x. However, certain "exceptional" vectors x are in the same direction as Ax; these are the eigenvectors. If A is an $n \times n$ matrix, then any scalar $\lambda$ satisfying the equation
$Ax = \lambda x, \qquad (1)$
for some nonzero $n \times 1$ vector x, is called an eigenvalue, latent root, or characteristic root of A. The vector x is called an eigenvector, latent vector, or characteristic vector of A corresponding to the eigenvalue $\lambda$, and equation (1) is called the eigenvalue-eigenvector equation of A. Equation (1) can be equivalently expressed as
$(A - \lambda I)x = 0. \qquad (2)$
Note that if $|A - \lambda I| \ne 0$, then $(A - \lambda I)^{-1}$ would exist; premultiplying (2) by $(A - \lambda I)^{-1}$ gives
$(A - \lambda I)^{-1}(A - \lambda I)x = x = 0,$
which contradicts the assumption that $x \ne 0$. Thus, any eigenvalue $\lambda$ must satisfy $|A - \lambda I| = 0$, which is known as the characteristic equation of A. Using the definition of a determinant, it is readily observed that the characteristic equation is an nth-degree polynomial in $\lambda$, i.e.,
$(-\lambda)^n + \alpha_{n-1}(-\lambda)^{n-1} + \cdots + \alpha_1(-\lambda) + \alpha_0 = 0.$
Since an nth-degree polynomial has n roots, it follows that an $n \times n$ matrix has n eigenvalues; that is, there are n scalars $\lambda_1, \ldots, \lambda_n$ which satisfy the characteristic equation. When all of the eigenvalues of A are real, it is common to denote the ith largest eigenvalue of A as $\lambda_{(i)}$; in other words, $\lambda_{(1)} \ge \lambda_{(2)} \ge \cdots \ge \lambda_{(n)}$.

Note that if a nonzero vector x satisfies (1) for a given value of $\lambda$, then so does $\alpha x$ for any nonzero scalar $\alpha$. Thus, eigenvectors are not uniquely defined unless some scalar constraint is imposed; for example, one may consider only the eigenvector x satisfying $x'x = 1$, i.e., having unit length. The n eigenvalues of A need not all be different, since the characteristic equation may have repeated roots. An eigenvalue that occurs as a single solution to the characteristic equation will be called a simple or distinct eigenvalue. Otherwise, an eigenvalue will be called a multiple eigenvalue, and its multiplicity is given by the number of times this solution is repeated. The collection $S_A(\lambda)$ of all eigenvectors corresponding to a particular eigenvalue $\lambda$, along with the trivial vector 0, is called the eigenspace of A associated with $\lambda$. That is,
$S_A(\lambda) = \{x : x \in \mathbb{R}^n \text{ and } Ax = \lambda x\}.$

Theorem 6.1 If $S_A(\lambda)$ is the eigenspace of the $n \times n$ matrix A corresponding to the root $\lambda$, then $S_A(\lambda)$ is a vector subspace of $\mathbb{R}^n$.

Proof: By definition, if $x \in S_A(\lambda)$, then $Ax = \lambda x$. Thus, if $x \in S_A(\lambda)$ and $y \in S_A(\lambda)$, we have for any scalars $\alpha$ and $\beta$
$A(\alpha x + \beta y) = \alpha Ax + \beta Ay = \alpha\lambda x + \beta\lambda y = \lambda(\alpha x + \beta y).$
Consequently, $(\alpha x + \beta y) \in S_A(\lambda)$, and so $S_A(\lambda)$ is a vector subspace. □

Note that the dimension of the eigenspace of a given $\lambda$ is not always equal to the multiplicity of $\lambda$. As an example, consider the matrix
$A = \begin{pmatrix} 3 & -2 & 0 \\ -2 & 3 & 0 \\ 0 & 0 & 5 \end{pmatrix}.$
It is easy to verify that $\det(A - \lambda I) = -(\lambda - 1)(\lambda - 5)^2 = 0$. The eigenvalues are thus 1 and 5. When $\lambda = 1$, we have
$(A - \lambda I)p = \begin{pmatrix} 2 & -2 & 0 \\ -2 & 2 & 0 \\ 0 & 0 & 4 \end{pmatrix}\begin{pmatrix} p_1 \\ p_2 \\ p_3 \end{pmatrix} = 0.$
It follows that $p_1 = p_2 = a$ for any a and $p_3 = 0$. Thus, $\{(1, 1, 0)'\}$ is a basis for the eigenspace corresponding to $\lambda = 1$. Similarly, when $\lambda = 5$,
$(A - \lambda I)p = \begin{pmatrix} -2 & -2 & 0 \\ -2 & -2 & 0 \\ 0 & 0 & 0 \end{pmatrix}\begin{pmatrix} p_1 \\ p_2 \\ p_3 \end{pmatrix} = 0.$
We have $p_1 = -p_2 = a$ and $p_3 = b$ for any a, b. Hence, $\{(1, -1, 0)', (0, 0, 1)'\}$ is a basis for the eigenspace corresponding to $\lambda = 5$.

6.2 Some Basic Properties of Eigenvalues and Eigenvectors

Theorem 6.2 Let A be an $n \times n$ matrix. Then
(a) The eigenvalues of A' are the same as the eigenvalues of A.
(b) A is singular if and only if at least one eigenvalue of A is equal to 0.
(c) The diagonal elements of A are the eigenvalues of A if A is a triangular matrix.
(d) The eigenvalues of $BAB^{-1}$ are the same as the eigenvalues of A, if B is a nonsingular $n \times n$ matrix.
(e) Each of the eigenvalues of A is either 1 or $-1$ if A is an orthogonal matrix.

Theorem 6.3 Suppose $\lambda$ is an eigenvalue, with multiplicity $r \ge 1$, of the $n \times n$ matrix A. Then $1 \le \dim\{S_A(\lambda)\} \le r$.

Theorem 6.4 Let $\lambda$ be an eigenvalue of the $n \times n$ matrix A and let x be a corresponding eigenvector. Then
(a) If $r \ge 1$ is an integer, $\lambda^r$ is an eigenvalue of $A^r$ corresponding to the eigenvector x.
(b) If A is nonsingular, $\lambda^{-1}$ is an eigenvalue of $A^{-1}$ corresponding to the eigenvector x.

Theorem 6.5 Let A be an $n \times n$ matrix with eigenvalues $\lambda_1, \ldots, \lambda_n$. Then
(a) $\mathrm{trace}(A) = \sum_{i=1}^n \lambda_i$,
(b) $|A| = \prod_{i=1}^n \lambda_i$.

Theorem 6.6 Suppose $x_1, \ldots, x_r$ are eigenvectors of the $n \times n$ matrix A, where $r \le n$. If the corresponding eigenvalues $\lambda_1, \ldots, \lambda_r$ are such that $\lambda_i \ne \lambda_j$ for all $i \ne j$, then the vectors $x_1, \ldots, x_r$ are linearly independent.

If the eigenvalues $\lambda_1, \ldots, \lambda_n$ of the $n \times n$ matrix A are all distinct, the matrix $X = (x_1, \ldots, x_n)$, where $x_i$ is the eigenvector corresponding to $\lambda_i$, is nonsingular. Define the matrix $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$; then $AX = X\Lambda$, as $Ax_i = \lambda_i x_i$. Premultiplying this equation by $X^{-1}$ gives
$X^{-1}AX = X^{-1}X\Lambda = \Lambda.$
Any square matrix that can be transformed to a diagonal matrix through postmultiplication by a nonsingular matrix and premultiplication by its inverse is said to be diagonalizable. Thus, a square matrix with distinct eigenvalues is diagonalizable. Clearly, when a matrix is diagonalizable, its rank equals the number of its nonzero eigenvalues, since $\mathrm{rank}(A) = \mathrm{rank}(X^{-1}AX) = \mathrm{rank}(\Lambda)$.
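The $3 \times 3$ example and Theorem 6.5 can be checked directly with NumPy. A sketch, not part of the original notes; the matrix entries are those reconstructed in the example above.

```python
import numpy as np

A = np.array([[ 3.0, -2.0, 0.0],
              [-2.0,  3.0, 0.0],
              [ 0.0,  0.0, 5.0]])   # the example matrix, eigenvalues 1, 5, 5

eigvals, eigvecs = np.linalg.eig(A)
print(np.sort(eigvals))                              # [1. 5. 5.]

# Theorem 6.5: trace = sum of eigenvalues, determinant = product of eigenvalues
print(np.isclose(np.trace(A), eigvals.sum()))        # True
print(np.isclose(np.linalg.det(A), eigvals.prod()))  # True

# diagonalization: X^{-1} A X = Lambda, where the columns of X are eigenvectors
X = eigvecs
Lam = np.linalg.inv(X) @ A @ X
print(np.allclose(Lam, np.diag(eigvals)))            # True
```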
Theorem 6.7 Let A be an $n \times n$ real symmetric matrix. Then the eigenvalues of A are real, and corresponding to any eigenvalue there exist eigenvectors that are real.
Proof: See Schott (1997), page 93.

Theorem 6.8 If the $n \times n$ matrix A is symmetric, then it is possible to construct a set of n eigenvectors of A such that the set is orthonormal.
Proof: See Schott (1997), page 95.

Let $X = (x_1, \ldots, x_n)$, where $x_1, \ldots, x_n$ are these orthonormal eigenvectors, and let $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$. Then the eigenvalue-eigenvector equations $Ax_i = \lambda_i x_i$ can be expressed collectively as the matrix equation $AX = X\Lambda$. Since the columns of X are orthonormal vectors, X is an orthogonal matrix. As $XX' = X'X = I_n$, we have
$X'AX = X'X\Lambda = \Lambda \quad \text{and} \quad A = XX'AXX' = X\Lambda X';$
the latter is known as the spectral decomposition of A.

Theorem 6.9 Suppose that the $n \times n$ matrix A has r nonzero eigenvalues. Then, if A is symmetric, $\mathrm{rank}(A) = r$.
Proof: See Schott (1997), pages 99-100.

Theorem 6.10 Let A be an $n \times n$ matrix with eigenvalues $\lambda_1, \ldots, \lambda_n$. Then
(a) A is positive definite if and only if $\lambda_i > 0$ for all i.
(b) A is positive semidefinite if and only if $\lambda_i \ge 0$ for all i and $\lambda_i = 0$ for at least one i.
Proof: See Schott (1997), page 112.

Theorem 6.11 Let Y be an $n \times m$ matrix with $\mathrm{rank}(Y) = r$. Then Y'Y has r positive eigenvalues. It is positive definite if $r = m$ and positive semidefinite if $r < m$.
Proof: See Schott (1997), page 114.

6.3 Diagonalization

When x is an eigenvector, multiplication by A is just multiplication by a number: $Ax = \lambda x$. All the difficulties of matrices are swept away. Instead of watching every interconnection in the system, we follow the eigenvectors separately. It is like having a diagonal matrix, with no off-diagonal interconnections. A diagonal matrix is easy to square and to invert. This is why we discuss how to diagonalize a matrix using its eigenvectors properly.

Two $n \times n$ matrices A and B are said to be similar if there exists a nonsingular matrix P such that $B = P^{-1}AP$, or equivalently $PBP^{-1} = A$. The following results show that a similarity transformation preserves important properties of a matrix.

Theorem 6.12 Let A and B be two similar matrices. Then
(a) $\det(A) = \det(B)$.
(b) $\mathrm{trace}(A) = \mathrm{trace}(B)$.
(c) A and B have the same eigenvalues.
(d) $Pq_B = q_A$, where $q_A$ and $q_B$ are eigenvectors of A and B, respectively, associated with the same eigenvalue.
(e) If A and B are nonsingular, then $A^{-1}$ is similar to $B^{-1}$.

Proof: Parts (a), (b), and (e) are obvious. Part (c) follows because
$\det(B - \lambda I) = \det(P^{-1}AP - \lambda P^{-1}P) = \det(P^{-1}(A - \lambda I)P) = \det(A - \lambda I).$
For part (d), note that $AP = PB$. Then $APq_B = PBq_B = \lambda Pq_B$, which shows that $Pq_B$ is an eigenvector of A. □

Of particular interest to us is similarity between a square matrix and a diagonal matrix. A square matrix A is said to be diagonalizable if A is similar to a diagonal matrix $\Lambda$, i.e., $\Lambda = P^{-1}AP$, or equivalently $A = P\Lambda P^{-1}$, for some nonsingular matrix P. We also say that P diagonalizes A. When A is diagonalizable, we have $Ap_i = \lambda_i p_i$, where $p_i$ is the ith column of P and $\lambda_i$ is the ith diagonal element of $\Lambda$. That is, $p_i$ is an eigenvector of A corresponding to the eigenvalue $\lambda_i$. When $\Lambda = P^{-1}AP$, these eigenvectors must be linearly independent. Conversely, if A has n linearly independent eigenvectors $p_i$ corresponding to eigenvalues $\lambda_i$, $i = 1, \ldots, n$, we can write $AP = P\Lambda$, where $p_i$ is the ith column of P and $\Lambda$ contains the diagonal terms $\lambda_i$. That the $p_i$ are linearly independent implies that P is invertible. It follows that P diagonalizes A. We have proved:

Theorem 6.13 Let A be an $n \times n$ matrix. A is diagonalizable if and only if A has n linearly independent eigenvectors.
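Theorem 6.12 can be illustrated numerically: similar matrices share determinant, trace, and eigenvalues, and eigenvectors map through P. A sketch, not part of the original notes, with randomly generated matrices.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))
P = rng.standard_normal((4, 4))          # almost surely nonsingular
B = np.linalg.inv(P) @ A @ P             # B is similar to A

# (a)-(c): determinant, trace, and eigenvalues are preserved
print(np.isclose(np.linalg.det(A), np.linalg.det(B)))
print(np.isclose(np.trace(A), np.trace(B)))
print(np.allclose(np.sort_complex(np.linalg.eigvals(A)),
                  np.sort_complex(np.linalg.eigvals(B))))

# (d): if q_B is an eigenvector of B, then P q_B is an eigenvector of A
lam, Q = np.linalg.eig(B)
qA = P @ Q[:, 0]
print(np.allclose(A @ qA, lam[0] * qA))
```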
Eigenvalues of a matrix need not be distinct; an eigenvalue may be repeated with multiplicity k. The result below indicates that if A has n distinct eigenvalues, the associated eigenvectors are linearly independent.

Theorem 6.14 If $p_1, \ldots, p_n$ are eigenvectors of A corresponding to distinct eigenvalues $\lambda_1, \ldots, \lambda_n$, then $\{p_1, \ldots, p_n\}$ is a linearly independent set.

For an $n \times n$ matrix A, having n distinct eigenvalues is thus a sufficient (but not necessary) condition for diagonalizability, by Theorems 6.13 and 6.14. When some eigenvalues are equal, however, not much can be asserted in general. When A has n distinct eigenvalues, $\Lambda = P^{-1}AP$, and it follows that
$\det(A) = \det(\Lambda) = \prod_{i=1}^n \lambda_i, \qquad \mathrm{trace}(A) = \mathrm{trace}(\Lambda) = \sum_{i=1}^n \lambda_i.$
In this case, A is nonsingular if and only if its eigenvalues are all non-zero. If we know that A is nonsingular, then by Theorem 6.12(e), $A^{-1}$ is similar to $\Lambda^{-1}$, and $A^{-1}$ has eigenvectors $p_i$ corresponding to eigenvalues $1/\lambda_i$. It is also easy to verify that $\Lambda + cI = P^{-1}(A + cI)P$ and $\Lambda^k = P^{-1}A^k P$.

6.4 Orthogonal Diagonalization

A square matrix A is said to be orthogonally diagonalizable if there is an orthogonal matrix P that diagonalizes A, i.e., $\Lambda = P'AP$. In light of the proof of Theorem 6.13, we have the following result.

Theorem 6.15 Let A be an $n \times n$ matrix. Then A is orthogonally diagonalizable if and only if A has an orthonormal set of n eigenvectors.

If A is orthogonally diagonalizable, then $\Lambda = P'AP$, so that $A = P\Lambda P'$ is a symmetric matrix. The converse is also true, but its proof is more difficult and hence omitted. We have:

Theorem 6.16 A matrix A is orthogonally diagonalizable if and only if A is symmetric.

Moreover, we note that a symmetric matrix A has only real eigenvalues and real eigenvectors. If an eigenvalue $\lambda$ of an $n \times n$ symmetric matrix A is repeated with multiplicity k, then in view of Theorems 6.15 and 6.16 there must exist exactly k orthogonal eigenvectors corresponding to $\lambda$. Hence, this eigenspace is k-dimensional. It follows that $\mathrm{rank}(A - \lambda I) = n - k$. This implies that when $\lambda = 0$ is repeated with multiplicity k, $\mathrm{rank}(A) = n - k$. This proves:

Theorem 6.17 The number of non-zero eigenvalues of a symmetric matrix A is equal to rank(A).

When A is orthogonally diagonalizable, we note that
$A = P\Lambda P' = \sum_{i=1}^n \lambda_i p_i p_i',$
where $p_i$ is the ith column of P. This is known as the spectral (canonical) decomposition of A, which applies to both singular and nonsingular symmetric matrices. It can be seen that $p_i p_i'$ is an orthogonal projection matrix which orthogonally projects vectors onto the span of $p_i$.

We also have the following results for some special matrices. Let A be an orthogonal matrix and p an eigenvector corresponding to the eigenvalue $\lambda$. Observe that
$p'p = p'A'Ap = \lambda^2 p'p.$
Thus, the eigenvalues of an orthogonal matrix must be $\pm 1$. It can also be shown that the eigenvalues of a positive definite (semi-definite) matrix are positive (non-negative). If A is symmetric and idempotent, then for any $x \ne 0$,
$x'Ax = x'A'Ax \ge 0.$
That is, a symmetric and idempotent matrix must be positive semi-definite and therefore must have non-negative eigenvalues. In fact, as A is orthogonally diagonalizable, we have
$\Lambda = P'AP = P'APP'AP = \Lambda^2.$
Consequently, the eigenvalues of A must be either 0 or 1.
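The spectral decomposition and the 0/1 eigenvalues of a symmetric idempotent matrix can be confirmed numerically. A closing sketch, not part of the original notes, with randomly generated matrices.

```python
import numpy as np

rng = np.random.default_rng(4)

# spectral decomposition of a symmetric matrix: A = sum_i lambda_i p_i p_i'
S = rng.standard_normal((4, 4))
A = (S + S.T) / 2                         # force symmetry
lam, P = np.linalg.eigh(A)                # lam real, P orthogonal
A_rebuilt = sum(lam[i] * np.outer(P[:, i], P[:, i]) for i in range(4))
print(np.allclose(A, A_rebuilt))          # True
print(np.allclose(P.T @ P, np.eye(4)))    # P'P = I

# a symmetric idempotent matrix has eigenvalues 0 or 1
X = rng.standard_normal((6, 2))
M = X @ np.linalg.inv(X.T @ X) @ X.T      # orthogonal projection: symmetric, idempotent
eig_M = np.linalg.eigvalsh(M)             # ascending order
print(np.allclose(eig_M, [0, 0, 0, 0, 1, 1]))   # True
```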