Preliminaries
This section covers fundamental concepts related to real matrices, assuming the reader has a foundational understanding of matrix operations and the basic properties of determinants.
An \(m \times n\) matrix consists of \(mn\) real numbers arranged in \(m\) rows and \(n\) columns.
We denote matrices by bold letters. The entry in row \(i\) and column \(j\) of the matrix \(A\) is denoted by \(a_{ij}\).
An \(m \times 1\) matrix is called a column vector of order \(m\); similarly, a \(1 \times n\) matrix is a row vector of order \(n\). An \(m \times n\) matrix is called a square matrix if \(m = n\).
If \(A, B\) are \(m \times n\) matrices, then \(A + B\) is defined as the \(m \times n\) matrix with \((i, j)\)-entry \(a_{ij} + b_{ij}\). If \(A\) is a matrix and \(c\) is a real number, then \(cA\) is obtained by multiplying each element of \(A\) by \(c\).
If \(A\) is \(m \times p\) and \(B\) is \(p \times n\), then their product \(C = AB\) is an \(m \times n\) matrix with \((i, j)\)-entry given by \(c_{ij} = \sum_{k=1}^{p} a_{ik} b_{kj}\).
Matrix multiplication is associative and distributes over addition: \((AB)C = A(BC)\) and \(A(B + C) = AB + AC\).
The transpose of the \(m \times n\) matrix \(A\), denoted by \(A'\), is the \(n \times m\) matrix whose \((i, j)\)-entry is \(a_{ji}\). It can be verified that \((A')' = A\), \((A + B)' = A' + B'\) and \((AB)' = B'A'\).
A good understanding of the definition of matrix multiplication is quite useful.
We now note some useful facts about matrix multiplication. In each case it is assumed that the matrices involved have compatible dimensions, so that all products appearing are defined.
(i) The \(j\)th column of \(AB\) equals \(A\) multiplied by the \(j\)th column of \(B\).
(ii) The \(i\)th row of \(AB\) equals the \(i\)th row of \(A\) multiplied by \(B\).
(iii) The \((i, j)\)-entry of \(AB\) is \(x_1 y_1 + \cdots + x_p y_p\), where \((x_1, \ldots, x_p)\) is the \(i\)th row of \(A\) and \((y_1, \ldots, y_p)'\) is the \(j\)th column of \(B\).
(iv) If \(A = [a_1, \ldots, a_p]\), where \(a_i\) denote the columns of \(A\), and \(b_1', \ldots, b_p'\) denote the rows of \(B\), then \(AB = a_1 b_1' + \cdots + a_p b_p'\).
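The following numpy sketch (our own small example, not part of the text) checks facts (i)–(iv) numerically.

```python
import numpy as np

# Verify the four views of the product AB on a small random example.
rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(3, 4)).astype(float)   # m x p
B = rng.integers(-3, 4, size=(4, 2)).astype(float)   # p x n
C = A @ B

# (i) the jth column of AB equals A times the jth column of B
assert np.allclose(C[:, 1], A @ B[:, 1])
# (ii) the ith row of AB equals the ith row of A times B
assert np.allclose(C[0, :], A[0, :] @ B)
# (iii) the (i, j)-entry is the inner product of row i of A and column j of B
assert np.isclose(C[2, 1], A[2, :] @ B[:, 1])
# (iv) AB is the sum of outer products of columns of A with rows of B
outer_sum = sum(np.outer(A[:, k], B[k, :]) for k in range(A.shape[1]))
assert np.allclose(C, outer_sum)
```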
A diagonal matrix is a square matrix \(A\) such that \(a_{ij} = 0\) whenever \(i \neq j\). We denote by \(\mathrm{diag}(\lambda_1, \ldots, \lambda_n)\) the diagonal matrix with diagonal entries \(\lambda_1, \ldots, \lambda_n\).
The matrix \(\mathrm{diag}(\lambda_1, \ldots, \lambda_n)\) becomes the identity matrix \(I_n\) when all \(\lambda_i\) equal \(1\). In contexts where the order is clear, \(I\) may be used instead of \(I_n\). Note that for any square matrix \(A\), \(AI = IA = A\).
The entries \(a_{11}, \ldots, a_{nn}\) are said to constitute the (main) diagonal entries of \(A\).
The trace of \(A\) is defined as \(\mathrm{trace}\, A = a_{11} + \cdots + a_{nn}\).
It follows from this definition that if \(A, B\) are matrices such that both \(AB\) and \(BA\) are defined, then \(\mathrm{trace}\, AB = \mathrm{trace}\, BA\).
The determinant of an \(n \times n\) matrix \(A\), denoted by \(|A|\), is defined as
\[ |A| = \sum_{\sigma} \varepsilon(\sigma)\, a_{1\sigma(1)} \cdots a_{n\sigma(n)}, \]
where the summation is over all permutations \(\sigma\) of \(\{1, \ldots, n\}\), and \(\varepsilon(\sigma)\) is \(1\) or \(-1\) according as \(\sigma\) is even or odd.
We state some basic properties of the determinant without proof:
(i) The determinant can be evaluated by expansion along a row or a column. Thus, expanding along the first row,
\[ |A| = \sum_{j=1}^{n} (-1)^{1+j} a_{1j} |A_{1j}|, \]
where \(A_{1j}\) is the submatrix obtained by deleting the first row and the \(j\)th column of \(A\). We also note that \(\sum_{j=1}^{n} (-1)^{1+j} a_{ij} |A_{1j}| = 0\) if \(i \neq 1\).
(ii) The determinant changes sign if two rows (or two columns) are interchanged.
(iii) The determinant is unchanged if a constant multiple of one row is added to another row; the same property holds for columns.
(iv) The determinant is a linear function of any column (row) when all the other columns (rows) are held fixed.
The matrix \(A\) is upper triangular if \(a_{ij} = 0\) for \(i > j\). The transpose of an upper triangular matrix is lower triangular.
It will often be necessary to work with matrices in partitioned form. For example, let
\[ A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \qquad B = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix} \]
be two matrices whose blocks \(A_{ij}\), \(B_{ij}\) are themselves matrices. If the various products involved are defined, in which case we say the matrices are partitioned conformally, then
\[ AB = \begin{bmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} \end{bmatrix}. \]
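A small numerical sketch (the matrices and the particular partition are ours) checking block multiplication against the ordinary product:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 6))
B = rng.standard_normal((6, 4))

# Conformal partition: columns of A split 3 + 3, rows of B split 3 + 3.
A11, A12, A21, A22 = A[:2, :3], A[:2, 3:], A[2:, :3], A[2:, 3:]
B11, B12, B21, B22 = B[:3, :2], B[:3, 2:], B[3:, :2], B[3:, 2:]

blockwise = np.block([
    [A11 @ B11 + A12 @ B21, A11 @ B12 + A12 @ B22],
    [A21 @ B11 + A22 @ B21, A21 @ B12 + A22 @ B22],
])
assert np.allclose(blockwise, A @ B)   # block formula agrees with AB
```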
1. Construct a \(3 \times 3\) matrix \(A\) such that both \(A\) and \(A^2\) are nonzero but \(A^3 = 0\).
2. Decide whether the determinant of the following matrix \(A\) is even or odd, without evaluating it explicitly:
3. Can you find \(3 \times 3\) matrices \(X, Y\) such that \(XY - YX = A\)?
4. If \(A, B\) are \(n \times n\) matrices, show that
5. Evaluate the determinant of the \(n \times n\) matrix \(A\), where \(a_{ij} = ij\) if \(i \neq j\) and \(a_{ij} = 1 + ij\) if \(i = j\).
6. Let \(A\) be an \(n \times n\) matrix and suppose \(A\) has a zero submatrix of order \(r \times s\), where \(r + s \geq n + 1\). Show that \(|A| = 0\).
Vector Spaces and Subspaces
A nonempty set \(S\) is called a vector space if it satisfies the following conditions:
(i) For any \(x, y\) in \(S\), \(x + y\) is defined and is in \(S\). Furthermore, \(x + y = y + x\) (commutativity) and \(x + (y + z) = (x + y) + z\) (associativity).
(ii) There exists an element in \(S\), denoted by \(0\), such that \(x + 0 = x\) for all \(x\).
(iii) For any \(x\) in \(S\), there exists an element \(y\) in \(S\) such that \(x + y = 0\).
(iv) For any \(x\) in \(S\) and any real number \(c\), \(cx\) is defined and is in \(S\); moreover, \(1x = x\) for all \(x\).
(v) For any \(x_1, x_2\) in \(S\) and real numbers \(c_1, c_2\), \(c_1(x_1 + x_2) = c_1 x_1 + c_1 x_2\), \((c_1 + c_2)x_1 = c_1 x_1 + c_2 x_1\) and \(c_1(c_2 x_1) = (c_1 c_2)x_1\).
The elements of \(S\) are called vectors; the operation producing \(x + y\) is called vector addition, the element \(0\) is the zero vector, and the operation producing \(cx\) is scalar multiplication. A vector space can be defined over any field, but we will work only with the field of real numbers, which is adequate for our purposes.
Basis and Dimension
The set of column vectors of order \(n\), as well as the set of row vectors of order \(n\), is a vector space; these are the vector spaces we will mostly be concerned with.
Let \(\mathbb{R}^n\) denote the set of all \(n\)-tuples of real numbers, where \(\mathbb{R}\) denotes the set of real numbers. Elements of \(\mathbb{R}^n\) will be written either as column vectors or as row vectors, whichever is convenient.
If \(S, T\) are vector spaces and \(S \subset T\), then \(S\) is called a subspace of \(T\).
Some examples of subspaces of \(\mathbb{R}^3\): the trivial subspace consisting of the zero vector alone; for fixed real numbers \(c_1, c_2, c_3\), the set of vectors \(x \in \mathbb{R}^3\) satisfying \(c_1 x_1 + c_2 x_2 + c_3 x_3 = 0\), which geometrically is a plane through the origin; and the intersection of two distinct planes through the origin, which is a line through the origin. The zero vector, lines through the origin, planes through the origin, and \(\mathbb{R}^3\) itself exhaust the subspaces of \(\mathbb{R}^3\).
1. Which of the following sets are vector spaces (with the natural operations of addition and scalar multiplication)? (i) Vectors \((a, b, c, d)\) such that \(a + 2b = c - d\); (ii) \(n \times n\) matrices \(A\) such that \(A^2 = I\); (iii) \(3 \times 3\) matrices \(A\) such that \(a_{11} + a_{13} = a_{22} + a_{31}\).
2. If \(S\) and \(T\) are vector spaces, then are \(S \cup T\) and \(S \cap T\) vector spaces as well?
The linear span of the vectors \(x_1, \ldots, x_m\) is defined to be the set of all linear combinations \(c_1 x_1 + c_2 x_2 + \cdots + c_m x_m\), where \(c_1, \ldots, c_m\) are real numbers. The linear span is easily seen to be a subspace.
A set of vectors \(x_1, \ldots, x_m\) is said to be linearly dependent if there exist real numbers \(c_1, \ldots, c_m\), at least one of which is nonzero, such that \(c_1 x_1 + \cdots + c_m x_m = 0\).
A set of vectors is linearly independent if it is not linearly dependent. Strictly speaking, we should talk of a collection (or multiset) of vectors rather than a set: when we discuss the linear dependence or independence of \(x_1, \ldots, x_m\), the vectors are not necessarily distinct.
The following statements are easily proved:
(i) The set consisting of the zero vector alone is linearly dependent.
(ii) If \(X \subset Y\) and if \(X\) is linearly dependent, then so is \(Y\).
(iii) If \(X \subset Y\) and if \(Y\) is linearly independent, then so is \(X\).
A set of vectors is said to form a basis for the vector space \(S\) if it is linearly independent and its linear span equals \(S\).
Let \(e_i\) be the \(i\)th column of the \(n \times n\) identity matrix. The set \(e_1, \ldots, e_n\) forms a basis for \(\mathbb{R}^n\), called the standard basis.
If \(x_1, \ldots, x_m\) is a basis for \(S\), then any vector \(x\) in \(S\) admits a unique representation as a linear combination \(c_1 x_1 + \cdots + c_m x_m\). For if
\[ x = c_1 x_1 + \cdots + c_m x_m = d_1 x_1 + \cdots + d_m x_m, \]
then \((c_1 - d_1)x_1 + \cdots + (c_m - d_m)x_m = 0\), and since \(x_1, \ldots, x_m\) are linearly independent, \(c_i = d_i\) for each \(i\).
A vector space is said to be finite-dimensional if it has a basis consisting of finitely many vectors. The vector space containing only the zero vector is also taken to be finite-dimensional. We consider only finite-dimensional vector spaces, and we generally assume the space to be nontrivial, i.e., to contain at least one nonzero vector.
3.1 Let \(S\) be a vector space. Then any two bases of \(S\) have the same cardinality.
Proof. Suppose \(x_1, \ldots, x_p\) and \(y_1, \ldots, y_q\) are bases for \(S\), and let, if possible, \(p > q\). We can express every \(x_i\) as a linear combination of \(y_1, \ldots, y_q\). Thus there exists a \(p \times q\) matrix \(A = (a_{ij})\) such that
\[ x_i = \sum_{j=1}^{q} a_{ij} y_j, \qquad i = 1, \ldots, p. \tag{1} \]
Similarly, there exists a \(q \times p\) matrix \(B = (b_{ij})\) such that
\[ y_j = \sum_{k=1}^{p} b_{jk} x_k, \qquad j = 1, \ldots, q. \tag{2} \]
From (1), (2) we see that
\[ x_i = \sum_{k=1}^{p} c_{ik} x_k, \qquad i = 1, \ldots, p, \tag{3} \]
where \(C = AB\). It follows from (3) and the observation made preceding 3.1 that \(AB = I\), the identity matrix of order \(p\). Add \(p - q\) zero columns to \(A\) to get the \(p \times p\) matrix \(U\). Similarly, add \(p - q\) zero rows to \(B\) to get the \(p \times p\) matrix \(V\).
Then \(UV = AB = I\). Therefore \(|UV| = 1\). However, \(|U| = |V| = 0\), since \(U\) has a zero column and \(V\) has a zero row. Thus we have a contradiction, and hence \(p \leq q\). We can similarly prove that \(q \leq p\), and it follows that \(p = q\). □
In the process of proving 3.1 we have proved the following statement, which will be useful. Let \(S\) be a vector space. Suppose \(x_1, \ldots, x_p\) is a basis for \(S\) and suppose the set \(y_1, \ldots, y_q\) spans \(S\). Then \(p \leq q\).
The dimension of a vector space \(S\), denoted by \(\dim(S)\), is defined to be the number of vectors in a basis for \(S\). By convention, the dimension of the space containing only the zero vector is taken to be zero.
Let \(S, T\) be vector spaces. We say that \(S\) is isomorphic to \(T\) if there exists a one-to-one and onto map \(f : S \to T\) such that \(f\) is linear, i.e., \(f(x + y) = f(x) + f(y)\) and \(f(cx) = c f(x)\) for all \(x, y\) in \(S\) and real numbers \(c\).
3.2 Let \(S, T\) be vector spaces. Then \(S, T\) are isomorphic if and only if \(\dim(S) = \dim(T)\).
Proof. We first prove the "only if" part. Suppose \(f : S \to T\) is an isomorphism.
If \(x_1, \ldots, x_k\) is a basis for \(S\), then we will show that \(f(x_1), \ldots, f(x_k)\) is a basis for \(T\). First suppose \(c_1 f(x_1) + \cdots + c_k f(x_k) = 0\). It follows from the definition of isomorphism that \(f(c_1 x_1 + \cdots + c_k x_k) = 0\), and hence \(c_1 x_1 + \cdots + c_k x_k = 0\).
Since \(x_1, \ldots, x_k\) are linearly independent, \(c_1 = \cdots = c_k = 0\), and therefore \(f(x_1), \ldots, f(x_k)\) are linearly independent. Now let \(v\) be any vector in \(T\). Since \(f\) is onto, there exists \(u\) in \(S\) such that \(f(u) = v\). Writing \(u = d_1 x_1 + \cdots + d_k x_k\) for some \(d_1, \ldots, d_k\), we get \(v = f(u) = d_1 f(x_1) + \cdots + d_k f(x_k)\). Thus \(f(x_1), \ldots, f(x_k)\) span \(T\) and hence form a basis for \(T\). It follows that \(\dim(T) = k = \dim(S)\).
To prove the converse, let \(x_1, \ldots, x_k\) and \(y_1, \ldots, y_k\) be bases for \(S, T\), respectively. (Since \(\dim(S) = \dim(T)\), the bases have the same cardinality.) Any \(x\) in \(S\) admits a unique representation \(x = c_1 x_1 + \cdots + c_k x_k\).
Define \(f(x) = y\), where \(y = c_1 y_1 + \cdots + c_k y_k\). It can be verified that \(f\) satisfies the definition of isomorphism. □
3.3 Let \(S\) be a vector space and suppose \(S\) is the linear span of the vectors \(x_1, \ldots, x_m\). If some \(x_i\) is a linear combination of \(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_m\), then these latter vectors also span \(S\).
3.4 Let \(S\) be a vector space of dimension \(n\), and let \(x_1, \ldots, x_m\) be linearly independent vectors in \(S\). Then there exists a basis for \(S\) containing \(x_1, \ldots, x_m\).
Proof. Let \(y_1, \ldots, y_n\) be a basis for \(S\). The set \(x_1, \ldots, x_m, y_1, \ldots, y_n\) is linearly dependent, so there is a linear combination \(c_1 x_1 + \cdots + c_m x_m + d_1 y_1 + \cdots + d_n y_n = 0\) in which some \(c_i\) or \(d_i\) is nonzero. Since \(x_1, \ldots, x_m\) are linearly independent, some \(d_i\) must be nonzero, and hence the corresponding \(y_i\) is a linear combination of the remaining vectors. By 3.3, the set \(x_1, \ldots, x_m, y_1, \ldots, y_{i-1}, y_{i+1}, \ldots, y_n\) also spans \(S\). If this set is linearly independent, we have the required basis; otherwise we repeat the process, deleting one \(y_j\) at a time, until a basis containing \(x_1, \ldots, x_m\) is obtained. □
3.5 Any set of \(n + 1\) vectors in \(\mathbb{R}^n\) is linearly dependent.
Proof. If the set is linearly independent, then by 3.4 we can find a basis for \(\mathbb{R}^n\) containing the set. This is a contradiction, since every basis for \(\mathbb{R}^n\) must contain precisely \(n\) vectors. □
A basis for a vector space \(S\) may be constructed as follows. Choose a nonzero vector \(x_1\) in \(S\), and then successively choose vectors \(x_2, \ldots, x_m\) so that at each stage the chosen vectors are linearly independent. If at some stage the chosen vectors span \(S\), they form a basis. Otherwise there is a vector \(x_{m+1}\) in \(S\) outside the linear span of \(x_1, \ldots, x_m\), and adjoining it keeps the set linearly independent. Since any \(n + 1\) vectors in \(\mathbb{R}^n\) are linearly dependent, the process must terminate, and it terminates in a basis for \(S\).
3.7 If \(S\) is a subspace of \(T\), then \(\dim(S) \leq \dim(T)\). Furthermore, equality holds if and only if \(S = T\).
Proof. Suppose \(\dim(S) = p\) and \(\dim(T) = q\), and let \(x_1, \ldots, x_p\) be a basis for \(S\). By the remark following 3.1, any set of more than \(q\) vectors in \(T\) is linearly dependent. Since \(x_1, \ldots, x_p\) is a linearly independent set in \(S \subset T\), it follows that \(p \leq q\).
To prove the second part, suppose \(p = q\) but \(S \neq T\). Then \(T\) contains a vector \(z\) not in the linear span of \(x_1, \ldots, x_p\), and the set \(x_1, \ldots, x_p, z\) is linearly independent. This contradicts the fact that any \(p + 1 = q + 1\) vectors in \(T\) are linearly dependent. Thus if
\(S\) is a subspace of \(T\) and if \(\dim S = \dim T\), then \(S = T\). Conversely, if \(S = T\), then clearly \(\dim S = \dim T\), and the proof is complete. □
1. Which of the following sets are vector spaces? Find the dimension of each set that is a vector space: (i) vectors \((a, b, c, d)\) such that \(a + b = c + d\); (ii) \(n \times n\) matrices \(A\) with \(\mathrm{trace}\, A = 0\); (iii) solutions \((x, y, z)\) of the system \(2x - y = 0\), \(2y + 3z = 0\).
2. If \(x, y, z\) is a basis for \(\mathbb{R}^3\), which of the following are also bases for \(\mathbb{R}^3\)? (i) \(x + 2y\), \(y + 3z\), \(x + 2z\); (ii) \(x + y - 2z\), \(x - 2y + z\), \(-2x + y + z\); (iii) \(x\), \(y\), \(x + y + z\).
3. If \(\{x_1, x_2\}\) and \(\{y_1, y_2\}\) are both bases of \(\mathbb{R}^2\), show that at least one of the following statements is true: (i) \(\{x_1, y_2\}\), \(\{x_2, y_1\}\) are both bases of \(\mathbb{R}^2\); (ii) \(\{x_1, y_1\}\), \(\{x_2, y_2\}\) are both bases of \(\mathbb{R}^2\).
Rank
Let \(A\) be an \(m \times n\) matrix. The subspace of \(\mathbb{R}^m\) spanned by the column vectors of \(A\) is called the column space or the column span of \(A\) and is denoted by \(C(A)\).
Similarly, the subspace of \(\mathbb{R}^n\) spanned by the row vectors of \(A\) is called the row space of \(A\), denoted by \(R(A)\). Clearly, \(R(A)\) is isomorphic to \(C(A')\). The dimension of the column space is called the column rank of \(A\), and the dimension of the row space is the row rank of \(A\).
Although the two ranks are defined quite differently, they are always equal; this basic fact is proved next.
4.1 The column rank of a matrix equals its row rank.
Proof. Let \(A\) be an \(m \times n\) matrix with column rank \(r\), and let \(b_1, \ldots, b_r\) be a basis for \(C(A)\). Let \(B\) be the \(m \times r\) matrix with columns \(b_1, \ldots, b_r\). Since every column of \(A\) is a linear combination of \(b_1, \ldots, b_r\), we can write \(A = BC\) for some \(r \times n\) matrix \(C\). Then every row of \(A\) is a linear combination of the rows of \(C\), and hence \(R(A) \subset R(C)\). Therefore the row rank of \(A\), which is \(\dim R(A)\), is at most \(r\), the column rank of \(A\). Applying the same argument to \(A'\), we conclude that the column rank of \(A\) is at most its row rank, and hence the two are equal. □
The common value of the column rank and the row rank of \(A\) is called the rank of \(A\) and is denoted by \(R(A)\). (This conflicts with the notation for the row space of \(A\); the intended meaning will always be clear from the context.)
It is obvious that \(R(A) = R(A')\). The rank of \(A\) is zero if and only if \(A\) is the zero matrix.
4.2 Let \(A, B\) be matrices such that \(AB\) is defined. Then \(R(AB) \leq \min\{R(A), R(B)\}\).
Proof. A vector in \(C(AB)\) is of the form \(ABx\) for some vector \(x\), and therefore it belongs to \(C(A)\). Thus \(C(AB) \subset C(A)\), and hence, by 3.7, \(R(AB) \leq R(A)\).
Now using this fact we have \(R(AB) = R((AB)') = R(B'A') \leq R(B') = R(B)\), and the result follows. □
4.3 Let \(A\) be an \(m \times n\) matrix of rank \(r\), \(r \geq 1\). Then there exist matrices \(B, C\) of order \(m \times r\) and \(r \times n\), respectively, such that \(R(B) = R(C) = r\) and \(A = BC\).
This decomposition is called a rank factorization of \(A\).
Proof. As in the proof of 4.1, we may write \(A = BC\), where \(B\) is \(m \times r\) with columns forming a basis for \(C(A)\), and \(C\) is \(r \times n\). Since the columns of \(B\) are linearly independent, \(R(B) = r\). Since \(C\) has \(r\) rows, \(R(C) \leq r\). Also, by 4.2, \(r = R(A) = R(BC) \leq R(C)\), and hence \(R(C) = r\). □
Throughout this book whenever we talk of rank factorization of a matrix it is implicitly assumed that the matrix is nonzero.
4.4 If \(A, B\) are \(m \times n\) matrices, then \(R(A + B) \leq R(A) + R(B)\).
Proof. Let \(A = XY\), \(B = UV\) be rank factorizations of \(A, B\). Then
\[ A + B = XY + UV = [X, U] \begin{bmatrix} Y \\ V \end{bmatrix}, \]
and hence, by 4.2, \(R(A + B) \leq R([X, U])\).
Let \(x_1, \ldots, x_p\) and \(u_1, \ldots, u_q\) be bases for \(C(X)\), \(C(U)\), respectively. Any vector in the column space of \([X, U]\) can be expressed as a linear combination of these \(p + q\) vectors. Thus
\[ R([X, U]) \leq R(X) + R(U) = R(A) + R(B), \]
and the proof is complete. □
The following operations performed on a matrix \(A\) are called elementary column operations:
(i) Interchange two columns of \(A\).
(ii) Multiply a column of \(A\) by a nonzero scalar.
(iii) Add a scalar multiple of one column to another column.
Elementary column operations do not change the column space \(C(A)\), and hence do not change the rank. Elementary row operations are defined analogously; they leave the row space, and hence the rank, unchanged. These operations are useful in computations: to find the rank of a matrix we first reduce it, by elementary operations, to a matrix with many zero entries, whose rank is then easy to determine.
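A brief numpy sketch (the matrix is ours). In exact arithmetic one reduces by elementary operations; numerically, `matrix_rank` (based on the SVD) is the standard robust substitute, and elementary operations leave the answer unchanged.

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0, 3.0],
              [2.0, 4.0, 1.0, 7.0],
              [3.0, 6.0, 1.0, 10.0]])    # row 3 = row 1 + row 2, so rank 2
print(np.linalg.matrix_rank(A))          # 2

B = A.copy()
B[1] -= 2 * B[0]                          # elementary row operations introduce zeros
B[2] -= 3 * B[0]
print(np.linalg.matrix_rank(B) == np.linalg.matrix_rank(A))   # True: rank unchanged
```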
1. Find the rank of the following matrix for each real number \(\alpha\):
2. Let \(\{x_1, \ldots, x_p\}\), \(\{y_1, \ldots, y_q\}\) be linearly independent sets in \(\mathbb{R}^n\), where \(p < q \leq n\). Show that there exists \(i \in \{1, \ldots, q\}\) such that \(\{x_1, \ldots, x_p, y_i\}\) is linearly independent.
3. Let \(A\) be an \(m \times n\) matrix and let \(B\) be obtained by changing any \(k\) entries of
4. Let \(A, B, C\) be \(n \times n\) matrices. Is it always true that \(R(ABC) \leq R(AC)\)?
Orthogonality
Let \(S\) be a vector space. A function that assigns a real number \(\langle x, y \rangle\) to every pair of vectors \(x, y\) in \(S\) is said to be an inner product if it satisfies the following conditions:
(i) \(\langle x, y \rangle = \langle y, x \rangle\) for all \(x, y\);
(ii) \(\langle x, x \rangle \geq 0\), and equality holds if and only if \(x = 0\);
(iii) \(\langle cx + dy, z \rangle = c\langle x, z \rangle + d\langle y, z \rangle\) for all \(x, y, z\) in \(S\) and real numbers \(c, d\).
In \(\mathbb{R}^n\), \(\langle x, y \rangle = x'y = x_1 y_1 + \cdots + x_n y_n\) is easily seen to be an inner product.
We will work with this inner product while dealing with \(\mathbb{R}^n\) and its subspaces, unless indicated otherwise.
For a vector \(x\), the positive square root of the inner product \(\langle x, x \rangle\) is called the norm of \(x\), denoted by \(\|x\|\). Vectors \(x, y\) are said to be orthogonal or perpendicular if \(\langle x, y \rangle = 0\), in which case we write \(x \perp y\).
5.1 If x 1 , , x m are pairwise orthogonal nonzero vectors, then they are linearly independent.
Proof. Suppose \(c_1 x_1 + \cdots + c_m x_m = 0\). Then \(\langle c_1 x_1 + \cdots + c_m x_m, x_1 \rangle = 0\), and hence \(\sum_{i=1}^{m} c_i \langle x_i, x_1 \rangle = 0\).
Since the vectors \(x_1, \ldots, x_m\) are pairwise orthogonal, it follows that \(c_1 \langle x_1, x_1 \rangle = 0\), and since \(x_1\) is nonzero, \(c_1 = 0\). Similarly, we can show that each \(c_i\) is zero.
Therefore, the vectors are linearly independent. □
A set of vectors \(x_1, \ldots, x_m\) is said to form an orthonormal basis for the vector space \(S\) if the set is a basis for \(S\) and, furthermore, \(\langle x_i, x_j \rangle\) is \(0\) if \(i \neq j\) and \(1\) if \(i = j\).
We now describe the Gram–Schmidt procedure, which produces an orthonormal basis starting with a given basis \(x_1, \ldots, x_n\).
Set \(y_1 = x_1\). Having defined \(y_1, \ldots, y_{i-1}\), we define
\[ y_i = x_i - a_{i,i-1} y_{i-1} - \cdots - a_{i1} y_1, \]
where \(a_{i,i-1}, \ldots, a_{i1}\) are chosen so that \(y_i\) is orthogonal to \(y_1, \ldots, y_{i-1}\). Thus we must solve \(\langle y_i, y_j \rangle = 0\), \(j = 1, \ldots, i-1\). This leads to
\[ \langle x_i - a_{i,i-1} y_{i-1} - \cdots - a_{i1} y_1,\; y_j \rangle = 0, \qquad j = 1, \ldots, i-1, \]
which gives
\[ \langle x_i, y_j \rangle - \sum_{k=1}^{i-1} a_{ik} \langle y_k, y_j \rangle = 0. \]
Now, since \(y_1, \ldots, y_{i-1}\) is an orthogonal set, we get \(\langle x_i, y_j \rangle - a_{ij} \langle y_j, y_j \rangle = 0\),
and hence
\[ a_{ij} = \frac{\langle x_i, y_j \rangle}{\langle y_j, y_j \rangle}, \qquad j = 1, \ldots, i-1. \]
Continuing this process, we obtain vectors \(y_1, \ldots, y_n\) that are pairwise orthogonal. Since \(x_1, \ldots, x_n\) are linearly independent, each \(y_i\) is nonzero. Setting \(z_i = y_i / \|y_i\|\), the set \(z_1, \ldots, z_n\) is an orthonormal basis. Note that the linear span of \(z_1, \ldots, z_i\) equals the linear span of \(x_1, \ldots, x_i\) for each \(i\).
We remark that given a set of linearly independent vectors \(x_1, \ldots, x_m\), the Gram–Schmidt procedure described above can be used to produce a pairwise orthogonal set \(y_1, \ldots, y_m\) such that, for each \(i\), \(y_i\) equals \(x_i\) minus a linear combination of \(y_1, \ldots, y_{i-1}\) (and hence of \(x_1, \ldots, x_{i-1}\)), \(i = 1, \ldots, m\).
This fact is used in the proof of the next result.
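A minimal numpy sketch of the Gram–Schmidt procedure just described; the function name and the example basis are ours, not from the text. (For numerical work one would normally use a QR factorization, e.g. `numpy.linalg.qr`, instead.)

```python
import numpy as np

def gram_schmidt(X):
    """Orthonormalize the columns of X (assumed linearly independent)
    by the classical Gram-Schmidt procedure."""
    X = np.asarray(X, dtype=float)
    Z = np.zeros_like(X)
    for i in range(X.shape[1]):
        y = X[:, i].copy()
        for j in range(i):
            # coefficient <x_i, z_j>, since the z_j are already normalized
            y -= (X[:, i] @ Z[:, j]) * Z[:, j]
        Z[:, i] = y / np.linalg.norm(y)
    return Z

X = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])       # an arbitrary basis of R^3
Z = gram_schmidt(X)
print(np.round(Z.T @ Z, 10))          # identity: the columns are orthonormal
```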
Let \(W\) be a set (not necessarily a subspace) of vectors in a vector space \(S\). We define
\[ W^{\perp} = \{ x \in S : \langle x, y \rangle = 0 \text{ for all } y \in W \}. \]
It follows from the definitions that \(W^{\perp}\) is a subspace of \(S\).
5.2 Let \(S\) be a subspace of the vector space \(T\) and let \(x \in T\). Then there exists a unique decomposition \(x = u + v\) such that \(u \in S\) and \(v \in S^{\perp}\). The vector \(u\) is called the orthogonal projection of \(x\) on the vector space \(S\).
Proof. We first prove existence. If \(x \in S\), then \(x = x + 0\) is the required decomposition. So suppose \(x \notin S\), and let \(x_1, \ldots, x_m\) be a basis for \(S\). Applying the Gram–Schmidt process to the set \(x_1, \ldots, x_m, x\), we obtain pairwise orthogonal vectors \(y_1, \ldots, y_m, v\). Since \(v\) is orthogonal to each \(y_i\), and since the linear span of \(y_1, \ldots, y_m\) equals that of \(x_1, \ldots, x_m\), we have \(v \in S^{\perp}\). Furthermore, \(x - v\) is a linear combination of \(y_1, \ldots, y_m\), and hence \(x - v \in S\). Thus \(x = (x - v) + v\) is the required decomposition. It remains to show uniqueness.
Suppose \(x = u_1 + v_1 = u_2 + v_2\) are two decompositions satisfying \(u_1, u_2 \in S\) and \(v_1, v_2 \in S^{\perp}\). Then \(u_1 - u_2 = v_2 - v_1\).
Since \(\langle u_1 - u_2, v_1 - v_2 \rangle = 0\), it follows from the preceding equation that \(\langle u_1 - u_2, u_1 - u_2 \rangle = 0\). Then \(u_1 - u_2 = 0\), and hence \(u_1 = u_2\). It easily follows that \(v_1 = v_2\). Thus the decomposition is unique. □
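A small numerical sketch (the subspace and the vector are ours) of the decomposition in 5.2, using the projection matrix \(X(X'X)^{-1}X'\) onto the column space of a full-column-rank matrix \(X\):

```python
import numpy as np

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])        # columns form a basis for the subspace S of R^3
x = np.array([1.0, 2.0, 3.0])

P = X @ np.linalg.solve(X.T @ X, X.T)   # projection matrix onto C(X)
u = P @ x                                # orthogonal projection of x on S
v = x - u                                # component in S-perp
print(np.allclose(X.T @ v, 0))           # True: v is orthogonal to every column of X
```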
5.3 Let \(W\) be a subset of the vector space \(T\) and let \(S\) be the linear span of \(W\). Then \(\dim(S) + \dim(W^{\perp}) = \dim(T)\).
Proof. Suppose \(\dim(S) = m\), \(\dim(W^{\perp}) = n\), and \(\dim(T) = p\). Let \(x_1, \ldots, x_m\) and \(y_1, \ldots, y_n\) be bases for \(S\), \(W^{\perp}\), respectively. Suppose
\[ c_1 x_1 + \cdots + c_m x_m + d_1 y_1 + \cdots + d_n y_n = 0. \]
Let \(u = c_1 x_1 + \cdots + c_m x_m\), \(v = d_1 y_1 + \cdots + d_n y_n\). Since \(x_i, y_j\) are orthogonal for each \(i, j\), \(u\) and \(v\) are orthogonal. However, \(u + v = 0\), and hence \(u = v = 0\).
It follows that \(c_i = 0\), \(d_j = 0\) for each \(i, j\), and hence \(x_1, \ldots, x_m, y_1, \ldots, y_n\) is a linearly independent set. Therefore \(m + n \leq p\). If \(m + n < p\), then there exists a vector \(z \in T\) such that \(x_1, \ldots, x_m, y_1, \ldots, y_n, z\) is a linearly independent set.
Let \(M\) be the linear span of \(x_1, \ldots, x_m, y_1, \ldots, y_n\). By 5.2 there is a decomposition \(z = u + v\), where \(u \in M\) and \(v \in M^{\perp}\). Then \(v\) is orthogonal to each \(x_i\), hence to every vector in \(S\), and in particular to every vector in \(W\); thus \(v \in W^{\perp}\). But \(v\) is also orthogonal to each \(y_j\), and since \(y_1, \ldots, y_n\) span \(W^{\perp}\), we get \(\langle v, v \rangle = 0\), i.e., \(v = 0\). Then \(z = u \in M\), contradicting the linear independence of \(x_1, \ldots, x_m, y_1, \ldots, y_n, z\). Hence \(m + n = p\). □
The proof of the next result is left as an exercise.
5.4 If \(S_1 \subset S_2 \subset T\) are vector spaces, then (i) \(S_2^{\perp} \subset S_1^{\perp}\); (ii) \((S_1^{\perp})^{\perp} = S_1\).
Let \(A\) be an \(m \times n\) matrix. The set of all vectors \(x \in \mathbb{R}^n\) such that \(Ax = 0\) is easily seen to be a subspace of \(\mathbb{R}^n\). This subspace is called the null space of \(A\), and we denote it by \(N(A)\).
5.5 Let \(A\) be an \(m \times n\) matrix. Then \(N(A) = C(A')^{\perp}\).
Proof. If \(x \in N(A)\), then \(Ax = 0\), and hence \(y'Ax = 0\) for all \(y \in \mathbb{R}^m\). Thus \(x\) is orthogonal to any vector in \(C(A')\). Conversely, if \(x \in C(A')^{\perp}\), then \(x\) is orthogonal to every column of \(A'\), and therefore \(Ax = 0\). □
5.6 Let \(A\) be an \(m \times n\) matrix of rank \(r\). Then \(\dim(N(A)) = n - r\).
Proof. We have
\[ \dim(N(A)) = \dim C(A')^{\perp} \ \text{(by 5.5)} = n - \dim C(A') \ \text{(by 5.3)} = n - r. \qquad \square \]
The dimension of the null space of \(A\) is called the nullity of \(A\). Thus 5.6 says that the rank plus the nullity equals the number of columns.
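A quick numerical check of 5.6 (the matrix is ours; SciPy is assumed available for a null-space basis):

```python
import numpy as np
from scipy.linalg import null_space   # orthonormal basis for N(A)

A = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.0],
              [1.0, 0.0, 1.0, 0.0]])
r = np.linalg.matrix_rank(A)          # rank
N = null_space(A)                     # columns span N(A); nullity = N.shape[1]
print(r + N.shape[1] == A.shape[1])   # True: rank + nullity = number of columns
```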
1. Which of the following functions define an inner product on \(\mathbb{R}^3\)? (i) \(f(x, y) = x_1 y_1 + x_2 y_2 + x_3 y_3 + 1\); (ii) \(f(x, y) = 2x_1 y_1 + 3x_2 y_2 + x_3 y_3 - x_1 y_2 - x_2 y_1\); (iii) \(f(x, y) = x_1 y_1 + 2x_2 y_2 + x_3 y_3 + 2x_1 y_2 + 2x_2 y_1\); (iv) \(f(x, y) = x_1 y_1 + x_2 y_2\); (v) \(f(x, y) = x_1^3 y_1^3 + x_2^3 y_2^3 + x_3^3 y_3^3\).
2. Show that the following vectors form a basis for \(\mathbb{R}^3\). Use the Gram–Schmidt procedure to convert it into an orthonormal basis.
Nonsingularity
A system of linear equations in the unknowns \(x_1, \ldots, x_n\) can be written as the matrix equation \(Ax = b\), where \(A\) is the \(m \times n\) matrix of coefficients. The equation \(Ax = b\) is said to be consistent if it has at least one solution; otherwise it is inconsistent. The equation is homogeneous if \(b = 0\). The set of solutions of the homogeneous equation \(Ax = 0\) is precisely the null space \(N(A)\).
If the equation \(Ax = b\) is consistent, then we can write \(b = x_1^0 a_1 + \cdots + x_n^0 a_n\) for some \(x_1^0, \ldots, x_n^0\), where \(a_1, \ldots, a_n\) are the columns of \(A\). Thus \(b \in C(A)\).
Conversely, if \(b \in C(A)\), then \(Ax = b\) is consistent. When the equation is consistent and \(x^0\) is a particular solution, the set of all solutions is \(\{x^0 + z : z \in N(A)\}\).
Clearly, the equation Ax b has either no solution, a unique solution, or infinitely many solutions.
A matrix \(A\) of order \(n \times n\) is said to be nonsingular if \(R(A) = n\); otherwise, the matrix is singular.
6.1 Let \(A\) be an \(n \times n\) matrix. Then the following conditions are equivalent:
(i) \(A\) is nonsingular, i.e., \(R(A) = n\).
(ii) For any \(b \in \mathbb{R}^n\), \(Ax = b\) has a unique solution.
(iii) There exists a unique matrix \(B\) such that \(AB = BA = I\).
Proof. (i) \(\Rightarrow\) (ii). Since \(R(A) = n\), we have \(C(A) = \mathbb{R}^n\), and therefore \(Ax = b\) has a solution. If \(Ax = b\) and \(Ay = b\), then \(A(x - y) = 0\). By 5.6, \(\dim(N(A)) = 0\), and therefore \(x = y\). This proves the uniqueness.
(ii) \(\Rightarrow\) (iii). By (ii), \(Ax = e_i\) has a unique solution, say \(b_i\), where \(e_i\) is the \(i\)th column of the identity matrix. Then \(B = [b_1, \ldots, b_n]\) is a unique matrix satisfying \(AB = I\). Applying the same argument to \(A'\), we conclude the existence of a unique matrix \(C\) such that \(CA = I\). Now \(B = (CA)B = C(AB) = C\).
(iii) \(\Rightarrow\) (i). Suppose (iii) holds. Then any \(x \in \mathbb{R}^n\) can be expressed as \(x = A(Bx)\), and hence \(C(A) = \mathbb{R}^n\). Thus \(R(A)\), which by definition is \(\dim(C(A))\), must be \(n\). □
The matrix \(B\) of (iii) of 6.1 is called the inverse of \(A\) and is denoted by \(A^{-1}\).
If \(A, B\) are nonsingular \(n \times n\) matrices, then \((AB)(B^{-1}A^{-1}) = I\), and therefore \((AB)^{-1} = B^{-1}A^{-1}\).
In particular, the product of two nonsingular matrices is nonsingular. For an \(n \times n\) matrix \(A\), let \(A_{ij}\) denote the submatrix obtained by deleting row \(i\) and column \(j\). The cofactor of \(a_{ij}\) is defined as \((-1)^{i+j}|A_{ij}|\). The adjoint of \(A\), denoted by \(\mathrm{adj}\, A\), is the \(n \times n\) matrix whose \((i, j)\)-entry is the cofactor of \(a_{ji}\).
From the theory of determinants we have
\[ \sum_{j=1}^{n} a_{ij} (-1)^{i+j} |A_{ij}| = |A|, \qquad \text{and, for } i \neq k, \quad \sum_{j=1}^{n} a_{ij} (-1)^{j+k} |A_{kj}| = 0. \]
These equations can be interpreted as \(A(\mathrm{adj}\, A) = (\mathrm{adj}\, A)A = |A| I\).
Thus if \(|A| \neq 0\), then \(A^{-1}\) exists and \(A^{-1} = \frac{1}{|A|}\,\mathrm{adj}\, A\).
Conversely, if \(A\) is nonsingular, then from \(AA^{-1} = I\) we conclude that \(|AA^{-1}| =\)
\(|A||A^{-1}| = 1\), and therefore \(|A| \neq 0\). We have therefore proved the following result:
6.2 A square matrix is nonsingular if and only if its determinant is nonzero.
An \(r \times r\) minor of a matrix \(A\) is defined to be the determinant of an \(r \times r\) submatrix of \(A\).
Let \(A\) be an \(m \times n\) matrix of rank \(r\), and consider an \(s \times s\) minor of \(A\) with \(s > r\). The corresponding \(s\) columns of \(A\) are linearly dependent, and the same dependence relation holds when the columns are restricted to the chosen rows; hence the minor is zero. On the other hand, since \(A\) has rank \(r\), it has \(r\) linearly independent rows; the submatrix \(B\) formed by these rows has rank \(r\), and hence has \(r\) linearly independent columns. The \(r \times r\) submatrix \(C\) formed by these columns has rank \(r\), and therefore \(|C| \neq 0\).
Thus the rank of a matrix can be characterized in terms of its minors: \(R(A) = r\) if and only if there is a nonzero \(r \times r\) minor while every \(s \times s\) minor with \(s > r\) is zero. In particular, the rank is zero if and only if \(A\) is the zero matrix.
1. Let \(A\) be an \(n \times n\) matrix. Show that \(A\) is nonsingular if and only if \(Ax = 0\) has no nonzero solution.
2. Let \(A\) be an \(n \times n\) matrix and let \(b \in \mathbb{R}^n\). Show that \(A\) is nonsingular if and only if \(Ax = b\) has a unique solution.
3. Let \(A\) be an \(n \times n\) matrix with only integer entries. Show that \(A^{-1}\) exists and has only integer entries if and only if \(|A| = \pm 1\).
4. Compute the inverses of the following matrices:
5. Let \(A, B\) be matrices of order \(9 \times 7\) and \(4 \times 3\), respectively. Show that there exists a nonzero \(7 \times 4\) matrix \(X\) such that \(AXB = 0\).
Frobenius Inequality
7.1 Let \(B\) be an \(m \times r\) matrix of rank \(r\). Then there exists a matrix \(X\) (called a left inverse of \(B\)) such that \(XB = I\).
Proof. If \(m = r\), then \(B\) is nonsingular and admits an inverse. So suppose \(r < m\).
The columns of \(B\) are linearly independent, so by 3.4 we can find \(m - r\) additional columns which, together with the columns of \(B\), form a basis for \(\mathbb{R}^m\). In other words, there is an \(m \times (m - r)\) matrix \(U\) such that \([B, U]\) is nonsingular. Write the inverse of \([B, U]\) in partitioned form as \(\begin{bmatrix} X \\ Y \end{bmatrix}\), where \(X\) is \(r \times m\). From \(\begin{bmatrix} X \\ Y \end{bmatrix}[B, U] = I\) we get \(XB = I\). □
We can similarly show that an \(r \times n\) matrix \(C\) of rank \(r\) has a right inverse, i.e., a matrix \(Y\) such that \(CY = I\). Note that a left inverse or a right inverse is not unique, unless the matrix is square and nonsingular.
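A sketch of the construction in 7.1 (the example matrix and the extending column are ours): extend the columns of \(B\) to a basis of \(\mathbb{R}^m\), invert the extended matrix, and read off a left inverse from its first \(r\) rows.

```python
import numpy as np

B = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])              # 3 x 2, rank 2
m, r = B.shape

U = np.array([[0.0], [0.0], [1.0]])     # extends the columns of B to a basis of R^3
M = np.hstack([B, U])                   # [B, U] is nonsingular
X = np.linalg.inv(M)[:r, :]             # first r rows of the inverse
print(np.allclose(X @ B, np.eye(r)))    # True: X is a left inverse of B
```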
7.2 Let \(B\) be an \(m \times r\) matrix of rank \(r\). Then there exists a nonsingular matrix \(P\) such that \(PB = \begin{bmatrix} I \\ 0 \end{bmatrix}\).
Proof. The proof is the same as that of 7.1. If we set \(P = \begin{bmatrix} X \\ Y \end{bmatrix}\), the inverse of \([B, U]\), then \(P\) is nonsingular and \(PB = \begin{bmatrix} I \\ 0 \end{bmatrix}\). □
Similarly, if \(C\) is \(r \times n\) of rank \(r\), then there exists a nonsingular matrix \(Q\) such that
\(CQ = [I, 0]\). These two results and the rank factorization (see 4.3) immediately lead to the following.
7.3 Let \(A\) be an \(m \times n\) matrix of rank \(r\). Then there exist nonsingular matrices \(P, Q\) such that \(PAQ = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}\).
7.4 Let \(A, B\) be matrices of order \(m \times n\) and \(n \times p\), respectively. If \(R(A) = n\), then \(R(AB) = R(B)\). If \(R(B) = n\), then \(R(AB) = R(A)\).
Proof. First suppose \(R(A) = n\). By 7.1, there exists a matrix \(X\) such that \(XA = I\). Then
\[ R(B) = R(XAB) \leq R(AB) \leq R(B), \]
and hence \(R(AB) = R(B)\). The second part follows similarly. □
As an immediate corollary of 7.4we see that the rank is not affected upon multiplying by a nonsingular matrix.
7.5 Let \(A\) be an \(m \times n\) matrix of rank \(r\). Then there exists an \(m \times n\) matrix \(Z\) of rank \(n - r\) such that \(A + Z\) has rank \(n\).
Proof. By 7.3 there exist nonsingular matrices \(P, Q\) such that \(PAQ = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}\). Set
\[ Z = P^{-1} \begin{bmatrix} 0 & 0 \\ 0 & W \end{bmatrix} Q^{-1}, \]
where \(W\) is any matrix of rank \(n - r\). Then it is easily verified that \(P(A + Z)Q\) has rank \(n\). Since \(P, Q\) are nonsingular, it follows by the remark immediately preceding the result that \(A + Z\) has rank \(n\). □
Observe that7.5may also be proved using rank factorization; we leave this as an exercise.
7.6 (The Frobenius Inequality) Let \(A, B\) be matrices of order \(m \times n\) and \(n \times p\), respectively. Then \(R(AB) \geq R(A) + R(B) - n\).
Proof. By 7.5 there exists a matrix \(Z\) of rank \(n - R(A)\) such that \(A + Z\) has rank \(n\). Since \(A + Z\) has rank \(n\), \(R((A + Z)B) = R(B)\) by 7.4. Also, by 4.4 and 4.2,
\[ R(B) = R(AB + ZB) \leq R(AB) + R(ZB) \leq R(AB) + R(Z) = R(AB) + n - R(A), \]
and the result follows. □
1. Let \(A, X, B\) be matrices such that the product \(AXB\) is defined. Prove the following generalization of the Frobenius inequality:
2. Let \(A\) be an \(n \times n\) matrix such that \(A^2 = I\). Show that \(R(I + A) + R(I - A) = n\).
Eigenvalues and the Spectral Theorem
Let \(A\) be an \(n \times n\) matrix. The determinant \(|A - \lambda I|\) is a polynomial in the (complex) variable \(\lambda\) of degree \(n\) and is called the characteristic polynomial of \(A\). The equation
\(|A - \lambda I| = 0\) is called the characteristic equation of \(A\). By the fundamental theorem of algebra, the equation has \(n\) roots, and these roots are called the eigenvalues of \(A\).
The eigenvalues may not all be distinct. The number of times an eigenvalue occurs as a root of the characteristic equation is called the algebraic multiplicity of the eigenvalue.
We factor the characteristic polynomial as
\[ |A - \lambda I| = (\lambda_1 - \lambda)(\lambda_2 - \lambda) \cdots (\lambda_n - \lambda). \tag{4} \]
Setting \(\lambda = 0\) in (4) we see that \(|A|\) is just the product of the eigenvalues of \(A\).
Similarly, by equating the coefficients of \(\lambda^{n-1}\) on either side of (4), we see that the trace of \(A\) equals the sum of the eigenvalues.
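A quick numerical check (the matrix is ours) that the determinant is the product of the eigenvalues and the trace is their sum:

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
eigvals = np.linalg.eigvals(A)
print(np.isclose(np.prod(eigvals).real, np.linalg.det(A)))   # det = product of eigenvalues
print(np.isclose(np.sum(eigvals).real, np.trace(A)))         # trace = sum of eigenvalues
```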
Aprincipal submatrixof a square matrix is a submatrix formed by a set of rows and the corresponding set of columns Aprincipal minorofAis the determinant of a principal submatrix.
A square matrix \(A\) is called symmetric if \(A = A'\). An \(n \times n\) matrix \(A\) is said to be positive definite if it is symmetric and if for any nonzero vector \(x\), \(x'Ax > 0\).
The identity matrix is clearly positive definite and so is a diagonal matrix with only positive entries along the diagonal.
8.1 If A is positive definite, then it is nonsingular.
Proof. If \(Ax = 0\), then \(x'Ax = 0\), and since \(A\) is positive definite, \(x = 0\). Thus \(Ax = 0\) has only the trivial solution, and hence \(A\) is nonsingular. □
The next result is obvious from the definition.
8.2 If \(A, B\) are positive definite and if \(\alpha \geq 0\), \(\beta \geq 0\), with \(\alpha + \beta > 0\), then \(\alpha A + \beta B\) is positive definite.
8.3 If \(A\) is positive definite, then \(|A| > 0\).
Proof. Define \(f(\alpha) = |\alpha A + (1 - \alpha)I|\) for \(0 \leq \alpha \leq 1\).
By 8.2, \(\alpha A + (1 - \alpha)I\) is positive definite, and therefore, by 8.1, \(f(\alpha) \neq 0\), \(0 \leq \alpha \leq 1\). Clearly \(f(0) = 1\), and since \(f\) is continuous and never zero on \([0, 1]\), \(f(1) = |A| > 0\). □
8.4 If A is positive definite, then any principal submatrix of A is positive definite.
Proof. Suppose \(A\) is positive definite, so that \(x'Ax > 0\) for every nonzero \(x\). Let \(B\) be the principal submatrix of \(A\) obtained by deleting the rows and columns indexed by \(j_1, \ldots, j_s\). For any nonzero vector \(y\) of appropriate order, let \(x\) be the vector obtained from \(y\) by inserting zeros in coordinates \(j_1, \ldots, j_s\). Then \(y'By = x'Ax > 0\), and hence \(B\), and thus any principal submatrix of \(A\), is positive definite. □
A symmetric \(n \times n\) matrix \(A\) is said to be positive semidefinite if \(x'Ax \geq 0\) for all \(x \in \mathbb{R}^n\).
8.5 If A is a symmetric matrix, then the eigenvalues of A are all real.
Proof. Suppose \(\mu\) is an eigenvalue of \(A\) and let \(\mu = \alpha + i\beta\), where \(\alpha, \beta\) are real and \(i = \sqrt{-1}\). Then \(|A - (\alpha + i\beta)I| = 0\).
Taking the complex conjugate of the above determinant and multiplying the two, we get
\[ \big|(A - \alpha I)^2 + \beta^2 I\big| = 0. \tag{5} \]
Since \(A\) is symmetric, \((A - \alpha I)^2 = (A - \alpha I)'(A - \alpha I)\) is positive semidefinite (for any real matrix \(B\), \(B'B\) is positive semidefinite). Hence, if \(\beta \neq 0\), then \((A - \alpha I)^2 + \beta^2 I\) is positive definite, so its determinant is positive and (5) cannot hold. Therefore \(\beta = 0\), and \(\mu = \alpha\) is real. □
If \(A\) is a symmetric \(n \times n\) matrix, we will denote the eigenvalues of \(A\) by \(\lambda_1(A) \geq \cdots \geq \lambda_n(A)\), and occasionally by \(\lambda_1 \geq \cdots \geq \lambda_n\) if there is no possibility of confusion.
Let \(A\) be a symmetric \(n \times n\) matrix. Then for any \(i\), \(|A - \lambda_i I| = 0\), and therefore
the matrix \(A - \lambda_i I\) is singular, so its null space \(N(A - \lambda_i I)\) has dimension at least one. This null space is called the eigenspace of \(A\) corresponding to the eigenvalue \(\lambda_i\), and any nonzero vector in the eigenspace is called an eigenvector of \(A\) corresponding to \(\lambda_i\). The dimension of the null space is called the geometric multiplicity of \(\lambda_i\).
8.6 Let \(A\) be a symmetric \(n \times n\) matrix, and let \(\lambda \neq \mu\) be eigenvalues of \(A\) with \(x, y\) as corresponding eigenvectors, respectively. Then \(x'y = 0\).
Proof. We have \(Ax = \lambda x\) and \(Ay = \mu y\). Therefore \(y'Ax = y'(Ax) = \lambda y'x\).
Also, \(y'Ax = (y'A)x = \mu y'x\). Thus \(\lambda y'x = \mu y'x\). Since \(\lambda \neq \mu\), it follows that \(x'y = 0\). □
A square matrix \(P\) is said to be orthogonal if \(P^{-1} = P'\), that is to say, if \(PP' = P'P = I\). Thus an \(n \times n\) matrix is orthogonal if its rows (as well as its columns) form an orthonormal basis for \(\mathbb{R}^n\).
The identity matrix is orthogonal. Any matrix obtained from the identity matrix by permuting its rows (or columns) is called a permutation matrix; a permutation matrix is orthogonal. Furthermore, the product of orthogonal matrices is orthogonal.
8.7 (The Spectral Theorem) Let \(A\) be a symmetric \(n \times n\) matrix. Then there exists an orthogonal matrix \(P\) such that
\[ P'AP = \mathrm{diag}(\lambda_1, \ldots, \lambda_n). \tag{6} \]
Proof. The proof is by induction on \(n\); the result is trivial for \(n = 1\), so assume it holds for matrices of order \(n - 1\). Let \(x\) be an eigenvector of \(A\) corresponding to \(\lambda_1\), with \(\|x\| = 1\), and let \(Q\) be an orthogonal matrix with \(x\) as its first column. (Such a \(Q\) can be constructed by extending \(x\) to a basis for \(\mathbb{R}^n\) and applying the Gram–Schmidt process.) Then \(Q'AQ\) has the form
\[ Q'AQ = \begin{bmatrix} \lambda_1 & 0 \\ 0 & B \end{bmatrix} \]
for some matrix \(B\) of order \(n - 1\).
The eigenvalues of \(Q'AQ\) are also \(\lambda_1, \ldots, \lambda_n\), and hence the eigenvalues of \(B\) are \(\lambda_2, \ldots, \lambda_n\). Clearly \(B\) is symmetric, since \(Q'AQ\) is so. By the induction assumption there exists an orthogonal matrix \(R\) such that \(R'BR = \mathrm{diag}(\lambda_2, \ldots, \lambda_n)\). Setting
\[ P = Q \begin{bmatrix} 1 & 0 \\ 0 & R \end{bmatrix}, \]
we see that \(P\) is orthogonal and \(P'AP = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)\). □
Suppose the matrix \(P\) in 8.7 has columns \(x_1, \ldots, x_n\). Then, since
\(AP = P\,\mathrm{diag}(\lambda_1, \ldots, \lambda_n)\), we have \(Ax_i = \lambda_i x_i\). In other words, \(x_i\) is an eigenvector of \(A\) corresponding to \(\lambda_i\). Another way of writing (6) is
\[ A = \sum_{i=1}^{n} \lambda_i x_i x_i'. \]
This is known as thespectral decompositionofA
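A numpy sketch (the matrix is ours) of the spectral decomposition \(A = \sum_i \lambda_i x_i x_i'\) for a symmetric matrix:

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
lam, P = np.linalg.eigh(A)            # P orthogonal, columns are eigenvectors
assert np.allclose(P.T @ A @ P, np.diag(lam))
reconstructed = sum(lam[i] * np.outer(P[:, i], P[:, i]) for i in range(3))
assert np.allclose(reconstructed, A)  # A equals the sum of lambda_i x_i x_i'
```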
8.8 Let \(A\) be a symmetric \(n \times n\) matrix. Then \(A\) is positive definite if and only if the eigenvalues of \(A\) are all positive.
Proof. By the Spectral Theorem, \(P'AP = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)\) for an orthogonal matrix \(P\). The result follows from the fact that \(A\) is positive definite if and only if \(P'AP\) is positive definite, and a diagonal matrix is positive definite if and only if its diagonal entries are all positive. □
Similarly, a symmetric matrix is positive semidefinite if and only if its eigenvalues are all nonnegative.
8.9 If \(A\) is positive semidefinite, then there exists a unique positive semidefinite matrix \(B\) such that \(B^2 = A\). The matrix \(B\) is called the square root of \(A\) and is denoted by \(A^{1/2}\).
Proof. There exists an orthogonal matrix \(P\) such that (6) holds. Since \(A\) is positive semidefinite, \(\lambda_i \geq 0\), \(i = 1, \ldots, n\). Set
\[ B = P\,\mathrm{diag}\big(\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_n}\big)\,P'. \]
Then \(B\) is positive semidefinite and \(B^2 = A\).
To prove uniqueness, suppose \(B, C\) are positive semidefinite matrices satisfying \(B^2 = C^2 = A\); we must show that \(B = C\). Let \(D = B - C\). By the spectral theorem there exists an orthogonal matrix \(Q\) such that \(Z = QDQ'\) is a diagonal matrix. Set \(E = QBQ'\), \(F = QCQ'\); it is sufficient to show that \(E = F\).
Since \(Z = E - F\) is a diagonal matrix, \(e_{ij} = f_{ij}\) for \(i \neq j\). Also,
\[ EZ + ZF = E(E - F) + (E - F)F = E^2 - F^2 = Q(B^2 - C^2)Q' = 0, \]
and therefore, for each \(i\), \(e_{ii}z_{ii} + z_{ii}f_{ii} = 0\), i.e., \(z_{ii}(e_{ii} + f_{ii}) = 0\). If \(e_{ii} + f_{ii} \neq 0\), then \(z_{ii} = 0\), i.e., \(e_{ii} = f_{ii}\). If \(e_{ii} + f_{ii} = 0\), then, since \(E\) and \(F\) are positive semidefinite, \(e_{ii} \geq 0\) and \(f_{ii} \geq 0\), and hence \(e_{ii} = f_{ii} = 0\). Therefore \(e_{ii} = f_{ii}\) for all \(i = 1, \ldots, n\), so \(E = F\), completing the proof. □
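A sketch of the construction in 8.9 (the matrix is ours): form \(A^{1/2} = P\,\mathrm{diag}(\sqrt{\lambda_i})\,P'\) from an eigendecomposition.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])              # positive definite
lam, P = np.linalg.eigh(A)
B = P @ np.diag(np.sqrt(np.clip(lam, 0, None))) @ P.T
assert np.allclose(B @ B, A)            # B is the positive semidefinite square root
```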
A square matrix \(A\) is said to be idempotent if \(A^2 = A\).
8.10 If \(A\) is idempotent, then each eigenvalue of \(A\) is either \(0\) or \(1\).
Proof. Let \(\lambda_1, \ldots, \lambda_n\) be the eigenvalues of \(A\). Then \(\lambda_1^2, \ldots, \lambda_n^2\) are the eigenvalues of \(A^2\) (see Exercise 12). Since \(A = A^2\), \(\{\lambda_1^2, \ldots, \lambda_n^2\} = \{\lambda_1, \ldots, \lambda_n\}\), and it follows that \(\lambda_i = 0\) or \(1\) for each \(i\). □
Conversely, ifAis symmetric and if each eigenvalue ofAis 0 or 1, thenAis idempotent This follows by an application of the spectral theorem.
We say that a matrix has full row (or column) rank if its rank equals the number of rows (or columns).
8.11 If \(A\) is idempotent, then \(R(A) = \mathrm{trace}\, A\).
Proof. Let \(A = BC\) be a rank factorization. Since \(B\) has full column rank, it admits a left inverse by 7.1. Similarly, \(C\) admits a right inverse. Let \(B_\ell^-\) and \(C_r^-\) be a left inverse of \(B\) and a right inverse of \(C\), respectively. Then \(A^2 = A\) implies \(BCBC = BC\),
and hence \(CB = B_\ell^-(BCBC)C_r^- = B_\ell^-(BC)C_r^- = I\), where the order of the identity matrix is the same as \(R(A)\). Therefore \(\mathrm{trace}\, A = \mathrm{trace}\, BC = \mathrm{trace}\, CB = R(A)\). □
More results on positive definite matrices and idempotent matrices are given in the Exercises.
1. Let \(A\) be a symmetric \(n \times n\) matrix such that the sum of the entries in each row equals \(\alpha\). Show that \(\alpha\) is an eigenvalue of \(A\). If \(\alpha, \alpha_2, \ldots, \alpha_n\) are the eigenvalues of \(A\), what are the eigenvalues of \(A + \beta J\), where \(\beta\) is a real number and
\(J\) is the \(n \times n\) matrix of all ones?
2. Find the eigenvalues of the \(n \times n\) matrix with all diagonal entries equal to \(a\) and all the remaining entries equal to \(b\).
3. If \(A\) is a symmetric matrix, then show that the algebraic multiplicity of any eigenvalue of \(A\) equals its geometric multiplicity.
4. If \(A\) is a symmetric matrix, what would be a natural way to define matrices \(\sin A\) and \(\cos A\)? Does your definition respect the identity \((\sin A)^2 + (\cos A)^2 = I\)?
5. Let \(A\) be a symmetric, nonsingular matrix. Show that \(A\) is positive definite if and only if \(A^{-1}\) is positive definite.
6. Let \(A\) be an \(n \times n\) positive definite matrix and let \(x \in \mathbb{R}^n\) with \(\|x\| = 1\). Show that
7. Let \(\theta_1, \ldots, \theta_n \in [-\pi, \pi]\) and let \(A\) be the \(n \times n\) matrix with its \((i, j)\)-entry given by \(\cos(\theta_i - \theta_j)\) for all \(i, j\). Show that \(A\) is positive semidefinite. What can you say about the rank of \(A\)?
Exercises
1. Consider the set of all vectors \(x\) in \(\mathbb{R}^n\) such that \(\sum_{i=1}^{n} x_i = 0\). Show that the set is a vector space and find a basis for the space.
2. Consider the set of all \(n \times n\) matrices \(A\) such that \(\mathrm{trace}\, A = 0\). Show that the set is a vector space and find its dimension.
3. Let \(A\) be an \(n \times n\) matrix such that \(\mathrm{trace}\, AB = 0\) for every \(n \times n\) matrix \(B\). Can we conclude that \(A\) must be the zero matrix?
4. Let \(A\) be an \(n \times n\) matrix of rank \(r\). If rows \(i_1, \ldots, i_r\) are linearly independent and if columns \(j_1, \ldots, j_r\) are linearly independent, then show that the \(r \times r\) submatrix formed by these rows and columns has rank \(r\).
5. For any matrix \(A\), show that \(A = 0\) if and only if \(\mathrm{trace}\, A'A = 0\).
6. Let \(A\) be a square matrix. Prove that the following conditions are equivalent: (i) \(A\) is symmetric; (ii) \(A^2 = AA'\); (iii) \(\mathrm{trace}\, A^2 = \mathrm{trace}\, AA'\); (iv) \(A^2 = A'A\).
7. Let \(A\) be a square matrix with all row sums equal to 1. If \(AA' = A'A\), then show that the column sums of \(A\) are also equal to 1.
8. Let \(A, B, C, D\) be \(n \times n\) matrices such that the matrix
\[ \begin{bmatrix} A & B \\ C & D \end{bmatrix} \]
has rank \(n\). Show that \(|AD| = |BC|\).
9. Let \(A, B\) be \(n \times n\) matrices such that \(R(AB) = R(B)\). Show that the following type of cancellation is valid: whenever \(ABX = ABY\), then \(BX = BY\).
10. Let \(A\) be an \(n \times n\) matrix such that \(R(A) = R(A^2)\). Show that \(R(A) = R(A^k)\) for any positive integer \(k\).
11. Let \(A, B\) be \(n \times n\) matrices such that \(AB = 0\). Show that \(R(A) + R(B) \leq n\).
12. If \(A\) has eigenvalues \(\lambda_1, \ldots, \lambda_n\), then show that \(A^2\) has eigenvalues \(\lambda_1^2, \ldots, \lambda_n^2\).
13. If \(A, B\) are \(n \times n\) matrices, then show that \(AB\) and \(BA\) have the same eigenvalues.
14. Let \(A, B\) be matrices of order \(m \times n\), \(n \times m\), respectively. Consider the identity
Now obtain a relationship between the characteristic polynomials of \(AB\) and
\(BA\). Conclude that the nonzero eigenvalues of \(AB\) and \(BA\) are the same.
15. If \(S\) is a nonsingular matrix, then show that \(A\) and \(S^{-1}AS\) have the same eigenvalues.
16. Suppose \(A\) is an \(n \times n\) matrix, and let
\[ |A - \lambda I| = c_0 - c_1\lambda + c_2\lambda^2 - \cdots + c_n(-1)^n\lambda^n \]
be the characteristic polynomial of \(A\). The Cayley–Hamilton theorem asserts that \(A\) satisfies its characteristic equation, i.e.,
\[ c_0 I - c_1 A + c_2 A^2 - \cdots + c_n(-1)^n A^n = 0. \]
Prove the theorem for a diagonal matrix. Then prove the theorem for any symmetric matrix.
17. Prove the following: If \(A = B'B\) for some matrix \(B\), then \(A\) is positive semidefinite. Further, \(A\) is positive definite if \(B\) has full column rank.
18. Show that if \(A\) is positive semidefinite, then \(|A| \geq 0\), and, more generally, that all principal minors of \(A\) are nonnegative. Show also that a positive semidefinite matrix is positive definite if and only if it is nonsingular.
19. For any matrix \(X\), show that \(R(X'X) = R(X)\). If \(A\) is positive definite, then show that \(R(X'AX) = R(X)\) for any \(X\).
20. Let \(A\) be a square matrix such that \(A + A'\) is positive definite. Then prove that
21. If \(A\) is symmetric, then show that \(R(A)\) equals the number of nonzero eigenvalues of \(A\), counting multiplicity.
22. Let \(A\) have eigenvalues \(\lambda_1, \ldots, \lambda_n\) and let \(1 \leq k \leq n\). Show that
Hint: If \(t > 0\), the \(n \times n\) matrix \((t^{x_i + x_j})\) is positive semidefinite. Now use the fact that \(\frac{1}{x_i + x_j} = \int_0^1 t^{x_i + x_j - 1}\, dt\).
34. Using the spectral theorem we may assume, without loss of generality, that
\(X = \mathrm{diag}(x_1, \ldots, x_n)\). Let \(XY + YX = Z\). Then \(y_{ij}(x_i + x_j) = z_{ij}\), and hence \(y_{ij} = \dfrac{z_{ij}}{x_i + x_j}\) for all \(i, j\). Now use the preceding two exercises.
35. Hint: Let \(X = A^{1/2} + B^{1/2}\), \(Y = A^{1/2} - B^{1/2}\). Then \(XY + YX = 2(A - B)\), which is positive semidefinite. Now use the preceding exercise.
Generalized Inverses
Let \(A\) be an \(m \times n\) matrix. A matrix \(G\) of order \(n \times m\) is said to be a generalized inverse (or a g-inverse) of \(A\) if \(AGA = A\).
If \(A\) is square and nonsingular, then \(A^{-1}\) is the unique g-inverse of \(A\). Otherwise,
\(A\) has infinitely many g-inverses, as we will see shortly.
1.1 Let \(A, G\) be matrices of order \(m \times n\) and \(n \times m\), respectively. Then the following conditions are equivalent:
(i) \(G\) is a g-inverse of \(A\).
(ii) For any \(y \in C(A)\), \(x = Gy\) is a solution of \(Ax = y\).
Proof. (i) \(\Rightarrow\) (ii). Any \(y \in C(A)\) is of the form \(y = Az\) for some \(z\). Then \(A(Gy) = AGAz = Az = y\).
(ii) \(\Rightarrow\) (i). Since \(AGy = y\) for any \(y \in C(A)\), we have \(AGAz = Az\) for all \(z\). In particular, if we let \(z\) be the \(i\)th column of the identity matrix, then we see that the \(i\)th columns of \(AGA\) and \(A\) are identical. Therefore \(AGA = A\). □
Let \(A = BC\) be a rank factorization. We have seen that \(B\) admits a left inverse
\(B_\ell^-\), and \(C\) admits a right inverse \(C_r^-\). Then \(G = C_r^- B_\ell^-\) is a g-inverse of \(A\), since
\(AGA = BC C_r^- B_\ell^- BC = BC = A\).
Alternatively, if \(A\) has rank \(r\), then by 7.3 of Chapter 1 there exist nonsingular matrices \(P, Q\) such that
\[ A = P \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} Q. \]
It can be verified that for any \(U, V, W\) of appropriate dimensions,
\[ G = Q^{-1} \begin{bmatrix} I_r & U \\ V & W \end{bmatrix} P^{-1} \]
is a g-inverse of \(A\). This also shows that any matrix that is not a square nonsingular matrix admits infinitely many g-inverses.
Another method that is particularly suitable for computing a g-inverse is as follows. Let \(A\) be of rank \(r\). Choose any \(r \times r\) nonsingular submatrix of \(A\). For convenience let us assume
\[ A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \]
where \(A_{11}\) is \(r \times r\) and nonsingular. Since \(A\) has rank \(r\), there exists a matrix \(X\) such that \(A_{12} = A_{11}X\), \(A_{22} = A_{21}X\). Now it can be verified that the \(n \times m\) matrix
\[ G = \begin{bmatrix} A_{11}^{-1} & 0 \\ 0 & 0 \end{bmatrix} \]
is a g-inverse of \(A\). (Just multiply \(AGA\) out.) We will often use the notation \(A^{-}\) to denote a g-inverse of \(A\).
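A numerical sketch of this construction (the matrix is ours), with the leading \(2 \times 2\) block nonsingular:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 5.0, 8.0],
              [3.0, 7.0, 11.0]])       # rank 2; leading 2 x 2 block nonsingular
r = 2
A11 = A[:r, :r]
G = np.zeros((A.shape[1], A.shape[0]))
G[:r, :r] = np.linalg.inv(A11)         # G = [[A11^{-1}, 0], [0, 0]]
print(np.allclose(A @ G @ A, A))       # True: AGA = A, so G is a g-inverse
```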
1.2 If \(G\) is a g-inverse of \(A\), then \(R(A) = R(AG) = R(GA)\).
Proof. \(R(A) = R(AGA) \leq R(AG) \leq R(A)\), and hence \(R(A) = R(AG)\). The second part follows similarly. □
A g-inverse of \(A\) is called a reflexive g-inverse if it also satisfies \(GAG = G\).
Observe that if \(G\) is any g-inverse of \(A\), then \(GAG\) is a reflexive g-inverse of \(A\).
1.3 Let \(G\) be a g-inverse of \(A\). Then \(R(A) \leq R(G)\). Furthermore, equality holds if and only if \(G\) is reflexive.
Proof. For any g-inverse \(G\) we have \(R(A) = R(AGA) \leq R(G)\). If \(G\) is reflexive, then \(R(G) = R(GAG) \leq R(A)\), and hence \(R(A) = R(G)\).
Conversely, suppose \(R(A) = R(G)\). First observe that \(C(GA) \subset C(G)\). By 1.2,
\(R(G) = R(GA)\), and hence \(C(G) = C(GA)\). Therefore \(G = GAX\) for some \(X\). Then \(GAG = GAGAX = GAX = G\), and hence \(G\) is reflexive. □
1.4 Let \(A\) be an \(m \times n\) matrix, let \(G\) be a g-inverse of \(A\), and let \(y \in C(A)\). Then the class of solutions of \(Ax = y\) is given by \(Gy + (I - GA)z\), where \(z\) is arbitrary.
Proof. We have \(A\{Gy + (I - GA)z\} = AGy = y\), since \(y \in C(A)\), and hence \(Gy + (I - GA)z\) is a solution. Conversely, if \(u\) is a solution, then set \(z = u - Gy\) and verify that \(u = Gy + (I - GA)z\). □
A g-inverse \(G\) of \(A\) is said to be a minimum norm g-inverse of \(A\) if, in addition to \(AGA = A\), it satisfies \((GA)' = GA\). The reason for this terminology will be clear from the next result.
1.5 Let \(A\) be an \(m \times n\) matrix. Then the following conditions are equivalent:
(i) \(G\) is a minimum norm g-inverse of \(A\).
(ii) For any \(y \in C(A)\), \(x = Gy\) is a solution of \(Ax = y\) with minimum norm.
Proof. (i) \(\Rightarrow\) (ii). In view of 1.4 we must show that
\[ \|Gy\| \leq \|Gy + (I - GA)z\| \tag{1} \]
for any \(y \in C(A)\) and for any \(z\). We have
\[ \|Gy + (I - GA)z\|^2 = \|Gy\|^2 + \|(I - GA)z\|^2 + 2y'G'(I - GA)z. \tag{2} \]
Since \(y \in C(A)\), we have \(y = Au\) for some \(u\). Hence
\[ y'G'(I - GA)z = u'A'G'(I - GA)z = u'GA(I - GA)z = u'(GA - GAGA)z = 0, \]
using \((GA)' = GA\) and \(GAGA = GA\). Inserting this in (2) we get (1).
(ii) \(\Rightarrow\) (i). Since for any \(y \in C(A)\), \(x = Gy\) is a solution of \(Ax = y\), by 1.1, \(G\) is a g-inverse of \(A\). Now (1) holds for all \(z\) and for all \(y = Au\), and therefore, for all \(u, z\),
\[ \|(I - GA)z\|^2 + 2u'A'G'(I - GA)z \geq 0. \tag{3} \]
In (3), replace \(u\) by \(\alpha u\). If \(u'A'G'(I - GA)z < 0\), then choosing \(\alpha\) large and positive leads to a contradiction; if \(u'A'G'(I - GA)z > 0\), then choosing \(\alpha\) large and negative leads to a contradiction. Consequently \(u'A'G'(I - GA)z = 0\) for all \(u, z\), which implies \(A'G'(I - GA) = 0\). Thus \((GA)' = A'G' = A'G'GA = (GA)'GA\), which is a symmetric matrix, and hence \((GA)' = GA\). □
A g-inverse \(G\) of \(A\) is said to be a least squares g-inverse of \(A\) if, in addition to
\(AGA = A\), it satisfies \((AG)' = AG\).
1.6 Let \(A\) be an \(m \times n\) matrix. Then the following conditions are equivalent:
(i) \(G\) is a least squares g-inverse of \(A\).
(ii) For any \(x, y\), \(\|AGy - y\| \leq \|Ax - y\|\).
Proof. (i) \(\Rightarrow\) (ii). Let \(x - Gy = w\). Then we must show
\[ \|AGy - y\| \leq \|AGy - y + Aw\|. \tag{4} \]
We have
\[ \|AGy - y + Aw\|^2 = \|(AG - I)y\|^2 + \|Aw\|^2 + 2w'A'(AG - I)y. \tag{5} \]
But \(w'A'(AG - I)y = w'(A'G'A' - A')y = 0\), since \((AG)' = AG\) and \(AGA = A\). Inserting this in (5) we get (4).
(ii) \(\Rightarrow\) (i). For any vector \(x\), set \(y = Ax\) in (ii). Then we see that
\(\|AGAx - Ax\| \leq \|Ax - Ax\| = 0\), and hence \(AGAx = Ax\). Since \(x\) is arbitrary, \(AGA = A\), and \(G\) is a g-inverse of \(A\). The rest of the proof is similar to that of the implication (ii) \(\Rightarrow\) (i) in 1.5 and is left as an exercise. □
Thus if the equation \(Ax = y\) is not necessarily consistent, and we wish to find a vector \(x\) minimizing \(\|Ax - y\|\), then we may take \(x = Gy\), where \(G\) is any least squares g-inverse of \(A\).
If \(G\) is a reflexive g-inverse of \(A\) that is both minimum norm and least squares, then it is called a Moore–Penrose inverse of \(A\). In other words, \(G\) is a Moore–Penrose inverse of \(A\) if it satisfies
\[ AGA = A, \quad GAG = G, \quad (AG)' = AG, \quad (GA)' = GA. \tag{6} \]
We now show that such a \(G\) exists and is unique. To prove uniqueness, suppose \(G_1, G_2\) both satisfy (6); we show that \(G_1 = G_2\). Each step in the following chain uses (6):
\[ AG_1 = (AG_1)' = G_1'A' = G_1'(AG_2A)' = G_1'A'(AG_2)' = (AG_1)'AG_2 = AG_1AG_2 = AG_2, \]
and similarly \(G_1A = G_2A\). Therefore \(G_1 = G_1AG_1 = G_1AG_2 = G_2AG_2 = G_2\).
We will denote the Moore–Penrose inverse of \(A\) by \(A^+\). We now show the existence. Let \(A = BC\) be a rank factorization. Then it can be easily verified that
\[ A^+ = C'(CC')^{-1}(B'B)^{-1}B' \]
satisfies the four conditions in (6).
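A numerical sketch of this existence construction (the matrix is ours): compute \(A^+\) from a rank factorization and compare with numpy's `pinv`.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 1.0, 1.0]])      # rank 2

# Rank factorization A = BC: columns of B are a basis for C(A), C solves BC = A.
B = A[:, :2]                          # first two columns are linearly independent
C = np.linalg.lstsq(B, A, rcond=None)[0]
assert np.allclose(B @ C, A)

G = C.T @ np.linalg.inv(C @ C.T) @ np.linalg.inv(B.T @ B) @ B.T
assert np.allclose(G, np.linalg.pinv(A))   # agrees with the Moore-Penrose inverse
```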
1 Find two different g-inverses of
2 Find the minimum norm solution of the system of equations
3 Find the Moore–Penrose inverse of 2 4
Linear Model
Let \(y\) be a column vector with components \(y_1, \ldots, y_n\). We call \(y\) a random vector if each \(y_i\) is a random variable. The expectation of \(y\), denoted by \(E(y)\), is the column
vector whose \(i\)th component is \(E(y_i)\). Clearly,
\[ E(Bx + Cy) = BE(x) + CE(y), \]
where \(x, y\) are random vectors and \(B, C\) are constant nonrandom matrices.
If \(x, y\) are random vectors of order \(m, n\), respectively, then the covariance matrix between \(x, y\), denoted by \(\mathrm{cov}(x, y)\), is the \(m \times n\) matrix whose \((i, j)\)-entry is \(\mathrm{cov}(x_i, y_j)\).
The dispersion matrix, or the variance-covariance matrix, of \(y\), denoted by \(D(y)\), is defined to be \(\mathrm{cov}(y, y)\). The dispersion matrix is obviously symmetric.
If \(b, c\) are constant vectors, then
\[ \mathrm{cov}(b'x, c'y) = \mathrm{cov}(b_1x_1 + \cdots + b_mx_m,\; c_1y_1 + \cdots + c_ny_n) = \sum_{i=1}^{m}\sum_{j=1}^{n} b_i c_j \,\mathrm{cov}(x_i, y_j) = b'\,\mathrm{cov}(x, y)\,c. \]
It follows that if \(B, C\) are constant matrices, then \(\mathrm{cov}(Bx, Cy) = B\,\mathrm{cov}(x, y)\,C'\). In particular, \(\mathrm{var}(b'x) = b'D(x)b\) for any constant vector \(b\).
Since variance is nonnegative, we conclude that \(D(x)\) is positive semidefinite. Note that \(D(x)\) is positive definite unless there exists a linear combination \(b'x\) that is constant with probability one.
We now introduce the concept of a linear model. Suppose an experiment yields the random variables \(y_1, \ldots, y_n\), and suppose their distribution depends on a small number of unknown parameters \(\beta_1, \ldots, \beta_p\). The basic assumption of a linear model is that
\(E(y_i)\) is a linear function of the parameters \(\beta_1, \ldots, \beta_p\) with known coefficients.
In matrix notation this can be expressed as
\[ E(y) = X\beta, \tag{7} \]
where \(y\) is the \(n \times 1\) vector with components \(y_1, \ldots, y_n\), \(X\) is a known nonrandom \(n \times p\) matrix, and \(\beta\) is the \(p \times 1\) vector of parameters \(\beta_1, \ldots, \beta_p\). We also assume that \(y_1, \ldots, y_n\) are uncorrelated and have a common variance, \(\mathrm{var}(y_i) = \sigma^2\) for all \(i\) (this property is referred to as homoscedasticity).
Another way to write the model is \(y = X\beta + \varepsilon\), where the vector \(\varepsilon\) satisfies \(E(\varepsilon) = 0\), \(D(\varepsilon) = \sigma^2 I\).
We do not make any further assumptions about the distribution of \(y\) at present. Our first objective is to find estimates of \(\beta_1, \ldots, \beta_p\) and their linear combinations.
We also seek an estimate of \(\sigma^2\).
Estimability
Answer the following questions with reference to the linear model \(E(y_1) = \beta_1 + \beta_2\), \(E(y_2) = 2\beta_1 - \beta_2\), \(E(y_3) = \beta_1 - \beta_2\), where \(y_1, y_2, y_3\) are uncorrelated with a common variance \(\sigma^2\):
1. Find two different linear functions of \(y_1, y_2, y_3\) that are unbiased for \(\beta_1\). Determine their variances and the covariance between the two.
2. Find two linear functions that are both unbiased for \(\beta_2\) and are uncorrelated.
3. Write the model in terms of the new parameters \(\theta_1 = \beta_1 + 2\beta_2\), \(\theta_2 = \beta_1 - 2\beta_2\).
The linear parametric function \(\ell'\beta\) is said to be estimable if there exists a linear function \(c'y\) of the observations such that \(E(c'y) = \ell'\beta\) for all \(\beta \in \mathbb{R}^p\).
The condition \(E(c'y) = \ell'\beta\) is equivalent to \(c'X\beta = \ell'\beta\), and since this must hold for all \(\beta\) in \(\mathbb{R}^p\), we must have \(c'X = \ell'\). Thus \(\ell'\beta\) is estimable if and only if \(\ell'\) belongs to the row space \(R(X)\).
The following facts concerning generalized inverses are frequently used in this as well as the next chapter:
(i) For any matrix \(X\), \(R(X) = R(X'X)\). This is seen as follows. Clearly,
\(R(X'X) \subset R(X)\). However, \(X'X\) and \(X\) have the same rank (see Exercise
19, Chapter 1), and therefore their row spaces have the same dimension. This implies that the spaces must be equal. As a consequence, we can write \(X = WX'X\) for some matrix \(W\).
(ii) The matrix \(AC^-B\) is invariant under the choice of the g-inverse \(C^-\) of \(C\) if \(C(B) \subset C(C)\) and \(R(A) \subset R(C)\). This is seen as follows. We can write \(B = CU\) and \(A = VC\) for some matrices \(U, V\). Then
\(AC^-B = VCC^-CU = VCU\), which does not depend on the choice of the g-inverse. (Note that the matrices
\(U, V\) are not necessarily unique. However, if \(B = CU_1\), \(A = V_1C\) is another representation, then
\(V_1CU_1 = V_1CC^-CU_1 = AC^-B = VCC^-CU = VCU\).)
The statement has a converse, which we will establish in Chapter 6.
(iii) The matrix \(X(X'X)^-X'\) is invariant under the choice of the g-inverse. This is immediate from (ii), since \(R(X) = R(X'X)\).
(iv) \(X(X'X)^-X'X = X\), \(X'X(X'X)^-X' = X'\). This is easily proved by writing \(X = WX'X\) as in (i).
3.1 Let \(\ell'\beta\) be an estimable function and let \(G\) be a least squares g-inverse of \(X\).
Then \(\ell'Gy\) is the best linear unbiased estimate (BLUE) of \(\ell'\beta\); that is, it is unbiased and has minimum variance among all linear unbiased estimates. The variance of \(\ell'Gy\) is \(\sigma^2\,\ell'(X'X)^-\ell\).
Proof. Since \(\ell'\beta\) is estimable, \(\ell' = u'X\) for some \(u\). Then
\(E(\ell'Gy) = u'XGX\beta = u'X\beta = \ell'\beta\), and hence \(\ell'Gy\) is unbiased for \(\ell'\beta\). Any other linear unbiased estimate is of the form \((\ell'G + w')y\), where \(w'X = 0\). Now
\[ \mathrm{var}\{(\ell'G + w')y\} = \sigma^2(\ell'G + w')(G'\ell + w) = \sigma^2(u'XG + w')(G'X'u + w). \]
Since \(G\) is a least squares g-inverse of \(X\), \(u'XGw = u'G'X'w = 0\), and therefore
\[ \mathrm{var}\{(\ell'G + w')y\} = \sigma^2\big(u'(XG)(XG)'u + w'w\big) \geq \sigma^2 u'(XG)(XG)'u. \]
Therefore \(\ell'Gy\) is the BLUE of \(\ell'\beta\). The variance of \(\ell'Gy\) is \(\sigma^2\ell'GG'\ell\). It is easily seen that for any choice of g-inverse, \((X'X)^-X'\) is a least squares g-inverse of \(X\).
In particular, taking \(G = (X'X)^-X'\), the variance of \(\ell'Gy\) equals \(\sigma^2\,\ell'(X'X)^-\ell\), since \(\ell'(X'X)^-\ell = u'X(X'X)^-X'u\) is invariant with respect to the choice of g-inverse. □
We can express the model in standard form as
so that \(X\) is the \(4 \times 4\) matrix on the right-hand side. Let \(S\) be the set of all vectors \((l_1, l_2, m_1, m_2)\) such that \(l_1 + l_2 = m_1 + m_2\). Note that if \(x \in R(X)\), then \(x \in S\).
Thus \(R(X) \subset S\). Clearly, \(\dim(S) = 3\), and the rank of \(X\) is 3 as well. Therefore
\(R(X) = S\), and we conclude that \(l_1\alpha_1 + l_2\alpha_2 + m_1\beta_1 + m_2\beta_2\) is estimable if and only if \(l_1 + l_2 = m_1 + m_2\).
is one possible g-inverse. Thus
Now we can compute the BLUE of any estimable function \(u'X\beta\) as \(u'X(X'X)^-X'y\). For example, if \(u = (1, 0, 0, 0)'\), then we get the BLUE of \(\alpha_1 + \beta_1\) as
The model (7) is said to be a full-rank model (or a regression model) if \(X\) has full column rank, i.e., \(R(X) = p\). For such models the following results can easily be verified.
(i) \(R(X) = \mathbb{R}^p\), and therefore every function \(\ell'\beta\) is estimable.
(iii) Let \(\hat{\beta}_i\) be the BLUE of \(\beta_i\) and let \(\hat{\beta}\) be the column vector with components \(\hat{\beta}_1, \ldots, \hat{\beta}_p\). Then \(\hat{\beta} = (X'X)^{-1}X'y\). The dispersion matrix of \(\hat{\beta}\) is \(\sigma^2(X'X)^{-1}\).
(iv) The BLUE of \(\ell'\beta\) is \(\ell'\hat{\beta}\), with variance \(\sigma^2\,\ell'(X'X)^{-1}\ell\).
Parts (iii) and (iv) constitute theGauss–Markov theorem.
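A short numerical sketch of (iii) (the design matrix and data are ours, for illustration only):

```python
import numpy as np

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])                      # full column rank
beta_true = np.array([1.0, 0.5])
rng = np.random.default_rng(0)
y = X @ beta_true + 0.1 * rng.standard_normal(4)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)     # (X'X)^{-1} X'y, the BLUE of beta
dispersion = np.linalg.inv(X.T @ X)              # D(beta_hat) = sigma^2 (X'X)^{-1}
print(beta_hat)
```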
1. Consider the model \(E(y_1) = 2\beta_1 - \beta_2 - \beta_3\), \(E(y_2) = \beta_2 - \beta_4\), \(E(y_3) = \beta_2 + \beta_3 - 2\beta_4\), with the usual assumptions. Describe the estimable functions.
2. Consider the model \(E(y_1) = \beta_1 + \beta_2\), \(E(y_2) = \beta_1 - \beta_2\), \(E(y_3) = \beta_1 + 2\beta_2\), with the usual assumptions. Obtain the BLUE of \(2\beta_1 + \beta_2\) and find its variance.
3. Consider the model \(E(y_1) = 2\beta_1 + \beta_2\), \(E(y_2) = \beta_1 - \beta_2\), \(E(y_3) = \beta_1 + \alpha\beta_2\), with the usual assumptions. Determine \(\alpha\) such that the BLUEs of \(\beta_1, \beta_2\) are uncorrelated.
Weighing Designs
The next result is the Hadamard inequality for positive semidefinite matrices.
4.1 Let \(A\) be an \(n \times n\) positive semidefinite matrix. Then
\[ |A| \leq a_{11}a_{22}\cdots a_{nn}. \]
Furthermore, if \(A\) is positive definite, then equality holds in the above inequality if and only if \(A\) is a diagonal matrix.
Proof. If \(A\) is singular, then \(|A| = 0\), while each \(a_{ii} \geq 0\), and the inequality holds trivially. So suppose \(A\) is nonsingular; then \(A\) is positive definite and each \(a_{ii} > 0\). Let \(D = \mathrm{diag}(\sqrt{a_{11}}, \ldots, \sqrt{a_{nn}})\) and set \(B = D^{-1}AD^{-1}\). Then \(B\) is positive semidefinite with each diagonal entry \(b_{ii} = 1\). Let \(\lambda_1, \ldots, \lambda_n\) be the eigenvalues of \(B\). By the arithmetic mean–geometric mean inequality,
\[ \Big(\prod_{i=1}^{n} \lambda_i\Big)^{1/n} \leq \frac{1}{n}\sum_{i=1}^{n} \lambda_i. \]
Since \(\sum_{i=1}^{n} \lambda_i = \mathrm{trace}\, B = n\) and \(\prod_{i=1}^{n} \lambda_i = |B|\), we get \(|B| \leq 1\). Therefore
\(|D^{-1}AD^{-1}| \leq 1\), i.e., \(|A| \leq |D|^2 = a_{11}\cdots a_{nn}\). If \(A\) is positive definite and equality holds, then equality must hold in the arithmetic mean–geometric mean inequality, and therefore \(\lambda_1 = \cdots = \lambda_n = 1\). By the spectral theorem, \(B\) is then the identity matrix, which implies that \(A = DBD = D^2\) is a diagonal matrix. □
4.2 Let X be an n×n matrix and suppose |xᵢⱼ| ≤ 1 for all i, j. Then |X|² ≤ nⁿ.
Proof. Let A = X′X. Then aᵢᵢ = ∑ⱼ₌₁ⁿ xⱼᵢ² ≤ n and |A| = |X′X| = |X|². The result follows by 4.1. □
We now describe an application of 4.2. Suppose we wish to determine the weights of four objects using a chemical balance with two pans, and suppose four weighings are allowed. In each weighing, each object may be placed in the left pan, placed in the right pan, or left out; such an allocation of objects to pans is called a weighing design. Let β₁, β₂, β₃, β₄ denote the true weights of the objects. For the ith weighing, let xᵢⱼ = 1 or −1 according as the jth object is placed in the left or the right pan, and let xᵢⱼ = 0 if the object is not used. Let yᵢ denote the weight required to achieve balance in the ith weighing, taken with positive sign if it is needed in the left pan and with negative sign if it is needed in the right pan.
This leads to the model E(y) = Xβ, where X is the matrix (xᵢⱼ), y is the vector with components yᵢ, and β is the vector with components βᵢ. Assuming the yᵢ are uncorrelated with common variance σ², the dispersion matrix of β̂ is σ²(X′X)⁻¹, provided X′X is nonsingular. To estimate the weights with as much precision as possible, we should choose the design, i.e., the matrix X, so that X′X is "large" in a suitable sense. One measure of largeness is the determinant |X′X|; maximizing it is the D-optimality criterion, which we will meet again in Chapter 5 in connection with block designs.
(8) satisfies |X′X| = 4⁴, and by 4.2 this is the maximum determinant possible.
A square matrix is called a Hadamard matrix if each entry is 1 or −1 and the rows are orthogonal. The matrix (8) is a Hadamard matrix.
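A small sketch related to (8) and 4.2: the Sylvester doubling construction gives a Hadamard matrix of order 4 (not necessarily the same matrix printed in (8)), and it attains the bound |X′X| = 4⁴.

```python
# Sylvester construction of a 4x4 Hadamard matrix and the bound of 4.2.
import numpy as np

H2 = np.array([[1, 1],
               [1, -1]])
H4 = np.block([[H2, H2],
               [H2, -H2]])                 # a 4x4 Hadamard matrix

print(H4 @ H4.T)                           # 4*I_4: the rows are orthogonal
print(np.linalg.det(H4.T @ H4), 4**4)      # both equal 256 = 4^4
```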
is positive definite, where a, b, c are real numbers, not all zero. Show that a² + b² + c² − 2abc > 0.
3. Show that there exists a Hadamard matrix of order 2ᵏ for any positive integer k ≥ 1.
Residual Sum of Squares
We continue to consider the model (7) of Section 3. The equations X′Xβ = X′y are called the normal equations. The equations are consistent, since C(X′) = C(X′X).
Let β̂ be a solution of the normal equations. Then β̂ = (X′X)⁻X′y for some choice of the g-inverse. The residual sum of squares (RSS) is defined to be
RSS = (y − Xβ̂)′(y − Xβ̂).
The RSS is invariant under the choice of the g-inverse (X′X)⁻, although β̂ depends on the choice. Thus β̂ is not unique and does not admit any statistical interpretation.
By “fitting the model” we generally mean calculating the BLUEs of parametric functions of interest and computing RSS.
5.1 The minimum of (y − Xβ)′(y − Xβ) is attained at β = β̂.
We will use the notation
P = I − X(X′X)⁻X′
throughout this and the next chapter. Observe that P is a symmetric, idempotent matrix and PX = 0. These properties will be useful. Now, y − Xβ̂ = Py, and hence RSS = (Py)′(Py) = y′Py.
5.2 E(RSS) = (n − r)σ², where r = R(X).
Proof. We have
E(RSS) = E(y′Py) = E trace(Pyy′) = trace(P E(yy′)) = σ² trace P,
by (9) and the fact that PX = 0. Finally,
trace P = n − trace X(X′X)⁻X′ = n − trace (X′X)⁻X′X = n − R((X′X)⁻X′X),
since (X′X)⁻X′X is idempotent. However,
R((X′X)⁻X′X) = R(X′X) = R(X) = r,
and the proof is complete. □
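The following sketch checks numerically the properties of P used in the proof; the matrix X below is an arbitrary rank-deficient choice made for illustration.

```python
# Check that P = I - X(X'X)^- X' is symmetric, idempotent, kills X, and has trace n - r.
import numpy as np

X = np.array([[1., 0., 1.],
              [1., 0., 1.],
              [0., 1., 1.],
              [0., 1., 1.],
              [0., 1., 1.]])
n, r = X.shape[0], np.linalg.matrix_rank(X)         # n = 5, r = 2

P = np.eye(n) - X @ np.linalg.pinv(X.T @ X) @ X.T
assert np.allclose(P, P.T)                          # symmetric
assert np.allclose(P @ P, P)                        # idempotent
assert np.allclose(P @ X, 0)                        # PX = 0
assert np.isclose(np.trace(P), n - r)               # trace P = n - r
```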
We conclude from 5.2 that RSS/(n − r) is an unbiased estimator of σ². For computations it is more convenient to use the expressions
Consider the model y_ij = α_i + ε_ij, i = 1, ..., k, j = 1, ..., n_i, where the ε_ij are independent with mean 0 and variance σ². The model can be written as
Thus the model is of full rank, and the BLUEs of the α_i are given by the components of α̂ = (X′X)⁻¹X′y.
Since the rank of X is k, by 5.2, E(RSS) = (n − k)σ², where n = ∑ᵢ₌₁ᵏ nᵢ, and RSS/(n − k) is an unbiased estimator of σ².
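A simulated illustration of this example: the BLUE of α_i is the ith group mean, and RSS/(n − k) estimates σ² (the true value is 0.25 in this made-up setup).

```python
# One-way model: group means are the BLUEs, RSS/(n - k) estimates sigma^2.
import numpy as np

rng = np.random.default_rng(2)
alpha = [1.0, 2.0, 3.0]                              # k = 3 true effects
sizes = [4, 5, 6]
groups = [a + 0.5 * rng.normal(size=m) for a, m in zip(alpha, sizes)]

alpha_hat = [g.mean() for g in groups]               # BLUEs of alpha_i
rss = sum(((g - g.mean()) ** 2).sum() for g in groups)
n, k = sum(sizes), len(sizes)
print(alpha_hat, rss / (n - k))                      # estimate of sigma^2
```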
1. Consider the model E(y₁) = β₁ + β₂, E(y₂) = 2β₁, E(y₃) = β₁ − β₂ with the usual assumptions. Find the RSS.
2. Consider the one-way classification model y_ij = μ + α_i + ε_ij, i = 1, ..., k, j = 1, ..., n_i, where the ε_ij are independent with mean 0 and variance σ². The parameter μ is usually called the "general effect." Which parametric functions are estimable? Is it correct to say that the grand mean ȳ is an unbiased estimator of μ?
Estimation Subject to Restrictions
Consider the usual model E(y) = Xβ, D(y) = σ²I, where y is n×1 and X is n×p.
Suppose we have a priori linear restrictions Lβ = z on the parameters. We assume that R(L) ⊂ R(X) and that the equation Lβ = z is consistent.
Let β̂ = (X′X)⁻X′y for a fixed g-inverse (X′X)⁻ and let
β̃ = β̂ − (X′X)⁻L′(L(X′X)⁻L′)⁻(Lβ̂ − z).
6.1 The minimum of (y − Xβ)′(y − Xβ) subject to Lβ = z is attained at β = β̃.
Proof. Since R(L) ⊂ R(X) and since R(X) = R(X′X), we have L = WX′X for some W. Let T = WX′. Now,
Since Lβ = z is consistent, Lv = z for some v. Thus
Using (10), (11), (12) we see that Lβ̃ = z, and therefore β̃ satisfies the restrictions. Furthermore,
(y − Xβ)′(y − Xβ) = (y − Xβ̃)′(y − Xβ̃) + (β̃ − β)′X′X(β̃ − β),   (13)
since we can show that (β̃ − β)′X′(y − Xβ̃) = 0 as follows. We have
since L′ = X′XW′. Hence
X′(y − Xβ̃) = L′(L(X′X)⁻L′)⁻(Lβ̂ − z), and since Lβ̃ = Lβ = z, it follows that
From (13) it is clear that
(y − Xβ)′(y − Xβ) ≥ (y − Xβ̃)′(y − Xβ̃) if Lβ = z, and the proof is complete. □
6.2 R(L(X′X)⁻L′) = R(L).
Proof. Since L(X′X)⁻L′ = TT′, we have R(L(X′X)⁻L′) = R(TT′) = R(T). Clearly, R(L) = R(TX) ≤ R(T). Since R(X) = R(X′X), then X′ = X′XM for some M. Thus T = WX′ = WX′XM = LM. Therefore, R(T) ≤ R(L), and hence R(T) = R(L). □
We note some simplifications that occur if additional assumptions are made. Suppose that R(X) = p, so that we have a full-rank model. We also assume that L is m×p of rank m. Then by 6.2,
R(L(X′X)⁻¹L′) = R(L) = m,
and hence L(X′X)⁻¹L′ is nonsingular. It reduces to a scalar if m = 1.
6.3 Consider the model E(yᵢ) = θᵢ, i = 1, 2, 3, 4, where the yᵢ are uncorrelated with variance σ². Suppose we have the restriction θ₁ + θ₂ + θ₃ + θ₄ = 0 on the parameters.
We find the RSS. The model in standard form has X = I₄. The restriction on the parameters can be written as Lθ = 0, where L = (1, 1, 1, 1). Thus θ̂ = (X′X)⁻X′y = y, and
θ̃ = θ̂ − (X′X)⁻L′(L(X′X)⁻L′)⁻Lθ̂ = y − ȳ1,
where ȳ is the mean of y₁, ..., y₄ and 1 is the vector of all ones. The RSS is therefore (y − Xθ̃)′(y − Xθ̃) = 4ȳ².
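A sketch of this example with made-up data, checking that θ̃ = y − ȳ1 and that the RSS equals 4ȳ².

```python
# Restricted estimation with X = I_4 and restriction theta_1 + ... + theta_4 = 0.
import numpy as np

y = np.array([2.0, -1.0, 3.0, 1.0])
X = np.eye(4)
L = np.ones((1, 4))

theta_hat = np.linalg.pinv(X.T @ X) @ X.T @ y        # equals y here
theta_tilde = theta_hat - (np.linalg.pinv(X.T @ X) @ L.T
                           @ np.linalg.pinv(L @ np.linalg.pinv(X.T @ X) @ L.T)
                           @ (L @ theta_hat))

ybar = y.mean()
assert np.allclose(theta_tilde, y - ybar)            # theta_tilde = y - ybar*1
rss = ((y - X @ theta_tilde) ** 2).sum()
print(rss, 4 * ybar ** 2)                            # both 6.25
```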
Consider an alternative formulation of the one-way classification model considered earlier in Section 5:
y_ij = μ + α_i + ε_ij, i = 1, ..., k; j = 1, ..., n_i,
where the ε_ij are independent with mean 0 and variance σ². This model arises when we want to compare k treatments. We have n_i observations on the ith treatment.
The parameter μ is called the "general effect" and α_i the "effect due to the ith treatment." We wish to find the RSS. Instead of writing the model in the standard form, we proceed as follows. The RSS is the minimum value of
∑_{i=1}^{k} ∑_{j=1}^{n_i} (y_ij − μ − α_i)².   (14)
We use the fact that if u₁, ..., u_m are real numbers, then ∑_{i=1}^{m} (uᵢ − θ)² is minimized when θ = ū, the mean of u₁, ..., u_m. This is easily proved using calculus. Thus (14) is minimized when μ + αᵢ = ȳᵢ, i = 1, ..., k, where ȳᵢ denotes the mean of the observations on the ith treatment; and therefore
RSS = ∑_{i=1}^{k} ∑_{j=1}^{n_i} (y_ij − ȳᵢ)².
Now suppose we wish to find the RSS subject to the restrictions αᵢ − αⱼ = 0 for all i, j. Since the αᵢ − αⱼ are estimable, we could apply the formula for β̃ given earlier; it is simpler, however, to proceed directly. Let α denote the common value of α₁, ..., α_k. We must minimize
∑_{i=1}^{k} ∑_{j=1}^{n_i} (y_ij − μ − α)²,
and this is achieved by setting μ + α = ȳ = (1/n) ∑_{i=1}^{k} ∑_{j=1}^{n_i} y_ij, where n = ∑_{i=1}^{k} n_i. Thus the RSS now is
∑_{i=1}^{k} ∑_{j=1}^{n_i} (y_ij − ȳ)².
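A simulated comparison of the two residual sums of squares derived above: the within-group sum of squares (no restriction) and the total sum of squares about the grand mean (all treatment effects equal); the group means and sizes are arbitrary.

```python
# The restricted RSS (total SS about the grand mean) is never smaller than
# the unrestricted RSS (within-group SS).
import numpy as np

rng = np.random.default_rng(3)
groups = [rng.normal(loc=m, size=n) for m, n in [(1.0, 4), (2.0, 5), (0.5, 6)]]
y_all = np.concatenate(groups)

rss_unrestricted = sum(((g - g.mean()) ** 2).sum() for g in groups)
rss_restricted = ((y_all - y_all.mean()) ** 2).sum()   # alpha_1 = ... = alpha_k
print(rss_unrestricted, rss_restricted)
```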
The computation of the RSS subject to linear restrictions is needed when we wish to test the validity of the restrictions; this will be taken up in the next chapter.
1. Consider the model E(y₁) = β₁ + 2β₂, E(y₂) = 2β₁, E(y₃) = β₁ + β₂ with the usual assumptions. Find the RSS subject to the restriction β₁ = β₂.
2. Consider the one-way classification model (with k ≥ 2) y_ij = α_i + ε_ij, i = 1, ..., k, j = 1, ..., n_i, where the ε_ij are independent with mean 0 and variance σ². Find the RSS subject to the restriction α₁ = α₂.
Exercises
1. Let A be a matrix and let G be a g-inverse of A. Show that the class of all g-inverses of A is given by G + (I − GA)U + V(I − AG), where U, V are arbitrary.
2. Find a g-inverse of the following matrix such that it does not contain any zero entry:
3 Show that the class of g-inverses of 1 −1
4. Let A be an m×n matrix of rank r and let k be an integer, r ≤ k ≤ min(m, n). Show that A has a g-inverse of rank k. Conclude that a square matrix has a nonsingular g-inverse.
5. Let x be an n×1 vector. Find the g-inverse of x that is closest to the origin.
6. Let X be an n×m matrix and let y ∈ Rⁿ. Show that the orthogonal projection of y onto C(X) is given by X(X′X)⁻X′y for any choice of the g-inverse.
7. For any matrix X, show that X⁺ = (X′X)⁺X′ and X(X′X)⁻X′ = XX⁺.
8. Let A be an m×n matrix and let P, Q be matrices of order r×m. Then prove that PA = QA if and only if PAA′ = QAA′.
9. Let A, G be matrices of order m×n, n×m, respectively. Then show that G is a minimum norm g-inverse of A if and only if GAA′ = A′.
10 Is it true that any positive semidefinite matrix is the dispersion matrix of a random vector?
11. Let x₁, ..., xₙ be real numbers with mean x̄. Consider the linear model Yᵢ = α + β(xᵢ − x̄) + εᵢ, i = 1, 2, ..., n, with the usual assumptions. Show that the BLUEs of α and β are uncorrelated.
12. Consider the linear model (7) and let xᵢ be the ith column of X, i = 1, ..., p. Show that the function ℓ₁β₁ + ℓ₂β₂ is estimable if and only if x₁, x₂ do not belong to the linear span of ℓ₂x₁ − ℓ₁x₂, x₃, ..., x_p.
13. For any vector ℓ, show that the following conditions are equivalent: (i) ℓ′β is estimable; (ii) ℓ′ = ℓ′X⁻X for some g-inverse X⁻; (iii) ℓ′ = ℓ′(X′X)⁻X′X for some g-inverse (X′X)⁻.
14. Show that the BLUE of an estimable function is unique; that is, if ℓ′β is estimable and c′y and d′y are both BLUEs of ℓ′β, then c = d.
15. Consider the data in Table 2.1, which give the distance d (in meters) traveled by an object in time t (in seconds). Fit the model dᵢ = d₀ + vtᵢ + eᵢ, i = 1, 2, ..., 7, where the eᵢ denote uncorrelated errors with zero mean and variance σ². Find estimates of d₀, v, and σ².
16. Suppose xᵢ, yᵢ, zᵢ, i = 1, ..., n, are 3n independent observations with common variance σ² and expectations given by E(xᵢ) = θ₁, E(yᵢ) = θ₂, E(zᵢ) = θ₁ − θ₂, i = 1, ..., n. Find the BLUEs of θ₁, θ₂ and compute the RSS.
17. In Example 6.3 suppose the further restriction θ₁ = θ₂ is imposed. Find the RSS.
18. Consider a linear model in which the error space is one-dimensional and spanned by the linear function z of the observations. Show that if u′y is an unbiased estimator of the parametric function p′β, then the BLUE of p′β is u′y − [cov(u′y, z)/var(z)] z.
19. Let A be an m×n matrix of rank r and suppose A is partitioned as
    A = [A₁₁  A₁₂]
        [A₂₁  A₂₂],
where A₁₁ is r×r and nonsingular. Show that
    G = [A₁₁⁻¹  0]
        [0      0]
is a g-inverse of A.
20. Let A be an m×n matrix of rank r with only integer entries. If there exists an integer linear combination of the r×r minors of A that equals 1, show that A admits a g-inverse with only integer entries.
21. Let A be a positive semidefinite matrix that is partitioned as
, where A₁₁ is a square matrix. Then show that
22. Let X₁, ..., Xₙ be random variables and let A be the n×n matrix whose (i, j)-entry is the correlation between Xᵢ and Xⱼ. Show that if |A| ≠ 0, then no nontrivial linear combination of X₁, ..., Xₙ is degenerate; that is, the random variables are linearly independent.
23. If there exists a Hadamard matrix of order n, n > 2, then show that n is divisible by 4.
24. Let X₁, ..., Xₙ be random variables with equal mean μ and suppose var(Xᵢ) = λᵢσ², i = 1, 2, ..., n, where each λᵢ > 0 is known. Find the BLUE of μ.