
DOCUMENT INFORMATION

Basic information

Title: Projection Matrices, Generalized Inverse Matrices, and Singular Value Decomposition
Authors: Haruo Yanai, Kei Takeuchi, Yoshio Takane
Institution: St. Luke's College of Nursing
Field: Statistics
Type: Book
Year of publication: 2011
City: Tokyo
Number of pages: 247
File size: 2.65 MB

Structure

  • Cover

  • Statistics for Social and Behavioral Sciences

  • Projection Matrices, Generalized Inverse Matrices, and Singular Value Decomposition

  • ISBN 9781441998866

  • Preface

  • Contents

  • Chapter 1 Fundamentals of Linear Algebra

    • 1.1 Vectors and Matrices

      • 1.1.1 Vectors

      • 1.1.2 Matrices

    • 1.2 Vector Spaces and Subspaces

    • 1.3 Linear Transformations

    • 1.4 Eigenvalues and Eigenvectors

    • 1.5 Vector and Matrix Derivatives

    • 1.6 Exercises for Chapter 1

  • Chapter 2 Projection Matrices

    • 2.1 Definition

    • 2.2 Orthogonal Projection Matrices

    • 2.3 Subspaces and Projection Matrices

      • 2.3.1 Decomposition into a direct-sum of disjoint subspaces

      • 2.3.2 Decomposition into nondisjoint subspaces

      • 2.3.3 Commutative projectors

      • 2.3.4 Noncommutative projectors

    • 2.4 Norm of Projection Vectors

    • 2.5 Matrix Norm and Projection Matrices

    • 2.6 General Form of Projection Matrices

    • 2.7 Exercises for Chapter 2

  • Chapter 3 Generalized Inverse Matrices

    • 3.1 Definition through Linear Transformations

    • 3.2 General Properties

      • 3.2.1 Properties of generalized inverse matrices

      • 3.2.2 Representation of subspaces by generalized inverses

      • 3.2.3 Generalized inverses and linear equations

      • 3.2.4 Generalized inverses of partitioned square matrices

    • 3.3 A Variety of Generalized Inverse Matrices

      • 3.3.1 Reflexive generalized inverse matrices

      • 3.3.2 Minimum norm generalized inverse matrices

      • 3.3.3 Least squares generalized inverse matrices

      • 3.3.4 The Moore-Penrose generalized inverse matrix

    • 3.4 Exercises for Chapter 3

  • Chapter 4 Explicit Representations

    • 4.1 Projection Matrices

    • 4.2 Decompositions of Projection Matrices

    • 4.3 The Method of Least Squares

    • 4.4 Extended Definitions

      • 4.4.1 A generalized form of least squares g-inverse

      • 4.4.2 A generalized form of minimum norm g-inverse

      • 4.4.3 A generalized form of the Moore-Penrose inverse

      • 4.4.4 Optimal g-inverses

    • 4.5 Exercises for Chapter 4

  • Chapter 5 Singular Value Decomposition (SVD)

    • 5.1 Definition through Linear Transformations

    • 5.2 SVD and Projectors

    • 5.3 SVD and Generalized Inverse Matrices

    • 5.4 Some Properties of Singular Values

    • 5.5 Exercises for Chapter 5

  • Chapter 6 Various Applications

    • 6.1 Linear Regression Analysis

      • 6.1.1 The method of least squares and multiple regression analysis

      • 6.1.2 Multiple correlation coefficients and their partitions

      • 6.1.3 The Gauss-Markov model

    • 6.2 Analysis of Variance

      • 6.2.1 One-way design

      • 6.2.2 Two-way design

      • 6.2.3 Three-way design

      • 6.2.4 Cochran's theorem

    • 6.3 Multivariate Analysis

      • 6.3.1 Canonical correlation analysis

      • 6.3.2 Canonical discriminant analysis

      • 6.3.3 Principal component analysis

      • 6.3.4 Distance and projection matrices

    • 6.4 Linear Simultaneous Equations

      • 6.4.1 QR decomposition by the Gram-Schmidt orthogonalization method

      • 6.4.2 QR decomposition by the Householder transformation

      • 6.4.3 Decomposition by projectors

    • 6.5 Exercises for Chapter 6

  • Chapter 7 Answers to Exercises

    • 7.1 Chapter 1

    • 7.2 Chapter 2

    • 7.3 Chapter 3

    • 7.4 Chapter 4

    • 7.5 Chapter 5

    • 7.6 Chapter 6

  • Chapter 8 References

  • Index

Content

Vectors and Matrices

Vectors

Sets of n real numbers a_1, a_2, …, a_n and b_1, b_2, …, b_n, arranged in the following way, are called n-component column vectors:

a = (a_1, a_2, …, a_n)′,  b = (b_1, b_2, …, b_n)′ (written as columns). (1.1)

The real numbers a_1, a_2, …, a_n and b_1, b_2, …, b_n are called elements or components of a and b, respectively. These elements arranged horizontally,

a′ = (a_1, a_2, …, a_n), b′ = (b_1, b_2, …, b_n),

are called n-component row vectors. We define the length of the n-component vector a to be

||a|| = √(a_1^2 + a_2^2 + ··· + a_n^2). (1.2)

This is also called the norm of the vector a. We also define an inner product between two vectors a and b to be

(a, b) = a_1 b_1 + a_2 b_2 + ··· + a_n b_n. (1.3)

The inner product has the following properties:

(iii) (αa, b) = (a, αb) = α(a, b), where α is a scalar,

(iv) ||a||^2 = 0 ⟺ a = 0, where ⟺ indicates an equivalence (or "if and only if") relationship.

We define the distance between two vectors by

d(a, b) = ||a − b||. (1.4)

The distance satisfies d(a, b) ≥ 0 with equality if and only if a = b, d(a, b) = d(b, a), and d(a, b) + d(b, c) ≥ d(a, c); these three properties are called the metric (or distance) axioms.

Theorem 1.1 The following properties hold:

(a, b)^2 ≤ ||a||^2 ||b||^2, (1.5)

||a + b|| ≤ ||a|| + ||b||. (1.6)

Proof. (1.5): The following inequality holds for any real number t:

||ta − b||^2 = t^2 ||a||^2 − 2t(a, b) + ||b||^2 ≥ 0.

Since this quadratic in t is nonnegative for all t, its discriminant satisfies (a, b)^2 − ||a||^2 ||b||^2 ≤ 0, which is (1.5). Inequality (1.6) follows by expanding ||a + b||^2 = ||a||^2 + 2(a, b) + ||b||^2 and applying (1.5). Q.E.D.

Inequality (1.5) is called the Cauchy-Schwarz inequality, and (1.6) is called the triangular inequality.

For two n-component vectors a (≠ 0) and b (≠ 0), the angle between them can be defined as follows.

Definition 1.1 For two vectors a and b, the θ defined by

cos θ = (a, b) / (||a|| · ||b||) (1.7)

is called the angle between a and b.
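As a quick numerical illustration of (1.2), (1.3), and (1.5)–(1.7), the following NumPy sketch computes norms, the inner product, and the angle; the vectors a and b are our own arbitrary examples, not taken from the book.

```python
import numpy as np

# Check of (1.2), (1.3), (1.5)-(1.7): norm, inner product, and angle.
a = np.array([1.0, 2.0, 2.0])
b = np.array([2.0, 0.0, 1.0])

norm_a = np.sqrt(a @ a)                 # ||a||, as in (1.2)
norm_b = np.sqrt(b @ b)
inner = a @ b                           # (a, b), as in (1.3)
cos_theta = inner / (norm_a * norm_b)   # (1.7)

assert inner**2 <= (a @ a) * (b @ b) + 1e-12              # Cauchy-Schwarz (1.5)
assert np.linalg.norm(a + b) <= norm_a + norm_b + 1e-12   # triangle inequality (1.6)
print(np.degrees(np.arccos(cos_theta)))  # angle between a and b in degrees
```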

Matrices

We call nm real numbers arranged in the following n by m array a matrix:

A = [a_ij], with rows i = 1, …, n and columns j = 1, …, m.

In a matrix, the numbers arranged horizontally are called rows, and those arranged vertically are called columns. A matrix A with n rows and m columns can be viewed as consisting of n row vectors or m column vectors and is referred to as an n by m (n × m) matrix. When the number of rows equals the number of columns (n = m), A is called a square matrix. A square matrix of order n whose diagonal elements are all unity and whose off-diagonal elements are all zero,

I_n = diag(1, 1, …, 1),

is called an identity matrix.

We may represent the m column vectors of A collectively by A = [a_1, a_2, …, a_m].

The element of matrix A located in the ith row and jth column is called the (i, j)th element of A and is written a_ij, so that A = [a_ij]. The matrix obtained by interchanging the rows and columns of A is called the transpose of A, denoted A′.

Let A = [a_ik] and B = [b_kj] be n by m and m by p matrices, respectively. Their product, C = [c_ij], denoted as

C = AB, (1.10)

is defined by c_ij = Σ_{k=1}^{m} a_ik b_kj. The matrix C is of order n by p. Note that

A′A = O ⟺ A = O, (1.11)

where O is a zero matrix consisting of all zero elements.

Note An n-component column vector a is an n by 1 matrix, and its transpose a′ is a 1 by n matrix. The inner product between a and b and their norms can be expressed as

(a, b) = a′b, ||a||^2 = a′a, and ||b||^2 = b′b.

Let A = [a_ij] be a square matrix of order n. The trace of A is defined as the sum of its diagonal elements; that is,

tr(A) = a_11 + a_22 + ··· + a_nn. (1.12)

For any real numbers c and d and square matrices A and B of the same order, the trace is linear: tr(cA + dB) = c·tr(A) + d·tr(B). The trace of a product is also invariant under the order of multiplication: tr(AB) = tr(BA).

Also, when A_1′A_1, A_2′A_2, …, A_m′A_m are matrices of the same order, we have

tr(A_1′A_1 + A_2′A_2 + ··· + A_m′A_m) = 0 ⟺ A_j = O (j = 1, …, m). (1.18)

Let A and B be n by m matrices. Then

tr(A′B) = Σ_{i=1}^{n} Σ_{j=1}^{m} a_ij b_ij,

and Theorem 1.1 can be extended as follows.

Corollary 1

tr(A′B) ≤ √(tr(A′A) tr(B′B)) (1.19)

and

√(tr{(A + B)′(A + B)}) ≤ √(tr(A′A)) + √(tr(B′B)). (1.20)

Inequality (1.19) is a generalized form of the Cauchy-Schwarz inequality.

The definition of the norm in (1.2) can be generalized using a nonnegative-definite matrix M of order n (nonnegative-definite matrices are defined in Section 1.4, prior to Theorem 1.12): we may define

||a||_M^2 = a′Ma. (1.21)

Furthermore, if the inner product between a and b is defined by

(a, b)_M = a′Mb, (1.22)

the following two corollaries hold.

Corollary 1 can further be generalized as follows.

Corollary 3

tr(A′MB) ≤ √(tr(A′MA) tr(B′MB)) (1.24)

and

√(tr{(A + B)′M(A + B)}) ≤ √(tr(A′MA)) + √(tr(B′MB)). (1.25)

In addition, (1.15) can be generalized as

Vector Spaces and Subspaces

Given m n-component vectors a_1, a_2, …, a_m and scalars α_1, α_2, …, α_m, the vector f = α_1 a_1 + α_2 a_2 + ··· + α_m a_m is called a linear combination of these vectors. It can be written compactly as f = Aα, where A = [a_1, a_2, …, a_m] and α′ = (α_1, α_2, …, α_m). The squared norm of the linear combination f is then ||f||^2 = f′f = α′A′Aα.

A set of n-component vectors a_1, a_2, …, a_m is said to be linearly dependent if there exist scalars α_1, α_2, …, α_m, not all zero, such that α_1 a_1 + α_2 a_2 + ··· + α_m a_m = 0. Conversely, the vectors are linearly independent if this equation holds only when all the scalars α_1, α_2, …, α_m are zero.

When a_1, a_2, …, a_m are linearly dependent, α_j ≠ 0 for some j. Suppose α_i ≠ 0. Then a_i = β_1 a_1 + ··· + β_{i−1} a_{i−1} + β_{i+1} a_{i+1} + ··· + β_m a_m, where β_k = −α_k/α_i (k = 1, …, m; k ≠ i). Conversely, if such an expression holds, the vectors a_1, a_2, …, a_m are linearly dependent. Thus, a set of vectors is linearly dependent if and only if at least one of them can be represented as a linear combination of the others.

Let a_1, a_2, …, a_m be linearly independent, and let

W = {f | f = α_1 a_1 + α_2 a_2 + ··· + α_m a_m},

where the α_i's are scalars, denote the set of all linear combinations of these vectors. Then W is called a linear subspace of dimensionality m.

Definition 1.2 Let E^n denote the set of all n-component vectors. Suppose that W ⊂ E^n (W is a subset of E^n) satisfies the following two conditions:

(1) If a ∈ W and b ∈ W, then a + b ∈ W.

(2) If a ∈ W, then αa ∈ W, where α is a scalar.

Then W is called a linear subspace or simply a subspace of E^n.

When there are r linearly independent vectors in W, while any set of r + 1 vectors in W is linearly dependent, the dimensionality of W is said to be r, written dim(W) = r.

Let dim(W) = r, and let a_1, a_2, …, a_r denote a set of r linearly independent vectors in W. These vectors are called basis vectors spanning (generating) the (sub)space W. This is written as

W = Sp(a_1, a_2, …, a_r) = Sp(A), (1.28)

where A = [a_1, a_2, …, a_r]. The maximum number of linearly independent vectors is called the rank of the matrix A and is denoted rank(A). The following property holds:

dim(Sp(A)) = rank(A). (1.29)
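As a small numerical illustration of (1.29), the sketch below builds a matrix whose third column depends on the first two and checks that its rank equals the dimension of its column space; the example matrix is our own.

```python
import numpy as np

# Illustration of (1.29): dim(Sp(A)) = rank(A).
a1 = np.array([1.0, 0.0, 1.0, 2.0])
a2 = np.array([0.0, 1.0, 1.0, 0.0])
a3 = a1 + a2                        # linearly dependent on a1 and a2
A = np.column_stack([a1, a2, a3])

print(np.linalg.matrix_rank(A))     # 2 = maximum number of independent columns
```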

Theorem 1.2 Let a_1, a_2, …, a_r denote a set of linearly independent vectors in the r-dimensional subspace W. Then any vector in W can be expressed uniquely as a linear combination of a_1, a_2, …, a_r.

This theorem shows that any vector in a linear subspace can be expressed uniquely as a linear combination of a given set of basis vectors. Note, however, that the set of basis vectors spanning the subspace is itself not uniquely determined.

A basis whose vectors a_1, a_2, …, a_r are mutually orthogonal is called an orthogonal basis. Normalizing these vectors, b_j = a_j/||a_j||, gives ||b_j|| = 1 for each j (j = 1, …, r); the resulting vectors b_1, b_2, …, b_r form an orthonormal basis, satisfying

(b_i, b_j) = δ_ij,

where δ_ij is Kronecker's delta, defined by δ_ij = 1 if i = j and δ_ij = 0 if i ≠ j.

Let x be an arbitrary vector in the subspace V spanned by b_1, b_2, …, b_r, namely x ∈ V = Sp(B) = Sp(b_1, b_2, …, b_r) ⊂ E^n. Then x can be expressed as

x = (x, b_1)b_1 + (x, b_2)b_2 + ··· + (x, b_r)b_r. (1.30)

Since b_1, b_2, …, b_r are orthonormal, the squared norm of x can be expressed as

||x||^2 = (x, b_1)^2 + (x, b_2)^2 + ··· + (x, b_r)^2. (1.31)

The formula above is called Parseval's equality.
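The following sketch checks (1.30) and Parseval's equality (1.31) numerically. The orthonormal basis is obtained here via a QR factorization, which is simply one convenient way to orthonormalize a set of columns (not a method from the book); the data are random.

```python
import numpy as np

# Expand x in an orthonormal basis and verify (1.30) and (1.31).
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))
B, _ = np.linalg.qr(A)              # columns b1, b2, b3 are orthonormal
x = B @ rng.normal(size=3)          # an arbitrary vector in V = Sp(B)

coeffs = B.T @ x                    # (x, b_j) for each j
assert np.allclose(x, B @ coeffs)                  # (1.30)
assert np.isclose(x @ x, np.sum(coeffs**2))        # Parseval's equality (1.31)
```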

We next consider the relationships between two subspaces, V_A = Sp(A) and V_B = Sp(B), spanned by the columns of A = [a_1, a_2, …, a_p] and B = [b_1, b_2, …, b_k], respectively. The subspace generated by sums of vectors from these two subspaces is defined as follows.

V_A + V_B = {a + b | a ∈ V_A, b ∈ V_B}. (1.32)

The resultant subspace is denoted by

V_{A+B} = V_A + V_B = Sp(A, B) (1.33)

and is called the sum space of V_A and V_B. The set of vectors common to both V_A and V_B, namely

V_{A∩B} = {x | x = Aα = Bβ for some α and β}, (1.34)

also constitutes a linear subspace. Clearly, V_{A∩B} is contained in both V_A and V_B, and hence in V_{A+B}.

The subspace given in (1.34) is called the product space between V_A and V_B.

When V_A ∩ V_B = {0} (that is, the product space between V_A and V_B contains only the zero vector), V_A and V_B are said to be disjoint. When this is the case, V_{A+B} is written as

V_{A+B} = V_A ⊕ V_B, (1.37)

and the sum space V_{A+B} is said to be decomposable into the direct-sum of V_A and V_B.

When the n-dimensional Euclidean space E^n is expressed as the direct-sum of V and W, namely

E^n = V ⊕ W, (1.38)

W is said to be a complementary subspace of V (and vice versa), written W = V^c (or V = W^c). The complementary subspace of Sp(A) is written Sp(A)^c. In general, for a given subspace V there are infinitely many choices of the complementary subspace W.

Furthermore, when all vectors in V and all vectors in W are orthogonal, W = V^⊥ (or V = W^⊥) is called the ortho-complement subspace, which is defined by

V^⊥ = {x ∈ E^n | (x, y) = 0 for all y ∈ V}. (1.39)

The n-dimensional Euclidean space E^n expressed as the direct-sum of r disjoint subspaces W_j (j = 1, …, r) is written as

E^n = W_1 ⊕ W_2 ⊕ ··· ⊕ W_r. (1.40)

In particular, when W_i and W_j (i ≠ j) are orthogonal, this is written as

E^n = W_1 ⊕· W_2 ⊕· ··· ⊕· W_r, (1.41)

where ⊕· indicates an orthogonal direct-sum.

The following properties hold regarding the dimensionality of subspaces.

Theorem 1.3

dim(V_{A+B}) = dim(V_A) + dim(V_B) − dim(V_{A∩B}), (1.42)

dim(V_A ⊕ V_B) = dim(V_A) + dim(V_B), (1.43)

dim(V^c) = n − dim(V). (1.44)
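A quick numerical check of the dimension formula (1.42) follows. The matrices A and B are our own examples, chosen so that their column spaces overlap in one dimension; dimensions are computed as matrix ranks, in line with (1.29).

```python
import numpy as np

def dim_span(M):
    """Dimension of the column space Sp(M)."""
    return np.linalg.matrix_rank(M)

# Check of (1.42): dim(V_A + V_B) = dim(V_A) + dim(V_B) - dim(V_A ∩ V_B).
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0],
              [0.0, 0.0]])
B = np.array([[1.0, 0.0],
              [0.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])

dim_sum = dim_span(np.hstack([A, B]))            # dim(V_A + V_B)
dim_cap = dim_span(A) + dim_span(B) - dim_sum    # dim(V_A ∩ V_B) via (1.42)
print(dim_sum, dim_cap)                          # 3, 1
```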

Suppose E^n is represented as the direct-sum of two subspaces V = Sp(A) and W = Sp(B). If Ax + By = 0, then Ax = −By belongs to Sp(A) ∩ Sp(B) = {0}, and consequently Ax = 0 and By = 0. This observation can be generalized as follows.

Theorem 1.4 The necessary and sufficient condition for the subspaces W_1 = Sp(A_1), W_2 = Sp(A_2), …, W_r = Sp(A_r) to be mutually disjoint is that x_1 + x_2 + ··· + x_r = 0, with x_j ∈ W_j (j = 1, …, r), implies x_1 = x_2 = ··· = x_r = 0.

Corollary An arbitrary vector x ∈ W = W_1 ⊕ ··· ⊕ W_r can be expressed uniquely as x = x_1 + x_2 + ··· + x_r, where x_j ∈ W_j (j = 1, …, r).

Theorem 1.4 and its corollary illustrate that decomposing a specific subspace into the direct sum of disjoint subspaces naturally extends the concept of linear independence among vectors.

The following theorem holds regarding implication relations between subspaces.

Theorem 1.5 Let V_1 and V_2 be subspaces such that V_1 ⊂ V_2, and let W be any subspace in E^n. Then

V_1 + (V_2 ∩ W) = (V_1 + W) ∩ V_2. (1.45)

Proof. Let y ∈ V_1 + (V_2 ∩ W). Then y can be decomposed into y = y_1 + y_2, where y_1 ∈ V_1 and y_2 ∈ V_2 ∩ W. Since V_1 ⊂ V_2, y_1 ∈ V_2, and since y_2 ∈ V_2, y = y_1 + y_2 ∈ V_2. Also, y_1 ∈ V_1 ⊂ V_1 + W and y_2 ∈ W ⊂ V_1 + W, which together imply y ∈ V_1 + W. Hence y ∈ (V_1 + W) ∩ V_2, so V_1 + (V_2 ∩ W) ⊂ (V_1 + W) ∩ V_2. Conversely, if x ∈ (V_1 + W) ∩ V_2, then x ∈ V_1 + W and x ∈ V_2. Thus x can be decomposed as x = x_1 + y, where x_1 ∈ V_1 and y ∈ W. Then y = x − x_1 ∈ V_2 ∩ W ⟹ x ∈ V_1 + (V_2 ∩ W) ⟹ (V_1 + W) ∩ V_2 ⊂ V_1 + (V_2 ∩ W), establishing (1.45). Q.E.D.

Corollary (a) For V_1 ⊂ V_2, there exists a subspace W̃ ⊂ V_2 such that V_2 = V_1 ⊕ W̃. (b) The W̃ in (a) can be chosen so that V_1 and W̃ are orthogonal.

Proof. (a): Let W be such that V_1 ⊕ W ⊃ V_2, and set W̃ = V_2 ∩ W in (1.45). Q.E.D.

Let V_1 = Sp(A) be a subspace of V_2. By part (a) of the corollary, we can choose B such that W̃ = Sp(B) and V_2 = Sp(A) ⊕ Sp(B); part (b) says that Sp(A) and Sp(B) can moreover be chosen to be orthogonal to each other.

In addition, the following relationships hold among subspaces V, W, and K in E^n:

(V ∩ W) + (V ∩ K) ⊂ V ∩ (W + K), (1.50)

V + (W ∩ K) ⊂ (V + W) ∩ (V + K). (1.51)

Note In (1.50) and (1.51), the distributive law of set theory does not hold in general; only the inclusions above are guaranteed. For the conditions under which equality holds in (1.50) and (1.51), refer to Theorem 2.19.

Linear Transformations

A function φ that relates an m-component vector x to an n-component vector y (that is, y = φ(x)) is often called a mapping or transformation. In this book, we mainly use the latter terminology. When φ satisfies the following two properties for any two vectors x and y in its domain and for any constant a, it is called a linear transformation:

(i) φ(x + y) = φ(x) + φ(y),

(ii) φ(ax) = aφ(x).

Combining the two properties above, we obtain

φ(α_1 x_1 + α_2 x_2 + ··· + α_m x_m) = α_1 φ(x_1) + α_2 φ(x_2) + ··· + α_m φ(x_m)

for any m vectors x_1, x_2, …, x_m and m scalars α_1, α_2, …, α_m.

Theorem 1.6 A linear transformation φ that transforms an m-component vector x into an n-component vector y can be represented by an n by m matrix A = [a_1, a_2, …, a_m] that consists of m n-component vectors a_1, a_2, …, a_m. (Proof omitted.)

We now examine the dimensionality of the subspace generated by a linear transformation. Let W = Sp(A) denote the range of y = Ax as x varies over the entire m-dimensional space E^m. Then, if y ∈ W, αy = A(αx) ∈ W, and if y_1, y_2 ∈ W, y_1 + y_2 ∈ W. Thus, W constitutes a linear subspace of dimensionality dim(W) = rank(A), spanned by the m vectors a_1, a_2, …, a_m.

When the domain of x is V, where V ⊂ E^m and V ≠ E^m (that is, x does not vary over the entire range of E^m), the range of y is a subspace of W, namely

W_V = {y | y = Ax, x ∈ V}.

Then,

dim(W_V) ≤ min{rank(A), dim(V)} ≤ dim(Sp(A)). (1.54)

Note The W_V above is sometimes written as W_V = Sp_V(A). Let B represent a matrix of basis vectors spanning V. Then W_V can also be written as W_V = Sp(AB).

We next consider the set of vectors x that satisfy Ax = 0 for a given linear transformation A. We write this subspace as

Ker(A) = {x | Ax = 0}. (1.55)

If x ∈ Ker(A), then A(αx) = 0, so αx ∈ Ker(A); and if x, y ∈ Ker(A), then A(x + y) = 0, so x + y ∈ Ker(A). Hence Ker(A) is a subspace of E^m, called the annihilation space or the kernel of A. Furthermore, if Ax = 0, then BAx = 0 for any B, so Ker(A) ⊂ Ker(BA).

The following three theorems hold concerning the dimensionality of subspaces.

Theorem 1.7 Let Ker(A′) = {y | A′y = 0} for y ∈ E^n. Then

Ker(A′) = Sp(A)^⊥, (1.56)

where Sp(A)^⊥ indicates the subspace of E^n orthogonal to Sp(A).

Proof. Let y_1 ∈ Ker(A′) and y_2 = Ax_2 ∈ Sp(A). Then y_1′y_2 = y_1′Ax_2 = (A′y_1)′x_2 = 0. Thus y_1 ∈ Sp(A)^⊥, so Ker(A′) ⊂ Sp(A)^⊥. Conversely, let y_1 ∈ Sp(A)^⊥. Then, because Ax_2 ∈ Sp(A), y_1′Ax_2 = (A′y_1)′x_2 = 0 for all x_2, which implies A′y_1 = 0, that is, y_1 ∈ Ker(A′). Hence Sp(A)^⊥ ⊂ Ker(A′), establishing (1.56). Q.E.D.

We next show that rank(A) = rank(A′). To see that rank(A) ≤ rank(A′), express an arbitrary x ∈ E^m as x = x_1 + x_2, where x_1 ∈ Sp(A′) and x_2 ∈ Ker(A). Then y = Ax = Ax_1, so Sp(A) = Sp_V(A) with V = Sp(A′), and hence rank(A) = dim(Sp(A)) ≤ dim(V) = rank(A′). The reverse inequality rank(A) ≥ rank(A′) follows in the same way, establishing rank(A) = rank(A′).

Theorem 1.9 Let A be an n by m matrix. Then

dim(Ker(A)) = m − rank(A). (1.60)

Proof. Follows directly from (1.57) and (1.59). Q.E.D.

Corollary

rank(A) = rank(A′A) = rank(AA′). (1.61)
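A quick numerical check of (1.60) and (1.61), assuming SciPy is available to compute an orthonormal basis of Ker(A); the rank-deficient matrix A is our own example.

```python
import numpy as np
from scipy.linalg import null_space   # assumed available for computing Ker(A)

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])
r = np.linalg.matrix_rank(A)

assert null_space(A).shape[1] == A.shape[1] - r   # dim(Ker(A)) = m - rank(A)  (1.60)
assert np.linalg.matrix_rank(A.T @ A) == r        # rank(A'A) = rank(A)        (1.61)
assert np.linalg.matrix_rank(A @ A.T) == r        # rank(AA') = rank(A)        (1.61)
```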

In addition, the following results hold:

(i) Let A and B be p by n and p by q matrices, respectively, and let [A, B] denote the row block matrix obtained by putting A and B side by side. Then

rank(A) + rank(B) − rank([A, B]) ≤ rank(A′B) ≤ min(rank(A), rank(B)), (1.62)

where rank(A) + rank(B) − rank([A, B]) = dim(Sp(A) ∩ Sp(B)).

(ii) Let U and V be nonsingular matrices (see the next paragraph). Then

rank(UAV) = rank(A). (1.63)

(iii) For matrices A and B of the same order, rank(A + B) ≤ rank(A) + rank(B). (iv) For matrices A, B, and C of orders n by p, p by q, and q by r, respectively, rank(ABC) ≥ rank(AB) + rank(BC) − rank(B). For further rank formulas, see Marsaglia and Styan (1974).

Consider a linear transformation y = Ax that maps an n-component vector x into another n-component vector y, so that A is a square matrix of order n. The matrix A is said to be nonsingular (or regular) when rank(A) = n and singular when rank(A) < n.

Theorem 1.10 Each of the following three conditions is necessary and sufficient for a square matrix A to be nonsingular:

(i) There exists an x such that y = Ax for an arbitrary n-dimensional vector y.

(ii) The dimensionality of the annihilation space of A is zero; that is, Ker(A) = {0}.

(iii) If Ax_1 = Ax_2, then x_1 = x_2. (Proof omitted.)

A linear transformation φ is one-to-one if and only if the square matrix A representing φ is nonsingular, that is, if and only if Ax = 0 has x = 0 as its unique solution, or equivalently Ker(A) = {0}.

A square matrix A of order n can be considered as a collection of n n-component vectors placed side by side, i.e., A = [a_1, a_2, …, a_n]. We define a function of these vectors by ψ(a_1, a_2, …, a_n) = |A|.

The determinant of a square matrix A, denoted |A| or det(A), is the scalar function ψ that is linear in each argument a_i and reverses its sign when any two arguments a_i and a_j (i ≠ j) are interchanged; linearity means

ψ(a_1, …, αa_i + βb_i, …, a_n) = αψ(a_1, …, a_i, …, a_n) + βψ(a_1, …, b_i, …, a_n).

If two of the n vectors are identical, then ψ(a_1, …, a_n) = 0.

When the vectors a_1, a_2, …, a_n are linearly dependent, the determinant |A| equals zero. For square matrices A and B of the same order, the determinant of their product equals the product of their determinants: |AB| = |A||B|.
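The sketch below checks the two determinant facts just stated, with example matrices of our own choosing (the third column of A_dep is the sum of the first two).

```python
import numpy as np

rng = np.random.default_rng(1)

A_dep = np.array([[1.0, 2.0, 3.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 3.0, 4.0]])       # third column = first + second
assert np.isclose(np.linalg.det(A_dep), 0.0, atol=1e-10)   # dependent columns => |A| = 0

A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))
assert np.isclose(np.linalg.det(A @ B),
                  np.linalg.det(A) * np.linalg.det(B))      # |AB| = |A||B|
```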

By Theorem 1.10, if rank(A) = n, then for each y the solution x of y = Ax is uniquely determined. Moreover, αy = A(αx), and for y_1 = Ax_1 and y_2 = Ax_2 we have y_1 + y_2 = A(x_1 + x_2). Hence the transformation x = ϕ(y) that maps y back to x is itself a linear transformation; it is called the inverse transformation of y = Ax, and its matrix representation is called the inverse matrix of A, denoted A^{-1}. Conversely, if y = φ(x) is a linear transformation, then x = ϕ(y) is its inverse transformation.

Since ϕ(φ(x)) = ϕ(y) = x and φ(ϕ(y)) = φ(x) = y, both composite transformations are identity transformations. In matrix terms, AA^{-1} = A^{-1}A = I_n; that is, the inverse matrix is the matrix that yields the identity matrix when multiplied by A on either side.

If A is regular (nonsingular), the following relation holds:

If A and B are nonsingular matrices of the same order, then

Let A, B, C, and D be n by n, n by m, m by n, and m by m matrices, respectively. If A and D are nonsingular, then the determinant of the partitioned matrix with blocks A, B, C, D satisfies

| A  B |
| C  D | = |A||D − CA^{-1}B| = |D||A − BD^{-1}C|. (1.66)

Furthermore, when the determinant (1.66) is nonzero and A and C are nonsingular, the inverse of a symmetric partitioned matrix of the form

| A   B |
| B′  C |

is given by

| A^{-1} + A^{-1}B E^{-1}B′A^{-1}   −A^{-1}B E^{-1} |
| −E^{-1}B′A^{-1}                    E^{-1}          |,  where E = C − B′A^{-1}B.

In Chapter 3, we will discuss a generalized inverse of A representing an inverse transformation x = ϕ(y) of the linear transformation y = Ax when A is not square or when it is square but singular.

Eigenvalues and Eigenvectors

Definition 1.3 Let A be a square matrix of order n. A scalar λ and an n-component vector x (≠ 0) that satisfy

Ax = λx (1.69)

are called an eigenvalue and an eigenvector of A, respectively. This matrix equation asks for an n-component vector x whose direction is unchanged by the linear transformation A.

A vector x satisfying (1.69) lies in the null space of the matrix Ã = A − λI_n, since (A − λI_n)x = 0. By Theorem 1.10, for this null space to have dimensionality of at least 1, the determinant of Ã must vanish:

|A − λI_n| = 0. (1.70)

Let the determinant on the left-hand side of the equation above be denoted by

ψ_A(λ) = |A − λI_n|,

that is, the determinant of the matrix whose diagonal elements are a_11 − λ, a_22 − λ, …, a_nn − λ and whose off-diagonal elements are those of A. This is a polynomial in λ with leading coefficient (−1)^n; it can be written as

ψ_A(λ) = (−1)^n λ^n + α_1(−1)^{n−1} λ^{n−1} + ··· + α_n (1.71)

and is called the eigenpolynomial of A. Setting the eigenpolynomial to zero, ψ_A(λ) = 0, yields the eigenequation, and the eigenvalues of A are the solutions (roots) of this equation.

The following properties hold for the coefficients of the eigenpolynomial of A. Setting λ = 0 in (1.71), we obtain ψ_A(0) = α_n = |A|.

In the expansion of |A − λI_n|, every term other than the product of the diagonal elements, (a_11 − λ)(a_22 − λ)···(a_nn − λ), is of degree at most n − 2 in λ. Consequently, the coefficient of λ^{n−1} comes only from the diagonal product and equals (−1)^{n−1}(a_11 + a_22 + ··· + a_nn), so that α_1 = tr(A) = a_11 + a_22 + ··· + a_nn, the trace of A.

Let A be a square matrix of order n, not necessarily symmetric, and assume that A has n distinct eigenvalues λ_i (i = 1, …, n). Then u_i satisfying

Au_i = λ_i u_i (i = 1, …, n) (1.73)

is called a right eigenvector, and v_i satisfying

A′v_i = λ_i v_i (i = 1, …, n) (1.74)

is called a left eigenvector. The following relations hold:

(u_i, v_j) = 0 (i ≠ j). (1.75)

We may scale u_i and v_i so that

u_i′v_i = 1. (1.76)

Let U and V be the matrices of these u_i's and v_i's, that is, U = [u_1, u_2, …, u_n] and V = [v_1, v_2, …, v_n]. Then, from (1.75) and (1.76), we have

V′U = I_n.

Furthermore, it follows that v_j′Au_i = 0 (j ≠ i) and v_j′Au_j = λ_j v_j′u_j = λ_j, and so

V′AU = Λ = diag(λ_1, λ_2, …, λ_n).

Pre- and postmultiplying the equation above by U and V′, respectively, and noting that V′U = I_n ⟹ UV′ = I_n (so that V′ = U^{-1}), we obtain the following theorem.

Theorem 1.11 Let A be a square matrix with n distinct eigenvalues λ_1, λ_2, …, λ_n, and let U = [u_1, u_2, …, u_n] and V = [v_1, v_2, …, v_n] be the matrices of the corresponding right and left eigenvectors, scaled as in (1.76). Then

A = UΛV′ = λ_1 u_1 v_1′ + λ_2 u_2 v_2′ + ··· + λ_n u_n v_n′, where Λ = diag(λ_1, …, λ_n).

If A is symmetric (i.e., A = A′), the right eigenvector u_i and the corresponding left eigenvector v_i coincide, and since (u_i, u_j) = 0 (i ≠ j), we obtain the following corollary.

Corollary When A = A′ and the λ_i are distinct, the following decompositions hold:

U′AU = Λ and A = UΛU′ = λ_1 u_1 u_1′ + λ_2 u_2 u_2′ + ··· + λ_n u_n u_n′. (1.81)

Decomposition (1.81) is called the spectral decomposition of the symmetric matrix A.
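A short NumPy sketch of the spectral decomposition (1.81) of a symmetric matrix; the example matrix A is our own, and `np.linalg.eigh` returns orthonormal eigenvectors in the columns of U.

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
lam, U = np.linalg.eigh(A)            # eigenvalues and orthonormal eigenvectors

A_rebuilt = sum(lam[i] * np.outer(U[:, i], U[:, i]) for i in range(len(lam)))
assert np.allclose(A, A_rebuilt)      # A = sum_i lambda_i u_i u_i'   (1.81)
assert np.allclose(U.T @ U, np.eye(3))
```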

A symmetric matrix A is said to be positive-definite (pd) if all its eigenvalues are positive; a pd matrix is necessarily regular (nonsingular). If all the eigenvalues are nonnegative, A is said to be nonnegative-definite (nnd). The following theorem characterizes nnd matrices.

Theorem 1.12 The necessary and sufficient condition for a square matrix A to be an nnd matrix is that there exists a matrix B such that

A = B′B. (1.82)

Vector and Matrix Derivatives

In multivariate analysis, we frequently need to find the extremum (maximum or minimum) of a scalar function of vectors or matrices. A necessary condition for an extremum is that the derivatives of the function vanish at the corresponding point, so we need the derivative of the function with respect to the vector or matrix argument. For a scalar function f(x) of a p-component vector x, the derivative is defined as

f_d(x) ≡ ∂f(x)/∂x = (∂f(x)/∂x_1, ∂f(x)/∂x_2, …, ∂f(x)/∂x_p)′.

Similarly, let f(X) denote a scalar function of the n by p matrix X. Then its derivative with respect to X is defined as

f_d(X) ≡ ∂f(X)/∂X = [∂f(X)/∂x_ij].

Below we give functions often used in multivariate analysis and their corresponding derivatives.

Theorem 1.13 Let a be a constant vector, and let A and B be constant matrices. Then, among others,

(i) f(x) = a′x ⟹ f_d(x) = a,

(vi) f(X) = tr(X′AXB) ⟹ f_d(X) = AXB + A′XB′.

Let f(x) and g(x) denote two scalar functions of x. Then the usual rules of differentiation hold, as in the case in which x is a scalar; for example,

∂{f(x) + g(x)}/∂x = f_d(x) + g_d(x) and ∂{f(x)g(x)}/∂x = f_d(x)g(x) + f(x)g_d(x).

These relations still hold when the vector x is replaced by a matrix X.

Consider approximating y by Xb, where y contains the observations of the criterion variable, X is the matrix of predictor variables, and b is the vector of regression coefficients. The least squares (LS) estimate of b is obtained by minimizing f(b) = ||y − Xb||^2; its first derivative with respect to b is set to zero. We have

∂f(b)/∂b = −2X′y + 2X′Xb = −2X′(y − Xb), (1.90)

obtained by expanding f(b) and using (1.86); setting (1.90) to zero yields the normal equations X′Xb = X′y. Similarly, let f(B) = ||Y − XB||^2. Then

∂f(B)/∂B = −2X′(Y − XB).
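The following sketch solves the normal equations X′Xb = X′y obtained by setting (1.90) to zero, on simulated data of our own; in practice `np.linalg.lstsq` is usually preferred over forming (X′X)^{-1} explicitly, and the two solutions coincide here.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
b_true = np.array([1.0, -2.0, 0.5])
y = X @ b_true + 0.1 * rng.normal(size=50)

b_normal = np.linalg.solve(X.T @ X, X.T @ y)      # solves X'Xb = X'y
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)   # numerically preferred route
assert np.allclose(b_normal, b_lstsq)
```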

Let

h(x) ≡ λ = x′Ax / x′x, (1.92)

where A is a symmetric matrix. Then

∂h(x)/∂x = 2(Ax − λx)/x′x. (1.93)

Setting (1.93) to zero leads to the eigenequation Ax = λx. The same equation arises when maximizing x′Ax under the constraint x′x = 1 by the Lagrange multiplier method: define

g(x) = x′Ax − λ(x′x − 1), (1.94)

where λ is the Lagrange multiplier. Differentiating g with respect to x and λ and setting the results to zero gives

Ax − λx = 0 (1.95)

and

x′x − 1 = 0, (1.96)

from which we obtain a normalized eigenvector x. From (1.95) and (1.96) we have λ = x′Ax, implying that the maximizing x is the (normalized) eigenvector corresponding to the largest eigenvalue of A.
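As a numerical illustration of (1.92)–(1.96), the sketch below compares the Rayleigh quotient h(x) over random directions with the largest eigenvalue; the 2 by 2 matrix A is our own example.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
lam, U = np.linalg.eigh(A)
x_opt = U[:, -1]                                  # eigenvector of the largest eigenvalue

rayleigh = lambda x: (x @ A @ x) / (x @ x)        # h(x) in (1.92)
rng = np.random.default_rng(3)
random_vals = [rayleigh(rng.normal(size=2)) for _ in range(1000)]

assert max(random_vals) <= lam[-1] + 1e-9         # no direction beats the top eigenvalue
assert np.isclose(rayleigh(x_opt), lam[-1])       # the maximizer attains it
```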

The function f(b) in (1.89) can be viewed as a composite function f(b) = g(b)′g(b), where g(b) = y − Xb is a vector function of the vector b. The derivative of such a composite function with respect to b follows the chain rule.

Applying this formula to f(b) defined in (1.89), we obtain

∂f(b)/∂b = −2X′(y − Xb). (1.98)

(This is like replacing a in (i) of Theorem 1.13 with −X.) Formula (1.98) is essentially the same as (1.91), as it should be.

See Magnus and Neudecker (1988) for a more comprehensive account of vector and matrix derivatives.

Exercises for Chapter 1

1. (a) Let A and C be square nonsingular matrices of orders n and m, respectively, and let B be an n by m matrix. Show that

(A + BCB′)^{-1} = A^{-1} − A^{-1}B(B′A^{-1}B + C^{-1})^{-1}B′A^{-1}. (1.100)

(b) Let c be an n-component vector. Using the result above, show the following:

3. Let M be a pd matrix. Show the following:

4. Let E^n = V ⊕ W. Answer true or false to the following statements:

5. Let E^n = V_1 ⊕ W_1 = V_2 ⊕ W_2. Show that

dim(V_1 + V_2) + dim(V_1 ∩ V_2) + dim(W_1 + W_2) + dim(W_1 ∩ W_2) = 2n. (1.101)

6. (a) Let A be an n by m matrix, and let B be an m by p matrix. Show the following:

Ker(AB) = Ker(B) ⟺ Sp(B) ∩ Ker(A) = {0}.

(b) Let A be a square matrix. Show the following:

7. Let A, B, and C be m by n, n by m, and r by m matrices, respectively.

(a) Show that rank [A  AB]

(b) Show that rank(A − ABA) = rank(A) + rank(I_n − BA) − n.

8. Let A be an n by m matrix, and let B be an m by r matrix. Answer the following questions:

(a) Let W_1 = {x ∈ Sp(B) | Ax = 0} and W_2 = {Ax | x ∈ Sp(B)}. Show that dim(W_1) + dim(W_2) = rank(B).

(b) Use (a) to show that rank(AB) = rank(A) − dim(Sp(A′) ∩ Sp(B)^⊥).

9. (a) Assume that the absolute values of the eigenvalues of A are all smaller than unity. Show that (I_n − A)^{-1} = I_n + A + A^2 + ···.

(b) (Use the formula in part (a).)

10. Let A be an m by n matrix. If rank(A) = r, A can be expressed as A = MN, where M is an m by r matrix and N is an r by n matrix. (This is called a rank decomposition of A.)

11. Let U and Ũ represent matrices of basis vectors for E^n, and let V and Ṽ do the same for E^m, so that Ũ = UT_1 and Ṽ = VT_2 for nonsingular transformation matrices T_1 and T_2. Show how the representation matrix A of a linear transformation with respect to the basis matrices U and V is related to its representation with respect to Ũ and Ṽ.

12. Consider the multiple regression equation y = Xβ + e, where y contains the observations of the criterion variable, X is the matrix of predictor variables, β is the vector of regression coefficients, and e is the vector of disturbance terms. Show that, without loss of generality, β can be assumed to lie in Sp(X′).

Definition

Definition 2.1 Let x ∈ E^n = V ⊕ W. Then x can be uniquely decomposed into x = x_1 + x_2 (where x_1 ∈ V and x_2 ∈ W).

The transformation that maps x into x_1 is called the projection matrix (or projector) onto V along W and is denoted by φ. It is a linear transformation, satisfying φ(a_1 y_1 + a_2 y_2) = a_1 φ(y_1) + a_2 φ(y_2) for any y_1, y_2 ∈ E^n, and so it can be represented by a matrix, denoted P_{V·W}. The vector x_1 = P_{V·W} x is called the projection (vector) of x onto V along W.

Theorem 2.1 The necessary and sufficient condition for a square matrix P of order n to be the projection matrix onto V = Sp(P) along W = Ker(P) is given by

P^2 = P. (2.2)

We need the following lemma to prove the theorem above.

Lemma 2.1 Let P be a square matrix of order n, and assume that (2.2) holds. Then

E^n = Sp(P) ⊕ Ker(P) (2.3)

and

Ker(P) = Sp(I_n − P). (2.4)

Proof of Lemma 2.1. (2.3): Let x ∈ Sp(P) and y ∈ Ker(P), so that Px = x and Py = 0. If x + y = 0, then Px + Py = 0, which gives Px = x = 0 and hence y = 0. Thus Sp(P) ∩ Ker(P) = {0}. Furthermore, dim(Sp(P)) + dim(Ker(P)) = rank(P) + (n − rank(P)) = n, so E^n = Sp(P) ⊕ Ker(P).

(2.4): We have Px = 0 ⟹ x = (I_n − P)x ⟹ Ker(P) ⊂ Sp(I_n − P) on the one hand, and P(I_n − P) = O ⟹ Sp(I_n − P) ⊂ Ker(P) on the other. Thus Ker(P) = Sp(I_n − P). Q.E.D.

Note When (2.4) holds, P(I_n − P) = O ⟹ P^2 = P. Thus, (2.2) is the necessary and sufficient condition for (2.4).

Proof of Theorem 2.1. (Necessity) For ∀x ∈ E^n, y = Px ∈ V. Noting that y = y + 0, we obtain Py = P(Px) = Px, that is, P^2 = P.

(Sufficiency) Let V = {Px | x ∈ E^n} and W = {(I_n − P)x | x ∈ E^n}. From Lemma 2.1, V and W are disjoint. Then an arbitrary x ∈ E^n can be uniquely decomposed into x = Px + (I_n − P)x = x_1 + x_2 (where x_1 ∈ V and x_2 ∈ W). From Definition 2.1, P is the projection matrix onto V = Sp(P) along W = Ker(P). Q.E.D.

Let E^n = V ⊕ W, and let x = x_1 + x_2, where x_1 ∈ V and x_2 ∈ W. Let P_{W·V} denote the projector that transforms x into x_2. Then

x = P_{V·W} x + P_{W·V} x = (P_{V·W} + P_{W·V})x. (2.5)

Because the equation above has to hold for any x ∈ E^n, it must hold that

P_{V·W} + P_{W·V} = I_n.

Let a square matrix P be the projection matrix onto V along W. Then Q = I_n − P satisfies Q^2 = (I_n − P)^2 = I_n − 2P + P^2 = I_n − P = Q, indicating that Q is the projection matrix onto W along V. We also have

PQ = P(I_n − P) = P − P^2 = O, (2.6)

implying that Sp(Q) constitutes the null space of P (i.e., Sp(Q) = Ker(P)). Similarly, QP = O, implying that Sp(P) constitutes the null space of Q.

Theorem 2.2 Let E^n = V ⊕ W. The necessary and sufficient conditions for a square matrix P of order n to be the projection matrix onto V along W are

(i) Px = x for ∀x ∈ V and (ii) Px = 0 for ∀x ∈ W. (2.7)

Proof. (Sufficiency) Let P_{V·W} and P_{W·V} denote the projection matrices onto V along W and onto W along V, respectively. Premultiplying (2.5) by P, we obtain P(P_{V·W} x) = P_{V·W} x and P(P_{W·V} x) = 0 because of (i) and (ii) above, since P_{V·W} x ∈ V and P_{W·V} x ∈ W. Since Px = P_{V·W} x holds for any x, it must hold that P = P_{V·W}.

(Necessity) For any x ∈ V, we have x = x + 0, and thus Px = x. Similarly, for any y ∈ W, we have y = 0 + y, so that Py = 0. Q.E.D.

In Figure 2.1, the vector OA indicates the projection of z onto Sp(x) along Sp(y), that is, OA = P_{Sp(x)·Sp(y)} z, where P_{Sp(x)·Sp(y)} is the corresponding projection matrix. The vector OB is given by OB = (I_n − P_{Sp(x)·Sp(y)})z, the projection of z onto Sp(y) along Sp(x).

Figure 2.1: Projection onto Sp(x) along Sp(y).

Example 2.2 In Figure 2.2, OA indicates the projection of z onto V = {x | x = α_1 x_1 + α_2 x_2} along Sp(y), that is, OA = P_{V·Sp(y)} z, where P_{V·Sp(y)} denotes the projection matrix onto V along Sp(y).

Figure 2.2: Projection onto a two-dimensional space V along Sp(y).

Theorem 2.3 The necessary and sufficient condition for a square matrix P of order n to be a projector onto V of dimensionality r (dim(V) = r) is given by

P = TΔ_r T^{-1}, (2.8)

where T is a square nonsingular matrix of order n and Δ_r is the diagonal matrix with r unities and n − r zeros on the leading diagonal (1 ≤ r ≤ n).

Proof. (Necessity) Let E^n = V ⊕ W, and let A = [a_1, a_2, …, a_r] and B = [b_1, b_2, …, b_{n−r}] be matrices of linearly independent basis vectors spanning V and W, respectively. Let T = [A, B]. Then T is nonsingular, since rank(A) + rank(B) = rank(T). Hence every x ∈ V and y ∈ W can be expressed as

x = Aα = [A, B](α′, 0′)′ = T(α′, 0′)′ and y = Bβ = [A, B](0′, β′)′ = T(0′, β′)′.

Adding the two equations above, we obtain x + y = T(α′, β′)′. Since (α′, β′)′ is an arbitrary vector in the n-dimensional space E^n and P(x + y) = x = TΔ_r(α′, β′)′, it follows that PT = TΔ_r, that is, P = TΔ_r T^{-1}. Furthermore, T can be an arbitrary nonsingular matrix, since V = Sp(A) and W = Sp(B) such that E^n = V ⊕ W can be chosen arbitrarily.

(Sufficiency) P is a projection matrix, since P^2 = P, and rank(P) = r from Theorem 2.1. (Theorem 2.2 can also be used to prove the theorem above.) Q.E.D.

Lemma 2.2 Let P be a projection matrix. Then

rank(P) = tr(P). (2.9)

Proof. rank(P) = rank(TΔ_r T^{-1}) = rank(Δ_r) = tr(Δ_r) = tr(TΔ_r T^{-1}) = tr(P). Q.E.D.
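The sketch below builds a (generally oblique) projector of the form (2.8) from an arbitrary nonsingular T of our own choosing and checks idempotency and Lemma 2.2.

```python
import numpy as np

n, r = 4, 2
rng = np.random.default_rng(4)
T = rng.normal(size=(n, n)) + n * np.eye(n)        # nonsingular in practice
Delta_r = np.diag([1.0] * r + [0.0] * (n - r))

P = T @ Delta_r @ np.linalg.inv(T)                 # P = T Delta_r T^{-1}, as in (2.8)
assert np.allclose(P @ P, P)                       # idempotent: a projector
assert np.isclose(np.trace(P), np.linalg.matrix_rank(P))   # rank(P) = tr(P)  (2.9)
```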

Theorem 2.4 Let P be a square matrix of order n. Then the following three statements are equivalent:

P^2 = P, (2.10)

rank(P) + rank(I_n − P) = n, (2.11)

E^n = Sp(P) ⊕ Sp(I_n − P). (2.12)

Proof. (2.10) → (2.11): Clear from rank(P) = tr(P).

(2.11) → (2.12): Let V = Sp(P) and W = Sp(I_n − P). Then dim(V + W) = dim(V) + dim(W) − dim(V ∩ W). Since x = Px + (I_n − P)x for an arbitrary n-component vector x, we have E^n = V + W. Hence dim(V ∩ W) = 0 ⟹ V ∩ W = {0}, establishing (2.12).

(2.12) → (2.10): We have P = P^2 + (I_n − P)P and P = P^2 + P(I_n − P), which implies P(I_n − P) = (I_n − P)P. On the other hand, P(I_n − P) = O and (I_n − P)P = O because Sp(P(I_n − P)) ⊂ Sp(P) ∩ Sp(I_n − P) = {0}. Hence P^2 = P. Q.E.D.

Proof (⇒): It is clear from Lemma 2.1.

Orthogonal Projection Matrices

For a given subspace V of E^n, there are infinitely many ways of choosing its complementary subspace W; several of them will be discussed in Chapter 4. Here we consider the case in which V and W are orthogonal, that is, W = V^⊥.

Let x, y ∈ E^n be decomposed as x = x_1 + x_2 and y = y_1 + y_2, where x_1, y_1 ∈ V and x_2, y_2 ∈ W. Let P denote the projection matrix onto V along V^⊥. Then x_1 = Px and y_1 = Py. Since (x_2, Py) = (y_2, Px) = 0, it must hold that (Px, y) = (x_1, y_1) = (x, Py) for all x and y, which implies P′ = P.

Theorem 2.5 The necessary and sufficient condition for a square matrix P of order n to be an orthogonal projection matrix (an orthogonal projector) is given by

(i) P^2 = P and (ii) P′ = P. (2.13)

Proof. (Necessity) That P^2 = P is clear from the definition of a projection matrix. That P′ = P is as shown above.

(Sufficiency) Let x = Pα ∈ Sp(P). Then Px = P^2 α = Pα = x. Let y ∈ Sp(P)^⊥. Then (Px, y) = x′P′y = x′Py must vanish for arbitrary x, which implies Py = 0. By Theorem 2.2, P is the projection matrix onto Sp(P) along Sp(P)^⊥, that is, the orthogonal projection matrix onto Sp(P). Q.E.D.

Definition 2.2 A projection matrix P such that P^2 = P and P′ = P is called an orthogonal projection matrix (projector). Furthermore, the vector Px is called the orthogonal projection of x. The orthogonal projector P is the projection matrix onto Sp(P) along Sp(P)^⊥, but it is customarily called simply the orthogonal projector onto Sp(P).

Note A projection matrix that does not satisfy P′ = P is called an oblique projector, as opposed to an orthogonal projector.

Theorem 2.6 Let A = [a_1, a_2, …, a_m], where a_1, a_2, …, a_m are linearly independent. Then the orthogonal projector onto V = Sp(A) spanned by a_1, a_2, …, a_m is given by

P = A(A′A)^{-1}A′. (2.15)

Proof. Let x_1 ∈ Sp(A). From x_1 = Aα, we obtain Px_1 = x_1 = Aα = A(A′A)^{-1}A′x_1. On the other hand, let x_2 ∈ Sp(A)^⊥. Then A′x_2 = 0 ⟹ A(A′A)^{-1}A′x_2 = 0. Let x = x_1 + x_2. From Px_2 = 0, we obtain Px = A(A′A)^{-1}A′x, and (2.15) follows because x is arbitrary. Q.E.D.

Let Q = I_n − P. Then Q is the orthogonal projector onto Sp(A)^⊥, the ortho-complement subspace of Sp(A).
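A short sketch of (2.15) and of Q = I_n − P, using a random matrix A of our own with linearly independent columns.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(6, 2))                       # full column rank with probability 1

P = A @ np.linalg.inv(A.T @ A) @ A.T              # orthogonal projector onto Sp(A), (2.15)
Q = np.eye(6) - P                                 # orthogonal projector onto Sp(A)^perp

assert np.allclose(P @ P, P) and np.allclose(P, P.T)   # idempotent and symmetric (2.13)
assert np.allclose(P @ A, A)                           # fixes every vector in Sp(A)
assert np.allclose(Q @ A, 0)                           # annihilates Sp(A)
```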

Example 2.3 Let 1_n = (1, 1, …, 1)′ (the n-component vector of ones), and let P_M denote the orthogonal projector onto V_M = Sp(1_n). Then

P_M = 1_n(1_n′1_n)^{-1}1_n′ = (1/n) 1_n 1_n′.

The orthogonal projector onto V_M^⊥ = Sp(1_n)^⊥, the ortho-complement subspace of Sp(1_n), is given by

Q_M = I_n − (1/n) 1_n 1_n′. (2.18)

Clearly, P_M and Q_M are both symmetric, and the following relations hold:

P_M^2 = P_M, Q_M^2 = Q_M, and P_M Q_M = Q_M P_M = O. (2.19)

Note The matrix Q_M in (2.18) is sometimes written as P_M^⊥.
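Example 2.3 in code: P_M projects a vector onto its mean, and Q_M centers it (deviations from the mean). The data vector x is our own example.

```python
import numpy as np

n = 5
ones = np.ones((n, 1))
P_M = ones @ ones.T / n          # (1/n) 1_n 1_n'
Q_M = np.eye(n) - P_M            # centering matrix, (2.18)

x = np.array([3.0, 1.0, 4.0, 1.0, 5.0])
assert np.allclose(P_M @ x, x.mean() * np.ones(n))   # projection onto Sp(1_n)
assert np.isclose((Q_M @ x).sum(), 0.0)              # centered scores sum to zero
assert np.allclose(P_M @ Q_M, 0)                     # (2.19)
```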

Subspaces and Projection Matrices

Decomposition into a direct-sum of disjoint subspaces

Lemma 2.3 Suppose that

E^n = V_1 ⊕ W_1 = V_2 ⊕ W_2. (2.21)

If V_1 ⊂ W_2 or V_2 ⊂ W_1, the following relation holds:

E^n = (V_1 ⊕ V_2) ⊕ (W_1 ∩ W_2). (2.22)

Proof. When V_1 ⊂ W_2, Theorem 1.5 leads to W_2 = V_1 ⊕ (W_1 ∩ W_2). Hence the following relation holds:

E^n = V_2 ⊕ W_2 = V_2 ⊕ V_1 ⊕ (W_1 ∩ W_2) = (V_1 ⊕ V_2) ⊕ (W_1 ∩ W_2).

When V_2 ⊂ W_1, the same result follows by using W_1 = V_2 ⊕ (W_1 ∩ W_2). Q.E.D.

Corollary

E^n = (V_1 ⊕ W_2) ⊕ (V_2 ∩ W_1). (2.23)

Proof. In the proof of Lemma 2.3, exchange the roles of W_2 and V_2. Q.E.D.

Theorem 2.7 Let P_1 and P_2 denote the projection matrices onto V_1 along W_1 and onto V_2 along W_2, respectively. Then the following three statements are equivalent:

(i) P_1 + P_2 is the projector onto V_1 ⊕ V_2 along W_1 ∩ W_2.

(ii) P_1P_2 = P_2P_1 = O.

(iii) V_1 ⊂ W_2 and V_2 ⊂ W_1. (In this case, V_1 and V_2 are disjoint.)

Proof. (i) → (ii): From (P_1 + P_2)^2 = P_1 + P_2, P_1^2 = P_1, and P_2^2 = P_2, we have P_1P_2 = −P_2P_1. Pre- and postmultiplying both sides by P_1, we obtain P_1P_2 = −P_1P_2P_1 and P_1P_2P_1 = −P_2P_1, which together imply P_1P_2 = P_2P_1; combined with P_1P_2 = −P_2P_1, this gives P_1P_2 = P_2P_1 = O.

(ii) → (iii): For an arbitrary vector x ∈ V_1, P_1x = x because P_1x ∈ V_1. Hence P_2P_1x = P_2x = 0, which implies x ∈ W_2, and so V_1 ⊂ W_2. On the other hand, when x ∈ V_2, it follows that P_2x ∈ V_2, and so P_1P_2x = P_1x = 0, implying x ∈ W_1. We thus have V_2 ⊂ W_1.

(iii) → (ii): For x ∈ E^n, P_1x ∈ V_1 ⊂ W_2, which implies (I_n − P_2)P_1x = P_1x; since this holds for any x, (I_n − P_2)P_1 = P_1, implying P_2P_1 = O. We also have x ∈ E^n ⟹ P_2x ∈ V_2 ⊂ W_1 ⟹ (I_n − P_1)P_2x = P_2x, which again holds for any x, which implies (I_n − P_1)P_2 = P_2 ⟹ P_1P_2 = O.

(ii) → (i): An arbitrary vector x ∈ V_1 ⊕ V_2 can be decomposed into x = x_1 + x_2, where x_1 ∈ V_1 and x_2 ∈ V_2. From P_1x_2 = 0 and P_2x_1 = 0, we have (P_1 + P_2)x = (P_1 + P_2)(x_1 + x_2) = P_1x_1 + P_2x_2 = x_1 + x_2 = x. On the other hand, by noting that P_1 = P_1(I_n − P_2) and P_2 = P_2(I_n − P_1), for any x ∈ W_1 ∩ W_2 we have (P_1 + P_2)x = P_1(I_n − P_2)x + P_2(I_n − P_1)x = 0. Since V_1 ⊂ W_2 and V_2 ⊂ W_1, the decomposition on the right-hand side of (2.22) holds. Hence P_1 + P_2 is the projector onto V_1 ⊕ V_2 along W_1 ∩ W_2 by Theorem 2.2. Q.E.D.

Note In the theorem above, P_1P_2 = O in (ii) does not by itself imply P_2P_1 = O. P_1P_2 = O corresponds to V_2 ⊂ W_1, and P_2P_1 = O to V_1 ⊂ W_2, in (iii). It should be clear that V_1 ⊂ W_2 ⟺ V_2 ⊂ W_1 does not hold in general.

Theorem 2.8 Given the decompositions of E^n in (2.21), the following three statements are equivalent:

(i) P_2 − P_1 is the projector onto V_2 ∩ W_1 along V_1 ⊕ W_2.

(ii) P_1P_2 = P_2P_1 = P_1.

(iii) V_1 ⊂ V_2 and W_2 ⊂ W_1.

Proof. (i) → (ii): (P_2 − P_1)^2 = P_2 − P_1 implies 2P_1 = P_1P_2 + P_2P_1. Pre- and postmultiplying both sides by P_2, we obtain P_2P_1 = P_2P_1P_2 and P_1P_2 = P_2P_1P_2, respectively, which imply P_1P_2 = P_2P_1 = P_1.

(ii) → (iii): For ∀x ∈ E^n, P_1x ∈ V_1, which implies P_1x = P_2P_1x ∈ V_2, which in turn implies V_1 ⊂ V_2. Let Q_j = I_n − P_j (j = 1, 2). Then P_1P_2 = P_1 implies Q_1Q_2 = Q_2, and so Q_2x ∈ W_2 implies Q_2x = Q_1Q_2x ∈ W_1, which in turn implies W_2 ⊂ W_1.

(iii) → (ii): From V_1 ⊂ V_2, for ∀x ∈ E^n, P_1x ∈ V_1 ⊂ V_2 ⟹ P_2(P_1x) = P_1x ⟹ P_2P_1 = P_1. On the other hand, from W_2 ⊂ W_1, Q_2x ∈ W_2 ⊂ W_1 for ∀x ∈ E^n ⟹ Q_1Q_2x = Q_2x ⟹ Q_1Q_2 = Q_2 ⟹ (I_n − P_1)(I_n − P_2) = I_n − P_2 ⟹ P_1P_2 = P_1.

(ii) → (i): For x ∈ V_2 ∩ W_1, it holds that (P_2 − P_1)x = Q_1P_2x = Q_1x = x. On the other hand, let x = y + z, where y ∈ V_1 and z ∈ W_2. Then (P_2 − P_1)x = (P_2 − P_1)y + (P_2 − P_1)z = P_2Q_1y + Q_1P_2z = 0. Hence P_2 − P_1 is the projector onto V_2 ∩ W_1 along V_1 ⊕ W_2. Q.E.D.

Note As in Theorem 2.7, P_1P_2 = P_1 does not necessarily imply P_2P_1 = P_1. Note that P_1P_2 = P_1 ⟺ W_2 ⊂ W_1 and P_2P_1 = P_1 ⟺ V_1 ⊂ V_2.

Theorem 2.9 When the decompositions in (2.21) and (2.22) hold and

P_1P_2 = P_2P_1, (2.24)

then P_1P_2 (or P_2P_1) is the projector onto V_1 ∩ V_2 along W_1 + W_2.

Proof. P_1P_2 = P_2P_1 implies (P_1P_2)^2 = P_1P_2P_1P_2 = P_1^2P_2^2 = P_1P_2, indicating that P_1P_2 is a projection matrix. On the other hand, let x ∈ V_1 ∩ V_2. Then P_1(P_2x) = P_1x = x. Furthermore, let x ∈ W_1 + W_2 and x = x_1 + x_2, where x_1 ∈ W_1 and x_2 ∈ W_2. Then P_1P_2x = P_1P_2x_1 + P_1P_2x_2 = P_2P_1x_1 + 0 = 0. Since E^n = (V_1 ∩ V_2) ⊕ (W_1 ⊕ W_2) by the corollary to Lemma 2.3, we know that P_1P_2 is the projector onto V_1 ∩ V_2 along W_1 ⊕ W_2. Q.E.D.

Note Using the theorem above, (ii) → (i) in Theorem 2.7 can also be proved as follows: from P_1P_2 = P_2P_1 = O we have Q_1Q_2 = Q_2Q_1, so Q_1Q_2 is the projector onto W_1 ∩ W_2 along V_1 ⊕ V_2, and hence P_1 + P_2 = I_n − Q_1Q_2 is the projector onto V_1 ⊕ V_2 along W_1 ∩ W_2.

If we take W_1 = V_1^⊥ and W_2 = V_2^⊥ in the theorem above, P_1 and P_2 become orthogonal projectors.

Theorem 2.10 Let P_1 and P_2 be the orthogonal projectors onto V_1 and V_2, respectively. Then the following three statements are equivalent:

(i) P_1 + P_2 is the orthogonal projector onto V_1 ⊕· V_2.

(ii) P_1P_2 = P_2P_1 = O.

(iii) V_1 and V_2 are orthogonal.

Theorem 2.11 The following three statements are equivalent:

(i) P_2 − P_1 is the orthogonal projector onto V_2 ∩ V_1^⊥.

(ii) P_1P_2 = P_2P_1 = P_1.

(iii) V_1 ⊂ V_2.

The two theorems above can be proved by setting W_1 = V_1^⊥ and W_2 = V_2^⊥ in Theorems 2.7 and 2.8.

Theorem 2.12 The necessary and sufficient condition for P_1P_2 to be the orthogonal projector onto V_1 ∩ V_2 is (2.24), that is, P_1P_2 = P_2P_1.

Proof. Sufficiency is clear from Theorem 2.9. Necessity follows from P_1P_2 = (P_1P_2)′, which implies P_1P_2 = P_2P_1 since P_1P_2 is an orthogonal projector. Q.E.D.

We next present a theorem concerning projection matrices when E^n is expressed as a direct-sum of m subspaces, namely

E^n = V_1 ⊕ V_2 ⊕ ··· ⊕ V_m. (2.25)

Theorem 2.13 Let P_i (i = 1, …, m) be square matrices that satisfy

P_1 + P_2 + ··· + P_m = I_n. (2.26)

Then the following three statements are equivalent:

(i) P_iP_j = O (i ≠ j), (2.27)

(ii) P_i^2 = P_i (i = 1, …, m), (2.28)

(iii) rank(P_1) + rank(P_2) + ··· + rank(P_m) = n. (2.29)

Proof. (i) → (ii): Postmultiplying (2.26) by P_i and using (2.27) gives P_i^2 = P_i.

(ii) → (iii): Use rank(P_i) = tr(P_i) when P_i^2 = P_i. Then Σ_i rank(P_i) = Σ_i tr(P_i) = tr(P_1 + ··· + P_m) = tr(I_n) = n.

(iii) → (i), (ii): Let V_i = Sp(P_i). From rank(P_i) = dim(V_i), we obtain dim(V_1) + dim(V_2) + ··· + dim(V_m) = n; that is, E^n is decomposed into the sum of m disjoint subspaces as in (2.26). By postmultiplying (2.26) by P_i, we obtain

P_1P_i + P_2P_i + ··· + P_iP_i + ··· + P_mP_i = P_i.

Since Sp(P_1), Sp(P_2), …, Sp(P_m) are disjoint, (2.27) and (2.28) hold from Theorem 1.4. Q.E.D.

Note P_i in Theorem 2.13 is a projection matrix. Let E^n = V_1 ⊕ ··· ⊕ V_r, and let

V_(i) = V_1 ⊕ ··· ⊕ V_{i−1} ⊕ V_{i+1} ⊕ ··· ⊕ V_r

denote the direct-sum of all the subspaces except V_i. Then E^n = V_i ⊕ V_(i). Let P_{i·(i)} denote the projector onto V_i along V_(i). This matrix coincides with the P_i that satisfies the four equations given in (2.26) through (2.29).

Corollary 2 Let P_{(i)·i} denote the projector onto V_(i) along V_i. Then the following relation holds:

P_{(i)·i} = P_{1·(1)} + ··· + P_{i−1·(i−1)} + P_{i+1·(i+1)} + ··· + P_{m·(m)}. (2.34)

Proof. The proof is straightforward by noting P_{i·(i)} + P_{(i)·i} = I_n. Q.E.D.

Note The projection matrix P_{i·(i)} onto V_i along V_(i) is uniquely determined. Assume that there are two possible representations, P_{i·(i)} and P*_{i·(i)}. Then

(P_{1·(1)} − P*_{1·(1)}) + (P_{2·(2)} − P*_{2·(2)}) + ··· + (P_{m·(m)} − P*_{m·(m)}) = O,

and each term in the equation above belongs to one of the respective subspaces V_1, V_2, …, V_m, which are mutually disjoint. Hence, from Theorem 1.4, we obtain P_{i·(i)} = P*_{i·(i)}. This indicates that when a direct-sum decomposition of E^n is given, the identity matrix I_n of order n is decomposed accordingly, and the projection matrices that constitute the decomposition are uniquely determined.

The following theorem, due to Khatri (1968), generalizes Theorem 2.13.

Theorem 2.14 Let P_1, P_2, …, P_m denote square matrices of order n, and let

P = P_1 + P_2 + ··· + P_m. (2.35)

Consider the following four propositions:

(i) P_i^2 = P_i (i = 1, …, m),

(ii) P_iP_j = O (i ≠ j),

(iii) P^2 = P,

(iv) rank(P_i^2) = rank(P_i) (i = 1, …, m).

All other propositions can be derived from any two of (i), (ii), and (iii), and (i) and (ii) can be derived from (iii) and (iv).

Proof. That (i) and (ii) imply (iii) is obvious. That (ii) and (iii) imply (iv) can be shown using (2.35).

(ii), (iii) → (i): Postmultiplying (2.35) by P_i and using (ii), we obtain PP_i = P_i^2, from which it follows that P_i^3 = P_i^2. On the other hand, rank(P_i^2) = rank(P_i) implies that there exists a W_i such that P_i^2 W_i = P_i. Hence P_i^3 = P_i^2 ⟹ P_i^2 = P_i.

(iii), (iv) → (i), (ii): We have Sp(P) ⊕ Sp(I_n − P) = E^n from P^2 = P. Hence, by postmultiplying the identity P_1 + ··· + P_m + (I_n − P) = I_n by P_i, (i) and (ii) follow. Q.E.D.

Next we consider the case in which the subspaces have inclusion relationships like the following. Let

E^n = V_k ⊃ V_{k−1} ⊃ ··· ⊃ V_2 ⊃ V_1 = {0},

and let W_i denote a complement subspace of V_i. Let P_i be the projector onto V_i along W_i, and let P*_i = P_i − P_{i−1}, where P_0 = O and P_k = I_n. Then the following relations hold:

(i) P*_1 + P*_2 + ··· + P*_k = I_n;

(ii) (P*_i)^2 = P*_i (i = 1, …, k);

(iii) P*_iP*_j = O (i ≠ j);

(iv) P*_i is the projector onto V_i ∩ W_{i−1} along V_{i−1} ⊕ W_i.

Proof. (i): Obvious. (ii): Use P_iP_{i−1} = P_{i−1}P_i = P_{i−1}. (iii): It follows from (P*_i)^2 = P*_i that rank(P*_i) = tr(P*_i) = tr(P_i − P_{i−1}) = tr(P_i) − tr(P_{i−1}). Hence Σ_{i=1}^{k} rank(P*_i) = tr(P_k) − tr(P_0) = n, from which P*_iP*_j = O follows by Theorem 2.13. (iv): Clear from Theorem 2.8. Q.E.D.

Note The theorem above does not presuppose that P_i is an orthogonal projector. However, if W_i = V_i^⊥, then P_i and P*_i are orthogonal projectors; the latter, in particular, is the orthogonal projector onto V_i ∩ V_{i−1}^⊥.

Decomposition into nondisjoint subspaces

In this section, we present several theorems indicating how projectors are decomposed when the corresponding subspaces are not necessarily disjoint. We elucidate their meaning in connection with the commutativity of projectors.

We first consider the case in which there are two direct-sum decompositions of E^n, namely

E^n = V_1 ⊕ W_1 = V_2 ⊕ W_2,

as given in (2.21). Let V_{12} = V_1 ∩ V_2 denote the product space between V_1 and V_2, and let V_3 denote a complement subspace to V_1 + V_2 in E^n. Furthermore, let P_{1+2} denote the projection matrix onto V_{1+2} = V_1 + V_2 along V_3, and let P_j (j = 1, 2) represent the projection matrix onto V_j along W_j. Then the following theorem holds.

Theorem 2.16 (i) The necessary and sufficient condition for P_{1+2} = P_1 + P_2 − P_1P_2 is

(V_{1+2} ∩ W_2) ⊂ (V_1 ⊕ V_3). (2.36)

(ii) The necessary and sufficient condition for P_{1+2} = P_1 + P_2 − P_2P_1 is

(V_{1+2} ∩ W_1) ⊂ (V_2 ⊕ V_3). (2.37)

Proof. (i): Since V_{1+2} ⊃ V_1 and V_{1+2} ⊃ V_2, P_{1+2} − P_1 is the projector onto V_{1+2} ∩ W_1 along V_1 ⊕ V_3 by Theorem 2.8. Hence P_{1+2}P_1 = P_1 and P_{1+2}P_2 = P_2. Similarly, P_{1+2} − P_2 is the projector onto V_{1+2} ∩ W_2 along V_2 ⊕ V_3.

Corollary Assume that the decomposition (2.21) holds The necessary and sufficient condition for P 1 P 2 =P 2 P 1 is that both (2.36) and (2.37) hold.

The following theorem can readily be derived from the theorem above.

Theorem 2.17 Let E^n = (V_1 + V_2) ⊕ V_3, V_1 = V_{11} ⊕ V_{12}, and V_2 = V_{22} ⊕ V_{12}, where V_{12} = V_1 ∩ V_2. Let P*_{1+2} denote the projection matrix onto V_1 + V_2 along V_3, and let P*_1 and P*_2 denote the projectors onto V_1 along V_3 ⊕ V_{22} and onto V_2 along V_3 ⊕ V_{11}, respectively. Then

P*_1P*_2 = P*_2P*_1 (2.38)

and

P*_{1+2} = P*_1 + P*_2 − P*_1P*_2. (2.39)

Proof. Since V_{11} ⊂ V_1 and V_{22} ⊂ V_2, we obtain

V_{1+2} ∩ W_2 = V_{11} ⊂ (V_1 ⊕ V_3) and V_{1+2} ∩ W_1 = V_{22} ⊂ (V_2 ⊕ V_3)

by setting W_1 = V_{22} ⊕ V_3 and W_2 = V_{11} ⊕ V_3 in Theorem 2.16. Q.E.D.

Another proof. Let y = y_1 + y_2 + y_{12} + y_3 ∈ E^n, where y_1 ∈ V_{11}, y_2 ∈ V_{22}, y_{12} ∈ V_{12}, and y_3 ∈ V_3. Then it suffices to show that (P*_1P*_2)y = (P*_2P*_1)y. Q.E.D.

Let P_j (j = 1, 2) denote the projection matrix onto V_j along W_j. Assume that E^n = V_1 ⊕ W_1 ⊕ V_3 = V_2 ⊕ W_2 ⊕ V_3 and V_1 + V_2 = V_{11} ⊕ V_{22} ⊕ V_{12} hold. However, W_1 = V_{22} may not hold, even if V_1 = V_{11} ⊕ V_{12}. That is, (2.38) and (2.39) hold only when we set W_1 = V_{22} and W_2 = V_{11}.

Theorem 2.18 Let P_1 and P_2 be the orthogonal projectors onto V_1 and V_2, respectively, and let P_{1+2} denote the orthogonal projector onto V_{1+2} = V_1 + V_2. Let V_{12} = V_1 ∩ V_2, V_{11} = V_1 ∩ V_{12}^⊥, and V_{22} = V_2 ∩ V_{12}^⊥. Then the following three statements are equivalent:

(i) P_1P_2 = P_2P_1;

(ii) P_{1+2} = P_1 + P_2 − P_1P_2;

(iii) V_{11} and V_{22} are orthogonal.

Proof. (i) → (ii): Obvious from Theorem 2.16.

(ii) → (iii): P_{1+2} = P_1 + P_2 − P_1P_2 ⟹ (P_{1+2} − P_1)(P_{1+2} − P_2) = (P_{1+2} − P_2)(P_{1+2} − P_1) = O ⟹ V_{11} and V_{22} are orthogonal.

(iii) → (i): Set V_3 = (V_1 + V_2)^⊥ in Theorem 2.17. Since V_{11} and V_{22}, and V_1 and V_{22}, are orthogonal, the result follows. Q.E.D.

When P_1, P_2, and P_{1+2} are orthogonal projectors, the following corollary holds.

Commutative projectors

In this section, we focus on orthogonal projectors and the role played by Theorem 2.18 and its corollary, and we extend these results to the case of three or more subspaces.

Theorem 2.19 Let P_j denote the orthogonal projector onto V_j. If P_1P_2 = P_2P_1, P_1P_3 = P_3P_1, and P_2P_3 = P_3P_2, the following relations hold:

V_1 + (V_2 ∩ V_3) = (V_1 + V_2) ∩ (V_1 + V_3), (2.40)

V_2 + (V_1 ∩ V_3) = (V_2 + V_1) ∩ (V_2 + V_3), (2.41)

V_3 + (V_1 ∩ V_2) = (V_3 + V_1) ∩ (V_3 + V_2). (2.42)

Proof. Let P_{1+(2∩3)} denote the orthogonal projector onto V_1 + (V_2 ∩ V_3). The orthogonal projector onto V_2 ∩ V_3 is given by P_2P_3 (or by P_3P_2). Since P_1P_2 = P_2P_1 ⟹ P_1P_2P_3 = P_2P_3P_1, we obtain

P_{1+(2∩3)} = P_1 + P_2P_3 − P_1P_2P_3

by Theorem 2.18. On the other hand, from P_1P_2 = P_2P_1 and P_1P_3 = P_3P_1, the orthogonal projectors onto V_1 + V_2 and V_1 + V_3 are given by

P_{1+2} = P_1 + P_2 − P_1P_2 and P_{1+3} = P_1 + P_3 − P_1P_3,

respectively, and so P_{1+2}P_{1+3} = P_{1+3}P_{1+2} holds. Hence the orthogonal projector onto (V_1 + V_2) ∩ (V_1 + V_3) is given by

(P_1 + P_2 − P_1P_2)(P_1 + P_3 − P_1P_3) = P_1 + P_2P_3 − P_1P_2P_3,

which implies P_{1+(2∩3)} = P_{1+2}P_{1+3}. Since there is a one-to-one correspondence between orthogonal projectors and subspaces, (2.40) holds. Relations (2.41) and (2.42) can be proven similarly. Q.E.D.

The three identities (2.40) to (2.42) express a distributive law of subspaces, which holds when the corresponding orthogonal projectors commute.

We now present a theorem on the decomposition of the orthogonal projector defined on the sum space V_1 + V_2 + V_3 of V_1, V_2, and V_3.

Theorem 2.20 Let P_{1+2+3} denote the orthogonal projector onto V_1 + V_2 + V_3, and let P_1, P_2, and P_3 denote the orthogonal projectors onto V_1, V_2, and V_3, respectively. Then a sufficient condition for the decomposition

P_{1+2+3} = P_1 + P_2 + P_3 − P_1P_2 − P_1P_3 − P_2P_3 + P_1P_2P_3 (2.43)

to hold is

P_1P_2 = P_2P_1, P_1P_3 = P_3P_1, and P_2P_3 = P_3P_2. (2.44)

Proof. From P_2P_3 = P_3P_2 we have P_{2+3} = P_2 + P_3 − P_2P_3. We therefore have P_{1+2}P_{2+3} = P_{2+3}P_{1+2}. We also have P_{1+2+3} = P_{(1+2)+(2+3)}, from which (2.43) follows.

An alternative proof: From P_1P_{2+3} = P_{2+3}P_1, we have P_{1+2+3} = P_1 + P_{2+3} − P_1P_{2+3}. Substituting P_{2+3} = P_2 + P_3 − P_2P_3 into this equation, we obtain (2.43). Q.E.D.

Assume that (2.44) holds. Then, with

P_{12(3)} = P_1P_2(I_n − P_3), P_{13(2)} = P_1P_3(I_n − P_2), and P_{23(1)} = P_2P_3(I_n − P_1),

and similarly defined matrices, the right-hand side of (2.43) can be rewritten as the sum (2.45); all matrices on the right-hand side of (2.45) are orthogonal projectors and are mutually orthogonal, and the decomposition of the projector P_{1+2+3} corresponds with the decomposition of the subspace V_1 + V_2 + V_3.

Theorem 2.20 can be generalized as follows.

Corollary LetV =V 1 +V 2 +ã ã ã+V s (s≥2) LetP V denote the orthogonal projector onto V, and let P j denote the orthogonal projector onto V j A sufficient condition for

Noncommutative projectors

We now consider the case in which two subspaces V 1 and V 2 and the cor- responding projectors P 1 and P 2 are given but P 1 P 2 = P 2 P 1 does not necessarily hold Let Q j =I n −P j (j = 1,2) Then the following lemma holds.

= Sp(Q 2 P 1 )⊕Sp(P 2 ) (2.50) Proof [P 1 ,Q 1 P 2 ] and [Q 2 P 1 ,P 2 ] can be expressed as

SinceS and T are nonsingular, we have rank(P 1 ,P 2 ) = rank(P 1 ,Q 1 P 2 ) = rank(Q 2 P 1 ,P 1 ), which implies

Furthermore, letP 1 x+Q 1 P 2 y=0 Premultiplying both sides by P 1 , we obtainP 1 x=0 (sinceP 1 Q 1 =O), which implies Q 1 P 2 y=0 Hence, Sp(P 1 ) and Sp(Q 1 P 2 ) give a direct-sum decomposition ofV 1 +V 2 , and so do Sp(Q 2 P 1 ) and Sp(P 2 ) Q.E.D.

The following theorem follows from Lemma 2.4.

Let Q j = I n −P j (j = 1,2), where P j is the orthogonal projector onto

V j , and let P ∗ , P ∗ 1 , P ∗ 2 ,P 1[2] , andP 2[1] denote the projectors onto V 1 +V 2 alongW, onto V 1 alongV 2[1] ⊕W, ontoV 2 alongV 1[2] ⊕W, ontoV 1[2] along

V 2 ⊕W, and onto V 2[1] alongV 1 ⊕W, respectively Then,

Note When W = (V 1 + V 2 ) ⊥ , P ∗ j is the orthogonal projector onto V j , while P ∗ j[i] is the orthogonal projector onto V j [i].

Corollary Let P denote the orthogonal projector onto V = V 1 ⊕V 2 , and let P j (j = 1,2) be the orthogonal projectors onto V j If V i and V j are orthogonal, the following equation holds:

Norm of Projection Vectors

We now present theorems concerning the norm of the projection vectorP x

(x∈E n ) obtained by projectingx onto Sp(P) along Ker(P) by P.

(The proof is trivial and hence omitted.)

Theorem 2.22 Let P denote a projection matrix (i.e., P 2 = P) The necessary and sufficient condition to have

||P x|| ≤ ||x|| (2.56) for an arbitrary vector x is

Proof (Sufficiency) Let x be decomposed as x = P x+ (I n −P)x We have (P x) 0 (I n −P)x=x 0 (P 0 −P 0 P)x= 0 becauseP 0 =P ⇒P 0 P =P 0 from Lemma 2.5 Hence,

(Necessity) By assumption, we have x 0 (I n −P 0 P)x≥0, which implies

I n −P 0 P isnndwith all nonnegative eigenvalues Letλ 1 , λ 2 ,ã ã ã, λ n denote the eigenvalues of P 0 P Then, 1−λ j ≥ 0 or 0 ≥ λ j ≥ 1 (j = 1,ã ã ã, n).

Hence, P n j=1 λ 2 j ≤ P n j=1 λ j , which implies tr(P 0 P) 2 ≤tr(P 0 P).

On the other hand, we have

(tr(P 0 P)) 2 = (tr(P P 0 P)) 2 ≤tr(P 0 P)tr(P 0 P) 2 from the generalized Schwarz inequality (set A 0 = P and B = P 0 P in (1.19)) and P 2 =P Hence, tr(P 0 P) ≤tr(P 0 P) 2 ⇒tr(P 0 P) = tr(P 0 P) 2 , from which it follows that tr{(P −P 0 P) 0 (P −P 0 P)} = tr{P 0 P −P 0 P −

Corollary LetM be a symmetricpdmatrix, and define the (squared) norm of x by

The necessary and sufficient condition for a projection matrixP (satisfying

||P x|| 2 M ≤ ||x|| 2 M (2.59) for an arbitraryn-component vector x is given by

Proof Let M = U∆ 2 U 0 be the spectral decomposition of M, and let

M 1/2 = ∆U 0 Then, M −1/2 = U∆ −1 Define y = M 1/2 x, and let ˜P M 1/2 P M −1/2 Then, ˜P 2 = ˜P, and (2.58) can be rewritten as ||P y||˜ 2 ≤

||y|| 2 By Theorem 2.22, the necessary and sufficient condition for (2.59) to hold is given by

Note The theorem above implies that with an oblique projector P (P 2 = P , but

P 0 6= P ) it is possible to have ||P x|| ≥ ||x|| For example, let

Theorem 2.23 Let P 1 and P 2 denote the orthogonal projectors onto V 1 andV 2 , respectively Then, for an arbitrary x∈E n , the following relations hold:

Proof (2.62): Replace xby P 1 xin Theorem 2.22.

(2.63): By Theorem 2.11, we have P 1 P 2 =P 2 , from which (2.63) fol- lows immediately.

Let x 1 ,x 2 ,ã ã ã,x p represent p n-component vectors in E n , and define

X = [x 1 ,x 2 ,ã ã ã,x p ] From (1.15) and P = P 0 P, the following identity holds:

The above identity and Theorem 2.23 lead to the following corollary.

(ii)Let P denote an orthogonal projector onto an arbitrary subspace in E n

Proof (i): Obvious from Theorem 2.23 (ii): We have tr(P j P) = tr(P j P 2 )

= tr(P P j P) (j= 1,2), and (P 1 −P 2 ) 2 =P 1 −P 2 , so that tr(P P 1 P)−tr(P P 2 P) = tr(SS 0 )≥0, whereS = (P 1 −P 2 )P It follows that tr(P 1 P)≥tr(P 2 P).

We next present a theorem on the trace of two orthogonal projectors.

Theorem 2.24 Let P 1 and P 2 be orthogonal projectors of order n Then the following relations hold: tr(P 1 P 2 ) = tr(P 2 P 1 )≤min(tr(P 1 ),tr(P 2 )) (2.65)

Proof We have tr(P 1 )−tr(P 1 P 2 ) = tr(P 1 (I n −P 2 )) = tr(P 1 Q 2 ) tr(P 1 Q 2 P 1 ) = tr(S 0 S) ≥ 0, where S = Q 2 P 1 , establishing tr(P 1 ) ≥ tr(P 1 P 2 ) Similarly, (2.65) follows from tr(P 2 )≥tr(P 1 P 2 ) = tr(P 2 P 1 ).

Note From (1.19), we obtain tr(P 1 P 2 ) ≤ p tr(P 1 )tr(P 2 ) (2.66)

However, (2.65) is more general than (2.66) because p tr(P 1 )tr(P 2 ) ≥ min(tr(P 1 ), tr(P 2 )).

Matrix Norm and Projection Matrices

LetA= [a ij ] be annbypmatrix We define its Euclidean norm (also called the Frobenius norm) by

X p j=1 a 2 ij (2.67) Then the following four relations hold.

Let both A andB be n by p matrices Then,

Let U andV be orthogonal matrices of orders nand p, respectively Then

Proof Relations (2.68) and (2.69) are trivial Relation (2.70) follows im- mediately from (1.20) Relation (2.71) is obvious from tr(V 0 A 0 U 0 U AV) = tr(A 0 AV V 0 ) = tr(A 0 A).

Note Let M be a symmetric nnd matrix of order n Then the norm defined in

The norm of A with respect to M, often referred to as a metric matrix, exhibits properties similar to those outlined in Lemma 2.6 Additionally, there exist various alternative definitions for the norm of A.

(ii) ||A|| 2 = à 1 (A), where à 1 (A) is the largest singular value of A (see Chapter 5), and

All of these norms satisfy (2.68), (2.69), and (2.70) (However, only ||A|| 2 satisfies(2.71).)

Lemma 2.7 Let P and P˜ denote orthogonal projectors of ordersn and p, respectively Then,

(the equality holds if and only if P A=A) and

(the equality holds if and only if AP˜ =A).

Proof (2.73): Square both sides and subtract the right-hand side from the left Then, tr(A 0 A)−tr(A 0 P A) = tr{A 0 (I n −P)A}

= tr(A 0 QA) = tr(QA) 0 (QA)≥0 (where Q=I n −P).

The two lemmas above lead to the following theorem.

Theorem 2.25 Let A be an nby pmatrix, B and Y nby r matrices, and

||A−BX|| ≥ ||(I n −P B )A||, (2.75) where P B is the orthogonal projector ontoSp(B) The equality holds if and only ifBX=P B A We also have

||A−Y C|| ≥ ||A(I p −P C 0 )||, (2.76) where P C 0 is the orthogonal projector onto Sp(C 0 ) The equality holds if and only if Y C=AP C 0 We also have

The equality holds if and only if

The equality holds when QA=O⇐⇒P A=A.

(2.74): This can be proven similarly by noting that||AP˜|| 2 = tr( ˜P A 0 AP˜)

= tr(AP A˜ 0 ) = ||P A˜ 0 || 2 The equality holds when ˜QA 0 = O ⇐⇒ P A˜ 0 A 0 ⇐⇒AP˜ =A, where ˜Q=I n −P˜ Q.E.D. or

Proof (2.75): We have (I n −P B )(A −BX) = A −BX −P B A +

BX = (I n −P B )A Since I n −P B is an orthogonal projector, we have

||A−BX|| ≥ ||(I n −P B )(A−BX)||=||(I n −P B )A||by (2.73) in Lemma 2.7 The equality holds when (I n −P B )(A−BX) = A−BX, namely

(2.76): It suffices to use (A−Y C)(I p −P C 0 ) =A(I p −P C 0 ) and (2.74) in Lemma 2.7 The equality holds when (A−Y C)(I p −P C 0 ) =A−Y C holds, which impliesY C =AP C 0

P C 0 )|| The first equality condition (2.78) follows from the first relation above, and the second equality condition (2.79) follows from the second rela- tion above Q.E.D.

Note Relations (2.75), (2.76), and (2.77) can also be shown by the least squares method Here we show this only for (2.77) We have

The objective is to minimize the expression tr(A − Y C)₀(A − Y C) − 2tr(BX)₀(A − Y C) + tr(BX)₀(BX) By differentiating this criterion with respect to X and setting the derivative to zero, we derive the equation B₀(A − Y C) = B₀BX Upon premultiplying by B(B₀B)⁻¹, we find that P_B(A − Y C) = BX Additionally, we can expand the original criterion to yield tr(A − BX)₀(A − BX) − 2tr(Y C(A − BX)₀) + tr(Y C)(Y C)₀.

By differentiating the criterion concerning Y and equating the result to zero, we derive the equation C(A − BX) = CC'Y' or (A − BX)C' = YCC' Upon postmultiplying the latter by (CC')^(-1)C', we arrive at the expression (A − BX)PC' = YC.

P B (A − Y C ) = BX, we obtain P B A(I p − P C 0 ) = BX(I p − P C 0 ) after some simplification If, on the other hand, BX = P B (A − Y C) is substituted into

(A − BX)P C 0 = Y C , we obtain (I n − P B )AP C 0 = (I n − P B )Y C (In the derivation above, the regular inverses can be replaced by the respective generalized inverses See the next chapter.)

General Form of Projection Matrices

In this section, we explore a generalized form of projection matrices that extend beyond the traditional definition of idempotent square matrices, specifically those that meet the criterion P² = P This approach is informed by the foundational work of Rao (1974) and Rao and Yanai (1979), offering a broader perspective on projection matrices in mathematical applications.

In the context of vector spaces, let V be a subset of E^n, distinct from E^n itself, and decompose it into a direct sum of m subspaces, expressed as V = V₁ ⊕ V₂ ⊕ ⊕ Vₘ A square matrix Pₗ of order n, which transforms any vector y in V to the subspace Vₗ, is referred to as the projection matrix onto Vₗ This projection occurs along the complementary subspace V(j), defined as the direct sum of all subspaces excluding Vₗ, specifically V₁ ⊕ ⊕ Vₗ₋₁ ⊕ Vₗ₊₁ ⊕ ⊕ Vₘ.

Letx j ∈V j Then any x∈V can be expressed as x=x 1 +x 2 +ã ã ã+x m = (P ∗ 1 +P ∗ 2 +ã ã ãP ∗ m )x.

Premultiplying the equation above byP ∗ j , we obtain

P ∗ i P ∗ j x=0 (i6=j) and (P ∗ j ) 2 x=P ∗ j x (i= 1,ã ã ã, m) (2.82) since Sp(P 1 ),Sp(P 2 ),ã ã ã,Sp(P m ) are mutually disjoint However, V does not cover the entireE n (x∈V 6=E n ), so (2.82) does not imply (P ∗ j ) 2 =P ∗ j orP ∗ i P ∗ j =O (i6=j).

Let V 1 and V 2 ∈E 3 denote the subspaces spanned bye 1 = (0,0,1) 0 and e 2 = (0,1,0) 0 , respectively Suppose

The operator P ∗ acts as a projector onto the subspace V 1 while nullifying the subspace V 2, as defined in Definition 2.3 It is important to note that (P ∗ ) 2 does not equal P ∗ unless specific conditions are met, such as when both a and b are zero or when a equals 1 and c equals 0 This indicates that when the subspace V does not span the entire space E n, the projector P ∗ j, according to Definition 2.3, fails to be idempotent However, by identifying a complementary subspace of V, it is possible to derive an idempotent matrix from P ∗ j.

Theorem 2.26 Let P ∗ j (j= 1,ã ã ã, m) denote the projector in the sense of Definition 2.3, and let P denote the projector onto V along V m+1 , where

V =V 1 ⊕V 2 ⊕ ã ã ã ⊕V m is a subspace in E n and whereV m+1 is a complement subspace toV Then,

P j =P ∗ j P (j= 1,ã ã ãm) and P m+1 =I n −P (2.83) are projectors (in the sense of Definition 2.1) onto V j (j = 1,ã ã ã, m+ 1) alongV (j) ∗ =V 1 ⊕ ã ã ã ⊕V j−1 ⊕V j+1 ⊕ ã ã ã ⊕V m ⊕V m+1

Proof Let x∈ V If x∈ V j (j = 1,ã ã ã, m), we have P ∗ j P x=P ∗ j x=x.

On the other hand, ifx∈V i (i6=j,i= 1,ã ã ã, m), we haveP ∗ j P x=P ∗ j x 0 Furthermore, if x∈ V m+1 , we have P ∗ j P x =0 (j = 1,ã ã ã, m) On the other hand, if x∈ V, we have P m+1 x = (I n −P)x =x−x = 0, and if x∈V m+1 ,P m+1 x= (I n −P)x=x−0=x Hence, by Theorem 2.2, P j

(j= 1,ã ã ã, m+ 1) is the projector ontoV j along V (j) Q.E.D.

Exercises for Chapter 2

2 Let P A and P B denote the orthogonal projectors onto Sp(A) and Sp(B), re- spectively Show that the necessary and sufficient condition for Sp(A) = {Sp(A) ∩

3 Let P be a square matrix of order n such that P 2 = P , and suppose

||P x|| = ||x|| for any n-component vector x Show the following:

4 Let Sp(A) = Sp(A 1 ) ⊕ ã ã ã ã ⊕ ã Sp(A m ), and let P j (j = 1, ã ã ã , m) denote the projector onto Sp(A j ) For ∀x ∈ E n :

(Also, show that the equality holds if and only if Sp(A) = E n )

(ii) Show that Sp(A i ) and Sp(A j ) (i 6= j) are orthogonal if Sp(A) = Sp(A 1 ) ⊕

Sp(A 2 ) ⊕ ã ã ã ⊕ Sp(A m ) and the inequality in (i) above holds.

5 Let E n = V 1 ⊕ W 1 = V 2 ⊕ W 2 = V 3 ⊕ W 3 , and let P j denote the projector onto

(i) Let P i P j = O for i 6= j Then, P 1 +P 2 + P 3 is the projector onto V 1 + V 2 + V 3 along W 1 ∩ W 2 ∩ W 3

(ii) Let P 1 P 2 = P 2 P 1 , P 1 P 3 = P 3 P 1 , and P 2 P 3 = P 3 P 2 Then P 1 P 2 P 3 is the projector onto V 1 ∩ V 2 ∩ V 3 along W 1 + W 2 + W 3

(iii) Suppose that the three identities in (ii) hold, and let P 1+2+3 denote the pro- jection matrix onto V 1 + V 2 + V 3 along W 1 ∩ W 2 ∩ W 3 Show that

Q [A,B] = Q A Q Q A B , where Q [A,B] , Q A , and Q Q A B are the orthogonal projectors onto the null space of [A, B], onto the null space of A, and onto the null space of Q A B, respectively.

P X = P XA + P X(X 0 X) −1 B , where P X , P XA , and P X(X 0 X) −1 B are the orthogonal projectors onto Sp(X), Sp(XA), and Sp(X(X 0 X ) −1 B), respectively, and A and B are such that Ker(A 0 )

(b) Use the decomposition above to show that

P [X 1 ,X 2 ] = P X 1 + P Q X 1 X 2 , where X = [X 1 , X 2 ], P Q X 1 X 2 is the orthogonal projector onto Sp(Q X 1 X 2 ), and

8 Let E n = V 1 ⊕ W 1 = V 2 ⊕ W 2 , and let P 1 = P V 1 ãW 1 and P 2 = P V 2 ãW 2 be two projectors (not necessarily orthogonal) of the same size Show the following:

(a) The necessary and sufficient condition for P 1 P 2 to be a projector is V 12 ⊂

V 2 ⊕ (W 1 ∩ W 2 ), where V 12 = Sp(P 1 P 2 ) (Brown and Page, 1970).

(b) The condition in (a) is equivalent to V 2 ⊂ V 1 ⊕ (W 1 ∩ V 2 ) ⊕(W 1 ∩ W 2 ) (Werner, 1992).

9 Let A and B be n by a (n ≥ a) and n by b (n ≥ b) matrices, respectively Let

P A and P B be the orthogonal projectors defined by A and B, and let Q A and

Q B be their orthogonal complements Show that the following six statements are equivalent: (1) P A P B = P B P A , (2) A 0 B = A 0 P B P A B, (3) (P A P B ) 2 = P A P B ,

Definition through Linear Transformations

For a nonsingular square matrix \( A \) of order \( n \), the kernel of \( A \) is given by \( \text{Ker}(A) = \{0\} \) Consequently, the solution vector \( x \) in the equation \( y = Ax \) is uniquely determined by \( x = A^{-1}y \), where \( A^{-1} \) is the inverse matrix of \( A \) This inverse matrix defines a transformation from \( y \in E^n \) to \( x \in E^m \), while the matrix \( A \) itself represents the transformation from \( x \) to \( y \).

Ax=yhas a solution if and only ify∈Sp(A) Even then, if Ker(A)6={0}, there are many solutions to the equationAx=ydue to the existence ofx 0

(6=0) such thatAx 0 =0, so that A(x+x 0 ) =y Ify6∈Sp(A), there is no solution vector to the equationAx=y.

Assuming that \( y \) belongs to the span of matrix \( A \), there exists a linear transformation \( G \) such that \( x = Gy \) is a solution to the equation \( Ax = y \) The existence of this transformation is confirmed by the rank of \( A \), which equals the dimension of its span, denoted as \( r \) The basis vectors for the span, represented as \( y_1, y_2, \ldots, y_r \), correspond to specific vectors \( x_i \) such that \( Ax_i = y_i \) for \( i = 1, \ldots, r \) An arbitrary vector \( y \) in the span can be expressed as a linear combination \( y = c_1 y_1 + \ldots + c_r y_r \), leading to the transformation of \( y \) into \( x = c_1 x_1 + \ldots + c_r x_r \), thereby confirming that this is a valid linear transformation.

Definition 3.1 Let A be an n by m matrix, and assume that y ∈Sp(A).

If a solution to the linear equation Ax= y can be expressed as x=A − y, anmbynmatrixA − is called a generalized inverse (g-inverse) matrix ofA. © Springer Science+Business Media, LLC 2011

Statistics for Social and Behavioral Sciences, DOI 10.1007/978-1-4419-9887-3_3, 55

H Yanai et al., Projection Matrices, Generalized Inverse Matrices, and Singular Value Decomposition,

Theorem 3.1 The necessary and sufficient condition for anm by nmatrix

A − to be a generalized inverse matrix of A is given by

Proof (Necessity) Let x = A − y denote a solution to Ax = y Since y ∈ Sp(A) can be expressed as y = Aα for some α, Ax = AA − y AA − Aα=Aα=y, which impliesAA − A=A.

(Sufficiency) AA − A = A ⇒ AA − Aα= Aα Define y = Aα Then,

AA − y=y, from which a solution vectorx=A − yis obtained Q.E.D.

The concept of a generalized inverse matrix was first introduced by Rao in 1962, with property (3.1) being a fundamental characteristic that defines this mathematical concept Notably, when a matrix A is square and nonsingular, its regular inverse A-1 satisfies property (3.1), indicating that the regular inverse is a special case of a generalized inverse Furthermore, the definition of a generalized inverse matrix allows for its application to non-square matrices, expanding its utility beyond traditional square matrices.

Let \( a \) represent any real number The value of \( b \) that meets the condition \( ab = a \) is defined as \( b = a^{-1} \) when \( a \neq 0 \) and \( b = k \) when \( a = 0 \), where \( k \) can be any real number This relationship is a specific instance of a broader equation and can be referred to as a generalized reciprocal.

A generalized inverse matrix, as defined in (3.1), establishes a linear transformation from y to x in E m when y belongs to the span of A (Sp(A)) Additionally, even if y does not belong to Sp(A), the generalized inverse A − can still be utilized to define a transformation from y to x.

Let V = Sp(A), and let ˜W = Ker(A) denote the null space of A.

Furthermore, let W and ˜V denote complement subspaces ofV and ˜W, re- spectively Then,

In the context of vector spaces, consider an arbitrary vector y in E n, which can be expressed as the sum of two components, y = y 1 + y 2, where y 1 belongs to the subspace V and y 2 belongs to the subspace W Similarly, let x be a vector in E m, decomposed as x = x 1 + x 2, with x 1 in the subspace V˜ and x 2 in W˜ The process of transforming y into x, by mapping y 1 to x 1 and y 2 to x 2, constitutes a linear transformation from the vector space E n to E m.

The transformation from vector space V to ˜V is one-to-one, ensuring a unique inverse transformation from ˜V back to V This inverse transformation is represented as x 1 = φ − V (y 1 ) Additionally, we select a linear transformation from ˜W to W such that x 2 = φ − M (y 2 ) Consequently, we define a transformation φ − that maps an element y ∈ E n to an element x ∈ E m, expressed as x = φ − V (y 1 ) + φ − M (y 2 ) = φ − (y).

We define the matrix representation of this linear transformationφ − , namely

A − , as a generalized inverse matrix of A As is clear from this definition, there is some arbitrariness in the choice of A − due to the arbitrariness in the choice ofW, ˜V, and Φ − M (SeeFigure 3.1.)

Figure 3.1: Geometric representation of a generalized inverse A − V and

W˜ are determined uniquely by A, but W and ˜V are arbitrary except that they satisfy (3.2) There is also some arbitrariness in the choice ofφ − M Lemma 3.1 If y 1 ∈V andy 2 ∈W, φ − V (y 1 ) =A − y 1 and Φ − M (y 2 ) =A − y 2 (3.4) Proof Substitutey 1 =y 1 +0andy 2 =0+y 2 in (3.3) Q.E.D.

Conversely, the following statement holds.

Theorem 3.2 Let A − be a generalized inverse of A, and let V = Sp(A) and W˜ = Ker(A) Then there exist decompositions E n = V ⊕W and

E m = ˜V ⊕W˜ Let y ∈ E n be decomposed as y =y 1 +y 2 , where y 1 ∈ V and y 2 ∈W Furthermore, let x=A − y=x 1 +x 2 (wherex 1 ∈V ,˜ x 2 ∈W˜) (3.5)

Proof Let E n = V ⊕W be an arbitrary direct-sum decomposition of

E n Let Sp(A − V ) = ˜V denote the image of V by φ − V Since y ∈ Sp(A) for any y such that y = y 1 +y 2 , where y 1 ∈ V and y 2 ∈ W, we have

A − y 1 = x 1 ∈ V˜ and Ax 1 = y 1 Also, if y 1 6= 0, then y 1 = Ax 1 6= 0, and so x 1 6∈ W˜ or ˜V ∩W˜ = {0} Furthermore, if x 1 and ˜x 1 ∈ V but x 1 6= ˜x 1 , then x 1 −x˜ 1 6∈ W˜, so that A(x 1 −x˜ 1 ) 6= 0, which implies

Ax 1 6=Ax˜ 1 Hence, the correspondence betweenV and ˜V is one-to-one Be- cause dim( ˜V) = dim(V) implies dim( ˜W) =m−rank(A), we obtain ˜V⊕W˜ E m Q.E.D.

Theorem 3.3 Let E n =V ⊕W and E m = ˜V ⊕W˜, where V = Sp(A)and

W = Ker(A) Let an arbitrary vectory∈E n be decomposed as y=y 1 +y 2 , where y 1 ∈V and y 2 ∈W Suppose

A − y=A − y 1 +A − y 2 =x 1 +x 2 (3.7) holds, where x 1 ∈V˜ and x 2 ∈W˜ Then the following three statements are equivalent:

(ii)AA − is the projector onto V alongW.

(iii) A − A is the projector ontoV˜ along W˜.

Proof (i) →(ii): SinceA − is a generalized inverse ofA, we haveA − y 1 x 1 andA − y 2 =x 2 by Theorem 3.2 Premultiplying (3.7) by Aand taking (3.6) into account, we obtain

AA − y 1 =Ax 1 =y 1 andAA − y 2 =Ax 2 =0, establishing (ii) by Theorem 2.2.

(ii)→(iii): Ax 1 =y 1 ⇒A − Ax 1 =A − y 1 =x 1 On the other hand, for x 2 ∈ W˜, we have Ax 2 =0 ⇒ A − Ax 2 = 0, establishing (iii) by Theorem2.2.

(iii) → (i): Decomposey=y 1 +y 2 , where y 1 ∈V andy 2 ∈W Then,

A − y=A − y 1 +A − y 2 =x 1 +x 2 , where x 1 ∈ V˜ and x 2 ∈ W˜ Hence, A − Ax 1 = x 1 ⇒ A − y 1 = x 1 and soA − y 2 =x 2 From the properties of a generalized inverse matrix shown in Lemma 3.1 and Theorem 3.2, it is clear thatA − is a generalized inverse of

General Properties

Properties of generalized inverse matrices

Theorem 3.4 Let H = AA − , and let F = A − A Then the following relations hold:

H 2 =HandF 2 =F, (3.8) rank(H) = rank(F) = rank(A), (3.9) rank(A − )≥rank(A), (3.10) rank(A − AA − ) = rank(A) (3.11)

Proof (3.8): Clear from the definition of generalized inverses.

(3.9): rank(A)≥rank(AA − ) = rank(H), and rank(A) = rank(AA − A)

= rank(HA) ≤rank(H), from which it follows that rank(A) = rank(H). rank(F) = rank(A) can be similarly proven.

(3.10): rank(A) = rank(AA − A)≤rank(AA − )≤rank(A − ).

(3.11): rank(A − AA) ≤ rank(A − A) We also have rank(A − AA − ) ≥ rank(A − AA − A) = rank(A − A), so that rank(A − AA − ) = rank(A − A) rank(A) Q.E.D.

Example 3.1 Find a generalized inverse ofA "

The generalized inverse of a matrix A, denoted as A −, is typically not uniquely defined When multiple generalized inverses exist, they are collectively represented as {A − } For instance, consider the matrix A = ã 1 1.

We next derive several basic theorems regardingA −

Theorem 3.5 The following relations hold for a generalized inverseA − of A:

Proof (3.12): AA − A = A ⇒ A 0 (A − ) 0 A 0 = A 0 Hence, {(A − ) 0 } ⊂ {(A 0 ) − } On the other hand, A 0 (A 0 ) − A 0 =A 0 , and so ((A 0 ) − ) 0 ∈ {A − } ⇒ {(A 0 ) − } ⊂ {(A − ) 0 } Thus, {(A − ) 0 }={(A 0 ) − }.

(3.14): Let G denote a generalized inverse of A 0 A Then G 0 is also a generalized inverse of A 0 A, andS = (G+G 0 )/2 is a symmetric generalized inverse of A 0 A Let H = ASA 0 −A(A 0 A) − A 0 Then, using (3.13), we obtain

ThenP A and P A 0 are the orthogonal projectors ontoSp(A) andSp(A 0 ).

If Sp(A) equals Sp(˜A), then the projection operator P A is equal to P A ˜ This indicates that P A relies solely on the span of A, denoted as Sp(A), rather than the specific basis vectors that define it Therefore, a more precise notation would be P Sp(A), but for the sake of simplicity, we continue to use P A.

3.2.2 Representation of subspaces by generalized inverses

We start with the following lemma.

Lemma 3.2 Let A − denote an arbitrary generalized inverse of A Then,

Proof That Sp(A)⊃Sp(AA − ) is clear On the other hand, from rank(A ×A − )≥rank(AA − A) = rank(A), we have Sp(AA − )⊃Sp(A)⇒Sp(A) Sp(AA − ) Q.E.D.

Theorem 3.6 Using a generalized inverse A of A − , we can express any complement subspaceW of V = Sp(A)as

Proof (Sufficiency) LetAA − x+ (I n −AA − )y=0 Premultiplying both sides by AA − , we obtain AA − x = 0, which implies (I n −AA − )y = 0.

On the other hand, let P = AA − Then, P 2 = P, and so rank(AA − ) + rank(I n −AA − ) =n Hence,E n =V ⊕W.

(Necessity) Let P =AA − Then P 2 = P From Lemma 2.1, the null (annihilation) space ofP is given by Sp(I n −P) Hence, Sp(P)∩Sp(I n −

P) ={0}, and Sp(I n −P) gives a general expression for a complement sub- space of Sp(P), establishing (3.18) Q.E.D.

Like Lemma 2.1, the following theorem is extremely useful in under- standing generalized inverses in relation to linear transformations.

Ker(A) = Sp(I n −A − A), (3.20) and a complement space of W˜ = Ker(A) is given by

V˜ = Sp(A − A), (3.21) where A − is a generalized inverse ofA.

Proof (3.19): Ax = 0 ⇒ A − Ax = 0 ⇒ Ker(A) ⊂ Ker(A − A) On the other hand, A − Ax = 0 ⇒ AA − Ax = Ax= 0 ⇒ Ker(A − A) ⊂Ker(A). Hence, Ker(A) = Ker(AA − ).

(3.21): Note that (I m −A − A) 2 =I m −A − A From Theorem 3.6, we obtain{Ker(A)} c = Sp(I m −(I m −A − A)) = Sp(A − A) Q.E.D.

From Theorem 3.6 and Lemma 3.3, we obtain the following theorem.

Theorem 3.7 Let A be an m by n matrix Then,

Sp(AA − )⊕Sp(I n −AA − ) =E n , (3.22) Sp(A − A)⊕Sp(I m −A − A) =E m (3.23)

Proof Clear from Ker(AA − ) = Sp(I n −AA − ) and Ker(A − A) = Sp(I m −

Note Equation (3.22) corresponds to E n = V ⊕ W , and (3.23) corresponds to

E m = ˜ V ⊕ W ˜ A complement subspace W = Sp(I n − AA − ) of V = Sp(A) = Sp(AA − ) in E n is not uniquely determined However, the null space (kernel) of A,

W ˜ = Sp(I m −A − A) = Sp(I m − A 0 (AA 0 ) − A) = Sp(A 0 ) ⊥ , is uniquely determined, although a complement subspace of ˜ W , namely ˜ V = Sp(A − A), is not uniquely determined (See Example 3.2.)

Note Equation (3.23) means rank(A − A) + rank(I m − A − A) = m and that (1.55) in Theorem 1.9 holds from rank(A − A) = rank(A) and rank(I m −

Find (i) W = Sp(I 2 −AA − ), (ii) ˜W Sp(I 2 −A − A), and (iii) ˜V = Sp(A − A).

Letx= 1−(a+c) Then W = Sp(I 2 −AA − ) is the unidimensional space spanned by vector (x, x−1) 0 (Sincex can take any value, Sp(I 2 −AA − ) is not uniquely determined.)

Hence, ˜W = Sp(I 2 −A − A) is the unidimensional space Y =−X spanned by (1,−1) 0 , which is uniquely determined.

Let a+b =x Then Sp(A − A) is generated by the two-component vector

(x,1−x) 0 Since x can take any value, it is not uniquely determined.

Note Sp(A) is spanned by (1, 1) 0 , and so it is represented by a line Y = X pass- ing through the origin Its complement subspace is a line connecting (0,0) and an arbitrary point P 1 (X, X − 1) on Y = X − 1 (Figure 3.2).

Note Since ˜ W = Ker(A) = {(1, −1) 0 }, ˜ V = Sp(A − A) is represented by a line connecting the origin and an arbitrary point P 2 (X, 1 − X ) on the line Y = 1 − X

Generalized inverses and linear equations

We use theorems in the previous section to represent solutions to linear equations in terms of generalized inverse matrices.

Theorem 3.8 Let Ax=b, and supposeb∈Sp(A) Then, x=A − b+ (I m −A − A)z, (3.24) where z is an arbitrarym-component vector.

Proof Letb∈Sp(A) andx 1 =A − b Then,Ax 1 =b On the other hand, from (3.20), a solution toAx 0 =0is given byx 0 = (I m −A − A)z Equation

(3.24) is obtained byx=x 1 +x 0 Conversely, it is clear that (3.24) satisfies

Corollary The necessary and sufficient condition for Ax = b to have a solution is

Proof The sufficiency is obvious The necessity can be shown by substitut- ingAx=bintoAA − Ax=Ax Q.E.D.

The corollary above can be generalized as follows.

Theorem 3.9 The necessary and sufficient condition for

AXB =C (3.26) to have a solution is

Proof The sufficiency can be shown by setting X = A − CB − in (3.26). The necessity can be shown by pre- and postmultiplying AXB = C by

AA − and B − B, respectively, to obtain AA − AXBB − B = AXB AA − CB − B Q.E.D.

Let A, B, and C represent matrices of dimensions n by p, q by r, and n by r, respectively, while X denotes a p by q matrix When the dimensions q and r are equal, and B is the identity matrix of order q, the condition required for the equation AX = C to possess a general solution is established.

The equation X = A − C + (I p − A − A)Z, with Z as an arbitrary square matrix of order p, leads to the condition AA − C = C When n equals p and A is the identity matrix of order p, a general solution for the equation XB = C is expressed as X = CB − +Z(I q −BB − ) In this case, the necessary and sufficient condition is CB − B = C.

X=A − CB − (3.28) is a solution to (3.26) In addition,

X=A − CB − + (I p −A − A)Z 1 +Z 2 (I q −BB − ) (3.29) also satisfies (3.26), which can be derived from

X 1 = (I p −A − A)Z 1 andX 2 =Z 2 (I q −BB − ), which satisfy AX 1 =O andX 2 B =O, respectively Furthermore,

X=A − CB − +Z−A − AZBB − (3.30) also satisfies (3.26), which is obtained by settingZ 2 =A − AZ and Z 1 =Z in (3.29) If we use A for bothB and C in the equation above, we obtain the following theorem.

Theorem 3.10 Let A − denote a generalized inverse of an n by m matrix

G=A − +Z 1 (I n −AA − ) + (I m −A − A)Z 2 (3.32) are both generalized inverses ofA, whereZ,Z 1 , and Z 2 are arbitrarym by nmatrices (Proof omitted.)

We give a fundamental theorem on inclusion relations between subspaces and the corresponding generalized inverses.

Proof (3.33): Sp(A) ⊃ Sp(B) implies that there exists a W such that

B =AW Hence,AA − B=AA − AW =AW =B.

(3.34): We have A 0 (A 0 ) − B 0 =B 0 from (3.33) Transposing both sides, we obtain B = B((A 0 ) − ) 0 A, from which we obtain B = BA − A because (A 0 ) − = (A − ) 0 from (3.12) in Theorem 3.5 Q.E.D.

Theorem 3.12 WhenSp(A+B)⊃Sp(B) andSp(A 0 +B 0 )⊃Sp(B 0 ),the following statements hold (Rao and Mitra, 1971):

Sp(A)∩Sp(B) = Sp(A(A+B) − B) (3.37) Proof (3.35): This is clear from the following relation:

(3.37): That Sp(A(A+B) − B)⊂Sp(A)∩Sp(B) is clear from A(A+

B) − B=B(A+B) − A On the other hand, let Sp(A)∩Sp(B) = Sp(AX|

AX=BY), whereX = (A+B) − BandY = (A+B) − A Then, Sp(A)∩

Sp(B)⊂Sp(AX)∩Sp(BY) = Sp(A(A+B) − B) Q.E.D.

Statement (3.35) is called the parallel sum of matricesA and B.

Generalized inverses of partitioned

In Section 1.4, we showed that the (regular) inverse of a symmetric nonsin- gular matrix

(3.38) is given by (1.71) or (1.72) In this section, we consider a generalized inverse ofM that may be singular.

Lemma 3.4 Let A be symmetric and such thatSp(A)⊃Sp(B) Then the following propositions hold:

Proof Propositions (3.39) and (3.40) are clear from Theorem 3.11.

(3.41): LetB=AW Then,B 0 A − B =W 0 A 0 A − AW =W 0 AA − AW

(3.42) represent a symmetric generalized inverse ofM defined in (3.38) If Sp(A)

Z =D − = (C−B 0 A − B) − , (3.45) and ifSp(C)⊃Sp(B 0 ), they satisfy

(AX+BY 0 )E=O, where E=A−BC − B 0 , (3.47) and

X=E − = (A−BC − B 0 ) − (3.48) Proof FromM HM =M, we obtain

(AX+BY 0 )B+ (AY +BZ)C =B, (3.50) and

Postmultiply (3.49) byA − B and subtract it from (3.50) Then, by noting

AA − B = B, we obtain (AY +BZ)(C −B 0 A − B) = O, which implies

Premultiply (3.50) byB 0 A − and subtract it from (3.51) We obtain

D(Y 0 B+ZC−I p ) =O (3.53) by noting B 0 A − A = B 0 , D 0 = D ((3.40) in Lemma 3.4), C = C 0 , and

Z = Z 0 Hence, we have D = CZD+B 0 Y D = CZ 0 D +B 0 A − AY D.

We substitute (3.52) into this to obtainD=CZD−B 0 A − BZD= (C−

B 0 A − B)ZD = DZD, implying Z = D − , from which (3.43), (3.44), and (3.45) follow Equations (3.46), (3.47), and (3.48) follow similarly by deriv- ing equations analogous to (3.52) and (3.53) by (3.50)−BC − ×(3.51), and

Corollary 1 H defined in (3.42) can be expressed as

Proof It is clear from AY D = −BZD and AA − B = B that Y −A − BZ is a solution Hence, AY B 0 = −AA − BZB 0 = −BZB 0 , and

BY 0 A = B(AY) 0 =B(−AA − BZ) 0 = −BZB 0 Substituting these into (3.43) yields

The equation above shows thatX =A − +A − BD − B 0 A − satisfies (3.43), indicating that (3.54) gives an expression forH Equation (3.55) can be de- rived similarly using (3.46) through (3.48) Q.E.D.

When Sp(A)⊃Sp(B), we have

We omit the case in which Sp(A)⊃Sp(B) does not necessarily hold in Theorem 3.13 (This is a little complicated but is left as an exercise for the reader.) Let

# be a generalized inverse ofN defined above Then,

C˜ 3 = −U + (B 0 T − B) − , where T = A+BU B 0 , and U is an arbitrary matrix such that Sp(T) ⊃

Sp(A) and Sp(T)⊃Sp(B) (Rao, 1973).

A Variety of Generalized Inverse Matrices

Reflexive generalized inverse matrices

The method outlined previously identifies AA − as the projector onto V along W (P V ãW), and A − A as the projector onto ˜V along ˜W (P V ˜ ã W ˜) However, the inverse transformation φ − (y) for an arbitrary n-component vector y ∈ E n is not uniquely defined due to the flexibility in selecting φ − M (y 2 ) = A − y 2 ∈ Ker(A), where y 2 is derived from the decomposition y = y 1 + y 2, with y 1 belonging to V and y 2 to W Consequently, an A − that fulfills the condition P V ãW = AA − is established.

P W ãV =A − Ais not uniquely determined We therefore consider the condi- tion under whichA − is uniquely determined for a givenAand that satisfies the conditions above.

For an arbitrary y ∈ E n , we have y 1 = AA − y ∈ Sp(A) and y 2 (I n −AA − )y∈Sp(A) c Let A − be an arbitrary generalized inverse of A.

From Theorem 3.2, we have x = A − Ay=A − y 1 +A − y 2

= (A − A)A − y+ (I n −A − A)A − y=x 1 +x 2 , (3.61) where x 1 ∈ V˜ = Sp(A − A) and x 2 ∈ W˜ = Ker(A) = Sp(I n −AA − ) On the other hand,

Hence,A − transforms y 1 ∈Sp(A) intox 1 ∈Sp(A − A), and y 2 ∈Sp(I n −

AA − ) into x 2 ∈W = Ker(A) However, the latter mapping is not surjec- tive This allows an arbitrary choice of the dimensionalityr in ˜W r ⊂W˜ Ker(A) Let r= 0, namely ˜W r ={0} Then, (I m −A − A)A − =O, and it follows that

Definition 3.2A generalized inverseA − that satisfies both (3.1) and (3.63) is called a reflexive g-inverse matrix ofA and is denoted as A − r

As is clear from the proof of Theorem 3.3,A − that satisfies x 2 =A − y 2 =0 (3.64) is a reflexive g-inverse of A, which transforms y ∈ V = Sp(A) into x 1 ∈

A − r is uniquely determined only if W such that E n =V ⊕W and ˜V such that E m = ˜V ⊕W˜ are simultaneously determined In general, A − r is not uniquely determined because of the arbitrariness in the choice ofW and ˜V.

Note That A − y = 0 ⇔ A − AA − = A − can be shown as follows From Theorem 3.2, we have AA − y 1 = y 1 ⇒ A − AA − y 1 = A − y 1 Furthermore, since AA − y 2 =

0 ⇒ A − AA − y 2 = 0, we have A − AA − (y 1 + y 2 ) = A − y 1 , the left-hand side of which is equal to A − AA − y If we add A − y 2 to the right-hand side, we obtain

A − y 1 + A − y 2 = A − y, and so A − AA − y = A − y Since this has to hold for any y, it must hold that A − AA − = A −

Conversely, from Lemma 3.2, we have y 2 = (I n − AA − )y Thus, A − AA − =

Theorem 3.14 The following relation holds for a generalized inverse A − that satisfies (3.1):

(⇐): Decompose A − as A − = (I m −A − A)A − +A − AA − Since Sp(I m − A − A) ∩ Sp(A − A) = {0}, we have rank(A − ) = rank((I m −

A − A)A − ) + rank(A − AA − ) From rank(A − AA − ) = rank(A), we ob- tain rank((I m −A − A)A − ) = 0 ⇒ (I m −A − A)A − = O, which implies

, whereA 11 is of order r and nonsingular Let the rank ofA ber Then,

# is a reflexive g-inverse of A because rank(G) = rank(A) =r, and

, and soA 11 W =A 12 andA 21 W =A 22 For example,

  is a symmetric matrix of rank 2 One expression of a reflexive g-inverseA − r ofA is given by

Example 3.4 Obtain a reflexive g-inverseA − r of A "

AA − r A=A implies a+b+c+d= 1 An additional condition may be derived from " a b c d

It may also be derived from rank(A) = 1, which implies rank(A − r ) = 1 It follows that det(A − r ) =ad−bc= 0 A set ofa, b, c, anddthat satisfies both a+b+c+d= 1 and ad−bc= 0 defines a 2 by 2 square matrixA − r

Minimum norm generalized inverse matrices

LetE m be decomposed asE m = ˜V ⊕W˜, where ˜W = Ker(A), in Definition 3.3 If we choose ˜V = ˜W ⊥ (that is, when ˜V and ˜W are orthogonal),A − A P V ˜ ã W ˜ becomes an orthogonal projector, and it holds that

Since ˜V = Sp(A − A) and ˜W = Sp(I m −A − A) from Lemmas 3.1 and 3.2,

(3.66) can also be derived from

Let Ax=y be a linear equation, where y ∈Sp(A) Since y 1 =Ax ∈

Sp(A) implies y=y 1 +y 2 =y 1 +0=y 1 , we obtain ˜ x=A − y=A − y 1 =A − Ax (3.68)

Let P = A − A be a projection matrix, and let P ∗ denote an orthogonal projector that satisfiesP 0 =P From Theorem 2.22, we have

The relationship P 0 = P implies that the norm of x reaches its minimum value, ||x|| = ||P* y||, when P 0 equals P This highlights that while there are infinitely many solutions for x in the linear equation Ax = y, where y belongs to the span of A and A is an n by m matrix with rank(A) < m, the solution x = A^−1 y corresponds to the minimum sum of squares of its elements when A satisfies the specified conditions.

Definition 3.3 AnA − that satisfies bothAA − A=Aand(A − A) 0 =A − A is called a minimum norm g-inverse of A, and is denoted by A − m (Rao and Mitra, 1971).

The following theorem holds for a minimum norm g-inverseA − m

Theorem 3.15 The following three conditions are equivalent:

(3.71)⇒(3.72): Postmultiply both sides ofA − m AA 0 =A 0 by (AA 0 ) − A.

(3.72) ⇒ (3.70): UseA 0 (AA 0 ) − AA 0 =A 0 , and (A 0 (AA 0 ) − A) 0 =A 0 (A ×A 0 ) − A obtained by replacingAby A 0 in Theorem 3.5 Q.E.D.

Note Since (A − m A) 0 (I m − A − m A) = O, A − m A = A 0 (AA 0 ) − A is the orthogonal projector onto Sp(A 0 ).

When the conditions of Theorem 3.15 are satisfied, the subspaces ˜V and ˜W are ortho-complementary within the direct-sum decomposition of E m, represented as ˜V ∩ W˜ Consequently, it follows that ˜V is equal to Sp(A − m A) or Sp(A 0 ), as indicated in equation (3.72).

Note By Theorem 3.15, one expression for A − m is given by A 0 (AA 0 ) − In this case, rank(A − m ) = rank(A) holds Let Z be an arbitrary m by n matrix Then the following relation holds:

Let x = A − m b be a solution to Ax = b Then, AA 0 (AA 0 ) − b = AA 0 (AA 0 ) − Ax =

Ax = b Hence, we obtain x = A 0 (AA 0 ) − b (3.74)

In the context of linear algebra, the first term in (3.73) is associated with ˜ V = Sp(A 0 ), while the second term relates to ˜ W = ˜ V ⊥ Generally, it holds that Sp(A − m ) includes Sp(A 0 ), indicating that rank(A − m ) is greater than or equal to rank(A 0 ) A minimum norm g-inverse that achieves equality in rank, specifically rank(A − m ) = rank(A), is referred to as a minimum norm reflexive g-inverse, denoted as such.

Note Equation (3.74) can also be obtained directly by finding x that minimizes x 0 x under Ax = b Let λ = (λ 1 , λ 2 , ã ã ã , λ p ) 0 denote a vector of Lagrange multipliers, and define f (x, λ) = 1

To differentiate the function f with respect to the elements of x and set the results to zero, we derive the equation x = A 0 λ By substituting this expression into the equation Ax = b, we obtain b = AA 0 λ From this, we can express λ as λ = (AA 0) − b + [I n − (AA 0) − (AA 0)]z, where z represents an arbitrary m-component vector Consequently, we arrive at the final expression x = A 0 (AA 0) − b.

The solution above amounts to obtainingxthat satisfies both x=A 0 λ andAx=b, that is, to solving the simultaneous equation

From Sp(I m )⊃Sp(A) and the corollary of Theorem 3.13, we obtain (3.74) becauseC 2 =A 0 (AA 0 ) −

Example 3.5 Solving a simultaneous equation in three unknowns, x+y−2z= 2, x−2y+z=−1, −2x+y+z=−1, we obtain x=k+ 1, y=k+ 1, andz=k.

Hence, the solution obtained by setting k=− 2 3 (that is, x= 1 3 ,y= 1 3 , and z=− 2 3 ) minimizesx 2 +y 2 +z 2

Let us derive the solution above via a minimum norm g-inverse Let

Ax=ydenote the simultaneous equation above in three unknowns Then,

A minimum norm reflexive g-inverse of A, on the other hand, is, according to (3.75), given by

(Verify that theA − mr above satisfies AA − mr A=A, (A − mr A) 0 =A − mr A, and

Least squares generalized inverse matrices

For the simultaneous equation Ax = y to have a solution, it is essential that y belongs to the span of A (y ∈ Sp(A)) Conversely, if y does not belong to the span of A (y ∉ Sp(A)), then no solution vector x can be found Thus, our focus is on determining the vector x* that meets these criteria.

Let y = y 0 +y 1 , where y ∈ E n , y 0 ∈ V = Sp(A), and y 1 ∈ W Sp(I n −AA − ) Then a solution vector to Ax = y 0 can be expressed as x = A − y 0 using an arbitrary g-inverse A − Since y 0 = y−y 1 , we have x = A − y 0 = A − (y −y 1 ) = A − y−A − y 1 Furthermore, since y 1 (I n −AA − )y, we have

Ax=AA − y−AA − (I n −AA − )y=AA − y.

LetP A denote the orthogonal projector onto Sp(A) From

A − that minimizes||Ax−y|| satisfiesAA − =P A That is,

In this case, V and W are orthogonal, and the projector onto V along W,

P V ãW =P V ãV ⊥ =AA − , becomes an orthogonal projector.

Definition 3.4 A generalized inverse A − that satisfies both AA − A=A and (AA 0 ) 0 = AA − is called a least squares g-inverse matrix of A and is denoted asA − ` (Rao and Mitra, 1971).

The following theorem holds for a least squares g-inverse.

Theorem 3.16The following three conditions are equivalent:

(3.79)⇒ (3.80): Premultiply both sides of A 0 AA − ` =A 0 by A(A 0 A) − , and use the result in Theorem 3.5.

Similarly to a minimum norm g-inverse A − m , a general form of a least squares g-inverse is given by

A − ` = (A 0 A) − A 0 + [I m −(A 0 A) − A 0 A]Z, (3.81) where Z is an arbitrary m by n matrix In this case, we generally have rank(A − ` )≥rank(A) A least squares reflexive g-inverse that satisfies rank(A − ` ) = rank(A) is given by

We next prove a theorem that shows the relationship between a least squares g-inverse and a minimum norm g-inverse.

{(A 0 ) − m }={(A − ` )} (3.83) Proof AA − ` A = A ⇒ A 0 (A − ` ) 0 A 0 = A 0 Furthermore, (AA − ` ) 0 AA − ` ⇒ (A − ` ) 0 A 0 = ((A − ` ) 0 A 0 ) 0 Hence, from Theorem 3.15, (A − ` ) 0 ∈ {(A 0 ) − m } On the other hand, from A 0 (A 0 ) − m A 0 = A 0 and ((A 0 ) − m A 0 ) 0 (A 0 ) − m A 0 , we have

Example 3.6 The simultaneous equations x+y = 2, x−2y = 1, and

−2x+y = 0 obviously have no solution Let

The z that minimizes ||b−Ax|| 2 is given by z =A − `r b= (A 0 A) − A 0 b.

In this case, we have

The minimum is given by||b−Az|| 2 = (2−1) 2 + (1−0) 2 + (0−(−1)) 2 = 3.

The Moore-Penrose generalized inverse matrix

The generalized inverses, including reflexive g-inverses, minimum norm g-inverses, and least squares g-inverses, are not uniquely defined for a specific matrix A Nevertheless, the matrix A−, which fulfills the conditions outlined in equations (3.59) and (3.60), can be determined through the direct-sum decompositions E n = V ⊕ W.

E m = ˜V ⊕W˜, so that if A − is a reflexive g-inverse (i.e.,A − AA − =A − ), it can be uniquely determined If in additionW =V ⊥ and ˜V = ˜W ⊥ , then clearly (3.66) and (3.77) hold, and the following definition can be given.

Definition 3.5 Matrix A + that satisfies all of the following conditions is called the Moore-Penrose g-inverse matrix of A (Moore, 1920; Penrose,

1955),hereafter called merely the Moore-Penrose inverse:

From the properties given in (3.86) and (3.87), we have (AA + ) 0 (I n −AA + ) Oand (A + A) 0 (I m −A + A) =O, and so

The orthogonal projectors onto Sp(A) and Sp(A +) are represented by P A = AA + and P A + = A + A, as indicated in equation (3.88) It is important to note that (A + ) + = A, which allows for the definition of the Moore-Penrose inverse through this equation While Penrose (1955) provided a definition using equations (3.84) to (3.87), Moore (1920) introduced the definition based on (3.88) In cases where reflexivity does not hold, specifically when A + AA + ≠ A +, the orthogonal projector P A + = A + A projects onto Sp(A 0) but may not project onto Sp(A +) This information is encapsulated in a key theorem.

Theorem 3.18 Let P A , P A 0 , and P G be the orthogonal projectors onto

Sp(A), Sp(A 0 ), andSp(G),respectively Then,

(ii)AG=P A ,GA=P A 0 ⇐⇒AGA=A, (AG) 0 =AG, (GA) 0 =GA.

Theorem 3.19 The necessary and sufficient condition for A − inx=A − b to minimize ||Ax−b|| is A − =A +

Proof (Sufficiency) The xthat minimizes ||Ax−b||can be expressed as x=A + b+ (I m −A + A)z, (3.89) where z is an arbitrary m-component vector.

Furthermore, from (3.20) we have Sp(I m −A + A) = Ker(A), and from the fact that A + is a reflexive g-inverse, we have Sp(A + ) = Sp(A + A).

From the fact that A + is also a minimum norm g-inverse, we have E m Sp(A + ) ⊕ ã Sp(I m −A + A), which together imply that the two vectors in

(3.89), A + band (I m −A + A)b, are mutually orthogonal We thus obtain

||x|| 2 =||A + b|| 2 +||(I m −A + A)z|| 2 ≥ ||A + b|| 2 , (3.90) indicating that ||A + b|| 2 does not exceed||x|| 2

(Necessity) Assume that A + b gives the minimum norm ||x|| among all possible x’s Then A + that satisfies

Hence, by pre- and postmultiplying both sides of (A + ) 0 A + A = (A + ) 0 by (AA + −I n ) 0 andA + , respectively, we obtain (A + AA + −A + ) 0 (A + AA + −

After manipulation, we find that \( A + ) = O \) By premultiplying both sides of \( (A + ) 0 A + A = (A + ) 0 \) by \( A 0 \), we derive \( (A + A) 0 = A + A \) Additionally, the conditions \( AA + A = A \) and \( (A 0 A + ) 0 = AA + \) can be established, as \( A + \) serves as a least squares g-inverse.

We next introduce a theorem that shows the uniqueness of the Moore- Penrose inverse.

Theorem 3.20 The Moore-Penrose inverse ofAthat satisfies the four con- ditions (3.84) through (3.87) in Definition 3.5 is uniquely determined.

Proof Let X and Y represent the Moore-Penrose inverses of A Then,

X = XAX = (XA) 0 X = A 0 X 0 X = A 0 Y 0 A 0 X 0 X = A 0 Y 0 XAX A 0 Y 0 X = Y AX = Y AY AX = Y Y 0 A 0 X 0 A 0 = Y Y 0 A 0 = Y AY = Y.

(This proof is due to Kalman (1976).) Q.E.D.

We now consider expressions of the Moore-Penrose inverse.

Theorem 3.21 The Moore-Penrose inverse of A can be expressed as

To minimize the expression ||b−Ax||, we start with the condition that x = A + b satisfies the normal equation A^T Ax = A^T b To further minimize ||x||², we define the function f(x, λ) = x^T x - 2λ^T (AA^T - A^T b), where λ is a vector of Lagrange multipliers By differentiating f with respect to x and setting the derivative to zero, we derive the equation x = A^T Aλ, leading to A^T Aλ = A + b By premultiplying this equation by A^T A(A^T A A^T A)⁻¹ - A^T A, we simplify to A^T A + = A^T.

A 0 A(A 0 AA 0 A) − A 0 AA 0 =A 0 , leading to A 0 Aλ =A 0 A(A 0 AA 0 A) − A 0 b x, thereby establishing (3.91) Q.E.D.

The equation Ax = AA + b leads to the condition A^T Ax = A^T b, allowing for the minimization of x^T x under this constraint We define ˜λ = (˜λ₁, ˜λ₂, , ˜λₘ)ᵀ and the function f(x, λ) = x^T x - 2λ^T (A^T Ax - A^T b) By differentiating f with respect to x and λ and setting the derivatives to zero, we derive the solutions x = A^T Aλ and A^T Ax = b.

Combining these two equations, we obtain ã I m A 0 A

A 0 b á Solving this equation using the corollary of Theorem 3.13, we also obtain (3.91).

Proof We use the fact that (A 0 A) − A 0 (AA 0 ) − A(A 0 A) − is a g-inverse of

A 0 AA 0 A, which can be seen from

# FromAA − A=A, we obtaina+b+c+d= 1 A least squares g-inverseA − ` is given by

# is symmetric, which impliesa+c=b+d This, combined witha+b+c+d= 1, yieldsc= 1 2 −aand d= 1 2 −b.

Similarly, a minimum norm g-inverse A − m is given by

This derives from the fact that

# is symmetric, which impliesa+b=c+d This, combined witha+b+c+d= 1, yieldsb= 1 2 −aand d= 1 2 −c A reflexive g-inverse has to satisfyad=bc from Example 3.4.

The Moore-Penrose inverse should satisfy all of the conditions above, and it is given by

The Moore-Penrose inverse A + can also be calculated as follows instead of using (3.91) and (3.92) Let x = A + b for ∀b ∈ E n From (3.91), x ∈

Sp(A 0 ), and so there existsxsuch thatx=A 0 z for somez Hence, A 0 b A 0 z Premultiplying both sides of this equation by A 0 A, we obtain A 0 b A 0 Axfrom A 0 AA + =A 0 Thus, by canceling outz from

10x 1 + 5x 2 + 15x 3 =b 1 + 2b 2 (3.93) from A 0 Ax = A 0 b Furthermore, from x= A 0 z, we have x 1 = 2z 1 + 4z 2 , x 2 =z 1 + 2z 2 , and x 3 = 3z 1 + 6z 2 Hence, x 1 = 2x 2 , x 3 = 3x 2 (3.94) Substituting (3.94) into (3.93), we obtain x 2 = 1

We now show how to obtain the Moore-Penrose inverse ofAusing (3.92) when ann by m matrixA admits a rank decomposition A =BC, where rank(B) = rank(C) =r = rank(A) Substituting A=BC into (3.92), we obtain

Note that B 0 B and CC 0 are nonsingular matrices of order r and that

B 0 (BB 0 ) − B and C(C 0 C) − C 0 are both identity matrices We thus have

(BB 0 ) − B(CC 0 ) − B 0 (BB 0 ) − ∈ {(BCC 0 B 0 ) − } and

The following theorem is derived from the fact that A(A 0 A) − A 0 = I n if rank(A) =n(≤m) and A 0 (AA 0 ) − A=I m if rank(A) =m(≤n).

A + =A 0 (AA 0 ) −1 =A − mr , (3.97) and if rank(A) =m≤n,

Note Another expression of the Moore-Penrose inverse is given by

A + = A − mr AA − `r = A − m AA − ` (3.99) This can be derived from the decomposition:

(It is left as an exercise for the reader to verify that the formula above satisfies the four conditions given in Definition 3.5.)

Extended Definitions

Linear Regression Analysis

Analysis of Variance

Multivariate Analysis

Linear Simultaneous Equations

Ngày đăng: 27/05/2022, 15:14

TÀI LIỆU CÙNG NGƯỜI DÙNG

  • Đang cập nhật ...