PhD dissertation: Iterative methods for singular linear equations and least-squares problems


DOCUMENT INFORMATION

Title: Iterative methods for singular linear equations and least-squares problems
Author: Sou-Cheng (Terrya) Choi
Advisors: Michael A. Saunders, Gene H. Golub, Rasmus M. Larsen, Doron Levy
Institution: Stanford University
Field: Computational and Mathematical Engineering
Document type: Dissertation
Year: 2006
City: Stanford
Pages: 113
File size: 12.31 MB


ITERATIVE METHODS FOR SINGULAR LINEAR EQUATIONS AND LEAST-SQUARES PROBLEMS

A DISSERTATION SUBMITTED TO THE PROGRAM IN COMPUTATIONAL AND MATHEMATICAL ENGINEERING AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

Sou-Cheng (Terrya) Choi
December 2006

Copyright 2007 by Choi, Sou-Cheng (Terrya)

All rights reserved.

INFORMATION TO USERS

The quality of this reproduction is dependent upon the quality of the copy submitted. Broken or indistinct print, colored or poor quality illustrations and photographs, print bleed-through, substandard margins, and improper alignment can adversely affect reproduction.

In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if unauthorized copyright material had to be removed, a note will indicate the deletion.

UMI Microform 3242533
Copyright 2007 by ProQuest Information and Learning Company. All rights reserved. This microform edition is protected against unauthorized copying under Title 17, United States Code.

ProQuest Information and Learning Company
300 North Zeeb Road, P.O. Box 1346
Ann Arbor, MI 48106-1346


All Rights Reserved

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

(Michael A. Saunders)  Principal Advisor

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

(Gene H. Golub)

Abstract

CG, MINRES, and SYMMLQ are Krylov subspace methods for solving large symmetric systems of linear equations. CG (the conjugate-gradient method) is reliable on positive-definite systems, while MINRES and SYMMLQ are designed for indefinite systems. When these methods are applied to an inconsistent system (that is, a singular symmetric least-squares problem), CG could break down and SYMMLQ's solution could explode, while MINRES would give a least-squares solution but not necessarily the minimum-length solution (often called the pseudoinverse solution). This understanding motivates us to design a MINRES-like algorithm to compute minimum-length solutions to singular symmetric systems.

MINRES uses QR factors of the tridiagonal matrix from the Lanczos process (where R is upper-tridiagonal). Our algorithm uses a QLP decomposition (where rotations on the right reduce R to lower-tridiagonal form), and so we call it MINRES-QLP. On singular or nonsingular systems, MINRES-QLP can give more accurate solutions than MINRES or SYMMLQ. We derive preconditioned MINRES-QLP, new stopping rules, and better estimates of the solution and residual norms, the matrix norm, and the condition number.

For a singular matrix of arbitrary shape, we observe that null vectors can be obtained by solving least-squares problems involving the transpose of the matrix. For sparse rectangular matrices, this suggests an application of the iterative solver LSQR. In the square case, MINRES, MINRES-QLP, or LSQR are applicable. Results are given for solving homogeneous systems, computing the stationary probability vector for Markov chain models, and finding null vectors for sparse systems arising in helioseismology.

Acknowledgments

First and foremost, I owe an enormous debt of gratitude to my advisor Professor Michael Saunders for his tireless support throughout my graduate education at Stanford. Michael is the best mentor a research student could possibly hope for. He is of course an amazing academic, going by his first-rate scholarly abilities, unparalleled mastery of his specialty, and profound insights on matters algorithmic and numerical (not surprising considering that he is one of the most highly cited computer scientists in the world today). But above and beyond all these, Michael is a most wonderful gentleman with great human qualities—he is modest, compassionate, understanding, accommodating, and possesses a witty sense of humor. I am very fortunate, very proud, and very honored to be Michael's student. This thesis certainly would not have been completed without Michael's most meticulous and thorough revision.

Professor Gene Golub is a demigod in our field and a driving force behind the computational mathematics community at Stanford. Incidentally, Gene was also Michael's advisor many years ago. I am also very grateful to Gene for his generosity and encouragement. He is the only professor I know who gives students 24-hour access to his large collection of books in his office. His stature and renown for hospitality attract visiting researchers from all over the world and create a most lively and dynamic environment at Stanford. This contributed greatly to my academic development. Like me, Gene came from a working-class family—a rarity in a place like Stanford, where many students are of the well-heeled gentry. He has often reminded me that a humble background is no obstacle to success. I am also very fortunate, very proud, and very honored to have Gene as my co-advisor.

Special thanks are due to Professor Chris Paige of McGill University for generously sharing his ideas and insights. He spent many precious hours with me over emails and long discussions during his two visits to Stanford in the past year. Chris is a giant in the field, and it is a great honour to help fill a gap in one of the famous works that Chris and Michael started long ago.

I thank my reading committee members: Professor Doron Levy and Dr. Rasmus Larsen. Their helpful suggestions have improved this thesis enormously. My thanks also to Professor Jerome Friedman for chairing my oral defense despite already having retired a few months earlier.

I am very grateful to my professors from the National University of Singapore (NUS), who have instilled and inspired in me an interest in computational mathematics since I was an undergraduate: Dr. Lawrence K.H. Ma, Professors Choy-Heng Lai, Jiang-Sheng Wang, Zuowei Shen, Gongyun Zhou, Kim-Chuan Toh, Belal Baaquie, Kan Chen, and last but not least Prabir Burman (UC Davis).

The work in this thesis was generously supported by research grants of Professors Michael Saunders, Gene Golub, and David Donoho. Thanks are also due to the C. Gary & Virginia Skartvedt Endowed Engineering Fund for a Stanford School-of-Engineering Fellowship, and to the Silicon Valley Engineering Council for an SVEC Scholarship.

MATLAB has been an indispensable tool—without it, none of the numerical experiments could have been performed with such ease and efficiency. I am proud to say that I learnt MATLAB from its inventor, Cleve Moler. I thank him for hiring

me as his teaching assistant for the course on which his very enjoyable book [71] is based (and for kindly recommending me as teaching assistant to his daughter, Professor Kathryn Moler, who taught the course in the subsequent year). The book is filled with illuminating examples, and this thesis has borrowed a most fascinating one (cf. Chapter 1).

I thank Michael Friedlander for the elegant thesis template that he generously shares with the Stanford public.

I have been fortunate to intern at both Google and IBM Almaden Labs, during which periods I benefited from working with Doctors John Tomlin, Andrew Tomkins, and Tom Truong. Specifically, I want to thank Dr. Xiaoye Sherry Li and Dr. Amy Langville for inviting me to speak about applications motivated by this thesis at Lawrence Berkeley Lab and the SIAM Annual Meeting 2004, respectively. Thanks also to Professor Beresford Parlett and Professor Inderjit Dhillon for the opportunities to speak in their seminars at UC Berkeley and UT Austin, respectively.

I also want to take the opportunity to thank the administrators and staff members of Stanford and NUS who have gone beyond the call of duty: Professors Walter Murray and Peter Glynn, Indira Choudhury, Lilian Lao, Evelyn Boughton, Lorrie Papadakis, Tim Keely, Seth Tornborg, Suzanne Bigas, Connie Chan, Christine Fiksdal, Dana Halpin, Jam Kiattinant, Nikkie Salgado, Claire Stager, Deborah Michael, Lori Cottle, Pat Shallenberger, Helen Tombropoulos, Sharon Bergman, Lee Kuen Chee, and Kowk Te Ang.

I am indebted to the following friends and colleagues for their friendship and encouragement that made my Stanford years so much more enjoyable: Michael's family Prue, Tania, and Emily; David, Ha, and baby Mike Saunders; Holly Jin, Neil, Danlin, and Hansen Lillemark; Lilian Lao, Michael and Victor Dang; Justin Wing Lok Wan and Winnie Wan Chu; Dulce Ponceleón, Walter, Emma and Sofia Murray; Von Bing Yap and Anne Suet Lin Chong; Pei Yee Woo and Kenneth Wee; Larry and Mary Wong. I thank the following for their friendship and wisdom: Monica Johnston, Wanchi So, Regina Ip-Lau, Stephen Ng, Wah Tung Lau, Chris Ng, Jonathan Choi, Xiaoqing Zhu, Sorav Bansal, Jeonghee Yi, Mike Ching, Cindy Law, Doris Wong, Jasmine Wong, Sandi Suardi, Sharon Wong, Popoh Low, Grace Ng, Roland Law, Ricky Ip, Fanny Lau, Stephen Yeung, Kenneth (D&G) Wong, Chok Hang Yeung, Carrie Teng, Grace Hui, Anthony So, Samuel Ieong, Kenneth Tam, Yee Wai Chong, Anthony Fai Tong Chung, Winnie Wing Yin Choi, Victor Lee, William Yu Cheong Chan, Dik Kin Wong, Collin Kwok-Leung Mui, Rosanna Man, Michael Friedlander, Kaustuv, Zheng Su, Yen Lin Chia, Hanh Huynh, Wanjun Mi, Linzhong Deng, Ofer Levi, James Lambers, Paul Tupper, Melissa Aczon, Steve Bryson, Oren Livne, Valentin Spitkovsky, Cindy Mason, Morten Mørup, Anil Gaba, Donald van Deventer, Kenji Imai, Chong-Peng Toh, Frederick Willeboordse, Yuan Ping Feng, Alex Ling, Roland Su, Helen Lau, and Suzanne Woo.

I have been infinitely lucky to have met Lek-Heng Lim when we were both undergraduates at NUS. As I made further acquaintance with Lek-Heng, I found him among the most thoughtful, encouraging, and inspiring of all my friends and colleagues. Without his encouragement, I would not have started this long journey, let alone finished it.

Last but not least, I thank my parents and grandma for years of toiling and putting up with my "life-long" studies. I am indebted to my siblings Dawn and Stephen, and brother-in-law Jack Cheng, for their constant support.

Contents

List of Tables and Figures

1 Introduction
  1.1 The Motivating Problem
    1.1.1 Null Vectors
    1.1.2 A Revelation
    1.1.3 Symmetric Systems
  1.2 Preliminaries
    1.2.1 Problem Description and Formal Solutions
    1.2.2 Existing Numerical Algorithms
    1.2.3 Background for MINRES

  2.2.3 MINRES
  2.3 Existing Iterative Methods for Hermitian Least-Squares
    2.3.2 GMRES
    2.3.4 QMR and SQMR
  2.4 Stopping Conditions and Norm Estimates
    2.4.1 Residual and Residual Norm
    2.4.3 Solution Norms
    2.4.4 Matrix Norms
    2.4.5 Matrix Condition Numbers

  3.2.1 The MINRES-QLP Subproblem
  3.2.2 Solving the Subproblem
  3.2.4 Transfer from MINRES to MINRES-QLP
  3.3 Stopping Conditions and Norm Estimates
    3.3.1 Residual and Residual Norm
    3.3.2 Norm of Ar_k
    3.3.3 Matrix Norms
    3.3.4 Matrix Condition Numbers
    3.3.6 Projection of Right-hand Side onto Krylov Subspaces
  3.4 Preconditioned MINRES and MINRES-QLP
    3.4.1 Derivation
    3.4.2 Preconditioning Singular Ax = b
    3.4.3 Preconditioning Singular Ax ≈ b
    3.5.1 Diagonal Preconditioning
    3.5.2 Binormalization (BIN)
    3.5.3 Incomplete Cholesky Factorization

4 Numerical Experiments on Symmetric Systems
  4.1 A Singular Indefinite System
  4.2 Two Laplacian Systems
    4.2.1 An Almost Compatible System
    4.2.2 A Least-Squares Problem
  4.3 Hermitian Problems
    4.3.1 Without Preconditioning
    4.3.2 With Diagonal Preconditioning
    4.3.3 With Binormalization
  4.4 Effects of Rounding Errors in MINRES-QLP

5 Computation of Null Vectors, Eigenvectors, and Singular Vectors
  5.1 Applications
    5.1.1 Eigenvalue Problem
    5.1.2 Singular Value Problem
    5.1.3 Generalized, Quadratic, and Polynomial Eigenvalue Problems
    5.1.4 Multiparameter Eigenvalue Problem
  5.2 Computing a Single Null Vector
  5.3 Computing Multiple Null Vectors
    5.3.1 MCGLS: Least-Squares with Multiple Right-Hand Sides
    5.3.2 MLSQR: Least-Squares with Multiple Right-Hand Sides
    5.3.3 MLSQRnull: Multiple Null Vectors
    5.4.2 PageRank Applied to Citation Data
    5.4.3 A Multiple Null-Vector Problem from Helioseismology

6 Conclusions and Future Work
  6.2 Contributions

Bibliography

Tables and Figures

Subproblem definitions of MINRES, GMRES, QMR, and LSQR
Bases and subproblem solutions in MINRES, GMRES, QMR, LSQR
Algorithm Arnoldi
Algorithm Bidiag1 (the Golub-Kahan process)
Algorithm MINRES-QLP
Subproblem definitions of CG, SYMMLQ, MINRES, and MINRES-QLP
Bases and subproblem solutions in CG, SYMMLQ, MINRES, MINRES-QLP
Algorithm PMINRES: Preconditioned MINRES
Algorithm PMINRES-QLP: Preconditioned MINRES-QLP
Different MATLAB implementations of various Krylov subspace methods
Null vectors from various Krylov subspace methods
MINRES-QLP's performance (cf. MINRES) on an ill-conditioned system (big ‖x‖)
The loss of orthogonality in Lanczos implies convergence of solution in Ax = b
Estimating ‖A‖_2 and ‖A‖_F using different methods in MINRES
Rounding errors in MINRES on ill-conditioned systems
MINRES-QLP with and without interleaving left and right reflectors
The ratio of L_k's extreme diagonal entries from MINRES-QLP approximates κ(A)
MINRES and MINRES-QLP on a well-conditioned linear system
Estimating ‖A‖_2 using different methods in MINRES-QLP
Norms of solution estimates from MINRES and MINRES-QLP on min ‖Ax − b‖
Example: Indefinite and singular Ax = b
Rounding errors in MINRES-QLP (cf. MINRES) on ill-conditioned systems
Rounding errors in MINRES-QLP (cf. MINRES) on least-squares problems
Convergence of the power method and LSQR on harvard500
PageRank of harvard500
Convergence of the power method and LSQR on CiteSeer data
PageRank of CiteSeer data
A multiple null-vector problem that arises from helioseismology


Chapter 1

Introduction

1.1 The Motivating Problem

In 1998 when the Google PageRank algorithm was first described [16], the World Wide Web contained about 150 million web pages and the classical power method appeared to be effective for computing the relevant matrix eigenvector. By 2003, the number of web pages had grown to 2 billion, and the power method was still being used (monthly) to compute an up-to-date ranking vector. Given some initial eigenvector estimate v_0, the power method involves the iteration

x_k = A v_{k−1},   v_k = x_k/‖x_k‖,   k = 1, …, k_P,   (1.1)

where A is a square matrix with rows and columns corresponding to web pages, and A_{ij} ≠ 0 if there is a link from page j to page i. Each column of A sums to 1, and thus A is called a column-stochastic matrix. Moreover, if its underlying graph is strongly connected, then by the Perron-Frobenius theorem, A has a simple dominant eigenvalue of 1 and thus the power method is applicable. In practice, the convergence of (1.1) appeared to be remarkably good. The required number of iterations k_P was at most a few hundred.

Much analysis has since been done (e.g., [31, 64]), but at this stage there was still room for optimistic researchers [18, 42, 46] to believe that Krylov subspace methods might prove useful in place of the power method. Since the relevant eigenvalue is known to be 1, the method of inverse iteration [50, p. 362], [87] could be used. This involves a sequence of linear systems in the following iteration:

(A − I) x_k = v_{k−1},   v_k = x_k/‖x_k‖,   k = 1, …, k_I,   (1.2)

where the number of iterations k_I would be only 1 or 2. The matrix A − I is intentionally singular, and the computed solutions x_k are expected to grow extremely large (‖x_k‖ ≈ 1/ε, where ε is the machine precision), so that the normalized vectors v_k would satisfy (A − I)v_k ≈ 0 and hence Av_k ≈ v_k as required.
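A corresponding sketch of inverse iteration (1.2) follows; the backslash solve is applied to an intentionally singular matrix, so in exact arithmetic it would fail, while in floating point it returns the huge x_k that the text anticipates (the starting vector here is an illustrative assumption):

    % Inverse iteration (1.2): a minimal sketch.
    n = size(A, 1);
    v = ones(n, 1) / sqrt(n);     % illustrative starting vector
    for k = 1:2                   % k_I is only 1 or 2
        x = (A - speye(n)) \ v;   % intentionally singular system
        v = x / norm(x);          % v approaches an eigenvector for eigenvalue 1
    end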

Of course, Krylov subspace methods involve many matrix-vector products Av (as in the power method) and additional storage in the form of some very large work vectors.

1.1.1 Null Vectors

We can apply an iterative solver to the least-squares problem

min_x ‖Ax − b‖_2   (1.3)

in order to compute null vectors v satisfying Av ≈ 0. (We have now replaced A − I by A, and A may be rectangular.) For almost any nonzero vector b, the computed solution x should be extremely large in norm, and the normalized vector v = x/‖x‖ will be a null vector of A. Our first test matrix A was derived from A_H, the 500 × 500 unsymmetric Harvard matrix called harvard500, assembled by Cleve Moler [71] to simulate the PageRank problem. With normal stopping tolerances in place, we found that LSQR converged to a least-squares solution that did not have large norm (and was not a null vector of A). Only after disabling all stopping conditions were we able to force LSQR to continue iterating until the solution norm finally increased toward 1/ε, giving a null vector v = x/‖x‖ as required.

1.1.2 A Revelation

The question arose: Which solution x was LSQR converging to with the normal stopping rules when A was singular? Probably it was the minimum-length solution, in which ‖x‖_2 is minimized among the (infinitely many) solutions that minimize ‖Ax − b‖_2. In any case, the associated residual vector r = b − Ax was satisfying A^T r = 0, because LSQR's stopping rules require ‖A^T r‖/(‖A‖‖r‖) to be small when ‖r‖ ≠ 0. Suddenly we realized that we were computing a null vector for the transpose matrix A^T. This implied that to obtain a null vector for the singular matrix A in (1.3), we could solve the least-squares problem

min_y ‖A^T y − c‖_2,   A ∈ ℝ^{m×n},   rank(A) < n,   (1.4)

with some rather arbitrary vector c. The optimal residual s = c − A^T y would satisfy As = 0, and the required null vector would be v = s/‖s‖. Furthermore, LSQR should converge sooner on (1.4) than if we force it to compute a very large x for (1.3).

Figure 1.1 shows LSQR converging twice as quickly on (1.4) compared to (1.3).
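This recipe maps directly onto MATLAB's built-in lsqr. In the following sketch the tolerance and iteration limit are illustrative choices, not the settings used in the thesis:

    % Null vector via the transpose least-squares problem (1.4): a sketch.
    c = randn(size(A, 2), 1);       % some rather arbitrary vector c
    y = lsqr(A', c, 1e-10, 1000);   % min ||A'*y - c|| over y
    s = c - A' * y;                 % optimal residual satisfies A*s ~ 0
    v = s / norm(s);                % normalized null vector of A
    fprintf('||A*v|| = %g\n', norm(A * v));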

1.1.3 Symmetric Systems

If A is symmetric, we can apply MINRES to

min_x ‖Ax − b‖_2.   (1.5)

If b happens to lie in the range of A, the optimal residual is r = 0, but otherwise—for example, if b is a random vector—we can expect r ≠ 0, so that v = r/‖r‖ will be a null vector, and again it will be obtained sooner than if we force iterations to continue until ‖x‖ is extremely large. In a least-squares problem, ‖r‖ > 0, and thus MINRES would need new stopping conditions to detect whether ‖Ax‖/‖x‖ or ‖Ar‖/‖r‖ is small enough. We derive recurrence relations for ‖Ax‖ and ‖Ar‖ that give us accurate estimates without extra matrix-vector multiplications.

We created our second test matrix from harvard500 by defining A_2 = A_H + A_H^T and constructing a diagonal matrix D with diagonal elements d_i = 1/√‖A_2(i,:)‖_1, which is well-defined because there is no zero row in A_2. We then apply the diagonal scaling A = D A_2 D.

FIGURE 1.1 Solving min ‖Ax − b‖ (1.3) and min ‖A^T y − c‖ (1.4) with A = A_H − I, b random, ‖b‖_2 = 1, and c = b, where A_H is the 500×500 Harvard matrix of Moler [71]. The matrix is unsymmetric with rank 499. Both solves compute the null vector of A. Left: With the normal stopping rules disabled, LSQR on min ‖Ax − b‖ (1.3) takes 711 iterations to give an exploding solution x_k such that ‖Ax_k‖/‖x_k‖ ≈ 1 × 10^{−8}, where k is the LSQR iteration number. Right: In contrast, LSQR on min ‖A^T y − c‖ (1.4) takes only 311 iterations to give s_k = c − A^T y_k such that ‖As_k‖/‖s_k‖ ≈ 7 × 10^{−7}. To reproduce this figure, run testNull3([3,4]).

Note that A is not a doubly stochastic matrix (which would have a trivial dominant eigenvector e = [1, …, 1]^T), but it happens to have a simple dominant eigenvalue 1. We applied MINRES twice on (1.5) with the shifted matrix A := A − I and a randomly generated b: the first time with normal stopping conditions, and a second time with all stopping conditions disabled except ‖Ax_k‖/‖x_k‖ < tol. The results are shown in Figure 1.2.

Given that singular least-squares problems have an infinite number of solutions, the same question arises: Which solution does MINRES produce on singular problems? As for LSQR, we surmised that it would be the minimum-length solution, and indeed this is true for MINRES when b lies in the range of A. However, when the optimal r = b − Ax in (1.5) is nonzero, we found experimentally (and later theoretically) that MINRES does not return the minimum-length solution.

Thus began the research that comprises most of this thesis. A new implementation called MINRES-QLP has been developed that has the desired property on singular systems (that of minimizing ‖x‖). The implementation is substantially more complex, but as a bonus we expect MINRES-QLP to be more accurate than the original MINRES on nonsingular symmetric systems Ax = b.

For a preview of the performance of MINRES-QLP compared to MINRES with normal stopping conditions on symmetric problems, see Figures 1.3–1.6. On ill-conditioned nonsingular compatible systems, the solution quality of MINRES-QLP could be similar to that of MINRES, but the residuals are much more accurate (see Figures 1.5 and 1.6). There are applications, such as null-vector computations, that require accurate residuals. On singular systems, MINRES-QLP's solutions and residuals could be much more accurate than MINRES's (see Figures 1.3 and 1.4). Ipsen and Meyer [60] state that in general, Krylov subspace methods such as GMRES on singular compatible systems yield only the Drazin-inverse solution (see section 2.3.2 for more details). GMRES is actually mathematically equivalent to MINRES if A is symmetric. In contrast, our work shows that both MINRES and MINRES-QLP could give us the minimum-length solution.

Ipsen and Meyer [60] also show that in general, Krylov subspace methods return no solution for inconsistent problems. However, we show that MINRES computes a least-squares solution (with minimum ‖r‖_2) and our new Krylov subspace method MINRES-QLP gives the minimum-length solution to singular symmetric linear systems or least-squares problems.

In Chapter 2 we establish that for singular and incompatible Hermitian problems, existing iterative methods such as the conjugate-gradient method CG [57], SYMMLQ [81], MINRES [81], and SQMR [38] cannot minimize the solution norm and residual norm simultaneously.

FIGURE 1.3 Right: In contrast, MINRES with normal stopping conditions takes only about 42 iterations to give r_k such that ‖r_k‖ ≈ 0.78 and ‖Ar_k‖/‖r_k‖ ≈ 1 × 10^{−8}. Since the system is incompatible, MINRES needs new stopping conditions to detect whether ‖Ax_k‖/‖x_k‖ or ‖Ar_k‖ is small enough. To reproduce this figure, run testNull3([5,6]).

FIGURE 1.4 Solving min ‖Ax − b‖ (1.5) with symmetric A as in Figure 1.2 and Figure 1.3. We define b = Az_1 + z_2, where z_1 and z_2 are randomly generated with ‖z_1‖ ≈ 1.3 and ‖z_2‖ ≈ 10^{−12}, and then we normalize b by its 2-norm. Thus b has a very small component in the null space of A—if any at all. The matrix has rank 499, but the system Ax = b is nearly compatible. The plots of MINRES and MINRES-QLP overlap completely except for the last two iterations. MINRES takes 143 iterations to give a nonminimum-length solution x_k such that ‖x_k‖ ≈ 4.0, while MINRES-QLP takes 145 iterations to give ‖x_k‖ ≈ 0.75, with ‖r_k‖ ≈ 10^{−12} and ‖Ar_k‖ ≈ 10^{−14} in both cases. We also computed the TEVD solution and found that it matches our MINRES-QLP solution here. If we had not known that this was generated as an almost compatible system, we would have guessed that it is compatible. MINRES-QLP appears to have a better regularization property than MINRES. This example also prompts us to ask the question: how to put a dividing line—in terms of ‖r_k‖—between a linear system and a least-squares problem? To reproduce this figure, run PreviewMINRESQLP1(4).

FIGURE 1.5 Solving Ax = b with symmetric positive definite A = Q diag([10^{−8}, 2 × 10^{−8}, 2:δ:3]) Q of dimension n = 792 and norm ‖A‖_2 = 3, where Q = I − (2/n)ee^T is a Householder matrix generated by e = [1, …, 1]^T. We define b = Ae (‖b‖ ≈ 70.7). Thus the true solution is x = e and ‖x‖ = O(‖b‖). This example is constructed similarly to Figure 4 in Sleijpen et al. [96]. The left and middle plots differ after 30 and 39 iterations, with the final MINRES solution x_k^M giving ‖r_k^M‖ ≈ 10^{−10} and ‖Ar_k^M‖ ≈ 10^{−10}, while the final MINRES-QLP solution x_k^Q gives ‖r_k^Q‖ ≈ 10^{−12} and ‖Ar_k^Q‖ ≈ 10^{−12}. The right plot shows that ‖x_k‖ is very similar for both methods; in fact, for the final points, ‖x_k^M‖ ≈ ‖x_k^Q‖ ≈ 28 and ‖x_k^M − x‖ ≈ ‖x_k^Q − x‖ ≈ 2 × 10^{−7}. To reproduce this figure, run PreviewMINRESQLP2(2).

FIGURE 1.6 Solving Ax = b with the same symmetric positive definite A as in Figure 1.5 but with b = e. Since cond_2(A) ≈ 10^8 and ‖b‖_2 = √n, we expect the solution norm to be big (‖x‖ ≫ ‖b‖). The left and middle plots differ after 22 and 26 iterations, with the final MINRES solution x_k^M giving ‖r_k^M‖ ≈ 10^{−2} and ‖Ar_k^M‖ ≈ 10^{−2} only, while the final MINRES-QLP solution x_k^Q gives ‖r_k^Q‖ ≈ 10^{−7} and ‖Ar_k^Q‖ ≈ 10^{−8}. The right plot shows that ‖x_k‖ is very similar for both methods; in fact, for the final points, ‖x_k^M‖ = ‖x_k^Q‖ ≈ 10^8 but ‖x_k^M − x_k^Q‖ = 1.4. To reproduce this figure, run PreviewMINRESQLP2(1).


1.2 Preliminaries

1.2.1 Problem Description and Formal Solutions

We consider solving for the n-vector x in the system of linear equations

Ax = b   (1.6)

when the n × n real symmetric matrix A is large and sparse, or represents an operator for forming products Av. When the real vector b is in the range of A, we say that the system is consistent or compatible; otherwise it is inconsistent or incompatible.

When A is nonsingular, the system is always consistent and the solution of (1.6) is unique. When A is singular and (1.6) has at least one solution, we say that the singular system is consistent or compatible, in which case it has infinitely many solutions. To obtain a unique solution, we select the minimum-length solution among all solutions x in ℝⁿ such that Ax = b.

On the other hand, if the singular system has no solution, we say that it is inconsistent or incompatible, in which case we solve the singular symmetric least-squares problem instead and select the minimum-length solution:

x = arg min_x ‖Ax − b‖_2.   (1.7)

More precisely, the minimum-length least-squares problem is defined as

min ‖x‖_2   s.t.   x ∈ arg min_x ‖Ax − b‖_2,   (1.8)

or, with the more commonly seen but actually slightly abusive notation,

min ‖x‖   s.t.   x = arg min_x ‖Ax − b‖_2.   (1.9)

The minimum-length solution of either (1.6) or (1.7) is unique and is also called the pseudoinverse solution. Formally,

x† = (A^T A)† A^T b = (A²)† A b,

where A† denotes the pseudoinverse of A. We postpone the definition and further discussion of the pseudoinverse to section 2.3.1.
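For a small dense example, the pseudoinverse solution can be checked directly. A sanity-check sketch using MATLAB's pinv and lsqminnorm (the latter requires R2017b or later; the example matrix is ours, chosen for illustration):

    % Minimum-length (pseudoinverse) solution: a small sanity check.
    A  = diag([1 2 3 0 0]);       % singular symmetric example
    b  = [1 2 3 0 0]';            % b lies in range(A)
    x1 = pinv(A) * b              % x-dagger = A^+ b = [1 1 1 0 0]'
    x2 = lsqminnorm(A, b)         % same minimum-norm solution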

We may also consider (1.6) or (1.7) with A's diagonal shifted by a scalar σ. Shifted problems appear, for example, in inverse iteration (as mentioned in section 1.1) or Rayleigh quotient iteration. The shift is mentioned here because it is best handled within the Lanczos process (see section 2.1) rather than by defining A := A − σI.

Two related but more difficult problems are known as Basis Pursuit and Basis Pursuit De-Noising [22, 23] (see also the Lasso problem [103]):

min ‖x‖_1   s.t.   Ax = b,

min λ‖x‖_1 + ½‖r‖_2²   s.t.   Ax + r = b,

where A is usually rectangular (with more columns than rows) in signal-processing applications.

TABLE 1.1
Existing iterative algorithms since CG was created in 1952. All methods require products Av_k for a sequence of vectors {v_k}. The last column indicates whether a method also requires A^T u_k for a sequence of vectors {u_k}.

| Linear Equations | Authors | Properties of A | A^T? |
| CG | Hestenes and Stiefel (1952) [57] | Symmetric positive definite | |
| CRAIG | Faddeev and Faddeeva (1963) [33] | Square or rectangular | yes |
| MINRES | Paige and Saunders (1975) [81] | Symmetric indefinite | |
| SYMMLQ | Paige and Saunders (1975) [81] | Symmetric indefinite | |
| Bi-CG | Fletcher (1976) [35] | Square unsymmetric | yes |
| LSQR | Paige and Saunders (1982) [82, 83] | Square or rectangular | yes |
| GMRES | Saad and Schultz (1986) [89] | Square unsymmetric | |
| CGS | Sonneveld (1989) [98] | Square unsymmetric | |
| QMR | Freund and Nachtigal (1991) [37] | Square unsymmetric | yes |
| Bi-CGSTAB | Van der Vorst (1992) [109] | Square unsymmetric | yes |
| TFQMR | Freund (1993) [36] | Square unsymmetric | |
| SQMR | Freund and Nachtigal (1994) [38] | Symmetric | |

| Least Squares | Authors | Properties of A | A^T? |
| CGLS | Hestenes and Stiefel (1952) [57] | Square or rectangular | yes |
| RRLS | Chen (1975) [24] | Square or rectangular | yes |
| RRLSQR | Paige and Saunders (1982) [82] | Square or rectangular | yes |
| LSQR | Paige and Saunders (1982) [82] | Square or rectangular | yes |

1.2.2 Existing Numerical Algorithms

In this thesis, we are interested in sparse matrices that are so large that direct factorization methods such as Gaussian elimination or Cholesky decomposition are not immediately applicable. Instead, iterative methods, and in particular Krylov subspace methods, are usually the methods of choice. For example, CG is designed for a symmetric positive definite matrix A (whose eigenvalues are all positive), while SYMMLQ and MINRES are for an indefinite symmetric matrix A (whose eigenvalues could be positive, negative, or zero).

The main existing iterative methods for symmetric and unsymmetric A are listed in Table 1.1.
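Several of the methods in Table 1.1 ship with MATLAB. A usage sketch on a small symmetric positive definite test matrix (tolerances and iteration limits are illustrative):

    % Calling MATLAB's built-in Krylov solvers: a usage sketch.
    A = gallery('poisson', 10);      % 100-by-100 sparse SPD matrix
    b = ones(size(A,1), 1);
    x1 = pcg(A, b, 1e-10, 100);      % CG, for A > 0
    x2 = minres(A, b, 1e-10, 100);   % MINRES, symmetric (indefinite OK)
    x3 = symmlq(A, b, 1e-10, 100);   % SYMMLQ, symmetric (indefinite OK)
    x4 = lsqr(A, b, 1e-10, 100);     % LSQR, square or rectangular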

1.2.3 Background for MINRES

MINRES, first proposed in [81, section 6], is an algorithm for solving indefinite symmetric linear systems. A number of acceleration methods for MINRES using (block) preconditioners have been proposed in [73, 51, 105]. Researchers in various science and engineering disciplines have found MINRES useful in a range of applications, including:

• interior eigenvalue problems [72, 114]
• augmented systems [34]
• nonlinear eigenvalue problems [20]
• characterization of null spaces [21]
• symmetric generalized eigenvalue problems [74]
• singular value computations [112]

1.2.4 Notation

We use lower-case letters such as b, u, v, w, and x (possibly with integer subscripts) to denote column vectors of length n. In particular, e_k denotes the kth unit vector. We use upper-case italic letters (possibly with integer subscripts) to denote matrices. The exception is superscript T, which denotes the transpose of a vector or matrix. We reserve I_k to denote the identity matrix of order k, and Q_k and P_k for orthogonal matrices. Lower-case Greek letters denote scalars. The symbol ‖·‖ denotes the 2-norm of a vector or the Frobenius norm of a matrix. We use κ(A) to denote the condition number of matrix A; R(A) and N(A) to denote the range and null space of A; K_k(A, b) to denote the kth Krylov subspace of A and b; and A† is the pseudoinverse of A. We use A ≻ 0 to denote that A is positive definite, and A ⊁ 0 to mean that A is not positive definite (so A could be negative definite, non-negative definite, indefinite, and so on). When we have a compatible linear system, we often write Ax = b. If the linear system is incompatible, we write Ax ≈ b as shorthand for the corresponding linear least-squares problem min ‖Ax − b‖_2. We use the symbol ∥ to denote parallel vectors, and ⊥ to denote orthogonality.

Most of the results in our discussion are directly extendable to problems with complex matrices and vectors. When special care is needed in handling complex problems, we will be very specific. We use superscript H to denote the conjugate transpose of a complex matrix or vector.

1.2.5 Computations

We use MATLAB 7.0 and double precision for computations unless otherwise specified. We use ε (varepsilon) to denote machine precision (= 2^{−52} ≈ 2.2 × 10^{−16}). In an algorithm, we use // to indicate comments. For measuring mathematical quantities or the complexity of algorithms, we sometimes use big-oh O(·) to denote an asymptotic upper bound [94, Definition 7.2]:

f(n) = O(g(n)) if ∃ c > 0 and a positive integer n_0 ∈ ℕ such that ∀ n ∈ ℕ, n ≥ n_0, f(n) ≤ c g(n).

Thus a nonzero constant α = O(1) = O(n), and αn = O(n). We note that f(n) = O(g(n)) is a slight abuse of notation—to be precise, it is f(n) ∈ O(g(n)).

Following the philosophy of reproducible computational research as advocated in [27, 25], for each figure and example we mention either the source or the specific MATLAB command.


1.2.6 Roadmap

We review the iterative algorithms CG, SYMMLQ, and MINRES for Hermitian linear systems and least-squares problems in Chapter 2, and show that MINRES gives a nonminimum-length solution for inconsistent systems. We also review other Krylov subspace methods such as LSQR and GMRES for non-Hermitian problems, and we derive new recursive formulas for efficient estimation of ‖Ar_k‖, ‖Ax_k‖, and the condition number of A for MINRES.

In Chapter 3, we present a new algorithm, MINRES-QLP, for symmetric and possibly singular systems. Chapter 4 gives numerical examples that contrast the solutions of MINRES with the minimum-length solutions of MINRES-QLP on symmetric and Hermitian systems.

In Chapter 5, we return to the null-vector problem for sparse matrices or linear operators, and apply the previously mentioned iterative solvers.

Chapter 6 summarizes our contributions and ongoing work.

2.1 The Lanczos Process

The Lanczos process transforms a symmetric matrix A to a symmetric tridiagonal matrix with an additional row at the bottom:

\underline{T}_k = \begin{bmatrix} \alpha_1 & \beta_2 & & & \\ \beta_2 & \alpha_2 & \beta_3 & & \\ & \beta_3 & \alpha_3 & \ddots & \\ & & \ddots & \ddots & \beta_k \\ & & & \beta_k & \alpha_k \\ & & & & \beta_{k+1} \end{bmatrix}.

If we define T_k to be the first k rows of \underline{T}_k, then T_k is square and symmetric, and \underline{T}_k = \begin{bmatrix} T_k \\ \beta_{k+1} e_k^T \end{bmatrix}.

The Lanczos process iteratively computes vectors v_k as follows:

v_0 = 0,   β_1 v_1 = b,   where β_1 serves to normalize v_1,   (2.1)
p_k = A v_k,   α_k = v_k^T p_k,
β_{k+1} v_{k+1} = p_k − α_k v_k − β_k v_{k−1},   where β_{k+1} serves to normalize v_{k+1}.   (2.2)

In matrix form,

A V_k = V_{k+1} \underline{T}_k,   where V_k = [v_1 ⋯ v_k].   (2.3)

In exact arithmetic, the columns of V_k are orthonormal and the process stops when β_{k+1} = 0 (k ≤ n), and then we obtain

A V_k = V_k T_k.   (2.4)

TABLE 2.1
Algorithm LanczosStep.

LanczosStep(A, v_k, v_{k−1}, β_k, σ) → α_k, β_{k+1}, v_{k+1}
  p_k = A v_k − σ v_k,   α_k = v_k^T p_k,   p_k ← p_k − α_k v_k
  v_{k+1} = p_k − β_k v_{k−1},   β_{k+1} = ‖v_{k+1}‖
  if β_{k+1} ≠ 0,  v_{k+1} ← v_{k+1}/β_{k+1}  end

TABLE 2.2
Algorithm Lanczos.

  while β_k ≠ 0 and k < maxit
    LanczosStep(A, v_k, v_{k−1}, β_k, σ) → α_k, β_{k+1}, v_{k+1}
    k ← k + 1
  end

The above discussion can be extended to A − σI, where σ is a scalar shift. We call each iteration in the Lanczos process a Lanczos step: LanczosStep(A, v_k, v_{k−1}, β_k, σ) → α_k, β_{k+1}, v_{k+1}. See Table 2.1 and Table 2.2.

We need to keep at most the matrix A (or a function that returns Av if A is a linear operator), 3 vectors, and 3 scalars in memory. In fact, a careful implementation would require only 2 vectors in working memory at a time, if v_{k+1} replaces v_{k−1}. Each iteration performs a matrix-vector multiplication, 2 inner products, 3 scalar-vector multiplications, and 2 vector subtractions, which sums to 2ν + 9n floating-point operations per iteration, where ν is the number of nonzeros in A. The Lanczos process stops in at most min{rank(A) + 1, n} iterations. It stops sooner when A has clusters of eigenvalues or b has nonzero components along only a few eigenvectors of A.
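A direct MATLAB transcription of (2.1)–(2.2) may clarify the bookkeeping. This sketch omits the shift σ and any reorthogonalization:

    % Lanczos process (2.1)-(2.2): a minimal sketch (sigma = 0).
    function [V, alpha, beta] = lanczos(A, b, maxit)
        n = length(b);
        V = zeros(n, maxit+1);  alpha = zeros(maxit,1);  beta = zeros(maxit+1,1);
        beta(1) = norm(b);  V(:,1) = b / beta(1);  vprev = zeros(n,1);
        for k = 1:maxit
            p = A * V(:,k);                            % the only product Av
            alpha(k) = V(:,k)' * p;
            p = p - alpha(k)*V(:,k) - beta(k)*vprev;   % three-term recurrence
            beta(k+1) = norm(p);
            if beta(k+1) == 0, V = V(:,1:k); return, end   % process stops
            vprev = V(:,k);
            V(:,k+1) = p / beta(k+1);
        end
    end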

Definition 2.1 (kth Krylov subspace with respect to A and b) Given a square n × n matrix A ∈ ℝ^{n×n} and an n-vector b ∈ ℝⁿ, we define the kth Krylov subspace of (A, b) as

K_k(A, b) := span{b, Ab, …, A^{k−1}b} = span{v_1, …, v_k},   (2.5)

where k is a positive integer.

Proposition 2.2 Given symmetric A ∈ ℝ^{n×n} and b ∈ ℝⁿ, and supposing that β_i > 0 for i = 1, …, k but β_{k+1} = 0 in the Lanczos process, we have the following results.

1. If b ∈ N(A), then α_1 = 0, β_2 v_2 = 0, and rank(A) ≥ 1.

2. If b ∈ R(A), then v_1 ∥ b and v_2, …, v_k ⊥ b are k orthogonal vectors that lie in R(A), and n ≥ rank(A) ≥ k.

3. If b ∉ R(A) and b ∉ N(A) (that is, N(A) is nontrivial; b has a nonzero component in R(A) and a nonzero component in N(A)), then v_1, …, v_k have nonzero components in R(A) and thus n > rank(A) ≥ k − 1.

Proof (3) Let b = b_R + b_N, where b_R is the component of b in R(A) and b_N is the component of b in N(A). The first Lanczos step gives β_1 v_1 = β_1(v_{1,R} + v_{1,N}) = b_R + b_N = b. So

α_1 = v_1^T A v_1 = (v_{1,R} + v_{1,N})^T A (v_{1,R} + v_{1,N}) = v_{1,R}^T A v_{1,R},   (2.6)

β_2 v_2 = A v_1 − α_1 v_1 = A(v_{1,R} + v_{1,N}) − α_1(v_{1,R} + v_{1,N}),

and in general

β_{i+1} v_{i+1} = A v_i − α_i v_i − β_i v_{i−1}
              = A(v_{i,R} + v_{i,N}) − α_i(v_{i,R} + v_{i,N}) − β_i(v_{i−1,R} + v_{i−1,N})
              = A v_{i,R} − α_i v_{i,R} − β_i v_{i−1,R} − α_i v_{i,N} − β_i v_{i−1,N}
              =: β_{i+1}(v_{i+1,R} + v_{i+1,N}),

so that v_{i+1,N} ∥ v_{1,N} ∥ b_N. Thus

[v_{1,N} ⋯ v_{k,N}] = v_{1,N} c^T,   where c^T = [1 c_2 ⋯ c_k] for some scalars c_i. Thus

rank([v_{1,R} ⋯ v_{k,R}]) = rank([v_1 ⋯ v_k] − v_{1,N} c^T) = k − 1 or k,

since a rank-1 change to a full-rank matrix of rank k can change the matrix rank by at most 1, and an n × k matrix with k ≤ n can have rank at most k. Thus

rank(A) ≥ rank([v_{1,R} ⋯ v_{k,R}]) = k − 1 or k.  ∎

Corollary 2.3 Given symmetric A ∈ ℝ^{n×n}, we define r = rank(A).

1. If b ∈ R(A), then β_{k+1} = 0 for some k ≤ r ≤ n.

2. If r < n and b ∉ R(A), then β_{k+1} = 0 for some k ≤ r + 1 ≤ n.

Theorem 2.4 Given a symmetric matrix A ∈ ℝ^{n×n} with s distinct nonzero eigenvalues and b ∈ ℝⁿ that has nonzero components along t (t ≤ s) eigenvectors of A corresponding to t distinct nonzero eigenvalues of A, then β_{k+1} = 0 for some k ≤ min{t + 1, s + 1} if b ∉ R(A), or k ≤ t if b ∈ R(A).

Example 1.

(b) If b = [1 2 0 0 0]^T, then t = 2, β_3 = 0.

2. Let A = diag([1 2 3 0 0]), n = 5, r = s = 3.

(a) If b = [1 2 0 0 0]^T, then b ∈ R(A), t = 2, β_3 = 0.
(b) If b = [1 0 0 0 0]^T, then b ∈ R(A), t = 1, β_2 = 0.
(c) If b = [1 2 3 4 0]^T, then b ∉ R(A), t = 3, β_5 = 0.
(d) If b = [1 0 0 4 0]^T, then b ∉ R(A), t = 1, β_3 = 0.

3. Let A = diag([2 2 3 0 0]), r = 3, s = 2.

(a) If b = [1 2 0 0 0]^T, then b ∈ R(A), t = 1, β_2 = 0.
(b) If b = [1 2 3 4 0]^T, then b ∉ R(A), t = 2, β_4 = 0.
(c) If b = [1 0 0 4 0]^T, then b ∉ R(A), t = 1, β_3 = 0.

2.2 Lanczos-Based Methods for Linear Systems

In each Lanczos step, we solve a subproblem to find x_k ∈ K_k(A, b) such that x_k = V_k y_k for some y_k ∈ ℝ^k. It follows that r_k = b − A x_k = V_{k+1}(β_1 e_1 − \underline{T}_k y_k), and all Lanczos-based methods attempt to make β_1 e_1 − \underline{T}_k y_k small in one way or another. CG focuses on the first k equations, attempting to solve T_k y = β_1 e_1 by applying the Cholesky decomposition to T_k. SYMMLQ concentrates on the first k − 1 equations and wants to solve the underdetermined system \underline{T}_{k−1}^T y = β_1 e_1. That said, since T_k is available in the kth iteration, SYMMLQ goes ahead and solves \underline{T}_k^T y = β_1 e_1 instead, by applying the LQ decomposition to \underline{T}_k^T. MINRES works to minimize the 2-norm of β_1 e_1 − \underline{T}_k y by applying the QR decomposition to \underline{T}_k. The stencil (not reproduced here) depicts the rationale and focuses of the three methods, where s's represent the last row of the tridiagonal matrix in SYMMLQ's (k − 1)th iteration, c's in CG's kth iteration, m's in MINRES's kth iteration, and * the common entries of all three methods.

An iterative process generates certain quantities from the data. At each iteration a subproblem is defined, suggesting how those quantities may be combined to give a new estimate of the required solution. Different subproblems define different methods for solving the original problem. Different ways of solving a subproblem lead to different implementations of the associated method.

Tables 2.3–2.4 (from [90]) give the subproblem associated with each method, and the mechanism for defining solution estimates for the original problem in terms of various transformed bases. CG and LanczosCG are two implementations of the same method.

TABLE 2.3
Subproblem definitions of CG, SYMMLQ, and MINRES.

| Method | Subproblem | Factorization | Estimate of x_k |
| LanczosCG | T_k y_k = β_1 e_1 | Cholesky: T_k = L_k D_k L_k^T | x_k = V_k y_k |
| SYMMLQ [81, 90] | y_{k+1} = arg min {‖y‖ : \underline{T}_k^T y = β_1 e_1, y ∈ ℝ^{k+1}} | LQ: \underline{T}_k^T Q_k = [L_k 0] | x_k = V_{k+1} y_{k+1} ∈ K_{k+1}(A, b) |
| MINRES [81] | y_k = arg min_{y ∈ ℝ^k} ‖\underline{T}_k y − β_1 e_1‖ | QR: Q_k \underline{T}_k = [R_k; 0] | x_k = V_k y_k ∈ K_k(A, b) |

TABLE 2.4
Bases and subproblem solutions in CG, SYMMLQ, and MINRES.

| Method | New basis | Subproblem solution | Estimate of x_k |
| CG | W_k := V_k L_k^{−T} Θ_k, Θ_k := diag(‖r_0‖, …, ‖r_{k−1}‖) | L_k D_k Θ_k z_k = β_1 e_1 | x_k = W_k z_k |
| SYMMLQ | W_k := V_{k+1} Q_k [I_k; 0] | L_k z_k = β_1 e_1 | x_k = W_k z_k |
| MINRES | D_k := V_k R_k^{−1} | R_k z_k = β_1 [I_k 0] Q_k e_1 | x_k = D_k z_k |

Another way to classify Krylov subspace methods is based on their error and residual properties, as described in Demmel [29, section 6.6.2]: find x_k = arg min_{x_k ∈ K_k(A,b)} ‖x̂ − x_k‖, where x̂ denotes the true solution. Table 2.5 gives an expanded description.

TABLE 2.5
Residual and error properties of CG, SYMMLQ, and MINRES.

| Method | Residual properties | Error property |
| CG (for A ≻ 0) | min ‖r_k‖_{A^{−1}}; r_k ⊥ K_k(A, b); Ar_k ⊥ K_{k−1}(A, b) | min ‖x̂ − x_k‖_A |
| SYMMLQ | r_k ⊥ K_k(A, b); Ar_k ⊥ K_{k−1}(A, b) | min ‖x̂ − x_k‖_2 |
| MINRES | min ‖r_k‖_2; β_{k+1} = 0 ⟹ r_k ⊥ K_k(A, b), Ar_k ⊥ K_k(A, b) | — |

FIGURE 2.1 A is symmetric tridiagonal of order 100 and full rank, and b is a scalar multiple of e_1. The Lanczos vectors are the sparsest possible: v_k = e_k. Left: In double precision, loss of local orthogonality among v_{k−2}, v_{k−1}, v_k for each iteration k = 1, …, 94, and loss of global orthogonality among v_1, …, v_k. Middle: Color-spying elementwise absolute values of V_k^T V_k − I. The color patterns are symmetric. The upper left corner is usually closest to zero (of order ε) and white in color. The area closer to the diagonal indicates the extent of loss of local orthogonality. In contrast, the areas in the upper right and lower left corners correspond to the loss of global orthogonality, which is larger in magnitude and darker in color. Right: Loss of global orthogonality in the Lanczos basis, however, implies convergence of the solution in the Lanczos-based solver MINRES. This figure can be reproduced by LossOrthogonality(1).

In finite-precision arithmetic, the columns of V_k are observed to lose orthogonality when the x_k's from one of the Lanczos-based methods are converging to the solution [78, 84]. See Figure 2.1.

2.2.1 CG

In this section we present two equivalent CG algorithms. One is derived from the Lanczos process, for academic interest (Table 2.6), and the other is the standard CG algorithm (Table 2.7), which is more memory-efficient and commonly found in the literature (e.g., [50]).

The kth iteration of CG works on the Cholesky factors of T_k from the Lanczos process:

T_k = L_k D_k L_k^T,   L_k = \begin{bmatrix} 1 & & & \\ \iota_2 & 1 & & \\ & \ddots & \ddots & \\ & & \iota_k & 1 \end{bmatrix},   D_k = diag(δ_1, …, δ_k).

In the rest of this section, we highlight a few important properties of CG. We first assume A ≻ 0 and then relax this to A ⪰ 0 later.

Proposition 2.5 (‖Ar_k‖ for CG)

1. ‖Ar_0‖ = (‖r_0‖/ν_1) √(1 + μ_2).

2. ‖Ar_k‖ = ‖r_k‖ √( μ_{k+1}/ν_k² + (1/ν_{k+1} + μ_{k+1}/ν_k)² + μ_{k+2}/ν_{k+1}² ) for k ≥ 1, when q_{k+1}^T A q_{k+1} ≠ 0.

3. ‖Ar_k‖ = (‖r_k‖/ν_k) √( μ_{k+1}(1 + μ_{k+1}) ) when q_{k+1}^T A q_{k+1} = 0.

TABLE 2.6
Algorithm LanczosCG. We assume A is symmetric only.

LanczosCG(A, b, σ, maxit) → x, φ
  β_1 = ‖b‖_2,  v_0 = 0,  β_1 v_1 = b,  x_0 = 0,  φ_0 = β_1,  k = 1
  while no stopping condition is true,
    LanczosStep(A, v_k, v_{k−1}, β_k, σ) → α_k, β_{k+1}, v_{k+1}
    // update solution and residual norm
    if δ_k ≤ 0, STOP end   // A indefinite, perhaps unstable to continue
    if k = 1, …

TABLE 2.7
Algorithm CG.

CG(A, b, tol, maxit) → x, φ, χ, 𝒜, κ   // if x = 0, no converged solution
  x_0 = 0,  r_0 = b,  φ_0 = ‖r_0‖,  χ_0 = 0,  φ_0² = ‖r_0‖²,  q_1 = r_0
  k = 1,  κ = 1,  𝒜 = 0,  ν_min = 0
  while (φ_{k−1}/φ_0 > tol) and (k ≤ maxit)
    s_k = A q_k,   ξ_k = q_k^T s_k
    if ξ_k ≤ 0
      x_k := 0,  φ_k = φ_0,  χ_k = 0,  STOP   // q_k is a null vector
    end
    ν_k = φ_{k−1}²/ξ_k,  x_k = x_{k−1} + ν_k q_k,  r_k = r_{k−1} − ν_k s_k,  χ_k = ‖x_k‖
    φ_k² = ‖r_k‖²,  μ_{k+1} = φ_k²/φ_{k−1}²,  q_{k+1} = r_k + μ_{k+1} q_k   // gradient
    ν_min = min{ν_min, ν_k},  𝒜 = max{𝒜, ν_k},  κ = 𝒜/ν_min,  k = k + 1
  end
  x = x_k,  φ = φ_k,  χ = χ_k

The following lemma implies that CG is applicable only to compatible linear systems.

Lemma 2.6 ‖r_k‖ = 0 if and only if ‖Ar_k‖ = 0.

Proposition 2.7 (Null vector of A ⪰ 0 from CG's breakdown) In exact arithmetic, if A ⪰ 0 and ξ_k = q_k^T A q_k = 0, then ν_k becomes undefined and CG breaks down, and the gradient q_k is a null vector of A.

Proposition 2.8 (Null vector of A ⪰ 0 from CG's exploding solution) In finite-precision arithmetic, if A ⪰ 0 and ξ_k = q_k^T A q_k = O(ε) in CG, then ν_k and x_k explode, and x_k (normalized) is an approximate null vector of A.

When we know in advance that A is symmetric negative semidefinite, we can apply CG to (−A)x = −b to get a solution, since A ⪯ 0 if and only if −A ⪰ 0.

Most textbook discussions restrict application of CG to a symmetric positive definite matrix A because ‖·‖_A and ‖·‖_{A^{−1}} are in general not defined for singular A. However, CG can often be applied to a symmetric positive semidefinite matrix A (all eigenvalues of A nonnegative) without failure if b ∈ R(A). Moreover, CG sometimes also works with a symmetric indefinite (singular) matrix if we change the stopping condition from (ξ_k ≤ 0) to (ξ_k = 0). For example,

Proposition 2.9 (Solution of x^T A x = 0 from CGI's breakdown) In exact arithmetic, if ξ_k = q_k^T A q_k = 0, then ν_k becomes undefined and CGI breaks down, and the gradient q_k is a solution of the quadratic equation x^T A x = 0.

Proposition 2.10 (Solution of x^T A x = 0 from CGI's exploding solution) In finite-precision arithmetic, if ξ_k = q_k^T A q_k = O(ε) in CGI, then ν_k and x_k explode, and x_k (normalized) is an approximate solution of the quadratic equation x^T A x = 0.

Example 2. A case when CG and CGI fail: an indefinite matrix A and vector b for which Ab ≠ 0 but q_1^T A q_1 = b^T A b = 0, rendering failure of CGI. However, SYMMLQ and MINRES work to give the solution.

TABLE 2.8
Algorithm CGI. We assume A = A^T only.

CGI(A, b, tol, maxit) → x, φ, χ, 𝒜, κ   // if x = 0, no converged solution
  x_0 = 0,  r_0 = b,  φ_0 = ‖r_0‖,  χ_0 = 0,  φ_0² = ‖r_0‖²,  q_1 = r_0,
  k = 1,  κ = 1,  𝒜 = 0
  while (φ_{k−1}/φ_0 > tol) and (k ≤ maxit)
    s_k = A q_k,   ξ_k = q_k^T s_k
    if ξ_k = 0,  x_k := 0,  φ_k = φ_0,  χ_k = 0,  STOP  end
    ν_k = φ_{k−1}²/ξ_k,  x_k = x_{k−1} + ν_k q_k,  r_k = r_{k−1} − ν_k s_k,  χ_k = ‖x_k‖
    φ_k² = ‖r_k‖²,  μ_{k+1} = φ_k²/φ_{k−1}²,  q_{k+1} = r_k + μ_{k+1} q_k   // gradient
    ν_min = min{ν_min, |ν_k|},  𝒜 = max{𝒜, |ν_k|},  κ = 𝒜/ν_min,  k = k + 1
  end

2.2.2 SYMMLQ

SYMMLQ solves the subproblem

y_{k+1} = arg min {‖y‖ : \underline{T}_k^T y = β_1 e_1, y ∈ ℝ^{k+1}},   (2.8)

where \underline{T}_k is available at the kth Lanczos step. The subproblem is best solved using the LQ factorization of \underline{T}_k^T.

TABLE 2.9
Algorithm SymOrtho.
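The listing of Table 2.9 is not reproduced here; the following sketch shows the standard stable way SymOrtho-type routines compute c, s, and r from two scalars a and b (the exact sign conventions in the thesis may differ):

    % SymOrtho: a sketch of the stable construction of c, s, r with
    % [c s; s -c]*[a; b] = [r; 0], avoiding overflow in sqrt(a^2 + b^2).
    function [c, s, r] = symortho(a, b)
        if b == 0
            c = 1; if a ~= 0, c = sign(a); end
            s = 0;  r = abs(a);
        elseif a == 0
            c = 0;  s = sign(b);  r = abs(b);
        elseif abs(b) > abs(a)
            tau = a / b;  s = sign(b) / sqrt(1 + tau^2);
            c = s * tau;  r = b / s;
        else
            tau = b / a;  c = sign(a) / sqrt(1 + tau^2);
            s = c * tau;  r = a / c;
        end
    end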

The SYMMLQ iterate can be written

x_k^L = V_{k+1} y_{k+1} = W_{k+1} z̄_{k+1} = x_{k−1}^L + ζ_k w_k,   (2.12)

where W_{k+1} = [W_k  w̄_{k+1}], and ζ_k is the last component of z_k.

As we will see, SYMMLQ is designed for compatible linear systems but not least-squares problems. Most of the following SYMMLQ properties are presented and succinctly proved in the later part of [81, section 5].

Proposition 2.11 (r_k of SYMMLQ)

1. r_0 = β_1 v_1 = b and ‖r_0‖ = β_1.

TABLE 2.10
Algorithm SYMMLQ with possible transfer to the CG point at the end. The algorithm also estimates the solution and residual norms χ = ‖x_k‖, φ = ‖r_k‖. At the end of the algorithm, if the recurrently computed residual norm φ_k^C of the CG point is smaller than that from SYMMLQ, the algorithm will compute the CG iterate x_k^C from the SYMMLQ iterate x_k.

SYMMLQ(A, b, σ, maxit) → x, φ, χ
  β_1 = ‖b‖_2,  v_0 = 0,  β_1 v_1 = b,
  φ_0 = β_1,  x_0 = 0,  c_{−1} = c_0 = −1
  while no stopping condition is true
    LanczosStep(A, v_k, v_{k−1}, β_k, σ) → α_k, β_{k+1}, v_{k+1}
    // last right orthogonalization on middle two entries in last row of \underline{T}_k^T
    δ_k^{(2)} = c_{k−1} δ_k^{(1)} + s_{k−1} α_k,   γ_k^{(1)} = s_{k−1} δ_k^{(1)} − c_{k−1} α_k
    // last right orthogonalization to produce first two entries of the next row
    …
    w_k = c_k w̄_k + s_k v_{k+1},  w̄_{k+1} = s_k w̄_k − c_k v_{k+1},  x_k = x_{k−1} + ζ_k w_k
  end
  x = x_k,  χ = χ_k

3. For k ≥ 1, there are scalars ω_{k+1} and ω_{k+2} (defined from the rotation quantities) such that

r_k = ω_{k+1} v_{k+1} − ω_{k+2} v_{k+2},   (2.14)

φ_k := ‖r_k‖ = ‖[ω_{k+1}  ω_{k+2}]‖.   (2.15)

Thus, V_k^T r_k = 0.

Proposition 2.12 (Ar_k of SYMMLQ)

1. A r_0 = β_1 (α_1 v_1 + β_2 v_2) and ‖Ar_0‖ = β_1 √(α_1² + β_2²).

2. Define ω̃_{k+1} = … and ω̃_{k+2} = …. Then …

Lemma 2.13 φ_k = 0 ⟺ ζ_k = 0.

Lemma 2.14 (Solution norm of SYMMLQ and its monotonicity) Let x_0 = 0. Then χ_k = ‖x_k‖_2 = √(χ_{k−1}² + ζ_k²) is monotonically increasing as k increases.

Proposition 2.15 (SYMMLQ's breakdown on incompatible systems) Suppose we want to solve Ax = b, where A = A^T and b are given. In exact arithmetic, if γ_k^{(2)} = 0, then SYMMLQ breaks down. If β_{k+1} = γ_k^{(2)} = 0, then x_{k−1} is our solution; otherwise, b ∉ R(A) and there is no solution from SYMMLQ.

In finite precision, we may be able to obtain an exploding solution from SYMMLQ by disabling the normal stopping rules. However, that is usually not a null vector of A. To obtain a null vector of A, we recommend transferring to the CG point at the end, or using w̄_{k+1} when β_{k+1} = 0.

Proposition 2.16 (Transfer to CG point) Suppose that A is symmetric positive semidefinite. Let x_k^C denote the kth iterate from CG, with x_k^C := x_k^L + ζ̄_{k+1} w̄_{k+1}, and let φ_k^C be the norm of the corresponding residual r_k^C = b − A x_k^C. Then we have the following results:

1. x_k^C = x_k^L + ζ̄_{k+1} w̄_{k+1}.

2. ‖x_k^C‖ = √(‖x_k^L‖² + ζ̄_{k+1}²) ≥ ‖x_k^L‖.

3. φ_k^C = (β_1 s_1 s_2 s_3 ⋯ s_{k−1} s_k)/|c_k| = (|c_{k−1}| s_k / |c_k|) φ_{k−1}^C.

Lemma 2.17 If β_{k+1} = 0 and γ_k^{(2)} = 0, then w̄_{k+1} is a unit null vector of A.

2.2.3 MINRES

MINRES solves the least-squares subproblem

y_k = arg min_{y ∈ ℝ^k} ‖β_1 e_1 − \underline{T}_k y‖_2,   (2.18)

by computing the QR factorization

Q_k \underline{T}_k = \begin{bmatrix} R_k \\ 0 \end{bmatrix},

where Q_k = Q_{k,k+1} ⋯ Q_{2,3} Q_{1,2} is a product of (k+1) × (k+1) Householder reflectors designed to annihilate the β's in the subdiagonal of \underline{T}_k. Of course, this is the transpose of the LQ factorization used in SYMMLQ, with Q_k = P_k^T and Q_{k,k+1} = P_{k,k+1} in (2.9)–(2.10).

TABLE 2.11
Algorithm MINRES. The algorithm also estimates φ = ‖r_k‖, ψ = ‖Ar_k‖, χ = ‖x_k‖, 𝒜 = ‖A‖, κ = cond(A).

MINRES(A, b, σ, maxit) → x, φ, ψ, χ, 𝒜, κ
  β_1 = ‖b‖_2,  v_0 = 0,  β_1 v_1 = b,  φ_0 = τ_0 = β_1,  x_0 = 0,  κ = 1
  while no stopping condition is true,
    LanczosStep(A, v_k, v_{k−1}, β_k, σ) → α_k, β_{k+1}, v_{k+1}
    // last left orthogonalization on middle two entries in last column of \underline{T}_k
    …
    if k = 1,  𝒜_1 = √(α_1² + β_2²)  else  𝒜_k = max{𝒜_{k−1}, √(β_k² + α_k² + β_{k+1}²)}  end
    // update solution and matrix condition number
    if γ_k^{(2)} ≠ 0,
      d_k = (v_k − δ_k^{(2)} d_{k−1} − ε_k d_{k−2}) / γ_k^{(2)},  x_k = x_{k−1} + τ_k d_k,  χ_k = ‖x_k‖
      γ_min = min{γ_min, γ_k^{(2)}},  κ = 𝒜_k / γ_min
    end
    k = k + 1
  end
  x = x_k,  φ = φ_k,  ψ = φ_k √((γ_{k+1}^{(1)})² + (δ_{k+2}^{(1)})²),  χ = χ_k,  𝒜 = 𝒜_k

It can be shown that

d_k = (v_k − δ_k^{(2)} d_{k−1} − ε_k d_{k−2}) / γ_k^{(2)}.   (2.24)

A careful implementation of MINRES needs memory for at most the matrix A and 5 working n-vectors for v_k, v_{k+1}, d_{k−1}, d_k, and x_k in each iteration (not counting the vector b). There are 2ν + 9n flops per iteration, where ν is the number of nonzeros in A.

  while no stopping condition is true
    α_k = μ_{k−1}/‖s_{k−1}‖²,  x_k = x_{k−1} + α_k p_{k−1},  r_k = r_{k−1} − α_k s_{k−1},  φ_k = ‖r_k‖
    z_k = A r_k,  μ_k = r_k^T z_k,  β_k = μ_k/μ_{k−1},  p_k = r_k + β_k p_{k−1}
    …
  end

The following lemma gives a recurrence relation for r_k. It says that the intermediate r_k's are not orthogonal to K_k(A, b) except when β_{k+1} = 0. In that case, s_k = 0 and r_k = −φ_k c_k v_{k+1} is finally orthogonal to K_k(A, b). The residual norm can be recurred without computing r_k.

Lemma 2.18 (r_k for MINRES and monotonicity of ‖r_k‖_2) r_k = s_k² r_{k−1} − φ_k c_k v_{k+1} and ‖r_k‖_2 = ‖r_{k−1}‖_2 s_k. It follows that ‖r_k‖_2 ≤ ‖r_{k−1}‖_2.

Similarly, ‖Ar_k‖ can be efficiently computed by the following recurrence relation. While ‖r_k‖_2 is monotonically decreasing, ‖Ar_k‖ is often observed to be oscillating.

Lemma 2.19 (Ar_k for MINRES)

A r_k = ‖r_k‖ (γ_{k+1}^{(1)} v_{k+1} + δ_{k+2}^{(1)} v_{k+2}),

‖A r_k‖ = ‖r_k‖ ‖[γ_{k+1}^{(1)}  δ_{k+2}^{(1)}]‖.

Lemma 2.20 (Recurrence formula for ‖Ax_k‖ for MINRES)

‖A x_k‖_2 = ‖t_k‖_2 = ‖[t_{k−1}^T  τ_k]^T‖_2 = √(‖A x_{k−1}‖_2² + τ_k²).

Proposition 2.21 If b ∈ R(A), and in MINRES β_i > 0 for i = 1, …, k, but β_{k+1} = 0, then γ_k^{(2)} > 0 and thus T_k and R_k are nonsingular.

Proof Suppose γ_k^{(2)} = 0. Then s_k = 0, and thus r_k = 0 and φ_k = ‖r_k‖ = s_k φ_{k−1} = 0. Also R_k is singular—of order k and rank k − 1—and MINRES will proceed to set x_k := x_{k−1}. It follows that r_k := r_{k−1} and φ_k = φ_{k−1} = 0. However, this contradicts the fact that MINRES had not stopped at the (k − 1)th iteration.  ∎

Corollary 2.22 If in MINRES β_i > 0 for i = 1, …, k, and β_{k+1} = 0, and γ_k^{(2)} = 0, then T_k and R_k are singular (both of order k and rank k − 1) and b ∉ R(A).

In the following, we review the definition of the minimum-length solution or pseudoinverse solution of a linear system. Then we prove that MINRES returns the unique minimum-length solution for any symmetric compatible (possibly singular) system.

Definition 2.23 (Moore-Penrose conditions and pseudoinverse [50]) Given any m × n matrix A, X is the pseudoinverse of A if it satisfies the four Moore-Penrose conditions:

1. A X A = A.
2. X A X = X.
3. (A X)^T = A X.
4. (X A)^T = X A.
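The four conditions are easy to verify numerically. A sketch using MATLAB's pinv, where each printed norm should be of the order of machine precision (the random test matrix is our own choice):

    % Numerically verifying the Moore-Penrose conditions for X = pinv(A).
    A = randn(5, 3) * randn(3, 4);   % random 5-by-4 matrix of rank 3
    X = pinv(A);
    norm(A*X*A - A)                  % condition 1
    norm(X*A*X - X)                  % condition 2
    norm((A*X)' - A*X)               % condition 3: A*X symmetric
    norm((X*A)' - X*A)               % condition 4: X*A symmetric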

Theorem 2.24 (Existence and uniqueness of the pseudoinverse) The pseudoinverse of a matrix always exists and is unique.

If A is square and nonsingular, then A†, the pseudoinverse of A, is the matrix inverse A^{−1}. Even if A is square and nonsingular, we rarely compute A^{−1}. Instead, we would compute, say, the LU decomposition PA = LU or the QR decomposition A = QR. If we want the solution of Ax = b, we do not compute x = A^{−1}b; instead, we solve the triangular systems Ly = Pb and Ux = y if we have computed the LU decomposition of A, or Rx = Q^T b in the case of the QR decomposition. Likewise, we rarely compute the pseudoinverse of A. It is mainly an analytical tool. If A is singular, A^{−1} does not exist, but Ax = b may have a solution. In that case, there are infinitely many solutions. In some applications we want the unique minimum-length solution, which can be written in terms of the pseudoinverse of A: x† = A†b. However, to compute x†, we would not compute A†. Instead we could compute some rank-revealing factorization of A, such as the reduced singular value decomposition A = UΣV^T, where U and V have orthonormal columns and Σ is diagonal with positive entries. Then the minimum-length solution is x† = VΣ^{−1}U^T b.
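In MATLAB terms, the rank-revealing route just described might look as follows; the rank tolerance is an illustrative choice, and A and b are assumed given:

    % Minimum-length solution via a reduced SVD: a sketch.
    [U, S, V] = svd(A, 'econ');
    sv = diag(S);
    r  = sum(sv > max(size(A)) * eps(max(sv)));        % numerical rank
    xdagger = V(:,1:r) * ((U(:,1:r)' * b) ./ sv(1:r)); % V*inv(Sigma)*U'*b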

Theorem 2.25 If b ∈ R(A), and in MINRES β_i > 0 for i = 1, …, k, but β_{k+1} = 0, then x_k is the pseudoinverse solution of Ax = b.

Proof We know that span(v_1, …, v_k) ⊆ R(A). However, we assume span(v_1, …, v_k) = R(A). Without this assumption, the result is still true, but the proof would be more complicated.

By Proposition 2.21, when β_{k+1} = 0, R_k^{−1} exists. Moreover,

x_k = V_k y_k = V_k R_k^{−1} t_k = V_k R_k^{−1} β_1 Q_{k−1} e_1 = V_k R_k^{−1} Q_{k−1} V_k^T b.   (2.25)

Thus, we define

A† := V_k R_k^{−1} Q_{k−1} V_k^T = V_k T_k^{−1} V_k^T,   since Q_{k−1} T_k = R_k.
