Numerical Methods for Unconstrained Optimization and Nonlinear Equations

SIAM's Classics in Applied Mathematics series consists of books that were previously allowed to go out of print. These books are republished by SIAM as a professional service because they continue to be important resources for mathematical scientists.

Editor-in-Chief
Robert E. O'Malley, Jr., University of Washington

Editorial Board
Richard A. Brualdi, University of Wisconsin-Madison
Herbert B. Keller, California Institute of Technology
Andrzej Z. Manitius, George Mason University
Ingram Olkin, Stanford University
Stanley Richardson, University of Edinburgh
Ferdinand Verhulst, Mathematisch Instituut, University of Utrecht

Classics in Applied Mathematics
C. C. Lin and L. A. Segel, Mathematics Applied to Deterministic Problems in the Natural Sciences
Johan G. F. Belinfante and Bernard Kolman, A Survey of Lie Groups and Lie Algebras with Applications and Computational Methods
James M. Ortega, Numerical Analysis: A Second Course
Anthony V. Fiacco and Garth P. McCormick, Nonlinear Programming: Sequential Unconstrained Minimization Techniques
F. H. Clarke, Optimization and Nonsmooth Analysis
George F. Carrier and Carl E. Pearson, Ordinary Differential Equations
Leo Breiman, Probability
R. Bellman and G. M. Wing, An Introduction to Invariant Imbedding
Abraham Berman and Robert J. Plemmons, Nonnegative Matrices in the Mathematical Sciences
Olvi L. Mangasarian, Nonlinear Programming
*Carl Friedrich Gauss, Theory of the Combination of Observations Least Subject to Errors: Part One, Part Two, Supplement. Translated by G. W. Stewart
Richard Bellman, Introduction to Matrix Analysis
U. M. Ascher, R. M. M. Mattheij, and R. D. Russell, Numerical Solution of Boundary Value Problems for Ordinary Differential Equations
K. E. Brenan, S. L. Campbell, and L. R. Petzold, Numerical Solution of Initial-Value Problems in Differential-Algebraic Equations
Charles L. Lawson and Richard J. Hanson, Solving Least Squares Problems
J. E. Dennis, Jr. and Robert B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations
Richard E. Barlow and Frank Proschan, Mathematical Theory of Reliability
Cornelius Lanczos, Linear Differential Operators
Richard Bellman, Introduction to Matrix Analysis, Second Edition
Beresford N. Parlett, The Symmetric Eigenvalue Problem
*First time in print

Classics in Applied Mathematics (continued)
Richard Haberman, Mathematical Models: Mechanical Vibrations, Population Dynamics, and Traffic Flow
Peter W. M. John, Statistical Design and Analysis of Experiments
Tamer Başar and Geert Jan Olsder, Dynamic Noncooperative Game Theory, Second Edition
Emanuel Parzen, Stochastic Processes
Petar Kokotovic, Hassan K. Khalil, and John O'Reilly, Singular Perturbation Methods in Control: Analysis and Design
Jean Dickinson Gibbons, Ingram Olkin, and Milton Sobel, Selecting and Ordering Populations: A New Statistical Methodology
James A. Murdock, Perturbations: Theory and Methods
Ivar Ekeland and Roger Temam, Convex Analysis and Variational Problems
Ivar Stakgold, Boundary Value Problems of Mathematical Physics, Volumes I and II
J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables
David Kinderlehrer and Guido Stampacchia, An Introduction to Variational Inequalities and Their Applications
F. Natterer, The Mathematics of Computerized Tomography
Avinash C. Kak and Malcolm Slaney, Principles of Computerized Tomographic Imaging
R. Wong, Asymptotic Approximations of Integrals
O. Axelsson and V. A. Barker, Finite Element Solution of Boundary Value Problems: Theory and Computation
David R. Brillinger, Time Series: Data Analysis and Theory
Joel N. Franklin, Methods of Mathematical Economics: Linear and Nonlinear Programming, Fixed-Point Theorems
Philip Hartman, Ordinary Differential Equations, Second Edition
Michael D. Intriligator, Mathematical Optimization and Economic Theory
Philippe G. Ciarlet, The Finite Element Method for Elliptic Problems
Jane K. Cullum and Ralph A. Willoughby, Lanczos Algorithms for Large Symmetric Eigenvalue Computations, Vol. I: Theory
M. Vidyasagar, Nonlinear Systems Analysis, Second Edition
Robert Mattheij and Jaap Molenaar, Ordinary Differential Equations in Theory and Practice
Shanti S. Gupta and S. Panchapakesan, Multiple Decision Procedures: Theory and Methodology of Selecting and Ranking Populations
Eugene L. Allgower and Kurt Georg, Introduction to Numerical Continuation Methods
Heinz-Otto Kreiss and Jens Lorenz, Initial-Boundary Value Problems and the Navier-Stokes Equations

Copyright © 1996 by the Society for Industrial and Applied Mathematics.

This SIAM edition is an unabridged, corrected republication of the work first published by Prentice-Hall, Inc., Englewood Cliffs, NJ, 1983.

All rights reserved. Printed in the United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Society for Industrial and Applied Mathematics, 3600 University City Science Center, Philadelphia, PA 19104-2688.

Library of Congress Cataloging-in-Publication Data

Dennis, J. E. (John E.), 1939-
Numerical methods for unconstrained optimization and nonlinear equations / J. E. Dennis, Jr., Robert B. Schnabel.
p. cm. — (Classics in applied mathematics ; 16)
Originally published: Englewood Cliffs, NJ : Prentice-Hall, ©1983.
Includes bibliographical references and indexes.
ISBN 0-89871-364-1 (pbk.)
1. Mathematical optimization. 2. Equations—Numerical solutions. I. Schnabel, Robert B. II. Title. III. Series.
QA402.5.D44 1996    95-51776

The royalties from the sales of this book are being placed in a fund to help students attend SIAM meetings and other SIAM-related activities. This fund is administered by SIAM, and qualified individuals are encouraged to write directly to SIAM for guidelines.

SIAM is a registered trademark.

Numerical Methods for Unconstrained Optimization and Nonlinear Equations

J. E. Dennis, Jr.
Rice University
Houston, Texas

Robert B. Schnabel
University of Colorado
Boulder, Colorado

Society for Industrial and Applied Mathematics
Philadelphia

To Catherine, Heidi, and Cory

Contents

PREFACE TO THE CLASSICS EDITION
PREFACE

1  INTRODUCTION
1.1  Problems to be considered
1.2  Characteristics of "real-world" problems
1.3  Finite-precision arithmetic and measurement of error
1.4  Exercises

2  NONLINEAR PROBLEMS IN ONE VARIABLE
2.1  What is not possible
2.2  Newton's method for solving one equation in one unknown
2.3  Convergence of sequences of real numbers
2.4  Convergence of Newton's method
2.5  Globally convergent methods for solving one equation in one unknown
2.6  Methods when derivatives are unavailable
2.7  Minimization of a function of one variable
2.8  Exercises

3  NUMERICAL LINEAR ALGEBRA BACKGROUND
3.1  Vector and matrix norms and orthogonality
3.2  Solving systems of linear equations—matrix factorizations
3.3  Errors in solving linear systems
3.4  Updating matrix factorizations
3.5  Eigenvalues and positive definiteness
3.6  Linear least squares
3.7  Exercises

4  MULTIVARIABLE CALCULUS BACKGROUND
4.1  Derivatives and multivariable models
4.2  Multivariable finite-difference derivatives
4.3  Necessary and sufficient conditions for unconstrained minimization
4.4  Exercises
5  NEWTON'S METHOD FOR NONLINEAR EQUATIONS AND UNCONSTRAINED MINIMIZATION
5.1  Newton's method for systems of nonlinear equations
5.2  Local convergence of Newton's method
5.3  The Kantorovich and contractive mapping theorems
5.4  Finite-difference derivative methods for systems of nonlinear equations
5.5  Newton's method for unconstrained minimization
5.6  Finite-difference derivative methods for unconstrained minimization
5.7  Exercises

6  GLOBALLY CONVERGENT MODIFICATIONS OF NEWTON'S METHOD
6.1  The quasi-Newton framework
6.2  Descent directions
6.3  Line searches
     6.3.1  Convergence results for properly chosen steps
     6.3.2  Step selection by backtracking
6.4  The model-trust region approach
     6.4.1  The locally constrained optimal ("hook") step
     6.4.2  The double dogleg step
     6.4.3  Updating the trust region
6.5  Global methods for systems of nonlinear equations
6.6  Exercises

7  STOPPING, SCALING, AND TESTING
7.1  Scaling
7.2  Stopping criteria
7.3  Testing
7.4  Exercises

8  SECANT METHODS FOR SYSTEMS OF NONLINEAR EQUATIONS
8.1  Broyden's method
8.2  Local convergence analysis of Broyden's method
8.3  Implementation of quasi-Newton algorithms using Broyden's update
8.4  Other secant updates for nonlinear equations
8.5  Exercises

9  SECANT METHODS FOR UNCONSTRAINED MINIMIZATION
9.1  The symmetric secant update of Powell
9.2  Symmetric positive definite secant updates
9.3  Local convergence of positive definite secant methods
9.4  Implementation of quasi-Newton algorithms using the positive definite secant update
9.5  Another convergence result for the positive definite secant method
9.6  Other secant updates for unconstrained minimization
9.7  Exercises

10  NONLINEAR LEAST SQUARES
10.1  The nonlinear least-squares problem
10.2  Gauss-Newton-type methods
10.3  Full Newton-type methods
10.4  Other considerations in solving nonlinear least-squares problems
10.5  Exercises

11  METHODS FOR PROBLEMS WITH SPECIAL STRUCTURE
11.1  The sparse finite-difference Newton method
11.2  Sparse secant methods
11.3  Deriving least-change secant updates
11.4  Analyzing least-change secant methods
11.5  Exercises

Figure 6.4.5  The double dogleg curve, x_c → C.P. → N̂ → x_+^N.

The length ||x - x_c||_2 increases monotonically along the curve from x_c to C.P. to N̂ to x_+^N. This makes the process of selecting the point x_+ on the curve with ||x_+ - x_c||_2 = δ_c reasonable. Point C.P. in Figure 6.4.5 is the Cauchy point, the minimizer of the quadratic model along the steepest-descent direction from x_c. It is found by solving

    min_{λ > 0} m_c(x_c - λ g),

which has the unique solution

    λ* = (g^T g) / (g^T H g).

Therefore

    C.P. = x_c - [ (g^T g) / (g^T H g) ] g,

and if δ_c ≤ ||C.P. - x_c||_2, the algorithm takes a step of length δ_c in the steepest-descent direction.
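The step selection just described can be summarized in a few lines. The sketch below is a minimal Python/NumPy rendering of the double dogleg choice, not the book's Algorithm A6.4.4; the function name double_dogleg_step is ours, and it assumes the model Hessian H is symmetric positive definite and that delta is the current trust radius.

    import numpy as np

    def double_dogleg_step(g, H, delta):
        """Return the point of the double dogleg curve at distance delta from x_c,
        for the local model m(s) = f + g^T s + 0.5 s^T H s with H symmetric positive definite."""
        s_newton = -np.linalg.solve(H, g)              # full Newton step
        if np.linalg.norm(s_newton) <= delta:
            return s_newton                            # Newton point lies inside the trust region

        g_H_g = g @ H @ g
        s_cp = -(g @ g / g_H_g) * g                    # step to the Cauchy point C.P.
        if np.linalg.norm(s_cp) >= delta:
            return -(delta / np.linalg.norm(g)) * g    # steepest-descent step of length delta

        # N-hat = x_c + eta * s_newton with eta = 0.8*gamma + 0.2,
        # gamma = (g^T g)^2 / ((g^T H g)(g^T H^-1 g))
        gamma = (g @ g) ** 2 / (g_H_g * (g @ -s_newton))
        eta = 0.8 * gamma + 0.2
        if np.linalg.norm(eta * s_newton) <= delta:
            # between N-hat and the Newton point the curve runs along the Newton direction
            return (delta / np.linalg.norm(s_newton)) * s_newton

        # otherwise take the point of norm delta on the segment from C.P. to N-hat
        d = eta * s_newton - s_cp
        a, b, c = d @ d, 2.0 * (s_cp @ d), s_cp @ s_cp - delta ** 2
        lam = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
        return s_cp + lam * d

For example, with g = (1, 2)^T, H = diag(1, 2), and delta = 1, the Cauchy step already has length greater than delta, so the sketch returns the steepest-descent step of length 1.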
5. Let f(x) = x^2. Show that the infinite sequence of points x_i = 1 + 2^(-i), i = 0, 1, ..., is permitted by (6.3.3) for any α < 1/2 but is prohibited by (6.3.4) for any β > 0.

6. Show that if f(x) is a positive definite quadratic, the Newton step from any x_k ∈ R^n satisfies condition (6.3.3) for α < 1/2 and (6.3.4) for any β > 0.

7. Prove that if (6.3.4) is replaced by

       f(x_+) ≥ f(x_c) + β ∇f(x_c)^T (x_+ - x_c),

   then Theorems 6.3.2 and 6.3.3 remain true; and that if in addition β > 1/2, then Theorem 6.3.4 also remains true. This is Goldstein's original condition.

8. Let f(x) = x_1^2 + x_2^2, x_c = (1, 1)^T, and let the search direction p_c be determined by the Newton step. What will x_+ be, using:
   (a) the Newton step?
   (b) line-search Algorithm A6.3.1?
   (c) a "perfect line-search" algorithm that sets x_+ to the minimizer of f(x) in the direction p_c from x_c? [Answer (c) approximately.]

9. Let f(x) = (1/2)x_1^2 + x_2^2, x_c = (1, 1)^T. What is the exact solution to (6.4.1) if δ = 2? if δ = 5/6? [Hint: Try μ = 1 for the second part.]

10. Prove that the locally constrained optimal curve s(μ) in Figure 6.4.2 is not necessarily planar by constructing a simple three-dimensional example in which x_c and some three points on the curve are not coplanar. [f(x) = x_1^2 + 2x_2^2 + 3x_3^2 will do.]

11. Let H ∈ R^(n×n) be symmetric and positive definite, and let v_1, ..., v_n be an orthonormal basis of eigenvectors for H with corresponding eigenvalues λ_1, ..., λ_n. Show that for μ ≥ 0,

        ||(H + μI)^(-1) g||_2 = [ Σ_{i=1}^{n} (v_i^T g)^2 / (λ_i + μ)^2 ]^(1/2).

    How does this relate to the model m_c(μ) for ||(H + μI)^(-1) g||_2 - δ_c used in the locally constrained optimal algorithm?

12. Let H, g be given by Exercise 11, and define s(μ) = -(H + μI)^(-1) g for μ ≥ 0 and η(μ) = ||s(μ)||_2. Using the techniques of Exercise 11, show that (d/dμ) η(μ) < 0 for all μ ≥ 0. [You can also show that (d^2/dμ^2) η(μ) > 0 for all μ ≥ 0 using these techniques, but the proof is tedious.]

13. Let f(x) = (1/2)x_1^2 + x_2^2, x_0 = (1, 1)^T, g = ∇f(x_0), H = ∇^2 f(x_0). Calculate the Cauchy point of f from x_0 (see Subsection 6.4.2), and the point N̂ = x_0 - (0.8γ + 0.2)H^(-1) g as used in our double dogleg algorithm. Then graph the double dogleg curve and show graphically the values of x_1 if δ = 1; if δ = … .

14. Let H ∈ R^(n×n) be symmetric and positive definite, and let g ∈ R^n. Prove that (g^T g)^2 ≤ (g^T H g)(g^T H^(-1) g). [Hint: Let u = H^(1/2) g, v = H^(-1/2) g, and apply the Cauchy-Schwarz inequality.]

15. Complete Example 6.4.5.

16. Let F(x) = (x_1, 2x_2)^T, x_0 = (1, 1)^T. Using the techniques of Section 6.5, what is the "steepest-descent" step from x_0 for F = 0? If all your steps were in steepest-descent directions, what rate of convergence to the root of F would you expect?

17. Let F: R^n -> R^n be continuously differentiable and x ∈ R^n. Suppose x is a local minimizer of f(x) ≜ (1/2)F(x)^T F(x) but F(x) ≠ 0. Is J(x) singular? If J(x) is singular, must x be a local minimizer of f(x)?

18. An alternative quadratic model for F = 0 to the one used in the global algorithms of Section 6.5 is the first three terms of the Taylor series of (1/2)||F(x_c + s)||_2^2. Show that this model is

        m_c(x_c + s) = (1/2)F_c^T F_c + (J_c^T F_c)^T s + (1/2) s^T [ J_c^T J_c + Σ_{i=1}^{n} f_i(x_c) ∇^2 f_i(x_c) ] s,

    where f_i denotes the ith component function of F. How does this compare with the model used in Section 6.5? Is the Newton step for minimizing m_c(x) the same as the Newton step for F(x) = 0? Comment on the attractiveness of this model versus the one used in Section 6.5. (An analogous situation with different conclusions occurs in the solution of nonlinear least-squares problems—see Chapter 10.)

19. Why is the Newton step from x_0 in Example 6.5.1 bad? [Hint: What would happen if x_0 were changed to (2, e/6)^T ≅ (2, 0.45)^T?]

20. What step would be taken from x_0 in Example 6.5.1 using the double dogleg strategy (Algorithm A6.4.4) with δ_0 = 3?

21. Find out what the conjugate gradient algorithm is for minimizing convex quadratic functions, and show that the dogleg with N̂ = x_+^N is the conjugate gradient method applied to m_c(x) on the subspace spanned by the steepest-descent and Newton directions.

22. One of the disadvantages often cited for the dogleg algorithm is that its steps are restricted to the two-dimensional subspace spanned by the steepest-descent and Newton directions at each step. Suggest ways to alleviate this criticism, based on Exercise 21. [See Steihaug (1981).]

23. Let J ∈ R^(n×n) be singular. Show that … [Hint: Use (3.1.13) and Theorem 3.5.7.] Generalize this inequality to the case when κ_2(J) > macheps^(-1/2).

24. Let F ∈ R^n be nonzero and J ∈ R^(n×n) be singular. Another step that has been suggested for the nonlinear equations problem is the s that solves

        min_{s ∈ R^n} ||J s + F||_2.                     (6.6.1)

    It was shown in Section 3.6 that the solution to (6.6.1) is s = -J^+ F, where J^+ is the pseudoinverse of J. Show that this step is similar to s = -(J^T J + αI)^(-1) J^T F, where α > 0 is small, by proving:
    (a) lim_{α -> 0+} (J^T J + αI)^(-1) J^T = J^+;
    (b) for any α > 0 and v ∈ R^n, both (J^T J + αI)^(-1) J^T v and J^+ v are perpendicular to all vectors w in the null space of J.
    (A small numerical check of (a) and (b) is sketched after these exercises.)

25. The global convergence of the line search and trust region algorithms of Subsections 6.3.2, 6.4.1, and 6.4.2 does not follow directly from the theory in Subsection 6.3.1, because none of the algorithms implements condition (6.3.4). However, the bounds in the algorithms on the amount of each adjustment in the line-search parameter λ or the trust region δ_c have the same theoretical effect. For a proof that all of our algorithms are globally convergent, see Shultz, Schnabel, and Byrd (1982).
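Parts (a) and (b) of Exercise 24 are easy to observe numerically. The following sketch is a small Python/NumPy check; the singular matrix J, the vector F, and the null-space vector w are arbitrary illustrative choices, not taken from the text. As α decreases, -(J^T J + αI)^(-1) J^T F approaches the pseudoinverse step -J^+ F, and both steps are orthogonal to the null space of J.

    import numpy as np

    # A rank-one (hence singular) Jacobian and an arbitrary residual vector.
    J = np.array([[1.0, 2.0],
                  [2.0, 4.0]])
    F = np.array([1.0, -1.0])

    s_pinv = -np.linalg.pinv(J) @ F          # s = -J^+ F

    # (a): the regularized step converges to the pseudoinverse step as alpha -> 0+.
    for alpha in (1e-2, 1e-4, 1e-6, 1e-8):
        s_reg = -np.linalg.solve(J.T @ J + alpha * np.eye(2), J.T @ F)
        print(alpha, np.linalg.norm(s_reg - s_pinv))

    # (b): both steps are orthogonal to the null space of J, spanned here by w = (2, -1)^T.
    w = np.array([2.0, -1.0])
    print(abs(s_pinv @ w))                   # approximately zero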
    Finite-Difference Newton's Method      Secant Method (x_1 chosen by f.d.N.M.)
    x_0   2                                2
    x_1   1.2500000266453                  1.2500000266453
    x_2   1.0250000179057                  1.0769230844910
    x_3   1.0003048001120                  1.0082644643823
    x_4   1.0000000464701                  1.0003048781354
    x_5   1.0                              1.0000012544523
    x_6                                    1.0000000001912

          f_1(x) = x^2 - 1                 f_2(x) = x^2 - 2x + 1
    x_0   2                                2
    x_1   1.25                             1.5
    x_2   1.025                            1.25
    x_3   1.0003048780488                  1.125
    x_4   1.0000000464611                  1.0625
    x_5   1.0                              1.03125

Figure 2.4.1  Newton's method.
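The behavior recorded in the tables above is easy to reproduce. The following sketch is a minimal Python rendering, not one of the book's algorithms: it applies Newton's method and the secant method to f(x) = x^2 - 1 from x_0 = 2, seeding the secant iteration's x_1 with the finite-difference Newton value shown in the first table.

    # Newton's method and the secant method for f(x) = x**2 - 1, x0 = 2.
    def f(x):
        return x * x - 1.0

    def fprime(x):
        return 2.0 * x

    # Newton's method: x_{k+1} = x_k - f(x_k) / f'(x_k).
    x = 2.0
    for k in range(1, 6):
        x = x - f(x) / fprime(x)
        print("Newton  x_%d = %.13f" % (k, x))

    # Secant method: f'(x_k) is replaced by the slope through the last two iterates.
    x_prev, x = 2.0, 1.2500000266453          # x_1 taken from the table above
    for k in range(2, 7):
        slope = (f(x) - f(x_prev)) / (x - x_prev)
        x_prev, x = x, x - f(x) / slope
        print("Secant  x_%d = %.13f" % (k, x))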