
Numerical Methods for Unconstrained Optimization and Nonlinear Equations (part 2)


7 Stopping, Scaling, and Testing

In this chapter we discuss three issues that are peripheral to the basic mathematical considerations in the solution of nonlinear equations and minimization problems, but essential to the computer solution of actual problems. The first is how to adjust for problems that are badly scaled, in the sense that the dependent or independent variables are of widely differing magnitudes. The second is how to determine when to stop the iterative algorithms in finite-precision arithmetic. The third is how to debug, test, and compare nonlinear algorithms.

7.1 SCALING

An important consideration in solving many "real-world" problems is that some dependent or independent variables may vary greatly in magnitude. For example, we might have a minimization problem in which the first independent variable, $x_1$, is in the range $[10^2, 10^3]$ meters and the second, $x_2$, is in the range $[10^{-7}, 10^{-6}]$ seconds. These ranges are referred to as the scales of the respective variables. In this section we consider the effect of such widely disparate scales on our algorithms.

One place where scaling will affect our algorithms is in calculating terms such as $\|x_+ - x_c\|_2$, which we used in our algorithms in Chapter 6. In the above example, any such calculation will virtually ignore the second (time) variable. However, there is an obvious remedy: rescale the independent variables; that is, change their units. For example, if we change the units of $x_1$ to kilometers and $x_2$ to microseconds, then both variables will have range $[10^{-1}, 1]$ and the scaling problem in computing $\|x_+ - x_c\|_2$ will be eliminated. Notice that this corresponds to changing the independent variable to $\hat{x} = D_x x$, where $D_x$ is the diagonal scaling matrix

$$D_x = \begin{pmatrix} 10^{-3} & 0 \\ 0 & 10^{6} \end{pmatrix}.$$

This leads to an important question. Say we transform the units of our problem to $\hat{x} = D_x x$, or more generally, transform the variable space to $\hat{x} = Tx$, where $T \in \mathbb{R}^{n \times n}$ is nonsingular, calculate our global step in the new variable space, and then transform back. Will the resultant step be the same as if we had calculated it using the same globalizing strategy in the old variable space? The surprising answer is that the Newton step is unaffected by this transformation but the steepest-descent direction is changed, so that a line-search step in the Newton direction is unaffected by a change in units, but a trust region step may be changed.

To see this, consider the minimization problem, and let us define $\hat{x} = Tx$ and $\hat{f}(\hat{x}) = f(T^{-1}\hat{x}) = f(x)$. Then it is easily shown that

$$\nabla \hat{f}(\hat{x}) = T^{-T}\,\nabla f(x), \qquad \nabla^2 \hat{f}(\hat{x}) = T^{-T}\,\nabla^2 f(x)\,T^{-1},$$

so that the Newton step and steepest-descent direction in the new variable space are

$$\hat{s}_N = -\bigl(T^{-T}\,\nabla^2 f(x)\,T^{-1}\bigr)^{-1}\,T^{-T}\,\nabla f(x) = T s_N, \qquad \hat{s}_{SD} = -T^{-T}\,\nabla f(x),$$

or, in the old variable space,

$$T^{-1}\hat{s}_N = s_N = -\nabla^2 f(x)^{-1}\,\nabla f(x), \qquad T^{-1}\hat{s}_{SD} = -(T^T T)^{-1}\,\nabla f(x).$$

These conclusions are really common sense. The Newton step goes to the lowest point of a quadratic model, which is unaffected by a change in units of $x$. (The Newton direction for systems of nonlinear equations is similarly unchanged by transforming the independent variable.)
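The invariance of the Newton step, and the non-invariance of the steepest-descent direction, is easy to verify numerically. The following sketch is not from the book; it checks the identities above on a small quadratic $f(x) = \frac{1}{2}x^T A x - b^T x$ with an arbitrary diagonal $T$ (all names are illustrative):

```python
import numpy as np

# Check: compute the Newton and steepest-descent steps directly, then again
# in the transformed space xhat = T x, mapping the results back by T^{-1}.

A = np.array([[2.0, 0.3], [0.3, 1.0]])   # Hessian of f (symmetric positive definite)
b = np.array([1.0, 1.0])
x = np.array([3.0, -2.0])                # current iterate x_c

grad = A @ x - b                         # gradient of f at x
T = np.diag([0.1, 10.0])                 # a (mild) diagonal change of units
Tinv = np.diag(1.0 / np.diag(T))

# Steps computed in the original variable space.
s_newton = -np.linalg.solve(A, grad)
s_sd = -grad

# Steps computed in the transformed space, then mapped back.
grad_hat = Tinv.T @ grad                 # grad fhat = T^{-T} grad f
hess_hat = Tinv.T @ A @ Tinv             # hess fhat = T^{-T} (hess f) T^{-1}
s_newton_back = Tinv @ -np.linalg.solve(hess_hat, grad_hat)
s_sd_back = Tinv @ -grad_hat             # = -(T^T T)^{-1} grad f

print(np.allclose(s_newton, s_newton_back))  # True: Newton step unchanged
print(np.allclose(s_sd, s_sd_back))          # False: steepest descent changes
```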
However, determining which direction is "steepest" depends on what is considered a unit step in each direction. The steepest-descent direction makes the most sense if a step of one unit in variable direction $x_i$ has about the same relative length as a step of one unit in any other variable direction $x_j$.

For these reasons, we believe the preferred solution to scaling problems is for the user to choose the units of the variable space so that each component of $x$ will have roughly the same magnitude. However, if this is troublesome, the equivalent effect can be achieved by a transformation in the algorithm of the variable space by a corresponding diagonal scaling matrix $D_x$. This is the scaling strategy on the independent variable space that is implemented in our algorithms. All the user has to do is set $D_x$ to correspond to the desired change in units, and then the algorithms operate as if they were working in the transformed variable space. The algorithms are still written in the original variable space, so an expression like $\|x_+ - x_c\|_2$ becomes $\|D_x(x_+ - x_c)\|_2$, and the steepest-descent and hook steps become, respectively,

$$-\lambda D_x^{-2}\,\nabla f(x_c) \qquad \text{and} \qquad -\bigl(\nabla^2 f(x_c) + \mu D_x^2\bigr)^{-1}\,\nabla f(x_c)$$

(see Exercise 3). The Newton direction is unchanged, however, as we have seen.

The positive diagonal scaling matrix $D_x$ is specified by the user on input by simply supplying $n$ values $\mathrm{typx}_i$, $i = 1, \ldots, n$, giving "typical" magnitudes of each $x_i$. Then the algorithm sets $(D_x)_{ii} = (\mathrm{typx}_i)^{-1}$, making the magnitude of each transformed variable $\hat{x}_i = (D_x)_{ii}\,x_i$ about 1. For instance, if the user inputs $\mathrm{typx}_1 = 10^3$ and $\mathrm{typx}_2 = 10^{-6}$ in our example, then

$$D_x = \begin{pmatrix} 10^{-3} & 0 \\ 0 & 10^{6} \end{pmatrix}. \qquad (7.1.1)$$

If no scaling of $x_i$ is considered necessary, $\mathrm{typx}_i$ should be set to 1. Further instructions for choosing $\mathrm{typx}_i$ are given in the guidelines in the appendix. Naturally, our algorithms do not store the diagonal matrix $D_x$, but rather a vector $S_x$ ($S$ stands for scale), where $(S_x)_i = (D_x)_{ii} = (\mathrm{typx}_i)^{-1}$.

The above scaling strategy is not always sufficient; for example, there are rare cases that need dynamic scaling, because some $x_i$ varies by many orders of magnitude. This corresponds to using $D_x$ exactly as in all our algorithms, but recalculating it periodically. Since there is little experience along these lines, we have not included dynamic scaling in our algorithms, although we would need only to add a module to periodically recalculate $D_x$ at the conclusion of an iteration of Algorithm D6.1.1 or D6.1.3.
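A minimal sketch of this storage convention, with hypothetical helper names (the book's actual implementation is the Algorithm D* modules in its appendix):

```python
import numpy as np

# D_x is never stored as a matrix; only the vector S_x with
# (S_x)_i = (D_x)_ii = 1 / typx_i is kept, and scaled quantities use it directly.

def make_scale(typx):
    """S_x from the user-supplied typical magnitudes typx_i."""
    return 1.0 / np.asarray(typx, dtype=float)

def scaled_step_norm(x_plus, x_c, sx):
    """|| D_x (x_+ - x_c) ||_2, the step length measured in scaled variables."""
    return np.linalg.norm(sx * (x_plus - x_c))

def scaled_steepest_descent(grad, sx):
    """Scaled steepest-descent direction -D_x^{-2} grad f(x_c), up to the length lambda."""
    return -grad / sx**2

# The example from the text: x_1 ~ 10^3 (meters), x_2 ~ 10^-6 (seconds).
sx = make_scale([1e3, 1e-6])              # S_x = (10^-3, 10^6)
x_c = np.array([500.0, 4e-7])
x_plus = np.array([600.0, 9e-7])
print(scaled_step_norm(x_plus, x_c, sx))  # both variables now contribute comparably
```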
An example illustrating the importance of considering the scale of the independent variables is given below.

EXAMPLE 7.1.1 A common test problem for minimization algorithms is the Rosenbrock banana function

$$f(x) = 100\,(x_2 - x_1^2)^2 + (1 - x_1)^2, \qquad (7.1.2)$$

which has its minimum at $x_* = (1, 1)^T$. Two typical starting points are $x_0 = (-1.2, 1)^T$ and $x_0 = (6.39, -0.221)^T$. This problem is well scaled, but if $\alpha \neq 1$, then the scale can be made worse by substituting $\alpha x_1$ for $x_1$ and $x_2/\alpha$ for $x_2$ in (7.1.2), giving

$$\hat{f}(x) = 100\,\bigl(x_2/\alpha - \alpha^2 x_1^2\bigr)^2 + (1 - \alpha x_1)^2.$$

This corresponds to the transformation

$$\hat{x} = \begin{pmatrix} 1/\alpha & 0 \\ 0 & \alpha \end{pmatrix} x.$$

If we run the minimization algorithms found in the appendix on $\hat{f}(x)$, starting from $x_0 = (-1.2/\alpha, \alpha)^T$ and $x_0 = (6.39/\alpha, -0.221\alpha)^T$, use exact derivatives, the "hook" globalizing step, and the default tolerances, and neglect the scale by setting $\mathrm{typx}_1 = \mathrm{typx}_2 = 1$, then the numbers of iterations required for convergence with various values of $\alpha$ are as follows (an asterisk indicates failure to converge after 150 iterations):

    alpha                                                 0.01     0.1     1     10     100
    Iterations from x_0 = (-1.2/alpha, alpha)^T           150+*    94      24    52     150+*
    Iterations from x_0 = (6.39/alpha, -0.221 alpha)^T    150+*    47      29    48     150+*

However, if we set $\mathrm{typx}_1 = 1/\alpha$, $\mathrm{typx}_2 = \alpha$, ...
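The experiment in Example 7.1.1 is easy to approximate with generic tools. The sketch below is not the book's hook-step code; as a rough stand-in it uses SciPy's trust-region Newton method (`trust-exact`) with exact derivatives on $\hat{f}$, so the iteration counts will differ from the table above, but the qualitative sensitivity to $\alpha$ is the same:

```python
import numpy as np
from scipy.optimize import minimize

# The badly scaled Rosenbrock function of Example 7.1.1 and its derivatives.

def fhat(x, a):
    return 100.0 * (x[1] / a - (a * x[0]) ** 2) ** 2 + (1.0 - a * x[0]) ** 2

def grad_fhat(x, a):
    t = x[1] / a - (a * x[0]) ** 2
    return np.array([-400.0 * t * a**2 * x[0] - 2.0 * a * (1.0 - a * x[0]),
                     200.0 * t / a])

def hess_fhat(x, a):
    t = x[1] / a - (a * x[0]) ** 2
    return np.array([[-400.0 * a**2 * t + 800.0 * a**4 * x[0] ** 2 + 2.0 * a**2,
                      -400.0 * a * x[0]],
                     [-400.0 * a * x[0], 200.0 / a**2]])

for a in [0.01, 0.1, 1.0, 10.0, 100.0]:
    x0 = np.array([-1.2 / a, a])         # the first starting point, transformed
    res = minimize(fhat, x0, args=(a,), jac=grad_fhat, hess=hess_fhat,
                   method='trust-exact')
    print(f"alpha={a:7g}: iterations={res.nit}, f={res.fun:.2e}")
```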


Table of Contents

    About Classics in Applied Mathematics

    Preface to the Classics Edition

    2. Nonlinear Problems in One Variable

    3. Numerical Linear Algebra Background

    5. Newton's Method for Nonlinear Equations and Unconstrained Minimization

    6. Globally Convergent Modifications of Newton's Method

    7. Stopping, Scaling, and Testing

    8. Secant Methods for Systems of Nonlinear Equations

    9. Secant Methods for Unconstrained Minimization

    11. Methods for Problems with Special Structure
