VIETNAM NATIONAL UNIVERSITY
UNIVERSITY OF SCIENCE
FACULTY OF MATHEMATICS, MECHANICS AND INFORMATICS

Nguyen Huu Thinh

NONDIFFERENTIABLE OPTIMIZATION METHODS

Undergraduate Thesis
Advanced Undergraduate Program in Mathematics

Thesis advisor: Prof. Dr.Sc. Nguyen Dong Yen

Hanoi - 2012

INTRODUCTION

Throughout human history, people have always wondered how to choose the "best" decision or how to solve a difficult problem. Optimization, a branch of applied mathematics, aims at helping people find the best elements of given sets in effective ways. In recent years, due to the rapid progress in computer technology, including the availability of high-speed processors and artificial neural networks, optimization has played an increasingly important role in engineering and economics.

Historically, in his Ph.D. thesis (1964), N. Z. Shor proposed the subgradient method, which was applied to linear programming. Later, the approach was developed by other mathematicians and significant results were obtained.

The purpose of this diploma thesis is to present, in a unified way, some fundamental facts on subgradient methods for solving convex minimization problems which are available in the monographs of B. T. Polyak [2] and A. P. Ruszczynski [4]. Proposition 2.1 and the proof of Lemma 1.5 are new. Theorem 2.2, which reformulates Theorem 7.2 from [4] with an additional assumption on the scaling coefficients, is also a new result in some sense. In fact, the proof of [4, Theorem 7.2] is not fully correct: Remark 2.3 explains clearly the reason for the failure of a crucial argument at [4, pp. 346-347]. In parallel with correcting the noted error in the proof of [4, Theorem 7.2], we also propose several detailed arguments for the sake of completeness of the proof. Several refinements have been introduced to the difficult proof of Theorem 2.3, which is Theorem 7.3 from [4].

The thesis has three chapters. Chapter 1, "Elements of Convex Analysis", reviews basic definitions, notations, theorems, and lemmas from convex analysis which are used frequently in the sequel. Chapter 2, "Subgradient Methods", describes several versions of the subgradient method and proves the main convergence theorems. Chapter 3, "Conclusions", presents several concluding remarks.

The author would like to thank Professor Nguyen Dong Yen for his guidance and encouragement. It is my great pleasure to express my gratitude to all the lecturers and staff members of the Faculty of Mathematics, Mechanics and Informatics for their nice teaching and kind help. I would also like to thank all the members of the Advanced Mathematics class K53 for their friendship. Finally, I wish to thank my parents for their love and encouragement.
Contents

Introduction
1  Elements of Convex Analysis
   1.1  Convex Sets
   1.2  Projection
   1.3  Separation Theorem
   1.4  Convex Functions
   1.5  Derivatives
   1.6  Subdifferential
   1.7  Lagrangian Duality
2  Subgradient Methods
   2.1  Shor's Method
   2.2  Other Subgradient Methods
   2.3  Known Optimal Value
   2.4  Problems with Geometric Constraints
3  Conclusions
Bibliography

Chapter 1. Elements of Convex Analysis

Convexity of sets and functions is an important concept in mathematics and mathematical economics. This chapter reviews some basic properties of convex sets and functions in Euclidean spaces.

1.1 Convex Sets

A set is convex if it contains the line segment connecting any two of its points. Formally, the notion can be formulated as follows.

Definition 1.1. A set $C \subset \mathbb{R}^n$ is called convex if, for all $x^1 \in C$, $x^2 \in C$, and $0 < \alpha < 1$, the point $(1-\alpha)x^1 + \alpha x^2$ belongs to $C$.

A point $x$ is called a convex combination of points $x^1, \dots, x^m$ from $\mathbb{R}^n$ if there exist $\alpha_1 \ge 0, \dots, \alpha_m \ge 0$ with $\sum_{i=1}^{m} \alpha_i = 1$ such that $x = \alpha_1 x^1 + \alpha_2 x^2 + \cdots + \alpha_m x^m$.

The convex hull of a set $X \subset \mathbb{R}^n$, denoted by $\operatorname{conv} X$, is the intersection of all convex sets containing $X$. By definition, $\operatorname{conv} X$ is the smallest convex set containing $X$. It can be proved that $\operatorname{conv} X$ consists of all the convex combinations of points from $X$.

1.2 Projection

Consider a nonempty closed convex set $V \subset \mathbb{R}^n$ and a point $x \in \mathbb{R}^n$.

Theorem 1.1 (See [2, pp. 20-21]). If the set $V \subset \mathbb{R}^n$ is nonempty, convex, and closed, then for every $x \in \mathbb{R}^n$ there exists exactly one point $z \in V$ that is closest to $x$.

Definition 1.2. We call the point of $V$ that is closest to $x$ the projection of $x$ on $V$ and we denote it by $P_V(x)$, i.e., $P_V(x) = \{z \in V \mid \|z - x\| = \inf_{y \in V} \|y - x\|\}$.

Lemma 1.1 (See [4, p. 21]). Assume that $V \subset \mathbb{R}^n$ is a nonempty closed convex set and let $x \in \mathbb{R}^n$. Then $z = P_V(x)$ if and only if $z \in V$ and
$\langle v - z,\, x - z \rangle \le 0, \quad \forall v \in V.$   (1.1)

Theorem 1.2 (See [4, p. 23]). Assume that $V \subset \mathbb{R}^n$ is a nonempty closed convex set. Then for all $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^n$ we have $\|P_V(x) - P_V(y)\| \le \|x - y\|$.
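The projection operator and its characterization are easy to check numerically. The following Python sketch (an illustration added alongside the text, not part of the thesis) projects points onto the closed unit ball and onto a box, and tests the variational inequality (1.1) and the nonexpansiveness property of Theorem 1.2 at randomly sampled points; the helper names `project_ball` and `project_box` are chosen here only for illustration.

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Projection of x onto the closed Euclidean ball of the given radius."""
    nx = np.linalg.norm(x)
    return x if nx <= radius else (radius / nx) * x

def project_box(x, lo, hi):
    """Projection of x onto the box {y : lo <= y <= hi} (componentwise clipping)."""
    return np.clip(x, lo, hi)

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), rng.normal(size=3)
lo, hi = -np.ones(3), np.ones(3)

for P in (lambda v: project_ball(v), lambda v: project_box(v, lo, hi)):
    z = P(x)
    # Characterization (1.1): <v - z, x - z> <= 0 for all v in V (checked on sampled points of V).
    samples = [P(rng.normal(size=3)) for _ in range(100)]
    assert all(np.dot(v - z, x - z) <= 1e-9 for v in samples)
    # Theorem 1.2 (nonexpansiveness): ||P(x) - P(y)|| <= ||x - y||.
    assert np.linalg.norm(P(x) - P(y)) <= np.linalg.norm(x - y) + 1e-12
print("projection checks passed")
```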
1.3 Separation Theorem

The idea of the separation theorem: a closed convex set and a point outside the set can be separated strictly by a hyperplane.

Theorem 1.3 (See [4, p. 24]). Let $X \subset \mathbb{R}^n$ be a closed convex set and let $x \notin X$. Then there exist a nonzero $y \in \mathbb{R}^n$ and $\varepsilon > 0$ such that
$\langle y, v \rangle \le \langle y, x \rangle - \varepsilon, \quad \forall v \in X.$   (1.2)

Definition 1.3. Two sets $X$ and $Y$ in $\mathbb{R}^n$ are called separable if there is a hyperplane separating them, i.e., there exist $\alpha \in \mathbb{R}$ and a vector $a \in \mathbb{R}^n \setminus \{0\}$ such that $\langle a, x \rangle \ge \alpha$ for all $x \in X$ and $\langle a, x \rangle \le \alpha$ for all $x \in Y$. Two sets $X$ and $Y$ are strictly separable if there are $a \in \mathbb{R}^n \setminus \{0\}$ and $\alpha_1 > \alpha_2$ such that $\langle a, x \rangle \ge \alpha_1$ for all $x \in X$ and $\langle a, x \rangle < \alpha_2$ for all $x \in Y$.

1.4 Convex Functions

It is convenient to consider functions which may take, in addition to real values, the two special values $-\infty$ and $+\infty$. The real line extended by these special values will be denoted by $\overline{\mathbb{R}}$. With every function $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ we can associate the domain
$\operatorname{dom} f := \{x \mid f(x) < +\infty\}$
and the epigraph
$\operatorname{epi} f := \{(x, \alpha) \in \mathbb{R}^n \times \mathbb{R} \mid \alpha \ge f(x)\}.$

Definition 1.4. A function $f$ is called convex if $\operatorname{epi} f$ is a convex set. A function $f$ is called concave if $-f$ is convex.

Definition 1.5. A function $f$ is called proper if $f(x) > -\infty$ for all $x$ and $f(x) < +\infty$ for at least one $x$.

Lemma 1.2 (See [4, pp. 44-45]). A function $f$ is convex if and only if, for all $x^1, x^2 \in \operatorname{dom} f$ and for all $0 \le \alpha \le 1$, the Jensen inequality
$f((1-\alpha)x^1 + \alpha x^2) \le (1-\alpha) f(x^1) + \alpha f(x^2)$   (1.3)
holds.

Lemma 1.3 (See [4, p. 46]). If $f$ is convex, then $\operatorname{dom} f$ is a convex set.

The epigraph of a function can be used to characterize its continuity properties and its convexity.

Definition 1.6. A function $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ is called lower semicontinuous if for every convergent sequence of points $\{x^k\}$ we have
$f\big(\lim_{k\to\infty} x^k\big) \le \liminf_{k\to\infty} f(x^k).$

Definition 1.7. Let $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ and $\alpha \in \mathbb{R}$. The level set corresponding to the value $\alpha$ is
$M_\alpha = \{x \in \mathbb{R}^n \mid f(x) \le \alpha\}.$

Lemma 1.4 (See [2, p. 48]). If $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ is convex then, for each $\beta \in \mathbb{R}$, the set
$M_\beta = \{x \in \mathbb{R}^n \mid f(x) \le \beta\}$   (1.4)
is convex. If, in addition, $f$ is lower semicontinuous, then the set $M_\beta$ is closed for all $\beta$.

We now prove a useful lemma, which was given in [4, p. 86] as Exercise 2.12 without proof.

Lemma 1.5. If a proper convex function $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ has a nonempty and bounded level set, then all level sets of $f$ are bounded.

Proof. We argue by contradiction. Assume that there exists $\alpha \in \mathbb{R}$ such that $M_\alpha$ is unbounded. Since $f$ is convex, the level set $M_\alpha$ is convex. Obviously, $M_\alpha \subset \operatorname{dom} f$. By our assumption, there is $\beta \in \mathbb{R}$ such that $M_\beta$ is nonempty and bounded. It is clear that $\beta < \alpha$; so $M_\beta \subset M_\alpha$. Since $M_\alpha$ is unbounded, the recession cone of $M_\alpha$ is nontrivial by [3, Theorem 8.4, p. 64]. So there is a vector $v \in \mathbb{R}^n$ with $\|v\| = 1$ such that, for any $x \in M_\alpha$ and $t > 0$, the vector $x_t := x + tv$ lies in $M_\alpha$. Fix a point $x^0 \in M_\beta$ and let $t_1 > 0$ be such that $x_{t_1} := x^0 + t_1 v$ belongs to $M_\alpha \setminus M_\beta$. Then we have $\alpha \ge f(x_{t_1}) > \beta \ge f(x^0)$; thus
$\dfrac{f(x_{t_1}) - f(x^0)}{t_1} > 0.$
Setting $x_t := x^0 + tv \in M_\alpha$, we see that
$\dfrac{f(x_t) - f(x^0)}{t} \le \dfrac{\alpha - f(x^0)}{t},$
where $t^{-1}[\alpha - f(x^0)] \to 0$ as $t \to +\infty$. Consequently,
$\dfrac{f(x_{t_1}) - f(x^0)}{t_1} > 0 \ge \limsup_{t\to+\infty} \dfrac{f(x_t) - f(x^0)}{t}.$
So, for $t > 0$ large enough, it holds that
$\dfrac{f(x_{t_1}) - f(x^0)}{t_1} > \dfrac{f(x_t) - f(x^0)}{t}.$
Then we have
$\dfrac{t_1}{t}\,[f(x^0 + tv) - f(x^0)] < f(x^0 + t_1 v) - f(x^0),$
or
$\dfrac{t_1}{t}\, f(x^0 + tv) + \Big(1 - \dfrac{t_1}{t}\Big) f(x^0) < f\Big(\dfrac{t_1}{t}(x^0 + tv) + \Big(1 - \dfrac{t_1}{t}\Big)x^0\Big),$
for $t > 0$ sufficiently large. This contradicts the Jensen inequality, which should hold due to the convexity of $f$. The proof is complete. ✷

1.5 Derivatives

We now recall some facts related to derivatives of a proper, extended-real-valued function $f : \mathbb{R}^n \to \overline{\mathbb{R}}$. Directional differentiability of convex functions is also considered.

Definition 1.8. The gradient of $f$ at a point $x$ from the interior of $\operatorname{dom} f$, denoted by $\operatorname{int}(\operatorname{dom} f)$, is a vector $\nabla f(x) \in \mathbb{R}^n$ satisfying the relation
$f(x + y) = f(x) + \langle \nabla f(x), y \rangle + o(y)$ for all $y \in \mathbb{R}^n$,
where $\|y\|^{-1} o(y) \to 0$ as $y \to 0$. The gradient of $f$ at $x$, if it exists, is uniquely defined, and we have
$\nabla f(x) = \Big(\dfrac{\partial f(x)}{\partial x_1}, \dfrac{\partial f(x)}{\partial x_2}, \dots, \dfrac{\partial f(x)}{\partial x_n}\Big)^{\top},$
where $\partial f(x)/\partial x_i$ stands for the partial derivative of $f$ with respect to the variable $x_i$ at $x$. In that case, we say that $f$ is Fréchet differentiable at $x$.

Definition 1.9. If the functions $\partial f(\cdot)/\partial x_i$, $i = 1, \dots, n$, are Fréchet differentiable at $x \in \operatorname{int}(\operatorname{dom} f)$, then $f$ is said to be twice differentiable at $x$.

Example 1.1. Let $f(x)$ be a linear-quadratic function, i.e., $f(x) = \frac{1}{2}\langle Ax, x \rangle - \langle b, x \rangle$, where $A$ is a symmetric $n \times n$ matrix and $b \in \mathbb{R}^n$. Then the gradient of $f$ at $x$ is computed by the formula $\nabla f(x) = Ax - b$.
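Example 1.1 can be verified numerically. The sketch below (added here for illustration; the thesis itself contains no code) compares the formula $\nabla f(x) = Ax - b$ with central finite differences for a randomly generated symmetric matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
M = rng.normal(size=(n, n))
A = (M + M.T) / 2                      # a symmetric n x n matrix
b = rng.normal(size=n)

f = lambda x: 0.5 * x @ (A @ x) - b @ x
grad = lambda x: A @ x - b             # the formula of Example 1.1

x = rng.normal(size=n)
h = 1e-6
fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(n)])
print(np.max(np.abs(fd - grad(x))))    # small: only finite-difference rounding error remains
```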
[...]

1.6 Subdifferential

[...] may not be Fréchet differentiable. For instance, the Euclidean norm function $f(x) = \|x\|$, $x \in \mathbb{R}^n$, $n \ge 1$, is nondifferentiable at $0$. The maximum function
$f(x) = \max\{ f_i(x) \mid i \in I \},$   (1.7)
where $f_i : \mathbb{R}^n \to \mathbb{R}$, $i \in I := \{1, \dots, m\}$, are differentiable convex functions, is generally nondifferentiable. The notion of subdifferential for convex functions is a counterpart of the notion of derivative [...]

[...] to $\|g\| \le 1$. Optimality conditions for unconstrained convex minimization problems can be described as follows.

Theorem 1.7. Let $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ be a proper convex function. A point $\bar{x} \in \mathbb{R}^n$ is a solution of the optimization problem
$\min\{ f(x) \mid x \in \mathbb{R}^n \}$   (1.12)
if and only if
$0 \in \partial f(\bar{x}).$   (1.13)

Proof. If $\bar{x}$ is a solution of (1.12), then $f(\bar{x} + y) \ge f(\bar{x}) + \langle 0, y \rangle$ for all $y \in \mathbb{R}^n$. That means $0$ is a subgradient [...]

[...] constraints can be formulated as follows.

Theorem 1.8. Let $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ be a proper convex function. Let $C \subset \mathbb{R}^n$ be a convex set such that $C \cap \operatorname{int}(\operatorname{dom} f) \ne \emptyset$ or $\operatorname{int} C \cap \operatorname{dom} f \ne \emptyset$. A point $\bar{x} \in \mathbb{R}^n$ is a solution of the optimization problem
$\min\{ f(x) \mid x \in C \}$   (1.14)
if and only if
$0 \in \partial f(\bar{x}) + N_C(\bar{x}),$   (1.15)
where
$N_C(u) := \{ x^* \in \mathbb{R}^n \mid \langle x^*, x - u \rangle \le 0 \ \forall x \in C \}$ if $u \in C$, and $N_C(u) := \emptyset$ if $u \notin C$,
is the normal cone to $C$ at [...]

[...] $C$, the latter is equivalent to (1.15). ✷

We are going to describe optimality conditions for convex minimization problems under both geometrical and functional constraints. Consider the general nonlinear optimization problem
minimize $f(x)$
subject to $g_i(x) \le 0$, $i = 1, \dots, m$,
$h_j(x) = 0$, $j = 1, \dots, p$,   (1.17)
$x \in X_0$,
where $f : \mathbb{R}^n \to \mathbb{R}$, $g_i : \mathbb{R}^n \to \mathbb{R}$, $i = 1, \dots, m$, $h_j : \mathbb{R}^n \to \mathbb{R}$, $j = 1, \dots, p$, are given functions [...]

[...] $\hat{\mu} \in \mathbb{R}^p$ and conditions (1.18) and (1.19) are satisfied, then $\hat{x}$ is a solution of (1.17). A problem (1.17) satisfying the assumptions of Theorem 1.9, apart from Slater's condition, is said to be a convex optimization problem.

1.7 Lagrangian Duality

We now consider (1.17) without requiring its convexity. The Lagrangian of (1.17) is defined by setting
$L(x, \lambda, \mu) = f(x) + \sum_{i=1}^{m} \lambda_i g_i(x) + \sum_{j=1}^{p} \mu_j h_j(x).$
[...]

[...] A point $(x^0, (\lambda^0, \mu^0)) \in X_0 \times \Lambda_0$ is called a saddle point of the Lagrangian if, for all $x \in X_0$ and all $(\lambda, \mu) \in \Lambda_0$,
$L(x^0, \lambda, \mu) \le L(x^0, \lambda^0, \mu^0) \le L(x, \lambda^0, \mu^0).$
The primal function of the nonlinear optimization problem (1.17) is
$L_P(x) := \sup_{(\lambda,\mu)\in\Lambda_0} L(x, \lambda, \mu),$
and the corresponding dual function is
$L_D(\lambda, \mu) := \inf_{x \in X_0} L(x, \lambda, \mu).$
Under suitable conditions, problem (1.17) can be transformed [...]

[...] Lagrange multipliers $(\hat{\lambda}, \hat{\mu})$ if and only if $(\hat{x}, (\hat{\lambda}, \hat{\mu}))$ is a saddle point of the Lagrangian.
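Before turning to Chapter 2, the subdifferential material sketched above can be illustrated with a small computation. For the maximum function (1.7) with affine pieces, a standard fact of convex analysis (not spelled out in the surviving fragments) is that the gradient of any active piece is a subgradient; the Python sketch below builds such a subgradient and checks the subgradient inequality $f(y) \ge f(x) + \langle g, y - x \rangle$ at random points.

```python
import numpy as np

# Affine pieces f_i(x) = <a_i, x> + b_i; f(x) = max_i f_i(x) is a special case of (1.7).
rng = np.random.default_rng(2)
m, n = 5, 3
A = rng.normal(size=(m, n))
b = rng.normal(size=m)

def f(x):
    return np.max(A @ x + b)

def subgradient(x):
    # The gradient of any active piece (an index attaining the maximum) is a subgradient of f.
    i = int(np.argmax(A @ x + b))
    return A[i]

x = rng.normal(size=n)
g = subgradient(x)
# Subgradient inequality: f(y) >= f(x) + <g, y - x> for all y (checked at random samples).
for _ in range(1000):
    y = rng.normal(size=n)
    assert f(y) >= f(x) + g @ (y - x) - 1e-9
print("subgradient inequality verified at 1000 sample points")
```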
Chapter 2. Subgradient Methods

This chapter presents several subgradient methods for solving convex optimization problems with or without constraints. The main idea of the methods is to use the subdifferential of convex functions to produce iterative sequences converging to the solution of the given problem. Various convergence theorems will be proved and some illustrative examples will be given.

2.1 Shor's Method

Consider the unconstrained optimization problem
$\min\{ f(x) \mid x \in \mathbb{R}^n \},$   (2.1)
where $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ is a proper convex function. Denote the solution set of this problem by $X^*$ and note that $X^*$ may be empty. In this section, we assume [...]

[...] true in the case under our consideration. The proof is complete. ✷

The above theorem assures that the sequence of record values converges to the optimal value $f^*$ (it may happen that $f^* = -\infty$) of the given optimization problem. But the theorem says nothing about the convergence of the iterative sequence. There is a natural question about the convergence of Shor's method: if there exists $x^* = \lim_{k\to\infty} [...]$

[...] $\sum_{k} \tau_k = \infty$.

2.2 Other Subgradient Methods

There are several other solution methods for convex minimization problems which also use subgradients of the objective function for constructing iterative sequences. This section is based on materials given by A. P. Ruszczynski in [4, Chapter 7]. All the methods studied here are subgradient-type methods. Their similarity to Shor's method is [...]
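Only fragments of Chapter 2 survive in this preview, so the precise statement of Shor's method and of its step-size conditions cannot be reproduced here. The sketch below is therefore only a generic subgradient iteration with diminishing, divergent step sizes ($\tau_k \to 0$, $\sum_k \tau_k = \infty$) and a record value, written to illustrate the kind of scheme discussed in Sections 2.1-2.2; it is not a transcription of the algorithm as stated in [4].

```python
import numpy as np

def subgradient_method(f, subgrad, x0, n_iter=5000):
    """Generic subgradient iteration x^{k+1} = x^k - tau_k * g^k with tau_k = 1/(k+1).

    The step sizes satisfy tau_k -> 0 and sum_k tau_k = infinity.  The best
    ("record") value found so far is tracked, mirroring the role of record
    values in the convergence discussion above.
    """
    x = np.asarray(x0, dtype=float)
    record_x, record_f = x.copy(), f(x)
    for k in range(n_iter):
        g = subgrad(x)
        x = x - g / (k + 1)            # tau_k = 1/(k+1)
        fx = f(x)
        if fx < record_f:
            record_f, record_x = fx, x.copy()
    return record_x, record_f

# Example: f(x) = ||x - c||_1 is convex and nondifferentiable, with minimizer c.
c = np.array([1.0, -2.0, 0.5])
f = lambda x: float(np.sum(np.abs(x - c)))
subgrad = lambda x: np.sign(x - c)     # a subgradient of the l1-distance to c

x_best, f_best = subgradient_method(f, subgrad, x0=np.zeros(3))
print(x_best, f_best)                  # x_best should be close to c, f_best close to 0
```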