CONVERGENCE ANALYSIS OF A PROXIMAL POINT ALGORITHM FOR MINIMIZING DIFFERENCES OF FUNCTIONS

June 18, 2015

Nguyen Thai An¹, Nguyen Mau Nam²

Abstract. Several optimization schemes have been developed for convex optimization problems, but numerical algorithms for solving nonconvex optimization problems remain underdeveloped. Progress beyond convexity was made by considering the class of functions representable as differences of convex functions. In this paper, we introduce a generalized proximal point algorithm to minimize the difference of a nonconvex function and a convex function. We also study convergence results of this algorithm under the main assumption that the objective function satisfies the Kurdyka–Łojasiewicz property.

Keywords: DC programming, proximal point algorithm, difference of convex functions, Kurdyka–Łojasiewicz property.

Mathematics Subject Classification 2000: Primary 49J52, 49J53; Secondary 90C30

1 Introduction

In this paper, we introduce and study the convergence analysis of an algorithm for solving optimization problems in which the objective function can be represented as the difference of a nonconvex function and a convex function. The structure of the problem under consideration is flexible enough to include the problem of minimizing a smooth function on a closed set, as well as minimizing a DC function, where DC stands for Difference of Convex functions. It is worth noting that DC programming is one of the most successful approaches for going beyond convexity. The class of DC functions is closed under many operations usually considered in optimization and is large enough to contain many objective functions arising in applications. Moreover, this class of functions possesses beautiful generalized differentiation properties and is favorable for applying numerical optimization schemes; see [1, 2, 3] and the references therein. A pioneer in this research direction is Pham Dinh Tao, who introduced a simple algorithm called the (DCA) based on generalized differentiation of the functions involved as well as their Fenchel conjugates [4]. Over the past three decades, Pham Dinh Tao, Le Thi Hoai An, and many others have contributed to providing a mathematical foundation for the algorithm and making it accessible for applications. The (DCA) has become a classical tool in the field of optimization due to several key features: simplicity, inexpensiveness, flexibility, and efficiency; see [5, 6, 7, 8].

The proximal point algorithm (PPA for short) was suggested by Martinet [9] for solving convex optimization problems and was extensively developed by Rockafellar [10] in the context of monotone variational inequalities. The main idea of this method consists of replacing the initial problem with a sequence of regularized problems, so that each auxiliary problem can be solved by one of the well-known algorithms. Along with the (DCA), a number of proximal point optimization schemes have been proposed in [11, 12, 13, 14] to minimize differences of convex functions.

¹ Thua Thien Hue College of Education, 123 Nguyen Hue, Hue City, Vietnam (thaian2784@gmail.com).
² Fariborz Maseeh Department of Mathematics and Statistics, Portland State University, PO Box 751, Portland, OR 97207, United States (mau.nam.nguyen@pdx.edu).
Although convergence results for the (DCA) and the proximal point algorithms for minimizing differences of convex functions have been addressed in some recent research, it remains an open question to study the convergence analysis of algorithms for minimizing differences of functions in which convexity is not assumed. Based on the method developed recently in [15, 16, 17], we study a proximal point algorithm for minimizing the difference of nonsmooth functions in which only the second function involved is required to be convex. Under the main assumption that the objective function satisfies the Kurdyka–Łojasiewicz property, we are able to analyze the convergence of the algorithm. Our results further recent progress in using the Kurdyka–Łojasiewicz property and variational analysis to study nonsmooth numerical algorithms, pioneered by Attouch, Bolte, Redont, Soubeyran, and many others.

The paper is organized as follows. In Section 2, we provide tools of variational analysis used throughout the paper. Section 3 is the main section of the paper, devoted to the generalized proximal point algorithm and its convergence results. Applications to trust-region subproblems and nonconvex feasibility problems are presented in Section 4.

2 Tools of Variational Analysis

In this section, we recall some basic concepts and results of generalized differentiation for nonsmooth functions used throughout the paper; see, e.g., [18, 19, 20, 21] for more details. We use $\mathbb{R}^n$ to denote the $n$-dimensional Euclidean space, $\langle \cdot, \cdot \rangle$ to denote the inner product, and $\|\cdot\|$ to denote the associated Euclidean norm. For an extended-real-valued function $f: \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$, the domain of $f$ is the set $\operatorname{dom} f = \{x \in \mathbb{R}^n : f(x) < +\infty\}$. The function $f$ is said to be proper if its domain is nonempty.

Given a lower semicontinuous function $f: \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ with $\bar{x} \in \operatorname{dom} f$, the Fréchet subdifferential of $f$ at $\bar{x}$ is defined by
\[
\partial^F f(\bar{x}) = \Big\{ v \in \mathbb{R}^n : \liminf_{x \to \bar{x}} \frac{f(x) - f(\bar{x}) - \langle v, x - \bar{x} \rangle}{\|x - \bar{x}\|} \geq 0 \Big\}.
\]
We set $\partial^F f(\bar{x}) = \emptyset$ if $\bar{x} \notin \operatorname{dom} f$. Note that the Fréchet subdifferential mapping does not have a closed graph, so it is unstable computationally. Based on the Fréchet subdifferential, the limiting/Mordukhovich subdifferential of $f$ at $\bar{x} \in \operatorname{dom} f$ is defined by
\[
\partial^L f(\bar{x}) = \mathop{\mathrm{Lim\,sup}}_{x \xrightarrow{f} \bar{x}} \partial^F f(x) = \{ v \in \mathbb{R}^n : \exists\, x^k \xrightarrow{f} \bar{x},\ v^k \in \partial^F f(x^k),\ v^k \to v \},
\]
where the notation $x \xrightarrow{f} \bar{x}$ means that $x \to \bar{x}$ and $f(x) \to f(\bar{x})$. We also set $\partial^L f(\bar{x}) = \emptyset$ if $\bar{x} \notin \operatorname{dom} f$. The following robustness/closedness property of $\partial^L f$ follows from the definition:
\[
\{ v \in \mathbb{R}^n : \exists\, x^k \xrightarrow{f} \bar{x},\ v^k \to v,\ v^k \in \partial^L f(x^k) \} = \partial^L f(\bar{x}).
\]
Obviously, we have $\partial^F f(x) \subset \partial^L f(x)$ for every $x \in \mathbb{R}^n$, where the first set is closed and convex while the second one is closed; see [22, Theorem 8.6, p. 302]. If $f$ is differentiable at $\bar{x}$, then $\partial^F f(\bar{x}) = \{\nabla f(\bar{x})\}$. Moreover, if $f$ is continuously differentiable on a neighborhood of $\bar{x}$, then $\partial^L f(\bar{x}) = \{\nabla f(\bar{x})\}$. When $f$ is convex, the Fréchet and limiting subdifferentials reduce to the subdifferential in the sense of convex analysis:
\[
\partial f(\bar{x}) = \{ v \in \mathbb{R}^n : \langle v, x - \bar{x} \rangle \leq f(x) - f(\bar{x}) \text{ for all } x \in \mathbb{R}^n \}.
\]
For a convex subset $\Omega$ of $\mathbb{R}^n$ and $\bar{x} \in \Omega$, the normal cone to $\Omega$ at $\bar{x}$ is the set
\[
N(\bar{x}; \Omega) = \{ v \in \mathbb{R}^n : \langle v, x - \bar{x} \rangle \leq 0 \text{ for all } x \in \Omega \}.
\]
This normal cone can be represented as the subdifferential, at the point under consideration, of the indicator function
\[
\delta(x; \Omega) = \begin{cases} 0 & \text{if } x \in \Omega, \\ +\infty & \text{if } x \notin \Omega, \end{cases}
\]
i.e., $N(\bar{x}; \Omega) = \partial\delta(\bar{x}; \Omega)$.
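To see how these constructions differ, consider the standard one-dimensional illustration $f(x) = -|x|$ (an illustration only, not part of the original development). For $x > 0$ we have $\partial^F f(x) = \{-1\}$, and for $x < 0$ we have $\partial^F f(x) = \{1\}$, while no single $v$ can satisfy the defining liminf inequality from both sides of the origin, so
\[
\partial^F f(0) = \emptyset, \qquad \partial^L f(0) = \{-1, 1\}.
\]
The limiting subdifferential collects the limits $\pm 1$ and is closed but not convex; its convex hull $[-1, 1]$ is the Clarke subdifferential recalled below.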
We use the notation $\operatorname{dist}(\bar{x}; \Omega)$ to denote the distance from $\bar{x}$ to $\Omega$, i.e., $\operatorname{dist}(\bar{x}; \Omega) = \inf_{x \in \Omega}\|x - \bar{x}\|$. The notation $P_\Omega(\bar{x}) = \{\bar{w} \in \Omega : \|\bar{x} - \bar{w}\| = \operatorname{dist}(\bar{x}; \Omega)\}$ stands for the projection of $\bar{x}$ onto $\Omega$. We also write $d_\Omega(\bar{x})$ for $\operatorname{dist}(\bar{x}; \Omega)$ when convenient.

Another subdifferential concept, the Clarke subdifferential, was defined in [18] based on generalized directional derivatives. The Clarke subdifferential of a function $f$ that is locally Lipschitz continuous around $\bar{x}$ can be represented in terms of the limiting subdifferential:
\[
\partial^C f(\bar{x}) = \operatorname{co} \partial^L f(\bar{x}).
\]
Here $\operatorname{co}\Omega$ denotes the convex hull of an arbitrary set $\Omega$.

Proposition 2.1 ([22, Exercise 8.8, p. 304]). Let $f = g + h$, where $g$ is lower semicontinuous and $h$ is continuously differentiable on a neighborhood of $\bar{x}$. Then
\[
\partial^F f(\bar{x}) = \partial^F g(\bar{x}) + \nabla h(\bar{x}) \quad \text{and} \quad \partial^L f(\bar{x}) = \partial^L g(\bar{x}) + \nabla h(\bar{x}).
\]

Proposition 2.2 ([22, Theorem 10.1, p. 422]). If a lower semicontinuous function $f: \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ has a local minimum at $\bar{x} \in \operatorname{dom} f$, then $0 \in \partial^F f(\bar{x}) \subset \partial^L f(\bar{x})$.

In the convex case, this condition is not only necessary for a local minimum but also sufficient for a global minimum.

Proposition 2.3. Let $h: \mathbb{R}^n \to \mathbb{R}$ be a finite convex function on $\mathbb{R}^n$. If $y^k \in \partial h(x^k)$ for all $k$ and $\{x^k\}$ is bounded, then the sequence $\{y^k\}$ is also bounded.

Proof. The result follows from the fact that $h$ is locally Lipschitz continuous on $\mathbb{R}^n$ and [22, Definition 5.14, Proposition 5.15, Theorem 9.13].

Following [15, 16], a lower semicontinuous function $f: \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ satisfies the Kurdyka–Łojasiewicz property at $x^* \in \operatorname{dom}\partial^L f$ if there exist $\nu > 0$, a neighborhood $V$ of $x^*$, and a continuous concave function $\varphi: [0, \nu[ \to [0, +\infty[$ such that:
(i) $\varphi(0) = 0$;
(ii) $\varphi$ is of class $C^1$ on $]0, \nu[$;
(iii) $\varphi' > 0$ on $]0, \nu[$;
(iv) for every $x \in V$ with $f(x^*) < f(x) < f(x^*) + \nu$, we have
\[
\varphi'(f(x) - f(x^*))\operatorname{dist}(0, \partial^L f(x)) \geq 1.
\]
We say that $f$ satisfies the strong Kurdyka–Łojasiewicz property at $x^*$ if the same assertion holds for the Clarke subdifferential $\partial^C f(x)$. According to [15, Lemma 2.1], a proper lower semicontinuous function $f: \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ has the Kurdyka–Łojasiewicz property at any point $\bar{x} \in \mathbb{R}^n$ such that $0 \notin \partial^L f(\bar{x})$. Recall that a subset $\Omega$ of $\mathbb{R}^n$ is called semi-algebraic if it can be represented as a finite union of sets of the form
\[
\{x \in \mathbb{R}^n : p_i(x) = 0,\ q_i(x) < 0 \text{ for all } i = 1, \ldots, m\},
\]
where $p_i$ and $q_i$ for $i = 1, \ldots, m$ are polynomial functions. A function $f$ is said to be semi-algebraic if its graph is a semi-algebraic subset of $\mathbb{R}^{n+1}$. It is known that a proper lower semicontinuous semi-algebraic function always satisfies the Kurdyka–Łojasiewicz property; see [15, 23]. In a recent paper, Bolte et al. [23, Theorem 14] showed that the class of definable functions, which contains the class of semi-algebraic functions, satisfies the strong Kurdyka–Łojasiewicz property at each point of $\operatorname{dom}\partial^C f$.
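As a concrete check of the definition (an illustration only), take the semi-algebraic function $f(x) = x^2$ on $\mathbb{R}$ and $x^* = 0$. With $\varphi(s) = \sqrt{s}$ we have, for every $x \neq 0$,
\[
\varphi'(f(x) - f(x^*))\operatorname{dist}(0, \partial^L f(x)) = \frac{1}{2\sqrt{x^2}}\cdot|2x| = 1 \geq 1,
\]
so the Kurdyka–Łojasiewicz property holds at the origin with $\varphi(s) = cs^{1-\theta}$, $c = 1$, $\theta = \frac{1}{2}$. Desingularizing functions of exactly this power form reappear in the convergence-rate analysis of Section 3.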
3 A Generalized Proximal Point Algorithm for Minimizing Differences of Functions

We focus on the convergence analysis of a proximal point algorithm for solving nonconvex optimization problems of the following type:
\[
\min\{ f(x) = g_1(x) + g_2(x) - h(x) : x \in \mathbb{R}^n \}, \tag{1}
\]
where $g_1: \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is proper and lower semicontinuous, $g_2: \mathbb{R}^n \to \mathbb{R}$ is differentiable with $L$-Lipschitz gradient, and $h: \mathbb{R}^n \to \mathbb{R}$ is convex.

The specific structure of (1) is flexible enough to include the problem of minimizing a smooth function on a closed constraint set, $\min\{g(x) : x \in \Omega\}$, and the general DC problem
\[
\min\{ f(x) = g(x) - h(x) : x \in \mathbb{R}^n \}, \tag{2}
\]
where $g: \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is a proper lower semicontinuous convex function and $h: \mathbb{R}^n \to \mathbb{R}$ is convex. It is well known that if $\bar{x} \in \operatorname{dom} f$ is a local minimizer of (2), then
\[
\partial h(\bar{x}) \subset \partial g(\bar{x}). \tag{3}
\]
Any point $\bar{x} \in \operatorname{dom} f$ that satisfies (3) is called a stationary point of (2), and any point $\bar{x} \in \operatorname{dom} f$ such that $\partial g(\bar{x}) \cap \partial h(\bar{x}) \neq \emptyset$ is called a critical point of this problem. Since $h$ is a finite convex function, its subdifferential at any point is nonempty, and hence any stationary point of (2) is a critical point; see [5, 24, 25] and the references therein for more details. Let us recall a necessary optimality condition from [26] for minimizing differences of functions in the nonconvex setting.

Proposition 3.1 ([26, Proposition 4.1]). Consider the difference function $f = g - h$, where $g: \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ and $h: \mathbb{R}^n \to \mathbb{R}$ are lower semicontinuous functions. If $\bar{x} \in \operatorname{dom} f$ is a local minimizer of $f$, then we have the inclusion $\partial^F h(\bar{x}) \subset \partial^F g(\bar{x})$. If in addition $h$ is convex, then $\partial h(\bar{x}) \subset \partial^L g(\bar{x})$.

Adapting this to the setting of (1), we obtain the following optimality condition.

Proposition 3.2. If $\bar{x} \in \operatorname{dom} f$ is a local minimizer of the function $f$ considered in (1), then
\[
\partial h(\bar{x}) \subset \partial^L g_1(\bar{x}) + \nabla g_2(\bar{x}). \tag{4}
\]

Proof. The assertion follows from Proposition 2.1 and Proposition 3.1.

Following the DC case, any point $\bar{x} \in \operatorname{dom} f$ satisfying condition (4) is called a stationary point of (1). In general, this condition is hard to reach, and we may relax it to
\[
[\partial^L g_1(\bar{x}) + \nabla g_2(\bar{x})] \cap \partial h(\bar{x}) \neq \emptyset \tag{5}
\]
and call $\bar{x}$ a critical point of $f$. Obviously, every stationary point $\bar{x}$ is a critical point. Moreover, by [26, Corollary 3.4], at any point $\bar{x}$ with $g_1(\bar{x}) < +\infty$ we have
\[
\partial^L(g_1 + g_2 - h)(\bar{x}) \subset \partial^L g_1(\bar{x}) + \nabla g_2(\bar{x}) - \partial h(\bar{x}).
\]
Thus, if $0 \in \partial^L f(\bar{x})$, then $\bar{x}$ is a critical point of $f$ in the sense of (5). The converse is not true in general, as shown by the following example. Consider the functions
\[
f(x) = 2|x| + 3x, \quad g_1(x) = 3|x|, \quad g_2(x) = 3x, \quad h(x) = |x|.
\]
In this case, $\bar{x} = 0$ satisfies (5) but $0 \notin \partial^L f(\bar{x})$, since $\partial g_1(0) = [-3, 3]$, $\nabla g_2(0) = 3$, $\partial h(0) = [-1, 1]$, and $\partial f(0) = [1, 5]$. However, it is easy to check that these two conditions are equivalent when $h$ is differentiable on $\mathbb{R}^n$.

We now recall the Moreau/Moreau–Yosida proximal mapping for a nonconvex function; see [22, page 20]. Let $g: \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be a proper lower semicontinuous function. The Moreau proximal mapping with regularization parameter $t > 0$, $\operatorname{prox}_t^g: \mathbb{R}^n \to 2^{\mathbb{R}^n}$, is defined by
\[
\operatorname{prox}_t^g(x) = \operatorname{argmin}\Big\{ g(u) + \frac{t}{2}\|u - x\|^2 : u \in \mathbb{R}^n \Big\}.
\]
As an interesting special case, when $g$ is the indicator function $\delta(\cdot; \Omega)$ associated with a nonempty closed set $\Omega$, $\operatorname{prox}_t^g(x)$ coincides with the projection mapping. Under the assumption $\inf_{x \in \mathbb{R}^n} g(x) > -\infty$, the lower semicontinuity of $g$ and the coercivity of the squared norm imply that the proximal mapping is well defined; see [27, Proposition 2.2].

Proposition 3.3. Let $g: \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be a proper lower semicontinuous function with $\inf_{x \in \mathbb{R}^n} g(x) > -\infty$. Then, for every $t \in (0, +\infty)$, the set $\operatorname{prox}_t^g(x)$ is nonempty and compact for every $x \in \mathbb{R}^n$.
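For intuition, the proximal mapping frequently admits a closed form. The short Python sketch below (our illustration, using the standard soft-thresholding formula for $g(u) = |u|$) compares a brute-force evaluation of the defining argmin on a grid with the closed form.

```python
import numpy as np

def prox_numeric(g, x, t, grid):
    """Brute-force proximal point: argmin_u { g(u) + (t/2)*(u - x)**2 } over a grid."""
    values = g(grid) + 0.5 * t * (grid - x) ** 2
    return grid[np.argmin(values)]

def soft_threshold(x, lam):
    """Closed form of the proximal point of g(u) = |u| with lam = 1/t."""
    return np.sign(x) * max(abs(x) - lam, 0.0)

t, x = 2.0, 1.3
grid = np.linspace(-5.0, 5.0, 200001)
print(prox_numeric(np.abs, x, t, grid))   # approximately 0.8
print(soft_threshold(x, 1.0 / t))         # exactly 0.8 = 1.3 - 1/2
```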
We now introduce a new generalized proximal point algorithm for solving (1). Let us begin with the following lemma, which gives an upper bound for a smooth function with Lipschitz continuous gradient; see [28, 29].

Proposition 3.4. If $g: \mathbb{R}^n \to \mathbb{R}$ is a differentiable function with $L$-Lipschitz gradient, then
\[
g(y) \leq g(x) + \langle \nabla g(x), y - x \rangle + \frac{L}{2}\|y - x\|^2 \quad \text{for all } x, y \in \mathbb{R}^n. \tag{6}
\]

Let us introduce the generalized proximal point algorithm (GPPA) below to solve (1).

Generalized Proximal Point Algorithm (GPPA)
1. Initialization: choose $x^0 \in \operatorname{dom} g_1$ and a tolerance $\epsilon > 0$. Fix any $t > L$.
2. Find $y^k \in \partial h(x^k)$.
3. Find $x^{k+1}$ as follows:
\[
x^{k+1} \in \operatorname{prox}_t^{g_1}\Big( x^k - \frac{\nabla g_2(x^k) - y^k}{t} \Big). \tag{7}
\]
4. If $\|x^k - x^{k+1}\| \leq \epsilon$, then exit. Otherwise, increase $k$ by 1 and go back to step 2.

From the definition of the proximal mapping, (7) is equivalent to saying that
\[
x^{k+1} \in \operatorname*{argmin}_{x \in \mathbb{R}^n}\Big\{ g_1(x) - \langle y^k - \nabla g_2(x^k), x - x^k \rangle + \frac{t}{2}\|x - x^k\|^2 \Big\}. \tag{8}
\]

Theorem 3.1. Consider the (GPPA) for solving (1), in which $g_1: \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is proper and lower semicontinuous with $\inf_{x \in \mathbb{R}^n} g_1(x) > -\infty$, $g_2: \mathbb{R}^n \to \mathbb{R}$ is differentiable with $L$-Lipschitz gradient, and $h: \mathbb{R}^n \to \mathbb{R}$ is convex. Then:
(i) For any $k \geq 1$, we have
\[
f(x^k) - f(x^{k+1}) \geq \frac{t-L}{2}\|x^k - x^{k+1}\|^2. \tag{9}
\]
(ii) If $\alpha = \inf_{x \in \mathbb{R}^n} f(x) > -\infty$, then $\lim_{k \to +\infty} f(x^k) = \ell^* \geq \alpha$ and $\lim_{k \to +\infty}\|x^k - x^{k+1}\| = 0$.
(iii) If $\alpha = \inf_{x \in \mathbb{R}^n} f(x) > -\infty$ and $\{x^k\}$ is bounded, then every cluster point of $\{x^k\}$ is a critical point of $f$.

Proof. (i) By Proposition 2.1 and Proposition 2.2, it follows from (8) that
\[
y^k - \nabla g_2(x^k) \in \partial^L g_1(x^{k+1}) + t(x^{k+1} - x^k). \tag{10}
\]
Since $y^k \in \partial h(x^k)$,
\[
h(x^{k+1}) \geq h(x^k) + \langle y^k, x^{k+1} - x^k \rangle. \tag{11}
\]
From (8), we have
\[
g_1(x^k) \geq g_1(x^{k+1}) - \langle y^k - \nabla g_2(x^k), x^{k+1} - x^k \rangle + \frac{t}{2}\|x^k - x^{k+1}\|^2. \tag{12}
\]
Adding (11) and (12) and using (6), we get
\[
\begin{aligned}
g_1(x^k) - h(x^k) &\geq g_1(x^{k+1}) - h(x^{k+1}) + \langle \nabla g_2(x^k), x^{k+1} - x^k \rangle + \frac{t}{2}\|x^k - x^{k+1}\|^2 \\
&\geq g_1(x^{k+1}) - h(x^{k+1}) + g_2(x^{k+1}) - g_2(x^k) - \frac{L}{2}\|x^k - x^{k+1}\|^2 + \frac{t}{2}\|x^k - x^{k+1}\|^2.
\end{aligned}
\]
This implies $f(x^k) - f(x^{k+1}) \geq \frac{t-L}{2}\|x^k - x^{k+1}\|^2$, and assertion (i) is proved.

(ii) It follows from the assumptions made and (i) that $\{f(x^k)\}$ is monotone decreasing and bounded below, so the first assertion of (ii) is obvious. Observe that
\[
\sum_{k=1}^{m}\|x^k - x^{k+1}\|^2 \leq \frac{2}{t-L}\big(f(x^1) - f(x^{m+1})\big) \leq \frac{2}{t-L}\big(f(x^1) - \alpha\big) \quad \text{for all } m \in \mathbb{N}.
\]
Thus, the sequence $\{\|x^k - x^{k+1}\|\}$ converges to 0.

(iii) From (8), for all $x \in \mathbb{R}^n$ we have
\[
g_1(x^{k+1}) - \langle w^k, x^{k+1} - x^k \rangle + \frac{t}{2}\|x^{k+1} - x^k\|^2 \leq g_1(x) - \langle w^k, x - x^k \rangle + \frac{t}{2}\|x - x^k\|^2, \tag{13}
\]
where $w^k = y^k - \nabla g_2(x^k)$. Now suppose further that $\{x^k\}$ is bounded. Since $h$ is a finite convex function on $\mathbb{R}^n$, $y^k \in \partial h(x^k)$, and $\{x^k\}$ is bounded, Proposition 2.3 shows that $\{y^k\}$ is also bounded. We can take subsequences $\{x^{k_\ell}\}$ of $\{x^k\}$ and $\{y^{k_\ell}\}$ of $\{y^k\}$ that converge to $x^*$ and $y^*$, respectively. Because $\|x^{k_\ell} - x^{k_\ell+1}\| \to 0$ as $\ell \to +\infty$, we deduce from (13) that
\[
\limsup_{\ell \to +\infty} g_1(x^{k_\ell+1}) \leq g_1(x) - \langle y^* - \nabla g_2(x^*), x - x^* \rangle + \frac{t}{2}\|x - x^*\|^2 \quad \text{for all } x \in \mathbb{R}^n.
\]
In particular, for $x = x^*$ we get $\limsup_{\ell \to +\infty} g_1(x^{k_\ell+1}) \leq g_1(x^*)$. Combining this with the lower semicontinuity of $g_1$, we get $\lim_{\ell \to +\infty} g_1(x^{k_\ell+1}) = g_1(x^*)$. From the closedness property of the subdifferential mapping $\partial h(\cdot)$, we have $y^* \in \partial h(x^*)$. It follows from (10) that there exists $z^{k_\ell+1} \in \partial^L g_1(x^{k_\ell+1})$ satisfying
\[
y^{k_\ell} - \nabla g_2(x^{k_\ell}) - z^{k_\ell+1} = t(x^{k_\ell+1} - x^{k_\ell}).
\]
By (ii) and the Lipschitz continuity of $\nabla g_2$, $\lim_{\ell \to +\infty} z^{k_\ell+1} = y^* - \nabla g_2(x^*) =: z^*$. Thus $x^{k_\ell+1} \xrightarrow{g_1} x^*$, $z^{k_\ell+1} \in \partial^L g_1(x^{k_\ell+1})$, and $z^{k_\ell+1} \to z^*$ as $\ell \to +\infty$, so it follows from the robustness of the limiting subdifferential that $z^* \in \partial^L g_1(x^*)$. Therefore,
\[
y^* \in [\partial^L g_1(x^*) + \nabla g_2(x^*)] \cap \partial h(x^*).
\]
This implies that $x^*$ is a critical point of $f$, and the proof is complete.
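To make the scheme concrete, here is a minimal Python sketch (our code, on a made-up one-dimensional instance of (1): $g_1 = \delta(\cdot; [-1, 1])$, $g_2(x) = \frac{1}{2}x^2 + x$ so that $L = 1$, and $h(x) = |x|$). The prox step reduces to a projection, and the loop checks the descent estimate (9) at every iteration.

```python
import numpy as np

# Made-up instance: g1 = indicator of [-1, 1], g2(x) = 0.5*x**2 + x (L = 1), h(x) = |x|
L = 1.0
t = 2.0                                               # regularization parameter, t > L
grad_g2 = lambda x: x + 1.0
subgrad_h = lambda x: float(np.sign(x))               # one element of the subdifferential of |x|
proj = lambda z: min(max(z, -1.0), 1.0)               # prox of delta(.; [-1, 1]) = projection

def f(x):  # objective value, used only to verify the descent estimate (9)
    return 0.5 * x**2 + x - abs(x)

x = 0.7  # x0 in dom g1
for k in range(100):
    y = subgrad_h(x)                                  # step 2: y^k in dh(x^k)
    x_new = proj(x - (grad_g2(x) - y) / t)            # step 3: prox step (7)
    assert f(x) - f(x_new) >= (t - L) / 2 * (x - x_new)**2 - 1e-12  # inequality (9)
    if abs(x - x_new) <= 1e-10:                       # step 4: stopping test
        break
    x = x_new
print(x)  # converges to the critical point x* = 0
```

On this instance the iterates converge to the critical point $x^* = 0$ rather than the global minimizer $x = -1$, which is consistent with Theorem 3.1(iii): the (GPPA) guarantees criticality, not global optimality.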
Proposition 3.5. Suppose that $\inf_{x \in \mathbb{R}^n} f(x) > -\infty$ and $f$ is proper and lower semicontinuous. If the (GPPA) sequence $\{x^k\}$ has a cluster point $x^*$, then $\lim_{k \to +\infty} f(x^k) = f(x^*)$. Thus, $f$ has the same value at all cluster points of $\{x^k\}$.

Proof. Since $\inf_{x \in \mathbb{R}^n} f(x) > -\infty$, it follows from (9) that the sequence of real numbers $\{f(x^k)\}$ is nonincreasing and bounded below. Thus, $\lim_{k \to +\infty} f(x^k) = \ell^*$ exists. If $\{x^{k_\ell}\}$ is a subsequence converging to $x^*$, then by the lower semicontinuity of $f$ we have $\liminf_{\ell \to +\infty} f(x^{k_\ell}) \geq f(x^*)$. Observe from the structure of $f$ that $\operatorname{dom} f = \operatorname{dom} g_1$; since $g_2$ and $h$ are continuous, $f$ is proper and lower semicontinuous if and only if $g_1$ is proper and lower semicontinuous. To prove the opposite inequality, we employ the proof of (iii) of Theorem 3.1 and get
\[
\begin{aligned}
\limsup_{\ell \to +\infty} f(x^{k_\ell}) &= \limsup_{\ell \to +\infty}\big[g_1(x^{k_\ell}) + g_2(x^{k_\ell}) - h(x^{k_\ell})\big] \\
&\leq \limsup_{\ell \to +\infty} g_1(x^{k_\ell}) + \limsup_{\ell \to +\infty} g_2(x^{k_\ell}) - \liminf_{\ell \to +\infty} h(x^{k_\ell}) \\
&\leq g_1(x^*) + g_2(x^*) - h(x^*) = f(x^*).
\end{aligned}
\]
Combining this with the uniqueness of the limit, we have $\ell^* = f(x^*)$. The proof is complete.

Remark 3.1. (i) If $g_1$ is also convex, we can obtain a stronger inequality than (9) and relax the range of the regularization parameter $t$. Indeed, using the definition of the subdifferential in the sense of convex analysis in (10), we have
\[
\langle y^k - \nabla g_2(x^k) - t(x^{k+1} - x^k), x^k - x^{k+1} \rangle \leq g_1(x^k) - g_1(x^{k+1}).
\]
Since $y^k \in \partial h(x^k)$, $h(x^{k+1}) \geq h(x^k) + \langle y^k, x^{k+1} - x^k \rangle$. Adding these inequalities and using (6) gives
\[
f(x^k) - f(x^{k+1}) \geq \Big(t - \frac{L}{2}\Big)\|x^k - x^{k+1}\|^2.
\]
Thus, we can choose $t > \frac{L}{2}$ instead of $t > L$ as before.
(ii) When $h = 0$, the (GPPA) reduces to the proximal forward-backward algorithm for minimizing $f = g_1 + g_2$ considered in [30]. If $h = 0$ and $g_1$ is the indicator function $\delta(\cdot; \Omega)$ associated with a nonempty closed set $\Omega$, then the (GPPA) reduces to the projected gradient method (PGM) for minimizing the smooth function $g_2$ on a nonconvex constraint set $\Omega$:
\[
x^{k+1} \in P_\Omega\Big( x^k - \frac{1}{t}\nabla g_2(x^k) \Big).
\]
(iii) If $g_2 = 0$, then the (GPPA) reduces to the (PPA) with constant stepsize proposed in [11, 31].
In the theorem below, we establish sufficient conditions that guarantee the convergence of the sequence $\{x^k\}$ generated by the (GPPA). These conditions include the Kurdyka–Łojasiewicz property of the function $f$ and the differentiability with Lipschitz gradient of $h$. In what follows, let $C^*$ denote the set of cluster points of the sequence $\{x^k\}$. We follow the method from [15, 16].

Theorem 3.2. Suppose that $\inf_{x \in \mathbb{R}^n} f(x) > -\infty$ and $f$ is lower semicontinuous. Suppose further that $\nabla h$ is $L(h)$-Lipschitz continuous and $f$ has the Kurdyka–Łojasiewicz property at every point $x \in \operatorname{dom} f$. If $C^* \neq \emptyset$, then the (GPPA) sequence $\{x^k\}$ converges to a critical point of $f$.

Proof. Take any $x^* \in C^*$ and a subsequence $\{x^{k_\ell}\}$ that converges to $x^*$. Applying Proposition 3.5 yields $\lim_{k \to +\infty} f(x^k) = \ell^* = f(x^*)$. If $f(x^k) = \ell^*$ for some $k \geq 1$, then $f(x^k) = f(x^{k+p})$ for every $p \geq 0$, since the sequence $\{f(x^k)\}$ is monotone decreasing by (9); inequality (9) then forces $x^k = x^{k+p}$ for all $p \geq 0$, so the (GPPA) terminates after a finite number of steps. Without loss of generality, from now on we assume that $f(x^k) > \ell^*$ for all $k$. Recall that the (GPPA) starts from a point $x^0 \in \operatorname{dom} g_1$ and generates two sequences $\{x^k\}$ and $\{y^k\}$ with $y^k \in \partial h(x^k) = \{\nabla h(x^k)\}$ and
\[
y^{k-1} - \nabla g_2(x^{k-1}) - t(x^k - x^{k-1}) \in \partial^L g_1(x^k).
\]
Thus, from Proposition 2.1 we have
\[
y^{k-1} - \nabla g_2(x^{k-1}) - t(x^k - x^{k-1}) + \nabla g_2(x^k) - y^k \in \partial^L g_1(x^k) + \nabla g_2(x^k) - \nabla h(x^k) = \partial^L f(x^k).
\]
Using the Lipschitz continuity of $\nabla g_2$ and $\nabla h$, we have
\[
\begin{aligned}
\|y^{k-1} - \nabla g_2(x^{k-1}) - t(x^k - x^{k-1}) + \nabla g_2(x^k) - y^k\| &= \|\nabla h(x^{k-1}) - \nabla h(x^k) + \nabla g_2(x^k) - \nabla g_2(x^{k-1}) - t(x^k - x^{k-1})\| \\
&\leq (L(h) + L + t)\|x^{k-1} - x^k\| = M\|x^{k-1} - x^k\|,
\end{aligned}
\]
where $M := L(h) + L + t$. Therefore,
\[
\operatorname{dist}(0; \partial^L f(x^k)) \leq M\|x^{k-1} - x^k\|. \tag{14}
\]
According to the assumption that $f$ has the Kurdyka–Łojasiewicz property at $x^*$, there exist $\nu > 0$, a neighborhood $V$ of $x^*$, and a continuous concave function $\varphi: [0, \nu[ \to [0, +\infty[$ such that for all $x \in V$ satisfying $\ell^* < f(x) < \ell^* + \nu$ we have
\[
\varphi'(f(x) - \ell^*)\operatorname{dist}(0; \partial^L f(x)) \geq 1. \tag{15}
\]
Let $\delta > 0$ be small enough that $\mathbb{B}(x^*; \delta) \subset V$. Using the facts that $\lim_{\ell \to +\infty} x^{k_\ell} = x^*$, $\lim_{k \to +\infty}\|x^{k+1} - x^k\| = 0$, $\lim_{k \to +\infty} f(x^k) = \ell^*$, and $f(x^k) > \ell^*$ for all $k$, we can find a natural number $N$ large enough that
\[
x^N \in \mathbb{B}(x^*; \delta), \qquad \ell^* < f(x^N) < \ell^* + \nu, \tag{16}
\]
and
\[
\|x^N - x^*\| + \frac{\|x^N - x^{N-1}\|}{4} + \gamma\varphi\big(f(x^N) - \ell^*\big) < \frac{3\delta}{4}, \tag{17}
\]
where $\gamma = \frac{2M}{t-L} > 0$. We will show that $x^k \in \mathbb{B}(x^*; \delta)$ for all $k \geq N$. To this end, we first show that whenever $x^k \in \mathbb{B}(x^*; \delta)$ and $\ell^* < f(x^k) < \ell^* + \nu$ for some $k$, we have
\[
\|x^k - x^{k+1}\| \leq \frac{\|x^{k-1} - x^k\|}{4} + \gamma\big[\varphi(f(x^k) - \ell^*) - \varphi(f(x^{k+1}) - \ell^*)\big]. \tag{18}
\]
Indeed, by (14), the concavity of $\varphi$, (15), and (9), we have
\[
\begin{aligned}
M\|x^{k-1} - x^k\|\big[\varphi(f(x^k) - \ell^*) - \varphi(f(x^{k+1}) - \ell^*)\big] &\geq \operatorname{dist}(0; \partial^L f(x^k))\big[\varphi(f(x^k) - \ell^*) - \varphi(f(x^{k+1}) - \ell^*)\big] \\
&\geq \operatorname{dist}(0; \partial^L f(x^k))\,\varphi'(f(x^k) - \ell^*)\big(f(x^k) - f(x^{k+1})\big) \\
&\geq f(x^k) - f(x^{k+1}) \geq \frac{t-L}{2}\|x^k - x^{k+1}\|^2.
\end{aligned}
\]
It follows that
\[
\varphi(f(x^k) - \ell^*) - \varphi(f(x^{k+1}) - \ell^*) \geq \frac{1}{\gamma}\cdot\frac{\|x^k - x^{k+1}\|^2}{\|x^{k-1} - x^k\|} \geq \frac{1}{\gamma}\Big(\|x^k - x^{k+1}\| - \frac{\|x^{k-1} - x^k\|}{4}\Big), \tag{19}
\]
where the last inequality holds since $\frac{a^2}{b} \geq a - \frac{b}{4}$ for any positive real numbers $a$ and $b$ (this is equivalent to $(a - \frac{b}{2})^2 \geq 0$). This implies (18).

We next show that $x^k \in \mathbb{B}(x^*; \delta)$ for all $k \geq N$ by induction. The claim is true for $k = N$ by the construction above. Now suppose that $x^N, \ldots, x^{N+p-1} \in \mathbb{B}(x^*; \delta)$ for some $p \geq 1$. Since $\{f(x^k)\}$ is a nonincreasing sequence converging to $\ell^*$, our choice of $N$ implies that $\ell^* < f(x^k) < \ell^* + \nu$ for all $k \geq N$. In particular, (18) can be applied for $k = N, \ldots, N+p-1$:
\[
\begin{aligned}
\|x^N - x^{N+1}\| &\leq \frac{\|x^{N-1} - x^N\|}{4} + \gamma\big[\varphi(f(x^N) - \ell^*) - \varphi(f(x^{N+1}) - \ell^*)\big], \\
\|x^{N+1} - x^{N+2}\| &\leq \frac{\|x^N - x^{N+1}\|}{4} + \gamma\big[\varphi(f(x^{N+1}) - \ell^*) - \varphi(f(x^{N+2}) - \ell^*)\big], \\
&\;\;\vdots \\
\|x^{N+p-1} - x^{N+p}\| &\leq \frac{\|x^{N+p-2} - x^{N+p-1}\|}{4} + \gamma\big[\varphi(f(x^{N+p-1}) - \ell^*) - \varphi(f(x^{N+p}) - \ell^*)\big].
\end{aligned}
\]
Summing these inequalities gives
\[
\sum_{j=1}^{p}\|x^{N+j} - x^{N+j-1}\| \leq \frac{1}{4}\sum_{j=1}^{p}\|x^{N+j} - x^{N+j-1}\| + \frac{\|x^{N-1} - x^N\|}{4} - \frac{\|x^{N+p-1} - x^{N+p}\|}{4} + \gamma\big[\varphi(f(x^N) - \ell^*) - \varphi(f(x^{N+p}) - \ell^*)\big].
\]
Making use of the nonnegativity of $\varphi$, we get
\[
\sum_{j=1}^{p}\|x^{N+j} - x^{N+j-1}\| \leq \frac{4}{3}\Big[\frac{\|x^{N-1} - x^N\|}{4} + \gamma\varphi\big(f(x^N) - \ell^*\big)\Big]. \tag{20}
\]
It follows that
\[
\|x^{N+p} - x^*\| \leq \|x^N - x^*\| + \sum_{j=1}^{p}\|x^{N+j} - x^{N+j-1}\| \leq \frac{4}{3}\Big[\|x^N - x^*\| + \frac{\|x^{N-1} - x^N\|}{4} + \gamma\varphi\big(f(x^N) - \ell^*\big)\Big] < \delta.
\]
Thus, $x^k \in \mathbb{B}(x^*; \delta)$ for all $k \geq N$. Since $x^k \in \mathbb{B}(x^*; \delta)$ and $\ell^* < f(x^k) < \ell^* + \nu$ for all $k \geq N$, it follows from (20), by letting $p \to +\infty$, that $\sum_{k=1}^{+\infty}\|x^{k+1} - x^k\| < +\infty$. Therefore, $\{x^k\}$ is a Cauchy sequence and hence convergent; its limit must be the cluster point $x^*$, which is a critical point of $f$ by Theorem 3.1(iii).

Below is another theorem giving sufficient conditions that guarantee the convergence of the sequence $\{x^k\}$ generated by the (GPPA). In contrast to Theorem 3.2, we require the differentiability with Lipschitz gradient of the function $g_1 + g_2$ instead of $h$, along with the strong Kurdyka–Łojasiewicz property of $f$. In this case, without loss of generality, we can assume that $g_2(x) = 0$; for convenience, we put $g_1(x) = g(x)$, so that the proximal step (8) is an exact proximal step on $g$.
Theorem 3.3. Consider the difference of functions $f = g - h$ with $\inf_{x \in \mathbb{R}^n} f(x) > -\infty$. Suppose that $g$ is differentiable and $\nabla g$ is $L$-Lipschitz continuous, $f$ has the strong Kurdyka–Łojasiewicz property at every point $x \in \operatorname{dom} f$, and $h$ is a finite convex function. If $C^* \neq \emptyset$, then the (GPPA) sequence $\{x^k\}$ converges to a critical point of $f$.

Proof. The proof is very similar to that of Theorem 3.2, except for a few adjustments. Note that $f$ is locally Lipschitz continuous under the assumptions made, since $g$ is a $C^1$ function and $h$ is a finite convex function. By (10), we have
\[
y^{k-1} - t(x^k - x^{k-1}) = \nabla g(x^k) \quad \text{and} \quad y^k - t(x^{k+1} - x^k) = \nabla g(x^{k+1}).
\]
This implies
\[
y^k - y^{k-1} + t(x^k - x^{k-1}) = \nabla g(x^{k+1}) - \nabla g(x^k) + t(x^{k+1} - x^k).
\]
Making use of the Lipschitz continuity of $\nabla g$ yields
\[
\|y^k - y^{k-1} + t(x^k - x^{k-1})\| = \|\nabla g(x^{k+1}) - \nabla g(x^k) + t(x^{k+1} - x^k)\| \leq (L + t)\|x^k - x^{k+1}\|.
\]
On the other hand,
\[
y^k - y^{k-1} + t(x^k - x^{k-1}) = y^k - \nabla g(x^k) \in \partial h(x^k) - \nabla g(x^k) = \partial^F(-f)(x^k) \subset \partial^C(-f)(x^k).
\]
Since $\partial^C(-f)(x^k) = -\partial^C f(x^k)$, we have
\[
\operatorname{dist}(0; \partial^C f(x^k)) = \operatorname{dist}(0; \partial^C(-f)(x^k)) \leq (L + t)\|x^k - x^{k+1}\|.
\]
Choose $N$ as in (16) and (17), with $\gamma = \frac{2(L+t)}{t-L}$ instead of $\frac{2M}{t-L}$ as before. For all $k$ large enough that $x^k \in \mathbb{B}(x^*; \delta)$ and $\ell^* < f(x^k) < \ell^* + \nu$, we have
\[
\begin{aligned}
(L+t)\|x^k - x^{k+1}\|\big[\varphi(f(x^k) - \ell^*) - \varphi(f(x^{k+1}) - \ell^*)\big] &\geq \operatorname{dist}(0; \partial^C f(x^k))\big[\varphi(f(x^k) - \ell^*) - \varphi(f(x^{k+1}) - \ell^*)\big] \\
&\geq \operatorname{dist}(0; \partial^C f(x^k))\,\varphi'(f(x^k) - \ell^*)\big(f(x^k) - f(x^{k+1})\big) \\
&\geq f(x^k) - f(x^{k+1}) \geq \frac{t-L}{2}\|x^k - x^{k+1}\|^2.
\end{aligned}
\]
It follows that
\[
\|x^k - x^{k+1}\| \leq \gamma\big[\varphi(f(x^k) - \ell^*) - \varphi(f(x^{k+1}) - \ell^*)\big]. \tag{21}
\]
From this, the induction showing that $x^k \in \mathbb{B}(x^*; \delta)$ for all $k \geq N$ can be carried out as in the proof of Theorem 3.2. Indeed, suppose that $x^N, \ldots, x^{N+p-1} \in \mathbb{B}(x^*; \delta)$ for some $p \geq 1$. Then
\[
\begin{aligned}
\|x^{N+p} - x^*\| &\leq \|x^N - x^*\| + \sum_{j=1}^{p}\|x^{N+j-1} - x^{N+j}\| \\
&\leq \|x^N - x^*\| + \gamma\sum_{j=1}^{p}\big[\varphi(f(x^{N+j-1}) - \ell^*) - \varphi(f(x^{N+j}) - \ell^*)\big] \\
&\leq \|x^N - x^*\| + \gamma\varphi\big(f(x^N) - \ell^*\big) < \delta.
\end{aligned}
\]
Thus, $x^k \in \mathbb{B}(x^*; \delta)$ for all $k \geq N$. Since $x^k \in \mathbb{B}(x^*; \delta)$ and $\ell^* < f(x^k) < \ell^* + \nu$ for all $k \geq N$, we can sum (21) from $k = N$ to some $N_1 > N$ and let $N_1 \to +\infty$, showing that $\sum_{k=1}^{\infty}\|x^{k+1} - x^k\| < +\infty$. This completes the proof.

In the proposition below, we give sufficient conditions for the set of cluster points $C^*$ of the (GPPA) sequence $\{x^k\}$ to be nonempty.

Proposition 3.6. Consider the function $f = g - h$, where $g = g_1 + g_2$ as in (1), and let $\{x^k\}$ be the sequence generated by the (GPPA) for solving (1). The set of cluster points $C^*$ of $\{x^k\}$ is nonempty if one of the following conditions is satisfied:
(i) For any $\alpha$, the lower level set $\mathcal{L}_{\leq\alpha} := \{x \in \mathbb{R}^n : f(x) \leq \alpha\}$ is bounded.
(ii) $\liminf_{\|x\| \to +\infty} h(x) = +\infty$ and $\liminf_{\|x\| \to +\infty}\frac{g(x)}{h(x)} > 1$.

Proof. The conclusion under (i) follows directly from the facts that $f(x^k) \leq f(x^0)$ for all $k$ and $\mathcal{L}_{\leq f(x^0)}$ is bounded. Now assume that (ii) is satisfied. Then there exist $M > 1$ and $R > 0$ such that $g(x) \geq Mh(x)$ for all $x$ with $\|x\| \geq R$. It follows that
\[
\liminf_{\|x\| \to +\infty} f(x) = \liminf_{\|x\| \to +\infty}\big[g(x) - h(x)\big] \geq (M - 1)\liminf_{\|x\| \to +\infty} h(x) = +\infty.
\]
Thus, $f$ is coercive. Combining this with the descent property of the sequence $\{f(x^k)\}$, we conclude that $\{x^k\}$ is bounded.

It is known from [23, Corollary 16] and [15, Section 4.3] that a proper lower semicontinuous semi-algebraic function $f$ on $\mathbb{R}^n$ always satisfies the Kurdyka–Łojasiewicz property at all points in $\operatorname{dom}\partial f$ with $\varphi(s) = cs^{1-\theta}$ for some $\theta \in [0, 1[$ and $c > 0$. We now derive convergence rates of the (GPPA) sequence by examining the range of the exponent $\theta$.
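For orientation (an illustration of the exponent, not part of the original argument): the one-dimensional semi-algebraic functions $f(x) = x^2$ and $f(x) = x^4$ admit, at the origin,
\[
f(x) = x^2:\ \varphi(s) = s^{1/2},\ \theta = \tfrac{1}{2}; \qquad f(x) = x^4:\ \varphi(s) = s^{1/4},\ \theta = \tfrac{3}{4},
\]
since, e.g., $\varphi'(x^4)\,|\nabla f(x)| = \tfrac{1}{4}|x|^{-3}\cdot 4|x|^3 = 1$. By the theorem below, the first exponent yields a linear rate, while the second falls into the sublinear regime with rate $k^{-1/2}$.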
Theorem 3.4. Consider the settings of Theorems 3.2 and 3.3. Suppose further that $f$ is a proper closed semi-algebraic function, so that the function $\varphi$ in the Kurdyka–Łojasiewicz property has the form $\varphi(s) = cs^{1-\theta}$ for some $\theta \in [0, 1[$ and $c > 0$. Then we have the following conclusions:
(i) If $\theta = 0$, then the sequence $\{x^k\}$ converges in a finite number of steps.
(ii) If $0 < \theta \leq \frac{1}{2}$, then there exist $\mu > 0$ and $q \in (0, 1)$ such that $\|x^k - x^*\| \leq \mu q^k$.
(iii) If $\frac{1}{2} < \theta < 1$, then there exists $\mu > 0$ such that $\|x^k - x^*\| \leq \mu k^{\frac{1-\theta}{1-2\theta}}$.

Proof. For each $k \geq 1$, set $\Delta_k = \sum_{p=k}^{+\infty}\|x^{p+1} - x^p\|$ and $\ell_k = f(x^k) - \ell^*$. It is obvious from the triangle inequality that $\|x^k - x^*\| \leq \Delta_k$. From the Kurdyka–Łojasiewicz property with the special form of $\varphi$, we have
\[
c(1-\theta)\ell_k^{-\theta}\operatorname{dist}\big(0; \partial^L f(x^k)\big) \geq 1. \tag{22}
\]
From the proof of Theorem 3.3, if $\nabla g$ is $L$-Lipschitz continuous, then
\[
\operatorname{dist}\big(0; \partial^L f(x^k)\big) \leq (L + t)\|x^{k+1} - x^k\|
\]
for all sufficiently large $k$. Combining this with (21) yields
\[
\Delta_k \leq \gamma\varphi(\ell_k) \leq \gamma\varphi(\ell_{k-1}) = \gamma c\,\ell_{k-1}^{1-\theta} \leq \gamma c\big[(L+t)c(1-\theta)\big]^{\frac{1-\theta}{\theta}}\|x^k - x^{k-1}\|^{\frac{1-\theta}{\theta}},
\]
where $\gamma = \frac{2(L+t)}{t-L}$. In the case of Theorem 3.2, where $\nabla h$ is $L(h)$-Lipschitz continuous, we have
\[
\operatorname{dist}\big(0; \partial^L f(x^k)\big) \leq M\|x^k - x^{k-1}\|
\]
for all sufficiently large $k$, where $M = L(h) + L + t$. It follows from (20) that
\[
\Delta_k \leq \frac{4}{3}\Big[\frac{\|x^k - x^{k-1}\|}{4} + \gamma\varphi(\ell_k)\Big] \leq \frac{\|x^k - x^{k-1}\|}{3} + \frac{4\gamma c}{3}\big[Mc(1-\theta)\big]^{\frac{1-\theta}{\theta}}\|x^k - x^{k-1}\|^{\frac{1-\theta}{\theta}},
\]
where now $\gamma = \frac{2M}{t-L}$. Since $\|x^k - x^{k-1}\| = \Delta_{k-1} - \Delta_k$, in both cases it always holds that
\[
\Delta_k \leq C_1(\Delta_{k-1} - \Delta_k) + C_2(\Delta_{k-1} - \Delta_k)^{\frac{1-\theta}{\theta}}
\]
for some $C_1, C_2 > 0$. The result now follows from the proof of [32, Theorem 2].

4 Examples

Trust-Region Subproblem. Consider the trust-region subproblem
\[
\min\Big\{ \phi(x) = \frac{1}{2}x^\top Ax + b^\top x : \|x\|^2 \leq r^2 \Big\}, \tag{23}
\]
where $A$ is an $n \times n$ real symmetric matrix and $b \in \mathbb{R}^n$ is given. Since $A$ is not required to be positive semidefinite, (23) is a nonconvex optimization problem. Let $E = \{x \in \mathbb{R}^n : \|x\| \leq r\}$ and define the function $f(x) = \phi(x) + \delta(x; E)$ for $x \in \mathbb{R}^n$. The trust-region subproblem (23) can be solved by the (DCA) with the DC decomposition $f = g - h$, where
\[
g(x) = \frac{\rho}{2}\|x\|^2 + b^\top x + \delta(x; E) \quad \text{and} \quad h(x) = \frac{1}{2}x^\top(\rho I - A)x,
\]
and $\rho$ is a positive number such that $\rho I - A$ is positive semidefinite; see [6]. The convergence analysis of the (DCA) sequence for solving (23) was carried out in [34]. Define $g_2(x) = \frac{\rho}{2}\|x\|^2 + b^\top x$ and $g_1(x) = \delta(x; E)$. In this case, $g_2$ and $h$ have Lipschitz gradients with Lipschitz constants $L = \rho$ and $L(h) = \lambda_{\max}(\rho I - A)$, respectively. Applying the (GPPA) to (23), we have $y^k = \nabla h(x^k) = (\rho I - A)x^k$ and, since $g_2$ is a convex quadratic that can be kept exactly in the proximal subproblem rather than linearized,
\[
y^k - \nabla g_2(x^{k+1}) - t(x^{k+1} - x^k) \in \partial g_1(x^{k+1}).
\]
This implies $y^k + tx^k - b \in (t + \rho)x^{k+1} + N(x^{k+1}; E)$. Thus,
\[
x^{k+1} = P_E\Big( \frac{(t+\rho)x^k - Ax^k - b}{t+\rho} \Big).
\]
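A small numerical sketch of this update (our code, with made-up data; the matrix $A$ below is indefinite, so the instance is genuinely nonconvex):

```python
import numpy as np

# Made-up test data: a 2x2 indefinite A, so (23) is nonconvex
A = np.array([[1.0, 0.0], [0.0, -2.0]])
b = np.array([0.5, 0.0])
r = 1.0
rho = 3.0                      # rho > lambda_max(A) = 1, so rho*I - A is PSD
t = rho + 1.0                  # regularization parameter with t > L = rho

def proj_ball(z, r):
    """Projection onto E = {x : ||x|| <= r}."""
    nz = np.linalg.norm(z)
    return z if nz <= r else (r / nz) * z

x = np.array([0.1, 0.1])       # x0 in E
for k in range(5000):
    x_new = proj_ball(((t + rho) * x - A @ x - b) / (t + rho), r)
    if np.linalg.norm(x - x_new) <= 1e-12:
        break
    x = x_new
print(x, 0.5 * x @ A @ x + b @ x)   # a critical point of (23) on the ball and its value
```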
Proposition 4.1. Consider the trust-region subproblem (23). Then $C^* \neq \emptyset$, and the (GPPA) sequence $\{x^k\}$ converges to a critical point of $f = g_1 + g_2 - h$.

Proof. We only need to verify that all assumptions of Theorem 3.2 are satisfied in this particular case. Note that $f(x) = \phi(x) + \delta(x; E)$. Obviously, $\inf_{x \in \mathbb{R}^n} f(x) > -\infty$ and $C^* \neq \emptyset$. Let us show that $f$ is a semi-algebraic function. Note that $E = \{x \in \mathbb{R}^n : p(x) \leq r^2\}$, where $p$ is the polynomial $p(x) = \sum_{i=1}^{n} x_i^2$. Thus, $E$ is a semi-algebraic set, which implies that its indicator function is a semi-algebraic function; see, e.g., [15]. It is also straightforward that $\phi$ is a semi-algebraic function, since its graph
\[
\operatorname{gph}\phi = \Big\{(x, y) \in \mathbb{R}^n \times \mathbb{R} : \tfrac{1}{2}x^\top Ax + b^\top x - y = 0\Big\}
\]
is a semi-algebraic set. It follows that $f$ is a semi-algebraic function as the sum of two semi-algebraic functions; see, e.g., [15]. Therefore, $f$ satisfies the Kurdyka–Łojasiewicz property. Obviously, $h$ has a Lipschitz continuous gradient. We have thus shown that all assumptions of Theorem 3.2 are satisfied, and the conclusion follows from Theorem 3.2.

Nonconvex Feasibility Problems. In this part, we show how the (GPPA) can be applied to solve nonconvex feasibility problems. Let $A$ and $B$ be two nonempty closed sets in $\mathbb{R}^n$; it is implicitly assumed that $A$ and $B$ are simple enough that the projection onto each set is easy to compute. The feasibility problem asks for a point in $A \cap B$. It is clear that $A \cap B \neq \emptyset$ if and only if the following optimization problem has zero optimal value:
\[
\min\Big\{ \frac{1}{2}d_B^2(x) : x \in A \Big\}. \tag{24}
\]
This problem is of the type (1) with the objective function $f(x) = g_1(x) + g_2(x) - h(x)$, where
\[
g_1(x) = \delta(x; A), \qquad g_2(x) = \frac{1}{2}\|x\|^2, \qquad h(x) = \frac{1}{2}\big(\|x\|^2 - d_B^2(x)\big).
\]
Obviously, the function $g_2$ is differentiable with $L$-Lipschitz gradient, where $L = 1$. We have
\[
h(x) = \frac{1}{2}\|x\|^2 - \frac{1}{2}\inf\{\|x\|^2 + \|y\|^2 - 2\langle x, y \rangle : y \in B\} = \sup\Big\{\langle x, y \rangle - \frac{\|y\|^2}{2} : y \in B\Big\} = \sup\{f_y(x) : y \in B\},
\]
where $f_y(x) = \langle x, y \rangle - \frac{\|y\|^2}{2}$. Therefore, $h$ is a pointwise supremum of a collection of affine functions, and hence a convex function. Denote $S(\bar{x}) = \{y \in B : f_y(\bar{x}) = h(\bar{x})\}$. We have
\[
S(\bar{x}) = \{y \in B : \|\bar{x} - y\|^2 = d_B^2(\bar{x})\} = P_B(\bar{x}).
\]
Since $B$ is a nonempty closed subset of $\mathbb{R}^n$, the set $S(\bar{x}) = P_B(\bar{x})$ is nonempty and compact for any $\bar{x} \in \mathbb{R}^n$. By [33, Theorem 3, p. 201], we have
\[
\partial h(\bar{x}) = \operatorname{co}\bigcup_{y \in S(\bar{x})}\partial f_y(\bar{x}) = \operatorname{co}\bigcup_{y \in S(\bar{x})}\{y\} = \operatorname{co} P_B(\bar{x}).
\]
Making use of Proposition 3.1, we can now state a necessary condition for a local minimum of (24).

Proposition 4.2. If $\bar{x} \in A$ is a local optimal solution of (24), then
\[
P_B(\bar{x}) \subset \bar{x} + N^L(\bar{x}; A), \tag{25}
\]
where $N^L(\bar{x}; A)$ is the limiting normal cone to $A$ at $\bar{x}$, defined by $N^L(\bar{x}; A) = \partial^L\delta(\bar{x}; A)$.

Note that the optimality condition (25) is not sufficient to ensure that $\bar{x}$ is a local minimizer of (24), as shown in the next example.

Example 4.1. Consider the following subsets of $\mathbb{R}^2$:
\[
A = \{(x_1, x_2) \in \mathbb{R}^2 : x_2 \geq 1\} \quad \text{and} \quad B = \{(x_1, x_2) \in \mathbb{R}^2 : x_2 \leq \alpha x_1^2\},
\]
where $\alpha < \frac{1}{2}$. Put $\bar{x} = (0, 1) \in A$. Since $\alpha < \frac{1}{2}$, the system
\[
x_1^2 + (x_2 - 1)^2 \leq 1, \qquad x_2 - \alpha x_1^2 \leq 0
\]
has the unique solution $(x_1, x_2) = (0, 0)$. This implies $P_B(\bar{x}) = \{(0, 0)\}$ and $d_B(\bar{x}) = 1$. Obviously, $\bar{x}$ satisfies condition (25), since
\[
P_B(\bar{x}) = \{(0, 0)\} \subset \{(0, \gamma) : \gamma \leq 1\} = \bar{x} + N(\bar{x}; A).
\]
However, for any neighborhood $U$ of $\bar{x}$, there exists $\epsilon > 0$ small enough that $x_\epsilon = (\epsilon, 1) \in U$ and $d_B(x_\epsilon) \leq 1 - \alpha\epsilon^2 < 1$. Thus, $\bar{x}$ cannot be a local minimizer of (24).

Based on the (GPPA), we now propose the following simple algorithm for solving (24). For a given initial point $x^0 \in A$, the (GPPA) sequence $\{x^k\}$ with starting point $x^0$ is defined by
\[
x^{k+1} \in P_A\Big( \Big(1 - \frac{1}{t}\Big)x^k + \frac{1}{t}y^k \Big), \tag{26}
\]
where $y^k$ is an element chosen in $\operatorname{co} P_B(x^k)$. Note that this scheme is different from some other well-known methods, such as the alternating projection algorithm and the averaged projection algorithm. Moreover, it cannot be obtained from the proximal forward-backward schemes in [30, 27].

Theorem 4.1. Let $A$ and $B$ be nonempty closed sets in $\mathbb{R}^n$ and let $t > 1$. Then the sequence $\{x^k\} \subset A$ satisfies the following:
(i) For any $k \geq 1$, $d_B^2(x^k) - d_B^2(x^{k+1}) \geq (t - 1)\|x^k - x^{k+1}\|^2$.
(ii) $\lim_{k \to +\infty}\|x^k - x^{k+1}\| = 0$.
(iii) If $\{x^k\}$ is bounded, then every cluster point is a critical point of $f = \delta(\cdot; A) + \frac{1}{2}d_B^2(\cdot)$.
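To make the scheme concrete, here is a toy Python sketch (our instance: a circle and a line in the plane, both semi-algebraic, the circle nonconvex), implementing iteration (26); since the line is convex, $\operatorname{co} P_B(x^k) = P_B(x^k)$ is a singleton.

```python
import numpy as np

# Made-up instance: A = unit circle (nonconvex), B = line {x2 = 0.5} (convex);
# they intersect at (+-sqrt(0.75), 0.5).
t = 2.0                                             # regularization parameter, t > 1

proj_A = lambda z: z / np.linalg.norm(z)            # projection onto the circle (z != 0)
proj_B = lambda z: np.array([z[0], 0.5])            # projection onto the line

x = proj_A(np.array([1.0, 0.1]))                    # x0 in A
for k in range(200):
    y = proj_B(x)                                   # y^k in co P_B(x^k)
    x_new = proj_A((1 - 1 / t) * x + y / t)         # iteration (26)
    if np.linalg.norm(x - x_new) <= 1e-12:
        break
    x = x_new
print(x, np.linalg.norm(x - proj_B(x)))  # x ~ (sqrt(0.75), 0.5), distance to B ~ 0
```

On this instance the iterates converge to a point of $A \cap B$, consistent with Theorem 4.1: the distance values $d_B^2(x^k)$ decrease monotonically to the zero optimal value of (24).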
Proposition 4.3. Let $A$ and $B$ be nonempty closed semi-algebraic sets in $\mathbb{R}^n$ such that $B$ is convex. Suppose further that either $A$ or $B$ is bounded. Then the sequence $\{x^k\}$ generated by the (GPPA) converges to a critical point of (24).

Proof. As $A$ is a semi-algebraic set, the indicator function $\delta(\cdot; A)$ is a semi-algebraic function. On the other hand, $B$ is also semi-algebraic, so $x \mapsto \frac{1}{2}d_B^2(x)$ is also a semi-algebraic function; see [30, Lemma 2.3]. Therefore, $f(x) = \delta(x; A) + \frac{1}{2}d_B^2(x)$ is a semi-algebraic function. Since $B$ is closed and convex, it is well known that the function $x \mapsto \frac{1}{2}d_B^2(x)$ is smooth with a 1-Lipschitz continuous gradient; see [35, Corollary 12.30]. The result now follows directly from Theorem 3.2, since the boundedness of $\{x^k\}$ is ensured by the coercivity of $f$ under the assumption that either $A$ or $B$ is bounded.

5 Concluding Remarks

Based on recent progress in using the Kurdyka–Łojasiewicz property and variational analysis to analyze nonsmooth optimization algorithms, we have introduced and studied the convergence analysis of a proximal point algorithm for minimizing differences of functions. We are able to relax some of the convexity assumptions of classical DC programming and thus deal with a more general class of problems. The results open up the possibility of understanding the convergence of the (DCA) and other algorithms for minimizing differences of convex functions used in numerous applications.

Acknowledgment. The authors are very grateful to Prof. Jérôme Bolte for his helpful suggestions to improve the paper. The authors are also thankful to Prof. Nguyen Dong Yen and Dr. Hoang Ngoc Tuan for useful discussions on the subject. This work was completed while the first author was visiting the Vietnam Institute for Advanced Study in Mathematics (VIASM); he would like to thank the VIASM for financial support and hospitality. The research of the second author was partially supported by the USA National Science Foundation under grant DMS-1411817 and the Simons Foundation under grant #208785.

References

1. Tuy, H.: Convex Analysis and Global Optimization, Kluwer Academic Publishers, Dordrecht, The Netherlands (1998)
2. Horst, R., Tuy, H.: Global Optimization, Springer-Verlag (1990)
3. Bacák, M., Borwein, J. M.: On difference convexity of locally Lipschitz functions, Optimization, 60, 961-978 (2011)
4. Pham, D. T., Souad, E. B.: Algorithms for solving a class of nonconvex optimization problems: methods of subgradient, Fermat Days 85: Mathematics for Optimization, Elsevier, North-Holland, 249-270 (1986)
5. Pham Dinh, T., Le Thi, H. A.: Convex analysis approach to D.C. programming: theory, algorithms and applications, Acta Math. Vietnam., 22, 289-355 (1997)
6. Pham Dinh, T., Le Thi, H. A.: A d.c. optimization algorithm for solving the trust-region subproblem, SIAM J. Optim., 8, 476-505 (1998)
7. Pham, D. T., An, L. T. H., Akoa, F.: The DC (Difference of Convex functions) programming and DCA revisited with DC models of real world nonconvex optimization problems, Annals of Operations Research, 133, 23-46 (2005)
8. Muu, L. D., Quoc, T. D.: One step from DC optimization to DC mixed variational inequalities, Optimization, 59, 63-76 (2010)
9. Martinet, B.: Régularisation d'inéquations variationnelles par approximations successives, Rev. Française d'Inform. Recherche Opér., 4, 154-159 (1970)
10. Rockafellar, R. T.: Monotone operators and the proximal point algorithm, SIAM J. Control Optim., 14(5), 877-898 (1976)
11. Sun, W., Sampaio, R. J. B., Candido, M. A. B.: Proximal point algorithm for minimization of DC functions, Journal of Computational Mathematics, 21, 451-462 (2003)
12. Moudafi, A., Maingé, P. E.: On the convergence of an approximate proximal method for DC functions, Journal of Computational Mathematics, 24, 475-480 (2006)
13. Bento, G. C., Ferreira, O. P., Oliveira, P. R.: Proximal point method for a special class of nonconvex functions on Hadamard manifolds, Optimization, 64(2), 289-319 (2015)
14. Souza, S. S., Oliveira, P. R., Cruz Neto, J. X., Soubeyran, A.: A proximal method with separable Bregman distances for quasiconvex minimization over the nonnegative orthant, European Journal of Operational Research, 201, 365-376 (2010)
15. Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Łojasiewicz inequality, Mathematics of Operations Research, 35, 438-457 (2010)
16. Bolte, J., Pauwels, E.: Majorization-minimization procedures and convergence of SQP methods for semi-algebraic and tame programs, preprint (2015)
17. Pham Dinh, T., Ngai, H. V., Le Thi, H. A.: Convergence analysis of DC algorithm for DC programming with subanalytic data, preprint (2013)
18. Clarke, F. H.: Optimization and Nonsmooth Analysis, SIAM, Philadelphia (1990)
19. Mordukhovich, B. S.: Variational Analysis and Generalized Differentiation, I: Basic Theory, Springer, Berlin (2006)
20. Mordukhovich, B. S., Nam, N. M.: An Easy Path to Convex Analysis and Applications, Morgan & Claypool Publishers (2014)
21. Rockafellar, R. T.: Convex Analysis, Princeton University Press, Princeton, NJ (1970)
22. Rockafellar, R. T., Wets, R.: Variational Analysis, Grundlehren der Mathematischen Wissenschaften, 317, Springer (1998)
23. Bolte, J., Daniilidis, A., Lewis, A. S., Shiota, M.: Clarke subgradients of stratifiable functions, SIAM J. Optim., 18, 556-572 (2007)
24. Hiriart-Urruty, J. B.: Generalized differentiability, duality and optimization for problems dealing with differences of convex functions, Lecture Notes in Economics and Math. Systems, 256, 37-70 (1985)
25. Horst, R., Thoai, N. V.: DC programming: overview, Journal of Optimization Theory and Applications, 103(1), 1-43 (1999)
26. Mordukhovich, B. S., Nam, N. M., Yen, N. D.: Fréchet subdifferential calculus and optimality conditions in nondifferentiable programming, Optimization, 55, 685-708 (2006)
27. Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems, Math. Program., Ser. A, 146, 459-494 (2014)
28. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, Applied Optimization, Kluwer Academic Publishers, Boston, Dordrecht, London (2004)
29. Ortega, J. M., Rheinboldt, W. C.: Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York (1970)
30. Attouch, H., Bolte, J., Svaiter, B.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods, Math. Program., Ser. A, 137(1), 91-124 (2011)
31. Souza, J. C., Oliveira, P. R., Soubeyran, A.: A modified generalized proximal point algorithm for DC functions with application to the optimal size of the firm problem, submitted to European J. Oper. Res. (2015)
32. Attouch, H., Bolte, J.: On the convergence of the proximal algorithm for nonsmooth functions involving analytic features, Math. Program., 116(1-2), Ser. B, 5-16 (2009)
33. Ioffe, A. D., Tihomirov, V. M.: Theory of Extremal Problems, Studies in Mathematics and its Applications, vol. 6, North-Holland Publishing Co., Amsterdam-New York (1979). Translated from the Russian by Karol Makowski.
34. Tuan, H. N., Yen, N. D.: Convergence of Pham Dinh–Le Thi's algorithm for the trust-region subproblem, J. Global Optim., 55, 337-347 (2013)
35. Bauschke, H. H., Combettes, P. L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces, Springer (2011)