Convex Analysis, Nonsmooth Analysis, and Variational Analysis
In contrast to classical calculus, the focus shifts from differentiable functions to convex functions that may not be differentiable. Unlike the traditional case, where the derivative at a point is a single value, the subdifferential of a convex function is a set of values. This concept is closely linked to the geometric notion of the normal cone to a convex set, a notion that traces back to Minkowski.
Before the contributions of Rockafellar and Moreau, concepts of generalized differentiation had been explored in mathematics and the applied sciences by figures such as Saks and Sobolev. However, those generalized derivatives disregard sets of measure zero, rendering them less useful in optimization theory, where the behavior of a function at specific points is crucial. Convex analysis provides the theoretical foundation for numerical methods in convex optimization, allowing for a thorough examination of the properties of optimal solutions and the formulation of necessary and sufficient optimality conditions. This framework also facilitates the development of effective numerical algorithms for solving convex optimization problems, even when the objective functions are nondifferentiable. The influence of convex analysis and optimization continues to grow across various fields, including automatic control systems, signal processing, communications, electronic circuit design, data analysis, statistics, machine learning, and economics.
The beauty and applications of convex analysis motivated the search for a new theory to deal with broader classes of functions and sets where convexity is not assumed. This search has been the inspiration for the development of variational analysis, also known as nonsmooth analysis, initiated in the early 1970s. Variational analysis has now become a well-developed field with many applications, especially to optimization theory; see the cornerstone references [9, 38, 64, 65] for more history and recent developments of the field.
Optimization
To solve an optimization problem, one seeks an input x̄ ∈ C such that f(x̄) ≤ f(x) for all x ∈ C, identifying x̄ as a solution. When f is a convex function and C is a convex set, the problem is classified as convex optimization. These problems can be further categorized by the nature of the objective function, either differentiable (smooth) or nondifferentiable (nonsmooth), with the latter being particularly significant in convex optimization. To approximate a solution, optimization algorithms are employed; these typically follow an iterative process that starts with an initial guess x₀ and progressively refines it to create a sequence x₀, x₁, x₂, … converging to x̄. The algorithms discussed in this thesis leverage concepts from convex, nonsmooth, or variational analysis.
Operations research utilizes optimization techniques to address real-world challenges in industry and logistics. Among the earliest issues examined in this field are location problems, which are explored thoroughly in this dissertation.
Electric Power Systems
In California, electricity demand on hot days has remained essentially unchanged since the year 2000, reflecting the operational principles of the electric power system established in the early 1900s. This legacy system was designed to accommodate consumer peak demand by adjusting generation levels accordingly. Notably, electricity consumption fluctuates throughout the day, with peak demand often reaching twice the minimum demand within a 24-hour cycle.
In recent decades, the electric power system has been significantly impacted by two major advancements: the rise of variable generation sources such as wind and solar energy, and the introduction of distributed energy resources, including rooftop solar, energy storage, and controllable loads such as smart appliances and electric vehicles. These developments create opportunities for optimization, enabling system planners and operators to minimize costs, maximize participant revenue, and reduce emissions in a more efficient energy landscape.
Overview of Research
• Chapter 2: Use the optimal value function and coderivatives to establish subdifferential formulas in locally convex topological vector spaces. Develop formulas for coderivatives in infinite dimensions.
• Chapter 3: Apply nonsmooth optimization techniques to location problems:
– Apply Nesterov's smoothing technique and an accelerated gradient method to facility location problems.
– Apply DC programming techniques to problems of multifacility location and clustering.
• Chapter 4: Apply optimization techniques to a problem in electric power systems: develop an optimal control scheme for a smart solar inverter and battery-storage system operating in a transactive control setting.
Basic Tools of Convex Analysis and Optimization
Definitions
When we take Y to be the set of real numbers R, we consider the ordering cone Y₊ to be the interval [0, +∞); when Y is the extended real line (−∞, +∞], we likewise take Y₊ := [0, +∞). The notation Y := R indicates that the object on the left is defined to be the object on the right. Additionally, a function φ: Y → R is called nondecreasing if for any two values y and z in Y, the condition y ≤ z implies φ(y) ≤ φ(z).
A map f: X → Y is convex if f(λx₁ + (1−λ)x₂) ≤ λf(x₁) + (1−λ)f(x₂) for all x₁, x₂ ∈ X and all λ ∈ (0,1). When Y = R, we define the domain of f to be the set dom f := {x ∈ X | f(x) < ∞}. The epigraph of f is the set epi f := {(x, λ) ∈ X × R | f(x) ≤ λ}.
A set Ω ⊂ X is convex if for all x₁, x₂ ∈ Ω and all λ ∈ (0,1) we have λx₁ + (1−λ)x₂ ∈ Ω.
An important geometric property of convex functions is that f is convex if and only if its epigraph is convex.
In mathematical notation, we represent set-valued maps using double arrows, such as G: X ⇉ Y, indicating that for each input x ∈ X the output G(x) is a subset of Y, which may be empty. The domain of a set-valued map, dom G := {x ∈ X | G(x) ≠ ∅}, consists of all inputs that yield a nonempty output. The graph of G is the set gph G := {(x, y) ∈ X × Y | y ∈ G(x)}. If gph G is a convex subset of X × Y, then we say that G is a convex set-valued map. The subdifferential of a convex function f: X → R at x̄ ∈ dom f is the set
∂f(x̄) := {x* ∈ X* | ⟨x*, x − x̄⟩ ≤ f(x) − f(x̄) for all x ∈ X}.
Elements of this set are called the subgradients of f at x̄. In this way, the operator ∂f is a set-valued map ∂f: X ⇉ X*. (The overline x̄ is often used to denote a point of interest, not necessarily the solution to an optimization problem.)
Another important set-valued map into X* is given by the normal cone. Let Ω ⊂ X be convex and x̄ ∈ Ω. Then the normal cone to Ω at x̄ is defined by
N(x̄; Ω) := {x* ∈ X* | ⟨x*, x − x̄⟩ ≤ 0 for all x ∈ Ω}.
The following proposition, which provides a useful representation of the subdifferential via the normal cone to the epigraph, is easily proven.
Proposition 1.1.1 Let f: X → R be convex and let x̄ ∈ dom f. Then we have
∂f(x̄) = {x* ∈ X* | (x*, −1) ∈ N((x̄, f(x̄)); epi f)}.
Proof. For notation, set W := {x* ∈ X* | (x*, −1) ∈ N((x̄, f(x̄)); epi f)}. Let x* ∈ ∂f(x̄) and pick any (x, λ) ∈ epi f. Then we have
⟨(x*, −1), (x, λ) − (x̄, f(x̄))⟩ = ⟨x*, x − x̄⟩ − (λ − f(x̄))
≤ ⟨x*, x − x̄⟩ − (f(x) − f(x̄)) ≤ 0,
where the last inequality holds because x* ∈ ∂f(x̄), so x* ∈ W.
For the reverse containment, let u* ∈ W. Since ⟨(u*, −1), (x, λ) − (x̄, f(x̄))⟩ ≤ 0 for all (x, λ) ∈ epi f, taking λ = f(x) we have ⟨u*, x − x̄⟩ − (f(x) − f(x̄)) ≤ 0 for all x ∈ dom f, so u* ∈ ∂f(x̄). This completes the proof.
In our geometric approach, two key definitions are the indicator function and the support function. The indicator function δ_Ω: X → R is defined by δ_Ω(x) := 0 if x ∈ Ω and δ_Ω(x) := ∞ if x ∉ Ω. For a convex set Ω, it can be shown that the subdifferential ∂δ_Ω(x̄) equals the normal cone N(x̄; Ω). Meanwhile, the support function σ_Ω: X* → R is defined by σ_Ω(x*) := sup{⟨x*, x⟩ | x ∈ Ω}. If x̄ ∈ Ω, then x* ∈ N(x̄; Ω) if and only if σ_Ω(x*) = ⟨x*, x̄⟩.
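As a simple one-dimensional illustration (our own, not part of the development below), consider Ω = [−1, 1] ⊂ R and x̄ = 1. Then N(1; Ω) = [0, ∞) = ∂δ_Ω(1), and σ_Ω(x*) = |x*| for every x* ∈ R. For x* ≥ 0 we have σ_Ω(x*) = x* = ⟨x*, 1⟩, so x* ∈ N(1; Ω), while for x* < 0 we have σ_Ω(x*) = −x* > ⟨x*, 1⟩, so x* ∉ N(1; Ω), consistent with the characterization above.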
Optimal Value Function
Definition 1.1.2 Given G: X ⇉ Y and φ: X × Y → R, we define the optimal value function µ: X → R by
µ(x) := inf{φ(x, y) | y ∈ G(x)}.
It can be helpful to think of φ(x, ·) as the “objective function” and G(x) as the “constraint set.” We have as a standing assumption in this document that µ(x) > −∞ for all x ∈ X.
Definition 1.1.3 The solution set of µ at x̄ is M(x̄) := {y ∈ G(x̄) | µ(x̄) = φ(x̄, y)}.
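As a simple illustration of these definitions (an example of our own, not drawn from the results below), take X = Y = R, G(x) := [0, ∞) for every x, and φ(x, y) := (x − y)². Then
µ(x) = inf{(x − y)² | y ≥ 0} = 0 for x ≥ 0 and µ(x) = x² for x < 0,
with solution sets M(x̄) = {x̄} for x̄ ≥ 0 and M(x̄) = {0} for x̄ < 0. Note that µ is convex here even though it is defined through an infimum.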
Optimization Algorithms
1.1.3.1 Subgradient Method. The subgradient method is a standard method for solving nonsmooth optimization problems. It is outlined as follows.
Given an initial point x_0 ∈ R^n, the subgradient method uses the update rule x_{k+1} := x_k − t_k w_k, where w_k ∈ ∂f(x_k) is a subgradient of f at x_k and t_k > 0 is a predetermined step size. When f is convex, the method converges for any initial point x_0 provided the step sizes (t_k) satisfy Σ_{k=1}^∞ t_k = ∞ and Σ_{k=1}^∞ t_k² < ∞. In general, the subgradient method converges slowly: the best objective value found after k iterations approaches the optimal value at a rate of order O(1/√k).
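As an illustration only (not part of the thesis), the following Python sketch applies the subgradient iteration above to the nonsmooth convex function f(x) = ‖x‖₁, using the subgradient sign(x) and the step sizes t_k = 1/(k+1), which satisfy the two step-size conditions.

```python
# A minimal sketch of the subgradient method for the nonsmooth convex
# function f(x) = ||x||_1. Any value in [-1, 1] is a valid subgradient
# coordinate where x_i = 0; np.sign conveniently returns 0 there.

import numpy as np

def subgradient_method(x0: np.ndarray, num_iter: int = 500) -> np.ndarray:
    """Minimize f(x) = ||x||_1 via x_{k+1} = x_k - t_k * w_k."""
    x = x0.astype(float)
    best_x, best_val = x.copy(), np.abs(x).sum()
    for k in range(num_iter):
        w = np.sign(x)                  # a subgradient of ||.||_1 at x
        t = 1.0 / (k + 1)               # sum t_k = inf, sum t_k**2 < inf
        x = x - t * w
        val = np.abs(x).sum()
        if val < best_val:              # f(x_k) need not decrease monotonically,
            best_x, best_val = x.copy(), val  # so track the best iterate
    return best_x

if __name__ == "__main__":
    print(subgradient_method(np.array([2.0, -1.5, 0.5])))
```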
1.1.3.2 Stochastic Subgradient Method. The stochastic subgradient method is a variation on the subgradient method. It is particularly well-suited for problems where the objective function is a sum of convex functions.
Definition 1.1.4 Let f: R^n → R be a convex function. A vector-valued random variable Ṽ ∈ R^n is called a noisy unbiased subgradient of f at x̄ if the expected value E(Ṽ) ∈ ∂f(x̄). That is,
f(x) ≥ f(x̄) + ⟨E(Ṽ), x − x̄⟩ for all x ∈ R^n.
The stochastic subgradient method is outlined as follows.
Input x_0 ∈ R^n. Then update x_{k+1} := x_k − t_k ṽ_k, where E(ṽ_k) ∈ ∂f(x_k) and t_k > 0 is a predetermined step size.
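The sketch below (again illustrative, with an assumed finite-sum objective) applies this iteration to f(x) = (1/m) Σ_i |a_iᵀx − b_i|; sampling one index i uniformly and taking a subgradient of the sampled term gives a noisy unbiased subgradient of f in the sense of Definition 1.1.4.

```python
# A minimal sketch of the stochastic subgradient method for a finite-sum
# objective f(x) = (1/m) * sum_i |a_i^T x - b_i|. With i sampled uniformly,
# the expected value of v_k below lies in the subdifferential of f at x_k.

import numpy as np

rng = np.random.default_rng(0)

def stochastic_subgradient(A: np.ndarray, b: np.ndarray,
                           x0: np.ndarray, num_iter: int = 2000) -> np.ndarray:
    """Minimize (1/m) * sum_i |a_i^T x - b_i| via x_{k+1} = x_k - t_k * v_k."""
    x = x0.astype(float)
    m = A.shape[0]
    for k in range(num_iter):
        i = rng.integers(m)              # sample one term uniformly
        r = A[i] @ x - b[i]
        v = np.sign(r) * A[i]            # subgradient of |a_i^T x - b_i| at x
        t = 1.0 / (k + 1)                # diminishing step size
        x = x - t * v
    return x

if __name__ == "__main__":
    A = rng.standard_normal((50, 3))
    x_true = np.array([1.0, -2.0, 0.5])
    b = A @ x_true                       # the minimum value is then 0 at x_true
    print(stochastic_subgradient(A, b, np.zeros(3)))
```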
Convex optimization is an expanding domain that encompasses various algorithms, including smoothing methods, proximal point methods, bundle methods, and majorization-minimization methods. This field has diverse applications across machine learning, computational statistics, optimal control, neural network training, data mining, engineering, and economics.
1.1.3.3 Optimization Beyond Convexity. We may often be presented with an optimization problem in which the objective function is not convex. No complete theory exists for finding solutions to these types of problems, but certain results may be obtained using the tools of convex analysis. A main focus of this thesis is to develop algorithms for optimization problems in which the objective function is not necessarily convex. We present below one such class of nonconvex optimization problems and an algorithm for finding their solutions.
DC programming stands for “Difference of Convex” programming. It offers a method to solve the following types of optimization problems.
Let f(x) = g(x) − h(x), where g: R^n → R and h: R^n → R are convex functions. Then f is called a DC function, since it is the difference of convex functions.
The problem
minimize f(x) = g(x) − h(x), x ∈ R^n, (1.1.1)
is a DC optimization problem.
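As a concrete one-dimensional example of a DC function (ours, included for illustration), take f(x) = x⁴ − x² with g(x) = x⁴ and h(x) = x². Both g and h are convex on R, yet f itself is not convex; this decomposition is reused in the DCA sketch at the end of this subsection.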
The framework for DC programming was constructed by Tao and An in their papers [79, 80] in the late 1990s; its essential elements are presented below.
The DC Programming Algorithm (DCA):
One of the key components in the DCA is the Fenchel conjugate ϕ* of a convex function ϕ: R^n → (−∞, +∞], defined by
ϕ*(v) := sup{⟨v, x⟩ − ϕ(x) | x ∈ R^n}.
If ϕ is proper, i.e., dom ϕ ≠ ∅, then ϕ*: R^n → (−∞, +∞] is also a convex function. Some other important properties of the Fenchel conjugate of a convex function are given in the following proposition.
Proposition 1.1.5 Let ϕ: R^n → (−∞, +∞] be a convex function.
(i) Given any x ∈ dom ϕ, one has that y ∈ ∂ϕ(x) if and only if ϕ(x) + ϕ*(y) = ⟨x, y⟩.
(ii) If ϕ is proper and lower semicontinuous, then for any x ∈ dom ϕ one has that y ∈ ∂ϕ(x) if and only if x ∈ ∂ϕ*(y).
(iii) If ϕ is proper and lower semicontinuous, then (ϕ*)* = ϕ.
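A standard illustration of the conjugate and of property (i) (included here for convenience, not part of the cited results) is ϕ(x) = ½‖x‖² on R^n: the supremum defining ϕ*(v) is attained at x = v, so ϕ*(v) = ½‖v‖². In this case ∂ϕ(x) = {x}, and indeed ϕ(x) + ϕ*(y) = ½‖x‖² + ½‖y‖² equals ⟨x, y⟩ exactly when ½‖x − y‖² = 0, that is, when y = x.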
The DC optimization problem (1.1.1) possesses useful optimality conditions, one of which is given in the next proposition.
Proposition 1.1.6 If x̄ ∈ dom f is a local minimizer of (1.1.1), then
∂h(x̄) ⊂ ∂g(x̄). (1.1.2)
A point x̄ ∈ dom f satisfying condition (1.1.2) is called a stationary point of (1.1.1), while x̄ is called a critical point of (1.1.1) if the intersection of the subdifferentials ∂g(x̄) and ∂h(x̄) is nonempty. Every stationary point is a critical point, but the converse is not true in general. Additionally, the Toland dual of (1.1.1) is the problem
minimize h*(y) − g*(y), y ∈ R^n. (1.1.3)
Using the convention (+∞) − (+∞) = +∞, we have the following relationship between a DC optimization problem and its Toland dual.
Proposition 1.1.7 Considering the function f = g − h given in (1.1.1), one has
inf{g(x) − h(x) | x ∈ R^n} = inf{h*(y) − g*(y) | y ∈ R^n}.
The DCA, grounded in Toland’s duality theorem and Proposition 1.1.5, constructs two sequences {x_k} and {y_k} such that the sequences {g(x_k) − h(x_k)} and {h*(y_k) − g*(y_k)} are both monotone decreasing. This construction ensures that every cluster point x̄ of the sequence {x_k} is a critical point of problem (1.1.1), while every cluster point ȳ of the sequence {y_k} is a critical point of (1.1.3), i.e., ∂g*(ȳ) ∩ ∂h*(ȳ) ≠ ∅.
The DCA is summarized as follows:
Step 1 Choose an initial point x_0 ∈ dom g and set k := 0.
Step 2 For k ≥ 0, use x_k to find y_k ∈ ∂h(x_k).
Step 3 Use y_k to find x_{k+1} ∈ ∂g*(y_k).
Step 4 Increase k by 1 and go back to Step 2.
In the case where we cannot find y_k or x_{k+1} exactly, we can find them approximately by solving a convex optimization problem. This idea is explored in Chapter 3.
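As an illustration only (not part of the thesis), the following Python sketch runs the DCA on the toy decomposition f(x) = x⁴ − x² with g(x) = x⁴ and h(x) = x² mentioned earlier; here both steps have closed forms, since y_k = h'(x_k) = 2x_k and x_{k+1} ∈ ∂g*(y_k) is the unique solution of 4x³ = y_k.

```python
# A minimal DCA sketch for the illustrative decomposition
# f(x) = g(x) - h(x) with g(x) = x**4 and h(x) = x**2 (both convex on R).
# Step 2: y_k = h'(x_k) = 2*x_k (h is smooth, so this is the only subgradient).
# Step 3: x_{k+1} = argmin_x { g(x) - y_k * x }, i.e. 4*x**3 = y_k,
#         so x_{k+1} = cbrt(y_k / 4), the element of dg*(y_k).

import math

def dca_quartic(x0: float, max_iter: int = 50, tol: float = 1e-10) -> float:
    """Run the DCA on f(x) = x**4 - x**2 starting from x0."""
    x = x0
    for _ in range(max_iter):
        y = 2.0 * x                                              # Step 2
        x_next = math.copysign(abs(y / 4.0) ** (1.0 / 3.0), y)   # Step 3
        if abs(x_next - x) < tol:                                # simple stopping test
            return x_next
        x = x_next                                               # Step 4
    return x

if __name__ == "__main__":
    print(dca_quartic(0.1))   # approaches  1/sqrt(2) ~  0.7071
    print(dca_quartic(-3.0))  # approaches -1/sqrt(2) ~ -0.7071
```

Starting from any nonzero x_0 the iterates approach ±1/√2, the global minimizers of f, while x_0 = 0 is a critical point at which the iteration stalls.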
Generalized differential calculus encompasses the calculus rules and derivative constructions designed for nonsmooth functions and set-valued mappings, which frequently occur in applications. This area of study lays the mathematical groundwork essential for nonsmooth optimization.
This chapter presents results from a geometric approach to convex analysis and generalized differential calculus developed by B. S. Mordukhovich. This method uses the normal cone, the optimal value function, and the coderivative to simplify the proofs of both new and existing generalized calculus results. The simplicity of these proofs makes the material accessible for teaching to beginning graduate and even undergraduate students, material that has traditionally been reserved for advanced graduate courses. Throughout this chapter, we consider X and Y to be Hausdorff locally convex topological vector spaces over R, unless stated otherwise.
This chapter is divided into two sections. Section 2.1 discusses a key result that connects the subdifferential of the optimal value function to the coderivative; although this result is established, we use it in new ways to derive subdifferential formulas for various convex functions. Building on the significance of coderivatives, Section 2.2 introduces new formulas for the coderivatives of various set-valued mappings.
A Geometric Approach to Subdifferential Calculus
This section develops the rules of subdifferential calculus through the optimal value function and coderivatives. The relationship between the subdifferential of the optimal value function and the coderivative is established by a key result, referred to as “the fundamental theorem” (Theorem 2.1.8). The proof of this theorem relies on the subdifferential sum rule, which is derived from the normal cone intersection rule. A novel approach is used to prove the normal cone intersection rule, employing support functions and the convex extremal principle. We begin with a definition before advancing to the relevant theorems.
Definition 2.1.1 We say that two nonempty sets Ω₁, Ω₂ ⊂ X form an extremal system if for any neighborhood V of the origin there exists a vector a ∈ V such that
(Ω₁ + a) ∩ Ω₂ = ∅.
The following theorem is a key element of the convex extremal principle, which is derived from the classical separation principle. For a detailed proof and additional insights, refer to the work by Mordukhovich, Nam, Rector, and Tran [39].
Theorem 2.1.2 Let Ω₁, Ω₂ ⊂ X be nonempty and convex. If Ω₁ and Ω₂ form an extremal system and int(Ω₁ − Ω₂) ≠ ∅, then Ω₁ and Ω₂ can be separated, i.e., there exists some nonzero x* ∈ X* such that
sup{⟨x*, x⟩ | x ∈ Ω₁} ≤ inf{⟨x*, x⟩ | x ∈ Ω₂}. (2.1.5)
We apply this result in the proof of the following theorem on support functions of intersections of sets. As mentioned, we use this result to prove the normal cone intersection rule.
Theorem 2.1.3 Let Ω₁, Ω₂ ⊂ X be nonempty and convex. Suppose that either (int Ω₂) ∩ Ω₁ ≠ ∅ or (int Ω₁) ∩ Ω₂ ≠ ∅. Then for any x* ∈ dom(σ_{Ω₁∩Ω₂}) there exist x₁*, x₂* ∈ X* such that x* = x₁* + x₂* and
σ_{Ω₁∩Ω₂}(x*) = σ_{Ω₁}(x₁*) + σ_{Ω₂}(x₂*). (2.1.6)
Proof. First we note that for any x₁*, x₂* ∈ X* with x₁* + x₂* = x*, we have
⟨x₁*, x⟩ + ⟨x₂*, x⟩ ≤ σ_{Ω₁}(x₁*) + σ_{Ω₂}(x₂*) for any x ∈ Ω₁ ∩ Ω₂.
Taking the supremum over x ∈ Ω₁ ∩ Ω₂ establishes the “≤” inequality in (2.1.6).
To establish the reverse inequality, we use the convex extremal principle, Theorem 2.1.2, to identify elements x₁* and x₂* of X* yielding the “≥” inequality. Let x* ∈ dom(σ_{Ω₁∩Ω₂}) and set α := σ_{Ω₁∩Ω₂}(x*), so that ⟨x*, x⟩ − α ≤ 0 for all x ∈ Ω₁ ∩ Ω₂. We then define two sets for the application of Theorem 2.1.2:
Θ₁ := Ω₁ × [0, ∞) and Θ₂ := {(x, λ) ∈ X × R | x ∈ Ω₂, λ ≤ ⟨x*, x⟩ − α}.
We can see from the construction of Θ₁ and Θ₂ that
(Θ₁ + (0, γ)) ∩ Θ₂ = ∅ for any γ > 0,
so Θ₁ and Θ₂ form an extremal system. To apply Theorem 2.1.2, we need to show that int(Θ₁ − Θ₂) ≠ ∅. We will use the assumption that (int Ω₂) ∩ Ω₁ ≠ ∅. The interior of Θ₂ is
int(Θ₂) = {(x, λ) ∈ X × R | x ∈ int(Ω₂), λ < ⟨x*, x⟩ − α}.