Faculty of Mathematics, Mechanics and Informatics, VNU University of Science, Vietnam National University, Hanoi
(1)Subgradient Method
Hoàng Nam Dũng
(2)Last last time: gradient descent
Consider the problem
$$\min_x f(x)$$
for $f$ convex and differentiable, $\mathrm{dom}(f) = \mathbb{R}^n$.
Gradient descent: choose initial $x^{(0)} \in \mathbb{R}^n$, repeat:
$$x^{(k)} = x^{(k-1)} - t_k \cdot \nabla f(x^{(k-1)}), \quad k = 1, 2, 3, \ldots$$
Step sizes $t_k$ chosen to be fixed and small, or by backtracking line search.
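As a rough illustration of the update above, here is a minimal Python sketch with a fixed step size. The objective $f(x) = \frac{1}{2}\|x - c\|_2^2$, the vector `c`, and the helper name `gradient_descent` are assumptions chosen for the example; they do not come from the slides.

```python
def gradient_descent(grad, x0, step, num_iters):
    """Run x^(k) = x^(k-1) - t * grad(x^(k-1)) with a fixed step size t."""
    x = list(x0)
    for _ in range(num_iters):
        g = grad(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


# Illustrative objective (an assumption): f(x) = 0.5 * ||x - c||_2^2,
# which is convex and differentiable with gradient x - c and minimizer c.
c = [1.0, -2.0]


def grad_f(x):
    return [xi - ci for xi, ci in zip(x, c)]


x_final = gradient_descent(grad_f, x0=[0.0, 0.0], step=0.5, num_iters=50)
print(x_final)  # should be very close to c
```

For this particular quadratic each step halves the distance to `c`, so the iterates converge quickly; other objectives generally need a different step size or a line search.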
(3)Subgradient method
Now consider $f$ convex, having $\mathrm{dom}(f) = \mathbb{R}^n$, but not necessarily differentiable.
Subgradient method: like gradient descent, but replacing gradients with subgradients, i.e., initialize $x^{(0)}$, repeat:
$$x^{(k)} = x^{(k-1)} - t_k \cdot g^{(k-1)}, \quad k = 1, 2, 3, \ldots$$
where $g^{(k-1)} \in \partial f(x^{(k-1)})$ is any subgradient of $f$ at $x^{(k-1)}$.
The subgradient method is not necessarily a descent method, so we keep track of the best iterate $x_{\mathrm{best}}^{(k)}$ among $x^{(0)}, \ldots, x^{(k)}$ so far, i.e.,
$$f(x_{\mathrm{best}}^{(k)}) = \min_{i = 0, \ldots, k} f(x^{(i)}).$$
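The iteration and the best-iterate bookkeeping can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the test function $f(x) = \|x\|_1$, the starting point, the step rule $t_k = 1/k$, and the helper names are all hypothetical choices for illustration.

```python
def subgradient_method(f, subgrad, x0, step_size, num_iters):
    """Iterate x^(k) = x^(k-1) - t_k * g^(k-1) and track the best iterate."""
    x = list(x0)
    x_best, f_best = list(x), f(x)
    for k in range(1, num_iters + 1):
        g = subgrad(x)          # any subgradient at the current point
        t = step_size(k)        # step size t_k
        x = [xi - t * gi for xi, gi in zip(x, g)]
        fx = f(x)
        if fx < f_best:         # not a descent method, so keep the best seen
            x_best, f_best = list(x), fx
    return x_best, f_best


# Illustrative nondifferentiable objective (an assumption): f(x) = ||x||_1.
def f(x):
    return sum(abs(xi) for xi in x)


def subgrad(x):
    # sign(x_i) is a valid subgradient coordinate; 0 is valid where x_i = 0.
    return [(xi > 0) - (xi < 0) for xi in x]


x0 = [1.3, -0.7]
x_best, f_best = subgradient_method(f, subgrad, x0, lambda k: 1.0 / k, 1000)
print(x_best, f_best)
```

In this run the iterates oscillate around the minimizer at the origin rather than decreasing monotonically, which is why the best value is recorded separately from the current one.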
(5)Outline
Today:
• How to choose step sizes
• Convergence analysis
• Intersection of sets
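(6)Step size choices
• Fixed step sizes: $t_k = t$ for all $k = 1, 2, 3, \ldots$
• Fixed step length, i.e., $t_k = s / \|g^{(k-1)}\|_2$, and hence $\|t_k g^{(k-1)}\|_2 = s$
• Diminishing step sizes: choose to meet conditions
$$\sum_{k=1}^{\infty} t_k^2 < \infty, \qquad \sum_{k=1}^{\infty} t_k = \infty,$$
i.e., square summable but not summable. It is important here that step sizes go to zero, but not too fast.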
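The three rules can be written as small step-size functions. The sketch below is one possible encoding, not the slides' own: the factory names and the `(k, g)` call signature are hypothetical, and $t_k = c/k$ is just one common sequence that is square summable but not summable.

```python
import math


def fixed_step_size(t):
    """t_k = t for every k."""
    return lambda k, g: t


def fixed_step_length(s):
    """t_k = s / ||g||_2, so that ||t_k * g||_2 = s (requires g != 0)."""
    return lambda k, g: s / math.sqrt(sum(gi * gi for gi in g))


def diminishing_step_size(c=1.0):
    """t_k = c / k: the sum of t_k^2 is finite, the sum of t_k diverges."""
    return lambda k, g: c / k


g = [3.0, 4.0]                       # an example subgradient with norm 5
t_len = fixed_step_length(0.1)(1, g)
print(t_len * 5.0)                   # length of the step t_k * g, about 0.1

rule = diminishing_step_size()
sum_sq = sum(rule(k, None) ** 2 for k in range(1, 10001))
sum_lin = sum(rule(k, None) for k in range(1, 10001))
print(sum_sq, sum_lin)               # sum_sq stays below pi^2 / 6, sum_lin keeps growing
```

A zero subgradient means the current point is already optimal, so the division in the fixed step length rule only matters while the method is still making progress.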
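(7)Convergence analysis
Assume that $f$ is convex, $\mathrm{dom}(f) = \mathbb{R}^n$, and also that $f$ is Lipschitz continuous with constant $L > 0$, i.e.,
$$|f(x) - f(y)| \le L \|x - y\|_2 \quad \text{for all } x, y.$$
Theorem
For a fixed step size $t$, the subgradient method satisfies
$$f(x_{\mathrm{best}}^{(k)}) - f^* \le \frac{\|x^{(0)} - x^*\|_2^2}{2kt} + \frac{L^2 t}{2}.$$
For fixed step length, i.e., $t_k = s / \|g^{(k-1)}\|_2$, we have
$$f(x_{\mathrm{best}}^{(k)}) - f^* \le \frac{L \|x^{(0)} - x^*\|_2^2}{2ks} + \frac{L s}{2}.$$
For diminishing step sizes, the subgradient method satisfies
$$f(x_{\mathrm{best}}^{(k)}) - f^* \le \frac{\|x^{(0)} - x^*\|_2^2 + L^2 \sum_{i=1}^{k} t_i^2}{2 \sum_{i=1}^{k} t_i},$$
i.e.,
$$\lim_{k \to \infty} f(x_{\mathrm{best}}^{(k)}) = f^*.$$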
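As a short worked consequence of the fixed step size bound, one can ask how many iterations guarantee accuracy $\epsilon$. The abbreviation $R = \|x^{(0)} - x^*\|_2$ and the tolerance $\epsilon$ are notation introduced here for the example, and splitting $\epsilon$ evenly between the two terms is one convenient choice rather than the only one.

```latex
% Fixed step size bound, with R = \|x^{(0)} - x^*\|_2:
%   f(x_best^{(k)}) - f^* \le R^2 / (2kt) + L^2 t / 2.
\begin{align*}
\frac{L^2 t}{2} = \frac{\epsilon}{2}
  &\;\Longrightarrow\; t = \frac{\epsilon}{L^2}, \\
\frac{R^2}{2kt} \le \frac{\epsilon}{2}
  &\;\Longleftrightarrow\; k \ge \frac{R^2}{t\,\epsilon} = \frac{R^2 L^2}{\epsilon^2}.
\end{align*}
% Hence t = \epsilon / L^2 and k \ge R^2 L^2 / \epsilon^2 give
% f(x_best^{(k)}) - f^* \le \epsilon.
```

Under these choices the iteration count grows like $1/\epsilon^2$. Note also that for any fixed $t$ the bound only tends to $L^2 t / 2$ as $k \to \infty$, not to zero.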
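(8)Lipschitz continuity
Before the proof, let us consider the Lipschitz continuity assumption.
Lemma
$f$ is Lipschitz continuous with constant $L > 0$, i.e.,
$$|f(x) - f(y)| \le L \|x - y\|_2 \quad \text{for all } x, y,$$
is equivalent to
$$\|g\|_2 \le L \quad \text{for all } x \text{ and } g \in \partial f(x).$$
Proof
$\Longleftarrow$: Choose subgradients $g_x$ and $g_y$ at $x$ and $y$. By the definition of a subgradient we have
$$g_x^T (x - y) \ge f(x) - f(y) \ge g_y^T (x - y).$$
Applying the Cauchy-Schwarz inequality together with $\|g_x\|_2 \le L$ and $\|g_y\|_2 \le L$ gives
$$L \|x - y\|_2 \ge f(x) - f(y) \ge -L \|x - y\|_2.$$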
(9)Lipschitz continuity
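Proof (continued)
$\Longrightarrow$: Assume $\|g\|_2 > L$ for some $g \in \partial f(x)$. Take $y = x + g / \|g\|_2$; then $\|y - x\|_2 = 1$ and
$$f(y) \ge f(x) + g^T (y - x) = f(x) + \|g\|_2 > f(x) + L,$$
which contradicts Lipschitz continuity, since $L \|y - x\|_2 = L$.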
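The lemma can be sanity-checked numerically on a simple example. The sketch below assumes $f(x) = \|x\|_1$ on $\mathbb{R}^3$, whose sign-vector subgradients have Euclidean norm at most $\sqrt{3}$, so $L = \sqrt{3}$ should also serve as a Lipschitz constant. The function, dimension, sample count, and seed are all illustrative assumptions, and a random check like this only probes the statement rather than proving it.

```python
import math
import random


def f(x):
    return sum(abs(xi) for xi in x)


def subgradient(x):
    # Sign vector: one valid subgradient of the l1 norm at x.
    return [(xi > 0) - (xi < 0) for xi in x]


def norm2(v):
    return math.sqrt(sum(vi * vi for vi in v))


n = 3
L = math.sqrt(n)          # bound on ||g||_2 over sign vectors in R^n
rng = random.Random(0)    # fixed seed so the run is reproducible

max_subgrad_norm = 0.0
max_ratio = 0.0
for _ in range(1000):
    x = [rng.uniform(-1, 1) for _ in range(n)]
    y = [rng.uniform(-1, 1) for _ in range(n)]
    max_subgrad_norm = max(max_subgrad_norm, norm2(subgradient(x)))
    diff = norm2([xi - yi for xi, yi in zip(x, y)])
    if diff > 0:
        max_ratio = max(max_ratio, abs(f(x) - f(y)) / diff)

print(max_subgrad_norm, max_ratio, L)
```

Both printed maxima should stay at or below `L`, matching the two equivalent conditions in the lemma.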
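(10)Convergence analysis - Proof
Can prove all three results from the same basic inequality. Key steps:
• Using definition of subgradient
$$\begin{aligned}
\|x^{(k)} - x^*\|_2^2 &= \|x^{(k-1)} - t_k g^{(k-1)} - x^*\|_2^2 \\
&= \|x^{(k-1)} - x^*\|_2^2 - 2 t_k (g^{(k-1)})^T (x^{(k-1)} - x^*) + t_k^2 \|g^{(k-1)}\|_2^2 \\
&\le \|x^{(k-1)} - x^*\|_2^2 - 2 t_k \big(f(x^{(k-1)}) - f(x^*)\big) + t_k^2 \|g^{(k-1)}\|_2^2.
\end{aligned}$$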
• Iterating last inequality
$$\|x^{(k)} - x^*\|_2^2 \le \|x^{(0)} - x^*\|_2^2 - 2 \sum_{i=1}^{k} t_i \big(f(x^{(i-1)}) - f(x^*)\big) + \sum_{i=1}^{k} t_i^2 \|g^{(i-1)}\|_2^2.$$