Advanced Optimization Lecture Notes - Chapter 7: Subgradient method

Faculty of Mathematics, Mechanics and Informatics, VNU University of Science, Vietnam National University, Hanoi.

Subgradient Method

Hoàng Nam Dũng

Last last time: gradient descent

Consider the problem

$$\min_x f(x)$$

for $f$ convex and differentiable, $\mathrm{dom}(f) = \mathbb{R}^n$. Gradient descent: choose initial $x^{(0)} \in \mathbb{R}^n$, repeat:

$$x^{(k)} = x^{(k-1)} - t_k \cdot \nabla f(x^{(k-1)}), \quad k = 1, 2, 3, \dots$$

Step sizes $t_k$ chosen to be fixed and small, or by backtracking line search.
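The gradient descent update can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `gradient_descent`, the quadratic test problem, and the step size `t = 0.1` are all assumptions chosen for the example.

```python
import numpy as np

def gradient_descent(grad, x0, t, num_iters):
    """Fixed-step gradient descent: x(k) = x(k-1) - t * grad(x(k-1))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = x - t * grad(x)
    return x

# Hypothetical test problem: f(x) = 0.5 * ||x - c||_2^2, whose gradient is x - c
# and whose minimizer is x = c.
c = np.array([1.0, -2.0])
x_gd = gradient_descent(lambda x: x - c, x0=np.zeros(2), t=0.1, num_iters=200)
```

On this problem each step shrinks the error `x - c` by a factor of `1 - t`, so the iterates approach `c` geometrically.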

Subgradient method

Now consider $f$ convex, having $\mathrm{dom}(f) = \mathbb{R}^n$, but not necessarily differentiable.

Subgradient method: like gradient descent, but replacing gradients with subgradients, i.e., initialize $x^{(0)}$, repeat:

$$x^{(k)} = x^{(k-1)} - t_k \cdot g^{(k-1)}, \quad k = 1, 2, 3, \dots$$

where $g^{(k-1)} \in \partial f(x^{(k-1)})$ is any subgradient of $f$ at $x^{(k-1)}$.

Subgradient method is not necessarily a descent method, so we keep track of the best iterate $x^{(k)}_{\mathrm{best}}$ among $x^{(0)}, \dots, x^{(k)}$ so far, i.e.,

$$f(x^{(k)}_{\mathrm{best}}) = \min_{i=0,\dots,k} f(x^{(i)})$$
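The update and the best-iterate bookkeeping can be sketched as follows. This is an illustrative sketch rather than code from the slides: the name `subgradient_method`, the `step_size(k, g)` callback convention, and the test problem $f(x) = \|x\|_1$ with subgradient `sign(x)` are assumptions made for the example.

```python
import numpy as np

def subgradient_method(f, subgrad, x0, step_size, num_iters):
    """Run x(k) = x(k-1) - t_k * g(k-1) and return the best iterate seen."""
    x = np.asarray(x0, dtype=float)
    x_best, f_best = x.copy(), f(x)
    for k in range(1, num_iters + 1):
        g = subgrad(x)               # any element of the subdifferential at x
        x = x - step_size(k, g) * g
        fx = f(x)
        if fx < f_best:              # not a descent method, so track the best value
            x_best, f_best = x.copy(), fx
    return x_best, f_best

# Hypothetical test problem: f(x) = ||x||_1, for which sign(x) is a subgradient
# (sign(0) = 0 lies in the subdifferential [-1, 1] of |.| at 0).
f = lambda x: np.abs(x).sum()
subgrad = lambda x: np.sign(x)
x_best, f_best = subgradient_method(
    f, subgrad, x0=np.array([3.0, -2.0]),
    step_size=lambda k, g: 1.0 / k, num_iters=500)
```

Individual iterates may increase the objective, which is why the sketch returns the best value found rather than the last one.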

Outline

Today:

- How to choose step sizes
- Convergence analysis
- Intersection of sets

Step size choices

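- Fixed step sizes: $t_k = t$ for all $k = 1, 2, 3, \dots$
- Fixed step length, i.e., $t_k = s / \|g^{(k-1)}\|_2$, and hence $\|t_k g^{(k-1)}\|_2 = s$
- Diminishing step sizes: choose to meet conditions

$$\sum_{k=1}^{\infty} t_k^2 < \infty, \qquad \sum_{k=1}^{\infty} t_k = \infty,$$

i.e., square summable but not summable. Important here that step sizes go to zero, but not too fast.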
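The three step size rules can be written as small Python helpers. This is a sketch under assumptions: the helper names and the convention that each rule takes the iteration index `k` and the current subgradient `g` are illustrative choices, and `a / k` is just one common sequence that meets the diminishing conditions.

```python
import numpy as np

def fixed_step_size(t):
    """t_k = t for all k."""
    return lambda k, g: t

def fixed_step_length(s):
    """t_k = s / ||g||_2, so every update has length ||t_k g||_2 = s.

    Assumes g is nonzero (g = 0 would mean the current point is already optimal).
    """
    return lambda k, g: s / np.linalg.norm(g)

def diminishing_step_size(a=1.0):
    """t_k = a / k: the squares are summable, the steps themselves are not."""
    return lambda k, g: a / k

g = np.array([3.0, -4.0])             # example subgradient with ||g||_2 = 5
t_len = fixed_step_length(0.5)(1, g)  # 0.5 / 5 = 0.1
```

With the fixed step length rule, the distance moved per iteration is `s` regardless of how large the subgradient is.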

Convergence analysis

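Assume that $f$ is convex, $\mathrm{dom}(f) = \mathbb{R}^n$, and also that $f$ is Lipschitz continuous with constant $L > 0$, i.e.,

$$|f(x) - f(y)| \le L \|x - y\|_2 \quad \text{for all } x, y$$

Theorem

For a fixed step size $t$, subgradient method satisfies

$$f(x^{(k)}_{\mathrm{best}}) - f^* \le \frac{\|x^{(0)} - x^*\|_2^2}{2kt} + \frac{L^2 t}{2}$$

For fixed step length, i.e., $t_k = s / \|g^{(k-1)}\|_2$, we have

$$f(x^{(k)}_{\mathrm{best}}) - f^* \le \frac{L \|x^{(0)} - x^*\|_2^2}{2ks} + \frac{Ls}{2}$$

For diminishing step sizes, subgradient method satisfies

$$f(x^{(k)}_{\mathrm{best}}) - f^* \le \frac{\|x^{(0)} - x^*\|_2^2 + L^2 \sum_{i=1}^{k} t_i^2}{2 \sum_{i=1}^{k} t_i},$$

i.e., $\lim_{k \to \infty} f(x^{(k)}_{\mathrm{best}}) = f^*$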
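The fixed step size bound can be checked numerically on a small example. The sketch below is not from the slides: it assumes the test problem $f(x) = \|x\|_1$, for which $f^* = 0$ at $x^* = 0$ and $L = \sqrt{n}$, and the starting point, step size, and iteration count are arbitrary illustrative values.

```python
import numpy as np

# Hypothetical test problem: f(x) = ||x||_1 on R^n, so f* = 0 at x* = 0 and
# f is Lipschitz with L = sqrt(n) (every subgradient has entries in [-1, 1]).
x0 = np.array([3.0, -2.0, 1.5])
L = np.sqrt(x0.size)
R2 = np.sum(x0 ** 2)                  # ||x(0) - x*||_2^2 with x* = 0
t, k = 0.05, 200

x, f_best = x0.copy(), np.abs(x0).sum()
for _ in range(k):
    x = x - t * np.sign(x)            # sign(x) is a subgradient of ||x||_1
    f_best = min(f_best, np.abs(x).sum())

# Right-hand side of the fixed step size bound from the theorem.
bound = R2 / (2 * k * t) + L ** 2 * t / 2
print(f"f_best - f* = {f_best:.4f}, bound = {bound:.4f}")
```

The second term of the bound, $L^2 t / 2$, does not shrink with $k$, which reflects that a fixed step size only guarantees convergence to within a tolerance set by $t$.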

Lipschitz continuity

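Before the proof, let us consider the Lipschitz continuity assumption.

Lemma

$f$ is Lipschitz continuous with constant $L > 0$, i.e.,

$$|f(x) - f(y)| \le L \|x - y\|_2 \quad \text{for all } x, y,$$

is equivalent to

$$\|g\|_2 \le L \quad \text{for all } x \text{ and } g \in \partial f(x).$$

Proof

$\Longleftarrow$: Choose subgradients $g_x$ and $g_y$ at $x$ and $y$. We have

$$g_x^T (x - y) \ge f(x) - f(y) \ge g_y^T (x - y),$$

and applying the Cauchy-Schwarz inequality with $\|g_x\|_2, \|g_y\|_2 \le L$ gives

$$L \|x - y\|_2 \ge f(x) - f(y) \ge -L \|x - y\|_2.$$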

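$\Longrightarrow$: Assume $\|g\|_2 > L$ for some $g \in \partial f(x)$. Take $y = x + g / \|g\|_2$; we have $\|y - x\|_2 = 1$ and

$$f(y) \ge f(x) + g^T (y - x) = f(x) + \|g\|_2 > f(x) + L,$$

which contradicts the Lipschitz bound $|f(y) - f(x)| \le L \|y - x\|_2 = L$.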
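Both sides of the lemma can be spot-checked numerically. The sketch below assumes the example function $f(x) = \|x\|_1$ on $\mathbb{R}^n$, for which $L = \sqrt{n}$ and `sign(x)` is a subgradient. The function, the dimension, and the random sampling are illustrative choices, and a finite sample can only fail to find a violation, not prove the lemma.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
L = np.sqrt(n)                        # Lipschitz constant of ||.||_1 w.r.t. ||.||_2
f = lambda x: np.abs(x).sum()

max_subgrad_norm, max_ratio = 0.0, 0.0
for _ in range(1000):
    x, y = rng.normal(size=n), rng.normal(size=n)
    # Subgradient side of the lemma: ||g||_2 with g = sign(x).
    max_subgrad_norm = max(max_subgrad_norm, np.linalg.norm(np.sign(x)))
    # Lipschitz side of the lemma: |f(x) - f(y)| / ||x - y||_2.
    max_ratio = max(max_ratio, abs(f(x) - f(y)) / np.linalg.norm(x - y))
```

Neither `max_subgrad_norm` nor `max_ratio` should exceed `L`, in line with the equivalence stated in the lemma.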

Convergence analysis - Proof

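Can prove these results from the same basic inequality. Key steps:

- Using the definition of subgradient,

$$\begin{aligned}
\|x^{(k)} - x^*\|_2^2 &= \|x^{(k-1)} - t_k g^{(k-1)} - x^*\|_2^2 \\
&= \|x^{(k-1)} - x^*\|_2^2 - 2 t_k (g^{(k-1)})^T (x^{(k-1)} - x^*) + t_k^2 \|g^{(k-1)}\|_2^2 \\
&\le \|x^{(k-1)} - x^*\|_2^2 - 2 t_k \bigl(f(x^{(k-1)}) - f(x^*)\bigr) + t_k^2 \|g^{(k-1)}\|_2^2
\end{aligned}$$

- Iterating the last inequality,

$$\|x^{(k)} - x^*\|_2^2 \le \|x^{(0)} - x^*\|_2^2 - 2 \sum_{i=1}^{k} t_i \bigl(f(x^{(i-1)}) - f(x^*)\bigr) + \sum_{i=1}^{k} t_i^2 \|g^{(i-1)}\|_2^2$$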
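The text breaks off at the iterated inequality. One standard way to finish the argument is sketched below. This is a reconstruction, not a step shown in the slides. It assumes $f^* = f(x^*)$, uses $\|x^{(k)} - x^*\|_2^2 \ge 0$, uses $f(x^{(i-1)}) \ge f(x^{(k)}_{\mathrm{best}})$ for $i = 1, \dots, k$, and applies the bound $\|g^{(i-1)}\|_2 \le L$ from the lemma.

```latex
\begin{align*}
0 \le \|x^{(k)} - x^*\|_2^2
  &\le \|x^{(0)} - x^*\|_2^2
     - 2 \sum_{i=1}^{k} t_i \bigl(f(x^{(i-1)}) - f^*\bigr)
     + \sum_{i=1}^{k} t_i^2 \|g^{(i-1)}\|_2^2 \\
\Longrightarrow\quad
2 \Bigl(\sum_{i=1}^{k} t_i\Bigr) \bigl(f(x^{(k)}_{\mathrm{best}}) - f^*\bigr)
  &\le \|x^{(0)} - x^*\|_2^2 + \sum_{i=1}^{k} t_i^2 \|g^{(i-1)}\|_2^2 \\
\Longrightarrow\quad
f(x^{(k)}_{\mathrm{best}}) - f^*
  &\le \frac{\|x^{(0)} - x^*\|_2^2 + L^2 \sum_{i=1}^{k} t_i^2}{2 \sum_{i=1}^{k} t_i}
\end{align*}
```

Substituting $t_i = t$ into the last line recovers the fixed step size bound $\|x^{(0)} - x^*\|_2^2 / (2kt) + L^2 t / 2$.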
