Khoa Toán - Cơ - Tin học, Đại học Khoa học Tự nhiên, Đại học Quốc gia Hà Nội
(1) Proximal Gradient Descent (and Acceleration)
Hoàng Nam Dũng
(2) Last time: subgradient method

Consider the problem

\min_x f(x)

with f convex and \mathrm{dom}(f) = \mathbb{R}^n.

Subgradient method: choose an initial x^{(0)} \in \mathbb{R}^n, and repeat:

x^{(k)} = x^{(k-1)} - t_k \cdot g^{(k-1)}, \quad k = 1, 2, 3, \ldots

where g^{(k-1)} \in \partial f(x^{(k-1)}). We use pre-set rules for the step sizes (e.g., the diminishing step size rule).

If f is Lipschitz, then the subgradient method has convergence rate O(1/\varepsilon^2).
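As a concrete reference point, here is a minimal sketch of this method in Python with the diminishing step size rule t_k = 1/k. The test objective f(x) = \|x\|_1 and the best-iterate tracking are illustrative choices, not part of the slides:

import numpy as np

def subgradient_method(f, subgrad, x0, num_iters=1000):
    """Minimize f via the subgradient method with diminishing steps t_k = 1/k."""
    x = x0.copy()
    x_best, f_best = x.copy(), f(x)
    for k in range(1, num_iters + 1):
        g = subgrad(x)            # any g in the subdifferential of f at x
        x = x - (1.0 / k) * g     # x^{(k)} = x^{(k-1)} - t_k * g^{(k-1)}
        if f(x) < f_best:         # track the best iterate: f(x^{(k)}) need not decrease
            x_best, f_best = x.copy(), f(x)
    return x_best

# Illustration: f(x) = ||x||_1, for which sign(x) is a valid subgradient
x_best = subgradient_method(lambda x: np.linalg.norm(x, 1), np.sign,
                            np.array([3.0, -2.0, 1.0]))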
(3) Outline

Today:
- Proximal gradient descent
- Convergence analysis
- ISTA, matrix completion
- Special cases
(4) Decomposable functions

Suppose

f(x) = g(x) + h(x)

where
- g is convex, differentiable, \mathrm{dom}(g) = \mathbb{R}^n
- h is convex, not necessarily differentiable.

If f were differentiable, then the gradient descent update would be

x^+ = x - t \cdot \nabla f(x)

Recall the motivation: minimize the quadratic approximation to f around x, replacing \nabla^2 f(x) by \frac{1}{t} I:

x^+ = \operatorname{argmin}_z \; \tilde{f}_t(z), \quad \text{where } \tilde{f}_t(z) = f(x) + \nabla f(x)^T (z - x) + \frac{1}{2t} \|z - x\|_2^2
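Setting the gradient of this quadratic model to zero recovers the gradient descent update, a one-line check worth recording:

\nabla_z \tilde{f}_t(z) = \nabla f(x) + \frac{1}{t}(z - x) = 0 \quad \Longleftrightarrow \quad z = x - t \nabla f(x)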
(5) Decomposable functions

In our case f is not differentiable, but f = g + h with g differentiable. Why don't we make a quadratic approximation to g and leave h alone? I.e., update

x^+ = \operatorname{argmin}_z \; \tilde{g}_t(z) + h(z)
    = \operatorname{argmin}_z \; g(x) + \nabla g(x)^T (z - x) + \frac{1}{2t} \|z - x\|_2^2 + h(z)
    = \operatorname{argmin}_z \; \frac{1}{2t} \|z - (x - t \nabla g(x))\|_2^2 + h(z)

The term \frac{1}{2t} \|z - (x - t \nabla g(x))\|_2^2 encourages z to stay close to the gradient update for g, while the term h(z) keeps h small.
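The last equality is just completing the square: expanding the squared norm and dropping the terms that are constant in z shows the two argmins agree,

\frac{1}{2t} \|z - (x - t \nabla g(x))\|_2^2 = \frac{1}{2t} \|z - x\|_2^2 + \nabla g(x)^T (z - x) + \frac{t}{2} \|\nabla g(x)\|_2^2,

and the last term, like g(x), does not depend on z.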
(6) Proximal mapping

The proximal mapping (or prox-operator) of a convex function h is defined as

\mathrm{prox}_h(x) = \operatorname{argmin}_z \; \frac{1}{2} \|x - z\|_2^2 + h(z)

Examples:
- h(x) = 0: \mathrm{prox}_h(x) = x
- h(x) is the indicator function of a closed convex set C: \mathrm{prox}_h is the projection onto C,
  \mathrm{prox}_h(x) = \operatorname{argmin}_{z \in C} \frac{1}{2} \|x - z\|_2^2 = P_C(x)
- h(x) = \|x\|_1: \mathrm{prox}_h is the 'soft-threshold' (shrinkage) operation,
  [\mathrm{prox}_h(x)]_i = \begin{cases} x_i - 1 & x_i \ge 1 \\ 0 & |x_i| \le 1 \\ x_i + 1 & x_i \le -1 \end{cases}
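A minimal sketch of these three prox-operators in Python; the box [lo, hi]^n stands in for a generic closed convex set C (chosen only because its projection has a closed form), and the threshold parameter t generalizes the slide's case t = 1:

import numpy as np

def prox_zero(x):
    """prox of h = 0: the identity map."""
    return x

def prox_box(x, lo=-1.0, hi=1.0):
    """prox of the indicator of the box C = [lo, hi]^n: projection onto C."""
    return np.clip(x, lo, hi)

def prox_l1(x, t=1.0):
    """prox of t * ||.||_1: componentwise soft-thresholding at level t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

prox_l1(np.array([3.0, 0.5, -2.0]))   # -> array([ 2.,  0., -1.])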
(10) Proximal mapping

Theorem
If h is convex and closed (has a closed epigraph), then

\mathrm{prox}_h(x) = \operatorname{argmin}_z \; \frac{1}{2} \|x - z\|_2^2 + h(z)

exists and is unique for all x.

Proof. See http://www.seas.ucla.edu/~vandenbe/236C/lectures/proxop.pdf. Uniqueness holds because the objective function is strictly convex.

Optimality condition:

z = \mathrm{prox}_h(x) \Longleftrightarrow x - z \in \partial h(z)
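This condition is just the subgradient characterization of a minimizer applied to the prox objective:

0 \in \partial_z \left( \frac{1}{2} \|x - z\|_2^2 + h(z) \right) = z - x + \partial h(z) \quad \Longleftrightarrow \quad x - z \in \partial h(z)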