Khoa Toán - Cơ - Tin học, Đại học Khoa học Tự nhiên, Đại học Quốc gia Hà Nội
(1) Proximal Gradient Descent (and Acceleration)
Hoàng Nam Dũng
(2) Last time: subgradient method

Consider the problem

\min_x f(x)

with f convex and \mathrm{dom}(f) = \mathbb{R}^n.

Subgradient method: choose an initial x^{(0)} \in \mathbb{R}^n, and repeat:

x^{(k)} = x^{(k-1)} - t_k \cdot g^{(k-1)}, \quad k = 1, 2, 3, \ldots

where g^{(k-1)} \in \partial f(x^{(k-1)}). We use pre-set rules for the step sizes (e.g., the diminishing step size rule).

If f is Lipschitz, then the subgradient method has convergence rate O(1/\varepsilon^2).
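As a concrete reference point, here is a minimal sketch of this method in Python with the diminishing step size rule t_k = 1/k. The test objective f(x) = \|x\|_1 and the best-iterate tracking are illustrative choices, not part of the slides:

import numpy as np

def subgradient_method(f, subgrad, x0, num_iters=1000):
    """Minimize f via the subgradient method with diminishing steps t_k = 1/k."""
    x = x0.copy()
    x_best, f_best = x.copy(), f(x)
    for k in range(1, num_iters + 1):
        g = subgrad(x)            # any g in the subdifferential of f at x
        x = x - (1.0 / k) * g     # x^{(k)} = x^{(k-1)} - t_k * g^{(k-1)}
        if f(x) < f_best:         # track the best iterate: f(x^{(k)}) need not decrease
            x_best, f_best = x.copy(), f(x)
    return x_best

# Illustration: f(x) = ||x||_1, for which sign(x) is a valid subgradient
x_best = subgradient_method(lambda x: np.linalg.norm(x, 1), np.sign,
                            np.array([3.0, -2.0, 1.0]))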
(3) Outline

Today:
- Proximal gradient descent
- Convergence analysis
- ISTA, matrix completion
- Special cases
(4) Decomposable functions

Suppose

f(x) = g(x) + h(x)

where
- g is convex, differentiable, \mathrm{dom}(g) = \mathbb{R}^n
- h is convex, not necessarily differentiable.

If f were differentiable, then the gradient descent update would be

x^+ = x - t \cdot \nabla f(x)

Recall the motivation: minimize the quadratic approximation to f around x, replacing \nabla^2 f(x) by \frac{1}{t} I:

x^+ = \operatorname{argmin}_z \; \tilde{f}_t(z), \quad \text{where } \tilde{f}_t(z) = f(x) + \nabla f(x)^T (z - x) + \frac{1}{2t} \|z - x\|_2^2
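Setting the gradient of this quadratic model to zero recovers the gradient descent update, a one-line check worth recording:

\nabla_z \tilde{f}_t(z) = \nabla f(x) + \frac{1}{t}(z - x) = 0 \quad \Longleftrightarrow \quad z = x - t \nabla f(x)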
(5) Decomposable functions

In our case f is not differentiable, but f = g + h with g differentiable. Why don't we make a quadratic approximation to g and leave h alone? I.e., update

x^+ = \operatorname{argmin}_z \; \tilde{g}_t(z) + h(z)
    = \operatorname{argmin}_z \; g(x) + \nabla g(x)^T (z - x) + \frac{1}{2t} \|z - x\|_2^2 + h(z)
    = \operatorname{argmin}_z \; \frac{1}{2t} \|z - (x - t \nabla g(x))\|_2^2 + h(z)

The term \frac{1}{2t} \|z - (x - t \nabla g(x))\|_2^2 encourages z to stay close to the gradient update for g, while the term h(z) keeps h small.
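The last equality is just completing the square: expanding the squared norm and dropping the terms that are constant in z shows the two argmins agree,

\frac{1}{2t} \|z - (x - t \nabla g(x))\|_2^2 = \frac{1}{2t} \|z - x\|_2^2 + \nabla g(x)^T (z - x) + \frac{t}{2} \|\nabla g(x)\|_2^2,

and the last term, like g(x), does not depend on z.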
(6) Proximal mapping

The proximal mapping (or prox-operator) of a convex function h is defined as

\mathrm{prox}_h(x) = \operatorname{argmin}_z \; \frac{1}{2} \|x - z\|_2^2 + h(z)

Examples:
- h(x) = 0: \mathrm{prox}_h(x) = x
- h(x) is the indicator function of a closed convex set C: \mathrm{prox}_h is the projection onto C,
  \mathrm{prox}_h(x) = \operatorname{argmin}_{z \in C} \frac{1}{2} \|x - z\|_2^2 = P_C(x)
- h(x) = \|x\|_1: \mathrm{prox}_h is the 'soft-threshold' (shrinkage) operation,
  [\mathrm{prox}_h(x)]_i = \begin{cases} x_i - 1 & x_i \ge 1 \\ 0 & |x_i| \le 1 \\ x_i + 1 & x_i \le -1 \end{cases}
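A minimal sketch of these three prox-operators in Python; the box [lo, hi]^n stands in for a generic closed convex set C (chosen only because its projection has a closed form), and the threshold parameter t generalizes the slide's case t = 1:

import numpy as np

def prox_zero(x):
    """prox of h = 0: the identity map."""
    return x

def prox_box(x, lo=-1.0, hi=1.0):
    """prox of the indicator of the box C = [lo, hi]^n: projection onto C."""
    return np.clip(x, lo, hi)

def prox_l1(x, t=1.0):
    """prox of t * ||.||_1: componentwise soft-thresholding at level t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

prox_l1(np.array([3.0, 0.5, -2.0]))   # -> array([ 2.,  0., -1.])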
(10) Proximal mapping

Theorem
If h is convex and closed (has a closed epigraph), then

\mathrm{prox}_h(x) = \operatorname{argmin}_z \; \frac{1}{2} \|x - z\|_2^2 + h(z)

exists and is unique for all x.

Proof. See http://www.seas.ucla.edu/~vandenbe/236C/lectures/proxop.pdf. Uniqueness holds because the objective function is strictly convex.

Optimality condition:

z = \mathrm{prox}_h(x) \Longleftrightarrow x - z \in \partial h(z)
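This condition is just the subgradient characterization of a minimizer applied to the prox objective:

0 \in \partial_z \left( \frac{1}{2} \|x - z\|_2^2 + h(z) \right) = z - x + \partial h(z) \quad \Longleftrightarrow \quad x - z \in \partial h(z)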