Faculty of Mathematics, Mechanics and Informatics, VNU University of Science, Vietnam National University, Hanoi
(1) Gradient Descent
Hoàng Nam Dũng
(2) Gradient descent

Consider unconstrained, smooth convex optimization

    min_x f(x)

with convex and differentiable function f : R^n → R. Denote the optimal value by f* = min_x f(x) and a solution by x*.

Gradient descent: choose initial point x^(0) ∈ R^n, repeat:

    x^(k) = x^(k−1) − t_k · ∇f(x^(k−1)),   k = 1, 2, 3, …
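The update rule above can be sketched in a few lines of Python. This is an illustrative sketch, not from the slides: the test function f(x) = ‖x‖²/2 (whose gradient is x and whose minimizer is 0), the starting point, and the step size are all assumed choices.

```python
import numpy as np

def gradient_descent(grad, x0, t, num_iters):
    """Fixed-step gradient descent: x^(k) = x^(k-1) - t * grad(x^(k-1))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = x - t * grad(x)
    return x

# Illustrative example: f(x) = ||x||^2 / 2, so grad f(x) = x and x* = 0.
x_star = gradient_descent(lambda x: x, x0=[4.0, -2.0], t=0.1, num_iters=200)
# Each step multiplies the iterate by (1 - t) = 0.9, so x_star is near 0.
```

Here `t` plays the role of t_k, held constant across iterations (the "fixed step size" case discussed later in the slides).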
(4)–(5) [Figure: gradient descent iterates plotted on the contours of f]
(6) Gradient descent interpretation

At each iteration, consider the expansion

    f(y) ≈ f(x) + ∇f(x)^T (y − x) + (1/(2t)) ‖y − x‖_2^2

Quadratic approximation, replacing the usual Hessian ∇²f(x) by (1/t) I:

▶ f(x) + ∇f(x)^T (y − x) is the linear approximation to f
▶ (1/(2t)) ‖y − x‖_2^2 is a proximity term to x, with weight 1/(2t)

Choose next point y = x⁺ to minimize the quadratic approximation:

    x⁺ = x − t ∇f(x)
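The claim that the gradient step minimizes this quadratic approximation can be checked numerically. A sketch, with assumed choices throughout: the slides' later example function, and an arbitrary point `x` and step `t`.

```python
import numpy as np

def quad_approx(y, x, f, grad, t):
    """Quadratic approximation of f around x, with the Hessian replaced by I/t."""
    d = y - x
    return f(x) + grad(x) @ d + d @ d / (2 * t)

# Illustrative choices: the slides' example f(x) = (10*x_1^2 + x_2^2)/2,
# an arbitrary point x, and an arbitrary step size t.
f = lambda x: (10 * x[0] ** 2 + x[1] ** 2) / 2
grad = lambda x: np.array([10 * x[0], x[1]])

x = np.array([1.0, 1.0])
t = 0.05
x_plus = x - t * grad(x)  # the gradient step: claimed minimizer of quad_approx

# Random perturbations of x_plus should never decrease the approximation value.
rng = np.random.default_rng(0)
worse = [quad_approx(x_plus + 0.1 * rng.standard_normal(2), x, f, grad, t)
         for _ in range(100)]
```

Since the approximation is strictly convex in y (its Hessian is I/t), setting its gradient ∇f(x) + (y − x)/t to zero gives exactly y = x − t∇f(x), which the check above confirms empirically.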
(7) Gradient descent interpretation

[Figure: the gradient step as the minimizer of the quadratic approximation]
(8) Outline

▶ How to choose step sizes
▶ Convergence analysis
▶ Nonconvex functions
(9) Fixed step size

Simply take t_k = t for all k = 1, 2, 3, …; this can diverge if t is too big.

Consider f(x) = (10x_1^2 + x_2^2)/2; gradient descent after a few steps:

[Figure: diverging gradient descent iterates on the contours of f]
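The divergence can be reproduced directly. A sketch with assumed details: the starting point and the step size t = 0.25 are illustrative choices; any fixed t > 2/10 = 0.2 diverges on this function, since the largest eigenvalue of its Hessian is 10.

```python
import numpy as np

# The slides' example: f(x) = (10*x_1^2 + x_2^2)/2, so grad f(x) = (10*x_1, x_2).
grad = lambda x: np.array([10 * x[0], x[1]])

def run(t, steps, x0=(1.0, 1.0)):
    """Fixed-step gradient descent from x0 (x0 is an illustrative choice)."""
    x = np.array(x0)
    for _ in range(steps):
        x = x - t * grad(x)
    return x

# With t = 0.25, the x_1 coordinate is multiplied by |1 - 10*0.25| = 1.5 at
# every step, so the iterates blow up along that coordinate.
x_diverged = run(t=0.25, steps=50)
```

The x_2 coordinate, by contrast, contracts by |1 − 0.25| = 0.75 per step: the step size that is too big for the steep direction is fine for the flat one, which is exactly the difficulty with this ill-conditioned example.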
(10) Fixed step size

Can be slow if t is too small. Same example, gradient descent after 100 steps:

[Figure: contour plot of f with 100 tightly spaced gradient descent iterates, x_1 and x_2 axes from −20 to 20]
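The slow case can be checked the same way (the starting point is an illustrative choice): with t = 0.01, the x_2 coordinate contracts by only a factor 0.99 per step.

```python
import numpy as np

grad = lambda x: np.array([10 * x[0], x[1]])  # grad of f(x) = (10*x_1^2 + x_2^2)/2

x = np.array([1.0, 1.0])  # illustrative starting point
t = 0.01                  # too-small fixed step
for _ in range(100):
    x = x - t * grad(x)

# x_1 contracts by |1 - 10*0.01| = 0.9 per step and is essentially zero after
# 100 steps, but x_2 contracts by only 0.99 per step: 0.99**100 ≈ 0.37,
# still far from the optimum at 0.
```

Together with the previous example, this shows the dilemma of a fixed step size on an ill-conditioned function: no single t is simultaneously fast in the flat direction and stable in the steep one, which motivates the step-size selection rules in the outline.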