Faculty of Mathematics, Mechanics and Informatics, VNU University of Science, Vietnam National University, Hanoi
(1) Gradient Descent
Hoàng Nam Dũng
(2) Gradient descent

Consider unconstrained, smooth convex optimization

    min_x f(x)

with convex and differentiable function f : R^n → R. Denote the optimal value by f* = min_x f(x) and a solution by x*.

Gradient descent: choose initial point x^(0) ∈ R^n, repeat:

    x^(k) = x^(k−1) − t_k · ∇f(x^(k−1)),   k = 1, 2, 3, …
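The update rule above can be sketched in a few lines of Python. This is an illustrative sketch, not from the slides: the test function f(x) = ‖x‖²/2 (whose gradient is x and whose minimizer is 0), the starting point, and the step size are all assumed choices.

```python
import numpy as np

def gradient_descent(grad, x0, t, num_iters):
    """Fixed-step gradient descent: x^(k) = x^(k-1) - t * grad(x^(k-1))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = x - t * grad(x)
    return x

# Illustrative example: f(x) = ||x||^2 / 2, so grad f(x) = x and x* = 0.
x_star = gradient_descent(lambda x: x, x0=[4.0, -2.0], t=0.1, num_iters=200)
# Each step multiplies the iterate by (1 - t) = 0.9, so x_star is near 0.
```

Here `t` plays the role of t_k, held constant across iterations (the "fixed step size" case discussed later in the slides).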
(4)–(5) [Figure: gradient descent iterates plotted on the contours of f]
(6) Gradient descent interpretation

At each iteration, consider the expansion

    f(y) ≈ f(x) + ∇f(x)^T (y − x) + (1/(2t)) ‖y − x‖_2^2

Quadratic approximation, replacing the usual Hessian ∇²f(x) by (1/t) I:

▶ f(x) + ∇f(x)^T (y − x) is the linear approximation to f
▶ (1/(2t)) ‖y − x‖_2^2 is a proximity term to x, with weight 1/(2t)

Choose next point y = x⁺ to minimize the quadratic approximation:

    x⁺ = x − t ∇f(x)
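The claim that the gradient step minimizes this quadratic approximation can be checked numerically. A sketch, with assumed choices throughout: the slides' later example function, and an arbitrary point `x` and step `t`.

```python
import numpy as np

def quad_approx(y, x, f, grad, t):
    """Quadratic approximation of f around x, with the Hessian replaced by I/t."""
    d = y - x
    return f(x) + grad(x) @ d + d @ d / (2 * t)

# Illustrative choices: the slides' example f(x) = (10*x_1^2 + x_2^2)/2,
# an arbitrary point x, and an arbitrary step size t.
f = lambda x: (10 * x[0] ** 2 + x[1] ** 2) / 2
grad = lambda x: np.array([10 * x[0], x[1]])

x = np.array([1.0, 1.0])
t = 0.05
x_plus = x - t * grad(x)  # the gradient step: claimed minimizer of quad_approx

# Random perturbations of x_plus should never decrease the approximation value.
rng = np.random.default_rng(0)
worse = [quad_approx(x_plus + 0.1 * rng.standard_normal(2), x, f, grad, t)
         for _ in range(100)]
```

Since the approximation is strictly convex in y (its Hessian is I/t), setting its gradient ∇f(x) + (y − x)/t to zero gives exactly y = x − t∇f(x), which the check above confirms empirically.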
(7) Gradient descent interpretation

[Figure: the gradient step as the minimizer of the quadratic approximation]
(8) Outline

▶ How to choose step sizes
▶ Convergence analysis
▶ Nonconvex functions
(9) Fixed step size

Simply take t_k = t for all k = 1, 2, 3, …; this can diverge if t is too big.

Consider f(x) = (10x_1^2 + x_2^2)/2; gradient descent after a few steps:

[Figure: diverging gradient descent iterates on the contours of f]
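The divergence can be reproduced directly. A sketch with assumed details: the starting point and the step size t = 0.25 are illustrative choices; any fixed t > 2/10 = 0.2 diverges on this function, since the largest eigenvalue of its Hessian is 10.

```python
import numpy as np

# The slides' example: f(x) = (10*x_1^2 + x_2^2)/2, so grad f(x) = (10*x_1, x_2).
grad = lambda x: np.array([10 * x[0], x[1]])

def run(t, steps, x0=(1.0, 1.0)):
    """Fixed-step gradient descent from x0 (x0 is an illustrative choice)."""
    x = np.array(x0)
    for _ in range(steps):
        x = x - t * grad(x)
    return x

# With t = 0.25, the x_1 coordinate is multiplied by |1 - 10*0.25| = 1.5 at
# every step, so the iterates blow up along that coordinate.
x_diverged = run(t=0.25, steps=50)
```

The x_2 coordinate, by contrast, contracts by |1 − 0.25| = 0.75 per step: the step size that is too big for the steep direction is fine for the flat one, which is exactly the difficulty with this ill-conditioned example.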
(10) Fixed step size

Can be slow if t is too small. Same example, gradient descent after 100 steps:

[Figure: contour plot of f with 100 tightly spaced gradient descent iterates, x_1 and x_2 axes from −20 to 20]
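The slow case can be checked the same way (the starting point is an illustrative choice): with t = 0.01, the x_2 coordinate contracts by only a factor 0.99 per step.

```python
import numpy as np

grad = lambda x: np.array([10 * x[0], x[1]])  # grad of f(x) = (10*x_1^2 + x_2^2)/2

x = np.array([1.0, 1.0])  # illustrative starting point
t = 0.01                  # too-small fixed step
for _ in range(100):
    x = x - t * grad(x)

# x_1 contracts by |1 - 10*0.01| = 0.9 per step and is essentially zero after
# 100 steps, but x_2 contracts by only 0.99 per step: 0.99**100 ≈ 0.37,
# still far from the optimum at 0.
```

Together with the previous example, this shows the dilemma of a fixed step size on an ill-conditioned function: no single t is simultaneously fast in the flat direction and stable in the steep one, which motivates the step-size selection rules in the outline.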