
Lesson 2 Slides: Linear Regression


DOCUMENT INFORMATION

Basic information

Title: Linear Regression
Format:
Number of pages: 84
File size: 1.99 MB

Content


Regression

• Given:
  – Data $X = \{x^{(1)}, \dots, x^{(n)}\}$, where $x^{(i)} \in \mathbb{R}^d$
  – Corresponding labels $y = \{y^{(1)}, \dots, y^{(n)}\}$, where $y^{(i)} \in \mathbb{R}$
• [Figure: linear regression and quadratic regression fits plotted against Year, 1975–2015]

Prostate Cancer Dataset

• 97 samples, partitioned into 67 train / 30 test
• Eight predictors (features): 6 continuous (4 log transforms), 1 binary, 1 ordinal
• Continuous outcome variable:
  – lpsa: log(prostate specific antigen level)
Based on slide by Jeff Howbert

Linear Regression

• Hypothesis:
  $y = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_d x_d = \sum_{j=0}^{d} \theta_j x_j$
  (assume $x_0 = 1$)
• Fit the model by minimizing the sum of squared errors
Figures are courtesy of Greg Shakhnarovich

Least Squares Linear Regression

• Cost function:
  $J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
• Fit by solving $\min_\theta J(\theta)$

Intuition Behind the Cost Function

• For insight on $J(\theta)$, let's assume $x \in \mathbb{R}$, so $\theta = [\theta_0, \theta_1]$
• Contrast $h_\theta(x)$ (for fixed $\theta$, a function of $x$) with $J(\theta)$ (a function of the parameter $\theta$)
• Worked example with three training points $(1, 1)$, $(2, 2)$, $(3, 3)$:
  $J([0, 0.5]) = \frac{1}{2 \cdot 3} \left[ (0.5 - 1)^2 + (1 - 2)^2 + (1.5 - 3)^2 \right] \approx 0.58$
  $J([0, 0]) = \frac{1}{2 \cdot 3} \left[ 1^2 + 2^2 + 3^2 \right] \approx 2.33$
• $J(\theta)$ is convex
• [Figure illustrating $J(\theta_0, \theta_1)$; slide by Andrew Ng]
Based on example by Andrew Ng

Logistic Regression Objective Function

• Can't just use squared loss as in linear regression:
  $J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
  – Using the logistic regression model $h_\theta(x) = \frac{1}{1 + e^{-\theta^{\mathsf T} x}}$ results in a non-convex optimization

Deriving the Cost Function via Maximum Likelihood Estimation

• The likelihood of the data is given by:
  $\ell(\theta) = \prod_{i=1}^{n} p\left(y^{(i)} \mid x^{(i)}; \theta\right)$
• So we are looking for the $\theta$ that maximizes the likelihood:
  $\theta_{\mathrm{MLE}} = \arg\max_\theta \ell(\theta) = \arg\max_\theta \prod_{i=1}^{n} p\left(y^{(i)} \mid x^{(i)}; \theta\right)$
• We can take the log without changing the solution:
  $\theta_{\mathrm{MLE}} = \arg\max_\theta \log \prod_{i=1}^{n} p\left(y^{(i)} \mid x^{(i)}; \theta\right) = \arg\max_\theta \sum_{i=1}^{n} \log p\left(y^{(i)} \mid x^{(i)}; \theta\right)$
• Expand as follows:
  $\theta_{\mathrm{MLE}} = \arg\max_\theta \sum_{i=1}^{n} \left[ y^{(i)} \log p\left(y^{(i)}{=}1 \mid x^{(i)}; \theta\right) + \left(1 - y^{(i)}\right) \log\left(1 - p\left(y^{(i)}{=}1 \mid x^{(i)}; \theta\right)\right) \right]$
• Substitute in the model and take the negative to yield the logistic regression objective $\min_\theta J(\theta)$, with:
  $J(\theta) = -\sum_{i=1}^{n} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]$

Intuition Behind the Objective

• Cost of a single instance:
  $\mathrm{cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$
• We can re-write the objective function as
  $J(\theta) = \sum_{i=1}^{n} \mathrm{cost}\left(h_\theta(x^{(i)}), y^{(i)}\right)$
  Compare to linear regression: $J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
• Aside: recall the plot of $\log(z)$
• If $y = 1$:
  – Cost = 0 if the prediction is correct
  – As $h_\theta(x) \to 0$, cost $\to \infty$
  – Captures the intuition that larger mistakes should get larger penalties (e.g., predict $h_\theta(x) = 0$ but $y = 1$)
• If $y = 0$:
  – Cost = 0 if the prediction is correct
  – As $\left(1 - h_\theta(x)\right) \to 0$, cost $\to \infty$
  – Captures the intuition that larger mistakes should get larger penalties
Based on example by Andrew Ng

Regularized Logistic Regression

• We can regularize logistic regression exactly as before:
  $J_{\text{regularized}}(\theta) = J(\theta) + \lambda \sum_{j=1}^{d} \theta_j^2 = J(\theta) + \lambda \left\| \theta_{[1:d]} \right\|_2^2$

Gradient Descent for Logistic Regression

• Regularized objective:
  $J_{\text{reg}}(\theta) = -\sum_{i=1}^{n} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right] + \frac{\lambda}{2} \left\| \theta_{[1:d]} \right\|_2^2$
• Want $\min_\theta J(\theta)$
• Initialize $\theta$
• Repeat until convergence (simultaneous update for $j = 0, \dots, d$):
  $\theta_j \leftarrow \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$
  which works out to
  $\theta_0 \leftarrow \theta_0 - \alpha \sum_{i=1}^{n} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
  $\theta_j \leftarrow \theta_j - \alpha \left[ \sum_{i=1}^{n} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \lambda \theta_j \right] \quad \text{for } j = 1, \dots, d$
• Use the natural logarithm ($\ln = \log_e$) to cancel with the $\exp()$ in $h_\theta(x)$
• This looks IDENTICAL to linear regression!!!
  – Ignoring the $1/n$ constant
  – However, the form of the model is very different:
    $h_\theta(x) = \frac{1}{1 + e^{-\theta^{\mathsf T} x}}$
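The update equations above translate directly into a few lines of vectorized code. Below is a minimal NumPy sketch of this gradient-descent loop; the slides contain no code, so the function names, the synthetic data, and the values of α and λ here are illustrative choices only.

```python
import numpy as np

def sigmoid(z):
    # Logistic model h_theta(x) = 1 / (1 + exp(-theta^T x))
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_gd(X, y, alpha=0.01, lam=0.1, n_iters=1000):
    """Gradient descent for regularized logistic regression.

    X is n x (d+1) with a leading column of ones (x_0 = 1); theta_0 is not
    regularized, and the 1/n constant is ignored, as on the slides."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        errors = sigmoid(X @ theta) - y      # h_theta(x^(i)) - y^(i)
        grad = X.T @ errors                  # sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i)
        grad[1:] += lam * theta[1:]          # + lambda * theta_j for j >= 1
        theta = theta - alpha * grad         # simultaneous update of all theta_j
    return theta

# Tiny synthetic example (made-up data, just to exercise the update rule)
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(100, 2))
labels = (X_raw @ np.array([1.5, -2.0]) > 0).astype(float)
X = np.hstack([np.ones((100, 1)), X_raw])
theta_hat = fit_logistic_gd(X, labels)
print(theta_hat)
```

Because the penalty is added only to `grad[1:]`, the bias term θ₀ stays unregularized, matching the per-parameter updates stated above.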
Multi-Class Classification

• Binary classification vs. multi-class classification
  [Figures: binary-class and multi-class data in the $(x_1, x_2)$ plane]
• Multi-class examples:
  – Disease diagnosis: healthy / cold / flu / pneumonia
  – Object classification: desk / chair / monitor / bookcase

Multi-Class Logistic Regression

• For 2 classes:
  $h_\theta(x) = \frac{1}{1 + \exp(-\theta^{\mathsf T} x)} = \frac{\exp(\theta^{\mathsf T} x)}{1 + \exp(\theta^{\mathsf T} x)}$
  where $\exp(\theta^{\mathsf T} x)$ is the weight assigned to $y = 1$ and the constant $1$ is the weight assigned to $y = 0$
• For $C$ classes $\{1, \dots, C\}$:
  $p(y = c \mid x; \theta_1, \dots, \theta_C) = \frac{\exp(\theta_c^{\mathsf T} x)}{\sum_{c'=1}^{C} \exp(\theta_{c'}^{\mathsf T} x)}$
  – Called the softmax function
• Split into One vs Rest:
  [Figure: a multi-class problem split into one-vs-rest binary problems]
  – Train a logistic regression classifier for each class $i$ to predict the probability that $y = i$, with
    $h_c(x) = \frac{\exp(\theta_c^{\mathsf T} x)}{\sum_{c'=1}^{C} \exp(\theta_{c'}^{\mathsf T} x)}$

Implementing Multi-Class Logistic Regression

• Use $h_c(x) = \frac{\exp(\theta_c^{\mathsf T} x)}{\sum_{c'=1}^{C} \exp(\theta_{c'}^{\mathsf T} x)}$ as the model for class $c$
• Gradient descent simultaneously updates all parameters for all models
  – Same derivative as before, just with the above $h_c(x)$
• Predict the class label as the most probable label: $\max_c h_c(x)$

...

Linear Algebra Concepts

• Vector products:
  – Dot product: $u \cdot v = u^{\mathsf T} v = \begin{pmatrix} u_1 & u_2 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = u_1 v_1 + u_2 v_2$
    Note: the dot product of $u$ with itself equals $\mathrm{length}(u)^2$
  – Outer product: $u v^{\mathsf T} = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} \begin{pmatrix} v_1 & v_2 \end{pmatrix} = \begin{pmatrix} u_1 v_1 & u_1 v_2 \\ u_2 v_1 & u_2 v_2 \end{pmatrix}$
• Matrix product:
  $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \quad B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}$
  $AB = \begin{pmatrix} a_{11} b_{11} + a_{12} b_{21} & a_{11} b_{12} + a_{12} b_{22} \\ a_{21} b_{11} + a_{22} b_{21} & a_{21} b_{12} + a_{22} b_{22} \end{pmatrix}$
Based on slides by Joseph Bradley

Vectorization

• For the linear regression cost function:
  $J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 = \frac{1}{2n} \sum_{i=1}^{n} \left( \theta^{\mathsf T} x^{(i)} - y^{(i)} \right)^2 = \frac{1}{2n} (X\theta - y)^{\mathsf T} (X\theta - y)$
• Let $X \in \mathbb{R}^{n \times (d+1)}$ be the matrix whose $i$-th row is $\left(x^{(i)}\right)^{\mathsf T}$, and let $y = \left( y^{(1)}, y^{(2)}, \dots, y^{(n)} \right)^{\mathsf T}$
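As a quick illustration of why the matrix form is convenient, here is a small NumPy sketch (the function names and data layout are my own, not from the slides) that evaluates $J(\theta)$ both as an explicit sum over examples and as $\frac{1}{2n}(X\theta - y)^{\mathsf T}(X\theta - y)$, using the three toy points $(1, 1)$, $(2, 2)$, $(3, 3)$ from the cost-function example earlier:

```python
import numpy as np

# Toy data from the cost-function example: points (1, 1), (2, 2), (3, 3),
# with a column of ones prepended so that x_0 = 1.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])        # shape n x (d+1)
y = np.array([1.0, 2.0, 3.0])
n = X.shape[0]

def cost_loop(theta):
    """J(theta) as a sum over examples: 1/(2n) * sum_i (theta^T x_i - y_i)^2."""
    total = 0.0
    for i in range(n):
        total += (X[i] @ theta - y[i]) ** 2
    return total / (2 * n)

def cost_vectorized(theta):
    """J(theta) in matrix form: 1/(2n) * (X theta - y)^T (X theta - y)."""
    r = X @ theta - y
    return (r @ r) / (2 * n)

theta = np.array([0.0, 0.5])
print(cost_loop(theta), cost_vectorized(theta))   # both ~0.58
assert np.isclose(cost_loop(theta), cost_vectorized(theta))
```

Both forms return roughly 0.58, matching the hand computation above; the vectorized version simply replaces the loop over examples with a single matrix-vector product.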

Date posted: 18/10/2022, 09:38
