Natural Language Processing with Deep Learning
CS224N/Ling284
Christopher Manning
Lecture 4: Gradients by hand (matrix calculus) and algorithmically (the backpropagation algorithm)

Introduction
• The assignment is all about making sure you really understand the math of neural networks … then we'll let the software do it!
• We'll go through it quickly today, but also look at the readings!
• This will be a tough week for some! Make sure to get help if you need it:
  • Visit office hours Friday/Tuesday
  • Note: Monday is MLK Day – no office hours, sorry! But we will be on Piazza
  • Read the tutorial materials given in the syllabus

NER: Binary classification for center word being a location
• We do supervised training and want a high score if the center word is a location:
  J_t(θ) = σ(s) = 1 / (1 + e^(−s))
• The window is represented by concatenating the word vectors:
  x = [ x_museums  x_in  x_Paris  x_are  x_amazing ]

Remember: Stochastic Gradient Descent
• Update equation: θ^new = θ^old − α ∇_θ J(θ)
  (α = step size or learning rate)
• How can we compute ∇_θ J(θ)?
  1. By hand
  2. Algorithmically: the backpropagation algorithm
  (A worked sketch of the scoring function and one SGD step appears at the end of these notes.)

Lecture Plan
Lecture 4: Gradients by hand and algorithmically
1. Introduction (5 mins)
2. Matrix calculus (40 mins)
3. Backpropagation (35 mins)

Computing Gradients by Hand
• Matrix calculus: fully vectorized gradients
• "Multivariable calculus is just like single-variable calculus if you use matrices"
• Much faster and more useful than non-vectorized gradients
• But doing a non-vectorized gradient can be good for intuition; watch last week's lecture for an example
• The lecture notes and matrix calculus notes cover this material in more detail
• You might also review Math 51, which has a new online textbook: http://web.stanford.edu/class/math51/textbook.html

Gradients
• Given a function with 1 output and 1 input: f(x) = x³
• Its gradient (slope) is its derivative: df/dx = 3x²
• "How much will the output change if we change the input a bit?"

Gradients
• Given a function with 1 output and n inputs: f(x) = f(x₁, x₂, …, xₙ)
• Its gradient is a vector of partial derivatives with respect to each input:
  ∂f/∂x = [ ∂f/∂x₁, ∂f/∂x₂, …, ∂f/∂xₙ ]

Jacobian Matrix: Generalization of the Gradient
• Given a function with m outputs and n inputs: f(x) = [ f₁(x₁, …, xₙ), …, f_m(x₁, …, xₙ) ]
• Its Jacobian is an m × n matrix of partial derivatives:
  (∂f/∂x)ᵢⱼ = ∂fᵢ/∂xⱼ

Chain Rule
• For composition of one-variable functions: multiply derivatives
  z = g(y), y = f(x)  ⇒  dz/dx = (dz/dy)(dy/dx)
• For multiple variables at once: multiply Jacobians
  z = g(y), y = f(x)  ⇒  ∂z/∂x = (∂z/∂y)(∂y/∂x)
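To make the "multiply Jacobians" claim concrete, here is a minimal NumPy sketch that checks it numerically against finite differences. The particular choices — f as an elementwise sigmoid (so its Jacobian is diagonal) and g as a linear map A (whose Jacobian is just A) — are illustrative assumptions, not something taken from the slides.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
n, m = 3, 2
A = rng.standard_normal((m, n))   # g(y) = A y, an m-output / n-input linear map
x = rng.standard_normal(n)

# Analytic Jacobians via the chain rule: z = g(f(x)) => dz/dx = (dz/dy)(dy/dx)
y = sigmoid(x)
J_f = np.diag(y * (1.0 - y))      # n x n, diagonal because f is elementwise
J_g = A                            # m x n: the Jacobian of a linear map is the matrix itself
J_chain = J_g @ J_f                # multiply Jacobians

# Numerical Jacobian of the composition z(x) = A sigmoid(x), by central differences
def compose(x):
    return A @ sigmoid(x)

eps = 1e-6
J_num = np.zeros((m, n))
for j in range(n):
    e = np.zeros(n)
    e[j] = eps
    J_num[:, j] = (compose(x + e) - compose(x - e)) / (2 * eps)

print(np.allclose(J_chain, J_num, atol=1e-6))  # True: the Jacobians multiply
```

This kind of finite-difference check is also a good habit when you compute gradients by hand for the assignment.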
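Tying the NER and SGD slides together, here is a sketch of the window-based scoring function and one hand-computed gradient step. It assumes a single-hidden-layer model (h = f(Wx + b), s = uᵀh, with f taken to be ReLU here), illustrative random dimensions, and a −log σ(s) loss for a positive (location) example; none of these particular choices or numbers come from the slides above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): 5 window words, d-dim vectors, hidden size H
d, H = 4, 8

# x = [x_museums x_in x_Paris x_are x_amazing]: concatenated window vectors
x = rng.standard_normal(5 * d)

# Parameters theta = (W, b, u)
W = rng.standard_normal((H, 5 * d))
b = rng.standard_normal(H)
u = rng.standard_normal(H)

# Forward pass: h = f(Wx + b), s = u.h, J_t(theta) = sigma(s)
z = W @ x + b
h = np.maximum(z, 0.0)             # f = ReLU (an assumed choice of nonlinearity)
s = u @ h
prob_location = sigmoid(s)         # sigma(s) = 1 / (1 + e^{-s})

# Gradients by hand via the chain rule, for loss = -log sigma(s)
# (positive example; this loss is an assumption to make the update concrete)
dloss_ds = sigmoid(s) - 1.0        # d(-log sigma(s))/ds
dloss_dh = dloss_ds * u            # ds/dh = u
dloss_dz = dloss_dh * (z > 0)      # dh/dz = 1[z > 0] for ReLU
dloss_dW = np.outer(dloss_dz, x)   # shape matches W: H x 5d
dloss_db = dloss_dz
dloss_du = dloss_ds * h

# One SGD step: theta_new = theta_old - alpha * grad (alpha = learning rate)
alpha = 0.1
W -= alpha * dloss_dW
b -= alpha * dloss_db
u -= alpha * dloss_du
```

After the update, recomputing sigmoid(u @ np.maximum(W @ x + b, 0.0)) should give a higher probability for this positive example, which is exactly what "want a high score if it's a location" asks for.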