Machine Learning cơ bản (Machine Learning Basics)

Vũ Hữu Tiệp
Machine Learning
machinelearningcoban.com

Order ebook: https://machinelearningcoban.com/ebook/
Blog: https://machinelearningcoban.com
Facebook Page: https://www.facebook.com/machinelearningbasicvn/
Facebook Group: https://www.facebook.com/groups/machinelearningcoban/
Interactive Learning: https://fundaml.com
Last update: June 8, 2018

Chapter 0. Author's Preface

In recent years, artificial intelligence (AI) has emerged as evidence of the fourth industrial revolution (the first three being the steam engine, electric power, and information technology). AI has become a core component of high-tech systems. It has crept into almost every area of daily life, often without our noticing. Self-driving cars from Google and Tesla; Facebook's automatic face tagging in photos; Apple's virtual assistant Siri; Amazon's product recommendations; Netflix's movie recommendations; Google Translate's multilingual translation; and the Go-playing programs AlphaGo and, more recently, AlphaGo Zero from Google DeepMind are just a few prominent applications of AI.

Machine learning (ML) is a subset of artificial intelligence. It is a subfield of computer science that can learn from input data without being explicitly programmed (Wikipedia: machine learning is the subfield of computer science that "gives computers the ability to learn without being explicitly programmed").

In recent years, advances in computing systems, together with the huge amounts of data collected by large technology companies, have helped machine learning take a long step forward. A new field has emerged, called deep learning (DL). Deep learning has enabled computers to do things that seemed impossible about ten years ago: classifying thousands of different objects in images, generating captions for images, imitating human voices and handwriting, conversing with people, translating between languages, and even composing prose, poetry, and music. (Further reading: Inspirational Applications of Deep Learning, https://goo.gl/Ds3rRy.)

The AI-ML-DL relationship: deep learning is a subset of machine learning, and machine learning is a subset of artificial intelligence (see Figure 0.1).

Figure 0.1: The relationship between artificial intelligence, machine
learning, and deep learning (Source: What's the Difference Between Artificial Intelligence, Machine Learning, and Deep Learning?, https://goo.gl/NNwGCi).

0.1 Purpose of the book

The remarkable progress of artificial intelligence has created strong demand for people working in data science, machine learning, and related fields, worldwide and, in the coming years, in Vietnam. That was my motivation for starting the blog Machine Learning cơ bản (https://machinelearningcoban.com) in early 2017. At the time of writing, the blog has had 650 thousand visits, the blog's Facebook page (https://goo.gl/wyUEjr) has 10 thousand likes, and the Forum Machine Learning cơ bản (https://goo.gl/gDPTKX) has nearly [...] thousand members.

While writing the blog and maintaining the Facebook page, I received a great deal of support from readers, both moral and material. Many readers also encouraged me to compile the blog's material into a book for the community of Vietnamese-speaking machine learning practitioners. That support and encouragement was a major motivation for me to start and complete this book.

Machine learning and deep learning are vast fields with many branches. A single book cannot go deep into every branch and cover every problem. The purpose of this book is to give you the basic concepts, the common techniques, and the fundamental machine learning algorithms. From there, readers who want to go deeper into a specific problem can look for further materials, books, and courses.

Always remember: simple first. When tackling a machine learning problem, or any problem, start with simple algorithms. Do not assume that only complex algorithms can solve the problem. Complex algorithms usually demand more computation and are sensitive to the choice of input parameters. Simple algorithms, moreover, give us an overall model of the problem early. The result of a simple algorithm, often called a baseline, gives a first sense of how hard the problem is. The result can then be improved step by step. This book gives you a first view of, and directions for solving, machine learning problems. To build practical products, you will need to learn and practice a great deal more.

0.2 Approach of the book

To solve a machine learning problem, we need to choose a suitable model. The model is described by a set of parameters, possibly up to millions of them, that we need to find. Usually these parameters are found by solving an optimization problem.

When writing about a machine learning algorithm, I start with intuitive ideas, followed by a mathematical model that describes them. The model's parameters are found by optimizing that mathematical model. The mathematical derivations and the sample Python code at the end help readers understand the origin, meaning, and usage of each algorithm. Interleaved with the machine learning algorithms, I introduce basic optimization techniques, hoping that readers will understand the nature of the problems better.

0.3 Intended audience

The book is aimed at several groups of readers.

If you do not really want to go deep into the mathematics, you can still refer to the source code and the usage of the libraries. To use libraries effectively, however, you also need to understand the origin of the models and the meaning of their parameters. If you really want to understand the origin and meaning of the algorithms, you can learn a lot from how the models are built and optimized. The summary of the required mathematics in Part I is a concise reference whenever you have questions about the mathematical derivations in the book. (Readers who are not yet familiar with many of the mathematical concepts in that part can start from Part II and come back when they run into difficulty.) Part VII is devoted to convex optimization, an important area of optimization, and suits readers who want to go deeper into optimization.

Many figures in the book are drawn as vector graphics (high resolution) and can be used in lectures and presentations. The material is arranged from easy to hard, so the book can hopefully serve as a textbook for machine learning courses taught in Vietnamese.

The mathematical derivations are built to match the mathematics curricula of Vietnamese high schools and universities. Keywords are translated into Vietnamese based on the materials I studied during my many years of learning mathematics in Vietnam. English terms are also used frequently, in the hope that readers gradually become familiar with English-language materials and that readers studying at universities abroad can follow the book as well. The end of the book has an Index of important English terms, accompanied by their Vietnamese meanings where I could find a suitable translation.

0.4 Prerequisites

To start reading this book, you need some knowledge of linear algebra, matrix calculus, probability and statistics, and programming. Part I reviews the mathematics that matters most for machine learning. Whenever you have difficulty with the mathematics, you are encouraged to reread the chapters in that part.

The programming language used in the book is Python. I use it because it is free and can be installed easily on different operating systems. More importantly, many libraries that support machine learning and deep learning are written for Python. Two Python libraries are used throughout the book: numpy and scikit-learn.

Numpy (http://www.numpy.org/) is a popular library for operations on multidimensional arrays, with functions that stay close to linear algebra. If you are not yet familiar with numpy, you can take a short free course on the book's companion website (https://fundaml.com). You will get used to handling multidimensional arrays through many examples and exercises done directly in the browser. The array-handling techniques used in this book are all covered there.

Scikit-learn, or sklearn (http://scikit-learn.org/), is a library with many easy-to-use machine learning algorithms. The scikit-learn documentation is also a high-quality resource for machine learning practitioners. In this book, scikit-learn is used as a way to verify the results we obtain from mathematical derivation and programming with numpy.

Of course, with the popular machine learning libraries available, you can build products without much mathematical knowledge. This book, however, does not aim at using existing libraries without understanding what lies behind them. Using libraries also requires some knowledge about choosing and tuning model parameters.

0.5 Accompanying source code

All source code in the book can be found at https://github.com/tiepvupsu/ebookML_src. Files with the .ipynb extension contain code (Jupyter notebooks). Files with the .pdf and .png extensions are figures generated from the .ipynb files.

0.6 Organization of the book

The book is divided into eight parts and will continue to be updated:

• Part I reviews important material on linear algebra, matrix calculus, probability, and the two most common methods for estimating the parameters of statistical machine learning models.
• Part II introduces basic machine learning concepts, techniques for building feature vectors from data, a basic machine learning model (linear regression), and a phenomenon to avoid when building machine learning models.
• Part III helps you get familiar with intuitive machine learning models that do not require much advanced mathematics. Through it, readers get a first view of how machine learning models are built.
• Part IV covers one of the most common classes of machine learning algorithms, neural networks, which are the foundation of more complex deep learning models. This part also introduces useful techniques for solving unconstrained optimization problems.
• Part V introduces techniques commonly used in product recommendation systems.
• Part VI introduces dimensionality reduction techniques.
• Part VII gives a broader view of optimization, especially convex optimization. Constrained convex optimization problems are also introduced in this part.
• Part VIII introduces classification algorithms based on the idea of the support vector machine.

0.7 Notes on notation

The mathematical notation used in the book is described in Table 0.1 at the beginning of Chapter [...].

Boxes with a fixed-width font contain source code:

    text in a box with constant width represents source codes

Character strings in constant width, deep red, or 'string, dark green' denote variables, functions, strings, and so on, within the code.

Framed and italic: important concepts, definitions, theorems, and notes are framed and set in italics.

The separator between the integer and fractional parts of a real number is a period, '.', rather than a comma, ',', as in some other Vietnamese documents. This is consistent with English-language materials and with programming languages.

0.8 Further references

There are many good books, courses, and websites on machine learning and deep learning. A few that I particularly want to highlight:

0.8.1 Courses

• Machine Learning by Andrew Ng on Coursera (https://goo.gl/WBwU3K)
• Deep Learning Specialization by Andrew Ng (https://goo.gl/ssXfYN)
• CS224n: Natural Language Processing with Deep Learning (https://goo.gl/6XTNkH); CS231n: Convolutional Neural Networks for Visual Recognition (http://cs231n.stanford.edu/); CS246: Mining Massive Data Sets (https://goo.gl/TEMQ9H), Stanford
• Introduction to Computer Science
and Programming Using Python (https://goo.gl/4nNXvJ), MIT

0.8.2 Books

• C. Bishop, Pattern Recognition and Machine Learning (https://goo.gl/pjgqRr), Springer, 2006 [Bis06]
• I. Goodfellow et al., Deep Learning (https://goo.gl/sXaGwV), MIT Press, 2016 [GBC16]
• J. Friedman et al., The Elements of Statistical Learning (https://goo.gl/Qh9EkB), Springer, 2001 [FHT01]
• Y. Abu-Mostafa et al., Learning from Data (https://goo.gl/SRfNFJ), AMLBook New York, 2012 [AMMIL12]
• S. J. D. Prince, Computer Vision: Models, Learning, and Inference (https://goo.gl/9Fchf3), Cambridge University Press, 2012 [Pri12]
• S. Boyd et al., Convex Optimization (https://goo.gl/NomDpC), Cambridge University Press, 2004 [BV04]

In addition, the websites Machine Learning Mastery (https://goo.gl/5DwGbU), Pyimagesearch (https://goo.gl/5DwGbU), Kaggle (https://www.kaggle.com/), and Scikit-learn (http://scikit-learn.org/) are also useful sources of information.

0.9 Feedback

All comments, feedback, and error reports that make the book better are appreciated. You can send them to vuhuutiep@gmail.com or create an issue at https://goo.gl/zPYWKV.

The book will continue to be revised and extended with new chapters until the print edition is released. Everyone who has ordered the ebook will receive updates until the print edition comes out (expected in 2018).

0.10 Copyright

All content of the blog and the book (including source code and illustrations) is the copyright of myself, Vũ Hữu Tiệp.

I want the knowledge I create to reach many readers. However, I do not support any form of copying without attribution. Every citation must state the book title and the author's name (Vũ Hữu Tiệp), and include a link to the original blog. Articles that quote more than 25% of the full text of a blog post or a book chapter are not permitted, unless the author has agreed.

For all matters concerning copying, distribution, posting, or use of the book and the blog, as well as discussion and collaboration, please contact me at vuhuutiep@gmail.com.

0.11 Acknowledgements

First of all, I thank the friends on my Facebook friend list who enthusiastically supported and shared the blog in its first days. I sincerely thank the readers of the blog Machine Learning cơ bản and of its Facebook page, who have accompanied me over the past year. Without readers, I certainly would not have had the motivation to write 30 blog posts and many quick notes on the Facebook page.

While writing the blog, I received a great deal of material and moral support from readers. Without that support and the encouragement to write a book, this project could not have started. Once it started, the number of pre-orders grew day by day. I am truly grateful to those who pre-ordered and for their warm messages. Most importantly, the number of copies ordered before I had finished made me believe that the product I was creating would bring some value to the community. All of this helped me keep up my working spirit and strive for a quality product.

I was fortunate to receive positive feedback and suggestions from lecturers at major universities in Vietnam and abroad. I would like to thank the lecturers Phạm Ngọc Nam and Nguyễn Việt Hương (ĐH Bách Khoa Hà Nội), Chế Viết Nhật Anh (ĐH Bách Khoa Tp.HCM), Nguyễn Thanh Tùng (ĐH Thuỷ Lợi), Trần Duy Trác (ĐH Johns Hopkins), and Nguyễn Hồng Lâm (my supervisor during my internship at the U.S. Army Research Lab).

I especially thank Nguyễn Hồng Linh and Hoàng Đức Huy of the University of Waterloo, Canada, two friends who enthusiastically helped me build FundaML.com, which lets readers learn Python/Numpy directly in the browser. I also thank Lê Việt Hải, a PhD student in applied mathematics at Penn State, and Đinh Hồng Phong, a software engineer at Facebook, who suggested corrections to many points of language and mathematics in the draft. I believe the book has been revised considerably compared with the blog version.

I thank my three close friends, Nguyễn Tiến Cường, Nguyễn Văn Giang, and Vũ Đình Quyền, who always encouraged me and contributed much valuable feedback on the book. I also thank my other close friends at Penn State who were by my side during the project, including the family of Triệu Thanh Quang, the family of Trần Quốc Long, my close friend (and fellow blogger) Nguyễn Phương Chi, and my colleagues John McKay, Tiantong Guo, Hojjat Mousavi, Omar Aldayel, and Mohammad Tofighi at the Information Processing and Algorithm Laboratory (iPAL), Pennsylvania State University.
Last and most important, I thank my family, who always supported me unconditionally and helped me throughout this project.

Chapter 29. Multi-class Support Vector Machine

    # more efficient way to compute loss and grad
    import numpy as np

    def svm_loss_vectorized(W, X, y, reg):
        d, C = W.shape
        N = X.shape[0]
        loss = 0
        dW = np.zeros_like(W)

        Z = X.dot(W)                                   # shape of (N, C)
        id0 = np.arange(Z.shape[0])
        correct_class_score = Z[id0, y].reshape(N, 1)  # shape of (N, 1)
        margins = np.maximum(0, Z - correct_class_score + 1)  # shape of (N, C)
        margins[id0, y] = 0        # the correct class contributes no loss
        loss = np.sum(margins)
        loss /= N
        loss += 0.5 * reg * np.sum(W * W)

        F = (margins > 0).astype(int)                  # shape of (N, C)
        F[np.arange(F.shape[0]), y] = np.sum(-F, axis = 1)
        dW = X.T.dot(F)/N + reg*W
        return loss, dW

The code above contains no for loops. To check the correctness and efficiency of this function, we need to verify three things: (i) whether the value of the loss function is correct, (ii) whether the value of the gradient is correct, and (iii) whether the computation is actually efficient. All three can be checked with the code below.

    import time

    # svm_loss_naive, X_train, y_train and reg are defined earlier in the chapter
    d, C = 3073, 10
    W_rand = np.random.randn(d, C)
    t1 = time.time()
    l1, dW1 = svm_loss_naive(W_rand, X_train, y_train, reg)
    t2 = time.time()
    l2, dW2 = svm_loss_vectorized(W_rand, X_train, y_train, reg)
    t3 = time.time()
    print('Naive run time:', t2 - t1, '(s)')
    print('Vectorized run time:', t3 - t2, '(s)')
    print('loss difference:', np.linalg.norm(l1 - l2))
    print('gradient difference:', np.linalg.norm(dW1 - dW2))

Result:

    Naive run time: 7.34640693665 (s)
    Vectorized run time: 0.365024089813 (s)
    loss difference: 8.73114913702e-11
    gradient difference: 1.87942037251e-10

The result shows that the vectorized computation is about 20 times faster than the naive one. Moreover, the difference between the two results is very small, on the order of 1e-10, so we can use the vectorized computation to update the solution with mini-batch gradient descent.

29.3.3 Mini-batch gradient descent for multi-class SVM

With the functions written above, we can train the multi-class SVM with the code below.

    # Mini-batch gradient descent
    def multiclass_svm_GD(X, y, Winit, reg, lr=.1,
            batch_size = 1000, num_iters = 50, print_every = 10):
        W = Winit
        loss_history = []
        for it in range(num_iters):
            # visit the data in a new random order in every epoch
            mix_ids = np.random.permutation(X.shape[0])
            n_batches = int(np.ceil(X.shape[0]/float(batch_size)))
            for ib in range(n_batches):
                ids = mix_ids[batch_size*ib: min(batch_size*(ib+1), X.shape[0])]
                X_batch = X[ids]
                y_batch = y[ids]
                lossib, dW = svm_loss_vectorized(W, X_batch, y_batch, reg)
                loss_history.append(lossib)
                W -= lr*dW
            if it % print_every == 0 and it > 0:
                # report the loss of the most recent mini-batch
                print('epoch %d/%d, loss = %f' %(it, num_iters, loss_history[-1]))
        return W, loss_history

    d, C = X_train.shape[1], 10
    reg = 0.1  # value missing in the source; 0.1 is an assumed placeholder
    W = 0.00001*np.random.randn(d, C)

    W, loss_history = multiclass_svm_GD(X_train, y_train, W, reg,
        lr = 1e-8, num_iters = 50, print_every = 5)

Result:

    epoch 5/50, loss = 5.482782
    epoch 10/50, loss = 5.204365
    epoch 15/50, loss = 4.885159
    epoch 20/50, loss = 5.051539
    epoch 25/50, loss = 5.060423
    epoch 30/50, loss = 4.691241
    epoch 35/50, loss = 4.841132
    epoch 40/50, loss = 4.643097
    epoch 45/50, loss = 4.691177

We see that the loss tends to decrease and converge. The value after each iteration is illustrated in Figure 29.6.

Figure 29.6: Loss history over the iterations (horizontal axis: number of iterations; vertical axis: loss function). The loss tends to decrease and converges quickly.

After finding the coefficient matrix W that represents the multi-class SVM model, we need functions that determine the label of each data point and evaluate the accuracy of the model:

    def multisvm_predict(W, X):
        Z = X.dot(W)
        return np.argmax(Z, axis=1)

    def evaluate(W, X, y):
        y_pred = multisvm_predict(W, X)
        acc = 100*np.mean(y_pred == y)
        return acc

The validation set is used to choose suitable model parameters. There are two parameters in the optimization algorithm of multi-class SVM: the regularization coefficient and the learning rate. They are searched over pairs of given candidate values. The pair that gives the highest accuracy on the validation set is then used to evaluate the model on the test set.
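The selection procedure described above can be sketched end to end on a small synthetic problem. This is a minimal illustration and not the book's experiment: the three Gaussian blobs, the 70/30 split, the candidate values, and the helper names `hinge_loss_grad`, `train`, and `accuracy` are all assumptions chosen for the example.

```python
import numpy as np

def hinge_loss_grad(W, X, y, reg):
    """Multi-class hinge loss and its gradient, in the same form as this chapter."""
    N = X.shape[0]
    Z = X.dot(W)                                   # scores, shape (N, C)
    idx = np.arange(N)
    margins = np.maximum(0, Z - Z[idx, y].reshape(N, 1) + 1)
    margins[idx, y] = 0                            # no loss for the correct class
    loss = margins.sum() / N + 0.5 * reg * np.sum(W * W)
    F = (margins > 0).astype(float)
    F[idx, y] = -F.sum(axis=1)                     # minus the number of violations
    return loss, X.T.dot(F) / N + reg * W

def train(X, y, n_classes, reg, lr, num_iters=200):
    """Full-batch gradient descent from a fresh zero matrix (hypothetical helper)."""
    W = np.zeros((X.shape[1], n_classes))
    for _ in range(num_iters):
        _, dW = hinge_loss_grad(W, X, y, reg)
        W -= lr * dW
    return W

def accuracy(W, X, y):
    return 100 * np.mean(np.argmax(X.dot(W), axis=1) == y)

# Synthetic data (assumed for illustration): three Gaussian blobs in 2-D
rng = np.random.default_rng(0)
centers = np.array([[-4.0, 0.0], [4.0, 0.0], [0.0, 6.0]])
y_all = np.repeat(np.arange(3), 100)
X_all = centers[y_all] + rng.standard_normal((300, 2))
X_all = np.hstack([X_all, np.ones((300, 1))])      # constant feature for the bias

# Hold out 30% of the shuffled points as the validation set
perm = rng.permutation(300)
train_ids, val_ids = perm[:210], perm[210:]
X_tr, y_tr = X_all[train_ids], y_all[train_ids]
X_va, y_va = X_all[val_ids], y_all[val_ids]

# Try every (lr, reg) pair and keep the one with the best validation accuracy
best_acc, best_params, best_W = -1.0, None, None
for lr in [1e-3, 1e-2, 1e-1]:
    for reg in [0.1, 0.01]:
        W = train(X_tr, y_tr, 3, reg, lr)
        acc = accuracy(W, X_va, y_va)
        if acc > best_acc:
            best_acc, best_params, best_W = acc, (lr, reg), W

print('best (lr, reg):', best_params, 'validation acc: %.2f' % best_acc)
```

In this sketch every candidate pair starts from a fresh `W`, so the pairs are compared independently of the order in which they are tried. Only the validation split is used for the choice; the test set would be touched once, with the selected matrix.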
The search over the two parameters is carried out as follows:

    lrs = [1e-9, 1e-8, 1e-7, 1e-6]
    regs = [0.1, 0.01, 0.001, 0.0001]
    best_W = None
    best_acc = 0
    for lr in lrs:
        for reg in regs:
            # W from the previous run is reused as the starting point
            W, loss_history = multiclass_svm_GD(X_train, y_train, W, reg,
                lr = lr, num_iters = 100, print_every = 1e20)
            acc = evaluate(W, X_val, y_val)
            print('lr = %e, reg = %e, loss = %f, validation acc = %.2f'
                  % (lr, reg, loss_history[-1], acc))
            if acc > best_acc:
                best_acc = acc
                best_W = W.copy()  # copy, because W is updated in place by later runs

Result:

    lr = 1.000000e-09, reg = 1.000000e-01, loss = 4.422479, validation acc = 40.30
    lr = 1.000000e-09, reg = 1.000000e-02, loss = 4.474095, validation acc = 40.70
    lr = 1.000000e-09, reg = 1.000000e-03, loss = 4.240144, validation acc = 40.90
    lr = 1.000000e-09, reg = 1.000000e-04, loss = 4.257436, validation acc = 41.40
    lr = 1.000000e-08, reg = 1.000000e-01, loss = 4.482856, validation acc = 41.50
    lr = 1.000000e-08, reg = 1.000000e-02, loss = 4.036566, validation acc = 41.40
    lr = 1.000000e-08, reg = 1.000000e-03, loss = 4.085053, validation acc = 41.00
    lr = 1.000000e-08, reg = 1.000000e-04, loss = 3.891934, validation acc = 41.40
    lr = 1.000000e-07, reg = 1.000000e-01, loss = 3.947408, validation acc = 41.50
    lr = 1.000000e-07, reg = 1.000000e-02, loss = 4.088984, validation acc = 41.90
    lr = 1.000000e-07, reg = 1.000000e-03, loss = 4.073365, validation acc = 41.70
    lr = 1.000000e-07, reg = 1.000000e-04, loss = 4.006863, validation acc = 41.80
    lr = 1.000000e-06, reg = 1.000000e-01, loss = 3.851727, validation acc = 41.90
    lr = 1.000000e-06, reg = 1.000000e-02, loss = 3.941015, validation acc = 41.80
    lr = 1.000000e-06, reg = 1.000000e-03, loss = 3.995598, validation acc = 41.60
    lr = 1.000000e-06, reg = 1.000000e-04, loss = 3.857822, validation acc = 41.80

Thus, the highest accuracy on the validation set is 41.9%. The best coefficient matrix W is stored in the variable best_W. Applying this model to the test set:

    acc = evaluate(best_W, X_test, y_test)
    print('Accuracy on test data = %.2f %%' % acc)

Result:

    Accuracy on test data = 39.88 %

So the accuracy obtained is close to 40%. Readers can try other parameters and may get results that are a few percent better.

29.3.4 Visualizing the solution

Note that each $\mathbf{w}_i$ has the same dimension as the data. By dropping the coefficient that corresponds to the bias and rearranging the entries of each of the 10 coefficient vectors found, we obtain images of size 3 × 32 × 32, the same as each small image in the dataset. Figure 29.7 shows the coefficients found for each $\mathbf{w}_i$.

Figure 29.7: Visualization of the coefficients found, shown as images.

We see that the coefficients of each class describe a shape that resembles the images of that class; for example, car and truck look quite like images from the car and truck classes. The coefficients of ship and plane carry the blue of the sea and the sky. Horse looks like a horse with two heads; this is understandable because, in the training set, horses can face either direction. In other words, the coefficients found can be regarded as a representative image for each class. Why can we say that?

Recall that the class of a data point is determined by finding the position of the largest value in the score vector $\mathbf{W}^T\mathbf{x}$, that is,

$$\text{class}(\mathbf{x}) = \arg\max_{i = 1, 2, \dots, C} \mathbf{w}_i^T \mathbf{x}$$

Note that the inner product is a measure of the correlation between two vectors. The larger it is, the higher the correlation, that is, the more similar the two vectors are. So finding the label of a new image amounts to finding which class's representative image it is closest to. This resembles K-nearest neighbors, but instead of running KNN on the whole training data, we run it only on the 10 representative images found by multi-class SVM. The same argument applies to softmax regression.

29.4 Discussion

• Like softmax regression, multi-class SVM is still a linear classifier, because the boundaries between the classes are linear.
• Kernel SVM works quite well, but computing the kernel matrix can take a lot of time and memory. Moreover, extending it to multi-class classification is usually less effective than multi-class SVM, because the technique used is still one-vs-rest. Another advantage of multi-class SVM is that it can be optimized with gradient descent methods, which suits problems with large data. The limitation that the class boundaries are linear can be addressed by combining it with deep neural networks.
• There is another way to extend hinge loss to multi-class classification, using the loss $\max(0, 1 - \mathbf{w}_{y_n}^T\mathbf{x}_n + \max_{j \neq y_n} \mathbf{w}_j^T\mathbf{x}_n)$. This is the largest violation, as opposed to the sum of violations used in this chapter.
• In practice, multi-class SVM and softmax regression perform comparably (see https://goo.gl/xLccj3). On one specific problem, one method may be better than the other, while the opposite may hold on another problem. In practice, if possible, try both methods and choose the one that gives the better result.

Appendix A. The method of Lagrange multipliers

Minimizing (or maximizing) a function of one variable that is continuous and differentiable on an open domain (see: Open sets, closed sets and sequences of real numbers, https://goo.gl/AgKhCn) is usually done by solving the equation that sets its derivative to zero. Let the function be $f(x): \mathbb{R} \to \mathbb{R}$; its minimum or maximum, if it exists, is usually found by solving $f'(x) = 0$. Note that the converse is not true: a point where the derivative is zero does not necessarily make the function attain a minimum or a maximum. For example, $f(x) = x^3$ has a stationary point at 0 that is not an extremum.

For functions of several variables, we can apply the same observation: we look for the solutions of the equations that set the derivative with respect to each variable to zero. This approach applies only to unconstrained optimization problems, that is, problems with no condition on the variable $\mathbf{x}$.

For problems in which the constraint is an equation:

$$\mathbf{x} = \arg\min_{\mathbf{x}} f_0(\mathbf{x}) \quad \text{subject to: } f_1(\mathbf{x}) = 0 \tag{A.1}$$

there is also a method that turns the problem into an unconstrained one. It is called the method of Lagrange multipliers.

Consider the function $\mathcal{L}(\mathbf{x}, \lambda) = f_0(\mathbf{x}) + \lambda f_1(\mathbf{x})$, where the variable $\lambda$ is called the Lagrange multiplier. The function $\mathcal{L}(\mathbf{x}, \lambda)$ is called the auxiliary function, or the Lagrangian. It can be shown that the optimal point of problem (A.1) satisfies the condition $\nabla_{\mathbf{x}, \lambda} \mathcal{L}(\mathbf{x}, \lambda) = 0$. This is equivalent to:

$$\nabla_{\mathbf{x}} \mathcal{L}(\mathbf{x}, \lambda) = \nabla_{\mathbf{x}} f_0(\mathbf{x}) + \lambda \nabla_{\mathbf{x}} f_1(\mathbf{x}) = 0 \tag{A.2}$$
$$\nabla_{\lambda} \mathcal{L}(\mathbf{x}, \lambda) = f_1(\mathbf{x}) = 0 \tag{A.3}$$

Note that the second condition is exactly the constraint in problem (A.1).

Solving the system (A.2)-(A.3) is, in many cases, simpler than finding the solution of problem (A.1) directly.

Example 1: Find the maximum and minimum of the function $f_0(x, y) = x + y$, given that $x, y$ satisfy the condition $f_1(x, y) = x^2 + y^2 = 2$.

Solution: The constraint can be rewritten as $x^2 + y^2 - 2 = 0$. The Lagrangian of the problem is $\mathcal{L}(x, y, \lambda) = x + y + \lambda(x^2 + y^2 - 2)$. The stationary points of the Lagrangian must satisfy

$$\nabla_{x, y, \lambda} \mathcal{L}(x, y, \lambda) = 0 \Leftrightarrow \begin{cases} 1 + 2\lambda x = 0 \\ 1 + 2\lambda y = 0 \\ x^2 + y^2 = 2 \end{cases} \tag{A.4}$$

From the first two equations of (A.4) we get $x = y = -\frac{1}{2\lambda}$. Substituting into the last equation gives $\lambda^2 = \frac{1}{4} \Rightarrow \lambda = \pm\frac{1}{2}$. So we obtain the solution pairs $(x, y) \in \{(1, 1), (-1, -1)\}$. Substituting these values into the objective function, we find that the maximum is 2, attained at $(1, 1)$, and the minimum is $-2$, attained at $(-1, -1)$.

Example 2: The 2-norm of a matrix. We are familiar with the 2-norm of a vector $\mathbf{x}$: $\|\mathbf{x}\|_2 = \sqrt{\mathbf{x}^T\mathbf{x}}$. Based on the 2-norm of a vector, the 2-norm of a matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$ is denoted $\|\mathbf{A}\|_2$ and is defined as follows:

$$\|\mathbf{A}\|_2 = \max \frac{\|\mathbf{A}\mathbf{x}\|_2}{\|\mathbf{x}\|_2} = \max \sqrt{\frac{\mathbf{x}^T\mathbf{A}^T\mathbf{A}\mathbf{x}}{\mathbf{x}^T\mathbf{x}}}, \quad \text{with } \mathbf{x} \in \mathbb{R}^n \tag{A.5}$$

This optimization problem is equivalent to:

$$\max \mathbf{x}^T\mathbf{A}^T\mathbf{A}\mathbf{x} \quad \text{subject to: } \mathbf{x}^T\mathbf{x} = 1 \tag{A.6}$$

The Lagrangian of this problem is

$$\mathcal{L}(\mathbf{x}, \lambda) = \mathbf{x}^T\mathbf{A}^T\mathbf{A}\mathbf{x} + \lambda(1 - \mathbf{x}^T\mathbf{x}) \tag{A.7}$$

The stationary points of the Lagrangian must satisfy

$$\nabla_{\mathbf{x}} \mathcal{L} = 2\mathbf{A}^T\mathbf{A}\mathbf{x} - 2\lambda\mathbf{x} = 0 \tag{A.8}$$
$$\nabla_{\lambda} \mathcal{L} = 1 - \mathbf{x}^T\mathbf{x} = 0 \tag{A.9}$$

From (A.8) we have $\mathbf{A}^T\mathbf{A}\mathbf{x} = \lambda\mathbf{x}$. So $\mathbf{x}$ must be an eigenvector of $\mathbf{A}^T\mathbf{A}$ and $\lambda$ is the corresponding eigenvalue. Multiplying both sides of this expression by $\mathbf{x}^T$ on the left and using (A.9), we obtain

$$\mathbf{x}^T\mathbf{A}^T\mathbf{A}\mathbf{x} = \lambda\mathbf{x}^T\mathbf{x} = \lambda \tag{A.10}$$

It follows that $\|\mathbf{A}\mathbf{x}\|_2$ attains its largest value when $\lambda$ attains its largest value. In other words, $\lambda$ must be the largest eigenvalue of $\mathbf{A}^T\mathbf{A}$. Therefore, $\|\mathbf{A}\|_2 = \sqrt{\lambda_{\max}(\mathbf{A}^T\mathbf{A})}$.

The square roots of the eigenvalues of $\mathbf{A}^T\mathbf{A}$ are called the singular values of $\mathbf{A}$. In short, the 2-norm of a matrix is its largest singular value.

In exactly the same way, the solution of the problem

$$\min_{\|\mathbf{x}\|_2 = 1} \|\mathbf{A}\mathbf{x}\|_2 \tag{A.11}$$

is the eigenvector of $\mathbf{A}^T\mathbf{A}$ that corresponds to the smallest singular value of $\mathbf{A}$.

References

AKA91 David W. Aha, Dennis Kibler, and Marc K. Albert. Instance-based learning algorithms. Machine Learning, 6(1):37–66, 1991.
AM93 Sunil Arya and David M. Mount. Algorithms for fast vector quantization. In Data Compression Conference, 1993. DCC'93., pages 381–390. IEEE, 1993.
AMMIL12 Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin. Learning from data. AMLBook, New York, NY, USA, 2012.
AV07 David Arthur and Sergei Vassilvitskii. k-means++: The advantages of careful seeding. In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, pages 1027–1035. Society for Industrial and Applied Mathematics, 2007.
Bis06 Christopher M. Bishop. Pattern recognition and machine learning. Springer, 2006.
BL14 Artem Babenko and Victor
Lempitsky. Additive quantization for extreme vector compression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 931–938, 2014.
Ble08 David M. Blei. Hierarchical clustering. 2008.
BMV+12 Bahman Bahmani, Benjamin Moseley, Andrea Vattani, Ravi Kumar, and Sergei Vassilvitskii. Scalable k-means++. Proceedings of the VLDB Endowment, 5(7):622–633, 2012.
BTVG06 Herbert Bay, Tinne Tuytelaars, and Luc Van Gool. SURF: Speeded up robust features. Computer Vision – ECCV 2006, pages 404–417, 2006.
BV04 Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge University Press, 2004.
CDF+04 Gabriella Csurka, Christopher Dance, Lixin Fan, Jutta Willamowski, and Cédric Bray. Visual categorization with bags of keypoints. In Workshop on statistical learning in computer vision, ECCV, volume 1, pages 1–2. Prague, 2004.
CLMW11 Emmanuel J. Candès, Xiaodong Li, Yi Ma, and John Wright. Robust principal component analysis? Journal of the ACM (JACM), 58(3):11, 2011.
Cyb89 George Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems (MCSS), 2(4):303–314, 1989.
DFK+04 Petros Drineas, Alan Frieze, Ravi Kannan, Santosh Vempala, and V. Vinay. Clustering large graphs via the singular value decomposition. Machine Learning, 56(1):9–33, 2004.
dGJL05 Alexandre d'Aspremont, Laurent E. Ghaoui, Michael I. Jordan, and Gert R. Lanckriet. A direct formulation for sparse PCA using semidefinite programming. In Advances in neural information processing systems, pages 41–48, 2005.
DHS11 John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(Jul):2121–2159, 2011.
DT05 Navneet Dalal and Bill Triggs. Histograms of oriented gradients for human detection. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 1, pages 886–893. IEEE, 2005.
ERK+11 Michael D. Ekstrand, John T. Riedl, Joseph A. Konstan,
et al. Collaborative filtering recommender systems. Foundations and Trends® in Human–Computer Interaction, 4(2):81–173, 2011.
FHT01 Jerome Friedman, Trevor Hastie, and Robert Tibshirani. The elements of statistical learning. Springer series in statistics. New York, 2001.
Fuk13 Keinosuke Fukunaga. Introduction to statistical pattern recognition. Academic Press, 2013.
GBC16 Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org
GR70 Gene H. Golub and Christian Reinsch. Singular value decomposition and least squares solutions. Numerische Mathematik, 14(5):403–420, 1970.
HNO06 Per Christian Hansen, James G. Nagy, and Dianne P. O'Leary. Deblurring images: matrices, spectra, and filtering. SIAM, 2006.
HZRS16 Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
JDJ17 Jeff Johnson, Matthijs Douze, and Hervé Jégou. Billion-scale similarity search with GPUs. arXiv preprint arXiv:1702.08734, 2017.
JDS11 Hervé Jégou, Matthijs Douze, and Cordelia Schmid. Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1):117–128, 2011.
KA04 Shehroz S. Khan and Amir Ahmad. Cluster center initialization algorithm for k-means clustering. Pattern Recognition Letters, 25(11):1293–1302, 2004.
KB14 Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
KBV09 Yehuda Koren, Robert Bell, and Chris Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8), 2009.
KH92 Anders Krogh and John A. Hertz. A simple weight decay can improve generalization. In Advances in neural information processing systems, pages 950–957, 1992.
KSH12 Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in
neural information processing systems, pages 1097–1105, 2012.
LCB10 Yann LeCun, Corinna Cortes, and Christopher J. C. Burges. MNIST handwritten digit database. AT&T Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2, 2010.
LCD04 Anukool Lakhina, Mark Crovella, and Christophe Diot. Diagnosing network-wide traffic anomalies. In ACM SIGCOMM Computer Communication Review, volume 34, pages 219–230. ACM, 2004.
Low99 David G. Lowe. Object recognition from local scale-invariant features. In Computer Vision, 1999. The Proceedings of the Seventh IEEE International Conference on, volume 2, pages 1150–1157. IEEE, 1999.
LSP06 Svetlana Lazebnik, Cordelia Schmid, and Jean Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, volume 2, pages 2169–2178, 2006.
LW+02 Andy Liaw, Matthew Wiener, et al. Classification and regression by randomForest. R News, 2(3):18–22, 2002.
M+97 Tom M. Mitchell et al. Machine learning. WCB, 1997.
MSS+99 Sebastian Mika, Bernhard Schölkopf, Alex J. Smola, Klaus-Robert Müller, Matthias Scholz, and Gunnar Rätsch. Kernel PCA and de-noising in feature spaces. In Advances in neural information processing systems, pages 536–542, 1999.
Nes07 Yurii Nesterov. Gradient methods for minimizing composite objective function, 2007.
NF13 Mohammad Norouzi and David J. Fleet. Cartesian k-means. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3017–3024, 2013.
NJW02 Andrew Y. Ng, Michael I. Jordan, and Yair Weiss. On spectral clustering: Analysis and an algorithm. In Advances in neural information processing systems, pages 849–856, 2002.
Pat07 Arkadiusz Paterek. Improving regularized singular value decomposition for collaborative filtering. In Proceedings of KDD cup and workshop, volume 2007, pages 5–8, 2007.
Pla98 John Platt. Sequential minimal optimization: A fast algorithm for training support vector machines. 1998.
Pri12 Simon J. D.
Prince Computer vision: models, learning, and inference Cambridge University Press, 2012 RDVC+ 04 Lorenzo Rosasco, Ernesto De Vito, Andrea Caponnetto, Michele Piana, and Alessandro Verri Are loss functions all the same? Neural Computation, 16(5):1063–1076, 2004 Rey15 Douglas Reynolds Gaussian mixture models Encyclopedia of biometrics, pages 827–832, 2015 Ros57 F Rosemblat The perceptron: A perceiving and recognizing automation Cornell Aeronautical Laboratory Report, 1957 Rud16 Sebastian Ruder An overview of gradient descent optimization algorithms arXiv preprint arXiv:1609.04747, 2016 SCSC03 Mei-Ling Shyu, Shu-Ching Chen, Kanoksri Sarinnapakorn, and LiWu Chang A novel anomaly detection scheme based on principal component classifier Technical report, MIAMI UNIV CORAL GABLES FL DEPT OF ELECTRICAL AND COMPUTER ENGINEERING, 2003 SFHS07 J Ben Schafer, Dan Frankowski, Jon Herlocker, and Shilad Sen Collaborative filtering recommender systems In The adaptive web, pages 291–324 Springer, 2007 SHK+ 14 Nitish Srivastava, Geoffrey E Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov Dropout: a simple way to prevent neural networks from overfitting Journal of machine learning research, 15(1):1929–1958, 2014 Machine Learning https:// machinelearningcoban.com/ ebook 385 Tài liệu tham khảo SKKR00 Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl Application of dimensionality reduction in recommender system-a case study Technical report, Minnesota Univ Minneapolis Dept of Computer Science, 2000 SKKR02 Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl Incremental singular value decomposition algorithms for highly scalable recommender systems In Fifth International Conference on Computer and Information Science, pages 27–28 Citeseer, 2002 SLJ+ 15 Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich Going deeper with convolutions In Proceedings of the 
IEEE conference on computer vision and pattern recognition, pages 19, 2015 SSWB00 Bernhard Schă olkopf, Alex J Smola, Robert C Williamson, and Peter L Bartlett New support vector algorithms Neural computation, 12(5):1207–1245, 2000 SWY75 Gerard Salton, Anita Wong, and Chung-Shu Yang A vector space model for automatic indexing Communications of the ACM, 18(11):613–620, 1975 SZ14 Karen Simonyan and Andrew Zisserman Very deep convolutional networks for large-scale image recognition arXiv preprint arXiv:1409.1556, 2014 TH12 Tijmen Tieleman and Geoffrey Hinton Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude COURSERA: Neural networks for machine learning, 4(2):26–31, 2012 VJG14 Jỗo Vinagre, Alípio Mário Jorge, and Jỗo Gama Fast incremental matrix factorization for recommendation with positive-only feedback In International Conference on User Modeling, Adaptation, and Personalization, pages 459–470 Springer, 2014 VL07 Ulrike Von Luxburg A tutorial on spectral clustering Statistics and computing, 17(4):395–416, 2007 VM16 Tiep Vu and Vishal Monga Learning a low-rank shared dictionary for object classification In Proceedings IEEE Int Conference on Image Processing, pages 4428–4432 IEEE, 2016 VM17 Tiep Vu and Vishal Monga Fast low-rank shared dictionary learning for image classification IEEE Transactions on Image Processing, 26(11):5160–5175, Nov 2017 VMM+ 16 Tiep Vu, Hojjat Seyed Mousavi, Vishal Monga, Ganesh Rao, and UK Arvind Rao Histopathological image classification using discriminative feature-oriented dictionary learning IEEE transactions on medical imaging, 35(3):738–751, 2016 WYG+ 09 John Wright, Allen Y Yang, Arvind Ganesh, S Shankar Sastry, and Yi Ma Robust face recognition via sparse representation IEEE transactions on pattern analysis and machine intelligence, 31(2):210–227, 2009 XWCL15 Bing Xu, Naiyan Wang, Tianqi Chen, and Mu Li Empirical evaluation of rectified activations in convolutional network arXiv preprint 
arXiv:1505.00853, 2015 YZFZ11 M Yang, L Zhang, X Feng, and D Zhang Fisher discrimination dictionary learning for sparse representation pages 543–550, Nov 2011 ZDW14 Ting Zhang, Chao Du, and Jingdong Wang Composite quantization for approximate nearest neighbor search In ICML, number 2, pages 838–846, 2014 ZF14 Matthew D Zeiler and Rob Fergus Visualizing and understanding convolutional networks In European conference on computer vision, pages 818–833 Springer, 2014 ZWFM06 Sheng Zhang, Weihong Wang, James Ford, and Fillia Makedon Learning from incomplete ratings using non-negative matrix factorization In Proceedings of the 2006 SIAM International Conference on Data Mining, pages 549–553 SIAM, 2006 ZYK06 Haitao Zhao, Pong Chi Yuen, and James T Kwok A novel incremental principal component analysis and its application for face recognition IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 36(4):873–886, 2006 ZYX+ 08 Zhi-Qiang Zeng, Hong-Bin Yu, Hua-Rong Xu, Yan-Qi Xie, and Ji Gao Fast training support vector machines using parallel sequential minimal optimization In Intelligent System and Knowledge Engineering, 2008 ISKE 2008 3rd International Conference on, volume 1, pages 997–1001 IEEE, 2008 Machine Learning https:// machinelearningcoban.com/ ebook Index α–sublevel sets, 294 activation fuction sigmoid fuction, 167 fuction, 167 activation function, 162 ReLU, 199 activation function–hàm kích hoạt, 197 affine function, 290 back substitution, 16 backpropagation, 200 bag of words, 75 dictionary, 76 basic, 19 orthogonal, 20 orthonormal, 20 batch gradient descent, 152 Bayes’ rule - quy tắc Bayes, 44 bias, 86 bias trick, 86, 366 binary classification, 156 class boundary, 156 classification–phân lớp, 65 closed-form solution, 97 cluster, 110 complementary slackness, 323 conditional probability - xác suất có điều kiện, 44 conjugate distributions, 59 conjugate prior, 59 constraints, 282 contours, 293 convex, 282 combination, 287 function, 288 domain, 288 
  functions
    first-order condition, 296
    second-order condition, 297
  hull, 287
  optimization problems, 305
  sets, 283
  strictly convex functions, 289
convex optimization, 282
cosine similarity, 228
cross entropy, 184
CVXOPT, 307
data point - điểm dữ liệu, 64
determinant, 16
diagonal matrix, 15
dimensionality reduction, 75
dimensionality reduction - giảm chiều dữ liệu, 245
duality, 317
early stopping, 96
eigenvalues, 22
eigenvectors, 22
elbow method, 123
element-wise, 197
epoch, 153
expectation - kỳ vọng, 45
feasible points, 282
feasible sets, 282
feature engineering, 71
feature extraction, 245
feature selection, 97, 245
feature vector, 71
feature vector - vector đặc trưng, 64
Fisher's linear discriminant, 273
forward substitution, 16
Gaussian naive Bayes, 128
Gaussian mixture model, 124
GD, see gradient descent
generalization, 91
Geometric Programming, 313
Geometric programming
  convex form, 315
global minimum, 140
gradient descent, 140
  stopping criteria - điều kiện dừng, 155
  batch size, 154
  momentum, 148
  Nesterov accelerated gradient, 151
gradient - đạo hàm, 30
  first-order gradient - đạo hàm bậc nhất, 30
  numerical gradient, 36
  second-order gradient - đạo hàm bậc hai, 30
ground truth, 83
Hadamard product, 202, 203
halfspace, 285
hand-crafted feature, 79
Hermitian, 13
hidden layer, 162
hierarchical, 176
hierarchical clustering, 120
hinge loss, 346
hinge loss tổng, 368
Huber loss, 89
hyperparameter, 60
hyperplane, 156
hyperplane - siêu mặt phẳng, 285
hyperpolygon - siêu đa diện, 111
identity matrix - ma trận đơn vị, 14
infeasible sets, 282
inner product - tích vô hướng, 14
input layer, 162
inverse matrix - ma trận nghịch đảo, 15
iteration, 153
joint probability - xác suất đồng thời, 41
K-means clustering, 110
K-nearest neighbor, 100
Kernel, 358
  Kernel trick, 358
  Linear, 359
  Mercer conditions, 359
  Polynomial, 360
  Radial Basis Function (RBF), 360
  Sigmoid, 360
KKT conditions, 324
KNN, see K-nearest neighbor, 100
Lagrange
  dual function, 318
  dual problem, 321
  Lagrangian, 318
Lagrange/Lagrangian dual functions, 318
Laplace smoothing, 129
large-scale, 101
lasso regression, 97
lazy learning, 100
LDA, 269
learning rate, 141
lemmatization, 133
level sets, 293
level sets - đường đồng mức, 147
likelihood, 53
linear combination, 17
linear dependence, 17
linear discriminant analysis, 269
linear independence, 17
linear programming, 307
  general form, 308
  standard form, 308
linear regression - hồi quy tuyến tính, 83
linearly separable, 156
Ling-Spam dataset, 132
local minimum, 140
log-likelihood, 53
loss function - hàm mất mát, 69
MAP, 58
marginal probability - xác suất biên, 43
marginalization, 43
matrix calculus, 30
matrix completion, 216
matrix factorization - phân tích ma trận thành nhân tử, 236
maximum a posteriori, 58
maximum entropy classifier, 191
maximum likelihood estimation, 53
maximum margin classifier, 330
mean squared error, 93
mini-batch gradient descent, 154
misclassified point - điểm bị phân lớp lỗi, 158
MLE, 53
MNIST, 117
model parameter - tham số mô hình, 69
model parameters, 69
monomial, 313
multi-class classification, 175
multinomial logistic regression, 191
multinomial naive Bayes, 129
naive Bayes classifier, 127
NBC, 127
neural network, 162
non-word, 133
norm, 26
  norm, 27
  norm, 27
  p norm, 27
  Euclidean norm, 27
  Frobenius norm, 28
norm balls, 285
null space, 19
numpy, iv
offline learning, 67
one-hot coding, 111
one-vs-one, 176
one-vs-rest, 177
online learning, 67, 152
orthogonal matrix, 20
orthogonality, 20
output layer, 162
overfitting, 91
partial derivative - đạo hàm riêng, 30
patch, 77
PCA, see principal component analysis, 254
pdf, see probability density function, 40
perceptron learning algorithm, 156
PLA, 156
pocket algorithm, 163
polynomial regression, 89, 92
positive definite matrix, 24
  negative definite, 24
  negative semidefinite, 24
  positive semidefinite, 24
posterior probability, 58
posynomial, 313
predicted output, 83
principal component analysis, 254
prior, 58
probability density function - hàm mật độ xác suất, 40
probability distribution - phân phối xác suất, 47
  Bernoulli distribution, 47
  Beta distribution, 50
  Categorical distribution, 48
  Dirichlet distribution, 51
  multivariate normal distribution, 50
  univariate normal distribution, 49
projection matrix, 75, 269
pseudo inverse, 85
quadratic forms, 291
Quadratic programming, 310
quasiconvex, 296
random projection, 75
random variable - biến ngẫu nhiên, 40
range space, 19
rank, 19
recommendation system
  collaborative filtering, 215
  content-based, 214, 215
  item, 214
  item-item collaborative filtering, 230
  long tail, 214
  similarity matrix, 228
  user, 214
  user-user collaborative filtering, 226
  utility matrix, 215
regression - hồi quy, 65
regularization, 96
  regularization, 97
  regularization, 97
regularization parameter, 97
regularized loss function, 97
regularized neural network, 209
reinforcement learning - học củng cố, 68
ridge regression, 90, 97
robust, 97
scikit-learn, iv
semi-supervised learning - học bán giám sát, 68
Separating hyperplane theorem, 288
SGD, see stochastic gradient descent
sigmoid, 198
sklearn, iv
Slater's constraint qualification, 322
softmax function, 181
softmax regression, 180
spam filtering, 132
span, 17
sparsity, 97
spectral clustering, 124
state-of-the-art, 74
stochastic gradient descent, 152
stop word, 133
strong duality, 322
submatrix
  leading principal matrix, 25
  leading principal minor, 25
  principal minor, 25
  principal submatrix, 25
supervised learning - học có giám sát, 67
support vector machine, 328
  hard-margin SVM, 328
  kernel SVM, 355
  soft-margin SVM, 339
  margin, 329
  multi-class SVM, 364
symmetric matrix, 13
tanh, 198
task, 64
tensor, 64
test set - tập kiểm thử, 67
training error, 93
training set - tập huấn luyện, 67
transfer learning, 80
triangular matrix, 16
  lower, 16
  upper, 16
underfitting, 92
unitary matrix, 21
unsupervised learning - học không giám sát, 68
validation, 94
  cross-validation, 95
  k-fold cross-validation, 95
  leave-one-out, 95
vector-valued function, 31
vectorization - vector hoá, 74
weak duality, 321
weight decay, 97
weight vector - vector trọng số, 83

Format
Pages	400
File size	23.35 MB