VIETNAM GENERAL CONFEDERATION OF LABOR
TON DUC THANG UNIVERSITY
FACULTY OF INFORMATION TECHNOLOGY

THE MID-TERM ESSAY
INTRODUCTION TO MACHINE LEARNING

MACHINE LEARNING'S PROBLEMS

Instructor: Mr. Le Anh Cuong
Students: Le Quang Duy - 520H0529, Tran Quoc Huy - 520H0647
Class: 20H50204
Course:

HO CHI MINH CITY, 2022

ACKNOWLEDGEMENT

Sincere gratitude to Mr. Le Anh Cuong and my partner Tran Quoc Huy for their help during the machine learning semester. Mr. Cuong's practical lectures, paired with theory, helped me master the principles of machine learning; he taught with enthusiasm, and Tran Quoc Huy supported me throughout. Please accept my heartfelt gratitude once more.

THE MID-TERM ESSAY WAS COMPLETED AT TON DUC THANG UNIVERSITY

I hereby declare that this is my own report, written under the guidance of Mr. Le Anh Cuong. The research contents and results of this topic are honest and have not been published in any form before. The data in the tables used for analysis, comments, and evaluation were collected by the author himself from different sources and are clearly stated in the reference section. In addition, the report also uses a number of comments and assessments, as well as data from other authors, agencies, and organizations, all with citations and source annotations. If any fraud is found, I take full responsibility for the content of my report. Ton Duc Thang University is not responsible for any copyright violations caused by me during the implementation process (if any).

Ho Chi Minh City, 16 October 2022
Author
Le Quang Duy

TEACHER'S CONFIRMATION AND ASSESSMENT SECTION

Confirmation section of the instructor
_______________________________
Ho Chi Minh City, day ... month ... year ...
(sign and write full name)

Evaluation section of the lecturer marking the report
_______________________________
Ho Chi Minh City, day ... month ... year ...
(sign and write full name)

SUMMARY

In this report, we discuss basic methods of machine learning. In chapter 2, we solve a classification problem with three different models (Naive Bayes, k-Nearest Neighbors, and Decision Tree) and compare them on the following metrics: accuracy, precision, recall, and f1-score for each class, plus the weighted average f1-score over all the data. In chapter 3, we discuss, implement, and visualize the feature selection problem and the way correlation-based selection works. In chapter 4, we present the theory, the code implementation, and illustrations for two optimization algorithms (Stochastic Gradient Descent and the Adam optimization algorithm).

TABLE OF CONTENTS

ACKNOWLEDGEMENT
THE MID-TERM ESSAY WAS COMPLETED AT TON DUC THANG UNIVERSITY
TEACHER'S CONFIRMATION AND ASSESSMENT SECTION
SUMMARY
LIST OF ABBREVIATIONS
LIST OF DIAGRAMS, CHARTS, AND TABLES
CHAPTER 1: INTRODUCTION
CHAPTER 2: PROBLEM .......... 10
  2.1 Common preparation for the models .......... 10
  2.2 Executing the models .......... 12
    2.2.1 Naive Bayes model .......... 12
    2.2.2 k-Nearest Neighbors model .......... 13
    2.2.3 Decision Tree model .......... 14
  2.3 Comparison .......... 15
    2.3.1 Report from the Naive Bayes model .......... 15
    2.3.2 Report from the k-Nearest Neighbors model .......... 15
    2.3.3 Report from the Decision Tree model .......... 16
CHAPTER 3: PROBLEM .......... 17
  3.1 What is correlation? [1] .......... 17
  3.2 How does it help? [1] .......... 17
  3.3 Solving the linear regression problem .......... 19
CHAPTER 4: PROBLEM .......... 21
  4.1 Stochastic Gradient Descent .......... 21
    4.1.1 Theory .......... 21
    4.1.2 Show code .......... 26
  4.2 Adam Optimization Algorithm .......... 27
    4.2.1 Theory .......... 27
    4.2.2 Show code .......... 29
REFERENCES .......... 30

LIST OF ABBREVIATIONS

LIST OF DIAGRAMS, CHARTS, AND TABLES

CHAPTER 1: INTRODUCTION

In this report, we divide the work into four chapters as follows:
- In chapter 1, we introduce the outline of the report.
- In chapter 2, we present three models: Naive Bayes classification, k-Nearest Neighbors, and Decision Tree. For each model, we perform a common preparation step before training and testing. We split the data into two parts, training (75%) and testing (25%), and compare the models on these metrics: accuracy, precision, recall, f1-score, and the weighted average f1-score.
- In chapter 3, we answer two questions, what correlation is and how it works; that is, we present the theory of correlation in feature selection and solve the Boston house-pricing regression problem.
- In chapter 4, we present the theory of the Stochastic Gradient Descent and Adam optimization algorithms and show our code for each.

CHAPTER 2: PROBLEM

2.3.3 Report from the Decision Tree model:

Conclusion: the weighted f1-score over the data is 87%.

CHAPTER 3: PROBLEM

To solve this problem, we have to answer two questions:
- What is correlation?
- How does correlation help in feature selection?

3.1 What is correlation? [1]

The statistical concept of correlation is frequently used to describe how nearly linear the relationship between two variables is. For instance, two linearly dependent variables, such as x and y, have a larger correlation than two non-linearly dependent variables, such as u and v with u = v².

3.2 How does it help? [1]

Highly correlated features are more linearly dependent and hence affect the dependent variable almost equally. We can therefore exclude one of the two features whenever there is a substantial correlation between them.

As an example, we analyzed the Boston house-pricing dataset available in the scikit-learn library. After loading the data, we divided it into two sets: training (70%) and testing (30%), and used a heatmap to visualize the correlation matrix. The number in each square is the strength of the correlation between the corresponding pair of attributes, so when it is high we can reject one of the two. In this instance, the "tax" column against the "rad" row reaches 0.91, meaning the two attributes correlate up to 91%, so we can remove one of them from the dataset. Commonly used thresholds are between 0.7 and 0.9; in this situation, we used 0.7 as the threshold for rejecting unnecessary attributes. Our correlation function is sketched below.
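The function itself appeared as a screenshot in the original document, so the following is only a minimal sketch of how such a correlation-based filter can be written, assuming the 0.7 threshold and the 70/30 split described above. The function name find_correlated_features and the variable names are our own illustration, not the author's exact code.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_boston  # available in scikit-learn versions before 1.2
from sklearn.model_selection import train_test_split

# Load the Boston house-pricing data used in the text.
boston = load_boston()
X = pd.DataFrame(boston.data, columns=boston.feature_names)
y = pd.Series(boston.target, name="MEDV")

# Split into two sets: training (70%) and testing (30%).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Visualize the correlation matrix as a heatmap, as in the text.
sns.heatmap(X_train.corr().abs(), annot=True)
plt.show()

def find_correlated_features(df, threshold=0.7):
    """Return the names of columns to reject: for every pair whose absolute
    correlation exceeds the threshold, keep one column and reject the other."""
    corr = df.corr().abs()
    to_reject = set()
    for i in range(len(corr.columns)):
        for j in range(i):  # lower triangle only, so each pair is seen once
            if corr.iloc[i, j] > threshold:
                to_reject.add(corr.columns[i])
    return to_reject

rejected = find_correlated_features(X_train, threshold=0.7)
print(rejected)  # the set of attribute names to remove
```

Scanning only the lower triangle of the matrix ensures each pair is considered once, so exactly one feature of each highly correlated pair is kept.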
The function returns the set of names of the attributes to reject, which we then use to prepare our dataset. After rejecting, we dropped three attributes, leaving only 10 columns (13 columns before).

3.3 Solving the linear regression problem:

Finally, we solve the problem with linear regression: we fit the model on the reduced training set, predict values for the test set, and check the MAE test score, as sketched below.
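The prediction and scoring steps also appeared as screenshots in the original; below is a minimal sketch of those steps under the same assumptions, where rejected and the X_train/X_test/y_train/y_test split come from the previous sketch.

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Remove the rejected attributes from both splits.
X_train_sel = X_train.drop(columns=list(rejected))
X_test_sel = X_test.drop(columns=list(rejected))

# Fit the linear regression model on the reduced training set.
model = LinearRegression()
model.fit(X_train_sel, y_train)

# Predict values for the test set and check the MAE test score.
y_pred = model.predict(X_test_sel)
print("MAE test score:", mean_absolute_error(y_test, y_pred))
```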
CHAPTER 4: PROBLEM

4.1 Stochastic Gradient Descent

4.1.1 Theory:

Stochastic Gradient Descent is an optimization approach used to identify the model parameters that best match the predicted and observed outcomes. It is a crude yet effective method, and it is especially useful when there are redundancies in the data.

Suppose we have to fit a line to (weight, height) data, with the formula for the predicted height:

    Predicted Height = Intercept + Slope × Weight    (1)

In this instance we can see a few clusters of points, and we randomly choose a starting point of Intercept = 0 and Slope = 1. We then use a loss function to determine how well the line fits the data, the sum of squared residuals (SSR):

    SSR = (Observed Height − Predicted Height)²    (2)

Substituting (1) into (2), we have:

    SSR = (Observed Height − (Intercept + Slope × Weight))²

We calculate the derivative of the SSR with respect to the intercept and to the slope:

    d(SSR)/d(Intercept) = −2 × (Height − (Intercept + Slope × Weight))
    d(SSR)/d(Slope) = −2 × Weight × (Height − (Intercept + Slope × Weight))

Instead of using every point, we pick one random sample to evaluate the derivatives. With the sample (Weight = 3, Height = 3.3) and the starting values above:

    d(SSR)/d(Intercept) = −2 × (3.3 − (0 + 1 × 3)) = −0.6
    d(SSR)/d(Slope) = −2 × 3 × (3.3 − (0 + 1 × 3)) = −1.8

From the derivatives we can easily calculate the step sizes used to improve the line:

    Step size(Intercept) = d(SSR)/d(Intercept) × learning rate
    Step size(Slope) = d(SSR)/d(Slope) × learning rate

We start with a fairly large learning rate and make it smaller with each step. In this example we chose 0.01 for the learning rate:

    Step size(Intercept) = −0.6 × 0.01 = −0.006
    Step size(Slope) = −1.8 × 0.01 = −0.018

    => New intercept = Old intercept − Step size(Intercept) = 0 − (−0.006) = 0.006
       New slope = Old slope − Step size(Slope) = 1 − (−0.018) = 1.018

This gives a new line. We iterate, picking another random sample each time, until the loss is less than 0.001, at which point we can stop. Repeating the steps above produces successively better lines, and in this instance we can stop at Intercept = 0.85 and Slope = 0.68. When a new sample is added, we take that sample and repeat the steps above to produce a new line that fits the enlarged data.

4.1.2 Show code:
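The original code screenshot is not reproduced in the preview, so the following is a minimal sketch of the procedure above, in the spirit of the example in [5]. The toy data, the random seed, and the decay factor are our own illustrative choices.

```python
import numpy as np

# Toy (weight, height) samples; illustrative only.
data = np.array([[0.5, 1.4], [2.3, 1.9], [2.9, 3.2], [3.0, 3.3]])

rng = np.random.default_rng(0)
intercept, slope = 0.0, 1.0   # the random starting point used in the text
learning_rate = 0.01

for step in range(10_000):
    weight, height = data[rng.integers(len(data))]   # pick one random sample
    residual = height - (intercept + slope * weight)
    # Derivatives of the squared residual for this single sample.
    d_intercept = -2.0 * residual
    d_slope = -2.0 * weight * residual
    # Step size is derivative times learning rate; subtract it to update.
    intercept -= learning_rate * d_intercept
    slope -= learning_rate * d_slope
    learning_rate *= 0.999    # make the learning rate smaller with each step
    if residual ** 2 < 0.001:  # stop once the loss for a sample is small enough
        break

print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

Because each update uses a single randomly chosen sample rather than the whole dataset, each step is cheap, which is exactly what makes the method attractive for large or redundant data.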
4.2 Adam Optimization Algorithm

4.2.1 Theory:

The Adam optimization algorithm, also known as Adaptive Moment Estimation, is a method for stochastic optimization. It is a kind of gradient-descent optimization for machine learning (neural networks, etc.) that was created to improve how the learning rate is handled during training. Adam optimizes a stochastic objective function based on first-order gradients and adaptive estimates of lower-order moments. It is a very efficient method, since only first-order gradients are required and the memory requirement is low. The method is also suitable for problems with noisy gradients and sparse training data.

Pseudo code for the Adam algorithm, reproduced from [3] (g_t is the gradient of the objective at step t, m_0 = v_0 = 0, all vector operations are element-wise, and the suggested defaults are α = 0.001, β1 = 0.9, β2 = 0.999, ε = 10⁻⁸):

    m_t = β1 · m_{t−1} + (1 − β1) · g_t       (update biased first moment estimate)
    v_t = β2 · v_{t−1} + (1 − β2) · g_t²      (update biased second raw moment estimate)
    m̂_t = m_t / (1 − β1^t)                    (bias-corrected first moment estimate)
    v̂_t = v_t / (1 − β2^t)                    (bias-corrected second raw moment estimate)
    θ_t = θ_{t−1} − α · m̂_t / (√v̂_t + ε)     (update parameters)

Note that we can improve the efficiency of the above algorithm by changing the order of computation, replacing the last three lines with:

    α_t = α · √(1 − β2^t) / (1 − β1^t)
    θ_t = θ_{t−1} − α_t · m_t / (√v_t + ε̂)

4.2.2 Show code:

Our implementation follows the reference implementation in [4]; a minimal sketch is given after the reference list below.

REFERENCES

[1] https://www.kaggle.com/code/bbloggsbott/feature-selection-correlation-and-p-value/notebook
[2] https://www.phamduytung.com/blog/2021-01-15-adabelief-optimizer/
[3] https://arxiv.org/abs/1412.6980
[4] https://github.com/theroyakash/Adam/blob/master/Code/Adam.ipynb
[5] https://vitalflux.com/stochastic-gradient-descent-python-example/
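As noted in section 4.2.2, the notebook code itself is not reproduced in this preview. Below is a minimal sketch of the Adam update rule from the pseudo code above, applied to a simple quadratic objective; the objective function and the step count are our own illustrative choices, not the exact code of [4].

```python
import numpy as np

def adam(grad, theta0, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8, n_steps=5000):
    """Minimal Adam optimizer following the update rules reproduced from [3]."""
    theta = np.asarray(theta0, dtype=float)
    m = np.zeros_like(theta)  # biased first moment estimate
    v = np.zeros_like(theta)  # biased second raw moment estimate
    for t in range(1, n_steps + 1):
        g = grad(theta)                        # gradient at the current parameters
        m = beta1 * m + (1 - beta1) * g        # update first moment
        v = beta2 * v + (1 - beta2) * g ** 2   # update second raw moment
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta

# Illustrative objective f(x, y) = x^2 + 10*y^2 with gradient (2x, 20y).
grad = lambda th: np.array([2.0 * th[0], 20.0 * th[1]])
print(adam(grad, [1.0, 1.0]))  # moves toward the minimum at (0, 0)
```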