VIETNAM GENERAL CONFEDERATION OF LABOR
TON DUC THANG UNIVERSITY
FACULTY OF INFORMATION TECHNOLOGY
THE MIDDLE-TERM ESSAY
INTRODUCTION TO MACHINE LEARNING
MACHINE LEARNING’S PROBLEMS
Instructor: Mr. Le Anh Cuong
Students: Le Quang Duy - 520H0529
Tran Quoc Huy - 520H0647
Class: 20H50204
Course: 24
HO CHI MINH CITY, 2022
MIDDLE-TERM ESSAY COMPLETED AT TON DUC THANG UNIVERSITY
I hereby declare that this is my own report, prepared under the guidance of Mr. Le Anh Cuong. The research contents and results in this topic are honest and have not been published in any form before. The data in the tables used for analysis, comments, and evaluation were collected by the author himself from different sources, as clearly stated in the reference section.
In addition, the report also uses a number of comments and assessments, as well as data from other authors, agencies, and organizations, with citations and source annotations.
If any fraud is found, I take full responsibility for the content of my report. Ton Duc Thang University is not involved in any copyright violations caused by me during the implementation process (if any).
Ho Chi Minh City, 16 October 2022
Author
Le Quang Duy
TEACHER’S CONFIRMATION AND ASSESSMENT SECTION
Confirmation section of the instructors
Ho Chi Minh City, day month year
(sign and write full name)
Evaluation section of the lecturer who marks the report
Ho Chi Minh City, day month year
(sign and write full name)
SUMMARY
In this report, we will discuss basic methods for machine learning.
In chapter 2, we will practice solving a classification problem with 3 different models (Naive Bayes, k-Nearest Neighbors, and Decision Tree) and compare these models based on the metrics: accuracy, precision, recall, f1-score for each class, and the weighted average of the f1-score over all the data.
In chapter 3, we will discuss, work on, and visualize the Feature Selection problem and the way “correlation” is used for it.
In chapter 4, we will present the theory, the code implementation, and an illustration of 2 optimization algorithms (Stochastic Gradient Descent and the Adam optimization algorithm).
TABLE OF CONTENTS
ACKNOWLEDGEMENT
MIDDLE-TERM ESSAY COMPLETED AT TON DUC THANG UNIVERSITY
TEACHER’S CONFIRMATION AND ASSESSMENT SECTION
2.2.1 Naive Bayes model:
2.2.2 k-Nearest Neighbors model:
2.2.3 Decision Tree model:
2.3 Comparing:
2.3.1 Reporting from Naive Bayes Model:
2.3.2 Reporting from k-Nearest Neighbors Model:
2.3.3 Reporting from Decision Tree Model:
3.1 What is correlation? [1]
3.2 How it works to help?[1]
3.3 Solving linear regression’s problem:
LIST OF ABBREVIATIONS
LIST OF DIAGRAMS, CHARTS, AND TABLES
CHAPTER 1: INTRODUCTION
In this report, we divide the work into 3 problems, presented across 4 chapters.
- In chapter 1, we will introduce the outline of the report.
- In chapter 2, we will present 3 models: Naive Bayes Classification, k-Nearest Neighbors, and Decision Tree. For each model, we do a common preparation before training and testing. We split the data into 2 sets: training (75%) and testing (25%), and compare the models on the metrics: accuracy, precision, recall, f1-score, and the weighted average of the f1-score.
- In chapter 3, we will answer 2 questions: what it is and how it works; that is, we will present the theory of “correlation” in feature selection and solve the Boston house-pricing regression problem.
- In chapter 4, we will present the theory of the Adam and Stochastic Gradient Descent algorithms and show our code for each algorithm.
CHAPTER 2: PROBLEM 1
2.1 Common preparation for the 3 models:
- In this chapter, we solve the problem with 3 models: Naive Bayes, k-Nearest Neighbors, and Decision Tree.
- We used the “iris” data set to illustrate the 3 models.
- First of all, we prepare to collect the data by reading the file “iris.data”:
import pandas as pd
from google.colab import files
file = files.upload()

Output: Saving iris.data to iris.data (4551 bytes, last modified: 3/10/2022)
from sklearn.model_selection import train_test_split
# Split into 2 random sets: 75% training set, 25% test set
Description: We took 149 rows, used the first 4 columns as features, and split them into 75% for training and 25% for testing, stored in the variables x_train, x_test, y_train, y_test.
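For reference, the read-and-split step could look like the following minimal sketch (reading iris.data without a header row and the pandas indexing are our assumptions; the variable names match the description above):

import pandas as pd
from sklearn.model_selection import train_test_split

# Read the raw CSV; iris.data has no header row
df = pd.read_csv("iris.data", header=None)

# The first 4 columns are the features, the 5th column is the class label
x = df.iloc[:149, 0:4]
y = df.iloc[:149, 4]

# 75% training / 25% testing split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25)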
2.2 Execute models:
2.2.1 Naive Bayes model:
- Training time: less than 1 second to train the data.
from sklearn.naive_bayes import MultinomialNB

NB = MultinomialNB()
# Training
NB.fit(x_train, y_train)

Output: MultinomialNB()
- Predicting time: less than 1 second to predict.

# Predict results for x_test
y_predict = NB.predict(x_test)

Conclusion: We found only 3 misclassified samples after running this model.
2.2.2 k-Nearest Neighbors model:
- Training time: less than 1 second to train the data.
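The training and prediction steps are analogous to the previous model; a minimal sketch, assuming scikit-learn’s KNeighborsClassifier with 5 neighbors (the neighbor count is our assumption):

from sklearn.neighbors import KNeighborsClassifier

# Training
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(x_train, y_train)

# Predict results for x_test
y_predict = knn.predict(x_test)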
Conclusion: We found only 3 misclassified samples after running this model.
2.2.3 Decision Tree model:
- Training time: less than 1 second to train the data.
from sklearn.tree import DecisionTreeClassifier

dTree = DecisionTreeClassifier(max_depth=2)
dTree.fit(x_train, y_train)
Predicted labels (sample): Iris-versicolor, Iris-versicolor, Iris-setosa, Iris-setosa, Iris-setosa, Iris-setosa, Iris-virginica, Iris-versicolor, Iris-versicolor, Iris-virginica, ...
Conclusion: Weighted f1-score of the data: 92%
2.3.2 Reporting from k-Nearest Neighbors Model:
2.3.3 Reporting from Decision Tree Model:
[Classification report table for the Decision Tree model: per-class precision, recall, and f1-score]
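The per-class precision, recall, and f1-score values and the weighted averages reported in this section can be produced with scikit-learn’s classification_report; a minimal sketch using the variable names from this chapter:

from sklearn.metrics import classification_report

# y_test holds the true labels, y_predict the predictions of the model being evaluated
print(classification_report(y_test, y_predict))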
3.2 How it works to help? [1]
Highly correlated features are more linearly dependent and hence affect the dependent variable almost equally. We can thus exclude one of the two features when there is a substantial correlation between them.
For example, we used the Boston house-pricing data set, available in the scikit-learn library, for the analysis:
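A minimal sketch of the loading step, assuming the load_boston helper that scikit-learn still shipped when this report was written (it was removed in scikit-learn 1.2):

import pandas as pd
from sklearn.datasets import load_boston

boston = load_boston()
df = pd.DataFrame(boston.data, columns=boston.feature_names)
df["MEDV"] = boston.target  # median house value, used as the target column
df.head()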
After loading data, we have:
Columns: CRIM, ZN, INDUS, CHAS, NOX, RM, AGE, DIS, RAD, TAX, PTRATIO, B, LSTAT, MEDV
X = df.drop("MEDV", axis=1)
y = df["MEDV"]

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=9, test_size=0.3)
We used a “heatmap” to visualize the data:
[Figure: correlation heatmap of the features CRIM, ZN, INDUS, CHAS, NOX, RM, AGE, DIS, RAD, TAX, PTRATIO, B, LSTAT]
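A minimal sketch of how such a heatmap can be drawn with seaborn (the figure size and color map are our choices):

import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(12, 10))
# Correlation matrix of the training features, annotated with the coefficients
sns.heatmap(X_train.corr(), annot=True, cmap="coolwarm")
plt.show()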
As we can see, the number in each square is the degree to which two features correlate, so we can reject one of each highly correlated pair. In this instance, the “TAX” column with the “RAD” row reaches 0.91, i.e. a correlation of up to 91%, so we can remove one of them from the data set. Commonly used thresholds range from 70% to 90%. In this situation, we used 70% as the threshold to reject unnecessary attributes.
Our correlation function:
col_corr = set()
corr_matrix = dataset.corr()
It returns a set of the names of the columns to reject, which we then remove to prepare our data set:
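A complete sketch of such a function, following the approach described in [1]; the helper name correlation and the loop details are our reconstruction:

def correlation(dataset, threshold):
    col_corr = set()              # names of the columns to drop
    corr_matrix = dataset.corr()
    for i in range(len(corr_matrix.columns)):
        for j in range(i):
            # If two features correlate above the threshold, mark one of them for removal
            if abs(corr_matrix.iloc[i, j]) > threshold:
                col_corr.add(corr_matrix.columns[i])
    return col_corr

# Reject the highly correlated attributes (70% threshold) from both sets
corr_features = correlation(X_train, 0.7)
X_train = X_train.drop(list(corr_features), axis=1)
X_test = X_test.drop(list(corr_features), axis=1)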
3.3 Solving linear regression’s problem:
Finally, we solve this problem with linear regression:
from math import sqrt
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.linear_model import LinearRegression
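The fitting step itself, as a minimal sketch (the variable name lr matches its use below; the RMSE check is our addition):

# Fit the linear regression model on the reduced training set
lr = LinearRegression()
lr.fit(X_train, y_train)

# Root mean squared error on the training set, as a quick sanity check
print("RMSE train score", sqrt(mean_squared_error(y_train, lr.predict(X_train))))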
y_pred = lr.predict(X_test)

preds_df = pd.DataFrame(dict(observed=y_test, predicted=y_pred))
preds_df.head()
Checking MAE test score:
print("MAE test score", mean_absolute_error(y_test, y_pred))

Output: MAE test score 3.609904060381819
For the Stochastic Gradient Descent algorithm, consider a simple example of predicting a height from a weight. As we can see, we have to fit a straight line, and we have a formula to predict a height:
Predicted Height = Intercept + Slope × Weight (1)
In this instance, we can see 3 clusters of points, and we randomly choose intercept = 0 and slope = 1.
Sum of squared residuals = (Observed Height - Predicted Height)² (2)
Substituting (1) into (2), we have:
Sum of squared residuals = (Observed Height - (Intercept + Slope × Weight))²
We have to calculate the derivative of the sum of squared residuals with respect to the intercept and the slope:
d(Sum of squared residuals)/d(Intercept) = -2 × (Observed Height - (Intercept + Slope × Weight))
d(Sum of squared residuals)/d(Slope) = -2 × Weight × (Observed Height - (Intercept + Slope × Weight))
We can randomly pick 1 sample to calculate the derivatives:
With the sample Weight = 3 and Height = 3.3:
d(Sum of squared residuals)/d(Intercept) = -2 × (3.3 - (0 + 1 × 3)) = -0.6
d(Sum of squared residuals)/d(Slope) = -2 × 3 × (3.3 - (0 + 1 × 3)) = -1.8
We can easily calculate the step size to improve the line:
Step size (intercept) = d(Sum of squared residuals)/d(Intercept) × learning rate
Step size (slope) = d(Sum of squared residuals)/d(Slope) × learning rate
We start with a relatively large learning rate and make it smaller with each step.
In this example, we chose 0.01 for the learning rate:
Step size (intercept) = d(Sum of squared residuals)/d(Intercept) × learning rate = -0.6 × 0.01 = -0.006
Step size (slope) = d(Sum of squared residuals)/d(Slope) × learning rate = -1.8 × 0.01 = -0.018
New intercept = Old intercept - Step size (intercept) = 0 - (-0.006) = 0.006
New slope = Old slope - Step size (slope) = 1 - (-0.018) = 1.018
We now have a new line:
Repeating the update gives a new line after each step:
And we can stop at intercept = 0.85 and slope = 0.68 in this instance:
[Figure: the final fitted line of Height against Weight]
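The whole update loop for this toy example fits in a few lines; a minimal sketch, assuming the single sample Weight = 3, Height = 3.3 picked above, a fixed learning rate of 0.01, and 100 iterations:

weight, height = 3.0, 3.3      # the randomly picked sample from above
intercept, slope = 0.0, 1.0    # initial guesses
learning_rate = 0.01

for _ in range(100):
    predicted = intercept + slope * weight
    # Derivatives of the squared residual with respect to intercept and slope
    d_intercept = -2 * (height - predicted)
    d_slope = -2 * weight * (height - predicted)
    # The first iteration gives step sizes -0.006 and -0.018, as computed above
    intercept -= learning_rate * d_intercept
    slope -= learning_rate * d_slope

print(intercept, slope)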
Trang 27self.learning rate = learning rate
def fit(self, X, y):
rgen = np.random.RandomState(self.random_state)
self.coef_ = rgen.normal(loc=0.@, scale=0.01, size=1 + X.shape[1])
for _ in range(self.n_ iterations):
for xi, expected_value in zip(X, y):
predicted_value = self.predict(xi)
self.coef_[1:] += self.learning_rate * (expected_value - predicted_value) * xi self.coef_[0] += self learning rate * (expected_value - predicted value) * 1 def activation(self, X):
return np.dot(x, self.coef_[1:]) + self.coef [@]
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
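A minimal sketch of how the class above could be run on this data set; standardizing the features, the hyperparameter values, and the 0.5 threshold on the linear output are our assumptions:

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1, test_size=0.3)

# Standardize the features so one learning rate works for all of them
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
X_train, X_test = (X_train - mean) / std, (X_test - mean) / std

model = SGD(learning_rate=0.001, n_iterations=50, random_state=1)
model.fit(X_train, y_train)

# Threshold the linear output at 0.5 to turn it into class predictions (labels are 0/1)
accuracy = ((model.predict(X_test) >= 0.5) == y_test).mean()
print("Accuracy:", accuracy)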
Adam is an algorithm for optimizing stochastic objective functions, based on first-order gradients and adaptive estimates of lower-order moments. It is a very efficient method when only first-order gradients are required, and it has low memory requirements. The method is also suitable for non-stationary objectives and for problems with noisy or sparse gradients.
Pseudocode for the Adam algorithm:
Require: α: Stepsize
Require: β1, β2 ∈ [0, 1): Exponential decay rates for the moment estimates
Require: f(θ): Stochastic objective function with parameters θ
Require: θ0: Initial parameter vector
m0 ← 0 (Initialize 1st moment vector)
v0 ← 0 (Initialize 2nd moment vector)
t ← 0 (Initialize timestep)
while θt not converged do:
    t ← t + 1
    gt ← ∇θ ft(θt-1) (Get gradients w.r.t. the stochastic objective at timestep t)
    mt ← β1 · mt-1 + (1 - β1) · gt (Update biased first moment estimate)
    vt ← β2 · vt-1 + (1 - β2) · gt² (Update biased second raw moment estimate)
    m̂t ← mt / (1 - β1^t) (Compute bias-corrected first moment estimate)
    v̂t ← vt / (1 - β2^t) (Compute bias-corrected second raw moment estimate)
    θt ← θt-1 - α · m̂t / (√v̂t + ε) (Update parameters)
end while
return θt (Resulting parameters)
4.2.2 Show code:
# Test objective function
return math.log(1 + (abs(x)) ** (2 + math.sin(x)))

# Numerical gradient helper
def general_grad(theta, function):

# Adam update inside the optimization loop
vt = beta2 * vt + (1 - beta2) * (np.power(gt, 2))        # biased second raw moment estimate
m_up = np.true_divide(mt, (1 - (beta1 ** (i + 1))))      # bias-corrected first moment
v_up = np.true_divide(vt, (1 - (beta2 ** (i + 1))))      # bias-corrected second moment
theta_new = theta[-1] - np.true_divide(alpha * m_up, (np.sqrt(v_up) + np.ones(theta[-1].shape[0]) * eps))
theta.append(theta_new)
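Since only fragments of the listing appear above, here is a complete minimal sketch of the Adam loop on the same test function. The gradient helper and the hyperparameter defaults (α = 0.001, β1 = 0.9, β2 = 0.999, ε = 1e-8) are assumptions, not the exact original code:

import math

def f(x):
    # Test function from the fragments above
    return math.log(1 + (abs(x)) ** (2 + math.sin(x)))

def numerical_grad(x, func, h=1e-6):
    # Central-difference approximation of the derivative
    return (func(x + h) - func(x - h)) / (2 * h)

def adam(func, x0, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8, n_iterations=1000):
    theta = x0
    mt, vt = 0.0, 0.0
    for t in range(1, n_iterations + 1):
        gt = numerical_grad(theta, func)
        mt = beta1 * mt + (1 - beta1) * gt          # biased first moment estimate
        vt = beta2 * vt + (1 - beta2) * gt ** 2     # biased second raw moment estimate
        m_hat = mt / (1 - beta1 ** t)               # bias corrections
        v_hat = vt / (1 - beta2 ** t)
        theta = theta - alpha * m_hat / (math.sqrt(v_hat) + eps)
    return theta

# Usage: start from x0 = 2.0 and minimize f
print(adam(f, 2.0))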
REFERENCES
[1] https://www.kaggle.com/code/bbloggsbott/feature-selection-correlation-and-p-value/notebook