
USTH

VIETNAM FRANCE UNIVERSITY

University of Science and Technology of Hanoi

Machine Learning & Data Mining II Labwork 3 Report

BI12-389 Nguyen Son

BI12-447 An Minh Tri

Academic Year 2 - Data Science February 2023


Contents

1 K-nearest Neighbor Classification
  1.1 Iris dataset
  1.2 Digits dataset
  1.3 Wine dataset
2 Perceptron classifier
  2.1 Perceptron on Iris
  2.2 Perceptron on Wine
  2.3 Perceptron on Digits
3 References


1 K-nearest Neighbor Classification

1.1.1 Apply k-nn on Iris dataset, compute classification error by comparing predicted and original labels of test data

[Figure: Confusion matrix for the Iris test set]

Here we set k = 3 (3 nearest neighbors) to calculate the confusion matrix, precision, recall, and F1-score for the 3 classes in the Iris dataset.
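For reference, a minimal sketch of this step, assuming scikit-learn and an ordinary hold-out split (the exact split ratio and random seed used in the labwork are not shown in the scan):

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import confusion_matrix, classification_report

    # Load the Iris data and hold out a test set (the 20% split is an assumption).
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    # Fit k-NN with k = 3 and predict the labels of the test data.
    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)

    # Compare predicted and original labels.
    print(confusion_matrix(y_test, y_pred))
    print(classification_report(y_test, y_pred))  # precision, recall, F1-score
    print("Classification error:", np.mean(y_pred != y_test))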


with k = 3:

[Figure: Confusion matrices and classification reports for k = 3 and k = 4]

For example, with k = 4 compared to k = 3, almost every metric changes slightly.


[Figure: Confusion matrix, classification report (accuracy, macro avg, weighted avg), and classification error for the normalized Iris test set]

By normalizing the Iris dataset before applying k-NN, the performance improves considerably, and the accuracy can sometimes reach 100% (k = 3).
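A sketch of the normalized variant, assuming min-max scaling fitted on the training data (the report does not state which normalization was used, so MinMaxScaler is an assumption):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import MinMaxScaler

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    # Scale features to [0, 1] using training statistics only, then apply k-NN with k = 3.
    model = make_pipeline(MinMaxScaler(), KNeighborsClassifier(n_neighbors=3))
    model.fit(X_train, y_train)
    print("Accuracy on the normalized Iris data:", model.score(X_test, y_test))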


with explanation:

We apply PCA for 2 components.

[Table: the 150 Iris samples projected onto the 2 principal components (150 rows × 2 columns)]

[Figure: Confusion matrix for k-NN on the PCA-reduced Iris data]

With PCA, the accuracy drops to about 90%, and the other metrics drop as well, compared to the results without dimensionality reduction.
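A sketch of the PCA step, following the 150 × 2 table above (whether the features were scaled before PCA is not stated, so no scaling is applied here):

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import classification_report

    X, y = load_iris(return_X_y=True)

    # Project the four Iris features onto the first two principal components.
    X_2d = PCA(n_components=2).fit_transform(X)  # shape (150, 2)

    X_train, X_test, y_train, y_test = train_test_split(
        X_2d, y, test_size=0.2, random_state=42)

    knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
    print(classification_report(y_test, knn.predict(X_test)))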


• In order to confidently say that we can achieve consistently high accuracy on future unseen data, testing the model on unseen data is essential.

• In cross-validation, instead of splitting the data into two parts, we split it into three: training data, cross-validation data, and test data.

• To obtain cross-validation data from within the training data, we randomly split the training data into k equal parts; 1/k of them is held out for cross-validation and the rest is used for training. This step is repeated k times so that every part is used for validation once.

• With cross-validation, we can also find the optimal number of neighbors (k) for the best accuracy, as sketched below.
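A sketch of that search, assuming 5-fold cross-validation over a small range of k (the fold count and range are assumptions; the report only shows the resulting curve):

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)

    # Mean cross-validated accuracy for each candidate number of neighbors.
    k_values = list(range(1, 26))
    scores = [cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
              for k in k_values]

    best_k = k_values[int(np.argmax(scores))]
    print("Best k:", best_k, "with mean accuracy:", max(scores))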

[Figure: Cross-validated accuracy vs. number of neighbors for the Iris dataset]

Implementing the leave-one-out method, we get a classification score of 0.9833333333333333.
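A sketch of the leave-one-out evaluation (k = 3 is assumed here, matching the earlier experiments):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import LeaveOneOut, cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)

    # Each of the 150 samples is held out once as a single-sample test set.
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=LeaveOneOut())
    print("Leave-one-out classification score:", scores.mean())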


1.2.1 Apply k-nn on Digits dataset, compute classification error by comparing predicted and original labels of test data

[Figure: Confusion matrix and classification report (accuracy) for the Digits test set, k = 3]

Here we set k = 3 (3 nearest neighbors) to calculate the confusion matrix, precision, recall, and F1-score for the 10 classes in the Digits dataset.
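A sketch of the same procedure on Digits (the split ratio and seed are assumptions, as before):

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import confusion_matrix

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
    y_pred = knn.predict(X_test)

    # Classification error: fraction of test digits whose predicted label
    # differs from the original label.
    print(confusion_matrix(y_test, y_pred))
    print("Classification error:", np.mean(y_pred != y_test))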


with k = 5:

[Figure: Confusion matrix and classification report (accuracy, macro avg, weighted avg) for the Digits test set, k = 5]

As expected, the results vary as k changes.


[Figure: Confusion matrix and classification report (accuracy, macro avg) for the Digits test set after normalization]

By normalizing the dataset before applying k-NN, the performance yields much better results.


with explanation:

Apply PCA for 2 components:

[Figure: Confusion matrix and classification report for k-NN on the PCA-reduced Digits data]

The results drop drastically, to only about 51% accuracy.
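One way to see why two components are so lossy for Digits, which has 64 pixel features, is to check how much variance they retain; a sketch, assuming plain PCA on the raw pixel values:

    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA

    X, y = load_digits(return_X_y=True)  # 64 pixel features per image

    pca = PCA(n_components=2).fit(X)
    # Fraction of the total variance captured by each of the first two components.
    print(pca.explained_variance_ratio_)
    print("Total variance retained:", pca.explained_variance_ratio_.sum())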


[Figure: Cross-validated accuracy vs. number of neighbors for the Digits dataset]

The optimal k is equal to 1.

1.3.1 Apply k-nn on Wine dataset, compute classification error by comparing predicted and original labels of test data

[Figure: Confusion matrix and classification error for the Wine test set, k = 3]

Here we set k = 3 (3 nearest neighbors) to calculate the confusion matrix, precision, recall, and F1-score for the 3 classes in the Wine dataset.

[Figure: Confusion matrix for the Wine test set with a different value of k]

The results vary as k changes.

[Figure: Confusion matrix for the Wine test set after normalization]

By normalizing the dataset before applying k-NN, the performance yields a small improvement.

with explanation:

Apply PCA for 2 components:

[Figure: Confusion matrix and classification report (accuracy, macro avg, weighted avg) for k-NN on the PCA-reduced Wine data]

For the Wine dataset, after applying PCA, the k-NN results improve by a large margin, from about 75% to 94% accuracy.
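A sketch of that comparison (the exact accuracies depend on the split and on whether the features were scaled before PCA, neither of which is shown in the scan):

    from sklearn.datasets import load_wine
    from sklearn.decomposition import PCA
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_wine(return_X_y=True)

    def knn_accuracy(features, labels, k=3):
        # Hold-out accuracy of k-NN (the 80/20 split is an assumption).
        X_tr, X_te, y_tr, y_te = train_test_split(
            features, labels, test_size=0.2, random_state=42)
        return KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr).score(X_te, y_te)

    print("Accuracy on the raw Wine features:", knn_accuracy(X, y))
    print("Accuracy after PCA to 2 components:",
          knn_accuracy(PCA(n_components=2).fit_transform(X), y))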


[Figure: Cross-validated accuracy vs. number of neighbors for the Wine dataset]

The optimal k is equal to 12.

2 Perceptron classifier

[Figure: "Perceptron on Iris", scatter plot of the Iris data in the reduced 2D space (x-axis: PC1)]

The weight vector w is initialized to a NumPy array of zeros in the perceptron function as follows:

    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0

The learning rate α is set to 0.01 by default in the function signature:
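The full function is not legible in the scan; a minimal sketch consistent with the description above (zero-initialized weights, learning rate 0.01, a fixed number of epochs) could look like the following, for a binary problem with labels in {-1, +1}. The signature and variable names are assumptions, not the authors' exact code:

    import numpy as np

    def perceptron(X, y, learning_rate=0.01, max_epochs=10):
        """Train a perceptron; y must contain labels -1 and +1."""
        n_samples, n_features = X.shape
        w = np.zeros(n_features)
        b = 0.0
        for _ in range(max_epochs):
            for xi, yi in zip(X, y):
                # Update the weights only when the current sample is misclassified.
                if yi * (np.dot(w, xi) + b) <= 0:
                    w += learning_rate * yi * xi
                    b += learning_rate * yi
        return w, b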


2.1.2 Plot linear classifiers for Iris dataset (using PCA or SVD to reduce the number of dimensions to 2D):

[Figure: "Perceptron Classifier for Iris", the learned decision boundary in the reduced 2D space]

Is the Perceptron converging, and what can we do to make it converge faster?

The convergence rate of the Perceptron on the Iris dataset depends on the data distribution and the learning rate (α) used. The Perceptron algorithm is guaranteed to converge to a solution if the data is linearly separable, but the number of iterations required to converge may vary depending on the data distribution.

In the case of the Iris dataset, we set the maximum number of epochs to 10, which means that the algorithm will iterate over the entire dataset 10 times. If the data is not well separated, the algorithm may not converge within 10 epochs.

To make the algorithm converge faster, we can do the following:

• Scale the input features: Rescaling the input features to have zero mean and unit variance can improve the convergence rate of the algorithm.

• Change the learning rate: The learning rate determines the step size of the weight update. A larger learning rate can result in faster convergence, but it may also cause the algorithm to overshoot the optimal weights and diverge. On the other hand, a smaller learning rate may converge more slowly but more stably. Therefore, it is important to choose an appropriate learning rate based on the problem and the data.


• Use a more advanced algorithm: There are many more advanced algorithms for linear classification, such as Support Vector Machines (SVMs) and Logistic Regression, which can converge faster and perform better than the Perceptron; a small sketch follows below.
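As an illustration of that last point, a small sketch using scikit-learn's built-in linear classifiers on Iris (this is not part of the original labwork code):

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.svm import LinearSVC

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    # Two of the alternatives mentioned above, with their default solvers.
    for model in (LogisticRegression(max_iter=1000), LinearSVC()):
        model.fit(X_train, y_train)
        print(type(model).__name__, "accuracy:", model.score(X_test, y_test))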

[Figures: "Perceptron on wine" and "Perceptron Classifier for wine", the Wine data and the learned decision boundary in the reduced 2D space]

[Figures: "Perceptron on digits" and "Perceptron Classifier for digits", the Digits data and the learned decision boundary in the reduced 2D space]


3 References

Scikit-learn (n.d.). datasets.load_iris. Retrieved from https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html

Scikit-learn (n.d.). datasets.load_wine. Retrieved from https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_wine.html

Scikit-learn (n.d.). datasets.load_digits. Retrieved from https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html

The link to all the source code for this labwork:
https://drive.google.com/drive/folders/1oWdt974E_5SvHOvK69nBgAd18FxMDpG2?usp=sharing
