
MINISTRY OF EDUCATION AND TRAINING

DUY TAN UNIVERSITY

FACULTY OF INTERNATIONAL TRAINING

ARTIFICIAL INTELLIGENCE (FOR BUSINESS)

INSTRUCTOR: Dr. Soon Goo Hong

CLASS: IS-CS 468 AIS

Term Group Project

“IRIS FLOWER CLASSIFICATION”

Team 4

 Phan Van Minh Manh - 27211445925

 Tran Thi Thu Hong - 27201401792

 Doan Thien Nhan - 27211201936


Da Nang, 12 December, 2023


1 Overview

2 K-Nearest Neighbors (KNN)

3 Decision Tree

4 Logistic Regression

5 Prediction for the given data


1 Overview

The Iris Flower Classification problem requires identifying three iris flower species based on four features: sepal length, sepal width, petal length, and petal width. The problem matters because it has several practical uses, including plant breeding, horticulture, and environmental monitoring.

This report presents the results of our analysis of the Iris Flower dataset using three classification algorithms: K-Nearest Neighbors (KNN), Decision Tree, and Logistic Regression. We compare the performance of these algorithms using accuracy, recall, precision, and F1 score. We also predict the species of a new instance from its supplied features and suggest directions for future improvement.

2 K-Nearest Neighbors (KNN)

BRIGHTICS PROCESS

DATA LOAD

- Load the data from “sample_iris.csv”.

- We upload the sample dataset provided by Samsung Brightics AI.

- Click the ‘upload’ button, search for ‘sample_iris.csv’, and select it.

- Click the ‘Run’ button.
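Outside Brightics, the same load step can be sketched in plain Python. This sketch assumes that `sample_iris.csv` has a header row with the columns `sepal_length`, `sepal_width`, `petal_length`, `petal_width`, and `species`. Those column names are our assumption, not taken from the Brightics file, so adjust them to the actual header.

```python
import csv
import io

# Assumed column names for sample_iris.csv (adjust to the real header).
FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
LABEL = "species"


def load_iris(lines):
    """Parse CSV lines into a feature matrix X and a list of labels y."""
    reader = csv.DictReader(lines)
    X, y = [], []
    for row in reader:
        X.append([float(row[name]) for name in FEATURES])
        y.append(row[LABEL])
    return X, y


# With a real file:
#   with open("sample_iris.csv", newline="") as f:
#       X, y = load_iris(f)
sample = io.StringIO(
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,setosa\n"
    "7.0,3.2,4.7,1.4,versicolor\n"
)
X, y = load_iris(sample)
```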


Split Data

- Group by: species.

- In the output panel, you can see two distinct datasets: "Split data (train_table)" and "Split data (test_table)".
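Grouping by species makes the split stratified: each species is divided separately, so the train and test tables keep the same class balance. The report does not state the split ratio. The Decision Tree results (10 test records per species) are consistent with holding out 20% of each species, so this sketch assumes `test_ratio=0.2` and a fixed seed. Both values are assumptions, and Brightics' own splitting logic may differ.

```python
import random
from collections import defaultdict


def stratified_split(X, y, test_ratio=0.2, seed=0):
    """Split (X, y) class by class so each class keeps the same proportion."""
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_ratio)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    def take(idx):
        return [X[i] for i in idx], [y[i] for i in idx]

    return take(train_idx), take(test_idx)


# Toy data: 50 samples per species, as in the classic 150-sample Iris dataset.
y_demo = ["setosa"] * 50 + ["versicolor"] * 50 + ["virginica"] * 50
X_demo = [[float(i)] for i in range(150)]
(X_train, y_train), (X_test, y_test) = stratified_split(X_demo, y_demo)
```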


KNN Classification

Parameter:

- Inputs: click ‘Empty’ in ‘test_table’.

- Set Inputs: drag ‘test_table’ from Split Data and drop it onto ‘DropData’ in ‘test_table’.

- Feature Columns: select all.

- Label Column: species.

- Inputs: default Label Column: ‘species’.

- Prediction Column: ‘prediction’.
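The KNN Classification step can be sketched in plain Python. For each test row, measure the Euclidean distance to every training row, keep the `k` closest, and predict the majority species among them. This is a minimal illustration that assumes Euclidean distance and unweighted voting; it is not Brightics' implementation. The tie-breaking rule (the first label met among the nearest neighbors wins) is an arbitrary choice, and the two-feature training data is made up.

```python
import math
from collections import Counter


def knn_predict(X_train, y_train, x, k=7):
    """Predict the label of x by majority vote among its k nearest neighbors."""
    # Sort training indices by Euclidean distance to x.
    order = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))
    neighbors = [y_train[i] for i in order[:k]]
    # most_common keeps first-encountered order for ties, so the closer label wins.
    return Counter(neighbors).most_common(1)[0][0]


# Toy two-feature rows (petal length, petal width).
X_train = [[1.4, 0.2], [1.3, 0.2], [4.7, 1.4], [4.5, 1.5], [6.0, 2.5], [5.1, 1.9]]
y_train = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
print(knn_predict(X_train, y_train, [1.5, 0.3], k=3))  # → setosa
```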


Comment: The predictions were perfectly accurate (Accuracy: 1.0).

With the KNN model, we classified all three flower species (setosa, versicolor, virginica) with 100% accuracy.


Accuracy is the proportion of all predictions that are correct. A high accuracy score means that the model makes correct predictions most of the time.

Precision is the proportion of the model's positive predictions that are actually positive, so its denominator is TP + FP: Precision = TP / (TP + FP). A higher precision means that fewer of the model's positive predictions are false positives.
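The four metrics used in this report can be computed per species from the true and predicted labels. The report defines only accuracy and precision, so for completeness this sketch also uses the standard definitions Recall = TP / (TP + FN) and F1 = 2 · Precision · Recall / (Precision + Recall). We assume the per-species scores treat each species in turn as the positive class (one-vs-rest). The labels below are a toy example.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_scores(y_true, y_pred, positive):
    """Precision, recall and F1 for one class treated as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy example: 10 samples per species, one Virginica mislabeled as Versicolor.
y_true = ["setosa"] * 10 + ["versicolor"] * 10 + ["virginica"] * 10
y_pred = ["setosa"] * 10 + ["versicolor"] * 10 + ["versicolor"] + ["virginica"] * 9
print(accuracy(y_true, y_pred))
for species in ("setosa", "versicolor", "virginica"):
    print(species, per_class_scores(y_true, y_pred, species))
```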

With k = 1, 3, or 5, the model achieved an accuracy of 0.967; with k = 7, 9, or 11, it achieved an accuracy of 1.0.

As we can see, when K is 1, 3, or 5, the accuracy is lower than when K is greater than 5. A very small K makes the model sensitive to noise in individual training points (overfitting), while an overly large K smooths the decision boundary too much and risks underfitting. K = 7 is the smallest value at which all four evaluation metrics reach their highest values. It therefore avoids the noise sensitivity of small K without moving further toward underfitting, and the model still predicts the test set correctly.

Therefore, the best k for this dataset is 7.
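The search over k can be sketched as a loop that scores each candidate on the test table and keeps the smallest k with the best accuracy, which is the rule applied above. The candidates mirror the report (odd values from 1 to 11). The compact majority-vote KNN and the 1-D data are illustrative assumptions: one noisy point sits inside the other cluster, so k = 1 is misled and a larger k is not. On a small test table a one-sample difference moves accuracy noticeably, so a cross-validated search would give a more reliable choice.

```python
import math
from collections import Counter


def knn_predict(X_train, y_train, x, k):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    order = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))
    return Counter(y_train[i] for i in order[:k]).most_common(1)[0][0]


def best_k(X_train, y_train, X_test, y_test, candidates=(1, 3, 5, 7, 9, 11)):
    """Return (k, accuracy) for the smallest k with the highest test accuracy."""
    best = None
    for k in candidates:
        preds = [knn_predict(X_train, y_train, x, k) for x in X_test]
        acc = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
        # Strict '>' keeps the smallest k when several k values tie.
        if best is None or acc > best[1]:
            best = (k, acc)
    return best


# Toy 1-D data: one noisy "B" point (2.6) sits inside the "A" cluster.
X_train = [[0.0], [1.0], [2.0], [3.0], [2.6], [10.0], [11.0], [12.0], [13.0]]
y_train = ["A", "A", "A", "A", "B", "B", "B", "B", "B"]
X_test, y_test = [[2.5], [11.5]], ["A", "B"]
print(best_k(X_train, y_train, X_test, y_test))  # → (3, 1.0)
```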


Based on the classification evaluation results with k = 7, the model performed very well on the sample_iris dataset. Here is our analysis of the metrics:

- Accuracy: The model's accuracy is 1.0, meaning it correctly predicted the species of every flower in the test set.

- Setosa: The model classified this species perfectly, with F1, Precision, and Recall all equal to 1.0. This shows that the model recognized all Setosa samples without any errors.

- Virginica: The model classified this species perfectly, with F1, Precision, and Recall all equal to 1.0. This shows that the model recognized all Virginica samples without any errors.

- Versicolor: The model classified this species perfectly, with F1, Precision, and Recall all equal to 1.0. This shows that the model recognized all Versicolor samples without any errors.

3 Decision Tree

BRIGHTICS PROCESS

DATA LOAD

- Load the data from “sample_iris.csv”.

- We upload the sample data provided by Samsung Brightics AI.


We ran the Decision Tree classifier with max depth = 3, 5, and 7, using species as the outcome variable, and compared the resulting accuracies. With max depth = 3, the model achieved an accuracy of 0.967; with max depth = 5 or 7, it achieved the same lower accuracy of 0.93. Therefore, the best max depth is 3.

Decision Tree Classification Train


- Splitter: Best.

- Max Depth: 3, 5, 7 (set each value in turn).
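With Splitter set to Best, the tree evaluates candidate thresholds on every feature and keeps the split that most reduces impurity. Max Depth then caps how many such splits can be nested. The report does not name the impurity criterion, so this sketch assumes Gini impurity and shows the threshold search on a single feature. The petal lengths are toy values, not rows from our test table.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted Gini) of the best 'value <= threshold' split."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            # Midpoint between neighboring values, a common CART convention.
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, score)
    return best


# Toy petal lengths: setosa is well separated from the other two species.
petal_length = [1.4, 1.3, 1.5, 4.7, 4.5, 5.1, 6.0]
species = ["setosa"] * 3 + ["versicolor"] * 2 + ["virginica"] * 2
print(best_threshold(petal_length, species))
```

Here the best split isolates the three setosa rows at a threshold of 3.0, leaving a right node whose Gini impurity is 0.5.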

Decision Tree Classification Predict

Parameter:

- Inputs: click ‘Empty’ in ‘test_table’.

- Set Inputs: drag ‘test_table’ from Split Data and drop it onto ‘DropData’ in ‘test_table’.


Parameter:

- Label Column: ‘species’.

- Prediction Column: ‘prediction’.


Comment: With max depth = 3, the predictions were highly accurate (Accuracy: 0.967). The Decision Tree model classified the three flower species as follows: setosa (10/10), versicolor (10/10), and virginica (9/10). Setosa and versicolor were classified perfectly, while only 9 of the 10 virginica records (0.9) were classified correctly.

4 Logistic Regression

BRIGHTICS PROCESS

DATA LOAD

- Load the data from “sample_iris.csv”.

- We upload the sample dataset provided by Samsung Brightics AI.


Parameter:

- Columns: sepal length, sepal width, petal length, petal width.

- Target Statistic: Max, Min, Average, Standard deviation.

- Group by: species.
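The per-species summary (max, min, average, standard deviation) can be sketched with Python's `statistics` module. This is an illustration only. The report does not say whether Brightics reports the sample or the population standard deviation; this sketch assumes the sample version (`statistics.stdev`), and the values are toy data.

```python
import statistics
from collections import defaultdict


def summarize_by_group(values, groups):
    """Max, min, mean and sample standard deviation of values for each group."""
    buckets = defaultdict(list)
    for v, g in zip(values, groups):
        buckets[g].append(v)
    return {
        g: {
            "max": max(vs),
            "min": min(vs),
            "average": statistics.mean(vs),
            "std": statistics.stdev(vs),  # needs at least two values per group
        }
        for g, vs in buckets.items()
    }


petal_length = [1.4, 1.3, 1.5, 4.7, 4.5, 4.9]
species = ["setosa"] * 3 + ["versicolor"] * 3
summary = summarize_by_group(petal_length, species)
print(summary["setosa"])
```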


String Summary

Examine the frequencies and proportions of the species values, using species (the categorical variable) to group the results.

Parameter:

- Input Columns: sepal length, sepal width, petal length, petal width.

- Group by: species.
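At its core, this summary counts how often each species occurs and what share of the rows it represents. A sketch with `collections.Counter` follows. The 50/50/50 counts in the usage line reflect the classic 150-sample Iris dataset; the actual counts in sample_iris.csv should be read from the Brightics output.

```python
from collections import Counter


def frequency_table(labels):
    """Map each label to (count, proportion of all labels)."""
    counts = Counter(labels)
    total = len(labels)
    return {label: (n, n / total) for label, n in counts.items()}


species = ["setosa"] * 50 + ["versicolor"] * 50 + ["virginica"] * 50
table = frequency_table(species)
for label, (count, share) in table.items():
    print(f"{label}: {count} ({share:.1%})")
```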


Logistic Regression Train

Select the dependent variable (spec_cd, the numeric species code) and the explanatory variables (sepal length, sepal width, petal length, petal width), then run the analysis.

Parameter:

- Inputs: Split Data - train_table.

- Feature Columns: sepal length, sepal width, petal length, petal width.

- Label Column: spec_cd.


Logistic Regression Predict

Perform predictions by applying the regression equation generated by the Logistic Regression Train step.

- Inputs: Logistic Regression Predict.

- Prediction Column: prediction.
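Applying the fitted equation means computing one linear score per species code and converting the scores into probabilities. In a multinomial logistic regression, that conversion is the softmax function. The report does not say whether Brightics fits a multinomial or a one-vs-rest model. The weights below are hypothetical placeholders, not the coefficients of the model we trained. Under the report's coding, code 1 is Setosa.

```python
import math


def softmax(scores):
    """Convert linear scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, biases, x):
    """Return (most probable class code, class probabilities) for one row x."""
    scores = [sum(w * xi for w, xi in zip(w_k, x)) + b_k
              for w_k, b_k in zip(weights, biases)]
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__), probs


# Hypothetical weights for codes 0, 1, 2 over (petal length, petal width) only.
weights = [[0.5, 0.5], [-2.0, -2.0], [2.0, 2.0]]
biases = [0.0, 6.0, -12.0]
code, probs = predict(weights, biases, [1.4, 0.2])
print(code, [round(p, 4) for p in probs])
```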

Based on the classification evaluation results, the logistic regression model performed very well on the sample_iris dataset. Here are some detailed comments:

- Accuracy: The model's accuracy is 0.967, which means it correctly predicts the flower species in about 96.7% of cases.

- Species 1 (Setosa): The model classified this species correctly, with F1, Precision, and Recall all equal to 1.0. This shows that the model identified all Setosa samples correctly.


- Species 2 (Virginica): The model also performed well for this species, with an F1 of 0.95, Precision of 1.0, and Recall of 0.9. Every sample predicted as Virginica was indeed Virginica, but the model missed some Virginica samples (a recall of 0.9 means one in ten went undetected).

- Species 0 (Versicolor): The model has an F1 of 0.95, Precision of 0.91, and Recall of 1.0. The model found every Versicolor sample, but it also labeled some samples of another species as Versicolor. Because Setosa was classified without error, these must be the Virginica samples the model missed.
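The F1 values of the two imperfect classes follow directly from their reported precision and recall through the harmonic mean F1 = 2PR / (P + R):

```latex
\begin{aligned}
F_1^{\text{Virginica}}  &= \frac{2 \cdot 1.0 \cdot 0.9}{1.0 + 0.9} = \frac{1.8}{1.9} \approx 0.947 \approx 0.95,\\
F_1^{\text{Versicolor}} &= \frac{2 \cdot 0.91 \cdot 1.0}{0.91 + 1.0} = \frac{1.82}{1.91} \approx 0.953 \approx 0.95.
\end{aligned}
```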

- Probability Column: probability_1.

- Positive Label: 1.


In the ROC curve chart, check the threshold (0.69) and the AUC (Area Under the Curve) value of 1.00.

- Positive Label: 0.


In the ROC curve chart, check the threshold (0.55) and the AUC (Area Under the Curve) value of 1.00.

- Positive Label: 2.


In the ROC curve chart, check the threshold (0.46) and the AUC (Area Under the Curve) value of 1.00.
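Each ROC chart treats one species code as positive and ranks the samples by that code's predicted probability (for example, `probability_1` for code 1). The AUC equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. This sketch computes the AUC directly from that definition, which is fine for a small test table but quadratic in general. The scores are made up for illustration.

```python
def roc_auc(y_true, scores, positive):
    """AUC via pairwise comparison of positive vs negative scores (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up probability_1 scores; code 1 is the positive class.
y_true = [1, 1, 0, 2, 0, 2]
prob_1 = [0.95, 0.90, 0.10, 0.05, 0.20, 0.02]
print(roc_auc(y_true, prob_1, positive=1))  # → 1.0
```

An AUC of 1.00, as in each chart above, means every positive sample received a higher probability than every negative sample.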

Comment: Using sepal length, sepal width, petal length, and petal width, we obtain good accuracy (96.7%) in iris flower classification.


5 Prediction for the given data

The given instance has four features: Sepal_length, Sepal_width, Petal_length, and Petal_width. We enter it in Brightics with Create Table.

K-Nearest Neighbors

Comment: We apply the previously trained KNN model to the given data to predict its species. The model predicts Iris Setosa, with a probability of 85.71% for Iris Setosa and 14.29% for Iris Versicolor.
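With k = 7, KNN class probabilities are vote fractions among the seven nearest neighbors. This matches the reported figures: 6/7 ≈ 85.71% and 1/7 ≈ 14.29%. That reading assumes Brightics uses unweighted votes. The neighbor labels below illustrate that split; they are not output from our run.

```python
from collections import Counter


def vote_probabilities(neighbor_labels):
    """Class probabilities as vote fractions among the k nearest neighbors."""
    k = len(neighbor_labels)
    return {label: n / k for label, n in Counter(neighbor_labels).items()}


# Illustrative labels of the 7 nearest neighbors: 6 Setosa, 1 Versicolor.
neighbors = ["setosa"] * 6 + ["versicolor"]
probs = vote_probabilities(neighbors)
for label, p in probs.items():
    print(f"{label}: {p:.2%}")
# setosa: 85.71%
# versicolor: 14.29%
```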

Decision Tree


Comment: We apply the previously trained Decision Tree model to the given data to predict its species. The model predicts Iris Versicolor, with a probability of 0% for Iris Setosa, 97.44% for Iris Versicolor, and 2.56% for Iris Virginica.

Logistic Regression

Comment: We apply the previously trained Logistic Regression model to the given data to predict its species. The model predicts Iris Setosa, with a probability of 89.69% for Iris Setosa, 10.02% for Iris Versicolor, and 0.29% for Iris Virginica.

Executive Summary

Our team analyzed the data and classified it into three groups: Setosa, Versicolor, and Virginica. We used three machine learning techniques for classification: K-Nearest Neighbors (KNN), Decision Tree, and Logistic Regression. They achieved accuracies of 100%, 96.7%, and 96.7%, respectively.

Methodology

To conduct the analysis, our team performed the following steps:

1 Data Collection: We obtained the data from the sample datasets provided by Samsung Brightics AI.


2 Cleaning the Data: We cleaned the data by removing duplicates, missing values, and outliers.

3 Data Preprocessing: We preprocessed the data by dividing it into training and testing sets.

4 Model Training: On the training set, we trained three machine learning models: K-Nearest Neighbors (KNN) with k = 7, Decision Tree with max depth = 3, and Logistic Regression.

5 Model Evaluation: We evaluated the models on the testing set and obtained accuracies of 100%, 96.7%, and 96.7%, respectively.

6 Classification: Using the trained models, we classified the data into three groups: Setosa, Versicolor, and Virginica.
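Two rules from Step 2 (Cleaning the Data) are easy to state precisely, and this plain-Python sketch implements them: drop rows with missing values, then drop exact duplicate rows. The report does not specify its outlier rule, so outlier removal is left out rather than guessed. The row layout (a tuple of four features plus a species label) is an assumption.

```python
def clean_rows(rows):
    """Drop rows containing missing values, then drop exact duplicates.

    rows: list of tuples; None or an empty string counts as missing.
    The first occurrence of each duplicate is kept, preserving order.
    """
    seen = set()
    cleaned = []
    for row in rows:
        if any(v is None or v == "" for v in row):
            continue  # skip rows with a missing value
        if row in seen:
            continue  # skip exact duplicates
        seen.add(row)
        cleaned.append(row)
    return cleaned


rows = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (5.1, 3.5, 1.4, 0.2, "setosa"),    # duplicate
    (4.9, None, 1.4, 0.2, "setosa"),   # missing sepal width
    (7.0, 3.2, 4.7, 1.4, "versicolor"),
]
print(clean_rows(rows))
```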

Result: accuracy on the testing set was 100% for K-Nearest Neighbors, 96.7% for Decision Tree, and 96.7% for Logistic Regression.

On the given data


K-Nearest Neighbors: The model predicts that this is Iris Setosa.

Decision Tree: The model predicts that this is Iris Versicolor.

Logistic Regression: The model predicts that this is Iris Setosa.


Title	Iris Flower Classification
Authors	Phan Van Minh Manh, Tran Thi Thu Hong, Doan Thien Nhan
Supervisor	Dr. Soon Goo Hong
School	Duy Tan University
Major	Artificial Intelligence (For Business)
Type	Group Project
Year of publication	2023
City	Da Nang

Format
Pages	26
File size	730.07 KB