MINISTRY OF EDUCATION AND TRAINING
DUY TAN UNIVERSITY
FACULTY OF INTERNATIONAL TRAINING
ARTIFICIAL INTELLIGENCE (FOR BUSINESS)
INSTRUCTOR: Dr. Soon Goo Hong
CLASS: IS-CS 468 AIS
Term Group Project
“IRIS FLOWER CLASSIFICATION”
Team 4
Phan Van Minh Manh - 27211445925
Tran Thi Thu Hong - 27201401792
Doan Thien Nhan - 27211201936
Da Nang, 12 December 2023
1 Overview
2 K-Nearest Neighbors (KNN)
3 Decision Tree
4 Logistic Regression
5 Prediction for the given data
1 Overview
The Iris Flower Classification problem requires identifying three iris flower species based on four features: sepal length, sepal width, petal length, and petal width. The problem is important because it has several practical uses, including plant breeding, horticulture, and environmental monitoring.
This report presents the results of our study of the Iris Flower dataset using three distinct classification algorithms: K-Nearest Neighbors (KNN), Decision Tree, and Logistic Regression. We compare the performance of these algorithms using accuracy, recall, precision, and F1 score. We also predict the species of a new instance from the supplied features and make some suggestions for future changes.
2 K-Nearest Neighbors (KNN)
BRIGHTICS PROCESS
DATA LOAD
- Load the data for KNN from “sample_iris.csv”.
- We upload the sample data provided by Samsung Brightics AI.
- Click the ‘upload’ button, search for ‘sample_iris.csv’, and select it.
- Click the ‘Run’ button.
Group by: species.
In the output panel, you can see two distinct data sets: "Split data (train_table)" and "Split data (test_table)".
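The split step can be sketched outside Brightics as a stratified split, in which each species keeps the same share in both tables. This is a minimal sketch, not the Brightics implementation: the function name `stratified_split`, the 80/20 ratio (inferred from the 10 test records per species reported in the Decision Tree results), and the placeholder rows with 50 records per species are all assumptions.

```python
import random
from collections import defaultdict

def stratified_split(rows, label_key, test_fraction=0.2, seed=0):
    """Split rows into (train, test) so each label keeps the same share in both."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]      # copy so the input is not reordered
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test

# Placeholder rows (assumption): 50 records per species, features omitted.
rows = [{"id": i, "species": s}
        for s in ("setosa", "versicolor", "virginica")
        for i in range(50)]

train_table, test_table = stratified_split(rows, "species", test_fraction=0.2)
print(len(train_table), len(test_table))
```

Grouping by species before shuffling is what keeps the test table balanced across the three classes.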
KNN Classification
Parameter:
Inputs: click ‘Empty’ in the ‘test_table’
Set Inputs: drag ‘test_table’ in the Split Data and drop it to the ‘DropData’ in the ‘test_table’
Feature Columns: select all
Label Columns: species
Inputs: default
Label Column: ‘species’
Prediction Column: ‘prediction’
Comment: The predictions were perfectly accurate (Accuracy: 1.0). With the KNN model, we successfully classified the 3 flower species (setosa, versicolor, virginica) with 100% accuracy.
Accuracy is the proportion of all predictions that are correct. A high accuracy score means that the model makes correct predictions most of the time.
Precision is the proportion of positive predictions made by the model that are actually true positives, so the denominator is TP + FP: Precision = TP / (TP + FP). A higher precision means that a larger share of the samples predicted as positive are actually positive.
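As a small illustration of these definitions, the sketch below computes accuracy and precision from true/false positive and negative counts, together with recall and F1 using their standard definitions. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the Brightics output.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 for one positive class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts for one class treated as "positive" (30 samples in total).
m = classification_metrics(tp=8, fp=2, fn=1, tn=19)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```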
With k = 1, 3, or 5 the model achieved an accuracy of 0.967. With k = 7, 9, or 11 the model achieved an accuracy of 1.0.
As we can see, when K is 1, 3, or 5 the accuracy is lower than when K is greater than 5. A very small K makes the model sensitive to individual noisy points (overfitting), whereas an unnecessarily large K over-smooths the decision boundary. K = 7 is the smallest value tested at which the four evaluation metrics reach their highest values, so it gives correct predictions most of the time without using more neighbors than needed.
Therefore, the best k for this dataset is 7.
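The comparison of K values can be sketched in plain Python as a majority vote over the K nearest training points, evaluated for several K. This is an illustrative sketch only: the helper names and the tiny two-feature data set are made up for the example and are not the sample_iris.csv records or the Brightics implementation.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Return the majority label among the k training points closest to query."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def accuracy(train, test, k):
    """Share of test points whose predicted label matches the true label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)

# Illustrative (petal length, petal width) pairs; NOT the sample_iris.csv data.
train = [
    ((1.4, 0.2), "setosa"), ((1.5, 0.2), "setosa"), ((1.3, 0.3), "setosa"),
    ((4.5, 1.5), "versicolor"), ((4.7, 1.4), "versicolor"), ((4.0, 1.3), "versicolor"),
    ((5.8, 2.2), "virginica"), ((6.0, 2.5), "virginica"), ((5.5, 2.1), "virginica"),
]
test = [((1.6, 0.2), "setosa"), ((4.4, 1.4), "versicolor"), ((5.9, 2.3), "virginica")]

for k in (1, 3, 5):
    print(f"k={k}: accuracy={accuracy(train, test, k):.3f}")
```

On real data the same loop would be run for each candidate K, and the value with the best test metrics would be kept.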
Based on the classification evaluation results with k = 7, the model performed very well on the sample_iris dataset. Here is the analysis of the metrics:
Accuracy: The model's accuracy is 1.0, meaning the model correctly predicts the flower species in 100% of test cases.
Setosa: The model classified this species perfectly, with F1, Precision, and Recall all 1.0. This shows that the model recognized all Setosa samples without any errors.
Virginica: The model classified this species perfectly, with F1, Precision, and Recall all 1.0. This shows that the model recognized all Virginica samples without any errors.
Versicolor: The model classified this species perfectly, with F1, Precision, and Recall all 1.0. This shows that the model recognized all Versicolor samples without any errors.
3 Decision Tree
BRIGHTICS PROCESS
DATA LOAD
- Load the data for the Decision Tree from “sample_iris.csv”.
- We upload the sample data provided by Samsung Brightics AI.
After running the Decision Tree classifier with max depth = 3, 5, and 7, using species as the outcome variable, we compared the results. With max depth = 3, the model achieved an accuracy of 0.967. With max depth = 5 or 7, the model achieved the same accuracy of 0.93. Therefore, the best max depth is 3.
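The selection rule can be written down explicitly. The sketch below uses the three accuracies reported above; the tie-breaking rule (prefer the shallower tree when accuracies are equal) is our assumption and is not something Brightics applies automatically.

```python
# Test-set accuracies reported above, keyed by max depth.
accuracy_by_depth = {3: 0.967, 5: 0.93, 7: 0.93}

# Highest accuracy wins; on a tie, prefer the shallower (simpler) tree.
best_depth = max(accuracy_by_depth, key=lambda d: (accuracy_by_depth[d], -d))
print("best max depth:", best_depth)
```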
Decision Tree Classification Train
Splitter: Best.
Max Depth: 3, 5, 7 (replace the values one by one)
Decision Tree Classification Predict
- Parameter
Inputs: click ‘Empty’ in the ‘test_table’
Set Inputs: drag ‘test_table’ in the Split Data and drop it to the ‘DropData’ in the ‘test_table’
Parameter:
Label Column: ‘species’
Prediction Column: ‘prediction’
Comment: With max depth = 3, the accuracy of the predictions was high (Accuracy: 0.967). With the Decision Tree model, we classified the 3 flower species as follows: setosa (10/10), versicolor (10/10), virginica (9/10). The classification of setosa and versicolor is completely accurate, while only 9 out of 10 Iris virginica records (0.9) were classified correctly.
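The overall accuracy follows directly from these per-species counts. The short check below uses only the counts reported in the comment; the variable names are ours.

```python
# Correctly classified / total test records per species, as reported above.
correct = {"setosa": 10, "versicolor": 10, "virginica": 9}
total = {"setosa": 10, "versicolor": 10, "virginica": 10}

# Per-species share of records classified correctly (recall).
recall = {species: correct[species] / total[species] for species in correct}

# Overall accuracy: all correct predictions over all test records (29 / 30).
accuracy = sum(correct.values()) / sum(total.values())

print(recall)
print(round(accuracy, 3))
```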
4 Logistic Regression
BRIGHTICS PROCESS
DATA LOAD
- Load the data for Logistic Regression from “sample_iris.csv”.
- We upload the sample data provided by Samsung Brightics AI.
Parameter:
Columns: sepal length, sepal width, petal length, petal width
Target Statistic: Max, Min, Average, Standard deviation
Group by: species
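The same grouped statistics can be sketched with the Python standard library. This is an illustrative sketch: the function name `summarize_by_group`, the column name `petal_length`, and the four example rows are hypothetical, and the use of the sample standard deviation is an assumption, since the report does not say which variant Brightics computes.

```python
import statistics
from collections import defaultdict

def summarize_by_group(rows, group_key, column):
    """Return max, min, average and sample standard deviation per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[column])
    return {
        group: {
            "max": max(values),
            "min": min(values),
            "avg": statistics.mean(values),
            "std": statistics.stdev(values),  # sample standard deviation (assumption)
        }
        for group, values in groups.items()
    }

# Illustrative rows only; NOT the sample_iris.csv records.
rows = [
    {"species": "setosa", "petal_length": 1.4},
    {"species": "setosa", "petal_length": 1.6},
    {"species": "virginica", "petal_length": 5.5},
    {"species": "virginica", "petal_length": 6.1},
]

summary = summarize_by_group(rows, "species", "petal_length")
for species, stats in summary.items():
    print(species, stats)
```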
String Summary
Examine the frequencies and proportions of species and the categorical variables, using them as separators.
Parameter:
Input Columns: sepal length, sepal width, petal length, petal width
Group by: species
Logistic Regression Train
Select the dependent variable (spec_cd) and explanatory variables (sepal length, sepal width, petal length, petal width), then proceed with the analysis.
- Parameter:
Inputs: Split Data-train_table
Feature Columns: sepal length, sepal width, petal length, petal width
Label Column: spec_cd
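To make the role of the four explanatory variables concrete, the sketch below shows how a fitted multi-class logistic regression could turn one flower's measurements into per-class probabilities with a softmax. It assumes a multinomial (softmax) formulation, which the report does not confirm for Brightics; the coefficients, intercepts, and measurements are hypothetical and were not fitted to sample_iris.csv.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(features, weights, biases):
    """One linear score per class (weights . features + bias), then softmax."""
    scores = [sum(w * x for w, x in zip(class_weights, features)) + b
              for class_weights, b in zip(weights, biases)]
    return softmax(scores)

# Hypothetical coefficients for spec_cd classes 0, 1, 2 (not fitted values).
weights = [
    [0.4, -0.2, 0.3, -0.5],
    [0.3, 0.9, -1.4, -0.7],
    [-0.7, -0.7, 1.1, 1.2],
]
biases = [0.5, 0.2, -0.7]

# Illustrative measurements: sepal length, sepal width, petal length, petal width.
probs = predict_proba([5.1, 3.5, 1.4, 0.2], weights, biases)
prediction = max(range(len(probs)), key=probs.__getitem__)
print([round(p, 4) for p in probs], "predicted spec_cd:", prediction)
```

The predicted class is simply the one with the largest probability.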
Logistic Regression Predict
Perform predictions by applying the regression equation generated from
Inputs: Logistic Regression Predict
Label Column: spec_cd
Prediction Column: prediction
Based on the classification evaluation results, the logistic regression model performed very well on the sample_iris dataset. Here are some detailed comments:
Accuracy: The model's accuracy is 0.967, which means it correctly predicts the flower species in about 96.7% of cases.
Species 1 (Setosa): The model classified this species perfectly, with F1, Precision, and Recall all 1.0. This shows that the model correctly identified all Setosa samples.
Species 2 (Virginica): The model performed well for this species as well, with an F1 of 0.95, Precision of 1.0, and Recall of 0.9. This means that every sample predicted as Virginica was truly Virginica, but some actual Virginica samples were missed.
Species 0 (Versicolor): The model has an F1 of 0.95, Precision of 0.91, and Recall of 1.0. This shows that the model recognized every actual Versicolor sample, but some samples of another species were incorrectly predicted as Versicolor.
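These per-species figures can be cross-checked against a confusion matrix. The matrix below is a hypothetical reconstruction, not the Brightics output: it assumes 10 test records per species (as in the Decision Tree results) and a single Virginica record predicted as Versicolor. Under those assumptions it reproduces the reported precision, recall, F1, and accuracy.

```python
# Rows = actual species, columns = predicted species, in spec_cd order 0, 1, 2.
# Hypothetical matrix (assumption): 10 test records per species, with one
# Virginica (2) record predicted as Versicolor (0).
confusion = [
    [10, 0, 0],   # actual 0 (Versicolor)
    [0, 10, 0],   # actual 1 (Setosa)
    [1, 0, 9],    # actual 2 (Virginica)
]

def per_class_metrics(matrix, cls):
    """Return (precision, recall, F1) for one class, rounded to 2 decimals."""
    tp = matrix[cls][cls]
    fp = sum(row[cls] for row in matrix) - tp   # other species predicted as cls
    fn = sum(matrix[cls]) - tp                  # cls records predicted as another species
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return round(precision, 2), round(recall, 2), round(f1, 2)

accuracy = sum(confusion[i][i] for i in range(3)) / sum(map(sum, confusion))

for cls in range(3):
    print(cls, per_class_metrics(confusion, cls))
print("accuracy:", round(accuracy, 3))
```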
Label Column: spec_cd
Probability Column: probability_1
Positive Label: 1
In the ROC curve chart, verify the threshold: 0.69 and the AUC (Area Under the Curve) value of 1.00.
Label Column: spec_cd
Probability Column: probability_0
Positive Label: 0
In the ROC curve chart, verify the threshold: 0.55 and the AUC (Area Under the Curve) value of 1.00.
Label Column: spec_cd
Probability Column: probability_2
Positive Label: 2
In the ROC curve chart, verify the threshold: 0.46 and the AUC (Area Under the Curve) value of 1.00.
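For intuition about the AUC values, the AUC for one positive label equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it that way for a one-vs-rest setting. The function name `roc_auc` and the labels and scores are hypothetical and are not the Brightics output.

```python
def roc_auc(labels, scores, positive):
    """One-vs-rest AUC: share of (positive, negative) pairs ranked correctly."""
    pos = [s for label, s in zip(labels, scores) if label == positive]
    neg = [s for label, s in zip(labels, scores) if label != positive]
    # A pair counts 1 if the positive scores higher, 0.5 on a tie, 0 otherwise.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical spec_cd labels and probability_1 scores (not the real output).
spec_cd = [1, 1, 1, 0, 0, 2, 2]
probability_1 = [0.97, 0.91, 0.88, 0.10, 0.22, 0.01, 0.03]

auc = roc_auc(spec_cd, probability_1, positive=1)
print("AUC for positive label 1:", auc)
```

An AUC of 1.00 therefore means every positive sample is scored above every negative sample, so some threshold separates the two groups perfectly.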
Comment: We obtain good accuracy (96.7%) in iris flower classification using sepal length, sepal width, petal length, and petal width.
5 Prediction for the given data
Sepal_length Sepal_width Petal_length Petal_width
Create table
K-Nearest Neighbors
Comment: Using the provided data, we apply the previously trained KNN model to predict the species. The model predicts Iris Setosa, with a probability of 85.71% for Iris Setosa and 14.29% for Iris Versicolor.
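These two probabilities are consistent with a simple vote among k = 7 neighbors in which 6 are Setosa and 1 is Versicolor. The sketch below shows that arithmetic; the neighbor labels are a hypothetical reconstruction, and it assumes the reported probability is the unweighted share of neighbor votes, which the report does not state.

```python
from collections import Counter

# Hypothetical labels of the 7 nearest neighbors, chosen to match the
# reported shares (assumption: probabilities are unweighted vote fractions).
neighbor_labels = ["setosa"] * 6 + ["versicolor"]

votes = Counter(neighbor_labels)
shares = {label: round(100 * count / len(neighbor_labels), 2)
          for label, count in votes.items()}
predicted = votes.most_common(1)[0][0]

print(shares, "predicted:", predicted)
```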
Decision Tree
Comment: Using the provided data, we apply the previously trained Decision Tree model to predict the species. The model predicts Iris Versicolor, with a probability of 0% for Iris Setosa, 97.44% for Iris Versicolor, and 2.56% for Iris Virginica.
Logistic Regression
Comment: Using the provided data, we apply the previously trained Logistic Regression model to predict the species. The model predicts Iris Setosa, with a probability of 89.69% for Iris Setosa, 10.02% for Iris Versicolor, and 0.29% for Iris Virginica.
Executive Summary
Our team analyzed the data and classified it into three categories: Setosa, Versicolor, and Virginica. Three machine learning techniques were used for classification: K-Nearest Neighbors (KNN), Decision Tree, and Logistic Regression. The three algorithms achieved accuracies of 100%, 96.7%, and 96.7%, respectively.
Methodology
To conduct the analysis, our team performed the following steps:
1 Data Collection: We obtained the data from Samsung Brightics AI sources.
2 Cleaning the Data: We cleaned the data by eliminating duplicates, missing values, and outliers.
3 Data Preprocessing: We preprocessed the data by dividing it into training and testing sets.
4 Model Training: On the training set, we trained three machine learning models: K-Nearest Neighbors (KNN) with k = 7, Decision Tree with max depth = 3, and Logistic Regression.
5 Model Evaluation: We evaluated the models on the testing set and obtained the following accuracies: 100%, 96.7%, and 96.7%, respectively.
6 Classification: We classified the data into three groups using the trained models: Setosa, Versicolor, and Virginica.
Result: On the testing set (Accuracy)
On the given data
Sepal_length Sepal_width Petal_length Petal_width
K-Nearest Neighbors: The model predicts that this is Iris Setosa.
Decision Tree: The model predicts that this is Iris Versicolor
Logistic Regression: The model predicts that this is Iris Setosa.