Team 4 Group Project: Iris Flower Classification



MINISTRY OF EDUCATION AND TRAINING

DUY TAN UNIVERSITY

FACULTY OF INTERNATIONAL TRAINING

ARTIFICIAL INTELLIGENCE (FOR BUSINESS)

INSTRUCTOR: Dr. Soon Goo Hong

CLASS: IS-CS 468 AIS

Term Group Project

“IRIS FLOWER CLASSIFICATION”

Team 4

- Phan Van Minh Manh - 27211445925
- Tran Thi Thu Hong - 27201401792
- Doan Thien Nhan - 27211201936


Da Nang, 12 December, 2023


1 Overview

The Iris Flower Classification problem requires identifying three iris species based on four features: sepal length, sepal width, petal length, and petal width. The problem is important because it has several practical uses, including plant breeding, horticulture, and environmental monitoring.

This report presents the results of our analysis of the Iris Flower dataset using three classification algorithms: K-Nearest Neighbors (KNN), Decision Tree, and Logistic Regression. We compare the performance of these algorithms using accuracy, recall, precision, and F1 score. We also predict the species of a new instance from its features and make some suggestions for future improvements.

1 K-Nearest Neighbors (KNN)

BRIGHTICS PROCESS

DATA LOAD

- Load the iris data from “sample_iris.csv”.

- We upload the sample data provided by Samsung Brightics AI.
- Click the ‘Upload’ button, search for ‘sample_iris.csv’ and select it.
- Click the ‘Run’ button.
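For readers who want to reproduce the analysis outside Brightics, the same file can be read with Python's standard csv module. This is a minimal sketch; the column names are hypothetical, since the report does not show the header of sample_iris.csv, so check them against the real file.

```python
import csv
import io
from typing import List, Tuple

# Hypothetical column names: check them against the actual header of sample_iris.csv.
FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
LABEL = "species"


def load_iris(stream) -> Tuple[List[List[float]], List[str]]:
    """Read iris rows from a CSV text stream into feature vectors and labels."""
    X: List[List[float]] = []
    y: List[str] = []
    for row in csv.DictReader(stream):
        X.append([float(row[name]) for name in FEATURES])  # the four measurements
        y.append(row[LABEL])                                # the species name
    return X, y


# Usage with an in-memory sample; for the real file pass
# open("sample_iris.csv", newline="") instead of io.StringIO.
demo = io.StringIO(
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,setosa\n"
    "6.3,3.3,6.0,2.5,virginica\n"
)
X_demo, y_demo = load_iris(demo)
```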


- Group by: species.

- In the output panel, you can see that the data is separated into two distinct sets: "Split data (train_table)" and "Split data (test_table)".
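Assuming that Group by: species makes the Split Data node divide each species separately (a stratified split), the idea can be sketched in plain Python. The 20% test ratio and the fixed seed are assumptions: the report does not state the split ratio, although the 10 test records per species reported for the Decision Tree would be consistent with a 20% split if sample_iris.csv holds the usual 50 records per species.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(X: Sequence, y: Sequence[str], test_ratio: float = 0.2,
                     seed: int = 0) -> Tuple[List, List, List, List]:
    """Split each species separately so train and test keep the class proportions."""
    rng = random.Random(seed)
    by_class: Dict[str, List[int]] = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)  # remember the row indices of each class

    train_idx: List[int] = []
    test_idx: List[int] = []
    for idx in by_class.values():
        idx = idx[:]  # copy before shuffling
        rng.shuffle(idx)
        n_test = round(len(idx) * test_ratio)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])

    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test
```

Splitting per class guarantees that every species appears in both tables in the same proportion, which a plain random split does not.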


KNN Classification

- Parameter:

- Inputs: click ‘Empty’ in the ‘test_table’.

- Set Inputs: drag ‘test_table’ from the Split Data node and drop it onto ‘Drop Data’ in the ‘test_table’ input.

- Feature Columns: select all.
- Label Column: species.

- Inputs: default.
- Label Column: ‘species’.
- Prediction Column: ‘prediction’.
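Brightics runs KNN as a node, but the underlying idea fits in a few lines. This sketch assumes Euclidean distance and an unweighted majority vote (common defaults, not confirmed for the Brightics node), with k defaulting to 7, the value the report settles on.

```python
import math
from collections import Counter
from typing import Sequence


def knn_predict(X_train: Sequence[Sequence[float]], y_train: Sequence[str],
                x: Sequence[float], k: int = 7) -> str:
    """Predict the label of x by majority vote among its k nearest training points."""
    # Euclidean distance from x to every training point, paired with that point's label.
    dists = sorted((math.dist(x, xi), yi) for xi, yi in zip(X_train, y_train))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```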


Comment: The predictions were perfectly accurate (Accuracy: 1.0).

With the KNN model, we correctly classified every test record of the three species (setosa, versicolor, and virginica), an accuracy of 100%.


Accuracy is the proportion of all predictions that are correct: Accuracy = (number of correct predictions) / (total number of predictions). A high accuracy score means that the model makes correct predictions most of the time.

Precision is the proportion of the model's positive predictions that are actually positive, so its denominator is TP + FP: Precision = TP / (TP + FP). A higher precision means that a larger share of the positive predictions are correct, that is, there are fewer false positives.
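The accuracy and precision definitions above, together with recall (TP / (TP + FN)) and F1 (the harmonic mean of precision and recall), can be computed per class directly from true and predicted labels. A minimal sketch in plain Python, treating each species in turn as the positive class:

```python
from typing import Dict, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Precision, recall and F1 for each class, treating that class as 'positive'."""
    results = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results[c] = {"precision": precision, "recall": recall, "f1": f1}
    return results


# Hypothetical labels: 10 records per species, one virginica predicted as versicolor.
y_true = ["setosa"] * 10 + ["versicolor"] * 10 + ["virginica"] * 10
y_pred = ["setosa"] * 10 + ["versicolor"] * 10 + ["virginica"] * 9 + ["versicolor"]
print(round(accuracy(y_true, y_pred), 3))  # 0.967
```

These labels are illustrative, not the report's actual predictions; they happen to give accuracy 29/30 and a Versicolor precision of 10/11 (about 0.91).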

With k = 1, 3, or 5, the model achieved an accuracy of 0.967; with k = 7, 9, or 11, it achieved an accuracy of 1.0.

As we can see, accuracy is lower when k is 1, 3, or 5 than when k is greater than 5. Very small values of k make the model sensitive to individual noisy points (overfitting), while larger values of k smooth the decision boundary and, if k grows too large, raise the risk of underfitting. With k = 7, the smallest value that reaches the best result, the four evaluation metrics achieve their highest values, so the model makes correct predictions most of the time while keeping the neighborhood small.

Therefore, the best k for this dataset is 7.
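The selection rule used here, taking the smallest k that reaches the highest accuracy, can be written as a small helper. The scores are the accuracies reported in this section; the same rule returns max depth = 3 for the Decision Tree results reported later.

```python
from typing import Dict


def smallest_best(scores: Dict[int, float]) -> int:
    """Return the smallest parameter value whose score equals the best score."""
    best = max(scores.values())
    return min(k for k, s in scores.items() if s == best)


# Accuracies reported in this section for each k.
knn_scores = {1: 0.967, 3: 0.967, 5: 0.967, 7: 1.0, 9: 1.0, 11: 1.0}
print(smallest_best(knn_scores))  # 7
```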


Based on the classification evaluation results with k = 7, the model performed very well on the sample_iris dataset. Here is our analysis of the metrics:

- Accuracy: The model's accuracy is 1.0, meaning it correctly predicted the species of every flower in the test set.

- Setosa: The model classified this species perfectly, with F1, Precision, and Recall all 1.0. This shows that it recognized all Setosa samples without any errors.

- Virginica: The model classified this species perfectly, with F1, Precision, and Recall all 1.0. This shows that it recognized all Virginica samples without any errors.

- Versicolor: The model classified this species perfectly, with F1, Precision, and Recall all 1.0. This shows that it recognized all Versicolor samples without any errors.

2 Decision Tree

BRIGHTICS PROCESS

DATA LOAD

- Load the iris data from “sample_iris.csv”.

- We upload sample data provided by Samsung Brightics AI.


We ran the Decision Tree classifier with max depth = 3, 5, and 7, using species as the outcome variable. With max depth = 3, the model achieved an accuracy of 0.967; with max depth = 5 or 7, it achieved a lower accuracy of 0.93. Therefore, the best max depth is 3.

Decision Tree Classification Train


- Splitter: Best.

- Max Depth: 3, 5, 7 (run the model once with each value).
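With Splitter: Best, the tree evaluates candidate split points at each node and keeps the one that best separates the classes. A sketch of that search for a single node, assuming the Gini impurity criterion (scikit-learn's default; the criterion used by the Brightics node is not shown in the report). A full tree applies this recursively to each child until Max Depth is reached.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def gini(labels: Sequence[str]) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X: Sequence[Sequence[float]], y: Sequence[str]) -> Optional[Tuple[int, float, float]]:
    """Try every feature and midpoint threshold; return (feature, threshold, weighted Gini)."""
    best = None
    n = len(y)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    return best
```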

Decision Tree Classification Predict

- Parameter:

- Inputs: click ‘Empty’ in the ‘test_table’.

- Set Inputs: drag ‘test_table’ from the Split Data node and drop it onto ‘Drop Data’ in the ‘test_table’ input.


- Parameter:

- Label Column: ‘species’
- Prediction Column: ‘prediction’.


Comment: With max depth = 3, the predictions were highly accurate (Accuracy: 0.967). The Decision Tree model correctly classified setosa (10/10), versicolor (10/10), and virginica (9/10) test records. All setosa and versicolor records were classified correctly, while only 9 of the 10 Iris virginica records were classified correctly (0.9).

3 Logistic Regression

BRIGHTICS PROCESS

DATA LOAD

- Load the iris data from “sample_iris.csv”.

- We upload the sample data provided by Samsung Brightics AI.


PRE-PROCESSING

Query Executor

Convert the dependent variable, species, into a numeric code (spec_cd) according to the conditions entered in the query, producing the values 0, 1, and 2.
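The query used in the Query Executor node is not shown in the report. The same recoding can be sketched in Python using the code assignment implied by the evaluation results later in this report (spec_cd = 1 for setosa, 0 for versicolor, 2 for virginica), and assuming the species values are the lowercase names used in the text.

```python
from typing import Dict, List, Sequence

# Encoding implied by the evaluation section: 1 = setosa, 0 = versicolor, 2 = virginica.
SPEC_CD: Dict[str, int] = {"versicolor": 0, "setosa": 1, "virginica": 2}


def encode_species(species: Sequence[str]) -> List[int]:
    """Map species names to the numeric spec_cd codes (KeyError for unknown names)."""
    return [SPEC_CD[s] for s in species]
```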

DESCRIPTIVE ANALYSIS

Statistic Summary

- For the numeric variables (sepal length, sepal width, petal length, petal width), examine summary statistics for each species.


- Parameter:

- Columns: sepal length, sepal width, petal length, petal width
- Target Statistic: Max, Min, Average, Standard deviation
- Group by: species
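The Statistic Summary node computes these statistics inside Brightics; an equivalent for one column in plain Python is sketched here. It uses the sample standard deviation (statistics.stdev); whether Brightics reports the sample or the population version is an assumption to check.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence


def summary_by_group(values: Sequence[float], groups: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Max, min, average and sample standard deviation of values for each group."""
    buckets: Dict[str, List[float]] = defaultdict(list)
    for v, g in zip(values, groups):
        buckets[g].append(v)
    return {
        g: {
            "max": max(vs),
            "min": min(vs),
            "average": statistics.mean(vs),
            # stdev needs at least two values; report 0.0 for a single record.
            "std": statistics.stdev(vs) if len(vs) > 1 else 0.0,
        }
        for g, vs in buckets.items()
    }
```

Calling it once per measurement column (for example petal length against the species column) reproduces the four target statistics grouped by species.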

Select Column

- Purpose: convert the measurement variables to String format so that they can be summarized as categories.
- Parameter:

- Condition: Change the type of the sepal length, sepal width, petal length, and petal width variables to String.


String Summary

Examine the frequency and proportion of each value of these variables, using species as the grouping variable.

- Parameter:

- Input Columns: sepal length, sepal width, petal length, petal width
- Group by: species.
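The String Summary output (how often each value occurs and what share of its group it represents) can be sketched with collections.Counter. This is an illustration of the computation, not the Brightics implementation; values are treated as strings, as after the Select Column step.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence, Tuple


def frequency_by_group(values: Sequence[str], groups: Sequence[str]) -> Dict[str, Dict[str, Tuple[int, float]]]:
    """For each group, count every distinct value and its share of the group."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for v, g in zip(values, groups):
        counts[g][v] += 1
    result = {}
    for g, counter in counts.items():
        total = sum(counter.values())
        result[g] = {v: (n, n / total) for v, n in counter.items()}
    return result
```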


Logistic Regression Train

Select the dependent variable (spec_cd) and the explanatory variables (sepal length, sepal width, petal length, petal width), then run the analysis.

- Parameter:

- Inputs: Split Data-train_table.

- Feature Columns: sepal length, sepal width, petal length, petal width
- Label Column: spec_cd.
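The report does not say whether Brightics fits a multinomial (softmax) model or one binary model per class. As an illustration only, here is a multinomial logistic regression trained by plain batch gradient descent; the learning rate, the number of epochs and the absence of regularization are assumptions. The predicted probability vector plays the role of the probability_0, probability_1 and probability_2 columns used in the ROC evaluation, and its values sum to 1.

```python
import math
from typing import List, Sequence


def softmax(z: Sequence[float]) -> List[float]:
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def train_softmax_regression(X: Sequence[Sequence[float]], y: Sequence[int],
                             n_classes: int, lr: float = 0.1, epochs: int = 2000) -> List[List[float]]:
    """Fit one weight vector per class (last entry is the bias) by batch gradient descent."""
    n_features = len(X[0])
    W = [[0.0] * (n_features + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * (n_features + 1) for _ in range(n_classes)]
        for xi, yi in zip(X, y):
            x1 = list(xi) + [1.0]  # append a constant 1 for the bias term
            p = softmax([sum(w * v for w, v in zip(Wc, x1)) for Wc in W])
            for c in range(n_classes):
                err = p[c] - (1.0 if c == yi else 0.0)  # cross-entropy gradient w.r.t. the score
                for j, v in enumerate(x1):
                    grad[c][j] += err * v
        for c in range(n_classes):
            for j in range(n_features + 1):
                W[c][j] -= lr * grad[c][j] / len(X)  # average gradient step
    return W


def predict_proba(W: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Class probabilities for one sample, indexed by spec_cd (0, 1, 2)."""
    x1 = list(x) + [1.0]
    return softmax([sum(w * v for w, v in zip(Wc, x1)) for Wc in W])
```

The predicted class is the index of the largest probability, which is how a single label is obtained from the three probability columns.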


Logistic Regression Predict

Perform predictions by applying the regression equation generated in the Logistic Regression Train step.

- Inputs: Logistic Regression Predict
- Label Column: spec_cd.

- Prediction Column: prediction.

Based on the classification evaluation results, the logistic regression model performed very well on the sample_iris dataset. Here are some detailed comments:

- Accuracy: The model's accuracy is 0.967, which means it correctly predicts the flower species in about 96.7% of the test cases.

- Species 1 (Setosa): The model classified this species perfectly, with F1, Precision, and Recall all 1.0. This shows that it correctly identified all Setosa samples.


- Species 2 (Virginica): The model also performed well for this species, with an F1 of 0.95, Precision of 1.0, and Recall of 0.9. Every sample predicted as Virginica was indeed Virginica, but some Virginica samples were missed (predicted as another species).

- Species 0 (Versicolor): The model has an F1 of 0.95, Precision of 0.91, and Recall of 1.0. It recognized every actual Versicolor specimen, but some specimens of other species were incorrectly predicted as Versicolor.
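F1 is the harmonic mean of precision and recall, F1 = 2 · Precision · Recall / (Precision + Recall). Plugging in the rounded values reported above confirms that both Virginica and Versicolor land at F1 ≈ 0.95, even though their precision and recall are swapped:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values reported for the logistic regression model.
print(round(f1(1.0, 0.9), 2))   # Virginica: 0.95
print(round(f1(0.91, 1.0), 2))  # Versicolor: 0.95
```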

Evaluation 2

Plot ROC and PR Curves (Setosa_1)

Check the performance using the ROC (Receiver Operating Characteristic) and PR (Precision-Recall) plots.

Here, the positive class is spec_cd = 1 (setosa).

- Parameter:

- Label Column: spec_cd.

- Probability Column: probability_1
- Positive Label: 1.


The ROC curve chart shows a threshold of 0.69 and an AUC (Area Under the Curve) of 1.00.

Check the performance using the ROC (Receiver Operating Characteristic) and PR (Precision-Recall) plots.

Here, the positive class is spec_cd = 0 (versicolor).

- Label Column: spec_cd.

- Probability Column: probability_0
- Positive Label: 0.


The ROC curve chart shows a threshold of 0.55 and an AUC (Area Under the Curve) of 1.00.

Check the performance using the ROC (Receiver Operating Characteristic) and PR (Precision-Recall) plots.

Here, the positive class is spec_cd = 2 (virginica).

- Label Column: spec_cd.

- Probability Column: probability_2
- Positive Label: 2.


The ROC curve chart shows a threshold of 0.46 and an AUC (Area Under the Curve) of 1.00.
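Each of these one-vs-rest AUC values equals the probability that a randomly chosen record of the positive label receives a higher probability_k than a randomly chosen record of any other label, so AUC = 1.00 means that ranking is perfect. Overall accuracy can still be below 100%, because the final label is the class with the highest probability rather than a per-class threshold. A sketch of this pairwise formulation (an illustration, not the Brightics implementation):

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float], positive: int) -> float:
    """One-vs-rest ROC AUC: chance that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))
```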

Comment: We obtain good accuracy (96.7%) in iris flower classification using sepal length, sepal width, petal length, and petal width.


5 Prediction for the given data

Create table

K-Nearest Neighbors

Comment: We apply the previously trained KNN model to the provided data to predict its species. The model predicts Iris Setosa, with a probability of 85.71% for Iris Setosa and 14.29% for Iris Versicolor.
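With k = 7, these probabilities match simple vote shares among the nearest neighbors: 6/7 ≈ 85.71% and 1/7 ≈ 14.29%. Assuming the Brightics KNN node reports the fraction of the k nearest neighbors in each class (consistent with these numbers, but not stated in the report), the computation is:

```python
from collections import Counter
from typing import Dict, Sequence


def vote_shares(neighbor_labels: Sequence[str]) -> Dict[str, float]:
    """Share of the k nearest neighbors that belong to each class."""
    k = len(neighbor_labels)
    return {label: n / k for label, n in Counter(neighbor_labels).items()}


# Hypothetical neighborhood: 6 of the 7 nearest training records are setosa.
shares = vote_shares(["setosa"] * 6 + ["versicolor"])
print({label: round(100 * s, 2) for label, s in shares.items()})
# {'setosa': 85.71, 'versicolor': 14.29}
```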

Decision Tree


Comment: We apply the previously trained Decision Tree model to the provided data to predict its species. The model predicts Iris Versicolor, with a probability of 0% for Iris Setosa, 97.44% for Iris Versicolor, and 2.56% for Iris Virginica.

Logistic Regression

Comment: We apply the previously trained Logistic Regression model to the provided data to predict its species. The model predicts Iris Setosa, with a probability of 89.69% for Iris Setosa, 10.02% for Iris Versicolor, and 0.29% for Iris Virginica.

Executive Summary

Our team analyzed the data and classified it into three groups: Setosa, Versicolor, and Virginica. We used three machine learning techniques for classification: K-Nearest Neighbors (KNN), Decision Tree, and Logistic Regression, which achieved accuracies of 100%, 96.7%, and 96.7%, respectively.

To conduct the analysis, our team performed the following steps:

1 Data Collection: We obtained the data from the sample datasets provided by Samsung Brightics AI.


2 Cleaning the Data: We cleaned the data by eliminating duplicates, missing values, and outliers.

3 Data Preprocessing: We preprocessed the data by splitting it into training and testing sets.

4 Model Training: On the training set, we trained three machine learning models: K-Nearest Neighbors (KNN) with k = 7, a Decision Tree with max depth = 3, and Logistic Regression.

5 Model Evaluation: We evaluated the models on the testing set and obtained accuracies of 100%, 96.7%, and 96.7%, respectively.

6 Classification: Using the trained models, we classified the data into three groups: Setosa, Versicolor, and Virginica.

Result

On the testing set (accuracy): K-Nearest Neighbors 100%, Decision Tree 96.7%, Logistic Regression 96.7%.

On the given data:

K-Nearest Neighbors: The model predicts that this is Iris Setosa.

Decision Tree: The model predicts that this is Iris Versicolor.

Logistic Regression: The model predicts that this is Iris Setosa.

