
Project course: Introduction to Data Mining & Business Analytics


DOCUMENT INFORMATION

Basic information

Title: Introduction to Data Mining & Business Analytics
Supervisor: Do Trung Tuan
School: Vietnam National University, Hanoi International School
Type: project
Year of publication: 2023
City: Hanoi
Pages: 33
File size: 1.99 MB

Structure

  • 1. PROJECT PROPOSAL
    • 1.1. Team member list
    • 1.2. Team name: Group 6
    • 1.3. Work division – Contribution
  • 2. INTRODUCTION
  • 3. PROBLEM STATEMENT
  • 4. GETTING THE DATA
  • 5. EXPLORATORY DATA ANALYSIS
    • 5.1. Preprocess the datasets
    • 5.2. Understanding Dataset Features
  • 6. DESCRIPTIVE STATISTICS
    • 6.1. Statistical numbers
    • 6.2. Let's separate numerical and categorical variables for easy analysis
  • 7. REGRESSION ANALYSIS
    • 7.1. Bangladesh’s crop production (BGD)
    • 7.2. China’s crop production (CHN)
    • 7.3. Japan’s crop production (JPN)
    • 7.4. Korea’s crop production (KOR)
    • 7.5. Thailand’s crop production (THA)
    • 7.6. India’s crop production (IND)
    • 7.7. Iran’s crop production (IRN)
  • 8. DECISION TREE
    • 8.1. Decision tree in text form
    • 8.2. Decision tree using the scikit-learn library in Python

Contents


PROJECT PROPOSAL

Team member list

No Full name Student ID

Work division – Contribution

1 Leader, find dataset Ngô Hồng Anh

2 Reading and analyzing results, then writing the report

Nguyễn Khánh Linh Nguyễn Thị Minh Hằng Ngô Hồng Anh

4 Visualizing the data Nguyễn Đăng Mỹ Oanh

Ngô Hồng Anh Nguyễn Khánh Linh Trần Khánh Huyền

Nguyễn Đăng Mỹ Oanh Trần Khánh Huyền

INTRODUCTION

Our team's project name is "Predicting Crop Yields in Selected Asian Countries Using Machine Learning".

Agriculture plays a crucial role in sustaining economies and ensuring food security for nations worldwide. In Asia, where agriculture is a significant sector, accurately predicting crop yields becomes imperative. By combining Exploratory Data Analysis (EDA) with machine learning techniques such as regression analysis and decision trees, it becomes possible to harness the power of data to forecast crop production for the years 2026-2028. This essay aims to explore the potential of these methods in predicting crop yields in selected Asian countries, thereby enabling policymakers and stakeholders to make informed decisions and implement effective strategies to address potential food shortages or surpluses.

Machine learning techniques have gained considerable attention and recognition due to their ability to analyze vast amounts of data and identify meaningful patterns and relationships. EDA, as an initial step, allows us to understand the data's structure and to identify missing values, outliers, and relationships between variables. By conducting a comprehensive EDA on historical agricultural datasets, we can gain valuable insights into the factors that influence crop yields, such as temperature, precipitation, soil composition, and cultivation practices.

Regression analysis offers a statistical approach to modeling the relationship between these influential factors and crop yields. By fitting regression models to historical data, we can estimate the relationship and quantify the impact of each variable on crop production. This knowledge can then be used to predict future yields based on projected values of the input variables.

Furthermore, decision trees provide a powerful framework for predicting crop yields by constructing a tree-like model of decisions and their potential consequences. Decision tree algorithms can consider multiple variables simultaneously and create a tree structure that maps out different scenarios, leading to different yield outcomes. By training decision tree models on historical data, we can create predictive models capable of estimating crop yields for future years based on specified input conditions.

In conclusion, the use of EDA, regression analysis, and decision trees offers a promising approach to predicting crop yields in selected Asian countries for the period 2026-2028. These methods can provide valuable insights into how policymakers can allocate resources effectively, implement suitable policies, and support farmers in making informed decisions. By leveraging the power of data and machine learning, we can strive for a more sustainable and resilient agricultural future in Asia.

PROBLEM STATEMENT

This research explores agricultural data and employs data mining techniques and machine learning algorithms to ascertain optimal crop yields, offering valuable insights into crop production.

Furthermore, leveraging food data spanning the past 35 years, this study enables the prediction of food production for the upcoming three-year period (2026-2028).

The dataset used to estimate crop production consists of over 1000 data points collected from seven randomly selected countries in Asia. It encompasses four major agricultural crops, namely rice, wheat, soybean, and maize, over a period spanning from 1990 to 2025. This dataset allows for a detailed analysis of the trends and patterns in crop production across these countries over a significant time frame. By exploring it, we can gain valuable insights into agricultural productivity in Asia and make informed predictions about future crop yields using machine learning techniques.

GETTING THE DATA

Yield data cover four crops (rice, wheat, soybean, and maize) for the 7 randomly selected Asian countries. At the national level, forecasts are made throughout the year.

EXPLORATORY DATA ANALYSIS

Preprocess the datasets

To begin our analysis, we load the necessary dependencies and configure the settings for our analysis. We import the following libraries:

Pandas: Used for data manipulation and analysis

Seaborn: Used for data visualization

NumPy: Used for numerical computations

Scikit-learn (sklearn): Used for machine learning tasks

After loading the dependencies, we load our data into a DataFrame and examine its structure by printing the first 5 rows and the last 5 rows. This gives us a quick overview of the data. Here is the code snippet for loading the dependencies and printing the data:
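A minimal sketch of this loading step is given here. The report does not state the file name or show the raw rows, so the inline CSV text, its values, and the indicator and measure codes are hypothetical stand-ins that only mimic the column layout described in the text.

```python
import io

import pandas as pd

# Hypothetical sample rows; the real project reads its own data file,
# whose path and values are not given in the report.
RAW_CSV = """LOCATION,INDICATOR,SUBJECT,MEASURE,FREQUENCY,TIME,Value,Flag Codes
BGD,CROPYIELD,RICE,TONNE_HA,A,1990,2.6,
BGD,CROPYIELD,RICE,TONNE_HA,A,1991,2.7,
BGD,CROPYIELD,WHEAT,TONNE_HA,A,1990,1.5,
BGD,CROPYIELD,WHEAT,TONNE_HA,A,1991,1.6,
CHN,CROPYIELD,RICE,TONNE_HA,A,1990,5.7,
CHN,CROPYIELD,RICE,TONNE_HA,A,1991,5.6,
CHN,CROPYIELD,MAIZE,TONNE_HA,A,1990,4.5,
CHN,CROPYIELD,MAIZE,TONNE_HA,A,1991,4.6,
"""

# Load the data into a DataFrame (a file path would replace the StringIO object).
df = pd.read_csv(io.StringIO(RAW_CSV))

# Quick overview: first 5 rows, last 5 rows, and the overall shape.
print(df.head(5))
print(df.tail(5))
print(df.shape)
```

With a real file, the only change would be passing its path to pd.read_csv instead of the in-memory text.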

Figure 1 Overview of the raw data

Having reviewed the raw data, we proceed to dive deeper into the analysis. Our target is the "Value" column in the DataFrame, so we identify a list of possible features to consider. As a first step, we drop the 'Index', 'Indicator', 'Frequency', and 'Flag Codes' columns; 'Index' in particular duplicates the pandas index.

We have observed that the data features "LOCATION", "SUBJECT", and "TIME" are suitable and of sufficient quality for further statistical analysis. Therefore, we will filter and focus solely on these features.

We use the code df.head(5) to display the required number of data rows.

Figure 3 Drop the columns that do not contain relevant info and list all the possible features

Next, we examine each feature and list all the unique values it contains. This helps us understand the distinct categories present in each feature.

During this analysis, we identify columns that contain only empty values or a single unique value. These columns do not provide meaningful information for our analysis, so we remove them from the DataFrame.

Figure 4 Identify unique value for each feature

Figure 5 Define filtered data frame

Now, with the selected features "LOCATION," "SUBJECT," "MEASURE," and "TIME," along with the "Value" column, we can form a filtered DataFrame to proceed to the next steps of our analysis.

By following these steps, we ensure that we have a clean and focused dataset, ready for further analysis and modeling.
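The preprocessing steps above can be sketched as follows. This is an illustration under stated assumptions, not the team's actual code: the sample rows and code values are hypothetical, and the sketch checks the unique-value counts before dropping so that the reason for removing each column is visible.

```python
import pandas as pd

# Hypothetical rows standing in for the raw data; all values are illustrative.
df = pd.DataFrame({
    "Index": [0, 1, 2, 3],
    "LOCATION": ["BGD", "BGD", "CHN", "CHN"],
    "Indicator": ["CROPYIELD"] * 4,
    "SUBJECT": ["RICE", "WHEAT", "RICE", "WHEAT"],
    "MEASURE": ["TONNE_HA", "THND_TONNE", "TONNE_HA", "THND_TONNE"],
    "Frequency": ["A"] * 4,
    "TIME": [1990, 1990, 1991, 1991],
    "Value": [2.6, 1.5, 5.6, 3.1],
    "Flag Codes": [None] * 4,
})

# List the unique values of every column to see which ones carry information.
for column in df.columns:
    print(column, df[column].unique())

# Columns that are empty or hold a single unique value are uninformative.
unique_counts = df.nunique(dropna=True)
uninformative = unique_counts[unique_counts <= 1].index.tolist()

# Drop them together with 'Index', which only duplicates the pandas index.
df = df.drop(columns=["Index"] + uninformative)

# Keep the selected features plus the target column.
features = ["LOCATION", "SUBJECT", "MEASURE", "TIME"]
df_filtered = df[features + ["Value"]].copy()
print(df_filtered.head(5))
```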

Understanding Dataset Features

Upon inspecting the raw dataset and examining several data rows, we can gain valuable insights into the different columns and their corresponding features:

LOCATION: This column represents the geographic location and is classified by country code. In the given dataset, we have data from seven distinct countries: Bangladesh (BGD), China (CHN), Japan (JPN), South Korea (KOR), Thailand (THA), Indonesia (IDN), and Iran (IRN). Each country code corresponds to a specific location where agricultural production data was recorded.

SUBJECT: This column indicates the type of agricultural production. The dataset includes four main categories: "RICE", "WHEAT", "SOYBEAN", and "MAIZE". These categories represent different crops or agricultural products.

TIME: This column records the time period for the data. In the dataset, the TIME feature is represented in years. Each entry in the TIME column corresponds to the specific year in which the agricultural production data was collected.

Value: This column represents the actual value of agricultural production. It contains numeric values that quantify the production quantity or other relevant metrics associated with the specific agricultural subject and location.

By examining the unique values in each column, we gain a better understanding of the distinct locations, subject categories, and time periods covered by the dataset. This information helps us identify the key components and characteristics of the data, enabling us to perform more targeted analysis and draw meaningful conclusions about agricultural production trends across different countries and crops.

DESCRIPTIVE STATISTICS

Statistical numbers

Since our data is primarily clustered around the "SUBJECT" feature with unique values of 'RICE,' 'WHEAT,' 'SOYBEAN,' and 'MAIZE,' we proceed to calculate various statistical measures for these categories. Specifically, we calculate the mean, median, correlation, maximum, and minimum values for each category. This analysis allows us to gain insights into the characteristics and variations within each subject's data.

The following code snippet demonstrates how we perform these calculations and presents the overall results:
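One way these per-subject statistics could be computed is sketched here. The sample values are hypothetical, and the report does not say which pair of variables the correlation refers to; the sketch assumes it is the correlation between TIME and Value within each subject.

```python
import pandas as pd

# Hypothetical filtered data; values are illustrative only.
df_filtered = pd.DataFrame({
    "SUBJECT": ["RICE"] * 3 + ["WHEAT"] * 3,
    "TIME": [1990, 1991, 1992] * 2,
    "Value": [2.0, 3.0, 7.0, 1.0, 2.0, 3.0],
})

# Mean, median, maximum, and minimum of Value for each subject.
summary = df_filtered.groupby("SUBJECT")["Value"].agg(["mean", "median", "max", "min"])
print(summary)

# Assumed interpretation of "correlation": TIME versus Value, per subject.
time_corr = {
    subject: group["TIME"].corr(group["Value"])
    for subject, group in df_filtered.groupby("SUBJECT")
}
print(time_corr)
```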

Next, we plot the data with the main focus on the SUBJECT feature. Overall, this code generates a line plot that visualizes the data of the subjects over time. The x-axis represents the time values, the y-axis represents the corresponding values of the subjects, and each subject is drawn as a differently colored line.

plt.figure(figsize=(12,6)): Sets the size of the figure to 12 inches in width and 6 inches in height, giving the plot a 2:1 aspect ratio.

sns.lineplot(data=df_filtered, x='TIME', y='Value', hue='SUBJECT'): Creates a line plot using the lineplot function from Seaborn. The data parameter specifies the DataFrame df_filtered containing the data to be plotted. The x parameter specifies the column plotted on the x-axis, which is 'TIME'. The y parameter specifies the column plotted on the y-axis, which is 'Value'. The hue parameter specifies the column that distinguishes the subjects, which is 'SUBJECT'. This results in multiple lines on the plot, each representing a different subject.

plt.title("Line Plot by Subject"): Sets the title of the plot to "Line Plot by Subject".

REGRESSION ANALYSIS

Bangladesh’s crop production (BGD)

Figure 14 Production and prediction crop production of BGD

The purpose of this code snippet is to predict crop yields for the upcoming years using a linear regression model.

From the graph above, we can see that production of all 4 crops in Bangladesh is forecast to grow in the period 2026-2028. Bangladesh's domestic agricultural output is not enough to meet domestic consumption demand, so the country imports food from abroad, and Vietnam is one of the countries it chooses to cooperate with. According to Bangladesh's Food Minister, the country's rice production is not enough to supply 170 million people, so Bangladesh still needs to import rice, with the main suppliers being India, Vietnam, and Myanmar. For the Bangladesh market, VINAFOOD II has been the main supplier of rice under the MOU for many years: it supplied 450,000 tons in 2011, 250,000 tons in 2017, 52,500 tons of white rice in 2021, and 230,000 tons of rice in 2022. In that spirit, Bangladesh has agreed to extend the MOU on rice trade with Vietnam for another five

Using similar code in this section and extracting information from the data file, we can predict the agricultural yields of the remaining six countries.
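The forecasting step used throughout this section can be sketched as a linear regression of Value on TIME, fitted separately for each crop of one country. The history below is synthetic (exactly linear, with made-up slopes), and fitting one model per crop is an assumption about how the report's code is organized.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic history for one country, 1990-2025; the numbers are made up.
years = np.arange(1990, 2026)
history = pd.DataFrame({
    "TIME": np.tile(years, 2),
    "SUBJECT": ["RICE"] * len(years) + ["WHEAT"] * len(years),
    "Value": np.concatenate([
        2.0 + 0.05 * (years - 1990),  # hypothetical rice trend
        1.0 + 0.02 * (years - 1990),  # hypothetical wheat trend
    ]),
})

# Years to forecast.
future_years = pd.DataFrame({"TIME": [2026, 2027, 2028]})

# Fit one linear model per crop and predict the three future years.
forecasts = {}
for subject, group in history.groupby("SUBJECT"):
    model = LinearRegression()
    model.fit(group[["TIME"]], group["Value"])
    forecasts[subject] = model.predict(future_years)

forecast_table = pd.DataFrame(forecasts, index=future_years["TIME"])
print(forecast_table)
```

Repeating the loop after filtering the data by LOCATION would give one forecast table per country.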

China’s crop production (CHN)

We forecast CHN's production of the 4 crops for the period between 2026 and 2028.

Figure 15 Production and prediction crop production of CHN

From the forecast chart, it can be seen that China's production of all 4 crops is forecast to increase in the period between 2026 and 2028. In 2022, China's total food production reached 686.55 million tons, up 3.7 million tons (0.5%) from 2021, setting a new record and keeping production above 650 million tons for the 8th consecutive year. According to data released by the National Bureau of Statistics of China on December 12, the country's food production increased in all three harvests of the year. Among the main foods, production of wheat and maize both increased slightly, rice decreased by 2%, and soybean alone increased by 23.7% compared to 2021. A few reasons may explain these numbers. China is one of the world's superpowers and invests heavily in biotechnology, which improves the yield and quality of crops. Its large territory is also a strength, making it easier for agricultural production to sustain a positive growth rate.

In 2022, the agricultural cultivation area in China reached 1.775 billion Chinese acres (1 Chinese acre equals 666.67 m2), an increase of 10.519 million Chinese acres, or 0.6%, over the previous year.

Japan’s crop production (JPN)

We forecast JPN's production of the 4 crops for the period between 2026 and 2028.

Figure 16 Production and prediction crop production of JPN

According to the chart above, Japan's grain output is not forecast to grow strongly. Corn output even shows a slight decline, and soybean output stays roughly flat. The other two products, rice and wheat, still show positive growth, but not much. There are two main reasons for this. First, Japan is not an agricultural power, because its area of arable land is very small. It relies mainly on improving biotechnology to increase agricultural output, and on food imports. According to the Japanese Ministry of Agriculture, Forestry and Fisheries, the focus of the G7 discussions is to raise agricultural productivity in order to increase production, while doing so in an environmentally friendly way. At the meeting, Japan also plans to call on other members to join the projects of the United Nations and the European Union to support small-scale grain production in developing countries. Second, due to the economic recession and the high inflation rate in Japan, especially the depreciation of the yen, the Japanese government focuses only on importing basic foods such as rice and wheat. Therefore, these two agricultural products can still maintain growth in domestic output.

Korea’s crop production (KOR)

We forecast KOR's production of the 4 crops for the period between 2026 and 2028.

Figure 17 Production and prediction crop production of KOR

According to the graph above, Korea records remarkable growth in rice production, followed by corn production. Wheat production falls considerably and soybean production falls slightly. The main reasons can be stated as follows. First, rice is the staple food of Korea, so even though dietary needs have changed, rice is still a crop that is prioritized for research and cultivation. In addition, rice farmers in Korea are worried about low purchase prices: in the context of the COVID-19 pandemic, oversupply has caused a large rice surplus while production costs continue to rise. According to the latest data from the Ministry of Agriculture, Food and Rural Affairs of Korea, in September the price of 20 kg of rice decreased by 24.9% compared to the same period of the previous year, the sharpest decline in 45 years. Meanwhile, South Korea is expected to consume only 51.9 kg of rice per person in 2022, less than half of what it consumed 30 years ago, because demand for rice from schools, businesses, and restaurants has yet to recover from the impact of the COVID-19 pandemic. This has resulted in an increase in domestic rice inventories.

Thailand’s crop production (THA)

We forecast THA's production of the 4 crops for the period between 2026 and 2028.

Figure 18 Production and prediction crop production of THA

According to the graph, apart from maize, which shows positive growth, the remaining 3 crops (rice, wheat, and soybean) all show negative growth. The reasons may be as follows. With crop yields declining sharply as the Russian-Ukrainian military conflict causes shortages of fertilizers and fodder, food stockpiling will be inevitable. The conflict also caused the prices of fertilizers and some raw materials to spike. According to Kriengkrai, Russia and Ukraine are two of the largest exporters of goods: Russia is currently the largest producer and exporter of steel and a major exporter of wheat and maize for animal feed on the world market, while Ukraine is also a major wheat exporter. Thailand, in turn, is a country that receives many export orders for agricultural products. As market demand changes, the crops its farmers prefer to grow for export change as well, leading to differences in output between types of agricultural crops.

India’s crop production (IND)

We forecast IND's production of the 4 crops for the period between 2026 and 2028.

Figure 19 Production and prediction crop production of IND

From the graph, we can see that apart from corn, which records positive growth, rice, soybeans, and wheat all record a slight decrease in production. The reasons for India are quite similar to those for Thailand, because the two countries constantly compete in the food export market, so crop output is strongly affected by export demand. Although rice has traditionally been India's main agricultural product, it is still expected to see a decline in output, possibly because weather remains a key factor for rice production. Some meteorologists predict that an El Niño phenomenon will dry out much of Asia this year.

Iran’s crop production (IRN)

We forecast IRN's production of the 4 crops for the period between 2026 and 2028.

Figure 20 Production and prediction crop production of IRN

The results on the graph predict that Iran's food production varies little between the different types of crops. Because Iran is an exporter of oil and gas, it does not focus much on food exports. However, wheat production records the largest forecast increase, followed by rice production.

Production of corn and soybeans decreases slightly. Wheat still records an increase in production because Iran is one of the largest wheat producing countries in the world, with wheat production in the 2019/20 crop year estimated at 16.8 million tons.

DECISION TREE

Decision tree in text form

Figure 21 Below are the steps involved in creating a decision tree

Figure 22 A decision tree in text form

The output above represents a decision tree generated by our model from the food production data. This decision tree helps us understand how the model derives and predicts food production values from the input attributes.

The decision tree starts with a decision based on the "SUBJECT" attribute. If the value of the "SUBJECT" attribute is less than or equal to 0.50, the model predicts a food production value of 4.53. Conversely, if the "SUBJECT" value is greater than 0.50, the model proceeds to examine the "SUBJECT" attribute further.

If the "SUBJECT" value is greater than 1.50, the model moves to a decision based on the "LOCATION" attribute. If the "LOCATION" value is less than or equal to 5.50, the model predicts a food production value of 2.13. On the other hand, if the "LOCATION" value is greater than 5.50, the model predicts a food production value of 0.44.

The purpose of the decision tree is to discover rules and patterns within the input data in order to predict food production values. By dividing the data on important attributes such as "SUBJECT" and "LOCATION," the model can uncover underlying patterns and rules to forecast the food production values of different countries.

The "value" entries in the output represent the predicted values of food production at each leaf of the decision tree. Each one is the average value of the training samples that fall into the corresponding branch. For example, [4.53] indicates that the average value of food production at the first leaf is 4.53.
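A text-form tree of this kind can be produced with scikit-learn's export_text, as in the sketch below. The data are synthetic, and the integer encoding of "SUBJECT" and "LOCATION" is an assumption: thresholds such as 0.50 and 1.50 would arise naturally if each category had been mapped to an integer code, but the report does not show how the encoding was done.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor, export_text

# Synthetic data: maize yields far above the other crops, plus a small
# location effect. All numbers are made up for illustration.
locations = ["BGD", "CHN", "IRN", "JPN", "KOR", "THA"]
subjects = ["MAIZE", "RICE", "SOYBEAN", "WHEAT"]
rows = [
    {"LOCATION": loc, "SUBJECT": sub,
     "Value": (10.0 if sub == "MAIZE" else 1.0) + 0.1 * i}
    for i, loc in enumerate(locations)
    for sub in subjects
]
df = pd.DataFrame(rows)

# Assumed encoding: each category becomes an integer code (alphabetical order).
X = pd.DataFrame({
    "SUBJECT": df["SUBJECT"].astype("category").cat.codes,
    "LOCATION": df["LOCATION"].astype("category").cat.codes,
})

# A shallow regression tree; each leaf predicts the mean Value of its samples.
tree = DecisionTreeRegressor(max_depth=2, random_state=0)
tree.fit(X, df["Value"])

# Text form of the fitted tree, with one "value: [...]" line per leaf.
rules = export_text(tree, feature_names=list(X.columns))
print(rules)
```

Because the splits act on integer codes, a rule such as "SUBJECT <= 0.50" simply separates the first category from the rest; it does not imply any numeric ordering of crops.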

Thus, the decision tree helps us understand how the attributes "SUBJECT" and "LOCATION" influence the food production of countries within the dataset.

Decision tree using the scikit-learn library in Python

The given code snippet represents the creation and visualization of a decision tree model. This model, called a decision tree classifier, is a powerful tool for classification tasks. It uses a set of rules derived from the provided data to make predictions. In this case, the decision tree is trained on the iris dataset, which contains information about different iris flower species.

By utilizing the plot_tree() function, the decision tree structure is presented in a visual format. Each node in the tree represents a decision based on a specific attribute, while the leaf nodes indicate the predicted class labels. The branches of the tree correspond to different attribute values and demonstrate how the model arrives at its predictions.

This graphical representation of the decision tree provides a more accessible way for a broader audience to understand its underlying logic and decision-making process. It allows users to interpret the model's predictions and gain insights into how different attributes influence the classification outcomes.

This code snippet generates a decision tree using the scikit-learn library. The DecisionTreeClassifier() function initializes a decision tree classifier, and then the fit() method trains the decision tree using the input data (iris.data) and corresponding target labels (iris.target).

The resulting decision tree can be visualized using the plot_tree() function from the tree module. The feature_names parameter specifies the names of the input features (in this case, the feature names of the iris dataset), and the class_names parameter indicates the names of the target classes (the target names of the iris dataset).

The decision tree structure is then plotted using plt.show(), providing a visual representation of the tree. Each node represents a decision based on a specific attribute, and the leaf nodes indicate the predicted class labels. The branches of the tree depict the attribute values used for classification decisions.
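The calls described above can be assembled into the following sketch. The random_state value, the figure size, the filled=True option, and the non-interactive "Agg" backend are assumptions added so the example is reproducible and runs without a display; they are not taken from the report.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless run; omit in a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Load the iris dataset bundled with scikit-learn.
iris = load_iris()

# Initialize the classifier and train it on the features and target labels.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(iris.data, iris.target)

# Draw the fitted tree with readable feature and class names.
fig, ax = plt.subplots(figsize=(12, 8))
annotations = plot_tree(
    clf,
    feature_names=list(iris.feature_names),
    class_names=list(iris.target_names),
    filled=True,
    ax=ax,
)
```

In an interactive session, plt.show() would display the figure after the last call.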

Figure 24 A decision tree using the scikit-learn library in Python

In this report, we utilized decision trees to classify data on crop yields for different countries. The decision tree was trained using attributes such as "SUBJECT" (crop type) and "LOCATION" (geographical location) to predict the crop type with the highest and lowest yields for each country.

The results of the decision tree are presented as branches and corresponding predicted values for each branch. By analyzing the output, we can draw the following observations:

If "SUBJECT" > 1.50, the decision tree classifies the data into 2 branches:

If "LOCATION" ≤ 5.50, the predicted value is [2.13]

If "LOCATION" > 5.50, the predicted value is [0.44]

By examining the decision tree, we can easily understand how the data is partitioned and which decisions are made at each branch. From the attributes and predicted values, we can infer that the "SUBJECT" attribute plays a significant role in predicting the crop type with the highest and lowest yields. Similarly, the "LOCATION" attribute also has a notable impact on the prediction outcomes.

Based on these results, we can apply the decision tree to predict the crop type with the highest and lowest yields for other countries in the dataset. This can help us understand and make informed decisions in agricultural production management and enhance crop productivity in each country.

Determining the relationship between the "SUBJECT" attribute and crop yields: Based on decision trees, we observe that the "SUBJECT" attribute plays a significant role in data classification. The decision tree branches on the value of this attribute to predict the highest and lowest crop yields. This indicates that the type of crop has a notable impact on crop production.

Assessing the impact of geographical location on crop yields: The "LOCATION" attribute also has a significant influence on prediction outcomes. The decision tree branches on the value of this attribute to predict the highest and lowest crop yields. This demonstrates that the geographical location of each country can affect crop yields.

Drawing conclusions and making decisions based on decision trees: Decision trees provide us with an overview of decision rules within crop yield data. We can use decision trees to predict the crops with the highest and lowest yields for other countries in the dataset. This can assist in resource allocation and agricultural production management decisions.

In summary, decision trees are an important tool in data analysis and decision-making. With an understanding of decision trees and their analytical results, we can enhance predictive capabilities and support decision-making in the field of crop production.

Agriculture plays a pivotal role in economies worldwide and holds paramount importance for ensuring global food security, with Asia being a key region in this regard. The accurate prediction of crop yields assumes utmost significance in this region. The utilization of machine learning techniques, such as Exploratory Data Analysis (EDA), regression analysis, and decision trees, holds immense potential in

Date posted: 03/05/2024, 16:25
