MINISTRY OF EDUCATION AND TRAINING
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY AND EDUCATION, FACULTY OF HIGH QUALITY TRAINING
Student: Trần Nguyễn Hiền Vân
Ho Chi Minh City, 2028
Table of contents
Chapter 1: Searching for data
1.1 Create and retrieve data from the World Bank DataBank page
1.1.2 Choose 5 favorite countries: Bhutan, Singapore, Japan, Nepal, Vietnam
1.1.3 Selection of 10 criteria in the indicator group, including Access to clean fuels and technologies for cooking (% of population)
1.1.4 Selection of survey time: from 2006 to 2020
Chapter 2: Processing and creating reports on Excel
2.1 Processing data from Excel files
2.2 Create dashboards and reports that match existing data
2.2.1 Create data reporting dashboard
2.2.2 Calculation of the remaining data (sum, max, min, average, VLOOKUP, ...) is done in the Excel file "Data Analytics", which contains the questions
2.2.3 Use Conditional Formatting to highlight (Excel file "Data Conditional Formatting")
Chapter 3: Analyzing processed data on GOOGLE COLAB
3.1 Upload, connect Google drive with Google Colab
3.2 Run the commands to read the file with pandas, then report the size of the data set
3.3 Description of the data fields
3.4 Plot the distribution chart for at least 2 columns, comment
3.5 Draw at least 2 arbitrary graphs using seaborn, comment
Chapter 1: Searching for data
1.1 Create and retrieve data from the World Bank DataBank page. Go to the page https://databank.worldbank.org/source/world-development-indicators
1.1.1 Select 1 database out of the 86 provided databases
I chose the topic: Sustainable Energy for All
1.1.2 Choose 5 favorite countries: Bhutan, Singapore, Japan, Nepal, Vietnam
1.1.3 Selection of 10 criteria in the indicator group, including:
• Access to clean fuels and technologies for cooking (% of population)
• Access to clean fuels and technologies for cooking, rural (% of rural population)
• Access to clean fuels and technologies for cooking, urban (% of urban population)
• Access to electricity (% of population)
• Access to electricity, rural (% of rural population)
• Access to electricity, urban (% of urban population)
• Adjusted savings: carbon dioxide damage (% of GNI)
• Adjusted savings: consumption of fixed capital (% of GNI)
• Adjusted net savings, including particulate emission damage (% of GNI)
• Adjusted net savings, excluding particulate emission damage (% of GNI)
Criteria and indicators the student chose for the assignment
Qualitative and quantitative criteria:
I chose the topic of sustainable fuel because Vietnam is a very polluted country and ranks 2nd in the world in terms of cancer incidence from such agents. Therefore, I want to analyze the data and compare it with the data for Vietnam to show how developed countries manage clean energy sources, from which some temporary solutions can be proposed for Vietnam
- Bhutan is my country of choice because it is a place in the world where people want to live: it has fresh air, and I generally agree with the metrics it uses to manage waste
Time criteria
Selection period: 2006 - 2020
- I want to analyze the data from the oldest available time to the present. The purpose is to find the differences in sustainable energy as each country transforms its technology, to measure the change from the past to the present, and to see how human impact has affected fossil fuel energy resources
- The second time criterion is that I want to challenge myself and use my full potential by choosing a longer period to analyze than the minimum of 10 years required by the assignment
Chapter 2: Processing and creating reports on Excel
After selecting the requirements and criteria in DataBank, we export the raw data to an Excel file as follows:
2.1 Processing data from Excel files
Step 1: Use the TRIM function to remove extra spaces before and after the text in the COUNTRY column
Column A: copied from column B
Note: TRIM function syntax: cell A1 = TRIM(B1)
Inference: the data in column A has had the spaces before and after the text removed
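The same cleaning step can also be sketched in pandas. This is only an illustration: the sample country names are hypothetical, and the column name COUNTRY_CLEAN is an assumption that does not appear in the Excel file. Like TRIM, it collapses repeated spaces inside the text and removes spaces at both ends; unlike TRIM, the pattern `\s+` also matches tabs and other whitespace.

```python
import pandas as pd

# Hypothetical sample: country names with stray spaces, as in column B.
raw = pd.DataFrame({"COUNTRY": ["  Bhutan", "Viet  Nam ", " Japan  "]})

# Collapse runs of whitespace to one space, then strip both ends.
raw["COUNTRY_CLEAN"] = (
    raw["COUNTRY"].str.replace(r"\s+", " ", regex=True).str.strip()
)

print(raw["COUNTRY_CLEAN"].tolist())
```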
Step 2: Select empty cells to enter the desired data and format
Select the data area
Display data: no cells found
Inference: there are no blank cells, so there is no need to enter data or apply the desired format
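As a cross-check outside Excel, blank cells can be counted with pandas. The sketch below uses a small hypothetical table (the values are not taken from the World Bank file) with one deliberately empty cell, so that the count is visible.

```python
import pandas as pd

# Hypothetical sample; None marks an empty cell.
data = pd.DataFrame(
    {
        "Country": ["Bhutan", "Nepal", "Vietnam"],
        "Access to electricity (% of population)": [100.0, None, 99.8],
    }
)

# Number of empty cells per column, and in the whole table.
blank_per_column = data.isna().sum()
total_blank = int(blank_per_column.sum())

print(blank_per_column)
print("Total blank cells:", total_blank)
```

A total of 0 would correspond to the "no cells found" message in Excel.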
Step 3: Use the ISNUMBER function to check whether the data is stored as a number or as text
If the cell is in NUMBER format, the ISNUMBER function displays TRUE, indicating that the format is correct
If the cell is in TEXT format, the ISNUMBER function displays FALSE, indicating that the format is not correct
Result: after entering the ISNUMBER function, all cells display TRUE, indicating that the data is in NUMBER format as required
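A comparable check can be written in Python. This is a minimal sketch on hypothetical values: like ISNUMBER, it returns True only for values stored as numbers, so the text "100" is reported as False.

```python
import numbers

import pandas as pd

# Hypothetical column in which one value was stored as text.
values = pd.Series([97.7, "100", 45.2], dtype=object)

# True for real numbers only; booleans and text count as "not a number".
is_number = values.map(
    lambda v: isinstance(v, numbers.Number) and not isinstance(v, bool)
)

print(is_number.tolist())
```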
Step 4: Color the error cells
Select the data area
Home -> Conditional Formatting -> New Rule -> Format only cells that contain -> under "Format only cells with", select Errors -> choose a pink format
Result: the data range contains no pink cells (pink would mark error types such as #N/A, #VALUE!, #REF!, #NULL!)
Inference: no cell is colored, so the data does not contain error cells
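The error check can also be sketched in pandas, assuming the Excel error codes arrive in the table as text. The table below is hypothetical and contains one error on purpose; the list EXCEL_ERRORS is an illustrative name, not part of any library.

```python
import pandas as pd

# Common Excel error codes, assumed to be stored as text in the table.
EXCEL_ERRORS = ["#N/A", "#VALUE!", "#REF!", "#NULL!", "#DIV/0!", "#NAME?", "#NUM!"]

# Hypothetical sample with one error cell.
table = pd.DataFrame(
    {"2019": [99.1, "#VALUE!"], "2020": [100.0, 98.4]},
    index=["Bhutan", "Nepal"],
)

# True wherever a cell holds one of the error codes.
error_mask = table.isin(EXCEL_ERRORS)
error_count = int(error_mask.to_numpy().sum())

print(error_mask)
print("Error cells:", error_count)
```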
Step 5: Show cells that contain 0
Select File -> Options -> Advanced -> check "Show a zero in cells that have zero value" under "Display options for this worksheet"
Result: there are 2 cells containing 0: H45 and I45
Step 6: Replace empty cells with the character “-”
Select Format Cells -> Number -> Accounting -> Symbol (None)
Result: empty cells are shown as “-”
Step 7: Filter with New Query
Select Data -> New Query -> From File -> From Excel Workbook -> choose the Excel file -> a New Query table is displayed
Start filtering -> click the arrow in the column header -> Remove null values -> OK
The obtained data table has the following form
Remove null values in the remaining columns in the same way. Columns without null values remain the same
After filtering is complete, export the file
Select Close & Load -> Close & Load to export to an Excel page
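For comparison, removing rows with null values can be sketched in pandas with `dropna`. The table and its values are hypothetical, chosen only to show one row being removed.

```python
import pandas as pd

# Hypothetical sample; None marks a null value.
table = pd.DataFrame(
    {
        "Country": ["Bhutan", "Nepal", "Vietnam"],
        "Year": [2006, 2006, 2006],
        "Access to electricity (% of population)": [60.0, None, 96.1],
    }
)

# Drop every row that has a null in any column, then renumber the rows.
cleaned = table.dropna().reset_index(drop=True)

print(cleaned)
```

Passing `subset=[...]` to `dropna` would restrict the check to chosen columns, which is closer to filtering one column at a time in the query editor.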
From there, the data table obtained has the following form:
2.2 Create dashboards and reports that match existing data
2.2.1 Create data reporting dashboard
Question 1: Draw a column chart. From the Data file, create a pivot table from Access to clean fuels and technologies for cooking (% of population) and Access to clean fuels and technologies for cooking, rural (% of rural population)
2.2.2 Calculation of the remaining data (sum, max, min, average, VLOOKUP, ...) is done in the Excel file "Data Analytics", which contains the following questions
1 Total access by country from 2006 - 2020
2 What is the lowest access to clean fuels and technologies for cooking (% of population) in Bhutan, and likewise for the other countries?
From there, find the country with the lowest rate among the countries
3 What is the highest access to clean fuels and technologies for cooking (% of population) for each country, and what is the value?
4 What is the average access to clean fuels and technologies for cooking (% of population) for each country, and what is the value?
5 Find the area codes, abbreviations, and serial numbers of the countries
6 What is the total access to clean fuels and technologies for cooking (% of population) of Bhutan, Vietnam, and Singapore in 2020?
7 ... average and exponential smoothing in 2021
8 Use the exponential smoothing method with alpha = 0.2 to make the forecast for the same period as in question 1
9 Compare the results of questions 7 and 8. Which method gives better results?
The result in question 7 has a smaller MSE value than the result in question 8, so the result in question 7 is better
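The exponential smoothing forecast with alpha = 0.2 and its MSE can be sketched in plain Python. The series below is hypothetical and is not the World Bank data. The sketch assumes that the first forecast equals the first observation and that the MSE is computed from the second period onward; the Excel file may use different conventions.

```python
def exponential_smoothing(actual, alpha):
    """Return forecasts F_1..F_(n+1); the last item is the next-period forecast."""
    forecasts = [actual[0]]  # assumption: F_1 equals the first observation
    for value in actual:
        # F_(t+1) = alpha * A_t + (1 - alpha) * F_t
        forecasts.append(alpha * value + (1 - alpha) * forecasts[-1])
    return forecasts


def mean_squared_error(actual, forecasts):
    """Average squared error over periods 2..n (period 1 has no real forecast)."""
    errors = [(a - f) ** 2 for a, f in zip(actual[1:], forecasts[1:])]
    return sum(errors) / len(errors)


# Hypothetical yearly values for illustration only.
actual = [10.0, 20.0, 30.0]
forecasts = exponential_smoothing(actual, alpha=0.2)
mse = mean_squared_error(actual, forecasts)

print("Forecasts:", forecasts)
print("Next-period forecast:", forecasts[-1])
print("MSE:", mse)
```

Computing the MSE of each method on the same periods is what makes the comparison in question 9 meaningful: the method with the smaller MSE fits the past data more closely.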
2.2.3 Use Conditional Formatting to highlight (Excel file "Data Conditional Formatting")
1 Use Conditional Formatting to highlight cells with values greater than 20 for Bhutan
... 2020
5 Use Conditional Formatting to highlight cells equal to 100% for Bhutan, 2006 - 2020
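The two highlighting rules that are stated in full can be expressed in pandas as boolean masks. The Bhutan values below are hypothetical; the masks only show which cells each rule would select.

```python
import pandas as pd

# Hypothetical Bhutan values by year, for illustration only.
bhutan = pd.Series([18.5, 25.0, 100.0], index=[2006, 2013, 2020], name="Bhutan")

# Rule 1: cells greater than 20. Rule 5: cells equal to 100%.
above_20 = bhutan > 20
equal_100 = bhutan == 100

print(bhutan[above_20])
print(bhutan[equal_100])
```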
Chapter 3: Analyzing processed data on GOOGLE COLAB
3.1 Upload, connect Google drive with Google Colab
After creating the Excel file in chapter 1 and processing it in chapter 2, upload the file to Drive as a CSV
Open Google Drive, create a folder, create a Google Colab file, and upload the file from your computer as file.csv
Generate code to connect Google Drive with Google Colab
Copy the link, then paste it into the code
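A minimal sketch of the connection step is shown below. It assumes the standard Colab mount point /content/drive; the folder name data_folder in the path is hypothetical and should be replaced with the real folder. Outside Colab the `google.colab` module is not available, so the sketch skips the mount instead of failing.

```python
try:
    from google.colab import drive  # available only inside Google Colab
except ImportError:
    drive = None

MOUNT_POINT = "/content/drive"
# Hypothetical path: "data_folder" stands for the folder created in Drive.
CSV_PATH = MOUNT_POINT + "/MyDrive/data_folder/file.csv"


def mount_drive():
    """Mount Google Drive in Colab; return False when not running in Colab."""
    if drive is None:
        print("Not running in Colab; skipping mount.")
        return False
    drive.mount(MOUNT_POINT)  # Colab asks for authorization here
    return True


mounted = mount_drive()
print("CSV expected at:", CSV_PATH)
```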
3.2 Run the commands to read the file with pandas, then tell the size of the tuple
The data set includes 14 columns and 5 rows
It includes columns containing data about country, year, and criteria such as:
• Access to clean fuels and technologies for cooking (% of population)
• Access to clean fuels and technologies for cooking, rural (% of rural population)
• Access to clean fuels and technologies for cooking, urban (% of urban population)
• Access to electricity (% of population)
• Access to electricity, rural (% of rural population)
• Access to electricity, urban (% of urban population)
• Adjusted savings: carbon dioxide damage (% of GNI)
• Adjusted savings: consumption of fixed capital (% of GNI)
• Adjusted net savings, including particulate emission damage (% of GNI)
• Adjusted net savings, excluding particulate emission damage (% of GNI)
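Reading the file and reporting its size can be sketched as follows. To keep the example self-contained, it reads a small hypothetical CSV from a string instead of the real file; in Colab the path of file.csv on the mounted drive would be passed to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for file.csv; the values are illustrative.
csv_text = """Country,Year,Access to electricity (% of population)
Bhutan,2019,100.0
Nepal,2019,89.9
"""

df = pd.read_csv(io.StringIO(csv_text))

# df.shape is a tuple: (number of rows, number of columns).
n_rows, n_cols = df.shape

print(df.head())
print(f"{n_rows} rows x {n_cols} columns")
```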
3.3 Description of the data fields
From the above data we can see that there are many different columns and rows, and that they are organized much more clearly according to the 10 criteria outlined above
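A description of the data fields is usually obtained with `dtypes` and `describe()`. The sketch below uses a hypothetical three-row table; on the real file the same two calls would list the type and the summary statistics of every column.

```python
import pandas as pd

# Hypothetical sample, for illustration only.
df = pd.DataFrame(
    {
        "Country": ["Bhutan", "Nepal", "Vietnam"],
        "Year": [2020, 2020, 2020],
        "Access to electricity (% of population)": [100.0, 90.0, 80.0],
    }
)

# Data type of each field.
print(df.dtypes)

# Count, mean, std, min, quartiles, and max of the numeric fields.
summary = df.describe()
print(summary)
```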
3.4 Plot the distribution chart for at least 2 columns, comment
In general, with the raw data the input is simpler and the results are concise. In contrast, with the cleaned data the code is more complex, but the results are easy to read, and the chart is relatively balanced with consistent colors
The data presented in chart form has a harmonious color distribution, and the relative proportions are approximately the same
The chart has 5 columns showing access to clean fuels and technologies for 5 different countries. The highest percentage is Singapore at 100%, followed by Japan, Bhutan, and Vietnam; the lowest is Nepal, at a rate of 20%
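A distribution chart for two columns can be sketched with matplotlib histograms. The values below are hypothetical and do not reproduce the charts in this report; the number of bins is also an arbitrary choice.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values for illustration only, not the World Bank figures.
df = pd.DataFrame(
    {
        "Access to electricity (% of population)": [88.0, 92.0, 97.0, 100.0, 100.0],
        "Access to clean fuels and technologies for cooking (% of population)": [
            20.0, 45.0, 70.0, 100.0, 100.0,
        ],
    }
)

# One histogram per column, side by side.
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, column in zip(axes, df.columns):
    ax.hist(df[column], bins=5)
    ax.set_title(column, fontsize=8)
    ax.set_xlabel("% of population")
    ax.set_ylabel("Number of observations")
fig.tight_layout()
```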
3.5 Draw at least 2 arbitrary graphs using seaborn, comment
The data is shown as a box plot using the command “sns.boxplot”. Although the data on Access to electricity is fairly simple, we can see that the figures are in the range of 90 - 100%
The data is shown by 4 dotted lines from bottom to top using the command "sns.lmplot". The countries differ from year to year, but in general access to clean fuels and technology has changed and grown over the years
The data is shown by 4 dotted lines from bottom to top using the command “sns.lmplot”. The countries differ from year to year, but in general access to clean fuels and technology has changed and grown unevenly over the years
From the command "pairplot" we get an overview of many different shapes, but most of them show small to medium differences
With the command "sns.heatmap" applied to the matrix, we see that there are differences between the criteria; most of the values are between 0.1 and 1