VIETNAM-KOREA UNIVERSITY OF INFORMATION AND
COMMUNICATION TECHNOLOGY
COMPUTER SCIENCE DEPARTMENT
DATA WAREHOUSE
ANALYSIS OF E-COMMERCE DATA
LE VIET TOAN
LE HOANG TRUNG
LA HOANG NHAT Y
NGUYEN TIEN THANH
Da Nang, November 2024
VIETNAM-KOREA UNIVERSITY OF INFORMATION AND
COMMUNICATION TECHNOLOGY
COMPUTER SCIENCE DEPARTMENT
DATA WAREHOUSE
ANALYSIS OF E-COMMERCE DATA
Students: Le Xuan Tuyen
Instructor: M.Sc. Tran Thanh Liem
Da Nang, November 2024
ACKNOWLEDGEMENTS
This report for the “Data Warehouse” course, on the topic of “E-commerce Data Analysis,” is the result of our group’s efforts, supported and encouraged by our teachers and friends. We would like to express our heartfelt thanks to everyone who helped us during this period of study and research.
First, we extend our sincere gratitude to our lecturer and instructor, Mr. Tran Thanh Liem, M.Sc., for his direct guidance and dedicated instruction throughout this report.
We would also like to express our sincere thanks to the professors of the Faculty of Information Technology, Vietnam-Korea University of Information and Communication Technology, who have taught and assisted us throughout our university years. The knowledge and experience we have gained will be invaluable assets that guide us toward success in the future.
Da Nang, November 2024
ADVISOR’S FEEDBACK
Da Nang, …/…/2024
Instructor
(Sign and full name)
TABLE OF CONTENTS
ACKNOWLEDGEMENTS
ADVISOR’S FEEDBACK
TABLE OF CONTENTS
LIST OF TABLES
IMAGE CATALOG
LIST OF ABBREVIATIONS
INTRODUCTION
1 Reason for choosing the topic
2 Project Goals
3 Project Orientation
4 Report Structure
CHAPTER 1: OVERVIEW OF THE DATASET
1.1 Data Source
1.2 Detailed Dataset Description
1.2.1 Dataset Parameters:
1.2.2 Extracted Data for Warehouse Construction
1.2.3 Detailed Description of Dataset Attributes
1.3 Introduction to tools used
1.3.1 Visual Studio 2022
1.3.2 SQL Server Management Studio (SSMS)
1.3.3 SQL Server Integration Services (SSIS)
1.3.4 SSAS (SQL Server Analysis Services)
1.3.5 SSRS (SQL Server Reporting Services)
1.3.6 MDX Language
CHAPTER 2: ANALYSIS AND DESIGN OF THE E-COMMERCE DATA WAREHOUSE
2.1 Logical Modeling
2.1.1 Facts
2.1.2 Dimension
2.2.3 Constellation Model
CHAPTER 3: DATA INTEGRATION INTO THE WAREHOUSE (SSIS)
3.1 Creating a New SSIS Project
3.2 Data Loading Process from Excel to Database
3.2.1 Creating an Excel Connection Manager
3.2.2 Creating an OLE DB Connection
3.2.3 Creating Control Flow
3.2.4 Project Execution Results
CHAPTER 4: DATA ANALYSIS (SSAS)
4.1 Query List
4.1.1 Using SSAS
4.1.2 Using MDX Query Language
4.2 Model Construction Results
4.3 Cube Construction Process
4.4 Executing Queries (SSAS, MDX)
CHAPTER 5: REPORTING (SSRS)
5.1 Create SSRS project
5.2 Reports
5.2.1 Statistics of total shipping costs over the years
5.2.2 Statistics of delivery volume by city from 2016-2018
5.2.3 Report on customer reviews of products
CONCLUSION
1 Results achieved
2 Limitations
REFERENCES
LIST OF TABLES
Table 1 Dataset attribute description
IMAGE CATALOG
Picture 1 Data after extract 1
Picture 2 Data after extract 2
Picture 3 Visual Studio 2022
Picture 4 SQL Server Management Studio
Picture 5 SQL Server Integration Services (SSIS)
Picture 6 SSAS (SQL Server Analysis Services)
Picture 7 SSRS (SQL Server Reporting Services)
Picture 8 Fact Shipping
Picture 9 Fact Review
Picture 10 Dim Products
Picture 11 Dim Customers
Picture 12 Dim Sellers
Picture 13 Dim Time
Picture 14 Constellation Model
Picture 15 Process of creating new SSIS project 1
Picture 16 Process of creating new SSIS project 2
Picture 17 Excel Connection Manager Creation Process 1
Picture 18 Excel Connection Manager Creation Process 2
Picture 19 Excel Connection Manager Creation Process 3
Picture 20 OLE DB Connection Creation Process 1
Picture 21 OLE DB Connection Creation Process 2
Picture 22 OLE DB Connection Creation Process 3
Picture 23 OLE DB Connection Creation Process 4
Picture 24 OLE DB Connection Creation Process 5
Picture 25 OLE DB Connection Creation Process 6
Picture 26 Control Flow Creation Process 1
Picture 27 Control Flow Creation Process 2
Picture 28 Control Flow Creation Process 3
Picture 29 Control Flow Creation Process 4
Picture 30 Control Flow Creation Process 5
Picture 31 Control Flow Creation Process 6
Picture 32 Control Flow Creation Process 7
Picture 33 Control Flow Creation Process 8
Picture 34 Control Flow Creation Process 9
Picture 35 Control Flow Creation Process 10
Picture 36 Control Flow Creation Process 11
Picture 37 Complete creating links to store data
Picture 38 Project run results
Picture 39 Model Construction Results
Picture 40 Cube Fact_Shipping
Picture 41 Cube Fact_Review
Picture 42 Use SSAS tool for query 1
Picture 43 Use MDX query language for query 1
Picture 44 Use SSAS tool for query 2
Picture 45 Use MDX query language for query 2
Picture 46 Use SSAS tool for query 3
Picture 47 Use MDX query language for query 3
Picture 48 Use SSAS tool for query 4
Picture 49 Use MDX query language for query 4
Picture 50 Use SSAS tool for query 5
Picture 51 Use MDX query language for query 5
Picture 52 Create SSRS project
Picture 53 Create SSRS project
Picture 54 Create SSRS project
Picture 55 Create SSRS project
Picture 56 Create SSRS project
Picture 57 Statistics of total shipping costs over the years
Picture 58 Statistics of total shipping costs over the years
Picture 59 Statistics of delivery volume by city from 2016-2018
Picture 60 Statistics of delivery volume by city from 2016-2018
Picture 61 Report on customer reviews of products
Picture 62 Report on customer reviews of products
LIST OF ABBREVIATIONS
1 SSMS SQL Server Management Studio
2 SSIS SQL Server Integration Services
3 SSRS SQL Server Reporting Services
4 API Application Programming Interface
5 GUI Graphical User Interface
6 IDE Integrated Development Environment
7 UI User Interface
8 SQL Structured Query Language
9 ETL Extract, Transform, Load
10 SSAS SQL Server Analysis Services
11 MDX Multidimensional Expressions
12 OLAP Online Analytical Processing
13 OLTP Online Transaction Processing
INTRODUCTION
A Data Warehouse (DW) is a specialized data management and storage system designed to support business decision-making. It collects, stores, and manages data from multiple sources, such as daily transaction systems, operational databases, or external sources. The primary purpose of a data warehouse is to provide a comprehensive and integrated view of business data, enabling managers and analysts to make accurate decisions based on detailed and well-structured information.
In today’s business environment, where the volume of data is growing rapidly, companies need an efficient way to collect and process data. Data warehouses provide an effective solution for aggregating data from different systems, standardizing it, and making it available for analysis and reporting. Unlike traditional operational databases, which are typically designed to support daily transactions, data warehouses are designed specifically to support long-term data analysis.
1 Reason for choosing the topic
With the rapid growth of e-commerce, understanding the factors that affect order completion and customer satisfaction has become essential to improving efficiency and customer retention. This analysis focuses on identifying trends in order processing time, shipping costs, and customer feedback. By exploring data on orders, shipping, and reviews, this research aims to uncover insights for optimizing e-commerce operations.
3 Project Orientation
- Collect and standardize data from multiple sources
- Design an appropriate data model
- Develop an ETL (Extract, Transform, Load) system
- Build a query and reporting system
- Ensure data security and privacy
- Integrate advanced data analysis
- Optimize the data warehouse
4 Report Structure
After the introduction, the report is structured into five chapters as follows:
- Chapter 1 Overview of the Dataset
- Chapter 2 Analysis and Design of the E-commerce Data Warehouse
- Chapter 3 Data Integration into the Warehouse (SSIS)
- Chapter 4 Data Analysis (SSAS)
- Chapter 5 Reporting (SSRS)
The report concludes with a summary and references.
CHAPTER 1: OVERVIEW OF THE DATASET
1.1 Data Source
The data was collected from the website kaggle.com
(Source: https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce)
1.2 Detailed Dataset Description
This dataset represents Brazilian e-commerce orders fulfilled by Olist Store. It includes information on 100,000 orders from 2016 to 2018, covering various marketplaces in Brazil. It enables multidimensional views of orders: from order status, prices, payments, and shipping performance to customer locations, product attributes, and reviews written by customers.
1.2.1 Dataset Parameters:
The dataset comprises 52 columns and approximately 500,000 rows. The data originates from the Olist Store platform in Brazil and is published on Kaggle.
1.2.2 Extracted Data for Warehouse Construction
Approximately 112,651 rows and 20 columns were extracted to construct the e-commerce data warehouse.
Picture 1 Data after extract 1
Picture 2 Data after extract 2
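The extracted table combines attributes that live in separate Olist source files (orders, order items, customers, products, sellers, reviews). One plausible way to produce such a flat table is a denormalizing join at order-item grain, sketched below in plain Python. This is an illustration only, not the group's extraction procedure: the in-memory tables, their made-up values, and the helper name `flatten_order_items` are hypothetical, and the key names (`order_id`, `customer_id`, `product_id`, `seller_id`) are assumed to match the public Olist files.

```python
# Hypothetical miniature source tables (made-up values), keyed by their IDs.
orders = {"o1": {"customer_id": "c1", "order_status": "delivered"}}
customers = {"c1": {"customer_city": "sao paulo", "customer_state": "SP"}}
products = {"p1": {"product_category_name": "category_a"}}
sellers = {"s1": {"seller_city": "campinas", "seller_state": "SP"}}
reviews_by_order = {"o1": {"review_score": 5}}
order_items = [
    {"order_id": "o1", "product_id": "p1", "seller_id": "s1",
     "price": 58.9, "freight_value": 13.29},
]


def flatten_order_items(items, orders, customers, products, sellers, reviews):
    """Return one flat row per order item (order-item grain).

    Order, customer, product and seller keys are assumed to resolve;
    reviews are optional, so items without one simply lack review columns.
    """
    rows = []
    for item in items:
        order = orders[item["order_id"]]
        row = {"price": item["price"], "freight_value": item["freight_value"]}
        row.update(order)                              # status, customer_id
        row.update(customers[order["customer_id"]])    # customer location
        row.update(products[item["product_id"]])       # product attributes
        row.update(sellers[item["seller_id"]])         # seller location
        row.update(reviews.get(item["order_id"], {}))  # review, if any
        rows.append(row)
    return rows


flat = flatten_order_items(order_items, orders, customers, products,
                           sellers, reviews_by_order)
print(flat[0])
```

Keeping one row per order item means an order with several products contributes several rows, which is why the extracted row count can exceed the number of orders.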
1.2.3 Detailed Description of Dataset Attributes
Attribute                          Description
product_category_name              Product category name
product_weight_g                   Product weight (unit: grams)
product_length_cm                  Product length (unit: cm)
product_height_cm                  Product height (unit: cm)
product_width_cm                   Product width (unit: cm)
price                              Product price
freight_value                      Shipping cost
order_status                       Order status
order_purchase_timestamp           Order purchase time
order_delivered_customer_date      Delivery date to the customer
order_estimated_delivery_date      Estimated delivery date
customer_zip_code_prefix           Customer zip code prefix
customer_city                      Customer city
customer_state                     Customer state
seller_zip_code_prefix             Seller zip code prefix
seller_city                        Seller city
seller_state                       Seller state
review_score                       Product review score (scale from 1 to 5)
review_comment_title               Review comment title
review_comment_message             Review content
Table 1 Dataset attribute description
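Order processing time and delivery delays, two of the trends the project targets, can be derived directly from the three timestamp attributes in Table 1. A minimal sketch, assuming the timestamps are strings in `YYYY-MM-DD HH:MM:SS` form and that undelivered orders have an empty or missing delivery date; the function name `delivery_metrics` is hypothetical and not part of the project's SSIS packages.

```python
from datetime import datetime
from typing import Optional, Tuple

# Assumed timestamp layout, e.g. "2017-10-02 10:56:33".
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def delivery_metrics(purchase: str, delivered: Optional[str],
                     estimated: str) -> Optional[Tuple[int, int]]:
    """Return (delivery_days, days_late) for one order.

    delivery_days: calendar days from purchase to delivery to the customer.
    days_late: calendar days past the estimated date (negative means early).
    Returns None when the order has no delivery date yet.
    """
    if not delivered:
        return None
    bought = datetime.strptime(purchase, TS_FORMAT).date()
    arrived = datetime.strptime(delivered, TS_FORMAT).date()
    promised = datetime.strptime(estimated, TS_FORMAT).date()
    return (arrived - bought).days, (arrived - promised).days


print(delivery_metrics("2017-10-02 10:56:33",
                       "2017-10-10 21:25:13",
                       "2017-10-18 00:00:00"))
```

Comparing calendar dates rather than full timestamps keeps the result in whole days, which matches the day-level granularity typically used for reporting.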
1.3 Introduction to tools used
1.3.1 Visual Studio 2022
Microsoft Visual Studio is an integrated development environment (IDE) from Microsoft. It is used to develop computer programs for Microsoft Windows, as well as websites, web applications, and web services.
Integrated technologies: Business Intelligence and SQL Server Data Tools (SSDT).
Picture 3 Visual Studio 2022
1.3.2 SQL Server Management Studio (SSMS)
SQL Server Management Studio (SSMS) is an integrated environment made up of many components for managing SQL Server. It provides an interface for connecting to Microsoft SQL Server and working with its databases.
Picture 4 SQL Server Management Studio
1.3.3 SQL Server Integration Services (SSIS)
SQL Server Integration Services (SSIS) is a component of the Microsoft SQL Server suite for building data integration solutions. Beyond basic extraction, transformation, and loading (ETL), it provides a complete platform that lets developers and data architects build robust, scalable, and maintainable data integration packages.
Picture 5 SQL Server Integration Services (SSIS)
1.3.4 SSAS (SQL Server Analysis Services)
SSAS is an OLAP (Online Analytical Processing) and data mining tool. It allows users to create data models and to summarize and analyze large data sets. SSAS organizes and compresses data into multidimensional structures, called "cubes," for fast data retrieval. In addition, SSAS supports multidimensional and predictive analysis, which helps generate results and build forecasting models from data.
Picture 6 SSAS (SQL Server Analysis Services)
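To make the idea of a cube concrete, the sketch below computes, in plain Python, a measure for every combination of two dimensions over a few made-up fact rows: the full detail, each single-dimension roll-up, and the grand total. A dimension left out of a grouping is rolled up to "ALL". This only illustrates the concept of pre-aggregation; SSAS uses its own storage engine and aggregation design, which this code does not model, and the names `build_cube`, `year`, and `state` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: two dimension attributes plus one measure.
facts = [
    {"year": 2017, "state": "SP", "freight_value": 10.0},
    {"year": 2017, "state": "RJ", "freight_value": 20.0},
    {"year": 2018, "state": "SP", "freight_value": 5.0},
]
DIMENSIONS = ("year", "state")


def build_cube(rows, dims, measure):
    """Sum `measure` for every subset of `dims` (all group-bys of a cube)."""
    cube = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            totals = defaultdict(float)
            for row in rows:
                # Dimensions outside the current grouping collapse to "ALL".
                key = tuple(row[d] if d in group else "ALL" for d in dims)
                totals[key] += row[measure]
            cube.update(totals)
    return cube


cube = build_cube(facts, DIMENSIONS, "freight_value")
print(cube[(2017, "ALL")])   # freight for 2017 across all states
print(cube[("ALL", "ALL")])  # grand total
```

Once these totals exist, answering "total freight per year" is a lookup rather than a scan of the fact rows, which is the reason cubes speed up analytical queries.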
1.3.5 SSRS (SQL Server Reporting Services)
SSRS is a reporting tool for creating, managing, and distributing reports from SQL Server or other data sources. It supports reports ranging from simple to complex, in formats such as tables, charts, and dynamic charts. Users can customize and share these reports, and schedule them to be delivered automatically at regular intervals.
Picture 7 SSRS (SQL Server Reporting Services)
1.3.6 MDX Language
MDX (Multidimensional Expressions) is a query language with SQL-like syntax, used to query OLAP data warehouses built on multidimensional cubes in support of information synthesis and decision making.
Basic structure of MDX query language:
- Purpose: MDX retrieves and aggregates data from multidimensional cubes rather than from relational tables.
- Structure: similar to an SQL query on a relational database, but extended for multidimensional cubes. The SELECT clause places sets of members on axes such as COLUMNS and ROWS, the FROM clause names the cube, and an optional WHERE clause acts as a slicer that restricts the query to part of the cube.
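The clause structure described above can be shown by assembling a query string from its parts. A minimal sketch in Python; the helper `build_mdx` and the cube, measure, and hierarchy names in the usage example are hypothetical placeholders, not the names used in this project's cubes.

```python
from typing import Optional


def build_mdx(cube: str, columns: str, rows: str,
              slicer: Optional[str] = None) -> str:
    """Assemble a basic MDX SELECT: axes, cube, and optional slicer."""
    query = (
        f"SELECT {columns} ON COLUMNS,\n"
        f"       {rows} ON ROWS\n"
        f"FROM [{cube}]"
    )
    if slicer:
        # The WHERE clause slices the cube; it does not filter rows like SQL.
        query += f"\nWHERE {slicer}"
    return query


print(build_mdx(
    cube="Fact Shipping",
    columns="{[Measures].[Freight Value]}",
    rows="{[Dim Time].[Year].Members}",
    slicer="([Dim Customers].[Customer State].[SP])",
))
```

The printed query asks for the freight measure on the columns axis, one row per year, restricted to one customer state; the same shape applies to any cube once the real member names are substituted.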
CHAPTER 2: ANALYSIS AND DESIGN OF THE E-COMMERCE DATA WAREHOUSE
2.1 Logical Modeling
2.1.1 Facts
Picture 8 Fact Shipping
Fact_Review: This fact table supports analysis of product quality based on customer reviews, helping assess customer satisfaction with purchased products.
Picture 9 Fact Review
2.1.2 Dimension
- Dim Products: contains product information
Picture 10 Dim Products
- Dim Customers: contains customer addresses.
Picture 11 Dim Customers
- Dim Sellers: contains seller addresses
Picture 12 Dim Sellers
- Dim Time: contains information on order time and review time
Picture 13 Dim Time
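Because Dim Time serves both order time and review time, each timestamp has to be broken down into the attributes used for drilling down (year, quarter, month, day). A minimal sketch, assuming a daily grain, a `YYYYMMDD` integer key, and timestamps in `YYYY-MM-DD HH:MM:SS` form; the actual columns of Dim Time are the ones in Picture 13, so the attribute names and the function `time_dim_row` here are hypothetical.

```python
from datetime import datetime


def time_dim_row(timestamp: str) -> dict:
    """Derive one Dim Time row from a 'YYYY-MM-DD HH:MM:SS' timestamp."""
    ts = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return {
        "time_key": int(ts.strftime("%Y%m%d")),  # readable key, e.g. 20171002
        "date": ts.date().isoformat(),
        "day": ts.day,
        "month": ts.month,
        "quarter": (ts.month - 1) // 3 + 1,      # months 1-3 -> Q1, etc.
        "year": ts.year,
    }


print(time_dim_row("2017-10-02 10:56:33"))
```

A date-based key lets the order and review facts point at the same Dim Time row whenever their events fall on the same day.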
2.2.3 Constellation Model
Picture 14 Constellation Model
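In a constellation (galaxy) schema, several fact tables share dimension tables. The sketch below expresses that idea as DDL run through Python's standard `sqlite3` module, then lists the dimensions that both facts reference. The column lists, and which dimensions each fact references, are assumptions for illustration only; the group's actual model is the one in Picture 14, and the warehouse itself is built in SQL Server, not SQLite.

```python
import sqlite3

# Hypothetical DDL: two fact tables sharing dimension tables.
SCHEMA = """
CREATE TABLE Dim_Products  (product_key  INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE Dim_Customers (customer_key INTEGER PRIMARY KEY, city TEXT, state TEXT);
CREATE TABLE Dim_Sellers   (seller_key   INTEGER PRIMARY KEY, city TEXT, state TEXT);
CREATE TABLE Dim_Time      (time_key     INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

CREATE TABLE Fact_Shipping (
    product_key   INTEGER REFERENCES Dim_Products(product_key),
    customer_key  INTEGER REFERENCES Dim_Customers(customer_key),
    seller_key    INTEGER REFERENCES Dim_Sellers(seller_key),
    time_key      INTEGER REFERENCES Dim_Time(time_key),
    freight_value REAL
);
CREATE TABLE Fact_Review (
    product_key  INTEGER REFERENCES Dim_Products(product_key),
    customer_key INTEGER REFERENCES Dim_Customers(customer_key),
    time_key     INTEGER REFERENCES Dim_Time(time_key),
    review_score INTEGER
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)


def referenced_dimensions(conn, fact):
    """Return the tables a fact table references through foreign keys."""
    # Column 2 of PRAGMA foreign_key_list is the referenced (parent) table.
    return {row[2] for row in conn.execute(f"PRAGMA foreign_key_list({fact})")}


shared = (referenced_dimensions(conn, "Fact_Shipping")
          & referenced_dimensions(conn, "Fact_Review"))
print(sorted(shared))
```

Sharing conformed dimensions is what lets shipping and review measures be compared along the same product, customer, and time axes.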
CHAPTER 3: DATA INTEGRATION INTO THE WAREHOUSE (SSIS)
In this chapter, SSIS (SQL Server Integration Services) is used to build data integration packages and to automate data processing. Queries are written to transfer data from the OLTP (Online Transaction Processing) source to the data warehouse.
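The extract, transform, and load steps that the SSIS packages perform can be summarized with a small Python sketch using the standard `sqlite3` module: rows are extracted from a source, lightly cleaned, mapped to a surrogate key in a dimension table, and loaded into a fact table. Table names, column names, and sample values are hypothetical, and SQLite stands in for SQL Server only to keep the example self-contained; this is not how the project's SSIS packages are implemented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dim_Customers (
    customer_key INTEGER PRIMARY KEY AUTOINCREMENT,
    city TEXT, state TEXT, UNIQUE (city, state));
CREATE TABLE Fact_Shipping (customer_key INTEGER, freight_value REAL);
""")

# Extract: hypothetical source rows, as they might arrive from the OLTP side.
source_rows = [
    {"customer_city": "Sao Paulo ", "customer_state": "sp", "freight_value": 13.29},
    {"customer_city": "sao paulo", "customer_state": "SP", "freight_value": 19.93},
    {"customer_city": "rio de janeiro", "customer_state": "RJ", "freight_value": 17.87},
]


def customer_key(conn, city, state):
    """Look up the surrogate key for (city, state), inserting it if new."""
    conn.execute("INSERT OR IGNORE INTO Dim_Customers (city, state) VALUES (?, ?)",
                 (city, state))
    return conn.execute(
        "SELECT customer_key FROM Dim_Customers WHERE city = ? AND state = ?",
        (city, state)).fetchone()[0]


for row in source_rows:
    # Transform: standardize text so equal locations map to one dimension row.
    key = customer_key(conn, row["customer_city"].strip().lower(),
                       row["customer_state"].strip().upper())
    # Load: insert the fact row, referencing the dimension by surrogate key.
    conn.execute("INSERT INTO Fact_Shipping VALUES (?, ?)",
                 (key, row["freight_value"]))
conn.commit()
```

Standardizing before the key lookup is what keeps the first two rows, which spell the same city differently, from creating duplicate dimension members.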
3.1 Creating a New SSIS Project
Open Visual Studio 2022 -> Select "Create a new project".
Picture 15 Process of creating new SSIS project 1
In the search box, type "Integration Services Project" and select it to start a new SSIS project.
Picture 16 Process of creating new SSIS project 2
3.2 Data Loading Process from Excel to Database
3.2.1 Creating an Excel Connection Manager
Right-click and select "New File Connection"
Picture 17 Excel Connection Manager Creation Process 1
In the dialog box, select "Excel" -> Click "Add".
Picture 18 Excel Connection Manager Creation Process 2
Select the path to the Excel file and specify the file version -> Click "OK"
Picture 19 Excel Connection Manager Creation Process 3
3.2.2 Creating an OLE DB Connection
Right-click and select "New OLE DB Connection"
Picture 20 OLE DB Connection Creation Process 1
A new window appears -> Click "New" to create a new connection
Picture 21 OLE DB Connection Creation Process 2