DATA WAREHOUSE ANALYSIS OF E-COMMERCE DATA

VIETNAM-KOREA UNIVERSITY OF INFORMATION AND COMMUNICATION TECHNOLOGY

COMPUTER SCIENCE DEPARTMENT

DATA WAREHOUSE

ANALYSIS OF E-COMMERCE DATA

LE VIET TOAN

LE HOANG TRUNG

LA HOANG NHAT Y

NGUYEN TIEN THANH

Da Nang, November 2024

Students: Le Xuan Tuyen

Instructor: MSc. Tran Thanh Liem

ACKNOWLEDGEMENTS

This report for the “Data Warehouse” course, on the topic of “E-commerce Data Analysis,” is the result of our group’s efforts, supported and encouraged by our teachers and friends. We would like to express our heartfelt thanks to everyone who helped us during this period of study and research.

First, we extend our sincere gratitude to our lecturer and instructor, MSc. Tran Thanh Liem, for his direct guidance and dedicated instruction throughout this report.

We would also like to thank the lecturers of the Faculty of Information Technology, Vietnam - Korea University of Information and Communication Technology, who have taught and assisted us throughout our university years. The knowledge and experience we gained will be invaluable assets that guide us toward success in the future.

Da Nang, November 2024

ADVISOR’S FEEDBACK

Da Nang, …/…/2024

Instructor

(Sign and full name)

TABLE OF CONTENTS

ACKNOWLEDGEMENTS

ADVISOR’S FEEDBACK

TABLE OF CONTENTS

LIST OF TABLES

IMAGE CATALOG

LIST OF ABBREVIATIONS

INTRODUCTION

1 Reason for choosing the topic
2 Project Goals
3 Project Orientation
4 Report Structure

CHAPTER 1: OVERVIEW OF THE DATASET
1.1 Data Source
1.2 Detailed Dataset Description
1.2.1 Dataset Parameters
1.2.2 Extracted Data for Warehouse Construction
1.2.3 Detailed Description of Dataset Attributes
1.3 Introduction to tools used
1.3.1 Visual Studio 2022
1.3.2 SQL Server Management Studio (SSMS)
1.3.3 SQL Server Integration Services (SSIS)
1.3.4 SSAS (SQL Server Analysis Services)
1.3.5 SSRS (SQL Server Reporting Services)
1.3.6 MDX Language

CHAPTER 2: ANALYSIS AND DESIGN OF THE E-COMMERCE DATA WAREHOUSE
2.1 Logical Modeling
2.1.1 Facts
2.1.2 Dimension
2.2.3 Constellation Model

CHAPTER 3: DATA INTEGRATION INTO THE WAREHOUSE (SSIS)
3.1 Creating a New SSIS Project
3.2 Data Loading Process from Excel to Database
3.2.1 Creating an Excel Connection Manager
3.2.2 Creating an OLE DB Connection
3.2.3 Creating Control Flow
3.2.4 Project Execution Results

CHAPTER 4: DATA ANALYSIS (SSAS)
4.1 Query List
4.1.1 Using SSAS
4.1.2 Using MDX Query Language
4.2 Model Construction Results
4.3 Cube Construction Process
4.4 Executing Queries (SSAS, MDX)

CHAPTER 5: REPORTING (SSRS)
5.1 Create SSRS project
5.2 Reports
5.2.1 Statistics of total shipping costs over the years
5.2.2 Statistics of delivery volume by city from 2016-2018
5.2.3 Report on customer reviews of products

CONCLUSION
1 Results achieved
2 Limitations

REFERENCES

LIST OF TABLES

Table 1 Dataset attribute description

IMAGE CATALOG

Picture 1 Data after extract 1
Picture 2 Data after extract 2
Picture 3 Visual Studio 2022
Picture 4 SQL Server Management Studio
Picture 5 SQL Server Integration Services (SSIS)
Picture 6 SSAS (SQL Server Analysis Services)
Picture 7 SSRS (SQL Server Reporting Services)
Picture 8 Fact Shipping
Picture 9 Fact Review
Picture 10 Dim Products
Picture 11 Dim Customers
Picture 12 Dim Sellers
Picture 13 Dim Time
Picture 14 Constellation Model
Picture 15 Process of creating new SSIS project 1
Picture 16 Process of creating new SSIS project 2
Picture 17 Excel Connection Manager Creation Process 1
Picture 18 Excel Connection Manager Creation Process 2
Picture 19 Excel Connection Manager Creation Process 3
Picture 20 OLE DB Connection Creation Process 1
Picture 21 OLE DB Connection Creation Process 2
Picture 22 OLE DB Connection Creation Process 3
Picture 23 OLE DB Connection Creation Process 4
Picture 24 OLE DB Connection Creation Process 5
Picture 25 OLE DB Connection Creation Process 6
Picture 26 Control Flow Creation Process 1
Picture 27 Control Flow Creation Process 2
Picture 28 Control Flow Creation Process 3
Picture 29 Control Flow Creation Process 4
Picture 30 Control Flow Creation Process 5
Picture 31 Control Flow Creation Process 6
Picture 32 Control Flow Creation Process 7
Picture 33 Control Flow Creation Process 8
Picture 34 Control Flow Creation Process 9
Picture 35 Control Flow Creation Process 10
Picture 36 Control Flow Creation Process 11
Picture 37 Complete creating links to store data
Picture 38 Project run results
Picture 39 Model Construction Results
Picture 40 Cube Fact_Shipping
Picture 41 Cube Fact_Review
Picture 42 Use SSAS tool for query 1
Picture 43 Use MDX query language for query 1
Picture 44 Use SSAS tool for query 2
Picture 45 Use MDX query language for query 2
Picture 46 Use SSAS tool for query 3
Picture 47 Use MDX query language for query 3
Picture 48 Use SSAS tool for query 4
Picture 49 Use MDX query language for query 4
Picture 50 Use SSAS tool for query 5
Picture 51 Use MDX query language for query 5
Picture 52 Create SSRS project
Picture 53 Create SSRS project
Picture 54 Create SSRS project
Picture 55 Create SSRS project
Picture 56 Create SSRS project
Picture 57 Statistics of total shipping costs over the years
Picture 58 Statistics of total shipping costs over the years
Picture 59 Statistics of delivery volume by city from 2016-2018
Picture 60 Statistics of delivery volume by city from 2016-2018
Picture 61 Report on customer reviews of products
Picture 62 Report on customer reviews of products

LIST OF ABBREVIATIONS

1 SSMS SQL Server Management Studio

2 SSIS SQL Server Integration Services

3 SSRS SQL Server Reporting Services

4 API Application Programming Interface

5 GUI Graphical User Interface

6 IDE Integrated Development Environment

7 UI User Interface

8 SQL Structured Query Language

9 ETL Extract, Transform, Load

INTRODUCTION

A Data Warehouse (DW) is a specialized data management and storage system designed to support business decision-making processes. It collects, stores, and manages data from multiple sources, such as daily transaction systems, databases, or external sources. The primary purpose of a data warehouse is to provide a comprehensive and integrated view of business data, enabling managers and analysts to make accurate decisions based on detailed and well-structured information.

In today’s business environment, where the volume of data is rapidly increasing, companies need an efficient approach to collect and process data. Data warehouses provide an effective solution for aggregating data from different systems, standardizing it, and making it available for analysis and reporting. Unlike traditional databases, which are typically focused on supporting daily operations, data warehouses are specifically designed to support long-term data analysis.

1 Reason for choosing the topic

With the rapid growth of e-commerce, understanding the factors affecting order completion and customer satisfaction has become essential to improving efficiency and customer retention. This analysis focuses on identifying trends in order processing time, shipping costs, and customer feedback. By exploring data on orders, shipping, and reviews, this research aims to uncover valuable insights for optimizing e-commerce operations.

3 Project Orientation

- Collect and standardize data from multiple sources

- Design an appropriate data model

- Develop an ETL (Extract, Transform, Load) system

- Build a query and reporting system

- Ensure data security and privacy

- Integrate advanced data analysis

- Optimize the data warehouse

4 Report Structure

After the introduction, the report is structured into five chapters as follows:

- Chapter 1: Overview of the Dataset

- Chapter 2: Analysis and Design of the E-commerce Data Warehouse

- Chapter 3: Data Integration into the Warehouse (SSIS)

- Chapter 4: Data Analysis (SSAS)

- Chapter 5: Reporting (SSRS)

The report concludes with a summary and references.

CHAPTER 1: OVERVIEW OF THE DATASET

1.1 Data Source

The data was collected from Kaggle (kaggle.com).

(Source: https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce)

1.2 Detailed Dataset Description

This dataset represents Brazilian e-commerce orders fulfilled by Olist Store. The dataset includes information on 100,000 orders from 2016 to 2018, covering various marketplaces in Brazil. It enables multidimensional views of orders: from order status, prices, payments, and shipping performance to customer locations, product attributes, and customer-written reviews.

1.2.1 Dataset Parameters:

The dataset comprises 52 columns and approximately 500,000 rows, with data updated by Kaggle from the Olist Store website in Brazil.

1.2.2 Extracted Data for Warehouse Construction

From this dataset, approximately 112,651 rows and 20 columns were extracted to construct the e-commerce data warehouse.

Picture 1 Data after extract 1

Picture 2 Data after extract 2

1.2.3 Detailed Description of Dataset Attributes

Attribute                        Description
product_category_name            Product category name
product_weight_g                 Product weight (unit: grams)
product_length_cm                Product length (unit: cm)
product_height_cm                Product height (unit: cm)
product_width_cm                 Product width (unit: cm)
price                            Product price
freight_value                    Shipping cost
order_status                     Order status
order_purchase_timestamp         Order purchase time
order_delivered_customer_date    Delivery date to the customer
order_estimated_delivery_date    Estimated delivery date
customer_zip_code_prefix         Customer zip code prefix
customer_city                    Customer city
customer_state                   Customer state
seller_zip_code_prefix           Seller zip code prefix
seller_city                      Seller city
seller_state                     Seller state
review_score                     Product review score (scale from 1 to 5)
review_comment_title             Review comment title
review_comment_message           Review content

Table 1 Dataset attribute description
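The three date attributes in Table 1 are what make the order processing time analysis possible. The sketch below shows one way to derive a delivery duration and a lateness value from a single extracted row. The timestamp layout and the function name `delivery_metrics` are assumptions made for illustration, and the sample row is invented rather than taken from the dataset.

```python
from datetime import datetime

# Timestamp layout assumed for the extracted file, e.g. "2017-10-02 10:56:33".
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def delivery_metrics(row: dict) -> dict:
    """Derive delivery duration and lateness from one extracted row."""
    purchased = datetime.strptime(row["order_purchase_timestamp"], TIMESTAMP_FORMAT)
    delivered = datetime.strptime(row["order_delivered_customer_date"], TIMESTAMP_FORMAT)
    estimated = datetime.strptime(row["order_estimated_delivery_date"], TIMESTAMP_FORMAT)
    return {
        # Whole days between purchase and delivery to the customer.
        "delivery_days": (delivered - purchased).days,
        # Positive when the order arrived after the estimated date.
        "days_late": (delivered.date() - estimated.date()).days,
    }


# Illustrative row, not taken from the dataset.
sample = {
    "order_purchase_timestamp": "2017-10-02 10:56:33",
    "order_delivered_customer_date": "2017-10-10 21:25:13",
    "order_estimated_delivery_date": "2017-10-18 00:00:00",
}
metrics = delivery_metrics(sample)
```

A negative `days_late` means the order arrived ahead of the estimate. Rows with a missing delivery date would need to be filtered out or handled separately before calling the function.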

1.3 Introduction to tools used

1.3.1 Visual Studio 2022

Microsoft Visual Studio is an integrated development environment (IDE) from Microsoft. It is used to develop computer programs for Microsoft Windows, websites, web applications, and web services.

Integrated technologies used in this project: Business Intelligence and SQL Server Data Tools (SSDT).

Picture 3 Visual Studio 2022

1.3.2 SQL Server Management Studio (SSMS)

SQL Server Management Studio (SSMS) is an integrated environment made up of several components for managing SQL Server. It provides a graphical interface for connecting to and working with Microsoft SQL Server.

Picture 4 SQL Server Management Studio

1.3.3 SQL Server Integration Services (SSIS)

SQL Server Integration Services (SSIS) is the data integration component of the Microsoft SQL Server suite. It goes beyond simple data extraction, transformation, and loading (ETL): it is a complete platform that lets developers and data architects build robust, scalable, and maintainable data integration solutions.

Picture 5 SQL Server Integration Services (SSIS)

1.3.4 SSAS (SQL Server Analysis Services)

SSAS is an OLAP (Online Analytical Processing) and data mining tool. It allows users to create data models and to summarize and analyze large data sets. SSAS provides a way to organize and compress data into blocks, called "cubes," for fast data retrieval. In addition, SSAS supports multidimensional and predictive data analysis, which helps to generate results and forecast models based on data.

Picture 6 SSAS (SQL Server Analysis Services)
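To make the idea of a cube more concrete, the toy sketch below pre-computes a total for every combination of two dimensions, including the roll-up cells, so that a later lookup does not have to scan the fact rows again. It is only an illustration of the concept and does not reflect how SSAS stores cubes internally. The rows, the `ALL` marker, and the function name `build_cube` are assumptions.

```python
from collections import defaultdict
from itertools import product

# Illustrative fact rows: (year, customer_state, freight_value). Not real data.
facts = [
    (2017, "SP", 10.0),
    (2017, "RJ", 20.0),
    (2018, "SP", 5.0),
    (2018, "SP", 15.0),
]

ALL = "*"  # marker meaning "aggregated over every member of this dimension"


def build_cube(rows):
    """Pre-compute freight totals for every combination of year and state."""
    cube = defaultdict(float)
    for year, state, freight in rows:
        # Each fact contributes to its own cell and to every roll-up cell.
        for y, s in product((year, ALL), (state, ALL)):
            cube[(y, s)] += freight
    return dict(cube)


cube = build_cube(facts)
total_2018 = cube[(2018, ALL)]   # freight for 2018 across all states
total_sp = cube[(ALL, "SP")]     # freight for SP across all years
grand_total = cube[(ALL, ALL)]   # freight across the whole data set
```

Once the cube is built, each question ("total freight in 2018", "total freight for SP") is a single dictionary lookup. That is the trade-off a cube makes: more work at processing time in exchange for faster queries.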

1.3.5 SSRS (SQL Server Reporting Services)

SSRS is a reporting tool for creating, managing, and distributing reports from SQL Server data or other data sources. SSRS allows you to create reports ranging from basic to complex in a variety of formats such as tables, charts, and dynamic charts. Users can customize and share these reports, as well as schedule them to be sent automatically at regular intervals.

Picture 7 SSRS (SQL Server Reporting Services)

1.3.6 MDX Language

MDX (Multidimensional Expressions) is a query language for OLAP databases. Although its syntax resembles SQL, it is a separate language rather than an extension of SQL.

Basic structure of MDX query language:

- Purpose: querying OLAP data warehouses organized as multidimensional cubes, to support information synthesis and decision making

- Structure: similar to an SQL query on a relational database (SELECT ... FROM ... WHERE), but extended to address multidimensional data cubes
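The basic shape of an MDX SELECT statement can be sketched by assembling one from its parts: a set of measures on the COLUMNS axis, a set of dimension members on the ROWS axis, the cube in the FROM clause, and an optional slicer in the WHERE clause. The helper `build_mdx` and every cube, measure, and hierarchy name below are hypothetical. The names used in the actual cubes of this project may differ.

```python
def build_mdx(measures, row_set, cube, slicer=None):
    """Assemble a basic MDX SELECT statement from its parts."""
    columns = ", ".join(measures)
    query = (
        f"SELECT {{ {columns} }} ON COLUMNS, "
        f"{row_set} ON ROWS "
        f"FROM [{cube}]"
    )
    if slicer:
        # The slicer restricts the whole query to one member of another dimension.
        query += f" WHERE ( {slicer} )"
    return query


# All cube, measure, and hierarchy names below are hypothetical.
mdx = build_mdx(
    measures=["[Measures].[Freight Value]"],
    row_set="[Dim Time].[Year].MEMBERS",
    cube="Fact Shipping",
    slicer="[Dim Customers].[Customer State].&[SP]",
)
```

Unlike SQL, where SELECT lists columns of a table, the MDX SELECT clause places sets of members on axes, and the FROM clause names a cube rather than a table.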

CHAPTER 2: ANALYSIS AND DESIGN OF THE E-COMMERCE DATA WAREHOUSE

Picture 8 Fact Shipping

Fact_Review: This fact table analyzes product quality based on customer reviews, helping assess customer satisfaction with purchased products.

Picture 9 Fact Review

2.1.2 Dimension

- Dim Products: contains product information

Picture 10 Dim Products

- Dim Customers: contains customer addresses.

Picture 11 Dim Customers

- Dim Sellers: contains seller addresses

Picture 12 Dim Sellers

- Dim Time: contains information on order time and review time

Picture 13 Dim Time
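As a rough illustration of what a row in a time dimension can hold, the sketch below splits one timestamp into calendar attributes. The attribute names, the YYYYMMDD integer key, and the function name `time_dimension_row` are assumptions. The actual columns of Dim Time are those shown in Picture 13.

```python
from datetime import datetime


def time_dimension_row(timestamp: str) -> dict:
    """Split one timestamp into attributes a time dimension might store."""
    moment = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return {
        # Integer key in YYYYMMDD form, a common choice for date dimensions.
        "time_key": moment.year * 10000 + moment.month * 100 + moment.day,
        "year": moment.year,
        "quarter": (moment.month - 1) // 3 + 1,
        "month": moment.month,
        "day": moment.day,
    }


# Illustrative timestamp, not taken from the dataset.
row = time_dimension_row("2018-08-07 15:27:45")
```

Storing these attributes once per date lets both order times and review times be grouped by year, quarter, or month without repeating the date arithmetic in every query.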

2.2.3 Constellation Model

Picture 14 Constellation Model
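A constellation (galaxy) schema is one in which several fact tables share dimension tables. The sketch below expresses that idea with an in-memory SQLite database. The table names follow the report, but every column name, every key, and the choice of which dimensions each fact references are assumptions made for illustration. The actual design is the one shown in Picture 14.

```python
import sqlite3

# Table names follow the report; every column and key name is an assumption.
SCHEMA = """
CREATE TABLE dim_products  (product_key  INTEGER PRIMARY KEY, product_category_name TEXT);
CREATE TABLE dim_customers (customer_key INTEGER PRIMARY KEY, customer_city TEXT, customer_state TEXT);
CREATE TABLE dim_sellers   (seller_key   INTEGER PRIMARY KEY, seller_city TEXT, seller_state TEXT);
CREATE TABLE dim_time      (time_key     INTEGER PRIMARY KEY, year INTEGER, month INTEGER, day INTEGER);

CREATE TABLE fact_shipping (
    product_key   INTEGER REFERENCES dim_products(product_key),
    customer_key  INTEGER REFERENCES dim_customers(customer_key),
    seller_key    INTEGER REFERENCES dim_sellers(seller_key),
    time_key      INTEGER REFERENCES dim_time(time_key),
    freight_value REAL
);

CREATE TABLE fact_review (
    product_key  INTEGER REFERENCES dim_products(product_key),
    customer_key INTEGER REFERENCES dim_customers(customer_key),
    time_key     INTEGER REFERENCES dim_time(time_key),
    review_score INTEGER
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)


def referenced_dims(fact_table):
    """Return the names of the tables a fact table points to."""
    rows = conn.execute(f"PRAGMA foreign_key_list({fact_table})").fetchall()
    return {row[2] for row in rows}  # column 2 holds the referenced table name


# Dimensions referenced by both fact tables are what make this a constellation.
shared = referenced_dims("fact_shipping") & referenced_dims("fact_review")
```

Sharing dimensions means a question such as "freight cost and review score by product category" can be answered by joining both fact tables through the same Dim Products rows.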

CHAPTER 3: DATA INTEGRATION INTO THE WAREHOUSE (SSIS)

In this chapter, SSIS (SQL Server Integration Services) is used to build data integration packages and set up automated data processing. Queries are written to transfer data from the OLTP (Online Transaction Processing) source to the data warehouse.
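Before walking through the SSIS designer, it may help to see the extract, transform, and load steps in miniature. The sketch below is not an SSIS package. It uses a small in-memory CSV text as a stand-in for the Excel source and SQLite as a stand-in for the warehouse, and all table names, column names, and values are assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Stand-in for the source file; the values are illustrative only.
SOURCE = """customer_city,customer_state,freight_value
sao paulo,SP,13.29
rio de janeiro,RJ,19.93
sao paulo,SP,17.87
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customers ("
    "customer_key INTEGER PRIMARY KEY, customer_city TEXT, customer_state TEXT, "
    "UNIQUE (customer_city, customer_state))"
)
conn.execute("CREATE TABLE fact_shipping (customer_key INTEGER, freight_value REAL)")

# Extract: read the flat source rows.
for row in csv.DictReader(io.StringIO(SOURCE)):
    # Transform: normalise text and convert the measure to a number.
    city = row["customer_city"].strip().title()
    state = row["customer_state"].strip().upper()
    freight = float(row["freight_value"])

    # Load: insert the dimension member once, then look up its surrogate key.
    conn.execute(
        "INSERT OR IGNORE INTO dim_customers (customer_city, customer_state) VALUES (?, ?)",
        (city, state),
    )
    (customer_key,) = conn.execute(
        "SELECT customer_key FROM dim_customers WHERE customer_city = ? AND customer_state = ?",
        (city, state),
    ).fetchone()
    conn.execute("INSERT INTO fact_shipping VALUES (?, ?)", (customer_key, freight))

dim_count = conn.execute("SELECT COUNT(*) FROM dim_customers").fetchone()[0]
fact_count = conn.execute("SELECT COUNT(*) FROM fact_shipping").fetchone()[0]
```

The three source rows produce two dimension rows and three fact rows, because the repeated customer location is stored once and referenced by key. In SSIS the same pattern is typically built from a source component, transformation components, and a destination component inside a Data Flow Task.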

3.1 Creating a New SSIS Project

Open Visual Studio 2022 -> Select Create a new project.

Picture 15 Process of creating new SSIS project 1

In the search box, type "Integration Services Project" and select it to start a new SSIS project.

Picture 16 Process of creating new SSIS project 2

3.2 Data Loading Process from Excel to Database

3.2.1 Creating an Excel Connection Manager

Right-click and select "New File Connection"

Picture 17 Excel Connection Manager Creation Process 1

In the dialog box, select "Excel" -> Click "Add".

Picture 18 Excel Connection Manager Creation Process 2

Select the path to the Excel file and specify the file version -> Click "OK"

Picture 19 Excel Connection Manager Creation Process 3

3.2.2 Creating an OLE DB Connection

Right-click and select "New OLE DB Connection"

Picture 20 OLE DB Connection Creation Process 1

A new window appears -> Click "New" to create a new connection

Picture 21 OLE DB Connection Creation Process 2
