VIETNAM - KOREA UNIVERSITY OF INFORMATION & COMMUNICATION TECHNOLOGY
FACULTY OF COMPUTER SCIENCE

DATA WAREHOUSE

BUILDING A DATA WAREHOUSE FOR TRAFFIC ACCIDENT

Students: LÊ PHÚ QUỐC, PHẠM TOÀN PHÚC, LÊ THỊ HỒNG QUÝ, LÊ VIỆT THẮNG
Instructor: PhD NGUYỄN THU HƯƠNG
Class: 20GIT

Da Nang, May 2023

ACKNOWLEDGMENTS

First of all, the team would like to express its sincere thanks to PhD Nguyen Thu Huong (Lecturer of Data Warehouse) for helping the group acquire the basic knowledge needed as the foundation to carry out this thesis. She guided the group directly and enthusiastically, corrected mistakes, and contributed many valuable comments that helped the group complete its subject report.

During one semester of project implementation, the group applied its accumulated background knowledge and combined it with learning and researching new material. The team then applied what it had learned to produce the best project report it could. However, shortcomings were unavoidable during implementation. The group therefore looks forward to receiving suggestions from teachers in order to improve the knowledge it has acquired and to prepare for other topics in the future.

Sincerely, thank you!
TABLE OF CONTENTS

Chapter 1 Introduction
  1.1 The goal of the project
  1.2 Requirements
  1.3 Conceptual in Data warehouse
    1.3.1 Dimension
    1.3.2 Fact
  1.4 Tools
    1.4.1 Visual Studio
    1.4.2 SQL Server Integration Services
    1.4.3 SQL Server Management Studio
Chapter 2 Data warehouse analysis and design
  2.1 Conceptual modeling
    2.1.1 Measure and dimension entities
    2.1.2 Hierarchies of dimensions
    2.1.3 Conceptual modeling diagram
  2.2 Logical modeling
    2.2.1 Fact and dimension tables
    2.2.2 Star schema
  2.3 Query questions
Chapter 3 Data Warehouse Development
Chapter 4 ETL Process
  4.1 Conceptual ETL design
  4.2 ETL development by using SSIS
    4.2.1 Time_Dim
    4.2.2 Location_Dim
    4.2.3 Cause_Dim
    4.2.4 Participant_Dim
    4.2.5 Vehicle_Dim
    4.2.6 Accidents_Fact
    4.2.7 Casualties_Fact
    4.2.8 Damages_Fact
Chapter 5 OLAP Analysis
Chapter 6 SSRS
  6.1 Number of accidents, number of casualties, number of vehicles damaged by month, quarter
  6.2 Number of accidents by cause
  6.3 Number of casualties by age group over the years
  6.4 Number of casualties by gender over the years
  6.5 Number of people injured and dead over the years
  6.6 Number of vehicles damaged by vehicle type
  6.7 Top provinces/cities with the most accidents
  6.8 Top provinces/cities with the highest number of deaths in the adult age group
  6.9 Top provinces/cities with the largest number of casualties
  6.10 Top provinces/cities with the most property damage
  6.11 Top provinces/cities with the most damage to vehicles
Chapter 7 Conclusions
  7.1 Conclusion
    7.1.1 Achievements
    7.1.2 Limitations
  7.2 Development
References

LIST OF IMAGES

Figure 1-1 Visual Studio
Figure 1-2 SQL Server Management Studio
Figure 2-1 Time Dimension
Figure 2-2 Location Dimension
Figure 2-3 Participant Dimension
Figure 2-4 Cause Dimension
Figure 2-5 Vehicle Dimension
Figure 2-6 Conceptual modeling diagram
Figure 2-7 Star Schema
Figure 3-1 Cause Dim
Figure 3-2 Location Dim
Figure 3-3 Participant Dim
Figure 3-4 Time Dim
Figure 3-5 Vehicle Dim
Figure 3-6 Accidents Fact
Figure 3-7 Casualties Fact
Figure 3-8 Damages Fact
Figure 4-1 Conceptual ETL design
Figure 4-2 Time Dim Data flow
Figure 4-3 Time Dim Dataset
Figure 4-4 Time Dim ETL result
Figure 4-5 Location Dim Data flow
Figure 4-6 Location Dim Dataset
Figure 4-7 Location Dim ETL result
Figure 4-8 Cause Dim Data flow
Figure 4-9 Cause Dim Dataset
Figure 4-10 Cause Dim ETL result
Figure 4-11 Participant Dim Data flow
Figure 4-12 Participant Dim Dataset
Figure 4-13 Participant Dim Dataset
Figure 4-14 Vehicle Dim Data flow
Figure 4-15 Vehicle Dim Dataset
Figure 4-16 Vehicle Dim ETL result
Figure 4-17 Accident Fact Data flow
Figure 4-18 Accident Fact Dataset
Figure 4-19 Accident Fact ETL result
Figure 4-20 Casualties Fact Data flow
Figure 4-21 Casualties Fact Dataset
Figure 4-22 Casualties Fact Dataset
Figure 4-23 Casualties Fact ETL result
Figure 4-24 Damages Fact Data flow
Figure 4-25 Damages Fact Dataset
Figure 4-26 Damages Fact Dataset
Figure 4-27 Damages Fact ETL result
Figure 5-1 Cube
Figure 5-2 MDX Question
Figure 5-3 MDX Query
Figure 5-4 MDX Query
Figure 5-5 MDX Query
Figure 5-6 MDX Query
Figure 5-7 MDX Query
Figure 5-8 MDX Query
Figure 5-9 MDX Query
Figure 5-10 MDX Query 10
Figure 5-11 MDX Query 11
Figure 5-12 MDX Query 12
Figure 6-1 Question query
Figure 6-2 Answer report format
Figure 6-3 Answer
Figure 6-4 Question query
Figure 6-5 Answer report format
Figure 6-6 Answer
Figure 6-7 Question query
Figure 6-8 Answer report format
Figure 6-9 Answer
Figure 6-10 Question query
Figure 6-11 Answer report format
Figure 6-12 Answer
Figure 6-13 Question query
Figure 6-14 Answer report format
Figure 6-15 Answer
Figure 6-16 Question query
Figure 6-17 Answer report format
Figure 6-18 Answer
Figure 6-19 Question query
Figure 6-20 Answer report format
Figure 6-21 Answer
Figure 6-22 Question query
Figure 6-23 Answer report format
Figure 6-24 Answer
Figure 6-25 Question query
Figure 6-26 Answer report format
Figure 6-27 Answer
Figure 6-28 Question 10 query
Figure 6-29 Answer 10 report format
Figure 6-30 Answer 10
Figure 6-31 Question 11 query
Figure 6-32 Answer 11 report format
Figure 6-33 Answer 11
Figure 6-34 Question 12 query
Figure 6-35 Answer 12 report format
Figure 6-36 Answer 12

LIST OF ACRONYMS

Number  Phrase                                Abbreviation
1       Integrated development environment    IDE
2       Structured Query Language             SQL
3       Extensible Markup Language            XML
4       Extract - Transform - Load            ETL
5       SQL Server Integration Services       SSIS

Chapter 1 Introduction

1.1 The goal of the project

Traffic accidents are a major public health problem that causes death, injury, and disability for millions of people around the world. According to the World Health Organization (WHO), road traffic injuries are the leading cause of death for children and young adults aged 5-29 years, and the eighth leading cause of death for all age groups. Road traffic injuries also have a significant economic impact, costing countries on average 3% of their gross domestic product.

A traffic accident data warehouse provides a comprehensive and reliable source of data for analyzing and predicting traffic accidents. A data warehouse is a centralized repository of integrated data from various sources, such as police reports, road sensors, vehicle registrations, and weather stations. A data warehouse enables the application of data mining techniques to discover patterns and trends in the data, such as the causes, effects, and
risk factors of traffic accidents. Data mining is the process of extracting useful information from large and complex data sets using statistical and machine learning methods.

A traffic accident data warehouse can support various objectives and stakeholders in the field of road safety. For example, it can help policymakers and planners design and evaluate effective interventions and regulations to reduce traffic accidents and fatalities. It can also help researchers and analysts identify and understand the underlying factors and mechanisms of traffic accidents, such as human behavior, road conditions, and vehicle characteristics. Furthermore, it can help drivers and travelers make informed decisions and avoid potential hazards on the road.

1.2 Requirements

The data warehouse should store historical and current data on traffic accidents, such as location, date, time, causes, vehicles involved, injuries, and fatalities.

The data warehouse should support various analytical queries and reports on traffic accident data, such as the frequency and distribution of accidents by location (ward, district, province) and time (date, month, quarter, year); the correlation and causation of accidents with various factors; and the impact and cost of accidents on society and the economy.

The data warehouse should be scalable, reliable, and efficient, so that it can handle large volumes of data and many concurrent users.

The data warehouse should answer questions such as the following:

- Statistics on the number of traffic accidents by location over the years
- Statistics on the number of traffic accidents by cause
- Statistics on the number of vehicles damaged by cause
- The largest and smallest numbers of vehicles damaged, by cause
- Number of casualties by year, sorted in ascending order
- Top months with the most accidents
- Top months with the fewest accidents
- Statistics on the total number of casualties in each province
- Statistics on casualties by month in 2022
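
To make the requirements above more concrete, the following is a minimal sketch of how a few of these questions could be answered from a star schema. It is not the project's SQL Server implementation: it assumes Python's built-in sqlite3 module as a stand-in database, borrows the table names Time_Dim, Location_Dim, Cause_Dim, and Accidents_Fact from the table of contents, and uses hypothetical column names and made-up rows purely for illustration.

```python
import sqlite3


def build_demo_warehouse() -> sqlite3.Connection:
    """Create an in-memory star schema filled with made-up rows."""
    conn = sqlite3.connect(":memory:")
    # Column names are assumptions; only the table names come from the report.
    conn.executescript(
        """
        CREATE TABLE Time_Dim (
            time_key INTEGER PRIMARY KEY,
            full_date TEXT, month INTEGER, quarter INTEGER, year INTEGER
        );
        CREATE TABLE Location_Dim (
            location_key INTEGER PRIMARY KEY,
            ward TEXT, district TEXT, province TEXT
        );
        CREATE TABLE Cause_Dim (
            cause_key INTEGER PRIMARY KEY,
            cause_name TEXT
        );
        CREATE TABLE Accidents_Fact (
            time_key INTEGER REFERENCES Time_Dim(time_key),
            location_key INTEGER REFERENCES Location_Dim(location_key),
            cause_key INTEGER REFERENCES Cause_Dim(cause_key),
            accident_count INTEGER
        );
        """
    )
    # Illustrative data only; none of these values come from a real dataset.
    conn.executemany("INSERT INTO Time_Dim VALUES (?, ?, ?, ?, ?)", [
        (1, "2021-03-15", 3, 1, 2021),
        (2, "2022-01-10", 1, 1, 2022),
        (3, "2022-07-04", 7, 3, 2022),
    ])
    conn.executemany("INSERT INTO Location_Dim VALUES (?, ?, ?, ?)", [
        (1, "Ward A", "District A", "Province A"),
        (2, "Ward B", "District B", "Province B"),
    ])
    conn.executemany("INSERT INTO Cause_Dim VALUES (?, ?)", [
        (1, "Speeding"),
        (2, "Drunk driving"),
    ])
    conn.executemany("INSERT INTO Accidents_Fact VALUES (?, ?, ?, ?)", [
        (1, 1, 1, 2),
        (2, 1, 2, 1),
        (2, 2, 1, 3),
        (3, 2, 2, 4),
        (3, 1, 1, 1),
    ])
    return conn


def accidents_by_province_and_year(conn: sqlite3.Connection) -> list:
    """Number of accidents by location (province level) over the years."""
    return conn.execute(
        """
        SELECT l.province, t.year, SUM(f.accident_count) AS total
        FROM Accidents_Fact AS f
        JOIN Location_Dim AS l ON l.location_key = f.location_key
        JOIN Time_Dim AS t ON t.time_key = f.time_key
        GROUP BY l.province, t.year
        ORDER BY l.province, t.year
        """
    ).fetchall()


def accidents_by_cause(conn: sqlite3.Connection) -> list:
    """Number of accidents by cause, largest total first."""
    return conn.execute(
        """
        SELECT c.cause_name, SUM(f.accident_count) AS total
        FROM Accidents_Fact AS f
        JOIN Cause_Dim AS c ON c.cause_key = f.cause_key
        GROUP BY c.cause_name
        ORDER BY total DESC
        """
    ).fetchall()


def top_months(conn: sqlite3.Connection, n: int) -> list:
    """The n (year, month) pairs with the most accidents."""
    return conn.execute(
        """
        SELECT t.year, t.month, SUM(f.accident_count) AS total
        FROM Accidents_Fact AS f
        JOIN Time_Dim AS t ON t.time_key = f.time_key
        GROUP BY t.year, t.month
        ORDER BY total DESC, t.year, t.month
        LIMIT ?
        """,
        (n,),
    ).fetchall()


if __name__ == "__main__":
    demo = build_demo_warehouse()
    print(accidents_by_province_and_year(demo))
    print(accidents_by_cause(demo))
    print(top_months(demo, 1))
```

Each query joins the fact table to only the dimensions it needs and aggregates one measure, which is the access pattern a star schema is designed for. Rolling up to district or quarter instead would only change the columns in the GROUP BY clause.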