Data warehouse building a data warehouse for traffic accident


VIETNAM - KOREA UNIVERSITY OF INFORMATION & COMMUNICATION TECHNOLOGY
FACULTY OF COMPUTER SCIENCE

DATA WAREHOUSE
BUILDING A DATA WAREHOUSE FOR TRAFFIC ACCIDENT

Students: LÊ PHÚ QUỐC, PHẠM TOÀN PHÚC, LÊ THỊ HỒNG QUÝ, LÊ VIỆT THẮNG
Instructor: PhD NGUYỄN THU HƯƠNG
Class: 20GIT

Da Nang, May of 2023

ACKNOWLEDGMENTS

First of all, the team would like to express its sincere thanks to PhD Nguyen Thu Huong (Lecturer of Data Warehouse) for helping the group acquire the basic knowledge needed as the foundation to carry out this thesis. She guided the group directly and enthusiastically, corrected mistakes, and contributed many valuable comments that helped the group complete its subject report well.

During one semester of project implementation, the group applied its accumulated background knowledge and combined it with learning and researching new knowledge. The team then applied what it had collected to produce the best project report it could. However, the team could not avoid shortcomings in the implementation process. The group therefore looks forward to receiving suggestions from teachers in order to improve the knowledge it has acquired and to prepare for other topics in the future.

Sincerely, thank you!
TABLE OF CONTENTS

Chapter 1 Introduction
  1.1 The goal of the project
  1.2 Requirements
  1.3 Concepts in data warehouse
    1.3.1 Dimension
    1.3.2 Fact
  1.4 Tools
    1.4.1 Visual Studio
    1.4.2 SQL Server Integration Services
    1.4.3 SQL Server Management Studio
Chapter 2 Data warehouse analysis and design
  2.1 Conceptual modeling
    2.1.1 Measure and dimension entities
    2.1.2 Hierarchies of dimensions
    2.1.3 Conceptual modeling diagram
  2.2 Logical modeling
    2.2.1 Fact and dimension tables
    2.2.2 Star schema
  2.3 Query questions
Chapter 3 Data Warehouse Development
Chapter 4 ETL Process
  4.1 Conceptual ETL design
  4.2 ETL development by using SSIS
    4.2.1 Time_Dim
    4.2.2 Location_Dim
    4.2.3 Cause_Dim
    4.2.4 Participant_Dim
    4.2.5 Vehicle_Dim
    4.2.6 Accidents_Fact
    4.2.7 Casualties_Fact
    4.2.8 Damages_Fact
Chapter 5 OLAP Analysis
Chapter 6 SSRS
  6.1 Number of accidents, number of casualties, number of vehicles damaged by month, quarter
  6.2 Number of accidents by cause
  6.3 Number of casualties by age group over the years
  6.4 Number of casualties by gender over the years
  6.5 Number of people injured and dead over the years
  6.6 Number of vehicles damaged by vehicle type
  6.7 Top provinces/cities with the most accidents
  6.8 Top provinces/cities with the highest number of deaths in the adult age group
  6.9 Top provinces/cities with the largest number of casualties
  6.10 Top provinces/cities with the most property damage
  6.11 Top provinces/cities with the most damage to vehicles
Chapter 7 Conclusions
  7.1 Conclusion
    7.1.1 Achievements
    7.1.2 Limitations
  7.2 Development
References

LIST OF IMAGES

Figure 1-1 Visual Studio
Figure 1-2 SQL Server Management Studio
Figure 2-1 Time Dimension
Figure 2-2 Location Dimension
Figure 2-3 Participant Dimension
Figure 2-4 Cause Dimension
Figure 2-5 Vehicle Dimension
Figure 2-6 Conceptual modeling diagram
Figure 2-7 Star Schema
Figure 3-1 Cause Dim
Figure 3-2 Location Dim
Figure 3-3 Participant Dim
Figure 3-4 Time Dim
Figure 3-5 Vehicle Dim
Figure 3-6 Accidents Fact
Figure 3-7 Casualties Fact
Figure 3-8 Damages Fact
Figure 4-1 Conceptual ETL design
Figure 4-2 Time Dim Data flow
Figure 4-3 Time Dim Dataset
Figure 4-4 Time Dim ETL result
Figure 4-5 Location Dim Data flow
Figure 4-6 Location Dim Dataset
Figure 4-7 Location Dim ETL result
Figure 4-8 Cause Dim Data flow
Figure 4-9 Cause Dim Dataset
Figure 4-10 Cause Dim ETL result
Figure 4-11 Participant Dim Data flow
Figure 4-12 Participant Dim Dataset
Figure 4-13 Participant Dim Dataset
Figure 4-14 Vehicle Dim Data flow
Figure 4-15 Vehicle Dim Dataset
Figure 4-16 Vehicle Dim ETL result
Figure 4-17 Accident Fact Data flow
Figure 4-18 Accident Fact Dataset
Figure 4-19 Accident Fact ETL result
Figure 4-20 Casualties Fact Data flow
Figure 4-21 Casualties Fact Dataset
Figure 4-22 Casualties Fact Dataset
Figure 4-23 Casualties Fact ETL result
Figure 4-24 Damages Fact Data flow
Figure 4-25 Damages Fact Dataset
Figure 4-26 Damages Fact Dataset
Figure 4-27 Damages Fact ETL result
Figure 5-1 Cube
Figure 5-2 MDX Question
Figure 5-3 MDX Query
Figure 5-4 MDX Query
Figure 5-5 MDX Query
Figure 5-6 MDX Query
Figure 5-7 MDX Query
Figure 5-8 MDX Query
Figure 5-9 MDX Query
Figure 5-10 MDX Query 10
Figure 5-11 MDX Query 11
Figure 5-12 MDX Query 12
Figure 6-1 Question query
Figure 6-2 Answer report format
Figure 6-3 Answer
Figure 6-4 Question query
Figure 6-5 Answer report format
Figure 6-6 Answer
Figure 6-7 Question query
Figure 6-8 Answer report format
Figure 6-9 Answer
Figure 6-10 Question query
Figure 6-11 Answer report format
Figure 6-12 Answer
Figure 6-13 Question query
Figure 6-14 Answer report format
Figure 6-15 Answer
Figure 6-16 Question query
Figure 6-17 Answer report format
Figure 6-18 Answer
Figure 6-19 Question query
Figure 6-20 Answer report format
Figure 6-21 Answer
Figure 6-22 Question query
Figure 6-23 Answer report format
Figure 6-24 Answer
Figure 6-25 Question query
Figure 6-26 Answer report format
Figure 6-27 Answer
Figure 6-28 Question 10 query
Figure 6-29 Answer 10 report format
Figure 6-30 Answer 10
Figure 6-31 Question 11 query
Figure 6-32 Answer 11 report format
Figure 6-33 Answer 11
Figure 6-34 Question 12 query
Figure 6-35 Answer 12 report format
Figure 6-36 Answer 12

LIST OF ACRONYMS

Number  Phrase                               Abbreviation
1       Integrated development environment   IDE
2       Structured Query Language            SQL
3       Extensible Markup Language           XML
4       Extract - Transform - Load           ETL
5       SQL Server Integration Services      SSIS

Chapter 1 Introduction

1.1 The goal of the project

Traffic accidents are a major public health problem that causes death, injury, and disability for millions of people around the world. According to the World Health Organization (WHO), road traffic injuries are the leading cause of death for children and young adults aged 5-29 years, and the eighth leading cause of death for all age groups. Road traffic injuries also have a significant economic impact, costing countries on average 3% of their gross domestic product.

The traffic accident data warehouse provides a comprehensive and reliable source of data for analyzing and predicting traffic accidents. A data warehouse is a centralized repository of integrated data from various sources, such as police reports, road sensors, vehicle registrations, and weather stations. A data warehouse enables the application of data mining techniques to discover patterns and trends in the data, such as the causes, effects, and risk
factors of traffic accidents. Data mining is the process of extracting useful information from large and complex data sets using statistical and machine learning methods.

A traffic accident data warehouse can support various objectives and stakeholders in the field of road safety. For example, it can help policymakers and planners design and evaluate effective interventions and regulations to reduce traffic accidents and fatalities. It can also help researchers and analysts identify and understand the underlying factors and mechanisms of traffic accidents, such as human behavior, road conditions, and vehicle characteristics. Furthermore, it can help drivers and travelers make informed decisions and avoid potential hazards on the road.

1.2 Requirements

- The data warehouse should store historical and current data on traffic accidents, such as location, date, time, causes, vehicles involved, injuries, and fatalities.
- The data warehouse should support various analytical queries and reports on traffic accident data, such as the frequency and distribution of accidents by location (ward, district, province) and time (date, month, quarter, year), the correlation and causation of accidents with various factors, and the impact and cost of accidents on society and the economy.
- The data warehouse should be scalable, reliable, and efficient enough to handle large volumes of data and high user concurrency.

- Statistics on the number of traffic accidents by location over the years
- Statistics on the number of traffic accidents by cause
- Statistics on the number of vehicles damaged by cause
- The largest and the smallest number of vehicles damaged, by cause
- The number of casualties by year, sorted in ascending order
- Top months with the most accidents
- Top months with the fewest accidents
- Statistics on the total number of casualties in each province
- Statistics on casualties by month in 2022

Figures 6-67 and 6-68: Answer report format, Answer

6.4 Number of casualties by gender over the years

Figures 6-69 to 6-71: Question query, Answer report format, Answer

6.5 Number of people injured and dead over the years

Figures 6-72 to 6-74: Question query, Answer report format, Answer

6.6 Number of vehicles damaged by vehicle type

Figures 6-75 to 6-77: Question query, Answer report format, Answer

6.7 Top provinces/cities with the most accidents

Figures 6-78 to 6-80: Question query, Answer report format, Answer

6.8 Top provinces/cities with the highest number of deaths in the adult age group

Figures 6-81 to 6-83: Question query, Answer report format, Answer

6.9 Top provinces/cities with the largest number of casualties

Figures 6-84 to 6-86: Question 10 query, Answer 10 report format, Answer 10

6.10 Top provinces/cities with the most property damage

Figures 6-87 to 6-89: Question 11 query, Answer 11 report format, Answer 11

6.11 Top provinces/cities with the most damage to vehicles

Figures 6-90 to 6-92: Question 12 query, Answer 12 report format, Answer 12

Chapter 7 Conclusions

7.1 Conclusion

7.1.1 Achievements

The project of building a data warehouse for traffic accidents was a challenging and rewarding endeavor that resulted in several achievements. The main achievements are:

- The data warehouse integrated data from multiple sources, such as police reports, hospital records, insurance claims, and road sensors, and provided a comprehensive and consistent view of traffic accident data.
- The data warehouse enabled advanced analytics and reporting on various aspects of traffic accidents, such as causes, trends, and impacts.
- The data warehouse supported both descriptive reporting and interactive dashboards and visualizations.
- The data warehouse improved the decision-making and policy-making processes of various stakeholders, such as traffic authorities, public health agencies, insurance companies, and researchers.
- The data warehouse helped to identify
the root causes of traffic accidents, evaluate the effectiveness of existing interventions, and design new strategies to reduce traffic accidents and their consequences.

7.1.2 Limitations

This project aims to build a data warehouse for traffic accident data from various sources, such as police reports, insurance claims, hospital records, and media reports. The data warehouse will enable analysts and researchers to perform queries and analyses on the traffic accident data, such as identifying the causes, patterns, and impacts of traffic accidents. The data warehouse will also support decision making and policy making for traffic safety and management. However, this project faces several limitations that may affect its feasibility, quality, and usefulness. Some of the limitations are:

- Data quality and consistency: The data sources may have different formats, standards, definitions, and levels of detail for the traffic accident data. For example, some sources may use different criteria to classify the severity or type of accidents, and some sources may not record certain attributes such as weather conditions or road features. This may lead to data quality and consistency issues when integrating the data from different sources into the data warehouse.
- Data availability and accessibility: The data sources may have different policies and regulations for sharing and accessing the traffic accident data. For example, some sources may require authorization or permission to access the data, and some sources may impose restrictions or fees for using the data. This may limit the availability and accessibility of the data for the project.
- Data privacy and security: The traffic accident data may contain sensitive or personal information about the involved parties, such as names, addresses, license plates, or medical records. This may raise ethical and legal concerns about the privacy and security of the data. The project needs to comply with the relevant laws and regulations for protecting the data privacy and security, for example by anonymizing or encrypting the data, or by obtaining consent from the data owners.
- Data analysis and interpretation: The traffic accident data may be complex and multidimensional, involving various factors and variables that may influence the occurrence and outcome of traffic accidents. The project needs to employ appropriate methods and techniques for analyzing and interpreting the data, such as statistical models, machine learning algorithms, or visualization tools. The project also needs to account for the limitations and assumptions of these methods and techniques, such as validity, reliability, accuracy, or bias.

7.2 Development

The project of building a data warehouse for traffic accident analysis is progressing well and has achieved some significant milestones. The main objectives of the project are to collect, integrate, and store data from various sources related to traffic accidents, such as police reports, hospital records, insurance claims, and road sensors. The data warehouse will enable advanced analytics and reporting on the causes and consequences of traffic accidents.

The project team has completed the following tasks so far:

- Defined the business requirements and scope of the project
- Designed the conceptual and logical data models for the data warehouse
- Selected and configured the data warehouse platform and tools
- Implemented the extraction, transformation, and loading (ETL) processes for data ingestion
- Performed data quality checks and cleansing
- Created some sample reports

The next steps of the project are:

- Conduct user acceptance testing and feedback sessions
- Deploy the data warehouse to the production environment
- Provide training and documentation for end users and stakeholders
- Monitor and maintain the data warehouse performance and security
- Evaluate the project outcomes and benefits

The project team is confident that the data warehouse will provide valuable insights and support for traffic accident management and policy
making.

References

Alejandro Vaisman and Esteban Zimányi, Data Warehouse Systems: Design and Implementation, 2014.
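
The logical model, ETL step, and first statistical requirement described in this report (number of traffic accidents by location over the years) can be illustrated with a minimal sketch. It is not the project's SSIS implementation: it uses Python's built-in sqlite3 module in place of SQL Server, includes only two of the five dimensions, and the column names (time_key, location_key, province, accident_count) and the sample rows are hypothetical. Only the table names Time_Dim, Location_Dim, and Accidents_Fact come from the report.

```python
import sqlite3

# Hypothetical raw accident records (province, ISO date); made-up values for illustration only.
RAW_ACCIDENTS = [
    ("Da Nang", "2021-03-14"),
    ("Da Nang", "2022-07-02"),
    ("Da Nang", "2022-11-20"),
    ("Ha Noi", "2022-01-05"),
]


def build_warehouse(rows):
    """Create a small star schema in memory and load the raw rows into it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Time_Dim (
            time_key  INTEGER PRIMARY KEY,
            full_date TEXT UNIQUE,
            month     INTEGER,
            quarter   INTEGER,
            year      INTEGER
        );
        CREATE TABLE Location_Dim (
            location_key INTEGER PRIMARY KEY,
            province     TEXT UNIQUE
        );
        CREATE TABLE Accidents_Fact (
            time_key       INTEGER REFERENCES Time_Dim(time_key),
            location_key   INTEGER REFERENCES Location_Dim(location_key),
            accident_count INTEGER
        );
        """
    )
    for province, date in rows:
        # Transform: derive the time hierarchy (month, quarter, year) from the date.
        year, month = int(date[:4]), int(date[5:7])
        quarter = (month - 1) // 3 + 1
        # Load dimensions; INSERT OR IGNORE keeps one row per natural key.
        conn.execute(
            "INSERT OR IGNORE INTO Time_Dim (full_date, month, quarter, year) "
            "VALUES (?, ?, ?, ?)",
            (date, month, quarter, year),
        )
        conn.execute(
            "INSERT OR IGNORE INTO Location_Dim (province) VALUES (?)", (province,)
        )
        # Look up the surrogate keys, then load one fact row per accident.
        time_key = conn.execute(
            "SELECT time_key FROM Time_Dim WHERE full_date = ?", (date,)
        ).fetchone()[0]
        location_key = conn.execute(
            "SELECT location_key FROM Location_Dim WHERE province = ?", (province,)
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO Accidents_Fact VALUES (?, ?, 1)", (time_key, location_key)
        )
    conn.commit()
    return conn


def accidents_by_province_and_year(conn):
    """Roll the fact table up to (province, year) by joining both dimensions."""
    return conn.execute(
        """
        SELECT l.province, t.year, SUM(f.accident_count)
        FROM Accidents_Fact AS f
        JOIN Time_Dim AS t ON t.time_key = f.time_key
        JOIN Location_Dim AS l ON l.location_key = f.location_key
        GROUP BY l.province, t.year
        ORDER BY l.province, t.year
        """
    ).fetchall()


conn = build_warehouse(RAW_ACCIDENTS)
report = accidents_by_province_and_year(conn)
for province, year, total in report:
    print(province, year, total)
```

Keeping the fact table down to surrogate keys plus an additive measure is what allows the same rows to be re-aggregated by month or quarter simply by changing the GROUP BY columns, which is the kind of roll-up the OLAP and SSRS chapters rely on.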

Date posted: 24/08/2023, 10:23
