(Essay) Data warehouse: building a data warehouse of building violations in the city of Chicago


VIETNAM - KOREA UNIVERSITY OF INFORMATION AND COMMUNICATION TECHNOLOGY
COMPUTER SCIENCE

DATA WAREHOUSE
BUILDING A DATA WAREHOUSE OF BUILDING VIOLATIONS IN THE CITY OF CHICAGO

Students: DINH VUONG GIA HUY (20SE5), TRAN THI MY DUYEN (20SE1)
Instructor: M.S. TRAN THANH LIEM

Da Nang, May 2023

COMMENT (For Instructor)

ACKNOWLEDGMENTS

We would like to express our sincere thanks to the teachers of the Department of Computer Science and to everyone who took the time to help us during the implementation of this project. In particular, we would like to thank M.S. Tran Thanh Liem, who agreed to supervise our topic and was dedicated to helping us with project information. Thanks to that support, we have completed our project and, most importantly, gained experience along the way. Although we have prepared the report very carefully, some errors are inevitable. We look forward to receiving your understanding and suggestions. We sincerely thank you!
TABLE OF CONTENTS

LIST OF FIGURES

CHAPTER 1: INTRODUCTION

1.1 Topic introduction

The data cover violations issued by the Department of Buildings from 2006 to the present. Lenders and title companies, please note: these data are historical in nature and should not be relied upon for real estate transactions. For transactional purposes such as closings, please consult the title commitment for outstanding enforcement actions in the Circuit Court of Cook County or the Chicago Department of Administrative Hearings. Violations are always associated with an inspection, and there can be multiple violation records for one inspection record.

Related Applications: Building Data Warehouse

1.2 Introduction to the Chicago Building Violations dataset

The Chicago Building Violations dataset provides information about building code violations that have occurred in the city of Chicago. It offers insight into the condition of buildings, the enforcement of building regulations, and the efforts made to ensure public safety and compliance with building codes. Here are some key details about the dataset:

Content: The dataset includes detailed information about building code violations in Chicago. It covers a wide range of violations, such as structural issues, plumbing problems, electrical hazards, lack of permits, and other violations related to building safety and maintenance.

Data Fields: The dataset typically includes ID, VIOLATION LAST MODIFIED DATE, VIOLATION DATE, VIOLATION CODE, VIOLATION STATUS, VIOLATION STATUS DATE, VIOLATION DESCRIPTION, VIOLATION LOCATION, VIOLATION INSPECTOR COMMENTS, VIOLATION ORDINANCE, INSPECTOR ID, INSPECTION NUMBER, INSPECTION STATUS, INSPECTION WAIVED, INSPECTION CATEGORY, DEPARTMENT BUREAU, ADDRESS, STREET NUMBER, STREET DIRECTION, STREET NAME, STREET TYPE, PROPERTY GROUP, SSA, LATITUDE, LONGITUDE, LOCATION, Community Areas, Zip Codes, Boundaries - ZIP Codes, Census Tracts, Wards, Historical Wards 2003-2015 (whether it has been resolved
or is still open).

Sources: The dataset is derived from official records maintained by the City of Chicago, including the Department of Buildings and other relevant authorities responsible for enforcing building codes and regulations.

Purpose: The Chicago Building Violations dataset serves multiple purposes. It helps city officials and inspectors monitor and enforce compliance with building codes, ensuring the safety and habitability of buildings within the city. The dataset also provides valuable information to researchers, analysts, and the general public interested in understanding building conditions, patterns of violations, and trends over time.

Analysis and Applications: The dataset can be analyzed to identify areas or types of buildings with a higher frequency of violations, allowing for targeted enforcement efforts. It can also be used to assess the effectiveness of building code regulations, identify areas in need of improvement, and evaluate the impact of enforcement actions. Researchers and analysts can use the dataset to study correlations between building violations and factors such as neighborhood characteristics, property ownership, or economic indicators.

Accessibility: The availability and accessibility of the dataset may vary. It may be accessible through the official website of the City of Chicago or other government data portals. Additionally, there might be different versions or subsets of the dataset, each containing specific time frames or types of violations.

The specific details and availability of the dataset may change over time. It is therefore recommended to refer to the official sources or the City of Chicago's data portal for the most up-to-date and accurate information regarding the Chicago Building Violations dataset.

1.3 Tools Used

1.3.1 SQL Server

SQL Server is a relational database management system (RDBMS) developed by Microsoft. It is primarily designed and developed to compete with MySQL and Oracle Database. SQL
Server supports ANSI SQL, the standard SQL (Structured Query Language). However, SQL Server also comes with its own implementation of the SQL language, T-SQL (Transact-SQL).

MS SQL Server as Client-Server Architecture

Fig. MS SQL Server as Client-Server Architecture

Key Components and Services of SQL Server

Below are the main components and services of SQL Server:

Database Engine: This component handles storage, rapid transaction processing, and data security.
SQL Server: This service starts, stops, pauses, and continues an instance of Microsoft SQL Server. Executable name is sqlservr.exe.
SQL Server Agent: It acts as a task scheduler. Jobs can be triggered by an event or run on demand. Executable name is sqlagent.exe.
SQL Server Browser: This listens for incoming requests and connects them to the desired SQL Server instance. Executable name is sqlbrowser.exe.
SQL Server Full-Text Search: This lets users run full-text queries against character data in SQL tables. Executable name is fdlauncher.exe.
SQL Server VSS Writer: This allows backup and restoration of data files through the Windows Volume Shadow Copy Service (VSS). Executable name is sqlwriter.exe.
SQL Server Analysis Services (SSAS): Provides data analysis, data mining, and machine learning capabilities. SQL Server is integrated with the R and Python languages for advanced analytics. Executable name is msmdsrv.exe.
SQL Server Reporting Services (SSRS): Provides reporting features and decision-making capabilities. It includes integration with Hadoop. Executable name is ReportingServicesService.exe.
SQL Server Integration Services (SSIS): Provides extract, transform, and load capabilities for moving different types of data from one source to another. It can be viewed as converting raw information into useful information. Executable name is MsDtsSrvr.exe.

SQL SERVER INSTANCE

SQL Server allows you to run multiple services at once, with each service having
separate logins, ports, databases, etc. Instances are divided into two types:

Primary instances
Named instances

There are two ways to access the primary instance: by the server name or by its IP address. Named instances are accessed by appending a backslash and the instance name. For example, to connect to an instance named xyz on the local server, you would use 127.0.0.1\xyz. From SQL Server 2005 onward, you are allowed to run up to 50 instances simultaneously on a server.

Note that even though you can have multiple instances on the same server, only one of them can be the default instance; the rest must be named instances. All instances can run concurrently, and each instance runs independently of the others.

IMPORTANCE OF SQL SERVER INSTANCES

The following are the advantages of SQL Server instances:

For installation of different versions on one machine: You can have different versions of SQL Server on a single machine. Each installation works independently from the other installations.

For cost reduction: Instances can help reduce the costs of operating SQL Server, especially in purchasing the SQL Server license. You can get different services from different instances, hence no need for purchasing one license for all services.

For maintenance of development, production and test environments separately: This is the main benefit of having many SQL Server instances on a single machine. You can use different instances for development, production and test purposes.

For reducing temporary database problems: When all services run on a single SQL Server instance, there is a high chance of running into problems, especially recurring ones. Running such services on different instances helps avoid these problems.

For separating security privileges: When different services are running on different SQL Server instances, you can focus on securing the instance running the most
sensitive service.

For maintaining a standby server: A SQL Server instance can fail, leading to an outage of services. This explains the importance of having a standby server that can be brought in if the current server fails. This can easily be achieved using SQL Server instances.

SQL Server Management Studio (SSMS)

SQL Server Management Studio (SSMS) is an integrated environment for managing any SQL infrastructure. Use SSMS to access, configure, manage, administer, and develop all components of SQL Server, Azure SQL Database, Azure SQL Managed Instance, SQL Server on Azure VM, and Azure Synapse Analytics. SSMS provides a single comprehensive utility that combines a broad group of graphical tools with many rich script editors to provide access to SQL Server for developers and database administrators of all skill levels.

1.3.2 Visual Studio 2022

Visual Studio 2022 is the latest major release of Microsoft's integrated development environment (IDE) for building software applications. It introduces several new features and improvements aimed at enhancing developer productivity, collaboration, and the overall development experience. Here is an overview of the key highlights of Visual Studio 2022:

64-bit Architecture: Visual Studio 2022 is now available as a native 64-bit application, providing improved performance and stability. With the 64-bit architecture, the IDE can handle larger projects and utilize more system resources, resulting in faster builds and smoother operations.

Enhanced Performance: Visual Studio 2022 introduces various performance improvements to make the IDE more responsive and efficient. These optimizations include faster startup times, improved load times for large solutions, quicker code navigation, and reduced memory usage.

Updated User Interface: The IDE's user interface has undergone a refresh in Visual Studio 2022. It features a cleaner and more modern look with redesigned icons, updated themes, and improved layout management options. The new UI provides a refreshed
coding experience and improves readability.

Productivity Enhancements: Visual Studio 2022 brings several productivity enhancements to help developers write code faster and with fewer distractions. Some notable features include improved IntelliSense with AI-driven suggestions, enhanced code search capabilities, customizable code formatting, and improved Git integration for easier version control.

Collaboration and Live Share: Visual Studio 2022 expands on the collaboration features introduced in previous versions. It includes enhancements to Visual Studio Live Share, allowing developers to collaborate in real time with teammates, regardless of the programming language or platform. Live Share enables shared editing, debugging, and code reviews, making it easier to work together on projects.

.NET and MAUI Support: Visual Studio 2022 provides comprehensive support for the latest .NET framework and the Multi-platform App UI (MAUI) framework. It includes templates, tools, and debugging capabilities to streamline the development of cross-platform applications targeting Windows, macOS, iOS, and Android.

Improved Web Development: Visual Studio 2022 offers enhanced web development capabilities with improved support for front-end frameworks like React, Angular, and Vue.js. It includes a new Hot Reload feature that allows developers to instantly view code changes in running applications without restarting or losing application state.

Cloud Development: The IDE has strengthened support for cloud development scenarios with improved integration with Azure services. Visual Studio 2022 provides streamlined workflows for building, deploying, and debugging cloud-native applications, including support for containers and serverless development.

These are just a few of the key highlights of Visual Studio 2022. The new release aims to provide a more efficient, modern, and collaborative development environment for developers across various platforms and programming languages. Visual Studio 2022 offers a
wide range of tools and features to support the development of desktop applications, web applications, mobile apps, and cloud-based solutions.

1.3.3 What is ETL?

ETL, which stands for extract, transform and load, is a data integration process that combines data from multiple data sources into a single, consistent data store that is loaded into a data warehouse or other target system.

As databases grew in popularity in the 1970s, ETL was introduced as a process for integrating and loading data for computation and analysis, eventually becoming the primary method to process data for data warehousing projects.

ETL provides the foundation for data analytics and machine learning workstreams. Through a series of business rules, ETL cleanses and organizes data in a way that addresses specific business intelligence needs, like monthly reporting, but it can also tackle more advanced analytics, which can improve back-end processes or end-user experiences. ETL is often used by an organization to:

Extract data from legacy systems
Cleanse the data to improve data quality and establish consistency
Load data into a target database

1.3.4 What is OLAP?
OLAP (for online analytical processing) is software for performing multidimensional analysis at high speeds on large volumes of data from a data warehouse, data mart, or some other unified, centralized data store.

Most business data have multiple dimensions, that is, multiple categories into which the data are broken down for presentation, tracking, or analysis. For example, sales figures might have several dimensions related to location (region, country, state/province, store), time (year, month, week, day), product (clothing, men/women/children, brand, type), and more.

But in a data warehouse, data sets are stored in tables, each of which can organize data into just two of these dimensions at a time. OLAP extracts data from multiple relational data sets and reorganizes it into a multidimensional format that enables very fast processing and very insightful analysis.

CHAPTER 2: DATA WAREHOUSES ANALYSIS AND DESIGN

2.1 Conceptual modeling

2.1.1 Dimension entities

We create the following dimensions:

DIM_VIOLATION: contains details about each violation. It could include fields such as VIOLATION CODE, VIOLATION STATUS, VIOLATION DESCRIPTION, VIOLATION LOCATION, VIOLATION ORDINANCE.
DIM_LOCATION: contains details about the location of each building involved in a violation. It could include fields such as VIOLATION LOCATION, ADDRESS, STREET NUMBER, STREET DIRECTION, STREET NAME, STREET TYPE, LATITUDE, LONGITUDE, LOCATION.
DIM_INSPECTION: contains details about the inspection that recorded each violation. It could include fields such as INSPECTION NUMBER, INSPECTION STATUS, INSPECTION WAIVED.
DIM_INSPECTOR: contains details about the inspector who recorded each violation. It could include fields such as VIOLATION INSPECTOR COMMENTS, INSPECTOR ID.
DIM_TIME: contains details about the dates attached to each violation. It could include fields such as VIOLATION LAST MODIFIED DATE, VIOLATION DATE, VIOLATION CODE, VIOLATION STATUS DATE.

2.1.2 Hierarchies of dimensions

DIM_VIOLATION:
Hierarchy 1: VIOLATION CODE can have sub-levels such as VIOLATION DESCRIPTION and VIOLATION ORDINANCE. For example, a specific violation code can have multiple corresponding descriptions and ordinances.
Hierarchy 2: VIOLATION STATUS can have a sub-level such as VIOLATION STATUS DATE.

DIM_LOCATION:
Hierarchy 1: VIOLATION LOCATION can have sub-levels such as ADDRESS, STREET NUMBER, STREET DIRECTION, STREET NAME, and STREET TYPE.
Hierarchy 2: LOCATION can have sub-levels such as LATITUDE and LONGITUDE.

DIM_INSPECTION:
Hierarchy 1: INSPECTION NUMBER can have sub-levels such as INSPECTION STATUS and INSPECTION WAIVED.

DIM_INSPECTOR:
Hierarchy 1: INSPECTOR ID can have a sub-level such as VIOLATION INSPECTOR COMMENTS.

DIM_TIME:
Hierarchy 1: VIOLATION DATE can have a sub-level such as VIOLATION LAST MODIFIED DATE.
Hierarchy 2: VIOLATION CODE can have a sub-level such as VIOLATION STATUS DATE.

2.1.3 Conceptual modeling diagram

Fig. Conceptual modeling diagram

2.2 Logical Modeling

Modeling diagram
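The dimensions listed in section 2.1.1 can be sketched as a star schema. The example below is only an illustration under several assumptions: it uses Python's built-in sqlite3 module with an in-memory database rather than SQL Server, it keeps only a subset of the columns named in this report, and the FACT_VIOLATION table, the *_KEY surrogate keys, the helper functions, and the sample values are all hypothetical and not taken from the project.

```python
import sqlite3

# Dimension columns follow section 2.1.1 (subset). The *_KEY surrogate keys
# and the FACT_VIOLATION table are assumptions made for this sketch.
DIMENSIONS = {
    "DIM_VIOLATION": ["VIOLATION_CODE", "VIOLATION_STATUS", "VIOLATION_DESCRIPTION",
                      "VIOLATION_ORDINANCE"],
    "DIM_LOCATION": ["ADDRESS", "STREET_NUMBER", "STREET_DIRECTION", "STREET_NAME",
                     "STREET_TYPE", "LATITUDE", "LONGITUDE"],
    "DIM_INSPECTION": ["INSPECTION_NUMBER", "INSPECTION_STATUS", "INSPECTION_WAIVED"],
    "DIM_INSPECTOR": ["INSPECTOR_ID", "VIOLATION_INSPECTOR_COMMENTS"],
    "DIM_TIME": ["VIOLATION_DATE", "VIOLATION_LAST_MODIFIED_DATE", "VIOLATION_STATUS_DATE"],
}


def key_name(table):
    """DIM_VIOLATION -> VIOLATION_KEY (hypothetical naming convention)."""
    return table[len("DIM_"):] + "_KEY"


def create_star_schema(conn):
    """Create one table per dimension plus a fact table that references them."""
    for table, columns in DIMENSIONS.items():
        cols = ", ".join(f"{c} TEXT" for c in columns)
        conn.execute(f"CREATE TABLE {table} ({key_name(table)} INTEGER PRIMARY KEY, {cols})")
    fks = ", ".join(
        f"{key_name(t)} INTEGER REFERENCES {t}({key_name(t)})" for t in DIMENSIONS
    )
    conn.execute(f"CREATE TABLE FACT_VIOLATION (ID INTEGER PRIMARY KEY, {fks})")


def get_or_create(conn, table, row):
    """Return the surrogate key of the dimension row, inserting it if missing."""
    cols = DIMENSIONS[table]
    values = [row.get(c) for c in cols]
    # "IS" is SQLite's NULL-safe equality, so missing fields still match.
    where = " AND ".join(f"{c} IS ?" for c in cols)
    found = conn.execute(
        f"SELECT {key_name(table)} FROM {table} WHERE {where}", values
    ).fetchone()
    if found:
        return found[0]
    marks = ", ".join("?" for _ in cols)
    cur = conn.execute(
        f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({marks})", values
    )
    return cur.lastrowid


def load_violation(conn, source_id, row):
    """Load one source record: resolve every dimension key, then add the fact."""
    keys = [get_or_create(conn, t, row) for t in DIMENSIONS]
    marks = ", ".join("?" for _ in range(len(keys) + 1))
    conn.execute(f"INSERT INTO FACT_VIOLATION VALUES ({marks})", [source_id, *keys])


conn = sqlite3.connect(":memory:")
create_star_schema(conn)

# Made-up record for illustration only; not real data from the dataset.
sample = {
    "VIOLATION_CODE": "CODE-A",
    "VIOLATION_STATUS": "OPEN",
    "ADDRESS": "1 EXAMPLE ST",
    "INSPECTION_NUMBER": "100",
    "INSPECTOR_ID": "INSP-1",
    "VIOLATION_DATE": "2023-01-01",
}
load_violation(conn, 1, sample)
load_violation(conn, 2, {**sample, "VIOLATION_DATE": "2023-02-01"})

# Roll the facts up along the violation dimension.
rows = conn.execute(
    """
    SELECT v.VIOLATION_CODE, COUNT(*)
    FROM FACT_VIOLATION f
    JOIN DIM_VIOLATION v ON v.VIOLATION_KEY = f.VIOLATION_KEY
    GROUP BY v.VIOLATION_CODE
    """
).fetchall()
print(rows)
```

Because each dimension row is looked up before it is inserted, the two sample records share one DIM_VIOLATION row and differ only in DIM_TIME. That is the property that lets a fact table be aggregated along any dimension with a single join.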

Date posted: 20/09/2023, 15:03
