
Faculty of Information Systems - Data Analytics in Business - Project Report - Topic: Analyze Sales Data of AdventureWorks Using Power BI


DOCUMENT INFORMATION

Basic information

Title: Analyze Sales Data of AdventureWorks Using Power BI
Authors: Vo Thi Ngoc Trinh, Nguyen Tran Bich Ngoc, Truong Do Dang Khoa, Nguyen Thanh Phat
Supervisor: M.Sc. Le Ba Thien
School: University of Economics and Laws
Major: Data Analytics in Business
Type: Project Report
Year of publication: 2024
City: Ho Chi Minh City
Pages: 54
File size: 3.22 MB

Content


Ho Chi Minh City, January 04th, 2024

Group 5


ACKNOWLEDGEMENT

Our team would like to express our sincere appreciation to Mr. Le Ba Thien, the lecturer, for his enthusiastic guidance and for passing on essential knowledge to us during this course. Thanks to that, we have gained plenty of helpful information that we can later apply in real work.

"Data Analysis In Business" is a fascinating, very practical, and highly valuable subject. Driven by our interest in this subject, the team put in a lot of effort and researched further essential information. Yet the research is undoubtedly not perfect, since our knowledge in this field is still limited and our skills are still developing.

So that the group can benefit from the experience and perform better on future projects, we hope that you will consider our work and offer suggestions.

Sincere thanks!

Ho Chi Minh City,

TABLE OF CONTENTS

DECLARATION
ACKNOWLEDGEMENT
TABLE OF CONTENTS
CHAPTER 1 TOPIC OVERVIEW
CHAPTER 2 THEORETICAL BACKGROUND AND RELATED WORKS
2.1 Overview of BI
2.1.1 Introduce BI model and solution
2.2.1 Theory and Methods in Data Analysis
CHAPTER 3 ANALYSIS OF USER REQUIREMENTS AND DATA DESCRIPTION
3.1 Apply the development life cycle of a data analytics project
3.2 Identify and analyze user requirements
3.2.3 Product analysis
3.3 SQL Server Integration Services
3.3.2 Loading Data into SQL Server
3.3.5 Maintenance and Optimization
CHAPTER 4 EXPERIMENTAL RESULTS AND ANALYSIS
4.1.4 Employee Evaluation
4.2 Evaluation and Suggestion

Tables listed: Dim_Product table, Date table

LIST OF FIGURES

Figure 2.1 BI model
Figure 3.1 The Process of Data Analysis in Six Steps
Figure 3.2 Dim Product
Figure 4.1 Data visualization of General Analysis
Figure 4.2 Data visualization of Revenue Analysis
Figure 4.3 Dashboard of Product Analysis

CHAPTER 1 TOPIC OVERVIEW

1.1 The reason for choosing the topic:

Data is an indispensable asset for any firm, so our aim is to research different perspectives on handling data effectively. The ultimate goal is to give the business a set of intelligent and practical recommendations.

We chose this subject because we believe that a solid understanding of data, combined with our foundational knowledge of the business domain, will enable us to carry out meaningful data mining.

By drawing on our knowledge of the business area, we expect to extract valuable information and contribute substantially to the overall goals of the research.

Specifically, the company chosen for analysis is Adventure Works, a prominent global entity that manufactures and sells diverse products, ranging from clothing and accessories to bicycle parts and complete bicycles. It operates in a commercial market that spans six countries across three continents: Australia, North America (United States and Canada), and Europe (United Kingdom, France, and Germany). Adventure Works therefore offers a rich and diverse dataset for investigation.

Furthermore, the company has two primary sales channels, online sales and wholesaler sales, which adds a further layer of complexity to the analysis. This multi-faceted setting matches our intention to explore several dimensions of the data and gives a well-rounded view of how AdventureWorks operates in its global market.

Through this exploration, we aim to contribute insights that can inform data-driven decision-making and strategy for businesses operating in a multifaceted, global marketplace.

1.2 Topic goal:

- Analyze the business model from the perspectives of revenue, staff, and product

- Sort the best-selling items, then group consumers and areas according to them

- Build a report including four dashboards:

+ General business situation

+ Detailed business situation by product

+ Detailed business situation according to employee

+ Detailed business situation according to revenue

- Make some suggestions for the business

- Create some potential paths for the topic's development

- Make a proposal for Adventure Works company's future business plan based on the 4P model (Product - Price - Place - Promotion)

1.3 Subject and research scope of the project:

- Subject: Microsoft's AdventureWorks database, a free dataset

- Research scope: Information from the Manufacturing, Sales, Purchasing, Product Management, Reseller Management, and Human Resources is investigated in this research

1.4 Tools used:

- SQL Server

- Power BI

1.5 Research implications:

After finishing this research, Adventure Works can:

- Identify target customers

- Review the company's statistical information to build strategies and adjust them promptly

- Compile statistics on employee capacity and performance, and use them to set appropriate business strategies as well as rewards or training

1.6 Structure of report:

Chapter 1: TOPIC OVERVIEW

Chapter 2: THEORETICAL BACKGROUND AND RELATED WORKS

Chapter 3: ANALYSIS OF USER REQUIREMENTS AND DATA DESCRIPTION

Chapter 4: EXPERIMENTAL RESULTS AND ANALYSIS

Chapter 5: CONCLUSION

CHAPTER 2 THEORETICAL BACKGROUND AND RELATED WORKS

2.1 Overview of BI:

In the contemporary business landscape, organizations amass vast volumes of data on sales, inventory, customer details, and supplier information, alongside comprehensive employee records. Despite this abundance, the data's usefulness in guiding managerial decisions remains limited. Establishing a unified database in which this data is systematically classified and organized holds immense potential: it enables businesses to assess past performance retrospectively while also anticipating future scenarios.

The adoption of Business Intelligence (BI) emerges as a pivotal solution in this context. Coined by Howard Dresner in 1989, BI is a broad term encompassing a range of concepts and methodologies aimed at enhancing decision-making through diverse information techniques. As articulated by Turban et al. (2008), BI can be conceptualized as a suite of applications and techniques designed to collect, store, analyze, and provide access to data, empowering business users in their decision-making processes. The scope of BI applications spans decision support systems (DSS), query and reporting, online analytical processing (OLAP), statistical analysis, forecasting, and data mining.

Carlo (2009) further refines the definition, portraying BI as a collection of mathematical models and analytical methods that delve into existing data to extract information and knowledge crucial for decision-making. By incorporating BI methodologies, businesses gain a competitive edge because managers can make faster and better-informed decisions. This integrated approach fosters a comprehensive understanding of past business scenarios and helps enterprises predict and navigate future situations with greater precision. In essence, BI is an indispensable tool for strategic decision-making in businesses that aim to stay ahead in today's dynamic and competitive markets.

2.1.1 Introduce BI model and solution

Figure 2.1 BI model

Within the framework of Business Intelligence (BI), a comprehensive model consists of key components aimed at enhancing data-driven decision-making for businesses. These components include:

- Data Modeling: The data modeling process entails the analysis and definition of data types and interconnections within the business context. This includes the creation of conceptual, logical, and physical data models, employing text, symbols, and diagrams.

- Data Mining: Data mining is an automated process focused on revealing patterns and anomalies within data, employing diverse analytical techniques such as exploratory, descriptive, statistical, and predictive analytics.

- Data Visualization: The process of data visualization involves presenting findings in an intuitive and interactive manner through mediums such as dashboards, charts, graphs, and maps.

implementation of actions guided by data insights This includes adapting

performance, establishing benchmarks, and addressing challenges

These components collectively form a robust BI model that helps businesses make better-informed decisions and improve their efficiency, profitability, and competitiveness. By incorporating these elements into their operations, organizations can use the full potential of BI to navigate dynamic market conditions and achieve sustainable growth.

2.1.2 The benefits of BI in the business

Numerous scholarly papers underscore the advantages of Business Intelligence (BI) in the corporate landscape, highlighting several key points:

- Informed Strategic Decisions: BI helps businesses make well-informed strategic decisions by delivering the accurate and timely data and insights needed to navigate a dynamic business landscape.

- Trend and Pattern Identification: BI helps businesses discern trends and patterns within their data, offering insights into customer behavior, market demand, sales performance, and operational efficiency.

- Performance and Revenue Optimization: BI helps businesses improve performance and revenue by optimizing marketing and sales strategies, improving customer satisfaction and retention, and reinforcing competitive advantage.

- Operational Efficiency Enhancement: Businesses use BI to raise operational efficiency by reducing costs, eliminating waste, streamlining processes, and improving overall quality and productivity.

- Opportunity Discovery through Predictions: With BI's predictive capabilities, businesses uncover opportunities for improvement, whether by forecasting demand, identifying risks, or receiving actionable recommendations.

- Creation of Smarter and Faster Reports: BI lets businesses generate reports that are not only smarter and faster but also easy to understand, share, and act on, which supports efficient decision-making.

2.2 Data analysis and visualization

Data analysis involves the exploration of extensive stored data to unveil novel relationships, patterns, and trends. This process employs pattern recognition technologies, statistical methods, and mathematical techniques to scrutinize repositories comprehensively. Conceptually, data analysis can be likened to "data drilling" in depth and "data aggregation" in breadth, examining data from multiple perspectives to discern relationships among its components. This approach aims to uncover hidden trends, patterns, and past experiences within the data warehouse, ultimately supporting operational processes and decision-making.

A crucial aspect of the broader business intelligence landscape is data visualization. Simply put, data visualization entails presenting a specific dataset in a visual format, including charts, graphs, maps, and more. The graphical representation of text-based data allows for the identification of new insights and concealed patterns that might be challenging to discern in raw, non-graphical forms.

The primary motivation behind data visualization is to identify patterns, trends, and relationships among diverse datasets that might be less apparent in a non-graphical representation. This visual approach enhances users' understanding of market dynamics and facilitates the evaluation of customer needs. Consequently, businesses can evolve by developing new strategies and techniques to enhance their operations. Recognizing this, software companies are channeling their efforts into optimizing their Business Intelligence (BI) tools to provide the most effective data visibility. This emphasis on data visibility is integral to unveiling concealed information within the warehouse and contributes to better-informed decision-making.

2.2.1 Theory and Methods in Data Analysis

Theoretical Frameworks in Data Analysis:

- Exploratory Data Analysis (EDA): The EDA theory proposed by John Tukey emphasizes analyzing datasets to succinctly capture their essential characteristics. This typically involves statistical graphics and various data visualization techniques.

- Confirmatory Data Analysis (CDA): CDA focuses on employing conventional statistical tools to rigorously evaluate data, aiming to scrutinize and challenge any assumptions that may have arisen during the Exploratory Data Analysis phase.

- Grounded Theory of Analysis: The Grounded Theory of Analysis unfolds in two stages: first, collecting a substantial amount of information; second, analyzing all gathered data, indexing it, and discovering relevance. This iterative process continues as more data is collected and analyzed.

Data Analysis Methods:

- Multidimensional Cubes: Multidimensional cubes let managers and employees explore data comprehensively through operations such as rotation, slicing, and drill-down, providing versatile perspectives on the dataset.

- Time Series Analysis: Time series analysis entails systematically recording data over a defined period, facilitating the identification of trends and differences over time. This method helps companies predict and forecast future developments based on historical data.

- Data Mining: Data mining is a method in which large datasets are scrutinized to identify trends and patterns, revealing valuable insights into customer behaviors, habits, and evolving trends.

- Optimization Models: Optimization models consist of three fundamental elements (an objective function, decision variables, and business constraints) that work together to pinpoint the most favorable solution from a predetermined set of options.
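To make the three elements of an optimization model concrete, the following is a minimal sketch, not part of the report's analysis. The products, unit profits, labour hours, and the 40-hour limit are all hypothetical values chosen for illustration. The decision variables are the quantities of each product, the objective function is total profit, and the business constraint is the labour-hour limit. The candidate plans are simply enumerated, which matches the idea of choosing from a predetermined set of options.

```python
from itertools import product

# Hypothetical data (illustrative only, not taken from AdventureWorks).
PROFIT = {"bike": 120.0, "helmet": 20.0}   # profit per unit
HOURS = {"bike": 4.0, "helmet": 0.5}       # labour hours per unit
MAX_HOURS = 40.0                           # business constraint


def objective(plan):
    """Objective function: total profit of a production plan."""
    return sum(PROFIT[item] * qty for item, qty in plan.items())


def feasible(plan):
    """Business constraint: total labour hours must not exceed MAX_HOURS."""
    return sum(HOURS[item] * qty for item, qty in plan.items()) <= MAX_HOURS


def best_plan(max_bikes=10, max_helmets=20):
    """Enumerate the candidate plans and keep the most profitable feasible one."""
    candidates = (
        {"bike": b, "helmet": h}  # decision variables: quantities to produce
        for b, h in product(range(max_bikes + 1), range(max_helmets + 1))
    )
    return max((p for p in candidates if feasible(p)), key=objective)
```

Exhaustive enumeration is only practical for small option sets; larger problems would normally be handed to a dedicated solver.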

2.2.2 Visualization

Data visualization serves as a conduit, transforming information from numerical metrics into visual representations, typically in the form of charts. The primary purpose is to make information easy for managers and department employees to understand, enabling swift and informed decision-making.

Figure 2.2 Data visualization by Power BI

Various types of charts cater to specific objectives within the realm of data visualization:

- Specific Value Representation: Charts designed for specific value representation include single-value charts, tables, and highlight tables. These present individual data points or key metrics clearly.

- Comparison Charts: Comparison charts are diverse, ranging from single and multiple lines to bar charts, grouped bar charts, and bullet charts. These facilitate comparisons between different datasets, helping users discern trends and variations.

- Relationship Visualization: Relationship visualization includes scatter plots, bubble charts, and word clouds. These charts illustrate connections and associations between data points, aiding the interpretation of relationships within datasets.

- Composition Charts: Composition charts, such as tree maps, pie charts, and donut charts, show the composition of a whole. These visualizations depict the distribution of components within a dataset.

- Distribution Charts: Distribution charts such as box plots, scatter plots, and histograms represent the spread and distribution of data. They offer insights into the variation and concentration of values within a dataset.

- Geographic Visualization: Geographic visualization involves filled maps and symbol maps. These charts are particularly useful for showing geographical data and helping users understand spatial distribution and patterns.

In essence, each chart type serves a distinct purpose, helping users interpret and act upon information swiftly and effectively.

2.3 Data warehouse:

At its core, a data warehouse is a centralized repository designed to integrate and store vast volumes of data originating from diverse sources within an organization. This consolidation plays a pivotal role in supporting business intelligence (BI) and decision-making processes by furnishing a unified and historical perspective on data. Several fundamental theoretical concepts underpin the structure and functionality of data warehouses:

Data Integration: Data warehouses integrate information from disparate sources such as transactional databases, spreadsheets, and external systems. This integration process involves transforming and cleaning data to ensure uniformity and quality.

Dimensional Modeling: The theoretical cornerstone of data warehousing, dimensional modeling, entails organizing data into dimensions and facts. This creates a star or snowflake schema in which dimensions hold descriptive data and facts hold measurable data. Ralph Kimball's (2011) contributions, particularly in dimensional modeling, have profoundly shaped data warehousing, and his works provide actionable guidance for constructing effective and scalable data warehouses.

ETL (Extract, Transform, Load): ETL processes are a foundational element, orchestrating the extraction of data from source systems, its transformation to fit the data warehouse schema, and finally its loading into the warehouse. This process safeguards data quality and consistency.

OLAP (Online Analytical Processing): Integral to data warehousing, OLAP refers to a spectrum of tools and technologies that let users interactively analyze multidimensional data. Data warehouses are explicitly designed to support OLAP queries for intricate analysis and reporting.

Historical Data Storage: A distinguishing feature of data warehouses is their ability to store historical data. This capability allows users to analyze trends and make informed decisions grounded in a comprehensive, long-term perspective.

Data Mart and Enterprise Data Warehouse: Data marts are subsets focused on specific business functions or departments. Their integration into the enterprise data warehouse (EDW) illustrates the scalability and comprehensiveness of data warehousing solutions.

2.4 SSIS:

SQL Server Integration Services (SSIS) is a powerful ETL (Extract, Transform, Load) tool developed by Microsoft for data integration and workflow applications. It is a fundamental component of the SQL Server database platform, designed to facilitate the extraction, transformation, and loading of data from various sources into target destinations. Numerous books and online tutorials authored by SSIS experts, such as Brian Knight and Andy Leonard (2005), offer practical insights into SSIS development, covering topics from basic concepts to advanced techniques:

Data Flow: At the heart of SSIS is the data flow engine, which defines the movement and transformation of data between sources and destinations. The data flow is composed of data flow components that enable diverse operations, such as data cleansing, aggregation, and merging.

Control Flow: SSIS employs a control flow to manage the flow of tasks and containers in a package. Control flow elements include tasks (e.g., data extraction, transformation, and loading tasks), precedence constraints that define the order of execution, and containers for grouping tasks.

SSIS Package: A package is a collection of interconnected data flow and control flow elements. It serves as a container for organizing and executing ETL processes. Packages can be developed using SQL Server Data Tools (SSDT) and executed using SQL Server Management Studio (SSMS) or through the SSIS runtime.

Connection Managers: Connection managers in SSIS define the connection information for source and destination systems. They play a crucial role in establishing connections to various data sources and destinations, ensuring seamless data movement.

Transformations: SSIS includes a variety of transformations that enable the manipulation and enrichment of data during the ETL process. Common transformations include sorting, merging, and aggregating data.

Expressions and Variables: Expressions and variables in SSIS allow for dynamic configuration and the manipulation of values at runtime. Expressions can be used to set properties dynamically, enhancing the flexibility of SSIS packages.

2.5 Schema:

A schema is a fundamental concept in database management and information organization, providing a blueprint or framework for structuring and defining the logical organization of data. It serves as a set of rules or specifications that dictate how data should be organized, stored, and accessed within a database. The theoretical background of schemas encompasses several key aspects:

Database Schema: A database schema defines the structure of a database, including tables, relationships, constraints, and other elements. It serves as a high-level abstraction that provides an organized representation of the data model, facilitating data integrity and consistency.

Schema Elements: Within a database schema, various elements contribute to data organization. These include tables, which represent entities; columns, which define attributes; primary and foreign keys, which establish relationships; and constraints, which enforce data integrity rules.

Normalization: The process of normalization, based on normal forms (e.g., First Normal Form, Second Normal Form), is a theoretical framework for organizing data within a schema to eliminate redundancy and dependency issues. Normalization aims to enhance data integrity and reduce data anomalies.

Denormalization: Denormalization means introducing redundancy into a schema for performance optimization. It is a theoretical concept often applied in data warehousing or in scenarios where read performance takes priority over data modification efficiency.

Schema Evolution: Schema evolution refers to the process of modifying a database schema over time to accommodate changes in data requirements. Theoretical considerations include strategies for versioning, backward compatibility, and migration to ensure a seamless transition when updating the schema.

CHAPTER 3 ANALYSIS OF USER REQUIREMENTS AND DATA DESCRIPTION

3.1 Apply the development life cycle of a data analytics project

The data analytics lifecycle outlines the six fundamental steps of a data analytics project, based on the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology. According to Paula Munoz, an alumna of Northeastern University, this process consists of business understanding, data understanding, data preparation, exploratory analysis and modeling, validation, and visualization and presentation.

Figure 3.1 The Process of Data Analysis in Six Steps (Northeastern University, based on the CRISP-DM methodology)

Step 1: Business Issues Understanding

This phase concentrates on understanding the objectives and requirements of the project.

Determine business objectives: thoroughly understand, from a business perspective, what the customer really wants to accomplish, and then define business success criteria

Assess situation: determine resource availability and project requirements, assess risks and contingencies, and conduct a cost-benefit analysis

Determine data mining goals: in addition, define what success looks like from a technical data mining perspective

Produce project plan: select technologies and tools and define detailed plans for each project phase

Step 2: Data Understanding

Focus on identifying, collecting, and analyzing the data sets. This step includes four tasks:

Collect initial data: import necessary data into an analysis tool

Describe data: examine the data and document its properties such as data format, number of records, or field identities

Explore data: query, visualize and identify relationships among the data

Verify data quality: check how clean the data is and document any quality issues

Step 3: Data Preparation

Often referred to as “data munging”, prepare the final dataset for modeling:

Select data: define reasons for data inclusion/exclusion

Clean data: correct, impute, or remove erroneous values

Construct data: derive new helpful attributes

Integrate data: combine data from multiple sources

Format data: re-format data as necessary
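The "Clean data" task above (correct, impute, or remove erroneous values) can be sketched as follows. This is an illustrative sketch only: the field name `unit_price`, the rule that negative prices are invalid, and the choice of median imputation are assumptions made for the example, not rules stated in the report.

```python
from statistics import median


def clean_rows(rows, field="unit_price"):
    """Correct, remove, and impute values in a list of dict records."""
    # Correct: strip whitespace and coerce numeric strings to float.
    parsed = []
    for row in rows:
        raw = row.get(field)
        try:
            value = float(str(raw).strip()) if raw not in (None, "") else None
        except ValueError:
            value = None  # unparseable text is treated as missing
        parsed.append({**row, field: value})

    # Remove: drop rows whose value is negative (assumed invalid here).
    kept = [r for r in parsed if r[field] is None or r[field] >= 0]

    # Impute: fill missing values with the median of the observed ones.
    observed = [r[field] for r in kept if r[field] is not None]
    fill = median(observed) if observed else None
    return [{**r, field: fill if r[field] is None else r[field]} for r in kept]
```

Whether to impute or to drop a record depends on the analysis; the median is used here only because it is robust to outliers.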

Step 4: Perform Exploratory Analysis and Modeling

Start creating models to test data and search for solutions to the stated goals.

Select modeling techniques: determine algorithms

Generate test design: depending on the modeling approach, split the data into training, test, and validation sets

Build model: code execution

Assess model: interpret the model results based on domain knowledge, the pre-defined success criteria, and the test design
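The "Generate test design" task mentions splitting the data into training, test, and validation sets. A minimal sketch of such a split is shown below; the function name `split_dataset`, the default fractions, and the fixed seed are assumptions chosen for illustration.

```python
import random


def split_dataset(records, train=0.7, validation=0.15, seed=42):
    """Shuffle records and split them into training, validation, and test sets."""
    if not 0 < train < 1 or not 0 <= validation < 1 or train + validation >= 1:
        raise ValueError("train and validation fractions must leave room for a test set")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],  # the remainder becomes the test set
    )
```

A random split suits records that are independent of each other; for time-ordered sales data, a split by date is usually the safer choice.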

Step 5: Validation

Once the predictive models are constructed, conduct thorough data analysis to assess their effectiveness. Verify the accuracy and validity of the information used in the models.

Then determine whether the developed models operate as envisioned. Examine their behavior, performance, and alignment with the initial objectives.

Evaluate the need for additional data cleansing. Identify any persistent anomalies, inconsistencies, or outliers that may impact model performance.

Step 6: Visualization and Presentation

Once all deliverables are completed, begin working on data visualization. Data visualization is often crucial for communicating findings effectively to clients. Interactive visualization tools like Tableau are valuable for explaining research findings to clients who may not be data experts. Weaving a narrative with the data is essential for conveying the significance of the research to the client. Then, clearly define the project objectives to ensure a successful outcome. Break down the project into specific tasks to streamline the process and deliver exceptional results. Finally, gather all necessary information before starting the project to avoid delays and rework.

3.2 Identify and analyze user requirements

3.2.1 Business overview analysis

Requirements: Gathering statistical data concerning fundamental business activities across the enterprise:

- Sales volume, revenue, profits, employee count, resellers, and products

- Group-based, regional, and product category-specific statistics, segmented by quarterly periods

- Analyzing and filtering these metrics annually

Benefits: Furnishes business managers with a holistic overview of their ongoing business endeavors

3.2.2 Revenue analysis

Requirements: We need comprehensive statistics delineating revenue and business profits categorized by product, region, and broken down into monthly, quarterly, and annual data

- Total revenue, profit

- Year on Year Revenue Growth (Net Revenue YoY%) = (this year's revenue / last year's revenue - 1) * 100

- Year on Year Profit Growth (Net Profit YoY%) = (this year profit / last year -
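The year-on-year revenue formula above can be checked with a short sketch. The helper names and the yearly figures are hypothetical and only illustrate the arithmetic; the same helper could presumably be applied to profit figures in the same way.

```python
def yoy_growth(current, previous):
    """Return year-on-year growth in percent: (current / previous - 1) * 100."""
    if previous == 0:
        raise ValueError("previous-year value must be non-zero")
    return (current / previous - 1) * 100


# Hypothetical yearly revenue figures, for illustration only.
revenue_by_year = {2011: 1_000_000.0, 2012: 1_250_000.0, 2013: 1_000_000.0}


def yoy_series(values_by_year):
    """Compute YoY% for each year whose preceding year is present in the mapping."""
    years = sorted(values_by_year)
    return {
        year: yoy_growth(values_by_year[year], values_by_year[prev])
        for prev, year in zip(years, years[1:])
        if year == prev + 1  # skip gaps so growth is never computed across missing years
    }
```

With these figures, 2012 shows 25% growth over 2011, and 2013 shows a decline of about 20% against 2012.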

- Increase and decrease in revenue over the years

Benefits: Offer a full picture of business performance. They reveal sales trends, highlight successful product categories, and identify top-performing items, guiding strategic decisions for maximizing potential and market focus

3.2.4 Sales person analysis

Requirements: Employee sales performance statistics

- Percentage of target completion for individual employees

- Top-performing employees

Benefits: Employee performance statistics can help identify top performers, pinpoint areas for improvement, and allocate resources more effectively, optimizing organizational strategies and business outcomes
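The two requirements above, percentage of target completion and top-performing employees, can be sketched as follows. The record layout (`name`, `sales`, `quota`) and the sample figures are hypothetical, and ranking by completion percentage is an assumption made for the example.

```python
def target_completion(sales, quota):
    """Percentage of the sales target achieved by one employee."""
    if quota <= 0:
        raise ValueError("quota must be positive")
    return sales / quota * 100


def top_performers(employees, n=3):
    """Return the names of the n employees with the highest target completion."""
    ranked = sorted(
        employees,
        key=lambda e: target_completion(e["sales"], e["quota"]),
        reverse=True,
    )
    return [e["name"] for e in ranked[:n]]


# Hypothetical records; names and figures are illustrative only.
staff = [
    {"name": "A", "sales": 90_000, "quota": 100_000},
    {"name": "B", "sales": 150_000, "quota": 120_000},
    {"name": "C", "sales": 60_000, "quota": 50_000},
]
```

Ranking by completion percentage instead of raw sales avoids penalizing employees who were assigned smaller quotas.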

3.3 SQL Server Integration Services

We have implemented a process involving SQL Server Integration Services (SSIS) to import data from CSV and Excel files into SQL Server. This approach is commonly used in building data warehouses, where data is organized into dimensional (dim) and fact tables. Here is an extended explanation:

3.3.1 Data Extraction Using SSIS:

Integration Services (SSIS) is a powerful ETL (Extract, Transform, Load) tool provided with Microsoft SQL Server. It can be used to perform a broad range of data migration tasks.

SSIS provides specific connectors and components to handle various data sources efficiently, including CSV files, which are the format of our data.

3.3.2 Loading Data into SQL Server:

SQL Server is the destination of the data flow: The extracted data is loaded into SQL Server, which serves as the central repository for our data.

SSIS Data Flow: SSIS enables the creation of data flow tasks, which move data from a source to a destination. During this process, the data undergoes transformations so that it is properly cleansed and formatted before being stored in SQL Server. This ensures the data meets the required standards and is ready for analysis and use.
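The extract, transform, and load sequence described above can be illustrated outside SSIS with a small script. This is a sketch under stated assumptions, not the report's SSIS package: Python's built-in `sqlite3` stands in for SQL Server, the CSV content is made up, and the rule that rows without a cost are rejected is invented for the example.

```python
import csv
import io
import sqlite3

# Hypothetical CSV extract; the columns are illustrative, not the report's files.
RAW_CSV = """ProductKey,ProductName,StandardCost
1, Mountain Bike ,1200.50
2,Helmet,
3,Road Bike,900
"""


def extract(text):
    """Extract: read CSV rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim names, cast types, and skip rows without a cost."""
    cleaned = []
    for row in rows:
        cost = row["StandardCost"].strip()
        if not cost:
            continue  # assumed rule: rows with no cost are rejected
        cleaned.append((int(row["ProductKey"]), row["ProductName"].strip(), float(cost)))
    return cleaned


def load(rows, conn):
    """Load: insert the cleaned rows into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Dim_Product ("
        "ProductKey INTEGER PRIMARY KEY, ProductName TEXT, StandardCost REAL)"
    )
    conn.executemany("INSERT INTO Dim_Product VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run the three stages in order and return the number of rows loaded."""
    load(transform(extract(RAW_CSV)), conn)
    return conn.execute("SELECT COUNT(*) FROM Dim_Product").fetchone()[0]
```

In SSIS the same stages correspond roughly to a flat file source, one or more transformation components, and a destination component inside a data flow task.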

3.3.3 Designing Dim and Fact Tables:

Data Warehouse Structure: In a data warehouse, data is structured into two kinds of tables: dimension (dim) tables and fact tables. Dim tables generally hold descriptive information, such as product details, with fields like name, product code, and standard cost. Fact tables predominantly store quantitative data, such as sales records, with metrics like revenue and quantity recorded against a date. The two kinds of tables are interconnected: dim tables provide the context and attributes for the quantitative data in the fact tables.

Normalization and Star Schema: The design may involve normalizing dim tables and using a star schema, in which one fact table is connected to multiple dim tables. This creates a more flexible and efficient structure for analytical queries.
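A star schema of this kind can be sketched in a few lines. The example below uses Python's built-in `sqlite3` in place of SQL Server. The table `Fact_Sales`, the `Dim_Date` layout, and all rows are hypothetical; only the general shape (a fact table keyed to dimension tables) follows the description above.

```python
import sqlite3


def build_star_schema():
    """Create a tiny star schema: one fact table referencing two dimension tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Dim_Product (
            ProductKey INTEGER PRIMARY KEY, ProductName TEXT, StandardCost REAL);
        CREATE TABLE Dim_Date (
            DateKey INTEGER PRIMARY KEY, Year INTEGER, Quarter INTEGER);
        CREATE TABLE Fact_Sales (
            ProductKey INTEGER REFERENCES Dim_Product(ProductKey),
            DateKey INTEGER REFERENCES Dim_Date(DateKey),
            Quantity INTEGER,
            Revenue REAL);
        INSERT INTO Dim_Product VALUES (1, 'Mountain Bike', 1200.0), (2, 'Helmet', 15.0);
        INSERT INTO Dim_Date VALUES (20230101, 2023, 1), (20230401, 2023, 2);
        INSERT INTO Fact_Sales VALUES
            (1, 20230101, 2, 4000.0),
            (2, 20230101, 10, 350.0),
            (1, 20230401, 1, 2000.0);
    """)
    return conn


def revenue_by_product(conn):
    """Join the fact table to a dimension so the numbers gain business context."""
    query = """
        SELECT p.ProductName, SUM(f.Revenue)
        FROM Fact_Sales AS f
        JOIN Dim_Product AS p ON p.ProductKey = f.ProductKey
        GROUP BY p.ProductName
        ORDER BY p.ProductName
    """
    return conn.execute(query).fetchall()
```

Because every analytical question is answered by joining the single fact table to whichever dimensions are needed, queries such as revenue by quarter follow the same pattern with `Dim_Date`.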

3.3.4 Populating Dim and Fact Tables:

SSIS Packages for Loading: SSIS packages are created to populate the dim and fact tables in SQL Server. These packages can either be scheduled to run automatically or be triggered manually, depending on how frequently the data requires updates.

Transformations and Lookups: SSIS enables data transformations during the loading process. Lookup transformations can be used to enrich fact tables with relevant information retrieved from dimension tables.
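The idea behind a lookup during fact loading can be sketched in plain Python. This is not SSIS code: the function `lookup_transform` and the column names `ProductCode` and `ProductKey` are hypothetical, and the sketch only mirrors the general behaviour of matching each incoming row against a dimension and separating rows that find no match.

```python
def lookup_transform(fact_rows, dim_rows,
                     business_key="ProductCode", surrogate_key="ProductKey"):
    """Replace a business key in fact rows with the dimension's surrogate key."""
    # Build the lookup cache once, as a dictionary keyed by the business key.
    cache = {d[business_key]: d[surrogate_key] for d in dim_rows}
    matched, unmatched = [], []
    for row in fact_rows:
        key = cache.get(row[business_key])
        if key is None:
            unmatched.append(row)  # rows without a match are kept apart for review
        else:
            enriched = {k: v for k, v in row.items() if k != business_key}
            enriched[surrogate_key] = key
            matched.append(enriched)
    return matched, unmatched
```

Keeping unmatched rows in a separate list makes it easy to inspect source records that refer to products missing from the dimension table.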

3.3.5 Maintenance and Optimization:

Indexing and Statistics: To optimize query performance in SQL Server, ensure proper indexing and consistent maintenance of indexes. Regularly update statistics and

Date posted: 27/08/2024, 12:10