Ho Chi Minh City, January 4th, 2024
Group 5
ACKNOWLEDGEMENT
Our team would like to express our sincere appreciation to you, Mr. Le Ba Thien, our lecturer, for your enthusiastic guidance and for passing on essential knowledge to us during this course. Thanks to that, we have gained plenty of helpful information that we can later apply in real work.
"Data Analysis In Business" is a fascinating, practical, and highly valuable subject. Driven by our interest in this subject, the team put a lot of effort into researching additional essential information. Yet the research is undoubtedly not perfect, since our knowledge of this field is still limited and some of our skills are still developing.
So that the group can benefit from this experience and perform better in future projects, we hope that you will review our work and offer suggestions.
Sincere thanks!
Ho Chi Minh City,
TABLE OF CONTENTS
DECLARATION
ACKNOWLEDGEMENT
TABLE OF CONTENTS
CHAPTER 1 TOPIC OVERVIEW
CHAPTER 2 THEORETICAL BACKGROUND AND RELATED WORKS
2.1 Overview of BI
2.1.1 Introduce BI model and solution
2.2.1 Theory and Methods in Data Analysis
CHAPTER 3 ANALYSIS OF USER REQUIREMENTS AND DATA DESCRIPTION
3.1 Apply the development life cycle of a data analytics project
3.2 Identify and analyze user requirements
3.2.3 Product analysis
3.3 SQL Server Integration Services
3.3.2 Loading Data into SQL Server
3.3.5 Maintenance and Optimization
CHAPTER 4 EXPERIMENTAL RESULTS AND ANALYSIS
4.1.4 Employee Evaluation
4.2 Evaluation and Suggestion
Dim_Product table
_Date table
LIST OF FIGURES
Figure 2.1 BI model
Figure 3.1 The Process of Data Analysis in Six Steps
Figure 3.2 Dim Product
Figure 4.1 Data visualization of General Analysis
Figure 4.2 Data visualization of Revenue Analysis
Figure 4.3 Dashboard of Product Analysis
CHAPTER 1 TOPIC OVERVIEW
1.1 The reason for choosing the topic:
Recognizing data as an indispensable asset for any firm, we aim to conduct thorough research that explores diverse perspectives on handling data effectively. The ultimate goal is to furnish the business with a set of intelligent and practical recommendations.
The decision to delve into this subject is underpinned by our belief that a comprehensive understanding of data, combined with our foundational knowledge of the business domain, will allow us to carry out meaningful data mining.
By leveraging our insights into the business area, we expect to extract valuable information and contribute substantially to the overarching goals of the research.
Specifically, the company chosen for analysis is Adventure Works, a prominent global company that manufactures and sells a diverse range of products, from clothing and accessories to bicycle parts and complete bicycles. Operating in a market that spans six countries across three continents, namely Australia, North America (the United States and Canada), and Europe (the United Kingdom, France, and Germany), Adventure Works provides a rich and diverse dataset for investigation.
Furthermore, the company's two primary sales channels, online sales and sales through wholesalers, add another layer of complexity to the analysis. This multi-faceted setting suits our intention to explore various dimensions of the data and provides a well-rounded perspective on how Adventure Works operates in its global market.
Through this exploration, we aim to contribute valuable insights that can inform data-driven decision-making processes and strategies for businesses operating in a multifaceted, global marketplace.
1.2 Topic goal:
- Analyze the business model from the perspectives of revenue, staff, and product
- Rank the best-selling items, then segment customers and regions accordingly
- Build a report including four dashboards:
+ General business situation
+ Detailed business situation by product
+ Detailed business situation by employee
+ Detailed business situation by revenue
- Make some suggestions for the business
- Create some potential paths for the topic's development
- Propose a future business plan for Adventure Works based on the 4P model (Product - Price - Place - Promotion)
1.3 Subject and research scope of the project:
- Subject: Microsoft's AdventureWorks database, a free dataset
- Research scope: this research investigates information from the Manufacturing, Sales, Purchasing, Product Management, Reseller Management, and Human Resources areas of the database
1.4 Tools used:
- SQL Server
- Power BI
1.5 Research implications:
After finishing this research, Adventure Works can:
- Identify target customers
- Review the company's statistics to build strategies and adjust them promptly
- Use statistics on employee capacity and performance to design appropriate business strategies, rewards, or training
1.6 Structure of report:
Chapter 1: TOPIC OVERVIEW
Chapter 2: THEORETICAL BACKGROUND AND RELATED WORKS
Chapter 3: ANALYSIS OF USER REQUIREMENTS AND DATA DESCRIPTION
Chapter 4: EXPERIMENTAL RESULTS AND ANALYSIS
Chapter 5: CONCLUSION
CHAPTER 2 THEORETICAL BACKGROUND AND RELATED WORKS
2.1 Overview of BI:
In the contemporary business landscape, organizations amass vast volumes of data on sales, inventory, customers, and suppliers, alongside comprehensive employee records. Despite this abundance, the data does little to guide managerial decisions while it remains scattered and unorganized. Establishing a unified database that systematically classifies and organizes this data holds immense potential: it enables businesses both to assess past performance and to anticipate future scenarios.
The adoption of Business Intelligence (BI) emerges as a pivotal solution in this context. Popularized by Howard Dresner in 1989, BI is a broad term encompassing a range of concepts and methodologies aimed at improving decision-making through diverse information techniques. As articulated by Turban et al. (2008), BI can be conceptualized as a suite of applications and techniques designed to collect, store, analyze, and provide access to data, helping business users make decisions. The scope of BI applications spans decision support systems (DSS), query and reporting, online analytical processing (OLAP), statistical analysis, forecasting, and data mining.
Carlo (2009) further refines the definition, portraying BI as a collection of mathematical models and analytical methods that exploit existing data to extract information and knowledge crucial for decision-making. By incorporating BI methodologies, businesses gain a competitive edge, because managers can make faster and better-informed decisions. This integrated approach fosters a comprehensive understanding of past business scenarios, helping enterprises predict and navigate future situations with greater precision. In essence, BI is an indispensable tool for strategic decision-making in businesses that aim to stay ahead in today's dynamic and competitive markets.
2.1.1 Introduce BI model and solution
Figure 2.1 BI model
Within the framework of Business Intelligence (BI), a comprehensive model consists of key components aimed at enhancing data-driven decision-making for businesses. These components include:
- Data Modeling: The data modeling process entails analyzing and defining data types and their interconnections within the business context. This includes creating conceptual, logical, and physical data models using text, symbols, and diagrams
- Data Mining: Data mining is an automated process focused on revealing patterns and anomalies within data, employing diverse analytical techniques such as exploratory, descriptive, statistical, and predictive analytics
- Data Visualization: The process of data visualization involves presenting findings in an intuitive and interactive manner through mediums such as dashboards, charts, graphs, and maps
implementation of actions guided by data insights. This includes adapting performance, establishing benchmarks, and addressing challenges
These components collectively form a robust BI model, enabling businesses to make better-informed decisions and improve their efficiency, profitability, and competitiveness. By incorporating these elements into their operations, organizations can leverage the full potential of BI to navigate dynamic market conditions and achieve sustainable growth.
2.1.2 The benefits of BI in the business
Numerous scholarly papers underscore the advantages of Business Intelligence (BI) in the corporate landscape, highlighting several key points:
- Informed Strategic Decisions: BI plays a pivotal role in empowering businesses to make well-informed strategic decisions, delivering accurate and timely data and insights crucial for navigating the dynamic business landscape
- Trend and Pattern Identification: BI serves as a valuable tool for businesses to discern trends and patterns within their data, offering insights into customer behavior, market demand, sales performance, and operational efficiency
- Performance and Revenue Optimization: BI becomes a catalyst for businesses seeking to enhance performance and revenue through the optimization of marketing and sales strategies, the improvement of customer satisfaction and retention, and the reinforcement of competitive advantage
- Operational Efficiency Enhancement: Businesses leverage BI to elevate operational efficiency, undertaking measures to reduce costs, eliminate waste, streamline processes, and bolster overall quality and productivity
- Opportunity Discovery through Predictions: Through the power of BI, businesses uncover opportunities for improvement by harnessing predictive capabilities, whether it be in forecasting demand, identifying risks, or receiving actionable recommendations
- Creation of Smarter and Faster Reports: BI empowers businesses to generate reports that are not only smarter and faster but also easily comprehensible, shareable, and actionable, facilitating efficient decision-making processes
2.2 Data analysis and visualization
Data analysis involves exploring large volumes of stored data to uncover new relationships, patterns, and trends. This process employs pattern recognition technologies, statistical methods, and mathematical techniques to examine data repositories comprehensively. Conceptually, data analysis can be likened to "data drilling" in depth and "data aggregation" in breadth: it examines data from multiple perspectives to discern relationships among its components. The aim is to uncover hidden trends, patterns, and past experiences within the data warehouse, ultimately supporting operational processes and decision-making.
Data visualization is a crucial aspect of the broader business intelligence landscape. Simply put, data visualization means presenting a dataset in a visual format such as charts, graphs, and maps. Representing text-based data graphically makes it possible to identify new insights and hidden patterns that might be hard to discern in raw, non-graphical form.
The primary motivation behind data visualization is to identify patterns, trends, and relationships among diverse datasets that might be less apparent in a non-graphical representation. This visual approach improves users' understanding of market dynamics and helps them evaluate customer needs. Consequently, businesses can evolve by developing new strategies and techniques to improve their operations. Recognizing this, software companies are investing in their Business Intelligence (BI) tools to provide the most effective data visibility, which is integral to revealing hidden information within the warehouse and supporting better-informed decisions.
2.2.1 Theory and Methods in Data Analysis
Theoretical Frameworks in Data Analysis:
- Exploratory Data Analysis (EDA): The EDA approach proposed by John Tukey emphasizes analyzing datasets to succinctly capture their essential characteristics. This typically involves statistical graphics and other data visualization techniques.
- Confirmatory Data Analysis (CDA): CDA employs conventional statistical tools to rigorously evaluate data, testing and challenging any assumptions that may have arisen during the exploratory phase.
- Grounded Theory of Analysis: The Grounded Theory approach unfolds in two stages: first, collecting a substantial amount of information; second, analyzing all gathered data, indexing it, and discovering what is relevant. This iterative process continues as more data is collected and analyzed.
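To make the EDA idea concrete, the following is a minimal Python sketch (standard library only) that summarizes one numeric column and flags outliers with Tukey's 1.5 × IQR fences, the box-plot rule from Tukey's own EDA work. The function names and sample values are hypothetical, and the quartiles follow the default method of `statistics.quantiles`, so other tools may report slightly different cut points.

```python
import statistics

def summarize(values):
    """Compact numeric summary of one column, in the spirit of Tukey's EDA."""
    ordered = sorted(values)
    # With n=4, statistics.quantiles returns three cut points: Q1, median, Q3
    # (default "exclusive" method; other tools may interpolate differently).
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    return {
        "count": len(ordered),
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
        "mean": statistics.mean(ordered),
    }

def tukey_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    s = summarize(values)
    iqr = s["q3"] - s["q1"]
    low, high = s["q1"] - k * iqr, s["q3"] + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical daily order counts containing one suspicious spike.
orders_per_day = [1, 2, 3, 4, 5, 100]
print(summarize(orders_per_day))
print(tukey_outliers(orders_per_day))
```

A flagged value is only a candidate for the confirmatory stage described above; whether it is an error or a genuine event still has to be checked.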
Data Analysis Methods:
- Multidimensional Cubes: Multidimensional cubes let managers and employees explore data comprehensively through operations such as rotation (pivoting), slicing, and drill-down, providing versatile perspectives on the dataset
- Time Series Analysis: Time series analysis entails systematically recording data over a defined period, which makes it possible to identify trends and changes over time. This method helps companies forecast future developments based on historical data
- Data Mining: Data mining is a method wherein large datasets are scrutinized
to identify trends and patterns, revealing valuable insights into customer behaviors, habits, and evolving trends
- Optimization Models: Optimization models consist of three fundamental elements (an objective function, decision variables, and business constraints) that work together to find the most favorable solution from a predetermined set of options
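The cube operations described above can be sketched in plain Python. This is a simplified, in-memory illustration with hypothetical data, not how an OLAP engine actually stores a cube: a slice fixes some dimension values, aggregating over fewer dimensions rolls the data up, and adding a dimension drills down.

```python
from collections import defaultdict

# Hypothetical sales facts: dimensions (year, quarter, region), measure (revenue).
FACTS = [
    {"year": 2023, "quarter": "Q1", "region": "Europe", "revenue": 100.0},
    {"year": 2023, "quarter": "Q2", "region": "Europe", "revenue": 150.0},
    {"year": 2023, "quarter": "Q1", "region": "North America", "revenue": 200.0},
    {"year": 2024, "quarter": "Q1", "region": "Europe", "revenue": 120.0},
]

def slice_cube(facts, **fixed):
    """Slice: keep only the rows whose dimensions match the fixed values."""
    return [row for row in facts if all(row[d] == v for d, v in fixed.items())]

def aggregate(facts, dims, measure="revenue"):
    """Sum the measure grouped by the given dimensions.

    Fewer dims gives a roll-up (coarser view), more dims a drill-down
    (finer view); reordering dims keeps the totals but swaps the axes.
    """
    totals = defaultdict(float)
    for row in facts:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return dict(totals)

by_year = aggregate(FACTS, ["year"])                     # roll up to years
by_year_quarter = aggregate(FACTS, ["year", "quarter"])  # drill down into quarters
europe_2023 = slice_cube(FACTS, year=2023, region="Europe")
print(by_year, by_year_quarter, aggregate(europe_2023, ["quarter"]))
```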
2.2.2 Visualization
Data visualization transforms information from numerical metrics into visual representations, typically charts. Its primary purpose is to make information easy to understand for managers and department employees, enabling swift and informed decision-making.
Figure 2.2 Data visualization by Power BI
Various types of charts cater to specific objectives within the realm of data visualization:
- Specific Value Representation: Charts designed for specific value representation include single-value charts, tables, and highlight tables. These present individual data points or key metrics clearly.
- Comparison Charts: Comparison charts range from single and multiple line charts to bar charts, grouped bar charts, and bullet charts. They facilitate comparisons between datasets, helping users discern trends and variations.
- Relationship Visualization: Relationship visualization includes scatter plots, bubble charts, and word clouds. These charts illustrate connections and associations between data points, aiding the interpretation of relationships within datasets.
- Composition Charts: Composition charts, such as tree maps, pie charts, and donut charts, show how a whole is composed. They depict the distribution of components within a dataset.
- Distribution Charts: Distribution charts such as box plots, scatter plots, and histograms represent the spread and distribution of data. They offer insights into the variation and concentration of values within a dataset.
- Geographic Visualization: Geographic visualization involves filled maps and symbol maps. These charts are particularly useful for showing geographical data, helping users understand spatial distribution and patterns.
In essence, the variety of charts available in data visualization serves distinct purposes, empowering users to interpret and act upon information swiftly and effectively
2.3 Data warehouse:
At its core, a data warehouse is a centralized repository designed to integrate and store large volumes of data from diverse sources within an organization. This consolidation supports business intelligence (BI) and decision-making processes by providing a unified, historical view of the data. Several fundamental theoretical concepts underpin the structure and functionality of data warehouses:
Data Integration: Data warehouses integrate information from disparate sources such as transactional databases, spreadsheets, and external systems. This integration involves transforming and cleaning the data to ensure uniformity and quality.
Dimensional Modeling: Dimensional modeling, the theoretical cornerstone of data warehousing, organizes data into dimensions and facts. This creates a star or snowflake schema in which dimensions hold descriptive data and facts hold measurable data. Ralph Kimball's (2011) contributions, particularly in dimensional modeling, have profoundly shaped the field of data warehousing, and his works provide actionable guidance for building effective and scalable data warehouses.
ETL (Extract, Transform, Load): ETL processes are a foundational element. They extract data from source systems, transform it to fit the data warehouse schema, and finally load it into the warehouse. This process safeguards data quality and consistency.
OLAP (Online Analytical Processing): Integral to data warehousing, OLAP refers to a set of tools and technologies that let users interactively analyze multidimensional data. Data warehouses are explicitly designed to support OLAP queries for complex analysis and reporting.
Historical Data Storage: A distinguishing feature of data warehouses is their ability to store historical data. This allows users to analyze trends and make decisions grounded in a comprehensive, long-term perspective.
Data Mart and Enterprise Data Warehouse: The conceptualization of data marts as subsets focused on specific business functions or departments, and the integration of these into the enterprise data warehouse (EDW), epitomizes the scalability and comprehensiveness of data warehousing solutions
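As a rough illustration of how ETL and dimensional modeling fit together, the Python sketch below extracts rows from CSV text, cleans and casts them, loads a product dimension and a sales fact table into an in-memory SQLite database, and runs a star-join query. SQLite stands in for SQL Server here, and the table names (`DimProduct`, `FactSales`), columns, and sample rows are hypothetical; in this project the equivalent work is done by SSIS packages.

```python
import csv
import io
import sqlite3

# Hypothetical source extract (header row plus four order lines).
RAW_CSV = """order_date,product_code,product_name,category,quantity,unit_price
2023-01-15,BK-001,Road Bike,Bikes,2,1000.00
2023-01-16,HL-002,Helmet,Accessories,5,35.00
2023-02-01,BK-001,Road Bike,Bikes,,1000.00
2023-02-03,HL-002, Helmet ,Accessories,1,35.00
"""

def extract(text):
    """Extract: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim text, cast types, and drop rows with no quantity."""
    clean = []
    for row in rows:
        if not row["quantity"].strip():
            continue  # a production pipeline might log or redirect these rows
        clean.append({
            "order_date": row["order_date"].strip(),
            "product_code": row["product_code"].strip(),
            "product_name": row["product_name"].strip(),
            "category": row["category"].strip(),
            "quantity": int(row["quantity"]),
            "unit_price": float(row["unit_price"]),
        })
    return clean

def load(conn, rows):
    """Load: fill the product dimension, then the fact table keyed by it."""
    conn.execute("""CREATE TABLE DimProduct (
        product_key INTEGER PRIMARY KEY,
        product_code TEXT UNIQUE, product_name TEXT, category TEXT)""")
    conn.execute("""CREATE TABLE FactSales (
        order_date TEXT,
        product_key INTEGER REFERENCES DimProduct(product_key),
        quantity INTEGER, revenue REAL)""")
    for row in rows:
        # Each product enters the dimension once; SQLite assigns the surrogate key.
        conn.execute(
            "INSERT OR IGNORE INTO DimProduct (product_code, product_name, category) "
            "VALUES (?, ?, ?)",
            (row["product_code"], row["product_name"], row["category"]))
        (key,) = conn.execute(
            "SELECT product_key FROM DimProduct WHERE product_code = ?",
            (row["product_code"],)).fetchone()
        conn.execute("INSERT INTO FactSales VALUES (?, ?, ?, ?)",
                     (row["order_date"], key, row["quantity"],
                      row["quantity"] * row["unit_price"]))

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))

# Star join: the fact table supplies the measure, the dimension the grouping label.
revenue_by_category = dict(conn.execute("""
    SELECT d.category, SUM(f.revenue)
    FROM FactSales AS f JOIN DimProduct AS d ON f.product_key = d.product_key
    GROUP BY d.category
    ORDER BY d.category"""))
print(revenue_by_category)
```

Keeping descriptive attributes in the dimension means the fact table stores only a compact integer key per row, which is the trade-off dimensional modeling is built around.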
2.4 SSIS:
SQL Server Integration Services (SSIS) is a powerful ETL (Extract, Transform, Load) tool developed by Microsoft for data integration and workflow applications. It is a fundamental component of the SQL Server platform, designed to extract, transform, and load data from various sources into target destinations. Numerous books and online tutorials by SSIS experts, such as Brian Knight and Andy Leonard (2005), offer practical insights into SSIS development, from basic concepts to advanced techniques:
Data Flow: At the heart of SSIS is the data flow engine, which defines how data moves and is transformed between sources and destinations. A data flow is composed of data flow components that perform diverse operations, such as data cleansing, aggregation, and merging.
Control Flow: SSIS employs a control flow to manage the flow of tasks and containers in a package Control flow elements include tasks (e.g., data extraction, transformation, and loading tasks), precedence constraints to define the order of execution, and containers for grouping tasks
SSIS Package: A package is a collection of interconnected data flow and control flow elements. It serves as a container for organizing and executing ETL processes. Packages can be developed using SQL Server Data Tools (SSDT) and executed from SQL Server Management Studio (SSMS) or through the SSIS runtime, for example with the dtexec command-line utility.
Connection Managers: Connection managers in SSIS define the connection information for source and destination systems. They play a crucial role in establishing connections to various data sources and destinations, ensuring seamless data movement.
Transformations: SSIS includes a variety of transformations for manipulating and enriching data during the ETL process. Common transformations include sorting, merging, and aggregating data.
Expressions and Variables: Expressions and variables in SSIS allow dynamic configuration and the manipulation of values at runtime. Expressions can be used to set properties dynamically, making SSIS packages more flexible.
2.5 Schema:
A schema is a fundamental concept in database management and information organization, providing a blueprint for structuring and defining the logical organization of data. It serves as a set of rules or specifications that dictate how data should be organized, stored, and accessed within a database. The theoretical background of schemas encompasses several key aspects:
Database Schema: A database schema defines the structure of a database, including tables, relationships, constraints, and other elements. It serves as a high-level abstraction that provides an organized representation of the data model, supporting data integrity and consistency.
Schema Elements: Within a database schema, various elements contribute to data organization. These include tables, which represent entities; columns, which define attributes; primary and foreign keys, which establish relationships; and constraints, which enforce data integrity rules.
Normalization: The process of normalization, based on normal forms (e.g., First Normal Form, Second Normal Form), is a theoretical framework for organizing data within a schema to eliminate redundancy and dependency issues. Normalization aims to enhance data integrity and reduce data anomalies.
Denormalization: Denormalization involves intentionally introducing redundancy into a schema for performance optimization. It is often applied in data warehousing or in scenarios where read performance is a higher priority than data modification efficiency.
Schema Evolution: Schema evolution refers to modifying a database schema over time to accommodate changes in data requirements. Theoretical considerations include strategies for versioning, backward compatibility, and migration to ensure a smooth transition when the schema is updated.
CHAPTER 3 ANALYSIS OF USER REQUIREMENTS AND DATA DESCRIPTION
3.1 Apply the development life cycle of a data analytics project
According to Paula Munoz, a graduate of Northeastern University, the data analytics lifecycle outlines six fundamental steps of a data analytics project based on the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology: business understanding, data understanding, data preparation, exploratory analysis and modeling, validation, and visualization and presentation.
Figure 3.1 The Process of Data Analysis in Six Steps (Northeastern University, based on the CRISP-DM methodology)
Step 1: Business Issues Understanding
This phase concentrates on understanding the objectives and requirements of the project
Determine business objectives: thoroughly understand, from a business perspective, what the customer really wants to accomplish, and then define business success criteria
Assess situation: determine resource availability and project requirements, assess risks and contingencies, and conduct a cost-benefit analysis.
Determine data mining goals: in addition, define what success looks like from
a technical data mining perspective
Produce project plan: Select technologies and tools and define detailed plans for each project phase
Step 2: Data Understanding
This phase focuses on identifying, collecting, and analyzing the datasets through four tasks:
Collect initial data: import necessary data into an analysis tool
Describe data: examine the data and document its properties such as data format, number of records, or field identities
Explore data: query, visualize, and identify relationships among the data.
Verify data quality: check the data's cleanliness and document any quality issues.
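For the "Describe data" and "Verify data quality" tasks, a first pass can be as simple as counting records, missing values, distinct values, and duplicate rows. The sketch below does this with Python's `csv` module on a small hypothetical sample; the `profile` function and the field names are illustrative only.

```python
import csv
import io

def profile(rows):
    """Describe a dataset: record count plus missing and distinct values per field."""
    fields = rows[0].keys() if rows else []
    report = {"records": len(rows), "fields": {}}
    for field in fields:
        values = [row[field] for row in rows]
        missing = sum(1 for v in values if v is None or v.strip() == "")
        distinct = len({v.strip() for v in values if v and v.strip()})
        report["fields"][field] = {"missing": missing, "distinct": distinct}
    return report

# Hypothetical customer extract with gaps and one duplicated row.
SAMPLE = """customer_id,country,email
1,Australia,a@example.com
2,Canada,
3,,c@example.com
3,,c@example.com
"""
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
report = profile(rows)
# Exact duplicate rows: total rows minus the number of unique rows.
duplicates = len(rows) - len({tuple(r.values()) for r in rows})
print(report, duplicates)
```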
Step 3: Data Preparation
Often referred to as “data munging”, this phase prepares the final dataset for modeling:
Select data: define reasons for data inclusion/exclusion
Clean data: correct, impute, or remove erroneous values
Construct data: derive new helpful attributes
Integrate data: combine data from multiple sources
Format data: re-format data as necessary
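The preparation tasks above can be illustrated with a short Python sketch on hypothetical data. The rules it applies (dropping negative amounts as erroneous, imputing missing amounts with the median, deriving year and quarter, and joining a customer lookup to add the country) are example choices, not the project's actual cleaning logic.

```python
import statistics
from datetime import date

# Hypothetical records from two sources: orders and a customer-to-country list.
orders = [
    {"order_id": 1, "customer_id": 10, "order_date": "2023-03-05", "amount": 120.0},
    {"order_id": 2, "customer_id": 11, "order_date": "2023-07-19", "amount": None},
    {"order_id": 3, "customer_id": 10, "order_date": "2023-11-02", "amount": -5.0},
    {"order_id": 4, "customer_id": 12, "order_date": "2024-01-20", "amount": 80.0},
]
customers = {10: "Australia", 11: "Canada", 12: "Germany"}

def prepare(orders, customers):
    # Clean: treat negative amounts as erroneous and remove those rows.
    valid = [o for o in orders if o["amount"] is None or o["amount"] >= 0]
    # Clean: impute missing amounts with the median of the known amounts.
    median = statistics.median(o["amount"] for o in valid if o["amount"] is not None)
    prepared = []
    for o in valid:
        d = date.fromisoformat(o["order_date"])  # Format: ISO text -> date
        prepared.append({                        # Select: keep only needed fields
            "order_id": o["order_id"],
            "amount": o["amount"] if o["amount"] is not None else median,
            "year": d.year,                             # Construct: derived attribute
            "quarter": f"Q{(d.month - 1) // 3 + 1}",    # Construct: derived attribute
            "country": customers.get(o["customer_id"], "Unknown"),  # Integrate
        })
    return prepared

prepared = prepare(orders, customers)
print(prepared)
```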
Step 4: Perform Exploratory Analysis and Modeling
Start creating models to test the data and search for solutions to the stated goals.
Select modeling techniques: determine which algorithms to use.
Generate test design: depending on the modeling approach, split the data into training, test, and validation sets.
Build model: code execution
Assess model: interpret the model results based on domain knowledge, the pre-defined success criteria, and the test design
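For the test-design step, a common approach is a seeded random split into training, validation, and test sets. The Python sketch below assumes a 70/15/15 split; the ratios, the seed, and the `three_way_split` name are illustrative choices, and time-ordered data would usually be split by date instead of shuffled.

```python
import random

def three_way_split(records, train=0.7, validation=0.15, seed=42):
    """Shuffle a copy of the records and cut it into train/validation/test parts.

    Whatever is left after the train and validation shares becomes the test
    set. A fixed seed makes the split reproducible between runs.
    """
    if not 0 < train + validation < 1:
        raise ValueError("train + validation must be between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

train_set, val_set, test_set = three_way_split(range(100))
print(len(train_set), len(val_set), len(test_set))
```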
Step 5: Validation
Once the predictive models are constructed, conduct a thorough analysis to assess their effectiveness, and verify the accuracy and validity of the information used in the models.
Then determine whether the developed models operate as envisioned by examining their behavior, performance, and alignment with the initial objectives.
Evaluate the need for additional data cleansing by identifying any persistent anomalies, inconsistencies, or outliers that may affect model performance.
Step 6: Visualization and Presentation
Once all deliverables are completed, begin working on data visualization. Data visualization is often crucial for communicating findings to clients. Interactive visualization tools like Tableau help explain research findings to clients who may not be data experts, and weaving a narrative around the data is essential for conveying the significance of the research. Then, clearly define the project objectives to ensure a successful outcome. Break the project down into specific tasks to streamline the process and deliver strong results. Finally, gather all necessary information before starting the project to avoid delays and rework.
3.2 Identify and analyze user requirements
3.2.1 Business overview analysis
Requirements: Gathering statistical data concerning fundamental business activities across the enterprise:
- Sales volume, revenue, profits, employee count, resellers, and products
- Group-based, regional, and product category-specific statistics, segmented by quarterly periods
- Analyzing and filtering these metrics annually
Benefits: Furnishes business managers with a holistic overview of their ongoing business endeavors
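The overview metrics can be sketched as simple aggregations over order lines. In this hypothetical Python example, `overview` computes headline KPIs (optionally filtered to one year, matching the annual filter) and `revenue_by` breaks revenue down by an attribute such as region or quarter. The field names and figures are assumptions, and employee count is left out for brevity.

```python
from collections import defaultdict

# Hypothetical order lines; in the project these would come from the warehouse.
SALES = [
    {"year": 2023, "quarter": "Q1", "region": "Europe", "product": "Road Bike",
     "reseller": "R1", "qty": 2, "revenue": 2000.0, "cost": 1500.0},
    {"year": 2023, "quarter": "Q2", "region": "Australia", "product": "Helmet",
     "reseller": "R2", "qty": 5, "revenue": 175.0, "cost": 100.0},
    {"year": 2024, "quarter": "Q1", "region": "Europe", "product": "Helmet",
     "reseller": "R1", "qty": 3, "revenue": 105.0, "cost": 60.0},
]

def overview(sales, year=None):
    """Headline KPIs for the whole period, or for a single year if given."""
    rows = [s for s in sales if year is None or s["year"] == year]
    return {
        "sales_volume": sum(s["qty"] for s in rows),
        "revenue": sum(s["revenue"] for s in rows),
        "profit": sum(s["revenue"] - s["cost"] for s in rows),
        "products": len({s["product"] for s in rows}),
        "resellers": len({s["reseller"] for s in rows}),
    }

def revenue_by(sales, attribute, year=None):
    """Revenue broken down by one attribute, such as 'region' or 'quarter'."""
    totals = defaultdict(float)
    for s in sales:
        if year is None or s["year"] == year:
            totals[s[attribute]] += s["revenue"]
    return dict(totals)

print(overview(SALES, 2023))
print(revenue_by(SALES, "quarter", year=2023))
```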
3.2.2 Revenue analysis
Requirements: We need comprehensive statistics on revenue and business profit, categorized by product and region and broken down by month, quarter, and year:
- Total revenue, profit
- Year-on-Year Revenue Growth (Net Revenue YoY%) = (this year's revenue / last year's revenue - 1) * 100
- Year-on-Year Profit Growth (Net Profit YoY%) = (this year's profit / last year's profit - 1) * 100
- Increase and decrease in revenue over the years
Benefits: These statistics offer a full picture of business performance. They reveal sales trends, highlight successful product categories, and identify top-performing items, guiding strategic decisions on maximizing potential and choosing a market focus.
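The YoY growth formulas translate directly into code. This Python sketch applies the revenue formula to hypothetical yearly totals; the same function works for profit.

```python
def yoy_growth(this_year, last_year):
    """Year-on-year growth in percent: (this year / last year - 1) * 100."""
    if last_year == 0:
        raise ValueError("YoY growth is undefined when last year's value is 0")
    return (this_year / last_year - 1) * 100

# Hypothetical net revenue per year.
revenue_by_year = {2021: 800.0, 2022: 1000.0, 2023: 900.0}

# Growth is only defined for years whose previous year is also present.
growth = {
    year: yoy_growth(revenue_by_year[year], revenue_by_year[year - 1])
    for year in sorted(revenue_by_year)
    if year - 1 in revenue_by_year
}
print(growth)
```

One caveat the formula does not cover: if last year's value is negative (a loss), the ratio no longer reads as growth or decline, so such years may need separate handling. The sketch only guards against division by zero.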
3.2.4 Sales person analysis
Requirements: Employee sales performance statistics
- Percentage of target completion for individual employees
- Top-performing employees
Benefits: Employee performance statistics can help identify top performers, pinpoint areas for improvement, and allocate resources more effectively, optimizing organizational strategies and business outcomes.
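The report does not spell out how target completion is computed; a natural reading is actual sales divided by the sales target, times 100. The Python sketch below assumes that definition and ranks salespeople by it. The names and figures are hypothetical.

```python
# Hypothetical yearly figures per salesperson: (actual sales, sales target).
PERFORMANCE = {
    "Employee A": (1_200_000.0, 1_000_000.0),
    "Employee B": (450_000.0, 600_000.0),
    "Employee C": (900_000.0, 900_000.0),
}

def target_completion(actual, target):
    """Assumed definition: percentage of the sales target reached."""
    return actual / target * 100

def top_performers(performance, n=2):
    """Rank salespeople by target completion, highest first, and keep the top n."""
    ranked = sorted(performance.items(),
                    key=lambda item: target_completion(*item[1]),
                    reverse=True)
    return [(name, round(target_completion(*figures), 1))
            for name, figures in ranked[:n]]

print(top_performers(PERFORMANCE))
```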
3.3 SQL Server Integration Services
We implemented a process using SQL Server Integration Services (SSIS) to import data from CSV and Excel files into SQL Server. This approach is commonly used to build data warehouses, where data is organized into dimension (dim) and fact tables. The process is explained in more detail below:
3.3.1 Data Extraction Using SSIS:
SQL Server Integration Services (SSIS) is a powerful ETL (Extract, Transform, Load) tool provided with Microsoft SQL Server, and it can be used to perform a broad range of data migration tasks.
SSIS provides dedicated connectors and components for various data sources, including CSV files, which is the format of our source data. CSV files are read through a Flat File connection manager and a Flat File Source component.
3.3.2 Loading Data into SQL Server:
SQL Server as the destination of the data flow: The extracted data is loaded into SQL Server, which serves as the central repository for our data.
SSIS Data Flow: SSIS enables the creation of data flow tasks that move data from a source to a destination. During this process, the data undergoes transformations so that it is properly cleansed and formatted before being stored in SQL Server, meeting the required standards and ready for analysis.
3.3.3 Designing Dim and Fact Tables:
Data Warehouse Structure: In a data warehouse, data is structured into two types of tables: dimension (dim) tables and fact tables. Dim tables generally hold descriptive information, such as product details, with fields like name, product code, and standard cost. Fact tables mainly store quantitative data, such as sales records, with measures like revenue and quantity, together with keys (for example, an order date key) that link each row to the dimension tables. The dim and fact tables are interconnected: dim tables provide context and attributes for the quantitative data in the fact tables.
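Normalization and Star Schema: The design may use a star schema, in which a central fact table is connected directly to multiple denormalized dim tables, creating a simple and efficient structure for analytical queries. Normalizing the dim tables instead (for example, moving product categories into their own table) produces a snowflake schema.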
3.3.4 Populating Dim and Fact Tables:
SSIS Packages for Loading: SSIS packages are created to populate the dim and fact tables in SQL Server. These packages can be scheduled to run automatically or triggered manually, depending on how frequently the data needs updating.
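Transformations and Lookups: SSIS enables data transformations during the loading process. Lookup transformations can be used to enrich fact rows with information retrieved from dimension tables, typically by replacing the business keys from the source files with the surrogate keys of the matching dimension rows.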
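To show what a Lookup does during fact loading, here is a plain-Python analogue (not SSIS itself). It replaces each incoming row's business key with the surrogate key from a hypothetical product dimension and sets aside rows with no match, roughly like a Lookup configured to redirect unmatched rows to its no-match output. The key names and values are assumptions for illustration.

```python
# Hypothetical dimension content: business key from the source system
# mapped to the surrogate key assigned in the warehouse.
DIM_PRODUCT_KEYS = {"BK-001": 1, "HL-002": 2}

def lookup_surrogate_keys(fact_rows, dim_keys,
                          business_key="product_code", surrogate_key="product_key"):
    """Split incoming fact rows into matched rows (business key swapped for the
    surrogate key) and unmatched rows kept unchanged for later handling."""
    matched, no_match = [], []
    for row in fact_rows:
        key = dim_keys.get(row[business_key])
        if key is None:
            no_match.append(row)
        else:
            out = {k: v for k, v in row.items() if k != business_key}
            out[surrogate_key] = key
            matched.append(out)
    return matched, no_match

incoming = [
    {"product_code": "BK-001", "quantity": 2},
    {"product_code": "XX-999", "quantity": 1},
    {"product_code": "HL-002", "quantity": 5},
]
matched, no_match = lookup_surrogate_keys(incoming, DIM_PRODUCT_KEYS)
print(matched, no_match)
```

Routing unmatched rows aside, instead of failing the whole load, leaves them available for inspection, for example when a new product appears in the source before its dimension row has been loaded.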
3.3.5 Maintenance and Optimization:
Indexing and Statistics: To optimize query performance in SQL Server, ensure proper indexing and maintain indexes consistently. Regularly update statistics and