
Data analysis in business


DOCUMENT INFORMATION

Basic information

Title: Analyze Sales Data of AdventureWorks Using Power BI
Authors: Vo Thi Ngoc Trinh, Nguyen Tran Bich Ngoc, Truong Do Dang Khoa, Nguyen Thanh Phat
Supervisor: M.Sc. Le Ba Thien
Institution: Vietnam National University of Ho Chi Minh City, University of Economics and Law, Faculty of Information Systems
Major: Information Systems
Document type: Project report
Year: 2024
City: Ho Chi Minh City
Pages: 55
File size: 2.99 MB

Structure

  • CHAPTER 1. TOPIC OVERVIEW
    • 1.1. The reason for choosing the topic
    • 1.2. Topic goal
    • 1.3. Subject and research scope of the project
    • 1.4. Tools used
    • 1.5. Research implications
    • 1.6. Structure of report
  • CHAPTER 2. THEORETICAL BACKGROUND AND RELATED WORKS
    • 2.1. Overview of BI
      • 2.1.1. Introduce BI model and solution
      • 2.1.2. The benefits of BI in the business
    • 2.2. Data analysis and visualization
      • 2.2.1. Theory and Methods in Data Analysis
      • 2.2.2. Visualization
    • 2.3. Data warehouse
    • 2.4. SSIS
    • 2.5. Schema
  • CHAPTER 3. ANALYSIS OF USER REQUIREMENTS AND DATA
    • 3.1. Apply the development life cycle of a data analytics project
    • 3.2. Identify and analyze user requirements
      • 3.2.1. Business overview analysis
      • 3.2.2. Revenue analysis
      • 3.2.3. Product analysis
      • 3.2.4. Sales person analysis
    • 3.3. SQL Server Integration Services
      • 3.3.1. Data Extraction Using SSIS
      • 3.3.2. Loading Data into SQL Server
      • 3.3.3. Designing Dim and Fact Tables
      • 3.3.4. Populating Dim and Fact Tables
      • 3.3.5. Maintenance and Optimization
      • 3.3.6. Data Quality and Consistency
    • 3.4. Overview of the data warehouse
      • 3.4.1. Description of data building reports
      • 3.4.2. Data tables
      • 3.4.3. Data warehouse model
  • CHAPTER 4. EXPERIMENTAL RESULTS AND ANALYSIS
    • 4.1. Data analysis and visualization
      • 4.1.1. General analysis
      • 4.1.2. Revenue analysis
      • 4.1.3. Product analysis
      • 4.1.4. Employee Evaluation
    • 4.2. Evaluation and Suggestion
      • 4.2.1. Evaluation
      • 4.2.2. Suggestion
  • CHAPTER 5. CONCLUSION
    • 5.1. Results
      • 5.1.1. Advantages
      • 5.1.2. Disadvantages
    • 5.2. Topic development direction

Content


TOPIC OVERVIEW

The reason for choosing the topic

Recognizing data as an indispensable component for any firm, the aim is to conduct thorough research exploring diverse perspectives on handling data effectively. The ultimate goal is to furnish the business with intelligent and practical recommendations.

The decision to delve into this subject is underpinned by our belief that a comprehensive understanding of data, combined with our foundational knowledge in the business domain, will empower us to engage in meaningful data mining activities.

By leveraging our insights into the business area, we anticipate being able to extract valuable information and contribute substantially to the overarching goals of the research.

Specifically, the chosen company for analysis is Adventure Works, a prominent global entity engaged in the manufacturing and sale of diverse products, ranging from clothing and accessories to bicycle parts and complete bicycles. Operating in a commercial market that spans six countries across three continents, namely Australia, North America (United States and Canada), and Europe (United Kingdom, France, and Germany), Adventure Works presents a rich and diverse dataset for investigation.

Furthermore, the delineation of the company's primary sales channels, namely online and wholesaler sales, adds an additional layer of complexity to the analysis. This multi-faceted approach aligns with our intention to explore various dimensions of data, providing a well-rounded perspective on how AdventureWorks operates in its global market.

Through this exploration, we aim to contribute valuable insights that can inform data-driven decision-making processes and strategies for businesses operating in a multifaceted, global marketplace.

Topic goal

- Analyze the business model from the perspectives of revenue, staff, and products

- Sort the best-selling items, then group consumers and areas according to them

- Build a report including four dashboards:

+ General business overview

+ Detailed business situation by product

+ Detailed business situation by employee

+ Detailed business situation by revenue

- Make some suggestions for the business

- Create some potential paths for the topic's development

- Make a proposal for Adventure Works company's future business plan based on the 4P model (Product - Price - Place - Promotion).

Subject and research scope of the project

- Subject: Microsoft's AdventureWorks database, a free dataset

- Research scope: Information from the Manufacturing, Sales, Purchasing, Product Management, Reseller Management, and Human Resources areas is investigated in this research.

Tools used

The main tools used in this project are Microsoft Power BI for data analysis and visualization, SQL Server Integration Services (SSIS) for extracting, transforming, and loading data, and Microsoft SQL Server for storing the data warehouse.

Research implications

After finishing this research, Adventure Works can:

- Review the company's statistical information to build strategies and adjust them promptly

- Compile statistics on employee capacity and performance, thereby informing appropriate business strategies as well as rewards or training.

Structure of report

Chapter 1: TOPIC OVERVIEW

Chapter 2: THEORETICAL BACKGROUND AND RELATED WORKS

Chapter 3: ANALYSIS OF USER REQUIREMENTS AND DATA DESCRIPTION

Chapter 4: EXPERIMENTAL RESULTS AND ANALYSIS

Chapter 5: CONCLUSION

THEORETICAL BACKGROUND AND RELATED WORKS

Overview of BI

Introduce BI model and solution

Within the framework of Business Intelligence (BI), a comprehensive model consists of key components aimed at enhancing data-driven decision-making for businesses. These components include:

- Data Modeling: The data modeling process entails the analysis and definition of data types and interconnections within the business context. This includes the creation of conceptual, logical, and physical data models, employing text, symbols, and diagrams.

- Data Mining: Data mining is an automated process focused on revealing patterns and anomalies within data, employing diverse analytical techniques such as exploratory, descriptive, statistical, and predictive analytics.

- Data Visualization: The process of data visualization involves presenting findings in an intuitive and interactive manner through mediums such as dashboards, charts, graphs, and maps.

- Data Action: Data action encompasses the decision-making and implementation of actions guided by data insights. This includes adapting operational processes, understanding customer behavior, monitoring performance, establishing benchmarks, and addressing challenges.

These components collectively form a robust BI model, empowering businesses to make more informed decisions and enhance their efficiency, profitability, and competitiveness. By incorporating these elements into their operations, organizations can leverage the full potential of BI to navigate dynamic market conditions and achieve sustainable growth.

The benefits of BI in the business

Numerous scholarly papers underscore the advantages of Business Intelligence (BI) in the corporate landscape, highlighting several key points:

- Informed Strategic Decisions: BI plays a pivotal role in empowering businesses to make well-informed strategic decisions, delivering accurate and timely data and insights crucial for navigating the dynamic business landscape.

- Trend and Pattern Identification: BI serves as a valuable tool for businesses to discern trends and patterns within their data, offering insights into customer behavior, market demand, sales performance, and operational efficiency.

- Performance and Revenue Optimization: BI becomes a catalyst for businesses seeking to enhance performance and revenue through the optimization of marketing and sales strategies, the improvement of customer satisfaction and retention, and the reinforcement of competitive advantage.

- Operational Efficiency Enhancement: Businesses leverage BI to elevate operational efficiency, undertaking measures to reduce costs, eliminate waste, streamline processes, and bolster overall quality and productivity.

- Opportunity Discovery through Predictions: Through the power of BI, businesses uncover opportunities for improvement by harnessing predictive capabilities, whether it be in forecasting demand, identifying risks, or receiving actionable recommendations.

- Creation of Smarter and Faster Reports: BI empowers businesses to generate reports that are not only smarter and faster but also easily comprehensible, shareable, and actionable, facilitating efficient decision-making processes.

Data analysis and visualization

Data analysis involves the exploration of extensive stored data to unveil novel relationships, patterns, and trends. This process employs pattern recognition technologies, statistical methods, and mathematical techniques to scrutinize repositories comprehensively. Conceptually, data analysis can be likened to "data drilling" in depth and "data aggregation" in breadth, delving into data from multiple perspectives to discern relationships among its components. This approach aims to uncover hidden trends, patterns, and past experiences within the data warehouse, ultimately supporting operational processes and decision-making.

A crucial aspect of the broader business intelligence landscape is data visualization. Simply put, data visualization entails presenting a specific dataset in a visual format, including charts, graphs, maps, and more. The graphical representation of text-based data allows for the identification of new insights and concealed patterns that might be challenging to discern in raw, non-graphical forms.

The primary motivation behind data visualization is to identify patterns, trends, and relationships among diverse datasets that might be less apparent in a non-graphical representation. This visual approach enhances users' understanding of market dynamics and facilitates the evaluation of customer needs. Consequently, businesses can evolve by developing new strategies and techniques to enhance their operations. Recognizing the significance of this, software companies are channeling their efforts into optimizing their Business Intelligence (BI) tools to provide the most effective data visibility. This emphasis on data visibility is integral to unveiling concealed information within the warehouse, contributing to more informed decision-making processes.

2.2.1 Theory and Methods in Data Analysis

Theoretical Frameworks in Data Analysis:

- Exploratory Data Analysis (EDA): Proposed by John Tukey, EDA emphasizes the analysis of datasets to succinctly capture their essential characteristics. This typically involves employing statistical graphics and various data visualization techniques.

- Confirmatory Data Analysis (CDA): CDA focuses on employing conventional statistical tools to rigorously evaluate data, aiming to scrutinize and challenge any assumptions that may have arisen during the Exploratory Data Analysis phase.

- Grounded Theory of Analysis: The Grounded Theory of Analysis unfolds in two stages: first, collecting a substantial amount of information; second, analyzing all gathered data, indexing it, and discovering relevance. This iterative process continues as more data is collected and analyzed.

- Multidimensional Cubes: Employing multidimensional cubes enables managers and employees to explore data comprehensively by utilizing operations like rotation, slicing, and drill-down, providing versatile perspectives on the dataset.

- Time Series Analysis: Time series analysis entails systematically recording data over a defined period, facilitating the identification of trends and differences over time. This method proves effective in helping companies predict and forecast future developments based on historical data.

- Data Mining: Data mining is a method wherein large datasets are scrutinized to identify trends and patterns, revealing valuable insights into customer behaviors, habits, and evolving trends.

- Optimization Models: Optimization models consist of three fundamental elements (objective function, decision variables, and business constraints) working together to pinpoint the most favorable solutions from a predetermined set of options.

2.2.2 Visualization

Data visualization serves as a conduit, transforming information from numerical metrics into visual representations, typically in the form of charts. The primary purpose is to facilitate easy comprehension of information for managers and department employees, enabling swift and informed decision-making.

Figure 2.2: Data visualization by Power BI

Various types of charts cater to specific objectives within the realm of data visualization:

- Specific Value Representation: Charts designed for specific value representation encompass single-value charts, tables, and highlight tables. These aid in presenting individual data points or key metrics clearly.

- Comparison Charts: Comparison charts are diverse, ranging from single and multiple lines to bar charts, group bar charts, and bullet charts. These aim to facilitate comparisons between different datasets, helping users discern trends and variations.

- Relationship Visualization: Relationship visualization encompasses scatter plots, bubble charts, and word clouds. These charts focus on illustrating connections and associations between data points, aiding in the interpretation of relationships within datasets.

- Composition Charts: Composition charts, such as tree maps, pie charts, and donut charts, are designed to showcase the composition of a whole. These visualizations help in depicting the distribution of components within a dataset.

- Distribution Charts: Distribution charts like box plots, scatter plots, and histograms are employed to represent the spread and distribution of data. They offer insights into the variation and concentration of values within a dataset.

- Geographic Visualization: Geographic visualization involves filled maps and symbol maps. These charts are particularly useful for showcasing geographical data, helping users understand spatial distribution and patterns.

In essence, the variety of charts available in data visualization serves distinct purposes, empowering users to interpret and act upon information swiftly and effectively.

Data warehouse

At its core, a data warehouse is a centralized repository strategically designed to integrate and store vast volumes of data originating from diverse sources within an organization. This consolidation serves a pivotal role in supporting business intelligence (BI) and decision-making processes by furnishing a unified and historical perspective on data. Several fundamental theoretical concepts underpin the structure and functionality of data warehouses:

- Data Integration: Data warehouses excel in integrating information from disparate sources such as transactional databases, spreadsheets, and external systems. This integration process involves transforming and cleaning data to ensure uniformity and quality.

- Dimensional Modeling: The theoretical cornerstone of data warehousing, dimensional modeling entails organizing data into dimensions and facts, creating a star or snowflake schema in which dimensions encapsulate descriptive data and facts encapsulate measurable data. Ralph Kimball's (2011) contributions in this area have profoundly shaped the landscape of data warehousing, and his works provide actionable insights for constructing effective and scalable data warehouses. (A small illustrative sketch of this structure follows this list.)

- ETL (Extract, Transform, Load): ETL processes constitute a foundational element, orchestrating the extraction of data from source systems, its transformation to fit the data warehouse schema, and ultimately loading it into the warehouse. This meticulous process safeguards data quality and consistency.

- OLAP (Online Analytical Processing): Integral to data warehousing, OLAP refers to a spectrum of tools and technologies that empower users to interactively analyze multidimensional data. Data warehouses are explicitly designed to facilitate OLAP queries for intricate analysis and reporting.

- Historical Data Storage: A distinguishing feature of data warehouses is their ability to store historical data. This capability allows users to analyze trends and make informed decisions grounded in a comprehensive, long-term perspective.
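To make the dimensional-modeling and OLAP ideas above concrete, here is a minimal sketch in Python (pandas). The table and column names are illustrative assumptions, not the report's exact schema; it only shows how a fact table of measures joins to a descriptive dimension and what an OLAP-style drill-down by category and year looks like.

```python
# Minimal star-schema sketch: one dimension, one fact table, one drill-down.
import pandas as pd

dim_product = pd.DataFrame({
    "ProductKey": [1, 2, 3],
    "ProductName": ["Road Bike", "Helmet", "Chain"],
    "Category": ["Bikes", "Accessories", "Components"],
})

fact_sales = pd.DataFrame({
    "ProductKey": [1, 1, 2, 3],
    "Year": [2019, 2020, 2020, 2020],
    "SalesAmount": [1500.0, 1800.0, 60.0, 25.0],
    "Cost": [900.0, 1000.0, 30.0, 10.0],
})

# OLAP-style drill-down: revenue and profit by category and year.
star = fact_sales.merge(dim_product, on="ProductKey")
report = (star.assign(Profit=star["SalesAmount"] - star["Cost"])
              .groupby(["Category", "Year"])[["SalesAmount", "Profit"]]
              .sum())
print(report)
```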

SSIS

SQL Server Integration Services (SSIS) is a powerful ETL (Extract, Transform, Load) tool developed by Microsoft for data integration and workflow applications. It is a fundamental component of the SQL Server database platform, designed to facilitate the extraction, transformation, and loading of data from various sources into target destinations. Numerous books and online tutorials authored by SSIS experts, such as Brian Knight and Andy Leonard (2007), offer practical insights into SSIS development, covering topics from basic concepts to advanced techniques:

- Data Flow: At the heart of SSIS is the data flow engine, which defines the movement and transformation of data between sources and destinations. The data flow is composed of data flow components that enable diverse operations, such as data cleansing, aggregation, and merging. (An analogous flow is sketched in code after this list.)

- Control Flow: SSIS employs a control flow to manage the flow of tasks and containers in a package. Control flow elements include tasks (e.g., data extraction, transformation, and loading tasks), precedence constraints to define the order of execution, and containers for grouping tasks.

- SSIS Package: A package is a collection of interconnected data flow and control flow elements. It serves as a container for organizing and executing ETL workflows; packages are typically developed in SQL Server Data Tools (SSDT) and executed using SQL Server Management Studio (SSMS) or through the SSIS runtime.

- Connection Managers: Connection managers in SSIS define the connection information for source and destination systems. They play a crucial role in establishing connections to various data sources and destinations, ensuring seamless data movement.

- Transformations: SSIS includes a variety of transformations that enable the manipulation and enrichment of data during the ETL process. Common transformations include sorting, merging, and aggregating data.

- Expressions and Variables: Expressions and variables in SSIS allow for dynamic configurations and the manipulation of values during runtime. Expressions can be used to set properties dynamically, enhancing the flexibility of SSIS packages.
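The extract-transform-load pattern that an SSIS data flow implements can also be illustrated outside of SSIS. The following is a minimal sketch of the same flow written in Python with pandas and SQLAlchemy rather than an SSIS package; the file path, column names, connection string, and staging table name are hypothetical assumptions.

```python
# Extract -> Transform -> Load, as a plain-Python analogy to an SSIS data flow.
import pandas as pd
from sqlalchemy import create_engine

# Extract: read the source file (hypothetical path).
sales = pd.read_csv("sales_export.csv")

# Transform: basic cleansing comparable to SSIS data-flow transformations.
sales = sales.drop_duplicates()
sales["OrderDate"] = pd.to_datetime(sales["OrderDate"], errors="coerce")
sales = sales.dropna(subset=["OrderDate", "SalesAmount"])

# Load: write the cleansed rows into a SQL Server staging table.
engine = create_engine(
    "mssql+pyodbc://user:password@localhost/AdventureWorksDW"
    "?driver=ODBC+Driver+17+for+SQL+Server"
)
sales.to_sql("stg_Sales", engine, if_exists="append", index=False)
```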

Schema

A schema is a fundamental concept in database management and information organization, providing a blueprint or framework for structuring and defining the logical organization of data. It serves as a set of rules or specifications that dictate how data should be organized, stored, and accessed within a database. The theoretical background of schemas encompasses several key aspects:

- Database Schema: A database schema defines the structure of a database, including tables, relationships, constraints, and other elements. It serves as a high-level abstraction that provides an organized representation of the data model, facilitating data integrity and consistency.

- Schema Elements: Within a database schema, various elements contribute to data organization. These include tables, which represent entities; columns, which define attributes; primary and foreign keys for relationship establishment; and constraints to enforce data integrity rules.

- Normalization: The process of normalization, based on normalization forms (e.g., First Normal Form, Second Normal Form), is a theoretical framework for organizing data within a schema to eliminate redundancy and dependency issues. Normalization aims to enhance data integrity and reduce data anomalies.

- Denormalization: Conversely, denormalization involves deliberately introducing redundancy into a schema for performance optimization. It is a theoretical concept often applied in data warehousing or scenarios where read performance is a priority over data modification efficiency.

- Schema Evolution: Schema evolution refers to the process of modifying a database schema over time to accommodate changes in data requirements. Theoretical considerations include strategies for versioning, backward compatibility, and migration to ensure a seamless transition when updating the schema.
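As a small, hypothetical illustration of the normalization idea above, the sketch below splits a flat sales extract into a product dimension with a generated surrogate key and a fact table that references it. The column names are assumptions for the example only, not the report's schema.

```python
# Normalizing a flat extract: redundant descriptive columns move to a dimension.
import pandas as pd

flat = pd.DataFrame({
    "ProductName": ["Road Bike", "Road Bike", "Helmet"],
    "Category": ["Bikes", "Bikes", "Accessories"],
    "SalesAmount": [1500.0, 1800.0, 60.0],
})

# Dimension: one row per distinct product, with a generated surrogate key.
dim_product = (flat[["ProductName", "Category"]]
               .drop_duplicates()
               .reset_index(drop=True))
dim_product["ProductKey"] = dim_product.index + 1

# Fact: measures only, referencing the dimension by its surrogate key.
fact_sales = (flat.merge(dim_product, on=["ProductName", "Category"])
                  [["ProductKey", "SalesAmount"]])
print(dim_product, fact_sales, sep="\n\n")
```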

ANALYSIS OF USER REQUIREMENTS AND DATA

Apply the development life cycle of a data analytics project

The data analytics lifecycle outlines the six fundamental steps of a data analytics project based on the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology. According to Paula Muñoz, a Northeastern graduate, this process consists of business and data understanding, data preparation, exploratory analysis and modeling, validation, and visualization and presentation.

Figure 3.1: The Process of Data Analysis in Six Steps

Step 1: Understand the Business

This phase concentrates on understanding the objectives and requirements of the project:

- Determine business objectives: thoroughly understand, from a business perspective, what the customer really wants to accomplish, and then define business success criteria

- Assess situation: determine resource availability and project requirements, assess risks and contingencies, and conduct a cost-benefit analysis

- Determine data mining goals: in addition, define what success looks like from a technical data mining perspective

- Produce project plan: Select technologies and tools and define detailed plans for each project phase

Step 2: Understand the Data

This phase focuses on identifying, collecting, and analyzing the data sets, and includes four tasks:

- Collect initial data: import necessary data into an analysis tool

- Describe data: examine the data and document its properties such as data format, number of records, or field identities

- Explore data: query, visualize and identify relationships among the data

- Verify data quality: check the data’s cleanliness and document any quality issues

Step 3: Prepare the Data

Often referred to as “data munging”, this phase prepares the final dataset for modeling:

- Select data: define reasons for data inclusion/exclusion

- Clean data: correct, impute, or remove erroneous values

- Construct data: derive new helpful attributes

- Integrate data: combine data from multiple sources

- Format data: re-format data as necessary

Step 4: Perform Exploratory Analysis and Modeling

Start creating models to test data and search for solutions to the stated goals:

- Select modeling techniques: determine algorithms

- Generate test design: depending on the modeling approach, split the data into training, test, and validation sets

- Assess model: interpret the model results based on domain knowledge, the pre-defined success criteria, and the test design

Step 5: Validate the Data

- Once the predictive models are constructed, conduct thorough data analysis to assess their effectiveness. Verify the accuracy and validity of the information utilized in the models.

- Then, determine whether the developed models operate as envisioned. Examine their behavior, performance, and alignment with the initial objectives.

- Evaluate the need for additional data cleansing. Identify any persistent anomalies, inconsistencies, or outliers that may impact model performance.

Step 6: Visualization and Presentation

- Once all deliverables are completed, begin working on data visualization. Data visualization is often crucial for effectively communicating findings to clients. Interactive visualization tools like Tableau are valuable in explaining research findings to clients who may not be data experts. Weaving a narrative with the data is essential for conveying the significance of the research to the client.

- Then, clearly define the project objectives to ensure a successful outcome. Break down the project into specific tasks to streamline the process and deliver exceptional results. Finally, gather all necessary information before starting the project to avoid delays and rework.

Identify and analyze user requirements

3.2.1 Business overview analysis

Requirements: Gathering statistical data concerning fundamental business activities across the enterprise:

- Sales volume, revenue, profits, employee count, resellers, and products

- Group-based, regional, and product category-specific statistics, segmented by quarterly periods

- Analyzing and filtering these metrics annually

Benefits: Furnishes business managers with a holistic overview of their ongoing business endeavors.

3.2.2 Revenue analysis

Requirements: We need comprehensive statistics delineating revenue and business profits categorized by product and region, broken down into monthly, quarterly, and annual data:

- Year on Year Revenue Growth (Net Revenue YoY%) = (this year's revenue / last year's revenue - 1) * 100

- Year on Year Profit Growth (Net Profit YoY%) = (this year's profit / last year's profit - 1) * 100

Benefits: Delivers thorough growth and revenue information, helping business managers make decisions about adjusting resellers, regions, and the products they sell in a clear, understandable manner. A short worked example of these YoY formulas follows.
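The sketch below uses made-up figures purely to illustrate how the two formulas above are evaluated:

```python
# Worked example (hypothetical figures) of the YoY formulas:
# YoY growth = (this year / last year - 1) * 100.
revenue_this_year, revenue_last_year = 1_250_000.0, 1_000_000.0
profit_this_year, profit_last_year = 180_000.0, 150_000.0

net_revenue_yoy = (revenue_this_year / revenue_last_year - 1) * 100
net_profit_yoy = (profit_this_year / profit_last_year - 1) * 100

print(f"Net Revenue YoY%: {net_revenue_yoy:.1f}%")  # 25.0%
print(f"Net Profit YoY%:  {net_profit_yoy:.1f}%")   # 20.0%
```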

3.2.3 Product analysis

Requirements: Statistics on revenue, costs, and quantity of products sold by region, category, and time:

- Increase and decrease in revenue over the years

Benefits: Offer a full picture of business performance. They reveal sales trends, highlight successful product categories, and identify top-performing items, guiding strategic decisions for maximizing potential and market focus.

3.2.4 Sales person analysis

Requirements: Employee sales performance statistics:

- Percentage of target completion for individual employees

Benefits: Employee performance statistics can help identify top performers, pinpoint areas for improvement, and allocate resources more effectively, optimizing organizational strategies and business outcomes.

SQL Server Integration Services

We implemented a process using SQL Server Integration Services (SSIS) to import data from CSV and Excel files into SQL Server. This approach is commonly used in building data warehouses, where data is organized into dimensional (dim) and fact tables. Here is an extended explanation:

3.3.1 Data Extraction Using SSIS:

SQL Server Integration Services (SSIS) is a powerful tool for ETL purposes (Extract, Transform, Load) provided by Microsoft SQL Server. It can be used to perform a broad range of data migration tasks.

SSIS provides specific connectors and components to efficiently handle various data sources, including CSV files, which are our data type.

3.3.2 Loading Data into SQL Server:

SQL Server is the destination of the data flow: the extracted data is loaded into SQL Server, which serves as the central repository for our data.

SSIS Data Flow: SSIS enables the creation of data flow tasks, allowing for the movement of data from a source to a destination. During this process, the data undergoes transformations to ensure it is properly cleansed and formatted before being stored in SQL Server. This ensures the data meets the required standards and is ready for analysis and utilization.

3.3.3 Designing Dim and Fact Tables:

Data Warehouse Structure: In a data warehouse, data is structured into two types of tables: dimensional (dim) and fact tables. Dim tables generally hold descriptive information, such as product details, with data fields like name, product code, and standard cost. Fact tables predominantly store quantitative data, such as sales records, with metrics like revenue, quantity, and date. The dimensional and fact tables are interconnected, with dim tables providing context and attributes for the quantitative data present in the fact tables.

Normalization and Star Schema: The design may involve normalizing dim tables and using a star schema, where a fact table is connected to multiple dim tables, creating a more flexible and efficient structure for analytical queries.

3.3.4 Populating Dim and Fact Tables:

SSIS Packages for Loading: SSIS packages are created to populate the dim and fact tables in SQL Server. These packages can either be scheduled to run automatically or triggered manually, based on how frequently your data requires updates.

Transformations and Lookups: SSIS enables data transformations during the loading process. Lookup transformations can be used to enhance fact tables with relevant information retrieved from dimension tables.
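The following is a minimal sketch, with illustrative keys and column names rather than the report's exact ones, of what such a lookup does conceptually: each staged sales row is matched to its dimension row by business key so that the fact table stores the surrogate key.

```python
# Lookup-style population of a fact table: business key -> surrogate key.
import pandas as pd

dim_product = pd.DataFrame({
    "ProductKey": [10, 11],                        # surrogate keys
    "ProductAlternateKey": ["BK-R50", "HL-U509"],  # business keys
})

staged_sales = pd.DataFrame({
    "ProductAlternateKey": ["BK-R50", "HL-U509", "BK-R50"],
    "SalesAmount": [1500.0, 60.0, 1800.0],
})

fact_sales = (staged_sales
              .merge(dim_product, on="ProductAlternateKey", how="left")
              [["ProductKey", "SalesAmount"]])
print(fact_sales)
```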

3.3.5 Maintenance and Optimization:

Indexing and Statistics: To optimize query performance in SQL Server, ensure proper indexing, keep statistics up to date, and periodically review the execution plans of essential queries to identify potential performance bottlenecks and areas for improvement.

Incremental Loading: Consider implementing incremental loading strategies to update only the changed or new data, reducing the overall load time.
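A minimal sketch of this idea, assuming a hypothetical ModifiedDate watermark column and staging table name, might look like the following; an SSIS package would typically achieve the same effect with a parameterized source query driven by a stored watermark.

```python
# Incremental load: append only rows changed since the last successful load.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine(
    "mssql+pyodbc://user:password@localhost/AdventureWorksDW"
    "?driver=ODBC+Driver+17+for+SQL+Server"
)

# Watermark from the previous run (normally kept in a control table).
last_loaded = pd.Timestamp("2020-06-30")

source = pd.read_csv("sales_export.csv", parse_dates=["ModifiedDate"])
new_rows = source[source["ModifiedDate"] > last_loaded]

new_rows.to_sql("FactSales_staging", engine, if_exists="append", index=False)
```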

3.3.6 Data Quality and Consistency:

Data Cleansing: SSIS offers a range of data cleansing capabilities as part of its ETL process, ensuring the accuracy and consistency of data stored in the warehouse. This data cleansing functionality helps eliminate errors, inconsistencies, and duplicate records, resulting in high-quality data that supports informed decision-making and analysis.

Error Handling: Incorporate robust error handling mechanisms into your SSIS packages to effectively identify and resolve issues that arise during the data loading process, ensuring data integrity and maintaining a seamless data flow.

By utilizing SSIS for data integration and SQL Server as the underlying database, you have created a robust infrastructure for constructing a scalable and efficient data warehouse that can fulfill business intelligence and analytics requirements. Continuously monitoring, maintaining, and optimizing the data integration pipeline will ensure its ongoing effectiveness and success.

Overview of the data warehouse

3.4.1 Description of data building reports

With a view to tracking the data from daily transactions and utilizing it to compare each month and each year, the data warehouse must include details about items, resellers, promotions, territories, and time.

The following describes the data that was used to create the dimension tables:

Table 3.1: Description of dimension tables

1. Product: Detailed information about the product, including name, color, cost, category, and subcategory

2. Date: Date information used to analyze the data by month, quarter, and year

3. Region: Information about the sales territory, including region, country, and group

4. Salesperson: Information about sales employees, including name, title, and email

5. SalespersonRegion: Connection between the Salesperson dimension and the Region dimension, showing the regions each sales employee is responsible for

6. Reseller: Information about the reseller, including name, revenue, and opened year

The following describes the data that was used to create the fact tables:

Table 3.2: Description of fact tables

1. Sales: Information about the sales transactions

2. Targets: Information about the monthly KPI goals of each salesperson

(Tables 3.3 to 3.10 list the columns of each dimension and fact table, giving the column name, data type, and whether nulls are allowed.)

By exploring and characterizing the essential data sources, the dimensions and values for the data warehouse were constructed. The data warehouse consists of 2 fact tables connected to multiple dimension tables, organized as a Galaxy schema.

The fact tables capture the significant factors that support the manager's decisions. The manager can understand the business performance of each sales employee for each product, as well as their goals. Sales, profits, costs, and sales territory are detailed in FactSales; in addition, FactTargets displays the monthly target of each sales employee.

The Dimension tables provide the manager with particular details about the business' transactions, and include DimProduct, DimRegion, DimSalesperson, DimSalespersonRegion, DimDate, and DimReseller.
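To illustrate how the two fact tables share conformed dimensions in this galaxy schema, here is a small sketch with simplified columns and made-up figures (not the report's data): sales and targets are joined through the salesperson and month so that actuals can be compared with goals.

```python
# Galaxy-schema sketch: two fact tables sharing a conformed dimension.
import pandas as pd

dim_salesperson = pd.DataFrame({"SalespersonKey": [1, 2],
                                "Name": ["An", "Binh"]})

fact_sales = pd.DataFrame({"SalespersonKey": [1, 1, 2],
                           "Month": ["2020-01", "2020-02", "2020-01"],
                           "SalesAmount": [50_000.0, 42_000.0, 30_000.0]})

fact_targets = pd.DataFrame({"SalespersonKey": [1, 1, 2],
                             "Month": ["2020-01", "2020-02", "2020-01"],
                             "TargetAmount": [45_000.0, 60_000.0, 35_000.0]})

# Aggregate actual sales, then line them up against targets per person/month.
actual = (fact_sales
          .groupby(["SalespersonKey", "Month"], as_index=False)["SalesAmount"]
          .sum())
compare = (actual.merge(fact_targets, on=["SalespersonKey", "Month"])
                 .merge(dim_salesperson, on="SalespersonKey"))
print(compare[["Name", "Month", "SalesAmount", "TargetAmount"]])
```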

The following diagram will explain every Dimension and Fact table:

Using the described Dimension and Fact tables in the Data Warehouse, the manager can analyze the necessary information about the current state of the salespersons and resellers to make better decisions to maximize revenue for the company.

The relationships in Data Warehouse schema

Table 3.11: The relationships in the Data Warehouse schema

1. DimProduct - FactSales (1 - n): One product can have one or multiple rows in the FactSales table, and each row in the FactSales table has only one product.

2. DimReseller - FactSales (1 - n): One reseller can have one or multiple rows in the FactSales table, and each row in the FactSales table has only one reseller.

3. DimSalesperson - FactSales (1 - n): One sales employee can have one or multiple rows in the FactSales table, and each row in the FactSales table has only one employee.

4. DimSalesperson - FactTargets (1 - n): One sales employee can have one or multiple rows in the FactTargets table, and each row in the FactTargets table has only one employee.

5. DimSalesperson - DimSalespersonRegion (1 - n): One sales employee can have one or multiple rows in the DimSalespersonRegion table, and each row in the DimSalespersonRegion table has only one employee.

6. DimRegion - DimSalespersonRegion (1 - n): One region can have one or multiple rows in the DimSalespersonRegion table, and each row in the DimSalespersonRegion table has only one region.

7. DimRegion - FactSales (1 - n): One region can have one or multiple rows in the FactSales table, and each row in the FactSales table has only one region.

8. DimDate - FactSales (1 - n): One date can have one or multiple rows in the FactSales table, and each row in the FactSales table has only one date.

9. DimDate - FactTargets (1 - n): One date can have one or multiple rows in the FactTargets table, and each row in the FactTargets table has only one date.

EXPERIMENTAL RESULTS AND ANALYSIS

Data analysis and visualization

4.1.1 General analysis

First, we created an overview dashboard of AdventureWorks by:

● Giving some important metrics about AdventureWorks’ business: how many products, how many sales employees, how many resellers, quantity of orders, quantity of sold products, revenue, profit

● Statistics of revenue and profit by group of markets, by country, by category, and by quarter and year

● The statistics can be filtered by year

Figure 4.1: Data visualization of General Analysis

We use Card visualization (1) to show an overview of some key metrics of AdventureWorks, including Total Orders, Sold Quantity, Revenue, Profit, Resellers, Sales Employees, and Products. These metrics will change from year to year.

Revenue by Group and Country statistics: we use the Pie Chart and Funnel visualizations (2) because they show which groups and countries have the highest revenue, sorted from highest to lowest, and also compare the revenue percentage of each group/region against the group/region with the highest turnover.

In addition, we use Line and stacked column chart visualizations (3) to describe:

● Statistics of revenue and profit by category, to show which category is the best-selling one and brings the most profit

● Statistics of revenue, total orders, and profit by quarter and year, to monitor the change and trend of revenue, orders, and profit throughout the four-year period

4.1.2 Revenue analysis

In analyzing revenue, there are 4 key metrics to consider: Revenue, Profit, Net Revenue YoY (%), and Net Profit YoY (%).

Revenue:

- Description: Total proceeds from sales

● Revenue = SUM(Fact_Sales[Sales])

Profit:

- Description: The remaining amount after subtracting expenses from revenue

● Profit = SUM(Fact_Sales[Sales]) - SUM(Fact_Sales[Cost])

Net Revenue YoY (%):

- Description: Percentage increase in revenue for the current year compared to the previous year

● Net Revenue YoY = ([SalesCurrent] / [SalesPrevious] - 1)

● SalesCurrent = SUM(Fact_Sales[Sales])

● SalesPrevious = CALCULATE(SUM(Fact_Sales[Sales]), SAMEPERIODLASTYEAR(Dim_Date[FullDateAlternateKey]))

Net Profit YoY (%):

- Description: Percentage increase in profit for the current year compared to the previous year

● Net Profit YoY = ([ProfitCurrent] / [ProfitPrevious] - 1)

● ProfitCurrent = SUM(Fact_Sales[Sales]) - SUM(Fact_Sales[Cost])

● ProfitPrevious = CALCULATE(SUM(Fact_Sales[Sales]) - SUM(Fact_Sales[Cost]), SAMEPERIODLASTYEAR(Dim_Date[FullDateAlternateKey]))

4.1.2.2 Data visualization

To emphasize key statistics like Quantity of Products Sold, Total Revenue, Year-over-Year Revenue Growth (Net Revenue YoY%), and Year-over-Year Profit Growth (Net Profit YoY%), card visualizations (1) are employed for enhanced ease and convenience, aiding viewers in grasping the figures more effectively.

To enhance the Dashboard's liveliness and avoid overwhelming it with excessive tabular data, a section displaying total revenue in the form of a Treemap visualization (2) is included. This addition ensures viewers aren't inundated with tables, allowing for convenient selection of revenue statistics across different regions.

To present comprehensive Total Revenue figures, Year-over-Year Revenue Growth (Net Revenue YoY%), and Year-over-Year Profit Growth (Net Profit YoY%) categorized by Region, Category, and Product criteria, facilitating the display of extensive data while illustrating changes in Year-over-Year Revenue Growth and Year-over-Year Profit Growth, Matrix visualization (3) emerges as the most suitable visualization method.

4.1.3 Product analysis

● Giving 3 important metrics related to product: Cost, Sold Quantity, Revenue

● Statistics of revenue and cost of products by group of markets and by quarter of year

● The statistics can be filtered by year, quarter, month, region and category

In this dashboard, we also use a sales trend measure, equal to the sales of the current year minus the sales of the previous year (the increase or decrease of revenue year by year).
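The logic behind this sales trend measure can be sketched as follows, using hypothetical yearly figures; the report's actual measure is defined as a DAX measure inside Power BI.

```python
# Sales trend: current-year sales minus previous-year sales.
import pandas as pd

yearly_sales = pd.DataFrame({
    "Year": [2017, 2018, 2019, 2020],
    "SalesAmount": [1.2e6, 1.5e6, 1.4e6, 1.9e6],
})

yearly_sales["SalesTrend"] = yearly_sales["SalesAmount"].diff()
# A positive value corresponds to an upward arrow in the matrix, a negative one
# to a downward arrow.
print(yearly_sales)
```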

Figure 4.3: Dashboard of Product Analysis

We use Card visualization (1) to show an overview of 3 key metrics including Sold Quantity, Revenue, and Cost. These metrics will change by time, region, and category of product.

To add more vibrancy to the Dashboard and prevent it from having too much tabular data, a total revenue section in the form of a Map visualization (2) is added, so that the viewer is not overwhelmed with table data and can gauge how high the revenue of each region/country is by the size of its bubble.

The revenue and cost by year and quarter statistics: we use the Clustered Column Chart (3) to compare the difference between the revenue and cost of products, which leads to the gain/loss of the business' profit in a particular period of time.

We use a Stacked Bar Chart (4) with a Top N filter to indicate the top 10 best-selling products. The composition of this top 10 changes based on the filters for category, region, quarter, and month.

Lastly, we use a Matrix visualization (5) to display the revenue of each product from 2017 to 2020. In this matrix, a cell with a green background means that the product made a profit in that period; a red background means it made a loss. The sales trend measure mentioned above determines the direction of the arrow in each cell: upward means the revenue improved, downward means it worsened.

4.1.4 Employee Evaluation

● Giving 3 important metrics to evaluate an employee: Sales Amount (Revenue), Target Amount, Achieved Rate

● Statistics of sales amount of employees by time

● Statistics of Employee by Country

● The statistics can be filtered by year, quarter, month, region, category and subcategory

In addition, we created a new measure for this dashboard called Achieved Rate. The measure equals sales amount divided by target amount, expressed as a percentage.
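For example, with hypothetical figures, the Achieved Rate works out as follows:

```python
# Achieved Rate = sales amount / target amount, as a percentage.
sales_amount = 42_000.0
target_amount = 60_000.0

achieved_rate = sales_amount / target_amount * 100
print(f"Achieved Rate: {achieved_rate:.0f}%")  # 70%
```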

4.1.4.2 Data visualization

Figure 4.4: Dashboard of Employee Evaluation

We use Card visualization (1) to show an overview of 3 key metrics including Sales Amount, Target Amount, and Achieved Rate. These metrics will change by time, region, category, and subcategory.

We use a Stacked Bar Chart (2) with a Top N filter to indicate the top 5 sales employees who lead the sales race. It also displays the achieved rate of these employees. The composition of this top 5 changes based on the filters for category, region, quarter, and month.

The sales amount, target amount, and achieved rate by year, quarter, and month are displayed in the Line and Clustered Column Chart (3) to compare the difference between target and real sales, and the rate of achieving goals.

In the Donut Chart (4), we demonstrate the number of employees by country. Right beside it, there is a matrix illustrating the sales amount of each sales employee in the four-year period from 2017. We set an icon condition for this matrix:

● Empty star: if the achieved rate is less than 30 percent

● Half star: if the achieved rate is equal or greater than 30 percent

● Full star: if the achieved rate is equal or greater than 70 percent.

Evaluation and Suggestion

From the 4 dashboards, we can draw some conclusions:

● The category with the highest revenue is the Bike category, from the beginning of 2017 to the end of the four-year period. However, in terms of profit, the Components category is the greatest.

● From the sales data, we can see that many products were sold at a price lower than their cost. In our view, this may be a promotion strategy of AdventureWorks to push the sales amount, especially for the Bike category: after buying a bike, customers will need to buy more subproducts such as components and accessories.

● Overall, the sales employees of AdventureWorks set very high sales targets, with a large gap from the actual sales data. This can boost their motivation but also put too much pressure on them, leading to worse results. Fortunately, the company's employees are still doing well, with growth in their sales performance each year.

● Lastly, from 2017 to the last quarter of 2019, AdventureWorks increased its number of resellers significantly due to the expansion of the market.

Based on the evaluation above, our group has some recommendations for improving AdventureWorks' business:

● Due to the significant growth in Bike category revenue and the market expansion, AdventureWorks should cut down, but not stop, the discount promotion on this category. Besides, the business can concentrate more on R&D (Research and Development) of the subproducts, because they have a lot of profitable potential based on the report.

● Consider adjusting the sales targets for sales employees, and for the whole business, which are currently too far from actual sales. This may help relieve a lot of stress for employees.

CONCLUSION

Results

5.1.1 Advantages

- Active engagement is expected from all members during meetings, with contributions and feedback on the meeting content

- Members should possess access to a wide range of knowledge and analytical methods within their respective fields

- Dedication to acquiring a thorough understanding of the Power BI analysis tool is essential, and members should allocate sufficient time for learning and skill development

- Basic proficiency in handling data is a prerequisite for effective participation.

5.1.2 Disadvantages

- The available chart types are limited, resulting in a lack of diversity in the reports

- The database remains incomplete, omitting crucial aspects of actual business operations

- Due to a lack of experience in design, function utilization, and copyright constraints, the tool has not fully showcased the team's innovative ideas

- The inventory algorithm is rudimentary, limiting its predictive capabilities and hindering the attainment of high accuracy levels

- Our original database source has no Date dimension, so we had to find ways to build a new Date table suitable for the original database.

Topic development direction

- Employ supplementary tools and incorporate diverse charts to enhance the clarity and conciseness of data representation

- Acquire knowledge about algorithms designed to predict product life cycles, facilitating more effective inventory management strategies

REFERENCES

Dresner, H. (1989). Business intelligence. Gartner Inc.

Kalos, M. H., & Whitlock, P. A. (2009). Monte Carlo methods. John Wiley & Sons.

Hotz, N. (2023, January 19). What is CRISP DM? Data Science Process Alliance. https://www.datascience-pm.com/crisp-dm-2/

Kimball, R., & Ross, M. (2011). The data warehouse toolkit: The complete guide to dimensional modeling. John Wiley & Sons.

Knight, B., Mitchell, A., Green, D., Hinson, D., Kellenberger, K., Leonard, A., & Murphy, M. (2007). Professional SQL Server 2005 Integration Services. John Wiley & Sons.

The following sections have been edited according to instructor comments, compared to the previously submitted version:

- Revise the scope of the research and add reasons for choosing the topic in chapter 1

- Supplementing the theoretical basis of data warehouse, SSIS and schema in chapter 2

- In chapter 3, add the process of implementing SSIS steps, review and edit a more appropriate schema (from snowflake schema to galaxy schema).
