


VAN LANG UNIVERSITY
HONOR PROGRAM

-oOo-

BUSINESS ANALYTICS

Topic: ANALYZING OF NESTLE

Instructor: Dr Nguyen Nguyen Phuong

Class code: 232_72BUSI30053_01

Ho Chi Minh City, 2024


TASK 1: MULTIPLE LINEAR REGRESSION

The video demonstrates a step-by-step guide to performing a regression analysis in Microsoft Excel to determine which factors contribute most to the price of an apartment. The professor uses a dataset with features such as neighborhood, brick built, number of bedrooms, number of bathrooms, and square feet to analyze the relationship between these variables and the apartment price.

Data preparation:

The professor begins by creating dummy variables for the categorical variables, namely neighborhood and brick built. New columns with binary values (0 or 1) represent the presence or absence of each category. For example, the neighborhood column is split into three new columns, neighborhood_1, neighborhood_2, and neighborhood_3, where 1 or 0 indicates whether the apartment is in the respective neighborhood.
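The dummy-variable step can be sketched outside Excel as follows. This is a minimal illustration in plain Python, not the professor's workbook: the function name `make_dummies` and the five sample neighborhood codes are assumptions made for the example.

```python
def make_dummies(values, prefix):
    """Return one 0/1 column per distinct category found in `values`."""
    categories = sorted(set(values))
    return {
        f"{prefix}_{cat}": [1 if v == cat else 0 for v in values]
        for cat in categories
    }

# Hypothetical neighborhood codes for five apartments (not from the dataset).
neighborhood = [1, 3, 2, 1, 3]
dummies = make_dummies(neighborhood, "neighborhood")

for name, column in dummies.items():
    print(name, column)
```

Each row has exactly one 1 across the three columns. When the regression includes an intercept, one of the dummy columns is usually left out so that the columns are not perfectly collinear.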

Regression Analysis

The professor then performs a regression analysis using the Data Analysis tool in Excel. The dependent variable is the apartment price, and the independent variables are the dummy variables created earlier, along with the number of bedrooms, number of bathrooms, and square feet. The regression analysis produces a table with coefficients, standard errors, t-statistics, and p-values for each independent variable.

To run the regression, select Data -> Data Analysis -> Regression, then enter the input ranges for Y and X. Y is the dependent variable, so its range is the Price column; the X range is shown in the image below.

After entering the X and Y ranges, tick Labels so that Excel does not treat the first row (the variable names) as data, and then click OK.

Figure 1.4

After getting the results in Figure 1.4, we can see that they are divided into 3 tables. The first table is Regression Statistics. The R Square value measures how well the equation fits the data: for example, R Square = 0.868621 means the equation explains 86.86% of the variation in Y.
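To make the meaning of R Square concrete, the sketch below computes it as 1 minus the ratio of the residual sum of squares to the total sum of squares. The actual and predicted prices are made-up numbers chosen for easy arithmetic, not values from Figure 1.4.

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` that the predictions explain."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))  # unexplained
    ss_tot = sum((y - mean_y) ** 2 for y in actual)                # total
    return 1 - ss_res / ss_tot

# Hypothetical observed and fitted prices.
actual = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]

r2 = r_squared(actual, predicted)
print(round(r2, 3))  # 0.968
```

Here the residual sum of squares is 400 and the total sum of squares is 12,500, so the model explains 96.8% of the variation.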

The second table is the ANOVA table, which tests the overall relationship between Y and the X variables using an F-test.

In this table, Significance F = 4.62E-51, which is far below 0.05, so the overall relationship is statistically significant.

The third table is the Coefficients table. It tests the relationship between Y and each individual X. Because this model has many X variables, each one can be tested separately.

The R-squared value is 0.96, which shows that the regression model fits the data closely.

The regression coefficient for the area is 20, which shows that each additional square meter increases the apartment price by 20 million VND.

The regression coefficient for the number of bedrooms is 100, which shows that each additional bedroom increases the apartment price by 100 million VND.

The regression coefficient for location is 300, which shows that an apartment in the center costs 300 million VND more than an apartment in the suburbs.

Based on the video, the regression formulas used to calculate house prices are:

1. Linear regression formula:

House price = constant + coefficient * Area + coefficient * Number of bedrooms + coefficient * Location + coefficient * Condition + coefficient * Amenities + ...

2. Polynomial regression formula:

House price = constant + coefficient * Area + coefficient * Area^2 + coefficient * Number of bedrooms + coefficient * Number of bedrooms^2 + coefficient * Location + coefficient * Location^2 + coefficient * Condition + coefficient * Condition^2 + coefficient * Amenities + coefficient * Amenities^2 + ...

3. Non-linear regression formula:

House price = constant + coefficient * Area^(1/2) + coefficient * Number of bedrooms^(1/3) + coefficient * Location^(1/4) + coefficient * Condition^(1/5) + coefficient * Amenities^(1/6) + ...
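As an illustration of the linear formula, the sketch below plugs in the three coefficients quoted earlier (20 per square meter, 100 per bedroom, 300 for a central location, all in million VND). The constant is not given in the text, so `CONSTANT = 500` is a purely hypothetical value, and the Condition and Amenities terms are left out because no coefficients are reported for them.

```python
# Coefficients quoted in the text, in million VND.
COEFFICIENTS = {"area": 20, "bedrooms": 100, "center": 300}
CONSTANT = 500  # hypothetical intercept; the text does not report one

def predict_price(area, bedrooms, center, constant=CONSTANT):
    """Linear prediction: constant plus each coefficient times its variable.

    `center` is a dummy variable: 1 for the center, 0 for the suburbs.
    """
    return (
        constant
        + COEFFICIENTS["area"] * area
        + COEFFICIENTS["bedrooms"] * bedrooms
        + COEFFICIENTS["center"] * center
    )

price = predict_price(area=70, bedrooms=2, center=1)
print(price)  # 2400 (million VND, under the assumed constant)
```

Changing one input while holding the others fixed shifts the prediction by exactly that input's coefficient, which is how the coefficients are interpreted in the text.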

The formula for calculating linear regression is simply:

y = a + bx

In there: y is the dependent variable (house price) x is the independent variable (area) a is a constant

 b is the regression coefficientThe regression coefficient b is calculated using the formula:

b = (Σ(x - )(y - )) / Σ(x - )^2x ȳ xIn there:

Σ is the total is the average value of xx

is the average value of yȳ

The constant a is calculated using the formula:

a = - bȳ x
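The two formulas for b and a translate directly into code. The sketch below is a minimal illustration; the function name `fit_line` and the four (area, price) pairs are assumptions, with the prices constructed to lie exactly on the line y = 500 + 20x so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Return (a, b) for y = a + b*x using the least-squares formulas."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # b = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    # a = y_bar - b * x_bar
    a = y_bar - b * x_bar
    return a, b

# Hypothetical areas (m^2) and prices (million VND) on the line 500 + 20x.
areas = [50, 60, 70, 80]
prices = [1500, 1700, 1900, 2100]

a, b = fit_line(areas, prices)
print(a, b)  # 500.0 20.0
```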

TASK 2: K-MEANS AND RFM

K-means:

Calculate the distance from each plot to each central plot, applying the formula below to calculate the distance.

Randomly assign a cluster to each point (keep the points selected as centers in step 3).

Choose the smallest distance for partitioning.
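The distance and partitioning steps can be sketched as follows. The distance formula itself did not survive in this copy of the report, so the code assumes the usual Euclidean distance; the two centers and three plots are hypothetical coordinates.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension (assumed metric)."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def nearest_center(point, centers):
    """Index of the center with the smallest distance to `point`."""
    distances = [euclidean(point, c) for c in centers]
    return distances.index(min(distances))

# Hypothetical central plots and plots to partition.
centers = [(1.0, 1.0), (5.0, 5.0)]
plots = [(1.5, 2.0), (4.0, 4.5), (0.5, 0.5)]

assignments = [nearest_center(p, centers) for p in plots]
print(assignments)  # [0, 1, 0]
```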


Compare the new cluster with the assumed cluster.

The returned convergence value is 6 < 13 (the total number of plots), so the result is FALSE: the cells that return incorrect values go through another iteration.

Assign the plots to the new clusters.

With a returned convergence value of 13, equal to the total number of plots, the result is TRUE: the plots have been allocated to the correct clusters.

After 2 runs, the K-means process ends.
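The convergence check described above (count the plots whose new cluster equals the assumed cluster, and stop when the count equals the total number of plots) can be sketched as a loop. This is an illustrative implementation under assumptions: Euclidean distance, centers recomputed as the mean of their members, and six hypothetical plots rather than the 13 from the worksheet.

```python
import math

def distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def nearest(point, centers):
    distances = [distance(point, c) for c in centers]
    return distances.index(min(distances))

def mean_point(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def kmeans(points, centers, max_runs=100):
    """Repeat assignment until every plot keeps its assumed cluster."""
    assumed = [None] * len(points)
    for run in range(1, max_runs + 1):
        new = [nearest(p, centers) for p in points]
        # Convergence value: number of plots whose cluster did not change.
        converged = sum(1 for old, cur in zip(assumed, new) if old == cur)
        assumed = new
        if converged == len(points):
            return assumed, centers, run
        # Move each center to the mean of its members (keep it if empty).
        centers = [
            mean_point([p for p, c in zip(points, assumed) if c == k])
            if k in assumed else centers[k]
            for k in range(len(centers))
        ]
    return assumed, centers, max_runs

# Hypothetical plots; the first two are deliberately poor starting centers.
plots = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (5.0, 7.0), (4.5, 5.0), (5.0, 6.0)]
labels, final_centers, runs = kmeans(plots, centers=[plots[0], plots[1]])
print(labels, runs)
```

In this sketch the first pass counts as a run with a convergence value of 0, so the run count includes the final confirming pass.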

RFM ANALYSIS IMPLEMENTATION PROCESS

RFM is a marketing technique used to identify a company’s best customers and understand their behavior by categorizing them based on three quantitative factors:

1. Recency: The last time a customer made a purchase
2. Frequency: How often a customer makes a purchase within a given time period
3. Monetary: How much a customer spends on purchases within a given time period

By understanding these factors, businesses can create targeted marketing campaigns to increase sales and customer loyalty.

Step 1: Prepare the Data

The dataset contains the following columns:

• InvoiceNo
• StockCode
• Description
• Quantity
• InvoiceDate
• UnitPrice
• CustomerID
• Country

Step 2: Create a Pivot Table

A Pivot Table is a powerful data analysis tool that allows you to summarize, sort, total, average, and perform other aggregations on your data. To create a Pivot Table, follow these steps:

1. Select the data range
2. Go to the "Insert" tab and select "Pivot Table"
3. Choose "New Worksheet" and tick the box "Add this data to the Data Model"
4. Click "OK"

Amount = UnitPrice * Quantity
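The Amount column and the per-customer total that a Pivot Table would produce can be sketched in plain Python. The three invoice rows and the customer IDs "C1" and "C2" are hypothetical; only the column names come from the dataset description.

```python
from collections import defaultdict

# Hypothetical invoice lines using the dataset's column names.
rows = [
    {"CustomerID": "C1", "Quantity": 2, "UnitPrice": 3.5},
    {"CustomerID": "C2", "Quantity": 1, "UnitPrice": 10.0},
    {"CustomerID": "C1", "Quantity": 4, "UnitPrice": 1.25},
]

# Amount = UnitPrice * Quantity, computed per invoice line.
for row in rows:
    row["Amount"] = row["UnitPrice"] * row["Quantity"]

# Sum Amount per customer, as a Pivot Table with CustomerID in Rows would.
monetary = defaultdict(float)
for row in rows:
    monetary[row["CustomerID"]] += row["Amount"]

print(dict(monetary))  # {'C1': 12.0, 'C2': 10.0}
```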

Step 3: Prepare the Data for RFM analysis

Step 4: Create a sheet and calculate Days since

Days since = Today’s Date - InvoiceDate
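A minimal sketch of the Days since calculation follows. It assumes that a customer's Recency is the smallest Days since value across their invoices, and it fixes "today" to a hypothetical date so the output is reproducible; the invoice dates are invented.

```python
from datetime import date

def days_since(invoice_date, today):
    """Days since = today's date minus the invoice date."""
    return (today - invoice_date).days

today = date(2024, 6, 1)  # fixed, hypothetical reference date

# Hypothetical invoice dates per customer.
invoices = {
    "C1": [date(2024, 5, 1), date(2024, 5, 25)],
    "C2": [date(2024, 3, 3)],
}

# Recency: days since the most recent purchase (assumed definition).
recency = {
    customer: min(days_since(d, today) for d in dates)
    for customer, dates in invoices.items()
}
print(recency)  # {'C1': 7, 'C2': 90}
```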

Step 5: Calculate the RFM Scores

Recency and Frequency of customers:

• R5&F5 - Champions/VIP
• R4&F5 - Loyal Customers
• R1&F1 - Hibernating
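The scoring and labelling can be sketched as below. The report does not state how the 1 to 5 scores are derived, so the rank-based quintile rule here is an assumption, as are the function names, the "Other" fallback label, and the five sample customers. Only the three segment labels come from the text.

```python
def quintile_scores(values, higher_is_better=True):
    """Score each value 1-5 by rank position (assumed scoring rule)."""
    # Worst value first, so it receives score 1 and the best receives 5.
    order = sorted(
        range(len(values)),
        key=lambda i: values[i],
        reverse=not higher_is_better,
    )
    scores = [0] * len(values)
    for rank, i in enumerate(order):
        scores[i] = rank * 5 // len(values) + 1
    return scores

SEGMENTS = {
    (5, 5): "Champions/VIP",
    (4, 5): "Loyal Customers",
    (1, 1): "Hibernating",
}

def segment(r, f):
    return SEGMENTS.get((r, f), "Other")  # "Other" is a placeholder label

# Hypothetical customers: days since last purchase and number of orders.
recency_days = [3, 10, 40, 90, 200]
frequency = [25, 20, 8, 5, 1]

r_scores = quintile_scores(recency_days, higher_is_better=False)
f_scores = quintile_scores(frequency, higher_is_better=True)
labels = [segment(r, f) for r, f in zip(r_scores, f_scores)]
print(labels)
```

Recency is scored in reverse because fewer days since the last purchase is better, while a higher Frequency is better.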

TASK 3: DATA VISUALIZATION

1/ Introduction to the dataset

1.1 What industry does this dataset represent?

The dataset represents the retail industry.

1.2 Industry Introduction

The retail industry is a vital sector of the global economy, encompassing businesses that sell goods and services directly to consumers. This industry includes a variety of retail formats, such as department stores, specialty stores, supermarkets, and online retailers. The significance of the retail industry lies in its ability to meet the diverse needs of consumers, driving economic growth and employment.

One of the most significant trends reshaping the retail landscape is the rapid growth of online shopping and e-commerce. With conveniences like home delivery, wide product assortments, and price comparisons, online sales have disrupted traditional in-store retail models. However, physical stores still maintain relevance by offering experiential shopping, instant gratification, and high-touch customer service experiences that digital cannot fully replicate.

Technology is another major driving force, profoundly impacting retail operations, supply chains, customer engagement, and data analytics capabilities. Mobile apps, self-checkout systems, AI-powered recommendations, augmented reality for virtual try-ons, and sophisticated inventory management are just some examples of tech transforming retail.

In North America, the retail sector faces challenges like supply chain disruptions, evolving consumer preferences, labor shortages, and the need to enhance omnichannel experiences that seamlessly blend the digital and physical worlds. However, opportunities exist in leveraging data, personalization, sustainability initiatives, and delivering exceptional customer service to build loyalty and brand affinity.

Ultimately, the retail industry's significance lies in its ability to cater to diverse consumer demands, embrace innovation, adapt to changing market forces, and create engaging shopping journeys that foster customer delight and drive economic growth.

1.3 Describe the dataset's structure, categorizing columns by type

• Shipping Mode: Specifies the shipping method as "Delivery Truck," "Regular Air," "Express Air," or "Delivery In-Store."
• Region: Indicates the geographical region of the customer as "Central," "East," "South," or "West."

Numerical variables:

• Order ID: Uniquely identifies each order.
• Order Date: Specifies the date the order was placed.
• Number of Orders: Indicates the total number of orders a customer has placed.
• Order Quantity: Specifies the quantity of items ordered in a particular order.
• Discount: Reflects the percentage discount applied to an order.
• Profit: Represents the profit generated from a particular order.
• Sales: Indicates the total sales amount for an order.
• Shipping Cost: Specifies the cost of shipping an order.
• Unit Price: Represents the price per unit of a product.

Additional variables:

• City: Specifies the city where the customer is located.
• State: Indicates the state where the customer is located.
• Zip Code: Provides the customer's zip code.
• Product Base Margin: May represent the base profit margin for a product.
• Row ID: While not explicitly classified as a categorical or numerical variable, it appears to be a unique identifier for each row in the dataset.
• Product Name: While not listed in the description, it is likely a variable containing the names of specific products.
• Customer Name: Similarly, there is likely a variable containing customer names.

1.4 Identify the data columns containing missing values. Specify how many rows, and what % of rows in each such column, have missing values.
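One way to answer this question programmatically is sketched below. The four sample rows are hypothetical and do not reflect the real dataset; in particular, treating "Product Base Margin" as the column with gaps is an assumption for illustration, and blanks are assumed to appear as None or an empty string.

```python
def missing_summary(rows, columns):
    """Return {column: (missing_count, missing_percent)} for each column."""
    total = len(rows)
    summary = {}
    for col in columns:
        # A cell counts as missing if it is None or an empty string.
        missing = sum(1 for row in rows if row.get(col) in (None, ""))
        summary[col] = (missing, 100 * missing / total)
    return summary

# Hypothetical rows, not taken from the dataset.
rows = [
    {"Sales": 120.0, "Product Base Margin": 0.5},
    {"Sales": 80.0, "Product Base Margin": None},
    {"Sales": 45.5, "Product Base Margin": 0.4},
    {"Sales": 60.0, "Product Base Margin": ""},
]

summary = missing_summary(rows, ["Sales", "Product Base Margin"])
for col, (count, pct) in summary.items():
    print(f"{col}: {count} rows missing ({pct:.1f}%)")
```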

1.5 Are missing values handled? State the imputation method for each data column containing missing values.

Use the median of the value column to fill in missing values:

=MEDIAN($B$27:$B$8400)
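The median imputation performed by the Excel formula can be sketched in Python as follows. The function name `fill_with_median` and the sample values are assumptions; missing cells are represented as None.

```python
from statistics import median

def fill_with_median(values):
    """Replace None entries with the median of the non-missing values."""
    present = [v for v in values if v is not None]
    fill = median(present)
    return [fill if v is None else v for v in values]

# Hypothetical column with two missing cells.
margins = [0.4, None, 0.6, 0.5, None]
filled = fill_with_median(margins)
print(filled)  # [0.4, 0.5, 0.6, 0.5, 0.5]
```

The median is less sensitive to extreme values than the mean, which is a common reason for choosing it when a column contains outliers.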

2/ Preparation steps

2.1 How many columns are used in the analysis? List the columns.

• City
• Customer Age
• Customer Name
• Customer Segment
• Discount
• Number of Records
• Order Date
• Order ID
• Order Priority
• Order Quantity
• Product Base Margin
• Product Category
• Product Container
• Product Name
• Product Sub-Category
• Profit
• Region
• Row ID
• Sales
• Ship Date
• Ship Mode
• Shipping Cost
• State
• Unit Price
• Zip Code

2.2 Outline the content you want to convey to readers through the analysis.

The analysis includes 4 dashboards covering the business situation, operating performance, product performance, and customer insights of Walmart for 2012-2015.

Dashboard 1: Overview of the Company’s revenue and profit (2012-2015)

• Sum of sale by Category and Customer Segment
• Sales by region
• Profit detail
• Profit level
• Profit Forecast
• Profit Trend

Dashboard 2: Operational Efficiency

• Shipping ratio
• Order status by month

Dashboard 3: Product Performance

• Product performance by Category (heatmap): compares sales and profit for each product category to identify strengths and weaknesses.
• Top products measured by sales, quantity, and profit (bar chart).
• Basket Market

Dashboard 4: Customer Insights:

• Customer Segments by region
• Top sales of customers by category and segmentation

DASHBOARD 1: OVERVIEW OF THE COMPANY’S REVENUE AND PROFIT (2012-2015)


Chart 1: Sum of sale by Category and Customer Segment:

The chart shows that sales for all product categories and customer segments grew over the four-year period. The biggest growth was in the Technology category, which saw sales more than double from 2012 to 2015. The Home Office category also saw strong growth, with sales increasing by about 70% over the same period.

The Consumer segment was the largest customer segment in terms of sales throughout the period, but the Small Business segment grew the fastest. Sales to the Small Business segment more than tripled from 2012 to 2015.

• In 2012, the Consumer segment had the highest sales, followed by the Business segment and then the Small Business segment.
• In 2013, the Consumer segment again had the highest sales, followed by the Business segment and then the Small Business segment.
• In 2014, the Consumer segment still had the highest sales, but the Business segment and the Small Business segment were much closer in terms of sales.
• In 2015, the Consumer segment once again had the highest sales, but the Business segment and the Small Business segment were even closer in terms of sales than they were in 2014.

Chart 2: Sales by region


This bar chart displays the sales figures for the regions and states where Walmart operates. The x-axis shows the region or state names, while the y-axis represents the sales values.

Top performing regions:

• California has the highest sales among the regions shown, with sales of $1,372,210.
• Illinois is the second-highest performer with sales of $959,327.
• Texas follows with sales of $863,891.

Other notable regions:

• New York ($738,894), Ohio ($729,426), and Florida ($777,664) have considerable sales contributions.
• Midwest regions like Minnesota ($490,010), Michigan ($475,171), and Indiana ($466,670) also show significant sales figures.

State-level observations:

• Within the West Coast region, California dominates, while Washington ($560,356) and Oregon ($354,325) contribute smaller portions.
• In the South, Texas leads, followed by Florida and Georgia ($325,852).
• Smaller states like MD ($347,458), NJ ($328,234), MA ($242,051), and ME ($235,917) have lower but notable sales.

The bar chart effectively visualizes the regional and state-level sales performance for Walmart, highlighting the top contributors and providing insights into the varying sales patterns across different geographic areas.
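The ranking can be checked by sorting the sales figures quoted above. The sketch uses only the twelve states named in full in the text; the variable names are illustrative.

```python
# Sales figures (USD) as quoted in the text for the states named in full.
sales_by_state = {
    "Illinois": 959_327,
    "Texas": 863_891,
    "California": 1_372_210,
    "New York": 738_894,
    "Ohio": 729_426,
    "Florida": 777_664,
    "Minnesota": 490_010,
    "Michigan": 475_171,
    "Indiana": 466_670,
    "Washington": 560_356,
    "Oregon": 354_325,
    "Georgia": 325_852,
}

# Sort from highest to lowest sales.
ranked = sorted(sales_by_state.items(), key=lambda kv: kv[1], reverse=True)
top3 = [state for state, _ in ranked[:3]]

for state, sales in ranked:
    print(f"{state:<12} ${sales:>10,}")
```

Sorted this way, the first three entries are California, Illinois, and Texas.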


Chart 3: Profit detail

The visualization presents the profit figures for Walmart across different product sub-categories, quarters, and years from 2012 to 2014 (Q1).

Overall profit trend: Walmart's overall profit, as measured by SUM(Profit), shows an increasing trend from 2012 to 2014 (Q1). The total profit was negative (-$41,504) in 2012 but rose to a positive $97,353 by the first quarter of 2014.

Top profitable sub-categories:

• Office Machines sub-category consistently generated high profits across all years, with a peak of $12,558 in 2014 (Q1).
• Telephone sub-category also contributed significant profits, reaching $97,353 in 2012.
• Office Furniture and Appliances sub-categories were other major profit contributors.

Sub-categories with losses:

• Tables sub-category incurred substantial losses across all years, with a maximum loss of -$41,504 in 2012.
• Bookcases and Chairs sub-categories also experienced losses in certain years, though with improvements over time.

Quarterly variations: The data reveals quarterly fluctuations in profits for many sub-categories. For instance, Office Machines had higher profits in Q3 and Q4 compared to Q1 and Q2 across multiple years.
