NATIONAL ECONOMICS UNIVERSITY
Business School
BUSINESS STATISTICS CLASS: EBDB4 Group Assignment
Group Members:
Nguyễn Thị Lan Anh - 11220457
Phan Thị Phương Thảo - 11225967
Tạ Khánh Vi - 11226889
Lê Thị Mai Chi - 11220982
Nguyễn Thị Việt An - 11220045
Ha Noi, November 8th, 2023.
TABLE OF CONTENTS
CASE STUDY 1: EMPLOYEE
Question 1
Question 2
CASE STUDY 2: Specialty Toys
Question 1
Question 2
Question 3
Question 4
Question 5
CASE STUDY 1: EMPLOYEE
Question 1
a. Do necessary data cleaning
b. Produce the appropriate graphs and tables to explore the distribution of variables. Summarize the quantitative variables using location and variability measures.
Produce the appropriate graphs and tables to explore the distribution of variables?
- There are 1000 observations, with the majority of employees (752 employees, accounting for 75.2%) having a bachelor's degree, followed by 211 employees who have a master's degree. The number of people having a PhD was only 3.7% (37 employees).
- 2017 is the year in which most employees joined the company, with 243 people, accounting for 24.3%. In 2018 and 2012, only 81 and 99 employees joined the company respectively. During the remaining years, the share of employees joining the company rose from 11.9% to 16.6%.
- Bangalore is the city where 47% of employees are based, followed by Pune with 279 employees and New Delhi with 251 employees.
- Most employees are in payment tier 3, accounting for 74.4%. Payment tiers 2 and 1 have 204 and 52 employees respectively.
- The most common age among the company's employees is 26 years old (203 people). Fewer than 10 employees are found at each of the ages 22-23 and over 29, apart from age 34 (11 employees).
- The share of male employees in the company (61.6%) is about 1.6 times the share of female employees (38.4%).
- 898 people out of 1,000 said they had never been temporarily without assigned work, and the remaining 102 people had.
- From this table, we can see that many people have 2 to 4 years of experience in their current field. Only 21 employees have no prior experience and 69 employees have 1 year of experience.
- Out of 1000 employees, 35.7% responded that they would leave the company. The remaining 643 employees (64.3%) answered that they would not leave the company.
Summarize the quantitative variables using location and variability measures
We have 2 quantitative variables: Age and Experience in Current Domain.
- In the Age variable, the youngest age is 22 years old, the oldest is 40 years old, and the mode is 26. The mean is 26.49, the standard deviation is 2.575 and the variance is 6.629.
- In the Experience in Current Domain variable, the minimum is 0 years of experience, the maximum is 4 years, and the mode is 3. The mean is 3.14, the standard deviation is 1.291 and the variance is 1.666.
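The location and variability measures reported here can be reproduced outside SPSS. The following is a minimal sketch using only Python's standard library; the `summarize` helper and the `ages` list are hypothetical illustrations, not the survey data. It assumes the sample (n - 1) definitions of variance and standard deviation.

```python
import statistics


def summarize(values):
    """Return location and variability measures for a numeric sample."""
    return {
        "min": min(values),
        "max": max(values),
        "mode": statistics.mode(values),
        "mean": statistics.mean(values),
        # Sample variance and standard deviation (divisor n - 1).
        "variance": statistics.variance(values),
        "std_dev": statistics.stdev(values),
    }


# Hypothetical ages for illustration only; not the survey data.
ages = [22, 24, 26, 26, 26, 27, 28, 30, 34, 40]
summary = summarize(ages)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Applying the same helper to the real Age and Experience columns should give figures comparable to those listed above, since the variance is the square of the standard deviation in both cases.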
c. Analyze the relationship between two variables (for example, how do the years of working in the companies change by city? Is there any relationship between Payment Tier and Experience?)
Experience in Current Domain and Age: How is the Experience in Current Domain of employees affected by their age?
Scatter Graphs:
There is a nonlinear relationship
Correlation:
A Pearson correlation analysis was performed to test whether there is a relationship between age and experience in current domain. The result of the Pearson correlation analysis showed that there was no significant linear relationship between age and experience in current domain: |r| = 0.25, p-value = 0.437 > 0.05 => A linear regression model is not suitable.
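As a rough illustration of how such a correlation test works, the sketch below computes Pearson's r by hand and an approximate two-sided p-value. The function names and the small data set are hypothetical. The p-value replaces the t distribution with the standard normal, an assumption that is only reasonable for large samples such as the 1000 observations here; SPSS uses the exact t distribution.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_two_sided_p(r, n):
    """Approximate p-value for H0: no linear correlation.

    Uses t = r * sqrt(n - 2) / sqrt(1 - r^2) and treats t as standard
    normal (large-sample assumption).
    """
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return 2 * (1 - NormalDist().cdf(abs(t)))


# Hypothetical (age, experience) pairs for illustration only.
ages = [22, 24, 26, 28, 30]
experience = [2, 1, 4, 3, 5]
r = pearson_r(ages, experience)
print(f"r = {r:.3f}")
```

With the real data, `approx_two_sided_p(r, 1000)` would then be compared with the 0.05 significance level used in this report.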
c.1 Joining year and Leaving or not
Null hypothesis: There is no relationship between Joining year and Leaving or not.
Alternative hypothesis: There is a relationship between Joining year and Leaving or not.
The Chi-Square value of 260.293 with 6 degrees of freedom and a p-value of .000 (indicating a p-value less than .001) suggests that there is a very strong statistical association between the joining year and the decision to leave or not.
Since the p-value is less than 0.001, we can reject the null hypothesis that there is no association between the variables.
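To make the mechanics of this test concrete, here is a minimal sketch of the Pearson chi-square statistic for a contingency table: each expected count is (row total x column total) / grand total, and the degrees of freedom are (rows - 1) x (columns - 1). The function name and the counts are hypothetical; the actual crosstabulation is not reproduced here.

```python
def chi_square_statistic(table):
    """Return (statistic, degrees of freedom) for a contingency table
    given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical 2x2 counts for illustration only.
observed = [[10, 20], [20, 10]]
statistic, df = chi_square_statistic(observed)
print(f"chi-square = {statistic:.3f}, df = {df}")
```

A table with 7 joining years and 2 outcomes would give (7 - 1) x (2 - 1) = 6 degrees of freedom, matching the value reported for this test.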
c.2 City and Leaving or not
The Chi-Square value of 33.532 with 2 degrees of freedom and a p-value of 0.000 (indicating a p-value less than 0.001) suggests that there is a very strong statistical association between the city where the employee is based and the decision to leave or not.
Since the p-value is less than 0.001, we can reject the null hypothesis that there is no association between the variables.
c.3 Payment Tier and Leaving or not
PaymentTier * LeaveOrNot Crosstabulation
Total: 643, 357, 1000
Chi-Square Tests (column: Asymp. Sig. (2-sided))
The Chi-Square value of 77.243 with 2 degrees of freedom and a p-value of 0.000 (indicating a p-value less than 0.001) suggests that there is a very strong statistical association between payment tier and the decision to leave or not.
Since the p-value is less than 0.001, we can reject the null hypothesis that there is no association between the variables.
c.4 Age and Leaving or not
Age * LeaveOrNot Crosstabulation
The Chi-Square value of 30.324 with 17 degrees of freedom and a p-value of 0.024 (greater than 0.001 but below 0.05) suggests that there is only a weak statistical association between age and the decision to leave or not.
Since the p-value is greater than 0.001, we fail to reject the null hypothesis of no association at the 0.1% level, although the association is significant at the 5% level.
c.5 Gender and Leaving or not
Gender * LeaveOrNot Crosstabulation
Chi-Square Tests (columns: Value, df, Asymp. Sig. (2-sided), Exact Sig. (2-sided), Exact Sig. (1-sided))
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 137.09.
b. Computed only for a 2x2 table.
The Chi-Square value of 35.512 with 1 degree of freedom and a p-value of 0.000 (indicating a p-value less than 0.001) suggests that there is a strong statistical association between gender and the decision to leave or not.
Since the p-value is less than 0.001, we can reject the null hypothesis that there is no association between the variables.
c.6 Everbenched and Leaving or not
EverBenched * LeaveOrNot Crosstabulation
Chi-Square Tests (columns include Exact Sig. (2-sided) and Exact Sig. (1-sided))
b. Computed only for a 2x2 table.
The Chi-Square value of 8.779 with 1 degree of freedom and a p-value of 0.003 (greater than 0.001 but below 0.05) suggests that there is not a strong statistical association between EverBenched and the decision to leave or not.
Since the p-value is greater than 0.001, we fail to reject the null hypothesis of no association at the 0.1% level, although the association is significant at the 5% level.
c.7 Experience in current domain and Leaving or not
ExperienceInCurrentDomain * LeaveOrNot Crosstabulation
The Chi-Square value of 5.324 with 5 degrees of freedom and a p-value of 0.378 (well above both 0.001 and 0.05) suggests that there is no statistically significant association between Experience in current domain and the decision to leave or not.
Since the p-value is greater than 0.05, we fail to reject the null hypothesis that there is no association between the variables.
c.8 Education and Leaving or not
Education * LeaveOrNot Crosstabulation
The Chi-Square value of 41.411 with 2 degrees of freedom and a p-value of 0.000 (indicating a p-value less than 0.001) suggests that there is a strong statistical association between education and the decision to leave or not.
Since the p-value is less than 0.001, we can reject the null hypothesis that there is no association between the variables.
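The p-values quoted in c.1 to c.8 can be cross-checked from the reported chi-square values and degrees of freedom alone. The sketch below is one way to do this with the standard library; `chi_square_p_value` is a hypothetical helper that uses the standard finite-series forms of the chi-square upper tail for integer degrees of freedom, not SPSS's own routine.

```python
import math


def chi_square_p_value(x, df):
    """Upper-tail probability P(X^2 > x) for integer degrees of freedom."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * S, where
    # S = sum over r = 1..(df-1)/2 of x^(r - 1/2) / (1*3*...*(2r - 1)).
    total, term = 0.0, math.sqrt(x)
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    tail = math.sqrt(2 / math.pi) * math.exp(-x / 2) * total
    return math.erfc(math.sqrt(x / 2)) + tail


# (chi-square value, degrees of freedom) as reported in c.1 to c.8.
reported = {
    "Joining year": (260.293, 6),
    "City": (33.532, 2),
    "Payment tier": (77.243, 2),
    "Age": (30.324, 17),
    "Gender": (35.512, 1),
    "EverBenched": (8.779, 1),
    "Experience": (5.324, 5),
    "Education": (41.411, 2),
}
for name, (value, df) in reported.items():
    print(f"{name}: p = {chi_square_p_value(value, df):.3f}")
```

Each printed p-value can then be compared against whichever significance level is chosen, for example 0.001 or 0.05.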
d. Produce the confidence interval and comparative tests if possible
Gender and Age (coding: 1 = Male, 2 = Female)
We will consider the difference in age between male and female employees.
The hypotheses are:
H0: There is no difference in mean age between the two samples.
H1: There is a difference in mean age between the two samples.
- An independent samples t-test was conducted to compare age between male and female employees. There was no significant difference in age between males (mean = 26.57, SD = 2.7533) and females (mean = 26.365, SD = 2.2523); t = 1.226, p = 0.22 (two-tailed). The magnitude of the difference in the means (mean difference = 0.2052, 95% CI: (-0.1232, 0.5336)) was small, and the interval contains 0.
- In this t-test, the two-tailed p-value is 0.22, or 22%, which is higher than the 5% significance level. For this reason, the null hypothesis is not rejected: no significant difference is found between the two samples, which is consistent with both coming from the same population.
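The reported t statistic and confidence interval can be approximately reconstructed from the summary statistics. The sketch below assumes equal variances (a pooled t-test) and group sizes of 616 males and 384 females, which are inferred from the 61.6% / 38.4% split of 1000 employees rather than stated in the test output. It also approximates the t distribution (998 degrees of freedom) with the standard normal, so the last digits may differ slightly from SPSS.

```python
import math
from statistics import NormalDist


def pooled_t_test(mean_diff, sd1, n1, sd2, n2, confidence=0.95):
    """Equal-variance two-sample t-test from summary statistics.

    Returns (t, approximate two-sided p-value, confidence interval).
    """
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = mean_diff / std_err
    # Normal approximation to the t distribution (assumption, large df).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p_value, (mean_diff - z * std_err, mean_diff + z * std_err)


# Mean difference and SDs as reported; group sizes are inferred (assumption).
t, p_value, ci = pooled_t_test(0.2052, 2.7533, 616, 2.2523, 384)
print(f"t = {t:.3f}, p = {p_value:.3f}, 95% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
```

Because the interval includes 0, it leads to the same conclusion as the p-value: the difference in mean age is not significant at the 5% level.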
2.1 Education and Decision to leave
Because educational level is associated with the decision to leave a company, HR should consider implementing strategies to address this factor and enhance employee retention, such as training and development opportunities, educational assistance programs, and recognition and rewards.
2.2 Joining year and Decision to leave
A notable disparity exists in the average joining year between individuals who opted to leave and those who chose to stay, with departing employees having joined more recently. HR efforts could be directed towards enhancing the onboarding process and providing better support in the early stages of a career to improve retention among newer employees.
2.3 City and Decision to leave
The chi-square test findings indicate a noteworthy association between the city and the decision to depart. It is advisable for HR to further investigate factors specific to each location that could impact employee retention and proactively address any regional concerns.
2.4 Payment Tier and Decision to leave
Because the decision to leave is associated with payment tier, HR can use several methods to reduce departures: salary benchmarking, transparent compensation policies, rewards, and professional development opportunities.
2.5 Age and Decision to leave
Age shows only a weak association with the decision to leave the company, so HR can recruit people across all age ranges.
2.6 Gender and Decision to leave
Gender plays an important role in employees' decisions to leave, so HR should put in place equal opportunities and inclusion policies, family-friendly policies, and similar measures.
2.7 Ever Benched and Decision to leave
The chi-square test does not show an association between being benched and the decision to leave at the 0.001 level. However, it is important to note that the p-value (0.003) is close to that threshold. HR should remain vigilant and monitor the situation closely, as benching may potentially become a more influential factor in the future.
2.8 Experience in current domain and Decision to leave
The independent samples t-test comparing the decision to leave with the level of experience indicates that there is no significant difference in the mean experience between those who chose to leave and those who stayed. This suggests that decisions to leave may not be influenced by the amount of experience within the current domain. HR may want to explore alternative factors affecting retention, such as work-life balance or job satisfaction.
CASE STUDY 2: Specialty Toys
Question 1
According to the case study:
Expected demand is 20000 units, with a 0.9 probability that demand will be between 10000 units and 30000 units. That is, the 90% confidence interval is (10000; 30000).
For 90% confidence, α = 1 - 0.9 = 0.1
The 90% confidence interval is given by: $\mu \pm z_{\alpha/2}\,\sigma = \mu \pm z_{0.05}\,\sigma$
where $z_{0.05}$ is the critical value at 0.05 obtained from normal distribution tables. We know that $z_{0.05} = 1.645$. So, the 90% confidence interval is given by: $\mu \pm 1.645\sigma$.
From the sales forecaster's prediction, we know that:
μ - 1.645σ = 10000 (1); μ + 1.645σ = 30000 (2)
Solving (1) and (2), we get: μ = 20000; σ = 6079.03
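The solution of equations (1) and (2) can be checked numerically. This short sketch computes the mean as the midpoint of the interval and the standard deviation from its half-width, first with the table value 1.645 used above and then with the more precise quantile from Python's `statistics.NormalDist` (the variable names are illustrative).

```python
from statistics import NormalDist

lower, upper = 10000, 30000

# Adding (1) and (2) gives the mean; subtracting them gives sigma.
mean = (lower + upper) / 2
z_table = 1.645                        # value read from normal tables
z_exact = NormalDist().inv_cdf(0.95)   # more precise 95th percentile

sigma_table = (upper - mean) / z_table
sigma_exact = (upper - mean) / z_exact

print(f"mean = {mean:.0f}")
print(f"sigma (z = 1.645) = {sigma_table:.2f}")
print(f"sigma (exact z)   = {sigma_exact:.2f}")
```

The two standard deviations differ by less than one unit, so the table value is adequate for the calculations that follow.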
The probability density function that defines the curve of the normal distribution is given by:
$$f(x) = \frac{1}{6079.03\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-20000}{6079.03}\right)^{2}}$$
Now we sketch the curve. The highest point on the normal curve is at the mean, 20000. The curve is symmetric. The standard deviation determines how flat and wide the curve is. Larger values result in wider, flatter curves, showing more variability in the data. Thus, we have:
Question 2
A stock-out will occur if the demand is greater than the order quantity. To compute the probability of a stock-out, we need to compute the probability that the demand will be greater than the order quantity for each of the order quantities recommended by the different managers. This can be done either using the Excel NORMDIST function or manually using the standard normal distribution tables.
In Excel, use the NORMDIST function as: NORMDIST(x; μ; σ; TRUE)
Where:
x = the recommended order quantity
μ = the mean of the distribution or 20000 in this case
σ = the standard deviation of the distribution or 6079.03 in this case
TRUE is used to indicate that you are using the cumulative probability
For an order quantity x = k:
$$P(X > k) = P\left(Z > \frac{k-20000}{6079.03}\right) = 1 - P\left(Z < \frac{k-20000}{6079.03}\right) = 1 - \text{NORMDIST}(k; \mu; \sigma; \text{TRUE})$$
The resulting probabilities are shown below:
To calculate manually using the tables, use the standard normal distribution conversion formula to calculate z. Look up the calculated z value in the tables to get the probability that the demand will be greater than the order quantity.
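The same stock-out probabilities can also be computed in Python, where `NormalDist.cdf` plays the role of Excel's cumulative NORMDIST. This is a minimal sketch with an illustrative helper name, using the mean and standard deviation derived in Question 1.

```python
from statistics import NormalDist

# Demand distribution from Question 1.
demand = NormalDist(mu=20000, sigma=6079.03)


def stock_out_probability(order_quantity):
    """P(demand > order quantity) = 1 - cumulative probability."""
    return 1 - demand.cdf(order_quantity)


for quantity in (15000, 18000, 24000, 28000):
    print(f"Order {quantity}: P(stock-out) = {stock_out_probability(quantity):.4f}")
```

Values obtained this way can differ in the third or fourth decimal from table lookups, because the tables require z to be rounded to two decimals.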
The suggested order quantities are: 15000; 18000; 24000; 28000 units
o Let X be the demand for the product: X ~ N(20000; 6079.03²). Order quantity is 15000, probability of stock-out is P(X > 15000)
=> If order quantity is 15000, probability of stock-out is: 79.39%
o Order quantity is 18000, probability of stock-out is P(X > 18000)
=> If order quantity is 18000, probability of stock-out is: 62.93%
o Order quantity is 24000, probability of stock-out is P(X > 24000)
=> If order quantity is 24000, probability of stock-out is: 25.46%
o Order quantity is 28000, probability of stock out is:
Question 3
Case 1: Order quantity = 15000 units
Cost of $16 per unit => Total cost = 15000 * 16 = 240000 ($)
Worst case, in which sales = 10000 units => inventory has 5000 units left
=> Total worst-case revenue = 10000 * 24 + 5000 * 5 = 265000 ($)
=> Profit = 265000 - 240000 = 25000 ($)
Most likely case, in which demand = 20000 units, but sales are limited by the order size
=> Total most-likely-case revenue = 15000 * 24 = 360000 ($)
=> Profit = 360000 - 240000 = 120000 ($)
Best case, in which demand = 30000 units, but sales are limited by the order size
=> Total best-case revenue = 15000 * 24 = 360000 ($)
=> Profit = 360000 - 240000 = 120000 ($)
Case 2: Order quantity = 18000 units
Cost of $16 per unit => Total cost = 18000 * 16 = 288000 ($)
Worst case, in which sales = 10000 units => inventory has 8000 units left
=> Total worst-case revenue = 10000 * 24 + 8000 * 5 = 280000 ($)
=> Profit = 280000 - 288000 = -8000 ($)
Most likely case, in which demand = 20000 units, but sales are limited by the order size
=> Total most-likely-case revenue = 18000 * 24 = 432000 ($)
=> Profit = 432000 - 288000 = 144000 ($)
Best case, in which demand = 30000 units, but sales are limited by the order size
=> Total best-case revenue = 18000 * 24 = 432000 ($)
=> Profit = 432000 - 288000 = 144000 ($)
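The scenario arithmetic in Cases 1 and 2 follows a single rule: units up to the demand sell at $24, any surplus is cleared at $5, and every ordered unit costs $16. A minimal sketch of that rule follows; the function and constant names are illustrative, and the prices are those used in the calculations above.

```python
UNIT_COST = 16       # purchase cost per unit ($)
SELL_PRICE = 24      # regular selling price per unit ($)
SURPLUS_PRICE = 5    # clearance price per unsold unit ($)


def profit(order_quantity, demand):
    """Profit for one order quantity under one demand scenario."""
    units_sold = min(order_quantity, demand)   # sales are capped by the order size
    surplus = order_quantity - units_sold      # leftover inventory, if any
    revenue = units_sold * SELL_PRICE + surplus * SURPLUS_PRICE
    return revenue - order_quantity * UNIT_COST


scenarios = {"worst": 10000, "most likely": 20000, "best": 30000}
for order_quantity in (15000, 18000):
    for label, demand in scenarios.items():
        print(f"Order {order_quantity}, {label} case: {profit(order_quantity, demand)} ($)")
```

The same function applies unchanged to the remaining order quantities, since only `order_quantity` varies between cases.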
Case 3: Order quantity = 24000 units
Cost of $16 per unit => Total cost = 24000 * 16 = 384000 ($)
Worst case, in which sales = 10000 units => inventory has 14000 units left
=> Total worst-case revenue = 10000 * 24 + 14000 * 5 = 310000 ($)
=> Profit = 310000 - 384000 = -74000 ($)
Most likely case, in which sales = 20000 units => inventory has 4000 units left
=> Total most-likely-case revenue = 20000 * 24 + 4000 * 5 = 500000 ($)
=> Profit = 500000 - 384000 = 116000 ($)
Best case, in which demand = 30000 units, but sales are limited by the order size
=> Total best-case revenue = 24000 * 24 = 576000 ($)