UEH UNIVERSITY
UEH COLLEGE OF BUSINESS
SCHOOL OF INTERNATIONAL BUSINESS - MARKETING
END OF TERM - GROUP PRESENTATION
Subject: Business Analytics
Lecturer: Ph.D Nguyễn Văn Dũng
Group: 4
Class Code: 23C1BUS50320002
Course - Class: IBC05
Major: International Business
Contents
1 Collect and create an SPSS file with the following requirements
2 Make a frequency table about Educational level
3 Draw a pie chart showing the percentage of observations by Gender
4 Compare the mean of Income of the 2 groups of gender
a Descriptive Statistics
b Hypothesis Stating
c Result
5 Compare the mean of Income among different educational levels
A Descriptive Statistics
B Hypothesis Stating
C Homogeneity of Variances Test
D Result
6 Check whether there is multicollinearity among the variables: Age, Gender,
Education, Marital status, Doing exercises?
7 Use multiple linear regression to analyze the impact of the variables Age, Gender,
Education, Marital status, Doing exercises on the variable Income?
A Linear Regression Model
B Hypothesis stating and testing
C Linear regression result
D Interpretation of result
GROUP PRESENTATION - END OF TERM (20%)
Analyze the data and write a report according to the following requirements for the
Business Analytics course
Requirements:
I Format:
1 Font: Times New Roman, Size: 12, Line spacing: 1.5 lines, Spacing:
Before: 0 pt, After: 0 pt
2 Length: 10-15 pages (core content)
3 Cover, content, main text, reference list
4 Reference list and citations in the text follow APA style
5 Submit 1 Word file, 1 PPT file and 1 SPSS file into LMS
II Content:
1 Collect and create an SPSS file with the following requirements:
i Variables:
a Name: string
b Income: million VND/month
c Age
d Gender: Male (1), Female (0)
e Education: high school (1), university (2), master's degree (3), doctorate (4)
f Marital status: single (1), married (2)
g Doing exercises: doing regular exercise (more than 20 minutes a day) (1),
not doing regular exercise (less than 20 minutes a day) (0)
ii Observations: 100
2 Make a frequency table about Educational level
Frequencies

Statistics
Educational level
N   Valid     105
    Missing   0

Educational level
                             Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Highschool Student   5           4,8       4,8             4,8
        College Student      85          81,0      81,0            85,7
        Master's Degree      13          12,4      12,4            98,1
        Ph.D Degree          2           1,9       1,9             100,0
        Total                105         100,0     100,0
The table reports the number of observations at each educational level and their
share of the total sample. "College Student" is the largest group, with 85
observations, accounting for 81% of the sample. "Ph.D Degree" is the smallest group,
with only 2 observations, or 1.9% of the sample.
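The percentages in the frequency table can be re-derived from the raw counts. The following is a minimal sketch, not the SPSS procedure itself; the helper name `frequency_table` is a hypothetical choice for illustration, and the counts are copied from the table above.

```python
# Counts per educational level, copied from the frequency table in the report.
counts = {
    "Highschool Student": 5,
    "College Student": 85,
    "Master's Degree": 13,
    "Ph.D Degree": 2,
}


def frequency_table(counts):
    """Return rows of (label, frequency, percent, cumulative percent)."""
    total = sum(counts.values())
    rows = []
    running = 0  # running count used for the cumulative percent
    for label, freq in counts.items():
        running += freq
        percent = round(100 * freq / total, 1)
        cumulative = round(100 * running / total, 1)
        rows.append((label, freq, percent, cumulative))
    return rows


table = frequency_table(counts)
for label, freq, percent, cumulative in table:
    print(f"{label:<20}{freq:>5}{percent:>8.1f}{cumulative:>8.1f}")
```

Because no values are missing, Percent and Valid Percent coincide, so only one percent column is computed here.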
3 Draw a pie chart showing the percentage of observations by Gender
[Pie chart: Percent of Gender; legend: Female, Male]
The pie chart shows a fairly balanced gender split within the surveyed sample:
50.48% female (53 observations) and 49.52% male (52 observations), consistent with
the group sizes in the Group Statistics table. This suggests a relatively even
distribution of genders within the dataset.
4 Compare the mean of Income of the 2 groups of gender
Group Statistics
                 Gender   N    Mean    Std. Deviation   Std. Error Mean
Current income   Female   53   8,208   10,7308          1,4740
                 Male     52   6,531   9,6743           1,3416
a Descriptive Statistics:
- Male Group:
o Mean Income: 6,531
o Standard Deviation: 9,6743
- Female Group:
o Mean Income: 8,208
o Standard Deviation: 10,7308
b Hypothesis Stating:
Null Hypothesis (H0): There is no statistically significant difference in income
between the male and female groups.
If P-value > α, the null hypothesis (H0) is not rejected, suggesting no statistically
significant difference in income.
Alternative Hypothesis (H1): There is a statistically significant difference in
income between the male and female groups.
If P-value < α, the null hypothesis is rejected, indicating a statistically significant
difference in income between the two groups.
The significance level (α) is set at 0,05.
c Result
Independent Samples Test (Current income)
                              Levene's Test    t-test for Equality of Means
                              F      Sig.      t      df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       ,061   ,805      ,840   103       ,403              1,6768            1,9951                  -2,2800        5,6336
Equal variances not assumed                    ,841   102,276   ,402              1,6768            1,9931                  -2,2764        5,6300
Levene's Test: Levene's Test for Equality of Variances. 95% CI: 95% Confidence Interval of the Difference.
To check the equal-variances assumption, we use Levene's Test for Equality of
Variances. The test's F-value is 0,061, which yields a P-value of 0,805 > 0,05.
Therefore, there is no evidence that the variances of the two populations differ,
and we use the results in the row "Equal variances assumed".
In the Independent Samples Test above, the P-value (Sig. 2-tailed) is 0,403 > 0,05,
so the null hypothesis is not rejected: there is no statistically significant
difference between the two groups. In other words, there is no evidence of a
difference in mean Income between males and females.
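The t statistics and degrees of freedom in the Independent Samples Test can be checked from the Group Statistics alone. The sketch below uses the textbook pooled-variance and Welch formulas; the function names `pooled_t` and `welch_t` are hypothetical, decimal commas from SPSS are written as decimal points, and the P-value is not computed because the standard library has no t distribution.

```python
import math


def pooled_t(m1, s1, n1, m2, s2, n2):
    """Student's t with a pooled variance (equal variances assumed)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df, se


def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t with Satterthwaite degrees of freedom (equal variances not assumed)."""
    a, b = s1**2 / n1, s2**2 / n2
    se = math.sqrt(a + b)
    df = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    return (m1 - m2) / se, df, se


# (mean, standard deviation, N) copied from the Group Statistics table.
female = (8.208, 10.7308, 53)
male = (6.531, 9.6743, 52)

t_pooled, df_pooled, se_pooled = pooled_t(*female, *male)
t_welch, df_welch, se_welch = welch_t(*female, *male)

print(f"pooled: t = {t_pooled:.3f}, df = {df_pooled}, SE = {se_pooled:.4f}")
print(f"welch:  t = {t_welch:.3f}, df = {df_welch:.3f}, SE = {se_welch:.4f}")
```

Both rows give nearly the same t, which is what one would expect when the two group standard deviations and sample sizes are this close.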
5 Compare the mean of Income among different educational levels
Descriptives
Current income
                                                              95% Confidence Interval for Mean
                     N     Mean     Std. Deviation   Std. Error   Lower Bound   Upper Bound   Minimum   Maximum
Highschool Student   5     4,000    8,9443           4,0000       -7,106        15,106        ,0        20,0
College Student      85    4,419    3,0401           ,3298        3,763         5,075         ,0        15,0
Master's Degree      13    20,692   13,0600          3,6222       12,800        28,584        5,0       60,0
Ph.D Degree          2     55,000   7,0711           5,0000       -8,531        118,531       50,0      60,0
Total                105   7,377    10,2069          ,9961        5,402         9,352         ,0        60,0
A Descriptive Statistics:
1 Highschool Student:
- Mean income: 4,000
- Standard Deviation: 8,9443
- Sample Size: 5
2 College Student:
- Mean income: 4,419
- Standard Deviation: 3,0401
- Sample Size: 85
3 Master’s Degree:
- Mean income: 20,692
- Standard Deviation: 13,0600
- Sample Size: 13
4 Ph.D Degree:
- Mean income: 55,000
- Standard Deviation: 7,0711
- Sample Size: 2
B Hypothesis Stating:
1 Null Hypothesis (H0): There is no statistically significant difference in income
among the four educational levels.
If P > α, the null hypothesis (H0) is not rejected, suggesting no statistically
significant difference in income.
2 Alternative Hypothesis (H1): There is a statistically significant difference in
income among the four educational levels.
If P < α, the null hypothesis (H0) is rejected, indicating a statistically significant
difference in income among educational levels.
The significance level (α) is set at 0,05.
Test of Homogeneity of Variances
                                                        Levene Statistic   df1   df2      Sig.
Current income   Based on Mean                          7,736              3     101      ,000
                 Based on Median                        4,927              3     101      ,003
                 Based on Median and with adjusted df   4,927              3     21,760   ,009
                 Based on trimmed mean                  6,641              3     101      ,000
C Homogeneity of Variances Test:
- The Levene test indicates that the variances of the four educational levels are
not equal (Sig. < 0.05).
=> This violates the assumption of homogeneity of variances, which may affect
the validity of a standard one-way ANOVA. Therefore, an alternative method
is used: Welch's test, reported in the table Robust Tests of Equality of
Means.
Robust Tests of Equality of Means
Current income
        Statistic(a)   df1   df2     Sig.
Welch   29,934         3     3,787   ,004
a Asymptotically F distributed
D Result:
Sig. of the Welch test is 0.004 < 0.05. We therefore reject H0: there is a
statistically significant difference in income among educational levels, that is,
at least two of the four groups differ significantly in mean Income.
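The Welch statistic and its second degrees of freedom can be recomputed from the Descriptives table. This is a minimal sketch of the standard Welch one-way formula, not SPSS's own code; the function name `welch_anova` is hypothetical, decimal commas are written as decimal points, and the P-value is omitted because the standard library has no F distribution.

```python
def welch_anova(groups):
    """Welch's one-way test from summary statistics.

    groups: list of (n, mean, standard deviation) tuples.
    Returns (F statistic, df1, df2).
    """
    k = len(groups)
    # Each group is weighted by n / variance, so noisy groups count less.
    weights = [n / sd**2 for n, _, sd in groups]
    w_total = sum(weights)
    grand_mean = sum(w * m for w, (_, m, _) in zip(weights, groups)) / w_total
    between = sum(
        w * (m - grand_mean) ** 2 for w, (_, m, _) in zip(weights, groups)
    ) / (k - 1)
    lam = sum(
        (1 - w / w_total) ** 2 / (n - 1) for w, (n, _, _) in zip(weights, groups)
    )
    f_stat = between / (1 + 2 * (k - 2) * lam / (k**2 - 1))
    df2 = (k**2 - 1) / (3 * lam)
    return f_stat, k - 1, df2


# (N, mean, standard deviation) per educational level, from the Descriptives table.
education_groups = [
    (5, 4.000, 8.9443),     # Highschool Student
    (85, 4.419, 3.0401),    # College Student
    (13, 20.692, 13.0600),  # Master's Degree
    (2, 55.000, 7.0711),    # Ph.D Degree
]

f_stat, df1, df2 = welch_anova(education_groups)
print(f"Welch F = {f_stat:.3f}, df1 = {df1}, df2 = {df2:.3f}")
```

The small df2 reflects how much of the evidence rests on the two smallest groups (N = 5 and N = 2), which is worth keeping in mind when reading the result.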
6 Check whether there is multicollinearity among the variables: Age,
Gender, Education, Marital status, Doing exercises?
Correlations
                                          Current age   Gender    Educational level   Marital status   Exercise level
Current age         Pearson Correlation   1             -,088     ,664**              ,507**           -,149
                    Sig. (2-tailed)                     ,370      ,000                ,000             ,129
                    N                     105           105       105                 105              105
Gender              Pearson Correlation   -,088         1         -,194*              ,173             -,276**
                    Sig. (2-tailed)       ,370                    ,047                ,078             ,004
                    N                     105           105       105                 105              105
Educational level   Pearson Correlation   ,664**        -,194*    1                   ,349**           -,199*
                    Sig. (2-tailed)       ,000          ,047                          ,000             ,042
                    N                     105           105       105                 105              105
Marital status      Pearson Correlation   ,507**        ,173      ,349**              1                -,173
                    Sig. (2-tailed)       ,000          ,078      ,000                                 ,078
                    N                     105           105       105                 105              105
Exercise level      Pearson Correlation   -,149         -,276**   -,199*              -,173            1
                    Sig. (2-tailed)       ,129          ,004      ,042                ,078
                    N                     105           105       105                 105              105
** Correlation is significant at the 0.01 level (2-tailed)
* Correlation is significant at the 0.05 level (2-tailed)
The correlations "Age - Education", "Age - Marital status", "Gender - Education",
"Gender - Doing exercises", "Education - Marital status" and "Education - Doing
exercises" are statistically significant, but none of their Pearson correlations
exceeds the recommended threshold of ±0.7.
The correlations "Age - Gender", "Age - Doing exercises", "Gender - Marital
status" and "Marital status - Doing exercises" are not statistically significant.
=> This means that there is no problem of multicollinearity.
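The threshold check described above can be written as a short screen over the pairwise correlations. This is a sketch under the report's own rule (|r| above 0.7 signals a problem); the variable names and the dictionary layout are illustrative assumptions, and the values are copied from the Correlations table with decimal points.

```python
THRESHOLD = 0.7  # recommended cut-off used in the report

# Pearson correlations for each pair of independent variables.
correlations = {
    ("Age", "Gender"): -0.088,
    ("Age", "Education"): 0.664,
    ("Age", "Marital status"): 0.507,
    ("Age", "Doing exercises"): -0.149,
    ("Gender", "Education"): -0.194,
    ("Gender", "Marital status"): 0.173,
    ("Gender", "Doing exercises"): -0.276,
    ("Education", "Marital status"): 0.349,
    ("Education", "Doing exercises"): -0.199,
    ("Marital status", "Doing exercises"): -0.173,
}

# Pairs whose absolute correlation exceeds the threshold.
flagged = [pair for pair, r in correlations.items() if abs(r) > THRESHOLD]

# The strongest pair shows how close the data come to the cut-off.
strongest = max(correlations, key=lambda pair: abs(correlations[pair]))

print("flagged pairs:", flagged)
print("strongest pair:", strongest, correlations[strongest])
```

The strongest pair, Age and Education, is still below the cut-off, which is what supports the conclusion drawn from the table.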
7 Use multiple linear regression to analyze the impact of the variables
Age, Gender, Education, Marital status, Doing exercises on the variable
Income?
A Linear Regression Model
By utilizing a multiple linear regression model, we can analyze the impact of
the five independent variables Age, Gender, Education, Marital status, and
Doing exercises on the dependent variable Income. The model equation is as
follows:
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \varepsilon$
Where:
• $Y$ is the dependent variable Income
• $X_1, X_2, X_3, X_4, X_5$ are the independent variables Age, Gender, Education,
Marital status, Doing exercises respectively
• $\beta_0$ represents the intercept
• $\beta_1, \beta_2, \beta_3, \beta_4, \beta_5$ are the respective coefficients for the independent variables
• $\varepsilon$ represents the error term
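To make the model equation concrete, the sketch below evaluates its deterministic part for one hypothetical respondent. The function name `predict_income` and the example respondent are assumptions for illustration only; the coefficient values are the unstandardized B values reported for the full five-variable model in the Coefficients table of part C, written with decimal points.

```python
def predict_income(intercept, coefficients, values):
    """Evaluate intercept + sum(coefficient * value), leaving out the error term."""
    return intercept + sum(
        coefficients[name] * values[name] for name in coefficients
    )


# Unstandardized coefficients (B) of the full model, from the report.
intercept = -30.413
coefficients = {
    "Age": 0.268,
    "Gender": -1.101,
    "Education": 9.758,
    "Marital status": 12.768,
    "Doing exercises": -1.314,
}

# Hypothetical respondent, coded as in the data requirements:
# age 30, male (1), university (2), married (2), exercises regularly (1).
respondent = {
    "Age": 30,
    "Gender": 1,
    "Education": 2,
    "Marital status": 2,
    "Doing exercises": 1,
}

predicted = predict_income(intercept, coefficients, respondent)
print(f"predicted income: {predicted:.3f} million VND/month")
```

This only illustrates how the equation is applied; two of the five coefficients are not statistically significant in the report, so the number should not be read as a reliable forecast.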
A R square & Interpretation:
Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       ,841(a)   ,707       ,692                5,6613
a Predictors: (Constant), Exercise level, Current age, Gender, Marital status,
Educational level
The value of R² (0,707) indicates that about 70,7% of the variation in Income is
explained by the five independent variables.
B Hypothesis stating and testing:
H0: $\beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$
If P > α, the null hypothesis (H0) is not rejected, suggesting no statistically
significant relationship.
H1: At least one $\beta_j \neq 0$
If P < α, the null hypothesis (H0) is rejected, suggesting a statistically
significant relationship exists.
ANOVA(a)
Model            Sum of Squares   df    Mean Square   F        Sig.
1   Regression   7661,781         5     1532,356      47,811   ,000(b)
    Residual     3172,964         99    32,050
    Total        10834,745        104
a Dependent Variable: Current income
b Predictors: (Constant), Exercise level, Current age, Gender, Marital status,
Educational level
Test Statistic: F-stat = 47,811, which yields a P-value of 0,000 < α.
Therefore, the null hypothesis is rejected, and there is a significant
relationship between the dependent variable and at least one
independent variable.
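The Model Summary and ANOVA figures are linked by simple arithmetic, which can be verified from the sums of squares. The sketch below assumes n = 105 observations and k = 5 predictors, as reported; the variable names are illustrative, and decimal commas are written as decimal points.

```python
import math

# Sums of squares from the ANOVA table.
ss_regression = 7661.781
ss_residual = 3172.964
n_obs = 105        # observations
n_predictors = 5   # independent variables

ss_total = ss_regression + ss_residual
df_residual = n_obs - n_predictors - 1

# Share of the variation in Income explained by the model.
r_squared = ss_regression / ss_total
# Adjusted R square penalizes the number of predictors.
adjusted_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_residual

mean_square_regression = ss_regression / n_predictors
mean_square_residual = ss_residual / df_residual
f_statistic = mean_square_regression / mean_square_residual
# Standard error of the estimate is the root of the residual mean square.
std_error_estimate = math.sqrt(mean_square_residual)

print(f"R^2 = {r_squared:.3f}, adjusted R^2 = {adjusted_r_squared:.3f}")
print(f"F = {f_statistic:.3f}, std. error of estimate = {std_error_estimate:.4f}")
```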
C Linear regression result
Coefficients(a)
                        Unstandardized Coefficients   Standardized Coefficients
Model                   B         Std. Error          Beta                        t        Sig.
1   (Constant)          -30,413   4,010                                           -7,585   ,000
    Current age         ,268      ,113                ,188                        2,375    ,019
    Gender              -1,101    1,230               -,054                       -,895    ,373
    Educational level   9,758     1,587               ,465                        6,147    ,000
    Marital status      12,768    2,374               ,352                        5,379    ,000
    Exercise level      -1,314    1,194               -,065                       -1,101   ,274
a Dependent Variable: Current income
Looking at the P-values (Sig.) in the last column, we see that two of the five
independent variables (Gender, Doing exercises) have P-values that exceed the
significance level and are therefore not statistically significant.
By removing the variable with the highest P-value (Gender) and re-analyzing the
model, we obtain an improved regression model:
Coefficients(a)
                        Unstandardized Coefficients   Standardized Coefficients
Model                   B         Std. Error          Beta                        t        Sig.
1   (Constant)          -31,738   3,723                                           -8,524   ,000
    Current age         ,273      ,112                ,192                        2,424    ,017
    Educational level   10,102    1,539               ,482                        6,565    ,000
    Marital status      12,259    2,303               ,338                        5,324    ,000
    Exercise level      -,983     1,134               -,048                       -,867    ,388
a Dependent Variable: Current income
The P-value of Doing exercises (0,388) remains higher than the significance level,
indicating that the variable is not statistically significant in the model and
should be removed.
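The removal rule applied in the two steps above (drop the variable with the highest P-value as long as it exceeds α) can be sketched as a small helper. The name `next_to_drop` is hypothetical; the P-values are copied from the two Coefficients tables, and the helper only picks the candidate: the model still has to be re-estimated in SPSS after each removal to obtain the new P-values.

```python
ALPHA = 0.05  # significance level used throughout the report


def next_to_drop(p_values, alpha=ALPHA):
    """Return the variable with the highest P-value above alpha, or None."""
    name, p_value = max(p_values.items(), key=lambda item: item[1])
    return name if p_value > alpha else None


# Sig. values of the full five-variable model.
full_model = {
    "Age": 0.019,
    "Gender": 0.373,
    "Education": 0.000,
    "Marital status": 0.000,
    "Doing exercises": 0.274,
}

# Sig. values after Gender has been removed and the model re-estimated.
reduced_model = {
    "Age": 0.017,
    "Education": 0.000,
    "Marital status": 0.000,
    "Doing exercises": 0.388,
}

print("step 1 drops:", next_to_drop(full_model))
print("step 2 drops:", next_to_drop(reduced_model))
```

Removing one variable at a time matters because the remaining P-values change after each re-estimation, as the shift for Doing exercises from 0,274 to 0,388 shows.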
The regression model follows from the coefficient table below:
Coefficients’