UNIVERSITY OF ECONOMICS HO CHI MINH CITY
BUSINESS SCHOOL
FACULTY OF INTERNATIONAL BUSINESS - MARKETING
GROUP REPORT
GROUP PRESENTATION — END OF TERM
Lecturer: PhD Nguyen Van Dung
Subject: Business Analysis
Course ID: 24D1BUS50320001
Date: Wednesday morning
Ho Chi Minh City, May 12, 2024
Table of Contents
2. Make a frequency table about Educational level. Requirements: (i) number of observations for each level and (ii) specific percentage for each level. Which level of education accounts for the highest percentage? Which level of education accounts for the lowest percentage?
3. Draw a pie chart showing the percentage of observations by Gender (male, female). Show the specific percentages on the graph. In the sample, males or females accounted for a higher proportion?
4. Compare the mean of Income of the 2 groups of gender. Is there any statistically significant difference between the 2 groups of gender in terms of Income?
5. Compare the mean of Income among different educational levels. Is there any statistically significant difference among educational levels in terms of Income?
6. Check whether there is multicollinearity among the variables: Age, Gender, Education, Marital status, Doing exercises?
7. Use multiple linear regression to analyze the impact of the variables Age, Gender, Education, Marital status, Doing exercises on the variable Income?
8. Create an interaction variable between Age and Doing exercises. Analyze the moderating effect of Doing exercises on the relationship between Age and Income?
GROUP 4
Name Student ID Task completion
Nguyễn Thị Thanh Ngân 31221024484 100%
Đặng Võ Như Quỳnh 31221026177 100%
Lê Cao Nhật Anh 31221020690 100%
Đặng Thị Hà Nhu 31221026107 100%
Phạm Phan Hà My 31221025378 100%
Bùi Băng Nhi 31221026104 100%
Bùi Như Quỳnh 31221026123 100%
2. Make a frequency table about Educational level. Requirements: (i) number of
observations for each level and (ii) specific percentage for each level. Which level of
education accounts for the highest percentage? Which level of education accounts for the
lowest percentage?
Statistics
Education:
N    Valid      100
     Missing      0

Education:
                      Frequency   Percent   Valid Percent   Cumulative Percent
Valid   High school       21        21.0        21.0              21.0
        University        70        70.0        70.0              91.0
        Master's           7         7.0         7.0              98.0
        PhD                2         2.0         2.0             100.0
        Total            100       100.0       100.0
The table reports the number of observations at each level of education and each level's share
of the total sample. "University" accounts for the highest percentage, with 70 observations
(70% of the sample).
On the other hand, "PhD" is the smallest group, with only 2 observations (2% of the sample).
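The percentages above can be reproduced from the raw counts. The following is a minimal sketch rather than the SPSS procedure itself: the English level names stand in for the SPSS categories (THPT, Đại học, Thạc sĩ, Tiến sĩ), and the counts 7 and 2 are taken to follow from the 7.0% and 2.0% shares of N = 100.

```python
# Counts per education level, as read from the frequency table.
# The English labels are our renderings of the SPSS categories (assumption).
counts = {"High school": 21, "University": 70, "Master's": 7, "PhD": 2}


def frequency_table(counts):
    """Return rows of (level, frequency, percent, cumulative percent)."""
    total = sum(counts.values())
    rows = []
    cumulative = 0.0
    for level, freq in counts.items():
        percent = 100.0 * freq / total
        cumulative += percent
        rows.append((level, freq, round(percent, 1), round(cumulative, 1)))
    return rows


rows = frequency_table(counts)
highest = max(counts, key=counts.get)  # level with the largest share
lowest = min(counts, key=counts.get)   # level with the smallest share

for level, freq, pct, cum in rows:
    print(f"{level:<12}{freq:>5}{pct:>8.1f}{cum:>8.1f}")
print("Highest:", highest, "| Lowest:", lowest)
```

Because the sample has exactly 100 valid observations, each percentage equals its frequency, which is a quick sanity check on the table.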
3. Draw a pie chart showing the percentage of observations by Gender (male, female).
Show the specific percentages on the graph. In the sample, males or females accounted for
a higher proportion?
[Pie chart: Count of Gender. Legend (Gender): Female, Male]
The pie chart shows a fairly balanced gender split within the surveyed sample: 48% male and
52% female. Females therefore account for the slightly higher proportion, although the
distribution of genders within the dataset is relatively even.
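The shares shown on the pie chart follow directly from the group sizes. The sketch below assumes the counts of 52 females and 48 males reported in the Group Statistics table for question 4; it computes each slice's percentage and angle, and is not the charting procedure used in SPSS.

```python
# Group sizes as reported in the Group Statistics table (question 4).
gender_counts = {"Female": 52, "Male": 48}

total = sum(gender_counts.values())

# Each slice's share of the sample and its angle on a 360-degree pie.
slices = {
    gender: {
        "percent": 100.0 * n / total,
        "angle_deg": 360.0 * n / total,
    }
    for gender, n in gender_counts.items()
}

larger_group = max(gender_counts, key=gender_counts.get)

for gender, info in slices.items():
    print(f"{gender}: {info['percent']:.1f}% ({info['angle_deg']:.1f} degrees)")
print("Higher proportion:", larger_group)
```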
4. Compare the mean of Income of the 2 groups of gender. Is there any statistically
significant difference between the 2 groups of gender in terms of Income?
A. Descriptive Statistics:
- Male Group:
● Mean Income: 12.92
● Standard Deviation: 8.470
- Female Group:
● Mean Income: 11.25
● Standard Deviation: 9.800
Group Statistics
          Gender:   N    Mean    Std. Deviation   Std. Error Mean
Income:   Female    52   11.25   9.800            1.359
          Male      48   12.92   8.470            1.223
B. Hypothesis Stating:
1. Null Hypothesis (H0): There is no statistically significant difference in income between the
male and female groups.
If P-value > α, the null hypothesis (H0) is not rejected, suggesting no statistically significant
difference in income.
2. Alternative Hypothesis (H1): There is a statistically significant difference in income between
the male and female groups.
If P-value < α, the null hypothesis is rejected, indicating a statistically significant difference in
income between the two groups.
The significance level (α) is set at 0.05.
C. Levene's test
According to the Independent Samples Test table, the Sig. of Levene's F test is 0.865 > 0.05
=> fail to reject H0: there is no difference in the variance of the 2 populations => use the
results in the line "Equal variances assumed".
Independent Samples Test
Income:

Levene's Test for Equality of Variances
                              F      Sig.
Equal variances assumed       .029   .865

t-test for Equality of Means (Lower and Upper are the 95% Confidence Interval of the Difference)
                              t       df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   Lower    Upper
Equal variances assumed       -.906   98       .367              -1.667            1.839                   -5.315   1.982
Equal variances not assumed   -.912   97.592   .364              -1.667            1.828                   -5.294   1.961
D. Results
Based on the independent samples test results, the p-value is 0.367 > 0.05, so the null
hypothesis is not rejected. This suggests that there is no statistically significant difference
between the mean incomes of males and females.
In other words, the sample provides no significant evidence of a difference in income between
the two genders.
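The "Equal variances assumed" line can be checked by hand from the group means, standard deviations, and sizes. The sketch below is a plain pooled-variance t-test written for illustration; the function name is ours, and the critical value 1.984 is the approximate two-sided 5% table value for 98 degrees of freedom, not a figure taken from the SPSS output.

```python
import math


def pooled_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t-test assuming equal variances, from summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_diff = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    t_stat = (mean1 - mean2) / se_diff
    return t_stat, df, se_diff


# Female group first, then male group, as in the Group Statistics table.
t_stat, df, se_diff = pooled_t_test(11.25, 9.800, 52, 12.92, 8.470, 48)

# Approximate two-sided critical value of t at alpha = 0.05 with 98 df
# (standard table value, stated here as an assumption).
T_CRITICAL = 1.984
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.3f}, df = {df}, SE of difference = {se_diff:.3f}")
print("Reject H0:", reject_h0)
```

The result differs from the reported t in the third decimal place only because the group means are rounded to two decimals; the conclusion (H0 not rejected) is the same.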
5. Compare the mean of Income among different educational levels. Is there any
statistically significant difference among educational levels in terms of Income?
Descriptives
Income:
                                                         95% Confidence Interval for Mean
              N     Mean    Std. Deviation   Std. Error   Lower Bound   Upper Bound   Minimum   Maximum
High school   21    9.10    8.414            1.836        5.27          12.93                   30
University    70    10.56   6.606            .790         8.98          12.13                   30
Master's      7     27.14   4.298            1.625        23.17         31.12         20        32
PhD           2     42.50   3.536            2.500        10.73         74.27         40        45
Total         100   12.05   9.178            .918         10.23         13.87                   45
According to the Descriptives table, the mean Income is 9.10 for High school, 10.56 for
University, 27.14 for Master's, and 42.50 for PhD, so the mean rises from 9.10 to 42.50 as
the level of education increases.
=> Mean income therefore increases with education level. High school has the lowest mean
income of the four levels, and PhD has the highest mean income.
Test of Homogeneity of Variances
                                                 Levene Statistic   df1   df2      Sig.
Income:   Based on Mean                          1.467              3     96       .228
          Based on Median                        .852               3     96       .469
          Based on Median and with adjusted df   .852               3     72.970   .470
          Based on trimmed mean                  1.155              3     96       .331
- According to the "Test of Homogeneity of Variances" table, the Sig. of the Levene
statistic (based on mean) for average monthly Income is 0.228 > 0.05. At the 5%
significance level we therefore fail to reject the hypothesis H0 "The variances are equal"
in favor of H1 "The variances are different".
=> Hence, the result of the ANOVA analysis can be used.
ANOVA
Income:
                 Sum of Squares   df   Mean Square   F        Sig.
Between Groups   3788.312         3    1262.771      26.641   .000
Within Groups    4550.438         96   47.400
Total            8338.750         99
- According to the ANOVA table, the significance level is 0.000 < 0.05, so the observed data
provide sufficient evidence that average monthly income differs between groups with
different levels of education.
=> There is a statistically significant difference in mean income among educational levels.
Robust Tests of Equality of Means
Income:
        Statistic(a)   df1   df2     Sig.
Welch   57.206         3     4.843   .000
a. Asymptotically F distributed
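The F statistic in the ANOVA table can be rebuilt from the sums of squares and the group sizes. The sketch below is an illustrative recomputation, not SPSS output: the group sizes 7 and 2 for Master's and PhD are taken from the frequency table in question 2, and the between-groups sum of squares derived from the rounded means is only expected to match the reported value approximately.

```python
# Group size and mean income per education level (from the Descriptives table;
# n = 7 and n = 2 come from the frequency table in question 2).
groups = {
    "High school": (21, 9.10),
    "University": (70, 10.56),
    "Master's": (7, 27.14),
    "PhD": (2, 42.50),
}

# Sums of squares as reported in the ANOVA table.
ss_between_reported = 3788.312
ss_within = 4550.438

n_total = sum(n for n, _ in groups.values())
k = len(groups)
df_between = k - 1          # number of groups minus one
df_within = n_total - k     # observations minus number of groups

# Cross-check: between-groups SS rebuilt from the (rounded) group means.
grand_mean = sum(n * mean for n, mean in groups.values()) / n_total
ss_between_from_means = sum(
    n * (mean - grand_mean) ** 2 for n, mean in groups.values()
)

ms_between = ss_between_reported / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

print(f"df = ({df_between}, {df_within}), F = {f_stat:.3f}")
print(f"SS between from rounded means = {ss_between_from_means:.1f}")
```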
6. Check whether there is multicollinearity among the variables: Age, Gender, Education,
Marital status, Doing exercises?
Correlations
                                         Age:    Gender:   Education:   Marital status:   Doing exercises:
Age:               Pearson Correlation   1       .083      .220*        .618**            .136
                   Sig. (2-tailed)               .412      .028         .000              .177
                   N                     100     100       100          100               100
Gender:            Pearson Correlation   .083    1         -.074        .053              .088
                   Sig. (2-tailed)       .412              .462         .604              .386
                   N                     100     100       100          100               100
Education:         Pearson Correlation   .220*   -.074     1            .170              .199*
                   Sig. (2-tailed)       .028    .462                   .091              .048
                   N                     100     100       100          100               100
Marital status:    Pearson Correlation   .618**  .053      .170         1                 .140
                   Sig. (2-tailed)       .000    .604      .091                           .164
                   N                     100     100       100          100               100
Doing exercises:   Pearson Correlation   .136    .088      .199*        .140              1
                   Sig. (2-tailed)       .177    .386      .048         .164
                   N                     100     100       100          100               100
* Correlation is significant at the 0.05 level (2-tailed)
** Correlation is significant at the 0.01 level (2-tailed)
According to the Correlations table, we can see that:
- The correlation for "Age - Gender" is 0.083, "Age - Education" is 0.220, "Age - Marital
status" is 0.618, and "Age - Doing exercises" is 0.136. All of these are below 0.7 in
absolute value, so there is no multicollinearity between these pairs of variables.
- Similarly, the correlation for "Gender - Education" is -0.074, "Gender - Marital status"
is 0.053, "Gender - Doing exercises" is 0.088, "Education - Marital status" is 0.170,
"Education - Doing exercises" is 0.199, and "Marital status - Doing exercises" is 0.140.
All coefficients are below 0.7 in absolute value, so it can be concluded that no individual
pair of variables shows multicollinearity.
=> It means that there is no problem of multicollinearity among the five variables.
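The pairwise screening described above amounts to comparing each absolute correlation with the 0.7 cut-off. The sketch below encodes the ten coefficients from the Correlations table and flags any pair at or above the threshold; the variable names and the dictionary layout are ours.

```python
# Pearson correlations for each pair of predictors (from the Correlations table).
pairwise_r = {
    ("Age", "Gender"): 0.083,
    ("Age", "Education"): 0.220,
    ("Age", "Marital status"): 0.618,
    ("Age", "Doing exercises"): 0.136,
    ("Gender", "Education"): -0.074,
    ("Gender", "Marital status"): 0.053,
    ("Gender", "Doing exercises"): 0.088,
    ("Education", "Marital status"): 0.170,
    ("Education", "Doing exercises"): 0.199,
    ("Marital status", "Doing exercises"): 0.140,
}

THRESHOLD = 0.7  # cut-off used in the report

# Pairs whose absolute correlation reaches the threshold.
flagged = {pair: r for pair, r in pairwise_r.items() if abs(r) >= THRESHOLD}

# The strongest pair is worth reporting even when nothing is flagged.
strongest_pair = max(pairwise_r, key=lambda pair: abs(pairwise_r[pair]))

print("Flagged pairs:", flagged)
print("Strongest pair:", strongest_pair, pairwise_r[strongest_pair])
```

A pairwise check of this kind only detects collinearity between two variables at a time; variance inflation factors from the regression itself are a common complementary check.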
7. Use multiple linear regression to analyze the impact of the variables Age, Gender,
Education, Marital status, Doing exercises on the variable Income?
A. Linear Regression Model
By utilizing a multiple linear regression model, we can analyze the impact of the five
independent variables Age, Gender, Education, Marital status, and Doing exercises on the
dependent variable Income. The model equation is as follows:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \varepsilon$$
Where:
● $Y$ is the dependent variable Income
● $X_1, X_2, X_3, X_4, X_5$ are the independent variables Age, Gender, Education, Marital
status, Doing exercises respectively
● $\beta_0$ represents the intercept
● $\beta_1, \beta_2, \beta_3, \beta_4, \beta_5$ are the respective coefficients for the independent variables
● $\varepsilon$ represents the error term
B. R Square & Interpretation:
Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .834(a)   .695       .679                5.201
a. Predictors: (Constant), Gender:, Marital status:, Doing exercises:, Education:, Age:
The R Square value of 0.695 indicates that about 70% of the variation in Income is explained
by the five independent variables.
C. Hypothesis stating and testing:
H0: $\beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$. If P > α, the null hypothesis (H0) is
not rejected, suggesting no statistically significant relationship.
H1: At least one $\beta_j \neq 0$. If P < α, the null hypothesis (H0) is rejected, suggesting
that a statistically significant relationship exists.
ANOVA(a)
Model        Sum of Squares   df   Mean Square   F        Sig.
Regression   5796.032         5    1159.206      42.854   .000(b)
Residual     2542.718         94   27.050
Total        8338.750         99
a. Dependent Variable: Income:
b. Predictors: (Constant), Gender:, Marital status:, Doing exercises:, Education:, Age:
Test statistic: F = 42.854, which yields a P-value of 0.000 < α. Therefore, the null
hypothesis is rejected and there is a significant relationship between the dependent variable and
at least one independent variable.
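The Model Summary and the overall F test are linked through the regression ANOVA table. As a rough check, the sketch below recomputes R Square, Adjusted R Square, the F statistic, and the standard error of the estimate from the reported sums of squares, with n = 100 observations and k = 5 predictors; the variable names are ours.

```python
import math

# Values reported in the regression ANOVA table.
ss_regression = 5796.032
ss_residual = 2542.718
n = 100  # observations
k = 5    # predictors: Age, Gender, Education, Marital status, Doing exercises

ss_total = ss_regression + ss_residual

# Share of the variation in Income explained by the model.
r_squared = ss_regression / ss_total

# Adjusted R Square penalises the number of predictors.
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Overall F test: mean square of regression over mean square of residuals.
ms_regression = ss_regression / k
ms_residual = ss_residual / (n - k - 1)
f_stat = ms_regression / ms_residual

# Standard error of the estimate is the square root of the residual mean square.
se_estimate = math.sqrt(ms_residual)

print(f"R^2 = {r_squared:.3f}, adjusted R^2 = {adj_r_squared:.3f}")
print(f"F = {f_stat:.3f}, SE of estimate = {se_estimate:.3f}")
```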
D. Linear regression result
Coefficients(a)
                   Unstandardized Coefficients   Standardized Coefficients
Model              B         Std. Error          Beta                        t        Sig.
(Constant)         -15.610   2.408                                           -6.482   .000
Doing exercises:   .915      1.088               .049                        .841     .402
Age:               .522      .084                .459                        6.240    .000
Education:         6.004     .921                .389                        6.516    .000
Marital status:    4.525     1.361               .242                        3.325    .001
Gender:            1.188     1.054               .065                        1.127    .262
a. Dependent Variable: Income:
Looking at the p-values (Sig. column) for the independent variables in the coefficients table,
we see that two of the five independent variables, Doing exercises (0.402) and Gender (0.262),
have P-values that exceed the significance level of 0.05 and are therefore not statistically
significant.
By removing the variable with the highest P-value (Doing exercises) and re-running the analysis,
we obtain an improved regression model:
Coefficients(a)
                  Unstandardized Coefficients   Standardized Coefficients
Model             B         Std. Error          Beta                        t        Sig.
(Constant)        -15.470   2.399                                           -6.450   .000
Age:              .524      .084                .461                        6.278    .000
Education:        6.144     .905                .398                        6.790    .000
Marital status:   4.600     1.356               .246                        3.392    .001
Gender:           1.273     1.048               .070                        1.215    .227
a. Dependent Variable: Income:
The P-value of Gender (0.227) remains higher than the significance level, indicating that the
variable is not statistically significant in the model and should be removed. The resulting
regression model is given in the coefficients table below:
Coefficients(a)
                  Unstandardized Coefficients   Standardized Coefficients
Model             B         Std. Error          Beta                        t        Sig.
(Constant)        -14.866   2.352                                           -6.320   .000
Age:              .532      .083                .468                        6.377    .000
Marital status:   4.609     1.359               .246                        3.391    .001
Education:        6.039     .903                .391                        6.688    .000
a. Dependent Variable: Income:
$$Y = -14.866 + 0.532 X_1 + 6.039 X_3 + 4.609 X_4 + \varepsilon$$
where $X_1$ is Age, $X_3$ is Education, and $X_4$ is Marital status.
E. Interpretation of result
With the exclusion of Gender ($X_2$) and Doing exercises ($X_5$) from the model, the
interpretation of the result is as follows:
● The coefficient of Age is statistically significant and positive. This indicates that age has
a positive influence on income. If age increases by 1 year, income increases by 0.532
million VND, keeping all other independent variables constant.
● The coefficient of Marital status is statistically significant and positive. This indicates
that marital status has a positive influence on income. If the observation is married,
income is higher by 4.609 million VND, keeping all other independent variables
constant.
● The coefficient of Education is statistically significant and positive. This indicates that
level of education has a positive influence on income. If education increases by 1 level,
income increases by 6.039 million VND, keeping all other independent variables constant.
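The fitted equation can be turned into a small prediction function to make the "holding other variables constant" reading concrete. This is a sketch under stated assumptions: the coefficients come from the final coefficients table, but the report does not state how the variables are coded, so the code assumes Education is an integer level (1 = High school up to 4 = PhD) and Marital status is 1 for married and 0 otherwise. The example respondent is hypothetical.

```python
# Coefficients from the final coefficients table (income in million VND).
INTERCEPT = -14.866
B_AGE = 0.532
B_EDUCATION = 6.039
B_MARITAL = 4.609


def predict_income(age, education_level, married):
    """Predicted income under the final model.

    Assumed coding (not stated in the report): education_level is an
    integer level such as 1..4, married is 1 if married and 0 otherwise.
    """
    return (
        INTERCEPT
        + B_AGE * age
        + B_EDUCATION * education_level
        + B_MARITAL * married
    )


# Hypothetical respondent: 30 years old, education level 2, married.
base = predict_income(30, 2, 1)

# Changing one variable at a time recovers each coefficient.
age_effect = predict_income(31, 2, 1) - base
education_effect = predict_income(30, 3, 1) - base
marital_effect = base - predict_income(30, 2, 0)

print(f"Predicted income: {base:.3f} million VND")
print(f"+1 year: {age_effect:.3f}, +1 level: {education_effect:.3f}, "
      f"married vs not: {marital_effect:.3f}")
```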
8. Create an interaction variable between Age and Doing exercises. Analyze the moderating
effect of Doing exercises on the relationship between Age and Income?
ANOVA(a)
Model        Sum of Squares   df   Mean Square   F        Sig.
Regression   4482.916         3    1494.305      37.204   .000(b)
Residual     3855.834         96   40.165
Total        8338.750         99
a. Dependent Variable: Income:
b. Predictors: (Constant), Interaction, Age:, Doing exercises:
=> The ANOVA table shows that the model as a whole is statistically significant (Sig. = 0.000 < 0.05).
Coefficients(a)
                   Unstandardized Coefficients   Standardized Coefficients
Model              B        Std. Error           Beta                        t        Sig.
(Constant)         -2.993   3.768                                            -.794    .429
Age:               .517     .149                 .454                        3.459    .001
Doing exercises:   -6.773   4.600                -.366                       -1.472   .144
Interaction        .371     .177                 .599                        2.101    .038
a. Dependent Variable: Income:
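With an interaction term, the effect of Age on Income is no longer a single number: it equals the Age coefficient plus the Interaction coefficient times the value of Doing exercises. The sketch below illustrates this with the coefficients from the table above. It assumes, since the report does not state the coding, that the interaction variable is the product Age x Doing exercises and that Doing exercises is coded 1 for people who exercise and 0 for those who do not.

```python
# Coefficients from the interaction model's coefficients table.
B0 = -2.993
B_AGE = 0.517
B_EXERCISE = -6.773
B_INTERACTION = 0.371  # Sig. = .038 < .05 in the table


def interaction_term(age, exercise):
    """Interaction variable, assumed to be the product Age x Doing exercises."""
    return age * exercise


def predict_income(age, exercise):
    """Predicted income (million VND) under the interaction model."""
    return (
        B0
        + B_AGE * age
        + B_EXERCISE * exercise
        + B_INTERACTION * interaction_term(age, exercise)
    )


def age_slope(exercise):
    """Change in predicted income per extra year of age, at a given exercise value."""
    return B_AGE + B_INTERACTION * exercise


# Assumed coding: 0 = does not exercise, 1 = exercises.
slope_no_exercise = age_slope(0)
slope_exercise = age_slope(1)

print(f"Age slope without exercise: {slope_no_exercise:.3f}")
print(f"Age slope with exercise:    {slope_exercise:.3f}")
```

Under these assumptions, the positive and significant Interaction coefficient (Sig. = .038) means that Doing exercises strengthens the positive relationship between Age and Income: each additional year of age is associated with roughly 0.888 million VND more income for those who exercise, against 0.517 for those who do not.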