
business and economics statistics case study academic performance of university students


HANOI UNIVERSITY

FACULTY OF MANAGEMENT AND TOURISM

Business and Economics Statistics CASE STUDY

ACADEMIC PERFORMANCE OF UNIVERSITY STUDENTS

Tutor: Mr. Nguyen Hoang Viet

Tutorial class: Tut 4

Group members:

Nguyen Huong Giang - 1904040027
Pham Ngoc Van Giang - 1904040029


5 Interaction plot and interpretations
6 Credibility of the interpretations and conclusions


TABLE OF FIGURES

Figure 1 The structure of this data frame
Figure 2 Frequency table (sample sizes)
Figure 3 Mean of Edu Spend according to Edu Levels and Provinces
Figure 4 The standard deviation of Edu Spend according to Edu levels and Provinces
Figure 7 Levene's Test result
Figure 8 Q-Q plot of residual
Figure 9 Test statistic output
Figure 10 Interaction plot between Spending for Edulevel and Province


A Scenario

The Vietnam Household Living Standards Survey (VHLSS) was conducted nationwide every two years to systematically monitor the living standards of Vietnamese society. In 2018, the survey was carried out with a sample size of 46,995 households in 3,133 communes/wards, which were representative at national, regional, urban, rural, and provincial levels. The household questionnaire contained many sections, each of which covered a separate aspect of household activities, and education was one important indicator. In the survey, household heads were asked to specify their place of residence (province), the schooling level of their children (edulevel), and expenditure on education per child for the past 12 months in thousands of VND (eduspend). The objective of our study is to test for any significant interaction between the place of residence and schooling levels and to test for any significant differences in education expenditure due to these two variables. Use a 0.05 level of significance.

A portion of the obtained data is presented below. The complete dataset, consisting of 90 observations and 4 variables (obs, province, edulevel, eduspend), is provided in the accompanying file named Case5.csv

[Data excerpt: the first rows of the dataset; the values are not legible in the scan]


B Questions and Answers

1 Inference technique

The study based on the Vietnam Household Living Standards Survey tests for any significant interaction between the place of residence (province) and schooling level (edulevel), and for any significant differences in education expenditure (eduspend) due to these two variables. A two-way ANOVA (two-way analysis of variance) is applied to this case study because it assesses, at the same time, the effects of 2 independent variables and their interaction on 1 dependent variable. Firstly, province and edulevel are the two factors (independent variables) in this case study. Secondly, eduspend is the dependent variable, which depends on the two factors (province and edulevel).

The purpose of this study is to examine the effect of place of residence and schooling level on education expenditure, and the interaction between the two factors (province and edulevel).

2 Descriptive statistics for the dataset

Firstly, we use RStudio to compute descriptive statistics for this question. To start with, we import the CSV file "Case5.csv" into R for further calculation:

> setwd("C:/Users/Admin/Documents/bes research")
> getwd()

> case5 <- read.table("Case5.csv", header=TRUE, sep=",", quote="\"", stringsAsFactors=FALSE)

The structure of this data frame can be checked using str() function:

$ eduspend: int 12301


Figure 1 The structure of this data frame

From the above R output, we can see that there are 90 observations and 4 variables: obs, province, edulevel and eduspend. The obs and eduspend variables are numeric data, while the province and edulevel variables are character data. To apply some graphical or statistical methods, we should convert province and edulevel into factors, using the following code:

Figure 2 Frequency table (sample sizes)

It can be seen that all 6 treatment groups have the same sample size of 15. This balanced design is well suited to a two-way ANOVA test.
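As a minimal sketch of this step, the conversion to factors and the frequency table could be written as follows. The data frame case5_sim and its simulated eduspend values are hypothetical stand-ins for Case5.csv, used only so that the example runs on its own; they are not the survey data.

```r
# Hypothetical stand-in for Case5.csv: 2 provinces x 3 levels x 15 households.
set.seed(1)
case5_sim <- data.frame(
  obs = 1:90,
  province = rep(c("HungYen", "ThaiBinh"), each = 45),
  edulevel = rep(rep(c("Nursery School", "Primary School", "Secondary school"),
                     each = 15), times = 2),
  eduspend = round(rnorm(90, mean = 5000, sd = 2000)),  # simulated values
  stringsAsFactors = FALSE
)

# Convert the two character variables into factors.
case5_sim$province <- factor(case5_sim$province)
case5_sim$edulevel <- factor(case5_sim$edulevel)

# Cross-tabulate the two factors to obtain the sample size of each cell.
freq <- table(case5_sim$province, case5_sim$edulevel)
print(freq)
```

With the real data, the same two calls applied to case5 should give the frequency table of Figure 2.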

Next, we use the by() function in R to find descriptive statistics, such as the mean and standard deviation, for each treatment group defined by the two factors. The commands and their output are shown in turn:

> by(case5$eduspend, list(case5$province, case5$edulevel), mean)

Figure 3 Mean of Edu Spend according to Edu Levels and Provinces

> by(case5$eduspend, list(case5$province, case5$edulevel), sd)

Province   Edulevel           Standard deviation
HungYen    Nursery School     2481.657
HungYen    Primary School     1071.281
ThaiBinh   Primary School     356.0621
HungYen    Secondary school   4938.879
ThaiBinh   Secondary school   1377.442

Figure 4 The standard deviation of Edu Spend according to Edu levels and Provinces

Each command gives one descriptive statistic of the outcome variable (Edu Spend) for each treatment group, that is, for each combination of province and education level.
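The two by() calls can also be combined into a single summary table. The sketch below uses aggregate() from base R for this; it is an alternative to the commands used in this report, and the data frame case5_sim with its simulated eduspend values is a hypothetical stand-in for Case5.csv.

```r
# Hypothetical stand-in for Case5.csv: 2 provinces x 3 levels x 15 households.
set.seed(1)
case5_sim <- data.frame(
  province = rep(c("HungYen", "ThaiBinh"), each = 45),
  edulevel = rep(rep(c("Nursery School", "Primary School", "Secondary school"),
                     each = 15), times = 2),
  eduspend = round(rnorm(90, mean = 5000, sd = 2000))  # simulated values
)

# One row per treatment group, with the mean and the standard deviation
# stored side by side in a matrix column named eduspend.
group_stats <- aggregate(eduspend ~ province + edulevel, data = case5_sim,
                         FUN = function(x) c(mean = mean(x), sd = sd(x)))
print(group_stats)
```

Each of the 6 rows corresponds to one cell of the design, so the means and standard deviations can be read together.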

To get further information, we draw a boxplot and a mean plot:

> boxplot(eduspend ~ interaction(province, edulevel), data = case5, xlab = "Place of residence", ylab = "Education expenditure", col = c("pink", "light blue", "yellow", "white", "orange", "gray"))

[Boxplot of eduspend by group, in the order: HungYen Nursery School, ThaiBinh Nursery School, HungYen Primary School, ThaiBinh Primary School, HungYen Secondary school, ThaiBinh Secondary school]

Figure 5 Boxplot for distribution of groups

The box plot shows several descriptive statistics for each group: the median, the quartiles, the maximum and the minimum. Each cell has its own characteristics. Based on the R output, we can see that the Hung Yen - Secondary School group has the highest median, but there is no clear difference between the Hung Yen - School group and the Thai Binh - Secondary School group. Moreover, the Hung Yen - Nursery School group has the lowest values on almost every measure, including the median and the minimum.

The skewness of each group can also be read from the boxplot. Whether a group is roughly symmetric, positively skewed or negatively skewed is judged from the distances between the median and the two endpoints. Thai Binh - Secondary School and Hung Yen - Nursery School appear approximately symmetric, which is consistent with a normal distribution. Hung Yen - Secondary School and Thai Binh - Nursery School are examples of positively skewed distributions, while the others are negatively skewed. Also, the white dots mark outliers in Hung Yen - Secondary School (1 outlier), Thai Binh - Secondary School (1 outlier), Hung Yen - Nursery School (2 outliers) and Thai Binh - Nursery School (3 outliers), a small number out of 90 observations.

We also use a mean plot to identify the mean value of each group and to compare means between groups, with the following code and output:

> install.packages("gplots")
> library("gplots")

> plotmeans(eduspend ~ interaction(province, edulevel), data = case5, xlab = "Province and edulevel", ylab = "Eduspend", main = "Mean Plot + with 95% CI")

[Mean plot with 95% CI of Eduspend by Province and edulevel; n=15 in each group]

Figure 6 Gplot of group means

Figure 6 helps us to understand the structure of the Case 5 data and summarizes the differences between the group means with 95% confidence intervals. It displays the sample size of each group, which equals 15. We can see that mean eduspend differs considerably across the combinations of edulevel and province. The mean differences among ThaiBinh.Primary School, HungYen.Secondary school and ThaiBinh.Secondary school are large, while the differences among HungYen.Primary School, HungYen.Nursery School and ThaiBinh.Nursery School are small.

3 Checking assumptions

The two-way ANOVA test has three assumptions:

1 Assumption 1: Samples are independent and selected by simple random sampling
2 Assumption 2: All population variances are identical
3 Assumption 3: All population distributions are normal

3.1 Samples are independent and selected by simple random sampling

As the spending on education of one household is not determined by that of another, the samples are independent. The scenario stated that: "In 2018, the survey was carried out with a sample size of 46,995 households in 3,133 communes/wards which were representative at national, regional, urban, rural and provincial levels". Therefore, it can be assumed that this sample was selected randomly.

3.2 All population variances are identical

From Figure 4 (the standard deviation of Edu Spend according to Edu levels and Provinces), it can be seen that the largest standard deviation equals 4938.879 and the smallest one equals 356.0621. The ratio is 4938.879/356.0621 = 13.87084, which is larger than 2. This ratio suggests that the second assumption is not satisfied, but we also apply Levene's test. In fact, the condition of Levene's test is not met when the ratio is larger than 3.

a Hypothesis

H0: All population variances are equal

Ha: At least one population variance is different

b Significance level: α = 0.05

c Test statistic: F = 2.0727, p-value = 0.07684

d Rejection rule:

We reject H0 if p-value < α. Here p-value = 0.07684 > 0.05, so we do not reject H0.

e Conclusion

There is not enough evidence to conclude that at least one population variance is different.
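The standard deviation ratio quoted at the start of this subsection can be reproduced from the values legible in Figure 4. This is only an arithmetic check; the vector name sds is our own, and the sixth group (ThaiBinh, Nursery School) is omitted because its value is not legible, which does not affect the result as long as the maximum and minimum are the ones quoted in the text.

```r
# Standard deviations of eduspend for the five groups legible in Figure 4.
sds <- c(2481.657, 1071.281, 356.0621, 4938.879, 1377.442)

# Rule of thumb: compare the largest and the smallest standard deviation.
sd_ratio <- max(sds) / min(sds)
print(round(sd_ratio, 5))

# TRUE when the ratio exceeds the threshold of 2 used in the text.
exceeds_threshold <- sd_ratio > 2
print(exceeds_threshold)
```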

The result of Levene's test was obtained with the following code:

> install.packages("car")
> library(car)

> leveneTest(case5$eduspend, interaction(case5$province, case5$edulevel), center = median)

Figure 7 Levene’s Test result
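To show what the median-centered Levene's test computes, the sketch below reproduces it with base R only: each observation is replaced by its absolute deviation from its group median, and a one-way ANOVA is run on these deviations. The data frame case5_sim and its simulated eduspend values are hypothetical stand-ins for Case5.csv, so the numbers will not match Figure 7.

```r
# Hypothetical stand-in for Case5.csv: 2 provinces x 3 levels x 15 households.
set.seed(1)
case5_sim <- data.frame(
  province = rep(c("HungYen", "ThaiBinh"), each = 45),
  edulevel = rep(rep(c("Nursery School", "Primary School", "Secondary school"),
                     each = 15), times = 2),
  eduspend = round(rnorm(90, mean = 5000, sd = 2000))  # simulated values
)

# The 6 treatment groups formed by crossing the two factors.
group <- interaction(factor(case5_sim$province), factor(case5_sim$edulevel))

# Absolute deviation of each observation from its own group median.
group_median <- ave(case5_sim$eduspend, group, FUN = median)
abs_dev <- abs(case5_sim$eduspend - group_median)

# One-way ANOVA on the absolute deviations gives the Levene F statistic.
levene_table <- anova(lm(abs_dev ~ group))
levene_F <- levene_table[["F value"]][1]
levene_p <- levene_table[["Pr(>F)"]][1]
print(levene_table)
```

With 6 groups and 90 observations, the test has 5 and 84 degrees of freedom.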

3.3 All population distributions are normal

Using the following code, we can check the distribution of all populations:

> install.packages("car")
> library(car)

> qqPlot(lm(eduspend ~ province + edulevel + province*edulevel, data=case5), simulate=TRUE, main="Q-Q Plot", labels=FALSE)

Figure 8 Q-Q plot of residual

Looking at Figure 8, numerous points fall outside the blue area, so the plot cannot be taken as proof that all populations are normally distributed. However, due to the scope of the course, we assume that the last 2 assumptions are satisfied. To sum up, we carry out the two-way ANOVA test treating all assumptions as satisfied.
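A numerical check can complement the visual reading of the Q-Q plot. The sketch below applies the Shapiro-Wilk test from base R to the residuals of the same linear model; this test is not part of the analysis in this report and is shown only as one possible supplement. The data frame case5_sim and its simulated eduspend values are hypothetical stand-ins for Case5.csv.

```r
# Hypothetical stand-in for Case5.csv: 2 provinces x 3 levels x 15 households.
set.seed(1)
case5_sim <- data.frame(
  province = rep(c("HungYen", "ThaiBinh"), each = 45),
  edulevel = rep(rep(c("Nursery School", "Primary School", "Secondary school"),
                     each = 15), times = 2),
  eduspend = round(rnorm(90, mean = 5000, sd = 2000))  # simulated values
)

# Same model as in the qqPlot() call: both main effects and their interaction.
model_fit <- lm(eduspend ~ province * edulevel, data = case5_sim)
model_resid <- residuals(model_fit)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal.
sw_result <- shapiro.test(model_resid)
print(sw_result)
```

A p-value below 0.05 would point in the same direction as points falling outside the band of the Q-Q plot.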

4 Two-way ANOVA test

As mentioned in question 1, we use two-way ANOVA to test for the significance of the interaction between Province and Edulevel (interaction effect) as well as that of the differences in education expenditure due to Province and Edulevel (2 main effects) at the 0.05 level of significance.

Step 1: Form hypotheses for the three tests

The three null hypotheses and alternative hypotheses for the test are stated below:

- The hypotheses to test the interaction effect:
H01: There is no interaction between Province and Edulevel
Ha1: There is a significant interaction between Province and Edulevel

- The hypotheses to test the main effects:
H02: There are no differences in education expenditure due to Province
Ha2: There are differences in education expenditure due to Province
H03: There are no differences in education expenditure due to Edulevel
Ha3: There are differences in education expenditure due to Edulevel

Step 2: Check assumptions

The assumptions of the test have been checked in the answer for question 3:

- Samples are independent, simple random samples
- All populations are normally distributed
- All populations have the same variance

Step 3: Test statistic

We run the two-way ANOVA in RStudio with Eduspend as the outcome variable and Province and Edulevel as the two factors, using the following commands:

> case5.result <- aov(eduspend ~ province*edulevel, data = case5)
> summary(case5.result)

[ANOVA table output with rows province, edulevel, province:edulevel and Residuals]

Figure 9 Test statistic output
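The layout of the ANOVA table can be illustrated with a small runnable sketch. The data frame case5_sim and its simulated eduspend values are hypothetical stand-ins for Case5.csv, so only the structure of the table (rows and degrees of freedom), not the sums of squares or p-values, corresponds to Figure 9.

```r
# Hypothetical stand-in for Case5.csv: 2 provinces x 3 levels x 15 households.
set.seed(1)
case5_sim <- data.frame(
  province = rep(c("HungYen", "ThaiBinh"), each = 45),
  edulevel = rep(rep(c("Nursery School", "Primary School", "Secondary school"),
                     each = 15), times = 2),
  eduspend = round(rnorm(90, mean = 5000, sd = 2000))  # simulated values
)

# Two-way ANOVA with both main effects and the interaction term.
sim_result <- aov(eduspend ~ province * edulevel, data = case5_sim)

# summary() returns a list; its first element is the ANOVA table itself.
anova_table <- summary(sim_result)[[1]]
print(anova_table)

# Degrees of freedom: province, edulevel, province:edulevel, Residuals.
print(anova_table$Df)
```

For a balanced design with 2 provinces, 3 education levels and 90 observations, the degrees of freedom are 1, 2, 2 and 84.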

To test the main effect of edulevel:

F = 67582359 / 6309566 = 10.711

Step 4: Level of significance

The level of significance is α = 0.05

Step 5: Decision rule

Reject H0 if p-value < α. To test for the interaction effect: p-value = 0.6905 > α = 0.05, so we do not reject H01.
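The F statistic for edulevel, and the p-value that goes with it, can be checked with a few lines of R. The sketch assumes that the two numbers in the ratio are the mean square of edulevel and the residual mean square, and that the degrees of freedom follow from the design (3 education levels, 6 cells, 90 observations); the variable names are our own.

```r
# Mean squares as read from the ratio in the text (assumed interpretation).
ms_edulevel <- 67582359
ms_residual <- 6309566

# F statistic for the main effect of edulevel.
f_edulevel <- ms_edulevel / ms_residual
print(round(f_edulevel, 3))

# Degrees of freedom implied by the design.
df_edulevel <- 3 - 1       # number of education levels minus 1
df_residual <- 90 - 2 * 3  # observations minus number of cells

# Upper-tail probability of the F distribution gives the p-value.
p_edulevel <- pf(f_edulevel, df_edulevel, df_residual, lower.tail = FALSE)
print(p_edulevel)
```

The resulting p-value is then compared with α = 0.05 under the decision rule of Step 5.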

Step 6: Conclusion

At the 5% level of significance, we do not have enough statistical evidence to conclude that there is a significant interaction between the two factors, the place of residence and the schooling level, in the education expenditure of households in the two provinces (Thai Binh and Hung Yen). Therefore, our conclusion is that the interaction between the place of residence and schooling levels is not shown to be significant.

Because the interaction effect is not significant, we examine the 2 main effects: the effect of province on education expenditure and the effect of edulevel on education expenditure. As regards the effect of the province, we have:

5 Interaction plot and interpretations

To visualize the possible interaction between the two factors graphically, we use the interaction.plot function as follows:

> interaction.plot(case5$province, case5$edulevel, case5$eduspend, type = "b", col = c("red", "blue", "black"), pch = c(16, 18), main = "Interaction between Province and Eduspend")
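The interaction plot draws the cell means of eduspend, with one line per education level. The sketch below computes the same cell means as a table, which makes it possible to compare the gap between provinces at each level numerically. The data frame case5_sim and its simulated eduspend values are hypothetical stand-ins for Case5.csv.

```r
# Hypothetical stand-in for Case5.csv: 2 provinces x 3 levels x 15 households.
set.seed(1)
case5_sim <- data.frame(
  province = rep(c("HungYen", "ThaiBinh"), each = 45),
  edulevel = rep(rep(c("Nursery School", "Primary School", "Secondary school"),
                     each = 15), times = 2),
  eduspend = round(rnorm(90, mean = 5000, sd = 2000))  # simulated values
)

# Cell means: rows are provinces, columns are education levels.
cell_means <- tapply(case5_sim$eduspend,
                     list(case5_sim$province, case5_sim$edulevel), mean)
print(round(cell_means, 1))

# Gap between the two provinces at each education level. Similar gaps
# correspond to roughly parallel lines, that is, little interaction.
province_gap <- cell_means["ThaiBinh", ] - cell_means["HungYen", ]
print(round(province_gap, 1))
```

If the three gaps are close to each other, the lines in the interaction plot are nearly parallel, which agrees with a non-significant interaction term.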

Date posted: 29/08/2024, 16:09