HANOI UNIVERSITY

FACULTY OF MANAGEMENT AND TOURISM

Business and Economics Statistics

Academic Performance of University Students


Table of Contents

I. Scenario

II. Questions

Question 1: What inference technique should be considered for this study? Explain.

Question 2: Produce descriptive statistics for the dataset.

Question 3: Check all assumptions of the inference technique you suggest in Question 1. Are the assumptions satisfied? Explain.

Question 4: Perform the inference technique you suggest in Question 1. Remember to provide all the necessary steps. What are your interpretations and conclusions? Explain. What are your interpretations and conclusions if we use 0.05 level of significance?

Question 5: Draw an interaction plot and interpret the plot. Is the plot consistent with the conclusions made in Question 4?

Question 6: Discuss the credibility of the interpretations and conclusions of Question 4. Is there anything we should be concerned about? Explain.

III. Conclusion


Table of Figures

Figure 1: Structure of data

Figure 2: Some first rows of the data

Figure 3: Structure of the data when factors have not been converted yet

Figure 4: Structure of the data when factors have been converted

Figure 5: Frequency table (sample sizes)

Figure 6: Mean of GPA according to Gender and Major

Figure 7: Median of GPA according to Gender and Major

Figure 8: Standard deviation of GPA according to Gender and Major

Figure 9: Summary of GPA according to Gender and Major

Figure 10: Boxplot

Figure 11: Mean plot with 90% CI

Figure 12: Q-Q Plot

Figure 13: Two-way ANOVA output

Figure 14: Interaction Plot between Gender and Major


I. Scenario

A survey was conducted by a large university in the United States to find the relationship between majors and academic performance (GPA) for both its female and male students. The survey asked respondents to indicate their major and their GPA, measured on a 0 to 4.0 scale. The purpose of this study is to examine whether there is a significant interaction between major and gender, and to check for significant differences in GPA due to these two variables, at the 0.1 level of significance.

Figure 1: Structure of data

Question 1: What inference technique should be considered for this study? Explain.

In this case study, two-way ANOVA is the appropriate inference technique, for two reasons. First, the test evaluates the differences in means across the levels of each factor. Second, the purpose of the study is to test for an interaction between major and gender, as well as for differences in GPA due to these two variables. Our team therefore decided to use two-way ANOVA: it compares the mean of a dependent variable (GPA) across groups defined by two independent categorical variables (major and gender), and it also tests the interaction between them.
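For reference, the model behind this technique is commonly written as below. This is standard textbook notation and an assumption on our part, since the report itself does not state the model; the symbols $\mu$, $\alpha_i$, $\beta_j$, $(\alpha\beta)_{ij}$ and $\varepsilon_{ijk}$ do not appear in the original.

```latex
Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk},
\qquad i = 1, 2; \quad j = 1, 2, 3; \quad k = 1, \dots, n_{ij}
```

Here $Y_{ijk}$ is the GPA of student $k$ with gender $i$ and major $j$, $\alpha_i$ is the gender effect, $\beta_j$ is the major effect, $(\alpha\beta)_{ij}$ is the interaction effect, and the errors $\varepsilon_{ijk}$ are assumed independent and normal with a common standard deviation. The interaction test asks whether all $(\alpha\beta)_{ij}$ are zero.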


Question 2: Produce descriptive statistics for the dataset.

We use RStudio to produce the descriptive statistics for this question. To start with, we import the CSV file “StudentSurvey 2.csv” into R for further calculation:

⮚ studentsurvey2 <- read.table("StudentSurvey 2.csv", header = TRUE, sep = ",", quote = "\"", stringsAsFactors = FALSE)

In addition, there are 234 observations in this case study, so we look at the first few observations with the head() function in R to get a better feel for the data:

⮚ head(studentsurvey2)

Figure 2: Some first rows of the data

The internal structure of the data can be obtained by:

⮚ str(studentsurvey2)

Figure 3: Structure of the data when factors have not been converted yet

From the above output, it is clear that there are 234 observations of 4 variables: observation, gender, major and GPA. Since gender and major are not yet stored as factors, we convert them into factors with the following R code:

⮚ studentsurvey2$gender <- factor(studentsurvey2$gender, levels = c("1", "2"), labels = c("Female", "Male"))

⮚ studentsurvey2$major <- factor(studentsurvey2$major, levels = c("1", "2", "3"), labels = c("Administration", "Accounting", "Finance"))

Then we run str(studentsurvey2) again to get the new structure of the data, with “Gender” and “Major” converted into factors:


Figure 4: Structure of the data when factors have been converted

A frequency table can be created to see the sample size of each treatment group with the following R code:

⮚ table(studentsurvey2$gender, studentsurvey2$major)

Figure 5: Frequency table (sample sizes)

It can be seen that all 6 treatment groups have the same sample size of 39 (6 × 39 = 234 observations). Such a balanced design is well suited to a two-way ANOVA test.
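The balance check described above can be sketched in a few lines of R. The data frame `toy` and its GPA values are simulated stand-ins invented for illustration, not the survey data; only the 2 × 3 layout with 39 observations per cell is taken from the text.

```r
# Hypothetical stand-in for the survey data: 2 genders x 3 majors, 39 per cell.
set.seed(1)
toy <- expand.grid(
  id     = 1:39,
  gender = c("Female", "Male"),
  major  = c("Administration", "Accounting", "Finance")
)
toy$gpa <- round(runif(nrow(toy), min = 1.5, max = 4.0), 2)  # made-up GPA values

# Cross-tabulate the two factors to get the sample size of each cell.
cell_sizes <- table(toy$gender, toy$major)
print(cell_sizes)

# A design is balanced when every cell holds the same number of observations.
is_balanced <- length(unique(as.vector(cell_sizes))) == 1
total_n     <- sum(cell_sizes)
```

With equal cell sizes the sums of squares for the two main effects and the interaction do not depend on the order in which the terms enter the model, which is one reason a balanced layout is convenient.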

Next, we use the by() function in R to compute several descriptive statistics (mean, median, standard deviation and a summary) for each treatment group defined by the two factors. The commands and their outputs are listed below:

⮚ by(studentsurvey2$gpa,list(studentsurvey2$gender,studentsurvey2$major), mean)

Figure 6: Mean of GPA according to Gender and Major

⮚ by(studentsurvey2$gpa,list(studentsurvey2$gender,studentsurvey2$major), median)

Figure 7: Median of GPA according to Gender and Major

⮚ by(studentsurvey2$gpa,list(studentsurvey2$gender,studentsurvey2$major), sd)

Figure 8: Standard deviation of GPA according to Gender and Major

⮚ by(studentsurvey2$gpa,list(studentsurvey2$gender,studentsurvey2$major), summary)

Figure 9: Summary of GPA according to Gender and Major

Each command gives a specific descriptive statistic of the outcome variable (GPA) for each treatment group, listed by gender first and then by major. The final command, with summary, returns six statistics for GPA: the minimum, the first quartile, the median, the mean, the third quartile and the maximum.
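As an alternative to calling by() once per statistic, the same per-cell figures can be collected in one table with aggregate(). This is only a sketch: the data frame `toy` and its twelve GPA values are made up for illustration and are not the survey data.

```r
# Hypothetical toy data (values invented for illustration only).
toy <- data.frame(
  gender = factor(rep(c("Female", "Male"), each = 6)),
  major  = factor(rep(c("Administration", "Accounting", "Finance"), times = 4)),
  gpa    = c(3.1, 3.5, 2.4, 2.9, 3.7, 2.6, 3.3, 3.4, 2.8, 3.0, 3.6, 2.2)
)

# One row per gender x major cell, with mean, median and SD side by side.
cell_stats <- aggregate(
  gpa ~ gender + major,
  data = toy,
  FUN  = function(x) c(mean = mean(x), median = median(x), sd = sd(x))
)
print(cell_stats)
```

The `gpa` column of the result is a small matrix with one column per statistic, so a single value can be read with, for example, `cell_stats$gpa[1, "mean"]`.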

To get further information, we draw a boxplot and a mean plot.

⮚ boxplot(gpa ~ interaction(gender, major), data = studentsurvey2, xlab = "Gender and Major", ylab = "GPA", col = c("red", "blue", "yellow", "grey", "brown", "pink"))

Figure 10: Boxplot

First, the boxplot clearly shows several descriptive statistics for each group: the median, the quartiles, and the maximum and minimum values. Each cell has its own characteristics. The most distinctive cell is the Male - Accounting group: it appears to have the highest median and the most stable, uniform GPA values, since its interquartile range and the range between its highest and lowest values are the smallest, so the variation within the group is smallest. The Female - Finance group is lowest on almost every statistic (median, minimum and maximum), whereas the maximum GPA in the other groups is above 3.5; its first and third quartiles give an average interquartile range, and its variance is large. In contrast, the highest GPA, the largest interquartile range and the largest variance belong to the Male - Finance group.

The skewness of each group can also be read from the boxplot. A group's distribution may be asymmetric, either positively or negatively skewed, judged by the distances from the median to the two ends of the box and whiskers. Taking the three male groups as examples, the Male - Administration distribution is left-skewed: the values below the median are spread over a longer range than the values above it. By the same reasoning, Male - Accounting is an example of a right-skewed distribution, and an asymmetric distribution is also found in the Male - Finance group. There are also 3 outliers, shown as three white dots, in the Male – Accounting, Female – Finance and Male – Finance groups respectively, but 3 observations out of 234 are unlikely to affect our test result materially.

We also use a mean plot to identify the mean of each group and to compare means between groups, with the following code and its output:

⮚ install.packages("gplots")

⮚ library(gplots)

⮚ plotmeans(gpa ~ interaction(gender, major), data = studentsurvey2, p = 0.90, xlab = "Gender and Major", ylab = "GPA", main = "Mean Plot with 90% CI")


Figure 11: Mean plot with 90% CI

The mean plot shows the six groups with 90% confidence intervals. The group means in the plot are the same as those returned by the by() function. The Female – Accounting group has the highest mean and the Female – Finance group has the lowest. The means of the six groups also differ from one another; whether these differences are significant is what the two-way ANOVA tests formally.

Question 3: Check all assumptions of the inference technique you suggest in Question 1. Are the assumptions satisfied? Explain.

As explained in Question 1, two-way factorial analysis of variance is the inference technique chosen for this case. However, before carrying out the two-way ANOVA we need to check all of its assumptions, to ensure that our results are valid. There are three assumptions to check:

● Samples are independent, simple random samples of size $n_{ij}$ from each of $k$ ($= a \times b$) populations.

● All populations are normally distributed.

● All populations have the same standard deviation: $\sigma_1 = \sigma_2 = \dots = \sigma_k$

To check whether the study satisfies these three assumptions, some notation should be defined in detail:

● $n_{ij}$: sample size of cell (i, j), a combination of the factor levels

● i (Factor A): Gender

Assumption 2: All populations have the same standard deviation

Secondly, we check assumption 2 of equal standard deviations. Looking at the output of the by() function with sd from Question 2, the ratio of the largest sample standard deviation to the smallest (0.765563 / 0.5129712) is about 1.4924, which is less than 2. By this rule of thumb, we infer that the assumption of equal population standard deviations is reasonable.

Assumption 3: All populations are normally distributed

To check whether all populations are normally distributed, we can use a Q-Q plot of the model residuals, drawn with the qqPlot() function from the car package:

⮚ library(car)

⮚ qqPlot(lm(gpa ~ gender + major + gender*major, data = studentsurvey2), simulate = TRUE, main = "Q-Q Plot", labels = FALSE)

Figure 12: Q-Q Plot

A normal Q-Q plot is commonly used to assess the normality of residuals: it compares the quantiles of the data with those of a perfect normal distribution. It can be seen from the plot that the points lie close to the reference line, with no marked outliers. The Q-Q plot therefore meets both requirements, and we conclude that the normality assumption is reasonable.
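The two checks above can also be expressed numerically. The sketch below runs on simulated data, so `toy` and its values are assumptions and not the survey data. The Shapiro-Wilk test is our addition as a numeric complement to the Q-Q plot; the report itself relies on the plot only. The last line reproduces the ratio quoted in the text from the two reported standard deviations.

```r
# Hypothetical balanced data set (simulated; not the survey data).
set.seed(42)
toy <- expand.grid(
  id     = 1:39,
  gender = c("Female", "Male"),
  major  = c("Administration", "Accounting", "Finance")
)
toy$gpa <- rnorm(nrow(toy), mean = 3.0, sd = 0.6)

# Equal-spread rule of thumb: largest cell SD divided by smallest cell SD.
cell_sd  <- tapply(toy$gpa, list(toy$gender, toy$major), sd)
sd_ratio <- max(cell_sd) / min(cell_sd)
cat("SD ratio:", round(sd_ratio, 3), "\n")

# Normality of residuals: Shapiro-Wilk test on the fitted model's residuals.
fit     <- aov(gpa ~ gender * major, data = toy)
sw_test <- shapiro.test(residuals(fit))
cat("Shapiro-Wilk p-value:", round(sw_test$p.value, 4), "\n")

# The ratio reported in the text, recomputed from the two quoted SDs.
reported_ratio <- 0.765563 / 0.5129712
```

A ratio below 2 and a Shapiro-Wilk p-value that is not small would both point the same way as the visual checks, although neither replaces looking at the plots.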


Question 4: Perform the inference technique you suggest in Question 1. Remember to provide all the necessary steps. What are your interpretations and conclusions? Explain. What are your interpretations and conclusions if we use 0.05 level of significance?

Two-way ANOVA test:

- Step 1: Identify null and alternative hypothesis:

Ho: There is no significant interaction between major and gender in GPA.

Ha: There is a significant interaction between major and gender in GPA.

- Step 2: Test statistic and p-value

❖ Check assumptions: We use two-way ANOVA to test the hypothesis.

● All populations are normally distributed.

● Samples are independent, simple random samples of 39 from each of 6 populations.

● All populations have the same standard deviation.

❖ Test statistic and p-value:

We used RStudio to calculate and obtained the following output:

⮚ studentsurvey2.result <- aov(gpa ~ gender * major, data = studentsurvey2)

⮚ summary(studentsurvey2.result)

Figure 13: Two-way ANOVA output

- Step 3: Level of significance

The level of significance: α = 0.1

- Step 4: Decision rule and conclusion

Reject Ho if p-value < α

As mentioned in Question 1, the primary purpose of a two-way ANOVA is to examine the influence of two categorical independent variables on one continuous dependent variable. We therefore consider the interaction between major and gender first.

● If α = 0.1

From the R output, p-value < α (0.06504 < 0.1). Therefore, following the decision rule, we reject Ho.

Conclusion: At the 0.1 level of significance, we have enough evidence to conclude that there is a significant interaction between major and gender in GPA.

● If α = 0.05

For the interaction, p-value > α (0.06504 > 0.05), so we do not reject Ho: at this level there is not enough evidence of an interaction between major and gender, and we go on to test the main effects.

- With the test for Gender: p-value < α (0.00737 < 0.05).

From this result, we have enough evidence to conclude that the mean GPA differs between genders.

- With the test for Major: p-value < α (8.89e-13 < 0.05).

From this result, we have enough evidence to conclude that the mean GPA of at least one major differs from the others.
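The decision rule can be applied to all three p-values at both significance levels in one step. The p-values below are the ones quoted in the text; the helper `decide()` and the table layout are our own illustration.

```r
# p-values quoted in the text (interaction, gender, major).
p_values <- c(interaction = 0.06504, gender = 0.00737, major = 8.89e-13)

# Decision rule: reject H0 when the p-value is below alpha.
decide <- function(p, alpha) ifelse(p < alpha, "reject H0", "do not reject H0")

decisions <- data.frame(
  p_value  = p_values,
  alpha_10 = decide(p_values, 0.10),
  alpha_05 = decide(p_values, 0.05)
)
print(decisions)
```

With a fitted model such as the one from aov(), the p-values themselves can be read from the "Pr(>F)" column of the table returned by summary(), for example `summary(fit)[[1]][["Pr(>F)"]]`, where `fit` is a hypothetical name for the fitted object.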

Question 5: Draw an interaction plot and interpret the plot. Is the plot consistent with the conclusions made in Question 4?

Another way to see the interaction between major and gender in GPA is the interaction plot, drawn with the following R code:

⮚ interaction.plot(studentsurvey2$gender, studentsurvey2$major, studentsurvey2$gpa, type = "b", col = c("red", "blue"), pch = c(16, 18), main = "Interaction between Gender and Major")


Figure 14: Interaction Plot between Gender and Major

In theory, the further the lines are from parallel, the stronger the interaction. From this interaction plot, it can be seen that there is an interaction between gender and major. The Accounting and Administration lines are an example of a strong interaction, while Administration and Finance show a moderate one. Overall, male students have a higher GPA than female students. Female students studying Accounting perform better than female students in the other majors, and their GPA is slightly higher than that of the male Accounting students, so the blue line slopes downward. The Administration line slopes upward: male students have a higher GPA than female students, approximately equal to the GPA of male Accounting students. This is evidence of a strong interaction between gender and major. The GPA of both male and female students in Finance is much lower than in the other two majors, and the interaction involving this line is fairly weak. Taken together, the three lines are non-parallel, so the interaction can be described as moderate. This result is consistent with the conclusions made in Question 4, where the interaction is significant at the 0.1 level of significance but not at the 0.05 level.
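The idea that non-parallel lines signal an interaction can be made concrete by comparing the gender gap within each major. The cell means below are made-up numbers chosen only to mimic the pattern described (a downward Accounting line, an upward Administration line); they are not the survey results.

```r
# Hypothetical cell means (invented numbers, not the survey results).
cell_means <- matrix(
  c(3.0, 3.4,   # Administration: Female, Male
    3.6, 3.5,   # Accounting:     Female, Male
    2.5, 2.7),  # Finance:        Female, Male
  nrow = 2,
  dimnames = list(gender = c("Female", "Male"),
                  major  = c("Administration", "Accounting", "Finance"))
)

# Gender gap (Male minus Female) within each major, i.e. the slope of each line.
gender_gap <- cell_means["Male", ] - cell_means["Female", ]
print(gender_gap)

# Parallel lines would give identical gaps; the spread of the gaps is a rough
# descriptive measure of how far the lines are from parallel.
gap_spread <- diff(range(gender_gap))
```

This is a descriptive summary only. Whether the differences between the gaps are larger than chance would produce is what the interaction F-test in Question 4 assesses.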

Question 6: Discuss the credibility of the interpretations and conclusions of Question 4. Is there anything we should be concerned about? Explain.

a. The credibility of the interpretations and conclusion.

In terms of interpretation, the assumptions were checked and appear to be reasonably satisfied. At α = 0.1 (α: level of significance), we conclude that there is an interaction in GPA between gender and major. At this level the probability of a Type I error, which occurs when a true null hypothesis is rejected, is 10 percent. Because the interaction is significant at α = 0.1, we set aside the two sets of hypotheses for the main effects: a significant interaction tells us that the change in the true mean response across the levels of factor major depends on the level of factor gender, so the effect of changing both factors together cannot be inferred by examining the main effects separately.

In Question 4 we also compared the p-value with α = 0.05 and found that the interaction in GPA between gender and major is not significant. Here the probability of a Type I error is 5 percent. Note that not rejecting Ho does not prove that there is no interaction; it only means that the evidence is not strong enough at this level. The level of significance of a hypothesis test is equal to the probability of a Type I error. Consequently, when α changes from 0.1 to 0.05, the probability of a Type I error falls accordingly, which in turn increases the probability of a Type II error, which occurs if we do not reject Ho even though it is false. At α = 0.05 the interaction is not significant enough to reject Ho, so we continue to test the main effects, and find that GPA differs significantly by major and by gender. The sum of squares for factor major reflects random variation together with any differences between the true mean responses for the different levels of major; similarly, the sum of squares for factor gender reflects random variation together with any differences between the true mean responses for the different levels of gender. We consider the interpretations and conclusions of Question 4 to be reasonably trustworthy. In short, at the 0.1 level of significance we conclude that there is an interaction in GPA between gender and major.
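The link between the significance level and the Type I error rate can be illustrated with a small simulation. Everything here is an assumption for illustration: the data are generated with no gender, major or interaction effect, so every rejection of the interaction hypothesis is a false one, and the helper `interaction_p()` is our own.

```r
# Simulate data in which H0 is true: no gender, major or interaction effect.
set.seed(123)
design <- expand.grid(
  id     = 1:39,
  gender = c("Female", "Male"),
  major  = c("Administration", "Accounting", "Finance")
)

# p-value of the interaction term for one simulated data set.
interaction_p <- function() {
  design$gpa <- rnorm(nrow(design), mean = 3.0, sd = 0.6)
  fit <- aov(gpa ~ gender * major, data = design)
  summary(fit)[[1]][["Pr(>F)"]][3]  # rows: gender, major, gender:major, Residuals
}

p_sim <- replicate(500, interaction_p())

# Share of false rejections at each significance level (roughly alpha).
type1_at_10 <- mean(p_sim < 0.10)
type1_at_05 <- mean(p_sim < 0.05)
cat("alpha = 0.10:", type1_at_10, " alpha = 0.05:", type1_at_05, "\n")
```

The share of false rejections should be close to the chosen α in each case, which shows the trade-off: a stricter α rejects a true Ho less often, but it also makes a real interaction harder to detect.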

b. Limitation.

In spite of the advantages of two-way ANOVA for this case, there are a few limitations we should consider in this paper. First of all, our samples are quite small, so the sample results might not reflect the GPA of the whole population well, which reduces their credibility. Secondly, we have only studied the interaction between two factors and the differences in GPA between majors and genders, whereas GPA is affected by a variety of other factors, such as the number of hours students work per week, social environment, living place, family income level, previous academic performance, learning ability and time spent studying. In addition, there is no documentation to verify that the samples were randomly selected. The final limitation concerns the conditions for applying the ANOVA test: in this context, the data only roughly satisfy these requirements. Because of these limitations, we work at the 90% confidence level instead of 95%. To sum up, although


Date posted: 14/05/2024, 15:37
