ECONOMETRICS REPORT: Factors Affecting Students’ GPA

Introduction

Question of interest

Learning techniques are a significant concern for university students, particularly undergraduates, as they often struggle to adapt to new learning environments and varying social circumstances. This adjustment can hinder their academic performance, leading to lower GPAs.

At Hanoi University of Science and Technology and the National University of Civil Engineering, approximately 40% of students successfully graduate within the standard duration of their programs, while around 15% face expulsion due to significantly low academic performance.

Therefore, in this research, we analyze the factors that affect students’ GPA.

Using a regression model and hypothesis testing, we analyze the impact of various factors on GPA. Our survey, conducted across social networks, gathered 152 responses, which helps keep the results objective and reliable for drawing conclusions.

Some background analysis into the topic

Below are some results that the article pointed out, using median hypothesis testing.

Exhibit 1: Difference in personal information and GPA (Source: sj.ctu.edu.vn)

Criteria Choices GPA (out of 4) Difference (a –b)

Joining a club or participating in extra-curricular programs can enhance social skills and personal development for both males and females. Engaging in a part-time job during first or second enrollment can provide valuable work experience. While some may choose to rent a house, others prefer the comfort of a family home. The efficiency of these choices varies, with some finding them beneficial while others may not. Ultimately, the decision to join clubs, work part-time, or participate in additional programs depends on individual preferences and circumstances.

*, **, ***: statistically significant at the 99%, 95%, and 90% confidence levels, respectively; ns: not significant at the 90% confidence level

Exhibit 2: Difference in time-spending compared to GPA (Source: sj.ctu.edu.vn)

Criteria Choices GPA (out of 4) Difference (a –b)

Group studying a, < 3,6 hours a day b, > 3,6 hours a day a, < 2,7 hours a day b, > 2,7 hours a day a, Yes b, No

The data indicate that, on average, girls achieve higher GPAs than boys. Additionally, students who serve as monitors or class operators, or who take part in clubs, tend to have higher GPAs than their peers. These findings are statistically significant at the 99% confidence level. Furthermore, the research highlights that factors such as revision time, class attendance, and group study sessions also significantly influence academic performance.

Building on that earlier study, we now conduct our own research to examine the effect of these factors on students’ GPA.

Methodology

To obtain data for this research, we conducted an online survey asking people about their GPA and other habits.

We use both quantitative and qualitative methods to assess the impact of various factors on GPA. Using software such as Excel and Stata, we analyze and present data that reveal the significance of each factor's influence. Additionally, we follow an eight-step process for problem analysis in econometrics, which is detailed in the following section.

Procedure and program used

a, The analysis procedure includes:

Step 1: Question of interest

Step 2: Economic model

Step 3: Econometrics model

Step 4: Data collection

Step 5: Estimation of econometric model

Step 6: Check multicollinearity and heteroscedasticity

Step 7: Hypothesis postulated

Step 8: Result analysis & Policy implication

b, Programs used for the whole research

Google Forms: To collect data & carry out the survey

Google Drive: To store all materials we have collected for this report, which includes lots of folders & files

Microsoft Excel: To present the data & recode some answers into a form that Stata can read. The data set is attached to this report

Stata: To analyze the data and run the regression

Economic model

This report employs an empirical economic model, which relies on data collected for various variables. Unlike a fundamental model that is purely mathematical, the empirical approach uses accepted statistical techniques to estimate the values of the model from the gathered data.

Empirical model discovery and theory evaluation consist of five essential steps; however, due to constraints in purpose and resources, this report will focus on only three of those steps.

1) Specifying the object for modelling

2) Defining the target for modelling

3) Embedding that target in a general unrestricted model

1 Specifying the object for modeling

This report seeks the relationship between GPA, which is the object for modeling, and each of the related factors.

2 Defining the target for modeling by the choice of the variables to analyze, denoted {𝒙𝒊}

Our research identified ten key factors influencing student success: years of university education, gender, time spent on clubs, jobs, entertainment, sleep, self-study, socializing, the number of credits earned, and the influence of teachers.

3 Embedding that target in a general unrestricted model (GUM)

In its simplest acceptable representation (which will later be specified in the econometric model), the GUM is determined to be:

GPA = 𝑓(educ, female, tclb, tjob, tentertain, tsleep, tstudy, tout, ncre, tchimp)

Exhibit 3: Definition of variables in the GPA model

The exhibit defines the variables related to academic performance and personal time management. GPA represents the grade point average, while "educ" indicates the number of years spent in university education. The variable "female" is coded as 1 for female students. The time-allocation variables are as follows: "tclb" is time spent in clubs, "tjob" is time dedicated to jobs, "tsleep" is sleep duration, "tstudy" is time for self-study, "tout" is time spent hanging out, and "tentertain" is time allocated for entertainment. Lastly, "tchimp" measures the impact of teachers, and "ncre" is the number of credits earned.

To determine the relationship between GPA and the other factors, the regression function can be constructed as follows:

GPA = 𝛽0 + 𝛽1 educ + 𝛽2 female + 𝛽3 tclb + 𝛽4 tjob + 𝛽5 tentertain + 𝛽6 tsleep + 𝛽7 tstudy + 𝛽8 tout + 𝛽9 ncre + 𝛽10 tchimp + 𝑢

Where:

𝛽0 is the intercept of the regression model

𝛽𝑖 is the slope coefficient of the independent variable 𝑥𝑖 (𝑖 = 1, ..., 10)

𝑢 is the disturbance of the regression model

û is the residual (the estimator of 𝑢)

From this model, this report is interested in explaining GPA in terms of each of the ten independent variables:

(educ, female, tclb, tjob, tentertain, tsleep, tstudy, tout, ncre, tchimp)

This set of data is a primary one, collected from a recent survey.

In 2019, a survey of 152 students from various universities was conducted, recording their GPAs along with the factors in our model that may influence academic performance.

The data set is attached to this report in the Appendix.

The survey was made by following these steps:

Step 1: Set the goals for the survey: We hope to find the relationships between students' GPA and their living and studying behaviors.

Step 2: Set the parameters of the survey: The people asked to take the survey are 152 random students at Foreign Trade University (FTU). The survey was taken in December 2019.

Step 3: Decide on the survey method: The survey was an online form, which was convenient and time-saving for both the research group and the students who took it. The data are cross-sectional: several factors are observed for each student at one point in time.

Step 4: Match questions to the objectives: The questions were arranged to cover most of the significant factors that might affect students' study results. These included the time spent on clubs, jobs, entertainment, self-study, etc. Most of the questions were multiple-choice and could be answered within a few minutes.

Step 5: Maintain records: All of the answers were recorded automatically in Google Forms so that the survey could be checked later for research purposes.

To obtain summary statistics of the variables in Stata, the following command is used: sum gpa educ female tclb tjob tentertain tsleep tstudy tout ncre tchimp

The result is shown in Exhibit 4.

Exhibit 4: Statistical indicators of variables in the GPA model

• Obs is the number of observations

• Mean is the sample average of the variable

• Std Dev is the standard deviation of the variable

• Min is the minimum value of the variable

• Max is the maximum value of the variable
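The five indicators listed above can be reproduced by hand. The following is a minimal Python sketch, not part of the report's Stata workflow; the function name `summarize` and the sample values are hypothetical and are used only for illustration.

```python
import statistics

def summarize(values):
    """Return the five indicators listed above for one variable."""
    return {
        "Obs": len(values),
        "Mean": statistics.mean(values),
        # Sample standard deviation (divides by n - 1).
        "Std Dev": statistics.stdev(values),
        "Min": min(values),
        "Max": max(values),
    }

# Hypothetical GPA values, for illustration only (not taken from the survey).
example_gpa = [2.8, 3.1, 3.4, 3.6, 3.0]
for name, value in summarize(example_gpa).items():
    print(f"{name}: {value}")
```

Applying the same function to each column of the survey data would give one row of the exhibit per variable.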

1 Checking the correlation among variables

The correlation between gpa and each of the independent variables (educ, female, tclb, tjob, tentertain, tsleep, tstudy, tout, ncre, and tchimp) is analyzed by calculating the correlation coefficient. This coefficient, denoted r, indicates the strength and direction of the linear relationship between two variables. In Stata, a correlation matrix is generated using the command: corr gpa educ female tclb tjob tentertain tsleep tstudy tout ncre tchimp. The results are displayed in Exhibit 5.
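To make the meaning of r concrete, here is a short Python sketch of the Pearson correlation coefficient for one pair of variables. It is an illustration only: the function name `pearson_r` and the sample numbers are assumptions and do not come from the survey data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical self-study hours and GPA values, for illustration only.
tstudy = [1, 2, 3, 4, 5]
gpa = [2.9, 3.0, 3.3, 3.2, 3.6]
print(pearson_r(tstudy, gpa))
```

A value near +1 corresponds to an uphill (positive) linear relationship, a value near -1 to a downhill (negative) one, and a value near 0 to no linear relationship.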

From the correlation matrix, it can be inferred that the correlation between gpa and each of the independent variables is sufficient to run the regression model. Specifically:

- gpa and educ have a weak uphill relationship

- gpa and female have a weak uphill relationship

- gpa and tclb have a weak downhill relationship

- gpa and tjob have a weak uphill relationship

- gpa and tentertain have a moderate downhill relationship

- gpa and tsleep have a weak downhill relationship

- gpa and tstudy have a moderate uphill relationship

- gpa and tout have a weak uphill relationship

- gpa and ncre have a weak downhill relationship

- gpa and tchimp have a weak uphill relationship

The correlation between each pair of variables can be visualized using a scatter plot graph in Stata. The result is shown in Exhibit 6.

Exhibit 6: Scatterplot of variables in GPA model

After verifying the correlation conditions among the variables, the regression model can be estimated in Stata using the command: reg gpa educ female tclb tjob tentertain tsleep tstudy tout ncre tchimp.

From the result, it can be inferred that:

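➢ We have the estimated regression function:

GPA = 3.308301 + 0.0328137 educ + 0.0436706 female − 0.0505358 tclb − 0.0046487 tjob − 0.1089011 tentertain − 0.0008117 tsleep + 0.1164687 tstudy + 0.0147472 tout − 0.0147472 ncre + 0.0878932 tchimp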

❖ 𝛽 0 = 3.308301 : When all the independent variables are zero, the expected value of GPA is 3.308301

❖ 𝛽 1 = 0.0328137: When years of education at university increases by one year, the expected value of GPA increases by 0.0328137

❖ 𝛽 2 = 0.0436706: The expected value of GPA for 𝑓𝑒𝑚𝑎𝑙𝑒 students is higher than that for male students by 0.0436706, holding the other variables constant

❖ 𝛽 3 = −0.0505358: When 𝑡𝑖𝑚𝑒 𝑓𝑜𝑟 𝑐𝑙𝑢𝑏𝑠 increases by one hour, the expected value of GPA decreases by 0.0505358

❖ 𝛽 4 = −0.0046487: When 𝑡𝑖𝑚𝑒 𝑓𝑜𝑟 𝑗𝑜𝑏𝑠 increases by one hour, the expected value of GPA decreases by 0.0046487

❖ 𝛽 5 = −0.1089011: When the 𝑡𝑖𝑚𝑒 𝑓𝑜𝑟 𝑒𝑛𝑡𝑒𝑟𝑡𝑎𝑖𝑛𝑚𝑒𝑛𝑡 increases by one hour, the expected value of GPA decreases by 0.1089011

❖ 𝛽 6 = −0.0008117: When 𝑡𝑖𝑚𝑒 𝑓𝑜𝑟 𝑠𝑙𝑒𝑒𝑝 increases by 1 hour, the expected value of GPA decreases by 0.0008117

❖ 𝛽 7 = 0.1164687 : When 𝑡𝑖𝑚𝑒 𝑓𝑜𝑟 𝑠𝑒𝑙𝑓 − 𝑠𝑡𝑢𝑑𝑦 increases by 1 hour, the expected value of GPA increases by 0.1164687

❖ 𝛽 8 = 0.0147472 : When 𝑡𝑖𝑚𝑒 𝑓𝑜𝑟 ℎ𝑎𝑛𝑔𝑖𝑛𝑔 𝑜𝑢𝑡 increases by 1 hour, the expected value of GPA increases by 0.0147472

❖ 𝛽 9 = −0.0147472 : When 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑐𝑟𝑒𝑑𝑖𝑡𝑠 increases by 1 credit per student, the expected value of GPA decreases by 0.0147472

❖𝛽 10 = 0.0878932 : When 𝑖𝑚𝑝𝑎𝑐𝑡 𝑜𝑓 𝑡𝑒𝑎𝑐ℎ𝑒𝑟 increases by 1 unit, the expected value of GPA increases by 0.0878932

❖All independent variables (educ, female, tclb, tjob, tentertain, tsleep, tstudy, tout, ncre, tchimp) jointly explain 41.32% of the variation in the dependent variable (gpa)

❖Other factors that are not mentioned explain the remaining 58.68% of the variation in the gpa

❖ Adjusted coefficient of determination adj R- squared= 0.3716

❖ Total Sum of Squares TSS= 33.2980886

❖ Explained Sum of Squares ESS = 13.7604018

❖ Residual Sum of Squares RSS = 19.5376868

❖ The degrees of freedom of the model Df m = 10

❖ The degrees of freedom of the residual Df r = 141
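The figures listed above are linked by simple arithmetic. The Python sketch below recomputes the goodness-of-fit measures from the reported sums of squares and degrees of freedom, and evaluates the fitted equation for a chosen student. The coefficients and sums of squares are copied from the list above; the function name `predict_gpa`, the dictionary layout, and the F statistic (which is derived here and not quoted in the text) are assumptions made for illustration.

```python
# Coefficient estimates as reported in the list above.
COEF = {
    "_cons": 3.308301, "educ": 0.0328137, "female": 0.0436706,
    "tclb": -0.0505358, "tjob": -0.0046487, "tentertain": -0.1089011,
    "tsleep": -0.0008117, "tstudy": 0.1164687, "tout": 0.0147472,
    "ncre": -0.0147472, "tchimp": 0.0878932,
}

def predict_gpa(student):
    """Fitted GPA for a dict of regressor values (missing keys count as 0)."""
    return COEF["_cons"] + sum(
        COEF[name] * student.get(name, 0.0) for name in COEF if name != "_cons"
    )

# Sums of squares and degrees of freedom as reported above.
TSS = 33.2980886
ESS = 13.7604018
DF_MODEL = 10
DF_RESID = 141

rss = TSS - ESS                                  # residual sum of squares
n_minus_1 = DF_MODEL + DF_RESID                  # 151, i.e. n - 1 with n = 152
r2 = ESS / TSS                                   # share of variation explained
adj_r2 = 1 - (rss / DF_RESID) / (TSS / n_minus_1)
f_stat = (ESS / DF_MODEL) / (rss / DF_RESID)     # overall F statistic (derived)

print(f"R-squared = {r2:.4f}, adjusted R-squared = {adj_r2:.4f}, F = {f_stat:.2f}")
```

Calling `predict_gpa({})` returns the intercept, and raising `tstudy` by one hour while holding everything else fixed changes the prediction by exactly the `tstudy` coefficient, which is the "holding other variables constant" reading of each slope.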

We have the following hypotheses:

H0: ui is normally distributed

H1: ui is not normally distributed

To test this hypothesis, we can inspect a histogram of the residuals in Stata, generated using these two commands:

predict resid, residual

histogram resid, normal

Exhibit 8: Histogram plot indicating normality

We can also test normality using the skewness and kurtosis test for normality, with the command: sktest resid

The result is shown in Exhibit 9.

At the 5% significance level, the p-values for both skewness and kurtosis are smaller than 0.05, so we have enough evidence to reject H0 and conclude that the residuals are not normally distributed.

Our sample comprises 152 observations. With a sample of this size, the OLS estimators are approximately normally distributed even when the disturbance is not, so the usual t and F tests remain approximately valid and the model can still be used for statistical analysis.
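For intuition about what the test examines, the sketch below computes the sample skewness and kurtosis of a list of residuals in Python. This is only the pair of descriptive moments; it is not the adjusted test statistic or p-value that Stata reports, and the function name and sample values are hypothetical.

```python
def skewness_and_kurtosis(values):
    """Moment-based sample skewness and kurtosis (normal: 0 and 3)."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return skewness, kurtosis

# Hypothetical residuals with a long left tail, for illustration only.
residuals = [-1.2, -0.3, 0.1, 0.2, 0.3, 0.4, 0.5]
print(skewness_and_kurtosis(residuals))
```

A normal distribution has skewness 0 and kurtosis 3, so values far from those benchmarks are what lead the test to reject H0.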

Multicollinearity refers to strong correlation between explanatory variables. It complicates the isolation of individual regressor effects and can lead to inflated standard errors and reduced t-values. To identify multicollinearity, one can analyze the correlation matrix of the regressors and perform auxiliary regressions. In Stata, the command used to assess this issue is "vif," which stands for variance inflation factor.

Exhibit 9: Skewness/ Kurtosis tests for normality

The VIF values here are lower than 10, indicating that multicollinearity is not a serious problem for this set of data.
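The threshold of 10 is easier to interpret through the definition of the variance inflation factor: for regressor j, VIF_j = 1 / (1 − R²_j), where R²_j comes from the auxiliary regression of that regressor on all the others. The short Python sketch below evaluates this formula; the function name and the R² values are hypothetical and are not taken from the report's output.

```python
def vif_from_r2(aux_r2):
    """Variance inflation factor implied by an auxiliary R-squared."""
    if not 0 <= aux_r2 < 1:
        raise ValueError("auxiliary R-squared must lie in [0, 1)")
    return 1.0 / (1.0 - aux_r2)

# Hypothetical auxiliary R-squared values, for illustration only.
for aux_r2 in (0.0, 0.5, 0.9):
    print(f"auxiliary R2 = {aux_r2:.1f} -> VIF = {vif_from_r2(aux_r2):.2f}")
```

Under this formula a VIF of 10 corresponds to an auxiliary R² of 0.9, so a VIF below 10 means that less than 90% of the variation in that regressor is explained by the other regressors.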

Heteroscedasticity refers to the situation where the variance of the error term is not constant. The least squares estimates are then inefficient, and the usual t and F tests can be misleading. To detect heteroscedasticity, one can plot the residuals against the regressors or run a formal test, White's test being a popular choice. To address the issue, it may be necessary to respecify the model, for example by identifying missing variables. In Stata, White's test is run with the imtest, white command, where imtest stands for information matrix test.

At the 5% significance level, there is enough evidence to reject the null hypothesis of constant variance and conclude that this set of data exhibits heteroscedasticity.

Another way to check whether heteroscedasticity exists is to draw the residual-versus-fitted plot, which can be generated using the rvfplot, yline(0) command in Stata.

From the graph, we can see that the variability of the residuals increases, which means this set of data has a heteroscedasticity problem.

To address the heteroscedasticity, robust standard errors are employed, which remain valid when the errors are not identically distributed. In Stata, this is accomplished by rerunning the regression with the robust option using the command: reg gpa educ female tclb tjob tentertain tsleep tstudy tout ncre tchimp, robust.

Exhibit 12: Residual-versus-fitted plot of the model

Comparing these results with the previous regression shows that the coefficient estimates are unchanged, while the standard errors and t values differ, giving p-values that remain valid under heteroscedasticity.
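Why the coefficients stay the same while the standard errors change can be seen in a one-regressor case. The Python sketch below fits a simple regression and computes both the conventional standard error of the slope and a heteroscedasticity-robust one with the n/(n − k) small-sample scaling. It is a simplified illustration, not the report's ten-regressor model; the function name and data are hypothetical.

```python
import math

def simple_ols_with_robust_se(x, y):
    """Slope of y on x with conventional and robust standard errors."""
    n = len(x)
    k = 2  # estimated parameters: intercept and slope
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [xi - mean_x for xi in x]
    sxx = sum(d * d for d in dx)
    # The point estimates do not depend on which standard error is used.
    slope = sum(d * (yi - mean_y) for d, yi in zip(dx, y)) / sxx
    intercept = mean_y - slope * mean_x
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Conventional: one common error variance for all observations.
    se_conventional = math.sqrt(sum(e * e for e in resid) / (n - k) / sxx)
    # Robust: each observation contributes its own squared residual.
    meat = sum((d * e) ** 2 for d, e in zip(dx, resid))
    se_robust = math.sqrt(n / (n - k) * meat / sxx ** 2)
    return slope, se_conventional, se_robust

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(simple_ols_with_robust_se(x, y))
```

The slope is computed once and shared by both variants; only the variance formula differs, which is why the robust option changes the standard errors, t values, and p-values but leaves the coefficient estimates untouched.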

The question of interest, in multiple regression model:

Data collection

The 2019 survey, sourced from [this link](http://bit.ly/2PYZl1H), involved 152 students from various universities and examined their GPAs along with related factors outlined in our model.