OVERVIEW OF THE TOPIC
Model Specification
ESTIMATED MODEL AND STATISTICAL INFERENCE
Estimated Model
Running the command reg HDI LE lnEYS lnGNI INF FER in Stata produces the following estimation results:
Total sum of squares (TSS): 2.45374667
Explained sum of squares (ESS): 2.41301155
Residual sum of squares (RSS): 0.040735121
Variable    Coefficient ($\hat{\beta}$)    t    P-value    95% confidence interval
According to the estimated result from Stata using the Ordinary Least Squares (OLS) method, we obtained the Sample Regression Function (SRF) as below:
$$HDI_{it} = -0.744174 + 0.0045675\,LE_{it} + 0.1983033\,lnEYS_{it} + 0.0673941\,lnGNI_{it} + 0.0002993\,INF_{it} - 0.0019004\,FER_{it} + \hat{u}_{it}$$
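As a minimal sketch of how the SRF can be used, the fitted equation can be evaluated for a given set of inputs. The input values below are hypothetical and are not taken from the data set. The INF coefficient is entered as +0.0002993, which is the midpoint of its reported 95% confidence interval. The function name fitted_hdi is introduced here for illustration only.

```python
import math

# Estimated coefficients from the sample regression function
COEF = {
    "const": -0.744174,
    "LE": 0.0045675,
    "lnEYS": 0.1983033,
    "lnGNI": 0.0673941,
    "INF": 0.0002993,   # sign taken from the midpoint of the reported 95% CI
    "FER": -0.0019004,
}

def fitted_hdi(le, eys, gni, inf, fer):
    """Return the fitted HDI. EYS and GNI are passed in levels and logged here."""
    return (
        COEF["const"]
        + COEF["LE"] * le
        + COEF["lnEYS"] * math.log(eys)
        + COEF["lnGNI"] * math.log(gni)
        + COEF["INF"] * inf
        + COEF["FER"] * fer
    )

# Hypothetical inputs (assumed for illustration, not observed data)
example = fitted_hdi(le=75, eys=13, gni=15000, inf=3, fer=2)
print(round(example, 4))
```

The residual term is omitted because the function returns the fitted value, not the observed HDI.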
The coefficient of determination, R-squared (R² = 0.9834), indicates that 98.34% of the variation in the Human Development Index is accounted for by key factors such as life expectancy at birth, expected years of schooling, gross national income (GNI) per capita, inflation rate, and fertility rate, while the remaining variance is attributed to other influences.
In addition to the coefficient of determination, we consider the adjusted R² ($\bar{R}^2$), which penalizes the inclusion of additional explanatory variables and therefore shows whether adding them is justified. The model has a high adjusted R² of approximately 0.9827, meaning that it still explains about 98.27% of the variability after accounting for the number of regressors, which suggests that the added variables are justifiable and relevant.
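Both measures can be reproduced from the reported sums of squares. The sketch below assumes n = 120 observations and k = 6 estimated coefficients (five regressors plus the constant), which is consistent with the k - 1 = 5 degrees of freedom reported for ESS.

```python
# Reported sums of squares from the Stata output
TSS = 2.45374667
ESS = 2.41301155
RSS = 0.040735121

n = 120  # number of observations
k = 6    # estimated coefficients, including the constant

# R-squared: share of total variation explained by the regression
r2 = ESS / TSS

# Adjusted R-squared: compares mean squares, so extra regressors are penalized
adj_r2 = 1 - (RSS / (n - k)) / (TSS / (n - 1))

print(round(r2, 4), round(adj_r2, 4))
```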
Meanings of estimated coefficients
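The constant term is estimated to be $\hat{\beta}_1 = -0.744174$: when every explanatory variable equals 0, the expected value of the Human Development Index (HDI) is −0.744174. This case (for example, a life expectancy of 0) cannot occur in practice, so the constant has no direct economic interpretation.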
The regression coefficient of LE is estimated to be $\hat{\beta}_2 = 0.0045675$: holding all other explanatory variables unchanged, if life expectancy at birth (LE) increases by 1 year, the expected value of the Human Development Index (HDI) will increase by 0.0045675.
The estimated regression coefficient for the natural logarithm of Expected Years of Schooling (lnEYS) is $\hat{\beta}_3 = 0.1983033$. Because EYS enters the model in logarithmic form, this indicates that, with all other explanatory variables held constant, a 1% increase in Expected Years of Schooling (EYS) is associated with an increase of approximately 0.00198 (0.1983033/100) in the expected Human Development Index (HDI).
The estimated regression coefficient for lnGNI is $\hat{\beta}_4 = 0.0673941$, indicating that, while keeping other variables constant, a 1% increase in Gross National Income (GNI) per capita is associated with an expected increase of approximately 0.00067 (0.0673941/100) in the Human Development Index (HDI).
The regression coefficient of INF is estimated to be $\hat{\beta}_5 = 0.0002993$. Its positive sign is contrary to expectation and to the correlation matrix. In addition, its 95% confidence interval is [−0.00114; 0.0017387], which includes 0. Therefore, INF is not statistically significant.
The estimated regression coefficient for the Fertility Rate (FER) is $\hat{\beta}_6 = -0.0019004$, indicating that a one-unit increase in FER is associated with a decrease of 0.0019004 in the expected Human Development Index (HDI), assuming all other variables remain constant. However, the 95% confidence interval for this estimate is [−0.0071364; 0.0033356], which includes zero, so the coefficient is not statistically significant at the 5% level.
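Because EYS and GNI enter the model in logarithms, their coefficients are read per percentage change rather than per unit change. The sketch below computes the exact change in fitted HDI for a 1% increase in each variable, which is close to the coefficient divided by 100. The helper name hdi_change is hypothetical.

```python
import math

beta_lnEYS = 0.1983033
beta_lnGNI = 0.0673941

def hdi_change(beta, pct_increase):
    """Exact change in fitted HDI when a logged regressor rises by pct_increase percent."""
    return beta * math.log(1 + pct_increase / 100)

# A 1% increase in EYS or GNI, other regressors held constant
change_eys = hdi_change(beta_lnEYS, 1)
change_gni = hdi_change(beta_lnGNI, 1)
print(change_eys, change_gni)
```

For small percentage changes the approximation beta/100 is adequate. For large changes, such as a doubling of GNI, the exact logarithmic form should be used.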
Other results analysis
The Explained Sum of Squares (ESS) quantifies the variation of the estimated Human Development Index (HDI) values around their sample mean, that is, the variation explained by the regression model. Here ESS = 2.41301155, with k − 1 = 5 degrees of freedom.
The Residual Sum of Squares (RSS) represents the unexplained variation of the dependent variable HDI about the regression line: RSS = 0.040735121, with n − k = 114 degrees of freedom.
The Total Sum of Squares (TSS) quantifies the overall variation of the Human Development Index (HDI) values around their sample mean: TSS = 2.45374667, with n − 1 = 119 degrees of freedom.
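The three sums of squares and their degrees of freedom are linked by the usual decomposition. As a worked check with the reported values, assuming n = 120 observations and k = 6 estimated coefficients (five regressors plus the constant):

```latex
\begin{aligned}
TSS &= ESS + RSS: & 2.41301155 + 0.040735121 &= 2.453746671 \approx 2.45374667,\\
n - 1 &= (k - 1) + (n - k): & 5 + 114 &= 119 .
\end{aligned}
```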
Testing for possible model problems and correcting them
4.1 Testing for omitted variables (verifying the model's functional form)
Conducting Ramsey's RESET test in Stata, we get the following results:
Table 5. Ramsey RESET test result
H0: The model has no omitted variables (the coefficients on the powers of the fitted values of HDI are jointly equal to 0)
H1: The model has omitted variables (at least one of these coefficients is different from 0)
From the above result, p-value = 0.0688 > α = 0.05, so we fail to reject H0.
Therefore, at the 5% significance level there is no evidence that the model is misspecified.
Consider the variance inflation factor (VIF):
Table 6. Variance inflation factors (VIF)
Comments: VIF(LE), VIF(lnEYS), VIF(lnGNI), VIF(FER), and VIF(INF) are all smaller than 10. Therefore, the model does not suffer from serious multicollinearity.
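For reference, the VIF of regressor j is computed from the R² of an auxiliary regression of that regressor on the remaining regressors. The notation $R_j^2$ is introduced here for illustration and does not appear in the original tables. The threshold of 10 corresponds to an auxiliary R² of 0.9:

```latex
VIF_j = \frac{1}{1 - R_j^2}, \qquad VIF_j < 10 \iff R_j^2 < 0.9 .
```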
Using the White test in Stata, we get the following results:
Table 7. White test result for heteroskedasticity
imtest, white: White's test for heteroskedasticity
White's test for H0: homoskedasticity against H1: unrestricted heteroskedasticity
chi2(20) = 21.30
Comment: From the above result, p-value = 0.3795 > α = 0.05, so we fail to reject H0.
Therefore, at the 5% significance level there is no evidence of heteroskedasticity in the model.
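The reported p-value can be checked directly from the test statistic. The sketch below uses the closed-form upper-tail probability of the chi-square distribution, which is valid only for even degrees of freedom, as is the case here with 20. The function name chi2_sf_even_df is hypothetical, and the small difference from 0.3795 is due to rounding of the statistic to 21.30.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable; valid only for even df."""
    if df % 2 != 0:
        raise ValueError("this closed form requires an even number of degrees of freedom")
    half = x / 2
    term, total = 1.0, 1.0          # i = 0 term of the series
    for i in range(1, df // 2):     # i = 1, ..., df/2 - 1
        term *= half / i
        total += term
    return math.exp(-half) * total

# White's test statistic reported by Stata: chi2(20) = 21.30
p_white = chi2_sf_even_df(21.30, 20)
print(round(p_white, 3))
```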
The model uses cross-sectional data, so there is no need to test for autocorrelation.
4.5 Testing the normal distribution of random errors
Using the Jarque-Bera test in Stata, we get the following results:
Table 8. Test result for the normal distribution of random errors
H0: The random errors are normally distributed
H1: The random errors are not normally distributed
The test yields a p-value of 0.0031, which is less than the significance level of 0.05, so we reject H0: the random errors do not follow a normal distribution. However, because the number of observations is large (n = 120), the OLS estimators are approximately normally distributed by the central limit theorem, so this departure from normality does not materially affect the statistical inference.
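The same decision rule applies to all of the diagnostic tests above: H0 is rejected when the p-value falls below the significance level. A minimal sketch with the three reported p-values, where the helper name decide is hypothetical:

```python
ALPHA = 0.05

# p-values reported in Tables 5, 7 and 8
p_values = {
    "Ramsey RESET": 0.0688,
    "White": 0.3795,
    "Jarque-Bera": 0.0031,
}

def decide(p_value, alpha=ALPHA):
    """Return the test decision for a given p-value and significance level."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

decisions = {name: decide(p) for name, p in p_values.items()}
for name, decision in decisions.items():
    print(f"{name}: p = {p_values[name]} -> {decision}")
```

Only the Jarque-Bera test rejects its null hypothesis at the 5% level.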
Overall, the tests above find no evidence of omitted variables, serious multicollinearity, or heteroskedasticity, and with a large sample the normality assumption is not critical. Consequently, the Ordinary Least Squares (OLS) estimates can be used for statistical inference.
From the estimation results, we obtain the following sample regression function:
$$HDI_{it} = -0.744174 + 0.0045675\,LE_{it} + 0.1983033\,lnEYS_{it} + 0.0673941\,lnGNI_{it} + 0.0002993\,INF_{it} - 0.0019004\,FER_{it} + \hat{u}_{it}$$
Hypothesis Testing
5.1 Testing the significance of an individual regression coefficient $\hat{\beta}_j$
The Ordinary Least Squares regression analysis conducted using Stata revealed the confidence intervals for the regression coefficients of each variable, assessed at a significance level of 5%.
For the variables INF and FER, the value 0 lies within the 95% confidence intervals, [−0.00114; 0.0017387] for INF and [−0.0071364; 0.0033356] for FER, which means we do not have enough evidence to reject H0. Therefore, the regression coefficients of INF and FER are not statistically significant at the 5% significance level.
The regression coefficients for the variables LE, lnEYS, lnGNI, and the constant are statistically significant at the 5% significance level, as the value of 0 is not included in the confidence interval for each variable.
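The confidence-interval approach amounts to checking whether 0 lies inside each interval. The sketch below applies it to the two intervals reported in the text and also recovers each point estimate as the interval midpoint. The helper names contains_zero and midpoint are hypothetical.

```python
# 95% confidence intervals reported for the two insignificant regressors
intervals = {
    "INF": (-0.00114, 0.0017387),
    "FER": (-0.0071364, 0.0033356),
}

def contains_zero(lower, upper):
    """True when 0 lies inside the interval, so H0: beta_j = 0 is not rejected."""
    return lower <= 0 <= upper

def midpoint(lower, upper):
    """A symmetric confidence interval is centered on the point estimate."""
    return (lower + upper) / 2

for name, (lower, upper) in intervals.items():
    print(name, contains_zero(lower, upper), midpoint(lower, upper))
```

The midpoints agree with the reported coefficients, 0.0002993 for INF and −0.0019004 for FER, up to rounding.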
n: the number of observations or sample size, n = 120
α: the significance level, α = 0.05; for the two-tailed test, α/2 = 0.025
According to the test statistic $t_s = \frac{\hat{\beta}_j - 0}{SE(\hat{\beta}_j)}$ of each variable at the significance level of 5% obtained from the results, we have:
For the variable LE, its absolute value is $|t_s| = 8.34 > 1.998$, so we can reject H0. Therefore, the regression coefficient of LE is statistically significant at a significance level of 5%.
For the variable lnEYS, its absolute value is | t s | 45 >1.998 , we can reject H 0. Therefore, the regression coefficient of lnEYS is statistically significant at a significance level of 5%.
For the variable lnGNI, its absolute value is | t s | 78 >1.984 , we can reject H 0. Therefore, the regression coefficient of lnGNI is statistically significant at a significance level of 5%.
For the variable INF, its absolute value is | t s | =0.41