1 Descriptive Statistics
Some statistical information:
Figure 3: Covariance matrix of Population (X2), Investment (X3), Export (X4) and Import (X5)
The following graphs show the relationship between GDP and the other factors, which are Population, Investment, Export and Import respectively:
Figure 4: Relationship between GDP and its factors
Based on the EViews result, we have the equation:
GDP = β1 + β2*Population + β3*Investment + β4*Export + β5*Import
In stochastic form:
GDP = β1 + β2*Population + β3*Investment + β4*Export + β5*Import + ui
Dependent Variable: Y
Method: Least Squares
Date: 04/12/21  Time: 15:52
Sample: 1995 2019, Included observations: 25
S.E. of regression: 123290.4; Akaike info criterion: 26.45933; Sum squared resid: 3.04E+11; Schwarz criterion: 26.70310; Log likelihood: -325.7416; Hannan-Quinn criter.: 26.52694; F-statistic: 1392.059; Durbin-Watson stat: 0.561738; Prob(F-statistic): 0.000000
Figure 5: Estimation of the best model
Thus, the sample regression function is:
GDP = -3053843 + 0.040863*Population + 1.473427*Investment + 0.763855*Export - 0.401301* Import
β1 = -3053843: Regardless of the other variables, GDP is expected to decrease by 3,053,843 billion VND per year.
β2 = 0.040863: There is a positive relationship between GDP and Population. If Population increases by 1 million people while the other variables are unchanged, GDP is expected to increase by 0.040863 billion VND.
β3 = 1.473427: There is a positive relationship between GDP and Investment. If Investment increases by 1 billion VND while the other variables are unchanged, GDP is expected to increase by 1.473427 billion VND.
β4 = 0.763855: There is a positive relationship between GDP and Export. If Export increases by 1 billion VND while the other variables are unchanged, GDP is expected to increase by 0.763855 billion VND.
β5 = -0.401301: There is a negative relationship between GDP and Import. If Import increases by 1 billion VND while the other variables are unchanged, GDP is expected to decrease by 0.401301 billion VND.
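As a quick sanity check, the sample regression function above can be coded directly. This is an illustrative sketch only (the function name is ours, not part of the EViews output); the coefficients are those reported in Figure 5.

```python
def fitted_gdp(population, investment, export, imports):
    """Fitted GDP (billion VND) from the estimated SRF.

    population is in million people; investment, export and imports
    are in billion VND (coefficients from Figure 5).
    """
    return (-3053843
            + 0.040863 * population
            + 1.473427 * investment
            + 0.763855 * export
            - 0.401301 * imports)

# Marginal effect of one extra million people, other variables fixed:
effect = fitted_gdp(1, 0, 0, 0) - fitted_gdp(0, 0, 0, 0)
```

The difference reproduces β2 = 0.040863, which is exactly the ceteris paribus interpretation given above.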
R-squared = 0.996421 is a measure of "goodness of fit": approximately 99.64% of the total variation of GDP can be explained by the variation of the four factors Population, Investment, Export and Import.
3.1 Testing the overall significance of all coefficients
According to the test of functional form, the OLS model has the following representation:
GDP = β1 + β2*Population + β3*Investment + β4*Export + β5*Import
We use the F-test for the overall significance test to check the joint effect of all independent variables. In hypothesis testing, we use a significance level of 5% and a number of observations n = 25.
H0: All variables have no effect on GDP (β2 = β3 = β4 = β5 = 0)
H1: At least one variable has an effect on GDP (β2 ≠ 0, β3 ≠ 0, β4 ≠ 0 and/or β5 ≠ 0)
From the EViews table above, we obtain:
Test statistic: F-stat = 1392.059
Critical value: Fc = 4.53
Decision rule: if F-stat > Fc, reject H0.
Compare: 1392.059 > 4.53, so we reject H0 and conclude that the model is overall significant at the 5% level.
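The F-statistic can be reproduced from the reported R-squared alone via F = (R²/(k-1)) / ((1-R²)/(n-k)), with n = 25 observations and k = 5 estimated coefficients. A minimal sketch using the values from Figure 5:

```python
# Overall-significance F-statistic recomputed from R-squared (Figure 5 values)
r2 = 0.996421   # R-squared of the estimated model
n, k = 25, 5    # observations; coefficients including the intercept

f_stat = (r2 / (k - 1)) / ((1 - r2) / (n - k))
# f_stat is close to the 1392.059 reported by EViews (small rounding in r2)
```

The tiny gap from 1392.059 comes only from rounding R² to six decimals.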
After evaluating the overall significance of the model, the individual partial coefficients can be tested. A t-test is employed to determine whether each independent variable significantly contributes to the dependent variable.
Intercept: H0: β1 = 0, H1: β1 ≠ 0. Critical value: tc = 2.447. Conclusion: there is not enough evidence to conclude that the intercept coefficient is statistically significant at the 95% confidence level.
Population: H0: β2 = 0, H1: β2 ≠ 0. Critical value: tc = 2.447. Conclusion: there is enough statistical evidence to conclude that Population has an effect on GDP at the 95% confidence level.
Investment: H0: β3 = 0, H1: β3 ≠ 0. Critical value: tc = 2.447. Conclusion: there is not enough statistical evidence to conclude that Investment has an effect on GDP at the 95% confidence level.
Export: H0: β4 = 0, H1: β4 ≠ 0. Critical value: tc = 2.447. Test statistic: t = 4.1542. Conclusion: there is enough statistical evidence to conclude that Export has an effect on GDP at the 95% confidence level.
CHECKING ERRORS IN THE MODEL
1 Multicollinearity
One of the ten assumptions of the classical linear regression model (CLRM) is that there is no multicollinearity among the regressors (Assumption 10). Multicollinearity leads to less accurate estimates of the regression coefficients and reduces the reliability of the model.
1.1 The nature
Multicollinearity exists when there are perfect linear relationships among the independent variables of the regression model. This exact relationship holds if the following condition is satisfied:
λ1X1 + λ2X2 + ... + λkXk + vi = 0
where vi is a stochastic error term and λ1, λ2, ..., λk are constants (not all zero simultaneously).
A test for multicollinearity must be carried out to identify whether there are functional relationships among the explanatory variables, so as to improve the precision and accuracy of the model.
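The condition above can be illustrated numerically: if one regressor is an exact linear combination of another (here λ1 = 2, λ2 = -1, vi = 0), the design matrix loses rank and X'X becomes singular, so OLS cannot identify the coefficients. A small sketch with made-up data:

```python
import numpy as np

x1 = np.arange(10.0)
x2 = 2.0 * x1                      # perfect collinearity: 2*x1 - x2 = 0
X = np.column_stack([np.ones(10), x1, x2])

rank = np.linalg.matrix_rank(X)    # 2 instead of 3: X'X cannot be inverted
```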
1.2 Consequences
There are several consequences when imperfect multicollinearity exists, namely:
- Large variances and covariances, which make the estimation less accurate
- The estimation confidence intervals tend to be much wider, increasing the chance of accepting the "zero null hypothesis"
- The t-statistics of coefficients tend to be statistically insignificant
- The R-squared can be very high
- The OLS estimators and their standard errors can be sensitive to small changes in the data
1.3 Detection
To find out how much the variance (the square of the estimate's standard deviation) of an estimated regression coefficient is inflated by collinearity, we use the variance inflation factor: VIFj = 1/(1 - Rj²), where Rj² is the R-squared from regressing the j-th regressor on the remaining regressors.
Figure 6: Variance Inflation Factors (Date: 04/13/21, Sample: 1995 2019, 25 observations)
As illustrated in Figure 6, the calculated Variance Inflation Factors (VIFs) exceed the benchmark of 10 This indicates that the regression variables exhibit a high degree of collinearity, suggesting the presence of a multicollinear relationship among these variables.
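The VIF computation behind Figure 6 can be sketched in a few lines: for each regressor, run the auxiliary regression on the remaining regressors and apply VIF = 1/(1 - R²). The data below are synthetic, not the assignment's series:

```python
import numpy as np

def vifs(X):
    """Variance inflation factor for each column of X (no intercept column)."""
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        # Auxiliary regression of column j on the other columns plus an intercept
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        out.append(1.0 / (1.0 - r2))
    return out

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x3 = rng.normal(size=100)
x2 = 2.0 * x1 + 0.05 * rng.normal(size=100)  # nearly collinear with x1
v = vifs(np.column_stack([x1, x2, x3]))
# v[0] and v[1] are far above the benchmark of 10; v[2] is near 1
```

The benchmark of 10 used in the text corresponds to an auxiliary R² of 0.9.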
1.4 Remedial measures
To sum up, our model suffers from a significant multicollinearity problem, although the OLS estimators remain unbiased.
Multicollinearity arises from data deficiency, leaving researchers with limited options for data selection In such situations, refraining from action may be the optimal response, acknowledging the constraints imposed by the available data.
2 Heteroscedasticity
Heteroskedasticity (unequal conditional variance of the error terms) is the most common problem when constructing models with cross-sectional data. It violates Assumption 3 about homoscedasticity. Compared to multicollinearity, heteroskedasticity is a more serious problem.
2.1 The nature
Heteroscedasticity exists if the variances of the error terms in a model are not constant as the explanatory and explained variables change. Symbolically, Var(ui) = E(ui²) = σi² is not constant (for i = 1, 2, ..., n).
This indicates that the disturbance for each of the n-units is drawn from a probability distribution that has a different variance
There are in fact both formal and informal methods to test for the existence of heteroscedasticity
2.2 Consequences
- OLS estimators are still linear and unbiased
- Var(β̂) is no longer minimum, so OLS is no longer efficient
- The usual formula uses Var(ui) = σ² instead of Var(ui) = σi², so the estimated variances and covariances are biased and inconsistent
- t and F statistics are unreliable
White's heteroscedasticity test is employed to assess the presence of heteroscedasticity in the OLS model. By omitting the cross terms, degrees of freedom are preserved without compromising the validity of the test.
Dependent Variable: RESID^2
Method: Least Squares
Date: 04/13/21  Time: 11:59
Sample: 1995 2019, Included observations: 25
Obs*R-squared: 22.17663, Prob. Chi-Square(14): 0.0751
Scaled explained SS: 6.917226, Prob. Chi-Square(14): 0.9379
R-squared: 0.887065; Adjusted R-squared: 0.728957; S.E. of regression: 6.38E+09; Sum squared resid: 4.07E+20; Log likelihood: -588.4281; F-statistic: 5.610486; Durbin-Watson stat: 2.398176
Figure 7: White's heteroscedasticity test
Hypothesis: H0: Homoscedasticity (Var(ui) = σ²); H1: Heteroscedasticity (Var(ui) = σi²)
Test statistic: W = n × R² = 25 × 0.887065 = 22.176625
Critical value: χ²0.05(14) = 23.685
Decision rule: if W > χ²0.05(14), reject H0.
Compare: χ²0.05(14) = 23.685 > W = 22.176625, so we do not reject H0.
Conclusion: there is not enough evidence to infer that heteroscedasticity exists in this model at the 5% level of significance.
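The arithmetic of White's test is simply W = n × R² from the auxiliary regression of the squared residuals, compared against a chi-square critical value. A minimal sketch using the numbers from Figure 7:

```python
# White's test statistic from the auxiliary regression (Figure 7 values)
n = 25
r2_aux = 0.887065          # R-squared of the RESID^2 regression
chi2_crit = 23.685         # chi-square critical value, 14 df, 5% level

w = n * r2_aux             # 22.176625
reject_h0 = w > chi2_crit  # False: do not reject homoscedasticity
```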
As evident in the hypothesis testing, the model complies with Assumption 3 of the CLRM. Nevertheless, as a further safeguard, the model was transformed to a log-log structure and re-estimated with the "White heteroskedasticity-consistent standard errors & covariance" method, which is robust to heteroskedasticity. This gives us a better model, with the result from EViews below:
Heteroskedasticity Test: Breusch-Pagan-Godfrey
Dependent Variable: RESID^2
Method: Least Squares
Date: 05/15/21  Time: 12:45
Sample: 1995 2019, Included observations: 25
White heteroskedasticity-consistent standard errors & covariance
Obs*R-squared: 6.339463, Prob. Chi-Square(4): 0.1752
Scaled explained SS: 4.328563, Prob. Chi-Square(4): 0.3634
R-squared: 0.253579; Adjusted R-squared: 0.104294; S.E. of regression: 0.003986; Sum squared resid: 0.000318; Log likelihood: 105.4400; F-statistic: 1.698628; Durbin-Watson stat: 1.807876; Prob(F-statistic): 0.189814
Figure 8: White Heteroskedasticity-consistent standard errors & covariance test
In this new model, the W-statistic n × R² = 6.339463 < χ²0.05(4) = 9.49, which shows that the model is still homoskedastic.
3 Autocorrelation
Autocorrelation tests are crucial for verifying the absence of linear relationships between the error terms, whose presence would violate Assumption 5. As with heteroskedasticity, the concern is that autocorrelation inflates the standard errors of the coefficients, resulting in a higher than optimal variance.
3.1 The nature
When the assumption of zero autocorrelation (cov(e_i, e_j) = 0 for all i ≠ j) is violated, serial correlation among the disturbances in the population regression function arises. This serial correlation can be detected by examining the covariance between the error terms (cov(e_i, e_j) ≠ 0 for some i ≠ j), indicating a non-random pattern in the residuals.
3.2 Consequences
- The estimated coefficients remain unbiased
- Var(β̂) is no longer the smallest, so the standard errors become large
- The usual t and F tests of significance are no longer valid
- The residual variance σ̂² is likely to underestimate the true σ²
- R-squared is likely to be overestimated
3.3 Detection
a. Durbin-Watson test, AR(1):
Step 1: H0: there is no positive autocorrelation; Ha: there is positive autocorrelation.
Step 2: Test statistic: Durbin-Watson stat d* = 0.561738 (from EViews).
Step 3: Critical values: n = 25, k' = 4 => dL = 0.832, dU = 1.521.
Step 4: Decision rule:
Reject H0 if 0 < d* < dL; do not reject H0 if dU < d* < 4 - dU; the test is inconclusive if dL ≤ d* ≤ dU.
Since d* = 0.561738 < dL = 0.832, we reject H0 and conclude that positive autocorrelation exists in the model.
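The Durbin-Watson statistic itself is easy to compute from a residual series as d = Σ(e_t - e_{t-1})² / Σe_t². A small sketch with illustrative residuals (not the assignment's actual residuals):

```python
def durbin_watson(resid):
    """Durbin-Watson statistic: near 0 for positive autocorrelation,
    near 2 for no autocorrelation, near 4 for negative autocorrelation."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den

smooth = [1.0, 1.1, 1.2, 1.1, 1.0, 0.9, 0.8, 0.9, 1.0, 1.1]  # positively autocorrelated
alternating = [1.0, -1.0] * 5                                 # negatively autocorrelated

d_pos = durbin_watson(smooth)        # well below 2
d_neg = durbin_watson(alternating)   # well above 2
```

A value such as the model's d* = 0.561738, far below 2, is the signature of positively autocorrelated residuals.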