Group 10 Econometrics Report

I INTRODUCTION
  1. Overall about econometrics
  2. Why choosing OLS?
II QUESTION OF INTEREST
III ECONOMIC MODEL
  Choosing the variables
  Embedding that target in a general unrestricted model (GUM)
IV ECONOMETRICS MODEL
  Population regression function (PRF)
  Sample regression function (SRF)
V DATA COLLECTION
  Data overview
  Data description
VI ESTIMATION OF ECONOMETRIC MODEL
  Checking the correlation among variables
  Regression run
VII CHECK MULTICOLLINEARITY AND HETEROSCEDASTICITY
  Multicollinearity
  Heteroskedasticity
VIII HYPOTHESES POSTULATED
  The t test
  Confidence Intervals
  P Value
  Testing the overall significance: The F test
IX RESULT ANALYSIS AND POLICY IMPLICATION
X CONCLUSION
XI REFERENCES

I INTRODUCTION
1. Overall about econometrics
Econometrics is the application of statistical methods to economic data and is described as the branch of economics that aims to give empirical content to economic relations. More precisely, it is the quantitative analysis of actual economic problems, based on the concurrent development of theory and observation and related by appropriate methods of inference.
Economists often describe econometrics as an effective tool for turning mountains of data into simple relationships. Econometrics is effective because it combines economic theory with statistical theory and mathematical statistics to evaluate and develop its methods. In practice, econometrics helps economists test economic theories, build econometric models, analyze economic history and make forecasts. Aware of the importance of econometrics for understanding economic phenomena, our group decided to carry out an econometric study, "The factors that have influence on median housing price". The study aims to analyze the data statistically and to explain why price levels differ. The data set has 506 observations on 12 variables. We chose six of them (price, crime, nox, rooms, dist and proptax), with price as the dependent variable and the other five as independent variables. The general method used in this research is OLS (ordinary least squares), and all estimation is carried out in Stata.

During this research, our group was fortunate to be guided thoroughly by Dr. Dinh Thi Thanh Binh. We are grateful for everything you have taught us! Because this is the first econometric study our group has carried out, our work inevitably contains mistakes. We would be pleased to receive your feedback so that we can do better next time.

2. Why choosing OLS?
Ordinary least squares (OLS) is a type of linear least squares method for estimating the unknown parameters in a linear regression model. OLS chooses the parameters of a linear function of a set of explanatory variables by the principle of least squares. It minimizes the sum of the squared differences between the observed values of the dependent variable in the data set and the values predicted by the linear function.
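To make the least-squares principle above concrete, the sketch below estimates a linear regression by solving the normal equations \( (X'X)\hat\beta = X'y \) in plain Python. It is only an illustration under stated assumptions. The toy data are made up, generated exactly from y = 2 + 3x1 - x2, and the helper names `solve`, `ols` and `ssr` are hypothetical. The estimates in this report were produced by Stata's regress command, not by this code.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if m[col][col] == 0:
            raise ValueError("singular matrix (e.g. perfect multicollinearity)")
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(xs, y):
    """OLS estimates [b0, b1, ..., bk] from the normal equations (X'X) b = X'y.

    xs holds one list of k regressor values per observation; a column of ones
    is prepended so that b0 is the intercept.
    """
    X = [[1.0] + [float(v) for v in row] for row in xs]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def ssr(xs, y, b):
    """Sum of squared residuals: the quantity that OLS minimises."""
    total = 0.0
    for row, yi in zip(xs, y):
        fitted = b[0] + sum(bj * xj for bj, xj in zip(b[1:], row))
        total += (yi - fitted) ** 2
    return total


# Made-up data generated exactly from y = 2 + 3*x1 - x2 (no noise).
xs = [[1, 0], [2, 1], [3, 5], [4, 2], [5, 7]]
y = [2 + 3 * x1 - x2 for x1, x2 in xs]
b = ols(xs, y)
print([round(v, 6) for v in b])
```

With noisy data, the same code returns estimates near the true values but not exactly equal to them. For real work, a library routine (or Stata's regress) is preferable to hand-written elimination.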
With the six selected variables, we use OLS because we assume that all regressors are exogenous and that the independent variables affect the dependent variable linearly. Under the classical assumptions listed below, the OLS estimators are linear and unbiased. They also have the smallest variance among all linear unbiased estimators, which makes them BLUE by the Gauss-Markov theorem. When using OLS, we make the following basic assumptions:
1. The regression model is linear in the parameters.
2. X values are fixed in repeated sampling, which means \( X_i \) and \( u_i \) are uncorrelated.
3. The disturbance has zero mean: \( E(u_i) = 0 \).
4. Homoscedasticity, or equal variance of \( u_i \): \( var(u_i) = \sigma^2 \).
5. There is no correlation between disturbances.
6. The model is correctly specified.
7. The number of observations must be greater than the number of parameters to be estimated.
8. The X values in a given sample must not all be the same.
9. There is no perfect multicollinearity.
10. The disturbances are normally distributed.

II QUESTION OF INTEREST
We have always wondered, "Why do housing prices differ so much among locations and regions?" Many factors affect housing prices, such as structure, neighborhood, accessibility and air pollution. To answer that question, our group uses the collected data to build and run a regression model. We then analyze the results to answer the question of interest above.

III ECONOMIC MODEL
According to the provided data, the economic model used in this report is an empirical one.
Note that the underlying model is mathematical. In an empirical model, however, data are gathered on the variables and then used, with accepted statistical techniques, to estimate the model's parameters.

Choosing the variables
Having described the data with the command "des" in file… in Stata, we obtain the following result:

obs:    506
vars:    12                                   31 Oct 1996 16:37
size: 22,770

variable name   storage type   display format   value label   variable label
price           float          %9.0g                          median housing price, $
crime           float          %9.0g                          crimes committed per capita
nox             float          %9.0g                          nit ox concen; parts per 100m
rooms           float          %9.0g                          avg number of rooms
dist            float          %9.0g                          wght dist to 5 employ centers
radial          byte           %9.0g                          access. index to rad. hghwys
proptax         float          %9.0g                          property tax per $1000
stratio         float          %9.0g                          average student-teacher ratio
lowstat         float          %9.0g                          perc of people 'lower status'
lprice          float          %9.0g                          log(price)
lnox            float          %9.0g                          log(nox)
lproptax        float          %9.0g                          log(proptax)
Figure 1

The table above describes the 12 variables in the data set, each with 506 observations, that may influence housing prices.
After careful discussion, our group chose price as the dependent variable Y and the following independent variables: X1: crime, X2: nox, X3: rooms, X4: dist, X5: proptax.

Embedding that target in a general unrestricted model (GUM)
In its simplest acceptable representation (which will later be specified in the econometric model), the GUM of price is determined to be:
\[ price = f(crime, nox, rooms, dist, proptax) \]
A brief description of each variable is given in Figure 1.

Name      Role                   Meaning                                                           Expected sign
Price     Dependent variable     Median housing price ($)
Crime     Independent variable   Number of crimes committed per capita                             -
Nox       Independent variable   Nitrogen oxide concentration in the air (parts per 100 million)   -
Rooms     Independent variable   Average number of rooms                                           +
Dist      Independent variable   Weighted distance to 5 employment centers                         -
Proptax   Independent variable   Property tax per $1000                                            -
Figure 2

IV ECONOMETRICS MODEL
Population regression function (PRF)
PRF: \( price_i = \beta_0 + \beta_1 crime_i + \beta_2 nox_i + \beta_3 rooms_i + \beta_4 dist_i + \beta_5 proptax_i + u_i \)
Sample regression function (SRF)
SRF: \( \widehat{price}_i = \hat\beta_0 + \hat\beta_1 crime_i + \hat\beta_2 nox_i + \hat\beta_3 rooms_i + \hat\beta_4 dist_i + \hat\beta_5 proptax_i \), so that \( price_i = \widehat{price}_i + \hat u_i \)
where:
\( \beta_0 \) is the intercept of the regression model
\( \beta_j \) (j = 1, ..., 5) is the slope coefficient of the independent variable \( X_j \)
\( u_i \) is the disturbance of the regression model
\( \hat\beta_0 \) is the estimator of \( \beta_0 \)
\( \hat\beta_j \) is the estimator of \( \beta_j \)
\( \hat u_i \) is the residual (the estimator of \( u_i \))

V DATA COLLECTION
Data overview
This data set was collected from a given source, so it is secondary data. The structure of the economic data is cross-sectional.

Data description
To obtain descriptive statistics for the variables, the following command is used in Stata: sum

Variable   Obs   Mean       Std. Dev.   Min     Max
price      506   22511.51   9208.85     5000    50001
crime      506   3.611536   8.59024     0.006   88.97
nox        506   5.549783   1.15839     3.85    8.71
rooms      506   6.284051   0.7025938   3.56    8.78
dist       506   3.795751   2.10613     1.13    12.13
proptax    506   40.82372   16.8537     18.7    71.1
Figure 3

where: Obs is the number of observations, Mean is the sample mean, Std. Dev. is the standard deviation of the variable, Min is the minimum value of the variable, and Max is the maximum value of the variable.

VI ESTIMATION OF ECONOMETRIC MODEL
Checking the correlation among variables:

          price     crime     nox       rooms     dist      proptax
price     1.0000
crime    -0.3879    1.0000
nox      -0.4260    0.4212    1.0000
rooms     0.6958   -0.2188   -0.3028    1.0000
dist      0.2493   -0.3799   -0.7702    0.2054    1.0000
proptax  -0.4671    0.5828    0.6670   -0.2921   -0.5344    1.0000
Figure 4

First and foremost, we check the correlation between price and each of crime, nox, rooms, dist and proptax by calculating the correlation coefficients among these variables. The correlation coefficient measures the strength and direction of the linear relationship between two variables on a scatterplot. In Stata, the correlation matrix is generated by the command: corr price crime nox rooms dist proptax
The matrix shows that the correlation between price and each independent variable is large enough in absolute value to justify running the regression model. Specifically:
The correlation coefficient between price and crime is -0.3879 => price and crime have a moderate negative relationship.
The correlation coefficient between price and nox is -0.4260 => price and nox have a moderate negative relationship.
The correlation coefficient between price and rooms is 0.6958 => price and rooms have a moderate-to-strong positive relationship.
The correlation coefficient between price and dist is 0.2493 => price and dist have a weak positive relationship.
The correlation coefficient between price and proptax is -0.4671 => price and proptax have a moderate negative relationship.
Rooms and dist are the only independent variables whose correlation coefficients with price are larger than 0, which means they move in the same direction as the dependent variable. The highest coefficient, 0.6958 (between rooms and price), indicates that rooms has the strongest linear association with price. Areas where houses have more rooms tend to have considerably higher prices.
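The Pearson coefficient reported by corr is \( r = \sum (x_i - \bar x)(y_i - \bar y) / \sqrt{\sum (x_i - \bar x)^2 \sum (y_i - \bar y)^2} \). The sketch below implements that formula in pure Python. The `rooms` and `price` values in it are made-up numbers chosen for illustration, not observations from the data set, and the function name `pearson` is hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # The 1/(n - 1) factors of the covariance and the variances cancel,
    # so plain sums of deviations are enough.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up illustration values (not taken from the housing data).
rooms = [5.5, 6.0, 6.2, 6.8, 7.5]
price = [15000, 19000, 21000, 26000, 33000]
r = pearson(rooms, price)
print(f"r = {r:.4f}")
```

The sign of r gives the direction of the relationship and its absolute value gives the strength. This is how the coefficients in the matrix above are read.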
The correlation coefficient between price and dist is only 0.2493, which implies that the two are not strongly connected. As dist increases, price tends to increase, but only slightly. In addition, no pair of variables has a correlation coefficient larger than 0.8 in absolute value. The largest among the independent variables is -0.7702, between nox and dist. The model therefore does not appear to suffer from a serious multicollinearity problem.

Regression run
Having checked the correlations among the variables, we can run the regression model. In Stata, this is done with the command:
reg price crime nox rooms dist proptax

Source      SS         df    MS              Number of obs   =     506
                                             F(5, 500)       =  142.92
Model       2.52E+10     5   5.04E+09        Prob > F        =  0.0000
Residual    1.76E+10   500   35258403.7      R-squared       =  0.5883
                                             Adj R-squared   =  0.5842
Total       4.28E+10   505   84803032        Root MSE        =  5937.9

price      Coef.       Std. Err.   t       P>|t|   [95% Conf. Interval]
crime      -150.0703   38.11571    -3.94   0.000   -224.957    -75.18364
nox        -1737.66    410.7763    -4.23   0.000   -2544.72    -930.5992
rooms      7707.327    399.0772    19.31   0.000   6923.252    8491.402
dist       -791.2588   197.9444    -4.00   0.000   -1180.164   -402.3535
proptax    -89.95717   23.61555    -3.81   0.000   -136.3551   -43.55923
_cons      9060.303    3978.871    2.28    0.023   1242.937    16877.67
Figure 5

From the table above, we obtain the sample regression function:
\[ \widehat{price} = 9060.303 - 150.0703\,crime - 1737.66\,nox + 7707.327\,rooms - 791.2588\,dist - 89.95717\,proptax \]
The results show that crime, nox, rooms, dist and proptax all have statistically significant effects on price at the 5% significance level, since all of their p-values are smaller than 0.05.
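As a cross-check on Figure 5, the sketch below recomputes three sets of statistics from the numbers printed in that table:
- each t statistic, \( t = \hat\beta_j / se(\hat\beta_j) \);
- each 95% confidence interval, \( \hat\beta_j \pm t_{0.025}(500)\, se(\hat\beta_j) \);
- the fit statistics: R-squared, adjusted R-squared and the overall F statistic.

The code uses the critical value 1.965 from Section VIII. Stata uses the exact t(500) quantile, about 1.9647, so the interval bounds can differ slightly from Stata's. RSS and TSS are rebuilt from the MS and df columns because the SS column is shown rounded. The helper names `t_and_ci` and `fit_stats` are hypothetical.

```python
# Coefficients and (non-robust) standard errors copied from Figure 5.
ESTIMATES = {
    "crime": (-150.0703, 38.11571),
    "nox": (-1737.66, 410.7763),
    "rooms": (7707.327, 399.0772),
    "dist": (-791.2588, 197.9444),
    "proptax": (-89.95717, 23.61555),
    "_cons": (9060.303, 3978.871),
}
T_CRIT = 1.965  # approximate two-sided 5% critical value of t(500), as in Section VIII


def t_and_ci(coef, se, t_crit=T_CRIT):
    """t statistic for H0: beta = 0 and the matching 95% confidence interval."""
    half_width = t_crit * se
    return coef / se, (coef - half_width, coef + half_width)


def fit_stats(rss, tss, n, k):
    """R-squared, adjusted R-squared and overall F statistic with k slope coefficients."""
    r2 = 1 - rss / tss
    adj_r2 = 1 - (rss / (n - k - 1)) / (tss / (n - 1))
    f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))
    return r2, adj_r2, f_stat


n, k = 506, 5
rss = 35258403.7 * (n - k - 1)  # residual MS times residual df
tss = 84803032 * (n - 1)        # total MS times total df

for name, (coef, se) in ESTIMATES.items():
    t, (low, high) = t_and_ci(coef, se)
    print(f"{name:8s} t = {t:7.2f}   95% CI = ({low:10.2f}, {high:10.2f})")

r2, adj_r2, f_stat = fit_stats(rss, tss, n, k)
print(f"R-squared = {r2:.4f}, adj R-squared = {adj_r2:.4f}, F(5, 500) = {f_stat:.2f}")
```

Up to rounding, this reproduces the t values in Figure 5 together with R-squared = 0.5883, adjusted R-squared = 0.5842 and F = 142.92.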
The estimated coefficients can be interpreted as follows:
\( \hat\beta_0 = 9060.303 \) is the estimated intercept.
\( \hat\beta_1 = -150.0703 \) means that if crimes committed per capita increase by one, the average housing price decreases by $150.0703, other factors unchanged.
\( \hat\beta_2 = -1737.66 \) means that if the nitrogen oxide concentration increases by one part per 100 million, the average housing price decreases by $1737.66, other factors unchanged.
\( \hat\beta_3 = 7707.327 \) means that if the average number of rooms increases by one, the average housing price increases by $7707.327, other factors unchanged.
\( \hat\beta_4 = -791.2588 \) means that if the weighted distance to the 5 employment centers increases by one unit, the average housing price decreases by $791.2588, other factors unchanged.
\( \hat\beta_5 = -89.95717 \) means that if the property tax per $1000 increases by one, the average housing price decreases by $89.95717, other factors unchanged.

The coefficient of determination is R-squared = 0.5883. The independent variables (crime, nox, rooms, dist, proptax) jointly explain 58.83% of the variation in the dependent variable (price), and factors not included in the model account for the remaining 41.17%.
Other indicators:
Adjusted coefficient of determination: adj R-squared = 0.5842
Total Sum of Squares: TSS = 4.28E+10
Explained Sum of Squares: ESS = 2.52E+10
Residual Sum of Squares: RSS = 1.76E+10
Degrees of freedom of the model: dfM = 5
Degrees of freedom of the residual: dfR = 500

VII CHECK MULTICOLLINEARITY AND HETEROSCEDASTICITY
Multicollinearity
Multicollinearity is a high degree of correlation among the explanatory variables. It can make it difficult to separate out the effects of the individual regressors, because standard errors may be overestimated and t-values depressed.
Detecting multicollinearity
o Method 1: Use the corr command to examine multicollinearity. If independent variables are strongly correlated (|r| > 0.8), multicollinearity may occur.

          price     crime     nox       rooms     dist      proptax
price     1.0000
crime    -0.3879    1.0000
nox      -0.4260    0.4212    1.0000
rooms     0.6958   -0.2188   -0.3028    1.0000
dist      0.2493   -0.3799   -0.7702    0.2054    1.0000
proptax  -0.4671    0.5828    0.6670   -0.2921   -0.5344    1.0000
Figure 6

The table above shows that all correlation coefficients among the independent variables are smaller than 0.8 in absolute value, although nox and dist come close (-0.7702). We therefore conclude that severe multicollinearity does not occur in this model.
o Method 2: Use the variance inflation factor (VIF). If VIF > 10, multicollinearity is considered serious.

Variable    VIF    1/VIF
nox         3.24   0.308352
dist        2.49   0.401709
proptax     2.27   0.440742
crime       1.54   0.651256
rooms       1.13   0.888073
Mean VIF    2.13
Figure 7

The table shows that all VIF values are smaller than 10, so serious multicollinearity does not occur in this model. Based on both methods, we conclude that multicollinearity is not a worrisome problem for this data set.

Heteroskedasticity
Another problem our model may suffer from is heteroskedasticity. Under heteroskedasticity, the OLS estimators are still unbiased but are no longer efficient. In addition, the usual estimators of their variances become biased, which makes our t and F tests less reliable. Homoskedasticity is the assumption that the variance of each error term \( u_i \) stays the same as i moves from 1, 2 to n.
It can also be written as: \( Var(u_i) = Var(u_j) = \sigma^2 \) for all i, j = 1, 2, 3, ..., n. When that assumption is violated, heteroskedasticity appears.
Causes
o The nature of economic phenomena: the subjects examined differ in scale, or they are observed over periods with different levels of fluctuation.
o The model is misspecified, for example because relevant variables are omitted or the functional form is wrong.
o The data cannot fully and correctly reflect the nature of the economic phenomena. For example, outliers may appear, and including or excluding these observations has a large impact on the regression analysis.
o The error variance tends to decrease as techniques for collecting, storing and processing data improve.
o People learn from past behavior, so their errors change over time (error-learning).

Hypothesis: \( H_0 \): constant variance (homoskedasticity); \( H_1 \): heteroskedasticity.
Using the command estat hettest in Stata, we obtain:

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: fitted values of price
chi2(1)      =  26.56
Prob > chi2  =  0.0000

Since Prob > chi2 = 0.0000 < 0.05, we reject H0 and accept H1. Heteroskedasticity is present in this model.

Correcting heteroskedasticity
We use the command:
reg price crime nox rooms dist proptax, robust
and obtain the following result:

Number of obs = 506      F(5, 500) = 103.2      Prob > F = 0.0000      R-squared = 0.5883      Root MSE = 5937.9

                       Robust
price      Coef.       Std. Err.   t       P>|t|   [95% Conf. Interval]
crime      -150.0703   30.45247    -4.93   0.000   -209.9009   -90.23976
nox        -1737.66    389.6642    -4.46   0.000   -2503.241   -972.0787
rooms      7707.327    670.6304    11.49   0.000   6389.726    9024.928
dist       -791.2588   175.744     -4.50   0.000   -1136.546   -445.9712
proptax    -89.95717   26.84788    -3.35   0.001   -142.7057   -37.20862
_cons      9060.303    5398.964    1.68    0.094   -1547.148   19667.75
Figure 8

Compared with the earlier regression, none of the coefficient estimates changed. The standard errors, and hence the t-values, are different, and the robust standard errors give more reliable p-values in the presence of heteroskedasticity. Note that the constant is no longer significant at the 5% level (p = 0.094).

VIII HYPOTHESES POSTULATED
The t test
Each test below uses the robust t statistics from Figure 8 and the critical value \( t_{0.025}(500) = 1.965 \). H0 is rejected when \( |t| > 1.965 \).

Hypothesis: \( H_0: \beta_1 = 0 \); \( H_1: \beta_1 \neq 0 \)
\( |t| = 4.93 > t_{0.025}(500) = 1.965 \) => Reject H0
Conclusion: The number of crimes committed per capita has a statistically significant effect on the median housing price. The higher the number of crimes committed per capita, the lower the median housing price.

Hypothesis: \( H_0: \beta_2 = 0 \); \( H_1: \beta_2 \neq 0 \)
\( |t| = 4.46 > t_{0.025}(500) = 1.965 \) => Reject H0
Conclusion: The nitrogen oxide concentration (parts per 100 million) has a statistically significant effect on the median housing price. The higher the nitrogen oxide concentration, the lower the median housing price.

Hypothesis: \( H_0: \beta_3 = 0 \); \( H_1: \beta_3 \neq 0 \)
\( |t| = 11.49 > t_{0.025}(500) = 1.965 \) => Reject H0
Conclusion: The average number of rooms has a statistically significant effect on the median housing price. The higher the average number of rooms, the higher the median housing price.

Hypothesis: \( H_0: \beta_4 = 0 \); \( H_1: \beta_4 \neq 0 \)
\( |t| = 4.50 > t_{0.025}(500) = 1.965 \) => Reject H0
Conclusion: The weighted distance to the 5 employment centers has a statistically significant effect on the median housing price. The greater the weighted distance to the 5 employment centers, the lower the median housing price.

Hypothesis: \( H_0: \beta_5 = 0 \); \( H_1: \beta_5 \neq 0 \)
\( |t| = 3.35 > t_{0.025}(500) = 1.965 \) => Reject H0
Conclusion: The property tax per $1000 has a statistically significant effect on the median housing price. The higher the property tax per $1000, the lower the median housing price.

Confidence Intervals
Test the following hypotheses, \( H_0: \beta_j = 0 \) against \( H_1: \beta_j \neq 0 \) (j = 1, ..., 5), using the robust 95% confidence intervals from Figure 8:

Variable       Coefficient   Significance level   Confidence interval
Const          β0            5%                   (-1547.148 ; 19667.75)
X1 (crime)     β1            5%                   (-209.9009 ; -90.23976)
X2 (nox)       β2            5%                   (-2503.241 ; -972.0787)
X3 (rooms)     β3            5%                   (6389.726 ; 9024.928)
X4 (dist)      β4            5%                   (-1136.546 ; -445.9712)
X5 (proptax)   β5            5%                   (-142.7057 ; -37.20862)
Figure 9
For every slope coefficient \( \beta_1, \dots, \beta_5 \), 0 does not belong to the confidence interval. We therefore reject the hypotheses \( H_0: \beta_1 = 0 \), \( H_0: \beta_2 = 0 \), \( H_0: \beta_3 = 0 \), \( H_0: \beta_4 = 0 \) and \( H_0: \beta_5 = 0 \). The interval for the constant does contain 0, which is consistent with its robust p-value of 0.094.
Conclusion: The number of crimes committed per capita, the nitrogen oxide concentration, the average number of rooms, the weighted distance to the 5 employment centers and the property tax per $1000 all have statistically significant effects on the median housing price at the 95% confidence level.

P Value
Hypothesis testing: \( H_0: \beta_1 = 0 \); \( H_1: \beta_1 \neq 0 \)
P-value = 0.000 < 0.05 (Figure 8) => Reject H0
The number of crimes committed per capita has a statistically significant effect on the median housing price: the higher the number of crimes committed per capita, the lower the median housing price. In our sample, the estimates show that an increase of one in crimes committed per capita decreases the median housing price by $150.07, holding other factors fixed.

Hypothesis testing: \( H_0: \beta_2 = 0 \); \( H_1: \beta_2 \neq 0 \)
P-value = 0.000 < 0.05 (Figure 8) => Reject H0
The nitrogen oxide concentration has a statistically significant effect on the median housing price: the higher the nitrogen oxide concentration, the lower the median housing price. In our sample, the estimates show that an increase of one part per 100 million in the nitrogen oxide concentration decreases the median housing price by $1737.66, holding other factors fixed.

Hypothesis testing: \( H_0: \beta_3 = 0 \); \( H_1: \beta_3 \neq 0 \)
P-value = 0.000 < 0.05 (Figure 8) => Reject H0
The average number of rooms has a statistically significant effect on the median housing price: the higher the average number of rooms, the higher the median housing price. In our sample, the estimates show that one more room on average increases the median housing price by $7707.33, holding other factors fixed.

Hypothesis testing: \( H_0: \beta_4 = 0 \); \( H_1: \beta_4 \neq 0 \)
P-value = 0.000 < 0.05 (Figure 8) => Reject H0
The weighted distance to the 5 employment centers has a statistically significant effect on the median housing price: the greater the weighted distance, the lower the median housing price. In our sample, the estimates show that a one-unit increase in the weighted distance to the 5 employment centers decreases the median housing price by $791.26, holding other factors fixed.

Hypothesis testing: \( H_0: \beta_5 = 0 \); \( H_1: \beta_5 \neq 0 \)
P-value = 0.001 < 0.05 (Figure 8) => Reject H0
The property tax per $1000 has a statistically significant effect on the median housing price: the higher the property tax per $1000, the lower the median housing price. In our sample, the estimates show that a $1 increase in property tax per $1000 decreases the median housing price by $89.96, holding other factors fixed.

Testing the overall significance: The F test
This test examines whether all the slope parameters \( \beta_j \) of the independent variables can be zero at the same time. The hypotheses are as follows:
\( H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0 \)
\( H_1 \): at least one \( \beta_j \neq 0 \)
\( F = 142.92 > F_{0.05}(5, 500) \approx 2.23 \)
As a result, there is enough evidence to reject the null hypothesis and conclude that at least one independent variable has explanatory or predictive power for price. We therefore do not reduce the model by dropping these variables.

IX RESULT ANALYSIS AND POLICY IMPLICATION
The data analysis in the previous sections gives an overall view of the data set in terms of the statistical relationship between housing prices and each of the proposed factors. As mentioned at the beginning of this report, we aim to learn how neighborhood security, air pollution, house size, accessibility and property tax are associated with housing prices. In other words, we are concerned with how much buyers are willing to pay for these attributes. Based on the data analysis, the regression and the hypothesis tests, we conclude that neighborhood security, air pollution, house size, accessibility and property tax have statistically significant effects on housing prices.
Therefore, tenants, investors and developers should take all of these factors into account when making deals.

X CONCLUSION
This report was completed through the dedicated contribution of each member and the knowledge gained from our study of Econometrics. The research gave us a good opportunity to practice what we have learned and to gain a deeper understanding of data analysis and the relevant tests. We hope that our research sheds some light on the relationship between housing prices and these factors. Again, because of our limited understanding and resources, the report may contain misinterpretations. We hope that our teacher and readers can give us constructive comments so that we can improve and do better in the future.

XI REFERENCES
https://www.york.ac.uk/media/economics/documents/seminars/Hendry_Feb_2011.pdf
http://pages.hmc.edu/evans/chap1.pdf
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.926.5532&rep=rep1&type=pdf
D.A. Belsley, E. Kuh, and R. Welsch, Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, New York: Wiley (1990)