Econometrics essay: Factors that determine housing prices


FOREIGN TRADE UNIVERSITY
FACULTY OF INTERNATIONAL ECONOMICS
-o0o-

ECONOMETRIC REPORT
Topic: Factors that determine housing prices

Class: KTEE 309.1
Group No:
Student Name – ID:
Nguyen Ha Trang - 1711150066 – 40%
Nguyen Mai Thuy Tien - 1711150064 – 30%
Nguyen Thi Lan Huong - 1715150032 – 30%
Supervisor: Dr Dinh Thanh Binh
Hanoi, 2018

Group Econometric Report

Table of Contents
I Introduction
II Literature overview
   Questions of interest
   Procedure and program used
III Economic model
   Specifying the object for modeling
   Defining the target for modeling by the choice of the variables to analyze, denoted $x_i$
   Embedding that target in a general unrestricted model (GUM)
IV Econometric model
V Data collection
   Data overview
   Data description
VI Estimation of econometric model
   Checking the correlation among variables
   Regression run
VII Check multicollinearity and heteroscedasticity
   Multicollinearity
   Heteroskedasticity
VIII Hypothesis postulated
   The impact of neighborhood factors
   The impact of accessibility factors
IX Result analysis & Policy implication
X Conclusion
XI References

Exhibit 1: Definition of variables in the Housing Price model
Exhibit 2: Statistic indicators of variables in the Housing Price model
Exhibit 3: Correlation matrix
Exhibit 4: Scatterplot of variables in the Housing Price model
Exhibit 5: Regression model
Exhibit 6: Multicollinearity test
Exhibit 7: Heteroskedasticity test
Exhibit 8: Residual-versus-fitted plot of the Housing Price model
Exhibit 9: Correcting heteroskedasticity
Exhibit 10: Hypothesis testing of multiple regression model of neighborhood factors
Exhibit 11: Hypothesis testing of multiple regression model of accessibility factors

I Introduction
Just as economics is a science that bears on social development in general and national growth in particular, econometrics is the use of statistical techniques to understand those issues and test
theories. Without evidence, economic theories are abstract and might have no bearing on reality, even if they are completely rigorous. Econometrics is a set of tools we can use to confront theory with real-world data. Given the data set, our group of three members (Nguyen Ha Trang, Nguyen Mai Thuy Tien, and Nguyen Thi Lan Huong) follows an eight-step econometric methodology to analyze the data. Note that because little information accompanies the data set, all interpretations of abbreviations and similar details rest on assumptions and our own research. We hope to have shown our logic and reasoning clearly. Given the limits of purpose and resources, this report still has deficiencies, but we look forward to giving readers a sound overall view of the data set and of the knowledge we gained through Dr Dinh Thanh Binh's Econometrics course.

II Literature overview
Questions of interest
"Why do housing prices differ among locations and regions?" This is the basic question this report aims to answer. Although a variety of factors might affect housing prices, they can be divided into four main categories: structure, neighborhood, accessibility, and air pollution. Consequently, variables representing each of these categories are examined to find out whether they have a statistically significant impact on housing prices. In the following parts, models are built, the data are used to run the regression, and the results are analyzed to answer the question of interest above.

Procedure and program used
Procedure:
Step 1: Questions of interest
Step 2: Economic model
Step 3: Econometric model
Step 4: Data collection
Step 5: Estimation of econometric model
Step 6: Check multicollinearity and heteroscedasticity
Step 7: Hypothesis postulated
Step 8: Result analysis & Policy implication
The Stata program is primarily
used to analyze the data and run the regression.

III Economic model
As data are provided up front, the economic model used in this report is an empirical one. The fundamental model is mathematical; with an empirical model, however, data are gathered for the variables and accepted statistical techniques are used to estimate the model's values. Empirical model discovery and theory evaluation are suggested to involve five key steps, but given the limits of purpose and resources, this part of the report follows only three of them: (1) specifying the object for modeling, (2) defining the target for modeling, and (3) embedding that target in a general unrestricted model.

Specifying the object for modeling
$$\text{price} = f(x) \qquad (1)$$
As such, this report examines the relationship between housing price, which is the object for modeling, and each of the related factors: structure, neighborhood, accessibility, and air pollution.

Defining the target for modeling by the choice of the variables to analyze, denoted $x_i$
As mentioned above, four main categories are expected to affect housing prices: structure, neighborhood, accessibility, and air pollution. Hence, the $x_i$ are the variables that constitute these categories. After thorough research, the factors have been narrowed down to eight significant ones: (structure) number of rooms; (neighborhood) crimes, property tax, the percentage of people of low status, and student-teacher ratio; (accessibility) distances to employment centers and accessibility to radial highways; and (air pollution) nitrous oxide.

Embedding that target in a general unrestricted model (GUM)
In its simplest acceptable representation (which will later be specified in the econometric model), the GUM of (1) is determined to be:
$$\text{lprice} = f(\text{crime}, \text{nox}, \text{rooms}, \text{dist}, \text{radial}, \text{proptax}, \text{stratio}, \text{lowstat})$$
A brief description of each variable is given in Exhibit 1.

Exhibit 1: Definition of variables in the
Housing Price model
Variable    Definition
lprice      logarithm of median housing price, $
crime       crimes committed per capita
nox         nitrous oxide, parts per 100 million
rooms       average number of rooms per house
dist        weighted distances to employment centers
radial      accessibility index to radial highways
proptax     property tax per $1000
stratio     average student-teacher ratio
lowstat     percentage of people of low status

IV Econometric model
To demonstrate the relationship between housing price and the other factors, the regression functions are constructed as follows:
(PRF): $$\text{lprice}_i = \beta_0 + \beta_1 \text{crime}_i + \beta_2 \text{nox}_i + \beta_3 \text{rooms}_i + \beta_4 \text{dist}_i + \beta_5 \text{radial}_i + \beta_6 \text{proptax}_i + \beta_7 \text{stratio}_i + \beta_8 \text{lowstat}_i + u_i$$
(SRF): $$\text{lprice}_i = \hat\beta_0 + \hat\beta_1 \text{crime}_i + \hat\beta_2 \text{nox}_i + \hat\beta_3 \text{rooms}_i + \hat\beta_4 \text{dist}_i + \hat\beta_5 \text{radial}_i + \hat\beta_6 \text{proptax}_i + \hat\beta_7 \text{stratio}_i + \hat\beta_8 \text{lowstat}_i + \hat u_i$$
where:
- $\beta_0$ is the intercept of the regression model
- $\beta_i$ is the slope coefficient of the independent variable $x_i$
- $u_i$ is the disturbance of the regression model
- $\hat\beta_0$ is the estimator of $\beta_0$
- $\hat\beta_i$ is the estimator of $\beta_i$
- $\hat u_i$ is the residual (the estimator of $u_i$)
From this model, this report is interested in explaining lprice in terms of each of the eight independent variables (crime, nox, rooms, dist, radial, proptax, stratio, lowstat).

V Data collection
Data overview
- This data set is a secondary one, as it was collected from a given source.
- Data source: Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, by D.A. Belsley, E. Kuh, and R. Welsch, 1990, New York: Wiley.
- Structure of the economic data: cross-sectional data.

Data description
To get summary statistics of the variables, the following Stata command is used:
    sum lprice crime nox rooms dist radial proptax stratio lowstat
The result is shown in Exhibit 2.

Exhibit 2: Statistic indicators of variables in the Housing Price model
Variable    Obs    Mean        Std. Dev.    Min         Max
lprice      506    9.941057    0.4092549    8.517193    10.8198
crime       506    3.611536    8.590247     0.006       88.976
nox         506    5.549783    1.158395     3.85        8.71
rooms       506    6.284051    0.7025938    3.56        8.78
dist        506    3.795751    2.106137     1.13        12.13
radial      506    9.549407    8.707259     1           24
proptax     506    40.82372    16.85371     18.7        71.1
stratio     506    18.45929    2.16582      12.6        22
lowstat     506    12.70148    7.238066     1.73        39.07
where:
- Obs is the number of observations
- Mean is the average value of the variable
- Std. Dev. is the standard deviation of the variable
- Min is the minimum value of the variable
- Max is the maximum value of the variable

VI Estimation of econometric model
Checking the correlation among variables
First of all, the correlation between lprice and crime, nox, rooms, dist, radial, proptax, stratio, and lowstat is checked by calculating the correlation coefficients among these variables. The correlation coefficient r measures the strength and direction of a linear relationship between two variables on a scatterplot. In Stata, the correlation matrix is generated with the command:
    corr lprice crime nox rooms dist radial proptax stratio lowstat
The result is shown in Exhibit 3.

Exhibit 3: Correlation matrix
            lprice    crime     nox       rooms     dist      radial    proptax   stratio   lowstat
lprice      1.0000
crime      -0.5275    1.0000
nox        -0.5088    0.4212    1.0000
rooms       0.6329   -0.2188   -0.3028    1.0000
dist        0.3420   -0.3799   -0.7702    0.2054    1.0000
radial     -0.4810    0.6254    0.6103   -0.2098   -0.4951    1.0000
proptax    -0.5597    0.5828    0.6670   -0.2921   -0.5344    0.9102    1.0000
stratio    -0.4976    0.2887    0.1869   -0.3540   -0.2293    0.4642    0.4542    1.0000
lowstat    -0.7914    0.4470    0.5856   -0.6096   -0.4956    0.4760    0.5276    0.3654    1.0000

From the matrix, it can be inferred that the correlation between lprice and each of the independent variables is strong enough to justify running the regression model. Specifically:
- lprice and crime have a moderate downhill relationship
- lprice and nox have a moderate downhill relationship
- lprice and rooms have a moderate uphill relationship
- lprice and dist have a weak uphill relationship
- lprice and radial have a moderate downhill relationship
- lprice and proptax have a moderate downhill relationship
- lprice and stratio have a moderate downhill relationship
- lprice and lowstat have
a strong downhill relationship
The correlation between each pair of variables can be visualized using the scatter command in Stata. The result is shown in Exhibit 4.

Exhibit 4: Scatterplot of variables in the Housing Price model

Regression run
Having checked the correlation among variables, the regression model is ready to run. In Stata, this is done with the command:
    reg lprice crime nox rooms dist radial proptax stratio lowstat
The result is shown in Exhibit 5.

Exhibit 5: Regression model
Source      SS            df     MS
Model       64.8618936    8      8.1077367
Residual    19.7203314    497    0.039678735
Total       84.582225     505    0.167489554

Number of obs  = 506
F(8, 497)      = 204.33
Prob > F       = 0.0000
R-squared      = 0.7669
Adj R-squared  = 0.7631
Root MSE       = 0.1992

lprice     Coef.         Std. Err.    t         P>|t|    [95% Conf. Interval]
crime      -0.0111825    0.0013614    -8.21     0.000    -0.0138573    -0.0085078
nox        -0.0754564    0.0146936    -5.14     0.000    -0.1043256    -0.0465873
rooms       0.0996545    0.0167697     5.94     0.000     0.0667061     0.1326028
dist       -0.0463708    0.0067557    -6.86     0.000    -0.0596441    -0.0330975
radial      0.0133694    0.0026525     5.04     0.000     0.008158      0.0185808
proptax    -0.0062133    0.0013807    -4.50     0.000    -0.008926     -0.0035006
stratio    -0.0413327    0.0050633    -8.16     0.000    -0.0512807    -0.0313846
lowstat    -0.0280384    0.0019154    -14.64    0.000    -0.0318016    -0.0242752
_cons       11.19507     0.2037294     54.95    0.000     10.79479      11.59535

From the result, it can be inferred that:
- crime, nox, rooms, dist, radial, proptax, stratio and lowstat all have statistically significant effects on lprice at the 5% significance level (all p-values are smaller than 0.05). In particular, those effects are given by the regression coefficients as follows:
- $\hat\beta_0 = 11.1951$: When all the independent variables are zero, the expected value of lprice is 11.1951, corresponding to a housing price of about $e^{11.1951}$ (roughly $72,800, taking lprice as the natural logarithm)
- $\hat\beta_1 = -0.0112$: When the number of crimes committed per capita increases by one, the expected value of housing price decreases by 1.12%
- $\hat\beta_2 = -0.0755$: When nitrous oxide increases by one part per 100 million, the expected value of
housing price decreases by 7.55%
- $\hat\beta_3 = 0.0997$: When the number of rooms increases by one, the expected value of housing price increases by 9.97%
- $\hat\beta_4 = -0.0464$: When the distance to employment centers increases by one unit, the expected value of housing price decreases by 4.64%
- $\hat\beta_5 = 0.0134$: When the accessibility index to radial highways increases by one unit, the expected value of housing price increases by 1.34%
- $\hat\beta_6 = -0.0062$: When the property tax per $1000 increases by $1, the expected value of housing price decreases by 0.62%
- $\hat\beta_7 = -0.0413$: When the student-teacher ratio increases by one, the expected value of housing price decreases by 4.13%
- $\hat\beta_8 = -0.0280$: When the percentage of people of low status increases by one percentage point, the expected value of housing price decreases by 2.80%
- The coefficient of determination $R^2 = 0.7669$: all independent variables (crime, nox, rooms, dist, radial, proptax, stratio, lowstat) jointly explain 76.69% of the variation in the dependent variable (lprice); other factors that are not included account for the remaining 23.31% of the variation in lprice.
- Other indicators:
  - Adjusted coefficient of determination adj R-squared = 0.7631
  - Total Sum of Squares TSS = 84.5822
  - Explained Sum of Squares ESS = 64.8619
  - Residual Sum of Squares RSS = 19.7203
  - The degrees of freedom of the model Dfm = 8
  - The degrees of freedom of the residual Dfr = 497
- Based on the data in the table, the sample regression function is established (coefficients rounded):
$$\text{SRF}: \text{lprice} = 11.2 - 0.01\,\text{crime} - 0.08\,\text{nox} + 0.1\,\text{rooms} - 0.05\,\text{dist} + 0.01\,\text{radial} - 0.01\,\text{proptax} - 0.04\,\text{stratio} - 0.03\,\text{lowstat} + \hat u$$

VII Check multicollinearity and heteroscedasticity
Multicollinearity
Multicollinearity is a high degree of correlation among the explanatory variables, which may make it difficult to separate out the effects of the individual regressors: standard errors may be inflated and t-values depressed. Multicollinearity can be detected by examining
the correlation matrix of the regressors and carrying out auxiliary regressions among them. In Stata, the vif command is used, which stands for variance inflation factor. Exhibit 6 shows the result.

Exhibit 6: Multicollinearity test
Variable    VIF     1/VIF
proptax     6.89    0.145103
radial      6.79    0.147301
nox         3.69    0.271206
dist        2.58    0.388106
lowstat     2.45    0.408804
rooms       1.77    0.565985
crime       1.74    0.574531
stratio     1.53    0.653369
Mean VIF    3.43

Every VIF here is lower than 10, indicating that multicollinearity is not too worrisome a problem for this data set.

Heteroskedasticity
Heteroskedasticity means that the variance of the error term is not constant, so the least squares results are no longer efficient and the t test and F test results may be misleading. It can be detected by plotting the residuals against each of the regressors or, more formally, with White's test. It can be remedied by respecifying the model, for example by looking for omitted variables. In Stata, the imtest, white command is used (imtest stands for information matrix test). Exhibit 7 shows the result.

Exhibit 7: Heteroskedasticity test
imtest, white
White's test for Ho: homoskedasticity
         against Ha: unrestricted heteroskedasticity
chi2(44)     = 235.31
Prob > chi2  = 0.0000

Cameron & Trivedi's decomposition of IM-test
Source                chi2      df    p
Heteroskedasticity    235.31    44    0.0000
Skewness              34.20     8     0.0000
Kurtosis              12.38     1     0.0004
Total                 281.89    53    0.0000

At the 5% significance level, there is enough evidence to reject the null hypothesis and conclude that this data set exhibits heteroskedasticity. Another way to check for heteroskedasticity is the residual-versus-fitted plot, which can be generated with the rvfplot, yline(0) command in Stata. The result is shown in Exhibit 8.

Exhibit 8: Residual-versus-fitted plot of the Housing Price model

In a well-fitted model, there should be no pattern to the residuals plotted against the fitted values
(something not true of our model). Ignoring the outliers at the top center of the graph, we see curvature in the pattern of the residuals, suggesting a violation of the assumption that lprice is linear in our independent variables. We might also have seen increasing or decreasing variation in the residuals, that is, heteroskedasticity. To fix the problem, robust standard errors are used, which relax the assumption that the errors are independent and identically distributed. In Stata, the regression is rerun with the robust option, using the command:
    reg lprice crime nox rooms dist radial proptax stratio lowstat, robust
Exhibit 9 shows the result.

Exhibit 9: Correcting heteroskedasticity
Linear regression
Number of obs  = 506
F(8, 497)      = 179.01
Prob > F       = 0.0000
R-squared      = 0.7669
Root MSE       = 0.1992

lprice     Coef.         Robust Std. Err.    t        P>|t|    [95% Conf. Interval]
crime      -0.0111825    0.0019035           -5.87    0.000    -0.0149225    -0.0074426
nox        -0.0754564    0.0150626           -5.01    0.000    -0.1050506    -0.0458622
rooms       0.0996545    0.025796             3.86    0.000     0.0489718     0.1503372
dist       -0.0463708    0.0068001           -6.82    0.000    -0.0597312    -0.0330103
radial      0.0133694    0.0029003            4.61    0.000     0.0076711     0.0190677
proptax    -0.0062133    0.0013641           -4.55    0.000    -0.0088935    -0.0035331
stratio    -0.0413327    0.0042322           -9.77    0.000    -0.0496478    -0.0330175
lowstat    -0.0280384    0.003584            -7.82    0.000    -0.0350801    -0.0209967
_cons       11.19507     0.2672806            41.89   0.000     10.66993      11.72021

Note that compared with the earlier regression, none of the coefficient estimates changed, but the standard errors, and hence the t values, are different, which gives more reliable p-values.

VIII Hypothesis postulated
The impact of neighborhood factors
The question of interest: in the multiple regression model
$$\text{lprice}_i = \beta_0 + \beta_1 \text{crime}_i + \beta_2 \text{nox}_i + \beta_3 \text{rooms}_i + \beta_4 \text{dist}_i + \beta_5 \text{radial}_i + \beta_6 \text{proptax}_i + \beta_7 \text{stratio}_i + \beta_8 \text{lowstat}_i + u_i$$ (full model)
does the subset of independent variables (crime, proptax, lowstat, stratio) contribute to explaining or predicting lprice?
Or would it do just as well if these variables were dropped and the model were reduced to
$$\text{lprice}_i = \beta_0 + \beta_2 \text{nox}_i + \beta_3 \text{rooms}_i + \beta_4 \text{dist}_i + \beta_5 \text{radial}_i + u_i$$ (reduced model)
From this question, the following hypotheses are postulated:
Null hypothesis: the subset does not contribute to the model's explanatory power.
Alternative hypothesis: at least one of the independent variables in the subset is useful in explaining or predicting lprice,
which is expressed as:
$$H_0: \beta_1 = \beta_6 = \beta_7 = \beta_8 = 0$$
$$H_1: \text{at least one } \beta_j \neq 0, \quad j \in \{1, 6, 7, 8\}$$
In Stata, the test statistic F is calculated using the command:
    test crime proptax lowstat stratio
The result is shown in Exhibit 10.

Exhibit 10: Hypothesis testing of multiple regression model of neighborhood factors
( 1)  crime = 0
( 2)  proptax = 0
( 3)  lowstat = 0
( 4)  stratio = 0
F(4, 497) = 112.88
Prob > F  = 0.0000

As a result, there is enough evidence to reject the null hypothesis and conclude that at least one independent variable in the subset (crime, proptax, stratio, lowstat) has explanatory or predictive power for lprice, so we do not reduce the model by dropping this subset.

The impact of accessibility factors
The question of interest: in the multiple regression model
$$\text{lprice}_i = \beta_0 + \beta_1 \text{crime}_i + \beta_2 \text{nox}_i + \beta_3 \text{rooms}_i + \beta_4 \text{dist}_i + \beta_5 \text{radial}_i + \beta_6 \text{proptax}_i + \beta_7 \text{stratio}_i + \beta_8 \text{lowstat}_i + u_i$$ (full model)
does the subset of independent variables (dist, radial) contribute to explaining or predicting lprice?
Or would it do just as well if these variables were dropped and the model were reduced to
$$\text{lprice}_i = \beta_0 + \beta_1 \text{crime}_i + \beta_2 \text{nox}_i + \beta_3 \text{rooms}_i + \beta_6 \text{proptax}_i + \beta_7 \text{stratio}_i + \beta_8 \text{lowstat}_i + u_i$$ (reduced model)
From this question, the following hypotheses are postulated:
Null hypothesis: the subset does not contribute to the model's explanatory power.
Alternative hypothesis: at least one of the independent variables in the subset is useful in explaining or predicting lprice,
which is expressed as:
$$H_0: \beta_4 = \beta_5 = 0$$
$$H_1: \text{at least one } \beta_j \neq 0, \quad j \in \{4, 5\}$$
In Stata, the test statistic F is calculated using the command:
    test dist radial
The result is shown in Exhibit 11.

Exhibit 11: Hypothesis testing of multiple regression model of accessibility factors
( 1)  dist = 0
( 2)  radial = 0
F(2, 497) = 34.91
Prob > F  = 0.0000

As a result, there is enough evidence to reject the null hypothesis and conclude that at least one independent variable in the subset (dist, radial) has explanatory or predictive power for lprice, so we do not reduce the model by dropping this subset.

IX Result analysis & Policy implication
From the data analysis in the preceding sections, we have gained an overall view of the data set in terms of the statistical evidence for the relationship between housing prices and each of the proposed factors. As mentioned at the beginning of this report, we aim to learn how structure, neighborhood, accessibility, and air pollution features are associated with housing price. In other words, we are concerned with buyers' willingness to pay for these components. Following the data analysis, the regression run and the hypothesis testing, it can be concluded that structure, neighborhood, accessibility, and air pollution factors have a statistically significant effect on housing prices. Therefore, tenants, investors and constructors should take all of these factors into account when making deals.

X Conclusion
This report
is completed thanks to the dedicated contribution of each member and the knowledge from our study of Econometrics. It also gave us a good opportunity to practice what we have learned and to gain a deeper understanding of data analysis and the relevant tests. From this application, we hope that our work sheds some light on the relationship between housing prices and structure, neighborhood, accessibility, and air pollution factors. Again, due to the limits of our understanding and resources, our report may contain misinterpretations. We hope that Dr Dinh Thanh Binh and readers can give us constructive comments on the report so that we can improve and do better in the future.
Sincerely,
Group

XI References
https://www.york.ac.uk/media/economics/documents/seminars/Hendry_Feb_2011.pdf
http://pages.hmc.edu/evans/chap1.pdf
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.926.5532&rep=rep1&type=pdf
D.A. Belsley, E. Kuh, and R. Welsch, Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, New York: Wiley (1990)
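
As a cross-check on the regression output in Exhibit 5, the fit statistics can be recomputed from the reported sums of squares. The sketch below is written in Python rather than Stata only so that it runs without the data set; the function names fit_statistics and exact_percent_change are illustrative and do not come from the report. The second helper shows the exact percentage change, 100(e^β − 1), for which the "β × 100%" readings in Section VI are an approximation in a log-level model.

```python
import math

# Figures copied from Exhibit 5 (OLS output for lprice).
ESS = 64.8618936   # explained (model) sum of squares
RSS = 19.7203314   # residual sum of squares
N_OBS = 506        # number of observations
K = 8              # number of regressors, excluding the constant


def fit_statistics(ess, rss, n, k):
    """Return R-squared, adjusted R-squared, F statistic and root MSE."""
    tss = ess + rss                 # total sum of squares
    df_model = k                    # degrees of freedom of the model
    df_resid = n - k - 1            # degrees of freedom of the residual
    r2 = ess / tss
    adj_r2 = 1 - (rss / df_resid) / (tss / (n - 1))
    f_stat = (ess / df_model) / (rss / df_resid)
    root_mse = math.sqrt(rss / df_resid)
    return {"r2": r2, "adj_r2": adj_r2, "f": f_stat, "root_mse": root_mse}


def exact_percent_change(beta):
    """Exact % change in price for a one-unit rise in x in a log-level model."""
    return 100 * (math.exp(beta) - 1)


stats = fit_statistics(ESS, RSS, N_OBS, K)

if __name__ == "__main__":
    for name, value in stats.items():
        print(f"{name}: {value:.4f}")
    # rooms coefficient from Exhibit 5; the approximate reading is 9.97%
    print(f"rooms, exact % change: {exact_percent_change(0.0996545):.2f}")
```

The recomputed values should agree with Exhibit 5 up to rounding. For small coefficients such as those of crime or proptax the exact and approximate percentage changes are nearly identical; the gap is widest for rooms, the largest coefficient in absolute value.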

Date posted: 22/06/2020, 21:30
