NATIONAL ECONOMICS UNIVERSITY
SCHOOL OF ADVANCED EDUCATION PROGRAM

SUBJECT: ECONOMETRICS
TITLE: EMPIRICAL ANALYSIS: FACTORS AFFECTING HOUSE PRICES IN BRISTOL OVER THE PERIOD 2019 TO 2020

Lecturer: Ph.D Nguyen Manh The
Class: ECONOMETRICS
Group:
Hanoi, 12/2022

Table of Contents
I Introduction
II Data characteristics
III Model Estimation
  1 Linear Regression Model
    1.1 Significance test
      1.1.1 For overall model
      1.1.2 For coefficients
    1.2 Goodness of fit
  2 Other model specifications
    2.1 Logarithm transformation model
    2.2 Interaction terms model
    2.3 Model estimated following the drop test of model (1)
IV Diagnostic test
  1 Multicollinearity
    1.1 Consequences
    1.2 Detection
    1.3 Remedy
  2 Heteroscedasticity
    2.1 Detection
    2.2 Consequences
    2.3 Remedy
  3 Serial correlation
    3.1 Detection
    3.2 Consequences
    3.3 Remedy
  4 Model specification error: incorrect functional form and omitted variables
    4.1 Detection
      4.1.1 Ramsey RESET test
      4.1.2 Jarque-Bera test
    4.2 Consequences
    4.3 Remedy
V Conclusion
VI Appendix

I Introduction:

Our research aims to give a comprehensive view of house prices in Bristol. We examine how these prices relate to a set of independent variables, based on empirical analysis.

The data contain property prices (sale price in £) and some characteristics of the properties. They cover 9151 property sales in Bristol during 2019 and 2020. The independent variables are:
- Size
- Number of rooms
- Age
- Detached
- Flat

Specifically, our group uses OLS regression to analyse how price depends on size, number of rooms, age, detached and flat. Given the data description, we specify the following population regression model:

$$Price_i = \beta_1 + \beta_2 Size_i + \beta_3 Numberrooms_i + \beta_4 Age_i + \beta_5 Detached_i + \beta_6 Flat_i + u_i \qquad (1)$$

where:
- Price = sale price of the property in Bristol (£)
- Size = total floor area (m²)
- Numberrooms = number of habitable rooms
- Age = 1 if part of the building was constructed before 2002, 0 otherwise
- Detached = 1 if the property is a detached house, 0 otherwise
- Flat = 1 if the property is a flat/maisonette, 0 otherwise

The summary of model (1) in R is presented in Figure 1 in the Appendix.

Because Detached and Flat are dummy variables, the estimated model gives a separate intercept for each property type. The sample regression function is:

$$\widehat{Price}_i = \hat\beta_1 + 4145.18\,Size_i + 1130.13\,Numberrooms_i + 6990.02\,Age_i + 67733.04\,Detached_i + 50140.83\,Flat_i$$

The intercept therefore differs by property type:
- detached houses: $\hat\beta_1 + 67733.04$
- flats/maisonettes: $\hat\beta_1 + 50140.83$
- other properties: $\hat\beta_1$

Explanation of coefficients (holding the other independent variables fixed):
- $\hat\beta_2 = 4145.18$: when the total floor area increases by 1 m², the property price in Bristol increases by £4,145.18 on average.
- $\hat\beta_3 = 1130.13$: when the number of habitable rooms increases by 1, the property price in Bristol increases by £1,130.13 on average.
- $\hat\beta_4 = 6990.02$: a property with part of the building constructed before 2002 is priced £6,990.02 higher on average than one constructed from 2002 onwards.
- $\hat\beta_5 = 67733.04$: a detached house is priced £67,733.04 higher on average than a property in the base category (neither a detached house nor a flat/maisonette).
- $\hat\beta_6 = 50140.83$: a flat/maisonette is priced £50,140.83 higher on average than a property in the base category.

Expected signs: $\beta_2 > 0$, $\beta_3 > 0$, $\beta_4 > 0$, $\beta_5 > 0$, $\beta_6 > 0$. The estimated signs agree with our expectations.

II Data characteristics:

After analysing the data in R, we obtained scatter diagrams of house prices against floor area, number of rooms and age for properties sold in Bristol during the two-year period 2019-2020.

The scatter plot of price against floor area shows how the two variables are related. Most of the properties sold have a floor area below 200 m² and a price of about £1 million or less. Overall, the plot shows a moderately strong,
positive, linear association between house price and floor area. There are a few potential outliers, but they are relatively unremarkable. This pattern suggests that buyers tend to purchase smaller houses that are more reasonably priced.

Regarding the number of rooms, there is an upward trend: prices increase as the residences contain more rooms. Houses with a moderate number of rooms are sold far more often than houses with more than 10 rooms. Properties with 12 or more rooms account for only a trivial share of the sales.

Using R, we calculated the mean prices of detached houses and flats/maisonettes: £612,231 and £245,175, respectively. The average price of detached houses sold is therefore about 2.5 times that of flats/maisonettes. This suggests that a detached house represents a substantially larger investment than a flat/maisonette. The R output is shown in Figure 2 in the Appendix.

III Model Estimation:

1 Linear Regression Model:

1.1 Significance test

1.1.1 For overall model
To test the overall significance of model (1), we have:
H0: $\beta_2 = \beta_3 = \beta_4 = \beta_5 = \beta_6 = 0$ (the model is not significant)
H1: at least one of these coefficients is non-zero (the model is significant)
→ Reject H0 → The model is significant.

1.1.2 For coefficients
For each coefficient we test H0: $\beta_j = 0$ against H1: $\beta_j \neq 0$, using the output in Figure 1 in the Appendix.
• Size: H0: $\beta_2 = 0$; H1: $\beta_2 \neq 0$ → Reject H0 → The ‘Size’ variable is significant.
• Numberrooms: H0: $\beta_3 = 0$; H1: $\beta_3 \neq 0$ → Do not reject H0 → The ‘Numberrooms’ variable is not significant.
• Age: H0: $\beta_4 = 0$; H1: $\beta_4 \neq 0$ → Do not reject H0 → The ‘Age’ variable is not significant.
• Detached: H0: $\beta_5 = 0$; H1: $\beta_5 \neq 0$ → Reject H0 → The ‘Detached’ variable is significant.
• Flat: H0: $\beta_6 = 0$; H1: $\beta_6 \neq 0$ → Reject H0 → The ‘Flat’ variable is significant.

1.2 Goodness of fit
Coefficient of determination: R² = 71.11%. About 71.11% of the variation in property prices in Bristol is explained by model (1).

2 Other model specifications:

2.1 Logarithm transformation model
* Test of overall significance:
H0: the model is not significant
H1: the model is significant
→ Reject H0 → The model is significant.

* Elasticities of the regressors:
In this specification the dependent variable is the logarithm of price. The coefficient on a logged regressor is an elasticity: the percentage change in price associated with a 1% change in that regressor, holding the other regressors fixed. A dummy variable with coefficient b is read differently. The price of properties with the dummy equal to 1 differs from those with the dummy equal to 0 by approximately 100·b percent, or exactly $100(e^{b} - 1)$ percent.
- Size, 0.99305: when the total floor area increases by 1%, the property price in Bristol increases by about 0.99305%.
- Numberrooms, 0.01223: when the number of habitable rooms increases by 1%, the property price increases by about 0.01223%.
- Age, 0.02771: houses whose building was constructed after 2002 are priced higher than those constructed before 2002, by about 2.77%.
- Detached, 0.17337: detached houses are priced about 17.3% higher than the base category (exactly $100(e^{0.17337} - 1) \approx 18.9\%$).
- Flat, 0.12023: flats/maisonettes are priced about 12.0% higher than the base category (exactly $100(e^{0.12023} - 1) \approx 12.8\%$).

2.2 Interaction terms model:

An interaction effect occurs when the
effect of one variable depends on the value of another variable. Such effects make the model more complex, but if the real world behaves this way, it is important to include them in the model. We have added two interaction terms, Size × Detached and Size × Flat:
- The coefficient on Size × Detached measures the difference in the rate of change of the average property price with respect to total floor area between detached houses and the base category.
- The coefficient on Size × Flat measures the same difference for flats/maisonettes.

2.3 Model estimated following the drop test of model (1):
A non-significant coefficient means that the collected data give no evidence of a relationship between that variable and the price. Non-significant independent variables can therefore be dropped, and the regression re-estimated, to obtain a model that contains only significant variables. The models obtained by dropping different variables are shown in Figures 8 to 13 in the Appendix.

Taking all the proposed models into account, we conclude that the best estimated model is the one with the highest R² and adjusted R². Models (1) and (4) have the same R², but model (4) is preferable because it contains fewer independent variables, which simplifies the analysis. Since $\bar R^2 = 1 - (1 - R^2)\frac{n-1}{n-k}$, an equal R² with fewer regressors k also gives model (4) an adjusted R² at least as high.

IV Diagnostic test

1 Multicollinearity

1.1 Consequences
The consequences of multicollinearity are as follows:
- If there is perfect collinearity among the X's, their regression coefficients are indeterminate and their standard errors are not defined.
- If collinearity is high but not perfect, the regression coefficients can be estimated, but their standard errors tend to be large. As a result, the population values of the coefficients cannot be estimated precisely. However, if the objective is to estimate linear combinations of these coefficients (the estimable functions), this can be done even in the presence of perfect multicollinearity.

1.2 Detection
- To detect multicollinearity, we use the Variance
inflation factor (VIF).
- The result is shown in Figure 3 in the Appendix.
- According to the calculation in RStudio, the VIFs for Size, Detached and Flat are 1.294230, 1.117006 and 1.176558, respectively.
- By the rule of thumb, a VIF below 10 indicates no serious multicollinearity. All three VIFs are well below 10, so model (4) shows no sign of multicollinearity.

1.3 Remedy
No remedy is needed.

2 Heteroscedasticity

2.1 Detection
We conducted the Breusch-Pagan test, whose result is shown in Figure 4 in the Appendix, with the hypotheses:
H0: homoscedasticity
H1: heteroscedasticity
→ Reject H0 → There is heteroscedasticity in model (4).

2.2 Consequences
- The OLS estimators and the regression predictions based on them remain unbiased and consistent.
- The OLS estimators are no longer BLUE (Best Linear Unbiased Estimators) because they are no longer efficient. The regression predictions are therefore inefficient too.
- The usual OLS estimator of the covariance matrix of the estimated coefficients is biased and inconsistent. As a result, the usual hypothesis tests (t-test, F-test) are no longer valid.

2.3 Remedy
- When $\sigma_i^2$ is known: use the method of weighted least squares (WLS); the estimators obtained are BLUE.
- When $\sigma_i^2$ is unknown: use White's heteroscedasticity-consistent variances and standard errors. Alternatively, transform the model under a plausible assumption about the heteroscedasticity pattern, i.e. a hypothesised relationship between the error variance and one of the explanatory variables:
  - error variance proportional to $X_i$: square-root transformation (divide the model by $\sqrt{X_i}$);
  - error variance proportional to the square of the mean value of Y: divide the model by $E(Y_i)$;
  - a log transformation of the model.

3 Serial correlation

3.1 Detection
To diagnose serial correlation, we conduct the Breusch-Godfrey test, whose result is shown in Figure 5 in the Appendix:
H0: no serial correlation
H1: serial correlation of order 2 (2 years, 2019-2020)
→ Reject H0
→ There is serial
correlation in model (4).

3.2 Consequences
- Pure serial correlation does not bias the regression coefficient estimates, but it does reduce their efficiency. With positive serial correlation, the OLS estimates of the standard errors are smaller than the true standard errors. This leads to the conclusion that the parameter estimates are more precise than they really are.
- Serial correlation means that OLS is no longer the minimum-variance estimator.
- Serial correlation makes the estimated variances of the regression coefficients biased, so hypothesis tests are unreliable: the t-statistics appear more significant than they really are.

3.3 Remedy
- Check whether the autocorrelation is pure autocorrelation rather than the result of model misspecification, i.e. omitted important variables or an incorrect functional form.
- If it is pure autocorrelation, use some form of generalized least squares (GLS).
- Use the Newey-West method to obtain corrected standard errors of the estimated regression coefficients. The coefficient estimates themselves are kept, since they are still unbiased.
- Ignore the problem and continue to use OLS.
- In addition, many researchers transform the variables into logarithmic form to correct the functional form. They then test the transformed model for heteroscedasticity using the methods above.

4 Model specification error: incorrect functional form and omitted variables

4.1 Detection
To diagnose model specification error, we conduct two tests: the Ramsey RESET test and the Jarque-Bera test.

4.1.1 Ramsey RESET test
We conduct the Ramsey RESET test for omitted variables and incorrect functional form. The result is shown in Figure 6 in the Appendix.
H0: no specification error
H1: the model has an incorrect functional form or omitted variables
→ Reject H0
→ There
exists either an incorrect functional form or omitted variables in model (4).

4.1.2 Jarque-Bera test
- The Jarque-Bera test checks the assumption that the error term u is normally distributed. The result is presented in Figure 7 in the Appendix.
H0: u is normally distributed
H1: u is not normally distributed
→ Reject H0 → u is not normally distributed, which may indicate an incorrect functional form or omitted variables in model (4).

4.2 Consequences
- Estimation of a misspecified model may yield results that are incorrect or misleading.
- Specification error can occur with any sort of statistical model, although some models and estimation methods are much less affected by it than others. Estimation methods that are unaffected by certain types of specification error are called robust.
- Two types of specification error are considered here: omitting a relevant variable and including an irrelevant variable.
  - When relevant variables are omitted from a model, the consequences can be very serious. The OLS estimators of the variables retained in the model are not only biased but also inconsistent. In addition, the variances and standard errors of these coefficients are incorrectly estimated, which invalidates the usual hypothesis-testing procedures.
  - Fortunately, the consequences of including irrelevant variables are less serious. The estimators of the coefficients of both the relevant and the irrelevant variables remain unbiased and consistent, and the error variance is still correctly estimated. The only problem is that the estimated variances tend to be larger than necessary, so the parameters are estimated less precisely; that is, the confidence intervals tend to be wider than necessary.
- The normal distribution underlies much of statistical theory, and many statistical tests require the errors, or the test statistic, to follow a normal distribution. The test statistic's distribution cannot be assessed directly without
resampling procedures, so the conventional approach is to test the deviations from the model predictions (the residuals). For correlation coefficients this is equivalent to testing how the raw data are distributed, but this is not true for most other models, including regression models. Because of their limited power, tests of normality can be very misleading in small samples. In some cases normality cannot be assumed, especially when a large sample is not available. In other cases the distribution may be skewed to the left or right, depending on the parameter measured.

4.3 Remedy
If the data follow a normal distribution, parametric methods can be used for the analysis. When the data do not follow a normal distribution, we can transform the data (e.g. a logarithmic transformation) or use a statistical method that does not rely on a distributional assumption. The remedies are often not easy. The use of instrumental or proxy variables is theoretically attractive but not always practical. It is therefore very important in practice that the researcher state the sources of the data carefully. Data collected by official agencies often come with several footnotes, and the researcher should bring those to the attention of the reader.

V Conclusion:
In general, the estimated model shows which factors influenced house prices in Bristol, UK, over the two-year period 2019 to 2020. According to our empirical analysis, the model explains about 71.11% of the variation in property prices. Nevertheless, several problems remain in the model that need remedial action.

VI Appendix:
Figure 1: Summary of model
Figure 2: Average price of detached and flat type of property
Figure 3: VIF test for multicollinearity
Figure 4: Breusch-Pagan test for heteroskedasticity
Figure 5: Breusch-Godfrey test
Figure 6: Ramsey RESET test
Figure 7: Jarque-Bera test
Figure 8: Dropping “Numberrooms” variable
Figure 9: Dropping “Size” variable
Figure 10: Dropping “Flat” variable
Figure 11: Dropping “Detached” variable
Figure 12: Dropping “Age” variable
Figure 13: Dropping “Age” and “Numberrooms” variables
Figure 14: Summary of Log function
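
The diagnostic statistics in Section IV were computed in R, and their output (Figures 3, 4, 6 and 7) is not reproduced in this text. The sketch below gives a rough idea of what the VIF, Breusch-Pagan, RESET and Jarque-Bera statistics compute. It uses Python with NumPy on synthetic data.

Everything in it is an illustrative assumption:
- The helper names (`ols`, `t_stats`, `vif`, `breusch_pagan`, `jarque_bera`, `reset_f`) are hypothetical re-implementations of textbook formulas. They are not the authors' R code.
- The data are randomly generated with made-up coefficients and an error variance that grows with Size.
- The regressors only mimic the three variables (Size, Detached, Flat) whose VIFs are reported in Section IV.
- The Breusch-Godfrey test is not included.

The printed numbers will not match the Bristol results.

```python
import numpy as np

# Hypothetical helpers re-implementing textbook formulas; not the authors' R code.

def ols(X, y):
    """OLS fit; X must already include a column of ones. Returns (beta, residuals, R^2)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return beta, resid, r2


def t_stats(X, y):
    """Classical t-statistics for H0: beta_j = 0 (assumes homoscedastic errors)."""
    beta, resid, _ = ols(X, y)
    n, k = X.shape
    sigma2 = (resid @ resid) / (n - k)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta / se


def vif(X):
    """VIF_j = 1 / (1 - R_j^2): regress column j on all other columns (column 0 is the constant)."""
    out = []
    for j in range(1, X.shape[1]):
        _, _, r2_j = ols(np.delete(X, j, axis=1), X[:, j])
        out.append(1.0 / (1.0 - r2_j))
    return out


def breusch_pagan(X, y):
    """Koenker's studentized Breusch-Pagan LM = n * R^2 from regressing u^2 on X (df = k - 1)."""
    _, resid, _ = ols(X, y)
    _, _, r2_aux = ols(X, resid ** 2)
    return len(y) * r2_aux


def jarque_bera(u):
    """JB = n/6 * (S^2 + (K - 3)^2 / 4); under H0 it is chi-square(2), so p = exp(-JB/2)."""
    u = np.asarray(u, dtype=float)
    d = u - u.mean()
    m2, m3, m4 = (d ** 2).mean(), (d ** 3).mean(), (d ** 4).mean()
    skew, kurt = m3 / m2 ** 1.5, m4 / m2 ** 2
    jb = len(u) / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return jb, float(np.exp(-jb / 2.0))


def reset_f(X, y):
    """Ramsey RESET F-statistic: add squared and cubed fitted values, test the 2 restrictions."""
    beta, resid_r, _ = ols(X, y)
    yhat = X @ beta
    # Standardising yhat leaves the column space unchanged (X contains a constant and
    # yhat itself), so the F-statistic is the same, but yhat**3 is no longer badly scaled.
    z = (yhat - yhat.mean()) / yhat.std()
    X_u = np.column_stack([X, z ** 2, z ** 3])
    _, resid_u, _ = ols(X_u, y)
    n, k_u = X_u.shape
    return ((resid_r @ resid_r - resid_u @ resid_u) / 2.0) / ((resid_u @ resid_u) / (n - k_u))


# Synthetic data with made-up coefficients, shaped like a Size + Detached + Flat model.
rng = np.random.default_rng(0)
n = 2000
size = rng.uniform(40, 250, n)              # floor area in m^2
kind = rng.integers(0, 3, n)                # 0 = other, 1 = detached, 2 = flat
detached = (kind == 1).astype(float)
flat = (kind == 2).astype(float)
noise = rng.normal(0.0, 20.0 * size)        # error sd grows with size: heteroscedastic
price = 50_000 + 3_000 * size + 60_000 * detached + 40_000 * flat + noise
X = np.column_stack([np.ones(n), size, detached, flat])

beta, resid, r2 = ols(X, price)
print("coefficients:", np.round(beta, 2))
print("t-statistics:", np.round(t_stats(X, price), 2))
print("R^2:", round(r2, 4))
print("VIF (size, detached, flat):", np.round(vif(X), 3))
bp = breusch_pagan(X, price)
print("Breusch-Pagan LM:", round(bp, 2), "| reject H0 at 5%:", bp > 7.815)  # chi2(3) 5% critical value
jb, jb_p = jarque_bera(resid)
print("Jarque-Bera:", round(jb, 3), "| p-value:", round(jb_p, 4))
print("RESET F:", round(reset_f(X, price), 3))
```

The Breusch-Pagan helper computes Koenker's studentized LM statistic, which to our understanding is also the default of `lmtest::bptest` in R. The RESET helper adds squared and cubed fitted values, as `lmtest::resettest` does by default. The threshold 7.815 is the 5% critical value of a χ² distribution with 3 degrees of freedom, one for each slope regressor in the auxiliary regression.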