Econometrics
Thomas Andren

Download free eBooks at bookboon.com

1st edition
© 2007 Thomas Andren & bookboon.com
ISBN 978-87-7681-235-5

Contents

1 Basics of probability and statistics
1.1 Random variables and probability distributions
1.2 The multivariate probability distribution function
1.3 Characteristics of probability distributions
2 Basic probability distributions in econometrics
2.1 The normal distribution
2.2 The t-distribution
2.3 The Chi-square distribution
2.4 The F-distribution
3 The simple regression model
3.1 The population regression model
3.2 Estimation of population parameters
4 Statistical inference
4.1 Hypothesis testing
4.2 Confidence interval
4.3 Type I and type II errors
4.4 The best linear predictor
5 Model Measures
5.1 The coefficient of determination (R2)
5.2 The adjusted coefficient of determination (Adjusted R2)
5.3 The analysis of variance table (ANOVA)
6 The multiple regression model
6.1 Partial marginal effects
6.2 Estimation of partial regression coefficients
6.3 The joint hypothesis test
7 Specification
7.2 Omission of a relevant variable
7.3 Inclusion of an irrelevant variable
7.4 Measurement errors
8 Dummy variables
8.1 Intercept dummy variables
8.2 Slope dummy variables
8.3 Qualitative variables with several categories
8.4 Piecewise linear regression
8.5 Test for structural differences
9 Heteroskedasticity and diagnostics
9.1 Consequences of using OLS
9.2 Detecting heteroskedasticity
9.3 Remedial measures
10 Autocorrelation and diagnostics
10.1 Definition and the nature of autocorrelation
10.2 Consequences
10.3 Detection of autocorrelation
10.4 Remedial measures
11 Multicollinearity and diagnostics
11.1 Consequences
11.2 Measuring the degree of multicollinearity
11.3 Remedial measures
12 Simultaneous equation models
12.1 Introduction
12.4 Estimation methods
13 Statistical tables

1 Basics of probability and statistics

The purpose of this and the following chapter is to briefly go through the most basic concepts in probability theory and statistics that are important for you to understand. If these concepts are new to you, you should make sure that you have an intuitive feeling for their meaning before you move on to the following chapters in this book.

1.1 Random variables and probability distributions

The first important concept of statistics is that of a random experiment. It is referred to as any process of measurement that has more than one outcome and for which there is uncertainty about the result of the experiment. That is, the outcome of the experiment cannot be predicted with certainty. Picking a card from a deck of cards, tossing a coin, or throwing a die are all examples of basic random experiments.

The set of all possible outcomes of an experiment is called the
sample space of the experiment. In the case of tossing a coin, the sample space would consist of a head and a tail. If the experiment was to pick a card from a deck of cards, the sample space would be all the different cards in a particular deck. Each outcome of the sample space is called a sample point.

An event is a collection of possible outcomes of an experiment, that is, a subset of the sample space. Two events are mutually exclusive if the occurrence of one event precludes the occurrence of the other event at the same time. Equivalently, two events that have no outcomes in common are mutually exclusive. For example, if you were to roll a pair of dice, the event of rolling a sum of 6 and the event of rolling a double have the outcome (3,3) in common. These two events are therefore not mutually exclusive.

Events are said to be collectively exhaustive if they exhaust all possible outcomes of an experiment. For example, when rolling a die, the outcomes 1, 2, 3, 4, 5, and 6 are collectively exhaustive, because they encompass the entire range of possible outcomes. Hence, the set of all possible die rolls is both mutually exclusive and collectively exhaustive. Two different single outcomes, on the other hand, are mutually exclusive but not collectively exhaustive, while the outcomes "even" and "not-6" are collectively exhaustive but not mutually exclusive.

Even though the outcomes of any random experiment can be described verbally, as above, it would be much easier if the results of all experiments could be described numerically. For that purpose we introduce the concept of a random variable. A random variable is a function that assigns unique numerical values to all possible outcomes of a random experiment.

By convention, random variables are denoted by capital letters, such as X, Y, Z, etc., and the values taken by the random variables are denoted by the corresponding small letters x, y, z, etc. A random variable from an experiment can
either be discrete or continuous. A random variable is discrete if it can assume only a finite (or countably infinite) number of numerical values. For example, the result in a test with 10 questions can be 0, 1, 2, …, 10. In this case the discrete random variable would represent the test result. Other examples could be the number of household members, or the number of copy machines sold on a given day. Whenever we talk about random variables expressed in units, we have a discrete random variable. However, when the number of units can be very large, the distinction between a discrete and a continuous variable becomes vague, and it can be unclear whether the variable is discrete or continuous.

A random variable is said to be continuous when it can assume any value within an interval. In theory that would imply an infinite number of values, although in practice every measurement is recorded with finite precision. Time is a variable that can be measured in very small units and go on for a very long time, and is therefore a continuous variable. Variables related to time, such as age, are therefore also considered to be continuous. Economic variables such as GDP, money supply or government spending are measured in units of the local currency, so in some sense one could see them as discrete random variables. However, the values are usually very large, so counting each euro or dollar would serve no purpose. It is therefore more convenient to assume that these measures can take any real number, which makes them continuous.

Since the value of a random variable is unknown until the experiment has taken place, a probability of its occurrence can be attached to it. In order to measure the probability of a given event, the following formula may be used:

$$P(A) = \frac{\text{The number of ways event } A \text{ can occur}}{\text{The total number of possible outcomes}} \qquad (1.1)$$

This formula is valid if an experiment can result in n mutually exclusive and equally likely outcomes, and if m of these outcomes are favorable to event A. Hence, the corresponding probability is calculated as the
ratio of the two measures, m/n, as stated in the formula. This formula follows the classical definition of a probability.

Example 1.1
You would like to know the probability of receiving a 6 when you throw a die. The sample space for a die is {1, 2, 3, 4, 5, 6}, so the total number of possible outcomes is 6. You are interested in one of them, namely 6. Hence the corresponding probability equals 1/6.

Example 1.2
You would like to know the probability of receiving 7 when rolling two dice. First we have to find the total number of unique outcomes using two dice. By forming all possible combinations of pairs we have (1,1), (1,2), …, (6,5), (6,6), which sum to 36 unique outcomes. How many of them sum to 7? We have (1,6), (2,5), (3,4), (4,3), (5,2), (6,1), which gives 6 combinations. Hence, the corresponding probability is 6/36 = 1/6.

The classical definition requires that the sample space is finite and that each outcome in the sample space is equally likely to appear. Those requirements are sometimes difficult to satisfy. We therefore need a more flexible definition that handles those cases. Such a definition is the so-called relative frequency definition of probability, or the empirical definition. Formally, if in n trials, m of them are favorable to the event A, then P(A) is the limit of the ratio m/n as n goes to infinity; in practice we say that n has to be sufficiently large.

Example 1.3
Let us say that we would like to know the probability of receiving a sum of 7 when rolling two dice, but we do not know if our two dice are fair. That is, we do not know if the outcome for each die is equally likely. We could then perform an experiment where we throw two dice repeatedly, and calculate the relative frequency. In Table 1.1 we report the results for the sums from 2 to 7 for different numbers of trials.

                                Number of trials
Sum   10     100     1000     10000     100000     1000000     ∞
2     0      0.02    0.021    0.0274    0.0283     0.0278      0.02778
3     0.1    0.02    0.046    0.0475    0.0565     0.0555      0.05556
4     0.1    0.07    0.09     0.0779    0.0831     0.0838      0.08333
5     0.2    0.12    0.114    0.1154    0.1105     0.1114      0.11111
6     0.1    0.17    0.15     0.1389    0.1359     0.1381      0.13889
7     0.2    0.17    0.15     0.1411    0.1658     0.1669      0.16667

Table 1.1 Relative frequencies for different numbers of trials

From Table 1.1 we get a picture of how many trials we need before we can say that the number of trials is sufficiently large. For this particular experiment, one million trials is sufficient to bring every relative frequency within 0.001 of the corresponding theoretical probability. It seems that our two dice are fair, since the relative frequencies converge to the probabilities implied by two fair dice.

Since the expected value of the forecast error is zero, we have an unbiased forecast. Assuming that X is known, the variance of the forecast error is given by:

$$\begin{aligned}
E\left[Y_{T+1} - \hat{Y}_{T+1}\right]^2 &= E\left[(B_0 - b_0) + (B_1 - b_1)X_{T+1} + U_{T+1}\right]^2 \\
&= E\left[B_0 - b_0\right]^2 + E\left[(B_1 - b_1)X_{T+1}\right]^2 + E\left[U_{T+1}\right]^2 + E\left[2(B_0 - b_0)(B_1 - b_1)X_{T+1}\right] \\
&= V(b_0) + V(b_1)X_{T+1}^2 + V(U) + 2\,\mathrm{Cov}(b_0, b_1)X_{T+1}
\end{aligned}$$

The cross products involving $U_{T+1}$ drop out because the future error term is assumed to be uncorrelated with the sample errors that determine $b_0$ and $b_1$, and the last line uses the assumption that X is constant in repeated sampling. Replacing the variances and the covariance with the expressions for the sample estimators and rearranging, we end up with the following expression:

$$\sigma_f^2 = E\left[Y_{T+1} - \hat{Y}_{T+1}\right]^2 = \sigma^2\left[1 + \frac{1}{T} + \frac{(X_{T+1} - \bar{X})^2}{\sum_{t=1}^{T}(X_t - \bar{X})^2}\right] \qquad (4.12)$$

Observe that the forecast error variance is smallest when the future value of X equals the mean value of X. This formula holds if the future value of X is known. That is often not the case, and hence the formula has to be modified accordingly. One way to deal with the uncertainty is to impose a distribution on X, with a component of uncertainty. That is, assume that

$$X_{T+1}^{*} = X_{T+1} + \varepsilon_{T+1}, \qquad \varepsilon \sim N(0, \sigma_\varepsilon^2)$$

With this assumption we may form an expression for the error variance that takes the extra variation from the uncertainty into account:

$$\sigma_f^2 = \sigma^2\left[1 + \frac{1}{T} + \frac{(X_{T+1} - \bar{X})^2}{\sum_{t=1}^{T}(X_t - \bar{X})^2} + \frac{\sigma_\varepsilon^2}{\sum_{t=1}^{T}(X_t - \bar{X})^2}\right] + B_1^2\sigma_\varepsilon^2 \qquad (4.13)$$

The important point to notice here is that this variance is impossible to estimate unless we know the exact value of the variance of the uncertainty, which is of course not possible. Furthermore, the expression involves the population parameter $B_1$ multiplied by the variance of the uncertainty. Hence, in practice (4.12) is often used, but one should keep in mind that it most likely understates the true forecast error variance.

Taking the square root of the variance in (4.12) or (4.13) gives us the standard error of the forecast. With this standard error it is possible to calculate a confidence interval around the predicted value using the usual formula for a confidence interval, that is:

Confidence interval of a forecast: $\hat{Y}_{T+1} \pm t_c \times \sigma_f$

5 Model Measures

In the previous chapters we have developed the basics of the simple regression model, describing how to estimate the population parameters using sample information and how to perform inference on the population. But so far we do not know how well the model describes the data. The two most popular measures of model fit are the so-called coefficient of determination and the adjusted coefficient of determination.

5.1 The coefficient of determination (R2)

In the simple regression model we explain the variation of one variable with the help of another. We can do that because they are correlated. Had they
not been correlated, there would be no explanatory power in our X variable. In regression analysis the correlation coefficient and the coefficient of determination are closely related, but their interpretations differ slightly. Furthermore, the correlation coefficient can only be used between pairs of variables, while the coefficient of determination can connect a group of variables with the dependent variable.

In general the correlation coefficient offers no information about the causal relationship between two variables. But the aim of this chapter is to put the correlation coefficient in the context of the regression model and show under what conditions it is appropriate to interpret the correlation coefficient as a measure of the strength of a causal relationship.

The coefficient of determination tries to decompose the deviation from the mean into an explained part and an unexplained part. It is therefore natural to start the derivation of the measure from the deviation-from-the-mean expression and then introduce the predicted value that comes from the regression model. That is, for a single individual we have:

$$Y_i - \bar{Y} = \underbrace{(\hat{Y}_i - \bar{Y})}_{\text{Explained}} + \underbrace{(Y_i - \hat{Y}_i)}_{\text{Unexplained}} \qquad (5.1)$$

We have to remember that we try to explain the deviation from the
mean value of Y using the regression model. Hence, the difference between the predicted value $\hat{Y}_i$ and the mean value $\bar{Y}$ is denoted the explained part of the mean difference. The remaining part is denoted the unexplained part. With this simple trick we have decomposed the mean difference for a single observation. We must now transform (5.1) into an expression that is valid for the whole sample, that is, for all observations. We do that by squaring and summing over all n observations:

$$\sum_{i=1}^{n}(Y_i - \bar{Y})^2 = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2 + 2\sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})(Y_i - \hat{Y}_i)$$
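The squared-and-summed version of (5.1) can be checked numerically. Below is a minimal Python sketch, assuming simulated data from a hypothetical model with B0 = 1, B1 = 2 and standard normal errors; the helper names `ols_fit` and `decompose`, the parameter values and the sample size are illustrative and not taken from the text. It fits the simple regression by OLS, then computes the total, explained and unexplained sums of squares together with the cross-product term. For an OLS fit that includes an intercept, the residuals sum to zero and are uncorrelated with X, so the cross-product term should come out at rounding-error size.

```python
import random


def ols_fit(x, y):
    """OLS estimates (b0, b1) for the simple regression Y = B0 + B1*X + U."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def decompose(x, y):
    """Return (total, explained, unexplained, cross) from squaring and summing (5.1)."""
    b0, b1 = ols_fit(x, y)
    y_bar = sum(y) / len(y)
    y_hat = [b0 + b1 * xi for xi in x]  # predicted values
    total = sum((yi - y_bar) ** 2 for yi in y)
    explained = sum((yh - y_bar) ** 2 for yh in y_hat)
    unexplained = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    cross = 2 * sum((yh - y_bar) * (yi - yh) for yi, yh in zip(y, y_hat))
    return total, explained, unexplained, cross


# Hypothetical simulated sample (assumed values, not from the text).
random.seed(1)
x = [random.uniform(0, 10) for _ in range(50)]
y = [1 + 2 * xi + random.gauss(0, 1) for xi in x]

total, explained, unexplained, cross = decompose(x, y)
print(f"total       = {total:.4f}")
print(f"explained   = {explained:.4f}")
print(f"unexplained = {unexplained:.4f}")
print(f"cross term  = {cross:.2e}")  # close to zero for OLS with an intercept
```

The identity total = explained + unexplained + cross holds for any set of predicted values, since it only expands a square; it is the OLS fit with an intercept that makes the cross term vanish.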