Advanced Econometrics - Part II
Chapter 4: Discrete choice analysis: Multinomial Models
Nam T Hoang, UNE Business School, University of New England

Chapter 4
DISCRETE CHOICE ANALYSIS: MULTINOMIAL MODELS

We look at settings with multiple, unordered choices. A key notion here is the "independence of irrelevant alternatives" property.

Models for discrete choice with more than two choices:

Assume that the $i$-th consumer faces $J$ choices ($j = 1, 2, \dots, J$) and that the utility of choice $j$ is:

$$U_{ij} = X_{ij}\beta + \varepsilon_{ij}$$

If the consumer makes choice $j$, we assume that $U_{ij}$ is the maximum among the $J$ utilities. The probability that individual $i$ makes choice $j$ is therefore

$$\text{Prob}(U_{ij} > U_{ik}) \quad \text{for all } k \neq j$$

Let $Y_i$ be a random variable that indicates the choice made:

$$Y_i = j \quad \text{if } U_{ij} > U_{ik} \text{ for all } k \neq j$$

The model is made operational by a particular choice of distribution for the disturbances. McFadden (1974) has shown that if and only if the $J$ disturbances are independent and identically distributed with the type I extreme value distribution

$$F(\varepsilon_{ij}) = \exp(-\exp(-\varepsilon_{ij})) = e^{-e^{-\varepsilon_{ij}}}$$

then:

$$\text{Prob}(Y_i = j) = \frac{\exp(X_{ij}\beta)}{\sum_{l=1}^{J} \exp(X_{il}\beta)}$$

More generally, utility depends on $Z_{ij}$, which includes aspects specific to the individual ($i$) as well as to the choice ($j$), and the probability has the same form with $Z_{ij}\theta$ in place of $X_{ij}\beta$:

$$\text{Prob}(Y_i = j) = \frac{\exp(Z_{ij}\theta)}{\sum_{l=1}^{J} \exp(Z_{il}\theta)}$$

Let $Z_{ij} = [X_{ij}, w_i]$, $\theta = [\beta, \alpha]$.

• $X_{ij}$ varies across choices ($j$) (and possibly across individuals ($i$) as well).
• $w_i$ contains the characteristics of individual $i$ and is therefore the same for all choices.

Terms that do not vary across choices cancel out of the probability:

$$\text{Prob}(Y_i = j) = \frac{\exp(X_{ij}\beta + w_i\alpha)}{\sum_{l=1}^{J} \exp(X_{il}\beta + w_i\alpha)} = \frac{\exp(X_{ij}\beta)\exp(w_i\alpha)}{\sum_{l=1}^{J} \exp(X_{il}\beta)\exp(w_i\alpha)} = \frac{\exp(X_{ij}\beta)}{\sum_{l=1}^{J} \exp(X_{il}\beta)}$$

For example, consider a model of an individual's choice of shopping centre. The choice depends on the number of stores $S_{ij}$, the distance from the centre of the city $D_{ij}$, and the income of the individual $I_i$, which varies across individuals but not across the choices:

$$Z_{ij} = (S_{ij},\ D_{ij},\ I_i)$$

I. THE MULTINOMIAL LOGIT MODEL:

Suppose we have only individual-specific characteristics $w_i$, which are the same for all choices. The model's response probabilities are:

$$\text{Prob}(Y_i = j \mid w_i) = P_{ij} = \frac{\exp(w_i\alpha_j)}{1 + \sum_{l=1}^{J} \exp(w_i\alpha_l)} \quad \text{for all choices } j = 1, \dots, J$$

For the first choice, $j = 0$, the probability follows from the requirement $\sum_{j=0}^{J} P_{ij} = 1$:

$$\text{Prob}(Y_i = 0 \mid w_i) = P_{i0} = \frac{1}{1 + \sum_{l=1}^{J} \exp(w_i\alpha_l)}$$

This corresponds to the normalization $\alpha_0 = 0$.

The log-likelihood:

$$\ln L = \sum_{i=1}^{n} \sum_{j=0}^{J} d_{ij} \ln P_{ij}$$

where $d_{ij} = 1$ if alternative $j$ is chosen by individual $i$, and $0$ if not.

$$\frac{\partial \ln L}{\partial \alpha_j} = \sum_{i=1}^{n} (d_{ij} - P_{ij})\, w_i, \quad j = 1, \dots, J$$

The marginal effects of the characteristics on the probabilities:

$$\delta_{ij} = \frac{\partial P_{ij}}{\partial w_{ik}} = P_{ij}\Big[\alpha_{jk} - \sum_{e=0}^{J} P_{ie}\alpha_{ek}\Big] = P_{ij}[\alpha_{jk} - \bar{\alpha}_k], \qquad \bar{\alpha}_k = \sum_{e=0}^{J} P_{ie}\alpha_{ek}$$

II. CONDITIONAL LOGIT MODEL:

This model applies when the data consist of choice-specific attributes ($X_{ij}$) instead of individual-specific characteristics. The model is:

$$\text{Prob}(Y_i = j \mid X_{i1}, X_{i2}, \dots, X_{iJ}) = \text{Prob}(Y_i = j \mid X_i) = P_{ij} = \frac{\exp(X_{ij}\beta)}{\sum_{l=0}^{J} \exp(X_{il}\beta)}$$

Notes:
• In the multinomial logit model, $w_i$ is the same across choices and the coefficient $\alpha_j$ varies with the choice.
• In the conditional logit model, $X_{ij}$ varies across choices and the coefficient $\beta$ is the same for all choices.

The multinomial logit model can be viewed as a special case of the conditional logit model. Suppose we have a vector of individual characteristics $X_i$ with dimension $K$. Then define for each choice $j$ the vector $X_{ij}$, made up of $J$ blocks of $K$ elements each, as follows:

$$X_{i0} = [0,\ 0,\ \dots,\ 0], \quad X_{i1} = [X_i,\ 0,\ \dots,\ 0], \quad X_{ij} = [0,\ \dots,\ X_i,\ \dots,\ 0], \quad X_{iJ} = [0,\ \dots,\ 0,\ X_i]$$

so that $X_{ij}$ varies with each choice ($X_i$ sits in the $j$-th block), and stack the coefficients as

$$\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_J \end{pmatrix}$$

Then:

$$P_{ij} = \frac{\exp(X_{ij}\beta)}{\sum_{l=0}^{J} \exp(X_{il}\beta)} = \frac{\exp(X_i\beta_j)}{1 + \sum_{l=1}^{J} \exp(X_i\beta_l)}$$

In this model, the coefficients are not directly tied to the marginal effects:

$$\frac{\partial P_{ij}}{\partial X_{im}} = [P_{ij}(1(j = m) - P_{im})]\beta$$

where $1(j = m)$ equals 1 if $j = m$ and 0 if not.

Log-likelihood:

$$\ln L = \sum_{i=1}^{n} \sum_{j=0}^{J} d_{ij} \ln P_{ij}$$

III. MIXED LOGIT MODEL:

A model that combines the two models above:

$$\text{Prob}(Y_i = j) = \frac{\exp(X_{ij}\beta + W_i\alpha_j)}{\sum_{l=1}^{J} \exp(X_{il}\beta + W_i\alpha_l)} \quad \rightarrow \quad \Pr[Y_i = j] = \frac{\exp(Z_{ij}\theta)}{\sum_{l=1}^{J} \exp(Z_{il}\theta)}$$

with

$$Z_{i1} = [X_{i1},\ 0,\ \dots,\ 0], \quad Z_{i2} = [X_{i2},\ W_i,\ 0,\ \dots,\ 0], \quad Z_{ij} = [X_{ij},\ 0,\ \dots,\ W_i,\ \dots,\ 0], \quad Z_{iJ} = [X_{iJ},\ 0,\ \dots,\ 0,\ W_i]$$

$$\theta = \begin{pmatrix} \beta \\ \alpha_2 \\ \vdots \\ \alpha_j \\ \vdots \\ \alpha_J \end{pmatrix}$$

Since $Z_{i1}$ contains no $W_i$ block, alternative 1 serves as the base category, with $\alpha_1$ normalized to zero.

This model does not share one advantage of the conditional logit model. In the conditional logit model, if an additional alternative is added to the choice set, one can predict its probability of selection, because the parameters of the conditional logit model do not vary across alternatives.

IV. INDEPENDENCE OF IRRELEVANT ALTERNATIVES:

• The ratio of the probabilities of any two alternatives is independent of the introduction of a third alternative. This is unrealistic in many economic choice models.
• In the multinomial logit and conditional logit models, $P_{ij}/P_{im}$ is independent of the remaining probabilities. This property is called the Independence of Irrelevant Alternatives (IIA).
• Consider the conditional probability of choosing $j$ given that you choose either $j$ or $l$:

$$\text{Prob}(Y_i = j \mid Y_i \in \{j, l\}) = \frac{\Pr(Y_i = j)}{\Pr(Y_i = j) + \Pr(Y_i = l)} = \frac{\exp(X_{ij}\beta)}{\exp(X_{ij}\beta) + \exp(X_{il}\beta)}$$

• This probability does not depend on the characteristics $X_{im}$ of alternatives $m$ other than $j$ and $l$. The traditional example is McFadden's famous blue bus/red bus example.
• Suppose there are initially three choices: commuting by car, by red bus or by blue bus.
• People are indifferent between red and blue buses: $U_{i,redbus} = U_{i,bluebus}$. With the choice between the blue and the red bus being random, suppose $X_{i,redbus} = X_{i,bluebus} = X_{i,bus}$. Then suppose that the probability of commuting by bus is

$$\Pr(Y_i = bus) = \Pr(Y_i = redbus \text{ or } bluebus) = \frac{\exp(X_{i,bus}\beta)}{\exp(X_{i,bus}\beta) + \exp(X_{i,car}\beta)}$$

and

$$\Pr(Y_i = redbus \mid Y_i = bus) = \frac{1}{2}$$

• That would imply that the conditional probability of commuting by car, given that one commutes by car or red bus, would differ from the same conditional probability if there is no blue bus. Presumably, taking away the blue bus choice would lead all the current blue bus users to shift to the red bus, not to cars.
• In the conditional logit model, however,

$$\frac{P_{il}}{P_{ik}} = \exp(X_{il}\beta - X_{ik}\beta)$$

does not depend on any alternative other than $l$ and $k$.
• The conditional logit model does not allow for this type of substitution pattern. Again, consider commuters initially choosing between two modes of transportation, car and red bus, and suppose the two are equally attractive, so that

$$\frac{P_{i,car}}{P_{i,redbus}} = \frac{P_{ic}}{P_{irb}} = \exp(X_{car}\beta - X_{redbus}\beta) = 1$$

• Now suppose a third choice, blue bus, is added. Assuming bus commuters do not care about the colour of the bus, consumers will choose between the two buses with equal probability, so the ratio of their probabilities of taking the red bus and the blue bus is 1: $\dfrac{P_{irb}}{P_{ibb}} = 1$. But IIA implies that $\dfrac{P_{ic}}{P_{irb}}$ is the same whether or not another alternative (blue bus) is added, so we have:

$$\frac{P_{ic}}{P_{irb}} = 1, \quad \frac{P_{irb}}{P_{ibb}} = 1 \quad \text{and} \quad P_{ic} + P_{irb} + P_{ibb} = 1 \quad \Rightarrow \quad P_{ic} = P_{irb} = P_{ibb} = \frac{1}{3}$$

which are the probabilities that the logit model predicts.
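The red bus/blue bus probabilities can be checked numerically. The following is a minimal sketch, assuming that the systematic utilities $X\beta$ of all alternatives are equal (set to 0 here purely for illustration); the function name `logit_probabilities` and the dictionary keys are hypothetical and not part of the notes.

```python
import math


def logit_probabilities(utilities):
    """Conditional logit probabilities: exp(V_j) / sum over l of exp(V_l).

    `utilities` maps each alternative to its systematic utility V_j = X_j * beta.
    """
    # Subtract the maximum before exponentiating for numerical stability;
    # this does not change the resulting probabilities.
    v_max = max(utilities.values())
    weights = {name: math.exp(v - v_max) for name, v in utilities.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Assumed utilities: car and red bus are equally attractive.
before = logit_probabilities({"car": 0.0, "red_bus": 0.0})

# A blue bus, identical to the red bus, is added to the choice set.
after = logit_probabilities({"car": 0.0, "red_bus": 0.0, "blue_bus": 0.0})

# IIA: the car / red bus ratio is unchanged by the new alternative.
ratio_before = before["car"] / before["red_bus"]
ratio_after = after["car"] / after["red_bus"]

print("before:", before)
print("after: ", after)
print("car/red bus ratio before and after:", ratio_before, ratio_after)
```

Under these assumed utilities the car probability falls from 1/2 to 1/3 once the blue bus is added, while the car/red bus ratio stays at 1. This is exactly the IIA restriction of the logit form.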
• In real life, however, we would expect the probability of taking a car to remain the same when a new bus is introduced that is exactly the same as the old bus. We would expect the original probability of taking the bus to be split between the two buses after the second one is introduced. That is, we would expect:

$$P_{ic} = \frac{1}{2}, \quad P_{ibb} = \frac{1}{4}, \quad P_{irb} = \frac{1}{4}$$

• In this case, the logit model, because of its IIA property, underestimates the probability of taking a car ($\frac{1}{3}$ instead of $\frac{1}{2}$) and overestimates the probability of taking a bus. The ratio of the probabilities of car and red bus, $\dfrac{P_{ic}}{P_{irb}}$, actually changes with the introduction of the blue bus (from 1 to 2), rather than remaining constant as required by the logit model.
• The same kind of misprediction arises with logit models if the attributes of another alternative change. Suppose individuals have a choice out of three restaurants: the Purdue restaurant (P), the Krannert restaurant (K) and the Chauncey restaurant (C), with prices $P_p = 95$, $P_k = 85$, $P_c =$ [missing] and qualities $Q_p = 10$, $Q_k =$ [missing], $Q_c =$ [missing]. Suppose that the market shares of the restaurants are $S_p = 0.1$, $S_k = 0.25$ and $S_c = 0.65$, and that utility is

$$U_{ij} = -0.2 P_j + 2 Q_j + \varepsilon_{ij}$$

Under the conditional logit model:

$$\frac{P_{ip}}{P_{ic}} = \frac{0.1}{0.65}$$

Suppose that the Krannert restaurant raises its price to 1000 (taking it out of business). The conditional logit model would predict $P_{ip} = 0.13$ and $P_{ic} = 0.87$, in order to satisfy

$$\frac{P_{ip}}{P_{ic}} = \frac{0.1}{0.65} = \text{const}$$

This seems implausible. People who were planning to go to Krannert would appear to be more likely to go to the Purdue restaurant (PMU) than to Chauncey, so one would expect $S_p \approx 0.35$ and $S_c \approx 0.65$. IIA does not hold in reality here, so the conditional logit model is not valid in this case.

IIA: adding another alternative, or changing the characteristics of a third alternative, does not affect the ratio between the probabilities of two alternatives.

• Test of IIA: Hausman & McFadden offer tests of the IIA assumption based on the observation that, if the conditional logit model is true, $\beta$ can be consistently estimated by conditional logit using any subset of the alternatives. Hausman's test compares the estimate of $\beta$ using all alternatives with the estimate using a subset of the alternatives:

$$(\hat{\beta}_s - \hat{\beta}_f)'\,[\hat{V}_s - \hat{V}_f]^{-1}\,(\hat{\beta}_s - \hat{\beta}_f) \sim \chi^2$$

where $s$ denotes the restricted subset and $f$ the full set of alternatives, and $H_0: \beta_s = \beta_f$.

• IIA must hold for the conditional logit model to apply. If we reject $H_0$, IIA does not hold and the conditional logit model is not a valid model in this case.
• In $U_{ij} = X_{ij}\beta + \varepsilon_{ij}$, the IIA property follows from the initial assumption that the $\varepsilon_{ij}$ are independent with extreme value distributions. The IIA assumption needs to hold in reality for the conditional logit model to apply.

V. NESTED LOGIT MODEL

• If the test of IIA fails (we reject $H_0: \beta_s = \beta_f$), then the conditional logit model is not valid and we need to modify the multinomial logit model. One way to introduce correlation between the choices is through nesting them. Suppose the set of choices $\{0, 1, \dots, J\}$ can be partitioned into $S$ sets $B_1, B_2, \dots, B_S$, so that the full set of choices can be written as:

$$\{0, 1, \dots, J\} = \bigcup_{s=1}^{S} B_s$$

Let $Z_s$ be set-specific characteristics (branch characteristics). McFadden (1981) studied the following model, in which each set $s$ has its own parameter $\rho_s$.

• Conditional probability:

$$\Pr(Y_i = j \mid X_i, Y_i \in B_s) = \frac{\exp(\rho_s^{-1} X_{ij}\beta)}{\sum_{l \in B_s} \exp(\rho_s^{-1} X_{il}\beta)}$$

• Within the sets, the correlation coefficient for the $\varepsilon_{ij}$ is equal to $(1 - \rho_s^2)$. Between the sets, the $\varepsilon_{ij}$ are independent. The probabilities are adjusted by $\rho_s$ in each group. The probability of a choice in the set $B_s$ is

$$\Pr(Y_i \in B_s \mid X_i) = \frac{\exp(Z_s\alpha)\Big[\sum_{l \in B_s} \exp(\rho_s^{-1} X_{il}\beta)\Big]^{\rho_s}}{\sum_{t=1}^{S} \exp(Z_t\alpha)\Big[\sum_{l \in B_t} \exp(\rho_t^{-1} X_{il}\beta)\Big]^{\rho_t}}$$

The unconditional choice probability is the product of the two:

$$\Pr(Y_i = j \mid X_i) = \Pr(Y_i = j \mid X_i, Y_i \in B_s)\,\Pr(Y_i \in B_s \mid X_i)$$

If we fix $\rho_s = 1$ for all $s$, then

$$\Pr(Y_i = j \mid X_i) = \frac{\exp(X_{ij}\beta + Z_s\alpha)}{\sum_{t=1}^{S} \sum_{l \in B_t} \exp(X_{il}\beta + Z_t\alpha)}$$

and we are back in the conditional logit model.

In general, this model corresponds to individuals choosing the option with the highest utility, where the utility of choice $j$ in set $B_s$ for individual $i$ is

$$U_{ij} = X_{ij}\beta + Z_s\alpha + \varepsilon_{ij}$$

McFadden supposes that the joint distribution function of the $\varepsilon_{ij}$ is

$$F(\varepsilon_{i0}, \dots, \varepsilon_{iJ}) = \exp\Big(-\sum_{s=1}^{S} \Big(\sum_{j \in B_s} \exp(-\rho_s^{-1}\varepsilon_{ij})\Big)^{\rho_s}\Big)$$

From this he derives the results given above.

• How do we estimate these models? One approach is to construct the log-likelihood and maximize it directly. That is complicated, especially since the log-likelihood function is not concave, but it is not impossible. An easier alternative is to use the nesting structure directly. Within a nest we have a conditional logit model with coefficient $\rho_s^{-1}\beta$. Hence we can estimate $\rho_s^{-1}\beta$ directly, using the concavity of the conditional logit log-likelihood (the Newton-Raphson procedure will converge to a global maximum). Denote these estimates of $\rho_s^{-1}\beta$ by $\hat{\lambda}_s$. Then the probability of a particular set $B_s$ can be used to estimate $\rho_s$ and $\alpha$ through:

$$\Pr(Y_i \in B_s \mid X_i) = \frac{\exp(Z_s\alpha)\Big[\sum_{l \in B_s} \exp(X_{il}\hat{\lambda}_s)\Big]^{\rho_s}}{\sum_{t=1}^{S} \exp(Z_t\alpha)\Big[\sum_{l \in B_t} \exp(X_{il}\hat{\lambda}_t)\Big]^{\rho_t}} = \frac{\exp(Z_s\alpha + \rho_s\hat{W}_s)}{\sum_{t=1}^{S} \exp(Z_t\alpha + \rho_t\hat{W}_t)}$$

where $\hat{W}_s$ is called the "inclusive value":

$$\hat{W}_s = \ln \sum_{l \in B_s} \exp(X_{il}\hat{\lambda}_s)$$

• We have another conditional logit model, with likelihood function:

$$L = \prod_{s=1}^{S} \prod_{i:\, Y_i \in B_s} \Pr(Y_i \in B_s \mid X_i) = \prod_{s=1}^{S} \prod_{i:\, Y_i \in B_s} \frac{\exp(Z_s\alpha + \rho_s\hat{W}_s)}{\sum_{t=1}^{S} \exp(Z_t\alpha + \rho_t\hat{W}_t)}$$

• These models can be extended to many layers of nests. It should be noted that both the order of the nests and the elements of each nest are very important.

VI. MULTINOMIAL PROBIT MODEL:

• The IIA problem arises because the logit models rule out correlation of the errors across choices. A natural alternative that avoids it is to work with normally distributed errors ($\varepsilon_{ij} \sim N(\cdot)$). We no longer assume that the $\varepsilon_{ij}$ follow the extreme value distribution.
• Note that the extreme value distribution is close to the normal distribution, but probabilities under the extreme value distribution are much easier to calculate.
• The cost of using the normal distribution is the complicated likelihood function.

$$U_{ij} = X_{ij}\beta + \varepsilon_{ij}, \quad j = 0, 1, \dots, J$$

$$U_i = \begin{pmatrix} U_{i0} \\ U_{i1} \\ \vdots \\ U_{iJ} \end{pmatrix} = \begin{pmatrix} X_{i0}\beta + \varepsilon_{i0} \\ X_{i1}\beta + \varepsilon_{i1} \\ \vdots \\ X_{iJ}\beta + \varepsilon_{iJ} \end{pmatrix}, \qquad \varepsilon_i = \begin{pmatrix} \varepsilon_{i0} \\ \varepsilon_{i1} \\ \vdots \\ \varepsilon_{iJ} \end{pmatrix}, \qquad \varepsilon_i \mid X_i \sim N(0, \Sigma)$$

with unrestricted covariance matrix $\Sigma$.

$$\Pr(Y_i = q) = \Pr[U_{iq} > U_{ij},\ j = 0, 1, \dots, J,\ j \neq q]$$

or

$$\Pr(Y_i = q) = \Pr[\varepsilon_{i0} - \varepsilon_{iq} < (X_{iq} - X_{i0})\beta;\ \dots;\ \varepsilon_{iJ} - \varepsilon_{iq} < (X_{iq} - X_{iJ})\beta]$$

• The main obstacle to the implementation of the multinomial probit model is the difficulty of computing the multivariate normal probabilities as the number of alternatives grows.
• Recent results on accurate simulation of multinormal integrals have made estimation of the MNP model feasible.
• Read Geweke, Keane and Runkle (1994), Review of Economics and Statistics 76, No. 4, for the method if you want to use the multinomial probit model.
• With three alternatives (labelled 1, 2 and 3):

$$P(y_i = 1) = P(U_{i1} > U_{i2};\ U_{i1} > U_{i3})$$

Define

$$u_1 = \varepsilon_{i2} - \varepsilon_{i1} < (X_{i1} - X_{i2})\beta, \qquad u_2 = \varepsilon_{i3} - \varepsilon_{i1} < (X_{i1} - X_{i3})\beta$$

so that

$$P(y_i = 1) = \int_{-\infty}^{(X_{i1} - X_{i2})\beta} \int_{-\infty}^{(X_{i1} - X_{i3})\beta} f(u_1, u_2)\, du_2\, du_1, \qquad U^* = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} \sim N(0, \Sigma^*)$$

where $f$ is the density of $U^*$, $\Sigma$ is here the covariance matrix of $(\varepsilon_{i1}, \varepsilon_{i2}, \varepsilon_{i3})'$, and

$$\Sigma^* = \begin{pmatrix} -1 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix} \Sigma \begin{pmatrix} -1 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix}'$$

• Each element of the likelihood is a double integral and must be evaluated numerically.
• This model does not suffer from the IIA problem.

VII. ORDERED LOGIT, ORDERED PROBIT & SEQUENTIAL MODELS

Ordered Probit:

$$Y_i^* = X_i\beta + \varepsilon_i$$

$Y_i^*$ is unobservable. What we observe is:

$$Y_i = \begin{cases} 0 & \text{if } Y_i^* \le 0 \\ 1 & \text{if } 0 < Y_i^* \le \mu_1 \\ 2 & \text{if } \mu_1 < Y_i^* \le \mu_2 \\ \vdots & \\ J & \text{if } \mu_{J-1} < Y_i^* \end{cases}$$

$\mu_1, \mu_2, \dots, \mu_{J-1}$ are unknown parameters to be estimated together with $\beta$.

Assume that $\varepsilon$ is normally distributed across observations, and normalize its mean and variance: $\varepsilon \sim N(0, 1)$. We have:

$$P(y_i = 0 \mid X_i) = \Phi(-X_i\beta)$$
$$P(y_i = 1 \mid X_i) = \Phi(\mu_1 - X_i\beta) - \Phi(-X_i\beta)$$
$$P(y_i = 2 \mid X_i) = \Phi(\mu_2 - X_i\beta) - \Phi(\mu_1 - X_i\beta)$$
$$\vdots$$
$$P(y_i = J \mid X_i) = 1 - \Phi(\mu_{J-1} - X_i\beta)$$

We must have $0 < \mu_1 < \mu_2 < \dots < \mu_{J-1}$ for all the probabilities to be positive.

Likelihood function:

$$L = \prod_{j=0}^{J} \prod_{i:\, Y_i = j} \Pr(Y_i = j \mid X_i)$$

where the inner product runs over all observations with $Y_i = j$.

Marginal effects (with $\mu_0 = 0$):

$$\frac{\partial P(Y_i = 0 \mid X_i)}{\partial x_{ik}} = -\phi(X_i\beta)\,\beta_k$$
$$\frac{\partial P(Y_i = j \mid X_i)}{\partial x_{ik}} = [\phi(\mu_{j-1} - X_i\beta) - \phi(\mu_j - X_i\beta)]\,\beta_k, \quad j = 1, \dots, J-1$$
$$\frac{\partial P(Y_i = J \mid X_i)}{\partial x_{ik}} = \phi(\mu_{J-1} - X_i\beta)\,\beta_k$$

Ordered Logit:

Replacing $\Phi$ with the logistic function

$$F(X_i\beta) = \frac{\exp(X_i\beta)}{1 + \exp(X_i\beta)}$$

gives the ordered logit model.

Sequential Multinomial Models:

A special case of an ordered variable (where the choices have a natural ranking) is a sequential variable. This occurs when the second event is dependent on the first event, the third event is dependent on the previous two events, and so on. Person $i$ being in the $n$-th category means that person $i$ has passed through all $(n-1)$ previous categories:

$$y_i = \begin{cases} 1 & \text{no high school} \\ 2 & \text{high school, not college} \\ 3 & \text{college} \end{cases}$$

$$\Pr[y_i = 2] = \Pr[y_i = 2 \mid y_i \neq 1] \times \Pr[y_i \neq 1] = \Phi(X_2\beta_2)\,(1 - \Phi(X_1\beta_1))$$

The parameters $\beta_1$ and $\beta_2$ can be estimated by maximizing the log-likelihood:

$$\ln L = \sum_{i=1}^{n} \sum_{j=1}^{m} y_{ij} \ln p_{ij}$$

where $y_{ij} = 1$ if $y_i = j$ and 0 otherwise, $p_{i1} = \Phi(X_{1i}\beta_1)$, $p_{i2}$ is given in the preceding equation, and $p_{i3} = 1 - p_{i1} - p_{i2}$.

Note: $P(y_i = 2)$ means $P(y_i = 2 \text{ and } y_i \neq 1)$.
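The three-category sequential model from Section VII can be illustrated with a short calculation. The following is a minimal sketch, assuming a probit link at each stage; the function name `sequential_probit_probabilities` and the index values passed to it are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def sequential_probit_probabilities(index_1, index_2):
    """Category probabilities for a three-category sequential probit.

    index_1 stands for X_1i * beta_1 and index_2 for X_2i * beta_2
    (both are plain numbers here; they are assumed inputs, not estimates).
    """
    cdf = NormalDist().cdf              # standard normal CDF, Phi
    p1 = cdf(index_1)                   # Pr[y = 1]
    p2 = cdf(index_2) * (1.0 - p1)      # Pr[y = 2 | y != 1] * Pr[y != 1]
    p3 = 1.0 - p1 - p2                  # remaining probability, Pr[y = 3]
    return p1, p2, p3


# Illustrative indices: both set to zero.
p1, p2, p3 = sequential_probit_probabilities(0.0, 0.0)
print(p1, p2, p3)
```

With both indices equal to zero, each stage is a fair coin flip, so the sketch gives $p_1 = 0.5$, $p_2 = 0.25$ and $p_3 = 0.25$. The log-likelihood contribution of an observation is then the log of the probability of its observed category.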