8 A duration dependent variable

In the previous chapters we have discussed econometric models for ordered and unordered discrete choice dependent variables and continuous dependent variables, which may be censored or truncated. In this chapter we deal with models for duration as the dependent variable. Duration data often occur in marketing research. Some examples concern the time between two purchases, the time until a customer becomes inactive or cancels a subscription or service contract, and the time it takes to respond to a direct mailing (see Helsen and Schmittlein, 1993, table 1, for more examples).

Models for duration data receive special attention in the econometric literature. This is because standard regression models cannot be used. In fact, standard regression models are used to correlate a dependent variable with explanatory variables that are all measured at the same point in time. In contrast, if one wants to relate a duration variable to explanatory variables, it is likely that the duration will also depend on the path of the values of the explanatory variables during the period of duration. For example, the timing of a purchase may depend on the price of the product at the time of the purchase but also on the price in the weeks or days before the purchase. During these weeks a household may have considered the price of the product to be too high, and therefore it postponed its purchase. Hence, the focus of modeling of duration is often not on explaining duration directly but merely on the probability that the duration will end this week given that it lasted until this week.

A second important feature of duration data is censoring. If one collects duration data it is likely that at the beginning of the measurement period some durations will already be in progress. Also, at the end of the measurement period, some durations may not have been completed. It is, for example, unlikely that all households in the sample purchased a product exactly at the end of the
observation period. To deal with these properties of duration variables, so-called duration models have been proposed and used. For an extensive theoretical discussion of duration models, we refer to Kalbfleisch and Prentice (1980), Kiefer (1988) and Lancaster (1990), among others.

The outline of this chapter is as follows. In section 8.1 we discuss the representation and interpretation of two commonly considered duration models, which are often used to analyze duration data in marketing. Although the discussion starts off with a simple model for discrete duration variables, we focus in this section on duration models with continuous dependent variables. We discuss the Accelerated Lifetime specification and the Proportional Hazard specification in detail. Section 8.2 deals with Maximum Likelihood estimation of the parameters of the two models. In section 8.3 we discuss diagnostics, model selection and forecasting with duration models. In section 8.4 we illustrate models for interpurchase times in relation to liquid detergents (see section 2.2.6 for more details on the data). Finally, in section 8.5 we again deal with modeling unobserved heterogeneity as an advanced topic.

8.1 Representation and interpretation

Let $T_i$ be a discrete random variable for the length of a duration observed for individual $i$ and $t_i$ the actual length, where $T_i$ can take the values $1, 2, 3, \ldots$ for $i = 1, \ldots, N$. It is common practice in the econometric literature to refer to a duration variable as a spell. Suppose that the probability that the spell ends is equal to $\lambda$ at every period $t$ in time, where $t = 1, \ldots, t_i$. The probability that the spell ends after two periods is therefore $\lambda(1 - \lambda)$. In general, the probability that the spell ends after $t_i$ duration periods is then

$$\Pr[T_i = t_i] = \lambda (1 - \lambda)^{(t_i - 1)}. \tag{8.1}$$

In other words, the random variable $T_i$ has a geometric distribution with parameter $\lambda$ (see section A.2 in the Appendix).

In many cases one wants to relate the probability that a spell
ends to explanatory variables. Because $\lambda$ is a probability, one can, for example, consider

$$\lambda = F(\beta_0 + \beta_1 x_i), \tag{8.2}$$

where $F$ is again a function that maps the explanatory variable $x_i$ on the unit interval $[0, 1]$ (see also section 4.1). The function $F$ can, for example, be the logistic function.

If $x_i$ is a variable that takes the same value over time (for example, gender), the probability that the spell ends does not change over time. This may be an implausible assumption. If we consider, for example, purchase timing, we may expect that the probability that a household will buy detergent is higher if the relative price of detergent is low and lower if the relative price is high. In other words, the probability that a spell will end can be time dependent. In this case, the probability that the spell ends after $t_i$ periods is given by

$$\Pr[T_i = t_i] = \lambda_{t_i} \prod_{t=1}^{t_i - 1} (1 - \lambda_t), \tag{8.3}$$

where $\lambda_t$ is the probability that the spell will end at time $t$ given that it has lasted until $t$, for $t = 1, \ldots, t_i$. This probability may be related to explanatory variables that stay the same over time, $x_i$, and explanatory variables that change over time, $w_{i,t}$, according to

$$\lambda_t = F(\beta_0 + \beta_1 x_i + \gamma w_{i,t}). \tag{8.4}$$

The variable $w_{i,t}$ can be the price of detergent in week $t$, for example. Additionally, it is likely that the probability that a household will buy detergent is higher if it last bought detergent four weeks ago rather than two weeks ago. To allow for an increase in the purchase probability over time, one may include (functions of) the variable $t$ as an explanatory variable with respect to $\lambda_t$, as in

$$\lambda_t = F(\beta_0 + \beta_1 x_i + \gamma w_{i,t} + \delta t). \tag{8.5}$$

The functions $\lambda_t$, which represent the probability that the spell will end at time $t$ given that it has lasted until $t$, are called hazard functions.

In practice, duration data are often continuous variables (or treated as continuous variables) instead of discrete variables. This means that $T_i$ is a continuous random variable that can take values on the
interval $[0, \infty)$. In the remainder of this chapter we will focus the discussion on modeling such continuous duration data. The discussion concerning discrete duration data turns out to be a good basis for the interpretation of the models for continuous duration data.

The distribution of the continuous random variable $T_i$ for the length of a spell of individual $i$ is described by the density function $f(t_i)$. The density function $f(t_i)$ is the continuous-time version of (8.3). Several distributions have been proposed to describe duration (see table 8.1 for some examples and section A.2 in the Appendix for more details). The normal distribution, which is frequently used in econometric models, is however not a good option because duration has to be positive. The lognormal distribution can be used instead.

The probability that the continuous random variable $T_i$ is smaller than $t$ is now given by

$$\Pr[T_i < t] = F(t) = \int_0^t f(s)\,ds, \tag{8.6}$$

where $F(t)$ denotes the cumulative distribution function of $T_i$. It is common practice in the duration literature to use the survival function, which is [...]

Table 8.1 Some density functions with expressions for their corresponding hazard functions

| | Density $f(t)$ | Survival $S(t)$ | Hazard $\lambda(t)$ |
|---|---|---|---|
| Exponential | $\gamma \exp(-\gamma t)$ | $\exp(-\gamma t)$ | $\gamma$ |
| Weibull | $\gamma\alpha(\gamma t)^{\alpha-1}\exp(-(\gamma t)^{\alpha})$ | $\exp(-(\gamma t)^{\alpha})$ | $\gamma\alpha(\gamma t)^{\alpha-1}$ |
| Loglogistic | $\gamma\alpha(\gamma t)^{\alpha-1}(1+(\gamma t)^{\alpha})^{-2}$ | $(1+(\gamma t)^{\alpha})^{-1}$ | $\gamma\alpha(\gamma t)^{\alpha-1}(1+(\gamma t)^{\alpha})^{-1}$ |
| Lognormal | $(\alpha/t)\,\phi(\alpha\log(\gamma t))$ | $\Phi(-\alpha\log(\gamma t))$ | $(\alpha/t)\,\phi(\alpha\log(\gamma t))\,(\Phi(-\alpha\log(\gamma t)))^{-1}$ |

Notes: In all cases $\alpha > 0$ and $\gamma > 0$. $\Phi$ and $\phi$ are the cumulative distribution function and the density function of a standard normal distribution.

[...] can be interpreted as an elasticity.

8.1.2 Proportional Hazard model

A second way to include explanatory variables in a duration model is to scale the hazard function by the function $\psi(\cdot)$, that is,

$$\lambda(t_i \mid x_i) = \psi(x_i)\,\lambda_0(t_i), \tag{8.21}$$

where $\lambda_0(t_i)$ denotes the baseline hazard. Again, because the hazard function has to be nonnegative, one usually specifies $\psi(\cdot)$ as

$$\psi(x_i) = \exp(\beta_0 + \beta_1 x_i). \tag{8.22}$$

If the intercept $\beta_0$ is unequal to 0, the baseline hazard in (8.21) is identified only up to a scalar. Hence, if one opts for a Weibull or an exponential baseline hazard one again has to restrict $\gamma$ to 1 to identify the parameters.

The interpretation of the parameter $\beta_1$ for the proportional hazard specification is different from that for the Accelerated Lifetime model. This parameter describes the constant proportional effect of $x_i$ on the conditional probability of completing a spell, which can be observed from

$$\frac{\partial \log \lambda(t_i \mid x_i)}{\partial x_i} = \frac{\partial \log \psi(x_i)}{\partial x_i} = \beta_1. \tag{8.23}$$

This suggests that one can linearize the model as follows:

$$-\log \Lambda_0(t_i) = \beta_0 + \beta_1 x_i + u_i, \tag{8.24}$$

where $\Lambda_0(t_i)$ denotes the integrated baseline hazard defined as $\int_0^{t_i} \lambda_0(s)\,ds$. The distribution of $u_i$ follows from

$$\begin{aligned}
\Pr[u_i < U] &= \Pr[-\log \Lambda_0(t_i) < U + \beta_0 + \beta_1 x_i] \\
&= \Pr[\Lambda_0(t_i) > \exp(-U - \beta_0 - \beta_1 x_i)] \\
&= \Pr[t_i > \Lambda_0^{-1}[\exp(-U - \beta_0 - \beta_1 x_i)]] \\
&= S(\Lambda_0^{-1}[\exp(-U - \beta_0 - \beta_1 x_i)]) \quad [...]
\end{aligned} \tag{8.25}$$

[...]

In practice, for many problems we are interested not particularly in the density of the durations but in the shape of the hazard [...]

[...] (8.7)

Using the survival function we can define the continuous-time analogue of the hazard functions $\lambda_t$ in (8.2) and (8.5), that is,

$$\lambda(t) = \frac{f(t)}{S(t)}, \tag{8.8}$$

where we now use $\lambda(t)$ to indicate [...]
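As a numerical illustration of the hazard definition in (8.8), the following Python sketch evaluates the density, survival and hazard expressions listed in Table 8.1 and checks that the ratio f(t)/S(t) reproduces the closed-form hazard. It is a minimal sketch under stated assumptions: the argument names `gamma` (scale) and `alpha` (shape), the function names and the parameter values are chosen here for illustration and are not taken from the text. The second check uses the standard identity that the hazard equals minus the derivative of log S(t), which is not stated in the excerpt above, and approximates that derivative by a central finite difference.

```python
import math


def normal_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Each function returns (density, survival, closed-form hazard) at time t > 0
# for an assumed scale gamma > 0 and an assumed shape alpha > 0.

def weibull(t, gamma, alpha):
    """Weibull case; alpha = 1 gives the exponential distribution."""
    gt = gamma * t
    density = gamma * alpha * gt ** (alpha - 1.0) * math.exp(-(gt ** alpha))
    survival = math.exp(-(gt ** alpha))
    hazard = gamma * alpha * gt ** (alpha - 1.0)
    return density, survival, hazard


def loglogistic(t, gamma, alpha):
    gt = gamma * t
    density = gamma * alpha * gt ** (alpha - 1.0) / (1.0 + gt ** alpha) ** 2
    survival = 1.0 / (1.0 + gt ** alpha)
    hazard = gamma * alpha * gt ** (alpha - 1.0) / (1.0 + gt ** alpha)
    return density, survival, hazard


def lognormal(t, gamma, alpha):
    z = alpha * math.log(gamma * t)
    density = (alpha / t) * normal_pdf(z)
    survival = normal_cdf(-z)
    hazard = (alpha / t) * normal_pdf(z) / normal_cdf(-z)
    return density, survival, hazard


def hazard_from_ratio(dist, t, gamma, alpha):
    """Hazard computed as f(t) / S(t), as in (8.8)."""
    density, survival, _ = dist(t, gamma, alpha)
    return density / survival


def hazard_from_log_survival(dist, t, gamma, alpha, step=1e-5):
    """Hazard computed as -d log S(t) / dt with a central finite difference."""
    upper = math.log(dist(t + step, gamma, alpha)[1])
    lower = math.log(dist(t - step, gamma, alpha)[1])
    return -(upper - lower) / (2.0 * step)


if __name__ == "__main__":
    gamma, alpha = 0.8, 1.5  # illustrative values only, not from the text
    for name, dist in (("Weibull", weibull),
                       ("loglogistic", loglogistic),
                       ("lognormal", lognormal)):
        for t in (0.5, 1.0, 2.0):
            closed = dist(t, gamma, alpha)[2]
            ratio = hazard_from_ratio(dist, t, gamma, alpha)
            numeric = hazard_from_log_survival(dist, t, gamma, alpha)
            print(f"{name:12s} t={t:.1f} closed={closed:.6f} "
                  f"ratio={ratio:.6f} numeric={numeric:.6f}")
```

Running the script prints the three hazard values side by side for each distribution and time point, which makes it easy to see, for example, that the Weibull hazard with a shape above one increases with duration while the exponential case (shape equal to one) is constant.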