Inference for Logistic and Probit Regression Models

The customary method of estimation for logistic and probit models is maximum likelihood, described in further detail in Section 11.9. To provide intuition, we outline the ideas in the context of binary dependent variable regression models.

The likelihood is the observed value of the probability function. For a single observation, the likelihood is

$$\begin{cases} 1-\pi_i & \text{if } y_i = 0, \\ \pi_i & \text{if } y_i = 1. \end{cases}$$

The objective of maximum likelihood estimation is to find the parameter values that produce the largest likelihood. Because the logarithm is a strictly increasing function, maximizing the logarithm of a function yields the same solution as maximizing the function itself. Because it is generally computationally simpler, we consider the logarithmic (or log-) likelihood, written as

$$\begin{cases} \ln(1-\pi_i) & \text{if } y_i = 0, \\ \ln \pi_i & \text{if } y_i = 1. \end{cases} \qquad (11.2)$$

More compactly, the log-likelihood of a single observation is

$$y_i \ln \pi(\mathbf{x}_i'\boldsymbol{\beta}) + (1-y_i)\ln\left(1-\pi(\mathbf{x}_i'\boldsymbol{\beta})\right),$$

where $\pi_i = \pi(\mathbf{x}_i'\boldsymbol{\beta})$. Assuming independence among observations, the likelihood of the dataset is a product of likelihoods of each observation. Taking logarithms, the log-likelihood of the dataset is the sum of log-likelihoods of single observations.
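The compact form and the independence argument can be checked numerically. The Python sketch below evaluates the single-observation log-likelihood in both forms and sums it over a small dataset. The responses `ys` and probabilities `pis` are hypothetical toy values, not data from the text, and every probability is assumed to lie strictly between 0 and 1 so that both logarithms are defined.

```python
import math

def loglik_obs_cases(y, p):
    """Case-wise form (11.2): ln(1 - p) if y == 0, ln(p) if y == 1."""
    return math.log(p) if y == 1 else math.log(1.0 - p)

def loglik_obs_compact(y, p):
    """Compact form: y ln(p) + (1 - y) ln(1 - p); requires 0 < p < 1."""
    return y * math.log(p) + (1 - y) * math.log(1.0 - p)

def loglik_dataset(ys, ps):
    """Independent observations: the log of a product is the sum of the logs."""
    return sum(loglik_obs_compact(y, p) for y, p in zip(ys, ps))

# Hypothetical toy data (not from the text).
ys = [1, 0, 1]
pis = [0.8, 0.3, 0.6]

for y, p in zip(ys, pis):
    # The compact expression reproduces the case-wise definition.
    assert math.isclose(loglik_obs_cases(y, p), loglik_obs_compact(y, p))

total = loglik_dataset(ys, pis)
# Exponentiating recovers the product of the individual likelihoods.
print(total, math.exp(total))
```

Here the individual likelihoods are 0.8, 0.7 and 0.6, so `math.exp(total)` equals their product.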

The log-likelihood is viewed as a function of the parameters, with the data held fixed. In contrast, the joint probability mass function is viewed as a function of the realized data, with the parameters held fixed.

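The log-likelihood of the dataset is

$$L(\boldsymbol{\beta}) = \sum_{i=1}^{n}\left\{ y_i \ln \pi(\mathbf{x}_i'\boldsymbol{\beta}) + (1-y_i)\ln\left(1-\pi(\mathbf{x}_i'\boldsymbol{\beta})\right)\right\}. \qquad (11.3)$$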
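As a minimal sketch of equation (11.3), the function below evaluates $L(\boldsymbol{\beta})$ with the data held fixed, for either the logit or the probit choice of $\pi$. The helper names (`logit_pi`, `probit_pi`, `log_likelihood`) and the toy data are illustrative assumptions; each row of `X` is assumed to carry a leading 1 for the intercept.

```python
import math

def logit_pi(z):
    """Logistic distribution function pi(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def probit_pi(z):
    """Standard normal distribution function, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def log_likelihood(beta, X, y, pi=logit_pi):
    """Equation (11.3): sum of y_i ln pi(x_i'beta) + (1 - y_i) ln(1 - pi(x_i'beta)).

    Each row of X is x_i, with a leading 1 for the intercept.
    """
    total = 0.0
    for xi, yi in zip(X, y):
        p = pi(sum(a * b for a, b in zip(xi, beta)))
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

# Hypothetical toy data: intercept column plus one explanatory variable.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 0, 1]

# The data stay fixed; L is evaluated at different parameter values.
for beta in ([0.0, 0.0], [0.0, 1.0], [-0.5, 0.8]):
    print(beta, log_likelihood(beta, X, y), log_likelihood(beta, X, y, probit_pi))
```

At $\boldsymbol{\beta} = \mathbf{0}$ both choices of $\pi$ give $\pi_i = 1/2$, so $L = n \ln(1/2)$; a positive slope raises the log-likelihood for these toy data.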


The method of maximum likelihood involves finding the values of $\boldsymbol{\beta}$ that maximize the log-likelihood. The customary method of finding the maximum is taking partial derivatives with respect to the parameters of interest and finding roots of the resulting equations. In this case, taking partial derivatives with respect to $\boldsymbol{\beta}$ yields the score equations

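$$\frac{\partial}{\partial \boldsymbol{\beta}} L(\boldsymbol{\beta}) = \sum_{i=1}^{n} \mathbf{x}_i \left(y_i - \pi(\mathbf{x}_i'\boldsymbol{\beta})\right) \frac{\pi'(\mathbf{x}_i'\boldsymbol{\beta})}{\pi(\mathbf{x}_i'\boldsymbol{\beta})\left(1-\pi(\mathbf{x}_i'\boldsymbol{\beta})\right)} = \mathbf{0}, \qquad (11.4)$$

where $\pi'$ is the derivative of $\pi$. The solution of these equations, denoted as $\mathbf{b}_{MLE}$, is the maximum likelihood estimator. For the logit function the score equations reduce to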

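$$\frac{\partial}{\partial \boldsymbol{\beta}} L(\boldsymbol{\beta}) = \sum_{i=1}^{n} \mathbf{x}_i \left(y_i - \pi(\mathbf{x}_i'\boldsymbol{\beta})\right) = \mathbf{0}, \qquad (11.5)$$

where $\pi(z) = 1/(1+\exp(-z))$. For this function $\pi'(z) = \pi(z)\left(1-\pi(z)\right)$, so the ratio in equation (11.4) equals one.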
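The score equations can be checked against a numerical derivative of $L(\boldsymbol{\beta})$. The sketch below is an illustration on assumed toy data, not a fitting routine: it codes the general score (11.4), compares it with central finite differences for the probit choice $\pi = \Phi$ (so that $\pi' = \phi$, the standard normal density), and confirms that for the logit the general weight collapses to $y_i - \pi(\mathbf{x}_i'\boldsymbol{\beta})$ as in (11.5). All function names are hypothetical.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_deriv(z):
    # For the logit, pi'(z) = pi(z) * (1 - pi(z)).
    p = logistic(z)
    return p * (1.0 - p)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    # pi' for the probit choice pi = Phi.
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def log_likelihood(beta, X, y, pi):
    """Equation (11.3) for a given distribution function pi."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = pi(sum(a * b for a, b in zip(xi, beta)))
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

def score(beta, X, y, pi, dpi):
    """General score (11.4): sum_i x_i (y_i - pi) pi' / (pi (1 - pi))."""
    g = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        z = sum(a * b for a, b in zip(xi, beta))
        p = pi(z)
        weight = (yi - p) * dpi(z) / (p * (1.0 - p))
        for j, xij in enumerate(xi):
            g[j] += xij * weight
    return g

def logit_score(beta, X, y):
    """Logit score (11.5): sum_i x_i (y_i - pi(x_i'beta))."""
    g = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        resid = yi - logistic(sum(a * b for a, b in zip(xi, beta)))
        for j, xij in enumerate(xi):
            g[j] += xij * resid
    return g

def numeric_gradient(f, beta, h=1e-6):
    """Central finite differences, used only as an independent check."""
    grad = []
    for j in range(len(beta)):
        up, down = list(beta), list(beta)
        up[j] += h
        down[j] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad

# Hypothetical toy data and parameter value.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 0, 1]
beta = [0.2, -0.5]

probit_analytic = score(beta, X, y, norm_cdf, norm_pdf)
probit_numeric = numeric_gradient(lambda b: log_likelihood(b, X, y, norm_cdf), beta)
logit_general = score(beta, X, y, logistic, logistic_deriv)
logit_reduced = logit_score(beta, X, y)
print(probit_analytic, probit_numeric)
print(logit_general, logit_reduced)
```

Setting these score vectors to zero and solving for $\boldsymbol{\beta}$ is what defines $\mathbf{b}_{MLE}$; the finite-difference comparison only confirms that the analytic derivative is right.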

11.3.2 Additional Inference

The large-sample variance of $\mathbf{b}_{MLE}$ may be estimated by taking partial derivatives of the score equations. Specifically, the term

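$$\mathbf{I}(\boldsymbol{\beta}) = -\mathrm{E}\left(\frac{\partial^2}{\partial \boldsymbol{\beta}\,\partial \boldsymbol{\beta}'} L(\boldsymbol{\beta})\right)$$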

is the information matrix. As a special case, using the logit function and equation (11.5), straightforward calculations show that the information matrix is

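$$\mathbf{I}(\boldsymbol{\beta}) = \sum_{i=1}^{n} \sigma_i^2\, \mathbf{x}_i \mathbf{x}_i',$$

where $\sigma_i^2 = \pi(\mathbf{x}_i'\boldsymbol{\beta})\left(1-\pi(\mathbf{x}_i'\boldsymbol{\beta})\right)$. The square root of the $(j+1)$st diagonal element of the inverse of this matrix, evaluated at $\boldsymbol{\beta} = \mathbf{b}_{MLE}$, yields the standard error for $b_{j,MLE}$, denoted as $se(b_{j,MLE})$.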
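A minimal sketch of how these pieces combine for the logit, assuming small dense matrices handled in pure Python: Newton-Raphson solves the score equations (11.5) using the information matrix $\sum_i \sigma_i^2 \mathbf{x}_i \mathbf{x}_i'$ (for the logit the Hessian does not involve $y_i$, so Newton-Raphson and Fisher scoring coincide), and the standard errors are the square roots of the diagonal of the inverse information matrix at $\mathbf{b}_{MLE}$. The toy data are hypothetical and chosen so that the two outcome groups overlap; with perfectly separated data the maximum likelihood estimate does not exist and this iteration would diverge. All function names are illustrative.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def invert(A):
    """Gauss-Jordan inverse with partial pivoting; adequate for small matrices."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]

def logit_score(beta, X, y):
    """Equation (11.5): sum_i x_i (y_i - pi(x_i'beta))."""
    g = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        r = yi - logistic(sum(a * b for a, b in zip(xi, beta)))
        for j, xij in enumerate(xi):
            g[j] += xij * r
    return g

def logit_information(beta, X):
    """I(beta) = sum_i sigma_i^2 x_i x_i' with sigma_i^2 = pi_i (1 - pi_i)."""
    k = len(beta)
    info = [[0.0] * k for _ in range(k)]
    for xi in X:
        p = logistic(sum(a * b for a, b in zip(xi, beta)))
        s2 = p * (1.0 - p)
        for j in range(k):
            for m in range(k):
                info[j][m] += s2 * xi[j] * xi[m]
    return info

def fit_logit(X, y, tol=1e-10, max_iter=50):
    """Newton-Raphson: beta <- beta + I(beta)^{-1} score(beta)."""
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        inv = invert(logit_information(beta, X))
        g = logit_score(beta, X, y)
        step = [sum(inv[j][m] * g[m] for m in range(len(g))) for j in range(len(g))]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

# Hypothetical toy data (the groups overlap, so the MLE exists).
X = [[1.0, x] for x in (-2.0, -1.0, 0.0, 1.0, 2.0, 0.5, -0.5, 1.5)]
y = [0, 0, 1, 0, 1, 1, 0, 1]

b_mle = fit_logit(X, y)
cov = invert(logit_information(b_mle, X))
# Standard errors: square roots of the diagonal of the inverse information matrix.
se = [math.sqrt(cov[j][j]) for j in range(len(b_mle))]
print(b_mle, se)
```

Because the first column of `X` is the intercept, the first score equation forces the fitted probabilities to sum to the number of ones in `y`.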

To assess the overall model fit, it is customary to cite likelihood ratio test statistics in nonlinear regression models. To test the overall model adequacy $H_0: \beta_1 = \cdots = \beta_k = 0$, we use the statistic

$$LRT = 2 \times \left(L(\mathbf{b}_{MLE}) - L_0\right),$$

where $L_0$ is the maximized log-likelihood with only an intercept term. Under the null hypothesis $H_0$, this statistic has an approximate (large-sample) chi-square distribution with $k$ degrees of freedom. Section 11.9.3 describes likelihood ratio test statistics in greater technical detail.
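The test can be sketched as follows. For a binary response, the intercept-only model fits a constant probability whose maximum likelihood estimate is the sample proportion $\bar{y}$, so $L_0 = n\{\bar{y}\ln\bar{y} + (1-\bar{y})\ln(1-\bar{y})\}$. The chi-square upper-tail probability uses the standard closed forms for integer degrees of freedom so that only the Python standard library is needed. The value passed as `L_full` is a hypothetical log-likelihood, not a result from the text, and the function names are illustrative.

```python
import math

def intercept_only_loglik(y):
    """L0 for a binary response: the fitted constant probability is ybar (0 < ybar < 1)."""
    n = len(y)
    ybar = sum(y) / n
    return n * (ybar * math.log(ybar) + (1.0 - ybar) * math.log(1.0 - ybar))

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square with integer df >= 1."""
    if x <= 0.0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2.0) / r
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df, c = sqrt(x):
    # 2 (1 - Phi(c)) + 2 phi(c) * sum_{r=1}^{(df-1)/2} c^(2r-1) / (1 * 3 * ... * (2r-1))
    c = math.sqrt(x)
    tail = math.erfc(c / math.sqrt(2.0))  # equals 2 * (1 - Phi(c))
    pdf = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    term, total = c, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return tail + 2.0 * pdf * total

def likelihood_ratio_test(L_full, L0, k):
    """LRT = 2 (L(b_MLE) - L0), referred to a chi-square with k degrees of freedom."""
    stat = 2.0 * (L_full - L0)
    return stat, chi2_sf(stat, k)

y = [0, 0, 1, 0, 1, 1, 0, 1]   # hypothetical responses
L0 = intercept_only_loglik(y)  # equals 8 ln(1/2) here
L_full = -3.9                  # hypothetical maximized log-likelihood, one slope (k = 1)
stat, p_value = likelihood_ratio_test(L_full, L0, k=1)
print(round(L0, 4), round(stat, 4), round(p_value, 4))
```

In practice a statistics library's chi-square distribution would replace `chi2_sf`; the closed forms are used here only to keep the sketch dependency-free.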

As described in Section 11.9, measures of goodness of fit can be difficult to interpret in nonlinear models. One measure is the so-called max-scaled $R^2$, defined as $R^2_{ms} = R^2/R^2_{max}$, where

$$R^2 = 1 - \left(\frac{\exp(L_0/n)}{\exp(L(\mathbf{b}_{MLE})/n)}\right)^2$$

and $R^2_{max} = 1 - \exp(L_0/n)^2$. Here, $L_0/n$ represents the average per-observation value of the intercept-only log-likelihood.
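A short sketch of the max-scaled $R^2$, assuming the two maximized log-likelihoods and $n$ are already available; the inputs shown are hypothetical. The squared ratio of exponentials is computed as a single exponential of the difference, which is algebraically identical.

```python
import math

def max_scaled_r2(L_full, L0, n):
    """Max-scaled R^2: R^2 / R^2_max, built from average log-likelihoods L/n."""
    # R^2 = 1 - (exp(L0/n) / exp(L_full/n))^2, written as one exponential.
    r2 = 1.0 - math.exp(2.0 * (L0 - L_full) / n)
    # R^2_max = 1 - exp(L0/n)^2, the value R^2 takes when L_full = 0.
    r2_max = 1.0 - math.exp(2.0 * L0 / n)
    return r2 / r2_max

n = 8
L0 = n * math.log(0.5)  # intercept-only log-likelihood when half the responses are 1
L_full = -3.9           # hypothetical maximized log-likelihood
print(round(max_scaled_r2(L_full, L0, n), 4))
```

Dividing by $R^2_{max}$ rescales the measure so that a perfect fit of binary data ($L(\mathbf{b}_{MLE}) = 0$) gives exactly one.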

Another measure is the pseudo-$R^2$,

$$\frac{L(\mathbf{b}_{MLE}) - L_0}{L_{max} - L_0},$$

where $L_0$ and $L_{max}$ are the log-likelihoods based on only an intercept and on the maximum achievable, respectively. Like the coefficient of determination, the pseudo-$R^2$ takes on values between zero and one, with larger values indicating a better fit to the data. Other versions of the pseudo-$R^2$ are available in the literature; see, for example, Cameron and Trivedi (1998). An advantage of this pseudo-$R^2$ measure is its link to hypothesis testing of regression coefficients: its numerator, $L(\mathbf{b}_{MLE}) - L_0$, is half the likelihood ratio statistic $LRT$.
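Assuming a binary response, the maximum achievable log-likelihood $L_{max}$ is 0, because a saturated model can fit each $y_i$ with probability one; the pseudo-$R^2$ then becomes $1 - L(\mathbf{b}_{MLE})/L_0$. The sketch below makes that default explicit (pass another `L_max` for other settings) and recovers the same quantity from the likelihood ratio statistic. The numbers and function names are hypothetical.

```python
def pseudo_r2(L_full, L0, L_max=0.0):
    """Pseudo-R^2 = (L(b_MLE) - L0) / (L_max - L0).

    L_max defaults to 0, the maximum achievable log-likelihood for a binary
    response (a saturated model fits every y_i with probability one).
    """
    return (L_full - L0) / (L_max - L0)

def pseudo_r2_from_lrt(lrt, L0, L_max=0.0):
    """Same quantity from LRT = 2 (L(b_MLE) - L0): the numerator is LRT / 2."""
    return (lrt / 2.0) / (L_max - L0)

L0, L_full = -10.0, -8.0  # hypothetical log-likelihoods
print(pseudo_r2(L_full, L0), pseudo_r2_from_lrt(2.0 * (L_full - L0), L0))
```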

Example: Job Security. Valletta (1999) studied declining job security using the Panel Study of Income Dynamics (PSID) database. We consider here one of the regression models presented by Valletta, based on a sample of male heads of households that consists of n = 24,168 observations over the years 1976–92, inclusive. The PSID survey records reasons why men left their most recent employment, including plant closures, “quit,” and changed jobs for other reasons. However, Valletta focused on dismissals (“laid off” or “fired”) because involuntary separations are associated with job insecurity.

Table 11.3 presents a probit regression model run by Valletta (1999), using dismissals as the dependent variable. In addition to the explanatory variables listed in Table 11.3, the model controlled for education, marital status, number of children, race, years of full-time work experience and its square, union membership, government employment, logarithmic wage, the U.S. employment rate, and location as measured through Metropolitan Statistical Area residence. In Table 11.3, tenure is years employed at the current firm.

Table 11.3 Dismissal Probit Regression Estimates

| Variable | Parameter Estimate | Standard Error |
|---|---|---|
| Tenure | −0.084 | 0.010 |
| Time Trend | −0.002 | 0.005 |
| Tenure*(Time Trend) | 0.003 | 0.001 |
| Change in Logarithmic Sector Employment | 0.094 | 0.057 |
| Tenure*(Change in Logarithmic Sector Employment) | −0.020 | 0.009 |
| −2 Log-Likelihood | 7,027.8 | |
| Pseudo-R2 | 0.097 | |
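The usual large-sample reading of Table 11.3 divides each estimate by its standard error. The sketch below does that arithmetic on the transcribed entries, adds two-sided normal p-values, and converts the reported $-2$ log-likelihood back to $L(\mathbf{b}_{MLE})$. Because the table entries are rounded to three decimals, the ratios are only approximate; the dictionary name is illustrative.

```python
import math

# (estimate, standard error) pairs transcribed from Table 11.3.
table_11_3 = {
    "Tenure": (-0.084, 0.010),
    "Time Trend": (-0.002, 0.005),
    "Tenure*(Time Trend)": (0.003, 0.001),
    "Change in Logarithmic Sector Employment": (0.094, 0.057),
    "Tenure*(Change in Logarithmic Sector Employment)": (-0.020, 0.009),
}

# Large-sample z-ratio: estimate divided by its standard error.
z_ratios = {name: est / se for name, (est, se) in table_11_3.items()}

# Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
p_values = {name: math.erfc(abs(z) / math.sqrt(2.0)) for name, z in z_ratios.items()}

# The table reports -2 log-likelihood; halve and negate to recover L(b_MLE).
loglik = -7027.8 / 2.0

for name, z in z_ratios.items():
    print(f"{name:50s} z = {z:6.2f}  p = {p_values[name]:.4f}")
print("L(b_MLE) =", loglik)
```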

Further, sector employment was measured by examining the Consumer Price Survey employment in 387 sectors of the economy, based on 43 industry categories and nine regions of the country.

On the one hand, the negative tenure coefficient (−0.084) reveals that more experienced workers are less likely to be dismissed. On the other hand, the positive coefficient associated with the interaction between tenure and time trend (0.003) reveals that this protective effect of tenure weakened over time, that is, an increasing dismissal rate for experienced workers.
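To see how the interaction modifies the tenure effect, one can differentiate the probit index with respect to tenure using only the coefficients in Table 11.3. Because the normal distribution function is increasing, the sign of this derivative is the sign of the change in the modeled dismissal probability. The sketch assumes the time trend enters as the numeric value used in the regression; its origin and units are not stated here, so the trend values looped over are hypothetical, and the change in log sector employment defaults to zero.

```python
# Tenure-related probit coefficients from Table 11.3.
B_TENURE = -0.084
B_TENURE_X_TREND = 0.003
B_TENURE_X_SECTOR = -0.020

def tenure_slope(trend, sector_change=0.0):
    """Derivative of the probit index with respect to tenure.

    A negative value means longer tenure lowers the modeled dismissal
    probability at that trend value and sector employment change.
    """
    return B_TENURE + B_TENURE_X_TREND * trend + B_TENURE_X_SECTOR * sector_change

# Hypothetical trend values; the trend's origin and units are not given in the text.
for trend in (0, 5, 10, 15):
    print(trend, round(tenure_slope(trend), 3))
```

The slope moves toward zero as the trend grows, which is the sense in which the positive interaction signals rising dismissal risk for experienced workers.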

The interpretation of the sector employment coefficients is also of interest. Because of the interaction term, the coefficient on the change in logarithmic sector employment for a worker with a given tenure is 0.094 − 0.020 × tenure; at the sample average tenure of about 7.8 years, this is 0.094 − 0.156 = −0.062. We see that low-tenure men are relatively unaffected by changes in sector employment. However, for more experienced men, there is an increasing probability of dismissal associated with sectors of the economy where growth declines.
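The same arithmetic applies to the sector employment coefficients. This sketch computes the coefficient on the change in logarithmic sector employment implied by Table 11.3 at several tenures, together with the tenure at which it changes sign. Apart from the sample average of 7.8 years, the tenure values are hypothetical illustrations.

```python
# Sector-employment probit coefficients from Table 11.3.
B_SECTOR = 0.094
B_TENURE_X_SECTOR = -0.020

def sector_slope(tenure):
    """Coefficient on the change in log sector employment for a given tenure (years)."""
    return B_SECTOR + B_TENURE_X_SECTOR * tenure

# Tenure at which the combined coefficient is zero.
break_even = -B_SECTOR / B_TENURE_X_SECTOR

# 7.8 is the sample average tenure quoted in the text; the other values are illustrative.
for tenure in (0.0, 2.0, 7.8, 15.0):
    print(tenure, round(sector_slope(tenure), 3))
print("sign change at about", round(break_even, 2), "years")
```

A negative combined coefficient means that a fall in sector employment (a negative change) raises the probit index, and hence the modeled dismissal probability, for workers beyond roughly 4.7 years of tenure.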
