SAS/ETS 9.22 User's Guide


Chapter 22: The SEVERITY Procedure (Experimental)

Given this, the likelihood of the data L is as follows:

$$L = \prod_{i \in E} f_\Theta(y_i) \cdot \prod_{j \in E_l} \frac{f_\Theta(y_j)}{1 - F_\Theta(t_j)} \cdot \prod_{k \in C} \left(1 - F_\Theta(c_k)\right) \cdot \prod_{m \in C_l} \frac{1 - F_\Theta(c_m)}{1 - F_\Theta(t_m)}$$

The maximum likelihood procedure used by PROC SEVERITY finds an optimal set of parameter values Θ̂ that maximizes log(L) subject to the boundary constraints on parameter values. Note that for a distribution dist, such boundary constraints can be specified by using the dist_LOWERBOUNDS and dist_UPPERBOUNDS subroutines. Some aspects of the optimization process can be controlled by using the NLOPTIONS statement.

Probability of Observability and Likelihood

If probability of observability is specified for the left-truncation, then PROC SEVERITY uses a modified likelihood function for each truncated observation. If the probability of observability is p ∈ (0.0, 1.0], then for each left-truncated observation with truncation threshold t, there exist (1 − p)/p observations with a response variable value less than or equal to t. Each such observation has a probability of Pr(Y ≤ t) = F_Θ(t). Thus, following the notation of the section "Likelihood Function" on page 1541, the likelihood of the data is as follows:

$$L = \prod_{i \in E} f_\Theta(y_i) \cdot \prod_{j \in E_l} f_\Theta(y_j)\, F_\Theta(t_j)^{\frac{1-p}{p}} \cdot \prod_{k \in C} \left(1 - F_\Theta(c_k)\right) \cdot \prod_{m \in C_l} \left(1 - F_\Theta(c_m)\right) F_\Theta(t_m)^{\frac{1-p}{p}}$$

Note that the likelihood of the observations that are not left-truncated (observations in sets E and C) is not affected.

Estimating Covariance and Standard Errors

PROC SEVERITY computes an estimate of the covariance matrix of the parameters by using the asymptotic theory of the maximum likelihood estimators (MLE). If N denotes the number of observations used for estimating a parameter vector Θ, then the theory states that as N → ∞, the distribution of Θ̂, the estimate of Θ, converges to a normal distribution with mean Θ and covariance Ĉ such that I(Θ) · Ĉ → 1, where I(Θ) = −E[∇² log(L(Θ))] is the information matrix for the likelihood of the data, L(Θ). The covariance estimate is obtained by using the inverse of the information matrix.

In particular, if G = −∇² log(L(Θ)) denotes the Hessian matrix of the negative log likelihood, then the covariance estimate is computed as

$$\hat{C} = \frac{N}{d} G^{-1}$$

where d is a denominator determined by the VARDEF= option. If VARDEF=N, then d = N, which yields the asymptotic covariance estimate. If VARDEF=DF, then d = N − k, where k is the number of parameters (the model's degrees of freedom). The VARDEF=DF option is the default, because it attempts to correct the potential bias introduced by the finite sample.

The standard error s_i of the parameter θ_i is computed as the square root of the i-th diagonal element of the estimated covariance matrix; that is, $s_i = \sqrt{\hat{C}_{ii}}$.

Note that covariance and standard error estimates might not be available if the Hessian matrix is found to be singular at the end of the optimization process. This can especially happen if the optimization process stops without converging.

Estimating Regression Effects

The SEVERITY procedure enables you to estimate the effects of regressor (exogenous) variables while fitting a distribution model if the distribution has a scale parameter or a log-transformed scale parameter.

Let x_j (j = 1, ..., k) denote the k regressor variables. Let β_j denote the regression parameter that corresponds to the regressor x_j. If regression effects are not specified, then the model for the response variable Y is of the form

Y ~ F(Θ)

where F is the distribution of Y with parameters Θ. This model is typically referred to as the error model. The regression effects are modeled by extending the error model to the following form:

$$Y \sim \exp\left(\sum_{j=1}^{k} \beta_j x_j\right) F(\Theta)$$

Under this model, the distribution of Y is valid and belongs to the same parametric family as F if and only if F has a scale parameter.
Let θ denote the scale parameter and Ω denote the set of nonscale distribution parameters of F. Then the model can be rewritten as

Y ~ F(θ, Ω)

such that θ is affected by the regressors as

$$\theta = \theta_0 \cdot \exp\left(\sum_{j=1}^{k} \beta_j x_j\right)$$

where θ_0 is the base value of the scale parameter. Thus, the regression model consists of the following parameters: θ_0, Ω, and β_j (j = 1, ..., k).

Given this form of the model, distributions without a scale parameter cannot be considered when regression effects are to be modeled. If a distribution does not have a direct scale parameter, then PROC SEVERITY accepts it only if it has a log-transformed scale parameter; that is, if it has a parameter p = log(θ). You must define the SCALETRANSFORM function to specify the log-transformation when you define the distribution model.

Parameter Initialization for Regression Models

Let a random variable Y be distributed as F(θ, Ω), where θ is the scale parameter. By definition of the scale parameter, the random variable W = Y/θ is distributed as G(Ω) such that G(Ω) = F(1, Ω). Given a random error term e that is generated from the distribution G(Ω), a value y from the distribution of Y can be generated as

y = θ · e

Taking the logarithm of both sides and using the relationship of θ with the regressors yields:

$$\log(y) = \log(\theta_0) + \sum_{j=1}^{k} \beta_j x_j + \log(e)$$

If you do not provide initial values for the regression and distribution parameters, then PROC SEVERITY makes use of the preceding relationship to initialize the parameters of a regression model with distribution dist as follows:

1. The following linear regression problem is solved to obtain initial estimates of β_0 and β_j:

$$\log(y) = \beta_0 + \sum_{j=1}^{k} \beta_j x_j$$

The estimates of β_j (j = 1, ..., k) in the solution of this regression problem are used to initialize the respective regression parameters of the model.
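The three-step initialization for regression models can be sketched numerically. This is a hedged Python illustration, not SEVERITY code: the data are invented, there is a single regressor, and using the mean of the scale-normalized values as s_0 is a stand-in assumption for whatever the distribution's PARMINIT subroutine would actually compute.

```python
import math

# Invented sample: y is roughly exp(0 + 1*x) times noise-free error
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 2.7, 7.4, 20.1]

# Step 1: ordinary least squares of log(y) on x gives beta0, beta1
ly = [math.log(v) for v in y]
n = len(x)
mx, my = sum(x) / n, sum(ly) / n
beta1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, ly))
         / sum((xi - mx) ** 2 for xi in x))
beta0 = my - beta1 * mx

# Step 2: scale-normalized responses w_i = y_i / exp(beta0 + beta1 * x_i);
# these would be the inputs to the dist_PARMINIT subroutine
w = [yi / math.exp(beta0 + beta1 * xi) for xi, yi in zip(x, y)]

# Step 3: theta0 = s0 * exp(beta0); here s0 is taken as the mean of the
# w_i purely as an illustrative stand-in for the PARMINIT result
s0 = sum(w) / n
theta0 = s0 * math.exp(beta0)
print(beta0, beta1, theta0)
```

Because the invented data follow log(y) ≈ x, the recovered slope is close to 1 and the intercept close to 0.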
The results of this regression are also used to detect whether any regressors are linearly dependent on the other regressors. If any such regressors are found, then a warning is written to the SAS log and the corresponding regressor is eliminated from further analysis. The estimates for linearly dependent regressors are denoted by the special missing value .R in the OUTEST= data set and in any displayed output.

2. Each input value y_i of the response variable is transformed to its scale-normalized version w_i as

$$w_i = \frac{y_i}{\exp\left(\beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}\right)}$$

where x_{ij} denotes the value of the j-th regressor in the i-th input observation. These w_i values are used to compute the input arguments for the dist_PARMINIT subroutine. The values that are computed by the subroutine for nonscale parameters are used as their respective initial values. Let s_0 denote the value of the scale parameter that is computed by the subroutine. If the distribution has a log-transformed scale parameter P, then s_0 is computed as s_0 = exp(l_0), where l_0 is the value of P computed by the subroutine.

3. The value of θ_0 is initialized as

$$\theta_0 = s_0 \cdot \exp(\beta_0)$$

If you provide initial values for the regression parameters, then you must provide valid, nonmissing initial values for θ_0 and the β_j parameters. You can use only the INEST= data set to specify the initial values for β_j. You can use the .R special missing value to denote redundant regressors if any such regressors are specified in the MODEL statement. Initial values for θ_0 and other distribution parameters can be specified using either the INEST= data set or the INIT= option in the DIST statement. If the distribution has a direct scale parameter (no transformation), then the initial value for the first parameter of the distribution is used as an initial value for θ_0.
If the distribution has a log-transformed scale parameter, then the initial value for the first parameter of the distribution is used as an initial value for log(θ_0).

Reporting Estimates of Regression Parameters

When you request estimates to be written to the output (either ODS displayed output or the OUTEST= data set), the estimate of the base value of the first distribution parameter is reported. If the first parameter is the log-transformed scale parameter, then the estimate of log(θ_0) is reported; otherwise, the estimate of θ_0 is reported. The transform of the first parameter of a distribution dist is controlled by the dist_SCALETRANSFORM function that is defined for it.

CDF and PDF Estimates with Regression Effects

When regression effects are estimated, the estimate of the scale parameter depends on the values of the regressors and the estimates of the regression parameters. This results in a potentially different distribution for each observation. In order to make estimates of the cumulative distribution function (CDF) and probability density function (PDF) comparable across distributions and comparable to the empirical distribution function (EDF), PROC SEVERITY reports the CDF and PDF estimates from a mixture distribution. This mixture distribution is an equally weighted mixture of N distributions, where N is the number of observations used for estimation. Each component of the mixture differs only in the value of the scale parameter.

In particular, let f(y; θ̂_i, Ω̂) and F(y; θ̂_i, Ω̂) denote the PDF and CDF, respectively, of the component distribution due to observation i, where y denotes the value of the response variable, θ̂_i denotes the estimate of the scale parameter due to observation i, and Ω̂ denotes the set of estimates of all other parameters of the distribution. The value of θ̂_i is computed as

$$\hat{\theta}_i = \hat{\theta}_0 \cdot \exp\left(\sum_{j=1}^{k} \hat{\beta}_j x_{ij}\right)$$

where θ̂_0 is an estimate of the base value of the scale parameter, β̂_j are the estimates of the regression coefficients, and x_{ij} is the value of regressor j in observation i. Then, the PDF and CDF estimates, f*(y) and F*(y), respectively, of the mixture distribution at y are computed as follows:

$$f^*(y) = \frac{1}{N} \sum_{i=1}^{N} f(y; \hat{\theta}_i, \hat{\Omega})$$

$$F^*(y) = \frac{1}{N} \sum_{i=1}^{N} F(y; \hat{\theta}_i, \hat{\Omega})$$

The CDF estimates reported in the OUTCDF= data set and plotted in CDF plots are the F*(y) values. The PDF estimates plotted in PDF plots are the f*(y) values.

If left-truncation is specified without the probability of observability, then the conditional CDF estimate from the mixture distribution is computed as follows: Let F*(y) denote an unconditional mixture estimate of the CDF at y, and let t_min be the smallest value of the left-truncation threshold. Let F*(t_min) denote an unconditional mixture estimate of the CDF at t_min. Then, the conditional mixture estimate of the CDF at y is computed as

$$F_c(y) = \frac{F^*(y) - F^*(t_{min})}{1 - F^*(t_{min})}$$

Parameter Initialization

PROC SEVERITY enables you to initialize parameters of a model in different ways. There can be two kinds of parameters in a model: distribution parameters and regression parameters.

The distribution parameters can be initialized by using one of the following three methods:

PARMINIT subroutine: You can define a PARMINIT subroutine in the distribution model.
INEST= data set: You can use the INEST= data set.
INIT= option: You can use the INIT= option in the DIST statement.

Note that only one of the initialization methods is used. You cannot combine them. They are used in the following order:

- The method of using the INIT= option takes the highest precedence. If you use the INIT= option to provide an initial value for at least one parameter, then the other initialization methods (INEST= and PARMINIT) are not used.
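The equally weighted mixture CDF and its conditional (left-truncated) version described above can be sketched numerically. This is a hedged Python illustration, not SEVERITY output: it uses an exponential component F(y; θ) = 1 − exp(−y/θ) purely as an example distribution, and θ_0, β, and the regressor values are invented.

```python
import math

theta0, beta = 2.0, 0.5
x = [0.0, 1.0, 2.0]  # one regressor, three observations
# Per-observation scale estimates: theta_i = theta0 * exp(beta * x_i)
thetas = [theta0 * math.exp(beta * xi) for xi in x]

def component_cdf(y, theta):
    """Example component CDF: exponential with scale theta."""
    return 1.0 - math.exp(-y / theta)

def mixture_cdf(y):
    """F*(y): equally weighted average of the component CDFs."""
    return sum(component_cdf(y, th) for th in thetas) / len(thetas)

# Conditional mixture CDF under left-truncation at t_min:
# F_c(y) = (F*(y) - F*(t_min)) / (1 - F*(t_min))
t_min = 0.5
def conditional_cdf(y):
    f_tmin = mixture_cdf(t_min)
    return (mixture_cdf(y) - f_tmin) / (1.0 - f_tmin)

print(mixture_cdf(1.0), conditional_cdf(1.0))
```

Note that F_c(t_min) = 0 by construction, which is what makes the conditional estimate comparable to a conditional EDF.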
If you specify initial values for some but not all the parameters by using the INIT= option, then the uninitialized parameters are initialized to the default value of 0.001. If this option is used when regression effects are specified, then the initial value of the first distribution parameter must be related to the initial value for the base value of the scale or log-transformed scale parameter. See the section "Estimating Regression Effects" on page 1543 for details.

- The method of using the INEST= data set takes the second precedence. If a nonmissing value is specified for even one distribution parameter, then the PARMINIT method is not used and any uninitialized parameters are initialized to the default value of 0.001.

- If none of the distribution parameters are initialized by using the INIT= option or the INEST= data set, but the distribution model defines a PARMINIT subroutine, then PROC SEVERITY invokes that subroutine with appropriate inputs to initialize the parameters. If the PARMINIT subroutine returns missing values for some parameters, then those parameters are initialized to the default value of 0.001.

- If none of the initialization methods are used, each distribution parameter is initialized to the default value of 0.001.

The regression parameters can be initialized by using the INEST= data set or the default method. If you use the INEST= data set, then you must specify nonmissing initial values for all the regressors. The only missing value allowed is the special missing value .R, which indicates that the regressor is linearly dependent on other regressors. If you specify .R for a regressor for one distribution in a BY group, you must specify it for all the distributions in that BY group.
If you do not provide initial values for the regressors by using the INEST= data set, then PROC SEVERITY computes them by fitting a linear regression model for log(y) on all the regressors with an intercept in the model, where y denotes the response variable. If it finds any linearly dependent regressors, warnings are printed to the SAS log and those regressors are dropped from the model. Details about estimating regression effects are provided in the section "Estimating Regression Effects" on page 1543.

Empirical Distribution Function Estimation Methods

The empirical distribution function (EDF) is a nonparametric estimate of the cumulative distribution function (CDF) of the distribution. PROC SEVERITY uses EDF estimates for computing the EDF-based statistics, in addition to providing a nonparametric estimate of the CDF to the PARMINIT subroutine.

Let there be a set of N observations, each containing a triplet of values (y_i, t_i, δ_i), i = 1, ..., N, where y_i is the value of the response variable, t_i is the value of the left-truncation threshold, and δ_i is the indicator of right-censoring. A missing value for t_i indicates no left-truncation. δ_i = 0 indicates a right-censored observation, in which case y_i is assumed to record the right-censoring limit c_i. δ_i ≠ 0 indicates an uncensored observation. In the following definitions, an indicator function I[e] is used, which takes a value of 1 or 0 if the expression e is true or false, respectively.

Given this notation, the EDF is estimated as

$$F_n(y) = \begin{cases} 0 & \text{if } y < y_{(1)} \\ \hat{F}_n(y_{(k)}) & \text{if } y_{(k)} \leq y < y_{(k+1)}, \; k = 1, \ldots, N-1 \\ \hat{F}_n(y_{(N)}) & \text{if } y_{(N)} \leq y \end{cases}$$

where y_{(k)} denotes the k-th order statistic of the set {y_i} and F̂_n(y_{(k)}) is the estimate computed at that value. The definition of F̂_n depends on the estimation method. You can specify a particular method or let PROC SEVERITY choose an appropriate method by using the EMPIRICALCDF= option in the MODEL statement.
Each method computes F̂_n as follows:

STANDARD: This method is the standard way of computing the EDF. The EDF estimate at observation i is computed as

$$\hat{F}_n(y_i) = \frac{1}{N} \sum_{j=1}^{N} I[y_j \leq y_i]$$

This method ignores any censoring and truncation information, even if it is specified. When no censoring or truncation information is specified, this is the default method.

KAPLANMEIER: This method is suitable primarily when left-truncation or right-censoring is specified. The Kaplan-Meier (KM) estimator, also known as the product-limit estimator, was first introduced by Kaplan and Meier (1958) for censored data. Lynden-Bell (1971) derived a similar estimator for left-truncated data. PROC SEVERITY uses the definition that combines both censoring and truncation information (Klein and Moeschberger 1997; Lai and Ying 1991). The EDF estimate at observation i is computed as

$$\hat{F}_n(y_i) = 1 - \prod_{\tau \leq y_i} \left(1 - \frac{n_\tau}{R_n(\tau)}\right)$$

where n_τ and R_n(τ) are defined as follows:

- n_τ = Σ_{k=1}^{N} I[y_k = τ and δ_k ≠ 0], which is the number of uncensored observations with response variable value equal to τ.
- R_n(τ) = Σ_{k=1}^{N} I[y_k ≥ τ > t_k], which is the size (cardinality) of the risk set at τ. The term risk set has its origins in survival analysis; it contains the events that are at risk of failure at a given time, τ. In other words, it contains the events that have survived up to time τ and might fail at or after τ. For PROC SEVERITY, time is equivalent to the magnitude of the event, and failure is equivalent to an uncensored and observable event, where observable means that it satisfies the left-truncation threshold.

If you specify either right-censoring or left-truncation and do not explicitly specify a method of computing the EDF, then this is the default method.

MODIFIEDKM: The product-limit estimator used by the KAPLANMEIER method does not work well if the risk set size becomes very small.
This can happen for right-censored data toward the right tail, and for left-truncated data at the left tail, where the effect can propagate to the entire range of the data. This was demonstrated by Lai and Ying (1991), who proposed a modification to the estimator that ignores the effects due to small risk set sizes. The EDF estimate at observation i is computed as

$$\hat{F}_n(y_i) = 1 - \prod_{\tau \leq y_i} \left(1 - \frac{n_\tau}{R_n(\tau)} \, I[R_n(\tau) \geq c N^\alpha]\right)$$

where the definitions of n_τ and R_n(τ) are identical to those used for the KAPLANMEIER method described previously.

You can specify the values of c and α by using the C= and ALPHA= options. If you do not specify a value for c, the default value is c = 1. If you do not specify a value for α, the default value is α = 0.5. As an alternative, you can also specify an absolute lower bound, say L, on the risk set size by using the RSLB= option, in which case I[R_n(τ) ≥ c N^α] is replaced by I[R_n(τ) ≥ L] in the definition.

EDF Estimates and Left-Truncation

If left-truncation is specified without the probability of observability, the estimate F̂_n(y) computed by the KAPLANMEIER and MODIFIEDKM methods is a conditional estimate. In other words, F̂_n(y) = Pr(Y ≤ y | Y > τ_G), where G denotes the (unknown) distribution function of the t_i and τ_G = inf{s : G(s) > 0}. In other words, τ_G is the smallest threshold with a nonzero cumulative probability. For computational purposes, PROC SEVERITY computes τ_G as τ_G = min{t_k : 1 ≤ k ≤ N}.

If left-truncation is specified with the probability of observability p, then PROC SEVERITY uses the additional information provided by p to compute an unconditional estimate of the EDF. In particular, for each left-truncated observation i with response variable value y_i and truncation threshold t_i, an observation j is added with weight w_j = (1 − p)/p and y_j = t_j. Each added observation is assumed to be uncensored; that is, δ_j = 1.
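The product-limit estimator with the truncation-aware risk set R_n(τ) = #{k : y_k ≥ τ > t_k} can be sketched directly. This is a hedged Python illustration of the KAPLANMEIER definition above (unweighted case), with a small invented data set; it is not PROC SEVERITY's implementation.

```python
# Each observation is (y, t, delta): response value, left-truncation
# threshold (0.0 = effectively untruncated here), censoring indicator
# (0 = right-censored, nonzero = uncensored).
obs = [(2.0, 0.0, 1), (3.0, 0.0, 1), (3.0, 1.0, 0), (5.0, 2.0, 1)]

def km_edf(y_eval, obs):
    """F_n(y) = 1 - prod over uncensored values tau <= y of
    (1 - n_tau / R_n(tau)), with the left-truncation-aware risk set."""
    taus = sorted({y for y, t, d in obs if d != 0 and y <= y_eval})
    surv = 1.0
    for tau in taus:
        n_tau = sum(1 for y, t, d in obs if y == tau and d != 0)
        risk = sum(1 for y, t, d in obs if y >= tau > t)  # R_n(tau)
        if risk > 0:
            surv *= 1.0 - n_tau / risk
    return 1.0 - surv

print([km_edf(v, obs) for v in (1.0, 2.0, 3.0, 5.0)])
```

The observation with threshold t = 2.0 enters the risk set only for τ > 2.0, which is exactly how left-truncation thins the early risk sets. Extending this to the MODIFIEDKM method amounts to multiplying n_τ/R_n(τ) by the indicator I[R_n(τ) ≥ cN^α].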
The weight on each original observation i is assumed to be 1; that is, w_i = 1. Let N_a denote the number of observations in this appended set of observations. Then, the specified EDF method is used by assuming no left-truncation. For the KAPLANMEIER and MODIFIEDKM methods, the definitions of n_τ and R_n(τ) are modified to account for the weights on the observations: n_τ is now defined as n_τ = Σ_{k=1}^{N_a} w_k I[y_k = τ and δ_k ≠ 0], and R_n(τ) is defined as R_n(τ) = Σ_{k=1}^{N_a} w_k I[y_k ≥ τ]. From the definition of R_n(τ), note that each observation in the appended set is assumed to be observed; that is, the left-truncation information is not used, because it was already used along with p to add the observations. The estimate obtained with this method is an unconditional estimate of the EDF.

Statistics of Fit

PROC SEVERITY computes and reports various statistics of fit to indicate how well the estimated model fits the data. The statistics belong to two categories: likelihood-based statistics and EDF-based statistics. Neg2LogLike, AIC, AICC, and BIC are likelihood-based statistics, and KS, AD, and CvM are EDF-based statistics. The following subsections provide definitions of each.

Likelihood-Based Statistics

Let y_i, i = 1, ..., N denote the response variable values. Let L be the likelihood as defined in the section "Likelihood Function" on page 1541. Let p denote the number of model parameters estimated. Note that p = p_d + (k − k_r), where p_d is the number of distribution parameters, k is the number of regressors, if any, specified in the MODEL statement, and k_r is the number of regressors found to be linearly dependent (redundant) on other regressors.

Given this notation, the likelihood-based statistics are defined as follows:

Neg2LogLike: The log likelihood is reported as

Neg2LogLike = −2 log(L)

The multiplying factor −2 makes it easy to compare it to the other likelihood-based statistics.
A model with a smaller value of Neg2LogLike is deemed better.

AIC: Akaike's information criterion (AIC) is defined as

AIC = −2 log(L) + 2p

A model with a smaller value of AIC is deemed better.

AICC: The corrected Akaike's information criterion (AICC) is defined as

$$AICC = -2\log(L) + \frac{2Np}{N - p - 1}$$

A model with a smaller value of AICC is deemed better. It corrects the finite-sample bias that AIC has when N is small compared to p. AICC is related to AIC as

$$AICC = AIC + \frac{2p(p+1)}{N - p - 1}$$

As N becomes large compared to p, AICC converges to AIC. AICC is usually recommended over AIC as a model selection criterion.

BIC: The Schwarz Bayesian information criterion (BIC) is defined as

BIC = −2 log(L) + p log(N)

A model with a smaller value of BIC is deemed better.

EDF-Based Statistics

This class of statistics is based on the difference between the estimate of the cumulative distribution function (CDF) and the estimate of the empirical distribution function (EDF). Let y_i, i = 1, ..., N denote the sample of N values of the response variable. Let r_i = Σ_{j=1}^{N} I[y_j ≤ y_i] denote the number of observations with a value less than or equal to y_i, where I is an indicator function. Let F_n(y_i) denote the EDF estimate that is computed by using the method specified in the EMPIRICALCDF= option. Let Z_i = F̂(y_i) denote the estimate of the CDF. Let F_n(Z_i) denote the EDF estimate of the Z_i values that is computed using the same method that is used to compute the EDF of the y_i values. Using the probability integral transformation, if F(y) is the true distribution of the random variable Y, then the random variable Z = F(y) is uniformly distributed between 0 and 1 (D'Agostino and Stephens 1986, Ch. 4). Thus, comparing F_n(y_i) with F̂(y_i) is equivalent to comparing F_n(Z_i) with F̂(Z_i) = Z_i (uniform distribution).
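The four likelihood-based statistics above differ only in their penalty terms, which a few lines of arithmetic make concrete. This is a hedged Python sketch; the −2 log L value is an invented stand-in, not the result of a real fit.

```python
import math

def fit_stats(neg2ll, n_params, n_obs):
    """Likelihood-based statistics as defined in the text."""
    aic = neg2ll + 2 * n_params
    aicc = neg2ll + (2 * n_obs * n_params) / (n_obs - n_params - 1)
    bic = neg2ll + n_params * math.log(n_obs)
    return {"Neg2LogLike": neg2ll, "AIC": aic, "AICC": aicc, "BIC": bic}

stats = fit_stats(neg2ll=250.0, n_params=3, n_obs=100)

# The identity AICC = AIC + 2p(p+1)/(N - p - 1) should hold exactly:
p, N = 3, 100
print(stats["AICC"] - stats["AIC"], 2 * p * (p + 1) / (N - p - 1))
```

As N grows with p fixed, the AICC correction term 2p(p+1)/(N − p − 1) shrinks toward 0, which is the convergence to AIC noted above.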
Note the following two points regarding which CDF estimates are used for computing the test statistics:

- If regressor variables are specified, then the CDF estimates Z_i used for computing the EDF test statistics are from a mixture distribution. See the section "CDF and PDF Estimates with Regression Effects" on page 1545 for details.

- If left-truncation is specified without the probability of observability and the method for computing the EDF estimate is KAPLANMEIER or MODIFIEDKM, then F_n(z_i) is a conditional estimate of the EDF, as noted in the section "EDF Estimates and Left-Truncation" on page 1549. However, Z_i is an unconditional estimate of the CDF. So a conditional estimate of the CDF needs to be used for computing the EDF-based statistics. It is denoted by F̂_c(y_i) and defined as

$$\hat{F}_c(y_i) = \frac{\hat{F}(y_i) - \hat{F}(t_{min})}{1 - \hat{F}(t_{min})}$$

where t_min = min_i{t_i} is the smallest value of the left-truncation threshold. Note that if regressors are specified, then both F̂(y_i) and F̂(t_min) are computed from a mixture distribution, as indicated previously.

In the following, it is assumed that Z_i denotes an appropriate estimate of the CDF if left-truncation or regression effects are specified. Given this, the EDF-based statistics of fit are defined as follows:

KS: The Kolmogorov-Smirnov (KS) statistic computes the largest vertical distance between the CDF and the EDF. It is formally defined as

$$KS = \sup_y |F_n(y) - F(y)|$$

If the STANDARD method is used to compute the EDF, then the following formula is used:

$$D^+ = \max_i \left(\frac{r_i}{N} - Z_i\right)$$
$$D^- = \max_i \left(Z_i - \frac{r_{i-1}}{N}\right)$$
$$KS = \sqrt{N} \max(D^+, D^-) + \frac{0.19}{\sqrt{N}}$$

Note that r_0 is assumed to be 0.
If the method used to compute the EDF is any method other than the STANDARD method, then the following formula is used:

$$D^+ = \max_i \left(F_n(Z_i) - Z_i\right), \quad \text{if } F_n(Z_i) \geq Z_i$$
$$D^- = \max_i \left(Z_i - F_n(Z_i)\right), \quad \text{if } F_n(Z_i) < Z_i$$
$$KS = \sqrt{N} \max(D^+, D^-) + \frac{0.19}{\sqrt{N}}$$

AD: The Anderson-Darling (AD) statistic is a quadratic EDF statistic that is proportional to the expected value of the weighted squared difference between the EDF and the CDF. It is formally defined as

$$AD = N \int_{-\infty}^{\infty} \frac{(F_n(y) - F(y))^2}{F(y)(1 - F(y))} \, dF(y)$$

If the STANDARD method is used to compute the EDF, then the following formula is used:

$$AD = -N - \frac{1}{N} \sum_{i=1}^{N} \left[ (2r_i - 1)\log(Z_i) + (2N + 1 - 2r_i)\log(1 - Z_i) \right]$$
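The KS formula for the STANDARD EDF method can be sketched in a few lines. This is a hedged Python illustration, not PROC SEVERITY's implementation: the Z_i values below are invented fitted-CDF estimates at the sorted responses, with r_i = i and r_0 = 0 as in the text.

```python
import math

def ks_standard(z_sorted):
    """KS = sqrt(N) * max(D+, D-) + 0.19 / sqrt(N), with
    D+ = max_i (r_i/N - Z_i) and D- = max_i (Z_i - r_{i-1}/N)."""
    n = len(z_sorted)
    d_plus = max((i + 1) / n - z for i, z in enumerate(z_sorted))
    d_minus = max(z - i / n for i, z in enumerate(z_sorted))
    return math.sqrt(n) * max(d_plus, d_minus) + 0.19 / math.sqrt(n)

# Hypothetical CDF estimates Z_i at the sorted sample points
z = [0.10, 0.30, 0.55, 0.70, 0.95]
print(ks_standard(z))
```

A smaller KS value indicates a smaller maximum gap between the fitted CDF and the EDF, so among candidate distributions the one with the smallest KS is preferred by this criterion.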
