Given $n$ observations of the severity value $y_i$ ($1 \le i \le n$), the estimate of the $k$th raw moment is denoted by $m_k$ and computed as

$$m_k = \frac{1}{n}\sum_{i=1}^{n} y_i^k$$

The 100$p$th percentile is denoted by $\pi_p$ ($0 \le p \le 1$). By definition, $\pi_p$ satisfies

$$F(\pi_p^-) \le p \le F(\pi_p)$$

where $F(\pi_p^-) = \lim_{h \downarrow 0} F(\pi_p - h)$. PROC SEVERITY uses the following practical method of computing $\pi_p$. Let $\hat{F}(y)$ denote the empirical distribution function (EDF) estimate at a severity value $y$. This estimate is computed by PROC SEVERITY and supplied to the name_PARMINIT subroutine. Let $y_p^-$ and $y_p^+$ denote two consecutive values in the array of $y$ values such that $\hat{F}(y_p^-) < p$ and $\hat{F}(y_p^+) \ge p$. Then, the estimate $\hat{\pi}_p$ is computed as

$$\hat{\pi}_p = y_p^- + \frac{p - \hat{F}_p^-}{\hat{F}_p^+ - \hat{F}_p^-}\,(y_p^+ - y_p^-)$$

where $\hat{F}_p^+ = \hat{F}(y_p^+)$ and $\hat{F}_p^- = \hat{F}(y_p^-)$.

Let $\epsilon$ denote the smallest double-precision floating-point number such that $1 + \epsilon > 1$. This machine precision constant can be obtained by using the CONSTANT function in Base SAS software.

The details of how parameters are initialized for each predefined distribution model are as follows:

BURR  The parameters are initialized by using the method of moments. The $k$th raw moment of the Burr distribution is

$$E[X^k] = \frac{\theta^k\,\Gamma(1 + k/\gamma)\,\Gamma(\alpha - k/\gamma)}{\Gamma(\alpha)}, \quad -\gamma < k < \alpha\gamma$$

Three moment equations $E[X^k] = m_k$ ($k = 1, 2, 3$) need to be solved to initialize the three parameters of the distribution. In order to get an approximate closed-form solution, the second shape parameter $\hat{\gamma}$ is initialized to a value of 2. If $2m_3 - 3m_1 m_2 > 0$, then simplifying and solving the moment equations yields the following feasible set of initial values:

$$\hat{\theta} = \sqrt{\frac{m_2 m_3}{2m_3 - 3m_1 m_2}}, \quad \hat{\alpha} = 1 + \frac{m_3}{2m_3 - 3m_1 m_2}, \quad \hat{\gamma} = 2$$

If $2m_3 - 3m_1 m_2 < \epsilon$, then the parameters are initialized as follows:

$$\hat{\theta} = \sqrt{m_2}, \quad \hat{\alpha} = 2, \quad \hat{\gamma} = 2$$

EXP  The parameters are initialized by using the method of moments. The $k$th raw moment of the exponential distribution is

$$E[X^k] = \theta^k\,\Gamma(k + 1), \quad k > -1$$

Solving $E[X] = m_1$ yields the initial value $\hat{\theta} = m_1$.

GAMMA  The parameter $\alpha$ is initialized by using its approximate maximum likelihood (ML) estimate. For a set of $n$ i.i.d. observations $y_i$ ($1 \le i \le n$) drawn from a gamma distribution, the log likelihood $l$ is defined as follows:

$$l = \sum_{i=1}^{n} \log\!\left(\frac{y_i^{\alpha - 1} e^{-y_i/\theta}}{\theta^{\alpha}\,\Gamma(\alpha)}\right)
  = (\alpha - 1)\sum_{i=1}^{n}\log(y_i) - \frac{1}{\theta}\sum_{i=1}^{n} y_i - n\alpha\log(\theta) - n\log(\Gamma(\alpha))$$

Using the shorter notation $\sum$ to denote $\sum_{i=1}^{n}$ and solving the equation $\partial l/\partial\theta = 0$ yields the following ML estimate of $\theta$:

$$\hat{\theta} = \frac{\sum y_i}{n\alpha} = \frac{m_1}{\alpha}$$

Substituting this estimate in the expression of $l$ and simplifying gives

$$l = (\alpha - 1)\sum\log(y_i) - n\alpha - n\alpha\log(m_1) + n\alpha\log(\alpha) - n\log(\Gamma(\alpha))$$

Let $d$ be defined as follows:

$$d = \log(m_1) - \frac{1}{n}\sum\log(y_i)$$

Solving the equation $\partial l/\partial\alpha = 0$ yields the following expression in terms of the digamma function, $\psi(\alpha)$:

$$\log(\alpha) - \psi(\alpha) = d$$

The digamma function can be approximated as follows:

$$\hat{\psi}(\alpha) \approx \log(\alpha) - \frac{1}{\alpha}\left(0.5 + \frac{1}{12\alpha + 2}\right)$$

This approximation is within 1.4% of the true value for all values of $\alpha > 0$ except when $\alpha$ is arbitrarily close to the positive root of the digamma function (which is approximately 1.461632). Even for values of $\alpha$ that are close to the positive root, the absolute error between the true and approximate values is still acceptable ($|\hat{\psi}(\alpha) - \psi(\alpha)| < 0.005$ for $\alpha > 1.07$). Solving the equation that arises from this approximation yields the following estimate of $\alpha$:

$$\hat{\alpha} = \frac{3 - d + \sqrt{(d - 3)^2 + 24d}}{12d}$$
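As a quick check of this closed-form initialization, the following sketch (written in Python rather than SAS, purely to trace the arithmetic; the function name and the sample values are made up for illustration) computes $d$, the approximate-ML estimate $\hat{\alpha}$, and the corresponding $\hat{\theta} = m_1/\hat{\alpha}$:

```python
import math

def gamma_init(y):
    """Approximate-ML initial values for the gamma parameters,
    following the closed-form solution derived above."""
    n = len(y)
    m1 = sum(y) / n                                   # first raw moment
    d = math.log(m1) - sum(math.log(v) for v in y) / n
    # closed-form root of the approximate equation log(alpha) - psi(alpha) = d
    alpha = (3.0 - d + math.sqrt((d - 3.0) ** 2 + 24.0 * d)) / (12.0 * d)
    theta = m1 / alpha                                # theta-hat = m1 / alpha-hat
    return alpha, theta

# made-up sample, for illustration only
print(gamma_init([1.2, 0.7, 3.4, 2.1, 0.9, 1.8]))
```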
If this approximate ML estimate is infeasible, then the method of moments is used. The $k$th raw moment of the gamma distribution is

$$E[X^k] = \frac{\theta^k\,\Gamma(\alpha + k)}{\Gamma(\alpha)}, \quad k > -\alpha$$

Solving $E[X] = m_1$ and $E[X^2] = m_2$ yields the following initial value for $\alpha$:

$$\hat{\alpha} = \frac{m_1^2}{m_2 - m_1^2}$$

If $m_2 - m_1^2 < \epsilon$ (almost zero sample variance), then $\alpha$ is initialized as follows:

$$\hat{\alpha} = 1$$

After computing the estimate of $\alpha$, the estimate of $\theta$ is computed as follows:

$$\hat{\theta} = \frac{m_1}{\hat{\alpha}}$$

Both the maximum likelihood method and the method of moments arrive at the same relationship between $\hat{\alpha}$ and $\hat{\theta}$.

GPD  The parameters are initialized by using the method of moments. Notice that for $\xi > 0$, the CDF of the generalized Pareto distribution (GPD) is

$$F(x) = 1 - \left(1 + \frac{\xi x}{\theta}\right)^{-1/\xi} = 1 - \left(\frac{\theta/\xi}{x + \theta/\xi}\right)^{1/\xi}$$

This is equivalent to a Pareto distribution with scale parameter $\theta_1 = \theta/\xi$ and shape parameter $\alpha = 1/\xi$. Using this relationship, the parameter initialization method used for the PARETO distribution model yields the following initial values for the parameters of the GPD distribution model:

$$\hat{\theta} = \frac{m_1 m_2}{2(m_2 - m_1^2)}, \quad \hat{\xi} = \frac{m_2 - 2m_1^2}{2(m_2 - m_1^2)}$$

If $m_2 - m_1^2 < \epsilon$ (almost zero sample variance) or $m_2 - 2m_1^2 < \epsilon$, then the parameters are initialized as follows:

$$\hat{\theta} = \frac{m_1}{2}, \quad \hat{\xi} = \frac{1}{2}$$

IGAUSS  The parameters are initialized by using the method of moments. Note that the standard parameterization of the inverse Gaussian distribution (also known as the Wald distribution), in terms of the location parameter $\mu$ and shape parameter $\lambda$, is as follows (Klugman, Panjer, and Willmot 1998, p. 583):

$$f(x) = \sqrt{\frac{\lambda}{2\pi x^3}}\,\exp\!\left(-\frac{\lambda(x - \mu)^2}{2\mu^2 x}\right)$$
$$F(x) = \Phi\!\left(\left(\frac{x}{\mu} - 1\right)\sqrt{\frac{\lambda}{x}}\right) + \Phi\!\left(-\left(\frac{x}{\mu} + 1\right)\sqrt{\frac{\lambda}{x}}\right)\exp\!\left(\frac{2\lambda}{\mu}\right)$$

For this parameterization, it is known that the mean is $E[X] = \mu$ and the variance is $\mathrm{Var}[X] = \mu^3/\lambda$, which yields the second raw moment as $E[X^2] = \mu^2(1 + \mu/\lambda)$ (computed by using $E[X^2] = \mathrm{Var}[X] + (E[X])^2$).

The predefined IGAUSS distribution model in PROC SEVERITY uses the following alternate parameterization to allow the distribution to have a scale parameter, $\theta$:

$$f(x) = \sqrt{\frac{\alpha\theta}{2\pi x^3}}\,\exp\!\left(-\frac{\alpha(x - \theta)^2}{2\theta x}\right)$$
$$F(x) = \Phi\!\left(\left(\frac{x}{\theta} - 1\right)\sqrt{\frac{\alpha\theta}{x}}\right) + \Phi\!\left(-\left(\frac{x}{\theta} + 1\right)\sqrt{\frac{\alpha\theta}{x}}\right)\exp(2\alpha)$$

The parameters $\theta$ (scale) and $\alpha$ (shape) of this alternate form are related to the parameters $\mu$ and $\lambda$ of the preceding form such that $\theta = \mu$ and $\alpha = \lambda/\mu$. Using this relationship, the first and second raw moments of the IGAUSS distribution are

$$E[X] = \theta, \quad E[X^2] = \theta^2\left(1 + \frac{1}{\alpha}\right)$$

Solving $E[X] = m_1$ and $E[X^2] = m_2$ yields the following initial values:

$$\hat{\theta} = m_1, \quad \hat{\alpha} = \frac{m_1^2}{m_2 - m_1^2}$$

If $m_2 - m_1^2 < \epsilon$ (almost zero sample variance), then the parameters are initialized as follows:

$$\hat{\theta} = m_1, \quad \hat{\alpha} = 1$$

LOGN  The parameters are initialized by using the method of moments. The $k$th raw moment of the lognormal distribution is

$$E[X^k] = \exp\!\left(k\mu + \frac{k^2\sigma^2}{2}\right)$$

Solving $E[X] = m_1$ and $E[X^2] = m_2$ yields the following initial values:

$$\hat{\mu} = 2\log(m_1) - \frac{\log(m_2)}{2}, \quad \hat{\sigma} = \sqrt{\log(m_2) - 2\log(m_1)}$$

PARETO  The parameters are initialized by using the method of moments. The $k$th raw moment of the Pareto distribution is

$$E[X^k] = \frac{\theta^k\,\Gamma(k + 1)\,\Gamma(\alpha - k)}{\Gamma(\alpha)}, \quad -1 < k < \alpha$$

Solving $E[X] = m_1$ and $E[X^2] = m_2$ yields the following initial values:

$$\hat{\theta} = \frac{m_1 m_2}{m_2 - 2m_1^2}, \quad \hat{\alpha} = \frac{2(m_2 - m_1^2)}{m_2 - 2m_1^2}$$

If $m_2 - m_1^2 < \epsilon$ (almost zero sample variance) or $m_2 - 2m_1^2 < \epsilon$, then the parameters are initialized as follows:

$$\hat{\theta} = m_1, \quad \hat{\alpha} = 2$$
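As an illustration of these moment-matching rules, here is a minimal sketch (again Python, not SAS; the helper names and the toy sample are assumptions) that computes the lognormal and Pareto initial values from the first two raw moments, including the fallback used when the denominators are too small:

```python
import math
import sys

# machine precision epsilon described above (obtainable via the CONSTANT function in SAS)
EPS = sys.float_info.epsilon

def raw_moments(y, k_max):
    """First k_max raw moments m_1, ..., m_{k_max} of the sample."""
    n = len(y)
    return [sum(v ** k for v in y) / n for k in range(1, k_max + 1)]

def logn_init(m1, m2):
    mu = 2.0 * math.log(m1) - math.log(m2) / 2.0
    sigma = math.sqrt(math.log(m2) - 2.0 * math.log(m1))
    return mu, sigma

def pareto_init(m1, m2):
    if m2 - m1 ** 2 < EPS or m2 - 2.0 * m1 ** 2 < EPS:   # near-degenerate cases
        return m1, 2.0
    theta = m1 * m2 / (m2 - 2.0 * m1 ** 2)
    alpha = 2.0 * (m2 - m1 ** 2) / (m2 - 2.0 * m1 ** 2)
    return theta, alpha

# made-up sample, for illustration only
m1, m2 = raw_moments([0.5, 0.8, 1.0, 1.5, 12.0], 2)
print(logn_init(m1, m2), pareto_init(m1, m2))
```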
WEIBULL  The parameters are initialized by using the percentile matching method. Let $q_1$ and $q_3$ denote the estimates of the 25th and 75th percentiles, respectively. Using the formula for the CDF of the Weibull distribution, they can be written as

$$1 - \exp(-(q_1/\theta)^{\tau}) = 0.25$$
$$1 - \exp(-(q_3/\theta)^{\tau}) = 0.75$$

Simplifying and solving these two equations yields the following initial values:

$$\hat{\theta} = \exp\!\left(\frac{r\log(q_1) - \log(q_3)}{r - 1}\right), \quad \hat{\tau} = \frac{\log(\log(4))}{\log(q_3) - \log(\hat{\theta})}$$

where $r = \log(\log(4))/\log(\log(4/3))$. These initial values agree with those suggested in Klugman, Panjer, and Willmot (1998).

A summary of the initial values of all the parameters for all the predefined distributions is given in Table 22.4. The table also provides the names of the parameters to use in the INIT= option in the DIST statement if you want to provide a different initial value.

Table 22.4  Parameter Initialization for Predefined Distributions

Distribution   Parameter    Name for INIT= Option   Default Initial Value
BURR           $\theta$     theta                   $\sqrt{m_2 m_3/(2m_3 - 3m_1 m_2)}$
               $\alpha$     alpha                   $1 + m_3/(2m_3 - 3m_1 m_2)$
               $\gamma$     gamma                   $2$
EXP            $\theta$     theta                   $m_1$
GAMMA          $\theta$     theta                   $m_1/\hat{\alpha}$
               $\alpha$     alpha                   $(3 - d + \sqrt{(d - 3)^2 + 24d})/(12d)$
GPD            $\theta$     theta                   $m_1 m_2/(2(m_2 - m_1^2))$
               $\xi$        xi                      $(m_2 - 2m_1^2)/(2(m_2 - m_1^2))$
IGAUSS         $\theta$     theta                   $m_1$
               $\alpha$     alpha                   $m_1^2/(m_2 - m_1^2)$
LOGN           $\mu$        mu                      $2\log(m_1) - \log(m_2)/2$
               $\sigma$     sigma                   $\sqrt{\log(m_2) - 2\log(m_1)}$
PARETO         $\theta$     theta                   $m_1 m_2/(m_2 - 2m_1^2)$
               $\alpha$     alpha                   $2(m_2 - m_1^2)/(m_2 - 2m_1^2)$
WEIBULL        $\theta$     theta                   $\exp((r\log(q_1) - \log(q_3))/(r - 1))$
               $\tau$       tau                     $\log(\log(4))/(\log(q_3) - \log(\hat{\theta}))$

Notes:
$m_k$ denotes the $k$th raw moment
$d = \log(m_1) - (\sum\log(y_i))/n$
$q_1$ and $q_3$ denote the 25th and 75th percentiles, respectively
$r = \log(\log(4))/\log(\log(4/3))$

Predefined Utility Functions

The following predefined utility functions are provided with the SEVERITY procedure and are available in the SASHELP.SVRTDIST library:

SVRTUTIL_HILLCUTOFF: This function computes an estimate of the value where the right tail of a distribution is expected to begin. The function implements the algorithm described in Danielsson et al. (2001). The description of the algorithm uses the following notation:

$n$               number of observations in the original sample
$B$               number of bootstrap samples to draw
$m_1$             size of the bootstrap sample in the first step of the algorithm ($m_1 < n$)
$x^{j,m}_{(i)}$   $i$th order statistic of the $j$th bootstrap sample of size $m$ ($1 \le i \le m$, $1 \le j \le B$)
$x_{(i)}$         $i$th order statistic of the original sample ($1 \le i \le n$)

Given the input sample $x$ and values of $B$ and $m_1$, the steps of the algorithm are as follows:

1. Take $B$ bootstrap samples of size $m_1$ from the original sample.
2. Find the integer $k_1$ that minimizes the bootstrap estimate of the mean squared error:
   $$k_1 = \arg\min_{1 \le k < m_1} Q(m_1, k)$$
3. Take $B$ bootstrap samples of size $m_2 = m_1^2/n$ from the original sample.
4. Find the integer $k_2$ that minimizes the bootstrap estimate of the mean squared error:
   $$k_2 = \arg\min_{1 \le k < m_2} Q(m_2, k)$$
5. Compute the integer $k_{\mathrm{opt}}$, which is used for computing the cutoff point:
   $$k_{\mathrm{opt}} = \frac{k_1^2}{k_2}\left(\frac{\log(k_1)}{2\log(m_1) - \log(k_1)}\right)^{2 - 2\log(k_1)/\log(m_1)}$$
6. Set the cutoff point equal to $x_{(k_{\mathrm{opt}} + 1)}$.

The bootstrap estimate of the mean squared error is computed as

$$Q(m, k) = \frac{1}{B}\sum_{j=1}^{B}\mathrm{MSE}_j(m, k)$$

The mean squared error of the $j$th bootstrap sample is computed as

$$\mathrm{MSE}_j(m, k) = \left(M_j(m, k) - 2(\gamma_j(m, k))^2\right)^2$$

where $M_j(m, k)$ is a control variate proposed by Danielsson et al. (2001),

$$M_j(m, k) = \frac{1}{k}\sum_{i=1}^{k}\left(\log(x^{j,m}_{(m-i+1)}) - \log(x^{j,m}_{(m-k)})\right)^2$$

and $\gamma_j(m, k)$ is the Hill's estimator of the tail index (Hill 1975),

$$\gamma_j(m, k) = \frac{1}{k}\sum_{i=1}^{k}\left(\log(x^{j,m}_{(m-i+1)}) - \log(x^{j,m}_{(m-k)})\right)$$
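The following sketch (Python, not the SAS implementation; the function names, the resampling details, and the toy data are assumptions for illustration) shows how the Hill estimator $\gamma_j(m, k)$, the control variate $M_j(m, k)$, and the bootstrap criterion $Q(m, k)$ defined above can be computed:

```python
import math
import random

def hill_and_control(sample, k):
    """Hill estimator gamma and control variate M over the k largest
    observations of a (bootstrap) sample, following the formulas above."""
    xs = sorted(sample)                       # x_(1) <= ... <= x_(m)
    m = len(xs)
    ref = math.log(xs[m - k - 1])             # log of x_(m-k)
    diffs = [math.log(xs[m - i]) - ref for i in range(1, k + 1)]
    gamma = sum(diffs) / k                    # Hill's tail-index estimator
    M = sum(d * d for d in diffs) / k         # control variate
    return gamma, M

def Q(original, m, k, B=50, seed=1):
    """Bootstrap estimate of the mean squared error for a given (m, k)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(B):
        boot = [rng.choice(original) for _ in range(m)]
        gamma, M = hill_and_control(boot, k)
        total += (M - 2.0 * gamma ** 2) ** 2
    return total / B

# made-up heavy-tailed sample, for illustration only
rng = random.Random(7)
data = [rng.paretovariate(1.5) for _ in range(200)]
print(Q(data, m=40, k=10))
```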
This algorithm has two tuning parameters, $B$ and $m_1$. The number of bootstrap samples $B$ is chosen based on the availability of computational resources. The optimal value of $m_1$ is chosen such that the following ratio, $R(m_1)$, is minimized:

$$R(m_1) = \frac{(Q(m_1, k_1))^2}{Q(m_2, k_2)}$$

The SVRTUTIL_HILLCUTOFF utility function implements the preceding algorithm. It uses the grid search method to compute the optimal value of $m_1$.

Type: Function
Signature: SVRTUTIL_HILLCUTOFF(n, x{*}, b, s, status)

Argument Description:
n        Dimension of the array x.
x{*}     Input numeric array of dimension n that contains the sample.
b        Number of bootstrap samples used to estimate the mean squared error. If b is less than 10, then a default value of 50 is used.
s        Approximate number of steps used to search for the optimal value of $m_1$ in the range $[n^{0.75}, n - 1]$. If s is less than or equal to 1, then a default value of 10 is used.
status   Output argument that contains the status of the algorithm. If the algorithm succeeds in computing a valid cutoff point, then status is set to 0. If the algorithm fails, then status is set to 1.

Return value: The cutoff value where the right tail is estimated to start. If the size of the input sample is inadequate ($n \le 5$), then a missing value is returned and status is set to a missing value. If the algorithm fails to estimate a valid cutoff value (status = 1), then the fifth largest value in the input sample is returned.

SVRTUTIL_PERCENTILE: This function computes the specified percentile given the EDF estimates of a sample. Let $F(x)$ denote the EDF estimate at $x$. Let $x_p^-$ and $x_p^+$ denote two consecutive values in the sample of $x$ values such that $F(x_p^-) < p$ and $F(x_p^+) \ge p$. Then, the function computes the 100$p$th percentile $\pi_p$ as

$$\pi_p = x_p^- + \frac{p - F_p^-}{F_p^+ - F_p^-}\,(x_p^+ - x_p^-)$$

where $F_p^+ = F(x_p^+)$ and $F_p^- = F(x_p^-)$ (a small illustrative sketch of this interpolation follows the SVRTUTIL_RAWMOMENTS entry below).

Type: Function
Signature: SVRTUTIL_PERCENTILE(p, n, x{*}, F{*})

Argument Description:
p       Desired percentile. The value must be in the interval (0,1). The function returns the 100pth percentile.
n       Dimension of the x and F input arrays.
x{*}    Input numeric array of dimension n that contains distinct values of the random variable observed in the sample. These values must be sorted in increasing order.
F{*}    Input numeric array of dimension n in which each F[i] contains the EDF estimate for x[i]. These values must be sorted in nondecreasing order.

Return value: The 100pth percentile of the input sample.

SVRTUTIL_RAWMOMENTS: This subroutine computes the raw moments of a sample.

Type: Subroutine
Signature: SVRTUTIL_RAWMOMENTS(n, x{*}, nx{*}, nRaw, raw{*})

Argument Description:
n        Dimension of the x and nx input arrays.
x{*}     Input numeric array of dimension n that contains distinct values of the random variable that are observed in the sample.
nx{*}    Input numeric array of dimension n in which each nx[i] contains the number of observations in the sample that have the value x[i].
nRaw     Desired number of raw moments. The output array raw contains the first nRaw raw moments.
raw{*}   Output array of raw moments. The kth element in the array (raw{k}) contains the kth raw moment, where 1 ≤ k ≤ nRaw.

Return value: Numeric array raw that contains the first nRaw raw moments. The array contains missing values if the sample has no observations (that is, if all the values in the nx array add up to zero).
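To make the interpolation rule behind SVRTUTIL_PERCENTILE concrete, here is a small sketch of the same computation (Python, not the SAS FCMP implementation; the sample arrays and the handling of a percentile below the first EDF step are assumptions):

```python
def percentile_from_edf(p, x, F):
    """100p-th percentile by linear interpolation between the two consecutive
    sample points that bracket p, as described above. x must be sorted in
    increasing order; F holds the EDF estimate at each x."""
    assert 0.0 < p < 1.0
    for i, Fi in enumerate(F):
        if Fi >= p:                     # first index whose EDF estimate reaches p
            if i == 0:
                return x[0]             # assumed behavior when p is below the first step
            Fm, Fp = F[i - 1], Fi
            return x[i - 1] + (p - Fm) / (Fp - Fm) * (x[i] - x[i - 1])
    return x[-1]                        # assumed behavior when p exceeds the largest estimate

# made-up sample: distinct values and their EDF estimates
x = [10.0, 20.0, 35.0, 50.0]
F = [0.25, 0.50, 0.75, 1.00]
print(percentile_from_edf(0.6, x, F))   # 26.0
```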
SVRTUTIL_SORT: This subroutine sorts the given array of numeric values in ascending or descending order.

Type: Subroutine
Signature: SVRTUTIL_SORT(n, x{*}, flag)

Argument Description:
n       Dimension of the input array x.
x{*}    Numeric array that contains the values to be sorted at input. The subroutine uses the same array to return the sorted values.
flag    A numeric value that controls the sort order. If flag is 0, then the values are sorted in ascending order. If flag has any value other than 0, then the values are sorted in descending order.

Return value: Numeric array x, which is sorted in place (that is, the sorted array is stored in the same storage area occupied by the input array x).

Censoring and Truncation

One of the key features of PROC SEVERITY is that it enables you to specify whether the severity event's magnitude is observable and, if it is observable, whether the exact value of the magnitude is known. If an event is unobservable when its magnitude lies in certain intervals, this is referred to as a truncation effect. If the exact magnitude of the event is not known, but it is known to lie in a certain interval, this is referred to as a censoring effect.

PROC SEVERITY allows a severity event to be left-truncated and right-censored. An event is said to be left-truncated if it is observed only when $Y > T$, where $Y$ denotes the random variable for the magnitude and $T$ denotes a random variable for the truncation threshold. An event is said to be right-censored if it is known that the magnitude is $Y > C$, but the exact value of $Y$ is not known. $C$ is a random variable for the censoring limit.

PROC SEVERITY assumes that the input data are given as a triplet $(y_i, t_i, \delta_i)$, $i = 1, \ldots, N$, where $N$ is the number of observations (in a BY group), $y_i$ is the observed value (magnitude) of the response (event) variable, $t_i$ is the left-truncation threshold, and $\delta_i$ is a right-censoring indicator. If $\delta_i$ is equal to one of the values specified in the RIGHTCENSORED= option (or 0 if no indicator value is specified), then it indicates that $y_i$ is right-censored. In that case, the censoring limit $c_i$ is assumed to be equal to the recorded value $y_i$. If $\delta_i$ is not equal to one of the indicator values or has a missing value, then $y_i$ is assumed to be the exact event value; that is, the observation is uncensored.

If the global left-truncation threshold $T_g$ is specified by using the LEFTTRUNCATED= option, then $t_i = T_g$ for all $i$. If $y_i \le t_i$ for some $i$, then that observation is ignored and a warning is written to the SAS log. A missing value for $t_i$ indicates that the observation is not left-truncated.

If the global right-censoring limit $C_g$ is specified by using the RIGHTCENSORED= option, then $y_i$ is compared with $C_g$. If $y_i < C_g$, then $\delta_i = 1$ to indicate an exact (uncensored) observation; otherwise, $\delta_i = 0$ to indicate a right-censored observation. Note that the case of $y_i = C_g$ is considered right-censored, because it is assumed that the actual event magnitude is greater than $C_g$ but gets recorded as $C_g$. If $y_i > C_g$ for some observation, then it is reduced to the limit ($y_i = C_g$) and a warning is written to the SAS log.

Specification of right-censoring and left-truncation affects the likelihood of the data (see the section "Likelihood Function" on page 1541) and how the empirical distribution function (EDF) is estimated (see the section "Empirical Distribution Function Estimation Methods" on page 1547).
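The following sketch (Python; the function name and the toy values are made up, and it only mimics the bookkeeping described above rather than the actual PROC SEVERITY input handling) shows how raw magnitudes can be turned into $(y_i, t_i, \delta_i)$ triplets under global left-truncation and right-censoring limits:

```python
def prepare_observations(y_values, t_global=None, c_global=None):
    """Build (y, t, delta) triplets from raw magnitudes, given an optional
    global left-truncation threshold and global right-censoring limit.
    delta = 1 marks an exact (uncensored) value, delta = 0 a right-censored one."""
    triplets = []
    for y in y_values:
        t = t_global                       # None means "not left-truncated"
        if t is not None and y <= t:
            print(f"WARNING: observation {y} ignored (y <= truncation threshold)")
            continue
        delta = 1
        if c_global is not None and y >= c_global:   # y == C_g is treated as censored
            if y > c_global:
                print(f"WARNING: observation {y} reduced to the limit {c_global}")
            y, delta = c_global, 0
        triplets.append((y, t, delta))
    return triplets

# made-up magnitudes, for illustration only
print(prepare_observations([5.0, 120.0, 80.0, 2.0], t_global=3.0, c_global=100.0))
```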
Probability of Observability

For left-truncated data, PROC SEVERITY also enables you to provide additional information in the form of the probability of observability by using the PROBOBSERVED= option. It is defined as the probability that the underlying severity event gets observed (and recorded) for the specified left-truncation threshold value. For example, if you specify a value of 0.75, then for every 75 observations recorded above a specified threshold, 25 more events have happened with a severity value less than or equal to the specified threshold. Although the exact severity value of those 25 events is not known, PROC SEVERITY can use the information about the number of those events.

In particular, for each left-truncated observation, PROC SEVERITY assumes the presence of $(1 - p)/p$ additional observations with $y_i = t_i$. These additional observations are then used for computing the likelihood (see the section "Probability of Observability and Likelihood" on page 1542) and an unconditional estimate of the empirical distribution function (see the section "EDF Estimates and Left-Truncation" on page 1549).

Parameter Estimation Method

PROC SEVERITY uses the maximum likelihood (ML) method to estimate the parameters of each model. A nonlinear optimization process is used to maximize the log of the likelihood function.

Likelihood Function

Let $Y$ denote the random response variable, and let $y$ denote its value recorded in an observation in the input data set. Let $\delta$ denote the censoring indicator: $\delta = 1$ indicates that the observation is uncensored (sometimes referred to as an event observation), and $\delta = 0$ indicates that the observation is right-censored. When $\delta = 0$, the recorded value of $y$ is assumed to be the censoring limit, denoted by $c$. Let $t$ denote the left-truncation threshold. Let $f_{\Theta}(y)$ and $F_{\Theta}(y)$ denote the PDF and CDF, respectively, evaluated at $y$ for a set of parameter values $\Theta$.

Then, the set of input observations can be categorized into the following four subsets within each BY group:

E: the set of uncensored observations that are not left-truncated. The likelihood of an observation $i \in E$ is
$$l_{i \in E} = \Pr(Y = y_i) = f_{\Theta}(y_i)$$

E_l: the set of uncensored observations that are left-truncated. The likelihood of an observation $j \in E_l$ is
$$l_{j \in E_l} = \Pr(Y = y_j \mid Y > t_j) = \frac{f_{\Theta}(y_j)}{1 - F_{\Theta}(t_j)}$$

C: the set of right-censored observations that are not left-truncated. The likelihood of an observation $k \in C$ is
$$l_{k \in C} = \Pr(Y > c_k) = 1 - F_{\Theta}(c_k)$$

C_l: the set of right-censored observations that are left-truncated. The likelihood of an observation $m \in C_l$ is
$$l_{m \in C_l} = \Pr(Y > c_m \mid Y > t_m) = \frac{1 - F_{\Theta}(c_m)}{1 - F_{\Theta}(t_m)}$$

Note that $(E \cup E_l) \cap (C \cup C_l) = \emptyset$. Also, the sets $E_l$ and $C_l$ are empty when left-truncation is not specified, and the sets $C$ and $C_l$ are empty when right-censoring is not specified.
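To illustrate how these four cases combine into a single objective function, the following sketch (Python, not PROC SEVERITY internals; the choice of an exponential severity distribution and the toy data are assumptions made for the example) evaluates the log likelihood of left-truncated and right-censored data:

```python
import math

def exp_logpdf(y, theta):
    """Log PDF of the exponential distribution with scale theta."""
    return -math.log(theta) - y / theta

def exp_logsf(y, theta):
    """Log of the survival function 1 - F(y) for the exponential distribution."""
    return -y / theta

def log_likelihood(observations, theta):
    """Sum of the per-observation log likelihoods for the four cases above.
    Each observation is (y, t, delta): t is None when not left-truncated,
    delta = 1 for uncensored, delta = 0 for right-censored (y is then the limit c)."""
    ll = 0.0
    for y, t, delta in observations:
        if delta == 1:
            ll += exp_logpdf(y, theta)       # sets E and E_l (numerator term)
        else:
            ll += exp_logsf(y, theta)        # sets C and C_l (numerator term)
        if t is not None:
            ll -= exp_logsf(t, theta)        # divide by 1 - F(t) for the truncated sets
    return ll

# toy data: (y, t, delta), covering all four subsets
obs = [(5.0, None, 1), (12.0, 3.0, 1), (100.0, None, 0), (100.0, 3.0, 0)]
print(log_likelihood(obs, theta=20.0))
```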