
Statistical Methods for Survival Data Analysis, Third Edition, Part 4


Figure 6.13 Lognormal probability plot of the survival time in months plus 4 of 162 male patients with chronic myelocytic leukemia. (From Feinleib and MacMahon, 1960. Reproduced by permission of the publisher.)

Figure 6.14 Lognormal probability plot of the survival time in months plus 4 of female patients with two types of leukemia. (From Feinleib and MacMahon, 1960. Reproduced by permission of the publisher.)

Suppose that failure or death takes place in $n$ stages or as soon as $n$ subfailures have happened. At the end of the first stage, after time $T_1$, the first subfailure occurs; after that the second stage begins and the second subfailure occurs after time $T_2$; and so on. Total failure or death occurs at the end of the $n$th stage, when the $n$th subfailure happens. The survival time, $T$, is then $T_1 + T_2 + \cdots + T_n$. The times $T_1, T_2, \ldots, T_n$ spent in each stage are assumed to be independently exponentially distributed with probability density function $\lambda \exp(-\lambda t_i)$, $i = 1, \ldots, n$. That is, the subfailures occur independently at a constant rate $\lambda$. The distribution of $T$ is then called the Erlangian distribution. There is no need for the stages to have physical significance since we can always assume that death occurs in the $n$-stage process just described. This idea, introduced by A. K. Erlang in his study of congestion in telephone systems, has been used widely in queuing theory and life processes.

Figure 6.15 Gamma hazard functions with $\lambda = 1$.

A natural generalization of the Erlangian distribution is to replace the parameter $n$, restricted to the integers 1, 2, ..., by a parameter $\gamma$ taking any real positive value. We then obtain the gamma distribution. The gamma distribution is characterized by two parameters, $\gamma$ and $\lambda$. When $0 < \gamma < 1$, there is negative aging and the hazard rate decreases monotonically from infinity to $\lambda$ as time increases from 0 to infinity.
When 91, there is positive aging and the hazard rate increases monotonically from 0 to  as time increases from 0 to infinity. When  : 1, the hazard rate equals , a constant, as in the exponential case. Figure 6.15 illustrates the gamma hazard function for  : 1 and :1,  : 1, 2, 4. Thus, the gamma distribution describes a different type of survival pattern where the hazard rate is decreasing or increasing to a constant value as time approaches infinity. The probability density function of a gamma distribution is f (t) :  () (t)A\e\HR t 90, 90, 90 (6.4.1) where () is defined as in (6.2.9). Figures 6.16 and 6.17 show the gamma density function with various values of  and . It is seen that varying  changes the shape of the distribution while varying  changes only the scaling. Consequently,  and  are shape and scale parameters, respectively. When 91, there is a single peak at t : ( 9 1)/. 150 -    Figure 6.16 Gamma density functions with : 1. Figure 6.17 Gamma density functions with  : 3. The cumulative distribution function F(t) has a more complex form: F(t) :  R   () (x)A\e\HV dx (6.4.2) : 1 ()  HR  uA\e\S du : I(t, )(6.4.3) where I(s, ) : 1 ()  Q  uA\e\S du (6.4.4) known as the incomplete gamma function, is tabulated in Pearson (1922, 1957).      151 For the Erlangian distribution, it can be shown that F(t) : 1 9 L\  I e\HR(t)I k! (6.4.5) Thus, the survivorship function 1 9 F(t)is S(t) :   R  () (x)A\e\HV dx (6.4.6) for the gamma distribution or S(t) : e\R L\  I (t)I k! (6.4.7) for the Erlangian distribution. Since the hazard function is the ratio of f (t)toS(t), it can be calculated from (6.4.1) and (6.4.7). When  is an integer n, h(t) : (t)L\ (n 9 1)!  L\ I (1/k!)(t)I (6.4.8) When  : 1, the distribution is exponential. 
When $\lambda = \tfrac{1}{2}$ and $\gamma = \tfrac{1}{2}\nu$, where $\nu$ is an integer, the distribution is chi-square with $\nu$ degrees of freedom. The mean and variance of the standard gamma distribution are, respectively, $\gamma/\lambda$ and $\gamma/\lambda^2$, so that the coefficient of variation is $1/\sqrt{\gamma}$.

Many survival distributions can be represented, at least roughly, by suitable choice of the parameters $\gamma$ and $\lambda$. In many cases, there is an advantage in using the Erlangian distribution, that is, in taking $\gamma$ integer.

The exponential, Weibull, lognormal, and gamma distributions are special cases of a generalized gamma distribution with three parameters, $\lambda$, $\alpha$, and $\gamma$, whose density function is defined as

$$f(t) = \frac{\alpha \lambda^{\alpha\gamma}}{\Gamma(\gamma)}\, t^{\alpha\gamma - 1} \exp[-(\lambda t)^{\alpha}] \qquad t > 0,\ \gamma > 0,\ \lambda > 0,\ \alpha > 0 \tag{6.4.9}$$

It is easily seen that this generalized gamma distribution is the exponential distribution if $\alpha = \gamma = 1$, the Weibull distribution if $\gamma = 1$, the lognormal distribution if $\gamma \to \infty$, and the gamma distribution if $\alpha = 1$.

In later chapters (e.g., Chapters 7 and 9), we discuss several parametric procedures for estimation and hypothesis testing. To use available computer software such as SAS to carry out the computation, we use the distributions adopted by the software. One of the very few software packages that include the gamma or generalized gamma distribution is SAS. In SAS, the generalized gamma distribution is defined as having the following density function:

$$f(t) = \frac{|\alpha|\, \lambda^{\alpha\gamma}}{\Gamma(\gamma)}\, t^{\alpha\gamma - 1} \exp[-(\lambda t)^{\alpha}] \qquad t > 0,\ \gamma > 0,\ \lambda > 0 \tag{6.4.10}$$

To differentiate this form of the generalized gamma distribution from the generalized gamma in (6.4.9), we refer to this distribution as the extended generalized gamma distribution. It can be shown that the extended generalized gamma distribution reduces to the Weibull distribution when $\alpha > 0$ and $\gamma = 1$, the lognormal distribution when $\gamma \to \infty$, the gamma distribution when $\alpha = 1$, and the exponential distribution when $\alpha = \gamma = 1$.

Example 6.4 Birnbaum and Saunders (1958) report an application of the gamma distribution to the lifetime of aluminum coupon. In their study, 17 sets of six strips were placed in a specially designed machine. Periodic loading was applied to the strips with a frequency of 18 hertz and a maximum stress of 21,000 pounds per square inch. The 102 strips were run until all of them failed. One of the 102 strips tested had to be discarded for an extraneous reason, yielding 101 observations. The lifetime data are given in Table 6.4 in ascending order.

Table 6.4 Lifetimes of 101 Strips of Aluminum Coupon

 370   706   716   746   785   797   844   855   858   886
 886   930   960   988   990  1000  1010  1016  1018  1020
1055  1085  1102  1102  1108  1115  1120  1134  1140  1199
1200  1200  1203  1222  1235  1238  1252  1258  1262  1269
1270  1290  1293  1300  1310  1313  1315  1330  1355  1390
1416  1419  1420  1420  1450  1452  1475  1478  1481  1485
1502  1505  1513  1522  1522  1530  1540  1560  1567  1578
1594  1602  1604  1608  1630  1642  1674  1730  1750  1750
1763  1768  1781  1782  1792  1820  1868  1881  1890  1893
1895  1910  1923  1940  1945  2023  2100  2130  2215  2268
2440

Source: Birnbaum and Saunders (1958).

From the data the two parameters of the gamma distribution were estimated (estimation methods are discussed in Chapter 7). They obtained $\hat{\gamma} = 11.8$ and $\hat{\lambda} = 1/(118.76 \times 10^{3})$. A graphical comparison of the observed and fitted cumulative distribution function is given in Figure 6.18, which shows very good agreement. A chi-square goodness-of-fit test (discussed in Chapter 9) yielded a $\chi^2$ value of 4.49 for 6 degrees of freedom, corresponding to a significance level between 0.5 and 0.6. Thus, it was concluded that the gamma distribution was an adequate model for the life length of some materials.

Figure 6.18 Graphical comparison of observed and fitted cumulative distribution functions of data in Example 6.4. (From Birnbaum and Saunders, 1958.)
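The relationships among (6.4.1), (6.4.2), (6.4.7), and (6.4.8) can be checked numerically. The following Python sketch is only an illustration: the function names and the Simpson-rule integrator are choices made here, not part of the text, and the last lines assume that the lifetimes in Table 6.4 are recorded in the units in which $1/\hat{\lambda} = 118.76$.

```python
import math

def gamma_pdf(t, shape, rate):
    """Gamma density (6.4.1) with shape gamma and rate lambda."""
    return rate * (rate * t) ** (shape - 1) * math.exp(-rate * t) / math.gamma(shape)

def erlang_survival(t, n, rate):
    """Erlangian survivorship (6.4.7); n must be a positive integer."""
    return math.exp(-rate * t) * sum((rate * t) ** k / math.factorial(k) for k in range(n))

def erlang_hazard(t, n, rate):
    """Hazard (6.4.8), computed as the ratio f(t) / S(t)."""
    return gamma_pdf(t, n, rate) / erlang_survival(t, n, rate)

def gamma_cdf_simpson(t, shape, rate, steps=2000):
    """F(t) of (6.4.2) by composite Simpson's rule (assumes shape >= 1, even steps)."""
    h = t / steps
    total = gamma_pdf(0.0, shape, rate) + gamma_pdf(t, shape, rate)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * gamma_pdf(i * h, shape, rate)
    return total * h / 3

# The finite sum (6.4.7) should agree with 1 - F(t) from (6.4.2).
s_sum = erlang_survival(2.0, 3, 1.0)
s_int = 1.0 - gamma_cdf_simpson(2.0, 3, 1.0)
print(s_sum, s_int)

# For gamma > 1 the hazard rises toward lambda (here lambda = 1).
print([erlang_hazard(t, 4, 1.0) for t in (1.0, 2.0, 200.0)])

# Example 6.4 estimates; the unit convention is an assumption (see lead-in).
shape_hat, scale_hat = 11.8, 118.76
mean_fit = shape_hat * scale_hat        # gamma / lambda
cv_fit = 1.0 / math.sqrt(shape_hat)     # coefficient of variation
print(mean_fit, cv_fit)
```

Under that unit assumption the fitted mean is about 1401, which falls near the middle of the ordered values in Table 6.4.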
6.5 LOG-LOGISTIC DISTRIBUTION

The survival time $T$ has a log-logistic distribution if $\log(T)$ has a logistic distribution. The density, survivorship, hazard, and cumulative hazard functions of the log-logistic distribution are, respectively,

$$f(t) = \frac{\alpha \gamma t^{\gamma - 1}}{(1 + \alpha t^{\gamma})^2} \tag{6.5.1}$$

$$S(t) = \frac{1}{1 + \alpha t^{\gamma}} \tag{6.5.2}$$

$$h(t) = \frac{\alpha \gamma t^{\gamma - 1}}{1 + \alpha t^{\gamma}} \tag{6.5.3}$$

$$H(t) = \log(1 + \alpha t^{\gamma}) \tag{6.5.4}$$

$$t \ge 0,\ \alpha > 0,\ \gamma > 0$$

The log-logistic distribution is characterized by two parameters, $\alpha$ and $\gamma$. The median of the log-logistic distribution is $\alpha^{-1/\gamma}$. Figure 6.19(a) to (c) show the log-logistic hazard, density, and survivorship functions with $\alpha = 1$ and various values of $\gamma = 2.0$, 1, and 0.67.

When $\gamma > 1$, the log-logistic hazard has the value 0 at time 0, increases to a peak at $t = (\gamma - 1)^{1/\gamma} / \alpha^{1/\gamma}$, and then declines, which is similar to the lognormal hazard. When $\gamma = 1$, the hazard starts at $\alpha^{1/\gamma} = \alpha$ and then declines monotonically. When $\gamma < 1$, the hazard starts at infinity and then declines, which is similar to the Weibull distribution. The hazard function declines toward 0 as $t$ approaches infinity. Thus, the log-logistic distribution may be used to describe a first increasing and then decreasing hazard or a monotonically decreasing hazard.

Example 6.5 Byers et al. (1988) used the log-logistic distribution to describe the rate of spread of HIV between 1978 and 1986. Between 1978 and 1980, over 6700 homosexual and bisexual men in San Francisco were enrolled in studies of the prevalence and incidence of sexually transmitted hepatitis B virus (HBV) infections. Blood specimens were collected from the participants. Four hundred and eighty-eight men who were HBV-seronegative were randomly selected to participate in a study of HIV infection later. These men agreed to allow the investigators to test the specimens collected previously together with a current specimen.
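The closed forms (6.5.2) to (6.5.4), the median, and the hazard peak can be checked with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and the parameter values $\alpha = 1$, $\gamma = 2$ are taken from the Figure 6.19 settings purely for illustration.

```python
import math

def ll_survival(t, alpha, gam):
    """Log-logistic survivorship function (6.5.2)."""
    return 1.0 / (1.0 + alpha * t ** gam)

def ll_hazard(t, alpha, gam):
    """Log-logistic hazard function (6.5.3)."""
    return alpha * gam * t ** (gam - 1) / (1.0 + alpha * t ** gam)

def ll_cum_hazard(t, alpha, gam):
    """Log-logistic cumulative hazard (6.5.4)."""
    return math.log(1.0 + alpha * t ** gam)

def ll_median(alpha, gam):
    """Median alpha^(-1/gamma), the time at which S(t) = 0.5."""
    return alpha ** (-1.0 / gam)

def ll_hazard_peak(alpha, gam):
    """Time of the hazard maximum; it exists only when gamma > 1."""
    if gam <= 1:
        raise ValueError("the hazard is monotonically decreasing when gamma <= 1")
    return ((gam - 1.0) / alpha) ** (1.0 / gam)

alpha, gam = 1.0, 2.0                       # one of the Figure 6.19 settings
median = ll_median(alpha, gam)
peak = ll_hazard_peak(alpha, gam)

# Locate the hazard maximum on a grid to confirm the closed-form peak.
grid = [0.01 * i for i in range(1, 501)]
grid_peak = max(grid, key=lambda t: ll_hazard(t, alpha, gam))
print(median, peak, grid_peak)
```

The same functions accept any $\alpha > 0$ and $\gamma > 0$; only `ll_hazard_peak` is restricted to $\gamma > 1$.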
For those who converted to positive, the infection time is only known to have occurred between the previous negative test and the time of the first positive one. The exact time is unknown. The time to infection is therefore interval censored. The investigators tried to fit several distributions to the interval-censored data, including the Weibull and log-logistic, by maximum likelihood methods (discussed in Chapter 7). Based on the Akaike information criterion (discussed in Chapter 9), the log-logistic distribution was found to provide the best fit to the data. The maximum likelihood estimates of the two parameters are $\hat{\alpha} = 0.003757$ and $\hat{\gamma} = 1.424328$. Based on the log-logistic model, the median infection time is estimated to be 50.4 months, and the hazard function reaches its peak at 27.6 months.

Figure 6.19 (a) Hazard function of the log-logistic distribution; (b) density function of the log-logistic distribution; (c) survivorship function of the log-logistic distribution.

6.6 OTHER SURVIVAL DISTRIBUTIONS

Many other distributions can be used as models of survival time, three of which we discuss briefly in this section: the linear exponential, the Gompertz (1825), and a distribution whose hazard rate is a step function. The linear-exponential model and the Gompertz distribution are extensions of the exponential distribution. Both describe survival patterns that have a constant initial hazard rate. The hazard rate varies as a linear function of time or age in the linear-exponential model and as an exponential function of time or age in the Gompertz distribution.

In demonstrating the use of the linear-exponential model, Broadbent (1958) uses as an example the service of milk bottles that are filled in a dairy, circulated to customers, and returned empty to the dairy. The model was also used by Carbone et al. (1967) to describe the survival pattern of patients with plasmacytic myeloma. The hazard function of the linear-exponential distribution is

$$h(t) = \lambda + \gamma t \tag{6.6.1}$$

where $\lambda$ and $\gamma$ can be values such that $h(t)$ is nonnegative. The hazard rate increases from $\lambda$ with time if $\gamma > 0$, decreases if $\gamma < 0$, and remains constant (an exponential case) if $\gamma = 0$, as depicted in Figure 6.20.

Figure 6.20 Hazard function of linear-exponential model.

The probability density function and the survivorship function are, respectively,

$$f(t) = (\lambda + \gamma t) \exp[-(\lambda t + \tfrac{1}{2} \gamma t^2)] \tag{6.6.2}$$

and

$$S(t) = \exp[-(\lambda t + \tfrac{1}{2} \gamma t^2)] \tag{6.6.3}$$

The mean of the linear-exponential distribution is $-(\lambda/\gamma) + (\gamma/2)^{-1/2} L(\lambda^2/2\gamma)$, where

$$L(x) = e^{x} \int_x^{\infty} y^{1/2} e^{-y}\, dy$$

is tabulated in Table 6.5.

Table 6.5 Values of L(x) and G(x)

x      L(x)    G(x)
0      0.886   -
0.1    0.951   2.015
0.2    1.012   1.493
0.3    1.067   1.223
0.4    1.119   1.048
0.5    1.168   0.923
0.6    1.214   0.828
0.7    1.258   0.753
0.8    1.300   0.691
0.9    1.341   0.640
1      1.381   0.596
2      1.712   0.361
3      1.987   0.262

Source: Broadbent (1958).

A special case of the linear-exponential distribution, the Rayleigh distribution, is obtained by reparameterizing $\gamma$ (Kodlin, 1967), so that the hazard function of the Rayleigh distribution remains a linear function of $t$.

The Gompertz distribution is also characterized by two parameters, $\lambda$ and $\gamma$. The hazard function,

$$h(t) = \exp(\lambda + \gamma t) \tag{6.6.4}$$

is plotted in Figure 6.21. When $\gamma > 0$, there is positive aging starting from $e^{\lambda}$; when $\gamma < 0$, there is negative aging; and when $\gamma = 0$, $h(t)$ reduces to a constant, $e^{\lambda}$. The survivorship function of the Gompertz distribution is

$$S(t) = \exp\left[-\frac{e^{\lambda}}{\gamma} (e^{\gamma t} - 1)\right] \tag{6.6.5}$$

Figure 6.21 Gompertz hazard function.

[...]

... $\exp(5.846 + 0.217)$, or 10.204, weeks. A 95% confidence interval for $\mu$ is

$$2.923 - 2.776\, \frac{0.521}{\sqrt{4}} < \mu < 2.923 + 2.776\, \frac{0.521}{\sqrt{4}}$$

or (2.200, 3.646). A 95% confidence interval for $\sigma^2$, following (7.4.4), is

$$\frac{5(0.217)}{11.1433} < \sigma^2 < \frac{5(0.217)}{0.4844}$$

or (0.097, 2.240).

7.4.2 Estimation of $\mu$ and $\sigma^2$ for Data with Censored Observations

We first consider samples with singly censored observations. The data consist of $r$ exact survival ...

... above, confidence intervals for $\lambda$ and $\mu$ can also be obtained. A 95% confidence interval for the relapse rate $\lambda$, following (7.2.7), is approximately

$$\frac{(0.106)(24.433)}{42} < \lambda < \frac{(0.106)(59.342)}{42}$$

or (0.062, 0.150). A 95% confidence interval for the mean remission time, following (7.2.9), is

$$\frac{(42)(9.429)}{59.342} < \mu < \frac{(42)(9.429)}{24.433}$$

or (6.673, 16.208). Once ...

... confidence interval for $\lambda$ by (7.2.12) is

$$\frac{(0.058)(3.247)}{(2)(5)} < \lambda < \frac{(0.058)(20.483)}{(2)(5)}$$

or (0.019, 0.119). A 95% confidence interval for $\mu$ following (7.2.13) is

$$\frac{2(5)(17.241)}{20.483} < \mu < \frac{2(5)(17.241)}{3.247}$$

or (8.417, 53.098). The probability of surviving a given time for the mice can be estimated from (7.2.2). For example, the probability that a mouse ...

... by

$$\hat{G} = \max\left(t_{(1)} - \frac{1}{n\hat{\lambda}},\ 0\right) \tag{7.2.40}$$

and the variance of $\hat{G}$ is

$$\mathrm{Var}(\hat{G}) = \frac{1}{(n\hat{\lambda})^2} \left(1 + \frac{1}{r - 1}\right) \tag{7.2.41}$$

When $n$ is large, $G$ and $\mathrm{Var}(\hat{G})$ can be estimated by ... The hazard rate is estimated by

$$\hat{\lambda} = \frac{r - 1}{\sum_{i=1}^{r} t_i + \sum_{i=r+1}^{n} t_i^{+} - n t_{(1)}} \tag{7.2.44}$$

with variance

$$\mathrm{Var}(\hat{\lambda}) = \frac{\hat{\lambda}^2}{r - 1} \tag{7.2.45}$$

Any percentile of survival time $t_p$ may be estimated by ...

... Suppose that the remission times follow a lognormal distribution. In this case, parameters are estimated by (7.4.2) and (7.4.3) as follows:

t       log t     (log t)^2
8       2.079      4.322
16      2.773      7.690
23      3.135      9.828
27      3.296     10.864
28      3.332     11.102
Total  14.615     43.806

$$\hat{\mu} = \frac{14.615}{5} = 2.923$$

$$\hat{\sigma}^2 = \frac{1}{5}\left[43.806 - \frac{(14.615)^2}{5}\right] = 0.217$$

$$s^2 = \frac{5 \hat{\sigma}^2}{5 - 1} = 0.271$$

The ...
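The lognormal arithmetic in the excerpt above can be reproduced directly from the five remission times. The sketch below is illustrative only: the function name is hypothetical, the $t$ quantile 2.776 is copied from the excerpt rather than computed, and the standard deviation uses the usual lognormal variance formula $[\exp(\sigma^2) - 1]\exp(2\mu + \sigma^2)$, which matches the visible fragment $\exp(5.846 + 0.217)$.

```python
import math

def lognormal_estimates(times):
    """Return (mu_hat, sigma2_hat, s2) computed from the logs of exact survival times."""
    logs = [math.log(t) for t in times]
    n = len(logs)
    mu_hat = sum(logs) / n
    # Divisor n, as in the excerpt: (1/n) * [sum(log t)^2 - (sum log t)^2 / n]
    sigma2_hat = (sum(x * x for x in logs) - sum(logs) ** 2 / n) / n
    s2 = n * sigma2_hat / (n - 1)            # divisor n - 1
    return mu_hat, sigma2_hat, s2

times = [8, 16, 23, 27, 28]                  # remission times in weeks
n = len(times)
mu_hat, sigma2_hat, s2 = lognormal_estimates(times)

# Standard deviation of the remission time on the original (weeks) scale.
sd_time = math.sqrt((math.exp(sigma2_hat) - 1.0) * math.exp(2.0 * mu_hat + sigma2_hat))

# 95% interval for mu, using the tabulated t quantile quoted in the excerpt.
T_QUANTILE = 2.776                           # 4 degrees of freedom, taken from the text
half_width = T_QUANTILE * math.sqrt(s2) / math.sqrt(n - 1)
ci_mu = (mu_hat - half_width, mu_hat + half_width)
print(mu_hat, sigma2_hat, s2, sd_time, ci_mu)
```

Working from unrounded logarithms gives a standard deviation of about 10.21 weeks rather than the 10.204 quoted; the difference comes from the rounding of intermediate values in the excerpt.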
... mice are 4, 5, 8, 9, and 10 weeks. The survival data of the 10 mice are 4, 5, 8, 9, 10, 10+, 10+, 10+, 10+, and 10+. Assuming that the failure of these mice follows an exponential distribution, the survival rate $\lambda$ and mean survival time $\mu$ are estimated, respectively, according to (7.2.10) and (7.2.11) by

$$\hat{\lambda} = \frac{5}{36 + 50} = 0.058 \text{ per week}$$

and

$$\hat{\mu} = 1/0.058 = 17.241 \text{ weeks}$$

A 95% confidence interval for ...

... confidence interval for $\mu$ computed from (7.2.33) is

$$\frac{2(600)}{31.526} < \mu < \frac{2(600)}{8.231}$$

or (38.064, 145.790). When data are progressively censored, Gehan (1970) derives an estimate for $G$ and a modified MLE for the hazard rate $\lambda$. Suppose that $r$ out of the $n$ individuals in the study die before the end of the study and $n - r$ individuals are alive at the time of the last follow-up or termination. The $n$ survival times are ...

... times in weeks for 21 patients with acute leukemia: 1, 1, 2, 2, 3, 4, 4, 5, 5, 6, 8, 8, 9, 10, 10, 12, 14, 16, 20, 24, and 34. Assume that remission duration follows the exponential distribution. Let us estimate the parameter $\lambda$ by using the formulas given above. According to (7.2.5), the MLE of the relapse rate, $\lambda$, is

$$\hat{\lambda} = \frac{21}{198} = 0.106 \text{ per week}$$

The mean remission time is then $198/21 = 9.429$ weeks. Using ...

... $S(t)$ to $p$ and solving for $t_p$; that is, $\hat{t}_p = -(\log p)/\hat{\lambda} + \hat{G}$. The following example illustrates the procedures.

Example 7.8 Suppose that 19 patients with brain tumor are followed in a clinical trial for a year. Their survival times in weeks are 3, 4, 6, 8, 8, 10, 12, 16, 17, 30, 33, 3+, 8+, 13+, 21+, 26+, 35+, 44+, and 45+. In this case $n = 19$, $r = 11$, $t_{(1)} = 3$, $\sum t_i = 147$, and $\sum t_i^{+} = 195$. The hazard ...
... and its variance may be estimated by (7.2.44) and (7.2.45) as

$$\hat{\lambda} = \frac{10}{147 + 195 - 19(3)} = 0.035$$

and

$$\mathrm{Var}(\hat{\lambda}) = \frac{(0.035)^2}{10} = 0.0001$$

The guarantee time $G$ and its variance may then be estimated by (7.2.40) and (7.2.41):

$$\hat{G} = \max\left[3 - \frac{1}{19 \times 0.035},\ 0\right] = 1.496$$

and

$$\mathrm{Var}(\hat{G}) = \frac{1}{(19 \times 0.035)^2} \left(1 + \frac{1}{10}\right) = 2.487$$

Thus, after a guarantee time of approximately ...
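The Example 7.8 computations follow mechanically from (7.2.40), (7.2.41), (7.2.44), and (7.2.45), and from the percentile expression $\hat{t}_p = -(\log p)/\hat{\lambda} + \hat{G}$. The Python sketch below is a hedged illustration: the function names are invented here, and taking $t_{(1)}$ as the smallest exact (uncensored) time is an assumption that happens to agree with the value 3 used in the example.

```python
import math

def exponential_with_guarantee(exact, censored):
    """Estimates for an exponential model with guarantee time G.

    exact    : uncensored survival times
    censored : censored survival times (the values marked "+")
    """
    n = len(exact) + len(censored)
    r = len(exact)
    t1 = min(exact)                                   # assumed definition of t_(1)
    lam = (r - 1) / (sum(exact) + sum(censored) - n * t1)          # (7.2.44)
    var_lam = lam ** 2 / (r - 1)                                   # (7.2.45)
    g = max(t1 - 1.0 / (n * lam), 0.0)                             # (7.2.40)
    var_g = (1.0 / (n * lam) ** 2) * (1.0 + 1.0 / (r - 1))         # (7.2.41)
    return lam, var_lam, g, var_g

def percentile_time(p, lam, g):
    """Time t_p at which the survivorship function equals p."""
    return -math.log(p) / lam + g

exact = [3, 4, 6, 8, 8, 10, 12, 16, 17, 30, 33]
censored = [3, 8, 13, 21, 26, 35, 44, 45]
lam, var_lam, g, var_g = exponential_with_guarantee(exact, censored)
print(lam, var_lam, g, var_g)
print(percentile_time(0.5, lam, g))                   # estimated median survival time
```

Carrying full precision gives $\hat{\lambda} = 10/285$, $\hat{G} = 1.5$, and $\mathrm{Var}(\hat{G}) = 2.475$; the values 1.496 and 2.487 in the example result from rounding $\hat{\lambda}$ to 0.035 before substituting.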

Posted: 14/08/2014, 09:22
