
Statistical Methods for Survival Data Analysis, 3rd Edition, Part 4


Figure 6.11 Lognormal probability plot of the survival time of 234 male patients with chronic lymphocytic leukemia. (From Feinleib and MacMahon, 1960. Reproduced by permission of the publisher.)

Example 6.3 In a study of chronic lymphocytic and myelocytic leukemia, Feinleib and MacMahon (1960) applied the lognormal distribution to analyze survival data of 649 white residents of Brooklyn diagnosed from 1943 to 1952. The analysis of several subgroups of patients follows. The survival time of each patient is computed in months from the date of diagnosis. An analytical method, discussed in Chapters 7 and 8, is used to fit the lognormal distribution to the data. Figure 6.11 gives the probability plot of the survival time of 234 male patients with chronic lymphocytic leukemia, in which the horizontal axis for the survival time is in logarithmic scale and the vertical axis is in normal probability scale. When $1 - S(t)$ is plotted on this graph paper, a straight line is obtained if the data follow a two-parameter lognormal distribution. An inspection of the graph shows that the plotted curve is concave rather than linear. Gaddum (1945a, b) has pointed out that such a deviation can be corrected by subtracting an appropriate constant from the survival times. In other words, the three-parameter lognormal distribution can be used. Figure 6.12 gives a similar plot in which the survival time of every patient plus 4 is plotted (that is, the constant subtracted is $-4$). The configuration is linear, and hence empirically it seems valid to assume that the lognormal distribution is appropriate. Similar graphs for male patients with chronic myelocytic leukemia and for female patients with chronic lymphocytic or myelocytic leukemia are given in Figures 6.13 and 6.14. Parameters of the lognormal distribution are estimated. Feinleib and MacMahon report that the agreement between the observed and calculated distributions is striking for each group except for women with chronic lymphocytic leukemia.
Figure 6.12 Lognormal probability plot of the survival time in months plus 4 of 234 male patients with chronic lymphocytic leukemia. (From Feinleib and MacMahon, 1960. Reproduced by permission of the publisher.)

The corresponding p values for the chi-square goodness-of-fit test are as follows:

              Chronic Myelocytic    Chronic Lymphocytic
    Male      0.86                  0.73
    Female    0.57                  0.016

Since a large p value indicates close agreement, it is concluded that the three-parameter lognormal distribution adequately describes the distribution of survival times for each subgroup except women with chronic lymphocytic leukemia. The shape of the observed distribution for the latter group suggests that it might actually be composed of two dissimilar groups, each of whose survival times might fit a lognormal distribution.

6.4 GAMMA AND GENERALIZED GAMMA DISTRIBUTIONS

The gamma distribution, which includes the exponential and chi-square distributions, was used a long time ago by Brown and Flood (1947) to describe the life of glass tumblers circulating in a cafeteria and by Birnbaum and Saunders (1958) as a statistical model for life length of materials. Since then, this distribution has been used frequently as a model for industrial reliability problems and human survival.

Figure 6.13 Lognormal probability plot of the survival time in months plus 4 of 162 male patients with chronic myelocytic leukemia. (From Feinleib and MacMahon, 1960. Reproduced by permission of the publisher.)

Figure 6.14 Lognormal probability plot of the survival time in months plus 4 of female patients with two types of leukemia. (From Feinleib and MacMahon, 1960. Reproduced by permission of the publisher.)

Suppose that failure or death takes place in $n$ stages or as soon as $n$ subfailures have happened.
At the end of the first stage, after time $T_1$, the first subfailure occurs; after that the second stage begins and the second subfailure occurs after time $T_2$; and so on. Total failure or death occurs at the end of the $n$th stage, when the $n$th subfailure happens. The survival time, $T$, is then $T_1 + T_2 + \cdots + T_n$. The times $T_1, T_2, \ldots, T_n$ spent in each stage are assumed to be independently exponentially distributed with probability density function $\lambda \exp(-\lambda t_i)$, $i = 1, \ldots, n$. That is, the subfailures occur independently at a constant rate $\lambda$. The distribution of $T$ is then called the Erlangian distribution. There is no need for the stages to have physical significance since we can always assume that death occurs in the $n$-stage process just described. This idea, introduced by A. K. Erlang in his study of congestion in telephone systems, has been used widely in queuing theory and life processes.

A natural generalization of the Erlangian distribution is to replace the parameter $n$, restricted to the integers 1, 2, ..., by a parameter $\gamma$ taking any real positive value. We then obtain the gamma distribution. The gamma distribution is characterized by two parameters, $\gamma$ and $\lambda$. When $0 < \gamma < 1$, there is negative aging and the hazard rate decreases monotonically from infinity to $\lambda$ as time increases from 0 to infinity. When $\gamma > 1$, there is positive aging and the hazard rate increases monotonically from 0 to $\lambda$ as time increases from 0 to infinity. When $\gamma = 1$, the hazard rate equals $\lambda$, a constant, as in the exponential case. Figure 6.15 illustrates the gamma hazard function for $\lambda = 1$ and $\gamma < 1$, $\gamma = 1, 2, 4$. Thus, the gamma distribution describes a different type of survival pattern where the hazard rate is decreasing or increasing to a constant value as time approaches infinity.

Figure 6.15 Gamma hazard functions with $\lambda = 1$.
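The stage-by-stage construction of the Erlangian distribution can be checked with a small simulation. The Python sketch below is an illustration only, not part of the original text: it draws $T$ as the sum of $n$ independent exponential stage times and compares the sample mean and variance with $n/\lambda$ and $n/\lambda^2$. The function name `simulate_erlang`, the seed, and the values $n = 3$, $\lambda = 1$ are assumptions chosen for the example.

```python
import random
import statistics


def simulate_erlang(n_stages, rate, size, seed=0):
    """Draw `size` survival times, each the sum of `n_stages` exponential stage times."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        sum(rng.expovariate(rate) for _ in range(n_stages))
        for _ in range(size)
    ]


# Illustrative values (assumed, not from the text): n = 3 stages, rate 1.0.
n_stages, rate = 3, 1.0
samples = simulate_erlang(n_stages, rate, size=20000)

sample_mean = statistics.fmean(samples)
sample_var = statistics.variance(samples)

print("sample mean:", round(sample_mean, 3), "theory:", n_stages / rate)
print("sample variance:", round(sample_var, 3), "theory:", n_stages / rate**2)
```

With a different seed or sample size the printed values will move slightly around the theoretical ones, which is the expected behavior of a Monte Carlo check.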
The probability density function of a gamma distribution is

$$f(t) = \frac{\lambda}{\Gamma(\gamma)} (\lambda t)^{\gamma-1} e^{-\lambda t}, \qquad t > 0,\ \gamma > 0,\ \lambda > 0 \tag{6.4.1}$$

where $\Gamma(\gamma)$ is defined as in (6.2.9). Figures 6.16 and 6.17 show the gamma density function with various values of $\gamma$ and $\lambda$. It is seen that varying $\gamma$ changes the shape of the distribution while varying $\lambda$ changes only the scaling. Consequently, $\gamma$ and $\lambda$ are shape and scale parameters, respectively. When $\gamma > 1$, there is a single peak at $t = (\gamma - 1)/\lambda$.

Figure 6.16 Gamma density functions with $\lambda = 1$.

Figure 6.17 Gamma density functions with $\gamma = 3$.

The cumulative distribution function $F(t)$ has a more complex form:

$$F(t) = \int_0^t \frac{\lambda}{\Gamma(\gamma)} (\lambda x)^{\gamma-1} e^{-\lambda x}\, dx \tag{6.4.2}$$

$$\phantom{F(t)} = \frac{1}{\Gamma(\gamma)} \int_0^{\lambda t} u^{\gamma-1} e^{-u}\, du = I(\lambda t, \gamma) \tag{6.4.3}$$

where

$$I(s, \gamma) = \frac{1}{\Gamma(\gamma)} \int_0^{s} u^{\gamma-1} e^{-u}\, du \tag{6.4.4}$$

known as the incomplete gamma function, is tabulated in Pearson (1922, 1957). For the Erlangian distribution, it can be shown that

$$F(t) = 1 - \sum_{k=0}^{n-1} \frac{e^{-\lambda t} (\lambda t)^k}{k!} \tag{6.4.5}$$

Thus, the survivorship function $1 - F(t)$ is

$$S(t) = \int_t^{\infty} \frac{\lambda}{\Gamma(\gamma)} (\lambda x)^{\gamma-1} e^{-\lambda x}\, dx \tag{6.4.6}$$

for the gamma distribution or

$$S(t) = e^{-\lambda t} \sum_{k=0}^{n-1} \frac{(\lambda t)^k}{k!} \tag{6.4.7}$$

for the Erlangian distribution. Since the hazard function is the ratio of $f(t)$ to $S(t)$, it can be calculated from (6.4.1) and (6.4.7). When $\gamma$ is an integer $n$,

$$h(t) = \frac{\lambda (\lambda t)^{n-1}}{(n-1)! \sum_{k=0}^{n-1} (1/k!) (\lambda t)^k} \tag{6.4.8}$$

When $\gamma = 1$, the distribution is exponential. When $\lambda = \tfrac{1}{2}$ and $\gamma = \tfrac{1}{2}\nu$, where $\nu$ is an integer, the distribution is chi-square with $\nu$ degrees of freedom. The mean and variance of the standard gamma distribution are, respectively, $\gamma/\lambda$ and $\gamma/\lambda^2$, so that the coefficient of variation is $1/\sqrt{\gamma}$. Many survival distributions can be represented, at least roughly, by suitable choice of the parameters $\gamma$ and $\lambda$. In many cases, there is an advantage in using the Erlangian distribution, that is, in taking $\gamma$ integer.
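For integer shape, the closed forms (6.4.7) and (6.4.8) are easy to evaluate directly. The following Python sketch is a hedged illustration using only the standard library; the function names `gamma_pdf`, `erlang_survival`, and `erlang_hazard` and the parameter values are assumptions made for the example. It checks that the hazard equals $f(t)/S(t)$ and that it rises toward $\lambda$ when $n > 1$.

```python
import math


def gamma_pdf(t, shape, rate):
    """Gamma density as in (6.4.1)."""
    return rate * (rate * t) ** (shape - 1) * math.exp(-rate * t) / math.gamma(shape)


def _poisson_sum(t, n, rate):
    """Sum over k = 0..n-1 of (rate*t)^k / k!, shared by (6.4.7) and (6.4.8)."""
    return sum((rate * t) ** k / math.factorial(k) for k in range(n))


def erlang_survival(t, n, rate):
    """Erlangian survivorship function as in (6.4.7)."""
    return math.exp(-rate * t) * _poisson_sum(t, n, rate)


def erlang_hazard(t, n, rate):
    """Erlangian hazard function as in (6.4.8)."""
    return rate * (rate * t) ** (n - 1) / (math.factorial(n - 1) * _poisson_sum(t, n, rate))


# Illustrative values (assumed): n = 3 stages, rate 1.0.
for t in (0.5, 2.0, 5.0, 50.0):
    h = erlang_hazard(t, 3, 1.0)
    ratio = gamma_pdf(t, 3, 1.0) / erlang_survival(t, 3, 1.0)
    print(f"t={t:5.1f}  h(t)={h:.4f}  f(t)/S(t)={ratio:.4f}")
```

For $n = 2$ and $\lambda = 1$ the hazard simplifies to $t/(1+t)$, which makes a convenient hand check.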
The exponential, Weibull, lognormal, and gamma distributions are special cases of a generalized gamma distribution with three parameters, $\lambda$, $\alpha$, and $\gamma$, whose density function is defined as

$$f(t) = \frac{\alpha \lambda^{\alpha\gamma}}{\Gamma(\gamma)}\, t^{\alpha\gamma-1} \exp[-(\lambda t)^{\alpha}], \qquad t > 0,\ \gamma > 0,\ \lambda > 0,\ \alpha > 0 \tag{6.4.9}$$

It is easily seen that this generalized gamma distribution is the exponential distribution if $\alpha = \gamma = 1$, the Weibull distribution if $\gamma = 1$, the lognormal distribution if $\gamma \to \infty$, and the gamma distribution if $\alpha = 1$.

In later chapters (e.g., Chapters 7 and 9), we discuss several parametric procedures for estimation and hypothesis testing. To use available computer software such as SAS to carry out the computation, we use the distributions adopted by the software. One of the very few software packages that include the gamma or generalized gamma distribution is SAS. In SAS, the generalized gamma distribution is defined as having the following density function:

$$f(t) = \frac{|\alpha| \lambda^{\alpha\gamma}}{\Gamma(\gamma)}\, t^{\alpha\gamma-1} \exp[-(\lambda t)^{\alpha}], \qquad t > 0,\ \gamma > 0,\ \lambda > 0 \tag{6.4.10}$$

To differentiate this form of the generalized gamma distribution from the generalized gamma in (6.4.9), we refer to this distribution as the extended generalized gamma distribution.

Table 6.4 Lifetimes of 101 Strips of Aluminum Coupon

     370   706   716   746   785   797   844   855   858   886
     886   930   960   988   990  1000  1010  1016  1018  1020
    1055  1085  1102  1102  1108  1115  1120  1134  1140  1199
    1200  1200  1203  1222  1235  1238  1252  1258  1262  1269
    1270  1290  1293  1300  1310  1313  1315  1330  1355  1390
    1416  1419  1420  1420  1450  1452  1475  1478  1481  1485
    1502  1505  1513  1522  1522  1530  1540  1560  1567  1578
    1594  1602  1604  1608  1630  1642  1674  1730  1750  1750
    1763  1768  1781  1782  1792  1820  1868  1881  1890  1893
    1895  1910  1923  1940  1945  2023  2100  2130  2215  2268
    2440

Source: Birnbaum and Saunders (1958).
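The special cases of the generalized gamma density (6.4.9) can be verified numerically. The Python sketch below is an illustration under stated assumptions: the function names and the test points are made up for the example, and the Weibull density is written with shape $\alpha$ and scale parameter $\lambda$ as $\alpha\lambda(\lambda t)^{\alpha-1}\exp[-(\lambda t)^{\alpha}]$. The lognormal case is a limit ($\gamma \to \infty$) and is not checked here.

```python
import math


def generalized_gamma_pdf(t, alpha, gamma_, lam):
    """Generalized gamma density as in (6.4.9)."""
    return (alpha * lam ** (alpha * gamma_) / math.gamma(gamma_)
            * t ** (alpha * gamma_ - 1) * math.exp(-((lam * t) ** alpha)))


def weibull_pdf(t, alpha, lam):
    """Weibull density with shape alpha (assumed parameterization)."""
    return alpha * lam * (lam * t) ** (alpha - 1) * math.exp(-((lam * t) ** alpha))


def gamma_pdf(t, gamma_, lam):
    """Gamma density as in (6.4.1)."""
    return lam * (lam * t) ** (gamma_ - 1) * math.exp(-lam * t) / math.gamma(gamma_)


def exponential_pdf(t, lam):
    """Exponential density with rate lam."""
    return lam * math.exp(-lam * t)


# Compare the general density with each special case at a few arbitrary points.
for t in (0.3, 1.0, 2.5):
    print(
        t,
        math.isclose(generalized_gamma_pdf(t, 1.7, 1.0, 0.8), weibull_pdf(t, 1.7, 0.8)),
        math.isclose(generalized_gamma_pdf(t, 1.0, 2.4, 0.8), gamma_pdf(t, 2.4, 0.8)),
        math.isclose(generalized_gamma_pdf(t, 1.0, 1.0, 0.8), exponential_pdf(t, 0.8)),
    )
```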
It can be shown that the extended generalized gamma distribution reduces to the Weibull distribution when $\alpha > 0$ and $\gamma = 1$, the lognormal distribution when $\gamma \to \infty$, the gamma distribution when $\alpha = 1$, and the exponential distribution when $\alpha = \gamma = 1$.

Example 6.4 Birnbaum and Saunders (1958) report an application of the gamma distribution to the lifetime of aluminum coupon. In their study, 17 sets of six strips were placed in a specially designed machine. Periodic loading was applied to the strips with a frequency of 18 hertz and a maximum stress of 21,000 pounds per square inch. The 102 strips were run until all of them failed. One of the 102 strips tested had to be discarded for an extraneous reason, yielding 101 observations. The lifetime data are given in Table 6.4 in ascending order. From the data the two parameters of the gamma distribution were estimated (estimation methods are discussed in Chapter 7). They obtained $\hat{\gamma} = 11.8$ and $\hat{\lambda} = 1/(118.76 \times 10^{3})$. A graphical comparison of the observed and fitted cumulative distribution function is given in Figure 6.18, which shows very good agreement. A chi-square goodness-of-fit test (discussed in Chapter 9) yielded a $\chi^2$ value of 4.49 for 6 degrees of freedom, corresponding to a significance level between 0.5 and 0.6. Thus, it was concluded that the gamma distribution was an adequate model for the life length of some materials.

Figure 6.18 Graphical comparison of observed and fitted cumulative distribution functions of data in Example 6.4. (From Birnbaum and Saunders, 1958.)

6.5 LOG-LOGISTIC DISTRIBUTION

The survival time $T$ has a log-logistic distribution if $\log(T)$ has a logistic distribution. The density, survivorship, hazard, and cumulative hazard functions of the log-logistic distribution are, respectively,

$$f(t) = \frac{\alpha\gamma t^{\gamma-1}}{(1 + \alpha t^{\gamma})^2} \tag{6.5.1}$$

$$S(t) = \frac{1}{1 + \alpha t^{\gamma}} \tag{6.5.2}$$

$$h(t) = \frac{\alpha\gamma t^{\gamma-1}}{1 + \alpha t^{\gamma}} \tag{6.5.3}$$

$$H(t) = \log(1 + \alpha t^{\gamma}) \tag{6.5.4}$$

$$t \ge 0,\ \alpha > 0,\ \gamma > 0$$

The log-logistic distribution is characterized by two parameters, $\alpha$ and $\gamma$. The median of the log-logistic distribution is $\alpha^{-1/\gamma}$. Figure 6.19(a) to (c) show the log-logistic hazard, density, and survivorship functions with $\alpha = 1$ and various values of $\gamma$: 2.0, 1, and 0.67. When $\gamma > 1$, the log-logistic hazard has the value 0 at time 0, increases to a peak at $t = [(\gamma - 1)/\alpha]^{1/\gamma}$, and then declines, which is similar to the lognormal hazard. When $\gamma = 1$, the hazard starts at $\alpha^{1/\gamma}$ (which equals $\alpha$ here) and then declines monotonically. When $\gamma < 1$, the hazard starts at infinity and then declines, which is similar to the Weibull distribution. The hazard function declines toward 0 as $t$ approaches infinity. Thus, the log-logistic distribution may be used to describe a first increasing and then decreasing hazard or a monotonically decreasing hazard.

Example 6.5 Byers et al. (1988) used the log-logistic distribution to describe the rate of spread of HIV between 1978 and 1986. Between 1978 and 1980, over 6700 homosexual and bisexual men in San Francisco were enrolled in studies of the prevalence and incidence of sexually transmitted hepatitis B virus (HBV) infections. Blood specimens were collected from the participants. Four hundred and eighty-eight men who were HBV-seronegative were randomly selected to participate in a later study of HIV infection. These men agreed to allow the investigators to test the specimens collected previously together with a current specimen. For those who converted to positive, the infection is known only to have occurred between the previous negative test and the first positive one. The exact time is unknown. The time to infection is therefore interval censored. The investigators tried to fit several distributions to the interval-censored data, including the Weibull and log-logistic, by maximum likelihood methods (discussed in Chapter 7).
Based on the Akaike information criterion (discussed in Chapter 9), the log-logistic distribution was found to provide the best fit to the data. The maximum likelihood estimates of the two parameters are $\hat{\alpha} = 0.003757$ and $\hat{\gamma} = 1.424328$. Based on the log-logistic model, the median infection time is estimated to be 50.4 months, and the hazard function approaches its peak at 27.6 months.

Figure 6.19 (a) Hazard function of the log-logistic distribution; (b) density function of the log-logistic distribution; (c) survivorship function of the log-logistic distribution.

6.6 OTHER SURVIVAL DISTRIBUTIONS

Many other distributions can be used as models of survival time, three of which we discuss briefly in this section: the linear exponential, the Gompertz (1825), [...]

... above, confidence intervals for $\lambda$ and $\mu$ can also be obtained. A 95% confidence interval for the relapse rate $\lambda$, following (7.2.7), is approximately

$$\frac{(0.106)(24.433)}{42} < \lambda < \frac{(0.106)(59.342)}{42}$$

or (0.062, 0.150). A 95% confidence interval for the mean remission time, following (7.2.9), is

$$\frac{(42)(9.429)}{59.342} < \mu < \frac{(42)(9.429)}{24.433}$$

or (6.673, 16.208). Once ...

... mice are 4, 5, 8, 9, and 10 weeks. The survival data of the 10 mice are 4, 5, 8, 9, 10, 10+, 10+, 10+, 10+, and 10+. Assuming that the failure of these mice follows an exponential distribution, the survival rate $\lambda$ and mean survival time $\mu$ are estimated, respectively, according to (7.2.10) and (7.2.11) by

$$\hat{\lambda} = \frac{5}{36 + 50} = 0.058 \text{ per week}$$

and $\hat{\mu} = 1/0.058 = 17.241$ weeks. A 95% confidence interval for $\lambda$ by (7.2.12) is

$$\frac{(0.058)(3.247)}{(2)(5)} < \lambda < \frac{(0.058)(20.483)}{(2)(5)}$$

or (0.019, 0.119). A 95% confidence interval for $\mu$ following (7.2.13) is

$$\frac{2(5)(17.241)}{20.483} < \mu < \frac{2(5)(17.241)}{3.247}$$

or (8.417, 53.098). The probability of surviving a given time for the mice can be estimated from (7.2.2). For example, the probability that a mouse ...

... by

$$\hat{G} = \max\left(t_{(1)} - \frac{1}{n\hat{\lambda}},\ 0\right) \tag{7.2.40}$$

and the variance of $\hat{G}$ is

$$\mathrm{Var}(\hat{G}) = \frac{1}{(n\hat{\lambda})^2}\left(1 + \frac{1}{r-1}\right) \tag{7.2.41}$$

When $n$ is large, $\hat{G}$ and $\mathrm{Var}(\hat{G})$ can be estimated by ...

$$\hat{\lambda} = \frac{r - 1}{\sum t_{(i)} + \sum t_i^{+} - n t_{(1)}} \tag{7.2.44}$$

with variance

$$\mathrm{Var}(\hat{\lambda}) = \frac{\hat{\lambda}^2}{r-1} \tag{7.2.45}$$

Any percentile of survival time $t_p$ may be estimated by ...

... obtained. For example, the probability of staying in remission for at least 20 weeks, estimated from (7.2.2), is $\hat{S}(20) = \exp[-0.106(20)] = 0.120$. Any percentile of survival time $t_p$ may be estimated by equating $S(t)$ to $p$ and solving for $t_p$, that is, $\hat{t}_p = -\log p/\hat{\lambda}$. For example, the median (50th percentile) survival time can be estimated by $\hat{t}_{0.5} = -\log 0.5/\hat{\lambda} = 6.539$ weeks.

Estimation of $\lambda$ for Data with ...

... exponential distribution for the observed survival data in 'C:\EXAMPLE.DAT'.

    data w1;
      infile 'c:\example.dat' missover;
      input t cens;
    run;
    proc lifereg;
      model t*cens(0) = / covb d = exponential;
    run;

The respective BMDP code for program 2L is

    /input     file = 'c:\example.dat'
               variables = 2
               format = free
    /print     level = brief
               cova
               survival
    /variable  names = t, ...
    /form      ...
    /regress   ...
    /end

... confidence interval for $\mu$ computed from (7.2.33) is

$$\frac{2(600)}{31.526} < \mu < \frac{2(600)}{8.231}$$

or (38.064, 145.790).

When data are progressively censored, Gehan (1970) derives an estimate for $G$ and a modified MLE for the hazard rate $\lambda$. Suppose that $r$ out of the $n$ individuals in the study die before the end of the study and $n - r$ individuals are alive at the time of the last follow-up or termination. The $n$ survival times are ...
... times in weeks for 21 patients with acute leukemia: 1, 1, 2, 2, 3, 4, 4, 5, 5, 6, 8, 8, 9, 10, 10, 12, 14, 16, 20, 24, and 34. Assume that remission duration follows the exponential distribution. Let us estimate the parameter $\lambda$ by using the formulas given above. According to (7.2.5), the MLE of the relapse rate, $\lambda$, is

$$\hat{\lambda} = \frac{21}{198} = 0.106 \text{ per week}$$

The mean remission time is then $198/21 = 9.429$ weeks. Using ...

... $S(t)$ to $p$ and solving for $t_p$; that is, $\hat{t}_p = -(\log p)/\hat{\lambda} + \hat{G}$. The following example illustrates the procedures.

Example 7.8 Suppose that 19 patients with brain tumor are followed in a clinical trial for a year. Their survival times in weeks are 3, 4, 6, 8, 8, 10, 12, 16, 17, 30, 33, 3+, 8+, 13+, 21+, 26+, 35+, 44+, and 45+. In this case $n = 19$, $r = 11$, $t_{(1)} = 3$, $\sum t_{(i)} = 147$, and $\sum t_i^{+} = 195$. The hazard ... and its variance may be estimated by (7.2.44) and (7.2.45) as

$$\hat{\lambda} = \frac{10}{147 + 195 - 19(3)} = 0.035$$

and

$$\mathrm{Var}(\hat{\lambda}) = \frac{(0.035)^2}{10} = 0.0001$$

The guarantee time $G$ and its variance may then be estimated by (7.2.40) and (7.2.41):

$$\hat{G} = \max\left(3 - \frac{1}{19 \times 0.035},\ 0\right) = 1.496$$

and

$$\mathrm{Var}(\hat{G}) = \frac{1}{(19 \times 0.035)^2}\left(1 + \frac{1}{10}\right) = 2.487$$

Thus, after a guarantee time of approximately ...
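The arithmetic in the exponential examples excerpted above can be reproduced with a short Python sketch. This is an illustration, not the book's code: the function names are hypothetical, the chi-square quantiles (24.433, 59.342) are simply copied from the text rather than computed, and the two-parameter formulas are written as they appear in the worked numbers of Example 7.8, with $t_{(1)}$ assumed to be the smallest uncensored time. Because the text rounds $\hat{\lambda}$ to three decimals before substituting, the unrounded results differ slightly in the later digits (for example, $\hat{G}$ comes out near 1.50 rather than 1.496).

```python
import math


def exponential_rate(uncensored, censored=()):
    """MLE of the exponential rate: number of deaths over total observed time."""
    return len(uncensored) / (sum(uncensored) + sum(censored))


def rate_interval(rate_hat, r, chi2_low, chi2_high):
    """Interval rate_hat * chi2 / (2r); the chi-square quantiles are supplied by the caller."""
    return rate_hat * chi2_low / (2 * r), rate_hat * chi2_high / (2 * r)


def percentile(p, rate_hat, guarantee=0.0):
    """Time t_p solving S(t_p) = p, i.e. -log(p) / rate + G."""
    return -math.log(p) / rate_hat + guarantee


def two_parameter_estimates(uncensored, censored):
    """Rate, Var(rate), guarantee time, Var(G) following the Example 7.8 arithmetic."""
    n, r = len(uncensored) + len(censored), len(uncensored)
    t1 = min(uncensored)  # assumption: t_(1) is the smallest uncensored time
    rate = (r - 1) / (sum(uncensored) + sum(censored) - n * t1)
    g = max(t1 - 1 / (n * rate), 0.0)
    return rate, rate**2 / (r - 1), g, (1 + 1 / (r - 1)) / (n * rate) ** 2


# Remission durations of the 21 leukemia patients (no censoring).
remission = [1, 1, 2, 2, 3, 4, 4, 5, 5, 6, 8, 8, 9, 10, 10, 12, 14, 16, 20, 24, 34]
rate_rem = exponential_rate(remission)
ci_rem = rate_interval(rate_rem, len(remission), 24.433, 59.342)
s20 = math.exp(-rate_rem * 20)

# Mice: five deaths and five observations censored at 10 weeks.
rate_mice = exponential_rate([4, 5, 8, 9, 10], [10] * 5)

# Brain tumor patients of Example 7.8.
tumor = two_parameter_estimates(
    [3, 4, 6, 8, 8, 10, 12, 16, 17, 30, 33], [3, 8, 13, 21, 26, 35, 44, 45]
)

print("remission rate:", round(rate_rem, 3), "95% CI:", [round(x, 3) for x in ci_rem])
print("S(20):", round(s20, 3), "median:", round(percentile(0.5, rate_rem), 3))
print("mice rate:", round(rate_mice, 3))
print("Example 7.8 (rate, Var(rate), G, Var(G)):", [round(x, 4) for x in tumor])
```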

Date posted: 14/08/2014, 05:20
