Figure 8.4 Normal probability plot of the WBC data in Example 8.1.

[...] observations have the same value, the sample cumulative distribution function is plotted against only the $t$ with the largest $i$ value.

Step 3. Plot $t$ or a function of it versus the estimated sample cumulative distribution or a function of it.

Step 4. Fit a straight line through the points by eye. The position of the straight line should be chosen to provide a fit to the bulk of the data and may ignore outliers or data points of doubtful validity.

Figure 8.4 gives a normal probability plot of the WBC versus $\Phi^{-1}(F)$, where $\Phi^{-1}(\cdot)$ is the inverse of the standard normal distribution function. The values of $\Phi^{-1}(\hat F(\mathrm{WBC}_{(i)}))$ are shown in Table 8.1. The plot is reasonably linear.

The straight line fitted by eye in a probability plot can be used to estimate percentiles and proportions within given limits in the same manner as for the sample cumulative distribution curve. In addition, a probability plot provides estimates of the parameters of the theoretical distribution chosen. The mean (or median) WBC estimated from the normal probability plot in Figure 8.4 is 56,000 [at $\Phi^{-1}(F) = 0$, $F = 0.5$ and WBC = 56,000]. At $\Phi^{-1}(F) = 1$, WBC = 91,000, which corresponds to the mean plus 1 standard deviation. Thus, the standard deviation is estimated as 91,000 - 56,000 = 35,000.

We now discuss probability plots of the exponential, Weibull, lognormal, and log-logistic distributions.
Table 8.2 Probability Plotting for Example 8.2

t_(i)    Order, i    F = (i - 0.5)/21    log[1/(1 - F)]
  1         1
  1         2            0.071              0.074
  2         3
  2         4            0.167              0.182
  3         5            0.214              0.241
  4         6
  4         7            0.310              0.370
  5         8
  5         9            0.405              0.519
  6        10            0.452              0.602
  8        11
  8        12            0.548              0.793
  9        13            0.595              0.904
 10        14
 10        15            0.690              1.173
 12        16            0.738              1.340
 14        17            0.786              1.540
 16        18            0.833              1.792
 20        19            0.881              2.128
 24        20            0.929              2.639
 34        21            0.976              3.738

Exponential Distribution

The exponential cumulative distribution function is

\[ F(t) = 1 - \exp(-\lambda t), \qquad t > 0 \tag{8.2.1} \]

The probability plot for the exponential distribution is based on the relationship between $t$ and $F(t)$, from (8.2.1),

\[ t = \frac{1}{\lambda} \log \frac{1}{1 - F(t)} \tag{8.2.2} \]

This relationship is linear between $t$ and the function $\log[1/(1 - F(t))]$. Thus, an exponential probability plot is made by plotting the $i$th ordered observed survival time $t_{(i)}$ versus $\log[1/(1 - \hat F(t_{(i)}))]$, where $\hat F(t_{(i)})$ is an estimate of $F(t_{(i)})$, for example, $(i - 0.5)/n$, for $i = 1, \ldots, n$. From (8.2.2), at $\log\{1/[1 - F(t)]\} = 1$, $t = 1/\lambda$. This fact can be used to estimate $1/\lambda$ and thus $\lambda$ from the fitted straight line. That is, the value $t$ corresponding to $\log\{1/[1 - F(t)]\} = 1$ is an estimate of the mean $1/\lambda$ and its reciprocal is an estimate of the hazard rate $\lambda$.

Figure 8.5 Exponential probability plot of the data in Example 8.2.

Example 8.2 Suppose that 21 patients with acute leukemia have the following remission times in months: 1, 1, 2, 2, 3, 4, 4, 5, 5, 6, 8, 8, 9, 10, 10, 12, 14, 16, 20, 24, and 34. We would like to know if the remission time follows the exponential distribution. The ordered remission times $t_{(i)}$ and the values of $\log\{1/[1 - \hat F(t_{(i)})]\}$ are given in Table 8.2. The exponential probability plot is shown in Figure 8.5. A straight line is fitted to the points by eye, and the plot indicates that the exponential distribution fits the data very well. At the point $\log[1/(1 - F(t))] = 1.0$, the corresponding $t$, approximately 9.0 months, is an estimate of the mean $1/\lambda$, and thus an estimate of the hazard rate is $\hat\lambda = 1/9 = 0.111$ per month.
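The calculations behind Table 8.2 and Example 8.2 can be reproduced with a short script. The Python sketch below is illustrative only: the names `exponential_plot_points` and `slope_through_origin` are hypothetical, and the least-squares line through the origin is an assumed stand-in for the by-eye fit described in the text, not the method of the source.

```python
import math

# Remission times (months) of the 21 leukemia patients in Example 8.2.
TIMES = [1, 1, 2, 2, 3, 4, 4, 5, 5, 6, 8, 8, 9, 10, 10,
         12, 14, 16, 20, 24, 34]


def exponential_plot_points(times):
    """Return (t, i, F, y) with F = (i - 0.5)/n and y = log[1/(1 - F)].

    For tied times only the largest order number i is kept, as in Table 8.2.
    """
    ordered = sorted(times)
    n = len(ordered)
    points = []
    for i, t in enumerate(ordered, start=1):
        # ordered[i] is the next observation (0-based index i == 1-based i + 1)
        is_last_of_tie = (i == n) or (ordered[i] != t)
        if is_last_of_tie:
            f_hat = (i - 0.5) / n
            points.append((t, i, f_hat, math.log(1.0 / (1.0 - f_hat))))
    return points


def slope_through_origin(points):
    """Least-squares slope of t on y through the origin.

    Assumption: this replaces the by-eye line; by (8.2.2) the slope
    estimates the mean 1/lambda.
    """
    num = sum(t * y for t, _, _, y in points)
    den = sum(y * y for _, _, _, y in points)
    return num / den


points = exponential_plot_points(TIMES)
mean_graphical = slope_through_origin(points)  # estimate of 1/lambda
lambda_ratio = len(TIMES) / sum(TIMES)         # the 21/198 estimate quoted in the text

for t, i, f_hat, y in points:
    print(f"{t:>3} {i:>3} {f_hat:.3f} {y:.3f}")
print(f"1/lambda from the fitted line: {mean_graphical:.2f} months")
print(f"lambda from 21/198: {lambda_ratio:.3f} per month")
```

The printed rows should agree with the filled-in rows of Table 8.2 to three decimals, and the fitted slope should land near the by-eye value of about 9 months, although a least-squares line will generally not coincide exactly with a line drawn by eye.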
An alternative is to use (7.2.5) to estimate $\lambda$: $\hat\lambda = 21/198 = 0.106$, which is very close to the graphical estimate.

Weibull Distribution

The Weibull cumulative distribution function is

\[ F(t) = 1 - \exp[-(\lambda t)^{\gamma}], \qquad t > 0,\ \gamma > 0,\ \lambda > 0 \tag{8.2.3} \]

The probability plot for the Weibull distribution is based on the relationship

\[ \log t = \log \frac{1}{\lambda} + \frac{1}{\gamma} \log\left( \log \frac{1}{1 - F(t)} \right) \tag{8.2.4} \]

between $t$ and the cumulative distribution function $F$ of $t$ obtained from (8.2.3). This relationship is linear between $\log t$ and the function $\log(\log\{1/[1 - F(t)]\})$. Thus, a Weibull probability plot is a graph of $\log(t_{(i)})$ versus $\log(\log\{1/[1 - \hat F(t_{(i)})]\})$, where $\hat F(t_{(i)})$ is an estimate of $F(t_{(i)})$, for example, $(i - 0.5)/n$, for $i = 1, \ldots, n$. The shape parameter $\gamma$ is estimated graphically as the reciprocal of the slope of the straight line fitted to the graph. If the fitted line is appropriate, then at $\log(\log\{1/[1 - F(t)]\}) = 0$, the corresponding $\log(t)$ is an estimate of $\log(1/\lambda)$ from (8.2.4). This fact can be used to estimate $1/\lambda$ and thus $\lambda$ graphically from a Weibull probability plot. At $\log(\log\{1/[1 - F(t)]\}) = 0.5$, (8.2.4) reduces to $\log t = \log(1/\lambda) + 0.5/\gamma$. This equation can be used to estimate $\gamma$. Estimates of the parameters can also be obtained from the method described in Chapter 7 if the Weibull distribution appears to be a good fit graphically.

The following hypothetical example illustrates the use of the Weibull probability plot. The small number of observations used in the example is only for illustrative purposes. In practice, many more observations are needed to identify an appropriate theoretical model for the data.

Example 8.3 Six mice with brain tumors have survival times, in months, of 3, 4, 5, 6, 8, and 10. $\log(t_{(i)})$ plotted against $\log(\log\{1/[1 - (i - 0.5)/6]\})$ for $i = 1, \ldots, 6$ is shown in Figure 8.6. A straight line is fitted to the data points by eye.
From the fitted line, at $\log(\log\{1/[1 - F(t)]\}) = 0$, the corresponding $\log(t) = 1.9$, and thus an estimate of $1/\lambda$ is approximately 6.69 [$= \exp(1.9)$] months and an estimate of $\lambda$ is 0.150. At $\log(\log\{1/[1 - F(t)]\}) = 0.5$, the corresponding $\log(t) = 2.09$, and thus an estimate of $\gamma$ is $0.5/(2.09 - 1.9) = 2.63$. The maximum likelihood estimates of $\gamma$ and $\lambda$ obtained from the SAS procedure LIFEREG are 2.75 and 0.148, respectively. The graphical estimates of $\gamma$ and $\lambda$ are close to the MLE.

Figure 8.6 Weibull probability plot of the data in Example 8.3.

Lognormal Distribution

If the survival time $t$ follows a lognormal distribution with parameters $\mu$ and $\sigma^2$, $\log t$ follows the normal distribution with mean $\mu$ and variance $\sigma^2$. Consequently, $(\log t - \mu)/\sigma$ has the standard normal distribution. Thus, the lognormal distribution function can be written as

\[ F(t) = \Phi\left( \frac{\log t - \mu}{\sigma} \right), \qquad t > 0 \tag{8.2.5} \]

where $\Phi(\cdot)$ is the standard normal distribution function and $\mu$ and $\sigma$ are, respectively, the mean and standard deviation of $\log t$. A probability plot for the lognormal distribution is based on the following relationship obtained from (8.2.5):

\[ \log t = \mu + \sigma\, \Phi^{-1}(F(t)) \tag{8.2.6} \]

The function $\Phi^{-1}(\cdot)$ is the inverse of the standard normal distribution function or its $100F$ percentile. This relationship is linear between the value $\log t$ and the function $\Phi^{-1}(F(t))$. Thus, a lognormal probability plot is a graph of $\log(t_{(i)})$ versus $\Phi^{-1}(\hat F(t_{(i)}))$, where $\hat F(t_{(i)})$ is an estimate of $F(t_{(i)})$. From (8.2.6), at $\Phi^{-1}(F(t)) = 0$, $\log t = \mu$; and at $\Phi^{-1}(F(t)) = 1$, $\sigma = \log t - \mu$. These facts can be used to estimate $\mu$ and $\sigma$ from a straight line fitted to the graph.

Example 8.4 In a study of a new insecticide, 20 insects are exposed. Survival times in seconds are 3, 5, 6, 7, 8, 9, 10, 10, 12, 15, 15, 18, 19, 20, 22, 25, 28, 30, 40, and 60. Suppose that prior experience indicates that the survival time follows a lognormal distribution; that is, some insects might react to the insecticide very slowly and not die for a long time.
The values of $\log(t_{(i)})$ versus $\Phi^{-1}[(i - 0.5)/20]$, $i = 1, \ldots, 20$, are plotted in Figure 8.7. The plot shows a reasonably straight line. From the fitted line, at $\Phi^{-1}(F(t)) = 0$, $\log t$ is an estimate of $\mu$, which is equal to 2.64, and at $\Phi^{-1}(F(t)) = 1$, $\log t = 3.4$ and thus $\hat\sigma = 3.4 - 2.64 = 0.76$. $\Phi^{-1}(F(t))$ can be obtained by applying the Microsoft Excel function NORMSINV.

Figure 8.7 Lognormal probability plot of the data in Example 8.4.

Log-Logistic Distribution

The log-logistic distribution function is

\[ F(t) = \frac{\alpha t^{\gamma}}{1 + \alpha t^{\gamma}}, \qquad t > 0,\ \alpha > 0,\ \gamma > 0 \tag{8.2.7} \]

A probability plot for the log-logistic distribution is based on the following relationship obtained from (8.2.7):

\[ \log t = \frac{1}{\gamma} \log\left( \frac{1}{1 - F(t)} - 1 \right) - \frac{1}{\gamma} \log \alpha \tag{8.2.8} \]

Thus, a log-logistic probability plot is a graph of $\log(t_{(i)})$ versus $\log(\{1/[1 - \hat F(t_{(i)})]\} - 1)$, where $\hat F(t_{(i)})$ is an estimate of $F(t_{(i)})$, for example, $(i - 0.5)/n$, for $i = 1, \ldots, n$. From (8.2.8), at $\log\{[1/(1 - F)] - 1\} = 0$, $\log t = -(1/\gamma) \log \alpha$; and at $\log\{[1/(1 - F)] - 1\} = 1$, $\log t = (1/\gamma)(1 - \log \alpha)$. These facts can be used to estimate $\alpha$ and $\gamma$. The following example illustrates the log-logistic probability plot.

Example 8.5 Consider the following survival times of 10 experimental rats in days: 8, 15, 25, 30, 50, 90, 95, 100, 150, and 300. Figure 8.8 plots $\log(t_{(i)})$ against $\log(\{1/[1 - (i - 0.5)/10]\} - 1)$ for $i = 1, \ldots, 10$. To estimate $\alpha$ and $\gamma$, from the fitted line, at $\log(\{1/[1 - F(t)]\} - 1) = 0$, $\log t = 4.0$; and at $\log(\{1/[1 - F(t)]\} - 1) = 1$, $\log t = 4.6$. Thus, we have two equations:

\[ 4.0 = -\frac{1}{\hat\gamma} \log \hat\alpha \qquad \text{and} \qquad 4.6 = \frac{1}{\hat\gamma} (1 - \log \hat\alpha) \]

Subtracting the first equation from the second gives $1/\hat\gamma = 0.6$. From these two equations, $\hat\gamma = 1.667$ and $\hat\alpha = 0.0013$.

Figure 8.8 Log-logistic probability plot of the data in Example 8.5.

8.3 HAZARD PLOTTING

Hazard plotting (Nelson 1972, 1982) is analogous to probability plotting, the principal difference being that the survival time (or a function of it) is plotted against the cumulative hazard function (or a function of it) rather than the distribution function. Hazard plotting is designed to handle censored data.
Similar to probability plotting, estimates of parameters in the distribution can be determined from the hazard plot with little computational effort.

To determine if a set of survival times with censored observations is from a given theoretical distribution, we construct a hazard plot by plotting the survival time (or a function of it) versus an estimated cumulative hazard (or a function of it). The cumulative hazard function can be estimated by following the steps below.

Step 1. Order the $n$ observations in the sample from smallest to largest without regard to whether they are censored. If some uncensored and censored observations have the same value, they should be listed in random order. In the list of ordered values, the censored data are each marked with a plus.

Step 2. Number the ordered observations in reverse order, with $n$ assigned to the smallest data value, $n - 1$ to the second smallest, and so on. The numbers so obtained are called $K$ values or reverse-order numbers. For an uncensored observation, $K$ is the number of subjects still at risk at that time.

Step 3. Obtain the corresponding hazard value for each uncensored observation. Censored observations do not have a hazard value. The hazard value for an uncensored observation is $1/K$. This is the fraction of the $K$ individuals who survived that length of time and then failed. It is an observed conditional failure probability for an uncensored observation.

Step 4. For each uncensored observation, calculate the cumulative hazard value. This is the sum of the hazard values of the uncensored observation and of all preceding uncensored observations. For tied uncensored observations, the cumulative hazard is evaluated only at the smallest $K$ among the uncensored observations.

The table in the following example illustrates the procedure.

Example 8.6 Consider the remission data of the 21 leukemia patients receiving 6-MP in Example 3.3.
Table 8.3 illustrates the procedure for estimating the cumulative hazard function.

Table 8.3 Estimation of Cumulative Hazard

t      Reversed Order, K    Hazard, 1/K    Cumulative Hazard, H(t)
 6           21               0.048
 6+          20
 6           19               0.053
 6           18               0.056            0.156
 7           17               0.059            0.215
 9+          16
10           15               0.067            0.281
10+          14
11+          13
13           12               0.083            0.365
16           11               0.091            0.456
17+          10
19+           9
20+           8
22            7               0.143            0.598
23            6               0.167            0.765
25+           5
32+           4
32+           3
34+           2
35+           1

We now discuss the basic idea underlying hazard plotting for the exponential, Weibull, lognormal, and log-logistic distributions.

Exponential Distribution

The exponential distribution has constant hazard function $h(t) = \lambda$. Thus, the cumulative hazard function is

\[ H(t) = \lambda t \tag{8.3.1} \]

From (8.3.1), the time $t$ can be written as a linear function of the cumulative hazard $H$,

\[ t = \frac{1}{\lambda} H(t) \tag{8.3.2} \]

Thus, $t$ plots as a straight-line function of $H$. The slope of the fitted line is the mean survival time $1/\lambda$ of the distribution. More simply, $1/\lambda$ is the value of $t$ when $H(t) = 1$. This fact is used to estimate $1/\lambda$ from an exponential hazard plot.

Example 8.7 Using the estimated cumulative hazard values $\hat H(t)$ in Table 8.3, we construct the exponential hazard plot in Figure 3.5 by plotting each exact time $t$ against its corresponding $\hat H(t)$. The configuration appears to be reasonably linear, suggesting that the exponential distribution provides a reasonable fit. In Chapter 3 we see that the Weibull distribution gives a better fit than the exponential. We use the data here just to demonstrate how the parameter can be estimated. To find an estimate for the mean remission time of the leukemia patients, we can use $H(t) = 0.5$ since the time for which $H = 1$ is out of the range of the horizontal axis. At $H(t) = 0.5$, $t = 16.9$; from (8.3.2), an estimate of $\lambda$ is $0.5/16.9 = 0.0296$. Thus, an estimate of the mean remission time is $1/0.0296 \approx 34$ weeks.

Figure 8.9 Cumulative hazard functions of the Weibull distribution with $\gamma = 0.5, 1, 2, 4$.
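The four-step procedure of Example 8.6 and the exponential estimate of Example 8.7 can be sketched in Python. This is a minimal illustration under stated assumptions: the function name `cumulative_hazard` and the `(time, censored)` data layout are hypothetical, the observations are entered in the order listed in Table 8.3, and the least-squares line through the origin is an assumed substitute for the by-eye fit.

```python
# Ordered remission times (weeks) from Table 8.3; True marks a censored value.
DATA = [(6, False), (6, True), (6, False), (6, False), (7, False),
        (9, True), (10, False), (10, True), (11, True), (13, False),
        (16, False), (17, True), (19, True), (20, True), (22, False),
        (23, False), (25, True), (32, True), (32, True), (34, True),
        (35, True)]


def cumulative_hazard(ordered):
    """Steps 2-4: return {time: cumulative hazard} for the uncensored times.

    `ordered` must already be sorted as in Step 1. For tied uncensored
    times the value at the smallest K (the last one listed) is kept.
    """
    n = len(ordered)
    total = 0.0
    result = {}
    for index, (t, censored) in enumerate(ordered):
        k = n - index              # reverse-order number K
        if not censored:
            total += 1.0 / k       # hazard value 1/K
            result[t] = total      # a later tie overwrites the earlier value
    return result


H = cumulative_hazard(DATA)
for t, h in H.items():
    print(f"{t:>3} {h:.3f}")

# Assumed stand-in for the by-eye line: least squares through the origin
# of t on H, whose slope estimates the mean 1/lambda by (8.3.2).
mean_estimate = (sum(t * h for t, h in H.items())
                 / sum(h * h for h in H.values()))
print(f"estimated mean remission time: {mean_estimate:.1f} weeks")
```

The printed cumulative hazards should reproduce the last column of Table 8.3, and the fitted slope should fall close to the 34 weeks read from the plot at $H(t) = 0.5$.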
Weibull Distribution

The Weibull distribution has the hazard function

\[ h(t) = \lambda\gamma(\lambda t)^{\gamma - 1}, \qquad t > 0 \]

The cumulative hazard function is

\[ H(t) = (\lambda t)^{\gamma}, \qquad t > 0 \tag{8.3.3} \]

and is plotted in Figure 8.9 for four different values of $\gamma$: 0.5, 1, 2, and 4. From (8.3.3), the time $t$ can be written as a function of the cumulative hazard function, that is,

\[ t = \frac{1}{\lambda} [H(t)]^{1/\gamma} \tag{8.3.4} \]

Taking the logarithm of (8.3.4), we obtain

\[ \log t = \log \frac{1}{\lambda} + \frac{1}{\gamma} \log H(t) \tag{8.3.5} \]

Since $\log t$ is a linear function of $\log H(t)$, a plot of $\log t$ against $\log H(t)$ is a straight line. For $\log H(t) = 0$ or $H(t) = 1$, (8.3.5) reduces to $\log t = \log(1/\lambda)$, and thus the corresponding time $t$ equals $1/\lambda$. This fact is used to estimate $1/\lambda$ and, consequently, $\lambda$. The slope of the fitted straight line is $1/\gamma$, or at $\log H(t) = 1$, (8.3.5) can be written as $\gamma = 1/(\log t + \log \lambda)$. This equation can be used to estimate $\gamma$.

[...]

... 4.51 1.05 9.47 79.05 2.02 4.26 11.25 10.34 10.66 12.03 2.64 14.76 1.19 8.66 14.83 5.62 18.10 25.74 17.36 1.35 9.02 6.94 7.26 4.70+ 3.70 3.64 3.57 11.64 6.25 25.82 3.88 3.02+ 19.36+ 20.28 46.12 5.17 0.20 36.66 10.06 4.98 5.06 16.62 12.07 6.97 0.08 1.40 2.75 7.32 1.26 6.76 8.60+ 7.62 3.52 9.74 0.40 5.41 2.54 2.69 8.26 0.50 5.32 5.09 2.09 7.93 12.02 13.80 5.85 7.09 5.32 4.33+ 2.83 8.37 14.77 8.53 11.98 ...

... the data in Example 8.8.

Example 8.8 Consider the following survival times in months of 14 patients: 15, 25, 38, 40+, 50, 55, 65, 80+, 90, 140, 150+, 155, 250+, 252. Figure 8.10 is the hazard plot with $\log t$ versus $\log H(t)$ of the data. From the fitted line, at $\log H(t) = 0$, $\log t = 4.8$. Thus, $t = 121.5$ and the estimate of $\lambda$ is $\hat\lambda = 1/t = 0.0082$. Similarly, at $\log H(t) = 1$, $\log t = 5.6$, and thus $\hat\gamma = 1/(5.6$ ...

... MLE of $\lambda$ is $\hat\lambda = 8/(91 + 35) = 0.0635$ and $l(\hat\lambda) = 8(\log 0.0635) - 0.0635(91) - 0.0635(35) = -30.055$. Under $H_0$, $l(\lambda_0) = 8(\log 0.06) - 0.06(91) - 0.06(35) = -30.067$. Thus, following (9.4.1), $X_L = 2[-30.055 - (-30.067)] = 0.024 < \chi^2_{1,0.05} = 3.84$; therefore, we cannot
reject the null hypothesis that the data are from the exponential distribution with $\lambda = 0.06$.

2. Testing the hypothesis that the underlying ...

... for the exponential, Weibull, lognormal, and generalized gamma distributions. The results are given in Table 9.2. For example, the MLE of the parameter in the exponential distribution is 5.054 and the corresponding log-likelihood is -35.359, and the MLE of the two parameters in the Weibull distribution are 5.002 and 0.500 and the corresponding log-likelihood ...

... ($n$). For each candidate distribution, compute

\[ r = l(\hat b) - \frac{p}{2} \log n \tag{9.3.1} \]

Table 9.3 Remission Times (Months) of 137 Cancer Patients

4.50 19.13 14.24 7.87 5.49 2.02 9.22 3.82 26.31 4.65+ 2.62 0.90 21.73 0.87+ 0.51 3.36 43.01 0.81 3.36 1.46 24.80+ 10.86+ 17.14 15.96 7.28 4.33 22.69 2.46 3.48 4.23 6.54 8.65 5.41 2.23 4.34 32.15 4.87 5.71 7.59 3.02 4.51 ...

... 38 45 46 50 53 54 58 66 69 77 78 81 84 85 91 95 101 108 115 118 120 125 134 135. Which of the distributions discussed in this chapter provide a reasonable fit to the data? Estimate graphically the parameters of the distribution chosen.

8.5 In a clinical study, 28 patients with cancer of the head and neck did not respond to chemotherapy. Their survival ...

[Table 9.2, fragment; values in extraction order, column headings not recoverable]
... 0.500 0.561 0.527 0.332
4.739
5.054 5.054 5.002 4.765 4.495
... 91.088
$X_L$: 11.922(c) 19.762(d) 7.840(d) 2.326(d)
LL: -35.359 -35.359 -29.398 -26.641 -25.478 -26.867
p value: <0.001 <0.001 0.005 0.127
BIC: -30.268, -37.060 -32.800 -30.042 -30.580
AIC: -30.867, -37.359 -33.398 -30.641 -31.478
(a) LL, log-likelihood; $X_L$, likelihood ratio statistic; p value, $P(\chi^2 > X_L)$.
(b) ... $= -\log$ ... for the exponential and the ...

[Equations (9.1.7) and (9.1.8), fragment not recoverable from the extraction]

For a given significance level $\alpha$, $H_0$ is rejected if $X_L >$ [...], when the likelihood
ratio statistic is used; or if $X_W >$ [...] or $X_W <$ [...], when the Wald statistic is used. It ...

[Table 8.4, fragment]
... 0.367 0.333 0.267 0.267 0.233
0.000 0.034 0.069 0.105 0.143 0.182 0.223 0.266 0.310 0.405 0.405 0.457 0.511 0.568 0.629 0.693 0.762 0.836 0.916 1.003 1.099 1.322 1.322 1.455
(a) $r$, ordered Cox-Snell residuals from the fitted lognormal model.
(b) $\hat S(r)$, Kaplan-Meier estimate of survivorship function for the Cox-Snell residuals.

... lognormal model may be appropriate for the tumor-free times observed. In Chapter 9 (Example ...

... intercept. Therefore, a ...

Figure 8.13 Cox-Snell residual plot for the fitted lognormal model on the tumor-free time data for rats fed with saturated diets.

Table 8.4 Kaplan-Meier Estimate of Survivorship Function for the Cox-Snell Residuals from the Fitted Lognormal Model on Tumor-Free Time Data for Rats Fed with Saturated Diets

t: 43 46 56 58 68 75 79 81 ...

... given below.

30 53 77 91 118
38 54 78 95 120
45 58 81 101 125
46 66 84 108 134
50 69 85 115 135

Which of the distributions discussed in this chapter provide a reasonable fit to the data? Estimate ...