rather than a single measure of central tendency. The data set, which is of size $n$, is arranged in ascending order of age and points 1 to $kn$ are selected (where $k$ is a fraction, usually between 0.05 and 0.2). For these points the simple linear regression of the variable on age is determined and the residuals are calculated. The $m$ selected percentiles of these residuals are determined by sorting and counting, and the $m$ percentiles for the variable are found by adding the $m$ residual percentiles to the fitted value of the regression calculated at the median of the $kn$ ages. These percentiles are plotted against the median age and then the whole process is repeated using points 2 to $kn + 1$. The process continues until the whole range of ages is covered. This approach means that no percentiles are computed for the range of ages covered by the smallest $\tfrac{1}{2}kn$ points and the largest $\tfrac{1}{2}kn$ points.

Example 12.4

Ultrasound measurements of the kidneys of 560 newborn infants were obtained in a study reported by Scott et al. (1990). Measurements of the depth, length and area of each kidney were made and interest was focused on the range of kidney sizes, as certain pathologies are associated with larger kidneys. However, the assessment of what is a large kidney depends on the size of the baby and, to allow for this, birth-weight- and head-circumference-specific percentiles were derived using the HRY method. For each of depth and length, the percentiles were estimated for the maximum of the measurements on the right and left kidney.

Figure 12.6 shows the data on maximum kidney depth plotted against birth weight. Seven percentiles, namely the 3rd, 10th, 25th, 50th, 75th, 90th and 97th, are estimated. The unsmoothed percentiles are computed using $k = 0.2$: the results for the 3rd, 50th and 97th percentiles are illustrated.
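The sliding-window procedure described before Example 12.4 can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the choice of linear interpolation between ranks for the "sorting and counting" step, and the rounding of $kn$ to a whole number of points are not taken from the source.

```python
from statistics import median

def fit_line(xs, ys):
    """Least-squares intercept and slope of a simple linear regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my - slope * mx, slope

def empirical_percentile(sorted_vals, pct):
    """Percentile of sorted values (assumed rule: interpolate between ranks)."""
    pos = (len(sorted_vals) - 1) * pct / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def unsmoothed_percentiles(ages, values, k=0.2, pcts=(3, 50, 97)):
    """Return (median age, [percentiles]) for each window of about k*n points."""
    pairs = sorted(zip(ages, values))          # ascending order of age
    n = len(pairs)
    width = max(3, int(round(k * n)))          # assumed rounding of k*n
    results = []
    for start in range(0, n - width + 1):      # points 1..kn, then 2..kn+1, ...
        window = pairs[start:start + width]
        xs = [a for a, _ in window]
        ys = [v for _, v in window]
        b0, b1 = fit_line(xs, ys)
        residuals = sorted(y - (b0 + b1 * x) for x, y in zip(xs, ys))
        mid_age = median(xs)
        fitted = b0 + b1 * mid_age             # regression value at the median age
        results.append(
            (mid_age, [fitted + empirical_percentile(residuals, p) for p in pcts])
        )
    return results
```

Because each window reports its percentiles at the window's median age, no values are produced for the youngest and oldest half-windows, which is the gap at the ends of the age range noted above.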
The unsmoothed percentiles are smoothed using (12.17) and (12.18) with $p = 1$, $q_0 = 2$ and $q_1 = 1$: the fitted percentiles are given by

$$a_{0i} = 1.594 + 0.220 z_i + 0.013 z_i^2 \quad \text{and} \quad a_{1i} = 0.201 + 0.002 z_i.$$

These values are arrived at after extensive fitting of the data and assessment of the shape and goodness of fit of the resulting percentiles. The quadratic term of $a_{0i}$ provides evidence of slight non-normality. The linear term for $a_{1i}$ suggests that dispersion increases with birth weight but the size of this effect is modest.

Fig. 12.6 Kidney depth versus birth weight for 560 newborn infants (vertical axis: maximum kidney depth, cm; horizontal axis: birth weight, kg). Points indicate raw data, and the 3rd, 50th and 97th unsmoothed percentiles are shown. Solid lines show smoothed versions of these percentiles, dashed lines show the 10th and 90th percentiles and dotted lines the 25th and 75th percentiles.

Once the values of the $b$ coefficients in (12.18) have been found, percentile charts can readily be produced. Moreover, if a new subject presents with value $x$ for the variable at age $t$, then the Z-score or percentile can be found by solving a polynomial equation for $z$, namely,

$$x = \sum_{r=0}^{q_0} b_{0r} z^r + \left(\sum_{r=0}^{q_1} b_{1r} z^r\right) t + \cdots + \left(\sum_{r=0}^{q_p} b_{pr} z^r\right) t^p.$$

The HRY method does not rely on any distributional assumptions for its validity. The main weakness of the method is its reliance on polynomials for smoothing. Polynomials are used for two purposes in this method. The polynomial of order $p$ in $t$ (12.17), which describes the median ($z_i = 0$), is affected by the usual difficulties of polynomial smoothing which arise from the global nature of polynomial fits, described in §12.1. This is likely to be especially problematic if the variable changes markedly across the age range being analysed. Some of these difficulties can be overcome by the extension of the method proposed by Pan et al. (1990) and Goldstein and Pan (1992), which uses piecewise polynomial fits.
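The Z-score calculation for a new subject can be illustrated with the coefficients quoted in Example 12.4, where the polynomial in $z$ is a quadratic. The sketch below assumes that every term in the two fitted expressions enters with a positive sign (the operators are not legible in the source) and that the relevant root is the one on the branch where the percentile rises with $z$; the function names are illustrative.

```python
import math

def a0(z):
    """Intercept of the percentile line as a function of z (Example 12.4 values)."""
    return 1.594 + 0.220 * z + 0.013 * z ** 2   # signs assumed positive

def a1(z):
    """Slope on birth weight as a function of z (Example 12.4 values)."""
    return 0.201 + 0.002 * z                    # sign assumed positive

def depth_at(z, t):
    """Fitted kidney depth (cm) at Z-score z and birth weight t (kg)."""
    return a0(z) + a1(z) * t

def z_score(x, t):
    """Solve x = a0(z) + a1(z) * t for z; here a quadratic A z^2 + B z + C = 0."""
    A = 0.013
    B = 0.220 + 0.002 * t
    C = 1.594 + 0.201 * t - x
    disc = B * B - 4.0 * A * C
    if disc < 0:
        raise ValueError("x lies below the range of the fitted curves at this t")
    # Root on the branch where depth increases with z.
    return (-B + math.sqrt(disc)) / (2.0 * A)

def percentile_of(z):
    """Standard normal percentile corresponding to a Z-score."""
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

For example, `percentile_of(z_score(x, t))` converts an observed depth `x` at birth weight `t` into a percentile; with higher-order polynomials a numerical root finder would be needed in place of the quadratic formula.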
The second use of polynomials, in particular non-linear polynomials, is to accommodate non-normality in the distribution of the variable. This use of polynomials has some theoretical foundation: approximations to non-normal distribution functions can be made by substituting measures of skewness, kurtosis (degree of 'peakedness') and possibly higher moments in the Cornish–Fisher expansion (Cornish & Fisher, 1937; see also Barndorff-Nielsen & Cox, 1989, §4.4). However, it is probably unwise to expect severe non-normality, or at least non-normality which arises from values of moments higher than kurtosis, to be adequately accommodated by the method. In such cases it may be sensible to try to find a transformation to make the distribution of the data closer to normal before applying the HRY method.

Goodness of fit

At the beginning of this section the benefit of taking advantage of specific distributional forms in deriving percentiles was explained. The LMS method takes advantage of the normal distribution of $X^{L(T)}$, where $L(T)$ has been estimated from the data. However, in practice, variables do not arise with labels on them telling the analyst that they do or do not have a normal distribution. For some variables, such as height, the variable has been measured so often in the past and always found to be normally distributed that a prior assumption of normality may seem quite natural. In other cases, there may be clear evidence from simple plots of the data that the normal distribution does not apply.

If a method for determining percentiles is proposed that assumes the normality of the variable, then it is important that this assumption be examined. If the LMS or some similar method which estimates transformations that are purported to yield data with a normal distribution is to be used, then the normality of the transformed data must be examined.
It may be that the wrong $L(T)$ has been chosen or it may be that no Box–Cox transformation will yield a variable that has a normal distribution.

It could be argued that it is important to examine the assumption of a normal distribution that attends many statistical techniques, and this may be so. However, slight departures from the normal distribution in the distribution of, for example, residuals in an analysis of variance are unlikely to have a serious effect on the significance level of hypothesis tests or widths of confidence intervals. When the assumption underpins estimated percentiles, matters are rather different. In practice, it is the extreme percentiles, such as the 3rd, 95th or 0.5th percentile, that are of use to the clinician, and slight departures of the true distribution from the normal distribution, particularly in the tails of the distribution, can give rise to substantial errors. It is therefore of the utmost importance that any distributional assumptions made by the method are properly checked. This is a difficult matter which has been the subject of little research. Indeed, it may be almost impossible to detect important deviations of distributional form because they occur precisely in those parts of the distribution where there is least information.

Probably the best global test for normality is that due to Shapiro and Wilk (1965). This test computes an estimate of the standard deviation using a linear combination of the data that will be correct only if the data follow a normal distribution. The ratio of this to the usual estimate, which is always valid, is $W$, $0 < W < 1$, and is the basis of the test. Significance levels can be found using algorithms due to Royston (1992). However, even this test lacks power. Moreover, if it is the transformed data that are being assessed, and the test is applied to the same data that were used to determine the transformation, then the test will be conservative.
A useful graphical approach to the assessment of the goodness of fit of the percentiles is to use the fitted percentiles to compute the Z-score for each point in the data set. If the percentiles are a good fit, then the Z-scores will all share the standard normal distribution. If these scores and their associated ages are subjected to the first part of the HRY method, then the unsmoothed percentiles should show no trend with age and should be centred around the values of the percentiles, which the HRY method requires. This is a most useful tool for a visual assessment of the adequacy of percentile charts, but its detailed properties do not seem to have been considered. It can be applied to any method which determines percentiles and which allows the computation of Z-scores, not just methods which assume normality. In particular, it can be used to assess percentiles determined by the HRY method itself.

A useful comparison of several methods for deriving age-related reference ranges can be found in Wright and Royston (1997).

12.4 Non-linear regression

The regression of a variable $y$ on a set of covariates $x_1, x_2, \ldots, x_p$, $E(y \mid x_1, x_2, \ldots, x_p)$, can be determined using the multiple regression techniques of §11.6, provided that

$$E(y \mid x_1, x_2, \ldots, x_p) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p. \tag{12.19}$$

This equation not only encompasses a straight-line relationship with each of several variables, but, as was seen in §12.1, it allows certain types of curves to be fitted to data. If $y$ represents the vector of $n$ observations on $y$ and $X$ is the $n \times (p + 1)$ matrix with element $(i, j)$ being the observation of $x_{j-1}$ on unit $i$ (to accommodate the term in $\beta_0$, the first element of each row of $X$ is 1), then (12.19) can be written $E(y) = X\beta$, where $\beta$ is the vector of $\beta_i$s. The estimate of $\beta$ can be written as

$$(X^T X)^{-1} X^T y, \tag{12.20}$$

and its dispersion matrix is

$$\sigma^2 (X^T X)^{-1}, \tag{12.21}$$

where $\sigma^2$ is the variance of $y$ about (12.19).
These results depend on the linearity of (12.19) in the parameters, not the covariates, and it is this feature which allows curves to be fitted by this method. As pointed out in §12.1, polynomial curves in $x_1$, say, can be fitted by identifying $x_2$, $x_3$, etc., with powers of $x_1$ and the linearity in the $\beta$s is undisturbed. The same applies to fractional polynomials and, although this is less easy to see, it also applies to many of the smoothing methods of §12.2.

However, for many curves that a statistician may wish to fit, the parameters do not enter linearly; an example would be an exponential growth curve:

$$E(y \mid x) = \beta_0 (1 - e^{-\beta_1 x}).$$

Suppose $n$ pairs of observations $(y_i, x_i)$ are thought to follow this model, with $y_i = E(y_i \mid x_i) + \varepsilon_i$, where the $\varepsilon_i$s are independent, normally distributed residuals with common variance $\sigma^2$. Estimates of $\beta_0$, $\beta_1$ can be found using the same method of least squares used for (12.19), that is, $\beta_0$, $\beta_1$ are chosen to minimize

$$\sum_{i=1}^{n} \left[ y_i - \beta_0 (1 - e^{-\beta_1 x_i}) \right]^2. \tag{12.22}$$

Unlike the situation with (12.19), the equations which determine the minimizing values of these parameters cannot be solved explicitly as in (12.20), and numerical methods need to be used.

A more general non-linear regression can be written as

$$y_i = f(x_i, \beta) + \varepsilon_i, \tag{12.23}$$

where $f$ is some general function, usually assumed differentiable, $\beta$ is a vector of $p$ parameters and $x_i$ is the covariate for the $i$th case. In general, $x_i$ can be a vector but in much of the present discussion it will be a scalar, often representing time or dosage. The error terms $\varepsilon_i$ will be taken as independently and normally distributed, with zero mean and common variance $\sigma^2$, but other forms of error are encountered.
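To make the need for numerical methods concrete, the sketch below minimizes the sum of squares (12.22) for the exponential growth curve. It is one possible approach, not the method of the text: because the scale parameter enters linearly, it is eliminated in closed form for each trial value of the rate parameter, and the remaining one-dimensional problem is solved by a golden-section search. The function names, the search interval and the assumption that the profiled sum of squares has a single minimum in that interval are all illustrative.

```python
import math

def profile_sse(b1, xs, ys):
    """Sum of squares (12.22) at rate b1, with the best b0 for that b1.

    For fixed b1 the model b0 * (1 - exp(-b1 * x)) is linear in b0,
    so b0 has the usual least-squares closed form.
    """
    g = [1.0 - math.exp(-b1 * x) for x in xs]
    b0 = sum(v * y for v, y in zip(g, ys)) / sum(v * v for v in g)
    sse = sum((y - b0 * v) ** 2 for v, y in zip(g, ys))
    return sse, b0

def fit_exponential_growth(xs, ys, lo=1e-3, hi=5.0, tol=1e-9):
    """Golden-section search over b1 in [lo, hi] (assumed to bracket one minimum)."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if profile_sse(c, xs, ys)[0] < profile_sse(d, xs, ys)[0]:
            b = d          # minimum lies in [a, d]
        else:
            a = c          # minimum lies in [c, b]
    b1 = (a + b) / 2.0
    sse, b0 = profile_sse(b1, xs, ys)
    return b0, b1, sse
```

With more than one non-linear parameter a general-purpose optimizer from a statistical package would normally be preferred to a hand-written search of this kind.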
It turns out that some of the properties of the estimates in (12.23) can be written in a form very similar to (12.20) and (12.21), namely,

$$\hat{\beta} = (F^T F)^{-1} F^T y, \tag{12.24}$$

$$\operatorname{var}(\hat{\beta}) = \sigma^2 (F^T F)^{-1}, \tag{12.25}$$

$$E(\hat{\beta}) = \beta, \tag{12.26}$$

where the $(i, j)$th element of $F$ is $\partial f(x_i, \beta) / \partial \beta_j$ evaluated at $\hat{\beta}$. The similarity stems in part from a first-order Taylor expansion of (12.23) and, as such, the results in (12.24) and (12.25) are approximations, but for large samples the approximations are good and they are often adequate for smaller samples. Indeed, $X$ from (12.20) and (12.21) is $F$ if $f(x_i, \beta) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}$, so (12.24) and (12.25) are exact in the linear case and it is not unreasonable to suppose that the closeness of the above approximations depends on the departure of $f(x, \beta)$ from linearity.

Much work has been done on the issue of the degree of non-linearity or curvature in non-linear regression; important contributions include Beale (1960) and Bates and Watts (1980). Important aspects of non-linearity can be described by a measure of curvature introduced by the latter authors. The measurement of curvature involves considering what happens to $f(x, \beta)$ as $\beta$ moves through its allowed values. The measure has two components, namely the intrinsic curvature and the parameter-effects curvature. These two components are illustrated by the model $f(x, \beta) = 1 - e^{-\beta x}$. The same response curve results from $g(x, \gamma) = 1 - \gamma^x$ if $\gamma = e^{-\beta}$, so in a clear sense these response curves have the same curvature. However, when fitting this curve to data, the statistician searches through the parameter space to find a best-fitting value and it is clear, with one parameter logarithmically related to the other, that the rate at which fitted values change throughout this search will be highly dependent on the parameterization adopted. Thus, while the intrinsic curvatures are similar, their parameter-effects values are quite different.
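The two parameterizations in this illustration can be compared numerically. The sketch below, with illustrative function names, checks that the two forms trace the same response curve when the second parameter is the exponential of minus the first, and shows that the derivative of the fitted value with respect to the parameter (the quantity that fills the matrix $F$) differs between them by the chain-rule factor.

```python
import math

def f(x, beta):
    """Response curve in the beta parameterization: 1 - exp(-beta * x)."""
    return 1.0 - math.exp(-beta * x)

def g(x, gamma):
    """Same response curve in the gamma parameterization: 1 - gamma ** x."""
    return 1.0 - gamma ** x

def df_dbeta(x, beta):
    """Sensitivity of the fitted value to beta."""
    return x * math.exp(-beta * x)

def dg_dgamma(x, gamma):
    """Sensitivity of the fitted value to gamma."""
    return -x * gamma ** (x - 1.0)

beta = 0.7
gamma = math.exp(-beta)
for x in (0.5, 2.0, 5.0):
    same_curve = abs(f(x, beta) - g(x, gamma)) < 1e-12
    print(x, same_curve, df_dbeta(x, beta), dg_dgamma(x, gamma))
```

The fitted values agree at every `x`, but a unit step in one parameter moves them by a different amount, and in the opposite direction, from a unit step in the other, which is why a numerical search behaves differently under the two parameterizations.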
Indeed, many features of a non-linear regression depend importantly on the parameterization adopted. Difficulties in applying numerical methods to find $\hat{\beta}$ can often be alleviated by altering the parameterization used. Also, the shape of likelihood surfaces and the performance of likelihood ratio tests can be sensitive to the parameterization. More information on these matters can be found in the encyclopaedic work by Seber and Wild (1989).

The foregoing discussion has assumed that a value $\hat{\beta}$ which minimizes (12.22) is available. As pointed out previously, there is no general closed-form solution and numerical methods must be used. Equation (12.24) suggests an iterative approach, with $F$ evaluated at the current estimate being applied as in this equation to give a new estimate of $\beta$. Indeed, this approach is closely related to the Gauss–Newton method for obtaining numerical estimates. However, the numerical analysis of non-linear regression can be surprisingly awkward, with various subtle problems giving rise to unstable solutions. The simple Gauss–Newton method is not to be recommended and more sophisticated methods are widely available in the form of suites of programs or, more conveniently, in major statistical packages and these should be the first choice. Much more on this problem can be found in Chapters 3, 13 and 14 of Seber and Wild (1989).

Uses of non-linear regression

The methods described in Chapter 7 and §§12.1–12.2 are used, in a descriptive way, to assess associations between variables and summarize relationships between variables. Non-linear methods can be used in this context, but they can often help in circumstances where the analysis is intended to shed light on deeper aspects of the problem being studied.
The linear methods, including those in §12.2, can describe a wide range of shapes of curve and, if they are adequate for the task at hand, the statistician is likely to use these rather than non-linear methods, with their awkward technicalities and the approximate nature of the attendant statistical results. Moreover, the selection of the degree of a conventional polynomial requires some care, and a good deal more is required when working with fractional polynomials. When the analyst has the whole range of differentiable functions to choose from, the issue of model selection becomes even more difficult. If the only guidance is the fit to the data, then considerable ingenuity can be applied to finding functions which fit the particular sample well. However, the performance of such models on future data sets may well be questionable.

One instance where a non-linear curve may be required is when the methods of §12.1 are inadequate. The methods of §12.2 may then be adequate, but there may be reasons why it is desirable to be able to describe the fitted curve with a succinct equation.

However, there are applications where there is guidance on the form of $f(x, \beta)$. The guidance can range from complete specification to rather general aspects, such as the curve tending to an asymptote.

The Michaelis–Menten equation of enzyme kinetics describes the velocity, $v$, at which a reaction proceeds in the presence of an enzyme when the substrate has concentration $s$. Study of the mechanisms underlying this kind of reaction indicates that

$$v = \frac{V_{\max}\, s}{K + s}. \tag{12.27}$$

The parameters $V_{\max}$ and $K$ have meaning in terms of the reaction: the former is the maximum rate at which the reaction proceeds, which occurs when the system is saturated with substrate (i.e. $s \to \infty$), and $K$ measures how quickly the system gets to that rate, being the substrate concentration at which $v = \tfrac{1}{2} V_{\max}$. Taking $y = v$, $x = s$ and $\beta = (V_{\max}, K)$, this equation is of the form (12.23).
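The interpretation of the two Michaelis–Menten parameters can be checked directly from (12.27). The sketch below uses illustrative function names and made-up parameter values; the second function gives the partial derivatives of the velocity with respect to the two parameters, which are the entries a fitting routine would place in a row of the matrix $F$ of (12.24).

```python
def michaelis_menten(s, v_max, k):
    """Reaction velocity at substrate concentration s, equation (12.27)."""
    return v_max * s / (k + s)

def mm_gradient(s, v_max, k):
    """Partial derivatives of the velocity with respect to (v_max, k)."""
    d_vmax = s / (k + s)
    d_k = -v_max * s / (k + s) ** 2
    return d_vmax, d_k

# Hypothetical parameter values, chosen only for illustration.
V_MAX, K = 2.0, 0.5
print(michaelis_menten(K, V_MAX, K))      # half the maximum rate, reached at s = K
print(michaelis_menten(1e9, V_MAX, K))    # close to V_MAX when substrate saturates
```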
Quite often a biological system can plausibly be modelled by a system of differential equations. A common instance of this is the use of compartmental models. A widespread application of this methodology occurs in pharmacokinetics, in which the passage of a drug through a patient can be modelled by assuming the patient comprises several interconnecting compartments. Figure 12.7 gives an example of this with two compartments. The two compartments are associated with the tissues of the body and the central compartment includes features such as the circulation. When a bolus dose of a drug is administered, it first goes into the central compartment and from there transfers into the tissues. Once in the tissues, it transfers back to the central compartment. From there it can either be excreted or go back to the tissues. Each compartment is ascribed a notional volume, $V_C$ and $V_T$, and the drug concentration in each is $X_C$ and $X_T$. The system can be modelled by two linear differential equations. These can be solved to give, for example,

$$X_C = A e^{-\alpha t} + B e^{-\beta t}, \tag{12.28}$$

where $t$ is the time since the injection of the bolus and $A$, $B$, $\alpha$ and $\beta$ can be expressed in terms of the underlying parameters given in Fig. 12.7. Once (12.28) has been identified it can be used as $f(x, \beta)$ in (12.23). This form of equation has been developed greatly in the area of population pharmacokinetics and elsewhere; see Davidian and Giltinan (1995).

Fig. 12.7 Schematic representation of a two-compartment model: the dose $D$ enters the central compartment (concentration $X_C$, volume $V_C$), which exchanges with the tissue compartment (concentration $X_T$, volume $V_T$) through $k_{CT}$ and $k_{TC}$ and from which the drug is excreted through $k_E$. The $k$ parameters are the constants governing the rate of transfer between the compartments or excreted from the body.
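The bi-exponential solution (12.28) is easy to evaluate, and doing so shows how quickly a term with a large rate constant fades from the observed concentration. The sketch below is illustrative only: the function names and the numerical values of the coefficients and rate constants are hypothetical and are not taken from any fitted model.

```python
import math

def central_concentration(t, A, B, alpha, beta):
    """Central-compartment concentration at time t, equation (12.28)."""
    return A * math.exp(-alpha * t) + B * math.exp(-beta * t)

def fast_term_share(t, A, B, alpha, beta):
    """Fraction of the concentration at time t contributed by the alpha term."""
    fast = A * math.exp(-alpha * t)
    return fast / central_concentration(t, A, B, alpha, beta)

# Hypothetical values: a fast phase (alpha = 3) and a slow phase (beta = 0.2).
A, B, ALPHA, BETA = 8.0, 2.0, 3.0, 0.2
for t in (0.0, 0.5, 1.0, 3.0):
    print(t, round(fast_term_share(t, A, B, ALPHA, BETA), 4))
```

With these values the fast term supplies 80% of the concentration at the moment of injection but well under 1% three time units later, so observations taken after that point carry almost no information about its parameters.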
A minor point which should not be overlooked when designing experiments that are to be analysed with this form of equation is that if one of $\alpha$ or $\beta$ is large, then the only information on the corresponding term of (12.28) will reside in observations taken very soon after the administration of the drug. If few or no observations are made or are possible in this interval, then it may be impossible to obtain satisfactory estimates for the parameters.

Analyses based on compartmental models are widely and successfully used and their basis in a differential equation gives them a satisfying underpinning in non-statistical theory. However, while they may be more appealing than simple empirical models with no such basis, the model indicated in Fig. 12.7 is a gross simplification and should not be interpreted too literally. It is not uncommon for a compartmental volume to be estimated to exceed greatly the volume of the whole patient.

Models which arise from differential equations are common in the modelling of growth of many kinds, from the growth of cells to that of individuals or populations. Appleton (1995) gives an interesting illustration of the application of a model based on a system of differential equations to mandibular growth in utero. This author also provides an interesting discussion of the nature of statistical modelling. Curves that can arise from differential equations include:

Mono-exponential: $f(x, \beta) = -\delta e^{-kx}$
Four-parameter logistic: $f(x, \beta) = A + \dfrac{B}{1 + e^{-\lambda - kx}}$
Gompertz: $f(x, \beta) = \alpha \exp\{-e^{-k(x - \gamma)}\}$

While these, and many other curves, can be thought of as solutions to differential equations, they may prove useful as empirical curves, with little or no guidance on the choice of equation available from considerations of the underlying subject-matter.

Sometimes the guidance for choosing a curve is very general and does not wholly specify the form of the curve. Matthews et al.
(1999) fit the pair of curves:

$$f(x, \beta) = A(1 - e^{-k_a x}), \qquad f(x, \beta) = A(1 - e^{-k_v x}), \tag{12.29}$$

where the first curve is fitted to the arterial data and the second to the venous data from the Kety–Schmidt technique for the measurement of cerebral blood flow. The Kety–Schmidt technique required a pair of curves starting from 0 and rising to a common asymptote. The method did not indicate the form of curves in any more detail. The above choice was based on nothing more substantial than the appearance of exponential functions in a number of related studies of gaseous diffusion in cerebral tissue and the fact that these curves gave a good fit in a large number of applications of the method.

A further instance of this use of non-linear functions comes from the nerve conduction data analysed in Example 12.2. The data seem to indicate that the conduction velocity increases through childhood, tending to a limit in adulthood. There was some interest in estimating the mean adult conduction velocity. Figure 12.8 shows the data, with variable $t$ on the horizontal axis being the age since conception, together with two plausible models, both of which model the mean conduction velocity as an increasing function of $t$ tending to a limit $A$, which is the parameter of interest. The first curve is the exponential $A(1 - e^{-\beta t})$ and the second is the hyperbola $A\{1 - \beta/(\beta + t)\}$. Both of these are zero at $t = 0$ and tend to $A$ as $t \to \infty$.

The fit of both models can be criticized but, nevertheless, one feature of the models deserves careful attention. The estimate of $A$ and its standard error are 22.00 (0.16) for the exponential and 24.53 (0.16) for the hyperbola, giving approximate 95% confidence intervals of (21.69, 22.31) and (24.22, 24.84). Both models give plausible but entirely inconsistent estimates for $A$. The standard errors measure the sampling variation in the parameter estimate, given that the model is correct. Uncertainty in the specification of the model has not been incorporated into the standard error.
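The arithmetic behind the two intervals quoted for the asymptote is the usual estimate plus or minus 1.96 standard errors. The sketch below reproduces it and checks that the intervals do not overlap; the function and variable names are illustrative, and the two curve functions are included only to show that both forms start at zero and rise towards the same limit parameter.

```python
import math

def exponential_curve(t, A, beta):
    """Exponential model: A * (1 - exp(-beta * t))."""
    return A * (1.0 - math.exp(-beta * t))

def hyperbola_curve(t, A, beta):
    """Hyperbolic model: A * (1 - beta / (beta + t))."""
    return A * (1.0 - beta / (beta + t))

def approx_ci(estimate, se, z=1.96):
    """Approximate 95% confidence interval: estimate +/- z * se."""
    return estimate - z * se, estimate + z * se

exp_ci = approx_ci(22.00, 0.16)   # estimate and standard error quoted in the text
hyp_ci = approx_ci(24.53, 0.16)
intervals_overlap = exp_ci[1] >= hyp_ci[0]

print([round(v, 2) for v in exp_ci])
print([round(v, 2) for v in hyp_ci])
print(intervals_overlap)
```

The gap between the upper limit of the first interval and the lower limit of the second is nearly two units, or about twelve standard errors, which gives a sense of how far model choice can move an estimate relative to its reported precision.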
Situations, such as the use of (12.27) for the Michaelis–Menten equation, where the prior guidance is sufficiently strong to dictate a specific model are the exception rather than the rule. Once some element of choice enters then the statistician must be aware that, while some parameters may appear to have an interpretation that is independent of the model, such as mean adult velocity, the estimates and standard errors of such parameters are certainly not model-independent. In these circumstances the statistician would be unwise to rely on an estimate from a single model.

Fig. 12.8 Fits of mono-exponential (– – –) and hyperbola (- - -) to nerve conduction velocity data of Example 12.2 (vertical axis: conduction velocity, m/s; horizontal axis: age, years since conception).

Methods for accommodating model uncertainty, so that the above standard errors might be appropriately inflated, have been developed, but it remains a matter of debate whether they offer a remedy that is more appropriate and helpful than simply presenting results from a range of plausible models. Details on these methods can be found in Chatfield (1995) and Draper (1995).

Model dependence and model uncertainty are issues which clearly have significance across statistical modelling and are not restricted to non-linear models. However, the rarity of circumstances which, a priori, prescribe a particular model means that the statistician is wise to be especially aware of its effects in non-linear modelling. Models are often fitted when the intention of an analysis is to determine a quantity such as a blood flow rate or a cell birth rate and it is hoped that the value estimated does not rest heavily on an arbitrarily chosen model that has been fitted.

[...]
... time intervals then $\beta = 360/k$ (degrees). Even though this deals with one non-linear parameter, the resulting equation is still non-linear because $\gamma$ does not appear linearly. However, expanding the sine function gives the alternative formula $E(y \mid x) = \alpha_0 + \zeta_1 \sin \beta x + \zeta_2 \cos \beta x$, where $\zeta_1 = \alpha_1 \cos \gamma$ and $\zeta_2 = \alpha_1 \sin \gamma$. This equation is linear in these parameters and the regression can be fitted by recalling ...

... gives rise to complexities in both their estimation and their definition. There is considerable merit in viewing the process as estimating random effects rather than as an exercise in extending the definition of a residual in non-hierarchical models. Indeed, there is much relevant background material in the article by Robinson (1991) on estimating random effects. As a brief and incomplete illustration of ...

... because individuals are measured longitudinally, but adequate modelling of the form of the response usually requires non-linear functions. Non-linear models can be accommodated by repeatedly using Taylor expansions to linearize the model. There are close connections between this way of extending linear multilevel models and the types of model obtained by extending non-linear ...

... idea behind the important approach to the analysis of longitudinal data that is outlined in the next subsection.

Summary measures

Perhaps the principal difficulty in analysing longitudinal data is coping with the dependency that is likely to exist between responses on the same individual. However, there is no more difficulty in assuming that responses from different individuals are independent than in other ...

... Goldstein (1995, Chapter 4).

12.6 Longitudinal data

In many medical studies there is interest not only in observing a variable at a given instant but in seeing how it changes over time. This could be because the investigator wishes to observe how a variable evolves over time, such as the height of a growing child, or to observe the natural variation that occurs in a clinical measurement, such as the ...

... that aspect of the response is of interest. For example, in the study of the profile of the blood level of a short-acting drug, measurements may be made every 10 or 15 min in the initial stages when the profile is changing rapidly, but then less frequently, perhaps at 1, 2 and 3 h post-administration. In many studies in the medical literature the reasons behind the timing of observations are seldom discussed, ...

... much more likely to be highly model-dependent. Such an example, concerning the rate of growth of tumours, is given by Gratton et al. (1978).

Linearizing non-linear equations

Attempts to exploit any resemblance a non-linear equation has to a linear counterpart can often prove profitable but must be made with care. Many non-linear equations include parameters that appear linearly. An example is equation (12.29) ...

... concerns to this arise in the analysis of longitudinal data and are discussed at more length in the next section and also in clinical trials (see §18.6)). It can also be useful to arrange to collect data in a way that deliberately leads to the partial observation of some or all vectors. If the creatinine, sodium and albumin of the foregoing example are to be observed on premature infants then it may not ...

... medicine and can be addressed by a wide variety of methods in addition to those provided by multilevel models. Consequently, discussion of this kind of data is deferred until §§12.6 and 12.7. However, it should be borne in mind that the methods described in this section can often be used fruitfully in the study of longitudinal data.

Random effects and building multilevel models

Rather than attempting to ...

... diffuse priors. Estimation using Markov chain Monte Carlo (MCMC) methods will then provide estimates of error that take account of the uncertainty in the parameter estimates. For a fuller discussion of the application of MCMC methods to multilevel models, see Appendix 2.4 of Goldstein (1995) and Goldstein et al. (1998). The use of MCMC methods in Bayesian methodology is discussed in §16.4. More generally, this ...

... attempt to use methods for linear regressions in a non-linear problem is to attempt to apply a transformation that makes the problem accessible to linear methods. ...