Table 20.2 Properties of (a) direct variance estimates, (b) direct covariance estimates, (c) direct variance estimates and (d) direct regression estimates (×1000) over 100 replications.

(a) LLTI; $\Sigma^{(1)}_{yy} = 78.6$, $\Sigma^{(2)}_{yy} = 0.979$, $\Sigma_{yy} = 79.6$

                              $S^{(1)}_{yy}$         $S^{(2)}_{yy}$
Data                          Mean     SD        Mean     SD
$m_0 = 371$, $n_0 = 18 880$   79.6     0.834     128      9.365
$m_0 = 371$, $n_0 = 3776$     79.6     1.845     88.5     6.821
$m_0 = 371$, $n_0 = 378$      79.3     5.780
$m_0 = 371$, $n_0 = 38$       77.8     18.673

(b) LLTI/CARO; $\Sigma^{(1)}_{yx} = 22.8$, $\Sigma^{(2)}_{yx} = 1.84$, $\Sigma_{yx} = 24.7$

                              $S^{(1)}_{yx}$         $S^{(2)}_{yx}$
Data                          Mean     SD        Mean     SD
$m_0 = 371$, $n_0 = 18 880$   24.7     0.645     115      14.227
$m_0 = 371$, $n_0 = 3776$     24.5     1.480     41.1     6.626
$m_0 = 371$, $n_0 = 378$      24.8     4.856
$m_0 = 371$, $n_0 = 38$       23.2     14.174

(c) CARO; $\Sigma^{(1)}_{xx} = 90.5$, $\Sigma^{(2)}_{xx} = 6.09$, $\Sigma_{xx} = 96.6$

                              $S^{(1)}_{xx}$         $S^{(2)}_{xx}$
Data                          Mean     SD        Mean     SD
$m_0 = 371$, $n_0 = 18 880$   96.6     1.167     399      32.554
$m_0 = 371$, $n_0 = 3776$     96.5     2.264     152      12.511
$m_0 = 371$, $n_0 = 378$      96.6     7.546
$m_0 = 371$, $n_0 = 38$       92.0     22.430

(d) LLTI vs. CARO; $\beta^{(1)}_{yx} = 252$, $\beta^{(2)}_{yx} = 302$, $\beta_{yx} = 255$

                              $B^{(1)}_{yx}$         $B^{(2)}_{yx}$
Data                          Mean     SD        Mean     SD
$m_0 = 371$, $n_0 = 18 880$   255      6.155     288      26.616
$m_0 = 371$, $n_0 = 3776$     254      14.766    270      36.597
$m_0 = 371$, $n_0 = 378$      256      45.800
$m_0 = 371$, $n_0 = 38$       258      155.517

SIMULATION STUDIES 337

20.5. USING AUXILIARY VARIABLES TO REDUCE AGGREGATION EFFECTS

In Section 20.3 it was shown how individual-level data on $y$ and $x$ can be used in combination with aggregate data. In some cases individual-level data may not be available for $y$ and $x$, but may be available for some auxiliary variables. Steel and Holt (1996a) suggested introducing extra variables to account for the correlations within areas. Suppose that there is a set of auxiliary variables, $z$, that partially characterize the way in which individuals are clustered within the groups and, conditional on $z$, the observations for individuals in area $g$ are influenced by random group-level effects. The auxiliary variables in $z$ may only have a small effect on the individual-level relationships and may not be of any direct interest.
The auxiliary variables are only included as they may be used in the sampling process or to help account for group effects and we assume that there is no interest in the influence of these variables in their own right. Hence the analysis should focus on relationships averaging out the effects of auxiliary variables. However, because of their strong homogeneity within areas they may affect the ecological analysis greatly. The matrices $z_U = [z_1, \ldots, z_N]'$ and $c_U = [c_1, \ldots, c_N]'$ give the values of all units in the population. Both the explanatory and auxiliary variables may contain group-level variables, although there will be identification problems if the mean of an individual-level explanatory variable is used as a group-level variable. This leads to:

Case (6) Data available: $d_1$ and $\{z_t, t \in s_0\}$, aggregate data and individual-level data for the auxiliary variables.

This case could arise, for example, when we have individual-level data on basic demographic variables from a survey and we have information in aggregate form for geographic areas on health or income obtained from the health or tax systems. The survey data may be a sample released from the census, such as the UK Sample of Anonymized Records (SAR).

Steel and Holt (1996a) considered a multi-level model with auxiliary variables and examined its implications for the ecological analysis of covariance matrices and correlation coefficients obtained from them. They also developed a method for adjusting the analysis of aggregate data to provide less biased estimates of covariance matrices and correlation coefficients. Steel, Holt and Tranmer (1996) evaluated this method and were able to reduce the biases by about 70% by using limited amounts of individual-level data for a small set of variables that help characterize the differences between groups. Here we consider the implications of this model for ecological linear regression analysis.
The model given in (20.1) to (20.2) is expanded to include $z$ by assuming the following model conditional on $z_U$ and the groups used:

$$w_t = \mu_{w|z} + \beta'_{wz} z_t + \nu_g + \varepsilon_t, \quad t \in g \tag{20.14}$$

where

$$\mathrm{var}(\nu_g \mid z_U, c_U) = \Sigma^{(2)}_{|z} \quad \text{and} \quad \mathrm{var}(\varepsilon_t \mid z_U, c_U) = \Sigma^{(1)}_{|z}. \tag{20.15}$$

338 ANALYSIS OF SURVEY AND GEOGRAPHICALLY AGGREGATED DATA

This model implies

$$E(w_t \mid z_U, c_U) = \mu_{w|z} + \beta'_{wz} z_t, \tag{20.16}$$

$$\mathrm{var}(w_t \mid z_U, c_U) = \Sigma^{(1)}_{|z} + \Sigma^{(2)}_{|z} = \Sigma_{|z} \tag{20.17}$$

and

$$\mathrm{Cov}(w_t, w_u \mid z_U, c_U) = \Sigma^{(2)}_{|z} \quad \text{if } c_t = c_u,\ t \neq u.$$

The random effects in (20.14) are different from those in (20.1) and (20.2) and reflect the within-group correlations after conditioning on the auxiliary variables. The matrix $\Sigma^{(2)}_{|z}$ has components $\Sigma^{(2)}_{xx|z}$, $\Sigma^{(2)}_{xy|z}$ and $\Sigma^{(2)}_{yy|z}$, and $\beta'_{wz} = (\beta_{xz}, \beta_{yz})'$. Assuming $\mathrm{var}(z_t) = \Sigma_{zz}$, the marginal covariance matrix is

$$\Sigma = \Sigma_{|z} + \beta'_{wz} \Sigma_{zz} \beta_{wz} \tag{20.18}$$

which has components $\Sigma_{xx}$, $\Sigma_{xy}$ and $\Sigma_{yy}$. We assume that the target of inference is $\beta_{yx} = \Sigma^{-1}_{xx} \Sigma_{xy}$, although this approach can be used to estimate regression coefficients at each level.

Under this model, Steel and Holt (1996a) showed

$$E[S^{(2)}_1 \mid z_U, c_U] = \Sigma + \beta'_{wz} \left(S^{(2)}_{1zz} - \Sigma_{zz}\right) \beta_{wz} + (\bar{n}^*_1 - 1)\, \Sigma^{(2)}_{|z}. \tag{20.19}$$

Providing that the variance of $S^{(2)}_1$ is $O(m_1^{-1})$ (see (20.9)), the expectation of the ecological regression coefficients, $B^{(2)}_{1yx} = \left(S^{(2)}_{1xx}\right)^{-1} S^{(2)}_{1xy}$, can be obtained to $O(m_1^{-1})$ by replacing $S^{(2)}_{1xx}$ and $S^{(2)}_{1xy}$ by their expectations.

20.5.1. Adjusted aggregate regression

If individual-level data on the auxiliary variables are available, the aggregation bias due to them may be estimated. Under (20.14), $E[B^{(2)}_{1wz} \mid z_U, c_U] = \beta_{wz}$, where $B^{(2)}_{1wz} = \left(S^{(2)}_{1zz}\right)^{-1} S^{(2)}_{1zw}$.
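The unbiasedness of the group-level coefficients on the auxiliary variables can be seen in a few lines. The sketch below is not taken from the source: it assumes that $S^{(2)}_1$ is a weighted covariance matrix of the group means, with hypothetical weights $a_g$ that depend only on the group structure $c_U$, since the exact definition of $S^{(2)}_1$ lies outside this section.

```latex
% Assumption (not from the source): S^{(2)}_1 = \sum_g a_g (\bar{v}_g - \bar{v})(\bar{v}_g - \bar{v})',
% where \bar{v}_g stacks the group means (\bar{w}_g, \bar{z}_g) and the weights a_g depend only on c_U.
\begin{align*}
\bar{w}_g &= \mu_{w|z} + \beta'_{wz}\bar{z}_g + \nu_g + \bar{\varepsilon}_g,
  \qquad E[\nu_g + \bar{\varepsilon}_g \mid z_U, c_U] = 0
  \quad \text{(from (20.14) and (20.16))}, \\
E[S^{(2)}_{1zw} \mid z_U, c_U]
  &= \sum_g a_g (\bar{z}_g - \bar{z})\, E[(\bar{w}_g - \bar{w})' \mid z_U, c_U]
   = \sum_g a_g (\bar{z}_g - \bar{z})(\bar{z}_g - \bar{z})' \beta_{wz}
   = S^{(2)}_{1zz}\, \beta_{wz}, \\
E[B^{(2)}_{1wz} \mid z_U, c_U]
  &= \left(S^{(2)}_{1zz}\right)^{-1} E[S^{(2)}_{1zw} \mid z_U, c_U] = \beta_{wz}.
\end{align*}
```

The random effects drop out because they have zero mean conditional on $z_U$ and $c_U$; they affect the variance of $B^{(2)}_{1wz}$ but not its expectation.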
If an estimate of the individual-level population covariance matrix for $z_U$ was available, possibly from another source such as $s_0$, Steel and Holt (1996) proposed the following adjusted estimator of $\Sigma$:

$$\hat{\Sigma}_6 = S^{(2)}_1 + B^{(2)\prime}_{1wz} \left(\hat{\Sigma}_{zz} - S^{(2)}_{1zz}\right) B^{(2)}_{1wz} = S^{(2)}_{1|z} + B^{(2)\prime}_{1wz} \hat{\Sigma}_{zz} B^{(2)}_{1wz}$$

where $\hat{\Sigma}_{zz}$ is the estimate of $\Sigma_{zz}$ calculated from individual-level data. This estimator corresponds to a Pearson-type adjustment, which has been proposed as a means of adjusting for the effect of sampling schemes that depend on a set of design variables (Smith, 1989). This estimator removes the aggregation bias due to the auxiliary variables. It can be used to adjust simultaneously for the effect of aggregation and sample selection involving design variables by including these variables in $z_U$. For normally distributed data this estimator is the MLE. Adjusted regression coefficients can then be calculated from $\hat{\Sigma}_6$, that is

$$\hat{\beta}_{6yx} = \hat{\Sigma}^{-1}_{6xx} \hat{\Sigma}_{6xy}.$$

The adjusted estimator replaces the components of bias in (20.19) due to $\beta'_{wz} (S^{(2)}_{1zz} - \Sigma_{zz}) \beta_{wz}$ by $\beta'_{wz} (\hat{\Sigma}_{zz} - \Sigma_{zz}) \beta_{wz}$. If $\hat{\Sigma}_{zz}$ is an estimate based on an individual-level sample involving $m_0$ first-stage units then for many sample designs $\hat{\Sigma}_{zz} - \Sigma_{zz} = O(1/m_0)$, and so $\beta'_{wz} (\hat{\Sigma}_{zz} - \Sigma_{zz}) \beta_{wz}$ is $O(1/m_0)$.

The adjusted estimator can be rewritten as

$$\hat{\beta}_{6yx} = B^{(2)}_{1yx|z} + \hat{\beta}_{6zx} B^{(2)}_{1yz|x} \tag{20.20}$$

where $\hat{\beta}_{6zx} = \hat{\Sigma}^{-1}_{6xx} B^{(2)\prime}_{1xz} \hat{\Sigma}_{zz}$. Corresponding decompositions apply at the group and individual levels:

$$B^{(2)}_{1yx} = B^{(2)}_{1yx|z} + B^{(2)}_{1zx} B^{(2)}_{1yz|x}$$

$$B^{(1)}_{1yx} = B^{(1)}_{1yx|z} + B^{(1)}_{1zx} B^{(1)}_{1yz|x}.$$

The adjustment is trying to correct for the bias in the estimation of $\beta_{zx}$ by replacing $B^{(2)}_{1zx}$ by $\hat{\beta}_{6zx}$. The bias due to the conditional variance components $\Sigma^{(2)}_{|z}$ remains.
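The adjusted estimator and the decomposition (20.20) can be sketched numerically. The following is a minimal illustration under stated assumptions, not the authors' implementation: the function names `adjusted_covariance` and `regression_from_covariance` are hypothetical, and the inputs are random stand-ins for a group-level covariance matrix of $(x, y, z)$ and an individual-level estimate of $\Sigma_{zz}$.

```python
import numpy as np


def adjusted_covariance(S, sigma_zz_hat, iw, iz):
    """Pearson-type adjustment: S_ww + B'_wz (Sigma_zz_hat - S_zz) B_wz.

    S is a covariance matrix of (w, z); iw and iz index the w and z variables.
    (Hypothetical helper written for illustration only.)
    """
    S = np.asarray(S, dtype=float)
    S_ww = S[np.ix_(iw, iw)]
    S_zw = S[np.ix_(iz, iw)]
    S_zz = S[np.ix_(iz, iz)]
    B_wz = np.linalg.solve(S_zz, S_zw)  # B_wz = S_zz^{-1} S_zw
    return S_ww + B_wz.T @ (sigma_zz_hat - S_zz) @ B_wz


def regression_from_covariance(sigma_w, n_x):
    """Coefficients of y on x, with x occupying the first n_x positions of w."""
    return np.linalg.solve(sigma_w[:n_x, :n_x], sigma_w[:n_x, n_x:])


# Stand-in data (assumption): two x variables, one y, two auxiliary z variables.
rng = np.random.default_rng(0)
S = np.cov(rng.normal(size=(50, 5)), rowvar=False)              # plays the role of S^(2)_1
sigma_zz_hat = np.cov(rng.normal(size=(200, 2)), rowvar=False)  # individual-level estimate

sigma6 = adjusted_covariance(S, sigma_zz_hat, iw=[0, 1, 2], iz=[3, 4])
beta6_yx = regression_from_covariance(sigma6, n_x=2)
print(beta6_yx.ravel())
```

When `sigma_zz_hat` equals the group-level block `S_zz`, the correction term vanishes and the estimator reduces to the unadjusted group-level matrix, which makes explicit that only the part of the aggregation effect carried by $z$ is being altered.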
Steel, Tranmer and Holt (1999) carried out an empirical investigation into the effects of aggregation on multiple regression analysis using data from the Australian 1991 Population Census for the city of Adelaide. Group-level data were available in the form of totals for 1711 census Collection Districts (CDs), which contain an average of about 250 dwellings. The analysis was confined to people aged 15 or over and there was an average of about 450 such people per CD. To enable an evaluation to be carried out, data from the households sample file (HSF), which is a 1% sample of households and the people within them released from the population census, were used.

The evaluation considered the dependent variable of personal income. The following explanatory variables were considered: marital status, sex, degree, employed–manual occupation, employed–managerial or professional occupation, employed–other, unemployed, born Australia, born UK and four age categories. Multiple regression models were estimated using the HSF data and the CD data, weighted by CD population size. The results are summarized in Table 20.3. The $R^2$ of the CD-level equation, 0.880, is much larger than that of the individual-level equation, 0.496. However, the CD-level $R^2$ indicates how much of the variation in CD mean income is being explained. The difference between the two estimated models can also be examined by comparing their fit at the individual level. Using the CD-level equation to predict individual-level income gave an $R^2$ of 0.310. Generally the regression coefficients estimated at the two levels are of the same sign, the exceptions being married, which is non-significant at the individual level, and the coefficient for age 20–29. The values can be very different at the two levels, with the CD-level coefficients being larger than the corresponding individual-level coefficients in some cases and smaller in others.
Table 20.3 Comparison of individual, CD and adjusted CD-level regression equations.

             Individual level       CD level               Adjusted CD level
Variable     Coefficient   SE       Coefficient   SE       Coefficient   SE
Intercept    11 876.0      496.0    4 853.6       833.9    1 573.0       1 021.3
Married      −8.5          274.0    4 715.5       430.0    7 770.3       564.4
Female       −6 019.1      245.2    −3 067.3      895.8    2 195.0       915.1
Degree       8 471.5       488.9    21 700.0      1 284.5  23 501.0      1 268.9
Unemp        −962.5        522.1    −390.7        1 287.9  569.5         1 327.2
Manual       9 192.4       460.4    1 457.3       1 101.2  2 704.7       1 091.9
ManProf      20 679.0      433.4    23 682.0      1 015.5  23 037.0      1 023.7
EmpOther     11 738.0      347.8    6 383.2       674.9    7 689.9       741.7
Born UK      1 146.3       425.3    2 691.1       507.7    2 274.6       506.0
Born Aust    1 873.8       336.8    2 428.3       464.6    2 898.8       491.8
Age 15–19    −9 860.6      494.7    −481.9        1 161.6  57.8          1 140.6
Age 20–29    −3 529.8      357.6    2 027.0       770.2    1 961.6       758.4
Age 45–59    585.6         360.8    434.3         610.2    1 385.1       1 588.8
Age 60+      255.2         400.1    1 958.0       625.0    2 279.5       1 561.4
$R^2$        0.496                  0.880                  0.831

The differences are often considerable: for example, the coefficient for degree increases from 8471 to 21 700. The average absolute difference was 4533. The estimates and associated estimated standard errors obtained at the two levels are different and hence so is the assessment of their statistical significance.

Other variables could be added to the model but the $R^2$ obtained was considered acceptable and this sort of model is indicative of what researchers might use in practice. The $R^2$ obtained at the individual level is consistent with those found in other studies of income (e.g. Davies, Joshi and Clarke, 1997). As with all regression models there are likely to be variables with some explanatory power omitted from the model; however, this reflects the world of practical data analysis. This example shows the aggregation effects when a reasonable but not necessarily perfect statistical model is being used.
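The average absolute difference quoted above can be checked against the coefficients in Table 20.3. The sketch below is a check written for this text rather than the authors' calculation; it assumes the average is taken over all 14 rows including the intercept, which is the reading that comes closest to the reported 4533. Any small remaining gap would presumably be due to rounding in the published coefficients.

```python
import numpy as np

# Coefficients transcribed from Table 20.3, in row order:
# Intercept, Married, Female, Degree, Unemp, Manual, ManProf, EmpOther,
# Born UK, Born Aust, Age 15-19, Age 20-29, Age 45-59, Age 60+.
individual = np.array([11876.0, -8.5, -6019.1, 8471.5, -962.5, 9192.4, 20679.0,
                       11738.0, 1146.3, 1873.8, -9860.6, -3529.8, 585.6, 255.2])
cd_level = np.array([4853.6, 4715.5, -3067.3, 21700.0, -390.7, 1457.3, 23682.0,
                     6383.2, 2691.1, 2428.3, -481.9, 2027.0, 434.3, 1958.0])

# Absolute difference per coefficient, then the average over all rows
# (assumption: the intercept is included in the average).
abs_diff = np.abs(cd_level - individual)
mean_abs_diff = abs_diff.mean()

# Rows where the two levels disagree in sign (index into the row order above).
sign_changes = np.flatnonzero(np.sign(cd_level) != np.sign(individual))

print(mean_abs_diff)
print(sign_changes)
```

The sign comparison picks out the rows for married and age 20–29, matching the two exceptions noted in the text.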
The log transformation was also tried for the income variable but did not result in an appreciably better fit.

Steel, Tranmer and Holt (1999) reported the results of applying the adjusted ecological regression method to the income regression. The auxiliary variables used were: owner occupied, renting from government, housing type, aged 45–59 and aged 60+. These variables were considered because they had relatively high within-CD correlations, so that their variances were subject to strong grouping effects, and because it is reasonable to expect that individual-level data might be available for them. Because the adjustment relies on obtaining a good estimate of the unit-level covariance matrix of the adjustment variables, we need to keep the number of variables small. By choosing variables that characterize much of the difference between CDs we hope to have variables that will perform effectively in a range of situations.

These adjustment variables remove between 9% and 75% of the aggregation effect on the variances of the variables in the analysis. For the income variable the reduction was 32% and the average reduction was 52%.

The estimates of the regression equation obtained from $\hat{\Sigma}_6$, that is $\hat{\beta}_{6yx}$, are given in Table 20.3. In general the adjusted CD regression coefficients are no closer to the individual-level coefficients than those for the original CD-level regression equation. The resulting adjusted $R^2$ is still considerably higher than that of the individual-level equation, indicating that the adjustment is not working well. The measure of fit at the individual level gives an $R^2$ of 0.284 compared with 0.310 for the unadjusted equation, so according to this measure the adjustment has had a small detrimental effect. The average absolute difference between the CD- and individual-level coefficients has also increased slightly to 4771.
While the adjustment has eliminated about half the aggregation effects in the variables it has not resulted in reducing the difference between the CD- and individual-level regression equations. The adjustment procedure will be effective if $B^{(2)}_{1yx|z} \approx B^{(1)}_{1yx|z}$, $B^{(2)}_{1yz|x} \approx B^{(1)}_{1yz|x}$ and $\hat{\beta}_{6zx}(z) \approx B^{(1)}_{1zx}$. Steel, Tranmer and Holt (1999) found that the coefficients in $B^{(1)}_{yx|z}$ and $B^{(2)}_{yx|z}$ are generally very different and the average absolute difference is 4919. Inclusion of the auxiliary variables in the regression has had no appreciable effect on the aggregation effect on the regression coefficients and the $R^2$ is still considerably larger at the CD level than the individual level.

The adjustment procedure replaces $B^{(2)}_{1zx} B^{(2)}_{1yz|x}$ by $\hat{\beta}_{6zx} B^{(2)}_{1yz|x}$. Analysis of these values showed that the adjusted CD values are considerably closer to the individual-level values than the CD-level values. The adjustment has had some beneficial effect in the estimation of $\beta_{zx} \beta_{yz|x}$ and the bias of the adjusted estimators is mainly due to the difference between the estimates of $\beta_{yx|z}$. The adjustment has altered the component of bias it is designed to reduce. The remaining biases mean that the overall effect is largely unaffected.

It appears that conditioning on the auxiliary variables has not sufficiently reduced the biases due to the random effects. Attempts were made to estimate the remaining variance components from purely aggregate data using MLn but this proved unsuccessful. Plots of the squares of the residuals against the inverse of the population sizes of groups showed that there was not always an increasing trend that would be needed to obtain sensible estimates. Given the results in Section 20.3 concerning the use of purely aggregate data, these results are not surprising.

The multi-level model that incorporates grouping variables and random effects provides a general framework through which the causes of ecological biases can be explained.
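The condition for the adjustment to be effective, stated earlier in this discussion, can be made explicit by subtracting the individual-level decomposition from (20.20). The following is a short worked step using only the decompositions already given; the labels on the two terms are descriptive additions, not the authors' wording.

```latex
\begin{align*}
\hat{\beta}_{6yx} - B^{(1)}_{1yx}
 &= \bigl(B^{(2)}_{1yx|z} + \hat{\beta}_{6zx} B^{(2)}_{1yz|x}\bigr)
  - \bigl(B^{(1)}_{1yx|z} + B^{(1)}_{1zx} B^{(1)}_{1yz|x}\bigr) \\
 &= \underbrace{B^{(2)}_{1yx|z} - B^{(1)}_{1yx|z}}_{\text{conditional coefficients: untouched by the adjustment}}
  + \underbrace{\hat{\beta}_{6zx} B^{(2)}_{1yz|x} - B^{(1)}_{1zx} B^{(1)}_{1yz|x}}_{\text{component acting through } z\text{: targeted by the adjustment}}
\end{align*}
```

Read this way, the empirical finding is that the second term was brought closer to zero while the first term, with its reported average absolute difference of 4919, continued to dominate.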
Using a limited number of auxiliary variables it was possible to explain about half the aggregation effects in income and a number of explanatory variables. Using individual-level data on these adjustment variables the aggregation effects due to these variables can be removed. However, the resulting adjusted regression coefficients are no less biased.

This suggests that we should attempt to find further auxiliary variables that account for a very large proportion of the aggregation effects and for which it would be reasonable to expect that the required individual-level data are available. However, in practice there are always going to be some residual group-level effects and because of the impact of $\bar{n}^*$ in (20.19) there is still the potential for large biases.

20.6. CONCLUSIONS

This chapter has shown the potential for survey and aggregate data to be used together to produce better estimates of parameters at different levels. In particular, survey data may be used to remove biases associated with analysis using group-level aggregate data even if it does not contain indicators for the groups in question. Aggregate data may be used to produce estimates of variance components when the primary data source is a survey that does not contain indicators for the groups. The model and methods described in this chapter are fairly simple. Development of models appropriate to categorical data and more evaluation with real datasets would be worthwhile.

Sampling and nonresponse are two mechanisms that lead to data being missing. The process of aggregation also leads to a loss of information and can be thought of as a missing data problem. The approaches in this chapter could be viewed in this light. Further progress may be possible through use of methods that have been developed to handle incomplete data, such as those discussed by Little in this volume (Chapter 18).
ACKNOWLEDGEMENTS

This work was supported by the Economic and Social Research Council (Grant number R 000236135) and the Australian Research Council.

References

Analysis of Survey Data. Edited by R. L. Chambers and C. J. Skinner. Copyright © 2003 John Wiley & Sons, Ltd. ISBN: 0-471-89987-9

Aalen, O. and Husebye, E. (1991) Statistical analysis of repeated events forming renewal processes. Statistics in Medicine, 10, 1227–40.
Abowd, J. M. and Card, D. (1989) On the covariance structure of earnings and hours changes. Econometrica, 57, 411–45.
Achen, C. H. and Shively, W. P. (1995) Cross Level Inference. Chicago: The University of Chicago Press.
Agresti, A. (1990) Categorical Data Analysis. New York: Wiley.
Allison, P. D. (1982) Discrete-time methods for the analysis of event-histories. In Sociological Methodology 1982 (S. Leinhardt, ed.), pp. 61–98. San Francisco: Jossey-Bass.
Altonji, J. G. and Segal, L. M. (1996) Small-sample bias in GMM estimation of covariance structures. Journal of Business and Economic Statistics, 14, 353–66.
Amemiya, T. (1984) Tobit models: a survey. Journal of Econometrics, 24, 3–61.
Andersen, P. K., Borgan, O., Gill, R. D. and Keiding, N. (1993) Statistical Models Based on Counting Processes. New York: Springer-Verlag.
Anderson, T. W. (1957) Maximum likelihood estimates for the multivariate normal distribution when some observations are missing. Journal of the American Statistical Association, 52, 200–3.
Andrews, M. and Bradley, S. (1997) Modelling the transition from school and the demand for training in the United Kingdom. Economica, 64, 387–413.
Assakul, K. and Proctor, C. H. (1967) Testing independence in two-way contingency tables with data subject to misclassification. Psychometrika, 32, 67–76.
Baltagi, B. H. (2001) Econometric Analysis of Panel Data. 2nd Edn. Chichester: Wiley.
Basu, D. (1971) An essay on the logical foundations of survey sampling, Part 1. Foundations of Statistical Inference, pp. 203–42. Toronto: Holt, Rinehart and Winston.
Bellhouse, D. R. (2000) Density and quantile estimation in large-scale surveys when a covariate is present. Unpublished report.
Bellhouse, D. R. and Stafford, J. E. (1999) Density estimation from complex surveys. Statistica Sinica, 9, 407–24.
Bellhouse, D. R. and Stafford, J. E. (2001) Local polynomial regression techniques in complex surveys. Survey Methodology, 27, 197–203.
Berman, M. and Turner, T. R. (1992) Approximating point process likelihoods with GLIM. Applied Statistics, 41, 31–8.
Berthoud, R. and Gershuny, J. (eds) (2000) Seven Years in the Lives of British Families. Bristol: The Policy Press.
Bickel, P. J., Klaassen, C. A. J., Ritov, Y. and Wellner, J. A. (1993) Efficient and Adaptive Estimation for Semiparametric Models. Baltimore, Maryland: Johns Hopkins University Press.
Binder, D. A. (1982) Non-parametric Bayesian models for samples from finite populations. Journal of the Royal Statistical Society, Series B, 44, 388–93.
Binder, D. A. (1983) On the variances of asymptotically normal estimators from complex surveys. International Statistical Review, 51, 279–92.
Binder, D. A. (1992) Fitting Cox's proportional hazards models from survey data. Biometrika, 79, 139–47.
Binder, D. A. (1996) Linearization methods for single phase and two-phase samples: a cookbook approach. Survey Methodology, 22, 17–22.
Binder, D. A. (1998) Longitudinal surveys: why are these different from all other surveys? Survey Methodology, 24, 101–8.
Birnbaum, A. (1962) On the foundations of statistical inference (with discussion). Journal of the American Statistical Association, 53, 259–326.
Bishop, Y. M. M., Fienberg, S. E. and Holland, P. W. (1975) Discrete Multivariate Analysis: Theory and Practice. Cambridge, Massachusetts: MIT Press.
Bjørnstad, J. F. (1996) On the generalization of the likelihood function and the likelihood principle. Journal of the American Statistical Association, 91, 791–806.
Blau, D. M. and Robins, P. K. (1987) Training programs and wages – a general equilibrium analysis of the effects of program size. Journal of Human Resources, 22, 113–25.
Blossfeld, H. P., Hamerle, A. and Mayer, K. U. (1989) Event History Analysis. Hillsdale, New Jersey: L. Erlbaum Associates.
Boudreau, C. and Lawless, J. F. (2001) Survival analysis based on Cox proportional hazards models and survey data. University of Waterloo, Dept. of Statistics and Actuarial Science, Technical Report.
Box, G. E. P. (1980) Sampling and Bayes' inference in scientific modeling and robustness. Journal of the Royal Statistical Society, Series A, 143, 383–430 (with discussion).
Box, G. E. P. and Cox, D. R. (1964) An analysis of transformations. Journal of the Royal Statistical Society, Series B, 26, 211–52.
Boyd, L. H. and Iversen, G. R. (1979) Contextual Analysis: Concepts and Statistical Techniques. Belmont, California: Wadsworth.
Breckling, J. U., Chambers, R. L., Dorfman, A. H., Tam, S. M. and Welsh, A. H. (1994) Maximum likelihood inference from sample survey data. International Statistical Review, 62, 349–63.
Breidt, F. J. and Fuller, W. A. (1993) Regression weighting for multiphase samples. Sankhya, B, 55, 297–309.
Breidt, F. J., McVey, A. and Fuller, W. A. (1996–7) Two-phase estimation by imputation. Journal of the Indian Society of Agricultural Statistics, 49, 79–90.
Breslow, N. E. and Holubkov, R. (1997) Maximum likelihood estimation of logistic regression parameters under two-phase outcome-dependent sampling. Journal of the Royal Statistical Society, Series B, 59, 447–61.
Breunig, R. V. (1999) Nonparametric density estimation for stratified samples. Working Paper, Department of Statistics and Econometrics, The Australian National University.
Breunig, R. V. (2001) Density estimation for clustered data. Econometric Reviews, 20, 353–67.
Brewer, K. R. W. and Mellor, R. W. (1973) The effect of sample structure on analytical surveys. Australian Journal of Statistics, 15, 145–52.
Brick, J. M. and Kalton, G. (1996) Handling missing data in survey research. Statistical Methods in Medical Research, 5, 215–38.
Brier, S. E. (1980) Analysis of contingency tables under cluster sampling. Biometrika, 67, 591–6.
Browne, M. W. (1984) Asymptotically distribution-free methods for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 37, 62–83.
Bryk, A. S. and Raudenbush, S. W. (1992) Hierarchical Linear Models: Application and Data Analysis Methods. Newbury Park, California: Sage.
Bull, S. and Pederson, L. L. (1987) Variances for polychotomous logistic regression using complex survey data. Proceedings of the American Statistical Association, Survey Research Methods Section, pp. 507–12.
Buskirk, T. (1999) Using nonparametric methods for density estimation with complex survey data. Unpublished Ph.D. thesis, Arizona State University.
Carroll, R. J., Ruppert, D. and Stefanski, L. A. (1995) Measurement Error in Nonlinear Models. London: Chapman and Hall.
Cassel, C.-M., Särndal, C.-E. and Wretman, J. H. (1977) Foundations of Inference in Survey Sampling. New York: Wiley.
Chamberlain, G. (1982) Multivariate regression models for panel data. Journal of Econometrics, 18, 5–46.
Chambers, R. L. (1996) Robust case-weighting for multipurpose establishment surveys. Journal of Official Statistics, 12, 3–22.
Chambers, R. L. and Dunstan, R. (1986) Estimating distribution functions from survey data. Biometrika, 73, 597–604.
Chambers, R. L. and Steel, D. G. (2001) Simple methods for ecological inference in 2x2 tables. Journal of the Royal Statistical Society, Series A, 164, 175–92.
Chambers, R. L., Dorfman, A. H. and Wang, S. (1998) Limited information likelihood analysis of survey data. Journal of the Royal Statistical Society, Series B, 60, 397–412.
Chambers, R. L., Dorfman, A. H. and Wehrly, T. E. (1993) Bias robust estimation in finite populations using nonparametric calibration. Journal of the American Statistical Association, 88, 268–77.
Chesher, A. (1997) Diet revealed? Semiparametric estimation of nutrient intake-age relationships (with discussion). Journal of the Royal Statistical Society, Series A, 160, 389–428.
Clayton, D. (1978) A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika, 65, 14–51.
Cleave, N., Brown, P. J. and Payne, C. D. (1995) Evaluation of methods for ecological inference. Journal of the Royal Statistical Society, Series A, 158, 55–72.
Cochran, W. G. (1977) Sampling Techniques. 3rd Edn. New York: Wiley.
Cohen, M. P. (1995) Sample sizes for survey data analyzed with hierarchical linear models. National Center for Educational Statistics, Washington, DC.
Cosslett, S. (1981) Efficient estimation of discrete choice models. In Structural Analysis of Discrete Data with Econometric Applications (C. F. Manski and D. McFadden, eds), pp. 191–205. New York: Wiley.
Cox, D. R. (1972) Regression models and life tables (with discussion). Journal of the Royal Statistical Society, Series B, 34, 187–220.
Cox, D. R. and Isham, V. (1980) Point Processes. London: Chapman and Hall.
Cox, D. R. and Oakes, D. (1984) Analysis of Survival Data. London: Chapman and Hall.
David, M. H., Little, R. J. A., Samuhel, M. E. and Triest, R. K. (1986) Alternative methods for CPS income imputation. Journal of the American Statistical Association, 81, 29–41.
Davies, H., Joshi, H. and Clarke, L. (1997) Is it cash that the deprived are short of? Journal of the Royal Statistical Society, Series A, 160, 107–26.
Deakin, B. M. and Pratten, C. F. (1987) The economic effects of YTS. Employment Gazette, 95, 491–7.
Decady, Y. J. and Thomas, D. R. (1999) Testing hypotheses on multiple response tables: a Rao-Scott adjusted chi-squared approach. In Managing on the Digital Frontier (A. M. Lavack, ed.). Proceedings of the Administrative Sciences Association of Canada, 20, 13–22.
Decady, Y. J. and Thomas, D. R. (2000) A simple test of association for contingency tables with multiple column responses. Biometrics, 56, 893–896.
Deville, J.-C. and Särndal, C.-E. (1992) Calibration estimators in survey sampling. Journal of the American Statistical Association, 87, 376–82.