Applied Regression Analysis: A Research Tool, Second Edition

John O. Rawlings, Sastry G. Pantula, David A. Dickey

Springer

Springer Texts in Statistics

Advisors: George Casella, Stephen Fienberg, Ingram Olkin

Springer: New York, Berlin, Heidelberg, Barcelona, Hong Kong, London, Milan, Paris, Singapore, Tokyo

Springer Texts in Statistics

Alfred: Elements of Statistics for the Life and Social Sciences
Berger: An Introduction to Probability and Stochastic Processes
Bilodeau and Brenner: Theory of Multivariate Statistics
Blom: Probability and Statistics: Theory and Applications
Brockwell and Davis: An Introduction to Time Series and Forecasting
Chow and Teicher: Probability Theory: Independence, Interchangeability, Martingales, Third Edition
Christensen: Plane Answers to Complex Questions: The Theory of Linear Models, Second Edition
Christensen: Linear Models for Multivariate, Time Series, and Spatial Data
Christensen: Log-Linear Models and Logistic Regression, Second Edition
Creighton: A First Course in Probability Models and Statistical Inference
Dean and Voss: Design and Analysis of Experiments
du Toit, Steyn, and Stumpf: Graphical Exploratory Data Analysis
Durrett: Essentials of Stochastic Processes
Edwards: Introduction to Graphical Modelling, Second Edition
Finkelstein and Levin: Statistics for Lawyers
Flury: A First Course in Multivariate Statistics
Jobson: Applied Multivariate Data Analysis, Volume I: Regression and Experimental Design
Jobson: Applied Multivariate Data Analysis, Volume II: Categorical and Multivariate Methods
Kalbfleisch: Probability and Statistical Inference, Volume I: Probability, Second Edition
Kalbfleisch: Probability and Statistical Inference, Volume II: Statistical Inference, Second Edition
Karr: Probability
Keyfitz: Applied Mathematical Demography, Second Edition
Kiefer: Introduction to Statistical Inference
Kokoska and Nevison: Statistical Tables and Formulae
Kulkarni: Modeling, Analysis, Design, and Control of Stochastic Systems
Lehmann: Elements of Large-Sample Theory
Lehmann: Testing Statistical Hypotheses, Second Edition
Lehmann and Casella: Theory of Point Estimation, Second Edition
Lindman: Analysis of Variance in Experimental Design
Lindsey: Applying Generalized Linear Models
Madansky: Prescriptions for Working Statisticians
McPherson: Applying and Interpreting Statistics: A Comprehensive Guide, Second Edition
Mueller: Basic Principles of Structural Equation Modeling: An Introduction to LISREL and EQS

(continued after index)

John O. Rawlings, Sastry G. Pantula, David A. Dickey

Applied Regression Analysis: A Research Tool, Second Edition

With 78 Figures

John O. Rawlings, Sastry G. Pantula, David A. Dickey
Department of Statistics
North Carolina State University
Raleigh, NC 27695
USA

Editorial Board

George Casella
Biometrics Unit
Cornell University
Ithaca, NY 14853-7801
USA

Stephen Fienberg
Department of Statistics
Carnegie Mellon University
Pittsburgh, PA 15213-3890
USA

Ingram Olkin
Department of Statistics
Stanford University
Stanford, CA 94305
USA

Library of Congress Cataloging-in-Publication Data

Rawlings, John O., 1932–
  Applied regression analysis: a research tool. — 2nd ed. / John O. Rawlings, Sastry G. Pantula, David A. Dickey.
    p. cm. — (Springer texts in statistics)
  Includes bibliographical references and indexes.
  ISBN 0-387-98454-2 (hardcover: alk. paper)
  1. Regression analysis. I. Pantula, Sastry G. II. Dickey, David A. III. Title. IV. Series.
QA278.2.R38 1998
519.5′36—dc21 97-48858

Printed on acid-free paper.

© 1989 Wadsworth, Inc.
© 1998 Springer-Verlag New York, Inc.

All rights reserved. This
work may not be translated or copied in whole or in part without the written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the former are not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.

9 8 7 6 5 4 3 2 1

ISBN 0-387-98454-2 Springer-Verlag New York Berlin Heidelberg SPIN 10660129

To Our Families

PREFACE

This text is a new and improved edition of Rawlings (1988). It is the outgrowth of several years of teaching an applied regression course to graduate students in the sciences. Most of the students in these classes had taken a two-semester introduction to statistical methods that included experimental design and multiple regression at the level provided in texts such as Steel, Torrie, and Dickey (1997) and Snedecor and Cochran (1989). For most, the multiple regression had been presented in matrix notation.

The basic purpose of the course and this text is to develop an understanding of least squares and related statistical methods without becoming excessively mathematical. The emphasis is on regression concepts, rather than on mathematical proofs. Proofs are given only to develop facility with matrix algebra and comprehension of mathematical relationships. Good students, even though they may not have strong mathematical backgrounds, quickly grasp the essential concepts and appreciate the enhanced understanding. The learning process is reinforced with continuous use of numerical examples throughout the text and with several case studies. Some numerical and mathematical exercises are included to whet the appetite of graduate students.

The first four chapters of the book provide a review of simple regression in algebraic notation (Chapter 1), an introduction to key matrix operations and the geometry of vectors (Chapter 2), and a review of ordinary least squares in matrix notation (Chapters 3 and 4). Chapter 4 also provides a foundation for the testing of hypotheses and the properties of sums of squares used in analysis of variance. Chapter 5 is a case study giving a complete multiple regression analysis using the methods reviewed in the first four chapters. Then Chapter 6 gives a brief geometric interpretation of least squares, illustrating the relationships among the data vectors, the link between the analysis of variance and the lengths of the vectors, and the role of degrees of freedom. Chapter 7 discusses the methods and criteria for determining which independent variables should be included in the models.

The next two chapters cover special classes of multiple regression models. Chapter 8 introduces polynomial and trigonometric regression models; this chapter also discusses response curve models that are linear in the parameters. Class variables and the analysis of variance of designed experiments (models of less than full rank) are introduced in Chapter 9.

Chapters 10 through 14 address some of the problems that might be encountered in regression. A general introduction to the various kinds of problems is given in Chapter 10. This is followed by discussions of regression
diagnostic techniques (Chapter 11), and scaling or transforming variables to rectify some of the problems (Chapter 12). Analysis of the correlational structure of the data and biased regression are discussed as techniques for dealing with the collinearity problem common in observational data (Chapter 13). Chapter 14 is a case study illustrating the analysis of data in the presence of collinearity.

Models that are nonlinear in the parameters are presented in Chapter 15. Chapter 16 is another case study using polynomial response models, nonlinear modeling, transformations to linearize, and analysis of residuals. Chapter 17 addresses the analysis of unbalanced data. Chapter 18 (new to this edition) introduces linear models that have more than one random effect. The ordinary least squares approach to such models is given. This is followed by the definition of the variance–covariance matrix for such models and a brief introduction to mixed effects and random coefficient models. The use of iterative maximum likelihood estimation of both the variance components and the fixed effects is discussed. The final chapter, Chapter 19, is a case study of the analysis of unbalanced data.

We are grateful for the assistance of many in the development of this book. Of particular importance has been the dedicated editing of the earlier edition by Gwen Briggs, daughter of John Rawlings, and her many suggestions for improvement. It is uncertain when the book would have been finished without her support. A special thanks goes to our former student, Virginia Lesser, for her many contributions in reading parts of the manuscript, in data analysis, and in the enlistment of many data sets from her graduate student friends in the biological sciences. We are indebted to our friends, both faculty and students, at North Carolina State University for bringing us many interesting consulting problems over the years that have stimulated the teaching of this material. We are particularly indebted to those (acknowledged in the text) who have generously allowed the use of their data. In this regard, Rick Linthurst warrants special mention for his stimulating discussions as well as the use of his data. We acknowledge the encouragement and valuable discussions of colleagues in the Department of Statistics at NCSU, and we thank Matthew Sommerville for checking answers to the exercises. We wish to thank Sharon Sullivan and Dawn Haines for their help with LaTeX.

Finally, we want to express appreciation for the critical reviews and many suggestions provided for the first edition by the Wadsworth Brooks/Cole reviewers: Mark Conaway, University of Iowa; Franklin Graybill, Colorado State University; Jason Hsu, Ohio State University; Kenneth Koehler, Iowa State University; B. Lindsay, The Pennsylvania State University; Michael Meridith, Cornell University; M. B. Rajarshi, University of Poona (India); Muni Srivastava, University of Toronto; and Patricia Wahl, University of Washington; and for the second edition by the Springer-Verlag reviewers.

Acknowledgment is given for the use of material in the appendix tables. Appendix Table A.7 is reproduced in part from tables in Durbin and Watson (1951) with permission of the Biometrika Trustees. Appendix Table A.8 is reproduced with permission from Shapiro and Francia (1972), Journal of the American Statistical Association. The remaining appendix tables have been computer generated by one of the authors. We gratefully acknowledge permission of other authors and publishers for use of material from their
publications as noted in the text.

Note to the Reader

Most research is aimed at quantifying relationships among variables that either measure the end result of some process or are likely to affect the process. The process in question may be any biological, chemical, or physical process of interest to the scientist. The quantification of the process may be as simple as determining the degree of association between two variables or as complicated as estimating the many parameters of a very detailed nonlinear mathematical model of the system.

Regardless of the degree of sophistication of the model, the most commonly used statistical method for estimating the parameters of interest is the method of least squares. The criterion applied in least squares estimation is simple and has great intuitive appeal. The researcher chooses the model that is believed to be most appropriate for the project at hand. The parameters for the model are then estimated such that the predictions from the model and the observed data are in as good agreement as possible as measured by the least squares criterion: minimization of the sum of squared differences between the predicted and the observed points (written out explicitly below).

Least squares estimation is a powerful research tool. Few assumptions are required, and the estimators obtained have several desirable properties. Inference from research data to the true behavior of a process, however, can be a difficult and dangerous step due to unrecognized inadequacies in the data, misspecification of the model, or inappropriate inferences of causality. As with any research tool, it is important that the least squares method be thoroughly understood in order to eliminate as much misuse or misinterpretation of the results as possible. There is a distinct difference between understanding and pure memorization. Memorization can make a good technician, but it takes understanding to produce a master. A discussion of the geometric interpretation of least squares is given to enhance your understanding. You may find your first exposure to the geometry of least squares somewhat traumatic, but the visual perception of least squares is worth the effort. We encourage you to tackle the topic in the spirit in which it is included.

The general topic of least squares has been broadened to include statistical techniques associated with model development and testing. The backbone of least squares is the classical multiple regression analysis using the linear model to relate several independent variables to a response or dependent variable. Initially, this classical model is assumed to be appropriate. Then methods for detecting inadequacies in this model and possible remedies are discussed. The connection between the analysis of variance for designed experiments and multiple regression is developed to build the foundation for the analysis of unbalanced data. (This also emphasizes the generality of the least squares method.)
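For readers new to the notation, the criterion described above can be written out in the standard matrix form. (This display is an editorial sketch of a textbook-standard result, not a quotation from the book; y denotes the vector of n observations, X the matrix of independent variables, and β the vector of parameters.)

\[
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon},
\]
\[
\hat{\boldsymbol{\beta}} \;=\; \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^{n} \bigl( y_i - \mathbf{x}_i'\boldsymbol{\beta} \bigr)^2
\;=\; \arg\min_{\boldsymbol{\beta}} \, (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}),
\]
\[
\mathbf{X}'\mathbf{X}\,\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y}
\quad\Longrightarrow\quad
\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}
\quad \text{(when } \mathbf{X}'\mathbf{X} \text{ is nonsingular).}
\]

The middle line is the sum of squared differences referred to above; setting its derivative with respect to β to zero yields the normal equations in the last line. Near-singularity of X′X is precisely the collinearity problem taken up in Chapters 13 and 14.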
Interpretation of unbalanced data is difficult. It is important that the application of least squares to the analysis of such data be understood if the results from computer programs designed for the analysis of unbalanced data are to be used correctly.

The objective of a research project determines the amount of effort to be devoted to the development of realistic models. If the intent is one of prediction only, the degree to which the model might be considered realistic is immaterial. The only requirement is that the predictions be adequately precise in the region of interest. On the other hand, realism is of primary importance if the goal is a thorough understanding of the system. The simple linear additive model can seldom be regarded as a realistic model. It is at best an approximation of the true model. Almost without exception, models developed from the basic principles of a process will be nonlinear in the parameters. The least squares estimation principle is still applicable, but the mathematical methods become much more difficult. You are introduced to nonlinear least squares regression methods and some of the more common nonlinear models.

Least squares estimation is controlled by the correlational structure observed among the independent and dependent variables in the data set. Observational data, data collected by observing the state of nature according to some sampling plan, will frequently cause special problems for least squares estimation because of strong correlations or, more generally, near-linear dependencies among the independent variables. The seriousness of the problems will depend on the use to be made of the analyses. Understanding the correlational structure of the data is most helpful in […]

REFERENCES

[144] F. M. Speed, R. R. Hocking, and O. P. Hackney. Methods of analysis of linear models with unbalanced data. Journal of the American Statistical Association, 73:105–112, 1978.
[145] R. G. D. Steel, J. H. Torrie, and D. A. Dickey. Principles and Procedures of Statistics: A Biometrical Approach. McGraw-Hill, New York, 3rd edition, 1997.
[146] C. M. Stein. Multiple regression. In Contributions to Probability and Statistics, Essays in Honor of Harold Hotelling. Stanford University Press, Stanford, California, 1960.
[147] G. W. Stewart. Introduction to Matrix Computations. Academic Press, New York, 1973.
[148] F. S. Swed and C. Eisenhart. Tables for testing randomness of grouping in a sequence of alternatives. Annals of Mathematical Statistics, 14:66–87, 1943.
[149] H. Theil. Principles of Econometrics. Wiley, New York, 1971.
[150] R. A. Thisted. Comment: A critique of some ridge regression methods. Journal of the American Statistical Association, 75:81–86, 1980.
[151] J. W. Tukey. Exploratory Data Analysis. Addison-Wesley, Reading, Massachusetts, 1977.
[152] J. C. van Houwelingen. Use and abuse of variance models in regression. Biometrics, 44:1073–1081, 1988.
[153] A. Wald. The fitting of straight lines if both variables are subject to error. Annals of Mathematical Statistics, 11:284–300, 1940.
[154] J. T. Webster, R. F. Gunst, and R. L. Mason. Latent root regression analysis. Technometrics, 16:513–522, 1974.
[155] S. Weisberg. An empirical comparison of the percentage points of W and W′. Biometrika, 61:644–646, 1974.
[156] S. Weisberg. Comment on White and MacDonald (1980). Journal of the American Statistical Association, 75:28–31, 1980.
[157] S. Weisberg. A statistic for allocating Cp to individual cases. Technometrics, 23:27–31, 1981.
[158] S. Weisberg. Applied Linear Regression. Wiley, New York, 2nd edition, 1985.
[159] H. White and G. M. MacDonald. Some
large-sample tests for nonnormality in the linear regression model (with comment by S. Weisberg). Journal of the American Statistical Association, 75:16–31, 1980.
[160] F. S. Wood. Comment: Effect of centering on collinearity and interpretation of the constant. The American Statistician, 38:88–90, 1984.
[161] H. Working and H. Hotelling. Application of the theory of error to the interpretation of trends. Journal of the American Statistical Association, Supplement (Proceedings), 24:73–85, 1929.

AUTHOR INDEX

Addelman, S., 336
Afifi, A. A., 220, 226, 227
Agresti, Alan, 510
Ahrenholz, D. W., 96, 352, 360
Akaike, H., 225
Alderdice, D. F., 258–262
Allen, D. M., 230
Anderson, R. L., 494
Anderson, T. W., 247
Andrews, D. F., 98, 319, 395, 460
Anscombe, F. J., 344, 345
Atkinson, A. C., 342, 363
Baldwin, K. F., 446, 461
Bancroft, T. A., 227
Bartlett, M. S., 291, 398, 404, 407, 409
Basson, R. P., 546
Belsley, D. A., 91, 341–343, 361, 363, 364, 370, 371, 373
Bendel, R. B., 220, 226, 227
Berk, K. N., 209, 219, 222, 373
Biggar, J. W., 395
Blom, G., 356
Bloomfield, P., 330, 351
Bolch, B. W., 343
Box, G. E. P., 236, 255, 328, 400, 404, 409–411
Bradu, D., 439
Brown, R. L., 344
Bunke, O., 231
Cameron, E., 98, 319, 569
Campbell, F., 446
Carroll, R. J., 335, 336, 338, 509
Carter, R. L., 337
Clarke, G. P. Y., 501
Cochran, W. G., vii, 197
Cook, J., 337, 338
Cook, R. D., 341–343, 358, 362, 370, 410
Corsten, L. C. A., 439
Cox, D. R., 328, 404, 409, 411
Cramér, H., 78
Cure, W. W., 492
Daniel, C., 356
Dickey, D. A., vii, 243, 256, 581
Dixon, W. J., 416, 568
Drake, S., 514
Draper, N. R., 255, 497
Droge, B., 231
Durbin, J., 344, 354, 631
Eisenhart, C., 353
Erh, E. T., 395
Evans, J. M., 344
Feldstein, M., 337
Francia, R. S., 359, 633
Francis, C. A., 62, 65–67
Freund, R. J., 546, 561, 568
Fuller, W. A., 330, 334–338, 351, 419, 421, 494
Furnival, G. M., 210, 211, 220
Gabriel, K. R., 334, 436, 439, 473
Gallant, A. R., 494, 497–501, 507–509, 538
Galpin, J. S., 344, 358
Gray, R. J., 342, 358
Graybill, F. A., 78
Griffiths, W. E., 225
Guarnieri, J. A., 336
Gumpertz, M. L., 585, 586
Gunst, R. F., 370, 445, 457
Hackney, O. P., 546, 552, 554, 559
Hampel, F. R., 326
Hartley, H. O., 356, 497
Hausman, W. H., 37, 50, 53, 55, 57
Hawkins, D. M., 344, 358, 445
Heck, W. W., 492
Hedayat, A., 331, 344
Henderson, H. V., 546, 562, 563, 568
Hernandez, F., 409, 410
Herzberg, A. M., 98, 319, 395, 460
Hill, R. C., 225
Hocking, R. R., 206, 209, 220, 223, 224, 274, 286, 445, 546, 552, 554, 559, 583, 596
Hoerl, A. E., 445, 446, 461, 473
Hotelling, H., 138
Householder, A. S., 61
Huang, C. J., 343
Huber, P. J., 326
Hunter, J. S., 236, 410
Hunter, W. G., 236, 410
Jennrich, R. I., 497
Johnson, R. A., 409, 410
Judge, G. G., 225
Kennard, R. W., 445, 446, 461, 473
Kennedy, W. J., 227
Kopecky, K. J., 358
Kuh, E., 91, 341–343, 361, 363, 364, 370, 371, 373
Lee, T., 225
Linthurst, R. A., 161
Littell, R. C., 546, 561, 568
Lott, W. F., 445
Lynn, M. J., 445
MacDonald, G. M., 358
Madansky, A., 336
Mallows, C. L., 206, 223, 224
Marquardt, D. W., 370, 373, 377, 445, 446, 497
Mason, R. L., 445, 457
Miller, Jr., R. G., 138
Milliken, G. A., 564
Mombiela, R. A., 489
Mosteller, R., 399
Myers, R. H., 568
Nelder, J. A., 490
Nelson, L. A., 489, 494
Nelson, W. R., 95, 352, 360
Nielsen, D. R., 395
Norusis, M. J., 568
Pantula, S. G., 585, 586, 588
Park, S. H., 446
Pauling, L., 98, 319, 569
Pearson, E. S., 356
Pennypacker, S. P., 492
Pharos Books, 263
Pierce, D. A., 342, 358
Pollock, K. H., 588
Prescott, P., 342, 343
Quesenberry, C. P., 343, 344
Ralston, M. L., 497
Rao, C. R., 53, 55
Rawlings, J. O., 440, 492
Riggs, D. S., 336
Riviere, J. S., 440
Robson, D. S., 331, 344
Rohlf, F. J., 356
Ronchetti, E. M., 326
Rousseeuw, P. J., 326
Ruppert, D., 335, 336, 338, 509
Saeed, M., 62, 65–67
SAS Institute, Inc., 165, 211, 215, 219, 220, 232, 243, 255, 283, 311, 342, 343, 379, 391, 417, 424, 467, 502, 510, 520, 525, 536, 546, 553, 559, 564, 566, 583, 588, 596, 597, 615
Satterthwaite, F. E., 582, 592
Scheffé, H., 138, 575
Schwarz, G., 225
Searle, S. R., 17, 37, 50, 53, 55, 57, 78, 86, 105, 113, 115, 120, 280, 282, 546, 562–564, 568, 581
Shapiro, S. S., 359, 633
Shy-Modjeska, J. S., 440, 443
Smith, H., 446, 497
Snedecor, G. W., vii
Snee, R. D., 230, 370, 373, 446
Sokal, R. R., 356
Spector, P. C., 546, 561, 568
Speed, F. M., 445, 546, 552, 554, 559, 564, 583
Stahel, W. A., 326
Steel, R. G. D., vii, 243, 256, 581
Stefanski, L. A., 335–338
Stein, C. M., 445
Stewart, G. W., 37
Swed, F. S., 353
Theil, H., 373
Thisted, R. A., 371, 458, 473
Tidwell, P. W., 400
Torrie, J. H., vii, 243, 256, 581
Tukey, J. W., 399
van Houwelingen, J. C., 508
Wang, P. C., 410
Watson, G. S., 354, 631
Webster, J. T., 445
Weisberg, S., 230, 341–343, 356, 358, 359, 362
Welsch, R. E., 91, 341–343, 361, 363, 364, 370, 371, 373
White, H., 358
Wilk, M. B., 359
Wilson, R. B., 211, 220
Wood, F. S., 356, 370
Working, H., 138
Young, G., 61

SUBJECT INDEX

Adequacy of the model, 146, 240, 326
Adjusted coefficient of determination, 220, 222
Adjusted means, 314
Adjusted treatment means, 298
AIC criterion, 220, 225, 589
Analysis of cell means, 546
  unweighted, 549
  weighted, 552
Analysis of covariance, 271, 294, 307
Analysis of variance, 7, 107
Analysis of variance approach, 575, 593
Analysis of variance estimators, 576
Angle between vectors, 191
Anscombe plots, 344
Arcsin transformation, 404, 408
Assumptions
  homogeneous variance, 325
  independent errors, 326
  normality, 325, 326
Asymmetric distribution, 327
Autocorrelated errors, 588
B.L.U.E. (best linear unbiased estimators), 77, 325, 443, 552
Backward elimination, 213, 215, 467, 468
Balanced data, definition, 545
Bartlett's test statistic, 293
Bias, 210
Biased regression methods, 433, 434, 443, 446, 466
Biplot, Gabriel's, 433, 436, 442, 455, 463, 466, 473, 475, 476, 483
Bonferroni joint prediction intervals, 143
Bonferroni method, 137, 172, 507
Box–Cox transformation, 409, 428, 509, 532, 618
Box–Tidwell transformation, 400, 402
Cell means, 546
Centered, 256
Centered independent variables, 195, 434, 435, 447, 471
Central chi-square, 117
Characteristic roots, 57
Characteristic vectors, 57
Class statement, 283
Class variables, 269–271, 545
Coefficient of determination, 9, 220
Coefficient of variation, 203
Cofactor, 43
Collinear, 433
Collinearity, 197, 242, 256, 326, 333, 369, 433, 435, 443, 446, 450, 463, 466, 471, 478
  diagnostics, 369
  general comments, 457
  impact of, 198
  nonessential, 370
Column marker, 437, 442
Complete block design, 577
Components of variance, 573, 575
Composite hypothesis, 557
Condition index, 371
Condition number, 371, 473
Confidence ellipsoid, 172
Confidence interval estimates, 19
Consistent equations, 50
Consistent estimator, 337
Contrast, 276, 548
Controlled experiments, 208
Cook's D, 361, 362
Corrected sum of squares, 111
Correction factor, 8, 110
Correlated errors, 29, 329
  impact of, 329
Correlated residuals, 351
Correlation, product moment, 50
Correlation matrix, 164, 469
Correlational structure, 434, 463, 466, 471
Covariance, 11
  one-way analysis of, 592
Covariance of linear functions, 13
COVRATIO, 361, 364
Cox, Gertrude M., 301, 310
Critical point, 257
Data
  algae density–all treatments, 265
  algae density–one treatment, 237, 238, 241
  bacterial growth, 512
  beer production, 245
  biomass score, 267
  blue mold infection, 357
  cabbage, 301, 321
  calcium uptake, 501, 502, 514
  cancer, 319, 569
  chemical response, 265
  coho salmon, 258
  collinearity, 372, 435
  colon cancer, 98, 154
  corn borer, 316, 430, 572
  corn production, 593
  dust exposure, 22
  fishing pressure, 95, 153, 352, 354, 360
  fitness, 123, 124, 128, 133, 136, 138, 141, 349
  Francis, 62
  Galileo, 514
  growth, 429
  Heagle mean ozone, 4, 33, 80, 81, 95, 109, 111, 118
  Heagle ozone plot, 144, 147
  Heagle soybean, 411, 515, 518, 531, 572
  heart rate, 30
  hospital days, 35, 95, 156
  Lauri-Alberg, 394, 460
  Linthurst–all variables, 463, 465, 482, 483
  Linthurst–five variables, 161, 211, 215, 223, 227, 322, 377, 463
  listening–reading data, 292
  peak flow, 96, 152
  pine salt tolerance, 402, 403, 427
  precipitation, 263
  Pseudomona dermatis, 34
  radiation–seed weight, 34, 95, 155, 202, 348–350, 365
  renal function, 440
  sand, silt, clay mix, 395, 460
  soil moisture, 511
  soil organic matter, 93, 150
  soil phosphorus, 310, 322
  solar radiation, 32, 203
  stolen timber prediction, 422, 430
  temperature–herbicide, 318, 572
  watershed, 179, 232, 428, 460
Defining matrix, 101–103
Degrees of freedom, 8, 126, 190
  for a quadratic form, 103
Derivative, 237
Derivative-free method, 497, 525
DFBETAS, 361, 364
DFFITS, 361, 363
Dimensionality, 184
Distance between two vectors, 55
Dummy variables, 269, 272
Durbin–Watson test for independence, 354
Effects model, 271, 546, 547
Eigenanalysis, 57, 435, 437, 471
Eigenvalues, 57, 436, 437, 447
Eigenvectors, 57, 436, 437, 447, 448
Elimination of variables, 207
Equations
  consistent, 50, 51
  inconsistent, 50
Equitable distribution property, 558, 559, 561
Errors-in-variables model, 334
Estimability, 545
Estimable, 276
Estimable functions, 545, 546, 549, 553, 554
  general form, 554, 562, 566, 598
  properties for balanced data, 557
  properties for unbalanced data, 558
Estimated generalized least squares, 421, 508, 574, 588
Estimated means, 80
Estimates, regression coefficient, 80
Estimation, 206, 207
  least squares,
Experimental designs, 92
External Studentization, 342
Extrapolation, 206, 207, 256, 524
F-statistic, 117
F-to-enter, 214, 226
F-to-stay, 214, 226
Factoring matrix products, 84
Fixed effects model, 573
Forward selection, 213, 215
Full model, 126
Gauss–Newton method, 496
  modified, 497
General linear hypothesis, 119, 308
General linear model, 553, 596
Generalized inverse, 53, 75, 282, 553
Generalized least squares, 330, 397, 411, 413, 417, 418, 509, 573
Generalized ridge regression estimators, 461
Geometry of least squares, 183
Gram-Schmidt orthogonalization, 74, 243
Grid search, 496
Harmonic mean, 551
Heterogeneous variances, 328
Heteroscedastic errors, 507
High leverage points, 330
Homogeneity of intercepts, 291
Homogeneity of regressions, 271, 288, 306
Homogeneity of slopes, 290
Hypothesis
  alternative, 17
  null, 17
Inconsistent, 50
Indicator matrix, 272
Indicator variables, 269, 272
Influence statistics, 331, 361
Influential data points, 326, 330
Information criteria, 220, 225
Instrumental variables, 337
Intercept,
Inverse of diagonal matrix, 46
Iterative reweighted least squares, 508
Jackknife residuals, 342
Join point, 493
Joint confidence intervals, 135, 172
Joint confidence regions, 139, 172
Joint prediction regions, 142
Kurtosis, 327
Lack of fit, 146, 240
Lack-of-fit sum of squares, 241
Ladder of transformations, 399
Latent roots, 57
Latent vectors, 57
Leaps-and-bounds algorithm, 211
Least squares estimation, principle, 3, 494
Least squares means, 610
Leverage plots, 359
Likelihood function, 77, 588
Likelihood ratio procedure, 501
Likelihood ratio tests, 589
Linear functions, 82
  mean of, 86
  variance of, 86
Linear transformation, 83
Linear-by-linear interaction, 253
Linearly dependent, 197
Linearly independent, 38, 48, 50
Logistic regression, 509
Logit transformation, 404, 492, 510
LSMEANS, 314, 564, 567, 584, 595
Mallows' Cp, 220, 223
Marquardt's compromise, 497
Matrix, 37
  addition, 40
  column space of, 39
  decomposition of, 58
  determinant, 42, 57
  diagonal, 39
  elements of, 38
  full rank, 38, 79, 273
  generalized inverse, 53
  idempotent, 55, 80
  identity, 39
  inverse, 44, 79
  multiplication, 40
  nonnegative definite, 60, 105
  nonsingular, 38, 44
  not of full rank, 273
  order of, 38
  P, 80
  projection, 55, 80, 187, 331
  rank of, 38, 58, 184
  real, 57
  row operations, 51
  singular, 38, 44
  square, 39
  symmetric, 40, 56, 57
  transpose, 40
  transpose of product, 42
  variance–covariance, 82
Maximum likelihood estimator, 77, 325, 410, 507, 508, 573, 574, 588
Maximum R-square, 467
mci, multicollinearity index, 371, 473
Mean square, 108
Mean square error of prediction, 228
Mean square expectations, 10
Mean squared error, 209, 443
Means model, 271, 274, 286, 546
Measurement error, 29
Minimum variance property, 328
Minor, 43
Mixed model analysis, 615
Mixed models, 573, 574, 615
Model
  autocatalytic growth, 490
  autoregressive, 588
  Bertalanffy's, 491
  centered, 33
  exponential decay, 405, 487
  exponential growth, 405, 487, 495
  first-order autoregressive, 419
  fixed effects, 573
  full rank, 75, 76
  general mixed linear, 586
  Gompertz growth, 490, 491
  intrinsically linear, 405, 487
  intrinsically nonlinear, 2, 487
  inverse polynomial, 406, 490
  linear,
  logistic growth, 406, 490, 491, 510
  Mitscherlich, 489, 511
  mixed, 573, 574, 579, 593
  monomolecular growth, 428, 489, 491
  no intercept, 21
  nonlinear, 2, 398, 485, 486
  one-way, 271
  p independent variables, 75
  polynomial response, 406
  random, 574, 586
  random coefficient regression, 584, 587
  segmented polynomial, 493
  split-plot, 579, 587, 591
  two-level nested, 590
  two-term exponential, 488, 502, 511
  two-way cross-classified, 590, 591
  two-way with covariate, 295
  Weibull, 428, 492, 504, 512, 515, 524, 534
Model validation, 228
MSE (mean squared error), 443, 446
Multicollinearity index, 371
Multicollinearity problem, 240
Multivariate normal distribution, 86
Mutually independent, 86
Near-singularity, 433
Nelson, L. A., 316
Nested models, 132
Newton–Raphson method, 588
NID,
Noncentral chi-square, 116, 117
Noncentrality parameter, 116, 117
Nonestimable, 273, 276, 548
Nonestimable functions, 276
Nonlinear models, 332, 485, 486
Nonnormality, 327, 398
  impact of, 327
  tests for, 358
Nonunique solution, 273
Normal equations, 4, 78
Normal order statistics, 356
Normal plot, 327
  interpretation, 357
Normality, 77, 325, 326
  not required for least squares estimation, 77
Observational data, 177, 463
Odds ratio, 510
One-way analysis of variance model, 575
Order statistics, 356
Ordinary least squares, 325, 413, 467
Orthogonal, 209
Orthogonal polynomial coefficients, 106
Orthogonal polynomials, 242
Orthogonal quadratic forms, 104
Orthogonal transformations, 54
Orthogonality property, 558, 559, 561
Outlier, 326, 330, 348
Outlier in the residuals, 331
Over-defined model, 503
Overparameterized, 273
Parameter,
Parameter effects curvature, 501
Partial hypotheses, 554, 559
Partial regression coefficient, 76
Partial regression leverage plots, 359, 400
Partial sum of squares, 122, 130, 131, 134, 560
Polynomial models, 132, 235, 236, 250, 400, 485, 515, 520
  cubic, 239
  degree of, 250
  first degree, 250, 251
  higher order, 236, 251
  interaction term, 252
  order of, 250
  risk of overfitting, 256
  second degree, 252, 253, 520
  second-order, 236
  third degree, 255
Population marginal means, 564, 566, 610
Potentially influential, 331
Power family of transformations, 399, 408
Power of a test, 118
Precision, measures of, 11
Predicted values,
Prediction, 6, 90, 175, 176, 206, 207, 249
Prediction error, 14
Prediction interval, 136, 176
PRESS statistic, 230
Principal component, 436, 438, 447, 471, 473, 475, 476
Principal component analysis, 61, 64, 433, 447, 455, 463, 466, 471, 479, 482, 483
Principal component regression, 433, 445, 446, 450, 455, 463, 466, 476, 479, 483
Principal component regression estimates, 451
Principal component scores, 64
Principal components, 64
Principle of parsimony, 220
Prior information, 250
Probability density function, 77, 86, 87
Probability distribution, 115
Probit analysis, 492
Probit transformation, 404
Problem areas
  collinearity, 326
  influential data points, 326
  misspecified model, 326
  near-linear dependencies, 326
  outliers, 326
PROC GLM, 283, 581
PROC MIXED, 588, 589
PROC REG, 211
Product moment correlation, 50
Projection, 55, 186, 187, 437
Pure error, 143, 146, 241
Pure error sum of squares, 241
Pythagorean theorem, 47, 189
Q, hypothesis sum of squares, 120, 126
Quadratic forms, 101, 102
  distribution of, 115
  expectations of, 113
Quadratic model, 236
Quantitative variables as class variables, 270
R-notation, 129
RANDOM statement, 581, 583, 607
Random vectors, 77, 82, 86
  linear functions of, 82
  linear transformation, 83
Randomized complete block design, 577, 579, 593
Recursive residuals, 343, 344
Reduced model, 126
Reference cell model, 280
Regression through the origin, 21
Regression coefficients, properties of, 87
Regression diagnostics, 341
Regression sum of squares, 110
Relative efficiency, 420
REML, 589
Reparameterize, 192, 198, 244, 273
Residual, 3, 6,
Residual mean square, 220, 222
Residuals vector, 81, 187
Response curve modeling, 249
Restricted maximum likelihood, 573, 574, 589, 616
Ridge regression, 445, 446, 461
Robust regression, 326
Row marker, 438, 442
RSQUARE method, 211
RSTUDENT, 342
Runs test, 353
  normal approximation, 353
Sample-based selection, 209
Satterthwaite approximation, 582, 592, 609, 616
Satterthwaite option, 616
SBC criterion, 220, 225, 589
Scalar, 39
Scalar multiplication, 42
Scaled independent variables, 434, 435, 447, 471
Scheffé joint prediction intervals, 143
Scheffé method, 138, 172, 507
Second-degree polynomial model, 250
Sequential hypotheses, 554, 559
Sequential sum of squares, 131, 132, 197, 559
Shapiro–Francia test for normality, 359
Significance level to enter, 214
Significance level to stay, 214
SIMEX estimator, 337
Simultaneous confidence statements, 137
Singular value decomposition, 61, 435, 437, 447, 471
Singular values, 61
Singular vectors, 61, 63
Skewness coefficient, 327
Slope,
Space, 184
Space, n-dimensional, 184
Spatial relationship, 54
Split-plot design, 579, 593
SS(Model), 108
SS(Regr), 110, 451
SS(Res), 108
Standardized residual, 342
Steepest descent method, 497
Stein shrinkage, 445
Stepwise regression methods, 213, 467
  warnings, 219
Stepwise selection, 214, 215, 218, 468
Stopping rules, 206, 214, 220
Studentized residual, 342
Subset, 213
Subset model, 205, 209
Subset size, criteria for choice of, 220
Subspace, 48, 49, 184, 187
Sum of squares
  corrected,
  model, 21, 108
  of a linear contrast, 102
  residual, 21, 108
  uncorrected,
Symmetry, 56
t-statistic, 117
t-test, 17
Testable hypothesis, 284, 546, 553, 559
Testing equality of variances, 291
Transformation, 397
  arcsin, 404, 408
  Box–Cox, 409, 428
  Box–Tidwell, 400
  ladder of, 399
  logarithmic, 411
  logit, 404
  one-bend, 399
  power family, 398, 399, 400, 409, 509
  probit, 404
  to improve normality, 327, 409
  to simplify relationships, 398, 399
  to stabilize variance, 328, 407, 409
  two-bend, 398, 404
Trigonometric models, 235, 245, 485
Trigonometric regression, 245
Two-way classified data, 284
Type I hypotheses, 553
Type III hypotheses, 554
Unbalanced data, 545, 593
Uniquely estimated, 283
Univariate confidence intervals, 135, 171, 176
Uses of regression, 206
Validation, 230
Validity of assumptions, 326
Variable
  dependent,
  independent,
Variable selection, 205, 206
  effects of, 208
  error bias, 209
Variance
  heterogeneous, 29, 328, 398
  of linear functions, 11, 22
Variance component problems, 573
Variance components, 575
Variance decomposition proportions, 373
  for linear functions, 376
Variance inflation factor, VIF, 372, 473
Variance of
  adjusted treatment means, 300
  contrasts, 86
  estimates, 12, 13
  mean, 85
  predictions, 14
Variance–covariance
  of linear transformation, 83
  of regression coefficients, 88
  of residuals, 90
Variances, heterogeneous, 398
Vector, 39
  addition, 48
  geometric interpretation, 46
  length of, 47
  space defined by, 47
Vectors
  linearly independent, 48–50
  orthogonal, 49, 54, 435
VIF, Variance inflation factor, 473
Wald methodology, 500, 514
Wald statistic, 500
Weber, J. B., 318, 572
Weibull probability distribution, 492, 524
Weighted least squares, 328, 397, 413–415, 507, 552
X-space, 184

Springer Texts in Statistics (continued from page ii)

Nguyen and Rogers: Fundamentals of Mathematical Statistics: Volume I: Probability for Statistics
Nguyen and Rogers: Fundamentals of Mathematical Statistics: Volume II: Statistical Inference
Noether: Introduction to Statistics: The Nonparametric Way
Nolan and Speed: Stat Labs: Mathematical Statistics Through Applications
Peters: Counting for Something: Statistical Principles and Personalities
Pfeiffer: Probability for Applications
Pitman: Probability
Rawlings, Pantula and Dickey: Applied Regression Analysis
Robert: The Bayesian Choice: A Decision-Theoretic Motivation
Robert: The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation, Second Edition
Robert and Casella: Monte Carlo Statistical Methods
Santner and Duffy: The Statistical Analysis of Discrete Data
Saville and Wood: Statistical Methods: The Geometric Approach
Sen and Srivastava: Regression Analysis: Theory, Methods, and Applications
Shao: Mathematical Statistics
Shorack: Probability for Statisticians
Shumway and Stoffer: Time Series Analysis and Its Applications
Terrell: Mathematical Statistics: A Unified Introduction
Whittle: Probability via Expectation, Fourth Edition
Zacks: Introduction to Reliability Analysis: Probability Models and Statistical Methods