Applied Multivariate Analysis
Neil H. Timm
With 42 Figures

Springer
New York, Berlin, Heidelberg, Barcelona, Hong Kong, London, Milan, Paris, Singapore, Tokyo

Springer Texts in Statistics
Advisors: George Casella, Stephen Fienberg, Ingram Olkin

Neil H. Timm
Department of Education in Psychology
School of Education
University of Pittsburgh
Pittsburgh, PA 15260
timm@pitt.edu

Editorial Board

George Casella
Department of Statistics
University of Florida
Gainesville, FL 32611-8545
USA

Stephen Fienberg
Department of Statistics
Carnegie Mellon University
Pittsburgh, PA 15213-3890
USA

Ingram Olkin
Department of Statistics
Stanford University
Stanford, CA 94305
USA

Library of Congress Cataloging-in-Publication Data
Timm, Neil H.
Applied multivariate analysis / Neil H. Timm.
p. cm. (Springer texts in statistics)
Includes bibliographical references and index.
ISBN 0-387-95347-7 (alk. paper)
Multivariate analysis. I. Title. II. Series.
QA278 .T53 2002
519.5’35–dc21 2001049267

ISBN 0-387-95347-7
Printed on acid-free paper.

© 2002 Springer-Verlag New York, Inc.
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed in the United States of America.
SPIN 10848751
www.springer-ny.com
Springer-Verlag New York Berlin Heidelberg
A member of BertelsmannSpringer
Science+Business Media GmbH

To my wife Verena

Preface

Univariate statistical analysis is concerned with techniques for the analysis of a single random variable. This book is about applied multivariate analysis. It was written to provide students and researchers with an introduction to statistical techniques for the analysis of continuous quantitative measurements on several random variables simultaneously. While quantitative measurements may be obtained from any population, the material in this text is primarily concerned with techniques useful for the analysis of continuous observations from multivariate normal populations with linear structure. While several multivariate methods are extensions of univariate procedures, a unique feature of multivariate data analysis techniques is their ability to control experimental error at an exact nominal level and to provide information on the covariance structure of the data. These features tend to enhance statistical inference, making multivariate data analysis superior to univariate analysis.

While in a previous edition of my textbook on multivariate analysis I tried to precede a multivariate method with a corresponding univariate procedure when applicable, I have not taken this approach here. Instead, it is assumed that the reader has taken basic courses in multiple linear regression, analysis of variance, and experimental design. While students may be familiar with vector spaces and matrices, important results essential to multivariate analysis are reviewed in an early chapter. I have avoided the use of calculus in this text. Emphasis is on applications to provide students in the behavioral, biological, physical, and social sciences with a broad range of linear multivariate models for statistical estimation and inference, and exploratory data analysis procedures useful for investigating relationships among a set of structured variables. Examples have been selected to outline the process one employs in data analysis for checking model assumptions and model development, and for exploring patterns that may exist in one or more dimensions of a data set.

To successfully apply methods of multivariate analysis, a comprehensive understanding of the theory and how it relates to a flexible statistical package used for the analysis has become critical. When statistical routines were being developed for multivariate data analysis over twenty years ago, developing a text using a single comprehensive statistical package was risky. Now, companies and software packages have stabilized, thus reducing the risk. I have made extensive use of the Statistical Analysis System (SAS) in this text. All examples have been prepared using a Windows version of SAS. Standard SAS procedures have been used whenever possible to illustrate basic multivariate methodologies; however, a few illustrations depend on the Interactive Matrix Language (IML) procedure. All routines and data sets used in the text are contained on the Springer-Verlag Web site, http://www.springer-ny.com/detail.tpl?ISBN=0387953477, and the author’s University of Pittsburgh Web site, http://www.pitt.edu/~timm.

Acknowledgments

The preparation of this text has evolved from teaching courses and seminars in applied multivariate statistics at the University of Pittsburgh. I am grateful to the University of Pittsburgh for giving me the opportunity to complete this work. I would like to express my thanks to the many students who have read, criticized, and corrected various versions of early drafts of my notes and lectures on the topics included in this text. I am indebted to them for their critical readings and their thoughtful suggestions. My deepest appreciation and thanks are extended to my former student Dr. Tammy A. Mieczkowski, who read the entire manuscript and offered many suggestions for improving the presentation. I also wish to thank the anonymous reviewers who provided detailed comments on early drafts of the manuscript, which helped to improve the presentation. However, I am responsible for any errors or omissions of the material included in this text. I also want to express special thanks to John Kimmel at Springer-Verlag. Without his encouragement and support, this book would not have been written.

This book was typed using Scientific WorkPlace Version 3.0. I wish to thank Dr. Melissa Harrison, Ph.D., of Far Field Associates, who helped with the LaTeX commands used to format the book and with the development of the author and subject indexes. This book has taken several years to develop, and during its development it went through several revisions. The preparation of the entire manuscript and every revision was performed with great care and patience by Mrs. Roberta S. Allan, to whom I am most grateful. I am also especially grateful to the SAS Institute for permission to use the Statistical Analysis System (SAS) in this text.

Many of the large data sets analyzed in this book were obtained from the Data and Story Library (DASL) sponsored by Cornell University and hosted by the Department of Statistics at Carnegie Mellon University (http://lib.stat.cmu.edu/DASL/). I wish to extend my thanks and appreciation to these institutions for making these data sets available for statistical analysis. I would also like to thank the authors and publishers of copyrighted

Subject Index

matrix normal distribution, 91; multivariate F distribution, 105; multivariate normal (MVN), 84; multivariate t distribution, 104; Student t-distribution, 99; Type I and Type II Beta (multivariate, 102; univariate, 101); univariate normal, 84; Wishart distribution, 96
Designs with empty cells, 266
Determinant: column expansion, 51; covariance matrix, 80 (distribution of, 98); definition, 50; eigenvalues, 70; partitioned matrix, 52; row expansion, 51
Diagonal matrix, 29
Diagonalization, 42: symmetric matrix, 69; two symmetric matrices, 70
Dimension of vector space, 13
Direct product, 33
Direct sum of matrices, 35
Direct sum of vector spaces, 17, 18
Direction
cosines, 22
Disconnected two-way designs, 267
DISCRIM procedure, 429, 440, 441: POOL option, 431
Discriminant analysis: and classification analysis, 426; Egyptian skull data example, 429; in MANCOVA, 240; in MANOVA, 238; Latin square design, 280; logistic, 439; multiple groups, 434 (dimensionality, 435; example, 440; plots, 442); tests of hypotheses (coefficients, 422; discriminant functions, 424); two groups, 151, 420 (assumptions, 420; CANONICAL option, 155; correlation weights, 153; DISCRIM procedure, 430; plots, 430; standardized coefficients, 152; Σ1 ≠ Σ2, 426); variable selection, 429 (stepwise, 437; test of additional information, 438)
Discriminant scores, 151: normalized, 151
Dissimilarity measure: definition, 516
Distance: Euclidean, 14 (dissimilarity measure, 517); Mahalanobis, 21
Distance in the metric of, 22
Distribution of: Beta hat, 112; |S|, 98; Hotelling’s T² statistic, 99; Mahalanobis distance, 94; partitioned covariance matrix, 97; partitioned matrices, 97; quadratic form, 94; random normal matrix, 90; sample variance, 93; vech S, 90
Dot product: matrix, 34; vector, 13
Double eigenequation, 70
Double multivariate model (DMM), 400: example, 404
Duplication matrix, 39
Dynamic multipliers, 585
Dynamic structural model, 585
Edward’s data, 294
Egyptian skull data, 429
Eigenequation, 68: double, 70; eigenvalue, 67; eigenvector, 68
EIGVAL function, 299
Elementary permutation, 29
Elementary transformations, 43
Elimination matrix, 39
Elliptical distribution, 85, 605
Endogenous variable, 316, 557
Equality of covariance matrices: test of, 133 (example, 135)
Error rates: actual error rate, 427; apparent correct error rate, 427; apparent error rate, 427; bias correction, 428; expected actual error rate, 428; hold out, 428; leave-one-out, 428; resubstitution, 427; true error rate, 428
Estimable functions: MANCOVA, 226; nonorthogonal designs, 266; one-way MANCOVA, 229; one-way MANOVA, 220; solution, 60; two-way additive MANOVA, 254; two-way MANOVA, 248; two-way nested design, 275
ESTIMATE statement, 237, 260
Estimation: least squares, 108
Euclidean distance matrix, 517
Euclidean matrix norm, 31
Euclidean norm squared, 21
Euclidean space
Evaluating expected mean squares, 391: Hartley’s method of synthesis, 377
Exogeneity, 604: and Granger noncausality, 606; strong, 606; weak, 211, 317, 605
Exogenous variable, 316, 557
Expected value: random matrix, 80; random vector, 80
Exploratory factor analysis, 496: and regression, 497; communality, 499; Di Vesta and Walls example, 512; eigenvectors, 498; estimation (Howe’s determinant method, 505; ML method, 505; Rao’s canonical factors, 505; unweighted least squares, 502; weighted least squares, 506); estimation of communalities, 503; estimation of factor scores, 509; factor rotation, 507; Guttman’s bound, 511; Heywood case, 501; indeterminacy (canonical correlation characterization, 501; principal component characterization, 501); interpretation, 511; iterative factoring method, 503; loading matrix, 498; measure of sampling adequacy, 507; model assumptions, 497; model definition, 497; number of factors, 500; number of factors to retain, 511; of correlation matrix, 499; of covariance matrix, 497; performance assessment example, 511; principal component method, 502; principal factoring method, 502; rotation (varimax method, 508); scale invariance, 499; Shin’s example, 512; simple structure, 501; structure matrix, 499; test of model fit, 506; transformation problem, 500; unique variance, 499
Extended linear hypotheses, 286
Extended linear model, 335
F-approximation: Bartlett-Lawley-Hotelling trace, 103; Bartlett-Nanda-Pillai trace, 103; in mixed models, 356; Roy’s statistic, 103; Wilks’ Λ, 102
F-distribution, 99: and Beta distribution, 101; and Hotelling’s T², 99, 101; multivariate, 105; noncentrality parameter, 99; table explanation, 610; table of critical values, 614
Factor analysis, see Confirmatory, 496
Factor analysis, see Exploratory, 496
FACTOR procedure, 453, 460, 464
Factor scores, 509: estimation (Bartlett’s method, 509; regression method, 510); output in SAS, 476
Factoring a matrix, 42
FASTCLUS procedure, 536
Feasible generalized least squares estimate, 109: in SUR, 313; iterative, 313
Finite intersection tests: in MANOVA, 232
FINV function, 182
Fisher’s discriminant function, 151, 421: in multiple linear regression, 422; robustness, 424
Fisher’s Iris data, 443
Fisher’s protected t-tests, 153
Fit functions in SEM, 562
Fit indices in CFA, 574
Fundamental theorem in scaling, 543
g-inverse, 47: partitioned matrix, 49
Gauss’ matrix inversion technique, 45
Gauss-Markov model, 106: estimate of Beta hat, 109; Wald’s statistic, 109
General linear model, 106: multivariate, 111; noncentrality matrix, 113
General solution linear equations, 56
Generalized alienation coefficient, 81
Generalized coefficient of determination, 485
Generalized Gauss-Markov estimator, 109
Generalized inverse: definition, 47
Generalized least squares, 109
Generalized variance, 80, 98: expected value, 98; variance of, 98
Generating MVN data, 124: data sets A, B, and C, 124
Geometry: ANOVA model, 18; correlation, 25; generalized least squares, 73; least squares, 67; linear model, 67; orthogonal transformation, 62; projection matrix, 64
glimmix macro, 359
GLM procedure, 155, 163, 171, 181, 240, 259, 262, 278, 295, 329, 378, 391: RANDOM statement, 181
GLM vs MIXED, 386
GMANOVA model, 115: as a SUR model, 326; definition, 320; ill-conditioned, 475; ML estimate of B matrix, 321; ML estimate of Σ, 321; one group example, 328; Rao-Khatri reduction, 323; selection of covariates, 325; test of fit, 324; two group example, 330
Goodman-Kruskal gamma, 538
Gower metric, 517
Gram-Schmidt orthogonalization, 15
Gram-Schmidt orthonormalization: example, 16
Gramian matrix, 69
H and E matrices, 102: multivariate, 113; univariate, 108
Hadamard product, 34
Hansen’s data, 465
Harris’ test of circularity, 142
Hierarchical clustering methods, 523
Homogeneous system of equations, 56
Hotelling’s T² distribution, 99: and F-distribution, 101; mean and variance of, 101
HOVTEST option, 235
Idempotent matrix, 26: properties, 54
Identification: in CFA, 572; in SEM, 597; local, 561; necessary condition, 561; path analysis, 582
Identity matrix, 29
Image factor analysis, 500
IML procedure, 298
Independence: distribution-preserving, 114; matrix quadratic forms, 97; mean and covariance, 100; quadratic forms, 94; test of, 136, 143 (example, 145); zero covariance, 82
Independent: random variables, 76; random vectors, 82
INDSCAL models, 547
Influence measures: Cook’s distance, 194; covariance ratio, 196; DFBETA, 195; DFFITS, 194; Studentized residuals, 194; Welsch-Kuh statistic, 195
Information criteria: in confirmatory factor analysis, 579; in exploratory factor analysis, 511; multivariate regression, 201; univariate mixed models, 360
Inner product: definition, 13
Inverse matrix, 42
Inverted Beta distribution: multivariate, 102; univariate, 101
Inverted Wishart distribution, 96
J matrix, 30: properties, 39
Jaccard coefficient, 521: METHOD=dJaccard, 541
Jacobs and Hritz data, 234
Joseph Raffaele’s data, 257
Kernel space, 61: complement of, 61; dimension, 61
Kronecker product, 33
Kurtosis, 82, 121
Lp norm: in cluster analysis, 517; matrix, 72; vector, 21
Largest root test, see Roy’s statistic, 104
Latent variable, 557: endogenous, 559; exogenous, 559
Latin square design, 279
Law of cosines, 22
Least squares estimate: Beta, 108; Gauss-Markov, 109
Lee’s data, 332
Left inverse, 47
Levine and Saxe data, 272
Likelihood ratio statistic: multivariate, 102; univariate, 109; Wilks’ Λ, 102
Linear dependence
Linear equations, 55: consistent, 55; general solution, 56; homogeneous, 56; nonhomogeneous, 56; parametric functions, 59; solution (g-inverse, 59; reparameterization, 56; restrictions, 56); unique solution, 56
Linear independence
Linear model, 67: ANCOVA, 107; ANOVA, 107; Beta vector, 106; F-test, 108; general linear hypothesis, 108; GLS estimate beta, 108; ML estimate beta, 109; multivariate, 110; OLS estimate beta, 108; regression, 108; univariate, 106 (mixed, 357)
Linear space
Linear transformation, 61: nonsingular, 61; one-to-one, 61; orthogonal, 62; rotation, 62, 69; singular, 61
LISREL notation, 567
LSMEANS statement, 238, 241
Lubischew’s flea beetle data, 440
M estimate, 119
M-sphericity, 407
Mahalanobis distance, 22, 82: and T², 101; distribution of, 94; in classification analysis, 427; in cluster analysis, 517; robust estimate, 123; two vectors, 82
Manifest variable, 557
MANOVA statement, 172
MANOVA-GMANOVA model, 115, 338
Marascuilo’s data, 385
Mardia’s test of normality, 121: example, 124; table explanation kurtosis, 610; table explanation skewness, 610; table of critical values skewness, 622; table of lower critical values kurtosis, 623; table of upper critical values kurtosis, 624
Matrix: addition, 26; adjoint, 51; adjugate, 51; affine projection, 73; canonical form, 42; commutation, 38; definition, 25; diagonal, 29; direct sum, 35; duplication, 39; elementary, 43; elementary permutation matrix, 29; elimination, 39; equality, 26; equicorrelation, 83; factorization, 44 (square root, 69); Hadamard, 54; Hadamard product, 34; idempotent, 26; identity, 29; inverse, 52 (infinite series, 567); J matrix, 30; Kronecker product, 33; lower triangular, 69; minor, 51; multiplication, 26 (by scalar, 26); nilpotent, 30; nonsingular, 42; norm, 30; order, 25; orthogonal, 42; orthogonal projection, 64; outer product, 27; partial correlations, 92; partitioned, 32; permutation, 30; positive definite, 69; positive semidefinite, 69; postmultiplication, 29; premultiplication, 29; projection, 49, 55, 64; quadratic forms (distribution of, 97); random, expected value, 80; rank, 41; singular, 42; skew-symmetric, 28; square, 25, 27; square root, 69; sum of squares and cross products, 88; Toeplitz, 54; trace, 30 (generalized, 401); transpose, 28; triangular (lower or upper, 29); tripotent, 30; vec-permutation, 37
Matrix norm, 71: Euclidean, 71; max norm, 72; spectral, 72; von Neumann, 72
Matrix normal distribution, 90: covariance structure, 91; mean structure, 91
Matrix properties: addition, 26; determinant, 52; direct sum, 35; double eigenequation roots, 71; eigenvalues, 70; generalized inverse, 49; Hadamard product, 34; inverse matrix, 46; Kronecker product, 33; matrix norm, 31; Moore-Penrose inverse, 47; multiplication, 26; rank, 45; trace, 30; transpose, 28; vec operator, 36
Mauchly’s test of circularity, 140: randomized block ANOVA, 165; sphericity, 140
Maximum eigenvalue, 72
Maximum likelihood estimate: Beta, 108, 109; variance, 108
MDS procedure, 549, 550, 553
Mean vector, 80: equality - two groups, 149 (example, 154); estimate, 88
Measures of association: canonical correlation, 485; coefficient of determination, 485; vector correlation coefficient, 485; Wilks’ Λ, 485
MGGC model, see sum-of-profiles, 116
Minimum eigenvalue, 72
Minkowski norm, 21, 517
Minor, 51
Missing data, 326: MAR, 327; MCAR, 327
MIXED procedure, 372, 378, 397: CONTAIN option, 410; DDFM option, 372; Kronecker Σ = ΣA ⊗ ΣB, 408; METHOD option, 372; TYPE option for Σ, 373
Mixed procedure, 182: RANDOM statement, 181
MIXED vs GLM, 378
MODEL statement, 172
Moore and McCabe cheese data, 458
Moore-Penrose inverse, 47: construction of, 48
More general growth curve model, see sum-of-profile model, 116
MSUR model, 116: as SUR model, 341; definition, 339
MTEST statement, 214, 240
MULTEST procedure, 238
Multidimensional scaling, 516, 541: and cluster analysis, 550; classical method example, 549; classical metric method, 542; evaluation of fit, 544; fundamental theorem, 542; MDS procedure, 550; nonmetric, 544 (STRESS criterion, 546); number of dimensions, 549; plots, 550; principal coordinates, 544; rotation of dimensions, 551
Multilevel hierarchical models, 367
MULTINORM macro, 124
Multiple design multivariate, see SUR, 114
Multiple linear regression, 107, 186: adjusted R², 197; and discriminant analysis, 422; coefficient of determination R², 197; Cook’s distance, 194; covariance ratio, 196; estimate of Beta, 107; on principal components, 474; power calculations, 208; residual (DFBETA, 195; DFFITS, 194; externally Studentized, 194; internally Studentized, 194); test of hypotheses, 108; variable selection, 198; weighted LS estimate, 108
Multivariate: Beta plots, 122; chi-square distribution, 104; chi-square plots, 122; F-distribution, 104; Outlier, 126; t-distribution, 104
Multivariate analysis of covariance (MANCOVA): general linear model, 225; H and E matrices, 226; one-way classification, 230 (adjusted means, 229, 241; discriminant analysis, 240; example, 239; GLM procedure, 240; simultaneous confidence intervals, 231; tests of hypotheses, 229); test of additional information, 242; test of parallelism, 228; tests of hypotheses, 227; two-way classification, 256 (example, 261; test of parallelism, 261)
Multivariate analysis of variance (MANOVA), 111: extended linear hypotheses, 286; general linear hypothesis, 112; Latin square design, 279; one-way classification, 218 (discriminant analysis, 238; estimable functions, 220; example, 234; GLM procedure, 223, 235; protected F-tests, 230, 236; simultaneous confidence intervals, 230, 236; tests of hypotheses, 223; unequal Σi, 245; unequal Σi (example), 246); power calculations, 302; repeated measures, 282 (see Repeated measurement designs, 282); robustness of tests, 301; tests of hypotheses (unequal covariance matrices, 308); three-way classification, 274; two-way classification, 246 (additive model, 252; example, 257; simultaneous confidence intervals, 252, 260; test of additivity, 333; test of interaction, 249; tests of main effects, 250); two-way nested design, 274 (example, 276); two-way nonorthogonal design, 264 (connected, 268; disconnected, 267; empty cells, 266)
Multivariate association, 81, 485
Multivariate circularity, 146: test of, 147 (example, 148)
Multivariate general linear model (MGLM), 111
Multivariate hierarchical model, 415
Multivariate linear model, 111: chi-square statistic, 112; general hypothesis, 112; likelihood ratio statistic, 113; MANCOVA, 111; MANOVA, 111; ML estimate of B matrix, 112; ML estimate vec B, 111; power calculations, 301; vector form, 111
Multivariate mixed model (MMM), 116, 385: balanced designs, 394 (multivariate split-plot example, 395; two-way classification - example, 395); definition, 386; estimating the mean, 392; expected mean squares, 391; repeated measurements, 392; tests of hypotheses, 388
Multivariate normal distribution, 84: central limit theorem, 88; conditional distribution, 87 (covariance matrix, 87; mean, 87); constant density ellipsoid, 85; covariance matrix (maximum likelihood estimate, 88); estimation of covariance, 88; estimation of mean, 88; generation of, 124; independent, 84; kurtosis, 85; linear functions of, 87; of mean vector, 88; properties, 86; skewness, 85; test for, 121
Multivariate normality, test of, 121
Multivariate outlier: example, 126
Multivariate regression (MR), 111, 187: assessing normality, 197; estimation of B, 187 (deviation scores, 188; least squares, 187; maximum likelihood (ML), 187; simultaneous confidence intervals, 205; standardized scores, 188); estimation of deleted covariance matrix, 194; estimation of Σ, 193; example, 212 (MTEST statement, 214; REG procedure, 212; simultaneous confidence intervals, 214); exogeneity, 211; expected mean square error of prediction, 209; fitted values, 193; hat matrix, 193; influence measures (Cook’s distance, 195; covariance ratio, 196; DFBETA, 196; DFFITS, 196; externally Studentized residuals, 194; internally Studentized residuals, 194); on principal components, 474 (example, 476); power calculations, 301; prediction, 204 (simultaneous confidence intervals, 205); predictive validity, 490; random X matrix, 206, 490; reduction notation, 192; REG procedure, 192; regression coefficients, 187; residual matrix, 193 (deleted residual matrix, 193); see Multivariate linear model, 111; test of hypotheses, 189 (additional information, 191; lack of fit, 203; MTEST statement, 192; multivariate test criteria, 190; overall regression, 189; partial F-test, 202; row of parameter matrix B, 190); variable selection, 198, 212 (using Eq, 199; information criteria (AIC, BIC, etc.), 200; using MPRESSq, 199; using Rq², 198; stepwise, 201)
Multivariate trimming (MVT), 123
Napoir’s data, 538
Nation’s data, 553
Nested designs, 273
Nonhierarchical clustering methods, 530
Nonnegative definite matrix, 69
Nonnested models, 344: example, 347
Nonorthogonal two-way designs, 264: additive model, 268 (with empty cells, 269); connected, 268; disconnected, 267; example, 270; interaction model, 265 (with empty cells, 266); TYPE I, II, and IV hypotheses, 272
Nonrecursive model, 559, 580: block nonrecursive, 580
Nonsingular matrix, 42, 52
Norm: matrix, 30; vector, 14
Normal distribution: table explanation, 609; table of critical values, 611
Normal equations, 56
Normalized vector, 14
Null matrix, 29
Null space, 17
Null space, see Kernel, 61
Null vector
Obenchain’s data, 371
Oblique transformations, 509
Ochiai coefficient, 522
ODS option, 347
Oh notation, 77
oh notation, 76
Ordinary least squares estimate, 108
ORPOL function, 328
Orthogonal basis, 14
Orthogonal complement, 17: dimension of, 17; subspace, 18
Orthogonal decomposition, 20
Orthogonal matrix, 42
Orthogonal polynomials, 328
Orthogonal projection, 15, 64
Orthogonal transformation, 62: properties, 62
Orthonormal basis, 14
Orthonormal vector, 14
Outlier: detection with a plot, 126; detection with PCA, 449; example, 459
Output Delivery System (ODS), 461
Parametric functions, 59
Part canonical correlation analysis, 488: example, 495; tests of hypotheses, 488
Partial canonical correlation analysis, 488: example, 494; tests of hypotheses, 488
Partitioned matrix, 32: inverse, 46
Path analysis, 580: direct effects, 583; dynamic model, 585; identification, 582 (nonrecursive models, 583; recursive models, 582); indirect effects, 584; LINEQS statements, 589; model definition, 580; model equilibrium, 585; nonrecursive example, 590; nonrecursive model, 581; recursive example, 586; recursive model, 581
Permutation matrix, 30
Phi coefficient, 522
Plim: definition, 77
Plots: Beta, 122; CANDISC procedure, 541; chi-square, 122; gamma, 123; in multidimensional scaling, 550; non-normal chi-square, 126; scree, 461
Positive definite, 69
Positive semidefinite, 69
Power analysis: ANOVA example, 306; MANOVA example, 304; multivariate linear model, 303; two groups, 182 (example, 183)
Prediction: expected mean square error, 209; in MR, 204, 490; in SUR model, 314; in univariate mixed models, 368
Principal component analysis, 445: and cluster analysis, 465; and correspondence analysis, 458; and eigenvalues, 446; and eigenvectors, 446; and residual variance, 451; calculation of components, 451 (FACTOR procedure, 459; PRINCOMP procedure, 459); component scores, 465; covariance loadings, 448; effect of scaling, 450; FACTOR procedure, 461; in reduced rank regression, 474; interpretation, 460, 463, 464, 466; maximum eigenvalue, 446; number of components (eigenvalue criterion, 451; geometric mean, 449; proportion of variance, 449; scree plots, 456; tests of significance, 470, 473); orthogonal components, 446; outlier detection, 449 (example, 459); pattern loadings, 450; performance assessment example, 465; plotting components, 458; population model with Σ, 446; population model with P, 450; regression on components, 474; rotation, 447, 463; score matrix, 447; semantic differential ratings example, 461; standardized components, 447; test battery example, 460; tests using correlation matrix (dimensionality, 472; equal eigenvalues, 472); tests using covariance matrix (average of eigenvalues, 469; confidence interval maximum eigenvalue, 468; dimensionality, 470; eigenvectors, 469; equal eigenvalues, 470; proportion of variance, 471); using S, 455; variable selection, 457; with covariates, 453, 457
Principal coordinates, 516, 543
Principal curves, 458
PRINCOMP procedure, 453, 459
PROBF function, 183, 303
Profile analysis, 160: and randomized block mixed ANOVA, 164; definition, 160; one group, 160 (example, 162; simultaneous confidence intervals, 162); two groups (Σ1 = Σ2, 165; Σ1 ≠ Σ2, 175; simultaneous confidence intervals, 170); two groups Σ1 = Σ2 example, 171; two groups Σ1 ≠ Σ2 example, 176
Project talent data, 466
Projection (orthogonal), 15
Projection matrix, 49, 63: affine, 73; properties, 63
Protected F-tests, 230
Protected t-tests, 153
Protein consumption data, 534
Proximity measures, 516
Pseudo statistics, 532
Q-Q plots: Beta, 124; chi-square, 126; normal, 119, 122
Quadratic classification rule, 426: example, 430
Quadratic form, 72: distribution of, 100; independence, 94; max eigenvalue/root, 72; nonnegative definite Gramian, 72; positive definite, 72; positive semidefinite, 72
Ramus bone data, 126, 328
Random coefficient model, 352: best linear unbiased predictor (BLUP), 354; covariance matrix of BLUP, 355; covariance of Beta hat, 354; estimate of covariance matrix, 354 (ML estimate, 354; restricted ML (REML), 354); example, 371 (Mixed procedure, 372; RANDOM statement, 372; REPEATED statement, 372); ML estimate of Beta, 353; tests of hypotheses, 355 (Satterthwaite F test, 356; Wald statistic, 356); two-stage hierarchical model, 352
RANDOM statement, 181
Rank, 42: full rank, 42
Rank condition, 583
Rao’s F-approximation: Wilks’ Λ, 102
Rao’s score statistic, 109
Rayleigh quotient, 72
Recursive model, 559, 580: block recursive, 580
Reduced rank multivariate regression model, 476
Redundancy analysis, 487: index, 487
REG procedure, 212, 262
Regression, see Multivariate or Multiple, 106
Reliability, 578
Reparameterization, 56
Repeated measurement designs, 282: CGMANOVA, 319; crossover design, 289; double multivariate model, 400; extended linear hypotheses, 287 (example, 298; generalized contrast matrix, 287; Roy criterion, 292; simultaneous confidence intervals, 293; trace criterion, 292; Wald statistic, 294); multivariate split-plot design, 393; one-way classification, 283 (example, 294; mixed ANOVA model, 284, 380); power calculations, 305; see ANOVA, 283; tests of hypotheses (unequal covariance matrices, 308); with Kronecker structure, 403
REPEATED statement, 163, 172, 173, 296
Restricted linear model: univariate, 110
Restrictions, 56
Reticular action model (RAM), 564
Right inverse, 47
Robust estimate: breakdown point, 123; of covariance matrix, 123 (two groups, 160); of mean, 119, 123; of variance, 120
Rohwer’s data, 212, 492
Root, see Eigenvalue, 68
Rotation, 69: in exploratory factor analysis, 507; in multidimensional scaling, 552
Row space, 25, 42
Roy’s statistic, 103: F-approximation, 103
SAS/INSIGHT software: creating chi-square plots, 128; examining observations, 130; in multidimensional scaling, 553; invoking, 122; three-dimensional plots, 554
Scale free, 573
Scale invariance, 573
Scree plot, 456: cluster analysis, 531; exploratory factor analysis, 511; principal component analysis, 461
Seemingly Unrelated Regression (SUR), 311: asymptotic covariance matrix Beta hat, 313; CGMANOVA model, 116, 318 (example, 319); definition, 114; estimate of covariance matrix, 312; example, 316 (STEST statement, 317); exogeneity, 316; extended linear SUR model (tests of nonnested models, 346); FGLSE of Beta, 313; lack of fit test, 337; ML estimate of Beta, 312; nonnested models, 344 (example, 347); prediction, 314; simultaneous confidence intervals, 317; tests of hypotheses, 313
Semi-metric, 518
Shapiro-Wilk test, 124
Shin’s data, 460
Signed minor, 51
Similarity measure: conversion to dissimilarity, 520; correlation coefficient, 519; cosine, 520; definition, 519
Simultaneous confidence intervals: eigenvalues of covariance matrix, 469; extended linear hypotheses, 293; finite intersection tests, 233; GMANOVA model, 323; IML procedure, 214, 236, 240, 260, 263; MANCOVA, 230; MANOVA, 230; mean vectors - two groups, 150 (maximum contrast, 152; protected t-tests, 153); multivariate regression, 205; profile analysis (one group, 162; two groups, 169); random coefficient model, 356; SUR model, 317
Simultaneous test procedure, 104: ADJUST option, 237, 241; Bonferroni-Dunn, 154; Dunn-Šidák, 154; DUNNETT option, 263; finite intersection tests, 154, 231; MULTEST procedure, 238; multivariate regression, 206; multivariate t-distribution, 154; one way classification MANOVA, 237; one-way classification MANCOVA, 230; table explanation, 610; table of critical values, 620
Single link method, 523
Singular matrix, 42
Singular values, 69
Skew-symmetric matrix, 28
Skewness, 82, 121
Slutsky’s theorem, 77 Smith, Gnanadesikan and Hughes’ data, 244, 476 SOLUTION option, 240, 279 Spanning set, Sparks’ tobacco leaf data, 216 Spectral decomposition theorem, 69 Sphericity test of one sample, 139 test of several populations example, 142 Square root decomposition, see Cholesky, 69 Square root matrix, 69 Squared multiple correlation population, 87 SSTRESS criterion, 547 Standardized data matrix, 520 Stankov’s data, 513 Statistical distance, 21 STEPDISC procedure, 432 STEST statement, 317 STRESS criterion, 546 Structural equation models, 558 and multilevel models, 600 Bentler-Weeks model, 566 covariance structure of, 560 example, 594 model identification, 597 MODIFICATION option, 598 nested models, 598 exogeneity, 604 fit functions, 562 asymptotic distribution free, 562 identification, 561 latent growth, 602 LISREL notation, 559 McDonald’s COSAN model, 566 model equivalence, 567 model uniqueness, 561 multiple groups, 604 nonrecursive model, 559 notation, 559 path diagram, 559 symbols, 559 recursive model, 559 reticular action model (RAM), 564 see confirmatory factor analysis, 567 simultaneous equation models, see path analysis, 580 structural model, 559 tests of hypotheses, 562 with measurement error, 560 Student t-distribution, 99 and Beta distribution, 101 multivariate, 104 noncentrality parameter, 99 table explanation, 609 table of critical values, 613 Studentized maximum modulus see multivariate t-distribution, 105 Subspace, Sum of vector spaces, 17 Sum-of-profiles model, 116 example, 341 tests of hypotheses, 339 Sylvester’s law, 45 Symmetric gauge functions, 291 Symmetric matrix, 28 SYSLIN procedure, 314, 316, 319, 343 estimation options, 316 FIML, SUR, ITSUR, 317 T2-tests and two group discriminant function, 150 in discriminant analysis, 421 one sample statistic, 100 profile analysis one group, 161 two groups, 165 two mean vectors power calculation, 183 Σ1 ≠ Σ2, 156 Σ1 = Σ2, 149 two sample statistic, 100
using ranks, 160 Test of hypotheses additional information, 242 additivity in MANOVA, 333 bipartial canonical correlations, 489 canonical correlations, 483 circularity, 140 compound symmetry, 138 confirmatory factor analysis, 574 covariance matrices, equality Box’s M test, 134 Σ = Σo, 137 discriminant analysis, 422 eigenvalues of correlation matrix, 472 eigenvectors of covariance matrix, 470 equal covariance matrices, 133 equal eigenvalues of S, 470 extended linear, 286 fit in factor analysis, 506 GMANOVA model, 321 in principal component analysis, 468 independence, 143 MANCOVA, 225 MSUR model, 340 multivariate circularity, 147 multivariate mixed model (MMM), 388 multivariate regression, 189 multivariate repeated measurements, 401 nonnested models, 344 one-way MANCOVA parallelism, 227 one-way MANOVA, 218 parallelism in MANCOVA REG procedure, 240 part canonical correlations, 489 partial canonical correlations, 488 random coefficient model, 355 significant discriminant functions, 436 sphericity, 139 several populations, 142 sum-of-profiles model, 339 SUR model, 313 two mean vectors Σ1 = Σ2, 149 Σ1 ≠ Σ2, 156 using ranks, 160 Tests of location multiple groups unequal covariance matrices, 307 nonnormal data, 160 two group example Σ1 = Σ2, 154 Σ1 ≠ Σ2, 159 Tests of nonadditivity, 256 Three-dimensional plots, 553 Trace, 30 eigenvalues, 70 Trace statistic Bartlett-Lawley-Hotelling, 103 Bartlett-Nanda-Pillai, 103 Transformation oblique, 509 problem in factor analysis, 500 projection, 64 Transformation, see Linear, 61 Transforming data in SAS, 298 Transpose, 8, 28 TRANSREG procedure, 236, 240, 262, 336 Triangular inequality matrices, 31 vectors, 21 Triangular matrix lower, 29 upper or lower unit upper/unit lower, 29 Trimmed mean, 119 TTEST procedure, 430 Tubb’s pottery data, 244 TYPE I - IV sum of squares, 379 Type I Beta distribution see Beta distribution, 101 Type II Beta distribution see inverted Beta distribution, 101 TYPE option, 375 Ultrametric, 518 Unequal covariance
matrices tests for means, 308 Unfolding methods, 547 Unified theory LS, 109, 315 Uniformly most powerful, 103 Union-intersection see Roy’s statistic, 104 Unit vector, 15 Univariate linear model, 107 Univariate mixed model BLUP, 359 covariance structures, 359 definition, 357 estimation of Beta, 358 generalized randomized block design, 376 multilevel hierarchical model, 367 example, 381 residual analysis, 361 tests of hypotheses model fit, 361 variance components, 361 UNIVARIATE procedure, 126, 235, 373 USS function, 459 VARCLUS procedure, 524 Variable selection canonical correlation analysis, 492 in discriminant analysis, 423 in principal component analysis, 457 Multivariate regression, 215 Variance generalized, 98 in exploratory factor analysis, 498 in PCA, 446 ML estimate, 108 REML estimate, 108 Winsorized-trimmed, 120 Variance-covariance matrix see Covariance matrix, 80 Vec operator, 31 Vec-permutation matrix, 37 Vech operator, 38 Vector addition, column, cosine of angle, 14 definition, dimension order, direction cosines, 22 inner product, 13 laws, 12 linearly dependent, linearly independent, normalized, 14 orthogonal, 14 orthogonal projection, 15 position, random expected value, 79 uncorrelated, 82 scalar multiplication, unit, 15 Vector alienation coefficient, 485 Vector correlation coefficient, 198, 485 Vector norm definition, 21 distribution of, 99 Euclidean, 14 infinity norm, 21 max norm, 21 minimum, 21 Minkowski, 21 Vector space, basis of, 13 definition, dimension of, 13 direct sum, 17 Euclidean, existence, 13 intersection, 17 linear manifold, orthonormal, 15 subspace, null, uniqueness, 13 Vector subspace improper, orthocomplement, 17 VTOH option, 549 Wald statistic, 109, 110 extended linear hypotheses, 294 F-approximation, 110 SUR model, 313 F-approximation, 313 Ward’s method, 529 weak monotonicity, 542 weak monotonicity constraint, 542 Welsch-Kuh statistic, 195 Wilks’ distribution, 102 Wilks’ statistic, 102 Bartlett’s chi-square
approximation, 102 F-approximation, 102 Willerman’s brain size data, 432 Winsorized mean, 119 alpha trimmed, 119 Winsorized sum of squares, 120 Wishart distribution, 96 independence, 97 linear model noncentrality matrix, 113 noncentrality matrix, 96 with latent variables, 559 Zellner’s model see SUR, 114 Zullo’s data, 404