A Handbook of Statistical Analyses using SPSS

A Handbook of Statistical Analyses using SPSS

Sabine Landau and Brian S. Everitt

CHAPMAN & HALL/CRC
A CRC Press Company
Boca Raton London New York Washington, D.C.

© 2004 by Chapman & Hall/CRC Press LLC

Library of Congress Cataloging-in-Publication Data

Landau, Sabine.
A handbook of statistical analyses using SPSS / Sabine Landau, Brian S. Everitt.
p. cm.
Includes bibliographical references and index.
ISBN 1-58488-369-3 (alk. paper)
SPSS (Computer file). Social sciences—Statistical methods—Computer programs. Social sciences—Statistical methods—Data processing. I. Everitt, Brian S. II. Title.
HA32.E93 2003
519.5'0285—dc22 2003058474

This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.

Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage or retrieval system, without prior permission in writing from the publisher.

The consent of CRC Press LLC does not extend to copying for general distribution, for promotion, for creating new works, or for resale. Specific permission must be obtained in writing from CRC Press LLC for such copying.

Direct all inquiries to CRC Press LLC, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation, without intent to infringe.

Visit the CRC Press Web site at www.crcpress.com

No claim to original U.S. Government works
International Standard Book Number 1-58488-369-3
Library of Congress Card Number 2003058474
Printed in the United States of America
Printed on acid-free paper

Preface

SPSS, standing for Statistical Package for the Social Sciences, is a powerful, user-friendly software package for the manipulation and statistical analysis of data. The package is particularly useful for students and researchers in psychology, sociology, psychiatry, and other behavioral sciences, containing as it does an extensive range of both univariate and multivariate procedures much used in these disciplines.

Our aim in this handbook is to give brief and straightforward descriptions of how to conduct a range of statistical analyses using the latest version of SPSS, SPSS 11. Each chapter deals with a different type of analytical procedure applied to one or more data sets primarily (although not exclusively) from the social and behavioral areas. Although we concentrate largely on how to use SPSS to get results and on how to correctly interpret these results, the basic theoretical background of many of the techniques used is also described in separate boxes. When more advanced procedures are used, readers are referred to other sources for details.

Many of the boxes contain a few mathematical formulae, but by separating this material from the body of the text, we hope that even readers who have limited mathematical background will still be able to undertake appropriate analyses of their data. The text is not intended in any way to be an introduction to statistics and, indeed, we assume that most readers will have attended at least one statistics course and will be relatively familiar with concepts such as linear regression, correlation, significance tests, and simple analysis of variance. Our hope is that researchers and students with such a background will find this book a relatively self-contained means of using SPSS to analyze their data correctly.

Each chapter ends with a number of exercises, some relating to the data sets introduced in the
chapter and others introducing further data sets. Working through these exercises will develop both SPSS and statistical skills. Answers to most of the exercises in the text are provided at http://www.iop.kcl.ac.uk/iop/departments/BioComp/SPSSBook.shtml. The majority of data sets used in the book can be found at the same site.

We are grateful to Ms. Harriet Meteyard for her usual excellent word processing and overall support during the writing of this book.

Sabine Landau and Brian Everitt
London, July 2003

Distributors

The distributor for SPSS in the United Kingdom is

SPSS U.K. Ltd.
1st Floor St. Andrew’s House, West Street
Woking
Surrey, United Kingdom GU21 6EB
Tel. 0845 3450935
FAX 01483 719290
Email sales@spss.co.uk

In the United States, the distributor is

SPSS Inc.
233 S. Wacker Drive, 11th floor
Chicago, IL 60606-6307
Tel. 1(800) 543-2185
FAX 1(800) 841-0064
Email sales@spss.com

Contents

Preface
Distributors

1 A Brief Introduction to SPSS
  1.1 Introduction
  1.2 Getting Help
  1.3 Data Entry
    1.3.1 The Data View Spreadsheet
    1.3.2 The Variable View Spreadsheet
  1.4 Storing and Retrieving Data Files
  1.5 The Statistics Menus
    1.5.1 Data File Handling
    1.5.2 Generating New Variables
    1.5.3 Running Statistical Procedures
    1.5.4 Constructing Graphical Displays
  1.6 The Output Viewer
  1.7 The Chart Editor
  1.8 Programming in SPSS

2 Data Description and Simple Inference for Continuous Data: The Lifespans of Rats and Ages at Marriage in the U.S.
  2.1 Description of Data
  2.2 Methods of Analysis
  2.3 Analysis Using SPSS
    2.3.1 Lifespans of Rats
    2.3.2 Husbands and Wives
  2.4 Exercises
    2.4.1 Guessing the Width of a Lecture Hall
    2.4.2 More on Lifespans of Rats: Significance Tests for Model Assumptions
    2.4.3 Motor Vehicle Theft in the U.S.
    2.4.4 Anorexia Nervosa Therapy
    2.4.5 More on Husbands and Wives: Exact Nonparametric Tests

3 Simple Inference for Categorical Data: From Belief in the Afterlife to the Death Penalty and Race
  3.1 Description of Data
  3.2 Methods of Analysis
  3.3 Analysis Using SPSS
    3.3.1 Husbands and Wives Revisited
    3.3.2 Lifespans of Rats Revisited
    3.3.3 Belief in the Afterlife
    3.3.4 Incidence of Suicidal Feelings
    3.3.5 Oral Contraceptive Use and Blood Clots
    3.3.6 Alcohol and Infant Malformation
    3.3.7 Death Penalty Verdicts
  3.4 Exercises
    3.4.1 Depersonalization and Recovery from Depression
    3.4.2 Drug Treatment of Psychiatric Patients: Exact Tests for Two-Way Classifications
    3.4.3 Tics and Gender
    3.4.4 Hair Color and Eye Color

4 Multiple Linear Regression: Temperatures in America and Cleaning Cars
  4.1 Description of Data
  4.2 Multiple Linear Regression
  4.3 Analysis Using SPSS
    4.3.1 Cleaning Cars
    4.3.2 Temperatures in America
  4.4 Exercises
    4.4.1 Air Pollution in the U.S.
    4.4.2 Body Fat
    4.4.3 More on Cleaning Cars: Influence Diagnostics

5 Analysis of Variance I: One-Way Designs; Fecundity of Fruit Flies, Finger Tapping, and Female Social Skills
  5.1 Description of Data
  5.2 Analysis of Variance
  5.3 Analysis Using SPSS
    5.3.1 Fecundity of Fruit Flies
    5.3.2 Finger Tapping and Caffeine Consumption
    5.3.3 Social Skills of Females
  5.4 Exercises
    5.4.1 Cortisol Levels in Psychotics: Kruskal-Wallis Test
    5.4.2 Cycling and Knee-Joint Angles
    5.4.3 More on Female Social Skills: Informal Assessment of MANOVA Assumptions

6 Analysis of Variance II: Factorial Designs; Does Marijuana Slow You Down? and Do Slimming Clinics Work?
  6.1 Description of Data
  6.2 Analysis of Variance
  6.3 Analysis Using SPSS
    6.3.1 Effects of Marijuana Use
    6.3.2 Slimming Clinics
  6.4 Exercises
    6.4.1 Headache Treatments
    6.4.2 Biofeedback and Hypertension
    6.4.3 Cleaning Cars Revisited: Analysis of Covariance
    6.4.4 More on Slimming Clinics

7 Analysis of Repeated Measures I: Analysis of Variance Type Models; Field Dependence and a Reverse Stroop Task
  7.1 Description of Data
  7.2 Repeated Measures Analysis of Variance
  7.3 Analysis Using SPSS
  7.4 Exercises
    7.4.1 More on the Reverse Stroop Task
    7.4.2 Visual Acuity Data
    7.4.3 Blood Glucose Levels

8 Analysis of Repeated Measures II: Linear Mixed Effects Models; Computer Delivery of Cognitive Behavioral Therapy
  8.1 Description of Data
  8.2 Linear Mixed Effects Models
  8.3 Analysis Using SPSS
  8.4 Exercises
    8.4.1 Salsolinol Levels and Alcohol Dependency
    8.4.2 Estrogen Treatment for Post-Natal Depression
    8.4.3 More on “Beating the Blues”: Checking the Model for the Correlation Structure

9 Logistic Regression: Who Survived the Sinking of the Titanic?
  9.1 Description of Data
  9.2 Logistic Regression
  9.3 Analysis Using SPSS
  9.4 Exercises
    9.4.1 More on the Titanic Survivor Data
    9.4.2 GHQ Scores and Psychiatric Diagnosis
    9.4.3 Death Penalty Verdicts Revisited

10 Survival Analysis: Sexual Milestones in Women and Field Dependency of Children
  10.1 Description of Data
  10.2 Survival Analysis and Cox’s Regression
  10.3 Analysis Using SPSS
    10.3.1 Sexual Milestone Times
    10.3.2 WISC Task Completion Times
  10.4 Exercises
    10.4.1 Gastric Cancer
    10.4.2 Heroin Addicts
    10.4.3 More on Sexual Milestones of Females

11 Principal Component Analysis and Factor Analysis: Crime in the U.S. and AIDS Patients’ Evaluations of Their Clinicians
  11.1 Description of Data
  11.2 Principal Component and Factor Analysis
    11.2.1 Principal Component Analysis
    11.2.2 Factor Analysis
    11.2.3 Factor Analysis and Principal Components Compared
  11.3 Analysis Using SPSS
    11.3.1 Crime in the U.S.
    11.3.2 AIDS Patients’ Evaluations of Their Clinicians
  11.4 Exercises
    11.4.1 Air Pollution in the U.S.
    11.4.2 More on AIDS Patients’ Evaluations of Their Clinicians: Maximum Likelihood Factor Analysis

12 Classification: Cluster Analysis and Discriminant Function Analysis; Tibetan Skulls
  12.1 Description of Data
  12.2 Classification: Discrimination and Clustering
  12.3 Analysis Using SPSS
    12.3.1 Tibetan Skulls: Deriving a Classification Rule
    12.3.2 Tibetan Skulls: Uncovering Groups
  12.4 Exercises
    12.4.1 Sudden Infant Death Syndrome (SIDS)
    12.4.2 Nutrients in Food Data
    12.4.3 More on Tibetan Skulls

References

10.4.2 Heroin Addicts

The data shown in Table 10.3 give the times that heroin addicts remained in a clinic for methadone maintenance treatment. Here the endpoint of interest is not death, but termination of treatment. Some subjects were still in the clinic at the time these data were recorded or failed to complete their treatment program (status variable: “1” for subjects who had departed the clinic on
completion of treatment and “0” otherwise). Possible explanatory variables for predicting time to complete treatment are maximum methadone dose, whether or not the addict had a criminal record, and the clinic in which the addict was treated. Fit a Cox’s regression model to these data and consider the possibility of stratifying on clinic.

10.4.3 More on Sexual Milestones of Females

The times to first sexual intercourse shown in Table 10.1 were analyzed using a nonparametric survival analysis in the main body of the text. If appropriate, reanalyze this data set using a semi-parametric survival analysis. Specifically:

• Assess the proportional hazards assumption. (Hint: Under the proportional hazards assumption, Kaplan-Meier estimates of the cumulative hazard function in the two groups are expected to be proportional.)
• Fit a Cox regression and interpret your findings.
• Plot the survivor functions and cumulative hazard functions predicted by the fitted Cox model and compare these plots with Kaplan-Meier estimates of the curves.

Table 10.3 Heroin Addicts Data
Subject Clinic Status Time Prison? Dose Subject Clinic Status Time Prison?
Dose 10 11 12 13 14 15 16 17 18 19 21 22 23 24 25 26 27 28 30 31 32 33 34 36 37 38 39 40 41 42 43 44 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 428 275 262 183 259 714 438 796 892 393 161 836 523 612 212 399 771 514 512 624 209 341 299 826 262 566 368 302 602 652 293 564 394 755 591 787 739 550 837 612 581 0 1 1 1 0 1 1 1 0 1 1 0 0 1 0 0 50 55 55 30 65 55 65 60 50 65 80 60 55 70 60 60 75 80 80 80 60 60 55 80 65 45 55 50 60 80 65 60 55 65 55 80 60 60 60 65 70 © 2004 by Chapman & Hall/CRC Press LLC 132 133 134 135 137 138 143 144 145 146 148 149 150 153 156 158 159 160 161 162 163 164 165 166 168 169 170 171 172 173 175 176 177 178 179 180 181 182 183 184 186 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 0 0 1 1 0 1 0 0 0 1 0 1 0 633 661 232 13 563 969 1052 944 881 190 79 884 170 286 358 326 769 161 564 268 611 322 1076 788 575 109 730 790 456 231 143 86 1021 684 878 216 808 268 222 683 0 1 0 1 0 1 1 1 0 1 1 1 1 0 0 70 40 70 60 70 80 80 80 80 50 40 50 40 45 60 60 40 40 80 70 40 55 80 40 70 80 70 80 90 70 60 70 40 80 80 60 100 60 40 40 100 Table 10.3 (continued) Heroin Addicts Data Subject Clinic Status Time Prison? Dose Subject Clinic Status Time Prison? 
Dose 45 46 48 49 50 51 52 53 54 55 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 523 504 785 774 560 160 482 518 683 147 563 646 899 857 180 452 760 496 258 181 386 439 563 337 613 192 405 667 905 247 821 821 517 346 294 244 95 376 212 96 1 0 0 0 1 0 0 1 0 0 1 0 0 1 1 0 60 60 80 65 65 35 30 65 50 65 70 60 60 60 70 60 60 65 40 60 60 80 75 65 60 80 80 50 80 70 80 75 45 60 65 60 60 55 40 70 © 2004 by Chapman & Hall/CRC Press LLC 187 188 189 190 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 0 1 1 0 1 1 496 389 126 17 350 531 317 461 37 167 358 49 457 127 29 62 150 223 129 204 129 581 176 30 41 543 210 193 434 367 348 28 337 175 149 546 84 283 533 0 1 1 1 0 1 1 1 1 0 0 1 0 0 1 1 40 55 75 40 60 65 50 75 60 55 45 60 40 20 40 60 40 60 40 40 65 50 65 55 60 60 40 50 70 55 45 60 50 40 60 80 50 45 80 55 Table 10.3 (continued) Heroin Addicts Data Subject Clinic Status Time Prison? Dose Subject Clinic Status Time Prison? 
Dose
87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 109 110 111 113 114 118 119 120 121 122 123 124 125 126 127 128 129 131 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 1 0 1 1 1 1 0 1 1 0 0 0 0 0 532 522 679 408 840 148 168 489 541 205 475 237 517 749 150 465 708 713 146 450 555 460 53 122 35 532 684 769 591 769 609 932 932 587 26 72 641 367 0 0 0 0 0 0 1 0 0 1 0 1 1 0 0 80 70 35 50 80 65 65 80 80 50 75 45 70 70 80 65 60 50 50 55 80 50 60 60 40 70 65 70 70 40 100 80 80 110 40 40 70 70
228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 266 1 1 1 1 1 2 1 1 1 1 1 1 1 1 2 1 2 1 1 0 1 0 1 0 1 1 1 1 0 1 1 1 1 207 216 28 67 62 111 257 136 342 41 531 98 145 50 53 103 157 75 19 35 394 117 175 180 314 480 325 280 204 366 531 59 33 540 551 90 47 0 1 1 0 1 0 1 1 1 1 0 0 1 0 0 50 50 50 50 60 55 60 55 60 40 45 40 55 50 50 50 60 60 55 40 60 80 40 60 60 70 50 60 90 50 55 50 45 60 80 65 40 45
Source: Hand et al., 1994.

Chapter 11
Principal Component Analysis and Factor Analysis: Crime in the U.S. and AIDS Patients’ Evaluations of Their Clinicians

11.1 Description of Data

In this chapter, we are concerned with two data sets. The first, shown in Table 11.1, gives rates of various types of crime in the states of the U.S. The second data set, shown in Table 11.2, arises from a survey of AIDS patients’ evaluations of their physicians. The 14 items in the survey questionnaire measure patients’ attitudes toward a clinician’s personality, demeanour, competence, and prescribed treatment; each item is measured using a Likert-type scale ranging from one to five. Since seven of the items are stated negatively, they have been recoded (reflected) so that one always represents the most positive attitude and five the least positive. The 14 items are described in Table 11.2.

Table 11.1 Crime Rates in the U.S.^a
State Murder Rape
Robbery Aggravated Assault ME NH VT MA RI CT NY NJ PA OH IN IL MI WI MN IA MO ND SD NE KS DE MD DC VA WV NC SC GA FL KY TN AL MS AR LA OK TX MT ID 2.2 3.6 3.5 4.6 10.7 5.2 5.5 5.5 8.9 11.3 3.1 2.5 1.8 9.2 3.1 4.4 4.9 31 7.1 5.9 8.1 8.6 11.2 11.7 6.7 10.4 10.1 11.2 8.1 12.8 8.1 13.5 2.9 3.2 14.8 21.5 21.8 29.7 21.4 23.8 30.5 33.2 25.1 38.6 25.9 32.4 67.4 20.1 31.8 12.5 29.2 11.6 17.7 24.6 32.9 56.9 43.6 52.4 26.5 18.9 26.4 41.3 43.9 52.7 23.1 47 28.4 25.8 28.9 40.1 36.4 51.6 17.3 20 28 24 22 193 119 192 514 269 152 142 90 325 301 73 102 42 170 16 51 80 124 304 754 106 41 88 99 214 367 83 208 112 65 80 224 107 240 20 21 102 92 103 331 192 205 431 265 176 235 186 434 424 162 148 179 370 32 87 184 252 241 476 668 167 99 354 525 319 605 222 274 408 172 278 482 285 354 118 178 © 2004 by Chapman & Hall/CRC Press LLC Burglary Larceny/Theft Motor Vehicle Theft 803 755 949 1071 1294 1198 1221 1071 735 988 887 1180 1509 783 1004 956 1136 385 554 784 1188 1042 1296 1728 813 625 1225 1340 1453 2221 824 1325 1159 1076 1030 1461 1787 2049 783 1003 2347 2208 2697 2189 2568 2758 2924 2822 1654 2574 2333 2938 3378 2802 2785 2801 2500 2049 1939 2677 3008 3090 2978 4131 2522 1358 2423 2846 2984 4372 1740 2126 2304 1845 2305 3417 3142 3987 3314 2800 164 228 181 906 705 447 637 776 354 376 328 628 800 254 288 158 439 120 99 168 258 272 545 975 219 169 208 277 430 598 193 544 267 150 195 442 649 714 215 181 Table 11.1 (continued) Crime Rates in the U.S.a State Murder Rape Robbery Aggravated Assault WY CO NM AZ UT NV WA OR CA AK HI 5.3 11.5 9.3 3.2 12.6 6.6 11.3 8.6 4.8 21.9 42.3 46.9 43 25.3 64.9 53.4 51.1 44.9 72.7 31 22 145 130 169 59 287 135 206 343 88 106 243 329 538 437 180 354 244 286 521 401 103 a Burglary Larceny/Theft Motor Vehicle Theft 817 1792 1846 1908 915 1604 1861 1967 1696 1162 1339 3078 4231 3712 4337 4074 3489 4267 4163 3384 3910 3759 169 486 343 419 223 478 315 402 762 604 328 Data are the number of offenses known to police per 100,000 residents of 50 states plus the 
District of Columbia for the year 1986.
Source: Statistical Abstract of the USA, 1988, Table 265.

For the crime data, the main aim of our analysis is to try to uncover any interesting patterns of crime among the different states. For the AIDS data, we explore whether there is a relatively simple underlying structure that produces the observed associations among the 14 questionnaire items.

Table 11.2 AIDS Patient’s Evaluation of Clinician

Q1 My doctor treats me in a friendly manner.
Q2 I have some doubts about the ability of my doctor.
Q3 My doctor seems cold and impersonal.
Q4 My doctor does his/her best to keep me from worrying.
Q5 My doctor examines me as carefully as necessary.
Q6 My doctor should treat me with more respect.
Q7 I have some doubts about the treatment suggested by my doctor.
Q8 My doctor seems very competent and well trained.
Q9 My doctor seems to have a genuine interest in me as a person.
Q10 My doctor leaves me with many unanswered questions about my condition and its treatment.
Q11 My doctor uses words that I do not understand.
Q12 I have a great deal of confidence in my doctor.
Q13 I feel I can tell my doctor about very personal problems.

Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14
2 2 2 2 2 2 2 2 2 1 1 2 2 4 2 2 2 1 1 2 2 2 2 2 2 1 1 1 3 3 2 2 2 1 1 2 2 2 2 2 2 1 1 1 2 4 2 2 2 1 2 2 4 2 2 4 1 3 3 1 2 1 2 2 1 1 2 1 2 2 3 1 1 2 2 2 2 1 2 2 2 2 2 4 2 2 2 1 1 2 4 2 2 2 2 1 1 2 2 2 2 2 2 1 1 2
1 1 1 2 2 1 1 2 2 2 2 1 1 1 1 1 1 1 1 2 2 2 4 1 2 1 1 1 2 1 2 1 2 2 2 2 1 1 1 2 2 2 2 1 2 2 2 2 2 2 1 2 2 2 2 1 3 2 2 2 2 1 2 2 2 1 2 2 2 1 4 2 2 2 2 1 2 1 2 2 2 1 4 2 2 2 2 1 1 1 1 1 2 1 1 2 1 2 2 2 2 2 1 1 1 2 1 1 2 2 1 2 2 2 2 1 1 2 2 2 2 2 2 1 2 4 2 2 4 4 1 4 4 2 2 2 2 2 1 2 2 2 1 2 2 2 1 2 2 2 1 2 2 2 1 1 2 2 2 1 1 1 2 1 2 2 2 1 1 2 2 2 2 1 1

11.2 Principal Component and Factor Analysis

Two methods of analysis are the subject of this chapter: principal component analysis and factor analysis. In very general terms, both can be seen as approaches to summarizing and uncovering any patterns in a set of multivariate data, essentially by reducing the complexity of the data. The details behind each method are quite different.

11.2.1 Principal Component Analysis

Principal component analysis is a multivariate technique for transforming a set of related (correlated) variables into a set of unrelated (uncorrelated) variables that account for decreasing proportions of the variation of the original observations. The rationale behind the method is an attempt to reduce the complexity of the data by decreasing the number of variables that need to be considered. If the first few of the derived variables (the principal components) between them account for a large proportion of the total variance of the observed variables, they can be used both to provide a convenient summary of the data and to simplify subsequent analyses.

The coefficients defining the principal components are found by solving a series of equations involving the elements of the observed covariance matrix, although when the original variables are on very different scales, it is wiser to extract them from the observed correlation matrix instead. (An outline of principal component analysis is given in Box 11.1.)
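The point about scales can be illustrated with a small sketch. The Python code below is an illustration only and is not part of the SPSS workflow described in this handbook. The two variables and their values are made up (they are not taken from Table 11.1), and the helper names `covariance` and `eigen_2x2` are hypothetical. Assuming two positively correlated variables, it extracts the first principal component once from the covariance matrix S and once from the correlation matrix R, using the closed-form eigen-decomposition of a symmetric 2 × 2 matrix.

```python
import math

# Made-up illustrative values (NOT from Table 11.1): one variable measured
# in single digits, the other in the hundreds to thousands.
x1 = [2.0, 5.0, 3.0, 11.0, 8.0, 4.0]
x2 = [800.0, 1200.0, 900.0, 1500.0, 1450.0, 950.0]


def covariance(x, y):
    """Sample covariance with divisor n - 1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def eigen_2x2(a, b, d):
    """Eigenvalues (largest first) and first unit eigenvector of the
    symmetric matrix [[a, b], [b, d]]; assumes b != 0."""
    half_trace = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    lam1, lam2 = half_trace + radius, half_trace - radius
    v = (b, lam1 - a)  # solves (a - lam1) * v[0] + b * v[1] = 0
    norm = math.hypot(v[0], v[1])
    return lam1, lam2, (v[0] / norm, v[1] / norm)


s11, s22, s12 = covariance(x1, x1), covariance(x2, x2), covariance(x1, x2)
r = s12 / math.sqrt(s11 * s22)

# First component from the covariance matrix S and the correlation matrix R.
lam1_S, lam2_S, w_S = eigen_2x2(s11, s12, s22)
lam1_R, lam2_R, w_R = eigen_2x2(1.0, r, 1.0)

print("S: coefficients", w_S, "share of variance", lam1_S / (lam1_S + lam2_S))
print("R: coefficients", w_R, "share of variance", lam1_R / (lam1_R + lam2_R))
```

With these made-up values, the first component from S is almost entirely the large-scale variable, whereas the first component from R weights the two standardized variables equally. The eigenvalues of R sum to the number of variables, here two.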
Choosing the number of components to adequately summarize a set of multivariate data is generally based on one of a number of relatively ad hoc procedures:

• Retain just enough components to explain some specified large percentage of the total variation of the original variables. Values between 70 and 90% are usually suggested, although smaller values might be appropriate as the number of variables, q, or number of subjects, n, increases.
• Exclude those principal components with variances less than the average. When the components are extracted from the observed correlation matrix, this implies excluding components with variances less than one. (This is a very popular approach but has its critics; see, for example, Preacher and MacCallum, 2003.)
• Plot the component variances as a scree diagram and look for a clear “elbow” in the curve (an example will be given later in the chapter).

Often, one of the most useful features of the results of a principal components analysis is that it can be used to construct an informative graphical display of multivariate data that may assist in understanding the structure of the data. For example, the first two principal component scores for each sample member can be plotted to produce a scatterplot of the data. If more than two components are thought necessary to adequately represent the data, other component scores can be used in three-dimensional plots or in scatterplot matrices.

Box 11.1 Principal Component Analysis

• Principal components is essentially a method of data reduction that aims to produce a small number of derived variables that can be used in place of the larger number of original variables to simplify subsequent analysis of the data.
• The principal component variables $y_1, y_2, \ldots, y_q$ are defined to be linear combinations of the original variables $x_1, x_2, \ldots, x_q$ that are uncorrelated and account for maximal proportions of the variation in the original data, i.e., $y_1$ accounts for the maximum amount of the variance among all possible linear combinations of $x_1, \ldots, x_q$, $y_2$ accounts for the maximum variance subject to being uncorrelated with $y_1$, and so on.
• Explicitly, the principal component variables are obtained from $x_1, \ldots, x_q$ as follows:

$$
\begin{aligned}
y_1 &= a_{11} x_1 + a_{12} x_2 + \cdots + a_{1q} x_q \\
y_2 &= a_{21} x_1 + a_{22} x_2 + \cdots + a_{2q} x_q \\
&\;\;\vdots \\
y_q &= a_{q1} x_1 + a_{q2} x_2 + \cdots + a_{qq} x_q
\end{aligned}
$$

  where the coefficients $a_{ij}$ ($i = 1, \ldots, q$; $j = 1, \ldots, q$) are chosen so that the required maximal variance and uncorrelated conditions hold.
• Since the variances of the principal component variables could be increased without limit, simply by increasing the coefficients that define them, a restriction must be placed on these coefficients.
• The constraint usually applied is that the sum of squares of the coefficients is one, so that the total variance of all the components is equal to the total variance of all the observed variables.
• It is often convenient to rescale the coefficients so that their sum of squares is equal to the variance of the component they define. In the case of components derived from the correlation matrix of the data, these rescaled coefficients give the correlations between the components and the original variables. It is these values that are often presented as the result of a principal components analysis.
• The coefficients defining the principal components are given by what are known as the eigenvectors of the sample covariance matrix, $\mathbf{S}$, or the correlation matrix, $\mathbf{R}$. Components derived from $\mathbf{S}$ may differ considerably from those derived from $\mathbf{R}$, and there is not necessarily any simple relationship between them. In most practical applications of principal components, the analysis is based on the correlation matrix, i.e., on the standardized variables, since the original variables are likely to be on very different scales so that linear combinations of them will make little sense.
• Principal component scores for an individual $i$ with vector of variable values $\mathbf{x}_i^T$ can be obtained by simply applying the derived coefficients to the observed variables, generally after subtracting the mean of the variable, i.e., from the equations

$$
\begin{aligned}
y_{i1} &= \mathbf{a}_1^T (\mathbf{x}_i - \bar{\mathbf{x}}) \\
&\;\;\vdots \\
y_{iq} &= \mathbf{a}_q^T (\mathbf{x}_i - \bar{\mathbf{x}})
\end{aligned}
$$

  where $\mathbf{a}_i^T = [a_{i1}, a_{i2}, \ldots, a_{iq}]$, and $\bar{\mathbf{x}}$ is the mean vector of the observations.

(Full details of principal components analysis are given in Everitt and Dunn, 2001, and Jolliffe, 2002.)

11.2.2 Factor Analysis

Factor analysis (more properly exploratory factor analysis) is concerned with whether the covariances or correlations between a set of observed variables can be explained in terms of a smaller number of unobservable constructs known either as latent variables or common factors. Explanation here means that the correlation between each pair of measured (manifest) variables arises because of their mutual association with the common factors. Consequently, the partial correlations between any pair of observed variables, given the values of the common factors, should be approximately zero.

Application of factor analysis involves the following two stages:

• Determining the number of common factors needed to adequately describe the correlations between the observed variables, and estimating how each factor is related to each observed variable (i.e., estimating the factor loadings);
• Trying to simplify the initial solution by the process known as factor rotation.

(Exploratory factor analysis is outlined in Box 11.2.)
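The statement that partial correlations vanish once the common factors are held fixed can be checked numerically. The sketch below is a minimal illustration in Python, not an SPSS procedure. It assumes a single common factor, standardized observed variables, and specific parts that are uncorrelated with each other and with the factor. The loadings are hypothetical values chosen for illustration. Under these assumptions the correlation between two observed variables is the product of their loadings, and each loading is the correlation of that variable with the factor.

```python
import math
from itertools import combinations

# Hypothetical loadings of three standardized variables on ONE common factor.
loadings = [0.9, 0.8, 0.7]


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y given z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


marginal = {}
partial = {}
for i, j in combinations(range(len(loadings)), 2):
    # Under the assumed model: corr(x_i, x_j) = l_i * l_j, corr(x_i, f) = l_i.
    r_ij = loadings[i] * loadings[j]
    marginal[(i, j)] = r_ij
    partial[(i, j)] = partial_corr(r_ij, loadings[i], loadings[j])

for pair in marginal:
    print(pair, "marginal:", round(marginal[pair], 3),
          "partial given f:", round(partial[pair], 3))
```

The marginal correlations are substantial, while every partial correlation given the factor is zero. With real data the factor is unobserved and the loadings are estimated, so the corresponding quantities would only be approximately zero.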
Box 11.2 Exploratory Factor Analysis

• In general terms, exploratory factor analysis is concerned with whether the covariances or correlations between a set of observed variables $x_1, x_2, \ldots, x_q$ can be ‘explained’ in terms of a smaller number of unobservable latent variables or common factors, $f_1, f_2, \ldots, f_k$, where $k < q$ and hopefully much less.
• The formal model linking manifest and latent variables is simply that of multiple regression (see Chapter 4), with each observed variable being regressed on the common factors. The regression coefficients in the model are known in this context as the factor loadings and the random error terms as specific variates, since they now represent that part of an observed variable not accounted for by the common factors.
• In mathematical terms, the factor analysis model can be written as follows:

$$
\begin{aligned}
x_1 &= \lambda_{11} f_1 + \lambda_{12} f_2 + \cdots + \lambda_{1k} f_k + u_1 \\
x_2 &= \lambda_{21} f_1 + \lambda_{22} f_2 + \cdots + \lambda_{2k} f_k + u_2 \\
&\;\;\vdots \\
x_q &= \lambda_{q1} f_1 + \lambda_{q2} f_2 + \cdots + \lambda_{qk} f_k + u_q
\end{aligned}
$$

• The equations above can be written more concisely as

$$
\mathbf{x} = \boldsymbol{\Lambda} \mathbf{f} + \mathbf{u}
$$

  where

$$
\mathbf{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_q \end{bmatrix}, \quad
\boldsymbol{\Lambda} = \begin{bmatrix} \lambda_{11} & \cdots & \lambda_{1k} \\ \vdots & & \vdots \\ \lambda_{q1} & \cdots & \lambda_{qk} \end{bmatrix}, \quad
\mathbf{f} = \begin{bmatrix} f_1 \\ \vdots \\ f_k \end{bmatrix}, \quad
\mathbf{u} = \begin{bmatrix} u_1 \\ \vdots \\ u_q \end{bmatrix}
$$

• Since the factors are unobserved, we can fix their location and scale arbitrarily, so we assume they are in standardized form with mean zero and standard deviation one. (We will also assume they are uncorrelated, although this is not an essential requirement.)
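To make the factor model of Box 11.2 concrete, the following Python sketch simulates observations from x = Λf + u and compares their sample covariance matrix with the matrix ΛΛᵀ + Ψ that the model implies. This is an illustration under stated assumptions, not the estimation procedure SPSS uses. The loadings are hypothetical, the factors are taken to be standardized and uncorrelated as in the box, and the specific variates are additionally assumed to be normally distributed, mutually uncorrelated, and uncorrelated with the factors, with variances collected in the diagonal matrix Ψ.

```python
import random

random.seed(2004)  # arbitrary seed, used only for reproducibility

# Hypothetical loadings: q = 4 observed variables, k = 2 common factors.
LOADINGS = [
    [0.8, 0.0],
    [0.7, 0.2],
    [0.1, 0.9],
    [0.0, 0.6],
]
# Specific variances chosen so that each observed variable has variance one.
SPECIFIC_VAR = [1.0 - sum(l * l for l in row) for row in LOADINGS]


def simulate(n):
    """Draw n observations from x = Lambda f + u with standard normal,
    uncorrelated factors and independent normal specific variates."""
    data = []
    for _ in range(n):
        f = [random.gauss(0.0, 1.0) for _ in range(2)]
        x = [
            sum(l * fj for l, fj in zip(row, f)) + random.gauss(0.0, psi ** 0.5)
            for row, psi in zip(LOADINGS, SPECIFIC_VAR)
        ]
        data.append(x)
    return data


def sample_cov(data):
    """Sample covariance matrix with divisor n - 1."""
    n, q = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(q)]
    return [
        [sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (n - 1)
         for j in range(q)]
        for i in range(q)
    ]


def implied_cov():
    """Model-implied covariance: Lambda Lambda^T + diag(specific variances)."""
    q = len(LOADINGS)
    return [
        [sum(a * b for a, b in zip(LOADINGS[i], LOADINGS[j]))
         + (SPECIFIC_VAR[i] if i == j else 0.0)
         for j in range(q)]
        for i in range(q)
    ]


observed = sample_cov(simulate(20000))
implied = implied_cov()
max_gap = max(abs(observed[i][j] - implied[i][j])
              for i in range(4) for j in range(4))
print("largest |sample - implied| entry:", round(max_gap, 3))
```

In this sketch the off-diagonal entries of the implied matrix depend only on the loadings, which is the sense in which the common factors account for the correlations among the observed variables; the specific variances contribute to the diagonal only.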
