THE GUILFORD PRESS

Regression Analysis and Linear Models

Methodology in the Social Sciences
David A. Kenny, Founding Editor
Todd D. Little, Series Editor
www.guilford.com/MSS

This series provides applied researchers and students with analysis and research design books that emphasize the use of methods to answer research questions. Rather than emphasizing statistical theory, each volume in the series illustrates when a technique should (and should not) be used and how the output from available software programs should (and should not) be interpreted. Common pitfalls as well as areas of further development are clearly articulated.

RECENT VOLUMES

DOING STATISTICAL MEDIATION AND MODERATION
Paul E. Jose

LONGITUDINAL STRUCTURAL EQUATION MODELING
Todd D. Little

INTRODUCTION TO MEDIATION, MODERATION, AND CONDITIONAL PROCESS ANALYSIS: A REGRESSION-BASED APPROACH
Andrew F. Hayes

BAYESIAN STATISTICS FOR THE SOCIAL SCIENCES
David Kaplan

CONFIRMATORY FACTOR ANALYSIS FOR APPLIED RESEARCH, SECOND EDITION
Timothy A. Brown

PRINCIPLES AND PRACTICE OF STRUCTURAL EQUATION MODELING, FOURTH EDITION
Rex B. Kline

HYPOTHESIS TESTING AND MODEL SELECTION IN THE SOCIAL SCIENCES
David L. Weakliem

REGRESSION ANALYSIS AND LINEAR MODELS: CONCEPTS, APPLICATIONS, AND IMPLEMENTATION
Richard B. Darlington and Andrew F. Hayes

GROWTH MODELING: STRUCTURAL EQUATION AND MULTILEVEL MODELING APPROACHES
Kevin J. Grimm, Nilam Ram, and Ryne Estabrook

PSYCHOMETRIC METHODS: THEORY INTO PRACTICE
Larry R. Price

Regression Analysis and Linear Models: Concepts, Applications, and Implementation
Richard B. Darlington
Andrew F. Hayes
Series Editor’s Note by Todd D. Little

THE GUILFORD PRESS
New York London

Copyright © 2017 The Guilford Press
A Division of Guilford Publications, Inc.
370 Seventh Avenue, Suite 1200, New York, NY 10001
www.guilford.com

All rights reserved. No part of this book may be reproduced, translated, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying,
microfilming, recording, or otherwise, without written permission from the publisher.

Printed in the United States of America

This book is printed on acid-free paper.

Last digit is print number:

Library of Congress Cataloging-in-Publication Data is available from the publisher.

ISBN 978-1-4625-2113-5 (hardcover)

Series Editor’s Note

What a partnership: Darlington and Hayes. Richard Darlington is an icon of regression and linear modeling. His contributions to understanding the general linear model have educated social and behavioral science researchers for nearly half a century. Andrew Hayes is an icon of applied regression techniques, particularly in the context of mediation and moderation. His contributions to conditional process modeling have shaped how we think about and test processes of mediation and moderation. Bringing these two icons together in collaboration gives us a work that any researcher should use to learn and understand all aspects of linear modeling.

The didactic elements are thorough, conversational, and highly accessible. You’ll enjoy Regression Analysis and Linear Models, not as a statistics book but rather as a Hitchhiker’s Guide to the world of linear modeling. Linear modeling is the bedrock material you need to know in order to grow into more advanced procedures, such as multilevel regression, structural equation modeling, longitudinal modeling, and the like. The combination of clarity, easy-to-digest “bite-sized” chapters, and comprehensive breadth of coverage is just wonderful. And the software coverage is equally comprehensive, with examples in SAS, STATA, and SPSS (and some nice exposure to R), giving every discipline’s dominant software platform thorough coverage. In addition to the software coverage, the examples span many disciplines and offer an engaging panorama of research questions and topics to stimulate the intellectually curious (a remedy for “academic attention deficit disorder”). This book is not just about
linear regression as a technique, but also about research practice and the origins of scientific knowledge. The thoughtful discussion of statistical control versus experimental control, for example, provides the basis to understand when causal conclusions are sufficiently implicated. As such, policy and practice can, in fact, rely on well-crafted nonexperimental analyses. Practical guidance is also a hallmark of this work, from detecting and managing irregularities, to collinearity issues, to probing interactions, and so on. I particularly appreciate that they take linear modeling all the way up through path analysis, an essential starting point for many advanced latent variable modeling procedures.

This book will be well worn, dog-eared, highlighted, shared, re-read, and simply cherished. It will now be required reading for all of my first-year students and a recommended primer for all of my courses. And if you are planning to come to one of my Stats Camp courses, brush up by reviewing Darlington and Hayes. As always, “Enjoy!” Oh, and to paraphrase the catchphrase from the Hitchhiker’s Guide to the Galaxy: “Don’t forget your Darlington and Hayes.”

TODD D. LITTLE
Kicking off my Stats Camp in Albuquerque, New Mexico

Preface

Linear regression analysis is by far the most popular analytical method in the social and behavioral sciences, not to mention other fields like medicine and public health. Everyone who undertakes scientific training is exposed to regression analysis in some form early on, although sometimes that exposure takes a disguised form. Even the most basic statistical procedures taught to students in the sciences, such as the t-test and analysis of variance (ANOVA), are really just forms of regression analysis. After mastering these topics, students are often introduced to multiple regression analysis as if it were something new and designed for a wholly different type of problem than what they were exposed to in their first course. This
book shows how regression analysis, ANOVA, and the independent-groups t-test are one and the same. But we go far beyond drawing the parallels between these methods, knowing that in order for you to advance to the study of more advanced statistical methods, you need a solid background in the fundamentals of linear modeling. This book attempts to give you that background, while facilitating your understanding by using a conversational writing tone, minimizing the mathematics as much as possible, and focusing on application and implementation using statistical software. Although our intention was to deliver an introductory treatment of regression analysis theory and application, we think even the seasoned researcher and user of regression analysis will find him- or herself learning something new in each chapter. Indeed, with repeated readings of this book we predict you will come to appreciate the glory of linear modeling just as we have, and maybe even develop the kind of passion for the topic that we developed and hope we have successfully conveyed to you.

Regression analysis is conducted with computer software, and you have many good programs to choose from. We emphasize three commercial packages that are heavily used in the social and behavioral sciences: IBM SPSS Statistics (referred to throughout the book simply as “SPSS”), SAS, and STATA. A fourth program, R, is given some treatment in one of the appendices. But this book is about the concepts and application of regression analysis and is not written as a how-to guide to using your software. We assume that you already have at least some exposure to one of these programs, some working experience entering and manipulating data, and perhaps a book on your program available or a local expert to guide you as needed. That said, we provide relevant commands for each of these programs for the key analyses and uses of regression analysis presented in these pages, using different fonts and shades of gray to most
clearly distinguish them from each other. Your program’s reference manual or user’s guide, or your course instructor, can help you fine-tune and tailor the commands we provide to extract other information from the analysis that you may need one day.

In the rest of this preface, we provide a nonexhaustive summary of the contents of the book, chapter by chapter, to give you a sense of what you can expect to learn about in the pages that follow.

Overview of the Book

Chapter 1 introduces the book by focusing on the concept of “accounting for something” when interpreting research results, and on how a failure to account for various explanations for an association between two variables renders that association ambiguous in meaning and interpretation. Two examples are offered in this first chapter, in which the relationship between two variables changes after accounting for the relationship between these two variables and a third variable, a covariate. These examples are used to introduce the concept of statistical control, which is a major theme of the book. We discuss how the linear model, as a general analytic framework, can be used to account for covariates in a flexible, versatile manner for many types of data problems that a researcher confronts.

Chapters 2 and 3 are perhaps the core of the book, and everything that follows builds on the material in these two chapters. Chapter 2 introduces the concept of a conditional mean and how the ordinary least squares criterion used in regression analysis for defining the best-fitting model yields a model of conditional means by minimizing the sum of the squared residuals. After illustrating some simple computations, which are then replicated using regression routines in SPSS, SAS, and STATA, distinctions are drawn between the correlation coefficient and the regression coefficient as

Subject Index

Hotdeck imputation, 545
Human judgment, 205–207, 208
Hypothesis testing
  assumption violations and, 507
  interactions and, 436
  multidimensional sets and, 148
  multiple test problem and, 317, 318, 331–335, 336–337
  path analysis and, 455–456
  power of a statistical test and, 519, 523
  random assignment and, 160
  regression diagnostics and, 518
  statistical inference and, 104

I
Importance of regressors. See also Regressors; Variable importance
  dominance analysis and, 233–240
  overview, 153, 210–212
  single regression model, 223–233
  squared correlations and, 212–223, 215f, 217t
  statistical software and, 237–240, 238f
Improper linear model, 143
Imputation, 545
Independence assumption, 506–509
Independence predictor variable configuration, 196, 197f, 199, 203–204. See also Predictor variables
Independent sampling, 89
Independent tests, 321–322
Independent variable. See also Regressors
  Bonferroni method and, 321–322
  causality and, 470
  fixing direct effects to zero and, 474–475
  linear models and, 10–12
  measurement error and, 527
  multicategorical independent variables and, 473–474
  multiple mediator models and, 465
  multiple test problem and, 331–335
  pathways of influence and, 448
  random assignment and, 3, 162
  regression analysis and, 47
  relations among statistics, 81
  relationship between dependent variables and, 1–2
  singularity and, 535–538
  statistical inference and, 105
  tolerance and, 109
  transformations and, 372
Indicator coding. See also Coding systems; Statistical software
  constructing, 249–250, 250f
  equality of the means and, 252–254, 254f
  interactions and, 414–415
  overview, 245–248, 246t, 249t, 273, 278t, 279t
  reference category and, 250–252, 252f
  statistical software and, 590
Indicator variables. See also Dichotomous regressors; Dummy variables
  coding and, 245–248, 246t, 249t
  constructing, 249–250, 250f
  dichotomous regressors and, 125–126, 126t
  multicategorical variables and, 244–245
Indirect effect
  multicategorical independent variables and, 473–474
  multiple mediator models and, 465, 466, 468–469
  overview, 153, 448–452, 450f
  path analysis and, 452–454, 453f, 455–458, 476–477
  random assignment and, 176
  statistical software and, 458–459, 459f, 461f
Inference. See also Statistical inference
  causality and, 470
  conditional effects and, 411–422, 418f, 422f
  interactions and, 397–398
  multiple mediator models and, 466
  multiple regression analysis and, 311
  regression coefficients and, 560–562
  simple regression model and, 27
  statistical software and, 597–598
  time series analysis and, 573
  without random sampling, 514–516
Influence, 448, 482–490
Influential cases, 484
Influential outliers, 155
Influential points, 484
Interactions. See also Moderation
  complications and confusions in the study of, 429–441, 430t, 431f, 433t, 439t
  conditional effects and, 411–422, 418f, 422f
  detection of, 429–430
  examples of, 398–401, 400f, 402f
  involving a categorical regressor, 390–404, 391f, 396f, 400f, 402f
  linear models and, 11
  organizing tests on, 441–445
  overview, 153, 154, 377–390, 380f, 381t, 383f, 384t, 385f, 388f, 389f, 409–410, 445–446
  power of a statistical test and, 521
  probing, 422–428, 424f, 427f
  R programming language and, 608–609
  regression diagnostics and, 516–517
  statistical software and, 593–596
  between two categorical regressors, 404–408, 406f, 407t
Intercept, 20, 41, 380–381. See also Y-intercept
Intercorrelation, 379–380, 523
Interpretation, 560–562
Interval estimates, 106–107, 318
Inverse butterfly heteroscedasticity, 499, 500f, 503. See also Heteroscedasticity
Irregularities. See also Errors
  assumption violations, 496–509, 497f, 500f
  dealing with, 509–514
  inference without random sampling and, 514–516
  overview, 479, 517–518
  regression diagnostics, 480–495, 483f, 492f, 516–517

J
Jackknife, 510, 512
Johnson–Neyman technique, 425–427, 427f, 445–446

L
Layering, 324–325
Least squares criterion, 41, 55, 85
Leave-one-out method
  estimating true validity, 184, 185–186, 187–188
  statistical software and, 187–188
Legal factors, 165–166
Leverage
  diagnostic statistics and, 491–493, 492t
  measuring, 484–487
  regression diagnostics and, 482–490
Leverage points, 154, 484
Likelihood, 553–556, 578
Likelihood function, 553–554, 565–568
Likelihood ratio test, 567–568
Linear interaction. See also Interactions
  overview, 409
  R programming language and, 608–609
  statistical software and, 592–593
Linear models. See also Models
  alternative view of, 52
  estimation with computer software, 55–58, 56f, 57f
  overview, 2, 8–16, 12–14, 47–49
  random assignment and, 170–174, 171f
  regression to the mean and, 141–144
Linear regression analysis
  Analysis of Variance Summary Table and, 92–102, 93f, 94f, 96t, 102t
  computer analysis and, 28–29, 30f
  degrees of freedom and, 99–100
  dichotomous variables and, 248
  matrix algebra of, 621–625
  model estimation with computer software and, 55
  multicategorical variables and, 259
  overview, 8–9, 11–12, 41, 83–84, 85, 122–123, 153–156, 341–347, 343f, 345f, 347f, 342, 374–375, 475, 578
  partial regression coefficients and, 58
  path analysis and, 448–463, 450f, 453f, 459f, 460f, 461f, 464f
  residuals and, 35–40, 38f, 39f, 40f
  statistical inference and, 116–118
  using R programming language, 603–610
Linear spline models. See also Spline regression
  overview, 357, 358–362, 359f, 360f
  polynomial spline regression, 364–368, 366f
Linearity
  assumption of, 88, 90
  errors of estimates and, 20–22
  path analysis and, 475
  residual analysis and, 39–40, 39f
  statistical inference and, 113–114
Listwise deletion, 544–545
Local property, 353
Log odds, 554–556
Logarithmic transformation, 370–372, 371f, 375. See also Transformations
Logical independence, 331–335
Logistic regression
  examples of, 557–560, 558t, 559t, 560f
  ordered logistic regression, 570–571
  overview, 12, 551–570, 558t, 559t, 560f, 563f, 578
Logistic regression equation, 556–557
Logit, 554–556, 558–559, 559t, 562
Lower order regression coefficients, 433–435. See also Regression coefficient

M
Mahalanobis distance, 484, 485–487. See also Distance
Main effects, 406f, 407
Manipulation, 3, 10, 160, 165–169. See also Covariates
Marginal mean, 19, 526–527
Mathematical equating, 12–13
Mean, adjusted
  covariates and the comparison of, 297f
  multicategorical variables and, 266–268, 267f
  weighted group coding and, 308
Mean, conditional. See also Linear regression model
  measurement error and, 526–527
  scatterplots and, 19, 20f
  statistical inference and, 116–118
Mean, group
  dichotomous regressors and, 126–127, 127f
  effect coding and, 287–288
Mean, regression to, 135–144, 136f, 138f, 533–534
Mean centered variables, 354–356, 356f, 596
Mean difference, 172–173, 260–261
Mean imputation, 545
Mean squares, 100–102
Meaningfulness of association, 515–516
Means, comparison of, 317
Measurement, 134, 565–568
Measurement compression, 373
Measurement error
  effects of, 528–530
  managing, 530–531
  overview, 155, 525–531
  structural equation modeling and, 575
Measurement expansion, 373
Mechanical prediction, 177–181, 208. See also Prediction
Mediation analysis. See also Multiple mediator models
  causality and, 469–472
  nonsignificant total effect and, 472–473
  overview, 447–448, 469, 476–477
  statistical software and, 458–463, 459f, 460f, 461f, 464f
Mediation model, 452–454, 453f
Mediator variable, 449
Minitab, 13. See also Statistical software
Miscellaneous set, 145. See also Multidimensional sets
Missing data, 155, 543–546, 549, 599
Mixed ANOVA, 143. See also Analysis of variance (ANOVA)
Model fit, 565–568
Model of Y, 26
Models. See also Linear models; Nonlinear model; Regression models
  alternative view of, 52, 53f–54f
  Analysis of Variance Summary Table and, 95, 96t, 97–99
  best fitting model, 55–70, 56f, 57f, 59f, 60f, 61t, 62f, 65t, 66f, 67f, 69f
  conditional effects and, 411–416
  estimation with computer software, 55–58, 56f, 57f
  geometric representation of, 49–50, 49f
  indicator variables and, 250f
  model errors and, 50–52, 51f, 53f
  multiple correlation R and, 68–70, 69f
  overview, 9, 47–49
  partial regression coefficients and, 58–63, 59f, 60f, 61t, 62f
  scale-free measures of partial association and, 70–75, 71f, 72f
  statistical inference and, 107–108
  three or more regressors and, 64–67, 65t, 66f, 67f
Moderated mediation, 475–476
Moderation, 154, 375, 378, 413. See also Interactions
Moderator, 419–422, 422f
Monte Carlo confidence interval, 456–458, 462, 464f, 476–477. See also Confidence intervals
Monotonic transformations, 369–370. See also Transformations
Morality, 165–166
Mortality, 159
Multicategorical independent variables, 473–474. See also Independent variable
Multicategorical variables. See also Dichotomous regressors; Regressors
  alternative coding systems, 276–288, 277t, 278t, 279t, 280t, 283t, 284t, 288t
  coding and, 308–309
  comparison of adjusted means and, 294–298, 297f
  conditional effects and, 423
  contrasts and, 289–294, 293f, 298–308, 299t, 301t, 302t
  focal predictor or moderator and, 419–422, 422f
  interactions and, 394–397, 396f, 397–398, 408, 414–415
  linear models and, 9–10, 11
  as or with covariates, 258–273, 260f, 262f, 267f, 270f
  overview, 153, 243–244, 273, 275, 276t
  as sets, 244–257, 246t, 249t, 250f, 252t, 254f
  statistical control and,
  statistical software and, 589–592, 594–595
  weighted group coding and, 298–308, 299t, 301t, 302t
Multidimensional sets, 144–152, 151f
Multilevel modeling, 12, 509, 575–577, 578
Multiple correlation R
  estimating true validity, 181–188, 182t, 186t, 187f
  mechanical prediction and, 180–181
  of TR, 87, 181–188, 182t, 186t, 187f
  of TR², 102–104
  overview, 68–70, 69f
Multiple imputation, 545
Multiple interactions, 438–441, 439t. See also Interactions
Multiple logistic regression, 562–563. See also Logistic regression
Multiple mediator models, 464–469, 466f. See also Mediation analysis
Multiple regression analysis. See also Regression analysis
  Analysis of Variance Summary Table and, 93f, 94f
  overview, 64–67, 65t, 66f, 67f, 83–84, 311
  relations among statistics, 75–83, 78, 79f
Multiple regression correlations, 75–78, 78t
Multiple test problem
  Bonferroni method and, 320–328, 324t
  overview, 308–309, 312–320, 314f, 328–339
Multiple tests, 311–312, 328–338
Multivariate Mahalanobis distance, 485. See also Mahalanobis distance

N
Narrow tests, 443–445. See also Testing
Negative binomial regression, 572–573, 578
Negative monotonic transformations, 369–370. See also Transformations
Negative residuals, 37. See also Residuals
Nested structure, 12
Nominal variables, 74–75, 286
Noncontribution of missingness, 544
Nonindependence, 506–509
Nonindependent tests, 322–324, 324t
Noninterval scaling, 155, 541–543. See also Scaling
Nonlinear model, 432, 446, 542–543
Nonlinear regression model. See also Models
  overview, 47–49, 341–347, 343f, 345f, 347f, 374–375
  polynomial regression and, 347–357, 349f, 352f, 353f, 356f
  spline regression and, 357–369, 359f, 360f, 366f
  transformations and, 369–374, 371f, 373f
Nonlinearity, 154, 475, 496–498, 497f
Nonnormality, 90, 498–499
Nonnormality of errors in estimation, 154
Nonparallel lines, 384–385, 385f
Nonrandom attrition, 164–165
Nonrandom measurement error, 525–526. See also Measurement error
Nonrandom sampling, 515
Nonsampling, 515
Nonsense values, 327
Nonsignificant covariates, 121. See also Covariates
Nonsignificant linear terms, 437
Nonsignificant total effect, 472–473
Normality assumption, 90, 105, 114–116, 115f
Null hypothesis
  Bonferroni method and, 320–328, 324t, 330–331
  coding and, 308
  conditional effects and, 415–416
  dichotomous regressors and, 128
  effect coding and, 287–288
  interactions and, 420, 436, 444
  irregularities and, 512
  logistic regression and, 552, 567–568
  multicategorical variables and, 263, 273
  multidimensional sets and, 149–150
  multiple test problem and, 313–317, 314f, 329–331, 332–334, 336, 338, 339
  path analysis and, 455–456, 459
  power of a statistical test and, 521–522, 548–549
  regression and correlation coefficients and, 35
  statistical inference and, 87, 104, 105–106, 112–114, 117–118, 120
  weighted group coding and, 306–307
Null hypothesis significance testing, 210. See also Null hypothesis
Numerical regressors. See also Regressors
  artificial categorization of, 132–135
  conditional effects and, 423
  dichotomous regressors and, 129
  examples of,
  interactions and, 378–379, 380f, 390–392, 391f, 394–397, 396f, 412–414, 441
  linear models and, 8, 9, 11
  logistic regression and, 559–560, 560f
  probing an interaction and, 426
  statistical control and,

O
Observational data, 429–430
Observed score, 526
Odds, 554–556, 558–559, 559t, 562
Odds ratio, 561–562
OLS regression. See Ordinary least squares regression (OLS regression)
Ordinal logistic regression, 570–571. See also Logistic regression
Ordinal variables
  Helmert coding and, 283t
  noninterval scaling and, 541–543
  sequential coding and, 282
Organizing tests, 441–445
Outliers, 154, 480, 487. See also Irregularities
Overall mean. See Marginal mean
Overall tests, 443. See also Testing
Overcontrol, 154, 158, 538–541, 539f, 541f

P
Pairwise comparison, 289, 317
Pairwise deletion, 543–544
Parabola’s maximum or minimum, 356–357
Paradoxical results,
Parallel multiple mediator model, 464–467, 466f, 468–469. See also Multiple mediator models
Parallels, 254–255
Parameterizations, 406f
Parameters, 86, 87. See also Population values; True values
Partial association
  multiple test problem and, 317
  overview, 41, 157–158
  partial regression coefficients and, 58
  power of a statistical test and, 520–521
Partial correlation. See also Correlations
  overview, 209–210
  R programming language and, 607–608
  scale-free measures of partial association and, 71–73, 71f, 72f
  statistical inference and, 87, 112–116, 114f, 115f
Partial correlation Tprj, 87
Partial dominance, 235. See also Dominance analysis
Partial homoscedasticity, 499
Partial influence, 490. See also Influence
Partial multiple correlations
  multicategorical variables and, 261, 263
  multidimensional sets and, 145–148
  statistical inference and, 87
Partial nonlinearity, 496–497
Partial redundancy predictor variable configuration, 196–198, 197f. See also Predictor variables
Partial regression coefficient Tbj, 87
Partial regression coefficients. See also Regression coefficient
  best fitting model and, 58–63, 59f, 60f, 61t, 62f
  overview, 83, 209, 241
  scale-free measures of partial association and, 70–75, 71f, 72f
  statistical inference and, 105–116, 114f, 115f
  three or more regressors and, 64–67, 65t, 66f, 67f
Partial regression correlations, 75–80, 78t, 79f
Partial regression slopes. See Partial regression coefficients
Partial regression weights, 87
Partial relationship
  examples of,
  relations among statistics, 75–78, 78t, 80–81
  residuals and, 37
Partial scatterplot, 71f, 72, 72f
Partialing, 12–13, 58
Path analysis
  causality and, 469–472
  overview, 448–463, 450f, 453f, 459f, 460f, 461f, 464f, 476–477
Path analysis algebra, 454–455
Path diagram, 449, 450f, 540, 541f
Pathways of influence, 448
Pearson’s correlation, 2, 12, 41, 44, 137, 218–220
Permutation tests, 510, 513–514
Phenomenon, the, 135–138, 136f, 138f
Pick-a-point approach, 423, 424–425, 424f
Planned tests, 335–338
Plausibility, 331–335
Poisson regression, 572–573, 578
Polynomial regression
  centering variables in, 354–356, 356f
  examples of, 350–352, 352f, 353f
  overview, 347–357, 349f, 352f, 353f, 356f, 375
Polynomial spline regression, 364–368, 366f. See also Polynomial regression; Spline regression
Population. See also Samples
  degrees of freedom and, 100
  dichotomous regressors and, 132
  measurement error and, 526–527
  random assignment and, 161, 163
  simple regression model and, 23–24, 27, 31
  statistical inference and, 105
  transformations and, 374
Population inference, 86–87, 90–91, 515. See also Statistical inference
Population values, 86. See also Parameters
Positive monotonic transformations. See also Transformations
Post hoc testing, 336
Power
  overcontrol and, 540
  overview, 519–525, 548–549
  random assignment and, 170–174, 171f
  statistical inference and, 121
  structural equation modeling and, 575
Precision, 170–174, 171f, 519–525
Predicted values, 606–607
Prediction. See also Predictor variables
  human judgment and, 205–207
  importance of regressors and, 220–222
  mechanical prediction, 177–181
  multiple test problem and, 335–338
  overview, 207–208
  power of a statistical test and, 520–521
  selecting predictor variables, 188–195
  true validity and, 182t
  true validity and, 181–188, 186t, 187f
Prediction model, 194–195
Predictive power, 34
Predictor variables. See also Prediction
  configurations, 195–205, 197f, 200t, 203f, 204f, 208
  interactions and, 378
  mechanical prediction and, 180–181
  power of a statistical test and, 523
  regression analysis and, 43–52, 44t, 45f, 49f, 51t, 53f–54f
  selecting, 188–195
Primary assumptions, 88–91. See also Assumptions; Standard assumptions of regression theory
Probabilities, 554–556, 558–559, 559t
Probability sample, 90–91, 115–116. See also Samples
Probing an interaction, 422–428, 424f, 427f, 593–594. See also Interactions
Probit regression, 12, 571, 578. See also Logistic regression
Proportional reduction, 220–222
p-value
  assumption violations and, 502, 505–506
  Bonferroni method and, 320–321, 330–331
  contrasts and, 293–294
  effect coding and, 287–288
  Helmert coding and, 285–286
  importance of regressors and, 230–231
  indicator coding and, 253, 257
  interactions and, 407, 435–436
  irregularities and, 512
  multiple test problem and, 313–314, 316–317, 324–327, 337, 338, 339
  path analysis and, 456, 459
  predictor variables and, 192–193
  probing an interaction and, 424, 425–426
  regression and correlation coefficients and, 35
  spline regression and, 368–369
  statistical inference and, 105–106
  weighted group coding and, 303, 307

Q
Quadratic model
  nonlinear regression model and, 342–343, 343f, 353–354
  polynomial regression and, 348, 353f, 357, 358f
  spline regression and, 362
Quartic model, 362–363

R
Random assignment. See also Measurement error; Random sampling; Statistical control
  causality and, 470
  inference without random sampling and, 516
  limitations of, 162–169
  multicategorical variables and, 261
  overview, 10, 157–162, 176
Random assignment on the independent variable. See also Independent variable
Random assignment without random sampling, 161. See also Random sampling
Random but nonindependent assignment, 161. See also Random assignment
Random measurement error, 525–526, 527–528, 549
Random sampling. See also Random assignment; Sampling
  assumptions and, 90–92
  inference without, 514–516
  overview, 161
  statistical inference and, 105
Randomization, 3, 160, 513. See also Random assignment
Rank-order correlation, 2, 502–503
Rectangular data matrix, 9–10
Redundancy predictor variable configuration, 196–198, 197f, 203f. See also Predictor variables
Reference category, 250–252, 252f, 255–257
Regression
  Analysis of Variance Summary Table and, 95, 96t, 97–99, 100–102
  statistical inference and, 122–123
Regression algebra, 452–454, 453f
Regression analysis. See also Simple regression model
  coding and, 275, 276t, 308
  dichotomous regressors and, 127, 127f
  examples of, 26–28, 28f
  measurement error and, 525
  mechanical prediction and, 177–181
  with multiple predictor variables, 43–52, 44t, 45f, 49f, 51t, 53f–54f
  multiple test problem and, 317–318
  overview, 243
  partial regression coefficients and, 209
  prediction and, 207–208
  problems that arise in, 532–548, 539f, 541f, 547t, 548t
  regression to the mean and, 143
  statistical inference and, 87
  tolerance and, 109–112
Regression centering approach, 416
Regression coefficient. See also Multiple regression analysis; Partial regression coefficients
  assumptions and, 90
  compared to the correlation coefficient, 31–35, 33f
  comparing in the same model, 229–233
  dichotomous regressors and, 128–129, 130–132
  heteroscedasticity and, 501–502
  importance of regressors and, 223–233
  indicator coding and, 255–257
  interactions and, 385–386, 392–394, 397–398, 402–404, 409, 413, 429–430, 433–435, 437
  interpretation of and inference about, 560–562
  logistic regression and, 557–560, 558t, 559t, 560f
  models and, 48–50, 49f
  multicategorical variables and, 259, 264–266
  overview,
  path analysis and, 448–449
  Pearson’s r and, 218–220
  power of a statistical test and, 520–524, 548–549
  properties of, 32–34, 33f
  random assignment and, 172
  relations among statistics, 81–82
  scale-free measures of partial association and, 70–75, 71f, 72f
  spline regression and, 367–369
  standardized regression coefficient, 73–75
  statistical software and, 586, 598
  symbolic representation and, 15–16
  uses of, 34–35
Regression coefficient b₁, 23–24
Regression constant b₀, 23–24, 63. See also Y-intercept
Regression diagnostics. See also Diagnostic statistics
  conducting, 516–517
  overview, 480–495, 483f, 492f, 517–518
  statistical software and, 494–495, 587
Regression equation, 23–24, 41
Regression imputation, 545
Regression line
  degrees of freedom and, 99–100
  finding, 25–26
  overview, 23–24, 41
  random assignment and, 170–172, 171f
  regression and correlation coefficients and, 32–33, 33f
  residual analysis and, 38–39, 38f
Regression models. See also Linear regression model; Models
  regression to the mean and, 135–144, 136f, 138f
  statistical inference and, 107–108
Regression residuals. See Residuals
Regression slope, 9, 170–172, 171f, 174 See also Slope Regression sum of squares, 97–99 See also Sum of squares Regression to the mean, 135–144, 136f, 138f, 533–534 Regression weight See also Weights estimating true validity, 181–188, 182t, 186t, 187f importance of regressors and, 215–216 mechanical prediction and, 180–181 overview, path analysis and, 448–449 predictor variables and, 195–196 statistical inference and, 105–106 Regressors See also Collinear regressors; Complementary regressors; Importance of regressors; Independent variable; Multicategorical variables; Numerical regressors; Two-regressor model assumptions and, 88 degrees of freedom and, 100 dichotomous regressors, 125–135, 126t, 127f, 130f dominance analysis and, 233–240 interactions and, 378 Subject Index multicategorical regressors, 153 multidimensional sets and, 144–152, 151f overview, 46–47, 83, 240–241 partial regression coefficients and, 58–59 relations among statistics and, 75–78, 78t standardized regression coefficient, 74–75 statistical inference and, 115–116, 120 three or more, 64–67, 65t, 66f, 67f tolerance and, 109–112 transformations of, 369–374, 371f, 373f Venn diagrams and, 78–80, 79f Relative importance, 34 Reliability, 205–206, 526, 530 Repeated categories coding See Sequential coding Replicability of association, 515–516 Residual analysis, 37–40, 38f, 39f, 40f, 41 See also Residuals Residual scatterplot, 346 See also Scatterplots Residual variances, 87 Residuals analysis of, 37–40, 38f, 39f, 40f Analysis of Variance Summary Table and, 95, 96t, 97–99, 100–102 overview, 35–40, 38f, 39f, 40f, 41 partial regression coefficients and, 60–61, 61t R programming language and, 606–607 Reverse Helmert coding, 286 See also Helmert coding Right-tail normal probabilities, 612 RLM, 581–601 See also SAS; SPSS; Statistical software Robustification, 509–510 Rounding error, 546–548, 547t, 548t S Sample regression weights, 181–188, 182t, 186t, 187f Sample size See also Samples assumption violations 
and, 507 degrees of freedom and, 99–100 interactions and, 441–442 multiple test problem and, 316–317 power of a statistical test and, 520–521 regression and correlation coefficients and, 34–35 statistical inference and, 121 Samples See also Population; Sample size; Sampling assumptions and, 90–91 dichotomous regressors and, 132 overview, 86 simple regression model and, 23–24 Sampling See also Samples assumptions and, 89, 90–92 power of a statistical test and, 524 Sampling distribution, 456–457 Sampling error, 525 See also Measurement error Sampling variance, 86 See also Variance Sandwich estimators, 511 See also Heteroscedasticity-consistent standard errors SAS See also Statistical software Analysis of Variance Summary Table and, 92, 101–102 assumption violations and, 504, 506 comparison of adjusted means and, 295, 297–298, 297f contrasts and, 294 dominance analysis and, 237, 241 estimating true validity, 186–188, 187f importance of regressors and, 232–233 indicator coding and, 249–250, 254–255 interactions and, 387–388, 388f, 390, 399, 417–419, 422 irregularities and, 512 linear regression analysis and, 29, 30f logistic regression and, 562–563, 563 model estimation with, 55–56, 57–58, 58 multicategorical variables and, 260f, 264, 268–269 multidimensional sets and, 151–152 multilevel modeling and, 577 multiple mediator models and, 466 multiple regression analysis and, 67, 67f multiple test problem and, 313, 327 overview, 13–14 path analysis and, 462, 469 predictor variables and, 191–192
probing an interaction and, 426 regression analysis and, 46–47 regression diagnostics and, 494–495 RLM macro for, 581–601 semipartial correlation and, 70 statistical inference and, 118 tolerance and, 111–112 weighted group coding and, 304, 307–308 Scale-free measures, 209–210 Scaling See also Transformations interactions and, 432–433, 433t noninterval scaling, 155, 541–543 Scatterplots curvilinearity and, 343f, 344–347, 345f, 347f degrees of freedom and, 99–100 nonlinearity and, 497–498, 497f overview, 17–18, 41 polynomial regression and, 351–352, 352f polynomial spline regression, 365–366, 366f, 369 random assignment and, 170–172, 171f scale-free measures of partial association and, 71–73, 71f, 72f simple regression model and, 17–22, 18f, 19f, 20f, 22t, 23f three or more regressors and, 64–65 transformations and, 371f, 372 Secondary assumptions, 88–91, 114 See also Assumptions; Standard assumptions of regression theory Segmented regression, 357–358 See also Spline regression Selection, 158, 192–195 Semipartial correlation See also Correlations importance of regressors and, 225–226 overview, 70, 209–210 R programming language and, 607–608 relations among statistics, 75–80, 78t, 79f Semipartial multiple correlations, 145–148, 261, 263 See also Correlations Semipartial scatterplot, 72 Sequential coding See also Coding systems overview, 276, 277, 278t, 279t, 280–282, 280t, 308 statistical software and, 590 Serial multiple mediator model, 464, 466f, 467–469 See also Multiple mediator models Sets assumption violations and, 505–506 collinearity and, 533 multicategorical variables and, 244–257, 246t, 249t, 250f, 252, 254f singularity and, 534–535 statistical software and, 597 Setwise partial association, 145 Shrunken R, 181–188, 182t, 186t, 187f, 587 Side effects, 163 Significance, statistical See Statistical significance Significance test, 128 Simple correlation Tprj, 87 Simple effects, 378, 405, 406f Simple interaction tests, 443 See also Testing Simple linear 
interaction, 380–382, 380f, 381t See also Interactions Simple mediation model, 452, 461f See also Mediation model Simple regression coefficient Tbj, 87 Simple regression model See also Regression analysis examples of, 26–28, 28f measurement error and, 529 overview, 23–29, 30f regression line and, 25–26 relations among statistics, 75–78, 78t residuals and, 35–40, 38f, 39f, 40f scatterplots and conditional distributions and, 17–22, 18f, 19f, 20f, 22t, 23f statistical inference and, 121–122 Simple regression weights, 87 Simple relationship, 80–81 Simpson’s paradox, Single dichotomous regressor, 557–560, 558t, 559t, 560f Single numerical regressor, 559–560, 560f Single regression model, 223–233, 557–560, 558t, 559t, 560f See also Standardized regression coefficient Singularity, 109, 154, 251, 534–538 Slope See also Regression coefficient b1; Regression slope degrees of freedom and, 99–100 interactions and, 377–378, 380–381 overview, 41 random assignment and, 170–172, 171f, 174 regression and correlation coefficients and, 32–33, 33f residual analysis and, 38–39, 38f scatterplots and, 19–20 spline regression and, 361–362 three or more regressors and, 65 Sobel test, 456, 457–458, 462 Specification error, 538–541, 539f, 541f Spline regression, 357–369, 359f, 360f, 366f, 375, 588–589 Spotlight analysis, 423 SPSS See also Statistical software Analysis of Variance Summary Table and, 92, 93f, 101–102 assumption violations and, 504, 506 comparison of adjusted means and, 295, 296–297 contrasts and, 293–294, 293f dominance analysis and, 238f, 241 estimating true validity, 186–188, 187f importance of regressors and, 233 indicator coding and, 249–250, 250f, 254–255 interactions and, 390, 399, 400f, 401, 418–419, 420–421, 422f interval estimates and, 106 irregularities and, 512 linear regression analysis and, 29, 30f logistic regression and, 562, 563, 563f model estimation with, 55–56, 56f, 57f, 58 multicategorical variables and, 262f, 263–264, 268–269 
multidimensional sets and, 151–152, 151f multilevel modeling and, 577 multiple mediator models and, 466 multiple test problem and, 312–313, 326 overview, 13–14 path analysis and, 461f, 462–463, 469 predictor variables and, 191–192 probing an interaction and, 426 regression diagnostics and, 494, 495 RLM macro for, 581–601 rounding errors and, 546 semipartial correlation and, 70 spline regression and, 363–364, 366–367 statistical inference and, 118 tolerance and, 111–112 weighted group coding and, 303–304, 307 Squared correlations, 212–223, 215f, 217t, 240–241 Squared leverage corrected residual, 487–488 Squared partial correlation, 79–80, 79f See also Partial regression correlations Squared residuals, 107–108 Standard assumptions of regression theory, 88–91 See also Assumptions Standard configuration, 79 Standard deviation dichotomous regressors and, 131 importance of regressors and, 225–226 regression to the mean and, 142 simple regression model and, 25 Standard errors See also Errors assumption violations and, 502, 506 conditional effects and, 415–416, 417 contrasts and, 291–292, 296 heteroscedasticity-consistent standard errors, 510, 511–512 interactions and, 432 irregularities and, 512 measurement error and, 531 path analysis and, 456 power of a statistical test and, 521–524, 548–549 random assignment and, 172–173 statistical inference and, 105–106, 107–108, 118–119, 120 tolerance and, 110–112 weighted group coding and, 306, 307 Standardized partial regression coefficient, 74, 241 See also Partial regression coefficients; Regression coefficient Standardized regression coefficient See also Regression coefficient; Single regression model dichotomous regressors and, 130–132 importance of regressors and, 223–233 limitations of, 224–225 overview, 73–75, 209–210 R programming language and, 605–606 statistical software and, 586 Standardized values, 31–32 STATA See also Statistical software Analysis of Variance Summary Table and, 92, 94f, 101–102 
assumption violations and, 504 comparison of adjusted means and, 295 importance of regressors and, 232–233 indicator coding and, 249–250, 254–255, 254f interactions and, 387, 390, 399, 417–419, 418f, 421 irregularities and, 512 linear regression analysis and, 29, 30f logistic regression and, 562–563 model estimation with, 58 multicategorical variables and, 264, 268–269, 270f multidimensional sets and, 151–152 multilevel modeling and, 577 multiple test problem and, 313, 327 overview, 13–14 path analysis and, 459, 460f, 462, 463, 464f predictor variables and, 191–192 regression analysis and, 47 regression diagnostics and, 495 rounding errors and, 546 spline regression and, 363 tolerance and, 111–112 weighted group coding and, 304 Statistical control See also Random assignment examples of, 4–8, 5t, 7t, 8t limitations of, 158–159 linear modeling and, methods of, 2–4 multicategorical variables and, 260–264, 262f need for, 1–2 overcontrol, 154, 158, 538–541, 539f, 541f overview, 1–8, 5t, 7t, 8t, 16, 157–158, 176 random assignment and, 158–169 statistical software and, 12–14 supplementing random assignment with, 169–176, 170t, 171f Statistical inference See also Inference; Population inference Analysis of Variance Summary Table and, 92–102, 93f, 94f, 96t, 102t assumptions for, 88–91 collinearity and, 118–119 conditional means and, 116–118 contradicting inferences, 119–120 contrasts and, 291–292 linear models and, 10, 11–12 multiple correlation TR² and, 102–104 overview, 84, 85–92, 122–123 partial correlations and, 112–116, 114f, 115f partial regression coefficients and, 105–112 sample size and nonsignificant covariates, 121 sets of variables and, 149–151, 151f simple regression model and, 121–122 Statistical power, 153 See also Power Statistical significance fixing direct effects to zero and, 474 inference without random sampling and, 515–516 interactions and, 432, 440 power of a statistical test and, 519 random assignment and, 172 regression and correlation 
coefficients and, 34–35 Statistical significance test, 191 Statistical software See also Indicator coding; SAS; SPSS; STATA assumption violations and, 506 contrasts and, 292–294, 293f detecting irregularities and, 481–482 dominance analysis and, 237–240, 238f estimating true validity, 184, 186–188, 187f importance of regressors and, 232–233, 237–240, 238f, 241 indicator coding and, 249–250, 254–255 interactions and, 386–390, 388f, 389f, 398–401, 400f, 402f, 417–419 linear regression analysis and, 28–29, 30f logistic regression and, 562–565, 563f measurement error and, 531 mechanical prediction and, 178 model estimation with, 55–58, 56f, 57f multiple test problem and, 312–313 overview, 12–14 path analysis and, 457, 458–463, 459f, 460f, 461f, 464f predictor variables and, 191–192 probing an interaction and, 426–427, 427f R programming language, 603–610 regression diagnostics and, 494–495 singularity and, 535–538 spline regression and, 363–364 weighted group coding and, 303–304 Statistics importance of regressors and, 211–212 mechanical prediction and, 178 overview, 85–86 statistical tables, 612–620 Stepwise regression, 189–195, 192, 208, 317–318 Structural equation modeling program, 531 Structural equation modeling (SEM), 574–575, 578 Sum of means, contrasts and, 289–290 Sum of squares Analysis of Variance Summary Table and, 97–99 predictor variables and, 199 three or more regressors and, 66–67 Suppression, 81 Suppression predictor variable configuration, 196, 200–201, 203–204, 203f, 205 See also Predictor variables Suppressor variable, 201 See also Predictor variables Survival analysis, 573–574, 578 Symbolic representations, 15–16 SYSTAT, 13 See also Statistical software T t-distributions, 488 Testing assumptions, 153 Tests of interaction, 521 See also Interactions Three-way interaction, 438–440, 439t See also Interactions Tilted plane, 129, 130f Time series analysis, 573, 578 Tolerance interactions and, 435–436 overview, 107, 109–112 power of a 
statistical test and, 523 Total effect multiple mediator models and, 466 nonsignificant total effect and, 472–473 overview, 448–452, 450f path analysis and, 453–454, 455 statistical software and, 458–459, 459f Total influence, 490 See also Influence Transformations interactions and, 432–433, 433t nonlinear regression model and, 369–374, 371f, 373f overview, 375, 509 t-ratio, 293–294 t-residual assumption violations and, 505–506 diagnostic statistics and, 493 R programming language and, 607 regression diagnostics and, 494 TRS estimating true validity, 181–188, 182t, 186t, 187f predictor variables and, 192–193 True score, 526 True validity, 181–188, 182t, 186t, 187f True values, 86 See also Parameters t-test dichotomous regressors and, 128 logistic regression and, 552 overview, random assignment and, 172–173 simple regression model and, 31 t-value effect coding and, 287–288 Helmert coding and, 285–286 importance of regressors and, 230–231 indicator coding and, 257 interactions and, 405, 407, 435–436 path analysis and, 459 probing an interaction and, 424 statistical tables, 613 weighted group coding and, 303 Two-regressor model, 55, 77 See also Regressors Two-tailed confidence interval, 116 Two-tailed p-value, 456 See also p-value Two-way interaction, 438–440 See also Interactions Type I error Bonferroni method and, 320–328, 324t multiple regression analysis and, 311 multiple test problem and, 314–316, 335, 336, 339 overview, 308–309 U Unbiased estimation, 91–92 Unbiased statistics, 91–92 Undercontrol, 154, 538 Unique contribution, 145–148 Univariate Mahalanobis distance, 485 See also Mahalanobis distance Unnecessary covariates, 524–525 Unplanned tests, 335–338 Unstandardized coefficients, 132 See also Regression coefficient Unweighted average simple effect, 407 See also Simple effects V Validity estimating true validity, 181–188, 182t, 186t, 187f mechanical prediction and, 180–181 random assignment and, 159 Validity shrinkage, 182, 207–208
Variable importance, 210–212 See also Importance of regressors Variable selection methods, 192–195 See also Selection Variable-by-variable tests, 443 See also Testing Variables, 533 See also Categorical variables; Dependent variables; Independent variable; Multicategorical variables; Ordinal variables; Regressors Variance See also Sampling variance interactions and, 436 missing data and, 545–546 simple regression model and, 24–25, 27–28 statistical inference and, 107–108 Venn diagrams and, 79–80 Variance inflation factor, 107, 111 Venn diagrams Cohen’s f², 227–228 multicategorical variables and, 269, 270f predictor variables and, 196–198, 197f relations among statistics and, 78–80, 79f Visualizing interactions, 596 See also Interactions W Wald statistic, 561 Warped surface, 384–385, 385f Weighted contrasts, 304–308 See also Contrasts Weighted effect coding, 301t, 592 Weighted group coding, 298–308, 299t, 301t, 302t See also Coding systems Weighted Helmert coding, 300–304, 301t, 302t, 591–592 See also Helmert coding Weighted means, 307 Weighted sum of means, 289–290 Weighted tests, 336–337 Y Y-intercept, 20, 23–24, 99–100 See also Intercept; Regression constant b0 Z Z scores, 503

About the Authors

Richard B. Darlington, PhD, is Emeritus Professor of Psychology at Cornell University. He is a Fellow of the American Association for the Advancement of Science and has published extensively on regression and related methods, the cultural bias of mental tests, the long-term effects of preschool programs, and, most recently, the neuroscience of brain development and evolution.

Andrew F. Hayes, PhD, is Professor of Quantitative Psychology at The Ohio State University. His research and writing on data analysis have been published widely, and he is the author of Introduction to Mediation, Moderation, and Conditional Process Analysis and Statistical Methods for Communication Science. Dr. Hayes teaches data analysis, primarily at the graduate level, and frequently conducts workshops
on statistical analysis throughout the world. His website is www.afhayes.com.