Statistics without maths for psychology 404

1. Make sure you have enough participants: psychologists have different opinions as to the number of participants required for multiple regression. Authors of statistical textbooks often recommend a participant/variable ratio. Assume you have four explanatory variables. The participant/variable ratio given in books tends to range from 15 participants per variable (which means you should have 60 participants in the analysis) to 40 participants per variable (which means you should have 160 participants in the analysis) – quite a difference. Tabachnick and Fidell (2012) say that the simplest way of determining sample size is:
N ≥ 50 + 8m

where m is the number of explanatory variables. Thus, if you have four explanatory variables and simply wish to look at the combined effects of the explanatory variables (Multiple R), you should have at least:
50 + (8 × 4) = 50 + 32 = 82 participants
Often, however, researchers wish to look at the significance of each variable separately. In this case, Tabachnick and Fidell recommend the following calculation:
N ≥ 104 + m
= 104 + 4
= 108
If you are looking at both the combined and separate results, choose the higher number: in this case, you need at least 108 participants. If you do not use enough participants, your results will be over-optimistic, and you will not know whether they would generalise.
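Tabachnick and Fidell's two rules of thumb are easy to put into a few lines of code. This Python sketch is our own illustration (the function name and arguments are ours, not the authors'):

```python
def required_n(m, test_overall=True, test_individual=True):
    """Minimum sample size for multiple regression with m explanatory
    variables, following Tabachnick and Fidell's rules of thumb."""
    candidates = []
    if test_overall:        # testing the combined effect (Multiple R)
        candidates.append(50 + 8 * m)
    if test_individual:     # testing each predictor separately
        candidates.append(104 + m)
    # when both are of interest, take the larger requirement
    return max(candidates)

print(required_n(4, test_individual=False))  # combined effect only: 82
print(required_n(4))                         # both combined and separate: 108
```

With four explanatory variables it returns 82 for the combined test alone, and 108 when each predictor is also tested separately.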
2. The criterion variable should be drawn from a normally distributed population of scores:
the explanatory variables do not need to be normally distributed. It is the distribution of the criterion variable, y (conditional on the explanatory variables), which should be drawn from a normal distribution.
3. Variables should be linearly related to the criterion variable: just as in linear regression, the explanatory variables should be linearly related to the criterion variable – otherwise there is not much point in doing multiple regression. Inspecting the scattergrams for your variables will let you know whether you have linear relationships (as compared with curvilinear relationships).
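Scattergrams are the primary check, but if you want a rough numerical screen as well, you can compare a straight-line fit with a quadratic one: if the quadratic fits dramatically better, the relationship is probably curvilinear. This sketch (our own, in Python, not part of the book's SPSS material) illustrates the idea with simulated data:

```python
import numpy as np

def curvature_check(x, y):
    """Compare the residual variance of a straight-line fit with that of a
    quadratic fit; a big drop suggests a curvilinear relationship that
    should show up on the scattergram."""
    res_lin = y - np.polyval(np.polyfit(x, y, 1), x)
    res_quad = y - np.polyval(np.polyfit(x, y, 2), x)
    return res_lin.var(), res_quad.var()

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
straight = 2 * x + rng.normal(0, 1, 50)        # genuinely linear relationship
curved = (x - 5) ** 2 + rng.normal(0, 1, 50)   # curvilinear relationship

print(curvature_check(x, straight))  # the two variances are similar
print(curvature_check(x, curved))    # the quadratic fit is dramatically better
```

This is only a screen: always look at the scattergram as well.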
4. Outliers may need to be eliminated: you learnt about outliers (extreme scores) in Chapter 3.
Outliers can have a big influence on regression analysis. Univariate outliers (unusual and extreme scores on one variable) are easy to spot, but multivariate outliers (extreme on two variables together) are more difficult. To give you an example of a multivariate outlier, a person aged 17 is not unusual, and earning a salary of £50,000 is not unusual (except for lecturers who write statistics books; they earn much less than this). However, to find a 17-year-old who earns such a salary is unusual. Sometimes it is quite hard to spot these sorts of outliers (SPSS can do this for you, although we do not cover it in this book).
However, if you have a small dataset, simply looking at the data might be enough. You might then want to consider deleting extreme outliers from the analysis. Obviously, though, you cannot delete outliers simply because they are outliers: it requires careful consideration, especially when you have fewer than 100 participants. Some students have asked us whether removing outliers is cheating, but we want the regression line (or plane) to reflect the 'average' participant, not somebody incredibly different from the rest.
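One common index of multivariate outlyingness is the Mahalanobis distance (this is what SPSS can compute for you). As a rough illustration, not part of the SPSS procedure, here is a Python sketch with made-up ages and salaries echoing the 17-year-old example above:

```python
import numpy as np

def mahalanobis_sq(data):
    """Squared Mahalanobis distance of each row from the centroid;
    large values flag cases that are unusual on the variables jointly,
    even if unremarkable on each variable separately."""
    data = np.asarray(data, dtype=float)
    centred = data - data.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(data, rowvar=False))
    return np.einsum('ij,jk,ik->i', centred, inv_cov, centred)

# made-up ages and salaries: the last case (17 years old, £50,000)
# is the chapter's example of a jointly unusual combination
ages = [25, 32, 41, 52, 38, 46, 29, 17]
salaries = [22000, 30000, 41000, 55000, 35000, 48000, 26000, 50000]
d2 = mahalanobis_sq(np.column_stack([ages, salaries]))
print(d2.round(2))  # the final case stands out with the largest distance
```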
5. Multicollinearity: the best situation occurs when the explanatory variables correlate highly with the criterion variable, but not with each other. You can inspect your correlation matrix before you perform multiple regression. You may find that some variables correlate highly with each other (0.8 and above); such high intercorrelation among explanatory variables is called multicollinearity. Your variables are obviously measuring much the same thing. Sometimes you can combine highly correlated variables, or you can omit one. This has the benefit of reducing the number of variables, of course, which helps with (1) above.
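Screening the correlation matrix for intercorrelations of 0.8 and above can be automated. Here is a minimal Python sketch (the variable names are hypothetical and the data are simulated; the 0.8 cutoff mirrors the discussion above):

```python
import numpy as np

def flag_collinear(names, data, cutoff=0.8):
    """Return pairs of explanatory variables whose absolute correlation
    meets the cutoff - the screening step described in point 5."""
    r = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) >= cutoff:
                flagged.append((names[i], names[j], round(r[i, j], 2)))
    return flagged

# hypothetical predictors: 'anxiety' and 'worry' are deliberate near-duplicates
rng = np.random.default_rng(0)
anxiety = rng.normal(size=100)
worry = anxiety + rng.normal(scale=0.1, size=100)  # correlates well above .8
age = rng.normal(size=100)                         # unrelated to the others
print(flag_collinear(['anxiety', 'worry', 'age'],
                     np.column_stack([anxiety, worry, age])))
```

Flagged pairs are candidates for combining or omitting, as described above.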
CHAPTER 12 Regression analysis 405
Example from the literature
The impact of work setting, demographic characteristics and personality factors related to burnout among professional counsellors
Lent and Schwartz (2012) investigated burnout (physical and emotional depletion after job-related stress) in professional counsellors. As part of this study, they looked at whether personality factors could predict the degree of burnout in counsellors. The predictor variables (also called 'explanatory variables') were assessed using the 50-item International Personality Item Pool (IPIP) Big Five measure (Goldberg, 1999), which consists of neuroticism, openness to experience, agreeableness, extraversion and conscientiousness.
Items are scored from 1 to 5. Burnout was assessed by the Maslach Burnout Inventory; the subscales which form the inventory are emotional exhaustion, depersonalisation, and personal accomplishment (these are the criterion variables). The authors carried out three separate standard multiple regressions, one for each criterion variable. All three regressions are summarised in the table below.
The authors say: 'all five independent [explanatory; predictor] variables significantly predicted emotional exhaustion, F(5, 336) = 48.05, p < .001; depersonalization, F(5, 336) = 17.15, p < .001; and personal accomplishment, F(5, 336) = 20.50, p < .001. A large effect size was found for predictions of emotional exhaustion (R² = .41). . . . Emotional exhaustion was predicted only by neuroticism (t = 11.36, p < .001); as neuroticism increases, so does emotional exhaustion'.
Personal reflection
Jonathan Lent Ph.D., Assistant Professor, Marshall University, USA
ARTICLE: The impact of work setting, demographic characteristics, and personality factors related to burnout among professional counsellors
Professor Lent says:
“I have always been interested in understanding what factors contribute to burnout among counsel- lors because of the impact this can have on the effectiveness and wellness of counsellors. In that paper, I wanted to explore if factors such as work setting, demographics, or personality had an influ- ence on the level of burnout experienced. One of the main issues when studying personality factors and burnout is that both personality factors and burnout are multidimensional. Burnout consists of three separate components: depersonalization, emotional exhaustion, and personal accomplishment.
Personality was defined using the 'Big Five' personality factors: extraversion, neuroticism, agreeableness, openness to experience, and conscientiousness. This required a more complicated analysis than when looking at the other factors being examined. The results demonstrated that low neuroticism, high extraversion, high agreeableness, and high conscientiousness led to the experience of higher personal accomplishment, lower depersonalization, and lower emotional exhaustion. The results, while not surprising, were interesting because little research had been conducted on personality and burnout at the time.”
The following table is reproduced from the article:
Summary of Multiple Regression analysis for variables predicting emotional exhaustion, depersonalization, and personal accomplishment (N=340)
Variable                  B      β       t        p
Emotional Exhaustion
  Neuroticism             2.31   .59     11.36    <.001
  Extraversion             .07   .04       .88      .38
  Openness                 .18   .08      1.81      .07
  Agreeableness           -.15  -.06     -1.18      .24
Depersonalization
  Neuroticism              .15   .24      3.83    <.001
  Extraversion             .01   .02       .33      .74
  Openness                 .07   .07      1.44      .15
  Agreeableness           -.32  -.27     -5.06    <.001
  Conscientiousness       -.06  -.08     -1.41      .16
Personal Accomplishment
  Neuroticism             -.21  -.30     -5.03    <.001
  Extraversion            -.02  -.01      -.01      .99
  Openness                -.01  -.01      -.07      .95
  Agreeableness            .26   .21      4.04    <.001
  Conscientiousness        .08   .10      1.80      .07
Activity 12.6
Look at the parts of the table above which relate to the criterion variables Depersonalization and Personal Accomplishment. Using the authors' explanation relating to Emotional Exhaustion (above), complete their explanation relating to Depersonalization and Personal Accomplishment by filling in the gaps:

Depersonalization was predicted by ... (t = ..., p < .001) and ... (t = ..., p < .001); as ... increases and ... decreases, depersonalization ...

Personal accomplishment was predicted by ... (t = ..., p ...) and agreeableness (t = ..., p ...); as neuroticism decreases and agreeableness increases, personal accomplishment ...
Discover the website at www.pearsoned.co.uk/dancey where you can test your knowledge with multiple choice questions and activities, discover more about topics using the links to relevant websites, and explore the interactive flowchart designed to help you find the right method of analysis.
Summary
• Regression analysis allows us to predict scores on a dependent variable from a knowledge of the scores on one or more explanatory variables.
• Psychologists use regression to assess relationships between variables.
• The dependent variable is also called the criterion variable, and the explanatory variables are also called the predictor variables.
• The line of best fit (the slope, b) can be used to determine how the criterion (y) changes as a result of changes in the predictor variable(s) (x).
• Confidence limits around b allow you to estimate the population slope with a certain degree of confidence.
SPSS exercises
Exercise 1
Enter the following data into SPSS and analyse it using the regression procedure: x is the score on examination anxiety and y is the number of hours spent revising in the week before the exam.
x y x y
33 45 56 79
64 68 44 44
33 100 22 16
22 44 44 61
70 62 80 60
66 61 66 61
59 52 79 60
84 66
Is anxiety about exams related to number of hours studied? How well does the regression equation predict?
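If you would like to cross-check the SPSS output, the same simple regression can be run outside SPSS. Here is a minimal Python sketch using the exercise data (the interpretation is still up to you):

```python
import numpy as np

# Exercise 1 data: x = examination anxiety, y = hours spent revising
x = np.array([33, 64, 33, 22, 70, 66, 59, 84, 56, 44, 22, 44, 80, 66, 79])
y = np.array([45, 68, 100, 44, 62, 61, 52, 66, 79, 44, 16, 61, 60, 61, 60])

slope, intercept = np.polyfit(x, y, 1)   # line of best fit: y = a + bx
r = np.corrcoef(x, y)[0, 1]              # Pearson correlation
print(f"y = {intercept:.2f} + {slope:.3f}x;  r = {r:.3f};  r2 = {r * r:.3f}")
```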
Exercise 2
Professor Lemon wants to analyse the contribution made by social support (SOCSUP) and outgoing personality (OUTGO) to contentment at work (CONTENT), using his own colleagues as the participants.
Perform a multiple regression on the data below. Interpret the output for Professor Lemon, in the form of a written report, making sure that you let him know the contribution made by each of the predictor variables to the criterion variable.
SOCSUP OUTGO CONTENT
20.00 15.00 20.00
10.00 30.00 15.00
4.00 5.00 5.00
17.00 16.00 20.00
10.00 14.00 15.00
11.00 8.00 10.00
7.00 7.00 8.00
4.00 4.00 5.00
15.00 10.00 17.00
17.00 5.00 17.00
18.00 6.00 15.00
11.00 12.00 18.00
12.00 10.00 15.00
16.00 16.00 17.00
18.00 12.00 20.00
14.00 13.00 14.00
12.00 14.00 15.00
10.00 4.00 5.00
11.00 6.00 7.00
10.00 10.00 11.00
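As with Exercise 1, you can cross-check the SPSS output with a few lines of Python. This sketch fits the same model, CONTENT = a + b1·SOCSUP + b2·OUTGO, by ordinary least squares (the written report is still up to you):

```python
import numpy as np

# Exercise 2 data (SOCSUP = social support, OUTGO = outgoing personality,
# CONTENT = contentment at work)
socsup = np.array([20, 10, 4, 17, 10, 11, 7, 4, 15, 17,
                   18, 11, 12, 16, 18, 14, 12, 10, 11, 10.0])
outgo = np.array([15, 30, 5, 16, 14, 8, 7, 4, 10, 5,
                  6, 12, 10, 16, 12, 13, 14, 4, 6, 10.0])
content = np.array([20, 15, 5, 20, 15, 10, 8, 5, 17, 17,
                    15, 18, 15, 17, 20, 14, 15, 5, 7, 11.0])

# design matrix with an intercept column: content = a + b1*socsup + b2*outgo
X = np.column_stack([np.ones_like(socsup), socsup, outgo])
coefs, *_ = np.linalg.lstsq(X, content, rcond=None)
fitted = X @ coefs
r2 = 1 - ((content - fitted) ** 2).sum() / ((content - content.mean()) ** 2).sum()
print("a, b1, b2 =", coefs.round(3), "  R2 =", round(float(r2), 3))
```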
Multiple choice questions

1. The line of best fit:
(a) Minimises the distance between the scores and the regression line
(b) Is the best of all possible lines
(c) Maximises the correlation between x and y
(d) All of these

2. In linear regression, where only one variable predicts y, and F is statistically significant at p = 0.049, then:
(a) The value of p for t = 0.049
(b) The value of p for t = 0.0245
(c) The value of p for t = 0.098
(d) Cannot tell

3. In a linear regression analysis, the residuals are:
(a) Actual scores minus the predicted scores
(b) Actual scores plus the predicted scores
(c) The correlation between the actual and predicted scores
(d) None of the above

Questions 4 to 7 relate to the following output:

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .319a   .102       .078                .639
a. Predictors: (Constant), MRL

ANOVAb
Model 1      Sum of Squares   df   Mean Square   F       Sig.
Regression   1.71690           1   1.71690       4.199   .048a
Residual     15.12589         37   .40881
Total        16.84279         38
a. Predictors: (Constant), MRL
b. Dependent Variable: PAIN

Coefficientsa
             Unstandardized Coefficients   Standardized Coefficient
Model 1      B          Std. Error         Beta                       t        Sig.
(Constant)   1.75772    .15455                                        11.373   .000
MRL          .01659     8.09626E-03        .31928                     2.049    .048
a. Dependent Variable: PAIN
4. Marks on MRL would be called:
(a) The predictor variable
(b) The criterion variable
(c) The covariate
(d) The constant

5. The exact probability value of the results having occurred by sampling error, assuming the null hypothesis to be true, is:
(a) 0.0000
(b) 0.05
(c) 4.19978
(d) 0.048

6. b is:
(a) 2.049
(b) 0.31928
(c) 0.01659
(d) None of these

7. a is:
(a) 1.75772
(b) 1.5455
(c) 4.19978
(d) 0.01659

8. How many degrees of freedom would you have where the linear regression scatterplot had only ONE datapoint? (very unrealistic, we know . . . )
(a) Zero
(b) One
(c) Two
(d) Three

9. Psychologists use regression mainly to:
(a) Assess relationships between variables
(b) Use the regression formula for further research
(c) Look at differences between groups
(d) None of the above
Questions 10 to 15 relate to the partial output of the multiple regression analysis below:

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .867a   .752       .711                3.2388
a. Predictors: (Constant), age, previous history rating
10. The correlation between credit rating and the other variables is:
(a) 0.867
(b) 0.752
(c) 0.711
(d) 1.32

11. For every 1 standard deviation rise in previous history rating, credit rating:
(a) Decreases by 0.5 of a standard deviation
(b) Increases by 0.5 of a standard deviation
(c) Decreases by 0.3 of a standard deviation
(d) Increases by 0.3 of a standard deviation

12. The predictor variables are called:
(a) Credit rating and age
(b) Credit rating and previous history rating
(c) Previous history and age
(d) The criterion variables

13. The achieved significance level associated with the F-value of 18.182 is:
(a) 0.824
(b) 0.36
(c) < 0.001
(d) None of these
Coefficientsa
                          Unstandardized Coefficients   Standardized Coefficient                   95% Confidence Interval for B
Model 1                   B       Std. Error            Beta                       t       Sig.    Lower Bound   Upper Bound
(Constant)                .790    3.471                                            .228    .824    -6.774        8.353
previous history rating   .571    .241                  .514                       2.368   .036    .046          1.096
age                       .276    .145                  .413                       1.904   .081    -.040         .592
a. Dependent Variable: credit rating

ANOVAb
Model 1      Sum of Squares   df   Mean Square   F        Sig.
Regression   381.457           2   190.729       18.182   .000a
Residual     125.876          12   10.490
Total        507.333          14
a. Predictors: (Constant), age, previous history rating
b. Dependent Variable: credit rating
14. The slope of the line (b) for previous history rating is:
(a) 0.514
(b) 0.790
(c) 0.276
(d) 0.571

15. a is:
(a) 0.514
(b) 0.790
(c) 0.276
(d) 0.571
16. Multicollinearity means:
(a) There are high intercorrelations among the predictor variables
(b) The predictor variables are positively correlated with the criterion variable
(c) The variables show a skewed distribution
(d) The variables show a peaked distribution
17. Kieran wants to perform a standard multiple regression using six explanatory variables. He is only interested in the overall R2. According to Tabachnick and Fidell’s formula, how many participants should he recruit?
(a) 98
(b) 56
(c) 240
(d) 120
18. Saeeda doesn’t know about the necessity for large participant numbers in multiple regression. She’s only got 20 participants in her study, and she has 10 explanatory variables. Which is the most appropriate statement? Compared with an analysis using 100 participants, Multiple R will be:
(a) Conflated
(b) Inflated
(c) Deflated
(d) No different
Questions 19 and 20 relate to the following, which is an extract from a results section in a journal:
‘All predictors significantly predicted blood pressure (adj. R² = .42; p = .002). Stress during the interview was the strongest predictor of increased blood pressure (beta = 0.49, p = .001), followed by age (beta = 0.18, p = .002).’
19. Which is the most appropriate statement? The explanatory variables predicted:
(a) 6.5% of the variation in blood pressure
(b) 42% of the variation in blood pressure
(c) 6.5% of the variation in stress
(d) 18% of the variation in age
20. Which is the most appropriate statement?
(a) As stress increased by 1 standard deviation, blood pressure increased by nearly half a standard deviation
(b) As stress increased by 1 standard deviation, age increased by 0.18 of a standard deviation
(c) As age increased by 1 year, blood pressure fell by 0.18 of a standard deviation
(d) As age increased by 1 standard deviation, blood pressure increased by 18%
References

Goldberg, L. R. (1999) ‘A broad-bandwidth, public domain, personality inventory measuring the lower-level facets of several five-factor models’, Personality Psychology in Europe, 7(1): 7–28.

Lent, J. and Schwartz, R. C. (2012) ‘The impact of work setting, demographic characteristics, and personality factors related to burnout among professional counselors’, Journal of Mental Health Counseling, 34(4): 355–72.

Meyrueix, L., Durham, G., Miller, J., Bryant Smalley, K. and Warren, J. C. (2015) ‘Association between depression and aggression in rural women’, Journal of Health Disparities Research and Practice, 8(4), Article 10.

Tabachnick, B. G. and Fidell, L. S. (2012) Using Multivariate Statistics, 6th edn. New York: Allyn and Bacon.

Yu, C. H. (2003) ‘Illustrating degrees of freedom in multimedia’. Available at: www.creative-wisdom.com/pub/df/default.htm [accessed 23 March 2015].

Yu, C. H., Lo, W. J. and Stockford, S. (2001) ‘Using multimedia to visualize the concepts of degree of freedom, perfect-fitting, and over-fitting’. Paper presented at the Joint Statistical Meetings, Atlanta, GA.

Answers to multiple choice questions

1. d, 2. d, 3. a, 4. a, 5. d, 6. c, 7. a, 8. a, 9. a, 10. a, 11. b, 12. c, 13. c, 14. d, 15. b, 16. a, 17. a, 18. b, 19. b, 20. a
13 Analysis of three or more groups partialling out effects of a covariate

CHAPTER OVERVIEW
This chapter will introduce you to a technique that is based on both analysis of variance (ANOVA) and linear regression. This technique is called analysis of covariance (ANCOVA) and it builds on the material you have learnt in previous chapters. A simple ANCOVA shows you whether your groups differ on a dependent variable, while partialling out the effects of another variable, called the covari- ate. A covariate is a variable that has a linear relationship with the dependent variable. You have already learnt about removing (partialling out) the effects of a variable, in Chapter 6, on correlational analysis. In ANCOVA, the variable that is partialled out is called the covariate. In this chapter, we are going to discuss the analysis of a one-way, between-participants design, and the use of one covariate.
As the material in this chapter is based on the one-way ANOVA, everything we talked about in relation to the analysis of a one-way, between-participants design (in Chapter 10) applies here. In other words, the analysis of a one-way design includes the following:
■ Descriptive statistics, such as means, standard deviations, graphical illustrations such as box and whisker plots, and error bars.
■ Effect size – either the magnitude of the difference between the conditions (d), and/or an overall measure of effect such as partial eta squared (partial η²).
■ An inferential test: in this case ANCOVA, which shows us (assuming the null hypothesis to be true) how likely it is, after partialling out the effects of the covariate, that differences between the conditions are due to sampling error.
In this chapter you will:
■ gain a conceptual understanding of ANCOVA
■ learn the conditions under which it is appropriate to use ANCOVA
■ understand the assumptions that you must meet in order to perform ANCOVA
■ learn how to present results using graphical techniques.
There are two main reasons for using ANCOVA:
1. To reduce error variance.
2. To adjust the means on the covariate, so that the mean covariate score is the same for all groups.
CHAPTER 13 Analysis of three or more groups partialling out effects of a covariate 415
Example
Imagine that new students are assigned at random to three different introductory statistics groups, using three different teaching methods. They have an hour’s session.
1. Group 1 has an hour of traditional ‘chalk ‘n’ talk’.
2. Group 2 has an hour of the same, only the lecture is interactive in that students can interrupt and ask questions, and the lecturer will encourage this. This is traditional plus interactive.
3. Group 3 is highly interactive in that the students work in groups with guidance from the lecturer.
In order to discover which method has worked best, we give the students a 20-question test to see which group has retained the most material from the one-hour session. Let’s say we expect Group 3 to retain the most material (i.e. the highly interactive method is expected to be the most effective teaching method).
We could perform a simple one-way ANOVA, using teaching method as the independent variable (three levels). This would show us whether there were differences between the groups, in retention of the statistics material. However, assume that the ability to retain material in the lecture is related to IQ, irrespective of teaching method. If IQ and ability to retain such material are associated, we would expect the association to be positive: that is, IQ and scores on the statistics test should be positively correlated.
Imagine that we have collected data on IQ and marks in a statistics test; the scattergram might look something like Figure 13.1.
Although the correlation is positive, it is a moderate one: +0.49 in fact.
What happens in ANCOVA is that IQ (called the covariate because it varies with the dependent vari- able) is taken into account in the mathematical calculations. What the formula does is to remove the variance due to the association between statistics performance and IQ. As we have said above, this will reduce our error variance.
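To see the error-variance idea in miniature, here is a small simulated example (our own sketch, not part of the SPSS procedure): test scores depend on both teaching group and IQ, and removing the variance that scores share with IQ shrinks the within-group (error) variance:

```python
import numpy as np

rng = np.random.default_rng(7)
group = np.repeat([0, 1, 2], 30)                  # three teaching methods
iq = rng.normal(112, 4, size=group.size)          # covariate
# test score depends on IQ, on teaching method, and on random noise
score = 10 + 0.8 * (iq - 112) + 2.0 * group + rng.normal(0, 2, size=group.size)

# pooled within-group slope of score on IQ (the slope ANCOVA estimates)
iq_w = iq - np.array([iq[group == g].mean() for g in group])
sc_w = score - np.array([score[group == g].mean() for g in group])
b = (iq_w @ sc_w) / (iq_w @ iq_w)

# remove the variance in score that is shared with IQ
adjusted = score - b * (iq - iq.mean())

def error_variance(y):
    """Mean within-group variance: the error term of a one-way design."""
    return np.mean([y[group == g].var(ddof=1) for g in (0, 1, 2)])

print(f"error variance before: {error_variance(score):.2f}, "
      f"after partialling out IQ: {error_variance(adjusted):.2f}")
```

The drop in error variance is exactly why ANCOVA gives a more sensitive test of the group differences.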
A good way to visualise what is happening is with a graph (Figure 13.2). This shows you our three different teaching method groups (called traditional, mixed and interactive). The instructions on how to obtain a chart of regression lines such as the ones in Figure 13.2 follow.
Figure 13.1 Scattergram of IQ and marks in statistics test
[Figure: scatterplot with IQ (106 to 120) on the x axis and statistics test mark (0 to 30) on the y axis]
Figure 13.2 Graph showing scattergrams and lines of best fit for three teaching groups
[Figure: scatterplot with IQ (106 to 120) on the x axis and test mark (0 to 30) on the y axis; separate regression lines are shown for groups 1.00, 2.00 and 3.00]
SPSS: obtaining a chart of regression lines
Select Graphs, Legacy Dialogs then Scatter/Dot as follows:
This gives you the following dialogue box:
Ensure that the Simple Scatter box is selected, then click on Define. This gives you the simple scatterplot dialogue box as follows:
The variables are moved from the left-hand side to the right. Score is moved to the Y Axis box, Motiva- tion to the X Axis box, and Group to the Set Markers by box. This is important because it enables you to obtain separate regression lines for each group.
Click on OK. This leads to the following:
Once you have done this, you double-click on the graph and obtain the following. Click on the X icon.
You can now change the numbers on the x axis so that it begins at zero, goes up in increments of 5, and shows the full range of scores (0 to 20).
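The per-group regression lines that SPSS draws can also be computed directly. This Python sketch (with simulated IQ and test scores standing in for the chart's data, which are not reproduced here) fits one line of best fit per teaching group, as in Figure 13.2:

```python
import numpy as np

# simulated stand-in data: IQ (covariate), group (1-3), test score
rng = np.random.default_rng(3)
group = np.repeat([1, 2, 3], 20)
iq = rng.normal(112, 4, size=group.size)
score = 4 + 0.5 * (iq - 106) + 2 * (group - 1) + rng.normal(0, 2, size=group.size)

# one line of best fit per teaching group, as in Figure 13.2
for g in (1, 2, 3):
    in_group = group == g
    slope, intercept = np.polyfit(iq[in_group], score[in_group], 1)
    print(f"group {g}: score = {intercept:.2f} + {slope:.2f} * IQ")
```

Roughly parallel slopes across groups are what ANCOVA assumes; visibly different slopes would be a warning sign.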