SPSS Explained

SPSS Explained provides the student with all that they need to undertake statistical analysis using SPSS. It combines a step-by-step approach to each procedure with easy-to-follow screenshots at each stage of the process. A number of other helpful features are provided, including:

- regular advice boxes with tips specific to each test
- explanations divided into 'essential' and 'advanced' sections to suit readers at different levels
- frequently asked questions at the end of each chapter

The first edition of this popular book has been fully updated for IBM SPSS version 21 and also includes:

- chapters that explain bootstrapping and how this is used
- an introduction to binary logistic regression
- coverage of new features such as Chart Builder

Presented in full colour and with a fresh, reader-friendly layout, this fully updated new edition also comes with a companion website featuring an array of supplementary resources for students and instructors. Minimal prior knowledge is assumed, so the book is well designed for the novice user, but it will also be a useful reference source for those developing their own expertise in SPSS. It is suitable for all students who need to undertake statistical analysis using SPSS in various disciplines, including psychology, social science, business studies, nursing, education, health and sport science, communication and media, geography, and biology.

The authors have many years of experience in teaching SPSS to students from a wide range of disciplines. Their understanding of SPSS users' concerns, as well as a knowledge of the type of questions students ask, forms the foundation of this book.

Perry R. Hinton is a psychologist, and has worked for over twenty-five years in four British universities, in positions ranging from lecturer to Head of Department. He has taught in the areas of cognitive and social psychology, and research methods and statistics, primarily to psychology and communication and media students, but also to a wide range of students studying subjects including nursing, social work, linguistics, philosophy and education. He has written four textbooks and edited the Psychology Focus series for Routledge.

Isabella McMurray is a senior lecturer in psychology at the University of Bedfordshire, UK. She has taught in a range of areas, primarily within psychology, including developmental psychology, and qualitative and quantitative research methods and analysis. She undertakes consultancy, training and research with local authorities and charities, including working with social workers, probation services, education services and road safety teams.

Charlotte Brownlow is a senior lecturer in psychology at the University of Southern Queensland, Australia. She has taught in a range of areas, primarily within psychology, including developmental and social psychology, and qualitative and quantitative research methods.

SPSS Explained
Second edition
PERRY R. HINTON, ISABELLA McMURRAY AND CHARLOTTE BROWNLOW

Second edition published 2014 by Routledge
27 Church Road, Hove, East Sussex BN3 2FA
and by Routledge
711 Third Avenue, New York, NY 10017

Routledge is an imprint of the Taylor & Francis Group, an informa business

© 2014 Perry R. Hinton, Isabella McMurray & Charlotte Brownlow

The right of Perry R. Hinton, Isabella McMurray and Charlotte Brownlow to be identified as authors of this work has been asserted by them in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this book may be reprinted or reproduced or
utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

First edition published by Routledge 2004

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library.

Library of Congress Cataloging in Publication Data
Hinton, Perry R. (Perry Roy), 1954–
SPSS explained / Perry R. Hinton, Isabella McMurray & Charlotte Brownlow. – 2nd edition.
pages cm
Includes bibliographical references and index.
1. SPSS (Computer file) 2. Psychometrics – Computer programs. I. McMurray, Isabella, 1970– II. Brownlow, Charlotte, 1974– III. Title.
BF39.H538 2014
300.285′555 – dc23
2013037038

ISBN: 978-0-415-61601-0 (hbk)
ISBN: 978-0-415-61602-7 (pbk)
ISBN: 978-1-315-79729-8 (ebk)

Typeset in Berkeley and Stone Sans by Florence Production, Stoodleigh, Devon, UK

To our families

Contents

Preface xi
Acknowledgements xiii

CHAPTER 1 INTRODUCTION
  THE BOOK OUTLINE
  THE CHAPTER OUTLINE

CHAPTER 2 DATA ENTRY 5
  THINGS TO THINK ABOUT BEFORE ENTERING DATA
  USING SPSS FOR THE FIRST TIME
  DATA EDITOR
  SENDING VARIABLES ACROSS TO A DIFFERENT BOX
  PERFORMING CALCULATIONS ON YOUR DATA 17
  RECODING DATA 20
  REPLACING MISSING VALUES 22
  SPLIT FILE COMMAND 25
  WEIGHT CASES COMMAND 28
  IMPORTING DATA FROM EXCEL 29

CHAPTER 3 DESCRIPTIVE STATISTICS 35
  INTRODUCTION TO DESCRIPTIVE STATISTICS 35
  FREQUENCIES COMMAND 35
  DESCRIPTIVES COMMAND 41
  EXPLORE COMMAND 43
  CROSSTABS COMMAND 43
  CUSTOM TABLES COMMAND 50

CHAPTER 4 ILLUSTRATIVE STATISTICS 57
  INTRODUCTION TO ILLUSTRATIVE STATISTICS 57
  GENERATING GRAPHS WITH CHART BUILDER 58
  HISTOGRAMS 58
  BOXPLOTS 61
  BAR CHARTS 66
  CLUSTERED AND STACKED BAR CHARTS 69
  ERROR BAR CHARTS 72
  LINE GRAPHS 75
  PIE CHARTS 78
  GENERATING GRAPHS WITH LEGACY DIALOGS 80
  EDITING GRAPHS 89
  EDITING AXES 90

CHAPTER 5 INTRODUCTION TO STATISTICAL TESTING 93
  DIFFERENT STATISTICAL TESTS 93
  INTRODUCTION TO PARAMETRIC TESTS 93
  THE LOGIC OF SIGNIFICANCE TESTING 94
  POWER 96
  CONFIDENCE INTERVALS 98
  THINGS TO CONSIDER BEFORE ANALYSING YOUR DATA 99
  PLOTTING THE NORMAL CURVE 100
  TESTING FOR NORMALITY 103
  COMPARING MORE THAN ONE SAMPLE: THE ASSUMPTION OF HOMOGENEITY OF VARIANCE 109
  VIOLATIONS OF THE ASSUMPTIONS 109
  BOOTSTRAPPING 113
  ADVICE ON SIGNIFICANCE TESTING 117

CHAPTER 6 t TESTS 119
  INDEPENDENT SAMPLES t TEST 120
  THE PAIRED SAMPLES t TEST 127

CHAPTER 7 INTRODUCTION TO ANALYSIS OF VARIANCE (GENERAL LINEAR MODEL) 137
  INTRODUCTION 137
  GENERAL LINEAR MODEL COMMAND 138
  A MODEL FOR ANALYSING DATA 138
  THE GENERAL LINEAR MODEL 141
  KEY TERMS IN THE ANALYSIS OF VARIANCE 144
  UNIVARIATE ANALYSIS OF VARIANCE 145
  MULTIVARIATE ANALYSIS OF VARIANCE 146
  REPEATED MEASURES ANALYSIS OF VARIANCE 147
  CONTRASTS AND MULTIPLE PAIRWISE COMPARISONS 150
  MAIN EFFECTS 154
  COMPARING CONDITION MEANS 155

CHAPTER 8 ONE FACTOR ANALYSIS OF VARIANCE 157
  ONE FACTOR INDEPENDENT ANALYSIS OF VARIANCE 157
  ONE FACTOR REPEATED MEASURES ANALYSIS OF VARIANCE 177

CHAPTER 9 TWO FACTOR ANALYSIS OF VARIANCE 189
  TWO FACTOR INDEPENDENT MEASURES ANALYSIS OF VARIANCE 190
  TWO FACTOR REPEATED MEASURES ANALYSIS OF VARIANCE 196
  TWO FACTOR MIXED DESIGN ANALYSIS OF VARIANCE 205
  CALCULATION OF SIMPLE MAIN EFFECTS – TWO FACTOR MIXED DESIGN ANALYSIS OF VARIANCE 216
CHAPTER 10 INTRODUCTION TO MULTIVARIATE ANALYSIS OF VARIANCE 221
  INDEPENDENT MULTIVARIATE ANALYSIS OF VARIANCE 222
  REPEATED MEASURES MULTIVARIATE ANALYSIS OF VARIANCE 228

CHAPTER 11 NONPARAMETRIC TWO SAMPLE TESTS 235
  MANN–WHITNEY U TEST (FOR INDEPENDENT SAMPLES) 235
  WILCOXON SIGNED-RANKS TEST (FOR RELATED SAMPLES) 246

CHAPTER 12 NONPARAMETRIC k SAMPLE TESTS 257
  KRUSKAL–WALLIS TEST FOR INDEPENDENT SAMPLES 257
  FRIEDMAN TEST FOR RELATED SAMPLES 267

CHAPTER 13 CHI-SQUARE TEST OF INDEPENDENCE AND GOODNESS OF FIT TEST 277
  CHI-SQUARE AS A TEST OF INDEPENDENCE 278
  CUSTOM TABLES AND CHI-SQUARE 283
  A 2 × 2 CROSSTABULATION AND CHI-SQUARE 286
  A 2 × 2 × 2 CROSSTABULATION AND TEST PROCEDURE: LAYERED CHI-SQUARE 288
  CHI-SQUARE AS A 'GOODNESS OF FIT' TEST 292

CHAPTER 14 LINEAR CORRELATION AND REGRESSION 297
  INTRODUCTION TO THE PEARSON CORRELATION 298
  INTRODUCTION TO THE SPEARMAN CORRELATION 301

USING SPSS TO ANALYSE QUESTIONNAIRES: RELIABILITY

SPSS essential

- The SPSS output shows the findings of the analysis for each item on the questionnaire. The Corrected Item-Total Correlation column shows the relationship between the responses on individual questions and the overall total score on the questionnaire. We would expect a reliable question to have a positive relationship with the overall total. An item displaying a weak positive or a negative relationship to the total indicates a question that may be poor on reliability and is thus affecting the findings from the whole scale.
- In the example above, we can see that one question is clearly the weakest item, because its correlation with the overall total is only .052.
- The effects that individual questions can have on the overall reliability of the questionnaire are highlighted by the inverse relationship between the Corrected Item-Total Correlation and the Alpha if Item Deleted columns. The importance of the weak relationship between this question and the overall total score on the questionnaire is reflected in the increase in the alpha score for the questionnaire if this item is omitted. We saw in the earlier table that the overall alpha value was .852. While this figure is high, the removal of this question from the final questionnaire would see this figure rise to .865 (as can be seen from the table).

SPSS advanced

- The Scale Mean if Item Deleted column shows the effects on the overall mean of the scale if an individual question is deleted. In the example above, if this question was omitted from the final version of the questionnaire, the overall mean of the scale would fall from 42.62 (see table below) to 38.40. Similar effects can be seen from examining the Scale Variance if Item Deleted column.
- The Squared Multiple Correlation column gives us a value for the amount of variability on this item that can be predicted by the items in the rest of the questionnaire.

The Scale Statistics table gives the descriptive statistics for the scale as a whole.

SPSS essential

- In our example, when the total scores of the questionnaire are examined, participants scored a mean of 42.62, with a variance of 116.377 and a standard deviation of 10.788.
- The small standard deviation thus indicates that there are not wide variations in the scores of our participants for the overall total score on the questionnaire.
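The book carries out this analysis through the SPSS menus (Analyze > Scale > Reliability Analysis). As a purely illustrative aside that is not part of the original text, the sketch below shows how the quantities behind these output columns can be computed directly; the data array and question labels are hypothetical.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_respondents, n_items) array of item scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of the total score
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

def item_total_statistics(scores):
    """Corrected item-total correlation and alpha-if-item-deleted for each item."""
    scores = np.asarray(scores, dtype=float)
    results = []
    for i in range(scores.shape[1]):
        others = np.delete(scores, i, axis=1)                       # every item except item i
        corrected_r = np.corrcoef(scores[:, i], others.sum(axis=1))[0, 1]
        results.append((corrected_r, cronbach_alpha(others)))      # alpha with item i removed
    return results

# Hypothetical example: six respondents answering four Likert-type questions.
data = np.array([[4, 5, 4, 2],
                 [3, 4, 3, 5],
                 [2, 2, 3, 4],
                 [5, 5, 4, 1],
                 [1, 2, 2, 3],
                 [4, 4, 5, 2]])

print(f"overall alpha = {cronbach_alpha(data):.3f}")
for i, (r, alpha_without) in enumerate(item_total_statistics(data), start=1):
    print(f"Q{i}: corrected item-total r = {r:.3f}, alpha if item deleted = {alpha_without:.3f}")
```

The pattern to look for is the one described above: an item with a low (or negative) corrected item-total correlation, whose removal raises alpha above the overall value, is a candidate for dropping from the scale.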
FAQ
Sometimes when I conduct a reliability analysis, the Alpha and Standardized alpha values are missing from the screen. How do I obtain these values?

This may be the case if the questionnaire contains many items. In these instances click on the output box once to select the text, then place the cursor on the middle square of the enlarging bar at the bottom of the box and drag down until the box is big enough to display the desired information. Alternatively, print your output screen, and the complete analysis should be visible.

FAQ
I have got my alpha figure, but how do I know whether this indicates a reliable scale or not?

There is much debate among researchers as to where the appropriate cut-off points are for reliability. A good guide is:

- 0.90 and above shows excellent reliability;
- 0.70 to 0.90 shows high reliability;
- 0.50 to 0.70 shows moderate reliability; and
- below 0.50 shows low reliability.

FAQ
I have measured the reliability of my questionnaire using Cronbach's Alpha. How can I assess the validity of my questionnaire?

To assess test validity you could ask participants to complete a previously validated questionnaire, which purports to measure the same thing, in addition to the newly created questionnaire. The scores from the two questionnaires need to be positively correlated in order for the new test to be considered valid.
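As an illustrative aside not taken from the book, which runs correlations through Analyze > Correlate > Bivariate (Chapter 14), the short sketch below shows this validity check on two hypothetical sets of questionnaire totals.

```python
import numpy as np
from scipy import stats

# Hypothetical total scores for the same ten participants on the new
# questionnaire and on a previously validated questionnaire.
new_totals = np.array([42, 35, 50, 28, 39, 45, 31, 48, 36, 41])
validated_totals = np.array([40, 33, 52, 30, 37, 44, 29, 49, 38, 40])

# A clear positive correlation supports the validity of the new questionnaire.
r, p = stats.pearsonr(new_totals, validated_totals)
print(f"r = {r:.2f}, p = {p:.3f}")
```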
FAQ
I have assessed the reliability of my questionnaire but would now like to analyse it to look for relationships and differences within my respondents. How could I do this?

There are numerous options available to you:

- Use the Compute command to generate total scores, or sub-scale scores (Chapter 2).
- Look for differences between groups of participants filling in the questionnaire by using tests of difference (t tests, ANOVA, or their nonparametric equivalents described in Chapters 6, 8, 11 and 12).
- Look for associations between responses to individual questions (chi-square, Chapter 13).
- Look for relationships between scales using correlations (Chapter 14).
- Examine whether one score reliably predicts other scores using regression (Chapters 14 and 15).

FAQ
I did not get all of the tables you had in your output. What have I done wrong?

It sounds as if you forgot to press the Statistics option. Go through the procedure again and remember to click on the Statistics button.

FAQ
I have completed my survey using an online application. Can this be analysed using SPSS?

Yes, indeed some questionnaire applications enable you to download your data directly into SPSS. Others produce an Excel spreadsheet as the data output. If your application produces an Excel file, this can be imported into SPSS; just make sure that the naming conventions match the requirements of SPSS. Please see Chapter 2 for more details on how to import data from Excel into SPSS.

Glossary

ANOVA An acronym for the ANalysis Of VAriance. By analysing the variance in the data due to different sources (e.g. an independent variable or error) we can decide if our experimental manipulation is influencing the scores in the data.

Asymp. Sig. (asymptotic significance) An estimate of the probability of a nonparametric test statistic employed by computer statistical analysis programs. This is often used when the exact probability cannot be worked out quickly.

beta weight The average amount by which the dependent variable increases when the independent variable increases by one standard deviation (all other independent variables are held constant).

between subjects Also known as independent measures. In this design, the samples we select for each condition of the independent variable are independent, as a member of one sample is not a member of another sample.

bootstrapping A sample is used to estimate a population. New bootstrap samples are randomly selected from the original sample with replacement (so an item can be selected more than once). The bootstrap samples, often 1,000 or more, are then used to estimate the population sampling distribution.
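As an illustrative aside that is not part of the book's glossary, the sketch below shows the resampling procedure this entry describes, used here to estimate a 95 per cent confidence interval for a sample mean; the data values and the choice of 1,000 bootstrap samples are hypothetical (the entry notes that 1,000 or more resamples are typical).

```python
import numpy as np

rng = np.random.default_rng(seed=1)
sample = np.array([12.0, 15.0, 9.0, 14.0, 11.0, 16.0, 10.0, 13.0])  # hypothetical data

# Draw bootstrap samples of the same size as the original, with replacement,
# and record the statistic of interest (here the mean) for each one.
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(1000)
])

# The spread of the bootstrap means estimates the sampling distribution of the mean;
# the 2.5th and 97.5th percentiles give a simple 95 per cent confidence interval.
lower, upper = np.percentile(boot_means, [2.5, 97.5])
print(f"sample mean = {sample.mean():.2f}, 95% bootstrap CI = ({lower:.2f}, {upper:.2f})")
```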
case A row in the Data Editor file; the data collected from a single participant.

Chart Editor The feature in SPSS that allows the editing of charts and graphs.

comparisons The results of a statistical test with more than two conditions will often show a significant result but not where the difference lies. We need to undertake a comparison of conditions to see which ones are causing the effect. If we compare them two at a time this is known as a pairwise comparison, and if we perform unplanned comparisons after discovering the significant finding these are referred to as post hoc comparisons.

component The term used in the principal components method of factor analysis for a potential underlying factor.

condition A researcher chooses levels or categories of the independent variable(s) to observe the effect on the dependent variable(s). These are referred to as conditions, levels, treatments or groups. For example, 'morning' and 'afternoon' might be chosen as the conditions for the independent variable of time of day.

confidence interval In statistics we use samples to estimate population values, such as the mean or the difference in means. The confidence interval provides a range of values within which we predict lies the population value (to a certain level of confidence). The 95 per cent confidence interval of the mean worked out from a sample indicates that the population mean would fall between the upper and lower limits 95 per cent of the time.

contrasts With a number of conditions in a study we may plan a set of comparisons, such as contrasting each condition with a control condition. These planned comparisons are referred to as contrasts. We can plan complex contrasts – for example, the combined effect of two conditions against a third condition.

correlation The degree to which the scores on two (or more) variables co-relate. That is, the extent to which a variation in the scores on one variable results in a corresponding variation in the scores on a second variable. Usually the relationship we are looking for is linear. A multiple correlation examines the relationship between a combination of predictor variables with a dependent variable.

critical value We reject the null hypothesis after a statistical test if the probability of the calculated value of the test statistic (under the null hypothesis) is lower than the significance level (e.g. .05). Computer programs print out the probability of the calculated value (e.g. .023765) and we can examine this to see if it is higher or lower than the significance level. Textbooks print tables of the critical values of the test statistic, which are the values of the statistic at a particular probability. For example, if the calculated value of a statistic (i.e. a t test) is 4.20 and the critical value is 2.31 (at the .05 level of significance), then clearly the probability of the test statistic is less than .05.

crosstabulation Frequency data can be represented in a table with the rows as the conditions of one variable and the columns as the conditions of a second variable. This is a crosstabulation. We can include more variables by adding 'layers' to the crosstabulation in SPSS.

Data Editor The feature in SPSS where data is entered. Saving the information from the Data Editor will produce an SPSS .sav file. There are two windows within the Data Editor: Data View and Variable View.

Data View The Data View window within the Data Editor presents a spreadsheet style format for entering all the data points.

degrees of freedom When calculating a statistic we use information from the data (such as the mean or total) in the calculation. The degrees of freedom is the number of scores we need to know before we can work out the rest using the information we already have. It is the number of scores that are free to vary in the analysis.

dependent variable The variable measured by the researcher and predicted to be influenced by (that is, depend on) the independent variable.

descriptive statistics Usually we wish to describe our data before conducting further analysis or comparisons. Descriptive statistics such as the mean and standard deviation enable us to summarise a dataset.

discriminant function A discriminant function is one derived from a set of independent (or predictor) variables that can be used to discriminate between the conditions of a dependent variable.

distribution The range of possible scores on a variable and their frequency of occurrence. In statistical terms we refer to a distribution as a 'probability density function'. We use the mathematical formulae for known distributions to work out the probability of finding a score as high as or as low as a particular score.

effect size The size of the difference between the means of two populations, in terms of standard deviation units.

eigenvalue In a factor analysis an eigenvalue provides a measure of the amount of variance that can be explained by a proposed factor. If a factor has an eigenvalue of 1, it can explain as much variance as one of the original independent variables.

equality of variance See homogeneity of variance.

factor Another name for 'variable', used commonly in the analysis of variance to refer to an independent variable. In factor analysis we analyse the variation in the data to see if it can be explained by fewer factors (i.e. 'new' variables) than the original number of independent variables.
general linear model The underlying mathematical model employed in parametric statistics. When there are only two variables, X and Y, the relationship between them is linear when they satisfy the formula Y = a + bX (where a and b are constants). The general linear model is a general form of this equation allowing as many X and Y variables as we wish in our analysis.

grouping variable In analysing data in SPSS we can employ an independent measures independent variable as a grouping variable. This separates our participants into groups (such as introverts versus extroverts). It is important when inputting data into a statistical analysis program that we include the grouping variable as a column, with each group defined (i.e. introvert as '1' and extrovert as '2'). We can then analyse the scores on other variables in terms of these groups, such as comparing the introverts with the extroverts on, say, a monitoring task.

homogeneity of variance Underlying parametric tests is the assumption that the populations from which the samples are drawn have the same variance. We can examine the variances of the samples in our data to see whether this assumption is appropriate with our data or not.

homoscedasticity The scores in a scatterplot are evenly distributed along and about a regression line. This is an assumption made in linear correlation. (This is the correlation and regression equivalent of the homogeneity of variance assumption.)

hypothesis A predicted relationship between variables. For example: 'As sleep loss increases so the number of errors on a specific monitoring task will increase.'

illustrative statistics Statistics that illustrate rather than analyse a set of data, such as the total number of errors made on a reading task. Often we illustrate a dataset by means of a graph or a table.

independent or independent measures A term used to indicate that there are different subjects (participants) in each condition of an independent variable; also known as 'between subjects'.

independent variable A variable chosen by the researcher for testing, predicted to influence the dependent variable.

inferential statistics Statistics that allow us to make inferences about the data – for example, whether samples are drawn from different populations or whether two variables correlate.

interaction When there are two or more factors in an analysis of variance, we can examine the interactions between the factors. An interaction indicates that the effect of one factor is not the same at each condition of another factor. For example, if we find that more cold drinks are sold in summer and more hot drinks sold in winter, we have an interaction of 'drink temperature' and 'time of year'.

intercept A linear regression finds the best fit linear relationship between two variables. This is a straight line based on the formula Y = a + bX, where b is the slope of the line and a is the intercept, or point where the line crosses the Y-axis. (In the SPSS output for an ANOVA the term 'intercept' is used to refer to the overall mean value and its difference from zero.)
item When we employ a test with a number of variables (such as questions in a questionnaire) we refer to these variables as 'items', particularly in reliability analysis where we are interested in the correlation between items in the test.

kurtosis The degree to which a distribution differs from the bell-shaped normal distribution in terms of its peakedness. A sharper peak with narrow 'shoulders' is called leptokurtic and a flatter peak with wider 'shoulders' is called platykurtic.

levels of data Not all data are produced by using numbers in the same way. Sometimes we use numbers to name or allocate participants to categories (i.e. labelling a person as a liberal, and allocating them the number 1, or a conservative, and allocating them the number 2). In this case the data is termed 'nominal'. Sometimes we employ numbers to rank order participants, in which case the data is termed 'ordinal'. Finally, when the data is produced on a measuring scale with equal intervals the data is termed 'interval' (or 'ratio' if the scale includes an absolute zero value). Parametric statistics require interval data for their analyses.

Likert scale A measuring scale where participants are asked to indicate their level of agreement or disagreement with a particular statement on, typically, a 5- or 7-point scale (from strongly agree to strongly disagree).

linear correlation The extent to which variables correlate in a linear manner. For two variables this is how close their scatterplot is to a straight line.

linear regression A regression that is assumed to follow a linear model. For two variables this is a straight line of best fit, which minimises the 'error'.

main effect The effect of a factor (independent variable) on the dependent variable in an analysis of variance, measured without regard to the other factors in the analysis. In an ANOVA with more than one independent variable we can examine the effects of each factor individually (termed the main effects) and the factors in combination (the interactions).

MANOVA A Multivariate Analysis Of Variance. An analysis of variance technique where there can be more than one dependent variable in the analysis.

mean A measure of the 'average' score in a set of data. The mean is found by adding up all the scores and dividing by the number of scores.

mean square A term used in the analysis of variance to refer to the variance in the data due to a particular source of variation.

median If we order a set of data from lowest to highest, the median is the point that divides the scores into two, with half the scores below and half above the median.

mixed design A mixed design is one that includes both independent measures factors and repeated measures factors. For example, a group of men and a group of women are tested in the morning and the afternoon. In this test 'gender' is an independent measures variable (also known as 'between subjects') and time of day is a repeated measures factor (also known as 'within subjects'), so we have a mixed design.

mode The score that has occurred the highest number of times in a set of data.

multiple correlation The correlation of one variable with a combination of other variables.

multivariate Literally, this means 'many variables' but is most commonly used to refer to a test with more than one dependent variable (as in the MANOVA).

nonparametric test Statistical tests that do not use, or make assumptions about, the characteristics (parameters) of populations.

normal distribution A bell-shaped frequency distribution that appears to underlie many human variables. The normal distribution can be worked out mathematically using the population mean and standard deviation.
null hypothesis A prediction that there is no relationship between the independent and dependent variables.

one-tailed test A prediction that two samples come from different populations, specifying the direction of the difference – that is, which of the two populations will have the larger mean value.

outlier An extreme value in a scatterplot in that it lies outside the main cluster of scores. When calculating a linear correlation or regression, an outlier will have a disproportionate influence on the statistical calculations.

Output Navigator An SPSS navigation and editing system in an outline view in the left-hand column of the output window. This enables the user to hide or show output or to move items within the output screen.

p value The probability of a test statistic (assuming the null hypothesis to be true). If this value is very small (e.g. .02763), we reject the null hypothesis. We claim a significant effect if the p value is smaller than a conventional significance level (such as .05).

parameter A characteristic of a population, such as the population mean.

parametric tests Statistical tests that use the characteristics (parameters) of populations or estimates of them (when assumptions are also made about the populations under study).

partial correlation The correlation of two variables after having removed the effects of a third variable from both.

participant A person taking part as a 'subject' in a study. The term 'participant' is preferred to 'subject' as it acknowledges the person's agency – i.e. that they have consented to take part in the study.

population A complete set of items or events. In statistics, this usually refers to the complete set of subjects or scores we are interested in, from which we have drawn a sample.

post hoc tests When we have more than two conditions of an independent variable, a statistical test (such as an ANOVA) may show a significant result but not the source of the effect. We can perform post hoc tests (literally, post hoc means 'after this') to see which conditions are showing significant differences. Post hoc tests should correct for the additional risk of Type I errors when performing multiple tests on the same data.

power of a test The probability that, when there is a genuine effect to be found, the test will find it (that is, correctly reject a false null hypothesis). As an illustration, one test might be like a stopwatch that gives the same time for two runners in a race, but a more powerful test is like a sensitive electronic timer that more accurately shows the times to differ by a fiftieth of a second.

probability The chance of a specific event occurring from a set of possible events, expressed as a proportion. For example, if there were 4 women and 6 men in a room, the probability of meeting a woman first on entering the room is 4/10 or .4, as there are 4 women out of the 10 people in the room. A probability of 0 indicates an event will never occur and a probability of 1 that it will always occur. In a room of only 10 men there is a probability of 0 (0/10) of meeting a woman first and a probability of 1 (10/10) of meeting a man.

range The difference between the lowest score and the highest score.

rank When a set of data is ordered from lowest to highest, the rank of a score is its position in this order.

regression The prediction of scores on one variable by their scores on a second variable. The larger the correlation between the variables, the more accurate the prediction. We can undertake a multiple regression where the scores on one variable are predicted from the scores on a number of predictor variables.
reliability A reliable test is one that will produce the same result when repeated (in the same circumstances). We can investigate the reliability of the items in a test (such as the questions in a questionnaire) by examining the relationship between each item and the overall score on the test.

repeated measures A term used to indicate that the same subjects (participants) are providing data for all the conditions of an independent variable; also known as 'within subjects'.

residual A residual is the difference between an actual score and a predicted score. If scores are predicted by a model (such as the normal distribution curve) then the residual will give a measure of how well the data fit the model.

Sig. (2-tailed) The exact probability of the test statistic for a two-tailed prediction. Sometimes an estimate (see Asymp. Sig. – asymptotic significance) is also included.

significance level The risk (probability) of erroneously claiming a relationship between an independent and a dependent variable when there is not one. Statistical tests are undertaken so that this probability is chosen to be small, usually set at .05, indicating that this will occur no more than 5 times in 100.

simple main effects A significant interaction in a two factor analysis of variance indicates that the effect of one variable is different at the various conditions of the other variable. Calculating simple main effects tells us what these different effects are. A simple main effect is the effect of one variable at a single condition of the other variable.

skew The degree of symmetry of a distribution. A symmetrical distribution, like the normal distribution, has a skew of zero. The skew is negative if the scores 'pile' to the right of the mean and positive if they pile to the left.

sphericity An assumption we make about the data in a repeated measures design. Not only must we assume homogeneity of variance but homogeneity of covariance – that is, homogeneity of variance of the differences between samples. Essentially, we must assume the effect of an independent variable to be consistent across both conditions and subjects in these designs for the analysis to be appropriate.

standard deviation A measure of the standard ('average') difference (deviation) of a score from the mean in a set of scores. It is the square root of the variance. (There is a different calculation for standard deviation when the set of scores is a population as opposed to a sample.)
standard error of the estimate A measure of the 'average' distance (standard error) of a score from the regression line.

standard error of the mean The standard deviation of the distribution of sample means. It is a measure of the standard ('average') difference of a sample mean from the mean of all sample means of samples of the same size from the same population.

standard score The position of a score within a distribution of scores. It provides a measure of how many standard deviation units a specific score falls above or below the mean. It is also referred to as a z score.

statistic Specifically, a characteristic of a sample, such as the sample mean. More generally, statistic and statistics are used to describe techniques for summarising and analysing numerical data.

statistics viewer The SPSS Statistics Viewer is the name of the file that contains all of the output from the SPSS procedures. Often referred to (as in this book) as the Output Window.

subject The term used for the source of data in a sample. If people are the subjects of the study it is viewed as more respectful to refer to them as participants, which acknowledges their role as helpful contributors to the investigation.

sums of squares The sum of the squared deviations of scores from their mean value.

test statistic The calculated value of the statistical test that has been undertaken.

two-tailed test A prediction that two samples come from different populations, but not stating which population has the higher mean value.

Type I error The error of rejecting the null hypothesis when it is true. The risk of this occurring is set by the significance level.

Type II error The error of not rejecting the null hypothesis when it is false.

univariate A term used to refer to a statistical test where there is only one dependent variable. ANOVA is a univariate analysis as there can be more than one independent variable but only one dependent variable.

value labels Assigning value labels within the Variable View screen in SPSS ensures that the output is labelled appropriately when grouping variables are used – for example, 1 = males, 2 = females.

Variable View The screen within the SPSS Data Editor where the characteristics of variables are assigned.

variance A measure of how much a set of scores vary from their mean value. Variance is the square of the standard deviation.

weighting/loading The contribution made to a composite value (such as a regression equation or a factor in a factor analysis) by a variable. We use the calculated size of the weighting to assess the importance of the variable to the factor in a factor analysis.

within subjects Also known as repeated measures. We select the same subjects (participants) for each condition of an independent variable for a within subjects design.

z score See standard score.

Index

advice on significance testing 117–118
alpha reliability see Cronbach's Alpha
analysis of variance (ANOVA): comparing condition means 155–156; interaction of factors 145–146, 148, 151, 155–156, 189–190, 192, 194–196, 199–200, 202–205, 207, 210–212, 215–218, 220; introduction to 97, 137–156; in linear regression 318, 331; key terms 144–145; in multiple regression 335–336; multivariate analysis of variance (MANOVA) 146, 221–234; nonparametric equivalents 257–276; one factor independent measures 157–176; one factor repeated measures 177–188; repeated measures 147–150; two factor independent measures 189–195; two factor repeated measures 189, 196–204; two factor mixed design 189, 205–220; simple main effects 216–219;
univariate analysis 145–146
assumptions underlying statistical tests 99–112
bar charts 40, 66–88
Bartlett's test of sphericity see factor analysis
Bonferroni 154–156, 181, 185–186, 189, 199, 208, 213, 217, 222, 228, 264, 273
bootstrapping 113–117, 122, 125
boxplots 61–66
Box's Test of Equality of Covariance 225–226
Chart Builder 57–80
Chart Editor 69, 80, 89–90, 92, 204, 309–313
chi-square 277–296; as goodness of fit 292–296; introduction to 277–278; test of independence 278–292; weight cases 28, 45–46, 293–294
Compute command 17–19, 352, 359
confidence intervals 75, 98–99, 104, 113, 116–117, 122, 125–126, 131, 134, 164–165, 175, 185, 213
confirmatory factor analysis 339, 345
continuity correction see Yates' continuity correction
contrasts and multiple pairwise comparisons: contrasts 150–152, 155, 158–160, 163, 168–169, 173, 187; introduction to 150–151, 159–161; post hoc multiple comparisons 161–165, 169–176, 208, 211
correlation: introduction to 297–298; Kendall tau-b 304–306; linear regression 316–319; logistic regression 319–323; multiple correlation 331, 335; partial 314–316; Pearson's correlation coefficient 298–301; scatterplots 306–313; Spearman's correlation coefficient 301–303
Cronbach's Alpha reliability 351–359: Alpha if Item Deleted 357–358; Corrected Item-Total Correlation 357–358; Cronbach's Alpha based on standardized items 355–356; introduction to 351–352; procedure 353–355; Scale Mean if Item Deleted 357–358
Crosstabs command 43–50, 279–280
crosstabulation 43–50
Custom Tables 50–53, 283–286
data: before analysing data 99–100; levels of data 5, 13, 99
Data Editor 6–10
data entry 5–16
Data View 7–15
descriptive statistics: Crosstabs command 43–50; Descriptives command 41–43; Explore command 43, 103–104; Frequencies command 35–41; introduction to 35
editing graphs and axes 89–92
eigenvalue see factor analysis
Error bar chart 72–75
Exploratory factor analysis 339, 341–350
Explore command 43, 103–104
extreme scores 61, 63
factor analysis 339–350; Bartlett's test of sphericity 341, 343, 347; confirmatory factor analysis 339, 345; determinant 343, 346; eigenvalue 342, 344, 347–348; exploratory factor analysis 341–350; introduction to 339–340; KMO test 341, 343, 346–347; multicollinearity 341, 346; principal components analysis 339, 342–349; scree plot 340, 343, 348–349
Fisher's Exact Test 277–278, 286, 288
Friedman test 267–276
Frequencies 36–41, 56, 57
general linear model (GLM) 137–156
Go to Case 24
Goodness of fit (chi-square) 292–296
histogram 40–41, 43, 58–61, 100–103
homogeneity of variance 147–149, 159, 163, 168, 171, 191, 208, 226–227
homoscedasticity 143, 298, 341
illustrative statistics 57–92; bar charts 66–72, 81–86; boxplots 61–66; Chart Editor 69, 80, 89–90, 92; editing axes 90–92; editing graphs 89–90; error bar charts 72–75, 86–89; generating graphs with Chart Builder 58–80; generating graphs with legacy dialogs 80–92; histograms 58–61; introduction to 57; line graphs 75–78; pie charts 78–80
importing data from Excel 29–31
independent ANOVA see analysis of variance
independent measures MANOVA see multivariate analysis of variance
independent t test 120–127
inferential statistics: before analysing data 99–100; introduction to 93–96
inserting a case 24
inserting a variable 23
interaction 145, 148, 189, 195, 203–204, 211, 215
Kaiser–Meyer–Olkin (KMO) test see factor analysis
Kendall tau-b 304–306
Kolmogorov–Smirnov test 106, 240, 245
Kruskal–Wallis test 257–267
kurtosis 43, 103, 106
Levene's Test of Equality of Variances 123, 162, 171, 226–227
line graphs 75–77
linear regression 316–319
logistic regression 319–323
log transformation 110–112
main effects 154–155
Mann–Whitney U test 235–246
missing values 12–13, 22–23
model 138–141
Model Viewer 241, 251, 262, 271
multiple comparisons 150–154
multiple correlation 325–326
multiple regression 325–338; collinearity statistics 338; enter method 328–333; introduction to 325–326; multicollinearity 326, 341; procedure 327–328; stepwise 333–338
multivariate analysis of variance (MANOVA): discriminant function analysis 222, 228; independent measures 222–228; introduction to 146, 221–222; Mauchly's Test of Sphericity 177–178, 182–183; repeated measures 228–234; Wilks' lambda 178, 182, 221–222, 226
naming variables 10
nonparametric tests: introduction to 253, 257; Mann–Whitney U test 235–247; Wilcoxon test 247–255; Friedman test 267–276; Kruskal–Wallis test 257–267
normal distribution 100–109
one factor independent measures ANOVA see analysis of variance
one factor repeated measures ANOVA see analysis of variance
outliers 61, 63
Output Navigator 16
paired samples t test see t test
parametric tests: introduction to 93–94
partial correlation 314–316
Pearson's correlation coefficient 298–301
pie chart 79–80
planned contrasts 150, 152, 159–165
post hoc tests 151, 159–165, 168–169
power 96–98
principal component analysis 339, 342–349
ranks 110, 235–236, 247, 258, 268, 301, 304
recoding data 20–22
regression: linear 143, 316–319; multiple 108, 325–338
related t test see t test
repeated measures t test see t test
repeated measures ANOVA see analysis of variance
repeated measures MANOVA see multivariate analysis of variance
saving data 15–17
scatterplot 306–313
significance testing 94–95
simple main effects 216–219
skew 43, 102–103, 106
Spearman's rho correlation coefficient 301–303
sphericity 108–109, 147–150, 177–178, 196, 205
Split File command 25–27
Syntax command 216–219
t test: introduction to 119–120; independent measures 120–127; nonparametric equivalent see nonparametric tests; repeated measures (paired samples, related) 127–134
the logic of statistical testing 94–96
transforming data 17–19
Tukey test 153
two factor independent ANOVA see analysis of variance
two factor mixed design ANOVA see analysis of variance
two factor repeated measures ANOVA see analysis of variance
Type I and II error 95–96
univariate analysis of variance see analysis of variance
value labels 11–12
variable type 10–11
Variable View 8–9
Weight Cases 28
Wilcoxon test 248–255
Yates' continuity correction 288
CHOOSING A STATISTICAL TEST

Number of variables/conditions | Design | Parametric/nonparametric | Recommended statistical test | SPSS procedure | Chapter

Look for differences between conditions
One variable: two conditions | Independent measures | Parametric | Independent samples t test | Analyze > Compare Means > Independent-Samples T Test | 6
One variable: two conditions | Independent measures | Nonparametric | Mann–Whitney U | Analyze > Nonparametric Tests > Independent Samples | 11
One variable: two conditions | Repeated measures | Parametric | Related t test (paired samples) | Analyze > Compare Means > Paired-Samples T Test | 6
One variable: two conditions | Repeated measures | Nonparametric | Wilcoxon | Analyze > Nonparametric Tests > Related Samples | 11
One variable: more than two conditions | Independent measures | Parametric | One factor independent measures ANOVA | Analyze > General Linear Model > Univariate | 8
One variable: more than two conditions | Independent measures | Nonparametric | Kruskal–Wallis | Analyze > Nonparametric Tests > Independent Samples | 12
One variable: more than two conditions | Repeated measures | Parametric | One factor repeated measures ANOVA | Analyze > General Linear Model > Repeated Measures | 8
One variable: more than two conditions | Repeated measures | Nonparametric | Friedman | Analyze > Nonparametric Tests > Related Samples | 12
Two variables | Independent measures on both variables | Parametric | Two factor independent ANOVA | Analyze > General Linear Model > Univariate | 9
Two variables | Repeated measures on both variables | Parametric | Two factor repeated measures ANOVA | Analyze > General Linear Model > Repeated Measures | 9
Two variables | One independent and one repeated measures factor | Parametric | Two factor mixed design ANOVA | Analyze > General Linear Model > Repeated Measures | 9
More than one dependent variable | Independent measures | Parametric | Independent MANOVA | Analyze > General Linear Model > Multivariate | 10
More than one dependent variable | Repeated measures | Parametric | Repeated Measures MANOVA | Analyze > General Linear Model > Repeated Measures | 10

Compare frequency counts (in categories)
Association | Nonparametric | Chi-square | Analyze > Descriptive Statistics > Crosstabs | 13

Correlate variables
Two variables | Correlational | Parametric | Pearson | Analyze > Correlate > Bivariate | 14
Two variables | Correlational | Nonparametric | Spearman | Analyze > Correlate > Bivariate | 14
Two variables | Correlational | Nonparametric | Kendall tau-b | Analyze > Correlate > Bivariate | 14
Two or more variables | Correlational | Parametric | (Multiple) regression | Analyze > Regression > Linear | 14 & 15
Two or more variables | Regression (binary variable) | Parametric | Logistic regression | Analyze > Regression > Binary Logistic | 14

Reduce data
Many variables | Correlational | Parametric | Factor analysis | Analyze > Dimension Reduction > Factor | 16
Many variables | Correlational | Parametric | Reliability analysis | Analyze > Scale > Reliability Analysis | 17