Introductory Statistics Using SPSS®
Second Edition

For Mildred & Helen

Herschel Knapp
University of Southern California

FOR INFORMATION:

SAGE Publications, Inc.
2455 Teller Road
Thousand Oaks, California 91320
E-mail: order@sagepub.com

SAGE Publications Ltd
Oliver’s Yard
55 City Road
London EC1Y 1SP
United Kingdom

SAGE Publications India Pvt Ltd
B 1/I Mohan Cooperative Industrial Area
Mathura Road, New Delhi 110 044
India

SAGE Publications Asia-Pacific Pte Ltd
Church Street
#10-04 Samsung Hub
Singapore 049483

Copyright © 2017 by SAGE Publications, Inc.

All rights reserved. No part of this book may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without permission in writing from the publisher.

All trademarks depicted within this book, including trademarks appearing as part of a screenshot, figure, or other image are included solely for the purpose of illustration and are the property of their respective holders. The use of the trademarks in no way indicates any relationship with, or endorsement by, the holders of said trademarks. SPSS is a registered trademark of International Business Machines Corporation.

Printed in the United States of America

Library of Congress Cataloging-in-Publication Data
Names: Knapp, Herschel, author
Title: Introductory statistics using SPSS / Herschel Knapp
Description: Second edition | Thousand Oaks, California : SAGE, [2017] | Includes index
Identifiers: LCCN 2016022121 | ISBN 978-1-5063-4100-2 (pbk. : alk. paper)
Subjects: LCSH: SPSS for Windows | Social sciences—Statistical methods—Computer programs
Classification: LCC HA32 K59 2016 | DDC 005.5/5—dc23
LC record available at https://lccn.loc.gov/2016022121

Acquisitions Editor: Helen Salmon
eLearning Editor: Katie Ancheta
Editorial Assistant: Chelsea Pearson
Production Editor: Libby Larson
Copy Editor: Jim Kelly
Typesetter:
C&M Digitals (P) Ltd Proofreader: Alison Syring Indexer: Maria Sosnowski Marketing Manager: Susannah Goldes Brief Contents Preface Acknowledgments About the Author PART I: STATISTICAL PRINCIPLES Research Principles Sampling Working in SPSS PART II: STATISTICAL PROCESSES Descriptive Statistics t Test and Mann-Whitney U Test ANOVA and Kruskal-Wallis Test Paired t Test and Wilcoxon Test Correlation and Regression—Pearson and Spearman Chi-Square PART III: DATA HANDLING 10 Supplemental SPSS Operations Glossary Index Detailed Contents Preface Acknowledgments About the Author PART I: STATISTICAL PRINCIPLES Research Principles Learning Objectives Overview—Research Principles Rationale for Statistics Research Questions Treatment and Control Groups Rationale for Random Assignment Hypothesis Formulation Reading Statistical Outcomes Accept or Reject Hypotheses Variable Types and Levels of Measure Continuous Interval Ratio Categorical Nominal Ordinal Good Common Sense Key Concepts Practice Exercises Sampling Learning Objectives Overview—Sampling Rationale for Sampling Time Cost Feasibility Extrapolation Sampling Terminology Population Sample Frame Sample Representative Sample Probability Sampling Simple Random Sampling Stratified Sampling Proportionate and Disproportionate Sampling Systematic Sampling Area Sampling Nonprobability Sampling Convenience Sampling Purposive Sampling Quota Sampling Snowball Sampling Sampling Bias Optimal Sample Size Good Common Sense Key Concepts Practice Exercises Working in SPSS Learning Objectives Video Overview—SPSS Two Views: Variable View and Data View Variable View Name Type Width Decimals Label Values Missing Columns Align Measure Role Data View Value Labels Icon Codebook Saving Data Files Good Common Sense Key Concepts Practice Exercises PART II: STATISTICAL PROCESSES Descriptive Statistics Learning Objectives Videos Overview—Descriptive Statistics Descriptive Statistics Number (n) Mean (μ) Median Mode Standard Deviation (SD) Variance 
Minimum Maximum Range SPSS—Loading an SPSS Data File Run SPSS Data Set Test Run SPSS—Descriptive Statistics: Continuous Variables (age) Statistics Tables Histogram With Normal Curve Skewed Distribution SPSS—Descriptive Statistics: Categorical Variables (gender) Statistics Tables Bar Chart SPSS—Descriptive Statistics: Continuous Variable (age) Select by Categorical Variable (gender)—Female or Male Only SPSS—(Re)Selecting All Variables Good Common Sense Key Concepts Practice Exercises t Test and Mann-Whitney U Test Learning Objectives Videos Overview—t Test Example Research Question Groups Procedure Hypotheses Data Set Pretest Checklist Pretest Checklist Criterion 1—Normality Pretest Checklist Criterion 2—n Quota Pretest Checklist Criterion 3—Homogeneity of Variance Test Run Results Pretest Checklist Criterion 2—n Quota Pretest Checklist Criterion 3—Homogeneity of Variance p Value Hypothesis Resolution α Level Documenting Results Type I and Type II Errors Type I Error

p: See p value.
p value: A score generated by inferential statistical tests to indicate the likelihood that the differences detected would emerge by chance alone.
Paired t test: Indicates if there is a statistically significant difference between the pretest and posttest (T1 : T2), for continuous variables.
Paste: The Paste button assesses the parameters specified on the associated menu(s) and produces the equivalent block of SPSS Syntax code.
Pearson correlation: See Regression.
Percent (%): A method for expressing a fraction in terms of 100 (% = Part ÷ Total × 100).
Pie chart: A graphical representation of the numbers contained within a categorical variable, consisting of a circle wherein each “pie slice” represents the proportion of each category.
Polychotomous: A categorical variable that contains more than two values (e.g., meal = Breakfast, Lunch, Dinner).
Population: All of the members/records (see Sampling).
Positive correlations: The specified pair of scores tends to increase or decrease concurrently.
Power calculations: Formulas that provide estimates specifying optimal sample size.
Pretest checklist: Assumptions regarding the characteristics of the data that must be assessed prior to running a statistical test.
Pretest/posttest design: See Pretest/treatment/posttest.
Pretest/treatment/posttest: Longitudinal design model, typically using a single group, wherein a pretest is administered, followed by the treatment, followed by the posttest, which involves (re)administering the same instrument/metric used at the pretest to detect the effectiveness of the treatment.
Probability sample: A sample wherein each item/participant has an equal chance of being selected to partake in the research procedure.
Proportionate stratified sampling: A probability sampling technique wherein the percentage of items/participants selected from each stratum matches the percentage in the population.
Purposive sampling: A nonprobability sampling technique wherein each potential participant must meet multiple criteria.
Quota sampling: A nonprobability sampling technique wherein the total number of participants is specified prior to starting the data collection process; data collection continues until the specified number of participants is achieved.
r: See Regression.
R²: The total predictive value of a multiple regression or logistic regression model. See Multiple regression (R²).
Random assignment: Randomly assigning members to (control/experimental) groups reduces the likelihood of creating biased/unbalanced groups.
Random numbers: Figures that have no predictable sequence.
Range: The maximum − minimum.
Ratio variable: A continuous variable wherein the values are equally spaced and cannot be negative (e.g., age).
Recoding: Systematically altering the way a variable is represented in a data set.
Regression: Indicates the direction of the relationship between two continuous variables gathered from each participant/data record.
Regression line: The line drawn through a scatterplot that shows the average
pathway through those points.
Representative sample: A sample that is proportionally equivalent to the population.
Research question: The inquiry that forms the basis for the hypotheses construction, analyses, and documentation of results.
Sample: A sublist of the sample frame or population specifying those who will actually partake in the research procedure.
Sample frame: A sublist of the population that could be accessed to comprise the sample.
Sampling: The process of gathering a (small) portion of the population data to better comprehend the overall population, or a portion of the population with specific characteristics.
Sampling bias: Any procedure/incident/factor that interferes with the process of gathering a representative sample.
Scatterplot: A graphical representation of a bivariate correlation involving two continuous variables.
SD: See Standard deviation (SD).
Sidak test: A test used to detect pairwise score differences wherein the groups have unequal ns; typically used as an ANOVA post hoc test.
Simple random sample: A probability sampling technique wherein a set number of participants are randomly selected from a sample frame.
Simple time-series design: See Pretest/treatment/posttest.
Skewed distribution: A nonnormal (asymmetrical) distribution within a continuous variable wherein most of the numbers are either high or low.
Snowball sampling: A nonprobability sampling technique wherein the researcher requests each participant to provide referral(s) to other potentially suitable participants.
Sort cases: See Sorting.
Sorting: Arranging items in ascending or descending sequence.
Spearman correlation: Assesses the similarity of two sequenced lists; also an alternative to the Pearson test when pretest conditions are not fully met.
Spearman’s rho: See Spearman correlation.
SPSS Syntax: A language used to code and run SPSS statistical programs.
Standard deviation (SD): A statistic that indicates the amount of similarity/diversity among the numbers contained within a variable.
Stratified sampling: A probability sampling technique wherein the sample frame is split into two or more strata (lists) (e.g., Females/Males), and then random selections are made from each stratum (list).
Syntax: See SPSS Syntax.
Systematic sampling: A probability sampling technique wherein periodic selections of items/participants are made.
t test: Indicates if there is a statistically significant difference between the two groups (G1 : G2) containing continuous variables.
Text file: See ASCII.
Treatment group: See Experimental group.
Tukey test: A test used to detect pairwise score differences wherein the groups have equal ns; typically used as an ANOVA post hoc test.
Type I error: Occurs when the findings indicate that there is a statistically significant difference between two variables (or groups) (p ≤ .05) when, in fact, on the whole, there actually is not, meaning that you would erroneously reject the null hypothesis.
Type II error: Occurs when the findings indicate that there is no statistically significant difference between two variables (or groups) (p > .05) when, in fact, on the whole, there actually is, meaning that you would erroneously accept the null hypothesis.
Unique pairs formula: Computes the total number of comparisons that can be made when groups are gathered two at a time (G = total number of groups); unique pairs = G! ÷ [2 × (G − 2)!].
Variable View: SPSS screen wherein the attributes (properties) for each variable are defined.
Variance: The standard deviation squared (variance = SD²).
Wilcoxon test: Similar to the paired t test but used when data distribution (posttest − pretest) does not meet normality criteria.
Work file: The work file is typically a copy of the master file that statistical analyses/recoding is carried out on.

Index

Figures and tables are indicated by f or t following the page number.

Absolute certainty versus confidence, 106 Abstracts, 108–109 Align in variable view, 45 Alpha (α) error, 109–110 Alpha (α) level, 107 Alphabetization, sort as similar to, 247 Alternative hypothesis (H1), 11 American Standard Code for Information Interchange See ASCII ANOVA (analysis of variance) overview, 124–126, 126f cost-benefit consideration, 146 documentation of results, 139–140, 140t groups, number of, 146–147 hypothesis resolution, 139, 139t pretest checklist, 126–130, 127–129f research questions and, results, 132–138, 133t, 136–138t test run, 130–131f, 130–132 t test compared, 124 See also Kruskal-Wallis test Approaching versus static, 140 Area sampling, 29, 30f ASCII, 258, 261–264, 261–265f Association/correlation, 202–203, 203t Assumptions about data, 101 Availability (convenience) sampling, 30–31, 31f Average See Mean (μ) Backups, 269 Bar charts, 80, 81f Baseline scores, 158–159 Bell curve See Normal curve (bell curve) Beta (β) error, 110 Bias, sampling, 34 Bimodal distribution, 111f Bivariate correlation, 182 Blank cells, 44–45, 45f Categorical variables characteristics of, 67 for Chi-square, 230 descriptive statistics in SPSS, 77–80, 78–81f dichotomous variables, 205 types of, 13–14, 218 Causation, correlation compared, 202–203, 203t Certainty versus confidence, 106 Chi-square overview, 218–219 documentation of results, 228–229, 228t hypothesis resolution, 228 percentages display, 229, 229f pretest checklist, 221 research questions and, results, 223–225, 224–227f, 228 test overview, 
219–220 test run, 221–223, 222–223f Cleaning of data, 49 Cluster (area) sampling, 29, 30f Codebook, 48, 51–52f Columns in variable view, 45 Combination (unique pairs) formula, 138, 146 Comments in syntax files, 268 Confidence versus absolute certainty, 106 Confidentiality, 269 Confounding variables, 105–106 Continuous variables characteristics of, 66–67 descriptive statistics in SPSS, 72–77, 72–77f types of, 13 Control versus treatment groups, 7–9f, 7–10, 169 Convenience sampling, 30–31, 31f Correlation (r) causation compared, 202–203, 203t, 206–207 defined, 182, 202 negative or no correlation, 195–196f negative or none, 195–196 Pearson test, 186–187, 186–187f, 192–193, 193t, 195–196, 195–196f positive versus negative, 182, 182t, 195, 195f research questions and, Cost, sampling limited by, 21 Cost-benefit consideration, 114, 146 Data assumptions about, 101 cleaning of, 49 importing with excel, 258–261, 259–261f recoding, 253–257, 254–257f source data, 269 Data view, 46–47f, 46–48 Decimals in variable view, 43 Deletion of records, 269 Delimiters, 261 delta% formula, 165–166 Descriptive statistics overview, 66–67 categorical variables, 77–80, 78–81f continuous variables, 72–77, 72–77f continuous variables selected by categorical variable, 81–85, 82–85f loading data files in SPSS, 71f rationale for, 4, 66 research questions and, reselecting all variables, 85, 86–87f, 87 types of, 67–70, 69f Dichotomous variables, 205, 218 Direct sources, 33 Discrete variables See Categorical variables Disproportionate sampling, 26, 27f 888, 44–45, 45f Encryption, 269 Error, 205 Ethics, 269 Evidence-based practice (EBP), 4–5 Excel imports, 258–261, 259–261f External validity, 24, 34, 40 Extrapolation, 21 Factorial function, 138 File management, 269 Flat distribution, comparison, 111f GIGO (garbage in, garbage out), 49 Hardware failure, 269 Hawthorne effect (reactivity), 170 Hidden populations, 33 Histograms, 67, 75–76, 76f Homogeneity of variance ANOVA, 129–130, 132–133, 133t t test, 
102, 105, 105t See also p value Homoscedasticity, 192, 193f Hypothesis, defined, 11 Hypothesis formulation, 11–12 Importing See ASCII; Excel imports Indirect sources, 33 Inferential statistics, rationale for, Internal validity, 169–170 Interval variables, 13 Invisible populations, 33 Journal abstracts, 108–109 Justification, 45 Kruskal-Wallis test overview, 140–146, 141–145f, 144–146t cost-benefit consideration, 146 research questions and, when to use, 124 See also ANOVA (analysis of variance) Labels in variable view, 43 Left alignment, 45 Linearity, 191–192, 192f Logistic regression, 205–206, 206f Mann-Whitney U test overview, 110–114, 111–113f, 113t research questions and, 5–6 when to use, 98, 101 See also t test Master copies, 269 Maturation effects, 170 Maximums, 70 Mean (μ), 67–68 Measure in variable view, 45 Measures of central tendency, 67–69 Median, 68 Minimums, 70 Missing in variable view, 44–45, 45f Mode, 68 Multilevel sorting, 247, 247–248t Multimodal, 68 Multiple regression, 204–205, 204f Multistage cluster sampling See Area sampling Name in variable view, 41–42 Negative correlation, 182, 182t Negatively skewed, 77, 77f Negative values, 13 999, 44–45, 45f Nominal variables, 13–14 Nonparametric tests, defined, xviii, 199 Nonprobability sampling, 30–33, 31–33f Nonspurious, 202–203, 203t Normal curve (bell curve), 75–76, 76f, 101, 102f See also Normality criteria Normal distribution, xviii, 76, 111f Normality criteria ANOVA, 127–128, 127–129f Pearson test, 185, 185f t test, 101, 102f Normality of differences (diff), 161–162, 161–163f, 169 n quota ANOVA, 128, 132, 133t t test, 102, 104, 105t Null hypothesis (H0), 11 Number (n), 67 One-Way Anova, 103–104, 103–104f Ordinal variables, 14 Outcome (Y) variables, 204 Outliers, 77 O X O design, 158–159, 158f Paired t test (delta)% formula, 165–166 overview, 158–160, 158f documentation of results, 165 hypothesis resolution, 165 pretest checklist, 160–162, 161–163f research questions and, results, 164–165, 164–165t 
test run, 163–164, 163–164f See also Wilcoxon test Parametric tests, defined, xviii Parsimonious models, 204 Passwords, 269 PASW (Predictive Analytics Software) See SPSS Pearson test overview, 182–183, 182f, 182t documentation of results, 194–195, 194t hypothesis resolution, 194 negative or no correlation, 195–196, 195–196f pretest checklist, 184–186, 185f, 191–192, 192–193f research questions and, results, 190–193, 190–193f, 193t Spearman compared, 197 test overview, 183–184 test run, 186–189f, 186–190 See also Spearman’s rho (ρ) Periods in syntax files, 268 Polychotomous variables, 218 Population, 22, 23f Positive correlation, 182, 182t Positively skewed, 77, 77f Post hoc test ANOVA, 131–132, 131f, 134–137, 136–138t unique pairs formula, 138 Posttest See Pretest/posttest design Power calculation, 34 Predictive Analytics Software (PASW) See SPSS Predictor (X) variables, 204 Pretest/posttest design, 158–159, 158f Pretest scores, 158–159 Probability sampling, 24–29, 25–28f, 30f Proportionate sampling, 26, 27f Purposive sampling, 31–32 p value ANOVA, 129–130, 133–138, 133t, 136–138t extremely low, 110 homogeneity of variance and, 102 as static, 140 t test, 105–107, 106t Type I error and, 110 Quasi-experimental design, internal validity and, 169–170 Quota sampling, 32, 32f r See Correlation (r) Random assignment, 10–11 Random number generation, 244–246, 244–246f Range, 70 Ratio variables, 13 Reactivity (Hawthorne effect), 170 Recoding, 253–257, 254–257f Recruitment location, 34 Referrals, 33 Regression defined, 182 logistic regression, 205–206, 206f multiple regression, 204–205, 204f research questions and, See also Pearson test; Spearman’s rho (ρ) Regression lines, 186, 191, 191f Representative samples, 24 Research principles overview, common sense about, 14 hypothesis formulation, 11–12 random assignment, 10–11 rationale for statistics, 4–5 research questions, 5–7 treatment versus control groups, 7–9f, 7–10 variable types, 12–14 Research questions, 5–7 Right 
alignment, 45 Role in variable view, 46 Sample, 22–23, 23f Sample frame, 22, 23f, 30 Sample size, 34, 110, 219 Sampling overview, 20 bias about, 34 findings based on, 34 nonprobability sampling, 30–33, 31–33f power calculation and, 34 probability sampling, 24–29, 25–28f, 30f rationale for, 20–21 representative samples, 24 terminology, 22–24, 23f Saving data files, 49, 49f Scatterplot points, 190–191, 190–191f SD (standard deviation), 69–70, 69f Select cases, 251–253, 252t Sidak post hoc test, 132 Significance level See p value Simple random sampling, 24–25, 25f Simple time-series design, 158–159, 158f Skewed distribution, xviii, 76–77, 77f, 111f Skewed left, 77, 77f Skewed right, 77, 77f Skip terms (k), 28 Snowball sampling, 32–33, 33f Sort cases, 246–251, 247–248t, 247f, 249–251 Source data, 269 Spearman’s rho (ρ) overview, 196–198, 197f correlation versus causation, 202–203, 203t documentation of results, 202 hypothesis resolution, 202 Pearson compared, 197 pretest checklist, 199–200 research questions and, results, 201, 201t test overview, 198–199, 199f test run, 200, 200–201f See also Pearson test SPSS overview, 40 ASCII imports, 261–264, 261–265f categorical variables, 77–80, 78–81f clearing data, 42, 52f continuous variables, 72–77, 72–77f continuous variables selected by categorical variable, 81–85, 82–85f data accuracy issues, 49 data view, 46–47f, 46–48 excel imports, 258–261, 259–261f limits on statistics from, 88 loading data files in, 71, 71f multilevel sorting, 247, 247–248t random number generation, 244–246, 244–246f recoding, 253–257, 254–257f reselecting all variables, 85, 86–87f, 87 saving data files, 49, 49f select cases, 251–253, 252t sort cases, 246–251, 247–248t, 247f, 249–251 syntax, 265–268, 267f variable view, 40–46, 41–42f, 44–46f See also specific tests Spurious, 202 Standard deviation (SD), 69–70, 69f Static versus approaching, 140 Statistical abstracts, 108–109 Statistics, 4–5, 14 Statistics tables, 74, 74–75f, 79–80, 79–80f Strata, 25 
Stratified sampling, 25, 26f Strength of correlation, 182–183, 182f String variables, 42 Summary statistics See Descriptive statistics Symmetry, 101, 102f Syntax, 265–268, 267f Systematic sampling, 28, 28f, 29 Temporality, 202–203, 203t Testing, effects of, 170 Text Import Wizard, 263–264, 263–264f Textual statistical abstracts, 108–109 Time, sampling limited by, 20–21 Treatment versus control groups, 7–9f, 7–10, 169 Trend lines, 186, 191, 191f t test overview, 98–100, 100f ANOVA compared, 124 cost-benefit consideration, 114 documentation of results, 108–109 hypothesis resolution, 107–108 pretest checklist, 100–102, 102f research questions and, 5–6 results, 104–107, 105–106t test run, 103–104, 103–104f when to use, 97 See also Mann-Whitney U test Tukey post hoc test, 132, 135–136, 136t Type I error, 109–110 Type II error, 110 Type in variable view, 42, 42f Unique pairs formula, 138, 146 Unique random numbers, 246 Validity external validity, 24, 34, 40 internal validity, 169–170 Value labels icon, 47–48, 47f Values in variable view, 43–44, 43f Variables confounding variables, 105–106 continuous variables, 13, 66–67 dichotomous variables, 205, 218 interval variables, 13 nominal variables, 13–14 ordinal variables, 14 outcome (Y) variables, 204 predictor (X) variables, 204 ratio variables, 13 string variables, 42 See also Categorical variables Variable view, 40–46, 41–42f, 44–46f Variance, 70 See also Homogeneity of variance Width in variable view, 43 Wilcoxon test overview, 166–169, 167f, 169t research questions and, when to use, 158 See also Paired t test

... journals, he is also the author of Intermediate Statistics Using SPSS (2018), Practical Statistics for Nursing Using SPSS (2017), Introductory Statistics Using SPSS (1st ed., 2013), Therapeutic Communication: ...