Statistics for Social Understanding: With Stata and SPSS. Rowman & Littlefield Publishers (2019)


STATISTICS FOR SOCIAL UNDERSTANDING: With Stata and SPSS

NANCY WHITTIER, Smith College
TINA WILDHAGEN, Smith College
HOWARD J. GOLD, Smith College

Lanham • Boulder • New York • London

Executive Editor: Nancy Roberts
Assistant Editor: Megan Manzano
Senior Marketing Manager: Amy Whitaker
Interior Designer: Integra Software Services Pvt. Ltd.

Credits and acknowledgments for material borrowed from other sources, and reproduced with permission, appear on the appropriate page within the text.

Published by Rowman & Littlefield
An imprint of The Rowman & Littlefield Publishing Group, Inc.
4501 Forbes Boulevard, Suite 200, Lanham, Maryland 20706
www.rowman.com
Tinworth Street, London SE11 5AL, United Kingdom

Copyright © 2020 by The Rowman & Littlefield Publishing Group, Inc.

All rights reserved. No part of this book may be reproduced in any form or by any electronic or mechanical means, including information storage and retrieval systems, without written permission from the publisher, except by a reviewer who may quote passages in a review.

British Library Cataloguing in Publication Information Available

Library of Congress Cataloging-in-Publication Data
Names: Whittier, Nancy, 1966– author. | Wildhagen, Tina, 1980– author. | Gold, Howard J., 1958– author.
Title: Statistics for social understanding: with Stata and SPSS / Nancy Whittier (Smith College), Tina Wildhagen (Smith College), Howard J. Gold (Smith College).
Description: Lanham : Rowman & Littlefield, [2020] | Includes bibliographical references and index.
Identifiers: LCCN 2018043885 (print) | LCCN 2018049835 (ebook) | ISBN 9781538109847 (electronic) | ISBN 9781538109823 (cloth : alk. paper) | ISBN 9781538109830 (pbk. : alk. paper)
Subjects: LCSH: Statistics. | Social sciences—Statistical methods. | Stata.
Classification: LCC QA276.12 (ebook) | LCC QA276.12 W5375 2020 (print) | DDC 519.5—dc23
LC record available at https://lccn.loc.gov/2018043885

∞ ™ The paper used in this publication meets the minimum requirements of American National
Standard for Information Sciences—Permanence of Paper for Printed Library Materials, ANSI/NISO Z39.48-1992.

Printed in the United States of America

Brief Contents

Preface viii
About the Authors xvi
CHAPTER 1 Introduction 1
CHAPTER 2 Getting to Know Your Data 54
CHAPTER 3 Examining Relationships between Two Variables 121
CHAPTER 4 Typical Values in a Group 161
CHAPTER 5 The Diversity of Values in a Group 203
CHAPTER 6 Probability and the Normal Distribution 241
CHAPTER 7 From Sample to Population 280
CHAPTER 8 Estimating Population Parameters 314
CHAPTER 9 Differences between Samples and Populations 356
CHAPTER 10 Comparing Groups 399
CHAPTER 11 Testing Mean Differences among Multiple Groups 435
CHAPTER 12 Testing the Statistical Significance of Relationships in Cross-Tabulations 463
CHAPTER 13 Ruling Out Competing Explanations for Relationships between Variables 501
CHAPTER 14 Describing Linear Relationships between Variables 542
SOLUTIONS TO ODD-NUMBERED PRACTICE PROBLEMS 599
GLOSSARY 649
APPENDIX A Normal Table 656
APPENDIX B Table of t-Values 658
APPENDIX C F-Table, for Alpha = .05 660
APPENDIX D Chi-Square Table 662
APPENDIX E Selected List of Formulas 664
APPENDIX F Choosing Tests for Bivariate Relationships 666
INDEX 667

Contents

Preface viii
About the Authors xvi

CHAPTER 1 Introduction 1
Why Study Statistics?
Research Questions and the Research Process 3
Pinning Things Down: Variables and Measurement 4
Units of Analysis
Measurement Error: Validity and Reliability
Levels of Measurement
Causation: Independent and Dependent Variables 11
Getting the Data: Sampling and Generalizing 12
Sampling Methods 13
Sources of Secondary Data: Existing Data Sets, Reports, and “Big Data” 15
Big Data 17
Growth Mindset and Math Anxiety 18
Using This Book 20
Statistical Software 21
Chapter Summary 23
Using Stata 25
Using SPSS 33
Practice Problems 45
Notes 52

CHAPTER 2 Getting to Know Your Data 54
Frequency Distributions 55
Percentages and Proportions 57
Cumulative Percentage and Percentile 60
Percent Change 62
Rates and Ratios 63
Rates 63
Ratios 65
Working with Frequency Distribution Tables 65
Missing Values 65
Simplifying Tables by Collapsing Categories 67
Graphical Displays of a Single Variable: Bar Graphs, Pie Charts, Histograms, Stem-and-Leaf Plots, and Frequency Polygons 69
Bar Graphs and Pie Charts 69
Histograms 72
Stem-and-Leaf Plots 73
Frequency Polygons 75
Time Series Charts 76
Comparing Two Groups on the Same Variable Using Tables, Graphs, and Charts 77
Chapter Summary 84
Using Stata 85
Using SPSS 95
Practice Problems 109
Notes 120

CHAPTER 3 Examining Relationships between Two Variables 121
Cross-Tabulations and Relationships between Variables 122
Independent and Dependent Variables 123
Column, Row, and Total Percentages 127
Interpreting the Strength of Relationships 134
Interpreting the Direction of Relationships 136
Graphical Representations of Bivariate Relationships 140
Chapter Summary 142
Using Stata 143
Using SPSS 147
Practice Problems 152
Notes 160

CHAPTER 4 Typical Values in a Group 161
What Does It Mean to Describe What Is Typical?
162
Mean 163
Median 167
Mode 171
Finding the Mode, Median, and Mean in Frequency Distributions 173
Choosing the Appropriate Measure of Central Tendency 175
Median Versus Mean Income 179
Chapter Summary 181
Using Stata 182
Using SPSS 187
Practice Problems 193
Notes 202

CHAPTER 5 The Diversity of Values in a Group 203
Range 205
Interquartile Range 205
Standard Deviation 210
Using the Standard Deviation to Compare Distributions 212
Comparing Apples and Oranges 214
Skewed Versus Symmetric Distributions 218
Chapter Summary 220
Using Stata 221
Using SPSS 225
Practice Problems 231
Notes 240

CHAPTER 6 Probability and the Normal Distribution 241
The Rules of Probability 242
The Addition Rule 245
The Complement Rule 246
The Multiplication Rule with Independence 248
The Multiplication Rule without Independence 249
Applying the Multiplication Rule with Independence to the “Linda” and “Birth-Order” Probability Problems 251
Probability Distributions 253
The Normal Distribution 254
Standardizing Variables and Calculating z-Scores 258
Chapter Summary 266
Using Stata 267
Using SPSS 270
Practice Problems 272
Notes 279

CHAPTER 7 From Sample to Population 280
Repeated Sampling, Sample Statistics, and the Population Parameter 281
Sampling Distributions 284
Finding the Probability of Obtaining a Specific Sample Statistic 287
Estimating the Standard Error from a Known Population Standard Deviation 288
Finding and Interpreting the z-Score for Sample Means 289
Finding and Interpreting the z-Score for Sample Proportions 292
The Impact of Sample Size on the Standard Error 293
Chapter Summary 295
Using Stata 295
Using SPSS 300
Practice Problems 306
Notes 313

CHAPTER 8 Estimating Population Parameters 314
Inferential Statistics and the Estimation of Population Parameters 315
Confidence Intervals Manage Uncertainty through Margins of Error 317
Certainty and Precision of Confidence Intervals 317
Confidence Intervals for Proportions 318
Constructing a Confidence Interval for Proportions: Examples 322
Confidence Intervals for Means 326
The t-Distribution 326
Calculating Confidence Intervals for Means: Examples 329
The Relationship between Sample Size and Confidence Interval Range 333
The Relationship between Confidence Level and Confidence Interval Range 335
Interpreting Confidence Intervals 337
How Big a Sample? 338
Assumptions for Confidence Intervals 341
Chapter Summary 342
Using Stata 344
Using SPSS 346
Practice Problems 349
Notes 354

CHAPTER 9 Differences between Samples and Populations 356
The Logic of Hypothesis Testing 357
Null Hypotheses (H0) and Alternative Hypotheses (Ha) 358
One-Tailed and Two-Tailed Tests 359
Hypothesis Tests for Proportions 359
The Steps of the Hypothesis Test 364
One-Tailed and Two-Tailed Tests 365
Hypothesis Tests for Means 367
Example: Testing a Claim about a Population Mean 373
Error and Limitations: How Do We Know We Are Correct? 375
Type I and Type II Errors 376
What Does Statistical Significance Really Tell Us?
Statistical and Practical Significance 379
Chapter Summary 381
Using Stata 382
Using SPSS 386
Practice Problems 392
Notes 398

CHAPTER 10 Comparing Groups 399
Two-Sample Hypothesis Tests 401
The Logic of the Null and Alternative Hypotheses in Two-Sample Tests 401
Notation for Two-Sample Tests 402
The Sampling Distribution for Two-Sample Tests 403
Hypothesis Tests for Differences between Means 404
Confidence Intervals for Differences between Means 411
Hypothesis Tests for Differences between Proportions 412
Confidence Intervals for Differences between Proportions 416
Statistical and Practical Significance in Two-Sample Tests 418
Chapter Summary 419
Using Stata 420
Using SPSS 424
Practice Problems 429
Notes 434

CHAPTER 11 Testing Mean Differences among Multiple Groups 435
Comparing Variation within and between Groups 436
Hypothesis Testing Using ANOVA 438
Analysis of Variance Assumptions 439
The Steps of an ANOVA Test 440
Determining Which Means Are Different: Post-Hoc Tests 446
ANOVA Compared to Repeated t-Tests 447
Chapter Summary 448
Using Stata 448
Using SPSS 450
Practice Problems 453
Notes 461

CHAPTER 12 Testing the Statistical Significance of Relationships in Cross-Tabulations 463
The Logic of Hypothesis Testing with Chi-Square 466
The Steps of a Chi-Square Test 469
Size and Direction of Effects: Analysis of Residuals 475
Example: Gender and Perceptions of Health 477
Assumptions of Chi-Square 481
Statistical Significance and Sample Size 481
Chapter Summary 486
Using Stata 487
Using SPSS 489
Practice Problems 492
Notes 500

CHAPTER 13 Ruling Out Competing Explanations for Relationships between Variables 501
Criteria for Causal Relationships 506
Modeling Spurious Relationships 508
Modeling Non-Spurious Relationships 513
Chapter Summary 520
Using Stata 521
Using SPSS 526
Practice Problems 532
Notes 541

CHAPTER 14 Describing Linear Relationships between Variables 542
Correlation Coefficients 544
Calculating Correlation Coefficients 545
Scatterplots: Visualizing Correlations 546
Regression: Fitting a Line to a Scatterplot 550
The “Best-Fitting” Line 552
Slope and Intercept 553
Calculating the Slope and Intercept 556
Goodness-of-Fit Measures 557
R-Squared (r2) 557
Standard Error of the Estimate 558
Dichotomous (“Dummy”) Independent Variables 559
Multiple Regression 563
Statistical Inference for Regression 565
The F-Statistic 566
Standard Error of the Slope 568
Assumptions of Regression 571
Chapter Summary 573
Using Stata 575
Using SPSS 581
Practice Problems 588
Notes 598

SOLUTIONS TO ODD-NUMBERED PRACTICE PROBLEMS 599
GLOSSARY 649
APPENDIX A Normal Table 656
APPENDIX B Table of t-Values 658
APPENDIX C F-Table, for Alpha = .05 660
APPENDIX D Chi-Square Table 662
APPENDIX E Selected List of Formulas 664
APPENDIX F Choosing Tests for Bivariate Relationships 666
INDEX 667

Preface

The idea for Statistics for Social Understanding: With Stata and SPSS began with our desire to offer a different kind of book to our statistics students. We wanted a book that would introduce students to the way statistics are actually used in the social sciences: as a tool for advancing understanding of the social world. We wanted thorough coverage of statistical topics, with a balanced approach to calculation and the use of statistical software, and we wanted the textbook to cover the use of software as a way to explore data and answer exciting questions. We also wanted a textbook that incorporated Stata, which is widely used in graduate programs and is increasingly used in undergraduate classes, as well as SPSS, which remains widespread. We wanted a book designed for introductory students in the social sciences, including those with little quantitative background, but one that did not talk down to students and that covered the conceptual aspects of statistics in detail even when the mathematical details were minimized. We wanted a clearly written, engaging book, with
plenty of practice problems of every type and easily available data sets for classroom use.

We are excited to introduce this book to students and instructors. We are three experienced instructors of statistics, two sociologists and a political scientist, with more than sixty combined years of teaching experience in this area. We drew on our teaching experience and research on the teaching and learning of statistics to write what we think will be a more effective textbook for fostering student learning. In addition, we are excited to share our experiences teaching statistics to social science students by authoring the book’s ancillary materials, which include not only practice problems, test banks, and data sets but also suggested class exercises, PowerPoint slides, assignments, and lecture notes.

Statistics for Social Understanding is distinguished by several features: (1) It is the only major introductory statistics book to integrate Stata and SPSS, giving instructors a choice of which software package to use. (2) It teaches statistics the way they are used in the social sciences. This includes beginning every chapter with examples from real research and taking students through research questions as we cover statistical techniques or software applications. It also includes extensive discussion of relationships between variables, through the earlier placement of the chapter on cross-tabulation, the addition of a dedicated chapter on causality, and comparative examples throughout every chapter of the book. (3) It is informed by research on the teaching and learning of quantitative material and uses principles of universal design to optimize its contents for a variety of learning styles.

Distinguishing Features

1) Integrates Stata and SPSS

While most existing textbooks use only SPSS or assume that students will purchase an additional, costly, supplemental text for Stata, this book can be used with either Stata or SPSS. We include parallel
sections for both SPSS and Stata at the end of every chapter. These sections are written to ensure that students understand that software is a tool to be used to improve their own statistical reasoning, not a replacement for it.1 The book walks students through how to use Stata and SPSS to analyze interesting and relevant research questions. We not only provide students with the syntax or menu selections that they will use to carry out these commands but also carefully explain the statistical procedures that the commands are telling Stata or SPSS to perform. In this way, we encourage students to engage in statistical reasoning as they use software, not to think of Stata or SPSS as doing the statistical reasoning for them. For Stata, we teach students the basic underlying structure of Stata syntax. This approach facilitates a more intuitive understanding of how the program works, promoting greater confidence and competence among students. For SPSS, we teach students to navigate the menus fluently.

2) Draws on teaching and learning research

Our approach is informed by research on teaching and learning in math and statistics and takes a universal design approach to accommodate multiple learning styles. We take the following research-based approaches:

• Research on teaching math shows that students learn better when teachers use multiple examples and explanations of topics.2 The book explains topics in multiple ways, using both alternative verbal explanations and visual representations. As experienced instructors, we know the topics that students frequently stumble over and give special attention to explaining these areas in multiple ways. This approach also accommodates differences in learning styles across students.

• Some chapter examples and practice problems lead students through the process of addressing a problem by acknowledging commonly held misconceptions before presenting the proper solution. This approach is based on research that shows that simply presenting
students with information that corrects their statistical misconceptions is not enough to change these “strong and resilient” misconceptions.3 Students need to be able to examine the differences in the reasoning underlying incorrect and correct strategies of statistical work.

• Each chapter provides numerous, carefully proofread practice problems, with additional practice problems on the text’s website. Students learn best by

Appendix E: Inference for Means and Proportions

Mean

Confidence Interval:
One Sample: CI = ȳ ± t(SE_ȳ), DF = N − 1, SE_ȳ = s / √N
Two Samples: CI = (ȳ1 − ȳ2) ± t(SE_(ȳ1−ȳ2)), DF = N1 + N2 − 2, SE_(ȳ1−ȳ2) = √(SE_ȳ1² + SE_ȳ2²)

Hypothesis Test:
One Sample: t = (ȳ − µ) / SE_ȳ, DF = N − 1, SE_ȳ = s / √N
Two Samples: t = ((ȳ1 − ȳ2) − 0) / SE_(ȳ1−ȳ2), DF = N1 + N2 − 2, SE_(ȳ1−ȳ2) = √(SE_ȳ1² + SE_ȳ2²)

Proportion

Confidence Interval:
One Sample: SE_p = √(p(1 − p) / N), CI = p ± z(SE_p)
Two Samples: CI = (p1 − p2) ± z(SE_(p1−p2)), SE_(p1−p2) = √(SE_p1² + SE_p2²)

Hypothesis Test:
One Sample: z = (p − π) / SE_p, SE_p = √(π(1 − π) / N)
Two Samples: z = ((p1 − p2) − 0) / SE_(p1−p2), SE_(p1−p2) = √(π(1 − π)/N1 + π(1 − π)/N2)

Appendix F: Choosing Tests for Bivariate Relationships, by Level of Measurement of Independent and Dependent Variables

Dependent Variable: Nominal or Ordinal with 2 categories
- Independent Variable: Nominal or Ordinal: Inference Test: Chi-Square (Cross-Tabulation)
- Independent Variable: Interval-Ratio: Logistic, Probit, Multinomial Regression (Not covered in this book)

Dependent Variable: Interval-Ratio
- Independent Variable: Nominal or Ordinal with 2 categories: Compare Means or Proportions? Means: Inference Test: t Test; Estimation: CI for Difference. Proportions: Inference Test: z Test; Estimation: CI for Difference
- Independent Variable: Nominal or Ordinal with > 2 categories: Inference Test: F Test (ANOVA)
- Independent Variable: Interval-Ratio: Correlation, Ordinary Least Squares Regression; Inference Test: t Test for Slopes, F Test for Global Model

Index

Note: Page numbers in italics indicate figures and those in bold indicate tables.

Addition Rule, 245–46, 266
Adelman, Robert, 542–43
aggregate data, 55
aggregate level, 6, 24
alpha-level (α), 364, 381; in ANOVA, 441, 445, 447, 448; in chi-square test, 466, 470, 473, 474, 484, 486; in one-sample hypothesis tests, 364–70, 373, 378, 380, 381, 382, 384, 385, 387; in statistical inference for regression, 543, 567–70; in two-sample hypothesis tests, 400, 405–6, 408, 412, 413, 415, 417, 419, 421, 422, 427
alternative hypothesis: in chi-square test for goodness-of-fit, 483–85, 484, 485; in chi-square test of independence, 469–70, 478, 480–81, 483, 486–88, 489; in hypothesis testing using ANOVA, 438–39, 439; in one-sample hypothesis tests, 359; in two-sample hypothesis tests, 402
American National Election Study (ANES), 15, 16, 65, 71, 73, 124, 126, 208, 212, 234, 235, 239–40, 253, 260–61, 521, 523, 526, 529, 595, 596–97
analysis, units of, 6, 55
analysis of variance (ANOVA), 435–62; alpha level in, 441, 445, 447, 448; assumptions, 439–40; decision f in, 439, 441–46, 448; decision t in, 442; defined/overview of, 435–36; degrees of freedom and, 439, 439, 441–46, 448; F-curves, 439, 439; F-statistic, 438, 439, 442, 445–46, 448, 449, 452, 454, 461; F-values, 438, 441, 445; hypothesis testing using, 438–39; null hypothesis in, 441, 444, 446, 448; as omnibus test, 436; personal control, research on, 435–36; post-hoc tests, 436, 446–48, 447, 450, 453, 458, 461; repeated t-tests compared to, 447; research, 435–36; SPSS for, 450–53;
Stata for, 448–50; steps of, 440–46, 448; Tukey’s test, 446–47, 447, 448, 458; variation between groups sum of squares, 442, 443, 443–44; variation within and between groups, 436–38, 437; variation within groups sums of squares, 442, 442
antecedent variable, 509, 509, 513, 520
“apples and oranges” case, 214–17
arithmetic operators in Stata, 31, 31
assumptions: chi-square test, 481; confidence intervals, 341; hypothesis test, 372; linear regression, 579–80, 585–87; regression, 571–72
average, calculating, 11
axis: compressed, 83; horizontal, 83, 255; vertical, 69, 73, 82–83, 82–83, 105; x, 76, 106, 108, 393, 546, 548, 551, 561, 575, 581, 588; y, 76, 108, 299, 546, 548, 551, 554, 555, 556, 561, 575, 581, 588, 592
bar graphs, 69–72, 70, 78, 79, 85; clustered, 140, 140–41; comparing two groups on same variable using, 78, 79; defined, 85; of single variable, 69–72, 70; stacked, 140–41, 141; Stata to generate, 94
Base Reference Manual for Stata, 33
best-fitting line, 552–53
bias, nonresponse, 14
big data, 17–18
bivariate relationships, 122, 140–41, 140–41
box plot: generating, in SPSS, 229–30, 230; generating, in Stata, 223, 223; interquartile range displayed with, 208–10, 209
boyd, danah, 17–18
Brame, Robert, 314–15, 317
categories: in cross-tabulations, collapsing, 134; data in variable, 55; simplifying frequency distribution tables by collapsing, 67, 67–69, 68, 69
causal relationships, 11–12, 501–41; association between variables in, 506–7; chi-square statistic in controlled cross-tabulation, interpreting, 517–19; control variables in, 503–17, 520–21, 523–24, 525–27; correlation differentiated from, 550; experimental control, 11–12; independent and dependent variables in, 11; independent effect of variables in, 513, 520; intervening variables in, 509, 510, 513, 520; mediating variables in, 509, 513, 520; moderating variables in, 515, 520; non-spurious relationships in, modeling, 513, 513–17; spurious
relationships in, modeling, 503, 503–5, 506, 508–13; statistical control, 12; statistical interaction between variables in, 515–17, 516, 520; Tufte’s research on, 501–3, 550
Central Limit Theorem, 284–85, 287, 293, 295, 319, 341, 362, 403–4, 439
central tendency, measures of. See measures of central tendency
certainty, of confidence intervals, 317–18, 342
chi-square test, 463–500; alpha-level (α) in, 466, 470, 473, 474, 484, 486; alternative hypothesis in, 469–70, 478, 480–81, 483, 486–88, 489; assumptions of, 481; in controlled cross-tabulation, interpreting, 517–19; degrees of freedom in, 470, 473, 473–75, 474, 475, 477, 479, 484–85, 486, 489, 492; example, gender and perceptions of health, 477–78; expected frequencies in, 470–71, 473, 476, 481, 483, 484, 486, 487–88, 490; for goodness-of-fit, 483–85, 484, 485; of independence, 467–69, 471, 478–79, 479, 481, 483, 486; to investigate relationship between class and political ideology, 478, 478–79, 479; logic of, 466–69; null hypothesis in, 464–75, 477–78, 480, 482–85, 486–88, 489, 492; SPSS for, 489–92; standardized residuals in, 475–76, 476; Stata for, 487–89; statistical significance and sample size, 481–83; steps of, 469–75; two-sample z-tests for 2 × 2 cross-tabulations and, relationship between, 480, 480–81
chi-square test for goodness-of-fit, 483–85, 484, 485
Clinton, Hillary, 121–22, 208–10, 209, 209, 270–71
closed-ended survey items, 5–6, 24
clustered bar graph, 140, 140
cluster sampling, 14
codebooks, 16, 16–17
code logic in Stata, 28
Command Box in Stata, 26, 26, 27
Command Window in Stata, 26, 26
competing explanations for relationships between variables, ruling out: antecedent variable, 509, 509, 513, 520; control variable, 501–41; independent effect, 513, 520; intervening variable, 509, 510, 513, 520; mediating variable, 509, 513, 520; moderating variable, 515, 520; non-spurious relationships, 513–17; spurious relationship, 508–13; statistical interaction, 515–17, 516, 520
Complement Rule,
246–48, 266
compressed axis, 83
Compute Variable dialog box in SPSS, 37–38, 38
Compute Variable in SPSS, 36, 37, 41
concepts, 4–5, 20, 23
confidence interval range: confidence level in relation to, 335–37; sample size in relation to, 333–35
confidence intervals, 314–54; Brame’s research on, 314–15, 317; certainty of, 317–18, 342; differences between proportions and, 416–18, 419; election polling and, 325–26; inferential statistics and, 316, 316, 323, 342; interpreting, 337–38; lower bound of, 321, 324, 331, 334, 342; margin of error in, 280, 315, 317, 318, 320–21, 323, 324, 325–26, 328, 330, 331, 332, 335, 336, 337, 339–40, 343; of means, 326–33, 343, 344–45, 347–48; point estimates and, 316, 317, 324, 325, 329, 330, 332, 335, 337; precision of, 317–18, 335, 336, 339, 342; of proportions, 318–26, 342–43, 345–46, 348–49; range and (See confidence interval range); sample size and, 333–35, 338–41, 343; SPSS to calculate, 346–49; Stata to calculate, 344–46; two-tailed hypothesis tests and, equivalence of, 367; upper bound of, 321, 323, 324, 324, 329–32, 331, 334, 335, 336, 338, 342, 343
confidence level: in calculating confidence intervals for means, 330, 331; defined, 317, 342; degrees of freedom and, 330–32, 331, 335, 341, 343; in interpreting confidence intervals, 337; in relation to confidence interval range, 335–37; sample size in confidence intervals and, 339–41; z-score associated with, 318–22, 320, 321, 324, 342
continuous variables, 9–10
control variables, 501–41; in causal relationships, 503, 503–17, 520–21, 523–24, 525–27; in cross-tabulations, 503, 503, 520–23, 526–29; defined, 503, 520; in modeling non-spurious relationships, 513–17; Tufte’s research on, 501–3, 550
correlation, 542–98; causation differentiated from, 550; direction of, 544, 546, 547; negative, 544, 546, 547; nonsensical, 550; positive, 546, 547, 551; r-squared (r2), 557–58; scatterplots for visualizing, 546–57, 547, 548, 549, 551; standard error of the estimate, 558–59; strength of, 544, 544, 546, 547
correlation coefficients, 544–46; assumptions of regression, 571–72; calculating, 545–46; defined, 544; dummy variables, 559–61; formula for, 545–46; f-statistic, 566–68; intercept, 553–57, 555, 556; inverse correlation, 544, 546, 547; linear regression, 550, 554, 561–62, 579–80, 584; linear relationship, 545, 546, 549, 552–53, 555; logistic regression, 561–62, 575; positive correlation, 546, 547, 551, 573; regression line, 550–57; r-squared (r2), 557–58; size of, 544; slope (regression coefficient), 553–57; standard error of the slope, 568–71
covariance of two variables, formula for, 545, 546
Crawford, Kate, 17–18
cross-tabulations: collapsing categories in, 134; controlling for third variable in, 503, 503, 520–23, 526–29; interpreting the chi-square statistic in, 517–19; statistical significance as it applies to, 463–500 (See also chi-square test)
cumulative percentage, 60–62, 61, 62, 84
cumulative probability, 263, 263, 264–65, 267, 269
curvilinear relationship, 549, 549
cutoff points, 207–8, 413
data, 54–120 (See also graphical representations of data); aggregate, 55; analysis before digital era, 22; cases in variable categories, 55; comparing two groups on same variable, 77–81; cumulative percentage, 60–62, 61, 62, 84; frequency distributions, 55–57, 56, 57; frequency distribution tables, 65–69, 66, 67, 68, 69; Furstenberg’s and Kennedy’s research on, 54–55; getting to know, 109–20; independence of, in assumptions of regression, 571; percentages, 55, 57–60, 58, 59, 84; percent change, 62–63, 84; percentile, 62, 84; proportions, 57–60; rate, 63–64, 84; ratio, 65, 84; raw frequency, 55; secondary (See secondary data); SPSS to generate statistics and graphs, 95–109; Stata to generate statistics and graphs, 85–94; unit of analysis, 55; univariate statistics, 55
data, gathering, 12–15; descriptive statistics, 12–13; inferential statistics, 13; non-probability sample, 13; population, 12–13; probability sample, 13; sample, 12–13; sampling methods, 13–15
Data Editor in SPSS, 34, 34–36, 36, 38, 38, 41, 42, 42
Data Editor in Stata: icon, 26, 26; opening, 26, 26; reviewing data in, 26–27, 27
data set display in Stata, 26–27, 27
Data View in SPSS, 35, 38–39, 39
decision f: in ANOVA, 439, 441–46, 448; in statistical inference for regression, 567–68, 574
decision t: in ANOVA, 442; in one-sample hypothesis tests, 369–71, 371, 374–75, 381; in statistical inference for regression, 569–71, 575; in two-sample hypothesis tests, 405, 405–9, 406, 409, 419, 423
degrees of freedom (DF): in ANOVA, 438–39, 439, 441–46, 448; in chi-square test, 470, 473, 473–75, 474, 475, 477, 479, 484–85, 486, 489, 492; confidence level and, 328, 328, 330–32, 331, 335, 341, 343; in one-sample hypothesis test, 369–70, 373; t-distributions and, 327, 327–28, 328; in two-sample hypothesis test, 404–7, 406, 412, 419, 423
dependent variables, 11–12
Descriptives dialog box in SPSS, 43–44, 44
descriptive statistics: cumulative percentage, 60–62, 61, 62, 84; frequency, 55, 57–60, 58, 59, 84; in gathering data, 12–13; percentages, 55, 57–60, 58, 59, 84; percent change, 62–63, 84; percentile, 62, 84; purpose of, 55; rate, 63–64, 84; ratio, 65, 84; sampling, 12–13
discrete variables, 9–10
distributions: frequency, 55–57, 56, 57, 85; frequency distribution tables, 65–69, 66, 67, 68, 69; normal, 254–58, 255, 256, 257, 272–79; probability, 253–58; sampling, 280–313, 403, 403–4, 438–39, 439; skewed, 218–19; standard deviation to compare, 212–14, 213–14; symmetric, 218–19
do-file in Stata, 31–33, 32, 94
drop-down menus: SPSS, 33, 35, 95; Stata, 25
drop-down menus in SPSS, 35
dummy variables, 559–61
Dweck, Carol, 19
ecological fallacy, 6, 24
election polling, confidence intervals and, 325–26
election polling and confidence intervals, 325–26
Empirical Rule, 256, 256–58
equivalence of two-tailed hypothesis and confidence intervals, 367
error messages in Stata, 30–31
expected
frequencies, 470–71, 473, 476, 481, 483, 484, 486, 487–88, 490
experimental control, 11–12
explanatory variables in social sciences, 513
F-curves, 438–39, 439
feeling thermometer: for big business, Tea Party, and Black Lives Matter, 212–14, 213–14; Clinton, 270–71; for conservatives, 234, 234–35; of different groups, 212; of gay men and lesbians by gender and region, 514, 514–15, 515; of illegal immigrants, 208–10, 209, 209; for liberals, 235, 236; Obama, 74, 71, 71–75, 72, 73, 74; recoding, 71, 71–72, 72; for Trump voters, 272, 272
frequencies, 55, 57–60, 58, 59; defined, 84; relative size assessed by, 59
frequency distributions, 55–57, 56, 57, 85; aggregate data in, 55; defined, 85; measures of central tendency in, finding, 173, 173–75, 175; raw frequency in, 56; SPSS to generate, 95–98, 95–98; Stata to generate, 86, 86–88, 87–88, 94; unit of analysis in, 55; univariate statistics in, 55
frequency distribution tables, 65–69, 66, 67, 68, 69; missing values, 65–67, 66; recoding, 67–69; simplifying by collapsing categories, 67, 67–69, 68, 69
frequency polygons, 75, 75, 79, 81; comparing two groups on same variable using, 79, 81; defined, 85; graphical displays of single variable, 75, 75
F-statistic, 566–68; ANOVA, 438, 439, 442, 445–46, 448, 449, 452, 454, 461; in statistical inference for regression, 566–68
Furstenberg, Frank, 54–55
F-values, 438, 441, 445
generalizing, 12–15
General Social Survey (GSS), 15, 48, 48, 56, 56–57, 57, 69, 85, 95, 132, 134–35, 137, 139, 143–44, 147, 149, 173, 177–78, 182, 186–87, 192, 205–6, 217, 252, 279, 322–23, 433, 434, 444–46, 448–49, 450–51, 487, 489, 499, 500, 521, 526, 540, 560
Goldberg, Amir, 17
goodness-of-fit, 557–59; chi-square test for, 483–85, 484, 485
goodness-of-fit measures, 557–59; r-squared (r2), 557–58; standard error of the estimate, 558–59
Gould, Stephen Jay, 203
graphical representations of data: bar graphs, 69–72, 70, 78, 79, 85; of bivariate relationships, 122, 140–41, 140–41; comparing two groups on same
variable, 77–81; compressed axis, 83; displays of single variable, 69–75; frequency distributions, 55–57, 56, 57, 85, 86–88; frequency distribution tables, 65–69, 66, 67, 68, 69; frequency polygons, 75, 75, 79, 81, 85; histograms, 72–73, 73, 79, 80–81, 85, 94; horizontal axis, 83, 255; misleading, 82–83, 82–83; pie charts, 69–72, 70, 78, 80, 85; SPSS to generate, 95–109; Stata to generate, 85–94; stem-and-leaf plots, 73–75, 74, 85; time series charts, 76, 76–77, 85; Venn diagrams, 563, 563, 594, 594; vertical axis, 69, 73, 82–83, 82–83, 105; x axis, 76, 106, 108, 393, 546, 548, 551, 561, 575, 581, 588; y axis, 76, 108, 299, 546, 548, 551, 554, 555, 556, 561, 575, 581, 588, 592
graphical user interface (GUI): SPSS, 33, 34, 34–36, 36; Stata, 25–26, 26
groups: comparing, 399–434; differences between, in social sciences, 399; diversity of values in, 231–40; Houle’s and Warner’s research on, 161–62, 204; Rivera’s and Tilcsik’s research on, 399–400, 418; sums of squares, variation between, 442, 443, 443–44; sums of squares, variation within, 442, 442; testing mean differences among multiple, 453–61; two, on same variable, 77–81; typical values in, 161–202, 193–202; variation within and between, 436–38, 437
growth mindset, 18–19
histograms, 72–73, 73, 79, 80–81; comparing two groups on same variable using, 79, 80–81; defined, 85; graphical displays of single variable, 72–73, 73; probability distributions found with, 253, 253; showing percentages, Stata to generate, 94
Hollerith, Herman, 22
homoscedasticity of residuals, 571
horizontal axis, 83, 255
Houle, Jason, 161–62, 204
hypothesis, defined, 3, 23
hypothesis testing: one-sample, 356–98; power of, 378, 382; two-sample, 399–434; using ANOVA, 438–39
ideological identification variable, 66, 67–68, 124, 124–26, 125, 134
“if” in a command in Stata, rule for, 29
income, median versus mean, 179–80, 180
independence, chi-square test of, 467–69, 471, 478–79, 479, 481, 483, 486. See also chi-square test
independent effect, 513, 520
independent variables, 11–12
in depth feature: assessing relative size using percentages and frequencies, 59; assumptions of hypothesis tests, 372; collapsing categories in cross-tabulations, 134; election polling and confidence intervals, 325–26; equivalence of two-tailed hypothesis and confidence intervals, 367; GSS tests Americans’ knowledge of probability, 252; misleading graphs, 82–83; nonexistent values for mean and median in the population, 171; publication bias toward statistically significant results, 365; punched cards and data analysis before digital era, 22; redistributive property of the mean, 164–65; sampling from skewed population, 285; statistical notation for samples and populations, 281; why the mean has no meaning for nominal-level variables, 165
inferential statistics, 2, 280; confidence intervals and, 316, 316, 323, 342; defined, 13, 25, 342; normal curve and, 242, 264, 264–65, 281; population parameters and, estimating, 315–17; regression and, 543, 565–66, 571, 579; standard error and, 287; visual representation of, 316
Input Variable box in SPSS, 40
intercept, slope and. See slope and intercept
interface: SPSS, 34, 34–36, 36; Stata, 25–26, 26
interquartile range (IQR), 205–10; box plot for displaying, 208–10, 209; cutoff points, 207–8; defined, 205, 220; interval-ratio variables and, 205–6; ordinal variables and, 205; size of, calculating, 206; SPSS for finding, 228–29; Stata for finding, 222–23
interval-ratio variables, 60, 71, 85, 481, 523, 529; effect of independent variable on means of, using SPSS, 529–31; effect of independent variable on means of, using Stata, 523–24; interquartile range and, 205–6
intervening variables, 509, 510, 513, 520
inverse correlation, 544, 546, 547
IQ scores: cumulative probability for, 263, 263; for Mensa membership, 265; normal distribution and, 254–58, 257; right-tail probability for, 259, 259–60; sampling distribution and, 288–90, 289, 290; standard deviations and, 258–65; z-scores and, 258–65
Kahneman, Daniel, 241–43, 251–52
Kennedy, Sheela, 54–55
launching: SPSS, 33–34; Stata, 25
left-tail probability, 264, 264–65, 383, 384, 387, 402
level of measurement, 9–11
linear regression, 550, 554, 561–62, 579–80, 584 (See also regression line); check assumptions of, SPSS for, 585–87; check assumptions of, Stata for, 579–80; defined, 550, 573; intercept in, 554–57; models, 561–62; scatterplots to visualize, 550–51; slope in, 554–57
linear relationships between variables, 542–98; assumptions of regression, 571–72, 578–80, 585–87; best-fitting line, 552–53; correlation coefficients, 544–46; direction of, 544; dummy variables, 559–61; goodness-of-fit measures, 557–59; multiple regression, 563–65; scatterplots, 546–51; slope and intercept, 553–57; SPSS for, 581–88; Stata for, 575–81; statistical inference for regression, 565–71; straight line for representing, 548–49, 573; strength of, 544, 544
logical operators in Stata, 31, 31
logistic regression, 561–62, 575
Long, J. Scott, 562
lower bounds, 321, 324, 331, 342
Luker, Kristen,
margin of error, 280, 315, 317–18, 320–21, 323, 324, 325–26, 328, 330, 331, 332, 335–37, 339–40, 343
Marked (Pager), 2–3
math anxiety, 19–20
means, 163–67; “apples and oranges” case, 214–17; claim about, testing, 373–75; confidence intervals for, 326–33; confidence intervals of, 326–33, 343, 344–45, 347–48; defined, 163; differences among multiple groups, testing, 453–61; differences between, confidence intervals for, 411–12, 419; differences between, hypothesis tests for, 404–12; differences between, post-hoc tests to determine, 446–47, 447; equation for calculating, 163; express individual observation as distance from, 258; in frequency distributions, finding, 173–75, 175; having no meaning for nominal-level variables, 165; income, versus median, 179–80, 180; of interval-ratio variables, effect of independent variable on, using SPSS,
529–31; nonexistent values for, in the population, 171; one-sample hypothesis test, 367–75, 385–86, 390–91; redistributive property of, 164–65; SPSS for calculating confidence intervals of, 347–48; SPSS for finding, 188–92; SPSS for one-sample hypothesis test, 390–91; SPSS for two-­sample hypothesis test, 427–28; standard deviations, comparing individual score to, 217; standard deviations, in normal distribution, 256–57; Stata for ­calculating confidence intervals of, 344–45; Stata for finding, 183–84; Stata for one-sample hypothesis test, 385–86; Stata for two-­sample hypothesis test, 422–23; t-distribution and, 326–29, 327 measurement, 5, 23; error, 6–9; goodness-offit, 557–59; of key concept, 4–6, 5; key terms ­involving, 23–24; level of, 9–11; scales, 10, 24 measures of central tendency, 161–202 See also means; median; mode; choosing, 175–79, 181–82; finding, in frequency distributions, 173, 173–75, 175; income, median versus mean, 179–80, 180; mean, 163–67; median, 167–71; mode, 171–73; SPSS for finding, 187–93; Stata for finding, 182–86 measures of variability, 203–40 See also interquartile range (IQR); range; standard deviation (SD); variance; Gould’s research on, 203; Houle’s and Cody’s research on, 161–62, 204; interquartile range, 205–10; range, 205; SPSS for finding, 225–31; standard deviation, 210–19; Stata for finding, 221–25; variance, 211 median, 167–71; defined, 167; finding, 167–71; in frequency distributions, ­finding, 173–75; income, versus mean, 179–80, 180; nonexistent values for, in the ­population, 171; SPSS for finding, 192; Stata for ­finding, 184 “Median Isn’t the Message, The” (Gould), 203 mediating variables, 509, 513, 520 misleading graphics, 82–83, 82–83 mixed methods, 4, 23 mode, 171–73; defined, 171; in frequency ­distributions, finding, 173–75; SPSS for finding, 192; Stata for finding, 184–86 moderating variables, 515, 520 Moving to Opportunity (MTO) project, 356–57 multiple regression, 563–65 Multiplication Rule: with independence, 
248–49, 251–52, 266; without independence, 249–51, 266 multistage cluster sampling, 14 National Center for Education Statistics, 238 National Longitudinal Survey of Youth (NLSY), 15, 314, 344, 346, 347 negative correlation, 544, 546, 547 nominal-level variables, 10–11, 165 nominal variables, 10–11 nonlinear relationships, 549, 561–62 non-probability sample, 13 nonresponse bias, 14 non-spurious relationships, 513–17 normal curve: for analyzing distributions and finding probabilities, 257, 261; inferential statistics and, 242, 264, 264–65, 281; terminology for, 264–65 normal distribution, 254–58, 255, 256, 257, 272–79; characteristics of, 246, 266; defined, 254, 266; Empirical Rule, 256, 256–58; ­features of, 255–56; importance of, 254–55; probability and, 272–79; standard deviations of the mean, 256–57; z-scores in, ­calculating, 258–65 normal tables, 259, 260–67, 271, 291, 319–22, 320, 328, 362–63, 365 260, 415, 481 notation See statistical notation null hypothesis: in ANOVA steps, 441, 444, 446, 448; in chi-square test, 464–75, 477–78, 480, 482–85, 486–88, 489, 492; in chi-square test for goodness-of-fit, 483–85, 484, 485; in hypothesis testing using ANOVA, 438–39, 439; in one-­ sample hypothesis tests, 358–59; in two-sample hypothesis tests, 401–2 Numeric Expression box in SPSS, 38, 38 Obama, Barack, 71–75, 121, 508; Feeling Thermometer Ratings, 74, 71, 71–75, 72, 73, 74 Obama Feeling Thermometer Ratings, 74, 71, 72, 73, 74 observed frequencies, 469–72, 475–76, 478, 481, 484, 486–88, 490, 492 Old and New Values box in SPSS, 40, 41 omnibus test, 436 See also analysis of v ­ ariance (ANOVA) one-sample hypothesis tests, 356–98; alpha-level (α), 364–70, 373, 378, 380, 381, 382, 384, 385, 387; alternative hypothesis, 359; ANOVA, 438–39; Index Pager, Devah, 2–3 percentages, 55, 57–60, 58, 59, 84; calculating, 58; cumulative, 60–62, 61, 62, 84; defined, 84; ­relative size assessed by, 59 percent change, 62–63, 84 percentiles, 62, 84; SPSS for finding, 227–28; Stata for 
finding, 222 personal control, in ANOVA research, 436 pie charts, 69–72, 70, 78, 80; comparing two groups on same variable using, 78, 80; defined, 85; of single variable, 69–72, 70 point estimates, 316, 317, 324, 325, 329, 330, 332, 335, 337 Police Public Contact Survey (PPCS), 15, 16, 16, 382–90, 420–21, 423, 424, 425 political party identification variables, 67, 68, 69, 82–83, 124, 124–26, 125, 134, 411, 482, 482–83, 507–8, 508, 510, 511, 511–13, 512, 518, 518–19, 523–24, 529–31 population parameters See also confidence intervals: defined, 281; estimating, 314–55; inferential statistics and, 315–17, 316; in sampling distributions, 284–87 populations: defined, 12; in gathering data, 12–13; nonexistent values for the mean and median in, 171; parameters in, e­ stimating, 349–54; ­regression equation for, 566; to sample from, 12, 306–13; samples and, differences between, 392–98; skewed, s­ ampling from, 285; statistical notation for, 281 positive correlation, 546, 547, 551, 573 post-hoc tests, 436, 446–48, 447, 450, 453, 458, 461 power of hypothesis test, 378, 382 practical significance: in one-sample hypothesis tests, 379–80, 381; in two-sample hypothesis tests, 418 practice problems See also SPSS p ­ ractice problems; Stata practice problems: ­comparing groups, 429–34; describing linear relationships between variables, 588–97; differences between samples and populations, 392–98; diversity of values in a group, 231–40; estimating population parameters, 349–54; examining relationships between two variables, 152–60; getting to know your data, 109–20; introduction, 45–52; probability and the normal distribution, 272–79; from sample to population, 306–13; testing mean differences among multiple groups, 453–61; testing statistical significance of relationships in cross-tabulations, 492–500; typical values in a group, 193–202; variables, ruling out competing explanations for relationships between, 532–41 precision, of confidence intervals, 317–18, 335, 336, 339, 342 
probability, 241–79 See also normal distribution; Americans’ knowledge of, 252; cumulative, 263, 263, 264–65, 267, 269; as fraction, 243; importance of, to statistics, 242; reasons for using, 242–52; sample, 13; SPSS for, 270–71; standardizing variables, 258–65; Stata for, 267–70; ­Tversky’s and Kahneman’s scenario, 241–43, 251–52; z-scores, 258–65 probability, rules of, 242–52; Addition Rule, 245–46, 266; Complement Rule, 246–48, 266; Multiplication Rule, with independence, 248–49, 251–52, 266; Multiplication Rule, without i­ ndependence, 249–51, 266 probability distributions, 253–58 See also ­normal distribution; of continuous ­variables, 253–54, 254; histogram to find, 253, 253; normal ­distribution, 254–58, 255, 256, 257, 272–79 Index assumptions of, 372; decision t in, 368, 369–71, 371, 374–75, 381; degrees of freedom, 369–70, 373; error and limitations, 375–79, 381–82; logic of, 357–59; for means, 367–75, 385–91; M ­ oving to Opportunity project, 356–57; null hypothesis, 358–59; one-tailed test, 359, 365–67, 368, 381; power of, 378; ­practical significance in, 379–80, 381; for proportions, 359–64, 382–85, 387–90; SPSS for, 386–91; Stata for, 382–86; statistical ­significance in, 379–80, 381; steps of, 364–65, 381; t-value in, 327–28, 328, 330–32, 331, 335–36, 341–43, 345, 351, 367–72, 371, 374, 397–98; two-tailed test, 359, 365–67, 366, 373–74, 374; Type I and Type II error, 376–79, 381–82; z-scores in, 367–68, 370, 375, 394 one-tailed test, 359, 365–67, 368 open-ended survey items, 6, 24 operationalization, 4–5, 23 See also measurement operators in Stata, 31, 31 ordinal variables, 10–11; interquartile range and, 205 Organization for Economic Cooperation and Development (OECD), 64 Output Variable: Name box in SPSS, 40 Output window in SPSS, 34, 34, 35, 36, 44, 44 673 674 Index Index proportions, 57–60; confidence intervals of, 318–26, 342–43, 345–46, 348–49; differences between, confidence intervals for, 416–18, 419; differences between, hypothesis tests for, 
412– 16; one-sample hypothesis test, 359–64, 382–85, 387–90; SPSS for calculating confidence levels of, 348–49; SPSS for one-sample hypothesis test, 387–90; SPSS for two-sample hypothesis test, 424–27; Stata for calculating confidence levels of, 345–46; Stata for one-sample ­hypothesis test, 382–85; Stata for two-sample ­hypothesis test, 420–22 publication bias toward statistically ­significant results, 365 publicly available secondary data sets, 15–16 punched cards and data analysis before ­digital era, 22 quantitative analysis, 4, 23 quantitative methods, 4, 23 random sampling: error, 287; importance of, 280; repeated, 281–84, 295, 297, 300, 316; sampling distributions, 284–87; simple, 13, 24; stratified, 14, 24 range, 205; defined, 205, 220; formula for, 205; SPSS for finding, 225–27; Stata for finding, 221–22 rank ordering, 9–10, 24, 57, 60, 69, 88, 98, 137, 165, 168, 169, 207 rate, 63–64, 84 ratio, 65, 84 ratio-level variables, 9, 23, 24 raw frequency, 56 Recode into Different Variables dialog box in SPSS, 39–40, 40, 52, 100, 100, 120 Recode into Same Variables dialog box in SPSS, 39 recode/recoding, 67–69; bar graphs and pie charts, 71–72; feeling thermometer ­variable, 71, 71–72; frequency distribution tables, 67–69; SPSS for, 38–42, 39, 40, 41, 98–102; Stata for, 88, 88–90, 90, 94, 119, 158; value labels, SPSS for assigning, 101–2, 102; value labels, Stata for assigning, 90–91, 91, 94 redistributive property of the mean, 164–65 regression analysis, 542–98; Adelman’s research on, 542–43; assumptions of, 571–72; inferential statistics and, 543, 565–66, 571, 579; linear, 550, 554, 561–62, 579–80, 584; logistic, 561–62, 575; logistic regression, 561–62, 575; multiple, 563– 65; statistical inference for, 565–71 regression equation: dummy variables in, 559–61; F-statistic for, 566–68; multiple, 564–65; in null and alternative hypotheses, 567; for a population, 566; r-squared (r2) in, 557–58; slope and intercept in, 554–56; standard error of the slope in, 569 
regression line, 550–57 See also linear r­ egression; alternative terms and notation for, 554; in assumptions of regression, 571; best-fitting, 552–53; defined, 550, 573; dummy variables and, 561; equation for, 553, 554; goodness-of-fit measures for, 553–59; in scatterplots, 550–51; slope and intercept of, 553–57, 573; standard error of the estimate, 558–59, 574 regression models: in Adelman’s research, 542–43; goodness-of-fit measures for, 557; for nonlinear relationships, 561–62 Regression Models for Categorical and Limited ­Dependent Variables (Long), 562 relational operators in Stata, 31, 31 relationships: bivariate, 122, 140–41, 140–41; competing explanations between ­variables, ruling out, 532–41; in cross-­tabulations, testing statistical significance of, 492–500; 2016 presidential election and, 121–22; between two variables, 152–60 relative frequencies, 55, 57–60, 58, 59 relative size, percentages and frequencies for assessing, 59 reliability, 6–9, repeated t-Tests, 447 research process, 3–4, 23; concepts, 4–5, 20, 23; measurement or operationalization, 5, 23; research question, 3–4, 23; sampling, 12–15 research question, 3–4, 23 residuals: in assumptions of regression, 571–72, 572; defined, 553; distribution of, in population, 571–72; h ­ omoscedasticity of, 571; independence of, 571; size of, c­ alculating, 553, 558; standardized, in ­chi-square test, 475–76, 476; sum of squared, 558, 566, 568–71 Results Window in Stata, 26, 26, 27–28 right-tail probability, 259, 260, 264, 264–65, 269; for alternative hypotheses, 383, 387, 402; confidence levels and, 321, 321, 322, 342; defined, 260, 264–65; for IQ scores, 259, 259–60; z-scores and, 269, 319–22, 320, 321, 342 Rivera, Lauren, 399–400, 418 Index sample: in gathering data, 12–13; non-­probability, 13; to population from, 306–13; populations and, differences between, 392–98; probability, 13; in sampling, 12; ­statistical notation for, 281 sample size: confidence intervals and, 333–35, 338–41, 343; in relation 
to ­confidence interval range, 333–35 sampling, 12–15; aggregate level, 6, 24; ­cluster, 14; descriptive statistics, 12–13; ecological fallacy, 6, 24; frame, 13; ­inferential statistics, 13; multistage ­cluster, 14; non-probability, 13; nonresponse bias, 14; population, 12; probability, 13; s­ ample in, 12; simple random sample, 13; from skewed population, 285; stratified ­random sample, 14; unit of analysis, 6, 16, 24, 55, 221 sampling distributions, 280–313; in hypothesis testing using ANOVA, 438–39, 439; population parameters in, 281, 284–87; for two-sample hypothesis tests, 403, 403–4 sampling frame, 13 sampling methods, 13–15; cluster sampling, 14; in gathering data, 13–15; multistage cluster sampling, 14; nonresponse bias, 14; sampling frame, 13; simple random sample, 13; steps in, 13; stratified random sampling, 14 saving your work in SPSS, 44 scales, 10, 24 scatterplots, 546–57, 547, 548, 551; best-­fitting lines in, 552, 552–53; of curvilinear relationship, 549, 549; defined, 546; points in, 548–49; regression lines in, 550–51, 551; slope and intercept in, 553–57, 555, 556 secondary data, 15–18; big data, 17–18; ­codebooks, 16, 16–17; defined, 15; publicly available ­secondary data sets, 15–16 simple random sampling, 13 skewed distributions, 218–19 skewed population, sampling from, 285 slope and intercept, 553–57; calculating, 556–57; defined, 554; statistical software to calculate, 554 social sciences: analysis of variance in, 438; big data in, 17–18; confidence intervals in, 315; differences between groups, 399; explanatory variables in, 513; f­ requencies in, 55; hypothesis tests in, 375; to measure attributes of people, 171; m ­ easurement error in, 6; research question in, 3; in study of statistics, 1; to study ­relationships among variables, 11; unit of ­analysis in, sources of data not collected directly by researcher: big data, 17–18; secondary data, 15–18 SPSS, 33–44; ANOVA, 450–53; box plot, ­generating, 229–30, 230; chi-square test, 489–92; 
commands, 36–44; Compute V ­ ariable dialog box, 36–38, 37, 38, 41, 187, 188; confidence intervals, calculating, ­346–49; Data Editor, 34, 34–36, 36, 38, 38, 41, 42, 42, 51–52, 101, 120, 270–71, 301; Data View, 35, 38–39, 39, 51–52, 120, 271, 279, 301; Descriptives dialog box, 43–44, 44, 52, 188, 225–26, 226, 230–31, 231, 240, 270–71, 271, 302, 304, 312; drop-down menus, 33, 35, 95; Explore command, 229; Frequencies dialog box, 95–96, 95–96, 190, 190–93, 191, 227, 227–28, 228; frequency distributions, generating, 9–98, 95–98; graphical user interface, 33, 34, 34–36, 36; Input V ­ ariable box, 40; interquartile range, finding, 228–29; launching, 33–34; linear regression, checking assumptions of, 585–87; linear relationships between variables, 581–88; means, finding, 188–92; means, two-­sample test for difference between, 427–28; measures of central tendency, ­finding, ­187–93; measures of variability, finding, 225–31; median, ­finding, 192; mode, f­ inding, 192; Numeric Expression box, 38, 38; Old and New ­Values box, 40, 41; one-sample hypothesis tests, 386–91; O ­ utput Variable: Name box, 40; Output window, 34, 34, 35, 36, 44, 44, 97, 97, 103, 106, 188, 270; overview, 33; percentiles, finding, 227–28; probability, 270–71; p ­ roportions, two-sample test for difference between, 424–27; range, finding, 225–27; Recode into Different Variables dialog box, 39–40, 40, 52, 100, 100, 109, 120; Recode into Different Variables: Old and New Values, 40, 100, 100; Recode into Same Variables dialog box, 39; saving your work, 44; Split File command, 530–31, 531; standard deviation, finding, 230; ­statistics and graphs, generating, 95–109; S ­ ystem-missing, 41, 41; Target Variable box, 38, 38; Transform drop-down menu, 36, 37; two-sample hypothesis tests, 4­ 24–28; value labels, assigning to recoded variables, 101–2, 102; Value Labels d ­ ialog box, 42, 43, 101; values as “missing” for any variable, 42; variables, Index Romney, Mitt, 121 Root Mean Square Error (RMSE), 558–59 
r-squared (r2), 557–58 675 676 Index Index SPSS (continued) analyze ­existing, 43, 43–44; variables, calculating new, 187–88; variables, controlling for third variable in cross-­tabulations, 526–29; ­variables, creating frequency distributions for, 95, 95–98, 96, 97, 98; variables, creating new, 37–38; variables, effect of independent variable on means of interval-ratio variable, 529–31; variables, names of, 42, 270; variables, recoding, 38–42, 39, 40, 41, 98–101, 99, 100, 100, 101; variables, save standardized values as, 270–71, 271; ­variables, transform existing, 38–43; ­variables, values as “missing” for any, 42; Variable View, 34, 35, 36, 38, 39, 41, 42, 42; variance, finding, 230 SPSS practice problems: comparing groups, 434; describing linear relationships between variables, 596–97; differences between samples and populations, 397–98; diversity of values in a group, 239–40; ­estimating population parameters, 354; examining relationships between two variables, 159–60; getting to know your data, 119–20; introduction, 51–52; probability and the normal ­distribution, 279; ruling out competing explanations for relationships between variables, 540–41; from sample to population, 312–13; ­testing mean differences among multiple groups, 461; testing statistical ­significance of ­relationships in crosstabulations, 500; ­typical values in a group, 202 SPSS procedures, review of: comparing groups, 428; describing linear relationships between variables, 588; differences between samples and populations, 391; diversity of values in a group, 231; estimating population parameters, 349; examining relationships between two variables, 152; getting to know your data, 109; probability and the normal distribution, 271; ruling out competing explanations for relationships between variables, 531; from sample to population, 305; testing mean differences among multiple groups, 453; testing statistical significance of relationships in cross-tabulations, 492; typical values in a group, 
193 spurious relationships, 508–13 stacked bar graph, 140–41, 141 standard deviation (SD), 210–19; “apples and oranges” case, 214–17; to compare individual score to the mean, 217; defined, 210, 220; ­distributions compared with, 212–14, 213–14; formula for, 211; interval-ratio variables compared with, 214–15; skewed versus symmetric distributions, 218–19; SPSS for finding, 230; squared d ­ eviations, 210, 211; Stata for finding, 223–24; s­ ummarizing, 210; variance, 211 standard deviations of mean in normal ­distribution, 256–57 standard error: of the estimate, 558–59; inferential statistics and, 287; of the slope, 568–71 Stata, 25–33 See also tabulate command in Stata; ANOVA, 448–50; bar graphs, ­generating, 94; Base Reference Manual, 33; bootstrap command, 297–98, 298, 312; box plot, generating, 223, 223; by sort command, 523–24, 524; chi-square test, 487–89; ci command, 344, 345, 353, 344–345; code, basic logic of, 28; coding mistakes, 29; Command Box, 26, 26, 27; commands, 28–30; Command Window, 26, 26; ­confidence intervals, calculating, 344–46; corr command, 577, 577, 595; Data ­Editor, 26–27, 27; data set display, 26–27, 27; d ­ isplay command, 268, 268, 269, 269, 279; display invnormal command, 269, 269; do-file, 31–33, 32, 94; drop-down menus, 25; egen command, 182, 185; error messages, 30–31; frequency distributions, generating, 86, 86–88, 87–88, 94; generate command, 28, 29, 51, 89, 119; graph box command, 223, 225, 239; graph command, 91, 223; graphical user interface, 25–26, 26; help sources, 33; histogram command, 92–93, 201, 299, 312, 596; histograms ­showing percentages, generating, 94; “if” in a command, rule for, 29; interquartile range, finding, 222–23; keep command, 295–96; label define command, 90, 94, 119; label values command, 90, 94, 119; launching, 25; linear regression, checking assumptions of, 579–80; mean, finding, 183–84; measures of central tendency, ­finding, 182–86; measures of ­variability, finding, 221–25; median, finding, 184; 
mode, finding, 184–86; one-sample hypothesis tests, 382–86; oneway command, 449, 449, 460; operators in, 31, 31; overview, 25; percentiles, finding, 222; ­predict command, 579, 596; probability, 267–70; proportions, two-sample test for difference between, 420–22; prtest command, 383, 383, 421, 421; pwmean ­command, 461; range, ­finding, 221–22; recode command, 89, 119, 158; regress command, 579, 596; replace ­command, 29, 51; Results Window, 26, 26, 27–28; sample command, 576; sample command, 296, 297; saving your work, 33, 94; scatter command, 575–76, 595, 596; standard deviation, finding, 223–24; s­ tatistics, generating, Index variables, 11–12; experimental control, 11–12; generalizing, 12–15; growth mindset, 18–19; independent variables, 11–12; level of measurement, 9–11; math anxiety, 19–20; measurement error, 6–9; measurements, 4–6, 5, 23–24; overview of book, 20–21; Pager’s study and, 2–3; quantitative analysis, 4, 23; quantitative methods, 4, 23; reliability, 6–9, 8; research process, 3–4, 23; research questions, 3–4; sampling, 12–15; SPSS to generate, 95–109; Stata to generate, 85–94; statistical control, 12; statistical software programs, 21–22; studying, reasons for, 1–3; units of analysis, 6; validity, 6–9, 8; ­variables, 4–6, 23–24 stem-and-leaf plots, 73–75, 74, 85 stratified random sampling, 14 studying statistics, reasons for, 1–3 summarize command in Stata, 30 sum of squared residual, 558, 566, 568–71 sums of squares (SS): variation between groups, 442, 443, 443–44; variation within groups, 442, 442 Survey of Income and Program Participation (SIPP), 254, 285 symmetric distributions, 218–19 System-missing in SPSS, 41, 41 tabulate command in Stata: to conduct ANOVA, 449, 449–50; to draw random samples from larger sample, 296, 296; to find mode, 185, 185–86, 186, 201; to generate cross-tabulation, 143–46, 144, 145, 146, 158, 499, 521, 521–23, 522, 540; to g ­ enerate frequency distributions, 86, 86–88, 87–88, 94, 119, 185, 185–86, 186, 296, 296, 300; 
for recoding variables, 88, 88; to run ­chi-square test, 487–89, 488 Target Variable box in SPSS, 38, 38 t-distributions: confidence interval for means and, 326–29, 327; degrees of freedom and, 327, 327–28, 328 testing statistical significance of relationships in cross-tabulations: chi-square test for goodnessof-fit, 483–85, 484, 485; chi-square test of independence, 467–69, 471, 478–79, 479, 481, 483, 486; expected frequencies, 470–71, 473, 476, 481, 483, 484, 486, 487–88, 490; observed frequencies, 469–72, 475–76, 478, 481, 484, 486–88, 490, 492 third variables See control variables Tilcsik, András, 399–400, 418 time series charts, 76, 76–77, 85 Transform drop-down menu in SPSS, 36, 37 Index 85–94; statistics and graphs, generating, 85–94; summarize command, 30, 30, 51, 183, 183, 201, 224, 224, 296, 296, 298, 298, 312; tabstat command, 183–84, 184, 185, 186, 201, 221, 222, 222, 223, 224, 224, 239; Tools menu, 33; ttest command, 397, 422–23, 423; two-sample hypothesis tests, 420–24; User’s Guide, 33; variables, analyze existing, 30; variables, controlling for third variable in cross-­tabulations, 520–23; variables, create new, 28–29, 87, 94, 182–83; variables, creating and attaching labels to categories of, 94; variables, effect of independent variable on means of interval-ratio variable, 523–24; variables, linear relationships between, 575–81; variables, names of, 28–29; variables, transform existing, 29–30; Variables Window, 26, 26; variance, finding, 223–24; z-scores, finding, 267–70 Stata practice problems: comparing groups, 433; describing linear relationships between variables, 595–96; differences between samples and populations, 397; diversity of values in a group, 239; estimating population parameters, 353–54; examining relationships between two variables, 158–59; getting to know your data, 119; introduction, 51; probability and the normal distribution, 279; ruling out competing explanations for relationships between variables, 540; from sample to 
population, 312; testing mean differences among multiple groups, 460–61; testing statistical significance of relationships in cross-tabulations, 499–500; typical values in a group, 201 statistical control, 12 statistical inference for regression, 565–71; alphalevel (α) in, 543, 567–70; decision f in, 567–68, 574; decision t in, 569–71, 575; F-statistic in, 566–68 statistical interaction between variables, 515–17, 516, 520 statistical notation: for regression line, 554; for sample, 281; for samples and populations, 281; for two-sample hypothesis tests, 402–3 statistical significance: defined, 358, 381; in one-sample hypothesis test, 358, 379–80, 381; publication bias toward, 365; of ­relationships in cross-tabulations, ­testing, 492–500; in two-­ sample hypothesis tests, 418 statistical software programs, 21–22 statistics: causal relationships, 11–12; data, gathering, 12–15; data, secondary, 15–18; dependent 677 678 Index Index Trump, Donald, 121–22, 208–10, 209, 209, 272, 272 t-Table: in chi-square test, 474; in linear ­relations, 569–70; in one-sample hypothesis test, 328, 328, 332, 336, 369–70, 373, 375, 381; in two-sample hypothesis test, 406, 407, 412, 419 t-Tests, 447 Tufte, Edward R., 501–3, 550 Tukey’s test, 446–47, 447, 448, 458 t-value: in ANOVA, 446–47, 453; in linear relations, 543, 569, 571, 575; in one-sample hypothesis test, 327–28, 328, 330–32, 331, 335–36, 341–43, 345, 351, 367–72, 371, 374, 397–98; in two-sample hypothesis test, 406, 412, 419, 423, 434 Tversky, Amos, 241–43, 251–52 two-sample hypothesis tests, 399–434; ­alpha-level (α) in, 400, 405–6, 408, 412, 413, 415, 417, 419, 421, 422, 427; alternative hypothesis in, 402; ANOVA in, 438–39; decision t in, 405, 405–9, 406, 409, 419423; defined, 401, 419; degrees of freedom and, 404, 405–7, 406, 412, 419, 423; for means, differences between, 404–12, 419, 422– 23, 427–28; null hypothesis in, 401–2; ­practical significance in, 418; for proportions, ­differences between, 409–11, 412–17, 420–22, 
424–27; sampling distribution for, 403, 403–4; SPSS for, 424–28; Stata for, 420–24; statistical notation for, 402–3; ­statistical significance in, 418; steps for conducting, 419; t-value in, 406, 412, 419, 423, 434; two-tailed hypothesis test in, 402, 406, 414; z-scores in, 413–15, 414, 417, 419, 429, 431, 433 two-tailed hypothesis test: confidence intervals and, equivalence of, 367; in one-­sample hypothesis test, 359, 365–67, 366; in two-sample hypothesis test, 402, 406, 414 Type I and Type II error, 376–79, 381–82 unit of analysis, 6, 16, 24, 55, 221 univariate statistics, 55 upper bounds, 321, 323, 324, 324, 329–32, 331, 334, 335, 336, 338, 342, 343 User’s Guide for Stata, 33 validity, 6–9, Value Labels dialog box in SPSS, 42, 43 value labels for recoding variables: SPSS for assigning, 101–2, 102; Stata for assigning, 90–91, 91, 94 values: in groups, diversity of, 231–40; ­treating as “missing” for any variable in SPSS, 42; typical, in groups, 193–202 variability, 167 variability, measures of See measures of variability variable names: SPSS, 270; Stata, 28–29 ­ ariables; variables, 4–6, 23–24 See also control v linear relationships between variables; analyze existing, in SPSS, 43, 43–44; analyze existing, in Stata, 30; antecedent, 509, 509, 513, 520; association between, 506–7; categories of, cases in, 55; in causal relationships, independent and dependent, 11; closed-ended survey items, 5–6, 24; comparing two groups on same, 77–81; competing explanations for relationships between, ruling out, 532–41; continuous, 9–10; continuous variables, 9–10; covariance of two, formula for, 545, 546; creating new, SPSS for, 37–38; creating new, Stata for, 28–29, 87, 94; dependent, 11–12; describing linear relationships between, 588–97; discrete, 9–10; displays of single, 69–75; dummy, 559–61; examining relationships between two, 152–60; explanatory, in social sciences, 513; formula for covariance of two, 545; frequency distributions for, Stata for creating, 86, 86–88, 
87–88; ideological identification, 66, 67–68, 124, 124–26, 125, 134; independent, 11–12; independent effect of, in causal relationships, 513, 520; interval-­ratiolevel, 60, 71, 85, 481, 523, 529; intervening, 509, 510, 513, 520; mediating, 509, 513, 520; moderating, 515, 520; name of, adding full label to in SPSS, 42; nominal-level, 10–11, 165; non-spurious relationships in, modeling, 513, 513–17; open-ended survey items, 6, 24; ordinal-level, 10–11; ratio-level, 9, 23, 24; recoding, Stata for, 88, 88–90, 90; ruling out competing ­explanations for relationships between, 532–41; social sciences to study relationships among, 11; spurious relationships in, modeling, 503, 503–13, 506, 508–13; ­statistical interaction between, 515–17, 516, 520; transform existing, in SPSS, 38–43; transform existing, in Stata, 29–30; validity, 6–9, 8; variability, 167 Variables Window in Stata, 26, 26 Variable View in SPSS, 34, 35, 36, 38, 39, 41, 42, 42 variance: defined, 211, 220; finding, 211; SPSS for finding, 230; Stata for finding, 223–24 Venn diagrams, 563, 563, 594, 594 vertical axis, 69, 73, 82–83, 82–83, 105 visual representations of data See graphical ­representations of data Index x axis, 76, 106, 108, 393, 546, 548, 551, 561, 575, 581, 588 y axis, 76, 108, 299, 546, 548, 551, 554, 555, 556, 561, 575, 581, 588, 592 Yerkes-Dodson Law, 20 z-scores: calculating, probability and, 258–65; in chi-square test, 481; confidence levels and, 321, 321, 322, 342; defined, 258, 266; in normal tables, 259, 260–67, 271, 291, 319–22, 320, 328, 362–63, 365 260, 415, 481; in one-sample hypothesis test, 367–68, 370, 375, 394; right-tail probability and, 269, 319–22, 320, 321, 342; Stata for finding, 267–70; in two-sample hypothesis test, 413–15, 414, 417, 419, 429, 431, 433 Index Warner, Cody, 161–62, 204 World Values Survey (WVS), 15, 16, 128, 143, 147, 158, 159, 201, 202, 221, 225, 244, 247, 281–82, 289–91, 290, 295, 297, 300, 312–13, 353–54, 397 679 ... 
Preface

The idea for Statistics for Social Understanding: With Stata and SPSS began with our desire to offer a different kind of book to our statistics students. We wanted a book... lecture notes, and class exercises. Statistics for Social Understanding is distinguished by several features: (1) It is the only major introductory statistics book to integrate Stata and SPSS, giving
