Fundamentals of Statistical Reasoning in Education, 3rd edition (Coladarci, Cobb, Minium, and Clarke)
CHAPTER 11
Testing Statistical Hypotheses About μ When σ Is Known: The One-Sample z Test

11.1 Testing a Hypothesis About μ: Does "Homeschooling" Make a Difference?

In the last chapter, you were introduced to the sampling theory that is basic to statistical inference. In this chapter, you will learn how to apply that theory to statistical hypothesis testing, the approach to statistical inference most widely used by educational researchers. It also is known as significance testing. We present a very simple example of this approach: testing hypotheses about means of single populations. Specifically, we will focus on testing hypotheses about μ when σ is known.

Since the early 1980s, a growing number of parents across the United States have opted to teach their children at home. The U.S. Department of Education estimates that 1.5 million students were being homeschooled in 2007, up 74% from 1999, when the Department of Education began keeping track. Some parents homeschool their children for religious reasons, and others because of dissatisfaction with the local schools. But whatever the reasons, you can imagine the rhetoric surrounding the "homeschooling" movement: Proponents treat its efficacy as a foregone conclusion, and critics assume the worst. But does homeschooling make a difference, whether good or bad?

Marc Meyer, a professor of educational psychology at Puedam College, decides to conduct a study to explore this question. As it turns out, every fourth-grade student attending school in his state takes a standardized test of academic achievement that was developed specifically for that state. Scores are normally distributed with μ = 250 and σ = 50. Homeschooled children are not required to take this test. Undaunted, Dr. Meyer selects a random sample of 25 homeschooled fourth graders and has each child complete the test. (It clearly would be too expensive and time-consuming to test the entire population of homeschooled fourth-grade students in the state.) His general objective is to find out how the mean of the population of achievement scores for homeschooled fourth graders compares with 250, the state value. Specifically, his research question is this: "Is 250 a reasonable value for the mean of the homeschooled population?"
Notice that the population here is no longer the larger group of fourth graders attending school, but rather the test scores for homeschooled fourth graders. This illustrates the notion that it is the concerns and interests of the investigator that determine the population.

Although we will introduce statistical hypothesis testing in the context of this specific, relatively straightforward example, the overall logic to be presented is general. It applies to testing hypotheses in situations far more complex than Dr. Meyer's. In later chapters, you will see how the same logic can be applied to comparing the means of two or more populations, as well as to other parameters such as population correlation coefficients. In all cases, whether here or in subsequent chapters, the statistical tests you will encounter are based on the principles of sampling and probability discussed so far.

11.2 Dr. Meyer's Problem in a Nutshell

In the five steps that follow, we summarize the logic and actions by which Dr. Meyer will answer his question. We then provide a more detailed discussion of this process.

Step 1. Dr. Meyer reformulates his question as a statement, or hypothesis: The mean of the population of achievement scores for homeschooled fourth graders, in fact, is equal to 250. That is, μ = 250.

Step 2. He then asks, "If the hypothesis were true, what sample means would be expected by chance alone (that is, due to sampling variation) if an infinite number of samples of size n = 25 were randomly selected from this population (i.e., where μ = 250)?" As you know from Chapter 10, this information is given by the sampling distribution of means. The sampling distribution relevant to this particular situation is shown in Figure 11.1. The mean of this sampling distribution, μX̄, is equal to the hypothesized value of 250, and the standard error, σX̄, is equal to σ/√n = 50/√25 = 10.

Step 3. He selects a single random sample from the population of homeschooled fourth-grade students in his state (n = 25), administers the achievement test, and computes the mean score, X̄.

Figure 11.1 Two possible locations (X̄A and X̄B) of the obtained sample mean among all possible sample means when the null hypothesis is true: the sampling distribution of means (n = 25), with μX̄ = 250 and σX̄ = 10.

Step 4. He then compares his sample mean with all the possible sample means based on n = 25, as revealed by the sampling distribution. This is done in Figure 11.1, where, for illustrative purposes, we have inserted two possible results.

Step 5. On the basis of the comparison in Step 4, Dr. Meyer makes one of two decisions about his hypothesis that μ = 250: It will be either "rejected" or "retained."
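To make Step 2 concrete, the sampling distribution can be approximated empirically. The following sketch (our illustration, not part of the text's example; it assumes NumPy is available and uses an arbitrary seed) draws many random samples of n = 25 from a normal population with μ = 250 and σ = 50, then confirms that the sample means cluster around 250 with a standard error near σ/√n = 10.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

mu, sigma, n = 250, 50, 25   # population parameters under H0, sample size
reps = 100_000               # number of random samples to draw

# Draw `reps` samples of size n and compute each sample's mean
sample_means = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

print(round(sample_means.mean(), 1))  # close to 250: mean of the sampling distribution
print(round(sample_means.std(), 2))   # close to 10: the standard error, sigma / sqrt(n)
```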
If he obtains X̄A, he rejects the hypothesis as untenable, for X̄A is quite unlike the sample means that would be expected if the hypothesis were true. That is, the probability is exceedingly low that he would obtain a mean as deviant as X̄A due to random sampling variation alone, given μ = 250. It's possible, mind you, but not very likely. On the other hand, Dr. Meyer retains the hypothesis as a reasonable statement if he obtains X̄B, for X̄B is consistent with what would be expected if the hypothesis were true. That is, there is sufficient probability that X̄B could occur by chance alone if, in the population, μ = 250.

The logic above may strike you as being a bit backward. This is because statistical hypothesis testing is a process of indirect proof. To test his hypothesis, Dr. Meyer first assumes it to be true. Then he follows the logical implications of this assumption to determine, through the appropriate sampling distribution, all possible sample results that would be expected under this assumption. Finally, he notes whether his actual sample result is contrary to what would be expected. If it is contrary, the hypothesis is rejected as untenable. If the result is not contrary to what would be expected, the hypothesis is retained as reasonably possible.

You may be wondering what Dr. Meyer's decision would be were his sample mean to fall somewhere between X̄A and X̄B. Just how rare must the sample value be to trigger rejection of the hypothesis? How does one decide? As you will soon learn, there are established criteria for making such decisions.

With this general overview of Dr. Meyer's problem, we now present a more detailed account of statistical hypothesis testing.

11.3 The Statistical Hypotheses: H0 and H1

In Step 1 of Section 11.2, Dr. Meyer formulated the hypothesis: The mean of the population of achievement scores for homeschooled fourth graders is equal to 250. This is called the null hypothesis and is written in symbolic form, H0: μ = 250.

The null hypothesis, H0, plays a central role in statistical hypothesis testing: It is the hypothesis that is assumed to be true and formally tested, it is the hypothesis that determines the sampling distribution to be employed, and it is the hypothesis about which the final decision to "reject" or "retain" is made.

A second hypothesis is formulated at this point: the alternative hypothesis, H1.

The alternative hypothesis, H1, specifies the alternative population condition that is "supported" or "asserted" upon rejection of H0. H1 typically reflects the underlying research hypothesis of the investigator.

In the present case, the alternative hypothesis specifies a population condition other than μ = 250. H1 can take one of two general forms. If Dr. Meyer goes into his investigation without a clear sense of what to expect if H0 is false, then he is interested in knowing that the actual population value is either higher or lower than 250. He is just as open to the possibility that mean achievement among homeschoolers is above 250 as he is to the possibility that it is below 250. In this case he would specify a nondirectional alternative hypothesis: H1: μ ≠ 250.

In contrast, Dr. Meyer would state a directional alternative hypothesis if his interest lay primarily in one direction. Perhaps he firmly believes, based on pedagogical theory and prior research, that the more personalized and intensive nature of homeschooling will, if anything, promote academic achievement. In this case, he would hypothesize the actual population value to be greater than 250 if the null hypothesis is false.
Here, the alternative hypothesis would take the form H1: μ > 250. If, on the other hand, he posited that the population value was less than 250, then the form of the alternative hypothesis would be H1: μ < 250.

You see, then, that there are three specific alternative hypotheses from which to choose in the present case:

H1: μ ≠ 250 (nondirectional)
H1: μ < 250 (directional)
H1: μ > 250 (directional)

Let's assume that Dr. Meyer has no compelling basis for stating a directional alternative hypothesis. Thus, his two statistical hypotheses are:

H0: μ = 250
H1: μ ≠ 250

Notice that both H0 and H1 are statements about populations and parameters, not samples and statistics. That is, both statistical hypotheses specify the population parameter μ, rather than the sample statistic X̄. Furthermore, both hypotheses are formulated before the data are examined. We will further explore the nature of H0 and H1 in later sections of this chapter.

11.4 The Test Statistic z

Having stated his null and alternative hypotheses (and collected his data), Dr. Meyer calculates the mean achievement score from his sample of 25 homeschoolers, which he finds to be X̄ = 272. How likely is this sample mean, if in fact the population mean is 250? In theoretical terms, if repeated samples of n = 25 were randomly selected from a population where μ = 250, what proportion of sample means would be as deviant from 250 as 272?

To answer this question, Dr. Meyer determines the relative position of his sample mean among all possible sample means that would obtain if H0 were true. He knows that the theoretical sampling distribution has as its mean the value hypothesized under the null hypothesis: 250 (see Figure 11.1). And from his knowledge that σ = 50, he easily determines the standard error of the mean, σX̄, for this sampling distribution:

σX̄ = σ/√n = 50/√25 = 50/5 = 10

Now Dr. Meyer converts his sample mean of 272 to a z score using Formula (10.3). Within the context of testing statistical hypotheses, the z score is called a test statistic: It is the statistic used for testing H0. The general structure of the z-score formula has not changed from the last time you saw it, although we now replace μ with μ0 to represent the value of μ that is specified in the null hypothesis:

The test statistic z:   z = (X̄ − μ0)/σX̄   (11.1)

In the present case,

z = (X̄ − μ0)/σX̄ = (272 − 250)/10 = 22/10 = +2.20

The numerator of this ratio, 22, indicates that the sample mean of 272 is 22 points higher than the population mean under the null hypothesis (μ0 = 250). When divided by the denominator, 10, this 22-point difference is equivalent to 2.20 standard errors, which is the value of the z statistic, or z ratio. Because it involves data from a single sample, we call this test the one-sample z test.

Equipped with this z ratio, Dr. Meyer now locates the relative position of his sample mean in the sampling distribution. Using familiar logic, he then assesses the probability associated with this value of z, as described in the next section.
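A minimal sketch of the computation in Formula (11.1), assuming only the values given in the example (μ0 = 250, σ = 50, n = 25, X̄ = 272):

```python
import math

mu_0 = 250    # value of mu specified in H0
sigma = 50    # known population standard deviation
n = 25        # sample size
x_bar = 272   # obtained sample mean

se = sigma / math.sqrt(n)   # standard error of the mean: 50/5 = 10
z = (x_bar - mu_0) / se     # test statistic: 22/10 = +2.20
print(z)                    # 2.2
```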
11.5 The Probability of the Test Statistic: The p Value

Let's return to the central question: How likely is a sample mean of 272, given a population where μ = 250? More specifically, what is the probability of selecting from this population a random sample for which the mean is as deviant as 272?

From Table A (Appendix C), Dr. Meyer determines that .0139 of the area under the normal curve falls beyond z = 2.20, the value of the test statistic for X̄ = 272. This is shown by the shaded area to the right in Figure 11.2. Is .0139 the probability value he seeks? Not quite.

Recall that Dr. Meyer has formulated a nondirectional alternative hypothesis, because he is equally interested in either possible result: that is, whether the population mean for homeschoolers is above or below the stated value of 250. Even though the actual sample mean will fall on only one side of the sampling distribution (it certainly can't fall on both sides at once!), the language of the probability question nonetheless must honor the nondirectional nature of Dr. Meyer's H1. (Remember: H1 was formulated before data collection.) This question concerns the probability of selecting a sample mean as deviant as 272. Because a mean of 228 (z = −2.20) is just as deviant as 272 (z = +2.20), Dr. Meyer uses the OR/addition rule and obtains a two-tailed probability value (see Figure 11.2). This is said to be a two-tailed test. He combines the probability associated with z = +2.20 (shaded area to the right) with the probability associated with z = −2.20 (shaded area to the left) to obtain the exact probability, or p value, for his outcome: p = .0139 + .0139 = .0278. (In practice, you simply double the tabled value found in Table A.)

A p value is the probability, if H0 is true, of observing a sample result as deviant as the result actually obtained (in the direction specified in H1).

A p value, then, is a measure of how rare the sample results would be if H0 were true. The probability is p = .0278 that Dr. Meyer would obtain a mean as deviant as 272, if in fact μ = 250.

Figure 11.2 Location of Dr. Meyer's sample mean (X̄ = 272, z = +2.20) in the sampling distribution under the null hypothesis (μ0 = 250, σX̄ = 10), with an area of .0139 beyond each of z = +2.20 and z = −2.20 (X̄ = 228).
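The tabled areas can also be obtained from the standard normal distribution directly. A short sketch, assuming SciPy is available (Table A itself is not reproduced here):

```python
from scipy.stats import norm

z = 2.20

one_tail = 1 - norm.cdf(z)   # area beyond z = +2.20: about .0139
p_value = 2 * one_tail       # two-tailed p value: about .0278
print(round(one_tail, 4), round(p_value, 4))
```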
11.6 The Decision Criterion: Level of Significance (α)

Now that Dr. Meyer knows the probability associated with his outcome, what is his decision regarding H0? Clearly, a sample mean as deviant as the one he obtained is not very likely under the null hypothesis (μ = 250). Indeed, over an infinite number of random samples from a population where μ = 250, fewer than 3% (.0278) of the sample means would deviate this much (or more) from 250. Wouldn't this suggest that H0 is false?

To make a decision about H0, Dr. Meyer needs an established criterion. Most educational researchers reject H0 when p ≤ .05 (although you often will encounter the lower value .01, and sometimes even .001). Such a decision criterion is called the level of significance, and its symbol is the Greek letter α (alpha).

The level of significance, α, specifies how rare the sample result must be in order to reject H0 as untenable. It is a probability (typically .05, .01, or .001) based on the assumption that H0 is true.

Let's suppose that Dr. Meyer adopts the .05 level of significance (i.e., α = .05). He will reject the null hypothesis that μ = 250 if his sample mean is so far above or below 250 that it falls among the most unlikely 5% of all possible sample means. We illustrate this in Figure 11.3, where the total shaded area in the tails represents the 5% of sample means least likely to occur if H0 is true. The .05 is split evenly between the two tails (2.5% on each side) because of the nondirectional, two-tailed nature of H1. The regions defined by the shaded tails are called regions of rejection, for if the sample mean falls in either, H0 is rejected as untenable. They also are known as critical regions. The critical values of z separate the regions of rejection from the middle region of retention.

Figure 11.3 Regions of rejection for a two-tailed test (α = .05), marked off by the critical values z.05 = ±1.96. Dr. Meyer's sample mean (X̄ = 272) falls in the critical region (+2.20 > +1.96); H0 is rejected and H1 is asserted.

In Chapter 10 (see Section 10.8), you learned that the middle 95% of all possible sample means in a sampling distribution fall between z = ±1.96. This also is illustrated in Figure 11.3, where you see that z = −1.96 marks the beginning of the lower critical region (beyond which 2.5% of the area falls) and, symmetrically, z = +1.96 marks the beginning of the upper critical region (with 2.5% of the area falling beyond). Thus, the two-tailed critical values of z, where α = .05, are z.05 = ±1.96. We attach the subscript ".05" to z, signifying that it is the critical value of z (α = .05), not the value of z calculated from the data (which we leave unadorned).

Dr. Meyer's test statistic, z = +2.20, falls beyond the upper critical value (i.e., +2.20 > +1.96) and thus in a region of rejection, as shown in Figure 11.3. This indicates that the probability associated with his sample mean is less than α, the level of significance. He therefore rejects H0: μ = 250 as untenable. Although it is possible that this sample of homeschoolers comes from a population where μ = 250, it is so unlikely (p = .0278) that Dr. Meyer dismisses the proposition as unreasonable. If his calculated z ratio had been a negative 2.20, he would have arrived at the same conclusion (and obtained the same p value). In that case, however, the z ratio would fall in the lower rejection region (i.e., −2.20 < −1.96).

Notice, then, that there are two ways to evaluate the tenability of H0. You can compare the p value to α (in this case, .0278 < .05), or you can compare the calculated z ratio to its critical value (+2.20 > +1.96). Either way, the same conclusion will be reached regarding H0. This is because both p (i.e., area) and the calculated z reflect the location of the sample mean relative to the region of rejection. The decision rules for a two-tailed test are shown in Table 11.1.

Table 11.1 Decision Rules for a Two-Tailed Test

                 Reject H0                     Retain H0
In terms of p:   if p ≤ α                      if p > α
In terms of z:   if z ≤ −zα or z ≥ +zα         if −zα < z < +zα
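The two equivalent rules in Table 11.1 are easy to express in code. A sketch using the example's values, with SciPy supplying the critical value (variable names are ours; both rules must agree, as the text notes):

```python
from scipy.stats import norm

alpha = 0.05
z = 2.20
p_value = 2 * (1 - norm.cdf(abs(z)))   # two-tailed p: about .0278

z_crit = norm.ppf(1 - alpha / 2)       # critical value: 1.96

reject_by_p = p_value <= alpha         # True: .0278 <= .05
reject_by_z = abs(z) >= z_crit         # True: 2.20 >= 1.96
print(reject_by_p, reject_by_z)        # True True (same decision either way)
```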
The exact probabilities for statistical tests that you will learn about in later chapters cannot be easily determined from hand calculations. With most tests in this book, you therefore will rely on the comparison of calculated and critical values of the test statistic for making decisions about H0.

Back to Dr. Meyer. The rejection of H0 implies support for H1: μ ≠ 250. He won't necessarily stop with the conclusion that the mean achievement for the population of homeschooled fourth graders is some value "other than" 250. For if 250 is so far below his obtained sample mean of 272 as to be an untenable value for μ, then any value below 250 is even more untenable. Thus, he will follow common practice and conclude that μ must be above 250. How far above 250, he cannot say. (You will learn in the next chapter how to make more informative statements about where μ probably lies.)

In Table 11.2, we summarize the statistical hypothesis testing process that Dr. Meyer followed. We encourage you to review this table before proceeding.

Table 11.2 Summary of the Statistical Hypothesis Testing Conducted by Dr. Meyer

Step 1. Specify H0 and H1, and set the level of significance (α).
  • H0: μ = 250
  • H1: μ ≠ 250
  • α = .05 (two-tailed)

Step 2. Select the sample, calculate the necessary sample statistics.
  • Sample mean: X̄ = 272
  • Standard error of the mean: σX̄ = σ/√n = 50/√25 = 50/5 = 10
  • Test statistic: z = (X̄ − μ0)/σX̄ = (272 − 250)/10 = 22/10 = +2.20

Step 3. Determine the probability of z under the null hypothesis. The two-tailed probability is p = .0139 + .0139 = .0278, which is less than .05 (i.e., p < α). Of course, the obtained z ratio also exceeds the critical z value (i.e., +2.20 > +1.96) and therefore falls in the rejection region.

Step 4. Make the decision regarding H0. Because the calculated z ratio falls in the rejection region (p ≤ α), H0 is rejected and H1 is asserted.
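The four steps in Table 11.2 chain together naturally as a single routine. A sketch of the whole procedure in one function (the function name and return format are ours, not the book's):

```python
import math
from scipy.stats import norm

def one_sample_z_test(x_bar, mu_0, sigma, n, alpha=0.05):
    """Two-tailed one-sample z test, following the steps in Table 11.2."""
    se = sigma / math.sqrt(n)                                # Step 2: standard error
    z = (x_bar - mu_0) / se                                  # Step 2: test statistic
    p = 2 * (1 - norm.cdf(abs(z)))                           # Step 3: two-tailed p value
    decision = "reject H0" if p <= alpha else "retain H0"    # Step 4: decision
    return z, p, decision

print(one_sample_z_test(272, 250, 50, 25))   # (2.2, 0.0278..., 'reject H0')
```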
11.7 The Level of Significance and Decision Error

You have just seen that the decision to reject or retain H0 depends on the announced level of significance, α, and that .05 and .01 are common values in this regard. In one sense these values are arbitrary, but in another they are not. The level of significance, α, is a statement of risk: the risk the researcher is willing to assume in making a decision about H0.

Look at Figure 11.4, which shows how a two-tailed test would be conducted where α = .05. When H0 is true (μ0 = μtrue), 5% of all possible sample means nevertheless will lead to the conclusion that H0 is false. This is necessarily so, for 5% of the sample means fall in the "rejection" region of the sampling distribution, even though these extreme means will occur (though rarely) when H0 is true. Thus, when you adopt α = .05, you really are saying that you will accept a probability of .05 that H0 will be rejected when it is actually true. Rejecting a true H0 is a decision error, and, barring divine revelation, you have no idea when such an error occurs.

Figure 11.4 Two-tailed test (α = .05): when H0 is true (μ0 = μtrue), the 5% of sample z ratios falling in the rejection regions (.025 in each tail) lead incorrectly to the rejection of H0 (Type I error).

The level of significance, α, gives the probability of rejecting H0 when it is actually true.

Rejecting H0 when it is true is known as a Type I error. Stated less elegantly, a Type I error is getting statistically significant results "when you shouldn't." To reduce the risk of making such an error, the researcher can set α at a lower level. Suppose you set it very low, say at α = .0001. Now suppose you obtain a sample result so deviant that its probability of occurrence is only p = .002. According to your criterion, this value is not rare enough to cause you to reject H0 (i.e., .002 > .0001). Consequently, you retain H0, even though common sense tells you that it probably is false. Lowering α, then, increases the likelihood of making another kind of error: retaining H0 when it is false. Not surprisingly, this is known as a Type II error:

A Type II error is committed when a false H0 is retained.

We illustrate the notion of a Type II error in Figure 11.5. Imagine that your null hypothesis, H0: μ = 150, is tested against a two-tailed alternative with α = .05. You draw a sample and obtain a mean of 152. Now it may be that, unbeknown to you, the true mean for this population is 154. In Figure 11.5, the distribution drawn with the solid line is the sampling distribution under the null hypothesis, the one that describes the situation that would exist if H0 were true (μ0 = 150). The true distribution, known only to powers above, is drawn with a dashed line and centers on 154, the true population mean (μtrue = 154). To test your hypothesis that μ = 150, you evaluate the sample mean of 152 according to its position in the sampling distribution shown by the solid line. Relative to that distribution, it is not so deviant (from μ0 = 150) as to call for the rejection of H0. Your decision therefore is to retain the null hypothesis, H0: μ = 150. It is, of course, an erroneous decision: a Type II error.
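You can watch a Type I error rate emerge empirically. The following simulation (our illustration, assuming NumPy and an arbitrary seed) repeatedly tests a true H0: μ = 250 at α = .05; about 5% of the replications reject it, just as Figure 11.4 suggests.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

mu_0, sigma, n = 250, 50, 25
reps = 100_000

# Sample repeatedly from a population where H0 is actually true
means = rng.normal(mu_0, sigma, size=(reps, n)).mean(axis=1)
z = (means - mu_0) / (sigma / np.sqrt(n))

type_i_rate = np.mean(np.abs(z) >= 1.96)   # proportion of (wrong) rejections
print(type_i_rate)                          # close to alpha = .05
```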
APPENDIX C STATISTICAL TABLES

Table C.5 Critical Values of r
This table gives the critical values of Pearson r at the .05, .025, .01, and .005 levels of significance for a one-tailed test (equivalently, the .10, .05, .02, and .01 levels for a two-tailed test), for df ranging from 1 to 1000. For example, with df = 10 the two-tailed critical values are .497 (.10), .576 (.05), .658 (.02), and .708 (.01).
Source: © 1963 R. A. Fisher and F. Yates; reprinted by permission of Pearson Education Limited.

Table C.6 The χ² Statistic
The first column identifies the specific χ² distribution according to its number of degrees of freedom (1 through 30, and selected values from 40 through 100). Other columns give the proportion of the area under the entire curve that falls above the tabled value of χ² (upper-tail areas of .10, .05, .025, .01, and .005). For example, with df = 1 the tabled values are 2.71 (.10), 3.84 (.05), 5.02 (.025), and 6.63 (.01).
Source: Biometrika Tables for Statisticians, by E. Pearson and H. Hartley (© 1976 by the Oxford University Press; adapted by permission of the Oxford University Press on behalf of the Biometrika Trust).

EPILOGUE
A Note on (Almost) Assumption-Free Tests

The inferential statistical tests that we have considered in this text are known as parametric tests. They involve hypotheses about population parameters (e.g., μ, ρ) and/or require assumptions about the population distributions. Regarding the latter, for example, the t test for independent samples assumes population normality and homogeneity of variance, as does the F test. Fortunately, as you have learned, these tests can be quite robust: that is, substantial departure from the assumed conditions may not seriously invalidate the test when sample size is moderate to large. However, a problem can arise when the distributional assumptions are seriously violated and sample size is small.

The good news is that there are alternative statistical procedures that carry less restrictive assumptions regarding the population distributions. These procedures have been called "distribution-free" tests, although we prefer the more descriptive term "assumption-free." (We inserted the word "almost" in the epilogue title to emphasize that you are never freed completely from underlying assumptions when carrying out a statistical test; it is just that with some tests the assumptions are less restrictive.)
Such procedures also are known more generally as nonparametric tests.¹ With respect to their logic and required calculations, nonparametric tests are much "friendlier" than their parametric counterparts. But there is a price for everything. In this case, it is that nonparametric tests are somewhat less sensitive, or statistically powerful, than the equivalent parametric test when the assumptions for the latter are fully met. That is, nonparametric tests are less likely to result in statistical significance when the null hypothesis is false. However, nonparametric procedures are more powerful when the parametric assumptions cannot be satisfied.

Four commonly used nonparametric procedures are Spearman's rank correlation (analogous to Pearson r), Mann-Whitney U (t test for independent samples), the sign test (t test for dependent samples), and the Kruskal-Wallis test (one-way ANOVA). Each of these nonparametric tests may be given special consideration when (a) the data as gathered are in the form of ranks or (b) the distributional assumptions required for parametric tests are untenable and sample size is small. There are entire volumes devoted to nonparametric procedures (e.g., Daniel, 1990; Marascuilo & McSweeney, 1977; Siegel & Castellan, 1988). If you find that your own work takes you in the nonparametric direction, you should consult this literature for a full treatment of the associated logic and calculations of the test you are considering.

¹ Although technically not synonymous, the terms assumption-free, distribution-free, and nonparametric tend to be used interchangeably.
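SciPy happens to provide ready-made versions of most of the procedures named above (the sign test itself can be built from a binomial test; the closely related Wilcoxon signed-rank test is shown here instead). A brief sketch with made-up, purely illustrative data:

```python
from scipy.stats import spearmanr, mannwhitneyu, wilcoxon, kruskal

x = [12, 15, 9, 20, 14, 17, 11, 18]
y = [31, 40, 28, 46, 33, 42, 30, 45]
g1, g2, g3 = [5, 7, 6, 9], [8, 11, 10, 12], [4, 6, 5, 7]

print(spearmanr(x, y))        # Spearman rank correlation (vs. Pearson r)
print(mannwhitneyu(g1, g2))   # Mann-Whitney U (vs. independent-samples t test)
print(wilcoxon(x, y))         # signed-rank test for paired (dependent) data
print(kruskal(g1, g2, g3))    # Kruskal-Wallis (vs. one-way ANOVA)
```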
GLOSSARY

absolute zero: The value of 0 reflects the absence of the characteristic being measured, as in a ratio scale of measurement. (Compare with arbitrary zero.)
accessible population: When a convenience sample has been selected, the corresponding population about which conclusions can be justifiably drawn.
algebraic property (of the mean): In short, Σ(X − X̄) = 0. That is, deviations involving scores above the mean are balanced equally by deviations involving scores below the mean.
alpha (α): Symbolizes the level of significance.
alternative hypothesis (H1): Specifies the alternative population condition that is "supported" or "asserted" upon rejection of the null hypothesis (H0). H1 typically represents the underlying research hypothesis of the investigator.
analysis of variance (ANOVA): (See one-way analysis of variance.)
AND/multiplication rule: The rule stating that the probability of the joint occurrence of one event AND another AND another AND ... is obtained by multiplying their separate probabilities, provided the events are independent.
ANOVA: Analysis of variance.
ANOVA summary table: A table summarizing the results of an analysis of variance. Typically includes the sources of variation, sums of squares, degrees of freedom, variance estimates, calculated F ratio, and p value.
arbitrary zero: The value of 0 is arbitrarily set, as in an interval scale of measurement. (Compare with absolute zero.)
bar chart: A graph showing the distribution of frequencies for a qualitative variable.
between-groups variation: In ANOVA, the variability among the means of two or more groups; reflects inherent variation plus any differential treatment effect.
bimodal distribution: In the idealized form, a perfectly symmetrical distribution having two modes.
bivariate: Involving two variables.
box plot: A graph that simultaneously conveys information about a variable's central tendency, variability, and shape.
central limit theorem: Theorem that the sampling distribution of means tends toward a normal shape as the sample size increases, regardless of the shape of the population distribution from which the samples have been randomly selected.
central tendency: Measures of central tendency communicate the "typical" score in a distribution. Common measures are the mean, median, and mode.
chi-square (χ²) test: A test statistic appropriate for data expressed as frequencies, where the fundamental comparison is between the observed frequencies and the frequencies one would expect if the null hypothesis were true.
class intervals: The grouping of individual scores into intervals of scores, as reported in a grouped-data frequency distribution.
coefficient of determination (r²): The proportion of common variance in X and Y; an effect size. The coefficient of nondetermination is equal to 1 − r².
common variance: Variance that is shared by two variables.
confidence interval: A range of values within which it can be stated with reasonable confidence (e.g., 95%) the population parameter lies.
confidence level: The degree of confidence associated with an interval estimate (usually 95% or 99%).
confidence limits: The upper and lower values of a confidence interval.
contingency table: A bivariate frequency distribution with rows representing the categories of one variable and columns representing the categories of the second variable.
convenience sample: A sample chosen based on ease of accessibility.
correlation coefficient: A measure of the degree of linear association between two quantitative variables.
covariance: Measures the magnitude and direction of linear association between two quantitative variables. Because the covariance is dependent on the scales of X and Y, it is insufficient as an index of association.
critical region(s): The area corresponding to the region(s) of rejection.
critical value(s): Value(s) appropriate to the test statistic used that mark off the region(s) of rejection.
crosstabulation: (See contingency table.)
cumulative percentage: The percentage of cases falling below the upper exact limit of a class interval.
curvilinear relationship: Where a curved line best represents the pattern of data points in a scatterplot.
data point: A pair of X and Y scores in a scatterplot.
decision error: Either rejecting a null hypothesis when it is true (Type I error) or retaining a null hypothesis when it is false (Type II error).
degrees of freedom (df): The number of independent pieces of information a sample of observations can provide for purposes of statistical inference.
dependent samples: Samples in which there is some way to connect the groups; they are not independent of one another (e.g., matched pairs, repeated measures on the same individuals).
dependent variable: In regression, the variable designated Y and assumed to be influenced or temporally preceded by X (the independent variable).
derived score: (See standard score.)
descriptive statistics: Procedures and statistics that organize, summarize, and simplify the data so they are more readily comprehended. Conclusions from descriptive statistics are limited to the people (or objects) on whom (or on which) the data were collected.
deviation score: X − X̄.
directional alternative hypothesis: An alternative hypothesis that states a specific direction of a hypothesized difference (e.g., μ > 500) rather than an inequality (e.g., μ ≠ 500); calls for a one-tailed probability.
effect size: A general term for a statistic that communicates the magnitude of a research finding rather than its statistical significance. Effect size can pertain to a bivariate relationship (r²) or differences between two or more means (e.g., d, ω̂²).
exact probability: (See p value.)
expected frequencies: In a chi-square analysis, the frequencies one would expect if the null hypothesis were true.
experimental control: (See randomization.)
F ratio: Test statistic used in one-way ANOVA, representing the ratio of between-groups to within-groups variation.
family of distributions: When there are multiple sampling distributions for a test statistic, depending on the respective degrees of freedom. The sampling distributions of t, F, and χ² each entail a family of distributions, whereas there is a single sampling distribution of z.
frequency: The number of occurrences of an observation; also called absolute frequency.
frequency distribution: The display of unique observations in a set of data and the frequencies associated with each.
grand mean: The mean of two or more means, weighted by the n of each group.
heteroscedasticity: In a scatterplot, when the spread of Y values is markedly different across values of X. (Also see homoscedasticity.)
histogram: A graph showing the distribution of frequencies for a quantitative variable.
homogeneity of variance: The condition where population variances are equal: σ₁² = σ₂² = ... = σk². The independent-samples t test and ANOVA both require the assumption of homogeneity of variance.
homoscedasticity: In a scatterplot, when the spread of Y values is similar across values of X. Within the more specific context of regression, when the spread of Y scores about Y′ is similar for all values of Y′.
independent samples: Samples in which none of the observations in one group is in any way related to observations in the other groups.
independent variable: In simple regression, the variable designated X and assumed to influence or temporally precede Y (the dependent variable).
indirect proof: The nature of statistical hypothesis testing, which starts with the assumption that the null hypothesis is true, and then examines the sample results to determine whether they are inconsistent with this assumption.
inferential statistics: Statistics that permit conclusions about a population, based on the characteristics of a sample drawn from the population.
inherent variation: (See within-groups variation.)
intercept: In a regression equation, the intercept (symbolized by a) is the predicted value of Y where X = 0.
interval estimate: (See confidence interval.)
interval midpoint: The midpoint of a class interval.
interval scale: The scale's values have equal intervals of values (e.g., Celsius or Fahrenheit thermometer) and an arbitrary zero.
interval width: The number of score values in a class interval.
J-curve: An extreme negatively skewed distribution.
least-squares criterion: In fitting a straight line to a bivariate distribution, the condition that Σ(Y − Y′)² is minimized (as is the case with the regression equation, Y′ = a + bX).
level of significance: A decision criterion that specifies how rare the sample result must be in order to reject H0 as untenable (typically .05, .01, or .001); denoted as alpha (α).
line of best fit: (See regression line.)
linear relationship: Where a straight line best represents the pattern of data points in a scatterplot.
matched-subjects design: A research design where the investigator matches research participants on some characteristic prior to randomization.
mean: The sum of all scores in a distribution divided by the number of scores. The mean ("average" to the layperson) is the algebraic balance point in a distribution of scores. (Also see algebraic property.)
measurement: The process of assigning numbers to the characteristics under study.
median: The middle score in an ordered distribution, so that an equal number of scores falls below and above it. The median corresponds to the 50th percentile.
mode: The score that occurs with the greatest frequency.
negative association (negative correlation): As values of X increase, values of Y tend to decrease.
negative skew: A skewed distribution where the elongated tail is to the left.
nominal scale: The scale's values merely name the category to which the object under study belongs (e.g., 1 = "male," 2 = "female"). Qualitative, or categorical, variables have a nominal scale of measurement.
nondirectional alternative hypothesis: An alternative hypothesis that simply states an inequality (e.g., μ ≠ 500) rather than a specific direction of a hypothesized difference (e.g., μ > 500); calls for a two-tailed probability.
nonparametric tests: Statistical procedures that carry less restrictive assumptions regarding the population distributions; sometimes called assumption-free or distribution-free tests.
normal curve (normal distribution): In the idealized form, a perfectly symmetrical, bell-shaped curve. The normal curve characterizes the distributions of many physical, psychoeducational, and psychomotor variables. Many statistical tests assume a normal distribution.
null hypothesis (H0): The hypothesis that is assumed to be true and formally tested, the hypothesis that determines the sampling distribution to be employed, and the hypothesis about which the final decision to "reject" or "retain" is made.
observed frequencies: In a chi-square analysis, the actual frequencies recorded ("observed") by the investigator.
omnibus F test: In ANOVA, the F test of the null hypothesis, μ1 = μ2 = ... = μk.
one-sample t test: Statistical test to evaluate the null hypothesis for the mean of a single sample when the population standard deviation is unknown.
one-sample z test: The statistical test for the mean of a single sample when the population standard deviation is known.
one-tailed probability: Determining probability from only one side of the probability distribution; appropriate for a directional alternative hypothesis.
one-tailed test: A statistical test calling for a one-tailed probability; appropriate for a directional alternative hypothesis.
one-way analysis of variance (one-way ANOVA): Statistical analysis for comparing the means of two or more groups.
OR/addition rule: The rule stating that the probability of occurrence of either one event OR another OR another OR ... is obtained by adding their individual probabilities, provided the events are mutually exclusive.
ordinal scale: The scale's values can be ordered, reflecting differing degrees or amounts of the characteristic under study (e.g., class rank).
outlier: A data point in a scatterplot that stands apart from the pack.
p value: The probability, if H0 is true, of observing a sample result as deviant as the result actually obtained (in the direction specified in H1).
parameter: Summarizes a characteristic of a population.
parametric tests: Statistical tests pertaining to hypotheses about population parameters (e.g., μ, ρ) and/or requiring assumptions about the population distributions.
Pearson r: Measures the magnitude and direction of linear association between two quantitative variables. Pearson r is independent of the scales of X and Y, and it can be no greater than ±1.0.
percentage: A proportion multiplied by 100 (e.g., .15 × 100 = 15%).
percentile rank: The percentage of cases falling below a given score point.
point estimate: When a sample statistic (e.g., X̄) is used to estimate the corresponding parameter in the population (e.g., μ).
pooled variance estimate: Combining ("pooling") sample variances into a single variance for significance testing of differences among means.
population: The complete set of observations or measurements about which conclusions are to be drawn.
positive association (positive correlation): As values of X increase, values of Y tend to increase as well.
positive skew: A skewed distribution where the elongated tail is to the right.
post hoc comparisons: In ANOVA, significance testing involving all possible pairs of sample means (e.g., Tukey's HSD Test).
post hoc fallacy: The logical fallacy that if X and Y are correlated, and X temporally precedes Y, then X must be a cause of Y.
predicted score: The score, Y′, determined from the regression equation for an individual case.
prediction error: The difference between a predicted score and the actual observation.
probability distribution: Any relative frequency distribution. The ability to make statistical inferences is based on knowledge of the probability distribution appropriate to the situation. (Also see sampling distribution.)
probability theory: A framework for studying chance and its effects.
proportion: The quotient obtained when the amount of a part is divided by the amount of the whole. Proportions are always positive and preceded by a decimal point, as in ".15".
proportional area: The area under a frequency curve corresponding to one or more class intervals (or between any two score points in an ungrouped distribution).
qualitative variable: A variable whose values differ in kind rather than amount (e.g., categories of marital status); also called categorical variable. Such variables have a nominal scale of measurement.
quantitative variable: A variable whose values differ in amount or quantity (e.g., test scores). Quantitative variables have either an ordinal, interval, or ratio scale of measurement.
quartile: The 25th, 50th, and 75th percentiles in a distribution of scores. Quartiles are denoted by the symbols Q1, Q2, and Q3.
random sample: A sample so chosen that each possible sample of the specified size (n) has an equal probability of selection.
randomization: A method for randomly assigning an available pool of research participants to two or more groups, thus allowing chance to determine who is included in what group. Randomization provides experimental control over extraneous factors that otherwise can bias results.
range: The difference between the highest and the lowest scores in a distribution.
ratio scale: The scale's values possess the properties of an interval scale, except zero is absolute.
region of retention: Area of a sampling distribution that falls outside the region(s) of rejection. The null hypothesis is retained when the calculated value of the test statistic falls in the region of retention.
region(s) of rejection: Area in the tail(s) of a sampling distribution, established by the critical value(s) appropriate to the test statistic used. The null hypothesis is rejected when the calculated value of the test statistic falls in the region(s) of rejection.
regression equation: The equation of a best-fitting straight line that allows one to predict the value of Y from a value of X: Y′ = a + bX.
regression line: The mathematically best-fitting straight line for the data points in a scatterplot. (See least-squares criterion, regression equation.)
regression toward the mean: When r < 1.00, the predicted value Y′ will be closer to Ȳ than the corresponding value of X is to X̄.
relative frequency: The conversion of an absolute frequency to a proportion (or percentage) of the total number of cases.
repeated-measures design: A research design involving observations collected over time on the same individuals (as in testing participants before and after an intervention).
restriction of range: Limited variation in X and/or Y that therefore reduces the correlation between X and Y.
sample: A part or subset of a population.
sampling: The selection of individual observations from the corresponding population.
sampling distribution: The theoretical frequency distribution of a statistic obtained from an unlimited number of independent samples, each consisting of a sample of size n randomly selected from the population.
sampling variation: Variation in any statistic from sample to sample due to chance factors inherent in forming samples.
scales of measurement: A scheme for classifying variables as having nominal, ordinal, interval, or ratio scales.
scatterplot: A graph illustrating the association between two quantitative variables. A scatterplot comprises a collection of paired X and Y scores plotted along a two-dimensional grid to reveal the nature of the bivariate association.
score limits: The highest and lowest possible scores in a class interval.
significance testing: Making statistical inferences by testing formal statistical hypotheses about population parameters based on sample statistics.
simple random sampling: (See random sample.)
skewed distribution: The bulk of scores favor one side of the distribution or the other, thus producing a distribution having an elongated tail in one direction or the other.
slope: Symbolized by b, slope reflects the angle (flat, shallow, steep) and direction (positive or negative) of the regression line. For each unit increase in X, Y′ changes b units.
standard deviation: The square root of the variance.
standard error of estimate: Reflects the dispersion of data points about the regression line. Stated more technically, it is the standard deviation of Y scores about Y′.
standard error of r: The standard deviation in a sampling distribution of r.
standard error of the difference between means: The standard deviation in a sampling distribution of the difference between means.
standard error of the mean: The standard deviation in a sampling distribution of means.
standard normal distribution: A normal distribution having a mean of 0 and a standard deviation of 1.
standard score: Expresses a score's position relative to the mean, using the standard deviation as the unit of measurement. T scores and z scores are examples of standard scores.
standardized score: (See standard score.)
statistic: Summarizes a characteristic of a sample.
statistical conclusion: The researcher's conclusion expressed in the language of statistics and statistical inference. For example, "On a test of conceptual understanding, the mean score of students who were told to generate their own examples of concepts was statistically significantly higher than the mean score of students who were not (p < .05)." A statistical conclusion follows a statistical question and leads to a substantive conclusion.
statistical hypothesis testing: (See significance testing.)
statistical inference: Based on statistical theory and associated procedures, drawing conclusions about a population from data collected on a sample taken from that population.
statistical power: The probability, given that H0 is false, of obtaining sample results that will lead to the rejection of H0.
statistical question: The researcher's question expressed in the language of statistics and statistical inference. For example, "On a test of conceptual understanding, is there a statistically significant difference (α = .05) between the mean score of students who were told to generate their own examples of concepts and the mean score of students who were not?" A statistical question derives from a substantive question and leads to a statistical conclusion.
statistical significance: When sample results lead to the rejection of the null hypothesis, that is, when p ≤ α.
Student's t: (See t ratio.)
Studentized range statistic: In Tukey's HSD Test, a test statistic (q) that is used for calculating the HSD critical value.
substantive conclusion: The researcher's conclusion that is rooted in the substance of the matter under study (e.g., "Generating one's own examples of a concept improves conceptual understanding"). A substantive conclusion derives from a statistical conclusion and answers the substantive question.
substantive question: The researcher's question that is rooted in the substance of the matter under study. For example, "Does generating one's own examples of a concept improve conceptual understanding?" A substantive question leads to a statistical question.
sum of squares: The sum of the squared deviation scores; serves as the numerator for the variance (and standard deviation) and serves prominently in the analysis of variance.
systematic sampling: Selecting every nth person (or object) from an ordered list of the population; not a truly random sample.
t ratio: Test statistic used for testing a null hypothesis involving a mean or mean difference when the population standard deviation is unknown; also used for testing null hypotheses regarding correlation and regression coefficients.
T score: A standard score having a mean of 50 and a standard deviation of 10.
t statistic: (See t ratio.)
test statistic: The statistical test used for evaluating H0 (e.g., z, t, F, χ²).
trend graph: A graph in which the horizontal axis is a unit of time (e.g., 2006, 2007, etc.) and the vertical axis is some statistic (e.g., percentage of unemployed workers).
Tukey's HSD Test: In ANOVA, the "honestly significant difference" post hoc test for evaluating all possible mean differences.
two-tailed probability: Determining probability from both sides of the probability distribution; appropriate for a nondirectional alternative hypothesis.
two-tailed test: A statistical test calling for a two-tailed probability; appropriate for a nondirectional alternative hypothesis.
Type I error: Rejecting a null hypothesis when it is true. Alpha, α, gives the probability of a Type I error.
Type II error: Retaining a null hypothesis when it is false. The probability of a Type II error is β; statistical power is equal to 1 − β.
univariate: Involving a single variable.
variability: The amount of spread, or dispersion, of scores in a distribution. Common measures of variability are the range, variance, and standard deviation.
variable: A characteristic of a person, place, or thing.
variance: A measure of variation that involves every score in the distribution. Stated more technically, it is the mean of the squared deviation scores.
within-groups variation: In ANOVA, the variation of individual observations about their sample mean; reflects inherent variation.
z ratio: (See one-sample z test.)
z score: A standard score having a mean of 0 and a standard deviation of 1.
χ² goodness-of-fit test: Chi-square test involving frequencies on a single variable.
χ² test of independence: Chi-square test involving frequencies on two variables simultaneously.

REFERENCES

Abelson, R. P. (1995). Statistics as principled argument. Hillsdale, NJ: Erlbaum.
Acton, F. S. (1959). Analysis of straight-line data. New York: Wiley.
American Educational Research Association. (2006, June). Standards for reporting empirical social science research in AERA publications. Washington, DC: Author. (Available online at http://www.aera.net/)
Babbie, E. R. (1995). The practice of social research (7th ed.). Belmont, CA: Wadsworth.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Cronbach, L. J., Linn, R. L., Brennan, R. L., & Haertel, E. H. (1997). Generalizability analysis for performance assessments of student achievement or school effectiveness. Educational & Psychological Measurement, 57(3), 373–399.
Daniel, W. W. (1990). Applied nonparametric statistics (2nd ed.). Boston, MA: PWS-KENT.
Gaito, J. (1980). Measurement scales and statistics: Resurgence of an old misconception. Psychological Bulletin, 87, 564–567.
Galton, F. (1889). Natural inheritance. London: Macmillan.
Glass, G. V., & Hopkins, K. D. (1996). Statistical methods in education and psychology (3rd ed.). Boston, MA: Allyn & Bacon.
Gould, S. J. (1996). Full house: The spread of excellence from Plato to Darwin. New York: Harmony Books.
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. New York: Academic Press.
Huck, S. W. (2009). Statistical misconceptions. New York: Routledge.
Huff, D. (1954). How to lie with statistics. New York: Norton.
Imrey, H. H. (1983). Smoking cigarettes: A risk factor for sexual activity among adolescent girls. Journal of Irreproducible Results, 28(4), 11.
King, B. M., & Minium, E. W. (2003). Statistical reasoning in psychology and education (4th ed.). New York: Wiley.
Kirk, R. E. (1982). Experimental design: Procedures for the behavioral sciences (2nd ed.). Monterey, CA: Brooks/Cole.
Kirk, R. E. (1990). Statistics: An introduction (3rd ed.). Fort Worth, TX: Holt, Rinehart & Winston.
Miller, M. D., Linn, R. L., & Gronlund, N. E. (2009). Measurement and assessment in teaching (10th ed.). Upper Saddle River, NJ: Merrill.
Marascuilo, L. A., & McSweeney, M. (1977). Nonparametric and distribution-free methods for the social sciences. Monterey, CA: Brooks/Cole.
Mlodinow, L. (2008). The drunkard's walk: How randomness rules our lives. New York: Vintage Books.
Paulos, J. A. (1988). Innumeracy: Mathematical illiteracy and its consequences. New York: Vintage Books.
Scherer, M. (2001). Improving the quality of the teaching force: A conversation with David C. Berliner. Educational Leadership, 58(8).
Siegel, S., & Castellan, N. J. (1988). Nonparametric statistics for the behavioral sciences (2nd ed.). New York: McGraw-Hill.
Stigler, S. M. (1986). The history of statistics: The measurement of uncertainty before 1900. Cambridge, MA: The Belknap Press of Harvard University Press.
Tankard, J. W. (1984). The statistical pioneers. Cambridge, MA: Schenkman.
Tufte, E. R. (2001). The visual display of quantitative information (2nd ed.). Cheshire, CT: Graphics Press.
Wilkinson, L., and Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594–604.
Winer, B. J., Brown, D. R., & Michels, K. M. (1991). Statistical principles in experimental design (3rd ed.). New York: McGraw-Hill.

USEFUL FORMULAS

Percentile rank (ungrouped frequency distribution), Formula (2.1):
P = [(f/2 + Cum. f (below)) / n] × 100

Arithmetic mean, Formula (4.1):
X̄ = ΣX / n

Grand mean, Formula (4.2):
X̄ = (n₁X̄₁ + n₂X̄₂) / (n₁ + n₂)

Variance (descriptive statistic), Formula (5.1):
S² = Σ(X − X̄)² / n = SS / n

Standard deviation (descriptive statistic), Formula (5.2):
S = √(Σ(X − X̄)² / n) = √(SS / n)

z score, Formula (6.1):
z = (X − X̄) / S

T score, Formula (6.2):
T = 50 + 10z

Covariance, Formula (7.1):
Cov = Σ(X − X̄)(Y − Ȳ) / n

Pearson r (defining formula), Formula (7.2):
r = Cov / (SX SY)

Regression equation (expanded raw-score formula), Formula (8.4):
Y′ = [Ȳ − r(SY/SX)X̄] + [r(SY/SX)]X, where the first bracketed term is the intercept and the second is the slope

Regression equation (z-score form), Formula (8.5):
zY′ = r zX

Standard error of estimate, Formula (8.7):
SY·X = √(Σ(Y − Y′)² / n)

Standard error of estimate (alternate formula), Formula (8.8):
SY·X = SY √(1 − r²)

Standard error of the mean, Formula (10.2):
σX̄ = σ / √n

One-sample z test, Formula (11.1):
z = (X̄ − μ0) / σX̄

General rule for a confidence interval for μ (σ known), Formula (12.3):
X̄ ± zα σX̄

Standard deviation (inferential statistic), Formula (13.1):
s = √(Σ(X − X̄)² / (n − 1)) = √(SS / (n − 1))

Standard error of the mean (estimated), Formula (13.2):
sX̄ = s / √n

One-sample t test, Formula (13.3):
t = (X̄ − μ0) / sX̄

General rule for a confidence interval for μ (σ not known), Formula (13.4):
X̄ ± tα sX̄

Pooled variance estimate of σ₁² and σ₂², Formula (14.4):
s²pooled = (SS₁ + SS₂) / (n₁ + n₂ − 2)

Estimate of the standard error of the difference between means, Formula (14.5):
s(X̄₁−X̄₂) = √[ ((SS₁ + SS₂)/(n₁ + n₂ − 2)) (1/n₁ + 1/n₂) ]

t test for two independent samples, Formula (14.6):
t = (X̄₁ − X̄₂) / s(X̄₁−X̄₂)

General rule for a confidence interval for μ₁ − μ₂, Formula (14.7):
(X̄₁ − X̄₂) ± tα s(X̄₁−X̄₂)

Effect size d, Formula (14.8):
d = (X̄₁ − X̄₂) / spooled = (X̄₁ − X̄₂) / √[(SS₁ + SS₂)/(n₁ + n₂ − 2)]
Standard error of the difference between means (dependent samples): $s_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{s_1^2 + s_2^2 - 2r_{12}s_1 s_2}{n}}$ — Formula (15.1)
Standard error of $\bar{D}$ (direct-difference method): $s_{\bar{D}} = \sqrt{\frac{SS_D}{n(n-1)}}$ — Formula (15.4)
t test for two dependent samples (direct-difference method): $t = \frac{\bar{D}}{s_{\bar{D}}}$ — Formula (15.5)
General rule for a confidence interval for $\mu_D$: $\bar{D} \pm t_\alpha s_{\bar{D}}$
Within-groups sum of squares: $SS_{\text{within}} = \sum_{\text{all scores}}(X - \bar{X}_{\text{group}})^2$ — Formula (16.1)
Between-groups sum of squares: $SS_{\text{between}} = \sum_{\text{all scores}}(\bar{X}_{\text{group}} - \bar{X}_{\text{grand}})^2$ — Formula (16.3)
Within-groups variance estimate: $s^2_{\text{within}} = \frac{SS_{\text{within}}}{n_{\text{total}} - k}$ — Formula (16.8)
Between-groups variance estimate: $s^2_{\text{between}} = \frac{SS_{\text{between}}}{k - 1}$ — Formula (16.9)
F-ratio for one-way analysis of variance: $F = \frac{s^2_{\text{between}}}{s^2_{\text{within}}}$ — Formula (16.10)
Critical HSD for Tukey's test: $HSD = q\sqrt{\frac{s^2_{\text{within}}}{n_{\text{group}}}}$ — Formula (16.11)
General rule for a confidence interval for $\mu_i - \mu_j$: $(\bar{X}_i - \bar{X}_j) \pm HSD$ — Formula (16.13)
Effect size $\hat{\omega}^2$ (one-way analysis of variance): $\hat{\omega}^2 = \frac{SS_{\text{between}} - (k-1)s^2_{\text{within}}}{SS_{\text{total}} + s^2_{\text{within}}}$ — Formula (16.14)
Standard error of b: $s_b = \sqrt{\frac{\Sigma(Y-Y')^2/(n-2)}{\Sigma(X-\bar{X})^2}}$ — (p. 351)
Standard error of r (ρ = 0): $s_r = \sqrt{\frac{1-r^2}{n-2}}$ — Formula (17.2)
t ratio for r: $t = \frac{r}{s_r}$ — Formula (17.3)
t ratio for b: $t = \frac{b}{s_b}$ — Formula (17.4)
Chi-square: $\chi^2 = \Sigma\left[\frac{(f_o - f_e)^2}{f_e}\right]$ — Formula (18.1)
General rule for a confidence interval for π: $\pi_L = \left(\frac{n}{n+3.84}\right)\left[P + \frac{1.92}{n} - 1.96\sqrt{\frac{P(1-P)}{n} + \frac{0.9604}{n^2}}\right]$ and $\pi_U = \left(\frac{n}{n+3.84}\right)\left[P + \frac{1.92}{n} + 1.96\sqrt{\frac{P(1-P)}{n} + \frac{0.9604}{n^2}}\right]$ — Formulas (18.3) and (18.4)
Chi-square for a 2 × 2 table: $\chi^2 = \frac{n(AD - BC)^2}{(A+B)(C+D)(A+C)(B+D)}$ — Formula (18.17)
Population effect size (mean difference): $d = \frac{\mu_1 - \mu_2}{\sigma}$ — Formula (19.1)
[...]
statistic of +2.20, which led to the rejection of H0 (Section 11.6). Now compare his decision about H0 to the 95% confidence interval, 272 ± 19.6 (Section 12.3). Notice that the resulting interval, 252.4 to 291.6, does not include the value specified in H0 (μ = 250).
[Figure: the 95% confidence interval, X̄ ± 1.96σX̄ = 272 ± (1.96)(10) = 272 ± 19.6, or 252.4 to 291.6, shown alongside the two-tailed test of H0: μ0 = 250 (σX̄ = 10, critical value z.05 = −1.96, obtained X̄ = 272, z = +2.20, reject H0)]
... homeschooled fourth graders in his state is between 252 and 292 (i.e., 272 ± 20 points), just as the pollster might state that between 52% and 58% of all voters prefer Candidate X (i.e., 55% ± 3 percentage points). Of course, both Dr. Meyer and the pollster could be wrong in supposing that the parameters they seek lie within the reported intervals. Other things being equal, if wide limits ...
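The agreement between the two-tailed test and the confidence interval can be verified directly from the numbers in the passage above (X̄ = 272, μ0 = 250, σ = 50, n = 25). This is a minimal Python sketch of that check; the variable names are our own:

```python
import math

# Dr. Meyer's data, as given in the text
x_bar, mu_0, sigma, n = 272, 250, 50, 25

se = sigma / math.sqrt(n)       # standard error of the mean: 50/5 = 10
z = (x_bar - mu_0) / se         # one-sample z test: (272 - 250)/10 = +2.20

# 95% confidence interval for mu: X-bar +/- 1.96 * (standard error)
lower, upper = x_bar - 1.96 * se, x_bar + 1.96 * se   # 252.4 to 291.6

reject = abs(z) > 1.96          # two-tailed test, alpha = .05
print(f"z = {z:+.2f}  ->  {'reject' if reject else 'retain'} H0")
print(f"95% CI: {lower:.1f} to {upper:.1f}")
print("mu_0 inside CI:", lower <= mu_0 <= upper)      # False whenever H0 is rejected
```

Rejecting H0: μ = 250 at α = .05 and finding 250 outside the 95% interval are two expressions of the same result.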
[Figure 11.6 Regions of rejection for a two-tailed test (α = .01): the sampling distribution of means (n = 25) is centered on μ0 = 250 with σX̄ = 10; the critical values z.01 = −2.58 and +2.58 cut off rejection regions of area .005 in each tail]
Dr. Meyer's sample mean (X̄ = 272) falls in the region of retention (+2.20 < 2.58); H0 is retained. What is learned when H0 is retained? ...
[Figure 12.4 Hypothesis testing and interval estimation: The null hypothesis, H0: μ = 255, is retained (α = .05, two-tailed), and the value specified in H0 falls within the 95% confidence interval for μ. Interval estimation panel: X̄ ± 1.96σX̄ = 272 ± (1.96)(10) = 272 ± 19.6, or 252.4 to 291.6. Hypothesis testing (two-tailed) panel: μ0 = 255, σX̄ = 10, critical values z.05 = ±1.96, obtained X̄ = 272, z = +1.70, retain H0]
Naturally ... technically had in mind. Opinion polls offer the most familiar example of a point estimate. When, on the eve of a presidential election, you hear on CNN that 55% of voters prefer Candidate X (based on a random sample of likely voters), you have been given a point estimate of voter preference in the population. In terms of Dr. Meyer's undertaking, his sample mean of X̄ = 272 is a point estimate of μ—his single best ... result in a Type I error? (Hint: Sketch the sampling distribution and put in the regions of rejection.)
CHAPTER 12 Estimation
12.1 Hypothesis Testing Versus Estimation
Statistical inference is the process of making inferences from random samples to populations. In educational research, the dominant approach to statistical inference traditionally has been hypothesis testing, which we introduced in the ... as follows:
Step 1 σX̄ is determined: σX̄ = σ/√n = 50/5 = 10 (remember, n = 25 and σ = 50).
Step 2 X̄ and σX̄ are entered in Formula (12.1): X̄ ± 1.96σX̄ = 272 ± (1.96)(10) = 272 ± 19.6.
Step 3 The interval limits are identified: 252.4 (lower limit) and 291.6 (upper limit).
Dr. Meyer therefore is 95% confident that μ lies in the interval 272 ± 19.6, or between 252.4 and 291.6. He knows that if he selected ... mean achievement score of homeschooled fourth graders in his state falls in the interval X̄ ± 2.58σX̄ = 272 ± (2.58)(10) = 272 ± 25.8, or between 246.2 and 297.8. Notice that this interval is considerably wider than his 95% confidence interval. In short, with greater confidence comes a wider interval. This stands to reason, for a wider interval includes more candidates for μ. So, of course, Dr. Meyer is more ... = 20; 95% of all sample means fall in the interval μ ± 1.96σX̄ ... correct (those falling in the nonshaded area), and for 5% it would not (those falling in the shaded area). We illustrate this in Figure 12.2, which displays the interval, X̄ ± 1.96σX̄, for each of 20 random samples (n = 100) from the population on which Figure 12.1 is based. With σ = 20, σX̄ = σ/√n = 20/10 = 2.0, which results in the interval X̄ ± 1.96(2.0), or X̄ ± 3.92. For example, the mean of the first sample is X̄1 = 102, for which the interval is 102 ± 3.92, or 98.08 to 105.92. Notice that although the 20 sample means in Figure 12.2 vary about the population mean (μ = 100)—some means below, some above—μ falls within the interval
[Figure 12.2: the intervals X̄ ± 1.96σX̄ for each of 20 random samples (n = 100), plotted about a vertical line at μ = 100]
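The repeated-sampling idea behind Figure 12.2 is straightforward to simulate. The sketch below is ours, not the authors': it assumes a normal population with μ = 100 and σ = 20, draws 20 random samples of n = 100, and checks how many of the resulting intervals X̄ ± 3.92 capture μ (the random seed is an arbitrary choice):

```python
import random

random.seed(1)                  # arbitrary seed, for reproducibility
mu, sigma, n = 100, 20, 100     # population and sample size from Figure 12.2
se = sigma / n ** 0.5           # sigma / sqrt(n) = 20/10 = 2.0
half_width = 1.96 * se          # 1.96 * 2.0 = 3.92

captures = 0
for i in range(20):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    lower, upper = x_bar - half_width, x_bar + half_width
    hit = lower <= mu <= upper
    captures += hit
    print(f"sample {i + 1:2d}: {lower:6.2f} to {upper:6.2f}  {'captures mu' if hit else 'misses mu'}")

print(f"{captures} of 20 intervals capture mu = {mu}")
```

Across many runs, about 95% of such intervals capture μ, though any particular set of 20 may contain a miss or two.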