(BQ) Part 2 book Business statistics has contents: Inference for regression, understanding residuals, multiple regression, building multiple regression models, time series analysis.
CHAPTER 15 • Inference for Counts: Chi-Square Tests

SAC Capital

Hedge funds, like mutual funds and pension funds, pool investors’ money in an attempt to make profits. Unlike these other funds, however, hedge funds are not required to register with the U.S. Securities and Exchange Commission (SEC) because they issue securities in “private offerings” only to “qualified investors” (investors with either $1 million in assets or annual income of at least $200,000). Hedge funds don’t necessarily “hedge” their investments against market moves. But typically these funds use multiple, often complex, strategies to exploit inefficiencies in the market. For these reasons, hedge fund managers have the reputation for being obsessive traders.

One of the most successful hedge funds is SAC Capital, which was founded by Steven (Stevie) A. Cohen in 1992 with nine employees and $25 million in assets under management (AUM). SAC Capital returned annual gains of 40% or more through much of the 1990s and is now reported to have more than 800 employees and nearly $14 billion in assets under management. According to Forbes, Cohen’s $6.4 billion fortune ranks him as the 36th wealthiest American. Cohen, a legendary figure on Wall Street, is known for taking advantage of any information he can find and for turning that information into profit. SAC Capital is one of the most active trading organizations in the world. According to Business Week (7/21/2003), Cohen’s firm “routinely accounts for as much as 3% of the NYSE’s average daily trading, plus up to 1% of the NASDAQ’s, a total of at least 20 million shares a day.”

In a business as competitive as hedge fund management, information is gold. Being the first to have information and knowing how to act on it can mean the difference between success and failure. Hedge fund managers look for small advantages everywhere, hoping to exploit inefficiencies in the market and to turn those inefficiencies into profit. Wall
Street has plenty of “wisdom” about market patterns. For example, investors are advised to watch for “calendar effects,” certain times of year or days of the week that are particularly good or bad: “As goes January, so goes the year” and “Sell in May and go away.” Some analysts claim that the “bad period” for holding stocks is from the sixth trading day of June to the fifth-to-last trading day of October. Of course, there is also Mark Twain’s advice:

October. This is one of the peculiarly dangerous months to speculate in stocks. The others are July, January, September, April, November, May, March, June, December, August, and February.
(From Pudd’nhead Wilson’s Calendar)

One common claim is that stocks show a weekly pattern. For example, some argue that there is a weekend effect in which stock returns on Mondays are often lower than those of the immediately preceding Friday. Are patterns such as this real? We have the data, so we can check. Between October 1, 1928 and June 6, 2007, there were 19,755 trading sessions. Let’s first see how many trading days fell on each day of the week. It’s not exactly 20% for each day because of holidays. The distribution of days is shown in Table 15.1.

Day of Week    Count    % of days
Monday         3820     19.3369%
Tuesday        4002     20.2582
Wednesday      4024     20.3695
Thursday       3963     20.0607
Friday         3946     19.9747

Table 15.1 The distribution of days of the week among the 19,755 trading days from October 1, 1928 to June 6, 2007. We expect about 20% to fall in each day, with minor variations due to holidays and other events.

Of these 19,755 trading sessions, 10,272, or about 52% of the days, saw a gain in the Dow Jones Industrial Average (DJIA). To test for a pattern, we need a model. The model comes from the supposition that any day is as likely to show a gain as any other. In any sample of positive or “up” days, we should expect to see the same distribution of days as in Table 15.1; in other words, about 19.34% of “up” days would be Mondays, 20.26% would be
Tuesdays, and so on. Here is the distribution of days in one such random sample of 1000 “up” days.

Day of Week    Count    % of days in the sample of “up” days
Monday         192      19.2%
Tuesday        189      18.9
Wednesday      202      20.2
Thursday       199      19.9
Friday         218      21.8

Table 15.2 The distribution of days of the week for a sample of 1000 “up” trading days selected at random from October 1, 1928 to June 6, 2007. If there is no pattern, we would expect the proportions here to match fairly closely the proportions observed among all trading days in Table 15.1.

Of course, we expect some variation. We wouldn’t expect the proportions of days in the two tables to match exactly. In our sample, the percentage of Mondays in Table 15.2 is slightly lower than in Table 15.1, and the proportion of Fridays is a little higher. Are these deviations enough for us to declare that there is a recognizable pattern?

15.1 Goodness-of-Fit Tests

To address this question, we test the table’s goodness-of-fit, where fit refers to the null model proposed. Here, the null model is that there is no pattern, that the distribution of up days should be the same as the distribution of trading days overall. (If there were no holidays or other closings, that would just be 20% for each day of the week.)
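The null model described here can be written down directly from the two tables. The following is a minimal sketch, not part of the original text: it turns the Table 15.1 counts into null proportions and sets them beside the sample proportions from Table 15.2. The variable names are illustrative assumptions.

```python
# Counts of trading days by weekday, copied from Table 15.1.
trading_day_counts = {
    "Monday": 3820, "Tuesday": 4002, "Wednesday": 4024,
    "Thursday": 3963, "Friday": 3946,
}
# Counts in the random sample of 1000 "up" days, copied from Table 15.2.
up_day_counts = {
    "Monday": 192, "Tuesday": 189, "Wednesday": 202,
    "Thursday": 199, "Friday": 218,
}

total_days = sum(trading_day_counts.values())   # all trading sessions
total_up = sum(up_day_counts.values())          # sampled "up" days

# Null model: each weekday's share of all trading days.
null_proportions = {d: n / total_days for d, n in trading_day_counts.items()}
# Observed shares in the sample, for side-by-side comparison.
sample_proportions = {d: n / total_up for d, n in up_day_counts.items()}

for day in trading_day_counts:
    print(f"{day:<10} null {null_proportions[day]:.4%}   "
          f"sample {sample_proportions[day]:.1%}")
```

The goodness-of-fit question is whether the gaps between the two columns are larger than sampling variability alone would produce.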
Assumptions and Conditions

Data for a goodness-of-fit test are organized in tables, and the assumptions and conditions reflect that. Rather than having an observation for each individual, we typically work with summary counts in categories. Here, the individuals are trading days, but rather than list all 1000 trading days in the sample, we have totals for each weekday.

Counted Data Condition. The data must be counts for the categories of a categorical variable. This might seem a silly condition to check. But many kinds of values can be assigned to categories, and it is unfortunately common to find the methods of this chapter applied incorrectly (even by business professionals) to proportions or quantities just because they happen to be organized in a two-way table. So check to be sure that you really have counts.

Independence Assumption

The counts in the cells should be independent of each other. You should think about whether that’s reasonable. If the data are a random sample you can simply check the randomization condition.

Randomization Condition. The individuals counted in the table should be a random sample from some population. We need this condition if we want to generalize our conclusions to that population. We took a random sample of 1000 trading days on which the DJIA rose. That lets us assume that the market’s performance on any one day is independent of performance on another. If we had selected 1000 consecutive trading days, there would be a risk that market performance on one day could affect performance on the next, or that an external event could affect performance for several consecutive days.

Expected Cell Frequencies

Companies often want to assess the relative successes of their products in different regions. However, a company whose sales regions had 100, 200, 300, and 400 representatives might not expect equal sales in all regions. They might expect observed sales to be proportional
to the size of the sales force. The null hypothesis in that case would be that the proportions of sales were 1/10, 2/10, 3/10, and 4/10, respectively. With 500 total sales, their expected counts would be 50, 100, 150, and 200.

Notation Alert! We compare the counts observed in each cell with the counts we expect to find. The usual notation uses Obs and Exp as we’ve used here. The expected counts are found from the null model.

Sample Size Assumption

We must have enough data for the methods to work. We usually just check the following condition:

Expected Cell Frequency Condition. We should expect to see at least 5 individuals in each cell. The expected cell frequency condition should remind you of, and is in fact quite similar to, the condition that np and nq be at least 10 when we test proportions.

Chi-Square Model

We have observed a count in each category (weekday). We can compute the number of up days we’d expect to see for each weekday if the null model were true. For the trading days example, the expected counts come from the null hypothesis that the up days are distributed among weekdays just as trading days are. Of course, we could imagine almost any kind of model and base a null hypothesis on that model. To decide whether the null model is plausible, we look at the differences between the expected values from the model and the counts we observe. We wonder: Are these differences so large that they call the model into question, or could they have arisen from natural sampling variability?
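As a small illustration of how expected counts follow from a null model, the sketch below reproduces the sales-force example from the text (100, 200, 300, and 400 representatives, 500 total sales) and checks the Expected Cell Frequency Condition. The function name `expected_counts` is a hypothetical helper, not something defined in the book.

```python
from fractions import Fraction

def expected_counts(null_proportions, total):
    """Expected count per cell = total observations x hypothesized proportion."""
    return [float(p * total) for p in null_proportions]

# Sales-force example from the text: representatives per region.
reps = [100, 200, 300, 400]
proportions = [Fraction(r, sum(reps)) for r in reps]   # 1/10, 2/10, 3/10, 4/10
expected = expected_counts(proportions, total=500)      # 500 total sales

# Expected Cell Frequency Condition: every expected count should be at least 5.
condition_met = all(e >= 5 for e in expected)

print(expected)
print("Expected Cell Frequency Condition met:", condition_met)
```

Exact fractions are used for the proportions only to avoid rounding noise in such a small example; ordinary floats would serve equally well.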
We denote the differences between these observed and expected counts (Obs - Exp). As we did with variance, we square them. That gives us positive values and focuses attention on any cells with large differences. Because the differences between observed and expected counts generally get larger the more data we have, we also need to get an idea of the relative sizes of the differences. To do that, we divide each squared difference by the expected count for that cell. The test statistic, called the chi-square (or chi-squared) statistic, is found by summing, over all cells, the squared deviations between the observed and expected counts, each divided by its expected count:

$$\chi^2 = \sum_{\text{all cells}} \frac{(Obs - Exp)^2}{Exp}$$

Notation Alert! The only use of the Greek letter χ in Statistics is to represent the chi-square statistic and the associated sampling distribution. This violates the general rule that Greek letters represent population parameters. Here we are using a Greek letter simply to name a family of distribution models and a statistic.

The chi-square statistic is denoted χ², where χ is the Greek letter chi (pronounced ki). The resulting family of sampling distribution models is called the chi-square models.

The members of this family of models differ in the number of degrees of freedom. The number of degrees of freedom for a goodness-of-fit test is k - 1, where k is the number of cells (in this example, weekdays).

We will use the chi-square statistic only for testing hypotheses, not for constructing confidence intervals. A small chi-square statistic means that our model fits the data well, so a small value gives us no reason to doubt the null hypothesis. If the observed counts don’t match the expected counts, the statistic will be large. If the calculated statistic value is large enough, we’ll reject the null hypothesis. So the chi-square test is always one-sided. What could be simpler?
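The definition above translates almost line for line into code. The following is a minimal sketch, with a hypothetical function name `chi_square_statistic`, applied to the trading-days counts from Tables 15.1 and 15.2; it computes only the statistic and its degrees of freedom.

```python
def chi_square_statistic(observed, expected):
    """Sum of (Obs - Exp)^2 / Exp over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have one entry per cell")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

all_days = [3820, 4002, 4024, 3963, 3946]   # Table 15.1, Monday..Friday
up_days = [192, 189, 202, 199, 218]          # Table 15.2, Monday..Friday

n_up = sum(up_days)
n_all = sum(all_days)
# Expected counts under the null model: sample size x share of all trading days.
expected = [n_up * c / n_all for c in all_days]

chi2 = chi_square_statistic(up_days, expected)
df = len(up_days) - 1   # k - 1, with k = number of cells

print(f"chi-square = {chi2:.3f} with {df} degrees of freedom")
```

Dividing each squared difference by its own expected count is what makes cells with very different sizes comparable within one sum.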
Let’s see how it works.

Goodness-of-fit test

Atara manages call center operators at a telecommunications company. To develop new business, she gives each operator a list of randomly selected phone numbers of rival phone company customers. She also provides the operators with a script that tries to convince the customers to switch providers. Atara notices that some operators have found more than twice as many new customers as others, so she suspects that some of the operators are performing better than others. The 120 new customer acquisitions are distributed as follows:

Operator         1    2    3    4    5    6    7    8
New customers   11   17    9   12   19   18   13   21

Question: Is there evidence to suggest that some of the operators are more successful than others?

Answer: Atara has randomized the potential new customers to the operators, so the Randomization Condition is satisfied. The data are counts and there are at least 5 in each cell, so we can apply a chi-square goodness-of-fit test to the null hypothesis that the operator performance is uniform and that each of the operators will convince the same number of customers. Specifically, we expect each operator to have converted 1/8 of the 120 customers that switched providers.

Operator   Observed   Expected   Obs - Exp   (Obs - Exp)²   (Obs - Exp)²/Exp
1          11         15         -4          16             16/15 = 1.07
2          17         15          2           4              4/15 = 0.27
3           9         15         -6          36             36/15 = 2.40
4          12         15         -3           9              9/15 = 0.60
5          19         15          4          16             16/15 = 1.07
6          18         15          3           9              9/15 = 0.60
7          13         15         -2           4              4/15 = 0.27
8          21         15          6          36             36/15 = 2.40

$$\sum \frac{(Obs - Exp)^2}{Exp} = 1.07 + 0.27 + 2.40 + \cdots + 2.40 = 8.67$$

The number of degrees of freedom is k - 1 = 7.

$$P(\chi^2_7 > 8.67) = 0.2772$$

8.67 is not a surprising value for a chi-square statistic with 7 degrees of freedom. So, we fail to reject the null hypothesis that the operators find new customers at the same rate; there is no evidence that some operators are more successful than others.

The chi-square calculation

Here are the steps to calculate the chi-square statistic:

Find the expected values. These come from the null hypothesis model. Every null model gives a hypothesized proportion for each cell. The expected value is the
product of the total number of observations times this proportion. (The result need not be an integer.)

Compute the residuals. Once you have expected values for each cell, find the residuals, Obs - Exp.

Square the residuals, (Obs - Exp)².

Compute the components. Find (Obs - Exp)²/Exp for each cell.

Find the sum of the components. That’s the chi-square statistic,

$$\chi^2 = \sum_{\text{all cells}} \frac{(Obs - Exp)^2}{Exp}$$

Find the degrees of freedom. It’s equal to the number of cells minus one.

Test the hypothesis. Large chi-square values mean lots of deviation from the hypothesized model, so they give small P-values. Look up the critical value from a table of chi-square values such as Table X in Appendix D, or use technology to find the P-value directly.

The steps of the chi-square calculations are often laid out in tables. Use one row for each category, and columns for observed counts, expected counts, residuals, squared residuals, and the contributions to the chi-square total.

Table 15.3 Calculations for the chi-square statistic in the trading days example can be performed conveniently in Excel. Set up the calculation in the first row and Fill Down, then find the sum of the rightmost column. The CHIDIST function looks up the chi-square total to find the P-value.

Stock Market Patterns

We have counts of the “up” days for each day of the week. The economic theory we want to investigate is whether there is a pattern in “up” days. So, our null hypothesis is that across all days in which the DJIA rose, the days of the week are distributed as they are across all trading days. (As we saw, the trading days are not quite evenly distributed because of holidays, so we use the trading days percentages as the null model.)
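The calculation steps listed above can be sketched in code, using the call center example and the one-row-per-category layout the text recommends. This is an illustration under stated assumptions rather than the book's own method: the count of 9 for the third operator is inferred from the stated total of 120 and the listed components, and `chi2_upper_tail` is a hypothetical helper that evaluates the chi-square upper-tail probability for integer degrees of freedom from its closed-form series instead of a table lookup.

```python
import math

def chi2_upper_tail(x, df):
    """P(chi-square with integer df exceeds x), via the closed-form series."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum over i < df/2 of h^i / i!
        term = math.exp(-h)
        total = term
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return total
    # Odd df: erfc(sqrt(h)) + exp(-h) * sum_{i=1..(df-1)/2} h^(i-1/2) / Gamma(i+1/2)
    total = math.erfc(math.sqrt(h))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h) * h ** (i - 0.5) / math.gamma(i + 0.5)
    return total

# New customers per operator (operator 3's count of 9 is inferred, see above).
observed = [11, 17, 9, 12, 19, 18, 13, 21]
n = sum(observed)
# Find the expected values: a uniform null model gives each operator n/8.
expected = [n / len(observed)] * len(observed)

# Compute residuals, squared residuals, and components, one row per cell.
rows = []
for op, (obs, exp) in enumerate(zip(observed, expected), start=1):
    resid = obs - exp
    rows.append((op, obs, exp, resid, resid ** 2, resid ** 2 / exp))

print("Operator  Obs   Exp  Resid  Resid^2  Component")
for op, obs, exp, resid, sq, comp in rows:
    print(f"{op:>8} {obs:>4} {exp:>5.0f} {resid:>6.0f} {sq:>8.0f} {comp:>10.2f}")

# Sum the components, find the degrees of freedom, and test the hypothesis.
chi2 = sum(row[-1] for row in rows)
df = len(observed) - 1
p_value = chi2_upper_tail(chi2, df)
print(f"chi-square = {chi2:.2f}, df = {df}, P-value = {p_value:.4f}")
```

The P-value printed here uses the unrounded statistic, so it can differ in the fourth decimal place from the 0.2772 reported in the text, which is based on the rounded value 8.67.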
We refer to this as uniform, accounting for holidays. The alternative hypothesis is that the observed percentages are not uniform. The test statistic looks at how closely the observed data match this idealized situation.

PLAN

Setup: State what you want to know. Identify the variables and context.
We want to know whether the distribution for “up” days differs from the null model (the trading days distribution). We have the number of times each weekday appeared among a random sample of 1000 “up” days.

Hypotheses: State the null and alternative hypotheses. For χ² tests, it’s usually easier to state the hypotheses in words than in symbols.
H0: The days of the work week are distributed among the up days as they are among all trading days.
HA: The trading days model does not fit the up days distribution.

Model: Think about the assumptions and check the conditions.
✓ Counted Data Condition: We have counts of the days of the week for all trading days and for the “up” days.
✓ Independence Assumption: We have no reason to expect that one day’s performance will affect another’s, but to be safe we’ve taken a random sample of days. The randomization should make them far enough apart to alleviate any concerns about dependence.
✓ Randomization Condition: We have a random sample of 1000 days from the time period.
✓ Expected Cell Frequency Condition: All the expected cell frequencies are much larger than 5.

Specify the sampling distribution model. Name the test you will use.
The conditions are satisfied, so we’ll use a χ² model with 5 - 1 = 4 degrees of freedom and a chi-square goodness-of-fit test.

DO

Mechanics: To find the expected number of days, we take the fraction of each weekday from all days and multiply by the number of “up” days.

The expected values are:
Monday: 193.369
Tuesday: 202.582
Wednesday: 203.695
Thursday: 200.607
Friday: 199.747

For example, there were 3820 Mondays out of 19,755 trading days. So, we’d expect there would be 1000 × 3820/19,755, or 193.369, Mondays among the 1000
“up” days.

And we observe:
Monday: 192
Tuesday: 189
Wednesday: 202
Thursday: 199
Friday: 218

Each cell contributes a value equal to (Obs - Exp)²/Exp to the chi-square sum. Add up these components. If you do it by hand, it can be helpful to arrange the calculation in a table or spreadsheet.

$$\chi^2 = \frac{(192 - 193.369)^2}{193.369} + \cdots + \frac{(218 - 199.747)^2}{199.747} = 2.615$$

The P-value is the probability in the upper tail of the χ² model. It can be found using software or a table (see Table X in Appendix D). Using Table X in Appendix D, we find that for a significance level of 5% and 4 degrees of freedom, we’d need a value of 9.488 or more to have a P-value less than 0.05. Our value of 2.615 is less than that. Large χ² statistic values correspond to small P-values, which would lead us to reject the null hypothesis, but the value here is not particularly large. Using a computer to generate the P-value, we find:

$$\text{P-value} = P(\chi^2_4 > 2.615) = 0.624$$

REPORT

Conclusion: Link the P-value to your decision. Be sure to say more than a fact about the distribution of counts. State your conclusion in terms of what the data mean.

MEMO
Re: Stock Market Patterns
Our investigation of whether there are day-of-the-week patterns in the behavior of the DJIA, in which one day or another is more likely to be an “up” day, found no evidence of such a pattern. Our statistical test indicated that deviations at least as large as those found in our sample of trading days would happen by chance about 62% of the time if there were no pattern. We conclude that there is, unfortunately, no evidence of a pattern that could be used to guide investment in the market. We were unable to detect a “weekend” or other day-of-the-week effect in the market.

15.2 Interpreting Chi-Square Values

When we calculated χ² for the trading days example, we got 2.615. That value was not large for 4 degrees of freedom, so we were unable to reject the null hypothesis. In general, what is big for a χ² statistic?
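For the trading-days statistic reported above, the upper-tail probability can be checked without a table. The sketch below is an illustration only and relies on one assumption worth stating: for exactly 4 degrees of freedom the chi-square upper tail has the simple closed form exp(-x/2)(1 + x/2), so this helper (a hypothetical name) does not apply to other degrees of freedom.

```python
import math

def chi2_upper_tail_df4(x):
    """P(chi-square with 4 df exceeds x) = exp(-x/2) * (1 + x/2)."""
    if x <= 0:
        return 1.0
    return math.exp(-x / 2) * (1 + x / 2)

chi2 = 2.615            # statistic reported in the text
critical_5pct = 9.488   # 5% critical value for 4 df, as quoted from Table X

p_value = chi2_upper_tail_df4(chi2)
reject_at_5pct = chi2 >= critical_5pct
tail_at_critical = chi2_upper_tail_df4(critical_5pct)

print(f"P-value = {p_value:.3f}")
print("Reject H0 at the 5% level:", reject_at_5pct)
# Sanity check: the tail area beyond the critical value should be close to 0.05.
print(f"Tail area beyond 9.488 = {tail_at_critical:.4f}")
```

Comparing the statistic with the critical value and comparing the P-value with 0.05 are two views of the same decision, which is why both routes in the text lead to failing to reject.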
Think about how χ² is calculated. In every cell any deviation from the expected count contributes to the sum. Large deviations generally contribute more, but if there are a lot of cells, even small deviations can add up, making the χ² value larger. So the more cells there are, the higher the value of χ² has to be before it becomes significant. For χ², the decision about how big is big depends on the number of degrees of freedom.

Unlike the Normal and t families, χ² models are skewed. Curves in the χ² family change both shape and center as the number of degrees of freedom grows. For example, Figure 15.1 shows the χ² curves for 5 and for 9 degrees of freedom.

Figure 15.1 The χ² curves for 5 and 9 degrees of freedom.

Notice that the value χ² = 10 might seem somewhat extreme when there are 5 degrees of freedom, but appears to be rather ordinary for 9 degrees of freedom. Here are two simple facts to help you think about χ² models:

• The mode is at χ² = df - 2. (Look at the curves; their peaks are at 3 and 7.)
• The expected value (mean) of a χ² model is its number of degrees of freedom. That’s a bit to the right of the mode, as we would expect for a skewed distribution.

Goodness-of-fit tests are often performed by people who have a theory of what the proportions should be in each category and who believe their theory to be true. In some cases, unlike our market example, there isn’t an obvious null hypothesis against which to test the proposed model. So, unfortunately, in those cases, the only null hypothesis available is that the proposed theory is true. And as we know, the hypothesis testing procedure allows us only to reject the null or fail to reject it. We can never confirm that a theory is in fact true; we can never confirm the null hypothesis.

At best, we can point out that the data are consistent with the proposed theory. But this doesn’t prove the theory. The data could be consistent with the model even if the theory were wrong. In that case, we fail to reject the null hypothesis but can’t conclude anything for sure about whether the theory is true.

Why Can’t We Prove the Null?

A student claims that it really makes no difference to your starting salary how well you do in your Statistics class. He surveys recent graduates, categorizes them according to whether they earned an A, B, or C in Statistics, and according to whether their starting salary is above or below the median for their class. He calculates the proportion above the median salary for each grade. His null model is that in each grade category, 50% of students are above the median. With 40 respondents, he gets a P-value of 0.07 and declares that Statistics grades don’t matter. But then more questionnaires are returned, and he finds that with a sample size of 70, his P-value is 0.04. Can he ignore the second batch of data?
Of course not. If he could do that, he could claim almost any null model was true just by having too little data to refute it.

15.3 Examining the Residuals

Chi-square tests are always one-sided. The chi-square statistic is never negative, and a large value provides evidence against the null hypothesis (because it shows that the fit to the model is not good), while small values provide little evidence that the model doesn’t fit. In another sense, however, chi-square tests are really many-sided; a large statistic doesn’t tell us how the null model doesn’t fit. In our market theory example, if we had rejected the uniform model, we wouldn’t have known how it failed. Was it because there were not enough Mondays represented, or was it that all five days showed some deviation from the uniform?

When we reject a null hypothesis in a goodness-of-fit test, we can examine the residuals in each cell to learn more. In fact, whenever we reject a null hypothesis, it’s a good idea to examine the residuals. (We don’t need to do that when we fail to reject because when the χ² value is small, all of its components must have been small.)
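To make the idea of examining residuals concrete, here is a minimal sketch for the trading-days data. It computes each cell's residual (Obs - Exp) and also divides it by the square root of the expected count, which is the standardization this section goes on to describe. Variable names are illustrative assumptions.

```python
import math

days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
all_days = [3820, 4002, 4024, 3963, 3946]   # Table 15.1
up_days = [192, 189, 202, 199, 218]          # Table 15.2

n_up, n_all = sum(up_days), sum(all_days)
expected = [n_up * c / n_all for c in all_days]

# Raw residuals: observed minus expected, one per cell.
residuals = [o - e for o, e in zip(up_days, expected)]
# Standardized residuals: residual divided by sqrt(expected count).
standardized = [r / math.sqrt(e) for r, e in zip(residuals, expected)]

for day, r, z in zip(days, residuals, standardized):
    print(f"{day:<10} residual {r:>8.3f}   standardized {z:>7.4f}")

# Squaring the standardized residuals recovers the chi-square components.
chi2 = sum(z ** 2 for z in standardized)
print(f"sum of squared standardized residuals = {chi2:.3f}")
```

Because the expected counts add up to the observed total, the raw residuals sum to zero, so it is their individual sizes and signs that carry the information.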
Because we want to compare residuals for cells that may have very different counts, we standardize the residuals. We know the mean residual is zero,[1] but we need to know each residual’s standard deviation. When we tested proportions, we saw a link between the expected proportion and its standard deviation. For counts, there’s a similar link. To standardize a cell’s residual, we divide by the square root of its expected value:[2]

$$\frac{(Obs - Exp)}{\sqrt{Exp}}$$

Notice that these standardized residuals are the square roots of the components we calculated for each cell, with the plus (+) or the minus (-) sign indicating whether we observed more or fewer cases than we expected.

The standardized residuals give us a chance to think about the underlying patterns and to consider how the distribution differs from the model. Now that we’ve divided each residual by its standard deviation, they are z-scores. If the null hypothesis were true, we could even use the 68–95–99.7 Rule to judge how extraordinary the large ones are.

[1] Residual = observed - expected. Because the total of the expected values is the same as the observed total, the residuals must sum to zero.
[2] It can be shown mathematically that the square root of the expected value estimates the appropriate standard deviation.

Here are the standardized residuals for the trading days data:

Day          Standardized Residual = (Obs - Exp)/√Exp
Monday       -0.0984
Tuesday      -0.9542
Wednesday    -0.1188
Thursday     -0.1135
Friday        1.292

Table 15.4 Standardized residuals.

None of these values is remarkable. The largest, Friday, at 1.292, is not impressive when viewed as a z-score. The deviations are in the direction suggested by the “weekend effect,” but they aren’t quite large enough for us to conclude that they are real.

Examining residuals from a chi-square test

Question: In the call center example above, examine the residuals to see if any operators stand out as having especially strong or weak performance.

Answer:
Because we failed to reject the null hypothesis, we don’t expect any of the standardized residuals to be large, but we will examine them nonetheless. The standardized residuals are the square roots of the components computed in the call center example, each carrying the sign of Obs - Exp.

Operator                 1       2       3       4       5       6       7       8
Standardized residual   -1.03    0.52   -1.55   -0.77    1.03    0.77   -0.52    1.55

As we expected, none of the residuals are large. Even though Atara notices that some of the operators enrolled more than twice the number of new customers as others, the variation is typical (within two standard deviations) of what we would expect if all their performances were, in fact, equal.

15.4 The Chi-Square Test of Homogeneity

Skin care products are big business. According to the American Academy of Dermatology, “the average adult uses at least seven different products each day,” including moisturizers, skin cleansers, and hair cosmetics.[3] Growth in the skin care market in China during 2006 was 15%, fueled, in part, by massive economic growth. But not all cultures and markets are the same. Global companies must understand cultural differences in the importance of various skin care products in order to compete effectively.

The GfK Roper Reports® Worldwide Survey, which we first saw in Chapter 3, asked 30,000 consumers in 23 countries about their attitudes on health, beauty, and other personal values. One question participants were asked was how important is “Seeking the utmost attractive appearance” to you?
Responses were on a scale ranging from “Not at all important” to “Extremely important.” Is agreement with this

[3] www.aad.org/public/Publications/pamphlets/Cosmetics.htm
probabilities, 200–201, 203–204 Jones, Edward, 357 K KEEN Inc., 51–52, 72 Kellogg, John Harvey, 531–532 Kendall, Maurice, 821 Kendall’s tau, 821–822, 822FE Keno (game), 193 KomTek Technologies, 786, 786FE–789FE Kruskal-Wallis test, 813–814, 814FE–816FE L Ladder of Powers, 551–552, 552FE Lagged variables, 678 Lagging, 678 Laplace, Pierre-Simon, 286–287 Large Enough Sample Condition, 289 Law of Averages, 192–193 Law of Diminishing Returns, 290 Law of Large Numbers (LLN) defined, 192 Law of Averages and, 192–193 Least squares correlation and, 151–153 defined, 150 finding intercept, 150–151 finding slope, 151 Legendre’s solution, 151 multiple regression and, 579 Legendre, Adrien-Marie, 151 Level shift, 776 Levels of a factor, 720 Leverage defined, 539, 626 in linear regression, 539–541, 541FE in multiple regression, 626–628 Likert scale, 808, 821 Linear association correlation and, 143 correlation coefficient and, 146 in scatterplots, 139 Linear model defined, 150 intercept in, 151 interpreting, 151FE leverage on, 539–541, 541FE line of best fit, 151–153 regression to the mean, 153–154 residuals in, 150, 156–157, 158FE slope in, 151–152 variation in, 158–159 Linear regression, 149–165 See also Multiple Regression; Regression assumptions and conditions, 155–156 on computers, 171–172 correlation and the line, 151–153 ethical considerations, 168 extrapolation, 535–537, 538FE home size/price example, 160FE–162FE influential points in, 539–541 leverage in, 539–541, 541FE linear model, 149–151 mini case study, 174 potential problems, 165–168 R2 and, 158–159 reasonableness of, 160 regression to the mean, 153–154 residuals in, 150, 156–157, 158FE variation in residuals, 158–159 Linear trend model, 667–669, 685–687, 687FE Linearity Assumption for linear regression, 156 for multiple regression, 583–584 for regression, 494, 496 Linearity Condition for correlation, 143–144 for linear regression, 153, 156 for multiple regression,
584 for nonparametric methods, 822 for regression, 494, 496 for residuals, 546–548 LLN (Law of Large Numbers) defined, 192 Law of Averages and, 192–193 Lockhart, Denis, 670 Logarithms natural, 686 re-expressing data, 113, 550, 552 re-expressing time series, 685–689 Logistic regression model, 595–597, 598FE Lowell, James Russell, 365 Lowe’s, 137–138, 149–150, 157–159 Lurking variables causation and, 148–149 confounding vs., 728 defined, 148 M Mabillard, Claude, 617 Mann, H B., 810 Mann-Whitney test, 808–811, 812FE Margin of error defined, 309–311 potential problems, 319 Marginal distribution, 59 Marginal probability, 200, 201FE Matching samples to populations, 27–28 Maximax choice, 839, 839FE MBNA, 277–278 Mean(s), 331–356, 399–448 See also Expected value assumptions and conditions for inference, 404–405 center of distributions, 91, 93–95 Central Limit Theorem and, 285–286 comparing, 399–448, 400–402, 403FE on the computer, 427–429 confidence interval for, 334–335, 336FE, 409, 411FE credit card example, 405FE–407FE defined, 93 finding, 93FE hypothesis testing for, 331–356, 370FE inferences about, 347–348 mini case study, 431 one-sample t-interval for the mean, 335 one-sample t-test, 368–369 pooled t-tests, 411–413, 416 potential problems, 424–425 of random variables, 219–220 of response values in regression, 506 sampling distribution models for, 285–291, 332–334 standard deviation and, 97 stationary in the, 667 Student’s t-models, 335 two-sample t-interval for the difference between means, 409 two-sample t-test for the difference between means, 403, 404FE Mean absolute deviation (MAD), 675 Mean absolute percentage error (MAPE), 676 Mean Square due to Error (MSE), 730–731 Mean Square due to Treatment (MST), 730–731 Mean squared error (MSE), 675 Median defined, 94 finding by hand, 94–95, 95FE Mercer Human Resource Consulting, 174 Metadata, 10, 865 Metropolitan Life Insurance Company, 217–218 Minimax choice, 839, 839FE Minimin choice, 839, 839FE Minimum significant 
difference (MSD), 737 Minitab statistics package for ANOVA, 755 chi-square tests, 475 confidence intervals for proportions, 321 displaying categorical data, 72 displaying quantitative data, 120–121 hypothesis tests for proportions, 387 inference for means, 348 linear regression, 172 making Normal probability plots, 270 paired t-test, 430 process control rules, 777 quality control charts, 797 re-expressing data, 559–560 regression analysis, 516, 604, 651 scatterplots and correlation, 172 time series methods, 700 two-sample methods, 429 M&M’s example, 198FE–200FE Mode(s) defined, 91 of histograms, 91 Model(s) additive, 688–689, 690FE autoregressive, 677–681, 680FE, 696 binomial probability, 229–231 chi-square, 452 exponential probability, 266–267 geometric probability, 227–228 linear, 149–151 linear trend, 667–669, 685–687, 687FE logistic regression, 595–597, 598FE multiple regression, 591–592, 592FE–593FE, 632–635 multiplicative, 688–689, 690FE Normal See Normal model(s) parameters in, 30, 249, 288 Poisson probability, 232–233 predictive, 867, 871 probability See Probability models sampling distribution, 277–301 single exponential smoothing, 674–675 Student’s t-models, 333–335 time series, 669–670, 696 Uniform, 227 Monotonicity, 821–822 Moving averages defined, 671 investors and, 673 simple moving average forecast, 671–673, 673FE–674FE weighted, 673, 673FE–674FE Multimodal distribution, 91 Multiple comparisons, 737–738, 738FE–739FE Multiple regression, 577–616, 617–664 additive and multiplicative models, 688–689 adjusted R2, 593–595, 595FE adjusting for different slopes, 624–625 assumptions and conditions, 583–586, 586FE–587FE body fat measurement, 582 building models, 632–635 Burger King menu items, 624–625 coefficients, 581–583, 583FE collinearity, 641–643 on the computer, 604–605, 651 defined, 579 diagnostics, 626–631, 631FE–632FE dummy variables, 620–622, 622FE–624FE Equal Spread Condition, 585 Equal Variance Assumption, 585 ethical considerations,
602, 649 forecasting with, 691–692 functionality, 579, 581, 581FE housing prices example, 587FE–591FE, 636FE–640FE Independence Assumption, 584 indicator variables, 620–622, 622FE–624FE influential case, 629–631 leverage in, 626–628 Linearity Assumption, 583–584 Linearity Condition, 584 logistic model, 595–597, 598FE mini case studies, 606, 652 Nearly Normal Condition, 585–586 Normality Assumption, 585–586 potential problems, 601–602, 648 quadratic terms, 644–648, 646FE–647FE Randomization Condition, 584 residuals and, 628–629 response variables in, 592, 595 roller coaster example, 616–622 seasonal components, 686–687, 687FE testing coefficients, 591–592, 592FE–593FE time on market example, 598FE–600FE time series, 689, 690 trend components, 685–686 Multiplication Rule defined, 196, 196FE General Multiplication Rule, 202 for independent events, 196, 202 Multiplicative model, 688–689, 690FE Multistage samples, 32–33 Mutually exclusive events See Disjoint events N Naive forecast, 673 Nambé Mills, 491–492, 501FE–503FE Nature, states of, 836–837, 837FE, 845 Nearly Normal Condition for ANOVA, 734–736 for comparing means, 404 for multiple regression, 585–586 for paired data, 419 for regression, 495, 496 for Student’s t-models, 337–338 Neural networks, 873–874 New York Stock Exchange (NYSE), 245–246 Nominal variables, 14 Nonlinear relationships basic approaches, 162–164 potential problems, 167 Nonparametric methods, 807–834 ANOVA alternative, 820–821 assumptions and conditions, 811, 817 buying from friends, 812FE–813FE defined, 810 ethical considerations, 825 Friedman test, 820–821 Kendall’s tau, 821–822, 822FE Kruskal-Wallis test, 813–814, 814FE–816FE Mann-Whitney test, 809–811 potential problems, 824 ranking values, 808–809 recommendations, 823–824 Spearman’s rho, 822–823, 823FE Wilcoxon rank-sum test, 809–811 Wilcoxon signed-rank test, 816–817, 818FE–820FE Nonresponse bias, 36–37 Normal model(s), 245–276 68–95–99.7 Rule, 247–248 Binomial random variables and, 260–262
Central Limit Theorem and, 285–288, 288FE cereal box weight examples, 252FE–254FE continuous random variables and, 263 critical values from, 310–311 defined, 249 distribution of the sums, 256–257 ethical considerations, 235, 245, 268 GMAT examples, 250FE–252FE mini case study, 269 Normal probability plots, 495–496 packaging stereos example, 257FE–259FE parameters in, 249, 288 potential problems, 267 standard, 249 Success/Failure Condition, 282 z-scores and, 249–250 Normal percentiles defined, 250 SAT examples, 250FE–252FE Normal Population Assumption for ANOVA, 734–736 for comparing means, 404 for paired data, 419 for regression, 495–496 for Student’s t-models, 337–338 Normal probability plots defined, 255, 495 functionality, 495–496 making, 270 of residuals, 496 using, 256FE Normal scores, 495 Normality Assumption, 585–586 Null hypothesis chi-square tests and, 457 defined, 359 F-test, 591 in hypothesis testing, 363–365, 371–372 innocence as, 362 multiple regression and, 591 one-proportion z-test, 364 P-values and, 362–363, 371 rejecting, 362, 371–372 O Observational studies, 717–767 ANOVA on, 739–740 defined, 718 functionality, 718–719 train example, 719FE One-proportion z-interval defined, 308, 312 in hypothesis testing, 364 One-proportion z-test, 364 One-sample t-interval for the mean, 335 One-sample t-test for the mean, 368–369, 369FE–370FE One-sided alternative hypothesis, 366 Online analytical processing (OLAP), 866–867 Operating-characteristic (OC) curve, 777 Ordinal variables, 14 Ordinate (y-axis), 141 Origin, 141 Out-of-control action plan (OCAP), 785 Out-of-control processes actions for, 785–786, 786FE–790FE defined, 771 Outcomes of actions, 837, 837FE defined, 190, 191, 837 random phenomenon in, 190, 196 of sample space, 191 Outlier Condition for correlation, 144 for linear regression, 153, 156 for regression, 496 Outliers boxplot rule for nominating, 100FE Bozo the clown as, 166, 540
correlation and, 540 defined, 92, 140 in distributions, 92–93 estimating association, 823FE ethics and, 116–117, 346n, 557 far outlier, 100 identifying, 105–106, 106FE–107FE Outlier Condition, 144, 153, 156 in scatterplots, 140 P p charts, 790–792, 793FE P-values conclusions from, 363FE as conditional probabilities, 364 defined, 362 in hypothesis testing, 362–366, 371–372 for test of proportion, 368FE Paired data assumptions and conditions, 418–419 car discount example, 419FE, 423FE–424FE on the computer, 429–430 defined, 418 mini case study, 431 paired t-interval, 420 paired t-test, 403, 419–420, 423FE–424FE potential problems, 424–425 Wilcoxon signed-rank test, 816–817, 818FE–820FE Paired Data Assumption, 418 Paired t-interval defined, 420 seasonal spending example, 421FE–423FE Paired t-test car discount example, 419FE, 423FE–424FE on the computer, 429–430 defined, 418–419 Paralyzed Veterans of America, 652, 863–864 Parameters See also Model(s) defined, 30, 219, 249 population, 30 Participants, 10, 720 Payoff, 837 Payoff table, 838, 839FE PDCA cycle, 771, 772 Pearson correlation, 165FE, 823FE Pennzoil, 855–856 Percentages clarifying, 61 defined, 53 Percentiles, Normal, 249–252 Period, 668 Personal probability, 194 Pew Research Center, 29 Phenomenon, 191 Pie charts of conditional distributions, 62 defined, 56–57 Pilot test, 38 Placebo, 727 Placebo effect, 727 Poisson, Denis, 232 Poisson probability model, 232–233 Poisson random variable, 232 Polling methods, 25–26 Pooled-t confidence intervals, 413 Pooled t-tests, 411–413, 416, 416FE–417FE Pooling, 411–413, 416, 416FE–417FE Population(s) clusters in, 33 defining for surveys, 35–36 matching samples to, 27–28 parameters for, 30 regression and, 492–493 sampling frame, 37 strata, 32 undercoverage of, 39 Population parameters, 30 Posterior probability, 842 Power of hypothesis test, 378, 379–381 Predicted values confidence intervals for, 506 defined, 150 standard errors for, 504–509 Prediction interval for an individual 
response, 507, 514 confidence interval vs., 509–510, 510FE–511FE defined, 507 Predictive model, 867, 871 Predictor variables, 142 Prior probability, 842 Probability, 189–215 Addition Rule, 196, 196FE Complement Rule, 195, 195FE conditional, 201–203, 203FE, 842, 848–852 defined, 191 empirical, 192 ethics and, 853 of events, 191, 195–197 General Addition Rule, 197, 197FE General Multiplication Rule, 202 in hypothesis testing, 362 joint, 200–201, 203–204 Law of Large Numbers, 192–193 marginal, 200, 201FE mini case study, 206–207 M&M’s example, 198FE–200FE Multiplication Rule, 196, 196FE personal, 194 posterior, 842 potential problems, 204 prior, 842 Probability Assignment Rule, 195 of random phenomenon, 190–192, 195 rules for working with, 195–197 of sample space, 191 theoretical, 193–194 types of, 193–194 Probability Assignment Rule, 195 Probability density function (pdf), 263 Probability models, 217–243 binomial, 229–231 defined, 218–219 discrete, 226–234, 234FE ethical considerations, 235, 268 exponential, 266–267 geometric, 227–228 independence and, 228–229 Poisson, 232–233 potential problems, 234–235 random variables and, 218–219, 226–234 universal blood donor example, 231FE–232FE Probability trees, 848–851, 850FE Process capability indices, 773, 774FE Process capability studies, 773 Proportion(s), 305–329, 357–398 comparing, 464–465 on the computer, 321, 385–386 confidence intervals for, 305–329, 309FE, 311FE, 465, 466FE defined, 53, 306n, 307–308 hypothesis testing about, 357–398 mini case study, 388 one-proportion z-interval, 308, 312 sampling distribution models for, 278–279, 279FE, 280–284, 282FE Prospective studies, 719 Public opinion polling, 26 Push polls, 40 Q Qualitative variables, 12–13 See also Categorical data Quality control, 769–805 actions for out-of-control processes, 785–786, 786FE–790FE on the computer, 797 control charts for attributes, 790–792, 793FE control charts for individual observations, 774–778 control charts for
measurements, 778–783, 783FE–784FE ethical considerations, 795 history of, 770–774 mini case study, 798 philosophies of, 793–794 Quantitative data, 85–135 AIG stock price example, 103FE–104FE boxplots, 99–100 center of distributions, 91, 93–95 comparing groups, 102–103 credit card bank example, 101FE–102FE displaying distributions, 86–91 displaying on computers, 119–121 ethical considerations, 116–117 five-number summary, 99–100 identifying outliers, 105–106, 106FE–107FE in manufacturing process, 790 mini case study, 121 potential problems, 114–116 range in, 95 shapes of distributions, 91–93, 98 spread of distributions, 91, 95–97, 97FE standardizing, 107–108, 108FE time series plot, 109–111, 111FE transforming skewed data, 112–113, 114FE Quantitative Data Condition checking, 91 for linear regression, 155 Quantitative variables defined, 12 differentiating, 14n displaying distributions, 86–91 histograms, 86–88 linear association between, 143 scatterplots for, 138–139, 141 stem-and-leaf displays, 90–91 units for, 12–13 Quantitative Variables Condition for correlation, 143 for linear regression, 153 for regression, 494 Quartiles defined, 96 finding by hand, 96 questionnaires in valid surveys, 36–38 R R chart, 778–783, 783FE–784FE, 786FE–789FE R2 adjusted, 593–595, 595FE defined, 159, 160FE size considerations, 159–160 variation in residuals, 158–159 Random assignment, 720 Random phenomenon defined, 190 probability of, 190–192, 195 Random samples choosing, 31FE–32FE on computers, 43 simple, 31–32 stratified, 32 Random selection, 29 Random variables, 217–243 Addition Rule for Expected Values, 224 Addition Rule for Variances of, 224 binomial, 229, 260–262 changing by constants, 223 computer inventory example, 221FE–222FE continuous, 218, 263–267 defined, 218 discrete, 218 ethical considerations, 235, 268 expected value of, 218–220, 220FE means and, 219–220 mini case study, 237–238 Poisson, 232 potential problems, 234–235 probability model of, 218–219, 226–234 Pythagorean 
Theorem of Statistics, 224 standard deviation of, 220–221, 223FE sums of, 226FE universal blood donor example, 231FE–232FE variance of, 220, 223–225 Random walks, 681 Randomization defined, 27 in experiments, 721 for sample surveys, 27–28 Randomization Condition for ANOVA, 733 for chi-square tests, 452, 461 for comparing means, 404 for confidence intervals, 312 for goodness-of-fit tests, 452 for homogeneity tests, 461 for multiple regression, 584 for nonparametric methods, 811 for paired data, 419 for regression, 495 sampling distribution models, 282, 289 for Student’s t-models, 336 Randomized block design, 723, 821 Randomness, 189–215 Range in quantitative data, 95 Rank sum test, 810, 821 Ranks assigning, 808–809 correlating for variables, 822 Ratio scale, 14n Re-expressing data on computers, 559–560 goals of, 548–550 Ladder of Powers, 551–552, 552FE mini case studies, 560 for regression models, 163–164, 164FE–165FE residuals in, 546–550, 551FE for skewed distributions, 113, 548–549 for symmetry, 548–549 for time series, 685–689 Records, 10 Regression, 491–530 See also Multiple Regression assumptions and conditions, 494–496, 497FE on computers, 171–172, 514–516 correlation and, 153, 504 Equal Spread Condition, 495 Equal Variance Assumption, 494–495, 496 ethical considerations, 512, 602, 649 extrapolation, 535–537, 538FE Galton on, 154 hypothesis test for correlation, 504, 504FE Independence Assumption, 494–495, 496 inferences for, 491–530 influential points in, 539–541 leverage, 539–541, 541FE linear See Linear regression Linearity Condition, 494 mini case study, 516–517 multiple See Multiple regression Nearly Normal Condition, 495, 496 nonlinear relationships and, 162–164 Normal Population Assumption, 495–496 Outlier Condition, 496 population and sample, 492–493 potential problems, 511–512 Quantitative Variables Condition, 494 Randomization Condition, 495 re-expressing data for linearity, 163–164, 164FE–165FE
simple, 578n, 591 standard error for predicted values, 504–509 standard error of the slope, 497–499, 499FE stepwise, 633–635, 635FE–636FE Regression slope confidence interval for, 501 sampling distribution for, 500 standard error for, 497–499, 499FE t-test for, 499–501, 503FE, 591 Regression Sum of Squares (SSR), 593–594 Regression to the mean, 153–154 Relational databases, 11 Relative frequency, 191–192 bar chart, 56 histogram, 88 table, 53, 54FE vs probability, 191 Replicates, 721 Representative sample, 30 Research, and ethics, 18 Research hypothesis See Alternative hypothesis Residual(s), 531–575 autocorrelation, 543–546 for chi-square tests, 457–458, 458FE on computers, 559–560 defined, 150, 532 ethical considerations, 557 extraordinary observations, 538–541 extrapolation and prediction, 535–537, 538FE groups in, 532–534, 535FE influential points in, 539–541 irregular component in time series, 689–690 Ladder of Powers, 551–552, 552FE least squares, 532–533 linear models and, 150, 156–157, 158FE Linearity Condition, 546–548 mini case studies, 560 multiple regression and, 628–629 negative, 150 Normal probability plots of, 496 positive, 150 potential problems, 556–557 in re-expressing data, 546–550, 551FE scatterplots of, 496, 544 standard deviation of, 157, 496 standardized, 457 Studentized, 628–629 variation in, 158–159 working with summary values, 542–543 Residual standard deviation, 157, 498 Respondents, 10 Response, 720 Response bias, 37 Response sample, 38–39 Response variables defined, 142, 720 in experiments, 720 in multiple regressions, 592, 595 Retrospective studies, 719, 719FE Return to risk ratio (RRR), 845 Risk management, 835–862 Roper, Elmo, 25–26 Row percent, 59 RSPT Inc., 18 Run charts, 774–778, 778FE S S chart, 783 SAC Capital, 449–450 Sagan, Carl, 374 Sample(s) biases in, 27 cluster, 32–33 convenience, 39 defined, 26 matching to populations, 27–28 multistage, 32–33 regression and, 492–493 representative, 30 simple random sample, 31–32
stratified random, 32 systematic, 33–34 voluntary response, 38–39 Sample size calculating by hand, 344 choosing, 315–316 for confidence interval for means, 344FE effect of, 280 Law of Diminishing Returns, 290 for sample surveys, 28–29 Sample Size Assumption for chi-square tests, 452, 461 for confidence intervals, 312 for goodness-of-fit tests, 452 for homogeneity tests, 461 sampling distribution models, 282, 289 Sample space, 191 Sample statistic, 30 Sample surveys census considerations, 29 defined, 27 defining populations, 35–36 ethical considerations, 41 examining part of the whole, 26–27 population parameters, 30 randomizing, 27–28 sample size for, 28–29 simple random sample, 31–32 undercoverage in, 39 valid, 36–38 Sampling, 25–50 cluster, 32–33 convenience, 39 examining part of the whole, 26–27 identifying terms, 30FE randomizing, 27–28 sample size for, 28–29 systematic, 33–34 variability in, 31 Sampling distribution models, 277–301 10% Condition, 282, 289 assumptions and conditions, 282, 283FE, 289 Central Limit Theorem, 285–291, 332 cereal box weight examples, 252FE–254FE for difference between means, 403, 404FE ethical considerations, 245 Independence Assumption, 282, 289 Large Enough Sample Condition, 289 Law of Diminishing Returns, 290 for means, 285–291, 332–334 mini case study, 294 parameters in, 288 potential problems, 292 for a proportion, 278–279, 279FE, 280–284, 282FE Randomization Condition, 282, 289 for regression slopes, 500 for the sample proportion, 280–281 Sample Size Assumption, 282, 289 simulations, 279 Success/Failure Condition, 282 working with, 290–291, 290FE Sampling error bias vs., 32 defined, 28, 280 Sampling frame bias in, 39 defined, 31 for valid surveys, 37 Sampling variability, 31, 280 Sarbanes-Oxley Act, 483 Satterthwaite, F E., 402n Scatterplot matrix, 147n Scatterplots assigning roles to variables in, 141–142 on computers, 171–172 correlation in, 142–147, 544 creating, 140FE–141FE customer spending example, 145FE–146FE defined, 
138 direction of association, 139 ethical considerations, 168 form of, 139 linear association in, 139 lurking variables in, 148–149 mini case study, 173–174 outliers in, 140 potential problems, 165–168 re-expressing data, 550 of residuals, 496, 544 standardizing, 143 strength of, 140 strength of relationships in, 140 summary values in, 542–543 Seasonal component, 668, 670FE, 686–687, 687FE Securities and Exchange Commission, 174 Segmented bar charts, 64 Sensitivity in decision-making, 845–846, 846FE Shapes of distributions defined, 91 describing, 93FE explanation, 98 modes of histograms, 91 outliers, 92–93 symmetric histograms, 92 Shewhart, Walter A., 771, 772, 794 Shewhart charts, 771–772 Significance level defined, 371 ethics and, 383 in hypothesis testing, 371–372 Similar Variance Condition, 733–734 Simple moving average forecast, 671–673, 673FE–674FE, 695 Simple random sample (SRS), 31–32 Simple regression, 578n, 591 Simpson’s Paradox, 69 Simulation(s) of decision-making, 846 sampling distribution models, 279 sampling distributions of a mean, 285–286 Single-blind experiments, 727 Single exponential smoothing (SES) model, 674 Six Sigma philosophy, 794 68–95–99.7 Rule applying, 248FE critical values and, 373n defined, 247–248 Skewed distributions defined, 92 re-expressing to improve symmetry, 548–549 transforming, 112–113, 114FE Slope adjusting for, 624–625 defined, 151 in linear model, 151–152 standard error of, 497–499, 499FE Smoothing methods exponential, 674–675, 695 SES model, 674 simple moving averages, 671–673, 673FE–674FE, 695 time series, 670–675 weighted moving averages, 673, 673FE–674FE Something Has to Happen Rule, 195 Sony Corporation, 769–770, 774 Spearman rank correlation, 163, 165FE Spearman’s rho, 822–823, 823FE Special-cause variation, 771 Specification limits, 772–774 Spend lift, 400 SPLOM, 147n Spread of distribution explanation, 91, 97FE, 99 interquartile range, 96 quartiles and, 96 ranges and, 95 variance and, 97
Spreadsheets, 10 SPSS statistics package for ANOVA, 755 chi-square tests, 475 confidence intervals for proportions, 321 displaying categorical data, 72 displaying quantitative data, 121 hypothesis tests for proportions, 387 inference for means, 348 linear regression, 172 making Normal probability plots, 270 paired t-test, 430 quality control charts, 797 re-expressing data, 560 regression analysis, 516, 604, 651 scatterplots and correlation, 172 time series methods, 701 two-sample methods, 429 Stacked format for data, 428 Standard deviation(s) of an action, 843–845, 845FE defined, 97, 220 finding by hand, 97–98 hypothesis testing and, 360, 374 Law of Diminishing Returns, 290 of the mean, 97 measuring spread, 97 Normal models, working with, 250FE–252FE, 252FE–254FE of random variables, 220–221, 223FE of residuals, 157, 498 as a ruler, 246–248 Six Sigma philosophy, 794 z-scores, 108, 247 Standard error(s) critical values, 310–311 defined, 290–291 for predicted values, 504–509 for regression slope, 497–499, 499FE Standard Normal distribution, 249 Standard Normal model, 249 Standardized residuals for chi-square, 457 Standardized values, 108 Standardizing quantitative data, 107–108, 108FE States of nature, 836–837, 837FE, 845 Stationary in the mean, 667 Stationary in the variance, 669 Stationary time series, 112 Statistic defined, 30 sample, 30 Statistical process control (SPC), 771 Statistical significance in hypothesis testing, 371 practical significance vs., 372 Statistics, 1–6 variation and, 1–2 Statistics packages See specific packages Stem-and-leaf display defined, 90 for displaying quantitative variables, 90–91 Tukey and, 417n Stemplot See Stem-and-leaf display Stepwise regression, 633–635, 635FE–636FE Strata clusters vs., 33 defined, 32 Stratified random sample, 32, 33FE Studentized residuals, 628–629 Student’s t-models 10% Condition, 336–337 assumptions and conditions, 336–338, 338FE–339FE defined, 333 degrees of freedom and, 333, 345 finding critical values, 
341–342 functionality, 333–335 Gosset and, 333 Independence Assumption, 336 Nearly Normal Condition, 337–338 Normal Population Assumption, 337–338 potential problems, 345–346 Randomization Condition, 336 test for regression slopes, 499–501, 503FE Subjects in experiments, 10, 720 Success/Failure Condition for Binomial models, 261 for confidence intervals, 312 sampling distribution models, 282 Sum of Squared Residuals (SSE), 160, 593–594, 731 Summary statistics, 542–543 Surveys, 25–50 designing, 38FE, 40FE market demand, 34FE–35FE mini case study, 44 respondents to, 10 sample See Sample surveys Symmetric distributions defined, 92 re-expressing data, 548–549 Systematic sample, 33–34 T t-interval one-sample, 335 paired, 420 pooled, 413 two-sample, 409 t-ratios for regression coefficients, 580 t-tests for the correlation coefficient, 504 independent, 403 one-sample, for means, 368–369, 369FE–370FE paired, 418–419, 419FE, 423FE–424FE, 429–430 pooled, 411–413, 416, 416FE–417FE for regression slope, 499–501, 503FE, 591 two-sample, for means, 403, 404FE Tables ANOVA, 730 cells of, 59, 451–452 contingency, 58–64, 200–201, 203–204, 461 data, 9–10 frequency, 53–54, 54FE payoff, 838, 839FE relative frequency, 53, 54FE Tails of distribution, 92 Taleb, Nassim Nicholas, 193 Technology help See Computers 10% Condition for comparing means, 404 for confidence intervals, 312 independence and, 228–229 for paired data, 419 sampling distribution models, 282, 289 for Student’s t-models, 336–337 Terminal node, 871 Test set, 870 Texaco, 855–856 Theoretical probability, 193–194 Tiffany & Co., 701–702 Time series, 665–716 additive model, 688–689, 690FE autoregressive models, 677–681, 680FE, 696 choosing forecasting method, 695–696 comparing methods example, 681FE–684FE, 692FE–695FE components of, 667–670 on the computer, 700–701 defined, 14–15, 543, 667 deseasonalized, 668 ethical considerations, 697 exponential smoothing, 674–675, 695 forecasting with regression-based models,
691–692, 696 interpreting models, 696 mini case studies, 701–702 modeling, 669–670, 670FE multiple regression based models, 685–687 multiplicative model, 688–689, 690FE potential problems, 697 random walks, 681 SES model, 674 simple moving average methods, 671–673, 673FE–674FE, 695 smoothing methods, 670–675 stationary, 112 summarizing forecast error, 675–677, 677FE weighted moving averages, 673, 673FE–674FE Time series plot defined, 109 explanation, 109–111, 111FE Total percent, 59 Total quality management (TQM), 794 Total Sum of Squares (SST), 593–594 Training set, 870 Transactional data, 9, 864–866 Transforming data See Re-expressing data Treatment(s), 720 Trend component, 667–668, 685–687, 687FE Trends, 776 Trials Bernoulli, 227–233 defined, 190, 191, 227 and hypothesis testing, 361–363 independence of, 192 Law of Large Numbers, 192 Triangular distribution, 285 Truman, Harry, 25 Tukey, John W., 100, 308, 417n Tukey’s test, 417, 417FE–418FE Two-sample t-interval, 409 Two-sample t-methods on the computer, 427–429 degrees of freedom and, 402 two-sample t-interval for the difference between means, 409 two-sample t-test for the difference between means, 403, 404FE, 408FE Two-sample t-test for the difference between means, 403, 404FE, 408FE Two-sided alternative hypothesis, 365 Type I error defined, 377 effect size and, 378 example, 379FE reducing, 381 Type II error defined, 377 effect size and, 378 example, 379FE reducing, 381 U Undercoverage, 39 Underestimate, 150, 690, 692 Uniform distribution defined, 91, 285 discrete, 227 functionality, 264–266 Unimodal distribution, 91 See also Nearly Normal Condition Units experimental, 10, 720 for quantitative variables, 12–13 Unstacked data, 428 V Variables assigning roles in scatterplots, 141–142, 142FE categorical, 12–13, 54–57, 451, 467 cryptic abbreviations for, 13 defined, 10 demographic, 865 dependent, 142 dichotomous, 790 dummy, 620–622, 622FE–624FE exogenous, 690 explanatory, 142 five-number summary, 99 identifier, 
    13–14
  independent, 63, 142, 192
  indicator, 620–622, 622FE–624FE, 630–631, 631FE–632FE
  interaction term, 624–625, 626FE
  lagged, 678
  lurking, 148–149, 728
  nominal, 14
  ordinal, 14
  predictor, 142
  qualitative, 12–13
  quantitative, 12–13, 14n, 138–139, 142
  random. See Random variables
  response, 142, 592, 595, 720
  x-variable, 142
  y-variable, 142
Variance
  adding for sums and differences, 401
  Addition Rule for, 224
  defined, 97, 220
  of random variables, 220, 223–225
  stationary in the, 669
Variance Inflation Factor (VIF), 642
Variance inflation of coefficients, 642
Variation, 1–6
  coefficient of, 844
  common-cause, 771
  estimating, for actions, 843–845, 845FE
  special-cause, 771
  statistics and, 1–2
VIF (Variance Inflation Factor), 642
Visa credit card, 399–400
Voluntary response bias, 39
Voluntary response sample, 38–39

W
Weighted moving averages, 673, 673FE–674FE
Welch, B. L., 402n
Whitney, D. R., 810
Whole Foods Market, 665–666, 685–686, 696
Wilcoxon, Frank, 809
Wilcoxon rank-sum test, 808–811, 812FE
Wilcoxon signed-rank test, 816–817, 818FE–820FE
Wilson, E. B., 317n

X
x-axis, 141
X chart, 778–783, 783FE–784FE, 786FE–789FE
x-variable, 142

Y
y-axis, 141
y-variable, 142

Z
z-interval, one-proportion, 308, 312, 364
z-scores
  defined, 108, 249
  Normal models and, 249–250
  Normal percentiles and, 250
  in scatterplots, 143
  standard deviations and, 108, 247
z-test, one-proportion, 364
Zabriskie, Dave, 548
Zillow.com, 577–578, 587FE–591FE

Assumptions for Inference
And the Conditions That Support or Override Them

Proportions (z)
• One sample
  1. Individuals are independent. Condition: SRS and n < 10% of the population.
  2. Sample is sufficiently large. Condition: Successes and failures each ≥ 10.

Means (t)
• One sample (df = n - 1)
  1. Individuals are independent. Condition: SRS and n < 10% of the population.
  2. Population has a Normal model. Condition: Histogram is unimodal and symmetric.*
• Matched pairs (df = n - 1)
  1. Data are matched. Condition: (Think about the design.)
  2. Individuals are independent. Condition: SRS and n < 10% OR random allocation.
  3. Population of differences is Normal. Condition: Histogram of differences is unimodal and symmetric.*
• Two independent samples (df from technology)
  1. Groups are independent. Condition: (Think about the design.)
  2. Data in each group are independent. Condition: SRSs and n < 10% OR random allocation.
  3. Both populations are Normal. Condition: Both histograms are unimodal and symmetric.*

Distributions/Association ($\chi^2$)
• Goodness of fit (df = # of cells - 1; one variable, one sample compared with population model)
  1. Data are counts. Condition: (Are they?)
  2. Data in sample are independent. Condition: SRS and n < 10% of the population.
  3. Sample is sufficiently large. Condition: All expected counts ≥ 5.
• Homogeneity [df = (r - 1)(c - 1); many groups compared on one variable]
  1. Data are counts. Condition: (Are they?)
  2. Data in groups are independent. Condition: SRSs and n < 10% OR random allocation.
  3. Groups are sufficiently large. Condition: All expected counts ≥ 5.
• Independence [df = (r - 1)(c - 1); sample from one population classified on two variables]
  1. Data are counts. Condition: (Are they?)
  2. Data are independent. Condition: SRSs and n < 10% of the population.
  3. Sample is sufficiently large. Condition: All expected counts ≥ 5.

Regression with k predictors (t, df = n - k - 1)
• Association of each quantitative predictor with the response variable
  1. Form of relationship is linear. Condition: Scatterplots of y against each x are straight enough. Scatterplot of residuals against predicted values shows no special structure.
  2. Errors are independent. Condition: No apparent pattern in plot of residuals against predicted values.
  3. Variability of errors is constant. Condition: Plot of residuals against predicted values has constant spread, doesn’t “thicken.”
  4. Errors follow a Normal model. Condition: Histogram of residuals is approximately unimodal and symmetric, or Normal probability plot is reasonably straight.*

Analysis of Variance (F, df dependent on number of factors and number of levels in each)
• Equality of the mean response across levels of categorical predictors
  1. Additive Model (if there are factors with no interaction term). Condition: Interaction plot shows parallel lines (otherwise include an interaction term if possible).
  2. Independent errors. Condition: Randomized experiment or other suitable randomization.
  3. Equal variance across treatment levels. Condition: Plot of residuals against predicted values has constant spread. Boxplots (partial boxplots for factors) show similar spreads.
  4. Errors follow a Normal model. Condition: Histogram of residuals is unimodal and symmetric, or Normal probability plot is reasonably straight.

*Less critical as n increases

Quick Guide to Inference
Plan: Inference about? One group or two?
Do: Procedure, Model, Parameter, Estimate, SE

Proportions
• One sample
  1-Proportion z-Interval. Model: z. Parameter: $p$. Estimate: $\hat{p}$. SE: $\sqrt{\hat{p}\hat{q}/n}$
  1-Proportion z-Test. Model: z. Parameter: $p$. Estimate: $\hat{p}$. SD: $\sqrt{p_0 q_0/n}$

Means
• One sample
  t-Interval, t-Test. Model: t, df = n - 1. Parameter: $\mu$. Estimate: $\bar{y}$. SE: $s/\sqrt{n}$
• Two independent groups
  2-Sample t-Test, 2-Sample t-Interval. Model: t, df from technology. Parameter: $\mu_1 - \mu_2$. Estimate: $\bar{y}_1 - \bar{y}_2$. SE: $\sqrt{s_1^2/n_1 + s_2^2/n_2}$
• Matched pairs
  Paired t-Test, Paired t-Interval. Model: t, df = n - 1. Parameter: $\mu_d$. Estimate: $\bar{d}$. SE: $s_d/\sqrt{n}$

Distributions (one categorical variable)
• One sample
  Goodness-of-Fit. Model: $\chi^2$, df = cells - 1. Statistic: $\sum (Obs - Exp)^2/Exp$
• Many independent groups
  Homogeneity $\chi^2$ Test. Model: $\chi^2$, df = (r - 1)(c - 1). Statistic: $\sum (Obs - Exp)^2/Exp$

Independence (two categorical variables)
• One sample
  Independence $\chi^2$ Test. Model: $\chi^2$, df = (r - 1)(c - 1). Statistic: $\sum (Obs - Exp)^2/Exp$

Association (two quantitative variables)
• One sample
  Linear Regression t-Test or Confidence Interval for $\beta$. Model: t, df = n - 2. Parameter: $\beta_1$. Estimate: $b_1$. SE: $s_e/(s_x\sqrt{n-1})$ (compute with technology)
  *Confidence Interval for $\mu_\nu$. Model: t, df = n - 2. Parameter: $\mu_\nu$. Estimate: $\hat{y}_\nu$. SE: $\sqrt{SE^2(b_1)\cdot(x_\nu - \bar{x})^2 + s_e^2/n}$
  *Prediction Interval for $y_\nu$. Model: t, df = n - 2. Parameter: $y_\nu$. Estimate: $\hat{y}_\nu$. SE: $\sqrt{SE^2(b_1)\cdot(x_\nu - \bar{x})^2 + s_e^2/n + s_e^2}$

Association (one quantitative variable modeled by k quantitative variables)
• One sample
  Multiple Regression t-test or Confidence interval for each $\beta_j$. Model: t, df = n - (k + 1). Parameter: $\beta_j$. Estimate: $b_j$. SE: (from technology)
  F test for regression model. Model: F, df = k and n - (k + 1). Statistic: MST/MSE

Association (one quantitative and two or more categorical variables)
• Two or more
  ANOVA. Model: F, df = k - 1 and N - k. Statistic: MST/MSE

...
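The standard-error entries in the Quick Guide above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the book: the function names are hypothetical, and the example inputs at the bottom are made-up numbers chosen for easy arithmetic.

```python
import math

# Hypothetical helper names; each mirrors one SE entry in the Quick Guide.

def se_one_proportion(p_hat, n):
    """SE for a 1-proportion z-interval: sqrt(p_hat * q_hat / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

def sd_one_proportion_null(p0, n):
    """SD for a 1-proportion z-test under the null: sqrt(p0 * q0 / n)."""
    return math.sqrt(p0 * (1 - p0) / n)

def se_mean(s, n):
    """SE for a one-sample (or paired) t procedure: s / sqrt(n)."""
    return s / math.sqrt(n)

def se_two_sample(s1, n1, s2, n2):
    """SE for a two-sample t procedure: sqrt(s1^2/n1 + s2^2/n2)."""
    return math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

def chi_square_statistic(observed, expected):
    """Chi-square statistic: sum of (Obs - Exp)^2 / Exp over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Made-up inputs, for illustration only.
print(se_one_proportion(0.5, 100))
print(se_mean(10, 25))
print(se_two_sample(3, 9, 4, 16))
print(chi_square_statistic([10, 20], [15, 15]))
```

For a paired design, `se_mean` would be called with the standard deviation of the differences and the number of pairs.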
...       197   257   315   480    98    63    92   1502
...       274   405   364   326    82    46    38   1535
...       642   304   196   263    41    36    53   1535
U.K.      210   252   348   486   125    70    62   1553
U.S.      197   203   250   478   100    58    29   ...
Total    1520  1421  1473  2033   446   273   274   ...

13.12%   17.11   20.97   31.96    6.52    4.19    6.13
17.85    26.38   23.71   21.24    5.34    3.00    2.48
41.82    19.80   12.77   17.13    2.67    2.35    3.45
13.52    16.23   22.41   31.29    8.05    4.51    3.99
14.98    15.44   19.01   36.35    7.60    4.41    2.21   ...

...       396   325   318   397    83    37    40
20–29     337   326   312   376    83    43    37
30–39     300   307   317   403    88    53    53
40–49     252   254   270   423    93    58    56
50–59     142   123   150   224    54    37    36
60+        93    86   106   210    45    45    52
Total    1520  1421
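As a rough check on the first run of counts above, the sketch below regroups them into five rows of seven categories, recomputes the row percentages, and applies the $\chi^2$ homogeneity calculation (expected counts, statistic, degrees of freedom). The row and column grouping is an assumption inferred from the printed totals and percentages, the category names are not shown in the source, and the threshold of 5 for expected counts is the usual convention rather than something this excerpt spells out.

```python
# Counts regrouped from the flattened table above: five rows (groups) by
# seven categories. This grouping is an assumption inferred from the
# printed totals and row percentages; category names are not in the source.
counts = [
    [197, 257, 315, 480, 98, 63, 92],
    [274, 405, 364, 326, 82, 46, 38],
    [642, 304, 196, 263, 41, 36, 53],
    [210, 252, 348, 486, 125, 70, 62],
    [197, 203, 250, 478, 100, 58, 29],
]

row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]
grand_total = sum(row_totals)

# Row percentages: each count as a percent of its own row total.
row_percents = [
    [100 * c / rt for c in row] for row, rt in zip(counts, row_totals)
]

# Expected counts under homogeneity: row total * column total / grand total.
expected = [
    [rt * ct / grand_total for ct in col_totals] for rt in row_totals
]

# Chi-square statistic: sum of (Obs - Exp)^2 / Exp over every cell.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(counts, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom for homogeneity: (r - 1)(c - 1).
df = (len(counts) - 1) * (len(counts[0]) - 1)

# Sample-size condition, using the conventional threshold of 5.
all_expected_at_least_5 = all(e >= 5 for row in expected for e in row)

print("row totals:", row_totals)
print("column totals:", col_totals)
print("first row percentages:", [round(p, 2) for p in row_percents[0]])
print("chi-square:", round(chi_square, 2), "df:", df)
print("all expected counts >= 5:", all_expected_at_least_5)
```

Turning the statistic into a P-value needs a $\chi^2$ distribution function, which the standard library does not provide, so that step is left to statistical software.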