Statistics for Managers Using Microsoft Excel, 8th Global Edition, by Levine, Stephan, and Szabat


A Roadmap for Selecting a Statistical Method

Data analysis task: Describing a group or several groups
  For numerical variables:
    Ordered array, stem-and-leaf display, frequency distribution, relative frequency distribution, percentage distribution, cumulative percentage distribution, histogram, polygon, cumulative percentage polygon, sparklines, gauges, treemaps (Sections 2.2, 2.4, 2.6, 17.4)
    Mean, median, mode, geometric mean, quartiles, range, interquartile range, standard deviation, variance, coefficient of variation, skewness, kurtosis, boxplot, normal probability plot (Sections 3.1, 3.2, 3.3, 6.3)
    Index numbers (online Section 16.8)
  For categorical variables:
    Summary table, bar chart, pie chart, doughnut chart, Pareto chart (Sections 2.1 and 2.3)

Data analysis task: Inference about one group
  For numerical variables:
    Confidence interval estimate of the mean (Sections 8.1 and 8.2)
    t test for the mean (Section 9.2)
    Chi-square test for a variance or standard deviation (online Section 12.7)
  For categorical variables:
    Confidence interval estimate of the proportion (Section 8.3)
    Z test for the proportion (Section 9.4)

Data analysis task: Comparing two groups
  For numerical variables:
    Tests for the difference in the means of two independent populations (Section 10.1)
    Wilcoxon rank sum test (Section 12.4)
    Paired t test (Section 10.2)
    F test for the difference between two variances (Section 10.4)
  For categorical variables:
    Z test for the difference between two proportions (Section 10.3)
    Chi-square test for the difference between two proportions (Section 12.1)
    McNemar test for two related samples (online Section 12.6)

Data analysis task: Comparing more than two groups
  For numerical variables:
    One-way analysis of variance for comparing several means (Section 11.1)
    Kruskal-Wallis test (Section 12.5)
    Two-way analysis of variance (Section 11.2)
    Randomized block design (online Section 11.3)
  For categorical variables:
    Chi-square test for differences among more than two proportions (Section 12.2)

Data analysis task: Analyzing the relationship between two variables
  For numerical variables:
    Scatter plot, time-series plot (Section 2.5)
    Covariance, coefficient of correlation (Section 3.5)
    Simple linear regression (Chapter 13)
    t test of correlation (Section 13.7)
    Time-series forecasting (Chapter 16)
    Sparklines (Section 2.6)
  For categorical variables:
    Contingency table, side-by-side bar chart, doughnut chart, PivotTables (Sections 2.1, 2.3, 2.6)
    Chi-square test of independence (Section 12.3)

Data analysis task: Analyzing the relationship between two or more variables
  For numerical variables:
    Multiple regression (Chapters 14 and 15)
    Regression trees (Section 17.5)
  For categorical variables:
    Multidimensional contingency tables (Section 2.6)
    Drilldown and slicers (Section 2.6)
    Logistic regression (Section 14.7)
    Classification trees (Section 17.5)

Available with MyStatLab™ for Your Business Statistics Courses

MyStatLab is the market-leading online learning management program for learning and teaching business statistics.

Statistical Software Support
Built-in tutorial videos and functionality make using the most popular software solutions seamless and intuitive. Tutorial videos, study cards, and manuals (for select titles) are available within MyStatLab and accessible at the point of use. Easily launch exercise and eText data sets into Excel or StatCrunch, or copy and paste into any other software program.

Leverage the Power of StatCrunch
MyStatLab leverages the power of StatCrunch, powerful web-based statistical software. In addition, access to the full online community allows users to take advantage of a wide variety of resources and applications at www.statcrunch.com.

Bring Statistics to Life
Virtually flip coins, roll dice, draw cards, and interact with animations on your mobile device with the extensive menu of experiments and applets in StatCrunch. Offering a number of ways to practice resampling procedures, such as permutation tests and bootstrap confidence intervals, StatCrunch is a complete and modern solution.

www.mystatlab.com

Statistics for Managers Using Microsoft® Excel
8th Edition, Global Edition

David M. Levine
Department of Statistics and Computer Information Systems
Zicklin School of Business, Baruch College, City University of New York

David F. Stephan
Two Bridges Instructional Technology

Kathryn A. Szabat
Department of Business Systems and Analytics
School of Business, La Salle University

Harlow, England • London • New York • Boston • San Francisco • Toronto • Sydney • Dubai • Singapore • Hong Kong • Tokyo • Seoul • Taipei • New Delhi • Cape Town • Sao Paulo • Mexico City • Madrid • Amsterdam • Munich • Paris • Milan

Editorial Director: Chris Hoag
Editor in Chief: Deirdre Lynch
Acquisitions Editor: Suzanna Bainbridge
Senior Editorial Assistant: Justin Billing
Program Manager: Chere Bemelmans
Project Manager: Sherry Berg
Project Manager, Global Edition: Laura Perry
Associate Acquisitions Editor, Global Edition: Ananya Srivastava
Associate Project Editor, Global Edition: Paromita Banerjee
Manager, Media Production, Global Edition: M Vikram Kumar
Manufacturing Controller, Production, Global Edition: Kay Holman
Program Management Team Lead: Karen Wernholm
Project Management Team Lead: Peter Silvia
Media Producer: Jean Choe
TestGen Content Manager: John Flanagan
QA Manager, Assessment Content: Marty Wright
Senior MathXL Content Developer: Robert Carroll
Marketing Manager: Tiffany Bitzel
Field Marketing Manager: Adam Goldstein
Marketing Assistant: Jennifer Myers
Senior Author Support/Technology Specialist: Joe Vetere
Manager, Rights Management: Gina Cheselka
Senior Procurement Specialist: Carol Melville
Associate Director of Design: Andrea Nix
Program Design Lead and Cover Design: Barbara Atkinson
Text Design: Lumina Datamatics, Inc.
Production Coordination, Composition, and Illustrations: Lumina Datamatics
Cover Image: Shutterstock

MICROSOFT® AND WINDOWS® ARE REGISTERED TRADEMARKS OF THE MICROSOFT CORPORATION IN THE U.S.A. AND OTHER COUNTRIES. THIS BOOK IS NOT SPONSORED OR ENDORSED BY OR AFFILIATED WITH THE MICROSOFT CORPORATION. ILLUSTRATIONS OF MICROSOFT EXCEL IN THIS BOOK HAVE BEEN TAKEN FROM MICROSOFT EXCEL 2013, UNLESS OTHERWISE INDICATED.

MICROSOFT AND/OR ITS RESPECTIVE SUPPLIERS MAKE NO REPRESENTATIONS ABOUT THE SUITABILITY OF THE INFORMATION CONTAINED IN THE DOCUMENTS AND RELATED GRAPHICS PUBLISHED AS PART OF THE SERVICES FOR ANY PURPOSE. ALL SUCH DOCUMENTS AND RELATED GRAPHICS ARE PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND. MICROSOFT AND/OR ITS RESPECTIVE SUPPLIERS HEREBY DISCLAIM ALL WARRANTIES AND CONDITIONS WITH REGARD TO THIS INFORMATION, INCLUDING ALL WARRANTIES AND CONDITIONS OF MERCHANTABILITY, WHETHER EXPRESS, IMPLIED OR STATUTORY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT SHALL MICROSOFT AND/OR ITS RESPECTIVE SUPPLIERS BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF INFORMATION AVAILABLE FROM THE SERVICES.

THE DOCUMENTS AND RELATED GRAPHICS CONTAINED HEREIN COULD INCLUDE TECHNICAL INACCURACIES OR TYPOGRAPHICAL ERRORS. CHANGES ARE PERIODICALLY ADDED TO THE INFORMATION HEREIN. MICROSOFT AND/OR ITS RESPECTIVE SUPPLIERS MAY MAKE IMPROVEMENTS AND/OR CHANGES IN THE PRODUCT(S) AND/OR THE PROGRAM(S) DESCRIBED HEREIN AT ANY TIME. PARTIAL SCREEN SHOTS MAY BE VIEWED IN FULL WITHIN THE SOFTWARE VERSION SPECIFIED.

Pearson Education Limited
Edinburgh Gate
Harlow
Essex CM20 2JE
England
and Associated Companies throughout the world

Visit us on the World Wide Web at: www.pearsonglobaleditions.com

© Pearson Education Limited 2017

The rights of David M. Levine, David F. Stephan, and Kathryn A. Szabat to be identified as the authors of this work have been asserted by them in accordance with the Copyright, Designs and Patents Act 1988.

Authorized adaptation from the United States edition, entitled Statistics for Managers Using Microsoft Excel, 8th edition, ISBN 978-0-13-417305-4, by David M. Levine, David F. Stephan, and Kathryn A. Szabat, published by Pearson Education © 2017.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in
any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without either the prior written permission of the publisher or a license permitting restricted copying in the United Kingdom issued by the Copyright Licensing Agency Ltd, Saffron House, 6–10 Kirby Street, London EC1N 8TS.

All trademarks used herein are the property of their respective owners. The use of any trademark in this text does not vest in the author or publisher any trademark ownership rights in such trademarks, nor does the use of such trademarks imply any affiliation with or endorsement of this book by such owners.

ISBN 10: 1-292-15634-1
ISBN 13: 978-1-292-15634-7

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

To our spouses and children, Marilyn, Sharyn, Mary, and Mark,
and to our parents, in loving memory, Lee, Reuben, Ruth, Francis, Mary, and William

About the Authors

David M. Levine, David F. Stephan, and Kathryn A. Szabat are all experienced business school educators committed to innovation and improving instruction in business statistics and related subjects.

[Photo: Kathryn Szabat, David Levine, and David Stephan]

David Levine, Professor Emeritus of Statistics and CIS at Baruch College, CUNY, has been a nationally recognized innovator in statistics education for more than three decades. Levine has coauthored 14 books, including several business statistics textbooks; textbooks and professional titles that explain and explore quality management and the Six Sigma approach; and, with David Stephan, a trade paperback that explains statistical concepts to a general audience. Levine has presented or chaired numerous sessions about business education at leading conferences conducted by the Decision Sciences Institute (DSI) and the American Statistical Association, and he and his coauthors have been active participants in the annual DSI Making Statistics More Effective in Schools and Business (MSMESB) mini-conference. During his many years teaching at Baruch College, Levine was recognized for his contributions to teaching and curriculum development with the College’s highest distinguished teaching honor. He earned B.B.A. and M.B.A. degrees from CCNY and a Ph.D. in industrial engineering and operations research from New York University.

Advances in computing have always shaped David Stephan’s professional life. As an undergraduate, he helped professors use statistics software that was considered advanced even though it could compute only several things discussed in Chapter 3, thereby gaining an early appreciation for the benefits of using software to solve problems (and perhaps positively influencing his grades). An early advocate of using computers to support instruction, he developed a prototype of a mainframe-based system that anticipated features found today in Pearson’s MathXL and served as special assistant for computing to the Dean and Provost at Baruch College. In his many years teaching at Baruch, Stephan implemented the first computer-based classroom, helped redevelop the CIS curriculum, and, as part of a FIPSE project team, designed and implemented a multimedia learning environment. He was also nominated for teaching honors. Stephan has presented at the SEDSI conference and the DSI MSMESB mini-conferences, sometimes with his coauthors. Stephan earned a B.A. from Franklin & Marshall College and an M.S. from Baruch College, CUNY, and he studied instructional technology at Teachers College, Columbia University.

As Associate Professor of Business Systems and Analytics at La Salle University, Kathryn Szabat has transformed several business school majors into one interdisciplinary major that better supports careers in new and emerging disciplines of data analysis including analytics. Szabat strives to inspire, stimulate, challenge, and motivate students through innovation and curricular enhancements, and shares her coauthors’ commitment to teaching excellence and the
continual improvement of statistics presentations. Beyond the classroom she has provided statistical advice to numerous business, nonbusiness, and academic communities, with particular interest in the areas of education, medicine, and nonprofit capacity building. Her research activities have led to journal publications, chapters in scholarly books, and conference presentations. Szabat is a member of the American Statistical Association (ASA), DSI, Institute for Operations Research and the Management Sciences (INFORMS), and DSI MSMESB. She received a B.S. from SUNY-Albany, an M.S. in statistics from the Wharton School of the University of Pennsylvania, and a Ph.D. degree in statistics, with a cognate in operations research, from the Wharton School of the University of Pennsylvania.

For all three coauthors, continuous improvement is a natural outcome of their curiosity about the world. Their varied backgrounds and many years of teaching experience have come together to shape this book in ways discussed in the Preface.

Brief Contents

Preface
First Things First
1 Defining and Collecting Data
2 Organizing and Visualizing Variables
3 Numerical Descriptive Measures
4 Basic Probability
5 Discrete Probability Distributions
6 The Normal Distribution and Other Continuous Distributions
7 Sampling Distributions
8 Confidence Interval Estimation
9 Fundamentals of Hypothesis Testing: One-Sample Tests
10 Two-Sample Tests
11 Analysis of Variance
12 Chi-Square Tests and Nonparametric Tests
13 Simple Linear Regression
14 Introduction to Multiple Regression
15 Multiple Regression Model Building
16 Time-Series Forecasting
17 Getting Ready To Analyze Data In The Future
18 Statistical Applications in Quality Management (online)
19 Decision Making (online)
Appendices A–G
Self-Test Solutions and Answers to Selected Even-Numbered Problems
Index
Credits

Contents

Preface

First Things First
  Using Statistics: “The Price of Admission”
  Now Appearing on Broadway and Everywhere Else
  FTF.1 Think Differently About Statistics
    Statistics: A Way of Thinking; Analytical Skills More Important than Arithmetic Skills; Statistics: An Important Part of Your Business Education
  FTF.2 Business Analytics: The Changing Face of Statistics
    “Big Data”; Structured Versus Unstructured Data
  FTF.3 Getting Started Learning Statistics
    Statistic; Can Statistics (pl., Statistic) Lie?
  FTF.4 Preparing to Use Microsoft Excel for Statistics
    Reusability Through Recalculation; Practical Matters: Skills You Need; Ways of Working with Excel; Excel Guides; Which Excel Version to Use?; Conventions Used
  References; Key Terms
  Excel Guide
    EG.1 Entering Data; EG.2 Reviewing Worksheets; EG.3 If You Plan to Use the Workbook Instructions

1 Defining and Collecting Data
  Using Statistics: Defining Moments
  1.1 Defining Variables
    Classifying Variables by Type; Measurement Scales
  1.2 Collecting Data
    Populations and Samples; Data Sources
  1.3 Types of Sampling Methods
    Simple Random Sample; Systematic Sample; Stratified Sample; Cluster Sample
  1.4 Data Preparation
    Data Cleaning; Data Formatting; Stacked and Unstacked Variables; Recoding Variables
  1.5 Types of Survey Errors
    Coverage Error; Nonresponse Error; Sampling Error; Measurement Error; Ethical Issues About Surveys
  Consider This: New Media Surveys/Old Survey Errors
  Using Statistics: Defining Moments, Revisited
  Summary; References; Key Terms; Checking Your Understanding; Chapter Review Problems
  Cases For Chapter 1
    Managing Ashland MultiComm Services; CardioGood Fitness; Clear Mountain State Student Survey; Learning with the Digital Cases
  Chapter 1 Excel Guide
    EG1.1 Defining Variables; EG1.2 Collecting Data; EG1.3 Types of Sampling Methods; EG1.4 Data Preparation

2 Organizing and Visualizing Variables
  Using Statistics: “The Choice Is Yours”
  2.1 Organizing Categorical Variables
    The Summary Table; The Contingency Table
  2.2 Organizing Numerical Variables
    The Frequency Distribution; Classes and Excel Bins; The Relative Frequency Distribution and the Percentage Distribution; The Cumulative Distribution
  2.3 Visualizing Categorical Variables
    The Bar Chart; The Pie Chart and the Doughnut Chart; The Pareto Chart; Visualizing Two Categorical Variables
  2.4 Visualizing Numerical Variables
    The Stem-and-Leaf Display; The Histogram; The Percentage Polygon; The Cumulative Percentage Polygon (Ogive)
  2.5 Visualizing Two Numerical Variables
    The Scatter Plot; The Time-Series Plot
  2.6 Organizing and Visualizing a Mix of Variables
    Multidimensional Contingency Table; Adding a Numerical Variable to a Multidimensional Contingency Table; Drill Down; Excel Slicers; PivotChart; Sparklines
  2.7 The Challenge in Organizing and Visualizing Variables
    Obscuring Data; Creating False Impressions; Chartjunk; EXHIBIT: Best Practices for Creating Visualizations
  Using Statistics: The Choice Is Yours, Revisited
  Summary; References; Key Equations; Key Terms; Checking Your Understanding; Chapter Review Problems
  Cases For Chapter 2
    Managing Ashland MultiComm Services; Digital Case; CardioGood Fitness; The Choice Is Yours Follow-Up; Clear Mountain State Student Survey
  Chapter 2 Excel Guide
    EG2.1 Organizing Categorical Variables; EG2.2 Organizing Numerical Variables; EG2.3 Visualizing Categorical Variables; EG2.4 Visualizing Numerical Variables; EG2.5 Visualizing Two Numerical Variables; EG2.6 Organizing and Visualizing a Set of Variables

3 Numerical Descriptive Measures
  Using Statistics: More Descriptive Choices
  3.1 Central Tendency
    The Mean; The Median; The Mode; The Geometric Mean
  3.2 Variation and Shape
    The Range; The Variance and the Standard Deviation; EXHIBIT: Manually Calculating the Sample Variance, \(S^2\), and Sample Standard Deviation, \(S\); The Coefficient of Variation; Z Scores; Shape: Skewness; Shape: Kurtosis
  3.3 Exploring Numerical Data
    Quartiles; EXHIBIT: Rules for Calculating the Quartiles from a Set of Ranked Values; The Interquartile Range; The Five-Number Summary; The Boxplot
  3.4 Numerical Descriptive Measures for a Population
    The Population Mean; The Population Variance and Standard Deviation; The Empirical Rule; Chebyshev’s Theorem
  3.5 The Covariance and the Coefficient of Correlation
    The Covariance; The Coefficient of Correlation
  3.6 Statistics: Pitfalls and Ethical Issues
  Using Statistics: More Descriptive Choices, Revisited
  Summary; References; Key Equations; Key Terms; Checking Your Understanding; Chapter Review Problems
  Cases For Chapter 3
    Managing Ashland MultiComm Services; Digital Case; CardioGood Fitness; More Descriptive Choices Follow-up; Clear Mountain State Student Survey
  Chapter 3 Excel Guide
    EG3.1 Central Tendency; EG3.2 Variation and Shape; EG3.3 Exploring Numerical Data; EG3.4 Numerical Descriptive Measures for a Population; EG3.5 The Covariance and the Coefficient of Correlation

4 Basic Probability
  Using Statistics: Possibilities at M&R Electronics World
  4.1 Basic Probability Concepts
    Events and Sample Spaces; Contingency Tables; Simple Probability; Joint Probability; Marginal Probability; General Addition Rule

Self-Test Solutions and Answers to Selected Even-Numbered Problems

price can be explained by variation in number of rooms. \(r^2_{Y2.1} = 0.431\). Holding constant the effect of number of rooms, 43.1% of the variation in selling price can be
explained by variation in neighborhood. (k) The slope of selling price with number of rooms is the same, regardless of whether the house is located in an east or west neighborhood. (l) \(\hat{Y} = 253.95 + 8.032X_1 - 5.90X_2 + 2.089X_1X_2\). For \(X_1X_2\), p-value = 0.330. Do not reject \(H_0\). There is no evidence that the interaction term makes a contribution to the model. (m) The model in (b) should be used. (n) The number of rooms and the neighborhood both significantly affect the selling price, but the number of rooms has a greater effect.

14.42 (a) Predicted time = 8.01 + 0.00523 Depth - 2.105 Dry. (b) Holding constant the effect of type of drilling, for each foot increase in depth of the hole, the mean drilling time is estimated to increase by 0.00523 minutes. For a given depth, a dry drilling hole is estimated to reduce the drilling time over wet drilling by a mean of 2.1052 minutes. (c) 6.428 minutes, \(6.210 \le \mu_{Y|X} \le 6.646\), \(4.923 \le Y_X \le 7.932\). (d) The model appears to be adequate. (e) \(F_{STAT} = 111.11 > 3.09\); reject \(H_0\). (f) \(t_{STAT} = 5.03 > 1.9847\); reject \(H_0\). \(t_{STAT} = -14.03 < -1.9847\); reject \(H_0\). Include both variables. (g) \(0.0032 \le \beta_1 \le 0.0073\). (h) \(-2.403 \le \beta_2 \le -1.808\). (i) 69.0%. (j) 0.207, 0.670. (k) The slope of the additional drilling time with the depth of the hole is the same, regardless of the type of drilling method used. (l) The p-value of the interaction term = 0.462 > 0.05, so the term is not significant and should not be included in the model. (m) The model in part (b) should be used. Both variables affect the drilling time. Dry drilling holes should be used to reduce the drilling time.

14.44 (a) \(\hat{Y} = 1.2604 - 0.0071X_1 + 0.0541X_2 - 0.0006X_1X_2\), where \(X_1\) = efficiency ratio, \(X_2\) = total risk-based capital, p-value = 0.1037 > 0.05. Do not reject \(H_0\). There is not enough evidence that the interaction term makes a contribution to the model. (b) Because there is insufficient evidence of any interaction effect between efficiency ratio and total risk-based capital, the model in Problem 14.4 should be used.

14.46 (a) The p-value of the interaction term = 0.0347 < 0.05, so the term is significant and should be included in the model. (b) Use the model developed in this problem.

14.48 (a) For \(X_1X_2\), p-value = 0.2353 > 0.05. Do not reject \(H_0\). There is insufficient evidence that the interaction term makes a contribution to the model. (b) Because there is not enough evidence of an interaction effect between total staff present and remote hours, the model in Problem 14.7 should be used.

14.50 Holding constant the effect of other variables, the natural logarithm of the estimated odds ratio for the dependent categorical response will increase by 2.2 for each unit increase in the particular independent variable.

14.54 (a) \(\ln(\text{estimated odds ratio}) = -6.9394 + 0.1395X_1 + 2.7743X_2 = -6.9394 + 0.1395(36) + 2.7743(0) = -1.91908\). Estimated odds ratio = 0.1470. Estimated probability of success = odds ratio/(1 + odds ratio) = 0.1470/(1 + 0.1470) = 0.1260. (b) From the text discussion of the example, 70.2% of the individuals who charge $36,000 per annum and possess additional cards can be expected to purchase the premium card. Only 12.60% of the individuals who charge $36,000 per annum and do not possess additional cards can be expected to purchase the premium card. For a given amount of money charged per annum, the likelihood of purchasing a premium card is substantially higher among individuals who already possess additional cards than for those who do not possess additional cards. (c) \(\ln(\text{estimated odds ratio}) = -6.9394 + 0.1395X_1 + 2.7743X_2 = -6.9394 + 0.1395(18) + 2.7743(0) = -4.4298\). Estimated odds ratio = \(e^{-4.4298} = 0.0119\). Estimated probability of success = odds ratio/(1 + odds ratio) = 0.0119/(1 + 0.0119) = 0.01178. (d) Among individuals who do not purchase additional cards, the likelihood of purchasing a premium card diminishes dramatically with a substantial decrease in the amount charged per annum.

14.58 (a) \(\ln(\text{estimated odds}) = -0.6048 + 0.0938\) claims/year \(+\ 1.8108\) new business. (b) Holding constant the
effects of whether the policy is new, for each increase of the number of claims submitted per year by the policy holder, ln(odds) increases by an estimate of 0.0938. Holding constant the number of claims submitted per year by the policy holder, ln(odds) is estimated to be 1.8108 higher when the policy is new as compared to when the policy is not new. (c) \(\ln(\text{estimated odds ratio}) = 1.2998\). Estimated odds ratio = 3.6684. Estimated probability of a fraudulent claim = 0.7858. (d) The deviance statistic is 119.4353 with a p-value = 0.0457 < 0.05. Reject \(H_0\). The model is not a good fitting model. (e) For claims/year: \(Z_{STAT} = 0.1865\), p-value = 0.8521 > 0.05. Do not reject \(H_0\). There is insufficient evidence that the number of claims submitted per year by the policy holder makes a significant contribution to the logistic regression model. For new business: \(Z_{STAT} = 2.2261\), p-value = 0.0260 < 0.05. Reject \(H_0\). There is sufficient evidence that whether the policy is new makes a significant contribution to the logistic regression model. (f) \(\ln(\text{estimated odds}) = -1.0125 + 0.9927\) claims/year. (g) \(\ln(\text{estimated odds}) = -0.5423 + 1.9286\) new business. (h) The deviance statistic for (f) is 125.0102 with a p-value = 0.0250 < 0.05. Reject \(H_0\). The model is not a good fitting model. The deviance statistic for (g) is 119.4702 with a p-value = 0.0526 > 0.05. Do not reject \(H_0\). The model is a good fitting model. The model in (g) should be used to predict a fraudulent claim.

14.60 (a) ln(estimated odds) = 1.252 - 0.0323 Age + 2.2165 subscribes to the wellness newsletters. (b) Holding constant the effect of subscribes to the wellness newsletters, for each increase of one year in age, ln(estimated odds) decreases by an estimate of 0.0323. Holding constant the effect of age, for a customer who subscribes to the wellness newsletters, ln(estimated odds) increases by an estimate of 2.2165. (c) 0.912. (d) Deviance = 102.8762, p-value = 0.3264. Do not reject \(H_0\), so the model is adequate. (e) For Age: \(Z = -1.8053 > -1.96\). Do not reject \(H_0\). For subscribes to the wellness newsletters: \(Z = 4.3286 > 1.96\). Reject \(H_0\). (f) Only subscribes to wellness newsletters is useful in predicting whether a customer will purchase organic food.

14.80 The \(r^2\) of the multiple regression is very low, at 0.0645. Only 6.45% of the variation in thickness can be explained by the variation of pressure and temperature. The F test statistic for the model including pressure and temperature is 1.621, with p-value = 0.2085. Hence, at a 5% level of significance, there is not enough evidence to conclude that pressure and/or temperature affect thickness. The p-value of the t test for the significance of pressure is 0.8307 > 0.05. Hence, there is insufficient evidence to conclude that pressure affects thickness, holding constant the effect of temperature. The p-value of the t test for the significance of temperature is 0.0820, which is also > 0.05. There is insufficient evidence to conclude that temperature affects thickness at the 5% level of significance, holding constant the effect of pressure. Hence, neither pressure nor temperature affects thickness individually. The normal probability plot does not suggest any potential violation of the normality assumption. The residual plots do not indicate potential violation of the equal variance assumption. The temperature residual plot, however, suggests that there might be a nonlinear relationship between temperature and thickness. The \(r^2\) of the multiple regression model that includes the interaction of pressure and temperature is very low, at 0.0734. Only 7.34% of the variation in thickness can be explained by the variation of pressure, temperature, and the interaction of the two. The F test statistic for the model that includes pressure and temperature and their interaction is 1.214, with a p-value of 0.3153. Hence, at a 5% level of significance, there is insufficient evidence to conclude that pressure, temperature, and the interaction of the two
affect thickness. The p-values of the t tests for the significance of pressure, temperature, and the interaction term are 0.5074, 0.4053, and 0.5111, respectively, which are all greater than 5%. Hence, there is insufficient evidence to conclude that pressure, temperature, or the interaction individually affects thickness, holding constant the effect of the other variables. The pattern in the normal probability plot and residual plots is similar to that in the regression without the interaction term. Hence the article’s suggestion that there is a significant interaction between the pressure and the temperature in the tank cannot be validated.

14.82 \(b_0 = 18.2892\); die temperature: \(b_1 = 0.5976\); die diameter: \(b_2 = -13.5108\). The \(r^2\) of the multiple regression model is 0.3257, so 32.57% of the variation in unit density can be explained by the variation of die temperature and die diameter. The F test statistic for the combined significance of die temperature and die diameter is 5.0718 with a p-value of 0.0160. Hence, at a 5% level of significance, there is enough evidence to conclude that die temperature and die diameter affect unit density. The p-value of the t test for the significance of die temperature is 0.2117, which is greater than 5%. Hence, there is insufficient evidence to conclude that die temperature affects unit density holding constant the effect of die diameter. The p-value of the t test for the significance of die diameter is 0.0083, which is less than 5%. There is enough evidence to conclude that die diameter affects unit density at the 5% level of significance holding constant the effect of die temperature. After removing die temperature from the model, \(b_0 = 107.9267\); die diameter: \(b_1 = -13.5108\). The \(r^2\) of the regression is 0.2724, so 27.24% of the variation in unit density can be explained by the variation of die diameter. The p-value of the t test for the significance of die diameter is 0.0087, which is less than 5%. There is enough evidence to conclude that die diameter affects unit density at the 5% level of significance. There is some lack of equality in the residuals and some departure from normality.

Chapter 15

15.2 (a) Predicted HOCS is 2.8600, 3.0342, 3.1948, 3.3418, 3.4752, 3.5950, 3.7012, 3.7938, 3.8728, 3.9382, 3.99, 4.0282, 4.0528, 4.0638, 4.0612, 4.045, 4.0152, 3.9718, 3.9148, 3.8442, and 3.76. (c) The curvilinear relationship suggests that HOCS increases at a decreasing rate. It reaches its maximum value of 4.0638 at GPA = 3.3 and declines after that as GPA continues to increase. (d) An \(r^2\) of 0.07 and an adjusted \(r^2\) of 0.06 tell you that GPA has very low explanatory power in identifying the variation in HOCS. You can tell that the individual HOCS scores are scattered widely around the curvilinear relationship.

15.6 (b) \(\text{price} = 17{,}429.2098 - 1{,}797.0198\,\text{age} + 62.843\,\text{age}^2\). (c) \(17{,}429.2098 - 1{,}797.0198(5) + 62.843(5)^2\) = $10,015.19. (d) There are no patterns in any of the residual plots. (e) \(F_{STAT} = 363.0192 > 3.18\). Reject \(H_0\). There is a significant quadratic relationship between age and price. (f) p-value = 0.0000. The probability of \(F_{STAT} = 363.0192\) or higher is 0.0000. (g) \(t_{STAT} = 5.9707 > 2.0281\). Reject \(H_0\). (h) The probability of \(t_{STAT} = 5.9707\) or higher is 0.0000. (i) \(r^2 = 0.9528\). 95.28% of the variation in price can be explained by the quadratic relationship between age and price. (j) Adjusted \(r^2 = 0.9501\). (k) There is a strong quadratic relationship between age and price.

15.8 (a) 215.37. (b) For each additional unit of the logarithm of \(X_1\), the logarithm of \(Y\) is estimated to increase by 0.9 unit, holding all other variables constant. For each additional unit of the logarithm of \(X_2\), the logarithm of \(Y\) is estimated to increase by 1.41 units, holding all other variables constant.

15.12 (a) Predicted \(\ln(\text{Price}) = 9.7435 - 0.1065\,\text{Age}\). (b) $10,005.20. (c) Because there is a lack of equal variance in the residuals, the model is not adequate. (d) \(t_{STAT} = -19.0145 < -2.0262\); reject \(H_0\). (e) 90.72%. 90.72% of the variation in the natural log of price can be
explained by the age of the auto. (f) 90.47% (g) Choose the model from Problem 15.6. That model has a higher adjusted r² of 95.01%.

15.14 1.25

15.16 R₁² = 0.0681, VIF₁ = 1/(1 - 0.0681) = 1.073; R₂² = 0.0681, VIF₂ = 1/(1 - 0.0681) = 1.073. There is no evidence of collinearity.

15.18 VIF = 1.0069. There is no evidence of collinearity.

15.30 (a) An analysis of the linear regression model with all three possible independent variables reveals that the highest VIF is only 1.06. A stepwise regression model selects only the supplier dummy variable for inclusion in the model. A best-subsets regression produces only one model that has a Cp value less than or equal to k + 1, which is the model that includes pressure and the supplier dummy variable. This model is Ŷ = -31.5929 + 0.7879X₂ + 13.1029X₃. This model has F = 5.1088 with a p-value = 0.027, r² = 0.4816, and adjusted r² = 0.3873. A residual analysis does not reveal any strong patterns. The errors appear to be normally distributed.

15.38 In the multiple regression model with catalyst, pH, pressure, temperature, and voltage as independent variables, none of the variables has a VIF value of 5 or larger. The best-subsets approach showed that only the model containing X₁, X₂, X₃, X₄, and X₅ should be considered, where X₁ = catalyst, X₂ = pH, X₃ = pressure, X₄ = temp, and X₅ = voltage. Looking at the p-values of the t statistics for each slope coefficient of the model that includes X₁ through X₅ reveals that pH level is not significant at the 5% level of significance (p-value = 0.2862). The multiple regression model with pH level deleted shows that all coefficients are significant individually at the 5% level of significance. The best linear model is determined to be Ŷ = 3.6833 + 0.1548X₁ - 0.04197X₃ - 0.4036X₄ + 0.4288X₅. The overall model has F = 77.0793, with a p-value that is virtually 0, r² = 0.8726, and adjusted r² = 0.8613. The normal probability plot does not suggest possible violation of the normality assumption. A residual analysis reveals a potential nonlinear
relationship in temperature. The p-value of the squared term for temperature (0.1273) in the quadratic transformation of temperature does not support the need for a quadratic transformation at the 5% level of significance. The p-value of the interaction term between pressure and temperature (0.0780) indicates that there is not enough evidence of an interaction at the 5% level of significance. The best model is the one that includes catalyst, pressure, temperature, and voltage, which explains 87.26% of the variation in thickness.

Chapter 16

16.2 (a) 1988 (b) The first four years and the last four years

16.4 (b), (c), (e)

Year   Hours Per Day   MA(3)    ES(W = 0.5)   ES(W = 0.25)
2008   2.2             #N/A     2.2000        2.2000
2009   2.3             2.3000   2.2500        2.2250
2010   2.4             2.4333   2.3250        2.2688
2011   2.6             2.5000   2.4625        2.3516
2012   2.5             2.4667   2.4813        2.3887
2013   2.3             2.4000   2.3906        2.3665
2014   2.4             #N/A     2.3953        2.3749

(d) W = 0.5: Ŷ₂₀₁₅ = E₂₀₁₄ = 2.3953; W = 0.25: Ŷ₂₀₁₅ = E₂₀₁₄ = 2.3749 (f) The exponentially smoothed forecast for 2015 with W = 0.5 is slightly higher than that with W = 0.25. A smoothing coefficient of W = 0.25 smooths out the hours more than W = 0.50.

16.6 (a), (b), (c), and (e) See chart.
[Chart: Euro Stoxx Index, 1995–2015; value of index versus year, with MA(3), ES(W = 0.5), and ES(W = 0.25) series]
(d)

Year   Euro Stoxx Index   MA(3)    ES(W = 0.5)   ES(W = 0.25)
1995   1507               #N/A     1507.0        1507.0
1996   1850               1963.0   1678.5        1592.8
1997   2532               2574.7   2105.3        1827.6
1998   3342               3592.7   2723.6        2206.2
1999   4904               4339.3   3813.8        2880.6
2000   4772               4494.0   4292.9        3353.5
2001   3806               3654.7   4049.5        3466.6
2002   2386               2984.3   3217.7        3196.5
2003   2761               2699.3   2989.4        3087.6
2004   2951               3097.0   2970.2        3053.4
2005   3579               3550.0   3274.6        3184.8
2006   4120               4033.0   3697.3        3418.6
2007   4400               3657.0   4048.6        3664.0
2008   2451               3272.3   3249.8        3360.7
2009   2966               2736.7   3107.9        3262.0
2010   2793               2692.0   2950.5        3144.8
2011   2317               2582.0   2633.7        2937.8
2012   2636               2687.3   2634.9        2862.4
2013   3109               2963.7   2871.9        2924.0
2014   3146               3174.3   3009.0        2979.5
2015   3268               #N/A     3138.5        3051.6

(f) The forecasts for 2016 using the two weights are very close. In general, a higher weight allows for a faster adaptation to changes, while a lower weight is better suited for eliminating unwanted cyclical and irregular variations. (g) According to the exponential smoothing with W = 0.25, and focusing on the period since 2000, the index is very stable; that is, overall, the stocks included in the index have shown a constant performance since 2000.

16.8 (a), (b), (c), and (e) See chart.
[Chart: European Union Population, 2004–2015; million inhabitants versus year, with MA(3), ES(W = 0.5), and ES(W = 0.25) series]
(d) Population is in million inhabitants. The last three columns are absolute errors.

Year   Population   MA(3)   ES(W = 0.5)   ES(W = 0.25)   Abs. error MA(3)   Abs. error ES(W = 0.5)   Abs. error ES(W = 0.25)
2004   493          #N/A    493.0         493.0
2005   495          #N/A    494.0         493.5
2006   496          494.7   495.0         494.1          1.3                1                        1.875
2007   498          496.3   496.5         495.1          1.7                1.5                      2.90625
2008   500          498.0   498.3         496.3          2.0                1.75                     3.6796875
2009   502          500.0   500.1         497.7          2.0                1.875                    4.25976563
2010   503          501.7   501.6         499.1          1.3                1.4375                   3.94482422
2011   504          503.0   502.8         500.3          1.0                1.21875                  3.70861816
2012   504          503.7   503.4         501.2          0.3                0.609375                 2.78146362
2013   505          504.3   504.2         502.2          0.7                0.8046875                2.83609772
2014   507          505.3   505.6         503.4          1.7                1.40234375               3.62707329
2015   508          506.7   506.8         504.5          1.3                1.20117188               3.47030497
Total                                                    13.3               12.8                     33.1

(f) The larger weight, 0.5, allows the exponential smoothing to adapt to the trend faster than the lower weight, 0.25, which means that the forecast is closer to the actual values. (g) There is a trend without noteworthy variations in the data; that is, all three “averaging” methods considered systematically underestimate the population. However, MA(3) and ES(W = 0.5) provide better forecasts because, in this case, the more recent data are more important.

16.10 (a) The Y intercept b₀ = 4.0 is the fitted trend value reflecting the real total revenues (in millions of dollars) during the origin, or base year, 1994. (b) The slope b₁ = 1.5 indicates that the real total revenues are increasing at an estimated rate of $1.5 million per year. (c) Year is 1998, X = 1998 - 1994 = 4, Ŷ = 4.0 + 1.5(4) = 10.0 million dollars (d) Year is 2015, X = 2015 - 1994 = 21, Ŷ = 4.0 + 1.5(21) = 35.5 million dollars (e) Year is 2018, X = 2018 - 1994 = 24, Ŷ = 4.0 + 1.5(24) = 40 million dollars

16.12 (b) Linear trend: Ŷ = 92.07 + 5.3757X, where X is relative to 2000 (c) Quadratic trend: Ŷ = 77.8553 + 11.9363X - 0.4686X², where X is relative to 2000 (d) Exponential trend: log₁₀Ŷ = 1.9476 + 0.0207X, where X is relative to 2000 (e) Linear trend: Ŷ₂₀₁₅ = 92.07 + 5.3757(15) = 172.7055; Ŷ₂₀₁₆ = 92.07 + 5.3757(16) = 178.0812. Quadratic trend: Ŷ₂₀₁₅ = 77.8553 + 11.9363(15) - 0.4686(15)² = 151.46; Ŷ₂₀₁₆ = 77.8553 + 11.9363(16) - 0.4686(16)² = 148.87. Exponential trend: Ŷ₂₀₁₅ = 10^(1.9476 + 0.0207(15)) = 181.18; Ŷ₂₀₁₆ = 10^(1.9476 + 0.0207(16)) = 189.73 (f) The quadratic and exponential trend models fit the data better than the linear trend model and, hence, either forecast should be used.

16.14 (a) See chart.
[Chart: UN’s Estimate of World Population from 1950 until 2015; persons versus year relative to 1950, with fitted line y = 8E+07x + 2E+09, R² = 0.9958]
(b) Xᵢ is the year considered relative to 1950. Ŷᵢ = 2,243,588,839 + 77,041,391.6Xᵢ (c) Code of the year 2016: 2016 - 1950 = 66; Ŷ = 2,243,588,839 + 77,041,391.6(66) ≈ 7,328,320,687. Code of the year 2017: 2017 - 1950 = 67; Ŷ = 2,243,588,839 + 77,041,391.6(67) ≈ 7,405,362,078 (d) In the time period considered, there is an almost linear trend. World population is growing at a rate of approximately 77 million persons per year.

16.16 (b) Linear trend: Ŷ = -2,548.4615 + 937.8462X, where X is relative to 2002 (c) Quadratic trend: Ŷ = 2,638.7912 - 1,891.5644X + 235.7842X², where X is relative to 2002 (d) Exponential trend: log₁₀Ŷ = 2.4183 + 0.1170X, where X is relative to 2002 (e) Linear trend: Ŷ₂₀₁₅ = 9,643.538 million KWh; Ŷ₂₀₁₆ = 10,581.38 million KWh. Quadratic trend: Ŷ₂₀₁₅ = 17,895.99 million KWh; Ŷ₂₀₁₆ = 22,370.6 million KWh. Exponential trend: Ŷ₂₀₁₅ = 8,684.42 million KWh; Ŷ₂₀₁₆ = 11,368.34 million KWh

16.18 (b) Linear trend: Ŷ = 2.0718 + 0.1250X, where X is relative to 2000 (c) Quadratic trend: Ŷ = 2.1391 + 0.0962X - 0.0019X², where X is relative to 2000 (d) Exponential trend: log₁₀Ŷ = 0.3329 + 0.0183X, where X is relative to 2000. All the models appear to fit the data equally well, so choose the linear model because it is the simplest. (f) The forecast using the linear model is Ŷ₂₀₁₆ = 4.072 millions.

16.20 (a) See chart.
[Chart: Intel Microprocessors; transistor count versus year relative to 1971, with linear fit y = 7E+07x - 1E+09, polynomial fit y = 6E+06x² - 2E+08x + 9E+08, and exponential fit y = 2009.6e^(0.3368x)]
(b) In some years, there are several points with very different values; this reflects the different lines of processors that Intel produces (e.g., energy-efficient microprocessors for laptop computers and high-performance microprocessors for servers). Overall, there was rapid growth in the transistor count, in particular in the past 15 years. (c) Xᵢ is the year considered relative to 1971. Ŷᵢ = -1,170,207,680.7 + 72,720,565.7Xᵢ (d) Xᵢ is the year considered relative to 1971. Ŷᵢ = 871,617,341 - 189,306,072.3Xᵢ + 5,506,682.4Xᵢ² (e) Xᵢ is the year considered relative to 1971. Ŷᵢ = (2009.6)(1.4)^Xᵢ (f) The exponential model seems to be the best choice, in particular when considering that the other two models produce negative forecasts for some years. Code of the year 2017: 2017 - 1971 = 46; Ŷ = (2009.6)(1.4)^46 ≈ 10,753,468,276. Code of the year 2018: 2018 - 1971 = 47; Ŷ = (2009.6)(1.4)^47 ≈ 15,059,790,826

16.22 (a) For Time Series I, the graph of Y versus X appears to be more linear than the graph of log Y versus X, so a linear model appears to be more appropriate. For Time Series II, the graph of log Y versus X appears to be more linear than the graph of Y versus X, so an exponential model appears to be more appropriate. (b) Time Series I: Ŷ = 100.0731 + 14.9776X, where X = years relative to 2003. Time Series II: Ŷ = 10^(1.9982 + 0.0609X), where X = years relative to 2003 (c) X = 12 for year 2015 in all models. Forecasts for the year 2015: Time Series I: Ŷ = 100.0731 + 14.9776(12) = 279.8045. Time Series II: Ŷ = 10^(1.9982 + 0.0609(12)) = 535.6886

16.24 tSTAT = 2.40 > 2.2281; reject H₀.

16.26 (a) tSTAT = 1.60 < 2.2281; do not reject H₀.

16.28 (a)

            Coefficients   Standard Error   t Stat    P-value
Intercept   95.3171        42.4806           2.2438   0.0551
YLag1        0.5400         0.3112           1.7352   0.1209
YLag2       -0.2600         0.3562          -0.7298   0.4863
YLag3        0.0643         0.2975           0.2160   0.8344

Since the p-value = 0.8344 > 0.05 level of significance, the third-order term can be dropped.
(b)

            Coefficients   Standard Error   t Stat    P-value
Intercept   76.2264        34.8522           2.1871   0.0536
YLag1        0.7059         0.2843           2.4827   0.0324
YLag2       -0.2499         0.2947          -0.8480   0.4163

Since the p-value = 0.4163 > 0.05, the second-order term can be dropped.
(c)

            Coefficients   Standard Error   t Stat    P-value
Intercept   51.9983        30.9242           1.6815   0.1185
YLag1        0.6302         0.2343           2.6892   0.0197

Since the p-value = 0.0197 < 0.05, the first-order term cannot be dropped. (d) The most appropriate model for forecasting is the first-order autoregressive model: Ŷ₂₀₁₅ = 51.9983 + 0.6302Y₂₀₁₄ = 160.96 (in $000); Ŷ₂₀₁₆ = 51.9983 + 0.6302Ŷ₂₀₁₅ = 153.44 (in $000)

16.30 (a) Since
the p-value = 0.4653 > 0.05 level of significance, the third-order term can be dropped. (b) Since the p-value = 0.9739 > 0.05 level of significance, the second-order term can be dropped. (c) Since the p-value is 0.0000, the first-order term is significant. (d) The most appropriate model for forecasting is the first-order autoregressive model: Ŷ₂₀₁₆ = 0.0355 + 1.0394Y₂₀₁₅ = $4.453 million

16.32 (a) 2.121 (b) 1.50

16.34 (a) The residuals in the linear, quadratic, and exponential trend models show strings of consecutive positive and negative values. (b), (c)

              Syx          MAD
Linear        3,871.6310   2,751.11
Quadratic     2,314.8400   1,685.3551
Exponential   3,789.5293   1,623.5791
AR1             460.8646     305.8783

(d) The residuals in the three trend models show strings of consecutive positive and negative values. The autoregressive model performs well for the historical data and has a fairly random pattern of residuals. It has the smallest values in MAD and Syx. The autoregressive model would be the best model for forecasting.

16.36 (b), (c)

              Syx       MAD
Linear        31.9700   22.4162
Quadratic     32.1210   24.0834
Exponential   32.9932   22.3892
AR1           32.6210   24.7721

(d) The residuals in the three trend models show strings of consecutive positive and negative values. The autoregressive model has a fairly random pattern of residuals. There is very little difference in MAD and Syx between the models. Because the autoregressive model has a random pattern in the residuals, it would be the best model for forecasting.

16.38 (b), (c)

              Syx      MAD
Linear        0.1223   0.0896
Quadratic     0.1203   0.0901
Exponential   0.1151   0.0912
AR1           0.1306   0.0963

(d) The residuals in the linear, quadratic, and exponential trend models show strings of consecutive positive and negative values. The autoregressive model has a fairly random pattern of residuals. The MAD and Syx values are similar in all the models. The autoregressive model would be the best model for forecasting due to its fairly random pattern of residuals.

16.40 (a) log β̂₀ = 2, β̂₀ = 100. This is the fitted value for January 2009 prior to adjustment with the January multiplier. (b) log β̂₁ = 0.01, β̂₁ = 1.0233. The estimated monthly compound growth rate is 2.33%. (c) log β̂₂ = 0.1, β̂₂ = 1.2589. The January values in the time series are estimated to have a mean 25.89% higher than the December values.

16.42 (a) log β̂₀ = 3.0, β̂₀ = 1,000. This is the fitted value for the first quarter of 2011 prior to adjustment by the quarterly multiplier. (b) log β̂₁ = 0.1, β̂₁ = 1.2589. The estimated quarterly compound growth rate is (β̂₁ - 1)100% = 25.89%. (c) log β̂₃ = 0.2, β̂₃ = 1.5849

16.44 (a) The retail industry is heavily subject to seasonal variation due to the holiday seasons, and so are the revenues for Toys R Us. (b) There is an obvious seasonal effect in the time series. (c) log₁₀Ŷ = 3.6398 + 0.0020X - 0.3635Q₁ - 0.3640Q₂ - 0.3411Q₃ (d) log₁₀β̂₁ = 0.0020, β̂₁ = 1.0046. The estimated quarterly compound growth rate is (β̂₁ - 1)100% = 0.46%. (e) log₁₀β̂₂ = -0.3635, β̂₂ = 0.4330, (β̂₂ - 1)100% = -56.70%. The 1st quarter values in the time series are estimated to have a mean 56.70% below the 4th quarter values. log₁₀β̂₃ = -0.3640, β̂₃ = 0.4325, (β̂₃ - 1)100% = -56.75%. The 2nd quarter values in the time series are estimated to have a mean 56.75% below the 4th quarter values. log₁₀β̂₄ = -0.3411, β̂₄ = 0.4559, (β̂₄ - 1)100% = -54.41%. The 3rd quarter values in the time series are estimated to have a mean 54.41% below the 4th quarter values. (f) Forecasts for the last three quarters of 2014 and all of 2015 are 2,624.8108, 2,779.4253, 6,124.2936, 2,664.1336, 2,672.7110, 2,830.6684, and 6,236.0559 millions.

16.46 (b)

              Coefficients   Standard Error   t Stat    P-value
Intercept      2.2545        0.0246           91.8251   0.0000
Coded Month   -0.0010        0.0006           -1.6812   0.1063
M1            -0.2146        0.0295           -7.2778   0.0000
M2            -0.1697        0.0294           -5.7804   0.0000
M3             0.0335        0.0292            1.1465   0.2634
M4            -0.1720        0.0291           -5.9014   0.0000
M5            -0.0982        0.0290           -3.3818   0.0026
M6             0.0662        0.0290            2.2853   0.0318
M7            -0.0623        0.0289           -2.1560   0.0418
M8            -0.0256        0.0288           -0.8858   0.3849
M9             0.0912        0.0288            3.1651   0.0043
M10            0.0243        0.0288            0.8449   0.4069
M11           -0.0506        0.0288           -1.7605   0.0916

(c) Ŷ₁₃₅ = 165.68 (d) Forecasts for the last four months of 2014 are 205.8022, 176.0334, 147.7968, and 161.254. (e) -0.23% (f) β̂₈ = 0.8664, (β̂₈ - 1)100% = -13.36%. The July values in the time series are estimated to have a mean 13.36% below the December values.

16.48 (b)

                Coefficients   Standard Error   t Stat    P-value
Intercept       0.8790         0.0540           16.2773   0.0000
Coded Quarter   0.0141         0.0016            8.9055   0.0000
Q1              0.0585         0.0567            1.0325   0.0382
Q2              0.0061         0.0566            0.1087   0.9140
Q3              0.0101         0.0565            0.1783   0.8594

(c) 3.3%, after adjusting for the seasonal component (d) 14.42% above the fourth-quarter values (e) Last quarter, 2014: Ŷ₃₅ = $30.4705 (f) 2015: 36.1638, 32.9739, 34.3685, 34.6857

16.60 (b) Linear trend: Ŷ = 173,913.2560 + 2,447.1528X, where X is relative to 1984 (c) 2015: Ŷ₂₀₁₅ = 249,774.994 thousands; 2016: Ŷ₂₀₁₆ = 252,222.146 thousands (d) (b) Linear trend: Ŷ = 116,213.625 + 1,485.8831X, where X is relative to 1984 (c) 2015: Ŷ₂₀₁₅ = 162,276 thousands; 2016: Ŷ₂₀₁₆ = 163,761.883 thousands

16.62 Linear trend: Ŷ = -2.8167 + 0.7383X, where X is relative to 1975

            Coefficients   Standard Error   t Stat    P-value
Intercept   -2.8167        0.6187           -4.5529   0.0001
Coded Yr     0.7383        0.0273           27.0421   0.0000

(c) Quadratic trend: Ŷ = 0.9724 + 0.1400X + 0.0153X², where X is relative to 1975

               Coefficients   Standard Error   t Stat    P-value
Intercept      0.9724         0.3106            3.1303   0.0034
Coded Yr       0.1400         0.0369            3.7995   0.0005
Coded Yr Sq    0.0153         0.0009           16.7937   0.0000

Exponential trend: log₁₀Ŷ = 0.1895 + 0.0362X, where X is relative to 1975

            Coefficients   Standard Error   t Stat    P-value
Intercept   0.1895         0.0234            8.1064   0.0000
Coded Yr    0.0362         0.0010           35.0857   0.0000

AR(3): Ŷᵢ = 0.4954 + 1.2924Yᵢ₋₁ - 0.7386Yᵢ₋₂ + 0.4769Yᵢ₋₃

            Coefficients   Standard Error   t Stat    P-value
Intercept    0.4954        0.1875            2.6428   0.0125
YLag1        1.2924        0.1774            7.2852   0.0000
YLag2       -0.7386        0.2708           -2.7272   0.0102
YLag3        0.4769        0.1905            2.503    0.0174

Test of A₃: p-value = 0.0174 < 0.05. Reject H₀ that A₃ = 0. The third-order term cannot be deleted. A third-order autoregressive model is appropriate.

              Syx      MAD
Linear        1.9932   1.6967
Quadratic     0.6879   0.4338
Exponential   3.0371   1.6667
AR3           0.6194   0.4390

(h) The residuals in the first three models show strings of consecutive positive and negative values. The autoregressive model performs well for the historical data and has a fairly random pattern of residuals. It also has the smallest values in the standard error of the estimate and MAD. Based on the principle of parsimony, the autoregressive model would probably be the best model for forecasting. (i) Ŷ₂₀₁₅ = $28.3149 billions

Index

A
α (level of significance), 298 A priori probability, 167 Addition rule, 172 Adjusted r², 505–506 Algebra, rules for, 638 Alternative hypothesis, 295 Among-group variation, 374 Analysis of means (ANOM), 384 Analysis of proportions (ANOP), 423 Analysis of variance (ANOVA), 373 Kruskal-Wallis rank test for differences in c medians, 436–439 assumptions of, 439 one-way, assumptions, 380 F test for differences in more than two means, 376 F test statistic, 376 Levene’s test for homogeneity of variance, 381 summary table, 377 Tukey-Kramer procedure, 382–383 two-way, 387 cell means plot, 395 factorial design, 387 interpreting interaction effects, 395–397 multiple comparisons, 393–394 summary table, 391 testing for factor and interaction effects, 388–393 Analysis ToolPak, 32 checking for presence, 661 frequency distribution, 109 histogram, 109, 110, 114 descriptive statistics, 161 exponential smoothing, 618 F test for ratio of two variances, 371 multiple regression, 541–543 one-way ANOVA, 406 paired t test, 370 pooled-variance t test, 367–368 random sampling, 54 residual analysis, 497 sampling distributions, 262 separate-variance t test, 368–369 simple linear regression, 497 two-way ANOVA, 408–409
Analyze, 27 ANOVA See Analysis of variance (ANOVA) Area of opportunity, 202 Arithmetic mean See Mean Arithmetic operations, rules for, 638 Assumptions analysis of variance (ANOVA), 380 of the confidence interval estimate for the mean (σ unknown), 269 of the confidence interval estimate for the proportion, 276, 278 of the F test for the ratio of two variances, 358 of the paired t test, 345 of Kruskal-Wallis test, 439 of regression, 467 for 2 × 2 table, 416 for 2 × c table, 421 for r × c table, 429 for the t distribution, 269 t test for the mean (σ unknown), 310–311 in testing for the difference between two means, 335 of the Wilcoxon rank sum test, 431 of the Z test for a proportion, 319 Autocorrelation, 471 Autoregressive modeling, steps involved in, on annual time-series data, 599–600

B
Bar chart, 70–71 Bayes’ theorem, 183 Best-subsets approach in model building, 562–563 β risk, 298 Bias nonresponse, 47 selection, 47 Big data, 28 Binomial distribution, 195–201 mean of, 200 properties of, 195 shape of, 199 standard deviation of, 200 Binomial probabilities calculating, 197–199 Bootstrapping, 285 Boxplots, 141–142 Brynne packaging, 495 Business analytics, 28, 627

C
CardioGood Fitness, 52–53, 105, 162, 188, 237, 291, 366, 405, 447 Categorical data chi-square test for the difference between two proportions, 411–416 chi-square test of independence, 424–429 chi-square test for c proportions, 418–421 organizing, 57–59 visualizing, 70–75 Z test for the difference between two proportions, 350–353 Categorical variables, 38 Causal forecasting methods, 578 Cell, 30 Cell means plot, 395 Central limit theorem, 247 Central tendency, 120 Certain event, 166 Challenges in organizing and visualizing variables, obscuring data, 92–93 creating false impressions, 93–94 Chartjunk, 94–95 Charts bar, 70–71 doughnut, 71–72, 75 Pareto, 72–74 pie, 71–72 side-by-side bar, 74 Chebyshev Theorem, 146–147 Chi-square (χ²) distribution, 412 Chi-square (χ²) test for differences between c proportions,
418–421 between two proportions, 411–416 Chi-square (χ²) test for the variance or standard deviation, 441 Chi-square (χ²) test of independence, 424–429 Chi-square (χ²) table, 669 Choice is Yours Followup, 105, 188 Class boundaries, 63 Class intervals, 62 Class midpoint, 64 Class interval width, 63 Classes, 62 and Excel bins, 64 Classification trees, 631–632 Clear Mountain State Survey, 53, 105, 162, 188, 237, 291, 366, 405, 447 Cluster analysis, 631 Cluster sample, 43 Coefficient of correlation, 149–152 inferences about, 479 Coefficient of determination, 463–464 Coefficient of multiple determination, 505 Coefficient of partial determination, 517–518 Coefficient of variation, 129–130 Collectively exhaustive events, 171 Collect, 27 Collinearity of independent variables, 558 Combinations, 196 Complement, 168 Completely randomized design See One-way analysis of variance Conditional probability, 175–176 Confidence coefficient, 299 Confidence interval estimation, 262 connection between hypothesis testing and, 305–306 for the difference between the means of two independent groups, 337–338 for the difference between the proportions of two independent groups, 354 for the mean difference, 347–348 ethical issues and, 284–285 for the mean (σ known), 265–267 for the mean (σ unknown), 271–274 for the mean response, 482–483 for the proportion, 276–278 of the slope, 478, 511–512 Contingency tables, 58, 87, 169 Continuous probability distributions, 214 Continuous variables, 38 Convenience sampling, 41 Correlation coefficient See Coefficient of correlation Counting rules, 184 Covariance, 148–149 of a probability distribution, 205 Coverage error, 47 Cp statistic, 563 Craybill Instrumentation Company case, 573–574 Critical range, 382 Critical value approach, 300–303, 308–310, 314–315, 319 Critical values, 266 of test statistic, 296–297, 308–310 Cross-product term, 521 Cross validation, 566 Cumulative percentage distribution, 66–68 Cumulative percentage polygons, 80–81
Cumulative standardized normal distribution, 217 tables, 665–666 Cyclical effect, 579

D
Dashboards, 629–630 Data, 26 sources of, 39–40 Data cleaning, 44–45 Data collection, 39–41 Data formatting, 45 Data mining, 627–628 Data discovery, 87 DCOVA, 27, 57 Decision trees, 176–177 Define, 27 Degrees of freedom, 268, 270 Dependent variable, 452 Descriptive analytics, 627, 628–630 Descriptive statistics, 29 Digital Case, 104, 162, 188, 210, 237, 259, 291, 328, 366, 405, 447, 495, 539–540, 573 Directional test, 314 Discrete probability distributions binomial distribution, 195–201 Poisson distribution, 202 Discrete variables, 38 expected value of, 191–192 probability distribution for, 191 variance and standard deviation of, 193 Dispersion, 125 Doughnut chart, 71–72, 74–75 Downloading files for this book, 653 Drill-down, 88–90 Dummy variables, 519–521 Durbin-Watson statistic, 472–473 tables, 677

E
Effect size, 360 Empirical probability, 167 Empirical rule, 145–146 Ethical issues confidence interval estimation and, 284–285 in hypothesis testing, 323 in multiple regression, 568 in numerical descriptive measures, 154 for probability, 180–181 for surveys, 48 Events, 167 Expected frequency, 412 Expected value, of discrete variable, 191–192 Explained variation or regression sum of squares (SSR), 462 Explanatory variables, 452 Exponential distribution, 214, 233 Exponential growth with monthly data forecasting equation, 608–612 with quarterly data forecasting equation, 608–612 Exponential smoothing, 582–584 Exponential trend model, 588–591 Extrapolation, predictions in regression analysis and, 457

F
Factor, 373 Factorial design See Two-way analysis of variance F distribution, 376 tables, 670–673 Finite population correction factor, 285 First-order autoregressive model, 595–596 First quartile, 137 Five-number summary, 139–140 Fixed effects models, 399 Forecasting, 578 autoregressive modeling for, 595–602 choosing appropriate model for, 604–606 least-squares trend fitting and,
585–592 seasonal data, 607–612 Frame, 41 Frequency distribution, 62–64 F test for the ratio of two variances, 356–359 F test for the factor effect, 390 F test for factor A effect, 390 F test for factor B effect, 390 F test for interaction effect, 391 F test for the slope, 477 F test in one-way ANOVA, 376

G
Gauges, 373, 629 General addition rule, 171–173 General multiplication rule, 179 Geometric mean, 124 Geometric mean rate of return, 124–125 Grand mean, 374 Greek alphabet, 643 Groups, 373 Guidelines for developing visualizations, 96

H
Histograms, 77 Homogeneity of variance, 380 Levene’s test for, 381 Homoscedasticity, 467 Hypergeometric distribution, 206 Hypothesis See also One-sample tests of hypothesis alternative, 295 null, 295

I
Impossible event, 166 Independence, 178 of errors, 467 χ² test of, 424–429 Independent events, multiplication rule for, 179 Independent variable, 452 Index numbers, 613 Inferential statistics, 29 Interaction, 521 Interaction terms, 521 Interpolation, predictions in regression analysis and, 457 Interquartile range, 139 Interval scale, 38 Irregular effect, 579

J
Joint probability, 170 Joint event, 168 Joint response, 58 Judgment sample, 41

K
Kruskal-Wallis rank test for differences in c medians, 436–439 assumptions of, 439 Kurtosis, 132

L
Lagged predictor variable, 595 Least-squares method in determining simple linear regression, 454–455 Least-squares trend fitting and forecasting, 585–592 Left-skewed, 132 Leptokurtic, 132 Level of confidence, 265 Level of significance (α), 298 Levels, 373 Levene’s test for homogeneity of variance, 381 Linear regression See Simple linear regression Linear relationship, 453 Linear trend model, 585–586 Logarithms, rules for, 639 Logarithmic transformation, 555–557 Logical causality, 30 Logistic regression, 528–531

M
Main effects, 393 Main effects plot, 395 Managing Ashland MultiComm Services, 52, 104, 162, 209–210, 237, 259, 290, 328, 365–366, 404–405, 446–447, 495, 539, 617
Marascuilo procedure, 421–423 Marginal probability, 171, 180 Margin of error, 47, 279 Matched samples, 342 Mathematical model, 195 McNemar test, 441 Mean, 120–122 of the binomial distribution, 200 confidence interval estimation for, 265–267, 271–274 geometric, 124 population, 163 sample size determination for, 279–281 sampling distribution of, 241–251 standard error of, 243 unbiased property of, 241 Mean absolute deviation, 605 Mean difference, 342 Mean squares, 375 Mean Square Among (MSA), 375 Mean Square A (MSA), 390 Mean Square B (MSB), 390 Mean Square Error (MSE), 390 Mean Square Interaction (MSAB), 390 Mean Square Total (MST), 375 Mean Square Within (MSW), 375 Measurement types of scales, 38 Measurement error, 48 Median, 122–123 Microsoft Excel, absolute and relative cell references, 646 adding numerical variables, 88, 117 add-ins, 31, 661 array formulas, 647 arithmetic mean, 161 autoregressive modeling, 619–620 bar charts, 110–111 Bayes’ theorem, 189 basic probabilities, 189 binomial probabilities, 211 bins, 64 boxplots, 163 cells, 30 cell means plot, 409 cell references, 646 central tendency, 161 chart formatting, 649 chart sheets, 30 checklist for using, 644 chi-square tests for contingency tables, 448–449 coefficient of variation, 162 confidence interval estimate for the difference between the means of two independent groups, 368 confidence interval for the mean, 292 confidence interval for the proportion, 293 contingency tables, 107–108 correlation coefficient, 164 covariance, 163 creating histograms for discrete probability distributions, 651–652 creating and copying worksheets, cross-classification table, 107–108 cumulative percentage distribution, 110 cumulative percentage polygon, 115 descriptive statistics, 161–163 doughnut chart, 110–111 drilldown, 88–89 dummy variables, 543 entering data, 34 entering formulas into worksheets, 647 establishing the variable type, 54 expected value, 211 exponential smoothing, 618 FAQs, 684–685 formatting cell
contents, 648 formatting cell elements, 648–649 formulas, 645 frequency distribution, 109 functions, 647 F test for the ratio of two variances, 371 gauges, 635–636 geometric mean, 161 Getting ready to use, 661 Guide workbooks, 31 histogram, 113–114 interquartile range, 163 Kruskal-Wallis test, 450 kurtosis, 162 least-squares trend fitting, 619 Levene test, 407 logistic regression, 544 Marascuilo procedure, 448–449 median, 161 mode, 161 moving averages, 618 multidimensional contingency tables, 116–117 multiple regression, 541 mean absolute deviation, 620 model building, 576 new function names, 681 normal probabilities, 238 normal probability plot, 239 Office 365, 644 one-tail tests, 330 one-way analysis of variance, 406 opening workbooks, 645 ordered array, 108 quartiles, 162 paired t test, 369 Pareto chart, 111–112 pasting with Paste Special, 647 percentage distribution, 110 percentage polygon, 115 pie chart, 110–111 PivotChart, 90, 118 PivotTables, 87 Poisson probabilities, 212 pooled-variance t test, 367 population mean, 163 population standard deviation, 163 population variance, 163 Power Pivot, 635 prediction interval, 498 preparing and using data, 30 printing worksheets, 645 probability, 189 probability distribution for a discrete random variable, quadratic regression, 575 range, 162 recalculation, 646 recoding, 54 relative frequency distribution, 110 residual analysis, 497, 542 reviewing worksheets, 34 sample size determination, 293 sampling distributions, 262 saving workbooks, 645 scatter plot, 116 seasonal data, 620 security settings, 661–662 separate-variance t test, 368 side-by-side bar chart, 112 simple linear regression, 496–498 simple random samples, 54 skewness, 162 skill set needed, 31 slicers, 89, 118 sparklines, 91, 118 standard deviation, 162 stem-and-leaf display, 113 summary tables, 106–107 task pane, 35 t test for the mean (σ unknown), 329 templates, 27 time-series plot, 116 transformations, 575–576 treemaps, 636 two-way analysis of
variance, 408 Tukey-Kramer multiple comparisons, 407 understanding nonstatistical functions, 682–683 useful keyboard shortcuts, 680 variance, 162 variance inflationary factor (VIF), 576 verifying formulas and worksheets, 680 which version to use, 32, 644 Wilcoxon rank sum test, 449–450 workbooks, 30 worksheet entries and references, 645 worksheets, 30 Z test for the difference between two proportions, 370 Z test for the mean (σ known), 329 Z scores, 162 Z test for the proportion, 330 Midspread, 139 Missing values, 44 Mixed effects models, 399 Mode, 123–124 Models See Multiple regression models More Descriptive Choices Follow-up, 162, 291, 366, 405, 447, 574 Mountain States Potato Company case, 572 Moving averages, 580–582 Multidimensional contingency tables, 87–88 Multidimensional scaling, 631 Multiple comparisons, 382 Multiple regression models, 500 Adjusted r², 505–506 best-subsets approach to, 562–563 coefficient of multiple determination in, 505, 551–552 coefficients of partial determination in, 517–518 collinearity in, 558 confidence interval estimates for the slope in, 511–512 dummy-variable models in, 519–521 ethical considerations in, 568 interpreting slopes in, 502 interaction terms, 521–525 with k independent variables, 501 model building, 559–565 model validation, 565–566 net regression coefficients, 502 overall F test, 506–507 partial F test statistic in, 513–517 pitfalls in, 568 predicting the dependent variable Y, 503 quadratic, 546–551 residual analysis for, 508–509 stepwise regression approach to, 561–562 testing for significance of, 506–507 testing portions of, 513–517 testing slopes in, 510–511 transformation in, 553–557 variance inflationary factors in, 558 Multiplication rule, 179 Mutually exclusive events, 171

N
Net regression coefficient, 502 Nominal scale, 38 Nonparametric methods, 431 Nonprobability sample, 40 Nonresponse bias, 47 Nonresponse error, 47 Normal approximation to the binomial distribution, 233 Normal distribution, 214 cumulative
standardized, 217 properties of, 215 Normal probabilities calculating, 217–225 Normal probability density function, 214 Normal probability plot, 229 constructing, 229 Normality assumption, 380, 467 Null hypothesis, 295 Numerical descriptive measures coefficient of correlation, 149–152 measures of central tendency, variation, and shape, 161–162 from a population, 144–145 Numerical variables, 38 organizing, 61–68 visualizing, 77–81 O Observed frequency, 412 Odds ratio, 528 Ogive, 80 One-tail tests, 314 null and alternative hypotheses in, 314 One-way analysis of variance (ANOVA), assumptions, 380 F test for differences among more than two means, 376 F test statistic, 358 Levene’s test for homogeneity of variance, 381 summary table, 377 Tukey-Kramer procedure, 382–383 Online resources, 653–660 Operational definitions, 29, 37 Ordered array, 61 Ordinal scale, 38 Organize, 27 Outliers, 45, 130 Overall F test, 506–507 P Paired t test, 342–347 Parameter, 40 Pareto chart, 72–74 Pareto principle, 72 Parsimony, principle of, 605 Partial F-test statistic, 513–517 Percentage distribution, 65–66 Percentage polygon, 78–80 Percentiles, 138 PHStat, 31, 662 autocorrelation, 498 bar chart, 110 basic probabilities, 189 best subsets regression, 576 binomial probabilities, 211 boxplot, 163 cell means plot, 409 chi-square (χ²) test for contingency tables, 448–449 confidence interval for the mean (σ known), 292 for the mean (σ unknown), 292 for the difference between two means, 368 for the mean value, 498 for the proportion, 293 contingency tables, 107 cumulative percentage distributions, 109–110 cumulative percentage polygons, 115 F test for ratio of two variances, 371 frequency distributions, 108 histograms, 113 Kruskal-Wallis test, 450 kurtosis, 162 Levene’s test, 407 logistic regression, 544 Marascuilo procedure, 448 mean, 161 median, 161 mode, 161 model building, 576 multiple regression, 541–543 normal probabilities, 238 normal probability plot, 239 one-way ANOVA, 406 one-way tables,
106 one-tail tests, 330 paired t test, 369 Pareto chart, 111 percentage distribution, 109–110 percentage polygon, 115 pie chart, 110 Poisson probabilities, 212 pooled-variance t test, 367 prediction interval, 498 quartiles, 162 random sampling, 55 range, 162 relative frequency, 109–110 residual analysis, 497 sample size determination, for the mean, 293 for the proportion, 293 sampling distributions, 262 scatter plot, 116 separate-variance t test, 368 side-by-side bar chart, 112 simple linear regression, 496–498 simple probability, 189 simple random samples, 54 skewness, 162 stacked data, 108 standard deviation, 162 stem-and-leaf display, 112 stepwise regression, 576 summary tables, 106 t test for the mean (σ unknown), 329 two-way ANOVA, 408 Tukey-Kramer procedure, 407 unstacked data, 108 Wilcoxon rank sum test, 449 Z test for the mean (σ known), 329 Z test for the difference in two proportions, 370 Z test for the proportion, 330 Pie chart, 71–72 PivotChart, 90 PivotTables, 87 Platykurtic, 132 Point estimate, 262 Poisson distribution, 202 calculating probabilities, 203–204 properties of, 203 Polygons, cumulative percentage, 80 Pooled-variance t test, 332–337 Population(s), 40 Population mean, 144, 242 Population standard deviation, 144–145, 242 Population variance, 144–145 Power of a test, 299, 324 Power Pivot, 627–628 Practical significance, 323 Prediction interval estimate, 483–484 Prediction line, 454 Predictive analytics, 627, 630–632 Prescriptive analytics, 627 Primary data source, 40 Probability, 166 a priori, 167 Bayes’ theorem for, 183 conditional, 175 empirical, 167 ethical issues and, 180–181 joint, 170 marginal, 171 simple, 167 subjective, 167 Probability density function, 217 Probability distribution function, 195 Probability distribution for discrete random variable, 191 Probability sample, 41 Proportions chi-square (χ²) test for differences between two, 411–416 chi-square (χ²) test for differences in more than two, 418–421
confidence interval estimation for, 276–278 sample size determination for, 281–283 sampling distribution of, 252–254 Z test for the difference between two, 350–353 Z test of hypothesis for, 318–321 pth-order autoregressive model, 595 p-value, 303 p-value approach, 303–305, 310, 315–317, 320–321 Q Quadratic regression, 546–551 Quadratic trend model, 587–588 Qualitative forecasting methods, 578 Qualitative variable, 38 Quantitative forecasting methods, 579 Quantitative variable, 38 Quartiles, 137 Quantile-quantile plot, 229 R Random effect, 579 Random effects models, 399 Randomized block design, 399 Randomness and independence, 380 Random numbers, table of, 663–664 Range, 125–126 interquartile, 139 Ratio scale, 38 Recoded variable, 46 Rectangular distribution, 217 Region of nonrejection, 297 Region of rejection, 297 Regression analysis. See Multiple regression models; Simple linear regression Regression coefficients, 455–456, 500–501 Regression trees, 631 Relative frequency, 65 Relative frequency distribution, 65–66 Relevant range, 457 Repeated measurements, 341 Replicates, 388 Residual analysis, 467, 508–509, 604 Residual plots in detecting autocorrelation, 471–472 in evaluating equal variance, 470 in evaluating linearity, 468 in evaluating normality, 469 in multiple regression, 509 Residuals, 467 Resistant measures, 139 Response variable, 452 Right-skewed, 132 Robust, 311 S Sample, 40 Sample mean, 120 Sample proportion, 253, 318 Sample standard deviation, 127 Sample variance, 126 Sample size determination for mean, 279–281 for proportion, 281–283 Sample space, 168 Samples, 40 cluster, 43 convenience, 41 judgment, 41 nonprobability, 41 probability, 41 simple random, 42 stratified, 43 systematic, 42–43 Sampling from nonnormally distributed populations, 247–251 from normally distributed populations, 244–247 with replacement, 42 without replacement, 42 Sampling distributions, 241 of the mean, 241–251 of the proportion, 252–254 Sampling error, 47, 265 Scale interval, 38
nominal, 38 ordinal, 38 ratio, 38 Scatter diagram, 452 Scatter plot, 83–84, 452 Seasonal effect, 579–596 Second-order autocorrelation, 595 Second quartile, 137 Secondary data source, 40 Selection bias, 47 Separate-variance t test for differences in two means, 338 Shape, 120 Side-by-side bar chart, 74 Simple event, 167 Simple linear regression assumptions in, 467 coefficient of determination in, 464 coefficients in, 455–456 computations in, 457–459 Durbin-Watson statistic, 472–473 equations in, 453, 455 estimation of mean values and prediction of individual values, 482–485 inferences about the slope and correlation coefficient, 475–479 least-squares method in, 454–455 pitfalls in, 486 residual analysis, 467–470 standard error of the estimate in, 465–466 sum of squares in, 462–463 Simple probability, 169 Simple random sample, 42 Skewness, 132 Slicers, 89–90 Slope, 454 inferences about, 475–479 interpreting, in multiple regression, 502 Solver add-in checking for presence, 661 Sources of data, 40 Sparklines, 90–91 Spread, 125 Square-root transformation, 553–555 Stacked variables, 45 Standard deviation, 126–127 of binomial distribution, 200 of discrete random variable, 193 of population, 144–145 Standard error of the estimate, 465–466 Standard error of the mean, 243 Standard error of the proportion, 253 Standardized normal random variable, 216 Statistic, 29, 42 Statistics, 26 descriptive, 29 inferential, 29 Statistical inference, 29 Statistical symbols, 643 Stem-and-leaf display, 77 Stepwise regression approach to model building, 561–562 Strata, 43 Stratified sample, 43 Structured data, 28 Studentized range distribution, 382 tables, 675–676 Student’s t distribution, 268–269 properties, 269 Student tips, 37, 38, 44, 46, 57, 63, 67, 77, 93, 122, 127, 129, 137, 142, 166, 167, 168, 171, 175, 191, 195, 216, 218, 219, 243, 252, 253, 262, 267, 276, 295, 297, 300, 301, 303, 308, 314, 318, 332, 333, 342, 350, 356, 373, 374, 375, 376, 378, 380, 381, 382, 389, 391, 412,
413, 422, 425, 431, 434, 437, 455, 456, 459, 464, 465, 468, 501, 478, 479, 505, 506, 509, 517, 520, 522, 529, 531, 546, 549, 551, 555, 580, 582, 589, 596, 600 Subjective probability, 167 Summary table, 57 Summation notation, 640–643 Sum of squares, 126 Sum of squares among groups (SSA), 375 Sum of squares due to factor A (SSA), 388 Sum of squares due to factor B (SSB), 389 Sum of squares due to regression (SSR), 463 Sum of squares of error (SSE), 389, 463 Sum of squares to interaction (SSAB), 389 Sum of squares total (SST), 374, 388, 462 Sum of squares within groups (SSW), 375 SureValue Convenience Stores, 291, 328, 366, 405, 447 Survey errors, 47–48 Symmetrical, 132 Systematic sample, 42–43 T Tables chi-square, 669 contingency, 58 control chart factors, 678 Durbin-Watson, 677 F distribution, 670–673 for categorical data, 57–59 cumulative standardized normal distribution, 665–666 of random numbers, 42, 663–664 standardized normal distribution, 679 Studentized range, 675–676 summary, 57 t distribution, 667–668 Wilcoxon rank sum, 674 t distribution, properties of, 269 Test statistic, 297 Tests of hypothesis Chi-square (χ²) test for differences between c proportions, 418–421 between two proportions, 411–416 Chi-square (χ²) test of independence, 424–429 F test for the ratio of two variances, 356–359 F test for the regression model, 506–507 F test for the slope, 477 Kruskal-Wallis rank test for differences in c medians, 434–439 Levene test, 381 Paired t test, 342–347 pooled-variance t test, 332–337 quadratic effect, 549–550 separate-variance t test for differences in two means, 338 t test for the correlation coefficient, 479 t test for the mean (σ unknown), 308–311 t test for the slope, 475–476, 510–511 Wilcoxon rank sum test for differences in two medians, 430–434 Z test for the mean (σ known), 300–305 Z test for the difference between two proportions, 350–353 Z test for the proportion, 318–321 Third quartile, 137 Time series, 578 Time-series forecasting autoregressive
model, 595–603 choosing an appropriate forecasting model, 604–606 component factors of classical multiplicative, 578–579 exponential smoothing in, 583–584 least-squares trend fitting and forecasting, 585–592 moving averages in, 580–582 seasonal data, 607–612 Time-series plot, 85 Total variation, 374, 462 Transformation formula, 217 Transformations in regression models logarithmic, 555–557 square-root, 553–555 Treatment, 40 Treemap, 629 Trend, 579 t test for a correlation coefficient, 479 t test for the mean (σ unknown), 308–311 t test for the slope, 475–476, 510–511 Tukey-Kramer multiple comparison procedure, 382–383 Tukey multiple comparison procedure, 393–394 Two-factor factorial design, 387 Two-sample tests of hypothesis for numerical data, F tests for differences in two variances, 356–359 Paired t test, 342–347 t tests for the difference in two means, 332–339 Wilcoxon rank sum test for differences in two medians, 430–434 Two-tail test, 300 Two-way analysis of variance cell means plot, 395 factorial design, 387 interpreting interaction effects, 395–397 multiple comparisons, 393–394 summary table, 391 testing for factor and interaction effects, 388–393 Two-way contingency table, 411 Type I error, 298 Type II error, 298 U Unbiased, 241 Unexplained variation or error sum of squares (SSE), 389, 463 Uniform probability distribution, 217 mean, 231 standard deviation, 231 Unstacked variables, 45 Unstructured data, 28 V Variables, 29 categorical, 38 continuous, 38 discrete, 38 dummy, 519–521 numerical, 38 Variance inflationary factor (VIF), 558 Variance of discrete random variable, 192 F-test for the ratio of two, 356–359 Levene’s test for homogeneity of, 381 population, 144–145 sample, 126 Variation, 120 Visual Explorations normal distribution, 222 sampling distributions, 251 simple linear regression, 460 Visualize, 27 Visualizations guidelines for constructing, 96 W Wald statistic, 531 Width of class interval, 63 Wilcoxon rank sum test for differences in
two medians, 430–434 tables, 674 Within-group variation, 374 X Y Y intercept, 454 Z Z scores, 130–131 Z test for the difference between two proportions, 350–353 for the mean (σ known), 300–305 for the proportion, 318–321 Credits Front matter Chapter 11 First Things First Chapter 12 Page 6, Courtesy of David Levine Page 25, Wallix/iStock/Getty Images; page 26, Excerpt from Ticket Pricing Puts ‘Lion King’ Atop Broadway’s Circle Of Life by Patrick Healy Published by The New York Times © 2014 Chapter Pages 36 and 50, Haveseen/YAY Micro/AGE Fotostock Chapter Pages 56 and 97, Scanrail/123RF Chapter Pages 119 and 154, Gitanna/Fotolia Chapter Pages 165 and 185, Blue Jean Images/Collage/Corbis Chapter Pages 190 and 206, Hongqi Zhang/123RF Chapter Pages 213 and 234, Cloki/Shutterstock; screen image from The Adventures of Dirk “Sunny” Lande appears courtesy of Waldowood Productions Chapter Pages 240 and 255, Bluecinema/E+/Getty Images Chapter Pages 261 and 285, Mark Hunt/Huntstock/Corbis Chapter Pages 294 and 324, Ahmettozar/iStock/Getty Images Chapter 10 Pages 331 and 361, Echo/Cultura/Getty Images Pages 372 and 399, Paul Morris/Bloomberg/Getty Images Pages 410 and 442, Vibrant Image Studio/Shutterstock Chapter 13 Pages 451 and 488, Hero Images/Hero Images/Corbis Chapter 14 Pages 499 and 533, Maridav/123RF Chapter 15 Pages 545 and 568, Antbphotos/Fotolia Chapter 16 Pages 577 and 613, Stylephotographs/123RF Chapter 17 Pages 622 and 632, Courtesy of Sharyn Rosenberg Online Chapter 18 Pages 18-1 and 18-31, Zest_marina/Fotolia; Figure 18.9, From The Deming Route to Quality and Productivity: Road Maps and Roadblocks by William W Scherkenbach Copyright by CEEP Press Books Used by permission of CEEP Press Books Online Chapter 19 Pages 19-1 and 19-22, Ken Mellott/Shutterstock Appendix E Table E.09, From ASTM-STP 15D Copyright by ASTM International Used by permission of ASTM International Appendix F Excerpt from Microsoft Corporation, Function Improvements In Microsoft Office Excel
2010 Published by Microsoft Corporation ... pages 23 and 24) David M Levine David F Stephan Kathryn A Szabat Resources for Success MyStatLab™ Online Course for Statistics for Managers Using Microsoft Excel by Levine/Stephan/Szabat (access... edition, entitled Statistics for Managers Using Microsoft Excel, 8th edition, ISBN 978-0-13-417305-4, by David M Levine, David F Stephan, and Kathryn A Szabat, published by Pearson Education © 2017... solution www.mystatlab.com Statistics for Managers Using Microsoft® Excel 8th Edition Global Edition David M Levine Department of Statistics and Computer Information Systems Zicklin School of

Date posted: 06/02/2018, 14:44

Table of contents

  • A Roadmap for Selecting a Statistical Method

  • FTF.1 Think Differently About Statistics

    • Statistics: A Way of Thinking

    • Analytical Skills More Important than Arithmetic Skills

    • Statistics: An Important Part of Your Business Education

  • FTF.2 Business Analytics: The Changing Face of Statistics

    • “Big Data”

    • Structured Versus Unstructured Data

  • FTF.3 Getting Started Learning Statistics

    • Statistic

  • FTF.4 Preparing to Use Microsoft Excel for Statistics

    • Reusability Through Recalculation

    • Practical Matters: Skills You Need

    • Ways of Working with Excel

    • Which Excel Version to Use?

  • EG.3 If You Plan to Use the Workbook Instructions

  • 1 Defining and Collecting Data

    • Using Statistics: Defining Moments

    • 1.1 Defining Variables

      • Classifying Variables by Type

    • 1.2 Collecting Data

      • Populations and Samples

    • 1.3 Types of Sampling Methods

      • Simple Random Sample

      • Stacked and Unstacked Variables

    • 1.5 Types of Survey Errors

      • Coverage Error

      • Ethical Issues About Surveys

      • Consider This: New Media Surveys/Old Survey Errors
