
Basic Business Statistics: Concepts and Applications, 13th Global Edition, by Berenson 1





Basic Business Statistics: Concepts and Applications
THIRTEENTH EDITION
Global Edition
Berenson • Levine • Szabat

For these Global Editions, the editorial team at Pearson has collaborated with educators across the world to address a wide range of subjects and requirements, equipping students with the best possible learning tools. This Global Edition preserves the cutting-edge approach and pedagogy of the original, but also features alterations, customization, and adaptation from the North American version.

This is a special edition of an established title widely used by colleges and universities throughout the world. Pearson published this exclusive edition for the benefit of students outside the United States and Canada. If you purchased this book within the United States or Canada, you should be aware that it has been imported without the approval of the Publisher or Author.

Pearson Global Edition

MyStatLab™ for Business Statistics

MyStatLab is a course management system that provides engaging learning experiences and delivers proven results while helping students succeed. Tools are embedded which make it easy to integrate statistical software into the course. And, MyStatLab comes from an experienced partner with educational expertise and an eye on the future.

Tutorial Exercises

MyStatLab homework and practice exercises correlated to the exercises in the textbook are generated algorithmically, giving students unlimited opportunity for practice and mastery. MyStatLab grades homework and provides feedback and guidance. Help Me Solve This breaks the problem into manageable steps. Students enter answers along the way. View an Example walks students through a problem similar to the one assigned. Textbook links to the appropriate section in the etext. Tech Help is a suite of Technology Tutorial videos that show
how to perform statistical calculations using popular software.

Powerful Homework and Test Manager

Create, import, and manage online homework assignments, quizzes, and tests that are automatically graded, allowing you to spend less time grading and more time teaching. Thousands of high-quality and algorithmic exercises of all types and difficulty levels are available to meet the needs of students with diverse mathematical backgrounds.

Adaptive Learning

An Adaptive Study Plan serves as a personalized tutor for your students. When enabled, Knewton in MyStatLab monitors student performance and provides personalized recommendations. It gathers information about learning preferences and is continuously adaptive, guiding students through the Study Plan one objective at a time.

Integrated Statistical Software

Copy our data sets, from the eText and the MyStatLab questions, into software such as StatCrunch, Minitab, Excel, and more. Students have access to support tools (videos, Study Cards, and manuals for select titles) to learn how to use statistical software.

StatCrunch

MyStatLab includes web-based statistical software, StatCrunch, within the online assessment platform so that students can analyze data sets from exercises and the text. In addition, MyStatLab includes access to www.StatCrunch.com, the full web-based program where users can access thousands of shared data sets, create and conduct online surveys, perform complex analyses using the powerful statistical software, and generate compelling reports.

Engaging Video Resources

• Business Insight Videos are 10 engaging videos showing managers at top companies using statistics in their everyday work. Assignable questions encourage discussion.
• StatTalk Videos, hosted by fun-loving statistician Andrew Vickers, demonstrate important statistical concepts through interesting stories and real-life events. This series of 24 videos includes available assessment questions
and an instructor’s guide.

PHStat™ (access code required)

PHStat is a statistics add-in for Microsoft Excel that simplifies the task of operating Excel, creating real Excel worksheets that use in-worksheet calculations. Download PHStat by visiting www.pearsonhighered.com/phstat or through a link in MyStatLab’s Tools for Success, access code required. This book features PHStat version which is compatible with all current Microsoft Windows and (Mac) OS X Excel versions.

A Roadmap for Selecting a Statistical Method

Data Analysis Task: Describing a group or several groups
  For Numerical Variables:
  • Ordered array, stem-and-leaf display, frequency distribution, relative frequency distribution, percentage distribution, cumulative percentage distribution, histogram, polygon, cumulative percentage polygon, bullet maps, sparklines, gauges, treemaps (Sections 2.2, 2.4, 17.1)
  • Mean, median, mode, geometric mean, quartiles, range, interquartile range, standard deviation, variance, coefficient of variation, skewness, kurtosis, boxplot, normal probability plot (Sections 3.1, 3.2, 3.3, 6.3)
  • Index numbers (online Section 16.8)
  For Categorical Variables:
  • Summary table, bar chart, pie chart, Pareto chart (Sections 2.1 and 2.3)
  • Gauges, bullet graphs, and treemaps (Section 17.1)

Data Analysis Task: Inference about one group
  For Numerical Variables:
  • Confidence interval estimate of the mean (Sections 8.1 and 8.2)
  • t test for the mean (Section 9.2)
  • Chi-square test for a variance or standard deviation (online Section 12.7)
  For Categorical Variables:
  • Confidence interval estimate of the proportion (Section 8.3)
  • Z test for the proportion (Section 9.4)

Data Analysis Task: Comparing two groups
  For Numerical Variables:
  • Tests for the difference in the means of two independent populations (Section 10.1)
  • Wilcoxon rank sum test (Section 12.4)
  • Paired t test (Section 10.2)
  • F test for the difference between two variances (Section 10.4)
  • Wilcoxon signed ranks test (online Section 12.8)
  For Categorical Variables:
  • Z test for the difference between two proportions (Section 10.3)
  • Chi-square test for the difference between two proportions (Section 12.1)
  • McNemar test for two related samples (online Section 12.6)

Data Analysis Task: Comparing more than two groups
  For Numerical Variables:
  • One-way analysis of variance for comparing several means (Section 11.1)
  • Kruskal-Wallis test (Section 12.5)
  • Randomized block design (Section 11.2)
  • Two-way analysis of variance (Section 11.3)
  • Friedman rank test (online Section 12.9)
  For Categorical Variables:
  • Chi-square test for differences among more than two proportions (Section 12.2)

Data Analysis Task: Analyzing the relationship between two variables
  For Numerical Variables:
  • Scatter plot, time series plot (Section 2.5)
  • Covariance, coefficient of correlation (Section 3.5)
  • Simple linear regression (Chapter 13)
  • t test of correlation (Section 13.7)
  • Time-series forecasting (Chapter 16)
  • Sparklines (Section 17.1)
  For Categorical Variables:
  • Contingency table, side-by-side bar chart, PivotTables (Sections 2.1, 2.3, 2.6)
  • Chi-square test of independence (Section 12.3)

Data Analysis Task: Analyzing the relationship between two or more variables
  For Numerical Variables:
  • Multiple regression (Chapters 14 and 15)
  • Regression trees (Section 17.3)
  • Neural nets (Section 17.4)
  • Cluster analysis (Section 17.5)
  • Multidimensional scaling (Section 17.6)
  For Categorical Variables:
  • Multidimensional contingency tables (Section 2.7)
  • Drilldown and slicers (Section 17.1)
  • Logistic regression (Section 14.7)
  • Classification trees (Section 17.3)
  • Neural nets (Section 17.4)

Basic Business Statistics: Concepts and Applications
Thirteenth Edition
Global Edition

Mark L. Berenson
Department of Information and Operations Management
School of Business, Montclair State University

David M. Levine
Department of Statistics and Computer Information Systems
Zicklin School of Business, Baruch College, City University of New York

Kathryn A. Szabat
Department of Business Systems and Analytics
School of Business, La Salle University

Boston Columbus Indianapolis New York San Francisco Upper Saddle River Amsterdam Cape Town Dubai London Madrid Milan Munich Paris Montreal Toronto Delhi Mexico
City São Paulo Sydney Hong Kong Seoul Singapore Taipei Tokyo

Editor in Chief: Deirdre Lynch
Head of Learning Asset Acquisition, Global Editions: Laura Dent
Acquisitions Editor: Marianne Stepanian
Acquisitions Editor, Global Editions: Subhasree Patra
Project Editor: Dana Bettez
Assistant Editor: Sonia Ashraf
Senior Managing Editor: Karen Wernholm
Senior Project Editor, Global Editions: Vaijyanti
Senior Production Supervisor: Kathleen A. Manley
Senior Manufacturing Production Controller, Global Editions: Trudy Kimber
Digital Assets Manager: Marianne Groth
Manager, Multimedia Production: Christine Stavrou
Media Production Manager, Global Editions: M. Vikram Kumar
Software Development: John Flanagan, MathXL; Marty Wright, TestGen
Senior Marketing Manager: Erin Lane
Marketing Assistant: Kathleen DeChavez
Senior Author Support/Technology Specialist: Joe Vetere
Rights and Permissions Advisor: Cathy Pare
Image Manager: Rachel Youdelman
Procurement Specialist: Debbie Rossi
Art Direction: Barbara Atkinson
Cover Design: Lumina Datamatics
Text Design, Production Coordination, Composition, and Illustrations: PreMediaGlobal
Cover photo: © Sergey Nivens/Shutterstock

MICROSOFT® AND WINDOWS® ARE REGISTERED TRADEMARKS OF THE MICROSOFT CORPORATION IN THE U.S.A. AND OTHER COUNTRIES. THIS BOOK IS NOT SPONSORED OR ENDORSED BY OR AFFILIATED WITH THE MICROSOFT CORPORATION.
Illustrations of Microsoft Excel in this book have been taken from Microsoft Excel 2013, unless otherwise indicated.

MICROSOFT AND/OR ITS RESPECTIVE SUPPLIERS MAKE NO REPRESENTATIONS ABOUT THE SUITABILITY OF THE INFORMATION CONTAINED IN THE DOCUMENTS AND RELATED GRAPHICS PUBLISHED AS PART OF THE SERVICES FOR ANY PURPOSE. ALL SUCH DOCUMENTS AND RELATED GRAPHICS ARE PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND. MICROSOFT AND/OR ITS RESPECTIVE SUPPLIERS HEREBY DISCLAIM ALL WARRANTIES AND CONDITIONS WITH REGARD TO THIS INFORMATION, INCLUDING ALL WARRANTIES AND CONDITIONS OF MERCHANTABILITY, WHETHER EXPRESS, IMPLIED OR STATUTORY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT SHALL MICROSOFT AND/OR ITS RESPECTIVE SUPPLIERS BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF INFORMATION AVAILABLE FROM THE SERVICES.

THE DOCUMENTS AND RELATED GRAPHICS CONTAINED HEREIN COULD INCLUDE TECHNICAL INACCURACIES OR TYPOGRAPHICAL ERRORS. CHANGES ARE PERIODICALLY ADDED TO THE INFORMATION HEREIN. MICROSOFT AND/OR ITS RESPECTIVE SUPPLIERS MAY MAKE IMPROVEMENTS AND/OR CHANGES IN THE PRODUCT(S) AND/OR THE PROGRAM(S) DESCRIBED HEREIN AT ANY TIME.
PARTIAL SCREEN SHOTS MAY BE VIEWED IN FULL WITHIN THE SOFTWARE VERSION SPECIFIED.

Minitab © 2013. Portions of information contained in this publication/book are printed with permission of Minitab Inc. All such material remains the exclusive property and copyright of Minitab Inc. All rights reserved.

The contents, descriptions, and characters of WaldoLands and Waldowood are Copyright © 2014, 2011 Waldowood Productions, and used with permission.

Pearson Education Limited
Edinburgh Gate
Harlow
Essex CM20 2JE
England

and Associated Companies throughout the world

Visit us on the World Wide Web at: www.pearsonglobaleditions.com

© Pearson Education Limited 2015

The rights of Mark L. Berenson, David M. Levine and Kathryn A. Szabat to be identified as the authors of this work have been asserted by them in accordance with the Copyright, Designs and Patents Act 1988.

Authorized adaptation from the United States edition, entitled Basic Business Statistics: Concepts and Applications, 13th edition, ISBN 978-0-321-87002-5, by Mark L. Berenson, David M. Levine and Kathryn A. Szabat, published by Pearson Education © 2015.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without either the prior written permission of the publisher or a license permitting restricted copying in the United Kingdom issued by the Copyright Licensing Agency Ltd, Saffron House, 6–10 Kirby Street, London EC1N 8TS.

All trademarks used herein are the property of their respective owners. The use of any trademark in this text does not vest in the author or publisher any trademark ownership rights in such trademarks, nor does the use of such trademarks imply any affiliation with or endorsement of this book by such owners.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this
book, and Pearson was aware of a trademark claim, the designations have been printed in initial caps or all caps.

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

10 9 8 7 6 5 4 3 2 1
15 14 13 12 11

ISBN 10: 1-292-06902-3
ISBN 13: 978-1-292-06902-9

Typeset in Times LT Std by Lumina Datamatics.
Printed by Courier Kendallville in the United States of America.

About the Authors

The authors of this book: Kathryn Szabat, David Levine, and Mark Berenson at a Decision Sciences Institute meeting.

Mark L. Berenson is Professor of Management and Information Systems at Montclair State University (Montclair, New Jersey) and also Professor Emeritus of Statistics and Computer Information Systems at Bernard M. Baruch College (City University of New York). He currently teaches graduate and undergraduate courses in statistics and in operations management in the School of Business and an undergraduate course in international justice and human rights that he co-developed in the College of Humanities and Social Sciences.

Berenson received a B.A. in economic statistics and an M.B.A. in business statistics from City College of New York and a Ph.D. in business from the City University of New York.

Berenson’s research has been published in Decision Sciences Journal of Innovative Education, Review of Business Research, The American Statistician, Communications in Statistics, Psychometrika, Educational and Psychological Measurement, Journal of Management Sciences and Applied Cybernetics, Research Quarterly, Stats Magazine, The New York Statistician, Journal of Health Administration Education, Journal of Behavioral Medicine, and Journal of Surgical Oncology. His invited articles have appeared in The Encyclopedia of Measurement & Statistics and Encyclopedia of Statistical Sciences. He is co-author of 11 statistics texts published by Prentice Hall, including Statistics for
Managers Using Microsoft Excel, Basic Business Statistics: Concepts and Applications, and Business Statistics: A First Course.

Over the years, Berenson has received several awards for teaching and for innovative contributions to statistics education. In 2005, he was the first recipient of the Catherine A. Becker Service for Educational Excellence Award at Montclair State University and, in 2012, he was the recipient of the Khubani/Telebrands Faculty Research Fellowship in the School of Business.

David M. Levine is Professor Emeritus of Statistics and Computer Information Systems at Baruch College (City University of New York). He received B.B.A. and M.B.A. degrees in statistics from City College of New York and a Ph.D. from New York University in industrial engineering and operations research. He is nationally recognized as a leading innovator in statistics education and is the co-author of 14 books, including such best-selling statistics textbooks as Statistics for Managers Using Microsoft Excel, Basic Business Statistics: Concepts and Applications, Business Statistics: A First Course, and Applied Statistics for Engineers and Scientists Using Microsoft Excel and Minitab.

He also is the co-author of Even You Can Learn Statistics: A Guide for Everyone Who Has Ever Been Afraid of Statistics, currently in its second edition, Six Sigma for Green Belts and Champions and Design for Six Sigma for Green Belts and Champions, and the author of Statistics for Six Sigma Green Belts, all published by FT Press, a Pearson imprint, and Quality Management, third edition, McGraw-Hill/Irwin. He is also the author of Video Review of Statistics and Video Review of Probability, both published by Video Aided Instruction, and the statistics module of the MBA primer published by Cengage Learning. He has published articles in various journals, including Psychometrika, The American Statistician, Communications in
Statistics, Decision Sciences Journal of Innovative Education, Multivariate Behavioral Research, Journal of Systems Management, Quality Progress, and The American Anthropologist, and he has given numerous talks at the Decision Sciences Institute (DSI), American Statistical Association (ASA), and Making Statistics More Effective in Schools and Business (MSMESB) conferences. Levine has also received several awards for outstanding teaching and curriculum development from Baruch College.

Kathryn A. Szabat is Associate Professor and Chair of Business Systems and Analytics at La Salle University. She teaches undergraduate and graduate courses in business statistics and operations management.

Szabat’s research has been published in International Journal of Applied Decision Sciences, Accounting Education, Journal of Applied Business and Economics, Journal of Healthcare Management, and Journal of Management Studies. Scholarly chapters have appeared in Managing Adaptability, Intervention, and People in Enterprise Information Systems; Managing, Trade, Economies and International Business; Encyclopedia of Statistics in Behavioral Science; and Statistical Methods in Longitudinal Research.

Szabat has provided statistical advice to numerous business, nonbusiness, and academic communities. Her more recent involvement has been in the areas of education, medicine, and nonprofit capacity building.

Szabat received a B.S. in mathematics from State University of New York at Albany and M.S. and Ph.D. degrees in statistics, with a cognate in operations research, from the Wharton School of the University of Pennsylvania.

Brief Contents

Preface  19
Getting Started: Important Things to Learn First  29
1 Defining and Collecting Data  41
2 Organizing and Visualizing Variables  64
3 Numerical Descriptive Measures  129
4 Basic Probability  179
5 Discrete Probability Distributions  213
6 The Normal Distribution and Other Continuous Distributions  247
7 Sampling Distributions  278
8 Confidence Interval Estimation  300
9 Fundamentals of Hypothesis Testing: One-Sample Tests  336
10 Two-Sample Tests  375
11 Analysis of Variance  422
12 Chi-Square and Nonparametric Tests  475
13 Simple Linear Regression  519
14 Introduction to Multiple Regression  571
15 Multiple Regression Model Building  624
16 Time-Series Forecasting  657
17 Business Analytics  702
18 A Roadmap for Analyzing Data  735
19 Statistical Applications in Quality Management (online)
20 Decision Making (online)
Appendices A–G  743
Self-Test Solutions and Answers to Selected Even-Numbered Problems  795
Index  831

Contents

Preface  19

Getting Started: Important Things to Learn First  29
  Using Statistics: “You Cannot Escape from Data”  29
  GS.1 Statistics: A Way of Thinking  30
  GS.2 Data: What Is It?  31
  GS.3 Business Analytics: The Changing Face of Statistics  32
    “Big Data”  32
    Statistics: An Important Part of Your Business Education  33
  GS.4 Software and Statistics  34
    Excel and Minitab Guides  34
  References  35
  Key Terms  35
  Excel Guide  36
    EG1 Getting Started with Microsoft Excel  36
    EG2 Entering Data  36
    EG3 Opening and Saving Workbooks  37
    EG4 Creating and Copying Worksheets  38
    EG5 Printing Worksheets  38
  Minitab Guide  39
    MG1 Getting Started with Minitab  39
    MG2 Entering Data  39
    MG3 Opening and Saving Worksheets and Projects  39
    MG4 Creating and Copying Worksheets  40
    MG5 Printing Parts of a Project  40

1 Defining and Collecting Data  41
  Using Statistics: Beginning of the End … Or the End of the Beginning?  41
  1.1 Defining Data  42
    Establishing the Variable Type  42
  1.2 Measurement Scales for Variables  43
    Nominal and Ordinal Scales  43
    Interval and Ratio Scales  44
  1.3 Collecting Data  46
    Data Sources  46
    Populations and Samples  47
    Data Formatting  47
    Data Cleaning  48
    Recoding Variables  48
  1.4 Types of Sampling Methods  49
    Simple Random Sample  50
    Systematic Sample  51
    Stratified Sample  51
    Cluster Sample  51
  1.5 Types of Survey Errors  52
    Coverage Error  53
    Nonresponse Error  53
    Sampling Error  53
    Measurement Error  53
    Ethical Issues About Surveys  54
  Think About This: New Media Surveys/Old Sampling Problems  54
  Using Statistics: Beginning of the End … Revisited  55
  Summary  56
  References  56
  Key Terms  56
  Checking Your Understanding  57
  Chapter Review Problems  57
  Cases for Chapter 1  58
    Managing Ashland MultiComm Services  58
    CardioGood Fitness  58
    Clear Mountain State Student Surveys  59
    Learning with the Digital Cases  59
  Chapter 1 Excel Guide  61
    EG1.1 Defining Data  61
    EG1.2 Measurement Scales for Variables  61
    EG1.3 Collecting Data  61
    EG1.4 Types of Sampling Methods  61
  Chapter 1 Minitab Guide  62
    MG1.1 Defining Data  62
    MG1.2 Measurement Scales for Variables  62
    MG1.3 Collecting Data  63
    MG1.4 Types of Sampling Methods  63

2 Organizing and Visualizing Variables  64
  Using Statistics: The Choice Is Yours  64
  How to Proceed with This Chapter  65
  2.1 Organizing
Categorical Variables  66
    The Summary Table  66
    The Contingency Table  67
  2.2 Organizing Numerical Variables  70
    The Ordered Array  70
    The Frequency Distribution  71
    Classes and Excel Bins  73
    The Relative Frequency Distribution and the Percentage Distribution  73
    The Cumulative Distribution  75
    Stacked and Unstacked Data  77

Chapter 2  Organizing and Visualizing Variables

Problems for Section 2.3

Applying the Concepts

2.24  An online survey commissioned by Vizu, a Nielsen company, of digital marketing and media professionals on current attitudes and practices regarding paid social media advertising was conducted by Digiday in fall 2012. Advertisers were asked to indicate the primary purpose of their paid social media ads. The survey results were as follows:

Paid Social Media Advertising Objective                                                  Percentage
Primarily branding related, e.g., raising awareness, influencing brand opinions         45%
Primarily direct-response related, e.g., driving product trials or site visits          16%
Mix: more than half is branding                                                          25%
Mix: more than half is direct-response                                                   14%

Source: Data extracted from www.nielsen.com/us/en/reports/2013/the-paid-social-media-advertising-report-2013.html

a. Construct a bar chart, a pie chart, and a Pareto chart.
b. Which graphical method do you think is best for portraying these data?
c. What conclusions can you reach concerning the purpose of paid media advertising?

2.25  What do college students do with their time? A survey of 3,000 traditional-age students was taken, with the results as follows:

Activity                                  Percentage
Attending class/lab                        9%
Sleeping                                  24%
Socializing, recreation, other            51%
Studying                                   7%
Working, volunteering, student clubs       9%

Source: Data extracted from M. Marklein, “First Two Years of College Wasted?” USA Today, January 18, 2011, p. 3A.

a. Construct a bar chart, a pie chart, and a Pareto chart.
b. Which graphical method do you think is best for portraying these data?
c. What conclusions can you reach concerning what college students do with their time?

2.26  The following data have been recorded on the consumer complaints in a hotel:

Complaint type: Heating, Cleaning, Towels, Theft, Noise, Room Service
Number of consumer complaints: 30, 100, 50, 10, 20

a. Construct a Pareto chart.
b. Which complaint types make up the top 50% and the bottom 50% of complaints received?
c. Based on the results of (a) and (b), what would you advise the hotel management to prioritize?

2.27  The Edmunds.com NHTSA Complaints Activity Report contains consumer vehicle complaint submissions by automaker, brand, and category (data extracted from edmu.in/Ybmpuz). The following tables, stored in Automaker1 and Automaker2, represent complaints received by automaker and complaints received by category for January 2013.

Automaker                      Number
American Honda                   169
Chrysler LLC                     439
Ford Motor Company               440
General Motors                   551
Nissan Motors Corporation        467
Toyota Motor Sales               332
Other                            516

a. Construct a bar chart and a pie chart for the complaints received by automaker.
b. Which graphical method do you think is best for portraying these data?
Category                           Number
Airbags and seatbelts                201
Body and glass                       182
Brakes                                63
Fuel/emission/exhaust system         240
Interior electronics/hardware        279
Powertrain                         1,148
Steering                             397
Tires and wheels                      71

c. Construct a Pareto chart for the categories of complaints.
d. Discuss the “vital few” and “trivial many” reasons for the categories of complaints.

2.28  The following table indicates the percentage of residential electricity consumption in the United States, in a recent year, organized by type of use.

Type of Appliance     Percentage
Cooking                  4%
Cooling                  9%
Electronics              6%
Heating                 45%
Lighting                 6%
Refrigeration            4%
Water heating           18%
Wet cleaning             3%
Other                    5%

Source: Department of Energy

a. Construct a bar chart, a pie chart, and a Pareto chart.
b. Which graphical method do you think is best for portraying these data?
c. What conclusions can you reach concerning residential electricity consumption in the United States?

2.29  Visier’s Survey of Employers explores how North American organizations are solving the challenges of delivering workforce analytics. Employers were asked what would help them be successful with human resources metrics and reports. The responses were as follows (stored in Needs):

Needs                                                              Frequency
Easier-to-use analytic tools                                          127
Faster access to data                                                  41
Improved ability to present and interpret data                        123
Improved ability to plan actions                                       33
Improved ability to predict impacts of my actions                      49
Improved relationships to the business line organizations              37

Source: Data extracted from bit.ly/YuWYXc

a. Construct a bar chart and a pie chart.
b. What conclusions can you reach concerning needs for employer success with human resource metrics and reports?
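Several problems in this set (2.24 through 2.28) ask for a Pareto chart. As a rough sketch, and not a procedure prescribed by the text, the ordering and cumulative percentages that such a chart displays could be computed in Python as follows, using the complaint categories listed in Problem 2.27. The function name `pareto_table` is hypothetical and introduced only for illustration.

```python
# Complaint counts by category, as listed in Problem 2.27.
complaints = {
    "Airbags and seatbelts": 201,
    "Body and glass": 182,
    "Brakes": 63,
    "Fuel/emission/exhaust system": 240,
    "Interior electronics/hardware": 279,
    "Powertrain": 1148,
    "Steering": 397,
    "Tires and wheels": 71,
}


def pareto_table(counts):
    """Return (category, count, percent, cumulative percent) rows, largest count first.

    Hypothetical helper: a Pareto chart plots the bars in this descending order
    and overlays the cumulative percentage as a line.
    """
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows = []
    running = 0
    for category, count in ordered:
        running += count
        rows.append((category, count, 100 * count / total, 100 * running / total))
    return rows


for category, count, pct, cum_pct in pareto_table(complaints):
    print(f"{category:<32}{count:>6}{pct:>8.1f}{cum_pct:>8.1f}")
```

Reading down the cumulative column shows how quickly the largest categories account for most of the complaints, which is the "vital few" versus "trivial many" distinction that Problem 2.27(d) asks about.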
2.30  A study has been conducted on the prevalence of depression with respect to demographic features such as age, race, and gender. The survey was administered to 155 patients, and it was found that women are more likely to be depressed than men, as shown in the table below.

            Depressed    Not Depressed    Total
Male           54             68           122
Female         21             12            33
Total          75             80           155

Source: Data extracted from Gottlieb SS, Khatta M, Friedmann E, et al., “The Influence of Age, Gender, and Race on the Prevalence of Depression in Heart Failure Patients.”

a. Which type of graph will be most suitable for the given data?
b. Draw the graph so that meaningful conclusions can be drawn from it.

2.31  A research report states that an adolescent's overall health impacts their physical and psychological well-being. The report looks at the direct relationship between nutritional intake and academic achievement. Milk intake was looked at specifically, as it is rich in calcium and contributes to bone growth. However, it was found that teens consume twice as much soft drink as milk. Assume the following data are collected:

                        Good PE grades    Not good PE grades
Drink milk                    67                  33
Do not drink milk             22                  48

a. Construct contingency tables based on total percentages, row percentages, and column percentages.
b. Draw a side-by-side bar chart.
c. What conclusion can you draw related to drinking milk and PE grades? Which method of analysis is better, contingency tables or side-by-side graphs?
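The three percentage bases named in Problem 2.31(a) differ only in the denominator used for each cell. A minimal Python sketch of that calculation follows; it assumes the counts read row by row as Drink milk: 67 and 33, Do not drink milk: 22 and 48, and the function name `percentages` is hypothetical, not something defined in the text.

```python
# Counts from Problem 2.31, under an assumed row-by-row reading of the table:
# rows are milk habits, columns are PE grade outcomes.
row_labels = ["Drink milk", "Do not drink milk"]
col_labels = ["Good PE grades", "Not good PE grades"]
counts = [[67, 33], [22, 48]]


def percentages(table, basis):
    """Convert cell counts to percentages of the grand total, row total, or column total.

    Hypothetical helper; `basis` must be "total", "row", or "column".
    """
    grand_total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    result = []
    for i, row in enumerate(table):
        converted = []
        for j, cell in enumerate(row):
            if basis == "total":
                denominator = grand_total
            elif basis == "row":
                denominator = row_totals[i]
            elif basis == "column":
                denominator = col_totals[j]
            else:
                raise ValueError("basis must be 'total', 'row', or 'column'")
            converted.append(round(100 * cell / denominator, 1))
        result.append(converted)
    return result


for basis in ("total", "row", "column"):
    print(f"Percentages based on {basis}:")
    for label, row in zip(row_labels, percentages(counts, basis)):
        print(f"  {label:<20}", row)
```

Row percentages answer "of the students who drink milk, what share have good PE grades?", while column percentages answer "of the students with good PE grades, what share drink milk?"; choosing the basis is therefore a choice of which question to ask.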
2.32  A study was conducted to find whether dogs resemble their owners. The study found that people tend to select dogs that in some way resemble them and that the resemblance increases with the duration of ownership. Assume that this finding is specific to a particular breed of dogs and that the following data have been collected:

                          Specific Breed    Other Dogs
Resemble Owner                  20              12
Do Not Resemble Owner           11              17

a. Draw a side-by-side chart to show whether only dogs of a specific breed resemble their owners, or whether dogs of all breeds do so.
b. What conclusions can you draw from the chart?

2.4  Visualizing Numerical Variables

You visualize the data for a numerical variable through a variety of techniques that show the distribution of values. These techniques include the stem-and-leaf display, the histogram, the percentage polygon, and the cumulative percentage polygon (ogive), all discussed in this section, as well as the boxplot, which requires descriptive summary measures, as explained in Section 3.3.

The Stem-and-Leaf Display

A stem-and-leaf display visualizes data by presenting the data as one or more row-wise stems that represent a range of values. In turn, each stem has one or more leaves that branch out to the right of their stem and represent the values found in that stem. For stems with more than one leaf, the leaves are arranged in ascending order.

Stem-and-leaf displays allow you to see how the data are distributed and where concentrations of data exist. Leaves typically present the last significant digit of each value, but sometimes you round values. For example, suppose you collect the following meal costs (in $) for 15 classmates who had lunch at a fast-food restaurant (stored in FastFood):

7.42 6.29 5.83 6.50 8.34 9.51 7.10 6.80 5.90 4.89 6.50 5.52 7.90 8.30 9.60

To construct the stem-and-leaf display, you use whole dollar amounts as the stems
and round the cents to one decimal place to use as the leaves. For the first value, 7.42, the stem would be 7 and its leaf would be 4. For the second value, 6.29, the stem would be 6 and its leaf 3. The completed stem-and-leaf display for these data is

4 | 9
5 | 589
6 | 3558
7 | 149
8 | 33
9 | 56

Student Tip: If you turn a stem-and-leaf display sideways, the display looks like a histogram.

Example 2.8  Stem-and-Leaf Display of the One-Year Return Percentage for the Value Funds

As a member of the company task force in The Choice Is Yours scenario (see page 64), you want to study the past performance of the value funds. One measure of past performance is the numerical variable 1YrReturn%, the one-year return percentage. Using the data from the 89 value funds, you want to visualize this variable as a stem-and-leaf display.

Solution  Figure 2.7 illustrates the stem-and-leaf display of the one-year return percentage for value funds.

Figure 2.7  Minitab stem-and-leaf display of the one-year return percentage for value funds. Using Excel with PHStat will create an equivalent display that contains a different set of stems.

Figure 2.7 allows you to conclude:

• The lowest one-year return was approximately
• The highest one-year return was 28.
• The one-year returns were concentrated between 12 and 19.
• Very few of the one-year returns were above 21.

The Histogram

A histogram visualizes data as a vertical bar chart in which each bar represents a class interval from a frequency or percentage distribution. In a histogram, you display the numerical variable along the horizontal (X) axis and use the vertical (Y) axis to represent either the frequency or the percentage of values per class interval. There are never any gaps between adjacent bars in a histogram.

Figure 2.8 visualizes the data of Table 2.9 on page 71, meal costs at city and suburban restaurants, as a pair of frequency histograms. The histogram for city
restaurants shows that the cost of meals is concentrated between approximately $30 and $60. Only one meal at city restaurants cost more than $80. The histogram for suburban restaurants shows that the cost of meals is also concentrated between $30 and $60. However, many more meals at suburban restaurants cost between $30 and $40 than at city restaurants. Very few meals at suburban restaurants cost more than $70.

Figure 2.8 Minitab frequency histograms for meal costs at city and suburban restaurants

Example 2.9 Histograms of the One-Year Return Percentages for the Growth and Value Funds

As a member of the company task force in The Choice Is Yours scenario (see page 64), you seek to compare the past performance of the growth funds and the value funds, using the one-year return percentage variable. Using the data from the sample of 316 funds, you construct histograms for the growth and the value funds to create a visual comparison.

Solution  Figure 2.9 displays frequency histograms for the one-year return percentages for the growth and value funds. Reviewing the histograms in Figure 2.9 leads you to conclude that the returns were lower for the growth funds than for the value funds. The return for both the growth funds and the value funds is concentrated between 10 and 20, but the return for the value funds is more concentrated between 15 and 20, while the return for the growth funds is more concentrated between 10 and 15.

Figure 2.9 Excel frequency histograms for the one-year return percentages for the growth and value funds

The Percentage Polygon

When using a categorical variable to divide the data of a numerical variable into two or more groups, you visualize data by constructing a percentage polygon. This chart uses the midpoints of each class interval to represent the data of each class and then plots the midpoints, at their respective class
percentages, as points on a line along the X axis. While you can construct two or more histograms, as was done in Figures 2.8 and 2.9, a percentage polygon allows you to make a direct comparison that is easier to interpret. (You cannot, of course, combine two histograms into one chart, as bars from the two groups would overlap and obscure data.)

Figure 2.10 displays percentage polygons for the cost of meals at city and suburban restaurants. Compare this figure to the pair of histograms in Figure 2.8 on page 87. Reviewing the polygons in Figure 2.10 allows you to make the same observations as were made when examining Figure 2.8, including the fact that while city and suburban restaurant meal costs are both concentrated between $30 and $60, suburban restaurants have a much higher concentration between $30 and $40. However, unlike the pair of histograms, the polygons allow you to more easily identify which class intervals have similar percentages for the two groups and which do not.

The polygons in Figure 2.10 have points whose values on the X axis represent the midpoint of the class interval. For example, look at the points plotted at X = 35 ($35). The point for meal costs at city restaurants (the lower one) shows that 20% of the meals cost between $30 and $40, while the point for the meal costs at suburban restaurants (the higher one) shows that 34% of meals at these restaurants cost between $30 and $40.

Figure 2.10 Minitab percentage polygons of meal costs for city and suburban restaurants

When you construct polygons or histograms, the vertical (Y) axis should include zero to avoid distorting the character of the data. The horizontal (X) axis does not need to show the zero point for the numerical variable, but a major portion of the axis should be devoted to the entire range of values for the variable.

Example 2.10 Percentage Polygons of the One-Year Return Percentage for the Growth
and Value Funds

As a member of the company task force in The Choice Is Yours scenario (see page 64), you seek to compare the past performance of the growth funds and the value funds using the one-year return percentage variable. Using the data from the sample of 316 funds, you construct percentage polygons for the growth and value funds to create a visual comparison.

Solution  Figure 2.11 displays percentage polygons of the one-year return percentage for the growth and value funds.

Figure 2.11 Excel percentage polygons of the one-year return percentages for the growth and value funds

Figure 2.11 shows that the value funds polygon is to the right of the growth funds polygon. This allows you to conclude that the one-year return percentage is higher for value funds than for growth funds. The polygons also show that the return for value funds is concentrated between 15 and 20, and the return for the growth funds is concentrated between 10 and 15.

The Cumulative Percentage Polygon (Ogive)

The cumulative percentage polygon, or ogive, uses the cumulative percentage distribution discussed in Section 2.2 to plot the cumulative percentages along the Y axis. Unlike the percentage polygon, the lower boundaries of the class intervals for the numerical variable are plotted, at their respective cumulative percentages, as points on a line along the X axis.

Figure 2.12 shows cumulative percentage polygons of meal costs for city and suburban restaurants. In this chart, the lower boundaries of the class intervals (20, 30, 40, etc.) are approximated by the upper boundaries of the previous bins (19.99, 29.99, 39.99, etc.).
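
The cumulative percentages that an ogive plots can be computed directly from the raw values. The following is a minimal sketch, not the method used by Excel or Minitab: it reuses the 15 fast-food meal costs from the stem-and-leaf example, and the $1-wide class boundaries and the function name cumulative_percentages are assumptions chosen for illustration, since the text does not define classes for these data.

```python
from bisect import bisect_left


def cumulative_percentages(values, boundaries):
    """Return (boundary, percentage of values strictly below that boundary) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    # bisect_left gives the count of values strictly less than the boundary.
    return [(b, 100.0 * bisect_left(ordered, b) / n) for b in boundaries]


# Meal costs (in $) for the 15 classmates, from the stem-and-leaf example.
meal_costs = [7.42, 6.29, 5.83, 6.50, 8.34, 9.51, 7.10, 6.80,
              5.90, 4.89, 6.50, 5.52, 7.90, 8.30, 9.60]

# Assumed lower class boundaries of width $1 (hypothetical, for illustration only).
boundaries = [4, 5, 6, 7, 8, 9, 10]

for boundary, pct in cumulative_percentages(meal_costs, boundaries):
    print(f"less than ${boundary}: {pct:.1f}%")
```

Each printed pair is one point of the ogive: the lower class boundary on the X axis and the percentage of values below it on the Y axis.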
Reviewing the curves in Figure 2.12 leads you to conclude that the curve of the cost of meals at the city restaurants is located to the right of the curve for the suburban restaurants. This indicates that the city restaurants have fewer meals that cost less than a particular value. For example, 52% of the meals at city restaurants cost less than $50, as compared to 68% of the meals at suburban restaurants.

Figure 2.12 Minitab cumulative percentage polygons of meal costs for city and suburban restaurants

Example 2.11 Cumulative Percentage Polygons of the One-Year Return Percentages for the Growth and Value Funds

As a member of the company task force in The Choice Is Yours scenario (see page 64), you seek to compare the past performance of the growth funds and the value funds using the one-year return percentage variable. Using the data from the sample of 316 funds, you construct cumulative percentage polygons for the growth and the value funds.

Solution  Figure 2.13 displays cumulative percentage polygons of the one-year return percentages for the growth and value funds.

Figure 2.13 Excel cumulative percentage polygons of the one-year return percentages for the growth and value funds. (In Microsoft Excel, you approximate the lower boundary by using the upper boundary of the previous bin.)

The cumulative percentage polygons in Figure 2.13 show that the curve for the one-year return percentage for the growth funds is located slightly to the left of the curve for the value funds. This allows you to conclude that the growth funds have fewer one-year return percentages that are higher than a particular value. For example, 59.03% of the growth funds had one-year return percentages below 15, as compared to 48.31% of the value funds. You can conclude that, in general, the value funds slightly outperformed the growth funds in their one-year returns.
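
The stem-and-leaf construction described at the start of this section can also be sketched in code. The sketch below assumes non-negative values, uses whole units as stems and rounded tenths as leaves (as in the meal-cost example), and omits stems that have no leaves; the function name stem_and_leaf is hypothetical and is not part of Excel, PHStat, or Minitab.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Build display rows: whole units as stems, rounded tenths as leaves."""
    leaves_by_stem = defaultdict(list)
    for value in values:
        # Round to one decimal place (half up); valid for non-negative values.
        tenths = int(value * 10 + 0.5)
        stem, leaf = divmod(tenths, 10)
        leaves_by_stem[stem].append(leaf)
    rows = []
    for stem in sorted(leaves_by_stem):
        # Leaves are listed in ascending order to the right of their stem.
        leaves = "".join(str(leaf) for leaf in sorted(leaves_by_stem[stem]))
        rows.append(f"{stem} | {leaves}")
    return rows


# Meal costs (in $) for the 15 classmates, from the stem-and-leaf example.
meal_costs = [7.42, 6.29, 5.83, 6.50, 8.34, 9.51, 7.10, 6.80,
              5.90, 4.89, 6.50, 5.52, 7.90, 8.30, 9.60]

for row in stem_and_leaf(meal_costs):
    print(row)
```

For the meal costs, the rows produced are 4 | 9, 5 | 589, 6 | 3558, 7 | 149, 8 | 33, and 9 | 56.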
Problems for Section 2.4

Learning the Basics

2.33  Construct a stem-and-leaf display, given the following data from a sample of midterm exam scores in finance:

54 69 98 93 53 74

2.34  Construct an ordered array, given the following stem-and-leaf display from a sample of n = midterm exam scores in information systems:

446 19

Applying the Concepts

2.35  The following is a stem-and-leaf display representing the amount of gasoline purchased, in gallons (with leaves in tenths of gallons), for a sample of 25 cars that use a particular service station on the New Jersey Turnpike:

 9 | 147
10 | 02238
11 | 125566777
12 | 223489
13 | 02

a. Construct an ordered array.
b. Which of these two displays seems to provide more information? Discuss.
c. What amount of gasoline (in gallons) is most likely to be purchased?
d. Is there a concentration of the purchase amounts in the center of the distribution?

SELF Test 2.36  The file BBCost2012 contains the total cost (in $) for four tickets, two beers, four soft drinks, four hot dogs, two game programs, two baseball caps, and parking for one vehicle at each of the 30 Major League Baseball parks during the 2012 season.
Source: Data extracted from fancostexperience.com/pages/fcx /fci_pdfs//8.pdfs
a. Construct a stem-and-leaf display.
b. Around what value, if any, are the costs of attending a baseball game concentrated? Explain.

2.37  The file Caffeine contains the caffeine content (in milligrams per ounce) for a sample of 26 energy drinks:

3.2 1.5 4.6 8.9 7.1 9.0 9.4 31.2 10.0 10.1 9.9 11.5 11.8
11.7 13.8 14.0 16.1 74.5 10.8 26.3 17.7 113.3 32.5 14.0 91.6 127.4

Source: Data extracted from “The Buzz on Energy-Drink Caffeine,” Consumer Reports, December 2012.
a. Construct an ordered array.
b. Construct a stem-and-leaf display.
c. Does the ordered array or the stem-and-leaf display provide more information? Discuss.
d. Around what value, if any, is the amount of caffeine in energy drinks concentrated? Explain.

2.38  The file Utility contains the following data about the cost of electricity during July 2013 for a random sample of 50 one-bedroom apartments in a large city:

 96 157 141  95 108 171 185 149 163 119
202  90 206 150 183 178 116 175 154 151
147 172 123 130 114 102 111 128 143 135
153 148 144 187 191 197 213 168 166 137
127 130 109 139 129  82 165 167 149 158

a. Construct a histogram and a percentage polygon.
b. Construct a cumulative percentage polygon.
c. Around what amount does the monthly electricity cost seem to be concentrated?

2.39  As player salaries have increased, the cost of attending baseball games has increased dramatically. The following histogram and cumulative percentage polygon visualize the total cost (in $) for four tickets, two beers, four soft drinks, four hot dogs, two game programs, two baseball caps, and parking for one vehicle at each of the 30 Major League Baseball parks during the 2012 season, stored in BBCost2012.
What conclusions can you reach concerning the cost of attending a baseball game at different ballparks?

2.40  The following histogram and cumulative percentage polygon visualize the data about the property taxes per capita ($) for the 50 states and the District of Columbia, stored in Property Taxes.
What conclusions can you reach concerning the property taxes per capita?

2.41  How much time do Americans living in or near cities spend waiting in traffic, and how much does waiting in traffic cost them per year? The data in the file Congestion include this cost for 31 cities. (Source: Data extracted from “The High Cost of Congestion,” Time, October 17, 2011, p. 18.) For the time Americans living in or near cities spend waiting in traffic and the cost of waiting in traffic per year,
a. Construct a percentage histogram.
b. Construct a cumulative percentage polygon.
c. What conclusions can you reach concerning the time Americans living in or near cities spend waiting in traffic?
d. What conclusions can you reach concerning the cost of waiting in traffic per year?

2.42  How do the average credit scores of people living in various cities differ? The file Credit Scores contains an ordered array of the average credit scores of 143 American cities. (Data extracted from usat.ly/17a1fA6.)
a. Construct a percentage histogram.
b. Construct a cumulative percentage polygon.
c. What conclusions can you reach concerning the average credit scores of people living in different American cities?

2.43  One operation of a mill is to cut pieces of steel into parts that will later be used as the frame for front seats in an automobile. The steel is cut with a diamond saw and requires the resulting parts to be within ±0.005 inch of the length specified by the automobile company. The data are collected from a sample of 100 steel parts and stored in Steel. The measurement reported is the difference in inches between the actual length of the steel part, as measured by a laser measurement device, and the specified length of the steel part. For example, the first value, -0.002, represents a steel part that is 0.002 inch shorter than the specified length.
a. Construct a percentage histogram.
b. Is the steel mill doing a good job meeting the requirements set by the automobile company? Explain.

2.44  A manufacturing company produces steel housings for electrical equipment. The main component part of the housing is a steel trough that is made out of a 14-gauge steel coil. It is produced using a 250-ton progressive punch press with a wipe-down operation that puts two 90-degree forms in the flat steel to make the trough. The distance from one side of the form to the other is critical because of weatherproofing in outdoor applications. The company requires that the width of the trough be between 8.31 inches and 8.61 inches. The widths of the troughs, in inches, collected from a sample of 49 troughs, are stored in Trough.
a. Construct a percentage histogram and a percentage polygon.
b. Plot a cumulative percentage polygon.
c. What can you conclude about the number of troughs that will meet the company’s requirements of troughs being between 8.31 and 8.61 inches wide?

2.45  The manufacturing company in Problem 2.44 also produces electric insulators. If the insulators break when in use, a short circuit is likely to occur. To test the strength of the insulators, destructive testing in high-powered labs is carried out to determine how much force is required to break the insulators. Force is measured by observing how many pounds must be applied to the insulator before it breaks. The force measurements, collected from a sample of 30 insulators, are stored in Force.
a. Construct a percentage histogram and a percentage polygon.
b. Construct a cumulative percentage polygon.
c. What can you conclude about the strengths of the insulators if the company requires a force measurement of at least 1,500 pounds before the insulator breaks?

2.46  The file Bulbs contains the life (in hours) of a sample of 40 20-watt compact fluorescent light bulbs produced by Manufacturer A and a sample of 40 20-watt compact fluorescent light bulbs produced by Manufacturer B. Use the following class interval widths for each distribution:
Manufacturer A: 6,500 but less than 7,500, 7,500 but less than 8,500, and so on.
Manufacturer B: 7,500 but less than 8,500, 8,500 but less than 9,500, and so on.
a. Construct percentage histograms on separate graphs and plot the percentage polygons on one graph.
b. Plot cumulative percentage polygons on one graph.
c. Which manufacturer has bulbs with a longer life, Manufacturer A or Manufacturer B?
Explain.

2.47  The data stored in Drink represent the amount of soft drink in a sample of 50 2-liter bottles.
a. Construct a histogram and a percentage polygon.
b. Construct a cumulative percentage polygon.
c. On the basis of the results in (a) and (b), does the amount of soft drink filled in the bottles concentrate around specific values?

2.5  Visualizing Two Numerical Variables

Visualizing two numerical variables together can reveal possible relationships between two variables and serve as a basis for applying the methods discussed in Chapters 13 through 17. To visualize two numerical variables, you construct a scatter plot. For the special case in which one of the two variables represents the passage of time, you construct a time-series plot.

The Scatter Plot

A scatter plot explores the possible relationship between two numerical variables by plotting the values of one numerical variable on the horizontal, or X, axis and the values of a second numerical variable on the vertical, or Y, axis. For example, a marketing analyst could study the effectiveness of advertising by comparing advertising expenses and sales revenues of 50 stores by using the X axis to represent advertising expenses and the Y axis to represent sales revenues.

Example 2.12 Scatter Plot for NBA Investment Analysis

Suppose that you are an investment analyst who has been asked to review the valuations of the 30 NBA professional basketball teams. You seek to know if the value of a team reflects its revenues. You collect revenue and valuation data (both in $millions) for all 30 NBA teams, organize the data as Table 2.18, and store the data in NBAValues. To quickly visualize a possible relationship between team revenues and valuations, you construct a scatter plot as shown in Figure 2.14, in which you plot the revenues on the X axis and the value of the team on the Y axis.

Table 2.18 Revenues and Values for NBA Teams (revenue and value in $millions)

Team Code  Revenue  Value    Team Code  Revenue  Value    Team Code  Revenue  Value
ATL           99     316     HOU          135     568     OKC          127     475
BOS          143     730     IND           98     383     ORL          126     470
BRK           84     530     LAC          108     430     PHI          107     418
CHA           93     315     LAL          197   1,000     PHX          121     474
CHI          162     800     MEM           96     377     POR          117     457
CLE          128     434     MIA          150     625     SAC           96     525
DAL          137     685     MIL           87     312     SAS          135     527
DEN          110     427     MIN           96     364     TOR          121     405
DET          125     400     NOH          100     340     UTA          111     432
GSW          127     555     NYK          243   1,100     WAS          102     397

Source: Data extracted from www.forbes.com/nba-valuations.

Solution  From Figure 2.14, you see that there appears to be a strong increasing (positive) relationship between revenues and the value of a team. In other words, teams that generate a smaller amount of revenues have a lower value, while teams that generate higher revenues have a higher value. This relationship has been highlighted by the addition of a linear regression prediction line that will be discussed in Chapter 13.

Figure 2.14 Scatter plot of revenue and value for NBA teams

Other pairs of variables may have a decreasing (negative) relationship in which one variable decreases as the other increases. In other situations, there may be a weak or no relationship between the variables.

Learn More: Read the Short Takes for Chapter 2 for an example that illustrates a negative relationship.

The Time-Series Plot

A time-series plot plots the values of a numerical variable on the Y axis and plots the time period associated with each numerical value on the X axis. A time-series plot can help you visualize trends in data that occur over time.

Example 2.13 Time-Series Plot for Movie Revenues

As an investment analyst who specializes in the entertainment industry, you are interested in discovering any long-term trends in movie revenues. You collect the annual revenues (in $billions) for movies released from 1995 to 2012, organize the data as Table 2.19, and store the data in Movie Revenues. To see if there is a trend over time, you construct the time-series plot shown in Figure 2.15.

Table 2.19 Movie Revenues (in $billions) from 1995 to 2012

Year  Revenue ($billions)    Year  Revenue ($billions)
1995         5.29            2004         9.27
1996         5.59            2005         8.95
1997         6.51            2006         9.25
1998         6.77            2007         9.63
1999         7.30            2008         9.95
2000         7.48            2009        10.65
2001         8.13            2010        10.50
2002         9.19            2011        10.28
2003         9.35            2012        10.71

Source: Data extracted from www.the-numbers.com/market, March 18, 2013.

Solution  From Figure 2.15, you see that there was a steady increase in the revenue of movies between 1995 and 2003, a leveling off from 2003 to 2006, followed by a further increase from 2007 to 2009, followed by another leveling off from 2010 to 2012. During that time, the revenue increased from under $6 billion in 1995 to more than $10 billion in 2009 to 2012.

Figure 2.15 Time-series plot of movie revenue per year from 1995 to 2012

Problems for Section 2.5

Learning the Basics

2.48  The following is a set of data from a sample of n = 11 items:

X:
Y:

a. Construct a scatter plot.
b. Is there a relationship between X and Y? Explain.

2.49  The following is a series of annual sales (in $millions) over an 11-year period (2002 to 2012):

Year:  2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012
Sales: 13.0 17.0 19.0 20.0 20.5 20.5 20.5 20.0 19.0 17.0 13.0

a. Construct a time-series plot.
b. Does there appear to be any change in annual sales over time?
Explain.

Applying the Concepts

SELF Test 2.50  Movie companies need to predict the gross receipts of individual movies once a movie has debuted. The following results, stored in PotterMovies, are the first weekend gross, the U.S. gross, and the worldwide gross (in $millions) of the eight Harry Potter movies:

Title                      First Weekend   U.S. Gross    Worldwide Gross
                            ($millions)    ($millions)     ($millions)
Sorcerer’s Stone               90.295        317.558         976.458
Chamber of Secrets             88.357        261.988         878.988
Prisoner of Azkaban            93.687        249.539         795.539
Goblet of Fire                102.335        290.013         896.013
Order of the Phoenix           77.108        292.005         938.469
Half-Blood Prince              77.836        301.460         934.601
Deathly Hallows Part I        125.017        295.001         955.417
Deathly Hallows Part II       169.189        381.011       1,328.111

Source: Data extracted from www.the-numbers.com/interactive/comp-HarryPotter.php

a. Construct a scatter plot with first weekend gross on the X axis and U.S. gross on the Y axis.
b. Construct a scatter plot with first weekend gross on the X axis and worldwide gross on the Y axis.
c. What can you say about the relationship between first weekend gross and U.S. gross and between first weekend gross and worldwide gross?

2.51  Data were collected on the typical cost of dining at American-cuisine restaurants within a 1-mile walking distance of a hotel located in a large city. The file Bundle contains the typical cost (a per transaction cost in $) as well as a Bundle score, a measure of overall popularity and customer loyalty, for each of 40 selected restaurants. (Data extracted from www.bundle.com via the link on-msn.com/MnlBxo.)
a. Construct a scatter plot with Bundle score on the X axis and typical cost on the Y axis.
b. What conclusions can you reach about the relationship between Bundle score and typical cost?
2.52  College football is big business, with coaches’ pay and revenues in millions of dollars. The file College Football contains the coaches’ total pay and net revenue for college football at 105 schools. (Data extracted from “College Football Coaches Continue to See Salary Explosion,” USA Today, November 20, 2012, p. 1C.)
a. Do you think schools with higher net revenues also have higher coaches’ pay?
b. Construct a scatter plot with net revenue on the X axis and coaches’ pay on the Y axis.
c. Does the scatter plot confirm or contradict your answer to (a)?

2.53  A Pew Research Center survey found that social networking is popular in many nations around the world. The file GlobalSocialMedia contains the level of social media networking (measured as the percentage of individuals polled who use social networking sites) and the GDP at purchasing power parity (PPP) per capita for each of 25 selected countries. (Data extracted from Pew Research Center, “Global Digital Communication: Texting, Social Networking Popular Worldwide,” updated February 29, 2012, via the link bit.ly/sNjsmq.)
a. Construct a scatter plot with GDP (PPP) per capita on the X axis and social media usage on the Y axis.
b. What conclusions can you reach about the relationship between GDP and social media usage?

2.54  How have stocks performed in the past? The following table presents the data stored in Stock Performance and shows the performance of a broad measure of stocks (by percentage) for each decade from the 1830s through the 2000s:

Decade   Performance (%)
1830s         2.8
1840s        12.8
1850s         6.6
1860s        12.5
1870s         7.5
1880s         6.0
1890s         5.5
1900s        10.9
1910s         2.2
1920s        13.3
1930s        -2.2
1940s         9.6
1950s        18.2
1960s         8.3
1970s         6.6
1980s        16.6
1990s        17.6
2000s*       -0.5

*Through December 15, 2009.
Source: Data extracted from T. Lauricella, “Investors Hope the '10s Beat the '00s,” The Wall Street Journal, December 21, 2009, pp. C1, C2.

a. Construct a time-series plot of the stock performance from the 1830s to the 2000s.
b. Does there appear to be any pattern in the data?

2.55  The data in NewHomeSales represent the number and median sales price of new single-family houses sold in the United States, recorded at the end of each month from January 2000 through December 2012. (Data extracted from www.census.gov, February 24, 2013.)
a. Construct a time-series plot of new home sales prices.
b. What pattern, if any, is present in the data?

2.56  The file Movie Attendance contains the yearly movie attendance (in billions) from 2001 through 2012:

Year   Attendance (billions)
2001          1.44
2002          1.58
2003          1.55
2004          1.49
2005          1.40
2006          1.41
2007          1.40
2008          1.39
2009          1.42
2010          1.33
2011          1.30
2012          1.37

Source: Data extracted from the-numbers.com/market.

a. Construct a time-series plot for the movie attendance (in billions).
b. What pattern, if any, is present in the data?

2.57  The file Audits contains the number of audits of corporations with assets of more than $250 million conducted by the Internal Revenue Service between 2001 and 2012. (Data extracted from www.irs.gov.)
a. Construct a time-series plot.
b. What pattern, if any, is present in the data?
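
Before drawing a time-series plot such as the one in Example 2.13, it can help to line up each time period with its value and look at the period-to-period changes. The following is a rough sketch under stated assumptions: it uses the movie revenue figures from Table 2.19, the helper name year_over_year is hypothetical, and the row of asterisks is only a crude text stand-in for a real chart produced in Excel or Minitab.

```python
def year_over_year(series):
    """Return the change from each period to the next, rounded to 2 decimals."""
    return [round(later - earlier, 2) for earlier, later in zip(series, series[1:])]


# Table 2.19: movie revenues (in $billions) from 1995 to 2012.
years = list(range(1995, 2013))
revenues = [5.29, 5.59, 6.51, 6.77, 7.30, 7.48, 8.13, 9.19, 9.35,
            9.27, 8.95, 9.25, 9.63, 9.95, 10.65, 10.50, 10.28, 10.71]

changes = year_over_year(revenues)

# X axis (time) runs down the page; bar length stands in for the Y axis.
print(f"{years[0]} {revenues[0]:6.2f}        {'*' * round(revenues[0] * 4)}")
for year, revenue, change in zip(years[1:], revenues[1:], changes):
    print(f"{year} {revenue:6.2f} {change:+.2f}  {'*' * round(revenue * 4)}")
```

The signs of the changes line up with the reading of Figure 2.15: every change from 1995 through 2003 is positive, and the first decrease appears in 2004.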
2.6  Organizing Many Categorical Variables

(Methods designed specifically to visualize many categorical variables are beyond the scope of this book to discuss.)

You construct a multidimensional contingency table to tally the responses of three or more categorical variables. In the simplest case of three categorical variables, each cell in the table contains the tallies of the third variable, organized by the subgroups represented by the row and column variables.

Both Excel and Minitab can organize many variables at the same time, but the two programs have different strengths. Using Excel, you can create a PivotTable, an interactive table that facilitates exploring multidimensional data. A PivotTable summarizes the variables as a multidimensional table and allows you to interactively change the level of summarization and the arrangement and formatting of the variables. PivotTables also allow you to interactively “slice” your data to summarize subsets of data that meet specified criteria, as discussed in Section 17.1.

Using Minitab, you can also create multidimensional tables, but unlike Excel PivotTables, the Minitab tables are not interactive. However, Minitab, unlike Excel, contains a number of specialized statistical and graphing procedures (beyond the scope of this book to discuss) that can be used to analyze and visualize multidimensional data.

Consider the Table 2.5 contingency table on page 67 that jointly tallies the type and risk variables for the sample of 316 retirement funds as percentages of the overall total. For convenience, this table is shown as a two-dimensional PivotTable in the left illustration of Figure 2.16. This table shows, among other things, that there are many more growth funds of low risk than of average or high risk.

Figure 2.16 PivotTables for the retirement funds sample showing percentage of overall total for fund type and risk (left) and for fund type, market cap, and risk (right)

Adding a third categorical variable, the market cap of the fund, creates the multidimensional contingency table shown at right in Figure 2.16. This new PivotTable reveals the following patterns that cannot be seen in the original Table 2.5 contingency table:
• For the growth funds, the pattern of risk differs depending on the market cap of the fund. Large cap funds are most likely to have low risk and are very unlikely to have high risk. Mid-cap funds are equally likely to have low or average risk. Small cap funds are most likely to have average risk and are less likely to have high risk.
• The value funds show a pattern of risk that is different from the pattern seen in the growth funds. Mid-cap funds are more likely to have low risk. Almost all of the large value funds are low risk, and the small value funds are equally likely to have low or average risk.

Based on these results, the market cap of the mutual fund (small cap, mid-cap, large cap) is an example of a lurking variable, a variable that is affecting the results of the other variables. The relationship between the type of fund (growth or value) and the level of risk is clearly affected by the market cap of the mutual fund (small cap, mid-cap, or large cap).

Problems for Section 2.6

Applying the Concepts

Using the sample of retirement funds stored in RetirementFunds:
a. Construct a table that tallies type, market cap, and rating.
b. What conclusions can you reach concerning differences among the types of retirement funds (growth and value), based on market cap (small, mid-cap, and large) and the rating (one, two, three, four, and five)?
Using the sample of retirement funds stored in RetirementFunds:
a. Construct a table that tallies market cap, risk, and rating.
b. What conclusions can you reach concerning differences among the types of funds based on market cap (small, mid-cap, and large), risk (low, average, and high), and the rating (one, two, three, four, and five)?

Using the sample of retirement funds stored in RetirementFunds:
a. Construct a table that tallies type, risk, and rating.
b. What conclusions can you reach concerning differences among the types of retirement funds (growth and value), based on the risk (low, average, and high), and the rating (one, two, three, four, and five)?

Using the sample of retirement funds stored in RetirementFunds:
a. Construct a table that tallies type, market cap, risk, and rating.
b. What conclusions can you reach concerning differences among the types of funds based on market cap (small, mid-cap, and large), based on type (growth and value), risk (low, average, and high), and rating (one, two, three, four, and five)?
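
The tallying behind a multidimensional contingency table, as described in Section 2.6, can be sketched without Excel or Minitab. The sketch below is illustrative only: the eight fund records are made up for the example and are not the RetirementFunds data, and the function name tally_percentages is hypothetical. Each cell is reported as a percentage of the overall total, mirroring the layout described for Figure 2.16.

```python
from collections import Counter

# Hypothetical (type, market cap, risk) records; NOT the RetirementFunds sample.
funds = [
    ("Growth", "Large", "Low"),
    ("Growth", "Large", "Low"),
    ("Growth", "Small", "Average"),
    ("Growth", "Mid-Cap", "High"),
    ("Value", "Large", "Low"),
    ("Value", "Small", "Average"),
    ("Value", "Small", "Low"),
    ("Value", "Mid-Cap", "Low"),
]


def tally_percentages(records):
    """Tally each combination of categories as a percentage of the overall total."""
    counts = Counter(records)
    total = len(records)
    return {combo: 100.0 * count / total for combo, count in counts.items()}


table = tally_percentages(funds)
for (fund_type, cap, risk), pct in sorted(table.items()):
    print(f"{fund_type:<8}{cap:<9}{risk:<9}{pct:5.1f}%")
```

Adding a fourth variable, such as the star rating, only means adding a fourth element to each record; the number of cells grows quickly, which is the overload that Section 2.7 cautions against.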
2.7  Challenges in Organizing and Visualizing Variables

As noted throughout this chapter, organizing and visualizing variables can provide useful summaries that can jumpstart the analysis of the variables. However, you must be mindful of the limits of the information technology being used to collect, store, and analyze data as well as the limits of others to be able to perceive and comprehend your results. Many people make the mistake of being overly worried about the former limits (over which, in a typical business environment, they have no control) and forgetting or being naïve about the presentation issues that are often much more critical.

You can sometimes easily create summaries and visualizations that obscure the data or create false impressions of the data that lead to misleading or unproductive analysis. The challenge in organizing and visualizing variables is to avoid these complications.

Obscuring Data

Management specialists have long known that information overload, presenting too many details, can obscure data and hamper decision making (see reference 2). Figure 2.17 presents an expanded version of the multidimensional contingency table shown in Figure 2.16 on page 97. This table, broken up into two parts by Minitab, illustrates that too many variables, as well as data that are poorly formatted and presented, can obscure the data.

Figure 2.17 Expanded multidimensional contingency table for the retirement funds sample showing percentage of overall total for fund type, market cap, risk, and star rating

Even though Figure 2.17 uses an example constructed by Minitab, the principle being illustrated holds for Excel as well. The equivalent Excel PivotTable is even more obscuring than the Figure 2.17 table!
Visualizations can also be subject to information overload. Figure 2.18 presents a side-by-side bar chart that is based on the obscured data of Figure 2.17 and is typical of charts that sometimes get constructed when using large or complex sets of data, including the “big data” discussed in Chapter 17. As a bar chart, this visualization can highlight certain characteristics of the sample data, consistent with the discussion earlier in the chapter. (For example, when you examine Figure 2.18, you can notice more quickly than you would when examining Figure 2.17 that there are more large-cap retirement funds with low risk and a three-star rating than any other combination of risk and star rating.) However, other details are less obvious, and an overly complex legend poses its own problems even for people who do not suffer from color perception problems.

Date posted: 06/02/2018, 14:44
