Chapter: Describing Categorical Data
Copyright © 2011 Pearson Education, Inc

3.1 Looking At Data

Which hosts send the most visitors to Amazon’s Web site?
- The data set consists of 188,996 visits.
- Host is a categorical variable.
- To answer this question, we must describe the variation in Host.

Frequency and Relative Frequency Tables
- The distribution of a categorical variable is the list of its values together with the count (frequency) of each.
- A frequency table summarizes the distribution of a categorical variable.
- A relative frequency table shows the proportion (or percentage) in each category.

3.2 Charts of Categorical Data

Bar Charts and Pie Charts
- Unless you need to know exact counts, charts are better than tables for summarizing more than five categories.
- The two most common displays of a categorical variable are the bar chart and the pie chart.

The Bar Chart
- Uses horizontal or vertical bars to show the distribution of a categorical variable.
- Is called a Pareto chart when the categories are sorted by frequency (popular in quality control).
- Becomes cluttered with too many categories.
- Is appropriate for ordinal categorical variables.

[Figure: Bar Chart (Horizontal) of Top 10 Hosts]
[Figure: Bar Chart (Vertical) of Top 10 Hosts]

The Pie Chart
- Uses wedges of a circle to show the distribution of a categorical variable.
- Is commonly chosen to illustrate market shares or sources of revenue for a company.
- Is less useful than a bar chart for comparing actual counts (bars are easier to compare than the angles of wedges).
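The frequency table, relative frequency table, and Pareto ordering described in Sections 3.1 and 3.2 can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_table` and the host labels are hypothetical, and the toy list stands in for real data (it is not the Amazon visit data from the slides).

```python
from collections import Counter


def frequency_table(values):
    """Return (category, count, relative frequency) rows, largest count first."""
    counts = Counter(values)
    total = sum(counts.values())
    # most_common() sorts by descending count, which is the Pareto ordering.
    return [(cat, n, n / total) for cat, n in counts.most_common()]


# Hypothetical toy data, made up for illustration only.
visits = ["host_a", "host_b", "host_a", "host_c", "host_a", "host_b"]

table = frequency_table(visits)
for category, count, share in table:
    # Print the category, its count, and its share as a percentage.
    print(f"{category:<8}{count:>4}{share:>8.1%}")
```

The count column is the frequency table, the share column is the relative frequency table, and the row order is the one a Pareto chart would use for its bars.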
3.3 The Area Principle

[Figure: The Same Example Respecting the Area Principle]

4M Example 3.1: ROLLING OVER

Motivation
Are certain types of vehicles more prone to rollover accidents than others?

Method
- Data were gathered from the Fatality Analysis Reporting System (FARS) for rollover accidents on interstate highways.
- The cases that make up the rows are accidents resulting in rollovers in 2000.
- The column of interest is the model of the car involved.

Mechanics
[Content not recovered]

Message
Ford Broncos were involved in more than twice as many rollover accidents as the next-closest model.

4M Example 3.2: CHIP SALES

Motivation
- Infineon pled guilty to price fixing for DRAMs in September 2004.
- Did Infineon gain a larger share of the market for chips during this period?
Method
[Content not recovered]

Mechanics
[Content not recovered]

Message
- Infineon and Samsung increased their shares from 1999 to 2002.
- The increase appears to have come at the expense of smaller companies.

3.4 Mode and Median

Mode
- The category with the highest frequency.
- The longest bar in a bar chart.
- The widest slice in a pie chart.
- Two or more categories can tie for the highest frequency (bimodal or multimodal).

Median
- Not appropriate for nominal data; the data must be ordinal.
- It is the category label of the middle observation in ordered data.

Best Practices
- Use a bar chart to show the frequencies of a categorical variable.
- Use a pie chart to show the proportions of a categorical variable.
- Preserve the ordering of an ordinal variable.
- Respect the area principle.
- Show the best plots to answer the motivating question.
- Label your chart to show the categories and indicate whether some have been combined or omitted.

Pitfalls
- Avoid elaborate plots that may be deceptive.
- Do not show too many categories.
- Do not put ordinal data in a pie chart.
- Do not carelessly round data.
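The mode and median from Section 3.4 can be computed directly from their definitions. The sketch below is a minimal illustration, not the textbook's procedure: the function names `modes` and `ordinal_median`, the rating scale, and the toy data are all hypothetical. For an even number of observations it returns the lower of the two middle observations, which is a convention assumed here because the slides do not specify one.

```python
from collections import Counter


def modes(values):
    """Return every category tied for the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [cat for cat, n in counts.items() if n == top]


def ordinal_median(values, order):
    """Return the category of the middle observation in ordered data.

    `order` lists the categories from lowest to highest, so it only makes
    sense for ordinal data. With an even number of observations, the lower
    of the two middle observations is returned (an assumed convention).
    """
    rank = {cat: i for i, cat in enumerate(order)}
    ranked = sorted(values, key=lambda cat: rank[cat])
    return ranked[(len(ranked) - 1) // 2]


# Hypothetical ordinal data, made up for illustration only.
scale = ["low", "medium", "high"]
ratings = ["low", "high", "medium", "medium", "low", "high", "low"]

print(modes(ratings))                  # the single most frequent category
print(ordinal_median(ratings, scale))  # category of the middle observation
print(modes(["a", "b", "a", "b", "c"]))  # a two-way tie (bimodal)
```

Returning a list from `modes` keeps ties visible, which matches the note that a variable can be bimodal or multimodal.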