Chapter: Describing Numerical Data
Copyright © 2011 Pearson Education, Inc.

4.1 Summaries of Numerical Variables

Can 500 different songs fit on the iPod Shuffle?
To answer this question we must understand the typical length of a song and the variation of song sizes around the typical length.
We can do this using summary statistics.

A Subset of the Data

The Median
Value in the middle of a sorted list of numerical values (a typical value).
Half of the values fall below the median; half fall above.
It is the 50th percentile.

Common Percentiles
Lower quartile = 25th percentile
Upper quartile = 75th percentile
One quarter of the values fall below the lower quartile and one quarter fall above the upper quartile.

The Interquartile Range (IQR)
IQR = 75th percentile - 25th percentile
A measure of variation based on quartiles.
Used to accompany the median.

The Range
Range = maximum - minimum
Maximum value = 100th percentile
Minimum value = 0th percentile
Another measure of variation; not preferred because it is based on extreme values.

The Five Number Summary
Minimum
Lower quartile
Median
Upper quartile
Maximum

The Five Number Summary for Song Sizes
Minimum = 0.148 MB
Lower quartile = 2.85 MB
Median = 3.5015 MB
Upper quartile = 4.32 MB
Maximum = 21.622 MB

4.4 Shape of a Distribution

Modes
A mode is the position of an isolated peak in a histogram.
A histogram with one peak is unimodal; with two peaks, bimodal; with three or more, multimodal.
A histogram with all bars about the same height is uniform.

Symmetry and Skewness
A distribution is symmetric if the two sides of its histogram are mirror images.
A distribution is skewed if one tail of the histogram stretches out farther than the other.

Distribution of Song Sizes
The mode lies between ... and ... MB.
The distribution is right skewed (the right tail stretches out farther than the left tail).

4M Example 4.2: EXECUTIVE COMPENSATION

Motivation
What can we say about the salaries of CEOs in 2003?

Method
The data consist of the salaries of 1,501 CEOs, reported in thousands of dollars (obtained from Compustat).

Mechanics

Message
The distribution of annual salaries of CEOs in 2003 is unimodal, nearly symmetric around the median of $650,000, and slightly right skewed.
The average is $697,000.
The largest salary is $4,000,000.

4.4 Shape of a Distribution

Bell-Shaped Distributions and Empirical Rule
A bell-shaped distribution is symmetric and unimodal.
The empirical rule uses the standard deviation to describe how data with a bell-shaped distribution cluster around the mean.

The Empirical Rule

Standardizing
Standardizing converts data to z-scores.
Z-scores measure the distance from the mean in standard deviations:
$z = \dfrac{y - \bar{y}}{s}$

4.5 Epilog

Can 500 different songs fit on the iPod Shuffle?
Because of variation, not every collection of 500 songs will fit.
The longest 500 songs won't fit.
However, based on the typical song size, the amount of variation in song sizes, and the shape of the distribution, we can say that most collections of 500 songs will fit!

Best Practices
Be sure that data are numerical when using histograms and summaries such as the mean and standard deviation.
Summarize the distribution of a numerical variable with a graph.
Choose interval widths appropriate to the data when preparing a histogram.

Best Practices (Continued)
Scale your plots to show data, not empty space.
Anticipate what you will see in a histogram.
Label clearly.
Check for gaps.

Pitfalls
Do not use the methods of this chapter for categorical variables.
Do not assume that all numerical data have a bell-shaped distribution.
Do not ignore the presence of outliers.

Pitfalls (Continued)
Do not remove outliers unless you have a good reason.
Do not forget to take the square root of a variance to obtain the standard deviation.

... M&M's
Mechanics
Mean weight = 0.86 gm
SD = 0.04 gm
Cv = 0.04 gm / 0.86 gm = 0.0465

4M Example 4.1: MAKING M&M's
Message
Since the SD is quite small...

... Numerical Variables
Summary Statistics for Song Sizes
Mean = 3.7794 MB
Variance = 2.584 MB²
SD = 1.607 MB

4M Example 4.1: MAKING M&M's
Motivation...
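The summaries defined in this chapter (five number summary, IQR, range, mean, variance, SD, coefficient of variation, and z-scores) can be sketched in a few lines of Python. This is a minimal illustration, not the chapter's own computation: the list `song_sizes_mb` is made-up data, the function names are hypothetical, the quartiles use the "inclusive" interpolation rule of Python's `statistics.quantiles` (textbooks and software differ on quartile conventions), and the variance is assumed to be the sample variance that divides by n - 1.

```python
import math
import statistics


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    ordered = sorted(values)
    # Assumption: "inclusive" quartile convention; other conventions give
    # slightly different quartiles on small data sets.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


def summarize(values):
    """Collect the chapter's summary statistics in one dictionary."""
    minimum, q1, median, q3, maximum = five_number_summary(values)
    mean = statistics.mean(values)
    variance = statistics.variance(values)  # assumption: divides by n - 1
    sd = math.sqrt(variance)                # SD is the square root of the variance
    return {
        "min": minimum, "q1": q1, "median": median, "q3": q3, "max": maximum,
        "iqr": q3 - q1,              # IQR = 75th percentile - 25th percentile
        "range": maximum - minimum,  # range = maximum - minimum
        "mean": mean, "variance": variance, "sd": sd,
        "cv": sd / mean,             # coefficient of variation = SD / mean
    }


def z_scores(values):
    """Standardize: z = (y - mean) / SD for each value y."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(y - mean) / sd for y in values]


# Hypothetical song sizes in MB (illustrative only, not the chapter's data).
song_sizes_mb = [2.0, 3.0, 3.5, 4.0, 4.5, 5.0, 6.0]
summary = summarize(song_sizes_mb)
standardized = z_scores(song_sizes_mb)

# Arithmetic checks on figures quoted in the slides.
song_sd_from_variance = math.sqrt(2.584)  # variance 2.584 MB^2 -> SD in MB
mm_cv = 0.04 / 0.86                       # M&M's: SD / mean weight
```

On the slide figures, `song_sd_from_variance` rounds to 1.607 and `mm_cv` rounds to 0.0465, matching the values quoted above. Z-scores always have mean 0 and, with the sample SD, standard deviation 1, which is a quick sanity check on any standardization.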