Statistics for Business and Economics, 7th Edition

Chapter 2 Describing Data: Numerical

Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall

Chapter Goals

After completing this chapter, you should be able to:
Compute and interpret the mean, median, and mode for a set of data
Find the range, variance, standard deviation, and coefficient of variation and know what these values mean
Apply the empirical rule to describe the variation of population values around the mean
Explain the weighted mean and when to use it
Explain how a least squares regression line estimates a linear relationship between two variables

Chapter Topics

Measures of central tendency, variation, and shape
    Mean, median, mode, geometric mean
    Quartiles
    Range, interquartile range, variance and standard deviation, coefficient of variation
    Symmetric and skewed distributions
Population summary measures
    Mean, variance, and standard deviation
    The empirical rule and Bienaymé-Chebyshev rule
Five number summary and box-and-whisker plots
Covariance and coefficient of correlation
Pitfalls in numerical descriptive measures and ethical considerations

Describing Data Numerically

Central tendency: arithmetic mean, median, mode
Variation: range, interquartile range, variance, standard deviation, coefficient of variation

2.1 Measures of Central Tendency

Overview

Mean: the arithmetic average, $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Median: the midpoint of ranked values
Mode: the most frequently observed value

Arithmetic Mean

The arithmetic mean (mean) is the most common measure of central tendency.

For a
population of N values:

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$$

where the $x_i$ are the population values and $N$ is the population size.

For a sample of size n:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

where the $x_i$ are the observed values and $n$ is the sample size.

Arithmetic Mean (continued)

Mean = sum of values divided by the number of values
Affected by extreme values (outliers)

[Figure: two example data sets of five values each; the first has mean 15/5 = 3, the second has mean 20/5 = 4]

Median

In an ordered list, the median is the "middle" number (50% above, 50% below)
Not affected by extreme values

[Figure: two example data sets with the median marked]

Finding the Median

The location of the median:

$$\text{Median position} = \frac{n + 1}{2} \text{ position in the ordered data}$$

If the number of values is odd, the median is the middle number
If the number of values is even, the median is the average of the two middle numbers
Note that $\frac{n + 1}{2}$ is not the value of the median, only the position of the median in the ranked data

Chebyshev's Theorem (continued)

Regardless of how the data are distributed, at least $(1 - 1/k^2)$ of the values will fall within k standard deviations of the mean (for k > 1)

Examples:

At least                   Within
(1 - 1/1.5^2) = 55.6%      k = 1.5 (μ ± 1.5σ)
(1 - 1/2^2) = 75%          k = 2 (μ ± 2σ)
(1 - 1/3^2) = 89%          k = 3 (μ ± 3σ)

The Empirical Rule

If the data distribution is bell-shaped, then the interval:
μ ± 1σ contains about 68% of the values in the population or the sample
μ ± 2σ contains about 95% of the values in the population or the sample
μ ± 3σ contains almost all (about 99.7%) of the values in the population or the sample

[Figure: bell-shaped curves with the intervals μ ± 1σ (68%), μ ± 2σ (95%), and μ ± 3σ (99.7%) marked]
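The Chebyshev bound and the observed share of values around the mean can be compared numerically. The following is a minimal Python sketch, not part of the original slides: the function names `chebyshev_lower_bound` and `fraction_within` and the small data set are hypothetical, chosen only for illustration, and the population standard deviation is assumed.

```python
import statistics


def chebyshev_lower_bound(k: float) -> float:
    """Minimum fraction of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2


def fraction_within(values, k: float) -> float:
    """Observed fraction of values inside mean ± k population standard deviations."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    inside = [v for v in values if abs(v - mu) <= k * sigma]
    return len(inside) / len(values)


# Hypothetical data, made up for this example only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

for k in (1.5, 2, 3):
    bound = chebyshev_lower_bound(k)
    observed = fraction_within(data, k)
    print(f"k={k}: at least {bound:.1%} guaranteed, {observed:.1%} observed")
```

For k = 1.5, 2, and 3 the bound evaluates to the 55.6%, 75%, and 89% figures quoted for Chebyshev's theorem. The observed fraction for any data set should never fall below the bound, while the 68%, 95%, and 99.7% figures of the empirical rule apply only approximately and only to bell-shaped data.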
2.3 Weighted Mean

The weighted mean of a set of data is

$$\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{\sum_{i=1}^{n} w_i}$$

where $w_i$ is the weight of the ith observation.

Use when data are already grouped into n classes, with $w_i$ values in the ith class; the denominator $\sum w_i$ is then the total number of values.

Approximations for Grouped Data

Suppose data are grouped into K classes, with frequencies $f_1, f_2, \ldots, f_K$, and the midpoints of the classes are $m_1, m_2, \ldots, m_K$.

For a sample of n observations, the mean is

$$\bar{x} = \frac{\sum_{i=1}^{K} f_i m_i}{n} \quad \text{where} \quad n = \sum_{i=1}^{K} f_i$$

For a sample of n observations, the variance is

$$s^2 = \frac{\sum_{i=1}^{K} f_i (m_i - \bar{x})^2}{n - 1}$$

2.4 The Sample Covariance

The covariance measures the strength of the linear relationship between two variables

The population covariance:

$$\text{Cov}(x, y) = \sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$$

The sample covariance:

$$\text{Cov}(x, y) = s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

Only concerned with the strength of the relationship
No causal effect is implied

Interpreting Covariance

Covariance between two variables:
Cov(x,y) > 0: x and y tend to move in the same direction
Cov(x,y) < 0: x and y tend to move in opposite directions
Cov(x,y) = 0: x and y have no linear relationship (zero covariance alone does not imply that x and y are independent)

Coefficient of Correlation

Measures the relative strength of the linear relationship between two variables

Population correlation coefficient:

$$\rho = \frac{\text{Cov}(x, y)}{\sigma_X \sigma_Y}$$

Sample correlation coefficient:

$$r = \frac{\text{Cov}(x, y)}{s_X s_Y}$$

Features of Correlation Coefficient, r

Unit free
Ranges between -1 and 1
The closer to -1,
the stronger the negative linear relationship
The closer to 1, the stronger the positive linear relationship
The closer to 0, the weaker the linear relationship

Scatter Plots of Data with Various Correlation Coefficients

[Figure: six scatter plots of Y against X, with panels labeled r = -1, r = -.6, r = 0, r = +1, r = +.3, and r = 0]

Using Excel to Find the Correlation Coefficient

Select Data / Data Analysis
Choose Correlation from the selection menu
Click OK
Input data range and select appropriate options
Click OK to get output

Interpreting the Result

r = .733
There is a relatively strong positive linear relationship between test score #1 and test score #2
Students who scored high on the first test tended to score high on the second test

Chapter Summary

Described measures of central tendency: mean, median, mode
Illustrated the shape of the distribution: symmetric, skewed
Described measures of variation: range, interquartile range, variance and standard deviation, coefficient of variation
Discussed measures of grouped data
Calculated measures of relationships between variables: covariance and correlation coefficient
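The weighted mean and the grouped-data approximations of Section 2.3 can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the class midpoints and frequencies are hypothetical, and the variance is assumed to use the usual sample divisor n - 1.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    total_weight = sum(weights)
    return sum(w * x for w, x in zip(weights, values)) / total_weight


def grouped_mean(midpoints, freqs):
    """Approximate sample mean from class midpoints m_i and frequencies f_i."""
    n = sum(freqs)  # n is the total number of observations
    return sum(f * m for f, m in zip(freqs, midpoints)) / n


def grouped_variance(midpoints, freqs):
    """Approximate sample variance from grouped data (divisor n - 1 assumed)."""
    n = sum(freqs)
    xbar = grouped_mean(midpoints, freqs)
    return sum(f * (m - xbar) ** 2 for f, m in zip(freqs, midpoints)) / (n - 1)


# Hypothetical classes 0-10, 10-20, 20-30 with made-up frequencies.
midpoints = [5, 15, 25]
freqs = [2, 5, 3]

print(grouped_mean(midpoints, freqs))
print(grouped_variance(midpoints, freqs))
print(weighted_mean(midpoints, freqs))
```

With these made-up classes the grouped mean is 160/10 = 16, and `weighted_mean` returns the same value because the class frequencies play the role of the weights.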
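As a companion to Section 2.4, the sample covariance and correlation coefficient can be computed directly from their definitions. The sketch below is a hedged illustration rather than the Excel procedure described in the slides: the function names and both data sets are hypothetical, and the sample formulas are assumed to divide by n - 1.

```python
import math


def sample_covariance(x, y):
    """Sample covariance s_xy with divisor n - 1 (assumed convention)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    xbar = sum(x) / n
    ybar = sum(y) / n
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)


def sample_std(x):
    """Sample standard deviation, the square root of Cov(x, x)."""
    return math.sqrt(sample_covariance(x, x))


def sample_correlation(x, y):
    """r = Cov(x, y) / (s_X * s_Y); undefined if either variable is constant."""
    return sample_covariance(x, y) / (sample_std(x) * sample_std(y))


# Hypothetical test scores, made up for this example only.
test1 = [1, 2, 3, 4, 5]
test2 = [2, 4, 5, 4, 5]
print(sample_covariance(test1, test2), sample_correlation(test1, test2))

# y is completely determined by x, yet the covariance is zero.
x_sym = [-2, -1, 0, 1, 2]
y_sq = [v * v for v in x_sym]
print(sample_covariance(x_sym, y_sq))
```

The second data set shows why a covariance of zero should be read as "no linear relationship" rather than as independence: y depends entirely on x, but the relationship is not linear.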