Table 6-1 Methods for Finding Normal Distribution Areas: Table A-2, STATDISK, Minitab, and Excel give the cumulative area from the left up to a vertical line above a specific value of z.
Normal Probability Distributions
6-2 The Standard Normal
Ergonomics involves the study of people fitting into their environments. Ergonomics is used in a wide variety of applications such as these: Design a doorway so that most people can walk through it without bending or hitting their head; design a car so that the dashboard is within easy reach of most drivers; design a screw bottle top so that most people have sufficient grip strength to open it; design a manhole cover so that most workers can fit through it. Good ergonomic design results in an environment that is safe, functional, efficient, and comfortable. Bad ergonomic design can result in uncomfortable, unsafe, or possibly fatal conditions. For example, the following real situations illustrate the difficulty in determining safe loads in aircraft and boats.
• “We have an emergency for Air Midwest fifty-four eighty,” said pilot Katie Leslie, just before her plane crashed in Charlotte, North Carolina. The crash of the Beech plane killed all of the 21 people on board. In the subsequent investigation, the weight of the passengers was suspected as a factor that contributed to the crash. This prompted the Federal Aviation Administration to order airlines to collect weight information from randomly selected flights, so that the old assumptions about passenger weights could be updated.
• Twenty passengers were killed when the Ethan Allen tour boat capsized on New York’s Lake George. Based on an assumed mean weight of 140 lb, the boat was certified to carry 50 people. A subsequent investigation showed that most of the passengers weighed more than 200 lb, and the boat should have been certified for a much smaller number of passengers.
• A water taxi sank in Baltimore’s Inner Harbor. Among the 25 people on board, 5 died and 16 were injured. An investigation revealed that the safe passenger load for the water taxi was 3500 lb. Assuming a mean passenger weight of 140 lb, the boat was allowed to carry 25 passengers, but the mean of 140 lb was determined 44 years ago when people were not as heavy as they are today. (The mean weight of the 25 passengers aboard the boat that sank was found to be 168 lb.) The National Transportation Safety Board suggested that the old estimated mean of 140 lb be updated to 174 lb, so the safe load of 3500 lb would now allow only 20 passengers instead of 25.

This chapter introduces the statistical tools that are basic to good ergonomic design. After completing this chapter, we will be able to solve problems in a wide variety of different disciplines, including ergonomics.

How do we design airplanes, boats, cars, and homes for safety and comfort?
Review and Preview
In Chapter 2 we considered the distribution of data, and in Chapter 3 we considered some important measures of data sets, including measures of center and variation. In Chapter 4 we discussed basic principles of probability, and in Chapter 5 we presented the concept of a probability distribution. In Chapter 5 we considered only discrete probability distributions, but in this chapter we present continuous probability distributions. To illustrate the correspondence between area and probability, we begin with a uniform distribution, but most of this chapter focuses on normal distributions. Normal distributions occur often in real applications, and they play an important role in methods of inferential statistics. In this chapter we present concepts of normal distributions that will be used often in the remaining chapters of this text. Several of the statistical methods discussed in later chapters are based on concepts related to the central limit theorem, discussed in Section 6-5. Many other sections require normally distributed populations, and Section 6-7 presents methods for analyzing sample data to determine whether or not the sample appears to be from such a normally distributed population.
Formula 6-1 has one variable y on the left side and one variable x on the right side. The letters π and e represent the constant values of 3.14159... and 2.71828..., respectively. The symbols μ and σ represent fixed values for the mean and standard deviation, respectively. Once specific values are selected for μ and σ, we can graph Formula 6-1 as we would graph any equation relating x and y; the result is a continuous probability distribution with the same bell shape shown in Figure 6-1. From Formula 6-1 we see that a normal distribution is determined by the fixed values of the mean μ and standard deviation σ. And that’s all we need to know about Formula 6-1!
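As a rough illustration of how fixed values of the mean and standard deviation determine the curve, the following Python sketch assumes that Formula 6-1 is the usual normal density, y = e^(-(1/2)((x - μ)/σ)^2) / (σ√(2π)); the formula itself is not reproduced in the passage above, so treat that form as an assumption. The function names are hypothetical and chosen only for this example.

```python
import math

def normal_density(x, mu=0.0, sigma=1.0):
    """Height y of the bell-shaped curve at x (assumed form of Formula 6-1)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def area_under_curve(mu, sigma, lo, hi, steps=100_000):
    """Approximate the area between lo and hi with the midpoint rule."""
    width = (hi - lo) / steps
    heights = (normal_density(lo + (i + 0.5) * width, mu, sigma) for i in range(steps))
    return sum(heights) * width

# Choose mu = 0 and sigma = 1, then evaluate the curve at its center
# and approximate the total area over a wide interval.
peak = normal_density(0.0)
total_area = area_under_curve(0.0, 1.0, -8.0, 8.0)
print(f"height at the mean: {peak:.4f}")
print(f"approximate total area: {total_area:.6f}")
```

Changing mu shifts the curve left or right and changing sigma makes it wider or narrower, but the approximate total area stays very close to 1.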
6-2 The Standard Normal Distribution
Key Concept In this section we present the standard normal distribution, which has these three properties:
1. Its graph is bell-shaped (as in Figure 6-1).
2. Its mean is equal to 0 (that is, μ = 0).
3. Its standard deviation is equal to 1 (that is, σ = 1).
In this section we develop the skill to find areas (or probabilities or relative frequencies) corresponding to various regions under the graph of the standard normal distribution. In addition, we find z scores that correspond to areas under the graph.

Uniform Distributions
The focus of this chapter is the concept of a normal probability distribution, but we begin with a uniform distribution. The uniform distribution allows us to see two very important properties:
1. The area under the graph of a probability distribution is equal to 1.
2. There is a correspondence between area and probability (or relative frequency), so some probabilities can be found by identifying the corresponding areas.
Chapter 5 considered only discrete probability distributions, but we now consider continuous probability distributions, beginning with the uniform distribution.
A continuous random variable has a uniform distribution if its values are spread evenly over the range of possibilities. The graph of a uniform distribution results in a rectangular shape.
Example 1 Home Power Supply The Newport Power and Light Company provides electricity with voltage levels that are uniformly distributed between 123.0 volts and 125.0 volts. That is, any voltage amount between 123.0 volts and 125.0 volts is possible, and all of the possible values are equally likely. If we randomly select one of the voltage levels and represent its value by the random variable x, then x has a distribution that can be graphed as in Figure 6-2.
The Placebo Effect
It has long been believed that placebos actually help some patients. In fact, some formal studies have shown that when given a placebo (a treatment with no medicinal value), many test subjects show some improvement. Estimates of improvement rates have typically ranged between one-third and two-thirds of the patients. However, a more recent study suggests that placebos have no real effect. An article in the New England Journal of Medicine (Vol. 334, No. 21) was based on research of 114 medical studies over 50 years. The authors of the article concluded that placebos appear to have some effect only for relieving pain, but not for other physical conditions. They concluded that apart from clinical trials, the use of placebos
Example 2 Voltage Level Given the uniform distribution illustrated in Figure 6-2, find the probability that a randomly selected voltage level is greater than 124.5 volts.

Figure 6-3 Using Area to Find Probability

The shaded area in Figure 6-3 represents voltage levels that are greater than 124.5 volts. Because the total area under the density curve is equal to 1, there is a correspondence between area and probability. We can find the desired probability by using areas as follows:

P(voltage greater than 124.5 volts) = area of shaded region in Figure 6-3
= 0.5 × 0.5
= 0.25

The probability of randomly selecting a voltage level greater than 124.5 volts is 0.25.
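The width-times-height calculation for a uniform distribution can be sketched in a few lines of Python. This is only an illustration: the function name uniform_probability is hypothetical, and the endpoints 123.0 and 125.0 are the voltage limits from the power supply example.

```python
def uniform_probability(low, high, a, b):
    """P(a < x < b) for a uniform distribution on [low, high], as width * height."""
    height = 1.0 / (high - low)        # height that makes the total area equal to 1
    left = max(a, low)                 # clip the interval to the possible values
    right = min(b, high)
    width = max(0.0, right - left)     # an interval outside the range has no area
    return width * height

# Probability that a voltage level is greater than 124.5 volts
p_above = uniform_probability(123.0, 125.0, 124.5, 125.0)
print(p_above)  # 0.25
```

The same function applied to the whole range, uniform_probability(123.0, 125.0, 123.0, 125.0), returns 1, which is the total area under the density curve.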
The graph of a continuous probability distribution, such as in Figure 6-2, is called a density curve. A density curve must satisfy the following two requirements.

Requirements for a Density Curve
1. The total area under the curve must equal 1.
2. Every point on the curve must have a vertical height that is 0 or greater. (That is, the curve cannot fall below the x-axis.)

By setting the height of the rectangle in Figure 6-2 to be 0.5, we force the enclosed area to be 2 × 0.5 = 1, as required. (In general, the area of the rectangle becomes 1 when we make its height equal to the value of 1/range.) The requirement that the area must equal 1 makes solving probability problems simple, so the following statement is important:

Because the total area under the density curve is equal to 1, there is a correspondence between area and probability.
Standard Normal Distribution
The density curve of a uniform distribution is a horizontal line, so we can find the area of any rectangular region by applying this formula: Area = width × height. Because the density curve of a normal distribution has a complicated bell shape as shown in Figure 6-1, it is more difficult to find areas. However, the basic principle is the same: There is a correspondence between area and probability. In Figure 6-4 we show that for a standard normal distribution, the area under the density curve is equal to 1.

The standard normal distribution is a normal probability distribution with μ = 0 and σ = 1. The total area under its density curve is equal to 1. (See Figure 6-4.)
It is not easy to find areas in Figure 6-4, so mathematicians have calculated many different areas under the curve, and those areas are included in Table A-2 in Appendix A.

Figure 6-4 Standard Normal Distribution: Bell-Shaped Curve with μ = 0 and σ = 1
Finding Probabilities When Given z Scores
Using Table A-2 (in Appendix A and the Formulas and Tables insert card), we can find areas (or probabilities) for many different regions. Such areas can also be found using a TI-83/84 Plus calculator, or computer software such as STATDISK, Minitab, or Excel. The key features of the different methods are summarized in Table 6-1 on the next page. Because calculators or computer software generally give more accurate results than Table A-2, we strongly recommend using technology. (When there are discrepancies, answers in Appendix D will generally include results based on Table A-2 as well as answers based on technology.)
If using Table A-2, it is essential to understand these points:
1. Table A-2 is designed only for the standard normal distribution, which has a mean of 0 and a standard deviation of 1.
2. Table A-2 is on two pages, with one page for negative z scores and the other page for positive z scores.
3. Each value in the body of the table is a cumulative area from the left up to a vertical boundary above a specific z score.
4. When working with a graph, avoid confusion between z scores and areas.
z score: Distance along the horizontal scale of the standard normal distribution; refer to the leftmost column and top row of Table A-2.
Area: Region under the curve; refer to the values in the body of Table A-2.
5. The part of the z score denoting hundredths is found across the top row of Table A-2.
Table 6-1 Methods for Finding Normal Distribution Areas

Table A-2, STATDISK, Minitab, Excel: Gives the cumulative area from the left up to a vertical line above a specific value of z.
  Table A-2: The procedure for using Table A-2 is described in the text.
  STATDISK: Select Analysis, Probability Distributions, Normal Distribution. Enter the z value, then click on Evaluate.
  Minitab: Select Calc, Probability Distributions, Normal. In the dialog box, select Cumulative Probability, Input Constant.
  Excel: Select fx, Statistical, NORMDIST. In the dialog box, enter the value and mean, the standard deviation, and “true.”

TI-83/84 Plus calculator (area between a Lower and an Upper z score):
  TI-83/84: Press 2nd VARS, select [2: normalcdf(], then enter the two z scores separated by a comma, as in (left z score, right z score).
CAUTION
When working with a normal distribution, avoid confusion between z scores and areas.

The following example requires that we find the probability associated with a z score less than 1.27. Begin with the z score of 1.27 by locating 1.2 in the left column; next find the value in the adjoining row of probabilities that is directly below 0.07, as shown in the following excerpt from Table A-2.

TABLE A-2 (continued) Cumulative Area from the LEFT
Example 3 Scientific Thermometers The Precision Scientific Instrument Company manufactures thermometers that are supposed to give readings of 0°C at the freezing point of water. Tests on a large sample of these instruments reveal that at the freezing point of water, some thermometers give readings below 0° (denoted by negative numbers) and some give readings above 0° (denoted by positive numbers). Assume that the mean reading is 0°C and the standard deviation of the readings is 1.00°C. Also assume that the readings are normally distributed. If one thermometer is randomly selected, find the probability that, at the freezing point of water, the reading is less than 1.27°.
Figure 6-5 Finding the Area Below z = 1.27 (shaded area = 0.8980, from Table A-2)

The probability distribution of readings is a standard normal distribution, because the readings are normally distributed with μ = 0 and σ = 1. We need to find the area in Figure 6-5 below z = 1.27. The area below z = 1.27 is equal to the probability of randomly selecting a thermometer with a reading less than 1.27°. From Table A-2 we find that this area is 0.8980.

The probability of randomly selecting a thermometer with a reading less than 1.27° (at the freezing point of water) is equal to the area of 0.8980 shown as the shaded region in Figure 6-5. Another way to interpret this result is to conclude that 89.80% of the thermometers will have readings below 1.27°.

The area (or probability) value of 0.8980 indicates that there is a probability of 0.8980 of randomly selecting a z score less than 1.27. (The following sections will consider cases in which the mean is not 0 or the standard deviation is not 1.)
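For readers who prefer software to Table A-2, the same cumulative area from the left can be sketched with Python's standard library. This is one possible approach, not one of the technologies described in the text; it relies on statistics.NormalDist, which is available in Python 3.8 and later.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Cumulative area from the left up to z = 1.27
area_below = standard_normal.cdf(1.27)
print(f"{area_below:.4f}")
```

Rounded to four decimal places, the result agrees with the Table A-2 value of 0.8980.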
Example 4 Scientific Thermometers Using the thermometers from Example 3, find the probability of randomly selecting one thermometer that reads (at the freezing point of water) above -1.23°.

Figure 6-6 Finding the Area Above z = -1.23

We again find the desired probability by finding a corresponding area. We are looking for the area of the region that is shaded in Figure 6-6, but Table A-2 is designed to apply only to cumulative areas from the left. Referring to Table A-2 for the page with negative z scores, we find that the cumulative area from the left up to z = -1.23 is 0.1093, as shown. Because the total area under the curve is 1, we can find the shaded area by subtracting 0.1093 from 1. The result is 0.8907. Even though Table A-2 is designed only for cumulative areas from the left, we can use it to find cumulative areas from the right, as shown in Figure 6-6.

Because of the correspondence between probability and area, we conclude that the probability of randomly selecting a thermometer with a reading above -1.23° at the freezing point of water is 0.8907 (which is the area to the right of z = -1.23). In other words, 89.07% of the thermometers have readings above -1.23°.
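The subtraction from 1 used to find the area to the right of z = -1.23 can be checked with a short Python sketch. As before, statistics.NormalDist is simply one convenient tool, and the helper name area_to_right is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def area_to_right(z):
    """Area to the right of z: total area of 1 minus the cumulative area from the left."""
    return 1.0 - standard_normal.cdf(z)

area_left = standard_normal.cdf(-1.23)   # cumulative area from the left
area_above = area_to_right(-1.23)        # shaded area to the right
print(f"{area_left:.4f} {area_above:.4f}")
```

Software carries more decimal places than Table A-2, so the last digit can differ slightly from a result built from rounded table entries.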
Example 4 illustrates a way that Table A-2 can be used indirectly to find a cumulative area from the right. The following example illustrates another way that we can find an area indirectly by using Table A-2.

Example 5 Scientific Thermometers Make a random selection from the same sample of thermometers from Example 3. Find the probability that the chosen thermometer reads (at the freezing point of water) between -2.00° and 1.50°.

We are again dealing with normally distributed values having a mean of 0 and a standard deviation of 1. The probability of selecting a thermometer that reads between -2.00° and 1.50° corresponds to the shaded area in Figure 6-7. Table A-2 cannot be used to find that area directly, but we can use the table to find that z = -2.00 corresponds to the area of 0.0228, and z = 1.50 corresponds to the area of 0.9332, as shown in the figure. From Figure 6-7 we see that the shaded area is the difference between 0.9332 and 0.0228. The shaded area is therefore 0.9332 - 0.0228 = 0.9104.
Using the correspondence between probability and area, we conclude that there is a probability of 0.9104 of randomly selecting one of the thermometers with a reading between -2.00° and 1.50° at the freezing point of water. Another way to interpret this result is to state that if many thermometers are selected and tested at the freezing point of water, then 0.9104 (or 91.04%) of them will read between -2.00° and 1.50°.

Figure 6-7 Finding the Area Between Two Values: (1) the area to the left of z = -2.00 is 0.0228 (from Table A-2); (2) the total area from the left up to z = 1.50 is 0.9332 (from Table A-2); (3) the shaded area is 0.9332 - 0.0228 = 0.9104.
Example 5 can be generalized as the following rule: The area corresponding to the region between two specific z scores can be found by finding the difference between the two areas found in Table A-2. Figure 6-8 illustrates this general rule. Note that the shaded region B can be found by calculating the difference between two areas found from Table A-2: area A and B combined (found in Table A-2 as the area corresponding to z Right) and area A (found in Table A-2 as the area corresponding to z Left). Study hint: Don’t try to memorize a rule or formula for this case. Focus on understanding how Table A-2 works. If necessary, first draw a graph, shade the desired area, then think of a way to find that area given the condition that Table A-2 provides only cumulative areas from the left.

Shaded area B = (areas A and B combined) - (area A)
= (area from Table A-2 using z Right) - (area from Table A-2 using z Left)

Figure 6-8 Finding the Area Between Two z Scores
Probabilities such as those in the preceding examples can also be expressed with the following notation.

Notation
P(a < z < b) denotes the probability that the z score is between a and b.
P(z > a) denotes the probability that the z score is greater than a.
P(z < a) denotes the probability that the z score is less than a.

Using this notation, we can express the result of Example 5 as P(-2.00 < z < 1.50) = 0.9104, which states in symbols that the probability of a z score falling between -2.00 and 1.50 is 0.9104. With a continuous probability distribution such as the normal distribution, the probability of getting any single exact value is 0. That is, P(z = a) = 0. For example, there is a 0 probability of randomly selecting someone and getting a person whose height is exactly 68.12345678 in. In the normal distribution, any single point on the horizontal scale is represented not by a region under the curve, but by a vertical line above the point. For P(z = a) we have a vertical line above z = a, but that vertical line by itself contains no area, so P(z = a) = 0. With any continuous random variable, the probability of any one exact value is 0, so it follows that the probability of getting a z score of at most b is equal to the probability of getting a z score less than b. It is important to correctly interpret key phrases such as at most, at least, more than, no more than, and so on.
Finding z Scores from Known Areas
So far in this section, all of the examples involving the standard normal distribution have followed the same format: Given z scores, find areas under the curve. These areas correspond to probabilities. In many cases, we have the reverse: Given the area (or probability), find the corresponding z score. In such cases, we must avoid confusion between z scores and areas. Remember, z scores are distances along the horizontal scale, whereas areas (or probabilities) are regions under the curve. (Table A-2 lists z scores in the left column and across the top row, but areas are found in the body of the table.) Also, z scores positioned in the left half of the curve are always negative. If we already know a probability and want to determine the corresponding z score, we find it as follows.

Procedure for Finding a z Score from a Known Area
1. Draw a bell-shaped curve and identify the region under the curve that corresponds to the given probability. If that region is not a cumulative region from the left, work instead with a known region that is a cumulative region from the left.
2. Using the cumulative area from the left, locate the closest probability in the body of Table A-2 and identify the corresponding z score.

When referring to Table A-2, remember that the body of the table gives cumulative areas from the left.
Example 6 Scientific Thermometers Use the same thermometers from Example 3, with temperature readings at the freezing point of water that are normally distributed with a mean of 0°C and a standard deviation of 1.00°C. Find the temperature corresponding to P95, the 95th percentile. That is, find the temperature separating the bottom 95% from the top 5%. See Figure 6-9.

Figure 6-9 Finding the 95th Percentile
Figure 6-9 shows the z score that is the 95th percentile, with 95% of the area (or 0.95) below it. Referring to Table A-2, we search for the area of 0.95 in the body of the table and then find the corresponding z score. In Table A-2 we find the areas of 0.9495 and 0.9505, but there’s an asterisk with a special note indicating that 0.9500 corresponds to a z score of 1.645. We can now conclude that the z score in Figure 6-9 is 1.645, so the 95th percentile is the temperature reading of 1.645°C.

When tested at freezing, 95% of the readings will be less than or equal to 1.645°C, and 5% of them will be greater than or equal to 1.645°C.
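Going from a known cumulative area back to a z score is the inverse of the table lookup. A minimal Python sketch, again assuming statistics.NormalDist (Python 3.8 or later) rather than any of the tools named in the text:

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# z score with a cumulative area of 0.95 to its left (the 95th percentile)
p95 = standard_normal.inv_cdf(0.95)
print(f"{p95:.3f}")

# Feeding the z score back into the cumulative area recovers 0.95
print(f"{standard_normal.cdf(p95):.4f}")
```

The unrounded result lies between the table entries 1.64 and 1.65 and rounds to the special-case value of 1.645.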
Note that in the preceding solution, Table A-2 led to a z score of 1.645, which is midway between 1.64 and 1.65. When using Table A-2, we can usually avoid interpolation by simply selecting the closest value. Special cases are listed in the accompanying table because they are often used in a wide variety of applications. (For one of those special cases, the value of z = 2.576 gives an area slightly closer to the area of 0.9950, but z = 2.575 has the advantage of being the value midway between z = 2.57 and z = 2.58.) Except in these special cases, we can select the closest value in the table. (If a desired value is midway between two table values, select the larger value.) For z scores above 3.49, we can use 0.9999 as an approximation of the cumulative area from the left; for z scores below -3.49, we can use 0.0001 as an approximation of the cumulative area from the left.

z Score    Cumulative Area from the Left
1.645      0.9500
2.575      0.9950
Example 7 Scientific Thermometers Using the same thermometers from Example 3, find the temperatures separating the bottom 2.5% and the top 2.5%.
The required z scores are shown in Figure 6-10. To find the z score located to the left, we search the body of Table A-2 for an area of 0.025. The result is z = -1.96. To find the z score located to the right, we search the body of Table A-2 for an area of 0.975. (Remember that Table A-2 always gives cumulative areas from the left.) The result is z = 1.96. The values of z = -1.96 and z = 1.96 separate the bottom 2.5% and the top 2.5%, as shown in Figure 6-10.

When tested at freezing, 2.5% of the thermometer readings will be equal to or less than -1.96°, and 2.5% of the readings will be equal to or greater than 1.96°. Another interpretation is that at the freezing point of water, 95% of all thermometer readings will fall between -1.96° and 1.96°.

Figure 6-10 Finding z Scores (an area of 0.025 lies in each tail; to find the left z score, locate the cumulative area to the left, 0.025, in Table A-2; to find the right z score, locate 0.975 in the body of Table A-2)
Caution: When using Table A-2 for finding a value of z_α for a particular value of α, note that α is the area to the right of z_α, but Table A-2 lists cumulative areas to the left of a given z score. To find the value of z_α by using Table A-2, resolve that conflict by using the value of 1 - α. In Example 8, the value of z_0.025 can be found by locating the area of 0.9750 in the body of the table.

The examples in this section were created so that the mean of 0 and the standard deviation of 1 coincided exactly with the properties of the standard normal distribution. In reality, it is unusual to find such convenient parameters, because typical normal distributions involve means different from 0 and standard deviations different from 1. In the next section we introduce methods for working with such normal distributions, which are much more realistic and practical.
Critical Values For a normal distribution, a critical value is a z score on the borderline separating the z scores that are likely to occur from those that are unlikely. Common critical values are z = -1.96 and z = 1.96, obtained as shown in Example 7. In Example 7, the values below z = -1.96 are not likely to occur, because they occur in only 2.5% of the readings, and the values above z = 1.96 are not likely to occur because they also occur in only 2.5% of the readings. The reference to critical values is not so important in this chapter, but will become extremely important in the following chapters. The following notation is used for critical z values found by using the standard normal distribution.

The notation z_α denotes the z score with an area of α to its right. For example, the notation of z_0.025 is used to represent the z score with an area of 0.025 to its right. Refer to Figure 6-10 and note that the value of z = 1.96 has an area of 0.025 to its right, so z_0.025 = 1.96.
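Because a critical value is defined by the area to its right while cumulative areas are measured from the left, the lookup uses the complementary area. The sketch below illustrates that step in Python; the function name critical_value is hypothetical, and statistics.NormalDist is an assumed tool rather than one discussed in the text.

```python
from statistics import NormalDist

def critical_value(alpha):
    """z score with an area of alpha to its right, using the left area 1 - alpha."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    return NormalDist().inv_cdf(1.0 - alpha)

z_0025 = critical_value(0.025)
print(f"{z_0025:.2f}")   # 1.96
```

By symmetry of the standard normal curve, the critical value on the left, with an area of 0.025 to its left, is the negative of this result.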
Using Technology
When working with the standard normal distribution, technology can be used to find z scores or areas, so it can be used instead of Table A-2. The following instructions describe how to find such z scores or areas.

STATDISK
Select Analysis, Probability Distributions, Normal Distribution. Either enter the z score to find corresponding areas, or enter the cumulative area from the left to find the z score. After entering a value, click on the Evaluate button. See the accompanying STATDISK display for an entry of z = 2.00.

MINITAB
• To find the cumulative area to the left of a z score (as in Table A-2), select Calc, Probability Distributions, Normal, Cumulative probabilities. Then enter the mean of 0 and standard deviation of 1. Click on the Input Constant button and enter the z score.
• To find a z score corresponding to a known probability, select Calc, Probability Distributions, Normal. Then select Inverse cumulative probabilities and the option Input constant. For the input constant, enter the total area to the left of the given value.

EXCEL
• To find the cumulative area to the left of a z score (as in Table A-2), click on fx, then select Statistical, NORMSDIST, and enter the z score. (In Excel 2010, select NORM.S.DIST.)
• To find a z score corresponding to a known probability, select fx, Statistical, NORMSINV, and enter the total area to the left of the given value. (In Excel 2010, select NORM.S.INV.)

TI-83/84 PLUS
To find the area between two z scores, press 2nd VARS and select normalcdf. Proceed to enter the two z scores separated by a comma, as in (left z score, right z score).
To find a z score corresponding to a known probability, press 2nd VARS and select invNorm. Proceed to enter the total area to the left of the z score. For example, the command of invNorm(0.975) yields a z score of 1.959963986, which is rounded to 1.96, as in Example 7.
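For comparison with the calculator commands, the two operations can be imitated in Python. This is only a sketch: the wrapper names normalcdf and inv_norm are hypothetical and merely echo the calculator's command names, and the work is done by statistics.NormalDist from the Python standard library (version 3.8 or later).

```python
from statistics import NormalDist

def normalcdf(left_z, right_z):
    """Area under the standard normal curve between two z scores."""
    dist = NormalDist()  # mean 0, standard deviation 1
    return dist.cdf(right_z) - dist.cdf(left_z)

def inv_norm(area_to_left):
    """z score with the given cumulative area to its left."""
    return NormalDist().inv_cdf(area_to_left)

between = normalcdf(-2.00, 1.50)   # the region between z = -2.00 and z = 1.50
z_right = inv_norm(0.975)          # z score with an area of 0.975 to its left
print(f"{between:.4f}")
print(f"{z_right:.2f}")
```

The first result rounds to 0.9104 and the second to 1.96, matching the values obtained from Table A-2.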
TI-83/84 Plus: Example 5 could be solved with the command of normalcdf(-2.00, 1.50), which yields a probability of 0.9104 (rounded), as shown in the accompanying screen.

6-2 Basic Skills and Concepts

Statistical Literacy and Critical Thinking
1. Normal Distribution When we refer to a “normal” distribution, does the word “normal” have the same meaning as in ordinary language, or does it have a special meaning in statistics? What exactly is a normal distribution?
2. Normal Distribution A normal distribution is informally described as a probability distribution that is “bell-shaped” when graphed. Describe the “bell shape.”
3. Standard Normal Distribution What requirements are necessary for a normal probability distribution to be a standard normal probability distribution?
4. Notation What does the notation z_α indicate?

Continuous Uniform Distribution In Exercises 5–8, refer to the continuous uniform distribution depicted in Figure 6-2. Assume that a voltage level between 123.0 volts and 125.0 volts is randomly selected, and find the probability that the given voltage level is selected.
5. Greater than 124.0 volts
6. Less than 123.5 volts
7. Between 123.2 volts and 124.7 volts
8. Between 124.1 volts and 124.5 volts
Standard Normal Distribution In Exercises 9–12, find the area of the shaded region. The graph depicts the standard normal distribution with mean 0 and standard deviation 1.
9.    10.
11.   12.

Standard Normal Distribution In Exercises 13–16, find the indicated z score. The graph depicts the standard normal distribution with mean 0 and standard deviation 1.
13.   14.
15.   16.
Standard Normal Distribution In Exercises 17–36, assume that thermometer readings are normally distributed with a mean of 0°C and a standard deviation of 1.00°C. A thermometer is randomly selected and tested. In each case, draw a sketch, and find the probability of each reading. (The given values are in Celsius degrees.) If using technology instead of Table A-2, round answers to four decimal places.
17. Less than -1.50    18. Less than -2.75
19. Less than 1.23    20. Less than 2.34
21. Greater than 2.22    22. Greater than 2.33
23. Greater than -1.75    24. Greater than -1.96
25. Between 0.50 and 1.00    26. Between 1.00 and 3.00
27. Between -3.00 and -1.00    28. Between -1.00 and -0.50
29. Between -1.20 and 1.95    30. Between -2.87 and 1.34
31. Between -2.50 and 5.00    32. Between -4.50 and 1.00
33. Less than 3.55    34. Greater than 3.68
35. Greater than 0    36. Less than 0

Basis for the Range Rule of Thumb and the Empirical Rule In Exercises 37–40, find the indicated area under the curve of the standard normal distribution, then convert it to a percentage and fill in the blank. The results form the basis for the range rule of thumb and the empirical rule introduced in Section 3-3.
37. About _____% of the area is between z = -1 and z = 1 (or within 1 standard deviation of the mean).
38. About _____% of the area is between z = -2 and z = 2 (or within 2 standard deviations of the mean).
39. About _____% of the area is between z = -3 and z = 3 (or within 3 standard deviations of the mean).
40. About _____% of the area is between z = -3.5 and z = 3.5 (or within 3.5 standard deviations of the mean).
Finding Critical Values In Exercises 41–44, find the indicated value.
41.   42.
43.   44.

Finding Probability In Exercises 45–48, assume that the readings on the thermometers are normally distributed with a mean of 0°C and a standard deviation of 1.00°C. Find the indicated probability, where z is the reading in degrees.
45.   46.

Finding Temperature Values In Exercises 49–52, assume that thermometer readings are normally distributed with a mean of 0°C and a standard deviation of 1.00°C. A thermometer is randomly selected and tested. In each case, draw a sketch, and find the temperature reading corresponding to the given information.
49. Find P95, the 95th percentile. This is the temperature reading separating the bottom 95% from the top 5%.
50. Find P1, the 1st percentile. This is the temperature reading separating the bottom 1% from the top 99%.
51. If 2.5% of the thermometers are rejected because they have readings that are too high and another 2.5% are rejected because they have readings that are too low, find the two readings that are cutoff values separating the rejected thermometers from the others.
52. If 0.5% of the thermometers are rejected because they have readings that are too low and another 0.5% are rejected because they have readings that are too high, find the two readings that are cutoff values separating the rejected thermometers from the others.
Beyond the Basics
53 For a standard normal distribution, find the percentage of data that are
a within 2 standard deviations of the mean
b more than 1 standard deviation away from the mean
c more than 1.96 standard deviations away from the mean
d between and
e more than 3 standard deviations away from the mean
54 If a continuous uniform distribution has parameters of and then the
min-imum is and the maximum is
a For this distribution, find
b Find if you incorrectly assume that the distribution is normal instead of
Trang 17Applications of Normal Distributions
Key Concept In this section we introduce real and important applications involving nonstandard normal distributions by extending the procedures presented in Section 6-2. We use a simple conversion (Formula 6-2) that allows us to standardize any normal distribution so that the methods of the preceding section can be used with normal distributions having a mean that is not 0 or a standard deviation that is not 1. Specifically, given some nonstandard normal distribution, we should be able to find probabilities corresponding to values of the variable x, and given some probability value, we should be able to find the corresponding value of the variable x.
To work with a nonstandard normal distribution, we simply standardize values to use the procedures from Section 6-2.
If we convert values to standard z scores using Formula 6-2, then procedures for working with all normal distributions are the same as those for the standard normal distribution.
Formula 6-2
$$z = \frac{x - \mu}{\sigma}$$ (round z scores to 2 decimal places)
Some calculators and computer software programs do not require the above conversion to z scores because probabilities can be found directly. However, if you use Table A-2 to find probabilities, you must first convert values to standard z scores. Regardless of the method you use, you need to clearly understand the above principle, because it is an important foundation for concepts introduced in the following chapters.
Figure 6-11 illustrates the conversion from a nonstandard to a standard normal distribution. The area in any normal distribution bounded by some score x (as in Figure 6-11(a)) is the same as the area bounded by the equivalent z score in the standard normal distribution (as in Figure 6-11(b)). This means that when working with a nonstandard normal distribution, you can use Table A-2 the same way it was used in Section 6-2, as long as you first convert the values to z scores.
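The equal-area principle described above can be checked numerically. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist`; the helper name `z_score` and the illustrative values (mean 100, standard deviation 15, x = 115) are assumptions chosen for demonstration, not part of Formula 6-2 itself.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Formula 6-2: convert x to a standard z score."""
    return (x - mu) / sigma

# Illustrative (assumed) values: mean 100, standard deviation 15, x = 115
mu, sigma, x = 100.0, 15.0, 115.0

z = z_score(x, mu, sigma)

# Cumulative area to the left of x in the nonstandard normal distribution
area_nonstandard = NormalDist(mu, sigma).cdf(x)

# Cumulative area to the left of the equivalent z score in the
# standard normal distribution (mean 0, standard deviation 1)
area_standard = NormalDist().cdf(z)

print(round(z, 2), round(area_nonstandard, 4), round(area_standard, 4))
```

Both areas should agree, which is why Table A-2 can be used once values have been converted to z scores.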
55 Assume that z scores are normally distributed with a mean of 0 and a standard
56 In a continuous uniform distribution,
Find the mean and standard deviation for the uniform distribution represented in Figure 6-2
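Figure 6-11 Converting from a Nonstandard to a Standard Normal Distribution. Panel (a), Nonstandard Normal Distribution: area P bounded by x, with mean $\mu$. Panel (b), Standard Normal Distribution: the same area P bounded by z, with mean 0. The two scales are related by $z = \frac{x - \mu}{\sigma}$.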
When finding areas with a nonstandard normal distribution, use this procedure:
1. Sketch a normal curve, label the mean and the specific x values, then shade the
region representing the desired probability.
2. For each relevant value x that is a boundary for the shaded region, use Formula 6-2
to convert that value to the equivalent z score.
3. Refer to Table A-2 or use a calculator or computer software to find the area of
the shaded region. This area is the desired probability.
The following example applies these three steps to illustrate the relationship
between a typical nonstandard normal distribution and the standard normal distribution.
Example 1 Why Do Doorways Have a Height of 6 ft 8 in.? The typical home doorway has a height of 6 ft 8 in., or 80 in. Because men tend to be taller than women, we will consider only men as we investigate the limitations of that standard doorway height. Given that heights of men are normally distributed with a mean of 69.0 in. and a standard deviation of 2.8 in., find the percentage of men who can fit through the standard doorway without bending or bumping their head. Is that percentage high enough to continue using 80 in. as the standard height? Will a doorway height of 80 in. be sufficient in future years?
Step 1: See Figure 6-12, which incorporates this information: Men have heights that are normally distributed with a mean of 69.0 in. and a standard deviation of 2.8 in. The shaded region represents the men who can fit through a doorway that has a height of 80 in.
Figure 6-12 Heights (in inches) of Men
Step 2: To use Table A-2, we first must use Formula 6-2 to convert from the nonstandard normal distribution to the standard normal distribution. The height of 80 in. is converted to a z score as follows:
$$z = \frac{x - \mu}{\sigma} = \frac{80 - 69.0}{2.8} = 3.93$$
Step 3: Referring to Table A-2 and using z = 3.93, we find that this z score is in the category of “3.50 and up,” so the cumulative area to the left of 80 in. is 0.9999, as shown in Figure 6-12.
If we use technology instead of Table A-2, we get the more accurate cumulative area of 0.999957 (instead of 0.9999).
The proportion of men who can fit through the standard doorway height of 80 in. is 0.9999, or 99.99%. Very few men will not be able to fit through the doorway without bending or bumping their head. This percentage is high enough to justify the use of 80 in. as the standard doorway height. However, heights of men and women have been increasing gradually but steadily over the past decades, so the time may come when the standard doorway height of 80 in. may no longer be adequate.
Example 2 Birth Weights Birth weights in the United States are normally distributed with a mean of 3420 g and a standard deviation of 495 g. The Newport General Hospital requires special treatment for babies that are less than 2450 g (unusually light) or more than 4390 g (unusually heavy). What is the percentage of babies who do not require special treatment because they have birth weights between 2450 g and 4390 g? Under these conditions, do many babies require special treatment?
Figure 6-13 Birth Weights
Figure 6-13 shows the shaded region representing birth weights between 2450 g and 4390 g. We can’t find that shaded area directly from Table A-2, but we can find it indirectly by using the same basic procedures presented in Section 6-2, as follows: (1) Find the cumulative area from the left up to 2450; (2) find the cumulative area from the left up to 4390; (3) find the difference between those two areas.
Find the cumulative area up to 2450:
$$z = \frac{x - \mu}{\sigma} = \frac{2450 - 3420}{495} = -1.96$$
Using Table A-2, we find that z = -1.96 corresponds to an area of 0.0250, as shown in Figure 6-13.
Evelyn Marie Adams won the New Jersey Lottery twice in four months. This happy event was reported ... 17 trillion. But Harvard mathematicians Persi Diaconis and Frederick Mosteller note that there is 1 chance in 17 trillion that a particular person with one ticket in each of two New Jersey lotteries will win both times. However, there is about 1 chance in 30 that someone in the United States will win a lottery twice in a four-month period. Diaconis and Mosteller analyzed coincidences and conclude that “with a large enough sample, any outrageous thing is apt to happen.” More recently, according to the Detroit News, Joe and Dolly Hornick won the Pennsylvania lottery four times in 12 years for prizes of $2.5 million, $68,000, $206,217, and $71,037.
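Find the cumulative area up to 4390:
$$z = \frac{x - \mu}{\sigma} = \frac{4390 - 3420}{495} = 1.96$$
Using Table A-2, we find that z = 1.96 corresponds to an area of 0.9750, as shown in Figure 6-13.
Find the area between 2450 and 4390: 0.9750 - 0.0250 = 0.9500, so 95.00% of the babies do not require special treatment because they have birth weights between 2450 g and 4390 g. It follows that 5.00% of the babies do require special treatment because they are unusually light or heavy. The 5.00% rate is probably not too high for typical hospitals.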
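The doorway and birth weight results can also be checked with software instead of Table A-2. The sketch below is one possible approach, assuming Python's standard-library `statistics.NormalDist`; the variable names are illustrative only, and the means and standard deviations are the ones stated in the two examples.

```python
from statistics import NormalDist

# Doorway example: heights of men, mean 69.0 in., standard deviation 2.8 in.
heights = NormalDist(mu=69.0, sigma=2.8)
p_fit = heights.cdf(80)  # cumulative area to the left of 80 in.

# Birth weight example: mean 3420 g, standard deviation 495 g
weights = NormalDist(mu=3420, sigma=495)
# Area between two values = (left area up to 4390) - (left area up to 2450)
p_between = weights.cdf(4390) - weights.cdf(2450)

print(f"P(height < 80 in.) = {p_fit:.6f}")
print(f"P(2450 g < weight < 4390 g) = {p_between:.4f}")
```

Because no rounding of z scores to 2 decimal places is involved, the results can differ slightly from the Table A-2 values; the first should agree with the technology value of 0.999957 quoted in the doorway example.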
Finding Values from Known Areas
Here are helpful hints for those cases in which the area (or probability or percentage)
is known and we must find the relevant value(s):
1. Don’t confuse z scores and areas. Remember, z scores are distances along the
horizontal scale, but areas are regions under the normal curve. Table A-2 lists z scores in the
left columns and across the top row, but areas are found in the body of the table.
2. Choose the correct side of the graph. A value separating the top 10%
from the others will be located on the right side of the graph, but a value
separating the bottom 10% will be located on the left side of the graph.
3. A z score must be negative whenever it is located in the left half of the normal
distribution.
4. Areas (or probabilities) are positive or zero values, but they are never negative.
Graphs are extremely helpful in visualizing, understanding, and successfully
working with normal probability distributions, so they should be used whenever possible.
Procedure for Finding Values Using Table A-2 and Formula 6-2
1. Sketch a normal distribution curve, enter the given probability or percentage in
the appropriate region of the graph, and identify the x value(s) being sought.
2. Use Table A-2 to find the z score corresponding to the cumulative left area
bounded by x. Refer to the body of Table A-2 to find the closest area, then
identify the corresponding z score.
3. Using Formula 6-2, enter the values for $\mu$, $\sigma$, and the z score found in Step 2,
then solve for x. Based on Formula 6-2, we can solve for x as follows:
$$x = \mu + (z \cdot \sigma)$$ (another form of Formula 6-2)
(If z is located to the left of the mean, be sure that it is a negative number.)
4. Refer to the sketch of the curve to verify that the solution makes sense in the
context of the graph and in the context of the problem.
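Steps 2 and 3 of the procedure above can be sketched in code. This is an illustration under stated assumptions, not a reproduction of Table A-2: the function `closest_table_z` is a hypothetical helper that imitates a table lookup by searching two-decimal z scores from -3.49 to 3.49 for the closest cumulative left area, and `x_from_z` applies the solved form of Formula 6-2. The demonstration reuses the birth weight parameters (mean 3420 g, standard deviation 495 g) with a left area of 0.9750.

```python
from statistics import NormalDist

STANDARD = NormalDist()  # standard normal: mean 0, standard deviation 1

def closest_table_z(left_area):
    """Return the two-decimal z score whose cumulative left area is closest."""
    candidates = [i / 100 for i in range(-349, 350)]  # -3.49, -3.48, ..., 3.49
    return min(candidates, key=lambda z: abs(STANDARD.cdf(z) - left_area))

def x_from_z(z, mu, sigma):
    """Solved form of Formula 6-2: x = mu + (z * sigma)."""
    return mu + z * sigma

z = closest_table_z(0.9750)
x = x_from_z(z, mu=3420, sigma=495)
print(z, round(x, 1))
```

The search returns a z score rounded to 2 decimal places, as a table would, so the resulting x can differ slightly from what technology reports. It does not model the table's special footnoted entries (such as the one for an area of 0.9500).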
The following example uses the procedure just outlined.
Example 3 Designing Doorway Heights When designing an environment, one common criterion is to use a design that accommodates 95% of the population. How high should doorways be if 95% of men will fit through without bending or bumping their head? That is, find the 95th percentile of heights of men. Heights of men are normally distributed with a mean of 69.0 in. and a standard deviation of 2.8 in.
Step 1: Figure 6-14 shows the normal distribution with the height x that we want to identify. The shaded area represents the 95% of men who can fit through the doorway that we are designing.
Figure 6-14 Finding Height
Step 2: In Table A-2 we search for an area of 0.9500 in the body of the table. (The area of 0.9500 shown in Figure 6-14 is a cumulative area from the left, and that is exactly the type of area listed in Table A-2.) The area of 0.9500 is between the Table A-2 areas of 0.9495 and 0.9505, but there is an asterisk and footnote indicating that an area of 0.9500 corresponds to z = 1.645.
Step 3: With z = 1.645, $\mu$ = 69.0, and $\sigma$ = 2.8, we solve for x:
$$x = \mu + (z \cdot \sigma) = 69.0 + (1.645 \cdot 2.8) = 73.606$$
A doorway height of 73.6 in. (rounded) would allow 95% of men to fit through without bending or bumping their head. It follows that 5% of men would not fit through a doorway with a height of 73.6 in. Because so many men walk through doorways so often, this 5% rate is probably not practical.
Example 4 Birth Weights The Newport General Hospital wants to redefine the minimum and maximum birth weights that require special treatment because they are unusually low or unusually high. After considering relevant factors, a committee recommends special treatment for birth weights in the lowest 3% and the highest 1%. The committee members soon realize that specific birth weights need to be identified. Help this committee by finding the birth weights that separate the lowest 3% and the highest 1%. Birth weights in the United States are normally distributed with a mean of 3420 g and a standard deviation of 495 g.
Step 1: We begin with the graph shown in Figure 6-15. We have entered the mean of 3420 g, and we have identified the x values separating the lowest 3% and the highest 1%.
Step 2: If using Table A-2, we must use cumulative areas from the left. For the leftmost value of x, the cumulative area from the left is 0.03, so search for an area of 0.03 in the body of the table to get z = -1.88 (which corresponds to the closest area of 0.0301). For the rightmost value of x, the cumulative area from the left is 0.99, so search for an area of 0.99 in the body of the table to get z = 2.33 (which corresponds to the closest area of 0.9901).
Step 3: We now solve for the two values of x by using Formula 6-2 directly or by using the following version of Formula 6-2:
$$x = \mu + (z \cdot \sigma) = 3420 + (-1.88 \cdot 495) = 2489.4$$ (leftmost value of x)
$$x = \mu + (z \cdot \sigma) = 3420 + (2.33 \cdot 495) = 4573.35$$ (rightmost value of x)
Step 4: Referring to Figure 6-15, we see that the leftmost value of x = 2489.4 g is reasonable because it is less than the mean of 3420 g. Also, the rightmost value of 4573.35 is reasonable because it is above the mean of 3420 g. (Technology yields the values of 2489.0 g and 4571.5 g.)
The birth weight of 2489 g (rounded) separates the lowest 3% of birth weights, and 4573 g (rounded) separates the highest 1% of birth weights. The hospital now has well-defined criteria for determining whether a newborn baby should be given special treatment for a birth weight that is unusually low or high.
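The values found in the doorway design and birth weight examples can be cross-checked with software. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` and its `inv_cdf` method; the helper name `value_from_left_area` is hypothetical. Note that an area in the right tail (the highest 1%) must first be converted to a cumulative area from the left (0.99).

```python
from statistics import NormalDist

def value_from_left_area(left_area, mu, sigma):
    """Return the x value with the given cumulative area to its left."""
    return NormalDist(mu, sigma).inv_cdf(left_area)

# Doorway design: 95th percentile of men's heights (mean 69.0 in., s.d. 2.8 in.)
door_height = value_from_left_area(0.95, 69.0, 2.8)

# Birth weights (mean 3420 g, s.d. 495 g): lowest 3% and highest 1%
low_cutoff = value_from_left_area(0.03, 3420, 495)
high_cutoff = value_from_left_area(1 - 0.01, 3420, 495)  # highest 1% -> left area 0.99

print(f"{door_height:.1f} in., {low_cutoff:.1f} g, {high_cutoff:.1f} g")
```

The birth weight cutoffs should be close to the technology values of 2489.0 g and 4571.5 g quoted in the example; they differ from the Table A-2 results because the table route rounds z to 2 decimal places.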
When using the methods of this section with applications involving a normal distribution, it is important to first determine whether you are finding a probability
(or area) from a known value of x or finding a value of x from a known probability (or
area). Figure 6-16 is a flowchart summarizing the main procedures of this section.
Figure 6-16 Procedures for Applications with Normal Distributions. The flowchart has two branches: find a probability (from a known value of x), or find a value of x (from known probability or area). Each branch asks “Are you using technology or Table A-2?” In the branch for finding a value of x, first identify the cumulative area to the left of x; with Table A-2, look up the cumulative left area in Table A-2, find the corresponding z score, and solve for x; with technology, find x by using the technology.
USING TECHNOLOGY
When working with a nonstandard normal distribution, technology can be used to find areas or values of the relevant variable, so the technology can be used instead of Table A-2. The following instructions describe how to use technology for such cases.
STATDISK
Select Analysis, Probability Distributions, Normal Distribution. Either enter the z score to find corresponding areas, or enter the cumulative area from the left to find the z score. After entering a value, click on the Evaluate button.
MINITAB
• To find the cumulative area to the left of a z score (as in Table A-2), select Calc, Probability Distributions, Normal, Cumulative probabilities. Enter the mean and standard deviation, then click on the Input Constant button and enter the value.
• To find a value corresponding to a known area, select Calc, Probability Distributions, Normal, then select Inverse cumulative probabilities. Enter the mean and standard deviation. Select the option Input constant and enter the total area to the left of the given value.
EXCEL
• To find the cumulative area to the left of a value (as in Table A-2), click on fx, then select Statistical, NORMDIST. (In Excel 2010, select NORM.DIST.) In the dialog box, enter the value for x, enter the mean and standard deviation, and enter 1 in the “cumulative” space.
• To find a value corresponding to a known area, select fx, Statistical, NORMINV (or NORM.INV in Excel 2010), and proceed to make the entries in the dialog box. When entering the probability value, enter the total area to the left of the given value. See the accompanying Excel display for Example 3.
TI-83/84 PLUS
• To find the area between two values, press 2nd, VARS, 2 (for normalcdf), then proceed to enter the two values, the mean, and the standard deviation, all separated by commas, as in (left value, right value, mean, standard deviation). Hint: If there is no left value, enter the left value as -999999, and if there is no right value, enter the right value as 999999. In Example 1 we want the area to the left of x = 80 in., so use the command normalcdf(-999999, 80, 69.0, 2.8) as shown in the accompanying screen display.
• To find a value corresponding to a known area, press 2nd, VARS, then select invNorm, and proceed to enter the total area to the left of the value, the mean, and the standard deviation in the format of (total area to the left, mean, standard deviation) with the commas included.
6-3 Basic Skills and Concepts
Statistical Literacy and Critical Thinking
1. Normal Distributions What is the difference between a standard normal distribution and a nonstandard normal distribution?
2. IQ Scores The distribution of IQ scores is a nonstandard normal distribution with a mean of 100 and a standard deviation of 15, and a bell-shaped graph is drawn to represent this distribution.
a. What is the area under the curve?
b. What is the value of the median?
c. What is the value of the mode?
3. Normal Distributions The distribution of IQ scores is a nonstandard normal distribution with a mean of 100 and a standard deviation of 15. What are the values of the mean and standard deviation after all IQ scores have been standardized by converting them to z scores using $z = \frac{x - \mu}{\sigma}$?
4. Random Digits Computers are often used to randomly generate digits of telephone numbers to be called when conducting a survey. Can the methods of this section be used to find the probability that when one digit is randomly generated, it is less than 5? Why or why not? What is the probability of getting a digit less than 5?
IQ Scores In Exercises 5–8, find the area of the shaded region. The graphs depict IQ scores of adults, and those scores are normally distributed with a mean of 100 and a standard deviation of 15 (as on the Wechsler test).
IQ Scores In Exercises 9–12, find the indicated IQ score. The graphs depict IQ scores of adults, and those scores are normally distributed with a mean of 100 and a standard deviation of 15 (as on the Wechsler test).
IQ Scores In Exercises 13–20, assume that adults have IQ scores that are normally distributed with a mean of 100 and a standard deviation of 15 (as on the Wechsler test). (Hint: Draw a graph in each case.)
13. Find the probability that a randomly selected adult has an IQ that is less than 115.
14. Find the probability that a randomly selected adult has an IQ greater than 131.5 (the requirement for membership in the Mensa organization).
15. Find the probability that a randomly selected adult has an IQ between 90 and 110 (referred to as the normal range).
16. Find the probability that a randomly selected adult has an IQ between 110 and 120 (referred to as bright normal).
17. Find P30, which is the IQ score separating the bottom 30% from the top 70%.
18. Find the first quartile Q1, which is the IQ score separating the bottom 25% from the top 75%.
19. Find the third quartile Q3, which is the IQ score separating the top 25% from the others.
20. Find the IQ score separating the top 37% from the others.
In Exercises 21–26, use this information (based on data from the National Health Survey):
• Men’s heights are normally distributed with mean 69.0 in. and standard deviation 2.8 in.
• Women’s heights are normally distributed with mean 63.6 in. and standard deviation 2.5 in.
21. Doorway Height The Mark VI monorail used at Disney World and the Boeing 757-200 ER airliner have doors with a height of 72 in.
a. What percentage of adult men can fit through the doors without bending?
b. What percentage of adult women can fit through the doors without bending?
c. Does the door design with a height of 72 in. appear to be adequate? Explain.
d. What doorway height would allow 98% of adult men to fit without bending?
22. Doorway Height The Gulfstream 100 is an executive jet that seats six, and it has a doorway height of 51.6 in.
a. What percentage of adult men can fit through the door without bending?
b. What percentage of adult women can fit through the door without bending?
c. Does the door design with a height of 51.6 in. appear to be adequate? Why didn’t the engineers design a larger door?
d. What doorway height would allow 60% of men to fit without bending?
23. Tall Clubs International Tall Clubs International is a social organization for tall people. It has a requirement that men must be at least 74 in. tall, and women must be at least 70 in. tall.
a. What percentage of men meet that requirement?
b. What percentage of women meet that requirement?
c. Are the height requirements for men and women fair? Why or why not?
24. Tall Clubs International Tall Clubs International has minimum height requirements for men and women.
a. If the requirements are changed so that the tallest 4% of men are eligible, what is the new minimum height for men?
b. If the requirements are changed so that the tallest 4% of women are eligible, what is the new minimum height for women?
25. U.S. Army Height Requirements for Women The U.S. Army requires women’s heights to be between 58 in. and 80 in.
a. Find the percentage of women meeting the height requirement. Are many women being denied the opportunity to join the Army because they are too short or too tall?
b. If the U.S. Army changes the height requirements so that all women are eligible except the shortest 1% and the tallest 2%, what are the new height requirements?
26. Marine Corps Height Requirement for Men The U.S. Marine Corps requires that men have heights between 64 in. and 80 in.
a. Find the percentage of men who meet the height requirements. Are many men denied the opportunity to become a Marine because they do not satisfy the height requirements?
b. If the height requirements are changed so that all men are eligible except the shortest 3% and the tallest 4%, what are the new height requirements?
27. Birth Weights Birth weights in Norway are normally distributed with a mean of 3570 g and a standard deviation of 500 g.
a. If the Ulleval University Hospital in Oslo requires special treatment for newborn babies weighing less than 2700 g, what is the percentage of newborn babies requiring special treatment?
b. If the Ulleval University Hospital officials plan to require special treatment for the lightest 3% of newborn babies, what birth weight separates those requiring special treatment from those who do not?
c. Why is it not practical for the hospital to simply state that babies require special treatment if they are in the bottom 3% of birth weights?
28. Weights of Water Taxi Passengers It was noted in the Chapter Problem that when a water taxi sank in Baltimore’s Inner Harbor, an investigation revealed that the safe passenger load for the water taxi was 3500 lb. It was also noted that the mean weight of a passenger was assumed to be 140 lb. Assume a “worst case” scenario in which all of the passengers are adult men. (This could easily occur in a city that hosts conventions in which people of the same gender often travel in groups.) Based on data from the National Health and Nutrition Examination Survey, assume that weights of men are normally distributed with a mean of 172 lb and a standard deviation of 29 lb.
a. If one man is randomly selected, find the probability that he weighs less than 174 lb (the new value suggested by the National Transportation and Safety Board).
b. With a load limit of 3500 lb, how many men passengers are allowed if we assume a mean weight of 140 lb?
c. With a load limit of 3500 lb, how many men passengers are allowed if we use the new mean weight of 174 lb?
d. Why is it necessary to periodically review and revise the number of passengers that are allowed to board?
29. Body Temperatures Based on the sample results in Data Set 2 of Appendix B, assume that human body temperatures are normally distributed with a mean of 98.20°F and a standard deviation of 0.62°F.
a. Bellevue Hospital in New York City uses 100.6°F as the lowest temperature considered to be a fever. What percentage of normal and healthy persons would be considered to have a fever? Does this percentage suggest that a cutoff of 100.6°F is appropriate?
b. Physicians want to select a minimum temperature for requiring further medical tests. What should that temperature be, if we want only 5.0% of healthy people to exceed it? (Such a result is a false positive, meaning that the test result is positive, but the subject is not really sick.)
30. Aircraft Seat Width Engineers want to design seats in commercial aircraft so that they are wide enough to fit 99% of all males. (Accommodating 100% of males would require very wide seats that would be much too expensive.) Men have hip breadths that are normally distributed with a mean of 14.4 in. and a standard deviation of 1.0 in. (based on anthropometric survey data from Gordon, Clauser, et al.). Find P99. That is, find the hip breadth for men that separates the smallest 99% from the largest 1%.
31. Lengths of Pregnancies The lengths of pregnancies are normally distributed with a mean of 268 days and a standard deviation of 15 days.
a. One classical use of the normal distribution is inspired by a letter to “Dear Abby” in which a wife claimed to have given birth 308 days after a brief visit from her husband, who was serving in the Navy. Given this information, find the probability of a pregnancy lasting 308 days or longer. What does the result suggest?
b. If we stipulate that a baby is premature if the length of pregnancy is in the lowest 4%, find the length that separates premature babies from those who are not premature. Premature babies often require special care, and this result could be helpful to hospital administrators in planning for that care.
32. Sitting Distance A common design requirement is that an item (such as an aircraft or theater seat) must fit the range of people who fall between the 5th percentile for women and the 95th percentile for men. If this requirement is adopted, what is the minimum sitting distance and what is the maximum sitting distance? For the sitting distance, use the buttock-to-knee length. Men have buttock-to-knee lengths that are normally distributed with a mean of 23.5 in. and a standard deviation of 1.1 in. Women have buttock-to-knee lengths that are normally distributed with a mean of 22.7 in. and a standard deviation of 1.0 in.
Large Data Sets In Exercises 33 and 34, refer to the data sets in Appendix B and use computer software or a calculator.
33. Appendix B Data Set: Systolic Blood Pressure Refer to Data Set 1 in Appendix B and use the systolic blood pressure levels for males.
a. Using the systolic blood pressure levels for males, find the mean and standard deviation, and verify that the data have a distribution that is roughly normal.
b. Assuming that systolic blood pressure levels of males are normally distributed, find the 5th percentile and the 95th percentile. (Treat the statistics from part (a) as if they were population parameters.) Such percentiles could be helpful when physicians try to determine whether blood pressure levels are too low or too high.
34. Appendix B Data Set: Duration of Shuttle Flights Refer to Data Set 10 in Appendix B and use the durations (hours) of the NASA shuttle flights.
a. Find the mean and standard deviation, and verify that the data have a distribution that is roughly normal.
b. Treat the statistics from part (a) as if they are population parameters and assume a normal distribution to find the values of the quartiles Q1, Q2, and Q3.
6-3 Beyond the Basics
35. Units of Measurement Heights of women are normally distributed.
a. If heights of individual women are expressed in units of centimeters, what are the units used for the z scores that correspond to individual heights?
b. If heights of all women are converted to z scores, what are the mean, standard deviation, and distribution of these z scores?
36. Using Continuity Correction There are many situations in which a normal distribution can be used as a good approximation to a random variable that has only discrete values. In such cases, we can use this continuity correction: Represent each whole number by the interval extending from 0.5 below the number to 0.5 above it. Assume that IQ scores are all whole numbers having a distribution that is approximately normal with a mean of 100 and a standard deviation of 15.
a. Without using any correction for continuity, find the probability of randomly selecting someone with an IQ score greater than 103.
b. Using the correction for continuity, find the probability of randomly selecting someone with an IQ score greater than 103.
c. Compare the results from parts (a) and (b).
37. Curving Test Scores A statistics professor gives a test and finds that the scores are normally distributed with a mean of 25 and a standard deviation of 5. She plans to curve the scores.
a. If she curves by adding 50 to each grade, what is the new mean? What is the new standard deviation?
b. Is it fair to curve by adding 50 to each grade? Why or why not?
c. If the grades are curved according to the following scheme (instead of adding 50), find the numerical limits for each letter grade.
A: Top 10%
B: Scores above the bottom 70% and below the top 10%
C: Scores above the bottom 30% and below the top 30%
D: Scores above the bottom 10% and below the top 70%
F: Bottom 10%
d. Which method of curving the grades is fairer: Adding 50 to each grade or using the scheme given in part (c)? Explain.
38. SAT and ACT Tests Scores on the SAT test are normally distributed with a mean of 1518 and a standard deviation of 325. Scores on the ACT test are normally distributed with a mean of 21.1 and a standard deviation of 4.8. Assume that the two tests use different scales to measure the same aptitude.
a. If someone gets a SAT score that is the 67th percentile, find the actual SAT score and the equivalent ACT score.
b. If someone gets a SAT score of 1900, find the equivalent ACT score.
39. Outliers For the purposes of constructing modified boxplots as described in Section 3-4, outliers were defined as data values that are above Q3 by an amount greater than 1.5 × IQR or below Q1 by an amount greater than 1.5 × IQR, where IQR is the interquartile range. Using this definition of outliers, find the probability that when a value is randomly selected from a normal distribution, it is an outlier.
6-4 Sampling Distributions and Estimators
Key Concept In this section we consider the concept of a sampling distribution of a statistic. Also, we learn some important properties of sampling distributions of the mean, median, variance, standard deviation, range, and proportion. We see that some statistics (such as the mean, variance, and proportion) are unbiased estimators of population parameters, whereas other statistics (such as the median and range) are not.
The following chapters of this book introduce methods for using sample statistics to estimate values of population parameters. Those procedures are based on an understanding of how sample statistics behave, and that behavior is the focus of this section. We begin with the definition of a sampling distribution of a statistic.
The sampling distribution of a statistic (such as a sample mean or sample proportion) is the distribution of all values of the statistic when all possible samples of the same size n are taken from the same population. (The sampling distribution of a statistic is typically represented as a probability distribution in the format of a table, probability histogram, or formula.)
Sampling Distribution of the Mean
The preceding definition is general, so let’s consider the specific sampling distribution of the mean.
The sampling distribution of the mean is the distribution of sample means, with all samples having the same sample size n taken from the same population. (The sampling distribution of the mean is typically represented as a probability distribution in the format of a table, probability histogram, or formula.)
Example 1 Sampling Distribution of the Mean Consider repeating this process: Roll a die 5 times and find the mean $\bar{x}$ of the results. (See Table 6-2.) What do we know about the behavior of all sample means that are generated as this process continues indefinitely?
The top portion of Table 6-2 illustrates a process of rolling a die 5 times and finding the mean of the results. Table 6-2 shows results from repeating this process 10,000 times, but the true sampling distribution of the mean involves repeating the process indefinitely. Because the values of 1, 2, 3, 4, 5, 6 are all equally likely, the population has a mean of $\mu = 3.5$, and Table 6-2 shows that the 10,000 sample means have a mean of 3.49. If the process is continued indefinitely, the mean of the sample means will be 3.5. Also, Table 6-2 shows that the distribution of the sample means is approximately a normal distribution.
Based on the actual sample results shown in the top portion of Table 6-2, we can describe the sampling distribution of the mean by the histogram at the top of Table 6-2. The actual sampling distribution would be described by a histogram based on all possible samples, not only the 10,000 samples included in the histogram, but the number of trials is large enough to suggest that the true sampling distribution of means is a normal distribution.
The results of Example 1 allow us to observe these two important properties of the
sampling distribution of the mean:
1. The sample means target the value of the population mean. (That is, the mean
of the sample means is the population mean. The expected value of the sample
mean is equal to the population mean.)
2. The distribution of sample means tends to be a normal distribution. (This will
be discussed further in the following section, but the distribution tends to
become closer to a normal distribution as the sample size increases.)
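The first property can be explored with a short simulation of the die-rolling process from Example 1. This is a sketch under stated assumptions, not the procedure used to build Table 6-2: it relies on Python's standard-library `random` and `itertools` modules, the function names are hypothetical, and the seed is fixed only so that repeated runs give the same result. It also enumerates all possible samples of 5 rolls, treating each ordered sequence of rolls as equally likely, to compute the mean of the sample means without simulation.

```python
import random
from itertools import product

FACES = [1, 2, 3, 4, 5, 6]
N_ROLLS = 5  # sample size: roll a die 5 times

def simulate_sample_means(repetitions, seed=0):
    """Repeat the process 'roll a die 5 times and find the mean'."""
    rng = random.Random(seed)
    return [sum(rng.choice(FACES) for _ in range(N_ROLLS)) / N_ROLLS
            for _ in range(repetitions)]

def exact_mean_of_sample_means():
    """Average the sample mean over all 6**5 equally likely samples."""
    totals = [sum(sample) for sample in product(FACES, repeat=N_ROLLS)]
    return sum(totals) / (N_ROLLS * len(totals))

sample_means = simulate_sample_means(10_000)
simulated_mean = sum(sample_means) / len(sample_means)

print(simulated_mean)                # close to 3.5, varies with the seed
print(exact_mean_of_sample_means())  # mean over all possible samples
```

The simulated value is only an approximation, much like the 3.49 reported for the 10,000 sample means in Table 6-2, whereas the enumeration uses every possible sample.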
Sampling Distribution of the Variance
Having discussed the sampling distribution of the mean, we now consider the sampling distribution of the variance.
The sampling distribution of the variance is the distribution of sample variances, with all samples having the same sample size n taken from the same population. (The sampling distribution of the variance is typically represented as a probability distribution in the format of a table, probability histogram, or formula.)
Do Boys or Girls Run in the Family?
The author of this book, his siblings, and his siblings' children consist of 11 males and only one female. Is this an example of a phenomenon whereby one particular gender runs in a family? This issue was studied by examining a random sample of 8770 households in the United States. The results were reported in the Chance magazine article "Does Having Boys or Girls Run in the Family?" by Joseph Rodgers and Debby Doughty. Part of their analysis involves use of the binomial probability distribution. Their conclusion is that "We found no compelling evidence that sex bias runs in the family."
Caution: When working with population standard deviations or variances, be sure to evaluate them correctly. Recall from Section 3-3 that the computations for population standard deviations or variances involve division by the population size N (not the value of n - 1), as shown below.
Population standard deviation: $\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$
Population variance: $\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$
Because the calculations are typically performed with computer software or calculators, be careful to correctly distinguish between the standard deviation of a sample and the standard deviation of a population. Also be careful to distinguish between the variance of a sample and the variance of a population.
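As one illustration of how software separates the two computations, Python's standard `statistics` module provides `pvariance` and `pstdev` (division by N) alongside `variance` and `stdev` (division by n - 1). The sketch below applies both to the die values 1 through 6. Treating those six values as either a population or a sample is purely an assumption made for the comparison.

```python
import statistics

die_faces = [1, 2, 3, 4, 5, 6]

# Treating the six values as a complete population: divide by N.
pop_var = statistics.pvariance(die_faces)
pop_sd = statistics.pstdev(die_faces)

# Treating the same six values as a sample: divide by n - 1.
samp_var = statistics.variance(die_faces)
samp_sd = statistics.stdev(die_faces)

print(round(pop_var, 4), round(pop_sd, 4))
print(round(samp_var, 4), round(samp_sd, 4))
```

The same data give two different answers, so it matters which function (or calculator key) is used.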
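[Table 6-2: results from 10,000 trials. Top: roll a die 5 times and find the mean $\bar{x}$ (distribution approximately normal). Middle: roll a die 5 times and find the variance $s^2$ (distribution skewed). Bottom: roll a die 5 times and find the proportion of odd numbers (distribution approximately normal).]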
Example 2 Sampling Distribution of the Variance Consider repeating this process: Roll a die 5 times and find the variance $s^2$ of the results. What do we know about the behavior of all sample variances that are generated as this process continues indefinitely?
The middle portion of Table 6-2 illustrates a process of rolling a die 5 times and finding the variance of the results. Table 6-2 shows results from repeating this process 10,000 times, but the true sampling distribution of the variance involves repeating the process indefinitely. Because the values of 1, 2, 3, 4, 5, 6 are all equally likely, the population has a variance of $\sigma^2 = 2.9$ (rounded from 35/12), and Table 6-2 shows that the 10,000 sample variances have a mean of 2.88. If the process is continued indefinitely, the mean of the sample variances will be 2.9. Also, the middle portion of Table 6-2 shows that the distribution of the sample variances is a skewed distribution.
Based on the actual sample results shown in the middle portion of Table 6-2, we can describe the sampling distribution of the variance by the histogram in the middle of Table 6-2. The actual sampling distribution would be described by a histogram based on all possible samples, not only the 10,000 samples included in the histogram, but the number of trials is large enough to suggest that the true sampling distribution of variances is a distribution skewed to the right.
The results of Example 2 allow us to observe these two important properties of the
sampling distribution of the variance:
1. The sample variances target the value of the population variance. (That is, the mean of the sample variances is the population variance. The expected value of the sample variance is equal to the population variance.)
2. The distribution of sample variances tends to be a distribution skewed to the right.
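The first property depends on computing each sample variance with division by n - 1. The Python sketch below is an added illustration, and its helper function `variance_with_divisor` is a hypothetical name. It enumerates all samples of 5 die rolls and compares the mean of the sample variances under both divisors, using exact fractions.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)   # a fair six-sided die
N_ROLLS = 5           # sample size used in Example 2


def variance_with_divisor(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    mean = Fraction(sum(sample), len(sample))
    return sum((x - mean) ** 2 for x in sample) / divisor


samples = list(product(FACES, repeat=N_ROLLS))

# Usual sample variance s^2 (divide by n - 1), averaged over every sample.
mean_s2 = sum(variance_with_divisor(s, N_ROLLS - 1) for s in samples) / len(samples)

# Same average when each sample's sum of squares is divided by n instead.
mean_div_n = sum(variance_with_divisor(s, N_ROLLS) for s in samples) / len(samples)

print(mean_s2)      # 35/12
print(mean_div_n)   # 7/3
```

With division by n - 1 the mean of the sample variances equals the population variance of 35/12 (about 2.9). With division by n it falls to 7/3, so that version would systematically underestimate the population variance.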
Sampling Distribution of Proportion
We now consider the sampling distribution of a proportion.
The sampling distribution of the proportion is the distribution of sample proportions, with all samples having the same sample size n taken from the same population.
We need to distinguish between a population proportion p and some sample proportion, so the following notation is commonly used.
Notation for Proportions
$p$ = population proportion
$\hat{p}$ = sample proportion
Example 3 Sampling Distribution of the Proportion Consider repeating this process: Roll a die 5 times and find the proportion of odd numbers. What do we know about the behavior of all sample proportions that are generated as this process continues indefinitely?
The bottom portion of Table 6-2 illustrates a process of rolling a die 5 times and finding the proportion of odd numbers. Table 6-2 shows results from repeating this process 10,000 times, but the true sampling distribution of the proportion involves repeating the process indefinitely. Because the values of 1, 2, 3, 4, 5, 6 are all equally likely, the proportion of odd numbers in the population is 0.5, and Table 6-2 shows that the 10,000 sample proportions have a mean of 0.50. If the process is continued indefinitely, the mean of the sample proportions will be 0.5. Also, the bottom portion of Table 6-2 shows that the distribution of the sample proportions is approximately a normal distribution.
Based on the actual sample results shown in the bottom portion of Table 6-2, we can describe the sampling distribution of the proportion by the histogram at the bottom of Table 6-2. The actual sampling distribution would be described by a histogram based on all possible samples, not only the 10,000 samples included in the histogram, but the number of trials is large enough to suggest that the true sampling distribution of proportions is approximately a normal distribution.
The results of Example 3 allow us to observe these two important properties of the sampling distribution of the proportion:
1. The sample proportions target the value of the population proportion. (That is, the mean of the sample proportions is the population proportion. The expected value of the sample proportion is equal to the population proportion.)
2. The distribution of sample proportions tends to be a normal distribution.
The preceding three examples are based on 10,000 trials and the results are summarized in Table 6-2. Table 6-3 describes the general behavior of the sampling distribution of the mean, variance, and proportion, assuming that certain conditions are satisfied. For example, Table 6-3 shows that the sampling distribution of the mean tends to be a normal distribution, but the following section describes conditions that must be satisfied before we can assume that the distribution is normal.
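The sampling distribution in Example 3 can also be written out exactly. The Python sketch below is an added illustration with arbitrary variable names. It enumerates every sample of 5 die rolls, records the proportion of odd numbers in each, and reports the resulting probability distribution and its mean.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

FACES = range(1, 7)   # a fair six-sided die
N_ROLLS = 5           # sample size used in Example 3

samples = list(product(FACES, repeat=N_ROLLS))

# Proportion of odd numbers in each sample (x % 2 is 1 for odd values).
counts = Counter(Fraction(sum(x % 2 for x in s), N_ROLLS) for s in samples)

# Exact probability of each possible sample proportion.
distribution = {p: Fraction(c, len(samples)) for p, c in sorted(counts.items())}
mean_of_proportions = sum(p * prob for p, prob in distribution.items())

for p, prob in distribution.items():
    print(float(p), prob)
print(mean_of_proportions)   # 1/2
```

The mean of the sample proportions is exactly 1/2, the population proportion of odd numbers, and the probabilities are symmetric about 0.5.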
Unbiased Estimators The preceding three examples show that sample means, variances, and proportions tend to target the corresponding population parameters. More formally, we say that sample means, variances, and proportions are unbiased estimators. That is, their sampling distributions have a mean that is equal to the corresponding population parameter. If we want to use a sample statistic (such as a sample proportion from a survey) to estimate a population parameter (such as the population proportion), it is important that the sample statistic used as the estimator targets the population parameter instead of being a biased estimator in the sense that it systematically underestimates or overestimates the parameter. The preceding three examples and Table 6-2 involve the mean, variance, and proportion, but here is a summary that includes other statistics.
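Estimators: Unbiased and Biased
Unbiased Estimators These statistics are unbiased estimators. That is, they each target the value of the corresponding population parameter:
• Mean $\bar{x}$
• Variance $s^2$
• Proportion $\hat{p}$
Biased Estimators These statistics are biased estimators. That is, they do not target the value of the corresponding population parameter:
• Median
• Range
• Standard deviation s (Important Note: The sample standard deviations do not target the population standard deviation $\sigma$, but the bias is relatively small in large samples, so s is often used to estimate $\sigma$ even though s is a biased estimator of $\sigma$.)
[Table 6-3 General Behavior of Sampling Distributions]
The preceding three examples all involved rolling a die 5 times, so the number of different possible samples is $6 \times 6 \times 6 \times 6 \times 6 = 7776$. Because there are so many different possible samples, it is not practical to manually list all of them. The next example involves a smaller number of different possible samples, so we can list them and we can then describe the sampling distribution of the range in the format of a table for the probability distribution.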
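Although 7776 samples are too many to list by hand, a short program can enumerate them. The Python sketch below is an added illustration that uses the standard `statistics` module. It computes the sample standard deviation s of every possible sample of 5 die rolls and compares the mean of those values with the population standard deviation $\sigma$.

```python
import statistics
from itertools import product

FACES = [1, 2, 3, 4, 5, 6]   # a fair six-sided die
N_ROLLS = 5                  # sample size used in Examples 1-3

# Population standard deviation (division by N).
sigma = statistics.pstdev(FACES)

# Sample standard deviation s (division by n - 1) for every possible sample.
sample_sds = [statistics.stdev(s) for s in product(FACES, repeat=N_ROLLS)]
mean_of_s = sum(sample_sds) / len(sample_sds)

print(len(sample_sds))                        # number of samples enumerated
print(round(sigma, 3), round(mean_of_s, 3))   # population value vs. mean of s
print(mean_of_s < sigma)                      # True when s underestimates on average
```

The mean of the sample standard deviations comes out below $\sigma$, which is the sense in which s is a biased estimator of $\sigma$.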
Example 4 Sampling Distribution of the Range Three randomly selected households are surveyed as a pilot project for a larger survey to be conducted later. The numbers of people in the households are 2, 3, and 10 (based on Data Set 22 in Appendix B). Consider the values of 2, 3, and 10 to be a population. Assume that samples of size $n = 2$ are randomly selected with replacement from the population of 2, 3, and 10.
a. List all of the different possible samples, then find the range in each sample.
b. Describe the sampling distribution of the ranges in the format of a table summarizing the probability distribution.
c. Describe the sampling distribution of the ranges in the format of a probability histogram.
d. Based on the results, do the sample ranges target the population range, which is 8?
e. What do these results indicate about the sample range as an estimator of the population range?
a. Table 6-4 lists the nine different possible samples of size $n = 2$, along with the range of each sample.
b. The nine samples in Table 6-4 are all equally likely, so each sample has a probability of 1/9. The last two columns of Table 6-4 list the values of the range along with the corresponding probabilities, so the last two columns constitute a table summarizing the probability distribution, which can be condensed as shown in Table 6-5. Table 6-5 therefore describes the sampling distribution of the sample ranges.
c. Figure 6-17 is the probability histogram based on Table 6-5.
d. The mean of the nine sample ranges is 3.6, but the range of the population is 8. Consequently, the sample ranges do not target the population range.
e. Because the mean of the sample ranges (3.6) does not equal the population range (8), the sample range is a biased estimator of the population range. We can also see that the range is a biased estimator by simply examining Table 6-5 and noting that most of the time, the sample range is well below the population range of 8.
In this example, we conclude that the sample range is a biased estimator of the population range. This implies that, in general, the sample range should not be used to estimate the value of the population range.
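Table 6-4 Sampling Distribution of the Range
Sample    Range    Probability
2, 2      0        1/9
2, 3      1        1/9
2, 10     8        1/9
3, 2      1        1/9
3, 3      0        1/9
3, 10     7        1/9
10, 2     8        1/9
10, 3     7        1/9
10, 10    0        1/9
Table 6-5 (Table 6-4 condensed)
Range    Probability
0        3/9
1        2/9
7        2/9
8        2/9
[Figure 6-17 Probability Histogram: Sampling Distribution of the Sample Ranges]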
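The steps of Example 4 can be reproduced with a few lines of code. The Python sketch below is an added illustration with arbitrary variable names. It lists the nine samples, condenses the ranges into a probability distribution, and compares the mean of the sample ranges with the population range.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [2, 3, 10]   # household sizes from Example 4
SAMPLE_SIZE = 2           # samples of size n = 2, selected with replacement

# Part (a): the nine equally likely samples and the range of each.
samples = list(product(population, repeat=SAMPLE_SIZE))
ranges = [max(s) - min(s) for s in samples]

# Part (b): condense the ranges into a probability distribution.
range_counts = Counter(ranges)
distribution = {r: Fraction(c, len(samples)) for r, c in sorted(range_counts.items())}

# Parts (d) and (e): compare the mean of the sample ranges with the population range.
mean_range = sum(r * p for r, p in distribution.items())
population_range = max(population) - min(population)

print(distribution)                   # probabilities print in lowest terms
print(mean_range, population_range)   # 32/9 8
```

The exact mean of the sample ranges is 32/9, which rounds to the 3.6 quoted in the example and is well below the population range of 8.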
Example 5 Sampling Distribution of the Proportion In a study of gender selection methods, an analyst considers the process of generating 2 births. When 2 births are randomly selected, the sample space is bb, bg, gb, gg. Those 4 outcomes are equally likely, so the probability of 0 girls is 0.25, the probability of 1 girl is 0.5, and the probability of 2 girls is 0.25. Describe the sampling distribution of the proportion of girls from 2 births as a probability distribution table and also describe it as a probability histogram.
See the accompanying display. The top table summarizes the probability distribution for the number of girls in 2 births. That top table can be used to construct the probability distribution for the proportion of girls in 2 births as shown. The top table can also be used to construct the probability histogram as shown.
Number of Girls from 2 Births
x      P(x)
0      0.25
1      0.50
2      0.25
Sampling distribution of the proportion of girls from 2 births
Proportion of girls from 2 births    Probability
0                                    0.25
0.5                                  0.50
1                                    0.25
[Probability histogram: proportion of girls from 2 births (0, 0.5, 1) on the horizontal axis, probability on the vertical axis]
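The conversion from the sample space to the sampling distribution in Example 5 can be sketched in Python as follows. This is an added illustration. The strings `bb`, `bg`, `gb`, and `gg` come from the example, and the variable names are arbitrary.

```python
from collections import Counter
from fractions import Fraction

sample_space = ["bb", "bg", "gb", "gg"]   # equally likely outcomes of 2 births

# Proportion of girls in each outcome.
counts = Counter(Fraction(o.count("g"), len(o)) for o in sample_space)

# Probability of each proportion: matching outcomes divided by all outcomes.
distribution = {p: Fraction(c, len(sample_space)) for p, c in sorted(counts.items())}
mean_proportion = sum(p * prob for p, prob in distribution.items())

for p, prob in distribution.items():
    print(float(p), float(prob))
print(mean_proportion)   # 1/2
```

The printed pairs match the second table in the display, and the mean of the sample proportions is 1/2.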
Example 5 shows that a sampling distribution can be described with a table or a graph. Sampling distributions can also be described with a formula (as in Exercise 21), or may be described in some other way, such as this: "The sampling distribution of the sample mean is a normal distribution with $\mu = 100$ and $\sigma = 15$."
Why sample with replacement? All of the examples in this section involved sampling with replacement. Sampling without replacement would have the very practical advantage of avoiding wasteful duplication whenever the same item is selected more than once. However, we are particularly interested in sampling with replacement for these two reasons:
1. When selecting a relatively small sample from a large population, it makes no significant difference whether we sample with replacement or without replacement.
2. Sampling with replacement results in independent events that are unaffected by previous outcomes, and independent events are easier to analyze and result in simpler calculations and formulas.
For the above reasons, we focus on the behavior of samples that are randomly selected with replacement. Many of the statistical procedures discussed in the following chapters are based on the assumption that sampling is conducted with replacement.
The key point of this section is to introduce the concept of a sampling distribution of a statistic. Consider the goal of trying to find the mean body temperature of all adults. Because that population is so large, it is not practical to measure the temperature of every adult. Instead, we obtain a sample of body temperatures and use it to estimate the population mean. Data Set 2 in Appendix B includes a sample of 106 such body temperatures. The mean for that sample is $\bar{x} = 98.20^\circ$F. Conclusions that we make about the population mean temperature of all adults require that we understand the behavior of the sampling distribution of all such sample means. Even though it is not practical to obtain every possible sample and we are stuck with just one sample, we can form some very meaningful conclusions about the population of all body temperatures. A major goal of the following sections and chapters is to learn how we can effectively use a sample to form conclusions about a population. In Section 6-5 we consider more details about the sampling distribution of sample means, and in Section 6-6 we consider more details about the sampling distribution of sample proportions.
CAUTION
Many methods of statistics require a simple random sample. Some samples, such as voluntary response samples or convenience samples, could easily lead to very wrong results.
Basic Skills and Concepts
Statistical Literacy and Critical Thinking
1. Sampling Distribution In your own words, describe a sampling distribution.
2. Sampling Distribution Data Set 24 in Appendix B includes a sample of FICO credit rating scores from randomly selected consumers. If we investigate this sample by constructing a histogram and finding the sample mean and standard deviation, are we investigating the sampling distribution of the mean? Why or why not?
3. Unbiased Estimator What does it mean when we say that the sample mean is an unbiased estimator, or that the sample mean "targets" the population mean?
4. Sampling with Replacement Give two reasons why statistical methods tend to be based on the assumption that sampling is conducted with replacement, instead of without replacement.
5. Good Sample? You want to estimate the proportion of all U.S. college students who have the profound wisdom to take a statistics course. You obtain a simple random sample of students at New York University. Is the resulting sample proportion a good estimator of the population proportion? Why or why not?
6. Unbiased Estimators Which of the following statistics are unbiased estimators of population parameters?
a. Sample mean used to estimate a population mean
b. Sample median used to estimate a population median
c. Sample proportion used to estimate a population proportion
d. Sample variance used to estimate a population variance
e. Sample standard deviation used to estimate a population standard deviation
f. Sample range used to estimate a population range
7. Sampling Distribution of the Mean Samples of size n are randomly selected from the population of the last digits of telephone numbers. If the sample mean is found for each sample, what is the distribution of the sample means?
8. Sampling Distribution of the Proportion Samples of size n are randomly selected from the population of the last digits of telephone numbers, and the proportion of even numbers is found for each sample. What is the distribution of the sample proportions?
In Exercises 9–12, refer to the population and list of samples in Example 4.
9. Sampling Distribution of the Median In Example 4, we assumed that samples of size $n = 2$ are randomly selected with replacement from the population consisting of 2, 3, and 10, where the values are the numbers of people in households. Table 6-4 lists the nine different possible samples.
a. Find the median of each of the nine samples, then summarize the sampling distribution of the medians in the format of a table representing the probability distribution. (Hint: Use a format similar to Table 6-5.)
b. Compare the population median to the mean of the sample medians.
c. Do the sample medians target the value of the population median? In general, do sample medians make good estimators of population medians? Why or why not?
10. Sampling Distribution of the Standard Deviation Repeat Exercise 9 using standard deviations instead of medians.
11. Sampling Distribution of the Variance Repeat Exercise 9 using variances instead of medians.
12. Sampling Distribution of the Mean Repeat Exercise 9 using means instead of medians.
13. Assassinated Presidents: Sampling Distribution of the Mean The ages (years) of the four U.S. presidents when they were assassinated in office are 56 (Lincoln), 49 (Garfield), 58 (McKinley), and 46 (Kennedy).
a. Assuming that 2 of the ages are randomly selected with replacement, list the 16 different possible samples.
b. Find the mean of each of the 16 samples, then summarize the sampling distribution of the means in the format of a table representing the probability distribution. (Use a format similar to Table 6-5 on page 283.)
c. Compare the population mean to the mean of the sample means.
d. Do the sample means target the value of the population mean? In general, do sample means make good estimators of population means? Why or why not?
14. Sampling Distribution of the Median Repeat Exercise 13 using medians instead of means.
15. Sampling Distribution of the Range Repeat Exercise 13 using ranges instead of means.
16. Sampling Distribution of the Variance Repeat Exercise 13 using variances instead of means.
17. Sampling Distribution of Proportion Example 4 referred to three randomly selected households in which the numbers of people are 2, 3, and 10. As in Example 4, consider the values of 2, 3, and 10 to be a population and assume that samples of size $n = 2$ are randomly selected with replacement. Construct a probability distribution table that describes the sampling distribution of the proportion of odd numbers when samples of size $n = 2$ are randomly selected. Does the mean of the sample proportions equal the proportion of odd numbers in the population? Do the sample proportions target the value of the population proportion? Does the sample proportion make a good estimator of the population proportion?
18. Births: Sampling Distribution of Proportion When 3 births are randomly selected, the sample space is bbb, bbg, bgb, bgg, gbb, gbg, ggb, and ggg. Assume that those 8 outcomes are equally likely. Describe the sampling distribution of the proportion of girls from 3 births as a probability distribution table. Does the mean of the sample proportions equal the proportion of girls in 3 births? (Hint: See Example 5.)
19. Genetics: Sampling Distribution of Proportion A genetics experiment involves a population of fruit flies consisting of 1 male named Mike and 3 females named Anna, Barbara, and Chris. Assume that two fruit flies are randomly selected with replacement.
a. After listing the 16 different possible samples, find the proportion of females in each sample, then use a table to describe the sampling distribution of the proportions of females.
b. Find the mean of the sampling distribution.
c. Is the mean of the sampling distribution (from part (b)) equal to the population proportion of females? Does the mean of the sampling distribution of proportions always equal the population proportion?
20. Quality Control: Sampling Distribution of Proportion After constructing a new manufacturing machine, 5 prototype integrated circuit chips are produced and it is found that 2 are defective (D) and 3 are acceptable (A). Assume that two of the chips are randomly selected with replacement from this population.
a. After identifying the 25 different possible samples, find the proportion of defects in each of them, then use a table to describe the sampling distribution of the proportions of defects.
b. Find the mean of the sampling distribution.
c. Is the mean of the sampling distribution (from part (b)) equal to the population proportion of defects? Does the mean of the sampling distribution of proportions always equal the population proportion?
Beyond the Basics
21. Using a Formula to Describe a Sampling Distribution Example 5 includes a table and graph to describe the sampling distribution of the proportions of girls from 2 births. Consider the formula shown below, and evaluate that formula using sample proportions x of 0, 0.5, and 1. Based on the results, does the formula describe the sampling distribution? Why or why not?
$P(x) = \dfrac{1}{2(2 - 2x)!\,(2x)!}$ where $x = 0, 0.5, 1$
22. Mean Absolute Deviation Is the mean absolute deviation of a sample a good statistic for estimating the mean absolute deviation of the population? Why or why not? (Hint: See Example 4.)
6-5 The Central Limit Theorem
Key Concept In this section we introduce and apply the central limit theorem. The central limit theorem tells us that for a population with any distribution, the distribution of the sample means approaches a normal distribution as the sample size increases. In other words, if the sample size is large enough, the distribution of sample means can be approximated by a normal distribution, even if the original population is not normally distributed. In addition, if the original population has mean $\mu$ and standard deviation $\sigma$, the mean of the sample means will also be $\mu$, but the standard deviation of the sample means will be $\sigma/\sqrt{n}$, where n is the sample size.
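The claims about the mean and standard deviation of the sample means can be checked exactly for a small case. The Python sketch below is an added illustration. It reuses the fair die from the earlier examples, which is an assumption made here and not part of this section, and the helper `mean_and_variance_of_sample_means` is a hypothetical name. It works with variances, so the comparison is between the variance of the sample means and $\sigma^2/n$, which is equivalent to a standard deviation of $\sigma/\sqrt{n}$.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)   # fair die: population mean 7/2, population variance 35/12

POP_MEAN = Fraction(sum(FACES), len(FACES))
POP_VAR = sum((x - POP_MEAN) ** 2 for x in FACES) / len(FACES)


def mean_and_variance_of_sample_means(n):
    """Exact mean and variance of the sample mean over all samples of size n."""
    means = [Fraction(sum(s), n) for s in product(FACES, repeat=n)]
    center = sum(means) / len(means)
    spread = sum((m - center) ** 2 for m in means) / len(means)
    return center, spread


for n in (1, 2, 5):
    center, spread = mean_and_variance_of_sample_means(n)
    # Each line shows the two equalities holding for this sample size.
    print(n, center == POP_MEAN, spread == POP_VAR / n)
```

For each sample size tried, the mean of the sample means equals $\mu$ and their variance equals $\sigma^2/n$. This checks only the mean and spread, not the shape of the distribution.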