Quantitative Methods for Business chapter 16 doc


CHAPTER 16

Test driving – sampling theory, estimation and hypothesis testing

Chapter objectives

This chapter will help you to:

■ understand the theory behind the use of sample results for prediction
■ make use of the t distribution and appreciate its importance
■ construct and interpret interval estimates of population means and population proportions
■ work out necessary sample sizes for interval estimation
■ carry out tests of hypotheses about population means, proportions and medians, and draw appropriate conclusions from them
■ use the technology: the t distribution, estimation and hypothesis testing in EXCEL, MINITAB and SPSS
■ become acquainted with the business origins of the t distribution

In the previous chapter we reviewed the methods that can be used to select samples from populations in order to gain some understanding of those populations. In this chapter we will consider how sample results can be used to provide estimates of key features, or parameters, of the populations from which they were selected.

It is important to note that the techniques described in this chapter, and the theory on which they are based, should only be used with results of samples selected using probabilistic or random sampling methods. The techniques are based on knowing, or at least having a reliable estimate of, the sampling error and this is not possible with non-random sampling methods.

In Chapter 13 we looked at the normal distribution, an important statistical distribution that enables you to investigate the very many continuous variables that occur in business and many other fields, whose values are distributed in a normal pattern. What makes the normal distribution especially important is that it enables us to anticipate how sample results vary. This is because many sampling distributions have a normal pattern.
16.1 Sampling distributions

Sampling distributions are distributions that show how sample results vary. They depict the ‘populations’ of sample results. Such distributions play a crucial role in quantitative work because they enable us to use data from a sample to make statistically valid predictions or judgements about a population. There are considerable advantages in using sample results in this way, especially when the population is too large to be accessible, or if investigating the population is too expensive or time-consuming.

A sample is a subset of a population, that is, it consists of some observations taken from the population. A random sample is a sample that consists of values taken at random from the population.

You can take many different random samples from the same population, even samples that consist of the same number of observations. Unless the population is very small the number of samples that you could take from it is to all intents and purposes infinite. A ‘parent’ population can produce an effectively infinite number of ‘offspring’ samples.

These samples will have different means, standard deviations and so on. So if we want to use, say, a sample mean to predict the value of the population mean we will be using something that varies from sample to sample, the sample mean (x̄), to predict something that is fixed, the population mean.

To do this successfully we need to know how the sample means vary from one sample to another. We need to think of sample means as observations, x̄s, of a variable, X̄, and consider how they are distributed. What is more we need to relate the distribution of sample means to the parameters of the population the samples come from so that we can use sample statistics to predict population measures. The distribution of X̄, the sample means, is a sampling distribution.
We will begin by considering the simplest case, in which we assume that the parent population is normally distributed. If this is the case, what will the sampling distributions of means of samples taken from the population be like?

If you were to take all possible random samples consisting of n observations from a population that is normal, with a mean μ and a standard deviation σ, and analyse them you would find that the sample means of all these samples will themselves be normally distributed.

You would find that the mean of the distribution of all these different sample means is exactly the same as the population mean, μ. You would also find that the standard deviation of all these sample means is the population standard deviation divided by the square root of the size of the samples, σ/√n.

So the sample means of all the samples of size n that can be taken from a normal population with a mean μ and a standard deviation σ are distributed normally with a mean of μ and a standard deviation of σ/√n. In other words, the sample means are distributed around the same mean as the population itself but with a smaller spread.

We know that the sample means will be less spread out than the population because n will be more than one, so σ/√n will be less than σ. For instance, if there are four observations in each sample, σ/√n will be σ/2, that is, the sampling distribution of means of samples which have four observations in them will have half the spread of the population distribution. The larger the size of the samples, the less the spread in the values of their means. For instance, if each sample consists of 100 observations the standard deviation of the sampling distribution will be σ/10, a tenth of the spread of the population distribution.

This is an important logical point. In taking samples we are ‘averaging out’ the differences between the individual values in the population. The larger the samples, the more this happens.
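The claim that sample means from a normal population have a mean of μ and a standard deviation of σ/√n can be checked informally by simulation. The sketch below is only an illustration: the population values (μ = 50, σ = 10), the number of simulated samples and the helper name `simulate_sample_means` are assumptions made for the example, not figures from the chapter.

```python
import math
import random
import statistics

def simulate_sample_means(mu, sigma, n, num_samples, rng):
    """Draw num_samples random samples of size n from a normal population
    and return the list of their sample means."""
    return [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(num_samples)
    ]

rng = random.Random(16)      # fixed seed so the run is repeatable
mu, sigma = 50.0, 10.0       # hypothetical population mean and standard deviation

for n in (4, 100):
    means = simulate_sample_means(mu, sigma, n, num_samples=4000, rng=rng)
    observed = statistics.stdev(means)   # spread of the simulated sample means
    expected = sigma / math.sqrt(n)      # theoretical value, sigma / sqrt(n)
    print(f"n={n}: mean of means={statistics.fmean(means):.2f}, "
          f"sd of means={observed:.2f}, sigma/sqrt(n)={expected:.2f}")
```

With samples of four observations the simulated spread should come out close to σ/2, and with samples of 100 close to σ/10, although any finite simulation will differ slightly from the theoretical values.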
For this reason it is better to use larger samples to make predictions about a population.

Next time you see an opinion poll look for the number of people that the pollsters have canvassed. It will probably be at least one thousand. The results of an opinion poll are a product that the polling organization wants to sell to media companies. In order to do this they have to persuade them that their poll results are likely to be reliable. They won’t be able to do this if they only ask a very few people for their opinions!

The standard deviation of a sampling distribution, σ/√n, is also known as the standard error of the sampling distribution because it helps us anticipate the error we will have to deal with if we use a sample mean to predict the value of the population mean.

If we know the mean and standard deviation of the parent population distribution we can find the probabilities of ranges of different sample means as we can do for any other normal distribution, by using the Standard Normal Distribution.

Example 16.1

Reebar Frozen Foods produces packs of four fish portions. On the packaging they claim that the average weight of the portions is 120 g. If the mean weight of the fish portions they buy is 124 g with a standard deviation of 4 g, what is the probability that the mean weight of a pack of four portions will be less than 120 g?

We will assume that the selection of the four portions to put in a pack is random. Imagine we took every possible sample of four portions from the population of fish portions purchased by Reebar (which we will assume for practical purposes to be infinite) and calculated the mean weight of each sample. We would find that the sampling distribution of all these means has a mean of 124 g and a standard error of 4/√4, which is 2.
The probability that a sample of four portions has a mean of less than 120 g is the probability that a normal variable with a mean of 124 and a standard deviation of 2 is less than 120. The z-equivalent of the value 120 in the sampling distribution is

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{120 - 124}{4/\sqrt{4}} = -2.00$$

From Table 5 on pages 621–622 in Appendix 1 you will find that the probability that z is less than −2.00 is 0.0228 or 2.28%. We can conclude that there is a less than one in forty chance that four portions in a pack chosen at random have a mean weight of less than 120 g.

You might like to compare this with the probability of one fish portion selected at random weighing less than 120 g:

$$z = \frac{x - \mu}{\sigma} = \frac{120 - 124}{4} = -1.00$$

Using Table 5 you will find that the probability that Z is less than −1.00 is 0.1587 or 15.87%, approximately a one in six chance. This is rather greater than the chance of getting a pack of four whose mean weight is less than 120 g (2.28%); in general there is less variation among sample means than there is among single points of data.

The procedure we used in Example 16.1 can be applied whether we are dealing with small samples or with very much larger samples. As long as the population the samples come from is normal we can be sure that the sampling distribution will be distributed normally with a mean of μ and a standard deviation of σ/√n.

But what if the population is not normal? There are many distributions that are not normal, such as distributions of wealth of individuals or distributions of waiting times.

Fortunately, according to a mathematical finding known as the Central Limit Theorem, as long as n is large (which is usually interpreted to mean 30 or more) the sampling distribution of sample means will be normal in shape and have a mean of μ and a standard deviation of σ/√n. This is true whatever the shape of the population distribution.
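The two probabilities in Example 16.1 can be checked without printed tables. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `prob_sample_mean_below` is hypothetical and chosen only for this illustration.

```python
import math
from statistics import NormalDist

def prob_sample_mean_below(x, mu, sigma, n):
    """P(sample mean < x) when the sample means are normally distributed
    with mean mu and standard error sigma / sqrt(n)."""
    standard_error = sigma / math.sqrt(n)
    return NormalDist(mu, standard_error).cdf(x)

# Example 16.1: portions with mean 124 g and standard deviation 4 g
p_pack = prob_sample_mean_below(120, mu=124, sigma=4, n=4)    # mean of a pack of four
p_single = prob_sample_mean_below(120, mu=124, sigma=4, n=1)  # a single portion

print(round(p_pack, 4), round(p_single, 4))  # 0.0228 0.1587
```

Setting n = 1 reduces the standard error to σ itself, which is why the same function covers the single-portion comparison.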
Example 16.2

The times that passengers at a busy railway station have to wait to buy tickets during the rush hour follow a skewed distribution with a mean of 2 minutes 46 seconds and a standard deviation of 1 minute 20 seconds. What is the probability that a random sample of 100 passengers will, on average, have to wait more than 3 minutes?

The sample size, 100, is larger than 30 so the sampling distribution of the sample means will have a normal shape. It will have a mean of 2 minutes 46 seconds, or 166 seconds, and a standard error of 80/√100 seconds.

$$P(\bar{X} > 180 \text{ seconds}) = P\left(Z > \frac{180 - 166}{80/\sqrt{100}}\right) = P\left(Z > \frac{14}{8}\right) = P(Z > 1.75)$$

From Table 5 the probability that Z is more than 1.75 is 0.0401. So the probability that a random sample of 100 passengers will have to wait on average more than 3 minutes is 4.01%, or a little more than a one in twenty-five chance.

If the samples taken from a population that is not normal consist of fewer than 30 observations then the Central Limit Theorem does not apply. The sampling distributions of means of small samples taken from such populations do not have a normal pattern.

At this point you may find it useful to try Review Questions 16.1 to 16.6 at the end of the chapter.

16.1.1 Estimating the standard error

The main reason for being interested in sampling distributions is to help us use samples to assess populations because studying the whole population is not possible or practicable. Typically we will be using a sample, which we do know about, to investigate a population, which we don’t know about. We will have a sample mean and we will want to use it to assess the likely value of the population mean.

So far we have measured sampling distributions using the mean and the standard deviation of the population, μ and σ. But if we need to find out about the population using a sample, how can we possibly know the values of μ and σ?
The answer is that in practice we don’t. In the case of the population mean, μ, this doesn’t matter because typically it is something we are trying to assess. But without the population standard deviation, σ, we do need an alternative approach to measuring the spread of a sampling distribution.

Because we will have a sample, the obvious answer is to use the sample standard deviation, s, in place of the population standard deviation, σ. So instead of using the real standard error, σ/√n, we estimate the standard error of the sampling distribution with s/√n.

Using the estimated standard error, s/√n, is fine as long as the sample concerned is large (in practice, that n, the sample size, is at least 30). If we are dealing with a large sample we can use s/√n as an approximation of σ/√n. The means of samples consisting of n observations will be normally distributed with a mean of μ and an estimated standard error of s/√n. The Central Limit Theorem allows us to do this even if the population the sample comes from is not itself normal in shape.

Example 16.3

The mean volume of draught beer served in pint glasses in the Nerry Ash Leavy Arms pub is known to be 0.538 litres. A consumer organization takes a random sample of 36 pints of draught beer and finds that the standard deviation of this sample is 0.066 litres. What is the probability that the mean volume of the sample will be less than a pint (0.568 litres)?

The population mean, μ, in this case is 0.538 and the sample standard deviation, s, is 0.066. We want the probability that x̄ is less than 0.568, P(X̄ < 0.568). The z-equivalent of 0.568 is:

$$z = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{0.568 - 0.538}{0.066/\sqrt{36}} = 2.73 \text{ to 2 decimal places}$$

So P(X̄ < 0.568) = P(Z < 2.73).

If you look at Table 5 you will find that the probability that Z is less than 2.73 is 0.9968, so the probability that the sample mean is less than a pint is 0.9968 or 99.68%.

It is important to remember that s/√n is not the real standard error, it is the estimated standard error, but because the standard deviation of a large sample will be reasonably close to the population standard deviation the estimated standard error will be close to the actual standard error.

At this point you may find it useful to try Review Question 16.7 at the end of the chapter.

16.1.2 The t distribution

In section 16.1.1 we looked at how you can analyse sampling distributions using the sample standard deviation, s, when you do not know the population standard deviation, σ. As long as the sample size, n, is 30 or more the estimated standard error will be a sufficiently consistent measure of the spread of the sampling distribution, whatever the shape of the parent population.

If, however, the sample size, n, is less than 30 the estimated standard error, s/√n, is generally not so close to the actual standard error, σ/√n, and the smaller the sample size, the greater will be the difference between the two. In this situation it is possible to model the sampling distribution using the estimated standard error, as long as the population the sample comes from is normal, but we have to use a modified normal distribution in order to do it.

This modified normal distribution is known as the t distribution. The development of the distribution was a real breakthrough because it made it possible to investigate populations using small sample results. Small samples are generally much cheaper and quicker to gather than large samples, so the t distribution broadened the scope for analysis based on sample data.

The t distribution is a more spread out version of the normal distribution. The difference between the two is illustrated in Figure 16.1. The greater spread is to compensate for the greater variation in sample standard deviations between small samples than between large samples.
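To see the ‘more spread out’ shape in numbers, the sketch below compares the Standard Normal Distribution with a t distribution using only the Python standard library. It is an illustration under stated assumptions rather than a method from the chapter: the density uses the standard formula for Student's t, the tail area comes from a simple Simpson's rule integration, and the 8 degrees of freedom and the cut-off of 1.86 are values picked for the example. The function names `t_pdf` and `t_upper_tail` are hypothetical.

```python
import math
from statistics import NormalDist

def t_pdf(t: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T > t) for t >= 0: integrate the density over [0, t] with
    Simpson's rule, then use symmetry about zero (steps must be even)."""
    if t < 0 or steps % 2:
        raise ValueError("t must be non-negative and steps must be even")
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

z = NormalDist()        # Standard Normal Distribution
df, cutoff = 8, 1.86    # illustrative degrees of freedom and cut-off value

print(f"peak height: normal={z.pdf(0):.4f}, t={t_pdf(0, df):.4f}")
print(f"area above {cutoff}: normal={1 - z.cdf(cutoff):.4f}, "
      f"t={t_upper_tail(cutoff, df):.4f}")
```

The t curve has a lower peak and more area in its tails than the normal curve, which is the extra spread that compensates for estimating the standard error from a small sample. In practice a statistics package would normally be used for such values.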
The smaller the sample size, the more compensation is needed, so there are a number of versions of the t distribution. The one that should be used in a particular context depends on the number of degrees of freedom, represented by the symbol ν (nu, the Greek letter n), which is the sample size minus one, n − 1.

[Figure 16.1 The Standard Normal Distribution (solid line) and the t distribution (dotted line)]

To work out the probability that the mean of a small sample taken from a normal population is more, or less, than a certain amount we first need to find its t-equivalent, or t value. The procedure is very similar to the way we work out a z-equivalent.

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

Example 16.4

A customer visiting the pub in Example 16.3 purchases nine pints of draught beer. The precise volumes of the pints served are known to be normally distributed with a mean of 0.538 litres and the standard deviation of the volumes of the nine pints bought by the customer is 0.048 litres. What is the probability that the mean volume of the nine pints is less than a pint (0.568 litres)?

The population mean, μ, is 0.538 and the sample standard deviation, s, is 0.048. We want the probability that X̄ is less than 0.568, P(X̄ < 0.568). The t value equivalent to 0.568 is:

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{0.568 - 0.538}{0.048/\sqrt{9}} = 1.875$$

So P(X̄ < 0.568) = P(t < 1.875).

You will find some details of the t distribution in Table 6 on page 623 in Appendix 1. Look down the column headed ν on the left hand side until you see the figure 8, the number of degrees of freedom in this case (the sample size is 9). Look across the row to the right and you will see five figures that relate to the t distribution with eight degrees of freedom. The nearest of these figures to 1.875 is the 1.86 that is in the column headed 0.05. This means that 5% of the t distribution with eight degrees of freedom is above 1.86. In other words, the probability that t is more than 1.86 is 0.05. This means that the probability that the mean volume of nine pints will be less than 0.568 litres will be approximately 0.95.

The t value that we used in Example 16.4 could be written as t_{0.05,8} because it is the value of t that cuts off a tail area of 5% in the t distribution that has 8 degrees of freedom. In the same way, t_{0.01,15} represents the t value that cuts off a tail area of 1% in the t distribution that has 15 degrees of freedom.

You will find that the way the t distribution is used in further work depends on tail areas. For this reason, and also because the t distribution varies depending on the number of degrees of freedom, printed tables do not provide full details of the t distribution in the same way that Standard Normal Distribution tables give full details of the Standard Normal Distribution. Table 6 on page 623 gives selected values of t from the t distribution with different degrees of freedom for the most commonly used tail areas. If you need t distribution values that are not in Table 6 you can obtain them using computer software, as shown in section 16.4 later in this chapter.

Example 16.5

Use Table 6 to find:

(a) t with 4 degrees of freedom that cuts off a tail area of 0.10, t_{0.10,4}
(b) t with 10 degrees of freedom that cuts off a tail area of 0.01, t_{0.01,10}
(c) t with 17 degrees of freedom that cuts off a tail area of 0.025, t_{0.025,17}
(d) t with 100 degrees of freedom that cuts off a tail area of 0.005, t_{0.005,100}.

From Table 6:

(a) t_{0.10,4} is in the row for 4 degrees of freedom and the column headed 0.10, 1.533. This means that the probability that t, with 4 degrees of freedom, is greater than 1.533 is 0.10 or 10%.
(b) t_{0.01,10} is the figure in the row for 10 degrees of freedom and the column headed 0.01, 2.764.
(c) t_{0.025,17} is in the row for 17 degrees of freedom and the 0.025 column, 2.110.
(d) t_{0.005,100} is in the row for 100 degrees of freedom and the 0.005 column, 2.626.

At this point you may find it useful to try Review Questions 16.8 and 16.9 at the end of the chapter.

16.1.3 Choosing the right model for a sampling distribution

The normal distribution and the t distribution are both models that you can use to model sampling distributions, but how can you be sure that you use the correct one? This section is intended to provide a brief guide to making the choice.

The first question to ask is, are the samples whose results make up the sampling distribution drawn from a population that is distributed normally? In other words, is the parent population normal? If the answer is yes then it is always possible to model the sampling distribution. If the answer is no then it is only possible to model the sampling distribution if the sample size, n, is 30 or more.

The second question is whether the population standard deviation, σ, is known. If the answer to this is yes then as long as the parent population is normal the sampling distribution can be modelled using the normal distribution whatever the sample size. If the answer is no the sampling distribution can be modelled using the normal distribution only if the sample size is 30 or more. In the absence of the population standard deviation, you have to use the sample standard deviation to approximate the standard error.

Finally, what if the parent population is normal, the population standard deviation is not known and the sample size is less than 30? In these circumstances you should use the t distribution and approximate the standard error using the sample standard deviation.

[...]

... obtain the z values necessary for other levels of confidence by looking for the appropriate values of α/2 in the body of Table 5 on pages 621–622 in Appendix 1 and finding the z values associated with them.

Example 16.9

Use the sample result in Example 16.7, £47.13, to produce a 98% confidence interval for the population mean.

From Table 5 the z value that ...

... 52.50. This is shown in Figure 16.4. Suppose the researchers calculate the mean of their sample and it is £49.25, a figure inside the interval 47.50 to 52.50 that contains the 95% of sample means within ...

[Figure 16.3 The sampling distribution in Example 16.6]

... cases is called a one-sided test. Table 16.2 lists the three combinations of hypotheses. In Table 16.2 μ0 is used to represent the value of the population mean, μ, that is to be tested. The type of null hypothesis that should be used depends on the context of the investigation and the perspective of the investigator.

[Table 16.2 Types of hypotheses] Null hypothesis ...

... P(Z > 2.20) = 0.0139. This means P(X̄ > 61.87) = 0.0139 or 1.39%. This is shown in Figure 16.5 in the context of the sampling distribution.

[Figure 16.5 P(X̄ > 61.87) in Example 16.17]

Once we know how likely it is that a sample mean of this order belongs to the sampling ...

... illustrated in Figure 16.7. If the null hypothesis states that the population mean is greater than or equal to a particular value, we would also conduct a one-tail test. But this ...

[Figure 16.7 Rejection region for a one-tail test of a ‘less than or equal’ null hypothesis at the 5% level of significance]

...
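The tail probability quoted in the Example 16.17 fragment, and the z values that mark off rejection regions at the 5% level of significance, can be reproduced from the Standard Normal Distribution. This is a minimal sketch with Python's standard library; the variable names are chosen for illustration, and only the z value 2.20 and the 5% level are taken from the text.

```python
from statistics import NormalDist

z = NormalDist()                     # Standard Normal Distribution

upper_tail = 1 - z.cdf(2.20)         # P(Z > 2.20)
one_tail_cutoff = z.inv_cdf(0.95)    # z value with 5% of the area above it
two_tail_cutoff = z.inv_cdf(0.975)   # z value with 2.5% of the area above it

print(round(upper_tail, 4), round(one_tail_cutoff, 3), round(two_tail_cutoff, 2))
# 0.0139 1.645 1.96
```

For a one-tail test of a ‘greater than or equal’ null hypothesis the rejection region lies in the lower tail, so by symmetry the cut-off is the negative of `one_tail_cutoff`.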
interval will be accurate. To put it another way, on average 19 out of every 20 samples will produce an accurate estimate, and 1 out of 20 will not. That is why the interval is called a 95% interval estimate or a 95% confidence interval. We can express the procedure for finding an interval estimate for a population measure as taking a sample result and adding ...

... (z_{α/2} * σ/√n) we use

estimate of μ = x̄ ± (z_{α/2} * s/√n)

Example 16.11

The mean weight of the cabin baggage checked in by a random sample of 40 passengers at an international airport departure terminal was 3.47 kg. The sample standard deviation was 0.82 kg. Construct a 90% confidence interval for the mean weight of cabin baggage checked in by passengers ...

... called the rejection regions, since we reject the null hypothesis if our sample mean is located in one of those parts of the distribution. Another way of applying the decision rule is to use the z values that cut off the ...

[Figure 16.6 Rejection regions for a two-tail test at the 5% level of significance]

... what sample size to use. You therefore need to make a prior assumption about the value of the sample proportion. To be on the safe side we will assume the worst-case scenario, which is that the value of p will be the one that produces the highest value of p(1 − p). The higher the value of p(1 − p) the wider the interval will be, for a given sample size. We need ...
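The interval estimate formula quoted above, x̄ ± (z_{α/2} * s/√n), can be applied to the figures given in Example 16.11. The sketch below is an illustration only: the function name `mean_confidence_interval` is hypothetical, and because the z value is computed by software rather than read from Table 5, the last decimal places may differ slightly from a table-based answer.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(x_bar, s, n, confidence):
    """Large-sample interval estimate of the population mean:
    x_bar +/- z_(alpha/2) * s / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z value cutting off alpha/2 in the upper tail
    margin = z * s / math.sqrt(n)             # z times the estimated standard error
    return x_bar - margin, x_bar + margin

# Example 16.11: 40 passengers, sample mean 3.47 kg, sample standard deviation 0.82 kg
low, high = mean_confidence_interval(x_bar=3.47, s=0.82, n=40, confidence=0.90)
print(f"{low:.2f} kg to {high:.2f} kg")  # 3.26 kg to 3.68 kg
```

Raising the confidence level widens the interval, while a larger sample narrows it, since n appears under the square root in the standard error.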
the 1% level of significance.

Alternatively we can use Table 5 to find that P(Z < −2.64) = 1 − 0.9959 = 0.0041. This means that the probability a sample mean is less than 23.5, P(X̄ < 23.5), is 0.0041. Because this is less than 0.005 (half of α; this is a two-tail test), reject H0 at the 1% level of significance.

In Example 16.20 the sample size, 15, is ...
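The two-tail decision rule described in this fragment, comparing the tail probability with half of α, can be written out as a short check. This is a sketch under assumptions: only the z value of −2.64 and the 1% level of significance come from the text, and the function name `two_tail_decision` is hypothetical.

```python
from statistics import NormalDist

def two_tail_decision(z_stat, alpha):
    """Return the tail probability beyond z_stat and whether the null
    hypothesis is rejected in a two-tail test at significance level alpha."""
    tail = NormalDist().cdf(-abs(z_stat))   # area in the tail beyond the test statistic
    return tail, tail < alpha / 2           # compare with half of alpha

tail, reject = two_tail_decision(z_stat=-2.64, alpha=0.01)
print(round(tail, 4), reject)  # 0.0041 True
```

Using `-abs(z_stat)` means the same function handles a test statistic in either tail.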

Date posted: 06/07/2014, 00:20
