CHAPTER 15
Taking short cuts – sampling methods

Chapter objectives
This chapter will help you to:
■ appreciate the reasons for sampling
■ understand sampling bias and how to avoid it
■ employ probabilistic sampling methods and be aware of their limitations
■ use the technology: simple random sampling in MINITAB and SPSS
■ become acquainted with business uses of sampling methods

A population is the entire set of items or people that form the subjects of study in an investigation and a sample is a subset of a population. Companies need to know about the populations they deal with: populations of customers, employees, suppliers, products and so on. Typically these populations are very large, so large that they are to all intents and purposes infinite.

Gathering data about such large populations is likely to be very expensive, time-consuming and to a certain extent impractical. The scale of expense can be immense; even governments of large countries only commit resources to survey their entire populations, that is, to conduct a census, about every ten years. The amount of time involved in surveying the whole population means that it may be so long before the results are available that they are completely out of date. There may be some elements within the population that simply cannot be included in a survey of it; for instance, a car manufacturer may want to conduct a survey of all customers who bought a certain model three years earlier in order to gauge customer satisfaction. Inevitably a number of those customers will have died in the period since buying their car and thus cannot be included in the survey.

To satisfy their need for data about the populations that matter to them without having to incur great expense or wait a long time for results, companies turn to sampling, the process of taking a sample from a population in order to use the sample data to gain insight into the entire population.
Although not as accurate as the results of a population survey, sample results can be precise enough to serve the purposes of the investigation.

The downside of sampling is that many different samples can be taken from the same population, even if the samples are the same size. You can work out the number of samples of n items that could be selected from a population of N items:

$$\text{Number of samples of size } n = \frac{N!}{n!\,(N-n)!}$$

We can use this to work out the number of samples of size 6 that could be selected from a very small population of just 20 items:

$$\text{Number of samples of size } 6 = \frac{20!}{6!\,14!} = 38{,}760$$

You can imagine that the number of samples that could be selected from a much larger population will be so very large as to border on the infinite.

Each of the samples you could select from a population inevitably excludes much of the population, so sample results will not be precisely the same as those from the entire population. There will be differences known as sampling errors between sample results and the results of a population survey and furthermore different samples will yield different results and hence different sampling errors.

In this chapter you will find details of a variety of sampling methods, but before we look at them we need to consider what companies might look for in sampling, and what they would prefer to avoid.

15.1 Bias in sampling
The point of selecting a sample from a population is to study it and use the results to understand the population. To be effective a sample should therefore reflect the population as a whole. However, there is no guarantee that the elements of the population that are chosen for the sample will collectively reflect the population. Even if the population is quite small there will be an enormous number of combinations of elements that you could select in a sample. Inevitably some of these samples will represent the entire population better than others.
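The count of possible samples quoted for the population of 20 items is easy to check numerically. The short Python sketch below is only an illustration: the function name `number_of_samples` is introduced here and is not part of the chapter, and the population of 2000 with a sample of 400 is borrowed from the Strani Systems example used later in the chapter.

```python
import math


def number_of_samples(N: int, n: int) -> int:
    """Number of distinct samples of n items from a population of N items: N! / (n! (N - n)!)."""
    return math.factorial(N) // (math.factorial(n) * math.factorial(N - n))


# The small population used in the text: samples of size 6 from 20 items.
print(number_of_samples(20, 6))

# The standard library offers the same calculation directly (Python 3.8 or later).
print(math.comb(20, 6))

# For a larger population the count is astronomically large, so only its
# number of digits is printed here.
print(len(str(number_of_samples(2000, 400))))
```

The first two lines print the same figure as the worked calculation, 38,760; the last line shows how quickly the number of possible samples grows with the size of the population.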
Although it is impossible to avoid the possibility of getting an unrepresentative sample, it is important to avoid using a sampling method that will almost invariably lead to your getting an unrepresentative sample. This means avoiding bias in selecting your sample.

Effective methods of sampling are those that minimize the chances of getting unrepresentative samples and allow you to anticipate the degree of sampling error using appropriate probability distributions. Such methods should give every element of the population the same chance of being selected in a sample as any other element of the population, and consequently every possible sample of a certain size the same chance of selection as every other sample of the same size.

If some elements of the population have a greater chance of being selected in a sample than others, then we have bias in our sampling method. Bias has to be avoided as the samples that can result will be extremely unlikely to reflect the population as a whole and such misleading results may have disastrous consequences.

Example 15.1
Packaged potato crisps are sold by the million every day of the week; it is a huge market. You might think that the company that pioneered the product would by now be a very large and successful one, but you would be wrong; after their initial success they ran into problems that eventually led to their being taken over. Occasionally the company that now owns the brand re-launches it as a retro product, with the distinctive small blue paper twist of salt in the crisp packet.

A key factor in the decline of the potato crisp pioneers was product quality. The company received a consistent stream of complaints from customers about the number of charred and green-tinged crisps. The company directors knew of these complaints but were baffled by them; they knew their product was good because they tasted a sample taken from the production line every day with their morning coffee.
The problem for the directors was the method used to take the samples from the production line. The sample was selected by the shopfloor staff, who knew they were destined for the boardroom and quite understandably ensured that only the best were selected. The samples provided for the directors were therefore biased; the charred and green crisps that their customers wrote about had no chance of being selected in the samples taken for the directors.

In Example 15.1 the company directors were completely misled by the bias in the selection of their samples of potato crisps. Biased samples will mislead, no matter how large the samples are; in fact, the larger such samples are, the greater the danger of misrepresentation since it is always tempting to attach more credibility to a large sample. The directors were reluctant to take action to deal with a problem they were convinced did not exist. This made it easier for competitors to enter the market and the initial advantage the pioneers enjoyed was lost.

The most effective way of avoiding bias in sample selection is to use probabilistic methods, which ensure that every element in the population has the same chance of being included in the sample. In the next section we will look at sampling methods that yield samples from which you can produce unbiased estimators of population measures, or parameters such as a mean or a proportion.

Example 15.2
In the 1936 presidential election in the USA the incumbent Democrat, Franklin Roosevelt, faced the Republican governor of Kansas, Alfred Landon. Roosevelt was associated with the New Deal programme of large-scale public expenditure to alleviate the high level of unemployment in the depression of the time. Landon on the other hand wanted to end what he considered government profligacy.
The prominent US weekly magazine of the time, The Literary Digest, conducted one of the largest polls ever undertaken to predict the result of the election. After analysing the returns from over 2 million respondents, the Digest confidently predicted that Landon would win by a large margin, 56% to 44%. The actual result was that Roosevelt won by a large margin, obtaining 60% of the vote.

How could the Digest poll have been so wrong? The answer lay in the sampling method they used. They sent postcards to millions of people listed in telephone directories, car registration files and magazine subscription lists. The trouble was that in the USA of 1936 those who had telephones and cars and subscribed to magazines were the better-off citizens. In restricting the poll to such people, who largely supported Landon, the poll was biased against the poor and unemployed, who largely voted for Roosevelt.

15.2 Probabilistic sampling methods
Perhaps the obvious way of giving every element in a population the same chance of being selected in a sample is to use a random process such as those used to select winning numbers in lottery competitions. Lotteries are usually regarded as fair because every number in the population of lottery numbers has an equal chance of being picked as a winning number.

15.2.1 Simple random sampling
Selecting a set of winning numbers in a lottery is an example of simple random sampling, whether the process involves elaborate machines or simply picking the numbers from the proverbial hat. You can use the same approach in drawing samples from a population.

Before you can undertake simple random sampling you need to establish a clear definition of the population and compile a list of the elements in it. In the same way as all the numbers in a lottery must be included if the draw is to be fair, all the items in the population must be included for the sample we take to be random. The population list is the basis or framework of our sample selection so it is known as the sampling frame.

Once you have the sampling frame you need to number each element in it and then you can use random numbers to select your sample. If you have 100 elements in the population and you need a sample of 15 from it you can take a sequence of 15 two-digit random numbers from Table 4 on page 620 in Appendix 1 and select the elements for the sample accordingly; for instance if the first random number is 71 you take the 71st element on the sampling frame, if the second random number is 09 you take the ninth element and so on. If the random number 00 occurs in the sequence you take the 100th element.

Example 15.3
Strani Systems have 2000 employees in the UK. The HR director of the company wants to select a sample of 400 employees to answer questions about their experience of working for the company. How should she go about using simple random sampling?

The population in this case consists of all the Strani employees in the UK. The sampling frame would be a list of employees, perhaps the company payroll, with each employee numbered from 1 to 2000. The HR director should then take a sequence of four-digit random numbers such as those listed along row 7 of Table 4:

1426 7156 7651 0042 9537 2573 and so on

She does face a problem in that only two of the random numbers, 1426 and 0042, will enable her to select an employee from the list as the others are well above 2000. To get round this she could simply ignore the ones that are too high and continue until she has 400 random numbers that are in the appropriate range. This may take considerable time and she may prefer to replace the first digit in each number so that in every case it is either 0 or 1, making all the four-digit numbers fall in the range 0000 to 1999 (0000 would be used for the 2000th employee):

Change 0, 2, 4, 6, 8 to 0
Change 1, 3, 5, 7, 9 to 1

By applying this to the figures from row 7 of Table 4 she would get:

1426 1156 1651 0042 1537 0573

Now she can use every number in the sequence to select employees for the sample.

Simple random sampling has several advantages; it is straightforward and inexpensive. Because the probability of selection is known it is possible to assess the sampling error involved and ensure that estimates of population parameters based on the sample are unbiased.

A potential disadvantage of simple random sampling is that in a case such as Example 15.3 the sample may consist of elements all over the country, which will make data collection expensive. Another is that whilst it is an appropriate method for largely homogeneous populations, if a population is subdivided by, for instance, gender and gender is an important aspect of the analysis, using simple random sampling will not ensure suitable representation of both genders.

15.2.2 Systematic random sampling
A faster alternative to simple random sampling is systematic sampling. This involves selecting a proportion of elements from the sampling frame by choosing elements at regular intervals through the list. The first element is selected using a random number.

Example 15.4
How can the HR director in Example 15.3 use systematic sampling to select her sample of 400 employees?

Since there are 2000 employees and she wants a sample of 400, she needs to select every fifth employee in the list that constitutes the sampling frame. To decide whether she should start with the first, second, third, fourth or fifth employee on the list she could take a two-digit random number and if it is between 00 and 19 start with the first employee, between 20 and 39 the second, between 40 and 59 the third, between 60 and 79 the fourth, and between 80 and 99 the fifth. The first two-digit number at the top of column 9 of Table 4 is 47, so using this she should start with the third employee and proceed to take every fifth name after that.

As well as being cheap and simple, systematic sampling yields samples with a definable sampling error, which can therefore be used to produce unbiased estimates. This is true as long as the population list used to select the sample is not drawn up in such a way as to give rise to bias. In Example 15.4 a list of employees in alphabetical order should not result in bias but if most employees worked in teams of five, one of whom was the team leader, and the list of employees was set out by teams rather than surnames, then the systematic sampling of every fifth employee would generate a sample in which either all or none of the employees selected were team leaders.

Systematic sampling has the same disadvantages as simple random sampling: expensive data collection if the sample members are widely dispersed, and the possibility of sub-sections of the population being under-represented.

15.2.3 Stratified random sampling
One problem with both sampling methods we have looked at so far is that the samples they produce may not adequately reflect the balance of different constituencies within the population. In the long run this unevenness will be balanced out by other samples selected using the same methods, but this is little comfort if you only have the time or resources to take one sample.

To avoid sections of a population being under-represented, or for that matter over-represented, you can use stratified random sampling. As the name implies, the sample selection is random, but it is structured using the sections or strata in the population. The starting point is to define the size of the sample and then decide what proportion of each section of the population needs to be selected for the sample. Once you have decided how many elements you need from each section, then use simple random sampling to choose them. This ensures that all the sections of the population are represented in the sample yet preserves the random nature of the selection and thus your ability to produce unbiased estimators of the population parameters from your sample data.

Example 15.5
The 2000 UK employees of Strani Systems are based at six locations; 400 work in Leeds, 800 in Manchester, 200 in Norwich, 300 in Oxford, 100 in Plymouth, and 200 in Reading. How can the HR director in Example 15.3 use stratified random sampling to choose her sample of 400 employees?

A sample of 400 constitutes 20% of the workforce of 2000. To stratify the sample in the same way as the population she should select 20% of the employees from each site; 80 from Leeds, 160 from Manchester, 40 from Norwich, 60 from Oxford, 20 from Plymouth and 40 from Reading. She should then use simple random sampling to choose the sample members from each site. For this she would need a sampling frame for each location.

The advantage of stratified random sampling is that it produces samples that yield unbiased estimators of population parameters whilst ensuring that the different sectors of the population are represented. The disadvantage in a case like Example 15.5 is that the sample consists of widely dispersed members and collecting data from them may be expensive, especially if face-to-face interviews are involved.

15.2.4 Cluster sampling
If the investigation for which you require a sample is based on a population that is widely scattered you may prefer to use cluster sampling. This method is appropriate if the population you wish to sample is composed of geographically distinct units or clusters. You simply take a complete list of the clusters that make up your population and take a random sample of clusters from it. The elements in your sample are all the individuals in each selected cluster.

Example 15.6
How can the HR director from Example 15.3 use cluster sampling to select a sample of employees?

She can make a random selection of two or maybe three locations by simply putting the names of the locations in a hat and drawing two out. All the employees at these locations constitute her sample.

A rather more sophisticated approach would be to make the probability that a location is selected proportionate to its size by putting one ticket in the hat for every 100 employees at a location: four tickets for Leeds, eight for Manchester and so on. As an alternative to drawing tickets from a hat, she could follow the approach we used in section 12.4 of Chapter 12 to simulate business processes and employ random numbers to make the selections in accordance with the following allocations:

Location      Random number allocation
Leeds         00–19
Manchester    20–59
Norwich       60–69
Oxford        70–84
Plymouth      85–89
Reading       90–99

The advantages of cluster sampling are that it is cheap, especially if the investigation involves face-to-face interviews, because the number of locations to visit is small and you only need sampling frames for the selected clusters rather than the entire population.

The disadvantages are that you may well end up with a larger sample than you need and there is a risk that some sections of the population may be under-represented. If Leeds and Manchester were the chosen clusters in Example 15.5, the sample size would be 1200 (the 400 employees at Leeds and the 800 at Manchester), a far larger sample than the HR director requires. If the overall gender balance of the company employees in Example 15.5 is 40% male and 60% female yet this balance was 90% male and 10% female at the Norwich and Reading sites there would be a serious imbalance in the sample if it consisted of employees at those two sites.

15.2.5 Multi-stage sampling
Multi-stage sampling is a generic term for any combination of probabilistic sampling methods. It can be particularly useful for selecting samples from populations that are divided or layered in more than one way.

Example 15.6
The HR director from Example 15.3 likes the idea of cluster sampling as it will result in cost savings for her investigation, but she wants to avoid having a sample of more than 400 employees. How can she use multi-stage sampling to achieve this?

She can use cluster sampling to select her locations and then, rather than contact all the employees at each site, she could use stratified sampling to ensure that the sample size is 400. For instance, if Leeds and Manchester were selected the 1200 employees at those sites constitute three times as many as the HR director requires in her sample so she should select one-third of the employees at each site; 133 at Leeds and 267 at Manchester. She could use either systematic or simple random sampling to choose the sample members.

The advantage of multi-stage sampling is that you can customize your approach to selecting your sample; it enables you to benefit from the advantages of a particular method and use others alongside it to overcome its disadvantages. In Example 15.6 the HR director is able to preserve the cost advantage of cluster sampling and use the other methods to keep to her target sample size. Like other probabilistic methods it produces results that can be used as unbiased estimators of population parameters.
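The Strani Systems examples in section 15.2 can be mirrored in a few lines of Python. What follows is a minimal sketch rather than a definitive implementation: the site sizes come from Example 15.5, but the function names, the layout of the sampling frame as (employee number, site) pairs and the use of Python's `random` module in place of printed random number tables are all assumptions made for illustration. In the cluster step a site that is drawn twice is simply redrawn, which is one reasonable reading of the tickets-in-a-hat procedure and is not stated in the text.

```python
import random

# Site sizes are taken from Example 15.5; all function names are illustrative.
SITES = {"Leeds": 400, "Manchester": 800, "Norwich": 200,
         "Oxford": 300, "Plymouth": 100, "Reading": 200}


def build_frame(sites):
    """Hypothetical sampling frame: (employee number, site) pairs, numbered from 1."""
    frame = []
    for site, size in sites.items():
        for _ in range(size):
            frame.append((len(frame) + 1, site))
    return frame


def remap_first_digit(number):
    """Example 15.3 device: an even first digit becomes 0, an odd first digit becomes 1."""
    return str(int(number[0]) % 2) + number[1:]


def simple_random_sample(frame, n, rng):
    """Draw n distinct elements, each with the same chance of selection."""
    return rng.sample(frame, n)


def systematic_sample(frame, n, rng):
    """Pick a random start in the first interval, then take every k-th element."""
    k = len(frame) // n
    start = rng.randrange(k)
    return frame[start::k][:n]


def stratified_sample(frame, n, rng):
    """Sample each site in proportion to its size (the total is n up to rounding)."""
    strata = {}
    for element in frame:
        strata.setdefault(element[1], []).append(element)
    fraction = n / len(frame)
    sample = []
    for members in strata.values():
        quota = min(len(members), round(len(members) * fraction))
        sample.extend(rng.sample(members, quota))
    return sample


def cluster_sample(frame, sites, n_clusters, rng):
    """Choose distinct sites with probability proportional to size; keep everyone there."""
    names, weights = list(sites), list(sites.values())
    chosen = set()
    while len(chosen) < n_clusters:  # assumption: a repeated site is simply redrawn
        chosen.add(rng.choices(names, weights=weights, k=1)[0])
    return [e for e in frame if e[1] in chosen], chosen


def multi_stage_sample(frame, sites, n_clusters, n, rng):
    """Cluster selection first, then proportional sampling within the chosen sites."""
    clustered, chosen = cluster_sample(frame, sites, n_clusters, rng)
    return stratified_sample(clustered, n, rng), chosen


if __name__ == "__main__":
    rng = random.Random(15)  # fixed seed only so that reruns give the same draw
    frame = build_frame(SITES)
    print([remap_first_digit(x) for x in ["1426", "7156", "7651", "0042", "9537", "2573"]])
    print(len(simple_random_sample(frame, 400, rng)))
    print(systematic_sample(frame, 400, rng)[:3])
    sample, chosen = multi_stage_sample(frame, SITES, 2, 400, rng)
    print(sorted(chosen), len(sample))
```

Applied to the whole frame, `stratified_sample` reproduces the 20% allocation of Example 15.5, and applied to the Leeds and Manchester employees alone it gives the 133 and 267 of the multi-stage example. If the two sites drawn in `multi_stage_sample` hold fewer than 400 employees between them, the sketch returns everyone at those sites, so the sample falls short of the target.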
15.3 Other sampling methods
Wherever possible you should use probabilistic sampling methods, not because they are more likely to produce a representative sample (which is not always true) but because they allow you to make a statistical evaluation of the sampling error and hence you can use the results to make predictions about the population the sample comes from that are statistically valid. Doing this with samples obtained by other methods does not have the same validity.

Why then is it worth looking at other methods at all? There are several reasons: some populations that you might wish to investigate simply cannot be listed, such as the potential customers of a business, so it is impossible to draw up a sampling frame; secondly, some of these methods are attractive because they are convenient; and thirdly, they are used by companies and therefore it is a good idea to be aware of them and their limitations.

15.3.1 Quota sampling
In one respect quota sampling is similar to stratified random sampling: you start by working out what proportion of the population you want to include in your ...

... in Example 15.7 a young male researcher may well be disposed to approach young female employees rather more than others.

15.3.2 Judgemental sampling
In judgemental sampling the selection of the sample is based entirely on the expertise of the investigator, who uses their judgement to select a sample they consider representative.

Example 15.8
The HR director from Example 15.3 knows that the Reading workforce consists almost entirely of younger women who have young children, whereas at other locations the workforce is largely composed of older males with grown-up families. Given this, she may judge that an appropriate sample would consist of all the employees at Reading and samples of the workforce at other locations.

The advantage ...

... assessing the sampling error so any generalization from the results is not statistically valid.

15.3.4 Convenience sampling
This method is very simple: samples are chosen purely on the basis of accessibility.

Example 15.10
The HR director from Example 15.3 is based at Strani's Manchester site. To select her sample she sends an email to all the employees ...

... certainly not be used for estimating population parameters as they lack statistical validity.

15.3.5 Self-selection
This is sample selection by invitation; you might send an email or display a notice asking for people to participate in an interview or complete a questionnaire. The advantages are that it is cheap and easy, and what is more you are guaranteed to get willing respondents. Unfortunately this approach ...

... issues under investigation who put themselves forward and the sample composition is therefore biased against those with more neutral views.

15.4 Using the technology: random sample selection in MINITAB and SPSS
Random numbers play a key part in probabilistic sampling methods. You can use EXCEL to generate streams of random numbers as described in section 12.5.1 of Chapter 12. If you want a more direct approach to selecting random samples try the MINITAB and SPSS facilities described here.

15.4.1 MINITAB
Suppose you want to select a random sample of 15 of the 40 applications submitted for a job. If you number each of the applications 1 to 40 you can use this procedure to select your sample:
■ Click the Calc button ...

... Unselected Cases Are the button beside Filtered is the default selection, and if not choose it, then click the Sample button.
■ In the Select Cases: Random Sample window that comes up click the small button to the left of Exactly then type 15 and 40 respectively in the empty spaces in the phrase Exactly ___ from the first ___ cases. Click on Continue then OK in the Select ...

... selected

15.5 Road test: Do they really use sampling?
In his survey of the extent to which US corporations used quantitative methods Kathawala (1988) found that 69% of companies reported that they made moderate, frequent or extensive use of statistical sampling. He found that life insurance companies and electrical utilities were among the heaviest users. Million (1980) described the methods used to forecast ...

... not they consider it suitable for the channel. Wilson also outlines how multi-stage sampling is used to conduct the UK National Readership survey, a key source of information for advertisers who want to know the size and nature of the readership of publications in which they might place advertisements, as well as for the publishers, who use the same information to set prices for the advertisements. Hague ...
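The selection described in section 15.4.1, a random sample of 15 from 40 numbered job applications, can also be sketched outside MINITAB and SPSS. The Python below is offered only as an illustrative stand-in for the menu procedures and does not reproduce either package; the function name `pick_applications` and its parameters are hypothetical.

```python
import random


def pick_applications(total=40, sample_size=15, seed=None):
    """Return sample_size distinct application numbers drawn at random from 1..total."""
    rng = random.Random(seed)  # pass a seed only if the draw must be repeatable
    return sorted(rng.sample(range(1, total + 1), sample_size))


if __name__ == "__main__":
    # Each run without a seed gives a different simple random sample.
    print(pick_applications())
```

Because `random.sample` draws without replacement, no application number can appear twice, which matches the requirement that each application is either in the sample or not.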