Mean
The mean, often referred to as the “arithmetic average,” represents the central value of a set of scores. When my daughter, then in fifth grade, came home from school confused about averages, I realized how important it is to explain this concept clearly.
Jennifer, clearly frustrated, insisted, "Dad, this is serious!" as she interpreted my lighthearted response about calculating scores as teasing.
“See these numbers in your book? Add them up. What is the answer?” (She did that.)
“Now, how many numbers do you have?” (She answered that question.)
“Then, take the number you got when you added up the numbers, and divide that number by the number of numbers that you have.”
By applying the same reasoning, you will easily find the correct answer with Excel, as it automates all the necessary steps for you.
We will call this average of the scores the “mean,” which we will symbolize as $\bar{X}$ and pronounce as “X-bar.”
The formula for finding the mean with your calculator looks like this:
$$\bar{X} = \frac{\Sigma X}{n} \qquad (1.1)$$
© Springer International Publishing Switzerland 2016
T.J Quirk, Excel 2016 for Business Statistics, Excel for Statistics,
The Greek letter sigma (Σ) represents the concept of "sum," instructing you to total all scores denoted by the variable X and then divide that total by n, which signifies the count of numbers involved.
Suppose that you had these six scores:
To find the mean of these scores, you add them up and then divide by the number of scores. So, the mean is: 25/6 = 4.17
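The same arithmetic can be sketched in a few lines of Python. The six scores themselves are not reproduced in this text, so the list below is a hypothetical set chosen only because it also sums to 25; substitute your own scores.

```python
def mean(scores):
    """Arithmetic mean: add up the scores, then divide by how many there are."""
    if not scores:
        raise ValueError("need at least one score")
    return sum(scores) / len(scores)

# Hypothetical scores (not the book's data); they happen to sum to 25.
scores = [6, 4, 5, 3, 2, 5]
print(round(mean(scores), 2))  # 25 / 6 rounds to 4.17
```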
Standard Deviation
The standard deviation (STDEV) indicates the proximity of scores to the mean; a small standard deviation signifies that the scores are closely clustered around the mean, while a large standard deviation reveals that the scores are more widely dispersed.
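The formula for the standard deviation (STDEV) is:
$$\text{STDEV} = \sqrt{\frac{\Sigma (X - \bar{X})^2}{n - 1}}$$
The formula looks complicated, but what it asks you to do is this: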
1. Subtract the mean from each score ($X - \bar{X}$).
2. Then, square each resulting number so that it is not negative.
3. Then, add up these squared numbers to get a total score.
4. Then, take this total score and divide it by $n - 1$ (where $n$ stands for the number of numbers that you have).
5. The final step is to take the square root of the number you found in step 4.
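The five steps translate directly into code. This is a minimal sketch; the scores are the same hypothetical set used for the mean (not the book's data), and the function name is illustrative.

```python
import math

def sample_stdev(scores):
    """Sample standard deviation following the five steps (n - 1 denominator)."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    x_bar = sum(scores) / n
    squared_deviations = [(x - x_bar) ** 2 for x in scores]  # steps 1 and 2
    total = sum(squared_deviations)                          # step 3
    variance = total / (n - 1)                               # step 4
    return math.sqrt(variance)                               # step 5

scores = [6, 4, 5, 3, 2, 5]  # hypothetical scores that sum to 25
print(round(sample_stdev(scores), 2))  # 1.47
```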
This book focuses on calculating the standard deviation with Excel rather than with a calculator; the calculator method is covered in basic statistics books. Using Excel on the six scores above, the standard deviation (STDEV) is 1.47.
2 1 Sample Size, Mean, Standard Deviation, and Standard Error of the Mean
Standard Error of the Mean
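The formula for the standard error of the mean (s.e., which we will symbolize as $S_{\bar{X}}$) is:
$$s.e. = S_{\bar{X}} = \frac{S}{\sqrt{n}}$$
where $S$ is the standard deviation (STDEV) and $n$ is the number of scores.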
To calculate the standard error (s.e.), divide the standard deviation (STDEV) by the square root of n, the number of scores in your data set. For the six scores above, s.e. = 1.47/√6 = 0.60; you can verify this calculation with your calculator.
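As a quick check of that division, here is a minimal Python sketch (the helper name is illustrative) that uses the STDEV of 1.47 and n = 6 quoted in the text.

```python
import math

def standard_error(stdev, n):
    """Standard error of the mean: s.e. = STDEV / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return stdev / math.sqrt(n)

print(round(standard_error(1.47, 6), 2))  # 0.6, that is, 0.60
```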
If you want to learn more about the standard deviation and the standard error of the mean, see Weiers (2011).
Now, let’s learn how to use Excel to find the sample size, the mean, the standard deviation, and the standard error of the mean using a problem from sales:
Suppose that you want to estimate the first-year sales of a new product that is about to be launched. One way to do this is to look at the first-year sales of similar products that your company has introduced in the past; this comparison shows what typical sales have been and helps you forecast the new product’s results.
You decide to use the first-year sales of a similar product over the past eight years, and you have created the table in Fig. 1.1:
Note that the first-year sales are in thousands of dollars ($000), so that 10 means that the first-year sales of that product were really $10,000.
Fig 1.1 Worksheet Data for First-year Sales
1.3 Standard Error of the Mean 3
Sample Size, Mean, Standard Deviation, and Standard Error of the Mean
Using the Fill/Series/Columns Commands
Objective: To add the years 2–8 in a column underneath year 1
Home (top left of screen)
Important note: The “Paste” icon should be on the top of your screen on the far left of the screen.
Important note: Notice the Excel commands at the top of your computer screen:
File Home Insert Page Layout Formulas etc.
If these commands ever “disappear” when you are using Excel, you need to click on “Home” at the top of your screen to make them reappear!
Fill (top right of screen: click on the down arrow; see Fig. 1.2)
Fig 1.2 Home/Fill/Series commands
The years should be identified as 1–8, with 8 in cell A11.
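As a rough illustration only, the Fill/Series/Columns commands with a Step value of 1 and a Stop value of 8 amount to the arithmetic below. The sketch assumes year 1 has already been typed into cell A4, so that year 8 ends up in A11 as stated above. The helper names are hypothetical, only positive steps are handled, and this is not how Excel itself is implemented.

```python
def fill_series(start, step, stop):
    """Linear series start, start + step, ... up to stop (assumes step > 0)."""
    if step <= 0:
        raise ValueError("this sketch only handles a positive step")
    values = []
    v = start
    while v <= stop:
        values.append(v)
        v += step
    return values

def column_cells(column, first_row, values):
    """Pair each value with a cell address such as A4, A5, ..."""
    return {f"{column}{first_row + i}": v for i, v in enumerate(values)}

cells = column_cells("A", 4, fill_series(1, 1, 8))
print(cells["A11"])  # 8
```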
Next, enter the first-year sales figures into cells B4:B11 as shown in the table. To make your spreadsheet look more professional, you also need to learn how to widen a column and how to “center the information” in a group of cells. Here is how you can do those two steps:
Changing the Width of a Column
Objective: To make a column width wider so that all of the information fits inside that column
To ensure that all information is properly displayed, it is necessary to widen Column B on your computer screen.
Click on the letter, B, at the top of your computer screen
Place your mouse pointer at the far right corner of B until you create a “cross sign” on that corner
Left-click on your mouse, hold it down, and move this corner to the right until it is “wide enough to fit all of the data”
Take your finger off the mouse button to set the new column width (see Fig 1.4).
Fig 1.3 Dialogue Box for Fill/Series Commands (Step value and Stop value)
Then, click on any empty cell (i.e., any blank cell) to “deselect” column B so that it is no longer a darker color on your screen.
When you widen a column, you will make all of the cells in all of the rows of this column that same width.
Now, let’s go through the steps to center the information in both Column A and Column B.
Centering Information in a Range of Cells
Objective: To center the information in a group of cells
In order to make the information in the cells look “more professional,” you can center the information using the following steps:
Left-click your mouse on A3 and drag it to the right and down to highlight cells A3:B11 so that these cells appear in a darker color
In the “Alignment” section at the top of your screen, find the icon that shows a set of lines centered within its width; it is the second icon from the left in the bottom row of the Alignment box (see Fig 1.5).
Fig 1.4 Example of How to Widen the Column Width
Click on this icon to center the information in the selected cells (see Fig. 1.6)
To make formulas easier to write, you can give the range of first-year sales figures a name instead of remembering the cell locations B4:B11. For example, you can name this group of cells “Product,” or any other name you prefer.
Fig 1.5 Example of How to Center Information Within Cells
Fig 1.6 Final Result of Centering Information in the Cells
1.4 Sample Size, Mean, Standard Deviation, and Standard Error of the Mean 7
Naming a Range of Cells
Objective: To name the range of data for the first-year sales figures with the name: Product
Highlight cells B4:B11 by left-clicking your mouse on B4 and dragging it down to B11
Formulas (top left of your screen)
Define Name (top center of your screen)
Product (type this name in the top box; see Fig. 1.7)
Then, click on any cell of your spreadsheet that does not have any information in it (i.e., it is an “empty cell”) to deselect cells B4:B11
Now, add the following terms to your spreadsheet:
Fig 1.7 Dialogue box for “naming a range of cells” with the name: Product
When you use formulas in Excel, you must start each formula with an equal sign (=) to tell Excel that a calculation is intended.
Finding the Sample Size Using the =COUNT Function
Objective: To find the sample size (n) for these data using the =COUNT function
Hit the Enter key, and this command should insert the number 8 into cell F6 since there are eight first-year sales figures.
Finding the Mean Score Using the =AVERAGE Function
Objective: To find the mean sales figure using the =AVERAGE function
This command should insert the number 23.125 into cell F9.
Fig 1.8 Example of Entering the Sample Size, Mean, STDEV, and s.e. Labels
Finding the Standard Deviation Using the =STDEV Function
Objective: To find the standard deviation (STDEV) using the =STDEV function
This command should insert the number 14.02485 into cell F12.
Finding the Standard Error of the Mean
Objective: To find the standard error of the mean using a formula for these eight data points
This command should insert the number 4.958533 into cell F15 (see Fig. 1.9).
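For readers who want to double-check the worksheet outside Excel, the sketch below mirrors the four result cells with Python’s standard `statistics` module. Excel’s STDEV is the sample standard deviation (n − 1 denominator), which matches `statistics.stdev`; `describe` is a hypothetical helper name. The first-year sales figures in Fig 1.1 are not reproduced in this text, so the sketch only checks the quoted results against each other: the s.e. should equal STDEV divided by the square root of n.

```python
import math
import statistics

def describe(values):
    """Python analogues of the result cells: COUNT, AVERAGE, STDEV, and s.e."""
    n = len(values)                    # like =COUNT(...)
    mean = statistics.mean(values)     # like =AVERAGE(...)
    stdev = statistics.stdev(values)   # like =STDEV(...): sample SD, n - 1 denominator
    se = stdev / math.sqrt(n)          # s.e. = STDEV / sqrt(n)
    return n, mean, stdev, se

# Consistency check with the worksheet results quoted above (n = 8, STDEV = 14.02485).
print(round(14.02485 / math.sqrt(8), 6))  # 4.958533
```

Calling `describe()` on your own list of the eight first-year sales figures should return 8, 23.125, about 14.02485, and about 4.958533.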
It is crucial to verify that all figures in your spreadsheet are accurately placed in their respective cells, as any discrepancies can lead to incorrect formula calculations.
Fig 1.9 Example of Using Excel Formulas for Sample Size, Mean, STDEV, and s.e.
1.4.8.1 Formatting Numbers in Number Format (2 Decimal Places)
Objective: To convert the mean, STDEV, and s.e. to two decimal places
Home (top left of screen)
To decrease the number of decimal places displayed, locate the “Number” section at the top center of your screen. Then, hold your mouse pointer over the icon at the bottom right of that section that shows .00 and .0 until the label “Decrease Decimal” appears (see Fig 1.10).
Click on this icon once, and notice that the cells F9:F15 are now all in just two decimal places (see Fig. 1.11)
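The Decrease Decimal icon changes only how many decimal places are displayed; the cell still stores the full value. The Python sketch below imitates the displayed result, assuming that Excel rounds displayed values half away from zero, so 23.125 appears as 23.13. It also shows why plain Python formatting is not a safe stand-in: it rounds an exact tie such as 23.125 to the even digit. The helper name `excel_style_display` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

def excel_style_display(value, places=2):
    """Round half away from zero to `places` decimals and return the text shown.

    str(value) is used so that 23.125 is treated as the decimal 23.125.
    """
    quantum = Decimal(1).scaleb(-places)  # Decimal('0.01') when places == 2
    return str(Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP))

for cell, value in [("F9", 23.125), ("F12", 14.02485), ("F15", 4.958533)]:
    print(cell, excel_style_display(value))  # 23.13, 14.02, 4.96

print(f"{23.125:.2f}")  # Python's own formatting rounds this tie to even: 23.12
```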
Fig 1.10 Using the “Decrease Decimal Icon” to convert Numbers to Fewer Decimal Places
Fig 1.11 Example of Converting Numbers to Two Decimal Places
Important note: The sales figures are in thousands of dollars ($000), so, using the values rounded to two decimal places, the mean is $23,130, the standard deviation is $14,020, and the standard error of the mean is $4,960.
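These dollar figures come from the two-decimal values, not from the unrounded ones. A short sketch, again assuming half-up rounding at two decimals (the helper name is hypothetical):

```python
from decimal import Decimal, ROUND_HALF_UP

def thousands_to_dollars(value_in_thousands):
    """Round a $000 figure to two decimals (half up), then express it in dollars."""
    rounded = Decimal(str(value_in_thousands)).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )
    dollars = int(rounded * 1000)
    return f"${dollars:,}"

print(thousands_to_dollars(23.125))    # $23,130
print(thousands_to_dollars(14.02485))  # $14,020
print(thousands_to_dollars(4.958533))  # $4,960
```

Without the rounding step, the mean would be 23.125 × 1,000 = $23,125, so $23,130 is the rounded figure.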
Now, click on any “empty cell” on your spreadsheet to deselect cells F9:F15.
Saving a Spreadsheet
Objective: To save this spreadsheet with the name: Product6
To save your spreadsheet so that you can use it later, you first need to decide where to save it. One option is to save it on your computer’s hard drive; if you are unsure how to do this, ask someone who is familiar with your computer.
Or, you can save it onto a “CD” or onto a “flash drive.” You then need to complete these steps:
File (top of screen, far left icon)
(select the place where you want to save the file: for example: This PC: My Documents location)
File name: Product6 (enter this name to the right of File name; see Fig. 1.12)
Save (bottom right of dialog box)
Important note: Be very careful to save your Excel file spreadsheet every few minutes so that you do not lose your information!
Printing a Spreadsheet
Objective: To print the spreadsheet
Use the following procedure when printing any spreadsheet.
File (top of screen, far left icon)
Fig 1.12 Dialogue Box of Saving an Excel Workbook File as “Product6” in My Documents location
Print (at top left of screen)
The final spreadsheet is given in Fig. 1.14.
Fig 1.13 Example of How to Print an Excel Worksheet Using the File/Print Commands
Before concluding this chapter, let's focus on formatting figures in a spreadsheet through two practical examples: first, applying two decimal places for dollar amounts, and second, utilizing three decimal places for other numerical figures.
Save the final spreadsheet by: File/Save, then close your spreadsheet by: File/ Close, and then open a blank Excel spreadsheet by using:
File/New/Blank Workbook icon (on the top left of your screen).
Formatting Numbers in Currency Format (2 Decimal Places)
Objective: To change the format of figures to dollar format with two decimal places
Highlight cells A4:A6 by left-clicking your mouse on A4 and dragging it down so that these three cells are highlighted in a darker color
Number (top center of screen: click on the down arrow on the right; see Fig. 1.15)
Fig 1.14 Final Result of Printing an Excel Spreadsheet
1.7 Formatting Numbers in Currency Format (2 Decimal Places) 15
Decimal places: 2 (then see Fig. 1.16)
Fig 1.15 Dialogue Box for Number Format Choices
Fig 1.16 Dialogue Box for Currency (2 decimal places) Format for Numbers
The three cells should have a dollar sign in them and be in two decimal places.
Next, let’s practice formatting figures in number format, three decimal places.
Formatting Numbers in Number Format (3 Decimal Places)
Objective: To format figures in number format, three decimal places
Highlight cells A4:A6 on your computer screen
Number (click on the down arrow on the right)
At the right of the box, change 2 decimal places to 3 decimal places by clicking on the “up arrow” once
The three figures should now be in number format with three decimal places. Next, click on any empty cell to deselect A4:A6. Finally, close the file (File/Close) and select “Don’t Save,” since there is no need to keep this practice problem.
You can use these same commands to format a range of cells in percentage format (and many other formats) to whatever number of decimal places you want to specify.
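The Currency, Number, and Percentage formats have rough counterparts in Python format specifications. In the sketch below, the helpers are hypothetical and the input values are made up for illustration, since the practice values for cells A4:A6 are not given in this text. The minus-sign style for negative currency is also an assumption, because Excel offers several. Python’s formatting rounds exact ties to the even digit, which can differ from what Excel displays.

```python
def currency(value, places=2):
    """Like Currency format: dollar sign, thousands separators, fixed decimals."""
    sign = "-" if value < 0 else ""  # negative style is an assumption
    return f"{sign}${abs(value):,.{places}f}"

def number(value, places=3):
    """Like Number format with a fixed count of decimal places."""
    return f"{value:.{places}f}"

def percent(value, places=1):
    """Like Percentage format: multiplies by 100 and appends a % sign."""
    return f"{value:.{places}%}"

print(currency(1234.5))  # $1,234.50
print(number(2.5))       # 2.500
print(percent(0.256))    # 25.6%
```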
End-of-Chapter Practice Problems
1. Suppose that you have selected a random sample from last week’s customers at Wal-Mart. You then created Fig. 1.17:
1.9 End-of-Chapter Practice Problems 17
(a) Use Excel to find the sample size, mean, standard deviation, and standard error of the mean for these data. Label your answers, and round off the mean, standard deviation, and standard error of the mean to two decimal places in currency format.
(b) Print the result on a separate page.
(c) Save the file as: WAL6
2. Suppose that the Human Resources department conducted a “Morale Survey” of middle-level managers, and you have been asked to analyze the results of item #21 (see Fig. 1.18). The analysis should show where morale is high and where it is a concern, so that it can inform future HR initiatives to improve workplace morale.
Fig 1.17 Worksheet Data for Chap 1: Practice Problem #1
(a) Create an Excel table for these ratings, and use Excel to find the sample size, mean, standard deviation, and standard error of the mean. Label your answers, and round off the mean, standard deviation, and standard error of the mean to two decimal places using number format.
(b) Print the result on a separate page.
(c) Save the file as: MORALE4
3. Suppose that you have been hired to analyze data from a Ford assembly plant that produces Ford Focus cars. The plant manager wants a summary of the number of defects found each day over the previous 18 days of production (three weeks). A defect is any irregularity that requires a car to be taken off the production line and repaired before it is shipped to dealers. These daily defect counts can point to areas where quality control could be improved.
Fig 1.18 Worksheet Data for Chap 1: Practice Problem #2
(a) Create an Excel table for these data, and use Excel to find the sample size, mean, standard deviation, and standard error of the mean. Label your answers, and round off the mean, standard deviation, and standard error of the mean to three decimal places using number format.
(b) Print the result on a separate page.
(c) Save the file as: DEFECTS4
Weiers, R.M. Introduction to Business Statistics (7th ed.). Mason, OH: South-Western Cengage Learning, 2011.
Fig 1.19 Worksheet Data for Chap 1: Practice Problem #3
Suppose that you wanted to take a random sample of 5 of your company’s 32 salespeople using Excel so that you could interview these five salespeople about their job satisfaction at your company.
To create a random sample, you first need a “sampling frame,” which is simply a list of the people you intend to sample from. The frame gives each salesperson an identification code (ID): code number 1 for the first salesperson in your list of 32 salespeople, 2 for the second, 3 for the third, and so on, up to code number 32 for the last salesperson.
Since your company has 32 salespeople, your sampling frame would go from 1 to 32, with each salesperson having a unique ID number.
We will first create the frame numbers as follows in a new Excel worksheet:
Creating Frame Numbers for Generating Random Numbers
Objective: To create the frame numbers for generating random numbers
To create the frame numbers in column A, use the Home/Fill commands described in Section 1.4.1 of this book, filling the cells with the numbers 1 to 32 so that cell A35 contains the number 32. Follow these steps:
Click on cell A4 to select this cell
Fill (then click on the “down arrow” next to this command and select Series)
Then, save this file as: Random2. You should obtain the result in Fig. 2.3.
Fig 2.1 Dialogue Box for Fill/Series Commands
Fig 2.2 Dialogue Box for Fill/Series/Columns/Step value/Stop value Commands
Now, create a column next to these frame numbers in this manner:
To format your spreadsheet correctly, use the Home/Fill command to populate frame numbers starting from cell B4 to B35 Ensure that columns A and B are widened to accommodate all data, and center the information in both columns for a neat appearance, as demonstrated in Fig 2.4.
Fig 2.3 Frame Numbers from 1 to 32
2.1 Creating Frame Numbers for Generating Random Numbers 23
Save this file as: Random3
You may be wondering why the same frame numbers appear in both Column A and Column B of your spreadsheet. The duplicate column is a precaution: it lets you check that you still have exactly 32 frame numbers after you sort them into a random sequence.
Now, let’s add a random number to each of the duplicate frame numbers as follows:
Fig 2.4 Duplicate Frame Numbers from 1 to 32
Creating Random Numbers in an Excel Worksheet
(then widen columns A, B, C so that their labels fit inside the columns; then center the information in A3:C35)
Next, hit the Enter key to add a random number to cell C4.
Be sure to include both the open parenthesis and the closed parenthesis after =RAND. The RAND() function takes no arguments: it returns a random number greater than or equal to 0 and less than 1, and it returns a new random number every time the worksheet recalculates.
To add a random number to all 32 ID frame numbers, position your mouse pointer over cell C4 and move it to the bottom right corner until a “plus sign” appears Then, click and drag the pointer down to cell C35.
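Outside Excel, the same idea (one uniform random number in [0, 1) attached to each frame number, as =RAND() does) can be sketched with Python’s `random` module. The function name and the `seed` parameter are assumptions added only so that a run can be repeated; Excel’s RAND() has no seed and recalculates whenever the worksheet does.

```python
import random

def assign_random_numbers(n_frames, seed=None):
    """Pair each frame number 1..n_frames with a random number in [0, 1)."""
    rng = random.Random(seed)  # seeded generator so a run can be reproduced
    return [(frame, rng.random()) for frame in range(1, n_frames + 1)]

rows = assign_random_numbers(32, seed=1)
for frame, r in rows[:3]:
    print(frame, f"{r:.3f}")  # first three frame numbers with their random numbers
```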
Fig 2.5 Example of Random Numbers Assigned to the Duplicate Frame Numbers
2.2 Creating Random Numbers in an Excel Worksheet 25
Then, click on any empty cell to deselect C4:C35 to remove the dark color highlighting these cells.
Save this file as: Random3A
Now, let’s sort these duplicate frame numbers into a random sequence:
Sorting Frame Numbers into a Random Sequence
Objective: To sort the duplicate frame numbers into a random sequence
Highlight cells B3:C35 (include the labels at the top of columns B and C)
Data (top of screen)
Sort (click on this word at the top center of your screen; see Fig. 2.6)
Fig 2.6 Dialogue Box for Data/Sort Commands
Sort by: RANDOM NO (click on the down arrow)
Smallest to Largest (see Fig. 2.7)
Click on any empty cell to deselect B3:C35.
Save this file as: Random4
These steps will produce Fig. 2.8 with the DUPLICATE FRAME NUMBERS sorted into a random order:
Important note: Because Excel randomly assigns these random numbers, your Excel commands will produce a different sequence of random numbers from everyone else who reads this book!
Fig 2.7 Dialogue Box for Data/Sort/RANDOM NO./Smallest to Largest Commands
2.3 Sorting Frame Numbers into a Random Sequence 27
Because your objective at the beginning of this chapter was to select randomly 5 of your company’s 32 salespeople for a personal interview, you now can do that by selecting the first five ID numbers in the DUPLICATE FRAME NO. column after the sort.
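The sort-and-take-the-top procedure is easy to express in code. This sketch assumes the frame numbers run from 1 to 32 and uses a hypothetical helper name; sorting by the random number from smallest to largest and keeping the first five IDs gives a simple random sample of five, just like the worksheet steps.

```python
import random

def random_sample_by_sort(n_frames, k, seed=None):
    """Select k of n_frames IDs: attach a random number to each ID, sort, take the first k."""
    rng = random.Random(seed)
    rows = [(rng.random(), frame) for frame in range(1, n_frames + 1)]
    rows.sort()  # Smallest to Largest by the random number
    return [frame for _, frame in rows[:k]]

print(random_sample_by_sort(32, 5, seed=2016))  # five distinct IDs between 1 and 32
```

Python’s `random.sample(range(1, 33), 5)` draws a simple random sample in one call; the longer version is shown only because it mirrors the RAND-and-sort steps.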
Fig 2.9 shows the first five salespeople selected this way for the example in this chapter.
Fig 2.8 Duplicate Frame Numbers Sorted by Random Number
Each time you use the RAND() command in Excel, it generates a new set of random numbers, meaning that the five ID numbers you select will differ from those shown in Fig 2.9.
Before concluding this chapter, it's essential to understand how to print a file effectively, ensuring that all information fits neatly on a single page without spilling over onto additional pages.
2.4 Printing an Excel File So That All of the Information Fits onto One Page
Objective: To print a file so that all of the information fits onto one page
Fig 2.9 First Five Salespeople Selected Randomly
2.4 Printing an Excel File So That All of the Information Fits onto One Page 29
The three practice problems at the end of this chapter ask you to sort random numbers for files of 63 customers, 114 counties in Missouri, and 76 key accounts. These files may be too large to fit onto one printed page unless you format them first.
Let’s create a situation where the file does not fit onto one printed page unless you format it first to do that.
Go back to the file you just created, Random4, and enter the name: Jennifer into cell: A52.
If you printed this file now, the name “Jennifer” would appear on a second page, because cell A52 lies outside the area that fits on the first printed page.
To ensure that all information, including the name Jennifer, fits onto a single page when printing, you need to adjust the page format by following these steps.
Page Layout (top left of the computer screen)
(Notice the “Scale to Fit” section in the center of your screen; see Fig. 2.10)
Hit the down arrow to the right of 100 % once to reduce the size of the page to 95 %.
Notice that the name “Jennifer” is still below the horizontal dotted line on your screen, so it would be printed on a second page (see Fig 2.11). The dotted line shows where the printed page ends.
Fig 2.10 Dialogue Box for Page Layout/Scale to Fit Commands
Hit the down arrow to the right of 95 % once more to reduce the size of the worksheet to 90 % of its normal size. As shown in Fig 2.12, the dotted line on your screen now appears below Jennifer’s name, indicating that all of the content, including her name, will print on a single page.
Fig 2.11 Example of Scale Reduced to 95 % with “Jennifer” to be Printed on a Second Page
Save the file as: Random4A
Print the file. Does it all fit onto one page? It should (see Fig. 2.13).
Fig 2.12 Example of Scale Reduced to 90 % with “Jennifer” to be printed on the first page (note the dotted line below Jennifer on your screen)
Fig 2.13 Final Spreadsheet of 90 % Scale to Fit
End-of-Chapter Practice Problems
1. Suppose that you wanted to do a “customer satisfaction phone survey” of 15 of the 63 customers who purchased at least $1,000 worth of merchandise from your company during the last 60 days.
(a) Set up a spreadsheet of frame numbers for these customers with the heading: FRAME NUMBERS using the Home/Fill commands.
(b) Then, create a separate column to the right of these frame numbers that duplicates them, with the title: Duplicate frame numbers.
(c) Then, create a separate column to the right of these duplicate frame numbers with the title: Random number, and use the =RAND() function to assign a random number to each duplicate frame number. Change the format of this column so that each random number is in three decimal places.
(d) Sort the duplicate frame numbers and random numbers into a random order
(e) Print the result so that the spreadsheet fits onto one page
(f) Circle on your printout the I.D. number of the first 15 customers that you would call in your phone survey
(g) Save the file as: RAND9
Important note: Each time you use the RAND() function in Excel, it generates a new random order of customer ID numbers. Consequently, the sequence of random numbers in this Excel Guide will differ from the one you create, which is completely normal and expected.
2. Suppose that a political pollster wants to take a random sample of 10 of the 114 counties in Missouri for a phone survey of registered voters’ preferences in the upcoming election. (According to the U.S. Census Bureau, Missouri has 114 of the 3,140 counties in the 50 states.)
(a) Set up a spreadsheet of frame numbers for these counties with the heading: FRAME NO.
(b) Then, create a separate column to the right of these frame numbers which duplicates these frame numbers with the title: Duplicate frame no.
(c) Then, create a separate column to the right of these duplicate frame numbers with the title: Random Number, and use the =RAND() function to assign a random number to each duplicate frame number. Change the format of this column so that each random number is in three decimal places.
(d) Sort the duplicate frame numbers and random numbers into a random order
(e) Print the result so that the spreadsheet fits onto one page
(f) Circle on your printout the I.D. number of the first 10 counties that the pollster would call in his phone survey
(g) Save the file as: RANDOM6
3. To improve customer relations, your Sales department wants to conduct a customer satisfaction survey of 20 of its 76 key accounts. The Sales Vice-President defines a key account as a customer that has purchased more than $30,000 worth of merchandise in the last 90 days.
(a) Set up a spreadsheet of frame numbers for these customers with the heading: FRAME NUMBERS.
(b) Then, create a separate column to the right of these frame numbers that duplicates them, with the title: Duplicate frame numbers.
(c) Then, create a separate column to the right of the duplicate frame numbers with the title: Random number, and use the =RAND() function to assign a random number to each duplicate frame number. Change the format of this column so that each random number is in three decimal places.
(d) Sort the duplicate frame numbers and random numbers into a random order
(e) Print the result so that the spreadsheet fits onto one page
(f) Circle on your printout the I.D. number of the first 20 customers that your Sales Vice-President would call for his phone survey.
(g) Save the file as: RAND5
U.S. Census Bureau. Census 2000 PHC-T-4. Ranking tables for counties: 1990 and 2000. Retrieved from http://www.census.gov/population/www/cen2000/briefs/phc-t4/tables/tab01.pdf
Confidence Interval About the Mean Using the TINV Function and Hypothesis Testing
This chapter focuses on two ideas: (1) finding the 95 % confidence interval about the mean, and (2) hypothesis testing.
Let’s talk about the confidence interval first.
Confidence Interval About the Mean
How to Estimate the Population Mean
Objective: To estimate the population mean, μ
The population mean is the average for all of the people in a target group, such as adults aged 25–44. For example, to find out how much people in this age group like a new Ben & Jerry’s ice cream flavor, it would be impractical to survey every person in the U.S. in that age group, because doing so would be too time-consuming and too expensive.
Instead of testing the entire population, we can save time and money by taking a sample of people and using the sample results to estimate the population mean. This approach, known as “inferential statistics,” lets us infer the population mean from the sample mean.
In business research, we test a sample of people and compute the sample size (n), the sample mean ($\bar{X}$), and the sample standard deviation (STDEV). We then use these statistics to estimate the population mean with a method known as the “confidence interval about the mean.”
Estimating the Lower Limit and the Upper Limit of the 95 Percent Confidence Interval About the Mean
The theoretical background of this test is beyond the scope of this book, and you can learn more about this test from studying any good statistics textbook (e.g., Levine 2011), but the basic ideas are as follows.
We assume that the population mean is somewhere in an interval that has a “lower limit” and an “upper limit” to it. In this book, we want to be 95 % confident that the population mean falls within this interval.
“We are 95 % confident that the population mean in miles per gallon (mpg) for the Chevy Impala automobile is between 26.92 miles per gallon and 29.42 miles per gallon.”
Suppose our research study found that the car gets 28 miles per gallon (mpg), a figure that lies within the 95 % confidence interval of 26.92 mpg to 29.42 mpg. We do not know the exact population mean, but because 28 mpg lies inside this interval, we could advertise the car’s fuel efficiency as 28 mpg.
But we are only 95 % confident that the population mean is inside this interval, and 5 % of the time we will be wrong in assuming that the population mean is within this interval.
In business research, we typically want to be 95 % confident in our results. This level is arbitrary, but it is a common standard. We could use other confidence levels, such as 80 %, 90 %, or 99 %, but this book always uses 95 %, so that there is no confusion about which confidence level a problem calls for.
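If you are curious how the multiplier changes with the confidence level, the sketch below computes two-sided critical values from the standard normal distribution with Python’s `statistics.NormalDist`. This is a normal-approximation illustration only, and the helper name is hypothetical; the TINV function named in this chapter’s title is based on the t distribution instead, which gives somewhat larger multipliers for small samples.

```python
from statistics import NormalDist

def z_multiplier(confidence):
    """Two-sided standard normal critical value for a level such as 0.95."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

for level in (0.80, 0.90, 0.95, 0.99):
    print(f"{level:.0%}: {z_multiplier(level):.3f}")
# 80%: 1.282, 90%: 1.645, 95%: 1.960, 99%: 2.576
```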
So how do we find the 95 % confidence interval about the mean for our data?
In words, we will find this interval this way:
To find the upper limit of the confidence interval, start with the sample mean ($\bar{X}$) and add 1.96 times the standard error of the mean (s.e.). To find the lower limit, subtract 1.96 times the standard error of the mean from the sample mean.
38 3 Confidence Interval About the Mean Using the TINV Function and Hypothesis
The standard error of the mean (s.e.) is calculated by dividing the standard deviation of the sample (STDEV) by the square root of the sample size (n).
In mathematical terms, the formula for the 95 % confidence interval about the mean is:
$$\bar{X} \pm 1.96\,\text{s.e.}$$
To find the upper limit, add 1.96 times the standard error (s.e.) to the mean; to find the lower limit, subtract 1.96 times the s.e. from the mean. Here, "1.96 s.e." means 1.96 multiplied by the standard error of the mean.
Note: We will explain shortly where the number 1.96 came from.
Let’s try a simple example to illustrate this formula.
Estimating the Confidence Interval for the Chevy Impala
Suppose that Chevy Impala owners recorded their mileage and the number of gallons used for two full tanks of gas, so that the car's average miles per gallon (mpg) could be estimated.
Suppose that 49 owners did this and found an average fuel efficiency of 27.83 mpg, with a standard deviation of 3.01 mpg. The standard error of the mean is 3.01/√49 = 3.01/7 = 0.43, found by dividing the standard deviation by the square root of the sample size.
The 95 % confidence interval for these data would be:
$$27.83 \pm 1.96\,(0.43)$$
The upper limit of this confidence interval uses the plus sign of the ± sign in the formula. Therefore, the upper limit would be:
$$27.83 + 1.96\,(0.43) = 27.83 + 0.84 = 28.67$$
Similarly, the lower limit of this confidence interval uses the minus sign of the ± sign in the formula. Therefore, the lower limit would be:
$$27.83 - 1.96\,(0.43) = 27.83 - 0.84 = 26.99$$
The result of our research study would, therefore, be the following:
“We are 95 % confident that the population mean for the Chevy Impala is somewhere between 26.99 mpg and 28.67 mpg.”
3.1 Confidence Interval About the Mean 39
Our data supports the claim that this car achieves 28 miles per gallon (mpg), as this figure falls within the 95% confidence interval for the population mean.
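As a quick check of this arithmetic, here is a minimal Python sketch; the function name `large_sample_ci` is our own, not from the book. It applies the large-sample formula, mean ± 1.96 s.e. with s.e. = STDEV/√n, to the Chevy Impala figures above.

```python
import math

def large_sample_ci(mean, stdev, n, z=1.96):
    """Return (s.e., lower limit, upper limit) for mean +/- z * s.e."""
    se = stdev / math.sqrt(n)          # standard error of the mean
    return se, mean - z * se, mean + z * se

# Chevy Impala figures from the text: n = 49, mean = 27.83 mpg, STDEV = 3.01 mpg
se, lower, upper = large_sample_ci(27.83, 3.01, 49)
print(f"s.e. = {se:.2f}, lower = {lower:.2f}, upper = {upper:.2f}")
print("28 mpg inside the interval:", lower <= 28 <= upper)
```

The default `z=1.96` matches the 95 % level used throughout this book; passing a different value changes the confidence level.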
You are probably asking yourself: “Where did that 1.96 in the formula come from?”
Where Did the Number “1.96” Come From?
A detailed mathematical answer to that question is beyond the scope of this book, but here is the basic idea.
We assume that the population data follow a "normal distribution," which would look like a "normal curve" if every individual in the population were tested. This curve resembles the outline of the Liberty Bell in Philadelphia, Pennsylvania, and it is symmetric: if you divided it down the center and folded it, one half would fit exactly onto the other.
Using integral calculus, which we only mention here, we can find the lower and upper limits of the population data under the normal curve so that 95% of the area lies between these limits. For research studies involving more than 40 participants, these limits are the sample mean plus or minus 1.96 times the standard error of the mean (s.e.), and they give the lower and upper limits of the confidence interval. For more on this concept, see a good statistics book, such as Salkind (2010).
The number 1.96 would change if we wanted to be confident of our results at a different level from 95 % as long as we have more than 40 people in our research study.
1 If we wanted to be 80 % confident of our results, this number would be 1.282.
2 If we wanted to be 90 % confident of our results, this number would be 1.645.
3 If we wanted to be 99 % confident of our results, this number would be 2.576.
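The values in this list can be reproduced with Python's standard-library `statistics.NormalDist`. In this sketch (the helper name `z_critical` is our own), the function finds the z value that leaves the stated percentage of the normal curve between -z and +z.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value: `confidence` of the normal curve lies between -z and +z."""
    upper_tail_point = 1 - (1 - confidence) / 2   # e.g. 0.975 for 95 %
    return NormalDist().inv_cdf(upper_tail_point)

for level in (0.80, 0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: {z_critical(level):.3f}")
```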
In this book, we aim for a 95% confidence level in our results, which is why we consistently use a value of 1.96 when our research study involves more than 40 participants.
You might be wondering whether the number in the confidence interval formula is always 1.96. The answer is no, and we will explain why.
Finding the Value for t in the Confidence Interval Formula
Objective: To find the value for t in the confidence interval formula
The correct formula for the confidence interval about the mean for different sample sizes is the following:
$$\bar{X} \pm t\,\text{s.e.}$$
To calculate a 95% confidence interval, start with the sample mean (X̄). For the upper limit, add the product of the t-value and the standard error (s.e.) to the sample mean; for the lower limit, subtract that product from the sample mean. To find the appropriate t-value, use the t-table in Appendix E of this book.
Objective: To find the value of t in the t-table in Appendix E
Before we get into an explanation of what is meant by “the value of t,” let’s give you practice in finding the value of t by using the t-table in Appendix E.
Keep your finger on Appendix E as we explain how you need to “read” that table.
In this chapter, the test referred to as the "confidence interval about the mean test" requires you to consult the first column labeled "sample size n" in Appendix E to determine the critical value of t for your research study.
To determine the value of t for your research study, locate the sample size in the first column of the table. Then, move to the right to find the corresponding t value in the "critical t column," which is the column used for a 95% confidence interval about the mean. For instance, with a sample size of 14 participants, the t value is 2.160.
If you have 26 people in your research study, the value of t is 2.060.
If you have more than 40 people in your research study, the value of t is always 1.96.
In Appendix E, the "critical t column" gives the t value needed to be 95% confident of your statistical results. This book assumes that you want 95% confidence in your statistical tests, so the t value from this column is the one used to calculate the 95% confidence interval about the mean.
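If you do not have Appendix E at hand, the critical t can be approximated numerically. The sketch below is our own illustration, not the method used to build the book's table: it integrates the Student's t density with Simpson's rule and then uses bisection to find the two-sided 95 % value. Excel's TINV(1-0.95, n-1) returns this same two-tailed value. Note that the function takes the degrees of freedom, n - 1, rather than the sample size.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): 0.5 (area left of 0, by symmetry) plus Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence, df):
    """Two-sided critical t with `confidence` of the area between -t and +t."""
    target = 1 - (1 - confidence) / 2   # e.g. 0.975 for 95 %
    lo, hi = 0.0, 100.0                 # bisection bracket
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Appendix E lists t by sample size n; the degrees of freedom are n - 1.
for n in (14, 26):
    print(f"n = {n}: t = {t_critical(0.95, n - 1):.3f}")
```

For very large samples the result approaches 1.96, the value this book uses when there are more than 40 people in the study.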
To calculate the confidence interval about the mean using Excel, you first need the value of t; Excel then uses it to find the lower and upper limits of the interval.
Using Excel’s TINV Function to Find the Confidence Interval About the Mean
Objective: To use the TINV function in Excel to find the confidence interval about the mean
When you use Excel, the formulas for finding the confidence interval are:
Lower limit: =X̄-TINV(1-0.95,n-1)*s.e. (no spaces between these symbols) (3.3)
Upper limit: =X̄+TINV(1-0.95,n-1)*s.e. (no spaces between these symbols) (3.4)
In Excel formulas, the asterisk (*) means "times" (multiplication). Also, as mentioned in Chapter 1, n refers to the sample size, so n-1 is the sample size minus one.
The standard error of the mean (s.e.) is found by dividing the standard deviation (STDEV) by the square root of the sample size (n), as explained in Chapter 1. To illustrate, we will use Excel to find the 95% confidence interval about the mean in a practical example.
Suppose that General Motors wanted to claim that its Chevy Impala gets 28 miles per gallon (mpg), and that it wanted to advertise on a billboard in St. Louis at the Vandeventer entrance to Route 44: “The new Chevy Impala gets 28 miles per gallon.”
The reference value for this car is 28 mpg. As a Ford Motor Co. employee, you want to check this claim through research. To test whether 28 mpg is accurate, you will collect data and analyze them using a two-sided 95% confidence interval about the mean.
Using Excel to Find the 95 Percent Confidence Interval
Objective: To analyze the data using a two-sided 95 % confidence interval about the mean
In a study of new car owners, participants were asked to keep track of their mileage for two tanks of gas and to record the average miles per gallon they achieved. The data are shown in Fig. 3.1.
To analyze these data, create an Excel spreadsheet that calculates the sample size (n), mean, standard deviation (STDEV), and standard error of the mean (s.e.), entering the data and labels in the cells indicated.
Enter the other mpg data in cells A7:A30
To make your table look more professional, first highlight cells A6:A30, format these numbers to one decimal place, and center them in Column A. Next, double the width of columns A and B, and make column C three times the original width of column A.
Fig. 3.1 Worksheet Data for Chevy Impala (Practical Example)
B26: Draw a picture below this confidence interval
B29: lower (then right-align this word)
B30: limit (then right-align this word)
C28: ‘ -– 28 -–28.17 -– (note that you need to begin cell C28 with a single quotation mark (‘) to tell Excel that this is a label, and not a number)
D28: ‘ - (notice the single quotation mark at the beginning)
E28: ‘29.42 (note the single quotation mark)
Fig. 3.2 Example of Chevy Impala Format for the Confidence Interval About the Mean Labels
Now, align the labels underneath the picture of the confidence interval so that they look like Fig. 3.3.
Next, name the range of data from A6:A30 as: miles
D7: Use Excel to find the sample size
D10: Use Excel to find the mean
D13: Use Excel to find the STDEV
D16: Use Excel to find the s.e.
Now, you need to find the lower limit and the upper limit of the 95 % confidence interval for this study.
Fig. 3.3 Example of Drawing a Picture of a Confidence Interval About the Mean Result
We will use Excel’s TINV function to do this, and we will assume that you want to be 95 % confident of your results.
F21: =D10-TINV(1-.95,24)*D16 (no spaces between)
Note that this TINV formula uses 24 because 24 is one less than the sample size of 25 (i.e., 24 is n-1). Note also that D10 is the mean, while D16 is the standard error of the mean. The above formula gives the lower limit of the confidence interval, 26.92.
F23: =D10+TINV(1-.95,24)*D16 (no spaces between)
This formula gives the upper limit of the confidence interval, 29.42. Now, format the mean, standard deviation, standard error of the mean, and the lower (26.92) and upper (29.42) limits of the confidence interval to two decimal places. If you printed the spreadsheet in its current layout, the lower and upper limits would fall on a second page because the spreadsheet is too large to fit on one page.
To fix this, use the "Scale to Fit" commands on Excel's Page Layout tab to reduce the spreadsheet to 95% of its current size. Afterward, the dotted line next to the values 26.92 and 29.42 shows that these numbers will now fit on a single printed page.
Fig. 3.4 Result of Using the TINV Function to Find the Confidence Interval About the Mean
Note that you have drawn a picture of the 95 % confidence interval beneath cell B26, including the lower limit, the upper limit, the mean, and the reference value of
28 mpg given in the claim that the company wants to make about the car’s miles per gallon performance.
Now, let’s write the conclusion to your research study on your spreadsheet:
C33: Since the reference value of 28 is inside
C34: the confidence interval, we accept that
C35: the Chevy Impala does get 28 mpg.
Because the 95% confidence interval contains the claimed 28 mpg, the study supports the claim that the Chevy Impala gets 28 miles per gallon, even though the sample mean was 28.17 mpg. Save the resulting spreadsheet as CHEVY7.
Fig. 3.5 Final Spreadsheet for the Chevy Impala Confidence Interval About the Mean
Hypothesis Testing
Hypotheses Always Refer to the Population or Events That You Are Studying
The first step is to understand that our hypotheses always refer to the population of people under study.
When conducting research on 18–24 year-olds in St. Louis, we select a representative sample so that the findings can be generalized to the entire population of this age group in the city. By carefully choosing participants, we aim for results that reflect the experiences and behaviors of all 18–24 year-olds in St. Louis, not just those of the individuals in our sample.
Our study focuses on the population of 18–24 year-olds in St. Louis, and our specific group of participants is called the sample from this population.
Because a sample usually contains only a small part of the population, we care about its results mainly to the extent that they can be generalized to the larger population of interest.
That is why our hypotheses always refer to the population, and never to the sample of people in our study.
You will recall from Chap. 1 that we used the symbol X̄ to refer to the mean of the sample we use in our research study (see Sect. 1.1).
We will use the symbol μ (the Greek letter “mu”) to refer to the population mean.
In testing our hypotheses, we are trying to decide which one of two competing hypotheses about the population mean we should accept given our data set.
The Null Hypothesis and the Research (Alternative) Hypothesis
The two main hypotheses in statistical analysis are the null hypothesis (H0) and the research hypothesis (H1), which is also known as the alternative hypothesis.
Let’s explain first what is meant by the null hypothesis and the research hypothesis:
(1) The null hypothesis is what we accept as true unless we have compelling evidence that it is not true.
(2) The research hypothesis is what we accept as true whenever we reject the null hypothesis as true.
In the American legal system, a defendant is presumed innocent until proven guilty by a jury. In these terms, the null hypothesis is that the defendant is innocent, and the research hypothesis is that the defendant is guilty.
In Missouri, the state slogan "Show me" reflects the residents' skepticism: Missourians value proof over mere claims and expect people to demonstrate, through their actions, that their assertions are true.
Hypothesis testing involves deciding which of the two competing statements, the null hypothesis or the research hypothesis, to accept as true. Since both cannot be true, we use statistical formulas to decide which hypothesis to accept and which to reject.
In business research, rating scales are often used to measure people's attitudes toward a company, its products, or their intention to buy. Common formats are 5-point, 7-point, and 10-point scales, although scales with other numbers of points are also used.
3.2.2.1 Determining the Null Hypothesis and the Research Hypothesis When Rating Scales Are Used
Here is a typical example of a 7-point scale in attitude research in customer satisfaction studies (see Fig. 3.6):
So, how do we decide what to use as the null hypothesis and the research hypothesis whenever rating scales are used?
Objective: To decide on the null hypothesis and the research hypothesis whenever rating scales are used.
In order to make this determination, we will use a simple rule:
Rule: Whenever rating scales are used, we will use the “middle” of the scale as the value in the null hypothesis, and the research hypothesis will state that the population mean is not equal to that value.
In the above example, since 4 is the number in the middle of the scale (i.e., three numbers are below it, and three numbers are above it), our hypotheses become:
H0: μ = 4
H1: μ ≠ 4
If the statistical test shows that the population mean for this attitude scale item is not significantly different from 4, we accept the null hypothesis: the new car purchase experience was neutral, neither positive nor negative.
If our statistical test shows that the population mean significantly differs from 4, we reject the null hypothesis and accept the research hypothesis.
“The new car purchase experience was significantly positive” (this is true whenever our sample mean is significantly greater than our expected population mean of 4). or
“The new car purchase experience was significantly negative” (this is accepted as true whenever our sample mean is significantly less than our expected population mean of 4).
Fig. 3.6 Example of a Rating Scale Item for a New Car Purchase (Practical Example)
These two conclusions cannot both be true. We accept one of the hypotheses as “true” based on the data set in our research study, and the other one as “not true” based on our data set.
A business researcher's primary responsibility is to determine which hypothesis—either the null hypothesis or the research hypothesis—should be accepted as valid based on the data collected in the study.
Let’s try some examples of rating scales so that you can practice figuring out what the null hypothesis and the research hypothesis are for each rating scale.
In the spaces in Fig. 3.7, write in the null hypothesis and the research hypothesis for the rating scales:
Here are the answers to these three questions:
1 The null hypothesis is 3, and the research hypothesis is not equal to 3 on this 5-point scale (i.e., the “middle” of the scale is 3).
Fig. 3.7 Examples of Rating Scales for Determining the Null Hypothesis and the Research Hypothesis
2 The null hypothesis is 4, and the research hypothesis is not equal to 4 on this 7-point scale (i.e., the “middle” of the scale is 4).
3 The null hypothesis is 5.5, and the research hypothesis is not equal to 5.5 on this 10-point scale (i.e., the “middle” of the scale is 5.5 since there are 5 numbers below 5.5 and 5 numbers above 5.5).
As another example, Holiday Inn Express in its Stay Smart Experience Survey uses 4-point scales where:
On this scale, the null hypothesis is: μ = 2.5 and the research hypothesis is: μ ≠ 2.5, because there are two numbers below 2.5, and two numbers above 2.5 on that rating scale.
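The midpoint rule used in these examples fits in a one-line Python function. This sketch assumes the scale points are consecutive whole numbers from `lowest` to `highest`; the function name is hypothetical.

```python
def scale_midpoint(lowest, highest):
    """Value used in H0 for a rating scale: halfway between its end points."""
    return (lowest + highest) / 2

for lowest, highest in [(1, 5), (1, 7), (1, 10), (1, 4)]:
    mid = scale_midpoint(lowest, highest)
    print(f"{highest}-point scale: H0: mu = {mid:g}   H1: mu != {mid:g}")
```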
Now, let’s discuss the 7 STEPS of hypothesis testing for using the confidence interval about the mean.
The 7 Steps for Hypothesis-Testing Using the Confidence Interval About the Mean
Objective: To learn the 7 steps of hypothesis-testing using the confidence interval about the mean
There are seven basic steps of hypothesis-testing for this statistical test.
3.2.3.1 STEP 1: State the Null Hypothesis and the Research Hypothesis
When a survey uses a rating scale, base the hypotheses on the middle of the scale. For example, on a 7-point scale from 1 (poor) to 7 (excellent), the hypotheses would be H0: μ = 4 and H1: μ ≠ 4.
3.2.3.2 STEP 2: Select the Appropriate Statistical Test
In this chapter we are studying the confidence interval about the mean, and so we will select that test.
3.2.3.3 STEP 3: Calculate the Formula for the Statistical Test
You will recall (see Sect. 3.1.5) that the formula for the confidence interval about the mean is:
$$\bar{X} \pm t\,\text{s.e.}$$
In this chapter, we used Excel to apply this formula in the following steps:
1 Use Excel’s =COUNT function to find the sample size.
2 Use Excel’s =AVERAGE function to find the sample mean, X̄.
3 Use Excel’s =STDEV function to find the standard deviation, STDEV.
4 Find the standard error of the mean (s.e.) by dividing the standard deviation (STDEV) by the square root of the sample size, n.
5 Use Excel’s TINV function to find the lower limit of the confidence interval.
6 Use Excel’s TINV function to find the upper limit of the confidence interval.
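The six steps can also be sketched in Python, as an illustration under stated assumptions rather than a replacement for the Excel steps. `statistics.stdev` computes the sample standard deviation (dividing by n - 1), which is what Excel's STDEV computes. The t value is passed in rather than computed, standing in for what TINV(1-0.95, n-1) would return, and the three ratings are made-up data.

```python
import math
import statistics

def confidence_interval(data, t_value):
    """Steps 1-6: n, mean, STDEV, s.e., then mean -/+ t * s.e."""
    n = len(data)                      # step 1, like Excel's =COUNT
    mean = statistics.mean(data)       # step 2, like Excel's =AVERAGE
    stdev = statistics.stdev(data)     # step 3, sample standard deviation like =STDEV
    se = stdev / math.sqrt(n)          # step 4, standard error of the mean
    lower = mean - t_value * se        # step 5, lower limit
    upper = mean + t_value * se        # step 6, upper limit
    return n, mean, stdev, se, lower, upper

# Hypothetical ratings for illustration only; 4.303 is the two-sided 95 % t for n - 1 = 2
ratings = [3, 5, 7]
n, mean, stdev, se, lower, upper = confidence_interval(ratings, t_value=4.303)
print(n, mean, round(stdev, 2), round(se, 2), round(lower, 2), round(upper, 2))
```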
3.2.3.4 STEP 4: Draw a Picture of the Confidence Interval About the Mean, Including the Mean, the Lower Limit of the Interval, the Upper Limit of the Interval, and the Reference Value Given in the Null Hypothesis, H0
3.2.3.5 STEP 5: Decide on a Decision Rule
(a) If the reference value is inside the confidence interval, accept the null hypothesis, H0.
(b) If the reference value is outside the confidence interval, reject the null hypothesis, H0, and accept the research hypothesis, H1.
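The STEP 5 decision rule amounts to a single comparison. In this hypothetical Python helper, a reference value exactly equal to one of the limits is treated as inside the interval; that boundary choice is our assumption, since the rule above does not address it.

```python
def decide(lower, upper, reference):
    """STEP 5: accept H0 when the reference value lies inside [lower, upper]."""
    if lower <= reference <= upper:   # boundary values count as inside (our assumption)
        return "accept H0"
    return "reject H0 and accept H1"

print(decide(26.92, 29.42, 28))   # Chevy Impala interval from this chapter
print(decide(30, 32, 34))         # a hypothetical interval that excludes the reference value
```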
3.2.3.6 STEP 6: State the Result of Your Statistical Test
When you use the confidence interval about the mean, there are two possible results, but only one of them can be "true." Your result will be one of these two:
Either: Since the reference value is inside the confidence interval, we accept the null hypothesis, H0
Or: Since the reference value is outside the confidence interval, we reject the null hypothesis, H0, and accept the research hypothesis, H1
3.2.3.7 STEP 7: State the Conclusion of Your Statistical Test in Plain English!
Summarizing the results of a statistical test such as the confidence interval about the mean in plain English can be challenging, especially when the reader has no background in statistics. The conclusion should be both concise and precise, so that even readers unfamiliar with statistical concepts can understand it. Throughout this book, you will have many opportunities to practice this critical skill.
Let’s set some basic rules for stating the conclusion of a hypothesis test.
Rule #1: Whenever you reject H0 and accept H1, you must use the word “significantly” in the conclusion to alert the reader that this test found an important result.
Rule #2: Create an outline in words of the “key terms” you want to include in your conclusion so that you do not forget to include some of them.
Rule #3: Write the conclusion in plain English so that the reader can understand it even if that reader has never taken a statistics course.
To illustrate, let's return to the Excel spreadsheet created earlier for the Chevy Impala. The null hypothesis is that the Chevy Impala gets 28 miles per gallon, as claimed in the advertisement (H0: μ = 28 mpg; H1: μ ≠ 28 mpg).
The reference value of 28 mpg falls within the 95% confidence interval for these data, so we accept the null hypothesis (H0) that the Chevy Impala gets 28 mpg.
Objective: To state the result when you accept H0
Result: Since the reference value of 28 mpg is inside the confidence interval, we accept the null hypothesis, H0
Let’s try our three rules now:
Objective: To write the conclusion when you accept H0
According to Rule #1, if the reference value falls within the confidence interval, you cannot use the word "significantly" in the conclusion. This rule applies to all the problems in this chapter.
Rule #2: The key terms in the conclusion would be:
Rule #3: The Chevy Impala did get 28 mpg.
Writing a conclusion after accepting the null hypothesis (H0) is straightforward, because the conclusion simply restates the null hypothesis. Writing a conclusion after rejecting H0 and accepting the research hypothesis (H1) is harder. To practice this skill, let’s work through three cases.
Objective: To write the result and conclusion when you reject H0
CASE #1: Suppose that an ad in Business Week claimed that the Ford Escape Hybrid got 34 miles per gallon. The hypotheses would be:
H0: μ = 34 mpg
H1: μ ≠ 34 mpg
Suppose that your research yields the following confidence interval:
Lower limit: 30   Mean: 31   Upper limit: 32   Reference value: 34
Result: Since the reference value is outside the confidence interval, we reject the null hypothesis and accept the research hypothesis.
The three rules for stating the conclusion would be:
Rule #1: We must include the word “significantly” since the reference value of
34 is outside the confidence interval.
Rule #2: The key terms would be:
– either “more than” or “less than”
Rule #3: The Ford Escape Hybrid got significantly less than 34 mpg, and it was probably closer to 31 mpg.
Note that the conclusion says that the mpg was less than 34 mpg because the sample mean was only 31 mpg. It is not enough to write only “The Ford Escape Hybrid got significantly less than 34 mpg,” because that does not tell the reader how much less than 34 mpg the sample mean was. To make the conclusion clear, you need to add “probably closer to 31 mpg,” since the sample mean was only 31 mpg.
CASE #2: Suppose that you have been hired as a consultant by the St. Louis Symphony Orchestra (SLSO) to analyze the data from an Internet survey of attendees at a concert in Powell Symphony Hall in St. Louis last month. You have decided to practice your data analysis skills on Question #7, given in Fig. 3.8:
The hypotheses for this one item would be:
H0: μ = 4
H1: μ ≠ 4
The null hypothesis states that attendees were neutral about their satisfaction with SLSO concerts (a mean of 4). If the mean score is not significantly different from 4 on this rating scale, attendees were neither satisfied nor dissatisfied with their concert experience. Suppose that your analysis produced the following confidence interval for this survey item:
Lower limit: 1.8   Mean: 2.8   Upper limit: 3.8   Reference value: 4
Result: Since the reference value is outside the confidence interval, we reject the null hypothesis and accept the research hypothesis.
Rule #1: You must include the word “significantly” since the reference value is outside the confidence interval.
Rule #2: The key terms would be:
– either satisfied or dissatisfied (since the result is significant)
Fig. 3.8 Example of a Survey Item Used by the St. Louis Symphony Orchestra (SLSO)
Rule #3: Attendees were significantly dissatisfied, overall, on last month’s Internet survey with their experiences at concerts of the SLSO.
Note that you need to use the word “dissatisfied” since the sample mean of 2.8 was on the dissatisfied side of the middle of the rating scale.
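Rules #1 and #2 can be expressed as a small Python function that picks the key wording of a conclusion. It is a hypothetical sketch: the caller supplies the words for a mean below or above the reference value, such as "less than" and "more than" for CASE #1, or "dissatisfied" and "satisfied" for CASE #2.

```python
def conclusion_wording(lower, upper, mean, reference, below_word, above_word):
    """Return the key wording for a conclusion, or None when H0 is accepted.

    Rule #1: "significantly" is used only when the reference value is outside
    the confidence interval. Rule #2: the direction word depends on whether
    the sample mean is below or above the reference value.
    """
    if lower <= reference <= upper:
        return None   # H0 accepted: the conclusion simply restates the null hypothesis
    direction = below_word if mean < reference else above_word
    return f"significantly {direction}"

print(conclusion_wording(30, 32, 31, 34, "less than", "more than"))        # CASE #1
print(conclusion_wording(1.8, 3.8, 2.8, 4, "dissatisfied", "satisfied"))   # CASE #2
```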
CASE #3: Suppose that you are analyzing last week's Guest Satisfaction Survey of customers at the Marriott Hotel at the St. Louis Airport, using the survey item shown in Fig. 3.9.
This item would have the following hypotheses:
H0: μ = 5.5
H1: μ ≠ 5.5
Suppose that your research produced the following confidence interval for this item on the survey:
Result: Since the reference value is outside the confidence interval, we reject the null hypothesis and accept the research hypothesis.
The three rules for stating the conclusion would be:
Rule #1: You must include the word “significantly” since the reference value is outside the confidence interval.
Rule #2: The key terms would be:
– either “positive” or “negative” (we will explain this)
Rule #3: Customers at the St. Louis Airport Marriott Hotel last week rated their check-in speed in a survey as significantly positive.
Fig. 3.9 Example of a Survey Item from Marriott Hotels
Note that in English we do not say "significantly excellent": something is either excellent or it is not. Instead, we say "significantly positive" whenever the mean rating is significantly above the middle of the scale, as with the check-in speed mean of 5.8, which is above the neutral point of 5.5.
The three practice problems at the end of this chapter give you more practice in writing research conclusions, and this book provides many more examples to help you write clear and accurate conclusions for your research findings.
Alternative Ways to Summarize the Result
Different Ways to Accept the Null Hypothesis
The following quotes are typical of the language used in statistics and research books when the null hypothesis is accepted:
“The null hypothesis is not rejected.” (Black 2010, p. 310)
“The null hypothesis cannot be rejected.” (McDaniel and Gates 2010, p. 545)
“The null hypothesis claims that there is no difference between groups.” (Salkind 2010, p. 193)
“The difference is not statistically significant.” (McDaniel and Gates 2010, p. 545)
“the obtained value is not extreme enough for us to say that the difference between Groups 1 and 2 occurred by anything other than chance.” (Salkind 2010, p. 225)
“If we do not reject the null hypothesis, we conclude that there is not enough statistical evidence to infer that the alternative (hypothesis) is true.” (Keller 2009, p. 358)
“The research hypothesis is not supported.” (Zikmund and Babin 2010, p. 552)
Different Ways to Reject the Null Hypothesis
The following quotes are typical of the quotes used in statistics and research books when the null hypothesis is rejected:
“The null hypothesis is rejected.” (McDaniel and Gates 2010, p. 546)
“If we reject the null hypothesis, we conclude that there is enough statistical evidence to infer that the alternative hypothesis is true.” (Keller 2009, p. 358)
“If the test statistic’s value is inconsistent with the null hypothesis, we reject the null hypothesis and infer that the alternative hypothesis is true.” (Keller 2009, p. 348)
“Because the observed value is greater than the critical value, the decision is to reject the null hypothesis.” (Black 2010, p. 359)
“If the obtained value is more extreme than the critical value, the null hypothesis cannot be accepted.” (Salkind 2010, p. 243)
“The critical t-value must be surpassed by the observed t-value if the hypothesis test is to be statistically significant” (Zikmund and Babin 2010, p. 567)
“The calculated test statistic exceeds the upper boundary and falls into this rejection region. The null hypothesis is rejected.” (Weiers 2011, p. 330)
It's important to recognize that the quotes mentioned are commonly referenced by statisticians and professors when interpreting hypothesis test results Therefore, you may encounter requests to summarize statistical test outcomes in terminology different from what is presented in this book.
End-of-Chapter Practice Problems
The manager of the St. Louis Post-Dispatch has asked you to analyze the data from a recent survey of former subscribers who canceled their newspaper subscriptions within the last three months. A random sample of this group was contacted by telephone and asked a series of questions about their experience with the newspaper. The hypothetical results for survey question #4 are shown in Fig. 3.10.
3.4 End-of-Chapter Practice Problems 59
Top management is considering a new subscription price of $3.80. To decide whether this price is reasonable, analyze the survey results using $3.80 as the value in the null hypothesis.
(a) Use Excel to find the sample size, mean, standard deviation, and standard error of the mean for the price figures in the table. Label each result, and format the mean, standard deviation, and standard error of the mean in currency format with two decimal places.
(b) Enter the null hypothesis and the research hypothesis onto your spreadsheet.
(c) Use Excel’s TINV function to find the 95% confidence interval about the mean for these figures. Label your answers, and format them in currency format with two decimal places.
(e) Enter your conclusion in plain English onto your spreadsheet.
(f) Print the final spreadsheet so that it fits onto one page (see the objectives in Chapter 2, Section 2.4).
(g) On your printout, draw a diagram of this 95% confidence interval by hand.
(h) Save the file as: POST9
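As a cross-check on the Excel answers for a problem like this one, the calculations can be sketched in Python. This is a minimal sketch under stated assumptions: the prices are hypothetical placeholders (not the Fig. 3.10 data), the function names are illustrative, and the critical t is passed in by hand because Python's standard library has no equivalent of Excel's TINV; for 5 scores, TINV(0.05, 4) is about 2.776.

```python
import math
import statistics


def summarize(scores):
    """Return n, mean, sample standard deviation, and standard error of the mean."""
    n = len(scores)
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # divides by n - 1, like Excel's STDEV
    se = sd / math.sqrt(n)         # standard error of the mean
    return n, mean, sd, se


def confidence_interval_95(scores, critical_t):
    """Return (lower, upper) = mean -/+ critical_t * s.e.

    critical_t is supplied by the caller (assumed here to be the value
    Excel's TINV(0.05, n - 1) returns for this sample size).
    """
    _, mean, _, se = summarize(scores)
    margin = critical_t * se
    return mean - margin, mean + margin


def interval_result(lower, upper, reference):
    """Accept H0 when the reference value lies inside the interval."""
    if lower <= reference <= upper:
        return "accept the null hypothesis"
    return "reject the null hypothesis and accept the research hypothesis"


# Hypothetical prices, used only to illustrate the steps.
prices = [3.50, 4.00, 3.75, 4.25, 3.25]
n, mean, sd, se = summarize(prices)
lower, upper = confidence_interval_95(prices, critical_t=2.776)  # assumed TINV(0.05, 4)
print(f"n = {n}, mean = ${mean:.2f}, STDEV = ${sd:.2f}, s.e. = ${se:.2f}")
print(f"95% CI: ${lower:.2f} to ${upper:.2f}")
print(interval_result(lower, upper, reference=3.80))
```

Replacing the hypothetical list with the actual data and the TINV value for its sample size should reproduce the spreadsheet's answers.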
The Human Resources department has asked you to analyze a recent morale survey of managers about their perceptions of working at the company. To check your Excel skills, a random sample of managers' responses to Item #24 of the survey was selected; the hypothetical data are presented in Fig. 3.11.
Fig. 3.11 Worksheet Data for Chap. 3: Practice Problem #2
Create an Excel spreadsheet with these data.
(a) Use Excel to find the sample size, mean, standard deviation, and standard error of the mean. Label your answers, and display the mean, standard deviation, and standard error of the mean to two decimal places.
(b) Enter the null hypothesis and the research hypothesis for this item on your spreadsheet.
(c) Use Excel's TINV function to find the 95% confidence interval about the mean for these data. Label your answers, and show the lower limit and the upper limit of the confidence interval to two decimal places.
(d) Enter the result of the test on your spreadsheet.
(e) Enter the conclusion of the test in plain English on your spreadsheet.
(f) Print the final spreadsheet so that it fits onto one page (see the objectives in Chapter 2, Section 2.4).
(g) Draw a picture of the confidence interval, including the reference value, onto your spreadsheet.
(h) Save the final spreadsheet as: top8
Three focus groups of adult women aged 25 to 44 evaluated a new blouse design by a well-known designer, to be sold for $68.00 in department stores. Each one-hour discussion ended with a survey about the blouse; the hypothetical results are shown in Fig. 3.12.
Create an Excel spreadsheet with these data.
(a) Use Excel to find the sample size, mean, standard deviation, and standard error of the mean. Label your answers, and use currency format with two decimal places for the mean, standard deviation, and standard error of the mean.
(b) Enter the null hypothesis and the research hypothesis for this item onto your spreadsheet.
Fig. 3.12 Worksheet Data for Chap. 3: Practice Problem #3
(c) Use Excel's TINV function to find the 95% confidence interval about the mean for these data. Label your answers with the lower limit and the upper limit of the confidence interval, and use currency format with two decimal places.
(d) Enter the result of the test on your spreadsheet.
(e) Enter the conclusion of the test in plain English on your spreadsheet.
(f) Print the final spreadsheet so that it fits onto one page (see the objectives in Chapter 2, Section 2.4).
(g) Draw a picture of the confidence interval, including the reference value, onto your spreadsheet.
(h) Save the final spreadsheet as: blouse9
References
Black, K. Business Statistics: For Contemporary Decision Making (6th ed.). Hoboken, NJ: John Wiley & Sons, Inc., 2010.
Keller, G. Statistics for Management and Economics (8th ed.). Mason, OH: South-Western Cengage Learning, 2009.
Levine, D.M. Statistics for Managers Using Microsoft Excel (6th ed.). Boston, MA: Prentice Hall/Pearson, 2011.
McDaniel, C., and Gates, R. Marketing Research (8th ed.). Hoboken, NJ: John Wiley & Sons, Inc., 2010.
Salkind, N.J. Statistics for People Who (Think They) Hate Statistics (2nd Excel 2007 ed.). Los Angeles, CA: Sage Publications, 2010.
Weiers, R.M. Introduction to Business Statistics (7th ed.). Mason, OH: South-Western Cengage Learning, 2011.
Zikmund, W.G., and Babin, B.J. Exploring Marketing Research (10th ed.). Mason, OH: South-Western Cengage Learning, 2010.
One-Group t-Test for the Mean
In this chapter, you will learn how to use one of the most popular and most helpful statistical tests in business research: the one-group t-test for the mean.
The formula for the one-group t-test is as follows:

$$t = \frac{\bar{X} - \mu}{s.e.}, \qquad s.e. = \frac{S}{\sqrt{n}}$$

To calculate t, subtract the population mean (μ) from the sample mean (X̄), and then divide the result by the standard error of the mean (s.e.). The standard error is the standard deviation (S) divided by the square root of the sample size (n).
Let’s discuss the 7 STEPS of hypothesis testing using the one-group t-test so that you can understand how this test is used.
The 7 STEPS for Hypothesis-Testing Using the One-Group t-Test
STEP 1: State the Null Hypothesis and the Research Hypothesis
When you use numerical rating scales in surveys, the hypotheses refer to the middle of the scale. For instance, on a 7-point scale ranging from 1 (poor) to 7 (excellent), the middle of the scale is 4, so the hypotheses would be:
H0: μ = 4
H1: μ ≠ 4
As a second example, suppose that you worked for Honda Motor Company and that you wanted to place a magazine ad that claimed that the new Honda Fit got 35 miles per gallon (mpg). The hypotheses for testing this claim on actual data would be:
H0: μ = 35 mpg
H1: μ ≠ 35 mpg
STEP 2: Select the Appropriate Statistical Test
In this chapter we will be studying the one-group t-test, and so we will select that test.
STEP 3: Decide on a Decision Rule
(a) If the absolute value of t is less than the critical value of t, accept the null hypothesis.
(b) If the absolute value of t is greater than the critical value of t, reject the null hypothesis and accept the research hypothesis.
You are probably saying to yourself: “That sounds fine, but how do I find the absolute value of t?”
4.1.3.1 Finding the Absolute Value of a Number
To do that, we need another objective:
Objective: To find the absolute value of a number
The absolute value of a number, a concept from high school algebra, is the number with its sign dropped, so it is never negative.
For example, the absolute value of 2.35 is +2.35.
And the absolute value of minus 2.35 (i.e., −2.35) is also +2.35.
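The decision rule from Step 3 can be expressed in a few lines of Python, where the built-in abs() plays the role of the absolute value (Excel's ABS function does the same on a worksheet). This is a minimal illustrative sketch: the function name decide is hypothetical, and treating an absolute t exactly equal to the critical value as "accept" is an assumption, since the rule above does not cover that tie.

```python
def decide(t_value, critical_t):
    """Step 3 decision rule, applied to the absolute value of t."""
    if abs(t_value) > critical_t:
        return "reject the null hypothesis and accept the research hypothesis"
    # |t| below (or, by assumption, equal to) the critical value
    return "accept the null hypothesis"


print(abs(2.35), abs(-2.35))   # both print 2.35
print(decide(-2.35, 1.96))     # the minus sign does not change the decision
print(decide(0.44, 1.96))
```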
Understanding the t-table in Appendix E is crucial for conducting the one-group t-test. We will explain how to find the critical value of t in this table in Step 5.
STEP 4: Calculate the Formula for the One-Group t-Test
Formula for the One-Group t-Test
Objective: To learn how to use the formula for the one-group t-test
The formula for the one-group t-test is as follows:

$$t = \frac{\bar{X} - \mu}{s.e.}, \qquad s.e. = \frac{S}{\sqrt{n}}$$

This formula makes the following assumptions about the data (Foster et al. 1998):
The data are independent, which means that each person contributes only one score. In addition, the population of the data is normally distributed, and the variance is constant (the standard deviation is the square root of the variance).
To use this formula, you need to follow these steps:
1 Take the sample mean in your research study and subtract the population mean μ from it (remember that the population mean for a study involving numerical rating scales is the “middle” number in the scale).
2 Then take your answer from the above step, and divide it by the standard error of the mean for your research study (you learned how to find the standard error of the mean in Chap. 1: take the standard deviation of your research study and divide it by the square root of n, where n is the number of people in your research study).
3 The number you get after you complete the above step is the value for t that results when you use the formula stated above.
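Steps 1 to 3 translate directly into code. The sketch below is illustrative only: the function names are hypothetical, and the numbers in the usage lines (a 7-point scale with midpoint 4, a sample mean of 4.6, a standard deviation of 1.2, and n = 36) are invented so the arithmetic is easy to check by hand: s.e. = 1.2/6 = 0.2 and t = 0.6/0.2 = 3.0.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean: standard deviation / square root of n."""
    return sd / math.sqrt(n)


def one_group_t(sample_mean, population_mean, sd, n):
    """Step 1: subtract mu; Step 2: divide by s.e.; Step 3: the result is t."""
    return (sample_mean - population_mean) / standard_error(sd, n)


# Hypothetical example: 7-point scale, so the population mean under H0 is 4.
t = one_group_t(sample_mean=4.6, population_mean=4, sd=1.2, n=36)
print(f"t = {t:.2f}")
```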
4.1.5 STEP 5: Find the Critical Value of t in the t-Table in Appendix E
Objective: To find the critical value of t in the t-table in Appendix E
Before diving into the definition of "the critical value of t," let's practice locating it using the t-table found in Appendix E.
Keep your finger on Appendix E as we explain how you need to “read” that table.
In this chapter, the test referred to is the "one-group t-test," and to determine the critical value of t for your research study, you should refer to the first column on the left in Appendix E, which is labeled "sample size n."
To determine the critical value of t, locate your sample size in the first column of the table, and then move right to the corresponding critical t value. This value applies to both the one-group t-test and the 95% confidence interval about the mean.
For example, if you have 27 people in your research study, the critical value of t is 2.056.
If you have 38 people in your research study, the critical value of t is 2.026.
If you have more than 40 people in your research study, the critical value of t is always 1.96.
Note that the “critical t” column in Appendix E gives the value of t that you need to obtain in order to be 95% confident that your results are “significant.”
The critical value of t is the value that tells you whether or not you have found a “significant result” in your statistical test.
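Reading Appendix E by sample size can also be mimicked with a small lookup. This sketch holds only the rows quoted in this section (n = 27, n = 38, and the 1.96 rule for more than 40 people); the dictionary is a hypothetical excerpt, not a replacement for the full table, and any other sample size must still be read from Appendix E.

```python
# Excerpt of Appendix E ("sample size n" -> "critical t"), limited to the
# rows quoted in the text. The real table has many more rows.
CRITICAL_T_BY_N = {27: 2.056, 38: 2.026}


def critical_t_one_group(n):
    """Critical value of t (95% level) for a one-group t-test with n people."""
    if n > 40:
        return 1.96
    if n in CRITICAL_T_BY_N:
        return CRITICAL_T_BY_N[n]
    raise KeyError(f"n = {n} is not in this excerpt; read it from Appendix E")


print(critical_t_one_group(27))   # 2.056
print(critical_t_one_group(124))  # 1.96
```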
The critical values in the t-table in Appendix E are based on a family of bell-shaped t-curves, which resemble the normal curve; they are called “bell-shaped” because their outline looks like the Liberty Bell in Philadelphia, near Independence Hall.
The center of each of these curves is the zero point on the x-axis; statistics textbooks such as Zikmund and Babin (2010) explain why in detail for readers who want a deeper understanding.
Values of t to the right of the zero point are positive and carry a plus sign, while values to the left are negative and carry a minus sign. Therefore, t can be either positive or negative.
Most statistics books that include a t-table show only the positive side of the t-curves, because the negative side is a mirror image of the positive side: it contains the same values, except that each one has a minus sign in front of it.
To utilize the t-table in Appendix E, it is essential to take the absolute value of the t-value obtained from the t-test formula, as the t-table exclusively presents positive t-values.
This book assumes that you want to be 95% confident in the results of your statistical tests. Consequently, the critical value listed in the t-table in Appendix E tells you whether the t-value from your one-group t-test formula falls inside or outside the middle 95% of the t-curve.
When the t-value calculated from a one-group t-test falls inside the middle 95% of the curve, the result is not statistically significant, and we accept the null hypothesis.
When the t-value falls outside the middle 95% of the curve, the result is significant: a value this extreme would be expected by chance less than 5% of the time if the null hypothesis were true. In that case, we reject the null hypothesis and accept the research hypothesis.
STEP 6: State the Result of Your Statistical Test
There are two possible results when you use the one-group t-test, and only one of them can be accepted as “true.”
If the absolute value of t calculated from the t-test formula is less than the critical value of t in Appendix E, you accept the null hypothesis. If the absolute value of t is greater than the critical value in Appendix E, you reject the null hypothesis and accept the research hypothesis.
STEP 7: State the Conclusion of Your Statistical Test in Plain English
Summarizing the results of your statistical test in clear and concise language can be challenging, especially for an audience without a background in statistics, such as your boss. This book gives you ample practice to help you master this crucial skill.
To perform a one-group t-test using Excel on hypothetical data from the Marriott Hotels Guest Satisfaction Survey, follow these steps to analyze the data effectively.
One-Group t-Test for the Mean
Suppose that you have been hired as a statistical consultant by the Marriott Hotel in St. Louis to evaluate data from a Guest Satisfaction Survey that is given to all guests to assess their satisfaction with various hotel activities.
The survey contains a number of items, but suppose item #7 is the one in Fig. 4.1:
Suppose further, that you have decided to analyze the data from last week’s customers using the one-group t-test.
Important note: You would need to use this test for each of the survey items separately.
Last week, 124 guests at the St. Louis Marriott Hotel answered Item #7; their mean score was 6.58, with a standard deviation of 2.44.
Objective: To analyze the data for each question separately using the one-group t-test for each survey item.
Create an Excel spreadsheet with the following information:
Note: Remember that when you are using a rating scale item, both the null hypothesis and the research hypothesis refer to the “middle of the scale.”
On a 10-point scale, the midpoint is 5.5, with five values below it (1-5) and five values above it (6-10). Consequently, the hypotheses for this rating scale item are H0: μ = 5.5 and H1: μ ≠ 5.5.
Fig. 4.1 Sample Survey Item for Marriott Hotel (Practical Example)
D23: enter the STDEV (see Fig. 4.2)
D26: compute the standard error using the formula in Chap. 1
D29: find the critical value of t in the t-table in Appendix E
Table for Front Desk Clerk
Now, enter the following formula in cell D32 to find the t-test result: =(D20-5.5)/D26 (no spaces between)
This formula subtracts the hypothesized population mean of 5.5 from the sample mean in cell D20; the parentheses make Excel do the subtraction first. The difference, 1.08, is then divided by the standard error of the mean in cell D26, which displays as 0.22. With the standard error and the t-test result both formatted to two decimal places, the t-test result is 4.93. (Excel divides by the unrounded standard error, about 0.219, which is why 1.08/0.22 does not give exactly 4.93.)
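The same calculation can be reproduced outside Excel as a check. This is a sketch that uses only the Item #7 figures from the text (n = 124, mean = 6.58, STDEV = 2.44); the variable names are illustrative, and the midpoint is computed from the 1-to-10 scale as described above.

```python
import math

# Item #7 summary figures from the text.
n, mean, sd = 124, 6.58, 2.44

# Midpoint of a 1-to-10 scale: the population mean under the null hypothesis.
mu = (1 + 10) / 2                 # 5.5

se = sd / math.sqrt(n)            # standard error of the mean (cell D26)
t = (mean - mu) / se              # same as =(D20-5.5)/D26
critical_t = 1.96                 # Appendix E value for more than 40 people

print(f"s.e. = {se:.2f}, t = {t:.2f}")   # s.e. = 0.22, t = 4.93
if abs(t) > critical_t:
    print("Reject the null hypothesis and accept the research hypothesis.")
else:
    print("Accept the null hypothesis.")
```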
Now, write the following sentence in D36-D39 to summarize the result of the t-test:
D36: Since the absolute value of t of 4.93 is
D37: greater than the critical t of 1.96, we
Result for Front Desk Clerk
D38: reject the null hypothesis and accept
D39: the research hypothesis.
Lastly, write the following sentence in D41-D43 to summarize the conclusion of the result for Item #7 of the Marriott Guest Satisfaction Survey:
D41: St. Louis Marriott Hotel guests rated the
D42: Front Desk Clerks as significantly
D43: friendly last week.
Save your file as: MARRIOTT3
Print the final spreadsheet so that it fits onto one page as given in Fig. 4.4. Enter the null hypothesis and the research hypothesis by hand on your spreadsheet.
Fig 4.4 Final Spreadsheet for Front Desk Clerk Friendliness
Important note: It is important for you to understand that “technically” the above conclusion in statistical terms should state:
“St. Louis Marriott Hotel guests rated the Front Desk Clerks as friendly last week, and this result was probably not obtained by chance.”
In this book, we use the term “significantly” in the conclusions of statistical tests as shorthand for “this result was probably not obtained by chance.” This lets readers grasp the conclusions in clear, straightforward language rather than in statistical terminology.
Can You Use Either the 95 Percent Confidence Interval About the Mean or the One-Group t-Test When Testing Hypotheses?
You are probably asking yourself:
“Are you saying that I can use either the 95% confidence interval about the mean or the one-group t-test to analyze the results of the types of problems described so far in this book?”
The answer is a resounding: “Yes!”
Both the confidence interval about the mean and the one-group t-test are used frequently in business research to analyze these kinds of problems, and the two methods lead to the same result and the same conclusion for a given set of data.
This book covers both tests because managers have different preferences: some prefer the confidence interval about the mean, others prefer the one-group t-test, and some use both to make their research reports clearer. Learning both methods prepares you to analyze statistical data whatever your manager prefers.
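The claim that the two methods agree can be checked numerically. In the sketch below (the function name is hypothetical), the reference value falls outside mean ± critical t × s.e. exactly when the absolute value of t exceeds the same critical t, because both conditions reduce to |mean − reference| > critical t × s.e. The usage lines reuse the Marriott Item #7 figures. Note the assumption that one critical value (1.96) is applied to both methods; a TINV-based interval would use a slightly larger value for n − 1 = 123 degrees of freedom.

```python
import math


def both_methods(mean, sd, n, reference, critical_t):
    """Return the 95% interval and each method's reject decision."""
    se = sd / math.sqrt(n)
    lower, upper = mean - critical_t * se, mean + critical_t * se
    ci_rejects = not (lower <= reference <= upper)          # reference outside CI
    t_rejects = abs((mean - reference) / se) > critical_t   # |t| > critical t
    return lower, upper, ci_rejects, t_rejects


# Marriott Item #7: n = 124, mean = 6.58, STDEV = 2.44, reference = 5.5.
lower, upper, ci_rejects, t_rejects = both_methods(6.58, 2.44, 124, 5.5, 1.96)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
print("CI rejects H0:", ci_rejects, "| t-test rejects H0:", t_rejects)
```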
Now, let’s try your Excel skills on the one-group t-test on these three problems at the end of this chapter.
End-of-Chapter Practice Problems
Subaru of America conducts weekly assessments of customer satisfaction at its dealerships through the Purchase Experience Survey, with a benchmark of 93% for satisfaction scores. Dealers who fall short of this target must undergo additional training in customer service. A random sample of rating forms from new-car buyers at the St. Louis Subaru dealer for Question #1d is shown in the hypothetical table in Fig. 4.5.
(a) Write the null hypothesis and the research hypothesis on your spreadsheet.
(b) Use Excel to find the sample size, mean, standard deviation, and standard error of the mean for these data, using two decimal places for the mean, standard deviation, and standard error of the mean.
(c) Enter the critical t from the t-table in Appendix E onto your spreadsheet, and label it.
Fig. 4.5 Worksheet Data for Chap. 4: Practice Problem #1
(d) Use Excel to compute the t-value for these data (use 2 decimal places) and label it on your spreadsheet.
(e) Type the result on your spreadsheet, and then type the conclusion in plain English on your spreadsheet.
(f) Save the file as: subaru4
As part of a Morale Survey initiated by top management, the Human Resources department analyzed managers' attitudes toward their work environment. To check your Excel skills, a random sample of the survey responses to Item #35 was selected; the hypothetical data are shown in Fig. 4.6.
Fig. 4.6 Worksheet Data for Chap. 4: Practice Problem #2
(a) On your Excel spreadsheet, write the null hypothesis and the research hypothesis for these data.
(b) Use Excel to find the sample size, mean, standard deviation, and standard error of the mean for these data (two decimal places).
(c) Use Excel to perform a one-group t-test on these data (two decimal places).
(d) On your printout, type the critical value of t (.05 level) given in your t-table in Appendix E.
(e) On your spreadsheet, type the result of the t-test.
(f) On your spreadsheet, type the conclusion of your study in plain English.
(g) Save the file as: challenge4
Suppose that you are a marketing consultant for the Missouri Botanical Garden and that you have redesigned the Comment Card survey to get better visitor feedback. You changed the rating scale from a 5-point scale to a 9-point scale, ranging from 1 (poor) to 9 (excellent), to capture more nuanced ratings and to increase the standard deviation of the results. The hypothetical results for Question #10 of the revised survey for a recent week are presented in Fig. 4.7.
(a) On your spreadsheet, write the null hypothesis and the research hypothesis.
(b) Use Excel to find the sample size, mean, standard deviation, and standard error of the mean, and place these results next to the data. Use two decimal places for the mean, standard deviation, and standard error of the mean.
(c) Enter the critical t from the t-table in Appendix E onto your spreadsheet, and label it.
(d) Use Excel to compute the t-value for these data (use 2 decimal places) and label it on your spreadsheet.
(e) Type the result on your spreadsheet, and then type the conclusion in plain English on your spreadsheet.
(f) Save the file as: Garden5
References
Foster, D.P., Stine, R.A., and Waterman, R.P. Basic Business Statistics: A Casebook. New York, NY: Springer-Verlag, 1998.
Zikmund, W.G., and Babin, B.J. Exploring Marketing Research (10th ed.). Mason, OH: South-Western Cengage Learning, 2010.
Two-Group t-Test of the Difference of the Means for Independent Groups
In this section of the book, we shift our focus from a single group of people measured once to two separate groups of people. This lets you compare the two groups in your research study.
The two-group t-test for independent groups is used when two separate groups of people are compared and each person is measured once on the same variable, so that each person has only one score. The groups are independent of one another: no individual belongs to both groups.
The two-group t-test relies on two key assumptions: both groups are drawn from normally distributed populations, and their variances are roughly equal (Zikmund and Babin 2010). (Recall that the standard deviation is the square root of the variance.) Different formulas are used when the same people are measured twice, which creates dependent groups; this book covers only independent groups, in which no individual appears in both groups.
When testing for the difference between the means of two groups, it's crucial to use the appropriate formula based on the sample sizes of each group.
(1) Use Formula #1 in this chapter when both of the groups have more than 30 people in them.
(2) Use Formula #2 in this chapter when either one group, or both groups, have sample sizes less than 30 people in them.
We will illustrate both of these situations in this chapter.
To effectively conduct hypothesis testing with two groups, it is essential to first understand the necessary steps involved in the process before exploring the relevant formulas.
The 9 STEPS for Hypothesis-Testing Using the Two-Group t-Test
STEP 1: Name One Group, Group 1, and the Other Group, Group 2
In this chapter, we will utilize the numbers 1 and 2 to differentiate between two groups By designating one group as Group 1 and the other as Group 2, you can streamline your calculations by using these numerical identifiers instead of repeatedly writing out the group names.
When you conduct taste tests among teenage boys comparing Coke and Pepsi, writing out “Coke” and “Pepsi” in every formula is cumbersome. Calling the Coke group Group 1 and the Pepsi group Group 2 makes references quicker and easier throughout the analysis and keeps the report clear.
Similarly, when comparing test-market results for Kansas City and Indianapolis, the labels Group 1 and Group 2 save time, because you do not need to write out the city names repeatedly.
It is important to understand that the designation of groups as Group 1 or Group 2 is arbitrary; the outcome and conclusions drawn from the formulas will remain consistent regardless of how you label these groups.
STEP 2: Create a Table That Summarizes the Sample Size, Mean Score, and Standard Deviation of Each Group
To ensure accuracy in your two-group t-test calculations, you must put the correct numbers into the correct places in your formulas. Mixing up the values can lead to serious errors that invalidate your entire analysis.
In a study of the taste preferences of teenage boys, each boy was randomly assigned to taste either Coke or Pepsi. After tasting one of the brands, he rated its taste on a 100-point scale, where 0 = poor and 100 = excellent. After the research study was completed, suppose that the Coke group had 52 boys in it, their mean taste rating was 55 with a standard deviation of 7, while the Pepsi group had 57 boys in it and their average taste rating was 64 with a standard deviation of 13.
To accurately analyze the taste ratings for teenage boys between two brands, it's essential to utilize six key statistics: the sample size, mean, and standard deviation for each group Proper application of these numbers in the formulas is crucial for determining any significant differences in the taste ratings.
If you create a table to summarize these data, a good example of the table, using both Step 1 and Step 2, would be the data presented in Fig. 5.1:
In your research study, you can label the Coke group Group 1 and the Pepsi group Group 2, which organizes the six numbers into a table as illustrated in Fig. 5.2.
You can now use the formulas for the two-group t-test with more confidence that the six numbers will be placed in the proper place in the formulas.
You could just as easily label the Pepsi group Group 1 and the Coke group Group 2; which group you call Group 1 is entirely up to you, and it does not change the result.
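Steps 1 and 2 amount to storing six numbers under two group labels. A small Python data class is one way to sketch such a summary table; the class and function names are illustrative, not part of the book's Excel method, and the values are the Coke and Pepsi figures given above.

```python
from dataclasses import dataclass


@dataclass
class GroupSummary:
    """The three numbers Step 2 records for each group."""
    label: str
    n: int
    mean: float
    sd: float


# Step 1: name the groups; Step 2: record n, mean, and STDEV for each.
group1 = GroupSummary("Group 1 (Coke)", n=52, mean=55.0, sd=7.0)
group2 = GroupSummary("Group 2 (Pepsi)", n=57, mean=64.0, sd=13.0)


def print_summary_table(*groups):
    """Print one row per group so each number is easy to find later."""
    print(f"{'Group':<18}{'n':>5}{'Mean':>8}{'STDEV':>8}")
    for g in groups:
        print(f"{g.label:<18}{g.n:>5}{g.mean:>8.1f}{g.sd:>8.1f}")


print_summary_table(group1, group2)
```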
STEP 3: State the Null Hypothesis and the Research Hypothesis for the Two-Group t-Test
Entering the Data Needed for the Two-Group t-Test
After completing Step 1, you will find this step straightforward, because the null hypothesis and the research hypothesis for the two-group t-test are always stated in the same way. The null hypothesis states that the population means of the two groups are equal, while the research hypothesis states that the population means of the two groups are not equal. In notation format, this becomes:
H0: μ1 = μ2
H1: μ1 ≠ μ2
You can now see that this notation is much simpler than having to write out the names of the two groups in all of your formulas.
STEP 4: Select the Appropriate Statistical Test
This chapter focuses on scenarios involving two distinct groups of individuals, where each person in both groups has only one measurement To analyze this data, we will utilize the two-group t-test as our primary statistical method.
STEP 5: Decide on a Decision Rule
The decision rule is exactly what it was in the previous chapter (see Sect. 4.1.3) when we dealt with the one-group t-test.
(a) If the absolute value of t is less than the critical value of t, accept the null hypothesis.
(b) If the absolute value of t is greater than the critical value of t, reject the null hypothesis and accept the research hypothesis.
Since you learned how to find the absolute value of t in the previous chapter (see Sect. 4.1.3.1), you can use that knowledge in this chapter.
STEP 6: Calculate the Formula for the Two-Group t-Test
In this chapter, we will discuss two distinct formulas for conducting a two-group t-test, which vary based on the sample sizes of the groups Detailed instructions on how to apply these formulas will be provided later in the text.
STEP 7: Find the Critical Value of t in the t-Table in Appendix E
In the previous chapter on the one-group t-test, you learned how to find the critical value of t in the t-table in Appendix E. You located the sample size of the group in the first column and read the corresponding critical t value in the “critical t” column on the right. With practice, this process becomes straightforward.
The two-group t-test involves a more complex process for determining the critical value of t due to the presence of two distinct groups in the study, which often vary in sample sizes.
To use Appendix E correctly in this chapter, you need to learn how to find the “degrees of freedom” for your study. We will discuss that process now.
5.1.7.1 Finding the Degrees of Freedom (df) for the Two-Group t-Test
Objective: To find the degrees of freedom for the two-group t-test and to use it to find the critical value of t in the t-table in Appendix E
The concept of “degrees of freedom” is important in statistics. A detailed mathematical explanation is beyond the scope of this book, but you can find one in any good statistics textbook, such as Keller (2009). For practical purposes, you only need to know how to calculate the degrees of freedom and how to use them to find the critical value of t in Appendix E. The formula for the degrees of freedom (df) is:

$$df = n_1 + n_2 - 2$$
To calculate the degrees of freedom for your analysis, simply sum the sample sizes of Group 1 and Group 2, then subtract 2 from that total This will give you the appropriate degrees of freedom to reference in Appendix E.
In a two-group t-test, it is essential to utilize the second column of the table, known as degrees of freedom (df), to determine the critical value of t, rather than relying on the first column that is used in the one-group t-test based on a single sample size, n.
For example, suppose Group 1 has 13 people and Group 2 has 17 people. The degrees of freedom are 28 (13 + 17 − 2 = 28), and the t-table in Appendix E gives a critical value of t of 2.048 for df = 28.
If Group 1 has 52 participants and Group 2 has 57, the degrees of freedom are 107 (52 + 57 − 2 = 107). Appendix E shows that whenever the degrees of freedom exceed 39, the critical value of t is 1.96, so 1.96 is the critical t for this example.
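The degrees-of-freedom rule and the table lookup described above can be sketched as follows. The dictionary contains only the df = 28 row quoted in the text plus the 1.96 rule for df above 39; it is a hypothetical stand-in for Appendix E, and other rows must be read from the table itself.

```python
# Excerpt of Appendix E ("degrees of freedom" -> "critical t"), limited to the
# row quoted in the text.
CRITICAL_T_BY_DF = {28: 2.048}


def degrees_of_freedom(n1, n2):
    """df for the two-group t-test: n1 + n2 - 2."""
    return n1 + n2 - 2


def critical_t_two_group(df):
    """Critical value of t (95% level) read from the df column."""
    if df > 39:
        return 1.96
    if df in CRITICAL_T_BY_DF:
        return CRITICAL_T_BY_DF[df]
    raise KeyError(f"df = {df} is not in this excerpt; read it from Appendix E")


for n1, n2 in [(13, 17), (52, 57)]:
    df = degrees_of_freedom(n1, n2)
    print(f"n1 = {n1}, n2 = {n2}: df = {df}, critical t = {critical_t_two_group(df)}")
```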
STEP 8: State the Result of Your Statistical Test
The result follows exactly the same format as the result for the one-group t-test in the previous chapter (see Sect. 4.1.6):
If the absolute value of t calculated from the formula is less than the critical value of t in Appendix E, accept the null hypothesis. If the absolute value of t is greater than the critical value, reject the null hypothesis and accept the research hypothesis.
STEP 9: State the Conclusion of Your Statistical Test in Plain English
Writing a conclusion for a two-group t-test presents more challenges than for a one-group t-test, as it requires determining the differences between the two groups being analyzed.
When you accept the null hypothesis, the conclusion is simple to write: “There is no difference between the two groups in the variable that was measured.”
But when you reject the null hypothesis and accept the research hypothesis, you need to be careful about writing the conclusion so that it is both accurate and concise.
Let’s give you some practice in writing the conclusion of a two-group t-test.
5.1.9.1 Writing the Conclusion of the Two-Group t-Test When You Accept the Null Hypothesis
Objective: To write the conclusion of the two-group t-test when you have accepted the null hypothesis.
Suppose that you have been hired as a statistical consultant by the Marriott Hotel in St Louis to analyze the results of a Guest Satisfaction Survey that the hotel gives to all of its customers. The survey measures customer satisfaction with various activities of the hotel.
86 5 Two-Group t-Test of the Difference of the Means for Independent Groups
The survey contains a number of items, but suppose that Item #7 is the one in Fig. 5.3:
Suppose, further, that you have decided to use the two-group t-test to compare men and women on last week's customer data.
Important note: You would need to run this test separately for each survey item.
Last week at the St Louis Marriott Hotel, a sample of 124 men had a mean score of 6.58 on Item #7, with a standard deviation of 2.44. A sample of 86 women had a mean score of 6.45, with a standard deviation of 1.86.
Later in this chapter, we explain how to use the two-group t-test formulas to calculate these results. For now, note that these data give the following results: the degrees of freedom are 208, the critical t value is 1.96 (in Appendix E), and the t-test formula gives a value of 0.44.
Result: Since the absolute value of 0.44 is less than the critical t of 1.96, we accept the null hypothesis.
Conclusion: There was no difference between male and female guests last week in their rating of the friendliness of the front-desk clerk at the St Louis Marriott Hotel.
Now, let’s see what happens when you reject the null hypothesis (H0) and accept the research hypothesis (H1).
Fig 5.3 Marriott Hotel Guest Satisfaction Survey Item #7
Fig 5.4 Worksheet Data for Males vs Females for the St Louis Marriott Hotel for Accepting the Null Hypothesis
5.1 The 9 STEPS for Hypothesis-Testing Using the Two-Group t-Test 87
5.1.9.2 Writing the Conclusion of the Two-Group t-Test When You Reject the Null Hypothesis and Accept the Research Hypothesis
Objective: To write the conclusion of the two-group t-test when you have rejected the null hypothesis and accepted the research hypothesis.
Let’s continue with this same example of the Marriott Hotel, but with the result that we reject the null hypothesis and accept the research hypothesis.
Last week, data were collected from 85 males, who had a mean score of 7.26 with a standard deviation of 2.35. By contrast, 48 females had a mean score of 4.37 on the same question, with a standard deviation of 3.26.
Without going into the details of the formulas for the two-group t-test, these data would produce the following result and conclusion based on Fig. 5.5:
Research Hypothesis: μ1 ≠ μ2
Degrees of freedom: 131
Critical t: 1.96 (in Appendix E)
t-test formula: 5.40 (when you use your calculator!)
Result: Since the absolute value of 5.40 is greater than the critical t of 1.96, we reject the null hypothesis and accept the research hypothesis.
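Each group in both Marriott examples has more than 30 people, so both examples use Formula #1 (Sect. 5.2). The minimal Python sketch below assumes that Formula #1 has the form implied by the spreadsheet steps in Sect. 5.2: t equals the difference of the means divided by the square root of (STDEV1 squared / n1 + STDEV2 squared / n2). With that assumption, it reproduces the two t values quoted above. The function name `t_formula_1` is hypothetical.

```python
import math


def t_formula_1(mean1, sd1, n1, mean2, sd2, n2):
    """Two-group t-test, Formula #1 (both groups larger than 30):
    t = (mean1 - mean2) / sqrt(sd1**2 / n1 + sd2**2 / n2)."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)  # standard error of the difference
    return (mean1 - mean2) / se


# Accept-H0 example: 124 men vs. 86 women on Item #7
t_accept = t_formula_1(6.58, 2.44, 124, 6.45, 1.86, 86)
# Reject-H0 example: 85 males vs. 48 females on Item #7
t_reject = t_formula_1(7.26, 2.35, 85, 4.37, 3.26, 48)

print(round(t_accept, 2))  # 0.44
print(round(t_reject, 2))  # 5.4
```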
Which group rated the friendliness of the front-desk clerk more positively: men or women? Rejecting the null hypothesis tells you only that the two means differ. To find out which group gave the higher rating, you need to compare the two mean scores.
To write the conclusion of a two-group t-test, you must compare the means of the two groups. If you reject the null hypothesis and accept the research hypothesis, include the word "significantly" in your conclusion to show that the difference between the two means is statistically significant.
When the variable is measured on a rating scale, it helps to draw a "picture" of the two groups' mean scores on the scale before you write the conclusion. This makes the difference between the two means easy to see. For the Marriott Hotel example, you would draw the scale as shown in Fig. 5.6.
Fig 5.5 Worksheet Data for St Louis Marriott Hotel for Obtaining a Significant Difference between Males and Females
The picture shows that males rated the friendliness of the front-desk clerks significantly higher than females did (7.26 vs 4.37). Because we rejected the null hypothesis and accepted the research hypothesis, we know that the difference between these two mean scores is statistically significant.
So, our conclusion needs to contain the following key words:
We can use these key words to write either of two conclusions, which are logically identical:
Last week at the Marriott Hotel in St Louis, male guests rated the Front Desk Clerks as significantly more friendly than female guests did (7.26 vs 4.37).
Last week at the Marriott Hotel in St Louis, female guests rated the Front Desk Clerks as significantly less friendly than male guests did (4.37 vs 7.26).
Both of these conclusions are accurate, so you can decide which one you want to write. It is your choice.
When you write your conclusion, make sure the mean scores in parentheses follow the same order as the groups you mention. For example, if you write "Male guests rated the Front Desk Clerks as significantly more friendly than female guests," the scores should appear as (7.26 vs 4.37): males first, females second.
Including the two mean scores in your conclusion lets readers see the difference between the groups without referring back to the table, which makes your findings easier to read.
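The rules for wording a conclusion can be captured in a small template. Those rules are: compare the two means, use "significantly" only when you reject the null hypothesis, and keep the means in the same order as the groups. This Python sketch is hypothetical and deliberately simple. It always names the higher-scoring group first and uses the generic phrase "scored significantly higher." You would replace that phrase with wording that fits your own variable, such as "rated the Front Desk Clerks as significantly more friendly."

```python
def write_conclusion(group1, mean1, group2, mean2, reject_null):
    """Draft a plain-English conclusion for a two-group t-test.

    The means are reported in parentheses in the same order in which
    the groups are named in the sentence.
    """
    if not reject_null:
        return ("There is no difference between the two groups "
                "in the variable that was measured.")
    # Name the higher-scoring group first so the sentence can say "higher than"
    if mean1 < mean2:
        group1, mean1, group2, mean2 = group2, mean2, group1, mean1
    return (f"{group1} scored significantly higher than {group2} "
            f"({mean1:.2f} vs {mean2:.2f}).")


print(write_conclusion("Male guests", 7.26, "female guests", 4.37, True))
# Male guests scored significantly higher than female guests (7.26 vs 4.37).
```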
Fig 5.6 Example of Drawing a “Picture” of the Means of the Two Groups on the Rating Scale
Now, let’s discuss FORMULA #1 that deals with the situation in which both groups have more than 30 people in them.
Objective: To use FORMULA #1 for the two-group t-test when both groups have a sample size greater than 30 people.
Formula #1: Both Groups Have More Than 30 People in Them
An Example of Formula #1 for the Two-Group t-Test
Now, let’s use Formula #1 in a situation in which both groups have a sample size greater than 30 people.
In a taste test conducted for PepsiCo, teenage boys aged 13 to 18 rated the taste of either Pepsi or Coke without knowing the brand names. The boys were divided into two groups: Group 1 tasted Coke, and Group 2 tasted Pepsi. Each boy rated his drink on a 100-point scale. The purpose of the study is to find out whether teenage boys rate the taste of the two soft drinks significantly differently.
The 52 boys in the Coke group gave Coke a mean rating of 55, with a standard deviation of 7. The 57 boys in the Pepsi group gave Pepsi a higher mean rating of 64, with a standard deviation of 13.
Note that the two-group t-test does not require the two groups to have the same sample size.
Your data then produce the following table in Fig. 5.8:
Create an Excel spreadsheet, and enter the following information:
Now, widen column B so that it is twice as wide as column A, and center the six numbers and their labels in your table (see Fig. 5.9).
Fig 5.7 Example of a Rating Scale for a Soft Drink Taste Test (Practical Example)
Fig 5.8 Worksheet Data for Soft Drink Taste Test
Since both groups have a sample size greater than 30, you need to use Formula #1 for the t-test for the difference of the means of the two groups.
Let’s “break this formula down into pieces” to reduce the chance of making a mistake.
B13: STDEV1 squared / n1 (note that you square the standard deviation of Group 1, and then divide the result by the sample size of Group 1)
Fig 5.9 Results of Widening Column B and Centering the Numbers in the Cells
5.2 Formula #1: Both Groups Have More Than 30 People in Them 93
You now need to compute the values of the above formulas in the following cells:
In cells D13, D16, and D19, compute the results of the formulas labeled in cells B13, B16, and B19, using two decimal places. Then, in cell D22, compute the square root of D19, also to two decimal places.
This formula should give you a standard error (s.e.) of 1.98.
Fig 5.10 Formula Labels for the Two-group t-test
(Since df = n1 + n2 - 2, this gives df = 109 - 2 = 107, and the critical t is, therefore, 1.96 in Appendix E.)
D28: =(D4-D5)/D22 (use 2 decimals) (no spaces between)
This formula should give you a value for the t-test of: -4.55.
Next, check to see if you have rounded off all figures in D13:D28 to two decimal places (see Fig. 5.11).
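You can mirror the cell-by-cell breakdown of Formula #1 in Python to check the spreadsheet. The variable names follow the cell addresses. The formula labels for B16 and B19 are not reproduced here, so this sketch assumes, as the surrounding steps suggest, that D16 holds STDEV2 squared / n2 and D19 holds the sum D13 + D16. As in Excel, full precision is kept, and values are rounded only for display.

```python
import math

# Soft drink taste test: Group 1 = Coke, Group 2 = Pepsi
n1, mean1, sd1 = 52, 55, 7
n2, mean2, sd2 = 57, 64, 13

d13 = sd1 ** 2 / n1          # STDEV1 squared / n1
d16 = sd2 ** 2 / n2          # STDEV2 squared / n2 (assumed label)
d19 = d13 + d16              # sum of the two pieces (assumed label)
d22 = math.sqrt(d19)         # standard error of the difference of the means
df = n1 + n2 - 2             # 107, so the critical t is 1.96 in Appendix E
d28 = (mean1 - mean2) / d22  # t-test value

print(f"s.e. = {d22:.2f}, df = {df}, t = {d28:.2f}")
# s.e. = 1.98, df = 107, t = -4.55
```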
Now, write the following sentence in D31 to D34 to summarize the result of the study:
D31: Since the absolute value of -4.55
D32: is greater than the critical t of
D33: 1.96, we reject the null hypothesis
D34: and accept the research hypothesis.
Finally, write the following sentence in D36 to D38 to summarize the conclusion of the study in plain English:
D36: Teenage boys rated the taste of
D37: Pepsi as significantly better than
D38: the taste of Coke (64 vs 55).
Save your file as: COKE4
Print this file so that it fits onto one page, and write by hand the null hypothesis and the research hypothesis on your printout.
The final spreadsheet appears in Fig. 5.12.
Fig 5.11 Results of the t-test Formula for the Soft Drink Taste Test
Now, let’s use the second formula for the two-group t-test which we use whenever either one group, or both groups, have less than 30 people in them.
Objective: To use Formula #2 for the two-group t-test when one or both groups have less than 30 people in them.
Now, let’s look at the case when one or both groups have a sample size less than 30 people.
Fig 5.12 Final Worksheet for the Coke vs Pepsi Taste Test
Formula #2: One or Both Groups Have Less Than 30 People in Them
Suppose that a manufacturer of MP3 players wants to run a pricing experiment to find out whether lowering the price changes the number of units that wholesalers purchase.
Suppose, further, that you have randomly selected 7 wholesalers to purchase the product at the regular price, and they purchased a mean of 117.7 units with a standard deviation of 19.9 units.
In addition, you randomly selected a different group of 8 wholesalers to purchase the product at a 10% price cut, and they purchased a mean of 125.1 units with a standard deviation of 15.1 units.
You want to test to see if the two different prices produced a significant difference in the number of MP3 units sold.
You have decided to use the two-group t-test for independent samples, and the following data resulted in Fig. 5.13:
Note: Since both groups have a sample size less than 30 people, you need to use Formula #2 in the following steps:
Create an Excel spreadsheet, and enter the following information:
Now, widen column B so that it is three times as wide as column A.
To highlight all of the cells in column B, click on the letter "B" at the top of that column. Next, move your mouse to the right edge of the B cell until a cross sign appears. Then, click and drag this cross sign to the right until all of the text is visible on your screen, and release the mouse button.
Fig 5.13 Worksheet Data for Wholesaler Price Comparison (Practical Example)
5.3 Formula #2: One or Both Groups Have Less Than 30 People in Them 97
Next, center the information in cells C3 to E5 by highlighting these cells and then using this step:
Click on the bottom line, second from the left icon, under “Alignment” at the top-center of Home
Since both groups have a sample size less than 30, you need to use Formula #2 for the t-test for the difference of the means of two independent samples.
Formula #2 for the two-group t-test is the following:

$$ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \qquad (5.5) $$

where degrees of freedom

$$ df = n_1 + n_2 - 2 \qquad (5.6) $$
To minimize errors in writing complex formulas, it's advisable to break the formula down into smaller, manageable parts rather than attempting to create it as a single entry in one cell.
Now, enter these words on your spreadsheet:
Fig 5.14 Wholesaler Price Comparison Worksheet Data for Hypothesis Testing
Labels for Two-group t-test
You now need to compute the values of the above formulas in the following cells:
In cells E13 and E16, compute the results of the formulas labeled in cells B13 and B16, using two decimal places. Then, in cell E19, compute the result of the formula labeled in cell B19.
E22: the result of the formula needed to compute cell B22 (use 2 decimals)
E25: =SQRT(((E13 + E16)/E19)*E22)
Be sure to type three opening parentheses immediately after SQRT and to include three closing parentheses in the formula. If the parentheses are misplaced, the formula will not compute the standard error correctly.
The above formula gives a standard error of the difference of the means equal to 9.05 (two decimals).
E28: enter the critical t value from the t-table in Appendix E in this cell, using df = n1 + n2 - 2 to find the critical t value
When you enter the t-test formula, put an open parenthesis before D4 and a closed parenthesis after D5. This divides the difference between the two means (-7.40) by the standard error of the difference of the means (9.05). The resulting t-test value is -0.82, rounded to two decimal places (see Fig. 5.16).
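The same kind of check works for Formula #2. The labels in cells B13 to B22 are not reproduced here, so this sketch infers the four pieces of Formula (5.5) from the E25 formula. It assumes that E13, E16, E19, and E22 hold (n1 - 1) times STDEV1 squared, (n2 - 1) times STDEV2 squared, n1 + n2 - 2, and 1/n1 + 1/n2, respectively.

```python
import math

# Wholesaler price comparison: Group 1 = regular price, Group 2 = 10% price cut
n1, mean1, sd1 = 7, 117.7, 19.9
n2, mean2, sd2 = 8, 125.1, 15.1

e13 = (n1 - 1) * sd1 ** 2  # (n1 - 1) x STDEV1 squared
e16 = (n2 - 1) * sd2 ** 2  # (n2 - 1) x STDEV2 squared
e19 = n1 + n2 - 2          # degrees of freedom
e22 = 1 / n1 + 1 / n2      # 1/n1 + 1/n2
e25 = math.sqrt(((e13 + e16) / e19) * e22)  # same as =SQRT(((E13 + E16)/E19)*E22)
t = (mean1 - mean2) / e25  # same as (D4-D5) divided by the standard error

print(f"s.e. = {e25:.2f}, df = {e19}, t = {t:.2f}")
# s.e. = 9.05, df = 13, t = -0.82
```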
Now write the following sentence in D34 to D37 to summarize the result of the study:
D35: of t of -0.82 is less than
Finally, write the following sentence in D39 to D43 to summarize the conclusion of the study:
D40: in the number of units of
Comparison Two-group t-test Formula Results
D41: MP3 players sold at the
D42: two prices. So, you should
Save your file as: MP4
Print the final spreadsheet so that it fits onto one page.
Write the null hypothesis and the research hypothesis by hand on your printout. The final spreadsheet appears in Fig. 5.17.
Fig 5.17 Wholesaler Price Comparison Final Spreadsheet
End-of-Chapter Practice Problems
Boeing Company has commissioned an analysis of the Morale Surveys completed by its managers over the past month. The survey responses were added up to produce a total score. A higher score indicates greater job satisfaction, and a lower score indicates lower job satisfaction.
You select a random sample of 202 female managers, who averaged 84.80 on this survey with a standard deviation of 5.10. You also select a random sample of 241 male managers, who averaged 88.20 on this survey with a standard deviation of 4.30.
(a) State the null hypothesis and the research hypothesis on an Excel spreadsheet.
To calculate the standard error of the difference between the means in Excel, first enter your data and use the appropriate formula. Next, find the critical t value in Appendix E and record it in your spreadsheet. Finally, use Excel to perform a t-test on the data and obtain the t value.
Use three decimal places for all figures in the formula section of your spreadsheet.
(e) State your result on your spreadsheet.
(f) State your conclusion in plain English on your spreadsheet.
(g) Save the file as: Boeing3
In 2010, Massachusetts Mutual Financial Group ran a full-page color advertisement in The Wall Street Journal showing a male model embracing his two-year-old daughter. The ad was designed to convey a heartfelt message about the importance of family and financial security.
WHAT IS THE SIGN OF A GOOD DECISION?
It’s knowing your life insurance can help provide income for retirement. And peace of mind until you get there.
Since the majority of the subscribers to The Wall Street Journal are men, an interesting research question would be the following:
Research question: “Does a male model in a magazine ad affect adult men’s or adult women’s willingness to learn more about how life insurance can provide income for retirement?”
In a controlled experiment, adult males and females aged 25–39 were each shown the same advertisement featuring a male model, so that both groups saw the same copy format. The two groups were kept separate so that they could not interact during the study.
At the end of a one-hour discussion of the mockup ad, the respondents were asked the question given in Fig. 5.18:
5.4 End-of-Chapter Practice Problems 103
The resulting data for this question appear in Fig. 5.19:
Fig 5.18 Rating Scale Item for a Magazine Ad Interest Indicator (Practical Example)
Fig 5.19 Worksheet Data for Chap 5: Practice
(a) On your Excel spreadsheet, write the null hypothesis and the research hypothesis.
(b) Create a table in your spreadsheet that summarizes the data, and use Excel to find the sample sizes, means, and standard deviations for the two groups in this table.
(c) Use Excel to find the standard error of the difference of the means.
(d) Use Excel to perform a two-group t-test. What is the value of t that you obtain (use two decimal places)?
(e) On your spreadsheet, type the critical value of t using the t-table in Appendix E.
(f) Type your result on the test on your spreadsheet.
(g) Type your conclusion in plain English on your spreadsheet.
(h) Save the file as: lifeinsur12
3 American Airlines offered passengers an in-flight meal priced at $8.00 and asked them to complete a survey about their meal experience. Passengers rated their likelihood of purchasing the meal on a 5-point scale. If the airline changed the survey to measure purchase intention on a 7-point scale, the revised item would be the one shown in Fig. 5.20.
The survey classified passengers as either business travelers or vacationers. Last month, 64 business travelers gave an average rating of 3.23, with a standard deviation of 1.04.
56 “vacationers” had an average rating of 2.36 with a standard deviation of 1.35.
(a) State the null hypothesis and the research hypothesis on an Excel spreadsheet.
To find the standard error of the difference between the means in Excel, start by calculating the necessary values. Next, use Appendix E to find the critical t value and enter it in your spreadsheet. Finally, use Excel to perform a t-test on the data and obtain the t value.
(e) State your result on your spreadsheet.
(f) State your conclusion in plain English on your spreadsheet.
(g) Save the file as: AAmeal3
Fig 5.20 Rating Scale Item for an In-flight Meal on an American Airlines Survey (Practical Example)
Keller, G. Statistics for Management and Economics (8th ed.). Mason, OH: South-Western Cengage Learning, 2009.
Zikmund, W.G. and Babin, B.J. Exploring Marketing Research (10th ed.). Mason, OH: South-Western Cengage Learning, 2010.
Mass Mutual Financial Group. What is the Sign of a Good Decision? (Advertisement). The Wall Street Journal, September 29, 2010, p. A22.
Correlation and Simple Linear Regression
There are many different types of “correlation coefficients,” but the one we will use in this book is the Pearson product–moment correlation, which we will call: r.
What Is a “Correlation?”
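Understanding the Formula for Computing a Correlation
Objective: To understand the formula for computing the correlation r.
The formula for computing the correlation r is as follows:

$$ r = \frac{1}{n-1} \cdot \frac{\sum (X - \bar{X})(Y - \bar{Y})}{S_x S_y} $$

where S_x and S_y are the standard deviations (STDEV) of X and Y.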
This formula looks daunting at first glance, but let’s “break it down into its steps” to understand how to compute the correlation r.
Understanding the Nine Steps for Computing a Correlation
Objective: To understand the nine steps of computing a correlation r.
The nine steps are as follows:
1 Find the sample size n by noting the number of weeks: 8
2 Divide the number 1 by the sample size minus 1 (i.e., 1/7): 0.14286
3 For each week, subtract the mean cost of TV ads for the 8 weeks from that week's cost of TV ads, and call this X - X̄ (for example, for week 6 this would be: 3.3 - 3.03)
Note: With your calculator, this difference is 0.27. Excel, however, carries 16 decimal places in every computation and so uses the unrounded mean rather than 3.03, which makes its result 0.28 instead of 0.27.
4 For each week, subtract the mean weekly sales for the 8 weeks from that week's sales, and call this Y - Ȳ (for example, for week 6 this would be: 92 - 91.50)
5 Then, for each week, multiply (X - X̄) times (Y - Ȳ) (for example, for week 6 this would be: 0.27 x 0.50)
6 Add the results of (X - X̄) times (Y - Ȳ) for the 8 weeks: 11.50
Steps 1–6 would produce the Excel table given in Fig. 6.8.
112 6 Correlation and Simple Linear Regression
Note that multiplying two negative numbers gives a positive result, as in week 2 (-1.13 x -4.50 = +5.06). Multiplying a negative number by a positive number gives a negative result, as in week 5 (-0.13 x +0.50 = -0.06).
Note: Excel carries all computations to 16 decimal places. So, when you check your work with a calculator, you will often get a slightly different answer from Excel’s.
For example, when you compute (X - X̄) times (Y - Ȳ) for week 2 above, your calculator gives: (-1.13) x (-4.50) = +5.085
But, as you can see from the table, Excel’s answer of 5.06 is more accurate because Excel uses 16 decimal places for every number.
In Step 6, first add up all of the positive numbers to get +12.61, and then add up all of the negative numbers to get -1.11. Adding these two totals (+12.61 + (-1.11)) gives the final answer of +11.50.
7 Multiply the answer for step 2 above by the answer for step 6 (0.14286 x 11.50): 1.6429
8 Multiply the STDEV of X times the STDEV of Y (0.93 x 2.33): 2.1669
9 Finally, divide the answer from step 7 by the answer from step 8 (1.6429 / 2.1669): 0.76
The correlation of +0.76 indicates a strong, positive relationship between the weekly cost of TV ads (X) and the weekly sales of this supermarket chain (Y) over the 8-week period. That is, in weeks when the chain spent more on TV ads, its sales for that week tended to be higher. For a more detailed discussion of correlation, see Zikmund and Babin (2010).
You could also use the results of the above table in the formula for computing the correlation r in the following way:

$$ r = \frac{1}{n-1} \cdot \frac{\sum (X - \bar{X})(Y - \bar{Y})}{S_x S_y} = \frac{(1/7) \times 11.50}{(0.93)(2.33)} = 0.76 $$
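The nine steps translate directly into a short Python function. This is a sketch for checking hand or Excel calculations, not the book's method of computing r with Excel. `statistics.stdev` returns the sample standard deviation, which matches Excel's STDEV. The 8 weeks of TV-ad data are not reproduced here, so the example uses the book's rounded intermediate values plus a small hypothetical data set.

```python
import statistics


def correlation_nine_steps(xs, ys):
    """Pearson r via r = [1/(n - 1)] * sum((X - Xbar)(Y - Ybar)) / (STDEVx * STDEVy)."""
    n = len(xs)                                     # step 1
    factor = 1 / (n - 1)                            # step 2
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    products = [(x - x_bar) * (y - y_bar)           # steps 3-5
                for x, y in zip(xs, ys)]
    total = sum(products)                           # step 6
    numerator = factor * total                      # step 7
    denominator = statistics.stdev(xs) * statistics.stdev(ys)  # step 8
    return numerator / denominator                  # step 9


# Book's rounded intermediate values for the TV-ads example
r_book = (1 / 7) * 11.50 / (0.93 * 2.33)
print(round(r_book, 2))  # 0.76

# Hypothetical toy data (not the book's data): a perfect positive relationship
print(round(correlation_nine_steps([1, 2, 3], [2, 4, 6]), 2))  # 1.0
```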
Now, let’s discuss how you can use Excel to find the correlation between two variables in a much simpler, and much faster, fashion than using your calculator.
Using Excel to Compute a Correlation Between Two Variables
Objective: To use Excel to find the correlation between two variables.
Suppose that you have been hired by the owner of a supermarket chain in St Louis to recommend the optimal number of shelf facings for Kellogg’s Corn Flakes in this chain. A “shelf facing” is the number of boxes of the cereal placed side by side on the shelf. For example, a shelf facing of 3 means that 3 boxes of Kellogg’s Corn Flakes are placed beside each other on the supermarket shelf in the cereals section.
In this study, supermarket locations were randomly selected, and the number of facings at each location was also randomly selected, ranging from 1 to 3 facings. Weekly sales of the cereal, in thousands of dollars, were recorded for ten weeks and are shown in Fig. 6.9.
Fig 6.9 Worksheet Data for the Number of Facings and Sales (Practical Example)
To investigate the relationship between the number of facings of Kellogg's Corn Flakes (X) and its weekly sales (Y), you will compute the correlation between these two variables.
Create an Excel spreadsheet with the following information:
Next, change the width of Columns B and C so that the information fits inside the cells.
Now, complete the remaining figures in the table given above so that A12 is 10, B12 is 3, and C12 is 4.5. (Be sure to double-check your figures to make sure that they are correct!) Then, center the information in all of these cells.
Next, define the “name” to the range of data from B3:B12 as: facings
We discussed earlier in this book (see Sect. 1.4.4) how to “name a range of data,” but here is a reminder of how to do that:
To give a “name” to a range of data:
Click on the top number in the range of data and drag the mouse down to the bottom number of the range.
To name the range of cells B3:B12 as “facings,” click on cell B3 and drag the pointer down to B12 to highlight these cells. Then, click on:
Formulas (top of your screen)
Define Name (top center of your screen)
facings (type this in the Name box; see Fig. 6.10)
6.2 Using Excel to Compute a Correlation Between Two Variables 115
Now, repeat these steps to give the name: sales to C3:C12
Finally, click on any blank cell on your spreadsheet to “deselect” cells C3:C12 on your computer screen.
Complete the sample sizes, means, and standard deviations for columns B and C. Your results should show 0.79 in cell B16 and 1.47 in cell C16, rounded to two decimal places (see Fig. 6.11).
Fig 6.10 Dialogue Box for Naming a Range of Data as: “facings”
Using Excel to Find the
Objective: Find the correlation between the number of facings and the weekly sales dollars.
C18: =correl(facings,sales) (see Fig. 6.12)
Hit the Enter key to compute the correlation
C18: format this cell to two decimals
Note that the equal sign tells Excel that the cell contains a formula. The correlation between the number of facings (X) and weekly sales (Y) is +0.83, a strong, positive correlation. That is, the more facings used (1, 2, or 3 facings), the higher the weekly sales dollars for this cereal tend to be.
Save this file as: FACINGS5
The final spreadsheet appears in Fig. 6.13.
Creating a Chart and Drawing the Regression Line onto the Chart
Using Excel to Create a Chart and the Regression
The objective of this analysis is to illustrate the correlation between the number of shelf facings and weekly sales in thousands of dollars by creating a chart This involves plotting the data points and drawing a regression line to effectively summarize the relationship between these two variables.
2 Click and drag the mouse to highlight both columns of numbers (B3:C12), but do not highlight the labels at the top of Column B and Column C.
Insert (top left of screen)
Highlight: Scatter chart icon (immediately above the word: “Charts” at the top center of your screen)
Click on the down arrow on the right of the chart icon
Highlight the top left scatter chart icon (see Fig.6.14)
Click on the top left chart to select it
Click on the “+” icon to the right of the chart (CHART ELEMENTS).
Click on the check marks next to “Chart Title” and “Gridlines” to remove them (see Fig. 6.15).
Fig 6.14 Example of Selecting a Scatter Chart
Click on the box next to: “Chart Title” and then click on the arrow to its right. Then, click on: “Above chart”.
Note that the words: “Chart Title” are now in a box at the top of the chart (see Fig. 6.16)
Enter the following Chart Title to the right of fx at the top of your screen:
RELATIONSHIP BETWEEN NO OF FACINGS AND SALES (see Fig. 6.17)
Fig 6.15 Example of Chart Elements Selected
Fig 6.16 Example of Chart Title Selected
6.3 Creating a Chart and Drawing the Regression Line onto the Chart 121
Hit the Enter Key to enter this chart title onto the chart
Click inside the chart at the top right corner of the chart to “deselect” the box around the Chart Title (see Fig. 6.18)
Click on the “+” box to the right of the chart
Add a check mark to the left of “Axis Titles” (This will create an “Axis Title” box on the y-axis of the chart)
Fig 6.17 Example of Creating a Chart Title
Fig 6.18 Example of a Chart Title Inserted onto the Chart
Click on the right arrow for: “Axis titles” and then click on: “Primary Horizontal” to remove the check mark in its box (this will create the y-axis title)
Enter the following y-axis title to the right of fx at the top of your screen: SALES ($000)
Then, hit the Enter Key to enter this y-axis title to the chart
Click inside the chart at the top right corner of the chart to “deselect” the box around the y-axis title (see Fig. 6.19)
Click on the “+” box to the right of the chart
Highlight: “Axis Titles” and click on its right arrow
Click on the words: “Primary Horizontal” to add a check mark to its box (this creates an “Axis Title” box on the x-axis of the chart)
Enter the following x-axis title to the right of fx at the top of your screen:
Then, hit the Enter Key to add this x-axis title to the chart
Click inside the chart at the top right corner of the chart to “deselect” the box around the x-axis title (see Fig. 6.20).
Fig 6.19 Example of Adding a y-axis Title to the Chart
6.3.1.1 Drawing the Regression Line Through the Data Points in the Chart
Objective: To draw the regression line through the data points on the chart
Right-click on any one of the data points inside the chart
Highlight: Add Trendline (see Fig.6.21)
Fig 6.20 Example of a Chart Title, an x-axis Title, and a y-axis Title
Fig 6.21 Dialogue Box for Adding a Trendline to the Chart
Linear (be sure the “linear” button near the top is selected on the “Format Trendline” dialog box; see Fig.6.22)
Click on the X at the top right of the “Format Trendline” dialog box to close this dialog box
Click on any blank cell outside the chart to “deselect” the chart
Save this file as: FACINGS7
Your spreadsheet should look like the spreadsheet in Fig. 6.23.
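Excel's linear trendline is the least-squares regression line through the data points. The minimal Python sketch below shows how the slope and y-intercept of such a line are calculated. It uses small hypothetical data rather than the facings and sales data, and it is not Excel's own implementation. The regression equation itself is covered in the section on finding the regression equation.

```python
import statistics


def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # y-intercept
    return a, b


# Hypothetical toy data (not the book's facings/sales data)
a, b = least_squares_line([1, 2, 3], [2, 4, 6])
print(a, b)  # 0.0 2.0
```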
Fig 6.22 Dialogue Box for a Linear Trendline
Fig 6.23 Final Chart with the Trendline Fitted Through the Data Points of the Scatterplot
6.3.1.2 Moving the Chart Below the Table in the Spreadsheet
Objective: To move the chart below the table
To reposition the chart, left-click on any blank area to the right of the top title, hold the button down, and drag the chart down and to the left until the top left corner aligns with cell A20, then release the mouse button.
6.3.1.3 Making the Chart “Longer” So That It Is “Taller”
Objective: To make the chart “longer” so that it is taller
To extend the chart, place the pointer on the bottom-center of the chart's border so that an up-and-down arrow appears. While holding down the left mouse button, drag the bottom of the chart down to row 42, and then release the mouse button.
Fig 6.24 Example of Moving the Chart Below the Table
Objective: To make the chart “wider”
To widen the chart, place the pointer at the center of the right border of the chart so that a left-to-right arrow appears. Then, hold down the left mouse button and drag the right border of the chart to the middle of Column H.
Now, click on any blank cell outside the chart to “deselect” the chart (see Fig. 6.25).
Printing a Spreadsheet So That the Table and Chart Fit onto One Page
Objective: To print the spreadsheet so that the table and the chart fit onto one page
Page Layout (top of screen)
Fig 6.25 Example of a Chart that is Enlarged to Fit the Cells: A20:H42
6.4 Printing a Spreadsheet So That the Table and Chart Fit onto One Page 127
To ensure that the table and chart fit on one page, adjust the scale by clicking the down-arrow on the middle icon at the top of the screen and selecting "95%."
Fig 6.26 Example of the Page Layout for Reducing the Scale of the Chart to 95 % of Normal Size
Save your file as: FACINGS8
Finding the Regression Equation
Installing the Data Analysis ToolPak into Excel
Objective: To install the Data Analysis ToolPak into Excel
Since there are currently five versions of Excel in the marketplace (2003, 2007,
2010, 2013, 2016), we will give a brief explanation of how to install the Data Analysis ToolPak into each of these versions of Excel.
6.5.1.1 Installing the Data Analysis ToolPak into Excel 2016
Click on: Data (at the top of your screen)
Check the top right corner of your monitor screen for the words "Data Analysis." If they are present, it indicates that the Data Analysis ToolPak for Excel 2016 was successfully installed with Office 2016, allowing you to proceed to Section 6.5.2.
If the words “Data Analysis” are not at the top right of your monitor screen, then the ToolPak component of Excel 2016 was not installed when you installed Office 2016 onto your computer. If this happens, you need to follow these steps:
Options (bottom left of screen)
Note: This creates a dialog box with “Excel Options” at the top left of the box.
Add-Ins (on left of screen)
Manage: Excel Add-Ins (at the bottom of the dialog box)
Go (at bottom center of dialog box)
Highlight: Analysis ToolPak (in the Add-Ins dialog box)
Put a check mark to the left of Analysis ToolPak
OK (at the right of this dialog box)
You now should have the words: “Data Analysis” at the top right of your screen to show that this feature has been installed correctly
Note: If these steps do not work, you should try these steps instead:
File/Options (bottom left)/Add-ins/Analysis ToolPak/Go/ click to the left of Analysis ToolPak to add a check mark/OK
If you need help doing this, ask your favorite “computer techie” for help.
You are now ready to skip ahead to Sect.6.5.2
6.5.1.2 Installing the Data Analysis ToolPak into Excel 2013
Click on: Data (at the top of your screen)
If you see "Data Analysis" at the far right of your monitor screen, it indicates that the Data Analysis ToolPak for Excel 2013 was successfully installed with Office 2013, allowing you to proceed to Section 6.5.2.
If the words “Data Analysis” are not at the top right of your monitor screen, then the ToolPak component of Excel 2013 was not installed when you installed Office 2013 onto your computer. If this happens, you need to follow these steps:
Options (bottom left of screen)
Note: This creates a dialog box with “Excel Options” at the top left of the box
Add-Ins (on left of screen)
Manage: Excel Add-Ins (at the bottom of the dialog box)
Go (at bottom center of dialog box)
Highlight: Analysis ToolPak (in the Add-Ins dialog box)
Put a check mark to the left of Analysis ToolPak
OK (at the right of this dialog box)
You now should have the words: “Data Analysis” at the top right of your screen to show that this feature has been installed correctly
If you get a prompt asking you for the “installation CD,” put this CD in the CD drive and click on: OK
If these steps do not work, try this alternative: click File, then Options (bottom left), then Add-ins; highlight Analysis ToolPak, click Go, check the box to the left of Analysis ToolPak, and click OK.
If you need help doing this, ask your favorite “computer techie” for help.
You are now ready to skip ahead to Sect.6.5.2
6.5.1.3 Installing the Data Analysis ToolPak into Excel 2010
Click on: Data (at the top of your screen)
If you see the words "Data Analysis" at the far right of your monitor screen, it indicates that the Data Analysis ToolPak for Excel 2010 was successfully installed with Office 2010, allowing you to proceed to Section 6.5.2.
If the words “Data Analysis” are not at the top right of your monitor screen, then the ToolPak component of Excel 2010 was not installed when you installed Office 2010 onto your computer. If this happens, you need to follow these steps:
Excel options (creates a dialog box)
Manage: Excel Add-Ins (at the bottom of the dialog box)
Highlight: Analysis ToolPak (in the Add-Ins dialog box)
(You now should have the words: “Data Analysis” at the top right of your screen)
If you get a prompt asking you for the “installation CD,” put this CD in the CD drive and click on: OK
If these steps do not work, try this alternative: click File, then Options (bottom left), then Add-ins; highlight Analysis ToolPak, click Go, check the box to the left of Analysis ToolPak, and click OK.
If you need help doing this, ask your favorite “computer techie” for help. You are now ready to skip ahead to Sect.6.5.2.
6.5.1.4 Installing the Data Analysis ToolPak into Excel 2007
Click on: Data (at the top of your screen)
If the words “Data Analysis” do not appear at the top right of your screen, you need to install the Data Analysis ToolPak using the following steps:
Microsoft Office button (top left of your screen)
Excel options (bottom of dialog box)
Add-ins (far left of dialog box)
Go (to create a dialog box for Add-Ins)
OK (If Excel asks you for permission to proceed, click on: Yes)
(You should now have the words: “Data Analysis” at the top right of your screen)
If you need help doing this, ask your favorite “computer techie” for help. You are now ready to skip ahead to Sect.6.5.2.
6.5.1.5 Installing the Data Analysis ToolPak into Excel 2003
Click on: Tools (at the top of your screen)
To check whether the ToolPak is installed in your version of Excel, look for “Data Analysis” in the Tools menu. If it is there, you can skip ahead to find the regression equation. If “Data Analysis” is not listed, you need to install the ToolPak by following these steps:
Options (bottom left of screen)
Analysis ToolPak (it is directly underneath Inactive Application Add-ins near the top of the box)
Click to add a check mark to the left of Analysis ToolPak
Note: If these steps do not work, try these steps instead: Tools/Add-ins/click to the left of Analysis ToolPak to add a check mark/OK
You are now ready to skip ahead to Sect.6.5.2.
Using Excel to Find the SUMMARY OUTPUT
You have now installed the ToolPak, and you are ready to find the regression equation for the “best-fitting straight line” through the data points by using the following steps:
Open the Excel file: FACINGS8 (if it is not already open on your screen)
To deselect a chart with a gray border in an already open file, simply click on any empty cell outside the chart area.
With the ToolPak installed, you can now find the regression equation that predicts weekly sales dollars from the number of shelf facings of Kellogg’s Corn Flakes in your data set.
Remember that you gave the name: facings to the X data (the predictor), and the name: sales to the Y data (the criterion) in a previous section of this chapter (see Sect. 6.2).
Data Analysis (far right at top of screen; see Fig. 6.28)
Scroll down the dialog box using the down arrow and highlight: Regression (see Fig. 6.29)
Fig 6.28 Example of Using the Data/Data Analysis Function of Excel
Click on the “button” to the left of Output Range to select it, and enter A44 in the box as the place on your spreadsheet to insert the SUMMARY OUTPUT
The SUMMARY OUTPUT should now be in cells: A44:I61
Widen Column A so that all of the words in the SUMMARY OUTPUT are readable.
Now, change the data in the following three cells to Number format (2 decimal places) by first clicking on “Home” at the top left of your screen:
Now, change the format for all other numbers that are in decimal format to number format, three decimal places.
Next, widen all columns so that all of the labels fit inside the column widths. Then, center all numbers in their cells.
To make the output fit onto a single page, set the scale in the “Page Layout” settings to 70%. Your final output should resemble Fig. 6.30.
Save the resulting file as: FACINGS9
Note the following problem with the summary output.
Whoever wrote the computer program for this version of Excel made a mistake and gave the name: “Multiple R” to cell A47. This is not correct. Instead, cell A47 should say: “correlation r,” since this is the notation that we are using for the correlation between X and Y.
Fig 6.30 Final Spreadsheet of Correlation and Simple Linear Regression including the SUMMARY OUTPUT for the Data
You can now use your printout of the regression analysis to find the regression equation that is the best-fitting straight line through the data points.
But first, let’s review some basic terms.
6.5.2.1 Finding the y-intercept, a, of the Regression Line
The y-intercept, represented by the letter “a,” is the point where the regression line would cross the y-axis if it were extended. In the SUMMARY OUTPUT, the y-intercept is −0.65 (cell B60): if you extended the regression line down and to the left, it would cross the y-axis at −0.65. This is why it is called the “y-intercept.”
6.5.2.2 Finding the Slope, b, of the Regression Line
The slope of the regression line, sometimes called its “tilt,” measures how steeply the line rises or falls: it is the change in Y for each one-unit increase in X. When the correlation between X and Y is zero, the regression line is perfectly horizontal (parallel to the X-axis), and its slope is zero.
When the correlation between X and Y is positive, the regression line slopes up to the right. In our example, the slope of the regression line is +1.54 (cell B61). We will use the letter “b” for the slope of the regression line; Excel calls it “X Variable 1” in its printout.
The correlation between the number of facings and weekly sales dollars is a strong positive +0.83 (cell B47 of the SUMMARY OUTPUT in Fig. 6.30): as the number of facings increases, weekly sales tend to rise. This is why the regression line slopes upward.
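The values Excel reports as the slope, the y-intercept, and the correlation come from the standard least-squares formulas. As a minimal sketch in Python (not something the book itself uses), the function below computes a, b, and r for paired lists of X and Y scores. The function name and the small data sets in the test are hypothetical, not the FACINGS data.

```python
import math


def least_squares_fit(xs, ys):
    """Return (a, b, r) for the best-fitting straight line Y = a + bX.

    a is the y-intercept, b is the slope, and r is the Pearson
    correlation between X and Y (the value =CORREL returns).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired scores")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of squared deviations and cross-products around the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    b = sxy / sxx                    # slope ("X Variable 1" in Excel's output)
    a = mean_y - b * mean_x          # y-intercept ("Intercept" in Excel's output)
    r = sxy / math.sqrt(sxx * syy)   # signed correlation
    return a, b, r
```

Because b and r share the same numerator, sxy, they always have the same sign: an upward-sloping line goes with a positive correlation, and a downward-sloping line with a negative one.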
If the correlation between X and Y were negative, the regression line would “slope down to the right.” This happens whenever the correlation between X and Y is negative, that is, between zero and minus one (0 and −1).
Finding the Equation for the Regression Line
To derive the regression equation predicting weekly sales based on the number of facings, we focus on two key values from the SUMMARY OUTPUT in Fig 6.30: B60 and B61.
The format for the regression line is:
$$Y = a + bX \qquad (6.3)$$
where a = the y-intercept (−0.65 in our example in cell B60) and b = the slope of the line (+1.54 in our example in cell B61).
Therefore, the equation for the best-fitting regression line for our example is:
$$Y = -0.65 + 1.54X$$
Remember that Y is the weekly sales ($000) that we are trying to predict, using the number of facings as the predictor, X.
Let’s try an example using this formula to predict the weekly sales.
Using the Regression Line to Predict the y-Value
Objective: Find the weekly sales predicted from one facing of Kellogg’s Corn Flakes on the supermarket shelf.
Since the number of facings is one (i.e., X = 1), substituting this number into our regression equation gives:
$$Y = -0.65 + 1.54(1) = 0.89$$
You can also read this prediction from your chart: draw a vertical line upward from one shelf facing until it meets the regression line, which happens just below the number 1 on the y-axis (at 0.89).
But since weekly sales are recorded in thousands of dollars ($000), we need to multiply our answer above by 1,000 to find the weekly sales figure.
When we do that, this gives an estimated weekly sales of $890 (0.89 x 1,000) when we use one facing of this cereal.
Now, let’s do a second example and predict what the weekly sales figure would be if we used 3 facings of Kellogg’s Corn Flakes on the supermarket shelf.
Substituting X = 3 into the regression equation gives:
$$Y = -0.65 + 1.54(3) = 3.97$$
On your chart, moving directly upward from three shelf facings, you meet the regression line just below the number 4 on the y-axis, at 3.97. This is the predicted sales figure for three shelf facings.
But since weekly sales are recorded in thousands of dollars ($000), we need to multiply our answer above by 1,000 to find the weekly sales figure.
When we do that, this gives an estimated weekly sales of $3,970 when we use three facings of the cereal.
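The two predictions above can be checked with a few lines of Python. This is only a sketch: it hard-codes the SUMMARY OUTPUT values a = −0.65 and b = +1.54 (the minus sign on the intercept is what makes X = 1 give 0.89 and X = 3 give 3.97), rounds to two decimals as the text does, and uses hypothetical function names.

```python
# Coefficients from the SUMMARY OUTPUT (cells B60 and B61)
A = -0.65  # y-intercept, a
B = 1.54   # slope, b


def predicted_sales_thousands(facings):
    """Weekly sales in $000 from Y = a + bX, rounded to two decimals."""
    return round(A + B * facings, 2)


def predicted_sales_dollars(facings):
    """The same prediction converted from thousands of dollars to dollars."""
    return round(predicted_sales_thousands(facings) * 1000)


for x in (1, 3):
    print(x, predicted_sales_thousands(x), predicted_sales_dollars(x))
```

Running it prints 0.89 and 890 for one facing, and 3.97 and 3970 for three facings, matching the values read from the chart.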
For a more detailed discussion of regression, see Black (2010).
Adding the Regression Equation to the Chart
Objective: To Add the Regression Equation to the Chart
If you want to include the regression equation within the chart next to the regression line, you can do that, but a word of caution first.
Throughout this book, we are using the regression equation for one predictor and one criterion to be the following:
$$Y = a + bX \qquad (6.3)$$
where a = y-intercept and b = slope of the line
See, for example, the regression equation in Sect. 6.5.3, where the y-intercept was a = −0.65 and the slope of the line was b = +1.54, which generated the following regression equation:
$$Y = -0.65 + 1.54X$$
However, Excel 2016 uses a slightly different regression equation (which is logically identical to the one used in this book) when you add a regression equation to a chart:
6.6 Adding the Regression Equation to the Chart 139
$$Y = bX + a \qquad (6.4)$$
where a = y-intercept and b = slope of the line
Note that this equation is identical to the one we are using in this book with the terms arranged in a different sequence.
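To see that equations (6.3) and (6.4) describe the same line, the short Python sketch below writes one pair of a and b values in both orders and checks that the two forms give the same Y for several X values. The function names are hypothetical, and the labels use a plain hyphen for the minus sign, which may not match exactly how Excel renders its trendline label.

```python
def book_form(a, b):
    """Equation (6.3) as used in this book: Y = a + bX."""
    sign = "-" if b < 0 else "+"
    return f"Y = {a:.2f} {sign} {abs(b):.2f}X"


def chart_form(a, b):
    """Equation (6.4), the order Excel 2016 uses on a chart: y = bx + a."""
    sign = "-" if a < 0 else "+"
    return f"y = {b:.2f}x {sign} {abs(a):.2f}"


def predict_book(a, b, x):
    return a + b * x


def predict_chart(a, b, x):
    return b * x + a


a, b = -0.65, 1.54  # y-intercept and slope from Sect. 6.5.3
print(book_form(a, b))   # Y = -0.65 + 1.54X
print(chart_form(a, b))  # y = 1.54x - 0.65
for x in range(6):
    # Only the order of the terms differs, so the predictions are identical
    assert predict_book(a, b, x) == predict_chart(a, b, x)
```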
For the example we used in Sect. 6.5.3, Excel 2016 would write the regression equation on the chart as:
$$y = 1.54x - 0.65$$
This is the format that will result when you add the regression equation to the chart in Excel 2016 with the following steps:
Open the file: FACINGS9 (that you saved in Sect.6.5.2)
Click just inside the outer border of the chart in the top right corner to add the “border” around the chart in order to “select the chart” for changes you are about to make
Right-click on any of the data-points in the chart
Highlight: Add Trendline, and click on it to select this command
The “Linear button” near the top of the dialog box will already be selected (on its left)
Scroll down this dialog box, and click on: Display Equation on chart (near the bottom of the dialog box; see Fig.6.31)
Click on the X at the top right of the Format Trendline dialogue box to remove this box.
Click on any empty cell outside of the chart to deselect the chart.
Fig 6.31 Dialogue Box for Adding the Regression Equation to the Chart Next to the Regression Line on the Chart
Note that the regression equation on the chart is in the following form next to the regression line on the chart (see Fig.6.32).
$$y = 1.54x - 0.65$$
(Save this file as: FACINGS10, and print it out so that it fits onto one page.)
Fig 6.32 Example of a Chart with the Regression Equation Displayed Next to the Regression Line
How to Recognize Negative Correlations in the SUMMARY OUTPUT Table
Important note: Excel does not show negative correlations in the SUMMARY OUTPUT; “Multiple R” is always reported as a positive number. So the correlation between X and Y may actually be negative even though the output shows a positive value.
You will know that the correlation between X and Y is a negative correlation when these two things occur:
(1) THE SLOPE, b, IS A NEGATIVE NUMBER. This can only occur when there is a negative correlation.
(2) THE CHART CLEARLY SHOWS A DOWNWARD SLOPE IN THE REGRESSION LINE, which can only occur when the correlation between X and Y is negative.
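With one predictor, Excel's “Multiple R” is the absolute value of the correlation r, so the sign has to be recovered from elsewhere. The Python sketch below applies rule (1): it gives Multiple R the sign of the slope b. The helper name is hypothetical, and the sketch assumes a single predictor; with several predictors, Multiple R has no sign to recover.

```python
import math


def signed_r(multiple_r, slope_b):
    """Return the correlation r for a one-predictor regression.

    multiple_r -- the "Multiple R" value from the SUMMARY OUTPUT (never negative)
    slope_b    -- the slope from the "X Variable 1" row of the output
    """
    if multiple_r < 0:
        raise ValueError("Multiple R should not be negative")
    # r and b always share the same sign, so copy the slope's sign onto R
    return math.copysign(multiple_r, slope_b)


print(signed_r(0.83, 1.54))   # 0.83: upward-sloping line, positive r
print(signed_r(0.83, -1.54))  # -0.83: a hypothetical downward-sloping line
```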
Printing Only Part of a Spreadsheet Instead of the Entire Spreadsheet
Printing Only the Table and the Chart
Objective: To print only the table and the chart on a separate page
1 Left-click your mouse starting at the top left of the table in cell A2 and drag the mouse down and to the right so that all of the table and all of the chart are highlighted in light blue on your computer screen from cell A2 to cell I43 (the highlighted cells are called the “selection” cells).
The resulting printout should contain only the table of the data and the chart resulting from the data.
Then, click on any empty cell in your spreadsheet to deselect the table and chart.
Printing Only the Chart on a Separate Page
Objective: To print only the chart on a separate page
1 Click on any “white space” just inside the outside border of the chart, in the top right corner, to put a border around the chart and “select” it.
The resulting printout should contain only the chart resulting from the data.
After printing the chart on a separate page, be sure to click on any white space outside the chart to remove the gray border around it. As long as this gray border is there, Excel assumes that you want to print only the chart.
6.8 Printing Only Part of a Spreadsheet Instead of the Entire Spreadsheet 143
Printing Only the SUMMARY OUTPUT of the Regression Analysis on a Separate Page
Objective: To print only the SUMMARY OUTPUT of the regression analysis on a separate page
1 Left-click your mouse on the cell just above SUMMARY OUTPUT in cell A43 on the left of your spreadsheet, and drag the mouse down and to the right until all of the regression output is highlighted in dark blue on your screen from A43 to I62. (Change the “Scale to Fit” to 75% so that the SUMMARY OUTPUT will fit onto one page when you print it out.)
The resulting printout should contain only the summary output of the regression analysis on a separate page.
Finally, click on any empty cell on the spreadsheet to “deselect” the regression table.
End-of-Chapter Practice Problems
Blockbuster Video has asked you to create a regression equation to predict the average daily rentals of its stores in Missouri from the average family income of households within a two-mile radius of each store. Blockbuster would use this equation to estimate potential sales for new store locations it is considering in the state. To practice your Excel regression skills, you will work with the hypothetical data in Fig. 6.33.
Fig 6.33 Worksheet Data for Chap 6: Practice Problem #1
To analyze the relationship between income and daily rentals, create an Excel spreadsheet where income serves as the independent variable (X) in the left column and the number of daily rentals is positioned as the dependent variable (Y) in the right column.
To find the correlation between two variables correctly in Excel, put the predictor variable (X) in the left column and the criterion variable (Y) directly to its right. This arrangement is essential for analyzing the relationship between the variables correctly.
(a) Use Excel’s =CORREL function to find the correlation between these two variables, and round off the result to two decimal places.
(b) Create an XY scatterplot of these two sets of data such that:
• Top title: RELATIONSHIP BETWEEN INCOME AND RENTALS/DAY
• x-axis title: AVERAGE FAMILY INCOME ($000)
• y-axis title: RENTALS (per day)
• re-size the chart so that it is 8 columns wide and 25 rows long
• move the chart below the table
(c) Use Excel to run the regression statistics to find the equation for the least-squares regression line for these data, and display the results below the chart on your spreadsheet. Use number format (two decimal places) for the correlation and for the regression coefficients.
Print just the input data and the chart so that this information fits onto one page. Then, print just the regression output table on a separate page so that it fits onto that page.
(f) save the file as: RENTAL10
Now, answer these questions using your Excel printout:
(2) What is the slope of the line?
(3) What is the regression equation for these data (use two decimal places for the y-intercept and the slope)?
(4) Use the regression equation to predict the average number of daily rentals you would expect for a retail area that had an average family income of
A large engineering company wants to know whether there is a relationship between its engineers’ salaries, expressed as a percentage of the midpoint salary, and the raises they were granted under the last contract. The midpoint salary is set at 100 and serves as a benchmark: each engineer’s salary is compared to this midpoint to determine his or her “position in range.” Engineers earning below the midpoint score less than 100, and those earning above it score more than 100. Understanding this relationship helps the company assess the fairness of its compensation and the impact of contractual raises on employee morale.
The hypothetical data for these engineers are given in Fig. 6.34.
Create an Excel spreadsheet, and enter the data.
(a) create an XY scatterplot of these two sets of data such that:
• top title: RELATIONSHIP BETWEEN POSITION IN RANGE AND PERCENT RAISE FOR ENGINEERS
• x-axis title: POSITION IN RANGE
• move the chart below the table
• re-size the chart so that it is 7 columns wide and 25 rows long
(b) Use Excel to run the regression statistics to find the equation for the least-squares regression line for these data, and display the results below the chart on your spreadsheet. Add the regression equation to the chart. Use number format (2 decimal places) for the correlation and number format (3 decimal places) for the coefficients.
(c) Print just the input data and the chart so that this information fits onto one page in portrait format.
Then, print just the regression output table on a separate page so that it fits onto that separate page in portrait format.
(d) Circle and label the value of the y-intercept and the slope of the regression line on your printout.
(e) Write the regression equation by hand on your printout for these data (use three decimal places for the y-intercept and the slope).
Fig 6.34 Worksheet Data for Chap 6: Practice Problem #2
(f) Circle and label the correlation between the two sets of scores in the regression analysis summary output table on your printout.
(g) Use the regression equation you wrote in part (e) to predict the PERCENT RAISE you would expect for an engineer with a POSITION IN RANGE score of 90: substitute 90 for X in the equation and solve for Y.
(h) Read from the graph the PERCENT RAISE you would expect for an engineer with a POSITION IN RANGE score of 110, and write your answer in the space immediately below:
(i) save the file as: ENGINE3
The relationship between the number of sales calls made by sales staff and the number of copier machines they sell in a month is worth exploring. Analyzing the hypothetical data for the previous month in Fig. 6.35 can show how sales activity relates to sales outcomes, which can help optimize sales strategies and improve overall performance.
To analyze the relationship between sales calls and copier sales, create an Excel spreadsheet where the number of sales calls serves as the independent variable (predictor) and the number of copiers sold by each salesperson last month acts as the dependent variable (criterion).
(a) Use Excel’s =CORREL function to find the correlation between these two sets of scores, and round off the result to two decimal places.
(b) create an XY scatterplot of these two sets of data such that:
• top title: RELATIONSHIP BETWEEN NO. OF SALES CALLS AND COPIERS SOLD
• x-axis title: NO. OF SALES CALLS
• y-axis title: NO. OF COPIERS SOLD
• move the chart below the table
• re-size the chart so that it is 7 columns wide and 25 rows long
Fig 6.35 Worksheet Data for Chap 6: Practice Problem #3
6.9 End-of-Chapter Practice Problems 147
(c) Use Excel to run the regression statistics to find the equation for the least-squares regression line for these data, and display the results below the chart on your spreadsheet. Use number format (two decimal places) for the correlation and for the coefficients.
Print just the input data and the chart so that this information fits onto one page. Then, print just the regression output table on a separate page so that it fits onto that page.
(f) save the file as: copier4
Answer the following questions using your Excel printout:
1 What is the correlation between the number of sales calls and the number of copiers sold?
3 What is the slope of the line?
4 What is the regression equation?
5 Use the regression equation to predict the number of copiers sold by a salesperson who made 25 sales calls last month: substitute 25 for X in the equation and solve for Y. Show your calculations on a separate sheet of paper.
Black, K. Business Statistics: For Contemporary Decision Making (6th ed.). Hoboken, NJ: John Wiley & Sons, Inc., 2010.
Levine, D.M., Stephan, D.F., Krehbiel, T.C., and Berenson, M.L. Statistics for Managers Using Microsoft Excel (6th ed.). Boston, MA: Prentice Hall/Pearson, 2011.
Zikmund, W.G. and Babin, B.J. Exploring Marketing Research (10th ed.). Mason, OH: South-Western Cengage Learning, 2010.
In business, you sometimes need to predict a criterion, Y, and using two or more predictors (e.g., X1, X2, X3) can predict it more accurately than a single predictor, X, alone. Combining two or more predictors to predict Y is called “multiple correlation.”
Multiple Regression Equation
The multiple regression equation follows a similar format and is:
$$Y = a + b_1X_1 + b_2X_2 + b_3X_3 + \cdots \qquad (7.2)$$
and so on, depending on the number of predictors used.
The “weight” given to each predictor in the equation is represented by the letter “b” with a subscript to correspond to the same subscript on the predictors.
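Equation (7.2) is the intercept plus one weighted term per predictor. As a sketch in Python (the function name and the numbers in the example are hypothetical, not taken from the rental car data), it can be evaluated like this:

```python
def predict_y(a, weights, predictors):
    """Evaluate Y = a + b1*X1 + b2*X2 + ... for one case.

    a          -- the y-intercept
    weights    -- the b values (b1, b2, ...) in predictor order
    predictors -- the X values (X1, X2, ...) in the same order and in the
                  same units as the data used to fit the equation
    """
    if len(weights) != len(predictors):
        raise ValueError("need exactly one b weight per predictor")
    return a + sum(b * x for b, x in zip(weights, predictors))


# Hypothetical equation Y = 2 + 0.5*X1 + 3*X2 evaluated at X1 = 10, X2 = 4
print(predict_y(2.0, [0.5, 3.0], [10, 4]))  # 19.0
```

With a single weight, the same function reduces to equation (6.3), Y = a + bX.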
Important note: In order to do multiple regression, you need to have installed the “Data Analysis ToolPak” that was described in Chap. 6 (see Sect. 6.5.1). If you did not install it, you need to do so now.
A car rental company has asked you to predict its annual sales from the number of cars in its fleet and the number of its rental locations across the U.S. Examining these two factors together may improve its sales forecasting and business strategies.
Y = Annual Sales (in millions of dollars)
X1 = No. of cars in the fleet (in thousands of cars)
X2 = No. of locations in the U.S.
Suppose, further, that this rental car company supplied you with the following hypothetical data summarizing its performance along with the performance of its competitors (see Fig.7.1):
Create an Excel spreadsheet for these data using the following cell reference:
Fig 7.1 Worksheet Data for Rental Car Companies (Practical Example)
150 7 Multiple Correlation and Multiple Regression
Next, change the column width to match the above table, and change all figures to number format (zero decimal places).
Now, fill in the additional data in the chart such that:
C21: 44 (Then, center the information in all cells of your table.)
Check that all of the numbers in your table are correct, and then save the file as: RENTAL5
Before we do the multiple regression analysis, we need to try to make one important point very clear:
When you use a single predictor variable, X, to predict a criterion variable, Y, put the X variable on the left side of your table and the Y variable on the right. This arrangement clearly distinguishes the predictor from the criterion.
However, in multiple regression, you need to follow this rule which is exactly the opposite:
In multiple regression, put the criterion variable, Y, on the far left of your table, with all of the predictor variables to its right. This arrangement clearly distinguishes the criterion from the predictors.
In the table above, the criterion variable, SALES, is on the far left, and the predictors, NUMBER OF CARS and NUMBER OF LOCATIONS, are to its right. It is important to follow this arrangement; otherwise, your regression equation will not be correct.
Finding the Multiple Correlation and the Multiple Regression Equation
Objective: To find the multiple correlation and multiple regression equation using Excel.
7.2 Finding the Multiple Correlation and the Multiple Regression Equation 151
You do this by the following commands:
Click on: Data Analysis (far right top of screen)
Regression (scroll down to this in the box; see Fig.7.2)
Click on the Labels box to add a check mark to it (because you have included the column labels in row 6)
Output Range (click on the button to its left, and enter): A25 (see Fig. 7.3)
Fig 7.2 Dialogue Box for Regression Function
Excel automatically adds a dollar sign ($) before each column letter and row number, ensuring that data ranges remain constant during regression analysis.
OK (see Fig. 7.4 for the resulting SUMMARY OUTPUT)
Next, format the following four cells in Number format (2 decimal places):
Fig 7.3 Dialogue Box for Regression of Car Rental Companies Data
Fig 7.4 Regression SUMMARY OUTPUT of Car Rental Companies Data
Note that both the Input Y Range and the Input X Range above include the label at the top of each column.
Re-save the file as: RENTAL5
Now, print the file so that it fits onto one page by changing the scale to 60% size. The resulting regression analysis is given in Fig. 7.5.
After obtaining the SUMMARY OUTPUT, you can identify the multiple correlation and derive the regression equation that represents the best-fit line for the data points, utilizing the number of cars (in thousands) and the number of locations as the two predictor variables, while sales (in millions of dollars) serves as the criterion.
In the SUMMARY OUTPUT, "Multiple R" is Excel's label for the multiple correlation, which is +0.93. This means that the two predictor variables, NO OF CARS and NO OF LOCATIONS, together form a very strong positive relationship in predicting annual sales.
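Excel's Regression tool does all of the least-squares arithmetic for you. The Python below is a minimal sketch of the kind of computation involved, not Excel's actual internal code. It fits Y = a + b1X1 + b2X2 + ... by solving the least-squares normal equations. It reports Multiple R as the square root of R², which is how Multiple R relates to R Square in the SUMMARY OUTPUT. The function names are hypothetical, and the test data are made up rather than taken from the car rental data.

```python
import math


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so each row operation is applied to both sides.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring up the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution, from the last unknown up to the first.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def multiple_regression(y, *predictors):
    """Least-squares fit of y = a + b1*X1 + b2*X2 + ...

    Returns (a, [b1, b2, ...], multiple_r)."""
    n = len(y)
    # Design matrix: a leading 1 in every row estimates the intercept a.
    rows = [[1.0] + [x[i] for x in predictors] for i in range(n)]
    k = len(rows[0])
    # Normal equations: (X'X) coef = X'y
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    coef = solve_linear_system(xtx, xty)
    # Multiple R = sqrt(R Square) = sqrt(1 - SS_residual / SS_total)
    y_hat = [sum(c * v for c, v in zip(coef, row)) for row in rows]
    y_bar = sum(y) / n
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_residual = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    multiple_r = math.sqrt(max(0.0, 1.0 - ss_residual / ss_total))
    return coef[0], coef[1:], multiple_r
```

With this sketch, a call such as `multiple_regression(sales, cars, locations)` would return the intercept, the list of slopes, and Multiple R. The criterion is passed first, which mirrors the rule of putting the criterion in the leftmost column.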
Fig 7.5 Final Spreadsheet for Car Rental Companies Regression Analysis
To find the regression equation, notice the coefficients at the bottom of the SUMMARY OUTPUT:
Intercept : a (this is the y-intercept) 53.55
Since the general form of the multiple regression equation is:

$$Y = a + b_1 X_1 + b_2 X_2 \qquad (7.2)$$

we can now write the multiple regression equation for these data:
Using the Regression Equation to Predict Annual Sales
Objective: To find the predicted annual sales for a rental car company that has 80,000 cars and 900 rental locations.
Note that X1 (NO OF CARS) is measured in thousands of cars in the original data set. For our example, this means that 80,000 cars becomes just 80, since 80 is 80,000 measured in thousands of cars. Plugging these two numbers (80 for cars and 900 for locations) into our regression equation gives us:
Since Y (SALES) is measured in millions of dollars in the original data set, this calculation gives the predicted annual sales, in millions of dollars, for a rental car company with 80,000 cars and 900 rental locations.
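Plugging values into the regression equation is simple arithmetic, but the unit conversion is easy to get wrong. The sketch below evaluates Y = a + b1X1 + b2X2 for one company and converts a raw count of cars into thousands first. The coefficients used here (a = 10.0, b1 = 2.0, b2 = 0.05) are hypothetical placeholders, not the values from Fig. 7.5. Substitute the intercept and coefficients from your own SUMMARY OUTPUT.

```python
def predict(intercept, slopes, predictor_values):
    """Evaluate Y = a + b1*X1 + b2*X2 + ... for one set of predictor values."""
    if len(slopes) != len(predictor_values):
        raise ValueError("need exactly one predictor value per slope")
    return intercept + sum(b * x for b, x in zip(slopes, predictor_values))


def to_thousands(count):
    """NO OF CARS is recorded in thousands, so 80,000 cars becomes 80."""
    return count / 1000


# Hypothetical coefficients, for illustration only (not the car rental results).
x1 = to_thousands(80_000)  # thousands of cars
x2 = 900                   # number of rental locations
predicted_sales = predict(10.0, [2.0, 0.05], [x1, x2])  # millions of dollars
```

With these made-up coefficients, the prediction is about 215 (millions of dollars). The same two functions work unchanged once you plug in your real coefficients.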
If you want to learn more about the theory behind multiple regression, see Keller (2009).
Using Excel to Create a Correlation Matrix in Multiple Regression
The final step in multiple regression is to find the correlation between all of the variables that appear in the regression equation.
In our example, this means that we need to find the correlation between each of the three pairs of variables:
(1) number of cars and sales
(2) number of locations and sales
(3) number of cars and number of locations
To do this, we need to use Excel to create a “correlation matrix.” This matrix summarizes the three correlations above.
Objective: To use Excel to create a correlation matrix between the three variables in this example.
To use Excel to do this, use these steps:
Data (top of screen under “Home” at the top left of screen)
Correlation (scroll up to highlight this formula; see Fig. 7.6)
Input Range: select all three columns of data, including the labels at the top of the columns (SALES, NUMBER OF CARS, and NUMBER OF LOCATIONS)
Put a check in the box for: Labels in the First Row (since you included the labels at the top of the columns in your input range of data above)
Fig 7.6 Dialogue Box for Correlation Matrix for Car Rental Companies
Output range (click on the button to its left, and enter): A47 (see Fig.7.7)
The resulting correlation matrix appears in A47:D50 (see Fig.7.8).
Format the decimal numbers in the correlation matrix to two decimal places. Widen column D so that the label for the number of locations fits inside cell D47, and center all of the numbers in the matrix. Then save the file as: RENTAL6
The final spreadsheet for these Car Rental Companies appears in Fig.7.9.
Fig 7.7 Dialogue Box for Input/Output Range for Correlation Matrix
Fig 7.8 Resulting Correlation Matrix for Rental Car Companies Data
Note that the number “1” along the diagonal of the correlation matrix means that the correlation of each variable with itself is a perfect, positive correlation of 1.0.
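As the note above says, every correlation matrix has 1s on the diagonal. It is also symmetric, even though Excel's Correlation tool typically fills in only its lower half. The minimal Python sketch below builds the full matrix from Pearson correlations for a dictionary of columns. The function names are hypothetical, and the column names and numbers in the test are made up rather than taken from the car rental data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient r between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """columns maps a label to its list of values.

    Returns (labels, matrix), where matrix[i][j] is the correlation between
    labels[i] and labels[j]; the diagonal is each variable with itself."""
    labels = list(columns)
    matrix = [[pearson_r(columns[a], columns[b]) for b in labels] for a in labels]
    return labels, matrix


def print_matrix(labels, matrix):
    """Print each correlation with its sign and two decimal places."""
    for label, row in zip(labels, matrix):
        print(f"{label:>10}", " ".join(f"{v:+.2f}" for v in row))
```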
In this book, correlations are always written with two decimal places. You are now ready to read the correlations between the three pairs of variables:
The correlation between NO OF CARS and SALES is: +.92
The correlation between NO OF LOCATIONS and SALES is: +.56
The correlation between NO OF CARS and NO OF LOCATIONS is: +.69
NO OF CARS is clearly the best predictor of SALES, with a correlation of +.92. Adding the second predictor, NO OF LOCATIONS, improved the prediction only slightly, from +.92 to a multiple correlation of +.93. It is probably not worth the extra effort of collecting data on the number of locations.
NO OF CARS is an excellent predictor of ANNUAL SALES all by itself.
If you want to learn more about the correlation matrix, see Levine et al. (2011).
Fig 7.9 Final Spreadsheet for Car Rental Companies Regression and the Correlation Matrix
End-of-Chapter Practice Problems
1 The Graduate Record Examinations (GRE) are a standardized test required for admission to many U.S. MBA programs. Their scores are frequently used to predict students' first-year grade-point average (GPA). The GRE is designed to assess general academic readiness and consists of three subtests:
- Verbal Reasoning (scored from 130 to 170)
- Quantitative Reasoning (scored from 130 to 170)
- Analytical Writing (scored from 0 to 6)

The director of an MBA program has asked you to analyze the correlation between the GRE scores of last year's entering class and their first-year GPA. The goal is to evaluate how well the GRE predicts academic success in graduate school.
You have decided to use the three subtest scores as the predictors, X1, X2, and X3, and the first-year grade-point average (FIRST-YEAR GPA) as the criterion, Y. To test your Excel skills, you have randomly selected a small group of students from last year's entering MBA class and recorded their scores on these variables.
Suppose, however, that you want to find out what would happen if you added undergraduate GPA as a fourth predictor. What would the multiple correlation be?
The hypothetical data for all five variables, including UNDERGRAD GPA as the fourth predictor of FIRST-YEAR GPA, are given in Fig. 7.10.
Fig 7.10 Worksheet Data for Chap 7: Practice Problem #1
(a) Create an Excel spreadsheet using FIRST-YEAR GPA as the criterion (Y), and the other variables as the four predictors of this criterion.
(b) Use Excel’s multiple regression function to find the relationship between these variables and place it below the table.
(c) Use Number format (2 decimal places) for the multiple correlation on the Summary Output. Use three decimal places for the coefficients, and use four decimal places for all other decimal figures in the Summary Output.
(d) Print the table and regression results below the table so that they fit onto one page.
(e) By hand on this printout, circle and label:
(2b) coefficients for the y-intercept, GRE VERBAL, GRE QUANTITATIVE, GRE WRITING, and UNDERGRAD GPA
(f) Save this file as: GRE24
(g) Now, go back to your Excel file and create a correlation matrix for these five variables, and place it underneath the SUMMARY OUTPUT. Round off all correlations to two decimal places, and re-save this file as: GRE24
(h) Now, print out just this correlation matrix in portrait mode on a separate sheet of paper.
Answer the following questions using your Excel printout:
1 What is the multiple correlation Rxy?
3 What is the coefficient for GRE VERBAL, b1?
4 What is the coefficient for GRE QUANTITATIVE, b2?
5 What is the coefficient for GRE WRITING, b3?
6 What is the coefficient for UNDERGRAD GPA, b4?
7 What is the multiple regression equation?
8 Predict the FIRST-YEAR GPA you would expect for a GRE VERBAL score of 159, a GRE QUANTITATIVE score of 154, a GRE WRITING score of 4, and an UNDERGRAD GPA of 3.05.
Answer the following questions using your Excel printout. Be sure to include the plus or minus sign for each correlation:
9 What is the correlation between UNDERGRAD GPA and FIRST-YEAR GPA?
10 What is the correlation between UNDERGRAD GPA and GRE VERBAL?
11 What is the correlation between UNDERGRAD GPA and GRE QUANTITATIVE?
12 What is the correlation between UNDERGRAD GPA and GRE WRITING?
13 Discuss which of the four predictors is the best predictor of FIRST-YEAR GPA.
14 Explain in words how much better the four predictor variables combined predict FIRST-YEAR GPA than the best single predictor by itself.
2 The Graduate Management Admission Test (GMAT) is a three-and-a-half-hour exam accepted by nearly 6,000 business and management programs in over 80 countries. More than 200,000 candidates take it each year as part of their graduate degree applications. A prominent university offering a Master's in Human Resources Management wants to know how well GMAT scores predict the grade-point averages (GPA) of its first-year graduate students. The GMAT has four subtests, and you will use their scores as the four predictors of first-year GPA:
- Verbal (score range 0–60)
- Quantitative (score range 0–60)
- Analytical Writing (score range 0–6 in 0.5 intervals)
- Integrated Reasoning (score range 1–8)

To test your Excel skills, you have created the hypothetical data given in Fig. 7.11.
(a) Create an Excel spreadsheet using FIRST-YEAR GPA as the criterion (Y), and VERBAL (X1), QUANTITATIVE (X2), ANALYTICAL WRITING (X3), and INTEGRATED REASONING (X4) as the four predictors.
(b) Use Excel's multiple regression function to find the relationship between these five variables, and place the SUMMARY OUTPUT below the table.
(c) Use Number format (2 decimal places) for the multiple correlation on the Summary Output, and use three decimal places for the coefficients in the SUMMARY OUTPUT.
(d) Save the file as: GMAT26
Fig 7.11 Worksheet Data for Chap 7: Practice Problem #2
(e) Print the table and regression results below the table so that they fit onto one page.
Answer the following questions using your Excel printout:
1 What is the multiple correlation Rxy?
3 What is the coefficient for VERBAL, b1?
4 What is the coefficient for QUANTITATIVE, b2?
5 What is the coefficient for ANALYTICAL WRITING, b3?
6 What is the coefficient for INTEGRATED REASONING, b4?
7 What is the multiple regression equation?
8 Predict the FIRST-YEAR GPA you would expect for a VERBAL score of 52, a QUANTITATIVE score of 48, an ANALYTICAL WRITING score of 4.5, and an INTEGRATED REASONING score of 6.
(f) Now, go back to your Excel file and create a correlation matrix for these five variables, and place it underneath the SUMMARY OUTPUT.
(g) Re-save this file as: GMAT26
(h) Now, print out just this correlation matrix on a separate sheet of paper.
Answer the following questions using your Excel printout (be sure to include the plus or minus sign for each correlation):
9 What is the correlation between VERBAL and FIRST-YEAR GPA?
10 What is the correlation between QUANTITATIVE and FIRST-YEAR GPA?
11 What is the correlation between ANALYTICAL WRITING and FIRST-YEAR GPA?
12 What is the correlation between INTEGRATED REASONING and FIRST-YEAR GPA?
13 What is the correlation between VERBAL and QUANTITATIVE?
14 What is the correlation between QUANTITATIVE and ANALYTICAL WRITING?
15 What is the correlation between ANALYTICAL WRITING and INTEGRATED REASONING?
16 What is the correlation between QUANTITATIVE and INTEGRATED REASONING?
17 Discuss which of the four predictors is the best predictor of FIRST-YEAR GPA.
18 Explain in words how much better the four predictor variables combined predict FIRST-YEAR GPA than the best single predictor by itself.
3 Suppose that you are the marketing manager for 7-Eleven Stores in Missouri, and you are evaluating a proposed location for a new store. You want to use internal company data to decide whether the projected yearly sales at this site would justify building a new store there. You have used internal data on annual sales, average daily traffic, population, and average income for a random sample of 20 7-Eleven stores in Missouri, based on last year's data, to create the hypothetical data given in Fig. 7.12.
(a) Create an Excel spreadsheet using the annual sales figures as the criterion and the average daily traffic, population, and income figures as the predictors.
(b) Use Excel's multiple regression function to find the relationship between these four variables, and place the SUMMARY OUTPUT below the table.
(c) Use Number format (2 decimal places) for the multiple correlation on the Summary Output, and use two decimal places for the coefficients in the SUMMARY OUTPUT.
(d) Save the file as: multiple2
(e) Print the table and regression results below the table so that they fit onto one page.
Answer the following questions using your Excel printout:
3 What is the coefficient for Average Daily Traffic, b1?
4 What is the coefficient for Population, b2?
5 What is the coefficient for Average Income, b3?
6 What is the multiple regression equation?
7 Predict the annual sales you would expect for Average Daily Traffic of 42,000, a population of 23,000, and income of $22,000.
Fig 7.12 Worksheet Data for Chap 7: Practice Problem #3
(f) Now, go back to your Excel file and create a correlation matrix for these four variables, and place it underneath the SUMMARY OUTPUT on your spreadsheet.
(g) Save this file as: multiple3
(h) Now, print out just this correlation matrix on a separate sheet of paper.
Answer the following questions using your Excel printout. Be sure to include the plus or minus sign for each correlation:
8 What is the correlation between traffic and sales?
9 What is the correlation between population and sales?
10 What is the correlation between income and sales?
11 What is the correlation between traffic and population?
12 What is the correlation between population and income?
13 Discuss which of the three predictors is the best predictor of annual sales:
14 Explain in words how much better the three predictor variables combined predict annual sales than the best single predictor by itself.
Keller, G. Statistics for Management and Economics (8th ed.). Mason, OH: South-Western Cengage Learning, 2009.
Levine, D.M., Stephan, D.F., Krehbiel, T.C., and Berenson, M.L. Statistics for Managers Using Microsoft Excel (6th ed.). Boston, MA: Prentice Hall/Pearson, 2011.
One-Way Analysis of Variance (ANOVA)
In this Excel 2016 Guide, you have learned how to use the one-group t-test and the two-group t-test to compare means. But what do you do when you have more than two groups and you want to know whether there is a significant difference between the means of those groups?
The answer to this question is: Analysis of Variance (ANOVA).
The ANOVA test allows you to test for the difference between the means when you have three or more groups in your research study.
To perform a one-way Analysis of Variance (ANOVA), you need to have the "Data Analysis Toolpak" installed, as explained in Chapter 6, Section 6.5.1. If you have not yet installed it, please do so before proceeding.
Suppose that you wanted to compare the prices of three major supermarket chains in St. Louis (Dierberg’s, Schnuck’s, and Shop ‘n Save). You plan to use a market basket of 28 specific items, with the package size specified for each item (for example, item #14 is Tide Liquid laundry detergent). Suppose also that there is one store from each of these three chains in the 63119 zip code area of St. Louis. You visited these three supermarkets and recorded the hypothetical prices of the items in your market basket given in Fig. 8.1.
Create an Excel spreadsheet for these data in this way:
Using Excel to Perform a One-Way Analysis of Variance (ANOVA)
Objective: To use Excel to perform a one-way ANOVA test.
You are now ready to perform an ANOVA test on these data using the following steps:
Data (at top of screen)
Data Analysis (far right at top of screen)
Anova: Single Factor (scroll up to this formula and highlight it; see Fig.8.2)
Input range: B3:D31 (note that you have included in this range the column titles that are in row 3)
Important note: When the groups have different sample sizes, the INPUT RANGE must still be a rectangle. Start with the column title of the group on the far left and extend to the column of the group on the far right. Then go down as far as the lowest figure in any of the columns.
Put a check mark in: Labels in First Row
Output range (click on the button to its left): A36 (see Fig.8.3)
Fig 8.2 Dialog Box for Data Analysis: Anova Single Factor
Save this file as: SUPER6
To enhance the readability of your table presented in Fig 8.4, ensure that all decimal figures are rounded to two decimal places and that the numbers are centered within their respective cells.
To ensure all information is displayed on a single page, print both the data table and the ANOVA summary table by adjusting the Page Layout settings to a scale of 85%.
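The rectangular INPUT RANGE described in the steps above can be pictured as a grid. Its first row holds the column titles, and its shorter columns end in blank cells. The sketch below is a hypothetical helper, not anything Excel exposes. It turns such a grid into one list of scores per group, assuming that blank cells are represented as None and are simply skipped.

```python
def groups_from_range(rows):
    """Split a rectangular range into {column title: [scores]}.

    rows[0] holds the column titles (as with "Labels in First Row");
    blank cells in shorter columns are represented as None and skipped."""
    titles, *data = rows
    if any(len(row) != len(titles) for row in data):
        raise ValueError("the input range must be rectangular")
    return {
        title: [row[j] for row in data if row[j] is not None]
        for j, title in enumerate(titles)
    }
```

The design choice here is to reject a ragged grid outright, mirroring the requirement that the range be a rectangle. Groups of unequal size are then recovered by skipping the blanks.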
As a check on your analysis, you should have the following in these cells:
Now, let’s discuss how you should interpret this table:
Fig 8.3 Dialog Box for Anova: Single Factor Input/Output Range
Fig 8.4 ANOVA Results for Supermarket Price Comparisons
How to Interpret the ANOVA Table Correctly
Objective: To interpret the ANOVA table correctly
ANOVA, or Analysis of Variance, is a statistical method used to determine whether there are significant differences between the means of three or more groups. It uses the F-test statistic, usually represented by the letter F, to compare the variation between the group means with the variation within the groups.
The formula for the F-test is this:
$$F = \frac{\text{Mean Square between groups } (MS_b)}{\text{Mean Square within groups } (MS_w)}$$
This Excel Guide focuses on teaching you how to use Excel rather than delving into the statistical theory behind ANOVA formulas For a comprehensive understanding of ANOVA, refer to Weiers (2011).
Excel does this division for you. It divides MSb in cell D47 (4.17) by MSw in cell D48 (1.23) and shows the resulting F-value, 3.40, in cell E47.
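The F-value in cell E47 is MSb divided by MSw. The Python below is a minimal sketch, not Excel's internal code. It computes both mean squares from raw group scores using the standard one-way ANOVA sums of squares, with k − 1 degrees of freedom between groups and n_TOTAL − k within groups. The function name and test data are hypothetical. Note that dividing the rounded values 4.17 / 1.23 by hand gives about 3.39. Excel's 3.40 presumably comes from dividing the unrounded mean squares.

```python
def one_way_anova(groups):
    """groups: a list of lists of scores, one list per group.

    Returns (ms_b, ms_w, f)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between groups: each group mean's squared distance from the grand mean,
    # weighted by the group's size.
    ss_b = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within groups: each score's squared distance from its own group mean.
    ss_w = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_b = ss_b / (k - 1)
    ms_w = ss_w / (n_total - k)
    return ms_b, ms_w, ms_b / ms_w
```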
To assess whether the F value of 3.40 signifies a significant difference among the means of the three price groups, we must first formulate the null hypothesis and the research hypothesis for these groups.
For our supermarket price comparisons, the null hypothesis states that the population means of the three groups are equal. The research hypothesis states that they are not all equal. Based on the ANOVA results, we must decide which hypothesis to accept.
Using the Decision Rule for the ANOVA F-Test
To state the hypotheses, let’s call Dierberg’s Group 1, Schnuck’s Group 2, and Shop ‘n Save Group 3. The hypotheses would then be:

$$H_0: \mu_1 = \mu_2 = \mu_3$$
$$H_1: \text{the population means } \mu_1, \mu_2, \mu_3 \text{ are not all equal}$$

The decision rule for this question is similar to the decision rule for the one-group and two-group t-tests in this book (see Sections 4.1.6 and 5.1.8). Recall that for those tests:
If the absolute value of t is less than the critical t, you accept the null hypothesis. or
If the absolute value of t is greater than the critical t, you reject the null hypothesis, and accept the research hypothesis.
Now, here is the decision rule for ANOVA:
Objective: To learn the decision rule for the ANOVA F-test
The decision rule for the ANOVA F-test is the following:
If the value for F is less than the critical F-value, accept the null hypothesis. or
If the value of F is greater than the critical F-value, reject the null hypothesis, and accept the research hypothesis.
Note that Excel tells you the critical F-value in cell G47: 3.11
Therefore, our decision rule for the supermarket ANOVA test is this:
Since the value of F of 3.40 is greater than the critical F-value of 3.11, we reject the null hypothesis and accept the research hypothesis.
Therefore, our conclusion, in plain English, is:
There is a significant difference between the population means of the three supermarket prices.
Note that F is a ratio of two mean squares. Each mean square is a sum of squared deviations divided by its degrees of freedom, so F can never be negative. That is why you never need to take its absolute value.
The ANOVA F-test tells you whether there is a significant difference among the population means of the three groups. However, it does not tell you which pairs of groups are significantly different from each other.
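The F-test and t-test decision rules above share one shape: compare the test statistic with its critical value. The sketch below uses a hypothetical function name. It takes the absolute value only when asked (for t), since F is never negative. It also treats a value exactly equal to the critical value as "not greater", a case the rules above do not address.

```python
def decide(statistic, critical_value, use_absolute_value=False):
    """Apply the decision rule: reject H0 only if the statistic exceeds the
    critical value. Use the absolute value for t-tests, not for the F-test."""
    value = abs(statistic) if use_absolute_value else statistic
    if value > critical_value:
        return "reject the null hypothesis and accept the research hypothesis"
    return "accept the null hypothesis"


# Supermarket ANOVA from the text: F = 3.40, critical F = 3.11.
f_decision = decide(3.40, 3.11)
```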
Testing the Difference Between Two Groups Using the ANOVA t-Test
Comparing Dierberg’s vs Shop ‘n Save
Objective: To compare Dierberg’s vs Shop ‘n Save in their prices for the
28 items in the shopping basket using the ANOVA t-test.
The first step is to write the null hypothesis and the research hypothesis for these two supermarkets.
In the context of the ANOVA t-test, the null hypothesis posits that the population means of Dierberg’s (Group 1) and Shop‘n Save (Group 3) are equal, while the research hypothesis suggests that these means are not equal, indicating a significant difference between the two groups.
For Group 1 vs Group 3, the formula for the ANOVA t-test is:
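$$\text{ANOVA } t = \frac{\bar{X}_1 - \bar{X}_3}{\text{s.e.}_{\text{ANOVA}}} \qquad (8.2)$$

where

$$\text{s.e.}_{\text{ANOVA}} = \sqrt{MS_w \left( \frac{1}{n_1} + \frac{1}{n_3} \right)}$$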
The steps involved in computing this ANOVA t-test are:
1 Find the difference of the sample means for the two groups (2.44 – 1.69 = 0.75).
2 Find 1/n1 + 1/n3 (since both groups have 28 supermarket items in them, this becomes: 1/28 + 1/28 = 0.0357 + 0.0357 = 0.0714)
3 Multiply MSw times the answer for step 2 (1.23 × 0.0714 = 0.0878)
4 Take the square root of step 3 (SQRT(0.0878) = 0.30)
5 Divide Step 1 by Step 4 to find ANOVA t (0.75/0.30 = 2.50)
Note that Excel carries full precision (about 15 significant digits) through every step. The hand calculation above, by contrast, rounds intermediate results (for example, rounding the standard error to 0.30). As a result, Excel gives an answer of 2.54, whereas the calculator steps above give 2.50.
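The five hand-calculation steps above map directly onto a few lines of code. The sketch below uses a hypothetical function name and returns the ANOVA t and its standard error. It uses the rounded inputs from the text: means of 2.44 and 1.69, 28 prices per store, and MSw = 1.23. Carrying full precision gives a t of about 2.53. That falls between the hand result of 2.50, which rounded the standard error to 0.30, and Excel's 2.54, which presumably also uses the unrounded means and MSw.

```python
import math


def anova_t(mean_1, mean_2, n_1, n_2, ms_w):
    """ANOVA t-test for two groups, using MSw from the one-way ANOVA table.

    Returns (t, standard_error)."""
    difference = mean_1 - mean_2           # step 1
    inverse_sizes = 1 / n_1 + 1 / n_2      # step 2
    se = math.sqrt(ms_w * inverse_sizes)   # steps 3 and 4
    return difference / se, se             # step 5


t, se = anova_t(2.44, 1.69, 28, 28, 1.23)  # Dierberg's vs Shop 'n Save
```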
To interpret the ANOVA t-test result of 2.50 correctly, you need to find the critical value of t for this test. To do that, you first need the degrees of freedom for the ANOVA t-test.
8.4.1.1 Finding the Degrees of Freedom for the ANOVA t-Test
Objective: To find the degrees of freedom for the ANOVA t-test.
For the ANOVA t-test, the degrees of freedom (df) are the total sample size of all of the groups minus the number of groups in the study:

$$df = n_{TOTAL} - k$$

where $n_{TOTAL}$ is the total sample size and $k$ is the number of groups.
In our example, the total sample size of the three groups is 84, since there are 28 prices for each of the three supermarkets. Since there are three groups, 84 − 3 gives 81 degrees of freedom for the ANOVA t-test.
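In the t-table in Appendix E, the critical t-value for df = 81 is 1.96 (read from the second column, next to the degrees of freedom). The exact two-tailed critical value at the .05 level for df = 81 is about 1.99; either value leads to the same decision in this example.
Important note: Be sure to use the degrees of freedom column (df) in Appendix E to find the critical t-value for the ANOVA t-test.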
8.4.1.2 Stating the Decision Rule for the ANOVA t-Test
Objective: To learn the decision rule for the ANOVA t-test
Interpreting the results of the ANOVA t-test adheres to the same decision-making criteria applied in both the one-group t-test and the two-group t-test.
If the absolute value of t is less than the critical value of t, we accept the null hypothesis. or
If the absolute value of t is greater than the critical value of t, we reject the null hypothesis and accept the research hypothesis.
In our analysis, the absolute value of t is 2.50, which is greater than the critical t-value of 1.96. We therefore reject the null hypothesis that the population means of the two groups are equal, and accept the research hypothesis that they are different.
This means that our conclusion, in plain English, is as follows:
The average prices of our market basket of items at Dierberg’s were significantly higher than the average prices at Shop‘n Save ($2.44 vs $1.69).
Based on these hypothetical data, the average price at Dierberg’s was 44% higher than the average price at Shop ‘n Save, a difference of $0.75 in the average item price.
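The 44% figure is the price difference expressed as a fraction of Shop ‘n Save's average price:

$$\frac{\$2.44 - \$1.69}{\$1.69} = \frac{\$0.75}{\$1.69} \approx 0.44 = 44\%$$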
8.4.1.3 Performing an ANOVA t-Test Using Excel commands
Now, let’s do these calculations for the ANOVA t-test using Excel with the file you created earlier in this chapter: SUPER6
A54: 1/n of Dierberg’s + 1/n of Shop‘n Save
A56: s.e of Dierberg’s vs Shop‘n Save
You should now have the following results in these cells when you round off all these figures in the ANOVA t-test to two decimal places:
Save this final result under the file name: SUPER7
Print out the resulting spreadsheet so that it fits onto one page like Fig. 8.5 (Hint: Reduce the Page Layout/Scale to Fit to 75%).
Fig 8.5 Final Spreadsheet of Supermarket Price Comparisons for Dierberg’s vs Shop ‘n Save
For a more detailed explanation of the ANOVA t-test, see Black (2010).
Important note: You should perform an ANOVA t-test comparing the population means of two groups only when the ANOVA F-test has shown a significant difference among the population means of all of the groups in your study.
It is improper to perform an ANOVA t-test when the F-value is less than the critical F-value, because that result means there is no significant difference among the population means of the groups. Testing for a difference between the means of any two of those groups would then simply capitalize on chance differences, and any conclusions would be unreliable.
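The gatekeeping rule above (run pairwise ANOVA t-tests only after a significant F) can be sketched end to end. The Python below is a minimal, hypothetical helper:
- It recomputes MSb, MSw, and F.
- It stops if F does not exceed the critical F-value you supply.
- Otherwise, it runs an ANOVA t-test for every pair of groups, with df = n_TOTAL − k.

The critical values must come from your tables (such as Appendix E) or from Excel; the sketch does not compute them.

```python
import math
from itertools import combinations


def pairwise_anova_t(groups, f_critical, t_critical):
    """groups maps a group name to its list of scores.

    Returns (df, results). results is None when F does not exceed f_critical;
    otherwise it maps each (group_1, group_2) pair to (t, significant)."""
    names = list(groups)
    k = len(names)
    n_total = sum(len(groups[g]) for g in names)
    df = n_total - k  # degrees of freedom for the ANOVA t-test (and for MSw)
    means = {g: sum(groups[g]) / len(groups[g]) for g in names}
    grand_mean = sum(sum(groups[g]) for g in names) / n_total
    ss_b = sum(len(groups[g]) * (means[g] - grand_mean) ** 2 for g in names)
    ss_w = sum((x - means[g]) ** 2 for g in names for x in groups[g])
    ms_b, ms_w = ss_b / (k - 1), ss_w / df
    if ms_b / ms_w <= f_critical:
        return df, None  # F not significant: do not test pairs of groups
    results = {}
    for g1, g2 in combinations(names, 2):
        se = math.sqrt(ms_w * (1 / len(groups[g1]) + 1 / len(groups[g2])))
        t = (means[g1] - means[g2]) / se
        results[(g1, g2)] = (t, abs(t) > t_critical)
    return df, results
```

For the tire problem below, you would pass the three brands' mileages together with the critical values from your printout and Appendix E. The test values 5.14 and 2.447 used for the small made-up data set are the usual .05-level table values for df = (2, 6) and df = 6.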
End-of-Chapter Practice Problems
1 Suppose that you want to compare a premium tire, Brand A, with two competing brands, B and C, in a laboratory test. Each tire was driven a simulated number of miles until its tread wore down to a specified limit. The results, in thousands of miles, are given in Fig. 8.6. For example, one Brand A tire lasted 63,000 miles before reaching the tread limit.
Fig 8.6 Worksheet Data for Chap 8: Practice Problem #1
(a) Enter these data on an Excel spreadsheet.
(b) Perform a one-way ANOVA test on these data, and show the resulting ANOVA table underneath the input data for the three brands of tires.
(c) If the F-value in the ANOVA table is significant, create an Excel formula to compute the ANOVA t-test comparing the average for Brand A against Brand C. Show the results below the ANOVA table on the spreadsheet: put the standard error and the ANOVA t-test value on separate lines of your spreadsheet, and use two decimal places for each value.
(d) Print out the resulting spreadsheet so that all of the information fits onto one page
(e) Save the spreadsheet as: TIRE7
Now, write the answers to the following questions using your Excel printout:
1 What are the null hypothesis and the research hypothesis for the ANOVA F-test?
2 What is MS b on your Excel printout?
3 What is MS w on your Excel printout?
4 Compute F = MSb/MSw using your calculator.
5 What is the critical value of F on your Excel printout?
6 What is the result of the ANOVA F-test?
7 What is the conclusion of the ANOVA F-test in plain English?
8 If the ANOVA F-test showed a significant difference in miles driven among the three brands, what are the null hypothesis and the research hypothesis comparing Brand A versus Brand C?
9 What is the mean (average) for Brand A on your Excel printout?
10 What is the mean (average) for Brand C on your Excel printout?
11 What are the degrees of freedom (df) for the ANOVA t-test comparing Brand A versus Brand C?
12 What is the critical t value for this ANOVA t-test in Appendix E for these degrees of freedom?
13 Compute the s.e.ANOVA using your calculator.
14 Compute the ANOVA t-test value comparing Brand A versus Brand C using your calculator.
15 What is the result of the ANOVA t-test comparing Brand A versus Brand C?
16 What is the conclusion of the ANOVA t-test comparing Brand A versus Brand C in plain English?
Since there are three brands of tires, you need three separate ANOVA t-tests to compare every pair of brands. You have already compared Brand A with Brand C, so the next step is the ANOVA t-test comparing Brand A versus Brand B.
17 State the null hypothesis and the research hypothesis comparing Brand A versus Brand B.
18 What is the mean (average) for Brand A on your Excel printout?
19 What is the mean (average) for Brand B on your Excel printout?
20 What are the degrees of freedom (df) for the ANOVA t-test comparing Brand A versus Brand B?
21 What is the critical t value for this ANOVA t-test in Appendix E for these degrees of freedom?
22 Compute the s.e.ANOVA for Brand A versus Brand B using your calculator.
23 Compute the ANOVA t-test value comparing Brand A versus Brand B.
24 What is the result of the ANOVA t-test comparing Brand A versus Brand B?
25 What is the conclusion of the ANOVA t-test comparing Brand A versus Brand B in plain English?
The last ANOVA t-test compares Brand B versus Brand C. Let’s do that test below:
26 State the null hypothesis and the research hypothesis comparing Brand B versus Brand C.
27 What is the mean (average) for Brand B on your Excel printout?
28 What is the mean (average) for Brand C on your Excel printout?
29 What are the degrees of freedom (df) for the ANOVA t-test comparing Brand B versus Brand C?
30 What is the critical t value for this ANOVA t-test in Appendix E for these degrees of freedom?
31 Compute $s.e._{ANOVA}$ comparing Brand B versus Brand C using your calculator.
32 Compute the ANOVA t-test value comparing Brand B versus Brand C with your calculator.
33 What is the result of the ANOVA t-test comparing Brand B versus Brand C?
34 What is the conclusion of the ANOVA t-test comparing Brand B versus Brand C in plain English?
35 What is the summary of the three ANOVA t-tests in plain English?
36 What recommendation would you make to your company about these three brands of tires based on the results of your analysis? Why would you make that recommendation?
McDonald’s introduced the "100% Angus Beef Third Pounders Burgers" to compete with Hardee’s supersize hamburgers. You have been hired as a consultant to analyze data from a 12-week test-market study in four cities that were comparable in population size, average household income, family size, and number of McDonald’s locations. Each city used only one advertising medium to promote the new burgers.
The cities were randomly assigned to radio, local TV, billboard, or local newspaper advertising, and each city spent the same amount on advertising each week. The hypothetical data in Fig. 8.7 show the weekly sales of the Angus Burger under each type of advertising.
(a) Enter these data on an Excel spreadsheet.
(b) Perform a one-way ANOVA test on these data, and show the resulting ANOVA table underneath the input data for the four types of ads.
(c) If the F-value in the ANOVA table is significant, use an Excel formula to compute the ANOVA t-test comparing the average number of units sold with Billboard ads versus Radio ads. Show the results below the ANOVA table, with the standard error and the ANOVA t-test value on separate lines, each formatted to two decimal places.
(d) Print out the resulting spreadsheet so that all of the information fits onto one page.
(e) Save the spreadsheet as: McD4
Let’s call the Radio ads Group 1, the Local TV ads Group 2, the Billboard ads Group 3, and the Local Newspaper ads Group 4.
Now, write the answers to the following questions using your Excel printout:
1 What are the null hypothesis and the research hypothesis for the ANOVA F-test?
2 What is $MS_b$ on your Excel printout?
3 What is $MS_w$ on your Excel printout?
4 Compute $F = MS_b / MS_w$ using your calculator.
5 What is the critical value of F on your Excel printout?
6 What is the result of the ANOVA F-test?
7 What is the conclusion of the ANOVA F-test in plain English?
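To see where $MS_b$, $MS_w$, and F on the printout come from, here is a minimal pure-Python sketch of the standard one-way ANOVA calculation (the same quantities Excel's Anova: Single Factor output reports). The function name `one_way_anova` is hypothetical, and the sales numbers are invented placeholders, not the Fig. 8.7 data; the critical F still has to be read from your printout.

```python
def one_way_anova(groups):
    """Return (MS_b, MS_w, F) for a list of groups, each a list of scores."""
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # total number of scores
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between groups: group size times squared distance of each group mean from the grand mean
    ss_b = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within groups: squared distance of each score from its own group mean
    ss_w = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    ms_b = ss_b / (k - 1)          # df between = k - 1
    ms_w = ss_w / (n_total - k)    # df within = n_TOTAL - k
    return ms_b, ms_w, ms_b / ms_w

# Hypothetical weekly unit sales for four ad types (placeholder numbers)
sales = [
    [10, 12, 11, 13],   # Group 1: Radio
    [14, 15, 13, 16],   # Group 2: Local TV
    [9, 8, 10, 9],      # Group 3: Billboards
    [12, 11, 13, 12],   # Group 4: Local Newspaper
]
ms_b, ms_w, f = one_way_anova(sales)
print(f"MS_b = {ms_b:.2f}, MS_w = {ms_w:.2f}, F = {f:.2f}")  # MS_b = 20.33, MS_w = 1.17, F = 17.43
```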
Fig. 8.7 Worksheet Data for Chap. 8: Practice Problem #2
If the ANOVA F-test shows a significant difference in Angus Burger sales among the four ad types, the null hypothesis for the ANOVA t-test comparing Billboard ads (Group 3) and Radio ads (Group 1) is that there is no difference in the number of Angus Burgers sold with these two advertising methods. The research hypothesis is that there is a difference in the number of Angus Burgers sold with Billboard ads versus Radio ads.
9 What is the mean (average) for Billboard ads on your Excel printout?
10 What is the mean (average) for Radio ads on your Excel printout?
11 What are the degrees of freedom (df) for the ANOVA t-test comparing Billboard ads versus Radio ads?
12 What is the critical t value for this ANOVA t-test in Appendix E for these degrees of freedom?
13 Compute $s.e._{ANOVA}$ for Billboard ads versus Radio ads using your calculator.
14 Compute the ANOVA t-test value comparing Billboard ads versus Radio ads using your calculator.
15 What is the result of the ANOVA t-test comparing Billboard ads versus Radio ads?
16 What is the conclusion of the ANOVA t-test comparing Billboard ads versus Radio ads in plain English?
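Questions 6 and 15 ask for the result of each test. Assuming the usual decision rules (reject the null hypothesis when F is greater than the critical F, or when the absolute value of the ANOVA t is greater than the critical t), the comparison can be sketched as two tiny Python helpers. The function names are hypothetical, and the critical values are inputs: read F crit from your Excel printout and the critical t from Appendix E.

```python
def reject_null_f(f_value, f_critical):
    """ANOVA F-test: reject the null hypothesis when F exceeds the critical F."""
    return f_value > f_critical

def reject_null_t(t_value, t_critical):
    """ANOVA t-test: reject the null hypothesis when |t| exceeds the critical t.

    The absolute value is used because the research hypothesis is two-sided
    (a difference in either direction).
    """
    return abs(t_value) > t_critical

# Hypothetical values, not taken from Fig. 8.7
print(reject_null_f(17.43, 3.49))   # True
print(reject_null_t(-2.10, 2.179))  # False
```

A negative t only means the first group's mean is smaller than the second's; the sign does not change the decision.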
As a consultant for Procter & Gamble, I analyzed data from a pilot study in which three focus groups viewed four new television commercials for Crest toothpaste. After viewing the commercials, participants completed a 10-item survey; Fig. 8.8 shows the results for question #8, which measured how believable each commercial was. These data help the company judge how effective the commercials are before they are launched on television.
Fig. 8.8 Worksheet Data for Chap. 8: Practice Problem #3
(a) Enter these data on an Excel spreadsheet.
(b) Perform a one-way ANOVA test on these data, and show the resulting ANOVA table underneath the input data for the four types of commercials.
(c) If the F-value in the ANOVA table is significant, use an Excel formula to compute the ANOVA t-test comparing the average for Commercial B versus Commercial D. Show the standard error and the ANOVA t-test value on separate lines in the spreadsheet, each rounded to two decimal places.
(d) Print out the resulting spreadsheet so that all of the information fits onto one page.
(e) Save the spreadsheet as: TV6
Now, write the answers to the following questions using your Excel printout:
1 What are the null hypothesis and the research hypothesis for the ANOVA F-test?
2 What is $MS_b$ on your Excel printout?
3 What is $MS_w$ on your Excel printout?
4 Compute $F = MS_b / MS_w$ using your calculator.
5 What is the critical value of F on your Excel printout?
6 What is the result of the ANOVA F-test?
7 What is the conclusion of the ANOVA F-test in plain English?
If the ANOVA F-test shows a significant difference in believability among the four TV commercials, the null hypothesis for the ANOVA t-test comparing Commercial B versus Commercial D is that there is no difference in believability between the two commercials. The research hypothesis is that there is a difference in believability between Commercial B and Commercial D.
9 What is the mean (average) for Commercial B on your Excel printout?
10 What is the mean (average) for Commercial D on your Excel printout?
11 What are the degrees of freedom (df) for the ANOVA t-test comparing Commercial B versus Commercial D?
12 What is the critical t value for this ANOVA t-test in Appendix E for these degrees of freedom?
13 Compute $s.e._{ANOVA}$ for Commercial B versus Commercial D using your calculator.
14 Compute the ANOVA t-test value comparing Commercial B versus Commercial D using your calculator.
15 What is the result of the ANOVA t-test comparing Commercial B versus Commercial D?
16 What is the conclusion of the ANOVA t-test comparing Commercial B versus Commercial D in plain English?
Appendix A: Answers to End-of-Chapter Practice Problems
Chapter 1: Practice Problem #1 Answer (see Fig. A.1)
Chapter 1: Practice Problem #2 Answer (see Fig. A.2)