
Excel 2013 for physical sciences statistics a guide to solving practical problems


DOCUMENT INFORMATION

Basic information

Title: Excel 2013 for Physical Sciences Statistics: A Guide to Solving Practical Problems
Authors: Thomas J. Quirk, Meghan H. Quirk, Howard F. Horton
Advisor: Thomas J. Quirk
Institution: Springer International Publishing
Field: Physical Sciences Statistics
Type: guide
Year of publication: 2016
City: Switzerland
Number of pages: 258
File size: 10.3 MB

Outline

  • 1.1 Mean
  • 1.2 Standard Deviation
  • 1.3 Standard Error of the Mean
  • 1.4 Sample Size, Mean, Standard Deviation, and Standard Error of the Mean
    • 1.4.1 Using the Fill/Series/Columns Commands
    • 1.4.2 Changing the Width of a Column
    • 1.4.3 Centering Information in a Range of Cells
    • 1.4.4 Naming a Range of Cells
    • 1.4.5 Finding the Sample Size Using the =COUNT Function
    • 1.4.6 Finding the Mean Score Using the =AVERAGE Function
    • 1.4.7 Finding the Standard Deviation Using the =STDEV Function
    • 1.4.8 Finding the Standard Error of the Mean
  • 1.5 Saving a Spreadsheet
  • 1.6 Printing a Spreadsheet
  • 1.7 Formatting Numbers in Currency Format (Two Decimal Places)
  • 1.8 Formatting Numbers in Number Format (Three Decimal Places)
  • 1.9 End-of-Chapter Practice Problems
  • 2.1 Creating Frame Numbers for Generating Random Numbers
  • 2.2 Creating Random Numbers in an Excel Worksheet
  • 2.3 Sorting Frame Numbers into a Random Sequence
  • 2.4 Printing an Excel File So That All of the Information Fits onto One Page
  • 2.5 End-of-Chapter Practice Problems
  • 3.1 Confidence Interval About the Mean
    • 3.1.1 How to Estimate the Population Mean
    • 3.1.2 Estimating the Lower Limit and the Upper Limit
    • 3.1.3 Estimating the Confidence Interval the Chevy
    • 3.1.4 Where Did the Number “1.96” Come From?
    • 3.1.5 Finding the Value for t in the Confidence
    • 3.1.6 Using Excel’s TINV Function to Find the
    • 3.1.7 Using Excel to Find the 95% Confidence Interval
  • 3.2 Hypothesis Testing
    • 3.2.1 Hypotheses Always Refer to the Population
    • 3.2.2 The Null Hypothesis and the Research (Alternative) Hypothesis
    • 3.2.3 The 7 Steps for Hypothesis-Testing Using the
  • 3.3 Alternative Ways to Summarize the Result of a
    • 3.3.1 Different Ways to Accept the Null Hypothesis
    • 3.3.2 Different Ways to Reject the Null Hypothesis
  • 3.4 End-of-Chapter Practice Problems
  • 4.1 The 7 STEPS for Hypothesis-Testing Using
    • 4.1.1 STEP 1: State the Null Hypothesis
    • 4.1.2 STEP 2: Select the Appropriate Statistical Test
    • 4.1.4 STEP 4: Calculate the Formula
    • 4.1.5 STEP 5: Find the Critical Value of t
    • 4.1.6 STEP 6: State the Result of Your Statistical Test
    • 4.1.7 STEP 7: State the Conclusion of Your Statistical
  • 4.2 One-Group t-Test for the Mean
  • 4.3 Can You Use Either the 95% Confidence Interval
  • 4.4 End-of-Chapter Practice Problems
  • 5.1 The 9 STEPS for Hypothesis-Testing Using
    • 5.1.1 STEP 1: Name One Group, Group 1,
    • 5.1.2 STEP 2: Create a Table That Summarizes
    • 5.1.3 STEP 3: State the Null Hypothesis and the
    • 5.1.4 STEP 4: Select the Appropriate Statistical Test
    • 5.1.5 STEP 5: Decide on a Decision Rule
    • 5.1.6 STEP 6: Calculate the Formula
    • 5.1.7 STEP 7: Find the Critical Value of t
    • 5.1.8 STEP 8: State the Result of Your Statistical Test
    • 5.1.9 STEP 9: State the Conclusion of Your
  • 5.2 Formula #1: Both Groups Have a Sample Size
    • 5.2.1 An Example of Formula #1 for the Two-Group t-Test
  • 5.3 Formula #2: One or Both Groups Have a Sample
  • 5.4 End-of-Chapter Practice Problems
  • 6.1 What Is a “Correlation?”
    • 6.1.1 Understanding the Formula for Computing
    • 6.1.2 Understanding the Nine Steps for Computing
  • 6.2 Using Excel to Compute a Correlation Between
  • 6.3 Creating a Chart and Drawing the Regression
    • 6.3.1 Using Excel to Create a Chart and the
  • 6.4 Printing a Spreadsheet So That the Table and
  • 6.5 Finding the Regression Equation
    • 6.5.1 Installing the Data Analysis ToolPak into Excel
    • 6.5.2 Using Excel to Find the SUMMARY
    • 6.5.3 Finding the Equation for the Regression Line
    • 6.5.4 Using the Regression Line to Predict the y-Value
  • 6.6 Adding the Regression Equation to the Chart
  • 6.7 How to Recognize Negative Correlations in the
  • 6.8 Printing Only Part of a Spreadsheet Instead of the
    • 6.8.1 Printing Only the Table and the Chart on
    • 6.8.2 Printing Only the Chart on a Separate Page
    • 6.8.3 Printing Only the SUMMARY OUTPUT
  • 6.9 End-of-Chapter Practice Problems
  • 7.1 Multiple Regression Equation
  • 7.2 Finding the Multiple Correlation and the Multiple
  • 7.3 Using the Regression Equation to Predict
  • 7.4 Using Excel to Create a Correlation Matrix
  • 7.5 End-of-Chapter Practice Problems
  • 8.1 Using Excel to Perform a One-Way Analysis of
  • 8.2 How to Interpret the ANOVA Table Correctly
  • 8.3 Using the Decision Rule for the ANOVA F-test
  • 8.4 Testing the Difference Between Two Groups
    • 8.4.1 Comparing Brand A vs. Brand C in Miles
  • 8.5 End-of-Chapter Practice Problems

Content

Mean

The mean, often called the “arithmetic average,” is a measure of the center of a set of scores. When my daughter was in fifth grade, she was confused about how to calculate an average, and I realized how important it is to explain this concept clearly.

Jennifer wanted me to take her seriously while I explained that you find the average by adding up all the scores and dividing by how many scores there are. She was not amused by my lighthearted approach and made it clear that she wanted to focus on the task at hand.

“See these numbers in your book; add them up. What is the answer?” (She did that.)

“Now, how many numbers do you have?” (She answered that question.)

“Then, take the number you got when you added up the numbers, and divide that number by the number of numbers that you have.”

If you use this same logic, finding the mean is easy, because Excel will do all of these steps for you.

We will call this average of the scores the “mean,” which we will symbolize as $\bar{X}$ and pronounce as “Xbar.”

The formula for finding the mean with your calculator looks like this:

$$\bar{X} = \frac{\sum X}{n} \qquad (1.1)$$

© Springer International Publishing Switzerland 2016

T.J Quirk et al., Excel 2013 for Physical Sciences Statistics,

The Greek letter sigma (Σ) represents "sum," indicating that you should add all the values represented by X and then divide the total by n, which signifies the count of the numbers involved.

Suppose that you had six chemistry test scores on a 7-item true-false quiz, and that these scores add up to 25.

To find the mean of these scores, you add them up and then divide by the number of scores. So, the mean is: 25/6 = 4.17
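As a minimal sketch of formula (1.1) outside Excel, the Python function below adds the scores (ΣX) and divides by how many there are (n). The function name `mean` and the example scores are hypothetical illustrations, since the six quiz scores themselves are not reproduced here; the last line only checks the 25/6 arithmetic quoted above.

```python
def mean(scores):
    """Return the arithmetic mean: the sum of the scores (ΣX) divided by n."""
    if not scores:
        raise ValueError("need at least one score")
    return sum(scores) / len(scores)


# Hypothetical quiz scores, for illustration only (not the six scores in the text).
example_scores = [3, 5, 4, 6]
print(mean(example_scores))  # 4.5

# The arithmetic quoted in the text: a total of 25 over 6 scores.
print(round(25 / 6, 2))  # 4.17
```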

Standard Deviation

The standard deviation tells you how close the scores are to the mean, that is, how tightly or loosely the data points cluster around it. A small standard deviation means that the scores are closely grouped around the mean; a large standard deviation means that the scores are widely spread out. The standard deviation, which we will symbolize as $S$, is found with this formula:

$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

The formula looks complicated, but what it asks you to do is this:

1. Subtract the mean from each score: $(X - \bar{X})$.

2. Then, square the resulting number to make it a positive number.

3. Then, add up these squared numbers to get a total score.

4. Then, take this total score and divide it by $n - 1$ (where $n$ stands for the number of numbers that you have).

5. The final step is to take the square root of the number you found in step 4.
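The five steps above map one-to-one onto the short Python sketch below. The function name and the example scores are hypothetical. Excel's STDEV function likewise returns the sample standard deviation (dividing by n − 1), as does Python's `statistics.stdev`, which the sketch uses as a cross-check.

```python
import math
import statistics


def sample_std_dev(scores):
    """Sample standard deviation S, computed with the five steps listed above."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    x_bar = sum(scores) / n
    deviations = [x - x_bar for x in scores]  # Step 1: X minus the mean
    squared = [d * d for d in deviations]     # Step 2: square each difference
    total = sum(squared)                      # Step 3: add the squares
    variance = total / (n - 1)                # Step 4: divide by n - 1
    return math.sqrt(variance)                # Step 5: take the square root


# Hypothetical scores, for illustration only.
example_scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(sample_std_dev(example_scores), 4))
# statistics.stdev also divides by n - 1, so the two results should agree.
print(round(statistics.stdev(example_scores), 4))
```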

This book focuses on how to use Excel to calculate the standard deviation rather than on computing it by hand; traditional statistics books, such as Schuenemeyer and Drew (2011), show the manual calculation. For example, if you use Excel to find the standard deviation (STDEV) of the six quiz scores above, the result is 1.47.

2 1 Sample Size, Mean, Standard Deviation, and Standard Error of the Mean

Standard Error of the Mean

The formula for the standard error of the mean (s.e., which we will symbolize as $S_{\bar{X}}$) is:

$$\text{s.e.} = S_{\bar{X}} = \frac{S}{\sqrt{n}}$$

To calculate the s.e., divide the standard deviation (STDEV) by the square root of $n$, where $n$ is the number of observations in your data set. For the six quiz scores above, the s.e. is $1.47/\sqrt{6} \approx 0.60$, which you can easily verify with a calculator.
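A calculator check of $S/\sqrt{n}$ can also be written as a tiny Python function; `standard_error` is a hypothetical name. The inputs are figures quoted in this chapter: STDEV = 1.47 for the six quiz scores, and STDEV = 0.352288 with n = 8 for the sulfur dioxide data worked later in the chapter.

```python
import math


def standard_error(s, n):
    """Standard error of the mean: the standard deviation S divided by the square root of n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return s / math.sqrt(n)


# Six quiz scores with a standard deviation of 1.47 (figures from the text).
print(round(standard_error(1.47, 6), 2))  # 0.6
# Sulfur dioxide data: STDEV = 0.352288 with n = 8 (figures from the text).
print(round(standard_error(0.352288, 8), 6))  # 0.124553
```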

If you want to learn more about the standard deviation and the standard error of the mean, see McKillup and Dyar (2010) and Schuenemeyer and Drew (2011). This chapter shows you how to use Excel to find the sample size, mean, standard deviation, and standard error of the mean for the levels of sulfur dioxide in rainfall, measured in milligrams per liter (mg/L). One milligram (mg) is one-thousandth of a gram, and one liter (L) is a metric unit of volume roughly equal to the volume of one kilogram of pure water. For this analysis, we will use data from eight samples of rainfall, shown in Fig. 1.1.

Fig 1.1 Worksheet Data for Sulphur Dioxide Levels


Sample Size, Mean, Standard Deviation, and Standard Error of the Mean

Using the Fill/Series/Columns Commands

Objective: To add the sample numbers 2–8 in a column underneath Sample #1

Home (top left of screen)

Fill (top right of screen: click on the down arrow; see Fig. 1.2)

Fig 1.2 Home/Fill/Series commands


The sample numbers should be identified as 1–8, with 8 in cell B11.
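The Fill/Series/Columns command builds a linear series from a step value and a stop value (Fig 1.3). The Python sketch below mimics that behavior for positive steps; `fill_series` is a hypothetical helper, and the assumption that Sample #1 sits in cell B4 is inferred from the data occupying C4:C11.

```python
def fill_series(start, step, stop):
    """Linear series like Excel's Fill/Series: start, start + step, ... not exceeding stop."""
    if step <= 0:
        raise ValueError("this sketch handles only positive step values")
    values = []
    value = start
    while value <= stop:
        values.append(value)
        value += step
    return values


sample_numbers = fill_series(1, 1, 8)
print(sample_numbers)  # [1, 2, 3, 4, 5, 6, 7, 8]

# Assumption: Sample #1 sits in cell B4, so the last number lands 7 rows lower.
first_row = 4
last_row = first_row + len(sample_numbers) - 1
print(f"B{last_row}")  # B11
```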

Now, enter the milligrams-per-liter values into cells C4 through C11, and check your entries carefully: a single typing error will give you wrong results.

Since your computer screen shows the information in a format that does not look professional, you need to learn how to “widen the column width” and how to “center the information” in a group of cells. Here is how you can do those two steps:

Changing the Width of a Column

Objective: To make a column width wider so that all of the information fits inside that column

To ensure that all the information fits properly, you need to widen Column C on your computer screen.

Click on the letter, C, at the top of your computer screen

Place your mouse pointer on your computer at the far right corner of C until you create a “cross sign” on that corner

Fig 1.3 Example of Dialogue Box for Fill/Series/Columns/Step Value/Stop Value commands

Left-click on your mouse, hold it down, and move this corner to the right until it is “wide enough to fit all of the data.”

Take your finger off your mouse to set the new column width (see Fig. 1.4)

Then, click on any empty cell (i.e., any blank cell) to “deselect” column C so that it is no longer a darker color on your screen.

When you widen a column, you will make all of the cells in all of the rows of this column that same width.

Now, let’s go through the steps to center the information in both Column B and Column C.

Centering Information in a Range of Cells

Objective: To center the information in a group of cells

In order to make the information in the cells look “more professional,” you can center the information using the following steps:

Left-click your mouse pointer on B3 and drag it to the right and down to highlight cells B3:C11 so that these cells appear in a darker color

At the top of your screen, in the Alignment section, find the icon whose horizontal lines are all centered (the Center icon); it is the second icon from the left in the bottom row of the Alignment section (see Fig. 1.5).

Fig 1.4 Example of How to Widen the Column Width


Click on this icon to center the information in the selected cells (see Fig. 1.6)

Instead of remembering the cell locations C4:C11 every time you refer to the milligrams-per-liter data in a formula, it is easier to give this range a name. For example, you can name this group of cells “Weight,” or any other name you prefer.

Fig 1.5 Example of How to Center Information Within Cells

Centering Information in the Cells


Naming a Range of Cells

Objective: To name the range of data for the milligrams per liter with the name: Weight

Highlight cells C4:C11 by left-clicking your mouse pointer on C4 and dragging it down to C11

Formulas (top left of your screen)

Define Name (top center of your screen)

Weight (type this name in the top box; see Fig. 1.7)

Then, click on any cell of your spreadsheet that does not have any information in it (i.e., it is an “empty cell”) to deselect cells C4:C11

Now, add the following terms to your spreadsheet:

Fig 1.7 Dialogue box for “naming a range of cells” with the name: Weight


When utilizing formulas in Excel, it is essential to begin the function name with an equal sign (=) to ensure that Excel recognizes your intention to execute a formula.

Finding the Sample Size Using the =COUNT Function

Objective: To find the sample size (n) for these data using the =COUNT function

This command should insert the number 8 into cell F6, since your data contain eight rainfall samples.

Finding the Mean Score Using the =AVERAGE Function

Objective: To find the mean weight figure using the =AVERAGE function

This command should insert the number 0.8125 into cell F9.

Fig 1.8 Example of Entering the Sample Size, Mean, STDEV, and s.e Labels

1.4 Sample Size, Mean, Standard Deviation, and Standard Error of the Mean 9

Finding the Standard Deviation Using the =STDEV Function

Objective: To find the standard deviation (STDEV) using the =STDEV function

This command should insert the number 0.352288 into cell F12.

Finding the Standard Error of the Mean

Objective: To find the standard error of the mean using a formula for these eight data points

This command should insert the number 0.124553 into cell F15 (see Fig. 1.9).
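To cross-check the worksheet, the Python sketch below computes the same four results as cells F6, F9, F12, and F15. The readings are hypothetical (the Fig 1.1 data are not reproduced here), so the output will not match 0.8125, 0.352288, and 0.124553. Two assumptions: every value is numeric (Excel's COUNT counts only cells that contain numbers, while `len` counts every item), and `statistics.stdev`, like Excel's STDEV, divides by n − 1.

```python
import math
import statistics


def summarize(values):
    """The four results of cells F6, F9, F12, and F15, computed in Python."""
    n = len(values)                   # sample size (Excel: COUNT)
    mean = statistics.mean(values)    # mean (Excel: AVERAGE)
    stdev = statistics.stdev(values)  # sample standard deviation (Excel: STDEV), divides by n - 1
    se = stdev / math.sqrt(n)         # standard error of the mean
    return {"Sample size": n, "Mean": mean, "STDEV": stdev, "s.e.": se}


# Hypothetical readings in mg/L, for illustration only (NOT the Fig 1.1 data).
readings = [0.5, 1.0, 0.75, 1.25]
for label, value in summarize(readings).items():
    print(f"{label}: {value}")
```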

It's crucial to verify that all figures in your spreadsheet are accurately placed in their respective cells, as any discrepancies will lead to incorrect formula calculations.

Fig 1.9 Example of Using Excel Formulas for Sample Size, Mean, STDEV, and s.e.


1.4.8.1 Formatting Numbers in Number Format (Two Decimal Places)

Objective: To convert the mean, STDEV, and s.e. to two decimal places

Highlight cells F9:F15

Home (top left of screen)

Locate the Number section at the top center of your screen. Then, move your mouse pointer over the icons at the bottom right of that section until the tooltip “Decrease Decimal” appears (see Fig. 1.10).

Click on this icon twice and notice that the cells F9:F15 are now all in just two decimal places (see Fig. 1.11)

Fig 1.10 Using the “Decrease Decimal Icon” to convert Numbers to Fewer Decimal Places

Now, click on any “empty cell” on your spreadsheet to deselect cells F9:F15.
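Decrease Decimal changes only how a number is displayed; by default, Excel formulas that refer to F9, F12, or F15 still use the full stored values. The Python sketch below, with a hypothetical `displayed` helper, shows the same distinction for the three values quoted in the text.

```python
def displayed(value, places):
    """Text that a fixed-decimal number format would show for this value."""
    return f"{value:.{places}f}"


# Full values that the text says appear in cells F9, F12, and F15.
stored = {"Mean": 0.8125, "STDEV": 0.352288, "s.e.": 0.124553}

for label, value in stored.items():
    # The displayed text is rounded; the stored value itself is unchanged.
    print(f"{label}: displayed as {displayed(value, 2)}, stored as {value}")
```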

Saving a Spreadsheet

Objective: To save this spreadsheet with the name: sulphur3

To save this spreadsheet so that you can use it later, you first need to decide where to store it. You can save it on your computer's hard drive (ask someone to show you how if you are unsure) or on an external storage device such as a CD or a flash drive.

To save your file, simply scroll through the left sidebar to select your desired location, such as "My Documents," and click to confirm your choice.

File name: sulphur3 (enter this name to the right of File name; see Fig. 1.12)

Fig 1.11 Example of Converting Numbers to Two Decimal Places


Important note: Be very careful to save your Excel file spreadsheet every few minutes so that you do not lose your information!

Printing a Spreadsheet

Objective: To print the spreadsheet

Use the following procedure when printing any spreadsheet.

Print Active Sheets (see Fig. 1.13)

Fig 1.12 Dialogue Box of Saving an Excel Workbook File as “sulphur3” in My Documents location

Print (top of your screen)

The final spreadsheet is given in Fig. 1.14.

Fig 1.13 Example of How to Print an Excel Worksheet Using the File/Print/Print


Before concluding this chapter, let's practice adjusting the format of figures in a spreadsheet through two examples: first, formatting dollar amounts to display two decimal places, and second, formatting numerical figures to show three decimal places.

Close your spreadsheet by: File/Close/Don’t Save, and open a blank Excel spreadsheet by using File/New/Blank Workbook (on the top left of your screen).

Formatting Numbers in Currency Format (Two Decimal Places)

Objective: To change the format of figures to dollar format with two decimal places

Highlight cells A4:A6 by left-clicking your mouse on A4 and dragging it down so that these three cells are highlighted in a darker color

Number (top center of screen: click on the down arrow on the right; see Fig. 1.15)

Fig 1.14 Final Result of Printing an Excel Spreadsheet


Decimal places: 2 (then see Fig. 1.16)

The three cells should have a dollar sign in them and be in two decimal places. Next, let’s practice formatting figures in number format, three decimal places.

Fig 1.15 Dialogue Box for Number Format Choices

Fig 1.16 Dialogue Box for Currency (two decimal places) Format for Numbers
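For comparison, here is a small Python sketch of a currency-style display with a dollar sign, thousands separators, and two decimal places. The helper name and the amounts are hypothetical, since the text does not list the numbers in A4:A6. Negative amounts are shown here with a leading minus sign; Excel's Currency dialog offers several choices for displaying negatives.

```python
def as_currency(amount):
    """Show a number with a dollar sign, thousands separators, and two decimal places."""
    sign = "-" if amount < 0 else ""
    return f"{sign}${abs(amount):,.2f}"


# Hypothetical cell values (the text does not list the numbers in A4:A6).
for amount in [3.5, 1234.567, -20]:
    print(as_currency(amount))
```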


Formatting Numbers in Number Format (Three Decimal Places)

Objective: To format figures in number format, three decimal places

Highlight cells A4:A6 on your computer screen

Number (click on the down arrow on the right)

At the right of the box, change two decimal places to three decimal places by clicking on the “up arrow” once

Make sure that the three figures are now in number format, three decimal places. Next, click on any empty cell to deselect A4:A6. Finally, close the file (File/Close/Don’t Save); there is no need to save this practice problem.

You can use these same commands to format a range of cells in percentage format (and many other formats) to whatever number of decimal places you want to specify.
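The same idea applies to other formats. Excel's Percentage format displays the cell value multiplied by 100 followed by a % sign, and Python's `%` format type does the same. The sketch below uses hypothetical helper names and hypothetical cell values.

```python
def as_number(value, places):
    """Number format with a fixed number of decimal places."""
    return f"{value:.{places}f}"


def as_percentage(value, places):
    """Percentage format: the value times 100, followed by a percent sign."""
    return f"{value:.{places}%}"


# Hypothetical cell values.
for value in [0.5, 2.71828, 0.0412]:
    print(as_number(value, 3), as_percentage(value, 1))
```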

End-of-Chapter Practice Problems

1. Limonite, a mixture of several minerals, strongly influences the color of soils and of weathered rock surfaces, and it is also found in iron ore. Suppose that you wanted to analyze the iron content of limonite samples by calculating the mean, standard deviation, and standard error of the mean for the hypothetical data shown in Fig. 1.17.


(a) Use Excel to find the sample size, mean, standard deviation, and standard error of the mean for these data. Label your answers, and round off the mean, standard deviation, and standard error of the mean to two decimal places using number format.

(b) Print the result on a separate page.

(c) Save the file as: iron3

2. Suppose that you are a research assistant analyzing air samples collected near Route 101 in San Francisco on weekday afternoons between 4 p.m. and 7 p.m. The goal is to determine the average lead concentration in micrograms per cubic meter (μg/m³). The data are given in Fig. 1.18.

Fig 1.17 Worksheet Data for Chap 1: Practice

Fig 1.18 Worksheet Data for Chap 1: Practice


(a) Use Excel to create a table of these data, and then find the sample size, mean, standard deviation, and standard error of the mean, placing these results to the right of the table. Label your answers, and round off the mean, standard deviation, and standard error of the mean to two decimal places using number format.

(b) Print the result on a separate page.

(c) Save the file as: air3

3. Suppose that 16 ore samples were collected from various locations within a mine and analyzed for their silver content. Each sample was processed to determine the percentage of silver it contained; the hypothetical results are given in Fig. 1.19.

(a) Use Excel to create a table of these data, and then find the sample size, mean, standard deviation, and standard error of the mean. Label your answers, and round off the mean, standard deviation, and standard error of the mean to three decimal places using number format.

(b) Print the result on a separate page.

(c) Save the file as: SILVER3

Fig 1.19 Worksheet Data for Chap 1: Practice


References

McKillup S, Dyar M. Geostatistics Explained: An Introductory Guide for Earth Scientists. Cambridge: Cambridge University Press; 2010.

Schuenemeyer J, Drew L. Statistics for Earth and Environmental Scientists. Hoboken: John Wiley & Sons; 2011.


Salt marshes are vital coastal wetlands along the protected shorelines of the eastern USA, where fresh water and seawater meet. These ecosystems are flooded by ocean tides, so the plants that live there must be able to tolerate salt water.

Salinity, the salt content of water, varies with how close a part of the marsh is to the ocean. A biogeographer studying how salinity affects the vegetation of a salt marsh in Maine has mapped the marsh in detail.

Suppose that, to study salinity levels in this salt marsh, you need to select 5 of its 32 distinct geographic areas at random. Your first step, using Excel, is to create a “sampling frame” that lists the specific areas from which the sample will be drawn. Selecting the areas at random helps make the salinity measurements representative of the marsh as a whole.

To select a random sample, you need a sampling frame: a list of the objects, events, or people from which the sample is drawn. Here, the sampling frame is the list of the 32 areas of the salt marsh, numbered from 1 for the first area to 32 for the last area, so that each area has a unique ID number.

We will first create the frame numbers as follows in a new Excel worksheet:

Creating Frame Numbers for Generating Random Numbers

Objective: To create the frame numbers for generating random numbers


Use the Home/Fill commands to create the frame numbers in column A, starting with the number 1 in cell A4 and ending with the number 32 in cell A35. If you need a reminder of how to use these commands, see Section 1.4.1 of this book.

Click on cell A4 to select this cell

Fill (then click on the “down arrow” next to this command and select Series; see Fig. 2.1)

Then, save this file as: Random29

You should obtain the result in Fig. 2.3.

Fig 2.1 Dialogue Box for Fill/Series Commands

Fig 2.2 Dialogue Box for Fill/Series/Columns/Step value/Stop value Commands

Now, create a column next to these frame numbers in this manner:

Use the Home/Fill commands again to put duplicate frame numbers in cells B4 through B35. Then, widen columns A and B so that all of the information fits inside them, and center the information in both columns (see Fig. 2.4).

Fig 2.3 Frame Numbers from 1 to 32


Save this file as: Random30

Note that you have duplicated the frame numbers in Column A and Column B. Column B supplies the 32 frame numbers that you will sort into a random sequence, while Column A keeps the original, unsorted list.

Now, let’s add a random number to each of the duplicate frame numbers as follows:

Creating Random Numbers in an Excel Worksheet

C3: RANDOM NO (then widen columns A, B, C so that their labels fit inside the columns; then center the information in A3:C35)

Next, type =RAND() into cell C4 and press the Enter key to add a random number to that cell.

Note that RAND() takes no arguments, but you must still type both the opening and the closing parenthesis after RAND. The function returns a random number that is greater than or equal to 0 and less than 1; because it sits in the same row as a duplicate frame number, each frame number is paired with its own random number.

To generate random numbers for all 32 duplicate frame numbers, move your mouse pointer to the bottom right corner of cell C4 until a “plus sign” appears. Then, left-click, hold the mouse button down, and drag down to cell C35.

Then, click on any empty cell to deselect C4:C35 to remove the dark color highlighting these cells.

Save this file as: Random31
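The three columns built so far (frame numbers, duplicate frame numbers, and one random number per row) can be sketched in Python as below. Python's `random.random()`, like Excel's RAND, returns a float that is at least 0.0 and less than 1.0, but it is a different generator, so its values will not match your worksheet. The variable names are hypothetical.

```python
import random

N_AREAS = 32  # the 32 areas of the salt marsh in the sampling frame

frame_numbers = list(range(1, N_AREAS + 1))  # column A: 1 to 32
duplicate_frame = list(frame_numbers)        # column B: a separate copy to be sorted later
random_numbers = [random.random() for _ in duplicate_frame]  # column C: one value per row, like =RAND()

# Show the first three rows, with the random numbers to three decimal places.
for frame, dup, rnd in zip(frame_numbers, duplicate_frame, random_numbers[:3]):
    print(frame, dup, f"{rnd:.3f}")
```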

Now, let’s sort these duplicate frame numbers into a random sequence:

Random Numbers Assigned to the Duplicate Frame


Sorting Frame Numbers into a Random Sequence

Objective: To sort the duplicate frame numbers into a random sequence

Highlight cells B3:C35 (include the labels at the top of columns B and C)

Data (top of screen)

Sort (click on this word at the top center of your screen; see Fig. 2.6)

Sort by: RANDOM NO (click on the down arrow)

Smallest to Largest (see Fig. 2.7)

Fig 2.6 Dialogue Box for Data/Sort Commands

Click on any empty cell to deselect B3:C35.

Save this file as: Random32

These steps will produce Fig. 2.8, with the DUPLICATE FRAME NUMBERS sorted into a random order:

Important note: Because Excel randomly assigns these random numbers, your Excel commands will produce a different sequence of random numbers from everyone else who reads this book!

Fig 2.7 Dialogue Box for Data/Sort/RANDOM NO./Smallest to Largest Commands


Because your objective at the beginning of this chapter was to select 5 of the 32 areas of the salt marsh at random, you can now do that by selecting the first five ID numbers in the DUPLICATE FRAME NO. column after the sort.

Your first five ID numbers will probably differ from the five that we selected in this chapter; our five area IDs are identified in Fig. 2.9.

Save this file as: Random33

Each time you use the RAND() function in Excel, it generates a new set of random numbers, meaning that the five ID numbers you select will differ from those shown in Fig 2.9.
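The whole procedure (pair each frame number with a random number, sort by the random number from smallest to largest, and keep the first few IDs) fits in one short Python function. `select_random_areas` is a hypothetical name. The optional seed is a Python feature used here only to make a run repeatable; Excel's RAND has no seed argument. Python's `random.sample(range(1, 33), 5)` would draw the same kind of sample directly.

```python
import random


def select_random_areas(n_frame, n_sample, seed=None):
    """Pair frame numbers 1..n_frame with random numbers, sort by them, keep the first n_sample."""
    if not 0 < n_sample <= n_frame:
        raise ValueError("n_sample must be between 1 and n_frame")
    rng = random.Random(seed)  # a fixed seed makes a run repeatable; None gives a fresh sequence
    rows = [(frame, rng.random()) for frame in range(1, n_frame + 1)]
    rows.sort(key=lambda row: row[1])  # like Data/Sort by RANDOM NO, Smallest to Largest
    return [frame for frame, _ in rows[:n_sample]]


# Five of the 32 salt-marsh areas; without a seed, each run gives different IDs.
print(select_random_areas(32, 5))
```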

Before concluding this chapter, it's essential to understand how to print a file effectively, ensuring that all its information fits neatly onto a single page without spilling over onto additional pages.

Printing an Excel File So That All of the Information Fits onto One Page

Objective: To print a file so that all of the information fits onto one page


The three practice problems at the end of this chapter ask you to sort random numbers for files that contain 63 resistors, 114 steel samples, and 75 toxic waste sites. These files may be too large to fit onto one printed page unless you format them to do so.

Let’s create a situation where the file does not fit onto one printed page unless you format it first to do that.

Go back to the file you just created, Random33, and enter the name Jennifer into cell A50.

If you printed this file now, the name Jennifer would appear on a second page, because cell A50 lies beyond the bottom of the first printed page.

To ensure all information, including the name Jennifer, fits on a single printed page, adjust the page format by following these steps.

Page Layout (top left of the computer screen)

(Notice the “Scale to Fit” section in the center of your screen; see Fig. 2.10)

Click once on the down arrow to the right of 100% (the Scale box) to reduce the scale to 95%. Notice that the name Jennifer is still on a second page: it is below the horizontal dotted line that marks the bottom of the first printed page (see Fig. 2.11).

To reduce the size of the worksheet to 90% of its normal size, simply press the down arrow on the right once more to repeat the “scale change steps.” As shown in Fig 2.12, the “dotted lines” on your screen now appear below Jennifer’s name, indicating that all information, including her name, is formatted to fit on a single printed page.

Save the file as: Random34

Print the file. Does it all fit onto one page? It should (see Fig. 2.13).

Fig 2.12 Example of Scale Reduced to 90 % with “Jennifer” to be printed on the first page (note the dotted line below Jennifer on your screen)

Fig 2.11 Example of Scale Reduced to 95 % with “Jennifer” to be Printed on a Second Page

End-of-Chapter Practice Problems

1. Suppose that, as part of quality control in an electronics company, you want to select a random sample of 15 of 63 electrical resistors of a specific type and test them for performance and reliability.

(a) Set up a spreadsheet of frame numbers for these resistors with the heading: FRAME NUMBERS using the Home/Fill commands.

(b) To the right of these frame numbers, create a column of duplicate frame numbers with the heading: DUPLICATE FRAME NUMBERS.

(c) To the right of the duplicate frame numbers, use the =RAND() function to assign a random number to each duplicate frame number, and format these random numbers in number format, three decimal places.

(d) Sort the duplicate frame numbers and random numbers into a random order

(e) Print the result so that the spreadsheet fits onto one page

(f) Circle on your printout the I.D. number of the first 15 resistors that you would use in your research study

(g) Save the file as: RAND9

It's important to note that each individual solving this problem will produce a unique random order of resistor ID numbers due to Excel's RAND() function, which generates different random numbers with each use Consequently, the sequence of random numbers provided in this Excel Guide will differ from the random sequence you create, which is completely normal and expected.

2. Suppose that you are a consultant who tests building materials for engineers who design suspension bridges. You have been given 114 samples of a new type of steel intended for future bridge construction, and you want to evaluate their tensile strength and material consistency by randomly selecting 10 of these samples for testing.

(a) Set up a spreadsheet of frame numbers for these steel samples with the heading: FRAME NO.

(b) Create a duplicate of the frame numbers in the next column with the heading: Duplicate frame no.

(c) Create a column to the right of the duplicate frame numbers with the heading: Random number, and use the =RAND() function to assign a random number to each frame number. Format this column so that each random number shows three decimal places.

2.5 End-of-Chapter Practice Problems 33

(d) Sort the duplicate frame numbers and random numbers into a random order.

(e) Print the result so that the spreadsheet fits onto one page.

(f) Circle on your printout the I.D. number of the first 10 steel samples that would be used in this research study.

(g) Save the file as: RANDOM6

3. Suppose that a chemical field researcher wants to take a random sample of 20 of the 75 toxic waste sites that a recent study identified around an abandoned commercial house paint plant. The researcher will conduct field tests to measure lead levels in the soil around the facility in order to assess the environmental impact of the plant's closure.

(a) Set up a spreadsheet of frame numbers for these sites with the heading: FRAME NUMBERS.

(b) Create a duplicate of the frame numbers in the next column with the heading: Duplicate Frame Numbers.

(c) Create a column to the right of the duplicate frame numbers with the heading: Random Number, and use the =RAND() function to assign a random number to each frame number. Format this column so that each random number shows three decimal places.

(d) Sort the duplicate frame numbers and random numbers into a random order.

(e) Print the result so that the spreadsheet fits onto one page.

(f) Circle on your printout the I.D. number of the first 20 sites that the field chemist should select for her study.

(g) Save the file as: RAND5

Confidence Interval About the Mean Using the TINV Function and Hypothesis Testing

This chapter focuses on two ideas: (1) finding the 95 % confidence interval about the mean, and (2) hypothesis testing.

Let’s talk about the confidence interval first.

Confidence Interval About the Mean

How to Estimate the Population Mean

Objective: To estimate the population mean, μ

The population mean is the average value of a specific characteristic within a target population. For instance, if we wanted to assess how much adults aged 25–44 like a new Ben & Jerry’s ice cream flavor, it would be impractical to survey every person in that age group across the U.S. because of the time and cost involved.

Instead of testing the entire population, we can estimate the population mean by analyzing a sample of individuals. This approach saves time and resources, and it falls under "inferential statistics," because we use the sample mean to infer the population mean.

T.J Quirk et al., Excel 2013 for Physical Sciences Statistics,

When we test a sample, we obtain three key statistics: the sample size (n), the sample mean (X̄), and the standard deviation (STDEV). We use these to estimate the population mean with a method known as the "confidence interval about the mean."

Estimating the Lower Limit and the Upper Limit of the 95 % Confidence Interval About the Mean

The theory behind this test is not covered in this book; you can explore it in reputable statistics textbooks such as McKillup and Dyar (2010) or Ledolter and Hogg (2010). Here are the basic ideas you need in order to apply it.

We assume that the population mean lies somewhere in an interval that has a lower limit and an upper limit. In this book, we want to be 95 % confident that the population mean falls within this interval. For example, a result might be stated as follows:

“We are 95 % confident that the population mean in miles per gallon (mpg) for the Chevy Impala automobile is between 26.92 miles per gallon and 29.42 miles per gallon.”

If Chevrolet wants to claim that the Impala gets 28 miles per gallon (mpg) as part of promoting its lower environmental impact, it can do so, because 28 mpg falls within this 95 % confidence interval. We do not know the exact population mean; we only know, with 95 % confidence, that it lies between 26.92 mpg and 29.42 mpg, so a claim of 28 mpg is consistent with the data.

But we are only 95 % confident that the population mean is inside this interval, and 5 % of the time we will be wrong in assuming that the population mean is inside it.

In scientific research, we typically aim to be 95 % confident in our conclusions, although this level is an arbitrary convention. We could choose other confidence levels, such as 80 %, 90 %, or 99 %, but this book consistently uses 95 %. That way, there is never any doubt about which level of confidence a problem requires.

So how do we find the 95 % confidence interval about the mean for our data?

In words, we will find this interval this way:

First, determine the sample mean (X̄). To find the upper limit, add 1.96 times the standard error of the mean (s.e.) to the sample mean. To find the lower limit, subtract 1.96 times the standard error of the mean from the sample mean.

36 3 Confidence Interval About the Mean Using the TINV Function and Hypothesis

The standard error of the mean (s.e.) is calculated by dividing the sample's standard deviation (STDEV) by the square root of the sample size (n).

In mathematical terms, the formula for the 95 % confidence interval about the mean is:

$$\bar{X} \pm 1.96\,\text{s.e.}$$

The ± sign means that we add 1.96 s.e. to the mean to get the upper limit of the interval and subtract 1.96 s.e. from the mean to get the lower limit. The term "1.96 s.e." means 1.96 times the standard error of the mean.

Note: We will explain shortly where the number 1.96 came from.

Let’s try a simple example to illustrate this formula.

Estimating the Confidence Interval for the Chevy Impala

In a study of the carbon footprint of the Chevy Impala, 49 owners recorded their mileage and fuel consumption for two tanks of gas. The sample had a mean fuel efficiency of 27.83 miles per gallon (mpg) and a standard deviation of 3.01 mpg. The standard error of the mean (s.e.) is therefore 3.01/√49 = 3.01/7 = 0.43.

The 95 % confidence interval for these data would be:

$$27.83 \pm 1.96\,(0.43)$$

The upper limit of this confidence interval uses the plus sign of the ± sign in the formula. Therefore, the upper limit would be:

$$27.83 + 1.96\,(0.43) = 27.83 + 0.84 = 28.67 \text{ mpg}$$

Similarly, the lower limit of this confidence interval uses the minus sign of the ± sign in the formula. Therefore, the lower limit would be:

$$27.83 - 1.96\,(0.43) = 27.83 - 0.84 = 26.99 \text{ mpg}$$
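The Chevy Impala arithmetic can be checked with a few lines of Python. This is a sketch under the book's own simplification that 1.96 applies when the sample has more than 40 cars; the function name `large_sample_ci` is illustrative only.

```python
import math

def large_sample_ci(mean: float, stdev: float, n: int, z: float = 1.96) -> tuple:
    """Confidence interval about the mean as mean +/- z * s.e. (the book uses z = 1.96 for n > 40)."""
    se = stdev / math.sqrt(n)   # standard error of the mean: STDEV / sqrt(n)
    return mean - z * se, mean + z * se

lower, upper = large_sample_ci(27.83, 3.01, 49)
print(f"{lower:.2f} to {upper:.2f}")   # 26.99 to 28.67
```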

3.1 Confidence Interval About the Mean 37

The result of our part of the ongoing research study would, therefore, be the following:

“We are 95 % confident that the population mean for the Chevy Impala is somewhere between 26.99 mpg and 28.67 mpg.”

Therefore, if Chevrolet wants to promote the Impala's lower environmental impact by claiming that it gets 28 mpg, it can do so, because 28 mpg is inside this 95 % confidence interval for the population mean.

You are probably asking yourself: “Where did that 1.96 in the formula come from?”

Where Did the Number “1.96” Come From?

A detailed mathematical answer to that question is beyond the scope of this book, but here is the basic idea.

We assume that the population data follow a "normal distribution": if we could test every individual or property in the population, the data would form a "normal curve." This curve resembles the shape of the Liberty Bell located in front of Independence Hall in Philadelphia, Pennsylvania, and it is symmetric: if you divide it down the middle and fold it over, the two halves match exactly.

Integral calculus is not the focus of this book, but the idea is to find the lower and upper limits of the normal curve that enclose 95 % of the area beneath it. For research studies with more than 40 participants, these limits are the sample mean plus or minus 1.96 times the standard error of the mean (s.e.), and they give the boundaries of our confidence interval. For more on this concept, see a reputable statistics book such as McKillup and Dyar (2010).

The value of 1.96 will vary if we aim for a confidence level other than 95%, provided our research study includes more than 40 participants.

1 If we wanted to be 80 % confident of our results, this number would be 1.282.

2 If we wanted to be 90 % confident of our results, this number would be 1.645.

3 If we wanted to be 99 % confident of our results, this number would be 2.576.
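The four numbers above are points on the standard normal curve that leave equal areas in the two tails. A minimal sketch using Python's standard `statistics.NormalDist` reproduces them; `z_critical` is a hypothetical helper name, not something from the book.

```python
from statistics import NormalDist

def z_critical(confidence: float) -> float:
    """Two-sided critical value of the standard normal curve for a confidence level."""
    # (1 - confidence) / 2 of the area sits in each tail, so look up the point
    # that has (1 + confidence) / 2 of the area to its left.
    return NormalDist().inv_cdf((1 + confidence) / 2)

for level in (0.80, 0.90, 0.95, 0.99):
    print(f"{level:.0%}: {z_critical(level):.3f}")
# 80%: 1.282
# 90%: 1.645
# 95%: 1.960
# 99%: 2.576
```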

In this book, we aim for a 95% confidence level in our research results, which is why we will consistently use the value of 1.96 for studies involving more than 40 participants.

You might be wondering whether 1.96 is always the value used in the confidence interval about the mean. The answer is no, and the next section explains why.


Finding the Value for t in the Confidence Interval Formula

Objective: To find the value for t in the confidence interval formula

The correct formula for the confidence interval about the mean for different sample sizes is the following:

$$\bar{X} \pm t\,\text{s.e.}$$

To calculate the 95 % confidence interval, first determine the sample mean (X̄). The upper limit is the sample mean plus t times the standard error of the mean (s.e.), and the lower limit is the sample mean minus t times the standard error. To find the appropriate value of t, use the t-table in Appendix E of this book.

Objective: To find the value of t in the t-table in Appendix E

Before we get into an explanation of what is meant by “the value of t,” let’s give you practice in finding the value of t by using the t-table in Appendix E.

Keep your finger on Appendix E as we explain how you need to “read” that table.

For the “confidence interval about the mean” test used in this chapter, start with the first column on the left of Appendix E, labeled “sample size n,” to find the critical value of t for your research study.

To determine the t-value for your research study, locate the sample size in the first column of the table, then move to the right to find the corresponding value in the "critical t column," which is used for the 95% confidence interval about the mean For instance, if your study includes 14 participants, the t-value is 2.160.

If you have 26 people in your research study, the value of t is 2.060.

If you have more than 40 people in your research study, the value of t is always 1.96.

The "critical t column" in Appendix E gives the value of t needed to be 95 % confident in your statistical results. Because this book assumes that you want 95 % confidence in your statistical tests, the t value from the t-table in Appendix E is the one to use when you calculate the 95 % confidence interval about the mean.

To calculate the confidence interval about the mean in Excel, you also need this value of t for your sample size. Excel's TINV function provides it, and the rest of the calculation uses the sample mean and the standard error of the mean.


Using Excel’s TINV Function to Find the Confidence Interval About the Mean

Objective: To use the TINV function in Excel to find the confidence interval about the mean

When you use Excel, the formulas for finding the confidence interval are:

Lower limit: =X̄-TINV(1-.95,n-1)*s.e. (no spaces between these symbols) (3.3)

Upper limit: =X̄+TINV(1-.95,n-1)*s.e. (no spaces between these symbols) (3.4)

In Excel formulas, the * symbol means multiplication, the “times” of mathematical language. As mentioned in Chapter 1, n denotes the sample size, so n-1 is one less than the sample size.

In Chapter 1, we learned that the standard error of the mean (s.e.) is the standard deviation (STDEV) divided by the square root of the sample size (n). To illustrate these formulas, we will now use Excel to compute the 95 % confidence interval about the mean for a sample problem.
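Excel's TINV(probability, deg_freedom) returns the two-tailed inverse of Student's t distribution, so TINV(1-.95,n-1) is the same number the Appendix E t-table lists for a sample of size n. The sketch below approximates that number using only the Python standard library, by integrating the t density numerically (Simpson's rule) and bisecting. It illustrates the idea under those assumptions; it is not Excel's own algorithm, and `t_pdf`, `area_0_to`, and `tinv` are hypothetical names.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def area_0_to(x: float, df: int, steps: int = 1000) -> float:
    """Area under the t density from 0 to x by Simpson's rule (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        total += weight * t_pdf(i * h, df)
    return total * h / 3

def tinv(probability: float, df: int) -> float:
    """Two-tailed inverse of Student's t, meant to mirror Excel's TINV(probability, df)."""
    target = (1 - probability) / 2   # by symmetry, area between 0 and t
    lo, hi = 0.0, 50.0               # assumes t < 50 (true for probability 0.05, df >= 1)
    for _ in range(50):              # bisection: halve the bracket each pass
        mid = (lo + hi) / 2
        if area_0_to(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Same arguments as formulas (3.3) and (3.4): TINV(1-.95, n-1)
for n in (14, 25, 26):
    print(n, round(tinv(1 - 0.95, n - 1), 3))
```

Under these assumptions, the results for samples of 14 and 26 come out near 2.160 and 2.060, the Appendix E values quoted earlier, and the sample of 25 used in the Chevy Impala spreadsheet gives about 2.064.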

Let’s suppose that General Motors wanted to claim that its Chevy Impala achieves 28 miles per gallon (mpg) Let’s call 28 mpg the “reference value” for this car.

Suppose that you work for Ford Motor Co. and want to check whether this claim holds up against research evidence. You decide to collect data and analyze it with a two-sided 95 % confidence interval about the mean.

Using Excel to Find the 95 % Confidence Interval

Objective: To analyze the data using a two-sided 95 % confidence interval about the mean

Suppose that 25 new Chevy Impala owners are chosen to participate in a study in which each of them keeps track of the miles per gallon (mpg) they achieve on two tanks of gas. Your research study produces the results given in Fig.3.1:

To analyze your data effectively, create a spreadsheet in Excel and calculate the sample size (n), mean, standard deviation (STDEV), and standard error of the mean (s.e.) using the specified cell references.

Enter the other mpg data in cells A7:A30

Now, highlight cells A6:A30 and format these numbers in number format (one decimal place). Center these numbers in Column A. Then, widen columns A and B by making both of them twice as wide as the original width of column A. Then, widen column C so that it is three times as wide as the original width of column A so that your table looks more professional.

Fig 3.1 Worksheet Data for Chevy Impala (Practical Example)


B26: Draw a picture below this confidence interval

B29: lower (right-align this word)

B30: limit (right-align this word)

C28: ‘- 28 - 28.17 - (note that you need to begin cell C28 with a single quotation mark (‘) to tell Excel that this is a label, and not a number)

D28: ‘- (note the single quotation mark)

E28: ‘29.42 (note the single quotation mark)

Fig 3.2 Example of Chevy Impala Format for the Confidence Interval About the Mean Labels


Now, align the labels underneath the picture of the confidence interval so that they look like Fig.3.3.

Next, name the range of data from A6:A30 as: miles

D7: Use Excel to find the sample size

D10: Use Excel to find the mean

D13: Use Excel to find the STDEV

D16: Use Excel to find the s.e.

Now, you need to find the lower limit and the upper limit of the 95 % confidence interval for this study.

We will use Excel’s TINV function to do this, and we will assume that you want to be 95 % confident of your results.

F21: =D10-TINV(1-.95,24)*D16

Fig 3.3 Example of Drawing a Picture of a Confidence Interval About the Mean Result

Note that this TINV formula uses 24 because 24 is one less than the sample size of 25 (i.e., 24 is n-1). Note also that D10 is the mean, while D16 is the standard error of the mean. The above formula gives the lower limit of the confidence interval, 26.92.

The upper limit of the confidence interval is 29.42. To make your spreadsheet easier to read, format the mean, standard deviation, standard error of the mean, and both limits of the confidence interval to two decimal places. If you print the spreadsheet in its current format, the lower limit of 26.92 and the upper limit of 29.42 may spill onto a second page.

To shrink the spreadsheet, use the "Scale to Fit" commands on the Page Layout tab, as explained in Chapter 2, Section 2.4. Set the scale to 95 % of its current size; the dotted page-break line should now show that the values 26.92 and 29.42 fit on a single printed page (see Fig 3.4).

Fig 3.4 Result of Using the TINV Function to Find the Confidence Interval About the Mean

Note that you have drawn a picture of the 95 % confidence interval beneath cell B26, including the lower limit, the upper limit, the mean, and the reference value of 28 mpg given in the claim that the company wants to make about the car’s miles per gallon performance.

Now, let’s write the conclusion to your research study on your spreadsheet:

C33: Since the reference value of 28 is inside

C34: the confidence interval, we accept that

C35: the Chevy Impala does get 28 mpg.

When formatting a spreadsheet, it's essential to present the conclusion across three separate lines rather than one long line This approach prevents two undesirable outcomes: first, reducing the page layout to fit everything on one page would result in an unreadably small font size, and second, printing without adjusting the layout could cause part of the conclusion to spill over onto a separate page, compromising the professional appearance of your spreadsheet.

The research study supports the claim that the Chevy Impala gets 28 miles per gallon; the sample mean in this study was 28.17 miles per gallon (see Fig 3.5). Save the resulting spreadsheet as: CHEVY7

Hypothesis Testing

Hypotheses Always Refer to the Population of Physical Properties That You Are Studying

The first step is to understand that our hypotheses always refer to the population of physical properties in a study.

Suppose that we wanted to investigate the brightness of the light bulbs used in a specific type of vehicle headlight. We would select a sample of these light bulbs and measure their brightness in lumens. These measurements would form our sample, and we would generalize our findings to all of the light bulbs used in that type of vehicle.

In our study, we focus on the population of light bulbs utilized in this type of vehicle, with the specific light bulbs selected for analysis referred to as the sample from this population.

Our sample sizes usually represent only a fraction of the total light bulbs, so we focus on how the findings from our sample can be effectively generalized to the larger population we aim to understand.

That is why our hypotheses always refer to the population, and never to the sample of physical properties in our study.

You will recall from Chap. 1 that we used the symbol X̄ to refer to the mean of the sample we use in our research study (see Sect. 1.1).

We will use the symbol μ (the Greek letter “mu”) to refer to the population mean.

In testing our hypotheses, we are trying to decide which one of two competing hypotheses about the population mean we should accept given our data set.

The Null Hypothesis and the Research (Alternative) Hypothesis

These two hypotheses are called the null hypothesis and the research hypothesis. Statistics textbooks typically refer to the null hypothesis with the notation: H0.

The research hypothesis is typically referred to with the notation: H1, and it is sometimes called the alternative hypothesis.

Let’s explain first what is meant by the null hypothesis and the research hypothesis:

(1) The null hypothesis is what we accept as true unless we have compelling evidence that it is not true.

(2) The research hypothesis is what we accept as true whenever we reject the null hypothesis as true.

The American legal principle of "innocent until proven guilty" is a useful analogy: the null hypothesis is that the defendant is innocent, and the research hypothesis is that the defendant is guilty.

In the same spirit, Missouri's state slogan "Show me" reflects its residents' refusal to accept a claim without proof. Likewise, the null hypothesis stands unless the evidence shows otherwise.

In hypothesis testing, the goal is to decide which of the two competing statements, the null hypothesis or the research hypothesis, should be accepted as true, since both cannot be true at once. Statistical formulas are used to make this decision.

In scientific research, rating scales are commonly used to measure people's attitudes toward a company, its products, or their intention to buy. These scales are typically 5-point, 7-point, or 10-point scales, although other scale lengths can also be used.

3.2.2.1 Determining the Null Hypothesis and the Research Hypothesis When Rating Scales Are Used

Although rating scales are used less often in the physical sciences, they are a convenient setting for practicing how to state the null hypothesis and the research hypothesis. The following examples show how to carry out hypothesis testing when rating scales are used.

Here is a typical example of a 7-point scale in science education for parents of 10th grade pupils at the end of a school year (see Fig.3.6):


So, how do we decide what to use as the null hypothesis and the research hypothesis whenever rating scales are used?

Objective: To decide on the null hypothesis and the research hypothesis whenever rating scales are used.

In order to make this determination, we will use a simple rule:

Rule: Whenever rating scales are used, we will use the “middle” of the scale as the null hypothesis and the research hypothesis.

In the above example, since 4 is the number in the middle of the scale (i.e., three numbers are below it, and three numbers are above it), our hypotheses become:

H0: μ = 4

H1: μ ≠ 4

If our statistical test for this attitude scale item indicates that the population mean is close to 4, we accept the null hypothesis and conclude that the parents of 10th grade pupils are neither satisfied nor dissatisfied with the quality of the science program at their children's school.

If our statistical test reveals a significant difference between the population mean and the value of 4, we will reject the null hypothesis and accept the research hypothesis.

If the sample mean is significantly greater than 4, we conclude that parents of 10th grade pupils were significantly satisfied with the quality of the science program at their child's school.

If the sample mean is significantly less than 4, we conclude that parents of 10th grade pupils were significantly dissatisfied with the quality of the science program at their child's school.

Fig 3.6 Example of a Rating Scale Item for Parents of 10th Graders (Practical Example)

Both hypotheses cannot be true at the same time. We accept one of the hypotheses as “true” based on the data set in our research study, and the other one as “not true” based on our data set.

A research scientist's primary responsibility is to determine whether to accept the null hypothesis or the research hypothesis as valid based on the data collected in their study.

Let’s try some examples of rating scales so that you can practice figuring out what the null hypothesis and the research hypothesis are for each rating scale.

In the spaces in Fig.3.7, write in the null hypothesis and the research hypothesis for the rating scales:

Here are the answers to these three questions:

1 The null hypothesis is μ = 3, and the research hypothesis is μ ≠ 3 on this 5-point scale (i.e., the “middle” of the scale is 3).

Fig 3.7 Examples of Rating Scales for Determining the Null Hypothesis and the Research Hypothesis


2 The null hypothesis is μ = 4, and the research hypothesis is μ ≠ 4 on this 7-point scale (i.e., the “middle” of the scale is 4).

3 The null hypothesis is μ = 5.5, and the research hypothesis is μ ≠ 5.5 on this 10-point scale (i.e., the “middle” of the scale is 5.5 since there are 5 numbers below 5.5 and 5 numbers above 5.5).

Webster University, in St. Louis, Missouri, uses a Course Feedback form for student evaluations at the end of its courses. The form includes 12 rating items that assess course planning, organization, and instructor communication. After the course ends, the ratings are summarized and given to instructors for review. Each item is rated on a 4-point scale.

For these items, the null hypothesis is μ = 2.5 and the research hypothesis is μ ≠ 2.5, because two ratings lie below 2.5 and two ratings lie above it on the scale. Note that the scale is scored so that, as in golf, lower scores indicate better performance.
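The "middle of the scale" rule is simple arithmetic: average the lowest and highest points. The sketch below assumes every scale starts at 1 (true for the examples here); `scale_midpoint` is a hypothetical helper. Which end of the scale counts as "better" does not change the midpoint.

```python
def scale_midpoint(lowest: int, highest: int) -> float:
    """Middle of a rating scale, used as the reference value in H0: mu = midpoint."""
    return (lowest + highest) / 2

for low, high in [(1, 5), (1, 7), (1, 10), (1, 4)]:
    mid = scale_midpoint(low, high)
    print(f"{high - low + 1}-point scale: H0: mu = {mid:g}, H1: mu != {mid:g}")
# 5-point scale: H0: mu = 3, H1: mu != 3
# 7-point scale: H0: mu = 4, H1: mu != 4
# 10-point scale: H0: mu = 5.5, H1: mu != 5.5
# 4-point scale: H0: mu = 2.5, H1: mu != 2.5
```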

Now, let’s discuss the 7 STEPS of hypothesis testing for using the confidence interval about the mean.

The 7 Steps for Hypothesis-Testing Using the Confidence Interval About the Mean

Objective: To learn the 7 steps of hypothesis-testing using the confidence interval about the mean

There are seven basic steps of hypothesis-testing for this statistical test.

3.2.3.1 STEP 1: State the Null Hypothesis and the Research Hypothesis

When rating scales are used, state the hypotheses in terms of the middle of the scale. For instance, on a 7-point scale ranging from 1 (poor) to 7 (excellent), the hypotheses would be H0: μ = 4 and H1: μ ≠ 4.

3.2.3.2 STEP 2: Select the Appropriate Statistical Test

In this chapter we are studying the confidence interval about the mean, and so we will select that test.

3.2.3.3 STEP 3: Calculate the Formula for the Statistical Test

You will recall (see Sect. 3.1.5) that the formula for the confidence interval about the mean is:

$$\bar{X} \pm t\,\text{s.e.}$$

Earlier in this chapter, we explained how to use Excel to find the confidence interval about the mean. The steps are:

1 Use Excel’s =COUNT function to find the sample size.

2 Use Excel’s =AVERAGE function to find the sample mean, X̄.

3 Use Excel’s =STDEV function to find the standard deviation, STDEV.

4 Find the standard error of the mean (s.e.) by dividing the standard deviation (STDEV) by the square root of the sample size, n.

5 Use Excel’s TINV function to find the lower limit of the confidence interval.

6 Use Excel’s TINV function to find the upper limit of the confidence interval.
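The six steps above translate almost line for line into Python. This is a sketch only: `statistics.stdev` plays the role of Excel's =STDEV (both divide by n - 1), the t value is passed in rather than computed (take it from Appendix E or from TINV(1-.95,n-1)), the function name is hypothetical, and the five-value data set is invented purely to show the arithmetic.

```python
import math
import statistics

def confidence_interval(data: list, t_value: float) -> dict:
    """Follow the six steps: COUNT, AVERAGE, STDEV, s.e., lower limit, upper limit."""
    n = len(data)                          # 1. =COUNT
    mean = statistics.mean(data)           # 2. =AVERAGE
    stdev = statistics.stdev(data)         # 3. =STDEV (sample standard deviation)
    se = stdev / math.sqrt(n)              # 4. s.e. = STDEV / sqrt(n)
    lower = mean - t_value * se            # 5. lower limit, as in formula (3.3)
    upper = mean + t_value * se            # 6. upper limit, as in formula (3.4)
    return {"n": n, "mean": mean, "stdev": stdev, "se": se,
            "lower": lower, "upper": upper}

# Hypothetical data (not from the book): n = 5, so t uses n - 1 = 4 degrees of freedom;
# 2.776 is the two-sided 95 % t value for 4 degrees of freedom.
result = confidence_interval([1.0, 2.0, 3.0, 4.0, 5.0], t_value=2.776)
print(f"{result['lower']:.3f} to {result['upper']:.3f}")
```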

3.2.3.4 STEP 4: Draw a Picture of the Confidence Interval About the Mean, Including the Mean, the Lower Limit of the Interval, the Upper Limit of the Interval, and the Reference Value Given in the Null Hypothesis, H0

We will explain Step 4 later in the chapter.

3.2.3.5 STEP 5: Decide on a Decision Rule

(a) If the reference value is inside the confidence interval, accept the null hypothesis, H0.

(b) If the reference value is outside the confidence interval, reject the null hypothesis, H0, and accept the research hypothesis, H1.
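The STEP 5 decision rule is a single comparison. In this minimal sketch, `decide` is a hypothetical helper, and treating a reference value that falls exactly on a limit as inside the interval is an assumption the text does not address. The first call uses the Chevy Impala limits from earlier in the chapter; the second uses an interval lying entirely below its reference value.

```python
def decide(reference: float, lower: float, upper: float) -> str:
    """Decision rule for the confidence interval about the mean."""
    if lower <= reference <= upper:   # assumption: a value on a limit counts as inside
        return "accept H0"
    return "reject H0 and accept H1"

print(decide(28, 26.92, 29.42))   # accept H0
print(decide(34, 30, 32))         # reject H0 and accept H1
```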


3.2.3.6 STEP 6: State the Result of Your Statistical Test

When utilizing the confidence interval for the mean, there are two potential outcomes, but only one can be deemed "true." Therefore, your findings will fall into one of these categories.

Either: Since the reference value is inside the confidence interval, we accept the null hypothesis, H0.

Or: Since the reference value is outside the confidence interval, we reject the null hypothesis, H0, and accept the research hypothesis, H1.

3.2.3.7 STEP 7: State the Conclusion of Your Statistical Test in Plain English!

Summarizing the results of your statistical test in clear, concise language can be challenging, especially when the reader, such as your boss, has no background in statistics. This book will give you ample practice with this crucial skill.

Objective: To write the conclusion of the confidence interval about the mean test

Let’s set some basic rules for stating the conclusion of a hypothesis test.

Rule #1: Whenever you reject H0 and accept H1, you must use the word “significantly” in the conclusion to alert the reader that this test found an important result.

Rule #2: Create an outline in words of the “key terms” you want to include in your conclusion so that you do not forget to include some of them.

Rule #3: Write the conclusion in plain English so that the reader can understand it even if that reader has never taken a statistics course.

Let’s practice these rules using the Chevy Impala Excel spreadsheet that you created earlier in this chapter, but first we need to state the hypotheses for that car.

If General Motors wants to claim on a billboard ad that the Chevy Impala gets 28 miles per gallon, the hypotheses would be:

H0: μ = 28 mpg

H1: μ ≠ 28 mpg

Since the reference value of 28 mpg falls inside the 95 % confidence interval for these data (26.92 mpg to 29.42 mpg), we accept the null hypothesis, H0, for the Chevy Impala: the data are consistent with the claim that the car gets 28 mpg.

Objective: To state the result when you accept H0

Result: Since the reference value of 28 mpg is inside the confidence interval, we accept the null hypothesis, H0.

Let’s try our three rules now:

Objective: To write the conclusion when you accept H0

Rule #1: Since the reference value of 28 mpg is inside the confidence interval, we cannot use the word “significantly” in the conclusion. This holds for every problem in this chapter: whenever the reference value falls inside the confidence interval, the conclusion must not use the word “significantly.”

Rule #2: The key terms in the conclusion would be: Chevy Impala; 28 mpg

Rule #3: The Chevy Impala did get 28 mpg.

Writing a conclusion when you accept the null hypothesis (H0) is straightforward, because the conclusion simply restates the null hypothesis. Writing a conclusion after you reject H0 and accept the research hypothesis (H1) takes more care, so we will practice it with three case examples.

Objective: To write the result and conclusion when you reject H0

CASE #1: Suppose that an ad in The Wall Street Journal claimed that the Honda Accord Sedan gets 34 miles per gallon. The hypotheses would be:

H0: μ = 34 mpg

H1: μ ≠ 34 mpg

Suppose that your research yields the following confidence interval:

30 (lower limit) --- 31 (mean) --- 32 (upper limit) ------ 34 (reference value)

Result: Since the reference value is outside the confidence interval, we reject the null hypothesis and accept the research hypothesis.

The three rules for stating the conclusion would be:

Rule #1: We must include the word “significantly” since the reference value of 34 is outside the confidence interval.


Rule #2: The key terms would be:

Honda Accord Sedan; significantly; either “more than” or “less than”; and probably closer to

Rule #3: The Honda Accord Sedan got significantly less than 34 mpg, and it was probably closer to 31 mpg.

Note that the conclusion says “less than 34 mpg” because the sample mean was only 31 mpg. Note also that it is not enough to reject the null hypothesis and say only “significantly less than 34 mpg,” because that does not tell the reader how much less than 34 mpg the sample mean was. To make the conclusion clear, you need to add “probably closer to 31 mpg,” since the sample mean was 31 mpg.
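The three rules for a confidence-interval conclusion can be sketched as a small Python function. This is a hypothetical helper (the name `ci_conclusion` and its exact wording are assumptions, not code from this book): it takes the lower limit, sample mean, upper limit, and reference value, then applies Rule #1 (inside or outside the interval), Rule #2 (“less than” or “more than”), and Rule #3 (“probably closer to” the sample mean).

```python
def ci_conclusion(lower: float, mean: float, upper: float,
                  ref: float, units: str) -> tuple[str, str]:
    """Apply the three rules to a 95% confidence interval (hypothetical helper)."""
    if lower <= ref <= upper:
        # Rule #1: reference value inside the interval, so no "significantly"
        return "accept the null hypothesis", f"the mean was {ref} {units}"
    # Rule #1: reference value outside the interval, so "significantly" is required
    # Rule #2: the direction depends on which side of ref the sample mean lies
    direction = "less than" if mean < ref else "more than"
    # Rule #3: say where the true mean probably lies
    conclusion = (f"significantly {direction} {ref} {units}, "
                  f"and probably closer to {mean} {units}")
    return "reject the null hypothesis and accept the research hypothesis", conclusion


result, conclusion = ci_conclusion(30, 31, 32, 34, "mpg")
print(result)
print("The Honda Accord Sedan got " + conclusion + ".")
```

For the Honda Accord Sedan numbers, the conclusion string reads “significantly less than 34 mpg, and probably closer to 31 mpg”, which matches Rule #3 above.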

The density of a substance is the same regardless of how much of the substance you have, which makes density a useful tool for identifying minerals. Density is calculated by dividing the mass (g) of an object by its volume (cm³). For example, pure silver has a density of 10.49 g/cm³, so if a sample sold as “pure silver” has a different density, it cannot be pure silver. A different density may mean that impurities are mixed with the silver.

Suppose that a company needs to verify that a substance it acquired is pure silver. You take 50 random samples of the material, calculate the density of each sample, and use these densities to compute a confidence interval about the mean, which tells you whether the substance meets the criterion for pure silver.
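A minimal Python sketch of this process, assuming you have recorded the mass (g) and volume (cm³) of each sample: it converts each pair to a density and forms the 95% confidence interval as the mean plus or minus the critical t times the standard error. The critical t is passed in (for example, the value from Excel’s TINV function); the function names and the three sample measurements are hypothetical, for illustration only.

```python
import math
import statistics

def densities(masses_g, volumes_cm3):
    """Density = mass / volume for each sample, in g/cm^3."""
    return [m / v for m, v in zip(masses_g, volumes_cm3)]

def confidence_interval(values, t_crit):
    """Return (lower limit, mean, upper limit) for mean +/- t_crit * standard error."""
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)   # sample SD, like Excel's STDEV
    return mean - t_crit * se, mean, mean + t_crit * se

# Hypothetical masses (g) and volumes (cm^3) for three samples, for illustration only
d = densities([104.9, 52.2, 20.9], [10.0, 5.0, 2.0])
lower, mean, upper = confidence_interval(d, t_crit=4.303)   # critical t for n = 3 (df = 2)
print(f"lower = {lower:.2f}, mean = {mean:.2f}, upper = {upper:.2f}")
print("accept H0" if lower <= 10.49 <= upper else "reject H0")
```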

The hypotheses for this test would be:

H0: μ = 10.49 g/cm³ (the substance is pure silver)

H1: μ ≠ 10.49 g/cm³ (the substance is not pure silver)

Suppose that your analysis produced the following confidence interval for this test:

Lower limit: 10.41    Mean: 10.43    Upper limit: 10.45    Reference value: 10.49

Result: Since the reference value is outside the confidence interval, we reject the null hypothesis and accept the research hypothesis.

Rule #1: You must include the word “significantly” since the reference value is outside the confidence interval.

Rule #2: The key terms would be:

– less or greater (depending on your result)

– either pure silver or not pure silver (since the result is significant)

Rule #3: The observed density of the substance tested was significantly less than the known density of silver. Therefore, the tested substance was not pure silver.

Note that you need to use the word “less” since the sample mean of 10.43 g/cm³ was less than the reference value of 10.49 g/cm³.

As the quality control supervisor in a machine shop that makes high-quality steel rods for construction, you have been asked to check the accuracy of a cutting machine that was recently repaired. The machine, which has a history of cutting problems, has been set to cut rods to a length of 5.5 centimeters (cm). To evaluate its performance, you use Excel to analyze a random sample of test rods and decide whether the machine is now cutting rods to this specification.

Suppose that your research produced the following confidence interval for this machine for your test:

Result: Since the reference value is outside the confidence interval, we reject the null hypothesis and accept the research hypothesis.


The three rules for stating the conclusion would be:

Rule #1: You must include the word “significantly” since the reference value is outside the confidence interval.

Rule #2: The key terms would be:

– longer or shorter (depending on the result of your test)

Rule #3: The test rods were cut significantly longer than the 5.5 cm the cutting machine was set for, and were probably closer to 5.8 cm.

Note that the conclusion uses the word “longer” because the sample mean of 5.8 cm was greater than the reference value of 5.5 cm. Although a phrase such as “significantly longer” is not common in everyday English, it lets you state the result of a statistical test with authority.

If you want a more detailed explanation of the confidence interval about the mean, see Townend (2002).

At the end of this chapter, you will find three practice problems to help you practice writing the conclusions of your research. This book also includes many examples to help you write clear, precise conclusions for your research findings.

Alternative Ways to Summarize the Result of a Hypothesis Test

Different Ways to Accept the Null Hypothesis

The following quotes are typical of the language used in statistics and research books when the null hypothesis is accepted:

“The null hypothesis is not rejected.” (Black 2010, p. 310)

“The null hypothesis cannot be rejected.” (McDaniel and Gates 2010, p. 545)

“The null hypothesis claims that there is no difference between groups.” (Salkind 2010, p. 193)

“The difference is not statistically significant.” (McDaniel and Gates 2010, p. 545)

“…the obtained value is not extreme enough for us to say that the difference between Groups 1 and 2 occurred by anything other than chance.” (Salkind 2010, p. 225)

“If we do not reject the null hypothesis, we conclude that there is not enough statistical evidence to infer that the alternative (hypothesis) is true.” (Keller 2009, p. 358)

“The research hypothesis is not supported.” (Zikmund and Babin 2010, p. 552)

Different Ways to Reject the Null Hypothesis

The following quotes are typical of the quotes used in statistics and research books when the null hypothesis is rejected:

“The null hypothesis is rejected.” (McDaniel and Gates 2010, p. 546)

“If we reject the null hypothesis, we conclude that there is enough statistical evidence to infer that the alternative hypothesis is true.” (Keller 2009, p. 358)

“If the test statistic’s value is inconsistent with the null hypothesis, we reject the null hypothesis and infer that the alternative hypothesis is true.” (Keller 2009, p. 348)

“Because the observed value is greater than the critical value, the decision is to reject the null hypothesis.” (Black 2010, p. 359)

“If the obtained value is more extreme than the critical value, the null hypothesis cannot be accepted.” (Salkind 2010, p. 243)

“The critical t-value must be surpassed by the observed t-value if the hypothesis test is to be statistically significant.” (Zikmund and Babin 2010, p. 567)

“The calculated test statistic exceeds the upper boundary and falls into this rejection region. The null hypothesis is rejected.” (Weiers 2011, p. 330)

These quotes are typical of the language that statisticians and professors use to interpret the results of hypothesis tests. You may therefore be asked to summarize the result of a statistical test using wording different from the wording in this book.


End-of-Chapter Practice Problems

1. Suppose that you are an engineer working for a major tire manufacturer and that you have been asked to determine, in a laboratory using specialized machines, the expected lifetime of a new passenger sedan tire made from innovative synthetic materials. The tire is advertised to last for 40,000 miles. To check this claim, a random sample of these tires was tested for expected lifetime, and you will use Excel to analyze the results given in Fig 3.8.

(a) Use Excel to find the sample size, mean, standard deviation, and standard error of the mean for these lifetime figures. Label your answers, and use number format (two decimal places) for the mean, standard deviation, and standard error of the mean.

(b) Enter the null hypothesis and the research hypothesis onto your spreadsheet.

(c) Use Excel’s TINV function to find the 95% confidence interval about the mean for these figures. Label your answers. Use number format (two decimal places).

Fig 3.8 Worksheet Data for Chap 3: Practice Problem #1

3.4 End-of-Chapter Practice Problems 59

(e) Enter your conclusion in plain English onto your spreadsheet.

(f) Print the final spreadsheet so that it fits onto one page (see the objectives in Chap. 2, Section 2.4).

(g) On your printout, draw by hand a picture of the 95% confidence interval.

(h) Save the final spreadsheet as: lifetime3
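If you want to double-check your Excel answers for part (a) outside Excel, the sketch below gives rough Python counterparts of =COUNT, =AVERAGE, =STDEV, and the Chapter 1 standard error formula. It assumes the lifetime figures have been typed into a Python list; the function name `describe` and the list values are hypothetical placeholders, not the data in Fig 3.8. Like Excel’s STDEV, `statistics.stdev` computes the sample standard deviation (dividing by n − 1).

```python
import math
import statistics

def describe(values, decimals=2):
    """Sample size, mean, SD, and standard error, rounded like the worksheet."""
    n = len(values)                    # =COUNT(range)
    mean = statistics.mean(values)     # =AVERAGE(range)
    sd = statistics.stdev(values)      # =STDEV(range): sample SD, n - 1 divisor
    se = sd / math.sqrt(n)             # =STDEV(range)/SQRT(COUNT(range))
    return {"n": n, "mean": round(mean, decimals),
            "stdev": round(sd, decimals), "s.e.": round(se, decimals)}

# Placeholder lifetimes in miles; replace them with the figures from Fig 3.8
print(describe([41000, 39500, 40200, 38800]))
```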

2. As an electrical engineer, you have been asked to determine whether a new type of light bulb, manufactured with an innovative fusing process, achieves its expected lifetime of 1,300 hours under controlled laboratory conditions. You have tested a small random sample of these light bulbs and will analyze the hypothetical data in Fig 3.9.

Create an Excel spreadsheet with these data.

(a) Use Excel to find the sample size, mean, standard deviation, and standard error of the mean for these data. Label your answers, and use one decimal place for the mean, standard deviation, and standard error of the mean.

(b) Enter the null hypothesis and the research hypothesis for this item on your spreadsheet.

Fig 3.9 Worksheet Data for Chap 3: Practice Problem #2


(c) Use Excel’s TINV function to find the 95% confidence interval about the mean for these data. Label your answers, and give the lower and upper limits of the confidence interval to one decimal place.

(d) Enter the result of the test on your spreadsheet.

(e) Enter the conclusion of the test in plain English on your spreadsheet.

(f) Print the final spreadsheet so that it fits onto one page (see the objectives in Chap. 2, Section 2.4).

(g) Draw a picture of the confidence interval, including the reference value, onto your spreadsheet.

(h) Save the final spreadsheet as: lightbulb3

3. Welch’s sells a small can of “100% Grape Juice” labeled as containing 5.5 fluid ounces (163 milliliters) of juice. To check this claim, a random sample of today’s production is taken, and the amount of grape juice in each can is measured. Analyzing these data will show whether the cans contain the advertised 163 ml of grape juice.

Fig 3.10 Worksheet Data for Chap 3: Practice Problem #3


Create an Excel spreadsheet with these data.

(a) To the right of the table, use Excel to find the sample size, mean, standard deviation, and standard error of the mean for these data. Label your answers, and use two decimal places for the mean, standard deviation, and standard error of the mean.

(b) Enter the null hypothesis and the research hypothesis for this problem onto your spreadsheet.

(c) Use Excel’s TINV function to find the 95% confidence interval about the mean for these data. Label your answers, and give the lower and upper limits of the confidence interval to two decimal places.

(d) Enter the result of the test on your spreadsheet.

(e) Enter the conclusion of the test in plain English on your spreadsheet.

(f) Print the final spreadsheet so that it fits onto one page (see the objectives in Chap. 2, Section 2.4).

(g) Draw a picture of the confidence interval, including the reference value, onto your spreadsheet.

(h) Save the final spreadsheet as: grape3

Black K. Business statistics: for contemporary decision making. 6th ed. Hoboken: John Wiley & Sons, Inc.; 2010.

Keller G. Statistics for management and economics. 8th ed. Mason: South-Western Cengage Learning; 2009.

Ledolter R, Hogg R. Applied statistics for engineers and physical scientists. 3rd ed. Upper Saddle River: Pearson Prentice Hall; 2010.

McDaniel C, Gates R. Marketing research. 8th ed. Hoboken: John Wiley & Sons, Inc.; 2010.

McKillup S, Dyar M. Geostatistics explained: an introductory guide for earth scientists. Cambridge: Cambridge University Press; 2010.

Salkind N. Statistics for people who (think they) hate statistics. 2nd Excel 2007 ed. Los Angeles: Sage Publications; 2010.

Townend J. Practical statistics for environmental and biological scientists. Hoboken: John Wiley

Weiers R. Introduction to business statistics. 7th ed. Mason: South-Western Cengage Learning; 2011.

Zikmund W, Babin B. Exploring marketing research. 10th ed. Mason: South-Western Cengage Learning; 2010.


One-Group t-Test for the Mean

In this chapter, you will learn how to use one of the most popular and most helpful statistical tests in science research: the one-group t-test for the mean.

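The formula for the one-group t-test is as follows:

$$t = \frac{\bar{X} - \mu}{\mathrm{s.e.}} \qquad \text{where} \qquad \mathrm{s.e.} = \frac{S}{\sqrt{n}}$$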

To calculate t, subtract the population mean (μ) from the sample mean (X̄), and divide the result by the standard error of the mean (s.e.), which is the standard deviation (S) divided by the square root of the sample size (n). For further insights into this statistical test, refer to Schuenemeyer and Drew (2011).

Let’s discuss the 7 STEPS of hypothesis testing using the one-group t-test so that you can understand how this test is used.

The 7 STEPS for Hypothesis-Testing Using the One-Group t-Test

STEP 1: State the Null Hypothesis

When your research uses a numerical rating scale, the hypotheses are stated in terms of the middle number of the scale. For instance, on a 7-point scale ranging from 1 (poor) to 7 (excellent), the middle number is 4, so the hypotheses would be:

H0: μ = 4

H1: μ ≠ 4

As a second example, suppose that you worked for Honda Motor Company and that you wanted to place a magazine ad that claimed that the new Honda Fit got 35 miles per gallon (mpg). The hypotheses for testing this claim on actual data would be:

H0: μ = 35 mpg

H1: μ ≠ 35 mpg

STEP 2: Select the Appropriate Statistical Test

In this chapter we will be studying the one-group t-test, and so we will select that test.

4.1.3 STEP 3: Decide on a Decision Rule for the One-Group t-Test

(a) If the absolute value of t is less than the critical value of t, accept the null hypothesis.

(b) If the absolute value of t is greater than the critical value of t, reject the null hypothesis and accept the research hypothesis.

You are probably saying to yourself: “That sounds fine, but how do I find the absolute value of t?”

4.1.3.1 Finding the Absolute Value of a Number

To do that, we need another objective:

Objective: To find the absolute value of a number

64 4 One-Group t-Test for the Mean

The absolute value is a fundamental concept from high school algebra, representing any number as a positive value, regardless of its original sign.

For example, the absolute value of 2.35 is +2.35.

And the absolute value of minus 2.35 (i.e., −2.35) is also +2.35.
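The absolute value and the STEP 3 decision rule can be sketched in Python with the built-in `abs` function. The helper name `one_group_decision` is hypothetical, and because the text does not say what to do when |t| exactly equals the critical t, this sketch treats that tie as a rejection (an assumption).

```python
def one_group_decision(t_value: float, t_crit: float) -> str:
    """STEP 3 decision rule (hypothetical helper): compare |t| with the critical t."""
    if abs(t_value) < t_crit:      # abs() drops the sign: abs(-2.35) == 2.35
        return "accept the null hypothesis"
    # |t| >= t_crit; ties are treated as a rejection here (assumption)
    return "reject the null hypothesis and accept the research hypothesis"

print(abs(2.35), abs(-2.35))       # 2.35 2.35
print(one_group_decision(-2.35, 2.056))
```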

You will also need the t-table in Appendix E to perform the one-group t-test. We will explain how to use Appendix E to find the critical value of t in STEP 5.

STEP 4: Calculate the Formula for the One-Group t-Test

Objective: To learn how to use the formula for the one-group t-test

The formula for the one-group t-test is as follows:

$$t = \frac{\bar{X} - \mu}{\mathrm{s.e.}} \qquad \text{where} \qquad \mathrm{s.e.} = \frac{S}{\sqrt{n}}$$

This formula makes the following assumptions about the data (Foster et al. 1998):

– The data are independent: each person or event contributes its own unique score.

– The population of the data is normally distributed.

– The variance of the population is constant. (The standard deviation is the square root of the variance.)

To use this formula, you need to follow these steps:

1. Take the sample mean in your research study and subtract the population mean μ from it (remember that the population mean for a study involving numerical rating scales is the “middle” number in the scale).

2. Then take your answer from the above step, and divide your answer by the standard error of the mean for your research study (you will remember that you learned how to find the standard error of the mean in Chap. 1; to find the standard error of the mean, just take the standard deviation of your research study and divide it by the square root of n, where n is the number of people or events used in your research study).

3. The number you get after you complete the above step is the value for t that results when you use the formula stated above.
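The three steps above can be sketched in Python for a list of raw scores. The function name `one_group_t` and the seven ratings are hypothetical; as in Chap. 1, the standard error uses the sample standard deviation divided by the square root of n.

```python
import math
import statistics

def one_group_t(values, mu):
    """t = (sample mean - mu) / (s / sqrt(n)), following steps 1-3 above."""
    n = len(values)
    difference = statistics.mean(values) - mu       # Step 1
    se = statistics.stdev(values) / math.sqrt(n)    # standard error (Chap. 1)
    return difference / se                          # Steps 2 and 3

# Hypothetical ratings on a 7-point scale, whose "middle" number is 4
ratings = [5, 6, 4, 5, 7, 5, 6]
print(round(one_group_t(ratings, mu=4), 2))
```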

4.1 The 7 STEPS for Hypothesis-Testing Using the One-Group t-Test 65

STEP 5: Find the Critical Value of t in the t-Table in Appendix E

Objective: To find the critical value of t in the t-table in Appendix E

Before explaining "the critical value of t," let's practice locating it using the t-table found in Appendix E.

Keep your finger on Appendix E as we explain how you need to “read” that table.

In this chapter, the test referred to as the "one-group t-test" requires you to consult the first column in Appendix E, labeled "sample size n," to determine the critical value of t for your research study.

To determine the critical value of t, locate your sample size in the first column of the table Then, move horizontally to the right to find the corresponding critical t value, which applies to both the one-group t-test and the 95% confidence interval for the mean.

For example, if you have 27 people in your research study, the critical value of t is 2.056.

If you have 38 people in your research study, the critical value of t is 2.026.

If you have more than 40 people in your research study, the critical value of t is always 1.96.
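If Appendix E is not at hand, the critical values can be approximated in Python with the standard library alone. The sketch below mirrors what Excel’s TINV(probability, degrees of freedom) reports, the two-tailed critical value, using degrees of freedom equal to n − 1 for the one-group t-test. It is an approximation under stated assumptions (Simpson’s rule on the t density with 2000 steps, then 60 rounds of bisection), not Excel’s own algorithm, and the function names are hypothetical.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x) for x >= 0: symmetry gives 0.5, plus Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def tinv(probability: float, df: int) -> float:
    """Two-tailed critical t, the quantity Excel's TINV(probability, df) reports."""
    target = 1 - probability / 2    # e.g. 0.975 for probability = 0.05
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:   # widen the bracket until it holds the answer
        hi *= 2
    for _ in range(60):             # bisection on the bracket [lo, hi]
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for n in (27, 38):
    print(n, round(tinv(0.05, n - 1), 3))   # df = n - 1 for the one-group t-test
```

With these settings, n = 27 (df = 26) gives about 2.056 and n = 38 (df = 37) about 2.026, matching the examples above. The exact value for n = 41 is about 2.02 and approaches 1.96 only as n grows, so the 1.96 used for samples larger than 40 is a large-sample approximation.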

The “critical t column” in Appendix E gives the value of t that you need to obtain to be 95% confident of a “significant result” in your statistical test. This critical value of t serves as a benchmark for deciding whether your findings are significant.

The t-table in Appendix E displays a collection of bell-shaped normal curves, named for their resemblance to the outline of the Liberty Bell located in Philadelphia, near Independence Hall.

The center of each of these curves is the zero point on the x-axis. For a more detailed explanation of these curves, see a statistics book such as Zikmund and Babin (2010).

Values of t to the right of the zero point are positive and are denoted with a plus sign, while values to the left are negative and are marked with a minus sign. Therefore, t can be either positive or negative.

Most statistics books featuring a t-table present only the positive side of the t-curves. This is because the negative side is a mirror image of the positive side, containing identical values but with negative signs.


To utilize the t-table in Appendix E, it is essential to take the absolute value of the t-value obtained from the t-test formula, as the t-table exclusively lists positive t-values.

This book assumes that you want to be 95% confident in the results of your statistical tests. Consequently, the value in the t-table in Appendix E tells you whether the t-value you calculated with the one-group t-test formula falls inside the 95% confidence interval of the t-distribution.

When the t-value from a one-group t-test falls within the 95% confidence interval, the result is not significant, which means that we accept the null hypothesis.

If the t-value calculated from the one-group t-test falls outside the 95% confidence interval, the result is significant: a t-value this extreme would occur by chance less than 5% of the time if the null hypothesis were true. This outcome leads us to reject the null hypothesis and accept the research hypothesis.

STEP 6: State the Result of Your Statistical Test

There are two possible results when you use the one-group t-test, and only one of them can be accepted as “true.”

In a t-test analysis, if the absolute value of t calculated from the formula is less than the critical value provided in Appendix E, the null hypothesis is accepted Conversely, if the absolute value of t exceeds the critical value, the null hypothesis is rejected in favor of the research hypothesis.

STEP 7: State the Conclusion of Your Statistical Test

Summarizing the results of your statistical test in clear, simple language can be challenging, especially when you need to be concise and accurate for a reader without a background in statistics, such as your boss. This skill takes practice, and this book gives you many opportunities to practice it.

If you have read this far, you are ready to sit down at your computer and perform the one-group t-test using Excel on some hypothetical data.


One-Group t-Test for the Mean

Suppose that you work for a company that produces powdered graphite for lubricating machine gears. Graphite is an effective lubricant because it can be applied dry, so it does not collect the dirt and debris that can keep gears from running smoothly. Your company sells powdered graphite in containers of various sizes to businesses that rely on it to keep their machines running well.

The company has traditionally priced bulk graphite by volume, in cubic meters (m³). However, there is concern that selling graphite by mass (kg) might be more profitable. Each cubic meter of graphite is known to contain approximately 650 kg. To investigate, the company tested graphite samples, measuring both their volume and their mass, and you will analyze these data with the one-group t-test to decide whether changing to mass-based pricing could increase profitability.

Suppose that the hypothetical data for these tests were based on a sample size of 124 samples, which had a mean of 678 kg and a standard deviation of 144 kg.

Objective: To analyze the data using the one-group t-test

Create an Excel spreadsheet with the following information:

Note: In this situation, you know that one cubic meter of graphite should have 650 kg of graphite. Therefore, the hypotheses for this example are:

H0: μ = 650 kg

H1: μ ≠ 650 kg

D23: enter the STDEV (see Fig. 4.1)


D26: compute the standard error using the formula in Chap. 1

D29: find the critical value of t in the t-table in Appendix E

Now, enter the following formula in cell D32 to find the t-test result: =(D20-650)/D26

This formula subtracts the hypothesized population mean of 650 from the sample mean in cell D20 (note the parentheses around D20-650), which gives a difference of 28. It then divides this difference by the standard error of the mean in cell D26, which is 12.93. The t-test result, rounded to two decimal places, is 2.17.
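The same arithmetic can be checked in Python from the summary statistics alone. This is a sketch; the function name is hypothetical, and the comments only indicate which worksheet cell each line corresponds to.

```python
import math

def one_group_t_from_summary(n, mean, sd, mu):
    """Return (t, standard error) from n, sample mean, and sample standard deviation."""
    se = sd / math.sqrt(n)          # like cell D26: STDEV / SQRT(COUNT)
    t = (mean - mu) / se            # like cell D32: =(D20-650)/D26
    return t, se

t, se = one_group_t_from_summary(n=124, mean=678, sd=144, mu=650)
print(f"s.e. = {se:.2f}, t = {t:.2f}")   # s.e. = 12.93, t = 2.17
```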

4.2 One-Group t-Test for the Mean 69

Now, write the following sentence in D36-D39 to summarize the result of the t-test:

D36: Since the absolute value of t of 2.17 is

D37: greater than the critical t of 1.96, we

D38: reject the null hypothesis and accept

D39: the research hypothesis.

Lastly, write the following sentence in D41-D44 to summarize the conclusion of the result for the graphite example:

D41: There is significantly more than 650 kg per cubic meter

D42: of graphite sold when measured by volume compared

D43: to when the graphite is measured by mass, and it is

D44: probably closer to 678 kg per cubic meter.

Save your file as: graphite4

Important note: We have used the term “significantly more” because the sample mean mass of 678 kg is greater than the hypothesized mean mass of 650 kg.

Important note: You are probably wondering why we entered both the result and the conclusion in separate cells instead of in just one cell. This is because entering a long sentence in a single cell can give disappointing results when you print your final spreadsheet. If you fit the spreadsheet onto one page, the font size may become so small that the printout is unreadable; if you do not, the result and the conclusion may spill over onto a second page, which looks unprofessional.

Print the final spreadsheet so that it fits onto one page as given in Fig. 4.3. Enter the null hypothesis and the research hypothesis by hand on your spreadsheet.

Important Note: It is important for you to understand that “technically” the above conclusion in statistical terms should state:

There is more than 650 kg per cubic meter of graphite sold when it is measured by volume compared to when it is measured by mass, and this result is probably not due to chance.

However, throughout this book, we use the term “significantly” in the conclusions of statistical tests to alert the reader that the result of the one-group t-test for the mean was probably not due to chance. “Significantly” is a concise way to convey this finding in plain English rather than in statistical jargon.

Fig 4.3 Final Spreadsheet for Graphite Example

Can You Use Either the 95% Confidence Interval About the Mean OR the One-Group t-Test?

You are probably asking yourself:

“It seems that I could use either the 95% confidence interval about the mean or the one-group t-test to analyze the results of the types of problems described in this book. Is this correct?”

The answer is a resounding: “Yes!”

In scientific research, both the confidence interval about the mean and the one-group t-test are used often for the types of problems described in this book. The two tests produce the same result and lead to the same conclusion for a given set of data.

This book explains both the confidence interval about the mean and the one-group t-test because researchers differ in their preferences: some use one test, some use the other, and some report both to make their findings clearer. Explaining both prepares you for whichever approach your readers expect.
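Why the two tests agree can be sketched in Python: the reference value μ lies outside mean ± t_crit × s.e. exactly when |t| = |mean − μ| / s.e. is greater than t_crit, so both approaches reach the same decision when they use the same critical t. The function name `both_tests` and the sample values are hypothetical.

```python
import math
import statistics

def both_tests(values, mu, t_crit):
    """Return (CI rejects H0, t-test rejects H0) for the same data and critical t."""
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)
    # Confidence interval approach: reject if mu lies outside mean +/- t_crit * se
    ci_rejects = not (mean - t_crit * se <= mu <= mean + t_crit * se)
    # One-group t-test approach: reject if |t| exceeds the critical t
    t_rejects = abs((mean - mu) / se) > t_crit
    return ci_rejects, t_rejects

# Hypothetical data; each printed pair should show the same True/False twice
sample = [26.1, 24.8, 27.3, 25.0, 26.6, 25.9]
for mu in (24.0, 25.0, 26.0, 27.0, 28.0):
    print(mu, both_tests(sample, mu, t_crit=2.571))   # critical t for n = 6 (df = 5)
```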

Now, let’s try your Excel skills on the one-group t-test on these three problems at the end of this chapter.

End-of-Chapter Practice Problems

1. The U.S. Environmental Protection Agency (EPA) has set a maximum total phosphorus concentration of 0.015 mg/L for wastewater effluent from chemical plants. Over a 90-day period, a random sample of wastewater effluent from one chemical plant was analyzed for phosphorus concentration. Test your Excel skills on the hypothetical data given in Fig 4.4.


(a) Write the null hypothesis and the research hypothesis on your spreadsheet.

(b) Use Excel to find the sample size, mean, standard deviation, and standard error of the mean, and place these results to the right of the data set. Use four decimal places for the mean, standard deviation, and standard error of the mean.

(c) Enter the critical t from the t-table in Appendix E onto your spreadsheet, and label it.

(d) Use Excel to compute the t-value for these data (use two decimal places) and label it on your spreadsheet.

(e) Type the result on your spreadsheet, and then type the conclusion in plain English on your spreadsheet.

(f) Save the file as: waste31

2. The Mayor of St. Louis wants to reduce household garbage in the Central West End, which currently averages 26 kilograms per household per week, by adding a six-month recycling program to the existing garbage collection service. To evaluate the program, you will analyze hypothetical data on the amount of garbage collected per household per week.

Fig 4.4 Worksheet Data for Chap 4: Practice Problem #1

4.4 End-of-Chapter Practice Problems 73

(a) On your Excel spreadsheet, write the null hypothesis and the research hypothesis for these data.

(b) Use Excel to find the sample size, mean, standard deviation, and standard error of the mean for these data. Use two decimal places for the mean, standard deviation, and standard error of the mean.

(c) Use Excel to perform a one-group t-test on these data (two decimal places).

(d) On your printout, type the critical value of t given in your t-table in Appendix E.

(e) On your spreadsheet, type the result of the t-test.

(f) On your spreadsheet, type the conclusion of your study in plain English.

(g) Save the file as: garbage3

3. Maine, in the northeastern United States, is known for its many lakes: it has more than 2,000 named lakes and more than 4,000 unnamed lakes larger than one acre. A key indicator of water quality in these lakes is the level of dissolved oxygen (DO), which is essential to a healthy aquatic ecosystem.

DO levels in a lake decrease as waste accumulates, because oxygen is needed to decompose the nutrients in the water. According to Burt et al. (2009), the DO content of lakes is strongly affected by the introduction of waste.

In this problem, the reference value for the one-group t-test for the mean is 5 milligrams (mg) of DO per liter (L). Data were gathered from a random sample of named lakes in Maine so that you can test your Excel skills on a small sample before analyzing a larger data set. The hypothetical data are given in Fig 4.6.

(a) Write the null hypothesis and the research hypothesis on your spreadsheet.

(b) Use Excel to find the sample size, mean, standard deviation, and standard error of the mean for these data. Use number format (two decimal places) for the mean, standard deviation, and standard error of the mean.

(c) Enter the critical t from the t-table in Appendix E onto your spreadsheet, and label it.

(d) Use Excel to compute the t-value for these data (use two decimal places) and label it on your spreadsheet.

(e) Type the result on your spreadsheet, and then type the conclusion in plain English on your spreadsheet.

(f) Save the file as: MElakes3

Fig 4.6 Worksheet Data for Chap 4: Practice problem #3


Burt J, Barber G, Rigby D. Elementary statistics for geographers. New York: The Guilford Press; 2009.

Foster D, Stine R, Waterman R. Basic business statistics: a casebook. New York: Springer-Verlag; 1998.

Schuenemeyer L, Drew L. Statistics for earth and environmental scientists. Hoboken: John Wiley

Zikmund W, Babin B. Exploring marketing research. 10th ed. Mason: South-Western Cengage Learning; 2010.


Two-Group t-Test of the Difference of the Means for Independent Groups

In this chapter, we move from analyzing a single group of people or events to analyzing two distinct groups. You measure both groups and compare their outcomes, which gives a more complete picture of your research study.

The two-group t-test for independent groups is used when two distinct groups, with no individual or event in both groups, are measured on a single variable. Each group yields its own mean on that variable, and the test compares the two means.

The two-group t-test rests on two assumptions: (1) both groups come from populations that are normally distributed, and (2) the variances of the two populations are approximately equal. (Recall that the standard deviation is simply the square root of the variance.) There are separate formulas for dependent samples, but this book deals only with independent groups, in which no individual or event appears in both groups.

When you test the difference between the means of two groups, which formula you use depends on the sample sizes of the groups:

(1) Use Formula #1 in this chapter when both of the groups have a sample size greater than 30, and

(2) Use Formula #2 in this chapter when either one group, or both groups, have a sample size less than 30.

We will illustrate both of these situations in this chapter.
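The sample-size rule above can be written as a tiny Python helper. This is a sketch only: `choose_formula` is a hypothetical name, and because the rule as stated does not say what to do when a group has exactly 30 members, the sketch assumes Formula #2 whenever the two groups are not both larger than 30.

```python
def choose_formula(n1, n2):
    """Return 1 or 2: which two-group t-test formula to use.

    Formula #1: both groups have a sample size greater than 30.
    Formula #2: otherwise (assumption: a group of exactly 30 also
    falls under Formula #2, since the text does not cover that case).
    """
    if n1 > 30 and n2 > 30:
        return 1
    return 2


print(choose_formula(52, 57))  # 1 (both groups larger than 30)
print(choose_formula(13, 17))  # 2
```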

To effectively conduct hypothesis testing with two groups, it's essential to first grasp the necessary steps involved in the process before exploring the relevant formulas.

T.J Quirk et al., Excel 2013 for Physical Sciences Statistics,

5.1 The 9 STEPS for Hypothesis-Testing Using the Two-Group t-Test

STEP 1: Name One Group, Group 1, and the Other Group, Group 2

In this chapter, we use the numbers 1 and 2 to tell the two groups apart. Calling one group Group 1 and the other Group 2 simplifies your calculations, because you do not have to keep writing out the full names of the groups.

For example, suppose you want to find out whether there is a difference in SAT-Math scores between male and female college freshmen majoring in Physics. You could call the two groups "Freshmen Males" and "Freshmen Females," but then you would have to write out "Freshmen Males" and "Freshmen Females" whenever you wanted to refer to one of these groups. If you call the "Freshmen Males" group Group 1 and the "Freshmen Females" group Group 2, it is much easier to refer to the groups, and it saves you writing time.

Similarly, if you were comparing how long two types of house paint last, such as latex paint and oil-based paint, calling them Group 1 and Group 2 would save you from writing out the full names of the paints every time you refer to them.

It is important to understand that the designation of groups as Group 1 or Group 2 is entirely arbitrary; regardless of how you label them, the formulas will yield the same results and conclusions.


STEP 2: Create a Table That Summarizes the Sample Size, Mean Score, and Standard Deviation of Each Group

To compute a two-group t-test correctly, you must put the right numbers in the right places in the formulas; mixing them up leads to wrong results. For example, suppose you have data on entering freshmen who want to be Physics majors: the Freshmen Males group consists of 57 students with a mean SAT-Math score of 610 and a standard deviation of 120, while the Freshmen Females group consists of 46 students with a mean SAT-Math score of 640 and a standard deviation of 110.

To test whether there is a significant difference in mean SAT-Math scores between Freshmen Males and Freshmen Females, you use six numbers: the sample size, the mean, and the standard deviation of each group. Placing each of these numbers correctly in the formulas is essential.

If you create a table to summarize these data, a good example of the table, using both Step 1 and Step 2, is the one in Fig. 5.1, which labels Group 1 as Freshmen Males and Group 2 as Freshmen Females and shows the basic table format for a two-group t-test.

Fig 5.2 Results of Entering the Data Needed for the Two-group t-test


You can now use the formulas for the two-group t-test with more confidence that the six numbers will be placed in the proper place in the formulas.

You can label Group 1 as Freshmen Females and Group 2 as Freshmen Males, as the naming convention does not impact the outcome of your statistical test; the results will remain consistent regardless of the chosen designations.

STEP 3: State the Null Hypothesis and the Research Hypothesis for the Two-Group t-Test

In a two-group t-test, the null hypothesis and the research hypothesis are straightforward. The null hypothesis states that the population means of the two groups are equal, and the research hypothesis states that they are not equal:

Null Hypothesis: μ1 = μ2

Research Hypothesis: μ1 ≠ μ2

You can now see that this notation is much simpler than having to write out the names of the two groups in all of your formulas.

STEP 4: Select the Appropriate Statistical Test

This chapter focuses on scenarios involving two groups with a single measurement for each individual or event within those groups, utilizing the two-group t-test as the primary statistical method.

STEP 5: Decide on a Decision Rule

The decision rule is exactly what it was in the previous chapter (see Sect. 4.1.3) when we dealt with the one-group t-test.

(a) If the absolute value of t is less than the critical value of t, accept the null hypothesis.

(b) If the absolute value of t is greater than the critical value of t, reject the null hypothesis and accept the research hypothesis.
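As a minimal sketch (the function name `decide` is hypothetical), the decision rule compares the absolute value of t with the critical t. A t-value exactly equal to the critical value is not covered by the rule, so this sketch simply raises an error in that case.

```python
def decide(t, critical_t):
    """Apply the decision rule to a computed t and a critical t."""
    if abs(t) < critical_t:
        return "accept the null hypothesis"
    if abs(t) > critical_t:
        return "reject the null hypothesis and accept the research hypothesis"
    # The rule in the text does not cover |t| == critical t.
    raise ValueError("|t| equals the critical t; the decision rule does not cover this case")


print(decide(0.44, 1.96))   # accept the null hypothesis
print(decide(-4.55, 1.96))  # reject the null hypothesis and accept the research hypothesis
```

The second call shows why the absolute value matters: a negative t only means that the mean of Group 1 is the smaller one.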

Since you learned how to find the absolute value of t in the previous chapter (see Sect. 4.1.3.1), you can use that knowledge in this chapter.


STEP 6: Calculate the Formula for the Two-Group t-Test

In this chapter, we use two different formulas for the two-group t-test; which one applies depends on the sample sizes of the two groups. Detailed instructions for using each formula are given later in the chapter.

STEP 7: Find the Critical Value of t in the t-Table in Appendix E

In the previous chapter, you learned how to find the critical value of t for a one-group t-test in the t-table in Appendix E: you identified the sample size and read off the corresponding critical value. For the two-group t-test this is more complicated, because there are two groups, each of which may have a different sample size.

To use Appendix E correctly in this chapter, you need to learn how to find the "degrees of freedom" for your study. We will discuss that process now.

5.1.7.1 Finding the Degrees of Freedom (df) for the Two-Group t-Test

Objective: To find the degrees of freedom for the two-group t-test and to use it to find the critical value of t in the t-table in Appendix E

The concept of "degrees of freedom" is important in statistics. We do not give a mathematical explanation of it here; you can find one in statistics textbooks such as Keller (2009). For practical purposes, you only need to know how to find the degrees of freedom and how to use them to find the critical value of t in Appendix E. The formula for the degrees of freedom (df) is df = n1 + n2 − 2.

In words: add the sample sizes of Group 1 and Group 2, and then subtract 2 from the total. The result is the degrees of freedom to look up in Appendix E.


In a two-group t-test, it is essential to refer to the second column of the table, denoted as (df), to determine the critical value of t, rather than using the first column based on the sample size of a single group, as done in the one-group t-test.

To find the degrees of freedom for two groups, add the sample sizes of the two groups and subtract 2. For instance, with 13 individuals in Group 1 and 17 in Group 2, df = 13 + 17 − 2 = 28. In Appendix E, the row for 28 degrees of freedom gives a critical t value of 2.048.

Similarly, if Group 1 has 52 individuals and Group 2 has 57, df = 52 + 57 − 2 = 107. Appendix E gives a critical t of 1.96 for all degrees of freedom greater than 39, so the critical t for this example is 1.96.
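Here is a small Python sketch of the df calculation and the table lookup. The dictionary holds standard two-tailed .05 critical values of t for df = 1 to 39, which we assume match Appendix E (check them against your copy). For df above 39 it returns 1.96, following the convention described above; that is the large-sample (normal) value and is slightly smaller than the exact t critical value (about 1.98 at df = 107). The names `degrees_of_freedom`, `critical_t`, and `CRITICAL_T_05` are hypothetical.

```python
# Two-tailed critical values of t at the .05 level, df = 1..39,
# from a standard t-table (assumed to agree with Appendix E).
CRITICAL_T_05 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
    6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
    11: 2.201, 12: 2.179, 13: 2.160, 14: 2.145, 15: 2.131,
    16: 2.120, 17: 2.110, 18: 2.101, 19: 2.093, 20: 2.086,
    21: 2.080, 22: 2.074, 23: 2.069, 24: 2.064, 25: 2.060,
    26: 2.056, 27: 2.052, 28: 2.048, 29: 2.045, 30: 2.042,
    31: 2.040, 32: 2.037, 33: 2.035, 34: 2.032, 35: 2.030,
    36: 2.028, 37: 2.026, 38: 2.024, 39: 2.023,
}


def degrees_of_freedom(n1, n2):
    """df = n1 + n2 - 2 for the two-group t-test."""
    return n1 + n2 - 2


def critical_t(df):
    """Critical t (two-tailed, .05 level), using 1.96 for df above 39."""
    if df < 1:
        raise ValueError("df must be at least 1")
    if df > 39:
        return 1.96
    return CRITICAL_T_05[df]


print(critical_t(degrees_of_freedom(13, 17)))  # 2.048
print(critical_t(degrees_of_freedom(52, 57)))  # 1.96
```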

STEP 8: State the Result of Your Statistical Test

The result follows exactly the same format as the result for the one-group t-test in the previous chapter (see Sect. 4.1.6):

If the absolute value of t that you compute from your data is less than the critical t in Appendix E, you accept the null hypothesis. If the absolute value of t is greater than the critical t, you reject the null hypothesis and accept the research hypothesis.

STEP 9: State the Conclusion of Your Study in Plain English

Writing the conclusion for a two-group t-test is more challenging than for a one-group t-test, as it requires determining the differences between the two groups being compared.

When you accept the null hypothesis, the conclusion is simple to write: “There is no difference between the two groups in the variable that was measured.”

But when you reject the null hypothesis and accept the research hypothesis, you need to be careful about writing the conclusion so that it is both accurate and concise.

Let’s give you some practice in writing the conclusion of a two-group t-test.


5.1.9.1 Writing the Conclusion of the Two-Group t-Test When You Accept the Null Hypothesis

Objective: To write the conclusion of the two-group t-test when you have accepted the null hypothesis.

Suppose that a company that produces engineered stone has received feedback that its most popular product is too soft. To make the stone harder, the company changed the stone's chemical formula. The hardness of stone is measured on the Mohs scale, which ranges from 1 (talc) to 10 (diamond), and a pilot study used the Mohs scale to test whether the new formula was effective.

Suppose further that you have decided to analyze the data from the tests comparing the "Old Stone" to the "New Stone" by using the two-group t-test.

Important note: You would need to use this test for each of the survey items separately.

Suppose that the hypothetical data for Item #10 were based on a sample of 124 pieces of "Old Stone," which had a mean score on this item of 6.58 and a standard deviation of 2.44, and that you also had data from 86 samples of "New Stone," which had a mean score of 6.45 with a standard deviation of 1.86.

In this chapter, we will eventually demonstrate how to calculate the results of the two-group t-test using its formulas However, for now, we will directly present the results illustrated in Fig 5.4.

Fig 5.3 Mohs Scale Survey Item #10

Fig. 5.4 gives the worksheet data for Item #10 that lead to accepting the null hypothesis. With 124 + 86 − 2 = 208 degrees of freedom, the critical t is 1.96 (in Appendix E), and the t-test formula gives a value of 0.44 (when you use your calculator!).

Result: Since the absolute value of 0.44 is less than the critical t of 1.96, we accept the null hypothesis.

Conclusion: There was no difference between the "Old Stone" and the "New Stone" in their hardness using the Mohs scale of hardness.

Now, let's see what happens when you reject the null hypothesis (H0) and accept the research hypothesis (H1).

5.1.9.2 Writing the Conclusion of the Two-Group t-Test When You Reject the Null Hypothesis and Accept the Research Hypothesis

Objective: To write the conclusion of the two-group t-test when you have rejected the null hypothesis and accepted the research hypothesis

Let’s continue with this same example, but with the result that we reject the null hypothesis and accept the research hypothesis.

Suppose that this time the 85 samples of "Old Stone" had a mean score of 7.26 with a standard deviation of 2.35, while the 48 samples of "New Stone" had a mean score of 4.37 with a standard deviation of 3.26.

Without going into the details of the formulas for the two-group t-test, these data would produce the following result and conclusion based on Fig. 5.5:

Research Hypothesis: μ1 ≠ μ2

degrees of freedom: 131

critical t: 1.96 (in Appendix E)

Fig 5.5 Worksheet Data for Item #10 for Obtaining a Significant Difference between the Two Types of Stone

t-test formula: 5.40 (when you use your calculator!)

Result: Since the absolute value of 5.40 is greater than the critical t of 1.96, we reject the null hypothesis and accept the research hypothesis.

Now you need to find out which type of stone is harder on the Mohs scale. To do this, compare the mean hardness rating of the Old Stone with that of the New Stone.

When you write the conclusion of a two-group t-test, you must compare the means of the two groups. If you rejected the null hypothesis and accepted the research hypothesis, state that the means are "significantly" different.

When the variable is a rating scale, it helps to draw a picture of the two mean scores on the scale before you write the conclusion. The picture shows clearly how the two means differ, as in the stone hardness example in Fig. 5.6.

The picture shows that the Old Stone is harder than the New Stone (7.26 vs 4.37). Because we rejected the null hypothesis and accepted the research hypothesis, the difference between these two mean scores is significant.

So, our conclusion needs to contain the following key words:

Fig 5.6 Example of Drawing a "Picture" of the Means of the Two Groups on the Rating Scale

We can use these key words to write either of two conclusions, which are logically identical:

Either: The Old Stones were significantly harder than the New Stones according to the Mohs scale of hardness (7.26 vs 4.37).

Or: The New Stones were significantly softer than the Old Stones according to the Mohs scale of hardness (4.37 vs 7.26).

Both of these conclusions are accurate, so you can decide which one you want to write. It is your choice.

When you write the conclusion, make sure the mean scores in parentheses are in the same order as the groups in the sentence. For instance, if you write, "The Old Stones were significantly harder than the New Stones according to the Mohs scale of hardness," the scores should be written as (7.26 vs 4.37), with the Old Stones first and the New Stones second.

Similarly, if you write, "The New Stones were significantly softer than the Old Stones according to the Mohs scale of hardness," the scores should be written as (4.37 vs 7.26), with the New Stones first and the Old Stones second.

Putting the two mean scores at the end of your conclusion lets readers see the difference between the means without flipping back to the table, which makes your findings easier to read.

Now, let’s discuss FORMULA #1 that deals with the situation in which both groups have a sample size greater than 30.

Objective: To use FORMULA #1 for the two-group t-test when both groups have a sample size greater than 30

5.2 Formula #1: Both Groups Have a Sample Size Greater Than 30

An Example of Formula #1 for the Two-Group t-Test

Now, let’s use Formula #1 in a situation in which both groups have a sample size greater than 30.

A large manufacturing company makes thermoses whose bodies are built from one of two metal alloys, A and B. To evaluate the insulating properties of the two alloys, the company ran a test to see whether they differ in performance.

Each thermos was filled with water at 100 °C and sealed with a temperature probe, and the water temperature was measured after 8 hours. The results of this experiment are illustrated in Fig. 5.7.

Group A consisted of 52 thermoses with a mean temperature of 55 °C and a standard deviation of 7, while Group B consisted of 57 thermoses with a mean temperature of 64 °C and a standard deviation of 13.

Note that the two groups do not need to have the same sample size; the two-group t-test can be used with unequal sample sizes.

Your data then produce the following table in Fig. 5.8:

Fig 5.7 Example of a Temperature Scale Rating for Water Temperature Inside the Thermos (Practical Example)

Fig 5.8 Worksheet Data for Water Temperature


Create an Excel spreadsheet, and enter the following information:

Now, widen column B so that it is twice as wide as column A, and center the six numbers and their labels in your table (see Fig. 5.9).

Since both groups have a sample size greater than 30, you need to use Formula #1 for the t-test for the difference of the means of the two groups.
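For reference, here is Formula #1 written out in symbols. It is reconstructed from the cell-by-cell steps in this section (square each standard deviation, divide it by its group's sample size, add the two results, and take the square root), so compare it with the formula as printed in the book:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s.e.}, \qquad s.e. = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$

With the thermos data, s.e. = √(7²/52 + 13²/57) ≈ √3.91 ≈ 1.98 and t = (55 − 64)/1.98 ≈ −4.55, whose absolute value is the 4.55 used in the result of this study.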

Let’s “break this formula down into pieces” to reduce the chance of making a mistake.

B13: STDEV1 squared / n1 (note that you square the standard deviation of Group 1, and then divide the result by the sample size of Group 1)

Fig 5.9 Results of Widening Column B and Centering the Numbers in the Cells


You now need to compute the values of the above formulas in the following cells:

D13: the result of the formula needed to compute cell B13 (use two decimals)

D16: the result of the formula needed to compute cell B16 (use two decimals)

D19: the result of the formula needed to compute cell B19 (use two decimals)

D22: =SQRT(D19) (use two decimals)

This formula should give you a standard error (s.e.) of 1.98.

D25: 1.96 (Since df = n1 + n2 − 2, this gives df = 109 − 2 = 107, and the critical t is, therefore, 1.96 in Appendix E.)

This formula should give you a value for the t-test of −4.55.

Next, check to see if you have rounded off all figures in D13:D28 to two decimal places (see Fig. 5.11).
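If you want to double-check the worksheet, the same steps can be sketched in Python. This assumes Formula #1 has the form t = (X̄1 − X̄2)/s.e. with s.e. = √(S1²/n1 + S2²/n2), as the cell steps indicate, and that Alloy A is Group 1; the function name `formula1_t` is hypothetical.

```python
import math


def formula1_t(n1, mean1, sd1, n2, mean2, sd2):
    """Two-group t-test, Formula #1 (both groups have n > 30).

    Returns (standard error of the difference of the means, t).
    """
    part1 = sd1 ** 2 / n1            # the B13 quantity: STDEV1 squared / n1
    part2 = sd2 ** 2 / n2            # the matching term for Group 2
    se = math.sqrt(part1 + part2)    # square root of the sum
    t = (mean1 - mean2) / se
    return se, t


# Thermos data: Group 1 = Alloy A, Group 2 = Alloy B
se, t = formula1_t(52, 55, 7, 57, 64, 13)
print(round(se, 2), round(t, 2))  # 1.98 -4.55
```

The same function reproduces the stone examples from earlier in the chapter: `formula1_t(124, 6.58, 2.44, 86, 6.45, 1.86)` gives t ≈ 0.44, and `formula1_t(85, 7.26, 2.35, 48, 4.37, 3.26)` gives t ≈ 5.40.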

Fig 5.10 Formula Labels for the Two-group t-test


Now, write the following sentence in D31 to D34 to summarize the result of the study:

D31: Since the absolute value of −4.55

D32: is greater than the critical t of

D33: 1.96, we reject the null hypothesis

D34: and accept the research hypothesis.

Finally, write the following sentence in D36 to D38 to summarize the conclusion of the study in plain English:

D36: Overall, Thermoses from Group B were significantly

D37: better at insulating than Thermoses from Group A

Save your file as: TEMP12E

Fig 5.11 Results of the t-test Formula for Water


Note that the result and the conclusion are each typed into several cells, one line per cell. If you typed a whole sentence into a single cell, the printout would suffer: either the font would shrink until the sentence fit on one page, making it hard to read, or the text would spill over onto a second page. Keeping each line in its own cell makes the final spreadsheet look professional.

Print this file so that it fits onto one page, and write by hand the null hypothesis and the research hypothesis on your printout.

The final spreadsheet appears in Fig. 5.12.

Fig 5.12 Final Worksheet for Water Temperature


Now, let’s use the second formula for the two-group t-test which we use whenever either one group, or both groups, have a sample size less than 30.

Objective: To use Formula #2 for the two-group t-test when one or both groups have a sample size less than 30

Now, let’s look at the case when one or both groups have a sample size less than 30.

5.3 Formula #2: One or Both Groups Have a Sample Size Less Than 30

Suppose you are an electrical engineer who has been asked to compare the hours until failure of two new light bulb models, A and B, developed by your company's Research & Development department. You decide to use the two-group t-test for independent samples. To practice your Excel skills, you test a small sample of light bulbs of each model; the hypothetical data are given in Fig. 5.13.

Fig 5.13 Worksheet Data for Light Bulbs (Practical


Let's call Model A Group 1, and Model B Group 2.

Note: Since both groups have a sample size less than 30, you need to use Formula #2 for the t-test for the difference of the means of two independent samples.

Create an Excel spreadsheet, and enter the following information:

B2: LIGHT BULB HOURS (hrs) UNTIL FAILURE

Double-check that you have entered every figure correctly: a single incorrect entry will give you a wrong answer to this problem.

Now, widen columns B and C so that all of the information fits inside the cells.

To make columns B and C the same width, click on the letters B and C at the top of the two columns to highlight both columns. Next, move your mouse pointer to the right edge of column B until a cross appears, then click and drag the cross to the right until all of the text fits on your screen, and release the mouse button. Columns B and C are now the same width.

Then, center all information in the table except the top title by using the following steps:

To center the information in cells B4:C15, left-click and highlight these cells. Then, in the "Alignment" group at the top center of the Home tab, click the second icon from the left on the bottom line. All of the information in the selected cells is now centered.

Your spreadsheet should now look like Fig. 5.14.


Now you need to use your Excel skills from Chap. 1 to fill in the sample sizes (n), the Means, and the Standard Deviations (STDEV) in the Table in cells F10:H11.

Check your work carefully, because a single incorrect figure will give you a wrong answer. Round the means and standard deviations to zero decimal places, and center these six figures in their cells.

Since both groups have a sample size less than 30, you need to use Formula #2 for the t-test for the difference of the means of two independent samples.

Formula #2 for the two-group t-test is the following:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \qquad (5.5)$$

where the degrees of freedom are

$$df = n_1 + n_2 - 2 \qquad (5.6)$$

To minimize errors when working with this complex formula, it is advisable to break it down into smaller, manageable parts rather than attempting to write it all in a single cell entry.

Now, enter these words on your spreadsheet:

Fig 5.14 Light Bulb Hours Until Failure Worksheet Data for Hypothesis Testing


Fig 5.15 Light Bulb Formula Labels for the Two-group t-test


You now need to use your Excel skills to compute the values of the above formulas in the following cells:

H14: the result of the formula needed to compute cell E14 (use two decimals)

H16: the result of the formula needed to compute cell E16 (use two decimals)

H18: the result of the formula needed to compute cell E18 (use two decimals)

H20: the result of the formula needed to compute cell E20 (use two decimals)

H23: =SQRT(((H14+H16)/H18)*H20)

Note that there are three opening parentheses right after SQRT and three matching closing parentheses in the formula (after H16, after H18, and after H20 at the end). If a parenthesis is missing or misplaced, the formula will not compute the standard error correctly.

The above formula gives a standard error of the difference of the means equal to 25.68 (two decimals) in cell H23.

H26: the critical t value from the t-table in Appendix E, found using df = n1 + n2 − 2

To compute the t-test value, enter =(G10-G11)/H23 and display the result with two decimal places (see Fig. 5.16). The open parenthesis before G10 and the closed parenthesis after G11 make Excel first subtract the two means (908 − 965 = −57) and then divide the result by the standard error of the difference of the means (25.68). This gives a t-test value of −2.22.
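The same breakdown can be sketched in Python to check the worksheet. The sketch assumes that cells H14, H16, H18, and H20 hold (n1 − 1)S1², (n2 − 1)S2², n1 + n2 − 2, and 1/n1 + 1/n2, which is what =SQRT(((H14+H16)/H18)*H20) needs in order to match Formula #2; the function name `formula2_t` is hypothetical.

```python
import math


def formula2_t(n1, mean1, sd1, n2, mean2, sd2):
    """Two-group t-test, Formula #2 (one or both groups have n < 30).

    Returns (standard error of the difference of the means, t, df).
    """
    ss1 = (n1 - 1) * sd1 ** 2          # assumed contents of H14
    ss2 = (n2 - 1) * sd2 ** 2          # assumed contents of H16
    df = n1 + n2 - 2                   # assumed contents of H18
    inv_n = 1 / n1 + 1 / n2            # assumed contents of H20
    se = math.sqrt(((ss1 + ss2) / df) * inv_n)  # like =SQRT(((H14+H16)/H18)*H20)
    t = (mean1 - mean2) / se           # like =(G10-G11)/H23
    return se, t, df
```

The light bulb data of Fig. 5.13 are not repeated here; once you have n, the mean, and STDEV for each model, call `formula2_t(n1, mean1, sd1, n2, mean2, sd2)`. As a check of the last step alone, (908 − 965)/25.68 ≈ −2.22. When n1 = n2, Formula #2 gives the same standard error as Formula #1, which is a handy sanity check.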


Now write the following sentence in C32 to C33 to summarize the result of the study:

C32: Since the absolute value of −2.22 is greater than 2.101,

C33: we reject the null hypothesis and accept the research hypothesis.

Finally, write the following sentence in C36 to C37 to summarize the conclusion of the study:

C36: Model B lasted significantly more hours until failure than Model A

C37: (965 hours vs 908 hours).

Fig 5.16 Light Bulb Hours Two-group t-test Formula Results


Save your file as: bulb3

Print the final spreadsheet so that it fits onto one page.

Write the null hypothesis and the research hypothesis by hand on your printout. The final spreadsheet appears in Fig. 5.17.

Fig 5.17 Light Bulb Hours Final Spreadsheet

5.4 End-of-Chapter Practice Problems

Suppose you are an electrical engineer testing two types of lead wire used in producing household light bulbs. In 112 tests, the Current Wire averaged 21.1 misfeeds per hour (standard deviation 3.24); in 126 tests, the New Wire averaged 19.6 misfeeds per hour (standard deviation 3.06). Use the two-group t-test to determine whether the two types of wire differ in their mean number of misfeeds per hour.

(a) State the null hypothesis and the research hypothesis on an Excel spreadsheet.

(b) Use Excel to find the standard error of the difference of the means.

(c) Find the critical t value using Appendix E, and enter it on your spreadsheet.

(d) Use Excel to perform a t-test on these data. What is the value of t that you obtain?

Use three decimal places for all figures in the formula section of your spreadsheet.

(e) State your result on your spreadsheet.

(f) State your conclusion in plain English on your spreadsheet.

(g) Save the file as: leadwire3

In a research project, you compare the drying times, in minutes, of two types of household primer paint: the paint your company currently uses and a new paint on the market. You want to know whether there is a significant difference between their mean drying times. An experiment has been conducted, and you now want to use your Excel skills to analyze the hypothetical data given in Fig. 5.18.

(a) On your Excel spreadsheet, write the null hypothesis and the research hypothesis.

(b) Create a table that summarizes these data on your spreadsheet, and use Excel to find the sample sizes, the means, and the standard deviations of the two types of paint (use two decimal places for the means and standard deviations).

(c) Use Excel to find the standard error of the difference of the means (two decimal places).

(d) Use Excel to perform a two-group t-test. What is the value of t that you obtain (use two decimal places)?

(e) On your spreadsheet, type the critical value of t using the t-table in Appendix E.

(f) Type the result of the test on your spreadsheet.

(g) Type your conclusion in plain English on your spreadsheet.

(h) Save the file as: PAINT3

The CEO of a national fishing hook manufacturing company has asked you to determine whether the company's two brands of fishing hooks differ in tensile strength: Brand A, a premium hook with a higher cost, and Brand B, a standard hook. Tensile strength, measured in pascals (Pa), is a key measure of a hook's durability. Because the alloys used to make the steel are proprietary, the CEO has secured the cooperation of the supplier of the raw materials (rolls of high carbon steel wire) for the hooks, and has obtained the hypothetical data given in Fig. 5.19:

(a) State the null hypothesis and the research hypothesis on an Excel spreadsheet.

(b) Use Excel to find the standard error of the difference of the means.

(c) Find the critical t value using Appendix E, and enter it on your spreadsheet.

(d) Use Excel to perform a t-test on these data. What is the value of t that you obtain?

(e) State your result on your spreadsheet.

(f) State your conclusion in plain English on your spreadsheet.

(g) Save the file as: FISH13

Keller G. Statistics for management and economics. 8th ed. Mason: South-Western Cengage Learning; 2009.

Wheater C, Cook P. Using statistics to understand the environment. New York: Routledge; 2000.

Fig 5.19 Worksheet Data for Chap 5: Practice Problem #3


Correlation and Simple Linear Regression

There are many different types of "correlation coefficients," but the one we will use in this book is the Pearson product-moment correlation, which we will call r.

What Is a “Correlation?”

Understanding the Formula for Computing a Correlation

Objective: To understand the formula for computing the correlation r

Fig 6.7 Worksheet Data for High School GPA and

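The formula for computing the correlation r is as follows:

$$r = \frac{\dfrac{1}{n-1}\sum (X - \bar{X})(Y - \bar{Y})}{S_X S_Y}$$

where S_X and S_Y are the standard deviations of X and Y.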

This formula looks daunting at first glance, but let’s “break it down into its steps” to understand how to compute the correlation r.
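To see how the nine steps fit together, here is a minimal Python sketch that follows them one by one, using the standard library `statistics` module. The final division (the step 7 result divided by the step 8 result) is the last step implied by the formula for r. The function name `pearson_r` is hypothetical, and the student data of Fig. 6.7 are not reproduced here, so you would pass in your own HSGPA and FROSH GPA lists.

```python
import statistics


def pearson_r(x, y):
    """Pearson r computed step by step, following the nine steps."""
    n = len(x)                                    # step 1: sample size
    factor = 1 / (n - 1)                          # step 2: 1 / (n - 1)
    mean_x = statistics.mean(x)
    mean_y = statistics.mean(y)
    dx = [xi - mean_x for xi in x]                # step 3: X minus mean of X
    dy = [yi - mean_y for yi in y]                # step 4: Y minus mean of Y
    products = [a * b for a, b in zip(dx, dy)]    # step 5: multiply the deviations
    total = sum(products)                         # step 6: add the products
    numerator = factor * total                    # step 7: step 2 times step 6
    denominator = statistics.stdev(x) * statistics.stdev(y)  # step 8: STDEV X times STDEV Y
    return numerator / denominator                # divide step 7 by step 8
```

Like Excel, Python keeps full floating-point precision in every intermediate step, so its answer can differ slightly from a hand calculation that rounds each product to two decimals. On Python 3.10 or later, `statistics.correlation(x, y)` should give the same value and makes a convenient cross-check.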

Understanding the Nine Steps for Computing a Correlation

Objective: To understand the nine steps of computing a correlation r

The nine steps are as follows:

1 Find the sample size n by noting the number of students 8

2 Divide the number 1 by the sample size minus 1 (i.e., 1/7) 0.14286

3. For each student, take the HSGPA and subtract the mean HSGPA for the 8 students, and call this (X − X̄). (For example, for student #6, this would be: 2.6 − 2.86.)

Note: With your calculator, this difference is −0.26. However, Excel uses 16 decimal places for every computation, so its result could be slightly different for each student.

4. For each student, take the FROSH GPA and subtract the mean FROSH GPA for the 8 students, and call this (Y − Ȳ). (For example, for student #6, this would be: 2.3 − 2.74.)

5. Then, for each student, multiply (X − X̄) times (Y − Ȳ). (For example, for student #6, this would be: (−0.26) × (−0.44).)

6. Add the results of (X − X̄) times (Y − Ȳ) for the 8 students: +1.09

Steps 1–6 would produce the Excel table given in Fig. 6.8.

108 6 Correlation and Simple Linear Regression

Note that when you multiply two negative numbers, the result is positive. For example, for student #7: (−0.46) × (−0.64) = +0.29. Conversely, when you multiply a negative number by a positive number, the result is negative. For example, for student #1: (−0.06) × (+0.16) = −0.01.

Note: Excel carries out every computation to 16 decimal places. So, when you check your work with a calculator, you will frequently get an answer slightly different from Excel's.

For example, when you compute (X − X̄)(Y − Ȳ) for student #2, your calculator's answer may differ slightly from Excel's result of 0.02. Excel uses 16 decimal places for every number, even though only two decimal places are displayed in the table.

In Step 6, if you first add all of the positive numbers, you get +1.10. If you then add all of the negative numbers, you get −0.03. Combining these two results gives +1.07. However, when you use Excel, the total is +1.09, because Excel uses 16 decimal places for every computation, which is much more accurate than your calculator.

7. Multiply the answer for step 2 above by the answer for step 6: 0.1557

8. Multiply the STDEV of X by the STDEV of Y: 0.1872

9. Finally, divide the answer from step 7 by the answer from step 8: +0.83

Fig 6.8 Worksheet for Computing the Correlation, r
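The nine steps can also be written as a short Python function, which is a convenient way to check a hand calculation. This is a sketch only. The eight HSGPA/FROSH GPA pairs below are hypothetical placeholders, not the Fig. 6.7 data, so they will not reproduce +0.83. On Python 3.10 or later, `statistics.correlation(x, y)` should return the same value.

```python
import statistics


def correlation_nine_steps(x, y):
    """Pearson correlation r, computed by following the nine steps in the text."""
    n = len(x)                                   # Step 1: sample size
    one_over_n_minus_1 = 1 / (n - 1)             # Step 2: 1 / (n - 1)
    x_bar = statistics.mean(x)
    y_bar = statistics.mean(y)
    dx = [xi - x_bar for xi in x]                # Step 3: X - X-bar for each student
    dy = [yi - y_bar for yi in y]                # Step 4: Y - Y-bar for each student
    products = [a * b for a, b in zip(dx, dy)]   # Step 5: (X - X-bar)(Y - Y-bar)
    total = sum(products)                        # Step 6: add the products
    step7 = one_over_n_minus_1 * total           # Step 7: step 2 times step 6
    step8 = statistics.stdev(x) * statistics.stdev(y)  # Step 8: STDEV of X times STDEV of Y
    return step7 / step8                         # Step 9: step 7 divided by step 8


# Hypothetical data for 8 students (NOT the Fig. 6.7 data).
hsgpa = [2.8, 3.1, 2.5, 3.4, 2.9, 2.6, 2.4, 3.2]
frosh_gpa = [2.9, 2.8, 2.4, 3.3, 2.7, 2.3, 2.1, 3.4]

r = correlation_nine_steps(hsgpa, frosh_gpa)
print(f"r = {r:+.2f}")
```

Every intermediate value stays at full floating-point precision. The result can therefore differ slightly from a calculation done with numbers rounded to two decimals, which is the same effect the text describes for Excel.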

The correlation of +0.83 between HSGPA (X) and FROSH GPA (Y) for these eight students is a strong positive correlation: students with higher HSGPAs tended to have higher FROSH GPAs. For more information on correlation, see Ledolter and Hogg (2010) and McCleery, Watt, and Hart (2007).

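You could also use the results of the above table in the formula for computing the correlation r in the following way:

$$\text{correlation } r = \frac{\frac{1}{n-1}\sum (X - \bar{X})(Y - \bar{Y})}{\mathrm{STDEV}_x \cdot \mathrm{STDEV}_y}$$

$$\text{correlation } r = \frac{(1/7)(1.09)}{(0.48)(0.39)}$$

$$\text{correlation} = r = 0.83$$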

Using Excel for calculations yields a correlation of +0.82, as it employs 16 decimal places for all numbers and computations, resulting in greater accuracy compared to standard calculators.

Now, let’s discuss how you can use Excel to find the correlation between two variables in a much simpler, and much faster, fashion than using your calculator.

Using Excel to Compute a Correlation Between Two Variables

Objective: To use Excel to find the correlation between two variables

Suppose that you want to study the relationship between the weight of 4-door sedans and the number of gallons of gasoline they use to drive 150 miles. You expect heavier sedans to use more fuel over this distance. This question matters to car manufacturers who want to improve fuel efficiency while taking vehicle design and weight into account.

Twelve of the latest sedan models were driven 150 miles from Forest Park in St. Louis, Missouri, to Kansas City, Missouri. Professional drivers, all of similar weight, drove a predetermined route at designated speeds throughout the trip.

To test your Excel skills, you have recorded in a table the weight of each car in thousands of pounds and the number of gallons of gasoline it used on the drive. The hypothetical data are given in Fig. 6.9.


Important note: The weight of the cars is recorded in thousands of pounds, so a car that weighs 3,500 pounds is recorded as 3.5 in this table.

To investigate the relationship between car weight and fuel consumption, you will compute a correlation. In this study, car weight is the predictor variable (X), and the number of gallons of fuel used is the criterion variable (Y).

Create an Excel spreadsheet with the following information:

A3: WEIGHT OF 4-DOOR SEDANS VS NO OF GALLONS USED TO DRIVE 150 MILES

B5: Is there a relationship between the weight of a 4-door sedan

B6: and the number of gallons used to drive 150 miles?

Next, change the width of Columns B and C so that the information fits inside the cells.

Enter the rest of the data from Fig. 6.9 into the table so that B20 is 4.1 and C20 is 6.9. Then, center the information inside all of the cells of the table.

Fig 6.9 Worksheet Data for Weight and Number of Gallons Used (Practical Example)

6.2 Using Excel to Compute a Correlation Between Two Variables 111

Next, give the name weight to the range of data in B9:B20.

We discussed earlier in this book (see Sect. 1.4.4) how to “name a range of data,” but here is a reminder of how to do that:

To give a “name” to a range of data:

Click on the top number in the range of data and drag the mouse down to the bottom number of the range.

For example, to name the cells B9:B20 as “weight,” click on cell B9 and drag the pointer down to B20 so that the range B9:B20 is highlighted. Then, click on: Formulas (at the top of your screen)

Define Name (top center of your screen)

weight (enter this in the Name box; see Fig. 6.10)

Now, repeat these steps to give the name: gallons to C9:C20

Finally, click on any blank cell on your spreadsheet to “deselect” cells C9:C20 on your computer screen.

Next, use Excel formulas to find the sample size, mean, and standard deviation (STDEV) of each variable, formatting the means and standard deviations to two decimal places (see Fig. 6.11). If your entries are correct, B23 will be 3.08 and C24 will be 0.75.
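As a cross-check outside Excel, Python's `statistics` module offers close analogues of the three functions used here: `len` for COUNT, `statistics.mean` for AVERAGE, and `statistics.stdev` for STDEV. Both STDEV and `statistics.stdev` use the sample formula with n − 1. The twelve weights and gallon amounts below are hypothetical stand-ins for Fig. 6.9. Only the last pair (4.1, 6.9) comes from the text, so the results will not equal 3.08 and 0.75.

```python
import statistics


def describe(values):
    """Return (n, mean, sample STDEV), mirroring Excel's COUNT, AVERAGE and STDEV."""
    return len(values), statistics.mean(values), statistics.stdev(values)


# Hypothetical stand-ins for B9:B20 (weight, thousands of lb) and C9:C20 (gallons).
weight = [2.5, 2.7, 2.8, 3.0, 3.1, 3.2, 3.4, 3.5, 3.6, 3.8, 3.9, 4.1]
gallons = [5.6, 5.5, 5.9, 6.0, 6.1, 6.4, 6.3, 6.6, 6.5, 6.8, 6.7, 6.9]

for name, data in (("weight", weight), ("gallons", gallons)):
    n, mean, sd = describe(data)
    # Two decimal places, as the text asks for the means and STDEVs.
    print(f"{name}: n = {n}, mean = {mean:.2f}, STDEV = {sd:.2f}")
```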

Fig 6.10 Dialogue Box for Naming a Range of Data as: “weight”


Objective: Find the correlation between weight and gallons used

C26: =correl(weight,gallons) (see Fig. 6.12)

Fig 6.11 Example of Using Excel to Find the Sample Size, Mean, and STDEV


Hit the Enter key to compute the correlation

C26: format this cell to two decimals

Note that the equal sign in =correl(weight,gallons) in C26 tells Excel that you are going to use a formula in this cell.

The correlation between weight (X) and the number of gallons used (Y) is +0.91, which is a strong positive correlation. This means that, in these data, heavier sedans tended to use more gallons of gas to drive 150 miles.
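To verify a CORREL result outside Excel, you can use a minimal Python sketch of the same Pearson correlation. The dictionary keys imitate the named ranges weight and gallons. The values are hypothetical, not the Fig. 6.9 data, so the output will not be +0.91.

```python
import math


def correl(x, y):
    """Pearson correlation coefficient, the statistic Excel's CORREL returns."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # sum of cross-products
    sxx = sum((a - mx) ** 2 for a in x)                   # squared deviations of X
    syy = sum((b - my) ** 2 for b in y)                   # squared deviations of Y
    return sxy / math.sqrt(sxx * syy)


# Hypothetical stand-ins for the named ranges weight (B9:B20) and gallons (C9:C20).
ranges = {
    "weight": [2.5, 2.7, 2.8, 3.0, 3.1, 3.2, 3.4, 3.5, 3.6, 3.8, 3.9, 4.1],
    "gallons": [5.6, 5.5, 5.9, 6.0, 6.1, 6.4, 6.3, 6.6, 6.5, 6.8, 6.7, 6.9],
}

r = correl(ranges["weight"], ranges["gallons"])
print(f"r = {r:+.2f}")  # two decimal places, like cell C26
```

The 1/(n − 1) factors in the nine-step formula cancel between numerator and denominator. That is why this version can work directly with sums of deviations.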

Save this file as: GALLONS3

The final spreadsheet appears in Fig. 6.13.

Fig 6.12 Example of Using Excel's =correl Function to Compute the Correlation Coefficient


Creating a Chart and Drawing the Regression Line onto the Chart

Using Excel to Create a Chart and the Regression Line Through the Data Points

Objective: To create a chart and the regression line summarizing the relationship between weight and gallons used

2. Click and drag the mouse to highlight both columns of numbers (B9:C20), but do not highlight the labels above the data points.

Insert (top left of screen)

Highlight: Scatter chart icon (immediately above the word: “Charts” at the top center of your screen)

Click on the down arrow on the right of the chart icon

Highlight the top left scatter chart icon (see Fig. 6.14)

Click on the top left chart to select it

Click on the “+icon” to the right of the chart (CHART ELEMENTS)

Click in the check mark next to “Chart Title” and also next to “Gridlines” to remove these check marks (see Fig. 6.15)

Fig 6.14 Example of Selecting a Scatter Chart

6.3 Creating a Chart and Drawing the Regression Line onto the Chart 117

Click on the box next to: “Chart Title” and then click on the arrow to its right. Then, click on: “Above chart”

Note that the words “Chart Title” are now in a box at the top of the chart (see Fig. 6.16)

Enter the following Chart Title to the right of fx at the top of your screen:

RELATIONSHIP BETWEEN WEIGHT AND NO OF GALLONS USED (see Fig. 6.17)

Fig 6.15 Example of Chart Elements Selected

Fig 6.16 Example of Chart Title Selected


Hit the Enter Key to enter this title onto the chart

Click inside the chart at the top right corner of the chart to “deselect” the box around the Chart Title (see Fig. 6.18)

Click on the “+box” to the right of the chart

Add a check mark to the left of “Axis Titles” (This will create an “Axis Title” box on the y-axis of the chart)

Fig 6.17 Example of Creating a Chart Title

Fig 6.18 Example of a Chart Title Inserted onto the Chart


Click on the right arrow for: “Axis titles” and then click on: “Primary Horizontal” to remove the check mark in its box (this will create the y-axis title)

Enter the following y-axis title to the right of fx at the top of your screen:

Then hit the Enter Key to enter this y-axis title to the chart

Click inside the chart at the top right corner of the chart to “deselect” the box around the y-axis title (see Fig. 6.19)

Click on the “+box” to the right of the chart

Highlight: “Axis Titles” and click on its right arrow

Click on the words: “Primary Horizontal” to add a check mark to its box (this creates an “Axis Title” box on the x-axis of the chart)

Enter the following x-axis title to the right of fx at the top of your screen:

Then, hit the Enter Key to add this x-axis title to the chart

Click inside the chart at the top right corner of the chart to “deselect” the box around the x-axis title (see Fig. 6.20).

Fig 6.19 Example of Adding a y-axis Title to the Chart


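The regression line is called the “least-squares regression line” because it is the straight line that best fits the data points on the chart. Of all possible straight lines, it has the smallest sum of squared vertical distances between the data points and the line.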
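A short Python sketch can make the least-squares idea concrete. It is our illustration, not Excel's internal code. It computes the slope and y-intercept with the standard formulas b = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² and a = Ȳ − bX̄. It then shows that nudging either value makes the sum of squared vertical distances larger. The data points are hypothetical.

```python
def least_squares_line(x, y):
    """Return (a, b) for the least-squares line Y = a + bX."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx       # slope
    a = my - b * mx     # y-intercept: the line passes through (X-bar, Y-bar)
    return a, b


def sum_squared_errors(x, y, a, b):
    """Sum of squared vertical distances from the points to the line Y = a + bX."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data points.
x = [2.5, 2.8, 3.0, 3.4, 3.6, 4.1]
y = [5.6, 5.9, 6.0, 6.3, 6.5, 6.9]

a, b = least_squares_line(x, y)
best = sum_squared_errors(x, y, a, b)
# Any other line fits worse: nudging a or b increases the sum of squared errors.
for da, db in ((0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)):
    assert sum_squared_errors(x, y, a + da, b + db) > best
print(f"Y = {a:.2f} + {b:.4f}X")
```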

6.3.1.1 Drawing the Regression Line Through the Data Points in the Chart

Objective: To draw the regression line through the data points on the chart

Right-click on any one of the data points inside the chart

Highlight: Add Trendline (see Fig. 6.21)

Fig 6.20 Example of a Chart Title, an x-axis Title, and a y-axis Title


Linear (be sure the “linear” button near the top is selected on the “Format Trendline” dialog box; see Fig. 6.22)

Fig 6.21 Dialogue Box for Adding a Trendline to the Chart

Fig 6.22 Dialog Box for a Linear Trendline


Click on the X at the top right of the “Format Trendline” dialog box to close this dialog box

Click on any blank cell outside the chart to “deselect” the chart

Save this file as: GALLONS4

Your spreadsheet should look like the spreadsheet in Fig. 6.23.

6.3.1.2 Moving the Chart Below the Table in the Spreadsheet

Objective: To move the chart below the table

To reposition the chart, left-click on any white space to the right of the top title, hold the click, and drag the chart down and to the left until the top left corner aligns with cell A29, then release the mouse button.

Fig 6.23 Final Chart with the Trendline Fitted Through the Data Points of the Scatterplot

6.3.1.3 Making the Chart “Longer” So That It Is “Taller”

Objective: To make the chart “longer” so that it is taller

To make the chart taller, place the pointer on the center of the bottom border of the chart so that an up-and-down arrow sign appears. Then, while holding down the left mouse button, drag the bottom border down to row 48, and release the mouse button.

Objective: To make the chart “wider”

To widen the chart, place the pointer on the center of the right border of the chart so that a left-to-right arrow sign appears. Then, while holding down the left mouse button, drag the right border to the middle of Column H (see Fig. 6.25).

Fig 6.24 Example of Moving the Chart Below the Table


Now, click on any blank cell outside the chart to “deselect” the chart

Save this file as: GALLONS5

To print this spreadsheet on a single page, you must reduce the scale below 100%. At its current size, the spreadsheet exceeds the page limits and would spill over onto four pages. The steps below show how to print some or all of the spreadsheet.

Printing a Spreadsheet So That the Table and Chart Fit onto One Page

Objective: To print the spreadsheet so that the table and the chart fit onto one page

Page Layout (top of screen)

To ensure that the table and chart fit on a single page when printing, adjust the scale by clicking the down-arrow next to the "Scale to Fit" icon at the top of the screen, and select "80%."

Fig 6.25 Example of a Chart that is Enlarged to Fit the Cells: A29:H48

6.4 Printing a Spreadsheet So That the Table and Chart Fit onto One Page 125

Fig 6.26 Example of the Page Layout for Reducing the Scale of the Chart to 80 % of Normal Size


Save your file as: GALLONS6

Finding the Regression Equation

Installing the Data Analysis ToolPak into Excel

Objective: To install the Data Analysis ToolPak into Excel

Since there are currently four versions of Excel in the marketplace (2003, 2007, 2010, 2013), we will give a brief explanation of how to install the Data Analysis ToolPak into each of these versions of Excel.

6.5.1.1 Installing the Data Analysis ToolPak into Excel 2013

Click on: Data (at the top of your screen)

To check whether the Data Analysis ToolPak for Excel 2013 is installed, look for the words “Data Analysis” at the far right of your monitor screen. If you see them, the ToolPak was correctly installed with Office 2013, and you can skip ahead to Sect. 6.5.2.

If the words “Data Analysis” are not at the top right of your monitor screen, then the ToolPak component of Excel 2013 was not installed when you installed Office 2013 onto your computer. If this happens, you need to follow these steps:

Options (bottom left of screen)

Note: This creates a dialog box with “Excel Options” at the top left of the box.

Add-Ins (on left of screen)

Manage: Excel Add-Ins (at the bottom of the dialog box)

Go (at bottom center of dialog box)

Highlight: Analysis ToolPak (in the Add-Ins dialog box)

Put a check mark to the left of Analysis ToolPak


OK (at the right of this dialog box)

You now should have the words: “Data Analysis” at the top right of your screen to show that this feature has been installed correctly

If you get a prompt asking you for the “installation CD,” put this CD in the CD drive and click on: OK

Note: If these steps do not work, you should try these steps instead:

File/Options (bottom left)/Add-ins/Analysis ToolPak/Go/ click to the left of Analysis ToolPak to add a check mark/OK

If you need help doing this, ask your favorite “computer techie” for help.

You are now ready to skip ahead to Sect. 6.5.2.

6.5.1.2 Installing the Data Analysis ToolPak into Excel 2010

Click on: Data (at the top of your screen)

To check if the Data Analysis ToolPak for Excel 2010 is properly installed, look at the top right corner of your monitor screen for the words "Data Analysis." If you see this option, it indicates a successful installation during your Office 2010 setup, and you can proceed to Section 6.5.2.

If the words “Data Analysis” are not at the top right of your monitor screen, then the ToolPak component of Excel 2010 was not installed when you installed Office 2010 onto your computer. If this happens, you need to follow these steps:

Excel Options (creates a dialog box)

Manage: Excel Add-Ins (at the bottom of the dialog box)

Highlight: Analysis ToolPak (in the Add-Ins dialog box)

(You now should have the words: “Data Analysis” at the top right of your screen)

If you get a prompt asking you for the “installation CD,” put this CD in the CD drive and click on: OK

Note: If these steps do not work, you should try these steps instead:

File/Options (bottom left)/Add-ins/Analysis ToolPak/Go/ click to the left of Analysis ToolPak to add a check mark/OK

If you need help doing this, ask your favorite “computer techie” for help.

You are now ready to skip ahead to Sect. 6.5.2.

6.5.1.3 Installing the Data Analysis ToolPak into Excel 2007

Click on: Data (at the top of your screen)

If the words “Data Analysis” do not appear at the top right of your screen, you need to install the Data Analysis ToolPak using the following steps:

Microsoft Office button (top left of your screen)

Excel options (bottom of dialog box)

Add-ins (far left of dialog box)

Go (to create a dialog box for Add-Ins)

OK (If Excel asks you for permission to proceed, click on: Yes)

(You should now have the words: “Data Analysis” at the top right of your screen)

If you need help doing this, ask your favorite “computer techie” for help.

You are now ready to skip ahead to Sect. 6.5.2.

6.5.1.4 Installing the Data Analysis ToolPak into Excel 2003

Click on: Tools (at the top of your screen)

To determine whether the ToolPak is installed in your version of Excel, check the Tools menu for the “Data Analysis” option. If it is there, you can skip ahead to Sect. 6.5.2 to find the regression equation. If it is not, you need to install the ToolPak using the following steps:

Options (bottom left of screen)

Analysis ToolPak (it is directly underneath Inactive Application Add-ins near the top of the box)

Click to add a check mark to the left of Analysis ToolPak

Note: If these steps do not work, try these steps instead: Tools/Add-ins/Click to the left of Analysis ToolPak to add a check mark/OK

You are now ready to skip ahead to Sect. 6.5.2.


Using Excel to Find the SUMMARY OUTPUT of Regression

You have now installed the ToolPak, and you are ready to find the regression equation for the “best-fitting straight line” through the data points by using the following steps:

Open the Excel file: GALLONS6 (if it is not already open on your screen)

To deselect a chart with a gray border in an already open file, simply click on any empty cell outside of the chart.

Now that you have installed the ToolPak, you are ready to find the regression equation summarizing the relationship between weight and number of gallons used in your data set.

Remember that you gave the name weight to the X data (the predictor) and the name gallons to the Y data (the criterion) in a previous section of this chapter (see Sect. 6.2).

Data Analysis (far right at top of screen; see Fig. 6.28)

Scroll down the dialog box using the down arrow and click on: Regression (see Fig.6.29)

Fig 6.28 Example of Using the Data/Data Analysis Function of Excel

In the Regression dialog box, enter gallons as the Input Y Range and weight as the Input X Range. Then, click on the button next to Output Range, type A50 in the box to its right to tell Excel where to place the results in your spreadsheet, and click on OK.

The SUMMARY OUTPUT should now be in cells A50:I67.

To make the Regression SUMMARY OUTPUT easier to read, widen the columns so that all of the column headings are visible. Next, change the two specified cells to Number format (two decimal places) by using Home (top left of your screen).

Next, change this cell to four decimal places: B67

Change all of the other decimal numbers to three decimal places, and center all of the numbers within their cells. Then, print the file so that it fits onto one page by changing the scale to 60% (Page Layout tab). Your final file should look like Fig. 6.30.
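For a one-predictor regression, the main statistics in the top block of the SUMMARY OUTPUT and the two coefficients can be reproduced with standard textbook formulas. The Python sketch below is an illustration under that one-predictor assumption, not a description of how the Analysis ToolPak is implemented. The data are hypothetical. Note that Multiple R is computed as the square root of R Square, so the value shown can never be negative.

```python
import math


def regression_summary(x, y):
    """Key SUMMARY OUTPUT statistics for simple (one-predictor) linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    b = sxy / sxx                     # slope ("X Variable 1" coefficient)
    a = my - b * mx                   # y-intercept ("Intercept" coefficient)
    r_square = sxy ** 2 / (sxx * syy)
    sse = syy - b * sxy               # residual (error) sum of squares
    return {
        "Multiple R": math.sqrt(r_square),  # always >= 0, whatever the sign of r
        "R Square": r_square,
        "Adjusted R Square": 1 - (1 - r_square) * (n - 1) / (n - 2),
        "Standard Error": math.sqrt(sse / (n - 2)),
        "Observations": n,
        "Intercept": a,
        "X Variable 1": b,
    }


# Hypothetical data (not the GALLONS data).
summary = regression_summary([1, 2, 3, 4], [2, 3, 5, 4])
for label, value in summary.items():
    print(f"{label}: {value:.3f}")  # three decimal places, as in the text
```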


Fig 6.30 Final Spreadsheet of Correlation and Simple Linear Regression including the SUMMARY OUTPUT for the Data

Save the resulting file as: GALLONS7

Note the following problem with the summary output.

Whoever wrote the computer program for this version of Excel made a mistake and gave the name: “Multiple R” to cell A53.

This is not correct. Instead, cell A53 should say “correlation r,” since this is the notation that we are using for the correlation between X and Y.

You can now use your printout of the regression analysis to find the regression equation that is the best-fitting straight line through the data points.

But first, let’s review some basic terms.

6.5.2.1 Finding the y-Intercept, a, of the Regression Line

The y-intercept, denoted by the letter a, is the place where the regression line would cross the y-axis if the line were extended. In the SUMMARY OUTPUT in Fig. 6.30, the y-intercept is 2.75 (cell B66). Suppose you extended the regression line down and to the left until X = 0. It would then cross the y-axis at 2.75, which is why this value is called the “y-intercept.”

6.5.2.2 Finding the Slope, b, of the Regression Line

The “tilt” of the regression line is called the “slope” of the regression line.

The slope tells you how much the regression line differs from a horizontal line through the data points. When the correlation between X and Y is zero, the regression line is exactly horizontal (parallel to the X-axis), and its slope is zero.

When the correlation between X and Y is positive, the regression line slopes upward to the right. In our example, the regression line slopes upward to the right, and its slope is +1.0762 (cell B67).

We will use the notation “b” to stand for the slope of the regression line (Note that Excel calls the slope of the line: “X Variable 1” in the Excel printout).

The correlation between weight and gallons used is +0.91, a strong positive correlation: as weight increases, the number of gallons used also increases. This is why the regression line slopes upward to the right. The correlation appears as +0.91 in cell B53 of the SUMMARY OUTPUT in Fig. 6.30.

If the correlation between X and Y were negative, the regression line would “slope down to the right” above the X-axis. This would happen whenever the correlation between X and Y is a negative correlation between zero and minus one (0 and −1).
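A standard identity, not stated in the original text but true for any one-predictor least-squares line, explains why the slope and the correlation always have the same sign. Here s_X and s_Y are the sample standard deviations (STDEV) of X and Y, both of which are positive:

$$b = r \cdot \frac{s_Y}{s_X}$$

So b is negative exactly when r is negative, and b = 0 exactly when r = 0 (a horizontal regression line).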


Finding the Equation for the Regression Line

To find the regression equation that predicts the number of gallons used from a car's weight, we need two values from the SUMMARY OUTPUT in Fig. 6.30: B66 and B67.

The format for the regression line is:

$$Y = a + bX \qquad (6.3)$$

where a = the y-intercept (2.75 in our example, in cell B66) and b = the slope of the line (+1.0762 in our example, in cell B67).

Therefore, the equation for the best-fitting regression line for our example is:

$$Y = 2.75 + 1.0762X$$

Remember that Y is the number of gallons used that we are trying to predict, using the weight of the car as the predictor, X.

Let’s try an example using this formula to predict the number of gallons used for a car.

Using the Regression Line to Predict the y-Value

Objective: To find the number of gallons predicted for a car that weighs 3,000 pounds (Note: 3,000 pounds, when measured in thousands of pounds, is recorded as 3.0)

Important note: Remember that the weight of the car is measured in thousands of pounds.

Since the weight is 3,000 pounds (i.e., X = 3.0 in thousands of pounds), substituting this number into our regression equation gives:

$$Y = 2.75 + 1.0762(3.0)$$

$$Y = 5.98 \text{ gallons of gas needed to drive 150 miles}$$

Important note: Look at your chart and go directly upward from a weight of 3.0 until you hit the regression line. Then draw a line horizontally to the y-axis on the left, and you hit the y-axis just below 6 (actually, 5.98). This is the result above for predicting the number of gallons needed for a car weighing 3,000 pounds.

To estimate the number of gallons needed for a car weighing 3,500 pounds, convert the weight into thousands of pounds (X = 3.5), and substitute it into the regression equation:

$$Y = 2.75 + 1.0762(3.5)$$

$$Y = 6.52 \text{ gallons of gas needed to drive 150 miles}$$

You can also check this prediction on your chart: go directly upward from a weight of 3.5 until you hit the regression line, and then read across horizontally to the y-axis. You hit the y-axis at approximately 6.52.
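Both predictions are simple arithmetic with a = 2.75 and b = +1.0762 from the SUMMARY OUTPUT, and the Python sketch below repeats them. The helper name `predict_gallons` is ours. The function converts pounds to thousands of pounds first, because the regression was fitted to weights recorded in thousands of pounds.

```python
A = 2.75     # y-intercept from cell B66
B = 1.0762   # slope from cell B67


def predict_gallons(weight_lb):
    """Predicted gallons to drive 150 miles, using Y = a + bX with X in thousands of pounds."""
    x = weight_lb / 1000    # e.g., 3,000 pounds -> 3.0
    return A + B * x


for pounds in (3000, 3500):
    print(f"{pounds:,} lb -> {predict_gallons(pounds):.2f} gallons")
# 3,000 lb -> 5.98 gallons
# 3,500 lb -> 6.52 gallons
```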

For a more detailed discussion of regression, see Black (2010) and McKillup and Dyar (2010).

Adding the Regression Equation to the Chart

Objective: To Add the Regression Equation to the Chart

If you want to include the regression equation within the chart next to the regression line, you can do that, but a word of caution first.

Throughout this book, we are using the regression equation for one predictor and one criterion to be the following:

$$Y = a + bX \qquad (6.3)$$

where a = y-intercept and b = slope of the line

See, for example, the regression equation in Sect. 6.5.3, where the y-intercept was a = 2.75 and the slope of the line was b = +1.0762, which generate the following regression equation:

$$Y = 2.75 + 1.0762X$$

However, Excel 2013 uses a slightly different regression equation (which is logically identical to the one used in this book) when you add a regression equation to a chart:

$$Y = bX + a \qquad (6.4)$$

where a = y-intercept and b = slope of the line


Note that this equation is identical to the one we are using in this book with the terms arranged in a different sequence.

For the example we used in Sect. 6.5.3, Excel 2013 would write the regression equation on the chart as:

$$Y = 1.0762X + 2.75$$

This is the format that results when you use the following steps to add the regression equation to the chart in Excel 2013:

Open the file: GALLONS7 (that you saved in Sect. 6.5.2)

To modify the chart, click just inside the outer border at the top right corner to add a "gray border," which will allow you to select the chart for upcoming changes.

Right-click on any of the data-points in the chart

Highlight: Add Trendline, and click on it to select this command

To display the equation on the chart, select the "Linear button" located at the top left of the dialog box, and then click on "Display Equation on chart" at the bottom of the dialog box (refer to Fig 6.31).

Click on the X at the top right of the Format Trendline dialog box to remove this box

Click on any empty cell outside of the chart to deselect the chart

Note that the regression equation appears on the chart next to the regression line in the following form (see Fig. 6.32):

$$Y = 1.0762X + 2.75$$

Fig 6.31 Dialog Box for Adding the Regression Equation to the Chart Next to the Regression Line on the Chart

6.6 Adding the Regression Equation to the Chart 137

Fig 6.32 Example of a Chart with the Regression Equation Displayed Next to the Regression Line

Now, save this file as: GALLONS10, and print it out so that it fits onto one page.

How to Recognize Negative Correlations in the SUMMARY OUTPUT Table

Important note: Excel does not recognize negative correlations in the SUMMARY OUTPUT results. Because of an error by the programmer, it treats all correlations as positive. It is therefore essential to recognize that the correlation between X and Y may be negative even when the output shows a positive value.

You will know that the correlation between X and Y is a negative correlation when these two things occur:

(1) THE SLOPE, b, IS A NEGATIVE NUMBER. This can only occur when there is a negative correlation.

(2) THE CHART CLEARLY SHOWS A DOWNWARD SLOPE IN THE REGRESSION LINE, which can only occur when the correlation between X and Y is negative.
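Whatever the reason for the label, the value Excel shows as Multiple R equals the square root of R Square, so it never carries a minus sign. The minimal Python sketch below assumes one predictor and recovers the signed correlation r from clue (1): give Multiple R the sign of the slope b. The helper name `signed_r` is ours. The data are hypothetical, chosen so that Y falls as X rises.

```python
import math


def signed_r(multiple_r, slope):
    """Attach the sign of the slope b to Excel's (never negative) Multiple R."""
    return math.copysign(multiple_r, slope)


# Hypothetical data with a negative relationship.
x = [1, 2, 3, 4]
y = [4, 3, 1, 2]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)

b = sxy / sxx                                    # slope: negative here
multiple_r = math.sqrt(sxy ** 2 / (sxx * syy))   # what "Multiple R" would show
r = signed_r(multiple_r, b)
print(f"slope b = {b:.2f}, Multiple R = {multiple_r:.2f}, r = {r:.2f}")
# slope b = -0.80, Multiple R = 0.80, r = -0.80
```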

Printing Only Part of a Spreadsheet Instead of the Entire Spreadsheet

Printing Only the Table and the Chart on a Separate Page

Objective: To print only the table and the chart on a separate page

1. Left-click your mouse starting at the top left of the table in cell A3, and drag the mouse down and to the right so that all of the table and all of the chart are highlighted in light blue on your computer screen from cell A3 to cell H48 (the light blue cells are called the “selection” cells).

Print Active Sheet (hit the down arrow on the right)

The resulting printout should contain only the table of the data and the chart resulting from the data.

Then, click on any empty cell in your spreadsheet to deselect the table and chart.

Printing Only the Chart on a Separate Page

Objective: To print only the chart on a separate page

1. Click on any “white space” just inside the outside border of the chart in the top right corner of the chart to create the gray border around all of the borders of the chart in order to “select” the chart.

The resulting printout should contain only the chart resulting from the data.

Important note: Immediately after printing the chart on a separate page, click on any white space outside the chart to remove the gray border. The gray border tells Excel that you want to print only the chart, so remove it before you print anything else. Do this now.


Printing Only the SUMMARY OUTPUT of the Regression Analysis on a Separate Page

Objective: To print only the SUMMARY OUTPUT of the regression analysis on a separate page

1. Left-click your mouse on SUMMARY OUTPUT in cell A50 on the left of your spreadsheet, and drag the mouse down and to the right until all of the regression output is highlighted in dark blue on your screen from A50 to I67. (Change the “Scale to Fit” to 60% so that the SUMMARY OUTPUT will fit onto one page when you print it out.)

Print active sheets (hit the down arrow on the right)

The resulting printout should contain only the SUMMARY OUTPUT of the regression analysis on a separate page.

Finally, click on any empty cell on the spreadsheet to “deselect” the regression table.

End-of-Chapter Practice Problems

Suppose that you want to study the relationship between the refluxing time and the amount of tin extracted from a product when it is boiled with hydrochloric acid. The refluxing time, measured in minutes, is the predictor variable (X). The amount of tin extracted, measured in milligrams per kilogram (mg/kg), is the criterion variable (Y). To test your Excel skills, you have randomly selected 12 samples of this process and recorded the hypothetical scores on these variables in Fig. 6.33:

6.9 End-of-Chapter Practice Problems 141

(a) Create an Excel spreadsheet, and enter the data using REFLUXING TIME (min) as the independent variable (predictor) and TIN EXTRACTED (mg/kg) as the dependent variable (criterion).

Important note: When you check the correlation between two variables in Excel, always put the predictor variable, X, in the left column and the criterion variable, Y, in the column directly to its right. Following this arrangement every time will help you avoid confusion.

(b) Create a table in Excel that uses the CORREL function to find the correlation between the two variables, rounded to two decimal places. Label the correlation, and place it below the table. Then, create an XY scatterplot of the two sets of data with these specifications:

• Top title: RELATIONSHIP BETWEEN REFLUXING TIME AND AMOUNT OF TIN EXTRACTED

• x-axis title: REFLUXING TIME (min)

• y-axis title: TIN EXTRACTED (mg/kg)

• re-size the chart so that it is 8 columns wide and 25 rows long

• move the chart below the table
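Excel's CORREL function returns the Pearson correlation coefficient, r. As a cross-check on the spreadsheet, the same statistic can be computed directly from its definition. The sketch below is a minimal Python illustration, assuming two plain lists of equal length; the helper name `pearson_r` is hypothetical, and the values are made up for illustration and are not the data in Fig. 6.33.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient r (the statistic Excel's CORREL returns)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical refluxing times (min) and tin extracted (mg/kg), for illustration only
time_min = [10, 20, 30, 40, 50]
tin_mg_kg = [12.0, 15.0, 21.0, 24.0, 30.0]
print(f"r = {pearson_r(time_min, tin_mg_kg):.2f}")  # r = 0.99
```

Rounding to two decimal places in the f-string mirrors the two-decimal formatting the problem asks for on the spreadsheet.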

(c) Create the least-squares regression line for these data on the scatterplot and add the regression equation to the chart.

142 6 Correlation and Simple Linear Regression

(d) Use Excel to run the regression statistics to find the equation for the least-squares regression line for these data, and display the results below the chart on your spreadsheet. Use number format (two decimal places for the correlation and three decimal places for all the other decimal numbers, including the coefficients).
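The least-squares line Y = a + bX in the regression output can also be computed directly: the slope is b = Sxy / Sxx (sum of cross-products divided by the sum of squares of X about its mean), and the intercept is a = (mean of Y) - b(mean of X). In Excel, the SLOPE(known_y's, known_x's) and INTERCEPT(known_y's, known_x's) worksheet functions return the same two numbers. The Python sketch below is a minimal illustration; the helper names `least_squares_line` and `predict` are hypothetical, and the data are made up, not taken from Fig. 6.33.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + b*X.

    These are the same values Excel's INTERCEPT(ys, xs) and SLOPE(ys, xs) return.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # y-intercept: the line passes through (mean_x, mean_y)
    return a, b


def predict(a, b, x):
    """Predicted Y for a given X using Y = a + b*X."""
    return a + b * x


# Hypothetical data for illustration only (not Fig. 6.33)
time_min = [10, 20, 30, 40, 50]
tin_mg_kg = [12.0, 15.0, 21.0, 24.0, 30.0]
a, b = least_squares_line(time_min, tin_mg_kg)
print(f"Y = {a:.3f} + {b:.3f}X")                        # Y = 6.900 + 0.450X
print(f"Predicted at 60 min: {predict(a, b, 60):.3f}")  # Predicted at 60 min: 33.900
```

The prediction step is the same arithmetic asked for in question (5): substitute the X value into the regression equation.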

(e) Print the input data and the chart so that this information fits onto one page. Then, print the regression output table on a separate page so that it fits onto that page.

(f) save the file as: TIN3

Now, answer these questions using your Excel printout:

(1) What is the correlation coefficient r?

(3) What is the slope of the line?

(4) What is the regression equation for these data (use three decimal places for the y-intercept and the slope)?

(5) Use the regression equation to predict the amount of tin extracted you would expect for a refluxing time of 60 minutes.

Permafrost is soil, sediment, or rock that remains frozen, at or below 0 °C, for two or more years. It is commonly found in high-altitude regions such as the Rocky Mountains in Colorado. In geophysical studies, researchers use drill holes to gather data on the relationship between down-hole depth (in meters) and temperature (in degrees Celsius).

To test your Excel skills, suppose that you have taken a small random sample of drill holes and recorded the hypothetical data shown in Fig. 6.34.

Create an Excel spreadsheet and enter the data using DEPTH (meters) as the independent variable (predictor) and TEMPERATURE (degrees centigrade) as the dependent variable (criterion).

(a) Create an XY scatterplot of these two sets of data such that:

Fig 6.34 Worksheet Data for Chap 6: Practice Problem #2


• top title: RELATIONSHIP BETWEEN DOWN-HOLE DEPTH AND TEMPERATURE

• y-axis title: TEMPERATURE (degrees centigrade)

• re-size the chart so that it is 7 columns wide and 25 rows long

• move the chart below the table

Create the least-squares regression line for these data on the scatterplot. Then use Excel to run the regression statistics to find the equation for the least-squares regression line, and display the results below the chart on your spreadsheet. Format the correlation coefficient, r, the y-intercept, and the slope to two decimal places, and all other decimal numbers to four decimal places.

Print the input data and the chart so that this information fits onto one page. Then, print the regression output table on a separate page.

(1a) Circle and label the values of the y-intercept and the slope of the regression line on that separate page.

(2b) Read from the graph the temperature you would predict for a depth of three meters, and write your answer in the space immediately below: _

(f) save the file as: DEPTH3

Answer the following questions using your Excel printout:

3 What is the slope of the line?

4 What is the regression equation for these data (use two decimal places for the y-intercept and the slope)?

5 Use that regression equation to predict the temperature you would expect for a down-hole depth of two meters.

(Note that this correlation is not the multiple correlation as the Excel table indicates, but is merely the correlation r instead.)

Note that you found a positive correlation of +.94 between depth and temperature. You know that the correlation is a positive correlation for two reasons:

(1) the regression line slopes upward and to the right on the chart, signaling a positive correlation, and (2) the slope is +0.53 which also tells you that the correlation is a positive correlation.

But how does Excel treat negative correlations?


Important note: Excel does not recognize negative correlations in the SUMMARY OUTPUT; it reports Multiple R as a positive number even when the correlation between the two variables is negative. Because not all correlations are positive, pay careful attention to any negative correlation, which indicates an inverse relationship between the two variables.

You know that the correlation is negative when:

(1) The slope, b, is a negative number which can only occur when there is a negative correlation.

(2) The chart clearly shows a downward slope in the regression line, which can only happen when the correlation is negative.
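Because the SUMMARY OUTPUT always reports Multiple R as a non-negative number, the sign of r has to be recovered from the slope. With a single predictor, Multiple R equals the absolute value of r, and r has the same sign as the slope b. The Python sketch below illustrates this rule; the helper names (`pearson_r`, `slope`, `signed_r`) are hypothetical, and the values are made up to show an inverse relationship, not the glue-strength data.

```python
from math import copysign, sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def slope(xs, ys):
    """Least-squares slope b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def signed_r(multiple_r, b):
    """Attach the sign of the slope b to Excel's (always non-negative) Multiple R."""
    return copysign(multiple_r, b)


# Hypothetical data with an inverse relationship, for illustration only
x = [20, 30, 40, 50, 60]
y = [950.0, 880.0, 900.0, 760.0, 740.0]
r = pearson_r(x, y)
multiple_r = abs(r)  # what a one-predictor SUMMARY OUTPUT would show as Multiple R
b = slope(x, y)
print(f"Multiple R = {multiple_r:.2f}, slope b = {b:.2f}, r = {signed_r(multiple_r, b):.2f}")
# Multiple R = 0.93, slope b = -5.40, r = -0.93
```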

In a controlled laboratory study, you are examining the relationship between glue strength and temperature in wooden joints, which matters for applications such as furniture and construction. A company has commissioned tests of the adhesive it uses to bond wood, focusing on cross-grain joints made from uniform pieces of wood. Each test piece is the same size, and the glue is applied in the same way to all surfaces. The strength of each joint is measured in newtons (N) by applying pressure with a force-measuring machine, and the laboratory allows precise control of the temperature. To test your Excel skills, use the hypothetical data in Fig 6.35.

Fig 6.35 Worksheet Data for Chap 6: Practice Problem #3


Create an Excel spreadsheet and enter the data using Force (N) as the independent variable (predictor) and Temperature (°C) as the dependent variable (criterion). Use Excel's CORREL function to find the correlation between these two variables, rounded to two decimal places; label the correlation and place it below the table.

(a) Create an XY scatterplot of these two sets of data such that:

• top title: RELATIONSHIP BETWEEN GLUE STRENGTH AND TEMPERATURE

• move the chart below the table

• re-size the chart so that it is 8 columns wide and 25 rows long

(b) Create the least-squares regression line for these data on the scatterplot, and add the regression equation to the chart.

Use Excel to run the regression statistics to find the equation for the least-squares regression line for these data, and display the results below the chart on your spreadsheet. Use two decimal places for the correlation and three decimal places for all other decimal numbers, including the coefficients.

Print the input data and the chart so that this information fits onto one page. Then, print the regression output table on a separate page so that it fits onto that page.

(e) save the file as: GLUE3

Answer the following questions using your Excel printout:

1 What is the correlation between Force and Temperature?

3 What is the slope of the line?

4 What is the regression equation?

5 Use the regression equation to predict the Temperature you would expect for a force of 800 N. Show your work on a separate sheet of paper.

Black K. Business statistics: for contemporary decision making. 6th ed. Hoboken: John Wiley & Sons, Inc.; 2010.

Ledolter J, Hogg R. Applied statistics for engineers and physical scientists. 3rd ed. Upper Saddle River: Pearson Prentice Hall; 2010.


Levine D, Stephan D, Krehbiel T, Berenson M. Statistics for managers using Microsoft Excel. 6th ed. Boston: Pearson Prentice Hall; 2011.

McCleery R, Watt T, Hart T. Introduction to statistics for biology. 3rd ed. Boca Raton: Chapman & Hall/CRC; 2007.

McKillup S, Dyar M. Geostatistics explained: an introductory guide for earth scientists. Cambridge: Cambridge University Press; 2010.

In the physical sciences, you often want to know whether a combination of several predictors (e.g., X1, X2, X3) predicts a criterion, Y, better than a single predictor, X, does.

The resulting statistical procedure is called “multiple correlation” because it uses two or more predictors in combination to predict Y, instead of a single predictor, X.

Multiple Regression Equation

The multiple regression equation follows a similar format and is:

$$Y = a + b_1X_1 + b_2X_2 + b_3X_3 + \cdots \qquad (7.2)$$

where the number of terms depends on the number of predictors used.

The “weight” given to each predictor in the equation is represented by the letter b, with a subscript that matches its predictor (b1 for X1, b2 for X2, and so on).

The multiple correlation, Rxy, ranges from 0 to +1 and indicates the strength of the relationship between the predictors, taken together, and the criterion. Unlike the correlation coefficient, r, which can range from -1 to +1, Rxy can never be negative.
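Equation (7.2) can be fitted without the ToolPak by solving the least-squares normal equations, and the multiple correlation can then be computed as the correlation between the observed Y values and the predicted Y values, which is why it can never be negative. The Python sketch below is a minimal illustration under those assumptions (plain Gaussian elimination, no handling of perfectly collinear predictors); the function names are hypothetical, Excel's Regression tool may compute these quantities differently internally, and the data are made up.

```python
from math import sqrt


def fit_multiple_regression(x_rows, y):
    """Least-squares coefficients [a, b1, ..., bk] for Y = a + b1*X1 + ... + bk*Xk.

    x_rows holds one list of k predictor values per observation.
    Solves the normal equations (A^T A) beta = A^T y by Gaussian elimination.
    """
    # Design matrix: a leading 1 in each row carries the intercept a
    A = [[1.0] + [float(v) for v in row] for row in x_rows]
    p = len(A[0])
    ata = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(p)]
    M = [ata[i] + [aty[i]] for i in range(p)]  # augmented matrix
    for col in range(p):
        # Partial pivoting: move the row with the largest pivot into place
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= factor * M[col][c]
    beta = [0.0] * p
    for i in range(p - 1, -1, -1):  # back substitution
        beta[i] = (M[i][p] - sum(M[i][j] * beta[j] for j in range(i + 1, p))) / M[i][i]
    return beta


def predict(beta, x_row):
    """Predicted Y = a + b1*X1 + ... + bk*Xk."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], x_row))


def multiple_r(x_rows, y, beta):
    """Multiple correlation: Pearson r between observed Y and predicted Y (never negative)."""
    y_hat = [predict(beta, row) for row in x_rows]
    n = len(y)
    my, mh = sum(y) / n, sum(y_hat) / n
    sxy = sum((a - my) * (b - mh) for a, b in zip(y, y_hat))
    syy = sum((a - my) ** 2 for a in y)
    shh = sum((b - mh) ** 2 for b in y_hat)
    return sxy / sqrt(syy * shh)


# Hypothetical data: two predictors (X1, X2) and a criterion Y, for illustration only
x_rows = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8], [6, 4]]
y = [9.5, 7.0, 22.5, 18.0, 34.0, 25.5]
beta = fit_multiple_regression(x_rows, y)
print("a, b1, b2 =", [round(v, 3) for v in beta])
print(f"Multiple R = {multiple_r(x_rows, y, beta):.2f}")
```

With a single predictor, the same `multiple_r` function returns the absolute value of r, which matches how Excel labels a one-predictor correlation as Multiple R.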

T.J Quirk et al., Excel 2013 for Physical Sciences Statistics,

Important note: In order to do multiple regression, you need to have installed the “Data Analysis ToolPak” that was described in Chap 6 (see Sect. 6.5.1). If you did not install this, you need to do so now.

The SAT Reasoning Test is a standardized college admissions test in the U.S. that evaluates students' readiness for academic work; approximately 1.4 million high school students take it each year. The test has three subtests: Critical Reading, Writing, and Mathematics. Each subtest is scored from 200 to 800, with an average of about 500.

A selective college in the northeast U.S. wants to know how well the SAT Reading, Writing, and Math scores predict the freshman grade-point average (FROSH GPA) of its Chemistry majors. The college has asked you to analyze data from 11 randomly selected chemistry majors in last year's freshman class, using the three SAT subtest scores as the predictors (X1, X2, X3) and FROSH GPA as the criterion (Y).

Let’s use the following notation:

Suppose, further, that you have collected the following hypothetical data summarizing these scores (see Fig. 7.1):

150 7 Multiple Correlation and Multiple Regression

Create an Excel spreadsheet for these data using the following cell reference:

A4: Is there a relationship between SAT scores and Freshman GPA at a local college?

Next, change the column width to match the above table, and change all GPA figures to number format (two decimal places).

Now, fill in the additional data in the chart such that:

Then, center all numbers in your table

Always check the numbers in your table to make sure that your spreadsheet is correct. Save this file as: GPA25

Fig 7.1 Worksheet Data for SAT versus FROSH GPA (Practical Example)

Before we do the multiple regression analysis, we need to try to make one important point very clear:

When you use a single predictor variable, X, to predict a criterion variable, Y, put the X variable on the left side of your table and the Y variable on the right side. This arrangement keeps you from confusing the two variables (see Sect. 6.3).

However, in multiple regression, you need to follow this rule which is exactly the opposite:

In multiple regression, put the criterion variable, Y, in the far-left column of your table, and put all of the predictor variables in the columns to its right. With this layout you can see at a glance which variable is the criterion and which are the predictors, and you will avoid confusion and errors in your analysis.

In the table in Fig 7.1, the criterion variable, Y (FROSH GPA), is in the far-left column, and the three predictors (READING SCORE, WRITING SCORE, and MATH SCORE) are to its right. Following this arrangement helps minimize errors in your analysis.

Finding the Multiple Correlation and the Multiple Regression Equation

Objective: To find the multiple correlation and multiple regression equation using Excel.

You do this by the following commands:

Click on: Data Analysis (far right top of screen)

Regression (scroll down to this in the box; see Fig.7.2)


Note that both the Input Y Range and the Input X Range above include the labels at the top of the columns.

Click on the Labels box to add a check mark to it (because you have included the column labels in row 6)

Output Range (click on the button to its left, and enter): A20 (see Fig.7.3)

Excel automatically adds dollar signs ($) before the column letter and row number of each cell reference in these ranges, making them absolute references.

Fig 7.2 Dialogue Box for Regression Function

7.2 Finding the Multiple Correlation and the Multiple Regression Equation 153

OK (see Fig.7.4to see the resulting SUMMARY OUTPUT)

Fig 7.3 Dialogue Box for SAT vs FROSH GPA Data

Fig 7.4 Regression SUMMARY OUTPUT of SAT vs FROSH GPA Data


Next, format cell B23 in number format (two decimal places)

Next, format the following four cells in Number format (four decimal places):

Change all other decimal figures to two decimal places, and center all figures within their cells.

Save the file as: GPA26

Now, print the file so that it fits onto one page by changing the scale to 60%. The resulting regression analysis is given in Fig. 7.5.

Fig 7.5 Final Spreadsheet for SAT vs FROSH GPA Regression Analysis


Use the SUMMARY OUTPUT to find the multiple correlation and the regression equation that best fits the data. In this analysis, READING SCORE, WRITING SCORE, and MATH SCORE are the three predictor variables, and FROSH GPA is the criterion variable.

Multiple R in the SUMMARY OUTPUT is Excel's term for the multiple correlation, which here is +0.80. This indicates that READING SCORE, WRITING SCORE, and MATH SCORE, taken together, have a strong positive relationship with FROSH GPA.

To find the regression equation, notice the coefficients at the bottom of the SUMMARY OUTPUT:

Intercept : a (this is the y-intercept) 1.5363

Since the general form of the multiple regression equation is:

$$Y = a + b_1X_1 + b_2X_2 + b_3X_3 \qquad (7.2)$$

we can now write the multiple regression equation for these data:

Using the Regression Equation to Predict

Objective: To find the predicted FROSH GPA using an SAT Reading Score of 600, an SAT Writing Score of 500, and an SAT Math Score of 550.

Plugging these three numbers into our regression equation gives us:

$$Y = 3.20$$

(rounded to two decimal places, since GPA scores are typically reported to two decimals)

If you want to learn more about the theory behind multiple regression, see Keller (2009).


Using Excel to Create a Correlation Matrix in Multiple Regression

The final step in multiple regression is to find the correlation between all of the variables that appear in the regression equation.

In our example, this means that we need to find the correlation between each of the six pairs of variables:

To do this, we need to use Excel to create a “correlation matrix.” This matrix summarizes the correlations between all of the variables in the problem.
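A correlation matrix is Pearson's r computed for every pair of variables; with four variables there are 4 × 3 / 2 = 6 distinct pairs, and each diagonal entry is a variable's correlation with itself (1). The Python sketch below builds the full symmetric matrix from made-up columns (not the SAT data) and prints the lower triangle, which is the layout Excel's Correlation tool uses. The helper names `pearson_r` and `correlation_matrix` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """columns: dict mapping a label to its list of values (all the same length).

    Returns a dict of dicts holding r for every pair of labels; the diagonal is 1.0.
    """
    labels = list(columns)
    matrix = {a: {b: 1.0 if a == b else None for b in labels} for a in labels}
    for a, b in combinations(labels, 2):
        r = pearson_r(columns[a], columns[b])
        matrix[a][b] = matrix[b][a] = r  # the matrix is symmetric
    return matrix


# Hypothetical columns for illustration only (criterion listed first, as in the text)
data = {
    "Y": [2.1, 2.8, 3.0, 3.4, 3.9],
    "X1": [450, 500, 560, 600, 640],
    "X2": [480, 470, 550, 520, 610],
    "X3": [500, 540, 600, 610, 700],
}
m = correlation_matrix(data)
print("pairs:", len(list(combinations(data, 2))))  # pairs: 6
labels = list(data)
for i, a in enumerate(labels):  # lower triangle, row by row
    print(a, " ".join(f"{m[a][b]:.2f}" for b in labels[: i + 1]))
```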

Objective: To use Excel to create a correlation matrix between the four variables in this example.

To use Excel to do this, use these steps:

Data (top of screen under “Home” at the top left of screen)

Correlation (scroll up to highlight this formula; see Fig. 7.6)

Input range: include all four variables (FROSH GPA, READING SCORE, WRITING SCORE, and MATH SCORE), together with the column labels at the top of each column.

Fig 7.6 Dialogue Box for SAT vs FROSH GPA Correlations

7.4 Using Excel to Create a Correlation Matrix in Multiple Regression 157

Put a check in the box for: Labels in the First Row (since you included the labels at the top of the columns in your input range of data above)

Output range (click on the button to its left, and enter): A42 (see Fig.7.7)

The resulting correlation matrix appears in A42:E46 (see Fig.7.8).

To enhance the correlation matrix, format all decimal numbers to two decimal places and adjust the width of column E to ensure that the MATH SCORE label fits neatly within cell E42.

Save this Excel file as: GPA27

The final spreadsheet for these scores appears in Fig.7.9.

Fig 7.7 Dialogue Box for Input/Output Range for Correlation Matrix

Fig 7.8 Resulting Correlation Matrix for SAT Scores vs FROSH GPA Data


In a correlation matrix, the entries of 1 on the diagonal are each variable's correlation with itself, which is always a perfect positive correlation of 1.0. Correlations are usually reported to two decimal places. You are now ready to read the correlations among the six pairs of variables.

The strongest correlation is between MATH SCORE and FROSH GPA (+0.77). READING SCORE and FROSH GPA correlate at +0.51, and WRITING SCORE and FROSH GPA at +0.45. Among the predictors themselves, READING SCORE and WRITING SCORE correlate at +0.47, MATH SCORE and READING SCORE at +0.44, and MATH SCORE and WRITING SCORE at +0.43.

This means that the best single predictor of FROSH GPA is MATH SCORE, with a correlation of +.77. Adding the other two predictor variables, READING SCORE and WRITING SCORE, improved the prediction only slightly, by 0.03, to a multiple correlation of +0.80. MATH SCORE by itself is therefore a strong predictor of FROSH GPA.

Fig 7.9 Final Spreadsheet for SAT Scores vs FROSH GPA Regression and the Correlation Matrix

If you want to learn more about the correlation matrix, see Levine et al. (2011).

End-of-Chapter Practice Problems

The burn rate of gunpowder depends on several of its chemical and physical properties, particularly particle size: smaller particles burn faster and generate higher pressures and temperatures. Gunpowder manufacturers therefore produce powders of different sizes and shapes to control the burn rate for specific types of ammunition and firearms.

Gunpowder used in cannons is typically larger and burns more slowly, while rifle gunpowder is smaller and burns more quickly. If gunpowder burns too quickly or generates too much heat, it can create dangerous pressures that may cause the weapon's barrel to fail, possibly in an explosion.

The SI unit of pressure is the pascal (Pa); the pressure in a 50 caliber military rifle typically reaches about 370 MPa (1 MPa = 1,000,000 Pa). A gunpowder manufacturer has been asked to develop an experimental gunpowder load that produces a pressure of 378 MPa in the rifle's chamber, which holds the cartridge containing the bullet and the gunpowder. To meet this requirement, the company will use a single chemical formula for the gunpowder and vary only the particle sizes, in four distinct categories.

Four types of powder (Round, Cylinder, Flake, and Irregular) were tested, and a blend of these powders was found to produce the most consistent burn, giving similar results with each firing. Each 50 caliber cartridge will contain a total of 214 grains (gr) of gunpowder.

To evaluate your Excel skills, you have chosen to conduct multiple correlation and multiple regression analysis using data from a random sample of 12 test firings of mixed powders, as illustrated in Fig 7.10.


(a) Create an Excel spreadsheet using Breech pressure (MPa) as the criterion (Y), and the other variables as the predictors.

(b) Use Excel’s multiple regression function to find the relationship between these five variables and place it below the table.

(c) Use number format (two decimal places for the multiple correlation on the SUMMARY OUTPUT, and use four decimal places for the coefficients in the SUMMARY OUTPUT).

(d) Print the table and regression results below the table so that they fit onto one page.

(e) Save this file as: GUNPOWDER8

Answer the following questions using your Excel printout:

1 What is the multiple correlation Rxy?

3 What is the coefficient for Round, b1?

4 What is the coefficient for Cylinder, b2?

5 What is the coefficient for Flake, b3?

6 What is the coefficient for Irregular, b4?

7 What is the multiple regression equation?

8 Predict the Breech pressure you would expect for a Round score of 63, a Cylinder score of 58, a Flake score of 41, and an Irregular score of 50.

(f) Now, go back to your Excel file and create a correlation matrix for these five variables, and place it underneath the SUMMARY OUTPUT.

(g) Re-save this file as: GUNPOWDER8

(h) Now, print out just this correlation matrix on a separate sheet of paper.

Answer the following questions using your Excel printout. Be sure to include the plus or minus sign for each correlation:

9 What is the correlation between Round and Breech pressure?

10 What is the correlation between Cylinder and Breech pressure?

Fig 7.10 Worksheet Data for Chap 7: Practice Problem #1

7.5 End-of-Chapter Practice Problems 161

11 What is the correlation between Flake and Breech pressure?

12 What is the correlation between Irregular and Breech pressure?

13 What is the correlation between Cylinder and Round?

14 What is the correlation between Flake and Cylinder?

15 Discuss which of the four predictors is the best predictor of Breech pressure.

16 Explain in words how much better the four predictor variables together predict Breech pressure than the best single predictor by itself.

In studying locomotive engines, it is important to analyze how the stopping distance of a train (in feet) is related to two factors: the enforcement speed of the brakes (in miles per hour) and the weight of the train (in tons). Stopping distance is the dependent variable (criterion), and enforcement speed and weight are the independent variables (predictors).

The data from 13 test runs are presented in Fig. 7.11.

(a) create an Excel spreadsheet using stopping distance as the criterion (Y), and the other variables as the two predictors of this criterion.

(b) Use Excel’s multiple regression function to find the relationship between these variables and place it below the table.

For the Summary Output, format the multiple correlation values to two decimal places, while ensuring that the coefficients and all other decimal figures are displayed with three decimal places.

(d) Print the table and regression results below the table so that they fit onto one page.

(e) By hand on this printout, circle and label:

(2b) coefficients for the y-intercept, enforcement speed, and weight

Fig 7.11 Worksheet Data for Chap 7: Practice Problem #2


(f) Save this file as: TRAIN3

(g) Go back to your Excel file, create a correlation matrix for the three variables, and place it underneath the SUMMARY OUTPUT. Round each correlation to two decimal places. Re-save the file as: TRAIN3

(h) Now, print out just this correlation matrix in portrait mode on a separate sheet of paper.

Answer the following questions using your Excel printout:

1 What is the multiple correlation Rxy?

3 What is the coefficient for enforcement speed, b1?

4 What is the coefficient for weight, b2?

5 What is the multiple regression equation?

6 Underneath this regression equation by hand, predict the stopping distance you would expect for an enforcement speed of 10.8 mph and a weight of 4,600 tons.

Answer the following questions using your Excel printout. Be sure to include the plus or minus sign for each correlation:

7 What is the correlation between enforcement speed and stopping distance?

8 What is the correlation between weight and stopping distance?

9 What is the correlation between enforcement speed and weight?

10 Discuss which of the two predictors is the better predictor of stopping distance.

11 Explain in words how much better the two predictor variables combined predict stopping distance than the better single predictor by itself.

“Grow Your Own Crystals” kits are popular home science projects for growing colorful crystals from simple solutions. How well the crystals grow depends on key factors such as temperature (°C), humidity (%), and time (days). As a new company selling these kits, you need to determine which combination of these variables produces the most crystal growth. Here, crystal GROWTH, measured in centimeters (cm), is the dependent variable (criterion), and temperature, humidity, and time are the independent variables (predictors).

To test your Excel skills, you will use multiple correlation and multiple regression to analyze data from 10 random trials of crystal growth, shown in Fig. 7.12.


(a) Create an Excel spreadsheet using GROWTH (cm) as the criterion and the other three variables as the predictors.

Use Excel's multiple regression function to find the relationship among these four variables, and place the SUMMARY OUTPUT below the table. Format the multiple correlation to two decimal places, and the coefficients and all other decimal numbers in the SUMMARY OUTPUT to three decimal places.

(d) Save the file as: CRYSTAL8

(e) Print the table and regression results below the table so that they fit onto one page.

Answer the following questions using your Excel printout:

3 What is the coefficient for Temperature, b1?

4 What is the coefficient for Humidity, b2?

5 What is the coefficient for Time, b3?

6 What is the multiple regression equation?

7 Predict the GROWTH you would expect for a Temperature of 25 °C, a Humidity of 34%, and a Time of 6 days.

(f) Now, go back to your Excel file and create a correlation matrix for these four variables, and place it underneath the SUMMARY OUTPUT on your spreadsheet.

(g) Re-save this file as: CRYSTAL8

(h) Now, print out just this correlation matrix on a separate sheet of paper.

Fig 7.12 Worksheet Data for Chap 7: Practice Problem #3


Answer the following questions using your Excel printout. Be sure to include the plus or minus sign for each correlation:

8 What is the correlation between Temperature and GROWTH?

9 What is the correlation between Humidity and GROWTH?

10 What is the correlation between Time and GROWTH?

11 What is the correlation between Humidity and Temperature?

12 What is the correlation between Time and Temperature?

13 What is the correlation between Time and Humidity?

14 Discuss which of the three predictors is the best predictor of GROWTH.

15 Explain in words how much better the three predictor variables combined predict GROWTH than the best single predictor by itself.

Keller G. Statistics for management and economics. 8th ed. Mason: South-Western Cengage Learning; 2009.

Ledolter J, Hogg R. Applied statistics for engineers and physical scientists. 3rd ed. Upper Saddle River: Pearson Prentice Hall; 2010.

Levine D, Stephan D, Krehbiel T, Berenson M. Statistics for managers using Microsoft Excel. 6th ed. Boston: Pearson Prentice Hall; 2011.

One-Way Analysis of Variance (ANOVA)

In this Excel 2013 Guide, you have learned how to use a one-group t-test to compare a sample mean to a population mean, and a two-group t-test to compare two sample means. But what test should you use when you have more than two groups and want to know whether their means differ significantly?

The answer to this question is: Analysis of Variance (ANOVA).

The ANOVA test allows you to test for the difference between the means when you have three or more groups in your research study.

To perform a one-way Analysis of Variance (ANOVA), you need to have installed the “Data Analysis ToolPak” described in Chap 6 (see Sect. 6.5.1). If you have not yet installed it, do so before proceeding.

As a research scientist for a tire company, you conducted a laboratory test comparing your premium tire brand (Brand A) with two major competitors (Brands B and C). The test measured the simulated number of miles driven until the tread wore down to a specified threshold. The results are recorded in thousands of miles; for example, a result of 63 means 63,000 miles.

You want to determine whether there is a significant difference in the miles driven among the three tire brands, so you have tested a random sample of tires from each brand (see Fig. 8.1). Note that the number of tires tested is not the same for each brand. ANOVA can still be used when the groups differ in size, which is one reason statisticians describe it as a “very robust test.”


Create an Excel spreadsheet for these data in this way:

A6: (Data are in thousands of miles)

Using Excel to Perform a One-Way Analysis of Variance (ANOVA)

Objective: To use Excel to perform a one-way ANOVA test.

You are now ready to perform an ANOVA test on these data using the following steps:

Fig 8.1 Worksheet Data for Tire Mileage Test (Practical Example)

168 8 One-Way Analysis of Variance (ANOVA)

Data (at top of screen)

Data Analysis (far right at top of screen)

Anova: Single Factor (scroll up to this formula and highlight it; see Fig.8.2)

Input range: B8:D18 (note that you have included in this range the column titles that are in row 8)

When the groups have different sample sizes, the INPUT RANGE must still be a rectangle: it starts at the column title of the first group on the left, extends to the column of the last group on the right, and includes every row that contains data for any group. For instance, because Brand B has a value of 63 in cell C18, your INPUT RANGE must extend down to row 18.

Put a check mark in: Labels in First Row

Output range (click on the button to its left): A20 (see Fig.8.3)

Fig 8.2 Dialog Box for Data Analysis: Anova Single Factor

8.1 Using Excel to Perform a One-Way Analysis of Variance (ANOVA) 169

Center all of the numbers in the ANOVA table, and round off all numbers that are decimals to two decimal places.

Save this file as: TIRE6A

You should have generated the table given in Fig.8.4.

Fig 8.3 Dialog Box for Anova: Single Factor Input/Output Range


To ensure all information is displayed on a single page, print both the data table and the ANOVA summary table by adjusting the Page Layout settings to fit the scale at 85%.

As a check on your analysis, you should have the following in these cells:

Now, let’s discuss how you should interpret this table:

Fig 8.4 ANOVA Results for Tire Mileage Test

How to Interpret the ANOVA Table Correctly

Objective: To interpret the ANOVA table correctly

ANOVA (Analysis of Variance) lets you compare the means of three or more groups of data. It uses the F-test statistic, denoted by the letter F, to determine whether there are significant differences among the group means.

The formula for the F-test is this:

F = Mean Square between groups (MS_b) divided by Mean Square within groups (MS_w):

$$F = \frac{MS_b}{MS_w}$$
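To show where MS_b and MS_w come from, here is a minimal Python sketch of a one-way ANOVA computed directly from raw data, assuming the groups may have different sizes (as with the three tire brands). The function name `one_way_anova` and the toy data are hypothetical and are not the tire data from Fig. 8.1; the sketch only illustrates the arithmetic behind Excel's Anova: Single Factor table.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return MS_b, MS_w, their degrees of freedom, and F for a one-way ANOVA.

    `groups` is a list of lists of observations; the lists may differ in length.
    """
    k = len(groups)                                   # number of groups
    n_total = sum(len(g) for g in groups)             # total sample size
    grand_mean = fmean(x for g in groups for x in g)  # mean of all observations

    # Between groups: distance of each group mean from the grand mean, weighted by group size.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: distance of each observation from its own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between   # MS_b
    ms_within = ss_within / df_within      # MS_w
    return {
        "ms_between": ms_between,
        "ms_within": ms_within,
        "df_between": df_between,
        "df_within": df_within,
        "f": ms_between / ms_within,        # F = MS_b / MS_w
    }


# Hypothetical toy data, chosen so the arithmetic is easy to follow by hand.
result = one_way_anova([[1, 2, 3], [4, 5, 6]])
print(result["f"])  # 13.5

# Equal group means give F = 0: F is never negative, but it can be smaller than 1.
print(one_way_anova([[1, 5], [2, 4]])["f"])  # 0.0
```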

This Excel Guide focuses on how to use Excel rather than on the statistical theory behind the ANOVA formulas. For an in-depth treatment of ANOVA, see Hibbert and Gooding (2006) and Black (2010).

In Excel, the value in cell D31 (MS_b = 41.50) divided by the value in cell D32 (MS_w = 3.24) gives the F-test value of 12.83 shown in cell E31. If you divide the displayed values 41.50 by 3.24 on a calculator, you get 12.81 instead; Excel's 12.83 is more accurate because Excel divides the unrounded mean squares rather than the two-decimal values shown in the table.

To assess if the F value of 12.83 signifies a meaningful difference among the means of the three tire brands, we must first establish the null hypothesis and the research hypothesis for these brands.

For our mileage comparison, the null hypothesis states that the population means of the three groups are equal, while the research hypothesis states that the population means are not all equal. Based on the ANOVA results, you must decide which hypothesis to accept.

Using the Decision Rule for the ANOVA F-test

To state the hypotheses, let's call Brand A Group 1, Brand B Group 2, and Brand C Group 3. The hypotheses would then be:

H0: μ1 = μ2 = μ3

H1: μ1, μ2, and μ3 are not all equal

The decision-making process for this question mirrors the rules established for the one-group and two-group t-tests outlined in Sections 4.1.6 and 5.1.8 of this book.

If the absolute value of t is less than the critical t, you accept the null hypothesis.

or

If the absolute value of t is greater than the critical t, you reject the null hypothesis, and accept the research hypothesis.

Now, here is the decision rule for ANOVA:

Objective: To learn the decision rule for the ANOVA F-test

The decision rule for the ANOVA F-test is the following:

If the value of F is less than the critical F-value, accept the null hypothesis.

or

If the value of F is greater than the critical F-value, reject the null hypothesis, and accept the research hypothesis.

Note that Excel gives you the critical F-value in cell G31: 3.55.

Therefore, our decision for the ANOVA test is this:

Since the value of F of 12.83 is greater than the critical F-value of 3.55, we reject the null hypothesis and accept the research hypothesis.
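As a minimal sketch, the F-test decision rule can be written as a small Python function; the name `anova_f_decision` is hypothetical. If you want to reproduce the critical value yourself, Excel's `F.INV.RT(0.05, 2, 18)` should return approximately 3.55, assuming a significance level of 0.05 and degrees of freedom of 2 between groups (three groups minus one) and 18 within groups (21 tires minus three groups).

```python
def anova_f_decision(f_value, f_critical):
    """Apply the decision rule for the ANOVA F-test.

    F is a ratio of two mean squares, so it is never negative and is compared
    directly with the critical F-value (no absolute value is needed).
    A value exactly equal to the critical F is treated here as "accept";
    the text's rule does not address that case.
    """
    if f_value > f_critical:
        return "reject the null hypothesis and accept the research hypothesis"
    return "accept the null hypothesis"


# Tire example: F from cell E31, critical F from cell G31.
print(anova_f_decision(12.83, 3.55))
```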

Therefore, our conclusion, in plain English, is:

There is a significant difference in the number of miles driven among the three brands of tires.

The F-value is a ratio of two mean squares, so it can never be negative (although it can be smaller than one); therefore, unlike t, there is no need to take its absolute value.

A significant ANOVA F-test tells you that the population means of the three groups are not all equal, but it does not tell you which pairs of groups differ significantly from each other.

Testing the Difference Between Two Groups Using the ANOVA t-Test

Comparing Brand A vs Brand C in Miles

Objective: To compare Brand A vs Brand C in miles driven using the ANOVA t-test.

The first step is to write the null hypothesis and the research hypothesis for these two brands of tires.

For the ANOVA t-test, the null hypothesis states that the population means of Brand A (Group 1) and Brand C (Group 3) are equal (H0: μ1 = μ3), while the research hypothesis states that these two population means are not equal (H1: μ1 ≠ μ3).

For Group 1 vs Group 3, the formula for the ANOVA t-test is:

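$$\text{ANOVA } t = \frac{\bar{X}_1 - \bar{X}_3}{\text{s.e.}_{\text{ANOVA}}} \qquad (8.2)$$

where

$$\text{s.e.}_{\text{ANOVA}} = \sqrt{MS_w \left( \frac{1}{n_1} + \frac{1}{n_3} \right)}$$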

The steps involved in computing this ANOVA t-test are:

1 Find the difference between the sample means of the two groups (62 − 66.33 = −4.33).

2 Find 1/n1 + 1/n3 (because the two groups contain different numbers of tires, this becomes 1/5 + 1/6 = 0.2000 + 0.1667 = 0.3667).

3 Multiply MS_w by the answer to step 2 (3.24 × 0.3667 = 1.19).

4 Take the square root of step 3 (SQRT(1.19) = 1.09).

5 Divide step 1 by step 4 to find ANOVA t (−4.33/1.09 = −3.97).
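The five steps above can be sketched as a small Python function; the name `anova_t` is hypothetical. The second call illustrates how sensitive the second decimal place is to rounding: assuming Brand C's mean is really 66.333... (displayed as 66.33), the result moves from −3.97 to −3.98. Because MS_w = 3.24 is itself rounded, this only illustrates the effect and does not reproduce Excel's internal values.

```python
import math


def anova_t(mean_1, mean_2, n_1, n_2, ms_within):
    """ANOVA t-test for two groups, using MS_w from the full ANOVA table.

    Returns (t, s.e._ANOVA).
    """
    se_anova = math.sqrt(ms_within * (1 / n_1 + 1 / n_2))  # steps 2 to 4
    return (mean_1 - mean_2) / se_anova, se_anova           # steps 1 and 5


# Rounded values from the text: Brand A (Group 1) vs Brand C (Group 3).
t_rounded, se = anova_t(62, 66.33, 5, 6, 3.24)
print(round(se, 2), round(t_rounded, 2))  # 1.09 -3.97

# Assumption: the displayed 66.33 is a rounded 66.333...
t_closer, _ = anova_t(62, 66 + 1 / 3, 5, 6, 3.24)
print(round(t_closer, 2))  # -3.98
```

In the worksheet itself, the same calculation can be written as cell formulas. Assuming the Anova: Single Factor output begins in cell A20, so that the group counts are in B24:B26, the averages in D24:D26, and MS_w in D32 (a layout inferred from the cells quoted in this chapter), s.e._ANOVA would be `=SQRT(D32*(1/B24+1/B26))` and ANOVA t would be `=(D24-D26)/SQRT(D32*(1/B24+1/B26))`.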

Excel does not round the intermediate values in this calculation, so its answer is more accurate than a hand calculation that rounds each step to two decimal places. That is why Excel reports the ANOVA t value as 3.98 in absolute value, while the hand calculation gives 3.97.

To interpret the ANOVA t-test result of 3.97, you need the critical value of t for this test, and finding it requires the degrees of freedom for the ANOVA t-test.

8.4.1.1 Finding the Degrees of Freedom for the ANOVA t-Test

Objective: To find the degrees of freedom for the ANOVA t-test.

The degrees of freedom (df) for the ANOVA t-test can be calculated by taking the total sample size of all groups and subtracting the number of groups in the study, expressed as df = n_TOTAL - k, where k represents the number of groups.

In our example, the total sample size of the three groups is 21, since there are 5 tires in Group 1, 10 tires in Group 2, and 6 tires in Group 3. Since there are three groups, the degrees of freedom for the ANOVA t-test are 21 − 3 = 18.
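Because MS_w pools the variation within every group, the degrees of freedom count all of the groups in the study, including Brand B, even though only Brand A and Brand C are being compared. A minimal sketch (the function name `anova_t_df` is hypothetical); in Excel 2013, `T.INV.2T(0.05, 18)` should return about 2.101, matching Appendix E, assuming a two-tailed test at the 0.05 level.

```python
def anova_t_df(group_sizes):
    """Degrees of freedom for the ANOVA t-test: total sample size minus number of groups."""
    return sum(group_sizes) - len(group_sizes)


# Tire example: 5 tires (Brand A), 10 tires (Brand B), 6 tires (Brand C).
df = anova_t_df([5, 10, 6])
print(df)  # 18
```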

In the t-table found in Appendix E, the critical t-value for df = 18 is 2.101, as indicated in the degrees of freedom column.

Important note: Be sure to use the degrees of freedom (df) column in Appendix E to find the critical t value for the ANOVA t-test.

8.4.1.2 Stating the Decision Rule for the ANOVA t-Test

Objective: To learn the decision rule for the ANOVA t-test

8.4 Testing the Difference Between Two Groups Using the ANOVA t-Test 175

Interpreting the results of the ANOVA t-test adheres to the same decision-making criteria utilized in both the one-group t-test and the two-group t-test.

If the absolute value of t is less than the critical value of t, we accept the null hypothesis.

or

If the absolute value of t is greater than the critical value of t, we reject the null hypothesis and accept the research hypothesis.

For a t-test, we take the absolute value of t, which in this case is 3.98. Because 3.98 is greater than the critical t-value of 2.101, we reject the null hypothesis that the population means of the two groups are equal, and we accept the research hypothesis that the population means of the two groups are significantly different.

This means that our conclusion, in plain English, is as follows:

The average tire mileage for Brand C was significantly greater than the average tire mileage for Brand A (66,000 vs 62,000).
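The step from a significant t to a plain-English sentence can be sketched as follows (the function name `anova_t_conclusion` is hypothetical). The sign of t only reflects which mean was subtracted from which, so the sketch compares |t| with the critical t and uses the sample means to decide which group to report as higher.

```python
def anova_t_conclusion(name_1, mean_1, name_2, mean_2, t_value, t_critical):
    """Turn an ANOVA t-test result into a plain-English conclusion."""
    if abs(t_value) <= t_critical:
        return f"There is no significant difference between {name_1} and {name_2}."
    # The larger sample mean identifies the group that is significantly higher.
    higher, lower = (name_1, name_2) if mean_1 > mean_2 else (name_2, name_1)
    return f"The mean for {higher} was significantly greater than the mean for {lower}."


# Tire example: Brand A (62.0) vs Brand C (66.33), t = -3.98, critical t = 2.101 (df = 18).
print(anova_t_conclusion("Brand A", 62.0, "Brand C", 66.33, -3.98, 2.101))
# prints: The mean for Brand C was significantly greater than the mean for Brand A.
```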

The difference in average mileage between Brand A and Brand C (about 4,000 miles) may seem small, but it means that Brand C's average miles driven exceeded Brand A's by about 7%. With these hypothetical data, that difference is statistically significant.
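The 7% figure is the mileage difference expressed as a percentage of Brand A's mean. A quick arithmetic check, using the rounded means from the text (in thousands of miles):

```python
mean_a = 62.0    # Brand A mean, thousands of miles
mean_c = 66.33   # Brand C mean, thousands of miles

difference = mean_c - mean_a                  # about 4.33 thousand miles
percent_higher = 100 * difference / mean_a    # relative to Brand A's mean
print(f"{difference * 1000:,.0f} miles, {percent_higher:.1f}% higher")
# prints: 4,330 miles, 7.0% higher
```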

8.4.1.3 Performing an ANOVA t-Test Using Excel Commands

Now, let’s do these calculations for the ANOVA t-test using Excel with the file you created earlier in this chapter: TIRE6A

You should now have the following results in these cells when you round off all the figures in the ANOVA t-test to two decimal places:

Save this final result under the file name: TIRE7

Print out the resulting spreadsheet so that it fits onto one page, as in Fig. 8.5 (Hint: Reduce the Page Layout/Scale to Fit to 85%).

Fig 8.5 Final Spreadsheet of Tire Mileage for Brand A vs Brand C

For a more detailed explanation of the ANOVA t-test, see Black (2010).

When conducting an ANOVA t-test to compare the means of two groups, it is crucial to first ensure that the F-test indicates a significant difference among the means of all groups involved in the study.

It is inappropriate to conduct an ANOVA t-test when the F-value is less than the critical F-value, because then there is no significant difference among the group means. In that case, testing for a difference between any two groups would merely capitalize on chance variation. For more on this important point, see Gould and Gould (2002).
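The "F first, then pairwise t-tests" rule can be sketched as a single Python function that refuses to run any ANOVA t-test unless F exceeds the critical F. The function name and the second F value (2.00) are hypothetical; Brand B is left out of the example only because its mean is not given in this section.

```python
import math
from itertools import combinations


def protected_anova_t_tests(summaries, ms_within, f_value, f_critical):
    """Return ANOVA t values for every pair of groups, but only after a significant F.

    summaries maps a group name to (sample size, sample mean);
    ms_within is MS_w from the ANOVA table. If F is not greater than the
    critical F, an empty dict is returned, because pairwise tests would
    then only capitalize on chance.
    """
    if f_value <= f_critical:
        return {}
    results = {}
    for (name_1, (n_1, mean_1)), (name_2, (n_2, mean_2)) in combinations(summaries.items(), 2):
        se_anova = math.sqrt(ms_within * (1 / n_1 + 1 / n_2))
        results[(name_1, name_2)] = (mean_1 - mean_2) / se_anova
    return results


tire_summaries = {"Brand A": (5, 62.0), "Brand C": (6, 66.33)}
print(protected_anova_t_tests(tire_summaries, 3.24, 12.83, 3.55))  # F significant: t is computed
print(protected_anova_t_tests(tire_summaries, 3.24, 2.00, 3.55))   # hypothetical F: {}
```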

End-of-Chapter Practice Problems

1 Let's suppose that you have been asked to study the yield (grams of product produced) of a chemical reaction conducted under three different temperature conditions: (1) BELOW ROOM TEMPERATURE (15 °C), (2) ROOM TEMPERATURE (25 °C), and (3) ABOVE ROOM TEMPERATURE (30 °C).

The objective is to determine whether the yield (grams of product produced) differs across the three temperatures, using a random sample of results from the reaction run at each temperature (see Fig. 8.6). Note that the number of results in each group does not have to be the same; ANOVA can still be applied to the data. Statisticians appreciate this flexibility and often describe ANOVA as a "very robust test."

Fig 8.6 Worksheet Data for Chap 8: Practice Problem #1

(a) Enter these data on an Excel spreadsheet.

(b) Perform a one-way ANOVA test on these data, and show the resulting ANOVA table for the three temperature groups underneath the input data.

(c) If the F-value in the ANOVA table is significant, create an Excel formula to compute the ANOVA t-test comparing the average yield for ROOM TEMPERATURE with that for ABOVE ROOM TEMPERATURE, and show the results below the ANOVA table in the spreadsheet. Put the standard error and the ANOVA t-test value on separate lines, each formatted to two decimal places.

(d) Print out the resulting spreadsheet so that all of the information fits onto one page

(e) Save the spreadsheet as: REACTION3

Now, write the answers to the following questions using your Excel printout:

1 What are the null hypothesis and the research hypothesis for the ANOVA F-test?

2 What is MS b on your Excel printout?

3 What is MS w on your Excel printout?

4 Compute F = MS_b / MS_w using your calculator.

5 What is the critical value of F on your Excel printout?

6 What is the result of the ANOVA F-test?

7 What is the conclusion of the ANOVA F-test in plain English?

8 If the ANOVA F-test shows a significant difference among the three temperatures, state the null hypothesis (no difference between the population means of ROOM TEMPERATURE and ABOVE ROOM TEMPERATURE) and the research hypothesis (the two population means differ) for the ANOVA t-test comparing these two conditions.

9 What is the mean (average) for ROOM TEMPERATURE on your Excel printout?

10 What is the mean (average) for ABOVE ROOM TEMPERATURE on your Excel printout?

11 What are the degrees of freedom (df) for the ANOVA t-test comparing ROOM TEMPERATURE versus ABOVE ROOM TEMPERATURE?

12 What is the critical t value for this ANOVA t-test in Appendix E for these degrees of freedom?

13 Compute the s.e._ANOVA using your calculator.

14 Compute the ANOVA t-test value comparing ROOM TEMPERATURE versus ABOVE ROOM TEMPERATURE using your calculator.

15 What is the result of the ANOVA t-test comparing ROOM TEMPERATURE versus ABOVE ROOM TEMPERATURE?

16 What is the conclusion of the ANOVA t-test comparing ROOM TEMPERATURE versus ABOVE ROOM TEMPERATURE in plain English?

To find out which of the three temperatures differ significantly from one another, you would need to conduct three separate ANOVA t-tests: ROOM TEMPERATURE versus ABOVE ROOM TEMPERATURE, ROOM TEMPERATURE versus BELOW ROOM TEMPERATURE, and ABOVE ROOM TEMPERATURE versus BELOW ROOM TEMPERATURE. The ANOVA t-test you completed above is only one of these; together, the three comparisons give a complete summary of how temperature affects yield.
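The number of pairwise ANOVA t-tests grows quickly with the number of groups: k groups give k(k − 1)/2 pairs. A small sketch (the function name `pairwise_comparisons` is hypothetical) lists the comparisons for this problem and counts them for the four fuel injectors and five vehicle types in the problems that follow.

```python
from itertools import combinations


def pairwise_comparisons(group_names):
    """Every pair of groups that would need its own ANOVA t-test."""
    return list(combinations(group_names, 2))


temperatures = ["BELOW ROOM TEMPERATURE", "ROOM TEMPERATURE", "ABOVE ROOM TEMPERATURE"]
for first, second in pairwise_comparisons(temperatures):
    print(f"{first} versus {second}")

# k groups give k * (k - 1) / 2 pairs: 3 temperatures -> 3, 4 injectors -> 6, 5 vehicle types -> 10.
print(len(pairwise_comparisons("ABCD")))  # 6
```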

2 Small differences in the horsepower output of race car engines can have a large impact on the outcome of professional races. As an engineer for a sponsor who wants to improve engine performance, you have created four prototype fuel injectors, labeled A, B, C, and D, and tested them on various engines to measure their horsepower output. You have collected a random sample of horsepower readings from these engines (hypothetical data; see Fig. 8.7) to analyze in Excel.

(a) Enter these data on an Excel spreadsheet.

(b) Perform a one-way ANOVA test on these data for the four types of fuel injectors, and show the resulting ANOVA table underneath the input data. Round off all decimal figures to two decimal places, and center all of the numbers in the ANOVA table.

(c) If the F-value in the ANOVA table is significant, create an Excel formula to compute the ANOVA t-test comparing the horsepower output of FUEL INJECTOR A with that of FUEL INJECTOR C, and show the results below the ANOVA table in the spreadsheet. Put the standard error and the ANOVA t-test value on separate lines, each formatted to two decimal places.

Fig 8.7 Worksheet Data for Chap 8: Practice Problem #2

(d) Print out the resulting spreadsheet so that all of the information fits onto one page

(e) Save the spreadsheet as: FUEL3

Now, write the answers to the following questions using your Excel printout:

1 What are the null hypothesis and the research hypothesis for the ANOVA F-test?

2 What is MS b on your Excel printout?

3 What is MS w on your Excel printout?

4 Compute F = MS_b / MS_w using your calculator.

5 What is the critical value of F on your Excel printout?

6 What is the result of the ANOVA F-test?

7 What is the conclusion of the ANOVA F-test in plain English?

8 If the ANOVA F-test shows a significant difference in horsepower output among the four fuel injectors, state the null hypothesis (no difference between the population means of FUEL INJECTOR A and FUEL INJECTOR C) and the research hypothesis (the two population means differ) for the ANOVA t-test comparing these two injectors.

9 What is the mean (average) horsepower for FUEL INJECTOR A on your Excel printout?

10 What is the mean (average) horsepower for FUEL INJECTOR C on your Excel printout?

11 What are the degrees of freedom (df) for the ANOVA t-test comparing FUEL INJECTOR A versus FUEL INJECTOR C?

12 What is the critical t value for this ANOVA t-test in Appendix E for these degrees of freedom?

13 Compute the s.e._ANOVA using Excel for FUEL INJECTOR A versus FUEL INJECTOR C.

14 Compute the ANOVA t-test value comparing FUEL INJECTOR A versus FUEL INJECTOR C using Excel.

15 What is the result of the ANOVA t-test comparing FUEL INJECTOR A versus FUEL INJECTOR C?

16 What is the conclusion of the ANOVA t-test comparing FUEL INJECTOR A versus FUEL INJECTOR C in plain English?

3 A research study examined the relationship between vehicle size and gasoline usage for five categories of vehicles: subcompacts, compacts, mid-size, large, and SUVs. Owners of each type of vehicle recorded their highway mileage over a set route while using three tanks of gasoline. The study seeks to determine whether vehicle size affects highway miles per gallon (mpg).

8.5 End-of-Chapter Practice Problems 181

(a) Enter these data on an Excel spreadsheet.

(b) Perform a one-way ANOVA test on these data for the five types of vehicles, and show the resulting ANOVA table underneath the input data.

(c) If the F-value in the ANOVA table is significant, create an Excel formula to compute the ANOVA t-test comparing the average miles per gallon (mpg) of COMPACTS versus LARGE vehicles, and show the results below the ANOVA table in the spreadsheet. Put the standard error and the ANOVA t-test value on separate lines, each formatted to two decimal places.

(d) Print out the resulting spreadsheet so that all of the information fits onto one page

(e) Save the spreadsheet as: CARS3

Now, write the answers to the following questions using your Excel printout:

1 What are the null hypothesis and the research hypothesis for the ANOVA F-test?

2 What is MS b on your Excel printout?

3 What is MS w on your Excel printout?

4 Compute F = MS_b / MS_w using your calculator.

5 What is the critical value of F on your Excel printout?

6 What is the result of the ANOVA F-test?

7 What is the conclusion of the ANOVA F-test in plain English?

8 If the ANOVA F-test shows a significant difference in miles per gallon (mpg) among the five vehicle types, state the null hypothesis (H0: no difference between the population mean mpg of COMPACTS and LARGE vehicles) and the research hypothesis (H1: the two population means differ) for the ANOVA t-test comparing these two categories.

9 What is the mean (average) mpg for COMPACTS on your Excel printout?

10 What is the mean (average) mpg for LARGE on your Excel printout?

11 What are the degrees of freedom (df) for the ANOVA t-test comparing COMPACTS versus LARGE?

Fig 8.8 Worksheet Data for Chap 8: Practice Problem #3
