Double-click on the chart to move the histogram from the Chart Carousel window to a Chart window. The menu bar and toolbar change to show editing facilities. First, click on CHART, then OPTIONS and NORMAL CURVE, then hit OK. The normal curve superimposed over the histogram is the one for the above mean and standard deviation. Admittedly, it's difficult to make a decision with such a small sample, but does the curve appear to be a good fit to the histogram? Now click on the 'swap axes' icon. Does the histogram look better with vertical bars or horizontal bars?

Now try some of the other icons and tools to change the chart. These changes require the appropriate part of the chart to be selected first. Click on any bar. The bars will become highlighted with small black squares at their corners. Then click on the Fill Pattern tool button (the rectangle with diagonal shading). To apply a pattern, click on it and then click on APPLY. Once you have finished with the patterns, click on CLOSE. Also try the Colour Palette tool button (the one with the pen) and the Bar Labels tool button (the one with the fingernails). You can also change the style of the line showing the normal curve, and the fill pattern and colour of the background of the histogram.

Once you have finished with your work, select FILE and then SAVE CHART. Save your histogram as artwork.chz.

To copy a chart into Word, click on EDIT and then select COPY to copy the chart. To move to Word, minimise SPSS and open Word. If Word is already open, press ALT+TAB to move between programs. Once in Word, go to EDIT, PASTE. Finally, exit from SPSS for Windows by selecting FILE, EXIT.

Section II: Manipulating the Data in the Matrix (Computing, Recoding, Filtering and Deleting Data)

Computing Values

Start SPSS and open the file family.sav (you should find this file on your M: drive in the folder that you named survey).
We shall use the COMPUTE command to build up a new variable that will be labelled BMI, which stands for body mass index. This is calculated as:

Body mass index = weight (kilograms) / height (metres)²

In family.sav, weight is recorded in pounds and height in inches, so the expression below first converts pounds to kilograms (× 0.4536) and inches to metres (× 0.0254).

Select TRANSFORM and then COMPUTE and set the Target Variable to bmi. Click on TYPE & LABEL and enter the label body mass index in the Label box. Click CONTINUE to return to the Compute Variable dialog box. Using the source list on the left and the calculator pad in the centre, build up

weight * 0.4536 / (height * 0.0254) ** 2

in the Numeric Expression box. Run the completed command. The new variable is added to the end of the data.

We shall check the new variable by estimating a few descriptive statistics using FREQUENCIES (via Analyze – Descriptive Statistics). (Analyze – Descriptive Statistics – Explore would be a better command, but Frequencies will do here.) Select ANALYZE, DESCRIPTIVE STATISTICS and then FREQUENCIES. Move body mass index (bmi) to the Variable(s) box. Since bmi is a metric variable with a potentially different value for every case in the data, suppress the frequency tables by clearing the DISPLAY FREQUENCY TABLES check box. You will now get a message saying 'You have turned off all output. Unless you request Display Frequency Tables, Statistics or Charts, Frequencies will generate no output'. No worries: we will request descriptive statistics by clicking on STATISTICS and ticking the check boxes for MEAN, MEDIAN, MINIMUM and MAXIMUM. Run the command and look at the output.

What are the sample values of the mean, median, minimum and maximum? (The mean should be around 25.0. Any values outside the range 15.0 to 35.0 should be queried.) Do the sample statistics satisfy these rough checks? If not, something is wrong!

Conditionally Computing Values

Now we shall use the IF sub-command (via Transform – Compute) to set up a new variable.
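In outline, a conditional compute copies a value into the target variable only for the cases that satisfy a condition. The following is a minimal Python sketch of that idea, outside SPSS. The helper `compute_if` and the three cases are made up for illustration (only the code 0 for RELTOHOH comes from this practical), and `None` stands in for a case that receives no value.

```python
def compute_if(cases, target, source, condition):
    """Copy `source` into `target` only where `condition(case)` is true.

    Cases that fail the condition get None for the target variable.
    """
    result = []
    for case in cases:
        new_case = dict(case)  # leave the original case untouched
        new_case[target] = case[source] if condition(case) else None
        result.append(new_case)
    return result


# Made-up cases: reltohoh == 0 marks the head of household;
# the codes 1 and 2 are placeholders for other household members.
cases = [
    {"age": 46, "reltohoh": 0},
    {"age": 44, "reltohoh": 1},
    {"age": 12, "reltohoh": 2},
]

with_agehoh = compute_if(cases, "agehoh", "age", lambda c: c["reltohoh"] == 0)
for case in with_agehoh:
    print(case)
```

Only the first made-up case meets the condition, so only that case ends up with a value in the new variable.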
The sub-command allows you to set up a new variable under the condition that the original variable, on which it is based, fulfils certain criteria. We want to set up a new variable AGEHOH for the age of the head of the household. In other words, if a person in the sample is head of the household, AGEHOH shall indicate that person's age.

Select TRANSFORM and then COMPUTE, and clear the previous settings by clicking on RESET. Set the Target Variable to AGEHOH and click on TYPE & LABEL to assign the label age head of household. Click on CONTINUE, and then set the Numeric Expression to AGE. We want this (i.e., the current age in years) to be applied when the case is head of household, which occurs when RELTOHOH is zero. (For the variable RELTOHOH, relationship to head of household, the value 0 denotes that a person is head of household.) Select IF… and INCLUDE IF CASE SATISFIES CONDITION. Set up the condition RELTOHOH = 0 in the large box and run the command.

The variable AGEHOH should now be added to the end of the data. Have a look at the new variable. You should see ages set for some cases only.

Let's check AGEHOH by moving it in the data matrix to the column after RELTOHOH, so that we can see more clearly what happened. First we must make a space in the data matrix by inserting a new variable. Find RELTOHOH either by scrolling through the Data Editor window or by selecting UTILITIES and VARIABLES…, selecting RELTOHOH from the source list and then clicking on GO TO and CLOSE. Now click on any cell of the variable that is immediately to the right of RELTOHOH (this variable should be sex). Then select DATA and then INSERT VARIABLE. Alternatively, you can click on the INSERT VARIABLE tool (the sixth button from the right). A blank column headed var00001, containing system-missing values (dots), is now inserted before the selected variable. Move AGEHOH to this column by single-clicking on AGEHOH to highlight the column and then selecting EDIT and CUT.
To paste it in the desired location, single-click on the head of the blank column (var00001) and select EDIT and then PASTE. Look at the values in the Data Editor window. Do all heads of household have AGEHOH set? If not, what might be the reason? (Hint: look at the variable that AGEHOH is derived from!) What value is set for cases who are not heads of household?

Re-coding Values

The RECODE command in SPSS is very powerful and efficient, but it can be a little tricky to set up because of the number of clicks required. We shall recode BMI into a new variable BMIGRP, which takes the values:

Value   Range                Interpretation
1       bmi < 25.0           Okay
2       25.0 ≤ bmi < 30.0    Overweight
3       bmi ≥ 30.0           Obese

Select TRANSFORM and then RECODE and INTO DIFFERENT VARIABLES. Select BMI from the source list into the central INPUT VARIABLE – OUTPUT VARIABLE box. Enter BMIGRP into the Name box and click on CHANGE to complete the INPUT VARIABLE – OUTPUT VARIABLE box. Also enter a suitable variable label for BMIGRP in the Label box (e.g., categorical body mass index).

To set up the recoding, click on OLD AND NEW VALUES…. We build up the recode specification for the third category of BMIGRP first. In the OLD VALUE box, select RANGE and THROUGH HIGHEST and enter 30.0 in the box before THROUGH HIGHEST. In the NEW VALUE section, enter 3 into the VALUE box. Then click on ADD to copy the specification 30.0 THROUGH HIGHEST = 3 to the OLD – NEW box. Build up the other two specifications in the order 25.0 THROUGH 30.0 = 2 and then LOWEST THROUGH 25.0 = 1. The order matters: a value that lies exactly on a boundary (30.0 or 25.0) matches two specifications and is given the code of the one listed first, which is what the table above requires. Now run the completed command.

To finish, double-click on BMIGRP in the Data Editor window and define suitable value labels (i.e., 1 = okay, 2 = overweight, 3 = obese). Are the values of BMIGRP correct for the first ten cases?

Filtering Cases

In this example, we shall filter cases. The filtering option allows you to exclude certain cases from further analysis temporarily.
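The idea behind a filter can be sketched in a few lines of Python, outside SPSS: every case stays in the data, and a flag records whether it takes part in the analysis. The cases below are made up, and `apply_filter` and `count_selected` are hypothetical helpers written for this illustration, not SPSS functions.

```python
def apply_filter(cases, condition):
    """Return one True/False flag per case; no case is removed."""
    return [bool(condition(case)) for case in cases]


def count_selected(cases, flags=None):
    """Count the cases an analysis would use, with or without a filter."""
    if flags is None:  # no filter in effect: every case is used
        return len(cases)
    return sum(1 for keep in flags if keep)


# Made-up cases; persno is the number of persons in the household.
cases = [{"persno": 1}, {"persno": 3}, {"persno": 1}, {"persno": 2}]

flags = apply_filter(cases, lambda c: c["persno"] == 1)
print(flags)                         # which cases are selected
print(count_selected(cases, flags))  # cases used while the filter is on
print(count_selected(cases))         # cases used once the filter is off
print(len(cases))                    # the data themselves are unchanged
```

Because the flags are kept separately from the cases, switching the filter off simply means ignoring the flags again; nothing has to be restored.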
Before filtering, generate a two-way frequency table for ownrent by typaccm by selecting ANALYZE, then DESCRIPTIVE STATISTICS and then CROSSTABS, and selecting ownrent for Row(s) and typaccm for Column(s). Run the command and look at the table in the output.

1. What exactly does the frequency count in the first cell of the second table refer to? 6 what?

We shall filter using the variable PERSNO, which is the number of persons in the household.

2. What will be the effect of selecting cases satisfying the condition persno = 1? What is the impact on households?

Now select DATA and SELECT CASES and then IF CONDITION IS SATISFIED, and make sure that UNSELECTED CASES ARE FILTERED is chosen. (This is very important, as the alternative is DELETED, which we want to avoid for now!) Select IF… and build up the condition persno = 1 in the large box. Run the completed command. Find persno in the Data Editor window.

3. What appears in the status bar when filtering is in effect? (The status bar is at the bottom of the window.)
4. What has happened to case numbers with persno ≠ 1?

Rerun the CROSSTABS command (via Analyze – Descriptive Statistics) and look at the new table in the output.

5. What exactly does the frequency count in the first cell refer to now? 3 what?

Go to the Data Editor window and save the filtered data as familyf.sav. Then select DATA, SELECT CASES and then ALL CASES. Run the command.

6. What happens to the status bar and the case numbers?

Deleting Cases

Instead of filtering cases, we shall now delete the unselected cases. This does no harm to the data already stored in system files on disk. Select DATA, SELECT CASES, IF CONDITION IS SATISFIED, which picks up the previous condition persno = 1. Then select UNSELECTED CASES ARE DELETED. Run the command and have a look at the Data Editor window.

1. How many cases are left?
2. What are the values of PERSNO?
3. What are the values of HSEMO? What does that successfully show?
Now rerun the CROSSTABS command from the previous section and look at the output.

4. Do the results agree with those obtained when cases are filtered?

Return to the Data Editor window and save the selected cases to a NEW system file named familyd.sav (after deleting cases you should do this as soon as possible, to avoid overwriting your complete data file by accident). Finally, re-open familyf.sav, the filtered file you saved in the previous section.

5. Is filtering still on?

Exit from SPSS, saving the contents of the output window into output3.spo. Open up family.sav that you saved to your survey folder.

WEEK 3: October 17th

T-Tests

Section I: Parametric T-tests (related & unrelated)

This practical will show you how to run a t-test so that you can look at the difference between the means of two sets of scores. Experimental designs can be of two basic types: within-subject (dependent or related) and between-subject (independent or unrelated). The former is when all subjects are exposed to all conditions (e.g., testing reaction times before and after receiving a drug). Between-subject designs are when you divide subjects into independent groups, such as on the basis of gender, or into one group that receives a drug and a second that receives a placebo.

DEPENDENT OR RELATED SAMPLES T-TEST

First, a quick review of the test layouts.

1. Related Samples: two variables, one for each condition of the experiment. Each subject therefore has two scores.

Variable 1: first set of scores for the subjects (e.g. reaction time before taking the drug)
Variable 2: second set of scores for the subjects (e.g. reaction time after taking the drug)

Sub. No.   Variable 1   Variable 2
1          10           30
2          11           31
3          12           32
4          10           30
5          9            29

2. Independent or Unrelated Samples: two variables. The first tells SPSS which condition EACH subject belongs to; the second is the actual score for that subject.

Variable 1: the condition each subject belongs to (e.g. group 1 are the controls, group 2 receive the drug)
Variable 2: the actual score (e.g. each subject's reaction time)

Sub. No.           Variable 1   Variable 2
1 (control)        1            subject 1 score
2 (control)        1            subject 2 score
3 (experimental)   2            etc.
4 (experimental)   2            etc.

T-Test for Related Samples

This is the parametric comparison of two related groups, for example when you want to compare mean scores for subjects at some task before and after taking a drug.

Each set of subject scores for the related t-test must be entered as an individual variable in SPSS. So, in the above example, all the individual scores for the task before taking the drug would be in one column and all the scores after taking the drug in another.

First, open family.sav. The next step is to add a variable to the data file, so that we can run the related t-test. In this case, the comparison will be between the subjects' height/weight ratio before they were put on a 4-week diet/exercise plan and after it. The variable already in the data set, HWRATIO, is the measure before. At the end of the data file, add the variable HWRATIO2 to represent their measurements after the plan. Using what you learned in the first lesson about entering data, create the new variable using the information below:

Variable Name: HWRATIO2
Variable Label: Height/Weight Ratio after plan
Data: see Table 1 below

To run the procedure, go to ANALYZE, COMPARE MEANS and then PAIRED-SAMPLES T-TEST. The usual dialogue box appears, in the two-column format. The only difference is that you must select pairs of variables and move them across, rather than just one variable at a time. To do this, click on one variable, then locate the other variable and click on it. The two variables that you have requested should appear in the current selection box. After clicking on both, press the arrow button to move the pair across. SPSS will analyse each pair to determine if their means are significantly different statistically.
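The paired-samples statistic is built from the per-subject differences d: t = mean(d) / (sd(d) / √n), with n − 1 degrees of freedom, where n is the number of complete pairs. The following is a minimal sketch of that calculation in Python, outside SPSS. The function name `paired_t` is hypothetical, the scores are made up (they are not taken from family.sav), and the choice to take differences as first variable minus second is an assumption of this sketch.

```python
import math
import statistics


def paired_t(first, second):
    """Return (t, degrees of freedom) for two related sets of scores.

    A pair is dropped when either score is missing (None).
    """
    diffs = [a - b for a, b in zip(first, second)
             if a is not None and b is not None]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1


# Made-up scores for five subjects; the fourth has a missing first score.
before = [0.50, 0.48, 0.55, None, 0.47]
after = [0.44, 0.52, 0.46, 0.40, 0.44]

t, df = paired_t(before, after)
print(round(t, 3), df)
```

The pair with a missing score is left out of the calculation, so the degrees of freedom reflect four complete pairs rather than five subjects.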
In this case, select the variables HWRATIO and HWRATIO2 and move them across, then press the OK button.

Table 1: Data for Height/Weight Ratio after a 4-week diet/exercise plan

Subject Number   HWRATIO2 score
1                .44
2                .52
3                .46
4                .
5                .44
6                .42
7                .33
8                .74
9                .80
10               .32
11               .60
12               .65
13               .40
14               .50
15               .57
16               .41
17               .60
18               .55
19               .49
20               .60

OUTPUT

The results appear in three sections.

The first section gives you a table called Paired Samples Statistics, with the mean scores, standard deviations and standard errors of the mean for the two variables.

The second section is a table called Paired Samples Correlations, showing the correlation between the two variables and its level of significance.

The third section is the most important. The table called Paired Samples Test indicates the significance of the results. It includes the t-value, the degrees of freedom (d.f.) and the two-tailed significance level.

What is the t-value for the comparison between the height to weight ratio scores? Is there a significant difference between the scores before and after the diet/exercise plan? If so, which is the greater height/weight ratio?

T-Test for Independent Samples

This is the parametric t-test for two independent samples: a between-subjects design where, for example, subjects are randomly assigned to two separate test conditions (e.g. drug and control) and the mean scores (e.g. reaction time) are compared to determine if they are significantly different from each other. In this case, you want to test whether there is a statistical difference in height to weight ratios between the male and female subjects.

The format for variables to be used in the independent t-test is different from that used in the related t-test. Instead of the scores being placed in two separate columns (variables), all of the scores are placed in a single column (variable). A second variable identifies for SPSS which of the two groups each score belongs to.
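As a rough illustration of this layout, the Python sketch below (outside SPSS) keeps all scores in one list and uses a second list of group codes to split them, then reports the number of cases and the mean for each group. The scores are the first six rows of Table 1; the group codes are made up for the example, and `split_by_group` is a hypothetical helper.

```python
import statistics


def split_by_group(scores, groups, group_values=(1, 2)):
    """Collect the non-missing scores belonging to each requested group code."""
    by_group = {value: [] for value in group_values}
    for score, group in zip(scores, groups):
        if score is not None and group in by_group:
            by_group[group].append(score)
    return by_group


# One column of scores (first six rows of Table 1; None marks the missing value)
hwratio2 = [0.44, 0.52, 0.46, None, 0.44, 0.42]
# One column of group codes (made up here: 1 = male, 2 = female)
nsex = [1, 2, 1, 2, 2, 1]

by_group = split_by_group(hwratio2, nsex)
for code, values in by_group.items():
    print(code, len(values), round(statistics.mean(values), 3))
```

Passing a different pair of codes as `group_values` compares just those two groups and ignores cases with any other code.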
So, in this case, HWRATIO2 is the dependent variable and NSEX is the independent variable.

To run the analysis, go to ANALYZE, COMPARE MEANS and then INDEPENDENT-SAMPLES T-TEST. As usual, the left column lists all the variables in your data file. On the right, there are two boxes:

The Test Variable(s) box is where you move the dependent variable(s) (e.g., HWRATIO2).
The Grouping Variable box is where you move the variable that distinguishes between the two independent groups (e.g. the variable NSEX).

First, select the dependent variable HWRATIO2 and move it over to the Test Variable(s) section. Next, move NSEX over into the Grouping Variable section and press the DEFINE GROUPS button. Values from the grouping variable must be entered into the two boxes. In the case of the variable NSEX, where only two levels are recorded, you would just enter "1" in the top box for male subjects and "2" in the lower one for female subjects. Hit the CONTINUE button, then hit the OK button.

[Note: There may be times where you have a larger range of values, such as five different education levels, but only want to look at the difference between two of them. You would enter the two values you wish to compare.]

OUTPUT

There are two sections.

The first section of the output gives you a table called Group Statistics, which indicates the number of cases, the mean scores etc. for each condition.

The second section provides a table called Independent Samples Test, which starts with Levene's Test for Equality of Variances. If Levene's test is significant, the variances of the two groups are treated as unequal, and when you look at the results of the t-test in the final table you use the line starting with Equal variances not assumed. If it isn't significant, you look at the line starting with Equal variances assumed. The final table gives you t-values, degrees of freedom and the two-tailed significance levels.
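The choice between the two lines, and the statistic behind each of them, can be sketched in Python outside SPSS. This is an illustration under stated assumptions: the 0.05 cut-off is a conventional threshold that this practical does not state, the function names are hypothetical, and the two groups of scores are made up. The pooled formula corresponds to the equal-variances line and the Welch formula to the unequal-variances line.

```python
import math
import statistics


def choose_row(levene_sig, alpha=0.05):
    """Pick the output line to read, given Levene's significance value."""
    if levene_sig < alpha:
        return "Equal variances not assumed"
    return "Equal variances assumed"


def pooled_t(x, y):
    """t and df when the two group variances are pooled."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(
        pooled_var * (1 / nx + 1 / ny))
    return t, nx + ny - 2


def welch_t(x, y):
    """t and (fractional) df when the variances are not pooled."""
    nx, ny = len(x), len(y)
    vx = statistics.variance(x) / nx
    vy = statistics.variance(y) / ny
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


# Made-up scores for two independent groups
group_1 = [0.44, 0.52, 0.46, 0.50]
group_2 = [0.42, 0.33, 0.40, 0.41]

print(choose_row(0.137))  # 0.137 is the Levene value reported in this practical
print(pooled_t(group_1, group_2))
print(welch_t(group_1, group_2))
```

With equal group sizes the two t-values coincide and only the degrees of freedom differ; with unequal group sizes the t-values differ as well, which is why it matters which line is read.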
In this case, Levene's test is not significant (0.137), so we look at the Equal variances assumed line. The t-test is not significant either (two-tailed significance of .478), so we cannot reject the null hypothesis: there is no evidence of a difference between males and females in their height to weight ratios.

Section II: Non-Parametric T-tests (Wilcoxon - related & Mann-Whitney - unrelated)

All of the tests today can be found under ANALYZE, NONPARAMETRIC TESTS.

Mann-Whitney - Unrelated

This is the non-parametric equivalent of the t-test for two independent samples (a between-subjects design). To run the analysis, choose ANALYZE, NONPARAMETRIC TESTS and then 2 INDEPENDENT SAMPLES.

As usual, the left column lists all the variables in your data file. On the right, there are two boxes:

The "Test Variable(s)" box is where you move the dependent variable(s).
The "Grouping Variable" box is where you move the variable that distinguishes between the two independent groups (e.g. the variable NSEX).

So, move HWRATIO2 into the Test Variable box, and move NSEX into the Grouping Variable box. Now click the DEFINE GROUPS button. Values from the grouping variable must be entered into the two boxes. In the case of the variable NSEX, you enter "1" in the top box for male subjects and "2" in the lower one for female subjects. Hit the CONTINUE button, then hit the OK button.

OUTPUT

SPSS divides the entire set of subjects into three groups:

those with a score of 1 (male)
those with a score of 2 (female)
cases with missing data (which are excluded from the analysis)

The first section gives the mean ranks for the two conditions that are included, as well as the sums of the ranks and the numbers of cases.

The second section gives the Z score and p-values for the test.

Is there a difference between males and females? How do the results from this week compare to last week's?

Wilcoxon - Related

This is the non-parametric repeated-measures test, for a within-subjects design.
Like the parametric equivalent, we'll be running a comparison of height to weight ratios for the sample before and after a four-week exercise/diet programme. To run the analysis, choose ANALYZE, NONPARAMETRIC TESTS and then 2 RELATED SAMPLES.

The dialogue box has the two-column format. The only difference is that you must select pairs of variables and move them across. SPSS will analyse each pair to determine if their mean ranks are significantly different statistically. For this analysis, select the two variables HWRATIO and HWRATIO2, then click the OK button.

OUTPUT

The output for this procedure is quite different from that of the parametric test.

The first section gives you information about how many rank scores for one condition are less than (LT), greater than (GT) or equal to (EQ) those for the other condition.
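The counts in that first section, and the rank sums the test is built on, can be sketched in Python outside SPSS. The helper `signed_rank_summary` is hypothetical and the paired scores are made up. Differences are taken as second variable minus first, which is an assumption of this sketch, so LT counts the pairs where the second score is lower. Tied absolute differences share the average of the ranks they occupy, and pairs with no difference are counted as EQ and left out of the ranking.

```python
def signed_rank_summary(first, second):
    """Return (counts, negative rank sum, positive rank sum) for paired scores."""
    diffs = [b - a for a, b in zip(first, second)
             if a is not None and b is not None]

    counts = {
        "LT": sum(1 for d in diffs if d < 0),   # second score lower
        "GT": sum(1 for d in diffs if d > 0),   # second score higher
        "EQ": sum(1 for d in diffs if d == 0),  # no change
    }

    # Rank the non-zero absolute differences, averaging ranks across ties.
    abs_sorted = sorted(abs(d) for d in diffs if d != 0)
    rank_of = {}
    i = 0
    while i < len(abs_sorted):
        j = i
        while j < len(abs_sorted) and abs_sorted[j] == abs_sorted[i]:
            j += 1
        rank_of[abs_sorted[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    negative_sum = sum(rank_of[abs(d)] for d in diffs if d < 0)
    positive_sum = sum(rank_of[abs(d)] for d in diffs if d > 0)
    return counts, negative_sum, positive_sum


# Made-up whole-number scores for six subjects under two conditions
first = [10, 12, 9, 11, 14, 10]
second = [8, 12, 10, 8, 11, 7]

counts, negative_sum, positive_sum = signed_rank_summary(first, second)
print(counts)
print(negative_sum, positive_sum)
```

Whole-number scores are used here so that tied differences compare exactly; with decimal scores such as the height/weight ratios, the differences would need rounding before ties are detected.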