Correlation means, of course, co-relation or covariation. That is, two sets of marks increase or decrease in tandem. When scores for one test (say, English) go up, so do the scores for another test (say, Math). Using the appropriate formula, we get a positive correlation coefficient (r), likely to be somewhere around 0.5 to 0.7. Such a moderate correlation coefficient is commonly found for pairs of subjects in the school curriculum. Positive correlation coefficients vary from 0.01 to 1.00.
On the other hand, if an increase in scores for one test is accompanied by a decrease in scores for another test, we get a negative correlation coefficient. This does not happen often in the school context, at least within the curriculum. However, such a negative correlation may be found between, say, students’ physique and achievement. There is evidence that obese students tend not to do as well as their lean peers in school work. Recently, it was reported in Nature (a high-level journal of scientific research) that obesity is detrimental to brain functions. Negative correlation coefficients vary from −0.01 to −1.00.
This leaves us with a correlation coefficient of r = 0.00, which denotes the lack of systematic or predictable covariation between two sets of marks. That is, when a student’s score for one test goes up, his or her score for the other test may go up, go down, or stay put. In short, in such a case, you cannot predict what the other score will be from the score for one test. This means that a student who is good in one subject may or may not be good in another.
In the old days, when even the hand-held calculator did not exist, calculating a correlation coefficient on paper for a class of 40 students might take around 15–20 min. Therefore, quantitative educational research was slow. Now, we delegate this tedious job to the computer, which produces a large number of such coefficients literally with one keystroke. This is wonderful, but it is also dangerous because it can lead to abuse and misuse.
To help you appreciate what the computer can do for us, look at Table 5.1. It shows what you have to do if you calculate a correlation coefficient by hand.
Compare the marks for English and Math and what do you notice? You see that, generally, students who have high scores for English also tend to have high scores for Math, though not exactly in the same order.
Figure 5.2 shows this clearly. The students’ scores for the two subjects are plotted on a two-dimensional graph, vertically for English and horizontally for Math.
Note that the dotted line represents the tendency of covariation of the two sets of scores. This pattern suggests a positive correlation but not a perfect one.
In Table 5.1, the totals of X, Y, XX, YY, and XY are shown in the bottom row.
Fig. 5.1 Correlations and score distributions

These totals are needed for the calculation of the correlation coefficient between the English and Math scores of the 10 students. Once they are available, they are fed into the frightening formula:

$$
r = \frac{N\,S_{XY} - S_X\,S_Y}{\sqrt{\left(N\,S_{XX} - S_X\,S_X\right)\left(N\,S_{YY} - S_Y\,S_Y\right)}}
$$
This could well be the most complicated formula in your life. If you have made no mistakes in the calculation, r is 0.72. By the way, stop at two decimal places, although it works out to be 0.721212121.
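As a check on the hand calculation, the steps can be sketched in a few lines of Python. This is only a minimal illustration: the scores are copied from Table 5.1, while the names `english`, `math_scores`, and `pearson_r` are made up for this sketch and do not come from any particular statistics package.

```python
import math

# Scores copied from Table 5.1 (students A to J).
english = [90, 85, 80, 75, 70, 65, 60, 55, 50, 45]      # X
math_scores = [75, 80, 70, 85, 55, 60, 45, 40, 65, 50]  # Y


def pearson_r(x, y):
    """Pearson's r computed from the five column totals used in Table 5.1."""
    n = len(x)
    s_x = sum(x)                             # total of X
    s_y = sum(y)                             # total of Y
    s_xx = sum(a * a for a in x)             # total of XX
    s_yy = sum(b * b for b in y)             # total of YY
    s_xy = sum(a * b for a, b in zip(x, y))  # total of XY
    numerator = n * s_xy - s_x * s_y
    denominator = math.sqrt((n * s_xx - s_x * s_x) * (n * s_yy - s_y * s_y))
    return numerator / denominator


r = pearson_r(english, math_scores)
print(round(r, 2))  # 0.72
```

With the totals from Table 5.1, the numerator is 10 × 43675 − 675 × 625 = 14875, and both bracketed terms under the square root equal 20625, so r = 14875 / 20625, which rounds to 0.72.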
What we have calculated using the original test scores is called Pearson’s product-moment correlation, or simply Pearson’s correlation, or just r. In the school context, there are occasions when students’ performance takes the form of ranks and the original scores are not available. In this case, we can still find the correlation by using the ranks for calculation. What we get is called Spearman’s rank difference correlation or, more simply, Spearman’s ρ (rho).
Table 5.1 Calculation of a correlation coefficient

| Student | English (X) | Math (Y) | XX | YY | XY |
|---|---|---|---|---|---|
| A | 90 | 75 | 8100 | 5625 | 6750 |
| B | 85 | 80 | 7225 | 6400 | 6800 |
| C | 80 | 70 | 6400 | 4900 | 5600 |
| D | 75 | 85 | 5625 | 7225 | 6375 |
| E | 70 | 55 | 4900 | 3025 | 3850 |
| F | 65 | 60 | 4225 | 3600 | 3900 |
| G | 60 | 45 | 3600 | 2025 | 2700 |
| H | 55 | 40 | 3025 | 1600 | 2200 |
| I | 50 | 65 | 2500 | 4225 | 3250 |
| J | 45 | 50 | 2025 | 2500 | 2250 |
| Sum (S) | 675 | 625 | 47625 | 41125 | 43675 |

Fig. 5.2 Scatter plot for English and Math scores

For illustration, let us say 10 students (N = 10) presented their projects and were independently assessed by Miss Lee and Mr. Tan, who ranked the students from 1 (the best) to 10 (the poorest), as shown in Table 5.2. To calculate Spearman’s correlation, you first find the difference between the two ranks for each student and then square the difference. Once this is done for all students, sum all the squared rank differences (the sum turns out to be 46). Then, fill this sum and the number of students (N) into the formula; the Spearman’s correlation is 0.72.
$$
\text{Spearman's correlation} = 1 - \frac{6 \times \text{Sum of squared rank differences}}{(N - 1)\,N\,(N + 1)}
$$
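The same calculation can be sketched in Python. The ranks are copied from Table 5.2; the names `miss_lee`, `mr_tan`, and `spearman_rho` are made up for this illustration, and the sketch assumes, as in the table, that no two students share a rank.

```python
# Ranks copied from Table 5.2 (1 = best, 10 = poorest).
miss_lee = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
mr_tan = [3, 2, 4, 1, 7, 6, 9, 10, 5, 8]


def spearman_rho(rank_a, rank_b):
    """Spearman's rho from two sets of ranks, assuming there are no tied ranks."""
    n = len(rank_a)
    # Square each rank difference D and add the squares up.
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - (6 * sum_d_squared) / ((n - 1) * n * (n + 1))


rho = spearman_rho(miss_lee, mr_tan)
print(round(rho, 2))  # 0.72
```

With the sum of squared differences equal to 46 and N = 10, this is 1 − 276 / 990, which rounds to 0.72.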
For this example, it just happens that Pearson’s r and Spearman’s ρ have the same value of 0.72. In general, the two values differ, and ρ is often somewhat smaller than r, because some information is lost in the process of calculating ρ. And what does this mean?
Let us say four students score 90, 85, 75, and 60. They are ranked 1, 2, 3, and 4, respectively. The differences between adjacent ranks are all the same: 2 − 1 = 1, 3 − 2 = 1, and 4 − 3 = 1. However, the differences in scores are not: 90 − 85 = 5, 85 − 75 = 10, and 75 − 60 = 15. This means that equal rank differences do not correspond to equal score differences. In other words, the difference in rank between the first two students is 1 and the corresponding score difference is 5, but the same is not true for the difference between the third and the fourth students (rank difference 1, but score difference 15). Thus, using ranks loses the information about the sizes of the score differences.
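A short Python sketch can make this loss of information concrete. The first list holds the four scores from the text. The second list, `other_scores`, is a purely hypothetical set invented here to show that quite different scores can yield exactly the same ranks. The helper name `ranks_high_to_low` is also made up for this illustration and assumes no tied scores.

```python
def ranks_high_to_low(scores):
    """Rank distinct scores so that the highest score gets rank 1."""
    ordered = sorted(scores, reverse=True)
    return [ordered.index(s) + 1 for s in scores]


scores = [90, 85, 75, 60]          # the four scores from the text
ranks = ranks_high_to_low(scores)  # [1, 2, 3, 4]

# Gaps between neighbouring students, in scores and in ranks.
score_gaps = [a - b for a, b in zip(scores, scores[1:])]  # [5, 10, 15]
rank_gaps = [b - a for a, b in zip(ranks, ranks[1:])]     # [1, 1, 1]

# Hypothetical scores with the same order but very different gaps.
other_scores = [90, 62, 61, 60]
print(ranks_high_to_low(other_scores) == ranks)  # True
```

Both sets of scores produce the ranks 1, 2, 3, and 4, so any calculation based on ranks alone cannot tell them apart.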
Figure 5.3 is the scatter plot of English and Math ranks for the 10 students. Note that the dotted line is similar to that in Fig. 5.2, indicating a positive correlation.
The two graphs do not look exactly the same, but the tendency of the dotted lines is almost identical since both correlation coefficients are the same.
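Table 5.2 Calculation of Spearman’s rank difference correlation

| Student | Miss Lee | Mr. Tan | Difference (D) | D squared |
|---|---|---|---|---|
| A | 1 | 3 | −2 | 4 |
| B | 2 | 2 | 0 | 0 |
| C | 3 | 4 | −1 | 1 |
| D | 4 | 1 | 3 | 9 |
| E | 5 | 7 | −2 | 4 |
| F | 6 | 6 | 0 | 0 |
| G | 7 | 9 | −2 | 4 |
| H | 8 | 10 | −2 | 4 |
| I | 9 | 5 | 4 | 16 |
| J | 10 | 8 | 2 | 4 |
| Sum (S) | – | – | – | 46 |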