Do Language Proficiency Test Scores Differ by Gender?

CINDY L. JAMES
Thompson Rivers University
Kamloops, British Columbia, Canada
doi: 10.5054/tq.2010.222215

Most postsecondary educational institutions employ some type of language proficiency assessment to gauge the language skills of international applicants (Alderson, Krahnke, & Stansfield, 1987; Chalhoub-Deville & Turner, 2000; Kahn, Butler, Weigle, & Sato, 1994; Paltridge, 1992; Person, 2002; Rees, 1999; Roemer, 2002; Seaman & Hayward, 2000). The performance of these applicants is of interest to administrators, faculty, staff, and researchers alike, and gender variation is one issue often studied. Such studies tend to compare the performance of females and males in terms of mean test scores by subtest and/or total test score, and in some cases by specific test questions or question types.

Score differences are often reported as raw score differences, but to enhance comparability between tests they can also be expressed in a standardized form such as a standard mean difference or a percent difference. The standard mean difference, denoted by D, is considered by Willingham and Cole (1997), in their meta-analysis of gender and assessment, to be one of the most common measurements. It is calculated by subtracting the male mean score from the female mean score and dividing the difference by the average standard deviation (SD):

$$D = \frac{\text{female mean} - \text{male mean}}{\text{average } SD}$$

Referring to the classifications of Cohen (1988), values of D from 0.20 to 0.49 indicate small differences, values from 0.50 to 0.79 medium differences, and values of 0.80 or higher large differences. When standard deviations are not available, a percent difference can be employed instead: the difference between the female and male means is divided by the total score of the test and converted to a percentage:

$$\%\text{ Difference} = \frac{\text{female mean} - \text{male mean}}{\text{total test score}} \times 100\%$$

To date, many gender studies of language proficiency tests have revealed stronger performances by females than by males, although these differences in general tend to be quite small. For instance, Zeidner (1987) explored the impact of gender and other factors on scores for an English language aptitude test used for selection and placement at Israeli educational institutions. His research revealed that mean test scores differed significantly, with females, in general, scoring higher than males. Although standardized differences were not provided, they could be calculated post hoc: the standard mean difference was 0.17 and the percent difference 1.36%.

Females also scored slightly higher on the academic examination of the International English Language Testing System (IELTS), based on data from 2004 (University of Cambridge, 2006). This testing system includes listening, reading, writing, and speaking sections, and in each case the mean band scores for females were greater than those for males, as was the overall test score. Once again, standardized differences were not included in this IELTS report, but the percent differences could be calculated post hoc from the data provided: 1.44% for the reading test, 1.56% for the listening and speaking tests, 1.78% for the writing test, and 1.67% for the total test score.
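Expressed in code, the two effect-size measures are straightforward. The short Python sketch below is written for this summary rather than taken from any of the cited studies; the example values are chosen to match the Reading Skills results reported later in Table 1.

```python
def standard_mean_difference(female_mean, male_mean, female_sd, male_sd):
    """D: female mean minus male mean, divided by the average SD."""
    average_sd = (female_sd + male_sd) / 2
    return (female_mean - male_mean) / average_sd

def percent_difference(female_mean, male_mean, total_test_score):
    """Mean difference expressed as a percentage of the total test score."""
    return (female_mean - male_mean) / total_test_score * 100

# Values matching the Reading Skills results in Table 1 below:
print(standard_mean_difference(82.5, 77.6, 21.54, 25.63))  # ~0.21, "small" per Cohen (1988)
print(percent_difference(82.5, 77.6, 120))                 # ~4.08%
```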
Similarly, a data report by the Educational Testing Service (2007) for the Test of English as a Foreign Language Internet-Based Test (TOEFL iBT) revealed that, overall, females scored marginally higher than males on tests conducted between September 2005 and December 2006. At the individual test level, the mean scores for females were higher than those for males on three of the four sections—listening, speaking, and writing—but the reverse was true for the reading test. This TOEFL iBT score report provided means and standard deviations, so standardized differences could be calculated post hoc. The standard mean and percent differences between females and males on the subtests were as follows: 0.04 and 1.0% for listening, 0.21 and 3.7% for speaking, 0.09 and 1.7% for writing, and −0.05 and −1.3% for reading. This equated to an overall standard mean difference of 0.04 and a percent difference of 0.8%.

Similar results were reported for the Michigan English Language Assessment Battery (MELAB), with the mean final MELAB score again being higher for females than for males, based on scores collected in 2007 (Johnson & Song, 2008). The final MELAB score is calculated by averaging the scores on the three compulsory sections of the MELAB—listening, reading, and writing. Scores by gender for these individual sections were not provided in the report, so post hoc analysis was possible only on the final mean scores, yielding a standard mean difference of 0.12 and a percent difference of 1.2%.

A comparison of female and male mean scores on the Canadian Academic English Language (CAEL) assessment also showed females scoring slightly higher on all four of the language subtests—writing, listening, reading, and speaking—based on candidate scores from 2002 to 2008 (Carleton University, 2009). Post hoc analysis of the mean differences produced standard mean differences of 0.18 for writing, 0.17 for listening, 0.20 for reading, and 0.23 for speaking, equating to percent differences of 3.0% for writing, 2.8% for listening, 3.2% for reading, and 3.8% for speaking.

For the studies that have analyzed gender differences on specific questions or question types, results vary. For example, a study conducted by Pae (2004) examining the effect of gender on an English reading comprehension test for Korean learners revealed that items with content relating to mood, impression, and tone tended to be easier for females, whereas passages involving logical inference were easier for males. A study by Takala and Kaftandjieva (2000) involving an English vocabulary test—one of the subtests of the Finnish Foreign Language Certificate Examination—likewise revealed that some test items tended to favour females whereas others favoured males; however, they concluded that the test as a whole was gender neutral. Meanwhile, a review of writing assessments used for postsecondary admission by Breland, Bridgeman, and Fowles (1999) found that males tended to do better than females on the multiple-choice subtests of the TOEFL, whereas females performed better on the essay portion. Another TOEFL study, by Wainer and Lukhele (1997), investigated gender differences on reading comprehension testlets—sections with several questions based on one reading passage—and found essentially no gender differences (the greatest difference was <3%).

Based on these reports and studies, females on average appear to score higher than males on language proficiency exams. However, these differences are minimal and in some cases do not apply to all subtests of the various language-testing batteries.
To verify this trend using a different language proficiency test—the College Board's Accuplacer English as a second language (ESL) testing system—and to explore gender differences in specific skill areas based on subtest scores, data from several years of testing were collected and analyzed to ascertain how females performed on the Accuplacer ESL tests compared with males.

METHOD

This study took place at Thompson Rivers University (TRU) in Kamloops, British Columbia, Canada—a public educational institution offering a variety of university, college, and technical programs, including a comprehensive, preparatory English as a second or additional language (ESAL) program. Data were collected from 494 ESAL students tested over a 2-year period (2006–2008). These students ranged in age from 16 to 54 years, with an average age of 22.0 years (SD = 3.80). The gender distribution was fairly even, with 57% males and 43% females. The ethnic background of these students was diverse, with representation from 47 different countries.

Testing Tool and Procedures

The Accuplacer ESL testing system is a Web-based program marketed by the College Board that assesses the English skills of students who have learned English as a second or alternate language. It consists of four multiple-choice tests: Reading Skills, which evaluates the student's comprehension of short passages; Language Usage, which measures grammar and usage; Sentence Meaning, which assesses the understanding of word meanings in one- or two-sentence contexts; and Listening, which measures the ability to listen to and understand one or more speakers (College Examination Board, 2007). All of these tests are offered in a computer-adaptive format in which the pool of test items is calibrated for difficulty and content. When the candidate starts each test, she or he is presented with a question of average difficulty, randomly selected from several starter questions of the same level of difficulty. If the examinee answers the question correctly, the next question administered is chosen from a group of somewhat more difficult questions, whereas an incorrect answer causes the next question to be somewhat easier. Each of the four tests consists of 20 questions offered in this adaptive format and scored on a scale of 20 to 120.
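To make the adaptive routing concrete, here is a minimal toy simulation of the up/down logic just described. It is purely illustrative: Accuplacer's actual item selection, difficulty calibration, and scoring are proprietary and far more sophisticated (typically grounded in item response theory), and the difficulty levels and examinee model below are invented for the sketch.

```python
import random

def run_adaptive_test(answer_fn, num_questions=20, levels=9):
    """Toy version of the routing described above: start at average
    difficulty, step up after a correct answer, step down after an
    incorrect one. `answer_fn(difficulty) -> bool` stands in for the
    examinee; all numbers here are invented, not Accuplacer's.
    """
    difficulty = levels // 2  # begin with an average-difficulty starter item
    history = []
    for _ in range(num_questions):
        correct = answer_fn(difficulty)
        history.append((difficulty, correct))
        # Route to a somewhat harder item after a correct answer,
        # a somewhat easier one after an incorrect answer.
        difficulty = min(levels - 1, difficulty + 1) if correct else max(0, difficulty - 1)
    return history

# Simulated examinee who succeeds more often on easier items.
examinee = lambda d: random.random() < 1.0 - d / 10
print(run_adaptive_test(examinee))
```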
The Accuplacer ESL testing system also includes the WritePlacer ESL test, which requires the examinee to complete a writing sample based on a randomly assigned prompt. This test is scored by IntelliMetric—an electronic scoring system that uses a holistic scoring process designed to emulate human scorers—rating each essay on a holistic scale (maximum score of 6) based on its overall effectiveness (Elliot, 2003). Detailed descriptions of this testing system and each test can be obtained from the Accuplacer Web site (College Examination Board, 2008).

The Accuplacer ESL testing system was adopted by TRU for several reasons, including its ability to assess a wide range of skill levels through its computer-adaptive format and its coverage of all relevant skill areas. Other reasons related to the efficiency and cost of the system—namely, it was fairly straightforward to administer, the results were available immediately, and it was relatively inexpensive.

The Accuplacer ESL tests were administered in computer labs by the TRU Assessment Centre staff to international students during their first week of orientation at TRU, before classes began. For the multiple-choice tests there was no time limit; however, for the writing sample a 1-hour time limit was imposed. In most cases, students took approximately 2 hours to complete all five tests; however, some took only 90 minutes, and others took a full 3 hours.

Data Collection and Analysis

The raw test scores for each of the five Accuplacer ESL tests were collected from students who wrote the assessment during the international intakes (fall, winter, and summer) from May 2006 to May 2008, along with basic demographic information. To determine whether there were any differences in mean scores by gender for the five Accuplacer ESL tests, an analysis of variance (ANOVA) was employed. The standard mean differences, percent differences, and intercorrelations also were calculated for comparison purposes. A line graph of the mean test scores by gender was generated to provide a pictorial representation of the mean score distributions. The WritePlacer ESL test scores were adjusted by a factor of 20 so that they too would be out of a total of 120 and hence could be plotted on the same line graph as the other tests.

RESULTS

The descriptive and mean difference statistics for the test scores by gender are provided in Table 1. The distributions of the test scores were normal or near normal, with slight skewness. For every test, females scored higher than males; however, based on the percent difference and the standard mean difference, these differences were quite small, as displayed in Table 1.

TABLE 1
Descriptive and Mean Difference Statistics for Accuplacer ESL Tests by Gender

                      Female (n = 211)     Male (n = 283)      Maximum
Accuplacer ESL test   Mean       SD        Mean       SD       score     %Difference   D
Listening             67.7*      16.23     66.1       19.39    120       1.42%         0.095
Language Usage        87.3*      20.88     81.7*      24.65    120       4.67%         0.245
Reading Skills        82.5*      21.54     77.6*      25.63    120       4.08%         0.207
Sentence Meaning      80.8       22.42     75.6       26.00    120       4.33%         0.214
WritePlacer ESL       2.87*      2.788     2.60*      1.773    6         4.50%         0.116

Note. SD = standard deviation; D = standard mean difference.
*Slightly skewed distributions.

FIGURE 1. Accuplacer ESL mean scores by gender. [Line graph of mean test scores]

A pictorial representation of the mean test scores for the Accuplacer ESL tests by gender is provided in Figure 1, and the ANOVA results are provided in Table 2. As shown in Figure 1, both females and males scored highest on the Language Usage test and lowest on the WritePlacer ESL test. The ANOVA revealed that the differences between mean test scores were not significant for the Listening or WritePlacer ESL tests but were significant for the Language Usage, Reading Skills, and Sentence Meaning tests (Table 2).

TABLE 2
ANOVA Analysis of Accuplacer ESL Mean Test Scores by Gender

                      Mean test score
Accuplacer ESL test   Female     Male      Total     F        Significance
Listening             67.75      66.08     66.79     1.025    0.312
Language Usage        87.30      81.69     84.08     7.060    0.008
Reading Skills        82.51      77.59     79.69     5.088    0.025
Sentence Meaning      80.82      75.60     77.83     5.478    0.020
WritePlacer ESL       2.87       2.60      2.71      3.053    0.081
n                     211        283       494

The Pearson correlations between the five tests were large (Cohen, 1988) and significant for both females and males, for all combinations of tests, at the 0.01 level (2-tailed; Table 3). In all cases, the male subgroup produced the larger correlations, ranging from 0.648 to 0.839; for the females, the correlations ranged from 0.539 to 0.811. For both genders, the smallest correlations were between the Listening and WritePlacer ESL tests, and the largest were between the Reading Skills and Sentence Meaning tests (Table 3).

TABLE 3
Pearson Correlations Between Test Scores by Gender

Accuplacer ESL test   Listening        Language Usage   Reading Skills   Sentence Meaning
Language Usage        0.660* (F)
                      0.728* (M)
Reading Skills        0.727* (F)       0.744* (F)
                      0.758* (M)       0.798* (M)
Sentence Meaning      0.764* (F)       0.739* (F)       0.811* (F)
                      0.813* (M)       0.824* (M)       0.839* (M)
WritePlacer ESL       0.539* (F)       0.601* (F)       0.643* (F)       0.616* (F)
                      0.648* (M)       0.715* (M)       0.733* (M)       0.699* (M)

Note. F = female; M = male.
*Correlation is significant at the 0.01 level (2-tailed).
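For readers who wish to run this style of analysis on their own score data, the Python sketch below performs a one-way ANOVA by gender for each test, computes within-gender Pearson correlations, and plots the rescaled mean-score profile, mirroring Tables 2 and 3 and Figure 1. It assumes a hypothetical pandas DataFrame with one row per examinee, a "gender" column coded "F"/"M", and one column per test; the column names are illustrative, not taken from the study's actual data files.

```python
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

# Hypothetical column names -- the study's actual data files are not public.
TESTS = ["listening", "language_usage", "reading_skills",
         "sentence_meaning", "writeplacer"]

def analyze_by_gender(df: pd.DataFrame) -> None:
    """ANOVA per test plus within-gender Pearson correlations and a
    Figure 1-style mean-score profile."""
    females = df[df["gender"] == "F"]
    males = df[df["gender"] == "M"]

    # With two groups, a one-way ANOVA is equivalent to an
    # independent-samples t-test (F equals t squared).
    for test in TESTS:
        f_stat, p_val = stats.f_oneway(females[test].dropna(),
                                       males[test].dropna())
        print(f"{test:>18}: F = {f_stat:6.3f}, p = {p_val:.3f}")

    # Subtest intercorrelations computed separately for each gender.
    for label, group in (("Female", females), ("Male", males)):
        print(f"\n{label} intercorrelations:")
        print(group[TESTS].corr(method="pearson").round(3))

    # Mean-score profile as in Figure 1: WritePlacer scores (out of 6)
    # are multiplied by 20 so all five tests share the 120-point scale.
    means = (df.assign(writeplacer=df["writeplacer"] * 20)
               .groupby("gender")[TESTS].mean())
    means.T.plot(marker="o")
    plt.ylabel("Mean score (out of 120)")
    plt.show()
```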
DISCUSSION

In this study, on average, females scored higher than males on the Accuplacer ESL tests, with significant differences in mean scores for three of the five tests: Language Usage, Reading Skills, and Sentence Meaning. Although these differences were statistically significant, based on the standard mean differences and percent differences they were quite small. These results are akin to those reported in most of the aforementioned studies, with females scoring slightly higher overall on other language tests such as the IELTS (University of Cambridge, 2006), TOEFL iBT (Educational Testing Service, 2007), and MELAB (Johnson & Song, 2008).

In terms of skill areas, this study found that the subtests measuring reading and vocabulary skills produced the greatest differences in mean scores (0.207 ≤ D ≤ 0.245), followed by writing (D = 0.116) and then listening (D = 0.095; Table 1). The CAEL exam exhibited a similar pattern, with the greatest mean difference after the speaking test being for reading (D = 0.20), followed by writing (D = 0.18) and then listening (D = 0.17) (Carleton University, 2009). However, for the TOEFL iBT exam, males scored slightly higher than females on the reading test (D = −0.05) (Educational Testing Service, 2007). Aside from this test, the trend of females scoring higher than males remained intact for the other TOEFL iBT subtests, with the writing test producing the greatest difference between mean scores after the speaking test (D = 0.09) and the listening test producing the smallest difference (D = 0.04; Educational Testing Service, 2007). These results indicate that the overall differences in language-testing scores by gender may be attributable to particular skill areas. Specifically, females appear to perform noticeably better than males on subtests related to reading or vocabulary skills but only marginally better on tests related to writing and listening skills. However, the existing data are insufficient to make any explicit claims at this time, especially considering the incongruous results from the TOEFL iBT exam.

The impact of test format is also ambiguous. In this study, question format did not seem to matter, because females scored higher on both the multiple-choice tests and the written section. However, in the study by Breland et al. (1999), females scored higher on the essay portion of the TOEFL but lower than males on the multiple-choice tests. Again, more research is needed to determine whether test format is a factor in performance on language proficiency tests by gender.

Given the similar results from this and the other published studies, it is very plausible that the differences between the two groups reflect real skill differences. This conjecture is supported in this study specifically by the mean score distributions for males and females (Figure 1) and by the existence of the large, significant correlations and their differing values between all combinations of the subtests by gender (Table 3).
These data attest to the validity and accuracy of the Accuplacer ESL testing system and its purpose—to assess the English skills of students who have learned English as a second or alternate language—in several ways. First, the patterns of the distribution of the mean test scores for females and males are nearly identical, with the highest mean scores on the Language Usage and Reading Skills tests and the lowest on the Listening and WritePlacer ESL tests (Figure 1). Hence none of the subtests appears to favour either gender. By diminishing the possibility of gender bias, these findings enhance the probability that true skill differences exist. Second, the large, significant correlations between all subtests (Table 3) indicate that if a female or male candidate scored lower on any one of the subtests, such as Listening, she or he also tended to score lower on all the other tests, and vice versa. Thus this testing system appears to provide a consistent measurement of language skills for both genders. Finally, in this testing battery the Reading Skills and Sentence Meaning tests are the two subtests that have the most in common in terms of measuring analogous language skills—namely, reading comprehension and reading vocabulary. Hence the correlations between these two tests should be greater than any other correlation by gender. Because this was exactly the case in this study for both males and females (r_m = 0.839 and r_f = 0.811, respectively), these correlations verify the soundness of this assessment tool. Similarly, although still large, the correlations between the Listening and WritePlacer ESL tests were, for both genders, the lowest of all the subtest correlations (r_m = 0.648 and r_f = 0.539). Given that these two tests measure two distinct language skills—namely, listening and writing—and therefore differ the most out of all the subtests, such results also would be anticipated and further validate the accuracy of the assessment. Therefore, based on this validity evidence, it would appear that the Accuplacer ESL testing system does what it purports to do. Consequently, on the basis of these analyses, it is reasonable to conclude that language proficiency test scores differ by gender, with females on average outperforming males, and to surmise that these differences equate to real skill differences.

LIMITATIONS AND CONCLUSION

It should be noted, however, that there are several limitations in this study that prevent the formation of any definitive conclusions. First, this study did not control for the diversity of the students. As mentioned earlier, the 494 subjects in this study originated from 47 different countries, including representation from Africa, Asia, Europe, the Middle East, and South and Central America. Hence the subjects had varying cultural and educational backgrounds, which may have affected their performance; however, because of the sample size and the lack of background information, it was not feasible to account for these differences. Another limitation is that it was not possible to assess the performance of females and males on specific test questions, because the computer-adaptive nature of the Accuplacer testing system prevents administrators from gathering scoring information on individual questions for the multiple-choice tests.

Hence, to elucidate these findings, further research is necessary. To begin with, this study needs to be replicated at TRU and at other institutions to determine whether the results are consistent over time and with other populations. Such studies should endeavour to control for ethnicity,
because ethnicity may have had an influence on these results. Moreover, it would be beneficial if an item analysis study could be conducted in collaboration with the College Board. In addition, to enhance interpretation, all future studies should provide comparative statistics such as the standard mean difference and the percent difference.

Finally, because a pattern of females outperforming males on language proficiency exams has been revealed in other studies and verified by this study, it is important that subsequent studies delve into why this is happening and ascertain the implications of such differences. When investigating the reasons behind these differences, one factor that definitely should be studied is the candidates' educational backgrounds, particularly their previous language-training experiences. Some specific questions to consider are as follows: How does the schooling of males and females compare during the K–12 years in their native countries? Are females streamed into the social sciences and languages and males into the sciences and math, and, if so, what impact does this have on their second-language acquisition skills? How effective are the ESAL courses and programs offered in their native countries? Do participation and success rates in these programs differ by gender? Is there any gender bias in the curricula and pedagogy of these courses or programs? Besides answering questions about gender differences in language proficiency test scores, research into these types of questions also may generate insights into other issues, such as why writing skills appear to be weaker than any of the other ESAL skills for both males and females, based on the data from this study as well as data from the CAEL exam (Carleton University, 2009) and the IELTS exam (University of Cambridge, 2006).

Regarding the implications of these gender differences, again questions surrounding educational context would be paramount. First, it is necessary to determine whether these differences have an impact on the candidates' postsecondary education. In particular, do they interfere with acceptance and success rates? If these are true skill differences, it would be reasonable to assume that females would perform better in ESAL courses compared with males, but is this necessarily true? Perhaps by the end of these programs gender differences dissipate. What about other subject areas? Do these differences equate to performance differences in academia? Do they influence the students' abilities to adjust to their new social environment, and so on?
The answers to these questions and others should provide an explanation of why females tend to score higher than males on language proficiency tests and should determine the necessity of and direction for possible interventions. In the meantime, postsecondary institutions can take a proactive stance by providing supplemental language instruction or tutoring for any ESAL students currently struggling with language issues. Moreover, test designers and those administering tests must continue to monitor performance on language proficiency tests to ensure that testing accurately measures language skills for all examinees, regardless of gender or any other dimension such as ethnicity, age, or social status.

THE AUTHOR

Cindy L. James has more than 20 years of postsecondary administrative and instructional experience and currently is the coordinator of the Thompson Rivers University Assessment Centre in Kamloops, British Columbia, Canada.

REFERENCES

Alderson, J. C., Krahnke, K. J., & Stansfield, C. W. (Eds.). (1987). Reviews of English language proficiency tests. Washington, DC: Teachers of English to Speakers of Other Languages.

Breland, H. M., Bridgeman, B., & Fowles, M. E. (1999). Writing assessments in admission to higher education: Review and framework (Report No. 99-3). New York, NY: College Entrance Examination Board.

Carleton University. (2009). CAEL test score and users' guide. Ottawa, Canada: Author. Retrieved from http://www.cael.ca/edu/testuserguide.shtml

Chalhoub-Deville, M., & Turner, C. E. (2000). What to look for in ESL admission tests: Cambridge certificate exams, IELTS, and TOEFL. System, 28, 523–539.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.

College Examination Board. (2007). ACCUPLACER ESL. New York, NY: Author. Retrieved from http://www.collegeboard.com/student/testing/accuplacer/accuplacer-esltests.html

College Examination Board. (2008). ACCUPLACER ESL. New York, NY: Author. Retrieved 2008 from http://www.collegeboard.com/student/testing/accuplacer/index.html

Educational Testing Service. (2007). Test and score data summary for TOEFL Internet-based test: September 2005–December 2006 test data. Princeton, NJ: Author. Retrieved from www.ets.org/toefl

Elliot, S. (2003). IntelliMetric: From here to validity. In M. D. Shermis & J. Burstein (Eds.), Automated essay scoring: A cross-disciplinary perspective (pp. 71–86). Mahwah, NJ: Lawrence Erlbaum.

Johnson, J. S., & Song, T. (2008). MELAB 2007 descriptive statistics and reliability estimates. Ann Arbor, MI: English Language Institute, University of Michigan.

Kahn, A. B., Butler, F. A., Weigle, S. C., & Sato, E. Y. (1994). Adult ESL placement procedures in California: A summary of survey results. Adult ESL assessment project. Sacramento, CA: California State Department of Education.

Pae, Y.-I. (2004). Gender effect on reading comprehension with Korean EFL learners. System, 32, 265–281.

Paltridge, B. (1992). EAP placement testing: An integrated approach. English for Specific Purposes, 11, 243–268.

Person, N. E. (2002). Assessment of TOEFL scores and ESL classes as criteria for admission to career & technical education and other selected Marshall University graduate programs (Master's thesis). Available from ERIC database. (ED 473 756)

Rees, J. (1999). Counting the cost of international assessment: Why universities may need to get a second opinion. Assessment & Evaluation in Higher Education, 24, 427–438.

Roemer, A. (2002). A more valid alternative to TOEFL. College and University Journal, 77, 13–17.
Seaman, A., & Hayward, L. (2000). Standardized ESL test equating study: Equating the CELSA and the NYSPT with the MELT SPLs (ED 445084). New York, NY: Institute for Education and Social Policy.

Takala, S., & Kaftandjieva, F. (2000). Test fairness: A DIF analysis of an L2 vocabulary test. Language Testing, 17, 323–340.

University of Cambridge. (2006). IELTS test performance data 2004. Research Notes, 23, 13–15. Retrieved from www.ielts.org

Wainer, H., & Lukhele, R. (1997). How reliable are TOEFL scores? Educational and Psychological Measurement, 57, 741–758. Retrieved from http://epm.sagepub.com

Willingham, W. W., & Cole, N. S. (1997). Gender and fair assessment. Mahwah, NJ: Lawrence Erlbaum.

Zeidner, M. (1987). A comparison of ethnic, sex and age bias in the predictive validity of English language aptitude tests: Some Israeli data. Language Testing, 4, 55–71.