Rater Sensitivity to Qualities of Lexis in Writing

RACHAEL RUEGG
Kanda University of International Studies
Chiba, Japan

ERIK FRITZ
Kanda University of International Studies
Chiba, Japan

JENNIFER HOLLAND
Kanda University of International Studies
Chiba, Japan

An examination of the scores given using analytic rating scales can provide insight into how well the scales are measuring what they are supposed to be measuring. Many analytic rating scales include separate scales for lexis and grammar, yet the distinction between lexis and grammar has not been widely researched or defined. It is therefore unclear whether separate lexis and grammar scales are appropriate. For this reason, the lexis scores for 140 timed essays on a Japanese university English proficiency test were analyzed to determine what portion of variance is accounted for by different lexical qualities. In this test, which was administered to incoming university students at the beginning of the academic year, it was found that lexical accuracy is predictive of lexis scores. The lexis scores, however, are predicted by the scores on the grammar scale much more than by range, frequency, or even accuracy of lexis in the essays. The difficulties in separating lexis from grammar when rating writing are discussed. Finally, limitations of the study and suggestions for further research are given.

doi: 10.5054/tq.2011.240860

The present study focuses on the way in which raters of timed essays assess lexis. The data were collected from the writing section of the Kanda English Proficiency Test (KEPT).[1] Research on the test has been published in Language Testing (Bonk & Ockey, 2003), and the test has been found to be highly predictive (74%) of the TOEFL (Bonk, 2001). The writing section consists of a single timed essay written by examinees based on a prompt. The essay is double-rated by qualified and experienced English language instructors based on the KEPT essay rating scales. The ratings are then scaled using Rasch modelling; the scaling determines the comparative severity of each rater and adjusts the scores accordingly. See Appendix A for the essay rating scales.

[1] The Kanda English Proficiency Test is a general English proficiency test given to incoming freshmen at the beginning of the academic year for the purpose of placement, and to freshman and sophomore students at the end of the academic year to measure the progress of individual students and for the purpose of curriculum evaluation. It consists of multiple-choice reading and listening sections, an essay writing section, and a speaking test that takes the form of a group discussion.

Although the data were collected from the KEPT, a test not widely used, the way in which raters assess lexis in writing is an area which should be of interest to a broad range of English language educators. Many tests of English as a second language (ESL)/English as a foreign language (EFL) writing assess vocabulary using a separate analytic scale. For example, in both Task 1 and Task 2 of the writing section of IELTS (n.d.), examiners use analytic rating scales to assign four separate scores, including one for lexical resource. In the ESL composition profile (Jacobs, Zinkgraf, Wormuth, Hartfiel, & Hughey, 1981), five separate scores are assigned, including one for vocabulary. The ELTT rating scale for writing, developed by the Austrian University ELTT Group (n.d.), has four separate scales, one of which is vocabulary. In addition, many tests of ESL/EFL writing which use holistic scoring mention vocabulary
as one factor for raters to consider when assigning a score; for example, the TOEFL (ETS, n.d.) paper-based writing test mentions vocabulary as one factor for consideration. Furthermore, classroom teachers often include vocabulary as one of their criteria for rating students' writing. Therefore, the question of how raters assess vocabulary has implications not only for testing but also for classroom assessment.

Although vocabulary in EFL/ESL writing is often assessed, it is unclear how well raters are able to assess the range and accuracy of vocabulary usage. Furthermore, members of the KEPT research group[2] have suggested that raters may base the lexis scores more on the frequency of the words used than on either the range or accuracy of their use. Discussion with raters during rater norming sessions also suggests that the presence of low frequency words in an essay may be sufficient to artificially inflate an examinee's score on the lexis scale even when, overall, the lexis is insufficient in terms of accuracy and range. In addition to this concern, anecdotal evidence suggests that raters have difficulty distinguishing between lexis and grammar, and it has been noticed that raters of the KEPT often give the same score for the two scales. This study endeavors to reveal whether raters are more sensitive to range, accuracy, or frequency of the words used when rating writing for lexis. The research question is: When rating writing for lexis using an analytic rating scale, are raters more sensitive to lexical range, lexical accuracy, or the frequency of the lexical items used?

[2] The KEPT research group is a collaborative research group which develops and administers the KEPT, as well as conducting research on the test and on classroom assessment practices.
REVIEW OF LITERATURE

Previous studies have discussed lexis in relation to speaking, but there has been very little research relating directly to the distinction between lexis and grammar in writing. Research by Batty (2006) found no correlation between examinees' vocabulary knowledge as measured by the Depth of Vocabulary Knowledge Test and their vocabulary scores for the speaking section of the KEPT. As a result of Batty's research, the vocabulary scale and the grammar scale were collapsed into a single lexicogrammar scale for the speaking section of the KEPT. The present study aims to determine whether similar measures should be taken for the KEPT writing scales. By examining the ratings alongside various measures of lexical quality, we attempt to determine what the lexis scale is actually measuring and decide the best course of action.

When thinking about what constitutes quality writing, vocabulary is of special importance for second language writers, because it may be the crucial factor in determining the reader's ability to understand and evaluate the writing. Evaluating timed writing samples, for example, requires a rater not only to be familiar with the assessment scales and the population being rated, but also to perform the more difficult task of reserving personal expectations about what constitutes quality writing. Repetition of lexical items, for instance, tends to be acceptable in Japanese writing (Wakabayashi, 1992), whereas, in English, writers use synonyms and rephrase previously mentioned ideas. In addition, rhetorical structure in writing can highlight differences between cultures related to how and in what order ideas are explained and presented. Even though expectations can be moderated through training and norming processes (Lumley, 2002), raters may still interpret coherence and repetition of words differently.

Bacha (2001) examined timed essays written primarily by native Arabic speakers and found that, out of five different categories, students were rated especially low on vocabulary performance. What is not certain, however, is what warrants each score given for the vocabulary scale. In other words, what characteristics do essays with higher lexis scores have?
This study intends to examine ratings given using a lexis scale in order to understand what raters are sensitive to when rating vocabulary in second language timed writing samples.

One way to determine the quality of a piece of writing is to measure the richness of lexical items: in other words, the degree to which a writer is accurately using a varied and large vocabulary (Laufer & Nation, 1995). Engber's (1995) study of raters' holistic scoring on timed writing tasks found significant correlations between the percentage of lexical variation (minus lexical error) and the quality scores (r = 0.57, p < 0.01). In that study, lexical errors were counted for both meaning and form. Thus the fewer lexical errors there were in the writing samples and the more varied the lexical items, the higher the readers scored them, as would be expected. It is clear that the raters in this study paid particular attention to lexical error when assigning scores. Santos (1988) measured the reactions of 178 university instructors to two compositions written by nonnative speakers of English. The instructors rated lexical errors as the most serious type of error and suggested that more emphasis be placed on lexical selection.

Separating Lexis From Grammar

In Halliday's (2004) lexicogrammar cline, lexis and grammar are presented as two ends of a single continuum rather than as distinct categories. The researchers wondered whether raters were able to separate lexis from grammar when rating writing, and that is the main issue that prompted this study. In order to ascertain this, a distinction had to be made so that it could be checked whether lexical qualities were being assessed via the lexis scale. Therefore, the researchers decided to remove function words from consideration as lexical errors, because they are closer to the grammatical than the lexical end of the continuum.

Recent studies and theories based on corpus linguistics have called into question the traditional view that vocabulary is separate from grammar (see Biber, Conrad, & Cortes, 2004; Gries & Stefanowitsch, 2004; Hoey, 2004, 2005; Hoey & O'Donnell, 2008; Hunston, 2008; Römer, 2009; Sinclair, 1991, 2004). Römer's (2009) article surveyed several influential authors' arguments and evidence to support the position that grammar and lexis are inseparable. Hoey's (2005) theory of the lexicon, for example, called "lexical priming," "can be seen as reversing the traditional relationship between grammar as systematic and lexis as loosely organised, amounting to an argument for lexis as systematic and grammar as more loosely organised" (p. 9). Hoey (2004) argued that every word is "primed" for use in discourse as a result of an individual's cumulative interactions with the word. Lexical units "are primed to occur in, or avoid, certain grammatical functions or structures" (p. 1).

Rather than viewing texts as random words fitting into slots limited only by grammaticality, Sinclair (1991) wrote that "all the evidence points to an underlying rigidity of phraseology, despite a rich superficial variation" (p. 121). Sequences of four words frequently appearing in spoken and written registers, called "lexical bundles," were examined by Biber et al. (2004). These lexical bundles were found to be "readily interpretable in both structural and functional terms" (p. 399). Hunston (2008) also examined sequences of words that occur in corpora while examining the interface between lexis and grammar, with particular
focus on "prepositions in the identification of phraseology . . ." (p. 283). Gries and Stefanowitsch (2004) examined pairs of semantically similar grammatical constructions, or "alternations," and the words that occur in them, for example "the university's budget" and "the budget of the university" (p. 98). In some cases, the results show that "each of the two members of the alternating pair is a construction in its own right with its own meaning" (p. 124).

When determining a lexical versus a grammatical error in this study, the researchers decided to categorize lexical chunks, or what Biber et al. (2004) called "multi-word prefabricated expressions" (p. 372), such as "on the other hand" or "as soon as possible" (see Appendix B), as single lexical units. Römer (2009) stated that "corpus studies, based on large collections of authentic text from a range of different sources, have provided mass evidence for the interdependence of lexis and grammar" (p. 141). Taking into account these recent corpus-based studies and theories, the practice of separating lexis and grammar when rating a piece of writing has become even more worthy of inquiry.

Lexical Accuracy

Because there are separate rating scales for grammar and lexis on the KEPT writing test, the researchers made specific choices about what constituted a lexical versus a grammatical error (see Appendix B). The accuracy of lexis in writing pertains to a writer's knowledge of a word, which includes form, meaning, and function. In this study, accuracy of lexical items was judged in terms of idiom usage, contextually appropriate word choice, word class, and spelling (only if it impeded meaning).

Previous studies have defined vocabulary errors in a number of different ways. Chastain (1990), studying the characteristics of graded and ungraded compositions by Spanish majors, defined a vocabulary error as "[a] a wrong word; [b] a missing word; or [c] an extra word" (p. 11). Kobayashi and Rinnert (1992), in their study on the effects of first language on second language writing, considered three kinds of errors: lexical choice, awkward formation of phrases and sentences, and transitional problems. Incorrect lexical choice was defined by the authors as "inappropriate or incorrect use of a word that led to obscurity or misunderstanding of a writer's intended meaning" (p. 191). In examining the relationship between syntactic complexity and overall accuracy in the written English of adult foreign language learners, Bardovi-Harlig and Bofman (1989) separated errors into three categories: syntactic (e.g., word order), morphological (e.g., errors in nominal and verbal morphology), and lexical-idiomatic (i.e., vocabulary). The latter was the only category not clearly defined by the authors.

The ESL Composition Profile of Jacobs et al. (1981) is a 100-point scale that has been used to evaluate the writing proficiency of second and foreign language learners (Astika, 1993; Bacha, 2001; Hedgcock & Lefkowitz, 1992). A top vocabulary score of 18-20 points is assigned to a composition demonstrating "sophisticated range, effective word/idiom choice and usage, word form mastery, and appropriate register" (p. 90). The studies above show varied methods of determining what constitutes a vocabulary error but provide little rationale as to how or why the determination was made, which is why the researchers of the current study had little to refer to in deciding what exactly counts as a lexical error.

Lexical Frequency
The researchers expected that the wider the range of lexis used in the timed writings, the higher the score would be. Receptive vocabulary use, associated with listening and reading, and productive use, associated with speaking and writing, have different learning burdens (Nation, 2001). Nation and Waring (1997) estimated that a vocabulary of the most frequent 2,000 words would be acceptable for everyday conversation, covering about 96% of spoken discourse, whereas a much larger number of words would be needed for reading texts. Because writing is generally more lexically dense than speech, there is a built-in expectation that the range of lexis will be more varied than is typical in everyday conversation. The differences between spoken and written discourse are further highlighted by the contrasting number of high-frequency words used. This study focuses on whether raters gave higher lexis scores to essays with lower frequency words. In addition to overall sophistication of vocabulary, this study also gave each writer an average score of vocabulary frequency using Nation's (n.d.) word frequency lists.

Evaluating Lexical Qualities

It is intuitive that lexical richness and accurate vocabulary usage, among other factors, can aid in determining the perceived quality of a piece of writing. The question of what role lexical richness plays in deciding the quality of a learner's writing has been examined in previous studies (see Engber, 1995; Laufer & Nation, 1995; Linnarud, 1986). However, in the studies undertaken by Engber (1995) and Linnarud (1986), holistic scales were used in conjunction with various other measures of lexical richness to determine writing quality, which is to say that vocabulary was not given a unique evaluation. The present study, on the other hand, examines the scores for a learner's vocabulary performance using an analytic scale which is specifically designed for assessing the lexis of an essay (see Appendix A for the rating scales). Read (2000) argued that analytic scales can direct raters' attention in a systematic way to particular aspects of writing, such as lexis, so that they arrive at a judgment for each one. However, there appears to be no published research exploring the connection between lexical richness and raters' judgments using analytic scales. Indeed, there are few studies in second language writing exploring the relationship between lexical statistics and ratings, even using holistic scales. This raises the questions: What specific features do raters focus on when making judgments about the quality of the vocabulary used? Are they more sensitive to lexical range, lexical accuracy, or the frequency of the lexical items used?
This study attempts to answer these questions. Accuracy and range of lexical usage, in addition to average frequency of words used and overall lexical sophistication, were the measures used to determine what raters seemed to value the most.

METHOD

A selection of 140 essays with a range of lexis scores from the March 2008 administration was taken as the study sample. An equal sample of 35 essays was taken at random from each of four score groups based on the composite Rasch-adjusted scores for the lexis scale: Group 1: 0-1, Group 2: 1-2, Group 3: 2-3, Group 4: 3-4. Two sample essays are included in the appendices: Appendix C is the essay with the lowest overall lexical quality, and Appendix D is the one with the highest overall lexical quality.

The essay prompt for the March 2008 administration was: "Keeping in touch with friends is important for everyone. Emailing, texting by cell phone and social networking sites like Mixi are popular but cause problems for society and among friends. People should communicate and keep in touch through other, more personal ways. Give your reaction to the above statement and support your answer with specific reasons and examples."

The examinees were 140 freshman students entering a foreign studies university in Japan. Their English language proficiency ranged from preintermediate to advanced level. It should be noted that, within English classes in the Japanese school system, grammar often dominates the curriculum, resulting in a focus on sentence-level concerns and a relative lack of spoken and written fluency. Questionnaire data (Ruegg & Koyama, 2010) showed that only 30% of incoming freshmen had ever written English language compositions prior to entering the university, and only 74% had written compositions in Japanese. Therefore, although the students' overall proficiency levels ranged from preintermediate upwards, their grammatical knowledge is comparatively high, whereas their writing ability is much lower.

The raters for the test were 45 native- or near-native-English-speaking lecturers at the university from the United States (16), England (8), Australia (7), Japan (5), Canada (4), Bulgaria, Ireland, Jamaica, New Zealand, and Scotland (1 each). Sixteen (35.6%) of the raters were female and the other 29 (64.4%) were male. All hold Master's degrees in TESOL, linguistics, or a related field. The researchers are also lecturers at the university.

The KEPT is a paper-based test, but for the purposes of the study, Word files were created from the sample essays for ease of analysis. The aim of the KEPT writing test is to demonstrate the ability to communicate ideas rather than to show perfect execution of language. Therefore, it was decided that spelling mistakes which did not impede comprehension should not adversely affect an examinee's score for lexis. Incomprehensible words, however, were removed. The number of tokens was recorded based on these corrected versions to provide a length variable for analysis. For the KEPT, an essay needs to be a minimum of 80 words in length to be scored.

The researchers also created a list of distinctions between lexical and grammatical errors, as shown in Appendix B. The essays were then analyzed for lexical errors by each of the researchers individually. Function words, according to the list in Nation (2001), were excluded from this error analysis; these words are considered to be grammatical and therefore were not counted as lexical errors. A word was counted as a lexical error only if it was marked by at least two of the three researchers. Multiple instances of a single error, that is, making the same error on the same lexical item within the essay, were counted only once. Error totals for each essay were then recorded.
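To make this tallying rule concrete, the following is a minimal sketch of our own (not the authors' procedure expressed in code; the data layout and example items are invented). Each researcher's markings for an essay are modeled as a set of flagged lexical items, and an item counts as an error only when flagged by at least two of the three researchers.

```python
# Hypothetical illustration of the error-counting rule: majority vote
# across three researchers, with repeats of the same error on the same
# lexical item counted only once (sets already collapse repeats).
from collections import Counter

def lexical_error_total(flags_by_researcher: list[set[str]]) -> int:
    """Each set holds the lexical items one researcher flagged in an essay."""
    votes = Counter()
    for flagged in flags_by_researcher:
        votes.update(flagged)  # count one vote per researcher per item
    # Keep only items flagged by at least two of the three researchers.
    return sum(1 for count in votes.values() if count >= 2)

# Example: three researchers' markings for one invented essay
flags = [
    {"affluence", "informations"},
    {"affluence"},
    {"affluence", "conflusigate"},
]
print(lexical_error_total(flags))  # -> 1 (only "affluence" gets 2+ votes)
```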
These procedures were decided upon after numerous norming sessions with discussion amongst the researchers, each session focusing on a small number of essays from the sample.

The essay files were analyzed using Paul Nation's RANGE software (Nation, 2005). To prepare the files to be run through the analysis program, hyphens, apostrophes, and non-sentence-terminating periods were removed. All three word lists available in the RANGE program were used, and a stop list was created to exclude all forms of words appearing in the essay prompt, because students are not given credit for using these words when essays are rated. Before further analysis, proper nouns, abbreviations, and non-English words which appeared on the output files as "not in the lists" were deducted from the total types. Further analyses were calculated on figures in the token category. Table 1 shows the output file for one essay.

Lexical frequency was also calculated based on the analysis of the types in each essay. First, each word list was assigned a value: Word list 1 = 1, Word list 2 = 2, Word list 3 = 3, Not in the lists = 5. The value of 5 was assigned to words not in the lists because the exact frequency value of each word appearing in this category was not known, and the researchers wanted to give credit for advanced vocabulary. That is to say, a word which was analyzed as not in the lists could be from the 5,000 word list or the 10,000 word list; there is no reliable way of determining which list it is from. Therefore, the words which were analyzed as not in the lists were often quite significantly less frequent than those in the three word lists. This is why it was decided to use the value of 5 rather than 4. Lexical frequency was calculated by converting the type percentages of words from each word list to decimals and then multiplying each by its corresponding frequency value. Table 2 demonstrates how the lexical frequency of the essay in Table 1 would be calculated. In addition, lexical sophistication was calculated by adding the total percentage of words in word list 3 and words not in the lists. In the case of the essay in Table 1, the lexical sophistication would be 2.86%.

TABLE 1
Sample RANGE Output

                      Token                Type
Word list             Number   Percent     Number   Percent     Families
One                   59       92.19       31       88.57
Two                   4        6.25        3        8.57
Three                 0        0.00        0        0.00
Not in the lists      1        1.56        1        2.86
Total                 64                   35                   32

TABLE 2
Lexical Frequency

Word list             Type (proportion)   Value   Total
One                   0.89                1       0.89
Two                   0.09                2       0.17
Three                 0.00                3       0.00
Not in the lists      0.03                5       0.14
Lexical frequency                                 1.20
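The frequency and sophistication measures are simple weighted sums, so they are easy to verify. The sketch below is ours, not the authors' code (RANGE itself is a standalone Windows program, and the function names here are invented); it reproduces the Table 2 arithmetic from the type percentages in Table 1.

```python
# Illustrative re-calculation of Tables 1-2: lexical frequency is the
# value-weighted sum of type proportions; lexical sophistication is the
# percentage of types in word list three plus those not in the lists.
LIST_VALUES = {"one": 1, "two": 2, "three": 3, "not_in_lists": 5}

def lexical_frequency(type_percents: dict[str, float]) -> float:
    """type_percents: percentage of types from each RANGE word list."""
    return sum((type_percents[k] / 100) * v for k, v in LIST_VALUES.items())

def lexical_sophistication(type_percents: dict[str, float]) -> float:
    return type_percents["three"] + type_percents["not_in_lists"]

# Type percentages for the essay shown in Table 1
percents = {"one": 88.57, "two": 8.57, "three": 0.00, "not_in_lists": 2.86}
print(round(lexical_frequency(percents), 2))  # -> 1.2  (matches Table 2)
print(lexical_sophistication(percents))       # -> 2.86 (%)
```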
Multiple regression was then performed using SPSS 15.0 to ascertain what raters were sensitive to when rating writing using the lexis scale.

RESULTS AND DISCUSSION

Multiple regression shows to what extent changes in each independent variable can predict changes in the dependent variable (lexis). The descriptive statistics in Table 3 show the mean, standard deviation, and the number of instances of each variable included in the analysis. The lexis figure represents the Rasch-adjusted score the essay received on the lexis scale, and the grammar figure represents the Rasch-adjusted score the essay received on the grammar scale. The actual scores for the lexis scale ranged from 0.05 to 3.99, and the actual scores for the grammar scale ranged from 0.51 to 4.00. Table 3 shows that the means and standard deviations for these two variables were comparable. The accuracy figure represents the number of lexical errors present in the writing. The accuracy figures ranged from zero to five, showing that there were relatively few of the errors which were distinguished as lexical errors in the sample of essays. The length figure represents the number of tokens in the writing. The length figures ranged from 80, which is the minimum length requirement of an essay in this test, to 300 tokens.

TABLE 3
Descriptive Statistics (N = 140)

Variable          Mean        Standard deviation
Lexis             2.0274      1.00169
Grammar           2.0654      0.87823
Accuracy          1.2357      1.1729
Length            141.8786    50.2644
Range             59.2143     21.35274
Frequency         1.2112      0.13158
Sophistication    4.9226      3.63271

The range figures represent the number of different types used in the writing, excluding types appearing in the prompt. The range figures ranged from 25 to 129, showing a great variety of differing productive vocabulary abilities. The frequency figures represent the average frequency in the English language of all the words used in the writing. The average frequencies ranged from 1.00 to 1.75, which shows that the students overall used mostly very high frequency vocabulary. However, it also needs to be considered that the frequency figure includes not only content words but also function words; excluding function words from this analysis may make a significant difference to these figures. The sophistication figures represent the percentage of words in the writing coming from the less frequent words in the English language (words in the 3,000-word list or less frequent). The sophistication scores ranged from 0.00 to 18.67.

As can be seen from the results of the multiple regression in Table 4, lexis scores are predicted by grammar scores much more than by the different lexical qualities of an essay.

TABLE 4
Multiple Regression

Variable          B         SE       Beta      t         Significance
Constant          0.815     0.669              1.219     0.225
Grammar           0.936     0.068    0.820     13.688    0.000*
Accuracy          -0.088    0.032    -0.103    -2.731    0.007*
Length            -0.002    0.002    -0.098    -1.174    0.242
Range             0.008     0.004    0.179     1.876     0.063
Frequency         -0.652    0.615    -0.086    -1.062    0.290
Sophistication    -0.008    0.023    -0.030    -0.354    0.724

Note. R2 = 0.827; SE = standard error; dependent variable = Lexis.
* Significant at the 0.05 level.

The accuracy scores represent the number of lexical errors in an essay. Because it is reasonable to assume that the more lexical errors an essay has, the lower the lexis score it should receive, one should expect to see a negative relationship between these two variables; for all of the other variables, one should expect to see a positive relationship. The negative relationship between length and lexis scores means that longer essays predicted slightly lower lexis scores. Likewise, essays that contained on average lower frequency words and had more sophisticated vocabulary had negative values, showing that they predicted slightly lower lexis scores when compared with essays with on average higher frequency words and less sophisticated vocabulary. The standardized beta value of 0.820 shows that a large majority of the variance in lexis scores can be explained by grammar scores. In addition to grammar scores, but to a lesser degree, lexical accuracy is also significantly predictive of lexis scores, and the extent to which lexical range predicts lexis scores is approaching significance. This is a satisfactory result, because it is lexical accuracy and lexical range which are mentioned in the lexis rating scale, whereas the variables which are not predictive of lexis scores (essay length, average frequency of types used, and overall lexical sophistication) are qualities not mentioned in the rating scales.
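The analysis itself was run in SPSS 15.0. As an illustration only, an equivalent model could be fitted in Python with statsmodels as sketched below; the CSV file and column names are hypothetical, and the data frame is assumed to hold one numeric row per essay.

```python
# Illustrative equivalent of the SPSS analysis: regress lexis scores on
# the six predictors and report unstandardized (B) and standardized
# (beta) coefficients. File and column names are hypothetical.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("kept_essays.csv")  # hypothetical data file, all numeric
predictors = ["grammar", "accuracy", "length", "range",
              "frequency", "sophistication"]

X = sm.add_constant(df[predictors])
model = sm.OLS(df["lexis"], X).fit()
print(model.summary())  # B, SE, t, p, and R-squared

# Standardized betas: refit on z-scored variables (intercept is then ~0)
z = (df - df.mean()) / df.std()
betas = sm.OLS(z["lexis"], z[predictors]).fit().params
print(betas)
```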
The R2 was 0.827, indicating that almost 83% of the variance in lexis scores was accounted for by the independent variables investigated.

Because the score an examinee receives on the grammar scale is so strongly predictive of their score on the lexis scale, it was considered whether the lexis scale functions both to penalize essays with poor lexical accuracy and range and to reward those with high accuracy and wide lexical range. If the scale fulfills one of these roles and not the other, this could be considered to decrease the effectiveness of the writing scales as a whole. Further inquiry was carried out to clarify the relationship between the two scales. It was found that, in 69.37% of ratings, the same score was given on both scales. However, within the 30.63% of ratings that did have a different score on the lexis and grammar scales, the number of ratings in which the score on the lexis scale was higher than that on the grammar scale was comparable to the number in which the grammar score was higher than the lexis score, with 54% having a higher lexis score and 46% having a higher grammar score.

CONCLUSION AND SUGGESTIONS FOR FURTHER RESEARCH

In this study, we set out to discover what KEPT raters were sensitive to when using the lexis scale to rate timed essays. It is clear from the results that, although the lexis scale aims to assess range and accuracy of lexis, grammar plays a much more significant role than lexis in determining the actual ratings given. This indicates that KEPT raters find it challenging to distinguish lexis from grammar.

This distinction is an area of English language teaching and assessment which has not been well explored. Certainly, some errors are inherently more lexical or more grammatical than others. Errors of word choice or formation, for example, are clearly lexical in nature, whereas verb tense errors and incorrect word order are clearly grammatical. The lexis-grammar dichotomy becomes blurred when we consider such examples as collocation, where, for instance, misuse of a preposition in an idiomatic phrase could be considered an error within a lexical chunk or simply a stand-alone grammatical error. Indeed, the researchers in this study found it difficult to clearly lay out these distinctions when assessing the essays for lexis; it was only through much analysis and discussion that such distinctions could be made in light of the purpose of the task.

The lack of literature attempting to make a concrete distinction between what can be considered lexical as opposed to grammatical is the main limitation of this study. It was with much difficulty that the researchers teased the two apart for the purpose of counting lexical errors, and there will undoubtedly be some disagreement regarding the distinctions agreed upon. This is one area in which further research is certainly necessary.

Distinguishing lexis from grammar is a particularly thorny issue. Because the distinction is so difficult to make, many scholars suggest that one should not try to distinguish the two but rather should accept the inextricable interwovenness that exists. However, many assessment criteria involve individual assessments of both lexis and grammar, which presupposes that the two can be
evaluated independently of each other. Therefore, if such criteria are to be used, it is necessary to investigate just how raters fare when it comes to distinguishing lexis from grammar. On the other hand, if one accepts that the two are inextricably interwoven, the assessment criteria must reflect this by assessing lexicogrammar as a single criterion.

Given the results of this study, it seems that, when rated individually, the lexis scale for the KEPT is not providing a valid representation of the quality it is meant to assess. Because of the complexity of the grammar versus lexis issue, collapsing the scales into a single lexicogrammar scale may improve raters' ability to assess these qualities more accurately. As noted earlier, a lexicogrammar scale has already been successfully implemented for the KEPT speaking test. On the other hand, it is also possible that the lexis and grammar scales could be teased apart through discussion during rater training and norming sessions, in much the same way that the researchers approached dividing grammar and lexis for the study. However, for the purposes of the KEPT, such time-intensive and large-scale discussion may not be practical, given the constraints of the actual testing schedule. Nevertheless, it is clear that the current lexis and grammar scales are not performing well enough, and there is a need for future research to inform changes to the KEPT writing scales.

For deeper insight into what raters believe constitutes vocabulary and what they believe constitutes grammar, or indeed whether they feel the two are distinguishable at all, a qualitative study would be fruitful. Such a study could involve questionnaire data, interviews with raters, or think-aloud protocols. This study was an observation of the way raters rate essays for lexis using the KEPT rating scales; as such, no intervention in the rating itself was made. The results could be verified more strongly through an experiment involving the manipulation of the lexical content of essays before rating.

THE AUTHORS

Rachael Ruegg is a senior lecturer and coordinator of the KEPT collaborative research group at Kanda University of International Studies in Chiba, Japan. Her research interests include vocabulary, writing, and assessment.

Erik Fritz has an MA TESOL from the Monterey Institute of International Studies. His research interests include second language writing and classroom-based performance assessment. He has taught in Kyrgyzstan, Japan, and the United States.

Jennifer Holland is a lecturer and research coordinator in the English Language Institute at Kanda University of International Studies in Chiba, Japan. She has also taught English in Egypt and the United States. Her research interests include self-editing in writing, error logs, and student-directed grammar instruction.

REFERENCES

Astika, G. (1993). Analytical assessment of foreign students' writing. RELC Journal, 24, 61-72.
Austrian University ELTT Group. (n.d.). Austrian University ELTT rating scale for writing. Retrieved from http://www.uni-klu.ac.at/ltc/downloads/ELTT_Writing_Scale.pdf
Bacha, N. (2001). Writing evaluation: What can analytic versus holistic scoring tell us?
System, 29, 371-383.
Bardovi-Harlig, K., & Bofman, T. (1989). Attainment of syntactic and morphological accuracy by advanced language learners. Studies in Second Language Acquisition, 11, 17-34.
Batty, A. O. (2006). An analysis of the relationships between vocabulary learning strategies, a Word Associates Test, and the KEPT. Studies in Linguistics and Language Education: Research Institute of Language Studies and Language Education, 17, 1-22.
Biber, D., Conrad, S., & Cortes, V. (2004). If you look at . . .: Lexical bundles in university teaching and textbooks. Applied Linguistics, 25, 371-405.
Bonk, W. J. (2001). Predicting paper-and-pencil TOEFL scores from KEPT data. Studies in Linguistics and Language Education: Research Institute of Language Studies and Language Education, 12, 65-85.
Bonk, W. J., & Ockey, G. J. (2003). A many-facet Rasch analysis of the second language group oral discussion task. Language Testing, 20, 89-110.
Chastain, K. (1990). Characteristics of graded and ungraded compositions. The Modern Language Journal, 74, 10-14.
Engber, C. (1995). The relationship of lexical proficiency to the quality of ESL compositions. Journal of Second Language Writing, 4, 139-155.
ETS. (n.d.). For test-takers: TOEFL Paper-based Test (PBT): Writing score guide. Retrieved from http://www.ets.org
Gries, S. T., & Stefanowitsch, A. (2004). Extending collostructional analysis: A corpus-based perspective on "alternations." International Journal of Corpus Linguistics, 9, 97-129.
Halliday, M. A. K. (2004). An introduction to functional grammar (3rd ed.). London, England: Hodder Arnold.
Hedgcock, J., & Lefkowitz, N. (1992). Collaborative oral/aural revision in foreign language writing instruction. Journal of Second Language Writing, 1, 255-276.
Hoey, M. P. (2004). Lexical priming and the properties of text. Retrieved from http://www.monabaker.com/tsresources/LexicalPrimingandthePropertiesofText.htm
Hoey, M. P. (2005). Lexical priming: A new theory of words and language. London, England: Routledge.
Hoey, M. P., & O'Donnell, M. B. (2008). Lexicography, grammar, and textual position. International Journal of Lexicography, 21, 293-309.
Hunston, S. (2008). Starting with the small words: Patterns, lexis, and semantic sequences. International Journal of Corpus Linguistics, 13, 271-295.
IELTS. (n.d.). Researchers: Score processing, reporting and interpretation. Retrieved from http://www.ielts.org/researchers/score_processing_and_reporting.aspx
Jacobs, H. L., Zinkgraf, S. A., Wormuth, D. R., Hartfiel, V. F., & Hughey, J. B. (1981). Testing ESL composition: A practical approach. Rowley, MA: Newbury House.
Kobayashi, H., & Rinnert, C. (1992). Effects of first language on second language writing: Translation versus direct composition. Language Learning, 42, 183-215.
Laufer, B., & Nation, P. (1995). Vocabulary size and use: Lexical richness in L2 written production. Applied Linguistics, 16, 307-321.
Linnarud, M. (1986). Lexis in composition: A performance analysis of Swedish learners' written English. Malmö, Sweden: Liber Förlag Malmö.
Lumley, T. (2002). Assessment criteria in a large scale writing test: What do they really mean to the raters? Language Testing, 19, 246-276.
Nation, I. S. P. (2001). Learning vocabulary in another language. Cambridge, England: Cambridge University Press.
Nation, I. S. P. (2005). Range and frequency: Programs for Windows based PCs [Computer software and manual]. Retrieved from http://www.victoria.ac.nz/lals/staff/paul-nation/nation.aspx
Nation, I. S. P. (n.d.).
Vocabulary resource booklet. Retrieved from http://www.victoria.ac.nz/lals/staff/paul-nation.aspx
Nation, P., & Waring, R. (1997). Editors' comments: Description section. In N. Schmitt & M. McCarthy (Eds.), Vocabulary: Description, acquisition and pedagogy (pp. 103-108). Cambridge, England: Cambridge University Press.
Read, J. (2000). Assessing vocabulary. Cambridge, England: Cambridge University Press.
Römer, U. (2009). The inseparability of lexis and grammar: Corpus linguistic perspectives. Annual Review of Cognitive Linguistics, 7, 141-163.
Ruegg, R., & Koyama, D. (2010). Student confidence in writing: The effect of feedback. Studies in Linguistics and Language Education: Research Institute of Language Studies and Language Education, 21, 137-166.
Santos, T. (1988). Professors' reactions to the academic writing of nonnative-speaking students. TESOL Quarterly, 22, 69-90.
Sinclair, J. M. (1991). Corpus, concordance, collocation. Oxford, England: Oxford University Press.
Sinclair, J. M. (2004). Trust the text: Language, corpus and discourse. London, England: Routledge.
Wakabayashi, J. (1992). Some characteristics of Japanese style and the implications for Japanese-English translation. The Interpreters' Newsletter, 1, 60-68.

APPENDIX A

KEPT Rating Scales

The four scales and the qualities raters are asked to think about are:
Organization (coherence; structure)
Lexis (variety; control)
Grammar (range; accuracy)
Content (relevancy to the main idea; ideas that are supported and developed)

Band descriptors, ordered from lowest to highest:

Lowest band
Organization: No coherence or organisation; unconnected sentences which communicate little.
Lexis: Demonstrates minimal word knowledge.
Grammar: Phrases or sentences produced, but many inaccuracies make the message/writing difficult to understand.
Content: A list of sentences with no logical connection and/or which are irrelevant.

Second band
Organization: Some attempts to organise information, but with little connection between ideas apparent.
Lexis: A limited variety of vocabulary with little control.
Grammar: Inadequate range of grammar, used repetitively or inaccurately.
Content: Ideas are connected but not relevant, developed, or supported.

Third band
Organization: Obvious attempts to organize information, though sometimes the lack of coherence creates ambiguity.
Lexis: Uses an adequate variety of vocabulary with moderate control.
Grammar: An adequate range of grammar used, with inaccuracies that impede the understanding of sentences.
Content: Ideas are connected and relevant, but not supported or developed.

Fourth band
Organization: The writing displays an organizational structure which enables the message to be followed, although sometimes the lack of coherence might create ambiguity.
Lexis: Uses a wide variety of vocabulary, but there are inaccuracies in word choice and formation.
Grammar: An adequate range of grammar, but occasionally accuracy affects the understanding of sentences.
Content: Ideas are connected and relevant; they are supported, but the main idea is not developed.

Highest band
Organization: The writing displays a coherent organizational structure which enables the message to be followed effortlessly.
Lexis: Uses a wide variety of vocabulary, accurately and with control.
Grammar: A wide range of grammar used accurately.
Content: The ideas are relevant, well supported, and developed.

APPENDIX B

Lexical Versus Grammatical Errors

Lexical errors:

Word class. When the wrong class of word is used, it will be considered that the word family is known by the examinee; therefore, the word will remain in the essay for analysis. As the word has been used incorrectly, it will be counted as an error.
e.g., "Japan is an affluence country" = error

Word choice. If the wrong word has been chosen for the context, it will be considered that the word family is known by the examinee; therefore, the word will remain in the text for analysis. As the word has been used incorrectly, it will be counted as an error.
e.g., "I talk to my friend by e-mail" = error

Spelling. If it is unclear which word is being attempted, the word will be counted as an error and deleted from the essay before analysis. (Otherwise, spelling errors will not be classed as errors.) As long as the word is clearly recognizable, minor misspellings will simply be corrected for purposes of analysis.
e.g., "comunicate" = no error; "conflusigate" = error

Idiom misusage. Idioms will be considered lexical units, and therefore misuse of idioms will be considered a lexical error. However, idioms are not recognized by the analysis software, so they will remain in the essay for analysis as individual words.
e.g., "How are you do?" = error
e.g., "I want to e-mail as possible as I can" = error

Grammatical errors (not counted as lexical errors):

Collocation with preposition. If words are collocated with an incorrect preposition, it will not be considered a lexical error; preposition usage will be considered a grammatical issue and ignored.
e.g., "I listen my mother" = no lexical error

Word form. If the word is contextually appropriate but in the wrong form, it will not be considered a lexical error; these kinds of malformations will be considered a grammatical issue and ignored.
e.g., "We can get lots of informations on the internet" = no lexical error; "I want to gone to Kyushu in the summer vacation" = no lexical error

Omission. The omission of necessary words will not be considered a lexical error; omission will be considered a grammatical issue and ignored.
e.g., "If I can it I want to" = no error; "Day before yesterday I went park" = no error; "I went to a live" = no lexical error

APPENDIX C

Essay With the Lowest Overall Lexical Quality

I think to Email and cell phone is good ways. Because, It is very easy to communicate to friends. For example, I'm busy, so I sometimes to meet friends. But, Email us always using. Not need to meeting. I think Mixi is good ways too. But Mixi is little dangerous. Because, Mixi is not to read voice and to know reale people. Each other is friends are not program. But, first is not know each other. I think little dangerous it. Mixi is good site to make friends. When to meet friends, very careful.

APPENDIX D

Essay With the Highest Overall Lexical Quality

I think that communication is a real importance in the society that we live in today. Having moved to so many different countries in my life and meeting many different people from different countries, staying in contact with them via e-mail has been necessary. Though at times I feel letters are more personal and warm, us human beings in the 20th century tend to feel more comfortable with 'useful', 'quick' and 'easy' e-mails. In addition to that, where as we keep in touch with for away friends with e-mails, the young generation of today very much rely on their cell phones to stay in touch with their closer friends and everyday news. It has come to a point that even phoning each other is a hassle or expensive that some people prefer texting each other as if doesn't disturb the person who receives the text in what ever they like. I have noticed that Mixi and facebook have become very popular because we are able to keep a huger amount of people updated about our lives through one site. I must admit that I also am one of those youths being swept away in a cyber world where communication has become a technical thing. Still, recently I have experienced a warm, heart filled written letter addressed to me and re-learnt the precious beauty of writing our own words through pen onto paper. I now feel the great importance that we shouldn't forget or not use old ways of doing things just because we live in a world ruled by technology.