How Do Teachers Observe and Evaluate Elementary School Students’ Foreign Language Performance? A Case Study from South Korea

YUKO GOTO BUTLER
University of Pennsylvania
Philadelphia, Pennsylvania, United States

This study investigates how teachers observe and assess elementary school students’ foreign language performance in class and how such assessments vary among teachers. Twenty-six elementary school teachers and 23 English teachers at secondary schools in South Korea watched videotapes of 6th-grade students’ group activities in English and were asked to assess the students’ performance as if they were in their own classrooms. The study found that the teachers varied substantially in their overall evaluations both within and across school levels. A discussion held among the teachers after the individual assessments were completed showed that the elementary school teachers and secondary school teachers differed with respect to (1) their views toward assessment criteria, (2) how to evaluate student confidence and motivation, and (3) how to gauge students’ potential ability to communicate competently in a foreign language. Such differences between the elementary and secondary school teachers appeared to be deeply rooted in their respective teaching contexts. Using Davison’s (2004) framework for analyzing teachers’ beliefs and practices in teacher-based assessment, the current study suggests that both groups of teachers need to negotiate assessment criteria while paying close attention to the local context and adapting their teaching practices to fit therein.

This study is concerned with how teachers observe and assess young learners’ foreign language performance in class and how such assessments vary among teachers. By asking elementary school teachers and secondary school teachers to evaluate 6th-grade students’ English abilities at the end of their elementary school education in South Korea, the current study aims to examine similarities and differences in teachers’
observations both among teachers working at the same school level and across different school levels. In doing so, it is hoped that this study will help us better understand teacher observation, as one popular type of teacher-based assessment, and help us enable a smoother transition in assessment practice from elementary school English to secondary school English.

TESOL QUARTERLY Vol. 43, No. 3, September 2009

Teacher-based assessment can be defined as “nonstandardized local assessment carried out by teachers in the classroom” (Leung, 2005, p. 871). A number of countries have begun heavily promoting teacher-based assessments as part of their language-in-education policies in recent times. The degree to which teacher-based assessments have gained in prominence is particularly evident at the elementary school level. We find a strong emphasis on the development of communicative competence, especially in the oral domain, because it has become one of the central goals of English as a foreign language education at the elementary school level (FLES). The promotion of teacher-based assessment is also tied to the intentions of policy makers, who have strived to avoid traditional achievement tests, such as paper-and-pencil standardized tests, at the elementary school level.

Despite the strong promotion of teacher-based assessment in various educational contexts, including FLES, concerns have been cited with regard to the validity, reliability, high costs, fairness, and logistical challenges in developing, administering, and scoring teacher-based assessment (e.g., Gattullo, 2000; Linn, Baker, & Dunbar, 1991; Rea-Dickins & Gardner, 2000). Part of the challenge of teacher-based assessment appears to come from the dilemma between the pedagogic and measurement aspects of such assessments. Namely, at the same time that these assessments are supposed to help teach students, education policies often ask teacher-based assessment to fulfill an accountability requirement. Teachers worldwide
encounter tension in meeting the pedagogical needs of students while at the same time meeting the accountability requirements that are often based on prescribed standards and criteria (e.g., Arkoudis & O’Loughlin, 2004; Brindley, 1998, for a discussion of this tension in Australia; Davison, 2004, for Australia and Hong Kong; Gardner & Rea-Dickins, 1999; Teasdale & Leung, 2000, for England).

South Korea is no exception to this trend. At the elementary school level, among the various types of teacher-based assessments, teacher observation has been promoted as a primary means of assessment. Teachers are encouraged to observe students’ performance systematically during classroom activities and to use such observations for both summative and formative purposes.1 However, in many cases, no specific criteria for conducting observations have been provided, and we know little about how teachers observe and assess their students’ performance during classroom activities.

1 The summative assessment is usually given to students at the end of an instructional sequence, and the results are primarily used for giving students reports about their achievement. The formative assessment is usually undertaken before and/or during an instructional sequence and is primarily used to help the students identify their strengths and weaknesses and, in turn, to provide teachers with information for making instructional decisions.

At the secondary school level, teacher-based assessment, including teacher observation, has also increasingly been emphasized. However, practical and pedagogical challenges, including large class sizes and limited class hours, leave teachers little time for making systematic observations for formative purposes. Under such conditions, parents and students have frequently cited their distrust of teacher observation as a summative assessment (Butler, 2005).

A number of researchers have questioned the application of traditional measurement-based concepts of
validity and reliability to teacher-based assessment (Brookhart, 2003; McMillan, 2003; Moss, 2003; Smith, 2003). Traditional validity and reliability theories are fundamentally concerned with the ability (or lack thereof) to generalize assessment-based inferences, and with the consistency of measures irrespective of the context, form, time span, and raters involved in assessment. Such concepts are not necessarily relevant to, or even compatible with, teacher-based assessment, which is highly context dependent and primarily formative in nature (McNamara, 2001; Teasdale & Leung, 2000). Teacher-based assessment should not be considered as a collection of miniature summative assessments (Rea-Dickins, 2007). Indeed, criteria that are derived from the psychometric tradition may not be appropriate for teacher-based assessment (Leung, 2005).

As an alternative approach, Wiliam (2001) proposed construct-referenced assessment, which is based on “the consensus of the teachers making the assessment” (p. 172). In this approach, there is no predefined objective criterion. Instead, teachers’ judgments are based on shared understandings of what a community of teachers in a given teaching context would consider competency. Leung (2005) argues that the concept of construct-referenced assessment is “useful in that it opens the way to an examination of the kind of information teachers seek and the basis of their decision making” (p. 880). This approach sheds light on the importance of understanding teachers’ knowledge about assessment and paying attention to the specific context in which the assessment is undertaken.

To date, researchers have only a limited understanding of the reasoning and criteria that teachers use in their teacher-based assessments for young learners in English as a foreign language (EFL) contexts. In examining English as a second language (ESL) environments, a number of studies have investigated how teachers understand and work with assessment criteria when they perform teacher-based
assessments (e.g., Breen et al., 1997; Davison, 2004; Leung, 1999; Teasdale & Leung, 2000). In England, for example, Leung (1999) found that teachers do not seem to make judgments simply based on students’ linguistic performance on a given task, but rather that they make holistic judgments while bringing in various external factors such as performance in previous activities or performance outside of the classroom. Much of the previous work, however, is primarily focused on investigating how teachers interpret prescribed assessment frameworks and how they apply these in their assessment practices.

As Rea-Dickins (2007) indicates, there are usually multiple motivations for implementing FLES. Many FLES programs have not only linguistic targets but also nonlinguistic objectives such as developing positive attitudes toward the target language as well as an appreciation of both foreign and domestic cultures. One may hypothesize that having so many varied objectives for FLES programs could make it difficult for teachers to reach a consensus among themselves on what specific objectives they should try to achieve through assessment. Moreover, researchers have observed a lack of consistency in foreign language teaching practices, including assessment, between the elementary and secondary school levels (e.g., Bolster, Balandier-Brown, & Rea-Dickins, 2004; Butler, 2005). We also have very little understanding of how secondary school language teachers understand and evaluate the performance of those incoming students who have come up through FLES programs.

This study therefore investigates how elementary and secondary school teachers observe and assess 6th-grade students’ foreign language performance in class (at the end of their elementary school education) and how those assessments vary among such teachers. The study focuses on what abilities teachers pay attention to when assessing elementary school students’ performance and
what kinds of criteria and reference points they use. The study attempts to address these topics by focusing on English FLES in South Korea as a case study. As we shall see, South Korean society has traditionally placed substantial value on measurement in its educational system, but the government recently began promoting teacher-based assessment as part of its language-in-education policy. Unlike many of the cases that have been documented thus far, no prescribed assessment framework is available for teachers in South Korea; rather, teachers are responsible for developing their own assessments as part of their teaching practice.

ENGLISH AS A FOREIGN LANGUAGE EDUCATION IN SOUTH KOREA

In South Korea, various types of assessment have played a significant role in education and society as a whole. English, as one of the key academic subjects, has been used as a barometer of students’ general academic achievement and diligence. Grammar translation and vocabulary exercises long dominated English classrooms at the secondary school level and beyond; rigorous standardized assessments measuring students’ discrete linguistic knowledge had a substantial impact on students’ future academic and career opportunities.

In the 1990s, as part of the South Korean government’s globalization policy, the Ministry of Education shifted its English curriculum from traditional grammar-translation instruction to communicative-based instruction. Acquiring communicative competency, and oral communicative abilities in particular, became a central goal of English education. In line with the promotion of communicative language teaching (CLT), teachers are now coached to use various types of activities in class and encouraged to use only English in their classrooms. A student-centered approach has been strongly promoted in the current policy.

Along with this shift in teaching approach, the government introduced a series of reforms in assessment. One such reform was the promotion of
teacher-based assessment as part of the assessment requirements. The 7th National Curriculum (implemented in 1999) indicated that teachers should assess students’ process of learning as well as the outcome of their learning through ongoing observation and other forms of performance assessment, as opposed to one-shot multiple-choice tests. Importantly, the policy stressed the autonomy of schools and teachers in administering such assessments. Schools were now responsible for deciding the methods, criteria, and frequency of assessments through discussions among their teachers (see, e.g., Chungcheongnamdo Office of Education, 2007).

As part of the effort to enhance the communicative competence of its citizens, the South Korean government introduced English as a compulsory subject at the elementary school level nationwide in 1997. In addition to the strong emphasis on oral communication as a central goal of FLES, motivating students to learn English was set as another important goal. A variety of group activities have been implemented in classrooms based on the uniform national curriculum. With respect to assessment, the government (via the Korea Institute of Curriculum and Evaluation, or KICE) has suggested that teachers should periodically observe their students’ performance in class and keep “observation records” on attitudes, oral skill development, and written skill development. KICE also created a 5-point scale to serve as an example for these teacher observations (Lee, 2007). However, KICE did not provide precise criteria for each point and domain. Individual teachers must decide how to use such assessment information for summative and/or formative purposes.

Currently, report cards to students and parents at the elementary school level in South Korea are based on verbal descriptions and not on a numeric scale. However, parents may ask teachers to disclose any of the information on their children’s performance in class that was used as a basis for their
evaluation. In practice, many teachers keep records on one type of numeric scale or another, including standardized test scores, in addition to verbal comments on their students’ performance.

Despite the promotion of teacher-based assessment at school, South Korean society continues to emphasize a measurement-driven orientation toward assessment. Even among elementary school students, a number of large-scale standardized English tests such as the Test of English for International Communication (TOEIC) Bridge and the Practical English Level Test for Elementary English (PELT) are very popular (Choi, 2008). Many students go to private English institutes where they can access various types of proficiency tests, including these standardized tests.

At the secondary school level, the pressure that students feel about exams appears to have become even more intense. Under recent reforms in English assessment, a certain portion of the students’ final grades should now come from teacher-based assessment. Final grades, however, are given to students on a numeric basis. Students and parents are greatly concerned with their English grades at school as well as their scores on various types of standardized tests of English, especially in relation to accessing higher education and fulfilling their career aspirations. Under substantial pressure from parents and students, teachers in South Korea are expected to promote a learning culture in accordance with government policy, but they must do so within an excessively exam-focused culture (Hamp-Lyons, 2007). Teacher-based assessment thus inevitably becomes part of accountability measures, at least to some degree.

Finally, it has been reported that elementary school teachers have had little direct communication with secondary school English teachers regarding instruction and assessment in South Korea (Butler & Lee, 2005). Teacher training is usually offered to elementary school teachers and secondary school
teachers separately, and teachers typically have few opportunities to observe English classes at different school levels. As such, assessment practices may differ significantly between elementary and secondary school teachers.

COMMUNICATIVE ABILITIES IN A FOREIGN LANGUAGE

One of the challenges in developing teacher-based assessments in FLES is the limited understanding of what having communicative abilities in a foreign language specifically entails. Language assessment constructs are not yet clearly understood, especially when pedagogical value is primarily placed on assessment (McNamara, 2001; Rea-Dickins & Gardner, 2000). In identifying constructs for teacher-based assessment for young learners, the following three characteristics of existing models of communicative abilities are especially problematic: (a) the notion that communicative abilities reside in individuals, (b) the lack of a clear conceptualization of the affective aspects of communicative abilities, and (c) the lack of a developmental perspective.

With regard to the first characteristic, current theories on communicative competence in a second or foreign language overemphasize individual performance as opposed to interactive performance (McNamara, 1996, 1997, 2001). McNamara argues that it is dangerous to consider one’s performance on a performance test as being a mere reflection of one’s individual competence; rather, performance is co-constructed through interactions among various agents such as interlocutors, test materials, raters, and so forth. In FLES programs, a frequently emphasized goal is developing students’ communicative abilities rather than their discrete linguistic knowledge per se, and pair and group activities are widely used. One can expect that a student’s performance is influenced by the nature of the activities and interactions with the student’s interlocutors. In addition, which activities the teacher chooses to observe and which aspects of the student’s performance
the teacher chooses to pay attention to all contribute to the dynamic interactions that influence ratings and evaluations. However, it is not clear how best to understand students’ communicative competence in such interactions.

Second, although affective aspects such as motivation and confidence are often set as a key objective for FLES programs worldwide, including in South Korea, current theories on communicative competence do not agree on how best to conceptualize such affective factors in language assessments (McNamara, 1996). Hymes (1972) distinguished ability for use from knowledge in his model of communicative competence (which was originally developed in the context of first language use). Hymes conceptualizes ability for use as one’s potential ability for performance, and it includes various language-relevant cognitive and noncognitive factors such as motivation. However, the model proposed by Canale and Swain (1980), which has become one of the most influential models of communicative competence in second or foreign language acquisition, carefully excludes factors that are relevant to ability for use. There have been some attempts to capture the affective dimensions of ability for use in successive models, such as Bachman’s (1990) strategic competence and Bachman and Palmer’s (1996) affective schemata. However, as the term schemata indicates, affective dimensions in their models are conceptualized as cognitive entities in nature and are primarily considered to be a source of response bias in assessment; the role of affective factors in language use is far from clear (McNamara, 1996). In addition, the role of nonverbal behavior in language assessment, such as body movements and facial expressions, has yet to be sufficiently explored (Young, 2002).

One can also point to a lack of developmental perspectives in the current leading models of communicative competence. Such models identify and classify different constructs of communicative competence, but they do not explain how
different components interact with each other or how such interactions may change over the course of individual development. Are abilities in different constructs expected to develop simultaneously, or are certain constructs more important than others at some point in development? We do not yet have a comprehensive theory of the development of communicative competence that teachers can use when conducting formative assessments for their students. If we are to introduce teacher assessment, therefore, an important starting point would be to understand teachers’ perceptions of what is important in carrying on competent communication in a foreign language.

RESEARCH QUESTIONS

The purpose of the study was to examine how teachers observe and assess elementary school students’ foreign language performance in daily classroom activities. More specifically, the study aimed to investigate the following three questions:

1. How do teachers observe and assess elementary school students’ performance while they engage in group activities? How much consistency (or variability) is there in teachers’ assessments of student performance while their students are interacting with other students? What kinds of selective attention do teachers demonstrate in assessing their students’ performance (including both verbal and nonverbal aspects)?
2. What criteria or methods do they rely on to observe and assess their students’ performance? How do they negotiate criteria among themselves through discussion?
3. Do elementary and secondary school teachers assess 6th-grade students differently?
METHOD

Participants

The participating teachers were recruited from an in-service teachers’ training site in central South Korea. The local provincial government selects groups of English teachers each year from elementary and secondary schools across the province to receive approximately two months of professional training at various local sites. The elementary and secondary school teachers receive their training separately. The current study was conducted as part of the in-service training program. However, participation in the current study was on a voluntary basis. With the help of a training organizer, 26 elementary school teachers and 23 secondary school teachers were recruited at one of the training sites during the summer of 2007. All the participants came from different schools and their backgrounds were diverse. The accompanying table summarizes the teachers’ profiles based on a background survey that was distributed to the teachers prior to the study. Notably, the secondary school English teachers who participated in the current study appeared to be less familiar with the elementary school English curriculum, whereas the elementary school teachers who participated were more familiar with the secondary school English curriculum. Very few of the teachers in either group had observed English classes at the other school level. Although language assessment is considered an important component of professional development for English teachers in South Korea, the participating teachers had received relatively little training on how to conduct teacher-based assessment, and they had not had any extensive discussions on this topic with other teachers prior to this study.

TABLE Participating Teachers’ Profiles

                                          Elementary school    Secondary school
                                          teachers             teachers
Subjects taught*
  English                                 11 (42.3%)           23 (100%)
  Multiple subjects                       15 (57.7%)           0 (0%)
Educational background
  4-year college                          15 (57.7%)           16 (69.6%)
  Some postgraduate education             11 (42.3%)           7 (30.4%)
Average years of teaching (SD)**
  English                                 5.81 (3.91)          14.21 (7.42)
  Including other subjects                10.01 (6.07)         15.11 (6.53)
Age
  20s                                     2 (7.7%)             4 (17.4%)
  30s                                     16 (61.5%)           5 (21.7%)
  40s                                     8 (30.8%)            14 (60.9%)
Gender
  Male                                    4 (15.4%)            10 (43.5%)
  Female                                  22 (84.6%)           13 (56.5%)
Hours of English taught per week
  Not taught yet                          1 (3.8%)             0 (0%)
  hours or less                           9 (34.6%)            0 (0%)
  4–10 hours                              5 (19.2%)            2 (8.7%)
  10–20 hours                             3 (11.5%)            14 (60.9%)
  More than 20 hours                      8 (30.8%)            7 (30.4%)
Average class size (SD)                   29.1 (10.15)         32.1 (6.24)
Familiarity with curriculum of a school level other than the level they teach (i.e., familiarity with elementary school curriculum for secondary school teachers and vice versa)
  Fully understood                        16 (61.5%)           6 (26.1%)
  Good knowledge, if not full             2 (7.7%)             11 (47.8%)
  Some knowledge                          3 (11.5%)            3 (13.0%)
  Little knowledge                        2 (7.7%)             0 (0%)
  (No response)                           3 (11.5%)            3 (13.0%)
Experience of observing English classes at a school level other than the level at which they teach
                                          2 (7.7%)             0 (0%)

Note. * As of 2008, English at elementary schools in South Korea is taught by homeroom teachers who teach multiple subjects as well as by teachers who specialize in teaching English only. Teachers may change their status at their principals’ request each year. As a result, English teachers may become homeroom teachers and vice versa. At the secondary school level, English teachers are specialized and teach English only. ** SD = standard deviation.

Materials and Procedures

The participating teachers watched videotapes of students’ group activities in English and were asked to assess the students’ communicative performance. This activity was conducted separately for the elementary school teachers and the secondary school teachers. The video showed four 6th-grade students engaging in two different activities in a group: One was a simple jigsaw activity and the other was a more complicated decision-making activity (Pica, Kanagy, & Falodun, 1993). In the first activity, the students were asked to complete the weekly schedule
for a student named Minho on a collaborative basis. Each of the students had different information about the schedule, and thus two-way interactions were required to complete the task. However, only a limited set of expressions and vocabulary (e.g., “What does Minho do on Monday afternoon?” and “He plays baseball”) was needed for the completion of the task. The second activity was an open-ended shopping task wherein one student played a customer and the rest of the students played shop owners. Each shop carried different items at different prices. The goals of this task were (a) to buy a list of items for a party and leave with as much money as possible (for the shopper), and (b) to sell many items and make as much money as possible (for the shop owners). This task required the students to use a variety of English expressions and vocabulary related to shopping in order to buy or sell goods and to negotiate prices (all of the necessary expressions and vocabulary had already been covered in class based on the National Curriculum). Unlike the first activity, the number of utterances produced and the content of the discussions among the students varied substantially in the second activity. This latter activity required some simple math skills to help solve the problems encountered in the task itself. Both activities were commonly used in 6th-grade English classrooms in South Korea, and each activity lasted for 15 minutes. The participating teachers could see the written notes that the students took during the activities.

Among the four students shown in the video, two were boys and two were girls. They used English pseudonyms during the activities: Tom, John, Jane, and Sally. In the video, the four students appeared to differ in terms of their activeness/shyness and their general English proficiency levels. The only objective measure of their proficiency available to the

FIGURE Teachers’ Holistic Analysis of Activity

attention to various
aspects of the students’ performance, including both linguistic and nonlinguistic aspects, in possibly complicated ways.

One can also observe variability in the teachers’ judgments regarding individual students. The teachers’ judgments were more widely spread for Tom and Sally, whereas they exhibited relatively more agreement with respect to John. The elementary and secondary school teachers also evaluated individual students differently. The secondary school teachers tended to rate Tom higher than the elementary school teachers did in both activities. The reverse tendency was observed for John and Jane.3

Traits Chosen by the Teachers

The accompanying table summarizes the traits chosen by the teachers for each activity. This table lists the traits in descending order of frequency (note that the teachers were not asked to rank these traits; rather, they were asked to indicate the traits that they paid attention to while they observed the students). Speaking fluency, confidence in talking, listening comprehension, motivation, and speaking accuracy were the traits most frequently chosen by both groups of teachers. On the other hand, pronunciation was the least frequently chosen trait. There did not seem to be a notable difference in the choice of prescribed traits between the elementary and secondary school teachers.

Qualitative Analysis of the Discussions Among the Teachers

The Nature of Observation: What Did the Teachers Look for?
Although in the quantitative analysis the elementary and secondary school teachers did not show a notable difference in their choice of traits, the qualitative analysis revealed that the two groups of teachers looked for different things during their observations. Elementary school teachers, either consciously or unconsciously, tended to avoid setting any criteria, whereas secondary school teachers tended to depend on some form of set criteria even when they made holistic judgments based on their observations.

3 As far as the average scores are concerned, a series of one-way ANOVAs indicated that the secondary school teachers gave significantly higher scores to Tom than the elementary school teachers in Activity (F(1, 44) = 4.43, p < 0.05, η² = 0.09). A similar tendency was observed in Activity (F(1, 42) = 3.88, p = 0.056, η² = 0.09). The elementary school teachers gave a higher evaluation for John in Activity (F(1, 43) = 7.02, p < 0.05, η² = 0.14) and for Jane in Activity (F(1, 41) = 7.88, p < 0.01, η² = 0.19). No significant differences were observed for Sally between the two groups of teachers for either activity.

TABLE Traits Chosen by the Teachers
[Table body not reproduced. It reports the number and percentage of elementary school teachers (n = 26) and secondary school teachers (n = 23) who chose each trait in each activity. The traits listed are speaking fluency; confidence in talking; listening comprehension; motivation; speaking accuracy; use of appropriate expression in the given context (pragmatics); content being spoken about (content); task completion; ability to interact effectively with other students (interpersonal); range of vocabulary use; pronunciation; and others.]

Some elementary school teachers suggested that identifying traits was a somewhat artificial activity. A few of them indicated that they made their judgments based on “overall judgment” (E8),4 “general performance” (E19), and “overall flow” (E9) without having any particular criteria in mind. The elementary school teachers seemed to believe strongly that evaluation at the elementary school level should focus only on students’ strengths and that setting standard criteria or traits may lead to a more measurement-oriented practice that should be avoided at the elementary level. Some indicated that they set different criteria depending on the individual student because “I pay attention to the strength of each student” (E4). Inconsistencies in their evaluations across students did not seem to matter too much to these teachers.

Some elementary school teachers, however, did express concern regarding such inconsistencies: “If you don’t have any criteria, lower performers tend to get higher evaluations than they are
supposed to” (E3). Others were concerned that holistic judgments without specific criteria could be easily influenced by other students’ performance; one teacher stated, “I don’t think holistic judgment is valid because I cannot help but compare students with one another” (E7). Another teacher commented that “salient features can greatly influence our judgment. High performers and troublemakers catch our attention, but it is difficult to assess middle-range students if you don’t have criteria” (E10). However, even those who supported setting criteria acknowledged the practical challenges of doing so: “We are not trained to do it and there is no clear guideline for us” (E1). Another concern frequently cited by the elementary teachers was their large class sizes. They commented that it was hard to pay attention to multiple traits in each student during the activities. In practice, teachers often appear to set only one or two criteria at a time, and they do so relatively flexibly depending on the activities in question. The assessment literature suggests that the multiple-trait approach has more diagnostic merit than the holistic approach (Hamp-Lyons, 1991). It is interesting, however, that neither group of teachers mentioned the possible diagnostic merits of having multiple traits in their judgment.

In general, the secondary school teachers opted for employing criteria for observation. This difference may be due to their familiarity with various types of criterion-based oral and written assessments. One secondary school teacher said that “holistic judgment without setting any criteria is too subjective. My evaluation would differ depending on my mood” (S23). Some teachers clearly advocated multiple-trait scoring (as opposed to scoring based on a single trait): “It is hard to evaluate students by giving them a single score. If a student is excellent in grammar and vocabulary, then these qualities should be evaluated separately as such” (S2). Others indicated that multiple traits help
teachers be attentive to students’ performance. As with the elementary school teachers, however, they made no clear statements indicating the potential pedagogical merits of applying multiple-trait scoring. Rather, more teachers supported having criteria from the viewpoint of fairness. A number of teachers indicated that making holistic judgments is potentially unfair to students who fail to show the abilities that a particular teacher thinks are important. Some said that fairness should be attained by setting clearer and more detailed criteria (as opposed to roughly identifying traits), and others felt that tasks should be carefully designed so that everybody has enough output to be assessed. Referring to Activity in the video, a teacher said, “Sally would make more mistakes if she spoke more. She had few mistakes because she did not speak a lot. This is ‘less talk, better score’ and is unfair” (S1).

⁴ “E” refers to elementary school teachers, and the number that follows indicates the teacher’s ID. Similarly, “S” refers to secondary school teachers.

Different Understandings of Criteria and Conflicting Values Among Teachers

The discussions among the teachers also revealed that the teachers differed in their understanding of their self-chosen criteria. Their judgments were also frequently influenced by different aspects of the students’ behaviors in nonsystematic ways. For example, speaking fluency, which was the trait most frequently chosen by both groups of teachers, meant different things to different teachers. For some teachers, fluency was synonymous with general proficiency, and for others it meant “ability to grasp” (E19; i.e., the ability to “pick up” vocabulary and expressions that were taught and/or corrected by others). For yet other teachers, it referred to having no hesitation and/or displaying an “attitude that is not afraid to make mistakes” (E7, S5). Some teachers even equated fluency with confidence and
indicated that “a loud voice” (e.g., E11, E20, E22) was a sign of fluency.

Both groups of teachers frequently encountered a dilemma with regard to how best to weigh the different traits in their holistic judgments. Speaking fluency and speaking accuracy appeared to present such a dilemma to teachers, as one can see from the following exchange:

E12: I think that fluency is always the priority in speaking. It is followed by grammar and pronunciation. Students need to be able to speak English first.
E13: This is true, but the new national curriculum also emphasizes accuracy.
E14: In fact, I usually don’t focus on grammar in speaking, but I picked up accuracy today, because Tom’s speech was full of mistakes. Although fluency is important, we should correct his errors in order to help him speak English accurately.
E13: Yes, we should not let students keep on speaking incorrectly.
E14: Do you mean that accuracy is more important than fluency?
E11, E12, E13: Well, fluency is more important …
E11: I think that for the 6th grade, fluency is the most important. And for lower grade students who learn English for the first time, confidence and attitude are more important.
E13: I have a slightly different view from you… As for Tom, he just spoke. He spoke fluently but did not use correct grammar.

The secondary teachers often interpreted good fluency as a sign of communicative competence, and they struggled when other traits showed inconsistent results in student performance.

Among the traits that the teachers discussed, confidence in talking proved to be a particularly interesting trait from a formative point of view, and yet one may say that it could be a problematic construct from a summative point of view. As with speaking fluency, the teachers saw different student behaviors as evidence of confidence, including speaking with “a loud voice” (e.g., E11, E20, E22, S18), “active participation in activities” (e.g., E1, E7, E10, E12, E13, S2, S6, S22), and an “attitude of
not being afraid to make mistakes” (e.g., E7, S5, S6). Yet these same behaviors were also perceived as negative by some teachers. Moreover, confidence tended to attract teachers’ attention and could easily mask other abilities of the students. The teachers often showed substantial disagreement in their assessments when students were perceived as being confident but did not show strong performance in linguistic aspects such as speaking accuracy and pronunciation (or vice versa):

E7: The boys made a lot of grammatical errors, I gave them points overall. The girls performed better, so I gave them
E8: You suggested that the boys had weaker grammatical ability. But I found them confident because they were so active in the activities.
E7: Sally was less confident and less active.
E8: Judging from her written note [note: the teachers could see the written notes which the students took during the activities], I thought that Sally’s writing was good. She must have good English ability. But she did not show it in her oral activity as much as I expected. That’s why I gave her lower scores.
E9: I thought that Sally’s writing was good, too, and I thought that she would also be excellent in her speaking ability. I was surprised to find out she was not very good in the activities.
E7: But Sally certainly has a good command of English. She just did not display it!
E8: Sally is good at writing. She seemed to have good listening skills as well. But I think that she does not have good speaking skills.
E9: Sally does not have any confidence in speaking.
E8: I gave Sally points because I couldn’t see her participation in the activity.
……
E10: I gave Sally points, because she always gave immediate answers. It’s not true that Sally doesn’t have a good ability to communicate.

The wide variation in the teachers’ judgments that appeared in the quantitative analysis, particularly for Tom and Sally, turned out to stem from such disagreements over how to value the students’ confidence.

Curiously, in the small group discussion, the secondary school teachers tended to place greater emphasis on students’ confidence and other affective aspects in evaluating the 6th-grade students’ performance compared with the elementary school teachers, whereas the elementary school teachers were more concerned with the linguistic aspects of student performance. For example, Tom was considered confident and was more highly rated by the secondary school teachers than by the elementary school teachers, despite his perceived weaknesses in speaking accuracy and pronunciation. This seemed to be due, in part, to different expectations of the goals of FLES between the two groups of teachers. One may recall that, in the background survey, the secondary school teachers in the current study were not very familiar with the elementary school curriculum and practice. It is important to note that the majority of the secondary school teachers appeared to have a different standard for the elementary and secondary school levels: Namely, confidence in talking and motivation were judged to be important at the elementary school level (as in the case of the 6th-grade students), but not as much so at the secondary school level. In fact, some teachers explicitly stated their concern over the perceived discrepancy in expectations
between teachers at the two school levels. For example,

I agree with you that confidence and motivation are important at elementary school. Tom made a good impression on the teachers with his confidence and I think that speaking with lots of errors is fine at the elementary school level. But students like Tom don’t adapt to English classes at secondary school well because teachers at secondary school expect students to speak accurately, as in saying “I go” and “He goes.” Tom would not be ready to receive such treatment. As a result, teachers have difficulty in teaching students like Tom. (S14)

In addition, although many secondary school teachers indicated that confidence in talking and motivation are important at the elementary school level, they also admitted that such traits are difficult to assess, and thus they questioned whether such affective traits could be assessed objectively. Many secondary school teachers agreed that the affective aspects of communication would not play a significant role in the assessment of students at the secondary school level.

Potential Ability

Another concept that was difficult to specify was potential ability. The teachers frequently mentioned the importance of students’ potential ability to be a competent communicator in a foreign language. This potential ability was predominantly cited by the elementary school teachers. In their discourse, confidence in talking and other affective aspects such as motivation were often considered as a sign of such potential ability, whereas some teachers included cognitive-based abilities such as “the ability to self-correct” (E2) and “the creative use of language” (E7) as indicative of such potential ability. Interpersonal-related skills such as “the ability to listen to others carefully” (E18) and “helping others to understand” (E26) were also often included. The teachers tended to give higher evaluations to those students who were perceived to have such potential ability. However, the
potential ability referred to by the teachers was not the same as the gap between ability for use and performance as described in Hymes (1972). Rather, it appeared to be similar to Vygotsky’s (1978) zone of proximal development, which refers to the gap in developmental levels between what a child can actually do individually and what he or she can do with assistance from adults or more capable peers. We should note, however, that the teachers tended to view such issues from a long-term point of view (e.g., performance once students got to secondary school or their ultimate attainment of proficiency) rather than from an immediate one. That is to say, the potential ability that the teachers in this study referred to included cognitive and affective traits which were considered helpful for the children to acquire communicative competence in the long run. Such attributes might be better characterized as an aptitude for language learning, if we could broaden our current view of aptitude from one that predominantly restricts it to the cognitive domain.

Although the teachers saw students’ different behaviors as signs of potential ability, the teachers differed with respect to who had such ability. The notion of potential ability appeared to stem from the elementary school teachers’ wish to shed light on the positive side of their students’ performance. These teachers’ inclusion of such potential abilities in their assessments of student performance may have some important value from a pedagogical point of view, such as giving students essentially a delayed judgment, and in turn encouraging them. However, it also raises a number of questions regarding how to conceptualize such abilities in language assessment if teacher-based assessment is also used for summative purposes. It is also important to note that such potential ability was rarely mentioned by the secondary school teachers.

DISCUSSION

The current study investigated how
elementary and secondary school teachers in South Korea observe and assess elementary school students’ foreign language performance in class. The study found that there was substantial variability among the teachers in their holistic judgments and their attitudes toward holistic observation and the importance of establishing criteria. Although both groups of teachers chose similar traits for their evaluations, including speaking fluency, confidence in talking, listening comprehension, motivation, and speaking accuracy, the teachers differed in the ways in which they interpreted such traits (or constructs) and in how they arrived at their respective evaluations of individual students. Most notably, the teachers appeared to disagree over the interpretations of affective aspects such as confidence in talking and motivation. The notion of potential ability was also frequently included in the elementary school teachers’ judgments, but not in those of the secondary school teachers. It was evident that elementary and secondary school teachers had different expectations of FLES and different perceptions of how assessment should be conducted for elementary school students.

In interpreting such variability and the differences among the teachers in this study, Davison’s “cline for mapping teacher assessment beliefs, attitudes and practices” (2004, pp. 324–325) appears to be useful. As summarized in Table 3, in this framework, the teachers’ beliefs and practices regarding teacher-based assessment are classified into five orientation types according to the teachers’ views toward assessment tasks, assessment processes, assessment products, inconsistencies, and assessor needs. In one of the extreme orientations, assessor as technician, the teachers are highly restricted by criteria and see the assessment process very mechanically. At the other extreme, the assessor as God has a strong community-bound orientation, and the teachers in this orientation take a highly personalized and intuitive
approach toward assessment.

In the present data from South Korea, the elementary school teachers tended to be more oriented toward the assessor as God and assessor as the arbiter of “community” values viewpoints, whereas the secondary school teachers tended to be more oriented toward the assessor as technician and assessor as the interpreter of the law positions. As we have seen, some elementary school teachers avoided employing any standard criteria and made their judgments in highly intuitive and personalized ways. Other elementary school teachers were concerned with the inconsistencies in their personal judgments and viewed these inconsistencies as a threat to validity, as exemplified in the remark by one teacher that teachers only pay attention to the salient features of student performance in holistic assessments. These teachers were more open to depending on community norms but felt that they would need more training to be better assessors in order to apply standards in practice.

TABLE 3 Analysis of Teachers’ Beliefs and Practices in This Study Based on Davison (2004, p. 325)

Davison’s orientation: Assessor as technician
View of the assessment task: Criterion-bound
View of the assessment process: Mechanistic, procedural, automatic, technical, seemingly universalized
View of the assessment product: Text focused
View of inconsistencies: Seemingly unaffected by inconsistencies
View of assessor needs: Need better assessment criteria

Davison’s orientation: Assessor as interpreter of the law
View of the assessment task: Criteria-based
View of the assessment process: De-personalized, explicit, codified, legalistic, culturally detached
View of the assessment product: Text focused, but awareness of student
View of inconsistencies: Inconsistencies a problem, threat to reliability
View of assessor needs: Need better assessor training (in interpreting criteria)

Davison’s orientation: Assessor as principled yet pragmatic professional
View of the assessment task: Criteria-referenced, but localized accommodations
View of the assessment process: Principled, explicit but interpretative, attuned to local cultures/norms/expectations
View of the assessment product: Text and student focused
View of inconsistencies: Inconsistencies inevitable, cannot necessarily be resolved satisfactorily, teachers need to rely on professional judgment
View of assessor needs: Need more time for moderation and professional dialogue (to make basis of judgments more explicit)

Davison’s orientation: Assessor as arbiter of ‘community’ values
View of the assessment task: Community-referenced
View of the assessment process: Personalized, implicit, highly impressionistic, culturally-bound
View of the assessment product: Student focused
View of inconsistencies: Inconsistencies a problem, threat to validity, assessor training needs to be improved
View of assessor needs: Need better assessors (to uphold standards)

Davison’s orientation: Assessor as God
View of the assessment task: Community-bound
View of the assessment process: Personalized, intuitive, beyond analysis
View of the assessment product: Student focused
View of inconsistencies: Seemingly unaffected by inconsistencies
View of assessor needs: System not open to scrutiny, not accountable, operated by the “chosen”

Korean teachers: Secondary school teachers → (technician end of the cline); ← Elementary school teachers (God end of the cline)
Context of teaching/assessment, secondary school teachers: Transition from grammar-translation to communicative teaching, teachers more experienced with measurement-oriented assessments and standards, English teaching specialists, students assessed using standardized tests
Context of teaching/assessment, elementary school teachers: English only recently introduced, emphasis on communicative competence and motivation, many teachers are generalists, mentality of avoiding measurement-oriented assessment and competition among students

Note. Adapted from Chris Davison, Language Testing, 21, pp. 305–334, copyright © 2004 by Sage Publications. Reprinted by permission of Sage.

On the other hand, the secondary school teachers in this study tended to be more oriented toward the assessor as technician and assessor as the interpreter of the law positions at the other end of the framework. Inconsistencies were clearly a problem for the secondary school teachers, and many of them preferred to have clear standards based in large part on a wish to maintain fairness. For these teachers, fairness should be secured by having objective and explicit criteria and applying them equally to everybody.

A clear discrepancy in beliefs and practices in assessment appeared in this study between the elementary school teachers and the
secondary school teachers. Their attitudes toward confidence in talking serve as a good example. Many secondary school teachers felt that confidence was a very important trait for elementary school students, but many admitted that confidence was difficult to objectify and that it would not be highly valued as part of assessment once the students got to secondary school.

One can easily see that these differences in beliefs and practices between the elementary and secondary school teachers are deeply embedded in their respective teaching and assessment contexts. FLES was recently introduced in South Korea as part of the reforms to change from traditional grammar-translation-oriented language teaching to communicative-based language teaching. Affective domains such as confidence in talking and motivation have been strongly emphasized in setting the goals of FLES, as has developing communicative competence. The belief prevails among elementary school teachers that they should avoid measurement-oriented assessment at the early stages of English learning and avoid competition among students (Butler, 2005). At the same time, they are increasingly held accountable for reporting individual students’ performance. The dilemma they face and their confusion regarding assessment result in many ways from such policy requirements as well as their various backgrounds. The inclusion of potential ability can be understood as their attempt to shed light on their early learners’ strengths, and their motivation for doing so is readily understandable given the context in which they are working.

Unlike the majority of elementary school teachers in South Korea, who may teach multiple subjects, the secondary school teachers are all trained as English language teaching specialists. In their highly exam-oriented secondary school culture, the teachers are much more used to using standards and criteria than their elementary school counterparts, though many of them are not familiar with the newly implemented FLES curricula and assessment practices. Much of their assessment experience, both as students and as teachers, has been with measurement-oriented practices. They therefore still appeared to be heavily constrained by measurement-oriented notions of assessment. Although it may not be easy to develop and adapt assessment for learning in teaching practice, the current discrepancies in beliefs and practices in assessment between the elementary and secondary school levels are potentially very harmful for some students, as articulated by one of the teachers cited previously.

Davison (2004) suggested a potential middle ground in her framework. This is the assessor as the principled yet pragmatic professional, or what she also referred to as classroom-referenced assessment, as an alternative approach to teacher-based assessment. Perhaps both elementary and secondary school teachers in South Korea need to work together to reach this orientation and to narrow the gap between their assessment practices. In classroom-referenced assessment, “the assessor-teacher is attuned to local cultures and expectations, yet is keen to articulate and interpret community norms, to make explicit their own and others’ underlying criteria and to hold them up for critique” (p. 326). To date there has unfortunately been very little dialogue on assessment among teachers at the different school levels in South Korea. In fact, this is not unique to South Korea. This lack of dialogue appears to be a common problem in FLES programs in different parts of the world (e.g., Bolster, Balandier-Brown, & Rea-Dickins, 2004; Butler, 2005). As Davison indicated, although inconsistencies may be inevitable and may not necessarily be resolved in a satisfactory fashion, a mutual understanding of each other’s teaching practices is an indispensable first step toward helping students make a smoother transition from elementary school to secondary school. The
current study suggests that some of the important issues that need to be discussed among teachers include the identification of the most important traits at different grade levels, the role of affective aspects (such as confidence and motivation) in assessment, and how to account for students’ potential abilities.

CONCLUSION

This study examined how teachers observe and assess elementary school students’ foreign language performance in class, focusing on FLES in South Korea as an example. It was found that the teachers within the same school levels as well as across different school levels varied substantially both in their holistic judgments of the performance of young learners working in groups and in how they arrived at such judgments. Although the elementary and secondary teachers chose similar prescribed traits, a detailed qualitative analysis based on the teachers’ small group discussion revealed differences in elementary and secondary teachers’ beliefs and practices in teacher-based assessment, and such differences appeared to be deeply rooted in their respective teaching contexts. Utilizing Davison’s (2004) framework for analyzing teachers’ beliefs and practices regarding teacher-based assessment, the current study suggests that both groups of teachers need to make mutual efforts to negotiate criteria while paying close attention to their local contexts and allowing for various accommodations to such contexts.

To enable students to make a smoother transition from elementary school to secondary school, this study implies that teachers should be given more opportunities to coordinate their activities with other teachers across school levels as well as within the same school levels. Given the fact that teachers have few opportunities to communicate directly with teachers at different school levels, it is advisable for policy makers to create more opportunities for teachers to receive training
together with teachers at different school levels and/or to exchange ideas and information among themselves.

The current study used a videotaped student performance in a somewhat detached context (as opposed to using the participating teachers’ own classroom activities). This decision was made in order to foster dialogue among teachers from different schools as part of the in-service teachers’ training program in which this study was carried out. However, one must also caution that the judgments made by teachers in such a setting may differ from those they might make within their own daily practice. In reality, teachers can have a number of opportunities to observe their students’ performance (as opposed to a one-shot observation as was done for this study) and, as a result, may be able to use various information that was not revealed in the current study. To better understand teachers’ assessment practices, future research should provide teachers with more localized discussion opportunities using their own students.

ACKNOWLEDGMENT

The author is grateful to Kikyeong Kim, Yeoguk Im, and Jiyoon Lee for their assistance in collecting data for this study. Select findings of this study were presented at the 2008 Annual Conference of the Japanese Society of Language Sciences, held at the University of Shizuoka in Japan.

THE AUTHOR

Yuko Goto Butler is an Associate Professor at the Graduate School of Education at the University of Pennsylvania, Philadelphia, United States. Her research interests include assessment and the role of awareness in teaching and learning, especially among young learners of second/foreign languages.

REFERENCES

Arkoudis, S., & O’Loughlin, K. (2004). Tensions between validity and outcomes: Teacher assessment of written work of recently arrived immigrant ESL students. Language Testing, 21, 284–304.
Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.
Bachman, L. F., & Palmer, A. S. (1996).
Language testing in practice. Oxford: Oxford University Press.
Bolster, A., Balandier-Brown, C., & Rea-Dickins, P. (2004). Young learners of modern foreign languages and their transition to the secondary phase: A lost opportunity? Language Learning Journal, 30, 35–41.
Breen, M. P., Barratt-Pugh, C., Derewianka, B., House, H., Hudson, C., Lumley, T., et al. (Eds.). (1997). Profiling ESL children: How teachers interpret and use national and state assessment frameworks: Vol. 1: Key issues and findings. Canberra, Australia: Department of Employment, Education, Training and Youth Affairs.
Brindley, G. (1998). Outcome-based assessment and reporting in language learning programs: A review of the issues. Language Testing, 15, 45–85.
Brookhart, S. M. (2003). Developing measurement theory for classroom assessment purposes and uses. Educational Measurement: Issues and Practice, 22(4), 5–12.
Butler, Y. G. (2005). Nihon-no shogakko eigo-o kangaeru: Ajia-no shiten-karano kensho-to teigenn (English language education in Japanese elementary schools: Analyses and suggestions based on East Asian perspectives). Tokyo: Sansedo.
Butler, Y. G., & Lee, J. (2005, October). Teachers’ perceptions about assessing young learners’ foreign language proficiency. Paper presented at the Second Language Research Forum 2005, Columbia University, New York.
Cambridge ESOL. (n.d.).
Cambridge young learners’ English tests: Handbook for teachers. Cambridge: Cambridge ESOL. Retrieved January 25, 2009, from http://www.cambridgeesol.org/assets/pdf/resources/teacher/yle_hb.pdf
Canale, M., & Swain, M. (1980). Theoretical bases of communicative approaches to second language teaching and testing. Applied Linguistics, 1, 1–47.
Choi, I.-C. (2008). The impact of EFL testing on EFL education in Korea. Language Testing, 25, 39–62.
Chungcheongnamdo Office of Education. (2007). 2007 haknyeondo cho, jung, go, teuksuhakgyo hakeopseongjeokgoanlijichim (Academic records management practices and guidelines for elementary, middle, high, and special schools in the 2007 academic year) (Department of Secondary Education, Publication No. 6174). Daejeon, South Korea: Author.
Davison, C. (2004). The contradictory culture of teacher-based assessment: ESL teacher assessment practices in Australian and Hong Kong secondary schools. Language Testing, 21, 305–334.
Gardner, S., & Rea-Dickins, P. (1999). Literacy and oracy assessment in an early years intervention project: The roles of English language stages. British Studies in Applied Linguistics, 14, 14–25.
Gattullo, F. (2000). Formative assessment in ELT primary (elementary) classrooms: An Italian case study. Language Testing, 17, 278–288.
Hamp-Lyons, L. (1991). Scoring procedures for ESL contexts. In L. Hamp-Lyons (Ed.), Assessing second language writing in academic contexts (pp. 241–276). Norwood, NJ: Ablex.
Hamp-Lyons, L. (2007). The impact of testing practices on teaching: Ideologies and alternatives. In J. Cummins & C. Davison (Eds.), The international handbook of English language teaching: Vol. (pp. 487–504). Norwell, MA: Springer.
Hymes, D. H. (1972). On communicative competence. In J. B. Pride & J. Holmes (Eds.), Sociolinguistics: Selected reading (pp. 269–293). Middlesex, England: Penguin.
Lee, W.-K. (2007, June). Assessments used in teaching English as a foreign language at elementary schools in Asia:
Korea’s case. Paper presented at the 2007 Asia TEFL International Conference, Kuala Lumpur, Malaysia.
Leung, C. (1999). Teachers’ responses to linguistic diversity. In A. Tosi & C. Leung (Eds.), Rethinking language education: From a monolingual to a multilingual perspective (pp. 225–240). London: Center for Information on Language Teaching and Research.
Leung, C. (2005). Classroom teacher assessment of second language development: Construct as practice. In E. Hinkel (Ed.), Handbook of research in second language teaching and learning (pp. 869–888). Mahwah, NJ: Erlbaum.
Linn, R. L., Baker, E. L., & Dunbar, S. B. (1991). Complex, performance-based assessment: Expectations and validation criteria. Educational Researcher, 20, 15–21.
McMillan, J. H. (2003). Understanding and improving teachers’ classroom assessment decision making: Implications for theory and practice. Educational Measurement: Issues and Practice, 22(4), 34–43.
McNamara, T. (1996). Measuring second language performance. London: Longman.
McNamara, T. (1997). “Interaction” in second language performance assessment: Whose performance?
Applied Linguistics, 18, 446–466.
McNamara, T. (2001). Language assessment as social practice: Challenges for research. Language Testing, 18, 333–349.
Moss, P. A. (2003). Reconceptualizing validity for classroom assessment. Educational Measurement: Issues and Practice, 22(4), 13–25.
Pica, T., Kanagy, R., & Falodun, J. (1993). Choosing and using communication tasks for second language instruction. In G. Crookes & S. Gass (Vol. Eds.), Tasks and language learning: Vol. (pp. 9–34). Clevedon, England: Multilingual Matters.
Rea-Dickins, P. (2007). Classroom-based assessment: Possibilities and pitfalls. In J. Cummins & C. Davison (Eds.), International handbook of English language teaching: Vol. (pp. 505–520). Norwell, MA: Springer.
Rea-Dickins, P., & Gardner, S. (2000). Snares and silver bullets: Disentangling the construct of formative assessment. Language Testing, 17, 215–243.
Smith, J. K. (2003). Reconsidering reliability in classroom assessment and grading. Educational Measurement: Issues and Practice, 22, 26–33.
Teasdale, A., & Leung, C. (2000). Teacher assessment and psychometric theory: A case of paradigm crossing? Language Testing, 17, 163–184.
Vygotsky, L. S. (1978). Mind in society. Cambridge, MA: Harvard University Press.
Wiliam, D. (2001). An overview of the relationship between assessment and the curriculum. In D. Scott (Ed.), Curriculum and assessment (pp. 165–181). Westport, CT: Ablex.
Young, R. F. (2002). Discourse approaches to oral language assessment. Annual Review of Applied Linguistics, 22, 243–262.