
LANGUAGE ASSESSMENT: Evaluative essay analyzing the targets of all the questions in the exam paper, the tasks in all the skills assessment, and the question-writing techniques


UNIVERSITY OF LANGUAGES AND INTERNATIONAL STUDIES
VIETNAM NATIONAL UNIVERSITY

LANGUAGE ASSESSMENT
Evaluative Essay

Pair 12:
Nguyễn Minh Ngọc - 19040154
Phạm Đỗ Nguyên Hương - 19040101
Lecturer: Cao Thúy Hồng

Hanoi, December 2021

TASK
● Analyze the targets of all the questions in the exam paper, the tasks in all the skills assessment, and the question writing techniques
● Evaluate the given exam paper based on prescribed criteria in the rating scale below
● Estimate the match between the exam paper (targets, tasks) and the contents (targets, tasks) that students have learned in the 9th grade, second-semester English textbook

RESPONSE

In this essay, we analyze and evaluate the chosen test, an end-of-term II exam paper for Vietnamese 9th graders. The essay explores the assessment targets of the test tasks, and analyzes the test in terms of five language assessment principles and question-writing techniques.

I. Assessment targets

There are six units included in the coursebook (Tieng Anh - Volume 2), namely Recipes and eating habits, Tourism, English around the world, Space travel, Changing roles in society, and My future career. For each unit, six components are covered: vocabulary, grammar, listening, reading, writing, and speaking. The table below presents the assessment targets of the test according to these six components.

Table 1. Assessment targets
(Each target is read across the columns: performance level, target contents, genre, topic, conditions.)

VOCABULARY (covered in II Reading - Exercise 1; III Writing - Exercise 1; IV Speaking - Exercise 1)
● Recognize meanings of a range of words and phrases about the difficulty of learning English in a text of around 150 words in length, in which about 15% of the words are above A2 level (CEFR).
● Apply forms, meanings and uses of a range of words and phrases in constructing sentences about tourism, provided that all the lexical items have been taught in the course and parts of the sentences are given.
● Apply forms, meanings and uses of a range of words and phrases in responses to open-ended questions about personal eating habits, provided that all the lexical items have been taught in the course.

LISTENING (covered in I Listening)
● Identify specific information in talks about the number of English speakers and the teaching career, provided the audio is around 150-200 words in length, in which about 10% of the words are above A2 level (CEFR); speech is delivered relatively slowly and clearly in standard dialect.
● Identify general ideas in a talk about the teaching career, provided the audio is around 200 words in length, in which about 15% of the words are above A2 level (CEFR); speech is delivered relatively slowly and clearly in standard dialect.

READING (covered in II Reading)
● Identify general ideas and specific information in a paragraph about the difficulty of learning English, provided the text is around 150 words in length, in which about 15% of the words are above A2 level (CEFR).
● Identify specific information and lexical inferences in a paragraph about space travel, provided the text is around 200 words in length, in which about 37% of the words are above A2 level (CEFR).

WRITING (covered in III Writing, Exercise 2)
● Construct a paragraph about the trip that the student remembers the most in 100-120 words, provided that some cues are given.

SPEAKING (covered in IV Speaking)
● Produce responses to five open-ended questions about personal eating habits, provided that related lexical items have been taught in the course.
● Produce a conversation about the information of cooking clubs, provided that the context is made clear and adequate prompts are given.

In summary, from the given test tasks, it can be concluded that this test aims to assess one language component, vocabulary, and the four language skills. Pronunciation can be regarded as more or less incorporated in the speaking assessment. The abovementioned assessment targets cover some important learning targets such as vocabulary, speaking, listening, reading,
and writing (see Appendix A), but noticeably omit grammar. Specifically, as this is an achievement test, the assessment of vocabulary depth (Units 7, 8, 9) is integrated into the assessment of reading, speaking and writing skills. In addition, the listening tasks assess two major sub-skills in the syllabus: listening for general and for specific information (Units 9, 12). Regarding reading skills, the tasks comprehensively cover the crucial sub-skills for 9th graders: reading for general and specific information, and making lexical inferences (Units 9, 10). Concerning writing skills, the assessment target is questionable because the chosen topic is irrelevant to the learning targets. This, together with the exclusion of the subject matter of Unit 11, shows a marked discrepancy between the assessment targets of the test and the learning targets of the textbook. The effects of this discrepancy are discussed in the following sections.

II. Qualities of the test

To judge the quality of this assessment, this section analyzes the test tasks against five benchmarks of language assessment, namely reliability, validity, authenticity, washback, and practicality.

Reliability

A test is considered reliable when it yields consistent results on different occasions of administration (Brown, 2004). While the factors of test administration and the students themselves cannot be measured, test/retest and rater reliability can be examined based on the given test tasks.

Specifically, this test shows a considerable degree of test unreliability. Firstly, the 45-minute time allowance seems too tight considering the coverage of four language skills, which total 13 selected-response items, 12 limited-response items and three extended-response tasks. Furthermore, the mismatch between learning targets and assessment targets can cause test unreliability, as students are expected to revise according to the predetermined lesson objectives only. In addition, poorly
written test items such as writing task 2, which will be discussed later, can interfere with the interpretation of students' performances, leading to test unreliability.

Besides, the reliability of the test is influenced by human error and subjectivity in the scoring process (Brown, 2004). While inter-rater reliability is not an issue, since this type of test is rarely graded by more than one teacher, problems might arise within a single rater's scoring, which concerns intra-rater reliability. In this test, the inclusion of selected-response tasks and limited-response tasks (see Table 2) entails higher intra-rater reliability. However, the objectivity in scoring of the latter can be compromised, as alternative answers might occur. Additionally, performance tasks in the writing and speaking sections are subject to scoring subjectivity if marking rubrics are not well constructed. The grading of students' competencies then lies at the sole mercy of the scorer, impeding the impartial assessment of the students. In this case, since there is only a sample piece of writing for reference and no marking rubrics are provided, we have limited grounds to further assess the test's rater reliability.

Validity

A test is considered valid if it successfully measures the targets it sets out to measure (Hughes, 2003). Regarding construct validity, which refers to the extent to which a test score can be interpreted as an indicator of the target language proficiency (Bachman & Palmer, 1996), the test tasks are partly construct-invalid. To begin with, the reading and speaking tasks, which are selected-response and performance tasks, are relatively well suited to assessing the learning targets specified in the textbook (see Appendix A), such as identifying general and specific information, and delivering a talk or conversation about the given topics. Meanwhile, the listening and writing tasks show substantial room for improvement. For the listening tasks, the writing of gap-filling items in Q1-2 of task 2
underrepresents the target skill of listening for specific information. At the same time, the recording, which supposedly consists of talks, is modified into scripted monologues, losing the natural characteristics of the target situations. As for the writing tasks, while the sentence completion task matches the target of assessing vocabulary, the paragraph writing task fails to clearly communicate the expected outcome to the students regarding the genre and topic of the writing. These shortcomings make the test partly construct-invalid, which might hinder the interpretation of the test scores in evaluating students' performances.

In addition, the test is also partially content-invalid. While sufficient tasks are provided to assess vocabulary and the four language skills on a range of topics, grammar, a crucial language component, is not assessed in any task. Besides, the content of Unit 11 is not covered in any part of the test. This results in inadequate representation of the learning targets, which reduces the content validity of the test. Furthermore, the sentence completion task can be seen as indirect testing of vocabulary, which might lower the content validity as well (Brown, 2004). At the same time, Q4 of this task seems out of place, since it touches on a grammatical point that is not included in the curriculum and does not serve any meaningful assessment target within the task as a whole. Writing task 2 also touches on a topic that is not among the writing targets specified in the textbook (see Appendix A). Overall, the content validity of this test is severely lacking.

Authenticity

Authenticity is the degree to which test materials and test conditions represent what happens in the real target situation (Brown, 2004). Brown (2004) suggested several criteria for evaluating the authenticity of a test, namely language use, items, topics, thematic organization and resemblance to real-life situations. Taking these into
consideration, the listening recording shows an appropriate use of language; however, it is adapted into scripted monologues with little intonation and a relatively slow speaking pace. For the reading section, although Q2-3 of task 2 are contextualized to measure students' sub-skill of inferring the meaning of unknown words from the context, neither of the texts is provided with an authentic source. The speaking part successfully resembles real-world tasks in which students have to apply their learned knowledge and skills. In addition, one advantage that the four parts have in common is the topics of the tasks, which are meaningful and relevant to the course.

Washback

Washback is the effect of testing on how students prepare for the test (Brown, 2004). There are two types of washback, positive and negative, depending on whether it has beneficial or undesirable effects on educational practices (Hughes, 2003). In this case, the analyzed test has a somewhat positive washback, as it thoroughly covers a sizable portion of the learning targets identified in the textbook. However, the test fails to assess grammar as well as the knowledge learned in Unit 11: Changing roles in society. Therefore, students might be perplexed when their preparation is not reflected in the test, potentially demotivating them in later assessments.

Practicality

Practicality is the relationship between the resources that will be required and the resources that will be available (Bachman & Palmer, 1996). The chosen test requires reasonably priced printing materials and well-prepared equipment such as speakers and exam papers, which cannot be practically measured. The only criterion that can be evaluated is the time allowance, which is impractical. It is challenging for students to complete the test within the set time frame of 45 minutes. To elaborate, the test includes a 7-minute recording, 150-200-word texts, and performance tasks, which might be impractical for teachers to
administer and for students to finish within the time limit. Moreover, as the test does not contain an evaluation system or a procedure for how teachers can administer the two speaking tasks, it is difficult to evaluate the practicality of this part in particular and of the test in general.

III. Question writing techniques

The analyzed test encompasses two types of assessment methods: selected-response assessment and constructed-response assessment. The specifics of the question types are presented in the following table:

Table 2. Assessment methods

Selected-response
● Multiple Choice (covered in II Reading - Exercise 1)
● True/False Statements (covered in II Reading - Exercise 2)
● Sequencing (covered in I Listening - Exercise 2 - Q3-6)

Constructed-response: Limited-response
● Gap-filling (covered in I Listening - Exercise 1; Exercise 2 - Q1-2)
● Sentence Completion (covered in III Writing - Exercise 1)

Constructed-response: Extended-response (Performance)
● Paragraph Writing (covered in III Writing - Exercise 2)
● Interview: Response to open-ended questions (covered in IV Speaking - Exercise 1)
● Paired test (covered in IV Speaking - Exercise 2)

Accordingly, all questions in the analyzed test are closely examined in terms of the characteristics of their corresponding assessment methods, the construction of the tasks' instructions, the input and vocabulary level, and their achievement of the expected assessment targets.

Listening

The listening tasks are constructed in the form of gap-filling (Task 1, Q1-5; Task 2, Q1-2) and sequencing tasks (Task 2, Q3-6). The former's items are designed with few modifications from the tapescript, while the latter are synthesized to fit the aim of listening for general ideas. With this construct, both tasks allow a minimal probability of guessing and highly objective scoring. Besides, the instructions for both tasks are written clearly with a brief description of the context. However, there is no direct instruction on how to note the
order of events in the sequencing task, which might pose an unnecessary challenge for students in meeting the task requirement. Regarding the input, the gap-filling items of task 1 (Q1-5) are efficiently laid out in a table with reasonable intervals in between. This allows students to track the items easily and have enough time to fill in one gap before the next item is mentioned. Meanwhile, the gap-filling items in task 2 (Q1-2) do little to serve the purpose of listening for specific information; rather, they test only the recognition of words in use. In terms of vocabulary, the words and phrases in the tasks are largely within the A2 level (roughly 90% in task 1 and 85% in task 2). Those of higher levels have mostly been taught previously, such as although (B1) and combine (B2). Advanced-level words (C1, C2), accounting for 5% in task 2, might hinder the comprehensibility of the recording.

Reading

The instructions for both reading tasks are written clearly and briefly, which ensures their effectiveness. The test designers also make good use of action verbs (read, complete, circle) and clarify the task requirements (decide if the statements are true or false). In terms of the topics chosen, both are relevant to the content of the course, as they draw on the knowledge learned in Unit 9 and Unit 10. Besides, there are noticeable differences between the two reading tasks.

Reading task 1 is a multiple-choice assessment, which contributes to the objectivity of the rating process. However, as the answers for Q4-5 are taken directly from the passage without being paraphrased, there is a fair chance that students can guess the answers correctly without comprehending the text. In terms of the input, approximately 20% of the words in passage 1 are above A2 level, with only 1 word above B2 level (see Appendix C). Given that the majority of those are included in the syllabus, the chosen passage is generally comprehensible to the targeted learners.

Meanwhile, reading
task 2 takes the form of a true/false assessment, thus requiring students' understanding of the text and enabling objective grading. Regarding the input of text 2, although approximately 23% and 12% of the words are at B1 and B2 level respectively (see Appendix C), the majority of the words at B2 level have already been taught to students in Unit 10, such as launched, missions, telescopes (see Appendix B), which improves the comprehensibility of the text. However, there are several spelling and grammatical mistakes in the text, such as accommodate, flybies, serve as space environment… In terms of layout, the task is effective, as it requires students to circle the correct answers instead of writing them down, which creates uniformity and objectivity in the answers.

Writing

Writing task 1 is designed as a sentence completion task with given prompts. This type of task can only assess writing ability to a narrow extent. Instead, it assesses the depth of vocabulary, which is often the focus of writing assessment for low-level students (Brown, 2004). The instructions and the wording of the items are also clear and straightforward, with an example provided. However, Q4 seems redundant, as it tests a grammatical point (present perfect) not included in the textbook (see Appendix A). Besides, all the tested lexical items (breathtaking, affordable, break the bank, full board) have been taught in Unit 8 (see Appendix B), and no grammatical mistakes are found.

Writing task 2 is an extended-response assessment in the form of paragraph writing. This type of task is suitable for evaluating higher cognitive skills such as analysis and production (Florida Center for Instructional Technology, n.d.).
However, the construction of this task reveals several problems. First, the instructions fail to specify the genre of the writing task, such as a blog or a descriptive paragraph. Moreover, the topic of the writing does not represent any learning target identified in the textbook. This raises a question regarding the content validity and reliability of the test tasks. It is highly likely that students will struggle to meet the requirements of the task, since the task specifications are not clearly communicated. Apart from that, the cues provided are sufficient and contain simple language appropriate for students of A2 level.

Speaking

There are several problems that need adjusting in the instructions of the speaking section. Both sets of instructions are unclear, informal, and lengthy, which might confuse the test-takers. A clear context for the role-play in task 2 is also missing. In response, the task should include an example, as its requirement is rather complicated and students might have difficulty performing it. Regarding the assessment method, task 1 is designed as an interview, requiring students to respond to open-ended questions, while task 2 is a paired test in which two students have to carry out a conversation with given prompts. These two formats are particularly well suited to assessing the grasp of vocabulary and the speaking skills set out in the targets. Nevertheless, the test designers should elaborate on how the examiners can successfully conduct the tasks given the tight time limit of the test. Besides, both speaking tasks contain simple vocabulary, which is suitable for students of A2 level. In terms of topic, the focus is placed upon Unit 7: Recipes and eating habits, which is relevant to the course and interesting to students.

REFERENCES

Bachman, L. F., & Palmer, A. S. (1996). Language Testing in Practice: Designing and Developing Useful Language Tests. Oxford University Press.
Brown, H. D. (2004). Principles of Language Assessment. In Language Assessment: Principles
and Classroom Practices (pp. 19-41). Pearson Education, Inc.
Florida Center for Instructional Technology. (n.d.). Classroom Assessment - Constructed Response. Florida Center for Instructional Technology. Retrieved from: https://fcit.usf.edu/assessment/constructed/constructb.html
Hughes, A. (2003). Validity. In Testing for Language Teachers (2nd ed., pp. 26-35). Cambridge University Press.

APPENDIX

A. Learning targets identified in Tieng Anh (Volume 2)
(Each target is read across the columns: performance level, target contents, genre, topic, conditions.)

VOCABULARY
● Forms, meanings and uses of words and phrases related to the topics of recipes and eating habits, tourism, English in the world, space travel, changing roles in society, and my future career
● Pronunciation: Tones in statements with varied purposes

GRAMMAR
● Quantifiers
● Modal verbs in conditional sentences type 1
● Articles
● Conditional sentences type 2
● Relative clauses
● Past simple and past perfect
● Future passive
● Despite/ In spite of
● Verbs + to-inf/ Verbs + V-ing

LISTENING
Identify general ideas and/or specific information (including detailed and specific information):
● in a radio show interview about personal eating habits, provided the audio is ~200 words in length, in which about 11% of the words are above A2 level (CEFR); speech is delivered relatively slowly and clearly in standard dialect
● in a lecture about the benefits of tourism to an area/country, provided the audio is ~170 words in length, in which about 33% of the words are above A2 level (CEFR) (most of them are taught in the curriculum); speech is delivered relatively slowly and clearly in standard dialect
● in a talk about some students' experiences in learning and using languages, provided the audio is ~210 words in length, in which about 11% of the words are above A2 level (CEFR); speech is delivered relatively slowly and clearly in standard dialect
● in a talk about some space tourism services, provided the audio is ~180 words in length, in which about 18% of the words are above A2 level (CEFR); speech is delivered relatively slowly and clearly in standard dialect
● in a talk about the changes that women in Kenya are going through, provided the audio is ~160 words in length, in which about 27% of the words are above A2 level (CEFR) (most are taught in the curriculum); speech is delivered relatively slowly and clearly in standard dialect
● in a conversation about choosing future jobs and reasons for these choices, provided the audio is ~210 words in length, in which about 15% of the words are above A2 level (CEFR) (most are taught in the curriculum); speech is delivered relatively slowly and clearly in standard dialect

READING
● Identify general ideas and specific information in an article about Japanese eating habits, provided the text is about 210 words in length, in which about 25% of the words are above A2 level (CEFR)
● Identify general ideas, specific information and lexical inferences in a passage about a tourist attraction, provided the text is about 220 words in length, in which about 23% of the words are above A2 level (CEFR)
● Identify general ideas, specific information and lexical inferences in a text about English as a means of international communication, provided the text is about 220 words in length, in which about 21% of the words are above A2 level (CEFR)
● Identify specific information in a text about two famous astronauts' space travel, provided the text is about 160 words in length, in which about 11% of the words are above A2 level (CEFR)
● Identify specific information in a passage about the changing roles of women in society and its effects, provided the text is about 200 words in length, in which about 36% of the words are above A2 level (CEFR)
● Identify general ideas, specific information and lexical inferences in an article about choosing a career, provided the text is about 210 words in length, in which about 28% of the words are above A2 level (CEFR)

WRITING
● Construct a description about a classmate's eating habits, with 100-120 words in length, following a suggested writing plan
● Construct a discussion paragraph about the negative effects of tourism, with 100-120 words in length, accompanied by a suggested outline and suggested cohesive devices
● Construct a descriptive paragraph about the uses of English in everyday life, with 120-140 words in length, following a suggested writing plan
● Construct a short advertisement to advertise some products, following the structure of a given sample
● Construct a discussion paragraph about the roles of teenagers in the future, with 120-140 words in length, following a suggested writing plan
● Construct a discussion paragraph about the qualities to do a job well, with 120-140 words in length, following a suggested writing plan

SPEAKING
● Produce a short presentation on Vietnamese eating habits, with suggested cues and contexts
● Produce a talk on choices of holiday, following a given sample, with suggested ideas
● Produce a conversation on learning and using English, following a given sample, with suggested ideas
● Produce a role-play about solving problems in the space station, with suggested cues and contexts
● Produce a talk on changing roles in the future, following a given sample, with suggested ideas
● Produce a talk on skills and abilities to do a job, following a given sample, with suggested jobs

B. Vocabulary learned in Tieng Anh (Volume 2)

Unit 7: Recipes and eating habits
● Different dishes: lasagne, steak pie, curry, Cobb salad, fajitas, beef noodle soup, sushi, mango sticky rice
● Ways of preparing and cooking: whisk, grate, chop, sprinkle, slice, dip, spread, marinate, stir-fry, deep-fry, roast, grill, bake, steam, stew, simmer
● Extra vocabulary: purée, shallot, garnish, cube, tender

Unit 8: Tourism
● Tourism: trip, travel, expedition, resort, tour, tour guide, visit, environment, holiday, book, guides, pleased, excursion, reasonable
● Compound nouns: jet lag, drawback, stop over, check in, bus stop, swimming pool, touchdown, checkout, pile-up, mix-up, full board
● Extra vocabulary: breathtaking, affordable, not break the bank

Unit 9: English in the world
● Languages: bilingual, fluent, rusty, pick up a language, reasonably, get by in a language
● Language use and learning: guess the meaning of a word, know what a word means, have an accent, make mistakes, translate from your first language, correct a mistake, imitate other speakers, look up a word in a dictionary

Unit 10: Space travel
● Astronomy and space travel: astronaut, mission, microgravity, astronomy, habitable, altitude, satellite, meteorite, universe, spacecraft, rocket, telescope, land, orbit, train, experience, launch, a flight suit, spacewalks, operate, good health, parabolic flights, planets
● Extra vocabulary: attach, rinseless, maintenance, Mission Control Centre

Unit 12: My future career
● Jobs, careers, and factors affecting career choice: housekeeper, tour guide, lodging manager, event planner, customer service staff, biologist, opera singer, architect, mechanic, fashion designer, pharmacist, businesswoman, craftsman, physicist, career, job, profession, career path
● Extra vocabulary: can't stand, make a bundle, burn the midnight oil

C. Vocabulary level of reading texts (Part II Reading - Exercise 1 and Exercise 2)

LEVEL      TEXT 1                         TEXT 2
           Number (words)  Percent (%)    Number (words)  Percent (%)
A1         57              64.77          43              35.54
A2         16              18.18          13              10.74
B1         8               9.09           28              23.14
B2         3               3.41           15              12.40
C1         1               1.14           1               0.83
C2         0               0              1               0.83
Unlisted   3               3.41           20              16.53

SCORING SCALE FOR THE EVALUATIVE ESSAY

Performance levels
1: Arguments are not communicated clearly and effectively, with no supporting evidence from the test. Or arguments are irrelevant or absent.
2: A small number of arguments are communicated clearly and there is modest evidence from the test.
3: Most arguments are clear, with specific, sufficient and reasonably well-explained examples from the test.
4: Arguments are clear and convincing, with specific, sufficient, clearly well-explained and highly connected examples from the test.

RATING SCALE

Criteria and performance standards
● Identify the assessment targets in the test (2 points): 1.5
● Evaluate the qualities of the test where appropriate (5 points)
  2.1 Reliability: 0.4, 0.7
  2.2 Content and construct validity: 0.4, 0.7
  2.3 Authenticity: 0.4, 0.7
  2.4 Washback: 0.4, 0.7
  2.5 Practicality: 0.4, 0.7
● Evaluate the question-writing techniques (2 points): 1.5
● Demonstrate good essay writing competence (C1) and appropriate referencing (1 point): 0.3, 0.7

Date posted: 11/03/2022, 16:20
