
TAP HUAN DOI MOI KT DG 2014 03 (Training on Innovation in Testing and Assessment, 2014, Part 03)


Part Two: TRENDS OF INNOVATION IN TESTING AND ASSESSING FOREIGN LANGUAGE LEARNING

I. AN OVERVIEW OF LANGUAGE TESTING AND ASSESSMENT

Questions for discussion: How do you assess your students' language learning in your schools? Can you assess students' ability, knowledge and (specific) competence?

Task 1: Fill in the forms of assessment commonly used at high schools in Vietnam.

Kinds of test | Content | Test type | Time
Mini test | Language skill/element | MCQ, gap fill | 15 minutes
Sequence tests/semester | | |

I.1 CHANGED PERCEPTIONS OF ASSESSMENT

Teachers, parents, and students frequently bristle when they hear the word assessment. So negative are some of the reactions to state and federally mandated testing that we have all but forgotten the positive impact assessment can have on teaching, learning and promoting language programs. Current research suggests that when they view assessment as a learning tool, teachers are more likely to integrate authentic assessments into their lessons and alter how they organize learning experiences (Wiliam, 2006). The negative attitude toward testing may be a result of testing used by government as an instrument for accountability and not as a learning tool. Fifteen years ago, states such as Kentucky and Vermont were on a different course, with statewide assessments taking the form of portfolios in writing and mathematics and performance events in science, math, social studies and the arts. Early results showed teachers spent more time training students to think critically and solve complex problems than previously (What the research, 1996). Today many of those alternative open-ended, large-scale assessments have been abandoned and testing has become standardized. The No Child Left Behind (NCLB) Act has driven teachers toward designing instruction to help students "pass the test", rather than engineering creative and critical thinking learning environments. Many world language teachers have felt a trickle-down effect from NCLB: some are asked to contribute to students' test readiness by incorporating general statewide testing practices in their own assessments; others feel the need to fight to maintain their programs when funding is diverted to tested content areas. Without significant assessment data to make a strong case, many more programs could be in jeopardy when resources tighten (Keatley, 2006). Throughout this period of test-driven teaching, leaders in the world language profession have responded to the need to demonstrate student progress and proficiency by choosing to view assessment as a learning tool. This stance focuses attention on integrated performance tasks, oral interviews, portfolios, collaborative projects and other alternative assessments. Assessments such as these can be particularly beneficial for young learners (McKay, 2006; Shohamy, 1998). Integrated assessments, which are activities that blend content and language in real-world tasks, make learning meaningful to students and provide comfortable, and at times playful, opportunities for contextual language output. They bring a more balanced approach to assessment and have a positive effect on achievement, because students perceive them as activities rather than tests and consequently perform in a more relaxed, stress-free, self-correcting manner. Implementing such assessment experiences often has the effect of increasing teachers' use of the target language in class and improving student motivation. Instruction becomes more student-centred and sparks student-initiated activities.
Current trends in education emphasize the importance of this type of formative assessment, which offers a snapshot of what students know in order to make responsive changes in teaching and learning (Wiggins, 2004; Pellegrino, Chudowsky, & Glaser, 2006). Formative assessment provides a balance to assessment systems and turns attention to a type of teaching that looks more like managing learning than teaching. Reframing how we think about assessment can positively affect how teachers shape curriculum, plan lessons and guide students' learning. Dylan Wiliam (2006) argues that improvement in student achievement will result from what happens in these newly constructed learning environments. To gain greater ground we need compelling data. Dr Carolyn Taylor's study shows that foreign language students significantly outperformed their non-foreign-language counterparts on every subtest of the Louisiana state assessment and the language portion of the fifth-grade Iowa Basic Skills Test (2003). Preliminary data from Dr Aleidine Moeller's longitudinal study of the effects of LinguaFolio on student achievement suggest a positive effect on achievement from self-assessment and goal setting (personal conversation, September 26, 2006). A balanced assessment system is vital for generating the necessary scientific data to maintain current programs, make systemic program improvements and advocate for new world language programs. Therefore assessment can be an essential tool to: diagnose key areas for improvement, describe achievement and progress, manage and assist learning, improve curriculum and instruction, validate program design, facilitate articulation, and advocate for language learning.

I.2 ASSESSMENT LITERACY

Acknowledging the need to reframe our perceptions of assessment is half of the challenge; applying this new perspective to practice is the other crucial aspect. To build a successful balanced assessment program, teachers need to be assessment literate, that is, to know what assessment tools are available and to understand which particular types of instruments should be used for what purposes. A variety of assessments is fundamental to providing a comprehensive overview of a student's competence and making learning transparent to students and other stakeholders. Different assessments address different needs and purposes. Many of the simple, daily classroom activities can serve as assessments to inform planning and motivate learners. After assessment instruments are selected and implemented, the next step is to analyse the results and interpret the findings to inform instructional decision-making. Adopting this new assessment perspective does not imply more time for teachers, rather a redirection of teacher energy to adapt instruction based on the results of assessment. Leaders in the field (O'Malley & Valdez Pierce, 1996; Donato, 1998; Tollefson, 2005; McKay, 2006) agree that teachers need to use a suite of assessments in order to provide a comprehensive view of students' knowledge and performance. Generally, assessments can be categorized in the following manner: Diagnostic assessments identify problem areas, such as reading, language and cognitive skills. Information gained from diagnostic tests provides guidance for student placement or strategic intervention. Achievement tests examine students' mastery of what was taught. Not limited to paper-and-pencil tests, this type of assessment often focuses on discrete points, covers specific content and allows for a perfect score.
Norm-referenced achievement tests were principal determining factors in the assignment of grades. Achievement tests may be considered formative if conducted on an ongoing basis and used to inform teaching and learning. Proficiency tests identify, globally, what students know and can do with the language. The content of a proficiency test is not limited to what was taught in the classroom. Criterion-referenced proficiency tests compare students' overall language competence to a standard, such as the ACTFL K-12 Performance Guidelines or state or district standards. Summative assessment is a comprehensive check of what students have learned at the end of a lesson, unit, or course. Based on a cumulative learning experience, summative assessment is testing for achievement and, depending on age and level, can take the form of performance tasks, oral interviews, written reports, projects or role-plays. Formative assessment encompasses many of the ordinary learning tasks students do on a routine basis. Formative assessment provides ongoing, continuous snapshots of knowledge used to monitor progress, give students feedback, modify curriculum and adjust learning experiences. Examples of formative assessment include anecdotal records, observations, interviews, performance tasks, written work, worded graphics, and journals. Peer assessment encourages students to think deeply about the various elements of language competency when they rate other students' performances and products against specific criteria. With well-designed rubrics, students can analyze and discuss language use and provide feedback to one another.

(Source: Van Houten, J. (2006). Turning a new light on assessment with LinguaFolio. Learning Languages, 12(1), 7-11.)

I.3 ASSESSMENT FOR LEARNING

Assessment for learning involves using assessment in the classroom to raise students' achievement. It is based on the idea that students will improve most if they understand the aim of their learning, where they are in relation to this aim and how they can achieve the aim (or close the gap in their knowledge). Effective assessment for learning happens all the time in the classroom. It involves:
• sharing learning goals with students
• helping students know and recognize the standards to aim for
• providing feedback that helps students identify how to improve
• believing that every student can improve in comparison with previous achievement
• both the teacher and students reviewing and reflecting on students' performance and progress
• pupils learning self-assessment techniques to discover areas they need to improve
• recognizing that both motivation and self-esteem, crucial for effective learning and progress, can be increased by effective assessment techniques
Research has shown that being part of the review process raises standards and empowers students to take action to improve their performance. An AFL checklist can be used to identify effective assessment for learning in your own classroom or school. Assessment for learning (formative assessment) is different from assessment of learning (summative assessment), which involves judging students' performance against national standards (level descriptions). Teachers often make these judgments at the end of a unit of work, year or key stage. Test results also describe students' performance in terms of levels. However, an important aspect of assessment for learning is the formative use of summative data. Key characteristics of assessment for learning are:
• using effective questioning techniques
• using marking and feedback strategies
• sharing learning goals
• peer and self-assessment
High-level questioning can be used as a tool for assessment for learning. Teachers can:
• use questions to find out what students know, understand and can do
• analyse students' responses, and their questions, in order to find out what they know, understand and can do
• use questions to find out what students' specific misconceptions are, in order to target teaching more effectively
• use students' questions to assess understanding
Some questions are better than others at providing teachers with assessment opportunities. Changing the way a question is phrased can make a significant difference to:
• the thought processes pupils need to go through
• the language demands made on students
• the extent to which pupils reveal their understanding
• the number of questions needed to make an assessment of pupils' current understanding

Using marking and feedback strategies

Teachers recognize that feedback is an essential element in helping pupils to improve. When using assessment for learning strategies, teachers need to move away from giving work marks out of 10 with comments that may not be related to the learning intention of the task, and move towards giving feedback that helps the students improve in the specific activity. This will help to close the gap and move students forward in their understanding. It is important to establish trust between the teacher and the student before giving feedback. Students benefit from opportunities for formal feedback through group plenary sessions. Where this works well, there is a shift from teachers telling students what they have done wrong to students seeing for themselves what they need to do to improve and discussing it with the teacher. Giving feedback involves making time to talk to students and teaching them to be reflective about the learning objectives and about their work and responses.

Characteristics of effective feedback
• Feedback is more effective if it focuses on the learning intention of the task and is given regularly, while still relevant.
• Feedback is most effective when it confirms that students are on the right track and when it stimulates correction or improvement of a piece of work.
• Suggestions for improvement should act as "scaffolding", i.e. students should be given as much help as they need to use their knowledge. They should not be given the complete solutions as soon as they get stuck and should learn to think things through for themselves.
• Students should be helped to find alternative solutions if simply repeating an explanation continues to lead to failure.
• Feedback on progress over a number of attempts is more effective than feedback on one attempt treated in isolation.
• The quality of dialogue in feedback is important, and most research indicates that oral feedback is more effective than written feedback.
• Students need to have the skills to ask for help, and the ethos of the school should encourage them to do so.
A culture of success should be promoted in which every student can make achievements by building on their previous performance, rather than being compared with others. This is based on informing students about the strengths and weaknesses demonstrated in their work and giving feedback about what their next steps should be.

Sharing learning goals

Most schemes of work emphasize the need to clearly identify the learning objectives for a lesson. Teachers should ensure that students recognize the difference between the task and its learning intention (separating what they have to do from what they will learn). Assessment criteria or learning outcomes are often defined in formal language that students may not understand.
To involve students fully in their learning, teachers should:
• explain clearly the reasons for the lesson or activity in terms of the learning objectives
• share the specific assessment criteria with students
• help students to understand what they have done well and what they need to develop
Looking at a range of other students' responses to the task set can help students understand how to use the assessment criteria to assess their own learning.

Peer and self-assessment

Research has shown that students will achieve more if they are fully engaged in their own learning process. This means that if students know what they need to learn and why, and then actively assess their understanding, the gaps in their own knowledge and the areas they need to work on, they will achieve more than if they sit passively in a classroom working through exercises with no real comprehension either of the learning intention of the exercise or of why it might be important.

Peer assessment

Peer assessment can be effective because students can clarify their own ideas and understanding of both the learning intention and the assessment criteria while marking other students' work. Peer assessment must be managed carefully. It is not for the purpose of ranking, because if students compare themselves with others rather than with their own previous attainment, those performing better than their peers will not be challenged and those performing worse will be demotivated.

Self-assessment

Self-assessment is an important tool for teachers. Once pupils understand how to assess their current knowledge and the gaps in it, they will have a clearer idea of how they can help themselves progress. Teachers and students can set targets relating to specific goals rather than to national curriculum levels. The pupils will then be able to guide their own learning, with the teacher providing help where necessary or appropriate. In addition, students will need to:
• reflect on their own work
• be supported to admit problems without risk to self-esteem
• be given time to work problems out
Asking students to look at examples of other students' work that does and does not meet the assessment criteria can help them to understand what was required from a task and to assess the next steps they might need to take. Looking at different responses can also help students understand the different approaches they could have taken to the task. It is often helpful if the work is from students they do not know.

(Source: adapted from http://www.qca.org.uk/qca_4336.aspx)

II. QUALITY TEST CONSTRUCTION

Questions for discussion: What are the major characteristics of a good test? How do you plan a test? How do you write a test specification?

II.1 CHARACTERISTICS OF A GOOD TEST

A good classroom test is valid and reliable. Validity is the quality of a test which measures what it is supposed to measure. It is the degree to which evidence, common sense, or theory supports any interpretations or conclusions about a student based on his/her test performance. More simply, it is how one knows that a math test measures students' math ability, not their reading ability. Another aspect of test validity of particular importance for classroom teachers is content-related validity: do the items on a test fairly represent the items that could be on the test?
Reasonable sources for "items that should be on the test" are class objectives, key concepts covered in lectures, main ideas, and so on. Classroom teachers who want to make sure that they have a valid test from a content standpoint often construct a table of specifications which specifically lists what was taught and how many items on the test will cover those topics. The table can even be shared with students to guide them in studying for the test and as an outline of what was most important in a unit or topic.
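For illustration only (the topics and item counts below are invented, not taken from any actual syllabus), a simple table of specifications for a 45-minute unit test might look like this:

Content taught | Objective | Number of items
Present perfect vs. past simple | Recognize the correct form in context | 10
Vocabulary: travel and holidays | Match words to definitions | 8
Reading: short narrative | Identify the main idea and specific details | 7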
Reliability is the quality of a test which produces scores that are not affected much by chance. Students sometimes randomly miss a question they really knew the answer to, or sometimes get an answer correct just by guessing; teachers can sometimes make an error or score inconsistently with subjectively scored tests. These are problems of low reliability. Classroom teachers can solve the problem of low reliability in some simple ways. First, a test with many items will usually be more reliable than a shorter test, as whatever random fluctuations in performance occur over the course of a test will tend to cancel themselves out across many items. By the same token, a class grade will itself be more reliable if it reflects many different assignments or components. Second, the more objective a test is, the fewer random errors there will be in scoring, so teachers concerned about reliability are often drawn to objectively scored tests. Even when using a subjective format, such as supply items, teachers often use a detailed scoring rubric to make the scoring as objective, and therefore as reliable, as possible. Classroom tests can also be categorized based on what they are intended to measure. Traditional paper-and-pencil classroom tests (e.g. multiple-choice, matching, true/false) are best used to measure knowledge. They are typically objectively scored (a computer with an answer key could score them). Performance-based tests, sometimes called authentic or alternative tests, are best used to assess student skill or ability. They are typically subjectively scored (a teacher must apply some degree of opinion in evaluating the quality of a response).

Task 1: Which of these are more reliable?
a) tests with a large number of test items in them, or tests with a small number of test items in them?
b) tests piloted in a few schools, or tests piloted across the whole country?
c) test items which 40% of candidates get right, or test items which everyone gets right?
d) tests which rely on the markers' good level of English and their subjective judgement, or tests which can be marked "clerically"?
e) reading items which use the same grammar and vocabulary as the textbook but in a new passage that no candidate has ever seen before, or reading items which are taken directly from the textbook?

Task 2: Arrange the following steps in the order of planning a test.
a) administering the test
b) scoring and rating
c) determining the purpose of the test
d) reporting the test results
e) analyzing the test items and tasks
f) planning the test
g) improving the course and the teaching methods
h) selecting items and tasks
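The claim above that longer, objectively scored tests are more reliable can be made concrete with two classical formulas: the Kuder-Richardson 20 (KR-20) reliability estimate and the Spearman-Brown prediction for a lengthened test. The Python sketch below is not part of the original training material; the data layout and function names are our own illustration.

def kr20(responses):
    """Kuder-Richardson 20 reliability estimate.
    `responses` is one list per candidate, one 0/1 entry per item."""
    n_items = len(responses[0])
    n_cands = len(responses)
    totals = [sum(r) for r in responses]
    mean = sum(totals) / n_cands
    var_total = sum((t - mean) ** 2 for t in totals) / n_cands
    pq = 0.0
    for i in range(n_items):
        p = sum(r[i] for r in responses) / n_cands  # proportion answering item i correctly
        pq += p * (1 - p)                           # variance of a 0/1-scored item
    return (n_items / (n_items - 1)) * (1 - pq / var_total)

def spearman_brown(reliability, length_factor):
    """Predicted reliability of a test lengthened `length_factor` times
    with comparable items: random errors cancel out across more items."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Four candidates, five items (invented data):
responses = [
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
]
print(round(kr20(responses), 2))          # about 0.87
print(round(spearman_brown(0.60, 2), 2))  # doubling a 0.60 test predicts 0.75

The Spearman-Brown line is the intuition behind the first contrast in Task 1: doubling a test of reliability 0.60 with comparable items predicts a reliability of about 0.75.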
II.2 BLOOM'S TAXONOMY

Bloom's Taxonomy provides an important framework for teachers to use to focus on higher-order thinking. By providing a hierarchy of levels, this taxonomy can assist teachers in designing performance tasks, crafting questions for conferring with students, and providing feedback on student work. This resource is divided into different levels, each with keywords that exemplify the level and questions that focus on that same critical thinking level. Questions for critical thinking can be used in the classroom to develop all levels of thinking within the cognitive domain. The results will be improved attention to detail, increased comprehension and expanded problem-solving skills. Use the keywords as guides to structuring questions and tasks. Finish the questions with content appropriate to the learner. Assessment can be used to help guide culminating projects. The six levels are:
Level I: Knowledge
Level II: Comprehension
Level III: Application
Level IV: Analysis
Level V: Synthesis
Level VI: Evaluation

# Bloom's Level I: Knowledge
Exhibits memory of previously learned material by recalling fundamental facts, terms, basic concepts and answers about the selection.
Key words: who, what, why, when, omit, where, which, choose, find, how, define, label, show, spell, list, match, name, relate, tell, recall, select
Questions: • What is…? • Can you select…? • Where is…? • When did _ happen? • Who were the main…? • Which one…? • Why did…? • How would you describe…? • When did…? • Can you recall…? • Who was…? • How would you explain…? • How did _ happen…? • Can you list the three…? • How is…? • How would you show…?
Assessment:
• Match character names with pictures of the characters
• Match statements with the character who said them
• List the main characteristics of one of the main characters in a WANTED poster
• Arrange scrambled story pictures and/or scrambled story sentences in sequential order
• Recall details about the setting by creating a picture of where a part of the story took place

# Bloom's Level II: Comprehension
Demonstrates understanding of facts and ideas by organizing, comparing, translating, interpreting, giving descriptions and stating main ideas.
Key words: compare, contrast, demonstrate, interpret, explain, extend, illustrate, infer, outline, relate, rephrase, translate, summarize, show, classify
Questions: • How would you classify the type of…? • How would you compare…? contrast…? • Will you state or interpret in your own words…? • How would you rephrase the meaning…? • What facts or ideas show…? • What is the main idea of…? • Which statements support…? • Which is the best answer…? • What can you say about…? • How would you summarize…? • Can you explain what is happening…? • What is meant by…?
Assessment:
• Interpret pictures of scenes from the story or art print
• Explain selected ideas or parts from the story in his or her own words
• Draw a picture and/or write a sentence showing what happened before and after a passage or illustration found in the book (visualizing)
• Predict what could happen next in the story before the reading of the entire book is completed
• Construct a pictorial time-line that summarizes what happens in the story
• Explain how the main character felt at the beginning, middle, and/or end of the story

# Bloom's Level III: Application
Solves problems in new situations by applying acquired knowledge, facts, techniques and rules in a different, or new, way.
Key words: apply, build, choose, construct, develop, interview, make use of, organize, experiment with, plan, select, solve, utilize, model, identify
Questions: • How would you use…? • How would you solve _ using what you've learned…? • What examples can you find to…? • How would you show your understanding of…? • How would you organize _ to show…? • How would you apply what you learned to develop…? • What approach would you use to…? • What other way would you plan to…? • What would result if…? • Can you make use of the facts to…? • What elements would you use to change…? • What facts would you select to show…? • What questions would you ask during an interview?
Assessment:
• Classify the characters as human, animal, or thing
• Transfer a main character to a new setting
• Make finger puppets and act out a part of the story
• Select a meal that one of the main characters would enjoy eating: plan a menu, and a method of serving it
• Think of a situation that occurred to a character in the story and write about how he or she would have handled the situation differently
• Give examples of people the student knows who have the same problems as the characters in the story
# Bloom's Level IV: Analysis
Examines and breaks information into parts by identifying motives or causes. Makes inferences and finds evidence to support generalizations.
Key words: analyze, categorize, classify, compare, contrast, discover, dissect, divide, examine, inspect, simplify, survey, test for, distinguish, list, distinction, theme, relationships, function, motive, inference, assumption, conclusion, take part in
Questions: • What are the parts or features of…? • How is _ related to…? • Why do you think…? • What is the theme…? • What motive is there…? • Can you list the parts…? • What inference can you make…? • What conclusions can you draw…? • How would you classify…? • How would you categorize…? • Can you identify the different parts…? • What evidence can you find…? • What is the relationship between…? • Can you make a distinction between…? • What is the function of…? • What ideas justify…?
Assessment:
• Identify general characteristics (stated and/or implied) of the main characters
• Distinguish what could happen from what couldn't happen in the story in real life
• Select parts of the story that were the funniest, saddest, happiest, and most unbelievable
• Differentiate fact from opinion
• Compare and/or contrast two of the main characters
• Select an action of a main character that was exactly the same as something the student would have done

# Bloom's Level V: Synthesis
Compiles information together in a different way by combining elements in a new pattern or proposing alternative solutions.
Key words: build, choose, combine, compile, compose, construct, create, design, develop, estimate, formulate, imagine, invent, make up, originate, plan, predict, propose, solve, solution, suppose, discuss, modify, change, original, improve, adapt, minimize, maximize, theorize, elaborate, test, happen, delete
Questions: • What changes would you make to solve…? • How would you improve…? • What would happen if…? • Can you elaborate on the reason…? • Can you propose an alternative…? • Can you invent…? • How would you adapt _ to create a different…? • How could you change (modify) the plot (plan)…? • What facts can you compile…? • What way would you design…? • What could be combined to improve (change)…? • Suppose you could _, what would you do…? • How would you test…? • Can you formulate a theory for…? • Can you predict the outcome if…? • How would you estimate the results for…? • What could be done to minimize (maximize)…? • Can you construct a model that would change…? • Can you think of an original way for the…?
Assessment:
• Create a story from just the title before the story is read (pre-story exercise)
• Write three new titles for the story that would give a good idea what it was about
• Create a poster to advertise the story so people will want to read it
• Use your imagination to draw a picture about the story
• Create a new product related to the story
• Restructure the roles of the main characters to create new outcomes in the story
• Compose and perform a dialogue or monologue that will communicate the thoughts of the main character(s) at a given point in the story
• Imagine that you are the main character. Write a diary account of daily thoughts and activities
• Create an original character and tell how the character would fit into the story
• Write the lyrics and music to a song that one of the main characters would sing if he/she/it became a rock star, and perform it
# Bloom's Level VI: Evaluation
Presents and defends opinions by making judgments about information, validity of ideas or quality of work based on a set of criteria.
Key words: award, choose, conclude, criticize, decide, defend, determine, dispute, evaluate, judge, justify, measure, compare, mark, rate, recommend, rule on, select, agree, appraise, prioritize, opinion, interpret, explain, support, importance, criteria, prove, disprove, assess, influence, perceive, value, estimate, deduct
Questions: • Do you agree with the actions/outcome…? • What is your opinion of…? • How would you prove/disprove…? • Can you assess the value or importance of…? • Would it be better if…? • Why did they (the character) choose…? • What would you recommend…? • How would you rate the…? • How would you evaluate…? • How would you compare the ideas…? the people…? • How could you determine…? • What choice would you have made…? • What would you select…? • How would you prioritize…? • How would you justify…? • What judgment would you make about…? • Why was it better that…? • How would you prioritize the facts…? • What would you cite to defend the actions…? • What data was used to make the conclusion…? • What information would you use to support the view…? • Based on what you know, how would you explain…?
Assessment:
• Decide which character in the selection he or she would most like to spend a day with, and why
• Judge whether or not a character should have acted in a particular way, and why
• Decide if the story really could have happened and justify reasons for the decision

II.3 WRITING TEST SPECIFICATION

Test specifications in general cover the following areas, as appropriate. Test specifications naturally vary according to their uses, so not all of these will be appropriate for all tests.
# What is the purpose of the test?
The purpose of a test generally falls into one of five broad categories: placement, progress, achievement, proficiency, and diagnostic. It is important, before starting to write a test, to know which of these broad purposes the test has.
# What sort of learners will be taking the test?
Useful information about the learner can include the age or educational level; general level of proficiency; first language(s); cultural or national background; level and nature of education; reason for taking the test; professional interests, if any; and levels of background knowledge.
# How many sections should the test have, and how long should they be?
The specifications should establish how many sections the test has, how long each of them is, and how they are different. For example, the test might be one two-hour exam or two one-hour sections, one an examination and one an essay.
# What text types should be used in the test?
The specifications should indicate whether the texts should be written or spoken, what kinds of sources they should come from, what topics they should include, how difficult they should be, what their functions should be (for example, persuasion or summarizing), etc.
# What language skills should be tested?
The specifications should indicate what skills the test should cover, including the enabling skills, and whether they should be tested in an integrative or a discrete way. They should also establish whether the test should ask for the main idea, specific details, inferences, etc.
# What language elements should be tested?
If there are specific grammatical points, functions, or lexical items that should be covered in the test, the specifications should list these.
# What sort of tasks are required?
The specifications should indicate whether the tasks should be simulated authentic tasks, objective or subjective, etc.
# How many items are there in each section, and what is the relative weight for each item?
The specifications should specify the number of items in each section and indicate whether they are weighted equally or whether more weight is given to more difficult or longer items.
# What test methods are used?
The test specifications should indicate whether the items should be multiple choice, fill-in-the-blank, picture description, role play using cue cards, essay, etc.
# What instructions should be given to the candidate?
The test specifications should indicate what information should be included in the instructions, whether examples of worked problems should be provided, and whether there will be information about how the responses will be evaluated.
# What criteria will be used for assessment?
The specifications should establish whether the test will be assessed according to accuracy or fluency, whether spelling will be counted, and so on.

TEST SPECIFICATION GRID

Focus | Input | Response/item type | Marks | Weights
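As an illustration (the entries are invented, not drawn from any actual examination), one completed row of such a grid might read:

Focus | Input | Response/item type | Marks | Weights
Reading for specific detail | A 150-word announcement | 4-option multiple choice, 5 items | 5 | 25%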
III. TYPES OF TEST AND TEST ITEMS

Questions for discussion: What are the types of tests? What are the types of test items?

III.1 TYPES OF TEST

Task 1: Match the test types with their descriptions.
Test types: placement, diagnostic, progress, achievement
Descriptions:
• To identify students' strengths and weaknesses
• To measure students' proficiency using international standards
• To evaluate students' attainment of course outcomes at the end of the course
• To place students at the appropriate level of standardized instruction within a program
• To provide information about mastery of the course materials

III.2 TYPES OF TASK/ITEM TYPES

In approaching the task of item writing, the writer needs to be clear about the following:
• why that particular item type has been selected for the test
• which areas of the test taker's ability are to be the focus of the items

Terminology

A test is composed of a number of tasks. The more tightly controlled type of task (the kind used to test reading skills, structural competence, listening and writing at sentence level) is made up of a combination of the rubric, input consisting of a stimulus such as a text, and the candidate's response based on items of various types (whether selected or produced), which is scored against a key or mark scheme. A distinction must be made between item-based task types and the tasks used in tests of extended writing and speaking, which consist of rubric, input and a response which is scored against a rating scale or set of criteria as opposed to a key or mark scheme. Some particular issues related to texts, item types, non-item-based task types, rubrics, and keys and mark schemes will now be considered.

III.2.1 TEXTS

When selecting texts for a task it is very important to use texts which are suitable for the purpose of testing the particular candidate population concerned. The level of difficulty of the language must be appropriate, and the subject matter suitable for the candidates' probable age-group and other aspects of their background. In general, topics which might cause distress or offence should be avoided. These issues are touched on in Module Two, in the discussion of the process of examination production. Two issues concerning texts will be discussed here: the question of authenticity and the question of what makes a text difficult.

a) Authenticity

A much debated issue affecting the choice of texts, for teaching as well as for testing, is that of authenticity. Is it more appropriate to the candidate's needs for the examination to include a text (in a test of reading skills, for example) which is taken from a genuine source such as a newspaper or magazine, or a text written by a test provider or item writer?
The newspaper or magazine text may seem more appropriate because it is derived from 'real-life' use of language, written for native speakers of the language and not just for the purposes of language testing. Being able to deal with the texts a native speaker can deal with may be the goal of the learner, and so this is the language s/he should be exposed to and tested on. A text written for the sole purpose of testing a certain area of language may bear no resemblance to language as it is used by native speakers, who are not concerned with language testing. It has been argued that authenticity is a consequence of the interaction between the reader and the text, and not simply a quality of the text alone. Even a quick look at the range of language use contained in a variety of newspapers and magazines shows that not all written texts are authentic for all readers. Who the reader is, the reader's purpose in looking at the text, the writer's purpose and the degree of social and cultural match between reader and text all have a bearing on the authenticity of the text for that reader. If there is very little match between the factual and cultural knowledge contained in the text and that possessed by the reader (think of an elderly opera lover attempting to read a teenage rock magazine), there may be little authenticity in the experience of reading it. An important view of authenticity has developed since the late seventies. Widdowson (1978) and later Bachman (1990) conceptualize authenticity on two levels, situational and interactional.

# Situational authenticity
Situational authenticity may be defined as the degree to which the test method characteristics of a language task reflect the characteristics of a real-life situation in which the language will be used. In designing a situationally authentic task, it is necessary first to identify the critical features that define the task in the target language use domain, using a framework of test method characteristics as a starting point. Test tasks which have these critical features can then be designed.

# Interactional authenticity
Bachman defines his concept of interactional authenticity in the interaction between test task and test taker: "If our objective is to design language test tasks that correspond to non-test language use, then test tasks must incorporate the goal-directed, purposive nature of language as communication, which means that they must involve the test taker in functions other than simply demonstrating his knowledge of the language." (Bachman, 1990)
This view of authenticity implies that test writers and developers should:
• make use of texts, situational contexts, and tasks which simulate 'real life' without trying to replicate it exactly;
• attempt to use situations and tasks which are likely to be familiar and relevant to the intended test taker at the given level;
• make clear, in providing contexts, the purpose for carrying out a particular task, as well as the intended audience;
• make clear the criterion for success in completing the task.

b) Difficulty of texts

Test writers and developers regularly have to deal with the concept of what constitutes text difficulty. With reference to both written and spoken texts, it is necessary to be aware that there are a number of factors which affect the degree of difficulty readers and listeners (whether or not they are in the position of examination candidates) experience in processing them:
# The linguistic structure of the text
A text which is composed of short, simple sentences, using the active voice, is perceived as easier than one composed of long, complex sentences which include much use of the passive. Structures and vocabulary which are relatively familiar to candidates are easier than those with which they are less familiar.
# The context in which the text is placed
Whether the text is spoken or written, it is easier to process if it addresses readers or listeners directly, rather than putting them in the position of the 'fly on the wall' observing interaction between other characters. The visual support provided by video (in a listening test), pictures or diagrams makes a text easier, as does the absence of pressure to deal with the text in a limited time. If the text is placed in a context which creates an 'information gap', giving candidates a compelling reason to wish to extract information from the text, this too helps to make it easier.
# The content of a text
In a narrative, a small number of clearly differentiated characters are easiest to deal with. For example, a story about two women and two men, who are of different ages, have dissimilar names to one another and clearly presented contrasting characters, is perceived as easier than a story which involves a great number of lightly sketched minor characters. The sequence of events in a narrative is easiest when most straightforward, with events described in the order in which they take place, without the use of flashbacks. If there is a clear link between events - such as cause and effect - this also makes the text easier than one which appears to consist of unrelated events. The listener or reader who already possesses knowledge structures which the new narrative fits into finds it less difficult than someone who lacks these.
# The type of interaction, and the relationship which it creates between text and reader or listener
Extremely formal texts expressing a cold relationship, or a very informal, intimate style, are both likely to cause more difficulty to readers or listeners than a relatively neutral or moderately informal style.
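Some of these difficulty factors can be approximated automatically. The sketch below is our own illustration, not part of the source text: it computes two crude proxies for the linguistic-structure factor (average sentence length and lexical variety) and deliberately ignores the context, content and interaction factors, which resist simple measurement.

import re

def difficulty_proxies(text):
    """Two crude, automatable proxies for linguistic difficulty."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    avg_sentence_length = len(words) / len(sentences)  # longer -> usually harder
    lexical_variety = len(set(words)) / len(words)     # more varied -> usually harder
    return round(avg_sentence_length, 1), round(lexical_variety, 2)

simple = "The dog ran. It saw a cat. The cat ran too."
complex_ = ("Although the committee had been warned repeatedly, the proposal, "
            "which was drafted in haste, was approved.")
print(difficulty_proxies(simple))    # short sentences, repeated words
print(difficulty_proxies(complex_))  # one long, complex sentence

Such scores can only flag texts for closer inspection; as the passage stresses, real difficulty depends heavily on the reader and the context of use.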
c) The issue of difficulty in listening tasks

The possibility of looking back over the parts of a text, seeing the relationships between them and viewing the text as a whole, which a written text offers, is not present in a listening task. Apart from considering the level of linguistic difficulty, in terms of the complexity of structure and vocabulary used, a writer of listening tasks should take note of the following factors when writing or choosing texts. All of them affect the amount of processing required over and above the level of simple comprehension, and this impacts on the difficulty level of the text.
# Interaction of speakers
• a monologue is the easiest type of speech to follow, especially if the speaker seems to be addressing the listener directly
• two contrasting voices (one male, one female, or one adult, one child) are next easiest
• a conversation between two people of the same sex and age, or involving more than two speakers, is more difficult
• a conversation between speakers who have clearly differentiated roles, such as parent to child, is easier to follow
• a conversation between speakers who have similar roles, for example colleagues of the same sex and similar status discussing a situation at work, is generally more difficult
# Time reference and context
• a text which involves changes of scene, changes of time reference and a large number of events will be more difficult than one which is limited to a small number of events, all of which share the same time and setting
• a text in which a clear context is established from the beginning is easier to follow
# Language
• a short text packed with information and accompanied by a proportionately large number of items is difficult for candidates to process, even if the level of language used seems appropriate. The inclusion of redundant material, in the form of explanation, rephrasing and repetition, helps to lower the difficulty level of a text
• informal language, with its high speed, use of contractions and colloquialisms, its apparent lack of coherent organisation and frequent short turns, often presents a more difficult listening task than more formal language, which tends to be slower, to consist of longer turns and to share more of the features of written language
• a naturally slow speaker with an expressive voice is easier to understand than someone who speaks fast or in a monotone. It also helps if the speed at which the person speaks is consistent.

III.2.2 ITEM TYPES

Which type of item is most appropriate for testing a particular skill in a particular test? This is an important issue, and the question is normally decided at the test design stage. The large number of different item types used in language testing can be categorized in various ways:
• Some are seen as objective, in that no human judgement is required in marking them, while others demand a constructed response and subjective marking methods.
• Some are based on receptive skills while others test production.
• Some are text based while others are free-standing or discrete.
What is the most important criterion for measuring the value of an item type?
Although some item types are more frequently used than others, it would be inappropriate to believe that these are the best ones to use. The most important criterion for measuring the value of an item type is its appropriacy for use in testing language in a particular situation and for a specified purpose. The item type which provides the most direct means of measuring the desired learning outcome tends to be the best item type to choose.

A few general rules

There are a few general rules to follow when constructing any kind of item:
• items should always attempt to test salient information
• normal grammatical conventions should be followed
• when a new item type is used, an example should be provided (unless the procedure is so simple that this is unnecessary)
• with text-based items it must be necessary to read and understand the text in order to arrive at the correct answer - it should not be possible to answer correctly by using background or general knowledge only
• text-based items may be placed before or after the text, but those placed before should test an overview of the text, while those placed after the text may require more detailed reading or ask for conclusions to be drawn
One way of dividing item types into two broad groupings is the following:
# Selection items
Item types which involve the candidate in making a choice of response between various options offered, e.g. three- or four-option multiple choice items, true/false items and various kinds of matching items. Scoring selection items is usually quicker and more objective. Sometimes teachers decide to use selection items when they are interested in measuring basic, lower levels of understanding (at the knowledge or comprehension level in a Bloom's taxonomy sense; Bloom et al., 1956).
# Candidate-supplied items
Item types which demand that the candidate supplies the response, e.g. short answer items, open cloze items. Generally, tests composed of multiple choice items are regarded as more objective from the point of view of marking than those where the candidate has to supply the response. Scoring supply items tends to take more time and is usually more subjective. Teachers tend to use supply items if they are interested in higher levels of understanding, but a well-written selection item can still get at higher levels of understanding. It is important to reiterate that one item type is not in itself more or less useful than another item type. The selection of an appropriate item type depends on the specific aims of the test provider, and on what the priorities are. In the descriptions of item types and the comments on them which follow, some indication is given of the skills usually associated with the use of a particular item type.

Testing Speaking and Writing

The testing of speaking and writing can be divided into the testing of elements of skill which may be labelled 'grammar', 'vocabulary', 'spelling', 'pronunciation', etc. Items which test writing skills may appear in tests, or components of tests, called either 'writing' or 'grammar and usage' or 'structural competence'. Speaking skills or writing skills looked at on this level of discrete elements are sometimes assessed by means of item-based tests. Those speaking or writing skills which involve organization of ideas and arguments, interaction, sequencing and the construction of coherent narrative have to be tested by means of tasks which are not generally item-based. A consideration of a range of item and task types is presented in the following sections. It is not exhaustive, but aims to cover the most commonly used item and task types, especially those used in ALTE members' examinations for foreign language learners.
III.2.2.1 MULTIPLE CHOICE AND OTHER SELECTION ITEM TYPES

When selection item types are used in a test, it is likely that the test provider considers some of the following features of these item types to be advantageous. Selection items tend to be:
• familiar to nearly all candidates in all places
• independent of writing ability
• easy and quick to mark, lending themselves easily to the use of a template or Optical Mark Reader
• capable of being objectively scored
• economical of the candidate's time, so that many can be attempted in a short period and a range of objectives covered, adding to the reliability of the test
On the other hand, it should be pointed out that selection items are sometimes criticized because they tend to be:
• tests of recognition rather than production
• limited in the range of what they can test
• incapable of letting a candidate express a wide range of abilities
• dependent, in many cases, on reading ability
• affected by guesswork - even with three distractors there is a one-in-four chance of getting the answer right by guessing, while with fewer distractors the effect increases accordingly
• very difficult and time-consuming to write successfully
• capable of leading to poor classroom practice, if teaching focuses too intensively on preparation for tackling this sort of test item
The decision as to whether this category of item types is used depends on what is to be tested and why. No task or item type is right or wrong in absolute terms.
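Two of the points above, clerical marking against a key and the one-in-four guessing chance, can be illustrated with a short simulation. The answer key and item numbers below are invented for the example, not taken from any real test.

import random

KEY = {1: "B", 2: "A", 3: "C", 4: "D", 5: "B"}  # invented answer key

def score(answers, key=KEY):
    """'Clerical' marking: one mark per response that matches the key."""
    return sum(1 for item, k in key.items() if answers.get(item) == k)

# Expected score from blind guessing on four-option items is items / 4:
random.seed(1)
trials = 10_000
total = sum(score({i: random.choice("ABCD") for i in KEY}) for _ in range(trials))
print(total / trials)  # close to 5 / 4 = 1.25 marks out of 5

This is why some examinations apply a correction for guessing, and why adding plausible distractors matters: each extra working distractor lowers the expected payoff of a blind guess.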
A wide variety of techniques may be grouped together under the heading of multiple choice and other selection item types. What they all have in common is that candidates are required to make a choice among options supplied in the test. They do not have to supply a single word of their own. The following list gives examples of the most familiar selection techniques.

a) Multiple choice items

# Discrete point and text-based multiple choice items
A discrete point multiple choice item is presented in the example below:

The singer ended the concert …… her most popular song.
A by   B with   C in   D as

The gapped sentence is the stem, which is followed, in the above example, by four options. B is the correct choice, or key, while A, C and D act as distractors which may be chosen by weaker candidates. Three, four or five options may be given.

A text-based multiple choice item is presented in the example below:

Then he saw a violin in a shop. It was of such high quality that even top professional players are rarely able to afford one like it. 'I'd never felt money was important until then,' Colin explained. 'Even with the money I'd won, I wasn't sure I could afford to buy the violin, so I started to leave the shop. Then I thought I'd just try it, and I fell in love with the beautiful sound it made. I knew it was perfect both for live concerts and for recordings.'

When Colin first found the violin, what did he think?
A He might not have enough money to buy it.
B He should not spend all of his money on it.
C He was not a good enough player to own it.
D He could not leave the shop without it.

Text-based multiple choice items are often presented as a question followed by three, four or five options which include the key, or correct answer. Multiple choice items are very frequently used in tests of reading and listening. Below is an example from a listening test. It is an interview with a young American woman who runs a coffee company in London.

Interviewer: Now, Ally, you run this company with your husband, Scott, so tell me how did it all start?
Ally: Well, I've known Scott since I was fifteen and after we'd both finished college in the States, he came to England because he had got a job in a bank. We weren't married then but I decided to follow him over here. I had a degree in Media Studies, so I got a job in magazine publishing very quickly.

Question: Ally first decided to go to England because she
A would have the chance to study there.
B was offered a job in London.
C wanted to be with her boyfriend.

In the above example the multiple choice item is presented as a stem with options for completion.

Ally: The coffee thing started when on my first morning here I told Scott I'd walk him to work and we'd stop for a latte, that's a milky coffee, by the way. Scott looked at me blankly, and I just assumed he'd been working away too hard and hadn't discovered where you could get great latte coffees in London. I couldn't believe they weren't available! Years passed. I brought the subject up everywhere we went and people's eyes would light up.

Question: What surprised Ally about London?
A a certain type of coffee was not on sale
B people were not interested in the quality of coffee
C the coffee bars were not conveniently located

In the above example the multiple choice item is presented as a question. One of the decisions to make when writing a text-based multiple choice item is whether to present it as a question or as a completion item. In some tests a text is followed by multiple choice items of one of these kinds only, while other test constructors prefer to use a mixture of question and completion types.

What are the rules for writing discrete or text-based multiple choice items?
• The item should measure one important point.
• Items should not be interdependent, i.e. the answer to one item should not influence the answer to another.
• There should be only one correct option, and its status as key must be clear and unambiguous.
• The distractors, while being incorrect, should be plausible enough to distract weak candidates.
• Options should form a coherent set of alternatives; there should not be three similar-looking options and one which stands out as different from the others.
• Where the options complete a stem, each should form a grammatically correct sentence. Similarly, in discrete items, grammatically nonsensical forms should not be invented as distractors.
• Each option should be as close in length to the others as is possible.
• To reduce the reading load, any information which is repeated in each option should be taken out of the options and placed in the stem.
• Options which cancel each other out, using words such as 'always' and 'never', should be avoided.
• Options should have an approximately equivalent grammatical structure and level of complexity to one another.
• Negative forms should be avoided as much as possible, but if a negative word is included in the stem, it should be emphasized by putting it in bold print, and all the options should be positive.
• Verbal clues which direct the candidate to the correct option ('word spotting') should be avoided. In the following example, the repetition of the word 'fruit' in the stem and one of the options provides a clue to the answer:

Which of the following is mentioned in the recipe for Rich Fruit Cake?
A almonds   B milk   C dried fruit   D apples

• The position of the key should vary randomly, and each letter (A, B, C or A, B, C, D) should be used a similar number of times.

b) True/false item

The true/false item is one in which test takers have to make a choice as to the truth or otherwise of a statement, normally in relation to a reading or listening text.

Example: These items accompany a listening text based on a conversation between two people about watching television. (YES/NO)
• Tony and Rachel both dislike watching cartoons.
• Tony and Rachel both prefer watching television alone.
• Rachel thinks her mother can afford to buy her a television.
• Tony has kept his promise about watching television at night.
• Rachel wants to be able to choose when she watches television.

The disadvantage of this type of item is that there is a simple 'yes/no' choice. Unfortunately, a candidate who relies on guessing still has a chance of achieving a reasonably high score. The simplicity of the action required from the candidate is only appropriate for the lowest-level reading tasks, but it is a suitable item type to choose for tests of listening. In tests of reading, the tendency to encourage guessing can be limited by adding another option, giving a choice of true/false/not given or correct/incorrect/not stated, but it may be more appropriate to use a multiple choice item type. The true/false/not given type is not appropriate in a test of listening because it causes a great deal of confusion. It is extremely difficult to establish that something is 'not given' unless you are in a position to review a text, and this is generally not possible in a listening test.

c) Gap-filling (cloze passage) with multiple choice options

A cloze test is one in which words are deleted from a text, creating gaps which the candidate has to fill, normally with either one or two words. Within this basic format, there are several variations:
• gaps may be created mechanically, e.g. by the deletion of every sixth or seventh word
• certain types of words may be deleted at irregular intervals throughout the text
b) True/false item

The true/false item is one in which test takers have to make a choice as to the truth or otherwise of a statement, normally in relation to a reading or listening text.

Example: These items accompany a listening text based on a conversation between two people about watching television.

YES/NO
1 Tony and Rachel both dislike watching cartoons.
2 Tony and Rachel both prefer watching television alone.
3 Rachel thinks her mother can afford to buy her a television.
4 Tony has kept his promise about watching television at night.
5 Rachel wants to be able to choose when she watches television.

The disadvantage of this type of item is that there is a simple 'yes/no' choice. Unfortunately, a candidate who relies on guessing still has a chance of achieving a reasonably high score (on ten such items, for example, blind guessing will produce five correct answers on average). The simplicity of the action required from the candidate is only appropriate for the lowest level reading tasks, but it is a suitable item type to choose for tests of listening. In tests of reading the tendency to encourage guessing can be limited by adding another option, giving a choice of true/false/not given or correct/incorrect/not stated, but it may be more appropriate to use a multiple choice item type. The true/false/not given type is not appropriate in a test of listening because it causes a great deal of confusion. It is extremely difficult to establish that something is 'not given' unless you are in a position to review a text, and this is generally not possible in a listening test.

c) Gap-filling (cloze passage) with multiple choice options

A cloze test is one in which words are deleted from a text, creating gaps which the candidate has to fill, normally with either one or two words. Within this basic format, there are several variations:
• gaps may be created mechanically, e.g. by the deletion of every sixth or seventh word (a scripted sketch of this procedure appears after the construction rules below)
• certain types of words may be deleted at irregular intervals throughout the text

The kind of cloze test illustrated here is accompanied by options from which to fill the gaps in the text. There is also a cloze test known as 'open', in which the candidate supplies the missing words. Open cloze tests are described under the heading of 'candidate-supplied response item types'. Multiple choice cloze tests are typically used to test reading, grammar or vocabulary.

Example:

THE LANGUAGES OF THE WORLD
Thousands of languages are spoken in the world today. Populations that (1) … similar cultures and live only a short distance (2) … may still speak languages that are quite distinct and not (3) … understood by neighbouring populations.

A share B belong C keep D own
A far B apart C divided D separated
A closely B freely C smoothly D readily
A wrapped B covered C drowned D filled

The gaps in the text shown above were created where the item writer chose an item to test, as opposed to creating gaps by the method of mechanical deletion of words at regular intervals. This type of cloze is therefore more properly referred to as 'selective deletion'. Because of the limits on choice imposed by the multiple choice format, this type of cloze is often used in tests of reading or of grammar and usage, in a section of the test where the focus of testing is on knowledge of vocabulary, a situation in which an open cloze would create the possibility of too many acceptable responses. It can also be used for testing knowledge of structure, although open cloze is also suitable for that purpose. The disadvantage of selective deletion gap filling is that the range of skills that can be tested by this method is very limited, and restricted to sentence level.

What are the basic rules for constructing this sort of cloze test?

• As with other types of multiple choice items, only one of the options must be correct, and the options should form a coherent set. One way of choosing distractors is to administer the test to some students as an open cloze (where no options are provided) and use some of the wrong responses as distractors.
• The first gap should not be placed too near the beginning of the passage, or subsequent gaps so close to each other that it becomes difficult to see which structure is being used. A reasonable assumption is that there should generally be between seven and twelve words between gaps.
• Deleting the first word in a sentence should be done infrequently, and deleting negatives avoided. It is also not advisable to delete words (usually adjectives or adverbs) which leave an acceptable sentence when omitted.
• Contractions, hyphenated words and any other form which may confuse candidates who have been directed to fill each gap with one word should not be deleted.
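Of the two deletion methods mentioned earlier, only mechanical, fixed-ratio deletion can sensibly be scripted; selective deletion depends on the item writer's judgment about which words carry the testing points. The sketch below, in Python, is a minimal illustration and not part of the original material; the function name, the deletion ratio and the reuse of the sample passage (with its gaps restored to their natural words) are my own choices.

```python
def make_cloze(text, n=7, start=10):
    """Create a fixed-ratio cloze: delete every n-th word, leaving the
    first `start` words intact so that candidates get some context
    before the first gap.  Returns the gapped text and the deleted
    words, which form the basis of the mark scheme."""
    words = text.split()
    key = []
    for i in range(start, len(words), n):
        key.append(words[i])
        words[i] = "(%d) ......" % len(key)
    return " ".join(words), key

passage = ("Thousands of languages are spoken in the world today. "
           "Populations that share similar cultures and live only a short "
           "distance apart may still speak languages that are quite "
           "distinct and not readily understood by neighbouring populations.")
gapped, answers = make_cloze(passage)
print(gapped)
print(answers)
```

Even this tiny example shows the trade-off: the mechanical method lands on whatever word the ratio dictates, which is exactly why the rules above treat selective deletion as a separate, judgment-driven procedure.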
d) Gap-filling with selection from bank

A similar sort of test to the cloze described above consists of a text with gaps, accompanied by a 'bank' containing all the correct words to insert in the text, with the addition of several which will not be used. This is suitable for use in elementary level tests of reading.

Example: Choose a suitable word from the list given above the passage for each of the gaps, and write your answer in the space on the answer sheet.

sun   late   paper   second   work   met   but   had   enjoyed   on   went   a   see

Carla's Weekend
On Saturday morning Carla …(1)… down to Bournemouth to …(2)… some friends. It was …(3)… beautiful day so they stayed …(4)… the beach all afternoon. Carla got up …(5)… on Sunday morning. She read the …(6)… and then remembered that she …(7)… arranged to meet her sister for lunch. They …(8)… a film in the evening.

e) Gap-filling at paragraph level

A further type of gap-filling task consists of a text with six paragraph-length gaps. A choice of seven paragraphs is given from which to fill the gaps. This is a test of reading skills at a relatively high level, involving a test of candidates' understanding of an extensive text as a whole and of its structural and narrative coherence.

f) Matching

There are a number of variations of matching tasks. What they all have in common is that elements from two separate lists or sets of options have to be brought together. At its simplest, matching is sometimes used in tests of structural competence, when the first halves of sentences are given in one set, and the correct second half of each sentence has to be selected from another set. A more extensive type of matching task presents the candidate with two sets of descriptions, one of which is usually of people who have some particular requirement, for example a certain type of holiday, a book or accommodation. The other set gives details of holidays, books, accommodation, etc., out of which there is only one which fits exactly the requirements of each person.

In the following example, brief descriptions of five people, each of whom wants to buy a book, are given. Next to these there are descriptions of eight books. The task involves choosing one book for each person. This kind of task is used in tests of reading.
1 Ali enjoys reading crime stories which are carefully written so that they hold his interest right to the end. He enjoys trying to guess who the criminal really is while he's reading.
2 Monica is a history teacher in London. She enjoys reading about the history of people in other parts of the world and how events changed their lives.
3 Silvia likes reading true stories which people have written about themselves. She's particularly interested in people who have had unusual or difficult lives.
4 Daniel is a computer salesman who spends a lot of time travelling abroad on planes. He enjoys detective stories which he can read easily as he gets interrupted a lot.
5 Takumi doesn't have much free time so he reads short stories which he can finish quickly. He likes reading stories about ordinary people and the things that happen to them in today's world.

A London Alive: This author of many famous novels has now turned to writing short stories with great success. The stories tell of Londoners' daily lives and happen in eighteen different places – for example, one story takes place at a table in a café, another in the back of a taxi and another in a hospital.
B Burnham's Great Days: Joseph Burnham is one of Britain's best-loved painters these days, but I was interested to read that during his lifetime it was not always so. Art historian Peter Harvey looks at how Burnham's work attracted interest at first but then became less popular.
C The Missing Photograph: Another story about the well-known policeman, Inspector Manning. It is written in the same simple but successful way as the other Manning stories – I found it a bit disappointing as I guessed who the criminal was halfway through!
D Gone West: A serious look at one of the least-known regions of the United States. The author describes the empty villages which thousands left when they were persuaded by the railway companies to go West in search of new lives. The author manages to provide many interesting details about their history.
E The Letter: The murder of a television star appears to be the work of thieves who are quickly caught. But they escape from prison and a young lawyer says she knows who the real criminals are. Written with intelligence, this story is so fast-moving that it demands the reader's full attention.
F Let me tell you …: The twenty stories in this collection describe the lives of different people who were born in London in 1825. Each story tells the life history of a different person. Although they are not true, they gave me a real feeling for what life used to be like for the ordinary person.
G The Last Journey: John Reynolds' final trip to the African Congo two years ago unfortunately ended in his death. For the first time since then, we hear about where he went and what happened to him from journalist Tim Holden, who has followed Reynolds' route.
H Free at Last!: Matthew Hunt, who spent half his life in jail for a crime he did not do, has written the moving story of his lengthy fight to be set free. Now out of prison, he has taken the advice of a judge to describe his experiences in a book.

g) Multiple matching

In a multiple matching exercise a number of questions or sentence completion items are set, which are generally based on a reading text. The responses are provided in the form of a bank of words or phrases, each of which can be used an unlimited number of times.

Example: These items follow a reading passage entitled 'The Gases Heating Up The Earth'.

What are the sources of the following gases?
1 carbon dioxide
2 CFCs
3 methane
4 nitrous oxide
5 ozone

A industry
B insects
C decay
D motor vehicles
E generating electricity from fossil fuels
F reaction with sunlight
G household products

The difference between this type of matching exercise and the previous example is the lack of restrictions on the number of times any of the options can be chosen. As options are not removed as the candidate works through the items, the task does not become progressively easier. This is an economical form of matching exercise, which can be used to test reading skills up to an advanced level.

h) Extra word error detection

In this type of task there is one extra, incorrect word in most of the lines of a text. Candidates have to identify it and write the word at the end of the line in the right-hand column; if there is no extra word in the line, a tick should be written there.

Example (the passage is printed line by line, with the answers for the first two lines, a tick and 'been', given as samples):

The Ski Shop
Three years ago I spent six months working in a ski shop. I had always      ✓
been enjoyed skiing, and so I thought it would be a good opportunity to     been
earn a little bit money and to practise on my favourite sport. I learnt a
lot while I was working there, even though it was hard work. I can now
tell of someone's skiing ability just by watching them carry a pair of
skis. Most people are usually agree that good skiers pick their skis up
with lots of confidence and carry them over their shoulder pointing
forwards.

This item type requires candidates to focus on their conscious knowledge of the way language structures work, and has a particular use in tests of structural competence. In certain contexts it is appealing since it may be highly situationally authentic. However, this item type also has disadvantages. It is difficult to construct items which represent plausible errors, or errors which could not (if this were a 'real-life' task) be corrected in more than one way. This example is at B2 level and the text needs to reflect the kind of text a student could produce at this level. The errors need to focus on common errors at this level, e.g. must to / must, said him / told him, married with / married to, bored / boring, the news is / are, etc.

There are many other variations on the multiple choice theme, including choosing paragraph headings from a list of options, completing sentences by choosing from a list of phrases, choosing a picture to go with a taped description and labelling a diagram by choosing from a list of options, but all share their essential characteristics with the examples given above.

III.2.2.2 CANDIDATE-SUPPLIED RESPONSE ITEM TYPES

Items for which the candidate has to provide the response come in a variety of types, from sentence completion or short-answer questions, where the candidate may have to supply as little as one word, a number or a short phrase, to types requiring longer and more complex operations. It may be claimed that, by comparison with selection types, they:
• are easier to write
• allow for a wider sample of content
• minimize the effect of guessing
• allow for creativity in language use
• measure higher as well as lower order skills
• have a more positive effect on classroom practice
• can provide a similar degree of marking objectivity as selection items

But these types of items also have their limitations. Even the most controlled types are difficult to construct in such a way that the required response is clearly indicated. There are often acceptable alternative responses rather than only one unambiguously correct response. This makes them time consuming and difficult to mark, often calling for examiner marking rather than clerical or computerized marking.
What these item types have in common is that, even in the most tightly controlled examples, candidates have to supply some language of their own, rather than choose among options supplied to them. Although they may appear in tests of reading, listening and structural competence, an element of writing is involved. In some cases this is done in the candidate's own language, but it is more likely to be a requirement of the test that the writing is in the target language. This raises the issue of spelling. The mark scheme must specify the policy towards spelling. If poor spelling is penalised, candidates for whom this is a particular problem will be unable to show their true level of language ability. If spelling is disregarded, there is the problem of deciding at which point a badly spelled version of a word ceases to be an acceptable representation of the word. Other concerns are accuracy, and the point at which a response which is not entirely correct may be considered acceptable or not. Problems of this kind make the process of assessment more subjective.

a) Short answer item

This item type consists of a question which can be answered in one word or a short phrase. The exact limits on the length of the answer should be specified. It is related to a text and generally used in tests of reading and listening.

Example: The text, which is in a test of listening, is a man asking for information about a train.

TRAIN
To: Newcastle
Day of journey: ……………
Train leaves at: ……………
Return ticket costs: ……………
Food on train: Drinks and ……………
Address of Travel Agency: 22 …………… Street

The correct response in this type of item is always one word or number, or a short phrase not more than three or four words long. The range of the number of words expected should be made clear to the candidate. The questions must be written so that they cannot be answered from general knowledge or common sense alone. In the above example spelling is tested explicitly in the last item: the street name (Mallet Street) is spelt out on tape (as it might be in real life) and the mark is awarded only if the candidate spells the word correctly on the answer sheet.

b) Sentence completion

In this kind of item part of a sentence is provided, and the candidate has to use information derived from a text to complete it. These items are used in tests of reading and listening.

Example: The same text as above (a man asking for information about a train) could be followed by completion items:

The man wants to travel on ………………
His train leaves at ……………
A return ticket costs …………………
On the train the man can buy drinks and …………………
The address of the Travel Agency is 22 ……………… Street

As a general rule, blanks should be placed near the end of a statement, so that the candidate is provided with enough context to respond to the item. When writing such items, it is important to ensure that there is either one unambiguous correct answer or a very limited number of acceptable answers which can be specified. The success of the marking process depends on this.

c) Open gap-filling (cloze)

The varieties of cloze test have been described in the section on multiple choice and other selection item types. In an open cloze, the gaps are selected by the item writer, who focuses on the particular structures to be tested. The candidate's task is to supply the word which fills each gap in the text.

Example:
A NEW CRUISE SHIP
One (1) … the biggest passenger ships in history, the Island Princess, carries people on cruises around the Caribbean. More than double (2) … weight of the Titanic (the large passenger ship which sank in 1912), it was (3) … large to be built in (4) … piece. Instead, forty-eight sections (5) … total were made in different places. The ship was then put together (6) … these sections at a shipbuilding yard in Italy.

Open cloze works well in tests of structural competence. Prepositions and parts of verb forms can be deleted, for example, and there is often only one possible correct answer. Knowledge of vocabulary is more easily tested by means of a multiple choice cloze, as there are frequently too many possible correct answers to make an open cloze practicable. Gaps should occur approximately every seven to ten words.

d) Transformation

In this type of item the candidate is given a sentence, followed by the opening words of another sentence which gives the same information, but expressed through a different grammatical structure. For example, the first sentence may be active, while the second must be written in the passive. The candidate has to complete the second sentence correctly. This item type is used in tests of structural competence or writing at sentence level.

Example:

Maria missed the ferry because her car broke down.
If ………………………………………………………

Please do not smoke in this area of the restaurant.
Customers are requested ………………………………

The crucial point to remember when writing this kind of item is that it will only work well if there is one clear correct answer (or a very limited selection of permissible variants). It is important to consider the number of testing points, and the acceptable answers for each. For example, in the first item given above, there are three testing points, and one mark is available for each. The mark scheme is as follows:

If Maria's / her car had not broken down, Maria / she would not have missed / failed to catch the ferry
OR
If Maria's / her car had not broken down, Maria / she would / could have caught / got / been able to catch the ferry

The mark scheme shows all possible acceptable responses. For the second part, there are many possibilities. One way to tighten the focus of transformation items is to put the gap in the middle of the sentence so that the correct response is controlled by the structures on either side. Limiting the number of words that can be used (in this case to three) further limits the range of possible correct answers.

Example:
She moved to New York in order to find singing work.
She moved to New York …………………… wanted to find singing work.
(Key: because of / so / since she)

e) Word formation

In this type of item one word is deleted from a sentence, and a related form of the word is given to the candidate as a prompt. For example, if the noun form 'singer' is required by the context of the sentence, the verbal form 'sing' is given as a prompt. The candidate has to supply the correct form of the word in its context. It is used in tests of structural competence or writing where there is a focus on testing knowledge of vocabulary.

There were over fifty ……………… in the orchestra. (MUSIC)
My parents ……………… me to learn to drive. (COURAGE)

When writing items of this type it is important that the sentence gives an economical and unambiguous context to the target word. It tends to look more natural to put in a proper name rather than to use 'he' or 'she' all the time. If the word formation items are set within a continuous text, more contextual support is given to the candidate than in the discrete items in the above examples. In the example below, the additional context allows the testing of a negative form in (2).
In Holland, people were so desperate to own tulip bulbs that their (1) ………   BEHAVE
became quite extraordinary. It was not (2) ……… for people to sell            COMMON
all their (3) ……… in order to buy a single tulip bulb. The situation         POSSESS
became so serious that laws were passed with the (4) ……… of                  INTEND
controlling this trade in tulips.

f) Transformation cloze

As well as discrete items of the type shown above, the transformation item type can be used in a continuous text, creating a task which may be considered a variety of cloze procedure. It consists of a text with a word missing in each line, and a different grammatical form of the required word supplied. The candidate has both to find the location of the missing word and to supply it in its correct form. This kind of item can be used in tests of structural competence or writing at sentence level.

g) Note expansion

In this item type the lexical components of each sentence are supplied in a reduced form which resembles notes. The candidate's task is to supply the correct grammatical form, including changes in word order and the addition of such elements as prepositions, articles and auxiliary verbs. A marker indicates each point in the reduced sentence where an addition or change should be made. This type of task is most likely to appear in a test of structural competence or writing.

Example: The following notes must be expanded to produce a letter written in reply to an invitation.

Dear Mr Harris
a) I be very pleased / meet you / teachers' conference / London last year
b) It be kind / you / invite me / come and see you while I be / England / this summer
c) I hope / pay a visit / your school / 26th and 27th June if / not be inconvenient
d) Please / not rearrange / programme / me
e) I be very happy / fit in / whatever you / at that time
f) I like / stay overnight / 26th June and hope / arrange accommodation / me
g) I telephone you once / reach London / confirm / exact time / arrival / school
h) I look forward / meet / again
Yours sincerely

The comments made above on transformation items also apply here. The item writer must be clear about what the testing points are, and must construct a mark scheme as part of the item writing task. The above text contains forty specific testing points, each of which carries one mark. The mark scheme for the first sentence is as follows:

I was very pleased (1 mark) to meet you (1 mark) during / (while we were) at (1 mark) the (1 mark) teachers' conference (which was held) in London last year (1 mark)

A disadvantage of the item type is that it necessitates a rather complicated mark scheme, and is difficult to mark accurately.
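Where a mark scheme consists of a small set of acceptable variants, as in the transformation and note-expansion examples above, it can also be written down in machine-readable form, which is one way of keeping clerical marking consistent. The Python sketch below is illustrative only: the data structure and function names are invented, it encodes two of the three testing points of the first transformation item, and real responses would need normalising for spelling and punctuation before matching.

```python
import re

# Each testing point is a regular expression listing its acceptable
# realisations, taken from the mark scheme for "Maria missed the
# ferry ... / If ..." above.
MARK_SCHEME = [
    r"(maria's|her) car had not broken down",
    r"(maria|she) (would not have (missed|failed to catch)"
    r"|(would|could) have (caught|got|been able to catch)) the ferry",
]

def score(response):
    """Award one mark for each testing point matched in the response."""
    text = re.sub(r"\s+", " ", response.lower()).strip(" .")
    return sum(1 for point in MARK_SCHEME if re.search(point, text))

answer = "If her car had not broken down, she would have caught the ferry."
print(score(answer))   # 2
```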
h) Error correction / proof reading

This task type consists of a text in which a word appears in an incorrect form in each numbered line. The candidate has first to identify the incorrect word, and then write it in its correct form at the end of the line. A simpler variation on this has the incorrect word already marked, so that the candidate has only to supply the corrected form. This is a tightly controlled type of candidate-supplied response item, most often seen in tests of structural competence. In this example either the line is correct or there is an error, either of spelling or punctuation (the corrections for the first two faulty lines, 'ancestors' and 'TV, go', are shown in the right-hand column as samples):

What colour can do for you
Today, colour is a dazzling background to our lives in a way that our
ancesters can only have dreamed about. We take colour pictures of        ancestors
our holidays, watch colour TV go shopping in supermarkets which          TV, go
vibrate with colour and we have colour printers attached to our home
computers. We worry about the right colours for decorating the house
and we have favourites and pet hates where clothes are conserned.

In this task there is only one correct answer. It is important that the item writer knows the range of types of incorrect words to be used.

i) Information transfer

Tasks described in this way always involve taking information given in a certain form and presenting it in a different form. For example, facts about a piece of equipment may be taken from a text and used in labelling a diagram of that equipment, or figures given in a table or graph may be presented in the form of paragraphs of text. This item type tests skills involved in writing and structural competence.

Example: The input is an email written by a sales representative to his boss. The task consists of re-writing it as a report for Head Office. The 'transfer' referred to here concerns register: from email (informal) to report (formal).

Email
John
Here's my impression of this year's sales conference. We were all very happy about where it was held – no problems getting to the centre. When we got there, the manager greeted us warmly and everything connected with the dinner that evening was pretty good. Unfortunately, we had problems the next day – it was incredibly hot and stuffy in the conference room so we felt very uncomfortable. Then they told us at lunch that there were lots of things on the menu we couldn't have but they didn't tell us why.

Report
The (1) ……… itself was found to be very satisfactory and (2) ……… to the centre was very straightforward. In addition, on (3) ……… we were given a very warm (4) ……… by the manager, and the dinner arrangements were perfectly acceptable. However, the (5) ……… in the conference room was extremely poor, causing great discomfort to all the delegates. As for lunch, many of the dishes were (6) ……… and no (7) ……… was given.

As with selection items, there are additional item types, including longer answers to questions and transfer of information involving maps, diagrams, graphs, etc., but the examples given should give an idea of the range of item types available which involve the candidate in supplying the language used in the response.

III EVALUATING THE TEST

The previous parts in this material have discussed how to construct and administer examinations of subskills and communication skills. But one thing more is needed: how to tell whether or not we have been successful, that is, have we produced a good test? Why is this important?
For one thing, good evaluation of our tests can help us measure student skills more accurately. It also shows that we are concerned about those we teach. For example, test analysis can help us remove weak items even before we record the results of the test. This way we don't penalize students because of bad test questions. Students appreciate an extra effort like this, which shows that we are concerned about the quality of our exams. And a better feeling toward our tests can improve class attitude, motivation, and even student performance.

Some insight comes almost intuitively. We feel good about a test if advanced students seem to score high and slower students tend to score low. Sometimes students provide helpful "feedback," mentioning bad questions, as well as questions on material not previously covered in class, and unfamiliar types of test questions.

Besides being on the right level and covering material that has been discussed in class, good tests are also valid and reliable. A valid test is one that in fact measures what it claims to be measuring. A listening test with written multiple-choice options may lack validity if the printed choices are so difficult to read that the exam actually measures reading comprehension as much as it does listening comprehension. It is least valid for students who are much better at listening than at reading. Similarly, a reading test will lack validity if success on the exam depends on information not provided in the passage, for example, familiarity with British or American culture.

A reliable test is one that produces essentially the same results consistently on different occasions when the conditions of the test remain the same. We noted in the previous part, for example, that teachers' grading of essays often lacks consistency or "reliability" since so many matters are being evaluated simultaneously. In defining reliability in this paragraph, we referred to consistent results when the conditions of the test remain the same. For example, for consistent results, we would expect the same amount of time to be allowed on each test administration. When a listening test is being administered, we need to make sure that the room is equally free of distracting noises on each occasion. If a guided oral interview were being administered on two occasions, reliability would probably be hampered if the teacher on the first occasion were warm and supportive and the teacher on the second occasion abrupt and unfriendly.

In addition to validity and reliability, we should also be concerned about the affect of our test, particularly the extent to which our test may cause undue anxiety. Negative affect can be caused by a recording or reading, for example, that is far too difficult, or by an unfamiliar examination task, such as translation if this has not been used in class or on other school exams. There are differences, too, in how students respond to various forms of tests. Where possible, one should utilize test forms that minimize the tension and stress generated by our English language tests.

Besides being concerned about these general matters of validity, reliability, and affect, there are ways that we can improve our tests by taking time to evaluate individual items. While many teachers are too busy to evaluate each item in every test that they give, at least major class tests should be carefully evaluated. The following sections describe how this can be done.

a) Preparing an Item Analysis

Selection of appropriate language items is not enough by itself to ensure a good test.
Each question needs to function properly; otherwise, it can weaken the exam. Fortunately, there are some rather simple statistical ways of checking individual items. This procedure is called "item analysis." It is most often used with multiple-choice questions. An item analysis tells us basically three things: how difficult each item is, whether or not the question "discriminates" or tells the difference between high and low students, and which distractors are working as they should. An analysis like this is used with any important exam, for example, review tests and tests given at the end of a school term or course.

To prepare for the item analysis, first score all of the tests. Then arrange them in order from the one with the highest score to the one with the lowest. Next, divide the papers into three equal groups: those with the highest scores in one stack and the lowest in another. (The classical procedure is to choose the top 27 percent and the bottom 27 percent of the papers for analysis. But since the classes are usually fairly small, dividing the papers into thirds gives us essentially the same results and allows us to use a few more papers in the analysis.) The middle group can be put aside for a while.

You are now ready to record student responses. This can be done on lined paper as follows:

Item #1          High Group        Low Group
A
B
C
D
X (no answer)

Circle the letter of the correct answer. Then take the High Group papers, and start with question number one. Put a mark by the letter that each person chose, and do this for each question on the test. Then do the same in the "Low Group" column for those in the bottom group.

b) Difficulty Level

You are now ready to find the level of difficulty for each question. This is simply the percentage of students (high and low combined) who got each question right. To get the level of difficulty, follow these steps: (1) Add up the number of high students with the correct answer (to question number one, for example). (2) Then add up the number of low students with the correct answer. (3) Add the sums found in steps 1 and 2 together. (4) Now divide this figure by the total number of test papers in the high and low groups combined. A formula for this would be:

(High Correct + Low Correct) / Total Number in Sample, or (Hc + Lc) / N

An example will illustrate how to do this. Let's assume that 30 students took the test. We correct the tests and arrange them in order from high to low. Then we divide them into three stacks. We would have 10 in the high group and 10 in the low group. We set the middle 10 aside. The total number (N) in the sample is therefore 20. We now mark on the sheet how many high students selected A, B, C, or D, and how many low students marked these choices. (If the item is left blank by anyone, we mark the "X" line.)

In the tally for item 1, "B" is the right answer for this question. We see that 5 in the high group and 2 in the low group got item number 1 correct:

(5 + 2) / 20 = 7 / 20 = 35%

Thus, 7 of the 20 in the sample, or 35 percent, answered this item correctly. Now we can see if the item is too easy, too difficult, or "about right." Generally, a test question is considered too easy if more than 90 percent get it right. An item is considered too difficult if fewer than 30 percent get it right. (You can see why by noting that a person might get 25 percent on a four-option test just by guessing.)
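The four steps above reduce to a one-line calculation once the tally counts are in hand. The Python sketch below is simply an illustration of that arithmetic (the function and variable names are invented); it reproduces the worked example of 5 correct in the high group and 2 in the low group out of a 20-paper sample.

```python
def difficulty(high_correct, low_correct, n):
    """Facility value as a percentage: (Hc + Lc) / N * 100."""
    return 100 * (high_correct + low_correct) / n

d = difficulty(high_correct=5, low_correct=2, n=20)
print(f"{d:.0f}% answered correctly")   # 35% answered correctly
print("too easy" if d > 90 else "too difficult" if d < 30 else "acceptable")
```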
Referring to the example, we find that item 1 is acceptable. However, it would be best to rewrite much of the test if too many items were in the 30s and 40s. If you plan to use your test again with another class, don't use items that are too difficult or too easy. Rewrite them or discard them. Two or three very easy items can be placed at the beginning of the test to encourage students. Questions should also be arranged from easy to difficult. Not only is this good psychology, but it also helps those who don't have a chance to finish the test; at least they have a chance to try those items that they are most likely to get right. It is obvious that our sample item would come near the end of the test, since only about a third of the students got it right.

Before leaving this discussion of item difficulty, we need to point out that on many language tests (a grammar exam, for instance), it is not completely accurate to think of very difficult and very easy items as "weak" questions. "Difficult" items may simply be grammar points that you have not spent enough class time on or that you have not presented clearly enough. Adjusting your instruction could result in an appropriate level of difficulty for the item. And an easy item simply points up that almost all students in the class have mastered that grammar point. In short, this part of the analysis provides insight into our instruction as well as evaluating the test items themselves.

c) Discrimination Level

You can use the same high and low group tally from the previous section to check each item's level of discrimination (that is, how well it differentiates between those with more advanced language skill and those with less skill). Follow these steps to calculate item discrimination: (1) Again find the number in the top group who got the item right. (2) Find the number in the bottom group who got it right. (3) Then subtract the number getting it right in the low group from the number getting it right in the high group. (4) Divide this figure by the total number of papers in the high and low groups combined. A formula for this would be:

(High Correct - Low Correct) / Total Number in Sample, or (Hc - Lc) / N

Returning to sample item 1, note that choice "B" is the correct answer. So subtract the 2 persons in the low group getting the item right from the 5 in the high group getting it right. This leaves 3. Dividing 3 by 20, the number of highs plus lows, you get 0.15, or in other words, 15 percent. Generally it is felt that 10 percent discrimination or less is not acceptable, while 15 percent or higher is acceptable. Between 10 and 15 percent is marginal or questionable. Applying this standard to the sample item, we see that it has acceptable discrimination.
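The discrimination calculation differs from the difficulty one only in the sign between the two counts, so the same tally serves both. A sketch in the same illustrative spirit (names invented), applying the thresholds just given to sample item 1:

```python
def discrimination(high_correct, low_correct, n):
    """Discrimination index: (Hc - Lc) / N."""
    return (high_correct - low_correct) / n

d = discrimination(high_correct=5, low_correct=2, n=20)   # sample item 1
print(d)   # 0.15
if d >= 0.15:
    print("acceptable")
elif d > 0.10:
    print("marginal or questionable")
else:
    print("not acceptable")
```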
There is one caution in applying discrimination measures to our language tests. When doing an item analysis of rather easy and rather difficult questions, be careful not to judge the items too harshly. For example, when almost 90 percent get an item right, this means that nearly all low students as well as high students have marked the same (correct) option. As a result, there is little opportunity for a difference to show up between the high and low groups. In other words, discrimination is automatically low. Also be careful when evaluating very small classes, for example, those with only 20 or 25 students. This is especially true if students have been grouped according to ability. You can't expect much discrimination on a test if all the students are performing at about the same level. But if you have a number of high and low students, the discrimination figure is very helpful in telling how effective the item is.

When you find items that do not discriminate well or that are too easy or too difficult, you need to look at the language of the question to find the cause. Sometimes you will find negative discrimination: more low students getting a question right than high students. Occasionally even useless items like this can be revised and made acceptable. For example, an evaluation of one overseas test found a question with unacceptable discrimination. Most of the high group thought that the phrase "to various lands and peoples" was wrong; they had learned that "people" did not take the "s" plural, and they did not know this rare correct form. Simply changing this part of the test question resulted in a satisfactory item.

d) Distractor Evaluation

Weak distractors, as we have just seen, often cause test questions to have poor discrimination or an undesirable level of difficulty. No set percentage of responses has been agreed upon, but examiners usually feel uneasy about a distractor that isn't chosen by at least one or two examinees in a sample of 20 to 30 test papers. But sometimes it does happen that only one or two distractors attract attention. There are three common causes for this: (1) Sometimes an item is included that was drilled heavily in class – an item that almost everyone has mastered. Therefore, the answer is obvious; the distractors cannot "distract." (2) Sometimes a well-recognized pair is used (this/these, is/are, etc.). Even though not everyone has control of these yet, students know that one of the two is the right answer; no other choice seems likely. Here we need to choose another test format. (3) A third cause is the use of obviously impossible distractors: ("Did he do the work?" / *A Yes, he did. B Birds eat worms. C Trains can't fly.)

The tally of student answers also shows how many people skipped each item. Sometimes many questions are left blank near the end of the test. In this case you will need to shorten the test or allow more time for it.

Tasks

To do tasks 1 to 5 below, do an item analysis on these four multiple-choice questions. There were 27 students in the class, and therefore 9 test papers in each group. (Note in the tallying that "||" means 2, and that "||||" with a line through it means 5, etc.)

1 Calculate the level of difficulty for each of the four items. Which of these are too difficult, and which are too easy? Submit your calculations with your answer.
2 Calculate the discrimination of each item. Which has the poorest discrimination? Which have unsatisfactory discrimination? Which have borderline discrimination? Submit calculations.
3 Look at the distractors in the four items. In which are they the most effective? In which are they the least effective?
4 Do we have any item with negative discrimination? If so, which one?
5 Which item did the fewest students leave blank? Which item did the most leave blank?
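For the tasks above, and for any real class test, the three checks of this section can be pulled together in a few lines. The Python sketch below is again only an illustration: the responses are invented data for a single four-option item (they are not the tally for the four questions referred to in the tasks), and the function name is my own.

```python
from collections import Counter

def analyse(key, high_answers, low_answers):
    """Item analysis for one multiple-choice item.  The answer lists
    hold the option each student chose ('X' for a blank)."""
    n = len(high_answers) + len(low_answers)
    hc = high_answers.count(key)
    lc = low_answers.count(key)
    return {
        "difficulty_%": 100 * (hc + lc) / n,
        "discrimination": (hc - lc) / n,
        "tally": Counter(high_answers) + Counter(low_answers),
    }

high = ["B", "B", "B", "A", "B", "C", "B", "A", "X", "B"]   # high group
low  = ["A", "C", "B", "A", "D", "B", "X", "A", "C", "A"]   # low group
print(analyse("B", high, low))
# A distractor that almost nobody chooses (compare the count for 'D'
# in the tally) may be too obviously wrong to do its job.
```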