Are you assessment literate? What does that mean – to be assessment literate? Assessment is something that we as teachers must do all the time, but many of us feel unprepared or uncomfortable when it comes to testing our students. Teachers often reuse tests without analyzing or revising them and seldom use statistical procedures to see how a test – or a test item – is actually performing. Assessing students often means reaching for a test or quiz that is already prepared, whether it be a test included with a textbook, something another teacher prepared, or a standardized test produced by a major testing organization or our institution. These are not necessarily bad choices (and sometimes it may not be our choice at all), but to make sure they are good choices, we must be knowledgeable about the principles and practices of assessment.
TABLE OF CONTENTS
Abstract
Table of contents
Part I – Introduction
1 Testing vs assessing
2 Importance of assessment
Part 2 – Development
1 Literature review
1.1 Assessment literacy
1.2 Key concepts and considerations
1.2.1 Usefulness and purpose
1.2.2 Reliability
1.2.3 Validity
1.2.4 Practicality
1.2.5 Washback
1.2.6 Authenticity
1.2.7 Transparency
2 Planning your assessments
2.1 Creating an exam blueprint
2.2 Providing feedback
2.3 Creating an exam file
2.3.1 Setting: Task and administration
2.3.2 Demands: Task
2.4 Basic statistics for testing
Part 3 – Conclusion
References
PART I - INTRODUCTION
In order for assessment to be effective, classroom teachers need to be assessment literate – knowledgeable about the key concepts of testing and how they can inform the design of assessments and decisions surrounding their usage. This paper will start teachers on the path to being assessment literate. They will learn about the terms that make up the cornerstones of testing, how to plan their courses with assessments in mind, and how to make a test blueprint. Knowing more about assessment will not only help teachers to assess their students more effectively, but it will also provide them with a means of evaluating their own teaching and help them to produce tests that will actually motivate their students to learn. Let’s begin by learning more about the words testing and assessment.
1 Testing vs assessing
The word test can make people nervous. It has semantic qualities that make us think of being judged or measured by someone or something. Many people have an emotive reaction to testing and associate it with negative experiences that they may have had as students. In an educational context, the terms testing and assessment are often used interchangeably to indicate the measurement of student learning. However, although a test is a type of assessment – usually thought of in the traditional sense of an exam or quiz – assessment is a more comprehensive term. It often indicates the collection of information about student learning that might include not only tests but also a variety of techniques such as performance tasks, portfolios, and observation.
While tests are thought of as a means to give grades to students, assessments offer diagnostic information for both students and teachers. The ultimate purpose of assessment is to improve student learning, as opposed to just being able to give a mark for the amount of course content a student has mastered. Today teachers tend to talk about assessing (rather than testing) their students because we see the ongoing evaluation of student learning as more than just testing knowledge and skills in a particular area at one point in time for grading purposes. Thus, throughout this paper, references to tests will be made with the ultimate goal of using them as assessment tools and not purely as testing instruments.
2 Importance of assessment
As we all know, assessment plays an important role in teaching and learning. It affects decisions related to instruction, determines the extent to which instructional objectives are met, and provides information for administrative decisions. It has been estimated that teachers spend as much as 50 percent of their time in assessment-related activities (Stiggins 1991), and that when assessment is implemented effectively, student achievement is improved (Campbell and Collins 2007).
Yet many teachers feel assessment and testing are not relevant to their classroom practice and report that they feel unprepared to undertake assessment-related activities. Popham (2004) reports that most public school educators in the United States tend to think of assessment as “a complex, quantitative arena well beyond the comprehension of mere mortals” (82). Some of these feelings may come from the anxiety that teachers felt when they didn’t understand how tests were graded or when the objectives of the tests were not clear.
Teacher-education programs are also at fault for not making sure teachers are adequately trained before entering the classroom (Mertler 2004). As Taylor (2009) points out,
language education programs at graduate level typically devote little time or attention to assessment theory and practice, perhaps just a short (often optional) module; and although there is no shortage of books on language testing and assessment available today, many of these are perceived to be (and often are) highly technical or too specialized for language educators seeking to understand basic principles and practice in assessment (23).
During our time in school and teacher-training courses, we take many tests, but how often are we actually given practice creating them, marking them, and interpreting the results? Developing these skills is part of becoming assessment literate.
1.2 Key concepts and considerations
Seven key concepts – usefulness, reliability, validity, practicality, washback, authenticity, and transparency – are cornerstones in testing that help to ensure that a test is solid (i.e., that it will consistently measure what you want it to measure in an efficient manner, and that both teacher and student will see it as a valuable source of information regarding learning). Understanding these concepts and being able to improve practices related to them are important in developing assessment literacy. Each is discussed separately below, but as you will notice, they are connected to and support one another; together, they form the basis for building solid assessments.
1.2.1 Usefulness and purpose
According to Bachman and Palmer (1996), usefulness is the most important consideration when choosing or designing a test. Teachers must consider what the purpose of a particular assessment is and whether this purpose is congruent with the students they are testing and the course they are teaching. All language tests must be developed with a specific purpose, a particular group of test takers, and a specific language use in mind. Even tests of general English language ability (proficiency) are designed with a specific group of test takers in mind. Take, for example, three standardized tests used globally for the purpose of measuring language ability: the Test of English as a Foreign Language (TOEFL), the International English Language Testing System (IELTS), and the Michigan English Test (MET). Each of these has been developed with very specific audiences and purposes (see Figure 1).
The examples in Figure 1 illustrate that tests are designed with very specific audiences and purposes in mind. This specificity is what allows them to effectively measure what they are designed to measure and makes them useful for a specific purpose. You must carefully consider the purpose of a test before administering it. If you choose a pre-made test and it does not match your students’ needs or your purpose, then it will not be an adequate assessment of your students and will not provide the information that you need in order to make informed decisions about the teaching and learning taking place in the classroom.
Test     Information about the Purpose
TOEFL    Measures the ability “to use and understand English at the university level,” and evaluates how well the test taker can “combine listening, reading, speaking and writing skills to perform academic tasks” (Educational Testing Service 2014).
IELTS    Has an academic and a general-training version. The academic version is for those who want to study in an English-speaking university; the general version focuses on basic “survival skills in broad social and workplace contexts” (IELTS 2013).
MET      “Intended for adults and adolescents at or above a secondary level of education who want to evaluate their general English language proficiency” in social, educational, and workplace contexts. It is “not an admissions test for students applying to universities and colleges in the United States, Canada, and the United Kingdom” (Modern Language Center 2010).
Figure 1. Examples of standardized proficiency tests and their purposes
For example, if you wanted to measure the reading ability of your students to see if they would be able to order from a menu when visiting the United States on an exchange trip, you couldn’t just use any reading test you find in a textbook or online. You would need to find one (or better yet, make one) that is specific to the skills taught in class, that meets the vocabulary needs of the situation the students would be immersed in, and that uses an appropriate text style that matches what you expect the students to encounter. Having them read a passage from a newspaper or a short story and then answer questions would not adequately measure their ability to read and order from a menu at a restaurant. So when you choose or design a test, consider the purpose of the test, the group of test takers it is designed for, and the specific language use you want to evaluate.
1.2.2 Reliability
Your assessments not only need to be useful for the intended purpose, they also need to be reliable. Reliability refers to the consistency of test scores. If you were to test a student more than once using the same test, the results should be the same, assuming that nothing else had changed. Reliability can be threatened by fluctuations in the learner, in scoring, or in test administration. Fluctuations in the learner are out of the testing administrator’s control; we cannot control whether a student is sick, tired, or under emotional stress at the time of a test. But we as teachers can limit the fluctuations in scoring and test administration. The guidelines for how a test is administered, the length of time allotted to complete the test, and the conditions for testing should be established in advance and written in a test-specifications document. (See the Validity section for an example of test specifications.) As much as possible, there should be consistency in testing conditions and in how a test is administered each time it is given. Teachers can minimize fluctuations in scoring by preparing answer keys and scoring rubrics, and by holding norming sessions with those who will be scoring the test.
You can take steps to improve the reliability of your tests. You need to make sure that the test is long enough to sample the content that students are being tested on and that there is enough time for most of the students to finish taking the test. The items should not be too easy or too difficult, the questions should not be tricky or ambiguous, the directions should be clear, and the score range should be wide. Before you administer the test, you might want to have someone else take it to see whether he or she encounters problems with directions or content. Use that person’s feedback to see where the test might need to be improved.
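One common way to put a number on the consistency of scores is to give the same test twice to the same students and correlate the two sets of results (often called test-retest reliability). The sketch below is only an illustration under stated assumptions: the function name pearson_r and the two lists of scores are hypothetical and do not come from any test discussed in this paper. A coefficient close to 1 suggests that students are ranked in much the same way on both sittings.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between two equal-length lists of scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(first)
    mean_a = sum(first) / n
    mean_b = sum(second) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(first, second))
    var_a = sum((a - mean_a) ** 2 for a in first)
    var_b = sum((b - mean_b) ** 2 for b in second)
    if var_a == 0 or var_b == 0:
        raise ValueError("scores must vary within each sitting")
    return cov / sqrt(var_a * var_b)


# Hypothetical scores for the same five students on two sittings of one test.
first_sitting = [62, 75, 81, 90, 55]
second_sitting = [65, 73, 84, 88, 58]

r = pearson_r(first_sitting, second_sitting)
print(f"Test-retest correlation: {r:.2f}")
```

With a class-sized group, a coefficient like this is a rough indicator rather than a precise measurement, so it is best read alongside the practical checks described above (clear directions, adequate length, agreed answer keys).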
1.2.3 Validity
One thing to keep in mind is that a test may be highly reliable, but not valid. That is, it might produce similar scores consistently, but that does not mean it is measuring what you would like it to. A test has validity when it measures what you want it to measure. The most important aspect of validity is the appropriateness for the context and the audience of the test. Think about what is to be gained by administering a test and how the information will be used. Suppose your goal is to measure students’ listening ability, and you give a test in which students answer questions in written format about a lecture they hear. In that case, you need to make sure the vocabulary, sentence structure, and grammar usage in the written questions are not beyond the level of the students. Otherwise, you will be testing them on more than just their listening comprehension skills and thus decreasing the validity of the test as a measure for listening ability.
A number of factors can have an adverse effect on validity, including the following:
• Unclear directions
• Test items that ask students to perform at a skill level that is not part of the course objectives
• Test items that are poorly written
• Test length that does not allow for adequate sampling or coverage of content
• Complexity and subjectivity of scoring that may inaccurately rank some students
The best way to ensure validity and reliability is to create test specifications and exam blueprints. These will help ensure that tests created and used match what is intended for the course and the students. Figure 2 shows an example of general information for the test specifications of a final exam for a higher-education pre-academic English-language program course. For each of the subtests (listening, reading, and writing), specifications would also be written and would include the types of skills to be assessed, level of vocabulary, grammar structure, and length of text to be used.
1.2.4 Practicality
Practicality refers to how “teacher friendly” a given test is. Practicality issues include the cost of test development and maintenance, time needed to administer and mark the test, ease of marking, availability of suitably trained markers, and administration logistics. If the test you want to give requires computers, and these are not available or connectivity is unreliable, there will obviously be a practicality issue with the delivery of the test. For many teachers, the amount of time required to mark a test is an important practicality issue. You can overcome this issue by weighing how important a particular assessment is in terms of overall course mark and determining how much time you want to spend marking it. For example, if a vocabulary quiz will not be worth much in the overall course mark, you might consider having students exchange papers and mark them instead of marking each one yourself. This arrangement also allows students to review the materials at the same time. For marking writing, it might be more practical to have students review each other’s work and peer edit the first draft than to have the teacher make comments on the initial draft.
1.2.5 Washback
Washback refers to the effects of testing on students, teachers, and the overall program. It can be positive or negative. Positive washback occurs most often when testing and curriculum design are based on clear course outcomes that are known to all students and teachers. On the other hand, exams that require extensive preparation can have negative washback and be harmful to the teaching and learning process; if instruction solely focuses on helping students pass the test, other learning activities may be neglected. To make sure washback is positive, teachers should link teaching and testing to instructional objectives. Tests should reflect the goals and objectives of the course along with the types of activities used to teach the content. That underscores the importance of planning assessments at the same time you plan the course.
Another way to bring about positive washback is through feedback. Providing feedback in a timely manner is important if you want students to learn and benefit from the assessment process. In the above example of practicality, having students
General Test Information – Final Exam

Purpose (Why are you

(How important is the test for the course grade?)
  High stakes – weighted as 40% of the final grade

Response format (What type of questions will you use? How will the test taker show mastery of the objective?)
  Listening: multiple choice, short answer, matching, gap fill, and information transfer
  Reading: multiple choice, short answer, matching, gap fill, and information transfer
  Writing: one-paragraph response to a prompt (input is a picture or personal knowledge)

Number of examiners (How many people are needed to administer the test? Are there any restrictions for test supervisors?)
  One test supervisor per 20 students; two markers per exam (cannot be the class teacher)

Number and weighting of items/tasks (How many questions will there be on each part? How much will each part be worth for the overall grade of the test?)
  Listening: approximately 20 items (33%)
  Reading: approximately 20 items (33%)
  Writing: 1 task (34%)

Examination length (How much time will the assessment take overall? Is there a time length per section?)
  Maximum of 2 hours
  Listening: 30 minutes
  Reading: 40 minutes
  Writing: 40 minutes

Order of tasks (In what order will the sections be tested?)
  1 Listening
  2 Reading
  3 Writing

Rating scale type (Conditions necessary for marking the exam)
  Reading and listening: Answer key agreed to before the test
  Writing: Two markers, analytical criteria, third marker if necessary