AGI Item Writing / Development Workshop Handout

TEST DEVELOPMENT: Item Writing / Development Workshop
Prof. dr. Eduardo Cascallar
Assessment Group International (USA/Belgium)
Katholieke Universiteit Leuven (Belgium)

Item writing is one of the main test development activities. But a test for what? That question is answered by the exam development process.

EXAM DEVELOPMENT PROCESS

The process applies to both new and existing exams and involves the examination committee: the exam series coordinator, program faculty, and subject matter experts. Its main stages, covered in turn below, are: develop test plan, item writing, item reviews, item approval, item selection and form creation, field testing, operational use, and analysis and calibration.

Develop Test Plan
- New exams: general description; structure and dimensions; purpose and uses; item characteristics (format, content); technical standards (basis of scoring); content outline; proposed references.
- Revised exams: review the content/curriculum outline; review textbooks; review current exam forms.

Item Writing
- Identify needed items using the content outline, content gaps identified by the committee, and item analysis results.
- Example content outline:

  Content Area                      Pct
  Foundations of Geology            30%
  Geological Processes              10%
  Types of Geological Formations    40%
  Special Topics                    20%

- Recruit writers, prepare an item writer manual, make assignments, and receive and track items.

Item Reviews
- Surface corrections
- Committee review
- Copy and style editing
- Review for sensitivity and bias
- Key validation
- External review

Item Approval
- New exam: committee endorsement, then eligible status.
- Revised exam: committee endorsement and psychometric endorsement, then eligible status.

Item Selection and Form Creation
- Test plan dimensions: content; cognitive process.
- Psychometric criteria: item information, item difficulty, item position, item discrimination.

Form Creation
- Assemble the proposed form
- Edit for clues and overlaps
- Address issues of sensitivity and balance
- Replace and reread

Field Testing
- Pilot test or pretest status (embedded)
- Revise study materials to reflect new content

Operational Use and Analysis & Calibration
- Monitor item performance
- Flag items for review
- Respond to examinee queries
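Because items have to be received, tracked, and carried through review, approval, and field testing, it can help to keep each draft item together with its metadata. The following is a minimal, hypothetical Python sketch of such an item record; the field names, status values, and the sample item are invented for illustration and are not taken from the workshop materials.

```python
from dataclasses import dataclass, field

@dataclass
class MCItem:
    """Hypothetical record for tracking a multiple-choice item through development."""
    item_id: str
    stem: str
    options: list[str]                 # all answer choices, including the key
    key: int                           # index into options of the correct choice
    content_area: str                  # e.g., an area from the content outline
    cognitive_level: str               # e.g., "Recall", "Application"
    rationale: str = ""                # reason for the key, with full citations
    references: list[str] = field(default_factory=list)
    status: str = "draft"              # e.g., draft -> reviewed -> field-tested -> eligible

# Example: one draft item entering committee review (content invented for illustration).
item = MCItem(
    item_id="ITEM-0001",
    stem="Trees that keep their functional leaves throughout the year are:",
    options=["Auxins", "Deciduous", "Evergreen", "Inhibitors"],
    key=2,
    content_area="Plant biology",
    cognitive_level="Recall",
    rationale="Evergreen is the standard term; see course text, ch. 3.",
)
print(item.status, item.options[item.key])
```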
When to Use MC Tests
- To assess mastery of knowledge: recall of factual knowledge; procedural knowledge.
- To assess some reasoning skills, e.g., compare/contrast, inference, cause and effect.

Importance of Writing Good MC Items
- A study found that more than 33% of items in classroom tests are flawed (unfocused item stems, use of "none of the above" and "all of the above", and negative wording).
- Flawed items caused nearly 25% more students to fail than unflawed items.
- The increased test and item difficulty associated with flawed items is an example of construct-irrelevant variance, because poorly crafted test questions add artificial difficulty to the test scores.
- This variance interferes with the accurate and meaningful interpretation of test scores and negatively affects students' passing rates, particularly for passing scores at or just above the mean of the test score distribution.

Interconnectedness of the Steps
- Earlier decisions affect all later phases.
- Any later phase could require changes in earlier phases.
- Phases provide information for each other, both forward and backward.
- Iteration and recycling throughout the phases is expected and helpful.

Evidence-Centered Design (ECD)
- Each item/task has to elicit evidence for one or more claims.
- Item specifications set specifications for the tasks, and rules for their accessibility and fairness.
- Each task will be linked to evidence for a claim or claims.

Benefits of Evidence-Centered Design
- ECD strengthens the validity argument.
- ECD aligns all parts of the test design and development process.
- Every step of test development, from setting specifications to reporting scores, is logically linked to the previous and following steps.

Stimulus Specifications (Additional Materials / Evidence)
- What do we mean by stimuli in assessment? Additional materials (e.g., text, web pages, speeches) and mathematics (e.g., graphs, data displays).
- Why provide specifications for stimuli?
- How are we determining the specifications?
- What kinds of information will be included?
- What progress have we made thus far?

Measurement Issues
- Two very important characteristics:
  - Reliability: consistency; freedom from extraneous sources of error.
  - Validity: how well a test measures what it is supposed to measure.
- The key concerns are validity, reliability, and fairness.

Formative vs. Summative Tests
- Formative: monitor progress toward goals within a course of study.
- Summative: assess overall achievement of course goals.
- "When the cook tastes the soup, that's formative; when the guests taste the soup, that's summative." (Robert Stake)

Table of Specifications
- A blueprint for the test.
- Purpose: ensure proper emphasis is given to all elements of a course of study (content validity).
- A guide for writing items.

Steps and Elements to Consider
- Start with instructional objectives
- Specify the psychometric model
- Determine the type of item(s)
- Determine the complexity taxonomy
- Determine the length of the test
- Determine the weight to be given to each objective
- Determine the weight to be given to each level of the taxonomy
- Estimate the number of items in each cell (a worked sketch follows below)
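The last step above, estimating the number of items in each cell, is a small calculation once the weights are fixed. A minimal sketch follows, assuming illustrative content-area and cognitive-level weights and a 60-item test; the specific weights, levels, and test length are not prescribed by the workshop.

```python
# A minimal table-of-specifications sketch: given content-area weights,
# cognitive-level weights, and a target test length, estimate the number
# of items in each cell. All numbers below are illustrative assumptions.

content_weights = {
    "Foundations of Geology": 0.30,
    "Geological Processes": 0.10,
    "Types of Geological Formations": 0.40,
    "Special Topics": 0.20,
}
level_weights = {"Recall": 0.40, "Application": 0.40, "Analysis": 0.20}
test_length = 60  # total number of items planned

for area, aw in content_weights.items():
    for level, lw in level_weights.items():
        n_items = round(test_length * aw * lw)
        print(f"{area:32s} {level:12s} {n_items:3d} items")
```

Because each cell is rounded independently, the counts may not sum exactly to the target length; in practice a committee adjusts a few cells by hand.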
The MC Item Type
- We prefer the multiple choice format.
- Multiple choice items consist of: a stem; the correct choice; distractors (3 to 6 (+/-) choices in total, ideally 4 or 5).
- Multiple choice items are very difficult to write.
- They can be written for a variety of cognitive levels.
- Items should be tested before they are used.
- We will examine recommendations for preparing multiple choice questions.

Guidelines for Writing Objective Items
- Communicate the question in clear, concise language.
- Construct questions free from extraneous reasons for problems.
- Construct items at an appropriate level of difficulty for the examinees.
- Include items at an appropriate level of difficulty for the purpose of the test.
- Test significant elements of a course.
- Write independent items.
- In the correct alternatives, paraphrase statements from the reference source.
- Exclude clues to the correct answer.
- Provide one correct answer.
- Edit the items.

Validity
- The extent to which inferences and actions made on the basis of a set of scores are appropriate and justified by evidence, for the stated purpose of the instrument.

Validity Evidence
- Construct evidence: the construct is the concept or characteristic that a test is designed to measure, such as knowledge of psychology.
- Content evidence: linkage to a specific body of knowledge.
- Consequential evidence: related to the use of the outcomes for the stated purpose of the test.

Validity: Internal and External
- Construct validity [internal]: the extent to which evidence can be found to support the underlying theoretical construct on which the test is based.
- Content validity [internal]: the extent to which the content of a test can be said to be sufficiently representative and comprehensive for the purpose for which it has been designed.
- Response validity [internal]: the extent to which test takers respond in the way expected by the test developers.
- Concurrent validity [external]: the extent to which test takers' scores on one test relate to those on another externally recognised test or measure.
- Predictive validity [external]: the extent to which scores on test Y predict test takers' ability to do X.
- Face validity [internal/external]: the extent to which the test is perceived to reflect its stated purpose (e.g., writing in a listening test: is this appropriate? It depends on the target language situation).

Distractors
- Distractors should be sufficiently plausible to attract candidates who are below the level of the item (common misconceptions, critical misunderstandings, etc.).
- Distractors should represent distinctions that a person might have to make to solve a task or to understand the topic.
- Distractors such as "All of the above" and "None of the above" should not be used.

Item Formats
- Closed stem: the stem is a complete sentence.
  Example: Who among these historical figures was a Greek ruler?  A. Washington  B. Jelačić  C. Charlemagne  D. Pericles
- EXCEPT / NOT formats: all of the answers are true except one, and only one.
  Example: Which of these historical figures was NOT a Greek philosopher?  A. Socrates  B. Plato  C. Aristotle  D. Mazdak (key: a Persian philosopher and religious figure)
- Do not use multiple choice items with more than one "true" answer (e.g., "... D. Both B and C above are true"). Why? They can be confusing, they can help those who know very little to choose the correct answer, and there is generally a better way to ask this type of question.

Common Mistakes or Flaws in Writing an Item
- Non-directed stem: the question being asked is not clear. To test whether you have a non-directed stem, cover up the options; if you do not understand what is being asked, you may have a non-directed item.
  Example: Mazdak was a:  A. Musician  B. Philosopher  C. Prophet  D. Runner  (only A and D are clearly wrong!)
- Heterogeneous options: one or more options are completely different in content or form from the other options. This flaw makes the focus of the item unclear; all options should be drawn from the same frame of reference (for example, when options A and B are clearly from a different subset than options C and D).
- Subsuming options: one option includes all the information from another option. Items with this flaw are ambiguous, since if one option were true, the other would also be true (even if it is less detailed or specific).
  Example: "... a Chinese philosopher:  A. Yang Zhu  B. Yang Zhu, an early Daoist ..."
- Mutually contradictory options: two options contradict each other, and between them they cover the entire range of possibilities (for example, "positively worded stems" versus "negatively worded stems"). Test-wise candidates will know that one of the mutually exclusive options has to be the key, since both cannot be true.
- Information in options that should be in the stem: information that applies to the item as a whole should appear in the stem, not in one of the options.
- Conspicuous key: the key to an item should not be considerably longer, shorter, more detailed, or stated in more technical language than any of the other options.
- Key words from the stem in an option: if an option has a key word or phrase repeated from the stem, it is likely that this option is the key.
- Specific determiners: words such as "ALWAYS" or "NEVER" in options make those options unlikely keys, since universal statements are seldom true.
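Several of the flaws listed above (the use of "all of the above" or "none of the above", specific determiners, a conspicuously long key, stem words repeated in an option) are mechanical enough to screen for before human review. The sketch below is a rough Python heuristic, not a replacement for committee review; the determiner list, the stop words, and the 1.5x length threshold are arbitrary assumptions.

```python
import re

def screen_item(stem: str, options: list[str], key: int) -> list[str]:
    """Flag common surface-level flaws in a draft multiple-choice item (heuristic only)."""
    warnings = []
    determiners = {"always", "never", "all", "none", "only"}

    for i, opt in enumerate(options):
        low = opt.lower()
        if low in ("all of the above", "none of the above"):
            warnings.append(f"option {i}: avoid '{opt}'")
        if any(w in low.split() for w in determiners) and i != key:
            warnings.append(f"option {i}: specific determiner; test-wise examinees may rule it out")

    # Conspicuous key: much longer than the average distractor (threshold is arbitrary).
    distractor_lens = [len(o) for i, o in enumerate(options) if i != key]
    if distractor_lens and len(options[key]) > 1.5 * (sum(distractor_lens) / len(distractor_lens)):
        warnings.append("key is conspicuously longer than the distractors")

    # Words from the stem repeated in an option can cue the answer.
    stop = {"the", "a", "an", "of", "is", "are", "was", "which", "for"}
    stem_words = set(re.findall(r"[a-z]+", stem.lower())) - stop
    for i, opt in enumerate(options):
        if stem_words & set(re.findall(r"[a-z]+", opt.lower())):
            warnings.append(f"option {i}: repeats a word from the stem")
    return warnings

# Invented example item that triggers several of the checks.
print(screen_item(
    "Which container is best for oily rags?",
    ["An approved covered metal waste can kept near the exit", "Cloth bag", "Open can", "None of the above"],
    key=0,
))
```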
Ideas for Items
- Think of tasks related to the construct:
  - Which tasks are most closely related to the content area?
  - Which terms or concepts related to these tasks have you had to explain to others in the class or on the job?
  - Which mistakes and/or problems have you experienced with students or with employees (in job-related situations), or heard about? What was the situation, and what knowledge was the student/person lacking?
  - What problems arise with this concept?
- The most important point to keep in mind as you write items is that they should test knowledge that is important for the student to possess at that level, and/or that must be in the curriculum.

Cognitive Levels
- Items may be written at different "cognitive levels": a cognitive-processing view of the level of complexity of the tasks, with relative value and guidance for item writers.
- Recall / Recognition: remembering or recognizing appropriate terminology, facts, ideas, materials, principles, and generalizations.
- Comprehension / Understanding: understanding what is being communicated (reports, tables, diagrams, directions, regulations, etc.).
- Application: applying ideas, rules of procedure, methods, principles, and theories to specific situations.
- Analysis: breaking down material or information into its constituent parts and detecting the relationships of the parts and the way they are organized.
- Synthesis: integrating material or information into larger constructs or units, detecting the relationships of the parts and the way they are organized into a larger whole.
- Evaluation: making a judgment about the value of ideas, works, solutions, methods, materials, etc., by applying two or more criteria.

Reviewing Multiple-Choice Items
- You must be very critical.
- You must be extremely picky.
- You must feel like a perfectionist.
- You must act proactively.

Rationales
- Every item must have a rationale.
- The rationale should include any references, with complete citations, used during the development of the item.
- Include a reason for the approach/construct that is suggested or implied.

Designing and Evaluating Good Multiple Choice Items

Development of a MC Test
- Develop the questions (items) in proportion to the emphasis and representation in the course content.
- Use the Table of Specifications as a blueprint for the test.

Understanding Item Types (Examples)

Recall item: tests terminology, important principles, factual knowledge, and methods and procedures.
  Example: Trees that keep their functional leaves throughout the year are:  A. Auxins  B. Deciduous  C. Evergreen  D. Inhibitors

Application item: asks the examinee to apply facts and principles, e.g., predict an event that will result from a cause; evaluate the cause of some event; select the correct method for a problem presented; use organizational principles; distinguish between facts and inferences; evaluate the relevance of material; distinguish between relevant and irrelevant information.
  Example: Belinda has an old rose plant with an unusually pretty flower. If she wants another identical plant, she should:  A. Buy a seed and plant it  B. Plant a seed from the plant  C. Root a stem cutting  D. Root a leaf cutting
  Example: What is the BEST type of container for storing oily rags?
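Before the item statistics discussed later (difficulty, discrimination, reliability) can be computed, candidate responses must be scored against the keys. A minimal sketch follows, assuming responses are stored as chosen option indices with -1 marking an omission; the data layout and the tiny data set are invented for illustration.

```python
# Score a response matrix (candidates x items) of chosen option indices
# against the answer keys, producing a 0/1 correctness matrix and raw scores.

keys = [2, 0, 1]                      # correct option index for each of 3 items
responses = [
    [2, 0, 1],                        # candidate 1: all correct
    [2, 1, 1],                        # candidate 2
    [0, 0, -1],                       # candidate 3 (-1 = omitted, scored as wrong)
]

scored = [[1 if ans == key else 0 for ans, key in zip(row, keys)] for row in responses]
raw_scores = [sum(row) for row in scored]

print(scored)        # [[1, 1, 1], [1, 0, 1], [0, 1, 0]]
print(raw_scores)    # [3, 2, 1]
```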
  A. Approved covered metal waste can  B. Approved self-closing metal container  C. Cloth bag  D. Open metal waste can

The Basic Rules for One-Best-Answer Items
1. Each item should focus on an important concept: a common, important, central construct or concept. Avoid tricky, trivial, or overly complex items.
2. Each item should assess the application of knowledge, not recall of an isolated fact (unless absolutely necessary).
3. The stem of the item must pose a clear question: it should be possible to arrive at an answer with the options covered. "Set up" the question appropriately; stems may be longer, but options should be short.
4. All distractors (incorrect options) should be homogeneous: correct and incorrect options should fall into the same category, and the options should be presented in logical (numeric) or alphabetical order. Distractors should be plausible, grammatically consistent, logically compatible, and of the same relative length as the correct answer.
5. Avoid "technical item flaws" that provide special benefit to test-wise examinees or that pose irrelevant difficulty.

Subject each question you write to the five "tests" above; if a question passes all 5, it is probably well written and focused.
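Rule 4 calls for options in logical (numeric) or alphabetical order, and later slides warn against keys that stand out by length. The helper below is a hypothetical sketch that reorders the options while tracking where the key ends up and reports the length spread; the function names and the longest-to-shortest ratio are assumptions for illustration.

```python
def order_options(options: list[str], key: int) -> tuple[list[str], int]:
    """Sort options numerically when they are all numbers, else alphabetically,
    and return the new position of the key."""
    try:
        ordered = sorted(options, key=float)       # numeric order, e.g. dosages
    except ValueError:
        ordered = sorted(options, key=str.lower)   # alphabetical order otherwise
    return ordered, ordered.index(options[key])

def length_spread(options: list[str]) -> float:
    """Ratio of longest to shortest option; large values suggest an unbalanced set."""
    lengths = [len(o) for o in options]
    return max(lengths) / max(1, min(lengths))

opts = ["Pericles", "Washington", "Charlemagne", "Jelačić"]
ordered, new_key = order_options(opts, key=0)
print(ordered, "key is now option", new_key)
print("length spread:", round(length_spread(ordered), 2))
```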
Tips for Writing Items to Test Higher Cognitive Levels
- Incorporate construct-relevant situations that require analysis of multiple issues to arrive at a solution.
- Give clues to the problem and ask for the best course of action ("Which of the following should be done first?"; "Which of the following would you recommend?").
- Avoid explicitly identifying the problem when prompting trouble-shooting.

Use an Efficient and Clear Option Format: Put as Many Words as Possible into the Stem
- Instead of:
  The psychometrician should recommend
  A. that the panel write longer, more difficult to read stems
  B. that the panel write distractors of length similar to the key
- Prefer:
  The psychometrician should recommend that the panel write
  A. longer, more difficult to read stems
  B. distractors of length similar to the key

Write Options with Similar Lengths
- Novice item writers tend to produce keys that are longer and more detailed than the distractors.
- Test-wise candidates will be drawn to the longest response.

Seek Balance Among Options
- Undesirable:  A. high blood pressure  B. low blood pressure  C. high temperature  D. low heart rate
- Desirable:  A. high blood pressure  B. low blood pressure  C. high heart rate  D. low heart rate

Write in Third-Person Style
- "A historian is reviewing ...", "A physicist is evaluating ..."
- Specifically avoid pronouns like "you" and "your".

Cause Each Option to Flow from the Stem
- If you write an incomplete statement at the end of the stem, evaluate the grammar of each option when linked to the stem.

Item Writing Tip: Use Elements Equally Among Options
- Novice writers tend to use correct elements more often; test-wise candidates then need only discard the least frequently used elements to find the key.

Write Distractors with Care
- When writing item stems, you should do all you can to help candidates clearly understand the situation and the question.
- Distractors should be written with a more ruthless attitude.

Item Review
- First editorial review
- Review by the Test Development committee
- External review
- Psychometric review
- Final editorial review

Item Analysis
- The main purpose of item analysis is to improve the test.
- Analyze items to identify: potential mistakes in scoring; ambiguous or tricky items; alternatives that do not work well; problems with time limits.

Item Characteristics to Consider

Reliability
- The reliability of a test refers to the extent to which the test is likely to produce consistent results.
- Approaches: test-retest, split-half, internal consistency.
- Reliability coefficients range from 0 (no reliability) to 1 (perfect reliability).
- Internal consistency is usually measured by Kuder-Richardson 20 (KR-20) or Cronbach's coefficient alpha.
- High reliability means that the questions of the test tended to hang together: students who answered a given question correctly were more likely to answer the other questions correctly.
- Low reliability means that the questions tended to be unrelated to each other in terms of who answered them correctly.

Reliability Coefficient Interpretation (general guidelines for homogeneous tests)
- .80 and above: very good reliability.
- .70 to .80: borderline to good reliability; a few items may need to be improved (or the test lengthened).
- .50 to .70: low reliability; several items will need improvement (and/or the test lengthened).
- .50 and below: poor reliability; the test needs revision.

Item Difficulty
- The proportion of students that got the item correct (ranges from 0% to 100%).
- Helps evaluate whether an item is suited to the level of the examinees being tested.
- Very easy or very hard items cannot adequately discriminate between student performance levels.
- The spread of student scores is maximized with items of moderate difficulty.
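Item difficulty (the proportion correct) and KR-20 internal consistency can both be computed directly from a 0/1 scored matrix like the one sketched earlier. A minimal sketch with invented data; a real analysis would use far more candidates and typically dedicated psychometric software.

```python
# Item difficulty (p-values) and KR-20 internal consistency from a 0/1 matrix
# of candidates x items. The data are invented for illustration only.

scored = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
    [1, 0, 0, 0],
]
n_candidates = len(scored)
n_items = len(scored[0])

# Difficulty: proportion of candidates answering each item correctly.
p = [sum(row[j] for row in scored) / n_candidates for j in range(n_items)]

# KR-20 = (k / (k - 1)) * (1 - sum(p_j * q_j) / variance of total scores).
totals = [sum(row) for row in scored]
mean_total = sum(totals) / n_candidates
var_total = sum((t - mean_total) ** 2 for t in totals) / n_candidates
sum_pq = sum(pj * (1 - pj) for pj in p)
kr20 = (n_items / (n_items - 1)) * (1 - sum_pq / var_total)

print("difficulty:", [round(x, 2) for x in p])
print("KR-20:", round(kr20, 2))
```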
Item Discrimination
- How well does the item separate those who know the material from those who do not?
- Measured by the point-biserial correlation (rpb), which ranges from -1 to +1; rpb is the correlation between performance on the item and performance on the exam.
- A positive rpb means that those scoring higher on the exam were more likely to answer the item correctly (better discrimination).
- A negative rpb means that high scorers on the exam answered the item wrong more frequently than low scorers (poor discrimination).
- A desirable rpb is +0.20 or higher.

Evaluation of Distractors
- Distractors are designed to fool those who do not know the material; those who do not know the answer guess among the choices.
- Distractors should be equally popular (expected count = number who answered the item wrong / number of distractors).
- Distractors ideally have a low or negative rpb.
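The point-biserial discrimination and the distractor counts described above can be computed from the same response data. A minimal sketch with invented data; rpb is taken here as the Pearson correlation between the 0/1 item score and the total exam score (operational analyses often use the total with the item removed, which this sketch omits for brevity).

```python
from statistics import mean, pstdev

def point_biserial(item_scores: list[int], totals: list[float]) -> float:
    """Pearson correlation between a 0/1 item score and the exam total score."""
    mi, mt = mean(item_scores), mean(totals)
    cov = mean((i - mi) * (t - mt) for i, t in zip(item_scores, totals))
    return cov / (pstdev(item_scores) * pstdev(totals))

# Invented data: choices made by 6 candidates on one item (key = "C"),
# together with their total exam scores.
choices = ["C", "C", "B", "C", "A", "D"]
totals  = [38, 35, 22, 30, 18, 20]
key = "C"

item_scores = [1 if c == key else 0 for c in choices]
print("rpb:", round(point_biserial(item_scores, totals), 2))

# Distractor counts: roughly equal popularity is expected among those who miss the item.
wrong = [c for c, s in zip(choices, item_scores) if s == 0]
for distractor in ["A", "B", "D"]:
    print(distractor, wrong.count(distractor), "chose it; expected about", round(len(wrong) / 3, 1))
```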
References

Bloom, B. S. (Ed.). (1956). Taxonomy of educational objectives. Vol. I: Cognitive domain. New York, NY: McKay.
Braddom, C. L. (1997). A brief guide to writing better test questions. American Journal of Physical Medicine & Rehabilitation, 76, 514-516.
Case, S. M., & Swanson, D. B. (1998). Constructing written test questions for the basic and clinical sciences. Philadelphia, PA: National Board of Medical Examiners.
Cox, K. R., & Bandaranayake, R. (1978). How to write good multiple choice questions. Medical Journal of Australia, 2, 553-554.
Downing, S. M. (2006). Selected-response item formats in test development. Pp. 287-301 in S. M. Downing & T. M. Haladyna (Eds.), Handbook of Test Development. Lawrence Erlbaum.
Farley, J. K. (1989). The multiple choice test: Writing the questions. Nurse Educator, 14, 10-12, 39.
Frary, R. B. (1995). More multiple-choice item writing do's and don'ts. Practical Assessment, Research & Evaluation, 4(11). Retrieved October 30, 2010, from http://PAREonline.net/getvn.asp?v=4&n=11
Fuhrmann, B. S., & Grasha, A. F. (1983). A practical handbook for college teachers. Boston, MA: Little, Brown.
Gronlund, N. E. (1998). Assessment of student achievement. Boston, MA: Allyn & Bacon.
Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines. Applied Measurement in Education, 15, 309-333.
Jozefowicz, R. F., Koeppen, B. M., Case, S., Galbraith, R., Swanson, D., & Glew, H. (2002). The quality of in-house medical school examinations. Academic Medicine, 77, 156-161.
Kehoe, J. (1995). Writing multiple-choice test items. Practical Assessment, Research & Evaluation, 4(9). Retrieved October 30, 2010, from http://PAREonline.net/getvn.asp?v=4&n=9
Kemp, J. E., Morrison, G. R., & Ross, S. M. (1994). Developing evaluation instruments. In Designing effective instruction (pp. 180-213). New York, NY: Macmillan College Publishing.
Office of the Superintendent of Public Instruction (2010). Measurements of student progress test and item specifications: Grades 6-8 mathematics. Retrieved October 31, 2010, from http://k12.wa.us/Mathematics/TestItemSpec.aspx
Office of the Superintendent of Public Instruction (2010). Test and item specifications for grades 3-high school Washington state reading assessment. Retrieved October 31, 2010, from http://k12.wa.us/Reading/Assessment/default.aspx
Stiggins, R. J., Arter, J. A., Chappuis, J., & Chappuis, S. (2006). Classroom assessment for learning: Doing it right-using it well. Educational Testing Service.
Van Hoozer, H. (1987). The teaching process: Theory and practice in nursing. Norwalk, CT: Appleton-Century-Crofts.

ANY QUESTIONS?

