
MINISTRY OF EDUCATION AND TRAINING HO CHI MINH CITY OPEN UNIVERSITY


AN INVESTIGATION INTO WRITING ASSESSMENT: THE USE OF MULTIPLE-CHOICE AND WRITTEN TESTING

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF ARTS (TESOL)


Submitted by NGUYEN THI XUAN BINH

Supervisor

Dr VU THI PHUONG ANH

Ho Chi Minh City, December 2010


ABSTRACT

In an effort to renew the national examinations and increase testing accuracy and fairness, the Ministry of Education and Training (MOET) has applied the multiple-choice format to several subjects, English included. In the realm of writing assessment, however, English has proved the most controversial, raising the question: "Can the multiple-choice testing format assess students' true writing ability?"

The purposes of this thesis are therefore to find out whether there are any differences between using the multiple-choice format and the previous format (known as written testing) to assess writing ability, and to identify, after careful consideration, the most suitable format.

With these aims, two instruments were employed in this research. (1) A test comprising three formats (one with multiple-choice writing, the others with sentence writing and paragraph writing) was first administered to observe differences in how the formats rate students. In addition, the result of another writing test (an essay), taken before this study was conducted, was used to provide evidence of the students' true writing ability. All the data obtained were analyzed and interpreted using descriptive statistics and correlation coefficients. (2) A questionnaire was then administered to the same students to gather further information on their background knowledge and their attitudes towards these formats.

The results showed that most students do not obtain similar results when taking different test formats; the mean score of the multiple-choice test is the highest, while the other two are relatively similar. A combination of these test types was then considered, because the average of their total scores is more closely related to the separate essay writing test. It is also noticeable that most students prefer this new type, for a range of interesting and insightful reasons. From these findings, recommendations were made for considering a mixed-format assessment of students' writing ability in high-school graduation examinations.


TABLE OF CONTENTS

CERTIFICATE OF ORIGINALITY ... i
ACKNOWLEDGEMENTS ... ii
ABSTRACT ... iii
TABLE OF CONTENTS ... iv
LIST OF TABLES AND FIGURES ... viii
TABLES ... viii
FIGURES ... ix
ABBREVIATIONS ... x
INTRODUCTION ... 1
0.1 Statement of the problem ... 1
0.2 Aims and Overview of the research ... 3
CHAPTER 1 ... 5
BACKGROUND TO THE STUDY ... 5
1.1 Vietnamese crucial examinations: Criticism ... 5
1.2 An appearance of "the overall project to innovate the system of HGEs and entrance examinations to university and college since 2005" ... 8
1.3 The changes of testing formats used in HGEs ... 8
1.4 The issue of instruction in writing ability at high school ... 12
1.5 Chapter summary ... 13
CHAPTER 2 ... 14
LITERATURE REVIEW ... 14
2.1 Language testing ... 14
2.1.1 What is language testing? ... 14
2.1.2 What is a good language test? ... 15
2.1.3 Main relevant test qualities ... 16
2.1.3.1 Reliability ... 16
2.1.3.2 Validity: face validity ... 19
2.1.3.3 Reliability and Validity ... 20
2.2 Writing ability assessment ... 20
2.2.1 What is writing? ... 21
2.2.2 Writing test formats - subjective format and objective format ... 21
2.2.2.1 Objective testing ... 22
2.2.2.2 Subjective testing ... 23
2.3 The best way to assess writing ability: a combination of testing formats ... 25
2.4 Chapter summary ... 26
CHAPTER 3 ... 27
METHODOLOGY ... 27
3.1 Research questions ... 27
3.2 Research design ... 28
3.2.1 Subjects ... 28
3.2.1.1 The core subjects: First-year students at An Giang University ... 28
3.2.1.2 The supplementary subjects: 12th-grade students at Binh Khanh High School ... 29
3.2.2 Instruments ... 30
3.2.2.1 Survey questionnaire ... 30
3.2.2.2 Test ... 31
3.2.2.3 Essay scoring method ... 32
3.3 Implementation ... 35
3.3.1 Data collection procedures ... 35
3.3.1.1 Conducting the test ... 35
3.3.1.2 ... 37
4.1.2 Central Tendency and Dispersion ... 45
4.1.3 The skew of distribution ... 48
4.2 ... 48
4.2.1 Correlations among the different formats in the test ... 49
4.2.1.1 Correlation between Multiple-choice testing and Sentence writing ... 49
4.2.1.2 Correlation between Multiple-choice testing and Paragraph writing ... 50
4.2.1.3 Correlation between Sentence writing and Paragraph writing ... 51
4.2.2 Correlations between the two tests ... 52
4.3 T-test: Testing hypotheses about the difference of average value ... 55
4.3.1 Among the formats used in the major test ... 55
4.3.2 Between the major and minor tests ... 56
4.4 Questionnaire analysis ... 57
4.5 ... 62
4.5.1 Score distributions ... 62
4.5.2 ... 64
4.5.3 ... 65
4.6 Chapter summary ... 67
CHAPTER 5 ... 68
CONCLUSION AND RECOMMENDATIONS ... 68
5.1 Conclusion ... 68
5.2 Recommendations ... 69
5.2.1 Recommendation 1: Suggestion for combination of different test formats ... 69
APPENDIX 1: ... 80
APPENDIX 2: STUDENT QUESTIONNAIRE ... 83
APPENDIX 3: UNORDERED LISTING OF SCORES ... 86
APPENDIX 4: UNORDERED LISTING OF SCORES FOR PART 3 ... 88
APPENDIX 5: CLASSIFY STUDENTS INTO GROUPS FOR MULTIPLE CHOICE TEST ... 90
APPENDIX 6: CLASSIFY STUDENTS INTO GROUPS FOR SENTENCE WRITING TEST ... 92
APPENDIX 7a: CLASSIFY STUDENTS INTO GROUPS FOR ESSAY TEST SCORED BY GROUP ONE ... 94
APPENDIX 7b: CLASSIFY STUDENTS INTO GROUPS FOR ESSAY TEST SCORED BY GROUP TWO ... 96
APPENDIX 7c: A COLLECTION OF STUDENTS' RANKING FROM THE THREE FORMATS ... 98
APPENDIX 8: COLLECTED REASONS FOR THE QUESTION 4 (IN THE QUESTIONNAIRE) ... 100
APPENDIX 9: QUESTIONNAIRES DATA ANALYSIS ... 103
APPENDIX 10: GUIDELINE FOR APPLYING MULTIPLE-CHOICE TESTING TO ENGLISH TEST IN 2006 ... 108
APPENDIX 11: DE THI THPT TU NAM 2003 DEN 2009 (high-school exam papers from 2003 to 2009) ... 110


LIST OF TABLES AND FIGURES

TABLES
Table 1.1: The test structures in 2003 and 2004
Table 1.2: The test structures in 2005
Table 1.3: The test structures through 2006 - 2009
Table 4.1: Frequency table for part 1
Table 4.2: Frequency table for part 2
Table 4.3a: Frequency table for part 3 - Group 1
Table 4.3b: Frequency table for part 3 - Group 2
Table 4.4: A summary of frequency distribution
Table 4.5: Central Tendency and Dispersion
Table 4.6: Correlation between Multiple-choice and Sentence writing
Table 4.7: Correlation between Part 1 (Multiple-choice) and Part 3 (Paragraph writing) scored by Group 1
Table 4.8: Correlation between Multiple-choice and Paragraph writing scored by Group 2
Table 4.9: Correlation between essay scorer groups
Table 4.10: Correlation between Sentence writing and Paragraph writing scored by Group 1
Table 4.11: Correlation between Sentence writing and Paragraph writing scored by Group 2
Table 4.12: Correlation between Part 1 of the major test and minor test
Table 4.13: Correlation between Part 2 of the major test and minor test
Table 4.14: Correlation between Part 3 (scored by group 1) of the major test and minor test
Table 4.15: Correlation between Part 3 (scored by group 2) of the major test and minor test
Table 4.16: Multiple-choice and Sentence writing
Table 4.17: T-test among the formats
Table 4.18: T-test between the major and minor tests
Table 4.19a: Question 1 analysis
Table 4.19b: Reasons for Q.1 analysis
Table 4.20a: Question 2 analysis
Table 4.20b: Reasons for Q.2 analysis
Table 4.21: Question 3 analysis
Table 4.22a: Question 4 analysis
Table 4.22b: Reasons for Q.4 analysis

FIGURES
Figure 1.1: Saville's model (2006) of stakeholders' roles in the testing community
Figure 4.1:


ABBREVIATIONS

HGE: High-school graduation examination
Dr: Doctor
MOET: Ministry of Education and Training
SPSS: Statistical Package for the Social Sciences
UEE: University entrance examination


INTRODUCTION

In this introductory section, the purpose of the thesis is clarified and stated against the real situation of Vietnamese education in general, and the context of An Giang in particular, after observing the changes in HGEs over the past few years.

Besides that, the aims of the research and an overview of the thesis are also presented.

0.1 Statement of the problem

Since the MOET decided to apply the multiple-choice testing format to HGEs (2005), a salient controversy has surrounded this change, with a variety of contrary views ranging from support to rejection. These viewpoints stem from the following causes:

Firstly, at the early stage of using this new format, objective testing was not popularized among all stakeholders to gain society's agreement, so some educators viewed this kind of test with suspicion, stressing its drawbacks, which are stated as follows:

- For test designers: compared to other types, it is more challenging and demanding to write good multiple-choice items, because plausible distractors (incorrect answer options) are the most important requirement of this type, which makes item writing time-consuming. For that reason, some designers tend to favor "recall" type questions, as they are easy to write, but the failure is that test takers are able to guess the answers throughout a test.

In addition, the issue of language knowledge and skills caused a lot of controversy. Some educators argued that this test type could not measure all language skills or the ability for critical and logical thinking. However, others advocated this test type as it


- For test takers: multiple-choice testing cannot measure students' overall performance; it sometimes assesses recognition rather than recall, because test takers may attempt to guess rather than determine the correct answer when they are unable to answer a particular question. Therefore, a number of students have a chance of receiving a mark for a question they do not know. For the productive skills (speaking and writing), it fails to train students to express their ideas directly and to develop logical, creative thoughts.

Secondly, it is sad but true that the training process in Vietnamese high schools mainly aims at examinations, that is to say, teaching and learning what will be tested. This tendency has a negative backwash effect on students and even teachers, as they focus only on the multiple-choice test format; as a result, students cannot achieve proficiency in English.

Over my years of teaching at An Giang University, I have recognized that most freshman students majoring in English do not perform well in writing. A possible explanation for this problem is that these students concentrated too much on multiple-choice testing, which is the main testing format applied in both HGEs and UEEs.

Moreover, the question of how a variety of language abilities can be measured by only one kind of test (i.e. direct or indirect testing) poses many difficulties for testers, since each skill has its own characteristics and therefore requires a different way of testing. For example, if teachers want to know how well students pronounce a language, the best way is to get them to speak. On the other hand, if they want to check students' grammatical structures, it is better to use multiple-choice testing (Hughes, 2003).

A question has arisen in the researcher's mind: how does an objective test


In the face of such a question, the study was carried out in the hope of identifying the appropriate kind of testing, multiple-choice or written testing, whose results can truly reflect students' real writing ability.

0.2 Aims and Overview of the research

The research was carried out with the aims of:

- Investigating the relationships among various testing formats and certain related problems

- Finding out the best test type for writing assessment in HGEs

In order to achieve these aims, data were collected from two main sources: (1) A mixed-format test was designed to observe students' writing ability through three different formats: multiple-choice testing, sentence writing, and paragraph writing.

(2) A student questionnaire composed of four primary questions was used to identify students' perceptions of writing assessment.

The thesis consists of five chapters as follows:

Chapter 1: Background to the study

This chapter mainly focuses on the setting of the research, based on background information and practical viewpoints.

Chapter 2: Literature Review


Chapter 3: Methodology

In this chapter, based on the stated aims, the research design and implementation are framed, involving the setting of the study, the data collection procedure, and the data interpretation process.

Chapter 4: Results and Discussion

This chapter presents the findings of the study through description and analysis of the collected data and, finally, discussion of the results. An interpretation of the research findings is then presented together with a conclusion.

Chapter 5: Recommendations and Conclusion

In this chapter, recommendations on the problematized issues are given. The last parts declare the limitations of the study and provide suggestions for


CHAPTER 1

BACKGROUND TO THE STUDY

This chapter addresses the contexts relevant to the issue raised, supplying readers with the origin of the subject. It also provides background information for general understanding. There are four sections in this chapter. Section 1 draws a real picture of the important Vietnamese examinations through two primary criticisms. Section 2 then describes the MOET's project to innovate the examination system in Vietnam, in which the multiple-choice format was introduced as the new trend in HGEs and entrance exams. Section 3 considers a subsequent impact of this testing innovation; it also gives an overall picture of the HGEs in the transition period by presenting the various test structures from 2002 to 2009. Finally, the last section describes how writing ability is taught in high school through its primary techniques. These contents are presented in a logical chain from broad to more detailed problems so that the research issue can be delved into. At the same time, the last two sections also provide the background information needed for the research questions.

1.1 Vietnamese crucial examinations: Criticism


For high-stakes examinations, Marchant (2004) carefully explained the consequences of the school-wide average scores a school receives, as well as of the high or low scores individual students obtain:

High school-wide scores may bring public praise or financial rewards; low scores may bring public embarrassment or heavy sanctions For individual students, high scores may bring a special diploma attesting to exceptional academic accomplishment; low scores may result in students being held back in grade or denied a high school diploma (p.1)

Accordingly, HGEs (and even university entrance exams) in Vietnam may be criticized from various viewpoints because of their weaknesses. They are most often criticized for the following main reasons:

- Causing stress for the stakeholders. As the results play an important part in deciding students' futures, students have to study hard to prepare for the exam for many years. Besides, the exams are organized on a large scale in Vietnam, so all the stakeholders tend to be involved for a long time before and after the exam.

Saville's model (2006) explaining the roles of stakeholders in the testing


Figure 1.1: Saville's model (2006) of stakeholders' roles in the testing community

- Students' results are used on their own for many crucial decisions such as continued education (college or university), the graduation certificate (or high-school diploma), job qualification, etc. Some educators argue that evaluating the knowledge students have obtained over three years of learning at high school, or making decisions about their futures, on the basis of just a single test is not the best solution; it may be unfair to some students because of unexpected things happening during the exam.

Therefore, the MOET's overall project to innovate the system of high-school examinations in Vietnam was issued, yet it also gave rise to salient controversy

instead of one large test result for students' further education after high school. This project is presented in the very next part.

1.2 An appearance of "the overall project to innovate the system of HGEs and entrance examinations to university and college since 2005"

It is acknowledged that testing and evaluation is one of the most important stages of education, since it has a strong impact on the teaching and learning process. Based on this acknowledged role, the MOET made an enhancement to its measurement and evaluation systems, which is in essence an innovation of the nation-wide examinations, together with serious formative assessment at school.

In Vietnam, there are four high-stakes examinations currently organized every year: the HGE (at the end of May and in early June), the university and college entrance examination (in early July), the college entrance examination (at the end of July and in early August), and the entrance examination to vocational secondary schools (after the previous exam). They are held under the supervision of the MOET; hence the MOET's project tends to affect them all through changes to the examinations, particularly in test format.

Within the framework of the mentioned project (see Appendix 1 for further information), the MOET advocated applying objective testing to foreign languages (including English, Chinese, French, and Russian) in HGEs and university and college entrance exams from 2005. However, it was not until 2006 that language tests in the multiple-choice format (with four choices A, B, C, D) were put into practice. A more detailed description is given in the next part.

1.3 The changes of testing formats used in HGEs


using multiple-choice testing in most crucial examinations would become predominant in Vietnam, being regarded as progress in line with modern scientific measurement. It is obvious that the trend of using multiple-choice testing in HGEs and UEEs gradually expanded from 2002, and by the 2006 exam the written form had been rejected completely. The following is a presentation of the test structures (note: the focus is on test format, not test content) from 2001 to 2009, collected by the writer. They are divided into three stages to give the readers a logical overview of the changes through the years.

Table 1.1: The test structures through 2001 - 2004 (under WRITTEN and MULTIPLE-CHOICE questions)
- Question I: Completing the passage with tense agreement
- Question II: Filling the blanks with appropriate prepositions (NOT GIVEN); or: Completing the sentences by choosing the best option for each blank
- Question III: Giving the correct form of the words in brackets
- Question IV: Reading comprehension (making questions from the suggested words and answering them; choosing true or false); or: Reading the passage and answering the questions
- Question V: Completing the passage by filling the blanks with suitable words (NOT GIVEN)
- Question VI: Finishing the sentences in such a way that the meaning is as similar as possible to that of the original sentences
- Question VII: Writing complete sentences with suggested words

Table 1.2: The test structures in 2005 (under WRITTEN and MULTIPLE-CHOICE questions)
- Question I: Completing the passage with tense agreement by choosing the best options
- Question II: Completing the sentences by choosing the best option for each blank
- Question III: Completing the passage by filling the blanks with suitable words (NOT GIVEN)
- Question IV: Reading comprehension (making questions from the suggested words and answering them; choosing true or false)
- Question V: Finishing the sentences in such a way that the meaning is as similar as possible to that of the original sentences
- Question VI: Writing complete sentences with suggested words

Table 1.3: The test structures through 2006 - 2009
1. PHONOLOGY (3 Qs in 2005-2006; 5 Qs from 2006-2007 to 2008-2009)
- Word stress / ending sound / vowels or consonants: odd one out
- Mistake recognition
2. VOCABULARY, GRAMMAR & CULTURE ISSUES (24 Qs in 2005-2006; 25 Qs from 2006-2007 to 2008-2009)
- Filling the blanks of the sentences by choosing the best options (comprising tense agreement, sentence structures, connectors, prepositions, word choice, simple communicative functions, etc.)
3. READING SKILL (15 Qs in 2005-2006; 10 Qs per part from 2006-2007 to 2008-2009)
- Filling the blanks of the passage by choosing the best options
- Reading comprehension
4. WRITING ABILITY (8 Qs in 2005-2006; 10 Qs per part from 2006-2007 to 2008-2009)
- Completing the sentences by choosing the best options (focus on sentence level)


According to the structures presented in the above tables, the English test in 2006 encompassed 50 multiple-choice questions for students enrolled in both the three-year and seven-year high-school programs, and the candidates were allowed 60 minutes to do the test.

1.4 The issue of instruction in writing ability at high school

In the High-school Program for the English subject (pilot) issued by the Ministry of Education and Training in 2002, writing instruction was generalized into the following two objectives: (1) writing for personal communication, including letters, invitation cards, etc.; describing or reporting personal routines and classroom activities; or filling in forms and surveys; and (2) paragraph writing about learnt topics, using the language specified in the scope of the high-school program (Hoang et al., 2006, p. 231). However, teaching writing in most high schools in Vietnam has been carried out through sentence-level techniques.

From the above statements, training in writing ability focuses on the following main types of exercise:

- Finishing sentences in such a way that their meaning is as similar as possible to that of the original sentences. Students are given sample sentences with clear meaning and target structures, and are then asked to write other sentences with similar meaning but different structures.

- Writing complete sentences with suggested words, based on the required structures.

- Explaining the original sentences through sentence transformation.

Paragraph writing and essay writing seem to have been neglected in most high schools because they are time-consuming and are not forms used in HGEs.


1.5 Chapter summary

Reading this chapter, readers gain essential information on the issue raised in this thesis and are provided with the basis for the specific discussion in the next chapters. This information relates to the project of changing testing and assessment in HGEs, the change of test structures through each exam, and a description of the techniques applied in teaching writing in high schools. It serves as the ground for later arguments.


CHAPTER 2

LITERATURE REVIEW

In Chapter 1, background information relevant to the study was presented. This chapter provides the reader with the theory of writing assessment in three portions: (1) language testing, with the issues of reliability and validity; (2) testing of writing ability and the test types focused on (multiple-choice questions and written testing); and (3) a discussion of the best way to assess writing.

A conclusion drawn from this reviewed literature is expected to answer the question of which test type and form should be the best choice for testing writing ability in a large-scale, high-stakes examination.

2.1 Language testing

This section addresses readers' expectations on two issues: a discussion of language testing, and what makes a good language test, with the qualities relevant to writing ability assessment.

2.1.1 What is language testing?

In language teaching, testing plays a crucial role among the components constituting an instructional program. Although these components serve diverse purposes, in which "While the primary purpose of other components is to promote learning, the primary purpose of tests is to measure" (Bachman and Palmer, 2000, p.19), tests are implicitly understood to be constructed for teaching purposes. It is clear that testing is regularly conducted during the teaching and learning process, with the participation of all the stakeholders involved and with various purposes for the test types used.

According to Bachman and Palmer (2000, p.8), "Language tests can be a valuable tool for providing information that is relevant to several concerns in language teaching". On this basic tenet, language tests carry a variety of roles: as evidence for feedback on the effectiveness of a teaching program, as a decision-making ground for determining learning materials and activities or for making inferences about students' language abilities, and finally as a tool to clarify instructional objectives. It is therefore obvious that a test used correctly is a significant source of useful information for the teaching and learning process.

2.1.2 What is a good language test?

Most educators find it hard to define a good test in general, since it depends on the objectives of language use, the purpose of the test, and the resources available (Hughes, 2003, p.8). Hughes accordingly suggested the ideal characteristics of a test or testing system as follows. It:

- consistently provides accurate measures of precisely the abilities in which we are interested

- has a beneficial effect on teaching

- is economical in terms of time and money

In brief, the above factors may be interpreted as follows: validity concerns the accuracy of the measurements made, while a test possesses reliability if it measures consistently. At the same time, the concept of positive backwash is inferred from the meaning of a beneficial effect on teaching, and efficiency, relating to educational administration, is another way to refer to the economical use of time and money.

Nevertheless, a question arising from these factors likely entangles most educators: "Can we obtain all these essential factors in developing a single test?" The answer may be "not always" because, in some cases, validity and reliability seem to be in conflict, or "a reliable test may not be valid at all" (Hughes, 2003, p.50), although a valid test must be reliable. Consequently, a test type tends to be chosen according to the particular qualities its use requires, as the following claim suggests: "While reliability is a quality of test scores themselves, validity is a quality of test interpretation and use" (Bachman, 1990, p.25). For further argumentation, these two sides of a test are discussed more specifically in part 2.1.3.

2.1.3 Main relevant test qualities

We all agree that, to measure the quality of a test, reliability and validity are referred to as the key factors. They provide the major justification in the process of making inferences or decisions about an individual based on test scores. For the main concern of this study, apart from the other qualities contributing to the usefulness of a test, two qualities will be discussed in detail: reliability and validity.

2.1.3.1 Reliability

In this sub-section, the issue of reliability is presented in four parts: (a) definition, (b) affecting factors, (c) reliability of objective and subjective tests, and finally (d) inter-rater reliability.

(a) Definition: Reliability has been defined by different authors in different ways, among which scoring consistency is mentioned the most. According to Berkowitz, Wolkowitz, Fitch, and Kopriva (2000), cited in Rudner and Schafer (2000), reliability is explained, under the condition that "the scores are indicative of properties of the test takers", as:

the degree to which test scores for a group of test takers are consistent over repeated applications of a measurement procedure and hence are inferred to be dependable and repeatable for an individual test taker (para. 2)


(b) Affecting factors: Three factors are considered the main causes affecting the reliability of a test: the test itself, the test takers, and scoring factors. For the first, each test type has its own characteristics; for instance, multiple-choice is concerned with the effectiveness of the distractors, the correct answers, and the difficulty of the items. The second is clearly explained by Rudner and Schafer (2002, p.22) as students' performance in the testing process. The last, and the one of most concern in this paper, is consistency in scoring, since "scorers are not always consistent; the scorers tend to change their criteria while scoring and are subject to biases" (Rudner, 1992, cited in Rudner and Schafer, 2002, p.22). Hence, there are many ways to pursue reliability, from consistent performance to scorer reliability, each partly contributing to a good test.

(c) Reliability of objective and subjective tests: For objective and subjective tests, which are the main focus of this research and are described in the next part, reliability is usually discussed as an issue of scoring procedure, the first type being more reliable in this respect than the second. Among the many factors contributing to the reliability of a test, consistency is often taken as the measure of reliability (Bachman and Palmer, 2000, p.19). In some cases, with subjective tests for instance, the scorers responsible for the same paper may reach conclusions as different as chalk and cheese, because one of them may score more strictly than the others. As a result, the scores obtained are not consistent and cannot be considered reliable, which makes the test unreliable.

Once again, Bachman and Palmer (2000) affirmed that:

Reliability is clearly an essential quality of test scores, for unless test scores are relatively consistent, they cannot provide us with any information at all about the ability we want to measure (p.20)


minimize inconsistency through test design, or by choosing test tasks, since "Of the many factors that can affect test performance, the characteristics of the test tasks are at least partly under our control" (Bachman and Palmer, 2000, p.20).

(d) Inter-rater reliability: In writing assessment, inter-rater agreement has been considered a dilemma of essay evaluation, so many studies have been conducted on various issues relating to this problem. Shohamy, Gordon and Kraemer (1992) wrote an article entitled "The effect of raters' background and training on the reliability of direct writing tests" in The Modern Language Journal, Vol. 76, No. 1, pp. 27-33. Baer et al. (2004) compared the creativity manifested in the writing of more than 100 8th-grade students. Howell et al. (2005) described the notion of inter-rater reliability as follows:

Inter-rater reliability is the extent to which two or more individuals (coders or raters) agree. Inter-rater reliability addresses the consistency of the implementation of a rating system.

In the researcher's mind, the root of consistency in essay rating is designing an adequate scoring rubric so that raters respond similarly to the same sample, because a well-defined scoring scheme itself provides guidance for evaluating students' writing; otherwise, the judgments of two or more individuals may vary depending on the criteria they establish (Moskal, 2000). To enhance inter-rater reliability in a large-scale examination, scorer training and monitoring are called for, despite the corollary financial burden.

In short, there is a conclusion that "By developing a pre-defined scheme for the evaluation process, the subjectivity involved in evaluating an essay becomes more objective" (Moskal, 2000).
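As a side illustration of the consistency being discussed, the short sketch below is not part of the thesis and uses invented scores; it computes two of the simplest indices that can be reported for a pair of rater groups scoring the same essays, namely the Pearson correlation between their scores and the proportion of papers on which they agree exactly.

```python
# A minimal sketch of two simple inter-rater reliability indices for essay scores.
# The score lists are invented for illustration; in the study, two groups of raters
# scored the same set of papers on a 0-10 scale.

from statistics import mean, pstdev

group1 = [6.0, 7.5, 5.0, 8.0, 4.5, 6.5, 7.0, 5.5]   # hypothetical scores, rater group 1
group2 = [6.5, 7.0, 5.0, 8.5, 5.0, 6.0, 7.0, 6.0]   # hypothetical scores, rater group 2

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

# Exact agreement: proportion of papers given an identical score by both groups.
exact_agreement = sum(a == b for a, b in zip(group1, group2)) / len(group1)

print(f"Pearson r between rater groups: {pearson(group1, group2):.2f}")
print(f"Exact agreement rate:           {exact_agreement:.2f}")
```

A high correlation combined with low exact agreement would suggest that the two groups rank papers similarly but apply different severity, which is exactly the kind of inconsistency a shared, pre-defined rubric is meant to reduce.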


2.1.3.2 Validity: face validity

Whereas test reliability relates to consistency of measurement, test validity refers to accuracy of measurement. Hughes (2003) clarified that

A test is said to be valid if it measures accurately what it is intended to measure, or a test created can be inferred to be a true tool to measure students' ability to perform their knowledge at certain levels required (p.26)

In other words, "if a test serves its intended function well, it is said to be valid; if it does not, it is invalid" (Nunnally, 1972, p.21). According to Alderson et al. (1995), three types are listed within the frame of validity: internal validity, external validity, and construct validity. However, in relation to the topic of this thesis, only internal validity is discussed, with face validity as the main focus. Internal validity is treated as a conceptual approach in which both subjective and objective judgments of a test are based mostly on assessment of the test itself (such as its form, format, and content). That means it is mainly concerned with the 'perceived content' of the test and its 'perceived effect' (Alderson et al., 2003). To assess this feature of a test, the three most widespread methods should be included. The first is face validity, in which the value of a test is judged by various kinds of stakeholders. The second is content validity, a real dilemma because test content can only be judged by professional experts or linguists. The last is response validity, primarily assessed by test takers. However, within the scope of this research, internal validity is discussed only in terms of face validity, for its relevance to the test types mentioned.

Face validity is not considered the predominant method of assessing the validity of a test; still, in fact, it is the face of a test that is examined first when discussing whether a test is good or bad. That is, a test with a poor format, poor items, or unclear instructions may be said to be "not valid". In addition, according to Ingram (1977, p.18), face validity refers to the test's "surface credibility or public acceptability", and judgments on a test are regularly made subjectively by 'non-experts', administrators, and even by students or test takers. Their assessment, hence, may not be officially accepted in spite of its admitted impact on the validity of a test.

Finally, a test is called valid if it appears to be valid; in other words, a test of language skills should include adequate elements of those skills. A critical question arising from this argument relates to face validity: which is more valid in a test of writing ability, the multiple-choice format or the essay format?

2.1.3.3 Reliability and Validity

The contrast between reliability and validity (mentioned above) is manifested in the diversity of testers' viewpoints in both principle and practice. In principle, some testers assume that "a test cannot be valid unless it is reliable", or that "it is quite possible for a test to be reliable but invalid" (Alderson et al., 1995, p.187). Nevertheless, in practice, reliability alone is not sufficient, since a test can only be administered when it rates highly on all test qualities. In particular, when both factors are present, which one is chosen for maximization depends on the test purpose. A typical example in this argument is the multiple-choice test: it is called a reliable test because of its objective and consistent scoring, yet it is criticized as invalid for measuring students' ability to use English in real life (Alderson et al., 1995, p.187). However, it is chosen for the purpose of reliable scoring.

2.2 Writing ability assessment

In this part, some explanation of writing ability and the test types used to assess it is included to give insight into writing assessment.


2.2.1 What is writing?

Writing is considered a manifestation of human beings through language, as a textual medium distinct from the non-textual means of other creatures. This has been shown by archaeologists and linguists who have studied ancient illustrations such as cave drawings and paintings. Thus, it is accepted that the ability to organize a text marks the existence of people in a developed community. In other words, only humans have developed systems of language, and the writing system is one of the most central features of a language. According to Ager (1998, par. 1), "Writing is a method of representing language in visual or tactile form. Writing systems use sets of symbols to represent the sounds of speech, and also have symbols for such things as punctuation and numerals."

Besides, writing is a special way of communicating among people who share the same writing system, with the implication that they belong to a community, because writing carries a great formality, holding numerous principles and rules which writers follow when organizing their thoughts.

In short, writing is the highest development of a language, covering all of its features.

2.2.2 Writing test formats - subjective format and objective format

Writing has been defined in the above section, and the ways used to assess a person's writing ability always attract the attention of educators. As indicated in the previous parts (in 2.1), the choice of test type depends on the objectives of a test. To deal with testing writing ability, all the features involved should be observed. As a result, the advantages and disadvantages of subjective and objective tests are discussed in this section on the ground of their nature, namely that "the subjective type of test allows students to write and use language in their answers while the objective type does not" (Tunku, 2000). Yet it is the system by which a test is scored that gives the test its name, not really the test itself, so scoring is a particular focus of this study as well.

2.2.2.1 Objective testing

In Vietnam, objective testing (the most common question types being multiple-choice, true/false, matching items, and completion) is considered a new trend compared with subjective testing. The advantages of these forms are most obviously manifested in scoring: the scoring process is conducted objectively by following a fixed answer key or by using scanning machines and computers. Besides, test takers cannot supply their own alternative answers to selected-response item formats, known as multiple-choice, matching, and true-false (Zimmaro, 2003, p.15), so objectivity is always maintained in this type.

However, this kind of testing has its own disadvantages. Because it requires test takers to choose the correct response from several suggested alternatives, it is said to invite more guessing. In addition, this kind of test has a particularly severe effect on English-majored students (at university or college), who lack writing ability. This results from the fact that students were not trained adequately in high school, being prepared for examinations (mostly through multiple-choice and sentence writing exercises that focus mainly on grammar) rather than equipped with sufficient knowledge (the four language skills) for further study.

In addition, some other disadvantages of this type are that it is time-consuming to write good items, less valid than a subjective test, and unsuitable for assessing certain abilities. Despite being a controversial topic, it remains popular due to its utility, reliability, and cost-effectiveness.


2.2.2.2 Subjective testing

Contrasting with objective testing is subjective testing, which requires test takers to write and present an original answer, including short-answer essays, extended-response essays, problem solving, and performance tasks (Zimmaro, 2003, p.15). For measuring students' writing ability, the use of these tests is encouraged, with many arguments based on their several advantages and disadvantages.

On the good side, subjective tests have their strongest point in face validity, as they allow test takers to perform their actual ability. Students can demonstrate written expression in a logical order so that their results are precisely measured.

Besides that, learners' language knowledge is assessed over a wide range, including mechanics, grammar, style, organization, and logical development. Clearly, these fields of knowledge cannot be shown in objective tests, since such tests are a kind of discrete language properties assessment (emphasizing mechanics, grammar, and vocabulary only) while other significant aspects of written language (organization, content, and coherence) are ignored (Albertini et al., 1996, p.75).

The biggest drawback of the subjective test, revealed in its name, is variation in marking: "experienced examiners award widely varying marks not only on the same piece of work marked by other examiners, but on their own marked scripts re-marked after a passage of time" (Zimmaro, 2003). According to her, this format has been criticized for resting on loosely defined criteria with a lack of consistency from one reader to another, and grading is time-consuming and often impractical in large numbers. Such examinations have also been thought to be deficient in consistency because of essay ratings; besides, over-burdened staff, time consumption, and financial problems are considered big dilemmas that make the exam impractical as well. Although it is not the best way to assess students' writing ability, the subjective form is still able to achieve both validity and reliability when issues of administration, testing conditions, population factors, and scoring are given appropriate attention and adequate control (Huot, 1990).



According to Jacobs (2004), the major task in scoring essay tests is to maintain consistency and to make sure that answers of equal quality are given the same number of points: "At once, there has existed two approaches to scoring essay in writing assessment, they are named as analytic or point method, and holistic or rating method" (p.5).

- Analytic: In this method, a model answer, in which the major components are defined and point values assigned, should be prepared before scoring so that the raters can compare each student's paper with the model answer. In addition, an analytic rubric should be used when the tester wants to identify students' strengths and weaknesses, give detailed feedback, assess complicated skills or performance, and encourage students' self-assessment of their understanding and performance (Zimmaro, 2003, p.32).

- Holistic: This method requires a list of developed criteria on which the total quality of the examinee's answer is based. The score for each paper is decided according to the stack it is placed in; these stacks are usually sorted into three levels: the first stack includes the best answers, the second the average ones, and the poorest go into the third stack. This method is usually used for a large group of test takers when a quick snapshot of achievement is desired and a single dimension is adequate to define quality (Zimmaro, 2003, p.32).
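To make the contrast concrete, here is a small illustrative sketch; it is not taken from the thesis, and the rubric components, point values, and band scores in it are invented. It shows how an analytic score is built up from component points while a holistic score simply places a paper in one of three bands.

```python
# Illustrative contrast between analytic and holistic essay scoring.
# The rubric components, point values, and band cut-offs below are hypothetical.

ANALYTIC_RUBRIC = {          # component -> maximum points (total = 10)
    "content": 3,
    "organization": 2,
    "grammar": 3,
    "mechanics": 2,
}

def analytic_score(component_points: dict) -> float:
    """Sum the points awarded per component, capped at each component's maximum."""
    return sum(min(component_points.get(c, 0), cap) for c, cap in ANALYTIC_RUBRIC.items())

def holistic_score(overall_impression: str) -> int:
    """Map a rater's overall impression to one of three stacks (bands)."""
    bands = {"best": 9, "average": 6, "poor": 3}   # hypothetical band scores
    return bands[overall_impression]

paper = {"content": 2.5, "organization": 1.5, "grammar": 2.0, "mechanics": 1.5}
print("Analytic total:", analytic_score(paper))            # 7.5
print("Holistic band score:", holistic_score("average"))   # 6
```

The thesis's own essay scoring method (section 3.2.2.3) is not reproduced here; the point is only the structural difference between summing component points and assigning a single band.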

In summary, each test format has its strong and weak points. The real challenge for the educators responsible for finding and using valid and fair assessment tools is to select a suitable test format based on the specific purpose of education, especially in HGEs.


2.3 The best way to assess writing ability: a combination of testing formats

The strong and weak points of each format presented in the above section lead to various opinions on which test type should be applied to obtain a valid and reliable test of writing ability. Besides that, the history of testing has moved from subjective to objective testing, each presenting both strengths and weaknesses; thus most testers tend to look for a better model with the best features of these two extremes, that is to say, the overall content of the first and the consistent, objective scoring of the second (Madsen, 1983, p.7).

In this part, the choice of combining these two forms is presented as a way of rejecting the unfavored elements of each. According to Driver & Krech (2001), to measure students fairly, "assessments should be valid (measuring what they intend to measure) and reliable (resulting in the same placement whenever they occur or whoever grades)" (p.18).

It is obvious, moreover, that both written and multiple-choice tests lead to problems when used alone (see the weak and strong points of each presented in the previous section), since neither format can deliver both reliability and validity on its own. Breland (1996) stated that one way to ensure reliability is to use "both free-response and multiple-choice tasks to make up the assessment" (p. 23). In addition, to ensure fair assessment, White (1998) also agreed that testing methods should be combined: "The best use of multiple-choice scores, if they must be employed in the area of writing, is as a portion of a test, rather than as the assessment itself" (p. 240). It is also reported in some studies that this approach was used by the Educational Testing Service (ETS); the aim was not more reliable essay scores in themselves, but a distinctive kind of information provided for the purpose of public reporting, based on the viewpoint that a single test can be reliable when considered along with other information.

In short, White (1998) emphasized the need for this kind of assessment as follows:

The results of a careful multiple-choice test, when combined with the results of a single essay test, will yield a fairer and more accurate measure of writing ability than will either test when used by itself. A preferable alternative is to score more than one writing sample, either in paired essay tests or in portfolios (pp. 240-241)
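A combined score of the kind White describes can be formed as a simple weighted composite of the two parts. The sketch below is only an illustration: the weights, maximum scores, and example values are invented, and the thesis does not prescribe any particular weighting.

```python
# A hypothetical weighted composite of a multiple-choice writing score and an
# essay score, both first rescaled to a 0-10 range. Weights are illustrative only.

def composite(mc_raw, mc_max, essay_raw, essay_max, w_mc=0.4, w_essay=0.6):
    """Rescale each part to 0-10 and return the weighted combination."""
    mc_scaled = 10 * mc_raw / mc_max
    essay_scaled = 10 * essay_raw / essay_max
    return w_mc * mc_scaled + w_essay * essay_scaled

# Example: 7 of 10 multiple-choice items correct, essay scored 6 out of 10.
print(round(composite(7, 10, 6, 10), 2))   # 6.4
```

Any real weighting would have to be justified against correlation evidence of the kind discussed above, rather than chosen arbitrarily as in this example.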

2.4 Chapter summary

In closing, this chapter has presented a conceptual framework relating to three main issues. The introduction to language testing, with the relevant concepts of reliability and validity, supplied general knowledge of this realm. Then, based on this theory, a more specific understanding of writing ability assessment was discussed to provide an answer to the issue "what is the best way to test writing ability?", which was addressed in the last section of the chapter. Following this theory chapter, the research questions and the methods used to answer them are presented in the next chapter on research methodology.



CHAPTER 3

METHODOLOGY

The previous chapter dealt with the key theories as supporting background to the thesis. This chapter discusses the methods employed to obtain the data used for the findings of the study; it includes two main sections. The first section states the research questions. Accordingly, the second section describes the research design in three parts: the subjects, the instruments, and the implementation, the last under two sub-sections: data collection procedures and data analysis procedures.

3.1 Research questions

As stated, this thesis aims to investigate the two different test formats, multiple-choice and written tests, used in HGEs. The findings serve to observe their relationship in writing ability assessment and to find the most suitable test format for this large-scale exam. To achieve these aims, the following specific questions have been raised:

(1) Are there any differences in the ranking of students' writing ability between subjective (sentence and essay writing) and objective (multiple-choice) testing?

(2) Following from question (1), are there any correlations between the written test format and the multiple-choice format in the field of writing ability assessment?

(3) Do students show the same preferences or judgments on the test forms in HGEs, and should another form be used instead of the current ones?

These questions are the basis for finding out the correlations involved in assessing students' writing ability by objective and subjective tests.
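For readers who want to see what answering questions (1) and (2) looks like computationally, the sketch below uses invented scores, not the study's data (which were presumably processed in SPSS, the package listed among the abbreviations), to compute a Pearson correlation between two formats' scores, run a paired t-test on their means, and compare the resulting student orderings.

```python
# A minimal sketch of the analysis the research questions call for: correlation
# between two formats' scores and a paired t-test on their mean difference.
# The score lists are invented; the real data came from 44 first-year students.

from scipy import stats

multiple_choice  = [7.0, 8.5, 6.0, 9.0, 5.5, 7.5, 8.0, 6.5]   # hypothetical scores
sentence_writing = [6.0, 7.0, 5.5, 8.0, 5.0, 6.5, 7.5, 6.0]   # hypothetical scores

r, r_pvalue = stats.pearsonr(multiple_choice, sentence_writing)
t, t_pvalue = stats.ttest_rel(multiple_choice, sentence_writing)   # paired samples

print(f"Pearson r = {r:.2f} (p = {r_pvalue:.3f})")
print(f"Paired t  = {t:.2f} (p = {t_pvalue:.3f})")

# The rankings produced by each format can also be compared directly:
order_mc = sorted(range(len(multiple_choice)), key=lambda i: -multiple_choice[i])
order_sw = sorted(range(len(sentence_writing)), key=lambda i: -sentence_writing[i])
print("Student order under multiple choice :", order_mc)
print("Student order under sentence writing:", order_sw)
```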


3.2 Research design

To get the information needed to answer the research questions, a test and a survey were designed and carried out. In this research, data collection was mainly based on the test of three different formats, and the survey questionnaires were added to gain empirical evidence for the diverse results achieved from the test. The following describes in detail the subjects, the instruments, and the implementation of the research.

3.2.1 Subjects

3.2.1.1 The core subjects: First-year students at An Giang University

Aside from the preliminary information gathered through a questionnaire answered by 60 twelfth-grade students at Binh Khanh High School, the research was mainly conducted on 44 first-year students (school year 2008 - 2009) at An Giang University, majoring in English. They were both the test takers and the respondents to the second questionnaire, coming from two classes of the Department of Foreign Languages, Faculty of Pedagogy. These students had just graduated from high school and been admitted to the university, so they took the test in this study with their knowledge of the high-school program, since they were at the beginning of their first term at university. That is to say, they had not yet been trained in college writing, because this study aimed to investigate writing ability after three years of learning English at high school, particularly the washback of preparation for the HGE.

Most of them had graduated from various high schools in An Giang province, which brought many advantages to the researcher, since their learning process had been controlled under the same evaluation guidelines; in other words, their ability had been assessed equally among the schools of one province. Besides that, they had spent three years influenced by the change of test form to all multiple-choice questions (see Table 1.3 for more information). These factors may be considered the basis for a practical comparison of their opinions about the current test formats and their acquisition of writing ability at high school.

Some arguments against the study subjects, though, may hold that the supplementary subjects and the core ones are contrary in nature: it is true that those admitted to university are good students from various high schools (the second group), whilst the first group was required to be at an average level in one school. The reasoning, however, is that this difference may lead to a more objective viewpoint on the variety of students impacted by the exam changes.

3.2.1.2 The supplementary subjects: 12th-grade students at Binh Khanh High School

Before conducting the research on the core subjects mentioned in the previous section, another questionnaire had been designed and handed out to high-school students for the purpose of getting preliminary information as background reference for the research. This group included 60 twelfth-grade students at Binh Khanh High School (20 students randomly chosen from each of three classes), who were in the second term of the school year 2007 - 2008. The reasons for gathering information from this group are as follows:

(1) Among the four state high schools in Long Xuyen city, An Giang province, Binh Khanh was the youngest, with a young staff: the number of teachers majoring in English was 7, half of whom had 3 - 5 years' seniority while the rest had over 5 years. The school is located on the outskirts of the city, and most of its students had been ranked at an average level in HGEs.

Relying on these features, the researcher held the view that examination changes may have a stronger impact on students from this school, since most of them considered getting the High-school Graduation Certificate as the final goal of their studies, so the test formats applied in the exams seem to be a decisive factor in their learning.

(2) The questionnaire had been handed out randomly among three classes of the twelfth grade, because this survey was conducted only to obtain preliminary remarks on the use of multiple-choice testing; hence these students were not considered part of the research sample.

3.2.2 Instruments

To carry out the research, the instruments included a survey questionnaire, a test, and information for essay scoring.

3.2.2.1 Survey questionnaire

In this research, the questionnaire was designed as a supporting tool for gathering information about students' attitudes towards the current exams, as well as for finding out which writing drills students were trained in the most at high school. The questionnaire consisted of four questions covering two issues relating to the students themselves: (1) their self-assessment of the four skills trained, particularly their writing ability; and (2) their judgment of the test forms used in the HGE. The questions were simple and mostly of multiple-choice type, except for question 4, which combined multiple choice with an open-ended (verbatim) text response in order to find out the reasons or justifications for the respondents' choices. The following are descriptions of the questionnaire:

Questions 1 and 2 were of the same type, each with two parts (mainly multiple-choice). The first part, relating to the four language skills, was built to observe the students' English ability (which skill was trained the most in the high-school program and, conversely, which one received the least attention). The second part of these questions also played a crucial role in the analysis of the key questions concerning writing ability, because it offered three main reasons relating both to the process of teaching and learning at high school and to students' attitudes. Aside from the three options, students were given a chance to express their own answer with the option "Others" if they were unsatisfied with the suggested ideas, and such viewpoints from students were naturally expected to provide extra useful information for the thesis.

Question 3 was built in the same way as the previous questions, that is to say the information was collected from two parts, also under multiple choice. However, this question focused on the typical writing exercises applied at high school; in line with the purpose of this thesis, the researcher stressed three different formats: multiple-choice exercises, sentence writing, and short paragraph writing. One more option ("Others") was also included in this question to obtain varied information from the students.

The last question was somewhat different from the above, with five options and an open question to collect the reasons for the students' choices. This question was intended to find out the most favored format through their learning process. The three formats mentioned in this thesis, along with their combinations, were listed as five options, partly contributing to the researcher's expectation of collecting persuasive evidence for the recommendation section. In addition, through the various judgments on these test formats, a list of strengths and weaknesses of each format was partly revealed.

3.2.2.2 Test

The test is a compilation of the writing sections of the HGEs of 2004 and 2006, including three parts: multiple choice, sentence writing, and short paragraph writing, with the purpose of investigating the correlation coefficients among these formats. The content of each part is not the focus of this research, as the items were chosen randomly from the HGEs; rather, the intended formats, which played an important part in designing this test, were selected in light of the changes the MOET made to the examinations over the years.
