MINISTRY OF EDUCATION AND TRAINING HO CHI MINH CITY OPEN UNIVERSITY
AN INVESTIGATION INTO WRITING ASSESSMENT: THE USE OF MULTIPLE-CHOICE AND WRITTEN TESTING
A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF ARTS (TESOL)
Submitted by NGUYEN THI XUAN BINH
Supervisor
Dr VU THI PHUONG ANH
Ho Chi Minh City, December 2010
ABSTRACT
In the effort to renew our national examinations to increase testing accuracy and fairness, the Ministry of Education and Training (MOET) has applied the multiple-choice format to several subjects, among which English is included. However, in the realm of writing assessment, English has proved the most controversial, raising the question: "Can a multiple-choice testing format assess students' true writing ability?"
The purposes of this thesis are, therefore, to find out whether there are any differences between using the multiple-choice format and the previous format (known as written testing) to assess writing ability, and to search carefully for the most suitable format.
With the stated aims, two tools were employed in this research. (1) A test including three formats (one with multiple-choice writing, the others with sentence writing and paragraph writing) was first conducted to observe their differences in rating students. In addition, the result of another test of writing (an essay) taken before this study was also used to supply the readers with evidence of students' true writing ability. All the data obtained were analyzed and interpreted using descriptive statistics and correlation coefficients. (2) Then a questionnaire was administered to these students to dig out more information on their background knowledge and their attitudes towards these formats.
The results showed that most students do not obtain similar results when taking different test formats: the mean score of the multiple-choice test is the highest, whilst the two others are relatively similar. A combination of these test types was then considered, because of the closer relationship between their average total score and the score of the separate essay test. In addition, it is noticeable that most students prefer this new type, for a range of interesting and insightful reasons. From these findings, some recommendations were made for considering a mixed-format assessment of students' writing ability in high-school graduation examinations.
TABLE OF CONTENTS
CERTIFICATE OF ORIGINALITY
ACKNOWLEDGEMENTS
ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES AND FIGURES
ABBREVIATIONS
INTRODUCTION
0.1 Statement of the problem
0.2 Aims and Overview of the research
CHAPTER 1: BACKGROUND TO THE STUDY
1.1 Vietnamese crucial examinations: Criticism
1.2 An appearance of "the overall project to innovate the system of HGEs and entrance examinations to university and college since 2005"
1.3 The changes of testing formats used in HGEs
1.4 The issue of instruction in writing ability at high school
1.5 Chapter summary
CHAPTER 2: LITERATURE REVIEW
2.1 Language testing
2.1.1 What is language testing?
2.1.2 What is a good language test?
2.1.3 Main relevant test qualities
2.1.3.1 Reliability
2.1.3.2 Validity: face validity
2.1.3.3 Reliability and Validity
2.2 Writing ability assessment
2.2.1 What is writing?
2.2.2 Writing test formats - subjective format and objective format
2.2.2.1 Objective testing
2.2.2.2 Subjective testing
2.3 The best way to assess writing ability: a combination of testing formats
2.4 Chapter summary
CHAPTER 3: METHODOLOGY
3.1 Research questions
3.2 Research design
3.2.1 Subjects
3.2.1.1 The core subjects: First-year students at An Giang University
3.2.1.2 The supplementary subjects: Grade-12 students at Binh Khanh High School
3.2.2 Instruments
3.2.2.1 Survey questionnaire
3.2.2.2 Test
3.2.2.3 Essay scoring method
3.3 Implementation
3.3.1 Data collection procedures
3.3.1.1 Conducting the test
CHAPTER 4: RESULTS AND DISCUSSION
4.1.2 Central Tendency and Dispersion
4.1.3 The skew of distribution
4.2.1 Correlations among the different formats in the test
4.2.1.1 Correlation between Multiple-choice testing and Sentence writing
4.2.1.2 Correlation between Multiple-choice testing and Paragraph writing
4.2.1.3 Correlation between Sentence writing and Paragraph writing
4.2.2 Correlations between the two tests
4.3 T-test: Testing hypotheses about the difference of average value
4.3.1 Among the formats used in the major test
4.3.2 Between the major and minor tests
4.4 Questionnaire analysis
4.5.1 Score distributions
4.6 Chapter summary
CHAPTER 5: CONCLUSION AND RECOMMENDATIONS
5.1 Conclusion
5.2 Recommendations
5.2.1 Recommendation 1: Suggestion for combination of different test formats
APPENDIX 1
APPENDIX 2: STUDENT QUESTIONNAIRE
APPENDIX 3: UNORDERED LISTING OF SCORES
APPENDIX 4: UNORDERED LISTING OF SCORES FOR PART 3
APPENDIX 5: CLASSIFY STUDENTS INTO GROUPS FOR MULTIPLE-CHOICE TEST
APPENDIX 6: CLASSIFY STUDENTS INTO GROUPS FOR SENTENCE WRITING TEST
APPENDIX 7a: CLASSIFY STUDENTS INTO GROUPS FOR ESSAY TEST SCORED BY GROUP ONE
APPENDIX 7b: CLASSIFY STUDENTS INTO GROUPS FOR ESSAY TEST SCORED BY GROUP TWO
APPENDIX 7c: A COLLECTION OF STUDENTS' RANKING FROM THE THREE FORMATS
APPENDIX 8: COLLECTED REASONS FOR QUESTION 4 (IN THE QUESTIONNAIRE)
APPENDIX 9: QUESTIONNAIRES DATA ANALYSIS
APPENDIX 10: GUIDELINE FOR APPLYING MULTIPLE-CHOICE TESTING TO ENGLISH TEST IN 2006
APPENDIX 11: HIGH-SCHOOL GRADUATION EXAM PAPERS FROM 2003 TO 2009
LIST OF TABLES AND FIGURES

TABLES
Table 1.1: The test structures in 2003 and 2004
Table 1.2: The test structures in 2005
Table 1.3: The test structures through 2006 - 2009
Table 4.1: Frequency table for part 1
Table 4.2: Frequency table for part 2
Table 4.3a: Frequency table for part 3 - Group 1
Table 4.3b: Frequency table for part 3 - Group 2
Table 4.4: A summary of frequency distribution
Table 4.5: Central Tendency and Dispersion
Table 4.6: Correlation between Multiple-choice and Sentence writing
Table 4.7: Correlation between Part 1 (Multiple-choice) and Part 3 (Paragraph writing) scored by Group 1
Table 4.8: Correlation between Multiple-choice and Paragraph writing scored by Group 2
Table 4.9: Correlation between essay scorer groups
Table 4.10: Correlation between Sentence writing and Paragraph writing scored by Group 1
Table 4.11: Correlation between Sentence writing and Paragraph writing scored by Group 2
Table 4.12: Correlation between Part 1 of the major test and the minor test
Table 4.13: Correlation between Part 2 of the major test and the minor test
Table 4.14: Correlation between Part 3 (scored by Group 1) of the major test and the minor test
Table 4.15: Correlation between Part 3 (scored by Group 2) of the major test and the minor test
Table 4.16: Multiple-choice and Sentence writing
Table 4.17: T-test among the formats
Table 4.18: T-test between the major and minor tests
Table 4.19a: Question 1 analysis
Table 4.19b: Reasons for Q.1 analysis
Table 4.20a: Question 2 analysis
Table 4.20b: Reasons for Q.2 analysis
Table 4.21: Question 3 analysis
Table 4.22a: Question 4 analysis
Table 4.22b: Reasons for Q.4 analysis

FIGURES
Figure 1.1: Saville's model (2006) of stakeholders' roles in the testing community
Figure 4.1:
ABBREVIATIONS
HGE: High-school graduation examination
Dr: Doctor
MOET: Ministry of Education and Training
SPSS: Statistical Package for the Social Sciences
UEE: University entrance examination
INTRODUCTION
In this introductory section, the purpose of this thesis will be clarified and stated against the real situation of Vietnamese education in general, and the context of An Giang in particular, after observing the changes of HGEs over the past few years. Besides that, the aims of the research and an overview of the thesis will be presented as well.
0.1 Statement of the problem
Since the MOET decided to apply the multiple-choice testing format to HGEs (2005), there has been a salient controversy surrounding this change, with a variety of contrary views ranging from support to rejection. These viewpoints stem from the following causes:
Firstly, at the early stage of using this new format, objective testing was not popularized to all stakeholders so as to gain agreement in society, so some educators viewed this kind of test with suspicion, since they stressed its drawbacks, which can be stated as follows:
- For test designers: compared to other types, it is more challenging to write good multiple-choice items, because plausible distractors (the incorrect answer options) are the most important requirement of this type; writing them is therefore time-consuming. For that reason, some designers tend to favour "recall"-type questions, as these are easy to write, but the failure is that test takers are able to guess the answers throughout a test.
In addition, the issue of language knowledge and skills caused a lot of controversy. Some educators argued that this test type could not measure all language skills or the ability for critical and logical thinking. However, others advocated this test type as it
- For test takers: multiple-choice testing cannot measure students' overall performance; it sometimes assesses recognition over recall, because test-takers may attempt to guess rather than determine the correct answer owing to their inability to answer a particular question. Therefore, a number of students have a chance of receiving a mark for a question they do not know. For the productive skills (speaking and writing), it does not train students to express their ideas directly or to develop logical, creative thoughts.
Secondly, it is sad but true that the training process in high schools in Vietnam mainly aims at examinations, that is to say, teaching and learning what will be tested. The consequence of this tendency is negative backwash on students and even teachers, as they only focus on the test format, namely multiple-choice; as a result, students cannot achieve proficiency in English.
Over my years of teaching at An Giang University, I have recognized that most freshman students majoring in English did not perform well in writing. The possible explanation for this problem is that these students concentrated too much on multiple-choice testing, which is the main testing format applied in both HGEs and UEEs.
Moreover, the question of how a variety of language abilities can be measured by only one kind of test (i.e. direct or indirect testing) causes testers a lot of difficulty, since each skill has its specific characteristics and so requires a different way of testing. For example, if teachers want to know how well students pronounce a language, the best way is to get them to speak. On the other hand, if they want to check students' grammar structures, it is better to use multiple-choice testing (Hughes, 2003).
A question has arisen in the researcher's mind: how does an objective test
In the face of such a question, the study was carried out with the hope of identifying the appropriate kind of testing - multiple-choice or written testing - whose results can truly reflect students' real writing ability.
0.2 Aims and Overview of the research
The research was carried out with the aims of:
- Investigating the relationship among various testing formats and certain related problems
- Finding out the best test type for writing assessment in HGEs
In order to achieve these aims, the data were collected from two main sources: (1) A mixed-format test was designed to observe students' writing ability through three different formats: multiple-choice testing, sentence writing, and paragraph writing.
(2) Questionnaires for students were composed of four primary questions to identify students’ perceptions of writing assessment
The thesis consists of five chapters as follows: Chapter 1: Background to the study
This chapter mainly focuses on the setting of the research, based on background information and practical viewpoints.
Chapter 2: Literature Review
This chapter reviews the relevant literature on language testing, test reliability and validity, and writing ability assessment.
Chapter 3: Methodology
In this chapter, based on the stated aims, the research design and implementation are framed, involving the setting of the study, the data collection procedure, and the data interpretation process.
Chapter 4: Results and Discussion
This chapter presents the findings of the study through description and analysis of the collected data and, finally, discussion of the study results. An interpretation of the research findings is then displayed together with a conclusion.
Chapter 5: Recommendations and Conclusion
In this chapter, recommendations for the problematized issues are given. The last parts reveal the limitations of the study and offer suggestions for further research.
CHAPTER 1
BACKGROUND TO THE STUDY
This chapter addresses the contexts relevant to the issue raised, supplying the readers with the origin of the subject. Besides, it provides background information for general understanding. There are four sections presented in this chapter. Section 1 draws a real picture of the crucial Vietnamese examinations under two primary criticisms. Section 2 then mentions the MOET's project to innovate the examination system in Vietnam, in which the multiple-choice format was introduced as the new trend in HGEs and entrance exams. Section 3 is considered a subsequent impact of the testing innovation; it also gives an overall picture of the HGEs in the transition period by presenting the various test structures from 2002 to 2009. Finally, the last section describes the issue of teaching writing ability in high school through its primary techniques. These contents are presented in a logical chain from large to more detailed problems so that the research issue can be delved into. At the same time, the two last sections also supply the background information expected for the research questions.
1.1 Vietnamese crucial examinations: Criticism
For high-stakes examinations, Marchant (2004) carefully explained the consequences of the school-wide average scores a school obtains as well as of the high or low scores individual students obtain:
High school-wide scores may bring public praise or financial rewards; low scores may bring public embarrassment or heavy sanctions. For individual students, high scores may bring a special diploma attesting to exceptional academic accomplishment; low scores may result in students being held back in grade or denied a high school diploma. (p.1)
Accordingly, for certain purposes, the HGE (and even the university entrance exam) in Vietnam may be criticized from various viewpoints because of its weaknesses. These exams are often criticized for the following main reasons:
- Causing stress for the stakeholders. As the results play an important part in deciding students' future, students have to study for many years to prepare for the exam. Besides, the exams are often organized on a large scale in Vietnam, so all the stakeholders tend to be involved for a long time before and after the exam.
Saville's model (2006), explaining the roles of stakeholders in the testing community, is presented in Figure 1.1 below.
[Figure 1.1: Saville's model (2006) of stakeholders' roles in the testing community. The figure maps the stakeholders in the testing community (government agencies, learners, parents/carers, teachers and heads, school owners, receiving institutions, employers, professional bodies, test writers, consultants, academic researchers, examiners, test centre administrators, publishers, inter alia) onto the elements of the testing system (test construct, test format, test scores, test conditions, and test assessment criteria), with input to test design provided by stakeholders and decisions made by stakeholders using test scores in the context of test use.]
- Students' results alone are used for many crucial decisions such as continued education (college/university), the graduation certificate (high-school diploma), job qualification, etc. Some educators argue that evaluating the knowledge students have obtained within three years of learning at high school, or deciding their future, on just a single test is not the best solution; it may be unfair for some students because of unexpected events happening during the exam.
Therefore, the MOET's overall project to innovate the system of examinations at high school in Vietnam was issued, yet it also generated salient controversy
instead of one large test result for students' further education after high school. This project will be presented in the very next part.
1.2 An appearance of “the overall project to innovate the system of HGEs and entrance examinations to university and college since 2005”
It is admitted that testing and evaluation is one of the most important stages of education, since it has a strong impact on the teaching and learning process. Based on this acknowledged role, the MOET made an enhancement to the field of measurement and evaluation, which is in essence an innovation of the nation-wide examinations along with more serious formative assessment at school.
In Vietnam, there are four high-stakes examinations currently organized every year: the HGE (at the end of May and early June), the university and college entrance examination (in early July), the college entrance examination (at the end of July and early August), and the entrance examination to vocational secondary schools (after the previous exam). They are held under the supervision of the MOET; hence the MOET's project tends to affect them all with changes to the examinations, particularly in test format.
In the framework of the mentioned project (see Appendix 1 for further information), the MOET advocated applying objective testing to foreign languages (including English, Chinese, French, and Russian) in HGEs and university and college entrance exams from 2005. However, it was not until 2006 that language tests in the multiple-choice format (with four choices A, B, C, D) were put into practice. A more detailed description is given in the next part.
1.3 The changes of testing formats used in HGEs
using multiple-choice testing in most crucial examinations would become predominant in Vietnam, its adoption being regarded as a modern, scientific development. It is obvious that the trend of using multiple-choice testing in HGEs and UEEs was gradually expanded from 2002, and by the 2006 exam the written form had been rejected completely. Following is a presentation of the test structures (note: the focus is on test format, not test content) from 2001 to 2009, collected by the writer. They are divided into three stages to give the readers a logical overview of the changes through the years.

| Test content | 2000-2001 (written format) | 2001-2002 (written and multiple-choice format) | 2002-2003 (written and multiple-choice format) | 2003-2004 (written and multiple-choice format) |
|---|---|---|---|---|
| Question I: Completing the passage with correct tense agreement | x | x | x | x |
| Question II: Filling the blanks with appropriate prepositions (NOT GIVEN) | x | | | |
| Or: Completing the sentences by choosing the best option for each blank | | x | x | x |
| Question III: Giving the correct form of the words in brackets | x | x | x | x |
| Question IV: Reading comprehension (making questions from the suggested words and answering them; choosing true or false) | x | x | | |
| Or: Reading the passage and answering the questions | | | x | x |
| Question V: Completing the passage by filling the blanks with suitable words (NOT GIVEN) | x | x | x | x |
| Question VI: Finishing the sentences in such a way that they are similar in meaning to the original sentences | x | x | x | x |
| Question VII: Writing complete sentences with suggested words | x | | | |

Table 1.1: The test structures through 2001 - 2004 (under WRITTEN and MULTIPLE-CHOICE questions)

The 2005 test comprised the following questions:
Question I: Completing the passage with correct tense agreement by choosing the best options
Question II: Completing the sentences by choosing the best option for each blank
Question III: Completing the passage by filling the blanks with suitable words (NOT GIVEN)
Question IV: Reading comprehension (making questions from the suggested words and answering them; choosing true or false)
Question V: Finishing the sentences in such a way that they are similar in meaning to the original sentences
Question VI: Writing complete sentences with suggested words

Table 1.2: The test structures in 2005 (under WRITTEN and MULTIPLE-CHOICE questions)

| Test content | 2005-2006 | 2006-2007 | 2007-2008 | 2008-2009 |
|---|---|---|---|---|
| 1. Phonology: word stress / ending sound / vowels or consonants (odd one out); mistake recognition | 3 Qs | 5 Qs | 5 Qs | 5 Qs |
| 2. Vocabulary, grammar and culture issues: filling the blanks of the sentences by choosing the best options (comprising tense agreement, sentence structures, connectors, prepositions, word choice, simple communicative functions, etc.) | 24 Qs | 25 Qs | 25 Qs | 25 Qs |
| 3. Reading skill: filling the blanks of the passage by choosing the best options; reading comprehension | 15 Qs | 10 Qs each part | 10 Qs each part | 10 Qs each part |
| 4. Writing ability: completing the sentences by choosing the best options (focus on sentence level) | 8 Qs | 10 Qs each part | 10 Qs each part | 10 Qs each part |

Table 1.3: The test structures through 2006 - 2009
According to the structures presented in the above tables, the English test in 2006 comprised 50 multiple-choice questions for students enrolled in both the three-year and the seven-year high-school English programs, and the candidates were allowed 60 minutes to do the test.
1.4 The issue of instruction in writing ability at high school
In the High-school Program for the English subject (pilot) issued by the Ministry of Education and Training in 2002, writing instruction was generalized under the following two objectives: (1) writing for personal communication, including letters, invitation cards, etc.; describing or reporting personal routines and classroom activities; or filling in forms and surveys; (2) paragraph writing on learnt topics, using the language specified in the scope of the high-school program (Hoang et al., 2006, p. 231). However, teaching writing in most high schools in Vietnam has been carried out with techniques at sentence level.
From the above statements, training in writing ability focuses on the following main types of exercise:
- Finishing sentences in such a way that they are similar in meaning to the original sentences. Students are given sample sentences with a clear meaning and target structures, then they are asked to write other sentences with similar meaning but different structures.
- Writing complete sentences with suggested words, based on the required structures.
- Explaining the original sentences by sentence transformation.
Paragraph writing and essay writing seem to have been neglected in most high schools, for the reasons that they are time-consuming and are not used in HGEs.
1.5 Chapter summary
Reading this chapter, the readers obtain essential information on the issue raised in this thesis and are provided with the basis for specific discussion in the following chapters. This information relates to the project of changing testing and assessment in HGEs, the changes in test structure from exam to exam, and a description of the techniques applied in teaching writing in high schools. It serves as the ground for later arguments.
CHAPTER 2
LITERATURE REVIEW
In chapter 1, relevant background information to the study was presented in detail. This chapter provides the reader with the theory of writing assessment in three portions: (1) language testing, with the issues of reliability and validity; (2) tests of writing ability and the test types focused on (multiple-choice questions and written testing); and (3) a discussion of the best way to assess writing. A conclusion drawn from this reviewed literature is expected to answer the question of which test type and form should be the best choice for testing writing ability in a large-scale, high-stakes examination.
2.1 Language testing
Reading this section, readers will find two issues addressed: a discussion of language testing, and of what makes a good language test, with the qualities relevant to writing ability assessment.
2.1.1 What is language testing?
In language teaching, testing plays a crucial role among the components constituting an instructional program. Although they serve diverse purposes, in that "While the primary purpose of other components is to promote learning, the primary purpose of tests is to measure" (Bachman and Palmer, 2000, p.19), tests are implicitly understood to be constructed for teaching purposes. It is clear that testing is regularly conducted during the teaching and learning process, with the participation of all the stakeholders involved and with various purposes for the test types used.
According to Bachman and Palmer (2000, p.8), "Language tests can be a valuable tool for providing information that is relevant to several concerns in language teaching". On this basic tenet, language tests are loaded with a variety of roles: as evidence for feedback on the effectiveness of a teaching program, as a decision-making ground for determining learning materials and activities or for making inferences about students' language abilities, and finally as a tool to clarify instructional objectives. Therefore, it is obvious that a test used correctly is a significant source of useful information for the teaching and learning process.
2.1.2 What is a good language test?
Most educators find it hard to define a good test in general, since it depends on the objectives of language use, the purpose of the test, and the resources available (Hughes, 2003, p.8). Hughes accordingly suggested the following ideal characteristics for a test or testing system:
- consistently provides accurate measures of precisely the abilities in which
we are interested
- has a beneficial effect on teaching
- is economical in terms of time and money
The above factors may be briefly interpreted as follows: validity concerns the accuracy of the measurements made, and a test may be said to possess reliability if it measures consistently. At the same time, the concept of positive backwash is inferred from the phrase "beneficial effect on teaching", and efficiency, relating to educational administration, is another way of referring to the economical issues of time and money.
Nevertheless, a question arising from these factors entangles most educators: "Can we obtain all these essential factors in developing a single test?" The answer may be "Not always", because in some cases validity and reliability seem to be in contrast, or "a reliable test may not be valid at all" (Hughes, 2003, p.50), although a valid test must be reliable. Consequently, a test type tends to be chosen according to certain essential qualities required, regardless of their essentiality to the use of tests, as the following claim shows: "While reliability is a quality of test scores themselves, validity is a quality of test interpretation and use" (Bachman, 1990, p.25). For further argumentation, these two sides of a test will be discussed more specifically in part 2.1.3.
2.1.3 Main relevant test qualities
We all agree that, to measure the quality of a test, reliability and validity are referred to as the key factors. They provide the major justification in the process of making inferences or decisions about an individual based on test scores. For the main concern of this study, apart from the other qualities that make up the usefulness of a test, two qualities will be discussed in detail: reliability and validity.
2.1.3.1 Reliability
In this sub-section, the issue of reliability is presented through four contents: (a) definition, (b) affecting factors, (c) reliability of objective and subjective tests, and finally (d) inter-rater reliability
(a) Definition: Reliability has been defined by different authors in different ways, among which scoring consistency seems to be mentioned the most. In accordance with Berkowitz, Wolkowitz, Fitch, and Kopriva (2000), cited in Rudner and Schafer (2000), reliability was explained, under the condition that "the scores are indicative of properties of the test takers", as follows:
the degree to which test scores for a group of test takers are consistent over
repeated applications of a measurement procedure and hence are inferred to
be dependable and repeatable for an individual test taker (para.2)
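To make this notion of consistency concrete, it is often expressed through the classical test theory decomposition sketched below. This formulation belongs to the general measurement literature rather than to the sources quoted above, so it is offered here only as a supporting sketch:

```latex
X = T + E, \qquad
\rho_{XX'} = \frac{\sigma_T^{2}}{\sigma_X^{2}} = 1 - \frac{\sigma_E^{2}}{\sigma_X^{2}}
```

Here X is the observed score, T the (unobservable) true score, E random measurement error, and the reliability coefficient is the proportion of observed-score variance that reflects true differences between test takers rather than error; scores that are "consistent over repeated applications" are exactly those for which this proportion is high.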
(b) Affecting factors: Three factors are considered the main causes affecting the reliability of a test: the test itself, the test-takers, and scoring factors. For the first, each test type has its own characteristics; for instance, multiple-choice is concerned with the effectiveness of the distractors, the correct answer, and the difficulty of the items. The second is clearly explained by Rudner and Schafer (2002, p.22) as students' performance during the testing process. The last, and the one of most concern in this paper, is consistency in scoring, since "scorers are not always consistent; the scorers tend to change their criteria while scoring and are subject to biases" (Rudner, 1992, cited in Rudner and Schafer, 2002, p.22). Hence, there exist many ways to improve reliability, from consistent performance to scorer reliability, and thereby to move partly towards a good test.
(c) Reliability of objective and subjective tests: For objective and subjective tests, which are the main focus of this research and will be described in the next part, reliability tends to be discussed as an issue of scoring procedure, the first type being more reliable in this respect than the second. Among the many factors creating the reliability of a test, consistency is often taken as the measure of reliability (Bachman and Palmer, 2000, p.19). In some cases, subjective tests for instance, the scorers responsible for the same paper may reach conclusions about it as different as chalk and cheese, because one of them may score more strictly than the others. As a result, the scores obtained are not consistent and cannot be considered reliable, which makes the test unreliable.
Once again, Bachman and Palmer (2000) affirmed that:
Reliability is clearly an essential quality of test scores, for unless test scores are relatively consistent, they cannot provide us with any information at all about the ability we want to measure (p.20)
minimize inconsistency through test design, or by choosing test tasks, since "Of the many factors that can affect test performance, the characteristics of the test tasks are at least partly under our control" (Bachman and Palmer, 2000, p.20).
(d) Inter-rater reliability: In writing assessment, inter-rater reliability has been considered a dilemma of essay evaluation, so many studies have been conducted on the various issues relating to this problem. Shohamy, Gordon and Kraemer (1992) wrote an article, "The effect of raters' background and training on the reliability of direct writing tests", in The Modern Language Journal, Vol. 76, No. 1, pp. 27-33. Baer et al. (2004) compared the creativity manifested in the writing of more than 100 8th-grade students in their study. Howell et al. (2005) described the notion of inter-rater reliability as follows:
Inter-rater reliability is the extent to which two or more individuals (coders or raters) agree. Inter-rater reliability addresses the consistency of the implementation of a rating system.
In the researcher's mind, the root of consistency in essay rating is designing an adequate scoring rubric so as to obtain similar responses to the same sample, because a well-built scoring scheme itself provides guidance for evaluating students' writing, given that the judgments of two or more individuals may vary depending on the criteria established (Moskal & Leydens, 2000). To enhance inter-rater reliability in a large-scale examination, scorer training and monitoring are called for, despite the attendant financial cost. In short, there is the conclusion that "By developing a pre-defined scheme for the evaluation process, the subjectivity involved in evaluating an essay becomes more objective" (Moskal & Leydens, 2000).
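As an illustration of how such consistency can be checked in practice, the short sketch below correlates the scores that two groups of raters might give to the same set of essays. The figures are invented for illustration and are not the data reported later in this thesis; the statistics themselves (a Pearson correlation plus simple agreement rates) are standard ways of reporting rater consistency.

```python
# A minimal, hypothetical sketch of checking inter-rater consistency between two
# groups of essay scorers. The score lists are invented, not the thesis data.
from scipy.stats import pearsonr

group1 = [6.5, 4.0, 7.5, 5.0, 8.0, 3.5, 6.0]   # essays scored by rater group 1
group2 = [6.0, 4.5, 7.0, 5.5, 8.5, 3.0, 6.5]   # the same essays scored by rater group 2

r, p_value = pearsonr(group1, group2)
print(f"Inter-rater correlation r = {r:.2f} (p = {p_value:.3f})")

# Exact agreement and agreement within one point (on a 10-point scale) are other
# common ways of reporting how consistently a rating scheme is applied.
exact = sum(a == b for a, b in zip(group1, group2)) / len(group1)
within_one = sum(abs(a - b) <= 1.0 for a, b in zip(group1, group2)) / len(group1)
print(f"Exact agreement: {exact:.0%}, agreement within 1 point: {within_one:.0%}")
```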
2.1.3.2 Validity: face validity
Whereas test reliability relates to the consistency of measurement, test validity refers to the accuracy of measurement. Hughes (2003) clarified that
A test is said to be valid if it measures accurately what it is intended to measure, or a test created can be inferred to be a true tool to measure students' ability to perform their knowledge at certain levels required. (p.26)
In other words, "if a test serves its intended function well, it is said to be valid; if it does not, it is invalid" (Nunnally, 1972, p.21). According to Alderson et al. (1995), there are three types listed in the frame of validity: internal validity, external validity, and construct validity. However, in relation to the research topic of this thesis, only internal validity is mentioned, with face validity as the main discussion. Internal validity is discussed as a conceptual approach in which both the subjective and the objective judgments required of a test are mostly based on assessment of the test itself (such as its form, format, and content). That means it is mainly concerned with the 'perceived content' of the test and its 'perceived effect' (Alderson et al., 2003). To assess this feature of a test, three widespread methods should be mentioned. The first is face validity, by which the value of a test is judged by various kinds of stakeholders. The second is content validity, a real dilemma because test content can only be judged by professional experts or linguists. The last is response validity, primarily conducted by test-takers. However, within the scope of this research, internal validity will be discussed only in terms of face validity, for its relevance to the test types mentioned.
Face validity is not considered the predominant method of assessing the validity of a test; still, in fact, it is the face of a test that is first examined when discussing whether a test is good or bad. That is, a test with a poor format, poor items, or unclear instructions may be said to be "not valid". In addition, according to Ingram (1977, p.18), face validity refers to the test's "surface credibility or public acceptability", and judgments on a test are regularly made subjectively by 'non-experts', administrators, and even by students or test-takers. Their assessment, hence, may not be officially accepted in spite of its admitted impact on the validity of a test.
Finally, a test is called valid if it appears to be valid, or if a test of language skills includes adequate elements of those skills. A question arising from this argument, and a critical issue related to face validity, is: which one is more valid in a test of writing ability - the multiple-choice format or the essay format?
2.1.3.3 Reliability and Validity
The contrast between reliability and validity (as mentioned before) is manifested in the diversity of testers' viewpoints in both principle and practice. In principle, some testers assume that "a test cannot be valid unless it is reliable", or that "it is quite possible for a test to be reliable but invalid" (Alderson et al., 1995, p.187). Nevertheless, in practice, reliability alone is not sufficient, since a test can only be conducted when it is rated highly on all test qualities. In particular, when these two factors are both present, which one is maximized depends on the test purpose. A typical example of this argument is the multiple-choice test: it is called a reliable test because of its objective and consistent scoring, yet it is criticized as invalid for measuring students' ability to use English in real life (Alderson et al., 1995, p.187). However, it is chosen for the purpose of reliable scoring.
2.2 Writing ability assessment
In this part, some explanation of writing ability and the types of test used to assess it will be included to give insight into writing assessment.
2.2.1 What is writing?
Writing is considered a manifestation of human beings through language, distinguished as a textual medium from the non-textual media of other creatures. This has been shown by archaeologists and linguists who have studied ancient illustrations such as cave drawings and paintings. Thus, it is accepted that the ability to organize a text marks the existence of people in a developed community. In other words, only people in this world can develop a system of language, and the writing system is one of the most central features of language. According to Ager (1998, par.1), "Writing is a method of representing language in visual or tactile form. Writing systems use sets of symbols to represent the sounds of speech, and also have symbols for such things as punctuation and numerals".
Besides, writing is a special way of communication among people who share the same writing system, with the implication that they belong to an organized community; this is because writing carries a great formality, holding numerous principles as well as rules that writers observe when organizing their thoughts.
In short, writing is the highest development of a language, one that covers the whole range of its features.
2.2.2 Writing test formats - subjective format and objective format
Writing has been defined in the above section, and the ways used to assess a person's writing ability always attract the attention of educators. As indicated in the previous parts (in 2.1), the choice of test type depends on the objectives of a test. To deal with testing writing ability, all the features involved should be observed. As a result, the advantages and disadvantages of subjective and objective tests will be discussed in this section on the ground of their nature, in that "the subjective type of test allows students to write and use language in their answers while the objective type does not" (Tunku, 2000). Yet it is the system by which a test is scored that gives the test its name, not really the test itself, so scoring is also the main focus of this study.
2.2.2.1 Objective testing
In Vietnam, objective testing (whose most common question types are multiple-choice, true/false, matching items, and completion) is considered a new trend compared with subjective testing. The advantages of these forms are most obviously manifested in the scoring issue: the scoring process is conducted objectively, by following a fixed answer key or by using scanning machines and computers. Besides, with the selected-response item formats known as multiple-choice, matching, and true-false (Zimmaro, 2003, p.15), test takers cannot supply alternative answers of their own, so objectivity will always be maintained in this type.
However, this kind of testing has its own disadvantages. Because it requires test takers to choose the correct response from several suggested alternatives, it is said to embed more guessing in this type. In addition, this kind of test has had an especially severe effect on English-majored students (at university or college) in terms of lack of writing ability. This results from the fact that students were not trained with adequate ability in high school, being prepared for examinations (mostly through multiple-choice and sentence-writing exercises which mainly focus on grammar) rather than equipped with sufficient knowledge (the four language skills) for further study.
In addition, some further disadvantages of this type are that it is time-consuming to write good items, that it is less valid than a subjective test, and that it is unsuitable for assessing certain abilities. Despite being a controversial topic, it still remains popular due to its utility, reliability, and cost effectiveness.
2.2.2.2 Subjective testing
Contrasting with objective testing is subjective testing, which requires the test takers to write and present an original answer, including short-answer essays, extended-response essays, problem solving, and performance tasks (Zimmaro, 2003, p.15). To measure students' writing ability, these tests are encouraged, with many arguments based on their several advantages and disadvantages.
On the good side, subjective tests show their strongest point in face validity, which allows the test takers to perform their actual ability. Students can demonstrate written expression in such a logical order that their results are precisely measured. Besides that, learners' language knowledge is assessed across a wide range, including mechanics, grammar, style, organization, and logical development. It is clear that these fields of knowledge cannot be shown in objective tests, since those are a kind of discrete language-properties assessment (emphasizing mechanics, grammar, and vocabulary only) while other significant aspects of written language (organization, content, and coherence) are ignored (Albertini et al., 1996, p.75).
The biggest drawback of the subjective test, revealed by its name, is the variation in marking, as shown by the observation that "experienced examiners award widely varying marks not only on the same piece of work marked by other examiners, but on their own marked scripts re-marked after a passage of time" (Zimmaro, 2003). According to her, this format has been criticized for resting on loosely defined criteria with a lack of consistency from one reader to another, and grading is time-consuming and often impractical with large numbers. On the other hand, such examinations have been thought to be deficient in consistency because of essay ratings; besides, over-burdened staff, time consumption, and financial problems are considered big dilemmas that also make the exam impractical. Although it is not the best way to assess students' writing ability, the subjective form is still able to attain both validity and reliability when issues of administration, testing conditions, population factors, and scoring are given appropriate attention and adequate control (Huot, 1990).
According to Jacobs (2004), the major task in scoring essay tests is to maintain consistency, and to make sure that answers of equal quality are given the same number of points: "At once, there has existed two approaches to scoring essay in writing assessment, they are named as analytic or point method, and holistic or rating method" (p.5). The two methods are described below, followed by a brief illustrative sketch of each.
- Analytic: in this method, a model answer, in which the major components are defined and point values assigned, should be prepared before scoring so that the raters can compare the student's paper with the model answer. In addition, an analytic rubric should be used when the tester wants to see students' strengths and weaknesses, to give detailed feedback, to assess complicated skills or performance, and to support students' self-assessment of their knowledge, understanding and performance (Zimmaro, 2003, p.32).
- Holistic: this method requires a list of developed criteria on which the total quality of the examinee's answer is based. The score for each paper is decided according to the stack it falls into; these stacks are regularly sorted into three levels: the first stack includes the best answers, the second the average ones, and the poorest go into the third stack. This method is usually used for a large group of test takers when a quick snapshot of achievement is desired and a single dimension is adequate to define quality (Zimmaro, 2003, p.32).
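The following sketch contrasts the two approaches in code. The rubric components, point caps, and band cut-offs are purely illustrative assumptions; they are not taken from the thesis or from Zimmaro (2003).

```python
# A minimal, hypothetical illustration of analytic (point) versus holistic (rating)
# scoring. All components, point caps, and cut-offs are invented for illustration.

def analytic_score(components):
    """Sum the points awarded to each pre-defined component, capped at its maximum value."""
    max_points = {"content": 4, "organization": 2, "grammar": 2, "vocabulary": 1, "mechanics": 1}
    return sum(min(components.get(name, 0), cap) for name, cap in max_points.items())

def holistic_score(overall_impression):
    """Place a script into one of three stacks from a single overall judgment on a 0-10 scale."""
    if overall_impression >= 8:
        return "stack 1 (best answers)"
    if overall_impression >= 5:
        return "stack 2 (average answers)"
    return "stack 3 (poorest answers)"

essay = {"content": 3, "organization": 2, "grammar": 1, "vocabulary": 1, "mechanics": 1}
print(analytic_score(essay))   # 8 points out of a possible 10
print(holistic_score(6.5))     # stack 2 (average answers)
```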
In summary, each test format has its strong and weak points. The real challenge for the educators responsible for finding and using valid and fair assessment tools is to work out a suitable test format based on the specific purposes of education, especially in HGEs.
2.3 The best way to assess writing ability: a combination of testing formats
The strong and weak points of each format presented in the above section lead to various opinions on which test type should be applied to obtain a valid and reliable test of writing ability. Besides that, the history of testing has passed through a movement from subjective to objective testing, each presenting both strengths and weaknesses; thus most testers tend to look for a better model with the best features of these two extremes, that is to say, the overall content coverage of the first and the consistent objective scoring of the second (Madsen, 1983, p.7).
In this part, the choice of combining these two forms will be shown to reject the unfavoured elements of each. According to Driver & Krech (2001), to measure students fairly, "assessments should be valid (measuring what they intend to measure) and reliable (resulting in the same placement whenever they occur or whoever grades)" (p.18).
It is obvious, moreover, that both written and multiple-choice tests lead to problems when used alone (see the weak and strong points of each presented in the previous section), since neither format can cover both reliability and validity by itself. Breland (1996) expressed that one way to ensure reliability is to use "both free-response and multiple-choice tasks to make up the assessment" (p.23). In addition, to ensure fair assessment, White (1998) also agreed that there should be a combination of testing methods: "The best use of multiple-choice scores, if they must be employed in the area of writing, is as a portion of a test, rather than as the assessment itself" (p.240). It is also reported in some studies, moreover, that this approach existed at the Educational Testing Service (ETS): more reliable essay scores were not the aim, but a unique kind of information would be provided for the purpose of public reporting, based on the viewpoint that a single test can be reliable when considered along with other information.
In short, White (1998) emphasized the need for this kind of assessment as follows:
The results of a careful multiple-choice test, when combined with the results of a single essay test, will yield a fairer and more accurate measure of writing ability than will either test when used by itself. A preferable alternative is to score more than one writing sample, either in paired essay tests or in portfolios. (pp. 240-241)
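To illustrate what treating multiple-choice scores as "a portion of a test" could look like in practice, the sketch below computes a weighted composite of a multiple-choice score and an essay score. The 40/60 weighting is a purely illustrative assumption; no particular weighting is proposed by White, Breland, or this thesis.

```python
# A hypothetical sketch of a combined writing score in which the multiple-choice
# component contributes only a portion of the final mark. Weights are illustrative.

def combined_writing_score(mc_score, essay_score, mc_weight=0.4):
    """Weighted composite of a multiple-choice score and an essay score, both on a 0-10 scale."""
    return mc_weight * mc_score + (1 - mc_weight) * essay_score

print(combined_writing_score(mc_score=8.0, essay_score=5.5))   # 6.5
```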
2.4 Chapter summary
In closing, this chapter has presented a conceptual framework relating to three main issues. The introduction of language testing, with the related notions of reliability and validity, supplied general knowledge of this realm. Then, based on this theory, a more specific understanding of writing ability assessment was discussed to provide the answer to the issue "what is the best way to test writing ability?", which was addressed in the last section of the chapter. Following this theory chapter, the research questions and the methods used to find the answers to them will be shown in the next chapter on research methodology.
CHAPTER 3
METHODOLOGY
The previous chapter dealt with the key theories as supporting background to the thesis. This chapter discusses the methods employed to obtain the data that serve the findings of the study; it includes two main sections. The first section states the research questions. Accordingly, the second section describes the research design in three parts: the subjects, the instruments, and the implementation, the last presented under two sub-sections: data collection procedures and data analysis procedures.
3.1 Research questions
As stated, this thesis aims to investigate the two different test formats, multiple-choice and written tests, used in HGEs. The findings will serve to observe their relationship in writing ability assessment and to find the most suitable test format for this large-scale exam. To obtain these aims, the following specific questions have been raised:
(1) Are there any differences in the ranking of students' writing ability between subjective testing (sentence and essay writing) and objective testing (multiple-choice)?
(2) Following on from question (1), are there any correlations between the written test format and the multiple-choice format in the field of writing ability assessment?
(3) Do students show the same preference for, or judgment of, the test forms used in HGEs, and should another form be used instead of the current ones?
These questions are the basis for finding out the correlations between assessing students' writing ability with objective tests and with subjective tests. A sketch of the analyses they imply is given below.
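The statistics these questions call for (rank agreement, score correlation, and mean differences; cf. sections 4.2 and 4.3) can be outlined as follows. The scores used here are invented placeholders; the thesis itself computes the real figures from the test data with SPSS.

```python
# A hypothetical outline of the analyses implied by the three research questions.
# The score lists are invented; the thesis reports the real statistics in Chapter 4.
from scipy.stats import spearmanr, pearsonr, ttest_rel

multiple_choice  = [7.0, 5.5, 8.0, 4.0, 6.5, 9.0, 5.0]   # part 1 scores
sentence_writing = [6.0, 5.0, 7.5, 3.5, 6.0, 8.0, 4.5]   # part 2 scores for the same students

rho, _ = spearmanr(multiple_choice, sentence_writing)    # do the formats rank students alike? (Q1)
r, _ = pearsonr(multiple_choice, sentence_writing)       # how strongly do the scores correlate? (Q2)
t, p = ttest_rel(multiple_choice, sentence_writing)      # do the formats differ in average score?

print(f"Spearman rho = {rho:.2f}, Pearson r = {r:.2f}, paired t = {t:.2f} (p = {p:.3f})")
```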
3.2 Research design
To get the information necessary to answer the research questions mentioned, a test and a survey were designed and carried out. In this research, data collection was mainly based on the test with its three different formats, and the survey questionnaires were added to gain empirical evidence for the diverse results achieved from the test. The following describes in detail the subjects, the instruments, and the implementation of the research.
3.2.1 Subjects
3.2.1.1 The core subjects: First-year students at An Giang University
Aside from obtaining preliminary information for this research through a questionnaire answered by 60 grade-12 students at Binh Khanh High School, the research was mainly conducted on 44 first-year students (school year 2008-2009) of An Giang University, majoring in English. They were both the test takers and the respondents to the second questionnaire, and came from two classes of the Department of Foreign Languages, Faculty of Pedagogy. These students had just graduated from high school and been admitted to the university, so they carried out the test in this study with their own knowledge of the high-school program, since they were at the beginning of their first term at university. That is to say, they had not yet been trained in college writing, because this study aimed at investigating writing ability after three years of learning English at high school, in particular the washback of preparation for the HGE.
Most of them had graduated from various high schools of An Giang province. This brought many advantages to the researcher, since their learning process had been controlled under the same evaluation guideline, or in other words their ability had been assessed equally among the schools of one province. Besides that, they had spent three years influenced by the change of test form to fully multiple-choice questions (see table 1.3 for more information). These points may be considered the basis for making a practical comparison of their opinions about the current test formats and their acquisition of writing ability at high school.
Some arguments against the study subjects, though, may point to the contrast between the supplementary subjects and the core ones: it is true that those admitted to university are good students from various high schools (the second group), whilst the first group was only required to be at an average level in one school. The reasoning is that this difference may lead to a more objective viewpoint on the variety of students affected by the exam changes.
3.2.1.2 The supplementary subjects: Grade-12 students at Binh Khanh High School
Before conducting the research on the core subjects mentioned in the previous section, another questionnaire had been designed and handed out to high-school students for the purpose of obtaining preliminary information as a reference for the background of the research. This group included 60 grade-12 students at Binh Khanh High School (20 students randomly chosen from each of three classes); they were in the second term of the school year 2007-2008. The reasons for gathering information from this group are stated as follows:
(1) Among the four state high schools in Long Xuyen city, An Giang province, Binh Khanh was the youngest school, with a young staff: the number of teachers majoring in English was 7, half of them with 3-5 years' seniority and the rest with over 5 years. The school is located in the environs of the city, where most students had been ranked at an average rate in HGEs.
Relying on these features, the researcher holds the view that examination changes may have stronger impacts on students from this school, for most of them considered getting the High-school Graduation Certificate their final goal of studying, so the test formats applied in the exams seem to be a decisive factor for their learning.
(2) The questionnaire had been handed out randomly among three classes of grade 12 because this survey was conducted only for preliminary remarks on the use of multiple-choice testing; hence these students were not considered part of the research sample.
3.2.2 Instruments
To carry out the research, the instruments included a survey questionnaire, a test, and some information for essay scoring.
3.2.2.1 Survey questionnaire
In this research, the questionnaire was designed as a supporting tool for obtaining information about students' attitudes towards the current exams as well as finding out the writing drills with which students were trained the most in high school. The questionnaire consisted of four questions covering two issues relating to the students themselves: (1) self-assessment of the four skills trained, particularly their writing ability; and (2) their judgment of the test forms used in HGEs. The questions were simple and similar to the multiple-choice type, except for question 4, which combined multiple choice with an open-ended (verbatim) text item in order to find out the reasons or justifications for the respondents' choices. The following are descriptions of the questionnaire.
Questions 1 and 2 shared the same type, with two parts included (mainly multiple choice). The first part, relating to the four language skills, was built to observe the students' ability in English (which skill was trained the most in the high-school program and, vice versa, which one received the least attention). Moreover, the second part of these questions also played a crucial part in the analysis of the key questions involving writing ability, because it addressed three main reasons concerning both the process of teaching and learning at high school and the students' attitudes. Aside from the three options, students were given the chance to express their own answer through the option "Others" if they found the suggested ideas unsatisfactory, and such responses were naturally expected to provide extra useful information for the thesis.
Question 3 was built in the same way as the previous questions, that is to say, the information would also be collected from two parts under the multiple-choice form. However, this question aimed at the typical writing exercises that have been applied at high school; the researcher stressed three different formats - multiple-choice exercises, sentence writing, and short paragraph writing - in line with the purpose of this thesis. One more option ("Others") was also included in this question to gather varied information from the students.
The last question was somewhat different from the above, with five options and an open question to collect the reasons for the students' choices. This question was intended to find out the most favoured format throughout their learning process. As a result, the three formats mentioned in this thesis, along with their combinations, were listed as five options, partly contributing to the researcher's expectation of collecting persuasive evidence for the recommendation section. In addition, through the various judgments on these test formats, a list of strengths and weaknesses of each test format is partly revealed.
3.2.2.2 Test
The test is a compilation of the writing sections of the HGEs of 2004 and 2006, including three parts - multiple choice, sentence writing, and short paragraph writing - with the purpose of investigating the correlation coefficients among these formats. The content of each part of the test is not the aim of this research, because the items were chosen randomly from the HGEs; rather, the intended formats, which play an important part in designing this test, were selected relying on the changes made to the examinations by the MOET through the years.