

HO CHI MINH CITY OPEN UNIVERSITY

AN INVESTIGATION INTO WRITING ASSESSMENT: THE USE OF

MULTIPLE-CHOICE AND WRITTEN TESTING

A THESIS SUBMITTED IN PARTIAL FULFILMENT

OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF ARTS (TESOL)

Submitted by NGUYEN THI XUAN BINH

Supervisor: Dr VU THI PHUONG ANH


CERTIFICATE OF ORIGINALITY

I certify that the thesis submitted today, entitled "An investigation into writing assessment: The use of multiple-choice and written testing", is my own work.

Except where reference is made in the text of the thesis, this thesis does not contain material published elsewhere or extracted in whole or in part from a thesis by which I have qualified for or been awarded another degree or diploma.

No other person's work has been used without due acknowledgement in the main text of the thesis.

This thesis has not been submitted for any degree or diploma in any other tertiary institution.

Ho Chi Minh City, December 2010

NGUYEN THI XUAN BINH


ACKNOWLEDGEMENTS

I am very happy to express all my appreciation to my dear supporters for what they have given me, since I could hardly have completed this thesis without them.

I would firstly like to show my sincere gratitude to my supervisor, and also my lecturer, Dr Vu Thi Phuong Anh, for her inspirational guidance. This topic took shape in my mind during the time I studied with her, thanks to her practical and critical viewpoints on testing and evaluation in Vietnam. Moreover, while guiding me to accomplish this thesis, she supported me with valuable orientation and insightful discussion that I still keep in my heart.

This thesis was also able to proceed smoothly thanks to the very supportive conditions at An Giang University. Accordingly, I would like to thank the Managing Board of the Foreign Languages Department for allowing me to conduct testing on English-majored students. I also express my gratitude to all the Department staff who helped me conduct the study. In addition, I owe this thesis to the classes DH9D1 and DH9D2 at An Giang University for their vital contribution as well.

I also give my special thanks to Ho Chi Minh City Open University for giving me an opportunity to study with enthusiastic teachers and dear classmates of TESOL 1 in an academic environment. They shared useful and interesting knowledge with me during the course.

Next in my mind and my heart are the friends whose contribution was indispensable to my research, especially Le Thi Thien Huong and Tran Thi Bich Dung.

Finally, I would like to show my deep love and great thankfulness to my family, the endless supporters of my life and work, particularly in the completion of this thesis.

ABSTRACT

The purposes of this thesis are, therefore, to find out whether there are any differences between using multiple-choice and the earlier formats (known as written testing) to assess writing ability, and to identify the most suitable format after careful consideration.

With the stated aims, two tools were employed in this research. (1) A test comprising three formats (one with multiple-choice writing, the others with sentence writing and paragraph writing) was first conducted to observe differences in how they rank students. In addition, the result of another writing test (an essay) administered before this study was used to supply the readers with evidence of students' true writing ability. All the data obtained were analyzed and interpreted using descriptive statistics and correlation coefficients. (2) Then a questionnaire was administered to these students to gather more information on their background knowledge and their attitudes towards these formats. The results showed that most students do not obtain similar results when taking different test formats; the mean score of the multiple-choice test is the highest, whilst the two others are relatively similar. A combination of these test types was then considered because of the closer relationship between their average total score and a separate essay writing test. In addition, it is noticeable that most students prefer this new mixed type, for a range of interesting and insightful reasons. From these findings, some recommendations were made for considering a mixed-format assessment of students' writing ability in high-school graduation examinations.
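The statistical treatment summarized above can be illustrated with a short sketch. The analysis in this thesis was carried out in SPSS; the Python code below is only an assumed, minimal illustration with invented scores standing in for the three format scores, showing how means, standard deviations, and Pearson correlation coefficients of the kind reported in Chapter 4 are computed.

    # Minimal sketch of the descriptive statistics and correlations used in the study.
    # The score lists are invented placeholders, not the thesis data (which was analyzed in SPSS).
    from statistics import mean, stdev

    multiple_choice   = [7.5, 6.0, 8.0, 5.5, 9.0, 6.5]  # hypothetical Part 1 scores
    sentence_writing  = [6.0, 5.0, 7.0, 5.0, 8.0, 6.0]  # hypothetical Part 2 scores
    paragraph_writing = [5.5, 4.5, 7.5, 4.0, 7.0, 5.5]  # hypothetical Part 3 scores

    def pearson(x, y):
        """Pearson correlation coefficient between two lists of scores."""
        mx, my = mean(x), mean(y)
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        var_x = sum((a - mx) ** 2 for a in x)
        var_y = sum((b - my) ** 2 for b in y)
        return cov / (var_x * var_y) ** 0.5

    for name, scores in [("Multiple-choice", multiple_choice),
                         ("Sentence writing", sentence_writing),
                         ("Paragraph writing", paragraph_writing)]:
        print(f"{name:<18} mean = {mean(scores):.2f}, sd = {stdev(scores):.2f}")

    print("r(MC, Sentence)    =", round(pearson(multiple_choice, sentence_writing), 3))
    print("r(MC, Paragraph)   =", round(pearson(multiple_choice, paragraph_writing), 3))
    print("r(Sent, Paragraph) =", round(pearson(sentence_writing, paragraph_writing), 3))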


TABLE OF CONTENTS

CERTIFICATE OF ORIGINALITY i

ACKNOWLEDGEMENTS ii

ABSTRACT iii

TABLE OF CONTENTS iv

LIST OF TABLES AND FIGURES viii

TABLES viii

FIGURES ix

ABBREVIATIONS x

INTRODUCTION 1

0.1 Statement of the problem 1

0.2 Aims and Overview of the research 3

CHAPTER 1 5

BACKGROUND TO THE STUDY 5

1.1 Vietnamese crucial examinations: Criticism 5

1.2 An appearance of “the overall project to innovate the system of HGEs and entrance examinations to university and college since 2005” 8

1.3 The changes of testing formats used in HGEs 8

1.4 The issue of instruction in writing ability at high school 12

1.5 Chapter summary 13

CHAPTER 2 14

LITERATURE REVIEW 14

2.1 Language testing 14

2.1.1 What is language testing? 14

2.1.2 What is a good language test? 15

2.1.3 Main relevant test qualities 16

2.1.3.1 Reliability 16

2.1.3.2 Validity: face validity 19

2.1.3.3 Reliability and Validity 20

2.2 Writing ability assessment 21

2.2.1 What is writing? 21

2.2.2 Writing test formats - subjective format and objective format 21

2.2.2.1 Objective testing 22

2.2.2.2 Subjective testing 23

2.3 The best way to assess writing ability: a combination of testing formats 25

2.4 Chapter summary 26

CHAPTER 3 27

METHODOLOGY 27

3.1 Research questions 27

3.2 Research design 28

3.2.1 Subjects 28

3.2.1.1 The core subjects: First - year students at An Giang University 28

3.2.1.2 The supplementary subjects: 12 - grade students at Binh Khanh High school 29

3.2.2 Instruments 30

3.2.2.1 Survey questionnaire 30

3.2.2.2 Test 31

3.2.2.3 Essay scoring method 32

3.3 Implementation 35

3.3.1 Data collection procedures 35

3.3.1.1 Conducting the test: 35

3.3.1.2 Questionnaire 37

3.3.2 Data analysis procedures 37

3.3.2.1 Test scores analysis: 37

3.3.2.2 Descriptive Statistics 37

3.3.2.3 Correlation coefficient 39

3.3.2.4 Questionnaire 39

3.4 Chapter summary 40

CHAPTER 4 41

RESULTS INTERPRETATION AND DISCUSSION OF FINDINGS 41

4.1 Descriptive statistics: 41


4.1.1 Frequency distribution 41

4.1.2 Central Tendency and Dispersion 45

4.1.3 The skew of distribution 48

4.2 Correlations 48

4.2.1 Correlations among the different formats in the test 49

4.2.1.1 Correlation between Multiple - choice testing and Sentence writing 49

4.2.1.2 Correlation between Multiple – choice testing and Paragraph writing 50

4.2.1.3 Correlation between Sentence writing and Paragraph writing 51

4.2.2 Correlations between the two tests 52

4.3 T-test: Testing hypotheses about the difference of average value 55

4.3.1 Among the formats used in the major test 55

4.3.2 Between the major and minor tests 56

4.4 Questionnaire analysis 57

4.5 Discussion of findings 62

4.5.1 Score distributions 62

4.5.2 Correlation 64

4.5.3 Questionnaire 65

4.6 Chapter summary 67

CHAPTER 5 68

CONCLUSION AND RECOMMENDATIONS 68

5.1 Conclusions 68

5.2 Recommendations 69

5.2.1 Recommendation 1: Suggestion for combination of different test formats 69

5.2.2 Dealing with shortcomings of subjective scoring 70

5.2.3 Recommendation 2: Distinguishing between Formative and Summative assessments 70

5.3 Limitation of the study 72

5.4 Further study 72

5.5 Chapter summary 72

REFERENCES 73


APPENDIX 1: TEST OF WRITING 80

APPENDIX 2: STUDENT QUESTIONNAIRE 83

APPENDIX 3: UNORDERED LISTING OF SCORES 86

APPENDIX 4: UNORDERED LISTING OF SCORES FOR PART 3 88

APPENDIX 5: CLASSIFY STUDENTS INTO GROUPS FOR MULTIPLE CHOICE TEST 90

APPENDIX 6: CLASSIFY STUDENTS INTO GROUPS FOR SENTENCE WRITING TEST 92

APPENDIX 7a: CLASSIFY STUDENTS INTO GROUPS FOR ESSAY TEST SCORED BY GROUP ONE 94

APPENDIX 7b: CLASSIFY STUDENTS INTO GROUPS FOR ESSAY TEST SCORED BY GROUP TWO 96

APPENDIX 7c: A COLLECTION OF STUDENTS' RANKING FROM THE THREE FORMATS 98

APPENDIX 8: COLLECTED REASONS FOR THE QUESTION 4 (IN THE QUESTIONNAIRES) 100

APPENDIX 9: QUESTIONNAIRE DATA ANALYSIS 103

APPENDIX 10: GUIDELINE FOR APPLYING MULTIPLE-CHOICE TESTING TO ENGLISH TEST IN 2006 108

APPENDIX 11: HIGH-SCHOOL GRADUATION EXAM PAPERS FROM 2003 TO 2009 110


LIST OF TABLES AND FIGURES

Table 1.1: The test structures in 2003 and 2004 9

Table 1.3: The test structures through 2006 - 2009 10

Table 4.3a: Frequency table for part 3 - Group 1 43
Table 4.3b: Frequency table for part 3 - Group 2 44
Table 4.4: A summary of frequency distribution 44

Table 4.6: Correlation between Multiple-choice and Sentence writing 49
Table 4.7: Correlation between Part 1 (Multiple-choice) and Part 3 (Paragraph writing) scored by Group 1
Table 4.15: Correlation between Part 3 (scored by Group 2) of the major test and minor test 54


Table 4.16: Multiple-choice and Sentence writing 55

Table 4.18 T-test between the major and minor tests 57


ABBREVIATIONS

HGE High-school graduation examination

MOET Ministry of Education and Training

SPSS Statistical Package for the Social Sciences

UEE University entrance examination

INTRODUCTION

0.1 Statement of the problem

Since the MOET decided to apply the multiple-choice testing format to HGEs in 2005, a salient controversy has surrounded this change, with a variety of contrary views ranging from support to rejection. These viewpoints stem from the following causes:

Firstly, at the early stage of using this new format, objective testing had not been explained widely enough to all stakeholders to secure agreement in society, so some educators viewed this kind of test with suspicion, stressing its drawbacks, which are stated as follows:

- For test designers: compared to other types, it is more challenging and demanding to write good multiple-choice items, because plausible distractors (incorrect answer options) are the most important element of this format, and producing them is time-consuming. For that reason, some designers tend to favour "recall" type questions, as they are easy to write, but the weakness is that test takers are able to guess the answers throughout a test.

In addition, the issue of language knowledge and skills has caused a lot of controversy. Some educators argued that this test type could not measure all language skills or the ability for critical and logical thinking. However, others advocated this test type because it could cover the whole content required in a fixed program.


- For test takers: multiple-choice testing cannot measure students' overall performance, and it often assesses recognition rather than recall, because test-takers may attempt to guess rather than determine the correct answer when they are unable to answer a particular question. Therefore, a number of students have a chance of receiving marks for questions they do not actually know. For the productive skills (speaking and writing), it does not train students to express their ideas directly or to develop logical, creative thought.

Secondly, it is sad but true that the training process in Vietnamese high schools mainly aims at examinations, that is to say, teaching and learning what will be tested. The consequence of this tendency is negative backwash on students and even teachers, as they focus only on the multiple-choice test format; as a result, students cannot achieve real proficiency in English.

Over my years of teaching at An Giang University, I have recognized that most freshman students majoring in English do not perform well in writing. The possible explanation for this problem is that these students concentrated too much on multiple-choice testing, which is the main testing format applied in both HGEs and UEEs.

Moreover, the question of how a variety of language abilities can be measured by only one kind of test (i.e. direct or indirect testing) causes testers a lot of difficulty, since each skill has its own specific characteristics and therefore requires a different way of testing. For example, if teachers want to know how well students pronounce a language, the best way is to get them to speak; on the other hand, if they want to check students' grammar structures, it is better to use multiple-choice testing (Hughes, 2003).

A question thus arose in the researcher's mind: how does an objective test affect the teaching and learning process in relation to educational objectives, in terms of …

In the face of such a question, the study was carried out in the hope of identifying the appropriate kind of testing – multiple-choice or written testing – whose results can truly reflect students' real writing ability.

0.2 Aims and Overview of the research

The research was carried out with the following aims:

- Investigating the relationships among various testing formats and certain related problems

- Finding out the best test type for writing assessment in HGEs

In order to achieve these aims, the data were collected from two main sources:

(1) A mixed-format test was designed to observe students' writing ability through three different formats: multiple-choice testing, sentence writing, and paragraph writing.

(2) A questionnaire for students, composed of four primary questions, was used to identify students' perceptions of writing assessment.

The thesis consists of five chapters as follows:

Chapter 1: Background to the study

This chapter mainly focuses on the setting of the research, based on background information and practical viewpoints.

Chapter 2: Literature Review

Discussion and analysis of the related theoretical perspectives are presented in this chapter, including language testing with the issues of reliability and validity in a test; testing of writing ability and its main test types (multiple-choice questions and written testing); and the best way to assess writing ability.

Chapter 3: Methodology

In this chapter, based on the stated aims, the research design and implementation are framed, involving the setting of the study, the data collection procedure, and the data interpretation process.

Chapter 4: Results and Discussion

This chapter presents the findings of the study through description and analysis of the collected data, and finally a discussion of the study results. An interpretation of the research findings is then presented together with a conclusion.

Chapter 5: Recommendations and Conclusion

In this chapter, recommendations for the issues raised are given. The last parts state the limitations of the study, provide suggestions for further research, and conclude the thesis.


CHAPTER 1 BACKGROUND TO THE STUDY

This chapter addresses the contexts relevant to the issue raised, supplying the readers with the origin of the subject; it also provides background information for general understanding. There are four sections in this chapter. Section 1 draws a realistic picture of the most important Vietnamese examinations through two primary criticisms. Section 2 then describes the MOET's project to innovate the examination system in Vietnam, in which the multiple-choice format was introduced as the new trend in HGEs and entrance exams. Section 3 considers a subsequent impact of this testing innovation; it also gives an overall picture of the HGEs in the transition period by presenting the various test structures from 2002 to 2009. Finally, the last section describes how writing ability is taught in high school and the primary techniques used. These contents are presented in a logical chain from broad to more detailed problems so that the research issue can be delved into; at the same time, the two last sections supply the background information expected for the research questions.

1.1 Vietnamese crucial examinations: Criticism

It is known that the HGE is a national annual exam organized by the MOET for the purposes of evaluating the high-school teaching and learning program, certifying the students who are qualified for a high-school diploma, and identifying the students who have obtained enough knowledge to take part in the UEE. As a result, crucial decisions relating to stakeholders, educational programs, and other social units concerned are made relying upon the outcomes of this exam. Because of this importance, the exam attracts a great deal of attention from public opinion and the MOET, so it is usually discussed in the media for months before it takes place.

For high-stakes examinations, Marchant (2004) carefully explained the consequences that follow from the school-wide average scores a school obtains, as well as from the high or low scores of individual students:

High school-wide scores may bring public praise or financial rewards; low scores may bring public embarrassment or heavy sanctions. For individual students, high scores may bring a special diploma attesting to exceptional academic accomplishment; low scores may result in students being held back in grade or denied a high school diploma. (p.1)

Accordingly, a HGE (and even the university entrance exam) in Vietnam may be criticized from various viewpoints because of its weaknesses. These exams are most often criticized for the following reasons:

- Causing stress for the stakeholders. As the results play an important part in deciding students' future, students have to study hard to prepare for the exam over many years. Besides, the exams are organized on a large scale in Vietnam, so all the stakeholders tend to be involved for a long time before and after the exam. Saville's model (2006) explains the roles of stakeholders in the testing community as follows:

Figure 1.1: Saville's model (2006) of stakeholders' roles in the testing community (the model relates the stakeholders using test scores – Cambridge ESOL, inter alia – to the test construct, test format, test conditions, test assessment criteria, test scores, and the testing system)

- Students' results are used on their own for many crucial decisions, such as continued education (college or university), the graduation certificate (high-school diploma), job qualification, and so on. Some educators argue that evaluating the knowledge a student has obtained within three years of high school, or deciding their future, on just a single test is not the best solution; it may be unfair to some students because of unexpected events during the exam.

Therefore, the MOET's overall project to innovate the system of examinations at high school in Vietnam was issued, yet it also represented a salient controversy … instead of one large test result for students' further education after high school. This project is presented in the very next part.

1.2 An appearance of “the overall project to innovate the system of HGEs and entrance examinations to university and college since 2005”

It is acknowledged that testing and evaluation is one of the most important stages of education, since it has a strong impact on the teaching and learning process. Based on this acknowledged role, the MOET made an enhancement to the field of measurement and evaluation that was, in essence, an innovation of the nation-wide examinations together with more rigorous formative assessment at school.

In Vietnam, there are four current high-stakes examinations organized every year: the HGE (at the end of May and early June), the university entrance examination (in early July), the college entrance examination (at the end of July and early August), and the entrance examination to vocational secondary school (after the previous exam). They are held under the supervision of the MOET; hence the MOET's project tends to affect them all with changes to the examinations, particularly in test format.

Within the framework of the mentioned project (see Appendix 1 for further information), the MOET advocated applying objective testing to foreign languages (including English, Chinese, French, and Russian) in HGEs and university and college entrance exams from 2005. However, it was not until 2006 that language tests in the multiple-choice format (with four choices A, B, C, D) were put into practice. A more detailed description is given in the next part.

1.3 The changes of testing formats used in HGEs

As mentioned, there was a change of test format from the 2006 HGE, in which multiple-choice testing was applied to English in its entirety rather than only in part; using multiple-choice testing in most crucial examinations would become predominant in Vietnam, as it was considered a modern, scientific approach.

It is obvious that the trend of using multiple-choice testing in HGEs and UEEs gradually expanded from 2002, and by the 2006 exam the written form had been rejected completely. The following is a presentation of the test structures (note: the focus is on test format, not test content) from 2001 to 2009, collected by the writer. They are divided into three stages to give the readers a logical overview of the changes through the years.

Question I: Completing the passage with tense agreement
Question II: Filling the blanks with appropriate prepositions (NOT GIVEN), or completing the sentences by choosing the best option for each blank
Question III: Giving the correct form of the words in brackets
Question IV: Reading comprehension (making questions from the suggested words and answering them; choosing true or false), or reading the passage and answering the questions
Question V: Completing the passage by filling the blanks with suitable words (NOT GIVEN)
Question VI: Finishing the sentences in such a way that they are similar in meaning to the original sentences

Table 1.1: The test structures through 2001 - 2004 (written and multiple-choice questions)

Question I: Completing the passage with tense agreement by choosing the best options
Question II: Completing the sentences by choosing the best option for each blank
Question III: Completing the passage by filling the blanks with suitable words (NOT GIVEN)
Question IV: Reading comprehension (making questions from the suggested words and answering them; choosing true or false)
Question V: Finishing the sentences in such a way that they are similar in meaning to the original sentences
Question VI: Writing complete sentences with suggested words

Table 1.2: The test structure in 2005 (written and multiple-choice questions)

- Word stress / ending sound / vowels or consonants: odd one out

2. VOCABULARY, GRAMMAR & CULTURE ISSUES (24 - 25 Qs)
- Filling the blanks of the sentences by choosing the best options (comprising tense agreement, sentence structures, connectors, prepositions, word choice, simple communicative functions, …) (5 Qs each part)
- Filling the blanks of the passage by choosing the best options (5 - 10 Qs each part)
- Completing the sentences by choosing the best options (focus on sentence level)
- Choosing the answers which are closest in meaning to the original sentences

Table 1.3: The test structures through 2006 - 2009 (50 multiple-choice questions)


According to the structures presented in the tables above, the English test in 2006 comprised 50 multiple-choice questions for students enrolled in both the three-year and the seven-year high-school programs, and the candidates were allowed 60 minutes to complete the test.

1.4 The issue of instruction in writing ability at high school

In the High-school Program for the English subject (pilot) issued by the Ministry of Education and Training in 2002, writing instruction was summarized in the following two objectives: (1) writing for personal communication, including letters, invitation cards, etc.; describing or reporting personal routines and classroom activities; or filling in forms and surveys; (2) paragraph writing about the topics learnt, using language specified within the scope of the high-school program (Hoang et al., 2006, p. 231). However, the teaching of writing in most high schools in Vietnam has been carried out with techniques at the sentence level.

From the above statements, training in writing ability focuses on the following main types of exercise:

- Finishing sentences in such a way that they are similar in meaning to the original sentences: students are given sample sentences with a clear meaning and target structures, and then they are asked to write other sentences with a similar meaning but different structures.

- Writing complete sentences with suggested words, based on the required structures

- Explaining the original sentences by sentence transformation

Paragraph writing and essay writing seem to have been neglected in most high schools, on the grounds that they are time-consuming and are not forms used in HGEs.


1.5 Chapter summary

This chapter has given the readers essential information on the issue raised in this thesis and provided the basis for specific discussion in the next chapters. This information relates to the project of changing testing and assessment in HGEs, the changes in test structure from exam to exam, and a description of the techniques applied in teaching writing in high schools. They serve as the ground for later arguments.


CHAPTER 2 LITERATURE REVIEW

In chapter 1, background information relevant to the study was presented in detail. This chapter provides the reader with the theory of writing assessment in three portions: (1) language testing with the issues of reliability and validity; (2) testing of writing ability and its main test types (multiple-choice questions and written testing); and (3) a discussion of the best way to assess writing. A conclusion drawn from this reviewed literature is expected to answer the question of which test type and form should be the best choice to test writing ability in a large-scale, high-stakes examination.

2.1 Language testing

In this section, two issues are discussed: language testing in general, and what makes a good language test, with the qualities relevant to the assessment of writing ability.

2.1.1 What is language testing?

In language teaching, testing plays a crucial role among the components constituting an instructional program. Although tests serve a different purpose – "While the primary purpose of other components is to promote learning, the primary purpose of tests is to measure" (Bachman and Palmer, 2000, p.19) – they are implicitly understood to be constructed for teaching purposes. It is clear that testing is regularly conducted during the teaching and learning process, with the participation of all the stakeholders involved and with the various purposes of the test types used.

According to Bachman and Palmer (2000, p.8), "Language tests can be a valuable … teaching". On this basic tenet, language tests are credited with a variety of roles: as evidence giving feedback on the effectiveness of a teaching program, as grounds for decision-making about learning materials and activities or for making inferences about students' language abilities, and finally as a tool to clarify instructional objectives. Therefore, it is obvious that a test used correctly is a significant source of useful information for the teaching and learning process.

2.1.2 What is a good language test?

Most educators find it hard to define a good test in general, since it depends on the objectives of language use, the purpose of the tests, and the resources available (Hughes 2003, p.8). Hughes accordingly suggested that an ideal test or testing system:

- consistently provides accurate measures of precisely the abilities in which we are interested

- has a beneficial effect on teaching

- is economical in terms of time and money

These factors may be briefly interpreted as follows: validity concerns the accuracy of the measurements made, while a test is said to be reliable if it measures consistently. At the same time, the concept of positive backwash is implied by a beneficial effect on teaching, and efficiency, relating to educational administration, is another way to refer to the economical use of time and money.

Nevertheless, a question arising from these factors tends to trouble most educators: "Can we obtain all these essential factors in developing a single test?" The answer may be "not always" because, in some cases, validity and reliability seem to be in conflict, or "a reliable test may not be valid at all" (Hughes 2003, p.50), although a valid test must be reliable. Consequently, a test type tends to be chosen according to the essential qualities required for the particular use of the test, as in the following claim: "While reliability is a quality of test scores themselves, validity is a quality of test interpretation and use" (Bachman, 1990, p.25). For further argumentation, these two aspects of a test will be discussed more specifically in part 2.1.3.

2.1.3 Main relevant test qualities

It is generally agreed that reliability and validity are the key factors in measuring the quality of a test. They provide the major justification in the process of making inferences or decisions about an individual based on test scores. For the main concern of this study, apart from the other qualities expressed in the usefulness of a test, two qualities will be discussed in detail: reliability and validity.

2.1.3.1 Reliability

In this sub-section, the issue of reliability is presented through four contents: (a) definition, (b) affecting factors, (c) reliability of objective and subjective tests, and finally (d) inter-rater reliability.

(a) Definition: Reliability has been defined by different authors in different ways, in which consistency of scoring is mentioned the most. According to Berkowitz, Wolkowitz, Fitch, and Kopriva (2000), cited in Rudner and Schafer (2000), reliability was explained, under the condition that "the scores are indicative of properties of the test takers", as:

the degree to which test scores for a group of test takers are consistent over repeated applications of a measurement procedure and hence are inferred to be dependable and repeatable for an individual test taker (para.2)

(b) Affecting factors: Three factors are considered the main causes affecting the reliability of a test: the test itself, the test-takers, and scoring. For the first, each test type has its own characteristics; for instance, multiple-choice is concerned with the effectiveness of the distractors, the correct answer, and the difficulty of the items. The second is explained by Rudner and Schafer (2002, p.22) as students' performance during the testing process. The last, and the one of most concern in this paper, is consistency in scoring, since "scorers are not always consistent; the scorers tend to change their criteria while scoring and are subject to biases…" (Rudner, 1992, cited in Rudner and Schafer, 2002, p.22). Hence, there are many ways to improve reliability, from consistent performance to scorer reliability, each contributing in part to a good test.

(c) Reliability of objective and subjective tests: For objective and subjective tests, which are the main focus of this research and will be described in the next part, reliability tends to be discussed as an issue of scoring procedure, the first type being more reliable in this respect than the second. Among the many factors contributing to the reliability of a test, consistency is most often taken as the measure of reliability (Bachman and Palmer, 2000, p.19). In some cases, in subjective tests for instance, the scorers responsible for the same paper may reach conclusions as different as chalk and cheese, because one of them may score more strictly than the others. As a result, the scores obtained are not consistent and cannot be considered reliable, which makes the test unreliable.

Once again, Bachman and Palmer (2000) affirmed that:

Reliability is clearly an essential quality of test scores, for unless test scores are relatively consistent, they cannot provide us with any information at all about the ability we want to measure (p.20)

It can be understood that obtaining consistency, and hence a reliable test, is one of the most crucial aims in achieving a good test. Hence, test developers had better minimize inconsistency through test design, or by choosing test tasks carefully, since "Of the many factors that can affect test performance, the characteristics of the test tasks are at least partly under our control" (Bachman and Palmer 2000, p.20).

(d) Inter-rater reliability: In writing assessment, inter-rater reliability has been considered a dilemma of essay evaluation, so many studies have been conducted on various issues relating to this problem. Shohamy, Gordon and Kraemer (1992) wrote an article entitled "The effect of raters' background and training on the reliability of direct writing tests" in The Modern Language Journal, Vol. 76, No. 1, pp. 27-33. Baer et al. (2004) compared the creativity manifested in the writing of more than 100 8th-grade students in their study. Howell et al. (2005) described the notion of inter-rater reliability as follows:

Inter-rater reliability is the extent to which two or more individuals (coders or raters) agree. Inter-rater reliability addresses the consistency of the implementation of a rating system.

In the researcher's view, the root of consistency in essay rating is designing an adequate scoring rubric so that raters give similar responses to the same sample, because a well-designed scoring scheme itself provides guidance for evaluating students' writing; otherwise, the judgments of two or more individuals may vary depending upon the criteria established (Moskal, 2000). To enhance inter-rater reliability in a large-scale examination, scorer training and monitoring are called for, despite the corollary financial cost.

In short, there is a conclusion that "By developing a pre-defined scheme for the evaluation process, the subjectivity involved in evaluating an essay becomes more objective." (Moskal, 2000)
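As a concrete illustration of the inter-rater reliability discussed above, the following sketch computes two common agreement indices for two raters who score the same set of essays: simple percentage agreement and Cohen's kappa, which corrects agreement for chance. The scores are invented, and Cohen's kappa is offered only as an assumed example of how such agreement could be quantified; it is not the statistic used in this thesis, which compares two scoring groups by means of correlation coefficients.

    # Illustrative inter-rater agreement between two raters assigning band scores (0-4) to the same ten essays.
    # Invented data; Cohen's kappa is one common agreement index, not the one used in this study.
    from collections import Counter

    rater_1 = [3, 2, 4, 1, 3, 2, 0, 4, 2, 3]  # hypothetical band scores from rater (or group) 1
    rater_2 = [3, 2, 3, 1, 4, 2, 1, 4, 2, 3]  # hypothetical band scores from rater (or group) 2

    def percent_agreement(a, b):
        """Proportion of essays on which the two raters give exactly the same band."""
        return sum(x == y for x, y in zip(a, b)) / len(a)

    def cohens_kappa(a, b):
        """Unweighted Cohen's kappa: observed agreement corrected for chance agreement."""
        n = len(a)
        observed = percent_agreement(a, b)
        freq_a, freq_b = Counter(a), Counter(b)
        expected = sum(freq_a[c] * freq_b[c] for c in set(a) | set(b)) / (n * n)
        return (observed - expected) / (1 - expected)

    print(f"Percent agreement: {percent_agreement(rater_1, rater_2):.2f}")
    print(f"Cohen's kappa:     {cohens_kappa(rater_1, rater_2):.2f}")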


2.1.3.2 Validity: face validity

Whereas test reliability relates to the consistency of measurement, test validity refers to the accuracy of measurement. Hughes (2003) clarified that "a test is said to be valid if it measures accurately what it is intended to measure" (p.26); that is, a valid test can be inferred to be a true tool to measure students' ability to perform their knowledge at the required levels.

In other words, "if a test serves its intended function well, it is said to be valid, if it does not, it is invalid" (Nunnally, 1972, p.21). According to Alderson et al. (1995), there are three types within the frame of validity: internal validity, external validity, and construct validity. However, in relation to the research topic of this thesis, only internal validity is discussed, with face validity as the main focus.

Internal validity is a conceptual approach in which both subjective and objective judgments of a test are based mostly on an assessment of the test itself (such as its form, format, and content). That means it is mainly concerned with the 'perceived content' of the test and its 'perceived effect' (Alderson et al., 2003). To assess this feature of a test, the three most widespread methods should be mentioned. The first is face validity, in which the value of a test is judged by various kinds of stakeholders. The second is content validity, a real dilemma because test content can only be judged by professional experts or linguists. The last one is response validity, primarily carried out by test-takers. However, within the scope of this research, internal validity will be discussed only in terms of face validity, for its relevance to the test types mentioned.

Face validity is not considered the predominant method of assessing the validity of a test; still, in fact, it is the face of a test that is examined first when discussing whether a test is good or bad. That is, a test with a poor format, poor items, or unclear instructions may be said to be "not valid". In addition, according to Ingram (1977, p.18), face validity refers to the test's "surface credibility or public acceptability", and judgments on a test are regularly made subjectively by 'non-experts', administrators, and even by students or test-takers. Their assessment, hence, may not be officially accepted in spite of its acknowledged impact on the validity of a test.

Finally, a test is called valid if it appears to be valid, or, in other words, a test of language skills should include adequate elements of those skills. A question arising from this argument, and a critical issue relating to face validity, is: which is more valid – the multiple-choice format or the essay format – when used in a test of writing ability?

2.1.3.3 Reliability and Validity

The contrast between reliability and validity (as mentioned before) is manifested in the diversity of testers' viewpoints in both principle and practice. In principle, some testers assume that "a test cannot be valid unless it is reliable", or that "it is quite possible for a test to be reliable but invalid" (Alderson et al., 1995, p.187). Nevertheless, in practice, reliability alone is not sufficient, since a test can only be used when it is rated highly on all test qualities. In particular, when both factors are present, which one is to be maximized depends on the purpose of the test. A typical example of this argument is the multiple-choice test: it is called a reliable test because of its objective and consistent scoring, yet it is criticized as invalid for measuring students' ability to use English in real life (Alderson et al., 1995, p.187). However, it is chosen for the purpose of reliable scoring.

2.2 Writing ability assessment

In this part, some explanation of writing ability and of the types of test used to assess it is provided to give insight into writing assessment.


2.2.1 What is writing?

Writing is considered a manifestation of human beings through language, distinguishing the textual medium from the non-textual media of other creatures. This has been shown by archaeologists and linguists who have studied ancient illustrations such as cave drawings and paintings. Thus, it is accepted that the ability to organize a text reflects the existence of people in a developed community. In other words, only humans have developed systems of language, and the writing system is one of the most central features of a language. According to Ager (1998, par.1), "Writing is a method of representing language in visual or tactile form. Writing systems use sets of symbols to represent the sounds of speech, and also have symbols for such things as punctuation and numerals".

Besides, writing is a special way of communicating among people who share the same writing system, with the implication that they belong to an organized community; this is because writing carries a great deal of formality, holding numerous principles and rules that writers follow when organizing their thoughts.

In short, writing is the highest development of a language, one which covers all of its features.

2.2.2 Writing test formats - subjective format and objective format

Writing has been defined in the section above, and the ways used to assess a person's writing ability have always attracted the attention of educators. As indicated in the previous parts (in 2.1), the choice of test type depends on the objectives of a test; to deal with testing writing ability, all the features involved should be considered. As a result, the advantages and disadvantages of subjective and objective tests will be discussed in this section on the basis of their nature, namely that "the subjective type of test allows students to write and use language in their answers while the objective type does not" (Tunku, 2000). Yet it is the system by which a test is scored, not really the test itself, that gives the test its name, so scoring is also a particular focus of this study.

2.2.2.1 Objective testing

In Vietnam, objective testing (the most common question types being multiple-choice, true/false, matching items, and completion) is considered a new trend compared with subjective testing. The advantages of these forms are most obviously manifested in scoring. The scoring process is conducted objectively, by following a fixed answer key or by using scanning machines and computers. Besides, with selected-response item formats such as multiple-choice, matching, and true-false (Zimmaro, 2003, p.15), the test takers cannot supply alternative answers of their own, so objectivity is always maintained in this type.

However, this kind of testing has its own disadvantages. Because it requires test takers to choose the correct response from several suggested alternatives, it is said to involve a larger guessing factor. In addition, this kind of test has a particularly severe effect on English-majored students (at university or college) in terms of their lack of writing ability. This results from the fact that students were not trained adequately in high school because they were prepared for examinations (mostly through multiple-choice and sentence-writing exercises focusing mainly on grammar) rather than being equipped with sufficient knowledge (the four language skills) for further study.

In addition, some other disadvantages of this type are that it is time-consuming to write good items, it is less valid than a subjective test, and it is not suitable for assessing certain abilities. Despite being a controversial topic, it remains popular due to its utility, reliability, and cost effectiveness.


2.2.2.2 Subjective testing

In contrast with objective testing, subjective testing requires the test takers to write and present an original answer, including short-answer essays, extended-response essays, problem solving, and performance tasks (Zimmaro, 2003, p.15). For measuring students' writing ability, the use of these tests is encouraged, with many arguments based on their several advantages and disadvantages.

On the good side, subjective tests have their strongest point in face validity, as they allow the test takers to demonstrate their actual ability. Students can demonstrate written expression in a logical order so that their results are measured precisely. Besides that, learners' language knowledge is assessed over a wide range, including mechanics, grammar, style, organization, and logical development. It is clear that these fields of knowledge cannot be shown in objective tests, since those are a kind of discrete language-property assessment (emphasizing mechanics, grammar, and vocabulary only) while other significant aspects of written language (organization, content, and coherence) are ignored (Albertini et al., 1996, p.75).

The biggest drawback of the subjective test, revealed by its very name, is the variation in marking: "experienced examiners award widely varying marks not only on the same piece of work marked by other examiners, but on their own marked scripts re-marked after a passage of time" (Zimmaro, 2003). According to her, this format has been criticized for being based on loosely defined criteria, with a lack of consistency from one reader to another, and grading that is time-consuming and often impractical with large numbers. Such examinations have also been thought to be deficient in consistency because of essay ratings; besides, over-burdened staff, time demands, and financial problems are considered big obstacles that make the exam impractical as well. Although it is not the best way to assess students' writing ability, the subjective form is still able to achieve both validity and reliability when the issues of administration, testing conditions, population factors, and …

According to Jacobs (2004), the major task in scoring essay tests is to maintain consistency and to make sure that answers of equal quality are given the same number of points: "At once, there has existed two approaches to scoring essay in writing assessment, they are named as analytic or point method, and holistic or rating method." (p.5)

- Analytic: in this method, a model answer, in which the major components are defined and point values assigned, should be prepared before scoring, so that the raters can compare each student's paper with the model answer. In addition, an analytic rubric should be used when the tester wants to identify students' strengths and weaknesses, give detailed feedback, assess complicated skills or performance, and support students' self-assessment of their knowledge and performance (Zimmaro, 2003, p.32). (A minimal illustration of this point-based approach is sketched after these two descriptions.)

- Holistic: this method requires a list of developed criteria on which the total quality of the examinee's answer is based. The score for each paper is decided according to the stack it is placed in; these stacks are usually sorted into three levels: the first stack contains the best answers, the second the average ones, and the poorest go into the third stack. This method is usually used for a large group of test takers, when a quick snapshot of achievement is desired and a single dimension is adequate to define quality (Zimmaro, 2003, p.32).
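To make the analytic (point) method above more concrete, the sketch below scores one hypothetical paragraph against a simple point-based rubric of the kind described by Zimmaro. The criteria, maximum points, and awarded points are assumptions made purely for illustration; they are not the scoring scheme used in this study, which is described in section 3.2.2.3.

    # A minimal analytic (point-method) scoring sketch: each criterion carries a maximum point value,
    # and the rater awards points per criterion by comparison with a model answer or descriptor.
    # The rubric and the awarded points below are invented for illustration only.
    rubric_max = {
        "content": 3.0,
        "organization": 2.0,
        "grammar": 2.0,
        "vocabulary": 2.0,
        "mechanics": 1.0,
    }

    awarded = {          # points a rater might give one student's paragraph
        "content": 2.0,
        "organization": 1.5,
        "grammar": 1.0,
        "vocabulary": 1.5,
        "mechanics": 0.5,
    }

    total = sum(awarded.values())
    maximum = sum(rubric_max.values())
    score_out_of_10 = 10 * total / maximum  # rescaled to the 10-point scale used in Vietnamese examinations

    for criterion, max_pts in rubric_max.items():
        print(f"{criterion:<12} {awarded[criterion]:>4.1f} / {max_pts:.1f}")
    print(f"Total: {total:.1f} / {maximum:.1f}  ->  {score_out_of_10:.1f} / 10")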

In summary, each test format has its strong and weak points. The real challenge for the educators responsible for finding and using valid and fair assessment tools is to select a suitable test format based on the specific purpose of education, especially in HGEs.


2.3 The best way to assess writing ability: a combination of testing formats

The strong and weak points of each format presented in the section above lead to various opinions on which test type should be applied to obtain a valid and reliable test of writing ability. Besides that, the history of testing has moved from subjective to objective testing, each presenting both strengths and weaknesses; thus most testers tend to look for a better model combining the best features of these two extremes, that is to say, the broad content coverage of the first and the consistent, objective scoring of the second (Madsen, 1983, p.7).

In this part, the choice of combining these two forms is presented as a way of rejecting the unfavourable elements of each one. According to Driver & Krech (2001), to measure students fairly, "assessments should be valid (measuring what they intend to measure) and reliable (resulting in the same placement whenever they occur or whoever grades)" (p.18).

It is obvious, moreover, that both written and multiple-choice tests lead to problems when used alone (see the weak and strong points of each presented in the previous section), since neither format alone can provide both reliability and validity. Breland (1996) stated that one way to ensure reliability is to use "both free-response and multiple-choice tasks to make up the assessment" (p. 23). In addition, to ensure fair assessment, White (1998) also agreed that testing methods should be combined: "The best use of multiple-choice scores, if they must be employed in the area of writing, is as a portion of a test, rather than as the assessment itself" (p. 240). It is also reported in some studies, moreover, that this approach existed at the Educational Testing Service (ETS): the aim was not more reliable essay scores in themselves, but a distinctive kind of information to be provided for public reporting, based on the viewpoint that a single test can be reliable when considered along with other information.

In short, White (1998) emphasized the need for this kind of assessment as follows:

The results of a careful multiple-choice test, when combined with the results of a single essay test, will yield a fairer and more accurate measure of writing ability than will either test when used by itself… A preferable alternative is to score more than one writing sample, either in paired essay tests or in portfolios (pp. 240-241)
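In scoring terms, the combination that Breland and White recommend is usually implemented as a weighted composite of the objective and essay components. The sketch below shows one possible weighting; the 0.6/0.4 split and the example scores are assumptions made for illustration, not weights proposed by those authors or by this thesis.

    # A possible composite score combining a multiple-choice section with an essay section.
    # The 0.6/0.4 weights and the component scores are illustrative assumptions only.
    def combined_score(mc_score, essay_score, mc_weight=0.6, essay_weight=0.4):
        """Weighted composite of two components, each already expressed on a 0-10 scale."""
        assert abs(mc_weight + essay_weight - 1.0) < 1e-9, "weights should sum to 1"
        return mc_weight * mc_score + essay_weight * essay_score

    print(combined_score(8.0, 5.5))  # strong on multiple-choice, weaker on the essay -> 7.0
    print(combined_score(5.0, 8.0))  # weaker on multiple-choice, strong on the essay -> 6.2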

2.4 Chapter summary

In closing, this chapter has presented a conceptual framework relating to three main issues. The introduction to language testing, with the relevant notions of reliability and validity, supplied general knowledge of this field. Then, based on this theory, a more specific understanding of writing ability assessment was discussed to provide an answer to the question "what is the best way to test writing ability?", which was addressed in the last section of the chapter. Following this theory chapter, the research questions and the methods used to answer them are presented in the next chapter on research methodology.


CHAPTER 3 METHODOLOGY

The previous chapter dealt with the key theories as supporting background to the thesis. This chapter discusses the methods employed to obtain the data serving the findings of the study, and it includes two main sections. The first section states the research questions. The second section then describes the research design in three parts: the subjects, the instruments, and the implementation, the last under two sub-sections: data collection procedures and data analysis procedures.

3.1 Research questions

As stated, this thesis aims to investigate the two different test formats, multiple-choice and written tests, used in HGEs. The findings will serve to observe their relationships in writing ability assessment and to find the most suitable test format for this large-scale exam. To obtain these aims, the following specific questions have been raised:

(1) Are there any differences in the ranking of students' writing ability between subjective testing (sentence and essay writing) and objective testing (multiple-choice)?

(2) Following from question (1), are there any correlations between the written test format and the multiple-choice format in the field of writing ability assessment?

(3) Do students show the same preferences or judgments regarding the test forms in HGEs, and should another form be used instead of the current ones?

These questions are the basis for finding out the correlations between assessments of students' writing ability by objective and subjective tests.

3.2 Research design

To get the information necessary to answer the research questions mentioned, a test and a survey were designed and carried out. In this research, data collection was mainly based on the test in three different formats, and the survey questionnaires were added to gain empirical evidence explaining the diverse results obtained from the test.

The following describes in detail the subjects, the instruments, and the implementation of the research.

3.2.1 Subjects

3.2.1.1 The core subjects: First - year students at An Giang University

Aside from obtaining preliminary information for this research through a questionnaire answered by 60 twelfth-grade students at Binh Khanh High School, the research was mainly conducted on 44 first-year students (school year 2008 - 2009) of An Giang University, majoring in English. They were both the test takers and the respondents to the second questionnaire, coming from two classes of the Department of Foreign Languages, Faculty of Pedagogy. These students had just graduated from high school and been admitted to the university, so they took the test in this study with their knowledge of the high-school program, since they were at the beginning of their first term at university. That is to say, they had not yet been trained in college writing, because this study aimed at investigating writing ability after three years of learning English at high school, in particular the washback of preparation for the HGE.

Most of them had graduated from various high schools of An Giang province. This brought many advantages to the researcher, since their learning process had been controlled under the same evaluation guideline, that is, their ability had been assessed equally among the schools of one province. Besides that, they had three …

… more information with reference to Table 1.3). These may be considered the basis for making a practical comparison of their opinions about current test formats and their acquisition of writing ability at high school.

Some arguments against the study subjects, though, may point to a contrast between the supplementary subjects and the core ones: it is true that those admitted to university are good students from various high schools (the second group), whilst the first group was required to be at an average level in one school. The reasoning is that this difference may lead to a more objective viewpoint on the variety of students affected by the examination changes.

3.2.1.2 The supplementary subjects: 12 - grade students at Binh Khanh High school

Before conducting the research on the core subjects mentioned in the previous section, another questionnaire had been designed and handed out to high-school students for the purpose of obtaining preliminary information as background reference for the research. This group included 60 twelfth-grade students at Binh Khanh High School (20 students were randomly chosen from each of three classes); they were in the second term of the school year 2007 - 2008. The reasons for gathering information from this group are stated as follows:

(1) Among the four state high schools in Long Xuyen city, An Giang province, Binh Khanh was the youngest school, with a young staff: the number of teachers majoring in English was 7, half of them with 3 - 5 years' seniority and the rest with over 5 years. The school is located on the outskirts of the city, where most students had been ranked at an average level in HGEs.

Relying on these features, the researcher held the view that examination changes may have stronger impacts on students from this school, since most of them considered obtaining the high-school graduation certificate as the final goal of …
