The Use of the Analytic Scoring Scale in Assessing Third-Year Students’ Portfolio Essay Writing Assignments at Nông Lâm University

MINISTRY OF EDUCATION AND TRAINING
HO CHI MINH CITY OPEN UNIVERSITY

THE USE OF THE ANALYTIC SCORING SCALE IN ASSESSING THIRD-YEAR STUDENTS’ PORTFOLIO ESSAY WRITING ASSIGNMENTS

AT NONG LAM UNIVERSITY

Submitted in partial fulfillment

TABLE OF CONTENT

Certificate of Originality
Retention and Use of the Thesis
Acknowledgement
List of Scorers & Co-operative Teachers
Table of Content
List of tables & figures
Abstract

Chapter One - Introduction
1.1 - Rationale
1.2 - Theoretical Framework
1.3 - The Purposes of the Study
1.4 - Research Hypothesis
1.5 - Research Questions
1.6 - The Importance of the Study
1.7 - The Scope of the Study
1.8 - Definition of Terms
1.9 - Summary

Chapter Two - Literature Review
2.1 - Portfolio Assessment
2.2 - Analytic versus Holistic Scoring Schemes

2.2.1.3.1 - Test of English as a Foreign Language (TOEFL)
2.2.1.3.2 - Cambridge First Certificate in English (FCE)
2.2.1.3.3 - International English Language Testing System
2.2.2 - Analytic Scoring Schemes
2.2.2.1 - Advantages of Analytic Scoring Schemes
2.2.2.2 - Disadvantages of Analytic Scoring Schemes
2.2.3 - Validity and Reliability in Two Scoring Schemes
2.3 - Conclusion

Chapter Three - Research Method
3.1 - Research Design
3.2 - Participants
3.2.1 - Student
3.2.1.1 - Analytic group
3.2.1.2 - Holistic group
3.2.2 - Teacher
3.2.3 - Scorers
3.2.3.1 - Scorers for the analytic group
3.2.3.2 - Scorers for the holistic group
3.2.3.3 - Scorers for the final exam of two groups
3.3 - Instrumentation
3.3.1 - Pretest
3.3.2 - Portfolio assessment & Final exam
3.3.2.1 - Portfolio assessment
3.3.2.2 - Final exam
3.3.3 - Posttest
3.3.4 - The analytic scoring scale used in the study
3.3.4.1 - The analytic scoring scale
3.3.4.2 - Criteria of the study’s analytic scoring scale
3.3.5 - The holistic scoring scale used in the study
3.3.6 - Correlations
3.3.7 - Critical α value & T-test
3.3.7.1 - Critical α value
3.3.8 - Questionnaire
3.4 - Materials
3.4.1 - Course books & Course Syllabus
3.4.2 - Instructions for the two scoring schemes
3.5 - Research Procedures
3.6 - Variables in the research
3.6.1 - Independent variable
3.6.2 - Dependent variable
3.6.3 - Controlling the threats to the validity
3.6.3.1 - History
3.6.3.2 - Maturation
3.6.3.3 - Statistic Regression
3.6.3.4 - Selection
3.6.3.5 - Experimental Mortality
3.6.3.6 - Testing
3.6.3.7 - Instrumentation
3.6.3.8 - Design Contamination
3.7 - Assumptions of the Study
3.8 - Limitations of the Study
3.9 - Summary

Chapter Four - Data Analysis & Findings
4.1 - Data Analysis
4.1.1 - Reliability of the scores

Bibliography

Appendices:
Appendix A: Questionnaire (Vietnamese and English versions)
Appendix B: Course syllabus
Appendix C: T.O.E.F.L Holistic Scoring Scale
Appendix D: F.C.E Holistic Scoring Scale


LIST OF TABLES & FIGURES

TABLES:

Table 1.1: The comparison of holistic and analytic scales on six qualities of test usefulness
Table 1.2: The analytic scoring scheme suggested by Nakamura with five criteria
Table 4.1: Correlations for Assignment 1 - Example essay
Table 4.2: Correlations for Assignment 2 - Comparison-Contrast essay
Table 4.3: Correlations for Assignment 3 - Classification essay
Table 4.4: Correlations for Assignment 4 - Process Analysis essay
Table 4.5: Correlations for Assignment 5 - Causes & Effects essay
Table 4.6: Correlations for Assignment 6 - Argumentative essay
Table 4.7: Correlations for the final exams
Table 4.8: Descriptive statistics of the two scores sets of the two groups
Table 4.9: T-test results for the pretests
Table 4.10: Descriptive statistics for Assignment 1
Table 4.11: Descriptive statistics for Assignment 2
Table 4.12: Descriptive statistics for Assignment 3
Table 4.13: Descriptive statistics for Assignment 4
Table 4.14: Descriptive statistics for Assignment 5
Table 4.15: Descriptive statistics for Assignment 6
Table 4.16: Descriptive statistics for the final exam
Table 4.17: T-test results of the final exam for the two groups
Table 4.18: Descriptive statistics for the posttests
Table 4.19: T-test results of the posttests for the two groups
Table 4.20: The summary of the results that the holistic group got with six assignments
Table 4.21: The summary of the results that the analytic group got with six assignments
Table 4.22: T-test results for the six assignments
Table 4.23: Average scores of the analytic group in terms of the five criteria in the six assignments
Table 4.24: The comparison of the descriptive statistics of the analytic group and the holistic group for the pretest (left) and posttest (right)

FIGURES:

Figure 3.1: The gender distributions of the two groups
Figure 3.2: The distributions of the types of essays that students in the two groups chose to write in their final exam
Figure 4.1: The pattern of the responses to Item 1
Figure 4.2: The pattern of the responses to Item 2
Figure 4.3: The pattern of the responses to Item 3
Figure 4.4: The pattern of the responses to Item 4
Figure 4.5: The pattern of the responses to Item 5
Figure 4.6: The pattern of the responses to Item 6
Figure 4.7: The means of the two groups with the six assignments
Figure 4.8: The improvements of students in the analytic groups with five criteria
Figure 4.9: Differences of the two groups’ means for the pretests and the posttests


ABSTRACT

This thesis reports a study of 104 third-year students at Nông Lâm University. The study explored how far the analytic scoring scale benefits students when it is used to assess their portfolio assignments. It also surveyed students’ attitudes towards the use of the analytic scoring scale in the portfolio assessment.

The results of the study showed that the analytic scoring scale helps to improve third-year university students’ essay writing skills. Independent t-tests were used to examine the differences between the two groups in the pretests, each assignment of the portfolio assessment, the final test and the posttests. The descriptive statistics of the pretests show that the two groups were almost the same at the beginning of the study: the t-test yields a two-tailed p value of .877, higher than the critical α value of .05, so the difference between the two groups is not statistically significant. After that, the t-test results for the assignments in the portfolio assessment change from not statistically significant to extremely statistically significant. The t-tests of the final examination and the posttest show that the differences are extremely statistically significant.
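The comparison of a p value with the critical α value described above can be sketched in a few lines of Python. This is only an illustration of the decision rule, not the analysis carried out in the study: the function name `is_significant` is hypothetical, and the only figures taken from the thesis are the reported p value of .877 and the critical α value of .05.

```python
def is_significant(p_value: float, alpha: float = 0.05) -> bool:
    """Return True when a two-tailed p value falls below the critical alpha."""
    return p_value < alpha


# Pretest comparison reported in the abstract: p = .877 against alpha = .05.
pretest_p = 0.877
print(is_significant(pretest_p))  # False: no significant difference at the start
```

A result of False for the pretests is what the design needs: it suggests that the two groups started from a comparable level, so later differences are easier to attribute to the scoring scale.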

The responses to the questionnaire showed that students in the analytic group highly appreciated the use of the analytic scoring scale in scoring their assignments. These students were motivated by the scores given analytically and used them to regulate their learning strategies at home. Most of these students responded that it was the analytic scoring scale applied to their writing papers that helped them improve their writing ability.

From the findings of the study, the researcher would like to recommend applying the analytic scoring scale in portfolio assessment to improve students’ writing ability.


CHAPTER ONE INTRODUCTION

This thesis reports the results of a study of 104 third-year university students at Nong Lam University. The study was an experiment that examined the use of the analytic scoring scale in scoring portfolio assignments. It set out to check whether the analytic scoring scale enhances students’ writing ability better than the holistic scoring scale. The study also surveyed students’ attitudes towards the use of the analytic scoring scale in scoring the assignments of the portfolio assessment.

This chapter presents (1.1) the rationale, (1.2) the theoretical framework, (1.3) the purposes of the study, (1.4) the research hypothesis, (1.5) the research questions, (1.6) the importance of the study, (1.7) the scope of the study, (1.8) the definition of terms, and (1.9) the chapter summary.

1.1 - RATIONALE

Currently, in the portfolio assessment for third-year students at Nong Lam University, the holistic scoring scale has always been used for scoring writing papers. During the Essay Writing course, these students have to write about five or six essays of different types. These essays are scored as the in-term performance, and the in-term score is aggregated with the score that students get in the final exam to form the score for the course. As the procedure requires, these essays have been scored with the holistic scoring scale. Students can use the scores given to each assignment to regulate their learning strategies and improve their writing ability.

because “a single score does not allow [students] to distinguish between various aspects of writing such as control of syntax, depth of vocabulary, organization, and so on” (Park, 2004, p. 1). More seriously, students may misinterpret their success from the single score given; as Weigle (2002) asserted, “different papers can have the same scores but actually those same scores do not need to arrive at the same criteria, therefore, it is very difficult to interpret holistic scores” (as cited in Weir, 2005, p. 188). In years of teaching Essay Writing to third-year university students, the researcher has continually received complaints from these students that the scores given holistically by teachers do not tell them what their strengths and weaknesses are. To improve their writing ability on their own, these students need to learn their strengths and weaknesses from teachers’ feedback, that is, from the scores that teachers give their papers.

1.2 - THEORETICAL FRAMEWORK

Bachman and Palmer (1996) suggested a framework for evaluating tests in terms of their usefulness. The framework can help teachers decide which type of test to use. It proposes six qualities of test usefulness: Reliability, Construct Validity, Authenticity, Inter-activeness, Impact, and Practicality (Bachman and Palmer, 1996, pp. 17-38). Weigle (2002) commented on Bachman and Palmer’s framework, comparing holistic and analytic scales on the same six qualities of test usefulness.

Table 1.1 summarizes the differences between the holistic scoring scale and the analytic scoring scale on the six qualities of test usefulness, according to Weigle’s comment.

with more information about their writing ability; teachers and educators also benefit from the analytic scale, because it can show them what to focus on when instructing students. However, analytic scales are at a disadvantage compared with holistic scales when Practicality is taken into consideration: analytic scoring is a time-consuming and expensive method of scoring.

Quality | Holistic Scale | Analytic Scale
Reliability | lower than analytic but still acceptable | higher than holistic
Construct Validity | holistic scale assumes that all relevant aspects of writing ability develop at the same rate and can thus be captured in a single score; holistic score correlates with superficial aspects such as length and handwriting | analytic scales more appropriate for L2 writers as different aspects of writing ability develop at different rates
Practicality | relatively fast and easy | time-consuming; expensive
Impact | single score may mask an uneven writing profile and may be misleading for placement | more scales provide useful diagnostic information for placement and/or instruction; more useful for rater training
Authenticity | White (1995) argues that reading holistically is a more natural process than reading analytically | raters may read holistically and adjust analytic scores to match holistic impression
Inter-activeness (*) | not available | not available

(*) Inter-activeness, as defined by Bachman and Palmer in Weigle (2002), relates to the interaction between the test taker and the test. It may be that this interaction is influenced by the rating scale if the test taker knows his/her writing will be evaluated; this is an empirical question.

(Weigle, 2002, p. 121)

Table 1.1: The comparison of holistic and analytic scales on six qualities of test usefulness

Source: Assessing Writing, Weigle (2002)

Minnesota, the performance dimensions commonly found in analytic scoring schemes include Content, Vocabulary, Accuracy/Grammar/Language Use, Task fulfillment, Appropriate use of language, Creativity, Sentence structure/Text type, Comprehensibility, Organization, Style, Mechanics, Coherence and Cohesion. From this list, each educator or researcher has developed his or her own analytic scheme. Nakamura (2004) suggested an analytic scale with the five criteria of Originality of Content, Organization, Vocabulary, Grammar and Logical Consistency when he performed a study relating to test reliability. Tedick, Klee and Cohen (1995), drawing heavily upon the analytic rubrics developed by Hughey et al. (1983), developed an analytic scoring scale for scoring essays with the five criteria of content, organization, vocabulary, language use and mechanics. Tedick (2002) affirmed that the analytic rubrics developed by Hughey et al. were among the best-known analytic rubrics used for writing assessment in the field of English as a second language (Tedick, 2002, p. 35). The analytic scoring scale developed by Tedick, Klee and Cohen (1995) can be found in Appendix F, page 106. This analytic writing scale was revised in July 1996 by the Center for Advanced Research on Language Acquisition of the University of Minnesota (USA).

Table 1.2 briefly introduces the analytic scoring scale with the five criteria of Originality of Content, Organization, Vocabulary, Grammar and Logical Consistency developed by Nakamura (2004):

Points | Originality of Content | Organization | Vocabulary | Grammar | Logical Consistency
4 points | interesting ideas were stated clearly | well organized | very effective choice of words | almost no errors | sentences logically combined
3 points | interesting ideas were stated fairly clearly | fairly well organized | effective choice of words | few minor errors | sentences fairly logically combined
2 points | ideas [illegible] | loosely organized | fairly good vocabulary | some errors | sentences [illegible]
1 point | ideas not clear | ideas disconnected | limited range of vocabulary | many errors | many unfinished sentences

Table 1.2: The analytic scoring scheme suggested by Nakamura with five criteria
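The structure of Table 1.2 can also be written down as a small data type, which makes the contrast with a single holistic score concrete. The following Python sketch is illustrative only: the class name `AnalyticScore`, the two sample papers, and the use of a plain sum as a total are assumptions made for this example, not part of Nakamura’s scale or of the study’s data.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AnalyticScore:
    """One paper's scores on the five criteria of Table 1.2 (1-4 points each)."""
    originality_of_content: int
    organization: int
    vocabulary: int
    grammar: int
    logical_consistency: int

    def __post_init__(self):
        # Reject any criterion score outside the 1-4 range shown in Table 1.2.
        for f in fields(self):
            value = getattr(self, f.name)
            if not 1 <= value <= 4:
                raise ValueError(f"{f.name} must be between 1 and 4, got {value}")

    def total(self) -> int:
        # Assumption: a plain sum; the table itself does not define a total.
        return sum(getattr(self, f.name) for f in fields(self))


# Two hypothetical papers, not data from the study.
paper_a = AnalyticScore(4, 4, 2, 1, 3)
paper_b = AnalyticScore(2, 3, 3, 3, 3)
print(paper_a.total(), paper_b.total())  # 14 14
print(paper_a == paper_b)                # False
```

The two papers receive the same total, yet their profiles differ: the first is weak in grammar, the second in originality of content. A single score would hide exactly this kind of difference.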

The analytic scoring scale used in this study, drawing upon characteristics of the scales suggested by Hughey et al. and Nakamura, is based on five criteria. These criteria are, alphabetically, Coherence, Content, Grammar & Structure, Language used (consisting of Vocabulary, Spelling and Word used), and Organization.

1.3 - THE PURPOSES OF THE STUDY

The purpose of this study was to test the theory that the analytic scoring scale is better for enhancing the essay writing skills of second language students. The two groups in the study, the analytic group and the holistic group, were compared in terms of the results that their students achieved when their papers were scored with the analytic scoring scale or the holistic scoring scale, respectively. To this end, third-year students at Nông Lâm University were given a portfolio assessment with six homework assignments and a compulsory final test.

The study also aimed to examine students’ attitudes towards the use of the analytic scoring scale in the portfolio assessment. Students in the analytic group were asked to respond to a six-scale Likert-type questionnaire. The questionnaire was designed to examine what students in the analytic group thought about the use of the analytic scoring scale and how pleased they were with their improvement on the criteria of the writing skills.

1.4 - RESEARCH HYPOTHESES

The study was done under the hypothesis stating that:


1.5 - RESEARCH QUESTIONS

The study was carried out to answer the following research questions:

Research question 01: Does scoring portfolio assignments analytically enhance third-year university students’ essay writing ability better than scoring assignments holistically?

Research question 02: What do students think about the use of the analytic scoring scale in scoring their essay writing assignments?

1.6 - THE IMPORTANCE OF THE STUDY

The study gives an insight into the relationship between the scoring scale and students’ improvement in learning second language writing skills. This insight will help teachers and educators make a clear-cut decision about the scoring schemes used for assessing their students’ papers. Secondly, the study presents this relationship as a process rather than a product. This process not only reflects the current method of assessing writing papers but also shows how students improve. Lastly, the study offers feedback on the holistic scoring scale currently favored for writing skills. The holistic scoring scale can bring some advantages to teachers and educators in assessing students’ papers, but it brings students some disadvantages in their process of learning second language writing skills.

1.7 - THE SCOPE OF THE STUDY

This study confined itself to the effects that the analytic scoring scale had on the essay writing skills of third-year students at Nông Lâm University. These students had learned writing skills, but only the writing of paragraphs of different types.

Smalley & Mary K. Ruetten, in their Refining Composition Skills, which was used as the course material. These types of essays will be defined in the next section, Definition of Terms.

1.8 - DEFINITION OF TERMS

In this study, the following terms should be understood with their specific meanings.

The term portfolio assignments refers to the six homework assignments that students in the two groups had to submit as their compulsory in-term assignments. These assignments covered six types of essays: alphabetically, Argumentative, Cause-and-Effect, Classification, Comparison and Contrast, Example, and Process Analysis. The term portfolio assessment refers to the process of assessing these portfolio essay writing assignments.

The term analytic group refers to the class whose assignments were scored with the analytic scoring scale. That is to say, each paper was given five scores for the five criteria of Coherence, Content, Grammar & Structure, Language used and Organization. Besides these scores, a total score was also given. The procedure for giving the total score is described in section 3.3.4.2, page 38.

The term holistic group refers to the class whose assignments were scored with the holistic scoring scale. For this group, each paper was given a total score.

The term essay writing skills refers to the ability to develop an essay of approximately 300 words in each of the six types of essays mentioned above.

In this study, sub-skills of writing, such as Coherence, Content, Grammar, Language used, Organization, and others, are referred to with the terms criteria (plural) and criterion (singular). These terms are completely synonymous with the terms feature(s), component(s), aspect(s) and sub-skill(s) used by other researchers.

1.9 - SUMMARY

Briefly, this chapter has presented the general aspects of the study titled The Use of the Analytic Scoring Scale in Assessing Third-year Students’ Portfolio Essay Writing Assignments at Nong Lam University. The study aimed to show that the analytic scoring scale is more effective than the holistic one in this field, especially when the population of students is not large. Only 104 students in the two groups, the analytic group and the holistic group, were involved. The study also surveyed learners’ attitudes towards the use of the analytic scoring scale in scoring their portfolio assignments. The study was based on the hypothesis that the analytic scoring scale enhances students’ essay writing ability better than the holistic scoring scale. This chapter also


CHAPTER TWO LITERATURE REVIEW

This chapter reviews the literature relevant to the aspects on which the study focuses: portfolio assessment, the analytic scoring scheme and the holistic scoring scheme. The results of previous research related to the study are also discussed in the relevant sections to give in-depth information about these aspects before the research method is presented in the next chapter.

2.1 - PORTFOLIO ASSESSMENT


disciplines (as cited in Conrad, 2001, p. 2). Armstrong Smith (1991) affirmed that timed exams of writing per se do not truly reflect the purposes of the essay tests that are used in academic content classes, which is significant in light of the fact that the mere existence of essay exams in academic courses has been seen as a justification for their use in writing assessment. Furthermore, some research suggests that second-language writers are particularly disadvantaged when it comes to timed-essay exams (Conrad, 2001, p. 2). Ruetten (1994), in examining the pass rates of ESL students on an institutional exit proficiency exam, found that ESL students were twice as likely to fail the exam; similarly, evidence from Byrd & Nelson’s study pointed to a significantly high failure rate on timed-essay exams for nonnative students who were otherwise academically successful (Byrd & Nelson, 1995).

To address these two limitations of writing assessment, the portfolio approach has been introduced as an alternative. The use of writing portfolios as assessment instruments has been hailed to a certain extent as a potential answer to the shortcomings of both the indirect multiple-choice writing tests and the more direct timed-essay assessment (Conrad, 2001).

Portfolios share the common goal of other “alternative, authentic, or performance” assessments, which is essentially to provide evidence regarding the complex processes in which students engage in actual, real-life performances (Camp, 1993; Brown & Hudson, 1998; Gitomer, 1993; Huerta-Macias, 1995; and Linn, Baker, & Dunbar, 1991, in Conrad, 2001, p. 1). This kind of writing assessment also reflects the evolution over the past few decades that has resulted from the emergence of current theories of learning and education. Whereas learning was once thought of as a linear progression of acquired knowledge and skills, it is now seen more as a complex, nonlinear process that involves dramatic and intermittent changes in the learner’s understanding and ability (Wolf, Bixby, Glenn, & Gardner, 1991, in Conrad, 2001, p. 1). Because of this shift in the way that learning is viewed, it has become a goal of educational assessment to develop instruments that better measure learning in light of its complexity and nonlinear nature. Portfolio assessment was introduced in response.

In short, a portfolio is a collection of student work that exhibits the student’s efforts, progress and achievements in one or more areas (Martin-Kniep, 2004, p. 66). More specifically, in terms of writing assessment, a portfolio is a collection of written texts written for different purposes over a period of time (Weigle, 2002, p. 198). Portfolio approaches to assessing literacy have been described in a wide variety of publications, so many descriptions of portfolios exist. Depending on the purposes they serve, portfolios are classified into a few main kinds, which have dozens of variations. Herman et al. (1996) discussed three types of portfolios: the showcase portfolio, which contains a student’s best pieces only; the progress portfolio, which documents evidence of growth over time; and the working portfolio, which contains all work done for a course or at least samples that represent the major learning goals or units of a course (as cited in Weigle, 2002, p. 214). Martin-Kniep (2004) introduced five kinds of portfolios: the showcase portfolio, the development (or growth) portfolio, the process portfolio, the transfer portfolio and the keepsake portfolio (Martin-Kniep, 2004, p. 68). Of Martin-Kniep’s kinds of portfolios, the process portfolio is similar to the working portfolio discussed by Herman et al.; the transfer portfolio is derived from the preceding portfolios and is used for communicating with students’ subsequent teachers, school, or admissions office when students move from one educational setting to another; and the keepsake portfolio serves as a repository of students’ favorite or memorable work.

While an all-encompassing definition of the writing portfolio is difficult to arrive at, the portfolio programs used at many institutions seem to share a number of commonalities, which will be used to operationalize the term portfolio for the remainder of this paper. They include the following:

1. Multiple samples of writing gathered over a number of occasions
2. Variety in the kinds of writing or purposes for writing that are represented
3. Evidence of process in the creation of one or more pieces of writing
4. Reflection on individual pieces of writing and/or on changes observable over time

(Camp & Levine, 1991, p. 197)

Generally, a literacy portfolio is a systematic collection of a variety of teacher observations and student products, collected over time, that reflect a student’s developmental status and progress made in literacy. Currently, in the field of education generally, and in educational assessment (especially as represented by the new journal Educational Assessment) and composition pedagogy in particular, the use of portfolios is attracting great attention (Belanoff & Dickson, 1991; Yancey, 1992, in Hamp-Lyons and Kroll, 1997, p. 18). Portfolios have been heralded as a favorable assessment approach to the writing of E.S.L. students (Brookes, Markstein, Price, & Withrow, 1992; Valdez, 1991, in Hamp-Lyons and Kroll, 1997, p. 18) because of the benefits of extra writing time, access to support services such as writing centers, and for other reasons.

Martin-Kniep (2004) gives eight good reasons for teachers to consider using portfolio procedures in assessing students’ work. These reasons are:

1. Documentation of students’ best work, effort, and growth
2. Focus on authentic performance, or knowledge in-use
3. Student access to structured opportunities for self-assessment and reflection
4. Evidence of thinking
5. Thick description of student learning
6. Validation of a development of learning
7. Choice and individualization for students and teachers
8. Opportunities for conversations with different audiences

This approach has some clear advantages. Brown & Hudson (1998) affirmed that the potential benefits of the use of writing portfolios can be described in terms of the ability to enhance not only the process of writing assessment, but also student learning and the role of the teacher. Martin-Kniep added that portfolios can provide a multi-dimensional view of students’ development and achievement (Martin-Kniep, 2004, p. 66). Williams (2003) claimed that the approach forces students to consider readers other than their teacher and their peers as part of their audience. It also creates a sense of collegiality often missing among faculty members. According to White (1994), “portfolios bring teaching, learning, and assessment together as mutually supportive activities, as opposed to the artificiality of conventional tests” (White, 1994, p. 27).

Moreover, reflection is a critical attribute of portfolios. Hamp-Lyons and Condon (2000) stated that “everything that we have read about how and why portfolios work successfully, as pedagogical tools, teacher development tools, and as assessment tools, teaches that without reflection all we have is simply a pile, or a larger folder” (as cited in Weigle, 2002, p. 200). Reflection supports the use of portfolios because it becomes the means through which students can study themselves and their work. It is a staple of action research as learners ponder, study, and evaluate their practices (Martin-Kniep, 2004, p. 71). These methods of assessment also provide evolving images of students’ work and, accompanied by students’ reflections, enable readers to witness what students think about themselves as learners. Reflection is also tied to rubrics because it enables students to refer to explicit performance criteria to monitor their learning. Portfolios are argued to be the most comprehensive tool for documenting students’ growth, efforts, and achievements.

Another advantage of portfolio assessment is that it gives students additional time to develop their ideas for the assignments. Silva (1993) and Hamp-Lyons & Condon (2000) affirmed that speeded tests such as timed writing examinations frequently put non-native writers at a disadvantage (as cited in Weigle, 2002, p. 202). The additional time gives students opportunities to revise and edit their writing before submitting it. Hamp-Lyons (1996) believed that the increased amount of time for revisions allows nonnative students the opportunity to correct any “fossilized errors” that might otherwise have surfaced and gone unrevised under the unnatural time constraints of an essay test. Hamp-Lyons & Condon (2000) claimed that nonnative writers benefit, in a portfolio-based assessment context, from having the chance to revise the various aspects of their writing (e.g., idea development, organization, grammar/mechanics) at different stages in the drafting process, instead of having to attend to competing textual needs at the same time. A portfolio may therefore represent a student’s writing ability better than a timed essay can (Weigle, 2002, p. 202) and help teachers form a true picture of their learners. Based on research done at Stony Brook, State University of New York, Elbow & Belanoff (1986) reported that portfolio assessment better recognizes the intricacies involved in the various stages of process writing.

To date, because communicative approaches, with their emphasis on oral proficiency, have tended to de-emphasize writing skills (Homstad and Thorson, 1994, p. 1), portfolio assessment may not be relevant or feasible for many second language learners in non-academic settings. However, it is important for those involved in second language assessment at every level to be aware of the potential benefits, as well as the potential drawbacks, of portfolios so that they can decide for themselves whether the benefits outweigh the drawbacks (Weigle, 2002, p. 229). Recognizing the value of portfolio assessment, some states in the USA, such as Kentucky, mandated this approach in the mid-1990s for all writing assessments in public schools statewide (Williams, 2003, p. 330).

2.2 - ANALYTIC VERSUS HOLISTIC SCORING SCHEMES

The evaluation of the writing ability of second language students has become increasingly important in recent years because the results of such evaluations are used for a variety of administrative, instructional, and research purposes. One of the first decisions to be made in determining a system for directly assessing writing quality is what type of scoring procedure will be used: Should a single score be given to each text, or should the different criteria of a text be scored separately? This issue has been the subject of a great deal of research and discussion in the composition literature.

Regarding the role of scoring schemes in students’ improvement, Bachman and Palmer (1996) claimed that scoring schemes play an important role in improving students’ second language skills (Bachman and Palmer, 1996, pp. 193-195). For writing skills, the past 20 years have seen two major changes in the teaching and rating of second language writing papers. In teaching, emphasis is shifting from product to process, and timed-essay assessment has yielded precedence to portfolio assessment. In rating, indirect, “objective” tests have given way to direct assessment of samples of student writing. Writing assessment research, like oral proficiency assessment research, has focused on improving the reliability of rating scales and procedures (Connor-Linton, 1995, p. 762).

Before the 1970s, the quality of student writing was typically assessed through the use of indirect (and usually multiple-choice) tests of usage and mechanics (Huot, 1994, in Conrad, 2001, p. 1). In accordance with a somewhat antiquated theory of learning, these tests were founded on the underlying assumption that the ability to write is fundamentally governed by the linear acquisition of a discrete set of skills. From an educational measurement standpoint, the presumed advantages of these types of tests included the notion that they could be objectively and reliably scored (Conrad, 2001, p. 1). The most common indirect writing exam was the multiple-choice type. This type, however, was thought to lack both reliability and validity. Williams (2003) affirmed that “for using multiple-choice exams to assess writing skills, lack of reliability was deemed the biggest problem with classroom assessment, and lack of validity was deemed the biggest problem with large-group assessment” (Williams, 2003, p. 318). According to Conrad (2001), indirect tests of writing lack validity because they do not accurately represent the construct of writing, or in other words, what it means to write. Specifically, multiple-choice items are not effective for measuring the ability to organize or express ideas, formulate arguments, or demonstrate novel thought.

The best way to improve our learners’ writing skills is undeniably to let them write and write more. Students can demonstrate their writing ability only through their written papers, not through questions or exercises on specific criteria of writing skills. Multiple-choice tests may have some advantages in assessing some criteria of learners’ writing ability, but they fail to reflect learners’ real writing ability because they give learners no chance to write. A measure for evaluating pieces of writing should therefore be developed that is not only valid and reliable but also serves the purpose of asking students to demonstrate their membership in the community of fluent writers of English. Researchers have introduced holistic scoring schemes and analytic scoring schemes to replace the multiple-choice exam.

2.2.1 — Holistic Scoring Schemes

Until the 1960s, the writing ability of large numbers of students was assessed chiefly with multiple-choice exams. Such measures are called indirect assessments of writing skills: they typically ask questions about spelling, punctuation, and editing, but do not require learners to produce a written text. Some teachers and researchers, however, were concerned that such exams were not very effective, lacking both reliability and validity (Williams, 2003, p. 318). It was argued that the only way to measure writing was to ask students to write and that multiple-choice tests were invalid.

In response to this criticism, the Educational Testing Service (E.T.S.) of the United States of America decided to explore the possibility of developing a valid and reliable way to evaluate writing. After several years of effort, the Service came up with the method known as holistic scoring (White, 1986). The method “involves having readers evaluate a writing passage for its overall effectiveness, as a whole, rather than by considering its individual features such as word use, grammar, punctuation, organization, and style in isolation” (Glowacki, 1992, p. 14). Hobson & Steele (1992) defined the scoring method as “a procedure for assessing writing in which a reader judges a writing sample for its overall effectiveness” (Hobson & Steele, 1992, p. 4). In holistic scoring procedures, essays are not given low scores just because they contain many mechanical errors, nor are they given high scores just because they are well organized. According to Park (2004), this approach to writing assessment aims to rate the overall proficiency level reflected in a given sample of student writing (Park, 2004, p. 1). The reader considers the overall impression created by the student's paper and assigns a score consistent with that overall impression. Holistic scoring involves looking at the whole essay, not just parts of it. According to White (1994), the procedure is based on the notion that evaluating writing skill does not consist of measuring a set of sub-skills, such as knowledge of punctuation conventions, but rather of measuring “a unit of expression” (White, 1994, p. 18). Clearly, some things are more important than


The method quickly became popular as an effective means of testing large numbers of students, especially at the university level (Williams, 2003, p. 318). In the early 1980s, individual teachers started using holistic scoring in their own classrooms to assess their students’ writing papers. This popularity arose because “holistic evaluation can be as reliable as multiple-choice testing and since it is always more valid, it should have the first claim on our attention when we need scores to rank-order a group of students” (Cooper and Odell, 1977, p. 4).

Of the two new scoring schemes, the holistic scheme has found more favor among teachers and educators. It is believed to be valid, highly reliable, and quick to apply (Williams, 2003, p. 318). Three major tests of English as a second or foreign language have used holistic scales to rate papers: the Test of English as a Foreign Language (T.O.E.F.L.), the Cambridge First Certificate in English (F.C.E.), and the International English Language Testing System (I.E.L.T.S.). The holistic scoring scales suggested by the organizations behind these tests can be found in Appendix C, page 100, Appendix D, page 102, and Appendix E, page 104.

This method of scoring asks raters to “read quickly and make a judgment about the total effectiveness of the writing sample with factors such as organizations, spelling, and grammar considered to be of equal importance” (Hobson & Steele, 1992, p. 4). Hobson & Steele (1992), moreover, advised that raters “should not reread the paper to justify the score in terms of specific errors” (Hobson & Steele, 1992, p. 6). It is thought that a reader who reads a paper very quickly can assess it more reliably. Skilled holistic readers, therefore, are encouraged to take only a minute or two to go through a two-page paper, because the more time readers take to get through a paper, the more inclined they are to begin mentally editing and focusing on surface errors (Williams, 2003, p. 319). Thus, holistic scoring emphasizes the importance of the communicative content of the writing sample (Terry, 1989, p. 42).

2.2.1.1 — Advantages of Holistic Scoring Schemes

Compared to analytic scoring schemes, holistic scoring schemes are very advantageous when raters have to deal with a large population of test-takers. For this reason, holistic scoring schemes are commonly used in large-scale assessment of writing (Park, 2004, p. 1), especially at the university level, because they are valid, highly reliable, and quick to apply (Williams, 2003, p. 318).

Holistic scoring schemes are also more economical than analytic scoring schemes. Since readers are required to make only one decision (i.e., a single score) for each writing sample, scorers can assign a global rating quickly (Hoffman & Holden, 1997, p. 2). Cooper (1984) believed that the scoring scheme, although having little instructional value, offers the cheapest and best means of rating essays for the rank ordering and selection of candidates (Cooper, 1984, Abstract). Furthermore, Terry (1989) argued that holistic scoring is a more efficient and effective method of evaluating written work than meticulous, tedious discrete-point scoring (Terry, 1989, pp. 42-54). Summarizing ideas of Tedick (2002), Muller (2002) and TeacherVision.com (2000-2002), the Center for Advanced Research on Language Acquisition (CARLA) of the University of Minnesota listed the advantages of the holistic scoring scales as:

• The holistic scoring scales are often written generally and can be used with many tasks.
• The holistic scoring scales emphasize what students can do, rather than what students cannot do.
• The holistic scoring scales save time by minimizing the number of decisions scorers must make.
• Trained scorers tend to apply the holistic scoring scales consistently, resulting in more reliable measurement.
• The holistic scoring scales are usually less detailed than analytic scales and may be more easily understood by younger students.

2.2.1.2 - Disadvantages of Holistic Scoring Schemes

Despite its advantages, the method has also been criticized by many researchers. The major disadvantage of holistic scoring stems from the limits of the single score, which gives useful ranking information but no details. That is, holistic scoring cannot provide useful diagnostic information about a person’s writing ability, as a single score does not allow raters to distinguish between various aspects of writing such as control of syntax, depth of vocabulary, organization, and so on (Park, 2004, p. 1). Quellmalz (1986) and Weigle (2002) made the same point, stating that “one drawback to holistic scoring is that a single score does not provide useful diagnosis information about a person’s writing ability” (Quellmalz, 1986, in Hoffman & Holden, 1997, p. 3; Weigle, 2002, in Weir, 2005, p. 188). Hamp-Lyons (1995) indicated that “the writing of second language English users is particularly likely to show varied performance on different traits, and if we do not score for these traits and report the scores, much information is lost” (Hamp-Lyons, 1995, p. 760). The author concluded that “in order to reach a reasonable balance among all the essential elements of good writing, readers need to pay conscious attention to all those elements” (Hamp-Lyons, 1991, as cited in Weir, 2005, p. 189). Weir (2005) claimed that this scoring scheme might fail to credit writers’ efforts, warning that “if the overall score has been affected by just one or two aspects of the work, it is very dangerous to evaluate the effort of a writer” (Weir, 2005, p. 188).

Additionally, this shortcoming of the holistic scale may mislead learners about their efforts. Writing specialists agree that a single holistic result does not allow those who read it to distinguish between the various criteria of a writing paper. According to Weigle (2002), different writing papers can receive the same score without having met the same criteria; it is therefore very difficult to interpret holistic scores (as cited in Weir, 2005, p. 188). Hamp-Lyons (1995) observed that scores generated holistically cannot be explained to other readers in the same assessment community. The author characterized a holistic scoring system as a closed system, offering no windows through which teachers can look in and no access points through which researchers can enter (Hamp-Lyons, 1995, pp. 760-761). Consequently, “writers who have pieces of writing scored under holistic scoring system cannot be protected against the influence on raters’ scores of features of writers’ text where the scoring system obscures the basis for scores” (Hamp-Lyons, 1995, p. 761). This is especially problematic for second language writers, since different aspects of writing ability may develop at different rates for different second language learners (Weigle, 2002, in Weir, 2005, p. 188; Park, 2004, p. 1). Because the objective of the writing assessment program is to give students feedback about the strengths and weaknesses of their writing, the holistic scoring scale does not benefit students, as not all criteria are taken into consideration when a paper is evaluated.

Weigle (2002), additionally, introduced another disadvantage of holistic scoring schemes: superficial characteristics such as length and handwriting may influence the score given to a piece of writing. The author studied scores assigned under both holistic and analytic scoring schemes and found that scores assigned under the holistic scoring system correlate with those superficial characteristics. This correlation appears when long papers and papers in poor handwriting often receive low scores (as cited in Weir, 2005, p. 188).

Drawing on the ideas of Tedick (2002), Muller (2002) and TeacherVision.com (2000-2002), the Center for Advanced Research on Language Acquisition (CARLA) of the University of Minnesota also summarized the disadvantages of the holistic scoring scales:

• The holistic scoring scales do not provide specific feedback to test takers about the strengths and weaknesses of their performance.
• Test takers’ performances may meet criteria in two or more categories, making it difficult to select the one best description.
• Criteria cannot be differentially weighted.

2.2.1.3 - Holistic Scoring Scales

Below are the three holistic scoring scales used for the Test of English as a Foreign Language (T.O.E.F.L.), the Cambridge First Certificate in English (F.C.E.) and the International English Language Testing System (I.E.L.T.S.). The differences among these scoring scales depend on the purposes of the tests.

In this research, the holistic scoring scale was developed from the holistic scoring scale of the International English Language Testing System (I.E.L.T.S.), with some minor differences that reflect the purposes of the tests and the requirements of the University. The holistic scoring scale used in this research can be found in Appendix G, page 110.

a - Test of English as a Foreign Language (TOEFL)

The Test of English as a Foreign Language (T.O.E.F.L.) aims to determine test takers’ ability to generate and organize ideas, to support those ideas with examples or evidence, and to compose in standard written English. According to Weigle (2002), the test is intended to evaluate the English proficiency of people whose native language is not English, and the scores are used primarily in decisions about admission to colleges and universities in the United States of America and Canada (Weigle, 2002, p. 141).

The writing paper for this test is scored on a six-point holistic scale. A score of 6 is the best, and a score of 0 is given when a paper contains no response, merely copies the topic, is off-topic, is written in a foreign language, or consists of only keystroke characters. The T.O.E.F.L. holistic scale is in Appendix C (page 100).

b - Cambridge First Certificate in English (FCE)

The First Certificate in English examination is part of a suite of English language examinations at five levels of proficiency administered by the University of Cambridge Local Examinations Syndicate. These examinations are used for certifying English language proficiency for a variety of purposes. For example, examinees who pass the third level of proficiency in the system are presumed to have sufficient language proficiency for office work or to pursue a training course in English.

The holistic scale scores writing papers on five bands. Besides the instructions for raters, the scale describes the overall effect that a paper must achieve to meet each band. The highest is band five, whose overall effect is “a very positive effect on the target reader”, and the lowest is band zero, which is given to papers with too little language for assessment. The First Certificate in English holistic scale is in Appendix D (page 102).

c - International English Language Testing System (IELTS)

The International English Language Testing System (I.E.L.T.S.) is intended to assess the language ability of candidates who need to study at the post-secondary or university level or work in a professional capacity where English is used as the language of communication. According to Weigle (2002), the level of English targeted by I.E.L.T.S. is more similar to that of the T.O.E.F.L. than to that of the F.C.E., which is targeted at a somewhat lower level of English proficiency (Weigle, 2002, p. 155).

Scores on the I.E.L.T.S. are reported as band scores between 1 (non-user) and 9 (expert user). Separate band scores are reported for each skill section as well as an overall band score. The overall band descriptors are found in Appendix E (page 104).

2.2.2 — Analytic Scoring Schemes

On the whole, analytic scoring schemes are means of scoring writing papers by breaking the final product down into separate criteria, each of which is scored independently. The procedures of this method involve the separation of the various features of a composition into categories for scoring purposes (Park, 2004, p. 1). The total score is the sum of the ratings for all of the parts being evaluated. When using analytic scoring schemes, it is necessary to treat each criterion as separate to avoid bias towards the whole product. Depending on the purpose of the assessment, writing papers might be rated on such criteria as content, organization, cohesion, register, vocabulary, grammar, or mechanics. Unlike the holistic scoring scheme, this method avoids a potential flaw of global impression band scales, namely that they cannot reflect uneven development across the different criteria (Weir, 2005, p. 189). Moreover, with this approach a teacher can easily give more weight to a particular criterion by assigning it a coefficient when he/she thinks that students should focus on that criterion. For example, of the five criteria that Nakamura (2004) introduced in the previous chapter (section 1.2 - Theoretical Framework, page 2), if a teacher pays more attention to the organization of the writing papers, he/she can give that criterion a coefficient of two before the total score of the writing papers is calculated.
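
The weighting described above can be illustrated with a minimal Python sketch. The criterion names, the 1-5 rating range, and the function name `analytic_total` are illustrative assumptions, not the scale used in this study; any criterion without an explicit coefficient is assumed to have a coefficient of one.

```python
def analytic_total(ratings, coefficients=None):
    """Return the weighted sum of per-criterion ratings.

    ratings: dict mapping a criterion name to its rating.
    coefficients: optional dict mapping a criterion name to its weight;
    any criterion not listed is weighted 1.
    """
    coefficients = coefficients or {}
    return sum(score * coefficients.get(name, 1) for name, score in ratings.items())


# Hypothetical ratings for one essay on a 1-5 scale (illustrative only).
essay = {"content": 4, "organization": 3, "vocabulary": 4, "grammar": 2, "mechanics": 3}

unweighted = analytic_total(essay)                     # plain sum of the five ratings
weighted = analytic_total(essay, {"organization": 2})  # organization counts twice
print(unweighted, weighted)
```

With a coefficient of two, a one-point gain in organization moves the total by two points, which is how the weighting signals the teacher's emphasis to students.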

2.2.2.1 — Advantages of Analytic Scoring Schemes

Analytic scoring schemes are preferred over holistic schemes by many writing specialists for a number of reasons. First, as mentioned above, they provide more useful diagnostic information about students’ writing abilities. That is, they tell learners where their weaknesses and strengths are. Analytic scoring has been considered a more interpretable scoring approach because it assesses the examinee’s specific strengths and weaknesses and identifies the particular components of writing that an examinee needs to develop (Downing & Haladyna, 2006, p. 314). Erwin (1991) reported that holistic scores provide valuable information for an overall categorization of writing ability, but analytic scores provide more diagnostic information (as cited in Hughes, 2006, p. 314). The information also allows instructors and curriculum developers to tailor instruction more closely to the needs of their students. To a certain extent, the analytic scoring scale is a useful tool for giving students feedback on their strengths and weaknesses. Park (2004) pointed out that the explicitness of analytic scoring guides offers teachers a potentially valuable tool for providing writers with consistent and direct feedback (Park, 2004, p. 2). Second, analytic scoring schemes are particularly useful for second language learners, who are more likely to show a marked or uneven profile across different aspects of writing. Some second language learners may have excellent writing skills in terms of content and organization, but much lower grammatical control; others may have excellent control of sentence structure, but may not know how to organize their writing in a logical way. In this respect, analytic scoring scales can show students that they have made progress over time in some or all dimensions when the same rubric categories are used repeatedly (Moskal, 2000).

Analytic schemes have also been found to be particularly useful for scorers who are relatively inexperienced (Weir, 2005, p. 190). In his 1990 research, Weir reported that a multi-trait analytic mark scheme is seen as a useful tool for the training and standardization of new examiners (Weir, 2005, p. 190). Other authors maintained that, compared to holistic scoring schemes, analytic scoring schemes are easier to train scorers to use, and that inexperienced scorers may find an analytic scale easier to work with than a holistic one because they can evaluate specific textual criteria (Park, 2004, p. 2; Cohen, 1994; McNamara, 1996).


2.2.2.2 - Disadvantages of Analytic Scoring Schemes

The major disadvantage of scoring analytically is that it takes a lot of time to rate writing papers (Hughes, 2003, in Weir, 2005, p. 191; Park, 2004, p. 2; Nakamura, 2004, p. 45), because readers are required to make more than one decision for every writing paper. When scoring analytically, a reader has to check, consider, and score each criterion of writing ability and then give a total score according to the coefficients that have been set. This is a real hardship for a scorer who has to work through piles of assignments.

Critics of analytic scoring schemes also point out that measuring the quality of a text by tallying accumulated sub-skill scores diminishes the interconnectedness of written discourse. In this respect, it is thought that “the whole should be greater than the sum of its parts”. Tedick (2002) noted that “separate scores for different aspects of a student’s writing may be considered artificial in that it does not give the teacher (or students) a good assessment of the “whole” of a performance” (Tedick, 2002, p. 36). Hillocks (1995) and White (1994) shared Tedick’s view, pointing out that measuring the quality of a text by tallying accumulated sub-skill scores gives the false impression that writing can be understood and fairly assessed by analyzing autonomous text features (Park, 2004, p. 3). Hughes (1989) pointed out that concentration on the different aspects may divert attention from the overall effect of the piece of writing. Inasmuch as the whole is often greater than the sum of its parts, a composite score may be very reliable but not valid (Hughes, 1989, pp. 93-94). Similarly, Fowles (1978) claimed that analytic scoring often tends to reduce and oversimplify the components of writing, and to emphasize the flaws rather than the strengths of writing (as cited in Hughes, 2006, p. 314).

Hughes (2003) warned that in analytic scoring, the criterion scored first may affect the criteria that are scored later, so that the overall impression of a writing paper is skewed toward an individual criterion (as cited in Weir, 2005, p. 191). Futcher (2009), drawing on Thorndike’s idea, named this carry-over the halo effect. The phenomenon is defined as a problem that arises in data collection when there is carry-over from one judgment to another. In other words, when scorers are asked to make multiple judgments they really make one, and this affects all other judgments. If scorers are given five scales, each with nine points, and they award a score of five on the first scale for a piece of writing, it is highly likely that they will score five on the second and subsequent scales, and be extremely reluctant to move too far away from this generally. As a result, profiles tend to be “flat”, defeating the aim of providing rich, informative detail on learner performance (Futcher, 2009). Consequently, criteria scales may not be used effectively according to their internal criteria, resulting in a halo effect in which one criterion score may influence another.

An additional problem with some analytic scoring schemes is that even experienced essay judges sometimes find it difficult to assign numerical scores based on certain descriptors (Hamp-Lyons, 1989). In this respect, the Center for Advanced Research on Language Acquisition of the University of Minnesota (2011) claimed that scorers may disagree with one another: it is more difficult to achieve intra- and inter-rater reliability on all of the dimensions in an analytic scoring scheme than on a single score yielded by a holistic scale. Also, on the scorers’ part, McNamara (1996) reported some evidence that scorers tend to evaluate grammar-related categories more harshly than they do other categories (McNamara, 1996), thereby overemphasizing the role of accuracy in providing a profile of students’ proficiency. This disadvantage is inevitable, especially with untrained or inexperienced scorers. Grammar-related categories are largely matters of right or wrong, whereas the other categories call for judgment, and focusing on right-or-wrong categories will always be easier than making judgments. White (1985) added further limits of analytic scoring: the lack of agreement about what separate traits exist, and its tendency to complicate the assignment of scores for readers, increasing time and therefore costs (as cited in Hughes, 2006, p. 314).


2.2.3 - Validity and Reliability in Two Scoring Schemes

Researchers have seemingly failed to reach a consensus on the validity and reliability of the two scoring schemes. While Perkins (1983) claimed that “holistic scoring has the highest construct validity when overall attained writing proficiency is the construct assessed” (Perkins, 1983, p. 652), Weigle (2002) noted that holistic scoring schemes may lack validity, stating that “holistic scoring has also come under criticism in recent years for its focus on achieving high inter-rater reliability at the expense of validity” (Weigle, 2002, in Weir, 2005, p. 188). Connor-Linton (1995) shared Weigle’s view, stating that writing assessment research, “like oral proficiency assessment research, has focused on improving the reliability of rating scales and procedures, potentially at the cost of the ratings’ validity” (Connor-Linton, 1995, p. 762). However, Nakamura (2004) studied ninety second language English writers under both holistic and analytic scoring schemes and reached a different conclusion about validity. According to the report, the correlation between the students' scores on the two scales was .95 (a shared variance of over 90%), which is extremely high. Nakamura wrote that holistic scale results might therefore predict analytic scale results with a high degree of probability and vice versa, and maintained that the two scoring scales were equally valid (Nakamura, 2004, p. 48). For Charney, the question was still unsettled: “the validity of holistic scoring remains an open question” (Charney, 1984; in McLean, 1992).
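
The link between the reported correlation and the "over 90%" figure can be checked with a short calculation: the proportion of variance that two sets of scores share is the square of their Pearson correlation. The Python sketch below is only illustrative; the function names and the sample score lists are assumptions, and only the value .95 comes from the text.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def shared_variance(r):
    """Proportion of variance shared by two score sets with correlation r."""
    return r ** 2


# The correlation reported in the text: .95 squared is a little over .90.
print(round(shared_variance(0.95), 4))

# Hypothetical holistic and analytic totals for five papers (illustrative only).
holistic = [5, 6, 7, 8, 9]
analytic = [52, 60, 71, 79, 92]
print(round(shared_variance(pearson_r(holistic, analytic)), 3))
```

A high shared variance shows only that the two scales rank the same papers in a similar order; it does not by itself show that either scale measures the intended construct.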

Views also differ on the reliability of the two scoring schemes. This polarization may arise because there is no agreement about how to measure the reliability of writing scores and because many different ways of computing it are available. The studies conducted by White & Polin (1986) and Swartz, Patience & Whitney (1985) showed that inter-rater correlation was higher for the holistic scoring scales (Glowacki, 1992, p. 17). In contrast, the study conducted by Bauer (1981) showed that the analytic scoring scale had higher reliability than the holistic scoring scale, specifically .954 and .928, respectively (Glowacki, 1992, p. 17). Similarly, the research by Nakamura (2004) with ninety second language English writers, mentioned previously, found that the analytic scoring scheme was more reliable.

On the whole, despite the drawbacks mentioned above, researchers in both first language and second language writing generally agree that analytic schemes are more reliable, provided that guidelines pertaining to rater training and rating session administration are faithfully adhered to (Perkins, 1983; White, 1994, in Park, 2004, p. 1).

To date, little empirical research has indicated that second language writers gain an advantage from assessment by portfolio rather than by traditional writing assessment (Hamp-Lyons & Kroll, 1997, p. 18). In addition, there has been little literature on using the analytic scoring scheme to enhance the essay writing skills of university students. Hafner & Hafner (2003) claimed that “although analytic rubrics have emerged as one of the most popular assessment tools in progressive educational programs, there is an unfortunate dearth of information in the literature quantifying the actual effectiveness of the rubric as an assessment tool”. Meier et al. (2006) shared the idea, stating that “the bulk of existing research on rubrics has been done in the area of English writing and composition, and these studies have typically focused primarily on holistic essay grading rather than analytic, rubric-based assessment”. Andrade, Du & Wang (2008) affirmed that quantitative and experimental research methods have rarely been used in this regard. Brookhart (2005) suggested that additional study is needed to ensure that assessment in all subjects can be reliably judged with rubrics. Because of this dearth of information in the literature, holistic scoring schemes have been used intensively at Nong Lam University as well as at other educational institutions, regardless of the disadvantages that the schemes bring to students.


The researcher therefore intends to support third-year students at Nong Lam University by using the analytic scoring scale to score their portfolio assignments. The researcher believes that the scoring scale will contribute to enhancing these students’ writing ability.

2.3 - CONCLUSION

In short, this chapter reviews the literature relevant to the aspects on which the research focuses. The first section gives a brief overview of portfolio assessment. The next section covers the two schemes for evaluating students’ writing ability, the analytic scoring schemes and the holistic scoring schemes, and discusses the advantages and disadvantages of each scheme in detail. The last section addresses the validity and reliability of the two schemes, examining previous studies to weigh the validity and reliability of each.


CHAPTER THREE - RESEARCH METHOD

This chapter discusses the methods and procedures used in this study. It describes the design under which the research was carried out, the participants in the research, the instruments and materials used in the study, and the procedure by which the study was conducted. The chapter also discusses the variables in the study, consisting of the independent variable and the dependent variables, and the way the researcher controlled the threats to the validity of the research. The last two sections of this chapter cover the assumptions and limitations.

3.1 - RESEARCH DESIGN

The study used a non-equivalent two-group pretest-posttest quasi-experimental design to check whether the use of the analytic scoring scale would help students improve their writing ability more than the holistic scoring scale. The two groups were taught by the same teacher, with the same syllabus, over the same amount of time, and were assigned the same amount of homework. The only difference between the two groups in the process of studying was that the holistic group’s papers were scored holistically and the analytic group’s papers were scored analytically. During the course, all of the students were required to write a series of six essays of different types as part of a portfolio assessment. At the end of the course, a common final exam was held for all students in the two groups. Independent t-tests were then used to check the differences between the sets of scores.
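As an illustration of the comparison described above, the following is a minimal sketch of an independent-samples t-test. It assumes the pooled-variance (Student's) form, which the study does not specify; the function name `independent_t` and all scores are hypothetical and invented for illustration only.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(sample_a, sample_b):
    """Return (t, df) for Student's independent-samples t-test (pooled variance)."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    standard_error = sqrt(pooled * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df


# Hypothetical final-exam scores, NOT data from the study.
analytic_scores = [7.0, 7.5, 8.0, 6.5, 8.5, 7.0]
holistic_scores = [6.0, 6.5, 7.0, 5.5, 7.5, 6.5]

t_value, df = independent_t(analytic_scores, holistic_scores)
print(f"t = {t_value:.3f}, df = {df}")
```

The resulting t value would be compared against the critical value for the given degrees of freedom at the chosen significance level. The same function could be applied to the pretest scores to check whether the two groups started at a comparable level.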

In addition, a questionnaire was used to investigate the attitudes of students in the analytic group towards the use of the analytic scoring scale. The questionnaire used a six-point Likert-type scale. Students in the analytic group were the subjects of the questionnaire.
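Responses to a Likert-type questionnaire of this kind are often summarized per item. The sketch below is one possible way to do so, assuming that "six-scale" means each item is rated from 1 to 6; the function name `summarize_item` and the responses are hypothetical, and the study does not state how its questionnaire data were analyzed.

```python
from collections import Counter
from statistics import mean


def summarize_item(responses, points=6):
    """Return the mean rating and the count of each scale point for one item."""
    if any(r < 1 or r > points for r in responses):
        raise ValueError("response outside the scale")
    counts = Counter(responses)
    # Report every scale point, including those nobody chose.
    distribution = {point: counts.get(point, 0) for point in range(1, points + 1)}
    return mean(responses), distribution


# Hypothetical answers from eight students to one item, NOT data from the study.
item_responses = [5, 6, 4, 5, 3, 6, 5, 4]
average, distribution = summarize_item(item_responses)
print(f"mean = {average:.2f}", distribution)
```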

Date posted: 07/01/2022, 19:59
