A REVIEW OF THE ENGLISH PROFICIENCY TEST: RELIABILITY, VALIDITY, AND IMPLICATIONS FOR TEACHERS

NGUYỄN MẠNH TUẤN *
* Học viện Khoa học Quân sự, tuannguyenmanh0715@gmail.com
Received: 24/4/2018; Revised: 22/5/2018; Accepted for publication: 20/6/2018

Abstract: Testing is an indispensable component of foreign language programs in general, and of English programs in particular. In this context, concerns about the reliability and validity of tests are of great importance. In fact, teachers with practically no training in the field of test development often depend mostly on their own intuition, previous experience and textbooks. For these reasons, this article raises and discusses a number of problems of test design and development in English programs.
Keywords: English proficiency test, English program, reliability, validity

1. INTRODUCTION
As every nation integrates ever more deeply into international business, not only governments but also the community at large recognise that a high level of English language ability among the workforce is imperative for success in almost every aspect of life. Therefore, a widespread concern for the standard of English proficiency, together with a strong demand for valid English proficiency tests, has been voiced among educational institutions. In order to ensure a high standard of English proficiency among English language learners, a number of efforts have been made by education experts to provide as many reliable and valid English tests as possible.
Following a critical review of the literature on the English Proficiency Test and its validity and reliability, this paper hopes to highlight the significance of the reliability and validity of the English Proficiency Test. Therefore, it is essential, in this article, to work out the following basic points:
- What the English Proficiency Test is;
- What the reliability of the English Proficiency Test is and how to achieve it;
- What the validity of the English Proficiency Test is and how to achieve it.
2. DEFINITION OF THE ENGLISH PROFICIENCY TEST
The nature of the term "language proficiency" has long been an area of disagreement among eminent linguistic and educational experts, with no clear definition. A number of researchers, such as Bachman and Palmer (1996), favour the term "ability" rather than "proficiency". Brown (2004) shares this view, explaining that the term "ability" sounds more consistent with the current understanding that specific components of language need to be assessed separately (p. 71). There is, however, general agreement that both terms refer to constructs that can be specified and measured.
Bachman and Palmer (1996) suggest that language ability consists of four component skills: listening, speaking, reading and writing. McNamara (2000) further suggests that the integrative nature of language ability should be evaluated by combining isolated components (grammar, vocabulary) with skill performances (reading, listening, writing and speaking). Meanwhile, Hughes, in his Testing for Language Teachers (2003, p. 44), notes that proficiency tests are those designed to measure people's ability in a language.
From the ideas mentioned above, an English Proficiency Test can be defined as a test that measures the language ability of English language learners in terms of both language components and language skill performances.
3. QUALITIES OF A GOOD ENGLISH PROFICIENCY TEST
According to Bachman and Palmer (1996), there are six qualities needed for an English Proficiency Test, namely reliability, construct validity, authenticity, interactiveness, impact and practicality. They further indicate that the conventional means of defining such test qualities has been, to some extent, intuitive; in their view, therefore, test designers should try to attain a balance among these qualities.
As a matter of fact, a discussion of all these qualities would require considerable time and space. Within this paper, the focus is on the first two qualities, reliability and construct validity. Accordingly, three major issues relating to language test reliability and validity will be clarified:
- the definition of English proficiency test reliability and validity;
- the factors influencing English proficiency test reliability and validity;
- how to achieve reliability and validity in an English proficiency test.
3.1. English Proficiency Test Reliability
Many attempts have been made to provide an insight into the reliability of language proficiency tests. Henning (1987) holds that a test is regarded as reliable only when an examinee's results on the same or a similar test prove consistent. Brown (1996) illustrates reliability by comparing language tests with measuring instruments: both are expected to give the same results whenever measurement takes place. In the same year, Bachman indicates that a language proficiency test demonstrates its reliability when the same test, or two tests of the same level of difficulty, administered about two weeks apart, produce no significant difference in scores. From these ideas, it can be inferred that the reliability of a language proficiency test is a matter of consistency. However, it is necessary to note that, unlike other types of measurement, measuring language proficiency is a much more complicated process, since it deals with abstract notions rather than objective reality.
3.1.1. Factors Influencing English Proficiency Test Reliability
Accurately assessing students' language ability requires teachers as well as educational staff to be aware of the considerations involved. Brown (1996) divides the factors affecting reliability into three general categories: environmental factors, administrative factors and features of the test items.
Environmental Factors
A number of environmental factors that negatively influence students' language performance have been acknowledged. If a test is administered in a noisy, cramped setting that is too hot or too cold, students' results are likely to suffer. Likewise, if the test takes place in badly-lit surroundings, students' performance is almost certain to be negatively affected.
Besides these objective factors, according to Henning (1987), test inconsistency can stem from psychological or physiological changes in the test takers. He further notes that physical or psychological illness and the like may also result in a distorted reflection of students' language proficiency. It should be acknowledged that, unpredictable and outside the teachers' control as these factors are, constant efforts should still be made to create favourable testing conditions.
Administrative Factors
Factors related to administration procedures are also highlighted as contributing to a decline in students' language performance. As Henning (1987) states, this results from testing procedures being applied to different groups of students in different locations and on different days of testing. Moreover, a decrease in test reliability can also result from factors such as unclear instructions or unsuitable timing of the test.
Features of the Test Items
It has been suggested that the length and difficulty of a test, and the manner in which it is implemented, are factors affecting test reliability. First, it is argued that the longer a test is, the better job it does of spreading students across proficiency levels. Moreover, the level of test difficulty also makes a great contribution to test reliability: tests that are too difficult or too easy surely fail to evaluate students' proficiency accurately. Last but not least, it is often reported that test reliability also depends on the manner of the test, that is, the way in which students respond to the examination. Students who are familiar with the test procedures tend to develop strategies and techniques to deal with the questions more effectively, which can undermine test reliability.
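As a simple illustration of item difficulty (this example is not from the paper and the data are invented), the difficulty of each item can be expressed as its facility value, the proportion of students who answer it correctly; items that nearly everyone gets right, or nearly everyone gets wrong, do little to spread students by proficiency level.

```python
# Minimal sketch: computing item facility (proportion of correct answers per item).
# The response matrix below is invented sample data: rows are students,
# columns are items, 1 = correct, 0 = incorrect.
responses = [
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 1, 1, 1],
]

n_students = len(responses)
for item_number, answers in enumerate(zip(*responses), start=1):
    facility = sum(answers) / n_students
    # Facility close to 1.0 means the item is too easy, close to 0.0 too difficult;
    # intermediate values help the test separate students by proficiency.
    print(f"Item {item_number}: facility = {facility:.2f}")
```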
3.1.2. Ways of Improving Test Reliability
Maximising test reliability may require fairly complex methods. Due to the limits of time and space, only two of them, which can easily be applied by teachers, will be discussed within this paper.
Test-Retest Method
In this method, the same test is administered twice to the same group of students, with the second administration taking place no later than two weeks after the first. Students are neither informed of their first test results nor given any feedback on their performance. They are also not warned about the second test and therefore make no special preparation for it during this period. After the second test, the individual results are arranged into two columns for comparison; if there is no significant difference, the test can be claimed to meet the reliability requirement. Although, as Brown (1996) states, this procedure might seem strange and upset students who are asked to take the same test twice, it can prove a useful method of working out the reliability of a test.
Parallel Test Method
In this method, two tests equivalent in terms of difficulty are administered to the same group of students, and the same procedures as in the test-retest method are applied. Although the parallel test method sounds more natural than the test-retest method, it is more challenging because two versions of a test need to be designed with strict equivalence in terms of difficulty. Consequently, the level of difficulty must first be defined and the test items then developed to match it, which requires a huge amount of effort from teachers and test designers.
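By way of illustration (again with invented figures rather than data from the paper), a teacher might run a quick check that the two versions are behaving as parallel forms: their means and spreads should be close, and students should be ranked similarly by both.

```python
# Minimal sketch: checking two supposedly parallel test forms taken by the
# same group of students. All scores are invented sample data.
from statistics import mean, stdev, correlation  # correlation requires Python 3.10+

form_a = [65, 72, 58, 80, 61, 88, 49, 75]
form_b = [67, 70, 60, 78, 64, 85, 52, 73]

# 1. Equivalent difficulty: the two forms should have similar means and spreads.
print(f"Form A: mean = {mean(form_a):.1f}, sd = {stdev(form_a):.1f}")
print(f"Form B: mean = {mean(form_b):.1f}, sd = {stdev(form_b):.1f}")

# 2. Consistent ranking: scores on the two forms should correlate highly.
print(f"Parallel-forms reliability estimate: r = {correlation(form_a, form_b):.2f}")
```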
3.2. Test Validity
As Hughes (1992) states, a test is valid only when it corresponds to the language skills or structures that it is intended to measure. For example, when testing students' knowledge of vocabulary they have just covered, students should be tested only on what they have already been presented with. If the test includes vocabulary items on which students have not yet received instruction and explanation, it is surely rendered invalid, since it fails to measure what it is designed to identify.
It would be a mistake to discuss language test validity without clarifying construct validity. According to Bachman and Palmer (1996, pp. 254-271), "the so called construct validity is subordinate to the sense and rationality of interpretation of the language test scores, which means this interpretation is the assessment of language skills of the subject". Bachman holds that by interpreting the test scores we can not only assess the language ability of the test taker but also estimate the reasonableness of the language adopted in the test. For example, when the aim of a test is to evaluate students' ability to use the passive voice, it is important that the test be designed to deal directly with this grammatical structure, so that the scores help us assess our students' proficiency in it. If the test items somehow include other structures, such as conditionals, the test will surely lack validity.
From the ideas mentioned above, it could be said that construct validity concerns the interpretation of scores, from which both the language proficiency of students and the appropriateness of the test tasks can be estimated.
3.2.1. Factors that Affect Test Validity
A series of factors having negative effects on validity have been identified. Henning (1987), for example, has listed some of them. The first factor affecting test validity is a mismatch between a test and the construct it is intended to measure. Bachman also points out that an invalid adaptation of a test is another detrimental factor: if, for instance, a test designed to assess the lexical level of first-year students is used with high school students, it is surely invalid. The problem is further clarified by McNamara (2000), who proposes two major notable factors: "irrelevant variance of validity" and "underrepresentation of validity".
Irrelevant Variance of Validity
A test is said to show "irrelevant variance" when it is too broad, containing a number of variables that are irrelevant to the intended interpretation. McNamara argues that this happens when the knowledge or skill being tested is placed in a setting that is either outside the students' experience or irrelevant to the content being tested. For example, in an oral test, candidates may be asked to discuss an abstract topic; if that topic does not interest them, or is one of which they are ignorant, their performance is less likely to show their competence than when they are asked to speak on a more familiar topic at the same level of abstraction. In this case, the quality being tested, the ability to discuss an abstract topic in English, is confounded by the irrelevant requirement of having particular knowledge of a certain topic.
Underrepresentation of Validity
"Underrepresentation of validity" is the opposite of "irrelevant variance of validity": here the testing is insufficient, in that the test is either too narrow in terms of the knowledge covered or fails to include important aspects of what it is supposed to measure. In other words, as Fulcher (2010) states, the extent to which a test fails to measure the relevant knowledge is the degree to which it under-represents what is supposed to be tested.
3.2.2. Methods of Improving Language Proficiency Test Validity
When discussing how to determine test validity, Henning (1987) indicates that there are two main ways to establish it. One is the experimental method, in which data collection together with statistical formulas is applied to the calculation of validity. The other is through non-experimental methods, which involve inspection, intuition and common sense. Since the application of experimental methods requires special training in statistics and the use of specialised computer programs to work out complex calculations, within this paper the author focuses on non-experimental methods.
Although, as many worry, the lack of experimental evidence may to some extent reduce objectivity, teachers can improve their chances of upholding the validity of their tests through a number of practical actions. For example, if a teacher wants to evaluate his or her students' knowledge of grammar at the end of an elementary course, he or she needs to be aware of what knowledge of grammar at the elementary level consists of, and should then adopt test items matching what students have been exposed to during the course.
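One very simple way of making that matching explicit, sketched below with hypothetical structure names rather than any real syllabus, is to list the grammar points actually covered in the course and flag any test item that targets something outside that list.

```python
# Minimal sketch: checking that every test item targets a structure that was
# actually taught. All structure names below are hypothetical examples.
taught_structures = {
    "present simple", "present continuous", "past simple",
    "countable and uncountable nouns", "comparatives",
}

test_items = {
    1: "present simple",
    2: "past simple",
    3: "passive voice",   # not covered in this hypothetical elementary course
    4: "comparatives",
}

for number, structure in test_items.items():
    if structure not in taught_structures:
        print(f"Item {number} targets '{structure}', which was not taught; "
              "it may threaten the validity of the test.")
```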
4. CONCLUSION AND IMPLICATIONS FOR TEACHERS
This paper has provided some basic understanding of the English proficiency test, covering its definition and the qualities it requires. "Reliability" and "validity" were chosen from among these qualities for discussion, together with the factors that affect them and the methods used to improve them.
The paper has been written in the hope of providing what is fundamental in designing and developing English proficiency tests. Without such knowledge, teachers are unable to give students objective feedback on their progress, and students face considerable challenges in their English learning. This lack of knowledge also affects teachers themselves: they cannot identify their students' weaknesses or work out how to build on their strengths.
For these reasons, it is important that teachers train themselves in issues relevant to assessment and testing. Educational institutions should also start offering courses in test design and development alongside other courses in English language teaching methodology./
References:
Bachman, L. F. (1990). Fundamental Considerations in Language Testing. Oxford: Oxford University Press.
Bachman, L. F., & Palmer, A. S. (1996). Language Testing in Practice: Designing and Developing Useful Language Tests. Oxford: Oxford University Press.
Brown, J. D. (1996). Testing in Language Programs. New Jersey: Prentice Hall Regents.
Brown, H. D. (2004). Language Assessment: Principles and Classroom Practices. White Plains, New York: Pearson Education.
Fulcher, G. (2010). Practical Language Testing. London: Hodder Education.
Henning, G. (1987). A Guide to Language Testing: Development, Evaluation, Research. Massachusetts: Heinle & Heinle.
Hughes, A. (1992). Testing for Language Teachers. Cambridge: Cambridge University Press.
Hughes, A. (2003). Testing for Language Teachers (2nd ed.). Cambridge: Cambridge University Press.
McNamara, T. F. (2000). Communication and design of language tests. In H. G. Widdowson (Ed.), Language Testing (pp. 13-22). Oxford, England: Oxford University Press.