A REVIEW OF THE ENGLISH PROFICIENCY TEST: RELIABILITY, VALIDITY, AND IMPLICATIONS FOR TEACHERS

NGUYỄN MẠNH TUẤN *
* Học viện Khoa học Quân sự, tuannguyenmanh0715@gmail.com
Received: 24/4/2018; Revised: 22/5/2018; Accepted for publication: 20/6/2018

Abstract: Testing is an indispensable component of foreign language programs in general, and of English programs in particular. In this context, concerns about the reliability and validity of tests are of great importance. In fact, teachers with practically no training in the field of test development often depend mostly on their own intuition, previous experience and textbooks. For these reasons, this article raises and discusses a number of problems of test design and development in English programs.
Keywords: English proficiency test, English program, reliability, validity

1. INTRODUCTION
As every nation integrates ever more deeply into international business, not only governments but also the community at large recognise that a high level of English language ability among the workforce is imperative for success in almost every aspect of life. Therefore, a widespread concern for the standard of English proficiency, together with a strong demand for valid English proficiency tests, has been voiced among educational institutions. In order to ensure a high standard of English proficiency among English language learners, a number of efforts have been made by education experts to provide as many reliable and valid English tests as possible.
Following a critical review of the literature on the English Proficiency Test and its validity and reliability, this paper hopes to highlight the significance of the reliability and validity of the English Proficiency Test. Therefore, it is essential, in this article, to work out the following basic points:
- What the English Proficiency Test is;
- What the reliability of the English Proficiency Test is and how to achieve it;
- What the validity of the English Proficiency Test is and how to achieve it.
2. DEFINITION OF THE ENGLISH PROFICIENCY TEST
The nature of the term "language proficiency" has long been an area of disagreement among eminent linguistic and educational experts, with no clear definition. A number of researchers, such as Bachman and Palmer (1996), favour the term "ability" rather than "proficiency". Brown (2004) shares this view, explaining that the term "ability" sounds more consistent with the current understanding that specific components of language need to be assessed separately (p. 71). There is, however, general agreement that both terms refer to constructs that can be specified and measured.
Bachman and Palmer (1996) suggest that language ability consists of four component skills: listening, speaking, reading and writing. McNamara (2000) further suggests that the integrative nature of language ability should be evaluated by combining isolated components (grammar, vocabulary) with skill performances (reading, listening, writing and speaking). Meanwhile, Hughes, in his Testing for Language Teachers (2003, p. 44), notes that proficiency tests are those designed to measure people's ability in a language.
From the ideas mentioned above, an English Proficiency Test can be defined as a test that measures the language ability of English language learners in terms of both language components and language skill performances.
3. QUALITIES OF A GOOD ENGLISH PROFICIENCY TEST
According to Bachman and Palmer (1996), there are six qualities needed for an English Proficiency Test, namely reliability, construct validity, authenticity, interactiveness, impact and practicality. They further indicate that the conventional means of defining such test qualities has been, to some extent, intuitive; in their view, therefore, test designers should try to attain a balance among these qualities.
As a matter of fact, a discussion of all these qualities would require considerable time and space. Within this paper, the focus is on the first two qualities, reliability and construct validity. Accordingly, three major issues relating to language test reliability and validity will be clarified:
- the definition of English proficiency test reliability and validity;
- the factors influencing English proficiency test reliability and validity;
- how to achieve reliability and validity in an English proficiency test.
3.1. English Proficiency Test Reliability
Many attempts have been made to provide an insight into the reliability of language proficiency tests. Henning (1987) holds that a test is regarded as reliable only when an examinee's results on the same or a similar test prove consistent. Brown (1996) illustrates reliability by comparing language tests with measuring instruments: both are expected to give the same results whenever measurement takes place. In the same year, Bachman indicates that a language proficiency test demonstrates its reliability when the same test, or two tests of the same level of difficulty, administered about two weeks apart, produce no significant difference in scores. From these ideas, it can be inferred that the reliability of a language proficiency test is a matter of consistency. However, it is necessary to note that, unlike other types of measurement, measuring language proficiency is a much more complicated process, since it deals with abstract notions rather than objective reality.
3.1.1. Factors Influencing English Proficiency Test Reliability
Accurately assessing students' language ability requires teachers as well as educational staff to be aware of the considerations involved. Brown (1996) divides the factors affecting reliability into three general categories: environmental factors, administrative factors and features of the test items.
Environmental Factors
A number of environmental factors that negatively influence students' language performance have been acknowledged. If a test is administered in a noisy, cramped setting that is too hot or too cold, students' results are likely to suffer. Likewise, if the test takes place in badly-lit surroundings, students' performance is almost certain to be negatively affected.
Besides these objective factors, according to Henning (1987), test inconsistency can stem from psychological or physiological changes in the test takers. He further notes that physical or psychological illness and the like may also result in a distorted reflection of students' language proficiency. It should be acknowledged that, unpredictable and outside the teachers' control as these factors are, constant efforts should still be made to create favourable testing conditions.
Administrative Factors
Factors related to administration procedures are also highlighted as contributing to a decline in students' language performance. As Henning (1987) states, this results from testing procedures being applied to different groups of students in different locations and on different days of testing. Moreover, a decrease in test reliability can also result from factors such as unclear instructions or unsuitable timing of the test.
Features of the Test Items
It has been suggested that the length and difficulty of a test, and the manner in which it is implemented, are factors affecting test reliability. First, it is argued that the longer a test is, the better job it does of spreading students across proficiency levels. Moreover, the level of test difficulty also makes a great contribution to test reliability: tests that are too difficult or too easy surely fail to evaluate students' proficiency accurately. Last but not least, it is often reported that test reliability also depends on the manner of the test, that is, the way in which students respond to the examination. Students who are familiar with the test procedures tend to develop strategies and techniques to deal with the questions more effectively, which can undermine test reliability.
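As a simple illustration of item difficulty (this example is not from the paper and the data are invented), the difficulty of each item can be expressed as its facility value, the proportion of students who answer it correctly; items that nearly everyone gets right, or nearly everyone gets wrong, do little to spread students by proficiency level.

```python
# Minimal sketch: computing item facility (proportion of correct answers per item).
# The response matrix below is invented sample data: rows are students,
# columns are items, 1 = correct, 0 = incorrect.
responses = [
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 1, 1, 1],
]

n_students = len(responses)
for item_number, answers in enumerate(zip(*responses), start=1):
    facility = sum(answers) / n_students
    # Facility close to 1.0 means the item is too easy, close to 0.0 too difficult;
    # intermediate values help the test separate students by proficiency.
    print(f"Item {item_number}: facility = {facility:.2f}")
```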
3.1.2. Ways of Improving Test Reliability
Maximising test reliability may require fairly complex methods. Due to the limits of time and space, only two of them, which can easily be applied by teachers, will be discussed within this paper.
Test-Retest Method
In this method, the same test is administered twice to the same group of students, with the second administration taking place no later than two weeks after the first. Students are neither informed of their first test results nor given any feedback on their performance. They are also not warned about the second test and therefore make no special preparation for it during this period. After the second test, the individual results are arranged into two columns for comparison; if there is no significant difference, the test can be claimed to meet the reliability requirement. Although, as Brown (1996) states, this procedure might seem strange and upset students who are asked to take the same test twice, it can prove a useful method of working out the reliability of a test.
Parallel Test Method
In this method, two tests equivalent in terms of difficulty are administered to the same group of students, and the same procedures as in the test-retest method are applied. Although the parallel test method sounds more natural than the test-retest method, it is more challenging because two versions of a test need to be designed with strict equivalence in terms of difficulty. Consequently, the level of difficulty must first be defined and the test items then developed to match it, which requires a huge amount of effort from teachers and test designers.
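By way of illustration (again with invented figures rather than data from the paper), a teacher might run a quick check that the two versions are behaving as parallel forms: their means and spreads should be close, and students should be ranked similarly by both.

```python
# Minimal sketch: checking two supposedly parallel test forms taken by the
# same group of students. All scores are invented sample data.
from statistics import mean, stdev, correlation  # correlation requires Python 3.10+

form_a = [65, 72, 58, 80, 61, 88, 49, 75]
form_b = [67, 70, 60, 78, 64, 85, 52, 73]

# 1. Equivalent difficulty: the two forms should have similar means and spreads.
print(f"Form A: mean = {mean(form_a):.1f}, sd = {stdev(form_a):.1f}")
print(f"Form B: mean = {mean(form_b):.1f}, sd = {stdev(form_b):.1f}")

# 2. Consistent ranking: scores on the two forms should correlate highly.
print(f"Parallel-forms reliability estimate: r = {correlation(form_a, form_b):.2f}")
```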
3.2. Test Validity
As Hughes (1992) states, a test is valid only when it corresponds to the language skills or structures that it is intended to measure. For example, when testing students' knowledge of vocabulary they have just covered, students should be tested only on what they have already been presented with. If the test includes vocabulary items on which students have not yet received instruction and explanation, it is surely rendered invalid, since it fails to measure what it is designed to identify.
It would be a mistake to discuss language test validity without clarifying construct validity. According to Bachman and Palmer (1996, pp. 254-271), "the so called construct validity is subordinate to the sense and rationality of interpretation of the language test scores, which means this interpretation is the assessment of language skills of the subject". Bachman holds that by interpreting the test scores we can not only assess the language ability of the test taker but also estimate the reasonableness of the language adopted in the test. For example, when the aim of a test is to evaluate students' ability to use the passive voice, it is important that the test be designed to deal directly with this grammatical structure, so that the scores help us assess our students' proficiency in it. If the test items somehow include other structures, such as conditionals, the test will surely lack validity.
From the ideas mentioned above, it could be said that construct validity concerns the interpretation of scores, from which both the language proficiency of students and the appropriateness of the test tasks can be estimated.
3.2.1. Factors that Affect Test Validity
A series of factors having negative effects on validity have been identified. Henning (1987), for example, has listed some of them. The first factor affecting test validity is a mismatch between a test and the construct it is intended to measure. Bachman also points out that an invalid adaptation of a test is another detrimental factor: if, for instance, a test designed to assess the lexical level of first-year students is used with high school students, it is surely invalid. The problem is further clarified by McNamara (2000), who proposes two major notable factors: "irrelevant variance of validity" and "underrepresentation of validity".
Irrelevant Variance of Validity
A test is said to show "irrelevant variance" when it is too broad, containing a number of variables that are irrelevant to the intended interpretation. McNamara argues that this happens when the knowledge or skill being tested is placed in a setting that is either outside the students' experience or irrelevant to the content being tested. For example, in an oral test, candidates may be asked to discuss an abstract topic; if that topic does not interest them, or is one of which they are ignorant, their performance is less likely to show their competence than when they are asked to speak on a more familiar topic at the same level of abstraction. In this case, the quality being tested, the ability to discuss an abstract topic in English, is confounded by the irrelevant requirement of having particular knowledge of a certain topic.
Underrepresentation of Validity
"Underrepresentation of validity" is the opposite of "irrelevant variance of validity": here the testing is insufficient, in that the test is either too narrow in terms of the knowledge covered or fails to include important aspects of what it is supposed to measure. In other words, as Fulcher (2010) states, the extent to which a test fails to measure the relevant knowledge is the degree to which it under-represents what is supposed to be tested.
3.2.2. Methods of Improving Language Proficiency Test Validity
When discussing how to determine test validity, Henning (1987) indicates that there are two main ways to establish it. One is the experimental method, in which data collection together with statistical formulas is applied to the calculation of validity. The other is through non-experimental methods, which involve inspection, intuition and common sense. Since the application of experimental methods requires special training in statistics and the use of specialised computer programs to work out complex calculations, within this paper the author focuses on non-experimental methods.
Although, as many worry, the lack of experimental evidence may to some extent reduce objectivity, teachers can improve their chances of upholding the validity of their tests through a number of practical actions. For example, if a teacher wants to evaluate his or her students' knowledge of grammar at the end of an elementary course, he or she needs to be aware of what knowledge of grammar at the elementary level consists of, and should then adopt test items matching what students have been exposed to during the course.
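One very simple way of making that matching explicit, sketched below with hypothetical structure names rather than any real syllabus, is to list the grammar points actually covered in the course and flag any test item that targets something outside that list.

```python
# Minimal sketch: checking that every test item targets a structure that was
# actually taught. All structure names below are hypothetical examples.
taught_structures = {
    "present simple", "present continuous", "past simple",
    "countable and uncountable nouns", "comparatives",
}

test_items = {
    1: "present simple",
    2: "past simple",
    3: "passive voice",   # not covered in this hypothetical elementary course
    4: "comparatives",
}

for number, structure in test_items.items():
    if structure not in taught_structures:
        print(f"Item {number} targets '{structure}', which was not taught; "
              "it may threaten the validity of the test.")
```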
4. CONCLUSION AND IMPLICATIONS FOR TEACHERS
This paper has provided some basic understanding of the English proficiency test, covering its definition and the qualities it requires. "Reliability" and "validity" were chosen from among these qualities for discussion, together with the factors that affect them and the methods used to improve them.
The paper has been written in the hope of providing what is fundamental in designing and developing English proficiency tests. Without such knowledge, teachers are unable to give students objective feedback on their progress, and students face considerable challenges in their English learning. This lack of knowledge also affects teachers themselves: they cannot identify their students' weaknesses or work out how to build on their strengths.
For these reasons, it is important that teachers train themselves in issues relevant to assessment and testing. Educational institutions should also start offering courses in test design and development alongside other courses in English language teaching methodology./
References:
Bachman, L. F. (1990). Fundamental Considerations in Language Testing. Oxford: Oxford University Press.
Bachman, L. F., & Palmer, A. S. (1996). Language Testing in Practice: Designing and Developing Useful Language Tests. Oxford: Oxford University Press.
Brown, J. D. (1996). Testing in Language Programs. New Jersey: Prentice Hall Regents.
Brown, H. D. (2004). Language Assessment: Principles and Classroom Practices. White Plains, New York: Pearson Education.
Fulcher, G. (2010). Practical Language Testing. London: Hodder Education.
Henning, G. (1987). A Guide to Language Testing: Development, Evaluation, Research. Massachusetts: Heinle & Heinle.
Hughes, A. (1992). Testing for Language Teachers. Cambridge: Cambridge University Press.
Hughes, A. (2003). Testing for Language Teachers (2nd ed.). Cambridge: Cambridge University Press.
McNamara, T. F. (2000). Communication and design of language tests. In H. G. Widdowson (Ed.), Language Testing (pp. 13-22). Oxford, England: Oxford University Press.