A STUDY ON THE RELIABILITY OF THE FINAL ACHIEVEMENT COMPUTER-BASED MCQS TEST 1 FOR THE 4TH SEMESTER NON-ENGLISH MAJORS AT HANOI UNIVERSITY OF BUSINESS AND TECHNOLOGY

VIETNAM NATIONAL UNIVERSITY, HANOI
COLLEGE OF FOREIGN LANGUAGES
DEPARTMENT OF POSTGRADUATE STUDIES

NGUYEN THI VIET HA

(Evaluating the reliability of the first end-of-semester computer-based multiple-choice test in the 4th semester for second-year non-English majors at Hanoi University of Business and Technology)

Minor Programme Thesis

Field: Methodology

HANOI, 2008

Code: 601410

Supervisor: Nguyễn Thu Hiền, M.A.

CANDIDATE'S STATEMENT

I hereby state that I, Nguyen Thi Viet Ha, Class 14A, being a candidate for the degree of Master of Arts (TEFL), accept the requirements of the College relating to the retention and use of Master of Arts theses deposited in the library.

In terms of these conditions, I agree that the original of my thesis deposited in the library should be accessible for the purposes of study and research, in accordance with the normal conditions established by the librarian for the care, loan or reproduction of the thesis.

Signature

Date

ACKNOWLEDGEMENTS

In completing this thesis, I have received a great deal of support. Of primary importance has been the role of my supervisor, Ms Nguyen Thu Hien, M.A., teacher at the Department of English and American Languages & Cultures, College of Foreign Languages, Vietnam National University, Hanoi. I am deeply grateful to her for her precious guidance, enthusiastic encouragement and invaluable critical feedback. Without her dedicated support and correction, this thesis could not have been completed.

I am deeply indebted to my dear teacher, Mr Vu Van Phuc, M.A., Head of the Testing Center, College of Foreign Languages, VNU, who provided me with many useful suggestions and much assistance with my study.

I would also like to express my sincere thanks to all the teachers and colleagues in the English Department, HUBT, for their help in conducting the survey, sharing opinions and making suggestions for the study. In particular, my thanks go to Ms Le Thi Kieu Oanh, Assistant of the English Department, HUBT, for her willingness to provide test score data.

I wish to give special thanks to the K11 students at Hanoi University of Business and Technology who actively participated in the survey.

Finally, it is my great pleasure to acknowledge my gratitude to the beloved members of my family, especially my husband, who constantly encouraged me and helped me with my thesis.

ABSTRACT

The main aim of this minor thesis is to evaluate the reliability of the final achievement computer-based MCQs test 1 for the 4th semester non-English majors at Hanoi University of Business and Technology.

To achieve this aim, a combination of qualitative and quantitative research methods was adopted. The findings indicate that there is a certain degree of unreliability in the final achievement computer-based MCQs test 1 and that two main factors cause this unreliability: test item quality and test-takers' performance.

Based on a thorough analysis of the collected data, the author makes some suggestions to improve the quality of the final achievement test and of MCQs test 1 for the non-majors of English in the 4th semester at Hanoi University of Business and Technology. Firstly, the test objectives, sections and skill weight should be adjusted to be more compatible with the course objectives and the syllabus. Secondly, a testing committee should be set up to construct and develop a multiple-choice item bank containing test items with good p-values and discrimination values.

LIST OF ABBREVIATIONS

1. CBT: Computer-based testing

2. HUBT: Hanoi University of Business and Technology

3. MC: Multiple choice

4. MCQs: Multiple-choice questions

5. ML Pre-: Market Leader Pre-intermediate

6. KR: Kuder-Richardson

7. SD: Standard deviation

LIST OF TABLES AND CHARTS

1. Table 1: Types of tests

2. Table 2: Scoring format for each semester

3. Table 3: The syllabus for the 4th semester (for non-English majors)

4. Table 4: Time allocation for language skills and sections

5. Table 5: Specification grid for the final computer-based MCQs test 1

6. Table 6: Main points in the grammar section

7. Table 7: Main points in the vocabulary section

8. Table 8: Topics in the reading section

9. Table 9: Items in the functional language sections

10. Table 10: Test reliability coefficient

11. Table 11: p-value of items in 4 sections

12. Table 12: Discrimination value of items in 4 sections

13. Table 13: Number of test items with acceptable p-value and discrimination value in 4 sections

14. Table 14: Suggested scoring format

15. Table 15: Proposed test specifications

16. Chart 1: Students' response on test content

17. Chart 2: Students' response on item discrimination value

18. Chart 3: Students' response on time length

19. Chart 4: Students' response on arbitrariness

20. Chart 5: Students' response on relation between test score and their achievement

TABLE OF CONTENTS

1.3 Theoretical and practical significance of the study

2.2.3 Considerations in final achievement test construction

3.1 The current English learning, teaching and testing situation at HUBT

3.2 The course objectives, syllabus and materials used for the second-year non-majors of English in Semester 4

3.2.4 Specification grid for the final achievement computer-based MCQs test in Semester 4

5.1 The compatibility of the objectives, content and skill weight format of the final achievement computer-based MCQs test 1 for the 4th semester with the course objectives and the syllabus

5.1.2 The test item content in four sections and the syllabus content

5.1.3 The skill weight format in the test and the syllabus

5.2.1 Reliability coefficient

5.4 Pedagogical implications and suggestions on improvements of the existing final achievement computer-based MCQs test 1 for the non-English majors at HUBT

Chapter 1: Introduction

1.1 Rationale of the study

Testing plays a very important role in the teaching and learning process. Testing is one form of measurement which is used to point out strengths and weaknesses in the learned abilities of students. Through testing, and especially through test scores, we may discover the performance of given students and of teachers. As far as students are concerned, test scores reveal what they have achieved after a learning period. As for teachers, test scores indicate what they have taught their students. Based on test results, we may make improvements in teaching, learning and testing for better instructional effectiveness.

Another reason for selecting testing as the subject of this study lies in the fact that the current language testing at Hanoi University of Business and Technology (HUBT) has been the subject of much controversy among students and teachers. Testing is mainly carried out in the form of two objective tests on computers (named test 1 and test 2), which are administered at the end of each semester. The scores that a student gets on these tests are the main indicators of his or her performance during the whole semester. There are different comments on the results of these tests, especially test 1 for the second-year non-English majors. Some subject teachers claim that these tests do not truly reflect the students' language competence. Others say that these tests are appropriate to what students have learnt in class and compatible with the course objectives, and are therefore reliable. Opposing views also exist among the students. Many think that these tests are more difficult than what they have learnt and studied for the exam; others say that the test items are easy and relevant to what they have been taught. It is therefore indispensable to find out whether the tests are closely related to what the students have learnt and what the teachers have taught, and whether these tests are reliable.

For the two reasons mentioned above, the author would like to undertake this study, entitled “A study on the reliability of the final achievement Computer-based MCQs Test 1 for the 4th semester non-English majors at Hanoi University of Business and Technology”, with the intention of examining the rumors about this test. In addition, the author hopes that the study results will help to raise awareness among teachers as well as those who are interested in this field. At the same time, the study results can, to some extent, be applied to improve the current testing situation at HUBT.

1.2 Aims and research questions

The main aim of the study is to investigate the reliability of the existing final achievement MCQs test 1 (4th semester) for non-English majors at HUBT by analyzing the test objectives, the test content and skill weight format, students' scores, test items, and students' perceptions of and comments on the test, and then to make suggestions for the test's improvement.

To achieve this aim, the following research questions are set for exploration:

1. Are the objectives, content and skill weight format of the final achievement computer-based MCQs test 1 compatible with the course objectives, the syllabus content and the skill weight format?

2. To what extent is test 1 reliable?

3. What is the students' attitude towards the final achievement computer-based MCQs test 1?

1.3 Scope of the study

The study is limited to the existing final achievement computer-based MCQs test 1 in the 4th semester for the second-year non-English majors at HUBT.

1.4 Theoretical and practical significance of the study

Theoretically, the study shows that testing is crucial for measuring and evaluating the quality of learning and teaching, and that test reliability is one of the most important criteria for evaluating a test.

Practically, the study presents how reliable the final achievement MCQs test 1 administered at HUBT is and how its quality can be improved.

1.5 Method of the study

Both qualitative and quantitative methods are used.

The qualitative method is applied to the literature review on language testing, the course objectives and syllabus, the objectives, content and format of achievement test 1 for the 4th term, and the results of the questionnaires for students.

The quantitative method is used for the analysis of test scores and test items.

1.6 Organization of the paper

The study is composed of 6 chapters.

Chapter 1 - Introduction briefly states the rationale, aims and research questions, scope of the study, theoretical and practical significance of the study, method of the study and organization of the paper.

Chapter 2 - Literature review discusses relevant theories of language testing, final achievement tests, computer-based MCQs tests and test reliability.

Chapter 3 - The context of the study deals with the English learning, teaching and testing situation at HUBT, the course book, the syllabus and the checklist for the test.

Chapter 4 - Methodology presents the participants, data collection instruments, and data collection and data analysis procedures.

Chapter 5 - Results and Discussions presents and discusses the results of the study. Suggestions for the improvement of achievement test 1 are also proposed in this chapter.

Chapter 6 - Conclusion summarizes the findings, mentions the limitations and provides suggestions for further study.

Chapter 2: Literature review

2.1 Language testing

2.1.1 What is a language test?

There is a wide variety of definitions of a language test, and they share one point of similarity: a language test is considered a device for measuring individuals' language ability.

According to Henning (1987, p.1), “Testing, including all forms of language testing, is one form of measurement”. In his opinion, tests such as listening or reading comprehension are delivered in order to find out the extent to which these skills are present in the learners. Similarly, Bachman (1990, p.20) stated: “A test is a measurement instrument designed to elicit a specific sample of an individual's behavior”. He also considered obtaining the elicited sample of behavior to be what distinguishes a test from other types of measurement.

Brown H.D. (1995, p.384) presented the notion in a simpler way: “A test, in plain words, is a method of measuring a person's ability or knowledge in a given domain”. He explained that a test is first and foremost a method, which includes items and techniques requiring a performance from testees. Via this performance, a person's ability or language competence is measured.

These viewpoints show that a language test is an effective tool for measuring and assessing students' language knowledge and skills and for providing valuable information for better future teaching and learning.

2.1.2 The purposes of language tests

The purposes of language tests are perceived from different perspectives by different scholars. Typically, Heaton (1990) mentioned 7 purposes, which can be presented as follows:

• Finding out about progress

• Encouraging students

• Finding out about learning difficulties

• Finding out about achievement

• Placing students

• Selecting students

• Finding out about proficiency

In general, a language test is used to evaluate both teachers' and students' performance, to make judgments about and adjustments to teaching materials and methods, and to strengthen students' motivation for further study.

2.1.3 Types of language tests

Language tests can be classified into different types according to their purposes. Heaton (1990), Brown (1995), Harrison (1983) and Hughes (1989) pointed out that language tests include four main types: proficiency tests, diagnostic tests, placement tests and achievement tests, with the characteristics illustrated in the following table:

Type of test        Characteristics

Proficiency test    Measure people's abilities in a language regardless of any training they may have had in that language

Diagnostic test     Check students' progress for their strengths and weaknesses and what further teaching is necessary

Achievement test    Assess what students have learnt from a known syllabus

Placement test      Classify students into groups at different levels at the beginning of a course

Table 1: Types of tests

Another researcher, Henning (1987), divided tests into objective and subjective ones on the basis of the manner in which they are scored. Subjective tests are scored by opinionated judgment on the part of the scorer, while objective tests are scored by comparing examinee responses with an established set of acceptable responses or a scoring key.

2.1.4 Criteria of a good language test

Just like any measuring device, a language test is subject to potential measurement error. For the purpose of investigating, evaluating and “testing” a test, researchers such as Brown (1995), Henning (1987), Bachman (1990) and Harrison (1983) identified criteria to determine whether a test is good or not. A good language test must feature four most important qualities: reliability, validity, practicality and discrimination.

The reliability of a test is its consistency (Brown, 1995; Harrison, 1983). A test is reliable only when it yields the same results regardless of the circumstances under which it is administered or the markers who score it. The validity of a test refers to “the degree to which the test actually measures what it is intended to measure” (Brown, 1995, p.387). A test is considered to be valid if it possesses content validity, face validity and construct validity. The practicality of a test is administrative. A test is practical when it saves time and money and is easy to administer, mark and interpret. The discrimination of a test is the extent to which a test separates the students from each other (Harrison, 1983). In other words, it is the capacity of the test to discriminate among different students and to reflect the performance of individuals within the same group.

2.2 Achievement test

2.2.1 Definition

Achievement tests are used extensively at different levels of education because of their distinctive characteristics. Researchers define the notion of achievement tests in various ways.

Henning (1987, p.6) held that:

Achievement tests are used to measure the extent of learning in a prescribed content domain, often in accordance with explicitly stated objectives of a learning program.

From this definition, it follows that an achievement test is a measurement tool designed to examine the language competence of learners over a period of instruction and learning, and to evaluate the instructional program. In the same vein, Hughes (1989) stated that achievement tests are intended to assess how successful individual students, groups of students or the courses themselves have been in achieving objectives. Achievement tests play an important role in education programs, especially in evaluating students' acquired language knowledge and skills during a given course.

2.2.2 Types of achievement test

Achievement tests can be subdivided into final achievement tests and progress achievement tests according to the time of administration and the desired objectives (Heaton, 1990).

Final achievement tests are usually given at the end of the school year or at the end of the course to measure how far students have achieved the teaching goals. The contents of these tests must be related to the teaching content and objectives concerned.

Progress achievement tests are usually administered during the course to measure the progress that students are making. The results of the test enable teachers to identify the weaknesses of the learners and to diagnose the areas not properly mastered by students during the course, so that remedial action can be taken.

Heaton (1990) also stated that the two types of test differ in that final achievement tests are designed to cover a longer period of learning and should attempt to cover as much of the syllabus as possible.

2.2.3 Considerations in final achievement test construction

On the basis of these characteristics, Heaton (1990) stated that covering much of the content of a syllabus or a course book is a requirement for designing a final achievement test. To establish and maintain a certain standard, testers should base the test on the syllabus or course book rather than on their own teaching. In addition, McNamara (2000) stated that test writers should draw up a test specification before writing a test. A test specification results from the process of designing the test content and test method. It has to include information on the length, the structure of each part of the test, the type of materials with which the candidates will have to engage, the source of the materials, the extent to which authentic materials may be altered, the response format and how responses are scored. Specifications are usually written before the test, and the test is then written on the basis of the specifications. After the test is written, the specification should be consulted again to see whether the test matches the objectives set in the specification.

2.3 MCQs test

Multiple-choice question tests (MCQs tests) are objective tests which require no particular knowledge or training in the examined content area on the part of the scorer (Henning, 1990). They differ from subjective tests in terms of scoring methods: no matter which examiner marks the test, a testee will get the same score on the test (Heaton, 1988).

MCQs tests use multiple-choice questions, also called multiple-choice items, as a testing technique. An MC item is a test item where the test taker is required to choose the only correct answer from a number of given options (McNamara, 2000; Weir, 1990).

In the view of Heaton (1988), MC items take many forms but their basic structure includes two parts. The initial part is known as the stem. The primary purpose of the stem is to present the problem clearly and concisely. The stem needs to give the testees a very general idea of the problem and the answer required. The stem may be in the form of an incomplete statement, a complete statement or a question. The other part consists of the choices from which the students select their answers, referred to as options, responses or alternatives. In an MC item there may be three, four or five options, of which one is the correct option, or key, while the others are distractors, whose task is to distract the majority of poor students from the correct option. The optimum number of options for each multiple-choice item in most public tests is five. It is desirable to use four options for grammar items and five for vocabulary and reading.

2.3.2 Benefits of MCQs test

MC items are undoubtedly one of the most widely used types of items in objective tests (Heaton, 1988). The popularity of this testing technique results from its efficiency. Researchers such as Weir (1990), Heaton (1988) and Hughes (1989) pointed out a number of benefits, which are presented below.

Firstly, the scoring of MCQs tests is perfectly reliable, rapid and economical. There is only one correct answer in the format of an MC item, so the scorers' interference in the test is minimized. The scorers are not permitted to impose their personal expertise, experience, attitudes and judgment when giving marks to testees' responses. The testees thus always get a consistent result whoever the scorers are and whenever their tests are marked. In addition, MCQs tests can be marked mechanically with minimal human intervention. As a result, the marking is not only reliable and simple but also more rapid and often more cost-effective than other forms of written test (Weir, 1990).

Secondly, an MCQs test can cover a much wider sample of knowledge than a subjective test. When taking an MCQs test, a candidate only has to make a mark on the paper, and it is therefore possible for testers to include more items in a given period of time (Hughes, 1988). With a large number of items in the test, the coverage of knowledge in MC items is broad and is very useful for identifying students' strengths and weaknesses and distinguishing their ability.

Thirdly, MCQs tests increase test reliability. According to Heaton (1988) and Weir (1990), it is not difficult to obtain reliability for MCQs tests because of their perfectly objective scoring. Besides, because the testees do not have to deploy the skill of writing as in open-ended items, and because MC items have a clear and unequivocal format, the extent to which measurement error affects the trait being assessed is reduced.

Another benefit is that MC items can be trialed beforehand fairly easily. From these trials, the difficulty level of each item and that of the test as a whole can usually be estimated in advance (Weir, 1990). The resulting item difficulty estimates contribute greatly to designing a test that is more appropriate to the candidates' level of language.
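One common way to express item difficulty from trial data is the proportion of test-takers who answer the item correctly, often called the item facility or p-value. The sketch below assumes that definition and uses made-up trial responses; the function name `item_facility` and the data are illustrative only and do not come from the study.

```python
def item_facility(item_responses):
    """Proportion of correct (1) answers among 0/1 responses to one item."""
    if not item_responses:
        raise ValueError("at least one response is required")
    return sum(item_responses) / len(item_responses)


# Hypothetical trial data: one list of 0/1 item scores per test-taker.
trial = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
]

# Difficulty of each item (one column per item), then of the test as a whole
# (taken here, as an assumption, to be the mean of the item p-values).
p_values = [item_facility([row[j] for row in trial]) for j in range(len(trial[0]))]
test_difficulty = sum(p_values) / len(p_values)
print(p_values, test_difficulty)
```

In this made-up trial the third item is answered correctly by nobody, which is the kind of result that would prompt a test writer to revise or drop the item before the live administration.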

In addition, Heaton (1988, p.27) claimed that “multiple-choice items can provide a useful means of teaching and testing in various learning situations (particularly at the lower levels) provided that it is always recognized that such items test knowledge of grammar, vocabulary, etc. rather than the ability to use language”. MC items can be very useful in measuring students' ability to recognize correct grammatical forms, etc., and can therefore help both teachers and students to identify areas of difficulty.

As far as computer-based MCQs tests are concerned, according to McNamara (2000), many important national and international language tests, including TOEFL, are moving to computer-based testing (CBT) because of rapid developments in computer technology. The main feature of CBT is that stimulus texts and prompts are presented not in examination booklets but on the screen, with candidates being required to key in their responses. The advent of CBT has not necessarily involved any change in the test content but often simply represents a change in test method. McNamara (2000) noted that the proponents of computer-based testing can point to a number of advantages. First, just as with paper-based MCQs tests, scoring of fixed-response items can be done automatically and the candidate can be given a score immediately. Second, the computer can deliver tests that are tailored to the particular abilities of the candidate. This type of test, also called a computer-adaptive test, can provide far more information about the testees' ability.

2.3.3 Limitations of MCQs tests

Despite the fact that MCQs tests bring many benefits, especially to test administrators, there are several problems associated with the use of MC items. These problems were identified by a number of researchers such as Weir (1990), Hughes (1989), Heaton (1988), McCoubrie (2004) and McNamara (2000).

First of all, Hughes (1989) criticized the MCQ technique for testing only recognition knowledge. To do a given task, a testee just needs to look at the stem and the four or five options and then pick out the key. His or her performance is not much more than the recognition of the right form of language. It provides no evidence that this person can produce the language. For at least some candidates there is a gap between productive and receptive skills, and therefore performance on an MCQs test may give an inaccurate picture of these candidates' ability (Hughes, 1989). Heaton (1988) also pointed out that an MC item does not lend itself to testing language as communication, and that the process involved in selecting one out of four or five options does not bear much relation to the language used in most real-life situations. Normally, in everyday situations we are required to both produce and receive language, while MC items are merely aimed at testing receptive skills.

Another problem that arises when using MCQs tests is that “the multiple-choice item is one of the most difficult and time-consuming types of items to construct” (Heaton, 1988, p.27). In order to write a good item, test designers have to strictly follow certain principles. For example, they have to write many more items than they actually need for a test. After that, they have to pre-test the items, analyze students' performance on them, evaluate the items and identify the usable ones, or even rewrite items to reach a satisfactory final version. These procedures take a lot of the test constructors' time and need far more careful preparation than subjective tests.

Furthermore, objective tests of the MCQs type encourage guessing (Weir, 1990; Heaton, 1988; Hughes, 1989). Hughes estimated that the chance of guessing the correct answer in a three-option multiple-choice item is roughly 33%; in a four- or five-option item it is 25% or 20% respectively. The format of MC items makes it possible for testees to complete some items without reference to the texts they are set on. As a result, the scores gained in MCQs tests may be suspect and the score range may become narrow.
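The percentages above follow from treating a blind guess as a uniform choice among the options, so the chance of hitting the key is one divided by the number of options. The short sketch below reproduces that arithmetic; the function names and the 50-item test length are hypothetical and used only for illustration.

```python
def chance_of_correct_guess(num_options: int) -> float:
    """Probability of picking the key by blind guessing among equally likely options."""
    if num_options < 2:
        raise ValueError("an MC item needs at least two options")
    return 1.0 / num_options


def expected_chance_score(num_items: int, num_options: int) -> float:
    """Expected number of items answered correctly by guessing alone."""
    return num_items * chance_of_correct_guess(num_options)


for k in (3, 4, 5):
    print(f"{k} options: {chance_of_correct_guess(k):.0%} chance per item")

# Hypothetical 50-item test with four options per item.
print(expected_chance_score(50, 4))
```

Under these assumptions, a testee who guesses every item on the hypothetical 50-item, four-option test would be expected to score about a quarter of the marks, which illustrates why the lower end of the score range carries little information.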

Some other limitations in the use of MC items involve backwash and cheating. Backwash may be harmful because MC items require students to memorize as many structures and forms as possible and do not stimulate them to produce language. Thus practicing MC items is not a good way to improve learners' command of language. Cheating may be facilitated because MC items make it easy for students to communicate with each other and exchange selected responses nonverbally.

With regard to computer-based tests, according to McNamara (2000), this type of test requires the prior creation of an item bank which has been thoroughly trialed. Preparing a standardized item bank that estimates difficulty for candidates at given levels of ability as precisely as possible is not easy. In addition, delivering CBT raises questions of validity and reliability. For example, different levels of familiarity with computers or with reading texts on computer screens will affect students' performance. These differences might make it difficult to draw conclusions about a candidate's ability.

2.3.4 Principles to construct MC items

A large number of principles apply to constructing a good MC item; they can be summarized as follows (Heaton, 1988):

• Each MC item should have only one answer.

• Only one feature at a time should be tested.

• Each option should be grammatically correct when placed in the stem, except in the case of specific grammar test items.

• All multiple-choice items should be at a level appropriate to the proficiency level of the testees.

• Multiple-choice items should be as brief and as clear as possible.

• Multiple-choice items should be arranged in rough order of increasing difficulty, and there should be one or two simple items to “lead in” the testees.

2.4 Reliability of a test

In research, the term reliability means 'repeatability' or 'consistency'. A test is considered reliable if it would give us the same result over and over again, assuming that what we are measuring isn't changing. Lynch (2003, p.83) stated that reliability refers to “the consistency of our measurement”. In the same vein, Harrison (1983) explained that to be reliable, tests should not be elastic in their measurement. Whatever version of the test a testee takes, on whatever occasion the test is administered, and whichever raters score the test, it still yields the same results.

2.4.2 Methods of test reliability estimate

Reliability may be estimated through a variety of methods, which are presented below:

* Test-retest method is a classic way to calculate the reliability coefficient of a test. The test is given to a group of students and then given again to the same students shortly afterwards (the interval between the two test administrations is no more than two weeks). The test is assumed to be perfectly reliable if the students get the same scores on the first and the second administration (Alderson et al., 1995).
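A test-retest coefficient is usually obtained by correlating the scores from the two administrations. The sketch below assumes the Pearson product-moment correlation is used for this purpose; the helper name `pearson_r` and both score lists are invented for illustration and are not data from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary in both lists")
    return sxy / sqrt(sxx * syy)


# Hypothetical scores of the same five students on the two administrations.
first_administration = [12, 15, 9, 20, 17]
second_administration = [13, 14, 10, 19, 18]

r_tt = pearson_r(first_administration, second_administration)
print(round(r_tt, 3))
```

The closer the printed coefficient is to 1, the more consistently the two administrations rank the same students. The same helper could be applied to scores from two parallel forms.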

* Parallel-form methods involve correlating the scores from two or more similar (parallel) tests which are administered to the same sample of persons. A formula for this method may be expressed as follows (Henning, 1987):

$$R_{tt} = r_{A,B}$$

$R_{tt}$: the reliability coefficient

$r_{A,B}$: the correlation of form A with form B of the test when administered to the same people at the same time

* Inter-rater method is applied when scores on the test are independent estimates by two or more raters. It involves the correlation of the ratings of one rater with those of another. The following formula is used in calculating reliability:

$$R_{tt} = \frac{n\,r_{A,B}}{1 + (n-1)\,r_{A,B}}$$

$R_{tt}$: inter-rater reliability

$n$: the number of raters whose combined estimates form the final mark for the examinee

$r_{A,B}$: the correlation between the raters, or the average correlation among all raters if there are more than two
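As a worked illustration of the inter-rater formula, two raters whose marks correlate at 0.60 give 2(0.60) / (1 + 0.60) = 0.75. The sketch below computes the same quantity; the function name and the input values are hypothetical.

```python
def inter_rater_reliability(n_raters: int, r_ab: float) -> float:
    """Reliability of the combined mark of n raters, given their (average) correlation."""
    if n_raters < 1:
        raise ValueError("at least one rater is required")
    denominator = 1 + (n_raters - 1) * r_ab
    if denominator == 0:
        raise ValueError("formula is undefined for this combination of n and r")
    return n_raters * r_ab / denominator


# Hypothetical values: two raters, then three raters, with a correlation of 0.60.
print(round(inter_rater_reliability(2, 0.60), 2))
print(round(inter_rater_reliability(3, 0.60), 2))
```

With a fixed positive correlation, adding raters raises the estimated reliability of the combined mark, which is why the number of raters appears in the formula.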

* Internal consistency method judges the reliability of the test by estimating how consistent test-takers' performances on different parts of the test are with each other (Bachman, 1990). The following internal consistency measures can be used:

Split-half reliability involves dividing a test into two halves and correlating the scores on them. The more strongly the two halves correlate, the higher the reliability will be. This method uses the following formula:

$R_{tt} = \frac{2 \, r_{A,B}}{1 + r_{A,B}}$

$R_{tt}$: the reliability estimated by the split-half method

$r_{A,B}$: the correlation of the scores from one half of the test with those from the other half
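
A minimal sketch of the split-half estimate follows. It assumes an odd/even split of the items, which is one common way of halving a test but is not specified in the text above; the response matrix and function names are invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def split_half_reliability(responses):
    """responses: one list of 0/1 item scores per student.

    Assumption: the test is halved into odd- and even-numbered items.
    """
    half_a = [sum(row[0::2]) for row in responses]  # items 1, 3, 5, ...
    half_b = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    r_ab = pearson(half_a, half_b)
    return 2 * r_ab / (1 + r_ab)  # Rtt = 2 rA,B / (1 + rA,B)


# Hypothetical responses of three students to a four-item test.
sample = [[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 0, 0]]
print(round(split_half_reliability(sample), 2))
```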

Kuder-Richardson Formula 20 (KR-20) is based on item-level data and is used when the tester has the results for each test item. The KR-20 is as follows:

$R_{tt} = \frac{n}{n-1} \cdot \frac{s_t^2 - \sum s_i^2}{s_t^2}$

$R_{tt}$: the KR-20 reliability estimate

$n$: the number of items in the test

$s_t^2$: the variance of test scores

$\sum s_i^2$: the sum of the variances of all items (or $\sum pq$)
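
The KR-20 computation can be sketched as below. The sketch assumes dichotomous (0/1) item scores and uses the population variance for the total scores, so that it matches the item variances computed as pq; the response matrix and function name are invented for illustration.

```python
def kr20(responses):
    """KR-20 estimate from a matrix of 0/1 item scores (one row per student)."""
    n_students = len(responses)
    n_items = len(responses[0])

    # Variance of the total scores (population variance: an assumption).
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_students
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_students

    # Sum of item variances; for a 0/1 item the variance is p * q.
    sum_pq = 0.0
    for i in range(n_items):
        p = sum(row[i] for row in responses) / n_students
        sum_pq += p * (1 - p)

    return (n_items / (n_items - 1)) * (var_total - sum_pq) / var_total


# Hypothetical responses of four students to a three-item test.
sample = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(round(kr20(sample), 2))
```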

Kuder-Richardson Formula 21 (KR-21) is based on total test scores and assumes that all items are of an equal level of difficulty. The KR-21 is as follows:

$R_{tt} = \frac{n}{n-1} \left( 1 - \frac{\bar{x} - \bar{x}^2 / n}{s_t^2} \right)$

$R_{tt}$: the KR-21 reliability estimate

$n$: the number of items in the test

$\bar{x}$: the mean of scores on the test

$s_t^2$: the variance of test scores
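
For comparison, a sketch of KR-21 needs only the total scores and the number of items. It assumes the standard form of the formula, n/(n-1) multiplied by 1 - (mean - mean²/n)/variance, with the population variance; the scores and function name are invented for illustration.

```python
def kr21(total_scores, n_items):
    """KR-21 estimate from total scores only (assumes equally difficult items)."""
    n_students = len(total_scores)
    mean = sum(total_scores) / n_students
    # Population variance of the total scores (an assumption of this sketch).
    variance = sum((t - mean) ** 2 for t in total_scores) / n_students
    return (n_items / (n_items - 1)) * (1 - (mean - mean ** 2 / n_items) / variance)


# Hypothetical total scores of four students on a three-item test.
print(round(kr21([3, 2, 1, 0], n_items=3), 2))
```

When items differ in difficulty, KR-21 generally gives a lower estimate than KR-20 for the same data.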

Alderson, J.S et al. (1995) stated that for internal consistency reliability, the perfect reliability index is +1.0. In the same view, Hughes (1989, p.31-32) noted that “the ideal reliability coefficient is 1 - a test with a reliability coefficient of 1 is one which would give precisely the same results for a particular set of candidates regardless of when it happened to be administered”. The reliability coefficient for a good vocabulary, structure and reading test is usually in the 0.90 to 0.99 range, for an auditory comprehension test it is more often in the 0.80 to 0.89 range, and for an oral production test it may be in the 0.70 to 0.79 range, while an MCQs test typically has a reliability coefficient of more than 0.80 (Hughes, 1989).

Among the above ways of estimating reliability, the test-retest and parallel-form methods require at least two test administrations, while the inter-rater and internal consistency methods need only a single administration. For reasons of convenience and satisfaction, KR-20 and KR-21 are chosen more often than the others and are considered the two most common formulae (Alderson J.S et al., 1995).

Concerning MCQs tests, besides the test reliability coefficient, item analysis, including item difficulty and item discrimination, provides more precise insight into test reliability (Henning, 1997).

The formula for calculating item difficulty is:

$p = \frac{\sum C_r}{N}$

$p$: the proportion correct

$\sum C_r$: the sum of correct responses

$N$: the number of students

Henning (1987) pointed out that the p value for each item should be between 0.33 and 0.67 for the level of difficulty of the item to be acceptable. If the p value is below 0.33, the item is considered too difficult. If it is above 0.67, the item is too easy.
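
The difficulty index and the 0.33 to 0.67 band can be sketched as follows. The function names and the example counts are hypothetical and serve only to illustrate the calculation.

```python
def item_difficulty(correct_responses: int, n_students: int) -> float:
    """p = (sum of correct responses) / (number of students)."""
    return correct_responses / n_students


def classify_difficulty(p: float, low: float = 0.33, high: float = 0.67) -> str:
    """Label an item using the 0.33 to 0.67 acceptance band."""
    if p < low:
        return "too difficult"
    if p > high:
        return "too easy"
    return "acceptable"


# Hypothetical item answered correctly by 300 of 349 students.
p = item_difficulty(300, 349)
print(round(p, 2), classify_difficulty(p))
```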

The formula for computation of item discrimination is:

$D = \frac{H_c}{H_c + L_c}$

$D$: discriminability

$H_c$: the number of correct responses in the high group

$L_c$: the number of correct responses in the low group

The optimal size of each group is 28% of the total sample. For very large samples of examinees, the size of the high and low groups is reduced to 20% for computational convenience. The acceptable discrimination value by the sample separation method is >= 0.67 (Henning, 1987).
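
A sketch of the sample separation method is given below. It ranks students by total score, takes the top and bottom 28% as the high and low groups, and applies D = Hc / (Hc + Lc). The rounding of the group size, the handling of an item nobody in either group answers correctly, and the data are assumptions of this sketch.

```python
def item_discrimination(item_scores, total_scores, group_fraction=0.28):
    """Discrimination of one item by sample separation.

    item_scores: 0/1 score of each student on the item
    total_scores: total test score of each student (same order)
    """
    n = len(total_scores)
    size = max(1, round(n * group_fraction))  # group size (rounding is an assumption)
    ranked = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    high_group, low_group = ranked[:size], ranked[-size:]
    hc = sum(item_scores[i] for i in high_group)  # correct responses, high group
    lc = sum(item_scores[i] for i in low_group)   # correct responses, low group
    if hc + lc == 0:
        return None  # D is undefined when neither group answers correctly
    return hc / (hc + lc)


# Hypothetical data for ten students.
totals = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
item = [1, 1, 1, 1, 0, 1, 0, 1, 0, 0]
print(item_discrimination(item, totals))
```

On this definition an item answered equally well by both groups scores 0.5, which is why the acceptance threshold of 0.67 sits well above the midpoint.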

2.4.3 Measures to improve test reliability

Reliability may be improved by eliminating its sources of error. Hughes (1989) makes a list of recommendations to improve test reliability as follows:

- Take enough samples of behavior

- Do not allow candidates too much freedom

- Write unambiguous items

- Provide clear and explicit instructions

- Ensure that the tests are well laid out and perfectly legible

- Candidates should be familiar with the format and testing techniques

- Provide uniform and non-distracting conditions of administration

Furthermore, item difficulty and item discriminability indicate whether the reliability of an MCQs test is low or high (Henning, 1987). Therefore, the most straightforward way to improve test reliability is to design MCQs items with good levels of difficulty and discrimination.

2.5 Summary

This chapter presents the theoretical framework for the study. In Section 2.1, the notion of a language test as a device for measuring people's ability is reviewed. Additionally, the purposes of language testing, types of language tests and criteria of a good test are discussed. Section 2.2 classifies achievement tests into two types and mentions considerations in designing final achievement tests. The definition, benefits and limitations of MCQs tests and the principles of constructing this type of test are dealt with in Section 2.3. The final section, 2.4, is concerned with test reliability, methods for estimating test reliability, and ways to make language tests more reliable.

Chapter 3: The Context of the Study

3.1 The current English learning, teaching and testing situation at HUBT

There are over 1500 second-year non-majors of English at HUBT. English is their required foreign language subject. Their levels of proficiency vary because of their different backgrounds, knowledge of language, exposure to English, characteristics, learning attitudes, motivations and so on. These students have to cover a comparatively large amount of knowledge of English, as English holds the highest number of credits among all subjects. In the English Department of HUBT, there are 62 teachers in total who work enthusiastically with the non-English majors to help them with the foreign language. They are all dedicated and qualified, with an average of five years' teaching experience.

With the aim of equipping students with the business English and communication skills necessary for their future careers, learning and teaching activities for the second-year non-English majors mainly focus on developing speaking and listening skills. However, the testing process is quite complicated and can be described as follows.

In semester 4 the students undergo daily assessment and take four tests altogether. Daily assessment includes checking vocabulary and speaking skills, and doing tasks in the course book and practice files. The four tests comprise two paper tests and two computer-based MCQs tests. These tests are designed by teachers of the English Department, HUBT. The paper tests, given in the middle of the term (week 9) and at the end of the term (week 17), focus on listening, writing, grammar and vocabulary. The computer-based MCQs tests are administered on computers in week 19. Each test lasts 2 hours and includes 150 multiple-choice items emphasizing vocabulary, grammar, reading and functional language. The construction of the first test (hereafter achievement test 1) is based on three units of the course book (Units 7, 8 and 9) that the students have already learnt. The second one (achievement test 2) is designed on the basis of the last three units of the course book (Units 10, 11 and 12). Items of the MCQs tests are selected by one person in charge of teaching English in the 3rd and 4th semesters for the second-year students.

The computer-based MCQs test administered at HUBT is similar to a paper-based one. The main difference is that the test is delivered on computers and students simply click the mouse to choose their response among A, B, C and D. This kind of test is different from computer adaptive tests, which are tailored to the particular abilities of the candidate. In other words, a computer-based MCQs test at HUBT is in fact an MCQs test delivered on computers.

The following chart illustrates the testing guidelines for semester 4:

Semester 4 (12 credits): the first score (6 credits) and the second score (6 credits)

This study focuses only on the final achievement computer-based MCQs test 1.

3.2 The course objectives, syllabus and materials used for the second-year non-majors of English in Semester 4

3.2.1 The course objectives

The training objectives in the 4th semester are to help students to:

- Further develop speaking and listening skills in business contexts

- Further develop skills of reading business texts

- Consolidate basic grammar

- Broaden business vocabulary

- Further practice pronunciation

- Write business letters and memorandums

3.2.2 Business English syllabus

The syllabus is described in the following table:

Grammar review correction/

220 8 Starting up- Vocabulary C.B- 70-71 P.F- 32

Grammar Review correction

Note: C.B: Course book; P.F: Practice file; T.B: Teacher's book

Table 3: The syllabus for the 4th semester (for non-English majors)

Time allocation for language skills and sections is illustrated as follows:

practicing functional language)

The course book checklists used for examining the tasks and content in the course book on which the achievement computer-based MCQs test 1 is based are in Appendix 1.

3.2.4 Specification grid and scoring scale for the final achievement Computer-based MCQs test 1 in Semester 4

In order to evaluate students' achievement, the following grid is used to design achievement test 1.

mark

Skill weighting

sentences, approx 18 words

50 x; 4 multiple choice

sentences, approx 18 words

factual test, approx 60 words

language

Short sentences, approx 16 words

Table 5: Specification grid for the final computer-based MCQs test 1

The scoring scale for the test is designed by the teachers at HUBT and includes two levels as follows:

Pass: for students who score 50% or more of the whole test

Fail: for students who score below 50% of the whole test

Chapter 4: Methodology

4.1 Participants

The first group of subjects who participated in this study includes 349 second-year students from 14 classes. Their test scores were collected for the purpose of analyzing and computing the internal consistency reliability, item difficulty, and item discriminability.

The second group of subjects, who took part in answering a questionnaire, includes 236 second-year non-English majors. Their responses to 14 questions were analyzed in order to investigate the students' attitude towards the final achievement MCQs test 1.

4.2 Data collection instruments

The following instruments were adopted to obtain information for the study:

- Kuder-Richardson Formula 20 for internal consistency reliability estimate

- Item difficulty and item discrimination formulae mentioned in section 2.4.2

- A questionnaire survey for students (see Appendix 2)

The questionnaire was designed on the basis of Henning's list of threats to the reliability of a test (1987). The objective is to find out students' attitude towards the reliability of the current achievement MCQs test 1 in the 4th term. The questionnaire included 14 items and was in Vietnamese to make sure the informants understood the questions appropriately (see Appendix 2). These items focus on the characteristics of the test, test administration and test-takers.

4.3 Data collection procedure

The data about the test objectives and the course objectives were obtained from the English Department Bulletin, HUBT, issued in 2003. The data about the syllabus content were collected from the syllabus for the second-year students. The data about the test content and test format were obtained from a copy of the official current test from the English Department.

The data about the students' test scores and item responses were obtained from a file containing both the students' scores and responses on the test, provided by the Informatics Department, HUBT.

The questionnaire data were collected from 236 second-year students who were randomly selected one week after they had finished the final achievement test 1.

4.4 Data analysis procedure

First, the test objectives were compared with the course objectives, the test content with the syllabus content, and the skill weight in the test format with that in the syllabus, in order to determine whether they are compatible with each other.

Second, the reliability coefficient and the item difficulty and item discrimination indices of the MCQs test 1 were analyzed in order to determine the extent to which the final achievement test 1 is reliable.

Finally, students' responses to the questionnaire were analyzed in order to find out students' attitude towards the MCQs test given to them.

Chapter 5: Results and Discussions

5.1 The compatibility of the objectives, content and skill weight format of the final achievement computer-based MCQs test 1 for the 4th semester with the course objectives and the syllabus

5.1.1 The test objectives and the course objectives

As mentioned in section 3.2.1, the course mainly aims to further develop students' essential business communication skills in speaking, such as making presentations, taking part in meetings, negotiating, telephoning and using English in social situations. Through a lot of interesting discussion activities, students will build up their confidence in using English and improve their fluency. The course also aims at developing students' listening skills, such as listening for information and note-taking. In addition, it provides students with important new words and phrases and increases their business vocabulary. Students' reading skills will also be built up through authentic articles on a variety of business topics. The course also helps students to revise and consolidate basic grammar, to improve their pronunciation and to perform some writing tasks on business letters and memorandums.

The MCQs test 1 is designed to check what students have learnt about vocabulary, grammar, reading topics and functional language in Units 7, 8 and 9 of Market Leader Pre-. It is also constructed to assess students' achievement at the end of the course, especially to evaluate students' results after completing these 3 units. In particular, the vocabulary and grammar sections, made up of 100 items, are aimed at examining the amount of vocabulary and grammar that students have been taught. The reading section of 30 items is to measure students' reading skills on business topics such as marketing, planning and managing. The functional language section of 20 items is to measure students' ability to communicate in daily business situations.

Obviously, the objectives of the course and of the MCQs test 1 are partially compatible with each other. That is to say, the course provides students with knowledge of vocabulary, grammar and functional language and develops students' reading skills, and the MCQs test 1 is designed to measure students' command of this knowledge and these skills. However, the difference is that the course objectives aim to develop both receptive and productive skills, whereas the test focuses merely on students' receptive skill of reading and examines students' ability to recognize knowledge rather than to produce language.

5.1.2 The test item content in four sections and the syllabus content

* Grammar section

The grammar items in the test are shown clearly and specifically in the table below

tested items

Percentage of tested items

Table 6: Main points in the grammar section

Compared to the grammar checklist (see Appendix 1), it can be seen that test items in this section generally cover grammar items in the course book such as question forms, future time expressions and reported speech. However, the total proportion of these items only makes up 56%, a little higher than the total percentage of items which are not targeted in the grammar part of the syllabus, such as prepositions, connectors, comparatives, verb tenses and verb forms.

1 Noun- noun collocation (Marketing

Table 7: Main points in the vocabulary section

In comparison with the vocabulary checklist (see Appendix 1), it can be seen that 80% of the test items in the vocabulary section are the same as the vocabulary items in the course book. That is to say, the test items stick to what students have learnt, such as noun-noun collocations relating to marketing terms, verb-noun collocations relating to ways to plan and verb-preposition collocations relating to ways to manage. Nevertheless, there are also items such as verbs showing trends, multi-word verbs and adjectives related to profits which are not included in the vocabulary part of the syllabus but appear in the reading articles in Units 7, 8 and 9.

* Reading comprehension section

In this section, there are 30 extracts, the main topics of which are shown as follows:

tested items

Percentage

of tested items

2 Company profile 5 16.7

7 The role of Public Relation

department

Table 8: Topics in reading section

By comparing the reading section with the reading checklist (see Appendix 1), it can be observed that the topics in the MCQs test 1, such as managing, marketing and planning, are highly relevant to the ones that the students have already learnt.

* Functional language section

This section includes 20 items on business situations. The functions of language in these situations are presented in the following table:

Table 9: Items in the functional language sections

Comparing Table 9 with the functional language checklist (see Appendix 1), it can be seen that the test items broadly cover what the students have already been taught in business situations (for example, telephoning, meetings, and socializing & entertaining). However, there is a lack of language items on interrupting and making excuses, although they are focal points in the syllabus.

To sum up, with regard to content, the items in the four sections of the MCQs test 1 are to a large extent relevant to the course book.

5.1.3 The skill weight format in the test and the syllabus

According to the skill weight format in the syllabus illustrated in Table 4 (section 3.2.2), among the four parts, namely reading, vocabulary, grammar and functional language, reading has the highest proportion of skill weight (18%) and ranks number 1. Grammar ranks number 2 with a skill weight of 14%. Functional language ranks number 3 with 13%, and vocabulary is at the bottom with 12%.

However, in the test specification grid, the skill weighting for the four sections is not in the same rank order as in the syllabus. The vocabulary and grammar sections, with 50 test items each, share the same rank (number 1), whereas reading (30 test items) and functional language (20 test items) rank number 3 and 4 respectively. Thus it can be seen that in the MCQs test 1, the rank of the reading section changes from number 1 to number 3 and that of the vocabulary section from number 4 to number 1.

From the detailed findings presented above, we can see that the objectives of the MCQs test 1 are partially compatible with the course objectives. Also, the skill weight format of the MCQs test 1 is partially similar to the skill weight format in the syllabus. Only the content of the MCQs test 1 reflects nearly all of the course book content. It might thus be concluded that the MCQs test 1 is to a certain degree related to the teaching content and objectives.

5.2 The reliability of the final achievement test

Table 10: Test reliability coefficient

As stated in Chapter 2, the typical reliability coefficient for MCQs tests is >= 0.8, and the closer it gets to 1.0, the better it is. However, the reliability coefficient of the MCQs test 1 here is too low in comparison with the desirable one.

5.2.2 Item difficulty and discrimination value

The difficulty and discriminability values for each of the 150 tested items are given in Appendix 5.

* Item difficulty value

Among the 150 items, there are 54 items whose p value is greater than 0.67, making up 36% of the total test items, while there are no items with a p value smaller than 0.33 (see Appendix 5). That means 64% of the test items have an acceptable difficulty level, 36% are too easy and none is too difficult.

In addition, the MCQs test 1 merely obtained an average p value of 0.55 (see Appendix 3). The following table illustrates the p-values for items in the 4 sections of the MCQs test 1:

Table 11: p-value of items in 4 sections

Table 11 shows that half of the test items in the grammar section and especially 95% of the test items in the functional language section are too easy. It appears that the MCQs test 1 includes too many items that are too easy, especially in the functional language section. Besides, this test as a whole does not have a range of items with a desirable average p-value of 0.55. Accordingly, items with an undesirable difficulty index in this test might reduce the test reliability.

* Item discrimination value

Among the 150 items, there are 76 items whose discrimination values are acceptable (>= 0.67). The others are non-discriminating (see Appendix 5).

The following table demonstrates the discrimination values for items in the 4 sections of the MCQs test 1:

Table 12: Discrimination value of items in 4 sections

Table 12 shows that 95% of the items in the functional language section are non-discriminating. Roughly half of the items in the vocabulary and grammar sections are also not of good discrimination value. Only items in the reading section can discriminate students well. Thus it can be inferred that the item discriminability of the MCQs test 1 is not as good as expected.

* The number of items with both acceptable p-value and discrimination value is 68, making up 45.3% of the whole test (see Appendix 5). This means that only 45.3% of the test items are of good quality.
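
The combined criterion can be sketched as a simple filter over the two indices. The item values below are invented for illustration and are not taken from Appendix 5; the function name and thresholds as parameters are assumptions of this sketch. The last line only re-derives the reported share of 68 items out of 150.

```python
def acceptable_items(p_values, d_values, p_low=0.33, p_high=0.67, d_min=0.67):
    """Return the indices of items with acceptable difficulty and discrimination."""
    return [
        i
        for i, (p, d) in enumerate(zip(p_values, d_values))
        if p_low <= p <= p_high and d >= d_min
    ]


# Hypothetical indices for four items (not data from this study).
p_values = [0.45, 0.80, 0.60, 0.50]
d_values = [0.70, 0.90, 0.55, 0.67]

good = acceptable_items(p_values, d_values)
share = 100 * len(good) / len(p_values)
print(good, share)

# Share of good items reported in the text: 68 of 150 items.
print(round(100 * 68 / 150, 1))
```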

The number of test items with acceptable p-value and discrimination value in the 4 sections of the MCQs test 1 is shown as follows:

Section No of items with acceptable p-value and

Table 13: Number of test items with acceptable p-value and discrimination value in 4 sections

From this table we can see that items in the reading section have the best quality, as they satisfy the requirements for p-value and discrimination value. Then come the items in the vocabulary section. The items in the grammar section, and especially in the functional language section, are undesirable since they are too easy and non-discriminating.

In brief, the findings show that the MCQs test 1 to a large extent lacks reliability for two reasons. First, the reliability coefficient of this test is too far from the desirable reliability coefficient of an MCQs test. Second, more than half of the test items (54.7%) do not have good p-value and discrimination value.

5.3 The attitude of students towards the MCQs test 1

The survey questionnaires were delivered to 236 second-year non-English majors, but only 218 papers were collected. The following are the results:

APPENDIX 2 Survey questionnaire