(– TỔNG CÔNG TY XÂY DỰNG HÀ NỘI)
M.A. minor thesis
Field: Methodology
Code: 60.14.10
HANOI – 2008
Supervisor: Phùng Hà Thanh, M.A.
ACKNOWLEDGEMENTS

I would like to express my deepest thanks to my supervisor, Ms Phùng Hà Thanh, M.A., for the invaluable support, guidance, and timely encouragement she gave me while I was doing this research. I am truly grateful to her for her advice and suggestions right from the beginning, when this study was only in its formative stage. I would also like to send my sincere thanks to the teachers of the English Department, HATECHS, who took part in the discussion and gave insightful comments and suggestions on this paper.

My special thanks also go to the students in groups KT1, KT2, KT3 and KT4-K06 for their participation as the subjects of the study. Without them, this project could not have been so successful.

I owe a great debt of gratitude to my parents, my sisters, my husband and especially my son, who have constantly inspired and encouraged me to complete this research.
ABSTRACT

Test evaluation is a complicated phenomenon which has received much attention from a number of researchers since the importance of language tests in assessing students' achievement was raised. When evaluating a test, the evaluator should concentrate on the criteria of a good test, such as the mean, the difficulty level, discrimination, reliability and validity.

In the present study, the researcher chose the final reading test for students at HATECHS to evaluate, with the aim of estimating its reliability and checking its validity. This is a new test that followed the PET format and was used in the school year 2006 – 2007 as a procedure to assess the achievement of students at HATECHS. From the interpretation of the score data, the researcher has found that the final reading test is reliable in terms of internal consistency. The face and construct validity have been checked as well, and the test is concluded to be valid based on the calculated validity coefficients. However, the study has limitations, which lead to the researcher's directions for future studies.
ABBREVIATIONS
HATECHS: Hanoi Technical and Professional Skills Training School
LIST OF TABLES
Table 1: Types of language tests
Table 4: The syllabus for teaching English – Semester 2
Table 7: The raw scores of the final reading test and the PET
GLOSSARY

Discrimination is the spread of scores produced by a test, or the extent to which a test separates students from one another on a range of scores from high to low. The term is also used to describe the extent to which an individual multiple-choice item separates the students who do well on the test as a whole from those who do badly.
Difficulty is the extent to which a test or test item is within the ability range of a
particular candidate or group of candidates.
Mean is a descriptive statistic measuring central tendency. The mean is calculated by dividing the sum of a set of scores by the number of scores.
Median is a descriptive statistic measuring central tendency: the middle score or value in a set.

Marker, also scorer, is the judge or observer who operates a rating scale in the measurement of oral and written proficiency. The reliability of markers depends in part on the quality of their training, the purpose of which is to ensure a high degree of comparability, both inter- and intra-rater.
Mode is a descriptive statistic measuring central tendency: the most frequently occurring score or score interval in a distribution.
Raw scores are test data in their original format, not yet transformed statistically in any way (e.g. by conversion into percentages, or by adjusting for the level of difficulty of the task or any other contextual factors).
Reading comprehension test is a measure of understanding of text.
Reliability is consistency: the extent to which the scores resulting from a test are similar wherever and whenever it is taken, and whoever marks it.
measure. The measure may be based on ratings, judgements, grades, or the number of test items correct.
Standard deviation is a property of the normal curve. Mathematically, it is the square root of the variance of the test scores.
Test analysis is the analysis of data from test trials during the test development process to evaluate individual items as well as the reliability and validity of the test as a whole. Test analysis is also carried out following test administration in order to allow the reporting of results. It may also be conducted for research purposes.
Test item is the part of an objective test which sets the problem to be answered by the student: usually either in multiple-choice form, as a statement followed by several choices of which one is the right answer and the rest are not, or as a true/false statement which the student must judge to be either right or wrong.
Test taker is a term used to refer to any person undertaking a test or examination.
Other terms commonly used in language testing are candidate, examinee, testee.
Test-retest is the simplest method of computing test reliability; it involves administering the same test to the same group of subjects on two occasions. The time between administrations is normally limited to no more than two weeks in order to minimize the effect of learning upon true scores.
Validity is the extent to which a test measures what it is intended to measure. Test validity consists of content, face and construct validity.
PART ONE: INTRODUCTION
1 Rationale
Testing is necessary in the process of language teaching and learning; therefore, it has gained much concern from teachers and learners. Through testing, teachers can evaluate learners' achievements in a certain learning period, assess their different teaching methods, and provide input into the process of language teaching (Bachman, 1990, p. 3). Thanks to testing, learners can also self-assess their English ability to examine whether their level of English meets the demands of employment or studying abroad. The important role of tests makes test evaluation necessary. By evaluating tests, test designers can produce the best test papers for assessing their students.
Despite the importance of testing, in many schools tests are designed without following any rigorous principles or procedures; thus, their validity and reliability should be doubted. At HATECHS, the final English course tests had been designed by teachers of the English Department at the end of the course, and some tests were used repeatedly with no adjustment. In the school year 2006 – 2007, there was a change in test design: final tests were designed according to the PET (Preliminary English Test) procedure. The PET belongs to the Cambridge testing system for English for Speakers of Other Languages. Based on the PET, a new final reading test was developed and used as an instrument to assess students' achievement in reading skills. The test was delivered to students at the end of the school year 2006 – 2007, but there was no evaluation of it. To decide whether the test is reliable and valid, a serious study is needed. The context at HATECHS has inspired the author, a teacher of English, to take this opportunity to undertake the study entitled "Evaluating a Final English Reading Test for the Students at Hanoi Technical and Professional Skills Training School", with the aim of checking the validity and reliability of the test. The author was also eager to have a chance to find suggestions for test designers to produce better and more effective tests for their students.
2 Objectives of the study
The study is aimed at evaluating the final reading test for the students at Hanoi Technical and Professional Skills Training School. The test takers are non-majors. The results of the test will be analyzed, evaluated and interpreted with the following aims:

- to calculate the internal consistency reliability of the test;
- to check the face and construct validity of the test.
3 Scope of the study
Test evaluation is a wide concept, and there are many criteria for evaluating a test. Normally, there are four major criteria – item difficulty, discrimination, reliability and validity – that any test evaluator considers when evaluating a test. However, item difficulty and the discrimination of the test are said to be difficult to evaluate and interpret; therefore, within this study the researcher focuses on the reliability and the validity of the test as a whole.

At HATECHS, at the end of Semester 1, there is a reading achievement test, and at the end of the first year, after 120 periods of studying English, there is a final reading test. The researcher chose the final test to evaluate its internal consistency reliability and its face and construct validity.
4 Methodology of the study
In this study, the author evaluated the test by adopting both qualitative and quantitative methods. The research is quantitative in the sense that the data were collected through the analysis of the scores of 30 randomly selected papers of students at the Faculty of Finance and Accounting. To calculate the internal consistency reliability, the researcher used the Kuder-Richardson Formula 21, and the Pearson correlation coefficient formula was adopted to calculate the validity coefficient. It is qualitative in its use of a semi-structured interview with open questions, which was delivered to teachers at HATECHS at the annual meeting on the teaching syllabus and methodology. The conclusions of that discussion were used as the qualitative data of the research.
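As an illustration of the second of these statistics, the Pearson correlation coefficient can be sketched in a few lines. The scores below are invented for illustration; the study's own data are the 30 sampled papers.

```python
def pearson_r(x, y):
    """Pearson product-moment correlation between two lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Invented scores of the same candidates on two measures
# (e.g. the final reading test and a criterion measure such as the PET)
test_scores = [10, 12, 14, 16, 18]
criterion_scores = [11, 11, 15, 15, 19]

validity_coefficient = pearson_r(test_scores, criterion_scores)
```

A coefficient close to 1 indicates a strong positive relationship between the two sets of scores.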
5 The organization of the study
The study is divided into three parts:
Part one: Introduction – presents basic information such as the rationale, the scope, the objectives, the methods and the organization of the study.

Part two: Development – consists of two chapters.

Chapter 1: Literature Review – reviews the literature related to language testing and test evaluation.

Chapter 2: Methodology and Results – is concerned with the methods of the study, the selection of participants, the materials, and the methods of data collection and analysis, as well as the results of the data analysis.

Part three: Conclusion – summarizes the study and presents its limitations as well as recommendations for further studies. The Bibliography and Appendices then follow.
PART TWO: DEVELOPMENT
CHAPTER 1
LITERATURE REVIEW
This chapter is an attempt to establish the theoretical background for the study. Approaches to language testing and testing reading, as well as some literature on test evaluation, will be reviewed.
1.1 Language testing
1.1.1 Approaches to language testing
1.1.1.1 The essay translation approach
According to Heaton (1998), this approach is commonly referred to as the pre-scientific stage of language testing. In this approach, no special skill or expertise in testing is required. Tests usually consist of essay writing, translation and grammatical analysis. The tests, for Heaton, also have a heavy literary and cultural bias. He also criticized the fact that public examinations (i.e. secondary school leaving examinations) resulting from the essay translation approach sometimes have an aural/oral component at the upper intermediate and advanced levels, though this has sometimes been regarded in the past as something additional and in no way an integral part of the syllabus or examination (p. 15).

1.1.1.2 The structuralist approach
"This approach is characterized by the view that language learning is chiefly concerned with the systematic acquisition of a set of habits. It draws on the work of structural linguistics, in particular the importance of contrastive analysis and the need to identify and measure the learner's mastery of the separate elements of the target language: phonology, vocabulary and grammar. Such mastery is tested using words and sentences completely divorced from any context, on the grounds that a large sample of language forms can be covered in the test in a comparatively short time. The skills of listening, speaking, reading and writing are also separated from one another as much as possible because it is considered essential to test one thing at a time" (Heaton, 1998, p. 15).
According to him, this approach is still valid for certain types of test and for certain purposes, such as the desire to concentrate on the testees' ability to write by attempting to separate a composition test from reading. The psychometric approach to measurement, with its emphasis on reliability and objectivity, forms an integral part of structuralist testing. Psychometrists were able to show early on that such traditional examinations as essay writing are highly subjective and unreliable. As a result, the need for statistical measures of reliability and validity came to be considered of the utmost importance in testing: hence the popularity of the multiple-choice item – a type of item which lends itself admirably to statistical analysis.
1.1.1.3 The integrative approach
Heaton (1998, p. 16) considered this approach the testing of language in context; it is thus concerned primarily with meaning and the total communicative effect of discourse. As a result, integrative tests do not seek to separate language skills into neat divisions in order to improve test reliability; instead, they are often designed to assess the learner's ability to use two or more skills simultaneously. Thus, integrative tests are concerned with a global view of proficiency – an underlying language competence or 'grammar of expectancy', which it is argued every learner possesses regardless of the purpose for which the language is being learnt.
Integrative testing, according to Heaton (1998), is best characterized by the use of cloze testing and dictation. Besides these, oral interviews, translation and essay writing are also included in many integrative tests – a point frequently overlooked by those who take too narrow a view of integrative testing.
Heaton (1998) points out that the cloze procedure as a measure of reading difficulty and reading comprehension will be treated briefly in the relevant section of the chapter on testing reading comprehension. Dictation, another major type of integrative test, was previously regarded solely as a means of measuring students' skills of listening comprehension. Thus, the complex elements involved in tests of dictation were largely overlooked until fairly recently. The integrated skills involved in dictation tests include auditory discrimination, auditory memory span, spelling, the recognition of sound segments, familiarity with the grammatical and lexical patterning of the language, and overall textual comprehension.
1.1.1.4 The communicative approach
According to Heaton (1998, p. 19), "the communicative approach to language testing is sometimes linked to the integrative approaches. However, although both approaches emphasize the importance of the meaning of utterances rather than their form and structure, there are nevertheless fundamental differences between the two approaches". The communicative approach is said to be very humanistic: each student's performance is evaluated according to his or her degree of success in performing the language tasks rather than solely in relation to the performance of other students (Heaton, 1998, p. 21).
However, the communicative approach to language testing reveals two drawbacks. First, teachers will find it difficult to assess students' ability without comparing achievement results among students. Second, the communicative approach is claimed to be somewhat unreliable because of the variety of real-life situations (Hoang, 2005, p. 8). Nevertheless, Heaton (1988) proposes a solution to this matter: in his view, to avoid the lack of reliability, very carefully drawn-up and well-established criteria must be designed, but he does not set out any criteria in detail.
In a nutshell, each approach to language testing has its weak points as well as its strong points. Therefore, a good test should incorporate features of all four approaches (Heaton, 1988, p. 15).
1.1.2 Classifications of Language Tests
Language tests may be of various types but different scholars hold different views onthe types of language tests.
Henning (1987), for instance, establishes seven kinds of language tests, which can be summarized as follows:
1. Objective vs. subjective tests: Objective tests have a clear marking scale and do not need much consideration by markers. Subjective tests are scored based on the raters' judgements or opinions; they are claimed to be unreliable and rater-dependent.
2. Direct vs. indirect tests: Direct tests are in the form of spoken tests (in real life …)
…
4. Aptitude, achievement and proficiency tests: Aptitude tests (intelligence tests) are used to select students for a special programme. Achievement tests are designed to assess students' knowledge in already-learnt areas. Proficiency tests (placement tests) are used to select students for a desired field.
5. Criterion-referenced vs. norm-referenced tests: In criterion-referenced tests, the instructions are designed after the tests are devised; the tests follow the teaching objectives perfectly. In norm-referenced tests, a large number of people from the target population take part; standards of achievement, such as the mean or average score, are established after the course.
6. Speed tests vs. power tests: Speed tests consist of easy items, but the time allowed is insufficient. Power tests contain difficult items, but the time allowed is sufficient.
7. Others

Table 1: Types of language tests (Source: Henning, 1987, pp. 4-9)
However, Hughes (1989) mentions two categories: kinds of tests and kinds of language testing. Basically, kinds of language testing consist of direct vs. indirect testing, norm-referenced vs. criterion-referenced testing, discrete vs. integrative testing, and objective vs. subjective testing (Hughes, 1989, pp. 14-19). Apart from these, he develops one more type called communicative language testing, which is described as the assessment of the ability to take part in acts of communication (Hughes, 1989, p. 19). Hughes also discusses kinds of tests, which can be illustrated in the following table:
1. Proficiency: sufficient command of the language for a particular purpose.
2. Achievement (final achievement and progress achievement): final achievement tests are organized after the end of the course; progress achievement tests measure the students' progress.
3. Diagnostic: finds students' strengths and weaknesses, and what further teaching is necessary.
4. Placement: classifies students into classes at different levels.

Table 2: Types of tests (Source: Hoang, 2005, p. 13, as cited in Hughes, 1990, pp. 9-14)
Language tests are divided into two types by McNamara (2000), based on test methods and test purposes. Regarding test methods, he believes there are two basic types: traditional paper-and-pencil language tests, which are used to assess either separate components or receptive understanding, and performance tests. Regarding test purposes, he divides language tests into two types: achievement tests and proficiency tests.
1.2 Testing reading
Reading can be defined as the interaction between the reader and the text (Aebersold & Field, 1997). This dynamic relationship portrays the reader as creating the meaning of the text in relation to his or her prior knowledge (Anderson, 1999). Reading is one of the four main skills and plays a decisive role in the process of acquiring a language. Therefore, testing reading comprehension is also important. Traditionally, the testing of reading has not been in doubt, both because of the social importance of literacy and because reading tests are considered more reliable than speaking tests.
Alderson (1996) proposes that reading teachers feel uncomfortable testing reading. According to him, although most teachers use a variety of techniques in their reading classes, they do not tend to use the same variety of techniques when they administer reading tests. Despite the variety of testing techniques, none of them is subscribed to as the best one. Alderson (1996, 2000) considers that no single method satisfies reading teachers, since each teacher has different purposes in testing. He lists a number of test techniques or formats often used in reading assessment, such as cloze tests, multiple-choice techniques, alternative objective techniques (e.g., matching techniques, ordering tasks, dichotomous items), editing tests, alternative integrated approaches (e.g., the C-test, the cloze elide test), short-answer tests (e.g., the free-recall test, the summary test, the gapped summary), and information-transfer techniques. Among the many approaches to testing reading comprehension, the three principal methods have been the cloze procedure, multiple-choice questions, and short-answer questions (Weir, 1997).
The cloze test is now a well-known and widely used integrative language test. Wilson Taylor (1953) first introduced the cloze procedure as a device for estimating the readability of a text. However, what brought the cloze procedure widespread popularity were investigations of the cloze test as a measure of ESL proficiency (Jonz, 1976, 1990; Bachman, 1982, 1985; Brown, 1983, 1993). The results of the substantial volume of research on cloze tests have been extremely varied. Furthermore, major technical defects have been found with the procedure. Alderson (1979), for instance, showed that changes in the starting point or deletion rate affect reliability and validity coefficients. Other researchers, such as Carroll (1980), Klein-Braley (1983, 1985) and Brown (1993), have questioned the reliability and different aspects of the validity of cloze tests.
According to Heaton (1998), "cloze test was originally intended to measure the reading difficulty level of the text. Used in this way, it is a reliable means of determining whether or not certain texts are at an appropriate level for particular groups of students" (p. 131). However, for Heaton the most common purpose of the cloze test is to measure reading comprehension. It has long been argued that cloze measures text-level comprehension, involving the interdependence of phrases, sentences and paragraphs within the text. A true cloze is generally said to measure global reading comprehension, although insights can undoubtedly be gained into particular reading difficulties. In contrast, Cohen (1998) concludes that cloze tests do not assess global reading ability, but they do assess local-level reading. Each researcher tends to present evidence to prove their arguments; however, most of them agree that the cloze procedure is really effective in testing reading comprehension.
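The fixed-rate deletion at the heart of the cloze procedure is mechanical enough to sketch in code. The deletion rate of five below is an arbitrary choice for illustration; as Alderson (1979) notes, the choice of starting point and deletion rate affects the resulting reliability and validity coefficients.

```python
def make_cloze(text, nth=5):
    """Replace every nth word of the text with a blank,
    returning the gapped text and the deleted words (the answer key)."""
    words = text.split()
    gapped, key = [], []
    for i, word in enumerate(words, start=1):
        if i % nth == 0:
            key.append(word)
            gapped.append("______")
        else:
            gapped.append(word)
    return " ".join(gapped), key

passage = "The cloze procedure deletes words from a text at a fixed rate"
gapped_text, answer_key = make_cloze(passage, nth=5)
```

A real cloze test would normally leave the opening sentence intact to give readers some context before the first gap.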
Another technique that Alderson (1996, 2000), Cohen (1998), and Hughes (2003) discuss is the 'multiple-choice' item, a common device for testing text comprehension. Ur (1996, p. 38) defines multiple-choice questions as consisting "of a stem and a number of options (usually four), from which the testee has to select the right one". Alderson (2000, p. 211) states that multiple-choice test items are popular because they provide testers with the means to control test-takers' thought processes when responding; they "… allow testers to control the range of possible answers …".
Weir (1993) points out that short-answer tests are extremely useful for testing reading comprehension. According to Alderson (1996, 2000), 'short-answer tests' are seen as 'a semi-objective alternative to multiple choice'. Cohen (1998) argues that open-ended questions allow test-takers to copy the answer from the text, but one first needs to understand the text to write the right answer. Test-takers are supposed to answer a question briefly by drawing conclusions from the text, not just responding 'yes' or 'no'; they are supposed to infer meaning from the text before answering the question. Such tests are not easy to construct, since the tester needs to foresee all possible answers. Hughes (2003, p. 144) points out that "the best short-answer questions are those with a unique correct response". However, scoring the responses depends on thorough preparation of the answer key. Hughes (2003) proposes that this technique works well when the aim is to test the ability to identify referents.
The techniques above are those usually used in testing reading; however, it is difficult to say which is the most effective, because that depends on the teacher's purpose in assessing the students.
1.3 Criteria in evaluating a test
Test evaluation is a complicated phenomenon; the process requires the analysis of a number of criteria. However, there are five main criteria against which most researchers evaluate their tests: the mean, the difficulty level, discrimination, reliability and validity.
1.3.1 The mean
According to the dictionary of language testing by Milanovic and other authors, the mean, also called the arithmetical average, is a descriptive statistic measuring central tendency. The mean is calculated by dividing the sum of a set of scores by the number of scores. Like other measures of central tendency, the mean gives an indication of the trend, or the score which is typical of the whole group. In normal distributions the mean is closely aligned with the median and the mode. This measure is by far the most commonly used, and it is the basis of a number of statistical tests of comparison between groups commonly used in language testing (Milanovic et al., 1999, p. 118). In language test evaluation, the mean is also a criterion that needs evaluating, because the mean score of the test will tell you how difficult or easy the test was for the given group. This is useful for evaluators in making reasonable adjustments to the test as a whole.
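The three measures of central tendency mentioned here are direct computations; a minimal sketch, with scores invented for illustration:

```python
from statistics import mean, median, mode

# Invented raw scores for illustration only
scores = [12, 15, 15, 17, 18, 20, 21]

print(mean(scores))    # sum of the scores divided by their number
print(median(scores))  # the middle score in the ordered set
print(mode(scores))    # the most frequently occurring score
```

For these invented scores the mean is about 16.9, the median 17 and the mode 15, so the three measures sit close together, as they do in a normal distribution.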
1.3.2 The difficulty level
The difficulty level of a test tells you how difficult or easy each item of the test is. Difficulty also reflects the ability range of a particular candidate or group of candidates. "In language testing, most tests are designed in such a way that the majority of items are not too difficult or too easy for the relevant sample of test candidates" (Milanovic et al., 1999, p. 44).
Item difficulty requirements vary according to test purpose. In a selection test, for example, there may be no need for finely graded assessment within the 'pass' or 'fail' groups, so that the most efficient test design will have a majority of items clustering near the critical cut-score. Information about item difficulty is also useful in determining the order of items on a test. Tests tend to begin with easy items in order to boost confidence and to ensure that weaker candidates do not waste valuable time on items which are too difficult for them.
For test evaluators, the difficulty level of a test should be analyzed because of its importance in deciding the sequence of items on a test. It is also one of the factors that affect the test scores of test-takers.
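In the simplest formulation, the difficulty (or facility) value of a single item is the proportion of candidates who answered it correctly. A sketch with invented responses:

```python
def facility_value(responses):
    """Item difficulty: the proportion of candidates answering correctly.
    `responses` holds 1 for a correct answer and 0 for an incorrect one."""
    return sum(responses) / len(responses)

# Invented responses of ten candidates to one item: 7 correct, 3 incorrect
fv = facility_value([1, 1, 1, 0, 1, 0, 1, 1, 0, 1])
print(fv)  # 0.7
```

A value near 1 marks a very easy item and a value near 0 a very hard one; items in the middle range give the most information about the group.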
1.3.3 Discrimination
According to Heaton, "the discrimination index of an item indicates the extent to which the item discriminates between the testees, separating the more able testees from the less able" (Heaton, 1998, p. 179). For him, the index of discrimination tells us whether those students who performed well on the whole test tended to do well or badly on each item in the test.
Similarly, in Milanovic's definition, discrimination is understood as "a fundamental property of a language test" in its attempt to capture the range of individual abilities; on that basis, discrimination is an important indicator of a test's reliability (Milanovic et al., 1999, p. 48).
By looking at the test scores, evaluators can check discrimination. Because of its decisive role in separating the test takers into weaker and stronger groups, the discrimination of a test needs analyzing in the process of evaluating a test.
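One common way to quantify this, under the usual convention of comparing an upper and a lower scoring group of equal size, is the discrimination index D. The responses below are invented for illustration:

```python
def discrimination_index(upper, lower):
    """D = (correct answers in the upper group - correct answers in the
    lower group) divided by the size of one group. `upper` and `lower`
    are equal-sized lists of 1/0 responses to one item, for the top and
    bottom scorers on the test as a whole."""
    return (sum(upper) - sum(lower)) / len(upper)

# Invented responses to one item from the five strongest
# and the five weakest candidates on the whole test
d = discrimination_index([1, 1, 1, 1, 0], [1, 0, 0, 0, 0])
print(d)  # 0.6
```

A positive D means the item favours the stronger candidates, as it should; a D near zero or negative flags an item that fails to discriminate.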
1.3.4 Reliability
Reliability is another property of a test that should be estimated by the test evaluator. "Reliability is often defined as consistency of measurement" (Bachman & Palmer, 1996, p. 19). A reliable test score will be consistent across different characteristics of the testing situation. Thus, reliability can be considered a function of the consistency of scores from one set of test tasks to another. Reliability also means "the consistency with which a test measures the same thing all the time" (Harrison, 1987, p. 24).
For test evaluators, reliability can be estimated by several methods, such as "parallel form, split half, rational equivalence, test-retest and inter-rater reliability checks" (Milanovic et al., 1999, p. 168). Following Shohamy (1985), the types of reliability, their descriptions and the ways to calculate them are summarized in the following table:
1. Test-retest – Description: the extent to which the test scores are stable from one administration to another, assuming no learning occurred between the two occasions. How to calculate: correlations between scores on the same test given on two occasions.
2. Parallel form – Description: the extent to which two tests taken from the same domain measure the same things. How to calculate: correlations between the scores on the two forms of the test.
3. Internal consistency – Description: the extent to which the test questions are related to one another and measure the same trait. How to calculate: Kuder-Richardson Formula 21.
4. Intra-rater – Description: the extent to which the same rater is consistent in his rating from one occasion to another, or on one occasion but with different test-takers. How to calculate: correlations between scores of the same rater on different occasions, or on one occasion.
5. Inter-rater – Description: the extent to which different raters agree about the assigned score or rating. How to calculate: correlations among ratings provided by different raters.

Table 3: Types of reliability (Source: Hoang, 2005, p. 31, as cited in Shohamy, 1985, p. 71)
However, reliability is said to be a necessary but not a sufficient quality of a test, and the reliability of a test is closely interlocked with its validity. While reliability focuses on the empirical aspects of the measurement process, validity focuses on the theoretical aspects and seeks to interweave these concepts with the empirical ones. For this reason it is easier to assess reliability than validity.
Test reliability can be analyzed by looking at the test scores. If the test scores remain unchanged across the different times the test is taken, the test is said to be reliable, and vice versa. However, this depends on conditions and situations such as the circumstances in which the test is taken, the way in which it is marked, and the uniformity of the assessment it makes. Evaluators therefore need to take these into account when they estimate the reliability of a test.
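The Kuder-Richardson Formula 21, listed in Table 3 for internal consistency, needs only the number of items on the test and the mean and variance of the candidates' total scores. A minimal sketch with invented scores:

```python
def kr21(n_items, scores):
    """Kuder-Richardson Formula 21 estimate of internal consistency.
    n_items: number of items on the test; scores: candidates' raw totals."""
    n = len(scores)
    m = sum(scores) / n                          # mean total score
    var = sum((x - m) ** 2 for x in scores) / n  # variance of total scores
    k = n_items
    return (k / (k - 1)) * (1 - (m * (k - m)) / (k * var))

# Invented raw scores of eight candidates on a hypothetical 40-item test
estimate = kr21(40, [20, 25, 30, 35, 28, 22, 31, 27])
```

Values approaching 1 indicate that the items pull in the same direction; KR-21 tends to give a conservative (lower-bound) estimate when item difficulties vary.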
1.3.5 Validity
Validity is the most important consideration in test evaluation. The concept refers to the appropriateness, meaningfulness and usefulness of the specific inferences made from test scores. Test evaluation is the process of accumulating evidence to support such inferences. Validity, however, is a unitary concept. Although evidence may be accumulated in many ways, validity refers to the degree to which that evidence supports the inferences that are made from the scores. It is the inferences regarding specific uses of a test that are validated, not the test itself.
Traditionally, validity evidence has been gathered in three distinct categories: content-related, criterion-related and construct-related evidence of validity. More recent writing on validity theory stresses the importance of viewing validity as a 'unitary concept' (Messick, 1989). Thus, while validity evidence is presented here in separate categories, this categorization is principally an organizational technique for the purposes of presentation.
According to Milanovic et al. (1999), content and construct validity are conceptual, whereas concurrent and predictive (criterion-related) validity are statistical. In other words, scores obtained on the test may be used to investigate criterion-related validity, for example by relating them to other test scores or measures, such as teachers' assessments or future predictions (pp. 220-221).
Another type of test validity, for Milanovic et al., is face validity, which refers to the degree to which a test appears to measure the knowledge or abilities it claims to measure, as judged by an untrained observer such as the candidate taking the test or the institution which plans to administer it (Milanovic et al., 1999, p. 221).
In their book, Alderson et al. (1995) divide validity into other categories: internal, external and construct validity. Internal validity, according to them, consists of three sub-types – face, content and response validity. External validity has two sub-types – concurrent and predictive validity. Construct validity relates to five forms: comparison with theory, internal correlations, comparison of biodata and psychological characteristics, multitrait-multimethod analysis with convergent-divergent validation, and factor analysis (Alderson et al., 1995, pp. 171-186).
Since the validity of a test is paid much attention by a number of researchers, test evaluators should take time in checking the validity of the test against the categories established by these authors. Through the test scores, evaluators check whether the test is valid or not, so that they can make good adjustments to the test they evaluate.
Summary: In this chapter, we have attempted to establish the theoretical framework for the thesis. Language testing is one of the most important procedures for language teachers in assessing students. There are a number of approaches to language testing and to testing reading; these have been discussed in the first part of the chapter. The second matter explored in the chapter is the theory of test evaluation, concerning the criteria of a test that need analyzing by test evaluators.
CHAPTER 2
METHODOLOGY AND RESULTS
This chapter includes the research questions, the selection of the participants who took part in the study, and the testing materials. The methods of data collection and data analysis, as well as the results, are presented afterwards.
2.1 Research questions
On the basis of the literature review, this chapter aims at answering two researchquestions:
1) Is the final reading test for the students at HATECHS reliable?
2) To what extent is the final reading test valid in terms of face and construct?
2.2 The participants
The students at HATECHS come from different provinces, cities and towns in the North of Vietnam. They are generally aged between 18 and 21. Thirty participants were chosen randomly from the students of the Faculty of Finance and Accounting in the school year 2006 – 2007. All of them are first-year students. In addition, seven teachers from the English Department were chosen for the interview. These teachers are all female, and most have more than five years' experience of teaching English. All of them took part in teaching the students in the school year 2006 – 2007.
At the school, the students take an English course in the first year. The course is divided into two components, each lasting 60 periods. It is a compulsory subject at the school. After finishing the course, students are required to reach pre-intermediate level. However, students often have varying English levels prior to the course: some of them have learnt English for 7 years at high school, some have learnt it for 3 years, depending on the part of the country they come from, and some have never learnt English at all, because at the lower levels of school they learned other foreign languages. It is