Concurrent validity of the English tests in the National Secondary School Leaving Examination, school years 2008-2009 and 2009-2010

VIETNAM NATIONAL UNIVERSITY, HANOI UNIVERSITY OF LANGUAGES AND INTERNATIONAL STUDIES

FACULTY OF POST-GRADUATE STUDIES

M.A. Minor Programme Thesis

Major: Methodology

Code: 601410

HANOI, 2010

Supervisor: HOÀNG THỊ XUÂN HOA, Ph.D.

1.2 Quality of a good test

1.2.4.2 Reasons for giving more emphasis to concurrent validity

1.3.2.5 The standard deviation

2.1.4 The National secondary school leaving examination

3.1.2 Descriptive statistics

2 Implications for improvement of the English test in the NSSLE

Appendix 1: The English test in the NSSLE school year 2008-2009

Appendix 2: The English test in the NSSLE school year 2009-2010

Appendix 3: Example of calculating correlation coefficient

Appendix 8: Example of calculating the standard deviation

LIST OF ABBREVIATIONS

EFL: English as a foreign language

ELT: English language teaching

L2: Second language; in the context of this study, it usually refers to English

NSSLE: National Secondary School Leaving Examination

LIST OF TABLES AND FIGURES

TABLES

Table 2.2: Students’ grade in English, school year 2009-2010

PART A: INTRODUCTION

1 Rationale

Education has always played an important part in people’s lives. Now, amid the world economic crisis, solid knowledge and skills help people keep their present jobs or find new ones more easily. However, it is said that effective education is impossible without effective management, and that knowledge and skills must be checked and controlled effectively. Therefore, testing in education, an attempt to measure a person’s knowledge, intelligence, or other characteristics in a systematic way, is of great significance.

Ever since the English language began to be taught in formal settings, the development of tests to assess learners’ performance has been an integral part of the language learning and teaching process. Language testing, then, is central to language teaching. It provides goals for language teaching and monitors success in reaching those goals.

In Vietnam, the English test in the National Secondary School Leaving Examination (NSSLE) plays an important role in English language teaching (ELT): it stimulates student progress and evaluates students’ achievement in acquiring English over the seven years from junior to upper secondary school. Passing or failing this test is a matter of great concern, as it decides whether a student can proceed to higher education. Besides, through this test teachers can also evaluate the effectiveness of a new teaching method or of new materials (Valette, 1977). The English tests in the NSSLE in school years 2008-2009 and 2009-2010 were particularly significant because they gave high school English teachers a chance to look back at their successes and failures in implementing the new English textbook series, English 10, English 11, and English 12, which was officially introduced in 2004, and therefore to make necessary amendments in the following school years.

This paper is the writer’s attempt to evaluate the concurrent validity of the English tests in the NSSLE in school years 2008-2009 and 2009-2010 by establishing the correlation between the scores of the two tests, showing how widely the scores are spread out, presenting how closely they cluster, and illustrating how well the tests separate students from each other. It is hoped that the results of the study will raise awareness among English teachers in general and among those interested in making better English tests for the NSSLE in particular.

2 Scope of the study

This research focuses only on the concurrent validity of the English tests in the NSSLE in school years 2008-2009 and 2009-2010. Other aspects of evaluating a language achievement test are therefore beyond the scope of this study.

Also, because the English tests in the NSSLE in school years 2008-2009 and 2009-2010 were multiple-choice tests marked by a scoring machine that reports only final test scores, a careful analysis of score patterns on each test item is out of the question.

In addition, due to limitations in time, ability and conditions, it is impossible for the author to take a sample population that includes representatives from different geographical areas (i.e. urban, rural, island and mountainous) as well as those from a variety of ethnic groups (Kinh, Cham, H’Mong, Khmer). Therefore, this study investigates the concurrent validity of the English tests in the NSSLE in school years 2008-2009 and 2009-2010 only in Gia Lam – Long Bien Districts, where the writer is currently working.

3 Aims of the study

This study is intended to examine the concurrent validity of the English tests in the NSSLE in school years 2008-2009 and 2009-2010. It places high emphasis on investigating and analyzing test scores in order to establish the correlation coefficient between the two sets of test results, reveal the spread of scores, and determine the tests’ ability to discriminate among students.

Third, the study also made use of supporting methods such as informal discussion, exchanges of opinion with teachers and colleagues, and consultation with an experienced and enthusiastic supervisor to gather the information needed.

5 Research questions

This study is implemented to find out the answers to the following research questions:

- How are the English tests in the National Secondary School Leaving Examination, school years 2008-2009, 2009-2010 correlated?

- How do scores of each test cluster together?

- How do scores of each test spread out?

- How do the tests discriminate students’ achievement?

6 Design of the study

The thesis is organized into three major parts:

Part A, INTRODUCTION, presents basic information such as the rationale, the aims, the methods, the research questions, and the design of the study.

Part B, DEVELOPMENT, provides the literature review, the methodology and the findings of the study in three corresponding chapters. Chapter one, the literature review, describes the theoretical background for evaluating a language test, including reasons for testing, criteria of a good language test, achievement tests and issues of concurrent validity. Chapter two, methodology, describes the setting of the study and the methodology employed to carry out the research. Chapter three, discussion and findings, presents in detail the results of the data analysis: the correlation coefficient of the two sets of scores, the mean, the mode, the range, and the standard deviation.

Part C, CONCLUSION, gives a summary of the study, its implications for improvement of the English test in the NSSLE, its limitations and suggestions for future research.

PART B: DEVELOPMENT

CHAPTER 1: LITERATURE REVIEW

1.1 Achievement tests

An achievement test is concerned with measuring a student’s competence with regard to what has been taught or what is in the syllabus. This type of test is usually given at the end of a period of instruction and, as a result, its content is a sample of what has been included in the syllabus. This test is normally school-based and typically provides a check on previous learning. However, it should be borne in mind that the purpose of achievement tests should be to indicate how successful the learning experiences have been for the learner, rather than to show in what respects they were insufficient, and the tests themselves should also be firmly grounded in preceding classroom experiences in terms of activities practiced, language used, and criteria of evaluation adopted (Weir, 1993).

1.1.1 Definition

There are a variety of definitions of achievement tests. As Baker (1982) put it, achievement tests are “used presumably to assess the subject matter and skills that students have learnt”. J.B. Heaton (1988) added that achievement tests should be based on “what students are presumed to have learnt – not necessarily on what they have actually learnt nor on what has actually been taught”. In the same vein, Hughes (1989) emphasized the importance of achievement tests in assessing students’ success in reaching the language course’s goals. Tim McNamara (2000) shared Hughes’ (1989) point of view. He asserted that “Achievement tests accumulate evidence during or at the end of a course of study in order to see whether and where progress has been made in terms of the goals of learning”.

These definitions all have one thing in common: achievement tests are used to determine a student’s academic strengths and weaknesses and to measure a student’s mastery of a given subject or skill. They are directly related to language courses, their purpose being to establish how successful individual students, groups of students, or the courses themselves have been in achieving objectives.

1.1.2 Kinds of achievement tests

Hughes (1989) divided achievement tests into two types, final and progress, which can be either content-based or objective-based. While final achievement tests are given upon completion of a course, progress achievement tests are delivered at a particular stage of a course to measure students’ advance towards the course’s objectives. Hughes (1989) also stated that content-based achievement tests are considered fair because they test what students have already encountered, but they can be misleading if the syllabus is badly designed or the materials are badly chosen. Objective-based achievement tests, by contrast, are beneficial because they help to determine whether the materials are consistent with the course objectives and they promote a positive backwash effect on teaching.

In contrast to Hughes, Alderson, Clapham and Wall (1995) considered achievement tests and progress tests to be two independent and distinct categories. They claimed that although progress tests and achievement tests are both content-based, they are given at different stages of the course. In my view, it is better to group the two types of test under one heading than to divide them, as they bear so much resemblance to each other.

McNamara (2000) asserted that achievement tests can gather evidence of whether and what students have acquired, as well as of how far the students have advanced toward the learning goals. Personally, I think that tests alone cannot reflect students’ progress exactly. Mental and physical health should also be taken into account.

1.1.3 Benefits of achievement testing

Achievement tests serve a variety of purposes, none of which, obviously, is to instill a sense of anxiety and frustration in the students and/or teachers. However, being assessed is inevitably an anxiety-provoking experience. Nevertheless, as teachers and students, we can also attest that a well-written, content-valid test provides us with an opportunity to take stock of what we have learned, and to demonstrate to ourselves and to others the knowledge and skills that we have accumulated. A well-constructed test will give both the teacher and students an appraisal of their respective achievements. It provides teachers with invaluable information regarding students’ needs and abilities, and a measure of how well the students have met the course objectives.

Rather than lessening self-confidence, achievement tests have the capacity to foster it. One of our goals as teachers is to create multiple opportunities for the students to experience success and to excel as language learners. Achievement tests provide one of the strongest ways in which we can help to instill and strengthen positive feelings. Such feelings towards the L2 learning experience as a whole can be fostered through tests that challenge students with items that have been designed to emphasize what the students are able to do with the L2. The importance of administering tests with challenging items and a high degree of content validity cannot be stressed enough. Administering tests loses its value if the items do not pose a particular challenge to the students and/or if the items do not adequately reflect the given body of content.

Assessment tests in the L2 classroom can foster language learning in a number of ways, including the following: (a) tests can enhance students’ motivation by serving as indicators of the progress that they have made, (b) tests can help students establish learning goals for themselves, both prior to and after the test, (c) tests can help students confirm their strengths and weaknesses, thus helping to promote autonomy in their learning experience, (d) tests are able to provide a degree of periodic closure to particular units, while providing students with a sense of accomplishment and mastery of the specified content area, (e) tests can assist teachers in evaluating their own effectiveness, and (f) tests can foster the retention of the particular content area by way of the feedback that they give regarding the students’ level of mastery.

In short, it is evident that assessment tests can be invaluable components of the L2 curriculum. Such tests foster the overall language development of the students, while simultaneously providing teachers with critical information regarding the students’ mastery of the specified instructional domain. Achievement tests can be administered in the second language classroom without jeopardizing the interactive, communicative focus of the L2 classroom that teachers value so greatly.

1.2 Qualities of a good test

1.2.1 Test reliability

A fundamental concern in the development and use of language tests is their reliability; that is, the stability of the test as a measure. Reliability refers to the consistency of the examination scores. It also refers to the extent to which the test produces consistent results when different markers mark it. Put another way, a test is reliable if it is consistent within itself and across time (Alderson, Clapham, & Wall, 1995:6).

In the same vein, Bachman (1990:160) noted that reliable test scores should be free from measurement error, whether caused by external factors such as testing conditions or by personal factors such as tiredness or anxiety.

[Figure 1.1 diagram labels: communicative language ability, test method facets, personal attributes, random factors, TEST SCORE]

Figure 1.1: Factors that affect language test scores (Bachman, 1990:165)

Figure 1.1 shows the effects of various factors on a test score. In this type of diagram, rectangles represent observed variables, such as test scores; ovals represent unobserved variables, or hypothesized factors; and straight arrows represent hypothesized causal relationships. The result of the effects of all these factors is that whenever individuals take a language test, they are not all likely to perform equally well, and so their scores vary.

Heaton (1988) named two factors affecting the reliability of a test: the extent of the sample of material selected for testing and the administration of the test. To measure the reliability of a test, he suggested either re-administering the same test to the same group after a lapse of time or administering parallel forms of the test to the same group. He noted that in the first case, it is assumed that all candidates have been treated in the same way in the interval: either they have all been taught or none of them have. Provided that such assumptions can be made, a comparison of the two results then shows how reliable the test has proved. In the second case, the parallel forms of the test should be identical in the nature of their sampling, difficulty, length, rubrics, etc. If the correlation between the two tests is high, the test can be termed reliable.

1.2.2 Practicality

Practicality was defined as “the relationship between the resources that will be required in the design, development, and use of the test and resources that will be available for these activities” (Bachman and Palmer, 1996). This relationship can be represented as below:

$$\text{Practicality} = \frac{\text{Available resources}}{\text{Required resources}}$$

If practicality ≥ 1, the test development and use is practical.

If practicality < 1, the test development and use is not practical.

Figure 1.2: Practicality (Bachman & Palmer, 1996)

Practicality is a matter of the extent to which the demands of the particular test specifications can be met within the limits of existing resources. If the resource demands of the test specifications do not exceed the available resources at any stage in test development, then the test is practical and test development and use can proceed. If available resources are exceeded, then the test is not practical and the developer must either modify the specifications to reduce the resources required, or increase the available resources or reallocate them so that they can be utilized more efficiently. Thus, a practical test is one whose design, development, and use do not require more resources than are available (Bachman & Palmer, 1996:36). This idea is similar to that of Harrison (1983:13), who pointed out that a test should be as economical as possible in time (preparation, sitting, and marking) and in cost (materials and hidden costs of time spent).
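The practicality ratio can be illustrated with a short Python sketch. The unit of resources (person-hours here), the figures, and the function names `practicality` and `is_practical` are assumptions made purely for illustration; Bachman and Palmer do not prescribe a unit or an implementation.

```python
def practicality(available: float, required: float) -> float:
    """Return the ratio of available resources to required resources."""
    if required <= 0:
        raise ValueError("required resources must be positive")
    return available / required


def is_practical(available: float, required: float) -> bool:
    """Treat test development and use as practical when the ratio is at least 1."""
    return practicality(available, required) >= 1


# Hypothetical figures: 120 person-hours available, 150 person-hours required.
print(practicality(120, 150), is_practical(120, 150))
```

In this hypothetical case the ratio falls below 1, so the developer would need either to reduce the resources the specifications require or to increase or reallocate the resources available.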

1.2.3 Comparison and Discrimination

In a sense, all assessment is based on comparison, either between one student and another, or between the student as he is now and as he was earlier, or between the student’s capability and the task the test requires him to perform. It is also important for assessment to have the capacity to discriminate among the different candidates and to reflect the differences in the performances of the individuals in the group (Heaton, 1988:165).

Also according to Heaton (1988:167), the differences can be seen in the spread of test scores. Briefly, the items in the test should be spread over a wide range of difficulty levels, as follows:

- extremely easy items

- very easy items

- easy items

- fairly easy items

- items below average difficulty level

- items above average difficulty level

- fairly difficult items

- difficult items

- very difficult items

- extremely difficult items

As for Harrison (1983:14), discrimination was defined as the “extent to which a test separates the students from each other”. However, the extent of the need to discriminate will vary depending on the purpose of the test, whether to check students’ mastery of the syllabus or to locate areas of difficulty for students.

1.2.4 Test validity

The primary concern in test development and use is demonstrating not only that test scores are reliable but also that the interpretations and uses we make of test scores are valid. In examining validity, we look beyond the reliability of the test scores themselves, and consider the relationships between test performance and other types of performance in other contexts. The types of performance and contexts we select for investigation will be determined by the uses or interpretations we wish to make of the test results.

Alderson et al. (1995:6) stated that: “Validity is the extent to which a test measures what it is intended to measure: it relates to the uses made of test scores and the ways in which test scores are interpreted, and is therefore always relative to test purpose.” Chapelle (1999:258) shared the same viewpoint: “Validity is considered an argument concerning test interpretation and use: the extent to which test interpretations and uses can be justified”.

In test validation, we are not examining the validity of the test content or even of the test scores themselves, but rather the validity of the way we interpret or use the information gathered through the testing procedure (Bachman, 1990).

1.2.4.1 Types of test validity

Authors in language testing have divided validity into subtypes in different ways. Hughes (1989:258) divided validity into four types: content validity, criterion-related validity, construct validity, and face validity. Alderson et al. (1995:171-183) classified validity into three types: internal validity, which has three subtypes (face validity, content validity and response validity); external validity, which has two subtypes (concurrent validity and predictive validity); and construct validity.

It has been traditional to classify validity into different types such as content, criterion, and construct validity, as Alderson et al. and Hughes did. However, measurement specialists have come to view these as aspects of a unitary concept of validity that subsumes all of them. This unitary view of validity has also been clearly endorsed by the measurement profession as a whole in the most recent revision of the Standards for Educational and Psychological Testing:

Validity … is a unitary concept. Although evidence may be accumulated in many ways, validity always refers to the degree to which that evidence supports the inferences that are made from the scores. The inferences regarding specific uses of a test are validated, not the test itself.

(American Psychological Association 1985: 9)

It is still necessary to gather information about content relevance, predictive utility, and concurrent criterion relatedness in the process of developing a given test. However, it is important to recognize that none of these by itself is sufficient to demonstrate the validity of a particular interpretation or use of test scores.

In this study I employ Hughes’ classification as it is simple and clear.

1.2.4.1.1 Face validity

Face validity is considered important. J.B. Heaton (1988:160) said that a test with good face validity can help maintain students’ motivation because it makes them try harder; conversely, students may not put maximum effort into the test if it does not look sound in their eyes. Hughes (1989:27) agreed with Heaton. He pointed out that a test which lacks face validity may not be accepted by candidates, teachers, education authorities and employers, and therefore may not be used. If it is used, the candidates may not perform in a way that truly reflects their ability.

1.2.4.1.2 Construct validity

A test has construct validity if it accurately measures a theoretical, non-observable construct or trait. The construct validity of a test is worked out over a period of time on the basis of an accumulation of evidence. J.B. Heaton (1988:161) noted that “if a test has construct validity, it is capable of measuring certain specific characteristics in accordance with a theory of language behavior and learning. This type of validity assumes the existence of certain learning theories or constructs underlying the acquisition of abilities and skills.” Bachman (1990:254-5) put it more simply. He stated that there should be consistency between performance on tests and the predictions we make on the basis of a theory of abilities or constructs. In his later publication with Palmer, Bachman (1996:21) supported this view by saying that the interpretation of a given test score can be an indicator of the ability(ies) or construct we want to measure.

[Figure 1.3 diagram labels: SCORE INTERPRETATION (inferences about language ability (construct definition), domain of generalization), TEST SCORE, the test task, interactiveness, authenticity, construct validity]

Figure 1.3: Construct validity of score interpretations (Bachman & Palmer, 1996:22)

The figure above indicates that test scores are to be interpreted appropriately as indicators of the ability we intend to measure with respect to a specific domain of generalization.

1.2.4.1.3 Content validity

A test has content validity if it measures knowledge of the content domain it was designed to measure. Put another way, content validity is primarily concerned with “what goes into the test” (Harrison, 1983); that is, with the content specification and the adequacy with which the test items representatively sample the content area to be measured. For example, a comprehensive math achievement test would lack content validity if good scores depended primarily on knowledge of English, or if it only had questions about one aspect of math (e.g., algebra).

According to J.B. Heaton (1988:160), “this kind of validity depends on a careful analysis of the language being tested and of the particular course objectives”. This is to say that the test should be constructed so as to contain a representative sample of the course and that the relationship between the test items and the course objectives should be apparent. Hughes (1989) asserted that in order to judge whether or not a test has content validity, a test specification made at an early stage of test construction should be taken into consideration. A comparison of the test specification and the test content is the basis for judgment as to content validity.

Alderson et al. (1995:173) argued that “content validation involves gathering the judgment of experts: people whose judgment one is prepared to trust, even if it disagrees with one’s own”. This means that expert judgment, not statistics, is the primary method used to determine whether a test has content validity. Nevertheless, the test should correlate highly with other tests that purport to sample the same content domain.

We can now see the most important distinction between content and face validity: in face validation, the judgment of others need not be accepted, while in content validation judgments from experts are gathered and trusted.

1.2.4.1.4 Criterion-related validity

Another approach to test validity is to see how far results on the test agree with those provided by some independent and highly dependable assessment of the candidate’s ability. This independent assessment is thus the criterion measure against which the test is validated. According to Hughes (1989:23-5), there are essentially two kinds of criterion-related validity: concurrent validity and predictive validity.

1.2.4.1.4.1 Concurrent validity

Concurrent validity is established when the test and the criterion are administered at about the same time (Hughes, 1989:23). It refers to how well scores on a new test correspond to the scores obtained on other, previously validated measures of the same skills.

According to Bachman (1990:248), information on concurrent criterion relatedness takes one of two forms: (1) examining differences in test performance among groups of individuals at different levels of language ability, or (2) examining correlations among various measures of a given ability.

Alderson et al. (1995:177) stated:

…concurrent validation involves the comparison of the test scores with some other measure for the same candidates taken at roughly the same time as the test. This other measure may be scores from a parallel version of the same test or from some other test; the candidates’ self-assessments of their language abilities; or ratings of the candidate on relevant dimensions by teachers, subject specialists or other informants… The results of the comparison are usually expressed as a correlation coefficient, ranging in value from -1.0 to +1.0. Most concurrent validity coefficients range from +.5 to +.7 - higher coefficients are possible for closely related and reliable tests, but unlikely for measures like self-assessments or teacher assessments.

Bachman (1990:290) pointed out that demonstration of criterion relatedness consists of identifying an appropriate criterion behavior, whether another language test or other observed language use, and then demonstrating that scores on the test are functionally related to this criterion. The criterion may occur nearly simultaneously with the test, in which case we can speak of concurrent relatedness (concurrent validity). He also asserted that the major consideration in collecting evidence of criterion relatedness is that of determining the appropriateness of the criterion. That is, we must ensure that the criterion is itself a valid indicator of the same abilities measured by the test in question.

1.2.4.1.4.2 Predictive validity

This refers to the correlation between scores obtained on a measure such as a proficiency test and the language performance of the students when they use the language in the real world. In predictive validation, the predictor scores are collected first and the criterion data are collected at some later point. This is appropriate for tests designed to assess a person’s future status on a criterion. One consideration in examining predictive utility is that of determining the importance of the test score as a predictor, relative to a variety of other factors (Bachman, 1990:290).

1.2.4.2 Reasons for giving more emphasis to concurrent validity

There are several reasons why the author investigated the concurrent validity of the two English tests in the NSSLE.

Firstly, information on concurrent criterion relatedness is undoubtedly the most commonly used in language testing. If we can identify groups of individuals that are at different levels of the ability in which we are interested, we can investigate the degree to which a test of this ability accurately discriminates between these groups of individuals.

Also, it is of great interest to know whether a new test is correlated with some standardized test. Concurrent validity is estimated in this study because there appears to be a validated equivalent measure that could act as a concurrent test in the present setting.

Besides, it is essential to show concurrent validity between two tests claiming to measure the same thing. Other types of validity were not investigated in this study: face validity, which is considered the least scientific; content validity, which is considered difficult to measure in the social and educational sciences; and construct validity, which is regarded as a rather abstract concept.

1.3 Statistical analysis of test results

According to Alderson et al. (1995), the following aspects should be taken into consideration when analyzing test scores: correlation, descriptive statistics, and classical item analysis.

1.3.1 Correlation

Correlation refers to the extent to which two sets of results agree with each other. A correlation of +1 is a perfect positive correlation, whereas a correlation of -1 is a perfect negative correlation. A correlation of 0 means that there is no correlation between the two sets of scores.

Correlation can be shown through a scatter plot (X-Y chart). The two variables are drawn on a graph, with X plotted on the vertical axis and Y plotted on the horizontal axis, or vice versa. Each X-Y pair of numbers is plotted as a single point, located by its values on both axes.

The information above can be summarized as in the table below:

Coefficient    Strength and direction      Interpretation
rho = +1.0     Strong, perfect positive    As X goes up, Y always goes up
rho = +0.5     Weak, positive              As X goes up, Y usually tends to go up
rho = 0        No correlation              X and Y are not correlated
rho = -0.5     Weak, negative              As X goes up, Y usually tends to go down
rho = -1.0     Strong, perfect negative    As X goes up, Y always goes down

Table 1.1: Correlations

It is not very common for there to be no correlation between the results of two language tests. Since both are intended to test aspects of the same trait, language ability, they might be expected to show at least some degree of agreement.

The following formula is used to measure correlation:

\[ \text{rho} = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} \]

rho: correlation coefficient

D: difference between ranks

N: number of students
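The rank-difference calculation can be illustrated with a short Python sketch. It is only a minimal example and not the procedure used in this study: the function names and the five pairs of scores are hypothetical, and giving tied scores the mean of the ranks they occupy is an assumption (the formula is exact only when there are no ties).

```python
def average_ranks(scores):
    """Rank scores from highest (rank 1) to lowest; tied scores share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman_rho(x_scores, y_scores):
    """rho = 1 - 6 * sum(D^2) / (N * (N^2 - 1)), where D is the rank difference."""
    if len(x_scores) != len(y_scores) or len(x_scores) < 2:
        raise ValueError("need two equally long score lists with at least two students")
    n = len(x_scores)
    rank_x = average_ranks(x_scores)
    rank_y = average_ranks(y_scores)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical scores of five students on two tests (not data from this study).
test_a = [9, 7, 8, 5, 6]
test_b = [8, 6, 9, 4, 5]
rho = spearman_rho(test_a, test_b)
print(round(rho, 2))
```

In this invented example only the two best students change places between the two tests, so the sum of squared rank differences is 2 and rho is close to +1.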

1.3.2.1 The mean

The mean is the sum of all the scores divided by the number of students (Alderson et al., 1995). To put it another way, the mean score of any test is the arithmetical average, i.e. the sum of the separate scores divided by the total number of testees (Heaton, 1988).

Heaton also stated that the mean by itself enables us to describe an individual student's score by comparing it with the average of the scores obtained by a group, but it tells us nothing at all about the highest and lowest scores or the spread of marks.

The following formula is used to calculate the mean:

\[ M = \frac{\sum X}{N} \]
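Heaton's point that the mean says nothing about the spread of marks can be seen in a small Python sketch. The function name and the two sets of marks are hypothetical and serve only as an illustration.

```python
def mean_score(scores):
    """M = (sum of X) / N."""
    if not scores:
        raise ValueError("at least one score is required")
    return sum(scores) / len(scores)


# Hypothetical marks out of 10 for two groups of five students.
group_a = [5, 5, 5, 5, 5]
group_b = [1, 3, 5, 7, 9]

mean_a = mean_score(group_a)
mean_b = mean_score(group_b)
print(mean_a, mean_b)
```

Both groups have the same mean, although every student in the first group received the same mark and the marks in the second group range from 1 to 9.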

1.3.2.4 The standard deviation

This is the square root of the average squared deviation from the mean of the students' scores. Standard deviation measures the degree to which the group of scores deviates from the mean; in other words, it shows how all the scores are spread out and thus gives a fuller description of test scores than the range, which simply describes the gap between the highest and lowest marks and ignores the information provided by all the remaining scores. Standard deviation is also helpful for providing information concerning the characteristics of different groups (Heaton, 1988). According to Harrison (1983), standard deviation shows how well the test has separated the students from each other.


The following formula is used to calculate the standard deviation:

\[ S.D. = \sqrt{\frac{\sum (X - M)^2}{N}} \]

S.D. = the standard deviation

M = the mean of the scores

∑ = the sum of

X = the score

N = the number of students
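A minimal Python sketch of this calculation follows. It takes the verbal definition given above literally (the square root of the average squared deviation from the mean), so the division by N is an assumption; some textbooks divide by N - 1 instead. The function name and the two sets of marks are hypothetical.

```python
import math


def standard_deviation(scores):
    """Square root of the average squared deviation from the mean (divisor N, assumed)."""
    n = len(scores)
    if n == 0:
        raise ValueError("at least one score is required")
    m = sum(scores) / n                               # M, the mean of the scores
    squared_deviations = [(x - m) ** 2 for x in scores]
    return math.sqrt(sum(squared_deviations) / n)


# Hypothetical marks out of 10; both groups have a mean of 5.
narrow = [4, 5, 5, 5, 6]
wide = [1, 3, 5, 7, 9]

sd_narrow = standard_deviation(narrow)
sd_wide = standard_deviation(wide)
print(round(sd_narrow, 2), round(sd_wide, 2))
```

The second group has the larger standard deviation, which reflects the fact that its marks are spread further from the shared mean.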

1.3.3 Classical item analysis

According to McNamara (2000), item analysis is a procedure which "involves the careful analysis of score patterns on each of the test items. This analysis tells us how well each item is working; that is, the contribution it is making to the overall picture of candidates' ability emerging from the test." He also stated that item analysis usually provides two kinds of information on items:

- Item facility, which helps us decide if the test items are at the right level for the target group, and

- Item discrimination, which allows us to see if the individual items are providing information on candidates' abilities consistent with that provided by the other items on the test.


1.3.3.2 Item discrimination

Item discrimination refers to the degree to which items differentiate among examinees in terms of the characteristic being measured (e.g., between high and low scorers). One method is to correlate item responses with the total test score; items with the highest correlation with the total score are retained for the final version of the test. This would be appropriate when a test measures only one attribute and internal consistency is important. The usual method for calculating item discrimination involves comparing performance on each item by different groups of test takers: those who have done well on the test overall, and those who have done relatively poorly. For example, as items get harder, we would expect those who do best on the test overall to be the ones who, in the main, get them right.

If there are a lot of items with problems of discrimination, the information coming out of the test is confusing, as it means that some items are suggesting that certain candidates are relatively better, while others are indicating that other individuals are better; no clear picture of the candidates' abilities emerges from the test. The scores, in other words, are misleading and are not reliable indicators of the underlying abilities of the candidates. Such a test will need considerable revision.
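The group-comparison method described above can be sketched in Python as follows. This is an illustrative example under stated assumptions rather than the procedure used in this study: the function name, the use of the top and bottom thirds of test takers as the two groups, and the response data are all hypothetical. For each item, the index is the number of correct answers in the upper group minus the number in the lower group, divided by the group size.

```python
def discrimination_indices(responses, groups=3):
    """Return one index per item: (correct in upper group - correct in lower group) / group size.

    responses[s][i] is 1 if student s answered item i correctly, otherwise 0.
    The upper and lower groups are the top and bottom 1/groups of students by total score.
    """
    if not responses:
        raise ValueError("at least one student is required")
    group_size = max(1, len(responses) // groups)
    ranked = sorted(responses, key=sum, reverse=True)   # best total score first
    upper = ranked[:group_size]
    lower = ranked[-group_size:]
    n_items = len(responses[0])
    return [
        (sum(s[i] for s in upper) - sum(s[i] for s in lower)) / group_size
        for i in range(n_items)
    ]


# Hypothetical answers of six students to four items (1 = correct, 0 = wrong).
responses = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]
indices = discrimination_indices(responses)
print(indices)
```

In this invented data set the second and third items separate the two groups completely, the first item does so only partly, and the fourth item has a negative index because it was answered correctly only by a student in the lower group. An item of the last kind is the sort that would call for revision.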


In Vietnam today, English is taught to school children from Grade 6 (aged 11) through to Grade 12 (aged 18). In some primary schools in large cities, English is even taught from Grade 3 (aged 8). There are two sets of English textbooks for high school students in Vietnam. One is "Sách chuẩn" (i.e. "standard textbooks"), which is aimed at students pursuing Ban cơ bản (i.e. the non-specialization program) and Ban tự nhiên (i.e. specialization in sciences). The other is "Sách nâng cao" (i.e. "advanced textbooks"), which is intended for Ban xã hội (i.e. specialization in humanities). At the end of Grade 12, students sit for the National Secondary School Leaving Examination, which consists of six subjects including an English test. Passing these tests is one of the prerequisites for students to be eligible to take the entrance examinations of colleges, universities and vocational institutions.

2.1.2 Cao Ba Quat Gia Lam High School


Cao Ba Quat Gia Lam High School is a suburban public school located at No. 57 Co Bi Street, Gia Lam District, Hanoi. The school is supervised by the Hanoi Department of Education and Training and fully follows the national curriculum as well as the directions of the Department.

The students of the school are mainly from peasant families living in rural areas of Gia Lam district such as Da Ton, Trau Qui, Phu Dong, Trung Mau and Le Chi. However, their parents' interest and investment in their English learning are increasing because English is becoming more and more important for finding jobs and developing careers.

In the school, English is taught by eight qualified and enthusiastic English teachers (one male and seven female). Both kinds of textbooks are employed. Eleven out of twelve classes in the school are using "Sách chuẩn", whereas only one class is using "Sách nâng cao".

The youngest teacher is 27 years of age and the oldest is 58. They were all adequately trained in subject matter and teaching methods. All of them graduated in ELT from the College of Foreign Languages, Vietnam National University, Hanoi. None of them, however, possesses a Master's degree in ELT. In the coming year, half of them will

2.1.3 Nguyen Gia Thieu High School

Nguyen Gia Thieu High School is a public school in Long Bien district, Hanoi, situated at 27/298 Ngoc Lam. The school is under the supervision of the Hanoi Department of Education and Training and also fully follows the national curriculum and the directions of the Department.

The student population of the school is mainly composed of students from urban families whose high living standards enable them to invest time, money and effort in English, which they see as the key to a successful career. Besides learning English in school, students also take courses in prestigious English centers such as Equest Academy, Apollo, Cleverlearn and Language Link, to name just a few. Their aim is not merely to pass the English test in the NSSLE but to gain admission to colleges and universities abroad like Singapore, New
