(LUẬN VĂN THẠC SĨ) An evaluative study on the current final achievement tests for non-English majors at Quang Ninh Teacher Training College M.A. Thesis Linguistics 60 14 01 1

INTRODUCTION

Rationale of the study

In today's interconnected world, English has emerged as a vital tool for global communication Recognizing this significance, Vietnam has prioritized English language education, leading to its widespread adoption across various sectors Consequently, English has become a mandatory subject in numerous schools and universities throughout the country.

Quang Ninh Teacher Training College (QNTTC), established in 1959, is the oldest institution for undergraduate teacher education in Quang Ninh In 1991, it was restructured from four provincial teacher training institutions: Quang Ninh Early Childhood TTI, Quang Ninh Primary TTI, Quang Ninh Education Management TTI, and Quang Ninh Low Secondary TTI Recognizing the significance of English, the college authorities have prioritized enhancing the quality of English teaching and learning.

Testing and assessment are essential components of the teaching and learning process, especially in foreign language education Language testing is widely acknowledged by professionals in the field as a critical element that goes beyond traditional teaching methods In a competitive educational landscape, testing serves as a vital motivator for learning and significantly influences an individual's career trajectory As noted by Lauwerys and Seaton in the World Yearbook of Education 1969, testing not only aids in educational research and program evaluation but also enhances our understanding of language proficiency and the learning process.

According to Nga (1997:1), tests play a significant role in shaping classroom dynamics, as they are believed to have a direct and indirect impact on both teaching methods and learning outcomes.

Testing plays a crucial role in the teaching and learning process, yet it often does not receive the attention it deserves With five years of experience teaching English to both majors and non-majors, the author has designed and administered various tests, revealing persistent issues such as misalignment between test content and curriculum, and the repetitive use of outdated tests These factors contribute to concerns regarding the validity and reliability of assessments Hughes (1990) notes that much language testing is of poor quality, often negatively impacting teaching and failing to accurately measure intended outcomes Additionally, many teachers lack formal training in educational measurement, leading to a disconnect from the testing process.

A well-designed test is essential for language learners at all proficiency levels To address existing challenges, achievement tests for non-English majors at QNTTC must be created to ensure accuracy and fairness for all students This approach aims to enhance the effectiveness of teaching and foster student satisfaction and motivation These considerations motivated my decision to conduct this study.

“An evaluative study on the current final achievement tests for non-English majors at Quang Ninh Teacher Training College”

Aims of the study

The study aims at evaluating the current final achievement tests at QNTTC To achieve this aim, the following objectives are established:

1 To evaluate the current final achievement tests for non-English majors from perspectives of the teachers and non-English majors at QNTTC

2 To investigate the alignment of the current final achievement tests at QNTTC to The Common European Framework of Reference for Languages: Learning, Teaching, Assessment (CEFR).

Research questions

In order to achieve the above aims of the study, the following questions will be addressed:

1 How do English teachers and non-English majors at QNTTC evaluate the current final achievement tests for non -English majors at QNTTC?

2 How does the current test align to the CEFR?

Scope of the study

This study, titled "An Evaluative Study on the Current Final Achievement Tests for Non-English Majors at Quang Ninh Teacher Training College," aims to examine key issues related to the effectiveness and relevance of the final achievement tests administered to non-English major students at the institution.

- This study is only aimed at evaluating the existing testing situations at QNTTC from two stakeholders, the teachers and the students

- This study is limited to evaluate the final achievement tests for non-English majors

- This study focuses on evaluating the constructs of the final achievement tests at QNTTC and the tests based on the CEFR (PET)

This study conducted at QNTTC presents specific findings that are not meant to be generalized to other educational settings The results are applicable solely to the participants involved in this research and may not extend beyond this particular context.

Significance of the study

The thesis findings support the enhancement of testing methods for non-English majors at QNTTC, providing valuable insights for both educators and students through reflective practices It is anticipated that this research will contribute significantly to improving the overall testing environment at QNTTC, specifically tailored for non-English major students.

Methodology

The above-given aims are to be achieved by means of:

A survey conducted with 30 non-English majors at QNTTC aimed to gather feedback on the current final achievement tests for non-English students The study sought to evaluate their experiences and obtain suggestions for enhancing the testing environment and language assessments at QNTTC.

A survey conducted with 10 teachers from the English Faculty of QNTTC gathered their feedback on the current final achievement tests for non-English majors, along with their recommendations for enhancing these assessments.

(3) Analysis of the contents and constructs of the current final achievement tests at QNTTC to find out the alignment of these tests to the CEFR

In addition to surveys and analyses, the study incorporated information gathered through formal and informal discussions with students and teachers, along with critical reading It utilized a mixed-methods approach, combining qualitative and quantitative methodologies, which included cross-tabulation data and statistical analysis of survey results Furthermore, the study assessed the difficulty level of the current final achievement tests at QNTTC in alignment with the Common European Framework of Reference for Languages (CEFR).

Outline of the thesis

The author divided this study into five chapters:

- Chapter 1: Introduction, this chapter provides the author’ reasons for choosing the topic, aims, research questions, scope, significance, methodology and outline of the study

Chapter 2: Literature Review delves into the theoretical foundations of language testing, exploring essential concepts such as the purpose and significance of testing in language acquisition It categorizes various types of tests based on their intended objectives and outlines the criteria for an effective assessment Additionally, this chapter examines the Common European Framework of Reference for Languages (CEFR) and its relevance for non-English majors, providing a comprehensive overview of the landscape of language testing.

Chapter 3: Methodology outlines the research approach, providing an in-depth analysis of the English teaching and learning environment at QNTTC It includes a concise overview of materials utilized for non-English majors and examines the existing testing conditions at the institution Additionally, this chapter identifies the informants involved in the study and details the instruments and procedures used for data collection and analysis.

Chapter 4 presents the key findings of the thesis, focusing on the current English teaching and learning environment at QNTTC It examines the existing assessment methods and evaluates their alignment with the Common European Framework of Reference for Languages (CEFR), highlighting the implications for effective language instruction and assessment practices.

- Chapter 5: Conclusion, the author sheds the mantle of reviewing the study and suggesting further research.

LITERATURE REVIEW

Basic concepts of testing/ Language testing

Language testing plays a crucial role in the educational landscape, acknowledged by professionals across the field These assessments serve as essential tools that offer insights into language instruction, demonstrating the effectiveness of teaching methods and learning outcomes Additionally, language tests equip both teachers and students with valuable information to guide their decisions in the learning process.

Testing is an essential component of language teaching and a key aspect of effective methodology Various definitions of testing have emerged from different perspectives, highlighting its significance in the educational process.

A test is a measuring device used to compare individuals within the same group (Allen, 1974) Carroll (1968) describes a psychological and educational test as a procedure that elicits behavior to infer characteristics of an individual, while Brown (1971) defines it as a systematic method for measuring behavior According to Peny Urr (1996), the primary purpose of a test is to convey how well testees know or can perform a task Moore (1992) emphasizes that evaluation is crucial for teachers as it provides feedback on student learning and informs future teaching strategies However, Brown (1994) notes that tests often induce anxiety in learners, who fear disappointing results Read (1983) views language tests as samples of linguistic performance, and Nga (1999) defines tests as sets of items presented to students under specific conditions Broughton (1990) highlights the complexity of the term "test," which can refer to a carefully prepared measuring instrument, a quick classroom activity for ongoing assessment, or an item within a larger test Assessment encompasses documenting knowledge and skills to facilitate improvement, utilizing methods like observations and interviews alongside tests Harrison (1983) points out that tests, while necessary, can be seen as unpleasant impositions, yet they serve as valuable tools for measuring learners' abilities in educational settings.

Testing serves as a crucial method for evaluating students' language knowledge and skills While the definition of "testing" varies among researchers, it generally involves students responding to specific questions or assessments that target particular learning aspects This process is often viewed more broadly as an assessment journey, encompassing various stages including preparation, data collection, and evaluation.

The role of testing in teaching and learning

Historically, language testing and teaching were often viewed as distinct processes However, numerous applied linguists and professional designers have emphasized the crucial role that language testing plays in informing both language instruction and learning outcomes.

Heaton (1988:5) emphasizes the intricate relationship between teaching and testing, noting that they are so closely intertwined that it is challenging to separate them He argues that both elements are interdependent, making it nearly impossible to engage in one without considering the other.

Heaton (1988:5) highlights that tests can serve two main purposes: to enhance learning and motivate students or to evaluate their language performance When tests are designed to support teaching, they reinforce the learning process Conversely, when assessments drive instruction, teaching strategies are often aligned primarily with the requirements of the tests.

Testing can significantly impact teaching in both positive and negative ways According to Hughes (1989), the concept of "backwash" can either be harmful or beneficial When the test content aligns with the teaching material and course methods, it can enhance the educational process Conversely, misalignment between testing and teaching can lead to detrimental effects on learning outcomes.

Testing and teaching activities are inherently interconnected and aligned with the course objectives The impact of testing on teaching can be both positive and negative, highlighting the importance of integrating assessment effectively within educational programs.

Types of tests according to test purpose

Language tests are designed for various purposes, leading to the existence of multiple types of tests Each type serves distinct objectives, and the information gathered from these tests informs different decision-making processes Here, we provide a concise overview of several types of language tests based on their intended purposes.

Hughes (1990:13) states: “Diagnostic tests are used to identify students’ strengths and weaknesses They are intended primarily to ascertain what further teaching is necessary”

According to Brown (1994), diagnostic tests are designed to identify individual strengths and weaknesses, enabling educators to address deficiencies in instruction before they become critical.

According to Brown (1994b:259), diagnostic tests are specifically crafted to assess distinct elements of a language Similarly, Harrison (1983b) notes that these tests are often administered at the conclusion of a unit in a course-book, following lessons aimed at teaching specific language points.

Diagnostic tests aim to identify the strengths and weaknesses of test-takers in language proficiency They provide insights into specific issues and suggest appropriate interventions to enhance performance by leveraging strengths and addressing weaknesses.

Placement tests are designed to assess students' abilities and assign them to appropriate levels within a teaching program, ensuring they start courses alongside peers with similar skills (Hughes, 1990) These tests are crucial for quickly determining the right class placement, allowing instruction to commence promptly (Harrison, 1983).

Proficiency tests, as noted by Brown (1995), aim to assess the extent of language learning and retention among students, emphasizing overall language ability rather than specific educational programs or materials Similarly, McNamara (2000) highlights that these tests focus on future language use, independent of the teaching process that preceded them.

Proficiency tests, as defined by Hughes (1990:9), assess individuals' language abilities independent of any prior training or specific language course content Instead, these tests focus on the essential skills and competencies required for test takers to achieve their future goals in the language.

Proficiency tests are essential tools for both teachers and learners, as noted by experts like Carroll and Hall (1985), Harrison (1983a), and Henning (1987), to determine if students are prepared for specific courses or require additional training Popular assessments such as TOEFL and IELTS are commonly used to evaluate students’ proficiency for studying in English-speaking countries In Vietnam, proficiency tests have evolved from levels A, B, and C to align with the CEFR framework, now categorized as A1, A2, B1, B2, C1, and C2, reflecting the country's commitment to enhancing English language competence.

Achievement tests, as defined by Hughes (1990:10), are specifically designed to assess the success of students or courses in meeting educational objectives, unlike proficiency tests These assessments are widely utilized across all educational levels and play a crucial role in evaluating the language knowledge and skills that students have developed throughout the English teaching and learning process.

Achievement tests are integral to the instructional process, as they gather evidence of student progress during or at the conclusion of a course According to McNamara (2000), these tests evaluate whether learning goals have been met and should align with the teaching they assess While they may focus on specific aspects of knowledge, such as grammar or vocabulary, their primary purpose is to reflect the content taught in the classroom Brown (1994) echoes this perspective, emphasizing that achievement tests are directly connected to lessons, units, or the overall curriculum.

Achievement tests are divided into two basic types according to the time of administration They are namely progress achievement tests and final achievement tests

Progress achievement tests, whether criterion-referenced or objective-referenced, are designed to assess learners' advancement towards specific course objectives These tests are closely aligned with the objectives of the course, ensuring a clear pathway towards the final achievement test Typically administered to evaluate the degree of mastery students have attained from classroom instruction, these assessments play a crucial role in measuring educational progress.

Achievement tests enable teachers to identify and diagnose areas where students struggle, allowing for targeted remedial action Additionally, these tests offer students an opportunity to enhance their learning and confidently demonstrate their proficiency in the target language This process also serves as a preparatory step, helping students become familiar with the testing format.

Final achievement tests are conducted at the conclusion of a course and can be administered by education ministries, official examining boards, or teaching institution members These tests must align with the course content, though there is ongoing debate among language testers regarding the specifics of this relationship They provide teachers with valuable insights into the effectiveness of their instruction and help identify areas where students may struggle.

Hughes (1990) categorizes tests into two types based on their approaches: the syllabus-content test and the syllabus-objective test The syllabus-content test aligns its content with a detailed course syllabus or relevant materials, while the syllabus-objective test assesses students' abilities to meet course objectives Although the latter can measure achievement, it may hinder teaching effectiveness by focusing on testing issues rather than actual student accomplishments.

Criteria of a good test

When designing language tests, test creators must consider several critical questions: How can we effectively assess all language skills? Who is the intended audience, and is the test appropriate for them? What specific skills or knowledge is the test designed to evaluate? How can we determine the quality and effectiveness of the test? Additionally, does the test accurately measure the desired proficiency level? These considerations are essential to ensure that the test has a positive impact on teaching and learning outcomes.

To create an effective test, educators must consider several key factors, including the test's purpose, course content, and students' backgrounds Additionally, high-quality tests should exhibit essential characteristics such as validity, reliability, practicality, and discrimination, as emphasized by prominent scholars in the field of testing, including Valette.

(1977), Harrison (1983a), Carroll and Hall (1985), Henning (1987), Heaton (1988), Hughes (1990) and Brown (1994a) all good tests possess all these four characteristics These characteristics will be critically reviewed bellow

Validity is the most crucial characteristic of a test, as even a reliable test holds little value if it is not valid According to Carmen (1995), a test is considered valid if it measures the intended content accurately Hughes (1989) supports this definition, stating that a test has validity when it accurately assesses what it is meant to measure Aik (1983:2) further emphasizes that a test is valid if it aligns with the aims and purposes of the learning areas it addresses Therefore, the validity of a test is inherently linked to the objectives outlined in the course syllabus.

Validity encompasses various types, including face validity, content validity, criterion-related validity, construct validity, empirical validity, and predictive validity Among these, content validity, face validity, and criterion-related validity are considered the most essential for ensuring the accuracy and relevance of assessments.

Content validity is the alignment between a test's content and the materials it aims to assess While it's impossible for a test to encompass all elements of the subject matter, it should provide a reasonable and representative sample According to Read (1983), content validity is crucial for classroom testing, as it ensures that the test reflects the syllabus's content and objectives Anastasi (1982) defines content validity as a systematic evaluation of test content to confirm it adequately represents the behavior domain being measured, offering essential guidelines for establishing this validity.

- The behavior domain to be tested must be systematically analyzed to make certain that major aspects are covered by the test items with correct proportions;

- The domain under consideration should be fully described in advance, rather than being defined after the test has been prepared;

- The content validity depends on the relevance of the individual test relevance of item content

From the above concepts, it is obvious that the contents of a tests are main concern in achieving its content validity

Face validity refers to how well a test's appearance aligns with its intended measurement, as highlighted by Anastasi (1982) It is important to note that face validity is not a technical measure of validity; rather, it reflects perceptions from test-takers, administrative personnel, and untrained observers According to Hughes (1990), a test possesses face validity when it appears to measure what it claims to measure Therefore, it is essential for tests to be aligned with course content and effective teaching methodologies.

Criterion-related validity is the degree to which test results align with an external criterion that has established validity Unlike face validity and content validity, which are assessed subjectively, criterion-related validity is determined through objective measures, ensuring a reliable comparison between the test and the established standard.

In short, validity is the “must” for testers to take into consideration when they construct a language test

Reliability is a crucial characteristic of all tests, especially language assessments, as an unreliable test holds little value It plays a significant role in proficiency tests used for public achievement and classroom evaluations Teachers must understand the various factors influencing reliability, as many mistakenly view tests as flawless measuring tools In reality, even the most well-designed tests are inherently imprecise in measuring skills.

Reliability in assessment is primarily concerned with the consistency of candidate performance and scoring Factors influencing this consistency include the number of questions, test administration, and instructions provided As defined by Moore (1992), reliability reflects the dependability of a measurement device in assessing specific behaviors or traits Similarly, Bachman (1990) emphasizes that reliability pertains to the quality of test scores For example, a multiple-choice test may produce varying scores across different administrations, indicating low reliability Additionally, the conditions under which a test is administered—such as timing, environment, and observation—significantly impact the results, highlighting the importance of standardized testing conditions for accurate assessment.

It is important to understand that a test can be reliable without being valid However, relying solely on reliability is insufficient if the test fails to accurately measure its intended purpose.

Separating a test's validity and reliability from its practicality is not advisable for test constructors Practicality encompasses the resources available for test administration and scoring As Harrison (1983) notes, "a valid and reliable test is of little use if it does not prove to be a practical one," emphasizing that financial constraints, time limitations, and ease of administration are crucial factors A test becomes impractical if it is excessively costly or requires an unreasonable amount of time to develop Brown (1994) further highlights that a test is impractical if it is prohibitively expensive or takes an excessive duration, such as ten hours, to complete.

Bachman and Palmer (1996) emphasize the importance of aligning the resources needed for test design, development, and usage with the resources available for these activities They also highlight the concept of practicality, which pertains to how the test will be implemented in specific contexts and whether it will ultimately be utilized.

In conclusion, a test has practicality if it does not involve much time or money in constructing, implementing and scoring it

A crucial aspect of any assessment is its ability to differentiate among candidates and accurately reflect their performance variations Assessments typically involve comparisons, either between students (norm-referenced) or an individual’s current performance versus their past achievements (Harrison, 1983b) This principle applies to both teacher-made and standardized tests An effective language test must distinguish between a student and their peers; if a test is excessively easy or difficult, it fails to serve its purpose of discrimination As noted by Heaton (1988:165), a score of 70% lacks meaning without the context of other scores, and tests where most candidates achieve 70% do not effectively differentiate between students.

To effectively incorporate a discrimination feature in a test, it is essential to include a range of items that span from extremely easy to extremely difficult This scale should encompass categories such as extremely easy, very easy, easy, fairly easy, below average difficulty, average difficulty, above average difficulty, fairly difficult, difficult, very difficult, and extremely difficult items.

Difficulty level is a crucial aspect of test items, as it reflects how easy or difficult they are perceived by students (Hai, 1999) Test items that are too easy fail to differentiate among students, providing no insight into their varying abilities Conversely, Henning (1987) emphasizes that accurately determining an item's difficulty is perhaps its most significant characteristic.

The CEFR

The Common European Framework of Reference (CEFR), established by the Council of Europe in 2001, outlines the proficiency levels of language learners across four skills: speaking, reading, listening, and writing It categorizes language abilities into six distinct reference levels, providing a comprehensive guide for assessing and enhancing language learning.

In November 2001, the European Union Council recommended the use of the Common European Framework of Reference for Languages (CEFR) to establish language proficiency validation systems The six CEFR levels—A1, A2, B1, B2, C1, and C2—are increasingly recognized as the European standard for assessing language skills.

The CEFR categorizes language learners into three main divisions: Basic User, Independent User, and Proficient User, which are further divided into six specific levels Each level outlines the expected abilities of learners in reading, listening, speaking, and writing.

Target level for the non-English majors

As per Decision 1400/QĐ-TTg issued on September 30, 2008, college and university students majoring in fields other than English are required to achieve a KNLNN level of 3 (B1) in English for graduation According to the CEFR assessment framework established in 2001, students at the B1 level demonstrate a functional command of English, enabling them to communicate effectively in various contexts.

 Can understand the main points of clear standard input on familiar matters regularly encountered in work, school, leisure, etc

 Can understand the main point of many radio or TV programmes on current affairs or topics of personal or professional interest when the delivery is relatively slow and clear

 Can understand texts that consist mainly of high frequency everyday or job- related language

 Can understand the description of events, feelings and wishes in personal letters

 Can deal with most situations likely to arise while traveling in an area where the language is spoken

 Can produce simple connected text on topics that are familiar or of personal interest

 Can enter unprepared into conversation on topics that are familiar, of personal interest or pertinent to everyday life (e.g family, hobbies, work, travel and current events)

 Can connect phrases in a simple way in order to describe experiences and events, my dreams, hopes and ambitions

 Can briefly give reasons and explanations for opinions and plans I can narrate a story or relate the plot of a book or film and describe my reactions

 Can describe experiences and events, dreams, hopes and ambitions and

 Can write personal letters describing experiences and impressions

 Can write simple connected text on topics which are familiar or of personal interest

The 6 levels of the CEFR aligned with international English tests can be summarized as follows:

Pass Pass Table 2.1: Common European Framework of Reference (CEFR)

Review of related studies

Establishing the relationship between a test product and the Common European Framework of Reference for Languages (CEFR) is complex due to the CEFR's intentional underspecification, which allows it to function as a flexible framework (Davidson & Fulcher, 2007; Milanovic, 2009; Weir, 2005) This relationship is not established through a single assessment; instead, it requires the ongoing accumulation of evidence to demonstrate that the quality and standards of the test are consistently maintained over time.

Limited research has been conducted on the alignment of various tests with the Common European Framework of Reference (CEFR) In the 2015 research memorandum titled “The Association between TOEFL iBT Test Scores and the Common European Framework of Reference (CEFR) levels” by Spiros Papagoergiou, Richard J Tannenbaum, Brent Bridgeman, and Yeonsuck Cho, the authors highlighted the content alignment of the TOEFL iBT with the CEFR framework.

Summary of Chapter 2

Chapter 2 has briefly discussed the basic concepts of language testing The Chapter has been concerned with the issues relating to different test types according to the test purpose Besides this, the author has introduced the characteristics of a good test Finally, the CEFR, the target level for the non-English majors and the related studies to the CEFR have also been introduced.

METHODOLOGY

Setting of the study

3.1.1 English teaching and learning of non-English majors at QNTTC

QNTTC is considered the oldest institution in providing undergraduate teacher education in Quang Ninh English has been taught here for both English majors and non-English majors since 1982

During the non-English majors’ course of learning English (three semesters with

Students are required to complete 105 lesson periods, equivalent to 10 credits, and pass three tests corresponding to each semester to achieve a B1 level upon course completion To support this curriculum, various textbooks have been utilized, starting with Head Way, followed by New Head Way, Lifelines, and currently, New Cutting Edge, Pre-Intermediate Additionally, improvements have been made in the quality and quantity of the teaching staff to enhance the overall learning experience.

3.1.2 Brief description of the materials used for non-English majors at QNTTC

As mentioned earlier, to get Level B1 they have to take an English course consisting of 3 semesters, the course book “New Cutting Edge - Elementary” and

“New Cutting Edge - Pre - Intermediate” are now chosen as the main ones According to the authors (Sarah Cunningham, Peter Moor and Jane Comyns Carr),

"New Cutting Edge - Elementary" and "New Cutting Edge - Pre-Intermediate" guide students from A1 to A2 and A2 to B1 levels of the CEFR, respectively Utilizing a task-based learning approach, these books aim to help students effectively use their language skills to achieve specific communication goals They feature a comprehensive syllabus that includes in-depth grammar, vocabulary, and skills work, alongside systematic vocabulary building focused on high-frequency, practical words and phrases With clearly-structured tasks designed to enhance fluency and confidence, these resources serve as a vital stepping stone for students transitioning from General English to the academic and professional language demands they will encounter in college and their future careers.

The key points of 8 modules (New Cutting Edge – Pre - Intermediate) for third semester are described as in Appendix 1

3.1.3 The testing practice at QNTTC

English is a mandatory subject throughout the educational curriculum, making test activities a significant focus During examinations, students are arranged in alphabetical order and provided with unique test papers to minimize the chances of copying, ensuring that each student completes the test independently.

Objective tests, including multiple choice, error correction, sentence construction, and Q&A formats, are employed to enhance reliability and discrimination among test takers Overall, QNTTC English tests are deemed fair and beneficial for students To simplify the grading process for teachers, separate answer sheets are provided, allowing test takers to write their answers while ensuring anonymity by removing their names from the sheets, promoting equality among candidates.

Non-English majors at QNTTC are required to complete three assessments throughout their studies The first test, consisting of 50 items, is administered at the end of the first semester, after students have covered 13 modules from the New Cutting Edge – Elementary book, and is timed at 90 minutes The second assessment includes 40 items and is conducted at the end of the second semester, following the completion of 2 additional modules from the New Cutting Edge – Elementary and 7 modules from the New Cutting Edge – Pre-Intermediate, with a duration of 60 minutes Finally, after completing the last 8 modules, students will take the third test to assess their overall progress.

Cutting Edge – Pre-Intermediate, students have to take the final achievement test (Test 3) with 40 items in 60 minutes.

Informants

The study's informants included students and teachers from QNTTC, with students chosen as they were the primary learners who completed all three final achievement tests The teachers selected for the study possessed a minimum of three years of teaching experience and had previously developed final achievement tests for non-English major students.

The informants of the study are the non-English majors in the college, ranging from

A study of students aged 19 to 21 reveals that 77% have been learning English for 7 to 10 years, while 20% have studied for over a decade, with all participants originating from urban areas This urban background has provided them with better access to English learning opportunities, such as part-time courses and interactions with English speakers However, as non-English majors, many students lack motivation and view English as less significant compared to other subjects, primarily studying it to pass exams While 40% acknowledge the importance of English, only 17% understand its value as a national language for communication in daily life Conversely, 37% deem English unimportant, citing a lack of necessity for their future careers.

At QNTTC, there are 12 English teachers, with 10 participating in a recent study A significant majority, three-fourths, hold or are pursuing MA degrees The teachers' ages range from 27 to 50 years, with half having over a decade of teaching experience Only 10% have taught for 1 to 3 years, while 20% have 4 to 6 years and another 20% have 7 to 10 years of experience in teaching English.

Data collection instruments

This research is conducted by using surveys, the current final achievement tests for non-English majors at QNTTC, the CEFR and some softwares such as Pearson, Estim

To assess the English testing situation at QNTTC, researchers utilized survey questionnaires as the primary data collection tool, enabling efficient information gathering from a large respondent pool Additionally, the study examined the alignment of final achievement tests for non-English majors by analyzing the constructs and content of these assessments to enhance the research's reliability.

Two sets of survey questionnaires are conducted with the assistance of 10 teachers of the English Faculty and 30 non-English majors

The first questionnaire with 12 questions was administered to 30 second-year students of non-English majors at QNTTC:

- The first question was written to find out the students’ time of learning English

- The second question is to investigate whether the students think English is important to their future career or not

- The third question aims at finding out how well they complete a test

- The fourth question is to examine the difficulty/difficulties the students have when they do a test

- Questions 5 - 7 aim at eliciting their interest in some test items (reading and answering questions, making up sentences and correcting the mistakes)

- Questions 8, 9 and 11 were written to investigate their attitudes toward the current test

- Question 10 was used to get the students’ opinions about correcting their work right after they have done the test

- Question 12 was written to find out other opinions of the students about the current test

The second survey questionnaire was administered to 10 teachers of the English Faculty at QNTTC including 12 questions:

- Question 1 is to investigate the teachers’ teaching experience

- Question 2 was written to find out the necessity of an English achievement test at the end of each semester

- Questions 3 - 5 aim at investigating the teachers’ test making as well as their reasons for their answers

Questions 6 to 8 aimed to explore teachers' perspectives on the content, marking scale, and time allowance of the current English achievement test for non-English majors, along with their justifications for these opinions.

- Question 9 - 11 were used to get their opinions and reasons for changing the construction of the current test (the content, the marking scale, the time allowance)

- Question 12 is to investigate the teachers’ further comments and suggestions towards the improvement of the current test

These two sets of questionnaire can be seen in Appendixes 2 and 3

For this thesis, two final achievement tests were randomly selected, ensuring they were the most recent and comparable to other final tests for non-English majors at QNTTC Each test comprises four sections.

- Item format: Multiple choice questions

- Scores: 0.25 points for each item

Section 2: Grammar and vocabulary (5 points)

Part 1: Choose the best answer

Part 2: Identify the mistakes and correct

Use the set of words and phrases to make meaningful sentences

The chosen tests can be seen in Appendixes 4 and 5

 The softwares Englishprofile, Pearson and Estim

The evaluation process utilized the software tools Englishprofile, Pearson, and Estim to enhance reliability These tools were instrumental in assessing vocabulary, grammar, reading, and writing components to determine their alignment with the Common European Framework of Reference for Languages (CEFR) The author examined vocabulary and grammar items within Englishprofile to identify their respective levels Additionally, Pearson and Estim were employed to analyze the reading texts and evaluate the difficulty levels of both the texts and the associated questions.

The author, in collaboration with fellow English teachers, conducted an analysis of the test items to evaluate their alignment with the skills outlined in the CEFR descriptors The findings of this investigation are detailed in Appendix 6.

The alignment framework

The alignment between the current final achievement tests at QNTTC and the CEFR (tests at Level B1 – PET) can be evaluated in terms of:

- The constructs of the two tests

- The contents of the two tests

- The length of the two tests

- The degree of difficulty of the final achievement tests at QNTTC based on the CEFR.

Data collection and data analysis procedure

To accomplish the purpose of the study, the following procedures were pursued:

A survey was conducted involving two sets of questionnaires distributed to 10 English teachers and 30 non-English majors at QNTTC The teachers completed the questionnaire during a break in their weekly English group meeting, while the students answered theirs at the end of their class, shortly after taking their final achievement tests.

Before administering the instruments, participants were informed about the study's purposes and importance, and they received oral instructions on completing the surveys After 30 minutes, the completed surveys were collected, and the data from both surveys were imported into Excel for analysis The author applied descriptive and inferential statistics, utilizing frequencies and sorting to determine the percentage emphasis on each item for accurate and effective data interpretation.

To assess the alignment of final achievement tests with the CEFR, the evaluation focused on the constructs, content, and length of the tests, ensuring reliability and validity through expert judgments Two experienced English teachers assisted in this process, utilizing their expertise along with software tools such as Englishprofile, Pearson, and Estim They evaluated each test item, including vocabulary and grammar, and assessed the difficulty level of the reading texts The alignment of vocabulary and grammar was specifically analyzed using the English Profile, while Pearson and Estim helped determine how well the texts of the final achievement tests corresponded with CEFR standards.

Summary of chapter 3

This chapter outlines the study's context regarding English teaching and learning for non-English majors at QNTTC, including a brief overview of the materials utilized and the existing testing conditions It also details the informants involved, the data collection tools employed, and the alignment framework established Additionally, the procedures for data collection and analysis are introduced.

FINDINGS AND DISCUSSION

The current tests at QNTTC

4.1.1 Students’ comments on the existing tests

Figure 1 illustrates the test performance of students, revealing that only 7% achieved scores between 80% and 100% In contrast, 40% scored between 20% and 50%, while 17% managed to score between 0% and 20%, despite having studied English for several years The accompanying chart provides a clearer depiction of these results.

Figure 4.1: Students’ accomplishment of a test

Many students struggle to achieve perfect scores on tests due to various challenges Specifically, 17% of students find phonetics challenging, while 43% encounter difficulties with different types of test items Additionally, issues with vocabulary affect 33% of students, and 37% face problems with grammatical structures.

Figure 4.2: Students’ difficulty/ difficulties when doing the test

A recent survey reveals that 67% of students prefer reading and answering questions on tests, as they find it easier to score marks from information directly available in the texts In contrast, only 20% show interest in creating sentences from prompts, while 33% enjoy correcting mistakes Many students perceive sentence construction and error correction as challenging tasks that demand a strong grasp of English grammar and vocabulary.

Figure 4.3: Students’ interests in test items

4.1.2 Students’ opinions towards the improvement of the tests

While only 30% of students express a desire to change the current test format, a significant portion—43% and 87%—advocate for the inclusion of time limits and a marking scale for each section to better manage their performance Conversely, many students feel that adding time constraints is unnecessary, as they prefer to avoid the pressure associated with strict timing during the test.

4.1.3 Teachers’ comments on the existing tests

At QNTTC English Faculty, all teachers have a minimum of three years of experience teaching non-English majors and are responsible for creating their own final achievement tests None of the teachers rely solely on complete tests from test books, as these often do not meet their students' needs About 20% of the teachers prefer to design their own tests to align with their specific objectives, while 60% adapt existing tests from books to save time and ensure suitability for their students The remaining 20% utilize various sources to develop tests that are both reliable and engaging.

Figure 4.5: Teachers’ making the English achievement test

Most teachers have created final achievement tests, but many do not consider them reasonable Half of the teachers rated the tests as not very reasonable, with 10% believing they are entirely unreasonable, primarily because the tests lack requirements for essay or letter writing, which are essential for CEFR-based assessments Additionally, the existing tests focus solely on reading and writing skills Only 40% of teachers felt that non-English majors would not need English in their future careers, suggesting that the current tests may be deemed reasonable by some.

Figure 4.6: Teachers’ attitudes toward the current test

Teachers expressed varied opinions on the current test, with nearly all agreeing that the marking scale was reasonable, while only 10% deemed it unreasonable due to discrepancies in item difficulty Regarding time allowance, no teachers rated it as very reasonable or unreasonable; instead, 70% considered it reasonable, while 30% found it not very reasonable.

4.1.4 Teachers’ opinions towards the improvement of the tests

A recent survey revealed varied opinions on the current test structure among teachers While 70% expressed a desire to change the test content to better reflect the comprehensive aspects of English learning, 30% believed the existing content was adequate Regarding the marking scale, 70% agreed it should be adjusted based on test format and student proficiency, while 30% opposed changes, citing that students typically perform well on multiple-choice questions Additionally, opinions on the 60-minute time allowance were split, with 40% satisfied with the duration for 40 items, including 20 multiple-choice questions, whereas 60% felt that the time was excessive, potentially leading to distractions or opportunities for copying.

The alignment between the current tests at QNTTC and the tests according to

4.2.1 In terms of their constructs

The final achievement tests at QNTTC emphasize phonetics, vocabulary, grammar, reading comprehension, and writing skills These assessments include multiple-choice questions that address phonetics, grammar, vocabulary, and reading comprehension, ensuring that the topics and content are familiar to the students.

The test comprises four sections with an uneven distribution of items The phonetics section includes five test items, where two assess word stress and three focus on vowel sounds In contrast, the grammar and vocabulary section features 15 multiple-choice questions that cover the material learned throughout the course Additionally, the test evaluates reading and writing skills; however, the writing section is limited to sentence building, while the reading section requires students to select the best answers and respond to questions.

In contrast, the tests according to the CEFR usually consist of all four skills and sometime structure and written expression and the Cambridge PET test consists of

Writing skills encompass the ability to craft essays, letters, cards, and stories, necessitating a comprehensive mastery of vocabulary, grammatical structures, and formatting knowledge.

The vocabulary used in the final achievement tests for non-English majors at QNTTC is primarily sourced from the course book New Cutting Edge – Pre Intermediate The tests predominantly feature words from levels A1 to B1, with only one word classified at level B2, which does not significantly impact the students' overall results This indicates that students have effectively learned the necessary vocabulary at lower levels, enabling them to achieve the target level of B1.

Questions Answers/ Degree of difficulty

Table 4.1: Analysis of the degree of difficulty of the test items in phonetics in Test number 1

Questions Answers/ Degree of difficulty

Table 4.2: Analysis of the degree of difficulty of the test items in phonetics in Test number 2

Table 4.3 illustrates the difficulty levels of the test items in Test number 1 according to the CEFR framework, revealing that 14 out of 20 responses fall within the B1 level, while 6 responses are categorized as A1 or A2.

Questions Answers/ Degree of difficulty Notes

6 Ha Long Bay is located in the northeastern part of Vietnam B1

7 H He used to be a good student in my class when he was small B1

8 B Before that time, he worked for that company for 3 years A1

9 It has rained since 3 o’clock B1

10 I I have a lot of friends in Hanoi A2

11 We can’t go to the movies because I have to write an essay A2

12 Turn right at the traffic lights A1

13 S She probably won’t attend this course B1

14 I I would study harder if I were you B1

15 Q Quang Ninh’s natural land consists of 17 types of soil B1

16 Up to now, I have worked for that company for 10 years A1

17 When he arrived, we were having dinner B1

18 I wonder if she still remembers me A1

19 Do you know where Quang Trung street is? B1

20 If you’ve got a temperature, take an aspirin B1

21 If she were here now, I would tell her the truth B1

22 What were you doing when I phoned you at 10 p.m last night? B1

23 I don’t have much free time during the week B1

24 If we drive quickly, we will probably get home before it gets dark B1

25 This book is very interesting I’ve read it many times B1

Table 4.3: Analysis of the degree of difficulty of the test items in grammar and vocabulary in Test number 1

Table 4.4 is about the degree of difficulty of the test items in Test number 2 in accordance with the CEFR From this table, we can see that 15 out of 20 answers

Questions Answers/ Degree of difficulty Notes

6 Sh She usually sits in the middle of the class B1

7 Q Quang Ninh Teacher Training College is located in Nam Khe district A2

8 The boy we met yesterday is one of - my brothers A2

9 At 7 a.m last Sunday, when I came to her house, she was sleeping B1

10 He spends a lot of time playing computer games A2

11 We definitely won’t leave you alone B1

12 Tuan Chau Island belongs to Ha Long city B1

13 We haven’t met each other since we left school B1

14 She has got a sister but she has no brothers B1

15 Y You will pass the exams if you study harder B1

16 I’ll finish this report soon if necessary B1

17 Many people didn’t use to wash very much because they thought baths were dangerous B1

18 At that time, he was a professional footballer B1

19 It’s on the other side of the road A2

20 D Do you know where he lives? B1

21 There used to be a hospital here B1

22 At 3 p.m yesterday? I was doing my homework B1

23 I don’t have enough money to go on holiday is this year so I’ll have to stay at home A2

24 P Promise to tell me the news as soon as you hear anything B1

25 I I’ve just seen a friend of mine on TV B1

Table 4.4: Analysis of the degree of difficulty of the test items in grammar and vocabulary in Test number 2

Non-English majors are required to achieve a B1 level, and to assess whether the reading texts meet this standard, the author utilized Estim and the Pearson Reading Maturity Metric (RMM) for analysis The findings, presented in Tables 4.5 and 4.6, indicate that while all four texts from the two tests fall within the A2 to B1 range according to Estim indicators, the difficulty levels of these texts vary significantly Additional readability metrics, including Flesh-Kincaid Grade Level, Coleman-Liau Readability, Dale-Chall Readability, and Automated Readability, further illustrate these differences in text complexity.

 The indicator of the first text of Test number 1 is 27 - the nearly lowest of the range from A2 to B1

 The indicator of the second text of Test number 1 is 32 (that is the highest of this range)

 The indicator of the first text of Test number 2 is 25 - the lowest of the range from A2 to B1

 The indicator of the second text of Test number 2 is 29 (that is quite high in this range)

It can be seen clearly when looking at the other indicator – Flesh-Kincaid Grade Level that the texts of Test number 1 are more difficult than these of Test number 2

Section Questions Length Estim indicators

Degree of difficulty based on other criteria

Read and choose the best answers

26 – 30 89 words (8 sentences) 11.13 words per sentence

Table 4.5: Analysis of the degree of difficulty of the reading texts in Test number 1

Section Questions Length Estim indicator

Degree of difficulty based on other criteria

Read and choose the best answers

Table 4.6: Analysis of the degree of difficulty of the reading texts in Test number 2

An analysis of the reading text question items from two tests reveals that they are largely similar, with most questions categorized at the A2 level However, a closer examination indicates that the questions in the first test are slightly more challenging than those in the second test.

Types of question Level Types of question Level

32 specific information with some inference from the information in the text

A2+ specific information, with some inferencing

33 Inference information B1- Simple inference information

34 specific information, it might be with some inferencing

Table 4.7: The analysis of question items of the reading texts of Tests number 1&2

Table 4.8 illustrates that the difficulty levels of the two tests differ significantly The Vocabulary and Grammar section of Test 1 appears to be easier compared to Test 2, while the Reading section of Test 1 is somewhat more challenging than that of Test 2 This discrepancy highlights that, despite being designed for the same students by a single creator, the tests are not aligned in terms of difficulty.

Levels Test number 1 Test number 2

Table 4.8: The comparison between Tests number 1 and 2

A Cambridge PET reading – writing test was chosen to compare with a current final achievement test at QNTTC (see Appendix 7) Here are some main points:

The current tests at QNTTC

- The texts in PET are longer than the texts of the current tests at QNTTC

- The number of sentences is the same, the formats are not the same but they are equivalent

- Write a short letter or a story (100 words)

- Multiple choice questions in grammar, vocabulary and phonetics (20 sentences)

- They are different both in number of sentences and format

Table 4.9: Comparison of the length between the current tests at QNTTC and the reading – writing tests of PET

The final achievement test at QNTTC differs significantly from the Cambridge PET test in both structure and duration While the QNTTC test requires students to complete it in just 60 minutes, the Cambridge PET test includes more items and demands over 2 hours for completion.

To meet the Ministry of Education and Training's requirement for B1 level proficiency based on the CEFR, QNTTC must enhance its tests by improving both content and structure Currently, the tests focus on language elements but do not adequately assess all language skills learned by students Additionally, the difficulty level of these tests falls short of the expected standards for non-English majors, leading to concerns about their validity, reliability, and ability to discriminate between varying levels of student performance.

Summary of Chapter 4

This chapter presents a situational analysis of the teaching and learning environment at QNTTC, focusing on the current assessment methods It also examines the alignment of QNTTC's final achievement tests with the Common European Framework of Reference (CEFR) and the Cambridge Preliminary English Test (PET).

CONCLUSION

Tiêu đề	An Evaluative Study On The Current Final Achievement Tests For Non-English Majors At Quang Ninh Teacher Training College
Tác giả	Vũ Thanh Hòa
Người hướng dẫn	Đỗ Thị Thanh Hà, Ph.D
Trường học	Vietnam National University, Hanoi
Chuyên ngành	English Teaching Methodology
Thể loại	thesis
Năm xuất bản	2016
Thành phố	Hanoi

Định dạng
Số trang	82
Dung lượng	1,04 MB