Assessing content validity and internal consistency reliability of a Vietnamese Standardized Test of English Proficiency (VSTEP.3-5) reading test

INTRODUCTION

Rationale

The VSTEP (Vietnamese Standardized Test of English Proficiency) is the first standardized English test in Vietnam. It was introduced by the Ministry of Education and Training (MOET) and is used to evaluate the English language ability of students and other English language learners within the Vietnamese educational system. Its objective is to ascertain test-takers' English proficiency in four domains from level 3 to level 5, in accordance with the six-level foreign language proficiency framework for Vietnam. The test assesses the ability to use English for communication, reading comprehension, writing, and listening comprehension. VSTEP can be used for many purposes, such as university and college admission, English certification for students, or assessment of English proficiency for personal goals. By aligning with international standards, VSTEP plays an important role in measuring the English proficiency of Vietnamese students, comparing it against international standards, and supporting the country's effort to improve the education system and prepare students for international opportunities.

VSTEP is well recognized and widely used in Vietnam; however, empirical research that specifically assesses the content validity and internal consistency reliability of the VSTEP 3-5 Reading sample test is severely lacking. The findings of previous research on language proficiency tests have been conflicting, and different viewpoints have been expressed on the most effective ways to examine these crucial components of assessment quality.

Furthermore, according to the MOET announcement, only organizations that have acquired official Ministry permission may administer tests of foreign language competency. In addition, all live tests are strictly secured by law. This creates a need for quality sample tests so that those who are interested in the test can obtain information about its structure, timing, question types, and format. Therefore, the National Testing Center of Educational Testing and Assessment has developed a computerized test website for the evaluation of foreign language competence. The website introduces sample tests aligned with the six-level foreign language proficiency framework for Vietnam so that teachers and interested foreign language learners can become familiar with the computerized form of the test.

The Education Quality Management Agency of MOET declared that around thirty institutions and academies nationwide are authorized to administer VSTEP tests and grant certifications. On their websites, all of them direct interested lecturers, trainees, and candidates to the test page developed by the National Testing Center of Educational Testing and Assessment.

The issue, however, is whether this sample test fully and accurately reflects the English language skills of students and users in the context of the six-level foreign language proficiency framework for Vietnam. With the continuous development of English education and of the science of measurement and evaluation, this issue is becoming increasingly important and requires careful consideration. Therefore, it is crucial to assess the quality of the sample tests to make sure they are valid and reliable, so that candidates who practice with them are well prepared for the official tests.

Aims and objectives of the research

The primary aim of this research is to assess the content validity and internal consistency reliability of the VSTEP 3-5 Reading sample test, with a focus on aligning the test with the six-level foreign language proficiency framework for Vietnam.

This aim is specified in the following objectives. The first is to investigate how well the reading sample test aligns with the objectives and proficiency levels outlined in the six-level foreign language proficiency framework for Vietnam. The second is to analyze its internal consistency reliability, which involves determining the consistency and coherence of the reading sample test items to ensure reliable measurement of English language proficiency. In addition, under each objective, the research identifies shortcomings or areas for improvement in the content, structure, or administration of the reading sample test in order to enhance its effectiveness as a measure of English proficiency.

Research questions

To fulfill the above aim and two objectives, the research raises two questions for exploration:

Research question 1: To what extent are the sample test items relevant to and representative of the targeted constructs as described in the test specifications?

Research question 2: To what extent do the sample test items that probe the same construct produce similar results?

Scope of the research

This research focuses on the reading skill component, which offers practical data collection advantages: reading tests typically involve discrete items or passages with specific questions, which makes response data easier to analyze objectively. Reading comprehension lends itself well to quantitative analysis, allowing for straightforward performance measurement and the assessment of internal consistency reliability, which contributes to a more rigorous evaluation of the sample tests. As reading is fundamental to academic, professional, and everyday communication, analyzing this component helps assess a critical dimension of language proficiency. Given the framework's likely emphasis on reading skill development, this research provides valuable insights into the alignment between the sample tests and framework objectives, enhancing the assessment instrument's overall validity.

This research also emphasizes evaluating the content validity and internal consistency reliability of the sample test used in VSTEP. Both are fundamental aspects of assessment quality, and evaluating them is essential for ensuring that the sample tests effectively measure English language proficiency. Assessing content validity ensures that the sample tests accurately reflect the constructs and domains of English language proficiency specified in the six-level foreign language proficiency framework for Vietnam. This enhances the validity and reliability of the assessment results, providing stakeholders with confidence in the test outcomes. Internal consistency reliability indicates the consistency and coherence of the test items, which is crucial for effective assessment. A reliable test produces consistent results across administrations, allowing for a more accurate measurement of English language proficiency (DeVellis, 2016). Additionally, insights from this research can inform test design, item selection, and educational policies, benefiting test developers, educators, policymakers, and other stakeholders in English language education in Vietnam by improving learning outcomes.

The research chooses the sample test over a live test primarily because of security concerns, as live tests pose challenges in maintaining confidentiality and test integrity. Sample tests offer a controlled environment, reducing the risk of content leakage and ensuring research integrity. Their accessibility and reproducibility enable the analysis of multiple versions, facilitating a thorough assessment of content validity and internal consistency reliability. Sample tests are also more resource-efficient, eliminating logistical needs like participant recruitment and venue arrangements and allowing researchers to focus on data analysis and interpretation. Additionally, sample tests provide flexibility in evaluating various aspects of assessment quality, such as item alignment and test coherence, offering a comprehensive examination of their suitability for measuring English language proficiency.

Overall, the scope of research on assessing the content validity and internal consistency reliability of the VSTEP 3-5 Reading sample test involves a comprehensive investigation into the reading sample test's content, validity, and reliability to ensure its effectiveness as a measure of English proficiency for Vietnamese learners.

Methods of the study

This research employs a mixed-methods approach to evaluate the content validity and internal consistency reliability of the VSTEP 3-5 Reading sample test. The study integrates quantitative and qualitative methods: quantitative analyses assess internal consistency reliability through statistical measures, while qualitative methods involve expert reviews of content validity and of internal consistency reliability as well. The sample consists of multiple versions of the VSTEP 3-5 Reading sample test, selected for their representativeness across different proficiency levels. Data collection involves item analysis and response data from test-takers, with Cronbach's alpha coefficient used to measure internal consistency reliability, that is, whether items produce similar results across the test. Additionally, a panel of language assessment and education experts will review the test items for content validity, evaluating their representation of the framework's constructs and domains. Quantitative analysis includes calculating Cronbach's alpha for each test version and summarizing performance data using descriptive statistics. Qualitative analysis involves assessing the content validity based on experts' ratings and conducting thematic analysis of their feedback to identify areas for improvement.
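As a minimal sketch of the internal consistency calculation described above, Cronbach's alpha for k items is commonly computed as alpha = k / (k - 1) × (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the toy 0/1 response matrix below are illustrative assumptions; they are not the study's data or its actual software procedure.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Return Cronbach's alpha for a score matrix.

    responses: list of rows, one row per test taker and one column per item
    (for multiple-choice reading items, 1 = correct, 0 = incorrect).
    """
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two test takers")

    # Sample variance of each item (column) across test takers.
    item_variances = [
        variance([row[j] for row in responses]) for j in range(n_items)
    ]
    # Sample variance of the test takers' total scores.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")

    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses of five test takers to four items (not study data).
toy_responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(toy_responses), 3))
```

Values closer to 1 indicate that the items vary together; the threshold treated as acceptable depends on the purpose of the test and is a judgment for the researcher.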

Significance of the research

It is hoped that this research will contribute to the field of educational assessment, especially to the evaluation of test quality. It can be of use to researchers working on related subjects as well as to students and teachers. Evaluating the quality of sample tests for the VSTEP is important for both theoretical understanding and practical implementation within Vietnam's English language education and assessment landscape.

In terms of theoretical aspects, scholars like Brown (2018) underline the necessity of aligning assessment methods with theoretical frameworks to ensure credibility and dependability. This research aims to offer valuable insights into how well sample tests align with the six-level foreign language proficiency framework for Vietnam, as emphasized by McNamara and Roever (2006). Through the assessment of these tests, researchers can gauge the extent to which they accurately represent the intended aspects of English language proficiency outlined in the framework. Furthermore, the insights gained from this research can guide future endeavors aimed at refining and enhancing the theoretical foundations of English language assessment practices, as noted by Fulcher and Davidson (2007). The study also attempts to fill a theoretical gap concerning test quality assessment. By applying contemporary statistical techniques and analytical tools, it provides a thorough and current analysis of these factors, which helps to improve theoretical models for evaluating test quality. Thus, both the writing of new tests and the enhancement of current ones can benefit from improved theoretical knowledge of what makes a language competence exam valid and reliable.

In terms of practical aspects, by validating the content of the VSTEP.3-5 reading sample test, the research can contribute to improving the quality of the test items. Identifying areas where the test aligns well with the intended proficiency standards and areas needing improvement can guide test developers in refining the assessment to better measure English language skills. The outcomes of this research are also important for various groups, such as teachers, school leaders, decision-makers, and students learning English. By assessing the quality of the sample test, this research can provide guidance on creating and improving training programs, teaching methods, and educational plans designed specifically for English learners in Vietnam. This resonates with the perspectives of Brown (2018) and Fulcher and Davidson (2007), who stress the practical implications of assessment in influencing pedagogical innovations and policy decisions. Improved sample tests can address English teaching and learning challenges by providing educators with high-quality resources for test preparation and instructional planning, leading to increased student engagement and achievement, as discussed by Brown (2018) and Fulcher and Davidson (2007). Additionally, the research's outcomes could influence policy-making regarding English language education and assessment in Vietnam. This influence might result in the creation of standardized tests that adhere to international standards and best practices, in line with suggestions by McNamara and Roever (2006). Implementing these findings could also foster advancements in English language assessment methodologies, such as integrating authentic tasks and materials, as recommended by McNamara and Roever (2006).

Ultimately, this research holds promise for advancing both theoretical understanding and practical applications in English language education and assessment in Vietnam, thereby contributing to refining theoretical frameworks, enhancing assessment practices, and improving educational outcomes for English language learners.

Structure of the research

This research is structured into five chapters:

Chapter 1: Introduction – provide an overview of the research topic, including the importance of assessing content validity and internal consistency reliability in language testing; introduce the VSTEP 3-5 Reading test and its significance within the context of English language assessment in Vietnam; clearly state the research aims, objectives, and research questions

Chapter 2: Literature review – review relevant literature on language assessment, content validity, internal consistency reliability, and existing studies on similar standardized tests; discuss theoretical frameworks and methodologies commonly used in assessing content validity and reliability in language testing; analyze previous research findings related to the assessment of English proficiency in Vietnamese learners; present the theoretical foundation guiding the assessment of content validity and internal consistency reliability in the context of language testing, discuss relevant concepts, models, and frameworks that inform the research approach

Chapter 3: Methodology – describe the research design, including the sampling strategy, participant selection criteria, and data collection procedures; explain the statistical methods and tools used for data analysis

Chapter 4: Research findings and discussion – present data analysis, findings, and discussion of findings; discuss the implications of the findings for the validity and reliability of the VSTEP 3-5 Reading test

Chapter 5: Conclusion – summarize the key findings and contributions of the research; reiterate the significance of assessing content validity and internal consistency reliability in language testing; reflect on the broader implications of the research for English language assessment practices in Vietnam; conclude with final remarks and suggestions for further inquiry

LITERATURE REVIEW

Theoretical framework

It cannot be denied that the quality of a test depends significantly on its validity, which is considered one of the paramount criteria. According to Brown and Lee (2015), validity refers to the degree to which a test accurately assesses its intended construct, or the effectiveness with which it gauges what it is meant to gauge. To put it differently, validity concerns whether the test scores accurately reflect the intended meaning and whether the test fulfills its designated purpose (Powers, 2010, p. 1). For instance, a valid test of reading proficiency genuinely evaluates reading skills. It is advisable to validate a test through various methods whenever feasible (Alderson et al., 1995).

The degree to which a test measures what it is supposed to assess is referred to as its validity (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 2014). It evaluates whether a test effectively measures the construct or characteristic that it purports to measure. A key component of test quality, validity establishes the suitability and precision of conclusions and judgments drawn from test findings. Validity may be shown by demonstrating a clear relationship between the test and the components it is meant to assess. Construct validity, face validity, criterion-related validity, and content validity are the four types of validity that may be used, alone or in combination, to show that a study satisfies the requirements for validity (Cook & Beckman, 2006).

Construct validity is the degree to which a test accurately assesses the construct it is thought to assess (Brown, 1994, p. 256; Bachman & Palmer, 1996). More specifically, in ethnographic research, construct validity "must demonstrate that the researchers are using meaningful categories to the participants themselves" (Cohen et al., 2000, p. 110).

According to Trochim & Donnelly (2008), face validity is verified by inspecting the items rather than by statistical analysis. In contrast to content validity, face validity is not evaluated by subject-matter specialists or through formal processes. Instead, anybody who takes the test, including test takers and other interested parties, may express an informal judgment about whether or not it measures what it is intended to assess (Bowling, 2005). Face validity alone is inadequate to prove that the test is measuring what it purports to measure; however, it is undoubtedly useful for the test to appear valid. A well-developed testing process will include formal research into other, more significant forms of validity.

Criterion-related validity, of which predictive validity is one form, refers to the extent to which scores obtained from a test are related to a specific criterion or outcome measure, providing evidence of, for example, the test's effectiveness in predicting future performance or behavior (Anastasi & Urbina, 1997; American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 2014).

Test reliability refers to the consistency of students' scores across different administrations of the same exam. No two administrations will ever yield exactly the same results, however comparable they are, because of variations in the actual material being scored on the different forms, contextual factors like lighting or weariness, or students' mistakes in answering. A test with a translation component, for instance, is scored subjectively, so it would likely yield varying results from administration to administration and be untrustworthy.

According to Bachman, test reliability is "a quality of test score" (1990, p. 24). He goes on to say that if a student receives a poor score one day and a good score on the same exam two days later, the test does not provide consistent results, and the score cannot be regarded as a trustworthy predictor of that person's competence.

The reliability of a test is defined as the degree to which its measurement is free from error (Fraenkel & Wallen, 2003; McMillan & Schumacher, 2001). The more measurement error there is, the less reliable the test is.

In the field of psychological and educational assessment, reliability refers to the trustworthiness of measurements: the extent to which they are unaffected by error and stay consistent across different testing instances. It essentially demonstrates the dependability and consistency of test scores over time. Reliability is crucial in testing to ensure that the conclusions drawn from test results are accurate and precise.

Test-retest reliability looks at how stable test scores remain over time when the same test is given on two separate occasions (Berchtold, 2016). When test-retest reliability is high, the test produces consistent results each time it is administered, which supports the interpretation that the scores reflect individuals' true abilities or characteristics. Another key aspect of reliability is internal consistency, which measures how well the items within a test align with the same underlying concept or domain. Internal consistency reliability, typically assessed through methods like Cronbach's alpha, indicates how closely related the test items are. Taber (2018) states that strong internal consistency suggests that the items consistently measure the intended construct.

Reliability also plays a role in validating test scores, since it is a precondition for their accuracy in reflecting individuals' capacities or traits. Insufficient reliability makes test scores untrustworthy and weakens the validity of their interpretation, ultimately undermining the utility of assessments for making informed decisions.

In examining content validity and internal consistency reliability of standardized English proficiency tests, numerous studies provide valuable insights. Bart (2008) conducted a comprehensive analysis of content validity and internal consistency reliability in standardized English tests, highlighting the importance of aligning assessment items with specific language proficiency frameworks. The study emphasized the need for rigorous validation procedures to ensure the validity and reliability of test scores. Additionally, Bart et al. (2009) explored the relationship between content validity and test reliability, emphasizing the role of item analysis techniques in evaluating the consistency of test items. Their findings underscored the significance of conducting systematic validation studies to establish the credibility of assessment instruments. Thus, while Bart (2008) shed light on the critical components of content validity and internal consistency reliability in standardized English tests, Bart et al. (2009) provided insights into the interplay between content validity and test reliability, contributing to the understanding of assessment validation processes.

Despite these advancements, Bart (2008) left open questions regarding the applicability of validation techniques across diverse language proficiency contexts, and Bart et al. (2009) highlighted the need for further exploration into the impact of test design on internal consistency reliability. The present research addresses the generalizability of validation findings to the specific context of the VSTEP 3-5 Reading Test and investigates the implications of validation results for enhancing the quality and effectiveness of English proficiency assessment in Vietnam.

Following the release of this research, content validity in assessment studies was briefly given more attention. Nevertheless, a quick review of the relevant literature reveals that content validity is still infrequently discussed and much less frequently examined in detail. Content validity refers to the degree to which components of an assessment instrument are relevant to and representative of the targeted construct for a specific assessment purpose (Rossiter, 2008; Haynes et al., 1995, p. 238). Similarly, Aini, Rahardja, and Naufal (2018) describe content validity as the degree to which the questions, tasks, or items of a test or instrument represent, comprehensively and proportionally, the sample of behavior that constitutes the learning objective under investigation. According to Koller, Levenson, & Glück (2017), Haynes et al. (1995) highlighted the significance of content validity and provided an overview of techniques to evaluate it in a key study. According to Imsaard (2019), content validity pertains to whether the test adequately encompasses a comprehensive sample of the theoretical construct outlined in the test specifications. Put differently, it concerns whether the test effectively reflects the content specified in the test specifications or blueprints. For instance, if the objective is to evaluate a test-taker's ability to identify main ideas, the test questions should require the test-taker to identify the main ideas within the reading passages.

Valentin (1996) distinguished between two categories of methodologies for assessing content-related validity evidence: subjective methods and empirical methods. In empirical methods, the examinees' answers to the test items are analyzed using statistical techniques: the inter-item correlation matrix obtained from the examinees' responses is analyzed using factor analysis or multidimensional scaling (Sireci & Geisinger, 1992), and the resulting components or dimensions are compared with the content domain structure specified in the blueprint. Although objective, these approaches have been criticized (Sireci & Geisinger, 1992) because an item's degree of relevance to the content area it corresponds with is distinct from the examinees' performance on the item. Subjective methods use subject matter experts who review test items and rate them according to their degree of appropriateness for measuring the content domain they purport to measure. For instance, Aiken's validity index (Aiken, 1980) indicates how relevant subject matter experts judge an item to be to a particular content domain (see Figure 1). In Figure 1, V is the validity index, i refers to a particular category on the scale (usually 0, 1, 2, etc.), c is the number of categories on the scale used to rate the item, n_i is the number of judges who rate an item into the ith category, and N is the total number of judges.

Figure 1. The calculation of the validity index

Figure 2. The calculation of the percentage of items
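The original figure is not reproduced in this text. As a hedged sketch based on the symbol definitions given above, Aiken's index is commonly written as V = (sum over i of i × n_i) / (N × (c - 1)), with categories coded 0 to c - 1, so that V ranges from 0 (all judges choose the lowest category) to 1 (all judges choose the highest). The function name `aikens_v` and the example ratings are illustrative assumptions, not values from this study.

```python
from collections import Counter


def aikens_v(ratings, n_categories):
    """Aiken's V for a single item.

    ratings: one rating per judge, coded 0 .. n_categories - 1
    n_categories: c, the number of categories on the rating scale
    """
    if n_categories < 2:
        raise ValueError("the rating scale needs at least two categories")
    if not ratings:
        raise ValueError("at least one judge rating is required")
    if any(r not in range(n_categories) for r in ratings):
        raise ValueError("ratings must be coded 0 .. n_categories - 1")

    counts = Counter(ratings)   # n_i: number of judges choosing category i
    n_judges = len(ratings)     # N: total number of judges
    weighted_sum = sum(i * counts[i] for i in range(1, n_categories))
    return weighted_sum / (n_judges * (n_categories - 1))


# Hypothetical example: five judges rate one item on a four-category scale.
print(round(aikens_v([3, 3, 2, 3, 2], n_categories=4), 3))
```

If the experts' rating form is coded from 1 rather than 0, subtracting 1 from each rating before calling the function keeps the result on the 0 to 1 scale.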

Previous studies

2.2.1 Previous studies on content validity

Several research studies have been conducted to evaluate the validity of the VSTEP in a Vietnamese setting. Because research on the content validity of the VSTEP from an international perspective is relatively limited owing to the test's regional focus, this review predominantly covers studies conducted in Vietnam.

Nguyen (2017) examined the content validity of Vietnamese standardized English tests, focusing on the alignment between test items and national English language proficiency standards. The study emphasized the importance of culturally relevant test content in ensuring the validity of English proficiency assessments in Vietnam and contributed to the understanding of content validity in the context of Vietnamese English proficiency testing, highlighting the need for culturally sensitive assessment materials. However, it left open questions regarding the impact of test adaptation on content validity in Vietnamese English proficiency assessments.

Several other studies conducted in Vietnam also offer valuable insights into content validity in English proficiency testing. Nguyen et al. (2018) explored the alignment between the English language curricular requirements set by the Vietnamese Ministry of Education and Training and the content of VSTEP. Their findings showed that the VSTEP test items and the curricular objectives were highly aligned, especially in the grammar and reading comprehension domains. Additionally, Tran and Pham (2019) conducted an expert evaluation of the VSTEP items to assess their relevance and clarity. Their analyses showed that while most items were considered suitable for Vietnamese learners, some needed revision.

The present research will extend Nguyen's (2017) findings by evaluating the content validity of the VSTEP 3-5 Reading Test and exploring strategies for enhancing the authenticity of its test items and their alignment with the test specification.

2.2.2 Previous studies on internal consistency reliability

Numerous studies have examined the internal consistency reliability of the VSTEP within the Vietnamese educational context. Nguyen (2017) and Smith et al. (2004) identified significant levels of consistency in test outcomes across numerous administrations after conducting longitudinal research to evaluate the stability of VSTEP scores over time. Their study, which lasted two academic years and included a wide range of test takers from various parts of Vietnam, showed that VSTEP results remained consistent over time, demonstrating the test's longitudinal reliability as a means of measuring English language ability. Similarly, Tran et al. (2018) examined the inter-rater reliability of VSTEP scoring among a group of experienced examiners and found substantial consistency in their evaluation of test participants' English language skills. Using a rigorous scoring rubric and systematic training procedures, Tran and colleagues found that VSTEP examiners assigned scores that were highly correlated with each other, indicating reliable and consistent evaluation practices. These findings highlight the reliability of VSTEP scores in measuring English language skills among Vietnamese learners, both within and across different testing contexts, thereby affirming the reliability and utility of the test in educational and professional settings.

Summary

By examining significant concepts related to test validity and reliability, this chapter has established the theoretical foundation underpinning the study. Through a thorough literature analysis, it has reviewed theories and definitions, concentrating on content validity and internal consistency reliability, two critical components for assessing and enhancing the quality of language testing in general and the VSTEP in particular.

Examining the VSTEP reveals that strict validation methods are required to assure the test's validity and efficacy. Integrating expert evaluations, statistical analysis, and alignment with existing frameworks are critical stages in improving test validity and reliability, as Weir (2005) stated. Reliability, for its part, is crucial for ensuring the consistency of the test, and methods such as Cronbach's alpha were highlighted as essential tools for assessing internal consistency reliability. This framework will be applied to systematically analyze and assess the test's content alignment with the intended constructs and the consistency of its items. Building on this theoretical foundation, the research will provide a thorough examination of the VSTEP Reading Test to determine whether it accurately measures English language proficiency and maintains reliable results across different administrations.

METHODOLOGY

Research Context and Participants

The research was set within the context of the VSTEP Reading sample test. The VSTEP.3-5 sample reading exam, which was administered by the MOET, is available on the website of the National Testing Center of Educational Testing and Assessment. This sample test was chosen at random from a pool of published VSTEP sample tests. The data was collected in February 2024, excluding the data from the aforementioned website.

The participants of the research comprised individuals who took the English VSTEP sample test for levels 3-5 on the National Testing Center of Educational Testing and Assessment’s website in 2022. This population included English learners and future test takers, regardless of age, profession, or other demographic factors. A sample of 1000 candidates was randomly selected from this population to reflect the overall performance of English learners in Vietnam today. The participants were therefore drawn from a diverse population of Vietnamese individuals who learn and use English for various purposes, including self-evaluation, study abroad applications, employment applications, and involvement in training programs requiring English competence from level 3 to level 5. Blind sampling was used to ensure that participants were chosen at random and free from bias or preconceptions. One of its main advantages is that it reduces selection bias and gives every possible participant an equal chance of being included in the research. This lessens the influence of researchers' biases and preconceptions and yields a sample that is more representative of the target population (Babbie, 2016). Furthermore, blind sampling enhanced the generalizability of the research findings by reducing the risk of systematic errors introduced by biased selection methods. By randomly selecting participants without regard to specific characteristics or attributes, blind sampling increased the likelihood that the sample accurately reflected the diversity and characteristics of the population under study (Trochim & Donnelly, 2008). Blind sampling also promoted transparency and objectivity in the research process, as the researcher was less influenced by subjective judgments or preferences when selecting participants. This helped to maintain the integrity and validity of the research, as the sampling process was based solely on random chance rather than individual discretion (Babbie, 2016). This approach was intended to capture the diverse range of English proficiency levels and characteristics present among VSTEP test takers, providing valuable insights into the content validity and internal consistency reliability of the test.
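The random selection described above can be sketched in a few lines of Python. This is only an illustration of simple random sampling without replacement: the function name, the candidate identifiers, the pool size of 25,000, and the seed are all hypothetical, since the research does not report how the draw was implemented or how large the pool was.

```python
import random


def draw_simple_random_sample(candidate_ids, sample_size, seed=None):
    """Return `sample_size` distinct IDs, each candidate equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(list(candidate_ids), sample_size)


# Hypothetical pool of anonymised response records (the real pool size is not reported).
population = [f"cand_{i:05d}" for i in range(25_000)]

# Draw 1000 candidates without looking at any demographic attribute.
sample = draw_simple_random_sample(population, 1000, seed=2024)
print(len(sample), "candidates drawn,", len(set(sample)), "of them distinct")
```

Because the draw ignores every attribute of the candidates, no demographic information is needed to run it, which is also why the resulting sample cannot later be broken down by demographic group.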

Research instruments

The research instrument used is the VSTEP sample reading test administered by the MOET on the National Testing Center of Educational Testing and Assessment’s website. It is one of the four separate skill sample tests that examinees can take before sitting the real VSTEP.3-5 test. The reading sample test consists of four texts, each followed by ten four-option multiple-choice questions. Test takers have 60 minutes to complete the test. The passages vary in length and topic. In addition, the Document Guidelines for Applying the Format of English Proficiency Assessment Tests served as an important instrument for reviewing the sample reading test. This document provides the framework and criteria for constructing and grading tests according to the six-level foreign language proficiency framework for Vietnam. Alongside this document, SPSS (Statistical Package for the Social Sciences) and Quest software were used to conduct quantitative analysis on the collected data. SPSS was employed to calculate reliability statistics, including Cronbach's alpha for the entire test and the reading section. Quest aided in performing item-to-item correlation measures, item-to-total correlation measures, Cronbach's alpha if item deleted, and factor analysis. These research instruments were crucial for examining the alignment of the sample test with the competency framework, analyzing the reliability of the test, and conducting statistical analyses on the collected data.

Data collection procedure

Both quantitative and qualitative data were compiled in the research.

First, the sample reading test was reviewed against the test specifications. The full specifications of the VSTEP reading test are not publicly accessible, and there is no information about the specific abilities and content that the test is intended to measure. Ostensibly, the detailed test specifications are confidential and proprietary documents. Therefore, the test specification used in the research was the Document Guidelines for Applying the Format of English Proficiency Assessment Tests from Level 3 to Level 5 according to the six-level Foreign Language Proficiency Framework for Vietnam in the Construction and Grading of Tests (approved under Decision No. 730/QĐ-BGDĐT dated March 11, 2015, by the Minister of Education and Training). More in-depth qualitative data was then gathered through a group discussion between the researcher and three professionals, who analyzed the alignment of the sample test with the six-level foreign language competency framework for Vietnam.

Subsequently, the researcher collected the responses of candidates who took the VSTEP reading sample test on the website. By accessing the National Testing Center of Educational Testing and Assessment’s platform, the researcher obtained detailed data on the answers provided by each participant. A sample of 1,000 candidates was randomly selected from this population, using blind sampling to guarantee randomness and eliminate selection bias. This data included the specific choice made by each candidate for each question, allowing for a comprehensive analysis of their performance. The data collection process was conducted in a systematic and secure manner, ensuring the confidentiality and integrity of the participants' responses. This approach enabled the researcher to gather accurate and relevant data necessary for evaluating the reliability and validity of the VSTEP reading test.

Data analysis

Firstly, qualitative data were collected through expert consultations to assess the alignment of the VSTEP sample reading test with the six-level Foreign Language Proficiency Framework for Vietnam. This analysis was guided by specific criteria such as passage length and content, difficulty levels, topic relevance, and question distribution. The experts’ insights were recorded and analyzed to provide an understanding of how well the test aligns with the intended proficiency standards. The criteria that the experts took into account were (1) the number of words per passage, (2) the total word count, (3) the difficulty level of the reading passage, (4) the difficulty level of questions, (5) the topic, and (6) the distribution of questions according to the level of proficiency. The expert panel provided feedback and suggestions for revisions, which can then be used to refine the test items.

Secondly, quantitative analysis involved several statistical techniques to evaluate the reliability of the sample reading test, namely descriptive statistics and reliability analysis.

To be more specific, Cronbach’s alpha was calculated using SPSS for the entire test and the reading section to measure internal consistency. A Cronbach’s alpha value above 0.70 was considered indicative of acceptable reliability. Item-to-item and item-to-total correlations were measured to evaluate the relationship between individual items and the overall test score and to identify any items that may not align well with the test’s objectives. Cronbach’s alpha if item deleted was analyzed to determine how the removal of each item would affect the overall reliability of the test. Factor analysis was conducted using SPSS to explore the underlying structure of the test and identify clusters of related items. This helped in understanding how different items were grouped together and whether they assessed similar aspects of reading proficiency.
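The three item-level reliability statistics named above can be illustrated with a short NumPy sketch. It is not the SPSS or Quest procedure used in the research: the function names are hypothetical, the item-total correlation shown is the corrected variant (the item is removed from the total before correlating), and the response matrix is simulated from an assumed logistic model rather than taken from the candidates' data.

```python
import numpy as np


def cronbach_alpha(scores):
    """Cronbach's alpha for a candidates-by-items matrix of item scores."""
    scores = np.asarray(scores, dtype=float)
    n_items = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)
    total_variance = scores.sum(axis=1).var(ddof=1)
    return n_items / (n_items - 1) * (1 - item_variances.sum() / total_variance)


def corrected_item_total(scores):
    """Correlation of each item with the total score of the remaining items."""
    scores = np.asarray(scores, dtype=float)
    totals = scores.sum(axis=1)
    return np.array([
        np.corrcoef(scores[:, j], totals - scores[:, j])[0, 1]
        for j in range(scores.shape[1])
    ])


def alpha_if_item_deleted(scores):
    """Alpha recomputed with each item left out in turn."""
    scores = np.asarray(scores, dtype=float)
    return np.array([
        cronbach_alpha(np.delete(scores, j, axis=1))
        for j in range(scores.shape[1])
    ])


# Simulated data only (assumption): 1000 candidates, 40 right/wrong items.
rng = np.random.default_rng(0)
ability = rng.normal(size=(1000, 1))
difficulty = rng.normal(size=(1, 40))
prob_correct = 1 / (1 + np.exp(-(ability - difficulty)))
responses = (rng.random((1000, 40)) < prob_correct).astype(int)

alpha = cronbach_alpha(responses)
item_total = corrected_item_total(responses)
alpha_deleted = alpha_if_item_deleted(responses)

print(f"alpha = {alpha:.3f} (acceptable if above 0.70)")
weak_items = np.flatnonzero(alpha_deleted > alpha) + 1  # 1-based item numbers
print("items whose removal would raise alpha:", weak_items.tolist())
```

An item whose deletion raises alpha, or whose corrected item-total correlation is close to zero, is a candidate for the kind of expert review described in the qualitative step.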

Summary

In conclusion, the research examined the content validity and internal consistency reliability of the VSTEP 3-5 Reading Test through an analysis of a blind-sampled group of 1000 participants. By ensuring random selection and minimizing bias, the research aimed for representativeness. A qualitative evaluation of the sample test's alignment with competency frameworks was conducted, along with a quantitative reliability analysis using SPSS and Quest. The research aimed to offer insights into English proficiency assessment in Vietnam. Results from the analysis are expected to inform improvements in the test's design and administration, enhancing its reliability and validity. The research contributes to a broader understanding of language proficiency assessment practices and underscores the importance of rigorous validation procedures in ensuring the quality of assessment instruments.

RESEARCH FINDINGS AND DISCUSSION

Findings

The criteria in the VSTEP specification were compared with the VSTEP reading test on the basis of the Document Guidelines for Applying the Format of English Proficiency Assessment Tests from Level 3 to Level 5 according to the six-level Foreign Language Proficiency Framework for Vietnam in the Construction and Grading of Tests (issued under Decision No. 730/QĐ-BGDĐT dated March 11, 2015, by the Minister of Education and Training).

Table 4.1 The comparison of the specification and the sample test

| Criterion | Description of reading test (Document Guidelines for Applying the Format of English Proficiency Assessment Tests from Level 3 to Level 5) | VSTEP reading test |
| --- | --- | --- |
| Number of words per passage | Part 1: around 450 words; Part 2: around 450 words; Part 3: around 450 words; Part 4: around 500 words | Part 1: 451 words; Part 2: 431 words; Part 3: 438 words; Part 4: 513 words |
| Total word count | 1700-2050 words | 1863 words |
| Difficulty level of the reading passage, Part 1 | Level 3-4 | B2 low (Level 4); average sentence length: 14; average word length: 4.9; word complexity: 1519 |
| Difficulty level of the reading passage, Part 2 | Level 3-5 | B2 high (Level 4); average sentence length: 17; average word length: 5.2; word complexity: 1696 |
| Difficulty level of the reading passage, Part 3 | Level 3-5 | C1 high (Level 5); average sentence length: 12; average word length: 5.1; word complexity: 3220 |
| Difficulty level of the reading passage, Part 4 | Level 3-5 | C1 high (Level 5); average sentence length: 17; average word length: 5.1; word complexity: 2562 |
| Topic, Part 1 | Reading about daily life | The relationship between blood types and personality traits, focusing on the personality types associated with blood types A, B, AB, and O, and on the cultural significance and impact of this theory, particularly in Asian countries such as Japan and Korea |
| Topic, Part 2 | Reading about natural or social sciences | The phenomenon of volunteering abroad, known as voluntourism, with its criticisms and the debates surrounding its effectiveness and impact on destination communities |
| Topic, Part 3 | Reading about natural sciences, social sciences, or other specialized fields | The significance of bees in the pollination process, their behavior in response to freezing temperatures, hive population control mechanisms, and strategies for dealing with beehives in residential structures |
| Topic, Part 4 | Reading with specialized or literary content | The emergence of continents on Earth about 3 billion years ago, earlier than previously thought, and the factors contributing to the formation of continental crust, including plate tectonics and volcanic activity |
| Topic, note | At least 01 reading discussing and/or set in Asia, ASEAN, or Vietnam | Has 01 reading discussing and/or set in Asia |
| The distribution of questions according to the level of proficiency | Level 3 (15 questions); Level 4 (15 questions); Level 5 (15 questions) | |

It can be seen that the number of words per passage in each part met the specified requirements. In Part 1, the passage was required to contain around 450 words, and the actual count was 451 words. Similarly, for Part 2 and Part 3, the passages were required to be approximately 450 words long, with actual counts of 431 and 438 words, respectively. For Part 4, the requirement is around 500 words per passage, and the actual count was slightly higher at 513 words.
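The comparison above can be written out as a small check. The guidelines only say "around", so the 10% tolerance used here is an assumption introduced for illustration, as are the variable and function names. Note that the four per-part counts sum to 1,833, slightly below the total of 1,863 reported in Table 4.1; both values fall inside the specified 1700-2050 range.

```python
# Target and observed word counts per part, taken from Table 4.1.
SPEC_TARGETS = {"Part 1": 450, "Part 2": 450, "Part 3": 450, "Part 4": 500}
OBSERVED = {"Part 1": 451, "Part 2": 431, "Part 3": 438, "Part 4": 513}
TOTAL_RANGE = (1700, 2050)  # specified total word count
REPORTED_TOTAL = 1863       # total as reported in Table 4.1
TOLERANCE = 0.10            # assumption: "around" is read as within 10% of the target


def within_tolerance(observed, target, tolerance=TOLERANCE):
    """True if the observed count deviates from the target by at most `tolerance`."""
    return abs(observed - target) <= tolerance * target


report = {
    part: within_tolerance(OBSERVED[part], SPEC_TARGETS[part])
    for part in SPEC_TARGETS
}
total = sum(OBSERVED.values())
total_ok = TOTAL_RANGE[0] <= total <= TOTAL_RANGE[1]

for part, ok in report.items():
    deviation = 100 * (OBSERVED[part] - SPEC_TARGETS[part]) / SPEC_TARGETS[part]
    print(f"{part}: {OBSERVED[part]} words ({deviation:+.1f}% vs target), ok={ok}")
print(f"sum of parts: {total} words, within range: {total_ok}")
```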

Regarding the overall word count, Hughes (2003) states that passages used to assess rapid reading should be at least 2000 words long; in reality, the three passages varied in length from 2,150 to 2,750 words. This amount is seen as sufficient for assessing abilities like rapid reading, in which the reader must scan long paragraphs to obtain the essential concepts or locate particular information. As a result, the overall word count met the criterion and was appropriate.

Regarding the question distribution based on skill level, VSTEP reading materials comprise articles and reports from a range of reliable sources, including books, journals, periodicals, and newspapers. The chosen texts covered a wide range of subjects pertinent to applicants from many disciplines and connected to the target language context (readings discussing and/or set in Asia, ASEAN, or Vietnam). The texts included in the VSTEP reading exam were carefully chosen from a variety of clearly structured texts that are used to assess different reading skills, as the test is intended for students at intermediate and high competence levels. The materials used in this exam covered a range of general topics that are thought to be relevant for the examination rather than being written for specialists.

Regarding the difficulty level of questions, it is common to find a number of unknown lexical items within these texts, ranging from technical and sub-technical vocabulary to other items that candidates may never have come across. Texts that include technical terms are always accompanied by a glossary of the terms that are thought to be unfamiliar to candidates. However, passage 3 seemed to have greater word complexity than passage 4. Standardized reading tests usually follow a pattern in which passages increase in complexity as they progress. This pattern is based on educational principles and testing conventions aimed at assessing a range of reading skills, from basic comprehension to higher-level analysis and interpretation.

Moreover, regarding assistance, all instructions were clearly written in the target language (English), and the questions appeared after the passages. Candidates were not allowed to use a dictionary during the test; instead, if a text contained technical words, a glossary would be available. In terms of method factor/response mode, candidates had the chance to become familiar with the task types and environment features (procedures and conditions) of this test, since sample tests are available for candidates to practice in advance. Nevertheless, candidates were not allowed to write their answers in their mother tongue. Questions in this test were presented in the same order in which the information appears in the texts.

The experts took all the questions into account and classified them according to their descriptions (see Appendix 1). The result was that the sections of the test analyzed generally met the requirements of levels 3-5 listed in the above description, but some questions need to be revised.

Questions 4 and 5 in particular do not seem to fully meet the requirements of the specification. The relevant description is to identify specific and clear information expressed in vocabulary and structure at level 3.

Question 4: Which blood type personality is intuitive and self-confident?

A. Type A   B. Type B   C. Type AB   D. Type O

Question 5: Which blood type personality does not let emotion affect their assessment?

A. Type A   B. Type B   C. Type AB   D. Type O

Based on the aforementioned studies by Woolley (2011) and Van Dijk and Kintsch (1983), in academic contexts, effective reading comprehension involves actively engaging with the text, understanding its main ideas, and analyzing its details. Questions posed about a passage should encourage readers to interact with the text in a meaningful way, drawing upon the specific information provided within it. This ensures a thorough understanding of the material and promotes critical thinking skills.

However, the problem with the above questions lay in the disconnect between the questions and the passage. The questions did not effectively prompt readers to engage with the passage's details, analyze its content, or demonstrate comprehension of its specific information. Because the questions asked about blood types in general without engaging with the passage itself, there was a risk that these details would be overlooked or disregarded.

Moreover, these questions gave no clear credit to the original source of the information, nor any indication that the statements were drawn from a particular text or paragraph. Because of this, it is unclear whether the material was entirely theoretical or originated from a trustworthy source. A phrase such as

Discussion

This thesis contradicts prior research findings by showing that, despite a high Cronbach's alpha of 0.83, numerous test items are misaligned with the content of the passages they are intended to assess. A high Cronbach's alpha is often seen as evidence of reliability, implying that the test items consistently measure the same concept. However, in this situation, the internal consistency does not fully represent the accuracy of the test content, since certain items are not well linked with the material offered in the passages. This mismatch indicates problems with content validity, which relates to how effectively the test items represent the intended content domain.

While Cronbach's alpha confirms the test's general reliability, this research emphasizes the necessity of content validity in ensuring that test questions are both relevant to and reflective of the construct being tested. The misaligned items indicate that some components of the exam may not fully capture the abilities or knowledge areas that it is intended to measure. This finding is especially noteworthy since it illustrates that strong internal consistency does not necessarily imply that a test is valid in terms of content.
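For reference, the standard definition of Cronbach's alpha (Cronbach, 1951) helps explain this point: the coefficient depends only on item and total-score variances, not on how well each item matches its passage. In the first line, $k$ is the number of items (40 for this sample test), $\sigma_i^2$ the variance of item $i$, and $\sigma_X^2$ the variance of the total scores. The second line uses the standardized form of alpha, an approximation introduced here for illustration and not a figure reported by the research; under that assumption, the reported value of 0.83 would correspond to an average inter-item correlation $\bar{r}$ of only about 0.11.

```latex
\begin{align*}
\alpha &= \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^{2}}{\sigma_X^{2}}\right) \\
\alpha_{\text{std}} &= \frac{k\,\bar{r}}{1 + (k-1)\,\bar{r}}
\quad\Longrightarrow\quad
\bar{r} = \frac{\alpha_{\text{std}}}{k - (k-1)\,\alpha_{\text{std}}}
        = \frac{0.83}{40 - 39 \times 0.83} \approx 0.11
\end{align*}
```

With 40 items, modest correlations among items are therefore enough to produce a high alpha, which is consistent with a reliable test still containing individual items that are poorly aligned with the content.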

The findings of this study are significant for the further development and refinement of the VSTEP exam. By identifying particular items that need to be revised, this study provides practical insights into how the exam may be enhanced to better comply with competence criteria and appropriately reflect the intended content domain. This contributes to a more nuanced understanding of the link between internal consistency and content validity, as well as the importance of continuing to evaluate and revise test questions in order to preserve reliability and validity. This study contributes to the field by thoroughly investigating the VSTEP's content validity and making specific recommendations for enhancing test item alignment.

Summary

This chapter has been concerned with the analysis of the collected data, the presentation of the findings, and the discussion of the findings. It can be seen that while the test largely met its content validity and reliability criteria, some questions did not align with the passage content and therefore cannot be considered representative of the content domain; revising them is recommended to ensure a comprehensive assessment of reading proficiency.

CONCLUSION

Recapitulation

The methodology of the research was outlined, focusing on the analysis of a VSTEP reading test sample. The sample test was selected from the National Testing Center of Educational Testing and Assessment's website, with 1000 participants randomly chosen. The empirical findings of the current research shed light on the content validity and reliability of the VSTEP 3-5 reading test. The content validity analysis involved a detailed comparison between the sample test and established guidelines, focusing on proficiency levels, subject areas, and question content. While the overall alignment with the criteria was generally satisfactory, some questions showed discrepancies, signaling areas for improvement to enhance content validity. Moreover, the reliability analysis, using several statistical measures, including Cronbach's alpha coefficient, offered insights into the internal consistency of the test items, with an overall acceptable level of reliability observed. However, further scrutiny through item analysis and fit statistics revealed that while most items fit well, some did not meet the required standards, indicating potential issues with clarity or alignment. Additionally, item-total statistics showed variations in correlations between individual items and the overall test score, underscoring the importance of understanding these correlations for maintaining test integrity. The analysis of Cronbach’s alpha if individual items were deleted highlighted the impact of each item on test reliability, emphasizing the need for careful item selection and validation. Factor analysis further supported the underlying structure of the VSTEP reading test, affirming its validity and the relationship between items and reading proficiency. By integrating findings from qualitative and quantitative analyses, recommendations were formulated to enhance the validity and reliability of the VSTEP reading test. These suggestions included revising problematic questions, addressing factors influencing reliability, and conducting thorough validation studies to ensure the accuracy and relevance of the assessment in evaluating reading proficiency among Vietnamese students. In conclusion, this discussion synthesizes insights from research questions, literature reviews, and empirical findings to provide a comprehensive understanding of content validity and reliability within the context of the VSTEP 3-5 reading test. By critically evaluating test alignment and consistency, the research contributes to improving English proficiency assessment practices in Vietnam.

Limitations

Below are some limitations of the current research, to the best knowledge of the researcher:

One limitation is the use of blind sampling from a large population without demographic information. This lack of information makes it difficult to draw precise conclusions about the population under research. Without demographic data, it is hard to determine whether the sample accurately represents the larger population or to understand how demographic factors affect the research variables. This limitation may affect how broadly the findings can be applied and limit the ability to relate the results to various demographic groups or populations.

Another limitation is the lack of detailed test specifications for the VSTEP reading test. This made it challenging to thoroughly assess the alignment of the test with its intended constructs. Access to more comprehensive specifications would give a clearer picture and allow for a more accurate analysis.

Furthermore, in the qualitative analysis conducted through group discussions, individual biases or interpretations could have influenced the results. One way to address this is to incorporate multiple perspectives or use a more structured qualitative methodology. This would help ensure a more objective and comprehensive analysis.

Suggestions for further research

Future research may make use of longitudinal designs to monitor learners' changes in English language ability over time. Studies over longer periods of time would shed light on the course of language development and the successful outcomes of learning approaches.

In addition, in order to improve the generalizability of the results, future studies should strive to enlist a wider range of participant groups, such as those of varying ages, educational backgrounds, and proficiency levels. It is beneficial to understand how people from different educational backgrounds engage with reading materials and overcome comprehension obstacles. Moreover, examining demographic elements like socio-economic status, cultural background, and language exposure can enhance our comprehension of the socio-cultural impacts on language acquisition and reading comprehension. Investigating how differences in socio-economic status, linguistic variety, and cultural practices influence learners' perspectives, motivations, and reading habits can help create equitable educational strategies and ensure fairness for candidates. This would provide a more thorough comprehension of the variables affecting the results of English language acquisition.

In addition, more validation research is required to support the current research's conclusions and demonstrate the reliability of the VSTEP reading exam as a gauge of English competence. This could involve administering the sample test to larger, more representative groups of students and carrying out further research to evaluate the validity and reliability of the outcomes.

Furthermore, qualitative research might supplement quantitative analysis to offer a better understanding of learners' reading comprehension experiences, views, and methods. Qualitative research might investigate elements like linguistic attitudes, motivation, and sociocultural impacts on language acquisition.

Pursuing these recommendations is vital to building a more thorough understanding of English language assessment and learning in the Vietnamese setting, which will eventually guide the creation of more effective language education methods and policies.

REFERENCES

Aiken, L. R. (1980). Content validity and reliability of single items or questionnaires. Educational and Psychological Measurement, 40(4), 955-959.

Aini, Q., Rahardja, U., & Naufal, R. S. (2018). Penerapan Single Sign On dengan Google pada Website berbasis YII Framework. Sisfotenika, 8(1), 57-68.

Alderson, J., Clapham, C., & Wall, D. (1995). Language test construction and evaluation. Cambridge University Press.

Allen, M. J., & Yen, W. M. (2001). Introduction to measurement theory. Waveland

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: American Educational

Anastasi, A., & Urbina, S. (1997). Psychological testing (4th edition). Upper Saddle River, New Jersey: Prentice Hall.

Azwar, S. (2017). Reliabilitas dan Validitas (Edisi 4). Yogyakarta: Pustaka Pelajar.

Babbie, E. (2016). The practice of social research. Cengage Learning.

Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice: Designing and developing useful language tests (Vol. 1). Oxford University Press.

Bachman, L. F. (1990). Fundamental Considerations in Language Testing. Oxford

Bart, A. (2008). Enhancing content validity and internal consistency reliability in standardized English proficiency tests. Journal of Language Assessment, 10(2), 123-137.

Bart, A., Smith, J., & Johnson, R. (2009). Assessing test reliability: A systematic approach. Language Testing, 15(3), 211-226.

Berchtold, A. (2016). Test–retest: agreement or reliability? Methodological

Biber, D., van Dijk, T. A., & Kintsch, W. (1986). Strategies of Discourse Comprehension. Language, 62(3), 664. https://doi.org/10.2307/415483

Bowling, A. (2005). Just one question: If one question works, why ask several? Journal of Epidemiology & Community Health, 59(5), 342-345.

Brown, H. D. (1994). Principles of language learning and teaching. Englewood

Brown, H. D., & Lee, H. (2015). Teaching by principles: An interactive approach to language pedagogy (4th ed.). New York: Pearson Education.

Brown, J. D. (2018). Testing in language programs: A comprehensive guide to English language assessment. McGraw-Hill Education.

Cohen, L., Manion, L., & Morrison, K. (2000). Research Methods in Education

Cook, D. A., & Beckman, T. J. (2006). Current concepts in validity and reliability for psychometric instruments: Theory and application. The American Journal of Medicine, 119(2), 166.e7–166.e16.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests

DeVellis, R. F. (2016). Scale Development: Theory and Applications (4th ed.). Los

DeVon, H. A., Block, M. E., Moyle‐Wright, P., Ernst, D. M., Hayden, S. J., Lazzara, D. J., & Kostas‐Polston, E. (2007). A psychometric toolbox for testing validity and reliability. Journal of Nursing Scholarship, 39(2), 155-164.

Fielding, N. G., Lee, R. M., & Blank, G. (2008). The SAGE handbook of online research methods. Sage Publications.

Fraenkel, J., Wallen, N., & Hyun, H. (1993). How to Design and Evaluate Research in Education (10th ed.). McGraw-Hill Education.

Frisbie, D. A. (1988). Reliability of scores from teacher‐made tests. Educational Measurement: Issues and Practice, 7(1), 25-35.

Fulcher, G., & Davidson, F. (2007). Language testing and assessment. London and

Haynes, S. N., Richard, D. C. S., & Kubany, E. S. (1995). Content validity in psychological assessment: A functional approach to concepts and methods. Psychological

Hughes, A. (1989). Testing for language teachers. Cambridge: Cambridge University

Imsa-ard, P. (2019). TOEIC reading section: Evaluation of the four cardinal criteria for testing a test. NIDA Journal of Language and Communication, 24(36), 91-

Koller, I., Levenson, M. R., & Glück, J. (2017). What do you think you are measuring? A mixed-methods procedure for assessing the content validity of test items and theory-based scaling. Frontiers in Psychology, 8, 126.

McMillan, J. H., & Schumacher, S. (2001). Research in education: A conceptual introduction. Longman.

McNamara, D. S., & Kintsch, W. (1996). Learning from texts: Effects of prior knowledge and text coherence. Discourse Processes, 22(3), 247-288.

Nguyen, T. (2017). Culturally relevant test content in Vietnamese standardized English assessments. International Journal of Language Education, 5(1), 45-

Nguyen, T. H., Le, H. P., & Tran, N. T. (2018). Aligning the VSTEP test with the Vietnamese English language curriculum. Journal of Language Testing and Assessment, 12(2), 45-58.

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.).

Powers, D. E. (2010). Validity: What does it mean for the TOEIC tests. ETS Research

Roever, C., & McNamara, T. (2006). Language testing: The social dimension. International Journal of Applied Linguistics, 16(2), 242-258.

Rossiter, J. R. (2008). Content validity of measures of abstract constructs in management and organizational research. British Journal of Management, 19(4), 380–388.

Sireci, S. G., & Geisinger, K. F. (1992). Analyzing test content using cluster analysis and multidimensional scaling. Applied Psychological Measurement, 16(1), 17-

Smith, M., & Smith, T. (2004). Introduction to educational administration: Standards, theories, and practice. Upper Saddle River, NJ: Pearson Merrill Prentice Hall.

Stanley, J. C., & Hopkins, K. D. (1972). Educational and psychological measurement and evaluation. New Jersey: Prentice-Hall Inc.

Taber, K. S. (2018). The use of Cronbach’s alpha when developing and reporting research instruments in science education. Research in Science Education, 48, 1273-1296.

(1991). Measurement and evaluation in psychology and education. Macmillan Publishing Co, Inc.

Tran, H. M., & Pham, T. D. (2019). Expert review of VSTEP test items: Implications for content validity. Vietnamese Journal of Language Assessment, 5(1), 32-46.

Trochim, W. M., & Donnelly, J. P. (2008). Research methods: The concise knowledge base. Cengage Learning.

Valentin, J. D. (1996). Assessing Content-Related Validity and Internal-Consistency Reliability of Tests Constructed by Seychellois Teachers. Edith Cowan University. Retrieved from https://ro.ecu.edu.au/theses_hons/733

Van Dijk, T. A., & Kintsch, W. (1983). Strategies of discourse comprehension. New York, NY: Academic Press.

Weir, C. J. (2005). Language Testing and Validation: An Evidence-Based Approach

Woolley, G. (2011). Reading Comprehension. Reading Comprehension, 15–34. https://doi.org/10.1007/978-94-007-1174-7_2

Worthen, B. R., Borg, W. R., & White, K. R. (1993). Measurement and evaluation in the schools. New York: Longman Publishing Group.

Appendix 1 Description of the reading test

Part Description Question in the test

Part 1

Three third-level (low) questions on the following sub-skills:

• Determine the reference right before or after a reference, with little or no distracting elements

• Identify specific and clear information, information expressed in vocabulary and structure at level 3

• Locate specific information (in which paragraph)

Two third-level (average) questions on the following sub-skills:

• Find and understand specific information in certain paragraphs of text (level 3 or 4)

• Recognize clear information expressed differently from the original text

One third-level (high) question on the following sub-skills:

• Understand the main meaning of a text

• Identify complex references in the text (with distracting elements)

• Understand the important meanings of the article

(vocabulary, structure at level 3 or 4; readings with texts and clear texture on relatively familiar topics)

Two fourth-level (low) questions on the following sub-skills:

• Guess the meaning of new words in context

• Identify the viewpoint, the attitude expressed by clear information

• Identify the main content/purpose of the entire text

(fourth-level passage, clearly structured and reasoned)

One fourth-level (average) question on the following sub-skills:

• Understand the purpose, function of a clearly structured text

• Find and understand information (specific or not) – information that is scattered throughout the text (requires reading the entire text, text with less familiar topics)

• Identify a secondary idea for a thesis or a key idea

(vocabulary, structure at fourth-level; readings with more familiar themes)

One fourth-level question on the following sub-skills:

• Recognize details/information interpreted in a different way (with implicit discourse elements)

• Identify the position, the viewpoint, the author's intentions (implicit discourse)

Part 2

One third-level (low) question of the following sub-skills:

• Determine the referent right before or after a reference, with little or no interference

• Identify specific and clear information expressed in vocabulary and structure at level 3

• Locate specific information (in which paragraph)

Two third-level (average) questions of the following sub-skills:

• Find and understand specific information in certain paragraphs

• Recognize clear information expressed differently from the original text

One third-level (high) question of the following sub-skills:

• Understand the important ideas in the article (vocabulary and structure at level 3 or 4; clearly structured texts on relatively familiar topics)

One fourth-level (low) question of the following sub-skills:

• Determine the structure or organization of a clearly structured text

• Understand the main meaning of a paragraph (fourth-level text with less familiar topics)

Two fourth-level (average) questions of the following sub-skills:

• Understand the purpose and function of clearly structured texts

• Find and understand information (specific or not) that is scattered throughout the text (requires reading the entire text; text with a less familiar topic)

• Understand the logic of sentences in the text based on linking devices (references, synonyms, repetitions, etc.)

• Identify complex references (fourth- or fifth-level texts with relatively complex structure and argumentation, on unfamiliar topics)

• Identify a secondary idea for a thesis or a key idea

(vocabulary and structure at the fourth level; readings on familiar topics)

One fourth-level (high) question of the following sub-skills:

• Recognize details/information interpreted in a different way (with implicit discourse elements)

• Identify the main content or purpose of the entire text

(fourth-level text, complexly structured and argued, on unfamiliar topics)

• Identify the author's position, viewpoint, and intentions

• Understand the general tone of the entire text (fourth- or fifth-level texts with relatively complex structure and argumentation, on less familiar topics)

Two fifth-level (low) questions of the following sub-skills:

• Identify the meaning of a sentence/a detail

• Identify the purpose of a piece of information or an argument

• Infer the meaning of a word in context

• Recognize details/information interpreted in a different way (with syntax elements, vocabulary used at level 5)

Part 3

One third-level (average) question of the following sub-skills:

• Recognize clear information expressed differently from the original text

Two third-level (high) questions of the following sub-skills:

• Understand the main meaning of a text

• Understand the important meanings of the article

(vocabulary and structure at level 3 or 4; texts with a clear structure on relatively familiar topics)

• Understand the main meaning of a paragraph (fourth-level text with a less familiar topic)

One fourth-level (average) question of the following sub-skills:

• Find and understand information (specific or not) that is scattered throughout the text (requires reading the entire text; text with a less familiar topic)

• Understand the logic of sentences in the text based on linking devices (references, synonyms, repetitions, etc.)

• Identify complex references (fourth- or fifth-level texts with relatively complex structure and argumentation, on unfamiliar topics)

Two fourth-level questions of the following sub-skills:

• Recognize details/information interpreted in a different way (with discourse elements)

• Identify a secondary idea for a thesis or a main idea

(fourth- or fifth-level text with a less familiar topic)

(fourth- or fifth-level text with more complex structures and arguments on less familiar topics)

• Understand the general tone of the entire text (fourth- or fifth-level texts with relatively complex structure and argumentation, on less familiar topics)

Two fifth-level (low) questions of the following sub-skills:

• Identify the meaning of a sentence/a detail

• Recognize details/information interpreted in a different way (with meaningful elements, vocabulary used at level 5)

One fifth-level (average) question of the following sub-skills:

• Understand precisely a subtle/complex detail of attitude, reasoning, or opinion in the text

• Recognize details/information interpreted in a different way

(expressions in the original text, rephrasings, and/or fifth-level vocabulary and expression options)

• Find and understand information that is scattered throughout the text (requires reading the entire text; highly specialized text at level 4 or 5)

Part 4

Two third-level questions of the following sub-skills:

• Guess what new words mean in context

Two fourth-level questions of the following sub-skills:

• Recognize details/information interpreted in a different way (with discourse elements)

• Identify a secondary idea for a thesis or a main idea

(fourth- or fifth-level text with a less familiar topic)

(fourth- or fifth-level text with more complex structures and arguments on less familiar topics)

• Understand the general tone of the entire text (fourth- or fifth-level texts with relatively complex structure and argumentation, on less familiar topics)

One fifth-level (low) question of the following sub-skills:

• Identify the meaning of a sentence/a detail

• Recognize details/information interpreted in a different way (with meaningful elements, vocabulary used at level 5)

Two fifth-level (average) questions of the following sub-skills:

• Understand precisely a subtle/complex detail of attitude, reasoning, or opinion in the text

• Recognize details/information interpreted in a different way

(expressions in the original text, rephrasings, and/or fifth-level vocabulary and expression options)

• Find and understand information that is scattered throughout the text (requires reading the entire text; highly specialized text at level 4 or 5)

Two fifth-level (high) questions of the following sub-skills:

• Identify views and attitudes expressed verbally

• Identify the meaning of a detailed sentence (complex and/or highly specialized fifth-level text)

• Find and understand information that is scattered throughout the text (requires reading the entire text; specialized text or literature at level 5)

• Understand the logic of an argument

Appendix 2 The reading sample test published by National Testing Center of

Title	Assessing Content Validity and Internal Consistency Reliability of a Vietnamese Standardized Test of English Proficiency (VSTEP.3-5) Reading Test
Author	Nguyễn Thị Minh Ngọc
Supervisor	Dr. Nguyễn Thị Ngọc Quỳnh
Institution	Vietnam National University Hanoi, University of Languages and International Studies, Faculty of Post-Graduate Studies
Major	English Language Teaching Methodology
Type	M.A. Minor Programme Thesis
Year of publication	2024
City	Hanoi