
DOCUMENT INFORMATION

Basic information

Title: An Investigation into the Cognitive Validity of the Speaking Section of the Vietnamese Standardized Test of English Proficiency (VSTEP.3-5)
Author: Nguyễn Thị Mai Hữu
Supervisors: Professor Hoa Nguyen, Professor Fred Davidson, Dr. Karen Ashton
Institution: Vietnam National University, Hanoi, University of Languages and International Studies
Specialization: English Language Teaching Methodology
Document type: Thesis
Year: 2022
City: Hanoi
Format
Number of pages: 257
File size: 2.79 MB

Structure

  • CHAPTER I: INTRODUCTION
    • 1.1. Rationale
    • 1.2. Objectives of the study
    • 1.3. Significance of the study
    • 1.4. Scope of the study
    • 1.5. Organization of the study
  • CHAPTER II: LITERATURE REVIEW
    • 2.1. The concept of validity
    • 2.2. Validation in language testing
    • 2.3. The concept of cognitive validity
    • 2.4. Cognitive validity and testing speaking
      • 2.4.1. Assessing speaking
      • 2.4.2. Cognitive validity in assessing speaking
      • 2.4.3. A framework for establishing the cognitive validity in testing speaking
    • 2.5. The socio-cognitive framework in different studies
  • CHAPTER III: THE VIETNAMESE STANDARDIZED TEST OF ENGLISH PROFICIENCY
    • 3.1. Decision to provide the VSTEP.3-5
      • 3.1.1. The National Foreign Languages Project
      • 3.1.2. The VSTEP.3-5
      • 3.1.3. The VSTEP.3-5 test taker characteristics
    • 3.2. The VSTEP.3-5 development
      • 3.2.1. The VSTEP.3-5 format
      • 3.2.2. The construct of the VSTEP.3-5 speaking section
      • 3.2.3. The VSTEP.3-5 speaking specifications
      • 3.2.4. The VSTEP.3-5 item writer training
      • 3.2.5. The VSTEP.3-5 examiners training
    • 3.3. Test use
      • 3.3.1. Test assembling
      • 3.3.2. Test administering
    • 3.4. VSTEP.3-5 marking, grading and reporting results
  • CHAPTER IV: RESEARCH METHODOLOGY
    • 4.1. Research questions
    • 4.2. Research design
    • 4.3. Selection of participants
    • 4.4. Data collection instruments
      • 4.4.1. VSTEP.3-5 test format and related documents
      • 4.4.2. A VSTEP.3-5 speaking test form
      • 4.4.3. Focus groups
      • 4.4.4. Survey questionnaires
      • 4.4.5. Stimulated-recall interviews
    • 4.5. Data analysis
  • CHAPTER V: THE COGNITIVE PROCESSES SUPPOSEDLY REPRESENTED IN THE VSTEP.3-5 SPEAKING SECTION
    • 5.1. The cognitive processes presented in the VSTEP.3-5 speaking section
      • 5.1.1. The VSTEP.3-5 test development reports and related documents
      • 5.1.2. Focus group interviews
    • 5.2. Findings
      • 5.2.1. Conceptualization
      • 5.2.2. Grammatical encoding
      • 5.2.3. Phonological encoding
      • 5.2.4. Phonetic encoding, articulation
      • 5.2.5. Self-monitoring
  • CHAPTER VI: THE CALIBRATION OF COGNITIVE DEMANDS IN THE VSTEP.3-5 SPEAKING RATING SCALE
    • 6.1. Many-facet Rasch Model
    • 6.2. Correlation study
    • 6.3. T-test study
    • 6.4. Descriptor difficulty arrangement
    • 6.5. Findings
  • CHAPTER VII: THE SPEAKING COGNITIVE PROCESSES IN VSTEP.3-5 TEST AND NON-TEST CONDITIONS
    • 7.1. General information about the survey and stimulated recalls
    • 7.2. The cognitive processes in VSTEP.3-5 speaking test and non-test conditions
      • 7.2.1. Survey questionnaires
      • 7.2.2. Stimulated Recall Interviews
    • 7.3. Findings
  • CHAPTER VIII: CONCLUSION
    • 8.1. Summary of the findings
      • 8.1.1. Cognitive processes that the VSTEP.3-5 test takers are supposed to experience
      • 8.1.2. Calibration of cognitive demands across different levels of the VSTEP.3-5
      • 8.1.3. Similarities between cognitive processes in VSTEP.3-5 speaking test and non-test conditions
    • 8.2. Implications for VSTEP.3-5 speaking section administration and new language test development
      • 8.2.1. VSTEP.3-5 development and administration
      • 8.2.2. Language test development and validation
      • 8.2.3. Weir's socio-cognitive framework
    • 8.3. Limitations and further studies

Content

An investigation into the cognitive validity of the speaking section of the Vietnamese Standardized Test of English Proficiency (VSTEP.3-5) = Nghiên cứu giá trị xác thực đối với quá trình tư duy của thí sinh

INTRODUCTION

Rationale

The University of Languages and International Studies (ULIS) is a research university in languages, linguistics, international studies and related social sciences and humanities. ULIS's mission is to contribute to national development through the pursuit of excellence in education, research, and the provision of quality services relevant to social demands: "We particularly value creativity, dynamics, honesty, responsibility, and capability to work and gain success in a multicultural and competitive globalizing world." 1 ULIS was appointed by the National Foreign Languages Project of the Vietnamese Ministry of Education and Training to develop the English language test for three levels of English language competence, corresponding to the B1, B2, and C1 levels of the CEFR (see Appendix 1 for the VSTEP.3-5 development team members). The test was named the Vietnamese Standardized Test of English Proficiency (VSTEP.3-5). The VSTEP.3-5 was designed to target a population of test-takers at post-secondary levels, such as undergraduate and graduate students of Vietnam's colleges and universities, and working people in different fields.

The VSTEP.3-5, like other high-stakes language tests, shall be used predictively, e.g., to show that an individual is capable of performing in a particular job, class or academic setting. This places a responsibility on the test designers to ensure that the test elicits behaviors similar to those that occur in a real-world context. When developed, the VSTEP.3-5 test was validated, and the validation results showed evidence of the construct validity of the test, which was significant for the test stakeholders to confidently use the test scores.

Nevertheless, as with all other language tests 2, validity evidence should continue to be developed while the tests are in use. Although the VSTEP.3-5 was validated at the development stages and the validation results were widely published, further validation studies have been conducted 3. For this purpose, the cognitive validity of the speaking section of the VSTEP.3-5 test was investigated to establish the extent to which the cognitive processes that the VSTEP.3-5 speaking section elicits from a candidate resemble the processes that he/she would employ in non-test conditions, to study whether the VSTEP.3-5 speaking section actually covers the processes that it is supposed to measure, and whether the cognitive factors imposed by the VSTEP.3-5 speaking section are appropriately calibrated to reflect the levels of language competence of the test-takers.

1 Retrieved from http://ulis.vnu.edu.vn/english/taxonomy/term/1/6 on February 9th, 2014

2 The Cambridge Research Notes series is a typical example.

3 Such as the study of the validity of the cut-scores of the listening section of the test conducted by Yen Nguyen (2017).

Once established, the cognitive validity shall provide models of the mental processes underlying the speaking construct of the VSTEP.3-5, and shall strengthen the confidence of VSTEP.3-5's stakeholders in using the scores of VSTEP.3-5 test takers. In order to establish the cognitive validity of the VSTEP.3-5, various methods shall be used, the findings of which will provide implications for the commissioning of VSTEP.3-5 speaking items/tasks and the administration of the VSTEP.3-5 speaking section, which in turn helps improve the validity of the test and the stakeholders' confidence in the use of the test scores.

Objectives of the study

The study is aimed at establishing the cognitive validity of the speaking section of the VSTEP.3-5, that is, the cognitive processes imposed by the test, whether the scores of the section can reflect those cognitive processes, and whether the scores can reflect the real-life processes that the test takers may experience. For this aim, the research questions of the study were designed to establish the speaking cognitive validity of the test as a predictor of real-life performance, based on the central issues that a language test must deal with in terms of its cognitive validity (Field, 2013):

- RQ1: To what extent does the VSTEP.3-5 speaking section actually cover the cognitive processes that it is supposed to represent?

- RQ2: To what degree are the cognitive demands imposed in the VSTEP.3-5 speaking section appropriately calibrated to reflect the levels of language competences of the test-takers?

- RQ3: How closely do the cognitive processes that the VSTEP.3-5 speaking section elicits from a candidate resemble the processes that he/she would employ in non-test conditions?

Significance of the study

The study focuses on cognitive validity, which shall provide the model of mental processes underlying the speaking construct of the VSTEP.3-5. This helps the test stakeholders improve the qualities of the test and thereby improves the confidence of VSTEP.3-5's stakeholders in using the scores of VSTEP.3-5 test takers. In order to establish the cognitive validity of the VSTEP.3-5, various methods shall be used, which shall shape an approach to validation studies in language testing in relation to cognitive validity, not only in Vietnam but in the world as well.

Scope of the study

The study shall focus on the cognitive validity of the speaking section of the VSTEP.3-5 test. In the study, the VSTEP.3-5 speaking section format and specifications, the task writers' guidelines, the oral examiners' training manual, one test form, and the test scores of the test takers taking that test form shall be studied. The conclusions drawn from the study shall therefore be relevant to the VSTEP.3-5 speaking section, and similar application may be appropriate for the speaking sections of other language tests, taking into consideration the procedures by which the study is conducted.

Organization of the study

The study is divided into eight chapters as follows:

Chapter 1: Introduction, introducing the research topic, its rationale, aims, significance, scope, and the organization of the research

Chapter 2: Literature Review, discussing the theoretical background in the light of which the research matters will be discussed

Chapter 3: The Vietnamese Standardized Test of English Proficiency, describing the VSTEP.3-5 testing cycle

Chapter 4: Methodology, describing the methods applied to investigate the research matters

Chapter 5: The cognitive processes supposedly represented in the VSTEP.3-5 speaking section, stating the data analysis, discussion and findings for the research question "To what extent does the VSTEP.3-5 speaking section actually cover the cognitive processes that it is supposed to represent?"

Chapter 6: The calibration of cognitive demands imposed in the VSTEP.3-5 speaking rating scale, stating the data analysis, discussion and findings for the research question "To what degree are the cognitive demands imposed in the VSTEP.3-5 speaking section appropriately calibrated to reflect the levels of speaking competences of the test takers?"

Chapter 7: The speaking cognitive processes in VSTEP.3-5 test and non-test conditions, stating the data analysis, discussion and findings for the research question "How closely do the cognitive processes that the VSTEP.3-5 speaking section elicits from a candidate resemble the processes that he/she would employ in non-test conditions?"

Chapter 8: Conclusion, summarizing the overall study, proposing some recommendations with regard to the levels of speaking cognitive processes imposed on the test-takers in the test conditions, and suggesting further studies in the field

LITERATURE REVIEW

The concept of validity

The term validity has a substantial history. In the first edition of Educational Measurement, Cureton (1951) stated that "the essential question of test validity is how well a test does the job it is employed to do." He also mentioned that "validity has two aspects, which may be termed relevance and reliability." Relevance concerns the closeness of agreement between the actual test scores and the true criterion scores, whilst reliability refers to the consistency of scores across replications (Cureton & Lindquist, 1951, pp. 621-623).

According to Cronbach and Meehl (1955), validity includes construct validity, concurrent validity, predictive validity, criterion-related validity and content validity, among which construct validity is the degree to which "a test could be interpreted as a measure of some attribute or quality which is not operationally defined" (Cronbach & Meehl, 1955). 4 Though Cronbach and Meehl were among the first researchers to introduce construct validity in educational measurement, this definition of construct validity has not been prevalent in contemporary testing, where language testing is considered "a method of operationalizing the inferences of both student assessment and of research" (Davidson et al., 1985, p. 137). Construct validity is studied when the tester needs to prove that an element is valid with respect to the underlying construct of the test, the test behavior and the corresponding score.

4 In modern conceptions of validity, one job of the test developer is to provide exactly that operational development.

In 1961, Lado provided the first significant contribution to language testing by applying the term validity to language testing. In his study, validity is considered one of the most important qualities of a language test. Validity is "a matter of relevance": a test is considered valid when the test content and test conditions are relevant, when there are no "irrelevant problems which are more difficult than the problems being tested", and when the test is relevant to "what it claims to measure" (Lado, 1961). Cronbach looked at validity in the same way when he mentioned that "every time an educator asks 'but what does the instrument really measure?' he is calling for information on construct validity" (Cronbach, 1971, p. 463).

Campbell and Fiske (1959) introduced two forms of reasoning about validity: convergent and discriminant. Convergent validity demonstrates that measures that theoretically should be related to each other are, in fact, observed to be related to each other, whilst discriminant validity shows that measures that theoretically should not be related to each other are, in fact, observed not to be related to each other (D. T. Campbell & Fiske, 1959). The concepts of internal validity and external validity were also used by Campbell and Stanley (1966) in the field of experimental design to describe and investigate the causes of the results of a particular study and the way in which independent and dependent variables are linked together in a cause-effect relationship. Internal validity is the essential validity and is specific to a particular case, whilst external validity asks the question of generalizability, or to what extent the findings of a particular case can be applied to different groups or settings. While internal and external validity were first introduced as part of experimental design, they have also achieved relevance to broader concerns in measurement.
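
As a purely illustrative sketch (not drawn from Campbell and Fiske's data or from this thesis), the convergent/discriminant logic can be expressed as a check on a correlation matrix: measures that are supposed to tap the same trait should correlate highly, while measures of supposedly unrelated traits should not. All scores and variable names below are hypothetical.

```python
import numpy as np

# Hypothetical scores for the same group of test takers: two speaking measures
# that should converge, and an unrelated measure that should diverge.
speaking_interview = np.array([55, 62, 47, 70, 58, 66, 49, 73])
speaking_monologue = np.array([53, 65, 45, 72, 60, 63, 51, 70])
maths_score = np.array([60, 71, 55, 58, 73, 62, 69, 57])

def r(x, y):
    """Pearson correlation between two score vectors."""
    return np.corrcoef(x, y)[0, 1]

convergent = r(speaking_interview, speaking_monologue)   # expected to be high
discriminant = r(speaking_interview, maths_score)        # expected to be near zero

print(f"convergent r = {convergent:.2f}, discriminant r = {discriminant:.2f}")
```

With these invented numbers the two speaking measures correlate strongly while the speaking-maths correlation is weak, which is the pattern Campbell and Fiske's reasoning treats as convergent and discriminant evidence.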

According to Messick (1989), validity is "an overall evaluative judgement of the degree to which evidence and theoretical rationales support the adequacy and appropriateness of interpretations and actions based on test scores and other modes of assessment" (Messick, 1989, p. 13). The key concept is 'score meaning', defined as a "construction that makes theoretical sense out of both the performance regularities summarized by the score and its pattern of relationships with other variables"; the psychometric literature views the fundamental issue as construct validity (Messick, 1996, p. 245). Messick's view was important because it provided a new understanding of validity as compared to the traditional definitions of validity. His work remains dominant in validity theory to this day.

For language testers, Messick's views have been developed by Bachman (1990), who claims that the validity of a given use of test scores is the outcome of a complex process that must include "the analysis of the evidence supporting that interpretation or use, the ethical values which are the basis for the interpretation or use but also the test takers' performance" (Bachman, 1990, p. 237). Starting from Messick's "progressive matrix", Bachman focused on construct validity and on the "value implications of interpreting the score in a particular way" by considering the theories of language and the relevant educational and social ideologies we attach to the score interpretation (Bachman, 1990, p. 243). Bachman drew on Messick's theories and started from the analysis of the evidential "basis of validity", which he refers to as the gathering of complementary types of evidence in the process of validation to support the relationship between test score and interpretation and use. As far as the consequential basis of validity is concerned, Bachman argued that tests are not designed and used in a "value-free psychometric test-tube" (Bachman, 1990, p. 279) but that they meet the needs of an educational system or of the whole society, for which we must assume the potential consequences of testing.

From the definitions mentioned, the term validity relates closely to the interpretations of test scores, and ultimately it comes down to the construct of a test, which is the most important element in determining the validity of a test. More recently, Weir (2005) developed a summary of validity, which is categorized into four attributes:

(1) Validity in general is a matter of whether a test really measures what it claims to measure

(2) Validity resides in test scores

According to Weir (2005), "validity is perhaps better defined as the extent to which a test can be shown to produce data, i.e., test scores, which are an accurate representation of a candidate's level of language knowledge or skills" (Weir, 2005, pp. 12-15). He also states that "validity resides in the scores on a particular administration of a test rather than in the test per se" (Weir, 2005, p. 12). In his opinion, a test can be valid over time or across versions if various versions of a test or administrations of the same test provide similar results over time.

(3) Validity is multifaceted

Different types of evidence are needed to support any claims for the validity of scores on a test. These are not alternatives but complementary aspects of an evidential basis for test interpretation. He mentions that "no single validity can be considered superior to another" and stresses the importance of the different elements of validity, including theory-based validity (which is later named cognitive validity), context validity, scoring validity, consequential validity, and criterion-related validity.

(4) Validity is a matter of degree

He mentions Messick's idea that "it is important to note that validity is a matter of degree, not all or none." He takes a single aspect of validity, content coverage, to illustrate Messick's idea: a test may not provide a perfect fit in terms of appropriate operations and conditions, while another version of the test may demonstrate a stronger match with the test specification. A test's claims to validity may also differ across the types of validity evidence generated in relation to a single administration of one version of the test. He concludes that "validity should be viewed as a relative concept" (Weir, 2005, p. 15).

Within the scope of this study, theory-based validity, or cognitive validity, is the core concept and will be discussed in more detail in the coming parts.

Validation in language testing

In the second edition of Educational Measurement, Cronbach (1971) defined validation as follows: "validation is the process of examining the accuracy of a specific prediction or inference made from a test score… More broadly, validation examines the soundness of all interpretations of a test – descriptive and explanatory interpretations as well as situation-bound predictions" (Cronbach, 1971, p. 443).

In 1996, Messick stated that "test validation is empirical evaluation of the meaning and consequences of measurement, taking into account extraneous factors in the applied setting that might erode or promote the validity of local score interpretation and use" (Messick, 1996, p. 245).

Kane (2006) defined validation as "the process of evaluating the plausibility of proposed interpretations and uses", and validity as "the extent to which the evidence supports or refutes the proposed interpretations and uses" (Kane, 2006).

Validation processes have been developed and introduced by different scholars, including Cronbach and Meehl (1955), Cronbach (1971), and Messick (1989), in publications quoted in many validation studies (Cronbach & Meehl, 1955; Loevinger, 1957; Cronbach, 1971; Messick, 1989).

Thus, a typical validation process involves the development of evidence to support the proposed interpretation and use, and is associated with an evaluation of the extent to which the proposed interpretations are plausible and appropriate. Accordingly, to validate an interpretation or use is to evaluate its overall credibility. Given the extensive research literature, validation has developed into different models.

Between 1920 and 1950, criterion validity came to be considered the standard for validity. Validation was to address the question of how well a test estimates the criterion, which could be defined in terms of "performances of the actual task" (Cureton & Lindquist, 1951, p. 623). A test was considered valid for any criterion for which it provided accurate estimates (Gulliksen, 1950). Under the criterion model, the criterion variable of interest was assumed to have a definite value, which the test was to estimate as accurately as possible. Given this goal, it was natural to conceive of validity as the relation between test scores and criterion scores. The criterion model was developed into two versions, concurrent and predictive. Concurrent validity studies employed criterion scores obtained at about the same time as the test scores and could be used to validate a proxy measure that would be cheaper, easier, or safer than the criterion. Predictive validity studies employed a criterion of future performance, which was not available at the time of testing (Kane, 2006). The criterion model worked particularly well in those cases where a criterion was available; for example, if a test was used to predict some future performance, an evaluation of actual performance could be used as the criterion. For admissions, placement, and employment testing, the criterion model is a preferred approach.
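
As a minimal, purely illustrative sketch (the numbers and variable names below are invented, not taken from any real validation study), the criterion relation can be made concrete: the validity coefficient under this model is simply the correlation between test scores and criterion scores, and the test is used as a linear estimator of the criterion.

```python
import numpy as np

# Hypothetical data: language test scores at admission and a criterion
# collected later, e.g. instructor ratings of classroom performance (1-5).
test_scores = np.array([48, 55, 61, 39, 70, 52, 66, 44, 58, 73])
criterion = np.array([3.1, 3.4, 3.9, 2.6, 4.5, 3.3, 4.1, 2.9, 3.6, 4.6])

# Predictive validity coefficient: correlation between test and criterion.
validity_coefficient = np.corrcoef(test_scores, criterion)[0, 1]

# The criterion model treats the test as an estimator of the criterion,
# so a simple linear rule can be fitted and applied to new test takers.
slope, intercept = np.polyfit(test_scores, criterion, deg=1)
predicted_for_60 = slope * 60 + intercept

print(f"validity coefficient = {validity_coefficient:.2f}")
print(f"predicted criterion for a test score of 60 = {predicted_for_60:.2f}")
```

A concurrent validity study would take the same form, except that the criterion scores are gathered at about the same time as the test scores rather than later.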

In the 1950s, the content model was developed. The content model interprets test scores, based on a sample of performance, as an estimate of overall performance in the domain. The content model rests on three assumptions: (a) the observed performance can be considered a representative sample from the domain, (b) the performances are evaluated appropriately and fairly, and (c) the sample is large enough to control sampling error (Guion, 1977).

The content model has most frequently been applied to measures of academic achievement. A content domain is outlined in the form of a test specification plan or blueprint, which may involve several dimensions (e.g., content per se, cognitive level, item type), with different numbers of items assigned to each element in the plan. The items are not sampled from the domain; they are created to match the test specifications (Loevinger, 1957), and "to the extent that they do, they may be considered to be representative of the content domain described by the test plan" (Kane, 2006).

In 1955, Cronbach and Meehl outlined construct validity in terms of the hypothetico-deductive model of scientific theories, in which a theory consists of a network of relationships linking theoretical constructs to each other and to observable attributes (Cronbach & Meehl, 1955). Loevinger (1957) subdivided construct validity into its substantive, structural, and external components. The substantive component focuses on a theory-based analysis of test content, the structural component focuses on the internal structure of the test in relation to that of the target construct, and the external component focuses on relationships to other test and non-test variables and on potential sources of systematic error (Loevinger, 1957).

Cronbach and Meehl (1955) presented construct validity as an alternative to the criterion and content models; it was to be used "whenever a test is to be interpreted as a measure of some attribute or quality which is not operationally defined" and "for which there is no adequate criterion" (Cronbach & Meehl, 1955). Loevinger (1957) stated that "since predictive, concurrent, and content validities are all essentially ad hoc, construct validity is the whole of validity from a scientific point of view." By the early 1980s, the construct model was widely accepted as a general approach to validity (Anastasi, 1986; Embretson, 1983; Guion, 1977; Messick, 1989).

Since the late 1980s, construct validity has encompassed all evidence for validity, including content and criterion evidence, reliability, and the wide range of methods associated with theory testing. Messick adopted a broadly defined version of the construct model as a unifying framework for validity. He incorporated the content model in a subsidiary role, supporting the relevance of test tasks to the constructs of interest, and he treated the criterion model as a constituent methodology for validating secondary measures of a construct against its primary measures (Messick, 1989).

The unified validity framework was developed based on two main interconnected facets in order to highlight the inferences and decisions made from test scores. One facet is "the source of justification of the testing, being based on appraisal of either evidence or consequence", whilst the other facet is represented "by function or the outcome of the testing, being either interpretation or use" (Messick, 1989, p. 20).

He associated the evidential basis for score interpretation with construct validity, and the evidential basis for score use with construct validity plus relevance or utility. That is, all interpretations of test scores are to be supported by construct validation, and the appropriateness of the scores for a particular purpose is to be evaluated in terms of the relevance of the construct to the purpose at hand. Under this model, the evaluation of test use is a two-step process, from score to construct and from construct to use.

The concept of cognitive validity

The concept of cognitive validity was first introduced in the 1990s by Baxter and Glaser as a part of construct validity. As a strand of construct validity, it addresses the extent to which a test requires a candidate to engage in cognitive processes that resemble or parallel those that would be employed in non-test circumstances.

Cognitive validity has had a considerable impact on recent studies in educational measurement in the USA and UK, where it has been applied to tests of scientific and logical reasoning (Baxter & Glaser, 1998) and to language testing (Weir, 2005). It has also been used to evaluate concept mapping (Ruiz-Primo, Schultz, Li, & Shavelson, 2001).

In language testing and assessment, the concept of cognitive validity was developed by Weir (2005) in his socio-cognitive approach to test validation. Cognitive validity, as described by Weir (2005), is similar to interactiveness as conceived by Bachman and Palmer (1996), which concerns the extent and type of involvement of a test taker's language ability, topical knowledge, and affective schemata in performing a language task (Bachman & Palmer, 1996). What is important is that the cognitive processing involved in real-life language use should be reflected as far as possible in language test situations if claims for validity are to be supported. Cognitive validity is, according to Field (2013), of particular concern in the case of tests whose scores are employed predictively, for instance to indicate a test taker's suitability for a future university place or for a job in a particular domain. Evidence of cognitive validity is often collected by studying test-takers' behaviors using various types of verbal reporting (e.g., introspective, immediate retrospective, and delayed retrospective) to elicit their comments on what they actually do in a speaking (Field, 2011), listening (Field, 2013), reading (Khalifa & Weir, 2009), or writing test (Shaw & Weir, 2007). In addition, a test's cognitive validity can be investigated by studying the way an expert reader/writer/listener/speaker behaves, or the processes that he/she uses, in performing the same reading/writing/listening/speaking task in a non-testing or real-world situation (Field, 2013).

In general, according to Field (2013), cognitive validity also examines "how finely the relevant processes are graded across the levels of the suite in terms of the cognitive demands that they impose upon the candidate." The term "cognitive demand" here is a typical term in cognitive science, referring to the demand placed on cognitive abilities through the dimensions of complexity, openness, implicitness, and level of abstraction (Edwards & Dall'Alba, 1981), or, as stated by Stein, "the kind and level of thinking required of students in order to successfully engage with and solve a task" (Stein, Smith, Henningsen, & Silver, 2009). As also mentioned by Field (2013), cognitive validity considers the cognitive load that test takers may encounter when taking a test; in this context, the term cognitive load refers to the level of cognitive demands involved in processing information.

Within the scope of the study, the cognitive validity of testing speaking is the central issue and is discussed in further detail in the coming parts.

Cognitive validity and testing speaking

The ability to speak in a foreign language shows that a person has the competence to use the language. When speaking, one needs to express his/her thoughts, and so his/her command of the language is exposed. To be able to speak in a foreign language, one should have a proper command of the sound system and appropriate vocabulary, and understand what is being said to him/her and what should be uttered. Speaking in a foreign language is therefore difficult, and developing the competence to speak a foreign language is a long journey. Because speaking is done in real time, learners' abilities to plan, process and produce the foreign language are of utmost importance, and so is the assessment of speaking.

Accordingly, speaking is "the most difficult language skill to assess reliably," according to Alderson and Bachman as cited by Luoma (Luoma, 2004). When assessing speaking, the assessor has to make judgements about a range of aspects of what is being said. Bachman points out that any test must be developed based on a clear definition of the abilities that are to be assessed (Bachman, 1990, p. 81). In foreign language testing, the framework of communicative competence by Canale and Swain (1980) and the model of communicative language ability by Bachman and Palmer (1996) have been applied as the theoretical background for language test development, especially in assessing the speaking skill.

According to Luoma, when assessing speaking, the elements of speaking ability should be studied in detail. They are the sound of speech, grammar and spoken structures, vocabulary and spoken words, features of speech production, and functions of speaking (Luoma, 2004). The testing of pronunciation (both segmentals and suprasegmentals), spoken grammar, spoken vocabulary, and even sociolinguistic applications of speech all fall into the construct of speaking. These features are fundamental when designing and developing tests, and they form the construct of the test.

Lado wrote: "the ability to speak a foreign language is without doubt the most highly prized skill, and rightly so… Yet testing the ability to speak a foreign language is perhaps the least developed and the least practiced in the language testing field" (Lado, 1961, p. 239).

Speaking is the verbal use of language to communicate with others. The purposes for which we wish to communicate with others are innumerable, and as this is not a dissertation about human needs and desires, we will not attempt to list them.

Any construct of speaking or speech is obviously going to be multifaceted. And however much we may try to define and clarify it, the kinds of choices that a second language speaker makes are going to be influenced by the totality of their current understanding, abilities (personal and cognitive), language competence and speech situation. The choices are also going to be influenced by the person or persons they are talking to. What kind of feedback is being given? How does the other person contribute to the conversation?

According to Fulcher, the construct of speaking includes sound features (pronunciation and intonation, accuracy and fluency), psychological aspects, speaking strategies (achievement and avoidance), structural aspects of speaking including opening and closing conversations and turn-taking, pragmatic aspects of speaking, vocabulary, and the co-construction of discourse (Fulcher, 2003).

Field (2004), developing the model of speech production following Levelt (1989, 1999), makes clear that any model of speech production, whether in L1 or L2, needs to incorporate a number of stages:

- A conceptual stage, where the proposition that is to be expressed first enters the mind of the speaker

- A syntactic stage, where the speaker chooses an appropriate frame into which words are to be inserted, and marks parts of it for plural, verb agreement, etc.

- A lexical stage, where a meaning-driven search of the speaker's lexicon or vocabulary store takes place, supported by cues as to the form of the word (i.e., its first syllable)

- A phonological stage, where the abstract information assembled so far is converted into a speech-like form

- A phonetic stage, where features such as assimilation are introduced

- An articulation stage, in which the message is uttered

In this model, it is important to note that the first three of these stages are abstract and are not in verbal form; it is only at stage four that linguistic forms become involved. A model of speaking also needs to include:

- A forward planning mechanism at discourse level, which (for example) marks out in advance which syllable is to carry sentence stress

- A buffer, in which an articulatory plan for the current utterance can be held while the utterance is actually being produced

- A monitoring mechanism, which enables a speaker to check an utterance for accuracy, clarity and appropriacy immediately before it is uttered and almost immediately afterwards

2.4.2 Cognitive validity in assessing speaking

Among the facets of validity mentioned in Weir's socio-cognitive framework is cognitive validity. According to Weir (2005), cognitive validity is established both by a priori evidence on the cognitive processing activated by the test task before the live test event, and through the more traditional a posteriori evidence on the constructs measured, involving statistical analysis of scores following test administration. "Language test constructors need to be aware of the established theory relating to the cognitive processing that underpins equivalent operations in real-life language use" (Taylor, 2011). Based on Weir's account of cognitive validity, Field (2011) adapted Levelt's model of speech production (1999) to develop a cognitive validity framework for speaking, as below:

Figure 1: Field's model of separating levels of processing from outputs of processing (adapted version of the Levelt model, 1989). The figure's labels include constructing a syntactic frame, forming links to lexical entries, conversion to instructions to the articulators, and cues stored in a speech buffer.

The cognitive stages considered in the model are:

- Conceptualization: generating an idea or set of ideas for expression

- Grammatical encoding: constructing a syntactic frame and locating the lexical items that will be needed

- Phonological encoding: converting the abstract output of the previous stage into a string of words which are realized phonologically

- Phonetic encoding: adjusting the phonological sequence to make articulation easier; linking each of the syllables to a set of neutral instructions to the articulators; storing the instructions in a buffer while the clause is being articulated

- Self-monitoring: focusing attention on the message immediately before and shortly after it is uttered in order to check for accuracy, clarity and appropriacy
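
Purely as an illustrative sketch (not part of Field's or the thesis's materials), the five stages above can be written down as an ordered structure, for instance to serve as a rough coding scheme when tagging stimulated-recall comments against the stages of the model; the stage names follow the list above, while the example comment and keyword cues are hypothetical.

```python
# Field's (2011) five cognitive stages for L2 speaking, written as an ordered
# coding scheme. The keyword lists are hypothetical examples only.
SPEAKING_STAGES = [
    ("conceptualization",     ["idea", "plan what to say", "think of content"]),
    ("grammatical encoding",  ["sentence structure", "word order", "find the word"]),
    ("phonological encoding", ["how it sounds", "pronunciation of the word"]),
    ("phonetic encoding",     ["easier to say", "linking sounds"]),
    ("self-monitoring",       ["checked myself", "corrected", "noticed a mistake"]),
]

def tag_comment(comment: str) -> list[str]:
    """Return the stages whose (hypothetical) keyword cues appear in a
    stimulated-recall comment, as a first-pass coding aid."""
    lowered = comment.lower()
    return [stage for stage, cues in SPEAKING_STAGES
            if any(cue in lowered for cue in cues)]

# Example: a made-up recall comment mentioning planning and self-correction.
print(tag_comment("I had to plan what to say first, then I corrected my verb."))
# -> ['conceptualization', 'self-monitoring']
```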

According to Field (2011), the quality and availability of all these information sources affect the individual's ability to construct accurate and appropriate L2 utterances. According to Field, performance deficits in a test of L2 speaking might arise from:

- Linguistic sources: gaps in the mental lexicon, imprecise or incomplete representations in the syllabary, inability to encode a syntactic pattern into a form of words

- Knowledge sources: cultural gaps in world knowledge or pragmatic knowledge

- Failure of comprehension or recall: gaps in the discourse representation

The framework for validating the cognitive processes in testing speaking is discussed in the next part of the chapter.

2.4.3 A framework for establishing the cognitive validity in testing speaking

As aforementioned, the socio-cognitive framework developed by Weir (2005) is adopted as the validation model of this study. Taylor (2005) named the theory-based validity component in Weir's model cognitive validity. Then, in 2011, Field developed models of cognitive processing in examining speaking, with a focus on the cognitive processes applied by expert speakers, which he adapted based on the models developed by Levelt (1989, 1999). With regard to testing speaking, the socio-cognitive model is summarized by Weir (2005) in the figure below:

Figure 2: The socio-cognitive framework for validating speaking

The above speaking framework was later developed by Lynda Taylor (2011) as below:

Figure 3: A framework for conceptualizing speaking test validity (Taylor, 2011)

The framework provides a comprehensive model for investigating test validity in general and speaking test validity in particular.

Based on the model of speech production developed by Levelt (1989, 1999), Field (2011) developed a validation framework for the cognitive validity of testing speaking in the L2 context, as below:

Figure 4: Field’s information sources feeding into the phases of the processing system

The cognitive processes were described in detail by Field (2011) as below:

Based on the two types of conceptualization (macro-planning and micro-planning) introduced by Levelt (1989), Field (2011) mentioned that macro-planning in speaking is much more constrained than it is in writing. The speaker is under pressure to respond promptly in most speaking contexts, thus limiting the time available for planning and structuring content. Moreover, there are working memory constraints on how much longer-term material the L2 speaker can store while at the same time dealing with current production demands, unless the speaker has the support of devices or tools to record their intentions. Micro-planning is much more localized: it positions the intended utterance in relation to the discourse as a whole by taking account of knowledge shared with the listener and the current topic.

As Field (2011) described, Levelt's account of how the message is generated views spoken interaction as a process of joint construction, in which a speaker has to relate each new utterance to a shared discourse framework. Amongst the factors which Levelt identified as affecting both macro-planning and micro-planning are awareness of the ongoing topic, thematization of new information, recognition of information shared and not shared with the listener, accommodation to the point of view and even the form of words of the interlocutor, and certain basic principles which determine how information is ordered.

Referring to Levelt's original formulation phase (which first entails the construction of a surface structure, an abstract framework for the sentence to be uttered, based upon a syntactic pattern, and then converts the framework and the associated lexis to phonological form, which involves retrieving the appropriate forms from memory), Field (2011) mentioned that "syntactic complexity is clearly a factor in the cognitive difficulty of producing an utterance" and that an L2 speaker's inability "to perform a particular structure derives from a lack of linguistic knowledge; but it may equally well derive from the demands of assembling the structure and retain it in the mind while the utterance is being produced."

According to Field (2011), in tests of speaking, linguistic content is often expressed not in terms of the syntactic complexity of forms but in terms of the language functions which test takers are required to perform. This makes it simpler to envision the transition from the test taker's initial idea to a template for an utterance: it can be treated as a matter of mapping from the function that the test taker wishes to perform to the pattern that best expresses that function. It is important to recognize that any staging of the functions to be performed is also a staging of cognitive demands. The factors affecting the cognitive demands imposed on the test taker at this stage of processing might include the frequency and transparency of the function and the complexity of the form of words that expresses it.

The socio-cognitive framework in different studies

Since being introduced by Weir in 2005, the socio-cognitive framework for language test development and validation has been applied by different scholars in a wide range of test development, validation and CEFR alignment projects.

Firstly, the framework has been successfully applied by Cambridge English Language Assessment in various studies to support the validity of the Cambridge main suite exams, including the KET, PET, FCE, CAE and CPE tests, the results of which are books in the Studies in Language Testing series: Shaw and Weir (2007), Examining Writing: Research and practice in assessing second language writing, Studies in Language Testing 26, Cambridge: UCLES/CUP; Khalifa and Weir (2009), Examining Reading: Research and practice in assessing second language reading, Studies in Language Testing 29, Cambridge: UCLES/CUP; Taylor (ed.) (2011), Examining Speaking: Research and practice in assessing second language speaking, Studies in Language Testing 30, Cambridge: UCLES/CUP; and Geranpayeh and Taylor (2013), Examining Listening: Research and practice in assessing second language listening, Studies in Language Testing 35, Cambridge: UCLES/CUP.

In all those publications, the socio-cognitive framework is applied as the central literature. In addition, these publications have contributed to developing the framework with greater detail and fuller explanations. With regard to the cognitive validity introduced by Weir (2005), for example, Field has developed models of cognitive processing in examining speaking and listening, with a focus on the cognitive processes applied by expert speakers and listeners, which he adapted based on the models developed by Levelt (1989, 1999) and Cutler and Clifton (1999). In this series of Cambridge publications, cognitive validity has been studied in great detail with a focus on the a priori stages of test development; little or no attention has been paid to the a posteriori stages of a typical testing cycle.

The framework has also been applied to develop Cambridge specific-purpose tests in the domains of legal, financial and business English, and the Teaching Knowledge Test. Moreover, other scholars have applied the framework across a wide range of international contexts and for a variety of purposes, in the assessment of both English and other languages, such as 5:

- In the Baltic States it formed the basis for building the test specifications for a generic, specific-purpose test of English in higher education

- In Mexico it was used to create a set of English tests related to the Common European Framework of Reference (CEFR)

- A similar approach was adopted in the United Arab Emirates to generate tests at six proficiency levels for university preparation

- Other projects are currently under way with institutions as far apart as Saudi Arabia (at the pre-university level) and Malaysia (at pre- and exit university levels)

- With consultancy input from CRELLA, university authorities in the former Yugoslav Republic of Macedonia used the socio-cognitive framework to develop the first national tests of Macedonian as a Foreign Language

- The Goethe Institute found it useful when revising its higher level examinations in German as a Foreign Language

Besides, the framework has been used by examination boards in the UK, Turkey, Mexico, Taiwan and Japan as a theoretical basis for CEFR linking projects. Many of these projects have been written up for publication, and a comprehensive list of references can be found on the CRELLA website, e.g.:

- Book chapters and journal articles that use the socio-cognitive framework

- Cambridge ESOL research notes articles that use the socio-cognitive framework

- PhD theses that use the socio-cognitive framework

5 https://www.beds.ac.uk/crella/sociocognitive

Among all those aforementioned studies, with regard to test validation, the socio-cognitive framework has been applied by scholars to provide validity evidence for many renowned international English tests, including the Cambridge main suite and specific-purpose tests, IELTS, and APTIS. Considering cognitive validity, all recent studies in which the socio-cognitive framework has been applied have covered tests assessing all four language domains of speaking, listening, reading, and writing. In the domain of speaking, the socio-cognitive framework has been applied in both the a priori and a posteriori stages of test development; however, only one study has addressed the a posteriori stages, by Li-Shih Huang (2013) in her project "Cognitive processes involved in performing the IELTS Speaking Test: Respondents' strategic behaviors in simulated testing and non-testing contexts". In this study, Huang focused on investigating the strategies that test takers tend to apply when taking the IELTS speaking test, with a view to suggesting whether or not the construct of IELTS should include test takers' strategic competence. The project set out to probe and describe the strategic behaviors that test takers used when performing the IELTS speaking test. Specifically, the study involved collecting stimulated verbal report data from 40 Chinese-speaking, English-as-an-additional-language students at both intermediate and advanced levels, to examine the strategic behaviors of those who performed the IELTS speaking test in a simulated testing situation versus those who performed it in a non-testing situation. The study was designed to analyze test takers' strategic behaviors through both elicitation from stimulated recalls carried out in the participants' first language and observation of the participants' actual production during their performance of the three IELTS speaking tasks. Her report concluded with statements about empirical and methodological implications and specific directions for future research, which should involve an adequate sample size based on power analysis, as well as an interdisciplinary approach, to gain insight into the complex nature of test-takers' cognitive processes and strategic behaviors.

Regarding cognitive validity studies of other skill domains at the a priori stages of test development, there are studies in the reading domain by Tineke Brunfaut and Gareth McCray (2015) 6 and Stephen Bax (2013) 7. Significant contributions of both studies to the cognitive validity of language tests lie in the research methodology of combining eye-tracking and stimulated recall interviews. The two methods balanced the strengths and weaknesses of each individual method, generating a richer and wider-reaching set of data than either alone and allowing triangulation of the findings of each method. These three studies have provided valuable implications for the present study in terms of both research methodology and how the findings are stated. In terms of research methodology, the present study applied the stimulated recall interview data collection method, which was also used in the studies mentioned above by Huang (2013), Brunfaut and McCray (2015), and Bax (2013). With regard to data analysis, the findings of the study were stated following the way the cognitive validity of the Cambridge main suite exams was reported.

In a nutshell, in this chapter, the definitions of validity and cognitive validity and different validation models have been discussed, with a focus on the validation model that can help establish the cognitive validity of a language test in general and speaking cognitive validity in particular, namely the socio-cognitive model developed and introduced by Weir (2005). This model provides a framework for the researcher to investigate the cognitive validity of the VSTEP.3-5 speaking section. The following chapters discuss the VSTEP.3-5 and the VSTEP.3-5 speaking section, the research methodology of the study, and the findings and discussion relating to the cognitive validity of the VSTEP.3-5 speaking section.

6 Looking into test-takers' cognitive processes while completing reading tasks: A mixed-method eye-tracking and stimulated recall study

7 The cognitive processing of candidates during reading tests: Evidence from eye-tracking

THE VIETNAMESE STANDARDIZED TEST OF ENGLISH PROFICIENCY

Decision to provide the VSTEP.3-5

This part of the chapter describes the background to the decision to develop the VSTEP.3-5 and the characteristics of its prospective test takers.

3.1.1 The National Foreign Languages Project

In 2008, based on the results of a baseline study on the existing conditions of English teaching in Vietnam and the English learning needs of the Vietnamese workforce, a national project named the National Foreign Languages Project (the NFLP) was founded as part of the Ministry of Education and Training of Vietnam (MOET). The NFLP's mission is to implement the Government's Decision 1400/QD-TTg dated September 30th, 2008 and, later, Decision 2080/QD-TTg dated December 22nd, 2017, with the overall objective that "by the year 2020 most Vietnamese youth whoever graduate from vocational schools, colleges and universities gain the capacity to use a foreign language independently. This will enable them to be more confident in communication, further their chance to study and work in an integrated and multi-cultural environment with variety of languages. This goal also makes language as an advantage for Vietnamese people, serving the cause of industrialization and modernization for the country." In 2017, the goal was modified to "…renovate foreign language teaching and learning in the national education system; continue to implement new foreign language curricula for different levels of education; improve foreign language proficiency of the Vietnamese people to meet study and work requirements; enhance Vietnam's human resource's competitive capability in the context of global integration, thereby contributing to the development of the country; create a foundation for universalization of foreign languages in general education by 2025."

One of the measures of the NFLP is to improve assessment and testing in foreign language teaching and learning in Vietnam, part of which is to design a tool that helps measure the language proficiency of English teachers and university students in Vietnam.

With this aim, the Ministry of Education and Training of Vietnam assigned the University of Languages and International Studies (ULIS) to develop the tool as one of the important measures of the NFLP. ULIS is a research university in languages, linguistics, international studies and related social sciences and humanities. ULIS's mission is to contribute to national development through the pursuit of excellence in education, research, and the provision of quality services relevant to social demands: "We particularly value creativity, dynamics, honesty, responsibility, and capability to work and gain success in a multicultural and competitive globalizing world." 8 After one and a half years starting from mid-2013, the Vietnamese Standardized Test of English Proficiency (VSTEP.3-5) was constructed by the research team led by Professor Nguyen Hoa from ULIS-VNU with the support of Professor Fred Davidson from the University of Illinois at Urbana-Champaign, USA (see Appendix 1 for more details about the research team), who mentioned in one of his reports relating to this mission that "this was both a touching professional moment (for me) and yet another affirmation that the entire ULIS/VNU test development team are professionals of the highest order." Later, the VSTEP.3-5 test was officially established following Decision No. 729/QD-BGDDT dated March 11th, 2015, issued by the Ministry of Education and Training of Vietnam. The VSTEP.3-5 test is to measure English proficiency at levels 3 to 5 of the Vietnamese framework of reference for foreign languages, which are equivalent to the B1 to C1 levels of the Common European Framework of Reference for Languages (hereinafter referred to as the CEFR). The ultimate target population of the VSTEP.3-5 includes English learners at post-secondary levels, such as undergraduate and graduate students of Vietnam's colleges and universities, and working people in different fields.

8 Retrieved from http://ulis.vnu.edu.vn/english/taxonomy/term/1/6 on February 9th, 2014

3.1.3 The VSTEP.3-5 test taker characteristics

As mentioned above, the VSTEP.3-5 test takers are English learners at post-secondary levels, such as undergraduate and graduate students of Vietnam's colleges and universities, and working people in different fields. The prospective test takers of the VSTEP.3-5 share the following basic characteristics 9:

- Using Vietnamese as their mother tongue

- Living and working/studying in Vietnam

- Wanting to learn English and to earn English language certificates for learning, working or other purposes

According to O'Sullivan (2000), test taker characteristics are classified into three groups: physical/physiological, psychological and experiential characteristics. Physical characteristics refer to the age, gender, short-term ailments and longer-term disabilities of the test takers. Psychological characteristics can be seen in the personality, memory, cognitive style, affective schemata, concentration, motivation and emotional state of the test takers. Experiential characteristics are those relating to education, examination preparedness, examination experience, target language-country residence, topic knowledge and knowledge of the world (O'Sullivan, 2000). When developing and validating the VSTEP.3-5, all those characteristics of the test takers were considered; the VSTEP.3-5 test development report mentions that the VSTEP.3-5 developers considered those characteristics at different stages of developing the test.

9 Extracted from the VSTEP.3-5 development report, NFLP-MOET, 2014

The VSTEP.3-5 development

This part of the chapter describes how the VSTEP.3-5 was developed.

Figure 6: The VSTEP.3-5 development processes (adopted from the Manual for language test development and examining, 2011)

The VSTEP.3-5 was developed strictly following the four stages suggested in the Manual for Language Test Development and Examining introduced by the Council of Europe in 2011 (de l'Europe, 2011), namely planning, design, try-out and informing stakeholders. Products of the development stages are the VSTEP.3-5 format, the VSTEP.3-5 specifications, the VSTEP.3-5 item writer guidelines, and the VSTEP.3-5 raters' training manuals.

The general format of the VSTEP.3-5 is described in the table below:

Table 1: The VSTEP.3-5 test format 10

10 Decision No 729/QD-BGDDT dated March 11th 2015 issued by the Ministry of Education and Training of Vietnam

For each paper, the table gives the time allocation, the number of items/tasks and the item/task types.

Listening: 40 minutes, including the time to transfer the answers to the answer sheet. Test-takers listen to short conversations, instructions, notices, longer conversations and talks, and answer MCQs.

Reading: 60 minutes, including the time to transfer the answers to the answer sheet. Test-takers read 4 passages on different topics, with the difficulty of the passages varying from level 3 (B1) to level 5 (C1) and the total length ranging from 1900 to 2050 words, and answer the corresponding MCQs.

Writing: Task 1: write an email of about 120 words, which accounts for one third of the total score of the Writing paper. Task 2: write an essay of about 250 words on a given topic, developing the topic with specific arguments and examples; this task accounts for two thirds of the total score of the Writing paper.

Speaking: Part 1 (Social Interaction): test-takers answer from 3 to 6 questions about two different topics. Part 2 (Solution Discussion): test-takers are provided with a situation and three options for dealing with the issue raised in the situation; they give arguments to support the option they think is the best choice and counter-arguments against the two other options. Part 3 (Topic Development): test-takers talk about a given topic using the suggested supporting ideas and/or their own ideas; part 3 ends with some further questions on the given topic.

With regard to the speaking skill, the VSTEP.3-5 speaking section is described as follows:

• Time length: 12 minutes (including 2 minutes of preparation: 1 minute in part 2 and 1 minute in part 3)

• General description: The speaking paper consists of three parts: (1) Social Interaction, (2) Solution Discussion, and (3) Topic Development

• Output language: oral conversation (social interaction, discussion, questions and answers) and extended talk

• Overall description: one-to-one speaking assessment model with

- Part 1: Social Interaction (The examiner asks three to five questions, the test-taker answers the questions.)

Topic 1: three questions; Topic 2: three questions

- Part 2: Solution Discussion (The examiner and test-taker discuss three options and select the best alternative.)

- Part 3: Topic Development (The test-taker develops a topic, using a given outline.)

3.2.2 The construct of the VSTEP.3-5 speaking section

The construct of the VSTEP.3-5 speaking section is summarized in the table below:

Table 2: Summary of the VSTEP.3-5 speaking test construct 11

For each part, the table summarizes the intended operations/tested sub-skills and the task demand and response format.

11 Extracted from VSTEP.3-5 development report, NFLP-MOET, 2014

Part 1 (Social Interaction):
- Intended operations/tested sub-skills: giving explanations and descriptions of particular issues
- Task demand and response format: 6 questions; the examiner asks the questions, which are not provided in the speaking paper given to the test-taker

Part 2 (Solution Discussion):
- Response format: discussion between the examiner and the test-taker
- Task demand: a situation in which the test-taker discusses with the examiner to select the best solution among the three given options to solve a problem; the examiner orally instructs the test-taker to deliver the task, and the test-taker is provided with the situation in the speaking test paper

Part 3 (Topic Development):
- Intended operations/tested sub-skills: describing an object or an event; talking about complicated and general topics
- Task demand and response format: a topic, an outline in the form of a mind map to develop the topic, and three follow-up questions; the examiner orally instructs the test-taker to deliver the task, and the test-taker is provided with the topic and a mind map in the speaking test paper

The construct of the speaking section contributes to the overall language ability tested by the VSTEP.3-5, as described in the table below:

Table 3: VSTEP.3-5 overall language ability 12

Level 3: Can understand the main points of clear standard input on familiar matters regularly encountered in work, school, leisure, etc. Can deal with most situations likely to arise whilst travelling in an area where the language is spoken. Can produce simple connected text on topics which are familiar or of personal interest. Can describe experiences and events, dreams, hopes and ambitions and briefly give reasons and explanations for opinions and plans.

Level 4: Can understand the main ideas of complex text on both concrete and abstract topics, including technical discussions in his/her field of specialization. Can interact with a degree of fluency and spontaneity that makes regular interaction with native speakers quite possible without strain for either party. Can produce clear, detailed text on a wide range of subjects and explain a viewpoint on a topical issue giving the advantages and disadvantages of various options.

12 Extracted from VSTEP.3-5 development report, NFLP-MOET, 2014

Level 5: Can understand a wide range of demanding, longer texts, and recognize implicit meaning. Can express him/herself fluently and spontaneously without much obvious searching for expressions. Can use language flexibly and effectively for social, academic and professional purposes. Can produce clear, well-structured, detailed text on complex subjects, showing controlled use of organizational patterns, connectors and cohesive devices.

More detailed descriptions of the oral abilities tested in the VSTEP.3-5 speaking section are summarized in the tables below:

Table 4: VSTEP.3-5 oral production ability 13

Level 3: Can reasonably fluently sustain a straightforward description of one of a variety of subjects within his/her field of interest, presenting it as a linear sequence of points.

Level 4: Can give clear, detailed descriptions and presentations on a wide range of subjects related to his/her field of interest, expanding and supporting ideas with subsidiary points and relevant examples. Can give clear, systematically developed descriptions and presentations, with appropriate highlighting of significant points, and relevant supporting detail.

Level 5: Can give clear, detailed descriptions and presentations on complex subjects, integrating sub-themes, developing particular points and rounding off with an appropriate conclusion.

Table 5: VSTEP.3-5 oral Interaction ability 14

Level 3: Can exploit a wide range of simple language to deal with most situations likely to arise whilst travelling. Can enter unprepared into conversation on familiar topics, express personal opinions and exchange information on topics that are familiar, of personal interest or pertinent to everyday life (e.g. family, hobbies, work, travel and current events). Can communicate with some confidence on familiar routine and non-routine matters related to his/her interests and professional field. Can exchange, check and confirm information, deal with less routine situations and explain why something is a problem. Can express thoughts on more abstract, cultural topics such as films, books, music, etc.

Level 4: Can interact with a degree of fluency and spontaneity that makes regular interaction, and sustained relationships with native speakers, quite possible without imposing strain on either party. Can highlight the personal significance of events and experiences, account for and sustain views clearly by providing relevant explanations and arguments. Can use the language fluently, accurately and effectively on a wide range of general, academic, vocational or leisure topics, marking clearly the relationships between ideas. Can communicate spontaneously with good grammatical control without much sign of having to restrict what he/she wants to say, adopting a level of formality appropriate to the circumstances.

13 Extracted from VSTEP.3-5 development report, NFLP-MOET, 2014

14 Extracted from VSTEP.3-5 development report, NFLP-MOET, 2014

Level 5: Can express him/herself fluently and spontaneously, almost effortlessly. Has a good command of a broad lexical repertoire allowing gaps to be readily overcome with circumlocutions. There is little obvious searching for expressions or avoidance strategies; only a conceptually difficult subject can hinder a natural, smooth flow of language.

The VSTEP.3-5 specifications include two components: detailed descriptions of the test items and tasks, and sample items and tasks 15

Regarding the detailed descriptions of the speaking tasks, the components described are the language input and the test tasks. The language input includes the levels of difficulty of the input (vocabulary and structural difficulty) and the content (the familiarity of the speaking situations and topics). The test tasks are described in detail in terms of the language functions expected to be produced in each task, the task shells, which describe the length (number of words) and ideas of the sentences and questions and how the questions are formed, and the scripts that the examiners should follow.

The other part of the VSTEP.3-5 specifications is the sample test. Below is an example of the speaking test section for test takers:

Talk about the climate in your area

- What is the weather like in your area at this time of the year?

- Which season do you like the best? Why?

- Do you prefer to live in a cold region or hot region? Why?

Talk about your travelling experience

- What was the last place you traveled to?

- Have you ever travelled alone?

- Which city in Vietnam do you like the most?

15 Decision No 730/QD-BGDDT dated March 11th 2015 issued by the Ministry of Education and Training of Vietnam

Situation: You are having a birthday party and many of your friends are invited Three locations are suggested: at home, in a restaurant, and in a karaoke bar Which do you think is the best place for the party?

Topic: Mobile phones are useful tools at schools

- Do you think people will continue using mobile phones in the future?

- What negative effects do you think mobile phones have on young children?

- Do young people use mobile phones differently from old people in your country? How?

3.2.4 The VSTEP.3-5 item writer training

Those who want to become VSTEP.3-5 item writers should take an item writer training course. The contents of the course were issued officially by the Ministry of Education and Training of Vietnam (Decision No 2912/QĐ-BGDĐT dated 23/8/2016).

The training contents include two parts: part 1 covers general knowledge of testing and assessment in foreign language teaching and learning, and part 2 covers specific information related to a particular test.

3.3 Test use

The VSTEP.3-5 test use stages follow the stages described in the Manual for language testing published by the Council of Europe in 2011, which are assembling, administering, marking, grading and reporting:

16 Circular No 1/2014/TT-BGDDT dated Jan 24th 2014 by the Ministry of Education and Training of Vietnam about the Vietnamese framework of reference for 6 levels of foreign language competences

Figure 7: Test use stages (adapted from the Manual for language testing)

This stage comprises item/task writer training, item/task commissioning, and test banking and/or item/task banking.

a) Item/task writer training

VSTEP.3-5 item writers are trained in compliance with MOET Decision No 2912/QĐ-BGDĐT dated 23/8/2016 and Decision No 2913/QĐ-BGDĐT dated 23/8/2016, under which an item/task writer should attend a training course covering such contents as communicative competences, competency-based assessment, assessing speaking, reliability and validity, and writing VSTEP.3-5 speaking tasks. Only those who have been trained to become VSTEP.3-5 item writers are eligible to be invited to become VSTEP.3-5 speaking task writers.

b) Test assembling

VSTEP.3-5 tests are assembled following the 12 stages of standardized test form assembling (MOET Circular No 23/2017/TT-BGDDT dated 29/9/2017). The test assembling steps are described below:


Figure 8: VSTEP.3-5 test assembling stages (reviewed items/tasks are classified as: items/tasks without modification requirements, items/tasks with minor modification, items/tasks with major modification, and discarded items/tasks returned to item/task writers)

3.3.2 Test administering

The VSTEP.3-5 test is delivered following the requirements of MOET Circular No 23/2017/TT-BGDDT dated 29/9/2017. The administration includes informing prospective test takers of the test dates and venues, preparing the test forms and other materials, registering test takers, arranging the venue, administering the test, and collecting all the test materials.

Regarding the speaking section, the testing procedure includes three stages: before the test, during the test, and after the test.

Oral examiners should arrive at the examination location on time and collect the examination materials from the test managers. The oral examiners should make sure that the seats of the examiner and the test-taker are arranged properly for a one-to-one speaking test model. When the seats are ready, the examiners should turn on the voice-recording device and ensure that the volume of the device is sufficient for a recording duration of at least 3.5 hours.

Examiners should strictly follow the examination frame and avoid unscripted comments or asides. They should respect the order of the different parts of the test.

Part 1: 3 minutes; Part 2: 4 minutes; Part 3: 5 minutes. Examiners can take notes during the examination time and have 2 minutes of transition between two adjacent test-takers to complete the assessment and the mark sheet. Below is the template of a typical VSTEP.3-5 speaking mark sheet.

Figure 9: VSTEP.3-5 speaking mark sheet

During the test, the VSTEP.3-5 speaking examiners must:

- ensure the security and confidentiality of the VSTEP speaking test (varying the test materials following the test manager's instruction, ensuring no test materials are taken out of their personal charge, maintaining the confidentiality of the examination materials throughout and after the examining time, etc.)

- make sure that all the test-takers are treated fairly

- be familiar with the VSTEP speaking test format and rating scale

- be familiar with the VSTEP speaking test procedure

- be professional in their dress and behaviors (clean and tidy clothing, mobile phones being switched off, etc.)

- arrive at the examination location on time in accordance with the examination timetable

- create a good atmosphere to encourage test-takers to perform to the best of their ability

Examiners should ensure the security and confidentiality of all the mark sheets and the speaking materials and return the full package of materials to the test managers

VSTEP.3-5 speaking examiners are eligible to do the rating only after completing a strict training course, which is stipulated in MOET Circular No 23/2017/TT-BGDDT dated 29/9/2017. The training contents include an understanding of rating and of rating speaking competences, the Vietnamese framework of reference for foreign languages (stipulated under MOET Circular No 01/2014/TT-BGDDT dated 24/1/2014), common rating problems, the VSTEP.3-5 speaking test format and specifications, the VSTEP.3-5 speaking rating scale, rating sample speaking performances, and standardization.

3.4 VSTEP.3-5 marking, grading and reporting results

VSTEP.3-5 marking, grading and results reporting in general follow the requirements of MOET Circular No 23/2017/TT-BGDDT dated 29/9/2017. VSTEP.3-5 marking includes rating (speaking and writing) and machine marking (reading and listening). Grading includes analyzing the component scores and overall scores of the test takers and assigning them to certain levels of English proficiency. Final results are then reported to the test takers.

All in all, this chapter has introduced the Vietnamese Standardized Test of English Proficiency (VSTEP.3-5) and its testing cycle, including the decision to provide the test, test development and test use. The test taker characteristics, test development stages, test format and specifications, item writer guidelines, rater's manual and marking rubrics, sample test, marking reports and score reports relating to the speaking section of the VSTEP.3-5 presented here provide detailed information about the VSTEP.3-5 in general and the VSTEP.3-5 speaking section in particular, which will be discussed in further detail in Chapters 5, 6 and 7 of the study.

CHAPTER IV: RESEARCH METHODOLOGY

4.1 Research questions

The study is aimed at:

- outlining the cognitive processes underlying the speaking construct of the VSTEP.3-5, which can serve as a framework for establishing the cognitive validity of the VSTEP.3-5 speaking test; and

- establishing the cognitive validity of the speaking section of the VSTEP.3-5

For such aims, the research questions of the study are designed to establish the cognitive validity of the speaking section of the test as a predictor of real-life performance, based on the central issues that a language test must deal with in terms of its cognitive validity (Field, 2013):

- RQ1: To what extent does the VSTEP.3-5 speaking section actually cover the cognitive processes that it is supposed to represent?

- RQ2: To what degree are the cognitive demands imposed in the VSTEP.3-5 speaking section appropriately calibrated to reflect the levels of speaking competences of the test takers?

- RQ3: How closely do the cognitive processes that the VSTEP.3-5 speaking section elicits from a candidate resemble the processes that he/she would employ in non-test conditions?

4.2 Research design

Mixed methods research, an emergent methodology increasingly used in linguistic studies in recent years, is the use of quantitative and qualitative methods in a single study or series of studies. According to Creswell & Clark (2017), "mixed methods research is a research design with philosophical assumptions as well as methods of inquiry. As a methodology it involves philosophical assumptions that guide the direction of the collection and analysis of data and the mixture of qualitative and quantitative approaches in many phases of the research process. As a method, it focuses on collecting, analyzing, and mixing both quantitative and qualitative data in a single study or series of studies. Its central premise is that the use of quantitative and qualitative approaches in combination provides a better understanding of research problems than either approach alone" (Creswell & Clark, 2017).

The underlying assumption of mixed methods research is that it can address some research questions more comprehensively than either quantitative or qualitative methods alone. Questions that profit most from a mixed methods design tend to be broad and complex, with multiple facets that may each be best explored by quantitative or qualitative methods. Strengths of quantitative research include its procedures to minimize confounding and its potential to generate generalizable findings if based on samples that are both large enough and representative, and it remains the dominant paradigm in health research. However, this deductive approach is less suited to generating hypotheses about how or why things are happening, and/or explaining complex social or cultural phenomena. Thus, mixed methods research is ideal for studying the cognitive validity of the VSTEP.3-5 speaking section, because it is not possible to frame the inherent complexity of test development as deductive hypotheses, particularly with reference to test-taker cognitive abilities.

When embarking on a mixed methods research project, it is important to consider the methods that will be used, the priority of the methods, and the sequence in which the methods are to be used.

According to Creswell & Clark (2017), there are three different ways in which researchers can design mixed methods research, namely triangulation design, explanatory design and embedded design. Each reflects different research aims and addresses different research issues; among them, a mixed model of the triangulation and embedded designs was applied in this study to capture the evidence of cognitive validity at the various stages of a testing cycle. The following figure illustrates the design of the study.

Figure 10: The research design of the study

Firstly, via document analysis and focus group interviews, which were conducted qualitatively with 3 test developers, 5 item/task writers and 6 oral examiners, the cognitive demands imposed by the VSTEP.3-5 were analyzed following the a priori stages of the VSTEP testing cycle. Then, the quantitative scores and the levels of test fulfillment of 288 test-takers were analyzed to decide on the levels of cognitive demands calibrated in the different language proficiency levels of the VSTEP.3-5 speaking rating scale. Questionnaires administered to those 288 test takers and stimulated-recall interviews with 30 of them were used to study the cognitive processes elicited in test and non-test conditions.

The stages of conducting the research are summarized as below:

Stage 1: Identifying whether the VSTEP.3-5 speaking section actually covers the cognitive processes that it is supposed to represent

At this stage, the evidence of the cognitive processes represented in the VSTEP.3-5 speaking section was identified following the features of the cognitive processes described in Weir's socio-cognitive framework. Such evidence was confirmed, via focus group interviews, by those closely involved in the development and administration of the test: the test developers, item writers and oral examiners.

Stage 2: Identifying whether the cognitive demands imposed in the VSTEP.3-5 speaking section are appropriately calibrated to reflect the levels of speaking competences of the test takers

In stage 1 mentioned above, the evidence of the cognitive processes represented in the VSTEP.3-5 was identified. At this stage, the researcher worked out whether the cognitive demands imposed in the VSTEP.3-5 speaking section and presented in the rating scale are appropriately calibrated to reflect the levels of speaking competences of the test takers by looking into the scores of the studied participants, who are VSTEP.3-5 test takers.

Stage 3: Identifying whether the cognitive processes that the VSTEP.3-5 speaking section elicits from a candidate resemble the processes that he/she would employ in non-test conditions

Stages 1 and 2 of data collection provided the evidence related to the cognitive processes imposed by the speaking section in the test context. Stage 3 of data collection addressed the cognitive processes that the test takers experienced in the test context and in non-test contexts by collecting the test takers' views through survey questionnaires and stimulated-recall interviews.

The mixed method was applied by employing both qualitative and quantitative data collection methods. The qualitative data helped establish the initial findings, and both the quantitative and qualitative data helped confirm the findings established with the qualitative data.

4.3 Selection of participants

The different groups of participants in the study are: 3 VSTEP.3-5 test designers (the team leader and two members of the VSTEP.3-5 test development team who were in charge of the speaking section of the test), 5 VSTEP.3-5 item writers (randomly invited from among the certified VSTEP.3-5 item writers to participate in the research), 6 VSTEP.3-5 speaking examiners (randomly invited from among the certified VSTEP.3-5 speaking examiners to participate in the research), and 288 VSTEP.3-5 test takers (selected from 299 randomly selected VSTEP.3-5 test takers: 99 were randomly selected among those who registered to take the test with the aim of obtaining level 3 of English proficiency, 100 among those who registered with the aim of obtaining level 4, and 100 among those who registered with the aim of obtaining level 5).

For the process of investigating the cognitive processes underlying the construct of the VSTEP.3-5, the 3 VSTEP.3-5 test designers were invited to join a focus group. One participant was the head of the VSTEP.3-5 design team; the other two were key members of the VSTEP.3-5 speaking design team. They were the authors of the VSTEP.3-5 speaking section format, specifications and task writer's guidelines as well as the oral examiner's training manual. To see whether the construct of the VSTEP.3-5 was represented in the VSTEP.3-5 speaking test form, 5 item writers were invited to attend a focus group. The five task writers were those who had been trained to become VSTEP.3-5 item writers and who had at least three years of experience of writing VSTEP.3-5 speaking tasks. To obtain evidence on whether the VSTEP.3-5 construct was represented in the speaking rating scale and marking process, 6 VSTEP.3-5 oral examiners were invited to form a focus group. They had been trained to become VSTEP.3-5 oral examiners and had at least three years of experience of marking VSTEP.3-5 speaking performances.

299 test takers were invited to complete survey questionnaires right after a test session. They were randomly selected from among the VSTEP.3-5 speaking candidature who took the speaking section of that test batch. 288 of them had the full set of required scores (i.e. they took all four sections of the test); therefore, the data were analyzed for those 288 test takers. Among the 288 test takers, 30 were selected to attend stimulated recall interview sessions. The criteria for selecting the 30 participants were:

- Test takers of lower than level 3 proficiency account for 10 per cent, level 3 test takers account for 30 per cent, level 4 test takers account for 30 per cent, and level 5 test takers account for 30 per cent.

- The test takers varied in age, gender and occupation.

- The test takers consented to attend the interviews and to the use of their VSTEP.3-5 speaking test recordings.

- Special features were noted in the test takers' recordings, such as pauses, uses of fillers, etc.

4.4 Data collection instruments

To ensure that the data collected were relevant to the study, the data collection instruments used are the VSTEP.3-5 test format and related documents, a VSTEP.3-5 speaking test form, focus group interview questions, survey questionnaires, and stimulated recall interview questions.

Research questions and corresponding data collection instruments:

- RQ1: To what extent does the VSTEP.3-5 speaking section actually cover the cognitive processes that it is supposed to represent?
  Instruments: VSTEP.3-5 test development report, VSTEP.3-5 test format, VSTEP.3-5 test specifications and other related documents (qualitative data); focus group interviews (qualitative data)

- RQ2: To what degree are the cognitive demands imposed in the VSTEP.3-5 speaking section appropriately calibrated to reflect the levels of speaking competences of the test takers?
  Instruments: VSTEP.3-5 test scores (quantitative data)

- RQ3: How closely do the cognitive processes that the VSTEP.3-5 speaking section elicits from a candidate resemble the processes that he/she would employ in non-test conditions?
  Instruments: Stimulated recall sessions (qualitative data)

The reasons for selecting focus group interviews instead of individual interviews are that a focus group allows an idea to be challenged by different experts within one discussion session and that the researcher can explore the complementary experiences of the participants to identify the crucial points needed for the research. Besides, with a focus group the researcher can gather a diversity of responses and so gain useful insights into the issues discussed among the participants. The strength of the individual interview is that it helps the researcher explore individuals in more depth; however, given the goal of gathering information to confirm the findings of the document analysis, information discussed among the participants was of more interest to the outcome of the study.

4.4.1 VSTEP.3-5 test format and related documents

According to Field (2011), at the a priori stages of a testing cycle, cognitive validity evidence can be established in the language input, the test format, and the test tasks. Thus, the VSTEP.3-5 testing cycle was studied. The VSTEP.3-5 testing cycle included different stages of work relating to the VSTEP.3-5, from the development stages to the actual administration stages. Within those stages, the test takers' characteristics, the test format, the test specifications, the item writer's guidelines and the oral examiner's training manual were studied to gain insight into the underlying cognitive processes of the VSTEP.3-5 from the design stages. With permission from the National Foreign Languages Project of the Ministry of Education and Training of Vietnam, the author was able to access the relevant requested documents. The analysis of these documents provided insightful information regarding the evidence of cognitive validity of the VSTEP.3-5 speaking section.

At the a posteriori stages of a testing cycle, according to Weir (2005), evidence of cognitive validity can also be collected. For this purpose, a VSTEP.3-5 speaking test form and the scores of the 288 test takers taking the test were collected to provide the overall proficiency levels and the speaking proficiency levels of the test takers. The VSTEP.3-5 test form of one test batch was used, based on the assumption that all VSTEP test forms are developed strictly following the test specifications of the VSTEP.3-5 and the test banking process as stipulated in MOET's Circular 23/2017/TT-BGDDT 17 issued in 2017, under which VSTEP.3-5 forms must be developed strictly following the test format and specifications stipulated in Decisions No 729/QD-BGDDT and 730/QD-BGDDT dated March 11th 2015 issued by the Ministry of Education and Training of Vietnam and the 12 stages of test development and banking.

The focus groups were conducted with three different groups of participants (3 test designers, 5 item writers, and 6 oral examiners) in order to validate the cognitive processes underlying the construct of the VSTEP.3-5 speaking section. Weir's socio-cognitive validation framework was applied, with a focus on the cognitive validity of the speaking skill as developed by Field (2011). Following the framework, the focus group discussion questions were designed in three groups to collect information about: (1) the cognitive processes underlying the construct of the VSTEP.3-5 speaking section at the stage of developing the VSTEP.3-5 speaking section; (2) the item writers' actual work of writing the VSTEP.3-5 speaking test tasks in connection with their understanding of the cognitive processes underlying the construct of the test; (3) the actual work of examining the speaking skill of the VSTEP.3-5 test takers in connection with their understanding of the cognitive processes underlying the construct of the test.

17 MOET's Circular 23 stipulates the procedures for organizing tests of foreign language proficiency in accordance with Vietnam's Framework of Reference for Foreign Languages

A set of questions was designed to collect information about the test takers and what they actually went through in the testing context. The questions were grouped into four groups: (1) general information about the test takers at the time of taking the VSTEP.3-5 speaking test; (2) information about part 1 of the test; (3) information about part 2 of the test; (4) information about part 3 of the test. For all three parts, the questions were designed so that the test takers could share their full experience of the test, including the time allocations, the topics, the task types, the examiners and their own preparation for the test. The questions also covered the different stages of the cognitive processes, including conceptualization, grammatical encoding, phonological encoding, phonetic encoding, and self-monitoring, to see whether or not the test takers experienced those stages of thinking when they were speaking, during the buffering time of preparation, and immediately before their speaking turns.

Stimulated recall interview questions were prepared based on the validation framework developed by Field (2011). The questions asked what actually happened inside the test takers' heads regarding the conceptualization of ideas and the assembly of grammatical, lexical and phonetic patterns right before and while the test taker spoke. It was impossible for the researcher to apply a think-aloud protocol because the test taker was taking the test, and applying think-aloud would interfere with the testing process. What the test taker thought could only be accessed via post-test interviewing, and specifically with stimulation. The stimulated recall interviews were conducted following the procedures below:

First, the recordings of the speaking performances of the 30 test takers were collected (30 test takers out of the 288 who completed the survey questionnaires were invited to join the stimulated recall sessions; in order to eliminate other factors that might affect the outcome of the research, the test takers were selected so that their ages, genders, occupations and health conditions had no significant effect on the data collected, applying the selection criteria for the stimulated recall sessions mentioned above).

Then, the test takers were scheduled to attend the stimulated recall interviews. At the interviews, the researcher and the interviewed test-taker listened to the recording of the test taker taking the VSTEP.3-5 speaking section, focusing on the particular patterns plotted and on the moments when the test taker needed a stimulus to recall his/her experience. The stimulated recall interviews were conducted within one week of the test date so that the test takers' experience of taking the test, specifically the speaking section, was still fresh. It was important that the researcher provided sufficient information about the cognitive processes of speaking so that the interviewed test takers could understand the nature of the matter, and the interviews were conducted following the prepared interview questions.

All the interviews were recorded and later transcribed for analysis

Regarding the stimulated recall interview questions, based on Field's framework of cognitive validation, the questions were categorized into two groups: (1) to grasp the inner thinking of the test takers in the speaking testing context and (2) to compare what the test takers experienced in the testing context and in real-life situations. The questions of both groups relate to the five stages of cognitive processing: conceptualization, grammatical encoding, phonological encoding, phonetic encoding, and self-monitoring. When studying these five stages, the factors that affect the cognitive demands imposed on the test takers were considered. In addition, the researcher tried to identify the cognitive loads that the test takers claimed they experienced in test and non-test contexts, which provided evidence to confirm the cognitive processes that the test-takers might encounter when taking the VSTEP.3-5 speaking test.

4.5 Data analysis

The most important, and perhaps most difficult, aspect of mixed methods research is integrating the qualitative and quantitative data. According to Creswell & Clark (2017), one approach is to analyze the two data types separately and then undertake a second stage of analysis in which the data and findings from both studies are compared, contrasted and combined. The quantitative and qualitative data are kept analytically distinct and are analyzed using techniques usually associated with that type of data; for example, statistical techniques can be used to analyze survey data whilst thematic analysis may be used to analyze interview data. In this approach, the integrity of each dataset is preserved while also capitalizing on the potential for enhanced understanding from combining the two datasets and sets of findings. Another approach to mixed methods data analysis is the integrative strategy. Rather than keeping the datasets separate, one type of data may be transformed into another type: qualitative data may be turned into quantitative data, or quantitative data may be converted into qualitative data. The former is probably the more common method of this type of integrated analysis. Quantitative transformation is achieved by the numerical coding of qualitative data to create variables that may relate to themes or constructs. These data can then be combined with the quantitative dataset and analyzed together.
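As a purely illustrative sketch of this quantitative transformation, the snippet below shows how qualitative codes could be turned into 0/1 indicator variables and merged with a score dataset; the identifiers, theme labels and scores are hypothetical and are not drawn from the VSTEP.3-5 data.

```python
import pandas as pd

# Hypothetical qualitative coding: one row per interviewee, listing the
# themes identified in his/her stimulated-recall transcript.
qual = pd.DataFrame({
    "test_taker_id": [101, 102, 103],
    "themes": [["preplanning", "self_monitoring"],
               ["self_monitoring"],
               ["preplanning", "lexical_search"]],
})

# Quantitize: create one 0/1 indicator column per theme.
indicators = (
    qual.explode("themes")
        .assign(value=1)
        .pivot_table(index="test_taker_id", columns="themes",
                     values="value", fill_value=0)
        .reset_index()
)

# Hypothetical quantitative dataset: speaking scores for the same test takers.
scores = pd.DataFrame({"test_taker_id": [101, 102, 103],
                       "speaking_score": [6.5, 5.0, 8.0]})

# Combine the transformed qualitative data with the quantitative dataset
# so that both can be analyzed together.
combined = scores.merge(indicators, on="test_taker_id")
print(combined)
```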

The integrative strategy is the data analysis approach used for examining the different types of data in this study. Firstly, the qualitative data from the document analysis and the focus group discussions were analyzed to provide cognitive validity evidence for the a priori stages of the VSTEP.3-5 testing cycle. The results were later triangulated with the empirical evidence from the results of the 288 test takers taking the VSTEP.3-5 speaking section. The results of the survey questionnaires and the stimulated recall interviews were analyzed in relation to each other in order to compare and contrast the cognitive processes that the test takers experienced in the test context with what they did in real-life situations. The results of all the different types of data collection were then integrated to arrive at the findings of the study.

When analyzing the quantitative data, including the test scores of the 288 test takers and the survey data of these 288 persons, EXCEL, SPSS and FACETS were used to examine the data through descriptive statistics, correlations, t-tests, Cronbach's alpha, factor analysis and many-facet Rasch analysis. As for the qualitative data, the documents relating to the VSTEP.3-5 were analyzed in comparison and contrast with the socio-cognitive framework to establish the validity evidence of the a priori stages of the test. Then all the focus group discussions and stimulated recall interviews were transcribed using transcription codes. The transcripts were then examined to identify patterns that stand out and that support and supplement the quantitative data.
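For illustration only, the kinds of computations named above (here Cronbach's alpha over the five speaking rating criteria, an independent-samples t-test, and a Pearson correlation) could be sketched in Python as follows; in the study itself they were run in SPSS and FACETS, and the scores below are invented rather than taken from the 288 test takers.

```python
import numpy as np
from scipy import stats

# Hypothetical ratings matrix: rows = test takers, columns = the five
# speaking rating criteria (invented band scores, not the study's data).
ratings = np.array([
    [6, 7, 6, 6, 7],
    [5, 5, 4, 5, 5],
    [8, 8, 7, 8, 8],
    [7, 6, 7, 7, 6],
    [4, 5, 4, 4, 4],
    [9, 8, 9, 9, 8],
], dtype=float)

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_persons, n_items) score matrix."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # variance of each criterion
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the total score
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

print("Cronbach's alpha:", round(cronbach_alpha(ratings), 3))

# Independent-samples t-test comparing the total speaking scores of two
# hypothetical groups of test takers (e.g. two target proficiency groups).
group_a = ratings[:3].sum(axis=1)
group_b = ratings[3:].sum(axis=1)
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
print("t =", round(float(t_stat), 2), "p =", round(float(p_value), 3))

# Pearson correlation between two of the rating criteria.
r, p = stats.pearsonr(ratings[:, 0], ratings[:, 1])
print("r =", round(float(r), 2), "p =", round(float(p), 3))
```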

In this chapter, the research design of the study has been discussed, with a focus on the research methods used to study the cognitive validity of the VSTEP.3-5 speaking section. The research questions, research design, data collection tools, selection of participants, data collection procedures and data analysis of the study have been presented to explain how the data were collected. Chapters 5, 6 and 7 discuss how the collected data contribute to the findings of the study.

CHAPTER V: THE COGNITIVE PROCESSES SUPPOSEDLY REPRESENTED

5.1 The cognitive processes presented in the VSTEP.3-5 speaking section

This part of the dissertation elaborates on whether the VSTEP.3-5 speaking component is supposed to cover the cognitive processes mentioned in Weir's socio-cognitive model, and how the processes are addressed, by studying the VSTEP.3-5 test development reports and related documents and the results of the focus groups.

5.1.1 The VSTEP.3-5 test development reports and related documents

Firstly, the VSTEP.3-5 speaking test development reports and related documents were analyzed to collect evidence of the incorporation of cognitive processes in the construct of the VSTEP.3-5 and of their representation in the test format, test specifications and other related documents. Then, the information from the VSTEP.3-5 designers, item writers and oral examiners provides validation evidence.

Studying the different stages of developing the VSTEP.3-5 test against the cognitive processes introduced in the socio-cognitive framework, the following is evidence of the incorporation of the speaking cognitive processes into the VSTEP.3-5 speaking section:

Considering the cognitive demands to be imposed on the test takers, a list of themes and domains for the different levels of targeted English proficiency was developed by the VSTEP.3-5 development team. The list was developed based on the CEFR list of themes and domains (public, personal, educational and occupational), following the categories of locations, institutions, persons, objects, events, operations, and texts. From the lower to the higher levels of English proficiency, the themes and domains are listed according to their levels of familiarity and abstractness. The topics and themes at higher levels of English proficiency tend to be more abstract and less familiar to the prospective test takers, which is consistent with the descriptions of the language proficiency levels of the CEFR.

The levels of familiarity and abstractness relate to features of the conceptualization stage of the cognitive processes in speaking introduced in the socio-cognitive model, which are awareness of the ongoing topic, thematization of new information, recognition of information shared and not shared with the listener, accommodation to the point of view and even the form of words of the interlocutor, and certain basic principles which determine how information is ordered. That the themes and domains used in the VSTEP.3-5 are listed for the different target English proficiency levels of the prospective test takers according to their levels of familiarity and abstractness reflects consideration of the factors that may affect the conceptualization of the speakers when speaking in English.

As mentioned in the previous part, the VSTEP.3-5 is supposed to measure English proficiency from level 3 to level 5 of the Vietnamese framework for foreign language competences (hereinafter referred to as the CEFR-VN), which was developed based on the CEFR. Thus, the construct of the VSTEP.3-5 is the CEFR-VN from level 3 to level 5, corresponding to CEFR levels B1 to C1. When the CEFR-VN was adopted as the construct of the VSTEP.3-5, the descriptors of the CEFR-VN were used to develop the tested contents of the test, and the theories underlying the CEFR-VN should also be reflected in the construct of the test.

Studying the CEFR-VN, it can be seen that the framework includes descriptors of foreign language competences from level 1 to level 6, both for overall foreign language competence and for the competences corresponding to the four language skills of reading, listening, speaking and writing. The descriptors of the CEFR-VN are almost the same as those of the CEFR. With regard to speaking, the descriptors also include levels of range, accuracy, fluency, interaction and cohesion, as well as those relating to pragmatic and socio-linguistic competences.

Thus, the theories underlying the CEFR should also be reflected in the construct of the test. As mentioned in the VSTEP.3-5 development reports, the theories underlying the test are Bachman and Palmer's and Canale and Swain's models of communicative competence, and those related to the CEFR (and the CEFR-VN). In these models, the cognitive aspects of assessment are mentioned as factors that may affect the competence development of language learners.

i) Bachman and Palmer's, and Canale and Swain's communicative competences

Regarding the communicative competence models, the theory was first introduced by Canale and Swain and later developed by Bachman and Palmer (Canale & Swain, 1980; Bachman & Palmer, 1996). According to Canale and Swain (1980), the communicative competences include grammatical structures and the speech functions the learner wants to perform, sociolinguistic knowledge and communicative strategies. They stress the importance of what the learners need to be able to do, and which language material they will need to do it. They also emphasize the need for situations in which learners have opportunities to perform those language functions. Later in 1980, Canale and Swain focused on sociolinguistics and its interaction with the other components of the model, such as grammatical and strategic competences. This new trend was a great improvement over the previous ones on the grounds that it considered communication as a dynamic process which can be realized in language use. Some later analysts noted elements missing from the original Canale and Swain formulation, such as the demonstration of communicative knowledge in actual performance (Fulcher & Davidson, 2007). Later on, Canale's refinement of the model determined two components of communicative competence: conscious and unconscious knowledge, and the skills needed to use this knowledge in actual communication; this added one more competence to the model, discourse competence (Canale, 1983).

Bachman (1990) and then Bachman & Palmer (1996, 2010) introduced a more comprehensive model of language ability. A major achievement of this model is its emphasis on the central role of strategic competence, the metacognitive strategies or higher-order processes that explain the interaction of knowledge and affective components of language use. Bachman & Palmer's model of language competence (2010) is multidisciplinary and complex, which results in the introduction of affective factors. According to Bachman and Palmer (1996), many traits of language users, such as their general characteristics, topical knowledge, affective schemata and language ability, influence communicative language ability. The crucial characteristic is their language ability, which comprises two broad areas: language knowledge and strategic competence. Language knowledge consists of two main components, organizational knowledge and pragmatic knowledge, which complement each other in achieving communicatively effective language use. In Bachman and Palmer's model, organizational knowledge is composed of the abilities engaged in control over formal language structures, i.e. grammatical and textual knowledge. Grammatical knowledge includes several rather independent areas of knowledge, such as knowledge of vocabulary, morphology, syntax, phonology, and graphology. They enable recognition and production of grammatically correct sentences as well as comprehension of their propositional content. Textual knowledge enables comprehension and production of (spoken or written) texts. It covers the knowledge of conventions for combining sentences or utterances into texts, i.e. knowledge of cohesion (ways of marking semantic relationships among two or more sentences in a written text or utterances in a conversation) and knowledge of rhetorical organization (ways of developing narrative texts, descriptions, comparisons, classifications, etc.) or conversational organization (conventions for initiating, maintaining and closing conversations).

ii) The CEFR (and the CEFR-VN)

The Common European Framework of Reference for Languages: Learning, Teaching, Assessment, or CEFR, is a guideline used to describe the achievements of learners of languages across Europe. The main aim of the CEFR, and of the CEFR-VN as well, is to provide a method of assessment and teaching which applies to all languages in Europe. The CEFR divides learners into three broad divisions, which are further divided into six levels: A Basic User (A1/Level 1 Breakthrough, A2/Level 2 Waystage), B Independent User (B1/Level 3 Threshold, B2/Level 4 Vantage), and C Proficient User (C1/Level 5 Effective Operational Proficiency, C2/Level 6 Mastery). The following discusses the factors described in the CEFR (and in the CEFR-VN as well) that may influence language learners' competence development.

Regarding language learners' competence development, the CEFR also describes the factors that may influence the cognitive demands of a test. They include the cognitive factors, skills and ability to cope with processing demands. The cognitive factors include task familiarity, whereby the cognitive load may be lessened according to the extent of the learner's familiarity with the type of task and operations involved, the themes, the type of text (genre), the interactional schemata (scripts and frames), the necessary background knowledge, and the relevant sociocultural knowledge. Among these factors, interactional schemata (scripts and frames) concern the availability to the learner of unconscious or 'routinised' schemata, which can free the learner to deal with other aspects of performance, or assist in anticipating

5.2 Findings

The information above from the VSTEP.3-5 documents and the focus group results illustrates that the cognitive processes that test takers may engage in when taking the VSTEP.3-5 have been addressed at both the development and administration stages. The analysis was conducted based on the model of speech production developed by Levelt (1989, 1999) and applied by Weir (2005) in the socio-cognitive model of test development and validation. The model falls into six major phases of processing: conceptualization, grammatical encoding, phonological encoding, phonetic encoding, articulation and self-monitoring. The evidence of validity below, regarding the representation of the cognitive processes that test takers may engage in when taking the VSTEP.3-5 speaking section, summarizes the above data analysis and discussion.

As already mentioned, the socio-cognitive model presents conceptualization as two types of operation: macro-planning (in which a set of speech acts is anticipated) and micro-planning (at a more local level, relating to the role and form of the upcoming utterance). Field (2011, 88) categorizes these two types under two main headings, provision of ideas and integrating utterances into a discourse framework:

Provision of ideas: "the complexity of the ideas which test takers have to express and the extent to which the ideas are supplied to them"

Integrating utterances into a discourse framework: "the extent to which test takers are assessed on their ability to relate utterances to the wider discourse (including their awareness of information shared with the interlocutor)"

The conceptualization phase of cognitive processing in speech production in the VSTEP.3-5 speaking section is discussed under the two main headings mentioned above.

According to Field (2011, 88), "retrieving information and generating ideas impose heavy cognitive demands upon a speaker", which means that the less conceptualization is needed, the lower the level of cognitive demand required, allowing more working memory to be allocated to retrieving the relevant linguistic forms. Field (2011) also stresses the importance of the supports that the test takers are provided with. In his opinion, the test administrator should "ensure that a test does not too heavily reward the candidate's imagination rather than their language proficiency".

The level of cognitive demand imposed on the VSTEP.3-5 test takers when they do the speaking section has also been examined by looking into two aspects: the availability of information and the provision of support. As mentioned in the previous part, the availability of information varies from level 3 to level 5 of the test, with the level of familiarity decreasing from the lower to the higher proficiency levels for a particular test taker; this helps ensure that the cognitive load does not unduly affect the production of test takers at any particular target proficiency level. As mentioned in the construct of the VSTEP.3-5 speaking test:

- Level 3: Can reasonably fluently sustain a straightforward description of one of a variety of subjects within his/her field of interest, presenting it as a linear sequence of points

- Level 4: Can give clear, detailed descriptions and presentations on a wide range of subjects related to his/her field of interest, expanding and supporting ideas with subsidiary points and relevant examples

Can give clear, systematically developed descriptions and presentations, with appropriate highlighting of significant points, and relevant supporting detail.

- Level 5: Can give clear, detailed descriptions and presentations on complex subjects, integrating sub-themes, developing particular points and rounding off with an appropriate conclusion

The second aspect which may affect the provision of information to the VSTEP.3-5 test takers is how much support is provided by the test. A simple way of increasing the level of difficulty of a test is to reduce the support to the test takers as the level of the test increases. However, that is not the policy adopted in the speaking section of the VSTEP.3-5. Support is provided to the VSTEP.3-5 test takers to ensure comparability between the performances of candidates across levels, since the concepts and the areas of lexis upon which they draw are similar. Besides, the support is there to make sure that the cognitive load associated with conceptualization does not affect the linguistic performance of the test takers.

The support to the VSTEP.3-5 test takers can be observed in two ways. The first is the prompts, which range from questions on familiar topics (part 1) to more abstract topics (part 3), situations (part 2), and a topic with a mind map (part 3).

- Part 1: Social Interaction (the examiner asks three to five questions, the test-taker answers the questions)

Topic 1: three questions of a topic of interest or familiarity to the targeted test takers

Topic 2: three questions of another topic of interest or familiarity to the targeted test takers

- Part 2: Solution Discussion (the examiner and test-taker discuss three options and select the best alternative)

The topics for discussion are familiar and concrete, relating to everyday speaking situations and situations that may occur at work or when travelling.

- Part 3: Topic Development (the test-taker develops a topic, using a given outline)

The topics and given outlines are familiar to the test takers. The level of complexity increases across the three additional questions, one of which is targeted at test takers of level 3, one at level 4, and one at level 5.

Another factor which plays an important part in assisting conceptualization is whether or not the speaker is given time to preplan what to say (in terms of general ideas, of the links between those ideas, or of the actual form of words to be used). Considering the format of the VSTEP.3-5, the test takers have three minutes to prepare for their talk: one minute to prepare for part 2 and two minutes to prepare for part 3 of the test.

Another factor that may affect the level of conceptualization is the level of difficulty of the language used to compose the three tasks of the VSTEP.3-5 speaking section. The difficulty of the structures and vocabulary is kept at level 3 or lower to make sure that test takers of proficiency level 3, level 4 or level 5 can understand the prompts and then conceptualize their ideas. In other words, the prompts of the test do not hinder the test takers from producing utterances. Only the vocabulary of the two additional questions targeted at levels 4 and 5 may be of level 4 and level 5.

5.2.1.2 Integrating utterances into a discourse framework

Amongst the factors which Levelt identifies as affecting both macro- and micro-planning are: awareness of the ongoing topic, thematization of new information, recognition of information shared and not shared with the listener, accommodation to the point of view and even the form of words of the interlocutor, and certain basic principles which determine how information is ordered. The VSTEP.3-5 speaking section takes into account all these factors, which are presented in the interlocutor frame and in the rating scale with the following criterion descriptors:

Band 3: Hardly expresses or develops his/her ideas

Band 4: Expresses his/her ideas with limited relevance to questions and cannot develop ideas without relying heavily on the repetition of the prompts

Band 5: Relevantly responds to questions and can develop ideas in a simple list of points, shows some attempts of idea elaboration

Band 6: Relevantly responds to questions and can develop ideas in a simple list of points; even though some attempts of idea elaborations (details and examples) are evident, they are either vaguely or repetitively expressed

Band 7: Relevantly develops ideas with relative ease, elaborating on ideas with some appropriate details and examples

Band 8: Relevantly develops ideas with relative ease, elaborating on ideas with many appropriate details and examples

Band 9: Relevantly develops ideas with ease, elaborating on ideas with appropriate details and examples

Band 10: Generally coherently develops ideas with elaborated details and examples and can round off with an appropriate conclusion

The above descriptors of the rating scale guide the interlocutors in how to handle highly formulaic language that does not sufficiently recognize the interlocutor as a participant in a dialogue; more importantly, the ability to relate utterances to a wider discourse representation is explicitly taken to be a determinant of successful performance.

The VSTEP.3-5 speaking section follows a one-to-one model, with one assessor/interlocutor and one examinee. As stated clearly in the construct, interaction is observed only between the interlocutor and the examinee, which to some extent is not close to actual interactive language use. One advantage of this testing model is that a test taker's performance is not affected by the performance of a counterpart; it is affected more by the co-operativeness of the interlocutor. As mentioned above, the co-operativeness of the interlocutor(s) refers to the fact that a sympathetic interlocutor will facilitate successful communication by ceding a degree of control over the interaction to the user/learner, e.g. in negotiating and accepting modification of goals, and in facilitating comprehension, for example by responding positively to requests to speak more slowly, to repeat, or to clarify. Features of interlocutors' speech are the characteristics of their voice, e.g. rate, accent, clarity and coherence. Visibility of interlocutors refers to the way the accessibility of paralinguistic features in face-to-face communication facilitates communication. General and communicative competences of interlocutors, including behaviour, refer to the degree of familiarity with norms in a particular speech community and to knowledge of the subject matter. All of these issues may place different levels of cognitive demand on the test takers.

CHAPTER VI: THE CALIBRATION OF COGNITIVE DEMANDS IN THE VSTEP.3-5 SPEAKING RATING SCALE

6.1. Many-facet Rasch Model

Analysis of the scores was conducted using Mini FACETS 3.82.2 by Mike Linacre (2012), applying the Rating Scale Model. The VSTEP.3-5 speaking test scores were analyzed in relation to the rating criteria and the raters. The total number of examiners was 31, and each examiner rated between 6 and 20 test takers. The rating scale includes five criteria, each with nine levels; level/band 1 was removed because observations in the Rating Scale Model run in Mini FACETS 3.82.2 are valid only when the highest category of the rating scale is category 9 (or lower).

The Rasch Grouped Rating Scale Model specifies the probability that a person of a certain ability is observed in a category of a rating scale specific to a group of items. Rasch models can be expanded to as many facets as needed; the central question is whether adding or reducing facets provides a better model of the observed data. Three facets are applied in this study, namely persons, items and raters, so the model specifies the probability that a person of a certain ability is observed by a judge of a certain leniency in a certain category of an item of a certain difficulty. Based on the item fit values (ranging from 0.7 to 1.4), the data fit the Rasch model to an acceptable extent; each item made a substantial contribution to the variance in the data and loaded very highly on only one of the rating variables. Table 7 presents the item facet summary statistics. The items (rating categories) cover a fairly narrow range along the ability being measured (-0.70 to 0.74 logits), which is to be expected: the main impetus of measurement is examinee ability, not item difficulty.
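For reference, the three-facet Rating Scale Model underlying this analysis can be written in its standard log-odds form (following Linacre's usual formulation); the notation below is generic and is not taken from the FACETS output itself:

\[
\log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
\]

where \(B_n\) is the ability of test taker \(n\), \(D_i\) the difficulty of rating criterion \(i\), \(C_j\) the severity of rater \(j\), and \(F_k\) the threshold of moving from category \(k-1\) to category \(k\) of the common rating scale.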

Table 7: Item facet summary of statistics

Item Measure Measure SE Infit Mean Square

The test takers were Vietnamese, aged from 20 to 45 years old, who took the same VSTEP.3-5 test in 2019. Most of the test takers participating in the study were university students, high-school students and teachers. Their ability measures spanned 5.65 to 7.17 logits with an adjusted SD of 1.74; the mean of the standard errors of the measures was 0.66, rather large due to the small number of items and ratings associated with each person. This range of logits was interpreted as indicating four levels of ability corresponding to lower than B1, B1, B2 and B2 high.

The data showed some disagreement among the raters. This is probably because operational constraints prevented each rater from encountering test takers at different ability levels: some examiners encountered only Level 5, for example, while others rated only Level 3. The Rating Scale Model showed that the bands of the different criteria of the rating scale should receive further research and development attention when it is operationally possible to expose each examiner to a wider range of test-taker ability. In particular, for vocabulary, bands 4, 5, 9 and 10 (bands 3, 4, 8 and 9 in Figure 12) should be studied more carefully; for fluency, bands 6 and 7 (bands 5 and 6 in Figure 12) should be reviewed. No significant difference was found among the logits estimated for the other descriptors of the same bands of different criteria.

Figure 12: All facets table of VSTEP.3-5 speaking scores of 288 test takers

6.2. Correlation study

Besides the Rasch analysis, correlations were computed among the scores of the four skills. Correlation results between the speaking scores and the writing, listening, reading and overall scores of those 288 test takers show an acceptable level of agreement among the skill scores, and the highest agreement between the speaking scores and the overall scores. The correlation between speaking and writing scores is 0.719; between speaking and reading scores, 0.705; between speaking and listening scores, 0.844; and between speaking and overall scores, 0.878. This means the VSTEP.3-5 test is ranking the test takers in a relatively similar manner across the four skills and between the skills and the overall scores.

Table 8: VSTEP.3-5 speaking section correlation data

Overall Speaking Listening Reading Writing
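As an illustration only, the Pearson correlations reported above could be reproduced from a score file with a short script such as the one below; the file name and column headers are assumptions for the sketch, not the actual data layout used in the study.

# Minimal sketch: Pearson correlations between speaking scores and the other
# VSTEP.3-5 scores, assuming a CSV file with one row per test taker.
import pandas as pd

scores = pd.read_csv("vstep_scores.csv")  # hypothetical file name

# Columns assumed: Overall, Speaking, Listening, Reading, Writing
for skill in ["Overall", "Listening", "Reading", "Writing"]:
    r = scores["Speaking"].corr(scores[skill])  # Pearson correlation by default
    print(f"Speaking vs {skill}: r = {r:.3f}")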

6.3. T-test study

Grouping the 288 test takers into four levels of overall English language proficiency, 85 are of level 5, 73 of level 4, 108 of level 3, and the remaining 22 are of lower than level 3 English language proficiency. Running t-tests on the two variables of speaking score and overall score for each of the four levels shows no significant difference between the speaking scores and the overall scores of the test takers at level 4, level 3 and below level 3; however, a significant difference was found for the test takers of level 5 English proficiency. The mean of the speaking scores of those test takers is 7.905, while the mean of their overall scores is 8.611, with a p value of less than 0.05. In effect, while the overall scores placed those test takers at level 5 of English proficiency, their speaking scores placed them at level 4. Thus the speaking score bands of 9 and 10 of all speaking criteria should be studied further, and the oral examiners should be informed of this pattern in the speaking scores.
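As a minimal sketch of the comparison reported above, a paired t-test between the speaking and overall scores of one proficiency group could be run as follows; the score lists are placeholders for illustration, not the study data.

import numpy as np
from scipy import stats

# Hypothetical parallel arrays: one speaking score and one overall score per
# test taker in a single proficiency group (values are placeholders).
speaking = np.array([7.5, 8.0, 7.0, 8.5, 7.5])
overall  = np.array([8.5, 8.7, 8.0, 9.0, 8.6])

# Paired (dependent-samples) t-test, two-tailed by default.
t_stat, p_value = stats.ttest_rel(speaking, overall)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
# A p value below 0.05 would indicate a significant difference between the
# group's speaking scores and its overall scores.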

Table 9: T-test result between VSTEP.3-5 overall scores and speaking scores (C1 students)*

*Variable 1: speaking score; variable 2: overall score

Table 10: T-test result between VSTEP.3-5 overall scores and speaking scores (B2 students)*

*Variable 1: speaking score; variable 2: overall score

Table 11: T-test result between VSTEP.3-5 overall scores and speaking scores (B1 students)

Hypothesized Mean Difference: 0
df: 92
t Stat: -8.487531217
