
DOCUMENT INFORMATION

Title: Cognitive Processes in Performing the IELTS Speaking Test
Author: Li-Shih Huang
Institution: University of Victoria
Subject: English Language Testing
Document type: Research report
Year: 2013
Location: British Columbia
Pages: 51
File size: 2.16 MB

Structure

  • Table of contents

    • Introduction from IELTS

  • Dr Gad S Lim

    • References

  • 1 INTRODUCTION

  • 2 RELATION TO THE EXISTING LITERATURE AND RESEARCH

    • 2.1 Defining strategic behaviours

    • 2.2 Strategic competence as part of the speaking construct

    • 2.3 Taxonomies and research on speaking strategies in the second-language acquisition (SLA) and language testing (LT) fields

    • 2.4 Stimulated retrospective recall as a data-gathering method

  • 3 RESEARCH DESIGN AND METHODOLOGY

    • 3.1 Guiding questions

    • 3.2 Research design and participants

      • Figure 1: Research design

    • 3.3 Research instruments

      • 3.3.1 Background questionnaire

      • 3.3.2 Pre-test language proficiency

      • Table 1: Participants’ characteristics (N = 40)

      • 3.3.3 IELTS Speaking Test

      • Table 2: A summary of task types in the IELTS Speaking Test

    • 3.4 Data collection procedures

    • 3.5 Data coding and analyses

      • 3.5.1 Data coding

      • 3.5.2 Sampling design matrix

      • Table 3: Factorial design matrix

      • 3.5.3 Dependent and independent variables

      • 3.5.4 Interaction

      • 3.5.5 Mixed model multivariate analysis of variance (MANOVA)

      • Table 4: Tests for normality of six dependent variables

      • Figure 2: Frequency distribution histograms of the dependent variables

      • Table 5: Test results for outliers

      • Figure 3: Matrix plot between six dependent variables

      • Table 6: Matrix of correlation coefficients between six dependent variables (N = 40)

      • Table 7: Test for sphericity of within-subject effects

      • Table 8: Levene's test for homogeneity of variance (N = 40)

  • 4 RESULTS

    • 4.1 Strategic behaviours

      • Table 9: Frequencies and percentages of strategy use for all three tasks combined (N = 40 for all strategies)

      • Figure 4: Comparison of mean (arcsine-transformed) scores for the use of strategies

    • 4.2 Multivariate effects

      • 4.2.1 Between-subjects effects

      • Table 10: Multivariate statistics

      • Table 11: Between-subjects effects

      • 4.2.2 Within-subjects effects

      • Table 12: Within-subjects effects

      • Figure 5: Interaction plots

      • Table 13: Top-five individual strategies by task

  • 5 SUMMARY AND DISCUSSIONS

    • 5.1 Summary of results

      • 5.1.1 Guiding question 1

      • 5.1.2 Guiding question 2

      • 5.1.3 Guiding question 3

      • 5.1.4 Guiding question 4

      • 5.1.5 Guiding question 5

    • 5.2 Empirical implications

    • 5.3 Methodological implications

      • Figure 6: Results of the power analysis

      • Figure 7: Results of power analysis for correlational analysis (correlation coefficient of .5)

  • 6 CONCLUSION

  • ACKNOWLEDGEMENTS

  • REFERENCES

  • APPENDIX 1: SAMPLE CODING SCHEME

  • APPENDIX 2a: DESCRIPTIVE STATISTICS BY CONTEXT, PROFICIENCY LEVEL, AND TASK

  • APPENDIX 2b: DESCRIPTIVE STATISTICS (NON-ARCSINE-TRANSFORMED) BY CONTEXT, PROFICIENCY LEVEL, AND TASK

  • APPENDIX 3: RESULTS OF REPEATED MEASURES ANOVA ON RATER SCORES

Content

Defining strategic behaviours

Strategic behaviours are defined as the intentional and goal-driven thoughts and actions that learners employ to manage their cognitive processes, with the aim of enhancing their language learning and use.

In this study, the strategic behaviours of test-takers (in testing contexts) and learners (in non-testing contexts) refer to the conscious thoughts and actions directly associated with the test-taking or task-performance process that are used to acquire or manipulate information.

Strategic behaviours thus refer to the observable actions and reported thoughts of test-takers and learners, gathered through verbal reports. The use of these strategies is closely associated with cognitive processes, highlighting the connection between how learners approach tasks and their underlying mental activities.

Cognitive processes, a concept rooted in cognitive psychology, encompass the ways in which sensory information is transformed, reduced, elaborated, stored, retrieved, and used (Neisser, 1976). To enhance their performance on tests or tasks, learners employ specific strategies: intentional thoughts and behaviours designed to manage these cognitive processes effectively.

This study explored the strategic behaviors employed by test-takers and learners while completing the three IELTS speaking tasks in both testing and non-testing environments.

Strategic competence as part of the speaking construct

LT researchers have ongoing concerns about the various sources of variability that may influence performance on language tests (e.g., Bachman, 1990; Bachman and Purpura, 1999; Shaw and Weir, 2007). Even though researchers and theorists view the L2 communicative construct as multidimensional (e.g., Bachman, 1990; Bachman and Palmer, 1996; Purpura, 1998; Wesche, 1987), as pointed out by Kunnan (1998) and Douglas (2000), research has not yet provided evidence for the specific components and processes that define this multidimensional construct, nor has it demonstrated how these components interact during language use. A key aspect of this construct is the strategies employed by test-takers and learners.

Strategic competence, a key aspect of Canale and Swain’s (1980) framework of communicative competence, refers to learners’ and speakers’ ability to effectively employ communication strategies to overcome communication breakdowns.

Since then, Canale (1983), Bachman (1990), Bachman and Palmer (1996), Douglas (1997) and Fulcher (2003) have all further discussed and expanded this component to include various strategic components (see Swain et al., 2009). Much systematic research has examined the construct validation of the concept of communicative competence in L2 education (e.g., Bachman and Palmer, 1996; Harley, Cummins, Swain and Allen, 1990; Jamieson et al., 2000; Milanovic et al., 1996; Palmer, Groot and Trosper, 1981; Swain, 1985; Wesche, 1981).

Whether it is termed Canale and Swain’s (1980) communicative competence framework, Bachman’s (1990) and Bachman and Palmer’s (1996) communicative language ability model, or the social-cognitive construct representation (see Chalhoub-Deville, 2003), strategic competence remains critical and has been recognised as interacting with other components of communicative competence (Swain et al., 2009).

While it is acknowledged that strategies and their interaction with tasks can influence performance, and that the strategies employed by test-takers can shed light on test validity, there is still a gap in research regarding the strategic aspects of speaking. Additionally, the specific nature of strategic competence in the contexts of second language acquisition (SLA) and language testing (LT) remains underexplored.

Cognitive validity, as highlighted by Messick (1989), emphasises the importance of verifying that test-takers engage in the assumed cognitive processes when responding to test items. It is crucial for test developers and users to ensure that the skills and abilities assessed align with the actual processes used by test-takers, avoiding construct-irrelevant variance in scores. For instance, a speaking task in a testing environment may prompt different oral language production than in a non-testing situation, as test-takers might prioritise accuracy or fluency over effective communication; this shift in focus can lead to the use of different strategies to achieve their communicative goals.

The context in which testing occurs can significantly affect performance, suggesting that oral language production and strategic behaviours may vary between testing and non-testing situations. This raises critical questions about the assessment of learners' communicative competence and the degree to which test performance reflects cognitive processes similar to those in real-world interactions. As Douglas (2000) noted, validation is essential in understanding these dynamics.

The process of gathering and presenting diverse types of evidence is essential for understanding the true purpose of a particular test (Chalhoub-Deville, 2001, p. 258). Researchers and test developers are encouraged to broaden their test specifications to encompass the underlying knowledge and skills related to the language construct (Chalhoub-Deville, 2001, p. 225). The strategic behaviours exhibited by test-takers during assessments are, moreover, a critical source of construct-validity evidence (Bachman, 2002; Chalhoub-Deville, 2001; McNamara, 1996), and the subject warrants ongoing, rigorous, and in-depth investigation.

Taxonomies and research on speaking strategies in the second-language acquisition (SLA) and language testing (LT) fields

Since the foundational work of Rubin (1975) and Stern (1975), various researchers have sought to classify learner strategies, leading to overlapping categories across different taxonomies (Nakatani, 2006; O’Malley and Chamot, 1990; Oxford, 1990, 2011; Rubin, 1987; Stern, 1992; Wenden and Rubin, 1987). While there is some agreement on categorising these strategies, achieving a unified theoretical framework remains contentious (Cohen, 2011; Cohen and Macaro, 2008; Macaro, 2006). In testing contexts, distinctions have been made between construct-relevant and construct-irrelevant strategies (Allan, 1992; Cohen, 2012), with critiques regarding the vague definitions and research tools used in categorising learner strategies (Dornyei, 2005; Gao, 2007; Tseng et al., 2006). Key issues include the combination of strategies, the multifunctionality of single strategies, overlapping individual strategies, and the potential subdivision of strategies into sub-strategies (Cohen, 2007, 2012; Dornyei, 2005; Nikolov, 2006; Rose, 2012).

In the 1970s, significant research in the field of second language acquisition (SLA) focused on descriptive studies that identified various learner strategy types and their frequencies, as highlighted by scholars such as Rubin (1975) and Naiman et al. (1978).

Since the 1980s, there has been a significant shift in emphasis from product-oriented approaches to process-oriented strategies in second language acquisition (SLA) This transition has sparked considerable interest in understanding cognitive processing and the utilization of strategies in language learning, as highlighted by various studies (Cohen, 1984; Cohen and Aphek, 1981; Homburg and Spaan, 1981; O’Malley and Chamot, 1990; Wenden and Rubin, 1987).

By the 1990s, research had established the role that learner strategies play in making language learning more efficient and successful (e.g., O’Malley and Chamot, 1990; Oxford, 1990). Studies also have shown a positive association between proficiency level and the use of certain types of strategies, for example, metacognitive (e.g., Flaitz and Feyten, 1996), cognitive (e.g., Oxford and Ehrman, 1995), compensation (Dreyer and Oxford, 1996), and social-affective strategies (Nakatani, 2006).

In the area of speaking, several studies have addressed how learner strategies can help learners develop their oral communication ability (e.g., Cohen and Olshtain, 1993; Cohen, Weaver and Li, 1996; Dadour, 1995).

Since Bormuth's (1970) call for increased focus on test-taker responses in first-language assessments, numerous studies have explored the strategies and processes employed by test-takers in language testing (e.g., Bachman, Perkins and Cohen, 1991; Buck, 1991; Cohen, …). Research on the interplay between language proficiency, strategic behaviours, and speaking performance nevertheless remains limited, with findings showing inconsistent relationships between proficiency levels and the use of strategies (Yoshida-Morise, 1998; Cohen, 2011).

Despite previous research, there remains a lack of studies examining the strategic behaviours of test-takers and learners during IELTS-like speaking tasks. Additionally, the connections between language proficiency, reported strategic behaviours, and speaking performance in both testing and non-testing environments have not been explored.

Despite the growth of learner strategy research over the past four decades, the relationship between strategy use and specific tasks and contexts has only recently gained attention, highlighting the need for substantial empirical evidence to advance the field (Macaro, 2006). Studies examining variations in tasks and contexts have demonstrated that both language performance and strategy use can vary significantly depending on the task at hand (e.g., Bachman, …; Swain et al., 2009). Findings from previous research have also suggested that less-proficient L2 learners tend to use the same strategies repeatedly, whereas more-proficient L2 learners draw on a greater variety of strategies to accomplish the different language tasks at hand (see …).

The effectiveness of strategy use is influenced by the specific task, context, and individual learner characteristics (Anderson, 2005) While the strategies themselves remain consistent, the varying demands of different tasks lead to differences in how learners apply these strategies (Macaro, 2006).

This study examined the speaking strategies employed during the IELTS speaking tasks, encompassing both testing and non-testing situations. Recognising that responding to language assessments calls on various strategies, such as those for language learning, language use, and test-taking, the analysis used a strategy classification scheme derived from the existing theoretical and empirical literature on L2 use and communication strategies, as referenced in works by Cohen and Upton (2006), Fulcher (2003), and others.

In this study, the analysis of test-takers’/learners’ strategic behaviours included the following six major categories:

(a) approach strategies (i.e., orienting oneself to the speaking task)

(b) communication strategies (i.e., involving conscious plans for solving a linguistic problem to reach a communication goal)

(c) cognitive strategies (i.e., manipulating the target language for understanding and producing language)

(d) metacognitive strategies (i.e., examining the learning process to organise, plan, and evaluate efficient ways of learning)

(e) affective strategies (i.e., involving self-talk or mental control over affect)

(f) social strategies (i.e., interacting with others to improve language learning/use)

The current study analysed all strategic behaviours used by participants during the IELTS speaking tasks, for two main reasons. First, to enable comparison with previous research, a comprehensive coding scheme was developed that synthesises individual strategies and categories from the existing literature. Second, this inclusive approach allows for the investigation of how specific strategies influence oral production. If certain strategies deemed irrelevant to the task are frequently employed, this highlights the need for careful test construction to mitigate the risk of test-wiseness, that is, responding “without going through the expected cognitive processes” or “without engaging the second language knowledge and performance ability” (e.g., Cohen, 2012, p. 264; Yang, 2000).

This study examined both observable and reported strategic behaviours, as theoretically and operationally defined above, in performing the IELTS speaking tasks. Strategic behaviours, encompassing the so-called test-management strategies, defined as the consciously selected processes that aid in generating responses (Cohen, 2012, p. 263), play a crucial role in effective communication. While some may view such strategies as unrelated to the construct being assessed, they are in fact integral to essential skills such as organising thoughts, managing time, and engaging with the interlocutor's interests. These skills are vital for a speaker's ability to articulate opinions and participate in dialogue, whether in testing scenarios, simulated environments, or everyday interactions.

Stimulated retrospective recall as a data-gathering method

A large body of research in the area of learners’ and test-takers’ strategies has used questionnaires to elicit learners’ strategic behaviours (e.g., Phakiti, 2003; Purpura, 1999; Taguchi, 2001; Yoshizawa, 2002). However, strategies identified through generic questionnaire items may not accurately represent learners' actual strategic behaviours when they face specific tasks in particular research or language contexts.

Methodologically, to enhance the quality of the data, this study went beyond the common self-report or questionnaire-based methods used to gather strategy-related data. As Macaro (2006) pointed out, “Questionnaires and inventories provide the broad picture; verbal reports (think-aloud techniques and task-based retrospectives) effectively yield insights into skill-specific or task-specific strategy use” (p. 321, emphasis mine).

Since the 1980s, verbal reports have served as a key research method for collecting data on the strategic behaviours of learners and test-takers. In the realm of second language (L2) studies, various verbal reporting techniques, including introspective, immediate retrospective, and delayed retrospective approaches, have been used extensively (Cohen, 1998, 2012; Ericsson and Simon, 1993; Gass and Mackey, 2000). For example, diaries or dialogue journals and verbal reports have been used extensively by L2 strategy researchers (e.g., Anderson and Vandergrift, 1996; Bowles and Leow, 2005; Carson and Longhini, 2002; Halbach, 2000; Schmidt and Frota, 1986; Phakiti, …).

Learners’ introspection or retrospection may not provide a complete picture of any particular process and, as thoroughly examined by researchers across disciplines, is not without criticisms (e.g., Cohen, forthcoming; Ericsson and Simon, 1993; Gass and Mackey, 2000; …).

Researchers have nonetheless highlighted the importance of verbal reports for understanding what a test actually measures: such reports provide insight into cognitive processes that are not evident from test scores or observational data, and analysing these underlying processes helps researchers understand how test-takers approach problem-solving and task performance.

Ericsson and Simon's (1993) comprehensive review of numerous studies showed that, when applied correctly, verbal protocol analysis is a valid and effective method. Similarly, Macaro (2006), in his review of learner strategy research, concluded that the methodology for capturing learner strategy use demonstrates an acceptable level of validity and reliability.

Guiding questions

This study was guided by the following inter-related research questions:

1 Strategic behaviours: When participants perform the IELTS speaking tasks, what strategic behaviours do they report that they employ to regulate their cognitive processes in testing and non-testing situations?

2 Strategic behaviours vis-à-vis contexts: Is there a difference in participants’ reported strategic behaviours between testing and non-testing situations?

3 Strategic behaviours vis-à-vis proficiency levels: When participants perform the IELTS speaking tasks, are there differences in their reported strategy use between advanced versus intermediate participant groups in testing and non-testing situations?

4 Strategic behaviours vis-à-vis task types: Are there differences in reported strategy use in performing the three IELTS speaking tasks in testing and non-testing situations?

5 Strategic behaviours vis-à-vis oral production: What is the relationship between participants' reported and observed strategic behaviours in testing and non-testing situations and their oral language production scores?

Research design and participants

The study examined four groups of international English-as-an-additional-language (EAL) students in British Columbia, Canada, with 10 participants per group, for a total of 40 participants. Figure 1 illustrates the overall design of the study.

Subgroups A and B consisted of international EAL students with advanced and intermediate English language proficiency, respectively, who took the IELTS Speaking Test in a simulated testing environment. Subgroups C and D comprised two further groups of international EAL students at the same advanced and intermediate levels, who performed the same speaking tasks from the IELTS Speaking Test in a non-testing situation.

Subgroups A and B completed the IELTS Speaking Test in a simulated testing environment with an IELTS-certified examiner adhering to official guidelines. Subgroups C and D engaged in the same speaking tasks outside of a formal testing context: in a language-learning environment, an IELTS-certified examiner, who was also the participants' current or recent language teacher, used identical speaking practice tasks with the participants.

Participants in both the testing and non-testing groups received specific instructions before the testing and practice sessions, emphasising the importance of treating their tasks appropriately. Each individual was reminded prior to each of the three speaking tasks to approach the subsequent task as they would in a formal testing environment (for the testing group) or in a language-learning context (for the non-testing group). During the final think-aloud session, participants were asked to confirm whether they had adhered to the requested performance standards.

The sample size for this study was selected to manage costs effectively, maintain a balance between the number of variables and subjects for valid statistical analysis, and to gather comprehensive insights into strategy use from each participant.

The study concentrated on participants whose native language is Mandarin Chinese in order to maximise the information gathered. By permitting participants to select their preferred language during the stimulated recall process, the research aimed to facilitate better expression of their thoughts.

The study focused on participants fluent in a language familiar to the principal investigator in order to strengthen conclusions with the available resources. It also addressed the representativeness of the sample, noting that, historically, Chinese-speaking individuals have formed the largest group of international students at the university from which the sample was selected.

In North America, this group represents one of the largest cohorts of examinees in English-language proficiency testing. As summarised in Table 1, the participants predominantly majored in finance or business, with only one exception.

Research instruments

Background questionnaire

A comprehensive questionnaire was administered to the four participant groups to gather demographic information, including age, gender, language proficiency, educational background, duration of residence in English-speaking countries, and prior experience with the IELTS Speaking Test, including scores. All participants completed the questionnaire prior to the language proficiency pre-test. Notably, each participant had previously taken the IELTS Speaking Test, with scores ranging from 5 to 6.5 and an average score of 5.8.

Pre-test language proficiency

Prior to the study, participants' oral proficiency was evaluated by two experienced examiners from various language schools, using a modified version of Swain et al.'s (2009) pre-test. The adaptation required participants to narrate a story based on pictures, while the remaining components and the timing for preparation and responses remained unchanged. The pre-test aimed to select participants at appropriate proficiency levels for a familiarisation test conducted one week before the main speaking tasks. During the pre-test, administrators read the instructions and questions aloud, timing preparation and response durations according to the established guidelines.

Characteristic | Testing group | Non-testing group | Overall
Age (years) | M = 23.5, SD = 2.26 | M = 24.2, SD = 2.69 | M = 23.9, SD = 2.48
English language learning (years) | M = 10.92, SD = 1.79 | M = 10.85, SD = 3.82 | M = 10.88, SD = 2.95
Length of stay in English-speaking countries (months) | M = 22.05, SD = 19.37 | M = 26.5, SD = 17.33 | M = 24.3, SD = 18.31

Table 1: Participants’ characteristics (N = 40)

IELTS Speaking Test

Two versions of the IELTS Speaking Test were used.

Participants received a version of the test to familiarize themselves with the task types, while the scores helped validate the pre-test results and categorize learners into two proficiency levels for data analysis.

The study categorized respondents into two groups based on their IELTS scores: the intermediate group, comprising those who scored 6.0 or below, and the advanced group, consisting of individuals who scored above 6.0 This classification aligns with the institutional admissions requirement that mandates a minimum IELTS score of 6.0 for entry.

The other version was used for the main study for both the testing and non-testing groups. The mean scores for the tests administered in sessions 1 and 2 were similar (…, SD = 0.54). By context, the scores also were similar in both situations (testing, familiarisation: M = 6.40, SD = 0.51; testing, main: M = 6.3, SD = 0.50; non-testing, familiarisation: M = 6.41, SD = 0.48; non-testing, main: …).

The testing duration for the main study was extended to 11 to 14 minutes to allow for immediate stimulated recall following each of the three speaking tasks, as detailed in Table 2.

Task | Description | Timing
1 | Answer questions about themselves and their families | …
2 | Speak about a topic | 1 min preparation; 2-3 min
3 | Engage in a longer discussion on the topic in Task 2 | …

Table 2: A summary of task types in the IELTS Speaking Test

Task 1 of the IELTS Speaking Test involves asking test-takers to respond to general questions about themselves (e.g., their homes, families, jobs, studies and interests) and a range of everyday familiar topics. Task 2 involves having test-takers talk on a particular topic for one to two minutes, with one minute of preparation time. The examiner then asks one or two questions to conclude this portion of the test. Task 3 involves a discussion of more abstract issues, which are linked to Task 2, with a similar set of directive prompts or input.

Data collection procedures

Prior to the two main data collection sessions, the research assistant explained the study's purpose to interested participants in accordance with university ethical guidelines. Participants were asked about their IELTS test-taking experiences and scores to confirm that they met the selection criteria, which required them to be Chinese first-language speakers and university-level students with intermediate or higher English proficiency. Those who met these criteria were then scheduled for the initial data collection session, in which the following procedures were carried out.

1 Forty participants received a comprehensive overview of the study's purpose and their responsibilities during the two data collection sessions. They were encouraged to ask any questions that may have arisen since their recruitment, in accordance with the university's ethical guidelines. Participants then provided their informed consent to take part in the study.

2 Each participant completed the background questionnaire

3 Each participant completed a 10-minute pre-test proficiency assessment

4 Participants were individually administered a version of the IELTS Speaking Test that served to familiarise them with the task types to be expected in the following week

Each participant took part in a practice stimulated recall session following the final speaking task. Importantly, the time frame for the test remained consistent throughout the administration of the three speaking tasks.

During the practice stimulated recall session after Task 3, the examiner recorded the scores for each participant

During the week between data-collection sessions 1 and 2, the principal investigator provided training to the research assistants on refining the questions and procedures used in the stimulated recall sessions. During the same week, three independent raters evaluated audio clips from the 10-minute language proficiency pre-test to assess each participant's proficiency in spoken English. (The three raters each had graduate degrees and professional experience in English language teaching.)

The oral-language production scores from the familiarisation testing session were used to validate the pre-test proficiency assessment results and to categorise learners into two proficiency levels for the data analysis. Because some participants were already acquainted with one of the certified examiners, they were assigned to the non-testing group, while the remaining participants were placed in the testing group. Each participant's schedule for the following week was then organised and confirmed via email and phone.

1 The testing group, i.e., 20 participants, were formally administered another version of the IELTS Speaking Test in the same standardised manner as the familiarisation test in Session 1: the examiner followed a set script for instructions and prompts, ensuring consistency in the management of the speaking test.

2 The non-testing group, i.e., 20 participants, were given speaking tasks that mirrored those of the IELTS Speaking Test, in the same format as for the testing group. These tasks were conducted without the pressure of formal assessment, as outlined in Section 3.2.

3 All participants engaged in verbal reports through a process of stimulated recall immediately after performing each task (Bowles, 2010; Ericsson and Simon, 1993; Gass and Mackey, 2000, 2012; Green, 1998). While the participants engaged in the stimulated recall, both examiners rated and recorded each participant’s spoken performance, using IELTS’s official scoring criteria, before proceeding to the next task.

The entire process for both groups was documented using two cameras and a digital audio recorder, to enable stimulated recall and to mitigate potential technical issues. The stimulated recall session took place in the participants' native language immediately after each of the three IELTS speaking tasks, allowing quick access to their short-term memory and accurate reporting of their responses (Ericsson and Simon, 1993; Jourdenais, 2001).

Data coding and analyses

Data coding

To protect participant identities, all video and audio clips were systematically organised and renamed with numbers. The stimulated recall sessions of the 40 participants, totalling 120 clips, were transcribed with meticulous attention to detail. During the data-coding phase, both the stimulated recall sessions and the oral-production data from participants' performances on the IELTS speaking tasks were carefully analysed. The coding scheme for participants' strategic behaviours was adapted from established classification systems in the L2 learning and language teaching literature. This scheme comprises six strategy categories: approach, cognitive, communication, metacognitive, affective, and social. Definitions and examples of these strategies, presented in both Chinese and English, can be found in Appendix 1.

This is the first study to incorporate the coding of oral-production data: the principal investigator (PI) and a research assistant (RA) independently coded 100% of the data from the stimulated recall sessions and the oral-production data for strategic behaviours. Unlike the stimulated recall data, which were fully transcribed, the oral-production data were coded directly from the recordings without transcription, to contain costs and to focus on observable strategy use not reported by participants. The coding process involved verifying participant-reported behaviours and resolving ambiguities, with additional reliability checks conducted by two further research assistants who coded 60% and 30% of the data, respectively. The inter-coder agreement was 96% between the PI and the RA, 92% among the PI and RAs 1 and 2, and 91% among the three coders. All coding disagreements were discussed until they were resolved.

Sampling design matrix

A multifactorial experimental design was employed for the statistical analyses involving the 40 participants, incorporating two fixed factors, context and proficiency level, which generated between-subjects effects, and three repeated measures (tasks), which generated within-subjects effects. The design matrix was balanced, with an equal number of participants (n = 10) in each cell, as illustrated in Table 3.

Context | Proficiency level | Task 1 | Task 2 | Task 3
Non-testing | Advanced | n = 10 | n = 10 | n = 10
Non-testing | Intermediate | n = 10 | n = 10 | n = 10
Testing | Advanced | n = 10 | n = 10 | n = 10
Testing | Intermediate | n = 10 | n = 10 | n = 10

Table 3: Factorial design matrix

The 40 participants were divided into two mutually exclusive groups according to context, termed non-testing (n = 20) and testing (n = 20). Each context was further sub-divided into two mutually exclusive subgroups according to proficiency level, termed advanced (n = 10) and intermediate (n = 10).

The multifactorial model was designed to allow for comparisons across different proficiency levels and contexts, ensuring that each proficiency level was consistently represented within each context.

Dependent and independent variables

In this study, repeated measures were conducted on six dependent variables, which included both reported and observed strategies, to evaluate the performance of three IELTS speaking tasks (task 1, task 2, and task 3) All participant groups completed the same set of tasks, while the independent variables comprised four groups stratified by context (testing vs non-testing) and proficiency level (intermediate vs advanced).

The six dependent variables used to measure strategy use were the following strategies: affective, approach, cognitive, communication, metacognitive, and social

Each strategy-use score was constructed by

(1) calculating a total strategy-use frequency score, by adding up the individual participants' scores for the use of individual strategies within each strategy category;

(2) computing the total individual strategy-use frequency score for each strategy category (i.e., the sum of the scores for affective, approach, cognitive, communication, metacognitive, and social) reported by each participant;

(3) converting each strategy-use score into a proportion of the total strategy-use score; and

(4) performing arcsine transformations of the proportions.

Arcsine transformations were needed for the parametric statistical analysis because (1) proportions are not continuous but are restricted in range from 0 to 1; and (2) proportions often exhibit a skewed rather than normal distribution, typically clustering at either extreme of the range. The arcsine transformation stretches out extreme proportions near 0 and 1 while compressing those around 0.5, resulting in a more centralised distribution. This transformation enhances the fit of the data to the statistical models, allowing for more accurate inferences, including about interaction effects. A drawback of the arcsine transformation is that its descriptive statistics can be difficult to interpret, because they are expressed in radians (Tabachnik and Fidell, 2007).
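
To make these score-construction steps concrete, here is a minimal sketch in Python; the frequency counts and category labels are illustrative assumptions, not the study's data, and the transformation shown is the standard arcsine(square-root) form.

```python
import numpy as np

# Illustrative frequency counts for one participant on one task (not the study's data).
freq = {"affective": 3, "approach": 2, "cognitive": 2,
        "communication": 8, "metacognitive": 10, "social": 1}

total = sum(freq.values())                                 # total strategy-use frequency
proportions = {k: v / total for k, v in freq.items()}      # step 3: proportions of the total

# Step 4: arcsine (square-root) transformation, expressed in radians.
# It spreads out proportions near 0 and 1 and compresses those near 0.5.
arcsine = {k: float(np.arcsin(np.sqrt(p))) for k, p in proportions.items()}

for category in freq:
    print(f"{category:15s} p = {proportions[category]:.3f}  arcsine = {arcsine[category]:.3f}")
```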

Interaction

The study evaluated interaction effects among the factors, acknowledging that the level of one factor could influence the effect of another. These interactions, which involve two or more factors acting in combination, indicate that participants' strategy use did not vary in parallel across the levels of the factors. The specific interactions tested were context x level, context x task, task x level, and context x level x task, leading to the formulation of the following null hypotheses:

H01: There were no between-subjects effects (i.e., the mean scores for the dependent variables were not significantly different among the four groups of participants stratified by context and proficiency level).

H02: There were no within-subjects effects (i.e., the mean scores for the dependent variables were not significantly different across the three tasks performed consecutively by all participants).

H03: There were no significant interactions among context, task, and proficiency level.

Mixed model multivariate analysis of variance (MANOVA)

The study employed a mixed model multivariate analysis of variance (MANOVA) to evaluate the mean arcsine-transformed scores of the four participant groups across six strategy types (affective, approach, cognitive, communication, metacognitive, and social) in testing and non-testing contexts. This statistical approach was appropriate for the research questions, because MANOVA examines the effect of categorical independent variables on multiple dependent variables measured at the scale or interval level, by generating a composite dependent variable as a linear combination of the inter-correlated dependent variables. For the six dependent variables V1, V2, V3, V4, V5, and V6, the new composite dependent variable Vn is: Vn = a1V1 + a2V2 + a3V3 + a4V4 + a5V5 + a6V6. The coefficients a1 to a6 are calibrated to provide maximal differences in Vn with respect to the effects of the specified factors.

The advantage of using MANOVA is that differences between mean values that may not be identified when the dependent variables are tested individually may be revealed by the linear combination of the dependent variables. In addition, MANOVA safeguards against inflation of the Type I error rate, which arises when multiple hypothesis tests are performed and the null hypothesis is falsely rejected. The probability of committing at least one Type I error increases with the number of tests and is given by 1 − (1 − α)^k, where α is the significance level and k is the number of tests. For instance, conducting six tests at α = .05 yields a Type I error probability of 1 − (1 − .05)^6 ≈ .265, meaning that roughly one in four such test batteries could yield a misleading result by chance. Prior to running the MANOVA, diagnostic tests were conducted to verify that the data met the necessary theoretical assumptions.
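
As an illustration only, the following Python sketch runs a between-subjects MANOVA with statsmodels on simulated data; the variable names, factor labels, and values are assumptions for demonstration, and the sketch covers only the between-subjects factors, not the repeated-measures (task) component of the study's mixed model.

```python
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(0)
n = 40  # one row per participant

# Simulated arcsine-transformed strategy-use scores (radians) for five categories,
# plus the two between-subjects factors (context and proficiency level).
df = pd.DataFrame({
    "affective":     rng.normal(0.10, 0.08, n),
    "approach":      rng.normal(0.09, 0.07, n),
    "cognitive":     rng.normal(0.08, 0.07, n),
    "communication": rng.normal(0.32, 0.13, n),
    "metacognitive": rng.normal(0.38, 0.14, n),
    "context": np.repeat(["testing", "non_testing"], n // 2),
    "level":   np.tile(np.repeat(["advanced", "intermediate"], n // 4), 2),
})

# Multivariate test of context, level, and their interaction on the linear
# combination of the five dependent variables (Wilks' lambda, Pillai's trace, etc.).
fit = MANOVA.from_formula(
    "affective + approach + cognitive + communication + metacognitive"
    " ~ context * level",
    data=df,
)
print(fit.mv_test())
```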

Statistical significance was determined by comparing p values from the inferential test statistics against a significance level of α = .05. While p values indicate whether findings are likely due to random chance, they do not necessarily reflect the importance or practical implications of the results; statistical significance does not equate to practical significance.

Effect sizes were calculated to convey the practical significance of the findings, that is, the strength of the relationships between variables. Many education and psychology researchers advocate moving away from an overreliance on p values and prioritising effect sizes for a more meaningful interpretation of results (Ferguson, 2009; Hill and Thompson, …).

In this study, we utilized eta squared (η²) as the effect size measure, which indicates the proportion of variance explained Unlike p values, effect sizes are independent of sample size, providing a more stable and informative measure of the strength of relationships in the data (Kotrlik & Williams, 2003; Kline, 2004; Kraemer et al., 2003).

Applying Ferguson’s (2009) criteria, η² = .04 indicated a minimal effect; η² = .25 indicated a moderate effect; and η² = .64 indicated a strong effect.
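
For reference, eta squared is the proportion of total variability attributable to an effect; a standard formulation (stated here generally, not taken from this report) is:

```latex
\[
  \eta^{2} \;=\; \frac{SS_{\text{effect}}}{SS_{\text{total}}}
\]
```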

As with all inferential statistical tests, the sample size affects the p values of the MANOVA test statistics. With a small sample size, there may be insufficient power to reject the null hypothesis, and a Type II error could occur (i.e., the null hypothesis is falsely not rejected when, in fact, it should be rejected). According to Hair et al. (2010, p. 453), with respect to MANOVA, “As a bare minimum, the sample size in each cell (group) must be greater than the number of dependent variables. As a practical guide, a recommended minimum cell size is 20 observations.”

The study had a sample size of 10 participants per group, which exceeded the number of dependent variables (6). However, this sample size was only half of the recommended minimum of 20, potentially leading to insufficient statistical power for the MANOVA analysis.
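
As a rough, hedged illustration of such a power check, using a one-way ANOVA approximation rather than the study's mixed MANOVA, and an assumed medium effect size, one could compute achieved power with statsmodels:

```python
from statsmodels.stats.power import FTestAnovaPower

# Assumptions for illustration only: four groups (2 contexts x 2 proficiency levels),
# 40 participants in total, alpha = .05, and a medium effect size (Cohen's f = 0.25).
# This approximates power for a one-way ANOVA, not for a mixed-model MANOVA.
power = FTestAnovaPower().solve_power(
    effect_size=0.25,  # Cohen's f
    nobs=40,           # total observations
    alpha=0.05,
    k_groups=4,
)
print(f"Approximate achieved power: {power:.2f}")
```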

In the context of MANOVA, while each dependent variable ideally should follow a normal distribution, the method is robust against deviations from normality, provided that the factorial design is balanced and the deviations stem from skewness rather than outliers. According to Hair et al. (2010), MANOVA is relatively insensitive to the shape of the frequency distribution. This study used Kolmogorov-Smirnov (K-S) statistics to assess normality, revealing that only the social strategies variable deviated significantly from normality at α = .001 (K-S = 2.763, p < .001).
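
A minimal sketch of such a normality screen in Python is shown below; the data are simulated, the SPSS-style K-S value reported above is on a different scale from scipy's D statistic, and estimating the mean and SD from the sample itself strictly calls for a Lilliefors-type correction.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(0.38, 0.14, 120)  # simulated scores for one dependent variable

# One-sample Kolmogorov-Smirnov test against a normal distribution whose
# parameters are estimated from the sample (illustrative only).
d_stat, p_value = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))
print(f"K-S D = {d_stat:.3f}, p = {p_value:.3f}")
```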

Figure 2 illustrates the combined frequency counts of the six dependent variables across participant groups, revealing a skewed distribution for social strategies. While the distributions for approach, communication, cognitive, metacognitive, and affective strategies do not perfectly align with a normal distribution, they are sufficiently close to normality to support the application of parametric statistical methods.

Outliers, defined as extreme values that do not represent the sample, can introduce significant bias into parametric statistics, often more so than deviations from normality. The MANOVA computational formulas rely on sums of squares, making them susceptible to distortion from outliers (Hair et al., 2010; Huberty and Olejnik, 2006). In this study, univariate outliers were assessed by calculating Z scores, which measure how many standard deviations each dependent variable deviates from its mean. No outliers were found for the approach, communication, metacognitive, and affective variables, as all Z scores fell within the acceptable range of ± 3.3 (Tabachnik and Fidell, 2007).
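
A small sketch of this Z-score screen follows; the data are simulated, and the ± 3.3 cutoff simply follows Tabachnik and Fidell (2007) as cited above.

```python
import numpy as np

def zscore_outliers(values, cutoff=3.3):
    """Return the indices of values whose standardised (Z) score exceeds |cutoff|."""
    x = np.asarray(values, dtype=float)
    z = (x - x.mean()) / x.std(ddof=1)
    return np.flatnonzero(np.abs(z) > cutoff)

# Simulated scores for one dependent variable, with one deliberately extreme value appended.
rng = np.random.default_rng(2)
scores = np.append(rng.normal(0.32, 0.05, 119), 0.95)
print(zscore_outliers(scores))  # expected to flag the appended extreme observation
```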

Variable | Affective | Approach | Cognitive | Communication | Metacognitive | Social
n | 120 | 120 | 120 | 120 | 120 | 120

Note: * Significant deviation from normality at α = .001

Table 4: Tests for normality of six dependent variables

Note: APP = approach; COM = communication; COG = cognitive; METACOG = metacognitive; SOC = social; AFF = affective

Figure 2: Frequency distribution histograms of the dependent variables

The cognitive strategy variable, with a maximum Z score of 3.312, was deemed acceptable for the MANOVA analysis despite being near the limit. Such slight deviations from normality are not expected to affect the results significantly, so the statistical inferences remain robust (Hair et al., 2010).

Previous simulation studies have shown that the false positive rate is minimally affected by violations of distribution assumptions (McDonald, 2009). However, a significant positive outlier with a Z score of 4.156 was identified in the social strategy variable, suggesting that it should be excluded from further analysis.

Table 5: Test results for outliers

In a MANOVA model, the multiple dependent variables are expected to be inter-correlated. A matrix plot (Figure 3) illustrates the linear relationships among the six variables (affective, approach, cognitive, communication, metacognitive, and social), with simple linear regression trend lines. The analysis revealed seven significant Pearson’s correlation coefficients (r ranging from -.169 to -.617), supporting the combination of these variables in the MANOVA analysis.

Note: AFF = affective; APP = approach; COG = cognitive; COM = communication; METACOG = metacognitive; SOC = social

Figure 3: Matrix plot between six dependent variables

Variable | Affective | Approach | Cognitive | Communication | Metacognitive | Social

Table 6: Matrix of correlation coefficients between six dependent variables (N = 40)
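
To illustrate this multicollinearity check, a brief sketch computing the pairwise Pearson correlations among the dependent variables with pandas is given below; the data are simulated, and the column names mirror the strategy categories as an assumption.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 40

# Simulated arcsine-transformed strategy-use scores (one column per category).
df = pd.DataFrame({
    "affective":     rng.normal(0.10, 0.08, n),
    "approach":      rng.normal(0.09, 0.07, n),
    "cognitive":     rng.normal(0.08, 0.07, n),
    "communication": rng.normal(0.32, 0.13, n),
    "metacognitive": rng.normal(0.38, 0.14, n),
    "social":        rng.normal(0.05, 0.08, n),
})

# Pairwise Pearson correlation coefficients, analogous in form to Table 6.
print(df.corr(method="pearson").round(3))
```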

In MANOVA, the assumption of sphericity requires that the variances of the differences between repeated measures be homogeneous. Mauchly’s test was used to assess this assumption at an alpha level of .001, as shown in Table 7. The results indicated that all dependent variables met the sphericity assumption except the social strategy variable (Mauchly’s W = .625, p < .001). Consequently, the social strategy variable was excluded from the MANOVA model.

Variable | Including the social strategy variable (Mauchly's W, df, p) | Excluding the social strategy variable (Mauchly's W, df, p)

Table 7: Test for sphericity of within-subject effects

Homogeneity of variance. The variances of each dependent variable in MANOVA theoretically should be homogeneous.

Strategic behaviours

Participants employed a total of 90 distinct individual strategies across the tasks, resulting in 2,454 instances of strategy use. The analysis categorised these strategies to better understand their frequency and distribution among participants.

Individual strategy | Total | M | Range | SD | % in relation to strategy category | % in relation to total number of strategies used

Approach
Recalling what one has said | 5 | .13 | 2 | .404 | 2.31% | 0.20%
Pausing to generate ideas/solutions | 50 | 1.2 | 5 | 1.324 | 6.99% | 2.04%
Spelling out to clarify meaning | 1 | .02 | 1 | .158 | 0.14% | 0.04%

Cognitive
Recalling what one has written | 3 | .08 | 1 | .267 | 1.53% | 0.12%

Metacognitive
Evaluating what one has heard | 81 | 2.03 | 5 | 1.330 | 8.72% | 3.30%
Asking examiner questions to direct conversation | 1 | .02 | 1 | .158 | 0.85% | 0.04%
Asking examiner questions to engage the examiner | 2 | .05 | 1 | .221 | 1.71% | 0.08%
Attending to the listener’s interest | 7 | .28 | 2 | .446 | 5.98% | 0.29%
Using examiner’s feedback in one’s response | 8 | .20 | 1 | .405 | 6.84% | 0.33%
Asking questions to lower anxiety | 1 | .02 | 1 | .158 | 0.36% | 0.04%
Engaging in positive self-talk | 12 | .28 | 2 | .599 | 3.91% | 0.45%

Table 9: Frequencies and percentages of strategy use for all three tasks combined (N = 40 for all strategies)

As Table 9 shows, the individual strategies with the highest percentage in each category were developing reasons (approach; 48.6%), linking (communication; 33.6%), organising thoughts (cognitive; 25.5%), evaluating performance (metacognitive; 18.1%), seeking clarification (social; 75.2%), and justifying performance (affective; 58%). Overall, the top-10 individual strategies were as follows.

1 Communication: Linking to prior experiences/knowledge (9.78%)

8 Metacognitive: Evaluating what one has heard

Six of these strategies fall under the metacognitive category, which accounted for 37.86% of all individual strategy use, followed by communication strategies at 29.14%, with one strategy each in the affective, approach, and social categories.

Analysis of the relationships among the strategy categories revealed seven significant negative correlations. The affective category was negatively correlated with the cognitive (r = -.366, p < .05), communication (r = -.203, p < .05), and metacognitive (r = -.169, p < .05) categories, indicating that participants who used more affective strategies reported fewer cognitive, communication, and metacognitive strategies. Significant negative correlations were also observed between the approach and communication categories (r = -.325, p < .001), the cognitive and social categories (r = -.352, p < .05), the communication and metacognitive categories (r = -.617, p < .05), and the metacognitive and social categories (r = -.350, p < .05). These findings suggest that greater use of one strategy category often corresponded to less use of another, highlighting a complex interplay among participants' strategic approaches.

The descriptive statistics for the arcsine-transformed strategy-use scores, categorized by context, level, and task, are detailed in Appendix 2a Additionally, Figure 4 provides a clustered and stacked bar chart that visually represents and compares the means of these scores.

The analysis of the arcsine-transformed means revealed that metacognitive strategies had the highest overall mean score (M = .383, SD = .140), indicating that they were used the most, proportionally, across the three tasks and the two factors. This was followed by communication strategies (M = .322, SD = .132). Lower mean scores were observed for affective (M = .102, SD = .081), approach (M = .085, SD = .069), and cognitive strategies (M = .077, SD = .069), with social strategy use exhibiting the lowest mean score (M = .052, SD = .079), indicating that it was used the least, proportionally. Detailed descriptive statistics for the non-arcsine-transformed strategy-use scores, categorised by context, level, and task, can be found in Appendix 2b.

Figure 4: Comparison of mean (arcsine-transformed) scores for the use of strategies

Multivariate effects

Between-subjects effects

The null hypothesis of no significant between-subjects effects was not rejected, as the p value for the F statistic was greater than .05 (see Table 11). The between-subjects effects were, however, qualified by the interaction of task and context.

Effect | Wilks’ λ | Hypothesis df | Error df | p | η²

Within-subjects effects

The null hypothesis of no significant within-subjects effects was tested assuming a linear model (Table 12). The three tasks had statistically significant effects on reported strategy use for the affective, communication, and metacognitive variables, with p < .05 for the F statistics and small to moderate effect sizes (η² = .128, .413, and .459, respectively). These main effects were, however, qualified by significant interactions between task and context for the affective and communication variables.

The study revealed that (a) the strategy-use scores differed significantly between the testing and non-testing contexts but not between the intermediate and advanced proficiency levels; (b) the scores varied notably across the three tasks; and (c) there was an interaction effect between task and context. An initial analysis using raw frequency counts showed no significant interactions, leading to their exclusion from the model; however, a significant task-context interaction emerged when the arcsine-transformed data were used, prompting the inclusion of interaction terms in both the within-subjects and between-subjects effects.

The interaction plots in Figure 5 illustrate the task x context interactions for the mean strategy-use scores for the affective, approach, cognitive, communication, and metacognitive variables. Notably, the interactions were disordinal, as evidenced by the non-parallel lines representing the mean values of the testing and non-testing groups across the three tasks, which tended to cross each other. Furthermore, the mean scores exhibited no systematic changes (such as consistent increases, decreases, or stability) across task 1, task 2, and task 3.

In the study, a notable interaction was observed for the affective variable, indicated by an increase in scores from task 2 to task 3 in the non-testing group, which contrasted with the testing group Additionally, a significant interaction was found for the communication variable, demonstrated by a marked decline in scores across tasks 1, 2, and 3 in the testing group, while the non-testing group did not exhibit similar changes.

The analysis reveals that the use of strategies varied between the testing and non-testing groups across the three tasks. In Task 1, the testing group primarily employed affective and communication strategies, while cognitive and metacognitive strategies were used less frequently. Task 2 showed a notable increase in the use of approach, cognitive, and metacognitive strategies, with both groups demonstrating similar levels in the latter two. Task 3 resulted in lower use of communication and cognitive strategies for both groups, with the testing group favouring approach and metacognitive strategies, while the non-testing group leaned towards affective strategies.

Table 13 lists the top five individual strategies with the highest mean for each task, highlighting those unique to each context in bold. The strategies employed by participants in the testing and non-testing environments show notable similarities, with linking (a communication strategy) and evaluating performance (a metacognitive strategy) appearing consistently among the top five for all tasks. However, the strategy of restarting to ensure the correctness of utterances is specific to the testing context. Importantly, none of these strategies fall under the non-construct-related, test-wiseness strategies that test developers seek to eliminate (Cohen, 2012). As Cohen (forthcoming) notes, “[T]est-wiseness strategies are best applied to the former two types of items [i.e., listening and reading tasks] and not to the latter [i.e., speaking and writing tasks].”

(Recoverable entries from Table 13 include: linking (Com), evaluating performance (Meta), evaluating task (Meta), justifying performance (Aff), setting goal (Meta), seeking clarification (Soc), developing reasons (App), organising thoughts (Cog), and restarting (Com).)

Note: AFF = affective; APP = approach; COG = cognitive; COM = communication; METACOG = metacognitive; SOC = social;

Table 13: Top-five individual strategies by task

Summary of results

Guiding question 1

Strategic behaviours: When participants perform the IELTS speaking tasks, what strategic behaviours do they report that they employ to regulate their cognitive processes in testing and non-testing situations?

Participants reported that they used all six categories of strategies (approach, cognitive, communication, metacognitive, affective, and social) in both testing and non-testing contexts. Social strategies were the least used, producing a right-skewed frequency distribution that deviated significantly from normality, and they were therefore excluded from further parametric statistical analysis. The scores for the remaining five strategy categories were approximately normally distributed, as indicated by the conservative .001 level of significance for the Kolmogorov-Smirnov test, and no outliers were identified. These strategy-use scores did not strongly violate the assumptions of the MANOVA.

Overall, participants used 90 different individual strategies across all tasks (see Table 9). The metacognitive strategy category represented 37.86% of all individual strategies used, followed by communication (29.14%), affective (11.45%), approach (8.80%), cognitive (7.99%), and social (4.77%) strategies. Similar to findings from previous studies examining test-takers' strategic behaviours in performing speaking tasks (e.g., Swain et al., 2009), metacognitive and communication strategies were the most used. Affective strategies, typically under-represented in the SLA literature (Huang, 2012; Oxford, 2011), ranked third in use, whereas cognitive strategies, although prominent in past SLA and LT research, were less frequently employed during the IELTS speaking tasks. The use of social strategies reflected the specific nature of the first and third IELTS speaking tasks. The individual strategies identified closely mirrored those reported in prior studies, with the exception of that social strategy use. The top ten strategies used by participants revealed consistent application of metacognitive strategies for goal-setting and performance evaluation, alongside communication strategies that linked to prior knowledge and experiences.

Guiding question 2

Strategic behaviours vis-à-vis contexts: Is there a difference in participants’ reported strategic behaviours between testing and non-testing situations?

The MANOVA analysis revealed a significant difference in strategy use between participants in testing and non-testing situations, leading to the rejection of the null hypothesis. The presence of disordinal interactions indicated that the effects of context on strategy use varied by task, complicating the interpretation of main effects. This finding highlights the intricate relationships between strategy use and second-language performance, challenging the simplistic associations often presented in the existing literature. Such complexity necessitates further empirical research, as it has crucial implications for the validity of language assessments.
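For readers who wish to see the general shape of such an analysis, the sketch below shows a between-subjects MANOVA in Python using statsmodels; the file and column names are hypothetical, and the sketch does not reproduce the mixed-model (repeated-measures) design actually used in this study.

```python
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# Hypothetical data: one row per participant, with mean (arcsine-transformed)
# use scores for five strategy categories and a 'context' grouping variable
# ('testing' vs 'non-testing'); names are illustrative only.
df = pd.read_csv("strategy_use.csv")

manova = MANOVA.from_formula("app + cog + com + meta + aff ~ context", data=df)
print(manova.mv_test())  # Wilks' lambda, Pillai's trace, etc. for the context effect
```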

Guiding question 3

Strategic behaviours vis-à-vis proficiency levels: Is there a difference in reported strategy use between participants at advanced and intermediate proficiency levels when performing the IELTS speaking tasks in testing and non-testing situations?

The MANOVA analysis revealed no significant difference in strategy use between participants at advanced and intermediate proficiency levels, indicating that the strategies employed did not vary notably across these groups. This outcome aligns with prior research suggesting that strategy use does not directly correlate with language performance in testing contexts (e.g., Purpura, 1999; Swain et al., 2009). Additionally, it challenges the traditional belief that more effective learners utilise a greater number of strategies, emphasising instead the importance of managing a diverse set of strategies tailored to specific tasks. However, it is crucial to recognise that the lack of statistical significance does not imply the absence of a difference, as the findings could be influenced by sampling limitations, particularly if the sample size was insufficient to provide adequate statistical power.

Guiding question 4

Strategic behaviours vis-à-vis tasks: Are there differences in reported strategy use in performing the three IELTS speaking tasks in testing and non-testing situations?

The MANOVA analysis revealed a significant difference in participants' reported strategy use across the three tasks, leading to the rejection of the null hypothesis. This indicates that the mean strategy-use scores for the five strategies varied notably between task 1 and task 3, which accounts for the significant within-subjects effects observed.

The MANOVA model also revealed significant disordinal interactions between task and context for the affective and communication strategies. This finding is important because statistically significant interactions complicate the interpretation of mixed-model MANOVA results, confounding the main effects associated with both the between-subjects and within-subjects factors (Hair et al.).

The IELTS speaking tasks, comprising both monologues and dialogues, highlight the significance of affective and communication strategies. These strategies are adaptable and can be employed variably across the different tasks and contexts, emphasising their importance in effective communication during the exam.

More importantly, the finding indicates that further research is needed to examine test-takers’/learners’ patterns of strategy use for the same and different tasks on multiple occasions.

Guiding question 5

Strategy use vis-à-vis oral production: What are the relationships between participants’ reported and observed strategy use in testing and non-testing situations and their oral-language production scores?

The results showed that, overall, there was no difference in participants' oral-language production, as measured by their IELTS speaking scores, between the testing and non-testing (SD = 0.55) groups. Results from the repeated-measures ANOVA also showed that there was no significant difference between the scores for the two sets of ratings (p = .213) and no interaction (p = .933). Excluding the interaction also resulted in a value of p = .207 (refer to Appendix 3 for the results of the repeated-measures ANOVA on rater scores).
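As a rough illustration of this kind of comparison (not the study's actual analysis), a mixed repeated-measures ANOVA on rater scores could be run as follows, assuming the pingouin package and hypothetical long-format column names:

```python
import pandas as pd
import pingouin as pg

# Hypothetical long-format data: one row per participant per set of ratings,
# with columns 'participant', 'rating' (first vs second set of ratings),
# 'group' (testing vs non-testing), and 'score'; names are illustrative only.
df = pd.read_csv("rater_scores_long.csv")

aov = pg.mixed_anova(data=df, dv="score", within="rating",
                     subject="participant", between="group")
print(aov)  # F and p values for rating, group, and the rating-by-group interaction
```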

Returning to the guiding question, which asks about the relationships among these variables, the question was initially framed in terms of a simple bivariate correlation matrix, reflecting common practice in the field; this approach, however, is prone to error (Baron and Kenny, 1986; Edwards and Lambert, 2007). The correlational analyses showed that all p values exceeded .05, indicating no significant relationships between strategy use, examined strategy by strategy, and oral-language production scores across proficiency levels and contexts. More fundamentally, the simple bivariate correlation matrix was deemed unsuitable for assessing these relationships, primarily because it cannot account for the mediating and moderating effects of key variables such as context, proficiency, and task.

First, a correlation matrix is vulnerable to Type I errors arising from random chance, which can lead to meaningless conclusions about the relationships between variables.

Constructing a 5 x 5 matrix with 25 correlation coefficients substantially increases the likelihood of a Type I error, in which a correlation is mistakenly identified as statistically significant at α = .05. The probability of at least one such spurious result is approximately 72.2%, so there are nearly three chances in four of concluding that a relationship exists among the variables when it is due to random chance alone.
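To make the arithmetic explicit, the familywise probability of at least one Type I error across k independent tests at significance level α is:

P(at least one Type I error) = 1 − (1 − α)^k, which for k = 25 and α = .05 gives 1 − 0.95^25 ≈ .72.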

Second, many bivariate correlations observed in a correlation matrix can be attributed to partial correlation, often referred to as spurious correlation (Haig, 2003). A spurious correlation of this kind arises when two variables are correlated only because of their relationship with a third, controlling variable: when the influence of that third variable is accounted for or removed, the correlation between the first two variables disappears. Alternatively, the third variable may act as a moderating factor, affecting the strength or direction of the relationship between the first two variables.
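A small simulated example (hypothetical data, not from this study) illustrates how a zero-order correlation can vanish once a third variable is partialled out:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# A third variable z drives both x and y, producing a spurious x-y correlation
z = rng.normal(size=n)
x = z + rng.normal(scale=0.5, size=n)
y = z + rng.normal(scale=0.5, size=n)

print(np.corrcoef(x, y)[0, 1])   # substantial zero-order correlation (about .8)

# Partial out z by correlating the residuals of x and y after regressing each on z
rx = x - np.polyval(np.polyfit(z, x, 1), z)
ry = y - np.polyval(np.polyfit(z, y, 1), z)
print(np.corrcoef(rx, ry)[0, 1])  # close to zero once z is controlled for
```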

Empirical implications

The examination of learners’ strategic behaviours and their strategy use in relation to context, proficiency, and task may have the following empirical implications:

Learners' strategic behaviours are evident in their speaking performance, as shown through the stimulated recalls and oral-production data. The ways in which these strategies interact with individual factors such as proficiency level, task type, and contextual variables highlight the need to reassess the theoretical foundations and methodologies used in studying learners' strategic behaviours. To advance this line of work, it is essential to integrate insights from cognitive psychology and neuroscience and to work towards a theoretical framework that aligns behavioural, psychological, and neural processes.

The evidence concerning the cognitive validity of the test, in particular participants' strategic behaviours in testing and non-testing situations, warrants further exploration. Notably, apart from social strategies, participants in the non-testing group employed fewer strategies in each of the five categories than those in the testing group, and the between-subjects effects related to context were statistically significant. This preliminary finding speaks to a key focus of this study: whether the IELTS Speaking Test elicits behaviours in learners that are typically absent in non-testing situations. Participants in the testing group were instructed to approach the speaking tasks as if their university admission depended on the scores awarded by a certified IELTS examiner, while the non-testing group practised as they would in a regular classroom setting. Despite these instructions, the testing group's performance carried no real consequences, whereas the non-testing group's interaction with an IELTS-certified examiner may have induced test-like feelings, potentially affecting their speaking scores and strategic behaviours. It is difficult to observe learners' strategic behaviours unobtrusively in authentic testing or learning environments, and every effort was made to simulate both conditions as closely as possible. As this study is the first to explore learners' strategic behaviours in such simulated contexts, further research is essential to confirm these findings, which have significant implications for the development and validation of the IELTS Speaking Test.

The study found no empirical support for the role of strategic competence as a component that interacts with other aspects of communicative competence. Participants at intermediate and advanced proficiency levels, as determined by their oral-language production scores, showed no significant differences in their use of strategies when performing the IELTS speaking tasks. This finding is congruent with previous research conducted in a different high-stakes standardised testing context (Swain et al., 2009), and it challenges the strategic component of the communicative competence framework, a construct that has been revised by various researchers over the last four decades but for which empirical evidence remains insufficient.

The analysis of participants' strategic behaviours during the three speaking tasks revealed both similarities and differences in the patterns and frequency of strategy use. The findings indicate that different contexts and tasks can activate distinct strategies, highlighting the complexity of task-dependent strategy use, especially given the significant interactions between task and context. The study's results on task-specific strategy use also need to be interpreted in the light of possible learning effects from the stimulated-recall sessions conducted after each task and the inability to counterbalance task order across participant groups. Additionally, learners' preferences for specific strategies may contribute to the observed similarities in strategy use across different tasks.

This study highlights the need for additional research to establish which strategies are effective for task performance, with the aim of arriving at a simplified framework applicable across various contexts (Macaro, 2006, p. 329). One way to pursue this is to have learners perform the same task on multiple occasions, allowing analysis of the sequence, clustering, and overall effectiveness of the strategies employed for different task types.

This study responds to Cohen's call for test developers to take account of the strategic behaviours involved in responding to test items through the collection of verbal report data. The findings indicate that learners' strategy use plays a part in their test performance and is relevant in both testing and non-testing contexts. Participants employed a range of strategies that varied by task, underscoring the role of strategic behaviours in performing the IELTS speaking tasks; validating such behaviours as part of communicative performance is therefore essential. The study found no evidence, however, of test-wiseness strategies among respondents. Determining whether a strategy is construct-irrelevant, and hence a potential source of measurement invalidity, is complex, particularly when it comes to categorising test-taking strategies in reading and speaking. It is therefore important to cross-check the use of specific strategies against task designers' intentions to ensure that test construction effectively assesses respondents' underlying speaking competence.

Methodological implications

This study is the first to investigate learners' strategic behaviours in both testing and non-testing contexts, and it highlights the need for further research on the oral construct and for cognitive validity evidence. It also carries several methodological implications. If breakthroughs are to be made in validating models of language ability and communicative competence, in which strategic competence is regarded as central to effective communication (e.g., Bachman, 1990; Bachman and Palmer, 1996; Canale and Swain, 1980; Chapelle and Douglas, 1993; Chapelle, Grabe and Berns, 1997; Douglas, 1997; Fulcher, 2003; Swain, 1985), the following methodological issues must be considered.

Critics of verbal report data typically raise concerns about reactivity, about individual participants' verbal reporting abilities, and about the veridicality of the data. Specifically, drawing participants' attention to their cognitive processes may influence their behaviour, and participants' capacity to articulate their thought processes can affect the accuracy of verbal reports; questions about whether the data truly reflect learners' behaviours have been explored at length in the literature (see, e.g., Bowles). In the context of speaking tasks, while stimulated recall may heighten respondents' critical engagement and awareness, it does not enable them to exceed the knowledge and abilities that the test aims to assess (Young, 2005). It is also acknowledged that respondents' strategic behaviours cannot be fully captured through their verbal reporting alone. To address this, the study gave participants sufficient time to articulate their thoughts and permitted them to use the language they felt most comfortable with during the recall sessions. Researchers generally agree that having participants verbalise their thought processes provides deeper insights into their behaviours than relying solely on observations of silent task performance.

This study departs from traditional approaches that rely solely on surveys, retrospective reflections, or stimulated recalls alone by integrating post-task stimulated recall data with oral-production data in examining learners' strategic behaviours. This may explain the differences between its findings and those of previous research conducted over the past four decades (e.g., Nakatani, 2006). Construct validation is not a one-shot effort, and further replication is warranted to see whether the present study's results can be confirmed.

Further research is needed to determine how learners' strategic behaviours vary with proficiency levels. Without this evidence, the strategic aspect of oral communicative competence remains unsubstantiated, highlighting a significant gap in current research.

2. Along the line of methods used to elicit strategic behaviours, the use of rigorous stimulated recall sessions carried out immediately after each task is, as Macaro (2006) suggested in his critical review of research on language learning and use strategies, a methodology capable of eliciting learner strategy use with an acceptable level of validity and reliability.

Recent advances in neuroscience, particularly methods such as functional magnetic resonance imaging (fMRI), suggest that metacognitive accuracy is distinct from task performance and varies across individuals, and that metacognitive insight may not align with self-reports. Although the stimulated-recall sessions were conducted in participants' first language to minimise cognitive interference, this caveat still applies. To advance research on learner strategic behaviours, interdisciplinary work that integrates diverse data sources is essential, both to reassess the validity of past research methods and findings and to foster new developments and fresh perspectives in SLA and language testing (LT).

Research on learners' task-specific strategic behaviours should reconsider the statistical methods employed, especially when using non-questionnaire-based approaches. In real-world settings, variables do not operate in isolation, and even studies focusing on a single dependent and independent variable must account for the complex interplay of contextual, instructor-related, and learner variables. Framing univariate research questions and conducting multiple univariate inferential tests, such as t-tests or univariate ANOVA, increases the risk of Type I errors, as the likelihood of obtaining statistically significant results by chance rises with each test performed. Moreover, univariate statistics cannot adequately identify interactions among variables, which matter when dependent variables such as strategy use are influenced by multiple factors, including context, proficiency level, and task.
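The inflation of the familywise Type I error rate is easy to demonstrate by simulation; the sketch below (hypothetical parameters, not the study's data) runs 15 univariate t-tests per replication on groups drawn from the same population:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
k_tests, alpha, n_sims = 15, 0.05, 5000
runs_with_false_positive = 0

for _ in range(n_sims):
    # Both groups come from the same population, so any 'significant'
    # result is a Type I error.
    any_significant = any(
        stats.ttest_ind(rng.normal(size=20), rng.normal(size=20)).pvalue < alpha
        for _ in range(k_tests)
    )
    runs_with_false_positive += any_significant

print(runs_with_false_positive / n_sims)  # roughly 1 - 0.95**15, i.e. about .54
```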

Presenting the results for each guiding research question separately treats variables as isolated entities, which is at odds with their real-world interconnectedness. The dependent variables, such as strategy-use scores, are inter-correlated, and the effects of context, proficiency, and task can interact with one another. This interconnectedness is especially important in studies aimed at providing evidence for a test's cognitive validity.

Because small sample sizes can compromise the statistical inferences drawn from a MANOVA, it is crucial for future research to include a larger number of participants. A power analysis was conducted using G*Power 3 (Faul et al., 2007) to determine the absolute minimum total sample size required. The analysis assumed a moderate effect size (f(V) = .25), a significance level (α) of .05 (a 5% risk of a Type I error), a power level of .8 (a 20% chance of a Type II error), four groups of participants, and 18 measurements (six dependent variables across three repeated measures). The results, illustrated in Figure 6, indicate that a total sample size of 175 is necessary, with at least 45 participants in each group, to ensure adequate power for testing the null hypotheses.
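For comparison, a rough between-groups approximation of this calculation can be obtained in Python with statsmodels; note that this sketch covers only a one-way, between-subjects design and therefore does not reproduce the repeated-measures component that G*Power takes into account, so the figure differs somewhat from the 175 reported above.

```python
from statsmodels.stats.power import FTestAnovaPower

# One-way, four-group ANOVA with a medium effect size (f = .25),
# alpha = .05, and target power = .80: a between-subjects approximation only.
n_total = FTestAnovaPower().solve_power(effect_size=0.25, alpha=0.05,
                                        power=0.80, k_groups=4)
print(n_total)  # on the order of 180 participants in total, roughly 45 per group
```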

Insufficient power due to a small sample size also limits the ability to compute correlation coefficients and to conduct more complex analyses, such as the moderation and mediation analyses used in social-psychological research (Fairchild and MacKinnon, 2009). Accurately estimating a moderate Pearson correlation coefficient of .5 requires a sample size of 29, while a smaller correlation coefficient of .25 requires an increase to 97 participants.

A large-scale study is also essential given recent challenges to the reliability of findings based on small samples in various fields. Individual studies that initially showed significant effects have often failed to replicate in larger, definitive studies, which also casts doubt on meta-analyses that pool a small number of non-significant studies to achieve significant outcomes (Rerkasem and Rothwell, 2010).

Figure 6: Results of the power analysis

Figure 7: Results of power analysis for correlational analysis (correlation coefficient of .5)

Finally, incorporating the mediational and moderational research methods used in social psychology, which have not previously been applied in the SLA or LT fields, would make it possible to investigate the intricate relationships among the key variables bearing on the construct validity issues highlighted in this study. Employing these statistical techniques can yield valuable insights into the dynamics of learner strategy use.
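As a sketch of what a moderation analysis might look like in this area (with entirely hypothetical variable names, not the study's data set), an interaction term in an ordinary least squares model can test whether proficiency moderates the strategy-performance relationship:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: one row per participant, with an IELTS speaking score,
# a metacognitive strategy-use score, and a binary proficiency indicator.
df = pd.read_csv("speaking_data.csv")  # columns: score, meta_use, proficiency

model = smf.ols("score ~ meta_use * C(proficiency)", data=df).fit()
print(model.summary())

# A significant meta_use:C(proficiency) interaction would indicate that the
# strength or direction of the strategy-performance relationship depends on
# proficiency level, i.e. a moderation effect (Baron and Kenny, 1986).
```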

APPENDIX 1: SAMPLE CODING SCHEME

Note that some of the examples illustrating individual strategic behaviours have been excerpted from their original context in participants' IELTS speaking performances and stimulated recall sessions, which may leave some ambiguity for the reader.

Approach strategies: Involving what the test-taker/learner does to orient him- or herself to the task

Developing reasons Test-taker/learner offering explanations for doing what he/she does


If the description of the room's structure is unclear, I would expand the discussion to include the impact of historic ceremonies on humanity's future, providing a deeper analysis from that viewpoint.

Generating choices Test-taker/learner generating possible ideas or examples (e.g., from family, friends, or things read or heard) in response to the question

新闻报纸上说得这些东西嘛,拿出来就能用得是最好的。

I considered various examples from family and friends that could relate to the task questions, while also reflecting on relevant information from newspapers. The things said in the newspapers are best because they can be taken and used directly.

Generating ideas Test-taker/learner trying to think of ideas (e.g., places or people) to mention in response to the question

I hesitated when faced with the question, pondering the types of places and people I could mention. In that moment, I struggled to generate ideas and respond effectively.

Identifying task format Test-taker/learner trying to figure out the format of the task


[I figured that] because the format of the task was like a conversation, I felt more comfortable about this task format… (L8, TASK 3)

Identifying task purpose Test-taker/learner trying to figure out the purpose of the task


For the second part, it’s a special, a specific topic… Figure out how you need to respond, and quickly and clearly come up with a few points (T15, TASK 3)

Making choices Test-taker/learners narrowing down the choices in response to the question

While the Great Wall came to mind at first, its deterioration is not primarily caused by the large number of visitors; in contrast, the environmental degradation and the damage to the buildings in Lijiang Ancient City can largely be attributed to human activity.

Recalling questions Test-taker/learner thinking about the meaning of the questions

我是觉得好像每一个问题,对我来说好像都一样。

It seems that I have already addressed her previous question, but the next inquiry may have some specific differences that require further clarification.

During the examination, I experienced a sense of repetition with the questions, as they often felt similar to those I had previously answered. Although there were slight variations in wording, the essence of each inquiry seemed to overlap, leading to a feeling of déjà vu throughout the assessment.

Recalling what one has said Test-taker/learner thinking about what he/she has said during the task


I would think of my previous response and use it to support the reason why I express myself in such way (L12, TASK 2)

Communication strategies: Involving conscious plans for solving communication problems in order to reach a communicative goal

Abandoning Test-taker/learner abandoning ideas or utterances
事实上比如说develop their mind,

I realized that he might not understand my point, so I decided to simplify my explanation to make it easier for him to grasp.
