The construct of listening
L2 listening comprehension involves connecting spoken language elements, such as words and phrases, to the listener's existing mental concepts and real-world references. This process is crucial for effective understanding and interpretation of aural L2 speech (Buck 2001; Rost 1990, 2005; Vandergrift 2007). To do so, L2 listeners have to isolate and semantically process salient linguistic information.
According to Révész and Brunfaut (2013), a crucial aspect of language acquisition involves recognizing the key components of the incoming speech stream. Second language (L2) learners frequently struggle to comprehend this stream, particularly when their proficiency in the target language is lower than that of the spoken input.
Therefore, as Segalowitz has argued, they must use their background knowledge and interpretive abilities to try to compensate for their deficits in automatic linguistic processing.
Skilled L2 listeners can often maintain comprehension despite not recognizing all linguistic elements in speech. However, not all L2 listeners achieve this level of understanding, and even proficient listeners may encounter comprehension breakdowns, even when the speech is at or below their linguistic proficiency level.
Breakdowns in L2 listening comprehension can arise from multiple factors, including difficulties in chunking and storing oral information, recognizing phonemes, and associating meaning with grammatical concepts. These challenges may be linked to issues such as a lack of attention, misdirected focus, or split attention due to competing cognitive demands. L2 listening comprehension encompasses a range of sub-skills across cognitive, linguistic, and social-cultural dimensions, which can vary based on the listener's proficiency, their perceived need to understand, and the linguistic complexity and genre of the speech. Consequently, defining L2 listening ability is complex, as highlighted by Wagner (2004), due to the diverse cognitive processes and individual variables at play.
Defining L2 listening as a subskill poses challenges for test developers, who must create assessments that accurately measure this ability. High-stakes tests like the TOEFL and IELTS evaluate academic listening independently from speaking, reading, and writing. To design these tests effectively, their creators must clearly define the construct of L2 listening they intend to assess, adhering to the principle that tests should reflect real-life language use (Chalhoub-Deville 1997, 2001). In classroom settings where the language is not used outside the learning environment, real-life use is often interpreted through classroom interactions.
A reliable and valid L2-listening test is fundamentally shaped by various factors, including the characteristics of the test takers, their age, the specific listening objectives, the listening modes they utilize, and their listening strategies. Defining the construct of L2 listening is a complex process that encompasses all of these elements.
To effectively predict a test taker's future academic performance, tests like TOEFL® and IELTS™ should be designed to reflect real-world language use. Admissions committees rely on these scores to gauge potential success, making it essential for test designers to create assessments that align with the language skills needed in prospective academic situations. Additionally, listening tasks must accurately represent the types of challenges students will face in their educational environments.
Testwiseness
An effective L2-listening test evaluates L2-listening skills, but it may also inadvertently assess secondary skills, often referred to as construct-irrelevant skills, which are not the intended focus of the test.
As explained by Buck (2001), “in all listening tests the response [format] will be a potential source of construct-irrelevant variance” (p. 125). This is because listening comprehension is an internal, cognitive process.
Measuring listening skills involves having the listener respond to external stimuli, such as multiple-choice questions, or articulate their thoughts about the listening material. The resulting performance score serves as an indirect indicator of the listener's cognitive processes. Consequently, when listening assessments are evaluated through writing tasks, the individual's writing abilities may influence their performance on the listening test.
Another potential secondary skill that may be implicated during listening-test taking is known as testwiseness (Millman, Bishop & Ebel 1965), that is, talent in being able to apply appropriate and effective test-taking strategies that relate directly to the test format (Sarnacki 1979). Testwiseness is considered something that helps test takers maximize their observed test scores (Rogers & Yang 1996), but it is also considered independent of the test takers' knowledge of the subject matter being tested (Millman, Bishop & Ebel 1965). Researchers have suggested different operational definitions of testwiseness and different ways to teach it (Pan 2010).
This study defines testwiseness as encompassing both the ability to utilize clues within test formats, as outlined by Sarnacki (1979), and the management skills necessary to navigate the testing environment effectively, as described by Cohen (2007). Testwiseness thus involves employing strategies to control one's thoughts and behaviors during assessments.
Research on the impact of testwiseness and test-management skills on foreign- and second-language listening test scores is limited. In general, however, such skills (including test-taking strategies) have been shown to be positively related to test outcomes (Cohen 2007).
Dolly and Williams (1986) conducted a study with 25 undergraduate students who received a one-hour lesson on multiple-choice test-taking strategies, compared to a control group of 29 similar students. After the lesson, both groups took a four-option multiple-choice test covering subjects like home economics, archeology, macroeconomics, and astronomy, followed by a test of testwiseness. The results showed that the experimental group scored significantly higher on both the content test and the test of testwiseness. Notably, their advantage was primarily on multiple-choice items that contained identifiable flaws, such as the longest answer being correct or having similar or opposite options. These item-writing flaws were more effectively recognized by students trained to spot them, enhancing the experimental group's overall performance.
High-stakes tests ideally should be free of item-writing flaws; however, the presence of such flaws raises questions about the role of testwiseness in aiding students. According to researchers such as Taguchi, Vandergrift, and Pan, testwiseness also encompasses metacognitive listening strategies that are independent of the test format, such as activating prior knowledge, making predictions, and inferring meanings from context. Pan further suggests that students with higher proficiency levels are more likely to employ these metacognitive strategies effectively. This indicates that successful use of these strategies is contingent upon the alignment of a student's listening skills with the difficulty of the listening material; if the material is too challenging, comprehension suffers and effective listening strategies cannot be applied.
Effective test preparation can enhance performance by familiarizing students with various item types, which reduces the time needed to comprehend directions and optimizes cognitive resources. This is especially crucial for exams featuring novel or multiple formats. Additionally, targeted preparation can teach students relevant metacognitive strategies aligned with their language proficiency, particularly concerning listening components. Moreover, becoming acquainted with the test format can significantly alleviate anxiety, contributing to a more confident test-taking experience.
Testing companies assert that test preparation is advantageous, often profiting from this belief by marketing their own test-preparation materials. Interestingly, they avoid using the term "testwiseness" in their promotions, likely due to its technical nature. This article examines how three distinct testing companies promote their L2-test-preparation materials on their websites, while also exploring their references to enhancing testwiseness.
According to the Educational Testing Service (ETS) TOEFL iBT® website (2014), their TOEFL practice materials, which encompass sample questions, practice tests, interactive skill-building programs, and comprehensive tips, are designed to aid test takers in their preparation and enhance their test-taking skills. Additionally, these materials are intended to help individuals improve their English language proficiency.
In other words, the test preparation would help test takers increase their English-language listening skills along with their testwiseness
(http://www.ets.org/toefl/ibt/prepare/)
The IELTS (2014) test-preparation document available on the official IELTS website does not assert that test preparation enhances L2 skills. Instead, it offers a compilation of resources for sample-test materials and preparation courses. The authors indicate that while attending a preparation course is not mandatory for test takers, many candidates benefit from such courses, as they often lead to improved performance.
The College Board's (2014) website for the Scholastic Aptitude Test (SAT) offered five different levels of practice-material options, from "free practice" (free, sample practice questions) to "affordable practice" (online practice-test courses) for USD 69.95 (http://sat.collegeboard.org/practice).
For the language sections, more directions are given. For example, for the French with Listening Subject Test, "recommended preparation" includes three to four years of high school (or equivalent) French classes and a "review of sample listening questions using a Subject Test with Listening practice CD" (http://sat.collegeboard.org/practice/sat-subject-test-preparation-french-with-listening).
Advertisements from testing companies imply an awareness that testwiseness is crucial for optimizing test scores (Rogers & Yang 1996). By promoting the advantages of test preparation, these companies may be conceding that their test questions can be influenced by testwiseness strategies. However, it remains uncertain whether they consider testwiseness to be separate from the test takers' subject-matter knowledge (Millman, Bishop & Ebel 1965), especially if one considers ETS's claim that practice, which includes their "interactive skill-building programs," aids test understanding and builds English skills.
Test anxiety
Test anxiety is a significant construct-irrelevant variable that impacts test outcomes, causing individuals to underperform due to negative thoughts, excessive worry, and emotional instability in response to testing conditions (see, for reviews, Hembree 1988). More specifically, as quoted in In'nami (2006, pp. 318-319), test anxiety is a "special case of general anxiety consisting of phenomenological, physiological, and behavioral responses" related to an overall fear of failure (Sieber 1980, p. 17). It is hypothesized that test anxiety may co-vary with testwiseness in explaining the total variance in L2 listening test performance (Golchi 2012).
More specifically, testwiseness may be inversely related to test anxiety: as testwiseness increases, test-taking anxiety may decrease, as researchers in general education (Kalechstein, Hocevar & Kalechstein 1998) and applied linguistics (Elkhafaifi 2005; Golchi 2012) have shown.
A study by Kalechstein et al. demonstrated that fifth and sixth graders who received instruction in test-taking strategies and practice reading tests performed better on subsequent assessments and experienced lower test anxiety compared to a control group without such instruction. This research supports the idea that a teacher's positive and supportive approach can alleviate anxiety and assist students in managing stress during challenging situations (Gregersen & Horwitz 2002). It also aligns with findings that suggest practicing test-taking serves as a form of systematic desensitization, helping individuals gain emotional and mental control over anxiety-inducing scenarios (Arnold 2000).
Many researchers have explored the impact of anxiety on second language performance (Hewitt & Stephenson 2012; MacIntyre & Gardner 1989, 1991; Cheng 2004; Ergene 2003; Cassady & Johnson 2002), but there is limited focus on its effects specifically on listening test performance (Elkhafaifi 2005; In'nami 2006; Golchi 2012).
In a 2005 study, Elkhafaifi examined the relationships between listening-comprehension grades, Arabic-listening anxiety, and general Arabic-language-learning anxiety among 233 undergraduate learners of Arabic. He adapted a reading anxiety scale for a listening context, specifically for Arabic, but did not clarify how listening comprehension was assessed or provide relevant statistical data. Despite these limitations, his findings revealed that higher listening comprehension grades correlated with lower anxiety levels (r = -.53 for general foreign language anxiety and r = -.70 for listening anxiety). Additionally, third-year students exhibited significantly less anxiety compared to first- and second-year students, with no notable differences between the latter two groups. Elkhafaifi concluded that increased anxiety negatively impacts student performance.
Lower proficiency students may experience heightened anxiety due to their inability to comprehend material as effectively as their more proficient peers, a notion supported by Sparks and Ganschow (2007). This anxiety might not directly cause lower test scores but could instead reflect a lack of comprehension. Dunkel (1991) noted that students struggling with listening skills often feel inadequate, indicating a direct link between listening frustrations and anxiety. While Elkhafaifi's study offers valuable insights, it fails to clarify how anxiety, particularly test-taking anxiety, impacts students' performance on assessments.
To understand this, researchers need to manipulate the level of test-taking anxiety in a group of test takers to see whether different anxiety levels result in different test scores.
Golchi (2012) conducted a study similar to Elkhafaifi's (2005), but better controlled the listening-comprehension test scores by giving all of her 63 English-language learners the same IELTS academic-English listening test. She gave the learners the Foreign Language Listening Anxiety Scale (FLLAS), developed by Kim (2000) and later validated by Kimura (2008). Golchi found a correlation between anxiety and listening: the higher the anxiety, the lower the listening-test score (r = -.63). Likewise, the higher the anxiety, the less frequent the use of listening strategies (r = -.32). But again, as with Elkhafaifi (2005), lower proficiency in listening (as evidenced by lower test scores) may simply co-occur with increased anxiety and diminished use of listening strategies; the study does not establish a causal relationship between anxiety and students' test performance.
In a third study, In'nami (2006) investigated the English-listening comprehension and test-taking anxiety of 79 first-year university students enrolled in general English classes in Japan. The learners took listening-comprehension test items derived from the TOEFL listening test, and they completed the Test Anxiety Scale from Sarason (1975) and the Test Influence Inventory from Fujii (1993). Using structural equation modeling, In'nami found that test-taking anxiety did not predict listening test performance among these participants, who had high English-language proficiency. He emphasized the need for better control and definition of proficiency levels in future research. He also pointed out that the order of the assessments (the anxiety questionnaires were administered before the listening tests) might have influenced students' responses, and he suggested that future studies reverse this order to minimize bias.
Previous studies have not clarified the extent to which a test taker's score on a listening assessment can be influenced by factors such as testwiseness and test anxiety. Although researchers have explored this issue, their findings remain inconclusive. As the prevalence of test-preparation resources increases and testing agencies market materials that purportedly enhance scores, addressing this question has become increasingly critical. It raises an ethical concern: if testing companies assert that their preparation tools boost scores, is it ethically sound for them to sell these materials?
The testing industry thus offers two distinct tiers of test packages: one that includes test preparation and another that does not. This raises ethical concerns regarding fairness, as test takers who forgo preparation may face significant disadvantages. The question arises whether it is justifiable for testing companies to sell preparation materials separately, potentially favoring those who can afford to pay more. Before delving into these ethical dilemmas, it is crucial to address preliminary questions that can be explored through empirical data.
Research indicates that L2 learners enhance their testwiseness through intensive training, which includes practice tests and explicit learning of test-taking strategies. This increased testwiseness allows test takers to improve their scores when encountering items that benefit from these strategies. This study aims to explore the extent and mechanisms through which test takers elevate their scores by employing effective test-taking strategies.
This study investigates the varying effectiveness of two distinct types of test-taking instruction: explicit strategies instruction and implicit strategies instruction. Additionally, it explores the impact of multiple practice tests compared to a single practice test on student performance.
We also measure test-taking anxiety to understand more completely whether anxiety covaries with testwiseness in explaining overall L2-listening-test score variance.
To understand test takers' cognitive processes during L2 listening tests more completely, we incorporate eye-tracking methodology alongside retrospective verbal reports. This approach allows us to observe how individuals process visual information and allocate their attentional resources while listening. By monitoring eye movements, we aim to identify changes in test-taking strategies that may arise from varying test-preparation methods.
We do this because there is little or no research on the effects of different question formats in L2 listening tests. The present study will help fill this research gap.
Eye-tracking technology is increasingly being utilized in language-test development and in research on the cognitive validity of test items.
Educational Testing Service employs eye-tracking technology to analyze the strategies used by test takers in answering questions, ensuring that test items accurately measure their intended constructs. Additionally, researchers like Bax (2013) and Bax and Weir (2012) have utilized eye trackers to assess whether test items provoke the expected cognitive processes.
A summary of the study's variables is in Figure 1.
In particular, with the current research, we aim to address the following questions:
1. What effects does L2-listening-test preparation have on (a) test scores, (b) testwiseness, and (c) test-anxiety levels?
2. Do the constructs of testwiseness and test anxiety relate?
3. How do the effects of test preparation manifest themselves (i.e., in altered test-taking processes)?
Figure 1: Diagram of the proposed constructs contributing to L2-listening test scores
Participants
Seventy-six English-language learners from Michigan State University's (MSU) English Language Center (ELC) participated in at least the first parts of this study.
In this study, we analyzed the results from 63 participants who successfully completed all components, including a pretest, two test-preparation training sessions, questionnaires, an interview, and a posttest, amounting to approximately 8 hours of engagement per participant.
The 63 test takers were in the ELC's English for Academic Purposes (EAP) classes, which are designed for students who have been provisionally admitted to the university. These students take courses at the ELC to enhance their English language skills, with the goal of transitioning from provisional status to regular matriculation into MSU academic programs. The EAP curriculum includes credit-bearing courses in grammar, composition, listening and speaking, and academic reading. Based on the placement test scores that the ELC and the university used to place them into the EAP courses (scores on the Michigan State University English Language Test), the learners' proficiency was estimated in terms of the American Council on the Teaching of Foreign Languages (ACTFL) Proficiency Guidelines scale (ACTFL 2012).
Materials
Pre and posttests of listening
We assessed the listening skills of the 63 learners by administering a 40-item pretest and a 40-item posttest, in order to evaluate changes in scores and in test-taker perceptions according to the type of listening practice received. We used two different IELTS™ practice-test forms from an official Cambridge University Press publication to conduct these assessments.
Both came from Cambridge IELTS 8 (2011): the pretest was Test 3 (pp. 56-64), and the posttest was Test 1 (pp. 10-17). The two tests featured comparable formats, including fill-in tables, fill-in gaps, and multiple-choice questions.
In the summer of 2012, Vineet Bansal, a computer programmer at the Center for Language Education and Research (CLEAR) at Michigan State University, programmed the computerized versions of the test forms for this project.
Three questionnaires (listening strategies, test-taking strategies, test anxiety)
Besides a general background questionnaire, we employed three questionnaires in this study that each learner took twice (pre- and post-treatment): (a) a listening-strategies questionnaire, (b) a test-taking-strategies questionnaire, and (c) a test-anxiety questionnaire. The three were administered as three parts of a single questionnaire form.
We finalized the three questionnaires through piloting in the summer of 2012; 40 English-language learners at Michigan State University's English Language Center (not those included in the fall (autumn) 2012 data-collection sessions) participated in the pilot testing.
We adopted and modified the listening-strategy questionnaire from Vandergrift (1997). The test-taking-strategy questionnaire was adopted and modified from Cohen and Upton (2007), whose work focused on the TOEFL iBT® reading test, which consists solely of multiple-choice questions; we therefore added items addressing the fill-in-the-gap questions that are also part of the IELTS test format.
We adapted the test of test anxiety developed by Cassady and Johnson (2002) so that it addressed the specific context of ESL learners taking the IELTS™ listening test.
The original numbers of items on the three questionnaires were 28, 32, and 14, respectively. Using IBM's SPSS, we conducted an exploratory factor analysis (EFA) on the data from the 40 summer pilot-test learners in order to reduce the number of items on the questionnaires. Because we anticipated that the question items and factors would be related, we used an oblique rotation with the direct oblimin method (Field 2009). To simplify interpretation, we focused on the pattern matrix and excluded items with coefficients below .40, that is, items that did not load significantly on their respective factors. We also removed factors with eigenvalues smaller than 1 and excluded items that loaded on multiple factors, to prevent overlap. Because SPSS failed to generate a pattern matrix for the test-taking-strategies questionnaire, we used the component matrix for item reduction on that instrument. The pilot testing and EFA resulted in 15 items measuring listening strategies, 16 items measuring test-taking strategies, and 11 items measuring test anxiety, all 6-point Likert-scale items that ranged from "extremely true of me" (6) to "not true of me at all" (1).
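For readers less familiar with this item-reduction procedure, the sketch below shows an analogous EFA screen in Python. It is an illustration, not the authors' SPSS workflow: it assumes a hypothetical pandas DataFrame `responses` of pilot Likert ratings and uses the open-source factor_analyzer package.

```python
# A minimal EFA-based item screen: oblique (direct oblimin) rotation,
# Kaiser criterion (eigenvalues > 1), a .40 loading cutoff, and removal
# of items that cross-load on more than one factor.
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer

def reduce_items(responses: pd.DataFrame, cutoff: float = 0.40) -> list:
    # Step 1: count the factors with eigenvalues greater than 1.
    probe = FactorAnalyzer(rotation=None)
    probe.fit(responses)
    eigenvalues, _ = probe.get_eigenvalues()
    n_factors = int(np.sum(eigenvalues > 1))

    # Step 2: refit with an oblimin rotation, since the factors are
    # expected to correlate, and take the absolute pattern loadings.
    fa = FactorAnalyzer(n_factors=n_factors, rotation="oblimin")
    fa.fit(responses)
    loadings = pd.DataFrame(np.abs(fa.loadings_), index=responses.columns)

    # Step 3: keep items that load >= .40 on exactly one factor
    # (dropping weak items and cross-loading items).
    return [item for item, row in loadings.iterrows()
            if (row >= cutoff).sum() == 1]
```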
The questionnaire items that remained after pilot testing, and which were used in the main study, are listed in Appendix A.
Stimulated-recall interview questions
For the end of each test taker's final data-collection session, we prepared a series of questions to delve into the learner's thought processes during the listening posttest. As a stimulus, the learner viewed a video of his or her own eye movements over the final page of the posttest. The researcher guided the stimulated recall using the following directions and questions, adapted from Gass and Mackey (2000).
Directions read to the participant: In this session, I am going to audio-record what you say. We are going to watch a video clip of your eye movements from the test, and I would like you to tell me what you were thinking at that time. You can pause the video at any moment to share what you were thinking then, and I will also pause it occasionally to ask you questions. Please focus on recalling your thoughts during the test, rather than on what you are thinking now while watching the video; my goal is to gain insight into your thinking during the test. Do you have any questions?
Questions the researcher was allowed to ask during the stimulated recall:
- What were you thinking then?
- What were you thinking at the time when you read the question?
- What were you thinking about when you checked the options?
- What were you thinking when you read that?
- When you were making decisions on the test, did you have any thoughts that popped into your head?
- Did anything in particular occur to you while you were solving the test questions?
Procedure
We used the listening pretest scores to assign the original 76 participants to three different treatment groups. The groups were the following:
- Explicit group: received test-taking-strategies instruction and took practice IELTS listening tests
- Implicit group: received vocabulary instruction and took the same practice IELTS listening tests as the explicit group
- Control group: received instruction on American culture
We ensured that each group had equivalent listening pretest score averages and standard deviations. Following attrition, the explicit instruction group consisted of 21 participants, the implicit instruction group had 22, and the control group included 20, maintaining balanced listening proficiency across the groups (see Appendix B for the average scores).
A one-way analysis of variance (ANOVA) comparing the average listening pretest scores of the three groups (including only the 63 participants who completed all measures) confirmed that there were no significant differences among the groups: F(2, 61) = 0.172, p = .84, eta squared = .006.
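As a consistency check (an illustration added here, not part of the original analysis), the reported effect size can be recovered from the F ratio and its degrees of freedom, since eta squared relates to F as follows:

$$
\eta^2 \;=\; \frac{F \cdot df_{\text{between}}}{F \cdot df_{\text{between}} + df_{\text{within}}}
\;=\; \frac{0.172 \times 2}{0.172 \times 2 + 61} \;\approx\; .006
$$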
During the data collection phases of the study, we conducted the listening pre- and posttests on a computer equipped with a 23-inch wide-screen TFT monitor and Tobii TX300 eye-tracking cameras to capture the eye movements of the test takers. Participants were provided with a blank sheet of paper for note-taking while they listened.
All learners were invited individually to take the tests and fill out the questionnaires at Michigan State University's Second Language Studies Tobii eye-tracking laboratory, where they met with researcher Hyojung Lim after signing a consent form and completing a background questionnaire. To ensure optimal conditions for eye tracking, Hyojung had participants adjust their chair height and posture so that their eyes aligned with the center of the computer screen. She verified that the distance from the participants' eyes to the cameras was between 60 cm and 65 cm, to enhance gaze accuracy and precision, and then calibrated their eye movements to the eye-tracking camera.
During the 9-point calibration procedure, participants watched a sequence of 9 dots appear individually on the computer screen. Hyojung closely monitored their eye movements on an external viewer to ensure accurate eye tracking throughout the calibration and the experiment. If a participant's eyes became untrackable, Hyojung would notice this on the external viewer, halt the experiment, recalibrate, and restart the procedure. No such eye-tracking problems occurred with any of the 63 participants that remained in the study.
On the first visit, a participant took the first form of the IELTS listening test (as a pretest) on a computer screen. The testing session lasted 30 minutes, during which Hyojung provided each participant with a blank sheet of paper and a pen for note-taking. After the audio portion concluded, an additional 10 minutes were allocated for participants to finalize their answers, allowing those who needed to transfer their notes to the computer screen adequate time. Following the pretest, participants completed three online questionnaires: the listening-strategy questionnaire, the test-taking-strategy questionnaire, and the test of test anxiety (see Appendix A for the questionnaire items).
Participants attended their assigned training session one day to two weeks after completing the pretest and initial questionnaires. For the two-hour sessions, learners were divided into the three groups (explicit, implicit, and control) with balanced mean pretest scores. Both experimental groups (explicit and implicit) took identical IELTS listening-practice tests (also from Cambridge IELTS 8 2011 and Cambridge IELTS 7 2009). The explicit group additionally received targeted test-taking-strategies instruction, the implicit group received instruction on vocabulary related to the listening test items, and the control group attended general English and American culture lessons without practice tests. Each session lasted two hours and included a dinner break, with pizza and fruit provided for the first session and Asian food for the second.
A week later, a second two-hour training session took place, with each group receiving the same type of instruction as before. Instructors Paula Winke and Laura Ballard, both experienced teachers, switched roles to mitigate any potential teacher effects: during the first week, Paula instructed the explicit group while Laura led the implicit group; in the second week, their roles were reversed. The control group was also taught by both instructors, ensuring that all learners experienced the same teaching staff across both sessions.
Two weeks following the second training session, each participant individually returned to the eye-tracking laboratory to meet with Hyojung and take the second IELTS listening test, the posttest. Participants were permitted to take notes and were allotted an additional 10 minutes after the 30-minute listening session, consistent with the pretest format. After the posttest, Hyojung had participants fill out the same questionnaires as before (the listening-strategy questionnaire, the test-taking-strategy questionnaire, and the test of test anxiety), this time reflecting on their posttest experience.
Participants then engaged in a stimulated-recall session in which they viewed a video of their eye movements on the final web page of listening posttest questions. They could respond in either English or their native language, and these sessions were audio recorded. Upon completion of the study, each participant received USD 40 for their time. Due to the one-at-a-time testing format for the pre- and posttests, the intervals between the pretest and the first training session, and between the second training session and the posttest, varied among participants but did not exceed two weeks.
A diagram of the study procedure is in Figure 2 below.
Analyses
In our multiple-methods study, we utilized both quantitative and qualitative approaches to analyze the data collected from participants. We calculated summary scores on the quantitative measures for each individual and group. Prior to this, we reverse-coded responses to the first two statements of the test-taking-anxiety questionnaire, because they assess anxiety in a direction opposite to the remaining nine items.
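The sketch below illustrates this reverse-coding step in Python (an illustration with hypothetical column names, not the authors' actual scoring script). On a 1-6 Likert scale, the reverse of a response x is 7 - x.

```python
# Reverse-code the two reverse-keyed anxiety items so that higher values
# mean higher anxiety on every item, then form a per-person summary score.
import pandas as pd

anxiety = pd.DataFrame({
    "item01": [6, 2, 5],   # reverse-keyed
    "item02": [5, 1, 6],   # reverse-keyed
    "item03": [2, 4, 1],   # items 3-11 are keyed normally
})

for col in ["item01", "item02"]:
    anxiety[col] = 7 - anxiety[col]

anxiety["anxiety_total"] = anxiety.sum(axis=1)
```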
To address research question one (the impact of L2-listening-test preparation on test scores, testwiseness, and test-anxiety levels), we examined descriptive statistics and conducted one-way ANOVA tests using IBM SPSS version 22. This analysis aimed to determine whether the three groups performed differently on the various measures (listening test scores, testwiseness, and levels of test-taking anxiety) following the four-hour instructional treatment.
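An analogous one-way ANOVA can be run outside SPSS; the following sketch uses scipy with hypothetical score arrays (it illustrates the test itself, not the authors' workflow):

```python
# One-way ANOVA comparing the three treatment groups on one measure,
# plus eta squared as the effect size.
import numpy as np
from scipy import stats

scores_by_group = {
    "explicit": np.array([22, 25, 18, 30, 21]),
    "implicit": np.array([20, 27, 19, 24, 23]),
    "control":  np.array([21, 22, 26, 17, 25]),
}

f_stat, p_value = stats.f_oneway(*scores_by_group.values())

df_between = len(scores_by_group) - 1
df_within = sum(len(g) for g in scores_by_group.values()) - len(scores_by_group)
eta_squared = (f_stat * df_between) / (f_stat * df_between + df_within)

print(f"F({df_between}, {df_within}) = {f_stat:.3f}, "
      f"p = {p_value:.3f}, eta^2 = {eta_squared:.3f}")
```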
To address research question two (Do the constructs of testwiseness and test anxiety relate?), we ran Spearman correlations.
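In code, such a correlation can be computed as below (a sketch with hypothetical questionnaire totals; Spearman's rho is appropriate because the Likert-based totals are ordinal):

```python
# Spearman's rho between testwiseness (strategy-use) totals and
# test-anxiety totals for the same learners.
from scipy import stats

testwiseness = [54, 61, 48, 70, 57, 66]
test_anxiety = [40, 28, 45, 22, 35, 30]

rho, p_value = stats.spearmanr(testwiseness, test_anxiety)
print(f"Spearman's rho = {rho:.3f}, p = {p_value:.3f}")
```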
To investigate how the effects of test preparation manifest themselves (research question three), we conducted quantitative and qualitative analyses of the eye-movement data alongside the stimulated-recall interview data. Our methodology followed established procedures from previous studies of the eye movements and stimulated recalls of L2 test takers (Bax & Weir 2012; Bax 2013).
Research question 1
The first part of research question one concerns the impact of L2-listening-test preparation on test scores. To contextualize this analysis, Table 1 presents descriptive statistics of the learners' scores on the study's measures, by group.
Table 1: Descriptive statistics of the test takers' scores on the study's measures, pre to posttesting (columns: gain on fill-in-the-gap questions from pre to posttest, in %; gain on multiple-choice questions from pre to posttest, in %; gain in listening-strategies score, pre to post; gain in test-taking-strategies score, pre to post; gain in test-taking-anxiety score, pre to post)
The 63 English-language learners as a whole improved significantly on the second IELTS™ listening test: their average score increased from 16.41 to 20.87 out of 40 after the four hours of instruction, a gain of nearly 4.5 points (paired-samples t = 8.2, df = 62, p < .001, d = .64). However, when we analyzed the scores by group, no significant differences emerged: neither the pretest (F(2, 61) = 0.172, p = .84, eta squared = .006) nor the posttest (F(2, 61) = 0.708, p = .50, eta squared = .023) results indicated group differences.
In other words, while the learners demonstrated overall improvement from pretest to posttest, their gains did not differ significantly by the type of instruction received. We must therefore retain the null hypothesis of no differences among the groups in their L2-listening posttest performances: the type of instruction did not influence the learners' test-score improvements, as illustrated in Figure 3.
We also looked at the test takers' scores by item type.
We analyzed group performance on the multiple-choice and fill-in-the-gap test questions separately to determine whether either format was more susceptible to test-taking strategies or practice testing. Our goal was to assess whether students in specific test-preparation groups achieved higher posttest scores on either question type. To ensure comparability, we first examined the average scores presented in Table 2, confirming that the groups performed similarly on both multiple-choice and fill-in-the-gap questions on the pretest.
The statistical analysis revealed no significant pretest differences among the groups on either question type: F(2, 61) = 0.112, p = .894 for multiple-choice, and F(2, 61) = 0.106, p = .899 for fill-in-the-gap. On average, the groups also performed similarly on both question types on the posttest: F(2, 61) = 0.948, p = .393 for multiple-choice, and F(2, 61) = 0.585, p = .560 for fill-in-the-gap.
Performance on the two primary question formats of the L2-listening test thus showed no improvement linked to a specific test-training method. Instead, results remained consistent across all groups for both item types, irrespective of the learners' test-preparation approach.
Figure 3: Gains from pretesting to posttesting on the L2-listening test by group
Table 2: L2-listening test question types and group performance on them
Table 3: Average scores per group on the three questionnaires (listening strategies, test-taking strategies, and test-taking anxiety), pre- and post-treatment
Post-treatment analyses of the learners' scores on the three questionnaires (L2-listening strategies, test-taking strategies, and test-taking anxiety) likewise revealed no significant differences between the groups. As shown in Table 3, the average scores on all three measures remained comparable regardless of the type of instruction received: F(2, 61) = 0.684, p = .509 for listening strategies; F(2, 61) = 0.339, p = .714 for test-taking strategies; and a similarly nonsignificant result for test-taking anxiety.
Research question 2
Research question two asked, "Do the constructs of testwiseness and test-taking anxiety relate?" We posed this question because several researchers (Elkhafaifi 2005; Kalechstein, Hocevar & Kalechstein 1998; Gregersen 2005) have found that these two factors do relate, and inversely, with increases in testwiseness resulting in lowered test-taking anxiety. We ran Spearman correlations because the Likert-scale questionnaire data are ordinal (Field 2009). We explored the relationships over time: we first correlated the testwiseness factors (listening strategies and test-taking strategies) with test-taking anxiety at time 1 (pretesting), and then did the same at time 2 (posttesting).
We found no correlation between the testwiseness constructs (listening strategies and test-taking strategies) and test anxiety. However, at both pretesting and posttesting, test-taking-anxiety scores related inversely to overall performance on the L2-listening test, with weak but significant correlations of -.267 and -.279, respectively. In other words, a learner's level of test-taking anxiety weakly predicted his or her L2-listening test score.
Figure 4 illustrates the posttesting correlation between the L2-listening test scores and the learners' scores on the test-taking-anxiety questionnaire. The data reveal a weak trend: the higher the score on one measure, the lower the score on the other.
Notes. Significant correlations are marked with asterisks: * significant at the .05 level; ** significant at the .01 level. P values are listed in parentheses after the correlation coefficients.
Table 4: Spearman’s Rho (r) correlations among questionnaire and L2-listening-test data
Figure 4: Test-taking anxiety and L2-listening test performance (at posttesting)
Research question 3
Research question three asked how the effects of test preparation manifest themselves, that is, whether the three instructional groups would differ in their test-taking processes. We had hypothesized that differences in posttest performance would point to different test-taking strategies, with learners in the explicit-instruction group employing more strategies than those in the control group, and we expected to find corroborating evidence in the eye-movement records and stimulated-recall transcripts. However, we found no significant differences among the groups on the L2-listening posttest, nor on the posttest measures of testwiseness or test-taking anxiety.
Because we did not find any effects of test preparation on these measures, we turned instead to the inverse correlation between test-taking anxiety and L2-listening test scores, investigating why test-anxious learners performed differently from less anxious learners, independent of the instruction they received.
We identified the 12 highest and the 12 lowest scorers on test-taking anxiety by combining scores from the two anxiety measures. The low-anxiety group's average score was 45.58 (SD = 6.92), while the high-anxiety group's average was 96.75 (SD = 6.25). An independent-samples t test confirmed a significant difference between the two groups in test-taking anxiety (t = -19.00, df = 22, p < .001, d = 7.76). The effect size of 7.76 indicates that the higher-anxiety group scored almost eight standard deviations (on the anxiety measure's scale) above the lower-anxiety group.
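As a consistency check (added here for illustration), the effect size follows from the reported means and standard deviations; with equal group sizes, the pooled standard deviation is the root mean square of the two group SDs:

$$
d \;=\; \frac{M_{\text{high}} - M_{\text{low}}}{SD_{\text{pooled}}}
\;=\; \frac{96.75 - 45.58}{\sqrt{(6.25^2 + 6.92^2)/2}}
\;=\; \frac{51.17}{6.59} \;\approx\; 7.76
$$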
We then analyzed the eye-movement records of these high- and low-anxiety individuals during the L2-listening tests to identify patterns in visual attention linked to their anxiety levels. This investigation was motivated by previous research indicating that eye movements can depart from baseline conditions when individuals face difficulties in processing information, particularly under stress (Mitchell et al. 2008; Warren & McConnell 2007). In L1 reading-processing studies, for example, longer eye fixations and increased regressions to earlier sections of text indicate reading difficulties compared to baseline data. Eye-movement research can provide valuable insights for language-testing researchers by elucidating the cognitive processes involved in test taking and the timing of the components of that process. Because eye movements are believed to be driven by cognitive activity (Rayner, Reichle & Pollatsek 2005; Reichle et al. 2013), eye tracking can be used to measure processing procedures and difficulties. And eye-movement data can be triangulated in relation to other, concurrent or subsequent measures of attention and awareness (Godfroid & Uggen 2013), which is what we have done in this study.
Before presenting some of the eye-movement data, we first define some terms. Eye trackers, including the Tobii TX300 used here, record two kinds of eye-movement data: fixations and saccades. During fixations, individuals take in and process visual information, often focusing on a specific word or image (Rayner 1998, 2009a). Saccades, the rapid eye movements between fixations, move the eyes to where additional information is needed (Brysbaert & Nazir 2005).
The time in between saccades is the eye-fixation duration. Fixation durations are influenced by a number of low-level (visual) and high-level (cognitive) factors.
For example, in reading research, low-level factors include the length of the word (Kliegl, Nuthmann & Engbert 2006), and high-level factors include how easily the word is processed (Reichle, Warren & McConnell 2009). Researchers designate the specific elements of text or images they wish to study as interest areas. In this study, we designated the test directions as interest areas, distinguishing them from the other text and images in the test, because we wanted to know whether individuals with high test anxiety spend more time on test directions than individuals with low anxiety. With interest areas defined, researchers can calculate various eye-movement statistics in relation to them, including the following metrics used in this research:
- total fixation duration: the total time (in milliseconds) spent fixating on the interest area
- fixation count: how many times a person's line of sight entered the interest area, i.e., the total number of fixations of which the total fixation duration consists
To explore how test-taking anxiety affects individuals, we analyzed the total fixation duration on the test directions among the low- and high-anxiety test takers using Tobii Studio software. We employed the Velocity-Threshold Identification (I-VT) fixation-classification algorithm to distinguish between fixations and saccades. Based on previous research indicating that skilled English readers have an average fixation duration of 200-250 milliseconds per word (Rayner 2009), we set a minimum fixation duration of 200 milliseconds, excluding shorter fixations from analysis because they may represent noise in the data, such as re-fixations following a blink or distraction.
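The sketch below shows how such interest-area statistics can be computed from an exported fixation table (hypothetical column names; an illustration, not Tobii Studio's API):

```python
# Total fixation duration and fixation count for one interest area (AOI),
# after applying the 200 ms minimum fixation-duration threshold.
import pandas as pd

# Hypothetical I-VT output: one row per fixation.
fixations = pd.DataFrame({
    "aoi":         ["directions", "directions", "question1", "directions"],
    "duration_ms": [180, 420, 260, 310],
})

# Discard fixations under 200 ms, which may be noise
# (e.g., re-fixations after a blink or a glance away).
valid = fixations[fixations["duration_ms"] >= 200]

directions = valid[valid["aoi"] == "directions"]
total_fixation_duration = directions["duration_ms"].sum()  # 730 ms
fixation_count = len(directions)                           # 2 fixations
```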
We found that learners with low test-taking anxiety spent significantly less time reading the instructions than their highly anxious counterparts. On the pretest, for example, the highly anxious students spent an average of 10.17 seconds on the initial directions, while the low-anxiety learners spent only 3.46 seconds on the same content.
The same pattern held for the amount of time spent on the key words needed to answer the fill-in-the-gap questions correctly.
Table 5 presents the total fixation durations and counts on which the high- and low-anxiety test takers differed in their test-taking behaviors. Highly anxious test takers fixated significantly longer on the initial instructions and on key terms in the test booklet that were crucial for answering the questions correctly. We also explored the relationship between test performance and test-taking behavior: do L2-listening test scores, taken as an indication of listening proficiency, reveal an effect of proficiency level (that is, of the test's relative difficulty) on test-taking behavior?
To find out, we split the test takers into two groups (high and low scorers) based on the average L2-listening test score. The eye-movement data revealed that high scorers tended to fixate on key words near the answer locations sooner than their low-scoring counterparts.
High scorers demonstrated a significantly quicker time to first fixation on words adjacent to the answer blanks in fill-in-the-gap questions, on both the pre- and posttests. This suggests that they could move through the text more swiftly and allocate more time to processing the information surrounding the blanks. It therefore appears that high scorers may read faster than low scorers on the L2-listening test. This reading efficiency could provide an advantage during the listening test, or it could reflect their pre-existing listening skills; the two factors cannot be separated in this context.
Interest area           Mann-Whitney U            High-anxiety students M (SD)   Low-anxiety students M (SD)
Instruction for Q1-3    Z = -2.421 (p = 0.015)    31.20 (18.81)                  11.10 (9.15)
Q16 "open"              Z = -2.481 (p = 0.013)    11.38 (6.61)                   4.25 (3.37)
Q1 "location"           Z = -2.348 (p = 0.019)    35.70 (13.27)                  21.00 (12.74)
Q1 "in the"             Z = -2.007 (p = 0.045)    24.40 (11.04)                  14.00 (10.52)
Q35 "from"              Z = -2.003 (p = 0.045)    29.63 (16.60)                  13.83 (13.01)
Q35 "idea"              Z = -2.876 (p = 0.004)    49.40 (24.00)                  17.70 (13.27)
Q36 "example"           Z = -2.172 (p = 0.03)     18.56 (16.64)                  8.33 (13.47)
Q8 "address"            Z = -1.961 (p = 0.05)     6.11 (3.18)                    3.57 (1.39)
Table 5: Some effects of test-taking anxiety on test-taking behavior
Table 6: High and low scorers' time to first fixation on words adjacent to blanks (Mann-Whitney U tests; high- and low-scorer means)
This study explored the impact of various types of test preparation (explicit, implicit, or minimal) on L2-listening test scores, for test takers who were initially unfamiliar with the test format. We measured the participants' levels of testwiseness and test-taking anxiety, following measurement methods established in previous empirical research (Rogers & Harley 1999; Hewitt & Stephenson 2012; Kalechstein, Hocevar & Kalechstein 1998; Horwitz, Horwitz & Cope 1986).