Language proficiency and learning gain
The first research question (RQ1) explores whether participants exhibit comparable proficiencies across the four language skills and examines any changes in these abilities over time. Consistent with prior studies (e.g., Green, 2005, 2007a, 2007b), language proficiency development is defined as learning gain, calculated by subtracting the Test 1 score from the Test 2 score (e.g., a score increase from 5.0 to 5.5 reflects a 0.5 half-band gain, while a decrease from 6.5 to 5.5 indicates a -1.0 loss).
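To make the definition concrete, here is a minimal sketch of the calculation (the function name and the rounding to half-bands are illustrative, not taken from the study's materials):

```python
# Learning gain as defined above: Test 2 score minus Test 1 score.
# IELTS band scores move in half-band steps, so gains are multiples of 0.5.
def learning_gain(test1_score: float, test2_score: float) -> float:
    return round((test2_score - test1_score) * 2) / 2

print(learning_gain(5.0, 5.5))   #  0.5 (half-band gain)
print(learning_gain(6.5, 5.5))   # -1.0 (one-band loss)
```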
Learning gain in language proficiency requires significant time investment before progress becomes observable on band scales. For instance, Green (2007b) reports that only 10% of test-takers achieved an improvement of one band or more in the IELTS Writing component after undergoing an EAP course of study (course duration 8–9 weeks, 20 hours per week). Thus, following a 160–180 hour course in an English-speaking environment, only a small percentage of students achieved significant improvements in their IELTS Writing scores. Additionally, personal factors, environmental influences, and the difficulty of the test contribute to score variations, such as a half-band difference in IELTS results when taking different test versions within a short timeframe (i.e., regression to the mean; Green, 2005). Scores may therefore increase or decrease by half a band without this necessarily reflecting a true change in language proficiency.
In Green (2007b), one-third of participants demonstrated lower scores on a subsequent test, while Green (2005) reported a mean learning gain of -0.4, indicating an overall decline in scores. Additionally, the initial proficiency levels of test-takers significantly predict their learning gains.
Research indicates that initial IELTS Writing test scores are a significant predictor of subsequent performance, with lower-proficiency learners (Band 5 and below) showing more improvement than their higher-level counterparts (Elder & O’Loughlin, 2003; Green, 2005; Humphreys et al., 2012). Green (2005) concluded that a two-month intensive pre-sessional course is unlikely to enhance proficiency for those who achieved Band 6 or higher, while it may positively influence those at Band 5 or lower.
In the current study, participants engaging in two 90-minute English classes weekly over a 13-week semester, alongside two hours of homework per class, will accumulate a total of 127 hours of study per semester, equating to 254 hours annually. However, given variations in course selection, homework load, and involvement in extracurricular activities, it remains uncertain whether students will achieve significant improvements on the IELTS test within a year. Individual differences are likely, with lower initial test scorers potentially experiencing greater gains (Elder & O’Loughlin, 2003; Green, 2005; Humphreys et al., 2012).
The second research question (RQ2) explores the factors influencing the differences in proficiency and learning gains among test-takers. These factors can relate to current learning contexts, such as university studies and IELTS test preparation, or to past educational experiences, including studying abroad and the medium of instruction in attended schools. Prior studies highlighting these influences on language proficiency and learning outcomes (e.g., Xie, 2013; Xie & Andrews, 2012) provided a starting point for determining which factors to include in the investigation.
The third research question (RQ3) aimed to profile IELTS test-takers' preparation by examining their study habits across university, high school, and cram schools. This includes analysing the quantity and type of study undertaken, their motivations, and their perceived progress in language proficiency.
Analysing the various learning situations revealed that learners' behaviours and perceptions were significantly influenced by the specific context and the tests they were preparing for. Additionally, the study examined how these learning experiences affected language proficiency and its development.
Washback
Washback refers to the influence that a test has on both teaching and learning processes. It is a key aspect of test impact, which examines how tests affect individuals, educational policies, and practices both in and out of the classroom (1996). The scope of washback is, therefore, narrower than that of test impact: it deals specifically with the effect that tests have on what (and how) teachers teach and what (and how) learners learn.
Within the socio-cognitive framework of test validation (O’Sullivan & Weir, 2011; Weir, 2005), washback forms part of the consequential validity of an assessment. To establish this aspect of validity, it is essential to provide evidence of the washback effects generated by a test, which in turn supports the test's use in specific educational contexts.
Moreover, seeking and providing such evidence is in line with an ethical approach to language test development (O’Sullivan & Weir, 2011).
Since Alderson and Wall's (1993) study, washback has gained significant attention in language testing, with research primarily focusing on its impact on teaching rather than learning (Cheng, 2014). Research indicates that teachers' beliefs and experiences play a crucial role in determining how washback manifests in educational settings (Watanabe, 1996, 1997, 2004). Nonetheless, learning is viewed as the most important outcome, positioning learners as the central figures in the washback process (Hughes, 2003). As a result, an increasing amount of research now directly explores the effects of washback on learners and their learning experiences (e.g., Mickan & Motteram, 2009; Shih, 2007; Xie, 2013; Xie & Andrews, 2012; Zhan & Andrews, 2014).
This study aims to enhance the existing literature on learning by focusing on non-instructed test preparation contexts. By minimising the influence of teaching, it allows for a direct examination of the washback effect that tests have on learning.
In this study, washback upon learning was investigated primarily in terms of the test preparation strategies that test-takers employed when preparing for the IELTS test.
Preparation strategies for the IELTS test emphasise specific activities, skills, and knowledge types. If the test encourages effective language learning strategies, it can create positive washback; however, if it promotes ineffective strategies, it may result in negative washback.
Language tests can also exert a significant influence on students' motivation to learn specific skills. By offering a clear sub-goal within the broader objective of language acquisition, tests serve as a powerful motivator, marking progress while providing immediate incentives and feedback (Dörnyei, 1998: 121).
Taking the IELTS test can enhance awareness of language abilities and encourage ongoing language study. This motivation from the test complements a learner's overall language learning drive; a motivated learner may find the test a valuable boost, while an unmotivated learner is less likely to be influenced. Thus, the impact of tests on study motivation should be considered within the broader context of individual learners and their unique circumstances.
In washback research, the perceived importance and difficulty of a test are essential factors influencing washback intensity (Cheng, 1997). Tests viewed as low-stakes or easy tend to receive minimal preparation, resulting in limited washback. Conversely, when a test is regarded as both important and challenging yet achievable, it is likely to generate optimal washback (Green, 2005). Additionally, participant factors, including test-takers' understanding of test demands, available resources, and willingness to prepare, significantly affect the learning effects of a test (Hughes, 2003; Green, 2005). Therefore, examining how well test-takers comprehend the test tasks and how ready they are to engage in preparation is crucial for understanding the washback process, and interviews are a suitable method for gathering this information.
The context in which a test is introduced significantly influences the washback process, as highlighted in studies by Gosa (2004) and Shih (2007). This research focuses on a prestigious university in Japan (UT), which necessitates careful evaluation of the washback effects from the IELTS test. Crucially, admission to this university requires that applicants first succeed in the National Center for University Entrance Examinations (NCUEE) test.
To qualify for the highly competitive UT entrance exam, applicants must achieve a top score (80–100%) on the NCUEE exam. This necessitates that students dedicate significant time and effort to serious study and preparation, particularly during their high school years.
The UT exam's high stakes are likely to create a significant washback effect on test-takers' English knowledge and skills, as well as their study strategies. To examine this assumption, it was essential to explore learners' past language learning experiences, particularly concerning entrance exams. This investigation is key to understanding how the IELTS test generates washback in this specific context.
Overview of the exams
To formulate more detailed predictions about the potential washback on learning, a brief overview of the two entrance exams is presented, followed by a comparison with the IELTS test.
The NCUEE is a syllabus-based test built on the national course of study (e.g., MEXT, 2011). The exam focuses on vocabulary, grammar, pronunciation and receptive skills; there are no writing or speaking tasks.
The reading and listening tests are separate, and all responses are multiple-choice.
The reading test starts with pronunciation questions aimed at assessing speaking ability indirectly, followed by multiple-choice tasks that evaluate vocabulary knowledge and discourse comprehension. The first half of the test emphasises communicative skills through dialogues, while the second half includes longer texts such as film reviews, quasi-academic articles, and advertisements. In the listening test, participants engage with numerous short dialogues, typically involving two speakers in three to four exchanges, followed by two longer monologues. Overall, the tests focus on understanding 'everyday English' in both written and spoken forms, reflecting a practical approach to language comprehension.
The 2013 UT exam, which participants in this study completed, comprised reading and grammar, listening, and writing sections, with estimated weightings of 60% for reading/grammar, 25% for listening, and 15% for writing. The reading section assessed general comprehension and grammatical knowledge through various tasks: summarising a 500-word English text in Japanese (70–80 characters), gap-filling exercises, ordering jumbled words within sentences, translating from English to Japanese, multiple-choice questions focusing on grammatical elements such as articles and demonstrative pronouns, and comprehension (choosing a sentence with the closest meaning to that in the text).
The assessment of reading comprehension primarily involved translation and multiple-choice questions, with grammatical items being the least common. The texts used were predominantly academic. The listening comprehension section featured three texts and included multiple-choice and sentence completion items. The writing section required two tasks: a 50–60 word response to a prompt about a significant learning experience, and a 60–70 word dialogue based on a picture prompt, indirectly assessing speaking skills. Reading and listening sections are scored objectively, while writing is evaluated holistically. Both reading and listening emphasise the cultural aspects of English study, highlighting the importance of English proficiency for accessing higher cultural knowledge.
Comparing the NCUEE, UT and IELTS tests, a number of key differences are apparent. These high-stakes tests serve different purposes: the NCUEE evaluates learning of the high school English curriculum, the UT test selects candidates based on performance, and IELTS ensures that applicants possess adequate academic English proficiency for English-medium universities. Additionally, the constructs assessed vary among these exams. Proficiency exams like IELTS are rooted in a theoretical model of communicative language ability, focusing on specific skills and sub-skills. In contrast, the NCUEE is aligned with high school syllabi, while the UT exam aims to assess higher-level abilities without publicly available specifications or a clear theoretical model, necessitating a reverse-engineering approach to understand its construct, which can lead to varied interpretations.
The UT test construct is therefore characterised by ambiguity, particularly regarding the skills assessed and their respective weightings, as highlighted in Table 1. Although there is some overlap between the tests in receptive skills and their formats, the same cannot be said for productive skills, which show minimal similarities.
Table 1: Comparison of NCUEE, UT entrance exam and IELTS proficiency test

| | NCUEE | UT Entrance Exam | IELTS (Academic) |
|---|---|---|---|
| Weighting of skills tested directly | Reading > listening | Reading > listening > writing | Reading = listening = writing = speaking |
| Answer formats for reading and listening | Multiple choice | Multiple choice, short answer, English-to-Japanese translation | Multiple choice, short answer, information transfer |
| Writing task format | N/A | Write a short paragraph on a familiar, personal topic (50–60 words); write a 4-turn conversation (50–60 words) | Describe data and trends in tables and graphs (150 words); short academic essay (250 words) |
| Speaking test format | N/A | N/A | One-to-one, face-to-face, interactive |
Predicted washback on learning
The study examined the washback on learning and teaching of high school and cram school experiences, particularly preparation for the NCUEE and UT exams, as well as of IELTS test preparation. The preceding analysis of the exams allowed predictions to be made about potential washback effects in these educational contexts. Washback was expected in terms of the focus on receptive and productive skills.
During high school (ages 16–18), students were expected to emphasise reading and listening skills, along with vocabulary and grammar, in preparation for the NCUEE examination, while cram schools (juku or yobiko) would prioritise reading and, to a lesser extent, listening and writing for the challenging UT entrance exams. Speaking skills were expected to receive minimal attention in preparation for both tests, as they are not assessed. Study of test-taking strategies, particularly in cram schools, was expected to lean heavily towards grammar and vocabulary, with pronunciation and spoken fluency neglected. Classroom interaction styles were expected to vary according to teachers' beliefs and training, as well as the nature of the test tasks. Overall, students' perceived development and motivation to study were expected to align closely with the demands of these high-stakes assessments.
Reading is the key skill assessed in the UT test, leading to the assumption that learners would be highly motivated to focus on reading and would notice significant improvement in this area.
The IELTS test was anticipated to drive a stronger emphasis on speaking skills, especially fluency and interaction, as well as on writing skills focused on graph descriptions and argumentative essays. Test-takers were expected to study test-taking strategies, including familiarisation with question formats, alongside grammar, vocabulary, and pronunciation. Preparation for the IELTS took place in a non-instructed setting, making comparison with the other, teacher-led contexts difficult. It was also expected that test-takers would feel motivated to enhance their productive skills, leading to a perceived improvement in these areas.
Summary of research design
To investigate the research questions, scores from the first and second IELTS tests were summarised and compared to assess learning gains. Additionally, covariate factors from the survey data were analysed to identify predictors of test scores and learning gains. Furthermore, survey and interview data regarding preparation for the IELTS tests across three learning environments (high school, cram school, and university) were examined.
Participants
Three hundred first-year undergraduates were recruited on a first-come, first-served basis. Of those, 255 took the first IELTS examination (an 85% completion rate) and 45 failed to attend. Of the 255 students, 204 also took the second test.
Test preparation
The British Council provided two half-day test-preparation sessions, one before each test.
The sessions aimed to familiarise participants with the IELTS test, focusing particularly on the speaking and writing components, which differ significantly from other tests participants were likely to have encountered. A total of 64 students attended the session prior to the first test, while 21 students attended the session before the second test.
The British Council also provided limited-duration, free access to its IELTS preparation website (http://www.britishcouncil.jp/exam/ielts/resources/free-practice), to which 173 students signed up prior to the first test and 23 students signed up prior to the second test.
Test administration
The first test was administered at four Eiken testing centre locations: Tokyo, Yokohama, UT, and the Eiken head office (n = 3). Participants took the test on one of 17 different dates during the period from September 2013 to February 2014.
The second test was administered over six full-day sessions at UT (all components administered on the same day) between September and December 2014.
Survey design
The survey aimed to generate quantitative variables for predicting test performance and proficiency development (RQ2), and to gather data on how prior education, current language instruction and IELTS test preparation impacted study habits, learner motivation and perceived proficiency (RQ3).
The survey was developed based on insights from earlier studies and analyses, in particular Brown (2001) and Dörnyei and Taguchi (2009). It was administered via www.surveymonkey.com, using Likert-scale responses to enable effective comparison of results across different sections.
The survey items were developed through collaborative discussions among the research team and external reviewers, followed by translation into Japanese and verification. Two focus groups, each consisting of two to four students, were then organised; volunteers received 1,000 yen (approx. GBP 5) for their participation. The sessions were conducted in Japanese and used a streamlined version of the survey. They were video-recorded, and analysis of participant comments informed further refinements to the survey's design and content. The final version of the survey comprised 122 items and took approximately 25 minutes to complete.
Table 2 shows the information collected from the surveys. Appendix 1 lists the questions (in English) used in the surveys.
Table 2: Content of the survey

| Section | Information collected |
|---|---|
| Language history | Languages known and used; age began learning; experience of living and schooling abroad; study abroad experience; extra-curricular English activities; English test-taking experience; expectations to study abroad |
| IELTS preparation (all items repeated for Test 1 and Test 2) | Amount of preparation (hours); motivation for taking IELTS; spoken fluency focus*; form (grammar, vocabulary, pronunciation) focus*; BC website use, preparation sessions, and additional tuition |
| University English study | English courses taken; classroom organisation; teacher/student-centred instruction; main language used by teacher/students; amount/focus of homework; spoken fluency focus; form focus; skills focus; activities focus; test-taking techniques focus; satisfaction; motivation; perceived proficiency development; additional information |

*Items comparable to those in the university English study section.
Participants completed the survey online within a week following the second test. EIKEN offered a 500 yen (GBP 2.50) gift card as an incentive, which was sent to participants after they completed the survey. Of the 204 students who took both IELTS tests, 190 completed the survey (a 93% response rate).
All ethical procedures followed the standard guidelines of UK higher education institutions. Participants completed informed consent forms for their involvement in the surveys, focus groups, and interviews.
Interviews
The interviews aimed to complement the survey data by gathering in-depth insights into individual perceptions, circumstances, and learning experiences (RQ3). They concentrated on three key topic areas:

1 Perceptions of language learning behaviour in preparation for the IELTS test and in high school, cram school and university.

2 Perceptions of motivation for learning English and the relationship between this and study behaviour.

3 Perceptions of own proficiency development and the factors that influenced this (see Appendix 2).
The interviews were semi-structured, and the question prompts were developed by the principal researcher and the interviewers, working first in English and then translating the prompts into Japanese. The interviewers, all postgraduate students specialising in language research in the English department at UT, underwent comprehensive training that included readings, workshops, practice interviews, and feedback sessions. Interviewees were recruited through the survey.
The interviews were conducted in Japanese in a quiet and comfortable setting on campus. Interviewers used the participants' survey responses as reference points during the discussions. Overall, the participants appeared at ease, engaging in informal and relaxed conversation with the interviewers.
After conducting the interviews, the interviewers transcribed the discussions, noting hesitations, surprise, and emotion where relevant. The transcripts were organised in a spreadsheet according to the focus of the questions, allowing iterative reading by the principal researcher to identify key themes within and across the data. Each interview was analysed for its defining characteristics, including the interviewees' main discussion points and their emphasised views on language education and testing. Recurring themes were then summarised across all interviews, and specific responses were revisited to highlight similarities and differences among participants. Finally, the accuracy of the English translations was verified.
IELTS test scores (RQ1)
Test 1 and Test 2 scores
The score distributions for Test 1 and Test 2 are shown in Figures 1 and 2, respectively. Based on the figures and the skewness and kurtosis values presented in Table 3, the data were judged to be sufficiently normally distributed.
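As an illustration of this kind of normality screen (the scores below are placeholders, not the study's data), skewness and excess kurtosis can be computed as follows:

```python
# Illustrative normality check: skewness and excess kurtosis of a band-score
# distribution. Values near 0 are consistent with approximate normality.
import numpy as np
from scipy import stats

scores = np.array([5.0, 5.5, 5.5, 6.0, 6.0, 6.0, 6.0, 6.5, 6.5, 7.0])  # placeholder data
print("skewness:", round(stats.skew(scores), 2))
print("kurtosis:", round(stats.kurtosis(scores), 2))  # Fisher definition: normal = 0
```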
Figure 1: IELTS test band scores for four skills: Test 1

Figure 2: IELTS test band scores for four skills: Test 2

Table 3: Descriptive data for Test 1 and Test 2 scores
As the results show, there is a large discrepancy between the receptive and productive skills of the present sample. Participants scored, on average, highest on reading (Test 1 = 7.2 / Test 2 = 7.3), followed by listening (6.6/6.7), while writing (5.5/5.6) and speaking (5.4/5.7) scores were considerably lower.
This pattern differs from that of IELTS test-takers globally. In the 2012 global test data, average scores for reading, listening and speaking were closely aligned (5.9–6.0), while writing lagged at 5.5. The present sample shows much more pronounced variation across skills, with notably higher averages in reading and listening and a lower speaking score, indicating a strong orientation towards receptive skills.
Compared with the average IELTS scores of Japanese first-language test-takers (Table 4), participants scored 0.4 to 0.6 bands higher overall and on all skills except speaking, which was comparable (5.4/5.7 versus 5.6). The largest differences were in reading (7.2/7.3 versus 6.0) and listening (6.6/6.7 versus 5.9), with reading showing the greatest discrepancy at 1.2 to 1.3 bands. The current sample thus demonstrates strong receptive skills, particularly in reading, performing notably above the national average, while writing is only slightly above that average and speaking is on par with it.
Table 4: IELTS mean test results for participants who took both tests (Test 1 and Test 2)
*Average of female and male candidates. Data taken from: http://www.ielts.org/researchers/analysis-of-test-data/test-taker-performance-2012.aspx
Learning gain
Between the two tests, 34% of participants increased their overall band score by half a band or more, 51% maintained their score, and 15% showed a decrease. A comparison of mean scores from Test 1 and Test 2 revealed slight increases across all measures, with the largest improvements in overall and speaking scores, which rose by 0.2 and 0.3 bands, respectively. Paired-samples t-tests indicated that the differences in overall and speaking scores were significant (p < .05), while the differences in listening and writing scores approached significance (p = .06).
In terms of Cohen's d effect sizes, where small = 0.2 and medium = 0.5, the differences for overall and speaking scores both fall between small and medium.
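For reference, a paired-samples t-test and Cohen's d for paired data can be computed as in the following sketch (the score vectors are invented placeholders, not the study's results):

```python
# Illustrative paired-samples t-test and Cohen's d for Test 1 vs Test 2 scores.
import numpy as np
from scipy import stats

test1 = np.array([5.5, 6.0, 5.0, 6.5, 5.5, 6.0, 5.0, 5.5])  # placeholder data
test2 = np.array([6.0, 6.0, 5.5, 6.5, 6.0, 6.5, 5.0, 6.0])  # placeholder data

t_stat, p_value = stats.ttest_rel(test2, test1)   # paired-samples t-test
diff = test2 - test1
cohens_d = diff.mean() / diff.std(ddof=1)         # d for paired designs: mean diff / SD of diffs
print(f"t = {t_stat:.2f}, p = {p_value:.3f}, d = {cohens_d:.2f}")
```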
The investigation also revealed that participants with lower initial proficiency levels experienced greater learning gains. Specifically, test-takers with initial proficiency scores of 4.5 and 5.0 demonstrated the highest overall improvement, whereas those at higher levels (7.5 to 8.5) exhibited smaller gains.
Table 5: Learning gain for test-takers at different initial band scores (gain = T2 − T1)

Note: The highest two mean scores in each row are shown in bold; the lowest two scores are shown in italics.
Test score and survey data (RQ2)
Response and predictor variables
The analysis aimed to identify the factors influencing the IELTS scores of the 190 participants, using data from both tests: the overall score plus the reading, listening, writing, and speaking scores for each test, giving a total of 10 response measures. A learning gain analysis was also considered in order to determine which factors contributed to score improvements; however, such an analysis would be limited to participants whose scores improved during the testing period. Because learning gains were restricted in range, falling predominantly within 0.5 or 1.0 bands, learning gain could not be used as a dependent measure, precluding an exploration of predictors of differing degrees of score improvement.
Seventy predictor variables, encompassing categorical, ordinal, and continuous data, were selected from the survey data, as detailed in Appendix 1. These variables were drawn from the sections on English language learning history, IELTS preparation and results, and university English study. Certain items were excluded for technical reasons, and data from high school and cram schools were omitted because participants' responses were very similar. For Test 1, 66 predictors were included (excluding 'motivation to study reading/listening/writing/speaking following Test 1'), while for Test 2 all 70 variables were included.
Because participants took the tests on different dates and at different locations, these factors required statistical control. Three control variables (test location, test date, and the duration between tests) were included in the analyses for Test 1, while only 'duration' was included in the analyses for Test 2.
Research by Green (2005) indicates that initial IELTS test scores are strong predictors of scores on a subsequent test taken within a short timeframe (in his study, a two-month interval). Preliminary analyses confirmed this correlation in the current data set; however, because previous and new test scores were so highly correlated, and because the initial score provides no insight into the sample's language history or test preparation, the initial test score was excluded from further analyses.
Overview of analyses
To identify the factors predicting IELTS test scores, a statistical method was required that could reduce the numerous predictor variables to the most significant ones. As Green (2007) notes, researchers examining factors affecting learning gain have used various techniques, including structural equation modelling, cluster analysis, and neural networks; such methods can handle large variable sets when predicting test scores and learning outcomes. In this study, however, a different approach was employed: regression tree analyses known as conditional inference trees (Hothorn et al., 2006).
Conditional inference trees use an algorithm that recursively partitions the observations through univariate splits on the covariates, employing the permutation tests of Strasser and Weber (1999). The algorithm first estimates the regression relationship between the response variable and each covariate, selecting the most explanatory covariate on the basis of the lowest Bonferroni-corrected p-value. This statistically unbiased approach to variable selection ensures that covariates are not preferentially chosen because of their data type (continuous, nominal or binary) or the presence of missing values. The algorithm then estimates the optimal split point, dividing the observations into two distinct groups.
A two-sample non-parametric permutation test (p < 0.05) is used to determine whether the groups formed by the split represent distinct populations, which mitigates the risk of overfitting the model. If the test result is significant, a constant regression model is fitted to each segment of the partition; if it is not, the covariate is discarded. This recursive selection and partitioning process is repeated for all covariates and at each new node of the regression tree. To illustrate, consider a dependent variable such as 'overall test score'.
Suppose the covariate under consideration is 'motivation to study', measured on a Likert scale of 1 to 6. The algorithm first assesses whether this covariate is significantly associated with the response variable, applying the Bonferroni-corrected p-value criterion. Once a significant association is confirmed, it determines the optimal split point: the value at which the observations divide into two groups by motivation to study (e.g., ≤2 and >2) that form two distinct proficiency groups (e.g., ≤5.5 and >5.5). If the permutation test indicates significance for the partitioned groups (p < 0.05), the split is retained and the selection-and-splitting procedure is repeated within each resulting node.
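Conditional inference trees are implemented in the R packages party/partykit (Hothorn et al., 2006). Purely to illustrate the select-then-split logic described above, the following is a simplified Python sketch; the synthetic data, variable names and choice of test statistics are illustrative assumptions, not the study's actual code or data.

```python
# Simplified sketch of a conditional inference tree (after Hothorn et al., 2006):
# (1) pick the covariate most associated with the response (permutation test,
# Bonferroni-corrected), (2) find its best binary split point, (3) keep the
# split only if a two-sample permutation test says the groups differ; recurse.
import numpy as np

rng = np.random.default_rng(0)

def assoc_p(x, y, n_perm=499):
    """Permutation p-value for association between covariate x and response y,
    using |Pearson correlation| as the test statistic."""
    if np.std(x) == 0 or np.std(y) == 0:
        return 1.0
    obs = abs(np.corrcoef(x, y)[0, 1])
    hits = sum(abs(np.corrcoef(rng.permutation(x), y)[0, 1]) >= obs
               for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)

def split_p(left, right, n_perm=499):
    """Two-sample permutation test: do the two partitions differ in mean?"""
    pooled = np.concatenate([left, right])
    obs = abs(left.mean() - right.mean())
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        hits += abs(perm[:len(left)].mean() - perm[len(left):].mean()) >= obs
    return (hits + 1) / (n_perm + 1)

def grow(X, y, names, alpha=0.05, min_n=10, depth=0):
    pad = "  " * depth
    # Step 1: Bonferroni-corrected covariate selection.
    pvals = [min(1.0, assoc_p(X[:, j], y) * X.shape[1]) for j in range(X.shape[1])]
    j = int(np.argmin(pvals))
    cuts = [c for c in np.unique(X[:, j])[:-1]
            if min((X[:, j] <= c).sum(), (X[:, j] > c).sum()) >= min_n]
    if pvals[j] >= alpha or not cuts:
        print(f"{pad}leaf: mean = {y.mean():.2f} (n = {len(y)})")
        return
    # Step 2: optimal split point = largest between-group mean difference.
    best = max(cuts, key=lambda c: abs(y[X[:, j] <= c].mean() - y[X[:, j] > c].mean()))
    left = X[:, j] <= best
    # Step 3: retain the split only if the groups form distinct populations.
    if split_p(y[left], y[~left]) >= alpha:
        print(f"{pad}leaf: mean = {y.mean():.2f} (n = {len(y)})")
        return
    print(f"{pad}split on {names[j]} <= {best:g}")
    grow(X[left], y[left], names, alpha, min_n, depth + 1)
    grow(X[~left], y[~left], names, alpha, min_n, depth + 1)

# Synthetic demo: 'motivation' (Likert 1-6) drives the overall band score,
# while 'hours' is an irrelevant covariate that should not be selected.
n = 200
motivation = rng.integers(1, 7, n).astype(float)
hours = rng.integers(0, 50, n).astype(float)
score = np.round((5.0 + 0.5 * (motivation > 2) + rng.normal(0, 0.3, n)) * 2) / 2
grow(np.column_stack([motivation, hours]), score, ["motivation", "hours"])
```

Under these assumptions the sketch recovers a split on 'motivation' near the true threshold and treats 'hours' as uninformative, mirroring the behaviour described above for the real analyses.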