
DOCUMENT INFORMATION

Basic information

Title: Researching participants taking IELTS Academic Writing Task 2 (AWT2) in paper mode and in computer mode in terms of score equivalence, cognitive validity and other factors
Authors: Sathena Chan, Stephen Bax, Cyril Weir
Institution: British Council
Field: Language Testing
Document type: Research Report
Year of publication: 2017
Number of pages: 47
File size: 1.35 MB

Structure

  • 3.1 Rationale and previous research
    • 3.1.1 Computer-based academic writing in International Higher Education
    • 3.1.2 The use of CB writing in academic writing assessments
    • 3.1.3 Score equivalence
    • 3.1.4 Cognitive validity and cognitive equivalence
    • 3.1.5 Cognitive processing in IELTS Academic writing
    • 3.1.6 The potential impact of writers’ computer familiarity on performance
  • 3.2 Research questions
  • 4.1 General approach
  • 4.2 Participants
  • 4.3 Instruments
    • 4.3.1 Test tasks
    • 4.3.2 Computer Familiarity Questionnaire
    • 4.3.3 Writing Process Questionnaire
    • 4.3.4 Interview
  • 4.4 Data collection
  • 4.5 Data analysis
  • 5.1 Score equivalence between CB and PB mode (RQ1)
  • 5.2 Cognitive equivalence between CB and PB mode (RQ2)
    • 5.2.1 Evidence from questionnaire
    • 5.2.2 Evidence from interview data
    • 5.2.3 Conceptualisation
    • 5.2.4 Generating and organising ideas
    • 5.2.5 Generating texts
    • 5.2.6 Monitoring and revising (online and after writing, at low and high levels)
  • 5.3 Relationship between affective variables and test performance in CB mode (RQ3)
  • 6.1 Summary
  • 6.2 Discussion
  • Appendix 1: Test tasks
  • Appendix 2: Computer Familiarity Questionnaire
  • Appendix 3: Writing Process Questionnaire
  • Appendix 4: Examples of interview coding

Content

Rationale and previous research

Computer-based academic writing in International Higher Education

IELTS is now accepted by over 6,000 university-level institutions in over 135 countries.

Millions of tertiary level students in the US, once accepted into academic programs, must effectively plan, take notes, write, monitor, revise, and submit extensive written assignments using computers, as essays are a fundamental component of academic evaluation.

Many universities have shifted to requiring electronic submission of academic essays to facilitate the use of online plagiarism checkers like Turnitin. This trend towards computer-based (CB) assessment in higher education is significant, particularly as these institutions have increasingly relied on IELTS scores in recent years.

The IELTS Academic Writing test must maintain its credibility by planning for computer-based (CB) formats in the near future, ensuring that its assessment practices closely align with real-world academic writing experiences. This adaptation is crucial for meeting the evolving needs of test-takers and upholding the integrity of the evaluation process.

IELTS responds to this challenge in order to maintain its position in the field of academic language assessment.

A further important group with which IELTS needs to maintain its reputation is the test-takers themselves, particularly students applying for university entrance. Furneaux (2013), for example, reports an IELTS candidate noting a significant disparity between their performance in the IELTS writing assessment and their actual academic writing, a gap which could damage IELTS's long-term reputation. While many modern students are 'digital natives' who are comfortable with technology, it is important to recognise that not all students share this proficiency, and the extent of the issue may be exaggerated.

Students preparing for the IELTS Writing test must recognize its significance in relation to their current and future academic writing at university. This understanding highlights the importance of aligning test preparation with real-world writing skills, and adds to the case for IELTS to move towards CB writing assessment in the coming years.

Introducing a computer version of the IELTS Writing test naturally necessitates a research base from which such a CB mode could be reliably and confidently launched.

This article critically examines the current research on the IELTS Computer-Based (CB) mode, highlighting a significant lack of comprehensive and recent studies that support the implementation of CB tests. This gap in research serves as the motivation for the proposed study, which seeks to establish a stronger research foundation for future CB testing in IELTS.

In other words, the time is ripe to consider such a move towards IELTS CB delivery. Not only are the potential advantages of CB delivery in terms of cost, security, efficiency and so on now established (Al-Amri, 2008; Puhan, Boughton, & Kim, 2007, citing Jodoin), but the changing nature of academic study, coupled with the arrival of 'digital natives' in higher education, also makes a shift towards technology across all facets of academic life inevitable.

As Duderstadt, Atkins, & Van Houweling (2002: 24) note:

Institutions that effectively leverage technology to enhance learning will outshine their less tech-savvy competitors. To maintain their leadership status, both institutions and faculty who pride themselves on excellent teaching must also master the integration of technology in education.

The same holds true for language testing organisations.

The use of CB writing in academic writing assessments

As academic writing increasingly moves to computer-based formats, many international language testing organisations are adopting computer-based (CB) writing assessments, with some phasing out paper-based (PB) testing entirely. Cambridge English Language Assessment has more than 350 test centres in 64 countries for its CB versions of KET, PET, FCE, CAE and CPE, all of which include a writing component. The TOEFL iBT can now be taken in CB mode at 1,355 test centres in 149 countries – indeed, the PB format is now being phased out completely. A total of 96% of TOEFL test-takers worldwide now take the iBT, and this proportion is rising. Pearson's PTE Academic test, which includes academic writing, is available exclusively in CB mode and has delivered over 27 million test questions across more than 100 countries. In Taiwan, the LTTC offers the GEPT Advanced Writing Paper in both PB and CB modes, and the British Council has introduced the Aptis test in CB format. In short, nearly all major academic writing assessments now incorporate some form of computer-based essay task, reflecting a growing momentum towards CB testing.

The IELTS Academic Writing test, known for its rigor and reliability, is widely accepted by institutions globally. As IELTS considers transitioning to a computer-based (CB) format alongside the traditional paper-based (PB) version, it is crucial to conduct thorough research to ensure that both formats yield comparable results across all dimensions. Weir et al. (2007) emphasize that the issue of equivalence between PB and CB formats cannot be overlooked. Additionally, they highlight the necessity for the CB test to demonstrate two fundamental types of equivalence: score equivalence and equivalence of the underlying construct being measured.

We will now consider these two areas in turn, in terms of the CB and PB testing in general, and in terms of language testing in particular.

Score equivalence

Research indicates that, with appropriate design, scores from Computer-Based (CB) and Paper-Based (PB) testing modes can be regarded as comparable (Puhan, Boughton & Kim, 2007; Taylor, Jamieson, Eignor & Kirsch, 1998; Weir et al., 2007; Wise & Plake, 1989). Early research by Mazzeo and Harvey (1988) suggested that CB tests were more challenging than PB versions, likely due to test-takers' unfamiliarity with the technology. However, recent studies indicate substantial comparability in scores between the two formats, attributed to the growing comfort of test-takers with computers in both educational and everyday contexts. This trend has also been observed in research related to language testing.

A study by Taylor et al. (1998) examined the comparability of the paper-based (PB) and computer-based (CB) versions of the TOEFL exam from 1996, revealing no significant score differences between the two formats. Similarly, Wise and Plake (1989) argued that both PB and CB versions of achievement tests produce comparable scores.

Research by Wolfe and Manalo (2005) indicates that computer-based (CB) essays receive scores that are slightly more reliable than those of handwritten essays, showing higher correlations with TOEFL multiple-choice sub-scores. Puhan et al. (2007) analysed over 1,000 participants taking writing tests in both CB and paper-based (PB) modes, revealing no significant difference in scores between the two formats. Similarly, in a study of 262 participants, Weir et al. (2007) reported that the difference between the PB and CB versions was not significant.

Carefully moderated test design, along with appropriate familiarity, attitude, and anxiety levels among test-takers, can lead to score equivalence in large-scale writing assessments. However, the issue of score equivalence in IELTS Writing across different modes is still largely under-researched, and this study aims to explore it further.

A questionnaire will also be used to examine the variables of computer familiarity and anxiety.

Cognitive validity and cognitive equivalence

Score equivalence alone does not fully establish the equivalence between CB (Computer-Based) and PB (Paper-Based) test modes. According to Weir (2005), for criterion-referenced assessments like IELTS, the primary focus for test developers should be on criterion-related decision consistency. This emphasizes the importance of maintaining consistent judgments regarding whether a specific criterion has been achieved, rather than merely ensuring scoring consistency.

[i]n determining test equivalence we need to establish that the processing in CBA and P&P mode are similar in terms of theory-based validity

(Weir et al., 2007: 8, emphasis added).

The varying modes of testing may activate different executive processing in candidates, leading to discrepancies in interactional authenticity. Consequently, language test providers must ensure that the cognitive processes utilized by candidates during writing tasks accurately reflect the real-world demands of writing. This includes establishing cognitive and score equivalence between computer-based (CB) and paper-based (PB) modes, ensuring that both testing formats provide a comparable assessment of a candidate's writing abilities.

This study focuses on comparing real-world cognitive processes with those employed in the PB and CB modes of the IELTS AWT2 test. In particular, it draws on recent research by Chan (2013), which builds on a body of writing process literature including Hayes and Flower, Field (2004) and Shaw and Weir (2007). Chan explored the cognitive processing of L2 students completing essay tasks in a real-life academic context, establishing a baseline against which cognitive processing in IELTS Academic Writing Task 2 (AWT2) can be compared. Her research examined the processes involved across five distinct phases of academic writing.

Cognitive parameters for the analysis of academic writing (adapted from Chan, 2013)

  • Conceptualisation: …
  • Generating ideas: careful reading (local/global); scanning, skimming and search reading; connecting ideas and generating new representations
  • Organising ideas: organising ideas in relation to input texts; organising ideas in relation to own texts
  • Generating texts: translating ideas into linguistic forms
  • Monitoring and revising: online monitoring and revising at low level; online monitoring and revising at high level; after-writing monitoring and revising at low level; after-writing monitoring and revising at high level

Chan's research focused on L2 students' writing in authentic academic environments, identifying essential cognitive processes crucial for academic reading-into-writing assessments. Excluding two specific processes related to reading input texts, her findings establish a foundational understanding of the cognitive strategies employed by L2 writers in real academic contexts. This project builds on Chan's work to examine how her identified cognitive processes are reflected in both the PB and CB versions of the IELTS AWT2 test, and explores potential adjustments to the IELTS test formats based on this comparison.

Cognitive processing in IELTS Academic writing

The IELTS Academic Writing Task 2 (AWT2) requires candidates to compose an essay addressing a specific viewpoint, argument, or issue in a formal manner. Assessments focus on the ability to propose solutions, justify opinions, compare and contrast various evidence and opinions, as well as evaluate and challenge ideas and arguments (IELTS, 2013: 5). While AWT2 has been the subject of various studies (e.g. Mickan & Slater, 2003; Mickan, Slater & Gibson, 2000), there has been limited research on the cognitive processing of test-takers during this task.

The cognitive processing of participants completing IELTS Academic Writing Task 1 (AWT1) was investigated in depth by Yu et al. (2011), who noted that, although AWT2 has received considerable research attention, its cognitive dimension has been largely overlooked. This gap emphasises the need for further exploration of how test-takers engage cognitively with AWT2.

Cognitive processing on AWT2 in CB mode, in particular, has not previously been researched – an important gap in the research base if the IELTS Writing test is to be computerised in future.

Yu et al. (2011) utilized a layered data collection approach, primarily employing the think-aloud method, which led to the development of a model outlining three interrelated cognitive process stages specific to AWT1. However, they did not directly compare the cognitive processes of their test-takers with those in similar real-world academic tasks, which could have helped establish the cognitive validity of AWT1. It appears that they implicitly assumed the cognitive processes identified under test conditions were generally applicable to authentic academic settings.

Our study builds on Yu et al. (2011) by exploring the cognitive processes of test-takers in both AWT2 PB and CB modes, aiming to assess the cognitive equivalence of these formats. Additionally, we will compare these processes with those of second language (L2) students engaging in authentic academic writing at a UK university, thereby establishing the cognitive validity of the test in both modes. This research leverages recent findings from Chan (2013), which provide insights into the key cognitive processes involved.

This study examines L2 academic writers engaged in authentic writing tasks at a UK university, providing crucial data for evaluating the IELTS AWT2's cognitive validity in relation to real-world academic writing. It also facilitates a comparison of cognitive processing between computer-based (CB) and paper-based (PB) modes. The findings offer significant insights that can guide decisions on the potential launch of a CB version of the IELTS writing component.

The cognitive processes of participants completing AWT2 in CB and PB modes may differ significantly. A comprehensive review by Shaw (2005) highlights how computer use can either hinder or enhance the writing process.

Shaw (2005) discusses earlier research by Hermann (1987), which indicated that computer use could disrupt the writing process. However, as computers became more integrated into writing practices, subsequent studies revealed that prolonged use of computers can significantly enhance students' writing skills (Shaw, 2005: 15). Notably, in cognitive terms, while traditional paper-based writing often requires extensive planning beforehand to minimize rewriting, computer-based writing may facilitate planning during the text production process itself.

Recent years may have altered the dynamics of cognitive processing in writing, particularly in planning and revising. Research indicates that computer-based (CB) writing could lead to distinct attention levels and types of editing activities during the revision process. Studies have shown that L2 writers tend to engage in more extensive revisions when using CB mode, with notable differences in both the quality of revisions and the time allocated to them. This study focuses on the cognitive processing variations between computer-based and paper-based (PB) writing, emphasizing the significance of planning and revising activities.

Before implementing tests that facilitate alternative output modes, such as PB and CB modes, it is crucial to deepen our understanding of the cognitive processes these modes engage.

This research project aims to investigate the cognitive validity and equivalence of test tasks delivered in the two modes. To do so, we collect data on participants' cognitive processing during the CB and PB tasks through a Writing Process Questionnaire and retrospective interviews. We also gather important information on participants' computer familiarity, which is discussed further below.

The potential impact of writers’ computer familiarity on performance

Delivery mode may influence writers' performance, as noted by Shaw and Weir (2007). Despite the widespread use of computers in academic writing, concerns remain that some test-takers may be disadvantaged by their unfamiliarity with the technology. Research by Al-Amri (2008), Russell (1999), and Shermis & Lombard (1998) reflects this concern, highlighting the need to address it in writing assessments.

Research by Taylor et al. (1998, 1999) indicates that writers' familiarity with computers and their anxiety levels do not significantly affect their performance, as evidenced by test scores. In contrast, Russell (1999) found that writers with a positive attitude towards using computers tend to engage more enthusiastically in writing tasks, producing more extensive writing and more careful revisions. Weir et al. (2007) examined the impact of computer familiarity, anxiety and attitudes, concluding that these factors have a negligible effect on performance. Even so, Taylor et al. emphasised the need to provide support, such as computer tutorials, for test-takers during preparation. There are also continuing calls for further research (Hertz-Lazarowitz & Bar-Natan, 2002; McDonald, 2002) on the presence or absence of any such impact before definitive conclusions are drawn. This important dimension was, therefore, investigated in this study by means of a self-report questionnaire.

Research questions

The research questions for the study are as follows:

1. Score equivalence: Are there significant differences in the scores awarded by independent raters for performances in CB and PB mode in IELTS Academic Writing Task 2?

2. Cognitive equivalence: Do test participants use cognitive processes differently in completing IELTS Academic Writing Task 2 in CB mode and in PB mode?

3. Affective variables and test performance: Do independent variables such as computer familiarity and computer anxiety significantly predict participants' test scores in CB mode?

Research design, data collection and analysis

General approach

This study employs a mixed-methods approach to address the research questions, since integrating qualitative and quantitative methods offers a more comprehensive understanding of the research issues than relying on a single approach (Johnson & Onwuegbuzie, 2004). Research tools include two questionnaires (the Writing Process Questionnaire and the Computer Familiarity Questionnaire; see Appendices 2 and 3), retrospective interviews and score analysis.

Participants

Students on undergraduate programmes at a British university were recruited. Their English proficiency levels, according to their entrance profiles, were between B1 and C1; Table 1 shows their overall language proficiency as reflected in their previous IELTS scores. Students identified as needing additional language support were required to attend pre-sessional English classes. The study included participants with IELTS overall scores of 5.5 or below, as it was believed that the writing mode (paper versus computer) would particularly affect lower-proficiency writers. A total of 153 students took part (45.4% male, 54.6% female), drawn from a range of academic disciplines including Business, Language and Communication, and Computing.

Table 1: Participants' previous IELTS scores

Band range Percentage of participants



The scores reported were double rated by four certified IELTS raters approved by the British Council. Rater 1 marked all the scripts, whereas Raters 2–4 each double-marked a sub-set of the scripts. (Note: the overall scores were rounded down as in the operational IELTS test, i.e. 5.75 becomes 5.5 and 5.25 becomes 5.0.) Inter-rater reliability, raters' severity and percentage of absolute agreement are reported in Section 5.1.
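To make the rounding rule concrete, here is a minimal Python sketch (ours, not the report's) of rounding an averaged band score down to the nearest half band, exactly as described above:

```python
import math

def round_down_to_half_band(score: float) -> float:
    """Round an averaged band score down to the nearest half band,
    as described above (e.g. 5.75 -> 5.5, 5.25 -> 5.0)."""
    return math.floor(score * 2) / 2

print(round_down_to_half_band(5.75))  # 5.5
print(round_down_to_half_band(5.25))  # 5.0
```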

Instruments

Test tasks

Two publicly available sample AWT2 tasks were used in a pre-pilot involving 11 students. Their feedback indicated that one task was more demanding than the other, and there were concerns that, because the sample tasks are openly accessible, participants might have encountered them before. The research team therefore requested a set of retired AWT2 tasks from the test providers and selected eight for further evaluation. A panel of six expert language testing practitioners and researchers reviewed these tasks to identify the two AWT2 tasks most comparable in topic, domain and required language functions. Two tasks were chosen for the main study; their comparability is discussed in Section 5.1.

In the study, each participant completed two tasks, one in the traditional PB mode and one in the CB mode. In the CB mode, participants wrote their essays in Microsoft Word with all proofing functions (e.g. grammar and spell check) disabled (see Appendix 1 for the tasks). More information about the data collection procedures is presented in Section 4.4.

Computer Familiarity Questionnaire

As suggested by previous research into the comparability of CB and PB modes, participants' familiarity with computers might have an impact on their performance in a CB test. A Computer Familiarity Questionnaire, adapted from Weir et al. (2007) for the current research context, was therefore deployed. The questionnaire (see Appendix 2) includes 14 closed questions and one open-ended question; the open-ended question (Q15) asks participants whether they would prefer to take the IELTS Academic Writing test on paper or on a computer. The structure of the questionnaire is outlined in Table 2.

Table 2: Structure of the Computer Familiarity Questionnaire

Writing Process Questionnaire

The Writing Process Questionnaire, adapted from Chan's (2013) research on cognitive processes in academic writing, was tailored to the task features of IELTS AWT2 and comprises 40 items (see Appendix 3). The internal consistency reliability of the items within each cognitive phase was assessed to confirm that each category measured the same underlying construct; the results are shown in Table 3.
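The extract does not name the reliability statistic used; assuming it is Cronbach's alpha, the usual choice for checking whether a group of Likert-type items measures a single construct, the coefficient for each cognitive phase takes the form:

$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^{2}}{\sigma_T^{2}}\right)$$

where $k$ is the number of items assigned to the phase, $\sigma_i^{2}$ the variance of responses to item $i$, and $\sigma_T^{2}$ the variance of the summed phase score.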

Table 3: Structure of the Writing Process Questionnaire

Cognitive phases / Items / Internal consistency reliability

Conceptualisation: Q1, Q2, Q3, Q4, Q5, Q6, Q7, …
…
Monitoring and revising at high-level: …
Monitoring and revising at low-level: …

Interview

Retrospective interviews were carried out to probe the cognitive processing of participants during AWT2 in both modes. Twenty percent of the participants (n=30) were interviewed individually by the research team immediately after each test event, recruited on a voluntary basis. The mean band scores of the interviewees were identical in PB and CB mode, with a marginally higher standard deviation in CB mode (PB: M=5.80, SD=0.49; CB: M=5.80, SD=0.55). Most interviewees received the same band score under both conditions; 16.7% showed a 0.5-band difference and 13.4% a one-band difference, indicating that their performance across conditions was largely equivalent. All interviews were voice-recorded and subsequently transcribed by two research assistants, with 10% of the transcripts checked for accuracy by a member of the research team. Further details of the interview data analysis are given in Section 4.5.

Data collection

The research team visited Academic Writing classes offered by the Department of Language and Communication to explain the overall research aim and recruit students. To encourage participation, written feedback on performance in the two tasks was provided, and open sessions were organised for interested students to discuss their test results. Participants were informed of the testing procedures and their scheduled test times, with test events taking place one week after the class visits.

Approximately 15 test events were conducted during regular class sessions with the assistance of lecturers. At each event, participants first completed the ethics procedures and were then randomly assigned to one of two groups. Each group completed the two AWT2 tasks in a counter-balanced order of modes and prompts, so that both paper and computer formats were used at every test event. Each test lasted 40 minutes, and the two tests were taken consecutively. The data collection procedures are outlined in Table 4.

Table 4: Data collection procedures (durations in minutes)

  • All participants filled in a Computer Familiarity Questionnaire (5 minutes)
  • Group 1 completed AWT2 on paper; Group 2 completed AWT2 on computer (40 minutes)
  • All participants filled in a Writing Process Questionnaire (10 minutes)
  • Group 1 completed AWT2 on computer; Group 2 completed AWT2 on paper (40 minutes)
  • All participants filled in a Writing Process Questionnaire (10 minutes)
  • 20% of the group were interviewed individually (20 minutes)

As previously mentioned, 153 students participated in the study, yielding a total of 306 scripts (153 paper scripts and 153 computer scripts). The data from all questionnaires (153 Computer Familiarity Questionnaires and 306 Writing Process Questionnaires) were entered into SPSS v22. Scripts which received no mark due to illegible writing or an inadequate writing sample were excluded, as were questionnaires with 30% or more missing data. The final totals for each data point are shown in Table 5.

Table 5: Final totals of each data point

Data analysis

RQ1: Score equivalence between CB and PB mode

To investigate score equivalence under the CB and PB conditions, Multi-Facet Rasch Measurement (MFRM) analyses were carried out with FACETS 3.71.2 (Linacre, 2013). First, a 5-facet analysis examined the influence of test-takers' writing ability, testing mode, essay prompt, raters and rating categories on scores, in order to evaluate score equivalence by comparing overall scores between the two modes. The analysis also examined test-takers' performance on each analytic rating scale

(i.e. Task Achievement, Coherence and Cohesion, Lexical Resources, and Grammatical Range and Accuracy). In addition, four separate 4-facet analyses were conducted, excluding delivery mode as a facet: each analytic category in each mode (e.g. CB Task Achievement and PB Task Achievement) was treated independently, allowing a direct comparison of the four pairs of analytic scales between the delivery modes.

In all analyses, the data were entered into the Rating Scale Model (RSM), which operates under the assumption that the rating scale associated with each item functions similarly.
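As a sketch of what this model looks like (the facet notation below is ours, inferred from the description above rather than quoted from the report), the 5-facet rating scale formulation can be written as:

$$\ln\!\left(\frac{P_{nmprik}}{P_{nmpri(k-1)}}\right) = \theta_n - \mu_m - \delta_p - \alpha_r - \beta_i - \tau_k$$

where $\theta_n$ is the writing ability of test-taker $n$, $\mu_m$ the difficulty of mode $m$, $\delta_p$ the difficulty of prompt $p$, $\alpha_r$ the severity of rater $r$, $\beta_i$ the difficulty of rating category $i$, and $\tau_k$ the threshold between adjacent score categories $k-1$ and $k$, shared across rating categories under the RSM assumption. $P_{nmprik}$ is the probability of a rating in category $k$ rather than $k-1$ for that combination of facets.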

RQ2: Cognitive equivalence between CB and PB mode

The cognitive processes employed by the test-takers in CB and PB mode on IELTS Academic Writing Task 2 (AWT2) were measured through the Writing Process Questionnaire.

Descriptive statistics for individual questionnaire items in the CB and PB modes were obtained. As the data for most items were not normally distributed, non-parametric Wilcoxon signed-rank tests were used to compare the outcomes of the two modes. These results were also compared descriptively with the findings of Chan (2013) on students' cognitive processes during real-life academic writing tasks.
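As an illustration of this paired, non-parametric comparison, the following short Python sketch (with made-up responses, not the study's data) runs a Wilcoxon signed-rank test on one questionnaire item answered after both tasks:

```python
from scipy.stats import wilcoxon

# Hypothetical 5-point Likert responses to a single Writing Process Questionnaire
# item, one pair per participant (after the PB task and after the CB task).
pb_responses = [4, 3, 5, 2, 4, 3, 4, 5, 3, 2, 4, 3]
cb_responses = [3, 2, 4, 3, 5, 2, 3, 4, 3, 1, 3, 2]

# Paired, non-parametric test on the within-participant differences.
stat, p_value = wilcoxon(pb_responses, cb_responses)
print(f"W = {stat}, p = {p_value:.3f}")
```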

In addition to the questionnaire data, retrospective interviews were used to provide supplementary evidence on participants' writing processes in the two modes. The 30 transcripts were coded using NVivo v10 (see Appendix 4 for examples of the coding).

RQ3: Relationship between affective variables and test performance in CB and PB mode

To investigate the influence of affective variables on student performance, descriptive statistics were first computed for participants who selected "definitely agree/always" and "mostly agree/often" in the Computer Familiarity Questionnaires (CFQs). After verifying that the data met the prerequisites for the analysis (normality, homoscedasticity, linearity, and absence of multicollinearity and of outliers), multiple regression analysis was conducted to assess the impact of these variables on performance in the CB test. The stepwise method was employed, in which independent variables are entered or removed at each stage according to their F probability.
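The following Python sketch (hypothetical variable names, synthetic data, and a simple forward-selection loop rather than SPSS's exact stepwise algorithm) illustrates the kind of stepwise multiple regression described above:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Synthetic illustration only: CB writing scores plus questionnaire-derived predictors.
rng = np.random.default_rng(0)
n = 120
df = pd.DataFrame({
    "computer_familiarity": rng.normal(0, 1, n),
    "computer_anxiety": rng.normal(0, 1, n),
    "attitude_to_cb_writing": rng.normal(0, 1, n),
})
df["cb_score"] = (5.5 + 0.3 * df["computer_familiarity"]
                  - 0.2 * df["computer_anxiety"] + rng.normal(0, 0.5, n))

def forward_stepwise(data, outcome, candidates, p_enter=0.05):
    """Forward selection: at each step, add the candidate with the smallest
    p-value below p_enter (entry only; SPSS stepwise also allows removal)."""
    selected = []
    while True:
        remaining = [c for c in candidates if c not in selected]
        if not remaining:
            break
        pvals = {}
        for c in remaining:
            model = sm.OLS(data[outcome],
                           sm.add_constant(data[selected + [c]])).fit()
            pvals[c] = model.pvalues[c]
        best, best_p = min(pvals.items(), key=lambda kv: kv[1])
        if best_p >= p_enter:
            break
        selected.append(best)
    return sm.OLS(data[outcome], sm.add_constant(data[selected])).fit()

result = forward_stepwise(df, "cb_score",
                          ["computer_familiarity", "computer_anxiety",
                           "attitude_to_cb_writing"])
print(result.summary())
```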

Score equivalence between CB and PB mode (RQ1)

This section presents findings on raters' reliability and severity, the comparability of prompts, and score equivalence between the two modes, based on the 5-facet MFRM analysis. It then discusses the individual analytic scores across the two modes, derived from the 4-facet MFRM analyses.

Regarding raters' reliability and severity, the Rasch logit measures and the Infit mean square index as a measure of fit (i.e. of meeting the assumptions of the Rasch model) are reported in Table 6.

Table 6: Rater measurement report (summary statistics; per-rater measures not reproduced in this extract)

Real, Population: RMSE .08; Adj. (True) S.D. .19; Separation 2.38; Strata 3.50; Reliability (not inter-rater) .85
Real, Sample: RMSE .08; Adj. (True) S.D. .22; Separation 2.80; Strata 4.07; Reliability (not inter-rater) .89
Real, Fixed (all same) chi-square: 26.2; d.f.: 3; significance (probability): .00
Real, Random (normal) chi-square: 2.7; d.f.: 2; significance (probability): .26
Inter-rater agreement opportunities: 1096; Exact agreements: 732 = 66.8%; Expected: 483.2 = 44.1%

Before considering the severity of the raters, it should be noted that Rater A marked all the scripts, while Raters B, C and D each marked only a subset. According to the logit measures in Table 6, Raters B and D were more lenient than Rater A, while Rater C was stricter. However, the differences in fair mean scores among the four raters were small, within 0.2, i.e. less than half an IELTS band. In addition, all raters' Infit values fell within the acceptable range.

Infit values in the range of 0.5 to 1.5 are 'productive for measurement' (Wright and Linacre, 1994), while Bond and Fox (2007) suggest a more stringent acceptable range of 0.7 to 1.3. Given the high-stakes nature of the IELTS test, this report adheres to the stricter range. The exact agreement between the first and second ratings was 66.8%.
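For reference, the Infit mean square referred to here is the information-weighted fit statistic of Rasch measurement, given in standard notation (not quoted from the report) as:

$$\text{Infit MnSq} = \frac{\sum_{n}\left(x_{n} - E_{n}\right)^{2}}{\sum_{n} W_{n}}$$

where, for each observation $n$ involving the rater in question, $x_n$ is the observed rating, $E_n$ its model-expected value, and $W_n$ the model variance of $x_n$; values close to 1.0 indicate ratings that fit the model's expectations.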

Table 7 reports results regarding the comparability of the prompts used in the study.

According to the analysis of the mean and logit measures, Prompt 2 proved to be significantly easier than Prompt 1 (χ² …, p …).
