
DOCUMENT INFORMATION

Basic information

Title: The assessment of pronunciation and the new IELTS Pronunciation scale
Authors: Lynda Yates, Beth Zielinski, Elizabeth Pryor
Institution: Macquarie University
Field: Linguistics
Type: Research Report
Year: 2008
Pages: 46
Size: 811.04 KB

Structure

  • 3.1 Research questions (7)
  • 3.2 Data collection (7)
  • 3.3 Participants (8)
    • 3.3.1 Examiners (8)
    • 3.3.2 Candidate speech samples (8)
  • 3.4 Procedure (9)
  • Phase 1 (pre-rating phase)
  • Phase 2 (rating phase)
  • Phase 3 (verbal protocol phase)
  • 3.5 Summary of study design and aims (11)
  • 3.6 Data analysis (11)
  • 4.1 Research questions 1 and 1a) (12)
  • 4.2 Research question 1b) (14)
  • 4.3 Research question 1c) (15)
  • 4.4 Research question 2a) (16)
  • 4.5 Research question 2b) (20)
    • 4.5.1 Variation among examiners (22)
    • 4.5.2 Global features of pronunciation: Clarity, intelligibility and listener effort (26)
    • 4.5.3 Consideration of features not included in the revised Pronunciation scale (27)
  • 4.6 Research question 2c) (29)
    • 4.6.1 The descriptors at Bands 3, 5 and 7 (29)
    • 4.6.2 The overlap between the Pronunciation scale and the Fluency and Coherence scale (31)
  • 4.7 Summary of findings (32)
  • 5.1 Examiner attitudes to, and use of, the scales (33)
  • 5.2 Variation between examiners (34)
  • 5.3 The rating process and what examiners take into consideration (35)
  • Appendix 1: Questionnaires (39)
  • Appendix 2: Coding categories for VP comments (44)
  • Appendix 3: Statistical analysis (45)

Contents

Research questions

The design of this study allowed for the exploration of how examiners view the revised IELTS Pronunciation scale and how they use it to assign scores to candidates at band levels 5, 6 and 7. By employing a mixed-method approach, the study analyses both quantitative and qualitative data gathered from various sources to address key research questions related to pronunciation assessment.

Examiners generally perceive the revised IELTS Pronunciation scale positively, finding the descriptors clear and the increased number of bands beneficial for assessment. They express confidence in evaluating the pronunciation features outlined in the descriptors. Additionally, examiners identify specific pronunciation features, such as intonation, stress, and clarity, as the most crucial factors when awarding scores.

When applying the revised IELTS Pronunciation scale to candidates at band levels 5, 6 and 7, examiners encounter varying degrees of difficulty in distinguishing between these levels. Key features of pronunciation that examiners consider include clarity, intonation, and stress patterns. However, examiners also report challenges with the scale, particularly in consistently applying the criteria across different candidates.

Data collection

There were three phases of data collection.

1. Phase 1, Pre-rating: The online Questionnaire A elicited background details and experiences with, and attitudes towards, the revised Pronunciation scale in general from 27 examiners.

2. Phase 2, Rating: All but one of the same group of examiners (n = 26):

• used the revised IELTS Pronunciation scale to score 12 sample performances of Part 3 of the Speaking Test (Questionnaire B)

• completed Questionnaire C, which invited them to reflect on how they had used the scale to award a Pronunciation score to the sample performances

3. Phase 3, Verbal protocol: A distinct group of six examiners evaluated four of the 12 sample performances from Phase 2, providing a score and a summary of their reasoning for each assessment. A stimulated verbal protocol procedure was then used to probe the specific features that had influenced their judgments of the candidates' pronunciation.

Copies of questionnaires A, B and C used in Phases 1 and 2 are provided in Appendix 1.

IELTS Research Reports Volume 12 © www.ielts.org 8

Participants

Examiners

All examiners were currently certified IELTS examiners from a single centre, whose examining experience ranged from newly trained (two months' experience) to very experienced (13 years).

All IELTS examiners possess the necessary qualifications: an undergraduate degree or equivalent, a relevant TESOL certification, and a minimum of three years of teaching experience. Many examiners hold a Certificate in English Language Teaching to Adults (CELTA) or both undergraduate and postgraduate TESOL qualifications.

A new set of examiners was selected for Phase 3 of the study to guarantee that they had not previously assessed the samples. As Table 1 indicates, these examiners had TESOL qualifications comparable to those of the examiners in the first two phases, and similar teaching and examining experience.

Participation in the study was voluntary and paid at standard hourly IELTS examiner rates.

Phase 1 and 2 examiners (n = 27a) Phase 3 examiners (n = 6)

Experience as an IELTS examiner newly trained – 13 years (M = 3.3) newly trained – 10 years

TESOL qualification 20 (74.1%) 4 (66.7%)

a 27 completed Questionnaire A, from which the information reported here was taken. Only 26 of these participated in Phase 2.

Table 1: Characteristics of participating IELTS examiners

Candidate speech samples

The 12 speech samples were provided by IELTS Australia and comprised excerpts (Part 3) from IELTS Speaking Test interviews of candidates from two language backgrounds, Punjabi and Arabic, who had been awarded Pronunciation band scores of 5, 6 and 7. There were two samples from each language group at each level; as shown in Table 2, this means there were six from each language group and four (two Punjabi and two Arabic) at each band level. Most of the samples were from male candidates; only one from each language group was female. Part 3 of the Speaking Test was chosen because it provided an extended sample of the candidate’s spoken English in interaction (a discussion usually lasting four to five minutes) for the examiners to rate, and also because it has been argued to show the best correlation with marks on the full test (IELTS, 2010).

Band level   Punjabi   Arabic   Total
5            2         2        4
6            2         2        4
7            2         2        4
Total        6         6        12

Table 2: Speech samples: distribution of band level and language
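The 2 × 3 × 2 design above (two language backgrounds, three band levels, two candidates per cell) can be sketched as a small enumeration. The label scheme mirrors the sample codes used later in the report (e.g. 5A1), but the code itself is only an illustration:

```python
from itertools import product

# Illustrative reconstruction of the 12-sample design:
# 2 language backgrounds x 3 band levels x 2 candidates per cell.
languages = ["Punjabi", "Arabic"]
bands = [5, 6, 7]
candidates = [1, 2]

# Label samples like "5A1" (band 5, Arabic, candidate 1), the scheme
# used for the VP samples later in the report.
samples = [f"{band}{lang[0]}{cand}"
           for band, lang, cand in product(bands, languages, candidates)]

per_language = {lang: sum(1 for s in samples if s[1] == lang[0])
                for lang in languages}
per_band = {band: sum(1 for s in samples if s.startswith(str(band)))
            for band in bands}

print(len(samples))   # 12 samples in total
print(per_language)   # six from each language group
print(per_band)       # four at each band level
```

The counts printed at the end reproduce the distribution stated in the text and in Table 2.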


Procedure

Questionnaire A was administered electronically and, where possible, the examiners completed and returned it several days before the predetermined date of the Phase 2 data collection session.

The speech samples were randomised into four distinct orders and recorded onto separate CDs for the examiners' use. Rating took place in a section of a library where each examiner had individual access to a computer or CD player with headphones. A practice sample was provided before the 12 samples for assessment. Examiners used the revised Pronunciation scale to score each sample, documenting their evaluations in Questionnaire B. They could rate at their own pace, pausing or replaying recordings as needed. After scoring all 12 samples, examiners completed Questionnaire C.
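The randomisation of the samples into four presentation orders can be sketched as follows. The report does not specify how the orders were generated, so this is only an illustrative approach:

```python
import random

samples = list(range(1, 13))  # the 12 speech samples

def make_orders(n_orders=4, seed=0):
    """Produce n_orders independent random orderings of the samples,
    one per CD, as in the Phase 2 rating sessions."""
    rng = random.Random(seed)
    orders = []
    for _ in range(n_orders):
        order = samples[:]       # copy so each CD gets its own ordering
        rng.shuffle(order)
        orders.append(order)
    return orders

orders = make_orders()
# Each CD still contains every sample exactly once, just in a new order.
assert all(sorted(o) == samples for o in orders)
```

A fixed seed is used here only so the sketch is reproducible; any shuffling scheme that preserves all 12 samples per order satisfies the design.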

A number of studies of the rating process on oral proficiency tests have used examiners’ retrospective verbal reports to focus on the decisions they make when judging a candidate’s performance. Orr (2002) focused on the examiner’s role as an assessor in the Cambridge First Certificate in English (FCE) Speaking test. Brown used verbal protocol studies to analyse the earlier holistic IELTS band scales and to assess the validity of the newer analytic scales. Additionally, Hubbard et al (2006) reported positive findings from a ‘real time’ verbal protocol analysis of the Cambridge CAE Speaking test, employing a methodology similar to Brown’s earlier work.

During the verbal protocol phase, each examiner rated pronunciation and commented on four of the 12 speech samples. The selection ensured that each examiner assessed samples from both language groups and a range of IELTS-assigned band levels. Each sample was evaluated by two different VP examiners, as detailed in Table 3.
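The constraints described here (six VP examiners, four samples each, every sample rated by exactly two examiners) can be checked with a simple rotation sketch. The resulting pairing is invented and does not reproduce the actual allocation in Table 3:

```python
from itertools import cycle

# The 12 sample codes: band (5/6/7) + language (A/P) + candidate (1/2).
samples = [f"{band}{lang}{cand}"
           for band in (5, 6, 7) for lang in "AP" for cand in (1, 2)]
examiners = [f"VP{i}" for i in range(1, 7)]

# Hand each sample to the next two examiners in rotation: with
# 12 samples and 6 examiners this gives each examiner exactly 4,
# and each sample exactly two distinct raters.
rotation = cycle(examiners)
assignment = {e: [] for e in examiners}
for s in samples:
    assignment[next(rotation)].append(s)
    assignment[next(rotation)].append(s)

assert all(len(v) == 4 for v in assignment.values())
```

Any allocation satisfying the same counts works; a real design would additionally balance language groups and band levels per examiner, as the study did.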

Each VP session took place in a private setting with only the VP examiner and a researcher present. Audio samples were played through a computer connected to external speakers, and the VP examiner controlled playback from the computer's keyboard. Each session was recorded with a digital voice recorder and followed a set format:

1. Practice stage: examiners practised with a sample (not included in the 12 samples) before listening to the four samples assigned to them.

2. The rating stage: the VP examiner listened to the recording as if assessing the candidate and awarded a band score for Pronunciation, then summarised the rationale behind the chosen score. The order of the samples was randomised so that each VP examiner heard recordings from the various language backgrounds and IELTS-assigned band levels in a unique sequence.

3. The review stage: the VP examiner was directed to re-listen to the recording, pausing to comment on aspects that had influenced her evaluation of the candidate's pronunciation.


4. The reflection stage: the VP examiner either commented spontaneously or, if she did not, was asked for additional comments after each recording had finished.

After the completion of all four verbal reports, the VP examiner was invited to share any additional comments, and the researcher took the opportunity to clarify points the VP examiner had raised during the session. This was the only stage at which the researchers engaged in discussion; in the earlier stages their responses were ‘intended as no more than tokens of acceptance of what they said’ (Lumley 2005, p 119), so as not to influence the subsequent verbal reports.

A total of 24 verbal reports were recorded and transcribed, with six examiners evaluating four samples each. For one report (VP2's report on sample 7P2), however, the summary from the rating stage was not captured because of a technical recording problem and was instead documented by the researcher.

VP 1 VP 2 VP3 VP4 VP5 VP6

The labels denote the IELTS band levels (5, 6 or 7) assigned to candidates, along with their language backgrounds: Arabic (A) and Punjabi (P). Each language group features two distinct candidates, labelled 1 and 2.

Table 3: Samples rated by VP examiners


3.5 Summary of study design and aims

Table 4 provides an overview of the data sources relevant to the research questions. Copies of the questionnaires used in Phases 1 and 2 can be found in Appendix 1.

Study phase   Data source        Purpose (research question addressed)

Phase 1       Questionnaire A
  • To elicit background details of examiners
  • To elicit examiners’ views of the revised Pronunciation scale in general (Research Question 1)

Phase 2       Questionnaire B
  • To record the Pronunciation score awarded for each sample

              Questionnaire C
  • To elicit examiners’ reflections on using the revised Pronunciation scale to award Pronunciation scores to the candidates in the samples (Research Questions 2a and 2c)

Phase 3       Verbal protocols
  • To record the Pronunciation score awarded for each sample (Research Question 2a)
  • To elicit examiners’ reasons for awarding a particular score (Research Question 2b)
  • To elicit reflections on the features of pronunciation that contributed to the Pronunciation score awarded (Research Question 2b)
  • To elicit responses highlighting difficulties using the revised Pronunciation scale (Research Question 2c)

Table 4: Overview of data sources

Data analysis

Quantitative data from the questionnaires were analysed using SPSS to calculate means and standard deviations for Likert scale responses, following the methodology of Derwing and Munro (Derwing & Munro, 1997; Munro & Derwing, 1995a). Paired-sample t-tests were conducted to examine differences in Likert scale responses, with the significance level set at .05.
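The report ran this analysis in SPSS; the same paired-sample t-test can be sketched in plain Python. The ratings below are invented for illustration, not the study's data, and the test statistic is computed directly from the paired differences:

```python
from statistics import mean, stdev

def paired_t(x, y):
    """Paired-sample t statistic and degrees of freedom for two
    sets of Likert ratings from the same examiners."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / n ** 0.5)
    return t, n - 1

# Illustrative ratings (NOT the study's data): ease-of-use ratings by
# ten examiners for two scales, 1 = very easy ... 5 = very hard.
pron    = [3, 3, 2, 4, 3, 3, 2, 3, 4, 3]
lexical = [2, 3, 2, 3, 2, 2, 2, 3, 3, 2]

t, df = paired_t(pron, lexical)
# For df = 9, the two-tailed critical value at alpha = .05 is 2.262;
# |t| above that indicates a significant difference in means.
print(round(t, 2), df)
```

With a statistics package available, `scipy.stats.ttest_rel` would give the same statistic along with an exact p-value.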

Qualitative data from the questionnaires were manually coded by two researchers to identify themes relevant to the research questions. One author used NVivo 8 to code and analyse the VP data, establishing clear coding category descriptions (see Appendix 2). To check coding reliability, a second author independently coded 10% of randomly selected comments; the few disagreements that arose were discussed and resolved.
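The 10% double-coding reliability check can be sketched as a percent-agreement computation. The comment ids, categories, and the simulated second coder below are all illustrative, not the study's actual coding:

```python
import random

def agreement_check(primary, fraction=0.10, seed=1):
    """Select a random subset of coded comments for a second coder and
    report percent agreement. The second coding is simulated here as
    agreeing with the primary coder; in practice it would come from an
    independent coder, with disagreements discussed and resolved."""
    rng = random.Random(seed)
    ids = rng.sample(sorted(primary), max(1, round(len(primary) * fraction)))
    second = {i: primary[i] for i in ids}      # simulated second coding
    agree = sum(second[i] == primary[i] for i in ids)
    return agree / len(ids), sorted(ids)

# Illustrative primary coding: comment id -> category.
primary = {i: ["intonation", "stress", "clarity"][i % 3] for i in range(100)}
rate, sampled = agreement_check(primary)
print(rate, len(sampled))   # 1.0 agreement on 10 double-coded comments
```

A chance-corrected statistic such as Cohen's kappa would be a stricter check than raw percent agreement; the report does not state which was used.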


Research questions 1 and 1a)

Research Question 1: In general, how do examiners view the revised IELTS Pronunciation scale?

Research Question 1a): How easy do they find the descriptors and increased number of bands to use?

Responses from Questionnaire A revealed that the 21 examiners who had used the previous version of the Pronunciation scale preferred the revised version. Their feedback highlighted that the new scale allowed for greater precision and flexibility in their evaluations (11 and seven comments respectively), and five comments noted that this improvement contributed to a fairer assessment process for candidates.

The results regarding the ease of use of the descriptors in the Speaking Test were somewhat ambiguous. Examiners rated their experience on a five-point Likert scale, with 1 indicating "very easy" and 5 "very hard". As shown in Table 5, the mean rating for the Pronunciation scale was 2.81, against lower means of 2.59, 2.41 and 2.26 for Grammatical Range & Accuracy, Fluency & Coherence, and Lexical Resource respectively, suggesting that examiners found the Pronunciation scale a little harder to use than the other scales.

However, only the difference between the means for Pronunciation and Lexical Resource was significant (see Appendix 3).

Table 5: Ease of use of descriptors on all scales of the Speaking Test

Table 6 shows a breakdown of examiner responses to this item (first row) and responses to Question A3. On Question A3, which asked about the ease of using specific aspects of the Pronunciation scale, examiners tended to rate all four items around the mid-point.


A rating of 3 on the Pronunciation scale may indicate that the examiner found its aspects neither easy nor hard, reflecting a neutral stance. This mid-point rating could also arise from uncertainty in decision-making, since the perceived difficulty of using the Pronunciation descriptors can vary with the candidate. Alternatively, a mid-point score might suggest that while the Pronunciation scale was relatively user-friendly, it was not as straightforward as the other scales used in the Speaking Test. For instance, three examiners rated the Pronunciation descriptors as a 3 while assigning ratings of 1 or 2 to the other scales, highlighting a discrepancy in ease of use among the different criteria:

E5: All the descriptors are relatively easy to use. The reason why pronunciation is a bit harder is because of the ‘in between’ bands

E19: The pronunciation descriptors are relatively new compared to the others

E26: The descriptors are overall succinct and user-friendly. The pronunciation descriptors


A1 How easy have you found it to use the descriptors on the revised Pronunciation scale?

A3 (a) How easy do you find it to use the increased number of band levels?

A3 (b) How easy do you find it to distinguish between the band levels?

A3 (c) How easy do you find it to understand the descriptors?

Note: Percentages may not add to 100% because of rounding to one decimal place.

Table 6: Ease of use of descriptors on the revised Pronunciation scale

Examining the extreme ratings on the Pronunciation scale reveals that examiners generally found the descriptors easy to use, with a tendency towards positive evaluations. Table 6 illustrates that ratings of 1 or 2, indicating ease of use, were more common than ratings of 4 or 5, which suggest difficulty:

• Almost twice as many examiners found the descriptors easy to use (n = 10) as hard to use (n = 6)

• More than three times as many examiners found the increased number of band levels easy to use (n = 14) as those who found them hard (n = 4)

• Almost twice as many examiners found it easy to distinguish between the band levels (n = 11) as those who found it hard (n = 6)

• Twice as many examiners found it easy to understand the descriptors (n = 10) as those who found it hard (n = 5)

The findings reveal that examiners favoured the revised Pronunciation scale over its predecessor, with a trend towards positive feedback on the usability of the descriptors and the expanded number of band levels. Examiners also expressed confidence in their ability to assess the various pronunciation features outlined in the scale descriptors.


Research question 1b)

Research Question 1b): How confident do they feel about judging a candidate’s use of the different features of pronunciation covered in the descriptors?

Examiners expressed a strong sense of confidence in evaluating the pronunciation features outlined in the Pronunciation scale descriptors. They used a five-point Likert scale to rate their confidence in judging sounds, rhythm, word-level stress, sentence-level stress, intonation, chunking, speech rate, intelligibility, listener strain and accent. Consistent with Brown's (2006) confidence scale, lower ratings indicated less confidence, with 1 representing "not very confident" and 5 "very confident".

As Table 7 shows, examiners were most confident assessing the global features: intelligibility (M = 4.19), listener effort (M = 4.07) and accent (M = 3.96). Their confidence in evaluating the concrete features was somewhat lower, with speech rate (M = 3.96) the exception; means for the other concrete features were notably lower, including rhythm (M = 3.52), sentence stress and intonation (M = 3.67), chunking and word stress (M = 3.74), and sounds (M = 3.78).

A series of paired-sample t-tests indicated that confidence in judging intelligibility was significantly higher than for all of the concrete features except speech rate. Confidence in judging listener effort was significantly higher than for all concrete features except chunking and speech rate. For accent, no differences between means were significant except that for rhythm.

Sounds Rhythm Word stress Sentence stress Intonation Chunking Speech rate Accent Listener effort Intelligibility

Table 7: Confidence judging features of pronunciation

The examiners thus exhibited greater confidence in assessing global features of pronunciation, such as intelligibility and listener effort, than specific concrete features. This confidence aligns with their belief that these global features are the most significant when determining scores.


Research question 1c)

Research Question 1c): Which features of pronunciation do they think are most important when awarding a pronunciation score?

Examiners' feedback on Questionnaire A highlighted that intelligibility and listener effort are the crucial factors in determining Pronunciation scores. Specifically, 85.2% of examiners identified intelligibility as important, with 77.8% ranking it among the top two features. Listener effort followed, nominated by 70.4% of examiners, with 51.9% placing it in the top two. Notably, 11 examiners considered intelligibility and listener effort together to be the most vital features, with nine prioritising intelligibility over listener effort.

Table 8: Features considered most important when awarding a Pronunciation score

Examiners thus preferred the revised IELTS Pronunciation scale to the previous version and were generally positive about the ease of using the descriptors. They felt confident assessing the various features outlined in the descriptors, particularly global aspects such as intelligibility and listener effort, which they considered the most important in determining Pronunciation scores.

As will be discussed below, however, when actually rating the samples in this study, the examiners did experience some difficulty in distinguishing between the band levels selected for focus.


Research question 2a)

Research Question 2a): When using the revised IELTS Pronunciation scale to award a pronunciation score to candidates at band levels 5, 6 and 7, how easy is it for examiners to distinguish between these band levels?

The examiners found it difficult to distinguish between the band levels when awarding Pronunciation scores in this study. This difficulty is evident in the data presented below.

Before rating the samples, examiners had expressed confidence in distinguishing between band levels. Their responses to Questionnaire C, however, revealed a contrasting picture: on a five-point Likert scale, they indicated greater difficulty in differentiating band levels among the candidates in the samples.

Table 9 summarises the examiners' responses before and after rating the samples. The responses collected prior to rating (A3b, see Table 6) reflect their general perceptions of distinguishing between band levels. The responses given after rating (C1) concern using the scale to distinguish between the samples they had just assessed, specifically Arabic and Punjabi speakers at levels 5, 6 and 7.

Note that the percentages may not total 100% because of rounding to one decimal place. Of the original 27 examiners, only 25 responded to this question: one did not continue from Phase 1 to Phase 2, and another participated in Phase 2 but did not answer this specific question.

Table 9: Ease of distinguishing between adjacent band levels before and after rating the samples

Table 9 indicates that the examiners leaned towards the mid-point on both occasions. Notably, however, their confidence declined after rating: only 16.0% reported finding it easy to distinguish between the band levels of the candidates in the samples, compared with 40.7% before rating. Qualitative feedback suggests that the language backgrounds selected for this study posed particular challenges for some examiners:

E2: Some accents that I am not used to hearing are more difficult to decipher than others. My ears are not as attuned to these sounds

Differentiating accents can be challenging, especially when they stem from diverse linguistic backgrounds. In my interviews, most candidates had accents that were unfamiliar to me, making it difficult to distinguish between them.


Other comments suggest that they may have also found it particularly difficult to distinguish between the band levels selected for this study (5, 6 and 7), as in:

E1: The most difficult is for Band 7 and Band 5, where the descriptors cross some of one and some of another. I would rather have clear guidelines for each distinct area

E21: Levels 6-7-8 are a little difficult sometimes.

Data from Questionnaire C (C2) further highlight the examiners' difficulties in distinguishing between band levels during the assessments. Nearly all respondents reported difficulty choosing between Pronunciation bands, most often between Bands 6 and 7 (54.1% of examiners) and Bands 5 and 6 (37.5%). The specific problems encountered in distinguishing between band levels are explored further under Research Question 2c).

Responses Difficult band level decisions n % of examiners (n = 24)

A total of 24 examiners responded to this question, but some reported more than one difficult band decision. As a result, the overall number of responses does not equal 24 and the percentages do not total 100%.

Table 10: Pronunciation bands examiners found difficult to choose between when awarding a Pronunciation score to the samples

The examiners' difficulty in distinguishing between band levels 5, 6 and 7 is also evident in the considerable variation in the Pronunciation scores they awarded to the samples. As described in the Methodology section, the 12 samples had previously been scored at band levels 5, 6 or 7 by IELTS Australia, with four samples at each level. Table 11 shows the distribution of scores awarded by the examiners: figures in bold indicate matches with the IELTS-assigned score, while shaded cells represent discrepancies of more than one band level. Scores ranged from Band 3 to Band 8 and often differed from the original IELTS assessments; fewer than half of the scores at each band level matched the IELTS-assigned score.
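The kind of agreement summary behind Table 11 (exact matches with the IELTS-assigned score, and discrepancies of more than one band) can be sketched as follows, with invented scores rather than the study's data:

```python
from collections import Counter

def agreement_summary(assigned, awarded_lists):
    """Cross-tabulate examiner scores against IELTS-assigned scores.
    `assigned` maps sample id -> IELTS band; `awarded_lists` maps
    sample id -> list of bands awarded by the examiners."""
    exact = 0
    off_by_more_than_one = 0
    total = 0
    dist = Counter()
    for sample, bands in awarded_lists.items():
        for b in bands:
            total += 1
            dist[b] += 1
            if b == assigned[sample]:
                exact += 1
            elif abs(b - assigned[sample]) > 1:
                off_by_more_than_one += 1
    return exact / total, off_by_more_than_one / total, dist

# Invented example: two samples, IELTS-assigned 7 and 6.
assigned = {"7A1": 7, "6P1": 6}
awarded = {"7A1": [6, 7, 6, 5], "6P1": [6, 6, 7, 6]}
match_rate, big_gap_rate, dist = agreement_summary(assigned, awarded)
print(match_rate)      # 0.5 (4 of 8 scores match)
print(big_gap_rate)    # 0.125 (one score two bands off)
print(dist)            # Band 6 is the most common score
```

Run over the full 312 scores, this is the computation that yields figures such as the 29.8% match rate for Band 7 samples reported below.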


Note: Percentages may not add up to totals or 100% because of rounding to one decimal place.

Table 11: Pronunciation scores awarded by examiners

Table 11 shows that the difficulty the examiners reported in distinguishing between band levels 6 and 7 (see Table 10) is reflected in the scores they awarded. Firstly, few examiners awarded Band 7 samples the IELTS-assigned score: only 29.8% of their scores matched it, and more than a quarter differed by more than one band (22.1% by two bands and 3.8% by three). Examiners more often awarded Band 7 samples a score of 6 (40.4%) than a 7 (29.8%), while conversely some Band 6 samples were awarded a score of 7.

Secondly, Band 6 was the most frequently awarded score overall, accounting for 36.5% of all scores (114 of 312), while Bands 5 and 7 accounted for 26.0% and 22.4% respectively. Qualitative feedback from Questionnaire C suggests a potential bias towards awarding Band 6:

E11 (C1): Band 6 seemed the easiest and the most common. This is the band I usually give during the IELTS tests also

E29 (C2): To some extent I am hesitant to give a higher band score [meaning above 6] to candidates if there is still a noticeable accent even if they are actually quite easy to understand

E11 (C5): I would like [the descriptors for] bands 7 and 5 to be longer as often I find it difficult to differentiate. If I am confused, I often find myself choosing 6 as a default

The VP data corroborate these findings. Table 12 shows the Pronunciation scores assigned by each VP examiner: there was again a marked tendency to award Band 6, which accounted for half of the scores (12 of 24). One examiner, VP5, awarded Band 6 to all four samples she rated, which comprised one Band 5, one Band 6 and two Band 7 samples.


VP 1 VP 2 VP3 VP4 VP5 VP6

Note: (+) next to the score signifies that this score is higher than the IELTS-assigned score by the amount indicated.

Table 12: Pronunciation scores awarded by VP examiners

The VP examiners seemed to have similar difficulties distinguishing between band levels:

• less than half of the 24 VP scores matched the IELTS-assigned scores

• only two samples (5A1 and 7P2) were awarded the IELTS-assigned score by both examiners who rated them

• only one VP examiner (VP2) awarded IELTS-assigned band scores to all four samples that she rated

Like the Phase 2 examiners, the VP examiners also seemed to have particular difficulty awarding a score of 7: of the four Band 7 samples, three (7A1, 7A2, 7P1) were awarded a 6 by at least one of the VP examiners who rated them.

4.5 Research question 2b)

Research Question 2b): When using the revised IELTS Pronunciation scale to award a pronunciation score to candidates at band levels 5, 6 and 7, which features of pronunciation do examiners take into consideration?

As discussed above (see Table 8), before rating the samples, Phase 2 examiners indicated that they considered intelligibility and listener effort to be the most important features when awarding a Pronunciation score. However, comments made after the rating process suggest that concrete features of connected speech, such as intonation, stress and rhythm, also weighed heavily in their scoring decisions, particularly in distinguishing Band 7 from the bands below it. Responses to Questionnaire C illustrate this:

E21 (C2): A high 6 is very close to a 7, with the impression of stress and intonation often making the difference

E23 (C4): Individual sounds of course, and around Band 7 rhythm (esp w Indian/Pakistani speakers) is important

At higher proficiency levels, distinguishing between levels 6, 7, and 8 becomes challenging While level 6 is generally comprehensible, level 8 is easily understood The nuances that differentiate these levels likely hinge on intonation and stress patterns in speech.

In Phase 3, the VP examiners provided two further sources of information about the features they took into consideration when rating the samples: after awarding a Pronunciation score, they summarised their reasons for that score, and during the subsequent review stage they gave verbal reports on the specific features that had attracted their attention.

Table 13 highlights the frequency of different features mentioned by VP examiners during the rating stage, revealing that intonation (83.3%) and chunking (75.0%) were the most frequently cited aspects of connected speech This finding contrasts with the priorities set by Phase 2 examiners, who emphasized intelligibility and listener effort as key factors (refer to Table 8) Notably, intelligibility was mentioned in only 54.2% of the summaries, indicating it was less frequently acknowledged compared to several other features.


This difference may be an effect of the task: when summarising their reasons for a score, the VP examiners tended to work through the descriptors at the relevant band levels, checking features off systematically, as in the following summary:

Initially, I considered rating the speaker a 4 due to his hesitance However, upon further evaluation, I leaned towards a 6 as he met the first three parameters of Band 4, demonstrating acceptable phonological features and improved chunking once he warmed up Despite some lapses in rhythm and occasional choppy intonation and stress, he was generally understandable, though it required some effort While individual words were pronounced, clarity was frequently lacking, leading me to ultimately assign a score of 5.

[Table body only partially preserved; columns: feature, number and percentage of VP examiner summaries in which the feature was mentioned]

Accent: 5 (20.8%)

Notes: a. Number of VP summaries = 24 (6 VP examiners x 4 samples each). b. Stress includes word stress and sentence stress, as comments did not always differentiate between the two

Table 13: Features mentioned by VP examiners summarising their reasons for awarding a Pronunciation score
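The percentages in Table 13 are simple proportions over the 24 VP summaries. The sketch below reproduces them; the counts for intonation, chunking and intelligibility are inferred from the percentages quoted in the text above, so they are reconstructions rather than figures read from the table.

```python
# Share of the 24 VP summaries (6 examiners x 4 samples) mentioning each
# feature. Only the Accent count (5) survives in Table 13 as shown here;
# the other counts are inferred from the percentages quoted in the text.
SUMMARIES = 6 * 4  # 24

mentions = {
    "intonation": 20,       # 83.3%
    "chunking": 18,         # 75.0%
    "intelligibility": 13,  # 54.2%
    "accent": 5,            # 20.8%
}

for feature, n in mentions.items():
    print(f"{feature}: {n}/{SUMMARIES} = {100 * n / SUMMARIES:.1f}%")
```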

Because the VP examiners mostly awarded scores of Band 5 and Band 6, they referred to the features listed in the descriptors for Bands 4 and 6, in which intelligibility is not specifically mentioned. The Phase 2 examiners, in contrast, had responded to a provided list that explicitly included intelligibility as a feature.

During the review stage of Phase 3, intelligibility emerged as one of the top three features highlighted by VP examiners when assessing candidates' pronunciation As illustrated in Table 14, the breakdown of features mentioned during review turns reveals that phonemes received the most comments, followed by intonation and intelligibility Notably, there was significant variability in the assessments among the six VP examiners.


Table 14 reveals that a significant majority of comments on phonemes (64 out of 76) originated from a single examiner, VP3 Additionally, there was notable variation in the focus of comments among examiners; VP1 and VP2 primarily addressed intelligibility, while VP5 and VP6 frequently highlighted intonation, and VP4 concentrated mainly on rhythm.

[Table body not preserved; columns give each VP examiner's years of experience and number of review turns: VP1 (1.5 yrs), VP2 (0.2 yrs), VP3 (10 yrs), VP4 (5 yrs), VP5 (2.5 yrs), VP6 (2 yrs); rows list features of pronunciation]

Note: The figures in bold type in the shaded cells are the features mentioned at the most review turns by each VP examiner

Table 14: Pronunciation features commented on in VP review turns by each examiner

Table 14 reveals significant variation among VP examiners in the number of review turns, with VP4 conducting 21 and VP3 conducting 104 This disparity indicates differing perceptions of which features were deemed noteworthy for comment during the review process.

! Phonemes were a significant focus only for VP3, who mentioned them in 45.4% of her review turns; VP2 and VP1 mentioned them in only 1.7% and 2.4% of theirs respectively

! Although intonation was mentioned by VP5 and VP6 in 32.4% and 29.8% of their turns respectively, VP1 only commented on this feature in 4.8% of her turns

! Although rhythm was the feature that VP4 commented on most frequently (in 22.2% of her turns), VP5 did not mention it at all

! While VP5 mentioned word stress in 5.9% of her turns, VP6 mentioned it in 15.5% of hers

! Similarly, the number of turns in which chunking was mentioned ranged from 2.9% (VP5) to 14.3% (VP6)


Some of this variability might be expected, since each VP examiner rated a different set of four samples. Marked differences were evident, however, even when two examiners reviewed the same sample.

In the study, two VP examiners evaluated each of the 12 samples, with only samples 5A1 and 7P2 receiving consistent IELTS-assigned scores from both examiners A comparison of the review comments revealed significant differences in the attention given to pronunciation features; for instance, VP3 provided 32 comments on sample 5A1, focusing on concrete phonetic elements, while VP1 made only eight comments, emphasizing global intelligibility Despite their differing approaches, both examiners assigned the same score to 5A1 The disparity in their feedback may be attributed to VP3's extensive experience as an IELTS examiner and trainer, suggesting that greater expertise can influence the evaluation of pronunciation features.

Table 15 also shows differences in the features commented on by the two examiners who rated sample 7P2, although here the difference in the number of review turns was less extreme. VP2, the least experienced examiner with only two months of experience, favoured global judgements of intelligibility (mentioned in 21.9% of her review turns) and listener effort (18.8%), whereas VP6, with two years of experience, mentioned these aspects only once and not at all, respectively. VP6 concentrated instead on intonation (25.9% of her turns), a feature VP2 barely mentioned. Although both examiners awarded the same score, their differing focus again suggests that their levels of experience and expertise influenced the way they approached the rating process.


[Table body not preserved; columns (number of review turns): VP1 (n=8), VP3 (n=32), VP2, VP6; for each examiner, n and % of review turns per feature of pronunciation]

Note: The figures in bold type are the feature mentioned at the most review turns for each VP examiner

Table 15: Frequency of comments on features of pronunciation at review turns when the same score was awarded by both VP examiners

4.6 Research question 2c)

Research Question 2c): When using the revised IELTS Pronunciation scale to award a pronunciation score to candidates at band levels 5, 6 and 7, what problems do examiners report regarding use of the scale?

In this study, the Phase 2 examiners expressed a generally positive view regarding the usability of the descriptors and the increased number of band levels, as indicated by their responses to Questionnaire A.

However, analysis of Questionnaire C, completed after the examiners had rated the samples, and of the comments made by the VP examiners in Phase 3 of the study identified two problem areas: (a) the descriptors at Bands 3, 5 and 7, and (b) the overlap between the Pronunciation scale and the Fluency and Coherence scale.

4.6.1 The descriptors at Bands 3, 5 and 7

In response to a number of different questions in Questionnaire C, the majority of Phase 2 examiners (19 out of 26, 73.1%) indicated at some stage that they would like more specific detail in the descriptors at Bands 3, 5 and 7, and several reported difficulty interpreting the terms ‘positive features’ and ‘some but not all’. When asked which bands were difficult to differentiate, seven examiners singled out Bands 5 and 7. For example:

E14: 5 and 4 is always hard eg ‘5’ refers to all the positive features of Band 4, yet there aren't many!!

E11: 5-6 especially, as descriptions for Band 5 are quite minimal, similar for Band 7 6-7

E13: 6 and 7: It is not clear what qualifies as a ‘7’

E18: 6- 7-8 because there is no specific Band 7 indicator, it makes it a little more difficult to assess

In C3, examiners were asked to choose between the following three statements regarding the rationale for awarding a band score of 5 and they were then invited to comment on their answer

1 The candidate displays all the features of 4 and most of the positive features of 6

2 The candidate displays all of the features of 4 and all but one of the positive features of 6

3 The candidate appears to be mid-way between a 4 and a 6

Responses to C3 revealed confusion over the phrase ‘some but not all’ used in the new ‘in-between’ band levels. Only two of the 26 examiners chose Statement 2, the interpretation specified in the IELTS instructions to examiners (IELTS 2008a). Comments suggest that, rather than checking for all the features of a 4 and all but one of the positive features of a 6, examiners tended to focus on a particular feature when awarding the score:

E5: I tend to use the 'chunking' descriptor more as a benchmark

E29: I tend to focus more on 'can be generally understood ' descriptor even though the other descriptors are also considered


Ten examiners selected Statement 1 indicating they thought a speaker should be awarded a Band 5 if a candidate displays ‘most of the positive features of 6’ For example:

E6: I would award a 5 if the candidate achieves 2 or more (but not all) of the Band 6 criteria

E19: Some, but not all could fall under any of the above definitions This is poorly expressed and too arbitrary especially in something as important as pronunciation

And 10 examiners selected Statement 3: ‘the candidate appears to be mid-way between a 4 and a 6’ For example:

E16: Identifying positive and negative features is essential for evaluation A 'star' symbol can effectively highlight the positive attributes, such as when a speaker attempts to use intonation, even if their control is limited This raises the question of whether such attempts should be classified as positive or negative in overall assessment.

E22: 1+2 above do not suggest a mid-way mark between 4+6

E26: The candidate should display all positives of 4 and at least one positive of 6

(not 'most' or 'all but one')

The remaining four examiners either chose two alternatives or amended one of the options to fit their view of what the correct answer should be

In response to C5 regarding the length of the Pronunciation descriptors, 53.9% of examiners believed the descriptors were appropriately sized, while 23.1% suggested they should be longer Six examiners specifically noted that the descriptors for Bands 5 and 7 were inadequate Negative feedback regarding the length and wording of descriptors for Bands 3, 5, and 7 came from nine examiners, with two recommending additional options or guidance for these levels Criticism of the new band levels was evident, with two examiners labeling them as "cop outs."

The introduction of the new descriptor bands, particularly 5 and 7, has been met with disappointment, as they are perceived as inadequate and a simplistic solution Many candidates are expected to fall within these ranges, leading to concerns about their effectiveness After a long wait for these new bands, the lack of clarity and usefulness compared to previous descriptors has left many feeling let down.

Others indicated a desire for more specific descriptions at the new band levels and reported having issues with the concept of ‘positive features’ For example:

E14: I don't like the ‘displays all positive features of Band X and some but not all positive features of Band Y.’ Too confusing in time pressure situation

In the revised Bands 9-7-5 system, each band should highlight the positive attributes of both the higher and lower bands For instance, a band score of '7' should include the strengths and positive features associated with bands '8' and '6' in its descriptor, ensuring a comprehensive understanding of each level's qualities.

Fifteen out of the twenty-one examiners who provided feedback on the revised Pronunciation scale (C11) expressed a desire for increased specificity, particularly emphasizing the need for more detailed criteria at Bands 3, 5, and 7.

E2: As mentioned, 3, 5 and 7 could be expanded more

E16: I’d like descriptors listed in the new Bands

E19: This is insufficient detail in Bands 3, 5 and 7

E21: Levels 5 and 7 are defined by reference to other levels which is not always easy


E27: No detail for Bands 3, 5 and 7

Concerns regarding the wording of descriptors were highlighted in responses to C12, where 14 out of 24 examiners recommended modifications for Bands 3, 5, and 7 to enhance the revised scale.

E10: It can be changed to be ‘mid-way’ or more descriptors given

E13: Again, being more explicit in the in-between Bands - 3, 5, 7

E14: Get rid of the ‘displays features of X, but not all features of Y.’ Replace with more easily assessable descriptors

E18: A bit more detail for Bands 3, 5 and 7

E20: The descriptors for odd numbers should be more explicit ‘some, but not all, ’ needs to be clearer

E26: Maybe, put negative feature in bold for each band to distinguish them

Despite the examiners favoring the revised scale over the old one, they expressed concerns regarding the clarity of the descriptors in the new bands and a desire for more specificity Additionally, there were worries about the overlap between the Pronunciation scale and the Fluency and Coherence scale.

4.6.2 The overlap between the Pronunciation scale and the Fluency and Coherence scale

A significant theme identified by Phase 2 examiners and reviewed by VP examiners in Phase 3 was the overlap between the Pronunciation scale and the Fluency and Coherence scale Eight out of 26 Phase 2 examiners noted challenges in distinguishing between these two scales and addressing the perceived overlap in their assessments.

E2 (C7): I wonder if speech rate should be under Fluency and Coherence rather than under Pronunciation

E10 (C7): I know repetition is covered in F& C, but I find it affects intelligibility in some candidates

Many speech samples exhibit issues with rhythm, which is crucial for IELTS scoring This aspect is closely linked to fluency and coherence (FC), reflecting how quickly a candidate speaks and their proficiency in chunking language effectively.

Evaluating a candidate's coherence can be challenging, especially when their pronunciation is notably poor In instances where I assign a score of 6 for coherence but feel they merit only a 4 for pronunciation, I question whether to reassess coherence based on their pronunciation This dilemma often leads to uncertainty in my evaluations.

Further insight into concerns about the overlap between these two scales emerged from review turn comments by the VP examiners As noted above (see Research Question 2b), at 12 review turns,

VP examiners commented on features of Fluency and Coherence rather than Pronunciation, as in:

VP1/5A1: Yeah his rate of speech, that’s sort of bringing him down from a 6 down to a 5 because of the the- the speech is quite slow and hesitant in a way

At the end of their verbal protocol sessions, two of the VP examiners commented explicitly on the close relationship between the two scales.


4.7 Summary of findings

Phase 2 examiners expressed a preference for the revised Pronunciation scale, finding it easier to use due to its clear descriptors and increased band levels They felt confident in assessing pronunciation features, particularly in judging intelligibility and listener effort, which they deemed crucial for scoring However, they faced challenges in differentiating between band levels during actual ratings, often assigning scores between Band 3 and Band 8 to candidates with IELTS scores of 5, 6, or 7 The distinction between Bands 6 and 7 proved especially difficult, leading to a reluctance to award a score of 7, resulting in Band 6 being the most frequently assigned score.

The difficulty distinguishing between Bands 5, 6 and 7 when rating the samples was also reflected in the VP data: Band 6 was again the most frequently awarded score, and fewer than half of the scores matched the IELTS-assigned ratings. The VP examiners identified intonation and chunking as the aspects of connected speech they attended to most, along with the overall effort required of the listener. However, their verbal reports revealed considerable variation in the features they noticed and commented on, even when reviewing the same stretch of speech, in the way they described similar features, and in their judgements of the acceptability of candidates' pronunciation of particular features. The terminology they used for global features of pronunciation also varied, and some examiners appeared to be influenced by criteria beyond those included in the Pronunciation scale descriptors when awarding scores.

Two areas of concern about the revised Pronunciation scale were identified: (a) the specificity of the descriptors at Bands 3, 5 and 7, and (b) its overlap with the Fluency and Coherence scale


5.1 Examiner attitudes to, and use of, the scales

This study aimed to examine examiners' perceptions of the revised Pronunciation scale and its application in scoring speakers from diverse language backgrounds at key band levels 5, 6, and 7 The findings indicate that examiners generally favored the revised scale over its predecessor, expressing confidence in assessing the outlined features and appreciating the usability of the increased band levels Their positive feedback on the scale's length and content highlights its effectiveness in avoiding the pitfalls of overly complex descriptors, which, as noted by Orr (2002), can hinder consistent usage.

Examiners generally viewed the revised Pronunciation scale positively, but faced challenges in differentiating between band levels, particularly between Bands 6 and 7, often defaulting to a score of 6 This inclination persisted despite an equal representation of candidates across Bands 5, 6, and 7 Brown (2006) noted a similar trend with the previous four-point scale, where Band 6 was frequently used as a default due to reluctance in awarding lower or higher scores This scoring tendency could significantly impact candidates who rely on these assessments for various gate-keeping purposes, suggesting that the introduction of in-between bands in the revised scale has not effectively resolved these issues.

Confusion among examiners regarding the interpretation of the new in-between bands in IELTS may stem from discrepancies in the documentation Specifically, the descriptors for Bands 5 and 7 state that "some but not all" positive features of Band 6 must be present to award a Band 5, yet the self-access training materials for examiners provide a differing definition Clarifying the intentions and specifications for these band levels could enhance understanding and consistency in scoring.

Examiners face challenges in using the Pronunciation scale due to its perceived overlap with Fluency and Coherence, as both aspects are intricately linked in assessing spoken English proficiency For example, appropriate pausing that groups words into meaningful chunks is a key feature of pronunciation, yet pauses are also used as a measure of fluency Additionally, while some researchers classify speech rate as an element of pronunciation, it appears in both scales of the IELTS descriptors The overlap in documentation regarding speech rate, hesitation, and chunking further complicates the distinction between these scales This repetition in wording may hinder examiners' ability to assign a clear score for pronunciation, making the evaluation process more complex.


5.2 Variation between examiners

Examiners exhibited significant variation in their scoring and evaluation of the same samples, as revealed by VP data This variation was evident in the specific pronunciation features they focused on during their ratings and comments Differences among examiners included the number of samples they referenced, the sections they highlighted, the particular features they chose to comment on, and the descriptions they provided for those features.

Assessing speaking skills is inherently challenging and marked by variability (McNamara 1996, p 127). Orr (2002) highlighted this in his study of the First Certificate in English (FCE) speaking test, finding that examiners often made similar observations about a sample yet assigned different scores, or gave the same score while attending to different aspects of the performance; he concluded that each rater's score reflected a unique combination of factors (Orr 2002, p 151). The context of the speaking test may further explain the variation in the examiners' comments (Hubbard et al).

While the implementation of a scale with specific descriptors has somewhat mitigated variations in interpretation, individual factors such as personal interest, expertise, and subjective interpretation continue to pose challenges.

Several factors may contribute to this variability: individual examiner experience, expertise and preferences; the nature of spoken assessments; and the particular challenges of using rating scales. Although the examiners in this study completed the questionnaires and rating tasks in a familiar context and heard recordings similar to those encountered under real test conditions, they had no face-to-face interaction with the candidates and rated only a single segment of the spoken interview. This may have affected their judgements, since research suggests that examiners tend to score audio samples more harshly (Taylor & Jones 2001, p 2). However, since the examiner scores varied in both directions at Bands 5 and 6, other factors are clearly also at work.

The study involved trained IELTS examiners with teaching experience ranging from three to 30 years and examination experience from less than a year to 13 years While the research did not explicitly explore the correlation between these factors and scoring tendencies, the VP data suggested that examiner background may influence scoring Notably, the findings revealed that the most experienced examiners or English teachers did not always assign scores that aligned closely with IELTS standards However, expertise appeared to enhance the accuracy in identifying and describing specific phonological features.

In their identification of phonemes, some VP examiners showed clear expertise and were able to articulate problems effectively, while others commented very little on this aspect, and it is uncertain how far they were able to discuss phonemes explicitly. This reflects a broader pattern: many teachers lack confidence in this area, and even experienced listeners can find such judgements difficult (Schmid & Yeni-Komshian 1999; Derwing & Rossiter 2003; Levis 2006; Macdonald 2002). Where an examiner made no comment on a feature, it is unclear whether she failed to notice it, lacked the expertise to comment on it, or simply did not consider it worth mentioning.


5.3 The rating process and what examiners take into consideration

The rating process for pronunciation assessment, while challenging to generalize due to its intrusive nature (Brown 2007), reveals that examiners often rely on the Pronunciation scale and key indicators to evaluate speech samples This structured approach allows them to systematically identify and articulate pronunciation features, checking them off against established descriptors at various band levels Consequently, this method encourages examiners to focus on a diverse range of features when assigning scores and provides a consistent framework for discussing performance aspects Questionnaire data indicated a consensus on important pronunciation features, and verbal protocols reflected a shared emphasis on these elements, albeit with varying degrees of focus Thus, the scale serves as a guiding script for examiners, as noted by Lumley (2005), who highlights its role in providing a structured language and methodology for justifying assessments.

Despite a shift towards emphasizing concrete phonological features, Phase 1 examiners still considered global judgments of intelligibility and listener effort crucial for scoring decisions, while largely overlooking accent as a factor This preference for global assessments may stem from the majority of examiners having prior experience with an older scale, influencing their perception of pronunciation Additionally, the simplicity of these judgments requires less technical expertise, making it easier to label speech as unintelligible rather than conduct a detailed phonological analysis While this approach allows for assessments by less trained examiners, it also introduces variability in interpretation, as terms like "unclear" or "difficult to understand" may not convey the same meaning among different evaluators.

Examiners utilized terms such as clarity, intelligibility, and listener effort to assess speech samples; however, inconsistencies in their interpretations of these concepts were evident in the VP data This ambiguity allowed for imprecision in identifying specific features within the speech samples Research by Brown (2007) highlighted variations in examiners' tolerance for broader judgments like comprehensibility, suggesting a need for more precise definitions of these global pronunciation aspects For instance, while clarity is frequently mentioned in relation to word-level features like stress and phonemes, it lacks a clear definition in the glossary At times, clarity referred to the accuracy of sound articulation, while in other instances, it seemed to align more with intelligibility Although unclear articulation can impede understanding, the connection between clarity and intelligibility appeared ambiguous for some examiners.


In the time-pressured context of rating, then, the scale serves examiners as a kind of checklist through which they can justify their scoring decisions (Lumley 2005). However, while it gives examiners a common framework for discussing samples, it may also mask variation and imprecision in what they actually attend to. Orr (2002) found that one third of the FCE examiners in his study focused on overall impressions of the performance, which he attributed to ‘the limitations of the rating scale and the training for focussing raters’ attention on the components of communicative language ability and not its overall effect’ (p 151).

Orr (2002) identified that examiners often commented on factors beyond the established rating scales, suggesting a lack of understanding of the model of communicative language ability While some examiners in the VP phase of the study occasionally deviated from the Pronunciation scale descriptors, this was not a significant issue Similarly, Iwashita et al (2008) noted that variability among raters was minimal, as they considered multiple factors within the scales to determine scores Overall, the revised Pronunciation scale appears to have helped the VP examiners maintain focus on the relevant criteria.

Fulcher, Davidson, and Kemp (2011) emphasize that the effective use and interpretation of a scale is influenced by socialization, particularly through proper training for examiners to align their understanding with the test developers' intentions and global standards While examiners generally view the revised Pronunciation scale positively and utilize it in the scoring process, there remains some ambiguity regarding the descriptors at specific band levels To enhance clarity and consistency in evaluating spoken performances, it is recommended that examiners engage in professional development to better understand the relationships between various features and their application across different proficiency levels.

In the light of these findings, it is recommended that:

! The descriptors at Bands 3, 5 and 7 be revised to identify specific features of performances at these levels, or further guidelines be adopted to clarify how the current descriptors should be interpreted

! Instructions in training documentation be clarified to ensure consistent interpretation of the Band descriptors 3, 5 and 7

! Guidelines be developed to assist examiners to distinguish between similar features in the Pronunciation scale and Fluency and Coherence scale

! Ongoing professional development, re-certification and examiner moderation focus on key aspects of pronunciation and the rating process, including: the nature of the scale and specific features of pronunciation; score standardisation, indicating the presence or absence of these features at different levels; and the relationship between the Pronunciation scale and the Fluency and Coherence scale

! Examiner selection processes ensure a minimal level of expertise in pronunciation.

References

Bent, T, Bradlow, A and Smith, B, 2007, ‘Segmental errors in different word positions and their effects on intelligibility of non-native speech’ in Language Experience in Second Language Speech

Learning, eds O-S Bohn and MJ Munro, John Benjamins Publishing Company, Amsterdam, pp 331-347

Birrell, R and Healey, E, 2008, ‘How are skilled migrants doing?’, People and Place, vol 16, no 1, pp 1-19

Boyd, S, 2003, ‘Foreign-born teachers in the multilingual classroom in Sweden: the role of attitudes to foreign accent’, International Journal of Bilingual Education and Bilingualism, vol 6, no 3/4, pp 283-295

Brown, A, 2006, ‘An examination of the rating process in the revised IELTS Speaking Test’, IELTS Research Reports Volume 6, IELTS Australia, Canberra and British Council, London, pp 41-65

Brown, A, 2007, ‘An investigation of the rating process in the IELTS oral interview’ in IELTS collected papers: research in speaking and writing assessment, eds L Taylor and P Falvey, Cambridge University Press, Cambridge, pp 98-139

Brown, A and Taylor, L, 2006, ‘A worldwide survey of examiners’ views and experience of the revised IELTS Speaking Test’, Research Notes, vol 26, pp 14-18

Cauldwell, R, 2002, Streaming speech: listening and pronunciation for advanced learners of English, speechinaction, Birmingham, UK

Derwing, TM and Munro, MJ, 1997, ‘Accent, intelligibility, and comprehensibility: evidence from four L1s’, Studies in Second Language Acquisition, vol 19, no 1, pp 1-16

Derwing, TM and Munro, MJ, 2005, ‘Second language accent and pronunciation teaching: a research- based approach’, TESOL Quarterly, vol 39, pp 379-398

Derwing, TM and Rossiter, M, 2003, ‘The effects of pronunciation instruction on the accuracy, fluency, and complexity of L2 accented speech’, Applied Language Learning, vol 13, no 1, pp 1-17

De Velle, S, 2008, ‘The revised IELTS Pronunciation scale’, Research Notes, vol 34, pp 36-38

Fayer, JM and Krasinski, E, 1987, ‘Native and nonnative judgments of intelligibility and irritation’, Language Learning, vol 37, no 3, pp 313-326

Field, J, 2005, ‘Intelligibility and the listener: the role of lexical stress’, TESOL Quarterly, vol 39, pp 399-424

Fulcher, G, Davidson, F and Kemp, J, 2011, ‘Effective rating scale development for speaking tests: performance decision trees’, Language Testing

Hahn, LD, 2004, ‘Primary stress and intelligibility: research to motivate the teaching of suprasegmentals’, TESOL Quarterly, vol 38, no 2, pp 201-223

Hansen Edwards, JG and Zampini, M, (eds), 2008, Phonology and second language acquisition, John Benjamins Publishing Company, Amsterdam/Philadelphia


Hubbard, C, Gilbert, S and Pidcock, J, 2006, ‘Assessment processes in speaking tests: a pilot verbal protocol study’, Research Notes, vol 24, pp 14-19

IELTS, 2008a, IELTS Speaking Test Self-Access Re-training Set for the Revised Pronunciation Scale, IELTS, Cambridge

IELTS, 2008b, IELTS Speaking Test Instructions to IELTS examiners, IELTS, Cambridge

IELTS, 2010, Examiner information, accessed 21 October 2010, from

Iwashita, N, Brown, A, McNamara, T and O’Hagan, S, 2008, ‘Assessed levels of second language speaking proficiency: how distinct?’, Applied Linguistics, vol 29, no 1, pp 24-49

Levis, JM, 2006, ‘Pronunciation and the assessment of spoken language’ in Spoken English, TESOL, and Applied Linguistics: Challenges for theory and practice, ed R Hughes, Palgrave Macmillan,

Lumley, T, 2005, Assessing second language writing: the rater’s perspective, Peter Lang,

MacDonald, S, 2002, ‘Pronunciation: Views and practices of reluctant teachers’, Prospect, vol 17, no 3, pp 3-18

McNamara, T, 1996, Measuring Second Language Performance, Longman, London/New York

Munro, MJ and Derwing, TM, 1995a, ‘Foreign accent, comprehensibility, and intelligibility in the speech of second language learners’, Language Learning, vol 45, no 1, pp 73-97

Munro, MJ and Derwing, TM, 1995b, ‘Processing time, accent, and comprehensibility in the perception of native and foreign-accented speech’, Language & Speech, vol 38, no 3, pp 289-306

Munro, MJ and Derwing, TM, 2001, ‘Modeling perceptions of the accentedness and comprehensibility of L2 speech: the role of speaking rate’, Studies in Second Language Acquisition, vol 23, no 4, pp 451-

Munro, MJ, Derwing, TM and Morton, SL, 2006, ‘The mutual intelligibility of L2 speech’, Studies in Second Language Acquisition, vol 28, no 1, pp 111-131

Orr, M, 2002, ‘The FCE Speaking test: using rater reports to help interpret test scores’, System, vol 30, no 2, pp 143-154

Schmid, PM and Yeni-Komshian, GH, 1999, ‘The effects of speaker accent and target predictability on perception of mispronunciations’, Journal of Speech, Language, and Hearing Research, vol 42, no 1, pp 56-64

Segalowitz, N, 2010, Cognitive bases of second language fluency, Routledge, London

Taylor, L and Jones, N, 2001, ‘Revising the IELTS Speaking Test’, Research Notes, vol 4, pp 9-12
Zielinski, BW, 2008, ‘The listener: no longer the silent partner in reduced intelligibility’, System, vol 36, no 1, pp 68-84


Appendix 1: Questionnaires

(Note: In order to conserve space, the lines provided for answers have not been included in this version.)

Thank you for agreeing to take part in our study of how examiners assess pronunciation using the new IELTS Pronunciation scale. We value your insights and experience, and all responses will be treated in the strictest confidence.

How many years have you been an IELTS examiner? years

How many years have you been teaching ESL / EFL? years

What languages do you speak?

What language did you speak when you were growing up?

What language do you speak at home now?

In which countries have you lived and for how long?

What qualification(s) do you have? (Tick one or more)

! Diploma in Education (TESOL method) ! Graduate Certificate in _

! Graduate Diploma in ! Masters in _

! Bachelor of Education (TESOL) ! Bachelor of Arts (Major: _ )

! Other (please specify) _ ! Other (please specify) _


1 How easy have you found the descriptors to use on the following IELTS Speaking test scales?

Give reasons for your answer

2 How confident do you feel about the accuracy of your rating on the following scales? not very confident very confident

Give reasons for your answer

3 How easy do you find it to: very easy very hard

(a) Use the increased number of Band levels on the Pronunciation scale?

(b) Distinguish between Band levels for pronunciation?

(c) Understand the descriptors very hard very easy


4a How confident do you feel when you are judging the following features of a candidate’s speech? not very confident very confident

4b Which of these features of spoken language do you think are most important when you are awarding a pronunciation score? Please rank them if appropriate

5 When you re-certified on the new Pronunciation scale, did you have: a group session with an IELTS trainer or individual self-access? (Underline your answer)

6 How well do you feel the training prepared you to examine using the revised Pronunciation scale? not very well very well

Please give reasons for your answer

7 If you are familiar with the previous Band scale, which scale do you prefer? Underline your preferred answer

The previous 4-band scale or The revised 9-band scale?

8 Do you have any other comments on the revised Pronunciation scale?


(Note: to conserve space, the rating scales for all speakers are not included in this version.)

You will hear 12 recordings of Part 3 of the IELTS Speaking Test, the 4-5 minute two-way discussion

For each speaker, listen to the recording, refer to the scales as you would when examining, and then write your IELTS Pronunciation Band score. You may listen to sections of the recording again if you need to. After writing your score, circle the number that best reflects how confident you are that your rating is accurate.

How confident are you that this rating is accurate? Not at all confident Very Confident



(Note: In order to conserve space, the spaces provided for answers have not been included in this version.)

1 How easy did you find it to distinguish between Pronunciation Band levels for these candidates? very easy 1 2 3 4 5 very hard

2 Were there any Pronunciation Bands you found it difficult to choose between?

Yes/ No If yes, which ones and why?

3 Which statement best fits your understanding of the rationale for awarding a Pronunciation Band 5?

1 The candidate displays all the features of 4 and most of the positive features of 6

2 The candidate displays all of the features of 4 and all but one of the positive features of 6

3 The candidate appears to be mid-way between a 4 and a 6

4 When you were assessing Pronunciation which part(s) of the descriptor did you generally find yourself paying most attention to?

5 Do you think the descriptors are about the right length or would you prefer them to be shorter/longer? Please elaborate

6 Do you think the descriptors cover features of pronunciation that can be readily assessed in the testing situation? Yes/no Please elaborate

7 Are there aspects of pronunciation you think are important that are not mentioned in the descriptors? If so, please note them below

8 Which part of the test is most useful to you when making a judgement about pronunciation? Please circle the best answer:

9 How is your final Pronunciation rating achieved? How do you work towards it? At what point do you finalise your Pronunciation rating?

10 What do you like about the new Pronunciation scale?

11 What don’t you like about the revised Pronunciation scale?

12 In your opinion, how could the Pronunciation scale be improved?


Appendix 2: Coding categories for VP comments

The word accent is used by the VP examiner

Comments that reflect on what the candidate might be feeling or thinking

The word chunking is used by the VP examiner or the VP examiner indicates that the candidate pauses in the right place

The word clarity is used by the VP examiner. Includes comments related to how clear a candidate’s speech is

Comments on anything above the word level. Includes stress at sentence level

Effort required to understand candidate

The degree of effort required of the listener to understand the candidate. Includes comments related to how hard a candidate is to understand

Features contributing directly to decision on band level assigned

Any connection between a feature of pronunciation and the band level assigned or the decision making process of assigning a band level

Comments on features and/ or use of terms that are not included in the band descriptors or key indicators for the revised IELTS Pronunciation scale

The VP examiner either (1) uses the word intelligibility, (2) comments that she can’t understand what the candidate is saying, or (3) comments on features that affect whether the words spoken can be recognised: features that make what the candidate says sound like something different, or that make the words clear and easy to recognise

The word intonation is used or the VP examiner’s comments are related to tone or pitch variation

Comments related to linking words together – related to phonemes rather than rhythm

Comments about something the candidate is doing wrong

The word phoneme is used or comments relate to sounds, consonants or vowels

Comments about something the candidate is doing right or well

The word rhythm is used or comments relate to timing (eg, stress timing, syllable timing) or linking of words in connected speech

Comments related to the rate of speech

Comments related to stress in words or stress of words in sentences

Comments related to stress patterns in individual words

Comments related to the stress pattern across sections of connected speech

Comments related to individual words
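Several of the categories above are triggered by the presence of a specific word in the examiner's comment (accent, chunking, clarity, intelligibility, intonation, phoneme, rhythm). As a purely illustrative sketch, not part of the study's coding method, such keyword-triggered categories could be pre-tagged automatically before human coding; the category names and keyword list below are assumptions:

```python
# Hypothetical keyword-based pre-tagger for the keyword-triggered categories.
# Judgement-based categories (eg, positive/negative comments, listener effort)
# would still require a human coder.
KEYWORD_CATEGORIES = {
    "accent": "Accent",
    "chunking": "Chunking",
    "clarity": "Clarity",
    "intelligibility": "Intelligibility",
    "intonation": "Intonation",
    "phoneme": "Phoneme",
    "rhythm": "Rhythm",
}

def tag_comment(comment: str) -> list:
    """Return the keyword-triggered categories mentioned in a VP comment."""
    lowered = comment.lower()
    return sorted({cat for kw, cat in KEYWORD_CATEGORIES.items() if kw in lowered})

tags = tag_comment("Her intonation is flat but the chunking is quite natural.")
# tags == ["Chunking", "Intonation"]
```

In practice, a human coder would still review every comment, since the same feature can be described without its keyword (eg, "pauses in the right place" for chunking).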


Appendix 3: Statistical analysis

Analysis for Table 5: Ease of use of descriptors: Paired-sample t-test values
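A paired-sample t-test of this kind compares each examiner's ease-of-use ratings on two scales, so the statistic is computed from the pairwise differences. The sketch below is illustrative only; the ratings are invented, not the study's data:

```python
import math
import statistics

def paired_t_statistic(before, after):
    """Paired-sample t statistic: t = mean(d) / (sd(d) / sqrt(n)),
    where d are the pairwise differences; degrees of freedom = n - 1."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation of the differences
    return mean_d / (sd_d / math.sqrt(n)), n - 1

# Invented ease-of-use ratings (1 = very easy, 5 = very hard), one pair per examiner
fluency_ease = [2, 3, 2, 4, 3, 2, 3, 3]
pronunciation_ease = [3, 4, 3, 4, 4, 3, 4, 4]
t, df = paired_t_statistic(fluency_ease, pronunciation_ease)
# For these invented data, t == 7.0 with df == 7
```

The resulting t value would then be checked against the t distribution with n - 1 degrees of freedom to obtain a p-value, as in the analysis for Table 5.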
