Unpacking the Relationship Between Formulaic Sequences and Speech Fluency on Elicited Imitation Tasks: Proficiency Level, Sentence Length, and Fluency Dimensions

XUN YAN
University of Illinois at Urbana-Champaign
Urbana, Illinois, United States

Previous research has provided ample evidence for the processing advantage of formulaic sequences, leading researchers and teachers to argue for its facilitation of speech fluency. However, few studies have examined how the processing of formulaic sequences interacts with speaker proficiency and task difficulty; even fewer studies have investigated how formulaic sequences facilitate different dimensions of speech fluency. To address these gaps, this study examined the impact of formulaic sequences on speech fluency for both first and second language speakers (N = 269) across proficiency levels on elicited imitation tasks. Participants’ speech fluency was measured on both rate and pausing features. Results from linear mixed-effects models reveal that formulaic sequences had a significant effect on the reduction of pauses, but not on speech rate. The effect on pausing was stronger in the processing of long sentences and on intermediate second language speakers. These results suggest that formulaic sequences have differential impacts on the rate and pausing dimensions of speech fluency, and the impacts are further conditioned by speaker proficiency and task difficulty. Findings of this study have both theoretical and pedagogical implications regarding the acquisition of formulaic sequences and the development of speech fluency.

doi: 10.1002/tesq.556

As a long-recognized linguistic phenomenon, formulaic sequences have been extensively studied from both cognitive and sociocultural perspectives in the fields of corpus linguistics, psycholinguistics, second language acquisition, and second language (L2) pedagogy (Wray, 2002). The findings from corpus linguistics
and [TESOL QUARTERLY Vol. 0, No. 0, 0000 © 2019 TESOL International Association] psycholinguistics research on first language (L1) speakers suggest that (1) formulaic sequences are ubiquitous in natural language use and (2) formulaic sequences create a processing advantage that alleviates the cognitive pressure of speech production and facilitates the flow of conversation between interlocutors (Christiansen & Chater, 2016; Conklin & Schmitt, 2012). Although a similar processing advantage was observed in advanced-level L2 speakers (Conklin & Schmitt, 2008; Jiang & Nekrasova, 2007), scholars remain dubious about whether it applies equally across speakers of different proficiency levels (Northbrook & Conklin, 2019; Wray, 2017).

In L2 teaching and assessment research, formulaic sequences have also gained popularity as a target construct in the development of fluency and overall language proficiency. With focused instruction, L2 speakers tend to use more formulaic sequences, and their speech fluency improves over time (Boers, Eyckmans, Kappel, Stengers, & Demecheleer, 2006; Serrano, Stengers, & Housen, 2015; Wood, 2009). However, in those studies, fluency tended to be measured through rate features (e.g., speech rate, mean length of run), but less often through pausing features (although Wood [2006, 2009] incorporated pausing when computing mean length of run). Little is known about how the processing advantage of formulaic sequences manifests through different dimensions of speech fluency. Motivated by these gaps in the literature, this study examined whether formulaic sequences present a similar processing advantage on the rate and pausing dimensions of fluency and whether the processing advantage applies equally to L2 speakers across proficiency levels.

BACKGROUND AND MOTIVATION

Processing Advantage of Formulaic Sequences: Evidence From L1 and L2 Speakers

Formulaic language has been referred to using different terms: formulae, formulaic sequences, prefabricated patterns,
multiword units or sequences, to mention a few (see reviews by Arnon & Christiansen, 2017; Wray, 2002). It also covers a broad range of phenomena such as idioms, collocations, phrasal verbs, binomials, and lexical bundles (Carrol & Conklin, 2019). This article focuses on what usually are referred to as lexical bundles, which are sequences of words that occur frequently in language, are generally corpus derived, and may or may not constitute a complete utterance. However, these bundles have also been referred to as formulaic sequences or units in the literature. For convenience, I use the umbrella term formulaic sequences to describe this linguistic phenomenon in the remainder of this article.

The prevalence of formulaic sequences in natural language leads scholars to believe that formulaic sequences are stored holistically in the speaker’s mental lexicon and thus are retrieved and processed more efficiently than noncollocated word sequences (Peters, 1983). The processing advantage is warranted by three assumptions: (1) limited cognitive resources for online processing (e.g., attention, working memory) encourage chunking of linguistic input, (2) the processing of high-frequency linguistic items is more automatic than that of low-frequency items, and (3) grammar and the lexicon are not completely separable. L1 speakers can internalize a bank of formulaic sequences, and their language production is marked by a flexible alternation between linguistic creativity and chunking (Ellis, 2002; Nattinger, 1980) or, in Sinclair’s (1987) terms, between the application of the open choice principle (words and rules) and the idiom principle (chunking). That said, it should be noted that despite the widespread belief in the processing advantage of formulaic sequences, there has not been concrete evidence to support the holistic representation of formulaic sequences in the brain; however, recent event-related potential research found the activation of a mental “template”
in the brain when speakers process formulaic sequences versus nonformulaic sequences (Siyanova-Chanturia, Conklin, Caffarra, Kaan, & van Heuven, 2017, p. 111).

There has been an abundance of psycholinguistic evidence for the processing advantage of formulaic sequences from L1 speakers. Previous research utilized self-paced reading tasks, elicited imitation tasks, and eye-tracking techniques to measure formulaic sequence processing both in isolation and embedded in texts. Findings of these studies reveal that L1 speakers, both children and adults, process formulaic sequences faster and more accurately than nonformulaic sequences in both comprehension and production modes (Bannard & Matthews, 2008; Conklin & Schmitt, 2008; Siyanova-Chanturia, Conklin, & Schmitt, 2011; Siyanova-Chanturia, Conklin, & van Heuven, 2011; Sosa & MacFarlane, 2002; Tremblay, Derwing, Libben, & Westbury, 2011; Underwood, Schmitt, & Galpin, 2004).

Studies have also revealed a processing advantage of formulaic sequences among advanced and intermediate L2 speakers (Conklin & Schmitt, 2008; Jiang & Nekrasova, 2007; Yeldham, 2018). Findings of these studies seem to indicate a close relationship between knowledge of formulaic sequences and proficiency level. That is, an internalized knowledge of formulaic sequences is more likely to exist among advanced and intermediate L2 speakers than among low-proficiency speakers (Serrano et al., 2015; Siyanova-Chanturia, Conklin, & Schmitt, 2011; Valsecchi et al., 2014; although see Northbrook & Conklin, 2019, for an exception). However, when accounting for the processing advantage of formulaic sequences, it might also be the case that proficiency is confounded with exposure or familiarity. Several studies have shown that when (low-proficiency) L2 speakers are familiar with the formulaic sequences (e.g., a similar expression existent in their L1 or a lexical bundle appearing in their textbooks), they demonstrate the typical
processing advantage seen in L1 speakers (Carrol & Conklin, 2014, 2017; Northbrook & Conklin, 2019). These findings align with predictions of a usage-based approach to L2 acquisition, in which exposure influences memory and processing of linguistic forms. In this approach, proficiency often serves as a proxy for exposure, such that higher proficiency participants have had sufficient exposure to develop a processing advantage, whereas low-proficiency participants have not had enough exposure to as many formulaic sequences. In either account, it remains underexplored whether the processing advantage of formulaic sequences applies equally between intermediate and advanced L2 speakers.

To complicate the matter further, research also suggests that the processing advantage for L2 speakers is conditioned by not only the speaker’s proficiency level but also the difficulty level of language tasks used to elicit the processing of formulaic sequences (Yeldham, 2018). On one hand, the processing advantage might exist only after a threshold proficiency level is reached or vary in degree according to proficiency level. On the other hand, the processing advantage might be needed or triggered only when the language tasks are sufficiently challenging. Therefore, the processing advantage of formulaic sequences for L2 speakers remains an open question.

Formulaic Sequences and the Development of L2 Speech Fluency

Speech fluency is a multifaceted, multidimensional construct. Lennon (1990) defined fluency in both broad and narrow senses. In the broad sense, fluency is synonymous with overall language proficiency; in the narrow sense, fluency refers to the temporal features of speech in two large dimensions, namely, rapidity (e.g., a high speech rate) and smoothness (e.g., the lack of disfluency). Skehan (2003) further divides the smoothness dimension into breakdown fluency (pausing) and repair fluency, although pause and repair are closely associated cognitive traces of speech disfluency. Findings from
empirical research have repeatedly confirmed that, although rapidity and smoothness constitute distinct constructs in L2 learning, teaching, and assessment, both dimensions have been observed to be strong predictors of speakers’ language proficiency (Ginther, Dimova, & Yang, 2010; Iwashita, Brown, McNamara, & O’Hagan, 2008).

The abundance of psycholinguistic evidence has led researchers and practitioners in L2 pedagogy and assessment to associate the acquisition of formulaic sequences with the development of L2 speech fluency. Theoretically, this is a reasonable speculation in that nativelike proficiency or fluency is marked by a high proportion of chunking and formulaic language use (Pawley & Syder, 1983). From a pedagogical standpoint, formulaic sequences can be used as a strategy to help L2 speakers alleviate the processing load and disfluency in speech production. Following this line of reasoning, focused instruction of formulaic sequences should also facilitate the development of L2 speech fluency. That said, it is important to note that the psycholinguistic studies reviewed above only examined the processing advantage as the dependent variable, but not speech fluency (e.g., speech rate, pausing).

The association between formulaic sequences and L2 fluency can be supported by a cognitive perspective. Segalowitz (2010) identified three domains of L2 speech fluency: cognitive, utterance, and perceived. The relationship among the three domains is that cognitive fluency results in observable utterance fluency of the speaker, which further leads to the listener’s perception of fluency. Based on Levelt’s (1989) blueprint for the speaker, Segalowitz speculated that disfluencies can occur at seven different stages of speech production related to the processing of lexico-grammar: microplanning, grammatical encoding, lemma retrieval, morpho-phonological encoding, phonetic encoding, articulation, and self-perception. The ability to retrieve and produce
grammatical and lexical items effortlessly during speech production results in a higher speech rate and the perception of higher fluency from listeners. Findings from empirical studies are overall in support of this association (de Jong, 2016; Segalowitz, 2016; Wright & Tavakoli, 2016); however, pedagogically, a deeper understanding of the link between the processing advantage of formulaic sequences and the resultant changes in utterance fluency features can help language teachers and testers better assess and monitor the acquisition of formulaic sequences by language learners.

Studies have shown a significant effect of focused instruction on the use of formulaic sequences in speech (Wood, 2009) as well as a positive correlation between the use of formulaic sequences and rate features of speech fluency (Boers et al., 2006; Ejzenberg, 2000; Towell, Hawkins, & Bazergui, 1996). To some extent, these findings support the commonly held pedagogical assumption that the use of formulaic sequences facilitates the development of L2 speech fluency. Formulaic sequences are articulated faster than nonformulaic sequences as words within a formulaic sequence tend to be phonetically reduced due to holistic processing (Arnon & Cohen Priva, 2013; Bybee & Scheibman, 1999). In this connection, it can be hypothesized that the perception of higher speech fluency is a result of phonetic reduction on the formulaic sequences, which leads to a higher speech rate. However, there seems to be another plausible explanation, that is, the holistic processing of formulaic sequences alleviates the cognitive processing load and thus reduces disfluencies in speech. To further test these two hypotheses, the processing of formulaic sequences needs to be embedded in individual sentences and be examined in terms of its impact on both the rate and pausing features. Previous research examined the processing of formulaic sequences in its own right, but whether formulaic sequences
lead to faster speech rate at the sentence level remains largely underexplored. Moreover, little is known about the relationship between formulaic sequences and the pausing dimension of fluency; investigations of this relationship can help us better understand the processing advantage of formulaic sequences and its manifestations in speech fluency.

The Processing of Formulaic Sequences on Elicited Imitation Tasks

Speech fluency is influenced by not only the proficiency level of the speaker but also the difficulty of the speaking task (Robinson, 2001; Skehan, 1998). Therefore, in addition to speaker proficiency, it is reasonable to posit that task difficulty also influences the processing of formulaic sequences; Sinclair (1987) contended that, to facilitate communicative efficiency, L1 speakers prefer formulating utterances from chunks and will break formulaic sequences down to the individual word level only when necessary. Although both open-ended and more controlled speaking tasks have been used to examine speech fluency, the difficulty of controlled speaking tasks is relatively easier to manipulate. This study employed elicited imitation, a psycholinguistic task used to measure L2 proficiency that asks participants to listen to and repeat a series of sentences (van Moere, 2012; Yan, Maeda, Lv, & Ginther, 2016). Elicited imitation presents a clear advantage over open-ended speech tasks for measuring the processing of formulaic sequences. That is, researchers can construct the stimulus sentence using a preselected set of formulaic sequences. Asking the participants to repeat the sentences verbatim forces them to use the formulaic sequences. If the participants are familiar with the formulaic sequences, it is likely to be easier for them to repeat the sentences. Meanwhile, the task also elicits speech production, allowing for the analysis of speech fluency.

In summary, decades of research on formulaic sequences reveals converging evidence for its processing
advantage and impact on speech fluency. However, few studies have investigated how formulaic sequences interact with speaker’s proficiency level and task difficulty and how they influence different dimensions of speech fluency. Motivated by these gaps, this study investigated (1) whether formulaic sequences create an equal processing advantage for L1 and L2 speakers across proficiency levels when produced in sentences of varying lengths and, if so, (2) how the processing of formulaic sequences influences rate and pausing dimensions of speech fluency. In order to gauge the facilitation effect of formulaic sequences while controlling other variables affecting sentence processing and production, this quasi-experimental study utilized elicited imitation tasks to examine the processing of formulaic sequences within individual sentences.

RESEARCH QUESTIONS

This study employed a quasi-experimental design to examine the processing of formulaic sequences on elicited imitation tasks by both L1 and L2 speakers. Two research questions (RQs) were formulated to guide this study, with each question consisting of four subquestions:

RQ1: How does the presence of formulaic sequences influence speech rate?
Does the presence of formulaic sequences have a significant effect on speech rate?
Does sentence length have a significant effect on speech rate?
Does proficiency level have a significant effect on speech rate?
Is there any significant interaction among the three independent variables?

RQ2: How does the presence of formulaic sequences influence the number of silent pauses?
Does the presence of formulaic sequences have a significant effect on the number of silent pauses?
Does sentence length have a significant effect on the number of silent pauses?
Does proficiency level have a significant effect on the number of silent pauses?
Is there any significant interaction among the three independent variables?
METHOD

Participants

Participants in this study were 17 L1 speakers of English and 252 English as a second language (ESL) freshmen enrolled in an English for academic purposes course at a large U.S. university. The L1 speakers of English were all native speakers of North American English, recruited as a baseline control group. Descriptive statistics of the ESL students’ TOEFL iBT scores are included in Table [...]. Following the score interpretation guidelines by ETS and the Common European Framework of Reference (CEFR), the students’ proficiency levels were labeled as high intermediate (n = 89) to advanced (n = 163), largely corresponding to the B2–C1 levels (Council of Europe, 2011; Papageorgiou, Tannenbaum, Bridgeman, & Cho, 2015). The intermediate and advanced ESL students differed more in their receptive subskills (i.e., listening and reading) than in their productive subskills (i.e., speaking and writing). This restricted range of proficiency might have an impact on the interpretation and generalizability of the results. However, this group of students is representative of the ESL population in U.S. universities. Though not as proficient as their L1 peers, these students are not low-proficiency L2 speakers and can be reasonably expected to have developed knowledge of high-frequency formulaic sequences. The overwhelming majority of the participants were L1 Mandarin Chinese speakers (n = 236), with only a few participants from other L1 backgrounds: Korean (n = 7), Hindi (n = 3), Portuguese (n = 2), Spanish (n = 2), Italian (n = 1), and Vietnamese (n = 1). The age of the participants ranged between 18 and 25 years.

TABLE [...]
Descriptive Statistics of L2 Participants’ TOEFL iBT Scores

            Intermediate              Advanced
            N     M       SD          N      M       SD
Reading     89    22.55   2.18        163    25.83   2.58
Listening   89    21.52   2.43        163    24.76   2.75
Speaking    89    20.08   1.63        163    21.30   2.28
Writing     89    22.06   2.00        163    23.44   2.58
Total       89    85.95   3.08        163    95.19   4.91

Identification of Formulaic Sequences

This study targeted a list of high-frequency formulaic sequences identified in a corpus-driven approach. Formulaic sequences were defined in terms of frequency and mutual information (MI) score. Specifically, a list of commonly used spoken formulaic sequences was selected from the academic spoken formula lists in Simpson-Vlach and Ellis (2010) and Schmitt (2004). The formulaic sequences were then cross-validated with the Corpus of Contemporary American English for frequency and MI score, and all of them had a frequency higher than 10 occurrences per million words and an MI score larger than [...]. The overwhelming majority of the bundles occurred more than 20 times per million words, a suggested frequency cutoff for large corpora (Biber, Conrad, & Cortes, 2004). The mean occurrence of these sequences was around 50 times per million words across forms. All formulaic sequences were three to four words long. Although it is customary to select four-word lexical bundles in research (Biber et al., 2004), a number of three-word bundles are complete by themselves (e.g., all sorts of, as well as). Therefore, in this study, both three- and four-word bundles were selected. Sequences with figurative meaning were excluded because the processing of those sequences is not tuned to frequency the same way literal formulaic sequences are (Siyanova-Chanturia & Martinez, 2015).

When selecting formulaic sequences, the frequency criterion was prioritized because word frequency influences the fluency and disfluency in lexical production of both L1 and L2 speakers (de Jong, 2016). However, the selection also aimed for a balance of formulaic sequences across three functional types (i.e., referential expressions, stance expressions, and discourse organizers) and syntactic categories (i.e., verb phrase fragments, dependent clause fragments, noun and prepositional phrase fragments; Biber et al., 2004; Simpson-Vlach & Ellis, 2010). Similarly, the selection aimed for a balance in the position at which formulaic sequences occurred in the sentences (i.e.,
toward the beginning, middle, or end of the sentence). The list of target formulaic sequences is presented in the Appendix. The selected formulaic sequences were piloted on 20 ESL students (10 intermediate and 10 advanced). All of them were able to understand the meaning of the sequences and reported using them frequently in speech or conversation.

Elicited Imitation Task

The formulaic and nonformulaic sequences were embedded in elicited imitation tasks. The design of the tasks followed recommendations from previous literature, including a range of sentence length levels and the insertion of a delay between the stimulus and repetition, as well as the use of noncognitively challenging disruption tasks to prevent parroting and force the participants to process the meaning of the sentences (Vinther, 2002; Yan et al., 2016). Four parallel forms were developed, each comprising 24 sentences. In each set of 24 sentences, only 12 sentences contained formulaic sequences. These two conditions were also evenly distributed across three length bands. All the sentences were authentic such that they either conveyed important information about college life or were sentences that college students frequently hear or say on campus. The topics covered in those sentences included health insurance, academic coursework, student clubs and activities, lifestyle on campus, and lifestyle in the midwestern United States.

The elicited imitation tasks were administered in a computer lab. Participants were randomly assigned to one of the four forms. The tasks were timed: Participants had 20 seconds to repeat each sentence. During the test session, each participant had the opportunity to complete two practice items before hearing the 24 target sentences.

Data Analysis

The present quasi-experimental study employed a mixed-effects, repeated-measures design, with three independent variables and two dependent variables.

Independent variables. The three independent variables
were the presence of formulaic sequences (formulaicity), proficiency, and task difficulty. The three independent variables were fully crossed. Formulaicity was a two-level within-subject factor (i.e., with vs. without formulaic sequences). Although L2 participants’ proficiency levels were considered to range from L2 intermediate to L2 advanced, proficiency was operationalized as the participants’ TOEFL total score¹ and treated as a continuous variable in the statistical model; however, for ease of interpretation, descriptive statistics of fluency features were also reported by proficiency level. Task difficulty of [...]

¹ The TOEFL scores were the official scores the students submitted for university enrollment, and all the participants were in their first semester at the time of the study.

[...] articulation rate because speech rate tends to be a stronger predictor of fluency and overall proficiency whereas articulation rate is relatively more stable across speakers (Miller, Grosjean, & Lomanto, 1984). Number of silent pauses was selected instead of pause duration and filler pauses because it better reflects cognitive fluency and language proficiency (de Jong & Bosker, 2013). To ensure the reliability of automated fluency measures, 10% of the data were manually analyzed, and the Pearson correlation between the two methods was .84 for speech rate and .89 for number of silent pauses. To account for the durational variation of individual elicited imitation responses, the silent pause count was normalized to number of silent pauses per second by dividing the number of silent pauses by the duration of the response.

Statistical analysis. A total of 6,456 elicited imitation responses were recorded from the 269 participants. Three responses were excluded from the analysis due to poor recording quality, resulting in a total of 6,453 speech samples. To address the research questions, the descriptive statistics of fluency features were first examined. Then, the
lme4 package (Bates, Maechler, & Bolker, 2012) in R (R Core Team, 2012) was used to run linear mixed-effects models to analyze the impact of the independent variables on speech rate and number of silent pauses. In these models, formulaicity, sentence length, and proficiency level were entered as fixed effects, along with random intercepts for participants and items. Prior to the statistical analyses, screening procedures were performed to check the normality of the data. No obvious deviations from homoscedasticity or normality were found for either dependent variable. Because participants were randomly assigned to one of the four forms, the mixed-effects models also included form as a fixed effect to examine whether speech rate and number of silent pauses of the participants’ task responses differed substantially across forms. No significant cross-form difference was observed for either speech rate (F(3, 226) = 1.05, p = .37) or number of silent pauses (F(3, 226) = 0.85, p = .47), suggesting that the form effect was negligible.

RESULTS

RQ1: What Are the Main and Interaction Effects of Formulaic Sequences, Sentence Length, and Proficiency Level on Speech Rate?
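As a reading aid for the two fluency measures reported in this section, the sketch below shows one way they could be computed for a single response. It is an illustration only and not the automated procedure used in the study. The pause normalization (pause count divided by response duration) follows the description under Data Analysis; the definition of speech rate as syllables divided by total response duration is an assumption, and the class, field, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """One elicited imitation response (hypothetical field names)."""
    n_syllables: int       # syllables produced in the response
    n_silent_pauses: int   # number of silent pauses in the response
    duration_s: float      # total response duration in seconds


def speech_rate(r: Response) -> float:
    """Syllables per second over the whole response (assumed definition)."""
    if r.duration_s <= 0:
        raise ValueError("duration must be positive")
    return r.n_syllables / r.duration_s


def pauses_per_second(r: Response) -> float:
    """Silent pause count normalized by response duration."""
    if r.duration_s <= 0:
        raise ValueError("duration must be positive")
    return r.n_silent_pauses / r.duration_s


# Made-up example values, not data from the study.
example = Response(n_syllables=20, n_silent_pauses=2, duration_s=5.0)
print(speech_rate(example))        # 4.0
print(pauses_per_second(example))  # 0.4
```

Normalizing by duration keeps the pause measure comparable across responses of different lengths, which matters here because the sentences fall into three length bands.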
Descriptive statistics for both speech rate and number of silent pauses across conditions are summarized in Table [...].

TABLE [...]
Descriptive Statistics of Speech Rate and Number of Pauses per Sentence

                                                                 Speech rate     Number of pauses
Proficiency level   Formulaic sequence   Sentence length   N     M      SD       M      SD
L1                  FS                   Short             17    4.48   0.91     0.38   0.55
                                         Medium            17    4.56   0.68     0.37   0.60
                                         Long              17    4.85   0.58     0.57   0.70
                    NoFS                 Short             17    4.39   0.63     0.41   0.53
                                         Medium            17    4.75   0.88     0.38   0.57
                                         Long              17    4.61   0.65     0.71   0.75
L2 advanced         FS                   Short             163   3.98   0.90     0.55   0.86
                                         Medium            163   4.06   0.70     1.14   1.16
                                         Long              163   4.07   0.69     1.47   1.38
                    NoFS                 Short             163   3.88   0.74     0.55   0.71
                                         Medium            163   4.09   0.68     1.28   1.30
                                         Long              163   3.99   0.69     1.72   1.45
L2 intermediate     FS                   Short             89    3.93   0.88     0.70   1.02
                                         Medium            89    3.96   0.59     1.28   1.21
                                         Long              89    3.90   0.65     1.85   1.52
                    NoFS                 Short             89    3.81   0.79     0.71   0.93
                                         Medium            89    3.90   0.67     1.56   1.47
                                         Long              89    3.91   0.72     2.29   1.75

FS = sentence containing a formulaic sequence; NoFS = sentence containing no formulaic sequence.

Tables [...] and [...] present the summary statistics of the main and interaction effects of formulaicity, sentence length, and proficiency level on speech rate. The omnibus tests revealed significant main effects for sentence length (F(2, 257) = 5.323, p = .005) and proficiency level (F(1, 268) = 25.484, p < .001). The beta coefficients for the contrasts in Table [...] (the fourth column) show that participants’ average speech rate was 0.001 and 0.172 syllables per second faster on the long sentences than on the medium and short sentences, respectively (bL-S = 0.172, SE = 0.090, t = 1.910, p = .057; bL-M = 0.001, SE = 0.090, t = 0.012, p = .990). This shows that although speech rate is comparable between long and medium sentences, it is slower on short sentences. The slower speech rate on shorter sentences was likely an artifact of timed tasks: shorter sentences were easier to repeat, and the participants had less pressure to rush through the repetition within the time limit. In terms of proficiency level, as expected, proficiency showed a positive relationship with speech rate (bProficiency = 0.314, SE = 0.053, t = 5.911, p < .001), suggesting that higher proficiency speakers repeated the sentences at a faster speech rate. In contrast, the main effect of formulaicity did not reach statistical significance (F(1, 1285) = 0.739, p = .390), suggesting that the presence of formulaic sequences does not

TABLE [...]
Significance Tests of Main and Interaction Effects on Speech Rate

Sum Sq:   0.281  4.050  9.694  0.377  0.002  9.600  2.005
Mean Sq:  0.281  2.025  9.694  0.189  0.002  4.800  1.002
df1:      1  2  1  2  1  2  2
df2:      1285  257  268  1285  6157  6137  6157
F:        0.739  5.323  25.484  0.496  0.006  12.618  2.635
p:        .390  .005