Prosodic transfer in Vietnamese acquisition of English contrastive stress patterns

ARTICLE IN PRESS
Journal of Phonetics 36 (2008) 158–190
www.elsevier.com/locate/phonetics

T. Anh-Thu Nguyễn*, C.L. John Ingram, J. Rob Pensalfini

School of English, Media Studies, and Art History, University of Queensland, St Lucia, Qld 4072, Australia

Received 15 June 2004; received in revised form August 2007; accepted September 2007

Abstract

This paper reports a study of prosodic transfer effects in the production and perception of three English stress patterns (broad-focus noun phrase, narrow-focus noun phrase and compound) at the level of word and phrase prosody by Vietnamese learners of English. The experiments examined the acoustic features and the perceptual strategies that native Australian English speakers and different groups of non-native speakers (Vietnamese beginning learners and advanced speakers of English) use to distinguish the three stress patterns. The results showed that native and non-native speakers realize the three English stress patterns with different acoustic patterns, each optimally suited to their respective first-language phonologies. Native speakers of English used a combination of syntagmatic f0 (and correlated intensity) contrasts and duration to distinguish the three stress patterns. Vietnamese speakers had no problem manipulating contrastive levels of f0 and intensity on accent-bearing syllables. However, they failed to realize two contrasts: the timing contrast between compound words and phrases, and the syntagmatic contrast of accent in larger units such as polysyllabic words or phrases. This is shown by their failure to deaccent the second element of the compound and narrow-focus patterns. Nevertheless, the advanced speakers’ ability to compress the constituents of the compounds and to deaccent the final nouns shows the effect of language learning/experience on prosodic acquisition. Possible mechanisms that underlie the transfer effects
involved in the three stress patterns are also discussed.

Crown Copyright © 2007 Published by Elsevier Ltd. All rights reserved.

1. Introduction

A wealth of studies has shown that speakers perceive and produce L2 words and utterances through a phonetic or phonological ‘filter’ of their native language (L1). These studies draw on loanword formation (Blair & Ingram, 2003; LaCharité & Paradis, 2005; Silverman, 1992, among others) and on phonetic and phonological accommodation in second language (L2) learning (Archibald, 1998; Best, 1995; Flege, 1995; Iverson et al., 2003; Kuhl, 1993, among others). Studies of segmental feature transfer effects dominate the literature. Experimental studies of prosodic accommodation to a second language remain scarce, though they have begun to appear in recent years (McGory, 1997; Nguyễn & Ingram, 2005; Ueyama, 2000; Ueyama & Jun, 1998).

*Corresponding author. E-mail addresses: nguyenthianhthu@email.com (T.A.-T. Nguyễn), j.ingram@uq.edu.au (C.L.J. Ingram), r.pensalfini@uq.edu.au (J.R. Pensalfini).
0095-4470/$ - see front matter. doi:10.1016/j.wocn.2007.09.001

Current models of transfer effects (Best, 1995; Flege, 1995; Kuhl, 1993) have been formulated almost entirely on the basis of studies of segmental contrasts. All models acknowledge the importance of prior phonological learning, in the form of L1-imposed categorical boundaries on otherwise gradable phonetic dimensions. These models limit themselves to segmental transfer effects, where sound categories at a single level of phonological contrast map onto those of another language. As a result, they avoid a more awkward and interesting question: how phonetic similarities interact with structural differences of the kind inevitably encountered in even the simplest cases of prosodic contact.

This paper reports a study of
prosodic transfer effects in the production and perception of three contrastive English stress patterns by Vietnamese learners of English. These contrasts are framed as sets of ‘minimal triplets’. The linguistic function of each member is disambiguated by a preceding context sentence, for example:

      Context                                     Target                Type
(a)   This is a bottle which is colored blue.     It’s a blue bottle.   Broad-focus NP
(b)   This bottle isn’t colored yellow.           It’s a blue bottle.   Narrow-focus NP
(c)   This kind of jelly-fish is common here.     It’s a blue bottle.   Compound word

Our original motivation was simply to fill a gap in the literature by studying prosodic transfer effects across two languages with sharply different prosodic systems. The design used a set of stimuli with minimal confounding from segmental transfer effects. It also included two groups of second language learners (beginners and advanced), which might give some purchase on how learnable the phonetic features are that the target contrasts require in perception and production.

From a strictly phonetic perspective, the three English ‘stress’ patterns may be viewed as contrasting patterns of pitch or accentual prominence, plus a temporal factor that distinguishes the compound from the phrase. Prominence falls on the right edge for the broad-focus noun phrase [B] and on the left edge for the narrow-focus noun phrase [N] or compound [C]. We demonstrate this in the first part of the data analysis. Two orthogonal linear discriminant functions successfully classify spoken English native-speaker tokens into the three stress groupings. The functions are based on differences in f0 across the vowel nuclei and on normalized word or phrase durations. Thus, a linear phonetic feature detector, trained to recognize category boundaries on the relevant pitch and timing dimensions, may be all that is required to discriminate/identify the three target stress types. However, the finding that native speakers of Australian English performed
significantly worse in the perceptual experiment than a simple linear discriminator gave pause for reflection. The discriminator was supplied only with the critical acoustic measurements of nuclear f0 peak differences and syllable duration. A two-parameter feature detector may therefore be seriously flawed as a perceptual model of discrimination between the three stress patterns.

A phonological consideration further calls the simple parametric model into question. The broad-focus, narrow-focus and compound contrasts span two distinct domains of contrastive feature assignment. Compounds belong to the lexicon, while the broad- and narrow-focus NPs belong to the post-lexical domain of phrasal accent assignment. Judging or producing acceptable tokens of the three stress patterns is therefore likely to require simultaneous access to two autonomous aspects of prosodic competence. The first is phrase-level control of intonational focus marking. The second is control over the lexical prosody that distinguishes compounds as words from their otherwise homophonous phrasal counterparts. It will be argued that this simultaneous dual-level access to prosodic forms complicates the listener’s task in accurately perceiving the three stress patterns.

From the perspective of the second language learner, any phonological transfer effects from Vietnamese to English prosody will likely involve two things: the lexical prosody of Vietnamese compounds and intonation-based accent assignment. How lexical stress interacts with phrasal accent assignment is a central question in attempts to integrate autosegmental theory (from phonology) with phonetic investigations of suprasegmental features in speech production (Beckman & Ayers, 1994; Beckman & Pierrehumbert, 1986; Fletcher & Harrington, 2001; among others). Section 2 of this paper reviews the phonetics and phonology of Vietnamese compounds
and word prosody. The aim is to predict the prosodic interference effects that an autosegmental model of lexical stress and accent assignment would lead us to expect.

Notwithstanding its shortcomings, the two-factor phonetic model proves worthy of closer scrutiny when it is supplemented by a distinction between ‘active’ and ‘non-active’ prosody control parameters derived from the L1 (Ueyama, 2000). In that form, it may predict transfer effects in how Vietnamese learners of English discriminate and produce the three stress patterns. Our strategy is to push the predictions of this admittedly over-simple phonetic model of prosodic interference as far as they will go. This reveals how well it predicts learners’ perceptual responses and production characteristics, and the perceptual responses of native listeners, and where it fails. Our analysis of the experimental findings emphasizes the linguistic information-processing demands imposed by the perception and production tasks. It also considers the strategies that native and non-native speaker–listeners probably adopted to meet these demands.

The organization of the paper is as follows. Section 2 begins with a review of recent phonetic studies of prosodic transfer effects. It focusses on contact between languages that differ in terms of the familiar typology of ‘tone’, ‘stress’, and ‘pitch accent’ languages. Relevant background on Vietnamese word- and phrase-level prosody is then discussed:

(a) the lexical tonal system of Vietnamese, and tonal transfer effects in the production and perception of English word stress;
(b) a comparative phonetic analysis of the compound–phrasal stress contrast in English and Vietnamese.

Section 3 describes the experiments that were conducted. Section 3.1 describes the acoustic properties of the stimulus materials that were subsequently
used in the perceptual experiment and as training material for the production experiment. A range of acoustic correlates of stress was tested. Of these, we first show that just two critical acoustic parameters, f0 and normalized duration, are enough to discriminate the broad-focus (B), narrow-focus (N) and compound (C) categories. Next (Section 3.2), we investigate how well the critical f0 and timing parameters are preserved in the non-native speakers’ productions of the target English stress patterns. These productions were elicited as appropriate continuations of the context sentence. Discriminant analysis and other statistical tests indicate that Vietnamese subjects respond primarily to pitch cues. However, the advanced learner group’s responses show some acquired sensitivity to the durational contrast between compound and phrasal stress. Section 3.3 reports a perceptual discrimination experiment. Native Australian English listeners and Vietnamese learners were asked to identify the appropriate stress pattern (presented as auditory stimuli in the carrier phrase It’s a _.)
for a given context.

A later section addresses apparent discrepancies between the results of the production and perception tests. It explains them in terms of the different task demands placed on native Australian English listeners and on Vietnamese learners. We argue that beginning learners’ responses are dominated by a tonal transfer strategy. In this strategy, Vietnamese lexical tones are substituted for English accentual and boundary tones on the basis of similarity in tone ‘shape’. The substitution is constrained by Vietnamese phonotactics on the distribution of lexical tones. In contrast, the advanced learner group shows some accommodation to the temporal cues that contrast (compound) word and phrase phonology in English. This group also shows evidence of deaccenting the rightmost components of the narrow-focus phrases under contrastive focus at the level of phrasal prosody. A qualitative ToBI-style analysis of pitch accent types and their distribution in the native English and learner tokens strengthens the case for these distinct learner strategies. Some tentative conclusions are then offered in the final section.

2. Prosodic transfer effects

A few studies have closely examined the phonetic properties of L2 prosody produced by learners from various language backgrounds. They show how L1 phonology constrains the production and perception of L2 prosodic patterns. Willems (1982) investigated intonational deviations in the English produced by native speakers of Dutch. He found that L2 productions of English deviate from the native British English norm mainly in the size and direction of pitch movements. These deviations could be clearly attributed to the transfer of intonational characteristics from Dutch. Willems showed this by instrumentally comparing three sets of utterances: English produced by monolingual English speakers, English produced by Dutch learners, and comparable Dutch produced by functionally monolingual Dutch speakers.

A similar transfer effect from L1 intonation was found for Seoul Korean and Mandarin Chinese speakers of English. McGory (1997) investigated how these speakers produced American English word pairs differing in the location of stress (e.g., memorizes vs. memorial). The words were elicited in statements and questions and in several focus conditions. Both groups of non-native speakers appeared to have difficulty producing native English prominence relations. Native English speakers produced pitch accents in prominent target words only. Non-native speakers, by contrast, produced stressed syllables with higher f0 values in both prominent and less prominent words. In addition, the non-native speakers did not distinguish between statements and questions in their f0 patterns. The differences in intonation between non-native and native speakers of English could be attributed, to a large extent, to L1 influence. This was clearly shown by the different error patterns produced by the two L1 groups, Mandarin and Korean.

In a similar study, Ueyama and Jun (1998) examined how Tokyo Japanese and Seoul Korean speakers at different proficiency levels realized English post-focus deaccentuation. At the same level of proficiency, post-focus deaccentuation was easier for Japanese than for Korean speakers, which was attributed to an L1 transfer effect. Japanese has downstep: when an accented word follows another accented word, its pitch accent has a lower f0 peak than the preceding one. This appears to have been positively transferred to L2 production. Korean lacks downstep (its phrases are marked by a phrase-final H tone), and this appears to have been negatively transferred. Ueyama and Jun also found that the degree of deaccentuation correlated with proficiency level: the more fluent the speaker, the greater the degree of dephrasing.

The above studies are restricted to the
deviation of f0 patterns (tonal shapes) in L2 intonation production. Studies that jointly examine transfer effects on L2 temporal and accentual structure are even rarer, but are now beginning to appear (see Mennen, 2004). Mennen (2004) studied accent peak alignment in the speech of Dutch non-native speakers of Greek. She found bidirectional interference in the realization of an accent contrast common to both languages. Most of the L2 learners in her study (four out of five) failed to produce native-like f0 peak alignment values in the L2. This was observed in statements whose test word had a long vowel in the accented syllable. The learners placed the peak as early as the native Dutch control group did, within the accented vowel. That is considerably earlier than the native Greek control group, who placed the peak in the following unaccented vowel.

Ueyama (2000) and Nguyễn and Ingram (2005) are two recent studies of the acoustic correlates of L2 word-level prosody. Ueyama (2000) studied Japanese learners’ production of English words. She found that the active role of f0 in the Japanese pitch accent system transfers positively to English: learners produced consistently higher f0 in accented than in unaccented syllables. Japanese does have a phonemic vowel length contrast. Even so, beginning Japanese speakers of English were less successful than advanced speakers at realizing duration contrasts between accented and unaccented syllables. The reason is that Japanese word accent relies on f0 contrasts only; duration is not actively manipulated. Nguyễn and Ingram (2005) is a companion study to the present paper and used the same subjects (Vietnamese speakers of English at two levels of proficiency). It investigated a different L2 prosodic contrast: English word stress in segmentally homophonous noun/verb pairs (e.g., permit[n] vs. permit[v]). Nguyễn and Ingram (2005)
found that Vietnamese learners can differentiate stressed from unstressed syllables in English by means of f0 contrast, an acoustic correlate available in both languages. In the early stage of second language learning, however, learners fail to produce the syllable duration contrast that characterizes native productions. They also fail to reduce vowels in unstressed syllables, possibly because these two phonetic features are not active in Vietnamese tonal contrasts.

In brief, both studies of the acoustic correlates of L2 prosody point the same way. L2 learners have less difficulty with an acoustic correlate that is active for prosodic contrasts in both the native and the target language. An example is f0, which serves Japanese pitch accent and Vietnamese lexical tone as well as English word stress and accent. Learners have more difficulty with correlates that are not active in the L1, such as stress-induced duration contrasts and vowel reduction for Japanese and Vietnamese learners. Nevertheless, these two studies are restricted to the acoustic correlates of L2 word-level prosody. The present study goes a step further and investigates adaptation to the contrasting temporal and accentual structure of (compound) word vs. phrase prosody. It uses both quantitative acoustic measurement and qualitative tonal transcription (ToBI: Beckman & Ayers, 1994).

2.1. Comparative analysis of English word stress vs. Vietnamese tones

Vietnamese, a tone language, and English, a stress accent language, have quite different systems of word prosody. English has a system of culminative word stress with predominantly short stressed roots and reduced suffixes. As a result, most English words have stressed first syllables (Dauer, 1983; Garde, 1965). Stressed syllables also have more complex structures, whereas unstressed syllables often have reduced vowels. Vietnamese, on the other hand, has a system of lexically distinctive tones (Nguyễn, 1970, 1980) and is strongly
syllabic in its phonological organization and morphology. Most syllables are independent morphemes. Every syllable in an utterance bears its own lexical tone, which is not neutralized (i.e., does not become toneless) in context. In addition, no system of culminative word stress has been found. It is nevertheless widely accepted that Vietnamese has stress in the sense of accentual prominence at the phrasal level (Nguyễn, 1970; Thompson, 1987).

English and Vietnamese also differ in how they manipulate acoustic correlates in word-level prosody. In English, judgements of linguistically significant stress depend on at least four acoustic parameters: fundamental frequency, duration, amplitude, and vowel quality (Beckman, 1986; Fry, 1955; among others). The two primary dimensions of Vietnamese tone are the direction of f0 movement (tone contour) and f0 height. Voice quality, intensity and duration have also been found to distinguish tones (Nguyễn & Edmondson, 1997; Phạm, 2003; Vũ, 1981, 1982).

Voice quality, particularly the laryngeal features of creakiness and breathiness, accompanies particular tones across dialects. Creakiness is a regular feature of the Broken (ngã) and Drop (nặng) tones of the Northern dialect and the Curve (hỏi) tone of the Central dialect. It also occurs on some local variants of the Southern Drop tone (Vũ, 1981). Creakiness and breathiness accompany the Falling (huyền), Drop, Curve and Broken tones of the Hanoi dialect. Phạm (2003) claims they form a distinctive register feature that separates low-register from high-register tones. Intensity correlates highly with f0 (Vũ, 1981) and can thus be regarded as supplementary to f0. Duration, or more specifically tonal length, has been found not to be a distinctive feature in Vietnamese (Phạm, 2003; Vũ, 1981), but only
varies with segmental context (i.e., tones in stop-final syllables are inherently shorter than tones in other environments). Vũ (1981) studied native speakers’ perception of Vietnamese tones. He concluded that the direction of f0 movement, f0 height and voice quality matter more for tone identification than other dimensions such as duration and intensity. Intensity and duration support perception but play no independent role in tone recognition. Both languages, then, use f0 as a perceptual cue: to tones in Vietnamese, and to word stress and accent in English. But they differ in how they manipulate the acoustic cues.

Tonal pitch features have been observed to transfer into Vietnamese learners’ English production and perception (Hồ, 1997; Nguyễn, 1970, 1980, 2003; Pittam & Ingram, 1991; Riney, 1988). In production, for example, Nguyễn (1970) noted that Vietnamese speakers of English tend to substitute the high rising (sắc) tone for primary stress. This results in exaggerated pitch changes on stressed syllables. Pittam and Ingram (1991) and Riney (1988) observed tonal effects in English syllables closed by an obstruent. Learners produced these syllables with the checked quality of the Rising (sắc) tone, easily identified by its abrupt high rise (Nguyễn & Ingram, 2004). In a recent study of Vietnamese perception of English polysyllabic words, Nguyễn (2003) found that an English syllable could be perceived as a particular Vietnamese tone. Which tone depended on two factors. One was syllable structure (a closed syllable ending in an obstruent vs. a syllable ending in a sonorant). The other was stress level: stressed syllables were associated with high level tones, and unstressed syllables with low level tones. This suggests a perceptual tonal transfer constrained by relative pitch levels and the segmental
composition of the syllables. Taken together, these studies suggest that Vietnamese learners rely on tonal pitch cues when perceiving English stress. In other words, they seem to interpret the intonation patterns of English words and phrases in terms of their native tone categories.

However, word stress in English is ‘culminative’: every content word or larger domain has exactly one primary-stressed syllable, and all remaining syllables are subordinate to it (Trubetskoy, 1939/1969). This potentially has far-reaching implications for gestural timing and rhythmic differences between the two languages, and for how such differences may be modelled phonologically. In English, stress contrasts are enhanced segmentally. Stressed syllables are longer than unstressed syllables (i.e., duration is a distinctive and active correlate in word stress production), and unstressed vowels tend to be reduced. Vietnamese, in contrast, is generally considered a syllable-timed language (Nguyễn, 1970, 1980) in which each syllable has a lexical tone specification. No systematic difference in duration or vowel quality among its syllables has been found. In production, one of the five or six lexical tones, the nặng (dropping) tone, is much shorter than all the others (Brunelle, 2003; Phạm, 2003). However, tonal length has no distinctive status in Vietnamese (Phạm, 2003; Vũ, 1981), which suggests that duration is not an active cue in tonal contrast. This comparative analysis yields three predictions for Vietnamese learners of English. First, they will be able to produce f0 contrast, an acoustic correlate available in both languages. Second, they will have problems using duration to enhance prominence contrast, because they make limited use of it in their L1. Third, they will fail to reduce unstressed syllables.

2.2. Comparative analysis of compound–phrasal accent contrasts in English and Vietnamese

The three stress patterns of interest in this study exemplify three types of prominence:

(a) blackberry = bláckberry (compound, meaning: a kind of fruit)
(b) black berry = bláck bérry (broad-focus noun phrase, meaning: a berry that is black)
(c) black berry = bláck berry (narrow-focus noun phrase, with an emphatic contrastive accent on black, in contrast to green berry)

The compound bláckberry is a single three-syllable word with primary word stress on black and secondary stress on berry. The broad-focus noun phrase bláck bérry has two accented constituents. Phrasal stress (the default accent assignment) falls on ber-, the first syllable of berry, and a pre-nuclear accent falls on black (Farnetani & Cosi, 1988; Hardcastle, 1968). In the narrow-focus noun phrase bláck berry, the syllable black receives emphatic or contrastive stress.

The pragmatic functions, phonological structures and phonetic cues of these three “stress” patterns have all been disputed. The first question is the nature of the distinction between broad and narrow focus. It is commonly described in terms of scope. In broad focus, the listener’s attention is drawn to new information spanning the whole phrase. In narrow focus, it is drawn only to the element within the phrase that carries the new information. Narrow focus is often considered to serve a specialized communicative function: countermanding an erroneous assumption that the speaker believes the listener holds. In this usage, narrow focus is identified as “contrastive stress”. There is ongoing debate as to whether contrastive stress is a distinct type of prosodic effect or simply a case of accentual prominence. Some maintain that contrastive accents are formally different from other accents, either because they are a different type of accent or because they are more
prominent. Couper-Kuhlen (1984) and Chafe (1974) note a sudden drop in f0 after a contrastive accent, whereas f0 is more likely to be sustained after a non-contrastive accent. Pierrehumbert and Hirschberg (1990) suggested that contrastive accents have an L+H* pattern, while novelty accents have an H* form. For Standard Southern British English, Ladd and Morton (1997) showed that the “emphatic” peak accent type has a higher, later peak than the “normal” peak accent type. This is consistent with Pierrehumbert and Hirschberg’s (1990) L+H* and H* contours. Bartels and Kingston (1994) argue that what distinguishes narrow focus is enhanced prominence on the focused element.

In English, broad focus and late narrow focus have identical accent patterns: pitch accents are aligned with the last accentable constituents within an intonation contour. Under early narrow focus, the pitch accent is claimed to shift to an earlier location, and the last accentable syllable is deaccented (Beckman & Pierrehumbert, 1986; Jackendoff, 1972; Ladd, 1980). Jannedy (1997), however, studied the production and perception of narrow-focus patterns (e.g., RED ball vs. red BALL) by 42 American English children (aged 3–10) and adults. She found that children and adults alike accent the noun regardless of whether the adjective or the noun is contrasted. Even so, speakers show a strong tendency to use non-prominent or less prominent accent types on the noun when the adjective is narrowly focussed. The perception results showed that adult listeners can reliably interpret an early narrow focus whether or not the following noun is deaccented. The study suggested that deaccenting the noun does not mean removing its accent completely. Rather, it means choosing less prominent accent types over more prominent ones, and children have to learn to use less prominent
accent types over prominent ones.

Secondly, the phonological status of the compound–phrasal stress contrast is not in question for prototypical cases such as blackberry vs. black berry. There is some doubt, however, about whether native speakers and listeners reliably distinguish the compound’s prosodic pattern from that of its phrasal counterpart. Atkinson-King (1973) and Vogel and Raimy (2002) investigated the acquisition of compound vs. phrasal stress (hót dog vs. hot dóg) in English by children aged 5, 7, and 11. The children were shown pairs of pictures representing a compound word and the corresponding phrase. They heard the names of the items on a prerecorded tape and were asked to indicate which one they had heard. Vogel and Raimy (2002) replicated Atkinson-King’s (1973) study, and both studies produced the same result. Even the youngest children produced the right pattern, yet children did not parse it in perception until as late as 11 or 12 years of age.

Despite the vast body of work on the phonology and phonetics of English stress, the interface between word and phrasal prosody has not been adequately investigated. This applies in particular to the acoustic correlates and perception of these three stress patterns. Hardcastle (1968) examined the f0 and intensity changes between the accented syllables of each stress pattern in Australian English. These changes were clearly greatest for syllables carrying the emphatic (narrow-focus) stress pattern. He also found that the f0 changes for narrow focus closely resembled those for the compound stress pattern. The broad-focus noun phrases showed a sharp upward f0 movement at the beginning of the second element. This movement did not occur in items with the emphatic or compound stress pattern. Surprisingly perhaps, Hardcastle’s perception experiment showed that a significant majority of listeners had
difficulty in reliably distinguishing the three stress patterns Narrow-focus phrasal stress was often confused with the compound stress This he attributed to the similar f0 and intensity changes associated with these two stress patterns Some other studies have focussed on the investigation of the acoustic and perceptual correlates of two patterns: compounds and broad-focus noun phrases only Bolinger and Gerstman (1957) found that in three-constituent pairs like lighthouse keeper vs light housekeeper, the temporal interval between the constituents was an efficient cue for distinguishing these pairs But there has been no evidence of temporal interval as a distinctive cue in simpler pair constructions, like lighthouse and light house Faure, Hirst, and Chafcouloff (1980) investigated two-constituent minimal pairs blackbird and black bird, finding that the two constructions differed significantly in duration; total duration of phrases were 20% longer than total duration of compounds Their data suggested that, while both duration and fundamental frequency are important features in the production of compounds and phrases, pitch, but not temporal structure, is crucial for their perception Farnetani and Cosi (1988) found that while duration is the major differentiating parameter in production (compounds are shorter in comparison to phrases), the perceptual distinction lies primarily in the different prominence pattern: a sequence of an accented constituent followed by an unaccented one in compounds and of two accented constituents (the second heard as stronger than the first) in non-compounds In Vietnamese, it seems that compounds and noun phrases are syntactically and semantically contrastive but not phonologically contrastive under normal circumstances of production In terms of syntactic structure, Vietnamese compounds and phrases have a reversed word order from English (Vietnamese: Noun+Adjective: hoa[flower] hoˆ`ng[pink] vs English: Adjective+Noun: black berry) A 
subtype of Vietnamese compounds (e.g., specializing compounds vs. phrases) is claimed to have a prosodic prominence pattern that is the reverse of the English compound–phrase pattern, that is, weak–strong for compounds and strong–weak for noun phrases (Thompson, 1987). However, no conclusive acoustic evidence has been found to support this compound–phrasal prominence pattern (Nguyễn & Ingram, 2007). In addition, it is generally claimed that Vietnamese has no tonal neutralization due to "sandhi" (except in a subclass of reduplication), a phenomenon that occurs in other tone languages like Chinese and Thai (Chen, 2000; Gandour, 1974); that is, there is no toneless syllable as in Shanghainese and the other Wu dialects, nor a systematic change of tone when words occur in combination as in Mandarin Chinese (Chen, 2000). No systematic prosodic difference between compounds and phrases has been found in Vietnamese. Instead, every syllable in both a compound word and a phrase has a full vowel and a lexical tone specification. In a recent experiment, Nguyễn and Ingram (2007) investigated the acoustic and perceptual correlates that distinguish compounds (hoa hồng: a rose) from phrases (hoa hồng: a pink flower) in Vietnamese, as produced by 45 native speakers of three dialects (Hanoi, Hue, and Saigon) under two experimental conditions: a picture-naming task (representing spontaneous natural speech) and a minimal-pair sentence task (the 'maximally contrastive' elicitation condition). It was found that even under conditions of maximal contrast, there was no conclusive acoustic evidence (in terms of f0 [Hz], intensity [dB], spectral tilt, and duration) to support the claim of contrastive stress patterns between compounds and noun phrases in Vietnamese. If forced to realize a prosodic contrast under elicitation conditions of 'maximal contrast', Vietnamese speakers produced a
juncture between the two constituents of the noun phrases, but only under this condition; no juncture was present between the components of compounds. Compound words as a whole were not temporally compressed in comparison to their phrasal counterparts, as they are in English, a stress language with stress- or foot-based timing. Listeners relied only on the juncture between the two components of noun phrases as a cue to distinguish phrases from compounds, and failed to distinguish noun phrases from compounds in stimuli elicited under the picture-naming task, where no juncture was produced between the two constituents of a phrase.

Regarding phonetic cues of contrastive or corrective accentual focus in Vietnamese, some authors, such as Hoàng and Hoàng (1975) or Gsell (1980), consider full tonal realization of accented syllables to be one of the positive marks of prominence (accent) at the phrasal level in Vietnamese. In a recent study on the effect of emphasis on glottalized and non-glottalized Vietnamese tones (the Hanoi creaky falling tone, i.e. the nặng tone, in obstruent vs. sonorant final-consonant environments, respectively), Michaud and Vũ (2004) found that in Vietnamese emphasis, syllable lengthening appears as a speaker-dependent variable, whereas a stable correlate of emphasis is curve amplification, manifested as an increased slope of the f0 curve or as f0 register raising.

Three experiments

3.1 Preliminary experiment

The aims of the first experiment were: (a) to construct a set of native speaker exemplar stimuli that could be used in perception and production experiments to test Vietnamese learners' mastery of the three-way pattern of prosodic contrasts between compound words and broad- and narrow-focus noun phrases; and (b) to establish the effectiveness of the accentual pitch and timing cues discussed earlier for discriminating among the three Australian English stress patterns.

3.1.1 Linguistic materials

Three sets of minimally contrastive triplets of
compound words (C), broad-focus noun phrases (B), and narrow-focus noun phrases (N) were constructed, using three syllabic templates: a monosyllabic first element plus a disyllabic second element (e.g. black berry); a disyllabic first element plus a monosyllabic second element (e.g. butter fish); and disyllabic first and second elements (e.g. English teacher). There were four tokens for each syllabic template, yielding 12 sets of triplets, or 36 items for each speaker. Each item was made up of a short context sentence followed by a fixed carrier sentence (This/It/He is a ...), which ensured that the target contrasts appeared as the final elements in each sentence and were in approximately the same position in the sentence intonation contour (see the example below, and Appendices A and B for a complete list of stimuli):

(a) This is a bottle which is colored blue. It's a blue bottle. [Broad focus NP]
(b) This bottle isn't colored yellow. It's a blue bottle. [Narrow focus NP]
(c) This kind of jelly fish is quite common here. It's a blue bottle. [Compound word]

3.1.2 The speakers

Four native speakers of Australian English, experienced in producing good-quality exemplars for phonetic experiments, were used: two adult males (J.I. and R.P.) and two females (E.C. and F.H.).
J.I. is a phonetician and R.P. a linguist and actor with professional voice training (both are co-authors of this paper). E.C. and F.H. are speech pathologists with extensive clinical and teaching experience. They were presented with a randomized list of the test triplets with their preceding context sentences and instructed to read each item (context + target sentence) in a natural speaking manner.

3.1.3 Measurements

The sentences were digitized (at a 20 kHz sampling rate and 16-bit precision), and spectrographic measurements were made with a sound editing and analysis program, the Emu Speech Tools (Cassidy, 1999). First, the Emu Labeller was used to mark the edges of the target syllables and vowels, relying primarily on its spectrographic display. Then the Emu Query Tool was used to extract syllable durations (ms) and f0 (Hz) and intensity (dB) values at the vowel midpoint. The segmentation criteria were generally based on the major discontinuities in the energy distribution over frequency and time visible on the spectrograms. Taking butter fish as an example, the syllable bu- was measured from the onset of the closure for [b] to the cessation of the vowel formants; the syllable -ter from the onset of closure for [t] to the onset of fricative noise for [f]; and the syllable fish from the onset of fricative noise for [f] to the offset of high-frequency fricative noise for [ʃ]. Since all the stops of the test items occur utterance-medially, the onset of closure for a syllable-initial stop was taken as the offset of the preceding word or segment. Studies of the effects of stress and accent on duration in English have shown that not only the rhymes but also the initial consonants are lengthened relative to their counterparts in unstressed syllables (Ingrisano & Weismer, 1979; Umeda, 1977; among others). Therefore, in this experiment, the duration of the whole syllable,
including the onset and the rhyme, was measured. The f0 and intensity measurements were taken at the center of the vowels of the stressed syllables, which was located automatically with an EMU-R query command on the basis of the labelled vowel onset and offset.

3.1.4 Analysis

The acoustic analysis concerns the fundamental frequency (f0), duration, and intensity of the constituents of the test items. The following acoustic parameters were investigated: the mid-vowel f0 value of the first and second stressed syllables (e.g., English teacher: V1F0, V2F0); the mid-vowel intensity value of the first and second stressed syllables (V1 and V2 intensity); f0 change (V1F0 − V2F0); intensity change (V1 intensity − V2 intensity); the duration of the constituent syllables (e.g., in English teacher, S1 and S2 for the accent-bearing syllables Eng- and tea-, and U1 and U2 for the unstressed syllables -lish and -cher); and the duration of the whole compound word or noun phrase (e.g., blackberry, English teacher). In order to control for segmental compositional effects on duration, intrinsic and context-dependent f0 effects, and individual speaker differences, all measurements were analyzed as pair-wise comparisons within items (i = 1, ..., 6) and speakers (j = 1, ..., 4), and a mixed model ANOVA was used.

[Schematic: for item i and speaker j, the syllables Eng | lish | tea | cher are compared across the Compound, Broad-focus phrase, and Narrow-focus phrase forms; bold type marked the accented syllables in the original.]

The mixed model two-way ANOVA, with stress pattern (Compound, Broad, and Narrow) and speaker group (native, advanced, beginner) as fixed effects and speakers and items as random effects, was conducted on each acoustic parameter. The restricted maximum likelihood (REML) method was used to estimate the variance components. A Tukey post hoc test (with the criterion p-value set at 0.05) was then conducted to determine the significant differences among the levels of the main fixed factors and their interactions (i.e., the pair-wise comparisons among the three stress patterns within each speaker group).
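The midpoint sampling and the derived parameters defined above can be sketched in a few lines of Python. This is a minimal illustration, not the Emu/EMU-R procedure used in the study: it assumes an f0 or intensity track already exported as (time in s, value) pairs sorted by time, vowel boundaries in seconds, and syllable durations in ms. The names `value_at`, `midpoint_value`, and `Token` are hypothetical, and the example values are invented.

```python
from bisect import bisect_left
from dataclasses import dataclass


def value_at(track, t):
    """Linearly interpolate a track of (time_s, value) pairs at time t.

    Assumes strictly increasing times; t outside the track is clamped
    to the nearest end point.
    """
    times = [time for time, _ in track]
    i = bisect_left(times, t)
    if i == 0:
        return track[0][1]
    if i == len(track):
        return track[-1][1]
    (t0, v0), (t1, v1) = track[i - 1], track[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def midpoint_value(track, vowel_onset, vowel_offset):
    """Sample an f0 (Hz) or intensity (dB) track at the vowel midpoint."""
    return value_at(track, (vowel_onset + vowel_offset) / 2)


@dataclass
class Token:
    """Hypothetical per-token record: f0 in Hz, intensity in dB, durations in ms."""
    v1_f0: float     # mid-vowel f0 of the first stressed syllable (V1F0)
    v2_f0: float     # mid-vowel f0 of the second stressed syllable (V2F0)
    v1_db: float     # mid-vowel intensity of the first stressed syllable
    v2_db: float     # mid-vowel intensity of the second stressed syllable
    s1: float        # first stressed syllable
    s2: float        # second stressed syllable
    u1: float = 0.0  # unstressed syllable of the first word (absent in "black berry")
    u2: float = 0.0  # final unstressed syllable (absent in "butter fish")

    @property
    def f0_change(self):
        return self.v1_f0 - self.v2_f0  # V1F0 - V2F0

    @property
    def intensity_change(self):
        return self.v1_db - self.v2_db  # V1 intensity - V2 intensity

    @property
    def total_duration(self):
        return self.s1 + self.u1 + self.s2 + self.u2  # whole word or phrase


if __name__ == "__main__":
    print(midpoint_value([(0.0, 100.0), (0.1, 200.0)], 0.0, 0.1))  # 150.0
    # Invented "black berry" token: no U1 because the first element is monosyllabic.
    tok = Token(v1_f0=210.0, v2_f0=150.0, v1_db=72.0, v2_db=65.0,
                s1=220.0, s2=150.0, u2=140.0)
    print(tok.f0_change, tok.total_duration)  # 60.0 510.0
```

Leaving the missing unstressed syllables at zero lets one record type cover all three syllabic templates, so the whole-word duration is always S1 + U1 + S2 + U2.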
Guided by the ANOVA results, two of the acoustic measures (f0 change and the duration of the whole word or noun phrase) were selected as a basis for classifying the stimulus tokens produced by the four native speakers. A scatter plot of the test items on these two parameters, supported by a discriminant function analysis, provided an acoustic basis for distinguishing the three target stress patterns. It is also worth noting that the preliminary analysis of all nine acoustic parameters showed no systematic significant difference among the three syllabic templates (e.g. black berry, butter fish, and English teacher); therefore, syllabic template was excluded as a factor from further analysis. However, in some four-syllable compounds (e.g., English teacher), the second accent-bearing syllables, in spite of having a less prominent f0, were not totally deaccented; these are discussed separately in the qualitative analysis of the f0 contours.

3.1.5 Results: classification of native speaker productions

A detailed report of the individual parameters of the native English productions is presented along with the Vietnamese learners' productions in Section 3.2. Here we merely present the results of the discriminant function analysis, which established that the three English contrastive stress patterns may be successfully discriminated on the basis of speaker-normalized peak f0 measurements and rate-normalized duration differences between compound words and phrases. In order to examine whether the stress patterns produced by native speakers are classifiable on the basis of f0 and duration cues, the f0 change measure (V1F0 − V2F0, defined previously) and a normalized word duration measure were fed into a linear discriminant analysis (S-PLUS 2000) to partition the stimulus items into three non-overlapping groups in acoustic space. The results are shown in a scatter plot (Fig. 1). The normalized
duration measure used was a modified z-score, derived as follows. The duration of each whole compound word or noun phrase was expressed as a difference score from the mean of its minimal set (thus normalizing for intrinsic phone values and speaker differences):

$$z = \frac{d - \bar{d}}{s},$$

where $d$ is the duration value, $\bar{d}$ the mean duration value, and $s$ the standard deviation. The z-score was then converted to a t-score,

$$t = 10z + 50,$$

in order to yield a whole and positive number. The scatter plot of the native speakers' items (Fig. 1) shows that all but 12 of the 144 experimental stimuli (132/144, i.e. 92% of tokens) were correctly classified into their Broad, Narrow, and Compound groupings on the basis of the f0 change measure and the normalized 'word' duration scores.

3.1.6 Conclusions: classification of native speaker productions

The foregoing result demonstrates that, on a set of tokens carefully produced by four phonetically aware native speakers, a simple linear discriminator equipped with the ability to measure pitch changes across adjacent stressed syllables and the relative timing of these intervals can discriminate the three stress patterns.

[Fig. 1. The scatter plots of four native speakers' stress patterns (x-axis: F0 change; y-axis: normalised word/phrase duration): B = Broad, N = Narrow, C = Compound.]

Table. The post hoc Tukey results on duration values (MD: mean difference, ms; > longer, < shorter, ≈ no significant difference).

Measure  Native                        Advanced                      Beginner
         Contrast  MD     Sig          Contrast  MD     Sig          Contrast  MD     Sig
Phrase   B>C       26.8   p<.0001      B>C       88.3   p<.0001      B<C       −28.1  p<.04
         N>C       203.9  p<.0001      N>C       121.6  p<.0001      N≈C       8.6    p=.5 n.s.
         B>N       63.9   p<.01        B<N       −33.2  p<.02        B<N       −36.7  p<.01
S1       B>C       77.5   p<.0001      B≈C       −4.3   p=.5 n.s.    B<C       −14.4  p<.02
         N>C       88.6   p<.0001      N>C       −57.6  p<.0001      N>C       14.7   p<.02
         B<N       −11.1  p=.25 n.s.   B<N       −62.0  p<.0001      B<N       −29.1  p<.0001
U1       B>C       51.7   p<.0001      B≈C       4.7    p=.5 n.s.    B≈C       −11.9  p=.1 n.s.
         N>C       41.7   p<.002       N>C       31.8   p<.0001      N>C       32.1   p<.0001
         B≈N       9.9    p=.4 n.s.    B<N       −27.0  p<.001       B<N       −44.1  p<.0001
S2       B>C       96.6   p<.0001      B>C       73.3   p<.0001      B≈C       5.3    p=.5 n.s.
         N>C       44.9   p<.0001      N>C       35.5   p<.0001      N≈C       10.7   p=.1 n.s.
         B>N       51.6   p<.0001      B>N       37.7   p<.0001      B>N       16.1   p<.05
U2       B>C       37.0   p<.002       B>C       24.2   p<.002       B<C       −16.5  p<.03
         N≈C       18.2   p=.1 n.s.    N≈C       13.8   p=.1 n.s.    N<C       −25.0  p<.001
         B≈N       18.7   p=.1 n.s.    B≈N       13.4   p=.06        B≈N       8.5    p=.2 n.s.

[Fig. Comparison of the duration of word/phrase (summed across constituent syllables: S1 + U1 + S2 + U2) between stress patterns (B, N, C) among speaker groups (Native, Advanced, Beginner). Vertical axis: mean duration (ms). Legend: S1, first stressed syllable (e.g., Eng- in English teacher); U1, unstressed syllable of the first word (-lish); S2, second stressed syllable (tea-); U2, final syllable (-cher).]

on the adjective of the narrow than that of the compound). However, they failed to realize the syntagmatic contrasts of accent (i.e., more prominent elements alternating and contrasting syntagmatically with less prominent ones) and thus failed to deaccent the nouns of the narrow and compound patterns. In addition, the f0 patterns produced by Vietnamese speakers were more varied than those of native speakers, suggesting a transfer from Vietnamese of the various tonal contours on different syllable types, and supporting previous observations that Vietnamese learners make reference to tonal features in the perception and realization of English stress patterns (Hồ, 1997; Nguyễn, 1970, 1980, 2003; Pittam & Ingram, 1991; Riney, 1988). The current finding that beginning L1 Vietnamese learners fail to deaccent English unstressed syllables is consistent with McGory's (1997) findings for Mandarin and Seoul Korean L1 speakers, showing that English L2 learners with different L1 prosodic systems (Vietnamese, Mandarin Chinese, and Seoul Korean) all fail at first to produce
contrastive f0 prominence patterns between stressed and unstressed levels in an English-like way. This can be explained as a confluence of different types of L1 interference that result in f0 patterns that differ in shape, and possibly in alignment, but are all interpretable by native English listeners only as a failure to deaccent.

3.2.6.2 Discussion of duration

The duration analysis showed that there were significant differences in temporal structure between compounds and their corresponding (broad or narrow) phrasal constructions, further confirming previous findings on the duration contrast between compound words and phrases (Farnetani & Cosi, 1988; Faure et al., 1980). The duration compression was evident in all syllable constituents of compounds. The temporal compression effect associated with compounds may be explained by two different mechanisms: (1) accentual lengthening and word-edge lengthening in phrases, or (2) a word shortening effect in compounds, i.e., the syllables of the compound are compressed to conform to the temporal template of a word. This will be further elaborated in the Discussion section below.

Under the first mechanism, the constituents of the phrases were longer than their compound counterparts because of two different lengthening effects: accentual lengthening and word-edge lengthening. In black berry, the syllable black in the phrases was longer, possibly due to accentual and word-edge lengthening; the syllable ber- was also longer than its compound counterpart due to accentual lengthening, particularly in the broad-focus pattern, while in the narrow-focus phrase it is preserved as an accent-bearing element, i.e., it can be deaccented but is not reduced. On the other hand, compounding may be seen phonologically as a process in which words (new lexical items) are created from phrases (compositional syntactic constructions) by altering the prosodic
characteristics of the phrasal construction to conform to the prosodic template of the word, i.e., the compound retains a single accentable syllable and becomes subject to the temporal compression effects associated with affixation in polysyllabic word forms. Conformity to a "word accentual template" is supported by compound accent patterns in different languages. As in other Germanic languages, most English words have initial stress, so the fact that the accent-bearing element of the first word in the compound keeps its stress as the primary stress is part of this template. By contrast, in Italian, a stress accent language, stress tends to occur on the stressable syllable of the rightmost morpheme carrying an underlying accent (Garde, 1965; Rossi, 1998), and in Italian compounds stress "is assigned to the last member of a compound" (e.g., lava piátti 'dish washer') (Vogel & Raimy, 2002, p. 229). Further evidence is found in pitch accent languages. In Turkish, a pitch accent language (Levi, 2005), word-level accent consistently promotes the leftmost lexical accent site (Barker, 1989; Inkelas, 1999). Compounds, like affixed words, also "promote the accent of the leftmost member of the compound" (e.g., ayák-kabi 'shoe' vs. ayák 'foot' and kabi 'cover') (Levi, 2002). In Japanese compounds, on the other hand, it is often the tone pattern of the last element that is kept (Kubozono, 1993). Therefore, it is argued that the compression of all compound constituents relative to their phrasal counterparts in this study can also be explained as a "word shortening effect", in which the first element of the compound takes on the accentual characteristics of primary word stress (e.g., black in blackberry), the other accent-bearing elements (e.g., ber- in blackberry and tea- in English teacher) are de-accented and/or reduced, and every constituent syllable is subject to the rhythm-induced word template.
As a result, the compound as a whole takes on the rhythmic properties of a lexical word, serving as a domain for rhythm-induced temporal compensation effects. This analysis is supported by the consistent compression of all constituents, even unstressed syllables at a word edge, together with the de-accented and reduced vowel quality of the second stressed syllables observed in many highly lexicalized compounds produced by native speakers (e.g., /blæbəri/, compared to the full vowels in the phrasal counterparts, /blæk beri/). In particular, even though many four-syllable compounds (e.g., English teacher, plastic money, open classroom) still preserve a less prominent accent on the noun, they all have more compressed syllable constituents, and thus a shorter duration as a whole, than their phrasal counterparts (B > C: p < .001; N > C: p < .001; N ≈ B: p = .05). A similar temporal compression effect was found in Italian compounds compared to their phrasal counterparts (e.g., centopiedi 'centipede' vs. cento piedi 'one hundred feet') in our preliminary acoustic analysis. It would be interesting to extend this investigation to other languages.

3.2.6.3 Discriminant analysis

The discriminant analysis of the non-native speakers' items (using the methodology described in Section 3.1.5) showed that, in contrast to the native speakers' items, which were well partitioned into three non-overlapping groups in acoustic space (Fig. 1), many items spoken by the advanced speakers were misclassified by the discriminant function, particularly narrow patterns misclassified as broad along the f0 x-axis (Fig. 6a). This is consistent with the quantitative and qualitative analysis of f0 contours (Section 3.2.5.2), which showed that many advanced speakers failed to de-accent the noun of the narrow patterns. Compounds are generally well separated from phrases along the duration y-axis (in spite of some compounds being misclassified as narrow). However, the three stress patterns could be separately grouped. The scatter plot in Fig. 6b shows that items spoken by beginning speakers could not be classified into three separate groups on either the f0 or the duration dimension, further confirming that non-native speakers failed to de-accent the nouns of the compound and narrow patterns and to compress the constituents of the compounds.

[Fig. 6. The scatter plots of stress patterns spoken by (a) Advanced and (b) Beginning Vietnamese speakers of English (x-axis: F0 change; y-axis: normalised word/phrase duration): B = Broad, N = Narrow, C = Compound.]

3.3 Perception of three stress patterns

In the perceptual experiment, native Australian English listeners and non-native listeners (Vietnamese learners of English) heard a context sentence (e.g. This berry is black). They were then required to make a forced choice among three alternatives as an appropriate continuation of the preceding context: a carrier sentence ending with a compound word, a narrow-focus noun phrase, or a broad-focus noun phrase. The acoustic analysis presented in the production experiment suggests that an optimal perceptual strategy would employ a combination of: (a) pitch change over the nuclear elements of the compound word or phrase, to distinguish left-headed constructions (narrow-focus noun phrases and compounds) from right-headed ones (broad-focus noun phrases); and (b) normalized word/phrase duration, to distinguish compounds from phrasal constructions. English listeners may be expected to use this strategy, particularly considering that the two stimulus dimensions map directly onto the separate phonological dimensions of accent and (word) rhythm, respectively. Vietnamese listeners may be expected to transfer L1 tonal contrasts onto English
stress (accent) perception and would therefore also be predicted to make use of the pitch-change dimension. Duration is not an active and distinctive cue in Vietnamese tonal contrasts and thus would be unlikely to be involved in tonal transfer effects. It may also be predicted that Vietnamese listeners would be insensitive to the duration contrast between compounds and phrases, because Vietnamese lacks the specific temporal compensation effects associated with culminative word stress and foot timing. However, this does not necessarily imply poorer performance, because, as was shown previously, the tokens in this experiment supported a three-way classification of the stimuli (N > C > B). Advanced Vietnamese learners of English would be expected to perform at a higher level than beginning learners. But our interest in the effects of L2 fluency lay less in the level of performance of the two groups than in whether a comparison of their behavior on the tasks indicates any shift from a 'Vietnamese' towards an 'English' response strategy. This bears on the question of whether there is evidence of a reorganization of perceptual schemas in adaptation to L2 prosody, or simply more efficient use of the cues predicted to be used in L1 perceptual processing.

3.3.1 Method: perception experiment

3.3.1.1 Subjects

Three groups of subjects participated in this experiment: Vietnamese beginning learners of English, advanced Vietnamese speakers of English, and a control group of native Australian speakers of English. The beginner group consisted of 80 first-year English-major undergraduates (20 Hanoi, 20 Hue, 20 Nghe An, and 20 Saigon speakers; half male and half female in each dialect group) with no known auditory deficiencies. The advanced group consisted of 20 postgraduate students at the University of Queensland (12 Southerners, Northerners, and Hue dialect speakers). Ten subjects in each non-native group also participated in the production experiment. The control native English
listener group consisted of 29 subjects (five males, 24 females), who received course credit for their voluntary participation in the experiment. All were first-year linguistics students at the University of Queensland whose first language was Australian English.

3.3.1.2 Stimuli

As reported for the production experiment, four native speakers (two males and two females) recorded the sentences, but only the stimuli of the two male linguists were used for this perceptual experiment. In the listening identification test, listeners heard a context sentence (the target context) followed by three different test sentences carrying the three stress patterns. The context sentence was read once, followed by the three test sentences spoken in sequence with a short pause between each (approximately s), and the sequence was then repeated. The subjects' task was to choose the test sentence whose stress pattern fitted the meaning of the context sentence by circling the letter corresponding to their response on the answer sheet (see Appendix C for the answer sheet). An example of a test trial is as follows:

Context phrase (target): This berry isn't green.
Test sentences            Category    Response type
(A) It's a black berry    Broad       Error
(B) It's a black berry    Narrow      Correct
(C) It's a black berry    Compound    Error

The response sheet contained no orthographic cues to the correct response. Although two speakers read the test items, there was no mixing of speakers' voices within trials: the speaker who provided the context phrase was always the one who produced the three test sentences. The order of the correct sentence within a trial triplet was random. The stimuli were prepared using a sound-editing program (Speech Station), stored as .wav files, and put in a web page format. There were in total 72 test items (12 items × 3 stress patterns × 2 speakers). The test items were put into blocks of 12, with a gap of about s between each item.
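The trial structure just described (one trial per item, target pattern, and speaker, giving 12 × 3 × 2 = 72 trials; all three options in a trial spoken by the same speaker; the correct option at a random position; blocks of 12) can be sketched as follows. This is a hypothetical reconstruction for illustration, not the script used to prepare the Speech Station/web-page stimuli: the function names and item labels are invented, and shuffling the overall trial order is an assumption the text does not state.

```python
import random
from itertools import product

PATTERNS = ("Broad", "Narrow", "Compound")


def build_trials(items, speakers, seed=None):
    """One trial per (item, target pattern, speaker).

    Each trial offers the three renditions of the same item by the same
    speaker as options A-C in random order, so the position of the
    correct option varies from trial to trial.
    """
    rng = random.Random(seed)
    trials = []
    for item, target, speaker in product(items, PATTERNS, speakers):
        order = list(PATTERNS)
        rng.shuffle(order)
        trials.append({
            "item": item,
            "speaker": speaker,  # context and all options share one voice
            "target": target,    # pattern implied by the context sentence
            "options": dict(zip("ABC", order)),
            "correct": "ABC"[order.index(target)],
        })
    rng.shuffle(trials)  # assumption: presentation order randomized across trials
    return trials


def blocks(trials, size=12):
    """Split the trial list into consecutive blocks of `size` trials."""
    return [trials[i:i + size] for i in range(0, len(trials), size)]


if __name__ == "__main__":
    items = [f"item{n:02d}" for n in range(1, 13)]  # placeholder item labels
    trials = build_trials(items, ["J.I.", "R.P."], seed=1)
    print(len(trials), len(blocks(trials)))  # 72 6
```

Fixing the seed makes the randomized answer positions reproducible, which is convenient when the same list has to be matched against printed answer sheets.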
3.3.1.3 Procedures

The beginning learners did the perception test in a quiet classroom at a university in each location (Hanoi, Saigon, Hue, and Nghe An). The test was played from a Compaq laptop computer through good-quality loudspeakers. For the native English listeners and advanced Vietnamese learners of English, the perception test was carried out in the Phonetic Laboratory at the University of Queensland and was played from a desktop computer through good-quality loudspeakers. Before the test, there was a 5-min training session in which listeners heard six examples to familiarize themselves with the format of the test.

Before analysis and discussion of the results, it is important to note that in the listening task of this experiment, the subjects compared, on each trial, a set of auditory images of the three different stress patterns for a particular context sentence, presented both textually and auditorily, in isolation from the other two context sentences. This means that in order to make a "correct" response, the subject would need to have interpreted the context correctly in isolation, without being able to compare it with the other two contexts. As a result, there was a concern that an incorrect response to a target context could be due either to the subject's misinterpretation of the context or to an inability to perceive the stress pattern correctly. This concern was addressed by explaining the meaning of the context sentences to non-native participants in the training session prior to the listening task. In addition, an alternative form of the task, in which subjects listened to only the target stress pattern and then identified the context (e.g., they heard "It's a black berry" and then identified the most suitable of three contexts: (1) This berry is black. It's a ..., (2) This berry isn't green. It's a ..., and (3) This is a kind of fruit. It's a ...), was tried in the pilot study, and it turned out
to be too hard for non-native listeners to hear only the target stress pattern and then identify the context. Therefore, the current task (as described above) was used, since it proved more suitable for non-native listeners: they hear three stress patterns in a row, compare them, and then pick out the one they think fits the context sentence. The large memory demand of the current task was anticipated and addressed by presenting the audio input twice.

3.3.2 Results: perception experiment

3.3.2.1 Effects of L1 background and L2 proficiency levels

In order to examine the effect of language background and English proficiency level on subjects' perception of the three stress patterns, a three-way analysis of variance (ANOVA) was conducted. The experimental factors in this analysis were the listener groups (three levels: native listeners, non-native beginners, and non-native advanced learners), the target stress patterns (three levels: compound, narrow focus, and broad focus), and the speakers (two levels: J.I. and R.P.).
The speaker factor is of interest because it was anticipated that native and non-native listeners may respond differently to speaker variation in the stimulus items. The dependent variable was the percentage correct score for each item, calculated within each level of the three independent factors. This analysis was adopted because the three listener groups have different sample sizes (beginners: 80, advanced learners: 20, and native listeners: 29). The non-native listeners' dialect was found not to be a significant factor in a preliminary analysis and was therefore excluded from the current analysis.

The ANOVA results show a significant main effect for each of the three factors (stress patterns: F(2, 216) = 41.65, p < .0001; listener groups (i.e. proficiency levels): F(2, 216) = 16.12, p < .0001; and speakers: F(1, 216) = 6.22, p < .02) and two interaction effects (stress patterns × listener groups: F(4, 216) = 7.6, p < .0001; listener groups × speakers: F(2, 216) = 3.7, p < .03). The most significant main and interaction effects, those of stress pattern and listener group (proficiency level), are shown in Fig. 7. Pair-wise comparisons among the three listener groups by the Tukey method indicated that native listeners and advanced learners did not differ significantly from each other in overall performance (Native vs. Advanced: n.s.), but both differed significantly from the beginner group (Native vs. Beginners: p < .01; Advanced vs. Beginners: p < .01). However, this main effect of listener group needs to be evaluated in relation to the highly significant interaction of this factor with stress pattern (discussed below). The highly significant main effect of stress pattern also needs to be evaluated in relation to its interaction with the listener group factor. The marginally significant main effect of speakers (R.P. vs. J.I.)
also requires interpretation in relation to its interaction with listener group.

To examine the interaction between stress pattern and listener group, post hoc pair-wise multiple comparisons (Tukey test) were conducted. The results showed that native listeners performed significantly better on the narrow and compound items than on the broad items (N > B: p < .001; C > B: p < .001; N vs. C: n.s.). In contrast, the two non-native groups showed a similar pattern of performance: both identified the narrow pattern more successfully than the compound and broad patterns (N > B: p < .001; N > C: p < .001; B vs. C: n.s.). Considered jointly, the main effect of listener group and its interaction with stress pattern indicate that although the advanced learners performed as well as native listeners overall, they shared with the beginners poor performance in identifying compounds.

An overall comparison of mean performance (Fig. 7) shows that the narrow-focus pattern was the most accurately identified by all three groups. While native listeners tended to identify compounds better than the broad pattern, which they identified worst, both groups of Vietnamese listeners tended to identify the broad pattern better than the compound pattern, which they identified worst. This difference can be attributed to the different strategies and acoustic cues used by listeners of the two languages. It is argued that native listeners of English relied on both pitch and duration to distinguish the three stress patterns: they relied on a greater pitch change to identify the narrow pattern and on shortened word duration to distinguish the compound pattern. This left the broad pattern the most difficult to identify, because it is of comparable duration to the narrow-focus pattern but has a small pitch change. Vietnamese listeners, on the other hand, relied mainly on pitch and did not make use of the duration cue. As a result, they could easily identify the narrow pattern and had some difficulty in identifying the broad pattern, but could not properly recognize the compound pattern.
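The cue-weighting account above can be illustrated with a toy nearest-centroid classifier in Python. This is a sketch under stated assumptions, not the authors’ analysis: the f0-change centroids (Broad 10 Hz, Compound 39 Hz, Narrow 80 Hz) are the native-speaker means reported in the conclusion, but the relative durations, the scaling constants, the test tokens, and the use of weights `w_f0` and `w_dur` to model “reliance” on each cue are all hypothetical.

```python
import math

# Magnitude of mean f0 change (Hz) between the two constituents: native-speaker
# centroids reported in the conclusion (Broad 10, Compound 39, Narrow 80 Hz).
# Relative durations are HYPOTHETICAL placeholders encoding only the claim that
# compounds are shorter than phrases of either focus type.
CENTROIDS = {
    "broad": (10.0, 1.0),
    "compound": (39.0, 0.8),
    "narrow": (80.0, 1.0),
}

# HYPOTHETICAL scaling constants that put the two cues on comparable scales.
F0_SCALE_HZ = 30.0
DUR_SCALE = 0.1


def classify(f0_change_hz, rel_duration, w_f0, w_dur):
    """Return the pattern whose centroid is nearest in the weighted cue space."""

    def distance(centroid):
        c_f0, c_dur = centroid
        d_f0 = (f0_change_hz - c_f0) / F0_SCALE_HZ
        d_dur = (rel_duration - c_dur) / DUR_SCALE
        return math.sqrt(w_f0 * d_f0 ** 2 + w_dur * d_dur ** 2)

    return min(CENTROIDS, key=lambda label: distance(CENTROIDS[label]))


def native_listener(f0_change_hz, rel_duration):
    """Weights pitch and duration, as argued for native English listeners."""
    return classify(f0_change_hz, rel_duration, w_f0=1.0, w_dur=1.0)


def pitch_only_listener(f0_change_hz, rel_duration):
    """Ignores duration, as argued for the Vietnamese listeners."""
    return classify(f0_change_hz, rel_duration, w_f0=1.0, w_dur=0.0)


if __name__ == "__main__":
    # HYPOTHETICAL tokens: (f0 change in Hz, relative duration).
    tokens = {
        "narrow token": (75.0, 1.0),
        "broad token with a weak second accent": (28.0, 1.0),
        "compressed compound token with a small f0 change": (20.0, 0.8),
    }
    for name, (f0, dur) in tokens.items():
        print(f"{name}: native={native_listener(f0, dur)}, "
              f"pitch-only={pitch_only_listener(f0, dur)}")
```

With these placeholder values both simulated listeners label the 75 Hz token narrow, but the pitch-only listener labels the 28 Hz broad token compound and the compressed 20 Hz compound token broad: the same broad/compound confusion reported for the non-native groups in Table 5.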
Post hoc tests on the interaction between listener group and speaker (Fig. 8) show no significant speaker effect on the performance of native or advanced Vietnamese listeners (native: n.s.; advanced: n.s.); i.e., items spoken by speaker J.I. were as identifiable as those spoken by speaker R.P.

[Fig. 7. Mean percentage of correct perception scores by stress pattern (Broad, Narrow, Compound) and listener group (Native, Advanced, Beginner).]

[Fig. 8. Mean percentage of correct perception scores by listener group (Native, Advanced, Beginner) and speaker (J.I., R.P.).]

In contrast, there is a significant speaker effect on the beginning listeners’ performance (p < .001), indicating that items spoken by J.I. were better perceived by beginners than those spoken by R.P. It is surmised that this speaker effect for the beginner group is due to differences in pitch range (and correlated intensity) between the two speakers, combined with the Vietnamese listeners’ greater reliance on pitch contrasts in the identification task. Examination of the speakers’ average pitch and intensity ranges between stressed and unstressed syllables shows that J.I. has a higher pitch and intensity level and a greater pitch and intensity range than R.P. The average pitch and intensity ranges of J.I. for the test items are 42 Hz and 20 dB, respectively, while those of R.P. are 30 Hz and 17 dB. The average pitch and intensity level of J.I. is 151 Hz and 86 dB on stressed syllables and 109 Hz and 82 dB on unstressed syllables, while that of R.P.
is 115 Hz and 76 dB on stressed syllables, compared with 85 Hz and 70 dB on unstressed syllables. Because pitch height is a cue to tone perception in their native language, Vietnamese listeners are sensitive to pitch contrasts. As a result, items spoken by a speaker with a greater pitch range (accompanied by enhanced intensity) are more easily identified than those spoken by a speaker with a more restricted pitch range.

3.3.3 Analysis of the perceptual error patterns

3.3.3.1 Comparisons among native listener and learner groups

Table 5 shows the percentage of responses for each stress pattern. The column labels indicate the subjects’ choice of stimulus stress pattern in response to the target contexts, which are indicated by the row labels. As shown in Table 5, the error patterns differ between native and non-native listeners. Native listeners tended to choose the narrow-focus pattern (23.9%) for the compound context, which means that some narrow stress patterns were misperceived as compounds. However, the narrow-focus context was more likely to elicit an erroneous broad-focus stress pattern than a compound pattern (15% of responses to the narrow context selected the broad pattern, against 7.1% for the compound pattern). Conversely, the broad context was more likely to elicit an erroneous narrow stress pattern than a compound pattern (23.8% of responses to the broad context selected the narrow pattern, against 20.6% for the compound pattern). This indicates native listeners’ tendency to confuse the two phrasal patterns, arguably because of the comparable duration of the two patterns and/or a lack of pitch-accent contrast between some narrow and broad stress patterns. By contrast, both groups of non-native listeners tended to confuse the broad pattern and the compound pattern. They tended to choose compound patterns for the broad context (beginners: 25.4%, advanced: 21.6%) and, conversely, broad stress patterns for the compound context (beginners: 28.8%, advanced: 33.5%). It is argued that, prosodically, this confusion may be due to the lack of pitch prominence on the second
element of many of the broad-pattern stimuli, due to the double accent and particularly to pitch declination utterance-finally, which made some broad-focus tokens sound similar to compound tokens. Note that the first element of the broad pattern is also accented (a pre-nuclear accent), so the second accented element needs a very sharp rise to be more prominent; in many tokens this prominence can be cancelled perceptually by the declination effect. There were certainly duration contrasts between the two patterns, but the Vietnamese learners were not sensitive to duration cues. This error pattern might also stem from differences in word order between the two languages: Vietnamese and English have contrasting word orders in compounds and noun phrases, adjective-noun in English but noun-adjective in Vietnamese. This contrast between the two languages may therefore have contributed to the Vietnamese listeners’ confusion and misperception.

Table 5
The percentage of responses for each stress pattern across three listener groups

                      Stimulus stress pattern chosen
Target context        Broad (%)   Narrow (%)   Compound (%)

Beginning listeners (12 items × 2 speakers × 80 listeners = 1920 items)
  Broad                 51.4        23.1         25.4
  Narrow                14.4        63.8         21.7
  Compound              28.8        21.9         49.0

Advanced listeners (12 items × 2 speakers × 20 listeners = 480 items)
  Broad                 59.3        18.9         21.6
  Narrow                 8.5        84.7          6.6
  Compound              33.5        15.8         50.4

Native English listeners (12 items × 2 speakers × 29 listeners = 696 items)
  Broad                 55.4        23.8         20.6
  Narrow                15.0        77.7          7.1
  Compound               8.6        23.9         67.3

The column labels indicate the subjects’ choice of stimulus stress pattern in response to the target context indicated by the row label.

3.3.3.2 Poor perceptual discrimination of the native listener group

The relatively poor performance of the native listener group in assigning stress patterns to
their appropriate elicitation contexts requires comment. We have previously noted the superior performance of a simple linear discriminator (which obviously makes no use of context) in correctly identifying the prosodic class membership of the stimulus tokens. It is also remarkable that high-functioning L2 non-native listeners should perform as well overall as native listeners on a task of phonological pattern recognition. This result calls for a critical analysis of the listening task. As one careful reviewer noted, performance on the perceptual task depended not only on the listener’s ability to distinguish the three prosodic contrasts, but also on correctly interpreting the pragmatic or semantic significance of the context sentence; the importance of the latter was probably enhanced by the testing method of offering a single context sentence per trial and asking the listener to choose the appropriate stress pattern (Compound, Narrow or Broad) from three candidates. A post hoc analysis of the felicity conditions for the C, N and B responses was therefore undertaken, with particular reference to the form and content of the triggering contexts in relation to the target sentences. It is important to consider specifically how the felicity conditions were instantiated lexically and syntactically in the stimulus materials, because such features may serve as cues for native or non-native response strategies. Non-native listeners may be able to form response strategies on the basis of lexical or formal cues present in the stimulus materials without fully appreciating the pragmatic conditions that render an N, B or C reading felicitous, or, in some cases, without appreciating ambiguities that render more than one N, B or C response possible in context.

The felicity condition on the selection of a C target is semantic in nature and critically dependent upon lexical knowledge. A C target is unambiguously signalled when it has a lexical (non-compositional) meaning
which is compatible with that of the context sentence, and its phrasal counterpart has a compositional reading that is incompatible with the context (e.g., Context: ‘… house … for growing plants.’ Target = ‘hothouse’; *‘hot house’). Post hoc analysis revealed that eight of the 12 C context–target pairs met this condition but four did not (e.g., Context: ‘… woman teaches English.’ Target = ‘English-teacher’, but ‘English teacher’ is not ruled out; see also C items 6, and in Appendix B). Removing these four items from the data analysis did not, however, substantially change the pattern of results reported in the ANOVAs in Section 3.3.2.1. Nevertheless, lexical familiarity probably did affect the non-native listeners’ performance on C targets relative to the native listeners, and this, rather than a lack of sensitivity to the temporal cue for compound words in English, may be a factor in the Vietnamese listeners’ higher error rates on C targets (see Table 5).

An N target is unambiguously signalled, at least in the experimental materials, when it has a compositional reading that countermands some attribution to the subject in the context sentence, expressed by negation and the use of a synonym or antonym of the adjective used in the target sentence (e.g., Context: ‘… house … not cold.’ Target = ‘hot house’; *‘hot-house’). The consistent use of the negative (‘not’ or verb + n’t) was probably a highly salient structural cue signalling an N target for the non-native listeners, even though they may not have fully appreciated the countermanding function of contrastive intonation. It should also be noted that the context sentence contained an intonational cue for the N target that was not present for the B or C targets, namely a contrastive pitch accent, which was also signalled orthographically for subjects in the written stimulus materials by the use of italics (e.g., ‘This
bottle isn’t colored yellow’). The felicity condition for the B, or broad-focus, phrase is difficult to state with precision, because it is the default reading, which applies when the context demands no particular narrow focus. The broad-focus context sentences used in the experiment could not strictly rule out a compound reading of the target. (After all, it may truly be said of most ‘blackberries’ that ‘This berry is black’.) Nor could the broad-focus triggers strictly be said to exclude a narrow-focus target reading, which can still produce a felicitous mini-discourse (This berry is black. It’s a black berry! (emphatic)). Non-native listeners may not have appreciated these ambiguities and may simply have followed a response strategy of choosing the B target whenever the context sentence mirrored the target (e.g., Context: ‘… berry is black.’ Target = ‘black berry’; C and N targets never occur with such ‘mirror contexts’ in the stimulus sentence set). Failure to appreciate the potential ambiguity of the broad-focus contexts may explain why the Vietnamese learners performed as well as or better than native English listeners in this condition.

Thus, an item analysis of the target sentences and their contexts revealed a complex set of moderating conditions, involving the pragmatics of focus assignment and lexical-semantic knowledge of English, that potentially affected listeners’ responses as well as their ability to discriminate phonetically amongst the target stress patterns. It also showed that non-native listeners may have formulated response strategies specific to the properties of the item set used in the experiment. Clearly, teasing out the operation of these factors would require a series of carefully designed control experiments, including at least a proportion of distracter items to discourage listeners from developing stimulus-set-specific response strategies.

General discussion

While the results of
the perceptual study are open to alternative interpretations, they are consistent with the findings of the production test in indicating that tonal transfer effects can explain beginner learners’ responses to the three contrastive stress patterns, and that as L2 competence in English increases, Vietnamese learners acquire sensitivity to the temporal cues that differentiate (compound) words from otherwise segmentally homophonous phrasal constructions in English. Indeed, it may be argued that the production task provided a clearer window on (inter-language) perceptual representations of the three prosodic patterns than did the perceptual experiment, because the former was uncontaminated by the pragmatic and lexical-semantic factors that moderated listeners’ responses in the perceptual judgement task. Some may argue, to the contrary, that the production task failed to tap a more abstract level of perceptual representation, because the subjects’ task was simply to mimic short context + target sentence pairs, for which long-term phonological representations of the L2 items are not required. But such a line of reasoning has difficulty accounting for the improved pattern-matching in the temporal domain evident in the productions of the advanced learners compared with the beginner group. Further experimentation, with variable time delays or an intervening task between the presentation of the stimulus items and the elicited imitations, may provide relevant data on this question.

Conclusion and prospectus

In summary, the results of the acoustic analysis of the production data provided clear support for what is probably a widely accepted view: that a combination of f0 and timing cues serves to distinguish minimal prosodic triplets of compound words and their broad- and narrow-focus phrasal counterparts. Compounds as a whole were shorter than phrases, which is argued to be due to a word
shortening effect. The narrow-focus noun phrases were marked by an extensive f0 change (a mean of 80 Hz from the first to the second constituent). Intensity cues played a supportive role to f0 perturbations in the data set spoken by native speakers (intensity change correlated with f0 change). The results also lend some support to the conventional phonological analysis that compounds and early narrow-focus phrases share the distinguishing property of being prosodically left-headed, in contrast to the default or broad-focus phrasal stress pattern. However, the f0 change measurements also indicated, for the maximally contrastive but nevertheless natural stimulus set used in the present experiment, that a three-way contrast (N > C > B) may also be phonetically supported. The magnitudes of the mean f0 changes that separate the centroids of the three stress patterns spoken by native speakers on the f0 dimension (Broad: 10 Hz, Compound: 39 Hz, Narrow: 80 Hz) were clearly in excess of the f0 discrimination limen for complex tones or the f0 limen for categorical tone identification (Gandour, 1978). In particular, the acoustic analysis of the native English speakers’ data also provides evidence for the claim that compounding involves phrases conforming to a temporal and accentual word template in stress-accent languages.

In terms of f0 cues, the two groups of non-native speakers generally had no problem in manipulating contrastive f0 and intensity levels on the accent-bearing syllables, as a result of positive transfer from lexical tonal pitch. In terms of timing, only the advanced speakers could discriminate compounds from phrases by means of a duration contrast, whereas the beginners failed to use the duration cue because it is not a distinctive tonal feature in Vietnamese. In terms of contrastive relative prominence patterns, the beginners also failed to realize the syntagmatic contrasts of accent in larger units such as polysyllabic words or phrases, as evidenced by their failure to deaccent the second element of
the compound and narrow-focus patterns, which is causally related to their failure to compress compounds. That is, their compounds are of comparable duration to phrases because of several effects. Compounds were unreduced as a result of (1) an accentual lengthening effect due to not deaccenting the nouns, and (2) not reducing unstressed syllables. Another contributing transfer effect is the lack, in L1 Vietnamese, of a reliable prosodic difference between compounds and phrases by means of either temporal compression or tonal neutralization. In addition, this suggests a transfer effect of the paradigmatic tonal pattern in which a lexical tone is preserved on each syllable, indicating a prosodic transfer effect at both the phonological and phonetic levels, consistent with Ueyama and Jun’s (1998) findings on the different interference effects of L1 Japanese and Korean post-focus tonal patterns. The results of this study not only confirm the transfer of acoustic prosodic cues (Nguyễn & Ingram, 2005; Ueyama, 2000) and of L1 phonetic f0 patterns per se (McGory, 1997; Ueyama & Jun, 1998), but also suggest the transfer of functional (phonological) prosodic patterns (i.e., paradigmatic vs. syntagmatic f0 contrast). On the other hand, the advanced speakers’ ability to deaccent the noun in the narrow and compound patterns and, to some extent, to compress the compound words indicates the effect of language learning/experience on prosodic acquisition.

Acknowledgements

We would like to thank Prof. Mary Beckman and the anonymous reviewers for their constructive and helpful comments. Thanks to our subjects for their participation, and to Jeffery Chapman for ToBI transcription of part of the data. The Postdoctoral Research Fellowship granted to N.T.A.T. by the University of Queensland is gratefully acknowledged.

Appendix A. Three types of test words/phrases

1 syllable + 2 syllables: Black berry, Blue bottle, Gray matter, Hot houses
2 syllables + 1 syllable: Moving van, Butter fish, Heavy weight, Rubber plant
2 syllables + 2 syllables: English teacher, Sleeping partner, Open classroom, Plastic money

Appendix B. List of minimal sets of sentences

1. black berry
   a. This berry is black. It is a black berry.
   b. This berry isn’t green. It’s a black berry.
   c. This is a kind of fruit. It is a blackberry.
2. blue bottle
   a. This is a bottle which is colored blue. It is a blue bottle.
   b. This bottle isn’t colored yellow. It’s a blue bottle.
   c. This kind of jelly fish is quite common here. It is a blue-bottle.
3. English teacher
   a. This teacher is from England. She is an English teacher.
   b. This teacher is not from France. She’s an English teacher.
   c. This woman teaches English. She is an English-teacher.
4. gray matter
   a. This person has a lot of gray stuff. He has a lot of gray matter.
   b. This person hasn’t a lot of green matter. He has a lot of gray matter.
   c. This person is very brainy. He has a lot of gray-matter.
5. hot houses
   a. These houses are very hot. They are hot houses.
   b. These houses are not cold. They’re hot houses.
   c. These houses are for growing plants. They are hot-houses.
6. moving van
   a. He is driving the van. It is a moving van.
   b. This van is not parked. It’s a moving van.
   c. This is a van for moving furniture. It is a moving-van.
7. sleeping partner
   a. This is her partner who is asleep. He is her sleeping partner.
   b. This is not her partner who is awake. He’s her sleeping partner.
   c. This is the person she sleeps with. He is her sleeping-partner.
8. heavy weight
   a. This man is heavy to carry. He’s a heavy weight.
   b. The man isn’t light to carry. He’s a heavy weight.
   c. He is the boxer in the heaviest weight group. He’s a heavy weight.
9. butter fish
   a. This fish is made from butter. It’s a butter fish.
   b. This fish is not made from flour. It’s a butter fish.
   c. This is a kind of tropical fish. It’s a butter fish.
10. plastic money
   a. This money is made from plastic. It’s plastic money.
   b. This is not paper money. It’s plastic money.
   c. This is a credit card. It’s plastic money.
11. rubber plant
   a. This plant is made from rubber. It’s a rubber plant.
   b. This is not a real plant. It’s a rubber plant.
   c. This is a kind of tree. It’s a rubber plant.
12. open classroom
   a. The classroom is open. It’s an open classroom.
   b. This classroom is not closed. It’s an open classroom.
   c. There are no rows of desks here. It’s an open classroom.

Appendix C. The answer sheet for the perception experiment

Instructions for the test

(1) This stocking is blue. It is a blue STOCKing.
(2) This stocking is not green. It is a BLUE stocking.
(3) This woman is a feminist. She is a blue-stocking.

“Blue stocking” in the above three sentences has different meanings depending on the three different stress patterns:
- “blue STOCKing” in (1) is a broad-focus noun phrase. The stress is on “stock”. It means a stocking which is blue in color.
- “BLUE stocking” in (2) is a narrow-focus noun phrase in which the word “BLUE” is emphasized to show the contrast with “green”; it means that the stocking is blue, not green, and thus BLUE is stressed.
- “blue-stocking” in (3) is a compound noun with lexical stress on “blue”. It is a word, not a phrase. It means “a feminist”.

Notice the three different stress patterns: blue STOCKing, BLUE stocking, blue-stocking.

In the following test, you will hear a context sentence followed by three different target sentences. Your task is to choose the target sentence with the stress pattern that fits the meaning of the context sentence. The context sentence will be read once and the three target sentences will be read twice. Circle the letter of your choice.

1. This berry is black.  A  B  C
2. This bottle isn’t colored yellow.  A  B  C
3. This person is very brainy.  A  B  C
4. These houses are very hot.  A  B  C
5. This van is not parked.  A  B  C
6. This is a kind of tropical fish.  A  B  C
7. This man is heavy to carry.  A  B  C
8. This is not a real plant.  A  B  C
9. This woman teaches English.  A  B  C
10. This is her partner who is asleep.  A  B  C
11. This is not paper money.  A  B  C
12. There are no rows of desks here.  A  B  C
13. This berry is black.  A  B  C
14. This bottle isn’t colored yellow.  A  B  C
15. This person is very brainy.  A  B  C
16. These houses are very hot.  A  B  C
17. This van is not parked.  A  B  C
18. This is a kind of tropical fish.  A  B  C
19. This man is heavy to carry.  A  B  C
20. This is not a real plant.  A  B  C
21. This woman teaches English.  A  B  C
22. This is her partner who is asleep.  A  B  C
23. This is not paper money.  A  B  C
24. There are no rows of desks here.  A  B  C
25. This berry isn’t green.  A  B  C
26. This kind of jelly fish is common here.  A  B  C
27. This person has a lot of gray stuff.  A  B  C
28. These houses are not cold.  A  B  C
29. This van is for moving furniture.  A  B  C
30. This fish is made from butter.  A  B  C
31. The man isn’t light to carry.  A  B  C
32. This is a kind of tree.  A  B  C
33. This teacher is from England.  A  B  C
34. This is not her partner who is awake.  A  B  C
35. This is a credit card.  A  B  C
36. The classroom is open.  A  B  C
37. This is a kind of fruit.  A  B  C
38. This is a bottle which is colored blue.  A  B  C
39. This person hasn’t a lot of green matter.  A  B  C
40. These houses are for growing plants.  A  B  C
41. He is driving the van.  A  B  C
42. This fish is not made from flour.  A  B  C
43. He is a boxer in the heaviest weight group.  A  B  C
44. This plant is made from rubber.  A  B  C
45. This teacher is not from France.  A  B  C
46. This is the person she sleeps with.  A  B  C
47. This money is made from plastic.  A  B  C
48. This classroom is not closed.  A  B  C
49. This is a kind of fruit.  A  B  C
50. This is a bottle which is colored blue.  A  B  C
51. This person hasn’t a lot of green matter.  A  B  C
52. These houses are for growing plants.  A  B  C
53. He is driving the van.  A  B  C
54. This fish is not made from flour.  A  B  C
55. He is a boxer in the heaviest weight group.  A  B  C
56. This plant is made from rubber.  A  B  C
57. This teacher is not from France.  A  B  C
58. This is the person she sleeps with.  A  B  C
59. This money is made from plastic.  A  B  C
60. This classroom is not closed.  A  B  C
61. This berry isn’t green.  A  B  C
62. This kind of jelly fish is common here.  A  B  C
63. This person has a lot of gray stuff.  A  B  C
64. These houses are not cold.  A  B  C
65. This van is for moving furniture.  A  B  C
66. This fish is made from butter.  A  B  C
67. The man isn’t light to carry.  A  B  C
68. This is a kind of tree.  A  B  C
69. This teacher is from England.  A  B  C
70. This is not her partner who is awake.  A  B  C
71. This is a credit card.  A  B  C
72. The classroom is open.  A  B  C

References

Archibald, J. (1998). Second language phonetics, phonology, and typology. Studies in Second Language Acquisition, 20, 189–211.
Atkinson-King, K. (1973). Children’s acquisition of phonological stress contrasts. Unpublished doctoral dissertation, University of California, Los Angeles.
Barker, C. (1989). Extrametricality, the cycle, and Turkish word stress. In J. Runner (Ed.), Phonology at Santa Cruz. UC Santa Cruz: Syntax Research Center.
Bartels, C., & Kingston, J. (1994). Salient pitch cues in the perception of contrastive focus. In P. Bosch, & R. van der Sandt (Eds.), Focus and natural language processing, Proceedings of the Journal of Semantics conference on focus. IBM working papers, TR-80.94-006.
Beckman, M. E. (1986). Stress and non-stress accent. Dordrecht, The Netherlands: Foris Publications.
Beckman, M., & Ayers, G. (1994). Guidelines for ToBI labelling. Unpublished manuscript, Ohio State University. Version March 1997. Downloadable manuscript <http://ling.ohio-state.edu/Phonetics/etobi_homepage.html> [For information on obtaining by ftp, send e-mail to tobi@ling.ohio-state.edu and visit <http://ling.ohio-state.edu/~tobi/>].
Beckman, M. E., & Pierrehumbert, J. B. (1986). Intonational structure in Japanese and English. Phonology Yearbook, 3, 255–309.
Best, C. T. (1995). A direct realist view of cross language speech perception. In W. Strange (Ed.), Speech perception and linguistic
experience: Issues in cross language research. Baltimore: York Press.
Blair, A. D., & Ingram, J. (2003). Learning to predict the phonological structure of English loanwords in Japanese. Applied Intelligence, 19, 101–108.
Bolinger, D. L., & Gerstman, L. J. (1957). Disjuncture as a cue to constructs. Word, 13, 246–256.
Brunelle, M. (2003). Coarticulation effects in Northern Vietnamese tones. In Proceedings of the 15th international congress of phonetic sciences, Barcelona, 3–9 August.
Cassidy, S. (1999). Compiling multi-tiered speech databases into the relational model: Experiments with the Emu System. In Proceedings of Eurospeech ’99, Budapest, September 1999.
Chafe, W. (1974). Language and consciousness. Language, 50, 111–133.
Chen, M. Y. (2000). Tone sandhi: Patterns across Chinese dialects. Cambridge, UK: Cambridge University Press.
Couper-Kuhlen, E. (1984). A new look at contrastive intonation. In R. Watts, & U. Weidman (Eds.), Modes of interpretation: Essays presented to Ernst Leisi (pp. 137–158). Gunter Narr Verlag.
Dauer, R. M. (1983). Stress-timing and syllable-timing reanalyzed. Journal of Phonetics, 11, 51–62.
Farnetani, E., & Cosi, P. (1988). English compound versus non-compound noun phrases in discourse: An acoustic and perceptual study. Language and Speech, 31, 157–180.
Faure, G., Hirst, D. J., & Chafcouloff, M. (1980). Rhythm in English: Isochronism, pitch, and perceived stress. In L. R. Waugh, & C. H. van Schooneveld (Eds.), The melody of language (pp. 71–79). Baltimore: University Park Press.
Flege, J. E. (1995). Second language speech learning: Theory, findings and problems. In W. Strange (Ed.), Speech perception and linguistic experience: Issues in cross language research. Baltimore: York Press.
Fletcher, J., & Harrington, J. (2001). High-rising terminals and fall–rise tunes in Australian English. Phonetica, 58(4), 215–229.
Fry, D. B. (1955). Duration and intensity as physical correlates of linguistic stress. Journal of the Acoustical Society of America, 27, 765–768.
Gandour, J. (1974). On the representation of
tone in Siamese. UCLA Working Papers in Phonetics, 27, 118–146.
Gandour, J. T. (1978). The perception of tone. In V. A. Fromkin (Ed.), Tone: A linguistic survey (pp. 41–72). New York: Academic Press.
Garde, P. (1965). Accentuation et morphologie. La Linguistique, 2, 24–39.
Gsell, R. (1980). Remarques sur la structure de l’espace tonal en Vietnamien du sud (Parler de Saigon). Cahiers d’études Vietnamiennes, 4, 1–26. Université Paris.
Hardcastle, W. J. (1968). Stress in Australian English. Unpublished M.A. thesis, University of Queensland.
Hồ, Đắc Túc (1997). Tonal facilitation of code-switching. Australian Review of Applied Linguistics, 20(2), 129–151.
Hoàng, Tuệ, & Hoàng, Minh (1975). Remarques sur la structure phonologique du Vietnamien. Études Vietnamiennes, 40. Hanoi.
Ingrisano, D., & Weismer, G. (1979). s-Duration: Methodological and linguistic factors. Phonetica, 36, 32–43.
Inkelas, S. (1999). Exceptional stress-attracting suffixes in Turkish: Representations vs. the grammar. In W. Zonneveld (Ed.), The prosody-morphology interface (pp. 134–187). Cambridge: Cambridge University Press.
Iverson, P., Kuhl, P. K., Akahane-Yamada, R., Diesch, E., Tohkura, Y., Kettermann, A., et al. (2003). A perceptual interference account of acquisition difficulties for non-native phonemes. Cognition, 87, B47–B57.
Jackendoff, R. (1972). Semantic interpretation in generative grammar. Cambridge: MIT Press.
Jannedy, S. (1997). Acquisition of narrow focus prosody. In Proceedings of the GALA ’97 conference: Language acquisition, knowledge representation & processing.
Kubozono, H. (1993). The organization of Japanese prosody. Tokyo: Kurosio Publishers.
Kuhl, P. K. (1993). Innate predispositions and the effects of experience in speech perception: The native language magnet theory. In B. de Boysson-Bardies, et al. (Eds.), Developmental neurocognition: Speech and face processing in the first year of life. Dordrecht: Kluwer Academic Publishers.
LaCharité, D., &
Paradis, C. (2005). Category preservation and proximity versus phonetic approximation in loanword adaptation. Linguistic Inquiry, 36(2), 223–258.
Ladd, D. R., & Morton, R. (1997). The perception of intonational emphasis: Continuous or categorical? Journal of Phonetics, 25, 313–342.
Ladd, D. R. (1980). The structure of intonational meaning. Bloomington: Indiana University Press.
Levi, S. V. (2002). Intonation of noun compounds and genitives in Turkish. Ninth international phonology meeting on structure and melody, 1–3 November, Vienna.
Levi, S. V. (2005). Acoustic correlates of lexical accent in Turkish. Journal of the International Phonetic Association, 35, 73–97.
McGory, J. T. (1997). Acquisition of intonational prominence in English by Seoul Korean and Mandarin Chinese speakers. Unpublished Ph.D. dissertation, Ohio State University.
Mennen, I. (2004). Bi-directional interference in the intonation of Dutch speakers of Greek. Journal of Phonetics, 32, 543–563.
Michaud, A., & Vu, Ngoc Tuân (2004). Glottalized and nonglottalized tones under emphasis: Open quotient curves remain stable, f0 curve is modified. In B. Bel, & I. Marlien (Eds.), Speech prosody: International conference (pp. 745–748), Nara, Japan, March 23–26. ISCA Archive.
Nguyễn, Đăng Liêm (1970). A contrastive phonological analysis of English and Vietnamese (Pacific linguistics series, no. 8). Canberra: Australian National University.
Nguyễn, Đình Hoà (1980). Language in Vietnamese society. Chicago: University of Illinois Press.
Nguyễn, Thi Anh Thư (2003). Prosodic transfer: The tonal constraints on Vietnamese acquisition of English stress and rhythm. Ph.D. thesis, University of Queensland, Australia.
Nguyễn, T. A. T., & Ingram, J. (2004). A corpus-based analysis of transfer effects and connected speech processes in Vietnamese English. In Proceedings of the tenth Australian international conference on speech science & technology, Macquarie University, Sydney, 8–10 December.
Nguyễn, T. A. T., & Ingram, J. (2005). Vietnamese acquisition of English
word stress TESOL Quarterly, 39(2), 309–319 Nguyeˆñ, T A T., & Ingram, J C (2007) Acoustic and perceptual cues for compound–phrasal contrasts in Vietnamese The Journal of the Acoustical Society of America, 112(3), 1746–1757 Nguyeˆñ, V L., & Edmondson, J (1997) Tones and voice quality in modern northern Vietnamese: Instrumental case studies Mon-Khmer Studies, 28, 1–18 Ph: am, Hoa Andrea (2003) Vietnamese tone: A new analysis New York: Routledge Pierrehumbert, J., & Hirschberg, J (1990) The meaning of intonational contours in discourse In P Cohen, J Morgan, & M Pollack (Eds.), Intentions in communication Cambridge, MA: MIT Press Pittam, J., & Ingram, J (1991) Influence of Vietnamese tone and prosody on the acquisition of English stress patterns In Proceedings of the second European conference on speech communication and technology meeting Riney, T J (1988) The interlanguage phonology of Vietnamese English Unpublished Ph.D., Georgetown University Rossi, M (1998) Intonation in Italian In D Hirst, & A Di Cristo (Eds.), Intonation systems: A survey of twenty languages Cambridge: Cambridge University Press Silverman, D (1992) Multiple scansions in loanword phonology: Evidence from Cantonese Phonology, 9, 289–328 Thompson, L (1987) A Vietnamese reference grammar Honolulu: University of Hawaii Press Trubetskoy, N S (1939) Grundzuege der phonologie (Travaux du Cercle linguistique de Prague No 7.) Prague: Cercle linguistique de Prague [Translated 1969, by C.A.M Baltaxe as Principles of phonology University of California Press.] 
Ueyama, M. (2000). Prosodic transfer: An acoustic study of L2 English vs. L2 Japanese. Unpublished Ph.D. thesis, University of California, Los Angeles.
Ueyama, Motoko, & Jun, Sun-Ah (1998). Focus realization in Japanese English and Korean English intonation. Japanese and Korean Linguistics, 7, 629–645.
Umeda, N. (1977). Consonant duration in American English. Journal of the Acoustical Society of America, 61, 846–858.
Vogel, I., & Raimy, E. (2002). The acquisition of compound vs. phrasal stress: The role of prosodic constituents. Journal of Child Language, 29, 225–250.
Vũ, Thanh Phương (1981). The acoustic and perceptual nature of tone in Vietnamese. Unpublished Ph.D. thesis, Australian National University, Canberra.
Vũ, Thanh Phương (1982). Phonetic properties of Vietnamese tones across dialects. Papers in South-East Asian Linguistics, 8, 55–76.
Willems, N. (1982). English intonation from a Dutch point of view. Dordrecht: Foris Publications.