The Perception of English Speech Sounds by Cantonese ESL Learners in Hong Kong

ALICE Y. W. CHAN
City University of Hong Kong
Hong Kong Special Administrative Region, China

TESOL QUARTERLY Vol. 45, No. 4, December 2011
doi: 10.5054/tq.2011.268056

This article reports on the results of a research study which investigated the perception of English speech sounds by Hong Kong Cantonese English as a second language speakers. A total of 40 university English majors participated in one categorial discrimination task and two second language (L2) minimal pair identification tasks, which aimed at discerning the participants’ perception of different English speech sounds. The results show that certain English speech sounds trigger more perception problems than others, but perception problems do not necessarily correspond to documented production difficulties. It is argued that learners’ preconception of word pronunciations may be a contributing factor for their perception problems. The position of a sonorant consonant may also play a role in perception, but positional effects do not seem to be as significant in the perception of obstruents as in that of sonorant consonants. It is suggested that remedial teaching on both perception and production should go hand in hand to enhance learners’ L2 phonology acquisition.

The role that one’s native language plays in the acquisition of a second or foreign language has often been of interest to language education specialists. Discussions of second language acquisition often attribute a learner’s second language (L2) learning difficulty to the differences between his or her native language and the target language. Examples of such perspectives include the Contrastive Analysis Hypothesis (CAH) proposed by Lado (1957) and the Markedness Differential Hypothesis (MDH) proposed by Eckman (1977). The Intralingual Markedness Hypothesis (IMH) developed by Carlisle (1988) incorporates markedness relationships within the target language and claims that, if the structures in the target language differing from those in the native language are in a markedness relationship, the more marked structures will be more difficult to acquire than the less marked ones. Despite their diverse theoretical underpinnings, these hypotheses all focus on the differences between the first language (L1) and the L2 and attempt to explain learner difficulties in terms of L1-L2 differences.

In L2 phonology acquisition research, the production of speech sounds is one major area of investigation. Carlisle (1988), for example, investigated Spanish speakers’ production of English onsets not found in Spanish and proposed the IMH. Major and Kim (1999) studied the production of English /dʒ/ and /z/ by Korean speakers to test their Similarity Differential Rate Hypothesis. Lo (2005) and Chan (2007) investigated the production of English final consonants by Cantonese English as a second language (ESL) learners in Hong Kong to assess the validity of the MDH. There have also been other research studies which did not address a particular theoretical framework but which also focused on production. Bohn and Flege (1992), for example, investigated the production of English vowels by adult German learners; Chan and Li (2000) and Deterding, Wong, and Kirkpatrick (2008) documented the production difficulties of Hong Kong Cantonese ESL learners; Deterding (2006) studied the pronunciation of English by speakers from China; Hung (2000) proposed a phonology of Hong Kong English based on the production of English speech sounds by a group of Hong Kong Cantonese ESL learners; and Chan (2006a, 2006b) examined Hong Kong Cantonese ESL learners’ production of English singleton codas and onset clusters.

However essential it is, speech production is not the only area of investigation in L2 phonology research. L2 learners’ perception of L2 speech sounds is another major area for the understanding of interphonology. It has been argued that speech perception bears an intimate relation with
speech production, in such a way that learners’ perception may affect the accuracy with which L2 phonetic segments can be produced (e.g., Munro & Derwing, 1995; Schmid & Yeni-Komshian, 1999). Flege (1995) claims that “without accurate perceptual targets to guide the sensorimotor learning of L2 sounds, production of the L2 sounds will be inaccurate” (p. 238). The Speech Learning Model developed by Flege (1995) and the Perceptual Assimilation Model developed by Best (1994) address learners’ speech perception to account for their speech production. Both of these models maintain that perception informs production in the learning of L2 speech sounds.

Research in the area of speech perception has focused either on perception alone or on the interaction between perception and production. Ingram and Park (1997), for example, investigated the perception of nonnative vowels by Japanese and Korean learners of English and found that the participants confused the /e/ and /æ/ vowels that are not contrastive in their languages but showed no difficulty perceiving other vowels contrastive in their languages. Pater (2003) studied the perceptual acquisition of Thai phonology by English speakers and found that learners’ perception of aspiration outperformed their perception of voice. The place of articulation was also found to have interacted with perception of laryngeal distinctions. Flege and Mackay (2004) examined the perception of English vowels by Italian native speakers and concluded that learning an L2 in childhood did not guarantee a native-like perception of L2 vowels.

Studies which attempted to explore the interaction between perception and production include Proctor (2004), who investigated the production and perception of Australian English vowels by Vietnamese and Japanese ESL learners and found evidence for the transfer of skills in the perception of duration from L1 to L2. Chan (2001) investigated the
perception and production of English word-initial consonants by Cantonese speakers and found a positive correlation between the two: Learners who consistently mispronounced the target consonants had significantly poorer perceptual performance than those who consistently produced the same sounds correctly. Having observed enhanced intelligibility of Japanese learners’ production of the English /r/ and /l/ contrast resulting from perceptual training, Bradlow, Pisoni, Akahane-Yamada, and Tohkura (1997) claimed that perceptual knowledge gained in perceptual training could be transferred to learners’ production domain, and that there might be a common mental representation determining both speech perception and speech production. The importance of speech perception in L2 phonology and the effects of perception on production are evident.

PHONOLOGY ACQUISITION BY CANTONESE ESL LEARNERS IN HONG KONG

Research into L2 phonology acquisition by Cantonese ESL learners in Hong Kong has mainly focused on production. Learner difficulties in speech production are often attributed to the differences between the English and Cantonese phonological systems and their inventory gaps (e.g., Chan & Li, 2000; Jones, 1979). It has often been documented that English consonants nonexistent in Cantonese, such as /θ/, /z/, /r/, / /, are difficult for Cantonese ESL learners to produce. In spoken production, these consonants are often replaced by similar consonants found in the native (e.g., [ts] for / /) and/or the target languages (e.g., [f] for /θ/; [w] for /r/) either in the same category (e.g., fricatives replacing fricatives) or across categories (e.g., sonorant consonants replacing fricatives) (e.g., Bolton & Kwok, 1990; Chan & Li, 2000; Hung, 2000). Those consonants with comparable equivalents in Cantonese, such as /f/, /s/, and /w/, are not difficult to produce, but they are often used to substitute for target English consonants nonexistent in Cantonese.

Final /l/, often
surfacing as dark [ɫ] in Received Pronunciation (RP) English (Ladefoged, 2006; Roach, 2000), has often been omitted or replaced by [u] (vocalization), especially when the preceding vowel is a front vowel (Bolton & Kwok, 1990; Chan, 2006a, 2007; Chan & Li, 2000; Deterding et al., 2008; Hung, 2000), resulting in nondifferentiation of words such as dew and dill (both being pronounced as [dIu] or [dIu:]). Initial /n/ is often pronounced as [l] (Wong & Setter, 2002). Devoicing of voiced consonants, especially of final ones, is also common. It has been found that Cantonese ESL learners do not actualize the voicing contrast between voiced and voiceless obstruents in production (Chan & Li, 2000; Chan, 2006a, 2007), so words such as sip and zip and safe and save are practically indistinguishable.

Some English vowels, such as /ɜ:/ and /æ/, do not exist in Cantonese and are often replaced with similar Cantonese vowels (e.g., Cantonese [œ] for English /ɜ:/, Cantonese [e] for English /æ/; Chan & Li, 2000; Sewell, 2009). Other pairs of English vowels, such as /i:/ and /I/, /u:/ and /ʊ/, and /ɔ:/ and /ɒ/, are often confused with one particular Cantonese vowel, namely /i/, /u/, and /ɔ/, respectively. When pronouncing either vowel of these tense and lax vowel pairs, some learners may use a lax vowel for a tense one, others a tense vowel for a lax one, and still others may produce a vowel intermediate between the tense and lax ones (Chan & Li, 2000; see Appendix I for a comparison between English and Cantonese). Diphthongs are also often reported to have been replaced by pure vowels, such as /e/ for /eI/, and /a:/ for /aI/, especially before a final nasal (Chan, 2004; Chan & Li, 2000).

Despite the well-known findings concerning Cantonese ESL learners’ difficulties in producing English speech sounds and the documentation of the importance of perception to production in the literature, there is a lack of research into the perception of English speech sounds by Cantonese ESL learners (Chan, 2001 is
a notable exception). Investigating the way English speech sounds are perceived by a Cantonese ESL learner may give researchers valuable insights into the learner’s interlanguage.

THE PRESENT STUDY

Objectives of the Study

The present study aimed to explore the perception of English speech sounds by Cantonese ESL learners in Hong Kong. In accordance with previous claims that perception informs production in the learning of L2 speech sounds, the following hypotheses were postulated:

1. Speech sounds documented in the literature as posing production problems for Cantonese ESL learners are difficult to perceive.
2. Speech sounds documented in the literature as not posing production problems for Cantonese ESL learners do not pose perception problems.
3. If perception problems are found for both easy (easy to produce) and difficult (difficult to produce) sounds, Cantonese ESL learners’ perception of easy sounds will be better than that of difficult sounds.

The objectives of the study were (a) to test the above hypotheses by analyzing Cantonese ESL learners’ perception performance within the context of documented production evidence and (b) to identify the factors which may affect L2 perception.

Perceptual Targets

English consonants which have been documented in the literature as posing production problems for Cantonese ESL learners were included in the study as perceptual targets. For obstruents, these included all the fricatives and affricates nonexistent in Cantonese except /ʒ/, namely /ð/, /θ/, /v/, /ʃ/, /z/, /dʒ/, and /tʃ/. /ʒ/ was excluded because of its limited distribution. Plosives were also excluded because of Cantonese ESL learners’ usual production patterns: Initial plosives are not at all problematic, yet final plosives are often unreleased, giving rise to an absence of articulatory differences (Chan & Li, 2000; Chan, 2006a). For sonorant consonants, all problematic sounds documented, namely /ŋ/, /n/ (in
initial position), /l/ (in final position), and /r/, were included.

Because in production there often exist one or more common substitutes for each of the above problematic consonants, the common substitutes for the problematic consonants were also included in the study as perceptual targets and were paired up with the corresponding problematic ones. The only exception was the two affricates, which were paired up with themselves (the most common substitutes for English affricates are the corresponding Cantonese affricates and could not be included in the perception tasks). As a result, seven obstruent pairs (namely /ð, d/, /θ, f/, /v, f/, /v, w/, /ʃ, s/, /z, s/, /dʒ, tʃ/)¹ and five sonorant consonant pairs (namely /ŋ, n/, /n, l/, /l, u:/, /r, l/, /r, w/) were contrasted in the study. The bolded items were the problematic sounds for production (i.e., the first item in each pair except /dʒ, tʃ/, in which both are problematic), whereas the unbolded ones were the common substitutes which do not normally cause production problems.

¹ The classification of /v, w/ under obstruents was made following the category of the difficult (to produce) sound (/v/), not the substitute (/w/). The same rationale applied to /l, u:/ (classified under sonorant consonants instead of vowels).

Unlike consonants, only a few vowels have been documented in the speech production literature as common substitutes for others (e.g., /e/ for /æ/, /a:/ for /aI/). A number of other vowels have been described as indistinguishable from others instead, such as tense and lax vowels (see earlier). Therefore, vowels were paired up in the study either as indistinguishable pairs or target and substitute pairs. A total of six vowel pairs were thus included, namely /i:, I/, /u:, ʊ/, /ɔ:, ɒ/, /æ, e/, /aI, æ/, and /aI, a:/. The bolded items were the problematic sounds (i.e., the first item in /æ, e/ and /aI, a:/ and all the others) for production, whereas the unbolded ones were the common substitutes used
for contrast.

Procedures

Participants

Forty Hong Kong Cantonese ESL learners (29 females and 11 males) participated in the study. Their ages ranged from 19 years to 42 years at the time of the study. They were all English majors studying at three local universities, including Year students, 22 Year students, and 10 Year or postgraduate students. All of the students started their learning of English at the age of years or below, so they had all learned the language for 13 years or more at the time of the study. Twenty-six students had received some form of phonetics training (such as taking a phonetics and phonology or a pronunciation course), and the accent they learned was RP English.² Although no measures were administered to test the participants’ proficiency levels, they could all be regarded as advanced ESL learners, in view of their exposure to English and their major area of study (i.e., English).

Having a greater number of female participants than male participants was virtually unavoidable, because in Hong Kong female language majors significantly outnumber male ones. The inclusion of students with phonetics training as well as students without phonetics training was intended to ensure a more representative sample of advanced learners. No attempt was made to compare the performance of the two groups.

² It may be argued that not much RP was really used in the phonetics and phonology or pronunciation courses taken by the participants, given that many ESL teachers, native or nonnative English speakers alike, do not speak RP themselves. However, because the accent the participants had learned was indicated by the participants themselves, such information is reported in this article as is.

Stimuli Preparation

Three sets of English stimuli (isolated English phones, isolated English words, and English words embedded in sentences) for use in three perception tasks were spoken in RP English by a female
phonologist in a soundproof room and recorded using a high-quality mini-disk recorder (SONY MZ-R910; for details of the perception tasks, see later). Before the actual experiments, the stimuli were played to two native speakers of English, who gave confirmatory judgments that the target stimuli were accurate representations of the target sounds. During the experiments, these stimuli (together with accompanying research materials) were presented to the participants individually at a comfortable volume over earphones in a quiet room.

The use of stimuli spoken in RP English may be considered inappropriate to the Hong Kong context, given that RP English is not widely used in the territory and the majority of local ESL speakers, English teachers inclusive, typically speak English with an identifiable Hong Kong accent rather than conforming to a British reference norm. However, as Deterding et al. (2008) point out, Hong Kong English is only in the process of developing its own identity, and the city is still looking elsewhere for its norms. Regardless of which English accent is the dominant variety, it is best to adopt a standard model in a perception study in an ESL setting like the present one. RP English, being the most widely accepted model in pronunciation teaching and learning and the accent that many participants claimed to have learned, was chosen as the accent used for the perception tasks.

Data Collection Tasks

The participants took part in one categorial discrimination task and two L2 minimal pair identification tasks individually. A research assistant was responsible for implementing all the experiments and gave a short briefing session to each participant to ensure that he or she was clear about the requirements. Verbal instructions, mainly in Cantonese or a mixed code of Chinese and English, were also given where necessary.

Categorial discrimination task (Task 1)

The first task was a categorial AXB discrimination test based on Best, McRoberts, and Goodell (2001).
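
As a rough illustration of how an AXB test of this kind can be laid out and scored, the following Python sketch builds the four possible presentation orders for one contrast pair and computes a listener's proportion of correct answers. It is only a sketch under stated assumptions: the function names `axb_trials` and `score`, the string labels standing in for recorded phones, and the sample responses are all hypothetical and are not the study's materials or software. The answer "A" means the middle item matches the first item, and "B" means it matches the third.

```python
def axb_trials(a, b):
    """Return the four AXB orders for one contrast pair (hypothetical helper).

    Each trial is (first, middle, last, answer). The answer is "A" when the
    middle item X is the same as the first item, and "B" when it is the same
    as the last item.
    """
    orders = [(a, a, b), (a, b, b), (b, b, a), (b, a, a)]
    return [(first, x, last, "A" if x == first else "B")
            for first, x, last in orders]


def score(responses, trials):
    """Proportion of responses that match the expected A/B answers."""
    correct = sum(r == t[3] for r, t in zip(responses, trials))
    return correct / len(trials)


# One contrast pair, using plain strings in place of the recorded phones.
trials = axb_trials("i:", "I")
for trial in trials:
    print(trial)

# Made-up responses from one listener: three of four judgments correct.
print(score(["A", "B", "A", "A"], trials))

# 7 obstruent + 5 sonorant + 6 vowel pairs, four orders each.
print((7 + 5 + 6) * 4)
```

With the 18 contrast pairs reported for the study, four orders per pair gives 72 trials, which agrees with the number of trios used in the task.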
Seventy-two trios of isolated phones, with two instances of one phoneme in different positions and one instance of another phoneme in the remaining position for each trio (e.g., /i:/ /i:/ /I/ or /i:/ /I/ /I/), were presented to the participants over earphones. These phone stimuli were recorded as individual phones rather than as phones excised from spoken words. For each trial, the participants had to determine whether the middle item spoken (X) was the same as the first (A) or the third (B) item and to indicate their judgment by circling the answer (i.e., either A or B) on the response sheet, which showed 72 corresponding lists of A X B. All the four trial types, AAB (e.g., i:, i:, I), ABB (e.g., i:, I, I), BBA (e.g., I, I, i:), and BAA (e.g., I, i:, i:), were presented for each pair of contrasts. Following Best and Strange (1992), the interstimulus interval was set to s to minimize backward and forward masking between adjacent stimuli (see Appendix II for a summary of the contrast pairs used).

L2 minimal pair identification tasks

Word identification task (Task 2)

The second task was a word identification task. Eighty-five English words (e.g., fan) spoken in isolation twice with an interstimulus interval of about s between the two tokens of each word were presented. A response sheet with the recorded word (e.g., fan) and a word differing in only one phoneme (e.g., van) was given to the participants, who listened to the recording and identified the word they had heard from the corresponding pair (e.g., fan, van) on the response sheet. About four to five minimal pairs were presented for each contrast. For vowels, care was taken to ensure that the target sounds appeared in a range of phonological environments, and for consonants, care was taken to ensure that the target sounds appeared in both initial and final positions, except for sounds which are only permissible in a certain position in RP (e.g., /ŋ/, /r/, /w/) and those which were included in the
study for contrast only in a particular position (e.g., /n, l/ were included for contrast only in initial position; see Appendix II for a list of minimal pairs used).

Picture identification task (Task 3)

The third task was a picture identification task. In this task, target English words were placed in carrier sentences (e.g., Now I say _fan_) instead of spoken in isolation to avoid the list effect which may have resulted from the previous task. For each trial, the participants were presented with two pictures showing the target English word and a word differing in only one phoneme (e.g., a picture of a van labeled A and a picture of a fan labeled B). They had to listen to the recording and determine whether Picture A or Picture B represented the word being spoken in the carrier sentence (e.g., fan) and circle the label (A or B). Care was taken to ensure that the target words were identifiable from the pictures and the pictures were clear. Where it was necessary to use a word that was difficult to represent in a picture, a cueing sentence was given alongside the picture. For example, the picture for the word save showed a floppy disk and a cueing sentence which read “Click this icon to .” Like the previous task, about four to five minimal pairs were presented for each contrast. Care was also taken to ensure that the target sounds appeared in a range of phonological environments unless otherwise prohibited by the phonotactics of the language or the purpose of the study. A total of 84 picture pairs were used (see Appendix II for a list of picture words used).

All the words included in the two L2 minimal pair identification tasks were simple monosyllabic or disyllabic words (e.g., fan, river) to minimize perceptual efforts. For a small number of minimal pairs included in Tasks 2 and 3, it may be the case that a certain word (e.g., though) is more commonly used than the other (e.g., dough). Given that all the
participants were advanced ESL learners, such differences in word frequencies did not impinge on the implementation of the tasks.

Two identification tasks (Tasks 2 and 3) instead of one were included to introduce some variety in the presentation of visual prompts: In Task 2, the visual prompts were given in the form of spelling pairs in the response sheets. Such word spellings might have given unexpected cues to the participants. In Task 3, the visual prompts were given in the form of picture pairs only, so such unexpected spelling cues were avoided. These two identification tasks were different from the previous categorial discrimination task (Task 1) in three respects: (a) The stimuli presented to the participants in Task 1 were isolated phones, whereas those presented in Tasks 2 and 3 were sounds embedded in words; (b) no visual prompts were given to the participants in Task 1, whereas visual prompts in the form of spelling pairs or picture pairs were given in Tasks 2 and 3; and (c) Task 1 was a discrimination task which required participants to determine whether the presented sounds were the same or different based on their deliberate comparison between the stimuli, whereas Tasks 2 and 3 were identification tasks which required participants to identify the appropriate words or pictures which matched the spoken stimuli.

Data Analysis

The percentages of the participants’ correct judgments on each sound (or sound pair) in each task and in all the tasks were computed by dividing the number of accurate judgments by the total number of tokens presented. The resultant accuracy rates were used to uncover (a) the L2 sounds or sound categories that the participants often misperceived, (b) the frequency with which a particular target sound was misperceived as another, and (c) the participants’ relative difficulty in perceiving a target sound and its most common production substitute(s), if any. Proportion Z tests were used to determine the significance of the differences in the participants’
performance on different sounds or sound pairs and/or their performance in different tasks. A Proportion Z test is a test of the significance of the difference between two proportions from independent samples (Davis, 1982). Assuming that the samples are normally distributed, if Z (the Z value) exceeds 1.96, then the difference between the two proportions is significant at the 0.05 significance level. Otherwise, the difference can be attributed to sampling errors. Given that the results were presented as accuracy rates (proportions), Proportion Z tests were deemed the most appropriate statistical analysis for the study.

RESULTS

In the following sections, the participants’ performance on the three categories of obstruents, sonorant consonants, and vowels is discussed first. This is followed by discussion of the participants’ performance in the different perception tasks. A brief comparison between the participants’ performance on the problematic sounds and on their substitutes is also given as a basis to test the three hypotheses which underlay the study.

PERFORMANCE ON DIFFERENT CATEGORIES OF SOUNDS

OBSTRUENTS

The participants achieved an overall accuracy rate of 86% for obstruents. Their perception of the fricative pair /ʃ, s/ was the best. The overall accuracy rate over the three tasks for these two fricatives was nearly perfect (99.6%). Their perception of the fricative pair /θ, f/ was the poorest, regardless of task requirements: Overall accuracy rate was only 64% (65% for /θ/ and 63% for /f/). These findings are in line with Tabain’s (1998) findings on native speakers’ perception of fricatives, such that /f/ and /θ/ are likely to be confused, whereas /s/ and /ʃ/ are not. Other obstruent pairs also presented some difficulties to the participants, but to a much lesser extent (e.g., 92% accuracy rate for /ð, d/, 91% for /v, w/, 90% for /dʒ, tʃ/, 86% for /v, f/, and 83% for /z, s/; see Table 1).³

³ In all the tables, the data are presented as results on a contrast pair
and as results on individual items in the pair. Should the accuracy rate of an individual item be lower than 100%, then the misperceived tokens were perceived as the other item in the corresponding contrast pair.

[...] common substitutes. When contrasted with /w/, /r/ received an overall accuracy rate of 95%, whereas /w/ only received an overall accuracy rate of 89%. This difference was statistically significant at the 0.05 significance level (Z = 2.53). Similar patterns were found for /l, u:/ (94% and 74%; Z = 6.17; see Table 3).

Some interesting findings contrary to documented production evidence also emerged from the participants’ perception of vowels. /aI/ and /a:/ both received 100% accuracy rates when contrasted with each other. /e/, a vowel documented as a common substitute for /æ/ in production, was among the most problematic vowels for perception. Its overall accuracy rate was only 79%, which was not significantly different from the accuracy rate of /æ/ (75%; Z = 1.12).

DISCUSSION

Learners’ Perception Performance Within the Context of Documented Production Evidence

It can be seen from the results of the study that none of the hypotheses concerning the relationship between speech perception and speech production, as set out at the beginning of the article, were borne out. Speech sounds often documented as causing production problems for Cantonese ESL learners did not necessarily pose problems for the participants in the study (Hypothesis 1), whereas some sounds often documented as unproblematic for production indeed posed perception problems for some of the participants (Hypothesis 2). The participants’ perception of speech sounds documented as easy to produce was not better than their perception of speech sounds documented as difficult to produce (Hypothesis 3).

The exclusive focus of the present study on perception without any production evidence from the same group of learners does not license any solid claims about
the relationship between perception and production. However, given the deviations of the findings from the premises of the hypotheses, there is reason to believe that ESL learners’ speech perception may not display a strict one-to-one positive correlation with their speech production, and it is yet to be ascertained whether Bradlow et al.’s (1997) claim about a common mental representation for perception and production is applicable to advanced Cantonese ESL learners. Further research with both production and perception data from the same group of learners is needed to establish a precise relationship between the two areas of speech learning.

Effects of L1

The results of the present study suggest that, unlike production difficulties which have often been explained in terms of the differences between the native and target languages (e.g., in CAH, MDH, and IMH, as mentioned earlier), contrastive differences between the native and target languages do not have significant effects on perception: Absence of a nonnative contrast in a learner’s native language (e.g., /aI, æ/; /r, l/; and /ʃ, s/) does not inevitably result in difficulty in perceptually tuning in to the differences, and presence of a nonnative contrast in a learner’s native language (e.g., /ŋ, n/) does not categorically facilitate perception. A learner’s mother tongue repertoire does not necessarily form the basis of perceptual abilities, and inventory gaps may not be the principal source of perception problems.

It might be argued that similarity between the L1 and L2, rather than differences, is a more important factor affecting L2 perception. Flege (1987) claims that similar sounds are difficult to acquire because a learner classifies or perceives them to be equivalent to those in their L1, whereas dissimilar sounds are easier to learn because the differences between the L1 and L2 sounds are more easily noticed. Take /θ, f/ as an example. Articulatorily, /θ/ and /f/ only differ in the place of
articulation: The former is dental but the latter is labiodental (Roach, 2000). Acoustically, both have relatively low intensity (Gimson & Ramsaran, 1989) and have the main noise energy in the high-frequency band from about 6,000 to 8,000 Hz (Fry, 1979). Would the two sounds, differing only slightly, be too similar for discrimination, resulting in the poor performance? Similarly, although Cantonese does not have long and short vowel phonemes comparable to English /i:, I, ɔ:, ɒ, u:, ʊ/, the Cantonese short vowels in question, /i, ɔ, u/, all have long and short allophones realized in different contexts. For example, the two allophones of the Cantonese high front vowel /i/ are [i:], which occurs before labials and alveolars such as /m/ or /n/; and [I], which occurs before velars such as /ŋ/ or /k/ (Bauer & Benedict, 1997). Would the L2 long and short vowels be so similar to the allophones of the corresponding L1 vowels that the learners cannot discriminate between the L2 vowels? How similar or dissimilar (different) a pair of L1 and L2 sounds is falls beyond the scope of the present article, yet it is evident that, for speech perception, mother tongue influence does not surface as a coefficient of inventory gaps. Further research is needed to examine the extent of crosslinguistic influences on perception.⁶

⁶ In another phase of the same study in which the relationship between perceived distance and perception was investigated, it has been found that the perceived similarity between a pair of L1 and L2 sounds has adverse effects on ESL learners’ perception of the L2 sound (see Chan, in press).

Positional Effects

The position of a consonant may also be a factor affecting L2 perception. As Redford and Diehl (1996) observed, initial consonants were significantly more perceptible than final consonants because the latter are less perceptually salient than the former. The results of the present study also
demonstrate such perceptual differences, especially with regard to sonorant consonants, although the observed effects may be skewed by the uneven distribution of sonorant consonants in the two positions. Initial sonorant consonants (e.g., /r/) are on the whole easier to perceive than final ones (e.g., /ŋ/). Where the same consonant can appear either word-initially or word-finally, it is the final position which poses more problems. The lateral /l/ is a good example. When occurring word-initially, the sound is easily identified, but when occurring word-finally, it is often misperceived. The allophonic articulatory features of dark [ɫ] as a consequence of its position trigger a relatively higher degree of perceptual difficulty.7

As for obstruents, positional effects also seem to operate. Final obstruents are on the whole more difficult to perceive than initial ones. However, a closer look at the findings suggests that voicing contrasts may also be a determining factor. The three obstruent pairs which trigger significant perceptual differences at different positions are all voiced and voiceless pairs (i.e., /v, f/, /z, s/, and /dʒ, tʃ/), whereas other obstruent pairs which do not differ in voicing (e.g., /θ, f/, /ð, d/) do not trigger significantly different performance when placed in different positions. In the speech production literature, terminal devoicing of obstruents has been documented as a very common phenomenon for ESL learners whose native language does not show voicing contrasts, including Cantonese (Chan, 2007; Eckman, 1981; Edge, 1991; Major & Faudree, 1996). As a result, final voiced and voiceless obstruents are often neutralized. Such production habits may also help explain why final obstruents are perceptually more difficult than initial obstruents. Learners who are used to neutralizing final /f/ and /v/ in words such as safe and save may tend to identify stimuli such as [seɪf] or [seɪv] as either word, but they may be less inclined to do so for stimuli such as
[fæn] or [væn]. Positional effects, thus, work in tandem with voicing effects in determining learners’ perception of final obstruents.

7 In the study, final /l/ was pronounced as dark [ɫ] following the RP accent. RP speakers sometimes vocalize final /l/, but such vocalization is typically limited to words with a labial articulation, such as careful or people (Cruttenden, 2001).

Effects of Previous Word Knowledge

Another issue which may affect L2 perception concerns learners’ preconceptions of word pronunciations. When given contrasting phones in isolation, most Cantonese ESL learners have the ability to discriminate the contrasts, whereas when contrasting phones are embedded in words, many more perception problems arise. These perception problems may be the result of the interference of predetermined word pronunciations. The acoustic stage of perception involves the reception of input sounds in the form of semicontinuous acoustic signals (Fromkin, Rodman, & Hyams, 2003). The discrimination of isolated phones belongs to this stage. However, in processing speech, higher-level linguistic information often combines with acoustic signals to determine category identity (Miller & Eimas, 1995), so perception involves not just the ability to detect sensory information but also the ability to make sense of what has been heard using higher orders of knowledge, including the listener’s knowledge of the structure of a word, as well as other properties derived from his or her general knowledge of situational and pragmatic constraints (Fromkin et al., 2003). In isolated word identification, listeners cannot draw on any syntactic, situational, or pragmatic information, but their knowledge of word pronunciations may operate to override the detected acoustic differences between minimal pairs, resulting in wrong decisions. For instance, a learner who is unaware of the difference between the pronunciations of the words fool /fu:l/ and full /fʊl/ may not realize the difference
between the two words, despite the contrast between the two vowels being distinguishable by ear. By the same token, a learner who has constantly taken [dɪu] or [dɪu:] as the pronunciation of dill /dɪl/ will mistake dew /dju:/ for dill, or vice versa. Given that the perceptual targets of the present study were determined on the basis of English speech sounds documented to be confusable in production, there is reason to believe that, in perceiving an English speech sound, Cantonese ESL learners do not rely on just the acoustic signals that their ears receive. Rather, their perception of English (isolated words) may be dependent on other psycholinguistic factors, such as previous word knowledge. It may be reasonable to suggest that a learner’s mental representation for perception is mediated by predetermined word pronunciations, converting input acoustic signals to forms which fit his or her shifted mental representation, resulting in incorrect perceptual judgments. With the existing pool of data and focus of the study, it is unclear to what extent and how previous word knowledge will affect a learner’s perception of word pronunciations. Further research into the perception and production of English words by the same learners is needed to explore the relationship between the two and to ascertain the precise effects of preconceptions of word knowledge on word perception.

PEDAGOGICAL IMPLICATIONS

Perception Problems Versus Production Problems

It may be argued that perception problems are not as serious as production problems because, given enough contextual cues, they may not cause serious communication problems. In meeting daily demands for perception of English speech sounds, ESL learners can resolve their difficulties with the help of contextual cues. However, in situations where word identification is vital, such as note-taking or identification of isolated words over the phone, perception problems may cause
frustration or result in misunderstanding. Perceptual training should, therefore, be another integral component of a holistic L2 phonology program, going hand in hand with production training. Pronunciation teaching models which give attention to both perception and production, such as that of Celce-Murcia, Brinton, and Goodwin (1996), are good exemplars.

Because the L2 speech sounds which appear easy to produce may still cause perception problems, whereas those which are difficult to produce may not be difficult for perception, ESL teachers should diagnose students’ problems to determine whether they are production problems or perception problems. If the root of a problem is perception related, then it should further be differentiated into discrimination and identification problems. For discrimination problems, effort should be put into helping learners identify the acoustic differences between different stimuli, such as /e/ and /æ/. Ear-training exercises may start with isolated contrasting phones, followed by minimal pair discrimination exercises. For identification problems, effort should be put into helping learners associate a certain label (e.g., the word set) with a certain set of acoustic signals (e.g., [set]). Identification exercises with words or pictures such as those used in the present study may help achieve this purpose.

It may be helpful, then, for teachers to perform a needs analysis, which is not necessarily time consuming or laborious, because it can be incorporated into regular pronunciation teaching and practice activities. Diagnostic tests of perception can be carried out by having teachers produce the target sounds in isolation or in minimal pairs, while students identify the spoken sounds or words. By the same token, diagnostic tests of production can be performed by having teachers show students minimal word pairs in spelling form or in pictures, with students reading the words aloud. Sporadic testing of students’ skills will suffice
in informing ESL teachers of learner needs.

Elements of Pronunciation Training

Ear-training exercises are not the only core elements of perception training. A perception training programme should include various kinds of audiovisual elements, such as recognition of the physical dimensions in the production of a speech sound. For example, it is fruitful for teachers to alert ESL learners to the observable physical differences between the production of /æ/ and that of /e/ (i.e., a greater degree of the lowering of the jaw in the production of the former than of the latter) when learners encounter minimal pairs such as sat and set, or when they encounter either element of the pair. Given that /ʃ/ is often produced by RP speakers with slightly rounded lips, whereas /s/ is not (Roach, 2000), learners may be able to identify the right sound if they are trained to focus on the speaker’s lip shape when they encounter words such as ship and sip. Research has also indicated that the use of visual cues is beneficial to learners’ perception of nonnative contrasts (Hazan, Sennema, Faulkner, Ortega-Llebaria, Iba, & Chung, 2006). Visual displays via spectrographic analysis, which have been adopted not only for the teaching of segmentals but also for the teaching of suprasegmentals (Anderson-Hsieh, 1994; Hardison, 2004; Molholt, 1988; Stibbard, 1996), could also be used. Students can be encouraged to practice the model pattern of a target phoneme and to visualize the acoustic differences between native speakers’ pronunciation and their own (Lambacher, 1999; Molholt, 1988).

Some form of awareness-raising exercises should also be employed when necessary. Perception training of voicing contrasts, for example, can be made more successful with the introduction of awareness-raising activities targeting the length difference between the preceding vowels. In final position, voiced obstruents are normally devoiced by native speakers. The distinction between final voiced and voiceless obstruents is
usually maintained by means of vowel duration; that is, the vowel preceding a voiced obstruent is lengthened, whereas the vowel preceding a voiceless consonant is shortened (Roach, 2000). If ESL learners are alerted to such durational differences, their perception of voicing contrasts in words such as set and said, or leaf and leave, will be more successful.

Priorities of Teaching

Although it may be true that helping students overcome as many difficulties as they can and achieving as much accuracy as possible is the ultimate target of speech training, it is suggested that ESL teachers should prioritize their teaching focuses. As Jenkins (2002) claims, concentrating on phonological and phonetic features which are crucial as safeguards of mutual intelligibility in English as an international language (EIL) is likely to be more effective than addressing every feature of learner speech which differs from that of standard native speaker speech. Not all problems are equally treatable, and not all problems are worth attending to.

The difference between /θ/ and /f/ may be an example of an untreatable perception problem, as evidenced by the participants’ consistently poor performance on the discrimination of these two sounds regardless of position and task requirements. /θ/ has also been described in the literature as a very difficult sound. Its exceptionally rare occurrence in the world’s languages (Maddieson, 1984) adds to the difficulty for ESL learners universally. Complaints from Cantonese ESL learners about their difficulties in producing the sound and their inability to distinguish it from /f/ both initially and finally are frequently heard (Chan, 2006a). Not only do Cantonese ESL learners often have problems with /θ/, but substitution of other sounds for this dental fricative is also common in Southeast Asian English (Deterding & Kirkpatrick, 2006). Studies investigating fricative perception by native
speakers also show that /θ/ and /f/ are most likely to be confused (Tabain, 1998). Given the universal difficulty of /θ/, the articulatory and acoustic similarities between it and /f/, as well as widespread substitution of /θ/ by [f] even in the inner circle, it is not at all apparent how useful intensive teaching of the sound will be. So, should this sound be included as an ESL perception teaching target, and if so, how important should teachers consider complete mastery of this sound to be? One important purpose of learning a language is communication (Fernandez Amaya, 2008), yet in Hong Kong, English is often used by ESL learners for communicating with non-Chinese speakers who are also nonnative speakers of English (Li, 1999), rather than just for communicating with native speakers of English. In this regard, the purpose of learning English is for international communication (EIL). Jenkins (2002) has argued against the appropriateness of including /θ/ as a pronunciation target for EIL, so teachers should strike a balance between the purpose of learning and the desired degree of mastery.

Another issue about teaching priorities concerns the use of contrastive analysis. It has been argued in the literature that raising learners’ consciousness of the differences between their mother tongue and the target language has facilitative effects on the teaching of pronunciation to bilingual learners (Hung, 1993). For speech production training, this should be true, because many production problems are due to the nonexistence of a target sound in the learners’ native language. However, given the findings of the present study, contrastive analysis may not be of equal importance in perception training. ESL teachers are advised to have different teaching priorities for the two different areas of speech training.

Other Considerations

In the implementation of pronunciation training, a few other factors also need to be taken into consideration. Learners’ proficiency level
is one important concern. Given the divergent findings of the present study from documented production difficulties in other research, it can be seen that, for advanced ESL learners, pronunciation training programmes should be research driven and be guided by findings from both production and perception studies. They should also be continuously reshaped by teachers’ own diagnoses. Because not all English speech sounds are difficult for advanced learners, and their production difficulties and perception difficulties may not coincide, it is counterproductive for teachers to spend excessive amounts of time on documented problems which are not at all problematic to their learners (Chan, 2010).

The purpose of learning, as mentioned earlier, is another important concern. The design of an appropriate pronunciation programme and the priorities of teaching should be led by learners’ learning purposes: Is the language being learned to fulfill perceptual needs (such as answering the phone), production demands, or both? Is it being learned for communication as an international language or for communication with native speakers exclusively? Is the goal to approximate a target language norm or to attain intelligibility?
Answers to these questions are vital determinants for planning pronunciation curricula.

Conclusion

In this article, I have reported on the results of a research study which investigated the perception of English speech sounds by Cantonese ESL learners in Hong Kong. It was found that advanced Hong Kong ESL learners do not have many problems in their perception of English speech sounds which have been documented as difficult for production. Where problems occur, there is no evidence to suggest that their perception problems correspond to documented production problems. Rather, preconception of word pronunciations may be a contributing factor for learners’ inability to distinguish minimal pairs. Learners’ native language repertoire does not necessarily form the basis of their L2 perceptual ability, and cross-linguistic influences work in tandem with other factors to govern speech perception. Further research is needed to include learners at different proficiency levels participating in both perception and production tasks. It may also be useful to conduct research which requires participants to listen to sounds that are read out to them directly instead of requiring participants to discriminate prerecorded stimuli played on headphones.

ACKNOWLEDGMENTS

The work described in this article was fully supported by a competitive earmarked research grant from the Research Grants Council of the Hong Kong Special Administrative Region, China [Project Number: CityU 1455/05H]. The support of the Council is acknowledged. I would also like to thank the participants of the study for their contribution and my research assistant for her administrative assistance.

THE AUTHOR

Alice Y. W. Chan is an Associate Professor at the Department of English, City University of Hong Kong. Her research interests include second language acquisition, English grammar, English phonetics and phonology, and lexicography.

REFERENCES

Anderson-Hsieh, J.
(1994). Interpreting visual feedback on suprasegmentals in computer assisted pronunciation instruction. CALICO Journal, 11, 5–16.
Bauer, R. S., & Benedict, P. K. (1997). Modern Cantonese phonology: Trends in linguistics: Studies and monographs 102. Berlin, Germany: Mouton de Gruyter.
Best, C. T. (1994). The emergence of native-language phonological influences in infants: A perceptual assimilation model. In J. C. Goodman & H. C. Nusbaum (Eds.), The development of speech perception: The transition from speech sounds to spoken words (pp. 167–224). Cambridge, MA: MIT Press.
Best, C. T., McRoberts, G. W., & Goodell, E. (2001). Identification of non-native consonant contrasts varying in perceptual assimilation to the listener’s native phonological system. Journal of the Acoustical Society of America, 109, 775–794, doi:10.1121/1.1332378
Best, C. T., & Strange, W. (1992). Effects of phonological and phonetic factors on cross-language perception of approximants. Journal of Phonetics, 20, 305–330.
Bohn, O., & Flege, J. E. (1992). The production of new and similar vowels by adult German learners of English. Studies in Second Language Acquisition, 14, 131–158, doi:10.1017/S0272263100010792
Bolton, K., & Kwok, H. (1990). The dynamics of the Hong Kong accent: Social identity and sociolinguistic description. Journal of Asian Pacific Communication, 1, 147–172.
Bradlow, A. R., Pisoni, D. B., Akahane-Yamada, R., & Tohkura, Y. (1997). Training Japanese listeners to identify English /r/ and /l/: IV. Some effects of perceptual learning on speech production. Journal of the Acoustical Society of America, 101, 2299–2310, doi:10.1121/1.418276
Carlisle, R. S. (1988). The effect of markedness on epenthesis in Spanish/English interlanguage phonology. Issues and Developments in English and Applied Linguistics, 3, 15–23.
Celce-Murcia, M., Brinton, D., & Goodwin, J. (1996). Teaching pronunciation: A reference for teachers of English to speakers of other languages. Cambridge, England: Cambridge University Press.
Chan, A. Y. W.
(2004). Investigating Cantonese ESL learners’ acquisition of English final consonants. Paper presented at the 31st Linguistic Association of Canada and the United States Forum 2004, University of Illinois at Chicago, IL.
Chan, A. Y. W. (2006a). Cantonese ESL learners’ pronunciation of English final consonants. Language, Culture and Curriculum, 19, 296–313, doi:10.1080/07908310608668769
Chan, A. Y. W. (2006b). Strategies used by Cantonese speakers in pronouncing English initial consonant clusters: Insights into the interlanguage phonology of Cantonese ESL learners in Hong Kong. International Review of Applied Linguistics in Language Teaching, 44, 331–355, doi:10.1515/IRAL.2006.015
Chan, A. Y. W. (2007). The acquisition of English word-final consonants by Cantonese ESL learners in Hong Kong. Canadian Journal of Linguistics, 52, 231–253.
Chan, A. Y. W. (2010). Advanced Cantonese ESL learners’ production of English speech sounds: Problems and strategies. System, 38, 316–328, doi:10.1016/j.system.2009.11.008
Chan, A. Y. W. (in press). Cantonese ESL learners’ perceived relations between “similar” L1 and L2 speech sounds: A test of the Speech Learning Model. The Modern Language Journal.
Chan, A. Y. W., & Li, D. C. S. (2000). English and Cantonese phonology in contrast: Explaining Cantonese ESL learners’ English pronunciation problems. Language, Culture and Curriculum, 13, 67–85, doi:10.1080/07908310008666590
Chan, C. P. H. (2001). The perception (and production) of English word-initial consonants by native speakers of Cantonese. Hong Kong Journal of Applied Linguistics, 6, 26–44.
Cruttenden, A. (2001). Gimson’s pronunciation of English (6th ed.). London, England: Arnold.
Davis, L. M. (1982). American social dialectology: A statistical appraisal. American Speech, 57, 83–94, doi:10.2307/454442
Deterding, D. (2006). The pronunciation of English by speakers from China. English World-Wide, 27, 175–198, doi:10.1075/eww.27.2.04det
Deterding, D., & Kirkpatrick, A. (2006). Emerging South-east Asian Englishes and intelligibility.
World Englishes, 25, 391–409, doi:10.1111/j.1467-971X.2006.00478.x
Deterding, D., Wong, J., & Kirkpatrick, A. (2008). The pronunciation of Hong Kong English. English World-Wide, 29, 148–175, doi:10.1075/eww.29.2.03det
Eckman, F. (1977). Markedness and the contrastive analysis hypothesis. Language Learning, 27, 315–330, doi:10.1111/j.1467-1770.1977.tb00124.x
Eckman, F. (1981). On the naturalness of interlanguage phonological rules. Language Learning, 31, 195–216, doi:10.1111/j.1467-1770.1981.tb01379.x
Edge, B. A. (1991). The production of word-final voiced obstruents in English by L1 speakers of Japanese and Cantonese. Studies in Second Language Acquisition, 13, 377–393, doi:10.1017/S0272263100010032
Fernandez Amaya, L. (2008). Teaching culture: Is it possible to avoid pragmatic failure? Revista Alicantina de Estudios Ingleses, 21, 11–24.
Flege, J. E. (1987). The production of “new” and “similar” phones in a foreign language: Evidence for the effect of equivalence classification. Journal of Phonetics, 15, 47–65.
Flege, J. E. (1995). Second language speech learning: Theory, findings and problems. In W. Strange (Ed.), Speech perception and linguistic experience: Issues in cross-language research (pp. 233–277). Baltimore, MD: York Press.
Flege, J. E., & Mackay, I. R. A. (2004). Perceiving vowels in a second language. Studies in Second Language Acquisition, 26, 1–34, doi:10.1017/S0272263104261010
Fromkin, V., Rodman, R., & Hyams, N. (2003). An introduction to language (7th ed.). Boston, MA: Thomson Heinle.
Fry, D. B. (1979). The physics of speech. Cambridge, England: Cambridge University Press.
Gimson, A. C., & Ramsaran, S. (1989). An introduction to the pronunciation of English (4th ed.). Kent, England: Edward Arnold.
Hardison, D. M. (2004). Generalization of computer-assisted prosody training: Quantitative and qualitative findings. Language Learning & Technology, 8, 34–52.
Hazan, V., Sennema, A., Faulkner, A., Ortega-Llebaria, M., Iba, M., &
Chung, H. (2006). The use of visual cues in the perception of nonnative consonant contrasts. Journal of the Acoustical Society of America, 119, 1740–1751, doi:10.1121/1.2166611
Hung, T. T. N. (1993). The role of phonology in the teaching of pronunciation to bilingual students. Language, Culture and Curriculum, 6, 249–256, doi:10.1080/07908319309525155
Hung, T. T. N. (2000). Towards a phonology of Hong Kong English. World Englishes, 19, 337–356, doi:10.1111/1467-971X.00183
Ingram, J. C. L., & Park, S. G. (1997). Cross-language vowel perception and production by Japanese and Korean learners of English. Journal of Phonetics, 25, 343–370, doi:10.1006/jpho.1997.0048
Jenkins, J. (2002). A sociolinguistically based, empirically researched pronunciation syllabus for English as an international language. Applied Linguistics, 23, 83–103, doi:10.1093/applin/23.1.83
Jones, I. (1979). Some cultural and linguistic considerations affecting the learning of English by Chinese children in Britain. English Language Teaching Journal, 34, 55–61.
Ladefoged, P. (2006). A course in phonetics (5th ed.).
Boston, MA: Thomson Wadsworth.
Lado, R. (1957). Linguistics across cultures. Ann Arbor, MI: The University of Michigan Press.
Lambacher, S. (1999). A CALL tool for improving second language acquisition of English consonants by Japanese learners. Computer-Assisted Language Learning, 12, 137–156, doi:10.1076/call.12.2.137.5722
Li, D. C. S. (1999). The functions and status of English in Hong Kong: A post-1997 update. English World-Wide, 20, 67–110, doi:10.1075/eww.20.1.03li
Lo, S. K. (2005). The acquisition of English final consonants by Cantonese learners of English as a second language in Hong Kong: A study to test the Markedness Differential Hypothesis. Paper presented at the 3rd Annual Hawaii International Conference on Arts and Humanities, Honolulu, HI.
Maddieson, I. (1984). Patterns of sounds. New York, NY: Cambridge University Press.
Major, R. C., & Faudree, M. C. (1996). Markedness universals and the acquisition of voicing contrasts by Korean speakers of English. Studies in Second Language Acquisition, 18, 69–90, doi:10.1017/S0272263100014686
Major, R. C., & Kim, E. (1999). The similarity differential rate hypothesis. Language Learning, 49, Supplement 1, 151–183, doi:10.1111/0023-8333.49.s1.5
Miller, J. L., & Eimas, P. D. (1995). Speech perception: From signal to word. Annual Review of Psychology, 46, 467–492, doi:10.1146/annurev.ps.46.020195.002343
Molholt, G. (1988). Computer-assisted instruction in pronunciation for Chinese speakers of American English. TESOL Quarterly, 22, 91–111, doi:10.2307/3587063
Munro, M., & Derwing, T. (1995). Processing time, accent, and comprehensibility in the perception of native and foreign-accented speech. Language and Speech, 38, 289–306.
Pater, J. (2003). The perceptual acquisition of Thai phonology by English speakers: Task and stimulus effects. Second Language Research, 19, 209–223.
Proctor, M. (2004). Production and perception of AusE vowels by Vietnamese and Japanese ESL learners. Paper presented at 2004 Australian Linguistic Society Annual
Conference, University of Sydney, Sydney, Australia, 13–15 July 2004.
Redford, M. A., & Diehl, R. L. (1996). The relative perceptibility of initial and final consonants. The Journal of the Acoustical Society of America, 100, 2693–2693, doi:10.1121/1.417050
Roach, P. (2000). English phonetics and phonology: A practical course (3rd ed.). Cambridge, England: Cambridge University Press.
Schmid, P., & Yeni-Komshian, G. (1999). The effects of speaker accent and target predictability on perception of mispronunciation. Journal of Speech, Language, and Hearing Research, 42, 56–64.
Sewell, A. (2009). World Englishes, English as a Lingua Franca and the case of Hong Kong English. English Today, 25, 37–43, doi:10.1017/S0266078409000066
Stibbard, R. (1996, August). Teaching English intonation with a visual display of fundamental frequency. The Internet TESL Journal, II.8. Retrieved from http://iteslj.org/Articles/Stibbard-Intonation/index.html
Tabain, M. (1998). Non-sibilant fricatives in English: Spectral information above 10 kHz. Phonetica, 55, 107–130, doi:10.1159/000028427
Wong, C. S. P., & Setter, J. (2002). “Is it ‘night’ or ‘light’?”
How and why Cantonese-speaking ESL learners confuse syllable-initial [n] and [l]. In A. James & J. Leather (Eds.), New sounds 2000: Proceedings of the fourth international symposium on the acquisition of second language speech (pp. 351–359). Klagenfurt, Austria: Universität Klagenfurt.

Appendix I
The English (RP) and Cantonese Consonantal and Vowel Systems

Consonantal Systems, by manner and place of articulation (E = English, C = Cantonese):
Plosives: E /p, b/ (bilabial), /t, d/ (alveolar), /k, g/ (velar); C /p, b/ (bilabial), /t, d/ (alveolar), /k, g/ (velar), /kw, gw/ (labio-velar)
Fricatives: E /f, v/ (labiodental), /θ, ð/ (dental), /s, z/ (alveolar), /ʃ, ʒ/ (palato-(post)alveolar), /h/ (glottal); C /f/ (labiodental), /s/ (alveolar), /h/ (glottal)
Affricates: E /tʃ, dʒ/ (palato-(post)alveolar); C /ts, dz/ (alveolar)
Nasals: E /m/ (bilabial), /n/ (alveolar), /ŋ/ (velar); C /m/ (bilabial), /n/ (alveolar), /ŋ/ (velar)
Lateral: E /l/ (alveolar); C /l/ (alveolar)
Approximants: E /w/ (bilabial), /r/ (palato-(post)alveolar), /j/ (palatal); C /w/ (bilabial), /j/ (palatal)

Vowel Systems:
English short vowels (7): /ɪ, e, æ, ɒ, ʊ, ʌ, ə/
Cantonese short vowels (7): /i, e, y, u, ɔ, œ, a/
English long vowels (5): /i:, u:, ɔ:, a:, ɜ:/
Cantonese long vowel (1): /a:/
English diphthongs (8): /ɪə, eə, ʊə, eɪ, aɪ, ɔɪ, aʊ, əʊ/
Cantonese diphthongs (10): /ei, ai, a:i, ɔi, ui, au, a:u, iu, ou, œy/
(Adapted from Chan & Li, 2000)

Appendix II
List of Phones and Word Pairs

Task 1 (Categorial Discrimination Task)
Contrasts included: /ʊ/ /u:/; /i:/ /ɪ/; /ɔ:/ /ɒ/; /æ/ /e/; /aɪ/ /a:/; /aɪ/ /æ/; /θ/ /f/; /ð/ /d/; /z/ /s/; /ʃ/ /s/; /v/ /f/; /dʒ/ /tʃ/; /v/ /w/; /ŋ/ /n/; /r/ /l/; /n/ /l/; /l, u:/; /r, w/

Task 2 (Minimal Pair Identification Task)
1 eat / it    2 bin / bean    3 beach / bitch
4 pick / peak    5 fool / full    6 hood / who’d
7 look / Luke    8 suit / soot    9 pod / pawed
10 Don / dawn    11 wok / walk    12 cot / caught
13 send / sand    14 man / men    15 beg / bag
16 bed / bad    17 set / sat    18 died / dad
19 back / bike    20 fan / fine    21 dime / dam
22 fight / fat    23 hi / [word missing]    24 ma / my
25 spa / spy    26 laugh / life    27 pie / pa
28 thin / fin    29 first / thirst    30 half / hearth
31 Ruth / roof    32 fought / thought    33 deaf / death
34 bayed / bathe    35 though / dough    36 dare / there
37 than / Dan    38 sell / shell    39 sign / shine
40 puss / push    41 see / she    42 mash / mass
43 bus / buzz    44 doze / dose    45 zip / sip
46 rice / rise    47 sink / zinc    48 surf / serve
49 save / safe    50 leaf / leave    51 fat / vat
52 vine / fine    53 joke / choke    54 H / age
55 Jane / chain    56 chunk / junk    57 G’s / cheese
58 we / v    59 verse / worse    60 vet / wet
61 wail / veil    62 ran / rang    63 gong / gone
64 win / wing    65 son / sung    66 lock / rock
67 load / road    68 read / lead    69 right / light
70 ray / lay    71 lick / Nick    72 no / low
73 knife / life    74 lock / knock    75 net / let
76 mew / mill    77 dill / due    78 few / fill
79 kill / queue    80 hill / Hugh    81 woof / roof
82 run / one    83 white / write    84 red / wed
85 ways / raise

Task 3 (Picture Identification Task)
1 bean / bin    2 hit / heat    3 tin / teen
4 sit / seat    5 ship / sheep    6 look / Luke
7 full / fool    8 pool / pull    9 hood / who’d
10 caller / collar    11 wok / walk    12 chalk / choc
13 stock / stalk    14 not / nought    15 send / sand
16 bend / band    17 men / man    18 said / sad
19 pen / pan    20 fat / fight    21 back / bike
22 cat / kite    23 laugh / life    24 la / lie
25 kill / queue    26 hill / Hugh    27 few / fill
28 mew / mill    29 nil / new    30 fan / van
31 few / view    32 leaf / leave    33 safe / save
34 fine / vine    35 deaf / death    36 fought / thought
37 fin / thin    38 first / thirst    39 free / three
40 breed / breathe    41 day / they    42 dare / there
43 dough / though    44 these / D’s    45 sink / zinc
46 rice / rise    47 race / raise    48 ice / eyes
49 sip / zip    50 badge / batch    51 choke / joke
52 H / age    53 cheese / G’s    54 junk / chunk
55 hand / hanged    56 ran / rang    57 son / sung
58 ton / tongue    59 win / wing    60 puss / push
61 cell / shell    62 save / shave    63 sign / shine
64 sheet / seat    65 verse / worse    66 west / vest
67 vet / wet    68 vent / went    69 v / we
70 read / lead    71 light / write    72 liver / river
73 load / road    74 lock / rock    75 ways / raise
76 red / wed    77 run / one    78 rest / west
79 write / white    80 life / knife    81 light / night
82 lock / knock    83 nine / line    84 no / low

Title	The Perception of English Speech Sounds by Cantonese ESL Learners in Hong Kong
Author	Alice Y. W. Chan
Institution	City University of Hong Kong
Subject	English
Type	article
Year of publication	2011
City	Hong Kong
