To appear in Ziolkowski, M. S., Noske, M., & Deaton, K. (Eds.) (1990), Papers from the parasession on the syllable in phonetics and phonology. Chicago: Chicago Linguistic Society.

Nusbaum, H. C., & DeGroot, J. (1991). The role of syllables in speech perception. In M. S. Ziolkowski, M. Noske, & K. Deaton (Eds.), Papers from the parasession on the syllable in phonetics and phonology. Chicago: Chicago Linguistic Society.

The Role of Syllables in Speech Perception

Howard Nusbaum and Jenny DeGroot
The University of Chicago

Traditionally, psychologists have viewed speech perception as a recognition process in which continuously varying acoustic patterns are matched to discrete phonemic representations in the listener's mind (e.g., see Pisoni, 1978, 1981). Once phonemes are recognized, words can be identified based on patterns of segments rather than on patterns of acoustic properties (see Pisoni, 1981). According to this view, the waveform is segmented into chunks of acoustic information and each chunk (or a portion of each chunk) is recognized as a phoneme. A classic problem facing theories of speech perception is to explain how the waveform is divided into acoustic segments that are appropriate for phoneme recognition. While it is possible to segment speech according to acoustic criteria, speech cannot be segmented according to the criteria that would be needed to provide appropriate correspondence with phonemes (see Fant, 1962). A further problem for theories of speech perception arises because the acoustic specification of specific phonemes changes with talker, phonetic context, and situation (Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967). Thus there are no context-independent, invariant acoustic properties that can be considered as criterial for recognizing a particular phonological segment. These problems make it difficult to explain how listeners recognize the segmental structure of speech.

One approach to dealing with these problems is to propose that speech perception is mediated by linguistic units other than the phoneme. The hope of this approach is that other types of linguistic units (e.g., phonetic features or syllables) may be less affected by the coarticulatory processes that affect the acoustic representation of phonemes. For example, Stevens and Blumstein (1978) have argued that phonetic features may be related directly to unique acoustic patterns, whereas Massaro (1972) has argued that the syllable is a better candidate for the fundamental unit of speech perception. Unfortunately, there is some empirical evidence against each of the candidates (e.g., Ohman, 1966; Walley & Carrell, 1983). Thus there has been a longstanding, unresolved question about the nature of the first level of linguistic representation of speech after auditory processing. To a degree, the question of how listeners represent the sound pattern of language has perhaps even achieved the same kind of notoriety for being undecidable as the question of whether mental images are represented as analog "pictures" or propositional "descriptions" (Anderson, 1978; Kosslyn, 1980; Pylyshyn, 1981). For some time, the issue seemed to be resolved by the argument that listeners actively use several levels of linguistic representation simultaneously, such that no particular level has primacy, and each may serve a different function. For example, Studdert-Kennedy (1976) suggested that speech may be segmented into syllables, which then serve as the basis for phonetic recognition. By this view, each individual syllable should provide all the
information necessary to support phoneme recognition.

The initial psychological research investigating the perceptual representation of the sounds of speech did not seem to provide any resolution whatsoever for this issue of the linguistic representation of speech sounds. In these early studies, subjects were instructed to listen for a particular target phoneme or syllable. During a trial of the experiment, subjects monitored for this target in a sequence of utterances, such as isolated syllables or vowels, pressing a button to indicate recognition of the target. On the basis of their seminal study, Savin and Bever (1970) argued that the syllable is the primary unit of speech analysis: Subjects recognized whole syllables faster than phonemes within syllables. The logic behind this conclusion is relatively straightforward, but perhaps overly simplistic. Subjects should be able to respond most quickly based on target representations that are available to consciousness in the shortest amount of time. If the sounds of speech are represented as whole syllables, then recognizing syllable targets should be faster than recognizing phoneme targets, because the phonemes have to be extracted or inferred after the syllables are recognized.

Subsequent studies argued against this logic and conclusion. These experiments indicated that the results of target monitoring may reflect a listener's expectations about the structure of the stimuli, even more than they reflect the perceptual units that mentally represent the sound patterns of those stimuli (Foss & Swinney, 1973; Healy & Cutting, 1976; McNeill & Lindig, 1973; Mills, 1980a, 1980b). Furthermore, studies that have used the target monitoring paradigm to investigate how listeners represent the sound patterns of speech have been qualified by the problems of disentangling task-specific strategies from the basic perceptual mechanisms used in recognizing spoken language (Norris & Cutler, 1988). In fact, the target monitoring paradigm was not really designed for investigating directly the perceptual units that represent the sounds of speech, but was developed instead as a probe task to measure the online demands of comprehension while listening to spoken language (Cutler & Norris, 1979).

Although it is possible to conclude that the question of how listeners represent speech is undecidable and better left to theoretical analysis rather than empirical study, we could also conclude that previous studies have asked the question the wrong way. Perhaps it would be better to investigate the perceptual units of speech by using a different empirical method rather than by trying to coerce the monitoring paradigm to serve that particular function. To start with, in order to understand how the sounds of speech are initially encoded into linguistic units, we must have some idea about what we mean by a unit of perception. If we are asking about basic units of perception from which higher order representations are formed, then we could start from the assumption that these basic units are atomic or perceptually indivisible (see Pomerantz, 1986). In other words, if there is some unit that can be decomposed into other subordinate linguistic units, it is not basic. (For this definition to work, we must treat perceptual decomposition as different from inferential decomposition. In other words, it is likely that listeners can always use their metalinguistic knowledge to make inferences about linguistic units, but this ability is not what we are interested in investigating.)
How can we tell if a perceptual unit is constructed of component parts that are also psychological units themselves? Garner (1970, 1974) has outlined the rationale by which we can investigate the psychological representation of perceptual units. According to this logic, we can start by defining analytically the dimensional or constituent structure of a particular unit. For example, a phoneme may have a constituent feature structure, or a syllable may have a constituent phoneme structure. If subjects can make a decision about one of the constituent parts, independent of the other constituent parts, we can think about the unit as decomposable into that constituent structure. Consider the example of a syllable which is composed of a sequence of phonemes. We can ask listeners to decide whether the consonant in a CV syllable is a /b/ or a /d/. If listeners can make this decision without being influenced by the other phoneme in the same syllable, namely the vowel, it seems reasonable to conclude that the phonemes within a syllable are represented as discrete and separate entities.

Garner described two experimental conditions that can be compared to determine whether a perceptual unit can be decomposed into smaller parts. If it can be decomposed, then there are more basic units than the one being investigated (i.e., its constituents). In one condition, the unidimensional condition, only the target dimension varies. For our example of the syllable, subjects would be instructed to decide if each CV syllable contained a /b/ or /d/, and the vowel would be held constant, so listeners only hear two different syllables. This condition provides baseline data about the time it takes listeners to recognize a consonant within a syllable. In a second condition, the orthogonal condition, the vowel also varies, so that there are four different stimuli: the two consonants combined with two vowels. In this condition, subjects perform exactly the same task of deciding whether the consonant is /b/ or /d/. If the consonant and vowel are completely independent perceptual units, listeners should be able to identify the target without being influenced by irrelevant variation in the context. If this is the case, response times should be the same in both conditions. On the other hand, if the syllable is an integral unit, decisions about one part of a syllable should be affected by variation in another part; response times should be longer in the orthogonal condition than in the unidimensional condition.

Wood and Day (1975) carried out this specific study to investigate the integrality of adjacent phonemes within a syllable. They found that recognition times for a consonant were significantly slowed by independent variation in the vowel, and vice versa. These results demonstrate that the consonant and vowel are perceptually integral within a syllable. This suggests that the syllable may be the basic unit representing the sounds of speech. However, just as basic perceptual units should not be easily decomposed into more primitive units, they should not be integral with each other. Basic units of perception should be independent of one another. This means that listeners should be able to make decisions about the information contained within one syllable without depending on the information in adjacent syllables. If syllables are basic units for recognizing the sound patterns of spoken language, then we would expect each syllable to stand on its own.
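To make the logic of this comparison concrete, here is a minimal sketch of a Garner-style analysis in Python. The syllable labels and response times are invented for illustration; only the analysis itself, mean orthogonal response time minus mean unidimensional response time, follows the procedure described above.

```python
# Illustrative sketch of the Garner speeded-classification logic.
# Stimulus labels and response times are hypothetical; only the
# analysis (orthogonal RT minus unidimensional RT) follows the text.

from statistics import mean

# Each trial: (syllable presented, response time in msec).
# Unidimensional block: the consonant varies, the vowel is constant.
unidimensional = [("ba", 742), ("da", 751), ("ba", 760), ("da", 755)]

# Orthogonal block: consonant AND irrelevant vowel both vary.
orthogonal = [("ba", 781), ("dae", 803), ("bae", 795), ("da", 790)]

def mean_rt(trials):
    """Mean response time over a block of (stimulus, RT) trials."""
    return mean(rt for _, rt in trials)

# If consonant and vowel are separable units, the two means should be
# about equal; slower orthogonal responses indicate integrality.
interference = mean_rt(orthogonal) - mean_rt(unidimensional)
print(f"unidimensional: {mean_rt(unidimensional):.0f} msec")
print(f"orthogonal:     {mean_rt(orthogonal):.0f} msec")
print(f"interference:   {interference:+.0f} msec")
```

A separable target dimension predicts an interference score near zero; a reliably positive score is the signature of integrality.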
If the sound pattern information in one syllable is integral with the information in an adjacent syllable, then the basic unit representing sound patterns would be a higher order description than the syllable, or it would be a representation that organizes information regardless of syllable structure or boundaries.

Experiment 1

If the syllable is a basic unit of perception, segments within a syllable should be perceived integrally, while segments in different syllables should be perceptually separable. To test these hypotheses, we extended the earlier study by Wood and Day (1975). Subjects identified a stop consonant while an adjacent vowel either stayed constant (unidimensional condition) or varied (orthogonal condition). We used two different orthogonal conditions to examine the degree to which information in adjacent syllables is perceptually integral. In one orthogonal condition, the vowel varied within the same syllable as the consonant that was being identified. In a second orthogonal condition, the vowel varied in a syllable that was adjacent to the syllable containing the target consonant. If the syllable is a basic perceptual unit, then response times for consonant recognition should be significantly slowed by vowel variation within the same syllable, whereas vowel variation in an adjacent syllable should not affect consonant perception at all.

Method

Twenty-four University of Chicago students and neighborhood residents, aged 17-30, were tested in our study. The stimuli were two sets of four VCV utterances: [o'pha], [o'tha], [o'phæ], and [o'thæ]; and [a'pho], [a'tho], [æ'pho], and [æ'tho]. In all the utterances, the second syllable was stressed and the consonant aspirated, so that the consonant was perceived as the onset of the second syllable rather than the coda of the first. The first four utterances composed the within-syllable stimulus set, in which the consonant was in the same syllable as the varying /a/ or /æ/. The second set of four utterances constituted the between-syllable set, in which the varying /a/ or /æ/ was adjacent to the consonant but not in the same syllable. The stimuli were recorded by a male talker, low-pass filtered at 4.6 kHz, digitized with 12 bits of resolution at 10 kHz, and edited into individual stimulus files with a digital waveform editor. Since these stimuli were natural utterances, there was some variability in duration and intonation. However, each stimulus was heard in both the unidimensional condition and the orthogonal condition, so recognition performance for each stimulus was compared in two conditions rather than across different utterances. The stimuli were presented under computer control, binaurally over headphones, at approximately 79 dB SPL.

The subjects were told that they would hear repetitions of the specified utterances over headphones. They were instructed to determine whether each item contained a /p/ or a /t/ sound, and to press the corresponding key as quickly as possible without sacrificing accuracy. The order of presentation of testing conditions and stimulus sets was counterbalanced across subjects.

Results and Discussion

Subjects were highly accurate in identifying the target consonants /p/ and /t/ in all conditions (around 98% correct recognition). Response times are shown in Table 1 for both sets of stimuli for the unidimensional and orthogonal conditions. Subjects recognized consonants significantly more slowly in the within-syllable set than in the between-syllable set, F(1, 23) = 8.985, p < .01. This is not surprising, since the two stimulus sets contained different natural utterances, which varied in duration. On average, the stop consonant burst
occurred later in the within-syllable stimuli than in the between-syllable stimuli; this accounts for slower consonant identification in the within-syllable set. In addition, recognition times were longer in the orthogonal condition, F(2, 23) = 5.024, p < .05, demonstrating that variation in vowel context slows consonant recognition overall. This result replicates Wood and Day's (1975) finding that consonant recognition is retarded by variation in vowel identity.

Table 1. Consonant recognition times (msec)

  Condition         Within-Syllable   Between-Syllable
  Unidimensional         760                677
  Orthogonal             799                753

In the present study, the primary question of interest concerned the amount of interference vowel variation produced for consonant recognition when the variation was in the same syllable as the consonant and when the variation was in the adjacent syllable. As can be seen in Table 1, the increase in recognition times for variation in the within-syllable stimuli was numerically less than the increase for the between-syllable stimuli, although this interaction was not significant. The pattern of results from this study indicates that consonants and vowels are perceptually integral, as found by Wood and Day (1975). However, there is at least as much integrality between segments in different syllables as there is between segments within the same syllable. The presence of a syllable boundary between segments does not reduce the perceptual integrality between adjacent segments. Thus the information within a syllable is not recognized as if it were independent of the information in adjacent syllables. This suggests that the perceptual representation of speech sounds is not in terms of discrete syllables. As a result, the syllable cannot be called a basic perceptual unit (also see Shand, 1976).

One interpretation of our results is that the perceptual representation of speech sounds is more continuous than any specific discrete linguistic unit, or is at least extremely sensitive to contextual information (e.g., Wickelgren, 1969), without regard for syllable structure. But it is also possible that speech sounds are not represented by unitary, monolithic syllables, but are instead represented by a more hierarchically organized and articulated syllable structure. Certainly, many modern phonological analyses and theories represent speech sounds more in terms of a syllable with hierarchically defined constituent structure (e.g., Clements & Keyser, 1981; Goldsmith, 1989a; Halle & Vergnaud, 1980; Kahn, 1976; Selkirk, 1982). It is possible that listeners directly code the sound patterns of speech as syllabic constituents rather than as unitary syllables.

Based on our present results and other earlier findings (e.g., Shand, 1976; Wood & Day, 1975), it is clear that discrete phonemes are not represented as independent perceptual units. Consonant recognition is based on processing information in adjacent vowels, within and between syllables. However, it is possible that the primary representation of speech sounds is by larger syllabic constituents such as syllable onsets (prevocalic consonants) and rhymes (vocalic nucleus and coda) (Fudge, 1969; Selkirk, 1982). In fact, some psychologists have proposed these constituents as basic perceptual units (Cutler, Butterfield, & Williams, 1987; Treiman, Salasoo, Slowiaczek, & Pisoni, 1982). If an onset is a perceptual unit, then the onset as a whole should be perceptually available before its constituent phonemes. A consonant within the onset must be identified by making reference to the
entire onset. In order to recognize the /p/ in /spa/, for example, a listener must analyze the /sp/ onset into its component phonemes. Two recent studies used a speeded recognition task to investigate the psychological reality of onset representation (Cutler et al., 1987; Treiman et al., 1982). Both studies provided evidence that listeners are faster at recognizing entire onset structures than they are at recognizing other parts of syllables, such as the initial consonant and vowel together. There are basically two problems with these studies, however. First, it is possible that there is more perceptual cohesion between adjacent consonants than between a consonant and vowel, simply because of the increased similarity of consonantal segments. Thus, the comparison conditions (e.g., CC vs. CV) are not really comparable. Second, these studies did not use a paradigm designed to investigate the integrality of perceptual units. Using the procedure proposed by Garner, the effects of contextual variation can be determined for each type of stimulus independently. So it is possible to examine how recognition of a consonant within an onset is affected by variation of another consonant in that onset. It is also possible to examine how recognition of a syllable-initial consonant is affected by variation of an adjacent consonant in the coda of the preceding syllable, and to compare the two effects.

Experiment 2

In the present experiment, the stimuli contained two types of fricative-stop consonant pairs. In the within-syllable stimulus set, the two consonants formed a syllable onset. In the between-syllable stimuli, a syllable boundary intervened between the fricative and the stop consonant. Subjects were instructed to identify the stop consonant, regardless of the identity of the adjacent fricative, in a unidimensional condition and in an orthogonal condition, for both sets of stimuli. Another group of subjects was instructed to identify the fricative and ignore variation in the stop consonant in both unidimensional and orthogonal conditions. If onsets are integral perceptual units, it should not matter whether contextual variation occurs in the first or second segment of the onset. Thus we used the same stimuli but used different instructions to the subjects to change which segment was treated as the target and which was the context.

Method

The subjects were 36 University of Chicago students and neighborhood residents, aged 18-31. The four within-syllable stimuli were monosyllables beginning with a fricative-stop consonant cluster: [spa], [sta], [∫pa], and [∫ta]. The four between-syllable stimuli were bisyllables that contained the same fricative-stop sequences, but with the consonants in separate syllables: [is'pha], [is'tha], [i∫'pha], and [i∫'tha]. The between-syllable items were stressed on the second syllable, so that the stop could be aspirated, designating the stop as the syllable onset for English listeners. Stimuli were recorded, digitized, selected, and edited following the same procedures as in Experiment 1. In addition, the experimental procedure, apparatus, and instructions to subjects were the same as in Experiment 1, with minor exceptions. One group of subjects was again instructed to determine whether each stimulus contained a /p/ or a /t/ sound, and to press the corresponding labeled key as quickly as possible without sacrificing accuracy. The second group of subjects identified the fricative segment as either /s/ or /∫/, and pressed the corresponding key (labeled "s" or "sh"), again as quickly as possible without
sacrificing accuracy. The presentation order of testing conditions was counterbalanced across subjects.

Results and Discussion

As in Experiment 1, subjects were highly accurate at identifying the stop consonants (over 98% correct across conditions). Table 2 shows the mean recognition times for subjects identifying stop consonants. Across both stimulus sets, stop consonant judgments were slower in the orthogonal condition than in the unidimensional condition, F(2, 17) = 11.415, p < .01. Stop consonants were recognized as quickly in the within-syllable stimuli as in the between-syllable stimuli. Moreover, there was no significant interaction between condition and stimulus set: The amount of integrality between fricative and stop consonant was as great for the within-syllable stimuli as it was for the between-syllable stimuli.

Table 2. Stop consonant recognition times (msec)

  Condition         Within-Syllable   Between-Syllable
  Unidimensional         671                669
  Orthogonal             696                697

Of course, it is important to note that for these stop consonant judgments, the varying fricative always preceded the stop target. Therefore it is possible that it is difficult for listeners to ignore the fricative information that they hear before judging the stop consonant. However, an examination of the data for the subjects judging the fricative identity shows the same pattern of integrality, even though the varying consonant followed the target. Fricative judgments were highly accurate (over 97% correct across conditions). The overall pattern of fricative recognition times shown in Table 3 replicates the pattern of results for stop consonant judgments: Fricative judgments were significantly slower in the orthogonal condition than in the unidimensional condition, F(2, 17) = 18.689, p < .001. Furthermore, there was no reliable difference in recognition speed between the two sets of stimuli, and for the orthogonal and unidimensional conditions there was no interaction between stimulus set and condition.

Table 3. Fricative recognition times (msec)

  Condition         Within-Syllable   Between-Syllable
  Unidimensional         546                561
  Orthogonal             668                636

Taken together, the results for stop consonant and fricative judgments demonstrate that the consonants within a syllable onset are perceptually integral, as would be wished by those researchers proposing subsyllabic constituents as perceptual units (Cutler et al., 1987; Treiman et al., 1982). Unfortunately for this proposal, the same consonants are perceived as integral even when a syllable boundary separates them. If a consonant within an onset is as integral with a consonant in the coda of a preceding syllable as are the consonants within an onset, the view of onsets and codas as independent perceptual units becomes questionable. Neither the syllable nor its constituents (e.g., onsets) should be thought of as basic perceptual units, since the information within these units cannot be processed independently from the information within the preceding syllable.

Our first two experiments and previous studies (e.g., Shand, 1976; Wood & Day, 1975) demonstrate that listeners attend to phonetic context when identifying a target phoneme. In other words, phonemes are not identified without considering the phonetic environment in which they are produced. Moreover, attention to phonetic context is not restricted to the specific syllabic environment surrounding the target phoneme; listeners also attend to phonetic information in adjacent syllables. Our findings raise an interesting question: Are these results telling us
anything about the role of linguistic units in representing the sound patterns of spoken language? It is possible that the integrality among segments that we observe is due to purely auditory interactions among the acoustic properties used by listeners to make phonetic judgments. The results of our first two studies do not provide any evidence for an effect of perceived linguistic structure on the perceptual processing of phonemes; phonetic judgments depend on phonetic context without regard for linguistic organization. Perhaps listeners are really making some kind of auditory decision that is highly correlated with phonetic structure, and that auditory decision is affected by auditory patterns without regard for linguistic organization. This is basically the argument that has been proposed by Pastore et al. (1976). They claim that auditory processes account for the type of contextual effects that occur in phonetic judgments and that there is no need to appeal to linguistic representations or processes for an explanation.

On the other hand, a recent study by Tomiak, Mullennix, and Sawusch (1987) has suggested a very different conclusion. Tomiak et al. presented listeners with acoustic patterns consisting of a burst of noise followed by a chord composed of steady-state sine waves. One group of subjects was told that the patterns were fricative-vowel syllables and was instructed to judge the fricative identity. A second group was given the same stimuli and told to judge an auditory property of the noise; they were told nothing about the possible linguistic interpretation of the sounds. Subjects who heard the patterns as speech showed integrality between the fricative and the vowel, whereas subjects who heard the same patterns as nonspeech sounds recognized the auditory properties of the noise independently of the tone chord. These results suggest that the perceptual integrality of adjacent phonemes resides more in the psychological representation and processing of speech than it is a consequence of the auditory interactions that occur when processing an acoustic pattern.

In order to understand the implications of phonetic integrality for the psychological representation of the sounds of speech, it is important to determine whether these effects occur at an auditory level of processing or a higher order linguistic level. Certainly speech is processed both as auditory pattern information and as a signal of linguistic significance. Since our original question concerns the linguistic representation of the sound patterns of speech, it is important to tease apart auditory and linguistic effects. Perhaps the sound patterns of speech are only represented as auditory properties, and these properties are then used to directly recognize spoken words. This view is quite similar to Klatt's (1979) LAFS model, in which speech sounds are represented as sequences of short-term auditory spectra. These spectra are then used as the basis for recognition of spoken words, with no other intervening linguistic forms. Of course, this type of model has no way of accounting for explicit effects of phonological rules or representations. As such it seems quite implausible.
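To see what such a spectra-to-words account amounts to, the toy sketch below stores words as short spectral sequences and recognizes input by direct spectral match, with no intervening phonemes or syllables. It is only a caricature of the flavor of a LAFS-style model, not Klatt's actual implementation; all templates and values are invented.

```python
# Toy sketch in the spirit of Klatt's (1979) LAFS idea: words stored as
# sequences of short-term spectra, recognized by direct spectral match
# with no intermediate phonemes or syllables. Everything here is
# hypothetical and far simpler than the actual model.

import math

def distance(spec_a, spec_b):
    """Euclidean distance between two short-term spectra (band energies)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(spec_a, spec_b)))

def sequence_distance(seq_a, seq_b):
    """Frame-by-frame distance between two equal-length spectral sequences."""
    return sum(distance(a, b) for a, b in zip(seq_a, seq_b))

# Invented 3-band, 3-frame spectral templates for two stored words.
lexicon = {
    "bad": [[8, 2, 1], [6, 5, 2], [2, 6, 3]],
    "dab": [[2, 6, 3], [6, 5, 2], [8, 2, 1]],
}

def recognize(input_spectra):
    """Return the stored word whose spectral sequence best matches the input."""
    return min(lexicon, key=lambda w: sequence_distance(lexicon[w], input_spectra))

print(recognize([[7, 3, 1], [6, 4, 3], [3, 6, 2]]))  # -> "bad"
```

Note that such a matcher must return some stored word for any input whatsoever, a property that becomes important below when we ask how listeners can hear nonwords.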
In order to assure ourselves that the present methodology is informative about the linguistic representation of speech sounds, and not just the auditory processes that operate on acoustic patterns, we carried out a third experiment. In this final experiment, as in the previous two, we examined the effects of phonetic variation on phoneme target judgments when the variation was within the same syllable compared to the case in which the phonetic variation was in a preceding syllable. However, in this experiment, the two syllables were different words, so that the syllable boundary also constituted a word boundary.

Experiment 3

In the present experiment, one group of subjects identified a target stop consonant as /g/ or /k/ on each trial, using the same paradigm as in the previous experiments. The varying context phoneme was an adjacent liquid, /r/ or /l/. A second group of subjects judged the identity of the liquid, and the varying stop consonant served as context. In contrast to the previous experiments, however, the two stimulus sets were distinguished not only by syllable structure but by word boundary location. In the earlier studies, the subjects were told that each two-syllable utterance like [is'pha] was a single nonsense word, whereas in the present experiment each two-syllable utterance consisted of two English words. In one stimulus set, the stop and the varying adjacent liquid were in the same word, as in the utterance paw grew, where the /g/ and /r/ are in the onset of the same word. In the other set, the target and varying context were in adjacent words, as in hog root.

If the phonetic integrality across syllable boundaries that we observed in our first two studies is due to auditory interactions among acoustic patterns, we would expect the same pattern of results in the present experiment. Listeners should be slower to judge the identity of a target consonant when there is variation in a context phoneme, regardless of whether the target and context are in the same syllable or different syllables. After all, the acoustic manifestation of adjacent phonemes is not dependent on the lexical status of the syllables, and the auditory processing of adjacent lexical syllables should be identical to the processing of two syllables that are nonwords. On the other hand, if the degree of ...

Table 4. Stop consonant recognition times (msec)

  Condition         Within-Word   Between-Word
  Unidimensional       1065            772
  Orthogonal           1143            803

In contrast to the previous experiments, there was a significant interaction between stimulus set and condition, F(1, 22) = 7.59, p < .05. Orthogonal variation within a word slowed stop recognition by an average of 78 ms relative to the unidimensional condition, whereas between-word variation slowed stop recognition by only 31 ms on average. This same pattern of results was observed for judgments of the liquid when the stop context varied.

Table 5. Liquid recognition times (msec)

  Condition         Within-Word   Between-Word
  Unidimensional       1164           1059
  Orthogonal           1235           1081

For the group of subjects judging liquid identity, accuracy was also very high (over 98% correct). The pattern of response times, shown in Table 5, mirrors the pattern observed for the stop judgments: Recognition of the liquid target was significantly slower in the within-word stimuli than in the between-word stimuli, F(1, 22) = 79.23, p < .001. (Again, this result is probably due to the fact that different natural utterances made up the two stimulus sets.)
Also, recognition responses were slower in the orthogonal condition than in the unidimensional condition, F(1, 22) = 32.79, p < .001. Finally, the interaction between condition and word boundary location approached significance, F(1, 22) = 4.15, p < .06. For both liquid and stop judgments, varying phonetic context within the same word as the target slowed recognition time by around 70 msec, whereas varying the context in the adjacent word increased recognition time by about 30 msec.

It is important to remember that since these words are monosyllables, there is another way to describe these results: For this particular set of stimuli, phonetic variation within the target-bearing syllable increases target recognition time more than phonetic variation in an adjacent syllable. Clearly, this pattern of results is very different from the pattern obtained in our first two experiments. For these stimuli, the presence of a syllable boundary between target and context segments seems to reduce the impact of varying phonetic context on target recognition. What is significant about this particular set of stimuli is, of course, that the syllable boundary is also a word boundary. Certainly we can expect that there is as much coarticulation across syllable boundaries between words as there is across syllable boundaries between nonwords. Therefore, the local acoustic environment of a syllable boundary is probably unaffected by the lexical status of the syllables on either side of the boundary. However, our results show that a syllable boundary between words attenuates the influence of context outside the target-bearing syllable, whereas syllable boundaries between nonwords do not attenuate this influence to any measurable degree. Our results argue against the view that the integrality of adjacent phonemes is due to interactions in the auditory processing of the acoustic patterns of speech.

If we accept that the listener's attention to phonetic context in making phonetic decisions is not an auditory phenomenon, what do these results tell us about the psychological representation of speech sounds? Based on our initial definition of a perceptual unit, we can rule out phonemes, whole syllables, and syllabic constituents such as onsets and rhymes as representing perceptual phonology. We might be tempted to interpret our observations of phonetic integrality as evidence for the representation of speech as context-sensitive allophones (see Wickelgren, 1969). A context-sensitive allophone could represent a speech sound as it occurs in the context of preceding and succeeding phonemes, without regard for syllable structure or boundaries. From the results of our first two experiments, this might be a reasonable proposal. Unfortunately, the results of our third experiment are incompatible with this proposal. Since allophones are uninformed about the lexical status of the context in which they occur, word boundaries and syllable boundaries should be equivalent in their effects (or lack of effect) on phonetic decisions. On the other hand, it is clear that words function almost precisely in accordance with our definition of a basic perceptual unit: Phonetic information within a word is integral, and phonetic information is more separable between words. (There is significant integrality of phonetic information in the coda of one word with the onset of the next, but this is significantly smaller than the integrality of within-word information.)
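Because all five data tables report the same two-by-two design, the size of each context-interference effect (orthogonal minus unidimensional response time) can be recomputed directly from the reported means; the short script below does just that, using the values in Tables 1-5.

```python
# Recompute the context-interference effect (orthogonal RT minus
# unidimensional RT, in msec) from the mean response times reported
# in Tables 1-5 above. Each entry is (unidimensional, orthogonal).

tables = {
    "Exp 1 consonant (Table 1)": {"within": (760, 799), "between": (677, 753)},
    "Exp 2 stop (Table 2)":      {"within": (671, 696), "between": (669, 697)},
    "Exp 2 fricative (Table 3)": {"within": (546, 668), "between": (561, 636)},
    "Exp 3 stop (Table 4)":      {"within": (1065, 1143), "between": (772, 803)},
    "Exp 3 liquid (Table 5)":    {"within": (1164, 1235), "between": (1059, 1081)},
}

for name, sets in tables.items():
    effects = {s: orth - uni for s, (uni, orth) in sets.items()}
    print(f"{name}: within {effects['within']:+d} msec, "
          f"between {effects['between']:+d} msec")

# Only in Experiment 3, where the syllable boundary is also a word
# boundary, is the between-set effect reliably smaller than the
# within-set effect (+78 vs. +31 and +71 vs. +22); in Experiments 1
# and 2 the corresponding interactions were not significant.
```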
This suggests that perhaps words are the basic unit of perceptual phonology. This would mean that in listening to spoken language, continuously varying acoustic properties are recoded directly into the sound patterns of spoken words. There is at least one readily apparent problem with this suggestion. If we translate the acoustic structure of speech into words, how can we ever hear nonwords? Any perceived sound pattern that is not already stored in the mental lexicon should be matched to the sound pattern in the lexicon that is most similar to the actual utterance. Since this does not happen, and we can hear and recognize the sound patterns of nonwords (see Pisoni, 1981), we need a different explanation of the present results. Perhaps, if the present results are largely discouraging about many of the linguistic units that have been proposed to explain speech perception, we need to examine the basic assumptions that underlie the traditional view of the linguistic representation of the sound patterns of spoken language.

General Discussion

We started the present investigation to determine how listeners represent the sounds of spoken language. There is clear evidence from previous perceptual contrast and adaptation studies that listeners perceive the sounds of speech in terms of both a set of auditory properties and a set of linguistic properties (see Sawusch & Nusbaum, 1983). The question we have addressed in the present studies concerns the way the linguistic properties are represented mentally. There have been many proposals about the form of this representation, including the context-sensitive allophone, the phoneme, subsyllabic constituents (e.g., onset and rhyme), syllables, and words. Over the years there have been arguments for and against each of these, with no resolution to the debate. Indeed, many researchers have simply given up on this question, viewing it as unresolvable.

The present results, even by themselves, seem to highlight the basic problem with this research question. Consider our definition of a primary perceptual unit: Information within the unit is difficult to pull apart (the unit is an undifferentiated whole that can only be decomposed by post-perceptual analytic processes); information in different units should be entirely independent. This definition of a perceptual unit is widely accepted (e.g., Garner, 1974; Pisoni, 1978; Pomerantz, 1986). However, by this definition, the results of our studies do not support any of the proposed linguistic units. Phonemes are not perceived independently of one another. Phonetic information within a syllable is not perceived independently of information in an adjacent syllable. Phonetic information in a syllable onset is not perceived independently of the information in the rhyme of a preceding syllable. The phonetic information within a word is not perceived independently of information in an adjacent word.

One way to describe our results is that, in making phonological decisions, listeners are flexible in attending to different sources of information. In the first two experiments, listeners made phonological decisions using information within and between syllables. In the third experiment, listeners attended more to information within a syllable (i.e., word) than they did to information in a different syllable (word). Thus, listeners can shift the weight they place on attending to intrasyllabic information compared to intersyllabic information. Attention to intersyllabic information may reflect the perceptual usefulness of the acoustic
manifestations of coarticulation among phonemes, even across syllable boundaries (e.g., Ohman, 1966). Although equal attention within and between syllables might, by itself, be taken as support for the use of context-sensitive allophones (e.g., Wickelgren, 1969) in representing speech sounds, the results of our third experiment raise problems for the allophone. A context-sensitive allophone represents a segment differently in different phonetic contexts, reflecting the effects of coarticulation among adjacent segments. Only the immediate phonetic context affects the representation of the target segment, and higher order linguistic organizations do not play any role in the form or content of an allophone. Thus, this type of allophonic representation could account for equal attention to phonetic context within and between syllables, since coarticulation crosses syllable boundaries. However, word boundaries are not included as part of the representation of context, so it is unclear why there should be less attention directed to phonetic information across a word boundary than to phonetic information within a word.

After rejecting traditional segments and syllables as possible representations of speech sounds, how can we interpret the present results? One possibility is that the present speeded classification task does not address the nature of perceptual representations. However, this task comes directly from our definition of a perceptual unit. Perhaps we should question the way perceptual units have been viewed in speech research. In large part, the current definition of a perceptual unit in speech derives from the way researchers have thought about the process of speech perception. For many years, speech perception has been viewed by psychologists as a pattern recognition process (see Pisoni, 1978, 1981). According to this view, the auditory properties coded from an utterance are compared with mental representations of linguistic units (e.g., phonemes). The listener finds the stored unit with the closest match to the part of the utterance that is under analysis. Perceptual units such as phonemes are essentially represented in the mind of the listener as enumerated descriptions of the auditory (or other) properties that serve as the criteria for their recognition. In functional terms, theories of speech perception therefore seek to explain how listeners classify acoustic information into linguistic categories. This has led to a number of problems in formulating theories of speech perception, because it has been difficult, if not impossible, to define phonemes or other linguistic units in acoustic terms.

Barsalou and Medin (1986) have pointed out that this view of speech perception has much in common with the traditional view of concepts and categorization in cognitive psychology (see Smith & Medin, 1981, for a review). In cognitive psychology, a category like bird or chair has been thought of as having a mental representation that defines the set of things to which the concept refers. In concrete terms, a conceptual category specifies the features or properties that are used for the appropriate classification of a particular object. On this view, in speech research and in cognitive psychology alike, a category (such as /p/ or bird) is an enumeration of the specific criteria that determine category membership. It is also true that differences among theories of speech perception parallel differences among theories of conceptual categorization. For example, variations in the traditional view of conceptual
categories suggest that categories may each be represented by a single prototype, or there may be a number of specific examples of category members stored for each concept (see Smith & Medin, 1981). Similarly, listeners may represent phonetic categories with a prototype (e.g., Massaro, 1987; Samuel, 1977) or by storing a set of auditory examples (Klatt, 1979). The representation of a conceptual prototype may be in terms of sets of semantic features that are either defining or characteristic of a conceptual category (Smith, Shoben, & Rips, 1974), just as some acoustic properties may be phonetic invariants (and therefore defining) whereas others may be secondary and context-sensitive (and therefore characteristic) (Stevens & Blumstein, 1978). Or the semantic features of a conceptual prototype may be represented with continuously varying probabilities (Smith & Medin, 1981), and in speech, acoustic properties might have varying probabilities of phonetic or lexical diagnosticity (Massaro, 1975; De Mori, 1983).

However, just as the traditional psychological view of perceptual units in speech has been confronted by theoretical problems (see Pisoni, 1978), the traditional view of conceptual categories has faced theoretical problems (e.g., McCloskey & Glucksberg, 1978, 1979; Murphy & Medin, 1985; also see Smith & Medin, 1981). Just coming up with objective and stable definitions of a particular conceptual category like game (Wittgenstein, 1953) has been as difficult as defining the acoustic properties that denote a particular phoneme such as /g/ (Liberman et al., 1967). Indeed, it is well known that the acoustic properties of a phoneme change with phonetic context, talker, and speaking rate (Liberman et al., 1967). It is also true that conceptual categorization and the meanings of concepts change with context (Barclay, Bransford, Franks, McCarrell, & Nitsch, 1974; Barsalou, 1982; Greenspan, 1986). In fact, the problem of context sensitivity of category meanings may be central to understanding some of the limitations on the current views of conceptual and perceptual representations.

Murphy and Medin (1985) have pointed out that these current views are based on the notion of a mental "chemistry" (Mill, 1965) that underlies much of current cognitive psychology. One consequence of this theoretical foundation is that mental representations are viewed as having a fixed "elemental" structure that can be combined across representations and transformed from one representation to another. Just as changing the atomic structure of a chemical changes its identity, the underlying assumption of mental chemistry is that changing part of the elemental structure of a concept or category changes the referential domain of the category. For example, if the definition of the phoneme /p/ were [-voiced, +labial, +obstruent], a change in any one feature would produce the definition of a different phoneme. Similarly, changing an attribute of a concept should produce a different concept, although this is a matter of some debate (e.g., Katz & Fodor, 1963; Lakoff, 1987; Quine, 1964; Tversky, 1977). This means that categories (both concepts and linguistic units) are rigid in their denotational specification. Of course, if a mental representation is rigid in this way, it is difficult to explain the numerous demonstrations of context-sensitive categorization, both in speech perception and in cognitive psychology.
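The rigidity at issue can be stated in a few lines of code. In the sketch below, a phoneme is nothing but a fixed feature bundle, so flipping any single feature denotes a different category, and no provision exists for context; only the /p/ bundle comes from the text, and the /b/ bundle is supplied for illustration.

```python
# Mental "chemistry" made literal: a category is a fixed bundle of
# features, so changing any single feature yields a different category.
# Only /p/ comes from the text; /b/ is filled in for illustration.

P = frozenset({"-voiced", "+labial", "+obstruent"})
B = frozenset({"+voiced", "+labial", "+obstruent"})  # one feature flipped

definitions = {P: "/p/", B: "/b/"}

def classify(features):
    """All-or-none matching: a bundle either equals a definition or is nothing."""
    return definitions.get(frozenset(features), "no category")

print(classify({"-voiced", "+labial", "+obstruent"}))  # /p/
print(classify({"+voiced", "+labial", "+obstruent"}))  # /b/: one change, new phoneme
# There is no way to express "in this context, weight the voicing cue
# differently": the definition is the same in every context.
```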
There have been a number of attempts to relax this denotational rigidity in order to preserve the underlying principle of mental chemistry and still allow a degree of context sensitivity. For example, the idea of features that do not define a category but are generally characteristic of some of its members suggests some flexibility in mental representation. Furthermore, weighting features by their probability of occurrence provides even more flexibility, by allowing a representation to incorporate features that are unlikely to be diagnostic except in specific contexts. But these attempts have not really provided a mechanism that is sensitive to context (Medin, 1989), because they still require that the complete set of criteria for category membership be specified in the definition of a concept or perceptual unit, independent of context. It would be preferable to be able to generate the criteria for category membership in any specific context, rather than require that these criteria be known in advance.

In fact, humans are quite capable of generating new criteria for category membership. Barsalou (1983) has shown that subjects can form an ad hoc or goal-directed category like things to take on a vacation in Florida, even if they have never used that category before. Moreover, categories that are constructed "on the fly" like this are used to classify objects in very much the same way as well-learned and highly familiar categories like chair. People can thus develop new categories to satisfy specific goals and induce the criteria that are needed for determining membership in these categories.

What does all this tell us about the perceptual representation of speech sounds? Humans are very flexible in classifying stimuli according to the demands of specific contexts and novel situations. Perhaps the kind of "generativity" displayed by subjects in forming and using ad hoc categories is the norm for familiar, well-learned categories as well. Rather than view the perceptual representation of speech sounds as an enumeration of the sensory properties that indicate the classification of a particular sound, this representation may be a more abstract generating function. In other words, the representation of a speech sound may be a procedure for generating perceptual criteria that can determine the linguistic classification of that sound, given a specific phonetic context and talker. This procedure could be thought of as a kind of "theory" about what constitutes membership in a particular phonological category, without enumerating every property that would be criterial of membership in all circumstances (which cannot really be done).

Indeed, Murphy and Medin (1985) have suggested that a concept or category may be represented by a theory of what the concept means in terms of real world knowledge. By this view, our knowledge of the world forms the basis for constructing a set of inferences that are used in applying the concept. Thus, instead of using a fixed representation that directly specifies an attribute-based description of a concept, a theory about a concept might be used to infer the appropriate attributes for any particular context. Rather than view categorization as a pattern recognition process or attribute matching procedure, Murphy and Medin describe classification as a more general inferential process (or processes) that determines how well a particular pattern of information fits with the theory that represents a concept. By extension, then, if concepts are represented as theories, we can view classification as a process of validating the theory for the particular information at hand. As in science, validating a theory should consist of
inferring specific hypotheses about the concept for a particular context and then testing the predictions represented by these hypotheses. Therefore, it may be better to think about categorization as a kind of inferential and interpretive process, rather than as a pattern recognition process.

Transported to the domain of speech perception, a theory-based or generative view of categorization provides a different way of thinking about the representation of speech sounds. We can think about the mental representation of the sounds of speech as theories about what constitutes the linguistically meaningful sounds of our language. Given these theories about the phonological categories of language, speech perception becomes an inferential process by which the listener must first generate the hypotheses appropriate to a particular talker, context, and speaking rate, and then must test these hypotheses. Knowledge about the acoustic consequences of differences among talkers and contexts, together with the linguistic theory of each particular phonological category, provides the basic constraints necessary to generate these hypotheses.

The idea that speech perception is carried out by hypothesis testing is not entirely new. Stevens and his colleagues (Stevens, 1960; Stevens & Halle, 1967; Stevens & House, 1972) proposed a model of speech perception called analysis-by-synthesis that also involved recognition by hypothesis testing. In their model, invariant acoustic attributes of speech were mapped directly onto phonetic categories. The phonetic sequence that was known was used to infer phonetic segments to fill in the blanks left by the portions of the signal that were not invariant. The hypothesized segments were used to drive a mental model of speech production and generate an auditory representation that could be matched against these portions of the utterance.

Of course, analysis-by-synthesis differs from the present proposal in several respects. First, it assumes a traditional view of phonetic categories in which there are defining and characteristic features of phonemes. The defining features are used for direct recognition, and the characteristic features are used for generating hypotheses to be tested. Second, only acoustic-phonetic hypotheses are generated. Third, these hypotheses are tested using a model of the process of articulation to produce auditory patterns that are used in conjunction with standard pattern recognition processes. By comparison, in the present proposal, there may be several types of hypotheses and different ways of testing them. In some cases, these hypotheses may indeed be expectations about the acoustic manifestations of the coarticulation of phonetic context with each phonological category. In these cases, the theories that constitute phonological categories would be used to generate acoustic-phonetic hypotheses that could be tested by pattern comparison. There is certainly a great deal of evidence that the listener's expectations about the sound pattern of an utterance affect perceptual processing of that utterance (Carden, Levitt, Jusczyk, & Walley, 1981; Cutler et al., 1987; Healy & Cutting, 1976; McNeill & Lindig, 1973; Remez, Rubin, Pisoni, & Carrell, 1981). In other cases, however, linguistic knowledge about phonological patterns in the lexicon may be used to generate hypotheses (Samuel, 1986).
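One way to make this contrast with fixed-template matching concrete is a classifier whose category representations are procedures that generate context-specific expected cue values, which are then tested against the input. The sketch below is purely schematic; the cue names, values, and context variables are all hypothetical, and nothing in it should be read as a model the authors specify.

```python
# Schematic illustration of theory-based classification: each category
# is a procedure that *generates* context-specific hypotheses, which
# are then tested against the input. All names and values are
# hypothetical; the text proposes the architecture, not this code.

def theory_of_p(context):
    """Generate expected cue values for /p/ given the current context."""
    # e.g., expected voice onset time (msec) shifts with speaking rate.
    vot = 60 if context["rate"] == "slow" else 45
    return {"vot": vot, "labial_burst": True}

def theory_of_b(context):
    vot = 15 if context["rate"] == "slow" else 10
    return {"vot": vot, "labial_burst": True}

theories = {"/p/": theory_of_p, "/b/": theory_of_b}

def classify(observed, context):
    """Test each category's generated hypothesis; return the best fit."""
    def misfit(name):
        hypothesis = theories[name](context)
        return (abs(hypothesis["vot"] - observed["vot"])
                + (hypothesis["labial_burst"] != observed["labial_burst"]))
    return min(theories, key=misfit)

token = {"vot": 52, "labial_burst": True}
print(classify(token, {"rate": "fast"}))  # /p/: 52 is near the fast-rate value
print(classify(token, {"rate": "slow"}))  # still /p/, but the criteria differed
```

The point of the sketch is only that the criteria for membership are computed from the context at classification time rather than stored in advance.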
This suggests the outline of an explanation of our present results. In performing a phoneme classification task, listeners use theories about the sounds of their language to generate hypotheses that may be tested during speech perception. These hypotheses may be phonetic or lexical in nature (or may have some other source entirely), since these types of evidence could be highly diagnostic in classifying the phonological structure of an utterance. For example, when asked to carry out a phonological decision on nonword syllables, a listener would want to use all the evidence that is relevant to testing hypotheses about phonological identity. Thus, the listener might attend primarily to acoustic-phonetic information, including phonetic context, to take coarticulatory influences into account. In using coarticulatory information to test hypotheses about segment identity in nonsense syllables, attention is distributed to phonetic context, even across syllable boundaries. By contrast, hypotheses about segment identity in words can be constrained by lexical, as well as acoustic-phonetic, information, since the phonological constituency of words is constrained by the lexicon of a specific language. For spoken words, then, decisions about the phonological classification of parts of an utterance might be based on testing a combination of phonetic hypotheses and lexical hypotheses.

The present results of the speeded classification task may reflect the process by which listeners test hypotheses about the phonological identity of particular segments. When phonetic context is fixed, in the unidimensional conditions, listeners can quickly test their acoustic-phonetic expectations about the target segment. When phonetic context varies, listeners must process this phonetic context in order to assess the acoustic-phonetic evidence provided by the different coarticulatory environments, regardless of syllable structure. However, when lexical knowledge can be used to construct phonological hypotheses, less attention needs to be directed to intersyllabic phonetic context, because knowledge about which word may have been spoken constrains the identity of possible phonological segments in that word.

If our results reflect the process by which listeners test hypotheses regarding the phonological identity of speech segments, it is interesting to speculate about the way these hypotheses are generated. If listeners have phonological representations that are theories rather than simple feature specifications, what does this mean? What is a "theory" about phonological identity?
A hint about the answer may come from research on the development of speech perception. Recently, Jusczyk (in press) has argued that the initial representation of the sound patterns of speech in infants is in the form of words in the mental lexicon. He suggests that infants begin to develop a phonological system by learning the sound patterns of specific words, rather than using an innate phonological system or learning segmental patterns divorced from lexical context. The words that children learn are stored in the lexicon as sequences of syllable forms that are initially represented as auditory patterns. As more words are learned and the phonological similarity of the words known to the infants increases, a more systematic description of sound patterns will be needed than that which is provided by an auditory coding of whole syllables. This suggests that the syllable may emerge as the framework for phonological representation.

This is entirely consistent with modern theories of phonology such as autosegmental phonology (Goldsmith, 1989a). In this type of theory, the syllable plays a central role in organizing the phonetic structure of speech. The syllable provides the basis for constraining and licensing phonetic features (Goldsmith, 1989b) and thus may serve as the foundation for the listener's theories about phonological categories. According to this type of theory, the syllable could provide the theoretical framework needed to generate specific phonetic hypotheses to be tested during speech perception, and can be used to relate phonetic hypotheses to lexical sound patterns.

In shifting the view of speech perception from matching the acoustic properties of an utterance to a set of fixed phonetic attributes, to a more flexible, theory-based representation of phonological information, we shift the importance of research questions. We started with the question of determining the primary units of speech perception. However, this question becomes less interesting if listeners can form ad hoc categories based on the experimenter's demands in a phonological classification experiment. Any particular speech perception experiment may establish for the listener a set of phonological goals that provide the basis for constructing ad hoc categories using theories of the sounds of the listener's native language. In some sense, this conclusion is entirely compatible with the conclusions of previous studies investigating the perceptual units of speech (Foss & Swinney, 1973; Healy & Cutting, 1976; McNeill & Lindig, 1973). In another sense, we have moved beyond that position and can begin to speculate on new questions.

The traditional pattern recognition view of speech perception has always emphasized phonetics rather than phonology, and psychophysics rather than cognitive psychology. Since theories of pattern recognition require a precise characterization of pattern information and structure, there has been an emphasis on phonetics for describing the acoustic properties of spoken language and an emphasis on psychophysics for describing the sensory transformation and representation of those properties. In asking about the nature and source of a listener's theories about the categories of speech sounds, we are asking about the phonology of language and conceptual representation. In asking about how hypotheses are generated from these theories and how these hypotheses are tested, we are asking questions about inference and the role of attention to different sources of information. Thus, a theory-based view of phonological
In shifting the view of speech perception from matching the acoustic properties of an utterance to a set of fixed phonetic attributes, toward a more flexible, theory-based representation of phonological information, we also shift which research questions are important. We started with the question of determining the primary units of speech perception. However, this question becomes less interesting if listeners can form ad hoc categories based on the experimenter's demands in a phonological classification experiment. Any particular speech perception experiment may establish for the listener a set of phonological goals that provide the basis for constructing ad hoc categories using theories of the sounds of the listener's native language. In one sense, this conclusion is entirely compatible with the conclusions of previous studies investigating the perceptual units of speech (Foss & Swinney, 1973; Healy & Cutting, 1976; McNeill & Lindig, 1973). In another sense, we have moved beyond that position and can begin to speculate on new questions. The traditional pattern recognition view of speech perception has always emphasized phonetics rather than phonology, and psychophysics rather than cognitive psychology. Since theories of pattern recognition require a precise characterization of pattern information and structure, there has been an emphasis on phonetics for describing the acoustic properties of spoken language and an emphasis on psychophysics for describing the sensory transformation and representation of those properties. In asking about the nature and source of a listener's theories about the categories of speech sounds, we are asking about the phonology of language and about conceptual representation. In asking how hypotheses are generated from these theories and how those hypotheses are tested, we are asking about inference and about the role of attention to different sources of information. Thus, a theory-based view of phonological classification moves toward unifying theories of speech perception with theoretical issues in phonology and cognitive psychology (e.g., see Nusbaum, in press).

Acknowledgements

The authors would like to thank Paula Hildebrand for her assistance in testing subjects. This research was supported in part by a grant from the NIDCD (DC-00601) and in part by a grant from the Air Force Office of Scientific Research (AFOSR 87-0272).

References

Anderson, J. R. (1978). Arguments concerning representations for mental imagery. Psychological Review, 85, 249-277.
Barclay, J. R., Bransford, J. D., Franks, J. J., McCarrell, N. S., & Nitsch, K. (1974). Comprehension and semantic flexibility. Journal of Verbal Learning and Verbal Behavior, 13, 471-481.
Barsalou, L. W. (1982). Context-independent and context-dependent information in concepts. Memory & Cognition, 10, 82-93.
Barsalou, L. W. (1983). Ad hoc categories. Memory & Cognition, 11, 211-227.
Barsalou, L. W., & Medin, D. L. (1986). Concepts: Fixed definitions or dynamic context-dependent representations? Cahiers de Psychologie Cognitive, 6, 187-202.
Carden, G., Levitt, A. G., Jusczyk, P. W., & Walley, A. (1981). Evidence for phonetic processing of cues to place of articulation: Perceived manner affects perceived place. Perception & Psychophysics, 29, 26-36.
Clements, G., & Keyser, S. (1981). A three-tiered theory of the syllable. Occasional Paper #19, Center for Cognitive Science, Massachusetts Institute of Technology.
Cutler, A., Butterfield, S., & Williams, J. N. (1987). The perceptual integrity of syllabic onsets. Journal of Memory and Language, 26, 406-418.
Cutler, A., & Norris, D. (1979). Monitoring sentence comprehension. In W. E. Cooper & E. C. T. Walker (Eds.), Sentence processing: Psycholinguistic studies presented to Merrill Garrett (pp. 113-134). Hillsdale, NJ: Lawrence Erlbaum.
De Mori, R. (1983). Computer models of speech using fuzzy algorithms. New York: Plenum.
Fant, G. (1962). Descriptive analysis of the acoustic aspects of speech. Logos, 5, 3-17.
Foss, D. J., & Swinney, D. A. (1973). On the psychological reality of the phoneme: Perception, identification, and consciousness. Journal of Verbal Learning and Verbal Behavior, 12, 246-257.
Fudge, E. C. (1969). Syllables. Journal of Linguistics, 5, 253-286.
Garner, W. R. (1970). The stimulus in information processing. American Psychologist, 25, 350-358.
Garner, W. R. (1974). The processing of information and structure. Hillsdale, NJ: Lawrence Erlbaum.
Goldsmith, J. (1989a). Autosegmental and metrical phonology. Oxford and New York: Basil Blackwell.
Goldsmith, J. (1989b). Licensing, inalterability, and harmonic rule application. In R. Graczyk, B. Music, & C. Wiltshire (Eds.), Papers from the 25th Annual Regional Meeting of the Chicago Linguistic Society. Chicago: Chicago Linguistic Society.
Greenspan, S. L. (1986). Semantic flexibility and referential specificity of concrete nouns. Journal of Memory and Language, 25, 539-557.
Halle, M., & Vergnaud, J.-R. (1980). Three dimensional phonology. Journal of Linguistic Research, 1, 83-105.
Healy, A. F., & Cutting, J. E. (1976). Units of speech perception: Phoneme and syllable. Journal of Verbal Learning and Verbal Behavior, 15, 73-83.
Jusczyk, P. W. (in press). In H. C. Nusbaum & J. C. Goodman (Eds.), The transition from speech sounds to spoken words: The development of speech perception. Cambridge, MA: MIT Press.
Kahn, D. (1976). Syllable-based generalizations in English phonology. Doctoral dissertation, MIT. Reproduced by the Indiana University Linguistics Club.
Katz, J. J., & Fodor, J. A. (1963). The structure of a semantic theory. Language, 39, 170-210.
Klatt, D. H. (1979). Speech perception: A model of acoustic-phonetic analysis and lexical access. Journal of Phonetics, 7, 279-312.
Kosslyn, S. M. (1980). Image and mind. Cambridge, MA: Harvard University Press.
Lakoff, G. (1987). Women, fire, and dangerous things. Chicago: The University of Chicago Press.
Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. (1967). Perception of the speech code. Psychological Review, 74, 431-461.
Massaro, D. W. (1972). Preperceptual images, processing time, and perceptual units in auditory perception. Psychological Review, 79, 124-145.
Massaro, D. W. (1975). Acoustic features in speech perception. In D. W. Massaro (Ed.), Understanding language: An information-processing analysis of speech perception, reading, and psycholinguistics (pp. 77-124). New York: Academic Press.
Massaro, D. W. (1987). Speech perception by ear and eye: A paradigm for psychological inquiry. Hillsdale, NJ: Lawrence Erlbaum.
McCloskey, M., & Glucksberg, S. (1978). Natural categories: Well defined or fuzzy sets? Memory & Cognition, 6, 462-472.
McCloskey, M., & Glucksberg, S. (1979). Decision processes in verifying category membership statements: Implications for models of semantic memory. Cognitive Psychology, 11, 1-37.
McNeill, D., & Lindig, K. (1973). The perceptual reality of phonemes, syllables, words, and sentences. Journal of Verbal Learning and Verbal Behavior, 12, 419-430.
Medin, D. L. (1989). Concepts and conceptual structure. American Psychologist, 44, 1469-1481.
Mill, J. S. (1965). On the logic of the moral sciences. New York: Bobbs-Merrill. (Originally published 1843.)
Mills, C. B. (1980a). Effects of context on reaction time to phonemes. Journal of Verbal Learning and Verbal Behavior, 19, 75-83.
Mills, C. B. (1980b). Effects of the match between listener expectancies and coarticulatory cues on the perception of speech. Journal of Experimental Psychology: Human Perception and Performance, 6, 528-535.
Murphy, G. L., & Medin, D. L. (1985). The role of theories in conceptual coherence. Psychological Review, 92, 289-316.
Norris, D., & Cutler, A. (1988). The relative accessibility of phonemes and syllables. Perception & Psychophysics, 43, 541-550.
Nusbaum, H. C. (in press). Understanding speech perception from the perspective of cognitive psychology. In J. Charles-Luce, P. Luce, & J. R. Sawusch (Eds.), Theories in spoken language: Perception, production and development. Norwood, NJ: Ablex Publishers.
Ohman, S. E. G. (1966). Coarticulation in VCV utterances: Spectrographic measurements. Journal of the Acoustical Society of America, 39, 151-168.
Pastore, R. E., Ahroon, W. A., Puleo, J. S., Crimmins, D. B., Golowner, L., & Berger, R. S. (1976). Processing interaction between two dimensions of nonphonetic auditory signals. Journal of Experimental Psychology: Human Perception and Performance, 2, 267-276.
Pisoni, D. B. (1978). Speech perception. In W. K. Estes (Ed.), Handbook of learning and cognitive processes (Vol. 6, pp. 167-233). Hillsdale, NJ: Lawrence Erlbaum.
Pisoni, D. B. (1981). In defense of segmental representations in speech processing. In Research on Speech Perception Progress Report No. (pp. 215-227). Bloomington: Indiana University, Speech Research Laboratory.
Pomerantz, J. R. (1986). Visual form perception: An overview. In E. C. Schwab & H. C. Nusbaum (Eds.), Pattern recognition by humans and machines: Vol. Visual perception (pp. 1-30). Orlando, FL: Academic Press.
Pylyshyn, Z. W. (1981). The imagery debate: Analogue media versus tacit knowledge. Psychological Review, 88, 16-45.
Quine, W. V. (1964). Speaking of objects. In J. A. Fodor & J. J. Katz (Eds.), The structure of language: Readings in the philosophy of language (pp. 479-518). Englewood Cliffs, NJ: Prentice-Hall.
Remez, R. E., Rubin, P. E., Pisoni, D. B., & Carrell, T. D. (1981). Speech perception without traditional speech cues. Science, 212, 947-950.
Samuel, A. G. (1977). The effect of discrimination training on speech perception: Noncategorical perception. Perception & Psychophysics, 22, 321-330.
Samuel, A. G. (1986). The role of the lexicon in speech perception. In E. C. Schwab & H. C. Nusbaum (Eds.), Pattern recognition by humans and machines: Vol. Speech perception (pp. 89-111). Orlando, FL: Academic Press.
Savin, H. B., & Bever, T. G. (1970). The nonperceptual reality of the phoneme. Journal of Verbal Learning and Verbal Behavior, 9, 295-302.
Sawusch, J. R., & Nusbaum, H. C. (1983). Auditory and phonetic processes in place perception for stops. Perception & Psychophysics, 34, 560-568.
Selkirk, E. O. (1982). The syllable. In H. van der Hulst & N. Smith (Eds.), The structure of phonological representations (Part 2, pp. 337-383). Dordrecht, Holland: Foris.
Shand, M. A. (1976). Syllabic vs. segmental perception: On the inability to ignore "irrelevant" stimulus parameters. Perception & Psychophysics, 20, 430-432.
Smith, E. E., & Medin, D. L. (1981). Categories and concepts. Cambridge, MA: Harvard University Press.
Smith, E. E., Shoben, E. J., & Rips, L. J. (1974). Structure and process in semantic memory: A featural model for semantic decisions. Psychological Review, 81, 214-241.
Stevens, K. N. (1960). Toward a model for speech recognition. Journal of the Acoustical Society of America, 32, 47-55.
Stevens, K. N., & Blumstein, S. E. (1978). Invariant cues for place of articulation in stop consonants. Journal of the Acoustical Society of America, 64, 1358-1368.
Stevens, K. N., & Halle, M. (1967). Remarks on analysis by synthesis and distinctive features. In W. Wathen-Dunn (Ed.), Models for the perception of speech and visual form (pp. 88-102). Cambridge, MA: MIT Press.
Stevens, K. N., & House, A. S. (1972). Speech perception. In J. Tobias (Ed.), Foundations of modern auditory theory (Vol., pp. 1-62). New York: Academic Press.
Studdert-Kennedy, M. (1976). Speech perception. In N. J. Lass (Ed.), Contemporary issues in experimental phonetics (pp. 243-293). New York: Academic Press.
Tomiak, G. R., Mullennix, J. W., & Sawusch, J. R. (1987). Integral processing of phonemes: Evidence for a phonetic mode of perception. Journal of the Acoustical Society of America, 81, 755-764.
Treiman, R., Salasoo, A., Slowiaczek, L. M., & Pisoni, D. B. (1982). Effects of syllable structure on adults' phoneme monitoring performance. In Research on Speech Perception Progress Report No. (pp. 63-81).
Tversky, A. (1977). Features of similarity. Psychological Review, 84, 327-352.
Walley, A. C., & Carrell, T. D. (1983). Onset spectra and formant transitions in the adult's and child's perception of place of articulation in stop consonants. Journal of the Acoustical Society of America, 73, 1011-1021.
Wickelgren, W. A. (1969). Context-sensitive coding, associative memory, and serial order in (speech) behavior. Psychological Review, 76, 1-15.
Wittgenstein, L. (1953). Philosophical investigations. New York: Macmillan.
Wood, C. C., & Day, R. S. (1975). Failure of selective attention to phonetic segments in consonant-vowel syllables. Perception & Psychophysics, 17, 346-350.