M. Breen, M. Wagner, S. Shattuck-Hufnagel, E. Flemming, & E. Gibson
UMass Amherst, McGill University, MIT
mbreen@psych.umass.edu
Research on prosody has been limited by the difficulty of evaluating hypotheses quantitatively: individual variation in speakers’ productions is often large enough to wash out experimental effects. This paper has two goals: 1) to introduce new statistical methods which account for variability among speakers and identify acoustic features of information structure in an objective way; and 2) to apply these methods to data from an experiment investigating how speakers prosodify cases of multiple foci, where the focus operator could associate with one or both of the foci (cf. Krifka, 1992; Rooth, 1996).
In ‘Association with focus’ (Jackendoff, 1972), a sentence’s meaning changes with the prosodic realization of material in the scope of ‘focus-sensitive operators’, e.g. exclusive ‘only’. A phrase may contain multiple foci, for example the focus of ‘only’ and a contrastive focus (e.g., condition B in Table 1). Little is known about the prosody of multiple foci. In particular, it is not clear whether listeners can distinguish the associating focus from the contrastive focus, given that both are likely to be accented.
We recorded 10 pairs of naïve subjects producing semantically ambiguous target sentences like ‘Gramma only gave a bunny to Maryanne’ after reading disambiguating contexts (Table 1), with the goal of inducing the listener to select the appropriate picture in Fig. 1. We extracted 24 acoustic measures of duration, pitch, and intensity from 5 target words (e.g. Gramma, only, gave, bunny, Maryanne). Without accounting for speaker and item variation, none of the conditions were discriminated by these measures in pair-wise comparisons. To remove variance due to speakers and items, we computed linear regression models in which speaker (n = 20) and item (n = 20) predicted each of the 24 acoustic features. From each model, we calculated the predicted value of each acoustic feature per item per speaker. The difference between the actual and predicted values (i.e. the residual measure) reflects acoustic differences due only to the experimental manipulation. We submitted the residual measures to a stepwise discriminant function analysis to determine independently which acoustic measures speakers used to differentiate productions.
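The residualization step can be illustrated with a minimal sketch. It assumes a balanced, fully crossed design (every speaker produces every item equally often), in which case the least-squares fit of an additive speaker + item model reduces to speaker mean + item mean - grand mean. The function name, the data layout, and the toy duration values are hypothetical and are not taken from the experiment.

```python
from collections import defaultdict
from statistics import mean

def residualize(rows):
    """Remove speaker and item main effects from one acoustic measure.

    rows: list of (speaker, item, value) tuples, one per production.
    Assumes a balanced, fully crossed speaker x item design, so the
    additive least-squares prediction is
    speaker mean + item mean - grand mean.
    Returns the residuals (actual - predicted) in input order.
    """
    by_speaker = defaultdict(list)
    by_item = defaultdict(list)
    for speaker, item, value in rows:
        by_speaker[speaker].append(value)
        by_item[item].append(value)
    grand = mean(value for _, _, value in rows)
    speaker_mean = {s: mean(vs) for s, vs in by_speaker.items()}
    item_mean = {i: mean(vs) for i, vs in by_item.items()}
    return [
        value - (speaker_mean[s] + item_mean[i] - grand)
        for s, i, value in rows
    ]

# Hypothetical word durations (ms): base 300, speaker s2 is 50 ms slower,
# item i2 is 20 ms longer, and the condition effect is +10 or -10 ms.
rows = [
    ("s1", "i1", 310.0),  # +10 condition effect
    ("s1", "i2", 310.0),  # -10 condition effect
    ("s2", "i1", 340.0),  # -10 condition effect
    ("s2", "i2", 380.0),  # +10 condition effect
]
residuals = residualize(rows)
print(residuals)  # [10.0, -10.0, -10.0, 10.0]
```

In this toy case the raw durations of the first two productions are identical, and the condition effect only becomes visible in the residuals. For unbalanced data, a regression with dummy-coded speaker and item predictors would be needed instead of the mean-based shortcut.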
Eight acoustic measures (duration, mean pitch, pitch range, and maximum intensity from bunny and Maryanne, respectively) resulted in better-than-chance 6-way classification of the productions according to context; moreover, many conditions were now discriminated in pair-wise comparisons. For example, (B) and (D) were discriminated, although in both conditions bunny and Maryanne are accented and invoke alternatives. The contrastive NPs are longer and have higher pitch, a larger pitch range, and higher intensity than the focused NPs associating with ‘only’. Thus contrastive material is more prominent than focused material associating with ‘only’.
Other conditions were also successfully discriminated, including (D) and (E), indicating that previous mention results in a less prominent realization of an NP, even where all NPs are accented (cf. Bard et al., 2000).
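What "better-than-chance" classification means can be sketched with a deliberately simpler stand-in for the analysis: a leave-one-out nearest-centroid classifier over residual measures. This is not the stepwise discriminant function analysis itself, and the function names, the two-condition labels, and the feature values below are hypothetical.

```python
from collections import defaultdict
from math import dist

def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def loo_accuracy(samples):
    """Leave-one-out accuracy of a nearest-centroid classifier.

    samples: list of (condition_label, feature_vector) pairs, where the
    features are residual acoustic measures. Each production is held out
    in turn, centroids are computed from the rest, and the held-out item
    is assigned to the condition with the closest centroid.
    """
    correct = 0
    for held_out, (label, features) in enumerate(samples):
        groups = defaultdict(list)
        for index, (other_label, other_features) in enumerate(samples):
            if index != held_out:
                groups[other_label].append(other_features)
        centroids = {lab: centroid(vecs) for lab, vecs in groups.items()}
        guess = min(centroids, key=lambda lab: dist(features, centroids[lab]))
        correct += guess == label
    return correct / len(samples)

# Hypothetical residuals: (duration in ms, mean pitch in Hz) on one word.
toy = [
    ("B", [10.0, 5.0]), ("B", [12.0, 6.0]), ("B", [11.0, 4.0]),
    ("D", [-10.0, -5.0]), ("D", [-12.0, -6.0]), ("D", [-11.0, -4.0]),
]
accuracy = loo_accuracy(toy)
chance = 1 / len({label for label, _ in toy})  # 1/2 here; 1/6 for six conditions
print(accuracy, chance)
```

With six equally frequent conditions, chance performance would be 1/6, so any classification accuracy reliably above that level indicates that the residual measures carry information about the context.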
In summary, this paper will present both methodological and empirical results. First, we will describe a way of removing inter-speaker variability in order to reveal hidden systematic patterns in prosody. Second, we will describe how using these methods reveals a set of differentiations among types of foci (see Table 1).
Figures and Tables
Figure 1. Picture display from the production experiment. The Listener selected the picture which best represented the meaning of the target sentence, as produced by the Speaker.
A   CONTEXT: Gramma didn’t give a scarf to Maryanne.
    TARGET:  Gramma only gave a bunny_F to Maryanne_G
B   CONTEXT: Gramma gave a scarf and a bunny to John.
    TARGET:  Gramma only gave a bunny_C to Maryanne_F
C   CONTEXT: Gramma didn’t give a bunny to John.
    TARGET:  Gramma only gave a bunny_G to Maryanne_F
D   CONTEXT: Gramma gave a scarf to both Maryanne and John.
    TARGET:  Gramma only gave a bunny_F to Maryanne_C
E   CONTEXT: Gramma picked one present and gave it to her favorite grandchild.
    TARGET:  Gramma only gave a bunny_F to Maryanne_F
F   CONTEXT: Gramma didn’t give a scarf to Maryanne, and she didn’t give either a bunny or a scarf to John.
    TARGET:  Gramma only gave a bunny_F to Maryanne_F
Table 1: Contexts and focus structures for target sentences used in the experiment. Speakers produced only the target sentences aloud. All Contexts were preceded by a longer Set-up in which, in all but E, all discourse entities (i.e. bunny, scarf, Maryanne, and John) were mentioned. Subscripts are as follows: G = given; F = associates with ‘only’; C = contrastive, but non-associating, focus.
References
Bard, E.G., Anderson, A.H., Sotillo, C., Aylett, M., Doherty-Sneddon, G., and Newlands, A. (2000) Controlling the Intelligibility of Referring Expressions. Journal of Memory and Language 42. 1-22.
Jackendoff, R. (1972) Semantic Interpretation in Generative Grammar. Cambridge, MA: MIT Press.
Krifka, M. (1992) A compositional semantics for multiple focus constructions. Linguistische Berichte. Sonderheft 4: Informationsstruktur und Grammatik.
Rooth, M. (1996) Focus. In The Handbook of Contemporary Semantic Theory. Shalom Lappin, ed. London: Basil Blackwell.
Non-local pitch range relationships in read and elicited speech