
THE CONTRIBUTION OF PARSING TO PROSODIC PHRASING IN AN EXPERIMENTAL TEXT-TO-SPEECH SYSTEM

Joan Bachenko, Eileen Fitzpatrick, C. E. Wright
AT&T Bell Laboratories
Murray Hill, New Jersey 07974

ABSTRACT

While various aspects of syntactic structure have been shown to bear on the determination of phrase-level prosody, the text-to-speech field has lacked a robust working system to test the possible relations between syntax and prosody. We describe an implemented system which uses the deterministic parser Fidditch to create the input for a set of prosody rules. The prosody rules generate a prosody tree that specifies the location and relative strength of prosodic phrase boundaries. These specifications are converted to annotations for the Bell Labs text-to-speech system that dictate modulations in pitch and duration for the input sentence.

We discuss the results of an experiment to determine the performance of our system. We are encouraged by an initial 5 percent error rate, and we see the design of the parser and the modularity of the system allowing changes that will improve this rate.

INTRODUCTION

We describe an experimental text-to-speech system that uses a deterministic parser and prosody rules to generate phrase-level pitch and duration information for English input. This information is used to annotate the input sentence, which is then processed by the text-to-speech programs currently under development at Bell Labs. In constructing the system, our goal has been to test the hypotheses (i) that information available in the syntax tree, in particular grammatical functions such as subject-predicate and head-complement, is by itself useful in determining prosodic phrasing for synthetic speech, and (ii) that it is possible to use a syntactic parser that specifies grammatical functions to determine prosodic phrasing for synthetic speech.

Although certain connections between syntax and prosody are well-known (e.g., the influence of part of speech on stress in words like progress, or the setting off of parenthetical expressions), very little practical knowledge is available on which aspects of syntax might be connected to prosodic phrasing. In many studies, investigators have sought connections between constituent structure and prosody (e.g., Cooper and Paccia-Cooper 1980; Umeda 1982; Gee and Grosjean 1983) but, with the exception of Selkirk (1984), they tend to neglect the representation of grammatical functions in the syntax tree. Moreover, previous work has not been specific enough to provide the basis for a full system implementation. Based on our study of prosodic phrasing in recorded human speech, we decided to emphasize three aspects of structure that relate to phrasing: syntactic constituency, grammatical function, and constituent length. These findings, which we will discuss in detail, have been implemented as a collection of prosody rules in an experimental text-to-speech system.

Two important features characterize our system. First, the input to our prosody system is a parse tree generated by a version of the deterministic parser Fidditch (Hindle 1983). The left-corner search strategy of this parser and, in particular, its determinism give Fidditch the speed that makes online text-to-speech production feasible.¹ In building a parse tree, Fidditch identifies the core subject-verb-object relations but makes no attempt to represent adjunct or modifier relations. Thus relative clauses, adverbials, and other non-argument constituents have no specified position in the tree and no specified semantic role. Second, the rules in the prosody system build a prosody tree by referring both to the syntactic structure and to earlier stages of prosodic structure. The result is a hierarchical representation that supports the view, also proposed in Selkirk (1984), that grammatical function information is related to prosodic phrasing, but indirectly, through different levels of processing.

Informal tests of the system show that it is capable of producing a significant improvement in the prosodic quality of the resulting synthesized speech. Our investigations of the system's problems, which we describe, have not revealed any serious counterexample to our basic approach. In many cases it appears that problems with the current version can be resolved by taking our approach a step further and including lexical information required by the parser as another factor in the determination of prosodic phrasing.

¹ With a grammar of about 600 rules and a lexicon of about 2400 words, Fidditch parses the 25 sample sentences of Robinson (1982), averaging 7 words per sentence and chosen for their structural diversity, at an average rate of 405 seconds per sentence on a Symbolics 3670. This rate is approximately proportional to the number of words in a sentence.

TEXT-TO-SPEECH

Most text-to-speech systems comprise two components: pronunciation rules and a speech synthesizer. Pronunciation rules convert the input text into a phonetic transcription; this information may also be supplemented by a dictionary that provides information about the part of speech, stress pattern, and phonetic makeup of particular words. The speech synthesizer then converts this phonetic transcription into a series of speech parameters which are subsequently processed to produce digitized speech.

While these systems tend to perform quite well on word pronunciation, they fall short when it comes to providing good prosody for complete sentences. Current text-to-speech systems have no access to the syntactic and semantic properties of a sentence that influence phrase-level prosody. Hence rules for sentence prosody, when they are provided at all, typically depend on superficial aspects of text (e.g., punctuation) and on heuristics that vary widely in sophistication. Although such techniques often add a more natural quality to the resulting synthetic speech, they can fail in important ways, for example, by ignoring the prosodic event between a lengthy subject and a predicate, so that there is no clear prosodic boundary between right and mark in The characters on the right mark the salient features.²

Several authors (e.g., Allen 1976; Elovitz et al. 1976; Luce et al. 1983) have suggested that prosodic differences between synthetic and natural speech are the primary, unaddressed factor leading to difficulties in the comprehension of fluent synthetic speech. The relation between phrase-level prosody and its sources, however, is so poorly understood that we have no good sense of the degree to which different levels of explanation (syntactic, semantic, or pragmatic) are applicable. We currently have reasonable tools for automatic syntactic analysis of a text, but there is nothing equivalently well-developed for semantic or pragmatic textual analysis. Thus an obvious goal is to explore the extent to which phrase-level prosody can be explained by the syntax tree and to develop a detailed description of that relation. A further goal is to convert the resulting insights about this relation into a system that can work with a speech synthesizer. This allows us to test our description more adequately and perhaps also produce something that will further text-to-speech technology.

SYNTACTIC STRUCTURE AND PROSODIC PHRASING

Certain relations between syntax and prosody, especially at the word level, are well-known. For example, the syntactic category of a word may affect its phonetic realization, as in the verb/adjective distinction of separate, approximate, and the verb/noun distinction of house, wind, lives. Likewise, syntactic category affects word stress, so that verbs such as

whereas the corresponding nouns receive penultimate stress.

Beyond the word level, however, there has been little investigation of systematic connections between syntactic structure and prosodic phrasing. The psycholinguistic and acoustic investigations of Cooper and Paccia-Cooper (1980), Umeda (1982), and Gee and Grosjean (1983), and the prosodic theory of Selkirk (1984), are among the more notable studies and represent the two main approaches to syntax/prosody relations. In Cooper and Paccia-Cooper (1980) and Umeda (1982), the connection from syntax to prosodic phrasing is unmediated by any filtering process, i.e., they propose that the details of prosodic phrasing can be determined directly from syntactic structure by associating particular syntactic nodes (or constituent boundaries) with a phonetic value, either pausing, segmental lengthening, or the blocking of the cross-word conditioning of phonological rules. By contrast, Gee and Grosjean (1983) and Selkirk (1984) believe that the syntax-prosody relation is indirect: prosodic phrasing is derived by rules that refer to left-to-right ordering, length (or branching patterns), and, in the case of Selkirk, grammatical function, as well as constituent membership, in order to infer a hierarchical prosodic structure. But while their respective positions are quite clear, none of these studies is conclusive. All lack a syntactic framework sufficiently detailed and formalized to allow extensive testing, and most consider only a small number of sentences and sentence types.³

² Note that without a syntactic analysis that correctly identifies grammatical functions, it is impossible to determine whether the word mark is a noun ending the subject phrase or the verb of the predicate phrase. Simple "surface" parsers, such as that described in Umeda and Teranishi (1974), will still fail to identify the prosodic boundary correctly.

To develop our analysis, we first examined prosodic phrasing in the speech of one of us reading prose from various texts, including four instruction manuals. These texts were later augmented by a professional reading of a prose story. The boundaries between prosodic phrases were identified and then classed according to their syntactic context and semantic function.

Our results, which are outlined below, indicate an organization of the prosodic phrases that supports the 'indirect relationship' approach of Gee and Grosjean (1983) and Selkirk (1984). We found that, in our corpus, prosodic phrasing depends on three aspects of structure: the breakdown into syntactic constituents, the grammatical function of a constituent, and constituent length. Let us review each of these factors.

Syntactic Constituency

The possible constituents recognized by our parser are Noun Phrase (NP), Verb Phrase (VP), Adjective Phrase (AdjP), Adverb Phrase (AdvP), and Prepositional Phrase (PP). In general, we found that syntactic constituency is particularly important for predicting points at which a prosodic phrase boundary is not produced, i.e., the words within a syntactic constituent cohere. For example, the italicized phrases in (1)-(5) had no perceptible boundaries at the locations indicated by #:

(1) Left-hand # power unit is connected
(2) This procedure shows # you
(3) An extremely # narrow opening
(4) To spread powerload more # evenly
(5) next # to any powered di-group

The single exception to word cohesion within syntactic constituents involved boundaries between the verb and its first or second object when the object in question was lengthy. We discuss this exception below.

³ Gee and Grosjean (1983) use a corpus of 14 sentences. Umeda (1982) considers a large corpus but, like Gee and Grosjean, does not distinguish among grammatical functions. Although Selkirk cites many examples in her discussions of phrasal stress and word-level prosody, her description of prosodic phrasing focusses on only a single example.

Grammatical Functions

Our sample indicated that phrase boundaries are also determined by the grammatical relations among the syntactic constituents, i.e., the argument structure of the sentence. Four grammatical relations concern us:

(a) subject-predicate, as in The 48-channel module has two di-groups

(b) head-complement, where the head can be a noun, verb, or adjective and may have one complement, e.g., has two di-groups, or two complements, e.g., shows you how to fly your kite

(c) sentence-adjunct, as in Insert unit into correct shelf location per detail instructions

(d) head-modifier, where the head can be a noun, verb, adverb, or adjective and the modifier can be one of several things, depending on the head (e.g., for nouns, the modifier can be a relative clause; for verbs, it can be a prepositional phrase; for adjectives and adverbs, the modifier can be a comparative)

We observed a hierarchy among these relations with respect to the strength, or perceptibility, of a prosodic boundary, with the boundary between sentence and adjunct receiving the highest potential boundary strength, followed by the subject-predicate boundary, then the head-complement and head-modifier boundaries. Thus in (6), there is a strong boundary between subject and predicate, whereas in (7), due to the strong boundary between adjunct and core sentence, the subject-predicate boundary diminishes. (Dashes indicate the location of the boundary being discussed.)

(6) The name of the character is not pronounced

(7) When this switch is off the name of the

character is not pronounced
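
The hierarchy among grammatical relations can be pictured as a simple ranking. The following is a minimal sketch, not the rule set of the system itself: the numeric ranks, the tie between head-complement and head-modifier, and the names RELATION_RANK and strongest_boundary are assumptions introduced only for illustration.

```python
# Illustrative ranking of grammatical relations by potential boundary strength.
# The numeric values are assumptions; only their relative order comes from the text.
RELATION_RANK = {
    "sentence-adjunct": 3,
    "subject-predicate": 2,
    "head-complement": 1,  # assumed here to tie with head-modifier
    "head-modifier": 1,
}


def strongest_boundary(boundaries):
    """Return the (word_before_boundary, relation) pair with the highest rank.

    Ties resolve to the leftmost boundary, because max() returns the
    first maximal element it encounters.
    """
    return max(boundaries, key=lambda b: RELATION_RANK[b[1]])


# Sentence (7): an adjunct boundary after "off" and a subject-predicate
# boundary after "character".
sentence_7 = [("off", "sentence-adjunct"), ("character", "subject-predicate")]
print(strongest_boundary(sentence_7))
```

Under these assumed ranks the adjunct boundary wins in (7), which mirrors the observation that the subject-predicate boundary diminishes when a stronger adjunct boundary is present.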

Constituent Length

While we may view each boundary as having an intrinsic strength based on constituency and grammatical function, the determination of actual strengths appears to depend on the interaction of the intrinsic strength of a boundary with the strengths of other boundaries in the sentence, as well as the distance between these boundaries. The most salient of the interactions we observed was between the placement of a boundary at the subject-predicate junction and the placement of a boundary following the verb-complement junction. The mediating factor in this interaction was the relative length of the subject with respect to the length of the verb's complements. Thus a sentence such as (8), with both a short subject and a single short object, generally is produced without a boundary in either position.

(8) You have completed the task

But if, as in (9), the subject is long relative to the object, then a break occurs between the subject and predicate. Conversely, if the subject is short relative to the object, then a break will occur between the verb and the object, as in (10). Or, if there are two objects and the first is simple, the break will occur between them, as in (11).

(9) The materials required are one kite kit
(10) How shall we judge the goodness of an algorithm?
(11) This procedure shows you how to fly your kite

AN EXPERIMENTAL PROSODY SYSTEM

Our findings confirmed that syntactic structure plays a major role in determining prosodic structure, but the relationship is indirect: the exact influence of syntactic constituency varies according to the length and grammatical function of each constituent. To refine and test this idea, we implemented an experimental text-to-speech system in which rules apply to a parse tree to infer prosodic structure and then annotate the input string with phrasing information derived from the prosodic structure; this annotated input string is submitted to the Bell Labs text-to-speech programs, which convert it into a speech file. Our system comprises three components: a parser that builds syntactic structure, rules that derive prosody information from the syntactic structure, and the Bell Labs text-to-speech programs. The parser and speech programs are independent components. The prosody rules act as a filter between them, converting the syntactic information generated by the parser into prosodic information that can be supplied to the text-to-speech programs.

Parsing

Our parser is a version of Fidditch (Hindle 1983), a moderate coverage parser based on the deterministic model described in Marcus (1980). To build syntactic structure, Fidditch uses a grammar that requires the representations produced by lexical and syntactic rules to be consistent with the (semantic) predicate-argument structure. The surface syntactic structures generated by the parser represent the argument structure of a phrase or sentence, i.e., the "core" constituents of a sentence (its subject (NP), modality (AUX), and predicate (VP)) and the complements of phrasal heads. The structure is determined, for the most part, by rules that refer to argument information that is specified in the lexicon for the content words (nouns, verbs, adjectives, adverbs), and by rules that insert null terminals such as the "trace" of wh-movement. In general, the grammar is consistent with the government and binding framework of Chomsky (1981), as adapted to the needs of a parser.

The input to the parser is a phrase or sentence (punctuation is optional). Its output is a surface structure tree in which the status of a constituent with respect to the predicate-argument structure of the sentence is indicated by the constituent's attachment to higher nodes in the tree. Thus only constituents that belong to the core are attached to the S node, and only complements of a phrasal head can become righthand sisters of the head. Adjuncts and modifiers, whose role depends on semantic and pragmatic information about the discourse domain, have no

represented as "orphan" nodes in the tree.

For example, Figure 1 shows the parse tree for Left-hand power unit on each shelf in 48-channel module can power only the echo cancelers that are in that shelf.⁴ The structure in Figure 1 contains a single core

each shelf and in 48-channel module, and the adverb only, which are unattached constituents. This is the significance of the unlabeled node dominating each of these constituents. The PPs are not attached because unit is not lexically marked to take a PP headed by on

marked to take a PP complement headed by in. Nor is

argument

that shelf. In the relative clause, T is a null terminal that stands for the trace of the relativized subject NP; the * in tense stands for a null Aux element. Because nouns do not select relative clauses as arguments (any noun can be relativized), the parser does not identify the relations of the modifier constituent to the elements of the core sentence. Hence the relative clause is not attached to any other syntactic node in the tree.

Text-to-speech Synthesis

The programs that make up the speech component are described in Liberman and Buchsbaum (personal communication). These programs take English text as

annotating the input text to this system, many aspects of its operation can be overridden or modified: e.g., the location of major and minor phrase boundaries, the stress given to words, the transcription of words and the boundaries between them, the timing of segments, and details of the pitch contour. As we will show, with our prosody system we are able to produce strings in which four boundary levels are identified and perceptually distinguished, using the current text-to-speech system annotations.

Prosodic Phrasing

constituent structure, grammatical role, and length to map a surface structure such as that in Figure 1 onto a prosody tree such as that in Figure 2. The prosody tree identifies the location of phrase boundaries (signified by the Φ nodes) and the relative strength of each boundary (signified by a number in the Φ node). It is this information that is used to annotate the input text with escape sequences that provide the text-to-

phrasing.

In formulating our rules for building the prosodic

implementing the model of Gee and Grosjean (1983). This model, initially proposed to predict a form of

prosodic boundaries from a syntactic tree, but assumes rather than explicitly presents a syntactic component.

We were initially attracted to the Gee and Grosjean model because of its emphasis on relative boundary weighting, i.e., on the determination of the strength of a given boundary with respect to the other boundaries in the sentence. We found that in the data we had collected, this weighting played an important role. In fact, we incorporated directly into our system one method of doing this weighting, namely Gee and Grosjean's rule to determine the strengths of the

relative length (as measured by terminal node count).

As we extended Gee and Grosjean's model to create an algorithm adequate for use in a general purpose system, our algorithm diverged from its starting point, reflecting our attempts to correct weaknesses and lacunae that we encountered in the Gee and Grosjean model. That we encountered these

between our goals and those of Gee and Grosjean. The most important difference between the Gee

involves the factors determining boundary weight. Gee and Grosjean assume that this weighting is dependent only on the number of syntactic nodes, their left-to-right ordering and, in the case of the verb phrase, on constituent length. In contrast, our data, in agreement with Selkirk's (1984) theoretical analysis, indicated that boundary strength is dependent on the grammatical functions that the constituents in a given sentence play. In particular, we observed a hierarchy among these functions with respect to boundary strength, as discussed below.⁵

In addition to incorporating grammatical function information into our system, we fleshed out the model of Gee and Grosjean to deal with syntactic structures that they do not explicitly consider. In particular, Gee and Grosjean's strictly left-to-right building of the prosodic tree left certain questions open. For example, their model does not deal with sentences embedded in the middle of a main sentence (as in The notion [that he would refrain from such an act] was incorrect). We incorporate embedded sentences into the prosodic tree in a cyclic manner to ensure that the material in the embedded sentence is processed before that in the main sentence.⁶ In addition, Gee and Grosjean leave open the treatment of the multiple rightward embedding of non-sentential constituents, e.g., the NP embedding in The destruction of the good name of his father. Our approach is to handle these cases recursively, from the most deeply embedded phrase up, in order to preserve the prosodic cohesion of the entire NP.

⁵ As an example of the effect that grammatical functions have

young man left. We view this sentence as consisting of two grammatical relations: subject-predicate and adjunct-sentence. In our hierarchy of grammatical relations, the boundary between the adjunct and the sentence is more salient than the boundary between the subject and the predicate. The system

If we exclude any effects of grammatical functions and assume a simple left-to-right attachment of the three

prosody tree, we would assign a stronger boundary following man than following Finally. It is not clear that Gee and Grosjean make this left-to-right assumption in such examples

complementizer node in the syntax tree and it is difficult to determine whether they integrate the material in the complementizer with the material in the core sentence as they are analyzing the material in the core sentence or after that analysis is completed. If they integrate the complementizer with the sentence in a left-to-right manner and predict, incorrectly, that the stronger boundary occurs after man. If they complete the prosodic analysis of the core sentence before bundling the sentence with the complementizer, then they incorrectly predict that there is a strong boundary after

problems diayou expect the most perceptible boundary would

Furthermore, assuming that an adjunct in sentence-initial position is dominated by the complementizer node and in sentence-final position by S-bar creates an inconsistent description, which hampers the value of the model as an experimental tool.

Our adjunction rules are derived for the most part from Selkirk's account. We have also made use of the idea, which Gee and Grosjean (1983) take largely from the work of Selkirk, that certain syntactic heads mark off phonological phrase boundaries and provide the basic prosodic constituents for higher level analysis.

Our prosody rules run in four independent stages. Each stage builds on the previous stage, so that the rules can refer to both syntactic and prosodic structure as they build successively higher levels of prosodic structure.

(i) Adjunction Rules combine orthographically distinct words into phonological constituents with no internal word boundary. They join a word to its left or right neighbor depending on (a) the category of the word, and (b) its structural relation to other words. In general, adjoinable words are the function words: articles, complementizers, auxiliary verbs, conjunctions, prepositions, and pronouns (except for the "strong" possessives, mine, hers, theirs, yours, ours, which are treated as regular NP's).

Adjunction occurs six times for the sentence in Figure 2 to create six multiple word groups, all right-adjoining: on each, in 48-channel, can power, the echo,

appear as terminals in the prosody tree in Figure 2. In subsequent processing the boundaries between the words in these groups are marked so that the text-to-speech system does not produce the prosodic indications of a word boundary. In addition, these groups are treated as single words in further analyses.
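
A rough picture of right-adjunction is given below. It is a simplified sketch under stated assumptions, not the rule set described here: it joins every function word to its right neighbor, ignores the structural conditions and left-adjunction that the actual rules use, and relies on a small hypothetical word list (FUNCTION_WORDS) rather than on parser categories.

```python
# Hypothetical, deliberately tiny function-word list (an assumption for this sketch).
FUNCTION_WORDS = {"a", "an", "the", "on", "in", "of", "to",
                  "can", "will", "is", "are", "that", "and", "or"}
# "Strong" possessives are treated as regular NPs and never adjoin.
STRONG_POSSESSIVES = {"mine", "hers", "theirs", "yours", "ours"}


def adjoin(words):
    """Join each function word to its right neighbor, scanning left to right."""
    groups = []
    i = 0
    while i < len(words):
        word = words[i].lower()
        adjoinable = word in FUNCTION_WORDS and word not in STRONG_POSSESSIVES
        if adjoinable and i + 1 < len(words):
            # The pair becomes one phonological word; the neighbor is consumed.
            groups.append(words[i] + " " + words[i + 1])
            i += 2
        else:
            groups.append(words[i])
            i += 1
    return groups


sentence = ("Left-hand power unit on each shelf in 48-channel module "
            "can power only the echo cancelers that are in that shelf").split()
groups = adjoin(sentence)
multi_word = [g for g in groups if " " in g]
print(multi_word)
```

On the Figure 2 sentence this simplification yields six two-word groups, including on each, in 48-channel, can power, and the echo; whether the remaining two groups coincide with those in Figure 2 is not something the sketch can establish.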

(ii) Φ-phrasing Rules construct phonological (or Φ) phrases, which are the building blocks of the prosody tree. These rules identify groups of words that cohere strongly in speech and thus should not be separated by phrase boundaries. In the present implementation, each Φ phrase is constructed by a left-to-right process that collects the words formed by adjunction until it reaches a noun or verb. At this point, a Φ phrase is created that consists of the collected words plus the noun or verb, which acts as head of the phrase. For example, in that shelf in Figure 2 is a single Φ phrase consisting of two words.

In Figure 2, the Φ nodes marked with a syntactic category are the minimal phonological constituents with respect to later rules that build the prosodic phrases; these Φ phrases have an internal structure, but the structure plays no role in further processing. Note that neither adjectives nor adverbs are allowed to be the head of a Φ phrase, so that three additional open slots is a single Φ phrase consisting of four words. Examples such as Someone tall walked into the room, however, suggest that our treatment of these categories is not detailed enough and that, in future versions of the system, some adjectives and adverbs should act as Φ heads.

⁶ Having taken this strong approach, we now understand the limited exceptions to this mechanism, which we discuss below.
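
The left-to-right collection of words into Φ phrases can be sketched as follows. This is an illustration under assumptions rather than the implemented rules: the category tags ("N", "V", "ADJ", and so on), the function name phi_phrases, and the handling of trailing material with no noun or verb head are all introduced here for the example.

```python
from typing import List, Tuple

# Only nouns and verbs may head a phrase in the present implementation.
HEAD_CATEGORIES = {"N", "V"}


def phi_phrases(words: List[Tuple[str, str]]) -> List[List[str]]:
    """Group (word, category) pairs into phrases that each end at a noun or verb."""
    phrases: List[List[str]] = []
    pending: List[str] = []
    for text, category in words:
        pending.append(text)
        if category in HEAD_CATEGORIES:
            # The head closes the phrase together with the collected words.
            phrases.append(pending)
            pending = []
    if pending:
        # Assumption: leftover words with no head attach to the last phrase.
        if phrases:
            phrases[-1].extend(pending)
        else:
            phrases.append(pending)
    return phrases


# "in that" is a single word after adjunction, so this phrase has two words.
print(phi_phrases([("in that", "P"), ("shelf", "N")]))
# Adjectives do not close a phrase, so all four words end up together.
print(phi_phrases([("three", "NUM"), ("additional", "ADJ"),
                   ("open", "ADJ"), ("slots", "N")]))
```

Allowing some adjectives and adverbs to act as heads, as suggested for future versions, would amount to extending HEAD_CATEGORIES in this sketch.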

(iii) Prosody-phrasing rules use i n f o r m a t i o n about phrases and syntactic s t r u c t u r e to c r e a t e a new

o r g a n i z a t i o n of the sentence and to assign strength values to the b o u n d a r i e s b e t w e e n successive • phrases The process of building the prosody tree starts with the sentence node (S or Sbar) that is most deeply

e m b e d d e d in the u t t e r a n c e , t r a n s f o r m i n g it into a prosody subtree This process continues through successively higher levels of sentence nodes until all top-level sentences have b e e n t r a n s f o r m e d into prosody subtrees All the processing of each successive sentence is done b e f o r e the relation of the sentences to e a c h other is c o n s i d e r e d 7

W i t h i n a s e n t e n c e , the • phrases are p r o c e s s e d from left to right This stage of the analysis uses a window that allows access to t h r e e a d j a c e n t nodes

P a t t e r n - a c t i o n rules, which are d e s c r i b e d below, apply

to the nodes in the window and build p r o s o d y subtrees that r e p l a c e the syntax nodes T h e s e subtrees are

h e a d e d by a • node containing a n u m b e r that

r e p r e s e n t s node count; the n u m b e r is d e t e r m i n e d by counting the n u m b e r of nodes c o n t a i n e d in the

p r o s o d y a s u b t r e e , plus 1 for the • node that heads the subtree In g e n e r a l , the prosody p h r a s e rules do t h r e e things:

(a) Balance prosodic phrases by r e f e r r i n g to

c o n s t i t u e n t length This rule only applies for building the p r o s o d y subtree that contains the verb If the node count for subject plus verb is less than the node count of the verb's c o m p l e m e n t , then subject and verb are g r o u p e d t o g e t h e r in a prosodic subtree; this gives the phrasing in The characters on the right mark the salient f e a t u r e s O t h e r w i s e , the verb is g r o u p e d with its c o m p l e m e n t in a prosodic subtree; an e x a m p l e of this grouping is the s u b t r e e for can p o w e r only the echo cancelers in Figure 2,

(b) C o m b i n e the • p h r a s e d a u g h t e r s of the major constituents, excluding VP, into a prosodic subtree

A t p r e s e n t , this rule only applies to NP and PP since adjectives and adverbs are c u r r e n t l y not t r e a t e d as @ heads F o r e x a m p l e , the name o f the character, which forms two d~ phrases under NP (the name and of the

r e p l a c e s the NP

7 We have found at least one class of phrases for which this order of processing appears inappropriate. In these, the head of the top-level phrase is epistemic, e.g., believe, know, belief, knowledge, and its complement is a sentence. In most cases, the current processing order for embedded sentences will produce a break between a head and a following embedded sentence. For this class of sentences, however, the break does not seem to be appropriate. While it would be straightforward to handle this as an exception, we are currently examining whether there is a more principled way to describe what must be done in these cases.

8 Only the top-level φ nodes, those which contain the head of the syntactic phrase, are counted in computing the node count. In Figure 2, the sub-phrasal branching of left-hand and power unit does not contribute to the node count.


(c) Bundle together prosodic constituents (φ phrases) from left to right if no other rules apply. This rule integrates the constituents left unattached by the parser into the prosodic structure. It accounts for the prosodic structure of left-hand power unit on each shelf in 48-channel module in Figure 2, which is formed by first bundling left-hand power unit with on each 48-channel module into φ-5. The final application of bundling replaces the Sigma node with the top level prosody node, which is φ-13 in Figure 2.
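The balancing test in rule (a) can be illustrated with a minimal sketch. It assumes the node counts of the subject, verb, and complement subtrees are already known; the `Constituent` class, the `group_verb` function, and the counts used in the usage note are hypothetical and are not taken from the system itself.

```python
from dataclasses import dataclass


@dataclass
class Constituent:
    """A constituent with the node count of its prosody subtree (hypothetical type)."""
    words: str
    node_count: int


def group_verb(subject: Constituent, verb: Constituent, complement: Constituent):
    """Rule (a): group the verb with the subject only when subject plus verb
    has a smaller node count than the verb's complement; otherwise group the
    verb with its complement."""
    if subject.node_count + verb.node_count < complement.node_count:
        return ("subject+verb", subject.words + " " + verb.words)
    return ("verb+complement", verb.words + " " + complement.words)
```

With illustrative counts, `group_verb(Constituent("they", 2), Constituent("mark", 2), Constituent("the salient features of the text", 6))` selects the subject-plus-verb grouping, while equal counts on both sides fall through to the verb-plus-complement grouping, since the rule as stated requires a strictly smaller count.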

(iv) Prosody conversion rules map the boundary strength indices onto three phonological mechanisms. Boundary indices in the low range, e.g. the φ-3 nodes in Figure 2, are realized as a phrase accent (Pierrehumbert 1980). Mid-range indices such as φ-5 and φ-9 in Figure 2 are realized as changes in pitch range. High indices are realized with modulations in both pitch range and duration. Thus the hierarchical organization of a structure such as that in Figure 2 can be reflected directly in the synthesized speech.
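The three-way mapping can be sketched as a small lookup. The text gives only example indices (3 as low, 5 and 9 as mid-range), so the cut-off values `LOW_MAX` and `MID_MAX` below are assumptions chosen to agree with those examples, and the function name `realize_boundary` is hypothetical.

```python
LOW_MAX = 3  # assumption: indices up to 3 count as "low" (the text cites phi-3)
MID_MAX = 9  # assumption: indices up to 9 count as "mid-range" (the text cites phi-5, phi-9)


def realize_boundary(index):
    """Return the phonological mechanisms used to realize a boundary index."""
    if index <= LOW_MAX:
        return ["phrase accent"]
    if index <= MID_MAX:
        return ["pitch range"]
    # High indices modulate both pitch range and duration.
    return ["pitch range", "duration"]
```

Under these assumed thresholds an index of 13 falls in the high range; whether the system treats the top-level node that way is not stated in the text.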

PHENOMENA NOT TREATED

Several phenomena have been omitted from this preliminary version of the system. Some of these omissions arise from the fact that we concentrated on sentence analysis rather than discourse analysis. Others involve phenomena that characterize spoken English, and thus did not occur in our original corpus of technical repair manuals.

Contrastive stress is an example of prosodic phrasing based on discourse analysis. In our system's analysis, the phrase from India does not receive contrastive stress in (12).

(12) Passengers from several countries entered the terminal. Finally a man from India walked in.

In designing the current system, we have concentrated on the level of sentence analysis. Handling the contrasts involved in data like (12) necessitates an additional level of discourse analysis.

In addition, the system never explicitly manipulates segment durations or overall speech rate. For example, we have yet to explore whether lengthening of the segment before a mid-range boundary value is appropriate, or whether increasing the duration of constituents of the core sentence might enhance the natural sound of the system.

RESULTS AND FUTURE RESEARCH

To date our system has been tested systematically on a set of 39 sentences, and its performance has been observed less formally on a set of approximately 300 sentences.⁹ The test corpus covers a repair manual for telephone switching systems and an introductory description of the Prose 2000 text-to-speech system. We added sentences cited in Umeda (1982) and sentences that we composed in order to extend the range of syntactic constructions represented in the test. In general, we have observed a significant improvement of prosodic quality in those test sentences where the parser and the prosodic component have returned acceptable results.

9 The 39 sentences are listed in the appendix to this paper.

We have observed problems, however, especially in the formal test corpus, much of which we chose for its potential difficulty. Of the 39 test sentences, 38 parsed correctly. Of these, the prosodic component returned 26 sentences with a complete set of acceptable prosody markings. In terms of actual markings, the system marked 393 prosodic events, of which 21 markings were unacceptable. We can attribute errors in those sentences with unacceptable prosodic markings to three distinct problems discussed below.
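The figures above can be turned into rates with a few lines of arithmetic. The counts are those reported in the text; the variable names and the `percent` helper are illustrative only.

```python
SENTENCES_TESTED = 39
SENTENCES_PARSED = 38
SENTENCES_FULLY_ACCEPTABLE = 26
PROSODIC_EVENTS = 393
UNACCEPTABLE_MARKINGS = 21


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


parse_rate = percent(SENTENCES_PARSED, SENTENCES_TESTED)                 # 97.4
sentence_rate = percent(SENTENCES_FULLY_ACCEPTABLE, SENTENCES_PARSED)   # 68.4
event_error_rate = percent(UNACCEPTABLE_MARKINGS, PROSODIC_EVENTS)      # 5.3
```

The per-event error rate of roughly 5 percent is much lower than the per-sentence failure rate because the test sentences are long and a single unacceptable marking disqualifies a whole sentence.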

Complement Sentences

Five of the errors that arose from the prosody system's treatment of the test corpus result from the fact that the system sets off all subordinate sentences, including complement sentences, from the main sentence. Informal testing of the productions of four informants on the relevant data indicated that this approach works correctly for complement sentences such as (13)-(16) (complement sentences are italicized):

(13) Health services cautioned Western residents that they should ask where their watermelons come from before buying.

(14) We have to satisfy people that the crisis is past.

(15) The vendors explained that this is the result of illness among 281 people who ate pesticide-tainted watermelons.

(16) Watermelon growers wonder whether this will continue throughout the rest of the season.

However, the informant test consistently indicated that the complement sentences in (17)-(19) are not set off by a comparable boundary:

(17) They believe California sales are still off 75 percent.

(18) They think the Southeast is shipping half its normal load.

(19) Growers and retailers claimed the incident hurt sales across the USA.

Cases like (17)-(19), in which no break is perceived between the verb and its complement sentence, form a syntactically distinct class in Fidditch. This class is characterized by the fact that the verbal head in each case is one that does not require that its complement sentence begin with a complementizer (either that, for, or a wh- word). The class includes epistemic verbs, like those in (17)-(19), as well as a wide range of verbs that take either tensed sentences, or various types of non-tensed sentences as complements.¹⁰ The examples (20)-(26) demonstrate the range of this class (complement sentences are italicized):

10 Fidditch, in following the outlines of Chomsky's (1981) Government and Binding theory, assumes that propositions, i.e., those elements that contain both a predicate and a perhaps null subject, are syntactically represented as sentences, regardless of tensing.


(20) We had the ship's forces make temporary

repairs

investigation impractical

advance

repairs
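The distinction between the two classes of complement sentences could be used to decide whether a boundary sets off the complement, as in the minimal sketch below. This is not how the tested system behaves (it sets off every subordinate sentence); the verb set, drawn from examples (17)-(19), and the function name `set_off_complement` are assumptions made for illustration.

```python
# Hypothetical lexicon: verbs that do not require a complementizer before
# their complement sentence (lemmas taken from examples (17)-(19)).
NO_COMPLEMENTIZER_REQUIRED = {"believe", "think", "claim"}


def set_off_complement(verb_lemma):
    """Return True when a boundary should set the complement sentence off
    from its verb, following the informants' productions."""
    return verb_lemma not in NO_COMPLEMENTIZER_REQUIRED
```

For instance, `set_off_complement("explain")` is true, matching (15), while `set_off_complement("believe")` is false, matching (17).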

Sentence-Final Constituents

Fifteen of the errors that arose from the system's treatment of the test corpus result from a high boundary value that sets final constituents off from the main sentence. The high value is due to the system's purely left-to-right attachment of syntactically unattached constituents (see rule iii.d above). The high boundary value is acceptable in sentences like

examples are italicized)

(27) In these instances it may be desirable to use phoneme characters instead of text characters in the input text.

(28) Phonemic characters can also be used to handle syntactic data such as boundaries which can improve speech quality.

to equipment failure

However, the high boundary value sets the final constituent off unnaturally from the main sentence in data such as (30)-(32).

(30) The method by which you convert a word into phonemes is provided in Chapter 7.

(31) The experimenters instructed the informant

implemented

In many cases it appears that the grammatical relation of the final constituent to the rest of the sentence determines the boundary value that sets off

which bear no relation to any single item in a sentence, are set off by a minor phrase boundary whereas final constituents that modify a particular

distinction between the final constituents in (27)-(29), which are adjuncts, and those in (30)-(32), which are modifiers. However, while the distinction between the

(complement and subject) and those of the periphery (adjunct and modifier) is fairly straightforward, and handled directly by the mechanisms of the Fidditch

elements of adjunct and modifier are complex and require the addition of costly mechanisms.

The cost of adding adjunct/modifier distinctions is illustrated by the ambiguity that arises when both

more clearly, consider the rearrangement of this

they instructed the informants to speak.) The context of speech analysis prefers the former reading. However, the net benefit of adding sophisticated contextual analysis to our system, if attainable, is, at best, unclear. The same may be said of adding selectional restrictions, or detailed information on logical form.

In contrast, a finer treatment of local syntactic constituents is within reach. From the data we have examined, it appears that the character of the prosodic event before the final constituent can be locally determined to a great extent. For the most part this determination depends on the category type of the final constituent and on the contents of the leading edge of the constituent. For example, interjections (however, moreover, therefore, alas, thus, of course, etc.)

etc.) are uniformly set off by a high boundary value and should remain so. In contrast, the boundary value of final prepositional phrases, particularly those with

11

are currently engaged in categorizing the constituent

constituents with respect to the prosodic event that precedes them.

Alternatively, we are considering the play-it-safe approach of reducing the high boundary values that set off final constituents to mid-boundary values

useful in conjunction with our local determination approach for those constituents whose status is either undecidable or ambiguous under the latter approach.¹²

particular, in consideration of, etc. must be treated like interjections

12 Reducing the final boundary value leaves ambiguities unresolved. For sentences such as (i) and (ii), below, we believe this lack of resolution is appropriate:

(i) John saw a girl in the park with a telescope.
[The telescope is with John or the girl or it's in the park.]

(ii) I need a woman to fix the sink.
[I need a woman so that I can fix the sink. I need a woman who can fix the sink.]

spoken English, such ambiguities are not processed unless the speaker or listener is directly questioned regarding the

disambiguate are inappropriate unless such questioning occurs. Other cases are less clear. For example, it is difficult to imagine that, in (28), the difference between the reading of the which clause as a sentence adjunct and as a noun-phrase

such cases some local distinction, such as the presence or absence of the comma in (28), obtains.
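The local determination of the boundary before a final constituent, combined with the play-it-safe fallback, might look like the following sketch. The interjection list is the one given in the text; the category labels, the function name `final_boundary`, and the choice to fall back to a mid value for everything that is not an interjection are assumptions, since the categorization of the remaining constituent types is described as work in progress.

```python
# Interjections listed in the text; these keep a high boundary value.
INTERJECTIONS = {"however", "moreover", "therefore", "alas", "thus", "of course"}


def final_boundary(category, leading_edge):
    """Pick the boundary value that sets off a sentence-final constituent.

    category: syntactic category label of the final constituent (hypothetical labels)
    leading_edge: the word or phrase at the constituent's left edge
    """
    if category == "interjection" or leading_edge.lower() in INTERJECTIONS:
        return "high"
    # Play-it-safe fallback: reduce the high value to a mid-boundary value
    # for constituents whose status is undecided.
    return "mid"
```

A fuller version would add cases for the other category types and leading edges once they have been classified; the fallback then applies only to the undecidable or ambiguous remainder.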


Sentence-Initial Constituents

When a sentence contains both sentence-initial and sentence-final adjuncts, the sentence-initial adjuncts will be less prominently set off than the sentence-final adjuncts due to the left-to-right attachment of adjuncts to the prosodic tree (see rule iii.b above). In data like (33), however, a more appropriate rendering would have the boundary after the adjunct on a clear day be strong relative to the boundary before the adjunct as it rises over the mountains.

(33) On a clear day you can see the sun as it rises over the mountains.

While it would be trivial to increase the value of the pertinent boundary, we are as yet unsure what the critical features are which require a more perceptible boundary. For example, while a higher boundary value after the prepositional phrase in (34) might be acceptable, it is not clear that it is necessary:

(34) In the morning John left.

Given the stylistically distinct nature of this data, we have not yet considered this question in detail.
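Why left-to-right attachment favors the final adjunct can be seen in a simplified sketch. It folds a list of phrases into a left-branching tree and reports a level for each boundary. Tree height is used here as a stand-in for the system's node-count index, which is an assumption; the function names are hypothetical.

```python
def bundle_left_to_right(phrases):
    """Fold phrases into a left-branching binary tree of nested tuples."""
    tree = phrases[0]
    for phrase in phrases[1:]:
        tree = (tree, phrase)
    return tree


def height(tree):
    """Height of a tree: 0 for a bare phrase, 1 + tallest daughter otherwise."""
    if isinstance(tree, str):
        return 0
    return 1 + max(height(daughter) for daughter in tree)


def boundary_levels(tree):
    """Levels of the boundaries between phrases, in left-to-right order."""
    if isinstance(tree, str):
        return []
    left, right = tree
    return boundary_levels(left) + [height(tree)] + boundary_levels(right)


phrases = ["on a clear day", "you can see the sun", "as it rises over the mountains"]
levels = boundary_levels(bundle_left_to_right(phrases))  # [1, 2]
```

The boundary after the initial adjunct sits at level 1 and the boundary before the final adjunct at level 2, which mirrors the asymmetry described for (33).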

Summary

While we have systematically tested our system so far on a small set of examples, the number of prosodic events involved in those examples, 393, is high, due to the length of the sentences tested. We find the 5 percent error rate, representing 21 prosodic events, encouraging at this stage in the development of the system. In addition, we have delimited the problem areas of an approach that relies solely on information available in the syntax tree. Our initial investigation of these problems indicates that at least part of the necessary information about phrase-level prosody is conveyed in the lexicon per se. Additionally, due to the left-corner orientation of the Fidditch parser, which exists independently to optimize search strategies, the necessary lexical information is made easily available.

CONCLUSIONS

We have described an on-line experimental system that uses prosody rules to infer prosodic phrasing from constituent structure, grammatical functions, and length considerations. The system contains three modules: a deterministic parser, a set of prosodic phrasing rules, and an algorithm to convert the output of the prosodic phrasing rules into signals for the Bell Labs text-to-speech system.

In developing the experiment, our intention was to build a working system that would allow us to test various hypotheses about the connections between syntax and prosodic phrasing in human speech and to upgrade the prosody of existing synthetic speech. The modularity of our system enables us to alter each module independently in order to test different hypotheses. For example, the parser can be altered to reflect the difference between verbs that require a complementizer before a sentential complement and those that do not.¹³ This alteration is independent of the workings of the prosody system or the prosody conversion rules.

13 Fidditch represents this as a difference in the level of the complement sentence. Verbs that require a complementizer take an S-bar complement, while verbs that do not require a complementizer take an S complement with an optional that preceding.

The existence of this prosody system makes the problem areas in the syntax-prosody relation more tractable by allowing online testing of a large body of data. For example, the prosodically different character of the two classes of complement sentences discussed above became apparent after several examples from each class were run through the system. We therefore feel we have built a tool that will aid in designing better approximations of sentence prosody as it relates to syntactic structure.

REFERENCES

Allen, J. 1976. Synthesis of speech from unrestricted text. Proceedings of the IEEE, 4, 433-442.

Chomsky, N. 1981. Lectures on government and binding. Dordrecht: Foris Publications.

Cooper, W. and J. Paccia-Cooper. 1980. Syntax and speech. Cambridge, MA: Harvard University Press.

Elovitz, H., R. Johnson, A. McHugh, and J. E. Shore. 1976. Letter-to-sound rules for automatic translation of English text to phonetics. IEEE Transactions on Acoustics, Speech, and Signal Processing, 6, 446-459.

Gee, J. P. and F. Grosjean. 1983. Performance structures: a psycholinguistic and linguistic appraisal. Cognitive Psychology, 15, 411-458.

Hindle, D. 1983. User manual for Fidditch, a deterministic parser. NRL Technical Memorandum #7590-142.

Luce, P. A., T. C. Feustel, and D. B. Pisoni. 1983. Capacity demands in short-term memory for synthetic and natural speech. Human Factors, 25, 17-32.

Marcus, M. 1980. A theory of syntactic recognition for natural language. Cambridge, MA: MIT Press.

Pierrehumbert, J. B. 1980. The phonetics and phonology of English intonation. Ph.D. Dissertation, MIT.

Selkirk, E. O. 1984. Phonology and syntax: the relation between sound and structure. Cambridge, MA: MIT Press.

Umeda, N. 1982. Boundary: perceptual and acoustic properties and syntactic and statistical determinants. Speech and Language, 7, 333-371.

Umeda, N. and R. Teranishi. The parsing program for automatic text-to-speech synthesis developed at the Electrotechnical Laboratory in 1968. IEEE Transactions on Acoustics, Speech, and Signal Processing, 23, 183-188.

APPENDIX: TEST SENTENCES

1 THE NAME OF THE CHARACTER IS NOT PRONOUNCED

2 LEFT-HAND POWER UNIT ON EACH SHELF IN FORTY-EIGHT CHANNEL MODULE POWERS ONLY ECHO CANCELLERS IN THAT SHELF


3 THE CONNECTION MUST BE DETERMINED FOR THE LEFT-HAND POWER UNITS ON EACH SHELF

4 THE CONNECTION MUST BE DETERMINED FOR THE LEFT-HAND POWER UNITS WHICH ARE ON EACH SHELF

5 THE METHOD BY WHICH ONE CONVERTS A WORD INTO PHONEMES IS PROVIDED IN CHAPTER 7¹⁴

6 WE DISCUSSED THE TECHNIQUES WE HAD IMPLEMENTED

7 THE TECHNIQUES WE HAD IMPLEMENTED WERE TESTED ON A LARGER MACHINE

8 THE MAN WHOM WE SAW YESTERDAY LIVES FAR AWAY FROM HERE

9 THEY TOLD HIM TO WALK SLOWLY

10 THE DESTRUCTION OF THE GOOD NAME OF HIS FATHER BOTHERED HIM

11 LATELY HE HAD HAS CONTROL OVER THE SITUATION

12 I NEED A WOMAN TO FIX THE SINK

13 JOHN MET A WOMAN HE THOUGHT HE LIKED

14 THE WOMAN I SAW CAME FROM HERE

15 IN THESE INSTANCES IT MAY BE DESIRABLE TO USE PHONEME CHARACTERS INSTEAD OF TEXT CHARACTERS TO REPRESENT A WORD EACH TIME IT APPEARS ON THE INPUT TEXT

16 PHONEME CHARACTERS GIVE MORE CONTROL OVER THE PARTICULAR SOUNDS THAT ARE GENERATED

17 THE MATERIALS REQUIRED ARE ONE KITE KIT

18 PHONEMIC CHARACTERS CAN ALSO BE USED TO HANDLE SYNTACTIC DATA SUCH AS THE BOUNDARIES WHICH CAN IMPROVE SPEECH QUALITY

19 IT MAY BE DESIRABLE TO GIVE JOHN A HAND

20 AFTER THESE QUESTIONS, A DETAILED DESCRIPTION OF THE USE OF PHONEMES WILL BE PROVIDED IN CHAPTER 7

21 THE ENGLISH THAT IS SPOKEN IN AMERICA AT THE PRESENT DAY HAS RETAINED A GOOD MANY CHARACTERISTICS OF EARLIER BRITISH ENGLISH THAT DO NOT SURVIVE IN BRITISH ENGLISH TODAY

22 PHONEMIC CHARACTERS CAN ALSO BE USED TO HANDLE SYNTACTIC DATA SUCH AS THE LOCATION OF THE ENDS OF PHRASES WHICH CAN IMPROVE SPEECH QUALITY

23 THE STUDENTS CONSIDERED THE ASSUMPTION THAT A BREAK MIGHT OCCUR

24 FINALLY YOU MUST ASSUME THAT YOUR CIGARETTES WILL BOTHER THE PASSENGERS

25 TRY TO GIVE THE NAMES OF THE CHARACTERS TO JOHN

26 I PREFER FOR HIM TO GIVE THE NAMES OF THE CHARACTERS TO JOHN

27 I BELIEVE THOSE PEOPLE TO BE INTELLIGENT

28 I PROMISED HIM THAT HE COULD COME

29 THEY GAVE THE BOY A BOOK

30 THEY GAVE HIM A BOOK

31 THE 48-CHANNEL MODULE CAN HAVE ONLY TWO DI-GROUPS BUT CAN HAVE UP TO FOUR POWER UNITS IF BOTH DI-GROUPS ARE EQUIPPED WITH ECHO CANCELERS

32 I TOLD HIM YESTERDAY TO CLEAN HIS ROOM

33 MOVE THE POWER OPTION JUMPER PLUG SO THAT IT IS ADJACENT TO DI-GROUP ONE ON PRINTED WIRING BOARD

34 I WANT A LOT MORE COOKIES

35 THE MINUS-SIGN PRONUNCIATION SWITCH IS IN THE MIDDLE

36 HE ASKED THE CHILDREN TO FINISH THE JOB

37 HE ARGUED THAT IT WAS IMPOSSIBLE

38 IS A MAN AT THE DOOR

39 A DETAILED DESCRIPTION OF THE USE OF PHONEMES IS PROVIDED IN CHAPTER 7

14 Fidditch failed here on the relative clause with a PP left edge.
