Extracting Key Semantic Terms from Chinese Speech Query for Web Searches
Gang WANG
National University of Singapore
wanggang_sh@hotmail.com
Tat-Seng CHUA
National University of Singapore
chuats@comp.nus.edu.sg
Yong-Cheng WANG
Shanghai Jiao Tong University, China, 200030
ycwang@mail.sjtu.edu.cn
Abstract
This paper discusses the challenges of performing information retrieval on the Web using Chinese natural language speech queries, and proposes a solution. The main contribution of this research is a divide-and-conquer strategy to alleviate speech recognition errors. It uses a query model to facilitate the extraction of the main core semantic string (CSS) from the Chinese natural language speech query. It then breaks the CSS into basic components corresponding to phrases, and uses a multi-tier strategy to map the basic components to known phrases in order to further eliminate the errors. The resulting system has been found to be effective.
1 Introduction
We are entering an information era, in which information has become one of the major resources in our daily activities. With its widespread adoption, the Internet has become the largest store of information for all to share. Currently, most (Chinese) search engines support only term-based information retrieval, where users are required to enter queries directly through a keyboard in front of a computer. However, there is a large segment of the population in China and the rest of the world who are illiterate and do not have the skills to use a computer. They are thus unable to take advantage of the vast amount of freely available information. Since almost every person can speak and understand spoken language, research on “(Chinese) natural language speech query retrieval” would enable average persons to access information using current search engines without the need to learn special computer skills or undergo training. They can simply access the search engine using common devices that they are familiar with, such as the telephone, PDA and so on.
In order to implement a speech-based information retrieval system, one of the most important challenges is how to obtain, from the spoken natural language query, the correct query terms that convey the main semantics of the query. This requires the integration of natural language query processing and speech recognition research.
Natural language query processing has been an active area of research for many years and many techniques have been developed (Jacobs and Rau 1993; Kupiec 1993; Strzalkowski 1999; Yu et al 1999). Most of these techniques, however, focus only on written language, with few devoted to the study of spoken language query processing.
Speech recognition involves the conversion of acoustic speech signals to a stream of text. Because of the complexity of the human vocal tract, the speech signals observed differ even for multiple utterances of the same sequence of words by the same person (Lee et al 1996). Furthermore, the speech signals can be influenced by differences across speakers, dialects, transmission distortions, and speaking environments. These contribute to the noise and variability of speech signals. As one of the main sources of errors in Chinese speech recognition is substitution (Wang 2002; Zhou 1997), in which a wrong but similar-sounding term is used in place of the correct term, confusion matrices have been used to record confused sound pairs in an attempt to eliminate this error. Confusion matrices have been employed effectively in spoken document retrieval (Singhal et al 1999; Srinivasan et al 2000) and to minimize speech recognition errors (Shen et al 1998). However, when such a method is used directly to correct speech recognition errors, it tends to bring in too many irrelevant terms (Ng 2000). Because important terms in a long document are often repeated several times, there is a good chance that such terms will be correctly recognized at least once by a speech recognition engine with a reasonable word recognition rate. Many spoken document retrieval (SDR) systems have taken advantage of this fact in reducing speech recognition and matching errors (Meng et al 2001; Wang et al 2001; Chen et al 2001). In contrast to SDR, very little work has been done on Chinese spoken query processing (SQP), which is the use of spoken queries to retrieve textual documents. Moreover, spoken queries in SQP tend to be very short with few repeated terms.
In this paper, we aim to integrate spoken language and natural language research to process spoken queries containing speech recognition errors. The main contribution of this research is a divide-and-conquer strategy to alleviate the speech recognition errors. It first employs the Chinese query model to isolate the Core Semantic String (CSS) that conveys the semantics of the spoken query. It then breaks the CSS into basic components corresponding to phrases, and uses a multi-tier strategy to map the basic components to known phrases in a dictionary in order to further eliminate the errors.
In the rest of this paper, an overview of the proposed approach is given in Section 2. Section 3 describes the query model, while Section 4 outlines the use of the multi-tier approach to eliminate errors in the CSS. Section 5 discusses the experimental setup and results. Finally, Section 6 contains our concluding remarks.
2 Overview of the proposed approach
There are many challenges in supporting surfing of the Web by speech queries. One of the main challenges is that current speech recognition technology is not very good, especially for average users who have not had any speech training. For such an unrestricted user group, the speech recognition engine may achieve an accuracy of less than 50%. Because of this, the key phrases derived from the speech query could be in error or could miss the main semantics of the query altogether. This would greatly reduce the effectiveness of the resulting system.
Given the speech-to-text output with errors, the key issue is how to analyze the query in order to grasp the Core Semantic String (CSS) as accurately as possible. CSS is defined as the key term sequence in the query that conveys the main semantics of the query. For example, given the query: “ ” (Please tell me the information on how the U.S. separates the most-favored-nation status from the human rights issue in China), the CSS in the query is underlined. We can segment the CSS into several basic components that correspond to key concepts such as: (U.S.), (China), (human rights issue), (the most-favored-nation status) and (separate).
Because of the difficulty in handling speech recognition errors involving multiple CSS segments, we limit our research to queries that contain only one CSS. However, we allow a CSS to include multiple basic components as depicted in the above example. This is reasonable as most queries posed by users on the Web tend to be short, with only a few characters (Pu 2000).
Thus the accurate extraction of the CSS and its separation into basic components is essential to alleviating speech recognition errors. First, isolating the CSS from the rest of the speech enables us to ignore errors in other parts of the speech, such as greetings and polite remarks, which have no effect on the outcome of the query. Second, by separating the CSS into basic components, we can limit the propagation of errors, and employ the set of known phrases in the domain to help correct the errors in these components separately.
Figure 1: Overview of the proposed approach (diagram labels: Speech Query, QM, CSS, Basic Components, Multi-Tier mapping, Confusion matrix, Phrase Dictionary)
To achieve this, we process the query in three main stages as illustrated in Figure 1. First, given the user’s oral query, the system uses a speech recognition engine to convert the speech to text. Second, we analyze the query using a query model (QM) to extract the CSS from the query with minimum errors. The QM defines the structures and some of the standard phrases used in typical queries. Third, we divide the CSS into basic components, and employ a multi-tier approach to match the basic components to the nearest known phrases in order to correct the speech recognition errors. The aim here is to improve recall without excessive loss in precision. The resulting key components are then used as the query to a standard search engine.
The following sections describe the details of our approach.
3 Query Model (QM)
The query model (QM) is used to analyze the query and extract the core semantic string (CSS) that contains the main semantics of the query. A query model has two main components. The first is the query component dictionary, which is a set of phrases that have certain semantic functions, such as polite remarks, prepositions, time, etc. The other component is the query structure, which defines a sequence of acceptable semantically tagged tokens, such as “Begin, Core Semantic String, Question Phrase, and End”. Each query structure also includes its occurrence probability within the query corpus. Table 2 gives some examples of query structures.
3.1 Query Model Generation
In order to come up with a set of generalized query structures, we use a query log of typical queries posed by users. The query log consists of 557 queries, collected from twenty-eight human subjects at the Shanghai Jiao Tong University (Ying 2002). Each subject was asked to pose 20 separate queries to retrieve general information from the Web.
After analyzing the queries, we derive a query model comprising 51 query structures and a set of query components. For each query structure, we compute its probability of occurrence, which is used to determine the more likely structure containing the CSS in case multiple CSSs are found. As part of the analysis of the query log, we classify the query components into ten classes, as listed in Table 1. These ten classes are called semantic tags. They can be further divided into two main categories: the closed class and the open class. Closed classes are those that have relatively fixed word lists. These include question phrases, quantifiers, polite remarks, prepositions, time, and commonly used verb and subject-verb phrases. We collect all the phrases belonging to closed classes from the query log and store them in the query component dictionary. The open class is the CSS, which we do not know in advance. A CSS typically includes names of persons, events, countries, etc.
Table 1: Definition and Examples of Semantic tags
Sem Tag   Name of tag            Example
1         Verb-Object Phrase     (give) (me)
2         Question Phrase        (is there)
3         Question Field         (news), (report)
4         Quantifier             (some)
5         Verb Phrase            (find), (collect)
6         Polite Remark          (please help me)
7         Preposition            (about), (about)
8         Subject-Verb phrase    (I) (want)
9         Core Semantic String   9.11 (9.11 event)
10        Time                   (today)
Table 2: Examples of Query Structure
1   Q1: 0,2,7,9,3,0: 0.0025
    9.11 (tags: 2 7 9 3)
    Is there any information on September 11?
2   Q2: 0,1,7,9,3,0: 0.01
    (tags: 1 7 9 3)
    Give me some information about Bin Laden.
Given the set of sample queries, a heuristic rule-based approach is used to analyze the queries and break them into basic components with assigned semantic tags by matching the words listed in Table 1. Any sequence of words or phrases not found in the closed classes is tagged as CSS (with Semantic Tag 9). We can thus derive query structures of the form given in Table 2.
3.2 Modeling of Query Structure as FSA
Due to speech recognition errors, we do not expect the query components, and hence the query structure, to be recognized correctly. Instead, we parse the query structure in order to isolate and extract the CSS. To facilitate this, we employ Finite State Automata (FSA) to model the query structure. An FSA models the expected sequences of tokens in typical queries and annotates the semantic tags, including CSS. An FSA is defined for each of the 51 query structures. An example of an FSA is given in Figure 2.
Because CSS is an open set, we do not know its content in advance. Instead, we use the following two rules to determine the candidates for CSS: (a) it is an unknown string not present in the Query Component Dictionary; and (b) its length is not less than two, as the average length of concepts in Chinese is greater than one (Wang 1992).
At each stage of parsing the query using the FSA (Hobbs et al 1997), we need to decide which state to proceed to and how to handle unexpected tokens in the query. Thus at each stage, the FSA needs to perform three functions:
a) Goto function: It maps a pair consisting of a state and an input symbol into a new state or the fail state. We use G(N, X) = N' to define the goto function from State N to State N', given the occurrence of token X.
b) Fail function: It is consulted whenever the goto function reports a failure on encountering an unexpected token. We use f(N) = N' to represent the fail function.
c) Output function: In the FSA, certain states are designated as output states, which indicate that a sequence of tokens has been found and is tagged with the appropriate semantic tag.
To construct a goto function, we begin with a graph consisting of one vertex, which represents State 0. We then enter each token X into the graph by adding a directed path to the graph that begins at the start state. New vertices and edges are added to the graph so that there will be, starting at the start state, a path in the graph that spells out the token X. The token X is added to the output function of the state at which the path terminates.
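The construction of the goto and output functions described above can be sketched as a trie build. This is a minimal illustration, not the authors' implementation: the function name `build_goto` is hypothetical, English glosses stand in for the Chinese characters, and the phrase glossed "some" is assumed to be two characters long (written `some_1`, `some_2`), which is consistent with the state numbers quoted in the text.

```python
def build_goto(phrases):
    """Build goto and output tables from (token sequence, semantic tag) pairs.

    goto[state][token] gives the next state; a missing entry means "fail".
    output[state] gives the semantic tag emitted when that state is reached.
    """
    goto = [{}]          # state 0 is the start state
    output = {}
    for tokens, tag in phrases:
        state = 0
        for token in tokens:
            if token not in goto[state]:
                goto.append({})                      # add a new vertex
                goto[state][token] = len(goto) - 1   # add a new edge
            state = goto[state][token]
        output[state] = tag   # the path for this phrase terminates here
    return goto, output


# Hypothetical stand-ins for two dictionary phrases (tags taken from Table 1).
phrases = [
    (("please", "help", "me"), 6),   # polite remark
    (("some_1", "some_2"), 4),       # quantifier, assumed to be two characters
]
goto, output = build_goto(phrases)
```

With these two phrases the polite remark ends at state 3 and the quantifier at state 5, as in the worked example in the text.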
For example, suppose that our Query Component Dictionary consists of seven phrases as follows: “ (please help me); (some); (about); (news); (collect); (tell me); (what do you have)”. Adding these tokens into the graph will result in an FSA as shown in Figure 2. The path from State 0 to State 3 spells out the phrase “ (Please help me)”, and on completion of this path, we associate its output with semantic tag 6. Similarly, the output of “ (some)” is associated with State 5 and semantic tag 4, and so on.
We now use an example to illustrate the process of parsing the query. Suppose the user issues a speech query: “ ” (please help me to collect some information about Bin Laden). However, the result of speech recognition, with errors, is: “ (please) (help) (me) (receive) (send) (some) (about) (half) (pull) (light) (of) (news)”. Note that there are 4 mis-recognized characters, which are underlined.
Note: indicates the semantic tag.
Figure 2: FSA for part of the Query Component Dictionary
The FSA begins with State 0. When the system encounters the sequence of characters (please) (help) (me), the state changes from 0 to 1, 2 and eventually to 3. At State 3, the system recognizes a polite remark phrase and outputs a token with semantic tag 6.
Next, the system meets the character (receive) and transits to State 10, because g(0, ) = 10. When the system sees the next character (send), which does not have a corresponding transition rule, the goto function reports a failure. Because the length of the string is 2 and the string is not in the Query Component Dictionary, the semantic tag 9 is assigned to the token “ ” according to the definition of CSS.
By repeating the above process, we obtain the following result:
6 9 4 7 9 3
Here the semantic tags are as defined in Table 1. Note that because of speech recognition errors, the system detected two CSSs, and both of them contain speech recognition errors.
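The tagging walk-through above can be approximated with a short sketch. It is a simplification under stated assumptions: it uses greedy longest-match dictionary lookup rather than explicit FSA states with goto and fail functions, the name `tag_query` is hypothetical, English glosses stand in for the Chinese characters, the dictionary entry ("receive", "gather") is an invented stand-in for the two-character phrase glossed "collect", and the particle glossed "of" is left out because the text does not say how it is tagged.

```python
def tag_query(tokens, dictionary, css_tag=9, min_css_len=2):
    """Tag dictionary phrases; unknown runs of >= min_css_len tokens become CSS."""
    max_len = max(len(phrase) for phrase in dictionary)
    tagged, unknown = [], []

    def flush():
        # Rule (b) in the text: a CSS candidate has length of at least two.
        if len(unknown) >= min_css_len:
            tagged.append((tuple(unknown), css_tag))
        unknown.clear()

    i = 0
    while i < len(tokens):
        # Try the longest dictionary phrase starting at position i.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in dictionary:
                flush()
                tagged.append((phrase, dictionary[phrase]))
                i += n
                break
        else:
            # No phrase starts here: the token joins the current unknown run.
            unknown.append(tokens[i])
            i += 1
    flush()
    return tagged


# Hypothetical glossed dictionary; tags follow Table 1.
dictionary = {
    ("please", "help", "me"): 6,
    ("some",): 4,
    ("about",): 7,
    ("news",): 3,
    ("receive", "gather"): 5,   # invented stand-in for the phrase glossed "collect"
}
recognized = ["please", "help", "me", "receive", "send", "some",
              "about", "half", "pull", "light", "news"]
tags = [tag for _, tag in tag_query(recognized, dictionary)]
```

On this input the tag sequence is 6 9 4 7 9 3, matching the result in the text, with two CSS candidates: ("receive", "send") and ("half", "pull", "light").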
3.3 CSS Extraction by Query Model
Given that we may find multiple CSSs, the next stage is to analyze the CSSs found along with their surrounding context in order to determine the most probable CSS. The approach is based on the premise that choosing the best sense for an input vector amounts to choosing the most probable sense given that vector. The input vector i has three components: left context ($L_i$), the CSS itself ($CSS_i$), and right context ($R_i$). The probability of such a structure occurring in the Query Model is as follows:
$$ s_i = \sum_{j=0}^{n} (C_{ij} \cdot p_j) \qquad (1) $$
where $C_{ij}$ is set to 1 if the input vector i ($L_i$, $R_i$) matches the two corresponding left and right CSS contexts of the query structure j, and 0 otherwise; $p_j$ is the probability of occurrence of the $j$-th query structure; and n is the total number of structures in the Query Model. Note that Equation (1) gives a detected CSS higher weight if it matches more query structures with higher occurrence probabilities. We simply select the best $CSS_i$, that is, the one given by $\arg\max_i (s_i)$ according to Eqn (1).
For illustration, consider the above example with 2 detected CSSs. The two CSS vectors are [6, 9, 4] and [7, 9, 3]. From the Query Model, we know that the probability of occurrence, $p_j$, of structure [6, 9, 4] is 0, and that of structure [7, 9, 3] is 0.03, with the latter matching only one structure. Hence the $s_i$ values for them are 0 and 0.03 respectively. Thus the most probable core semantic structure is [7, 9, 3], and the CSS “ (half) (pull) (light)” is extracted.
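The scoring in Eqn (1) and the selection of the best candidate can be sketched as follows. This is an illustrative sketch only: the function name `css_score` and the two query structures with their probabilities are invented so that the numbers reproduce the 0 and 0.03 of the example; they are not taken from the authors' query model.

```python
CSS_TAG = 9


def css_score(left, right, structures):
    """s_i: sum of p_j over structures whose CSS has the given left/right tags."""
    score = 0.0
    for tags, prob in structures:
        # C_ij = 1 if the structure contains the pattern (left, CSS, right).
        matched = any(
            tags[k] == CSS_TAG and tags[k - 1] == left and tags[k + 1] == right
            for k in range(1, len(tags) - 1)
        )
        if matched:
            score += prob
    return score


# Hypothetical query structures: (tag sequence, occurrence probability).
structures = [
    ((0, 6, 5, 4, 7, 9, 3, 0), 0.03),
    ((0, 2, 9, 3, 0), 0.02),
]
# Left and right context tags of the two detected CSS candidates.
candidates = {"first": (6, 4), "second": (7, 3)}
scores = {name: css_score(l, r, structures) for name, (l, r) in candidates.items()}
best = max(scores, key=scores.get)
```

Here the candidate with context [7, 9, 3] scores 0.03 and is selected, while the candidate with context [6, 9, 4] matches no structure and scores 0.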
4 Query Terms Generation
Because of speech recognition errors, the CSS obtained is likely to contain errors or, in the worst case, to miss the main semantics of the query altogether. We now discuss how we alleviate the errors in the CSS for the former case. We first break the CSS into one or more basic semantic parts, and then apply the multi-tier method to map the query components to known phrases.
4.1 Breaking CSS into Basic Components
In many cases, the CSS obtained may be made up of several semantic components equivalent to base noun phrases. Here we employ a technique based on Chinese cut marks (Wang 1992) to perform the segmentation. Chinese cut marks are tokens that can separate a Chinese sentence into several semantic parts. Zhou (1997) used such a technique to detect new Chinese words, and reported good results with precision and recall of 92% and 70% respectively. By separating the CSS into basic key components, we can limit the propagation of errors.
4.2 Multi-tier query term mapping
In order to further eliminate the speech recognition errors, we propose a multi-tier approach to map the basic components in the CSS into known phrases by using a combination of matching techniques. To do this, we need to build up a phrase dictionary containing typical concepts used in general and specific domains. Most basic CSS components should be mapped to one of these phrases. Thus even if a basic component contains errors, as long as we can find a sufficiently similar phrase in the phrase dictionary, we can use it in place of the erroneous CSS component, thus eliminating the errors.
We collected a phrase dictionary containing 32,842 phrases, covering mostly base noun phrases and named entities. The phrases are derived from two sources. We first derived a set of common phrases from the digital dictionary and the logs of the search engine used at the Shanghai Jiao Tong University. We also derived a set of domain-specific phrases by extracting the base noun phrases and named entities from the on-line news articles obtained during the period. This approach is reasonable as in practice we can use recent web or news articles to extract concepts to update the phrase dictionary.
Given the phrase dictionary, the next problem is to map the basic CSS components to the nearest phrases in the dictionary. As the basic components may contain errors, we cannot match them exactly at the character level alone. We thus propose to match each basic component with the known phrases in the dictionary at three levels: (a) character level; (b) syllable string level; and (c) confusion syllable string level. The purpose of matching at levels b and c is to overcome the homophone problem in the CSS. For example, “ (Laden)” is wrongly recognized as “ (pull lamp)” by the speech recognition engine. Such errors cannot be resolved at the character matching level, but can probably be matched at the syllable string level. The confusion matrix is used to further reduce the effect of speech recognition errors due to similar-sounding characters.
To account for possible errors in CSS components, we perform similarity, instead of exact, matching at the three levels. Given the basic CSS component $q_i$ and a phrase $c_j$ in the dictionary, we compute:
$$ Sim(q_i, c_j) = \frac{LCS(q_i, c_j)}{\max\{|q_i|, |c_j|\}} \cdot \sum_{k=0}^{LCS(q_i, c_j)} M_k \qquad (2) $$
where $LCS(q_i, c_j)$ gives the number of characters/syllables matched between $q_i$ and $c_j$ in the order of their appearance, using the longest common subsequence (LCS) matching algorithm (Cormen et al 1990). $M_k$ is introduced to account for the similarity between the two matching units, and is dependent on the level of matching. If the matching is performed at the character or syllable string level, the basic matching unit is one character or one syllable, and the similarity between the two matching units is 1. If the matching is done at the confusion syllable string level, $M_k$ is the corresponding coefficient in the confusion matrix. Hence $LCS(q_i, c_j)$ gives the degree of match between $q_i$ and $c_j$, normalized by the maximum length of $q_i$ or $c_j$; and $\sum M_k$ gives the degree of similarity between the units being matched.
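As an illustration of the normalized LCS factor in Eqn (2), the sketch below computes the longest common subsequence length of two unit sequences and divides it by the longer length. It covers only that factor, that is, the character and syllable levels where every matched unit has similarity 1; the confusion-matrix weights $M_k$ are not modeled here. The function names and the pinyin syllables in the example are assumptions made for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)            # extend a common subsequence
            else:
                cur.append(max(prev[j], cur[j - 1]))   # skip a unit on one side
        prev = cur
    return prev[-1]


def lcs_ratio(a, b):
    """LCS length normalized by the length of the longer sequence."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 0.0


# Assumed syllable strings for a homophone pair: different characters,
# identical syllables, so the syllable-level ratio is 1.0.
recognized_syllables = ["la", "deng"]
dictionary_syllables = ["la", "deng"]
ratio = lcs_ratio(recognized_syllables, dictionary_syllables)
```

Two sequences that share only one of two units, such as `["A", "B"]` and `["A", "C"]`, give a ratio of 0.5.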
The three levels of matching range from more exact at the character level to less exact at the confusion syllable level. Thus if we can find a relevant phrase with $Sim(q_i, c_j)$ above a threshold at the higher character level, we will not perform further matching at the lower levels. Otherwise, we relax the constraint and perform the matching at successively lower levels, probably at the expense of precision.
The algorithm is as follows:
Input: Basic CSS component $q_i$
a. Match $q_i$ with phrases in the dictionary at the character level using Eqn. (2).
b. If we cannot find a match, match $q_i$ with phrases at the syllable level using Eqn. (2).
c. If we still cannot find a match, match $q_i$ with phrases at the confusion syllable level using Eqn. (2).
d. If we found a match $c_j$, set $q'_i = c_j$; otherwise set $q'_i = q_i$.
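Steps a to d can be sketched as a cascade over three scoring functions. This is a sketch under several assumptions rather than the authors' implementation: it reads Eqn (2) as the summed weights of the matched units divided by the longer length, which is one possible interpretation of the printed formula; the threshold value 0.8, the function names, the placeholder characters, the syllables, and the confusion weight are all invented for illustration.

```python
def weighted_lcs(a, b, weight):
    """Maximum total weight of an order-preserving alignment of a and b."""
    prev = [0.0] * (len(b) + 1)
    for x in a:
        cur = [0.0]
        for j, y in enumerate(b, 1):
            cur.append(max(prev[j], cur[j - 1], prev[j - 1] + weight(x, y)))
        prev = cur
    return prev[-1]


def similarity(a, b, weight):
    longest = max(len(a), len(b))
    return weighted_lcs(a, b, weight) / longest if longest else 0.0


def exact(x, y):
    return 1.0 if x == y else 0.0


def map_component(chars, syllables, dictionary, confusion, threshold=0.8):
    """Return the matched phrase name, or None to keep the component as is.

    dictionary: list of (name, characters, syllables) entries.
    confusion: maps (recognized syllable, true syllable) to a weight in [0, 1].
    """
    def confusable(x, y):
        return 1.0 if x == y else confusion.get((x, y), 0.0)

    tiers = [
        lambda e: similarity(chars, e[1], exact),           # a. character level
        lambda e: similarity(syllables, e[2], exact),       # b. syllable level
        lambda e: similarity(syllables, e[2], confusable),  # c. confusion level
    ]
    for score in tiers:
        best = max(dictionary, key=score, default=None)
        if best is not None and score(best) > threshold:
            return best[0]   # d. a match was found at this tier
    return None


# Hypothetical dictionary with placeholder characters and assumed syllables.
dictionary = [
    ("Laden", ["c1", "c2"], ["la", "deng"]),
    ("Taliban", ["c3", "c4", "c5"], ["ta", "li", "ban"]),
]
confusion = {("pan", "ban"): 0.9}   # invented confusion weight

homophone = map_component(["x1", "x2"], ["la", "deng"], dictionary, confusion)
near_miss = map_component(["x3", "x4", "x5"], ["ta", "li", "pan"], dictionary, confusion)
no_match = map_component(["x6"], ["zzz"], dictionary, confusion)
```

The first call fails at the character level and succeeds at the syllable level; the second succeeds only at the confusion level, where the score is (1 + 1 + 0.9) / 3; the third finds nothing, so the caller would keep the original component.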
For example, consider the query “ ” (please tell me some news about Iraq), and suppose it is wrongly recognized as “ ”. If we can still correctly extract the CSS “ (Iraq)” from this mis-recognized query, then we can ignore the speech recognition errors in other parts of the query. Even if there are errors in the CSS extracted, such as “ (chen) (waterside)” instead of “ (Chen Shui-bian)”, we can apply syllable string level matching to correct the homophone errors. For CSS errors such as “ (corrupt) (usually)” instead of the correct CSS “ (Taliban)”, which cannot be corrected at the syllable string matching level, we can apply confusion syllable string matching to overcome the error.
5 Experiments and analysis
As our system aims to correct the errors and extract CSS components in spoken queries, it is important to demonstrate that our system is able to handle queries of different characteristics. To this end, we devised two sets of test queries as follows.
a) Corpus with short queries
We devised 10 queries, each containing a CSS with only one basic component. This is the typical type of query posed by users on the web. We asked 10 different people to “speak” the queries, and used IBM ViaVoice 98 to perform the speech-to-text conversion. This gives rise to a collection of 100 spoken queries. There is a total of 1,340 Chinese characters in the test queries, with a speech recognition error rate of 32.5%.
b) Corpus with long queries
In order to test on queries used in standard test corpora, we adopted the query topics (1-10) employed in the TREC-5 Chinese-Language track. Here each query contains more than one key semantic component. We rephrased the queries into natural language query format, and asked twelve subjects to “read” the queries. We again used IBM ViaVoice 98 to perform the speech recognition on the resulting 120 different spoken queries, giving rise to a total of 2,354 Chinese characters with a speech recognition error rate of 23.75%.
We devised two experiments to evaluate the performance of our techniques. The first experiment was designed to test the effectiveness of our query model in extracting CSSs. The second was designed to test the accuracy of our overall system in extracting basic query components.
5.1 Test 1: Accuracy of extracting CSSs
The test results show that by using our query model, we could correctly extract 99% and 96% of CSSs from the spoken queries for the short and long query categories respectively. The errors are mainly due to the wrong tagging of some query components, which caused the query model to miss the correct query structure or match a wrong structure.
For example, consider the query “ ” (please tell me some news about Taliban), and suppose it is wrongly recognized as:
9 7 9 10
which is a nonsensical sentence. Since the probabilities of occurrence of both query structures [0, 9, 7] and [7, 9, 10] are 0, we could not find the CSS at all. This error is mainly due to the mis-recognition of the last query component “ (news)” as “ (afternoon)”. It confuses the Query Model, which cannot find the correct CSS.
The overall results indicate that there are fewer errors in short queries, as such queries contain only one CSS component. This is encouraging as in practice most users issue only short queries.
5.2 Test 2: Accuracy of extracting basic query components
In order to test the accuracy of extracting basic query components, we asked one subject to manually divide the CSS into basic components, and used that as the ground truth. We compared the following two methods of extracting CSS components:
a) As a baseline, we simply performed standard stop word removal and divided the query into components with the help of a dictionary. No attempt is made to correct the speech recognition errors in these components. Here we assume that the natural language query is a bag of words with stop words removed (Baeza-Yates and Ribeiro-Neto 1999). Currently, most search engines are based on this approach.
b) We applied our query model to extract the CSS and employed the multi-tier mapping approach to extract and correct the errors in the basic CSS components.
Tables 3 and 4 give the comparisons between Methods (a) and (b), which show that our method outperforms the baseline method by 20.2 and 20.0 percentage points in $F_1$ measure for the short and long queries respectively.
Table 3: Comparison of Methods a and b for short queries
             Average Precision   Average Recall   F1
Method a     31%                 58.5%            40.5%
Method b     53.98%              69.4%            60.7%
Difference   +22.98%             +10.9%           +20.2%
Table 4: Comparison of Methods a and b for long queries
             Average Precision   Average Recall   F1
Method a     39.23%              85.99%           53.9%
Method b     67.75%              81.31%           73.9%
Difference   +28.52%             -4.68%           +20.0%
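The $F_1$ columns are consistent with the standard harmonic mean of the reported average precision and recall. The short check below assumes that definition (the paper does not spell out the formula); the function name `f1` is illustrative.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Average precision and recall as reported in Tables 3 and 4.
reported = {
    "short, method a": (31.0, 58.5),
    "short, method b": (53.98, 69.4),
    "long, method a": (39.23, 85.99),
    "long, method b": (67.75, 81.31),
}
f1_scores = {name: round(f1(p, r), 1) for name, (p, r) in reported.items()}
```

Rounded to one decimal place, the four values come out as 40.5, 60.7, 53.9 and 73.9, matching the tables.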
The improvement is largely due to the use of our approach to extract the CSS and correct the speech recognition errors in the CSS components. More detailed analysis of the long queries in Table 4 reveals that our method performs worse than the baseline method in recall. This is mainly due to errors in extracting the CSS and breaking it into basic components. Although we used the multi-tier mapping approach to reduce the errors from speech recognition, its improvement is insufficient to offset the loss in recall due to errors in extracting the CSS. On the other hand, for the short queries, which are free of errors in breaking the CSS, our system is more effective than the baseline in recall. In both cases, our system performs significantly better than the baseline in terms of precision and $F_1$ measure.
6 Conclusion
Although research on natural language query processing and speech recognition has been carried out for many years, the combination of these two approaches to help a large population of infrequent users to “surf the web by voice” is relatively recent. This paper outlines a divide-and-conquer approach to alleviating the effect of speech recognition errors and extracting key CSS components for use in a standard search engine to retrieve relevant documents. The main innovative steps in our system are: (a) we use a query model to isolate the CSS in speech queries; (b) we break the CSS into basic components; and (c) we employ a multi-tier approach to map the basic components to known phrases in the dictionary. The tests demonstrate that our approach is effective.
This work is only the beginning. Further research can be carried out as follows. First, as most of the queries are about named entities such as persons or organizations, we need to perform named entity analysis on the queries to better extract their structure and to map them to known named entities. Second, most speech recognition engines return a list of probable words for each syllable. This could be incorporated into our framework to facilitate multi-tier mapping.
References
Berlin Chen, Hsin-min Wang, and Lin-Shan Lee (2001), “Improved Spoken Document Retrieval by Exploring Extra Acoustic and Linguistic Cues”, Proceedings of the 7th European Conference on Speech Communication and Technology, located at http://homepage.iis.sinica.edu.tw/
Paul S. Jacobs and Lisa F. Rau (1993), Innovations in Text Interpretation, Artificial Intelligence, Volume 63, October 1993 (Special Issue on Text Understanding), pp. 143-191
Thomas H. Cormen, Charles E. Leiserson and Ronald L. Rivest (1990), “Introduction to algorithms”, published by McGraw-Hill.
Jerry R. Hobbs, et al (1997), FASTUS: A Cascaded Finite-State Transducer for Extracting Information from Natural-Language Text, Finite-State Language Processing, Emmanuel Roche and Yves Schabes, pp. 383-406, MIT Press
Julian Kupiec (1993), “MURAX: A robust linguistic approach for question answering using an on-line encyclopedia”, Proceedings of the 16th annual conference on Research and Development in Information Retrieval (SIGIR), pp. 181-190
Chin-Hui Lee et al (1996), “A Survey on Automatic Speech Recognition with an Illustrative Example On Continuous Speech Recognition of Mandarin”, in Computational Linguistics and Chinese Language Processing, pp. 1-36
Helen Meng and Pui Yu Hui (2001), “Spoken Document Retrieval for the languages of Hong Kong”, International Symposium on Intelligent Multimedia, Video and Speech Processing, May 2001, located at www.se.cuhk.edu.hk/PEOPLE/
Kenney Ng (2000), “Information Fusion For Spoken Document Retrieval”, Proceedings of ICASSP’00, Istanbul, Turkey, Jun, located at http://www.sls.lcs.mit.edu/sls/publications/
Hsiao Tieh Pu (2000), “Understanding Chinese Users’ Information Behaviors through Analysis of Web Search Term Logs”, Journal of Computers, pp. 75-82
Liqin Shen, Haixin Chai, Yong Qin and Tang Donald (1998), “Character Error Correction for Chinese Speech Recognition System”, Proceedings of International Symposium on Chinese Spoken Language Processing, pp. 136-138
Amit Singhal and Fernando Pereira (1999), “Document Expansion for Speech Retrieval”, Proceedings of the 22nd Annual International conference on Research and Development in Information Retrieval (SIGIR), pp. 34-41
Tomek Strzalkowski (1999), “Natural language information retrieval”, Boston: Kluwer Publishing.
Gang Wang (2002), “Web surfing by Chinese Speech”, Master thesis, National University of Singapore.
Hsin-min Wang, Helen Meng, Patrick Schone, Berlin Chen and Wai-Kit Lo (2001), “Multi-Scale Audio Indexing for translingual spoken document retrieval”, Proceedings of IEEE International Conference on Acoustics, Speech, Signal processing, Salt Lake City, USA, May 2001, located at http://www.iis.sinica.edu.tw/~whm/
Yongcheng Wang (1992), Technology and basis of Chinese Information Processing, Shanghai Jiao Tong University Press
Baeza-Yates, Ricardo and Ribeiro-Neto, Berthier (1999), “Introduction to modern information retrieval”, Published by London: Library Association Publishing.
Hai-nan Ying, Yong Ji and Wei Shen (2002), “report of query log”, internal report in Shanghai Jiao Tong University
Guodong Zhou and Kim Teng Lua (1997), Detection of Unknown Chinese Words Using a Hybrid Approach, Computer Processing of Oriental Languages, Vol 11, No 1, 1997, 63-75
Guodong Zhou (1997), “Language Modelling in Mandarin Speech Recognition”, Ph.D. Thesis, National University of Singapore.