Extracting Key Semantic Terms from Chinese Speech Query for Web Searches

Gang WANG
National University of Singapore
wanggang_sh@hotmail.com

Tat-Seng CHUA
National University of Singapore
chuats@comp.nus.edu.sg

Yong-Cheng WANG
Shanghai Jiao Tong University, China, 200030
ycwang@mail.sjtu.edu.cn

Abstract

This paper discusses the challenges of, and proposes a solution to, performing information retrieval on the Web using Chinese natural language speech queries. The main contribution of this research is a divide-and-conquer strategy to alleviate speech recognition errors. It uses a query model to facilitate the extraction of the main core semantic string (CSS) from the Chinese natural language speech query. It then breaks the CSS into basic components corresponding to phrases, and uses a multi-tier strategy to map the basic components to known phrases in order to further eliminate the errors. The resulting system has been found to be effective.

1 Introduction

We are entering an information era, where information has become one of the major resources in our daily activities. With its widespread adoption, the Internet has become the largest store of information for all to share. Currently, most (Chinese) search engines support only term-based information retrieval, where users are required to enter queries directly through a keyboard in front of a computer. However, a large segment of the population in China and the rest of the world is illiterate or lacks the skills to use a computer. They are thus unable to take advantage of the vast amount of freely available information. Since almost every person can speak and understand spoken language, research on “(Chinese) natural language speech query retrieval” would enable average persons to access information using current search engines without needing to learn special computer skills. They can simply access the search engine using common devices that they are familiar with, such as the telephone and the PDA.

In order to implement a speech-based information retrieval system, one of the most important challenges is how to obtain, from the spoken natural language query, the correct query terms that convey the main semantics of the query. This requires the integration of natural language query processing and speech recognition research.

Natural language query processing has been an active area of research for many years and many techniques have been developed (Jacobs and Rau 1993; Kupiec 1993; Strzalkowski 1999; Yu et al 1999). Most of these techniques, however, focus only on written language, with few devoted to the study of spoken language query processing.

Speech recognition involves the conversion of acoustic speech signals to a stream of text. Because of the complexity of the human vocal tract, the observed speech signals differ even for multiple utterances of the same sequence of words by the same person (Lee et al 1996). Furthermore, the speech signals can be influenced by differences across speakers, dialects, transmission distortions, and speaking environments. These contribute to the noise and variability of speech signals. As one of the main sources of errors in Chinese speech recognition is substitution (Wang 2002; Zhou 1997), in which a wrong but similar-sounding term is used in place of the correct term, confusion matrices have been used to record confused sound pairs in an attempt to eliminate this error. Confusion matrices have been employed effectively in spoken document retrieval (Singhal et al 1999; Srinivasan et al 2000) and to minimize speech recognition errors (Shen et al 1998). However, when such a method is used directly to correct speech recognition errors, it tends to bring in too many irrelevant terms (Ng 2000).

Because important terms in a long document are often repeated several times, there is a good chance that such terms will be correctly recognized at least once by a speech recognition engine with a reasonable word recognition rate. Many spoken document retrieval (SDR) systems have taken advantage of this fact to reduce speech recognition and matching errors (Meng et al 2001; Wang et al 2001; Chen et al 2001).
In contrast to SDR, very little work has been done on Chinese spoken query processing (SQP), which is the use of spoken queries to retrieve textual documents. Moreover, spoken queries in SQP tend to be very short with few repeated terms.

In this paper, we aim to integrate spoken language and natural language research to process spoken queries with speech recognition errors. The main contribution of this research is a divide-and-conquer strategy to alleviate speech recognition errors. It first employs the Chinese query model to isolate the Core Semantic String (CSS) that conveys the semantics of the spoken query. It then breaks the CSS into basic components corresponding to phrases, and uses a multi-tier strategy to map the basic components to known phrases in a dictionary in order to further eliminate the errors.

In the rest of this paper, Section 2 gives an overview of the proposed approach. Section 3 describes the query model, while Section 4 outlines the use of the multi-tier approach to eliminate errors in the CSS. Section 5 discusses the experimental setup and results. Finally, Section 6 contains our concluding remarks.

2 Overview of the proposed approach

There are many challenges in supporting Web surfing by speech queries. One of the main challenges is that current speech recognition technology is not very accurate, especially for average users who have not had any speech training. For such an unrestricted user group, the speech recognition engine could achieve an accuracy of less than 50%. Because of this, the key phrases we derive from the speech query could be in error or miss the main semantics of the query altogether. This would severely degrade the effectiveness of the resulting system.

Given the speech-to-text output with errors, the key issue is how to analyze the query in order to grasp the Core Semantic String (CSS) as accurately as possible. The CSS is defined as the key term sequence in the query that conveys the main semantics of the query. For example, consider the query: “ ” (Please tell me the information on how the U.S. separates the most-favored-nation status from human rights issue in China). The CSS in the query is underlined.
We can segment the CSS into several basic components that correspond to key concepts such as: (U.S.), (China), (human rights issue), (the most-favored-nation status) and (separate).

Because of the difficulty in handling speech recognition errors involving multiple CSS segments, we limit our research to queries that contain only one CSS. However, we allow a CSS to include multiple basic components, as depicted in the above example. This is reasonable as most queries posed by users on the Web tend to be short, with only a few characters (Pu 2000).

Thus the accurate extraction of the CSS and its separation into basic components is essential to alleviating speech recognition errors. First, isolating the CSS from the rest of the speech enables us to ignore errors in other parts of the speech, such as greetings and polite remarks, which have no effect on the outcome of the query. Second, by separating the CSS into basic components, we can limit the propagation of errors, and employ the set of known phrases in the domain to help correct the errors in these components separately.

Figure 1: Overview of the proposed approach (components shown: Speech Query, QM, CSS, Basic Components, Multi-Tier mapping, Confusion matrix, Phrase Dictionary)

To achieve this, we process the query in three main stages as illustrated in Figure 1. First, given the user's oral query, the system uses a speech recognition engine to convert the speech to text. Second, we analyze the query using a query model (QM) to extract the CSS from the query with minimum errors. The QM defines the structures and some of the standard phrases used in typical queries. Third, we divide the CSS into basic components, and employ a multi-tier approach to match the basic components to the nearest known phrases in order to correct the speech recognition errors. The aim here is to improve recall without excessive loss in precision. The resulting key components are then used as the query to a standard search engine.

The following sections describe the details of our approach.

3 Query Model (QM)

The query model (QM) is used to analyze the query and extract the core semantic string (CSS) that contains the main semantics of the query. A query model has two main components. The first is the query component dictionary, which is a set of phrases that have certain semantic functions, such as polite remarks, prepositions, time etc. The other component is the query structure, which defines a sequence of acceptable semantically tagged tokens, such as “Begin, Core Semantic String, Question Phrase, and End”. Each query structure also includes its occurrence probability within the query corpus. Table 2 gives some examples of query structures.

3.1 Query Model Generation

In order to come up with a set of generalized query structures, we use a query log of typical queries posed by users. The query log consists of 557 queries, collected from twenty-eight human subjects at the Shanghai Jiao Tong University (Ying 2002). Each subject was asked to pose 20 separate queries to retrieve general information from the Web.

After analyzing the queries, we derive a query model comprising 51 query structures and a set of query components. For each query structure, we compute its probability of occurrence, which is used to determine the more likely structure containing the CSS in case multiple CSSs are found. As part of the analysis of the query log, we classify the query components into ten classes, as listed in Table 1. These ten classes are called semantic tags. They can be further divided into two main categories: the closed class and the open class. Closed classes are those that have relatively fixed word lists. These include question phrases, quantifiers, polite remarks, prepositions, time, and commonly used verb and subject-verb phrases. We collect all the phrases belonging to closed classes from the query log and store them in the query component dictionary. The open class is the CSS, which we do not know in advance. A CSS typically includes persons' names, events, countries' names etc.

Table 1: Definition and Examples of Semantic tags

SemTag   Name of tag             Example
1        Verb-Object Phrase      (give me)
2        Question Phrase         (is there)
3        Question Field          (news), (report)
4        Quantifier              (some)
5        Verb Phrase             (find), (collect)
6        Polite Remark           (please help me)
7        Preposition             (about), (about)
8        Subject-Verb Phrase     (I) (want)
9        Core Semantic String    9.11 (9.11 event)
10       Time                    (today)

Table 2: Examples of Query Structure

No.   Query structure : probability   Tags of example   Example query
1     Q1: 0,2,7,9,3,0 : 0.0025        2 7 9 3           9.11 (Is there any information on September 11?)
2     Q2: 0,1,7,9,3,0 : 0.01          1 7 9 3           (Give me some information about Bin Laden.)

Given the set of sample queries, a heuristic rule-based approach is used to analyze the queries and break them into basic components with assigned semantic tags by matching the words listed in Table 1. Any sequence of words or phrases not found in the closed classes is tagged as CSS (with Semantic Tag 9). We can thus derive the query structures of the form given in Table 2.

3.2 Modeling of Query Structure as FSA

Due to speech recognition errors, we do not expect the query components, and hence the query structure, to be recognized correctly. Instead, we parse the query structure in order to isolate and extract the CSS. To facilitate this, we employ Finite State Automata (FSA) to model the query structure. The FSA models the expected sequences of tokens in typical queries and annotates the semantic tags, including CSS. An FSA is defined for each of the 51 query structures. An example of an FSA is given in Figure 2.

Because the CSS is an open set, we do not know its content in advance. Instead, we use the following two rules to determine the candidates for CSS: (a) it is an unknown string not present in the Query Component Dictionary; and (b) its length is not less than two, as the average length of concepts in Chinese is greater than one (Wang 1992).

At each stage of parsing the query using the FSA (Hobbs et al 1997), we need to decide which state to proceed to and how to handle unexpected tokens in the query. Thus at each stage, the FSA needs to perform three functions:

a) Goto function: It maps a pair consisting of a state and an input symbol into a new state or the fail state. We use g(N,X)=N' to define the goto function from State N to State N', given the occurrence of token X.
b) Fail function: It is consulted whenever the goto function reports a failure on encountering an unexpected token. We use f(N)=N' to represent the fail function.
c) Output function: In the FSA, certain states are designated as output states, which indicate that a sequence of tokens has been found and is tagged with the appropriate semantic tag.

To construct a goto function, we begin with a graph consisting of one vertex which represents State 0. We then enter each token X into the graph by adding a directed path to the graph that begins at the start state. New vertices and edges are added to the graph so that there will be, starting at the start state, a path in the graph that spells out the token X. The token X is added to the output function of the state at which the path terminates.

For example, suppose that our Query Component Dictionary consists of seven phrases as follows: “ (please help me); (some); (about); (news); (collect); (tell me); (what do you have)”. Adding these tokens into the graph results in an FSA as shown in Figure 2. The path from State 0 to State 3 spells out the phrase “ (please help me)”, and on completion of this path, we associate its output with semantic tag 6. Similarly, the output of “ (some)” is associated with State 5 and semantic tag 4, and so on.

We now use an example to illustrate the process of parsing the query. Suppose the user issues a speech query: “ ” (please help me to collect some information about Bin Laden). However, the result of speech recognition with errors is: “ (please) (help) (me) (receive) (send) (some) (about) (half) (pull) (light) (of) (news)”. Note that there are 4 mis-recognized characters, which are underlined.

Figure 2: FSA for part of Query Component Dictionary
Note: indicates the semantic tag.

The FSA begins with State 0. When the system encounters the sequence of characters (please) (help) (me), the state changes from 0 to 1, 2 and eventually to 3. At State 3, the system recognizes a polite remark phrase and outputs a token with semantic tag 6.

Next, when the system meets the character (receive), it transits to State 10, because g(0, )=10.
When the system sees the next character (send), which does not have a corresponding transition rule, the goto function reports a failure. Because the length of the string is 2 and the string is not in the Query Component Dictionary, semantic tag 9 is assigned to the token “ ” according to the definition of CSS.

By repeating the above process, we obtain the following result:

6 9 4 7 9 3

Here the semantic tags are as defined in Table 1. Note that because of speech recognition errors, the system detected two CSSs, and both of them contain speech recognition errors.

3.3 CSS Extraction by Query Model

Given that we may find multiple CSSs, the next stage is to analyze the CSSs found along with their surrounding context in order to determine the most probable CSS. The approach is based on the premise that choosing the best sense for an input vector amounts to choosing the most probable sense given that vector. The input vector $i$ has three components: left context ($L_i$), the CSS itself ($CSS_i$), and right context ($R_i$). The probability of such a structure occurring in the Query Model is as follows:

$$s_i = \sum_{j=0}^{n} (C_{ij} \times p_j) \tag{1}$$

where $C_{ij}$ is set to 1 if the input vector $i$ ($L_i$, $R_i$) matches the corresponding left and right CSS context of query structure $j$, and 0 otherwise; $p_j$ is the probability of occurrence of the $j$-th query structure; and $n$ is the total number of structures in the Query Model. Note that Equation (1) gives a detected CSS a higher weight if it matches more query structures with higher occurrence probabilities. We simply select the best $CSS_i$ such that $\arg\max_i s_i$ according to Eqn (1).

For illustration, consider the above example with 2 detected CSSs. The two CSS vectors are [6, 9, 4] and [7, 9, 3]. From the Query Model, we know that the probability of occurrence, $p_j$, of structure [6, 9, 4] is 0, and that of structure [7, 9, 3] is 0.03, with the latter matching only one structure. Hence the $s_i$ values for them are 0 and 0.03 respectively. Thus the most probable core semantic structure is [7, 9, 3] and the CSS “ (half) (pull) (light)” is extracted.
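
The selection rule of Equation (1) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the two-entry QUERY_MODEL and its probabilities are hypothetical (chosen so that the context (7, 3) scores 0.03, as in the example above), the function names are ours, and tag 0 is assumed to mark Begin/End, as the structures in Table 2 suggest.

```python
from typing import List, Sequence, Tuple

CSS_TAG = 9   # semantic tag of the Core Semantic String (Table 1)
BOUNDARY = 0  # assumed Begin/End marker, as in "0,2,7,9,3,0"

# Hypothetical query model: (tag sequence, occurrence probability).
# These entries are illustrative only and are not taken from the paper.
QUERY_MODEL: List[Tuple[Tuple[int, ...], float]] = [
    ((0, 6, 5, 4, 7, 9, 3, 0), 0.03),
    ((0, 2, 9, 10, 0), 0.01),
]


def css_contexts(tags: Sequence[int]) -> List[Tuple[int, int]]:
    """Return the (left, right) tags around every CSS tag in a sequence."""
    padded = [BOUNDARY, *tags, BOUNDARY]
    return [
        (padded[k - 1], padded[k + 1])
        for k in range(1, len(padded) - 1)
        if padded[k] == CSS_TAG
    ]


def score(context: Tuple[int, int], model=QUERY_MODEL) -> float:
    """Equation (1): sum p_j over structures whose CSS context matches."""
    return sum(p for structure, p in model if context in css_contexts(structure))


def best_css(tags: Sequence[int], model=QUERY_MODEL) -> Tuple[int, float]:
    """Return (index, s_i) of the candidate CSS with the highest score."""
    scores = [score(ctx, model) for ctx in css_contexts(tags)]
    if not scores:
        raise ValueError("no CSS tag in the sequence")
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    # Tag sequence from the parsing example: two candidate CSSs.
    print(best_css([6, 9, 4, 7, 9, 3]))
```

With this toy model the second candidate (context [7, 9, 3]) is selected, which mirrors the illustration above. When several candidates tie, the sketch keeps the first one; the paper does not say how ties are broken.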

4 Query Terms Generation

Because of speech recognition errors, the CSS obtained is likely to contain errors or, in the worst case, to miss the main semantics of the query altogether. We now discuss how we alleviate the errors in the CSS for the former case. We first break the CSS into one or more basic semantic parts, and then apply the multi-tier method to map the query components to known phrases.

4.1 Breaking CSS into Basic Components

In many cases, the CSS obtained may be made up of several semantic components equivalent to base noun phrases. Here we employ a technique based on Chinese cut marks (Wang 1992) to perform the segmentation. Chinese cut marks are tokens that can separate a Chinese sentence into several semantic parts. Zhou (1997) used this technique to detect new Chinese words, and reported good results with precision and recall of 92% and 70% respectively. By separating the CSS into basic key components, we can limit the propagation of errors.

4.2 Multi-tier query term mapping

In order to further eliminate speech recognition errors, we propose a multi-tier approach to map the basic components in the CSS to known phrases using a combination of matching techniques. To do this, we need to build a phrase dictionary containing typical concepts used in general and specific domains. Most basic CSS components should be mapped to one of these phrases. Thus even if a basic component contains errors, as long as we can find a sufficiently similar phrase in the phrase dictionary, we can use it in place of the erroneous CSS component, thus eliminating the errors.

We collected a phrase dictionary containing 32,842 phrases, covering mostly base noun phrases and named entities. The phrases are derived from two sources. We first derived a set of common phrases from a digital dictionary and the logs of the search engine used at the Shanghai Jiao Tong University. We also derived a set of domain-specific phrases by extracting the base noun phrases and named entities from the on-line news articles obtained during the period. This approach is reasonable as in practice we can use recent web or news articles to extract concepts to update the phrase dictionary.

Given the phrase dictionary, the next problem is to map the basic CSS components to the nearest phrases in the dictionary. As the basic components may contain errors, we cannot match them exactly at just the character level. We thus propose to match each basic component with the known phrases in the dictionary at three levels: (a) character level; (b) syllable string level; and (c) confusion syllable string level. The purpose of matching at levels (b) and (c) is to overcome the homophone problem in the CSS. For example, “ (Laden)” is wrongly recognized as “ (pull lamp)” by the speech recognition engine. Such errors cannot be resolved at the character matching level, but can probably be matched at the syllable string level. The confusion matrix is used to further reduce the effect of speech recognition errors due to similar-sounding characters.

To account for possible errors in CSS components, we perform similarity, instead of exact, matching at the three levels. Given the basic CSS component $q_i$ and a phrase $c_j$ in the dictionary, we compute:

$$Sim(q_i, c_j) = \frac{LCS(q_i, c_j)}{\max\{|q_i|, |c_j|\}} \times \sum_{k=0}^{LCS(q_i, c_j)} M_k \tag{2}$$

where $LCS(q_i, c_j)$ gives the number of characters/syllables matched between $q_i$ and $c_j$ in the order of their appearance, using the longest common subsequence (LCS) matching algorithm (Cormen et al 1990). $M_k$ is introduced to account for the similarity between the two matching units, and depends on the level of matching. If the matching is performed at the character or syllable string level, the basic matching unit is one character or one syllable and the similarity between two matching units is 1. If the matching is done at the confusion syllable string level, $M_k$ is the corresponding coefficient in the confusion matrix. Hence $LCS(q_i, c_j)$, normalized by the maximum of the lengths of $q_i$ and $c_j$, gives the degree of match between $q_i$ and $c_j$; and $\sum M_k$ gives the degree of similarity between the units being matched.
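
The following Python sketch shows one way to compute Equation (2). It is a minimal illustration, not the authors' implementation: phrases are assumed to be pre-split into lists of units (characters or syllables), the romanized syllables and the CONFUSION coefficients are hypothetical, and the function names are ours. The sketch reads Equation (2) literally, with one $M_k$ per matched unit, so an exact match of length L scores L rather than 1; any threshold would have to be chosen with that in mind.

```python
from typing import Callable, List, Sequence

UnitSim = Callable[[str, str], float]


def exact(a: str, b: str) -> float:
    """Unit similarity at the character or syllable level: 1 if equal."""
    return 1.0 if a == b else 0.0


# Hypothetical confusion coefficients between syllables (not from the paper).
CONFUSION = {("fa", "ta"): 0.5}


def confusable(a: str, b: str) -> float:
    """Unit similarity at the confusion syllable level."""
    if a == b:
        return 1.0
    return CONFUSION.get((a, b), CONFUSION.get((b, a), 0.0))


def matched_weights(q: Sequence[str], c: Sequence[str], unit_sim: UnitSim) -> List[float]:
    """Weights M_k of the unit pairs on one longest common subsequence.

    Two units count as matched when unit_sim(a, b) > 0.
    """
    n, m = len(q), len(c)
    # length[i][j] = LCS length of the suffixes q[i:] and c[j:]
    length = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if unit_sim(q[i], c[j]) > 0:
                length[i][j] = 1 + length[i + 1][j + 1]
            else:
                length[i][j] = max(length[i + 1][j], length[i][j + 1])
    # Walk the table from the start, collecting the weight of each match.
    weights: List[float] = []
    i = j = 0
    while i < n and j < m:
        w = unit_sim(q[i], c[j])
        if w > 0:
            weights.append(w)
            i, j = i + 1, j + 1
        elif length[i + 1][j] >= length[i][j + 1]:
            i += 1
        else:
            j += 1
    return weights


def sim(q: Sequence[str], c: Sequence[str], unit_sim: UnitSim = exact) -> float:
    """Equation (2), read literally: (LCS / max length) * sum of M_k."""
    if not q or not c:
        return 0.0
    weights = matched_weights(q, c, unit_sim)
    return len(weights) / max(len(q), len(c)) * sum(weights)


if __name__ == "__main__":
    # Character level: English glosses stand in for the lost characters.
    print(sim(["pull", "lamp"], ["La", "den"]))
    # Syllable level: both strings share the same (toy) syllables.
    print(sim(["la", "deng"], ["la", "deng"]))
    # Confusion syllable level, with the hypothetical coefficient above.
    print(sim(["fa", "li", "ban"], ["ta", "li", "ban"], confusable))
```

Passing the unit similarity as a function keeps the LCS code identical across the three levels; only the unit representation and the value of $M_k$ change.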

The three levels of matching range from more exact at the character level to less exact at the confusion syllable level. Thus if we can find a relevant phrase with $Sim(q_i, c_j)$ greater than a threshold at the higher character level, we do not perform further matching at the lower levels. Otherwise, we relax the constraint and perform the matching at successively lower levels, probably at the expense of precision.

The details of the algorithm are as follows:

Input: Basic CSS Component, $q_i$
a. Match $q_i$ with phrases in the dictionary at the character level using Eqn. (2).
b. If we cannot find a match, then match $q_i$ with phrases at the syllable level using Eqn. (2).
c. If we still cannot find a match, match $q_i$ with phrases at the confusion syllable level using Eqn. (2).
d. If we found a match, set $q'_i = c_j$; otherwise set $q'_i = q_i$.

For example, consider the query “ ” (please tell me some news about Iraq), which is wrongly recognized as “ ”. If we can nevertheless correctly extract the CSS “ (Iraq)” from this mis-recognized query, then we can ignore the speech recognition errors in other parts of the query. Even if there are errors in the CSS extracted, such as “ (chen) (waterside)” instead of “ (chen shui bian)”, we can apply syllable string level matching to correct the homophone errors. For CSS errors such as “ (corrupt) (usually)” instead of the correct CSS “ (Taliban)”, which cannot be corrected at the syllable string matching level, we can apply confusion syllable string matching to overcome the error.

5 Experiments and analysis

As our system aims to correct the errors and extract CSS components in spoken queries, it is important to demonstrate that our system is able to handle queries of different characteristics. To this end, we devised two sets of test queries as follows.

a) Corpus with short queries
We devised 10 queries, each containing a CSS with only one basic component. This is the typical type of query posed by users on the Web. We asked 10 different people to “speak” the queries, and used IBM ViaVoice 98 to perform the speech-to-text conversion. This gives rise to a collection of 100 spoken queries. There is a total of 1,340 Chinese characters in the test queries, with a speech recognition error rate of 32.5%.

b) Corpus with long queries
In order to test on queries used in standard test corpora, we adopted the query topics (1-10) employed in the TREC-5 Chinese-Language track. Here each query contains more than one key semantic component. We rephrased the queries into natural language query format, and asked twelve subjects to “read” the queries. We again used IBM ViaVoice 98 to perform the speech recognition on the resulting 120 different spoken queries, giving rise to a total of 2,354 Chinese characters with a speech recognition error rate of 23.75%.

We devised two experiments to evaluate the performance of our techniques. The first experiment was designed to test the effectiveness of our query model in extracting CSSs. The second was designed to test the accuracy of our overall system in extracting basic query components.

5.1 Test 1: Accuracy of extracting CSSs

The test results show that by using our query model, we could correctly extract 99% and 96% of CSSs from the spoken queries for the short and long query categories respectively. The errors are mainly due to the wrong tagging of some query components, which caused the query model to miss the correct query structure, or to match a wrong structure. For example, consider the query “ ” (please tell me some news about Taliban), which is wrongly recognized as:

9 7 9 10

which is a nonsensical sentence. Since the probabilities of occurrence of both query structures [0,9,7] and [7,9,10] are 0, we could not find the CSS at all. This error is mainly due to the mis-recognition of the last query component “ (news)” as “ (afternoon)”. It confuses the Query Model, which could not find the correct CSS.

The overall results indicate that there are fewer errors in short queries, as such queries contain only one CSS component. This is encouraging as in practice most users issue only short queries.

5.2 Test 2: Accuracy of extracting basic query components

In order to test the accuracy of extracting basic query components, we asked one subject to manually divide the CSS into basic components, and used that as the ground truth. We compared the following two methods of extracting CSS components:

a) As a baseline, we simply performed standard stop word removal and divided the query into components with the help of a dictionary. There is no attempt to correct the speech recognition errors in these components. Here we assume that the natural language query is a bag of words with stop words removed (Baeza-Yates and Ribeiro-Neto 1999). Currently, most search engines are based on this approach.
b) We applied our query model to extract the CSS and employed the multi-tier mapping approach to extract and correct the errors in the basic CSS components.

Tables 3 and 4 give the comparisons between Methods (a) and (b), which clearly show that our method outperforms the baseline method by 20.2 and 20.0 percentage points in F1 measure for the short and long queries respectively.

Table 3: Comparison of Methods a and b for short query

             Average Precision   Average Recall   F1
Method a     31%                 58.5%            40.5%
Method b     53.98%              69.4%            60.7%
Difference   +22.98%             +10.9%           +20.2%

Table 4: Comparison of Methods a and b for long query

             Average Precision   Average Recall   F1
Method a     39.23%              85.99%           53.9%
Method b     67.75%              81.31%           73.9%
Difference   +28.52%             -4.68%           +20.0%

The improvement is largely due to the use of our approach to extract the CSS and correct the speech recognition errors in the CSS components. More detailed analysis of long queries in Table 4 reveals that our method performs worse than the baseline method in recall. This is mainly due to errors in extracting and breaking the CSS into basic components.
Although we used the multi-tier mapping approach to reduce the errors from speech recognition, its improvement is insufficient to offset the loss in recall due to errors in extracting the CSS. On the other hand, for the short query cases, without the errors in breaking the CSS, our system is more effective than the baseline in recall. Note that in both cases, our system performs significantly better than the baseline in terms of precision and F1 measures.

6 Conclusion

Although research on natural language query processing and speech recognition has been carried out for many years, the combination of these two approaches to help a large population of infrequent users to “surf the web by voice” is relatively recent. This paper outlines a divide-and-conquer approach to alleviating the effect of speech recognition errors and extracting key CSS components for use in a standard search engine to retrieve relevant documents. The main innovative steps in our system are: (a) we use a query model to isolate the CSS in speech queries; (b) we break the CSS into basic components; and (c) we employ a multi-tier approach to map the basic components to known phrases in the dictionary. The tests demonstrate that our approach is effective.

This work is only the beginning. Further research can be carried out as follows. First, as most of the queries are about named entities such as persons or organizations, we need to perform named entity analysis on the queries to better extract their structure and to map them to known named entities. Second, most speech recognition engines return a list of probable words for each syllable. This could be incorporated into our framework to facilitate multi-tier mapping.

References

Berlin Chen, Hsin-min Wang, and Lin-Shan Lee (2001), “Improved Spoken Document Retrieval by Exploring Extra Acoustic and Linguistic Cues”, Proceedings of the 7th European Conference on Speech Communication and Technology, located at http://homepage.iis.sinica.edu.tw/
Paul S. Jacobs and Lisa F. Rau (1993), “Innovations in Text Interpretation”, Artificial Intelligence, Volume 63, October 1993 (Special Issue on Text Understanding), pp. 143-191
Thomas H. Cormen, Charles E.
Leiserson and Ronald L. Rivest (1990), “Introduction to Algorithms”, McGraw-Hill.
Jerry R. Hobbs et al (1997), “FASTUS: A Cascaded Finite-State Transducer for Extracting Information from Natural-Language Text”, in Finite-State Language Processing, Emmanuel Roche and Yves Schabes (eds.), pp. 383-406, MIT Press.
Julian Kupiec (1993), “MURAX: A robust linguistic approach for question answering using an on-line encyclopedia”, Proceedings of the 16th Annual Conference on Research and Development in Information Retrieval (SIGIR), pp. 181-190
Chin-Hui Lee et al (1996), “A Survey on Automatic Speech Recognition with an Illustrative Example on Continuous Speech Recognition of Mandarin”, Computational Linguistics and Chinese Language Processing, pp. 1-36
Helen Meng and Pui Yu Hui (2001), “Spoken Document Retrieval for the languages of Hong Kong”, International Symposium on Intelligent Multimedia, Video and Speech Processing, May 2001, located at www.se.cuhk.edu.hk/PEOPLE/
Kenney Ng (2000), “Information Fusion For Spoken Document Retrieval”, Proceedings of ICASSP'00, Istanbul, Turkey, Jun, located at http://www.sls.lcs.mit.edu/sls/publications/
Hsiao Tieh Pu (2000), “Understanding Chinese Users' Information Behaviors through Analysis of Web Search Term Logs”, Journal of Computers, pp. 75-82
Liqin Shen, Haixin Chai, Yong Qin and Tang Donald (1998), “Character Error Correction for Chinese Speech Recognition System”, Proceedings of the International Symposium on Chinese Spoken Language Processing, pp. 136-138
Amit Singhal and Fernando Pereira (1999), “Document Expansion for Speech Retrieval”, Proceedings of the 22nd Annual International Conference on Research and Development in Information Retrieval (SIGIR), pp. 34-41
Tomek Strzalkowski (1999), “Natural Language Information Retrieval”, Boston: Kluwer Publishing.
Gang Wang (2002), “Web surfing by Chinese Speech”, Master thesis, National University of Singapore.
Hsin-min Wang, Helen Meng, Patrick Schone, Berlin Chen and Wai-Kit Lo (2001), “Multi-Scale Audio Indexing for Translingual Spoken Document Retrieval”, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Salt Lake City, USA, May 2001, located at http://www.iis.sinica.edu.tw/~whm/
Yongcheng Wang (1992), Technology and Basis of Chinese Information Processing, Shanghai Jiao Tong University Press
Ricardo Baeza-Yates and Berthier Ribeiro-Neto (1999), “Introduction to Modern Information Retrieval”, London: Library Association Publishing.
Hai-nan Ying, Yong Ji and Wei Shen (2002), “Report of query log”, internal report, Shanghai Jiao Tong University
Guodong Zhou and Kim Teng Lua (1997), “Detection of Unknown Chinese Words Using a Hybrid Approach”, Computer Processing of Oriental Languages, Vol 11, No 1, 1997, pp. 63-75
Guodong Zhou (1997), “Language Modelling in Mandarin Speech Recognition”, Ph.D. Thesis, National University of Singapore.