Document: SEC 10 pptx

X Speech Processing

Richard V. Cox, AT&T Labs - Research
Lawrence R. Rabiner, AT&T Labs - Research

44 Speech Production Models and Their Digital Implementations
M. Mohan Sondhi and Juergen Schroeter
Introduction • Geometry of the Vocal and Nasal Tracts • Acoustical Properties of the Vocal and Nasal Tracts • Sources of Excitation • Digital Implementations

45 Speech Coding
Richard V. Cox
Introduction • Useful Models for Speech and Hearing • Types of Speech Coders • Current Standards

46 Text-to-Speech Synthesis
Richard Sproat and Joseph Olive
Introduction • Text Analysis and Linguistic Analysis • Speech Synthesis • The Future of TTS

47 Speech Recognition by Machine
Lawrence R. Rabiner and B. H. Juang
Introduction • Characterization of Speech Recognition Systems • Sources of Variability of Speech • Approaches to ASR by Machine • Speech Recognition by Pattern Matching • Connected Word Recognition • Continuous Speech Recognition • Speech Recognition System Issues • Practical Issues in Speech Recognition • ASR Applications

48 Speaker Verification
Sadaoki Furui and Aaron E. Rosenberg
Introduction • Personal Identity Characteristics • Vocal Personal Identity Characteristics • Basic Elements of a Speaker Recognition System • Extracting Speaker Information from the Speech Signal • Feature Similarity Measurements • Units of Speech for Representing Speakers • Input Modes • Representations • Optimizing Criteria for Model Construction • Model Training and Updating • Signal Feature and Score Normalization Techniques • Decision Process • Outstanding Issues

49 DSP Implementations of Speech Processing
Kurt Baudendistel
Software Development Targets • Software Development Paradigms • Assembly Language Basics • Arithmetic • Algorithmic Constructs

50 Software Tools for Speech Research and Development
John Shore
Introduction • Historical Highlights • The User’s Environment (OS-Based vs. Workspace-Based) • Compute-Oriented vs. Display-Oriented • Compiled vs. Interpreted • Specifying Operations Among Signals • Extensibility (Closed vs. Open Systems) • Consistency Maintenance • Other Characteristics of Common Approaches • File Formats (Data Import/Export) • Speech Databases •
Summary of Characteristics and Uses • Sources for Finding Out What is Currently Available • Future Trends

© 1999 by CRC Press LLC

With the advent of cheap, high speed processors, and with the ever-decreasing cost of memory, the cost of speech processing has been driven down to the point where it can be (and has been) embedded in almost any system, from a low cost consumer product (e.g., solid-state digital answering machines, voice controlled telephones, etc.), to a desktop application (e.g., voice dictation of a first draft quality manuscript), to an application embedded in a voice or data network (e.g., voice dialing, packet telephony, voice browser for the Internet, etc.). It is the purpose of this section of the Handbook to provide discussions of several of the key technologies in speech processing and to illustrate how the technologies are implemented using special-purpose DSP processor chips or via standard software packages running on more conventional processors.

The broad area of speech processing can be broken down into several individual areas according to both applications and technology. These include:

1. Speech Production Models and Their Digital Implementations (see Chapter 44 by Sondhi and Schroeter). In order to understand how the characteristics of a speech signal can be exploited in the different application areas, it is necessary to understand the properties and constraints of the human vocal apparatus (that is, how speech is generated by humans). It is also necessary to understand how models that simulate speech production can be built and how they can be implemented as digital systems, since such models form the basis for almost all practical speech processing systems.

2. Speech Coding (see Chapter 45 by Cox). Speech coding is the process of compressing the information in a speech signal so that it can be stored economically, or transmitted over a channel whose bandwidth is significantly smaller than that of the uncompressed signal.
Speech coding is used as the basis for most modern voice messaging and voice mail systems, for voice response systems, for digital cellular and satellite transmission of speech, for packet telephony, for ISDN teleconferencing, and for digital answering machines and digital voice encryption machines.

3. Text-to-Speech Synthesis (see Chapter 46 by Sproat and Olive). Speech synthesis is the process of creating a synthetic replica of a speech signal so as to transmit a message from a machine to a person, with the purpose of conveying the information in the message. Speech synthesis is often called “text-to-speech” or TTS, to convey the idea that, in general, the input to the system is ordinary ASCII text, and the output of the system is ordinary speech. The goal of most speech synthesis systems is to provide a broad range of capability for having a machine speak information (stored in the machine) to a user. Key aspects of synthesis systems are the intelligibility and the naturalness of the resulting speech. The major applications of speech synthesis include acting as a voice server for text-based information services (e.g., stock prices, sports scores, flight information); providing a means for reading e-mail, or the text portions of FAX messages, over ordinary phone lines; providing a means for previewing text stored in documents (e.g., document drafts, Internet files); and finally serving as a voice readout for handheld devices (e.g., phrase book translators, dictionaries, etc.).

4. Speech Recognition by Machine (see Chapter 47 by Rabiner and Juang). Speech recognition is the process of extracting the message information in a speech signal so as to control the action of a machine in response to spoken commands. In a sense, speech recognition is the complementary process to speech synthesis, and together they constitute the building blocks of a voice dialogue system with a machine.
There are many factors which influence the type of speech recognition system that is used for different applications, including the mode of speaking to the machine (e.g., single commands, digit sequences, fluent sentences), the size and complexity of the vocabulary which the machine understands, the task which the machine is asked to accomplish, the environment in which the recognition system must run, and finally the cost of the system. Although there is a wide range of applications of speech recognition systems, the most generic systems are simple “command-and-control” systems (with menu-like interfaces), and the most advanced systems support full voice dialogues for dictation, forms entry, catalog ordering, reservation services, etc.

5. Speaker Verification (see Chapter 48 by Furui and Rosenberg). Speaker verification is the process of verifying the claimed identity of a speaker for the purpose of restricting access to information (e.g., personal or private records), networks (computer, PBX), or physical premises. The basic problem of speaker verification is to decide whether or not an unknown speech sample was spoken by the individual whose identity was claimed. A key aspect of any speaker verification system is to accept the true speaker as often as possible while rejecting the impostor as often as possible. Since these are inherently conflicting goals, all practical systems arrive at some compromise between the levels of these two types of system errors. The major area of application for speaker verification is in access control to information, credit, banking, machines, computer networks, private branch exchanges (PBXs), and even premises. The concept of a “voice lock” that prevents access until the appropriate speech by the authorized individual(s) (e.g., “Open Sesame”) is “heard” by the system is made a reality using speaker verification technology.

6. DSP Implementations of Speech Processing (see Chapter 49 by Baudendistel).
Until a few years ago, almost all speech processing systems were implemented on low-cost fixed-point DSP processors because of their high efficiency in realizing the computational aspects of the various signal processing algorithms. A key problem in the realization of any digital system in integer DSP code is how to map an algorithm, which is typically running in floating-point C code on a workstation, efficiently (in both time and space) to integer C code that takes advantage of the unique characteristics of different DSP chips. Furthermore, because of the rate of change of technology, it is essential that the conversion to DSP code occur rapidly (e.g., on the order of 3 person-months), or else by the time a given algorithm is mapped to a specific DSP processor, a new (faster, cheaper) generation of DSP chips will have evolved, making the entire effort obsolete.

7. Software Tools for Speech Research and Development (see Chapter 50 by Shore). The field of speech processing has become a complex one, where an investigator needs a broad range of tools to record, digitize, display, manipulate, process, store, format, analyze, and listen to speech in its different file forms and manifestations. Although it is conceivable that an individual could create a suite of software tools for an individual application, that process would be highly inefficient and would undoubtedly result in tools which were significantly less powerful than those developed in the commercial sector, such as the Entropic Signal Processing System, MATLAB, Waves, Interactive Laboratory System (ILS), or the commercial packages for TTS and speech recognition such as the Hidden Markov Model Toolkit (HTK).

The material presented in this section should provide the reader with a framework for understanding the signal processing aspects of speech processing and some pointers into the literature for further investigation of this fascinating and rapidly evolving field.
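
The floating-point to integer mapping described under DSP implementations can be illustrated with a small sketch. It assumes the Q15 convention (a 16-bit signed integer representing values in [-1, 1)), which is a common fixed-point format but is not named in the text above; the function names are hypothetical and chosen only for illustration.

```python
# Sketch of Q15 fixed-point arithmetic (assumed format, not from the source).
Q15_SCALE = 1 << 15          # 32768: one unit in Q15 is 1/32768
Q15_MAX = (1 << 15) - 1      # 32767, largest representable value
Q15_MIN = -(1 << 15)         # -32768, smallest representable value


def saturate(v: int) -> int:
    """Clamp an integer to the 16-bit signed range instead of wrapping."""
    return max(Q15_MIN, min(Q15_MAX, v))


def float_to_q15(x: float) -> int:
    """Quantize a float in roughly [-1, 1) to a Q15 integer."""
    return saturate(int(round(x * Q15_SCALE)))


def q15_to_float(q: int) -> float:
    """Convert a Q15 integer back to a float."""
    return q / Q15_SCALE


def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 values: the raw product is Q30, so round and
    shift right by 15 bits to return to Q15, then saturate."""
    product = a * b
    return saturate((product + (1 << 14)) >> 15)


if __name__ == "__main__":
    half = float_to_q15(0.5)
    print(half, q15_to_float(q15_mul(half, half)))
```

The saturation step matters for the one case where the product overflows: -1.0 times -1.0 would be +1.0, which Q15 cannot represent, so the result is clamped to the largest positive value.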
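
The compromise between the two speaker verification errors (rejecting the true speaker and accepting an impostor) can be made concrete with a small sketch. It assumes the system produces a similarity score per utterance and accepts when the score reaches a threshold; the scores below are invented purely for illustration, and the function name is hypothetical.

```python
# Sketch of the false-rejection / false-acceptance trade-off.
# Assumption: higher score means "more like the claimed speaker".


def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false_rejection_rate, false_acceptance_rate) at a threshold.

    A genuine trial is falsely rejected when its score is below the
    threshold; an impostor trial is falsely accepted when its score is
    at or above the threshold.
    """
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    return (false_rejects / len(genuine_scores),
            false_accepts / len(impostor_scores))


if __name__ == "__main__":
    # Made-up scores, not measured data.
    genuine = [0.9, 0.8, 0.6, 0.4]
    impostor = [0.7, 0.5, 0.3, 0.1]
    for threshold in (0.5, 0.75):
        frr, far = error_rates(genuine, impostor, threshold)
        print(f"threshold={threshold}: FRR={frr}, FAR={far}")
```

Raising the threshold in this toy data lowers the false acceptance rate while raising the false rejection rate, which is the conflict between goals that the text describes.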
