Speech Recognition Technologies and Applications (docx)

DOCUMENT INFORMATION

Basic information

Format
Pages	576
File size	41.85 MB

Contents

Speech Recognition Technologies and Applications

Edited by France Mihelič and Janez Žibert

I-Tech

Published by In-Teh

In-Teh is the Croatian branch of I-Tech Education and Publishing KG, Vienna, Austria.

Abstracting and non-profit use of the material is permitted with credit to the source. Statements and opinions expressed in the chapters are those of the individual contributors and not necessarily those of the editors or publisher. No responsibility is accepted for the accuracy of information contained in the published articles. Publisher assumes no responsibility or liability for any damage or injury to persons or property arising out of the use of any materials, instructions, methods or ideas contained inside. After this work has been published by In-Teh, authors have the right to republish it, in whole or part, in any publication of which they are an author or editor, and to make other personal use of the work.

© 2008 In-teh
www.in-teh.org
Additional copies can be obtained from: publication@ars-journal.com

First published November 2008
Printed in Croatia

A catalogue record for this book is available from the University Library Rijeka under no. 120115073

Speech Recognition, Technologies and Applications, Edited by France Mihelič and Janez Žibert
p. cm.
ISBN 978-953-7619-29-9
1. Speech Recognition, Technologies and Applications, France Mihelič and Janez Žibert

Preface

After decades of research activity, speech recognition technologies have advanced in both the theoretical and practical domains. The technology of speech recognition has evolved from the first attempts at speech analysis with digital computers by James Flanagan’s group at Bell Laboratories in the early 1960s, through to the introduction of dynamic time-warping pattern-matching techniques in the 1970s, which laid the foundations for the statistical modeling of speech in the 1980s that was pursued by Fred Jelinek and Jim Baker from IBM’s T. J. Watson Research Center. In the years 1980-90, when Lawrence H. Rabiner introduced hidden Markov models to speech recognition, a statistical approach became ubiquitous in speech processing. This established the core technology of speech recognition and started the era of modern speech recognition engines.

In the 1990s several efforts were made to increase the accuracy of speech recognition systems by modeling the speech with large amounts of speech data and by performing extensive evaluations of speech recognition in various tasks and in different languages. The degree of maturity reached by speech recognition technologies during these years also allowed the development of practical applications for voice human–computer interaction and audio-information retrieval. The great potential of such applications moved the focus of the research from recognizing speech collected in controlled environments and limited to strictly domain-oriented content, towards the modeling of conversational speech, with all its variability and language-specific problems. This has yielded the next generation of speech recognition systems, which aim to reliably recognize large-vocabulary, continuous speech, even in adverse acoustic environments and under different operating conditions. As such, the main issues today have become the robustness and scalability of automatic speech recognition systems and their integration into other speech processing applications.

This book on Speech Recognition Technologies and Applications aims to address some of these issues. Throughout the book the authors describe unique research problems together with their solutions in various areas of speech processing, with the emphasis on the robustness of the presented approaches and on the integration of language-specific information into speech recognition and other speech processing applications.
The chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation for speech signals and the methods for speech-features extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems and in other speech-processing applications that are able to operate in real-world environments, like mobile communication services and smart homes.

We would like to thank all the authors who have contributed to this book. For our part, we hope that by reading this book you will get many helpful ideas for your own research, which will help to bridge the gap between speech-recognition technology and applications.

Editors
France Mihelič, University of Ljubljana, Slovenia
Janez Žibert, University of Primorska, Slovenia

Contents

Preface

Feature extraction
1. A Family of Stereo-Based Stochastic Mapping Algorithms for Noisy Speech Recognition
   Mohamed Afify, Xiaodong Cui and Yuqing Gao
2. Histogram Equalization for Robust Speech Recognition
   Luz García, Jose Carlos Segura, Ángel de la Torre, Carmen Benítez and Antonio J. Rubio
3. Employment of Spectral Voicing Information for Speech and Speaker Recognition in Noisy Conditions
   Peter Jančovič and Münevver Köküer
4. Time-Frequency Masking: Linking Blind Source Separation and Robust Speech Recognition
   Marco Kühne, Roberto Togneri and Sven Nordholm
5. Dereverberation and Denoising Techniques for ASR Applications
   Fernando Santana Pacheco and Rui Seara
6. Feature Transformation Based on Generalization of Linear Discriminant Analysis
   Makoto Sakai, Norihide Kitaoka and Seiichi Nakagawa

Acoustic Modelling
7. Algorithms for Joint Evaluation of Multiple Speech Patterns for Automatic Speech Recognition
   Nishanth Ulhas Nair and T.V. Sreenivas
8. Overcoming HMM Time and Parameter Independence Assumptions for ASR
   Marta Casar and José A. R. Fonollosa
9. Practical Issues of Building Robust HMM Models Using HTK and SPHINX Systems
   Juraj Kacur and Gregor Rozinaj

Language modelling
10. Statistical Language Modeling for Automatic Speech Recognition of Agglutinative Languages
    Ebru Arısoy, Mikko Kurimo, Murat Saraçlar, Teemu Hirsimäki, Janne Pylkkönen, Tanel Alumäe and Haşim Sak

ASR systems
11. Discovery of Words: Towards a Computational Model of Language Acquisition
    Louis ten Bosch, Hugo Van hamme and Lou Boves
12. Automatic Speech Recognition via N-Best Rescoring using Logistic Regression
    Øystein Birkenes, Tomoko Matsui, Kunio Tanabe and Tor André Myrvoll
13. Knowledge Resources in Automatic Speech Recognition and Understanding for Romanian Language
    Inge Gavat, Diana Mihaela Militaru and Corneliu Octavian Dumitru
14. Construction of a Noise-Robust Body-Conducted Speech Recognition System
    Shunsuke Ishimitsu

Multi-modal ASR systems
15. Adaptive Decision Fusion for Audio-Visual Speech Recognition
    Jong-Seok Lee and Cheol Hoon Park
16. Multi-Stream Asynchrony Modeling for Audio Visual Speech Recognition
    Guoyun Lv, Yangyu Fan, Dongmei Jiang and Rongchun Zhao

Speaker recognition/verification
17. Normalization and Transformation Techniques for Robust Speaker Recognition
    Dalei Wu, Baojie Li and Hui Jiang
18. Speaker Vector-Based Speaker Recognition with Phonetic Modeling
    Tetsuo Kosaka, Tatsuya Akatsu, Masaharu Kato and Masaki Kohda
19. Novel Approaches to Speaker Clustering for Speaker Diarization in Audio Broadcast News Data
    Janez Žibert and France Mihelič
20. Gender Classification in Emotional Speech
    Mohammad Hossein Sedaaghi

Emotion recognition
21. Recognition of Paralinguistic Information using Prosodic Features Related to Intonation and Voice Quality
    Carlos T. Ishi
22. Psychological Motivated Multi-Stage Emotion Classification Exploiting Voice Quality Features
    Marko Lugger and Bin Yang
23. A Weighted Discrete KNN Method for Mandarin Speech and Emotion Recognition
    Tsang-Long Pao, Wen-Yuan Liao and Yu-Te Chen

Applications
24. Motion-Tracking and Speech Recognition for Hands-Free Mouse-Pointer Manipulation
    Frank Loewenich and Frederic Maire
25. Arabic Dialectical Speech Recognition in Mobile Communication Services
    Qiru Zhou and Imed Zitouni
26. Ultimate Trends in Integrated Systems to Enhance Automatic Speech Recognition Performance
    C. Durán
27. Speech Recognition for Smart Homes
    Ian McLoughlin and Hamid Reza Sharifzadeh
28. Silicon Technologies for Speaker Independent Speech Processing and Recognition Systems in Noisy Environments
    Karthikeyan Natarajan, Dr. Mala John, Arun Selvaraj
29. Voice Activated Appliances for Severely Disabled Persons
    Soo-young Suk and Hiroaki Kojima
30. System Request Utterance Detection Based on Acoustic and Linguistic Features
    T. Takiguchi, A. Sako, T. Yamagata and Y. Ariki

[...]

... automatic speech-to-speech translator," Proc. ICASSP'06, Toulouse, France, 2006.
Y. Gong, "Speech recognition in noisy environments: A survey," Speech Communication, vol. 16, pp. 261-291, April 1995.
J. Hershey, T. Kristjansson, and Z. Zhang, "Model-based fusion of bone and air sensors for speech enhancement and robust speech recognition," in ISCA Workshop on Statistical and Perceptual Audio Processing, 2004.
Q. Huo, and ...
... distribution of clean and noisy speech. However, the model of the noisy channel and the correlation model are not set free as in the case of SSM. They are parametrically related to the clean and noise distributions by the model of additive noise contamination in the log-spectral domain, and expressions of the noisy speech statistics and the correlation ...

... scenarios demanding automatic speech recognition are:
• Mobile phones
• Moving cars
• Spontaneous speech
• Speech masked by other speech
• Speech masked by music
• Non-stationary noises
• Co-channel voice interferences
Interferences caused by other speakers constitute a bigger challenge than those changes in the recognition environment due to wide-band noises ...

... the log-spectral domain for noisy speech recognition," IEEE Trans. on Speech and Audio Processing, vol. 13, no. 3, May 2005.
H. Bourlard and S. Dupont, "Subband-based speech recognition," in Proc. ICASSP'97, Munich, Germany, April 1997.
V. Digalakis, D. Rtischev, and L. Neumeyer, "Speaker adaptation by constrained estimation of Gaussian mixtures," IEEE Transactions on Speech and Audio Processing, vol. 3, no ...

... turns to (24) and the coefficients Ak and bk for the MAP estimate can be written as (25) (26)

The coefficients in Equations (25) and (26) are exactly the same as those for the MMSE estimate that are given in Equations (20) and (21). To summarize, the MAP and MMSE estimates ...

Footnote 1: Note that other inverses that appear in the equations can be pre-computed and stored.
... the mixture weights, and the means and covariances of each Gaussian. The mean and covariance of the kth component of state i can, similar to Equations (2) and (3), be partitioned as (32) (33) where subscripts x and y indicate the clean and noisy speech features respectively. For the kth component of state i, given the observed noisy speech feature y, the MMSE estimate of the clean speech x is given by ...

... types of acoustic distortion for robust speech recognition," in Proc. Eurospeech'01, Aalborg, Denmark, September 2001.
M. Gales and S. Young, "Robust continuous speech recognition using parallel model combination," IEEE Transactions on Speech and Audio Processing, vol. 4, 1996.
M. Gales, "Semi-tied covariance matrices for hidden Markov models," IEEE Transactions on Speech and Audio Processing, vol. 7, pp. 272-281, ...

... utterance using the noisy marginal of the stereo HMM and converted into an N-best list. Different sizes of the list were tested, and results for lists of sizes 5, 10 and 15 are shown in the tables. Hence, the summation in the denominator of Equation (37) is performed over the list, and different values (1.0, 0.6 and 0.3) of the weighting υ are evaluated (denoted ...

... uncertainty in observations: A new paradigm for robust speech recognition," in Proc. ICASSP'02, Orlando, Florida, May 2002.
C.H. Lee, "On stochastic feature and model compensation approaches to robust speech recognition," Speech Communication, vol. 25, pp. 29-47, 1998.
C.H. Lee and Q. Huo, "On adaptive decision rules and decision parameter adaptation for automatic speech recognition," Proceedings of the IEEE, vol ...
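The fragment above on the kth Gaussian component of state i describes a joint distribution over clean and noisy features whose mean and covariance are partitioned into x and y blocks, but the preview truncates the resulting estimate. As a hedged illustration only, the sketch below uses the textbook conditional-mean result for a jointly Gaussian pair, E[x | y] = mean_x + cov_xy / var_yy * (y - mean_y), and weights the per-component estimates by each component's posterior probability given y. It is not taken from the chapter: the scalar features, the class name JointGaussian, and the functions component_mmse and mixture_mmse are all assumptions chosen to keep the example dependency-free.

```python
import math
from dataclasses import dataclass


@dataclass
class JointGaussian:
    """One mixture component over the joint (clean x, noisy y) feature.

    Hypothetical container; scalar features are an assumption of this sketch.
    """
    weight: float  # mixture weight of the component
    mean_x: float  # mean of the clean feature
    mean_y: float  # mean of the noisy feature
    var_yy: float  # variance of the noisy feature (must be > 0)
    cov_xy: float  # cross-covariance between clean and noisy features


def gaussian_pdf(y: float, mean: float, var: float) -> float:
    """Density of a univariate Gaussian evaluated at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def component_mmse(c: JointGaussian, y: float) -> float:
    """Conditional mean E[x | y] under a single joint Gaussian component."""
    return c.mean_x + c.cov_xy / c.var_yy * (y - c.mean_y)


def mixture_mmse(components: list[JointGaussian], y: float) -> float:
    """Posterior-weighted sum of the per-component conditional means."""
    # Unnormalized posteriors: weight times the noisy-marginal likelihood of y.
    scores = [c.weight * gaussian_pdf(y, c.mean_y, c.var_yy) for c in components]
    total = sum(scores)  # assumed non-zero; no underflow guard in this sketch
    return sum(s / total * component_mmse(c, y) for s, c in zip(scores, components))
```

For a single component with mean_x = 1.0, mean_y = 2.0, var_yy = 4.0 and cov_xy = 2.0, an observation y = 4.0 gives component_mmse = 1.0 + (2.0 / 4.0) * 2.0 = 2.0. With vector features the division by var_yy would become multiplication by an inverse covariance matrix, which is where the pre-computed inverses mentioned in the footnote fragment would matter.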
... of hands-free (HF) data using three different configurations of MAP-SSM for 256 GMM size and different time window size

B. SSM experiments for large vocabulary spontaneous speech recognition

In this set of experiments the proposed technique is used for large vocabulary spontaneous English speech recognition. The mapping is applied with the clean speech ...

Date posted: 26/06/2014, 23:20
