Speech Technologies
Edited by Ivo Ipšić

Published by InTech
Janeza Trdine 9, 51000 Rijeka, Croatia

Copyright © 2011 InTech

All chapters are Open Access articles distributed under the Creative Commons Non Commercial Share Alike Attribution 3.0 license, which permits users to copy, distribute, transmit, and adapt the work in any medium, so long as the original work is properly cited. After this work has been published by InTech, authors have the right to republish it, in whole or in part, in any publication of which they are the author, and to make other personal use of the work. Any republication, referencing or personal use of the work must explicitly identify the original source.

Statements and opinions expressed in the chapters are those of the individual contributors and not necessarily those of the editors or publisher. No responsibility is accepted for the accuracy of information contained in the published articles. The publisher assumes no responsibility for any damage or injury to persons or property arising out of the use of any materials, instructions, methods or ideas contained in the book.

Publishing Process Manager: Iva Lipovic
Technical Editor: Teodora Smiljanic
Cover Designer: Jan Hyrat
Image Copyright George Nazmi Bebawi, 2010. Used under license from Shutterstock.com

First published June, 2011
Printed in India

A free online edition of this book is available at www.intechopen.com
Additional hard copies can be obtained from orders@intechweb.org

Speech Technologies, Edited by Ivo Ipšić
p. cm.
ISBN 978-953-307-996-7

Free online editions of InTech Books and Journals can be found at www.intechopen.com

Contents

Preface

Part 1 Speech Signal Modeling

Chapter 1 Multi-channel Feature Enhancement for Robust Speech Recognition
Rudy Rotili, Emanuele Principi, Simone Cifani, Francesco Piazza and Stefano Squartini

Chapter 2 Real-time Hardware Feature Extraction with Embedded Signal Enhancement for Automatic Speech Recognition
Vinh Vu Ngoc, James Whittington and John Devlin

Chapter 3 Nonlinear Dimensionality Reduction Methods for Use with Automatic Speech Recognition
Stephen A. Zahorian and Hongbing Hu

Chapter 4 Determination of Spectral Parameters of Speech Signal by Goertzel Algorithm
Božo Tomas and Darko Zelenika

Chapter 5 Blind Segmentation of Speech Using Non-linear Filtering Methods
Okko Räsänen, Unto K. Laine and Toomas Altosaar

Chapter 6 Towards a Multimodal Silent Speech Interface for European Portuguese
João Freitas, António Teixeira, Miguel Sales Dias and Carlos Bastos

Chapter 7 The Influence of Lombard Effect on Speech Recognition
Damjan Vlaj and Zdravko Kačič

Chapter 8 Suitable Reverberation Criteria for Distant-talking Speech Recognition
Takanobu Nishiura and Takahiro Fukumori

Chapter 9 The Importance of Acoustic Reflex in Speech Discrimination
Kelly Cristina Lira de Andrade, Silvio Caldas Neto and Pedro de Lemos Menezes

Chapter 10 Single-Microphone Speech Separation: The use of Speech Models
S. W. Lee

Part 2 Speech Recognition

Chapter 11 Speech Recognition System of Slovenian Broadcast News
Mirjam Sepesy Maučec and Andrej Žgank

Chapter 12 Wake-Up-Word Speech Recognition
Veton Këpuska

Chapter 13 Syllable Based Speech Recognition
Rıfat Aşlıyan

Chapter 14 Phone Recognition on the TIMIT Database
Carla Lopes and Fernando Perdigão

Chapter 15 HMM Adaptation Using Statistical Linear Approximation for Robust Speech Recognition
Berkovitch Michael and Shallom D.Ilan

Chapter 16 Speech Recognition Based on the Grid Method and Image Similarity
Janusz Dulas

Part 3 Applications

Chapter 17 Improvement of Sound Quality on the Body Conducted Speech Using Differential Acceleration
Masashi Nakayama, Shunsuke Ishimitsu and Seiji Nakagawa

Chapter 18 Frequency Lowering Algorithms for the Hearing Impaired
Francisco J. Fraga, Leticia Pimenta C. S. Prates, Alan M. Marotta and Maria Cecilia Martinelli Iorio

Chapter 19 The Usability of Speech and Eye Gaze as a Multimodal Interface for a Word Processor
T.R. Beelders and P.J. Blignaut

Chapter 20 Vowel Judgment for Facial Expression Recognition of a Speaker
Yasunari Yoshitomi, Taro Asada and Masayoshi Tabuse

Chapter 21 Speech Research in TUSUR
Roman V. Meshchryakov

Preface

The book “Speech Technologies” addresses different aspects of the research field and a wide range of topics in speech signal processing, speech recognition and language processing. The chapters are divided into three sections: Speech Signal Modeling, Speech Recognition and Applications. The chapters in the first section cover some essential topics in speech signal processing used for building speech recognition as well as speech synthesis systems: speech feature enhancement, speech feature vector dimensionality reduction, and segmentation of speech frames into phonetic segments.
The chapters of the second part cover speech recognition methods and techniques applied to read speech from various speech databases and to broadcast news recognition for English and non-English languages. The third section of the book presents various speech technology applications used for body-conducted speech recognition, hearing impairment, multimodal interfaces and facial expression recognition.

I would like to thank all authors who have contributed research and application papers from the field of speech and language technologies.

Ivo Ipšić
University of Rijeka, Croatia

[...]

... enhancement algorithms for robust speech recognition were presented, and their performance was tested by means of the Aurora 2 speech database, suitably modified to deal with the multi-channel case study in a far-field acoustic scenario. Three approaches are addressed here, each one operating at a different level of the common speech feature extraction ...

... between desired speech transient and interfering transient, enables the algorithm to work in nonstationary noise environments. The multi-channel postfilter, combined with the TF-GSC, proved the best for handling abrupt noise spectral variations. Moreover, in this algorithm, the decisions made by the postfilter, distinguishing between speech, stationary noise, and ...

... τb = 8 ...
5.2 Multi-channel histogram equalization

One of the well-known problems in histogram equalization is that a minimum amount of data per sentence is necessary to correctly calculate the needed cumulative densities. This problem exists for both the reference and the noisy CDFs, and it is obviously related to the available amount of speech ...

... rewritten as follows:

$$Y_i = R_i e^{j\phi_i} = A_i e^{j\alpha_i} + N_i, \qquad 1 \le i \le M \tag{4}$$

where $R_i$, $\phi_i$, $A_i$ and $\alpha_i$ are the amplitude and phase terms of $Y_i$ and $X_i$, respectively. For simplicity of notation, the frequency bin and time frame indexes have been omitted. The mel-frequency filter-bank’s output power for noisy speech is

$$m_{y_i}(b, l) = \sum_k w_b(k)\,|Y_i(k, l)|^2 \tag{5}$$

where $w_b(k)$ ...
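As a rough illustration of Eq. (5), the following Python sketch computes the filter-bank output power of a single channel and a single frame from its DFT coefficients. The triangular shape of the weights and the helper names `triangular_weights` and `filterbank_power` are assumptions made for this example only; the excerpt does not define $w_b(k)$, and the toy spectrum is invented.

```python
def triangular_weights(num_bins, left, center, right):
    """Hypothetical triangular filter w_b(k) over DFT bins (an assumption):
    rises from 0 at `left` to 1 at `center`, then falls to 0 at `right`."""
    weights = [0.0] * num_bins
    for k in range(num_bins):
        if left < k <= center:
            weights[k] = (k - left) / (center - left)
        elif center < k < right:
            weights[k] = (right - k) / (right - center)
    return weights


def filterbank_power(filters, spectrum):
    """Eq. (5) for one channel and one frame: m_y(b) = sum_k w_b(k) * |Y(k)|^2."""
    power = [abs(y) ** 2 for y in spectrum]
    return [sum(w * p for w, p in zip(w_b, power)) for w_b in filters]


# Toy frame with 8 DFT bins and two overlapping filters (illustrative values).
spectrum = [1 + 0j, 0 + 2j, 3 + 0j, 0 - 1j, 2 + 0j, 1 + 1j, 0j, 1 + 0j]
filters = [triangular_weights(8, 0, 2, 4), triangular_weights(8, 2, 4, 6)]
band_power = filterbank_power(filters, spectrum)
print(band_power)
```

In the multi-channel setting of Eq. (4), the same function would simply be applied to each channel $i = 1, \dots, M$ in turn.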
... (Ephraim & Malah (1985)), as well as techniques operating in the feature domain such as the MFCC-MMSE (Yu, Deng, Droppo, Wu, Gong & Acero (2008)) and its optimizations (Principi, Cifani, Rotili, Squartini & Piazza (2010); Yu, Deng, Wu, Gong & Acero (2008)) and VTS speech enhancement (Stouten (2006)). Other algorithms belonging to the single-channel class ...

... For each DFT channel, the histogram of the corresponding spectral amplitude was computed and then fitted by means of a nonlinear least-squares (NLLS) technique to six different PDFs:

Rayleigh: $p = \dfrac{x}{\sigma} \exp\left(-\dfrac{x^2}{2\sigma}\right)$

Laplace: $p = \dfrac{1}{2\sigma} \exp\left(-\dfrac{|x - a|}{\sigma}\right)$

Gamma: $p = \dfrac{1}{\theta^k \Gamma(k)}\, |x|^{k-1} \exp\left(-\dfrac{|x|}{\theta}\right)$

Chi: $p = \dfrac{2}{\theta^k \Gamma(k/2)}\, |x|^{k-1} \exp\left(-\dfrac{|x|^2}{\theta^2}\right)$

μ ν +1 −μ| x ...

... concatenating the MFCC vectors of each channel (CDF Conc).

Fig. 7. HEQ MFCCmean CDF mean/conc: HEQ based on averaged MFCCs and mean of CDFs or concatenated signals. (Block diagram stages: Speech signals, Baseline Front-end, Average, CDF Mean/Conc, HEQ.)

Fig. 6. ... (Block diagram stages: Speech signals, Baseline Front-end, Average, CDF, HEQ.)
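One plausible reading of the "CDF conc" variant named in the Fig. 7 caption is sketched below in Python: the MFCCs are averaged across channels, while the empirical CDF is estimated from the concatenation of all channels so that more data is available per sentence. This is a generic histogram equalization sketch, not the chapter's exact algorithm. The Gaussian reference distribution, the rank-based CDF estimate, the function names and the toy values are all assumptions made for illustration.

```python
from bisect import bisect_right
from statistics import NormalDist


def average_channels(channels):
    """Frame-wise mean of one MFCC coefficient across channels."""
    return [sum(frame) / len(frame) for frame in zip(*channels)]


def equalize(values, pool, reference=NormalDist(0.0, 1.0)):
    """Map each value through the empirical CDF of `pool`, then through the
    inverse CDF of the reference distribution (assumed Gaussian here)."""
    ordered = sorted(pool)
    n = len(ordered)
    equalized = []
    for v in values:
        rank = bisect_right(ordered, v)   # number of pool samples <= v
        p = (rank + 0.5) / (n + 1)        # kept strictly inside (0, 1)
        equalized.append(reference.inv_cdf(p))
    return equalized


# Two channels, four frames of a single MFCC coefficient (toy values).
channels = [[0.9, 2.1, 3.0, 4.2], [1.1, 1.9, 3.2, 3.8]]
averaged = average_channels(channels)
pool = [x for channel in channels for x in channel]  # concatenated channels
equalized = equalize(averaged, pool)
print(equalized)
```

Because the mapping is monotonic, the ordering of the frames is preserved; only the distribution of the coefficient is reshaped toward the reference.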