Doctoral thesis: speech analysis and detection based on nonlinear dynamical characteristics

MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

ĐẶNG THÁI SƠN

SPEECH ANALYSIS AND DETECTION BASED ON NONLINEAR DYNAMICAL CHARACTERISTICS

DOCTORAL THESIS IN ELECTRONIC ENGINEERING

Major: Electronic Engineering
Code: 62520203
SCIENTIFIC SUPERVISOR: Assoc. Prof. Dr. HOÀNG MẠNH THẮNG

HANOI - 2017

DECLARATION

I declare that the results presented in this thesis are my own research, carried out under the guidance of my supervisor. The data and results presented in the thesis are entirely truthful and have not been published in any earlier work. Results taken from other sources are fully cited in accordance with regulations.

Hanoi, 2017
Author: Đặng Thái Sơn

ACKNOWLEDGMENTS

To complete this thesis, I would like to express my deep gratitude to the faculty of the Department of Electronics and Computer Engineering, School of Electronics and Telecommunications, for their support, help, and encouragement throughout my doctoral work at Hanoi University of Science and Technology.

I also thank my supervisor, Assoc. Prof. Hoàng Mạnh Thắng, who guided and oriented my research.

I thank Assoc. Prof. Santo Banerjee for important discussions on the thesis.

Thank you very much.

Hanoi, 2017

TABLE OF CONTENTS

LIST OF ABBREVIATIONS
LIST OF FIGURES
LIST OF TABLES
LIST OF SYMBOLS
INTRODUCTION
Chapter 1: Overview of speech recognition
1.1 Introduction
1.2 Historical background
1.3 Acoustic features and types of speech signals
1.4 Complex characteristics in speech recognition
1.5 Effect of noise on speech recognition
1.6 Voice activity detection (VAD)
1.7 Research on nonlinear signal processing
1.8 Linear and nonlinear speech recognition
1.9 Endpoint detection of speech signals
1.10 Time domain and time-frequency domain
1.11 Hidden Markov models
1.12 Conclusion
Chapter 2: Investigating the complexity of stochastic systems and speech signals
2.1 Introduction
2.2 Recurrence plots and measures
2.2.1 Recurrence plots
2.2.2 Entropy of the weighted recurrence plot
2.3 Assessing the complexity of speech signals using recurrence plots
2.3.1 WRP of systems under the influence of noise
2.4 Applying synchronization methods to analyze the dynamical characteristics of speech signals
2.4.1 Synchronization error analysis
2.4.2 Mean Conditional Recurrence (MCR)
2.4.3 Identifying drive-response systems using mean conditional recurrence
2.5 Speech signal recognition using reconstructed phase space
2.5.1 Benefits of nonlinear dynamics for signal processing
2.6 Speech signal acquisition
2.7 Voice activity recognition technique
2.8 Frequency and time-frequency analysis
2.9 Phase space reconstruction and nonlinear recurrence dynamics of speech
2.10 Applying recurrence dynamics to speech signal recognition
2.11 Conclusion
Chapter 3: Proposed voice activity detection method
3.1 Introduction
3.1.1 General overview
3.1.2 VAD system
3.1.3 Objectives
3.1.4 Methods for evaluating VAD algorithms
3.2 VAD methods
3.2.1 VAD based on zero-crossing rate and signal energy [7]
3.2.2 VAD based on linear energy (LED) [119]
3.2.3 VAD based on adaptive linear energy [119]
3.2.4 VAD based on pattern recognition
3.2.4.1 Number of zero crossings
3.2.4.2 Log energy
3.2.4.3 Normalized autocorrelation coefficient
3.2.4.4 Prediction coefficients
3.2.4.5 Normalized prediction error
3.2.4.6 Distance computation
3.2.4.7 Implementation procedure
3.2.5 VAD based on statistical measures [19, 20]
3.3 Methods for evaluating the performance of VAD algorithms
3.3.1 Objective parameters
3.4 Data collection methods and the AURORA dataset [1, 42]
3.5 Proposed features and VAD method
3.5.1 Computation with a sample-shifted window
3.5.2 Computation of the feature
3.5.3 Computation of the feature Tp
3.5.4 Computation of the combined feature Sp and endpoint determination
3.6 Evaluating the method on different speech signals
3.6.1 Evaluation on different speech signals without noise
3.6.2 Application to different speech signals with noise
3.7 Comparison and evaluation of results
3.7.1 Comparison with existing methods
3.8 Conclusion
Conclusion
List of published works

LIST OF ABBREVIATIONS

ABBREVIATION	ENGLISH TERM
AMI	Average mutual information
ApEn	Approximate entropy
AR	Auto-regression
ASR	Automatic speech recognition
CASA	Computational auditory scene analysis
CML	Cepstral mean normalization
CS	Complete synchronization
ECG	Electrocardiogram
EEG	Electroencephalogram
EMG	Electromyogram
EOG	Electrooculogram
FEC	Front End Clipping
FFT	Fast Fourier Transform
FNN	False nearest neighbor
GS	Generalized synchronization
HCI	Human computer interface
HMM	Hidden Markov model
JRP	Joint Recurrence Plot
LE	Lyapunov Exponent
LS	Lorenz-Stenflo (dynamical system)
LPC	Linear predictive coding
LRT	Likelihood ratio test
LVCSR	Large vocabulary continuous speech recognition
MSC	Mid-Speech Clipping
MCR	Mean Conditional Recurrence
MEG	Magnetoencephalogram
MFCC	Mel-Frequency Cepstrum Coefficient
NPD	Normal probability distribution
NSE	Normalized synchronization error
OVER	Over Hang
RASTA	Relative spectral processing
RP	Recurrence plots
PDF	Probability density function
PLP	Perceptual Linear Prediction
SampEn	Sample entropy
SNR	Signal-to-noise Ratio
SR	Speech recognition
STE	Short time energy
TEO	Teager energy operator
VAD	Voice activity detection
ZCAE	Zero-crossing amplitude estimation
WRP	Weighted recurrence plot
WRPE	Entropy of the WRP

LIST OF PUBLICATIONS OF THE THESIS

[C1] Dang Thai Son, Thang Manh Hoang, "An Average Technique for Real Time Voice Activity Detection in Time Domain," IEEE ICCE 2016, 27-29 Jul 2016, pp. 614-617.
[J1] Thai Son Dang, Sanjay Kumar Palit, Sayan Mukherjee, Thang Manh Hoang, Santo Banerjee, "Complexity and synchronization in stochastic chaotic systems," The European Physical Journal Special Topics (EPJ ST) 225, 159–170, 2016.
[J2] Dang Thai Son, Sayan Mukherjee, Thang Manh Hoang, Santo Banerjee, "An Average Technique for Real Time Voice Activity Detection in Time Domain," The Journal of Science and Technology (7 Technical Universities) 113, 2016.
[J3] Thai Son Dang, Thang Manh Hoang, "An endpoint detection technique for voice and nonvoice recognition," The Journal of Science and Technology (7 Technical Universities) (accepted), 2016.

REFERENCES

[1] Noizeus: A noisy speech corpus for evaluation of speech enhancement algorithm.
[2] (1969, June). IEEE recommended practice for speech quality measurements. IEEE No 297-1969, 1–24.
[3] AILab. Ailab.
[4] Albus, J., R. Anderson, J. Brayer, R. DeMori, H.-Y. Feng, S. Horowitz, B. Moayer, T. Pavlidis, W. Stallings, P. Swain, et al. (2012). Syntactic pattern recognition, applications, Volume 14. Springer Science & Business Media.
[5] Atal, B. and L. Rabiner (1976, Jun). A pattern recognition approach to voiced-unvoiced-silence classification with applications to speech recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing 24 (3), 201–212.
[6] Atal, B. and M. Schroeder (1978, Apr). Predictive coding of speech signals and subjective error criteria. In Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '78.,
Volume 3, pp. 573–576.
[7] Bachu, R., S. Kopparthi, B. Adapa, and B. Barkana (2010). Voiced/Unvoiced Decision for Speech Signals Based on Zero-Crossing Rate and Energy, pp. 279–282. Dordrecht: Springer Netherlands.
[8] Banerjee, S., P. Saha, and A. R. Chowdhury (2001). Chaotic scenario in the stenflo equations. Physica Scripta 63 (3), 177–180.
[9] Beritelli, F., S. Casale, and A. Cavallaro (1998, Dec). A robust voice activity detector for wireless communications using soft computing. IEEE Journal on Selected Areas in Communications 16 (9), 1818–1829.
[10] Blauth, D. A., V. P. Minotto, C. R. Jung, B. Lee, and T. Kalker (2012a). Voice activity detection and speaker localization using audiovisual cues. Pattern Recognition Letters 33 (4), 373–380.
[11] Blauth, D. A., V. P. Minotto, C. R. Jung, B. Lee, and T. Kalker (2012b). Voice activity detection and speaker localization using audiovisual cues. Pattern Recognition Letters 33 (4), 373–380. Intelligent Multimedia Interactivity.
[12] Boashash, B. (2015). Time-Frequency Signal Analysis with Applications. UK: Academic Press.
[13] Bouquin-Jeannès, R. L. and G. Faucon (1995). Study of a voice activity detector and its influence on a noise reduction system. Speech Communication 16 (3), 245–254.
[14] Bradley, E. and R. Mantilla (2002). Recurrence plots and unstable periodic orbits. Chaos 12 (3), 596–600.
[15] Chen, Y., M. Ding, and J. A. S. Kelso (1997, December). Long Memory Processes (1/f^α Type) in Human Coordination. Physical Review Letters 79, 4501–4504.
[16] Cho, Y. D., K. Al-Naimi, and A. Kondoz (2001, Apr). Mixed decision-based noise adaptation for speech enhancement. Electronics Letters 37 (8), 540–542.
[17] Cohen, P. R. and S. L. Oviatt (1995). The role of voice input for human-machine communication. Proceedings of the National Academy of Sciences 92 (22), 9921–9927.
[18] CSLU Toolkit (2009). Cslu toolkit, 2009.
[19] Davis, A. and S. Nordholm (2003). A low complexity statistical voice activity detector with performance comparisons to itu-t/etsi voice activity detectors. In Information, Communications and Signal Processing, 2003 and Fourth Pacific Rim Conference on Multimedia. Proceedings of the 2003 Joint Conference of the Fourth International Conference on, Volume 1, pp. 119–123. IEEE.
[20] Davis, A., S. Nordholm, and R. Togneri (2006). Statistical voice activity detection using low-variance spectrum estimation and an adaptive threshold. IEEE Transactions on Audio, Speech, and Language Processing 14 (2), 412–424.
[21] Davis, K. H., R. Biddulph, and S. Balashek (1952). Automatic recognition of spoken digits. The Journal of the Acoustical Society of America 24 (6), 637–642.
[22] Dimitriadis, D., P. Maragos, and A. Potamianos (2002, May). Modulation features for speech recognition. In Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on, Volume 1, pp. I–377–I–380.
[23] Dov, D., R. Talmon, and I. Cohen (2015, April). Audio-visual voice activity detection using diffusion maps. IEEE/ACM Trans. Audio, Speech and Lang. Proc. 23 (4), 732–745.
[24] Erland, S. and P. E. Greenwood (2007, Sep). Constructing 1/ω^α noise from reversible markov chains. Phys. Rev. E 76, 031114.
[25] Eroglu, D., T. K. D. Peron, N. Marwan, F. A. Rodrigues, L. d. F. Costa, M. Sebek, I. Z. Kiss, and J. Kurths (2014, Oct). Entropy of weighted recurrence plots. Phys. Rev. E 90, 042919.
[26] Farmer, J. D. and J. J. Sidorowich (2013). Exploiting Chaos to Predict the Future and Reduce Noise, pp. 277–330. World Scientific.
[27] Fraser, A. M. and H. L. Swinney (1986, Feb). Independent coordinates for strange attractors from mutual information. Phys. Rev. A 33, 1134–1140.
[28] Freeman, D., G. Cosier, C. Southcott, and I. Boyd (1989). The voice activity detector for the pan-european digital cellular mobile telephone service. In Acoustics, Speech, and Signal Processing, 1989. ICASSP-89., 1989 International Conference on, pp. 369–372. IEEE.
[29] Gao, X., H. Cao, J. Zhang, J. Bai, T. Zhang, and L. Jia (2013). A real-time dsp-based system for voice activity detection: Design and implement. International Journal of Signal Processing, Image Processing and Pattern Recognition (6), 27–40.
[30] Garza, V. R. (1997). Product reviews: Continuous speech-recognition software: Naturallyspeaking edges out viavoice with hands-free editing. In InfoWorld, pp. 116.
[31] Gazor, S. and W. Zhang (2003a, Sept). A soft voice activity detector based on a laplacian-gaussian model. IEEE Transactions on Speech and Audio Processing 11 (5), 498–505.
[32] Gazor, S. and W. Zhang (2003b, Sept). A soft voice activity detector based on a laplacian-gaussian model. IEEE Transactions on Speech and Audio Processing 11 (5), 498–505.
[33] Gold, B. and N. Morgan (1999). Speech and Audio Signal Processing: Processing and Perception of Speech and Music (1st ed.). New York, NY, USA: John Wiley & Sons, Inc.
[34] Haigh, J. A. and J. S. Mason (1993, Oct). Robust voice activity detection using cepstral features. In TENCON '93. Proceedings. Computer, Communication, Control and Power Engineering. 1993 IEEE Region 10 Conference on, Volume 3, pp. 321–324 vol.3.
[35] Hamila, R., J. Astola, F. A. Cheikh, M. Gabbouj, and M. Renfors (1999, Jan). Teager energy and the ambiguity function. IEEE Transactions on Signal Processing 47 (1), 260–262.
[36] Hamila, R., M. Renfors, M. Gabbouj, and J. Astola (1997). Time-frequency signal analysis using teager energy. In Proc. Fourth International Conference on Electronics, Circuits and Systems, (Cairo, Egypt), pp. 911–914, December 1997.
[37] Haykin, S. (2001). Adaptive Filter Theory (4th ed.). New York, NY, USA: Prentice Hall.
[38] Hermansky, H. (1990). Perceptual linear predictive (plp) analysis of speech. The Journal of the Acoustical Society of America 87 (4), 1738–1752.
[39] Hermansky, H., N. Morgan, and H. G. Hirsch (1993, April). Recognition of speech in additive and convolutional noise based on rasta spectral processing. In Acoustics, Speech, and Signal Processing, 1993. ICASSP-93., 1993 IEEE International Conference on, Volume 2, pp. 83–86 vol.2.
[40] Hilborn, R. (2000). Chaos and nonlinear dynamics: an introduction for scientists and engineers (2nd ed.). Oxford University Press.
[41] https://www.itu.int/net/itu-t/sigdb/genaudio/AudioForm g.aspx?val=1000050 (2009, Sept).
[42] Hu, Y. and P. C. Loizou (2007). Subjective comparison and evaluation of speech enhancement algorithms. Speech Communication 49 (7–8), 588–601.
[43] Huang, X., A. Acero, and H.-W. Hon (2001). Spoken Language Processing: A Guide to Theory, Algorithm, and System Development (1st ed.). Upper Saddle River, NJ, USA: Prentice Hall PTR.
[44] Hui, L., B.-Q. Dai, and L. Wei (2006, May). A pitch detection algorithm based on amdf and acf. In 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, Volume 1, pp. I–I.
[45] Igarashi, T. and J. F. Hughes (2001). Voice as sound: using non-verbal voice input for interactive control. In Proceedings of the 14th annual ACM symposium on User interface software and technology, pp. 155–156. ACM.
[46] Itakura, F. (1975, Feb). Minimum prediction residual principle applied to speech recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing 23 (1), 67–72.
[47] Iwanski, J. S. and E. Bradley (1998). Recurrence plots of experimental data: To embed or not to embed?
Chaos (4), 861–871.
[48] Jaffery, Z., K. Ahmad, and P. Sharma (2010). Estimation of speech signal in the presence of white noise using wavelet transform. In Int. Conf. on Control, Communication and Power Engineering. ACEEE.
[49] Junqua, J.-C. and J.-P. Haton (2012). Robustness in automatic speech recognition: Fundamentals and applications, Volume 341. Springer Science & Business Media.
[50] Junqua, J. C., B. Mak, and B. Reaves (1994, Jul). A robust algorithm for word boundary detection in the presence of noise. IEEE Transactions on Speech and Audio Processing (3), 406–412.
[51] Kantz, H. and T. Schreiber (2004). Nonlinear Time Series Analysis. Cambridge University Press.
[52] Kaulakys, B. and T. Meškauskas (1998, Dec). Modeling 1/f noise. Phys. Rev. E 58, 7013–7019.
[53] Kelebekler, E. and M. Inal (2006). White and Color Noise Cancellation of Speech Signal by Adaptive Filtering and Soft Computing Algorithms, pp. 970–975. Berlin, Heidelberg: Springer Berlin Heidelberg.
[54] Kennel, M. B., R. Brown, and H. D. I. Abarbanel (1992a, Mar). Determining embedding dimension for phase-space reconstruction using a geometrical construction. Phys. Rev. A 45, 3403–3411.
[55] Kennel, M. B., R. Brown, and H. D. I. Abarbanel (1992b, Mar). Determining embedding dimension for phase-space reconstruction using a geometrical construction. Phys. Rev. A 45, 3403–3411.
[56] Kleijn, W. B. and K. K. Paliwal (Eds.) (1995). Speech Coding and Synthesis. New York, NY, USA: Elsevier Science Inc.
[57] Kolmogorov, A. (1959). On entropy per unit time as a metric invariant of automorphisms. Dokl. Akad. Nauk SSSR 124, 754–755.
[58] Kristjansson, T., S. Deligne, and P. Olsen (2005). Voicing features for robust speech detection. In Ninth European Conference on Speech Communication and Technology.
[59] Kumar, A. and S. K. Mullick (1996). Nonlinear dynamical analysis of speech. The Journal of the Acoustical Society of America 100 (1), 615–629.
[60] Lamel, L., L. Rabiner, A. Rosenberg, and J. Wilpon (1981, Aug). An improved endpoint detector for isolated word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing 29 (4), 777–785.
[61] Lartillot, O. and P. Toiviainen (2007, September 23-27). Mir in matlab (ii): A toolbox for musical feature extraction from audio. In Proceedings of the 8th International Conference on Music Information Retrieval, Vienna, Austria, pp. 127–130.
[62] Lee, H. (2001, Jan). Statistical confidence measures and their applications. In Proc. ICSP, pp. 1021–1028.
[63] Lee, K. Y., B.-G. Lee, and S. Ann (1997, Oct). Adaptive filtering for speech enhancement in colored noise. IEEE Signal Processing Letters (10), 277–279.
[64] Letellier, C. (2006, Jun). Estimating the shannon entropy: Recurrence plots versus symbolic dynamics. Phys. Rev. Lett. 96, 254102.
[65] Letellier, C., H. Rabarimanantsoa, L. Achour, A. Cuvelier, and J.-F. Muir (2008). Recurrence plots for dynamical analysis of non-invasive mechanical ventilation. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences 366 (1865), 621–634.
[66] Li, K., M. N. S. Swamy, and M. O. Ahmad (2005, Sept). An improved voice activity detection using higher order statistics. IEEE Transactions on Speech and Audio Processing 13 (5), 965–974.
[67] Lorenz, E. N. (1963). Deterministic nonperiodic flow. Journal of the atmospheric sciences 20 (2), 130–141.
[68] Maganti, H. K., P. Motlicek, and D. Gatica-Perez. Unsupervised speech/non-speech detection for automatic speech recognition in meeting rooms. Martigny, Switzerland: Idiap, 2006.
[69] Mak, M.-W. and H.-B. Yu (2014). A study of voice activity detection techniques for NIST speaker recognition evaluations. Computer Speech & Language 28 (1), 295–313.
[70] Maragos, P., J. F. Kaiser, and T. F. Quatieri (1993, Oct). Energy separation in signal modulations with application to speech analysis. IEEE Transactions on Signal Processing 41 (10), 3024–3051.
[71] Maragos, P. and A. Potamianos (1999). Fractal dimensions of speech sounds: Computation and application to automatic speech recognition. The Journal of the Acoustical Society of America 105 (3), 1925–1932.
[72] Markowitz, J. A. (2000). Using speech recognition. Markowitz, J. Consultants.
[73] Martinez, W. and A. Martinez (2002). Computational Statistics Handbook with Matlab. Chapman & Hall/CRC.
[74] Marwan, N. and J. Kurths (2005). Line structures in recurrence plots. Physics Letters A 336 (4–5), 349–357.
[75] Marwan, N., M. C. Romano, M. Thiel, and J. Kurths (2007). Recurrence plots for the analysis of complex systems. Physics Reports 438 (5–6), 237–329.
[76] Marwan, N., N. Wessel, U. Meyerfeldt, A. Schirdewan, and J. Kurths (2002a, Aug). Recurrence-plot-based measures of complexity and their application to heart-rate-variability data. Phys. Rev. E 66, 026702.
[77] Marwan, N., N. Wessel, U. Meyerfeldt, A. Schirdewan, and J. Kurths (2002b, Aug). Recurrence-plot-based measures of complexity and their application to heart-rate-variability data. Phys. Rev. E 66, 026702.
[78] MICA. Speech communication department.
[79] Miller, K. D. and T. W. Troyer (2002a). Neural noise can explain expansive, power-law nonlinearities in neural response functions. Journal of Neurophysiology 87 (2), 653–659.
[80] Miller, K. D. and T. W. Troyer (2002b). Neural noise can explain expansive, power-law nonlinearities in neural response functions. Journal of Neurophysiology 87 (2), 653–659.
[81] Mitchell, T. M. (1997). Machine Learning (1 ed.). New York, NY, USA: McGraw-Hill, Inc.
[82] Moattar, M. H. and M. M. Homayounpour (2011). A weighted feature voting approach for robust and real-time voice activity detection. ETRI Journal 33 (1), 99–109.
[83] Moore, M., S. Mitra, and R. Bernstein (1997). A generalization of the teager algorithm. In Proc. 1997 IEEE Workshop on Nonlinear Signal Processing, (Ann Arbor, Michigan), September.
[84] Mukherjee, S., S. K. Palit, S. Banerjee, M. Ariffin, L. Rondoni, and D. Bhattacharya (2015a). Can complexity decrease in congestive heart failure? Physica A: Statistical Mechanics and its Applications 439, 93–102.
[85] Mukherjee, S., S. K. Palit, S. Banerjee, M. Ariffin, L. Rondoni, and D. Bhattacharya (2015b). Can complexity decrease in congestive heart failure? Physica A: Statistical Mechanics and its Applications 439, 93–102.
[86] Muroi, T., R. Takashima, T. Takiguchi, and Y. Ariki (2009, Jan). Gradient-based acoustic features for speech recognition. In Intelligent Signal Processing and Communication Systems, 2009. ISPACS 2009. International Symposium on, pp. 445–448.
[87] Naik, G. R., D. K. Kumar, V. P. Singh, and M. Palaniswami (2006). Hand gestures for hci using ica of emg. In Proceedings of the HCSNet workshop on Use of vision in human-computer interaction-Volume 56, pp. 67–72. Australian Computer Society, Inc.
[88] Naylor, P. A., A. Kounoudes, J. Gudnason, and M. Brookes (2007, Jan). Estimation of glottal closure instants in voiced speech using the dypsa algorithm. IEEE Transactions on Audio, Speech, and Language Processing 15 (1), 34–43.
[89] Ney, H. (1981, Apr). An optimization algorithm for determining the endpoints of isolated utterances. In Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '81., Volume 6, pp. 720–723.
[90] Noll, A. M. (1967). Cepstrum pitch determination. The journal of the acoustical society of America 41 (2), 293–309.
[91] Ogorzalek, M. J. (2002). Using nonlinear dynamics and chaos to solve signal processing tasks. In Chaos In Circuits And Systems, pp. 487–507.
[92] Oppenheim, A.
V. (1970, Aug). Speech spectrograms using the fast fourier transform. IEEE Spectrum (8), 57–62.
[93] Packard, N. H., J. P. Crutchfield, J. D. Farmer, and R. S. Shaw (1980, Sep). Geometry from a time series. Phys. Rev. Lett. 45, 712–716.
[94] Palit, S. K., S. Mukherjee, and D. Bhattacharya (2012). New types of nonlinear auto-correlations of bivariate data and their applications. Applied Mathematics and Computation 218 (17), 8951–8967.
[95] Palit, S. K., S. Mukherjee, and D. Bhattacharya (2013). A high dimensional delay selection for the reconstruction of proper phase space with cross autocorrelation. Neurocomputing 113, 49–57.
[96] Park, H.-M. and R. M. Stern (2009, January). Spatial separation of speech signals using amplitude estimation based on interaural comparisons of zero-crossings. Speech Commun. 51 (1), 15–25.
[97] Payette, J. (1994). Advanced human-computer interface and voice processing applications in space. In Proceedings of the workshop on Human Language Technology, pp. 416–420. Association for Computational Linguistics.
[98] Pearce, D., H. Günter Hirsch, and E. E. D. Gmbh (2000). The aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions. In ISCA ITRW ASR2000, pp. 29–32.
[99] Pecora, L. M. and T. L. Carroll (1990, Feb). Synchronization in chaotic systems. Phys. Rev. Lett. 64, 821–824.
[100] Pecora, L. M. and T. L. Carroll (1991, Aug). Driving systems with chaotic signals. Phys. Rev. A 44, 2374–2383.
[101] Petry, A. and D. A. C. Barone (2002). Speaker identification using nonlinear dynamical features. Chaos, Solitons & Fractals 13 (2), 221–231.
[102] Pincus, S. M. (1991). Approximate entropy as a measure of system complexity. Proceedings of the National Academy of Sciences 88 (6), 2297–2301.
[103] Potamianos, A. and P. Maragos (2001, 3). Time-frequency distributions for automatic speech recognition. IEEE Transactions on Speech and Audio Processing 9, 196–200.
[104] Prasad, R. V., A. Sangwan, H. Jamadagni, M. Chiranth, R. Sah, and V. Gaurav (2002). Comparison of voice activity detection algorithms for voip. In Computers and Communications, 2002. Proceedings. ISCC 2002. Seventh International Symposium on, pp. 530–535. IEEE.
[105] Rabarimanantsoa, H., L. Achour, C. Letellier, A. Cuvelier, and J.-F. Muir (2007). Recurrence plots and shannon entropy for a dynamical analysis of asynchronisms in noninvasive mechanical ventilation. Chaos 17 (1), 013115.
[106] Rabiner, L. and B.-H. Juang (1993). Fundamentals of Speech Recognition. Upper Saddle River, NJ, USA: Prentice-Hall, Inc.
[107] Rabiner, L. and R. Schafer (2011). Digital speech processing. The Froehlich/Kent Encyclopedia of Telecommunications 6, 237–258.
[108] Rabiner, L. R. (1989, Feb). A tutorial on hidden markov models and selected applications in speech recognition. Proceedings of the IEEE 77 (2), 257–286.
[109] Radmard, M., M. Hadavi, and M. Nayebi (2011). A new method of voiced/unvoiced classification based on clustering. Journal of Signal and Information Processing 2, 336–347.
[110] Ramirez, J., J. M. Górriz, and J. C. Segura (2007). Voice activity detection fundamentals and speech recognition system robustness. INTECH Open Access Publisher.
[111] Ramirez, J., J. C. Segura, C. Benitez, A. de la Torre, and A. Rubio (2004). Efficient voice activity detection algorithms using long-term speech information. Speech Communication 42 (34), 271–287.
[112] Ramirez, J., J. C. Segura, C. Benitez, L. Garcia, and A. Rubio (2005, Oct). Statistical voice activity detection using a multiple observation likelihood ratio test. IEEE Signal Processing Letters 12 (10), 689–692.
[113] Recognition, S. Speech recognition.
[114] Renevey, P. and A. Drygajlo (2001). Entropy based voice activity detection in very noisy conditions. In Eurospeech, pp. 1887–1890.
[115] Richman, J. S. and J. R. Moorman (2000). Physiological time-series analysis using approximate entropy and sample entropy. American Journal of Physiology - Heart and Circulatory Physiology 278 (6), H2039–H2049.
[116] Rodríguez-Bermúdez, G. and P. J. García-Laencina (2015). Analysis of eeg signals using nonlinear dynamics and chaos: a review. Applied Mathematics & Information Sciences (5), 2309.
[117] Romano, M. C., M. Thiel, J. Kurths, and C. Grebogi (2007, Sep). Estimation of the direction of the coupling by conditional probabilities of recurrence. Phys. Rev. E 76, 036211.
[118] Rouat, J., Y. C. Liu, and D. Morissette (1997). A pitch determination and voiced/unvoiced decision algorithm for noisy speech. Speech Communication 21 (3), 191–207.
[119] Sakhnov, K., E. Verteletskaya, and B. Simak (2009). Dynamical energy-based speech/silence detector for speech enhancement applications. In Proceedings of the World Congress on Engineering, Volume 1, pp. Citeseer.
[120] Savoji, M. (1989a). A robust algorithm for accurate endpointing of speech signals. Speech Communication (1), 45–60.
[121] Savoji, M. H. (1989b, March). A robust algorithm for accurate endpointing of speech signals. Speech Commun. (1), 45–60.
[122] Shannon, C. E. (1948). A Mathematical Theory of Communication. The Bell System Technical Journal 27 (3), 379–423.
[123] Shannon, C. E. (2001). A mathematical theory of communication. ACM SIGMOBILE Mobile Computing and Communications Review (1), 3–55.
[124] Shen, J.-l., J.-w. Hung, and L.-s. Lee (1998). Robust entropy-based endpoint detection for speech recognition in noisy environments. In ICSLP, Volume 98, pp. 232–235.
[125] Sinai, Y. (1959). On the notion of entropy for a dynamic system. Dokl. Akad. Nauk SSSR 124 (4), 768–771.
[126] Sohn, J., N. S. Kim, and W. Sung (1999, Jan). A statistical model-based voice activity detection. IEEE Signal Processing Letters (1), 1–3.
[127] Sphinx, C. Cmu sphinx.
[128] Stegmann, J. and G. Schroder (1997). Robust voice-activity detection based on the wavelet transform. In Speech Coding For Telecommunications Proceeding, 1997, 1997 IEEE Workshop on, pp. 99–100. IEEE.
[129] Takens, F. (1981a). Detecting strange attractors in turbulence, pp. 366–381. Berlin, Heidelberg: Springer Berlin Heidelberg.
[130] Takens, F. (1981b). Detecting strange attractors in turbulence, pp. 366–381. Berlin, Heidelberg: Springer Berlin Heidelberg.
[131] Teager, H. M. and S. M. Teager (1990). Evidence for Nonlinear Sound Production Mechanisms in the Vocal Tract, pp. 241–261. Dordrecht: Springer Netherlands.
[132] Thiel, M., M. C. Romano, P. L. Read, and J. Kurths (2004). Estimation of dynamical invariants without embedding by recurrence plots. Chaos 14 (2), 234–243.
[133] Tucker, R. (1992, Aug). Voice activity detection using a periodicity measure. IEE Proceedings I - Communications, Speech and Vision 139 (4), 377–380.
[134] Velichko, V. and N. Zagoruyko (1970). Automatic recognition of 200 words. International Journal of Man-Machine Studies (3), 223–234.
[135] VietVoice. Vietvoice.
[136] Voss, R. F. and J. Clarke (1976, Jan). Flicker (1/f) noise: Equilibrium temperature and resistance fluctuations. Phys. Rev. B 13, 556–573.
[137] VSpeech. Vspeech.
[138] Webber, J., L. Charles, and N. Marwan (2015). Recurrence Quantification Analysis: Theory and Best Practices. Springer International Publishing.
[139] Ye, J., R. J. Povinelli, and M. T. Johnson (2002, Oct). Phoneme classification using naive bayes classifier in reconstructed phase space. In Digital Signal Processing Workshop, 2002 and the 2nd Signal Processing Education Workshop. Proceedings of 2002 IEEE 10th, pp. 37–40.
[140] Yegnanarayana, B. (1996, 02). On timing in time-frequency analysis of speech signals. Sadhana 21, 5–20.
[141] Ying, G. S., C. D. Mitchell, and L. H. Jamieson (1993a, April). Endpoint detection of isolated utterances based on a modified teager energy measurement. In Acoustics, Speech, and Signal Processing, 1993. ICASSP-93., 1993 IEEE International Conference on, Volume 2, pp. 732–735 vol.2.
[142] Ying, G. S., C. D. Mitchell, and L. H. Jamieson (1993b, April). Endpoint detection of isolated utterances based on a modified teager energy measurement. In Acoustics, Speech, and Signal Processing, 1993. ICASSP-93., 1993 IEEE International Conference on, Volume 2, pp. 732–735 vol.2.
[143] Young, S. J. and S. Young (1993). The HTK hidden Markov model toolkit: Design and
philosophy. University of Cambridge, Department of Engineering.

... identify the frames that contain speech by means of the recognition algorithm. 1.6 Voice activity detection (VAD). Voice activity detection is a task in speech-processing applications such as speech coding and speech recognition ...

... voice signal. • The signal-processing steps detect the portions of the signal that contain speech and those that do not. A VAD analysis method based on the mean differential envelope of the speech signal is proposed. The analysis is performed ...

Format
Pages	134
File size	7.03 MB