Complex-valued Gaussian process regression for speech separation (summary)

DOCUMENT INFORMATION

Basic information

Format
Pages	14
Size	3.71 MB

Content

NATIONAL CENTRAL UNIVERSITY
Department of Computer Science
Master Thesis

Complex-valued Gaussian Process Regression for speech separation

Student: Le Dinh Nguyen
Advisor: Professor Jia-Ching Wang
Republic of China year 106 (2017)

Abstract

Speech separation is a challenging signal processing task that plays a significant role in improving the accuracy of various real-world applications, such as speech recognition systems and telecommunication. Its main goal is to isolate or estimate the target voice of each speaker from a mixture of several speakers talking at the same time. Because speech signals collected in natural environments are frequently corrupted by noise, speech separation has been an attractive research topic for several decades. The Gaussian process (GP) is a flexible kernel-based learning method that has found widespread application in signal processing. In this thesis, a supervised method is proposed for the speech separation problem. We focus on modeling the nonlinear mapping between mixed and clean speech with GP regression, in which the reconstructed audio signal is estimated by the predictive mean of the GP model. The nonlinear conjugate gradient method is used to optimize the hyper-parameters. An experiment on a subset of the TIMIT speech dataset is carried out to confirm the validity of the proposed approach.

Acknowledgements

The work presented in this thesis was carried out at the Department of Computer Science and Information Engineering at National Central University, Taiwan, during the years 2015-2017.

First of all, I wish to express my deepest gratitude to my research advisor, Professor Jia-Ching Wang, for guiding and encouraging me in my research. That the thesis was finished at all is in great part due to his endless enthusiasm for talking about my work. I also especially thank Ms. Sih-Huei Chen. She greatly supported me on the theoretical side and helped me take my initial thesis proposal and develop it into a true body of work, resulting in several conference and workshop papers together.

I would like to thank the students in the laboratory for many interesting discussions, for their various help, and for making life at the laboratory so enjoyable. In particular, I would like to thank Ms. Sih-Huei Chen for discussing and collaborating on the research, and Mr. Tuan Pham for helping me become familiar with source separation.

The financial support provided by the National Central University fellowship program and by my advisor, Professor Jia-Ching Wang, is gratefully acknowledged.

In addition, I wish to thank my family for their support in all my efforts.

Table of Contents

Chapter 1 Introduction
  1.1 Motivation
  1.2 Aim and Objective
  1.3 Thesis Overview
Chapter 2 Background knowledge
  2.1 Gaussian Process
    2.1.1 Introduction
    2.1.2 Covariance functions
    2.1.3 Optimization of hyper-parameters
  2.2 Short-time Fourier transform
    2.2.1 Introduction
    2.2.2 Spectrogram of STFT
    2.2.3 Inverse short-time Fourier transform
  2.3 Overlap-add method
  2.4 Complex-valued Derivatives
    2.4.1 Differentiating complex exponentials of a real parameter
      2.4.1.1 Differentiating complex exponentials
    2.4.2 Differentiating function of a complex parameter
Chapter 3 Employed systems
  3.1 System overview
    3.1.1 Real-valued GP-based system for source separation
    3.1.2 Complex-valued GP-based system for source separation
  3.2 GP regression-based source separation
    3.2.1 Real-valued GPR-based source separation
    3.2.2 Complex-valued GPR-based source separation
Chapter 4 Experiments
  4.1 Real-valued GP regression-based model for source separation
  4.2 Complex-valued GP regression-based model for speech enhancement
Chapter 5 Conclusions and future work
Bibliography

List of Figures

Figure 1.1 Cocktail party problem
Figure 1.2 An example of single-channel source separation
Figure 2.1 GP model for regression
Figure 2.2 GP model for regression
Figure 2.3 Windows overlapping
Figure 2.4 STFT of signal
Figure 2.5 (2-D) presentation of a spectrogram
Figure 2.6 ISTFT process
Figure 2.7 A general diagram of OLA analysis and synthesis system
Figure 2.8 Linear convolution
Figure 2.9 OLA overview
Figure 2.10 An example of OLA
Figure 3.1 Real-valued GPR-based system
Figure 3.2 Complex-valued GPR-based system
Figure 4.1 Spectrograms of mixture, source and de-noised speech

List of Tables

Table 2.1 List of common kernel functions
Table 4.1 Source separation performance using 512-point STFT
Table 4.2 Source separation performance using 1024-point STFT
Table 4.3 SNR and SegSNR in dB averaged over the white noise
Table 4.4 SNR and SegSNR in dB averaged over the babble noise

List of symbols and abbreviations

Symbols

∪	→ Joint distribution
f*	→ Predictive mean
x*	→ Test input
cov(f*)	→ Predictive covariance
l_d	→ Characteristic length-scale
σ	→ Variance
θ	→ Set of hyper-parameters
I	→ Identity matrix
∂	→ Derivative function
z	→ Complex number
zR	→ Real part of z
zI	→ Imaginary part of z
∇	→ Gradient

Abbreviations

DNN	→ Deep neural networks
GP	→ Gaussian process
GPR	→ Gaussian process regression
NMF	→ Nonnegative matrix factorization
SCSS	→ Single-channel speech separation
STFT	→ Short-time Fourier transform
DFT	→ Discrete Fourier transform
STFTM	→ STFT magnitude
FT	→ Fourier transform
FFT	→ Fast Fourier transform
iSTFT	→ Inverse short-time Fourier transform
iFFT	→ Inverse fast Fourier transform
SDR	→ Source-to-distortion ratio
SAR	→ Source-to-artifacts ratio
SIR	→ Source-to-interference ratio
SNR	→ Signal-to-noise ratio
SegSNR	→ Segmental signal-to-noise ratio
i.i.d.	→ Independent and identically distributed
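
As an illustration of the idea stated in the abstract (the output is estimated by the predictive mean of a GP model), the following is a minimal real-valued sketch, not the thesis implementation. The squared-exponential kernel, the function names rbf_kernel and gp_predictive_mean, the fixed hyper-parameter values, and the toy sine data are all assumptions made for this example; the thesis optimizes its hyper-parameters with a nonlinear conjugate gradient method and works with speech features, neither of which is shown here.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between the rows of a and b (assumed kernel)."""
    sq_dist = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale ** 2)


def gp_predictive_mean(x_train, y_train, x_test,
                       length_scale=1.0, signal_var=1.0, noise_var=1e-4):
    """GP regression predictive mean: K(x*, X) [K(X, X) + noise_var * I]^-1 y."""
    k = rbf_kernel(x_train, x_train, length_scale, signal_var)
    k += noise_var * np.eye(len(x_train))
    k_star = rbf_kernel(x_test, x_train, length_scale, signal_var)
    # Solve through a Cholesky factor instead of forming the inverse explicitly.
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    return k_star @ alpha


# Toy data (hypothetical): recover a sine wave from its own samples.
x_train = np.linspace(0.0, 2.0 * np.pi, 40)[:, None]
y_train = np.sin(x_train).ravel()
mean = gp_predictive_mean(x_train, y_train, x_train)
print(np.max(np.abs(mean - y_train)))
```

In a separation setting the training inputs would be features of the mixed speech and the targets the corresponding clean-speech features; the hyper-parameters are kept fixed here only to keep the sketch short.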

Date posted: 06/12/2018, 11:25