Audio source separation using a generic source spectral model based on nonnegative matrix factorization

MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

DUONG THI HIEN THANH

AUDIO SOURCE SEPARATION EXPLOITING NMF-BASED GENERIC SOURCE SPECTRAL MODEL

Major: Computer Science
Code: 9480101

DOCTORAL DISSERTATION OF COMPUTER SCIENCE

SUPERVISORS:
ASSOC. PROF. DR. NGUYEN QUOC CUONG
DR. NGUYEN CONG PHUONG

Hanoi - 2019

DECLARATION OF AUTHORSHIP

I, Duong Thi Hien Thanh, hereby declare that this thesis is my original work and that it has been written by me in its entirety. I confirm that:

• This work was done wholly during candidature for a Ph.D. research degree at Hanoi University of Science and Technology.
• Where any part of this thesis has previously been submitted for a degree or any other qualification at Hanoi University of Science and Technology or any other institution, this has been clearly stated.
• Where I have consulted the published work of others, this is always clearly attributed.
• Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.
• I have acknowledged all main sources of help.
• Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself.

Hanoi, February 2019
Ph.D. Student
Duong Thi Hien Thanh

SUPERVISORS
Assoc. Prof. Dr. Nguyen Quoc Cuong
Dr. Nguyen Cong Phuong

ACKNOWLEDGEMENT

This thesis was written during my doctoral study at the International Research Institute Multimedia, Information, Communication, and Applications (MICA), Hanoi University of Science and Technology (HUST). It is my great pleasure to thank the many people who have contributed towards shaping this thesis.

First and foremost, I would like to express my most sincere gratitude to my supervisors, Assoc. Prof. Nguyen Quoc Cuong and Dr. Nguyen Cong Phuong, for their great guidance and support throughout my Ph.D. study. I am grateful to them for devoting their precious time to discussing research ideas, proofreading, and explaining how to write good research papers. I would like to thank them for encouraging my research and empowering me to grow as a research scientist. I could not have imagined having better advisors and mentors for my Ph.D. study.

I would like to express my appreciation to my supervisor in my Master's course, Prof. Nguyen Thanh Thuy, School of Information and Communication Technology, HUST, and to Dr. Nguyen Vu Quoc Hung, my supervisor in my Bachelor's course at Hanoi National University of Education. They shaped my knowledge for excelling in my studies.

In the process of carrying out and completing my research, I have received much support from the MICA board of directors and my colleagues in the Speech Communication department. In particular, I am very thankful to Prof. Pham Thi Ngoc Yen, Prof. Eric Castelli, Dr. Nguyen Viet Son, and Dr. Dao Trung Kien, who gave me the opportunity to join the research work at the MICA institute and to access its laboratory and research facilities. Without their precious support, it would have been impossible to conduct this research. My warm thanks go to my colleagues in the Speech Communication department of the MICA institute for their useful comments on my study and their unconditional support over four years, both at work and outside of work.

I am very grateful to my internship supervisor, Prof. Nobutaka Ono, and the members of Ono’s Lab at the National Institute of Informatics, Japan, for warmly welcoming me into their lab and for the helpful research collaboration they offered. I greatly appreciate his help in funding my conference trip and introducing me to the signal processing research communities. I would also like to thank Dr. Toshiya Ohshima, MSc. Yasutaka Nakajima, MSc. Chiho Haruta, and the other researchers at Rion Co., Ltd., Japan, for welcoming me to their company and providing me with data for my experiments.

I would also like to sincerely thank Dr. Nguyen Quang Khanh, dean of the Information Technology Faculty, and Assoc. Prof. Le Thanh Hue, dean of the Economic Informatics Department, at Hanoi University of Mining and Geology (HUMG), where I am working. I have received financial support and time from my office and its leaders to complete my doctoral thesis. Grateful thanks also go to my wonderful colleagues and friends Nguyen Thu Hang, Pham Thi Nguyet, Vu Thi Kim Lien, Vo Thi Thu Trang, Pham Quang Hien, Nguyen The Binh, Nguyen Thuy Duong, Nong Thi Oanh, and Nguyen Thi Hai Yen, who have given me unconditional support and help over a long time. Special thanks go to Dr. Le Hong Anh for his encouragement and precious advice.

Last but not least, I would like to express my deepest gratitude to my family. I am very grateful to my mother-in-law and father-in-law for their support in times of need and for always allowing me to focus on my work. I dedicate this thesis to my mother and father with special love; they have been great mentors in my life and have constantly encouraged me to be a better person. The struggle and sacrifice of my parents always motivate me to work hard in my studies. I would also like to express my love to my younger sisters and younger brother for their encouragement and help. This work has become more wonderful because of the love and affection that they have provided.

A special love goes to my beloved husband, Tran Thanh Huan, for his patience and understanding, and for always being there for me to share the good and bad times. I also appreciate my sons, Tran Tuan Quang and Tran Tuan Linh, for always cheering me up with their smiles. Without their love, this thesis would not have been completed.

Thank you all!

Hanoi, February 2019
Ph.D. Student
Duong Thi Hien Thanh

CONTENTS

DECLARATION OF AUTHORSHIP
ACKNOWLEDGEMENT
CONTENTS
NOTATIONS AND GLOSSARY
LIST OF TABLES
LIST OF FIGURES
INTRODUCTION

Chapter 1. AUDIO SOURCE SEPARATION: FORMULATION AND STATE OF THE ART
1.1 Audio source separation: a solution for the cocktail party problem
    1.1.1 General framework for source separation
    1.1.2 Problem formulation
1.2 State of the art
    1.2.1 Spectral models
        1.2.1.1 Gaussian Mixture Model
        1.2.1.2 Nonnegative Matrix Factorization
        1.2.1.3 Deep Neural Networks
    1.2.2 Spatial models
        1.2.2.1 Interchannel Intensity/Time Difference (IID/ITD)
        1.2.2.2 Rank-1 covariance matrix
        1.2.2.3 Full-rank spatial covariance model
1.3 Source separation performance evaluation
    1.3.1 Energy-based criteria
    1.3.2 Perceptually-based criteria
1.4 Summary

Chapter 2. NONNEGATIVE MATRIX FACTORIZATION
2.1 NMF introduction
    2.1.1 NMF in a nutshell
    2.1.2 Cost function for parameter estimation
    2.1.3 Multiplicative update rules
2.2 Application of NMF to audio source separation
    2.2.1 Audio spectra decomposition
    2.2.2 NMF-based audio source separation
2.3 Proposed application of NMF to unusual sound detection
    2.3.1 Problem formulation
    2.3.2 Proposed methods for non-stationary frame detection
        2.3.2.1 Signal energy based method
        2.3.2.2 Global NMF-based method
        2.3.2.3 Local NMF-based method
    2.3.3 Experiment
        2.3.3.1 Dataset
        2.3.3.2 Algorithm settings and evaluation metrics
        2.3.3.3 Results and discussion
2.4 Summary

Chapter 3. SINGLE-CHANNEL AUDIO SOURCE SEPARATION EXPLOITING NMF-BASED GENERIC SOURCE SPECTRAL MODEL WITH MIXED GROUP SPARSITY CONSTRAINT
3.1 General workflow of the proposed approach
3.2 GSSM formulation
3.3 Model fitting with sparsity-inducing penalties
    3.3.1 Block sparsity-inducing penalty
    3.3.2 Component sparsity-inducing penalty
    3.3.3 Proposed mixed sparsity-inducing penalty
3.4 Derived algorithm in unsupervised case
3.5 Derived algorithm in semi-supervised case
    3.5.1 Semi-GSSM formulation
    3.5.2 Model fitting with mixed sparsity and algorithm
3.6 Experiment
    3.6.1 Experiment data
        3.6.1.1 Synthetic dataset
        3.6.1.2 SiSEC-MUS dataset
        3.6.1.3 SiSEC-BNG dataset
    3.6.2 Single-channel source separation performance with unsupervised setting
        3.6.2.1 Experiment settings
        3.6.2.2 Evaluation method
        3.6.2.3 Results and discussion
    3.6.3 Single-channel source separation performance with semi-supervised setting
        3.6.3.1 Experiment settings
        3.6.3.2 Evaluation method
        3.6.3.3 Results and discussion
3.7 Summary

Chapter 4. MULTICHANNEL AUDIO SOURCE SEPARATION EXPLOITING NMF-BASED GSSM IN GAUSSIAN MODELING FRAMEWORK
4.1 Formulation and modeling
    4.1.1 Local Gaussian model
    4.1.2 NMF-based source variance model
    4.1.3 Estimation of the model parameters
4.2 Proposed GSSM-based multichannel approach
    4.2.1 GSSM construction
    4.2.2 Proposed source variance fitting criteria
        4.2.2.1 Source variance denoising
        4.2.2.2 Source variance separation
    4.2.3 Derivation of MU rule for updating the activation matrix
    4.2.4 Derived algorithm
4.3 Experiment
    4.3.1 Dataset and parameter settings
    4.3.2 Algorithm analysis
        4.3.2.1 Algorithm convergence: separation results as functions of EM and MU iterations
        4.3.2.2 Separation results with different choices of λ and γ
    4.3.3 Comparison with the state of the art
4.4 Summary

CONCLUSIONS AND PERSPECTIVES
BIBLIOGRAPHY
LIST OF PUBLICATIONS

NOTATIONS AND GLOSSARY

Standard mathematical symbols
$\mathbb{C}$    Set of complex numbers
$\mathbb{R}$    Set of real numbers
$\mathbb{Z}$    Set of integers
$\mathbb{E}$    Expectation of a random variable
$\mathcal{N}_c$    Complex Gaussian distribution

Vectors and matrices
$a$    Scalar
$\mathbf{a}$    Vector
$\mathbf{A}$    Matrix
$\mathbf{A}^T$    Matrix transpose
$\mathbf{A}^H$    Matrix conjugate transpose (Hermitian conjugation)
$\mathrm{diag}(\mathbf{a})$    Diagonal matrix with $\mathbf{a}$ as its diagonal
det(A) Determinant of matrix A tr(A) Matrix trace A The element-wise Hadamard product of two matrices (of the same dimension) B with elements [A A (n) a A 1 B]ij = Aij Bij (n) The matrix with entries [A]ij -norm of vector -norm of matrix Indices f Frequency index i Channel index j Source index n Time frame index t Time sample index viii IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(7):1462– 1476 [39] Fan, H.-T., Hung, J.-w., Lu, X., Wang, S.-S., and Tsao, Y (2014) Speech enhancement using segmental nonnegative matrix factorization In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4483– 4487 [40] Févotte, C., Bertin, N., and Durrieu, J L (2009) Nonnegative matrix factorization with the itakura-saito divergence: With application to music analysis Neural Computation, 21(3):793–830 [41] Févotte, C., Gribonval, R., and Vincent, E (2005) BSS EVAL Toolbox User Guide – Revision 2.0 Technical report Developed with the support of the French GdR-ISIS/CNRS Workgroup “Resources for Audio Source Separation” [42] Févotte, C and Idier, J (2011) Algorithms for nonnegative matrix factorization with the beta-divergence Neural Computation, 23(9):2421–2456 [43] Févotte, C., Vincent, E., and Ozerov, A (2017) Single-channel audio source separation with NMF: divergences, constraints and algorithms In Audio Source Separation Springer [44] Fitzgerald, D (2012) User assisted separation using tensor factorisations 2012 Proceedings of the 20th European Signal Processing Conference (EUSIPCO), pages 2412–2416 [45] Fox, B., Sabin, A., Pardo, B., and Zopf, A (2007) Modeling perceptual similarity of audio signals for blind source separation evaluation In Independent Component Analysis and Signal Separation - 7th International Conference, ICA 2007, Proceedings, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pages 454–461 [46] Fritsch, J and Plumbley, 
M (2013) Score informed audio source separation using constrained nonnegative matrix factorization and score synthesis In IEEE Int Conf on Acoustics, Speech, and Signal Processing (ICASSP), pages 888–891 [47] Gannot, S., Vincent, E., Markovich-Golan, S., and Ozerov, A (2017) A consolidated perspective on multimicrophone speech enhancement and source separation 100 IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(4):692– 730 [48] Gerber, T., Dutasta, M., Girin, L., and Févotte, C (2012) Professionally- produced music separation guided by covers In International Society for Music Information Retrieval Conference (ISMIR 2012), pages 85–90, Porto, Portugal [49] Gribonval, R., Vincent, E., Févotte, C., and Benaroya, L (2003) Proposals for performance measurement in source separation In 4th Int Symp on Independent Component Analysis and Blind Signal Separation (ICA2003), pages 763–768 [50] Gustafsson, T., Rao, B., and Trivedi, M (2003) Source localization in reverberant environments: modeling and statistical analysis IEEE Transactions on Speech and Audio Processing, 11(6):791–803 [51] Hennequin, R., David, B., and Badeau, R (2011) Score informed audio source separation using a parametric model of non-negative spectrogram In Proc IEEE Int Conf on Acoustics, Speech and Signal Processing (ICASSP), pages 45–48 [52] Hershey, J R., Chen, Z., Roux, J L., and Watanabe, S (2016) Deep clustering: Discriminative embeddings for segmentation and separation In Proc IEEE Int Conf on Acoustics, Speech and Signal Processing (ICASSP), pages 31–35 [53] Heymann, J., Drude, L., and Haeb-Umbach, R (2017) A generic neural acoustic beamforming architecture for robust multi-channel speech processing Computer Speech & Language, 46:374 – 385 [54] Houda, A and Otman, C (2015) Article: Blind audio source separation: Stateof-art International Journal of Computer Applications, 130(4):1–6 Published by Foundation of Computer Science (FCS), NY, USA [55] Huang, A (2013) NMF Face 
Recognition Method Based on Alpha Divergence In Zhong, Z., editor, Proceedings of the International Conference on Information Engineering and Applications (IEA) 2012, volume 217, pages 477–483 Springer London [56] Huang, P., Kim, M., Hasegawa-Johnson, M., and Smaragdis, P (2015) Joint optimization of masks and deep recurrent neural networks for monaural source separation IEEE/ACM Trans Audio, Speech & Language Processing, 23(12):2136–2147 101 [57] Huber, R and Kollmeier, B (2006) PEMO-Q - A new method for objective audio quality assessment using a model of auditory perception IEEE Transactions on Audio, Speech, and Language Processing, 14(6):1902–1911 [58] Hurmalainen, A., Saeidi, R., and Virtanen, T (2012) Group sparsity for speaker identity discrimination in factorisation-based speech recognition In Proc Interspeech, pages 17–20 [59] Ito, N., Araki, S., and Nakatani, T (2013) Permutation-free convolutive blind source separation via full-band clustering based on frequency-independent source presence priors In Proc IEEE Int Conf on Acoustics, Speech, and Signal Processing (ICASSP), pages 3238–3242 [60] Izumi, Y., Ono, N., and Sagayama, S (2007) Sparseness-Based 2ch BSS using the EM Algorithm in Reverberant Environment In Proc IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pages 147–150 [61] Jeter, M and Pye, W (1981) A note on nonnegative rank factorizations Linear Algebra and its Applications, 38:171–173 [62] Jiang, Y., Wang, D., Liu, R., and Feng, Z (2014) Binaural classification for reverberant speech segregation using deep neural networks IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(12):2112–2121 [63] Jourjine, A., Rickard, S., and Yılmaz, O (2000) Blind separation of disjoint orthogonal signals: Demixing N sources from mixtures In Proc IEEE Int Conf on Acoustics, Speech, and Signal Processing (ICASSP), pages 2985–2988 [64] Kim, G and Loizou, P (2010) Improving speech intelligibility in noise using 
environment-optimized algorithms IEEE Trans Audio, Speech, Language Processing, 18(8):2080–2090 [65] Kinoshita, K., Delcroix, M., Yoshioka, T., Nakatani, T., Sehr, A., Kellermann, W., and Maas, R (2013) The reverb challenge: A common evaluation framework for dereverberation and recognition of reverberant speech In Proc IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pages 1–4, NY, USA [66] Kitamura, D., Ono, N., Sawada, H., Kameoka, H., and Saruwatari, H (2016a) Determined Blind Source Separation Unifying Independent Vector Analysis and 102 Nonnegative Matrix Factorization IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(9):1626–1641 [67] Kitamura, D., Ono, N., Sawada, H., Kameoka, H., and Saruwatari, H (2016b) Determined blind source separation unifying independent vector analysis and nonnegative matrix factorization IEEE/ACM Trans on Audio, Speech and Language Processing, 24(9):1622–1637 [68] Kompass, R (2007) A generalized divergence measure for nonnegative matrix factorization Neural Computation, 19(3):780–791 [69] Kuttruff, H (2000) Room Acoustics Spon Press, New York, 4rd edition edition [70] Lafay, G., Benetos, E., and Lagrange, M (2017) Sound event detection in synthetic audio: Analysis of the dcase 2016 task results In 2016 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pages 11–15 [71] Le Magoarou, L., Ozerov, A., and Duong, N Q K (2013) Text-informed audio source separation using nonnegative matrix partial co-factorization In IEEE International Workshop on Machine Learning for Signal Processing (MLSP), pages 1–6 [72] Lee, D D and Seung, H S (1999) Learning the parts of objects by non-negative matrix factorization Nature, 401 6755:788–91 [73] Lee, D D and Seung, H S (2001) Algorithms for non-negative matrix factorization In Advances in Neural and Information Processing Systems 13, pages 556–562 [74] Lefèvre, A., Bach, F., and Févotte, C (2011) Itakura-Saito non-negative 
matrix factorization with group sparsity In IEEE Int Conf on Acoustics, Speech, and Signal Processing (ICASSP), pages 21–24 [75] Leglaive, S., S¸ims¸ekli, U., Liutkus, A., Badeau, R., and Richard, G (2017) Alpha-stable multichannel audio source separation In IEEE Int Conf on Acoustics, Speech, and Signal Processing (ICASSP), pages 576–580 [76] Li, Y and Ngom, A (2013) The non-negative matrix factorization toolbox for biological data mining Source Code for Biology and Medicine, 8(1):1–10 103 [77] Liutkus, A., Badeau, R., and Richard, G (2011) Gaussian Processes for Underdetermined Source Separation IEEE Transactions on Signal Processing, 59(7):3155–3167 [78] Liutkus, A., Durrieu, J L., Daudet, L., and Richard, G (2013) An overview of informed audio source separation In Proc IEEE Int Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS), pages 1–4 [79] Liutkus, A., Fitzgerald, D., and Rafii, Z (2015) Scalable audio separation with light kernel additive modelling In Proc IEEE Int Conf on Acoustics, Speech and Signal Processing (ICASSP), pages 76–80 [80] Liutkus, A., Stăoter, F R., Rafii, Z., Kitamura, D., Rivet, B., Ito, N., Ono, N., and Fontecave, J (2017a) The 2016 signal separation evaluation campaign In Proc Int Conf on Latent Variable Analysis and Signal Separation, pages 323–332 [81] Liutkus, A., Stter, F.-R., Rafii, Z., Kitamura, D., Rivet, B., Ito, N., Ono, N., and Fontecave, J (2017b) The 2016 Signal Separation Evaluation Campaign In Latent Variable Analysis and Signal Separation, volume 10169, pages 323–332 Springer International Publishing, Cham [82] Lopez, A R., Ono, N., Remes, U., Palomăaki, K., and Kurimo, M (2015) Designing multichannel source separation based on single-channel source separation In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 469–473 [83] Magoarou, L L., Ozerov, A., and Duong, N Q K (2014) Text-informed audio source separation example-based approach using non-negative 
matrix partial cofactorization Journal of Signal Processing Systems, pages 1–5 [84] Magron, P., Badeau, R., and Liutkus, A (2017) Lévy NMF for robust nonnegative source separation In Proc IEEE Int Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pages 259–263 [85] Makino, S., Lee, T.-W., and Sawada, H (2007) Blind Speech Separation Springer [86] Mandel, M I., Weiss, R J., and Ellis, D P W (2010) Model-based expectationmaximization source separation and localization IEEE Transactions on Audio, Speech, and Language Processing, 18(2):382–394 104 [87] McCowan, I and Bourlard, H (2003) Microphone array post-filter based on noise field coherence IEEE Transactions on Speech and Audio Processing, 11(6):709–716 [88] Mesaros, A., Heittola, T., Benetos, E., Foster, P., Lagrange, M., Virtanen, T., and Plumbley, M D (2018) Detection and Classification of Acoustic Scenes and Events: Outcome of the DCASE 2016 Challenge IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(2):379–393 [89] Mohammadiha, N., Smaragdis, P., and Leijon, A (2013) Supervised and Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization IEEE Transactions on Audio, Speech, and Language Processing, 21(10):2140–2151 [90] Naik, G R and Wang, W., editors (2014) Blind source separation: advances in theory, algorithms and applications Signals and communication technology Springer, Berlin [91] Nakajima, Y., Sunohara, M., Naito, T., Sunago, N., Ohshima, T., and Ono, N (2016) DNN-based environmental sound recognition with real-recorded and artificially-mixed training data [92] Naylor, P A and Gaubitch, N D., editors (2010) Speech Dereverberation Signals and Communication Technology Springer London [93] Nesta, F and Omologo, M (2012) Generalized State Coherence Transform for Multidimensional TDOA Estimation of Multiple Sources IEEE Transactions on Audio, Speech, and Language Processing, 20(1):246–260 [94] Nikunen, J and Virtanen, T (2014) Direction of arrival 
based spatial covariance model for blind sound source separation IEEE/ACM Trans on Audio, Speech, and Language Processing, 22(3):727–739 [95] Nix, J and Hohmann, V (2007) Combined Estimation of Spectral Envelopes and Sound Source Direction of Concurrent Voices by Multidimensional Statistical Filtering IEEE Transactions on Audio, Speech and Language Processing, 15(3):995– 1008 [96] Nugraha, A., Liutkus, A., and Vincent, E (2016) Multichannel audio source separation with deep neural networks IEEE/ACM Transactions on Audio, Speech, and Language Processing, 14(9):1652–1664 105 [97] O’Grady, P D., Pearlmutter, B A., and Rickard, S T (2005) Survey of sparse and non-sparse methods in source separation International Journal of Imaging Systems and Technology, 15(1):18–33 [98] Ono, N., Koldovsk, Z., Miyabe, S., and Ito, N (2013a) The 2013 Signal Separation Evaluation Campaign In 2013 IEEE International Workshop on Machine Learning for Signal Processing (MLSP), pages 1–6 [99] Ono, N., Koldovsk, Z., Miyabe, S., and Ito, N (2013b) The 2013 Signal Separation Evaluation Campaign In Proc IEEE International Workshop on Machine Learning for Signal Processing (MLSP), pages 1–6 [100] Ono, N., Rafii, Z., Kitamura, D., Ito, N., and Liutkus, A (2015a) The 2015 Signal Separation Evaluation Campaign In Latent Variable Analysis and Signal Separation, volume 9237, pages 387–395 Springer International Publishing, Cham [101] Ono, N., Rafii, Z., Kitamura, D., Ito, N., and Liutkus, A (2015b) The 2015 Signal Separation Evaluation Campaign In Latent Variable Analysis and Signal Separation (LVAICA), volume 9237, pages 387–395 Springer [102] Ozerov, A and Fevotte, C (2010) Multichannel Nonnegative Matrix Factorization in Convolutive Mixtures for Audio Source Separation IEEE Transactions on Audio, Speech, and Language Processing, 18(3):550–563 [103] Ozerov, A and Févotte, C (2010) Multichannel nonnegative matrix factorization in convolutive mixtures for audio source separation IEEE Trans on Audio, 
Speech and Language Processing, 18(3):550–563 [104] Ozerov, A., Fevotte, C., Blouet, R., and Durrieu, J.-L (2011) Multichannel nonnegative tensor factorization with structured constraints for user-guided audio source separation In Proc IEEE Int Conf on Acoustics, Speech, and Signal Processing (ICASSP), pages 257–260 [105] Ozerov, A., Févotte, C., and Vincent, E (2017) An introduction to multichannel NMF for audio source separation In Audio Source Separation, Signals and Communication Technology Springer [106] Ozerov, A., Philippe, P., Bimbot, F., and Gribonval, R (2007) Adaptation of Bayesian Models for Single-Channel Source Separation and its Application to 106 Voice/Music Separation in Popular Songs IEEE Transactions on Audio, Speech and Language Processing, 15(5):1564–1578 [107] Ozerov, A., Vincent, E., and Bimbot, F (2012) A general flexible framework for the handling of prior information in audio source separation IEEE Transactions on Audio, Speech, and Language Processing, 20(4):1118–1133 [108] Paatero, P (1997) Least squares formulation of robust non-negative factor analysis Chemometrics and Intelligent Laboratory Systems, 37(1):23–35 [109] Paatero, P and Tapper, U (1994) Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values Environmetrics, 5(2):111–126 [110] Parekh, S., Essid, S., Ozerov, A., Duong, N Q K., Perez, P., and Richard, G (2017) Motion informed audio source separation In Proc IEEE Int Conf on Acoustics, Speech and Signal Processing (ICASSP), pages 6–10 [111] Parvaix, M and Girin, L (2011) Informed Source Separation of Linear Instantaneous Under-Determined Audio Mixtures by Source Index Embedding IEEE Transactions on Audio, Speech, and Language Processing, 19(6):1721–1733 [112] Pedersen, M S., Larsen, J., Kjems, U., and Parra, L C (2007) A survey of convolutive blind source separation methods In Springer Handbook of Speech Processing, pages 1–34 Springer [113] Quirs, A and Wilson, S P 
(2012) Dependent Gaussian mixture models for source separation EURASIP Journal on Advances in Signal Processing, 2012(1) [114] Rafii, Z and Pardo, B (2013) Online REPET-SIM for real-time speech enhancement In Proc IEEE Int Conf on Acoustics, Speech, and Signal Processing (ICASSP), pages 848–852 [115] Rennie, S J., Hershey, J R., and Olsen, P A (2008) Efficient model-based speech separation and denoising using nonnegative subspace analysis In In: Proc of ICASSP Las Vegas, pages 1833–1836 [116] Revit, L J and Schulein, R B (2013) Sound reproduction method and apparatus for assessing real-world performance of hearing and hearing aids The Journal of the Acoustical Society of America, 133(2):1196–1199 107 [117] Reynolds, D A., Quatieri, T F., and Dunn, R B Speaker verification using adapted gaussian mixture models Digital Signal Processing, 10(1):19–41 [118] Roy, R and Kailath, T (1989) Esprit-estimation of signal parameters via rotational invariance techniques IEEE/ACM Transactions on Audio, Speech, and Language Processing, 37(7):984–995 [119] Sainath, T N., Weiss, R J., Wilson, K W., Li, B., Narayanan, A., Variani, E., Bacchiani, M., Shafran, I., Senior, A., Chin, K., Misra, A., and Kim, C (2017) Multichannel Signal Processing With Deep Neural Networks for Automatic Speech Recognition IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(5):965–979 [120] Sandler, R and Lindenbaum, M (2011) Nonnegative matrix factorization with earth mover’s distance metric for image analysis IEEE Trans Pattern Anal Mach Intell., 33(8):1590–1602 [121] Sawada, H., Araki, S., and Makino, S (2011) Underdetermined convolutive blind source separation via frequency bin-wise clustering and permutation alignment IEEE Transactions on Audio, Speech, and Language Processing, 19(3):516– 527 [122] Sawada, H., Kameoka, H., Araki, S., and Ueda, N (2013) Multichannel Extensions of Non-Negative Matrix Factorization With Complex-Valued Data IEEE Transactions on Audio, Speech, and Language 
Processing, 21(5):971–982 [123] Smaragdis, P and Mysore, G J (2009) Separation by humming: User-guided sound extraction from monophonic mixtures In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pages 69–72 [124] Smaragdis, P., Raj, B., and Shashanka, M (2007) Supervised and semi- supervised separation of sounds from single-channel mixtures In Int Conf on Independent Component Analysis and Signal Separation (ICA), pages 414–421 [125] Smith, J O (2011) Spectral audio signal processing W3K Publishing [126] Souviraà-Labastie, N., Olivero, A., Vincent, E., and Bimbot, F (2015) Multichannel audio source separation using multiple deformed references IEEE/ACM Transactions on Audio, Speech and Language Processing, 23:1775–1787 108 [127] Sprechmann, P., Bronstein, A M., and Sapiro, G (2015) Supervised nonnegative matrix factorization for audio source separation In Excursions in Harmonic Analysis, Volume 4, pages 407–420 Springer International Publishing, Cham [128] Sun, D L and Mysore, G J (2013) Universal speech models for speaker independent single channel source separation In Proc IEEE Int Conf on Acoustics, Speech, and Signal Processing (ICASSP), pages 141–145 [129] Sunohara, M., Haruta, C., and Ono, N (2017) Low-latency real-time blind source separation for hearing aids based on time-domain implementation of online independent vector analysis with truncation of non-causal components In Proc IEEE Int Conf on Acoustics, Speech, and Signal Processing (ICASSP), pages 216– 220 [130] Tan, V Y F and Fevotte, C (2013) Automatic Relevance Determination in Nonnegative Matrix Factorization with the /spl beta/-Divergence IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(7):1592–1605 [131] Traa, J., Smaragdis, P., Stein, N D., and Wingate, D (2015) Directional nmf for joint source localization and separation In 2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pages 1–5 [132] Uhlich, S., 
Giron, F., and Mitsufuji, Y (2015) Deep neural network based instrument extraction from music In Proc IEEE Int Conf on Acoustics, Speech and Signal Processing (ICASSP), pages 2135–2139 [133] Vincent, E., Araki, S., and Bofill, P (2009) The 2008 signal separation evaluation campaign: A community-based approach to large-scale evaluation In Proc Int Conf on Independent Component Analysis and Signal Separation (ICA), pages 734–741 [134] Vincent, E., Araki, S., Theis, F., Nolte, G., Bofill, P., Sawada, H., Ozerov, A., Gowreesunker, V., Lutter, D., and Duong, N Q (2012) The signal separation evaluation campaign (2007 2010): Achievements and remaining challenges Signal Processing, 92(8):1928–1936 [135] Vincent, E., Barker, J., Watanabe, S., Le Roux, J., Nesta, F., and Matassoni, M (2013) The second ’chime’ speech separation and recognition challenge: Datasets, 109 tasks and baselines In IEEE Int Conf on Acoustics, Speech and Signal Processing (ICASSP), pages 126–130 [136] Vincent, E., Bertin, N., Gribonval, R., and Bimbot, F (2014) From Blind to Guided Audio Source Separation: How models and side information can improve the separation of sound IEEE Signal Processing Magazine, 31(3):107–115 [137] Vincent, E., Gribonval, R., and Fevotte, C (2006a) Performance measurement in blind audio source separation IEEE Transactions on Audio, Speech and Language Processing, 14(4):1462–1469 [138] Vincent, E., Jafari, M G., Abdallah, S A., Plumbley, M D., and Davies, M E (2010) Probabilistic modeling paradigms for audio source separation In In Machine Audition: Principles, Algorithms and Systems, pages 162–185 IGI Global [139] Vincent, E., Jafari, M G., and Plumbley, M D (2006b) Preliminary guidelines for subjective evalutation of audio source separation algorithms In UK ICA Research Network Workshop, Southampton, United Kingdom [140] Vincent, E., Sawada, H., Bofill, P., Makino, S., and Rosca, J P (2007) First Stereo Audio Source Separation Evaluation Campaign: Data, Algorithms and 
Results In Independent Component Analysis and Signal Separation, pages 552–559 Springer Berlin Heidelberg [141] Vincent, E., Virtanen, T., and Gannot, S., editors (2017) Audio Source Separation and Speech Enhancement Wiley [142] Virtanen, T (2007) Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria IEEE Transactions on Audio, Speech and Language Processing, 15(3):1066–1074 [143] Virtanen, T., Singh, R., and Raj, B., editors (2012) Techniques for noise robustness in automatic speech recognition Wiley, Chichester, West Sussex, U.K ; Hoboken, N.J [144] Wang, D (2017) Deep learning reinvents the hearing aid IEEE Spectrum, 54(3):32–37 [145] Wang, D and Brown, G J., editors (2006) Computational auditory scene analysis: principles, algorithms, and applications IEEE Press ; Wiley Interscience 110 [146] Wang, L., Ding, H., and Yin, F (2011) A region-growing permutation alignment approach in frequency-domain blind source separation of speech mixtures Trans Audio, Speech and Language Processing, 19(3):549–557 [147] Wang, Y.-X and Zhang, Y.-J (2013) Nonnegative Matrix Factorization: A Comprehensive Review IEEE Transactions on Knowledge and Data Engineering, 25(6):1336–1353 [148] Wang, Z.-Q., Roux, J L., and Hershey, J R (2018) Multi-channel deep clustering: Discriminative spectral and spatial embeddings for speaker-independent speech separation In Proc IEEE Int Conf on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5 [149] Weninger, F., Erdogan, H., Watanabe, S., Vincent, E., Le Roux, J., Hershey, J R., and Schuller, B (2015) Speech Enhancement with LSTM Recurrent Neural Networks and its Application to Noise-Robust ASR In Latent Variable Analysis and Signal Separation, volume 9237, pages 91–99 Springer International Publishing [150] Weninger, F., Hershey, J R., Le Roux, J., and Schuller, B (2014) Discriminatively trained recurrent neural networks for single-channel speech separation In IEEE Global 
Conference on Signal and Information Processing, pages 577–581.
[151] Winter, S., Kellermann, W., Sawada, H., and Makino, S. (2006). MAP-based underdetermined blind source separation of convolutive mixtures by hierarchical clustering and ℓ1-norm minimization. EURASIP Journal on Advances in Signal Processing, 2007(1):024717.
[152] Wölfel, M. and McDonough, J. (2009). Distant speech recognition. Wiley, Chichester, U.K.
[153] Wood, S. and Rouat, J. (2016). Blind speech separation with GCC-NMF. In Proc. Interspeech, pages 3329–3333.
[154] Wood, S. U. N., Rouat, J., Dupont, S., and Pironkov, G. (2017). Blind Speech Separation and Enhancement With GCC-NMF. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(4):745–755.
[155] Xiao, X., Watanabe, S., Erdogan, H., Lu, L., Hershey, J., Seltzer, M. L., Chen, G., Zhang, Y., Mandel, M., and Yu, D. (2016). Deep beamforming networks for multi-channel speech recognition. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5745–5749. IEEE.
[156] Yilmaz, Y. K., Cemgil, A. T., and Simsekli, U. (2011). Generalised coupled tensor factorisation. In Proceedings of the 24th International Conference on Neural Information Processing Systems, NIPS’11, pages 2151–2159, USA. Curran Associates Inc.
[157] Yu, D. and Deng, L. (2011). Deep Learning and Its Applications to Signal and Information Processing [Exploratory DSP]. IEEE Signal Processing Magazine, 28(1):145–154.
[158] Zdunek, R. (2011). Convolutive nonnegative matrix factorization with Markov random field smoothing for blind unmixing of multichannel speech recordings. In Proc. The 5th International Conference on Advances in Nonlinear Speech Processing, NOLISP’11, pages 25–32. Springer-Verlag.
[159] Zdunek, R. (2013). Improved Convolutive and Under-Determined Blind Audio Source Separation with MRF Smoothing. Cognitive Computation, 5(4):493–503.
[160] Zhang, Z.-Y. (2012). Nonnegative Matrix Factorization: Models, Algorithms and Applications. In Data Mining: Foundations and Intelligent
Paradigms, volume 24, pages 99–134. Springer Berlin Heidelberg.

LIST OF PUBLICATIONS

Hien-Thanh Thi Duong, Quoc-Cuong Nguyen, Cong-Phuong Nguyen, Thanh Huan Tran, and Ngoc Q. K. Duong (2015). Speech enhancement based on nonnegative matrix factorization with mixed group sparsity constraint. Proc. ACM International Symposium on Information and Communication Technology (SoICT 2015), pp. 247-251, Hue, Vietnam. ISBN 978-1-4503-3843-1, DOI 10.1145/2833258.2833276.
Hien-Thanh Thi Duong, Quoc-Cuong Nguyen, Cong-Phuong Nguyen, and Ngoc Q. K. Duong (2016). Single-channel speaker-dependent speech enhancement exploiting generic noise model learned by nonnegative matrix factorization. Proc. IEEE International Conference on Electronics, Information and Communication, pp. 268-271, Danang, Vietnam. Electronic ISBN 978-1-4673-8016-4, PoD ISBN 978-1-4673-8017-1, DOI 10.1109/ELINFOCOM.2016.7562952.
Thanh Thi Hien Duong, Nobutaka Ono, Yasutaka Nakajima, and Toshiya Ohshima (2016). Non-stationary Segment Detection Methods based on Single-basis Non-negative Matrix Factorization for Effective Annotation. Proc. IEEE Asia-Pacific Signal and Information Processing Association Annual Summit Conference (IEEE APSIPA ASC), pp. 1-6, Jeju, Korea. Electronic ISBN 978-9-8814-7682-1, PoD ISBN 978-1-5090-2401-8, DOI 10.1109/APSIPA.2016.7820760.
Thanh Thi Hien Duong, Phuong Cong Nguyen, and Cuong Quoc Nguyen (2018). Exploiting Nonnegative Matrix Factorization with Mixed Group Sparsity Constraint to Separate Speech Signal from Single-channel Mixture with Unknown Ambient Noise. EAI Endorsed Transactions on Context-Aware Systems and Applications, vol. 18(13), pp. 1-8. ISSN 2409-0026.
Duong Thi Hien Thanh, Nguyen Cong Phuong, and Nguyen Quoc Cuong (2018). Combination of Nonnegative Matrix Factorization and mixed group sparsity constraint to exploit generic source spectral model in single-channel audio source separation. Journal of Military Science and Technology, vol. 45(4), pp. 83-94. ISSN 1859-1043. (In Vietnamese)
Thanh Thi Hien Duong, Ngoc Q. K. Duong, Phuong Cong Nguyen, and Cuong Quoc Nguyen (2018). Multichannel source separation exploiting NMF-based generic source spectral model in Gaussian modeling framework. In Latent Variable Analysis and Signal Separation, vol. 10891, pp. 547-557. Springer International Publishing. DOI 10.1007/978-3-319-93764-9_50 (SCOPUS).
Thanh Thi Hien Duong, Ngoc Q. K. Duong, Phuong Cong Nguyen, and Cuong Quoc Nguyen (2019). Gaussian modeling-based multichannel audio source separation exploiting generic source spectral model. IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27(1), pp. 32-43. ISSN 2329-9304, DOI 10.1109/TASLP.2018.2869692 (ISI - Q1).

... a: Vector; A: Matrix; A^T: Matrix transpose; A^H: Matrix conjugate transposition (Hermitian conjugation); diag(a): Diagonal matrix with a as its diagonal; det(A): Determinant of matrix A; tr(A): Matrix trace ...
... directly estimating the time-frequency mask [144] or for estimating the source spectra whose ratio yields a time-frequency mask [4, 56, 132]. Time-frequency masking, as its name suggests, estimates the ...
... A ⊙ B: element-wise Hadamard product of two matrices (of the same dimension), with elements [A ⊙ B]_ij = A_ij B_ij; A^(n): the matrix with entries [A]_ij^n; ‖a‖_1: ℓ1-norm of vector; ‖A‖_1: ℓ1-norm of matrix; Indices f

Date posted: 13/03/2019, 12:53