
Doctoral dissertation of computer science: Audio source separation exploiting NMF-based generic source spectral model


Content

Aiming to tackle real-world recordings with the challenging settings mentioned earlier, we have proposed novel separation algorithms for both the single-channel and multi-channel cases. The results achieved have been described in seven publications. The results of our algorithms were also submitted to the international source separation campaign SiSEC 2016 [81] and obtained the best performance in terms of energy-based criteria.

MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

DUONG THI HIEN THANH

AUDIO SOURCE SEPARATION EXPLOITING NMF-BASED GENERIC SOURCE SPECTRAL MODEL

DOCTORAL DISSERTATION OF COMPUTER SCIENCE

Major: Computer Science
Code: 9480101

SUPERVISORS:
ASSOC. PROF. DR. NGUYEN QUOC CUONG
DR. NGUYEN CONG PHUONG

Hanoi - 2019

DECLARATION OF AUTHORSHIP

I, Duong Thi Hien Thanh, hereby declare that this thesis is my original work and it has been written by me in its entirety. I confirm that:

• This work was done wholly during candidature for a Ph.D. research degree at Hanoi University of Science and Technology.
• Where any part of this thesis has previously been submitted for a degree or any other qualification at Hanoi University of Science and Technology or any other institution, this has been clearly stated.
• Where I have consulted the published work of others, this is always clearly attributed.
• Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.
• I have acknowledged all main sources of help.
• Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself.

Hanoi, February 2019
Ph.D. Student
Duong Thi Hien Thanh

SUPERVISORS
Assoc. Prof. Dr. Nguyen Quoc Cuong
Dr. Nguyen Cong Phuong

ACKNOWLEDGEMENT

This thesis was written during my doctoral study at the International Research Institute Multimedia, Information, Communication, and Applications (MICA), Hanoi University of Science and Technology (HUST). It is my great pleasure to thank the numerous people who have contributed towards shaping this thesis.

First and foremost, I would like to express my most sincere gratitude to my supervisors, Assoc. Prof. Nguyen Quoc Cuong and Dr. Nguyen Cong Phuong, for their great guidance and support throughout my Ph.D. study. I am grateful to them for devoting their precious time to discussing research ideas, proofreading, and explaining how to write good research papers. I would like to thank them for encouraging my research and empowering me to grow as a research scientist. I could not have imagined having better advisors and mentors for my Ph.D. study.

I would like to express my appreciation to my supervisor in the Master's course, Prof. Nguyen Thanh Thuy, School of Information and Communication Technology - HUST, and to Dr. Nguyen Vu Quoc Hung, my supervisor in the Bachelor's course at Hanoi National University of Education. They shaped my knowledge for excelling in my studies.

In the process of carrying out and completing my research, I have received much support from the MICA board of directors and my colleagues at the Speech Communication department. In particular, I am very thankful to Prof. Pham Thi Ngoc Yen, Prof. Eric Castelli, Dr. Nguyen Viet Son and Dr. Dao Trung Kien, who gave me the opportunity to join the research work at the MICA institute and to have access to its laboratory and research facilities. Without their precious support, it would have been impossible to conduct this research. My warm thanks go to my colleagues at the Speech Communication department of the MICA institute for their useful comments on my study and their unconditional support over four years, both at work and outside of work.

I am very grateful to my internship supervisor, Prof. Nobutaka Ono, and the members of Ono’s Lab at the National Institute of Informatics, Japan, for warmly welcoming me into their lab and for the helpful research collaboration they offered. I much appreciate his help in funding my conference trip and introducing me to the signal processing research communities. I would also like to thank Dr. Toshiya Ohshima, MSc. Yasutaka Nakajima, MSc. Chiho Haruta and the other researchers at Rion Co., Ltd., Japan, for welcoming me to their company and providing me with data for my experiments.

I would also like to sincerely thank Dr. Nguyen Quang Khanh, dean of the Information Technology Faculty, and Assoc. Prof. Le Thanh Hue, dean of the Economic Informatics Department, at Hanoi University of Mining and Geology (HUMG), where I am working. I have received financial support and time from my office and its leaders for completing my doctoral thesis. Grateful thanks also go to my wonderful colleagues and friends Nguyen Thu Hang, Pham Thi Nguyet, Vu Thi Kim Lien, Vo Thi Thu Trang, Pham Quang Hien, Nguyen The Binh, Nguyen Thuy Duong, Nong Thi Oanh and Nguyen Thi Hai Yen, who have given me unconditional support and help over a long time. Special thanks go to Dr. Le Hong Anh for his encouragement and precious advice.

Last but not least, I would like to express my deepest gratitude to my family. I am very grateful to my mother-in-law and father-in-law for their support in times of need and for always allowing me to focus on my work. I dedicate this thesis to my mother and father with special love; they have been great mentors in my life and have constantly encouraged me to be a better person. The struggle and sacrifice of my parents always motivate me to work hard in my studies. I would also like to express my love to my younger sisters and younger brother for their encouragement and help. This work has become more wonderful because of the love and affection that they have provided.

A special love goes to my beloved husband, Tran Thanh Huan, for his patience and understanding, and for always being there for me to share the good and bad times. I also appreciate my sons, Tran Tuan Quang and Tran Tuan Linh, for always cheering me up with their smiles. Without their love, this thesis would not have been completed.

Thank you all!
Hanoi, February 2019
Ph.D. Student
Duong Thi Hien Thanh

CONTENTS

DECLARATION OF AUTHORSHIP
ACKNOWLEDGEMENT
CONTENTS
NOTATIONS AND GLOSSARY
LIST OF TABLES
LIST OF FIGURES
INTRODUCTION

Chapter 1. AUDIO SOURCE SEPARATION: FORMULATION AND STATE OF THE ART
  1.1 Audio source separation: a solution for cocktail party problem
    1.1.1 General framework for source separation
    1.1.2 Problem formulation
  1.2 State of the art
    1.2.1 Spectral models
      1.2.1.1 Gaussian Mixture Model
      1.2.1.2 Nonnegative Matrix Factorization
      1.2.1.3 Deep Neural Networks
    1.2.2 Spatial models
      1.2.2.1 Interchannel Intensity/Time Difference (IID/ITD)
      1.2.2.2 Rank-1 covariance matrix
      1.2.2.3 Full-rank spatial covariance model
  1.3 Source separation performance evaluation
    1.3.1 Energy-based criteria
    1.3.2 Perceptually-based criteria
  1.4 Summary

Chapter 2. NONNEGATIVE MATRIX FACTORIZATION
  2.1 NMF introduction
    2.1.1 NMF in a nutshell
    2.1.2 Cost function for parameter estimation
    2.1.3 Multiplicative update rules
  2.2 Application of NMF to audio source separation
    2.2.1 Audio spectra decomposition
    2.2.2 NMF-based audio source separation
  2.3 Proposed application of NMF to unusual sound detection
    2.3.1 Problem formulation
    2.3.2 Proposed methods for non-stationary frame detection
      2.3.2.1 Signal energy based method
      2.3.2.2 Global NMF-based method
      2.3.2.3 Local NMF-based method
    2.3.3 Experiment
      2.3.3.1 Dataset
      2.3.3.2 Algorithm settings and evaluation metrics
      2.3.3.3 Results and discussion
  2.4 Summary

Chapter 3. SINGLE-CHANNEL AUDIO SOURCE SEPARATION EXPLOITING NMF-BASED GENERIC SOURCE SPECTRAL MODEL WITH MIXED GROUP SPARSITY CONSTRAINT
  3.1 General workflow of the proposed approach
  3.2 GSSM formulation
  3.3 Model fitting with sparsity-inducing penalties
    3.3.1 Block sparsity-inducing penalty
    3.3.2 Component sparsity-inducing penalty
    3.3.3 Proposed mixed sparsity-inducing penalty
  3.4 Derived algorithm in unsupervised case
  3.5 Derived algorithm in semi-supervised case
    3.5.1 Semi-GSSM formulation
    3.5.2 Model fitting with mixed sparsity and algorithm
  3.6 Experiment
    3.6.1 Experiment data
      3.6.1.1 Synthetic dataset
      3.6.1.2 SiSEC-MUS dataset
      3.6.1.3 SiSEC-BNG dataset
    3.6.2 Single-channel source separation performance with unsupervised setting
      3.6.2.1 Experiment settings
      3.6.2.2 Evaluation method
      3.6.2.3 Results and discussion
    3.6.3 Single-channel source separation performance with semi-supervised setting
      3.6.3.1 Experiment settings
      3.6.3.2 Evaluation method
      3.6.3.3 Results and discussion
  3.7 Summary

Chapter 4. MULTICHANNEL AUDIO SOURCE SEPARATION EXPLOITING NMF-BASED GSSM IN GAUSSIAN MODELING FRAMEWORK
  4.1 Formulation and modeling
    4.1.1 Local Gaussian model
    4.1.2 NMF-based source variance model
    4.1.3 Estimation of the model parameters
  4.2 Proposed GSSM-based multichannel approach
    4.2.1 GSSM construction
    4.2.2 Proposed source variance fitting criteria
      4.2.2.1 Source variance denoising
      4.2.2.2 Source variance separation
    4.2.3 Derivation of MU rule for updating the activation matrix
    4.2.4 Derived algorithm
  4.3 Experiment
    4.3.1 Dataset and parameter settings
    4.3.2 Algorithm analysis
      4.3.2.1 Algorithm convergence: separation results as functions of EM and MU iterations
      4.3.2.2 Separation results with different choices of λ and γ
    4.3.3 Comparison with the state of the art
  4.4 Summary

CONCLUSIONS AND PERSPECTIVES
BIBLIOGRAPHY
LIST OF PUBLICATIONS

NOTATIONS AND GLOSSARY

Standard mathematical symbols
  $\mathbb{C}$: Set of complex numbers
  $\mathbb{R}$: Set of real numbers
  $\mathbb{Z}$: Set of integers
  $\mathbb{E}$: Expectation of a random variable
  $\mathcal{N}_c$: Complex Gaussian distribution

Vectors and matrices
  $a$: Scalar
  $\mathbf{a}$: Vector
  $\mathbf{A}$: Matrix
  $\mathbf{A}^{T}$: Matrix transpose
  $\mathbf{A}^{H}$: Matrix conjugate transposition (Hermitian conjugation)
  $\mathrm{diag}(\mathbf{a})$: Diagonal matrix with $\mathbf{a}$ as its diagonal
  $\det(\mathbf{A})$: Determinant of matrix $\mathbf{A}$
  $\mathrm{tr}(\mathbf{A})$: Matrix trace
  $\mathbf{A} \odot \mathbf{B}$: The element-wise Hadamard product of two matrices (of the same dimension), with elements $[\mathbf{A} \odot \mathbf{B}]_{ij} = A_{ij} B_{ij}$
  $\mathbf{A}^{.(n)}$: The matrix with entries $[\mathbf{A}]_{ij}^{n}$
  $\|\mathbf{a}\|_{1}$: $\ell_1$-norm of vector
  $\|\mathbf{A}\|_{1}$: $\ell_1$-norm of matrix

Indices
  $f$: Frequency index
  $i$: Channel index
  $j$: Source index
  $n$: Time frame index
  $t$: Time sample index
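
As a small illustration of the matrix notation listed above, the sketch below evaluates the Hadamard product, the element-wise power and the matrix l1-norm on toy 2x2 matrices. It is not taken from the thesis: the function names (hadamard, elementwise_power, l1_norm) are hypothetical, and the matrix l1-norm is assumed here to be the entry-wise sum of absolute values, which the excerpt does not confirm.

```python
from typing import List

Matrix = List[List[float]]


def hadamard(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise (Hadamard) product: [A ⊙ B]_ij = A_ij * B_ij."""
    same_shape = len(a) == len(b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(a, b)
    )
    if not same_shape:
        raise ValueError("matrices must have the same dimensions")
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def elementwise_power(a: Matrix, n: float) -> Matrix:
    """Element-wise power: every entry A_ij is raised to the power n."""
    return [[x ** n for x in row] for row in a]


def l1_norm(a: Matrix) -> float:
    """Entry-wise l1-norm (assumed definition): sum of |A_ij| over all entries."""
    return sum(abs(x) for row in a for x in row)


# Toy matrices chosen only for illustration.
A = [[1.0, 2.0], [3.0, 4.0]]
B = [[0.5, 1.0], [2.0, 0.0]]

print(hadamard(A, B))           # [[0.5, 2.0], [6.0, 0.0]]
print(elementwise_power(A, 2))  # [[1.0, 4.0], [9.0, 16.0]]
print(l1_norm(B))               # 3.5
```

Plain nested lists keep the sketch dependency-free; with an array library the same three operations are usually single expressions.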

Date posted: 07/01/2020, 19:40
