Audio Signal Classification Using Deep Learning Techniques

Feng Chia University
Ph.D. Program of Mechanical and Aeronautical Engineering
Ph.D. Thesis

Audio Signal Classification Using Deep Learning Techniques

Adviser: Professor Jin H Huang
Student: Minh Tuan Nguyen
June 2021

Acknowledgments

First and foremost, my deepest gratitude goes to my advisor, Distinguished Professor Jin H Huang, for his support, enthusiastic guidance, and consistent encouragement. He shared his brilliant insight and broad vision for my research and, drawing on his own experience, taught me how to be a good researcher. He also opened my mind to new ideas, and whenever I felt lost or discouraged, he was there to guide me in the right direction and give me the motivation to continue. I have been remarkably fortunate to have an advisor who provided such outstanding guidance on this long journey. Without him, this thesis would not have been finished.

I would especially like to thank my thesis committee members, Dr. Yu-Ting Tsai, Dr. Chang-Ann Yuan, Dr. Jiunn Fang, Dr. Yen-Sheng Chen, and Dr. Tian-Yau Wu, for their insightful comments, challenging questions, and valuable suggestions on this research. I also thank them for the time and effort they devoted to my doctoral committee despite their already heavy responsibilities.

I would like to thank Dr. Wen-Chin Tsai and Ms. Jeou-Yuh Lin for their assistance and their willingness to help. I thank all my lab members for their unconditional help and friendship; it has been a great pleasure to work with them and get to know them. In addition, I thank my Vietnamese friends in Taichung, who made me happy and kept me entertained during my studies here.

Last but not least, I would like to express my gratitude to my beloved family. My parents raised me, taught me to study hard, and encouraged me to make the pursuit of knowledge a priority in my life. My wife gave me her unwavering love, cheered me up, and stood beside me through the good and bad times. My precious children mean everything to me. They are my greatest source of strength and always motivate me to do my best. My brother and my sister supported me with their sharing and encouragement. All their support and constant encouragement helped me through the hard times of this program, and I express my most profound appreciation to them for their love, understanding, and inspiration.

Minh Tuan Nguyen

FCU e-Theses & Dissertations (2021) Audio Signal Classification Using Deep Learning Techniques

Abstract

Audio signal classification (ASC) is the task of recognizing and categorizing sounds based on a device's ability to hear audio signals. The field has received substantial attention and development in recent years. In speech and music recognition, the methodologies and standard models (e.g., feature sets, classification models, and learning strategies) are well developed. They have achieved considerable success and have been applied in many areas of everyday life. However, significant gaps remain in other fields that call for further work. Examples include engineering (diagnosing faults in machines and equipment from their sound), medicine (diagnosing diseases from heart and lung sounds), and security monitoring (recognizing environments from sound). With the rapid development of artificial intelligence, including deep learning techniques, many automated, modern models with high performance have been developed for ASC. In this thesis, a comprehensive investigation of ASC methodologies, covering both features and classification models, is performed. Based on this analysis, efficient features and high-performance models are selected for experimental applications. Three studies using deep learning techniques were carried out: “sound receiver location estimation using a convolutional neural network,” “fault detection in water pumps based on sound analysis,” and “heartbeat sound classification.” In each study, suitable features were first extracted from the sound signals.
Second, classification models were developed that use these extracted features to classify the sound signals in open-access datasets. All three studies achieved high accuracy. This demonstrates the effectiveness of the proposed methods and the great potential of deep learning algorithms for processing and classifying audio signals.

Keywords: Audio signal classification, deep learning, recurrent neural network, convolutional neural network, sound receiver location estimation, abnormality detection, heart sound classification

Contents

Acknowledgments
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
    1.1 Background
    1.2 Literature review
    1.3 Objectives
    1.4 Structure of this thesis
Chapter 2 Methodology
    2.1 Audio features for ASC
        2.1.1 Time-domain features
        2.1.2 Frequency-domain features
    2.2 Classification models
        2.2.1 Traditional machine learning models
        2.2.2 Deep learning models
    2.3 Evaluation metrics
    2.4 Summary
Chapter 3 Location estimation of receiver in an audio room
    3.1 Introduction
    3.2 Methodology
        3.2.1 Main framework
        3.2.2 Proposed CNN model
    3.3 Simulation
        3.3.1 Simulation rooms
        3.3.2 Data collection
        3.3.3 Feature extraction
        3.3.4 Simulation results and discussion
    3.4 Experiment
        3.4.1 Experiment setup
        3.4.2 Experiment results and discussion
    3.5 Summary
Chapter 4 Abnormality detection in water pumps based on sound analysis
    4.1 Introduction
    4.2 Methodology
        4.2.1 Data collection
        4.2.2 Pre-processing
        4.2.3 Feature extraction
        4.2.4 CNN models
        4.2.5 Balancing the training datasets
    4.3 Results and discussion
        4.3.1 Abnormality detection in a known machine
        4.3.2 Abnormality detection in an unknown machine
    4.4 Summary
Chapter 5 Sound classification for diagnosis of heart valve diseases
    5.1 Introduction
    5.2 Related works
    5.3 Methodology
        5.3.1 Data collection
        5.3.2 Data preprocessing
        5.3.3 Feature extraction
        5.3.4 Proposed DL models
    5.4 Results and discussion
    5.5 Summary and future works
Chapter 6 Conclusion and future works
    6.1 Conclusion
    6.2 Future works
References
Biography
List of publications

List of Figures

Fig. 1.1 Main framework of an ASC system
Fig. 2.1 DT architecture example
Fig. 2.2 RF prediction process example
Fig. 2.3 Mechanism of training an SVM classifier in a binary classification problem
Fig. 2.4 An example of an MLP architecture
Fig. 2.5 Plots of common activation functions
Fig. 2.6 Pooling algorithms
Fig. 2.7 Architecture of an RNN
Fig. 2.8 Architecture of an LSTM block
Fig. 3.1 Main framework of the sound receiver's location estimation
Fig. 3.2 Description of the proposed CNN architecture
Fig. 3.3 Configuration of the simulation room
Fig. 3.4 BRIR results for the receiver at three different locations
Fig. 3.5 Division of receiver locations into classes in the simulation rooms
Fig. 3.6 Spectrogram of an audio signal without and with thresholding during feature extraction
Fig. 3.7 Accuracy and loss curves during training
Fig. 3.8 Confusion matrix of Room C
Fig. 3.9 Experiment room with the sound source and receiver
Fig. 3.10 Spectrogram of an audio signal from the experiment room
Fig. 3.11 Accuracy and loss curves during training for the experiment room
Fig. 3.12 Confusion matrix of the experiment room
Fig. 4.1 Main framework of the CNN model for machine fault detection using sound signals
Fig. 4.2 Samples of normal and abnormal sound signals from three pumps with and without pre-processing
Fig. 4.3 Mel-spectrograms of the normal and abnormal sound signals from three pumps
Fig. 4.4 Architecture of AlexNet
Fig. 4.5 Architecture of one of the designed CNN models
Fig. 4.6 Data balancing using the random oversampling technique
Fig. 4.7 Confusion matrices of each trained and tested model
Fig. 4.8 Confusion matrices of each trained and tested model
Fig. 4.9 The system automatically detecting a pump fault from its sound signal
Fig. 5.1 Main framework of heart sound classification
Fig. 5.2 Histogram of the pixel values of the training data
Fig. 5.3 Waveform and log-mel spectrogram of some heart sound samples
Fig. 5.4 Architecture of the proposed LSTM model
Fig. 5.5 Architecture of the proposed CNN model
Fig. 5.6 Confusion matrices of the LSTM models
Fig. 5.7 Confusion matrices of the CNN models
Fig. 5.8 Performance comparison of the LSTM and CNN models
Fig. 5.9 Performance comparison between previous studies and the proposed models

List of Tables

Table 3.1 Dimensions, face materials, sound source location, and receiver location of the simulation rooms
Table 3.2 Absorption coefficients of the face materials for each frequency band
Table 3.3 Number of classes and audio signals for each simulation room
Table 3.4 Accuracy and training time for the simulation rooms
Table 3.5 Precision, sensitivity, and F1-score of Room A
Table 3.6 Precision, sensitivity, and F1-score of Room B
Table 3.7 Precision, sensitivity, and F1-score of Room C
Table 3.8 Parameters of the experiment room
Table 3.9 Precision, sensitivity, and F1-score of the experiment room
Table 4.1 Dataset content details
Table 4.2 Parameters of the CNN architecture
Table 4.3 Hyperparameter settings of the CNN models
Table 4.4 Classification results of AlexNet for abnormality detection in a known machine
Table 4.5 Classification results of Model 1 for abnormality detection in a known machine
Table 4.6 Classification results of Model 2 for abnormality detection in a known machine
Table 4.7 Classification results of Model 3 for abnormality detection in a known machine
Table 4.8 The nine different pump combinations
Table 4.9 Classification results of AlexNet for abnormality detection in an unknown machine
Table 4.10 Classification results of Model 1 for abnormality detection in an unknown machine
Table 4.11 Classification results of Model 2 for abnormality detection in an unknown machine
Table 4.12 Classification results of Model 3 for abnormality detection in an unknown machine
Table 5.1 Summary of studies on heart sound classification using DL techniques
Table 5.2 Details of the dataset
Table 5.3 Parameter settings of the proposed LSTM model
Table 5.4 Parameter settings of the proposed CNN model
Table 5.5 Hyperparameters of the training processes
Table 5.6 Classification results for a 2.0 s segment duration
Table 5.7 Classification results for a 1.5 s segment duration
Table 5.8 Classification results for a 1.0 s segment duration
Table 5.9 Single-sample prediction time (ms)
techniques for heart sound analysis in clinical diagnosis," Journal of medical engineering & technology, vol 36, pp 303-307, 2012 [129] H Uğuz, "Adaptive neuro-fuzzy inference system for diagnosis of the heart valve diseases using wavelet transform with entropy," Neural Computing and applications, vol 21, pp 1617-1628, 2012 [130] A Gharehbaghi, T Dutoit, P Ask, and L Sörnmo, "Detection of systolic ejection click using time growing neural network," Medical engineering & physics, vol 36, pp 477-483, 2014 [131] M Zabihi, A B Rad, S Kiranyaz, M Gabbouj, and A K Katsaggelos, "Heart sound anomaly and quality detection using ensemble of neural networks without segmentation," in 2016 Computing in Cardiology Conference (CinC), 2016, pp 613-616 [132] C Liu, D Springer, Q Li, B Moody, R A Juan, F J Chorro, et al., "An open access database for the evaluation of heart sound algorithms," Physiological Measurement, vol 37, p 2181, 2016 [133] C Potes, S Parvaneh, A Rahman, and B Conroy, "Ensemble of feature-based and deep learning-based classifiers for detection of abnormal heart sounds," in 2016 Computing in Cardiology Conference (CinC), 2016, pp 621-624 [134] H.-l Her and H.-W Chiu, "Using time-frequency features to recognize abnormal heart sounds," in 2016 Computing in Cardiology Conference (CinC), 2016, pp 1145-1147 95 FCU e-Theses & Dissertations (2021) Audio Signal Classification Using Deep Learning Techniques [135] G D Clifford, C Liu, B Moody, D Springer, I Silva, Q Li, et al., "Classification of normal/abnormal heart sound recordings: The PhysioNet/Computing in Cardiology Challenge 2016," in 2016 Computing in Cardiology Conference (CinC), 2016, pp 609-612 [136] M Tschannen, T Kramer, G Marti, M Heinzmann, and T Wiatowski, "Heart sound classification using deep structured features," in 2016 Computing in Cardiology Conference (CinC), 2016, pp 565-568 [137] W Zhang and J Han, "Towards heart sound classification without segmentation using convolutional neural network," in 
2017 Computing in Cardiology (CinC), 2017, pp 1-4 [138] P Bentley, G Nordehn, M Coimbra, S Mannor, and R Getz, "Classifying heart sounds challenge," Retrieved from Classifying Heart Sounds Challenge: http://www peterjbentley com/heartchallenge, 2011 [139] M Nassralla, Z El Zein, and H Hajj, "Classification of normal and abnormal heart sounds," in 2017 Fourth International Conference on Advances in Biomedical Engineering (ICABME), 2017, pp 1-4 [140] F Beritelli, G Capizzi, G L Sciuto, C Napoli, and F Scaglione, "Automatic heart activity diagnosis based on Gram polynomials and probabilistic neural networks," Biomedical engineering letters, vol 8, pp 77-85, 2018 [141] S Latif, M Usman, R Rana, and J Qadir, "Phonocardiographic sensing using deep learning for abnormal heartbeat detection," IEEE Sensors Journal, vol 18, pp 9393-9400, 2018 [142] V Sujadevi, K Soman, R Vinayakumar, and A P Sankar, "Deep models for phonocardiography (PCG) classification," in 2017 International Conference on Intelligent Communication and Computational Techniques (ICCT), 2017, pp 211-216 [143] B Bozkurt, I Germanakis, and Y Stylianou, "A study of time-frequency features for CNN-based automatic heart sound classification for pathology detection," Computers in biology and medicine, vol 100, pp 132-143, 2018 [144] L Chen, J Ren, Y Hao, and X Hu, "The diagnosis for the extrasystole heart sound signals based on the deep learning," Journal of Medical Imaging and Health Informatics, vol 8, pp 959-968, 2018 96 FCU e-Theses & Dissertations (2021) Audio Signal Classification Using Deep Learning Techniques [145] M Sotaquirá, D Alvear, and M Mondragón, "Phonocardiogram classification using deep neural networks and weighted probability comparisons," Journal of medical engineering & technology, vol 42, pp 510-517, 2018 [146] G.-Y Son and S Kwon, "Classification of heart sound signal using multiple features," Applied Sciences, vol 8, p 2344, 2018 [147] (2018) Available: 
https://github.com/yaseen21khan/Classification-of-HeartSound-Signal-Using-Multiple-Features-/find/master [148] A Raza, A Mehmood, S Ullah, M Ahmad, G S Choi, and B.-W On, "Heartbeat sound signal classification using deep learning," Sensors, vol 19, p 4819, 2019 [149] J M.-T Wu, M.-H Tsai, Y Z Huang, S H Islam, M M Hassan, A Alelaiwi, et al., "Applying an ensemble convolutional neural network with Savitzky– Golay filter to construct a phonocardiogram prediction model," Applied Soft Computing, vol 78, pp 29-40, 2019 [150] S L Oh, V Jahmunah, C P Ooi, R.-S Tan, E J Ciaccio, T Yamakawa, et al., "Classification of heart sound signals using a novel deep WaveNet model," Computer Methods and Programs in Biomedicine, p 105604, 2020 [151] P Narváez, S Gutierrez, and W S Percybrooks, "Automatic Segmentation and Classification of Heart Sounds Using Modified Empirical Wavelet Transform and Power Features," Applied Sciences, vol 10, p 4791, 2020 [152] B J Gersh, Mayo Clinic heart book: W Morrow, 2000 [153] J O Smith, Mathematics of the discrete Fourier transform (DFT): with audio applications: Julius Smith, 2007 97 FCU e-Theses & Dissertations (2021) Audio Signal Classification Using Deep Learning Techniques Biography Minh-Tuan Nguyen received his B.E and M.E degrees in Mechanical Engineering from Hanoi University of Science and Technology, Hanoi, Vietnam in 2008 and 2013 He has worked as a lecturer at the Faculty of Mechanical Engineering, Hung Yen University of Technology and Education, Hung Yen, Vietnam since 2009 At present, he is pursuing a Ph.D degree in Mechanical Engineering at Feng Chia University, Taiwan (ROC) His research interest focuses on audio signal processing using DL techniques 98 FCU e-Theses & Dissertations (2021) Audio Signal Classification Using Deep Learning Techniques List of publications [1] Nguyen M.T., Yuan C., Huang J.H (2019) Kinematic Analysis of A 6-DOF Robotic Arm In: Uhl T (eds) Advances in Mechanism and Machine Science IFToMM WC 2019 Mechanisms 
and Machine Science, vol 73 Springer, Cham https://doi.org/10.1007/978-3-030-20131-9_292 [2] Nguyen MT., Huang JH (2020) Smooth and Time Optimal Trajectory Planning for Industrial Robot Using a Single Polynomial In: Sattler KU., Nguyen D., Vu N., Tien Long B., Puta H (eds) Advances in Engineering Research and Application ICERA 2019 Lecture Notes in Networks and Systems, vol 104 Springer, Cham https://doi.org/10.1007/978-3-030-37497-6_76 [3] Minh-Tuan Nguyen and Jin-H Huang, Location estimation of receivers in an audio room using deep learning with a convolution neural network, accepted to be published in Journal of Information Science and Engineering, 2020 https://jise.iis.sinica.edu.tw/pages/issues/accepted.html [4] Minh Tuan Nguyen and Jin H Huang, Fault detection in water pumps based on sound analysis using a deep learning technique, submitted the revision to Proceedings of the Institution of Mechanical Engineers, Part E: Journal of Process Mechanical Engineering, 2021 [5] Minh Tuan Nguyen, Wei-Wen Lin and Jin H Huang, Heart sound classification using deep learning techniques based on log-mel spectrogram, submitted to Computer Methods in Biomechanics and Biomedical Engineering, 2021 99 FCU e-Theses & Dissertations (2021)
