
NOISE REDUCTION IN SPEECH ENHANCEMENT BY SPECTRAL SUBTRACTION WITH SCALAR KALMAN FILTER


VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF ENGINEERING AND TECHNOLOGY

Đặng Minh Công

NOISE REDUCTION IN SPEECH ENHANCEMENT BY SPECTRAL SUBTRACTION WITH SCALAR KALMAN FILTER

Major: Computer Science
Supervisor: Assoc. Prof. Dr. Nguyễn Đình Việt

HA NOI – 2015

AUTHORSHIP

“I hereby declare that the work contained in this thesis is my own and has not been previously submitted for a degree or diploma at this or any other higher education institution. To the best of my knowledge and belief, the thesis contains no material previously published or written by another person except where due reference or acknowledgement is made.”

Signature:………………………………………………

SUPERVISOR’S APPROVAL

“I hereby approve that the thesis in its current form is ready for committee examination as a requirement for the Bachelor of Computer Science degree at the University of Engineering and Technology.”

Signature:………………………………………………

ACKNOWLEDGEMENT

I would like to express my sincere gratitude to my supervisor, Assoc. Prof. Nguyễn Đình Việt, for his valuable guidance and feedback throughout my work on this thesis. I greatly appreciate the Department of Information Technology, University of Engineering and Technology, for the valuable knowledge and skills I acquired during my studies there. Finally, I would like to thank my friends and family, who supported me during my time at UET.

ABSTRACT

In systems related to speech communication, such as telecommunication or speech processing systems, the presence of background noise in the speech signal is undesirable. Background noise can make the speech harder for the user to hear, or degrade the performance of speech processing systems. Therefore, noise reduction is an important problem in enhancing the quality of speech signals. In this
thesis, we present a single-channel noise reduction method for speech enhancement. The method is based on the principle of spectral subtraction, with the addition of a scalar Kalman Filter for residual noise removal. To apply the scalar Kalman Filter, it models the change of the speech magnitude spectrum as a Gaussian random process and the magnitude of the residual noise as Gaussian white noise. The scalar Kalman Filter used in this method is designed to suit the characteristics of speech and noise signals. Our experimental results with the online NOIZEUS speech corpus show that the presented method consistently improves the SNR measures of noisy speech signals. Overall, the experimental results also show that the SNR improvement of the presented method is better than that of other basic implementations of spectral subtraction.

TÓM TẮT (Vietnamese abstract, translated)

In systems related to human speech communication, such as telecommunication and speech processing systems, the presence of noise in speech is undesirable. Ambient noise captured together with the speech makes it harder for the user to hear what is said and reduces the performance of speech processing systems. Therefore, to improve the quality of the speech signal, noise reduction is an important problem. In this thesis, we present a noise reduction method for improving speech quality. The method is based on the principle of spectral subtraction, with the addition of a scalar Kalman filter to remove residual noise. It models the change of the speech magnitude spectrum over time as a Gaussian random process and the magnitude spectrum of the residual noise as Gaussian white noise in order to apply the scalar Kalman filter. The Kalman filter used in this method is designed to suit the characteristics of speech and noise signals. Experimental results with the online NOIZEUS speech corpus show that the presented method improves the SNR measures of noisy speech signals. Overall, the experimental results show that the SNR improvement of the presented method is better than that of other implementations of spectral subtraction.

TABLE OF CONTENTS

List of Figures
List of Tables
ABBREVIATIONS
Chapter 1 INTRODUCTION
  1.1 Motivation
  1.2 Survey of existing methods
  1.3 Contributions
  1.4 Structure of the Thesis
Chapter 2 BACKGROUND
  2.1 Sound
  2.2 Human perception of sound
    2.2.1 Loudness
    2.2.2 Pitch
    2.2.3 Timbre
  2.3 Audio Signal
    2.3.1 Analog audio signal
    2.3.2 Digital audio signal
    2.3.3 Sampling
    2.3.4 Quantization
  2.4 Fourier Transform and Frequency domain representation
  2.5 Kalman Filter
Chapter 3 NOISE REDUCTION BY SPECTRAL SUBTRACTION WITH SCALAR KALMAN FILTER
  3.1 Spectral Subtraction
    3.1.1 Principle
    3.1.2 Half-wave Rectification
    3.1.3 Residual noise
    3.1.4 Block diagram
  3.2 Scalar Kalman Filter for reducing residual noise
    3.2.1 Model for magnitude of both residual noise and clean speech
    3.2.2 Scalar Kalman Filter
    3.2.3 Measurement noise variance R
    3.2.4 Process noise variance Q
    3.2.5 Algorithm
Chapter 4 EVALUATION
  4.1 Objective Measures of Speech Quality
    4.1.1 SNR
    4.1.2 Segmental SNR (SNRseg)
  4.2 Experiment setup
  4.3 Experiment results
Chapter 5 CONCLUSION
  5.1 Conclusions
  5.2 Future Works
Bibliography
Appendix A MATLAB source code of the implementation

List of Figures

Figure 1: Sound signals of some musical instruments
Figure 2: Musical notes in a piano keyboard
Figure 3: Waveform of two particular signals with the same sinusoidal components combined in different ways
Figure 4: Sampling of a sinusoidal analog signal
Figure 5: Sampling process with low sampling rate
Figure 6: Block diagram of spectral subtraction
Figure 7: Flowchart of Kalman Filter with each frequency component
Figure 8: Block diagram of the presented method
Figure 9: SNR and SNRseg results of three methods with sp07_car_sn0.wav
Figure 10: Waveform of the clean speech signal sp07.wav
Figure 11: Waveform of noisy speech sp07_car_sn0.wav after noise reduction by proposed method
Figure 12: Waveform of noisy speech sp07_car_sn0.wav after noise reduction by Boll spectral
subtraction
Figure 13: Waveform of noisy speech sp07_car_sn0.wav after noise reduction by Berouti spectral subtraction

Chapter 4 EVALUATION

4.1 Objective Measures of Speech Quality

An important question is how to measure speech quality after noise reduction. Ideally, human testers are used for this purpose (subjective evaluation); however, this is costly and not every researcher can afford it. Hence, we employ objective measures to evaluate the performance of the method presented in this thesis.

4.1.1 SNR

A common way of measuring signal quality is the Signal to Noise Ratio (SNR). It is defined as the ratio of the total energy of the speech signal to the total energy of the noise signal, expressed in decibels:

$$ \mathrm{SNR} = 10 \log_{10} \frac{\sum_n s^2[n]}{\sum_n v^2[n]} = 10 \log_{10} \frac{\sum_n s^2[n]}{\sum_n \left(s[n] - \hat{s}[n]\right)^2} $$

Here $s[n]$ is the clean signal and $v[n]$ is the noise signal, both in the time domain. The second form takes the noise to be the difference between the clean signal and the signal $\hat{s}[n]$ being evaluated, that is, $v[n] = s[n] - \hat{s}[n]$.

The SNR used for speech quality evaluation is not calculated on the whole signal, but only on the parts with speech activity. Therefore, to calculate the SNR of a speech signal, we must employ a Voice Activity Detector (VAD) to remove all parts without speech activity.

However, the SNR has a weakness: it does not take into account the distribution of noise and speech over time. The SNR in one section of the signal can be very low while the overall SNR is still high, because the total noise energy is small compared to the total speech energy. For this reason this SNR is also called the global SNR, to differentiate it from the segmental SNR described below.

4.1.2 Segmental SNR (SNRseg)

The segmental SNR is one of the most popular measures of speech quality. It is calculated by dividing the signal into segments and calculating the SNR of each segment. The segmental SNR is then defined as the mean of the per-segment SNRs:

$$ \mathrm{SNRseg} = \frac{1}{M} \sum_{m=0}^{M-1} 10 \log_{10} \frac{\sum_{n=Nm}^{N(m+1)-1} s^2[n]}{\sum_{n=Nm}^{N(m+1)-1} v^2[n]} $$

where $M$ is the number of segments and $N$ is the segment size in samples, so that segment $m$ covers samples $Nm$ to $N(m+1)-1$. The segment size is chosen according to the characteristics of the speech signal (around 20 ms to 40 ms). This type of SNR is more suitable for speech quality assessment than the global SNR.

In general, SNR measures have the benefit of low computational cost, but because they are calculated on time-domain signals, some pre-processing must be done to align the tested signals.

4.2 Experiment setup

The algorithm of the presented method is implemented in MATLAB 8.3 (R2014a), with the STFT module taken from: http://www.mathworks.com/matlabcentral/fileexchange/45577inverse-short-time-fourier-transformation istft with-matlab-implementation, because MATLAB does not have a built-in inverse STFT function (it only has the spectrogram function, which performs the STFT). The window size for the STFT is approximately 20 ms, rounded to the nearest power of two (with an 8000 Hz sample rate, the window size is 128 samples). The overlap between an STFT frame and the next frame is 80%. The first 40 STFT time frames (approximately 160 ms) are taken to be a no-speech section and used for noise estimation. The number of $\Delta\hat{y}_n$ values used for estimating $Q_n$ in the Kalman Filter algorithm is set to 4, and the threshold value $\tau$ is set to 3.

The data used for the experiment are taken from the NOIZEUS speech corpus [9]. It is a noisy speech database containing 30 IEEE sentences (produced by three male and three female speakers) corrupted by eight different real-world noises at different SNRs. The noise was taken from the AURORA database and includes suburban train noise, babble, car, exhibition hall, restaurant, street, airport and train-station noise. All audio files of the NOIZEUS database are mono, 8000 Hz, 16-bit PCM WAV files. The corpus can be accessed and downloaded from: http://ecs.utdallas.edu/loizou/speech/noizeus/

We also need other noise reduction methods against which to evaluate the method proposed in this thesis. Because the proposed method is based on Boll spectral subtraction, an implementation of the Boll
method is taken from this website: http://dea.brunel.ac.uk/cmsp/home_esfandiar/Sample%20Wave%20Files.htm A drawback of this implementation is that its author does not guarantee its optimality. However, it can still serve as a good reference point for our proposed method. Another method used for comparison is Berouti spectral subtraction [8], which uses power spectral subtraction instead of the magnitude spectral subtraction of the Boll method. Its implementation is also taken from the above website.

Because of the need for a VAD and for signal alignment, we do not implement our own SNR calculation routine. The SNR calculation routine we used is taken from VOICEBOX, a speech processing toolbox consisting of MATLAB routines. The VOICEBOX website is located at: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html

The source code of the actual implementation can be found in Appendix A.

4.3 Experiment results

With the experiment setup described in the section above, we obtained the following results for the samples affected by car noise:

Table 2: Experiment results for speech corrupted by car noise at SNR 0 dB

                ---------------- SNR (dB) ----------------  --------------- SNRseg (dB) --------------
File                Noisy   Proposed    SS Boll SS Berouti      Noisy   Proposed    SS Boll SS Berouti
sp01_car_sn0        -0.08       6.76       2.08       3.38     -11.92      -2.86      -2.26      -6.22
sp02_car_sn0        -0.04       6.63       4.35       4.58     -10.26      -1.17      -0.59      -4.41
sp03_car_sn0        -0.08       4.06       3.87       3.10     -10.25      -3.42      -1.31      -4.64
sp04_car_sn0        -0.07       6.40       4.83       4.07     -14.09      -4.51      -2.66      -8.12
sp05_car_sn0         0.14       6.23       6.00       4.52     -12.57      -4.14      -0.47      -6.47
sp06_car_sn0        -0.46       6.53       5.70       5.35     -16.26      -6.35      -2.71      -9.21
sp07_car_sn0         0.03       7.05       2.56       4.03     -12.99      -3.18      -2.44      -7.10
sp08_car_sn0         0.09       6.14       5.87       4.88      -9.35      -1.21       0.11      -3.41
sp09_car_sn0         0.09       6.53       6.06       4.72     -10.90      -2.39      -0.18      -4.98
sp10_car_sn0        -0.21       4.32       3.08       3.64      -9.77      -0.72      -2.49      -4.44
sp11_car_sn0        -0.28       3.97       3.69       3.05     -11.92      -5.39      -3.08      -6.55
sp12_car_sn0         0.07       5.16       4.63       3.73     -10.49      -1.19      -0.72      -4.95
sp13_car_sn0        -0.12       5.71       2.88       4.13     -11.29      -4.21      -0.52      -5.35
sp14_car_sn0        -0.03       7.58       6.25       5.89     -12.03      -2.58      -0.61      -5.18
sp15_car_sn0         0.48       6.53       3.01       3.93      -7.40       2.19       0.11      -2.56
sp16_car_sn0        -0.25       7.47       7.32       6.18     -18.91      -8.47      -4.04     -11.29
sp17_car_sn0         0.33       6.99       3.48       3.99      -8.94      -0.32       0.02      -3.63
sp18_car_sn0         0.16       7.11       6.95       4.38     -12.04      -2.49      -0.07      -5.71
sp19_car_sn0        -0.27       7.31       5.91       5.62     -16.49      -5.86      -3.22      -9.42
sp20_car_sn0        -0.02       6.47       6.81       5.42     -13.86      -5.46      -1.68      -7.19
sp21_car_sn0        -0.07       5.75       6.38       3.27     -12.31      -3.08      -0.05      -6.37
sp22_car_sn0         0.28       6.64       2.22       3.29      -5.59       2.05       0.91      -0.97
sp23_car_sn0         0.03       6.48       3.88       3.60     -10.33      -1.30      -0.15      -4.71
sp24_car_sn0         0.16       6.99       2.45       4.33      -8.12       0.90      -0.28      -2.63
sp25_car_sn0         0.26       6.81       4.27       5.20     -10.82      -2.43      -0.27      -4.46
sp26_car_sn0         0.19       6.62       4.48       4.80      -7.88       0.79       0.37      -2.13
sp27_car_sn0        -0.16       5.81       3.43       3.84     -13.48      -4.47      -1.95      -7.77
sp28_car_sn0         0.03       7.11       4.87       5.08     -11.84      -1.43      -1.02      -5.72
sp29_car_sn0         0.01       6.44       5.60       4.94     -12.70      -2.02      -0.91      -6.17
sp30_car_sn0         0.22       5.98       3.31       4.24      -9.07      -1.42      -0.39      -3.69
Average              0.01       6.32       4.54       4.37     -11.46      -2.54      -1.09      -5.52

Table 3: Experiment results for speech corrupted by car noise at SNR 5 dB

                ---------------- SNR (dB) ----------------  --------------- SNRseg (dB) --------------
File                Noisy   Proposed    SS Boll SS Berouti      Noisy   Proposed    SS Boll SS Berouti
sp01_car_sn5         5.03      10.59       4.63       5.86      -6.76       1.40       0.19      -3.00
sp02_car_sn5         4.91      10.03       9.22       7.59      -5.33       2.37       2.86      -1.11
sp03_car_sn5         5.14      10.11       8.59       9.11      -5.72       1.61       2.49      -0.09
sp04_car_sn5         5.03       9.73       9.51       7.65      -9.00      -1.07       0.86      -3.94
sp05_car_sn5         5.08       9.92       8.73       8.04      -7.59      -1.29       2.24      -2.69
sp06_car_sn5         4.49       8.33       7.81       7.24     -10.64      -3.65      -0.75      -5.41
sp07_car_sn5         5.05      10.66       7.58       6.38      -8.00       0.69       0.95      -3.89
sp08_car_sn5         5.19      10.00       9.41       8.79      -4.26       2.78       3.37       0.61
sp09_car_sn5         5.10       9.52       9.08       7.94      -5.88       0.79       2.66      -1.19
sp10_car_sn5         4.75       9.34       6.78       7.97      -5.52       1.65       1.23      -0.93
sp11_car_sn5         4.88      10.35       9.03       8.26      -7.51      -0.01       0.39      -2.88
sp12_car_sn5         5.04      10.22       9.99       8.60      -6.18       2.20       2.71      -1.42
sp13_car_sn5         4.65       8.12       7.38       5.43      -5.77      -0.12       1.30      -2.90
sp14_car_sn5         5.05      11.04       7.81       8.80      -6.96       1.92       1.10      -1.79
sp15_car_sn5         5.26      10.27       5.47       5.55      -2.78       4.00       3.44       0.25
sp16_car_sn5         4.55      10.87      10.60       9.81     -14.16      -4.89      -1.73      -7.67
sp17_car_sn5         5.18       9.63       7.40       7.46      -4.04       1.90       2.83      -0.46
sp18_car_sn5         4.95       8.76       7.11       6.89      -6.40      -0.02      -0.36      -2.55
sp19_car_sn5         5.37      10.71       8.79       9.18     -10.67      -3.23      -1.94      -5.42
sp20_car_sn5         4.90      10.24       9.44       8.54      -8.89      -0.99       0.79      -3.65
sp21_car_sn5         4.93       8.07       7.56       4.21      -6.75      -0.95       0.51      -3.37
sp22_car_sn5         5.35      10.16       7.89       4.51      -0.51       5.58       5.18       1.28
sp23_car_sn5         5.01       9.74       5.79       5.54      -5.37       1.45       1.64      -1.59
sp24_car_sn5         5.17       9.99       3.78       6.14      -3.08       3.62       1.37       0.24
sp25_car_sn5         5.30      10.02       6.64       7.67      -5.73       1.47       1.37      -1.53
sp26_car_sn5         5.27       9.45       6.23       6.87      -2.74       3.35       1.47       0.63
sp27_car_sn5         4.85       9.77       7.96       6.44      -8.44      -1.31       0.75      -4.37
sp28_car_sn5         5.04      10.17       9.71       8.26      -6.87       0.31       1.94      -2.15
sp29_car_sn5         5.10       8.65       7.64       7.55      -6.97      -0.25      -0.05      -2.31
sp30_car_sn5         5.15       9.19       6.42       7.00      -4.19       3.02       1.97      -0.97
Average              5.03       9.79       7.80       7.31      -6.42       0.74       1.36      -2.14

Table 4: Experiment results for speech corrupted by car noise at SNR 10 dB

                ---------------- SNR (dB) ----------------  --------------- SNRseg (dB) --------------
File                Noisy   Proposed    SS Boll SS Berouti      Noisy   Proposed    SS Boll SS Berouti
sp01_car_sn10       10.34      13.32      10.53       7.24      -1.05       3.98       3.05      -0.41
sp02_car_sn10        9.83      13.06      10.94      11.81      -0.36       4.72       4.63       2.85
sp03_car_sn10       10.13      13.73      10.31      12.03      -0.74       4.98       4.06       3.20
sp04_car_sn10        9.83      12.52      10.16      10.89      -3.22       2.10       1.23      -0.54
sp05_car_sn10       10.12      13.78      11.28      12.41      -2.60       3.91       4.32       1.06
sp06_car_sn10        9.62      12.04      10.42      11.36      -5.54       3.57       0.52      -1.85
sp07_car_sn10       10.01      13.72      11.21       7.46      -3.01       3.69       3.48      -1.10
sp08_car_sn10       10.73      12.31       9.75      11.66       1.47       6.41       4.97       4.26
sp09_car_sn10       10.04      13.38      10.85      10.55      -0.99       4.44       4.19       2.09
sp10_car_sn10        9.84      13.22      10.57      11.03      -0.38       5.63       4.48       2.70
sp11_car_sn10        9.82      12.53      10.28      11.91      -1.96       3.70       2.22       0.94
sp12_car_sn10       10.00      13.86      11.54      10.98      -1.25       5.80       4.62       1.73
sp13_car_sn10        9.78      13.15      10.29       9.23      -1.44       3.82       4.41       1.51
sp14_car_sn10       10.02      14.65       8.82      12.04      -1.94       5.12       2.80       1.76
sp15_car_sn10       10.23      13.98      10.84      11.34       2.15       7.56       6.32       4.96
sp16_car_sn10        9.43      13.13      10.22      11.91      -8.35      -0.53      -1.41      -3.14
sp17_car_sn10       11.06      14.19      10.40      11.53       1.95       7.54       5.51       3.57
sp18_car_sn10       10.21      14.22      10.14      13.02      -1.91       3.79       3.13       2.02
sp19_car_sn10        9.78      14.86      12.02      12.66      -6.43       1.84       2.45      -1.53
sp20_car_sn10        9.90      14.33      11.53      13.38      -3.93       3.30       3.05       0.95
sp21_car_sn10        9.94      12.92      10.35      10.77      -2.31       2.93       4.02       1.00
sp22_car_sn10       10.41      13.34      11.05       9.81       4.60       9.12       7.93       5.58
sp23_car_sn10        9.96      13.06      10.73      12.28      -0.38       4.78       4.24       3.22
sp24_car_sn10       10.19      13.28      10.08       7.70       1.91       6.83       5.05       2.80
sp25_car_sn10       10.29      13.56      10.23       9.95      -0.76       4.82       3.62       1.63
sp26_car_sn10       10.23      13.18      10.30      11.05       2.17       7.10       5.28       4.16
sp27_car_sn10        9.85      12.90      10.94       8.12      -3.53       3.90       3.64      -1.16
sp28_car_sn10        9.92      12.69       9.93      11.29      -1.24       3.73       2.39       1.48
sp29_car_sn10       10.07      13.66      10.57      12.36      -2.72       3.81       3.52       1.12
sp30_car_sn10       10.13      12.26       8.67       9.25       0.79       5.99       4.43       1.87
Average             10.06      13.36      10.50      10.90      -1.37       4.61       3.74       1.56

Table 5: Experiment results for speech corrupted by car noise at SNR 15 dB

                ---------------- SNR (dB) ----------------  --------------- SNRseg (dB) --------------
File                Noisy   Proposed    SS Boll SS Berouti      Noisy   Proposed    SS Boll SS Berouti
sp01_car_sn15       15.00      17.95      11.59      10.56       3.18       8.98       5.53       2.71
sp02_car_sn15       14.84      17.22      11.24      16.01       4.63       9.14       6.17       7.11
sp03_car_sn15       15.24      17.60      10.76      16.33       4.36       8.16       5.24       6.84
sp04_car_sn15       14.97      17.31      11.93      16.26       0.99       6.62       4.53       4.01
sp05_car_sn15       15.06      17.95      11.66      13.74       2.34       7.63       5.46       3.25
sp06_car_sn15       14.53      17.68      11.58      16.53      -1.26       5.35       4.51       2.79
sp07_car_sn15       14.99      17.19      12.65      12.23       1.99       7.85       5.87       2.42
sp08_car_sn15       15.12      17.29      11.89      14.28       5.68       9.19       6.98       7.05
sp09_car_sn15       15.06      17.36      11.60      16.13       4.04       8.55       5.95       6.53
sp10_car_sn15       14.72      16.74      11.94      16.03       4.45       9.24       6.43       7.10
sp11_car_sn15       14.89      17.13      11.25      15.53       2.93       8.59       2.78       4.43
sp12_car_sn15       15.05      17.63      12.41      14.66       3.84       9.56       5.79       4.88
sp13_car_sn15       14.91      17.40      11.19      15.56       3.71       8.29       5.55       6.02
sp14_car_sn15       14.94      17.47      10.45      15.58       3.64       7.71       4.27       5.60
sp15_car_sn15       15.16      17.51      11.38      13.16       7.13      10.77       7.32       7.46
sp16_car_sn15       14.69      19.24      12.35      17.16      -3.93       3.54       2.29       0.54
sp17_car_sn15       15.22      17.13      11.09      13.67       6.01      10.72       7.00       5.95
sp18_car_sn15       15.22      18.40      10.84      15.70       3.10       8.26       5.57       5.29
sp19_car_sn15       14.72      18.48      12.42      14.66      -1.54       5.09       3.84       1.37
sp20_car_sn15       14.89      17.93      11.87      16.57       1.05       6.54       4.55       4.36
sp21_car_sn15       14.92      16.98      10.53      15.46       2.64       7.83       5.12       5.32
sp22_car_sn15       15.29      17.08      12.07      14.72       9.50      12.51       9.28       9.93
sp23_car_sn15       14.99      16.95      11.02      14.80       4.64       8.27       5.96       6.04
sp24_car_sn15       15.23      17.10      11.18       8.72       6.93      10.73       7.18       5.30
sp25_car_sn15       15.29      17.23      11.44      15.15       4.20       8.33       5.70       5.92
sp26_car_sn15       15.21      16.96      10.56      15.48       7.17      10.62       6.75       8.51
sp27_car_sn15       14.84      15.77      11.80      13.66       1.55       7.34       5.09       3.17
sp28_car_sn15       15.03      17.72      11.55      15.49       3.17       8.59       5.88       5.15
sp29_car_sn15       15.04      17.57      11.30      16.60       2.28       8.05       5.05       5.35
sp30_car_sn15       15.20      14.45       9.76      10.37       5.90       8.29       5.97       3.48
Average             15.01      17.35      11.44      14.69       3.48       8.34       5.59       5.13

Table 6: Average SNR and SNRseg gain of the three methods relative to the noisy speech (in dB)

                   ----------- SNR gain -----------     --------- SNRseg gain ----------
Noisy speech at     15 dB   10 dB    5 dB    0 dB        15 dB   10 dB    5 dB    0 dB
Proposed method      2.34    3.30    4.76    6.31         4.86    5.98    7.16    8.92
SS Boll             -3.57    0.44    2.77    4.53         2.11    5.11    7.78   10.37
SS Berouti          -0.32    0.84    2.28    4.36         1.65    2.93    4.28    5.94

Overall, the three methods used in this experiment show positive SNR and SNRseg gain over the noisy speech signals in almost all conditions, which means they have done their job of noise reduction. The exceptions are the global SNR gains of the Boll and Berouti implementations at 15 dB, which are negative. When compared with the implementations of Boll spectral subtraction and Berouti spectral subtraction, our implementation of this thesis’s method gives the best results overall. Note that we use the word “implementation” instead of “method”, because there may be other implementations of Boll and Berouti spectral subtraction that give better results than the ones used in this experiment. In Table 6, the average SNR gain of the proposed method is higher than that of the other methods at every noise level. The SNRseg gain of
the proposed method is also the highest for the 15 dB and 10 dB test data, while the SNRseg gain of the Boll method is the highest for the 5 dB and 0 dB test data. The SNR and SNRseg improvements of the proposed method over the other two methods are shown in the table below:

Table 7: Improvement of the proposed method over the other two methods (difference in average gain, in dB)

                   ------------- SNR --------------     ------------ SNRseg ------------
Noisy speech at      0 dB    5 dB   10 dB   15 dB         0 dB    5 dB   10 dB   15 dB
SS Boll              1.78    1.99    2.86    5.91        -1.45   -0.62    0.87    2.75
SS Berouti           1.95    2.48    2.46    2.66         2.98    2.88    3.05    3.21

However, there are still some problems in the experiment results. For example, with car noise at SNR level 15 dB, the Boll method gives a negative global SNR gain but a positive segmental SNR gain. The gains are significant, so we cannot dismiss them as random inconsistency. Furthermore, at high noise levels (low SNR, 0 dB and 5 dB), the global SNR measurement shows that the proposed method is the best, while the segmental SNR shows that the Boll method is the best. This systematic disagreement between global SNR and segmental SNR indicates that there are problems the numbers cannot describe. The particular case of sp07_car_sn0 (speech sample no. 7, corrupted by car noise at SNR = 0 dB) is one of the notably inconsistent cases:

[Figure 9: SNR and SNRseg results of three methods with sp07_car_sn0.wav. Bar chart, horizontal axis: SNR (in dB). Global SNR: Proposed 7.05, SS Boll 2.56, SS Berouti 4.03. SNRseg: Proposed -3.18, SS Boll -2.44, SS Berouti -7.10.]

As Figure 9 shows, the SNR measure ranks the proposed method best, followed by the Berouti method and then the Boll method. The SNRseg, however, ranks the Boll method best, followed by the proposed method, with the Berouti method worst. The resulting audio files of all three methods were examined more closely:

Figure 10: Waveform of the clean speech signal sp07.wav
Figure 11: Waveform of noisy speech sp07_car_sn0.wav after noise reduction by proposed method
Figure 12: Waveform of noisy speech sp07_car_sn0.wav after noise reduction by Boll spectral subtraction
Figure 13: Waveform of noisy speech sp07_car_sn0.wav after noise reduction by Berouti spectral subtraction

All three results contain notable distortions from the original clean speech, which is acceptable considering that the SNR of the noisy signal was only 0 dB. However, as Figure 12 shows, the Boll spectral subtraction has over-removed some parts of the clean speech. Examination of other 0 dB results by subjective listening shows similar problems. Yet the SNRseg measurement ranks the Boll method best. This confirms the view that SNRseg, while widely used for evaluating the performance of speech enhancement algorithms, has very low correlation with the overall quality of speech and therefore should not be used for that purpose [10].

Chapter 5 CONCLUSION

5.1 Conclusions

The main goal of this thesis is to enhance the quality of speech by reducing the background noise contained in it. For this purpose, we presented a modification of the spectral subtraction method of Boll that uses a scalar Kalman Filter for residual noise removal. We tested the presented method on the set of speech samples affected by car noise in the online speech corpus NOIZEUS, and compared it with two basic implementations of spectral subtraction methods.

The SNR and SNRseg measurements showed that the presented method achieved the goal of noise reduction. Overall, it also showed better SNR and SNRseg than the other two implementations. The presented method’s average SNR gain is the best at all noise levels, and its average SNRseg gain is also the best for the 10 dB and 15 dB SNR sets of speech samples. On the other hand, its average SNRseg gains are lower than those of the basic implementation of Boll spectral subtraction for the 0 dB and 5 dB SNR sets.

However, subjective hearing has found that the
SNR and SNRseg measurements did not correlate with the quality of speech. The basic implementation of Boll spectral subtraction gave the best SNRseg result at 0 dB and 5 dB SNR, but it over-removed the speech signal. It was actually worse than the presented method at recovering the original speech. Subjective listening also confirmed that all tested methods, while working best with high-SNR speech samples, struggled with 0 dB SNR speech samples. The speech samples after noise reduction contained notable distortions from the original speech. This is the greatest limitation of methods based on spectral subtraction.

5.2 Future Works

Spectral subtraction struggles with 0 dB SNR noisy signals because it assumes that the phase difference between speech and noise can be ignored. This assumption works well when the speech is considerably stronger than the noise, but it no longer holds at 0 dB SNR or lower. Recently, some works have extended the spectral subtraction method to take the phase difference into account. In the future, we could extend our method based on those approaches. There is also a need to find other measures for evaluating speech quality, instead of the SNR and SNRseg measurements used in this thesis.

Bibliography

[1] S. F. Boll, “Suppression of Acoustic Noise in Speech Using Spectral Subtraction,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. ASSP-27, no. 2, pp. 113-120, April 1979.
[2] K. K. Paliwal and A. Basu, “A speech enhancement method based on Kalman filtering,” in Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '87, 1987.
[3] S. So, K. K. Wojcicki and K. K. Paliwal, “Single-channel speech enhancement using Kalman filtering in the modulation domain,” [Online]. Available: http://www98.griffith.edu.au/dspace/bitstream/handle/10072/36143/65040_1.pdf
[4] E. M. Stein and R. Shakarchi, Fourier Analysis: An Introduction (Princeton Lectures in Analysis), Princeton University Press, 2003.
[5] S. W. Smith, The Scientist & Engineer's Guide to Digital Signal Processing, 1st ed., California Technical Pub, 1997.
[6] S. Haykin and B. Van Veen, Signals and Systems, 2nd ed., John Wiley & Sons, 2003.
[7] R. Faragher, “Understanding the Basis of the Kalman Filter Via a Simple and Intuitive Derivation,” IEEE Signal Processing Magazine, pp. 128-132, September 2012.
[8] M. Berouti, R. Schwartz and J. Makhoul, “Enhancement of speech corrupted by acoustic noise,” in IEEE International Conference on ICASSP '79, 1979.
[9] Y. Hu and P. Loizou, “Subjective evaluation and comparison of speech enhancement algorithms,” Speech Communication, vol. 49, pp. 588-601, 2007.
[10] Y. Hu and P. C. Loizou, “Evaluation of Objective Quality Measures for Speech Enhancement,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 1, pp. 229-238, 2008.

Appendix A MATLAB source code of the implementation

% x is the noisy signal in the time domain, fs is the sample rate
% Rfactor is used to scale R when needed - default value: 1.0
% Qfactor is used to reduce Q faster - default value: 1.0
% numPast is the number of past filtered values used to estimate Q - default value: 4
% threshold - default value: 3.0
function xRes = SSKalman(x, fs, Rfactor, Qfactor, threshold, numPast)
    wlen = round(fs * 0.020);            % length of the Hamming window is 20 ms
    hop = round(wlen/5);                 % hop size is 1/5 wlen = 4 ms
    nfft = wlen;                         % number of FFT points
    X = stft(x, wlen, hop, nfft, fs);    % Short Time Fourier Transform
    [m, n] = size(X);                    % m: number of frequency bins, n: number of time frames
    Mag = abs(X);                        % magnitude spectrum
    Phase = angle(X);                    % phase spectrum

    % Calculate the noise model using the first 40 time frames (around 160 ms)
    R = var(Mag(:,1:40), 0, 2) * Rfactor;
    NoiseMean = mean(Mag(:,1:40), 2);

    % Spectral subtraction
    for t = 1:n
        Mag(:,t) = Mag(:,t) - NoiseMean;
    end

    % Kalman filter for residual noise reduction
    FilteredMag = zeros(m,n);
    OneMatrix = ones(m,1);
    Y = zeros(m,1);
    P = zeros(m,1);
    Q = zeros(m,1);
    Diff = zeros(m,n);
    for t = 1:n
        % Prediction step
        P = P + Q;

        % Compare the innovation with the threshold; if any bin exceeds it,
        % P is reset for all bins
        Inno = Mag(:,t) - Y;
        sqrtRP = sqrt(P+R);
        for f = 1:m
            if abs(Inno(f)) > threshold * sqrtRP(f)
                P = Inno.*Inno;
                break;
            end
        end

        % Correction step
        K = P./(P + R);
        Y = Y + K.*Inno;
        P = P.*(OneMatrix-K);
        FilteredMag(:,t) = Y;

        % Record the difference between past and current filtered value
        if t > 1
            Diff(:,t) = Y - FilteredMag(:,t-1);
        else
            Diff(:,t) = Y;
        end

        % Estimate Q from the last numPast differences
        t0 = t - numPast + 1;
        t0 = max(1, t0);
        Q = var(Diff(:,t0:t), 0, 2) * Qfactor;
    end

    FilteredMag(FilteredMag < 0) = 0;    % half-wave rectification
    S = FilteredMag.*exp(Phase*1i);      % recombine with the noisy phase
    xRes = istft(S, hop, nfft, fs);      % transform back to the time domain
    xRes = transpose(xRes);
end
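The global SNR and segmental SNR defined in Section 4.1 can be sketched in MATLAB as shown below. This is only an illustration of the two formulas, not the VOICEBOX routine used for the reported results: the function name snr_sketch, its argument names, and the rule of skipping segments with zero speech or noise energy are assumptions made for this example. It also omits the voice activity detection and signal alignment that Section 4.1 calls for, so its output should not be expected to match the tables in Section 4.3.

```matlab
% snr_sketch.m - illustrative sketch of the Section 4.1 formulas (hypothetical helper).
% s    : clean reference signal (vector)
% sHat : signal under evaluation, same length as s and already time-aligned
% N    : segment size in samples (e.g. 160 for 20 ms at 8000 Hz)
function [snrGlobal, snrSeg] = snr_sketch(s, sHat, N)
    s = s(:);
    sHat = sHat(:);
    v = s - sHat;                                   % difference treated as noise
    snrGlobal = 10 * log10(sum(s.^2) / sum(v.^2));  % global SNR in dB

    M = floor(length(s) / N);                       % number of whole segments
    segSnr = zeros(M, 1);
    keep = false(M, 1);
    for m = 0:M-1
        idx = (N*m + 1):(N*(m + 1));                % samples Nm..N(m+1)-1, shifted to 1-based
        es = sum(s(idx).^2);                        % speech energy in the segment
        ev = sum(v(idx).^2);                        % noise energy in the segment
        if es > 0 && ev > 0                         % assumption: skip degenerate segments
            segSnr(m + 1) = 10 * log10(es / ev);
            keep(m + 1) = true;
        end
    end
    snrSeg = mean(segSnr(keep));                    % mean of the per-segment SNRs
end
```

A call such as [g, seg] = snr_sketch(clean, enhanced, 160) would evaluate 20 ms segments at the 8000 Hz sample rate of the NOIZEUS files. Because the per-segment values are averaged in decibels, a few very noisy segments pull SNRseg down much more than they affect the global SNR, which is one way the two measures can rank methods differently.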

Date posted: 17/04/2017, 23:02
