2021 8th NAFOSTED Conference on Information and Computer Science (NICS)

An Adaptive Method for Classification of Noisy Respiratory Sounds

Khanh Nguyen-Trong
Naver AI Lab, Posts and Telecommunications Institute of Technology, Ha Noi, Viet Nam
Sorbonne University, IRD, UMMISCO, JEAI WARM, F-93143, Bondy, France
Email: khanhnt@ptit.edu.vn

978-1-6654-1001-4/21/$31.00 ©2021 IEEE

Abstract: Respiratory sounds (RSs) contain essential information about the physiology and pathology of the lungs and about airway obstruction. Therefore, understanding RSs plays a critical role in diagnosing respiratory patients. However, external noise in the respiratory sound signal is a major obstacle to such analysis. In this paper, we propose a method to classify noisy respiratory signals. First, four adaptive filtering algorithms (RLS, LMS, NLMS, and Kalman) are applied and evaluated for noise reduction. Then, we extract features from the filtered sounds using Mel Frequency Cepstral Coefficients (MFCC). Finally, a Support Vector Machine (SVM) is used to classify the respiratory sounds. We conducted experiments on a dataset of 1980 breath events collected from 16 healthy volunteers. The results show that, among the investigated methods, the combination of SVM and Kalman filtering achieves the highest accuracy, 95.5%.

Index Terms: Respiratory, breath denoising, classification, SVM, MFCC, Kalman, RLS, LMS, NLMS

I. INTRODUCTION

Classification between normal and abnormal respiratory sounds (such as coughs, crackles, and wheezes) is critical for an accurate medical diagnosis, especially in the context of the COVID-19 pandemic. Respiratory sounds (RSs) contain key information about the physiology and pathology of the lungs and about airway obstruction. Therefore, understanding RSs plays an important role in diagnosing respiratory patients.

Auscultation is the most widely used method to determine the health condition of the respiratory organs. However, because RSs are non-stationary, they remain difficult to analyze and hard to distinguish with such traditional methods. In particular, auscultation that is not performed by a well-trained physician may lead to a wrong diagnosis [1]. Moreover, the doctor's subjectivity is a significant concern that can also lead to an inaccurate diagnosis.

In this context, RS analysis by pattern recognition methods helps to overcome the limitations of traditional auscultation. Such methods typically acquire RSs with electronic devices and then transmit them to, or store them at, facilities such as mobile phones or dedicated devices. The collected data is either pre-processed and classified directly on these devices, as in SoundSen [2], or partly pre-processed and sent to a powerful computer for more in-depth analysis, as in the work of [3].

Recently, many works have been proposed to analyze RSs, especially deep learning-based methods, which have been widely adopted for the classification of such sounds [4]. Accordingly, many models have been proposed that outperform traditional machine learning ones. Neural networks such as Convolutional Neural Networks (CNN) [5], Recurrent Neural Networks (RNN) [6], and their combinations and variants, such as CNN-RNN [7], Long Short-Term Memory (LSTM) [8], and Bidirectional Long Short-Term Memory (BiLSTM) [9], have proven their efficiency in RS analysis. These networks, especially CNNs, have shown high performance in computer vision. Thus, the visual representation of RSs is the most used feature in recent studies. Numerous useful features have been proposed in the literature, such as MFCC, GFCC, and LPC.

In this context, for the same feature and method, the performance of the final model depends on how the RSs were pre-processed, for example through noise suppression or augmentation techniques. We are interested in proposing suitable denoising techniques for RS analysis.

Several denoising algorithms have been proposed in the literature to pre-process RSs. For instance, Jacome et al. [10] presented a study on breath sound discrimination between acute exacerbation of COPD (AECOPD) and stable COPD, in which a Butterworth band-pass filter was used to clean the recorded lung sound signal. Haider et al. [11] presented an algorithm to suppress noise in RSs using empirical mode decomposition (EMD), Hurst analysis, and spectral subtraction.

However, there are no systematic evaluations of the combination of RS classification methods and denoising algorithms. Existing works usually choose the denoising algorithm based on the noise characteristics. Therefore, in this paper, we propose a method for the classification of noisy RSs. First, we evaluate four popular adaptive denoising algorithms (RLS, LMS, NLMS, and Kalman) on a breath sound dataset. Then we apply the selected denoising method to classify cough sounds.

The remainder of this paper is structured as follows. Section 2 discusses relevant previous studies. Section 3 presents our method. The experimental evaluation is presented in Section 4, and finally, some concluding remarks and a brief discussion are provided in Section 5.

II. RELATED WORK

Among the machine learning technologies developed so far for audio signals, three steps have been identified for analyzing RSs [12]: pre-processing, feature extraction, and model training.

First, depending on its quality, the input data can be pre-processed by tasks such as data cleaning, segmentation, and transformation.

Due to environmental conditions, RSs are usually recorded with noise. They need to be cleaned at the data cleaning step by techniques such as noise reduction, detection of N/A values, gap detection, outlier analysis, and normalization [13]. In the literature, many researchers have applied different filters to improve data quality. Least mean square (LMS), normalized least mean square (NLMS), Kalman, and wavelet filtering are widely used techniques in the field. For example, Wu et al. [14] presented a technique to improve the quality of tracheal sounds using
adaptive filtering (AF). The normalized least mean square (NLMS) AF algorithm was applied to tracheal sounds mixed with noise. The authors then examined the accuracy of an apnea detection algorithm on this data. The experimental results showed that tracheal sound detection improved when the input data was pre-processed with AF.

After cleaning, data is usually segmented into windows of a fixed size [13]. Sliding windows are the most popular segmentation technique. For instance, Emmanouilidou et al. [15] segmented lung sounds into 500 ms windows with 50% overlap. Amrulloh et al. [16] employed a rectangular sliding window to segment cough sounds into 20 ms windows. The authors then trained an artificial neural network to detect cough segments, which achieved high accuracy.

The segmented data can then be transformed into other forms at the data transformation step. The second step usually transforms the raw sound data into time-domain features (such as amplitude envelope and root-mean-square energy), frequency-domain features (such as band energy ratio, spectral centroid, and spectral flux), or joint time-frequency features (such as spectrogram, Mel-spectrogram, and constant-Q) [17]. Mel-frequency Cepstral Coefficients (MFCCs) are widely used in audio analysis. They provide a compact representation of the acoustical properties of the upper airway and allow one to separate the contributions of the airway cavity geometry from those of the source vibration sounds [16].

Data are then passed to the next steps for feature extraction and model training. For these steps, there are two main techniques in the literature: (i) traditional machine learning and (ii) deep learning.

Traditional techniques usually train models directly on the transformed data from the previous step. This data is considered to be handcrafted features. For example, Khamlich et al. [18] presented a method for speech recognition in which the authors convert the audio data to MFCCs. The extracted features were then used to train SVM and ANN classifiers.

Deep learning-based methods automatically extract appropriate features from the data. For example, Loo et al. [19] presented a method based on MFCC and CNN for breath detection. In this work, the authors extracted MFCC features from 5500 asynchronous breathing (AB) and 5500 normal breathing (NB) cycles. The extracted features were then fed to a CNN to automatically learn embedded vectors for classifying normal and asynchronous breaths. Yang et al. [20] proposed an automatic Sleep Apnea-Hypopnea Syndrome (SAHS) event detection method using MFCCs with LSTM. The features used were nasal airway pressure and temperature signals, extracted from a clinical polysomnography (PSG) dataset [20]. The trained LSTM model achieved an accuracy of 81.6%. MFCCs were also used to train a BiLSTM, as in the work of Balamurali et al. [21]. The authors proposed a method for distinguishing healthy children from those with pathological coughs. The method achieves an accuracy of 91% for all three respiratory pathologies.

Regardless of the method applied, traditional machine learning or deep learning, the performance of the last two steps (feature extraction and model training) is affected by the first one, especially by noise suppression. However, there are no systematic evaluations of the combination of RS classification methods and denoising algorithms. Existing works usually choose the denoising algorithm based on the noise characteristics.

III. MATERIAL AND METHODS

We are interested in the classification of respiratory sounds recorded in natural environments with significant background noise. The proposed method is illustrated in Fig. 2, in which we classify three types of noisy breath sounds: normal, heavy, and deep breaths.

Fig. Classification of noisy breath sounds

The method consists of the following steps:
• Annotation: the noisy breath sounds are annotated by respiratory physicians. This gives us
labeled data with high confidence for the training and validation tasks.
• Noise filtering: this step contains two main tasks, normalization and denoising. First, the input data is normalized into a fixed format. The labeled, noisy data are then denoised. In this study, besides classifying breath sounds, we also evaluated which denoising techniques are suitable for such data. Four adaptive filtering techniques (RLS, LMS, NLMS, and Kalman) were applied to filter out unwanted signals. Thus, after this step, we obtained four filtered datasets.
• Segmentation: the continuous breath sounds are segmented into 25 ms windows, with 10 ms overlap between successive windows.
• Feature extraction: we train models on MFCC features. Therefore, at this step, the filtered breath sounds are converted to the corresponding features.
• Model training: the Support Vector Machine (SVM) algorithm is used for classification. We use the MFCC features to train four classifiers, one for each denoising algorithm.
• Evaluation: we evaluate the four classifiers on our test set to choose the best model, and with it the best denoising algorithm, for breath sound classification.

The details of each step are described in the following sections.

A. Noise filtering

Adaptive filtering is an efficient technique for noise cancellation that suits many kinds of data in non-stationary environments or environments with unknown statistics. Therefore, in this study, we applied four popular adaptive filtering methods to the breath dataset: RLS, LMS, NLMS, and Kalman. In general, these filters use the noisy signal and hidden parameters to minimize the mean square error between the filter's output signal and the desired signal. The advantage of these filters is their ability to learn and adjust their own parameters to achieve the desired results.

1) LMS and NLMS filter: The LMS algorithm tries to minimize the Mean Square Error (MSE) cost function. When minimizing the error, it considers only the current error value. The filter weights are updated continuously until the minimum is reached (derivative equal to 0), as follows:

$$w(m+1) = w(m) + \mu\left(-\frac{\partial e^{2}(m)}{\partial w(m)}\right)$$

where the error signal $e(m)$ is the difference between the desired signal $x(m)$ and the output of the adaptive filter, given by:

$$e(m) = x(m) - w^{T}(m)\,y(m)$$

Since the instantaneous gradient is $\partial e^{2}(m)/\partial w(m) = -2\,e(m)\,y(m)$, the update reduces to $w(m+1) = w(m) + \mu\,y(m)\,e(m)$, with the constant factor absorbed into $\mu$.

The autocorrelation matrix of the input signal and the adaptive step size $\mu$ influence the stability of the algorithm: if $\mu$ is too large, convergence is fast but the filter becomes unstable; if $\mu$ is too small, convergence is slow. Supposing that $\lambda_{max}$ is the largest eigenvalue of the autocorrelation matrix of $y(m)$, the limit on $\mu$ for stable adaptation is:

$$0 < \mu < \frac{2}{\lambda_{max}}$$

One disadvantage of the LMS algorithm is that a suitable constant step size is difficult to choose, because it requires knowledge of the statistics of the input signal. The NLMS algorithm solves the $\mu$ selection problem by normalizing by the energy (or power) of the input signal. Instead of minimizing the difference between the filter output and the desired signal, NLMS minimizes the Euclidean norm of the change in the filter weight vector, $\delta w(m+1)$, between successive updates. Let $P$ be the length of the filter; the NLMS algorithm is given by:

$$w(m+1) = w(m) + \frac{\mu}{a + \sum_{k=0}^{P-1} y^{2}(m-k)}\; y(m)\,e(m) \qquad (1)$$

where $a + \sum_{k=0}^{P-1} y^{2}(m-k)$ is the input signal energy, $\mu$ controls the adaptation step size, and $a$ is a small constant that prevents the denominator of the update term from becoming zero when the input signal $y(m)$ is zero.

2) RLS filter: The RLS algorithm uses a recursive method to find the filter weights that minimize the average squared error. Unlike the LMS algorithm, which minimizes the current error value, the RLS algorithm considers the total error from the beginning up to the current data point. In other words, RLS has infinite memory: all error data is given the same consideration in the total error. Because some error values might come from spurious input data points, a forgetting factor lets the RLS algorithm reduce the influence of older error data by multiplying the old data by the forgetting factor. The recursion updates its state over time to calculate the filter weights, as follows:

$$w(m) = w(m-1) + k(m)\,e(m) \qquad (2)$$

where $e(m)$ is the error signal:

$$e(m) = x(m) - w^{T}(m-1)\,y(m) \qquad (3)$$

$k(m)$ is the gain vector:

$$k(m) = \frac{\lambda^{-1}\,\phi(m-1)\,y(m)}{1 + \lambda^{-1}\,y^{T}(m)\,\phi(m-1)\,y(m)} \qquad (4)$$

and $\phi(m)$ is the inverse correlation matrix:

$$\phi(m) = \lambda^{-1}\,\phi(m-1) - \lambda^{-1}\,k(m)\,y^{T}(m)\,\phi(m-1) \qquad (5)$$

Here $\lambda$ is the adaptive coefficient (i.e., the forgetting factor), with values in the range $0 < \lambda \le 1$. When $\lambda = 1$, all previous errors carry equal weight in the total error; as $\lambda$ approaches 0, past errors play a smaller role in the total.

3) Kalman filter: The Kalman filter is derived as the minimum mean square estimator. It recursively computes the minimum mean square error estimate of a signal (or state vector) $x(m)$ from an observed noisy signal $y(m)$. Each recursion includes two stages:
• Prediction stage (time update): the signal state $x_k$ is predicted from the previous observations, together with the covariance matrix of the prediction error, as shown in the Kalman filter figure, where $A_k$, $B_k$, $P_k$, and $Q_k$ are the state transition matrix, the control-input model, the error covariance, and the covariance of the process noise, respectively.
• Estimation stage (measurement update): the result of the prediction stage and the signal innovation (the difference between the observation and its prediction) are used to estimate the signal and to calculate the Kalman gain vector and the covariance matrix of the estimation error, as shown in the Kalman filter figure, where $H_k$, $R_k$, $S_k$, and $K_k$ are the measurement model, the covariance of the measurement noise, the pre-fit residual covariance, and the Kalman gain, respectively.

Fig. Kalman filter [22]

B. Segmentation

Breath sounds are recorded continuously and are thus non-stationary over time. Over short segments (e.g., 20-40 ms), they are more stable. Therefore, we segmented the breath sound dataset into short windows of fixed length. Because the normal breathing rate usually ranges from 12 to 20 breaths per minute, we split the data into 25 ms sliding windows with 10 ms overlap. The same person produces the same type of sound, but it can vary in different ways depending on the situation. Splitting the signal into short segments therefore yields coefficients that are almost identical across different circumstances.

We applied a window function to each frame to increase the continuity between adjacent frames. This study uses the Hamming window:

$$w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1 \qquad (6)$$

C. Feature extraction

The MFCC algorithm is used to extract breathing sound features. The algorithm warps frequencies nonlinearly onto the Mel frequency scale to calculate the MFCC coefficients. The Mel scale uses bands that are equally spaced perceptually, which is closer to the response of the human ear than regular linear frequency bands. Spectral analysis shows that different timbres in the respiratory signal correspond to different distributions of energy over frequency. Thus, the Fast Fourier Transform is used to convert from the time domain to the frequency domain:

$$D_k = \sum_{m=0}^{N_m-1} D_m\, e^{-j 2\pi k m / N_m}, \quad k = 0, 1, \ldots, N_m - 1 \qquad (7)$$

The calculated spectra are then mapped onto the Mel scale to approximate the energy at each point through overlapping triangular windows (a triangular filter bank), where the filters are spaced evenly on the Mel scale. Finally, the logarithm of the Mel spectrum is converted to the cepstral domain through the discrete cosine transform:

$$C_n = \sum_{i=1}^{k} \cdots$$

Respiratory sounds also contain information about dynamics, that is, the trajectories of the MFCC coefficients over time. Thus, calculating these trajectories might increase the performance of the automatic breathing sound recognition system. The delta (differential) coefficients are calculated as:

$$d_t = \frac{\sum_{n=1}^{N} n\,(c_{t+n} - c_{t-n})}{2\sum_{n=1}^{N} n^{2}} \qquad (9)$$

where $d_t$ is the delta coefficient of frame $t$, computed from the static coefficients $c_{t-N}$ to $c_{t+N}$. A typical value for $N$ is 2. Delta-delta (acceleration) coefficients are calculated in the same way, but from the deltas rather than from the static coefficients.

Finally, the variance and standard deviation of the 13 MFCC coefficients, 13 differential coefficients, and 13 acceleration coefficients are calculated to increase recognition performance. The feature vector of each frame therefore consists of the variance and standard deviation values computed for the 13 MFCC coefficients, 13 differentials, and 13 accelerations. It is called MFCC 78. So, if the feature vector calculated for the $i$-th frame is $x_i$, it is represented as a 78-dimensional vector.

D. Model training

The objective of the Support Vector Machine algorithm is to find a hyperplane in N-dimensional space (where N is the number of features) that distinctly classifies the data points. Many possible hyperplanes could be chosen to separate two classes of data points. The chosen hyperplane must have the maximum margin, i.e., the maximum distance to the data points of both classes. Maximizing the margin provides some reinforcement so that future data points can be classified with more confidence. The loss function that helps maximize the margin is:

$$\min_{w}\; \lambda \lVert w \rVert^{2} + \sum_{i=1}^{n} \left(1 - y_i \langle x_i, w \rangle\right)_{+} \qquad (10)$$

After building the loss function, we take its partial derivatives with respect to the weights to find the gradients, which are then used to update the weights:

$$\frac{\partial}{\partial w_k}\, \lambda \lVert w \rVert^{2} = 2\lambda w_k, \qquad \frac{\partial}{\partial w_k} \left(1 - y_i \langle x_i, w \rangle\right)_{+} = \begin{cases} 0, & y_i \langle x_i, w \rangle \ge 1 \\ -y_i x_{ik}, & \text{otherwise} \end{cases} \qquad (11)$$
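To make the weight update concrete, the following is a minimal sketch of subgradient descent on the regularized hinge loss of (10), using the per-term gradients of (11). It is an illustration under stated assumptions, not the implementation used in this study: the function names, the toy two-dimensional data, the step size, and the number of epochs are all hypothetical, and the model has no bias term, matching the form $\langle x_i, w \rangle$ in (10).

```python
def dot(u, v):
    """Inner product <u, v> of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def hinge_loss(w, X, y, lam):
    """Objective of (10): lam * ||w||^2 + sum_i max(0, 1 - y_i <x_i, w>)."""
    reg = lam * dot(w, w)
    return reg + sum(max(0.0, 1.0 - yi * dot(xi, w)) for xi, yi in zip(X, y))


def hinge_subgradient(w, X, y, lam):
    """Subgradient of (10), assembled from the two cases of (11)."""
    grad = [2.0 * lam * wk for wk in w]      # derivative of lam * ||w||^2
    for xi, yi in zip(X, y):
        if yi * dot(xi, w) < 1.0:            # margin violated: term is -y_i x_ik
            for k, xik in enumerate(xi):
                grad[k] -= yi * xik
        # otherwise the hinge term contributes 0
    return grad


def train_linear_svm(X, y, lam=0.01, step=0.01, epochs=200):
    """Plain subgradient descent; lam, step, and epochs are illustrative values."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        g = hinge_subgradient(w, X, y, lam)
        w = [wk - step * gk for wk, gk in zip(w, g)]
    return w


# Toy, linearly separable data (hypothetical; real inputs would be MFCC 78 vectors).
X = [[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]]
y = [1, 1, -1, -1]

w = train_linear_svm(X, y)
predictions = [1 if dot(xi, w) >= 0 else -1 for xi in X]
```

In practice a library SVM with a kernel and a multi-class strategy would likely be used for the three breath classes; the sketch only shows how the two cases of (11) translate into a weight update.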