Research on solutions to improve the quality of underwater acoustic signal classification in shallow waters using artificial intelligence.
ON THE CLASSIFICATION OF BIOTIC AND ABIOTIC UNDERWATER ACOUSTIC SIGNALS
Biotic-abiotic underwater signals and underwater signal classification
The ocean environment, with its complex physical phenomena and characteristics such as scattering, refraction, acoustic ducts, the Doppler effect, multipath, etc., makes underwater acoustic signals rapidly changing and difficult to predict using radio-channel models [7]. In addition, due to these diverse physical characteristics, the classification of acoustic signals from both biotic and abiotic sources remains a challenge in both theoretical and practical research [4]. Traditionally, the main task of a sonar system is to focus primarily on processing and analyzing signals in the time-frequency domain [27], with trained sonar operators then primarily responsible for the classification tasks. However, since the beginning of the 21st century, practical applications as well as theoretical research on classifying underwater acoustic signals based on sonar principles have begun to apply artificial intelligence methods to complete the classification task automatically and more stably [148], [149], [150]. With the rapid development of artificial intelligence since the 2010s, it is a necessary and appropriate task to consider a systematic combination of traditional and modern methods to improve the accuracy of underwater acoustic signal classification.
Underwater sound signals are generated by various sources: human activities at sea, such as moving ships and fishing activities, and natural sources, such as the communication signals of marine mammals, seismic vibrations under the seabed, wind, lightning, etc. [74]. For example, the sound signal generated by a ship while moving on the sea consists of a broadband component (continuous frequency components in the spectrum) created by the propeller and its hydrodynamic interactions, and a narrowband component (discrete frequency lines in the spectrum) created by the propulsion system and other mechanical components [135]. Automatic classification of these signal types is a complex task because it depends on the speed of the ship, its operating period, and the status of the propulsion system, as well as on the complex and changing background noise and the diversity of sound transmission mechanisms in the ocean. In particular, in the shallow-water environment, which is greatly affected by background noise, multipath phenomena, and Doppler frequency shift, theoretical studies on ocean acoustics from the 1980s [4], [134] to recent years [7], [10] have focused on investigating the characteristics of sound propagation in water, analyzing the features of biotic and abiotic signal types, and explaining their physical nature, thereby providing appropriate classification solutions for signals from these sound sources.
The propagation of sound in the ocean depends on the characteristics of the water column (temperature, salinity, and pressure) and on phenomena at the ocean floor (scattering and acoustic reflection) [7]. For a temperature of T_p (measured in degrees Celsius), a depth below the water surface of zh (measured in meters), and a salinity of Sa (in parts per thousand), the speed of sound c in the ocean can be expressed by the following equation [10], [134]:

c = 1449.2 + 4.6 T_p − 0.055 T_p² + 1.39 (Sa − 35) + 0.016 zh     (1.1)
Formula (1.1) is one of the ways to determine the dependence of sound velocity on the variables of ocean acoustics.
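As an illustration, formula (1.1) can be evaluated directly in code. The sketch below is a minimal Python function (the name `sound_speed` is ours); it is written with the conventional negative sign on the quadratic temperature term, as in Medwin's simplified formula, which equation (1.1) follows.

```python
def sound_speed(T, Sa, zh):
    """Approximate speed of sound in seawater (m/s), after equation (1.1):
    T  -- temperature in degrees Celsius
    Sa -- salinity in parts per thousand
    zh -- depth below the surface in metres
    The quadratic temperature term carries the conventional negative
    sign of Medwin's simplified formula."""
    return (1449.2 + 4.6 * T - 0.055 * T ** 2
            + 1.39 * (Sa - 35.0) + 0.016 * zh)

# Warm shallow water: 20 degrees C, salinity 35 ppt, 50 m depth.
c_example = sound_speed(20.0, 35.0, 50.0)   # about 1520 m/s
```

The example reproduces the familiar trend that warmer water carries sound faster, which is exactly the gradient that bends rays in the water column.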
Sound waves follow Snell's law of refraction: the ratio between the cosine of the grazing angle ϕ(zh), measured from the horizontal plane, and the local sound velocity c(zh) at depth zh remains constant, as given by formula (1.2):

cos ϕ(zh) / c(zh) = const     (1.2)
Therefore, sound waves bend towards the region with lower sound velocity [74], resulting in different characteristics for each sound signal, which can create differentiation between warmer and colder geographic regions, between times of day or seasons, and within the same region at different depths. Consequently, influences from environmental factors such as temperature, depth, boundary interfaces, scattering, and refraction, among others, if not accounted for, can significantly impact the performance of a sonar system in practice.
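The invariant in formula (1.2) can be used to trace how a ray's grazing angle changes between layers of different sound speed. A minimal sketch (the function name `refracted_angle` is illustrative):

```python
import math

def refracted_angle(phi0, c0, c1):
    """Grazing angle of a ray entering a layer with sound speed c1,
    given its grazing angle phi0 (radians) where the speed is c0,
    using the Snell invariant cos(phi)/c = const of formula (1.2).
    Returns None when cos(phi1) would exceed 1, i.e. the ray turns
    back before entering the faster layer (total reflection)."""
    cos_phi1 = math.cos(phi0) * c1 / c0
    if cos_phi1 > 1.0:
        return None
    return math.acos(cos_phi1)

# A ray at a 10-degree grazing angle flattens out (smaller angle)
# when it enters a slightly faster layer: it bends towards the
# region of LOWER sound speed, as stated above.
phi_fast = refracted_angle(math.radians(10.0), 1500.0, 1510.0)
```

The `None` branch corresponds to the turning point of a refracted ray, the mechanism behind acoustic ducts mentioned earlier.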
In general, the effects of marine environmental characteristics on the trans- mission path of sound waves can be classified into three basic regions shown in Figure 1.1: deep water, shallow waters, and coastal zones [7], [10], [16], [74].
Figure 1.1: Partitioning of water areas based on depth
In deep water (regions with a depth greater than 300 meters below sea level), most transmission paths can be characterized by Snell's law of refraction [91], except for cases where sound waves are back-scattered from the seafloor (extremely deep-water zones), in which case the process depends on the signal frequency. Therefore, in deep-water zones, the most distinctive feature of the channel comes from the seafloor characteristics rather than from refraction.
In shallow waters (regions with a depth less than 300 meters), the effects of surface reflections, seafloor reflections, various refraction and scattering properties, and temperature differences over time must be considered [7], [134].
In coastal waters (regions very close to the shore, with depths below 20 meters), the main transmission paths of the signal are the direct path and the reflections from the water surface [4], [7], [10]. This concept is defined to describe the channel characteristics when sound waves propagate without reflecting from the seafloor and instead reflect repeatedly from the air-sea interface.
Therefore, the nature of sound transmission in the marine environment depends on the physical and chemical phenomena arising from interaction with the surface and seafloor, regions of different refractive indices, multipath phenomena, temperature, salinity, and other factors [4]. Compared to deep-water zones, shallow waters and coastal zones are more complex due to the additional factors that must be accounted for. The first: during the summer season, in accordance with Snell's law [7], sound waves bend more towards the seafloor than in the winter months, meaning that seafloor interaction is more pronounced during the hotter times of the year; therefore, sound transmission in shallow-water areas suffers greater attenuation in the summer than in the winter. In winter conditions, sound transmission can also be challenging, as greater scattering at higher frequencies produces attenuation, requiring more energy to maintain the mechanical oscillations between water molecules in the channel.
The second: scattering caused by rough boundaries or small obstacles is a process that causes signal attenuation. In contrast to reflection, scattering occurs with waves of different wavelengths, leading to the randomization of sound fields [143]. The attenuation caused by rough-surface scattering depends on the frequency of the sound field, while volume scattering (due to bubbles near the surface and floating bubble clusters generated by marine mammals) decreases with depth and creates time-varying variations [74].
The third: oceanographic processes vary over time, and surface ocean waves often create a channel with short-term variations that makes channel estimation and correction difficult; furthermore, the Doppler frequency-shift rate of wave propagation in acoustic channels is higher than in radio channels, affecting synchronization at receivers. The interference of the direct and surface-reflected paths creates the phenomenon known as Lloyd's mirror [91], which, combined with the conditions of shallow-water areas, makes detecting and classifying source signals by both active and passive sonar principles in areas such as docks and fishing ports a challenging task [44].
In summary, given the continuous and complex variations affecting sonar operation, the use of statistical feature-analysis techniques is a rational solution and the foundation for scientists to gradually apply the available techniques and results of artificial intelligence to process this class of signals.
The shallow-water environment is a complex space that contains numerous signals originating from both biotic and abiotic sources. Historically, most signals in the marine environment came from natural sources such as storms, earthquakes, and marine mammals. However, with the advent of the industrial age, underwater acoustic signals resulting from human activities have also increased, including transportation, offshore drilling, and sonar systems [113].
Table 1.1 provides statistics on the sound intensity of some large marine mam- mals that exist in the marine environment [32].
Table 1.1: Some communication signals of marine mammals
Biotic sound sources      Sound intensity (dB re 1 µPa @ 1 m)
Sperm Whale               163-223
Spinner Dolphin           108-135
Bottlenose Dolphin        125-173
Fin Whale                 155-186
Blue Whale                155-188
Gray Whale                142-185
Humpback Whale            144-174
Hammerhead Whale          128-189
Right Whale               172-187
Biotic sound sources are generated by fish, invertebrates, marine mammals, and other marine organisms. In nature, mammals have used sound for communication, navigation, detection of conspecifics and prey, and even echolocation similar to a sonar system [87], [105]. Marine animals can generate biotic sounds in many different ways [56]: for example, snowflake cod rattle their swim bladders [54], some emperor fish species use their teeth [68], and snapping shrimp strike their claws to create sounds in the range of 2-24 kHz [75]. Generally, small animals create sounds by rubbing hard body parts together, while large marine mammals like whales or dolphins produce acoustic signals from their lungs through the throat tube [116], which is relatively similar to humans.
Table 1.2 describes the sound intensity and primary frequency of different abiotic sound sources during offshore drilling [98].
Table 1.2: Abiotic sound caused by humans
Abiotic sound sources            Frequency (Hz)   Sound intensity (dB re 1 µPa @ 1 m)
Oil extraction                   4-38             119-127
Piling the drilling rig          30-40            131-135
Exploration drilling ship        20-1000          174-185
Semi-submersible drilling ship   10-4000          154
Classical approaches in underwater signal classification
In the field of ocean acoustics, an underwater signal is defined as a one-dimensional signal whose amplitude oscillates over time. The primary features of biotic and abiotic acoustic data are best collected in the frequency domain [10]; therefore, techniques such as the Fourier, Mel, and Wavelet transforms, which are time-frequency domain techniques, are the most useful solutions currently available for extracting information from underwater signals. In addition, some specific techniques based on frequency-domain transformations, such as DEMON (Demodulation of Envelope Modulation On Noise), LOFAR (Low-Frequency Analysis and Recording), and CMS (Cyclic Modulation Spectrum), have been researched and practically applied in sonar systems to extract the characteristic frequency components of signals generated by ship propellers during movement.
The Fourier Transform (FT) is used to extract frequency information from a time-domain acoustic signal and is one of the traditional approaches for both theoretical research and practical applications [20]. The Fourier Transform is mathematically represented by the formula:

X(w) = ∫_{−∞}^{+∞} x(t) e^{−jwt} dt     (1.8)
where t is the time and w is the angular frequency. The mathematical essence of the Fourier transform is to multiply a signal by a series of sine waves of different frequencies. If the product of the signal being analyzed and the sine wave of a certain frequency gives a large amplitude, there is a large overlap between the two signals, and the analyzed signal contains that particular frequency. Formula (1.8) demonstrates that the Fourier transform has high frequency resolution but zero time resolution, meaning that the frequency values in the signal can be precisely determined, but the timing of their appearance is unknown. In practical problems, this lack of time resolution can make it challenging to determine whether the signal belongs to the desired sound source.
Classical Fourier transforms are effective for stationary signals, as these contain stable frequency components from beginning to end. However, if a signal includes frequency components with different start and end times, the classical Fourier transform is not effective.
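This limitation is easy to demonstrate numerically: two signals with the same frequency content but different timing produce essentially the same magnitude spectrum. A small NumPy sketch:

```python
import numpy as np

fs = 1000                       # sampling rate (Hz)
t = np.arange(0, 1.0, 1 / fs)

# Signal A: 50 Hz and 120 Hz tones present for the whole second.
sig_a = np.sin(2 * np.pi * 50 * t) + np.sin(2 * np.pi * 120 * t)
# Signal B: 50 Hz in the first half second, 120 Hz in the second half.
sig_b = np.concatenate([np.sin(2 * np.pi * 50 * t[:500]),
                        np.sin(2 * np.pi * 120 * t[500:])])

freqs = np.fft.rfftfreq(len(t), 1 / fs)
spec_a = np.abs(np.fft.rfft(sig_a))
spec_b = np.abs(np.fft.rfft(sig_b))

# Both spectra peak at the same two frequencies: the transform tells
# us WHICH frequencies are present, but not WHEN they occur.
peaks_a = set(freqs[np.argsort(spec_a)[-2:]])
peaks_b = set(freqs[np.argsort(spec_b)[-2:]])
```

Both `peaks_a` and `peaks_b` contain 50 Hz and 120 Hz, even though the two signals are very different in time.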
Figure 1.7: Overlap technique of the STFT

The solution to this drawback is the Short-Time Fourier Transform (STFT), which divides the signal into segments, as shown in Figure 1.7, and then multiplies each segment of the non-stationary signal with a window function [155]. The length of the window function is equal to the length of the segment, and the signal is segmented into sufficiently small elements to meet the stationarity requirement [90], [107]. From the starting time of calculation t_0, the window is applied to successive signal segments by shifting the window by a certain percentage. For complex signals such as acoustic signals, choosing an appropriate window creates stability for the frequency components during the Fourier transform process in the time domain.
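The windowing-and-overlap scheme described above can be sketched with SciPy's `stft`, which recovers the timing information that the plain Fourier transform loses (the segment length and overlap values here are illustrative):

```python
import numpy as np
from scipy.signal import stft

fs = 1000
t = np.arange(0, 1.0, 1 / fs)
# Non-stationary signal: a 50 Hz tone followed by a 120 Hz tone.
x = np.concatenate([np.sin(2 * np.pi * 50 * t[:500]),
                    np.sin(2 * np.pi * 120 * t[500:])])

# Hann-windowed segments of 128 samples with 50 % overlap, in the
# spirit of the overlap scheme of Figure 1.7.
f, seg_times, Zxx = stft(x, fs=fs, window='hann', nperseg=128, noverlap=64)

# The dominant frequency of each segment now shows WHEN each
# component occurs, at the cost of coarser frequency resolution.
dominant = f[np.abs(Zxx).argmax(axis=0)]
```

Early segments are dominated by a bin near 50 Hz and late segments by a bin near 120 Hz, illustrating the time-frequency trade-off discussed above.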
In acoustic signal processing, the Mel-frequency cepstrum (MFC) is a representation of the short-term energy spectrum of sound signals, based on the linear cosine transform of the logarithmic energy spectrum on the nonlinear mel frequency scale [101].
Mel-frequency cepstral coefficients (MFCC) are the coefficients that form a Mel-frequency cepstrum [128]. The mel scale is defined as a function of the linear frequency axis using the formula:

mel(f) = 2595 log₁₀(1 + f/700)     (1.9)
The MFCC transformation, which involves the recreation of human auditory filters by matching them to problems of marine communication signal analysis, results in efficient information extraction and spectral feature retention, and is therefore suitable for subsequent artificial intelligence classification steps.
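The mel mapping and its inverse can be written directly; the sketch below assumes the standard constants 2595 and 700 of the common mel formula, which the garbled expression in the source appears to be:

```python
import math

def hz_to_mel(f):
    """Map linear frequency (Hz) to the mel scale:
    mel(f) = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse mapping, used when placing mel filter-bank centres."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# The scale is roughly linear below ~1 kHz and logarithmic above:
# equal mel steps cover ever wider frequency bands higher up.
band_low = mel_to_hz(500.0) - mel_to_hz(0.0)       # width of the first 500 mel
band_high = mel_to_hz(3000.0) - mel_to_hz(2500.0)  # width of a later 500 mel
```

This widening of bands with frequency is exactly the auditory-filter behaviour that makes mel features compact for classification.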
Wavelet Transform (WT) is capable of processing nonlinear and non-stationary signals, making it suitable for processing underwater sound signals in practical conditions [18].
There are two separate types of Wavelet Transform: the Continuous Wavelet Transform (CWT) and the Discrete Wavelet Transform (DWT) [22], with the CWT being more suitable for the purpose of extracting features of underwater sound signals [25].
The CWT of a function F(x) is defined as the integral transform:

W(γ, υ) = ∫_{−∞}^{+∞} F(x) Ψ*_{γ,υ}(x) dx     (1.10)

where γ is a scale parameter, υ is a position parameter on the time axis, Ψ_{γ,υ} is derived from the mother wavelet Ψ, and Ψ*_{γ,υ} is the complex conjugate of Ψ_{γ,υ}.

The structure of the basis functions is defined by the scaling and translation of the mother wavelet Ψ, with γ ≠ 0 and υ ∈ R:

Ψ_{γ,υ}(x) = (1/√|γ|) Ψ((x − υ)/γ)

Therefore, the CWT of a spatial signal sequence F(x) is:

f(γ, υ) := ⟨Ψ_{γ,υ}(x), F(x)⟩ = ∫_{−∞}^{+∞} Ψ*_{γ,υ}(x) F(x) dx     (1.11)
In the field of signal processing, the normalized wavelet transform is a representation of a continuous data sequence f(x) as a combination of approximation and detail components, given by the formula [35]:

f(x) ≈ Σ_k a_{j,k} ϕ_{j,k}(x) + Σ_k d_{j,k} φ_{j,k}(x) + … + Σ_k d_{1,k} φ_{1,k}(x)     (1.12)

where a_{j,k} and d_{j,k} denote the coefficients of the approximation and detail components, respectively; k is an integer that determines the number of coefficients in each component; ϕ and φ are the scaling function and wavelet function, respectively; and j is the lowest resolution level selected.
The wavelet transform, with appropriately sized window functions, can achieve high resolution in both the frequency and time domains. The essence of this technique is that, instead of orthonormal harmonic functions, frames consisting of shifted and compressed versions of a basis function are employed in both domains.
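The approximation/detail split of formula (1.12) can be illustrated with one level of the Haar wavelet, the simplest discrete wavelet. This NumPy sketch (function names ours) shows perfect reconstruction from the a and d coefficients:

```python
import numpy as np

def haar_dwt_level(x):
    """One level of the discrete Haar wavelet transform: split a
    signal of even length into approximation (a) and detail (d)
    coefficients, matching the a_{j,k} / d_{j,k} roles in (1.12)."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)  # low-pass / scaling function
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)  # high-pass / wavelet function
    return a, d

def haar_idwt_level(a, d):
    """Inverse of one Haar level: perfect reconstruction."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2.0)
    x[1::2] = (a - d) / np.sqrt(2.0)
    return x

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
a, d = haar_dwt_level(x)          # 4 approximation + 4 detail coefficients
x_rec = haar_idwt_level(a, d)     # reconstructs x exactly
```

Repeating the split on the approximation coefficients yields the multi-level decomposition of formula (1.12).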
The LOFAR technique is a broadband sonar signal processing technique first proposed in 1991 [98], which involves several steps. After beamforming is used to receive the beam, the signal is digitized using an analog-to-digital converter with a sampling frequency of fs. Next, a low-pass filter is applied to suppress aliasing, and the subsequent down-sampling yields a sampling frequency of fs/ρ. The resulting output is multiplied by a Hanning window to enhance the observed signals, generating continuous overlapping or non-overlapping signal blocks. Each block can be regarded as a stationary signal block and transformed into the frequency domain by the fast Fourier transform (FFT). The block diagram of the LOFAR algorithm is summarized in Figure 1.8.
The absolute values of the spectrum obtained, with a frequency range from DC to fs/2ρ, are used to estimate the background noise. Then, the spectrum is normalized using the Two-Pass Split-Window (TPSW) algorithm. The normalization step estimates the background noise appearing in each spectrum, eliminates bias, and balances the amplitude.
Other variations of LOFAR have replaced the FFT with the STFT to create representations in both the time and frequency domains. Results using the time-frequency distribution on a spectrogram have been more effective in detecting and classifying signals from propeller-driven ships [30], [33]. As the power spectral density of the radiated signal from a ship changes over time, stacking or non-stacking techniques are used on a case-by-case basis before extracting the envelope of the signal by taking the average value of each spectrum [11], [24]. With such features, LOFAR is widely used to analyze and process sonar signals generated by ship engines during operation.
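The LOFAR pipeline described above (decimation, Hanning window, block FFT) can be sketched in a few lines; the function name `lofar` and the parameter choices are illustrative, not the published algorithm's exact settings:

```python
import numpy as np
from scipy.signal import decimate

def lofar(x, rho=4, block=512, overlap=0.5):
    """LOFAR-style sketch: anti-alias low-pass filtering and
    down-sampling by rho, Hanning-windowed blocks, and an FFT of
    each block, stacked into a time-frequency (lofargram) matrix."""
    y = decimate(x, rho)               # low-pass filter + down-sample
    hop = int(block * (1 - overlap))
    win = np.hanning(block)
    rows = [np.abs(np.fft.rfft(y[s:s + block] * win))
            for s in range(0, len(y) - block + 1, hop)]
    return np.array(rows)              # shape: (n_blocks, block//2 + 1)

np.random.seed(0)
fs, rho = 8000, 4
t = np.arange(0, 2.0, 1 / fs)
# Synthetic narrowband machinery line at 60 Hz buried in white noise.
x = np.sin(2 * np.pi * 60 * t) + 0.5 * np.random.randn(len(t))
gram = lofar(x, rho=rho)
# Averaging the blocks makes the 60 Hz line stand out clearly.
line_hz = np.argmax(gram.mean(axis=0)) * (fs / rho) / 512
```

Averaging over blocks is the simplest form of the envelope-extraction step mentioned above; the TPSW normalization stage is omitted for brevity.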
Based on the hydrodynamic theory of propellers [45], the underwater signal x(t) is said to be cyclostationary of order k if and only if there exists a k-th order nonlinear transformation of the signal with sine-wave components of finite amplitude. In this case, x(t) is a cyclostationary process if its mean value and auto-correlation are periodic with period T. The cyclic auto-correlation function is a measure of the time correlation between the frequency-shifted values of the cyclostationary signal and is represented in the time domain by the formula:

R_x^α(τ) = lim_{T→∞} (1/T) ∫_{−T/2}^{T/2} x(t + τ/2) x*(t − τ/2) e^{−j2παt} dt

where x(t) is the input signal, τ is the delay, and α is the cyclic frequency of the periodicity. The spectral correlation density (SCD) is the Fourier transform of the cyclic auto-correlation function and is defined as:

S_x^α(f) = ∫_{−∞}^{+∞} R_x^α(τ) e^{−j2πfτ} dτ

The major advantage of cyclostationary processing is its ability to eliminate the effect of stationary background noise.
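A cyclic auto-correlation estimate along these lines can be computed directly; the sketch below (our own discretization, not the thesis's implementation) shows a strong cyclic feature at the true modulation frequency of a propeller-like AM signal and a negligible one at a mismatched cyclic frequency:

```python
import numpy as np

def cyclic_autocorr(x, fs, alpha, tau=0):
    """Discrete estimate of the cyclic auto-correlation R_x^alpha(tau):
    the time average of x(n + tau) * conj(x(n)) * exp(-j 2 pi alpha n / fs)."""
    n = np.arange(len(x) - abs(tau))
    prod = x[n + tau] * np.conj(x[n])
    return np.mean(prod * np.exp(-2j * np.pi * alpha * n / fs))

fs = 4000
t = np.arange(0, 2.0, 1 / fs)
# Propeller-like amplitude modulation: an 8 Hz envelope on a 400 Hz carrier.
x = (1.0 + 0.8 * np.cos(2 * np.pi * 8 * t)) * np.cos(2 * np.pi * 400 * t)

# A strong cyclic feature appears at alpha equal to the true modulation
# frequency; a mismatched alpha yields almost nothing.
strong = abs(cyclic_autocorr(x, fs, alpha=8.0))
weak = abs(cyclic_autocorr(x, fs, alpha=11.0))
```

Scanning α over a range of candidate shaft rates is, in essence, what the CMS technique exploits to isolate propeller modulation from stationary noise.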
Modern approaches in underwater signal classification
The classical approaches to underwater signal processing involve changing the way the data is represented, and do not involve any information classification. In the traditional sonar system [57], experienced operators combine statistical or basic linear models with contrast-comparison steps. However, in recent years, artificial intelligence in general, and deep learning (DL) in particular, have become a potential approach for classifying underwater acoustic signals that are contaminated by background noise. By increasing the statistical robustness, automation, and stability of computations that depend directly on the signal at processing time or on pre-existing records, solutions that use Restricted Boltzmann Machines (RBM), Auto-Encoder (AE) models, and Convolutional Neural Networks (CNN) are considered a way to increase operator confidence, as human factors can lead to discrepancies in assessing the same data pattern at different times. This improves the quality of classification of underwater acoustic signals in actual conditions.
The Restricted Boltzmann Machine (RBM) is a type of neural network that learns the probability distribution of the input acoustic data. The network architecture uses variables in the hidden layer h = (h_1, h_2, ..., h_N) to learn the distribution of the variables representing the input data x = (x_1, x_2, ..., x_M). Each value x_i is connected to h_j through the weight w_{i,j} [41]. By restricting the Boltzmann machine so that nodes within the hidden layer and within the input layer have no connections to each other, RBMs have been applied to many complex input-classification problems with rapid variation over the observation time, such as acoustic signals [86].
The RBM network is trained through an energy function with bias variables b_i and c_j reflecting the influence of each node x_i and h_j in the network, according to the formula:

E(x, h) = −Σ_i b_i x_i − Σ_j c_j h_j − Σ_{i,j} x_i w_{i,j} h_j     (1.21)
Then, the joint distribution of (x, h), with normalization constant Z, is:

p(x, h) = (1/Z) e^{−E(x, h)}     (1.22)
From formula (1.22), the conditional probabilities of the nodes x_i and h_j are calculated through the sigmoid activation function σ as:

p(h_j = 1 | x) = σ(c_j + Σ_i x_i w_{i,j})     (1.23)

p(x_i = 1 | h) = σ(b_i + Σ_j h_j w_{i,j})     (1.24)
Instead of calculating an output layer, the RBM reconstructs the input layer through the hidden layer activated by formulas (1.23) and (1.24). Therefore, the RBM network can simultaneously learn the distributions of both the hidden layer and the input layer, reducing the dimensionality of the input and directly updating the weights and bias values, making the model easier to train and faster to converge.
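The conditional probabilities (1.23) and (1.24) and the reconstruction step can be sketched in NumPy (the weights and layer sizes here are arbitrary toy values, not a trained model):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
M, N = 6, 3                          # visible and hidden layer sizes
W = rng.normal(0.0, 0.1, (M, N))     # weights w_{i,j}
b = np.zeros(M)                      # visible biases b_i
c = np.zeros(N)                      # hidden biases c_j

x = rng.integers(0, 2, M).astype(float)   # a binary input vector

p_h = sigmoid(c + x @ W)                  # p(h_j = 1 | x), formula (1.23)
h = (rng.random(N) < p_h).astype(float)   # sample the hidden states
p_x = sigmoid(b + h @ W.T)                # p(x_i = 1 | h), formula (1.24)
# p_x gives the probabilities with which the visible layer is
# reconstructed through the activated hidden layer.
```

Contrastive-divergence training alternates exactly these two conditional steps and adjusts W, b, and c from the difference between data and reconstruction statistics.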
The Auto-Encoder (AE) model in Figure 1.12 transforms multi-dimensional acoustic data into a lower-dimensional space by utilizing activation functions that enable the AE model to learn non-linear relationships [48].
The Auto Encoder [70] model is composed of three main components when processing sonar signals:
1. The Encoder: This component removes unnecessary points from the input signal while retaining the necessary features, compressing them into a feature vector with fewer dimensions than the original. The result is a smaller space that contains all the selected features of the input.
2. The Latent space: This is a bottleneck-like space containing feature vectors with the important information compressed from the input, with the following properties:
• The smaller the encoding space, the less overfitting occurs, as the model must select the more important information to carry through, and the capacity to retain unnecessary features is reduced. However, if the encoding space is too small, less information can be stored, making decoding difficult for the Decoder component. Therefore, achieving a balance in the size of the encoding space is an important consideration.
3. The Decoder: This component decodes the compressed signal from the encoding space to generate a new output that is as close as possible to the original input.

However, the AE model is still prone to overfitting when the encoded data dimensionality is low, because it solely aims to minimize the loss function without considering the structure of the data space. This leads to two issues in the AE's latent space:
• Discontinuity: nearby points in the latent space may produce very different decoded data.
• Incompleteness: some points in the latent space may produce meaningless content when decoded.
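A linear toy autoencoder illustrates the encoder/bottleneck/decoder roles and the reconstruction loss; this NumPy sketch (all sizes and the learning rate are illustrative) compresses 4-D data with an underlying 1-D structure through a 1-D latent space:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: 4-D points with an underlying 1-D structure plus noise,
# so a 1-D latent (bottleneck) space can capture most of the signal.
z_true = rng.normal(size=(200, 1))
X = z_true @ rng.normal(size=(1, 4)) + 0.01 * rng.normal(size=(200, 4))

W_enc = rng.normal(0.0, 0.1, (4, 1))   # encoder weights (4-D -> 1-D latent)
W_dec = rng.normal(0.0, 0.1, (1, 4))   # decoder weights (1-D latent -> 4-D)
lr = 0.05

init_mse = np.mean((X @ W_enc @ W_dec - X) ** 2)
for _ in range(1000):
    Z = X @ W_enc                      # encode: latent codes
    X_hat = Z @ W_dec                  # decode: reconstruction
    err = X_hat - X                    # reconstruction error
    # Gradient steps on the mean squared reconstruction loss.
    W_dec -= lr * Z.T @ err / len(X)
    W_enc -= lr * X.T @ (err @ W_dec.T) / len(X)
final_mse = np.mean((X @ W_enc @ W_dec - X) ** 2)
```

A real AE for sonar spectra would add nonlinear activations and more layers, but the objective, minimizing reconstruction error through a bottleneck, is the same, which is also why the latent-space issues above arise.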
The spectral representation of sonar data can be converted to image format, which involves representing data in two dimensions (spatial dimensions for images, time-frequency for spectrograms). Therefore, recent advances in Convolutional Neural Networks (CNNs) have improved the accuracy of classification solutions for sonar data [85], [112]. CNNs are configured in a feed-forward manner, but the layers are not fully connected: each node in a layer connects only to a subset of nodes in the next layer. CNNs have a set of connections from a node in a hidden layer to a number of nodes in the next layer [78]. This set of connections is called a filter and is reused in subsequent hidden layers. The filter is applied repeatedly across the nodes by changing the start and end points of the connections to generate new signal outputs.
If the spectral representation of sonar data is represented as an RGB image, i.e., the input data has 3 channels (Red, Green, and Blue) forming a 3-dimensional tensor, then the kernel is a 3-dimensional tensor of size k × k × 3, where k is the kernel size and the kernel depth is chosen to match the depth of the spectrogram. The kernel is then moved along the spectrogram and the convolution operation is performed. Each kernel learns different features of the spectrogram, so multiple kernels are used in each convolution layer to learn multiple attributes of the data. Since each kernel produces a matrix output, k kernels produce k output matrices, which are combined into a 3-dimensional tensor of depth k and size H × W × D. The output of this convolution layer becomes the input of the next layer.
Figure 1.13: Convolution calculation on an RGB image

The pooling layer is used between convolution layers to reduce the size of the data while preserving important features; the reduction in data size reduces computational complexity. With a pooling window of size (k × k), the input of the pooling layer, of size H × W × D, is divided into D matrices of size H × W. For each matrix, the maximum or average value within each (k × k) region is computed and placed in the resulting matrix. The output tensor at the last hidden layer has a size of H × W × D and is flattened into a vector of size H·W·D. Finally, the fully connected layer aggregates the features and produces the output of the network.
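The convolution and pooling operations described above can be sketched in NumPy for a single H × W × D input and several kernels (the shapes here are toy values):

```python
import numpy as np

def conv2d_single(image, kernel):
    """'Valid' 2-D convolution of an H x W x D image with a
    k x k x D kernel, producing one output feature map -- the
    operation each filter performs in a convolution layer."""
    H, W, D = image.shape
    k = kernel.shape[0]
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + k, j:j + k, :] * kernel)
    return out

def max_pool(fmap, k=2):
    """Non-overlapping k x k max pooling of a single feature map."""
    H, W = fmap.shape
    fmap = fmap[:H - H % k, :W - W % k]
    return fmap.reshape(H // k, k, W // k, k).max(axis=(1, 3))

rng = np.random.default_rng(0)
spec = rng.random((8, 8, 3))             # RGB-like spectrogram tensor
kernels = [rng.normal(size=(3, 3, 3)) for _ in range(4)]   # 4 filters
maps = np.stack([max_pool(conv2d_single(spec, kern)) for kern in kernels])
# maps.shape == (4, 3, 3): 4 kernels -> 4 pooled feature maps
```

Stacking the pooled maps along the depth axis is exactly how a convolution layer's output becomes the next layer's multi-channel input.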
The current state of research in underwater signal classification
In Vietnam, numerous studies in the field of underwater communication and sonar signal processing have achieved positive results, notably from research institutions such as the Military Institute of Science and Technology, Hanoi University of Science and Technology, and the Military Technical Academy. Many research outcomes of high scientific value have been published and applied in practice, such as the design and fabrication of underwater communication devices, acoustic channel modeling, and sonar signal processing using matched-field principles. The principles of underwater communication and analog signal processing have been used to investigate the design and fabrication of underwater communication devices. Frequency-hopping signal processing techniques combined with matched-filter systems have been applied to develop devices for managing underwater divers [5], while the orthogonal frequency-division multiplexing (OFDM) signal modulation model has been used to develop sonar data modems and to survey shallow-water channels [1], [97]. The investigation and evaluation of shallow-water channel models in Vietnam [2], and the use of environmental noise statistics within predetermined ranges for object detection using matched-field principles [6], [8], are also notable research outcomes.
Table 1.3 presents some prominent research statistics on the field of underwater acoustics by Vietnamese scientists in recent years.
Table 1.3: Some published researches in underwater signal processing in Vietnam
Underwater communication system    Analog processing          2006, [3]
Underwater communication system    Digital processing         2022, [1]
Channel modeling theory            Geometric channel survey   2017, [96]
Channel model evaluation           Actual channel survey      2020, [1]
Underwater target control system   Semi-active sonar          2020, [5]
Underwater target detection        Matched-field processing   2017, [6], [8]
Management of underwater targets   Passive sonar              2019, [9]
However, research on the problem of classifying sonar signals using artificial intelligence based on the principles of passive sonar is limited. Studies in this area currently focus on improving the efficiency of equipment usage in specialized units, such as underwater surveillance systems [5], [9] and communication systems [1], [3]. Worldwide, based on data obtained from sonar-principle acoustic sensors, three main research trends for classifying sonar signals have emerged: the first, the trend of using only deep learning and machine learning without pre-processing steps; the second, the trend of using pre-processing steps, such as biologically inspired filters and time-frequency domain transformations, before applying artificial intelligence; and the third, the trend of transferring learning from related fields. The main contributions and limitations of these trends are briefly described, compared, and analyzed below, leading to the research questions to be addressed in Section 1.5.
Instead of using a two-stage approach, a classification model using only neural networks takes raw, unprocessed sound signals and directly filters them at the nodes of the convolution neural network Table 1.4 presents some typical research results in this approach.
Table 1.4: Typical results using artificial intelligence only
Deep CNN    Surface ship(*)    Detect: 81.96%    2019, [151]

(*) No public information about the dataset used.
One of the earliest artificial intelligence models used to classify passive sonar signals in the Iranian sea, in 1998, was a solution using hidden Markov chains and Euclidean distance [103]. Improved models continued to develop in 2011 [104] and 2015 [89], achieving classification of three groups of ships of varying size, from small to large, with accuracy increasing from 65.69% [103] to 88.9% [89], despite varying signal-to-noise ratios (SNRs).
The Auditory Perception inspired Deep Convolution Neural Network (ADCNN) [151] is another model based on a convolution neural network used to classify underwater sound sources. ADCNN uses a multi-layered convolution filter in the initial processing stage to separate the raw signals in the time domain into a set of distinct frequency-domain signals. The output of the filter after performing the convolution operation is then fed into a max pooling layer and a fully connected layer. The network then merges the individual components into a more informative synthesis. Finally, the data stream passes through a decision layer to produce the classification result. The processing of the ADCNN network is a type of adaptive learning in which the sub-networks learn directly from the input data and directly extract the features of the sound signals.
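As a rough illustration of this kind of front end (not the ADCNN implementation itself), the sketch below applies a bank of hypothetical Hamming-windowed sinusoid kernels to a raw waveform and max-pools the rectified responses; the kernel shapes, sample rate, and sizes are illustrative assumptions:

```python
import numpy as np

def conv1d_bank(x, filters, stride=1):
    """Apply a bank of 1-D filters to a raw waveform (valid convolution)."""
    k = filters.shape[1]
    n_out = (len(x) - k) // stride + 1
    out = np.empty((filters.shape[0], n_out))
    for i, w in enumerate(filters):
        for j in range(n_out):
            out[i, j] = np.dot(x[j * stride : j * stride + k], w[::-1])
    return out

def max_pool1d(a, size):
    """Non-overlapping max pooling along the last axis."""
    n = a.shape[-1] // size
    return a[..., : n * size].reshape(*a.shape[:-1], n, size).max(axis=-1)

# Hypothetical band-pass-like kernels: Hamming-windowed sinusoids at 50 and 200 Hz.
fs = 1000.0
tk = np.arange(64) / fs
bank = np.stack([np.hamming(64) * np.sin(2 * np.pi * f * tk) for f in (50.0, 200.0)])

# A 50 Hz tone excites the 50 Hz kernel far more strongly than the 200 Hz one.
sig = np.sin(2 * np.pi * 50.0 * np.arange(1024) / fs)
feat = max_pool1d(np.abs(conv1d_bank(sig, bank)), size=8)
```

Here the filtering stage is fixed; in ADCNN the kernel weights are learned from the data, so the sub-networks effectively discover their own filter bank.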
In addition, the decision layer can be replaced by other solutions such as the Extreme Learning Machine (ELM) [62], an approach that produces better classification results than fully connected neural networks (FCNN). The results presented in [61] show that the CNN-ELM model can achieve an identification rate of 93% on a dataset of civilian ships, which is 6% higher than FCNN methods. CNN-ELM is similar to the ADCNN model, in which features are extracted from one-dimensional convolution layers that act as band-pass filters.
Similarly, a convolution neural network model based on the DenseNet architecture, named Underwater Acoustic Target Classification DenseNet (UATC-DenseNet) [38], was designed to classify underwater acoustic sources directly from raw sound signals. The UATC-DenseNet model consists of a deep convolution network with a dense architecture trained to detect 12 object classes. The dataset used was labeled by a sonar expert, containing 11 different sound classes and one corresponding to environmental background noise. The model was adjusted for the number of filter layers, depth, layer configuration, and various input features to test accuracy. The published results demonstrate that a six-layer filter architecture, using raw audio data as input without pre-processing, performed better than some other published classical methods, such as CNN-ELM [61], ResNet18 [55], and SqueezeNet [64].
To address the lack of underwater sound data, the Auto-Encoder method and the Restricted Boltzmann Machine (RBM) Auto-Encoder method [86] have been used and have achieved positive results from 2020 to the present. The RBM model has been shown to be superior to the traditional Gaussian Mixture Model (GMM) with a Gammatone frequency coefficient feature layer. In this way, the use of an unsupervised feature extraction method with RBM (to provide feature extraction and signal reconstruction capabilities), combined with the power spectrum and modulation spectrum in the input data (to recognize frequency variation of the original audio signal), is a highly effective method for identifying ship signals and marine mammal signals from passive sonar signals.
A Multiscale Residual Deep Neural Network (MSRDN) [132] was developed from a deep residual shrinkage network [156] to classify underwater sound signals by directly modeling the input waveforms. The purpose of using raw, unprocessed signals in this study was to limit the physical structure changes of waveforms when converting signals to the time-frequency domain, and to reduce dependence on the sliding window of the Fourier transform (FT) and the displacement of the FT window, resulting in improved classification performance compared to Auto-Encoder and machine learning models.
The Convolution Neural Network architecture [60], [66], [123], [151] utilizes a multi-layered convolution filter in the initial processing stage to separate raw signals in the time domain into a set of distinct frequency-domain signals. The output of the filter after convolution is then fed into a max pooling (MP) layer and a fully connected (FC) layer. Subsequently, the network aggregates the individual components into a synthesized representation containing more information. Finally, the data stream passes through a decision layer to produce a classification result. Some studies [146], [154] have optimized the use of features expressed in multiple layers by utilizing appropriate shortcut connections. The most important point of this solution is to determine the interconnection structures and find appropriate parameters for the classification objective.
Overall, the processing of the convolution network is essentially a type of adaptive learning in which sub-networks learn directly from the input data and extract features of the audio signal. The advantage of this trend is the ability to extract information without being constrained by prior assumptions about the nature of the identified object and the environment. However, direct learning from unprocessed data makes the system more complex and computationally intensive. Additionally, most CNN networks used in this trend are inherited from image processing problems, with variants based on adjusting methods initially developed for image recognition. Thus, developing specialized models that only use artificial intelligence and optimize network structures to process raw audio signals remains a research problem that needs to be continued.
1.4.2 The use of pre-processing combined with artificial intelligence
An important characteristic of underwater acoustic signals is the frequency component distribution in the time domain, so approaches using frequency-domain and time-frequency-domain transforms have been used to classify propeller ship signals and marine organism communication signals. Table 1.5 summarizes the main results of this trend.
Table 1.5: Typical results using pre-processing combined with artificial intelligence

Pre-process         | Classify  | Target               | Result           | Publish
DEMON               | ICA       | Surface ship(*)      | Detect: 82.7%    |
LOFAR               | SVM       | Surface ship(*)      | Classify: 82.14% | 2015, [93]
DEMON-ICA           | FCNN      | Surface ship(*)      | Classify: 61.27% | 2018, [71]
Median-filter DEMON | FCNN      | Surface ship(*)      | Classify: 85.4%  | 2018, [71]
DEMON               | CNN/MLP   | Surface ship(*)      | Classify: 80.1%  | 2020, [83]
Correlation         | CNN       | Surface ship(*)      | Detect: 80%      | 2022, [118]
Mel                 | Inception | Marine mammals(*)    | Detect: 84.42%   | 2018, [139]
WMFCC               | KNN/SVM   | Marine mammals(*)    | Classify: 88.9%  | 2018, [65]
GFCC                | FCNN      | Marine mammals(*)    | Detect: 94.3%    | 2019, [140]
Cochlea             | CNN       | Marine mammals(*)    | Classify: 87.72% | 2020, [122]
STFT                | CNN       | Marine mammals(*)    | Classify: 97.13% | 2021, [14]
STFT                | SNN       | Marine mammals(*)    | Detect: 96.3%    |
STFT                | CNN       | Marine mammals(****) | Classify: 80.45% | 2022, [94]

(*) No public information about used dataset
Chapter 1 conclusion
Signal classification of underwater acoustics is one of the fundamental tasks in the overall problem of underwater target recognition and localization. Numerous classical and modern solutions have been used to classify signals from targets belonging to one class or another based on the principles of passive sonar. The essence of this principle is to focus on processing the sound signals generated from the object and transmitted in water. By analyzing the unique features representing each of these signals, researchers can classify these sound sources.
Chapter 1 of the thesis provides an overview of the research issues, including the theoretical basis of ocean acoustics, a general introduction to the operating principles of underwater acoustic signal classification systems based on the passive sonar principle, an analysis of classical and modern approaches, solutions for acoustic signal processing and feature extraction, and the development and trends of artificial intelligence in classification models. The thesis also systematically and preliminarily analyzes the solutions that have been used in publications, compares the advantages and disadvantages of published studies, and identifies existing issues that will be addressed. Finally, the thesis selects two specific groups of targets in general shallow-water areas, namely ship propeller signals and marine mammal communication signals, as inputs for the classification model and presents proposed solutions for classifying acoustic signals generated from the selected objects.
OF PROPELLER SHIP
The formation process of the propeller ship signals during movement
• The first type: generated by the vibration of the engine, machinery, and equipment on the ship during operation;
For the first and second types of characteristic signals, energy from the vibration and noise emitted from the moving ship's body propagates into the environment. This process continues in cycles at a low intensity while the ship moves. However, the propulsion engines and auxiliary systems on the ship generate acoustic signals at characteristic frequencies related to specific operations. The ship's hull transmits vibrations from engine oscillations into the water, and vibration from the main engines creates frequencies related to engine speed and the number of cylinders. According to mathematical analysis and the actual structure of the propulsion system on various propeller ships [100], the main operating range of the propulsion engine is the primary signal source that causes oscillations and is inversely proportional to the engine size. Therefore, a larger engine will generate low-frequency vibrations in a narrower range (1 Hz-10 Hz), while a smaller engine will create higher-frequency vibrations in a broader range (10 Hz-200 Hz) [152]. Harmonics of mechanical frequencies will also appear during the oscillation of the ship's body and the rotational motion of the propeller. By comparing the engine frequency and the propeller rotation, the transmission parameters of the ship can be calculated.
Figure 2.1: Real cavitation noise when a propeller rotates in an underwater environment
For most ships of any size, the propeller is the primary source of noise pollution [13]. The noise pollution from the propeller is mainly generated by cavitation. When the propellers rotate, water bubbles appear in low-pressure areas on the propeller blades. These bubbles then detach from the propeller and collapse randomly, as shown in Figure 2.1 [100]. Therefore, the main object of propeller ship signal processing is the cavitation noise generated by the propellers. Accurately extracting the characteristic frequencies of the ship from the cavitation phenomenon helps the classification system improve its quality.
The phenomenon of cavitation not only significantly reduces the operating efficiency of ships by destroying the physical structure of the propeller blades, but also constitutes the primary source of noise during the ship's movement. Therefore, in order to select an effective processing technique, it is necessary to analyze the physical nature and the effects of the phenomenon of bubble cavitation.
Figure 2.2: Water bubbles bursting during cavitation
The rotating propeller blades absorb the torque of the engine and convert it into thrust to propel the ship in the water environment. According to Bernoulli's law [76], when the ship moves in the water using the propeller, a positive pressure appears on the surface of the propeller blade and a negative pressure appears behind it. This pressure difference is a physical phenomenon that cannot be avoided in the process of converting torque into the ship's propulsive force, and it also creates air bubbles around the propeller. These bubbles appear and burst when the force acting on the bubbles exceeds the tensile strength of the water's outer surface, as shown in Figure 2.2 [117], producing energy pulses that propagate in the water environment [44].
This process generates wideband noise (around 15-20 kHz) that is modulated by the propeller blade rotation cycle [23]. The surface of the propeller blades also creates a uniform flow on each rotation. Changes in the velocity of the inflowing stream are due to the resistance force of the ship's hull, causing a sudden decrease in inflow velocity, changing the angle of attack and rapidly reducing the pressure on the surface of the propeller blades. This phenomenon intensifies bubble cloud formation and cavitation. As a result, the wideband noise generated by these bubbles is amplitude-modulated at the frequency of the propeller blade's rotational speed multiplied by the number of blades [47].
The repetition of such mechanical processes creates unique characteristics of each ship and each class of ship. Therefore, an underwater signal recording may contain two main noise components: the additive component and the modulated component [82], corresponding respectively to the LOFAR and DEMON algorithms [98], which will be used to process the underwater signal generated by the ship propeller. Depending on the specific problem and practical conditions, the LOFAR and DEMON algorithms will be applied to process the propeller noise signal.
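The DEMON envelope idea can be sketched as follows; the synthetic signal, sample rate, and square-law detector here are illustrative assumptions, not the thesis implementation or any specific system's processing chain:

```python
import numpy as np

fs = 8000.0                      # assumed sample rate
t = np.arange(int(fs * 4)) / fs  # 4 s of signal

# Synthetic cavitation-like signal: wideband noise amplitude-modulated at a
# 6 Hz blade rate (e.g. a 2 Hz shaft rate times 3 blades).
rng = np.random.default_rng(0)
blade_rate = 6.0
s = (1.0 + 0.8 * np.cos(2 * np.pi * blade_rate * t)) * rng.standard_normal(t.size)

# DEMON core: square-law envelope detection, DC removal, envelope spectrum.
env = s ** 2
env -= env.mean()
spec = np.abs(np.fft.rfft(env * np.hanning(env.size)))
freqs = np.fft.rfftfreq(env.size, 1.0 / fs)

# The strongest line in the low-frequency band sits at the blade rate.
band = (freqs > 1.0) & (freqs < 50.0)
detected = freqs[band][np.argmax(spec[band])]
```

Squaring the signal moves the modulation of the wideband noise down to baseband, which is why a spectral line appears at the blade-rate frequency even though the carrier is broadband.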
Proposal of spectral amplitude variation pre-processing
Traditional techniques based on the DEMON algorithm have been a well-known solution for processing and classifying propeller underwater signals, and have been applied in nearly all military sonar systems from World War II to the Cold War era, such as the USA's SURTASS [53] and Sweden's HAWKeye [40], as well as modern devices such as Israel's BlackFish [19] and Norway's KongBerg [21], demonstrating the applicability and accuracy of the DEMON algorithm and its variants in practice. However, faced with the requirements of modern underwater signal classification problems, the DEMON algorithm also faces two limitations that affect the processing results of underwater signals, namely:
• The first limitation is the design of bandpass filters. In theoretical simulations or real systems, this task is assigned to sonar operators, who rely on their experience and formal training to adjust the bandwidth manually to select the periodic components in the sonar signal related to the modulation of the cavitation signals. A solution to replace humans at this stage with adaptive filtering based on fuzzy logic analysis [73] has been proposed, but it still reduces to the traditional DEMON algorithm, meaning that a specific bandpass filter still needs to be selected;
• The second limitation is the development of new types of propeller thruster engines. A new trend in ship engines was opened up by the well-known Schottel engine company in 2015 by introducing versions of a new type of thruster engine system with solutions for propeller and SRT thruster engines. The new product SRT (Schottel Rim Thruster) [147] is an electrically driven thrust system without a gearbox or transmission shaft. The stationary part of the electric motor is attached to the outside of the connecting tube, and the propeller blades are attached to the inside of the rotating chamber. This creates a space-saving and lightweight thrust device that reduces transmission line losses, limits engine noise to a very low level, and attenuates signals in the rotating chamber before they are emitted into the environment. This is the main reason the structure of cavitation noise signals has changed, significantly affecting the detection performance of the DEMON algorithm, which only focuses on processing frequency information about the envelope of the cavitation noise.
Transforming the second-order periodic components in the torque signal of a propeller into functions containing first-order periodicity, through signal envelope transformation and signal processing techniques in the time-frequency domain, has been an effective approach to date. However, the dependence on selecting the range of the bandpass filter limits these techniques.
From the theoretical analyses in Section 1.2.4 and Section 2.1, the noise signal generated by the propeller s(t) is modeled as a modulated periodic signal m(f, t) with a cycle of T_k = 2π/k, and the cavitation noise n_k(t) is modulated by amplitude modulation.
Using equation (1.18) from Nielsen [98], the signal s(t) is expanded as formula (2.1) below:

s(t) = m_{w_k}(t) n_k(t)    (2.1)
The modulated signal m_{w_k}(t) with a cycle of T_k is the result of the propeller blade passing through the flow. Although m_{w_k}(t) is periodic and can be represented by a Fourier transform to find the rotation frequency of the propeller, its specific modulated waveform is difficult to determine, making it challenging to determine the Fourier coefficients accurately.
The two components of propeller cavitation noise can be considered statistically independent, because the structure of the flow that creates the m_{w_k}(t) signal and the shape and structure of the propeller blade, as well as the water bubble cloud that generates the n_k(t) signal, are not directly related to each other. In the discrete-time domain, the cavitation noise is a sequence of uncorrelated random variables with a mean of 0 and a finite variance, so the spectrum of the signal x(t) can be regarded as white noise.
According to the Rosenblatt central limit theorem [111], the signal s(t) follows a Gaussian distribution, because s(t) is the weighted total of the formation and bursting processes of water bubbles, and the bubble bursting process is balanced in all directions, so the mean value of s(t) is 0. Thus, this signal has a normal amplitude distribution with an expected value of 0.
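The zero-mean property of the modulated model s(t) = m_{w_k}(t) n_k(t) can be checked numerically; the modulation waveform below is an assumed example, not a measured one:

```python
import numpy as np

# Numerical check of the zero-mean claim for s(t) = m_wk(t) * n_k(t):
# a periodic modulation times zero-mean white noise stays zero-mean,
# while the per-sample variance follows m^2(t) * sigma^2.
rng = np.random.default_rng(1)
t = np.arange(200_000) / 8000.0
m = 1.0 + 0.5 * np.cos(2 * np.pi * 4.0 * t)   # assumed modulation waveform
n = rng.standard_normal(t.size)               # zero-mean white noise, sigma = 1
s = m * n
```

The empirical mean of s stays near 0 and its overall variance matches the time average of m²(t)σ², consistent with the derivation that follows.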
Based on the signal model in equation (2.1), let φ_i(t) be a family of basis functions for the orthogonal projection of the signal x(t) onto the spatial domains. The basis functions are chosen such that the projections of the signal onto different basis functions are uncorrelated with each other.
If the observation time T is chosen to be sufficiently large to observe the projected signal, then the projected signal is defined by the formula:

x_i = ∫_0^T x(t) φ_i(t) dt    (2.2)

for i = 0, 1, 2, ..., ∞. Since the projected signals x_i are orthogonal, each pair of signals x_i and x_j must follow the rule of the orthogonal function [137], which is given by the formula:
E{x_i x_j} = λ_i δ_ij    (2.3)

where E is the expected value and δ_ij is the Kronecker delta, with a value of 1 if i = j and 0 if i ≠ j.
The observation time T is much larger than the cycle T_k of the modulated signal, to ensure that the signal s(t) can be observed. By substituting formula (2.3) into formula (2.2), then:

λ_i δ_ij = E{x_i x_j} = ∫_0^T ∫_0^T E{x(t) x(u)} φ_i(t) φ_j(u) dt du    (2.4)
According to formula (2.3), formula (2.4) is satisfied only if the following integral equation holds:

λ_j φ_j(t) = ∫_0^T E{x(t) x(u)} φ_j(u) du    (2.5)
The covariance function of the cavitation noise is given by the formula:

E{s(t) s(u)} = m_{w_k}(t) m_{w_k}(u) E{n_k(t) n_k(u)}    (2.6)
Since n_k(t) is white noise, E{n_k(t) n_k(u)} = 0 for u ≠ t, hence E{s(t) s(u)} = 0 if u ≠ t. If t = u, then:

E{n_k(t) n_k(u)} = σ² δ(t − u)    (2.7)
By substituting formula (2.7) into formula (2.6), the covariance function of the spreading sequence is given by the formula:

E{s(t) s(u)} = m_{w_k}(t) m_{w_k}(u) σ² δ(t − u)    (2.8)

From the covariance function (2.8), solving the integral equation (2.5) gives:

λ_j φ_j(t) = ∫_0^T m_{w_k}(t) m_{w_k}(u) σ² δ(t − u) φ_j(u) du = m²_{w_k}(t) σ² φ_j(t)    (2.9)
The expected value of each spreading sequence sample will be 0, because the mean value of the cavitation noise along the phases of the cavitation noise is 0, then:

m_i = E{s(t_i)} = 0
Environmental noise, in cases that follow the Gaussian distribution, will have a mean of 0 and a variance of N_0/2, where N_0/2 is the power spectral density of the white noise.
Therefore, the i-th signal of the spreading sequence will also have a mean of 0 and a variance σ_i calculated by formula (2.12) as follows:

λ_i = m²_{w_k}(t_i) σ² + N_0/2    (2.12)
Thus, the distribution at time t_i of the signal is defined as:

p(x(t_i)) = (1 / √(2π(m²_{w_k}(t_i) σ² + N_0/2))) · exp(− x²(t_i) / (2(m²_{w_k}(t_i) σ² + N_0/2)))    (2.13)

and the general formula (2.14) applies to all time steps t_i of the signal, since m²_{w_k}(t_i) is periodic with characteristic frequency w_k.
Next, the STFT transform is used to overcome the inflexibility of the FFT transform in the DEMON algorithm. The STFT adds a time dimension to the parameters of the basis function by multiplying an infinite complex exponential function with a window:

b_{(w,t_0)}(t) := w(t − t_0) exp(iwt)    (2.15)

where w(t) is a window function and (w, t_0) is the time-frequency coordinate of the basis function. The general formula for the STFT is given by:

STFT{x}(w, t_0) = ∫ x(t) w(t − t_0) exp(−iwt) dt    (2.16)
Formula (2.16) provides information on both the time and frequency domains by selecting the size of the window. The result of this transformation can also be regarded as a bandpass filter whose frequency response is the Fourier transform of the window function w(t) shifted to the center frequency w. Therefore, all filters have the same bandwidth.
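A minimal STFT of this form can be written directly from formula (2.16); the window length and test tone below are illustrative:

```python
import numpy as np

def stft(x, win, hop):
    """Plain STFT: slide a window over x and take the FFT of each frame."""
    frames = [x[i:i + win.size] * win
              for i in range(0, x.size - win.size + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1))

fs = 1024
t = np.arange(4 * fs) / fs
x = np.sin(2 * np.pi * 100.0 * t)            # 100 Hz test tone

# A window of fs samples gives a bin spacing of exactly 1 Hz.
S = stft(x, np.hanning(fs), hop=fs // 2)
freqs = np.fft.rfftfreq(fs, 1.0 / fs)
peak = freqs[np.argmax(S.mean(axis=0))]
```

Each row of S is one windowed spectrum, so each FFT bin behaves like one bandpass filter of the bank described above, all with the same bandwidth set by the window.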
In summary, the proposed amplitude variation algorithm aims to improve the accuracy of extracting characteristic frequencies from underwater acoustic signals by computing the variance of the amplitude gradient and the probability distribution, combined with the time-frequency domain transformation.
Based on the variance transformation formula (2.11), the signal distribution formula (2.14), and the time-frequency domain transformation formula (2.16), the block diagram of the proposed algorithm is presented in Figure 2.3. The algorithm execution steps and flowchart are depicted in Figure 2.4 to enhance the ability to extract characteristic frequencies from underwater acoustic signals.
Figure 2.3: Block diagram of the proposed algorithm

The execution steps of the proposed algorithm include the following:
• Step 1: Consider the input to the algorithm as underwater acoustic signals that are segmented into equal-length samples. Then, the frequency variance variation calculation is applied to each segment of the signal.
• Step 2: The signal spectrum is calculated by the STFT transformation so that the nominal frequency resolution is 1 Hz. A 2D spectrum matrix is then obtained, with one axis representing frequency in Hz and the other axis representing the number of samples over time in seconds.
• Step 3: The frequency amplitude of each segment is averaged over the entire range to obtain a unique value as the feature peak.
• Step 4: The stack-piling technique is used to reduce the variance and increase the signal-to-noise ratio (SNR) value. This technique divides the underwater acoustic signal x(t) into overlapping consecutive segments. As the signal is divided into more overlapping segments, the variance decreases accordingly, and the probability of accurately estimating the characteristic frequency of the signal increases.
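The four steps above can be sketched as follows; this is a simplified reading of the proposed pipeline with assumed parameters (segment length, overlap, window), not the thesis code:

```python
import numpy as np

def av_features(x, fs, overlap=0.5):
    """Simplified sketch of Steps 1-4: overlapping 1 s segments (stack-piling),
    an STFT with ~1 Hz bins, and averaging of the segment spectra."""
    win = int(fs)                            # 1 s window -> 1 Hz frequency bins
    hop = int(win * (1 - overlap))
    w = np.hanning(win)
    spectra = [np.abs(np.fft.rfft(x[i:i + win] * w))
               for i in range(0, x.size - win + 1, hop)]
    S = np.stack(spectra)                    # 2-D matrix: segments x frequency
    return S.mean(axis=0)                    # averaged spectrum; peaks = features

fs = 2048
t = np.arange(8 * fs) / fs
rng = np.random.default_rng(2)
x = np.sin(2 * np.pi * 60.0 * t) + 0.5 * rng.standard_normal(t.size)
mean_spec = av_features(x, fs)               # a 60 Hz line survives the averaging
```

Averaging the overlapping segment spectra plays the role of stack-piling: the noise bins fluctuate and average down, while the characteristic frequency line stays coherent.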
(The flowchart takes Rate, α, and maxFreq as input parameters, smooths each spectrum sample with a Hamming window, computes the amplitude variation AV[j] between adjacent spectral samples, and retains a peak when AV[j] exceeds α times the standard deviation of the sample.)
Figure 2.4: Flowchart of proposed algorithm for processing propeller signals
2.2.4 Evaluation of the proposed algorithm on actual ship data
The results of the proposed algorithm for variable instantaneous frequency estimation are verified and compared with the DEMON algorithm using the Hilbert filter presented in Figure 1.11 in Section 1.2.4, on actual propeller ship signal samples from the ShipsEar dataset [114]. To fully evaluate the effectiveness of extracting characteristic frequency components from raw data, the proposed algorithm and the DEMON-Hilbert algorithm [106] are tested on recordings containing characteristic operational states of ships in shallow waters. Specifically, three main cases are considered:
• The second case: a ship is moving steadily at a constant speed against a noisy background;
The pre-processing and testing process is conducted on a Dell T3600 Xeon 8-core workstation with an NVIDIA K2200 4GB graphics card, running the Ubuntu 18.04 operating system with CUDA 10.1 and cuDNN 7.6.5.
Figure 2.5: Signal detection by DEMON-Hilbert on Record-1
Figure 2.6: Signal detection by DEMON-Hilbert on Record-2
Figure 2.7: Signal detection by DEMON-Hilbert on Record-3
The results in Figure 2.5 indicate that the DEMON-Hilbert algorithm missed many neighboring frequency components, due to its tendency to consider neighboring frequencies to be within a single signal envelope. For signals generated from a ship moving steadily, the DEMON-Hilbert algorithm extracted the correct characteristic frequency components, as shown in Figure 2.6, but could not detect low-frequency components with weak intensity, as in Figure 2.7, when the ship began to move.
B. The proposed algorithm is applied to process:
Figure 2.8: Signal detection by the proposed AV algorithm on Record-1
Figure 2.9: Signal detection by the proposed AV algorithm on Record-2
Figure 2.10: Signal detection by the proposed AV algorithm on Record-3
Proposal of a customized convolution neural network
The majority of traditional sonar systems rely solely on the ability to directly listen to physical sounds, combined with spectral analysis by experienced sonar operators, as depicted in Figure 1.5. However, the human factor is not immutable: different sonar operators analyzing and processing the same signal sample at different times can yield different results. The application of artificial intelligence to the system model is considered a supportive measure that can increase confidence and provide a reference for sonar operators, helping the system operate more stably and maintaining the stability of the sonar system.
The sonar systems currently used in practice [10] provide sonar operators with spectrogram and "spectrogram portrait" images that represent information about the frequency content of desired signals. Therefore, the thesis has chosen the time-frequency transformation for pre-processing, and the result of this step is the "spectrogram portrait" images. Thus, to be suitable for practical applications in underwater signal classification systems, the thesis has used convolution neural networks to research and compare results with published results on the same actual dataset.
The CNN feature extraction is based on the principle of convolution, which helps the model understand the spatially dependent levels of information and the temporal correlations between pixels in the spectrogram data. The multi-layered convolution filters separate the spectrogram areas into separate components. The output of the convolution operations is then fed into pooling layers and globally connected layers to merge the individual components into a common representation that contains more complete information. The convolution operations reduce the dimension of the spectrogram to reduce the computational load, while still maintaining stable prediction results without losing important features of the data. The convolution neural network ensures stability when designing an architecture that is compatible with large-sized spectrogram data and can extract deep features of the spectrogram.
In typical image processing problems, the processing techniques consider each input image as a matrix of numbers, where the set of pixel points is represented as numbers in a specific color system, typically the "RGB" (Red-Green-Blue) system. The challenge is to bridge the gap between this numerical matrix and the semantic information contained in the image. Traditional neural networks are not very effective for image input data. In the dataset processed in Section 2.2.4, if each pixel is treated as an attribute, an input image with a size of 224×224×3 will have 150,528 attributes. If the image size increases to 1000×1000, there will be 1 million (1M) attributes for each input image. If a fully connected network is used and the second layer has 1000 components (network nodes in a layer), the weight matrix would have a size of 1000×1M, equivalent to 1 billion parameters to be trained. This requires a huge amount of computation and often leads to overfitting due to insufficient training data. The convolution architecture performs better for image datasets by reducing the number of related parameters and allowing the reuse of weights. In summary, the convolution model can understand spectral information in images better than other models.
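The scale argument can be made concrete; the filter count (64) and kernel size (3×3) below are illustrative choices, not the proposed architecture's configuration:

```python
# Parameter counts behind the argument above: a fully connected first layer
# versus a small convolution layer on the same 224 x 224 x 3 input.
h, w, c = 224, 224, 3
fc_inputs = h * w * c                       # 150,528 attributes per image

# Fully connected: every attribute wired to each of 1000 hidden nodes.
fc_params = fc_inputs * 1000

# Convolution: 64 filters of size 3 x 3 x 3 reuse the same weights everywhere.
conv_params = 64 * (3 * 3 * c)
```

The convolution layer trains a few thousand weights instead of roughly 150 million, which is the weight-reuse advantage described above.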
Therefore, using a Convolution Neural Network to extract and classify underwater acoustic signals based on the pre-processed spectrogram dataset discussed in Section 2.2.4 is an effective and reasonable solution.
There have been many studies utilizing Convolution Neural Networks for the classification of biotic and abiotic underwater signals, as documented in Tables 1.4 and 1.5. In this thesis, the proposed network architecture analyzes spectrograms as pre-processed input data, as outlined in Section 2.2.4.
The customized CNN architecture proposed for acoustic signal classification is illustrated in Figure 2.17, comprising:
Figure 2.17: Structure and configuration of the proposed CNN
The main purpose of the four convolution layers (A, B, C, D) is to extract high-level features, such as edges, from the input spectrogram. The first convolution layers learn low-level features, such as color and gradient orientation. With the C and D convolution layers, the model extracts high-level feature information, ensuring comprehensive knowledge of the information within the spectrogram.
The maxpooling layers (E, F, G, H) are used to reduce the dimensionality of the convolution output. The pooling stage reduces the computational load required to process data by decreasing the input feature size. Maxpooling acts as a noise reduction filter performed in parallel with the reduction in size, extracting the most important features by selecting the maximum values in sliding windows, thereby ensuring data invariance. Consequently, using maxpooling improves the effectiveness of the model training process.
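A minimal maxpooling operation of this kind, with an illustrative 2×2 window:

```python
import numpy as np

def max_pool2d(x, size=2):
    """Non-overlapping 2-D max pooling: keep the largest value in each window."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[: h * size, : w * size].reshape(h, size, w, size).max(axis=(1, 3))

x = np.array([[1., 3., 2., 0.],
              [4., 2., 1., 1.],
              [0., 1., 5., 6.],
              [2., 2., 7., 8.]])
y = max_pool2d(x)   # each output value is the maximum of one 2x2 window
```

A 4×4 map shrinks to 2×2 while the strongest response in each window survives, which is exactly the noise-suppressing size reduction described above.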
In a CNN, the hidden layers take in input data and perform transformations to create input data for the subsequent layer. The transformation used is the convolution operation, with each convolution layer containing one or more filters to detect and extract various features from the spectrogram. The general formula for a 1-dimensional convolution operation in the continuous domain is:

(f ∗ g)(t) ≜ ∫_{−∞}^{∞} f(τ) g(t − τ) dτ    (2.17)

Applying the CNN model, f(t) represents the input image, while g(t) acts as a filter that functions as a sliding window; t denotes the position of the filter on the original spectrogram. At each time t, (f ∗ g)(t) is the value of the convolution between the signal and the window at time delay t. The result of the integration provides information about the correlation between the signal and the window over the entire defined domain. The convolved data is transformed from the continuous domain to the discrete domain using the Riemann sum [67]:
(f ∗ g)(t) ≜ Σ_{τ∈D_f ∧ t−τ∈D_g} f(τ) g(t − τ)    (2.18)

The Riemann sum estimates an integral by dividing it into small intervals and calculating the area of each interval, as shown in Figure 2.18.
Figure 2.18: An example of the Riemann sum

The number of intervals corresponds to the size of the filter in the convolution layer. If the filter size is [5×5], then the convolution is performed 25 times and summed, instead of integrating over the continuous area. For example, in Figure 2.18, instead of calculating the function values at all points from 0 to 2, the Riemann sum only calculates values at intervals of 0.5 units, reducing the number of computation points from infinity to 4 values. The distance between the points is known as the stride, which is optimized to move the filter in predetermined steps. A smaller stride results in more computations but a larger output size, while a larger stride results in fewer computations but loses more relevant information between the steps. The stride in the proposed convolution network is similar to the Fourier transform sliding window used in Section 2.2.
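The discrete convolution of formula (2.18) with a stride can be sketched directly; the filter below is an arbitrary example:

```python
import numpy as np

def conv1d(f, g, stride=1):
    """Discrete counterpart of formula (2.18): slide the flipped filter g over
    f, multiplying and summing at each stride step (a Riemann-sum view)."""
    k = len(g)
    out = [sum(f[t0 + i] * g[k - 1 - i] for i in range(k))
           for t0 in range(0, len(f) - k + 1, stride)]
    return np.array(out)

f = np.array([1., 2., 3., 4., 5., 6.])
g = np.array([1., 0., -1.])                 # arbitrary edge-like filter
dense = conv1d(f, g)                        # stride 1: 4 output values
strided = conv1d(f, g, stride=2)            # stride 2: every second value kept
```

With stride 1 the result matches `np.convolve(f, g, mode="valid")`; doubling the stride halves both the computation and the output length, illustrating the trade-off described above.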
When sliding the windows over the spectrogram, a pixel in the central area of the input matrix is covered by many sliding windows, meaning that its value is used frequently to compute the output, while the pixels at the edges or corners are used only a few times, resulting in a significant loss of important information in those areas. In spectrograms of underwater signals, this region contains a lot of information about the characteristic frequency. Padding values of 2 are inserted to solve this issue: the input spectrogram is padded with zeros where there are no values and then integrated. Without padding, the window would only slide where the window and signal completely intersect. Adding padding increases the size of the input matrix, resulting in a larger output matrix. This reduces the discrepancy between the output and the original input matrices. The positions on the edges and corners of the original input matrix are also pushed deeper inside, leading to their increased use in calculating the output matrix and avoiding information loss.
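The effect of padding on the output size follows the standard rule below; the 5×5 filter and padding of 2 mirror the values mentioned above:

```python
def conv_output_size(n, k, stride=1, pad=0):
    """Standard output-size rule for a convolution layer."""
    return (n + 2 * pad - k) // stride + 1

# Without padding a 224-wide input shrinks; a padding of 2 with a 5x5 filter
# preserves the size, so edge pixels enter as many windows as central ones.
no_pad = conv_output_size(224, 5)
padded = conv_output_size(224, 5, pad=2)
```

A padding of (k − 1)/2 for an odd filter size k is exactly the "same" convolution that keeps the spectrogram dimensions unchanged.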
The proposed CNN model uses a batch normalization (BN) layer with a network structure consisting of 1 input layer, 4 convolution layers, 4 max-pooling layers, and 2 fully connected layers. The BN normalization step brings the distribution of the network layers to approximate a normal distribution with a mean of approximately 0 and variance of 1. Mathematically, BN calculates the mean and variance of each network layer. The formulas below calculate the mean and variance of a randomly selected mini-batch from the entire dataset, processed together at each iteration.
The process of dividing the training set into smaller subsets (mini-batches) with sizes of 16, 32, 64, 128, etc., in practical calculations involves subtracting the feature value from the mean value and dividing it by the standard deviation. Batch normalization (BN) is then performed on these mini-batches, and the values are normalized using the formula:

x̂_i ← (x_i − μ_B) / √(σ_B² + ϵ), (2.21)

The nature of the layers in a neural network involves calculating the derivative value and reducing the weight based on the direction of the derivative. As layers in a CNN are stacked on top of each other, the distribution of input data changes due to the weight updates of the previous layers. This process causes the distribution of input data to the subsequent layers to be entirely different from the initial distribution. Batch normalization is used to fix the data distribution to a normal distribution, ensuring that the distribution properties of the data remain constant across all layers. As the model changes with each iteration, the distribution of input data from the previous layers constantly changes, causing the proposed model to use additional normalization layers to maintain the mean and standard deviation of input features and reduce the variability of the distribution.
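A minimal sketch of the per-feature normalization in formula (2.21). Real BN layers additionally learn a scale and shift (γ, β) and track running statistics for inference; those parts are omitted here:

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize a mini-batch per feature: subtract the batch mean and
    divide by the batch standard deviation, as in formula (2.21)."""
    mu = x.mean(axis=0)                 # mean of the mini-batch
    var = x.var(axis=0)                 # variance of the mini-batch
    return (x - mu) / np.sqrt(var + eps)

# A toy mini-batch of 32 samples with 4 features, deliberately off-centre
batch = np.random.default_rng(0).normal(5.0, 3.0, size=(32, 4))
out = batch_norm(batch)
print(out.mean(axis=0).round(6), out.std(axis=0).round(3))  # ≈ 0 and ≈ 1
```

After normalization each feature of the mini-batch has approximately zero mean and unit variance, which is exactly the fixed distribution property the text relies on.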
In building a convolution neural network, several parameters, including filter size, pooling layer size, number of convolution layers, and training-test iterations, play a critical role in determining an appropriate network structure.
• Training-validation: in an ideal model, increasing the number of training-validation iterations should increase the accuracy of the results.
• Convolution layers: increasing the number of layers leads to a trade-off between accuracy and computational efficiency.
Classification results using the combination of two proposed solutions on propeller signals
To evaluate the efficiency of signal classification of propeller ships in practice, the proposed algorithm of spectral amplitude variation combined with a customized convolution network was applied according to the general approach of the thesis proposed in Figure 1.14. The experiments were performed in an Ubuntu 18.04 operating system environment with CUDA 10.1 and cuDNN 7.6.5 on a Dell T3600 Xeon workstation.
The inputs of the solution include:
• ShipEar dataset [109] used for pre-processing is described in Section 1.5.2.2.
• Spectrograms are generated from pre-processing by the DEMON-Hilbert algorithm and the proposed amplitude variation algorithm. These datasets are used as inputs for the customized convolution network proposed in Section 2.3.2.
• The convolution networks used to compare the classification results with the proposed solution are the network structures LeNet [145] and VGG19 [63], shown in Figure 2.19 and Figure 2.20.
The LeNet architecture, as shown in Figure 2.19, is a 2D convolution network consisting of 2 convolution layers and 3 fully connected layers with a sliding window of [5x5]. LeNet uses a sub-sampling solution, which is a mean pooling layer, to reduce the data dimension without changing the features. However, this solution has limitations: the model is difficult to converge, and the sigmoid activation function in each convolution layer is computationally complex, resulting in early saturation during training, which affects the classification results. The classification accuracy achieved by LeNet was approximately 54%, as shown in Figure 2.22.
Figure 2.22: The accuracy of DEMON-LeNet (54%)

The VGG19 architecture, as shown in Figure 2.20, along with the VGG16 network, initiated a trend of improving accuracy by increasing the depth of the model. VGG16 and VGG19 are structures with 13 and 16 convolution layers respectively, each with a sliding window of [3x3], combined with 3 fully connected layers at the end of the model. The input data of each convolution layer gradually decreases, compensated by an increasing depth of the network, combined with the ReLU activation function for training, which has helped the VGG19 model improve accuracy by more than 9% compared to LeNet.
The classification accuracy achieved by VGG19 was approximately 63%, as shown in Figure 2.23.
Figure 2.23: The accuracy of DEMON-VGG (63%)
2.4.2 Classification results of the proposed spectral amplitude variation algorithm with LeNet and VGG19
When using a spectrogram dataset created by the proposed algorithm with the LeNet and VGG19 convolution networks, as described in Section 2.4.1, the classification accuracy achieved was 70% for LeNet, as shown in Figure 2.24, and 78% for VGG19, as shown in Figure 2.25. The classification efficiency when pre-processing with the DEMON-Hilbert and proposed algorithms combined with the LeNet and VGG19 networks is summarized in Table 2.4.
Table 2.4: Classification results of DEMON and AV combined with CNNs
Figure 2.24: The accuracy of AV-LeNet (70%)
Figure 2.25: The accuracy of AV-VGG (78%)
The classification efficiency when pre-processing with the proposed spectral amplitude variation algorithm improved by 16% and 15%, respectively, compared to the DEMON-Hilbert algorithm. The factor that improves the quality of signal classification is that the proposed spectral amplitude variation algorithm extracts more accurate and complete feature frequencies of propeller ship signals from actual ShipEar raw data than the DEMON-Hilbert algorithm. This increases the probability of correct detection and reduces false alarms in signal classification, as presented in Table 2.1 of Section 2.2.4.
These results once again indicate that the trend of improving and optimizing the pre-processing step will continue to be a potential research direction in the problem of classifying propeller ship signals in shallow water zones.
2.4.3 Classification results of DEMON-Hilbert and the proposed spectral amplitude variation algorithm combined with the proposed CNN
By using spectrograms generated from the DEMON-Hilbert and the proposed amplitude variation algorithm combined with a customized convolution network, the model achieved classification accuracy of 80% and 90%, respectively.
Figure 2.26: The accuracy of DEMON-proposed CNN (80%)
Figure 2.27: The accuracy of AV-proposed CNN (90%)
In Figure 2.26, the DEMON-Hilbert algorithm combined with the proposed network achieved a classification accuracy of approximately 80%, an increase of 26% and 17%, respectively, compared to the LeNet and VGG19 networks. Similarly, in Figure 2.27, the proposed spectral amplitude variation combined with the proposed network achieved a classification accuracy of approximately 90%, an increase of 20% and 12%, respectively, compared to the LeNet and VGG19 networks.
When evaluating the effectiveness of a CNN model, besides classification accuracy, it is necessary to determine whether the model suffers from overfitting. This phenomenon often occurs when the model is too complex, leading to the feature learning stage being too tightly constrained, resulting in poor performance on new data. If the model has higher accuracy on the training data but much lower accuracy on the test data, the model is said to be overfitting, as was the case with the LeNet (Figure 2.22, Figure 2.24) and VGG19 (Figure 2.23, Figure 2.25) networks. The results also showed that the proposed model in Figure 2.27 continued to improve classification accuracy as the number of training iterations increased.
The classification results for each class are described in the confusion matrix in Figure 2.28.
The improvement in the classification performance of the underwater signal is attributed to the use of a convolution neural network with a convolution matrix size of [5x5], which is equivalent to the LeNet architecture. This allows for the retention of more information from the spectrogram than VGG19, without requiring an increase in model depth. If the matrix size is even, padding is added to the left of the input matrix during convolution, creating an asymmetric input structure. Therefore, an odd matrix size is used in processing underwater acoustic signals to ensure that there is a pixel at the center of the input matrix. This center point is a critical point that represents the position of the filter and improves the accuracy of the classification process.
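The padding arithmetic behind this choice can be sketched as follows. The convention of which side receives the extra unit of padding for even kernels varies between frameworks, so the split below is an assumption of this sketch:

```python
def same_padding(k):
    """Total padding needed for a stride-1 'same' convolution with a k x k
    kernel, split between the two sides. Odd k splits symmetrically; even k
    forces an asymmetric split, so no single centre pixel exists."""
    total = k - 1            # 'same' output requires k - 1 padded positions
    left = total // 2
    right = total - left
    return left, right

print(same_padding(5))  # symmetric: 2 on each side, kernel has a centre pixel
print(same_padding(4))  # asymmetric: one side gets more, no centre pixel
```

For the [5×5] kernel used here the padding of 2 on each side is exactly symmetric, which is why the odd kernel size preserves a well-defined filter centre.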
In addition, the proposed network structure uses batch normalization (BN) after the activation function. If BN is applied before activation, negative-valued features may be generated after normalization, which would reduce the effectiveness of classification. Normalizing the output ensures that the coefficients are balanced, which improves the model's convergence and avoids the vanishing gradient problem associated with deep multilayer models during training.
Although modern CNNs such as the Inception [129] (2017, 2020) and ResNet [131] (2016, 2019, 2021) architectures have achieved higher accuracy than single-branch models like VGG in the ImageNet [112] competition, they still have some limitations in terms of flexibility and memory resources. The problem of classifying propeller ship signals is generally performed on fixed-anchorage devices, which have limited hardware upgrade capabilities, requiring system flexibility and optimization. Multi-branch networks have limitations when using hardware memory because the training results of all branches must be saved until the phase of joining the branches as one. With single-branch networks, memory is immediately released after each calculation, which optimizes memory capacity. Therefore, multi-branch networks face difficulties in integration with embedded devices. Additionally, multi-branch networks are inflexible in the model-building phase: for example, the ResNet network requires that the shortcut connections in the output layer always be tensors of the same size to be able to join the branches together, making the network-building process complex and resource-intensive. Therefore, in addition to the factor of classification accuracy, other optimizing factors also need to be considered and selected to be compatible with embedded devices in practice.
Gradient-weighted Class Activation Mapping (Grad-CAM) [120] is used to test whether the sliding windows, after many training epochs of the proposed solution, can properly and sufficiently extract information from the spectrogram, thereby evaluating more intuitively the effectiveness of the proposed structure. The global average pooling of gradient components is represented by the weighted sum formula (2.23):

α_k^c = (1/Z) Σ_i Σ_j ∂y^c/∂A_ij^k, (2.23)

where y^c is the score for class c, A^k is the k-th feature map of the last convolution layer, and Z is the number of spatial positions in the feature map.
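A numpy sketch of the Grad-CAM weighting described above: gradients are global-average-pooled over the spatial axes to produce one weight per channel, and a ReLU of the weighted sum of the feature maps gives the heatmap. The array shapes are illustrative only:

```python
import numpy as np

def grad_cam_map(activations, gradients):
    """Grad-CAM heatmap from the last conv layer's activations and the
    gradients of the class score with respect to them (both (C, H, W))."""
    weights = gradients.mean(axis=(1, 2))             # alpha_k: pooled gradients
    cam = np.tensordot(weights, activations, axes=1)  # sum_k alpha_k * A^k
    return np.maximum(cam, 0.0)                       # ReLU: keep positive evidence

rng = np.random.default_rng(1)
acts = rng.random((8, 6, 6))          # toy feature maps of the last conv layer
grads = rng.normal(size=(8, 6, 6))    # toy d(class score)/d(feature maps)
heatmap = grad_cam_map(acts, grads)   # (6, 6) heatmap over the spectrogram
```

In a real pipeline the activations and gradients come from a backward pass through the trained CNN; here random arrays stand in so the mechanics are visible in isolation.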
Figure 2.29 shows the block diagram of the proposed Grad-CAM algorithm [120].
Figure 2.29: Structure of Grad-CAM model
The Grad-CAM heatmaps depicted in Figure 2.30 and Figure 2.31 are an intuitive way to evaluate whether a model has learned the correct regions containing the most important features of the data. The areas of the heatmap with a darker shade correspond to the number of times the sliding windows pass through these informative regions. Thus, the proposed CNN model can accurately extract the regions containing the distinctive features of the input signals, thereby significantly improving the classification performance of propeller ships.
In summary, the two-stage processing solution, which combines convolution neural network models with signal pre-processing, is a suitable approach in the current context. Improving the performance of the neural network over the convolution neural networks previously used for image processing significantly enhances the classification performance. However, pre-processing is also important in ensuring the integrity of the data. It also enhances the quality and diversity of the training set to serve new training stages, preventing the model from learning overly rigid information from the underwater data. With these proposals, the model achieved higher accuracy and reduced overfitting, ensuring better generalization when analyzing and processing data.
The proposed method of combining spectral amplitude variation with a customized convolution neural network achieved a classification accuracy of 90%, which is better than using single-branch CNN models with existing configurations such as LeNet [79] and VGGs [37]. This accuracy is the best among the models tested on the same ShipEar dataset, as summarized in Table 2.5.
Table 2.5: Classification results of DEMON and AV combined with CNNs
Chapter 2 conclusion
Previously, the detection and classification of underwater acoustic signals in ocean acoustics relied on models based on known physical principles of the environment. Unlike classical methods, methods that utilize artificial intelligence can be considered model-free, as they do not solely rely on physical models. Instead, AI models take samples from the collected sound data to estimate characteristic features. The application of AI in underwater sound source classification is a relatively new research field with potential to advance recent progress in mathematical modeling and to improve real-time processing efficiency.
Chapter 2's approach is based on the idea of using a neural network trained on data that has been modeled using a new proposal, to analyze the solution transfer via weight function transformations. The proposed solution incorporates developments in propagation modeling, parallel computing tools, and computation on recorded acoustic signal data from sources such as propeller ships in environments with complicated background noise. The combination of signal pre-processing with the neural network helps the system to process in real-time while still producing accurate results. The network configuration is optimized for the specific purpose of processing complexly varying underwater acoustic signals, with a focus on extracting desired features. The proposed variable spectral amplitude modulation technique has overcome two limitations of the classical DEMON technique without increasing algorithm complexity, reducing processing time. Therefore, the combination of pre-processing and neural network not only makes the sonar system more flexible but also enhances the desired accuracy in current real-world problems.

Another issue that needs attention is the simultaneous occurrence of multiple biotic sources (communication signals of marine mammals) and abiotic acoustic sources (noise from ships) in water environments. The challenge is to classify both propeller ship signals and marine mammal communication signals simultaneously. This problem will be presented in detail in Chapter 3, covering the approach to the subject, the reasons for source classification, the limitations of current approaches, and a proposed model for processing.
The results presented in Chapter 2 were published in [J1], [C2], [J3], [C4], [J7], and [C8].
• The thesis studied the traditional DEMON algorithm and its variants, simulated and evaluated their effectiveness on real-world data [J1]. The limitations of DEMON were then addressed by proposing the spectral amplitude variation algorithm and configuring a convolution neural network to classify actual ship signal datasets [C2].
• The thesis proposed a model, demonstrated the mathematical proof, simulated and evaluated the results using a combination of the proposed spectral amplitude variation algorithm and customized CNN. The proposed solution achieved a signal detection rate of 98.22% and a classification rate of 90%, higher than the detection rates of the works [154] on the same ShipEar dataset, which achieved accuracy rates of 88.92% (2021) and 96.67% (2022); the classification accuracy is also higher than the 86.75% (2021) achieved by these works.
• The proposed amplitude variation algorithm was evaluated on real-world data for detecting respiratory signals of diver breathing, and the results showed an increase in signal detection accuracy compared to the DEMON algorithm, demonstrating the effectiveness and soundness of the proposed solution [C8].
• The thesis conducted research and used direct recordings from real-world environments to ensure the rationality of the solution for underwater acoustic signal acquisition through the USBL system in marine environments [C4]. This study provides scientific grounds for further research on building a dataset of underwater acoustic signals in the coastal areas of Vietnam.
OF MARINE MAMMAL AND
Marine mammal communication signal structure
Similar to humans, marine mammals' acoustic communication signals are produced by a set of tissues located in the larynx in the throat [142]. The larynx contains folds called vocal cords, and vibrations created by the airflow from the lungs into the oral cavity, depending on the shape and tension, can produce different sounds. All marine mammals produce sounds, and almost all sounds created by mammals are the result of the motion of air through various tissues.

In the first stage, air is pushed up from the lungs, creating pressure on the larynx, which opens to allow the flow of air through, and the pressure decreases, causing the larynx to automatically close. This closure increases the pressure, and the process repeats [108]. The repeated opening and closing of the vocal cords generate sound wave frequencies, and these frequencies contain unique characteristics of each species. The shape and tension of the vocal cords can also be individually adjusted within each species to produce different sounds. In addition, sound is also influenced by changes in the shape of the oral cavity, tongue, and lips.
Some marine mammals create different acoustic signals by slapping body parts onto the water surface. Among them, bottlenose dolphins and humpback whales slap their tails on the water surface, creating broadband signals in the range of 30-12000 Hz [110]. The process of these mammals leaping out of the water and slapping body parts on the surface creates noise and also generates water bubbles. These bubbles burst and create acoustic pulses that propagate in the water, similar to the phenomenon of a duck's paddling movement. In nature, marine mammals listen to these sounds and make final decisions based on the specific characteristics of each species. Clearly, understanding the mechanism of "hearing" is more important than
Proposal of the cubic-splines interpolation pre-processing
In practice, the process of collecting and storing underwater acoustic signals from acoustic sources requires a large number of parameters related to environmental conditions, system deployment timing, and hardware stability. Therefore, the stability of a dataset may be uncertain. Additionally, when processing raw data using digital signal processing methods that utilize parameters such as window length and filter coefficients, classification errors can occur due to differences in the resolution of each operation. To mitigate these effects, an interpolation algorithm is used to enhance the quality of the data after transformation from raw acoustic data, prior to being fed into the classification model.
Most underwater acoustic signals share common characteristics with time-series signals, where, as the number of frequencies increases and the frequency intensity changes become more complex, the ability to extract useful information from the original raw data decreases. The challenge is even greater when underwater acoustic datasets have limited information about measurement and signal acquisition processes. To reduce noise, statistical filters such as moving average (MA) filters [84], [141] can be used, but this approach can alter the correlation structure of the data, potentially resulting in loss of some data points at the beginning or end of the signal.
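The edge loss caused by a moving-average filter can be seen directly in "valid"-mode convolution; the window length of 5 and the toy noisy sinusoid are arbitrary illustrations:

```python
import numpy as np

def moving_average(x, w):
    """'Valid' moving-average filter: smooths the signal but drops w - 1
    samples relative to the input, illustrating the edge loss noted above."""
    return np.convolve(x, np.ones(w) / w, mode='valid')

rng = np.random.default_rng(2)
x = np.sin(2 * np.pi * 0.05 * np.arange(100)) + 0.3 * rng.normal(size=100)
y = moving_average(x, 5)
print(len(x), len(y))   # the filtered signal is 4 samples shorter
```

Padding or reflecting the signal can restore the original length, but then the edge samples are computed from artificial data, which is the correlation-structure distortion the text warns about.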
The solution of using statistical methods and data interpolation does not in- crease the amount of useful information but rather focuses on highlighting the features of the data that require processing In this case, the interpolation algo- rithm functions similarly to a digital signal filter for extracting salient features of the dataset before subsequent steps of a classification system.
Assuming a cubic-splines interpolation in the interval from x_0 to x_n defined by a set of polynomials f_i(x), the mathematical representation of a cubic-splines interpolation [59] is:

f_i(x) = A_i(x − x_i)³ + B_i(x − x_i)² + C_i(x − x_i) + D_i, (3.1)

where i = 1, 2, ..., n−1, and n is the number of knots. Therefore, n−1 cubic polynomials will form the cubic-splines interpolation. If we have n+1 knots (x_0, y_0), (x_1, y_1), ..., (x_n, y_n) with x_{i+1} − x_i = q, then the interpolation function f_i(x) must satisfy four conditions simultaneously:

f_i(x_i) = y_i, (3.2)
f_i(x_{i+1}) = f_{i+1}(x_{i+1}), (3.3)
f_i′(x_{i+1}) = f_{i+1}′(x_{i+1}), (3.4)
f_i″(x_{i+1}) = f_{i+1}″(x_{i+1}). (3.5)

Solving formula (3.2), we have: D_i = y_i.

Expanding the first and second derivatives of f_i(x):

f_i′(x) = 3A_i(x − x_i)² + 2B_i(x − x_i) + C_i, (3.6)
f_i″(x) = 6A_i(x − x_i) + 2B_i. (3.7)

Let M_i = f_i″(x_i); expanding formula (3.7) gives B_i = M_i/2. Substituting this result into formula (3.5) to calculate A_i:

A_i = (M_{i+1} − M_i) / (6q).

Similarly, expanding formula (3.3) with these coefficients to calculate C_i:

C_i = (y_{i+1} − y_i)/q − q(2M_i + M_{i+1})/6.
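The continuity conditions (3.2)-(3.5) can be checked numerically with SciPy's cubic-spline implementation, used here only as a stand-in for the derivation above; the natural boundary conditions and equally spaced knots with q = 1 are assumptions of this sketch:

```python
import numpy as np
from scipy.interpolate import CubicSpline

x = np.arange(0, 6, dtype=float)    # equally spaced knots, q = 1
y = np.sin(2 * x)                   # knot values y_i
cs = CubicSpline(x, y, bc_type='natural')

knot = x[2]                         # check continuity at an interior knot
eps = 1e-9
assert np.isclose(cs(knot), y[2])                         # (3.2): f_i(x_i) = y_i
assert np.isclose(cs(knot - eps), cs(knot + eps))         # (3.3): value continuity
assert np.isclose(cs(knot - eps, 1), cs(knot + eps, 1))   # (3.4): f' continuity
assert np.isclose(cs(knot - eps, 2), cs(knot + eps, 2), atol=1e-5)  # (3.5): f''
```

The asserts pass because a cubic spline enforces matching values and matching first and second derivatives at every interior knot, which is exactly what conditions (3.2)-(3.5) require.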
The solution of cubic-splines interpolation will avoid the instability of high- degree polynomial interpolation and the limitations of statistical models Prepro- cessed data will be passed through a cubic-splines interpolation to estimate in- termediate pixel values between known values, thereby improving the quality of features and creating a new spectrogram with coherence between frequency com- ponents.
3.2.2 Interpolations on the frequency domain and proposed solutions
For a given sequence of data points, the goal of an interpolation algorithm is to provide intermediate data points to adjust the sequence to meet specific requirements. Consider a function γ: R → R, which maps a parameter t ∈ R to a value γ(t) ∈ R. Assuming that the values γ(t_n) are known only at discrete points with t_n ∈ R and n ∈ Z, interpolation algorithms estimate the value γ*(t) from the known values of γ(t_n), such that:

γ*(t) ≈ γ(t). (3.9)
In practical applications, underwater signals may contain signals from one or more objects in the same record. Therefore, it is necessary to evaluate and compare the quality of each interpolation method on the dataset.
• If the simplest interpolation operation, piece-wise constant interpolation (PCI), is used with a parameter t ∈ R, the nearest parameter t_n is found and the value of the interpolation function is determined using the formula:

γ*(t) = γ(t_n). (3.10)
• If linear interpolation (LI) is used with a parameter t in the range of t_{n−1} to t_n, the value of the interpolation function is determined by the formula:

γ*(t) = γ(t_{n−1}) + ((t − t_{n−1}) / (t_n − t_{n−1})) (γ(t_n) − γ(t_{n−1})). (3.11)
• If cubic-splines interpolation (CSI) is used with a parameter t in the range of t_{n−1} to t_n, the value of the CSI function is calculated using formula (3.1):

γ*(t) = A_i(x − x_i)³ + B_i(x − x_i)² + C_i(x − x_i) + D_i. (3.12)
Apply formulas (3.10), (3.11), and (3.12) for the following cases:
A. In the case of an underwater data sequence containing only one sinusoidal wave, assuming that it has the form γ(t) = sin(2t_n) with t_n ∈ [0, 11], the intermediate values are calculated using PCI, LI, and CSI respectively as follows:
Figure 3.1: Calculating intermediate values by PCI
Figure 3.2: Calculating intermediate values by LI
Figure 3.3: Calculating intermediate values by CSI
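Case A can be reproduced with standard library routines; this is a sketch, and the 200-point query grid is an arbitrary choice:

```python
import numpy as np
from scipy.interpolate import CubicSpline, interp1d

t = np.arange(0.0, 12.0)            # known samples t_n in [0, 11]
y = np.sin(2 * t)                   # gamma(t_n) = sin(2 t_n), as in Case A
tq = np.linspace(0.0, 11.0, 200)    # intermediate query points for gamma*(t)

pci = interp1d(t, y, kind='nearest')(tq)   # piece-wise constant, formula (3.10)
li = interp1d(t, y, kind='linear')(tq)     # linear, formula (3.11)
csi = CubicSpline(t, y)(tq)                # cubic splines, formula (3.12)

true = np.sin(2 * tq)
for name, est in [('PCI', pci), ('LI', li), ('CSI', csi)]:
    print(name, float(np.abs(est - true).mean()))  # mean absolute error vs truth
```

All three estimators agree with γ at the knots; they differ in how smoothly they fill the gaps between knots, which is what Figures 3.1-3.3 illustrate.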
B. In the case of an underwater data sequence containing more than one sinusoidal mechanical wave, assuming that it has the form γ_1(t) = 1.8 sin(2π·6t) and γ_2(t) = 0.6 sin(2π·18t), the intermediate values are calculated using the same interpolation methods as in Case A:
The signal consists of two waves γ_1(t) and γ_2(t), as described in Figure 3.4.
Figure 3.4: Consistent signal of two waves

The signal after using FFT analysis is described in Figure 3.5.
Figure 3.5: Spectrum of consistent signal
The spectral values after processing with the interpolation methods are described in Figure 3.6, Figure 3.7, and Figure 3.8 as follows:
Figure 3.6: Spectrum after using Piece-wise constant interpolation
Figure 3.7: Spectrum after using linear interpolation
Figure 3.8: Spectrum after using cubic-splines interpolation
The values of the PCI method are discrete points, as shown in Figure 3.1 and Figure 3.6, so it encounters many limitations when applied to nonlinear functions that are characteristic of practical applications. The results of cubic-splines interpolation ensure the continuous space of the data represented in Figure 3.3 and Figure 3.8, which is suitable for processing biotic and abiotic signals in practical applications.
The communication signals of marine mammals are characterized by a form of time-varying signals with complex patterns. Many marine mammals have sound production structures similar to humans [142], which makes the use of biologically inspired filtering methods such as MFCC reasonable. STFT has several advantages in processing periodic signals that are mixed in complex noise backgrounds [121]. Wavelets are well adapted [15] to environments with continuously changing signals, such as shallow water zones. Therefore, to improve the effectiveness of pre-processing, the thesis proposes the simultaneous use of three methods: STFT, Mel coefficients, and Wavelet-Haar transformation, combined with cubic-splines interpolation, to pre-process raw records and create spectrograms. The proposed solution block diagram and proposed algorithm flowchart are shown in Figure 3.9 and Figure 3.10.
Figure 3.9: Block diagram of the proposed solution

The result of pre-processing is a 2D spectrogram in which the intensity on the spectrogram represents the strength of the signal. Each set of three spectrograms generated from the three transformations on the same signal samples is stacked together to form a final spectrogram. The set of final spectrograms is then fed into Branch-1, which employs the customized CNN used in Figure 2.17, Section 2.3.2, Chapter 2, and Branch-2, which uses a Siamese triplet loss network combined with a probability distribution model in the hidden space (as described in Section 3.3) to evaluate the accuracy of the classification results of complex underwater acoustic signals, including marine mammal and propeller signals.
Figure 3.10: Flowchart of proposed algorithm for processing marine mammal signals
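The stacking step described above can be sketched as follows: three same-shaped spectrograms are stacked into one three-channel input. Only the STFT channel below is a real transform; the Mel-coefficient and Wavelet-Haar channels of the proposed pipeline are replaced here by placeholder transforms, since their exact parameters are not reproduced:

```python
import numpy as np
from scipy.signal import spectrogram

fs = 8000                                 # assumed sample rate for the sketch
t = np.arange(fs) / fs                    # 1 second of signal
rng = np.random.default_rng(3)
x = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.normal(size=fs)  # toy tone + noise

# Real STFT channel; the other two stand in for the Mel and Wavelet-Haar
# spectrograms of the proposed pipeline (placeholders only).
_, _, stft_spec = spectrogram(x, fs=fs, nperseg=256)
mel_like = np.log1p(stft_spec)
wavelet_like = np.sqrt(stft_spec)

final = np.stack([stft_spec, mel_like, wavelet_like], axis=-1)  # (freq, time, 3)
print(final.shape)
```

The resulting (frequency, time, 3) tensor has the same layout as an RGB image, which is what lets the customized CNN of Chapter 2 consume the three transforms jointly.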
Algorithm 3.1 Proposed cubic-splines interpolation for pre-processing

Require: raw data x, function ω, parameters θ = [θ_1, θ_2, ..., θ_k]
Ensure: tensor Z with k kernels
  Initialize the values ω_0 and n_0
  for i = [1, 2, ..., k] do
    Set ω_0 and n_0 to the minimum
    if Δω_i