fault dialogis of spur gear box using artificial neural network

Applied Soft Computing 10 (2010) 344–360 Contents lists available at ScienceDirect Applied Soft Computing journal homepage: www.elsevier.com/locate/asoc Fault diagnosis of spur bevel gear box using artificial neural network (ANN), and proximal support vector machine (PSVM) N Saravanan *, V.N.S Kumar Siddabattuni, K.I Ramachandran Department of Mechanical Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, Tamil Nadu, India A R T I C L E I N F O A B S T R A C T Article history: Received July 2008 Received in revised form April 2009 Accepted August 2009 Available online August 2009 Vibration signals extracted from rotating parts of machineries carries lot many information with in them about the condition of the operating machine Further processing of these raw vibration signatures measured at a convenient location of the machine unravels the condition of the component or assembly under study This paper deals with the effectiveness of wavelet-based features for fault diagnosis of a gear box using artificial neural network (ANN) and proximal support vector machines (PSVM) The statistical feature vectors from Morlet wavelet coefficients are classified using J48 algorithm and the predominant features were fed as input for training and testing ANN and PSVM and their relative efficiency in classifying the faults in the bevel gear box was compared ß 2009 Elsevier B.V All rights reserved Keywords: Artificial neural network Proximal support vector machine Bevel gear box Morlet wavelet Statistical features Fault detection Introduction Malfunctions in machinery are often sources of reduced productivity and increased maintenance costs in various industrial applications For this reason, machine condition monitoring is being pursued to recognize incipient faults As modern production plants are expected to run continuously for extended hours, unexpected downtime due to rotating machinery failures has become more costly than ever before The faults arising in rotating machines are often due to damages and failures in the components of gear box assembly Fault diagnosis is an important process in preventive maintenance of gear box, which avoids serious damage if defects occur to one of the gears during operation condition Early detection of the defects, therefore, is crucial to prevent the system from malfunction that could cause damage or entire system halt Diagnosing a gear system by examining vibration signals is the most commonly used method for detecting gear failures The conventional methods for processing measured data contain the frequency domain technique, time-domain technique, and time-frequency domain technique These methods have been widely employed to detect gear failures The use of vibration analysis for gear fault diagnosis and monitoring has been widely investigated and its application in industry is well established [1– 3] This is particularly reflected in the aviation industry where the * Corresponding author at: Sohar University, Sohar, Oman E-mail address: nsaro_2000@yahoo.com (N Saravanan) 1568-4946/$ – see front matter ß 2009 Elsevier B.V All rights reserved doi:10.1016/j.asoc.2009.08.006 helicopter engine, drive trains and rotor systems are fitted with vibration sensors for component health monitoring These methods have traditionally been applied, separately in time and frequency domains A time-domain analysis focuses principally on statistical characteristics of vibration signal such as peak level, standard deviation, skewness, kurtosis, and crest factor A frequency domain approach uses Fourier methods to transform the time-domain signal to the frequency domain, where further analysis is carried out, and conventionally using vibration amplitude and power spectra It should be noted that use of either domain implicitly excludes the direct use of information present in the other Time-frequency based energy distribution method was employed for early detection of gear failure [4] The frequency domain refers to a display or analysis of the vibration data as a function of frequency The time-domain vibration signal is typically processed into the frequency domain by applying a Fourier transform, usually in the form of a fast Fourier transform (FFT) algorithm [5] The works presented in [6–9] found that, the FFT-based methods are not suitable for non-stationary signal analysis and are not able to reveal the inherent information of non-stationary signals However, various kinds of factors, such as the change of the environment and the faults from the machine itself, often make the output signals of the running machine contain non-stationary components Usually, these non-stationary components contain abundant information about machine faults; therefore, it is important to analyze the non-stationary signals Most algorithms recently developed for mechanical fault detection are based on the N Saravanan et al / Applied Soft Computing 10 (2010) 344–360 assumption of stationarity of the vibration signals Some of these, including cepstrum, time-domain averaging, adaptive noise cancellation, demodulation analysis, etc [10–12] are well established and have proved to be very effective in machinery diagnostics However, in many cases these methods are not sufficient to reliably detect different types of faults There is a need for new techniques which can cope with technological advances in machinery, and which provide satisfactory fault detection sensitivity A relatively small amount of applied research has been done in the application of time-variant fault detection methods It is known [13,14], that local faults in gear boxes cause impacts As a result of this impact excitation, impulses and discontinuities may be observed in the instantaneous characteristics of the envelope and phase functions [14,15] Due to the nature of these functions, vibration signals can be considered as non-stationary [16] and strong non-stationary events can appear in a local time period, e.g one revolution of gear in mesh The analysis of non-stationary signals requires specific techniques which go beyond the classical Fourier approach There exist a lot of different time-variant methods, some are reviewed in [16–18] In the recent past reports of fault diagnosis of critical components using machine learning algorithms like SVM, PSVM are reported [19] In ANN, the condition-monitoring problem is treated as a generalization/classification problem based on training pattern from the samples of faulty roller bearings [20] However, the traditional ANN approaches have limitations on generalization of results in models that can over-fit the data Support vector machine (SVM) is used in many applications of machine learning because of its high accuracy and good generalization capabilities SVM is based on statistical learning theory SVM classifies better than ANN because of the principle of risk minimization In artificial neural network (ANN) traditional Empirical Risk Minimization (ERM) is used on training data set to minimize the error But in SVM, Structural Risk Minimization (SRM) is used to minimize an upper bound on the expected risk SVM is modeled as an optimization problem and involves extensive computation, whereas, PSVM is modeled as a system of linear equations which involves less computation [21] PSVM gives results very close to SVM One of the more recent mathematical tools adopted for transient signals is the wavelet transform [22,23] Wavelet transform (WT) has attracted many researchers’ attention recently The wavelet transform was utilized to represent all possible types of transients in vibration signals generated by faults in a gear box [24] A neural network was used to diagnose a simple gear system after the data have been pre-processed by the wavelet transform [25] Wavelet transform was used to analyze the vibration signal from the gear system with pitting on the gear [26] Hence based on the literature review there exist a wide scope to explore machine learning methods like ANN, SVM and PSVM for fault diagnosis of gear box This paper is one such attempt to apply machine learning methods like ANN and PSVM to wavelet features of the vibration signal of the gear box under investigation This work deals with extraction of wavelet features from the vibration data of a bevel gear box system and classification of Gear faults using artificial neural network (ANN) and proximal support vector machine (PSVM) The vibration signal from a piezoelectric transducer is captured for the following conditions: Good Bevel Gear, Bevel Gear with tooth breakage (GTB), Bevel Gear with crack at root of the tooth (GTC), and Bevel Gear with face wear of the teeth (TFW) for various loading and lubrication conditions of the gear box A group of statistical features like kurtosis, standard deviation, maximum value, etc form a set of features, which are widely used in fault diagnostics, are extracted from the wavelet coefficients of 345 Fig Flow chart for bevel gear box condition diagnosis the time-domain signals Selection of good features is an important phase in pattern recognition and requires detailed domain knowledge The Decision Tree using J48 algorithm was used for identifying the best features from a given set of samples The selected features were fed as input to ANN and PSVM for classification 1.1 Different phases of present work The signals obtained are processed further for machine condition diagnosis as explained in the flow chart in Fig Experimental studies The fault simulator with sensor is shown in Fig and the inner view of bevel gear box is shown in Fig A variable speed DC motor (0.5 hp) with speed up to 3000 rpm is the basic drive A short shaft of 30 mm diameter is attached to the shaft of the motor through a flexible coupling; this is to minimize effects of misalignment and transmission of vibration from motor The shaft is supported at its ends through two roller bearings From this shaft the motion is transmitted to the bevel gear box by means of a belt drive The gear box is of dimension Fig Fault simulator setup N Saravanan et al / Applied Soft Computing 10 (2010) 344–360 346 Table Details of faults under investigation Gears Fault description Dimension (mm) G1 G2 G3 G4 Good Gear Tooth Breakage (GTB) Gear with crack at root (GTC) Gear with face wear – 0.8 Â 0.5 Â 20 0.5 Table Gear wheel and Pinion details Fig Inner view of the bevel gear box 150 mm Â 170 mm Â 120 mm and the full lubrication level is 110 mm and half lubrication level is 60 mm SAE 40 oil was used as a lubricant An electromagnetic spring loaded disc brake was used to load the gear wheel A torque level of N m was applied at the full load condition The various defects are created in the pinion wheels and the mating gear wheel is not disturbed With the sensor mounted on top of the gear box vibrations signals are obtained for various conditions The selected area is made flat and smooth to ensure effective coupling A piezoelectric accelerometer (Dytran model) is mounted on the flat surface using direct adhesive mounting technique The accelerometer is connected to the signalconditioning unit (DACTRAN FFT analyzer), where the signal goes through the charge amplifier and an Analogue-to-Digital Converter (ADC) The vibration signal in digital form is fed to the computer through a USB port The software RT Pro-series that accompanies the signal conditioning unit is used for recording the signals directly in the computer’s secondary memory The signal is then read from the memory and replayed and processed to extract different features 2.1 Experimental procedure In the present study, four pinion wheels whose details are as mentioned in Table were used One was a new wheel and was Parameters Gear wheel Pinion wheel No of teeth Module Normal pressure angle Shaft angle Top clearance Addendum Whole depth Chordal tooth thickness Chordal tooth height Material 35 2.5 208 908 0.5 mm 2.5 mm 5.5 mm 3.93À0.150 mm 2.53 mm EN8 25 2.5 208 908 0.5 mm 2.5 mm 5.5 mm 3.92À0.110 mm 2.55 mm EN8 assumed to be free from defects In the other three pinion wheels, defects were created using EDM in order to keep the size of the defect under control The details of the various defects are depicted in Table and its views are shown in Fig The size of the defects is a little bigger than one can encounter in the practical situation; however, it is in-line with work reported in literature [27] The vibration signal from the piezoelectric pickup mounted on the gear box was taken, after allowing initial running of the gear box for some time The sampling frequency was 12,000 Hz and sample length was 8192 for all conditions The sample length was chosen arbitrarily; however, the following points were considered Statistical measures are more meaningful, when the number of samples is more On the other hand, as the number of samples increases the computational time increases To strike a balance, sample length of around 10,000 was chosen In some feature extraction techniques, which will be used with the same data, as per the Nyquist criteria the number of samples is to be 2n The nearest 2n to 10,000 is 8192 and hence, it was taken as sample length Many trials were taken at the set speed and vibration signal was stored in the data The raw Fig (a) View of Good Pinion wheel; (b) View of Pinion wheel with face wear (TFW); (c) View of Pinion wheel with tooth breakage (GTB); (d) View of Pinion wheel N Saravanan et al / Applied Soft Computing 10 (2010) 344–360 vibration signals acquired for various experimental conditions form the gear box using FFT are shown in Fig 5(a)–(d) Feature extraction After acquiring the vibration signals in the time domain, it is processed to obtain feature vectors The Continuous Wavelet Transform (CWT) is used for obtaining the wavelet coefficients of the signals The statistical parameters of the wavelet coefficients are extracted, which constitute the feature vectors The term wavelet means a small wave It is the representation of a signal in terms of finite length or fast decaying waveform known as mother wavelet This waveform is scaled and translated to match the input signal 347 The Continuous Wavelet Transform [28] is defined as W s t ị ẳ Z ỵ1 f tịC s; j tị dt À1 tÀt where C s; j tị ẳ p C s jsj is a window function called the mother wavelet, s is a scale and t is a translation The term translation is related to the location of the window, as the window is shifted through the signal This corresponds to the time information in the transform domain But instead of a frequency parameter, we have a scale Scaling, as a mathematical operation, either dilates or compresses a signal Smaller scale corresponds to high frequency of signals and large scale corresponds to low frequency signals Fig (a) Vibration Signal for Good Pinion wheel under different lubrication and loading conditions; (b) Vibration Signal for Pinion wheel with Teeth Breakage under different lubrication and loading conditions; (c) Vibration Signal for Pinion wheel with crack at root under different lubrication and loading conditions; (d) Vibration Signals for Pinion wheel with Teeth Face Wear under different lubrication and loading conditions 348 N Saravanan et al / Applied Soft Computing 10 (2010) 344–360 Fig (Continued ) Fig Morlet wavelet [30] Fig % Efficiency of Morlet wavelet coefficients N Saravanan et al / Applied Soft Computing 10 (2010) 344–360 The wavelet series is simply a sampled version of the CWT, and the information it provides is highly redundant as far as the reconstruction of the signal is concerned This redundancy, on the other hand, requires a significant amount of computation time and resources 349 statistical features corresponding to it were given as input for J48 algorithm to determine the predominant features to be given as an input for training and classification using SVM Fig gives the efficiencies of all scales of Morlet wavelet Using J 48 algorithm in the present work 3.1 Wavelet-based feature extraction The multilevel 1D wavelet decomposition function, available in Matlab is chosen with the Morlet wavelets specified It returns the wavelet coefficients of signal X at scale N [29] Fig shows Morlet wavelet Sixty-four scales are initially chosen to extract the Morlet wavelet coefficients of the signal data The efficiency of 64 scales of Morlet wavelets was obtained using WEKA data mining software and the coefficients of highest level are considered for classification Since the eighth level gave maximum efficiency of 96.5%, the A standard tree induced with c5.0 (or possibly ID3 or c4.5) consists of a number of branches, one root, a number of nodes and a number of leaves One branch is a chain of nodes from root to a leaf; and each node involves one attribute The occurrence of an attribute in a tree provides the information about the importance of the associated attribute as explained in [31] A Decision Tree is a tree based knowledge representation methodology used to represent classification rules J48 algorithm (A WEKA implementation of c4.5 Algorithm) is a widely used one to construct Decision Trees as explained in [19] The Decision Tree algorithm has been Fig (a) Good-Dry-No Load vs GTB, GTC, TFW-Dry-No Load (b) Good-Dry-Full Load vs GTB, GTC, TFW-Dry-Full Load (c) Good-Half Lub-No Load vs GTB, GTC, TFW-Half LubNo Load (d) Good-Half Lub-Full Load vs GTB, GTC, TFW-Half Lub-Full Load (e) Good-Full Lub-No Load vs GTB, GTC, TFW-Full Lub-No Load (f) Good-Full Lub-Full Load vs GTB, GTC, TFW-Full Lub-Full Load 350 N Saravanan et al / Applied Soft Computing 10 (2010) 344–360 Fig (Continued ) applied to the problem under discussion Input to the algorithm is set of statistical features of the eighth scale Morlet coefficients It is clear that the top node is the best node for classification The other features in the nodes of Decision Tree appear in descending order of importance It is to be stressed here that only features that contribute to the classification appear in the Decision Tree and others not Features, which have less discriminating capability, can be consciously discarded by deciding on the threshold This concept is made use for selecting good features The algorithm identifies the good features for the purpose of classification from the given training data set, and thus reduces the domain knowledge required to select good features for pattern classification problem The decision trees shown in Fig 8(a)–(f) is for various lubrication and loading conditions of different faults compared with good conditions of the pinion gear wheel Based on above trees its clear that of all the statistical features, standard error, kurtosis, sample variance and minimum value play a dominant role in feature classification using Morlet coefficients These four predominant features are fed as an input to SVM for training and further classification The scatter plot showing the variation of the statistical parameters of Morlet coefficients are shown in Fig 9(a)–(d) These features were given as input for training and testing of classifying features using SVM Artificial neural network ANN is one of the approaches to forecast and validate using computer models with some of the architecture and processing capabilities of the human brain [22] The technology that attempts to achieve such results is called neural computing or artificial neural networks ANN mimics biological neurons by simulating some of the workings of the human brain An ANN is made up of processing elements called neurons that are interconnected in a network The artificial neurons receive inputs that are analogous to N Saravanan et al / Applied Soft Computing 10 (2010) 344–360 the electro-chemical signals that natural neurons receive from other neurons By changing the weights given to theses signals, the network learns in a process that seems similar to that found in nature i.e., neurons in ANN receive signals or information from other neurons or external sources, perform transformations on the signals, and then pass those signals on to other neurons The way information is processed and intelligence is stored depends on the architecture and algorithms of ANN Fig 10 shows the architecture of ANN A main advantage of ANN is its ability to learn patterns in very complex systems Through learning or self-organizing process, they translate the inputs into desired outputs by adjusting the weights given to signals between neurodes 351 The proposed method diagnoses a gear box condition using ANN A multi layered feed forward neural network trained with error back propagation was used ANN’s are characterized by their topology, weight vector and activation functions They have three layers namely an input layer that receives signals from some external source, a hidden layer that does the processing of the signals and output layer that sends processed signals back to the external world 5.1 The back propagation algorithm of ANN The back propagation of an ANN assumes that there is a supervision of learning of the network The method of adjusting Fig (a) Vibration Signal for Good Pinion wheel under different lubrication and loading conditions; (b) Vibration Signal for Pinion wheel with Teeth Breakage under different lubrication and loading conditions; (c) Vibration Signal for Pinion wheel with Crack at root under different lubrication and loading conditions; (d) Vibration Signals for Pinion wheel with Teeth Face Wear under different lubrication and loading conditions N Saravanan et al / Applied Soft Computing 10 (2010) 344–360 352 Fig (Continued ) weights is designed to minimize the sum of the squared errors for a given training data set: j – identifies a receiving node, i – denotes the node that feeds a second node, I – denotes input to a neuron, O – denotes output of a neuron, Wij – denotes the weights associated with the nodes di j ẳ ; ỵ eI j Ij ¼ X W i jO j @SSE @SSE ¼ @W i j @O j @O j @I j @I j @W i j (2) where by convention the left-hand side is denoted by dij, the change in the sum of squared errors (SSE) attributed to Wij Now error is given by X ei ¼ ðD j O j ị; SSE ẳ D j O j Þ2 (3) Each non-input node has an output level Oj where Oj ¼ The derivation of the back propagation formula involves the use of the chain rule of partial derivatives and equals: (1) where Oi is each of the signals to node j (i.e., the output of node of i) Therefore, X @SSE ¼ À2 Dj À Oj @O j (4) N Saravanan et al / Applied Soft Computing 10 (2010) 344–360 353 For the hidden layers, the calculations are similar The only change is how the ANN output error is back propagated to the hidden layer nodes The output error at the ith hidden node depends on the output errors of all nodes in the output layer This relationship is given by ei ¼ X Wi je j (9) After calculating the output error for the hidden layer, the update rules for the weights in that layer are the same as the previous update 5.2 Proximal support vector machine (PSVM) PSVM is a modified version of support vector machine (SVM) The SVM is a new generation learning system based on statistical learning theory SVM belongs to the class of supervised learning algorithms in which the learning machine is given a set of features (or inputs) with the associated labels (or output values) Each of Fig 10 ANN architecture From the output of the output node, we obtain, @O j ¼ O j ð1 À O j Þ @I j (5) P The input to an input node is Ij = WijOi Therefore the change in the input to the output node resulting from the previous hidden node, i, is @I j ¼ Oi @W i j (6) Thus from above equations, the jth delta is di j ¼ 2e j O j ð1 À O j ÞOi (7) Now the old weight is updated by the following equation: DW i j ðnewÞ ẳ hdi j O j ỵ a DW i j ðoldÞ (8) Fig 12 Standard SVM classifier Fig 11 Flowchart of PSVM Fig 13 Proximal support vector classification N Saravanan et al / Applied Soft Computing 10 (2010) 344–360 354 these features can be looked upon as a dimension of a hyper-plane SVMs construct a hyper-plane that separates the data into two classes (this can be extended to multi-class problems) While doing so, SVM algorithm tries to achieve maximum separation between the classes (see Fig 11) separating the classes with a large margin minimizes a bound on the expected generalization error By ‘minimum generalization error’, we mean that when a new set of features (that is data points with unknown class values) arrive for Table Network statistics of artificial neural network for dry-No Load condition No of neurons in hidden layer RMS error Training epochs No of data items Number right Number wrong % Right % Wrong 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 0.0202744 0.027886651 0.033039122 0.02669973 0.042514288 0.069008367 0.037675745 0.035492562 0.083582272 0.049940002 0.578279597 0.032163411 0.037353878 0.029882686 0.043822589 0.054780718 0.042322687 0.043488274 0.061746534 0.034594848 0.043477307 0.044333622 0.039927358 0.035807325 0.036475296 0.042460454 0.040477199 0.052781818 0.049526151 1001434 9481 7400 10434 8961 6567 8767 7057 15979 6630 9013 7047 6711 12546 5283 8605 8016 7217 11068 8503 9962 10620 11765 8102 8022 10908 144369 7592 10928 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 19 19 18 17 18 19 19 19 18 19 10 17 18 18 18 19 17 18 17 18 19 18 19 18 18 18 19 18 18 1 1 10 2 3 2 2 2 95 95 90 85 90 95 95 95 90 95 50 85 90 90 90 95 85 90 85 90 95 90 95 90 90 90 95 90 90 5 10 15 10 5 10 50 15 10 10 10 15 10 15 10 10 10 10 10 10 10 Table Network statistics of artificial neural network for dry-Full Load condition No of neurons in hidden layer RMS error Training epochs No of data items Number right Number wrong %Right %Wrong 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 0.035232304 0.094450731 0.035595021 0.030008942 0.051336513 0.032124762 0.03158118 0.051483804 0.053034626 0.033798348 0.038777874 0.036169174 0.035605809 0.037275766 0.028795063 0.131970086 0.205737861 0.033373041 0.037727216 0.029334942 0.038630539 0.041809193 0.053820579 0.030759342 0.051702301 0.03797929 0.032527782 0.171258904 0.081935969 1055554 180984 17146 236843 56857 18176 18837 16685 7579 15186 18358 98707 12798 53324 17344 9015 16777 13414 13288 19024 12608 18696 26390 16846 14037 14222 16571 14761 10753 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 19 20 20 17 19 18 19 16 19 19 20 18 19 19 15 10 19 18 20 19 17 17 19 17 19 20 15 15 0 1 1 10 3 5 100 95 100 100 85 95 90 95 80 95 95 100 90 95 95 75 50 95 90 100 95 85 85 95 85 95 100 75 75 0 15 10 20 5 10 5 25 50 10 15 15 15 25 25 N Saravanan et al / Applied Soft Computing 10 (2010) 344–360 classification, the chance of making an error in the prediction (of the class to which it belongs) based on the learned classifier (hyper-plane) should be minimum Intuitively, such a classifier is one, which achieves maximum separation-margin between the 355 classes The above process of maximizing separation leads to two hyper-planes parallel to the separating plane, on either side of it These two can have one or more points on them The planes are known as ‘bounding planes’ and the distance between them is Table Network statistics of artificial neural network for Half-No Load condition No of neurons in hidden layer RMS error Training epochs No of data items Human right Number wrong %Right %Wrong 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 0.099700634 0.044283153 0.02854899 0.029160886 0.0296755 0.029882647 0.029265981 0.032606331 0.032691361 0.033121943 0.03078548 0.038352113 0.03174524 0.046436428 0.030087139 0.073644677 0.026681461 0.035761723 0.029016746 0.040685636 0.037312156 0.039350267 0.025759268 0.028639303 0.033887383 0.034645032 0.030207027 0.041310921 0.037251286 1027735 1061079 366892 18715 147065 57957 26362 14946 6524 7003 118663 14811 10016 21906 9564 60426 13808 13656 26958 8812 85835 50552 19646 9279 55669 8155 48023 39719 43922 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 15 16 19 19 19 19 19 18 18 18 20 18 18 18 18 16 20 18 19 18 19 17 20 18 20 18 20 19 19 1 1 2 2 2 2 2 1 75 80 95 95 95 95 95 90 90 90 100 90 90 90 90 80 100 90 95 90 95 85 100 90 100 90 100 95 95 25 20 5 5 10 10 10 10 10 10 10 20 10 10 15 10 10 5 Table Network statistics of artificial neural network for Half-Full Load condition No of neurons in hidden layer RMS error Training epochs No of data items Human right Number wrong %Right %Wrong 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 0.510457237 0.015551669 0.016713248 0.017305003 0.019093083 0.016318133 0.017980004 0.021373819 0.015515707 0.01764415 0.017038721 0.023660636 0.086082501 0.020718832 0.022411157 0.022821959 0.019023725 0.20575058 0.019475239 0.019102404 0.026062937 0.022969669 0.05386634 0.017371293 0.033099658 0.023589721 0.258975288 0.032913074 0.023728736 1053634 3622 3443 3349 10961 24373 3758 6803 3215 4572 3056 3484 7632 17163 10082 5791 3650 48620 4379 6705 2828 3996 3981 12579 3239 6561 50564 152507 31231 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 19 20 20 20 20 20 20 19 17 19 20 20 20 19 20 19 19 20 18 20 17 19 18 20 20 20 0 0 0 0 0 1 0 100 100 100 95 100 100 100 100 100 100 95 85 95 100 100 100 95 100 95 95 100 90 100 85 95 90 100 100 100 0 0 0 0 15 0 5 10 15 10 0 N Saravanan et al / Applied Soft Computing 10 (2010) 344–360 356 called as ‘margin’ By SVM ‘learning’, we mean, finding a hyperplane, which maximizes the margin The points lying beyond the bounding planes are called support vectors As for as data points belonging to AÀ are concerned P1, P2, P3, P4, and P5 are support vectors (see Fig 12), but P6, P7 are not support vectors Similar thing hold good for class A+ These points play a crucial role in the theory and hence the name support vector machines Here by ‘machine’, we mean an algorithm: > ne0 y ỵ w0 w > = nỵ1ỵm w;g ;yị R s:t: DAw eg ị ỵ y ! e > > ; y!0 (10) where A RmÂn, D {À1, +1}mÂ1, e = 1mÂ1 Table Network statistics of artificial neural network for Full-No Load condition No of neurons in hidden layer RMS error Training epochs No of data items Number right Number wrong %Right %Wrong S 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 0.505417816 0.027060278 0.008324499 0.011176481 0.018590762 0.014064934 0.012413606 0.006726714 0.009909251 0.009252885 0.033977899 0.024112894 0.008712277 0.008745006 0.01437363 0.014105676 0.013767611 0.013927728 0.027126833 0.034763055 0.014020149 0.033682394 0.020139668 0.026711765 0.022877088 0.018008394 0.022976123 0.021863515 0.578409091 1008548 400021 6861 19818 62126 27116 9601 14690 8255 8418 29910 87526 11447 9362 11901 19652 13204 7977 15286 53244 16349 85299 11999 10999 20387 14865 13325 12706 15977 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 19 20 20 19 18 17 19 20 17 19 19 20 19 18 18 17 18 16 18 20 0 0 0 0 0 3 1 2 100 100 100 100 100 100 100 100 100 95 100 100 95 90 85 95 100 85 95 95 100 95 90 90 85 90 80 90 100 0 0 0 0 0 10 15 15 5 10 10 15 10 20 10 Table Network statistics of artificial neural network for Full-Full Load condition No of neurons in hidden layer RMS error Training epochs No of data items Number right Number wrong %Right %Wrong 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 0.370129251 0.055610674 0.280682366 0.018595841 0.026193107 0.022802092 0.025799733 0.012463606 0.011833057 0.013218549 0.018665787 0.017412187 0.021766144 0.012926895 0.0141823 0.018431218 0.013437833 0.016866424 0.014887121 0.019861005 0.021069119 0.015897522 0.016628143 0.025395146 0.03568455 0.019084121 0.019347065 0.014214479 0.035910444 4058055 9257 127294 5901 7600 7823 6256 8693 5493 5556 4921 3669 11193 4601 4752 6134 6855 6694 3746 6709 5362 5693 5602 5929 13826 3802 4153 5605 22028 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 10 18 17 19 19 19 18 18 19 18 19 19 17 19 18 19 18 18 19 18 17 16 18 18 16 19 18 19 18 10 1 2 1 2 2 2 2 50 90 85 95 95 95 90 90 95 90 95 95 85 95 90 95 90 90 95 90 85 80 90 90 80 95 90 95 90 50 10 15 5 10 10 10 5 15 10 10 10 10 15 20 10 10 20 10 10 N Saravanan et al / Applied Soft Computing 10 (2010) 344–360 Vapnik [32] has shown that if the training features are separated without errors by an optimal hyper-plane, the expected error rate on a test sample is bounded by the ratio of the expectation of the support vectors to the number of training vectors The smaller the size of the support vector set more general the above result Further the generalization is independent of the dimension of the problem In case such a hyper-plane is not possible the next best is to minimize the number of misclassifications whilst maximizing the margin with respect to the correctly classified features Recently a much simpler classifier, proximal support vector machine, was implemented wherein each class of points is assigned to the closer of two parallel planes (in input or feature space) that are pushed apart as far as possible This formulation leads to a fast and simple algorithm for generating a classifier (linear or nonlinear) that is obtained by solving a single system of linear equations The point of departure from SVM is that, the optimization problem given by Eq (10) is replaced by the following problem: 1 n jjyjj2 ỵ wT w ỵ g ị 2 subject toDAw elị ỵ y ẳ e 357 Fig 14 RMS error vs number of neurons (11) ðw;g ;yÞ The geometrical interpretation of the above formulation is in Fig 12 Referring to Fig 13, y represents deviation (scaled by 1/jjwjj) of the point from the plane passing through the centroid of the data cluster (A+ or AÀ) to which the point belongs Hence, there is no non-negativity constraint on y Further the 2-norm of the error vector y is minimized instead of the 1-norm; the margin between the bounding planes is maximized with respect to both orientation w and relative location g to the origin Extensive computational experience indicates that the formulation (Eq (11)) is almost as good as the classical formulation (Eq (10)) with some added advantages such as strong convexity of the objective function The key idea in this formulation is to make computation simple, by replacing the inequality constraint by equality The modification, even though simple, changes the nature of optimization problem significantly An explicit exact solution can be written to the problem in terms of the problem data It is impossible to that in the previous formulations because of their combinatorial nature Geometrically the formulation obtained by (Eq (11)) can be interpreted as follows The planes xTw À g = Ỉ1 are not bounding planes anymore, but can be thought of as ‘‘proximal’’ planes, around which the points of each class are clustered and which are pushed as far apart as possible by the term wTw + g2 in the objective function; in fact this term is the reciprocal of the 2-norm distance squared between the two planes in the (w, g) space The interpretation, however, is not based on the idea of maximizing the margin, the distance between the bounding parallel planes, which is a key feature of support vector machines After training, for any the new set of features prediction of its class is possible using the decision function as given below which is Fig 15 Number of epochs vs number of neurons Fig 16 Methodology of classification using PSVM a function of ‘w’ and ‘g’ It is called testing: f xị ẳ signwT x g ị (12) If the value of f(x), as shown in (Eq (12)) is positive then new set of features belongs to class A+ otherwise it belongs to class AÀ Table Optimum values of no of neurons in hidden layer and corresponding efficiencies for different conditions Condition No of neuron in hidden layer Training epochs No of data items for testing Number right Number wrong %Right %Wrong Dry-No Load Dry-Full Load Half-No Load Half-Full Load Full-No Load Full-Full Load 28 12 13 6567 16571 18715 3056 6861 3669 20 20 20 20 20 20 19 20 19 20 20 19 1 0 95 100 95 100 100 95 5 0 N Saravanan et al / Applied Soft Computing 10 (2010) 344–360 358 Table 10 PSVM results for statistical features from Morlet coefficients S no.a Nu 0.1 À0.55 À0.5 À0.55 À0.1 0.1 À0.1 À0.5 À0.55 À0.55 À0.1 0.1 À0.1 0.1 0.2 À0.1 0.1 0.2 W G S no.a % 1.709 98 8.82 98 0.5 9.415 99 0.6 9.400 99 À4.458 0.3 1.696 100 0.4 0.0024 0.0079 À0.6125 À1.0344 À0.0035 À0.0119 0.9269 1.4925 À0.0033 À0.0114 0.879 1.4136 À5.777 98 8.350 99 7.910 100 0.0051 0.0389 À0.095 À2.8081 0.0009 0.0066 À0.0158 À0.4703 À0.0008 À0.0064 0.0119 0.4546 À0.0509 30 0.0034 40 0.2 92.5 0.07 À0.0011 À0.0053 0.2067 0.5357 À0.0054 À0.0257 0.999 2.6152 À0.0058 À0.0275 1.0658 2.8042 0.004 0.0249 0.9361 À2.8311 À0.0022 À0.013 À0.4191 1.4922 0.0009 0.0052 0.1796 À0.5934 À0.007 10 Nu 11 12 0.2 0.2 0.1 0.3 0.35 13 À0.002 À0.0095 À0.0212 1.2507 0.0016 0.008 0.0152 À1.0367 0.003 0.0147 0.024 À1.9117 À0.3965 0.2928 99 0.4 0.5129 100 0.5 0.0007 0.0032 À0.154 À0.3604 À0.0006 À0.0026 0.1311 0.299 À0.001 À0.0048 0.2305 0.5551 À1.2589 22.5 14 À0.55 15 0.3 0.1 0.9364 90 0.2 1.6806 97.5 0.3 W G À0.0016 À0.0107 À0.0874 0.2779 À0.0041 À0.0266 À0.1606 0.6525 À0.0049 À0.0318 À0.1772 0.7704 À0.7277 77.5 À1.4014 80 À1.561 95.5 À1.3327 92.5 À1.7426 95 À2.0527 97.5 0.0019 0.0073 À0.1922 À1.4833 0.0027 0.0101 À0.2512 À2.0828 0.0033 0.0126 À0.2972 À2.6164 À0.001 À0.0055 0.0933 0.3571 À0.0028 À0.0152 0.2136 1.0138 À0.0032 À0.0175 0.2361 1.1689 0.0074 0.0463 À0.1052 À4.4774 À0.0021 À0.0133 0.026 1.2515 À0.0008 À0.0048 0.0085 0.4563 0.0034 0.0128 À0.1414 À1.6839 0.0044 0.0166 À0.1773 À2.1749 0.0053 0.0201 À0.2095 À2.6373 À0.001 À0.005 0.105 0.5005 À0.0018 À0.0095 0.1905 0.9449 À0.0026 À0.0135 0.264 1.344 % 0.5804 80 1.3742 87.5 1.5126 95 0.06 2.5 À0.0271 97.5 À0.0092 95 À0.9302 90 À1.1684 92.5 À1.3786 95 0.6836 65 1.2619 90 1.7558 95 N Saravanan et al / Applied Soft Computing 10 (2010) 344–360 359 Table 10 (Continued ) S no.a Nu W 0.1 0.17 À0.1 0.05 0.1 À0.3 0.1 0.5 a S no.a % 80 1.177 87 0.02 1.302 95 0.03 À0.0009 À0.0042 À0.0513 0.8685 0.0004 0.0019 0.0231 À0.3941 0.0008 0.0037 0.0407 À0.7655 À0.612 20 0.246 95 0.15 0.460 95 0.16 À0.0192 À0.0826 3.906 5.9826 À0.0012 À0.0055 0.2019 0.4111 À0.0019 À0.0091 0.3062 0.6847 30.82 2.5 16 Nu 0.835 À0.0012 À0.0063 0.1074 0.2307 À0.0017 À0.0092 0.1493 0.3351 À0.0019 À0.0103 0.1647 0.3752 0.15 G 0.01 17 0.1 18 0.09 1.458 95 0.1 2.260 95 0.13 W G À0.0001 À0.0007 0.0049 0.0614 À0.0002 À0.0014 0.0094 0.1224 À0.0004 À0.0021 0.0139 0.183 % 0.0245 90 0.0488 95 0.073 100 0.0012 0.0047 À0.0662 À0.6439 0.0018 0.0069 À0.0911 À0.9438 0.0019 0.0073 À0.0956 À1.0023 À0.4424 67.5 À0.6178 87.5 À0.6499 95 À0.0008 À0.0042 0.0713 0.421 À0.0009 À0.0047 0.0786 0.4661 À0.0011 À0.006 0.1 0.5999 0.4674 92.5 0.5169 97.5 0.6625 100 The details of comparison of various conditions are given in Table 11 Classifying multiple classes is commonly performed by combining several binary SVM classifiers in a tournament manner, either oneagainst-all or one-against-one, the latter approach required substantially more computational effort [23] of six networks as mentioned above were created for classifying the faults namely dry lubrication-no load, dry lubrication-full load, half lubrication-no load, half lubrication-full load, full lubricationno load and full lubrication-full load Application of ANN for problem at hand and results Results of ANN For each faults namely good Bevel Gear, Bevel Gear with tooth breakage (GTB), Bevel Gear with crack at root of the tooth (GTC), and Bevel Gear with face wear of the teeth (TFW) for various loading and lubrication conditions, four feature vectors consisting of 100 feature value sets were collected from the experiment Twenty-five samples in each class were used for training and five samples are reserved for testing ANN Training was done by selecting three layers neural network, of that one is input layer, one hidden layer and one output layer The numbers of neurons in the hidden layer were varied and the values of number of neurons, RMS error and number of epochs along with percentage efficiency of classification of various faults using ANN are computed A total The architecture of the artificial neural network is as follows: Network type No of neurons in input layer No of neurons in hidden layer No of neurons in output layer Transfer function Training rule Training tolerance Learning rule Momentum learning step size Momentum learning rate Forward neural network trained with feed back propagation Varied from to 30 Sigmoid transfer function in hidden and output layer Back propagation 0.05 Momentum training method 0.9 0.9 Table 11 Various conditions of comparison Good-dry-No Load vs GTB-dry-No Load Good-dry-No Load vs GTC-dry-No Load Good-dry-No Load vs TFW-dry-No Load Good-dry-Full Load vs GTB-dry-Full Load Good-dry-Full Load vs GTC-dry-Full Load Good-dry-Full Load vs TFW-dry-Full Load Good-Half-No Load vs GTB-Half-No Load Good-Half-No Load vs GTC-Half-No Load Good-Half-No Load vs TFW-Half-No Load 10 11 12 13 14 15 16 17 18 Good-Half-Full Load vs GTB-Half-Full Load Good Half-Full Load vs GTC-Half-Full Load Good Half-Full Load vs TFW-Half-Full Load Good-Full-No Load vs GTB-Full-No Load Good-Full-No Load vs GTC-Full-No Load Good-Full-No Load vs TFW-Full-No Load Good-Full-Full Load vs GTB-Full-Full Load Good-Full-Full Load vs GTC-Full-Full Load Good-Full-Full Load vs TFW-Full-Full Load 360 N Saravanan et al / Applied Soft Computing 10 (2010) 344–360 The efficiency of classification of gear box faults using above networks are reported in Tables 3–8 for classifying the faults namely dry lubrication-no load, dry lubrication-full load, half lubrication-no load, half lubrication-full load, full lubrication-no load and full lubrication-full load respectively Figs 14 and 15 depicts the RMS error and number of epochs corresponding to different neurons in hidden layer for all the six networks respectively Initially the number neurons in hidden layer in each network were varied from to 30 and Table gives the optimal number of neurons required for training with minimum number of epochs for different classifications of faults The testing efficiencies of the most effective networks are also tabulated in Table The overall average efficiency of entire classification using ANN was found to be 97.5% [2] [3] [4] [5] [6] [7] Application of PSVM for problem at hand and results Fig 16 shows the methodology adopted for classification of various conditions of gear box using SVM and PSVM The efficiency of classification using PSVM was found to be 97% The feature vectors corresponding to good gear were compared with those of the faulty gears at different loading and lubrication levels, by taking each fault at a time For each condition feature vectors consisting of 100 feature value sets were collected from the experiment at 2000 rpm Seventy samples in each class were used for training and 30 samples are reserved for testing PSVM Training was done and the values of ‘w’ (Weights), ‘g’ (Gamma) and efficiency obtained for statistical features of Morlet coefficients using PSVM are tabulated Table 10 [8] [9] [10] [11] [12] [13] [14] [15] Discussion [16] (1) The use of Morlet wavelet and extraction of statistical features from the wavelet coefficients was found to be very efficient for classification using ANN and PSVM (2) Decision tree is a good tool in selecting the best features among the extracted feature vectors Standard error, Sample Variance, Kurtosis and Minimum value were found to be the most contributing features (3) In PSVM, the parameters w (Weight) and g (Gamma) define the separating plane These can be used for testing a new set of data and classifying the faults accordingly The efficiency for various conditions using PSVM for Morlet coefficients are shown in Table 10 (4) From Tables and 10 we can see that PSVM and ANN are very close in classification capability, but the time required for training and classification using ANN is more compared to PSVM technique and taking this into account the authors claim that PSVM has upper hand over ANN [17] [18] [19] [20] [21] [22] [23] [24] [25] 10 Conclusion Fault diagnosis of Gear box is one of the core research areas in the field of condition monitoring of rotating machines A comparative study of different classifying techniques and the ability of Morlet wavelet in feature extraction for bevel gear box fault detection was carried out It was found that PSVM has an edge over ANN in classification of features References [1] P Gadd, P.J Mitchell, Condition Monitoring of Helicopter Gearboxes Using Automatic Vibration Analysis Techniques, AGARD CP 369, Gears and [26] [27] [28] [29] [30] [31] [32] Power Transmission System for Helicopter Turboprops, 1984, pp 29/1– 129/10 J.F.A Leblanc, J.R.F Dube, B Devereux, Helicopter gearbox vibration analysis in the Canadian Forces—applications and lessons, in: Proceedings of the First International Conference, Gearbox Noise and Vibration, C404/023, IMechE, Cambridge, UK, 1990, pp 173–177 B.G Cameron, M.J Stuckey, A review of transmission vibration monitoring at Westland Helicopter Ltd., in: Proceedings of the 20th European Rotorcraft Forum, Paper 116, 1994, pp 16/1–116/16 A.K Nandi, Advanced digital vibration signal processing for condition monitoring, in: Proceedings of 13th Conference of Condition Monitoring and Diagnostic Engineering Management (COMADEM), Houston, TX, USA, (2000), pp 129– 143 W.J Wang, P.D Mcfadden, Early detection of gear failure by vibration analysis II Interpretation of the time-frequency distribution using image processing techniques, Mechanical Systems and Signal Processing (3) (1983) 205– 215 M.C Pan, P Sas, Transient analysis on machinery condition monitoring, in: International Conference on Signal Processing Proceedings, vol 2, ICSP, (1996), pp 1723–1726 P.C Russell, J Cosgrave, D Tomtsis, A Vourdas, L Stergioulas, G.R Jones, Extraction of information from acoustic vibration signals using Gabor transform type devices, Measurement Science and Technology (1998) 1282–1290 I.S Koo, W.W Kim, Development of reactor coolant pump vibration monitoring and a diagnostic system in the nuclear power plant, ISA Transactions 39 (2000) 309–316 A Francois, F Patrick, Improving the readability of time-frequency and time-scale representations by the reassignment method, IEEE Transactions on Signal Processing 43 (1995) 1068–1089 R.M Stewart, Some useful data analysis techniques for gearbox diagnosis, Technical report paper MHM/R/10/77, University of Southampton, Institute of Sound and Vibration, 1977 M.S Hundal, Mechanical signature analysis, Shock and Vibration Digest 18 (12) (1986) 3–10 C Cempel, Vibroacoustic Condition Monitoring, Ellis Horwood, Chichester, 1991 R.B Randall, A new method of modeling gear faults, Transactions of the ASME, Journal of Mechanical Design 104 (1982) 259–267 W.J Staszweski, et al., Report on the Application of the Signal Demodulation Procedure to the Detection of Broken and Cracked Teeth Utilizing the Pyestock FZG Spur Gear Test Rig, Dynamics and Control Research Group, Department of Engineering, University of Manchester, 1992 P.D McFadden, Detecting fatigue cracks in gears by amplitude and phase demodulation of the meshing vibration, Transactions of the ASME, Journal of Vibration, Acoustics, Stress and Reliability in Design 108 (1986) 165–170 P Flandrin, et al., Some of Aspects of Non-stationary Signal Processing with Emphasis on Time-frequency and Time-scale Methods, Wavelets, Time-frequency Methods and Phase Space, Springer-Verlag, Berlin, 1989 M.B Priestly, Non-linear and Non-stationary Time Series Analysis, Academic Press, London, 1988 L Cohen, Time-frequency distributions—a review, Proceedings of the IEEE 77 (7) (1989) 941–981 V Sugumaran, V Muralidharan, K.I Ramachandran, Feature selection using Decision Tree and classification through Proximal Support Vector Machine for fault diagnostics of roller bearing, Mechanical Systems and Signal Processing 21 (2006) 930–942 B Samanta, K.R Al-Baulshi, Artificial neural network based fault diagnostics of rolling element bearings using time domain features, Mechanical Systems and Signal Processing 17 (2) (2003) 317–328 C.J.C Burgess, A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery (1998) 955–974 C.H.K Chui, Wavelets Analysis and its Applications, vol 1: An Introduction to Wavelets, Academic Press, Boston, 1992 C.H.K Chui, Wavelets Analysis and its Applications, vol 2: A Tutorial in Theory and Applications, Academic Press, Boston, 1992 W.J Wang, P.D Mcfadden, Early detection of gear failure by vibration analysis II Interpretation of the time-frequency distribution using image processing techniques, Mechanical Systems and Signal Processing (3) (1983) 205–215 O Petrille, B Paya, I.I Esat, M.N.M Badi, in: Proceedings of the Energy-sources Technology Conference and Exhibition: Structural Dynamics and Vibration PDvol 70, 1995, p 97 D Boulahbal, M.F Golnaraghi, F Ismail, in: Proceedings of the DETC’97, 1997, ASME Design Engineering Technical Conference DETC97/VIB-4009, 1997 F.A Andrade, et al., A new approach to time-domain vibration condition monitoring: Gear tooth fatigue crack detection and identification by the Kolmogorov– Smirnov test, Journal of Sound and vibration 240 (5) (2001) 909–919 R.A Collacott, Mechanical Fault Diagnosis and Condition Monitoring, Chapman & Hall, 1977 Mallat, A Wavelet Tour of Signal Processing, Academic Press, 1998 K.P Soman, K.I Ramachandran, Insight into Wavelets from Theory to Practice, Prentice-Hall of India Private Limited, 2005 Y.H Peng, P.A Flach, P Brazdil, C Soares, Decision tree-based data characterization for meta-learning, in: ECML/PKDD-2002 Workshop IDDM-2002, Helsinki, Finland, 2002 V.N Vapnik, The Nature of Statistical Learning Theory, 2nd edn., SpringerVerlag, 1999, pp 138–146 ... This work deals with extraction of wavelet features from the vibration data of a bevel gear box system and classification of Gear faults using artificial neural network (ANN) and proximal support... the bevel gear box by means of a belt drive The gear box is of dimension Fig Fault simulator setup N Saravanan et al / Applied Soft Computing 10 (2010) 344–360 346 Table Details of faults under... number of epochs along with percentage efficiency of classification of various faults using ANN are computed A total The architecture of the artificial neural network is as follows: Network type No of

Định dạng
Số trang	17
Dung lượng	1,41 MB