Figure 15.18 (continued)

density type). We can consider this colorization process an additional rule, based directly on the detected edge pixel brightness. Indeed, we tested DFC on all the normal cases of the database, and none of them revealed suspicious structures with white or green colors. We are not claiming that the colorization scheme can be used to classify mammograms, but we can use it as one associative rule for mining abnormal mammograms. This is the subject of our current research program, in which we are constructing a data mining technique, based on association rules extracted from our DFC method, for categorizing mammograms. For the purpose of mammogram categorization, we also intend to use other measures besides our DFC association rules, such as the brightness, mean, variance, skewness and kurtosis of the DFC segmented image. These measures have been reported to have some success in identifying abnormal mammograms [32]. Moreover, we experimented with all the edge detection techniques listed in Table 15.1 and found that no single method is as effective as our current DFC method [33]. Figure 15.19 illustrates some comparisons for the stellate cancer image of this article.

Table 15.1 Traditional mammography techniques.

Mammography method                    Reference
Gray-level thresholding               [23]
Comparing left and right breasts      [24]
Compass filters                       [25]
Laplacian transform                   [11]
Gaussian filters                      [12]
Texture and fractal texture model     [17,19]
A space scale approach                [26]
Wavelet techniques                    [27]
Mathematical morphology               [28]
Median filtering                      [29]
Box-rim method                        [30]

Figure 15.19 Results of other traditional edge detection techniques. (a) The Kirsch edge detection technique; (b) the Laplacian edge detection technique; (c) Sobel edge detection with contrast enhancement; (d) Prewitt edge detection with contrast enhancement; (e) Cafforio edge detection with inner window = 3 and outer window = 21; (f) Canny edge detection with parameter 0.045; (g) Canny edge detection with parameter 0.065.

7. Conclusions

In this chapter we identified three primitive routes for region identification, based on convolution, thresholding and morphology. Experiments on medical images show that none of these routes is capable of clearly identifying the region edges. Better recognition can be achieved by hybridizing the primitive techniques in a pipelining sequence. Although many region identification pipelines can be constructed that clearly identify regions of interest, such a hybrid technique remains valid only as long as the image characteristics remain static. The problem with most medical images is that their characteristics vary greatly, even for a single type of imaging device. With this in mind, we proposed a new fuzzy membership function that transforms the intensity of a pixel into the fuzzy domain according to the direction of the brightness slope in its neighboring transition zone. A new intensification operator based on a polygon is introduced for determining the corrected intensity value of any pixel. The membership fuzzification classifier dynamically evaluates every pixel's brightness, optimizing its contrast according to the neighboring pixels. The method needs no preprocessing or training, and it preserves the brightness character of the original image in the segmented image.
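To make the fuzzify-intensify-defuzzify pattern concrete, the following is a minimal sketch, not the DFC implementation itself: it substitutes a plain linear membership function over the local intensity range and the classical quadratic intensification (INT) operator for the slope-directed membership function and the polygon-based operator described above.

```python
import numpy as np

def fuzzify(window):
    """Map pixel intensities of a neighborhood to [0, 1] relative to its local range."""
    lo, hi = window.min(), window.max()
    if hi == lo:                      # flat region: no transition zone to sharpen
        return np.zeros_like(window, dtype=float)
    return (window.astype(float) - lo) / (hi - lo)

def intensify(mu, passes=1):
    """Classical contrast intensification: push memberships away from 0.5."""
    for _ in range(passes):
        mu = np.where(mu <= 0.5, 2.0 * mu**2, 1.0 - 2.0 * (1.0 - mu)**2)
    return mu

# Toy 3x3 neighborhood straddling a brightness transition.
win = np.array([[ 90,  95, 100],
                [100, 128, 160],
                [170, 180, 190]], dtype=np.uint8)
mu = intensify(fuzzify(win), passes=2)
# Defuzzify back to the local intensity range.
corrected = (win.min() + mu * (win.max() - win.min())).astype(np.uint8)
print(corrected)
```

Each intensification pass steepens the membership curve, which is what lets a transition-zone pixel be pushed decisively toward the dark or bright side of an edge.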
The DFC method has been tested on a medical mammography database and has been shown to be effective for detecting abnormal breast regions. In comparisons with the traditional edge detection techniques, our current DFC method shows significant abnormality details, whereas many other methods (e.g. Kirsch, Laplacian) revealed irrelevant edges as well as extra noise. For Sobel and Prewitt, the original image becomes completely black; with contrast enhancement, Sobel and Prewitt still show extra edges and noise. With simpler edge detection techniques such as the Cafforio method [10], the result is completely filled with noise. Moreover, we believe that our DFC technique can be used to generate association rules for mining abnormal mammograms; this is left to our future research work. Finally, we are currently developing global measures for assessing the coherence of our DFC method in comparison with other techniques such as the Canny or Gabor GEF filters. This will enable us to determine quantitatively the quality of the developed technique. We aim, in this area, to benefit from the experience of other researchers such as Mike Brady (http://www.robots.ox.ac.uk/∼mvl/).

References

[1] Shiffman, S., Rubin, G. and Napel, S. "Medical Image Segmentation using Analysis of Isolable-Contour Maps," IEEE Transactions on Medical Imaging, 19(11), pp. 1064–1074, 2000.
[2] Horn, B. K. P. Robot Vision, MIT Press, Cambridge, MA, USA, 1986.
[3] Pal, N. and Pal, S. "A Review on Image Segmentation Techniques," Pattern Recognition, 26, pp. 1277–1294, 1993.
[4] Batchelor, B. and Waltz, F. Interactive Image Processing for Machine Vision, Springer Verlag, New York, 1993.
[5] Gonzalez, R. and Woods, R. Digital Image Processing, 2nd Edition, Addison-Wesley, 2002.
[6] Parker, J. R. Algorithms for Image Processing and Computer Vision, Wiley Computer Publishing, 1997.
[7] Canny, J. "A Computational Approach to Edge Detection," IEEE Transactions on PAMI, 8(6), pp. 679–698, 1986.
[8] Elvins, T. T. "Survey of Algorithms for Volume Visualization," Computer Graphics, 26(3), pp. 194–201, 1992.
[9] Mohammed, S., Yang, L. and Fiaidhi, J. "A Dynamic Fuzzy Classifier for Detecting Abnormalities in Mammograms," The 1st Canadian Conference on Computer and Robot Vision CRV2004, May 17–19, 2004, University of Western Ontario, Ontario, Canada, 2004.
[10] Cafforio, C., di Sciascio, E., Guaragnella, C. and Piscitelli, G. "A Simple and Effective Edge Detector," Proceedings of ICIAP'97, in Del Bimbo, A. (Ed.), Lecture Notes on Computer Science, 1310, pp. 134–141, 1997.
[11] Hingham, R. P., Brady, J. M. et al. "A quantitative feature to aid diagnosis in mammography," Third International Workshop on Digital Mammography, Chicago, June 1996.
[12] Costa, L. F. and Cesar, R. M. Junior, Shape Analysis and Classification: Theory and Practice, CRC Press, 2000.
[13] Liang, L. R. and Looney, C. G. "Competitive Fuzzy Edge Detection," International Journal of Applied Soft Computing, 3(2), pp. 123–137, 2003.
[14] Looney, C. G. "Nonlinear rule-based convolution for refocusing," Real Time Imaging, 6, pp. 29–37, 2000.
[15] Looney, C. G. Pattern Recognition Using Neural Networks, Oxford University Press, New York, 1997.
[16] Looney, C. G. "Radial basis functional link nets and fuzzy reasoning," Neurocomputing, 48(1–4), pp. 489–509, 2002.
[17] Guillemet, H., Benali, H., et al.
"Detection and characterization of micro calcifications in digital mammography," Third International Workshop on Digital Mammography, Chicago, June 1996.
[18] Russo, F. and Ramponi, G. "Fuzzy operator for sharpening of noisy images," IEE Electronics Letters, 28, pp. 1715–1717, 1992.
[19] Undrill, P., Gupta, R. et al. "The use of texture analysis and boundary refinement to delineate suspicious masses in mammography," SPIE Image Processing, 2710, pp. 301–310, 1996.
[20] Tizhoosh, H. R. Fuzzy Image Processing, Springer Verlag, 1997.
[21] van der Zwaag, B. J., Slump, K. and Spaanenburg, L. "On the analysis of neural networks for image processing," in Palade, V., Howlett, R. J. and Jain, L. C. (Eds), Proceedings of the Seventh International Conference on Knowledge-Based Intelligent Information and Engineering Systems (KES'2003), Part II, volume 2774 of Springer LNCS/LNAI, pp. 950–958, Springer Verlag, 2003.
[22] Mammographic Image Analysis Society (MIAS), http://www.wiau.man.ac.uk/services/MIAS/MIASweb.html.
[23] Davies, D. H. and Dance, D. R. "Automatic computer detection of subtle calcifications in radiographically dense breasts," Physics in Medicine and Biology, 37(6), pp. 1385–1390, 1992.
[24] Giger, M. L. "Computer-aided diagnosis," Syllabus: 79th Scientific Assembly of the Radiological Society of North America, pp. 283–298, 1993.
[25] Maxwell, B. A. and Brubaker, S. J. "Texture Edge Detection Using the Compass Operator," British Machine Vision Conference, 2003.
[26] Netsch, T. "Detection of micro calcification clusters in digital mammograms: A space scale approach," Third International Workshop on Digital Mammography, Chicago, June 1996.
[27] McLeod, G., Parkin, G., et al. "Automatic detection of clustered microcalcifications using wavelets," Third International Workshop on Digital Mammography, Chicago, June 1996.
[28] Neto, M. B., Siqueira, U. W. N. et al. "Mammographic calcification detection by mathematical morphology methods," Third International Workshop on Digital Mammography, Chicago, June 1996.
[29] Bovik, A. C. et al. "The effect of median filtering on edge estimation and detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-9, pp. 181–194, 1987.
[30] Bazzani, A. et al. "System For Automatic Detection of Clustered Microcalcifications in Digital Mammograms," International Journal of Modern Physics C, 11(5), pp. 1–12, 2000.
[31] Halls, S. B., MD, http://www.halls.md/breast/density.htm, November 10, 2003.
[32] Antonie, M.-L., Zaïane, O. R. and Coman, A. "Application of Data Mining Techniques for Medical Image Classification," International Workshop on Multimedia Data Mining MDM/KDD2001, San Francisco, August 26, 2001.
[33] Mohammed, S., Fiaidhi, J. and Yang, L. "Morphological Analysis of Mammograms using Visualization Pipelines," Pakistan Journal of Information & Technology, 2(2), pp. 178–190, 2003.

16

Feature Extraction and Compression with Discriminative and Nonlinear Classifiers and Applications in Speech Recognition

Xuechuan Wang
INRS-EMT, University of Quebec, 800 de la Gauchetière West, Montreal, Quebec H5A 1K6, Canada

Feature extraction is an important component of a pattern classification system. It performs two tasks: transforming an input parameter vector into a feature vector and/or reducing its dimensionality. A well-defined feature extraction algorithm makes the classification process more effective and efficient. Two popular feature extraction methods are Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA).
The Minimum Classification Error (MCE) training algorithm, which was originally proposed as a discriminative classifier, provides an integrated framework for feature extraction and classification. The Support Vector Machine (SVM) is a recently developed pattern classification algorithm, which uses nonlinear kernel functions to achieve nonlinear decision boundaries in the parametric space. In this chapter, the frameworks of LDA, PCA, MCE and SVM are first introduced. An integrated feature extraction and classification algorithm, the Generalized MCE (GMCE) training algorithm, is then discussed. Improvements in the performance of MCE and SVM classifiers through feature extraction are demonstrated on both the Deterding vowel database and the TIMIT continuous speech database.

1. Introduction

Pattern recognition deals with the mathematical and technical aspects of classifying different objects through their observable information, such as the gray levels of pixels for an image, the energy levels in the frequency domain for a waveform, or the percentage of certain contents in a product. The objective of pattern recognition is achieved in a three-step procedure, as shown in Figure 16.1. The observable information of an unknown object is first transduced into signals that can be analyzed by computer systems. Parameters and/or features suitable for classification are then extracted from the collected signals. The extracted parameters or features are classified in the final step based on some measure over the class models, such as a distance, likelihood or Bayesian measure.

Figure 16.1 A typical pattern recognition procedure (observation, transduction, parameter and/or feature extraction, classification).

Conventional pattern recognition systems have two components: feature analysis and pattern classification, as shown in Figure 16.2.

Figure 16.2 A conventional pattern recognition system (parameter extraction and feature extraction form the feature analysis stage; the pattern classifier, with class models Λ, performs classification).

Feature analysis is achieved in two steps: the parameter extraction step and the feature extraction step. In the parameter extraction step, information relevant to pattern classification is extracted from the input data x(t) in the form of a p-dimensional parameter vector x. In the feature extraction step, the parameter vector x is transformed to a feature vector y, which has a dimensionality m (m ≤ p). If the parameter extractor is properly designed, so that the parameter vector x is matched to the pattern classifier and its dimensionality is low, then there is no need for the feature extraction step. In practice, however, parameter vectors are not directly suitable for pattern classifiers. For example, speech signals, which are time-varying signals, have time-invariant components and may be mixed up with noise. The time-invariant components and noise increase the correlation between parameter vectors and degrade the performance of pattern classification systems. The parameter vectors therefore have to be decorrelated before being applied to a classifier based on Gaussian mixture models (with diagonal variance matrices), as the sketch below illustrates.
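As an illustration of that decorrelation step (a minimal sketch, not the chapter's parameter extractor), rotating the parameter vectors into the eigenbasis of their sample covariance removes the second-order correlation that a diagonal-covariance Gaussian model cannot represent:

```python
import numpy as np

def decorrelate(X):
    """Rotate parameter vectors into the eigenbasis of their covariance,
    so that the transformed components are (second-order) uncorrelated."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    # eigh returns eigenvalues in ascending order for a symmetric matrix
    _, eigvecs = np.linalg.eigh(cov)
    return (X - mean) @ eigvecs

# Correlated 2-D "parameter vectors" drawn from a Gaussian with off-diagonal covariance.
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0], [[4.0, 1.9], [1.9, 1.0]], size=500)
Y = decorrelate(X)
print(np.round(np.cov(Y, rowvar=False), 3))   # off-diagonal entries ~ 0
```

After the rotation, the off-diagonal covariance entries vanish up to sampling noise, so a diagonal variance matrix becomes a much better fit to the transformed vectors.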
Furthermore, the dimensionality of parameter vectors is normally very high and needs to be reduced to lower the computational cost and system complexity. For these reasons, feature extraction has been an important problem in pattern recognition tasks. Feature extraction can be conducted independently or jointly with either parameter extraction or classification. LDA and PCA are the two most popular independent feature extraction methods. Both extract features by projecting the original parameter vectors into a new feature space through a linear transformation matrix, but they optimize the transformation matrix with different intentions. PCA optimizes the transformation matrix by finding the largest variations in the original feature space [1–3]. LDA pursues the largest ratio of between-class variation to within-class variation when projecting the original features to a subspace [4–6]. The drawback of independent feature extraction algorithms is that their optimization criteria differ from the classifier's minimum classification error criterion, which may cause inconsistency between the feature extraction and classification stages of a pattern recognizer and consequently degrade the performance of classifiers [7].

A direct way to overcome this problem is to conduct feature extraction and classification jointly with a consistent criterion. The MCE training algorithm [7–9] provides such an integrated framework, as shown in Figure 16.3. It is a type of discriminant analysis, but it achieves a minimum classification error directly when extracting features. This direct relationship has made the MCE training algorithm widely popular in a number of pattern recognition applications, such as dynamic time-warping based speech recognition [10,11] and Hidden Markov Model (HMM) based speech and speaker recognition [12–14].

Figure 16.3 An integrated feature extraction and classification system (a single feature extractor and classifier, with class models Λ and transformation matrix T).

The MCE training algorithm is a linear classification algorithm, whose decision boundaries are straight lines. The advantage of linear classification algorithms is their simplicity and computational efficiency. However, linear decision boundaries have little flexibility and are unable to handle data sets with concave distributions. SVM is a recently developed pattern classification algorithm with a nonlinear formulation. It is based on the idea that a classification function expressed in terms of dot-products can be computed efficiently in higher dimensional feature spaces, since those dot-products can be evaluated in the original space through kernel functions [15–17]. Classes which are not linearly separable in the original parametric space can be linearly separated in the higher dimensional feature space, as the sketch at the end of this introduction illustrates. Because of this, SVM has the advantage that it can handle classes with complex nonlinear decision boundaries. SVM has now evolved into an active area of research [18–21].

This chapter first introduces the major feature extraction methods, LDA and PCA. The MCE algorithm for integrated feature extraction and classification and the nonlinear formulation of SVM are then introduced. Feature extraction and compression with MCE and SVM are discussed subsequently. The performances of these feature extraction and classification algorithms are compared and discussed based on experimental results on the Deterding vowel and TIMIT continuous speech databases.
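A small sketch makes the kernel point concrete: on a concave two-circles problem, a linear boundary fails, while an RBF-kernel SVM, which works in an implicit higher dimensional space, separates the classes. This is an illustration only, not the chapter's experimental setup; scikit-learn's SVC and the synthetic make_circles data are assumptions introduced here for demonstration.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# A concave, radially separable two-class problem: no straight line separates it.
X, y = make_circles(n_samples=400, noise=0.08, factor=0.4, random_state=0)

linear = SVC(kernel='linear').fit(X, y)
rbf = SVC(kernel='rbf', gamma=2.0).fit(X, y)   # kernel trick: implicit high-dim space

print('linear SVM accuracy:', linear.score(X, y))   # near chance level
print('RBF SVM accuracy:   ', rbf.score(X, y))      # near 1.0
```

The RBF kernel never constructs the high-dimensional feature vectors explicitly; it only evaluates their dot-products in the original space, which is what keeps the nonlinear classifier computationally tractable.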
2. Standard Feature Extraction Methods

2.1 Linear Discriminant Analysis

The goal of linear discriminant analysis is to separate the classes by projecting class samples from the p-dimensional space onto finely oriented lines. For a K-class problem, m = \min(K - 1, p) different lines are involved, so the projection is from a p-dimensional space to an m-dimensional space [22].

Suppose we have K classes, X_1, X_2, \ldots, X_K. Let the ith observation vector from class X_j be x_{ji}, where j = 1, \ldots, K and i = 1, \ldots, N_j, with N_j the number of observations from class j. The within-class covariance matrix S_w and the between-class covariance matrix S_b are defined as:

S_w = \sum_{j=1}^{K} S_j = \sum_{j=1}^{K} \frac{1}{N_j} \sum_{i=1}^{N_j} (x_{ji} - \mu_j)(x_{ji} - \mu_j)^T    (16.1)

S_b = \sum_{j=1}^{K} N_j (\mu_j - \mu)(\mu_j - \mu)^T

where \mu_j = \frac{1}{N_j} \sum_{i=1}^{N_j} x_{ji} is the mean of class j and \mu = \frac{1}{N} \sum_{i=1}^{N} x_i is the global mean. The projection from the observation space to the feature space is accomplished by a linear transformation matrix T:

y = T^T x    (16.2)

The corresponding within-class and between-class covariance matrices in the feature space are:

\tilde{S}_w = \sum_{j=1}^{K} \frac{1}{N_j} \sum_{i=1}^{N_j} (y_{ji} - \tilde{\mu}_j)(y_{ji} - \tilde{\mu}_j)^T    (16.3)

\tilde{S}_b = \sum_{j=1}^{K} N_j (\tilde{\mu}_j - \tilde{\mu})(\tilde{\mu}_j - \tilde{\mu})^T

where \tilde{\mu}_j = \frac{1}{N_j} \sum_{i=1}^{N_j} y_{ji} and \tilde{\mu} = \frac{1}{N} \sum_{i=1}^{N} y_i. It is straightforward to show that:

\tilde{S}_w = T^T S_w T, \quad \tilde{S}_b = T^T S_b T    (16.4)

A linear discriminant is then defined as the linear function for which the objective function

J(T) = \frac{|\tilde{S}_b|}{|\tilde{S}_w|} = \frac{|T^T S_b T|}{|T^T S_w T|}    (16.5)

is maximal. It can be shown that the solution of Equation (16.5) is that the ith column of an optimal T is the generalized eigenvector corresponding to the ith largest eigenvalue of the matrix S_w^{-1} S_b [6].
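The construction in Equations (16.1)–(16.5) translates directly into a few lines of linear algebra. The sketch below is an illustrative implementation, not code from the chapter; the toy Gaussian data and the helper name lda_transform are assumptions made for the example.

```python
import numpy as np

def lda_transform(X, labels, m):
    """Compute the LDA transformation T from Equations (16.1)-(16.5).
    X: (N, p) observations; labels: (N,) class ids; m: target dimension."""
    classes = np.unique(labels)
    p = X.shape[1]
    mu = X.mean(axis=0)                                  # global mean
    Sw = np.zeros((p, p))
    Sb = np.zeros((p, p))
    for c in classes:
        Xc = X[labels == c]
        mu_c = Xc.mean(axis=0)
        D = Xc - mu_c
        Sw += D.T @ D / len(Xc)                          # Eq. (16.1), within-class
        d = (mu_c - mu)[:, None]
        Sb += len(Xc) * (d @ d.T)                        # Eq. (16.1), between-class
    # Columns of T: generalized eigenvectors of Sw^{-1} Sb, largest eigenvalues first.
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs.real[:, order[:m]]

# Toy usage: three Gaussian classes in 4-D, projected to min(K - 1, p) = 2 dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, size=(50, 4)) for c in (0.0, 2.0, 4.0)])
labels = np.repeat([0, 1, 2], 50)
T = lda_transform(X, labels, m=2)
Y = X @ T          # Eq. (16.2): y = T^T x per vector; X @ T for row-stacked data
print(Y.shape)     # (150, 2)
```

For the three-class toy data, min(K - 1, p) = 2, matching the dimensionality bound stated at the start of this section.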
[...]

                 [?] (%)   MCE (%)   LDA (%)   PCA (%)
Vowels (Train)   85.6      99.1      97.7      97.7
Vowels (Test)    53.7      55.8      51.3      49.1

Figure 16.7 Results of MCE(UNIT), GMCE + LDA and LDA on the Deterding database (panels (a) and (b): recognition rate (%) against feature dimension, 2–10).

Figure 16.9 Results of GMCE + LDA, LDA, SVM and RDSVM on the Deterding database (panels (a) and (b): recognition rate (%) against feature dimension, 2–10).

Table 16.5 Number of observations of selected phonemes in training and testing sets.

Phoneme   Training   Testing
aa        541        176
ae        665        214
ah        313        136
ao        445        168
aw        126        40
ax        207        89
ay        395        131
eh        591        225
oy        118        49
uh        57         21
el        145        42
en        97         34
er        384        135
ey        346        116
ih        697        239
ix        583        201
iy        1089       381
ow        336        116
uw        106        37
Total     7241       2550

References

[4] Bocchieri, E. L. and Wilpon, J. G. "Discriminative analysis for feature reduction in automatic speech recognition," Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, 1, pp. 501–504, 1992.
[5] Poston, W. L. and Marchette, D. J. "Recursive Dimensionality Reduction Using Fisher's Linear Discriminant," Pattern Recognition, 31(7), pp. 881–888, 1998.
[6] Sun, D. … "Feature Extractor and Pattern Classifier Using the Minimum Classification Error Training Algorithm," Proceedings of IEEE Workshop on Neural Networks for Signal Processing, Boston, USA, pp. 67–76, 1995.
[10] Chang, P. C., Chen, S. H. and Juang, B. H. "Discriminative Analysis of Distortion Sequences in Speech Recognition," Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, …
[11] Komori, I. and Katagiri, S. "GPD Training of Dynamic Programming-based Speech Recognizer," Journal of Acoustical Society of Japan(E), 13(6), pp. 341–349, 1992.
[12] Chou, W., Juang, B. H. and Lee, C. H. "Segmental GPD Training of HMM-based Speech Recognizer," Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, 1, pp. 473–476, 1992.
[13] Liu, C. S., Lee, C. H., Chou, W. and Juang, B. H. "Minimum Error Discriminative Training for Speaker Recognition," Journal of Acoustical Society of America, 97(1), pp. 637–648, 1995.
[14] Rainton, D. and Sagayama, S. "Minimum Error Classification Training of HMMs: Implementation Details and Experimental Results," Journal of Acoustical Society of Japan(E), 13(6), pp. 379–387, 1992.
[15] Boser, B. E., Guyon, I. M. and Vapnik, V. "A training algorithm for optimal margin classifiers," Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pp. 144–152, 1992.
[23] … Proceedings in Visual Communication and Image Processing, 1001, pp. 1070–1077, 1988.
[24] Burges, C. J. C. "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, 2(2), pp. 955–974, 1998.
[25] Vanderbei, R. J. "LOQO: An interior point code for quadratic programming," Optimization Methods and Software, 11, pp. 451–484, 1999.
[26] Clarkson, P. and Moreno, P. J. "On the use of …
[29] … Doddington, G. and Goudie-Mashall, K. "The DARPA speech recognition research database: specifications and status," Proceedings DARPA Speech Recognition Workshop, pp. 93–99, 1986.
[30] Sankar, A. and Mammone, R. "Neural tree networks," in Mammone, R. and Zeevi, Y. (Eds), Neural Networks: Theory and Applications, Academic Press, pp. 281–302, 1991.
[31] Tsoi, A. C. and Pearson, T. "Comparison of the three classification techniques, …