
Elsevier Academic Press - Pattern Recognition - 2nd Edition - 2003




DOCUMENT INFORMATION

Basic information

Format
Number of pages: 710
File size: 17.62 MB

Content

ELSEVIER ACADEMIC PRESS

PATTERN RECOGNITION
SECOND EDITION

SERGIOS THEODORIDIS
Department of Informatics and Telecommunications, University of Athens, Greece

KONSTANTINOS KOUTROUMBAS
Institute of Space Applications & Remote Sensing, National Observatory of Athens, Greece

ACADEMIC PRESS
AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE SYDNEY TOKYO
Academic Press is an imprint of Elsevier

This book is printed on acid-free paper.

Copyright 2003, Elsevier (USA). All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.

Permissions may be sought directly from Elsevier's Science & Technology Rights Department in Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333, e-mail: permissions@elsevier.co.uk. You may also complete your request on-line via the Elsevier homepage (http://elsevier.com), by selecting "Customer Support" and then "Obtaining Permissions."

ACADEMIC PRESS, An imprint of Elsevier, 525 B Street, Suite 1900, San Diego, CA 92101-4495, USA (http://www.academicpress.com)
Academic Press, 84 Theobald's Road, London WC1X 8RR, UK (http://www.academicpress.com)

Library of Congress Control Number: 2002117797
International Standard Book Number: 0-12-685875-6

PRINTED IN THE UNITED STATES OF AMERICA
03 04 05 06 07 08

CONTENTS

Preface xiii

CHAPTER 1: INTRODUCTION
1.1 Is Pattern Recognition Important?
1.2 Features, Feature Vectors, and Classifiers
1.3 Supervised Versus Unsupervised Pattern Recognition
1.4 Outline of the Book

CHAPTER 2: CLASSIFIERS BASED ON BAYES DECISION THEORY 13
2.1 Introduction 13
2.2 Bayes Decision Theory 13
2.3 Discriminant Functions and Decision Surfaces 19
2.4 Bayesian Classification for Normal Distributions 20
2.5 Estimation of Unknown Probability Density Functions 27
2.5.1 Maximum Likelihood Parameter Estimation 28
2.5.2 Maximum a Posteriori Probability Estimation 31
2.5.3 Bayesian Inference 32
2.5.4 Maximum Entropy Estimation 34
2.5.5 Mixture Models 35
2.5.6 Nonparametric Estimation 39
2.6 The Nearest Neighbor Rule 44

CHAPTER 3: LINEAR CLASSIFIERS 55
3.1 Introduction 55
3.2 Linear Discriminant Functions and Decision Hyperplanes 55
3.3 The Perceptron Algorithm 57
3.4 Least Squares Methods 65
3.4.1 Mean Square Error Estimation 65
3.4.2 Stochastic Approximation and the LMS Algorithm 68
3.4.3 Sum of Error Squares Estimation 70
3.5 Mean Square Estimation Revisited 72
3.5.1 Mean Square Error Regression 72
3.5.2 MSE Estimates Posterior Class Probabilities 73
3.5.3 The Bias-Variance Dilemma 76
3.6 Support Vector Machines 77
3.6.1 Separable Classes 77
3.6.2 Nonseparable Classes 82

CHAPTER 4: NONLINEAR CLASSIFIERS 93
4.1 Introduction 93
4.2 The XOR Problem 93
4.3 The Two-Layer Perceptron 94
4.3.1 Classification Capabilities of the Two-Layer Perceptron 98
4.4 Three-Layer Perceptrons 101
4.5 Algorithms Based on Exact Classification of the Training Set 102
4.6 The Backpropagation Algorithm 104
4.7 Variations on the Backpropagation Theme 112
4.8 The Cost Function Choice 115
4.9 Choice of the Network Size 118
4.10 A Simulation Example 124
4.11 Networks With Weight Sharing 126
4.12 Generalized Linear Classifiers 127
4.13 Capacity of the l-Dimensional Space in Linear Dichotomies 129
4.14 Polynomial Classifiers 131
4.15 Radial Basis Function Networks 133
4.16 Universal Approximators 137
4.17 Support Vector Machines: The Nonlinear Case 139
4.18 Decision Trees 143
4.18.1 Set of Questions 146
4.18.2 Splitting Criterion 146
4.18.3 Stop-Splitting Rule 147
4.18.4 Class Assignment Rule 147
4.19 Discussion 150
CHAPTER 5: FEATURE SELECTION 163
5.1 Introduction 163
5.2 Preprocessing 164
5.2.1 Outlier Removal 164
5.2.2 Data Normalization 165
5.2.3 Missing Data 165
5.3 Feature Selection Based on Statistical Hypothesis Testing 165
5.3.1 Hypothesis Testing Basics 166
5.3.2 Application of the t-Test in Feature Selection 171
5.4 The Receiver Operating Characteristics (ROC) Curve 173
5.5 Class Separability Measures 174
5.5.1 Divergence 174
5.5.2 Chernoff Bound and Bhattacharyya Distance 177
5.5.3 Scatter Matrices 179
5.6 Feature Subset Selection 181
5.6.1 Scalar Feature Selection 182
5.6.2 Feature Vector Selection 183
5.7 Optimal Feature Generation 187
5.8 Neural Networks and Feature Generation/Selection 191
5.9 A Hint on the Vapnik-Chervonenkis Learning Theory 193

CHAPTER 6: FEATURE GENERATION I: LINEAR TRANSFORMS 207
6.1 Introduction 207
6.2 Basis Vectors and Images 208
6.3 The Karhunen-Loève Transform 210
6.4 The Singular Value Decomposition 215
6.5 Independent Component Analysis 219
6.5.1 ICA Based on Second- and Fourth-Order Cumulants 221
6.5.2 ICA Based on Mutual Information 222
6.5.3 An ICA Simulation Example 226
6.6 The Discrete Fourier Transform (DFT) 226
6.6.1 One-Dimensional DFT 227
6.6.2 Two-Dimensional DFT 229
6.7 The Discrete Cosine and Sine Transforms 230
6.8 The Hadamard Transform 231
6.9 The Haar Transform 233

Appendix D: BASIC DEFINITIONS FROM LINEAR SYSTEMS THEORY (excerpt)

...input and output sequences of a linear time-invariant system. Then (D.2) is shown to be equivalent to

Y(z) = H(z)X(z)   (D.9)

If the unit circle is in the region of convergence of the respective z-transforms (for example, for causal FIR systems), then for z = exp(-jω) we obtain the equivalent Fourier transform relation

Y(ω) = H(ω)X(ω)   (D.10)

If the impulse response of a linear time-invariant system is delayed by r samples, for example, to make it causal in case it is noncausal, the transfer function of the delayed system is given by z^(-r) H(z).

D.3 SERIAL AND PARALLEL CONNECTION

Consider two LTI systems with impulse responses h1(n) and h2(n), respectively. Figure D.1a shows the two systems connected in serial and Figure D.1b in parallel. [FIGURE D.1: Serial and parallel connections of LTI systems.] The overall impulse responses are easily shown to be

Serial:   h(n) = h1(n) * h2(n)   (D.11)
Parallel: h(n) = h1(n) + h2(n)   (D.12)

D.4 TWO-DIMENSIONAL GENERALIZATIONS

A two-dimensional linear time-invariant system is also characterized by its two-dimensional impulse response sequence H(m, n), which in the case of images is known as a point spread function. On filtering an input image array X(m, n) by H(m, n), the resulting image array is given by the two-dimensional convolution

Y(m, n) = Σ_k Σ_l H(m - k, n - l) X(k, l) = H(m, n) ** X(m, n) = Σ_k Σ_l H(k, l) X(m - k, n - l)   (D.13)
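The identities (D.11), (D.12), and (D.13) are easy to verify numerically. The following is a minimal NumPy sketch, not from the book; the impulse responses h1 and h2, the input x, and the 3x3 point spread function and image are made-up illustrative values:

import numpy as np

# Made-up FIR impulse responses and input (illustrative values only).
h1 = np.array([1.0, 0.5])
h2 = np.array([1.0, -0.25, 0.1])
x = np.array([2.0, 0.0, -1.0, 3.0])

# Serial (cascade) connection, eq. (D.11): h(n) = h1(n) * h2(n).
h_serial = np.convolve(h1, h2)
y_cascade = np.convolve(np.convolve(x, h1), h2)
assert np.allclose(np.convolve(x, h_serial), y_cascade)

# Parallel connection, eq. (D.12): h(n) = h1(n) + h2(n).
# Zero-pad h1 so both responses have the same length before adding.
h_parallel = np.pad(h1, (0, len(h2) - len(h1))) + h2
y1 = np.convolve(x, h1)
y2 = np.convolve(x, h2)
y1 = np.pad(y1, (0, len(y2) - len(y1)))   # align output lengths
assert np.allclose(y1 + y2, np.convolve(x, h_parallel))

# Two-dimensional convolution of eq. (D.13), written out directly.
def conv2d_full(H, X):
    """Full 2-D convolution: Y(m,n) = sum_k sum_l H(m-k, n-l) X(k,l)."""
    M = H.shape[0] + X.shape[0] - 1
    N = H.shape[1] + X.shape[1] - 1
    Y = np.zeros((M, N))
    for k in range(X.shape[0]):
        for l in range(X.shape[1]):
            # X(k,l) weights a copy of H shifted to position (k,l).
            Y[k:k + H.shape[0], l:l + H.shape[1]] += X[k, l] * H
    return Y

H = np.array([[0.0, 0.25, 0.0],
              [0.25, 1.0, 0.25],
              [0.0, 0.25, 0.0]])   # made-up point spread function
X = np.arange(9.0).reshape(3, 3)  # made-up 3x3 "image"
print(conv2d_full(H, X))          # 5x5 filtered output array

The cascade check rests on the associativity of convolution and the parallel check on its linearity; both identities carry over unchanged to the two-dimensional convolution of (D.13).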
INDEX

A
Absolute moments, 272
Acceptance interval, 169
Adaline, 70
Adaptive fuzzy C-shells (AFCS) algorithm, 512
Adaptive momentum, 125
Adaptive resonance theory (ART2), 435
Agglomerative hierarchical algorithms, 450
Akaike Information Criterion (AIC), 522, 614
Algebraic distance, 508. See also distance between a point and a quadratic surface
Alternative hypothesis, 166
Alternating Cluster Estimation, 522
Alternating Optimization, 504
Angular second moment, 274
Any path method, 363
Autoregressive-moving average (ARMA(p, m)), 287
Autoregressive processes, 287
Average expected quantization error, 534
Average partition density (PA), 619
Average partition shell density, 620
Average Risk, 17

B
Backpropagation algorithm, 10, 104
Basic competitive learning algorithm, 554
Basic sequential algorithmic scheme (BSAS), 433
Basis images, 209
Basis images (matrices), 209
Basis sequences, 229, 243
Basis vectors, 208
Baum-Welch reestimation, 367
Bayes classification rule, 14
Bayes decision theory, 13
Bayes rule, 13
Bayesian Inference, 32
Bayesian Information Criterion (BIC), 614
Bellman's optimality principle, 324
Bending energy, 302
Best path method, 365
Between-class scatter matrix, 179
Bhattacharyya distance, 177
Bias-variance dilemma, 76
Binary morphology based clustering algorithm (BMCA), 57
Biorthogonal expansion, 243
Biorthogonality condition, 245
Boosting. See Combining classifiers
Bootstrapping techniques, 594
Boundary detection algorithms (BDA), 432
Branch and bound, 186
Branch and bound clustering (BBC) algorithm, 432, 562
Branch and bound methods, 561

C
c-means algorithm. See isodata algorithm
Center of mass, 301
Central moments, 271
Centroid condition, 535
Chain codes, 298
Chaining effect, 457
Channel equalization, 356
Characteristic functions, 404, 644
Chernoff bound, 177
Circular backpropagation model, 126
Classification error probability, 16
Classification tree, 561
Classifiers
Classifier combining. See Combining classifiers
Closing, 567
Cluster, 397, 402
  compact, 489
  linear-shaped, 489
  ring-shaped, 489, 497
  shell-shaped, 489
  spherical, 17
Cluster detection algorithm for discrete valued sets (CDADV), 569
Cluster validity, 591
Cluster variation, 617
Clustering, 402-404, 429
Clustering criterion, 397, 399
Clustering hypothesis, 575
Clustering tendency, 399, 591, 624
Co-occurrence, 273
Co-occurrence matrices, 272
Code vector, 533. See also reproduction vector
Combining classifiers, 150
Compactness and separation validity function, 617. See also Xie-Beni index
Competitive learning algorithms, 552-560
Competitive learning associated with cost functions, 558
Complete link, 455, 469
Computer-aided diagnosis
Computer storage utilization, 533
Concordant pair, 603
Confusion matrix, 496, 506
Conjugate gradient, 664
Conscientious competitive learning algorithms, 556
Constrained optimization, 664
Constraints, 335
Constructive techniques, 102
Context-dependent classification, 351
Contingency table, 410
Continuous observation HMM, 370
Continuous speech recognition, 329
Contrast, 274
Convex hull, 625
Convex functions, 667
Convex programming, 674
Convex sets, 668
Cophenetic correlation coefficient (CPCC), 602
Cophenetic distance, 476
Cophenetic matrix, 477
"Corrected" statistic, 600
Correlation, 66
Cost function, 489
Covariance matrix, 20
Cox-Lewis test, 630
Crisp clustering algorithms, 432. See hard clustering algorithms
Critical interval, 166
Cross-correlation, 66
Cross-correlation coefficient, 338
Cross-entropy, 115, 372, 647
Crossover, 461
Cumulants, 221, 645
Curse of dimensionality, 43, 134
Curvature features, 300

D
Data compression, 400, 533
Data normalization, 165
Data reduction, 400
Davies-Bouldin (DB) index, 612-613
Davies-Bouldin-like indices, 612-613
Decision surfaces, 19
Decision trees, 143
Decomposition layers, 630
Deformable template matching, 343
Delta-bar-delta, 113
Delta-delta rule, 113
Dendrogram, 452. See also threshold dendrogram
Deterministic annealing, 580-581
Diameter of a cluster, 610
Dilation, 565
Directed graphs, 464
Directed path, 550
Directed tree, 550
Direction length features, 299
Discordant pair, 603
Discrete binary (DB) set, 564
Discrete cosine transform (DCT), 230
Discrete Fourier transform, 226
Discrete observation HMM models, 366
Discrete sine transform (DST), 231
Discrete time wavelet coefficients, 243
Discrete time wavelet transform, 239
Discrete wavelet frame, 260
Discriminant functions, 19
Dispersion of a cluster, 612
Dissimilarity matrix, 451
Dissimilarity measure
  between points, 423, 425
  between sets, 406, 423
Distance between a point and a quadratic surface
  algebraic distance, 508
  normalized radial distance, 510
  perpendicular distance, 508
  radial distance, 509
Distortion function, 534
Distortion measure, 534
Divergence, 174
Divisive hierarchical algorithm, 450, 477, 480
Dunn index, 610
Dunn-like indices, 610-611
Dynamic programming, 186, 324
Dynamic similarity measures, 413
Dynamic time warping, 329

E
Eccentricity, 301
Edge connectivity, 467
Edgeworth expansion, 223, 646
Edit distance, 325
EM-algorithm, 491
Empirical classification error, 193
End point constraints, 333
Entropy, 34, 213, 272, 275
Erosion, 565
Error counting approach, 385
Euclidean distance, 25, 405
Expectation maximization (EM) algorithm, 36
External criteria, 592, 595-602

F
Feature selection, 163, 398
Features
  interval scaled, 401
  nominal, 401
  ordinal, 401
  ratio scaled, 401
Feature vectors
Finite state automaton, 361
First-order statistics features, 270
Fisher's discriminant ratio, 181
Fisher's linear discriminant, 190
Floating search methods, 185
Fourier descriptors, 296
Fowlkes and Mallows index, 598
Fractal dimension, 303
Fractals, 303
Fractional Brownian motion sequences, 307
Frobenius norm, 21
Fukuyama-Sugeno index (FS), 618
Fuzzifier, 501
Fuzzy approaches, 490. See also fuzzy clustering algorithms
Fuzzy average shell thickness, 620
Fuzzy C ellipsoidal shells (FCES) algorithm, 515
Fuzzy c-Means (FCM) algorithm, 505
Fuzzy c-means algorithm, 522
Fuzzy C plano-quadric shells (FCPQS), 517
Fuzzy C quadric shells (FCQS) algorithm, 516
Fuzzy c-varieties (FCV) algorithm
Fuzzy covariance matrix, 618
Fuzzy density, 619
Fuzzy hypervolume, 618
Fuzzy measures, 415
Fuzzy shell clustering algorithms, 512-517
  adaptive fuzzy C-shells (AFCS) algorithm, 512-513
  fuzzy C ellipsoidal shells (FCES) algorithm, 512, 515, 541
  fuzzy C plano-quadric shells (FCPQS), 517
  fuzzy C quadric shells (FCQS) algorithm, 515-516, 541
  modified fuzzy C quadric shells (MFCQS) algorithm, 515-516, 541
Fuzzy shell covariance matrix, 620
Fuzzy shell density, 620

G
Γ statistic, 598, 604
γ statistic, 603
Gabor filter, 260
Gabriel graphs (GG), 550, 611, 613
Gauss-Newton method, 506
Generalized agglomerative scheme (GAS), 450
Generalized competitive learning scheme (GCLS), 554
Generalized divisive scheme (GDS), 478
Generalized fuzzy algorithmic scheme (GFAS), 504
Generalized hard algorithmic scheme (GHAS), 530
Generalized linear classifiers, 127
Generalized mixture decomposition algorithmic scheme (GMDAS), 492
Generalized possibilistic algorithmic scheme (GPAS), 525
Generalized XB index, 617
Genetic algorithms, 432, 545, 582
  crossover, 582
  mutation, 582
  reproduction, 582
Geometric moments, 281
Gibbs random fields, 377
Global constraints, 325, 333
Global convergence theorem, 522
Grade of membership, 403
Gradient descent algorithm, 57, 106, 659
Graph
  complete, 546
  edges, 464, 546
  inconsistent edges, 546
  vertices, 464
Graph theory-based algorithmic scheme (GTAS), 467
Gray level run lengths, 275
Gustafson-Kessel (G-K) algorithm, 517

H
Haar transform, 233, 249
Hadamard transform, 231
Hamming distance, 410-411
Hard clustering, 491
Hard clustering algorithms, 529
Hidden Markov models, 361
Hierarchical clustering algorithms, 431
  agglomerative algorithms, 431
  divisive algorithms, 431
Hierarchical search, 341
Higher order networks, 126
Holdout method, 388
Hopkins test, 629
Hubert's Γ statistic, 598. See also Γ statistic
Hughes phenomenon, 196
Hurst parameter, 308
Hypercube, 521
Hypersphere, 595
Hypothesis generation, 400
Hypothesis testing, 400, 592-605

I
Images, 207
Incomplete data set, 36
Independent component analysis (ICA), 219
Information theoretic clustering, 584
Information theory based criteria, 584, 613, 614
Inner product, 408
Internal criteria, 592, 595, 602-605
Interpoint distances, 628
Interpretation of clustering results, 399
Intersymbol interference, 356
Inverse difference moment, 275
Isodata algorithm, 532
Isolated word recognition, 329
Itakura constraints, 333, 334
Itakura-Saito distortion, 555

J
Jaccard coefficient, 598

K
Karhunen-Loève transform, 210
Karush-Kuhn-Tucker (KKT) conditions, 84, 669
Kernels, 41
Kesler's construction, 64
k-means algorithm. See isodata algorithm
k Nearest Neighbor density estimation, 43
Kullback-Leibler distance, 176, 223, 647
Kurtosis, 226, 271, 645

L
Lagrange multipliers, 665
LBG algorithm, 535
Leaky learning algorithm, 556
Learning subspace methods, 214
Least squares methods, 65
Leave-one-out method, 388
Levenberg-Marquardt (L-M) method, 506
Levinson's algorithm, 288
Lexicographic ordering, 208
Lifetime of a cluster, 480
Likelihood function, 20, 28
Likelihood ratio, 18
Likelihood ratio test, 44
Linear classifiers, 55
Linear dichotomies, 129
Lloyd's algorithm, 535
LMS algorithm, 68
lp metric dissimilarity measures, 407-408
Local constraints, 325, 334
Local feature extractor, 279
Logarithmic search, 340
Logistic function, 104
Long run emphasis, 278
Loss matrix, 17

M
Machine vision, 1
Mahalanobis distance, 26
Markov chain, 352
Markov model, 352
Markov random fields, 292, 375
Matching score generator, 443
Matrix updating algorithmic scheme (MUAS), 455
Maximally complete subgraph, 465
Maximally connected subgraph, 465
Maximum a posteriori probability estimation, 31
Maximum entropy estimation, 34
Maximum likelihood, 28
MaxNet, 443, 444
Mean center, 421
Mean square error estimation, 65
Mean square error regression, 72
Mean vector, 420
Median center, 421
Mellin transforms, 339
Membership functions, 403
Merging procedure, 441
Minimum description length (MDL), 614
Minimum distance classifier, 25
Minimum spanning tree (MST), 546, 611, 613, 628, 630
Minimum variance algorithm, 458. See also Ward's algorithm
Minmax duality, 672
Minkowski distance, 505
Mixture decomposition, 490
Mixture models, 35
Mixture scatter matrix, 180
Mode-seeking property, 526
Modified BSAS, 437
Modified fuzzy C quadric shells (MFCQS) algorithm, 515
Modified Hubert Γ statistic, 608
Moments, 270
Moments of Hu, 283
Momentum factor, 112
Monothetic algorithms, 479
Monotonicity, 334, 461
Monte Carlo techniques, 593
Morphological operations, 565-568
  closing, 568
  dilation, 566
  erosion, 566
  opening, 568
  translation, 566
Motion compensation, 337
Motion estimation, 337
Mountain method, 583
Multiresolution analysis, 251
Multiresolution decomposition, 238
Multiresolution interpretation, 250
Multispectral remote sensing
Mutual information, 223, 372, 647

N
Natural gradient, 225
Nearest neighbor condition, 534
Nearest neighbor distances tests, 629
Nearest Neighbor rule, 44
Nested clusterings, 449
Neural networks, 373
Neyman-Scott procedure, 626
Newton's algorithm, 663
Noble identity, 236, 245
Node connectivity, 466
Node impurity, 146
Node degree, 467
Normalization, 565
Normalized Γ statistic, 598
Normalized central moments, 282
Normalized radial distance, 510. See also distance between a point and a quadratic surface
Null hypothesis, 166, 592, 594

O
Octave-band filter banks, 247
One-dimensional DFT, 227
One-tailed statistical test, 593
  left-tailed statistical test, 594
  right-tailed statistical test, 594
Opening, 567
Optical character recognition (OCR), 294
Optimization, 659
Ordinal proximity matrices, 621, 622
Orientation, 301
Outliers, 498, 522, 524
Overtraining, 119

P
P statistic, 632
Packing density, 627
Paraunitary, 244
Partial clusterings, 561
Partition algorithmic schemes, 526
Partition coefficient (PC), 619
Partition density (PD) index, 619
Partition entropy (PE), 615, 619
Partition shell density, 620
Parzen windows, 41, 577
Path of a graph, 464
Pattern mode algorithm, 553
Perceptron, 61
Perceptron algorithm, 57
Perceptron cost, 57
Perfect reconstruction, 242
Perpendicular distance, 508. See also distance between a point and a quadratic surface
pFCM, 506
Pocket algorithm, 63
Poisson distribution, 626
Poisson process, 626
Polynomial classifiers, 131
Polythetic algorithms, 479
Possibilistic algorithms, 432
Power function of a test, 592
Prediction based on groups, 401
Principal Component Analysis (PCA). See Karhunen-Loève transform
Probabilistic clustering algorithms, 432
Projection pursuit, 220
Proximity dendrogram, 453
Proximity function
  between a point and a set
    average proximity function, 419
    maximum proximity function, 419
    minimum proximity function, 419
  between two sets
    average proximity function, 424
    maximum proximity function, 424
    mean proximity function, 424
    minimum proximity function, 424
Proximity graph, 466, 621
  dissimilarity graph, 466
  similarity graph, 466
Proximity matrix, 451
Proximity measure, 399
Pruning techniques, 120
Pseudolikelihood function, 378

Q
Quadrat analysis, 628
Quadric surface, 507
  hyperellipses, 507
  hyperparabolas, 507
Quickprop, 114

R
Radial basis function networks, 133
Radial distance, 509. See also distance between a point and a quadratic surface
Rand statistic, 598
Random field, 286
Random graph hypothesis, 595
Random label hypothesis, 596
Random position hypothesis, 595
Randomness hypothesis, 595, 624
Ratio scaled proximity matrices, 622
Reassignment procedure, 442
Receiver operating characteristics curve, 173
Recognition
Recognition of handwritten characters, 256
Regions of influence, 549
Regularity hypothesis, 624
Relative criteria, 605-619
Relative edge consistence, 549
Relative neighborhood graph (RNG), 611, 613
Representatives
  hyperplane, 517
  point, 505-506
  quadratic surfaces, 507
Reproduction set, 534
Reproduction vector, 533
Resubstitution method, 387
Reward and punishment, 61
Right-tailed statistical test, 594
Risk or loss, 17
Robbins-Monro iteration, 556
Roundness ratio, 302
Run length nonuniformity, 278
Run percentage, 279

S
Sakoe and Chiba, 335
Saliency, 122
Sampling frame, 625
Sampling window, 624
Scan test, 628
Scatter matrices, 179
Second moment structure, 628
Segment modeling, 372
Self-affine, 307
Self-organizing maps (SOM), 559
Self-similarity, 303
Sequential backward selection, 184
Sequential clustering algorithms, 431, 433-440
Sequential decomposition, 630
Sequential forward selection, 185
Shape characterization, 294
Shell hypervolume, 620
Shell partition density, 620
Short run emphasis, 278
Short-time Fourier transform, 251
Significance level, 167
Sigmoid functions, 104
Similarity matrix, 451
Similarity measure
  between points, 405
  between sets, 406
Simple sequential inhibition (SSI), 627
Simulated annealing, 579-580
  sweep, 579
Simulated annealing for clustering, 579-580
Sine transforms, 230
Single link, 455, 467
Singular value decomposition, 215
Skewness, 271
Softmax activation, 152
Spanning tree, 473
Sparse decomposition technique, 630-634
Spatial dependence matrix, 273
Speaker-dependent recognition, 329
Speaker-independent recognition, 329
Spectral representation, 216
Speech recognition, 560
Statistic, 592
Stirling numbers of the second kind, 430
Stochastic approximation, 68
Stochastic relaxation methods, 432
String patterns, 322
Structural graph tests, 628
Structural risk minimization, 196
Subgraph, 464, 465
  complete, 464
  connected, 464
Subsampling, 235
Subspace classification, 214
Sum of error squares estimation, 70
Supervised learning vector quantization, 560
Support vector machines, 77, 136, 139, 199
Surface density criterion, 623
System evaluation, 385

T
t-distribution, 170
t-test, 171
T-squared sampling tests, 630
Tabu search method, 187, 583
Tanimoto measure
  for continuous valued
  for discrete valued, 412
Template matching, 321
Templates, 321
Test statistic, 166
Texture, 259, 270
Tied-mixture densities, 372
Tiling algorithm, 102
Threshold dendrogram, 471
Threshold graph, 465
Topological ordering, 560
Total fuzzy average shell thickness, 620
Total variation, 617
Touching clusters, 547
Translation, 566
Tree-structured filter bank, 238
Triangular inequality, 405
Two-dimensional AR models, 289
Two-dimensional DFT, 229
Two-tailed statistical test, 593
Two-threshold sequential algorithmic scheme, 439

U
Uncertainty principle, 261
Undirected graphs, 464
Universal approximators, 137
Unsupervised learning, 397
Unweighted graphs, 464
Unweighted pair group method average (UPGMA), 457
Unweighted pair group method centroid (UPGMC), 457

V
Validation of clustering results, 399
Valley-seeking algorithms, 433, 576
Vapnik-Chervonenkis learning theory, 193
Variational similarity, 326
Vector quantization, 533
Vector quantizer, 534
  decoder, 535
  encoder, 534
Video coding, 321
Viterbi algorithm, 353
Viterbi reestimation, 369
Voronoi tessellation, 46

W
Ward's algorithm, 459
Wavelet packets, 252
Weighted graphs, 464
Weighted pair group method average (WPGMA), 457
Weighted pair group method centroid (WPGMC), 458
Weight sharing, 126
Well-formed functions, 115
Within-class scatter matrix, 179
Within scatter matrix, 532

X
Xie-Beni (XB) index, 617
XOR problem, 93

Z
Zernike moments, 284

[Back cover]

Electrical Engineering / Computing / Applied Mathematics / Pattern Recognition

PATTERN RECOGNITION
SERGIOS THEODORIDIS
KONSTANTINOS KOUTROUMBAS

Pattern recognition is incredibly important in all automation, information handling and retrieval applications. The new edition of Pattern Recognition, a text written by two of the field's leading experts, covers the entire spectrum of pattern recognition applications from an engineering perspective, examining topics from image analysis to speech recognition and communications. This thoroughly updated edition presents cutting-edge material on neural networks and highlights the latest developments in this growing field, including independent components and support vector machines. Developed through more than 10 years of teaching experience, Pattern Recognition is the most comprehensive reference available for both engineering students and practicing engineers.

Coverage includes:
- Latest techniques in feature generation, including features based on Wavelets, Wavelet Packets, Fractals, and a new section on Independent Component Analysis (ICA)
- All new sections on Support Vector Machines, Deformable Template Matching, and a related appendix on Constrained Optimization
- Feature selection techniques
- Design of linear and non-linear classifiers, including Bayesian, Multilayer Perceptrons, Decision Trees, and RBF networks
- Context-dependent classification, including Dynamic Programming and Hidden Markov Modeling techniques
- Classical approaches, as well as the most recent developments in clustering algorithms, such as fuzzy, possibilistic, morphological, genetic, and annealing techniques
- Coverage of numerous, diverse applications, including Image Analysis, Character Recognition, Medical Diagnosis, Speech Recognition, and Channel Equalization
- Numerous computer simulation examples, supporting the methods given in the book, available via the Web

Printed in the United States of America
ELSEVIER ACADEMIC PRESS
ISBN 0-12-685875-6

Section 2.4: BAYESIAN CLASSIFICATION FOR NORMAL DISTRIBUTIONS (excerpt)

...where

w = Σ^(-1)(μ_i - μ_j)   (2.37)

and

x_0 = (1/2)(μ_i + μ_j) - ln( P(ω_i)/P(ω_j) ) (μ_i - μ_j) / ‖μ_i - μ_j‖²_Σ^(-1)   (2.38)

where ‖x‖_Σ^(-1) ≡ (x^T Σ^(-1) x)^(1/2) is the so-called Σ^(-1) norm of x. The comments [...] two mean vectors. Thus,

d²_m(μ_1, x) = (x - μ_1)^T Σ^(-1) (x - μ_1) = [1.0, 2.2] [0.95 -0.15; -0.15 0.55] [1.0, 2.2]^T = 2.952

Similarly,

d²_m(μ_2, x) = [-2.0, -0.8] [0.95 -0.15; -0.15 0.55] [-2.0, -0.8]^T = 3.672   (2.46)

[FIGURE 2.2: Quadric decision curves.]

Decision Hyperplanes

The only quadratic [...]
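The arithmetic of the Mahalanobis distances in (2.46) above can be checked in a few lines. Below is a minimal NumPy sketch; the class means mu1 = [0, 0] and mu2 = [3, 3] are back-filled assumptions, chosen so that the difference vectors match the [1.0, 2.2] and [-2.0, -0.8] printed in the excerpt:

import numpy as np

# Inverse covariance matrix exactly as printed in the worked example.
S_inv = np.array([[0.95, -0.15],
                  [-0.15, 0.55]])

x = np.array([1.0, 2.2])
mu1 = np.array([0.0, 0.0])   # assumed mean: x - mu1 = [1.0, 2.2]
mu2 = np.array([3.0, 3.0])   # assumed mean: x - mu2 = [-2.0, -0.8]

def mahalanobis_sq(x, mu, S_inv):
    """Squared Mahalanobis distance (x - mu)^T Sigma^(-1) (x - mu)."""
    d = x - mu
    return d @ S_inv @ d

print(mahalanobis_sq(x, mu1, S_inv))   # -> 2.952
print(mahalanobis_sq(x, mu2, S_inv))   # -> 3.672

Since 2.952 < 3.672, a minimum Mahalanobis distance classifier would assign x to the first class.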

Date posted: 28/04/2014, 09:54
