Industrial Electronics Series
Series Editor: J. David Irwin, Auburn University

Titles Included in the Series

Supervised and Unsupervised Pattern Recognition: Feature Extraction and Computational Intelligence
Evangelia Micheli-Tzanakou, Rutgers University

Handbook of Applied Computational Intelligence
Mary Lou Padgett, Auburn University; Nicholas Karayiannis, University of Houston; Lotfi A. Zadeh, University of California, Berkeley

Handbook of Applied Neurocontrols
Mary Lou Padgett, Auburn University; Charles C. Jorgensen, NASA Ames Research Center; Paul Werbos, National Science Foundation

Handbook of Power Electronics
Tim L. Skvarenina, Purdue University

Supervised and Unsupervised Pattern Recognition:
Feature Extraction and Computational Intelligence

Evangelia Micheli-Tzanakou
Rutgers University
Piscataway, New Jersey

CRC Press
Boca Raton   London   New York   Washington, D.C.

Library of Congress Cataloging-in-Publication Data

Micheli-Tzanakou, Evangelia, 1942-
    Supervised and unsupervised pattern recognition: feature extraction and computational intelligence / Evangelia Micheli-Tzanakou, editor/author.
    p. cm. (Industrial electronics series)
    Includes bibliographical references and index.
    ISBN 0-8493-2278-2
    1. Pattern recognition systems. 2. Neural networks (Computer science) I. Title. II. Series.
    TK7882.P3 M53 1999
    006.4 dc21    99-043495    CIP

This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.

Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage or retrieval system, without prior permission in writing from the publisher.

The consent of CRC Press does not extend to copying for general distribution, for promotion, for creating new works, or for resale. Specific permission must be obtained in writing from CRC Press for such copying. Direct all inquiries to CRC Press LLC, 2000 Corporate Blvd., N.W., Boca Raton, Florida 33431.

© 2000 by CRC Press LLC

No claim to original U.S. Government works
International Standard Book Number 0-8493-2278-2
Library of Congress Card Number 99-043495
Printed in the United States of America  1 2 3 4 5 6 7 8 9 0
Printed on acid-free paper

Dedication

To my late mother, for never being satisfied with my progress and for always pushing me to better things in life.

PREFACE

This volume describes the application of supervised and unsupervised pattern recognition schemes to the classification of various types of waveforms and images. An optimization routine, ALOPEX, is used to train the network while decreasing the likelihood of local solutions. The chapters included in this volume bring together recent research of more than ten authors in the field of neural networks and pattern recognition. All of these contributions were carried out in the Neuroelectric and Neurocomputing Laboratories in the Department of Biomedical Engineering at Rutgers University.
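In one common form, an ALOPEX-style update moves each weight by a fixed step whose sign is biased toward whatever direction coincided with a cost decrease on the previous iteration, with the stochastic sign keeping the search from settling into local solutions. The sketch below illustrates only that general idea, not the routine used in the chapters; the step size, temperature, and toy quadratic cost are illustrative assumptions.

```python
import numpy as np

def alopex_train(cost, w, steps=5000, delta=0.01, T=1e-3, seed=0):
    """Minimal ALOPEX-style optimizer: each weight takes a fixed step
    +/- delta; a step direction that coincided with a cost decrease on
    the previous iteration is repeated with probability > 0.5."""
    rng = np.random.default_rng(seed)
    dw = delta * rng.choice([-1.0, 1.0], size=w.shape)  # initial random move
    E_prev = cost(w)
    w = w + dw
    for _ in range(steps):
        E = cost(w)
        # Per-weight correlation of the last move with the last cost change.
        # C < 0 means a positive step helped, so step +delta with p > 0.5.
        C = dw * (E - E_prev)
        p_up = 1.0 / (1.0 + np.exp(np.clip(C / T, -50.0, 50.0)))
        dw = np.where(rng.random(w.shape) < p_up, delta, -delta)
        E_prev = E
        w = w + dw
    return w

# Toy quadratic cost; note that only cost values are used, no gradients.
target = np.array([1.0, -2.0])
w_hat = alopex_train(lambda w: np.sum((w - target) ** 2), np.zeros(2))
print(w_hat)  # wanders into the neighborhood of [1.0, -2.0]
```

Because the update needs only cost evaluations rather than gradients, the same scheme can train networks whose node functions are not differentiable.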
The chapters span a large variety of problems in signal and image processing, using mainly neural networks for classification and template matching. The inputs to the neural networks are features extracted from a signal or an image by sophisticated and proven state-of-the-art techniques from the fields of digital signal processing, computer vision, and image processing. In all examples and problems examined, the biological equivalents are used as prototypes: simulations of those systems were performed, and systems that mimic the biological functions were built.

Experimental and theoretical contributions are treated equally, and interchanges between the two are examined. Technological advances depend on a deep understanding of their biological counterparts, which is why, in our laboratories, experiments on both animals and humans are performed continuously in order to test our hypotheses in developing products that have technological applications.

The reasoning of most neural networks in their decision making cannot easily be extracted upon the completion of training. However, due to the linearity of the network nodes, the cluster prototypes of an unsupervised system can be reconstructed to illustrate the reasoning of the system. In these applications, this analysis hints at the usefulness of previously unused portions of the spectrum.

The book is divided into four parts. The first part contains chapters that introduce the subjects of neural networks, classifiers, and feature extraction methods; the neural networks treated there are of the supervised learning type. The second part deals with unsupervised neural networks and fuzzy neural networks and their applications to handwritten character recognition, as well as recognition of normal and abnormal visual evoked potentials. The third part deals with advanced neural network architectures, such as modular designs and their applications to medicine, and three-dimensional neural network architectures simulating brain functions. Finally, the fourth part discusses general applications and simulations in various fields. Most importantly, the establishment of a brain-to-computer link is discussed in some detail, and the findings from these human experiments are analyzed in a new light.

All chapters have been published, either in their final form or in preliminary form, in conference proceedings and presentations. Most of the co-authors of these papers were students of the editor. Extensive editing has been done so that repetitions of algorithms, unless modified, are avoided. Instead, where commonality exists, those parts have been placed into a new chapter (Chapter 4), and references to this chapter are made throughout.

As is obvious from the number of names on the chapters, many students have contributed to this compendium. I thank them from this position as well. Others contributed in different ways. Mrs. Marge Melton helped with her expert typing of parts of this book and with proofreading the manuscript. Mr. Steven Orbine helped in more than one way, whenever expert help was needed. Dr. G. Kontaxakis, Dr. P. Munoz, and Mr. Wei Lin helped with the manuscripts of Chapters 1 and 3. Finally, to all the current students of my laboratories, for their patience while this work was compiled, many thanks; I will be more visible, and demanding, now. Dr. D. Irwin was instrumental in involving me in this book series, and I thank him from this position as well. Ms.
Nora Konopka I thank for her patience in waiting and for reminding me of the deadlines, a job that was continued by Ms. Felicia Shapiro and Ms. Mimi Williams. I thank them as well.

Evangelia Micheli-Tzanakou, Ph.D.
Department of Biomedical Engineering
Rutgers University
Piscataway, NJ

Contributors

Ahmet Ademoglu, Ph.D., Assistant Professor, Institute of Biomedical Engineering, Bogazici University, Bebek, Istanbul, Turkey
Sergey Aleynikov, M.S., IDT, Hackensack, NJ
Jeremy Bricker, Ph.D. Candidate, Environmental Fluid Mechanics Laboratory, Department of Civil and Environmental Engineering, Stanford, CA
Tae-Soo Chon, Ph.D., Professor, Department of Biology, College of Natural Sciences, Pusan National University, Pusan, Korea
Woogon Chung, Ph.D., Assistant Professor, Department of Control and Instrumentation, Sung Kyun Kwan University, Kyung Gi-Do, South Korea
Lt. Col. Timothy Cooley, Ph.D., USAF Academy, Department of Mathematical Sciences, Colorado Springs, CO
Timothy J. Dasey, Ph.D., MIT Lincoln Labs, Weather Sensing Group, Lexington, MA
Cynthia Enderwick, M.S., Hewlett Packard, Palo Alto, CA
Faiq A. Fazal, M.S., Lucent Technologies, Murray Hill, NJ
Raymond Iezzi, M.D., Kresge Institute, Detroit, Michigan
Francis Phan, M.S., Harmonix Music Systems, Inc., Cambridge, MA
Seth Wolpert, Ph.D., Associate Professor, Pennsylvania State University — Harrisburg, Middletown, PA
Daniel Zahner, M.S., Data Scope Co., Paramus, NJ

Contents

Section I — Overviews of Neural Networks, Classifiers, and Feature Extraction Methods — Supervised Neural Networks

Chapter 1  Classifiers: An Overview
1.1 Introduction
1.2 Criteria for Optimal Classifier Design
1.3 Categorizing the Classifiers
    1.3.1 Bayesian Optimal Classifiers
    1.3.2 Exemplar Classifiers
    1.3.3 Space Partition Methods
    1.3.4 Neural Networks
1.4 Classifiers
    1.4.1 Bayesian Classifiers
        1.4.1.1 Minimum ECM Classifiers
        1.4.1.2 Multi-Class Optimal Classifiers
    1.4.2 Bayesian Classifiers with Multivariate Normal Populations
        1.4.2.1 Quadratic Discriminant Score
        1.4.2.2 Linear Discriminant Score
        1.4.2.3 Linear Discriminant Analysis and Classification
        1.4.2.4 Equivalence of LDF to Minimum TPM Classifier
    1.4.3 Learning Vector Quantizer (LVQ)
        1.4.3.1 Competitive Learning
        1.4.3.2 Self-Organizing Map
        1.4.3.3 Learning Vector Quantization
    1.4.4 Nearest Neighbor Rule
1.5 Neural Networks (NN)
    1.5.1 Introduction
        1.5.1.1 Artificial Neural Networks
        1.5.1.2 Usage of Neural Networks
        1.5.1.3 Other Neural Networks
    1.5.2 Feed-Forward Neural Networks
    1.5.3 Error Backpropagation
        1.5.3.1 Madaline Rule III for Multilayer Network with Sigmoid Function
        1.5.3.2 A Comment on the Terminology ‘Backpropagation’
        1.5.3.3 Optimization Machines with Feed-Forward Multilayer Perceptrons
        1.5.3.4 Justification for Gradient Methods for Nonlinear Function Approximation
        1.5.3.5 Training Methods for Feed-Forward Networks
    1.5.4 Issues in Neural Networks
        1.5.4.1 Universal Approximation
    1.5.5 Enhancing Convergence Rate and Generalization of an Optimization Machine
        1.5.5.1 Suggestions for Improving the Convergence
        1.5.5.2 Quick Prop
        1.5.5.3 Kullback-Leibler Distance
        1.5.5.4 Weight Decay
        1.5.5.5 Regression Methods for Classification Purposes
    1.5.6 Two-Group Regression and Linear Discriminant Function
    1.5.7 Multi-Response Regression and Flexible Discriminant Analysis
        1.5.7.1 Powerful Nonparametric Regression Methods for Classification Problems
    1.5.8 Optimal Scoring (OS)
        1.5.8.1 Partially Minimized ASR
    1.5.9 Canonical Correlation Analysis
    1.5.10 Linear Discriminant Analysis
        1.5.10.1 LDA Revisited
    1.5.11 Translation of Optimal Scoring Dimensions into Discriminant Coordinates
    1.5.12 Linear Discriminant Analysis via Optimal Scoring
        1.5.12.1 LDA via OS
    1.5.13 Flexible Discriminant Analysis by Optimal Scoring
1.6 Comparison of Experimental Results
1.7 System Performance Assessment
    1.7.1 Classifier Evaluation
        1.7.1.1 Hold-Out Method
        1.7.1.2 K-Fold Cross-Validation
    1.7.2 Bootstrapping Method for Estimation
        1.7.2.1 Jackknife Estimation
        1.7.2.2 Bootstrap Method
1.8 Analysis of Prediction Rates from Bootstrapping Assessment
References

Chapter 2  Artificial Neural Networks: Definitions, Methods, Applications
2.1 Introduction
2.2 Definitions
2.3 Training Algorithms

[...] these interconnected devices look at patterns of data and learn to classify them. NNs have been used in a wide variety of signal processing and pattern recognition applications and have been successfully applied in such diverse fields as speech processing, handwritten character recognition, time series prediction, data compression, feature extraction, and pattern recognition in general. Their attractiveness ...

[...] the purpose of data reduction for storage and transmission. The exemplar classifiers (except for the KNN classifier) cluster the training patterns via unsupervised learning, followed by supervised learning or label assignment. A Radial Basis Function (RBF) network [14] is also a combination of unsupervised and supervised learning. The basis function is radial and symmetric around the mean vector, which ...

[...] misclassification for a pattern recognition system to misclassify pattern ‘A’ as pattern ‘B’ may be considered the same as the cost of misclassifying pattern ‘B’ as pattern ‘A’. In this situation we can disregard the cost information or assign the same cost to all cases. An optimal classification procedure might also consider only the probability of misclassification (from conditional distributions) and its likelihood ...

[...] presented with a pattern near a prototype X, it should output pattern X′; and as autoassociative memory, or content-addressable memory, by which the desired output is completed to become X. In all cases the network learns, or is trained, by the repeated presentation of patterns with known required outputs (or pattern indicators). Supervised neural networks find a mapping f : X → Y for a given set of input and output ...

    2.4.1 Expert Systems and Neural Networks
    2.4.2 Applications in Mammography
    2.4.3 Chromosome and Genetic Sequences Classification
References

Chapter 3  A System for Handwritten Digit Recognition
3.1 Introduction
3.2 Preprocessing of Handwritten Digit Images
    3.2.1 Optimal Size of the Mask for Dilation
    3.2.2 Bartlett Statistic
3.3 Zernike Moments (ZM) for Characterization of Image Patterns
    3.3.1 Reconstruction ...

[...] used to find the majority labels among the K closest patterns to a codebook vector m_l. Thus the LVQ, a form of supervised learning, follows the unsupervised learning (self-organizing map), as shown in Figure 1.3.

FIGURE 1.3  Block diagram for a system of Self-Organizing Map and Learning Vector Quantization.

The last two stages in the figure are called LVQ, and researchers [10,24] have come up with different updating rules.
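The pipeline of Figure 1.3, an unsupervised codebook stage followed by majority labeling and LVQ fine-tuning, can be illustrated with a short sketch. This is a minimal reading of the block diagram, not the chapter's exact algorithm: the SOM stage is reduced to plain competitive learning with no neighborhood function, and the codebook size, learning rates, epochs, K, and toy Gaussian data are illustrative assumptions.

```python
import numpy as np

def train_codebook(X, n_codes=10, epochs=20, alpha=0.05, seed=0):
    """Unsupervised stage: competitive learning moves the winning
    codebook vector toward each presented input pattern."""
    rng = np.random.default_rng(seed)
    m = X[rng.choice(len(X), n_codes, replace=False)].copy()
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            c = np.argmin(np.sum((m - x) ** 2, axis=1))  # winner
            m[c] += alpha * (x - m[c])
    return m

def label_codebook(m, X, y, K=5):
    """Assign each codebook vector the majority label among the K
    closest training patterns, as described in the text."""
    labels = []
    for ml in m:
        idx = np.argsort(np.sum((X - ml) ** 2, axis=1))[:K]
        vals, counts = np.unique(y[idx], return_counts=True)
        labels.append(vals[np.argmax(counts)])
    return np.array(labels)

def lvq1(m, m_labels, X, y, epochs=20, alpha=0.02, seed=1):
    """Supervised LVQ1 stage: pull the winning codebook vector toward
    a pattern when their labels agree, push it away otherwise."""
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            c = np.argmin(np.sum((m - X[i]) ** 2, axis=1))
            sign = 1.0 if m_labels[c] == y[i] else -1.0
            m[c] += sign * alpha * (X[i] - m[c])
    return m

# Two Gaussian classes; classify by the label of the nearest codebook vector.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
m = train_codebook(X)
m_labels = label_codebook(m, X, y)
m = lvq1(m, m_labels, X, y)
```

Note the division of labor the figure describes: the unsupervised stage only places the codebook vectors, and only the final LVQ pass uses class labels to sharpen the decision boundaries.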
[...] happening in about 1 ~ 10 milliseconds [28]. Yet we can recognize an old friend's face and call him by name in about 0.1 seconds. This is a complex pattern recognition task which must be performed in a highly parallel way, since the recognition is done in about 100 ~ 1000 steps. This suggests that highly parallel systems can perform pattern recognition tasks more rapidly than current conventional sequential computers ...

[...] Analysis
4.6 Fractal Dimension
4.7 SGLD Texture Features
References

Section II  Unsupervised Neural Networks

Chapter 5  Fuzzy Neural Networks
5.1 Introduction
5.2 Pattern Recognition
    5.2.1 Theory and Applications
    5.2.2 Feature Extraction
    5.2.3 Clustering
5.3 Optimization
    5.3.1 Theory and Objectives
    5.3.2 Background
    5.3.3 Modified ALOPEX Algorithm
5.4 System Design
    5.4.1 Feature Extraction ...

[...] family is unsupervised learning, that is, clustering. The class information is not known or is irrelevant; the networks find the groups of similar input patterns. The neighboring code vectors in a neural network compete in their activities by means of mutual lateral interactions and develop adaptively into specific detectors of different signal patterns. Examples are the Self-Organizing Map [10] and the ...

[...] 1.39 and Equation 1.40.

1.4.4 NEAREST NEIGHBOR RULE

The Nearest Neighbor (NN) classifier, a nonparametric exemplar method, is the natural classification method one can first think of. Using the label information of the training sample, an unknown observation x is compared with all the cases in the training sample. N distances between a pattern vector x and all the training patterns are calculated, and the label of the closest training pattern is assigned to x.
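The rule of Section 1.4.4 is direct to state in code. The sketch below is a minimal 1-NN classifier under the obvious reading of that paragraph; the use of squared Euclidean distance as the dissimilarity measure and the toy data are illustrative assumptions.

```python
import numpy as np

def nearest_neighbor(x, X_train, y_train):
    """1-NN rule: compute the N distances from x to every training
    pattern and return the label of the closest one."""
    d2 = np.sum((X_train - x) ** 2, axis=1)  # squared Euclidean distances
    return y_train[np.argmin(d2)]

# Toy example: two labeled training patterns per class.
X_train = np.array([[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [2.8, 3.2]])
y_train = np.array([0, 0, 1, 1])
print(nearest_neighbor(np.array([0.1, 0.3]), X_train, y_train))  # -> 0
print(nearest_neighbor(np.array([2.9, 2.9]), X_train, y_train))  # -> 1
```

Because every training pattern is stored and compared at classification time, the method needs no training phase at all, which is what makes it a natural first exemplar classifier.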