John Wiley & Sons, Kuncheva, Combining Pattern Classifiers: Methods and Algorithms (2004) (ISBN 0471210781) (360 pp.)

DOCUMENT INFORMATION

Basic information

Format
Number of pages	360
File size	2.93 MB

Content

Combining Pattern Classifiers: Methods and Algorithms

Ludmila I. Kuncheva

A JOHN WILEY & SONS, INC., PUBLICATION

Copyright © 2004 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400, fax 978-646-8600, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993 or fax 317-572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print, however, may not be available in electronic format.

Library of Congress Cataloging-in-Publication Data:

Kuncheva, Ludmila I. (Ludmila Ilieva), 1959–
Combining pattern classifiers: methods and algorithms / Ludmila I. Kuncheva.
p. cm.
“A Wiley-Interscience publication.”
Includes bibliographical references and index.
ISBN 0-471-21078-1 (cloth)
1. Pattern recognition systems. 2. Image processing–Digital techniques. I. Title.
TK7882.P3K83 2004
006.4–dc22
2003056883

Printed in the United States of America

Contents

Preface
Acknowledgments
Notation and Acronyms

1 Fundamentals of Pattern Recognition
  1.1 Basic Concepts: Class, Feature, and Data Set
    1.1.1 Pattern Recognition Cycle
    1.1.2 Classes and Class Labels
    1.1.3 Features
    1.1.4 Data Set
  1.2 Classifier, Discriminant Functions, and Classification Regions
  1.3 Classification Error and Classification Accuracy
    1.3.1 Calculation of the Error
    1.3.2 Training and Testing Data Sets
    1.3.3 Confusion Matrices and Loss Matrices
  1.4 Experimental Comparison of Classifiers
    1.4.1 McNemar and Difference of Proportion Tests
    1.4.2 Cochran’s Q Test and F-Test
    1.4.3 Cross-Validation Tests
    1.4.4 Experiment Design
  1.5 Bayes Decision Theory
    1.5.1 Probabilistic Framework
    1.5.2 Normal Distribution
    1.5.3 Generate Your Own Data
    1.5.4 Discriminant Functions and Decision Boundaries
    1.5.5 Bayes Error
    1.5.6 Multinomial Selection Procedure for Comparing Classifiers
  1.6 Taxonomy of Classifier Design Methods
  1.7 Clustering
  Appendix 1A K-Hold-Out Paired t-Test
  Appendix 1B K-Fold Cross-Validation Paired t-Test
  Appendix 1C 5 × 2cv Paired t-Test
  Appendix 1D 500 Generations of Training/Testing Data and Calculation of the Paired t-Test Statistic
  Appendix 1E Data Generation: Lissajous Figure Data

2 Base Classifiers
  2.1 Linear and Quadratic Classifiers
    2.1.1 Linear Discriminant Classifier
    2.1.2 Quadratic Discriminant Classifier
    2.1.3 Using Data Weights with a Linear Discriminant Classifier and Quadratic Discriminant Classifier
    2.1.4 Regularized Discriminant Analysis
  2.2 Nonparametric Classifiers
    2.2.1 Multinomial Classifiers
    2.2.2 Parzen Classifier
  2.3 The k-Nearest Neighbor Rule
    2.3.1 Theoretical Background
    2.3.2 Finding k-nn Prototypes
    2.3.3 k-nn Variants
  2.4 Tree Classifiers
    2.4.1 Binary Versus Nonbinary Splits
    2.4.2 Selection of the Feature for a Node
    2.4.3 Stopping Criterion
    2.4.4 Pruning Methods
  2.5 Neural Networks
    2.5.1 Neurons
    2.5.2 Rosenblatt’s Perceptron
    2.5.3 MultiLayer Perceptron
    2.5.4 Backpropagation Training of MultiLayer Perceptron
  Appendix 2A Matlab Code for Tree Classifiers
  Appendix 2B Matlab Code for Neural Network Classifiers

3 Multiple Classifier Systems
  3.1 Philosophy
    3.1.1 Statistical
    3.1.2 Computational
    3.1.3 Representational
  3.2 Terminologies and Taxonomies
    3.2.1 Fusion and Selection
    3.2.2 Decision Optimization and Coverage Optimization
    3.2.3 Trainable and Nontrainable Ensembles
  3.3 To Train or Not to Train?
    3.3.1 Tips for Training the Ensemble
    3.3.2 Idea of Stacked Generalization
  3.4 Remarks

4 Fusion of Label Outputs
  4.1 Types of Classifier Outputs
  4.2 Majority Vote
    4.2.1 Democracy in Classifier Combination
    4.2.2 Limits on the Majority Vote Accuracy: An Example
    4.2.3 Patterns of Success and Failure
  4.3 Weighted Majority Vote
  4.4 Naive Bayes Combination
  4.5 Multinomial Methods
    4.5.1 Behavior Knowledge Space Method
    4.5.2 Wernecke’s Method
  4.6 Probabilistic Approximation
    4.6.1 Calculation of the Probability Estimates
    4.6.2 Construction of the Tree
  4.7 Classifier Combination Using Singular Value Decomposition
  4.8 Conclusions
  Appendix 4A Matan’s Proof for the Limits on the Majority Vote Accuracy
  Appendix 4B Probabilistic Approximation of the Joint pmf for Class-Label Outputs

5 Fusion of Continuous-Valued Outputs
  5.1 How Do We Get Probability Outputs?
    5.1.1 Probabilities Based on Discriminant Scores
    5.1.2 Probabilities Based on Counts: Laplace Estimator
  5.2 Class-Conscious Combiners
    5.2.1 Nontrainable Combiners
    5.2.2 Trainable Combiners
  5.3 Class-Indifferent Combiners
    5.3.1 Decision Templates
    5.3.2 Dempster–Shafer Combination
  5.4 Where Do the Simple Combiners Come from?
    5.4.1 Conditionally Independent Representations
    5.4.2 A Bayesian Perspective
    5.4.3 The Supra Bayesian Approach
    5.4.4 Kullback–Leibler Divergence
    5.4.5 Consensus Theory
  5.5 Comments
  Appendix 5A Calculation of λ for the Fuzzy Integral Combiner

6 Classifier Selection
  6.1 Preliminaries
  6.2 Why Classifier Selection Works
  6.3 Estimating Local Competence Dynamically
    6.3.1 Decision-Independent Estimates
    6.3.2 Decision-Dependent Estimates
    6.3.3 Tie-Break for Classifiers with Equal Competences
  6.4 Preestimation of the Competence Regions
    6.4.1 Clustering
    6.4.2 Selective Clustering
  6.5 Selection or Fusion?
  6.6 Base Classifiers and Mixture of Experts

7 Bagging and Boosting
  7.1 Bagging
    7.1.1 Origins: Bagging Predictors
    7.1.2 Why Does Bagging Work?
    7.1.3 Variants of Bagging
  7.2 Boosting
    7.2.1 Origins: Algorithm Hedge(β)
    7.2.2 AdaBoost Algorithm
    7.2.3 arc-x4 Algorithm
    7.2.4 Why Does AdaBoost Work?
    7.2.5 Variants of Boosting
  7.3 Bias-Variance Decomposition
    7.3.1 Bias, Variance, and Noise of the Classification Error
    7.3.2 Decomposition of the Error
    7.3.3 How Do Bagging and Boosting Affect Bias and Variance?
  7.4 Which Is Better: Bagging or Boosting?
  Appendix 7A Proof of the Error Bound on the Training Set for AdaBoost (Two Classes)
  Appendix 7B Proof of the Error Bound on the Training Set for AdaBoost (c Classes)

8 Miscellanea
  8.1 Feature Selection
    8.1.1 Natural Grouping
    8.1.2 Random Selection
    8.1.3 Nonrandom Selection
    8.1.4 Genetic Algorithms
    8.1.5 Ensemble Methods for Feature Selection
  8.2 Error Correcting Output Codes
    8.2.1 Code Designs
    8.2.2 Implementation Issues
    8.2.3 Error Correcting Output Codes, Voting, and Decision Templates
    8.2.4 Soft Error Correcting Output Code Labels and Pairwise Classification
    8.2.5 Comments and Further Directions
  8.3 Combining Clustering Results
    8.3.1 Measuring Similarity Between Partitions
    8.3.2 Evaluating Clustering Algorithms
    8.3.3 Cluster Ensembles
  Appendix 8A Exhaustive Generation of Error Correcting Output Codes
  Appendix 8B Random Generation of Error Correcting Output Codes
  Appendix 8C Model Explorer Algorithm for Determining the Number of Clusters c

9 Theoretical Views and Results
  9.1 Equivalence of Simple Combination Rules
    9.1.1 Equivalence of MINIMUM and MAXIMUM Combiners for Two Classes
    9.1.2 Equivalence of MAJORITY VOTE and MEDIAN Combiners for Two Classes and Odd Number of Classifiers
  9.2 Added Error for the Mean Combination Rule
    9.2.1 Added Error of an Individual Classifier
    9.2.2 Added Error of the Ensemble
    9.2.3 Relationship Between the Individual Outputs’ Correlation and the Ensemble Error
    9.2.4 Questioning the Assumptions and Identifying Further Problems
  9.3 Added Error for the Weighted Mean Combination
    9.3.1 Error Formula
    9.3.2 Optimal Weights for Independent Classifiers
  9.4 Ensemble Error for Normal and Uniform Distributions of the Outputs
    9.4.1 Individual Error
    9.4.2 Minimum and Maximum
    9.4.3 Mean
    9.4.4 Median and Majority Vote
    9.4.5 Oracle
    9.4.6 Example

10 Diversity in Classifier Ensembles
  10.1 What Is Diversity?
    10.1.1 Diversity in Biology
    10.1.2 Diversity in Software Engineering
    10.1.3 Statistical Measures of Relationship
  10.2 Measuring Diversity in Classifier Ensembles
    10.2.1 Pairwise Measures
    10.2.2 Nonpairwise Measures
  10.3 Relationship Between Diversity and Accuracy
    10.3.1 Example
    10.3.2 Relationship Patterns
  10.4 Using Diversity
    10.4.1 Diversity for Finding Bounds and Theoretical Relationships
    10.4.2 Diversity for Visualization
    10.4.3 Overproduce and Select
    10.4.4 Diversity for Building the Ensemble
  10.5 Conclusions: Diversity of Diversity
  Appendix 10A Equivalence Between the Averaged Disagreement Measure D_av and Kohavi–Wolpert Variance KW
  Appendix 10B Matlab Code for Some Overproduce and Select Algorithms

References
Index

Date posted: 23/05/2018, 14:00