Using Genetic Algorithms for Feature Selection and Weighting in Character Recognition Systems


Nawwaf Kharma (kharma@ece.concordia.ca), Electrical and Computer Engineering Department, Concordia University, Montreal, Quebec, H3G 1M8, Canada
Faten Hussein (fatenh@ece.ubc.ca), Electrical and Computer Engineering Department, University of British Columbia, Vancouver, B.C., V6T 1Z4, Canada
Rabab Ward (rababw@cicsr.ubc.ca), Electrical and Computer Engineering Department, University of British Columbia, Vancouver, B.C., V6T 1Z4, Canada

Abstract

Feature weighting is the general case of feature selection, and hence should perform better than feature selection, at least in some situations. The initial purpose of this study was to test the validity of this hypothesis within the context of character recognition systems. However, we ended up carrying out two sets of studies, which in turn produced some unexpected but justified results. The first set compares the performance of Genetic Algorithm (GA)-based feature selection to GA-based feature weighting, under various conditions. The second set of studies evaluates the performance of the better method (which turned out to be feature selection) in terms of optimal performance and time. The results of these studies show that (a) feature set selection prior to classification is important for k-nearest neighbour classifiers in the presence of redundant or irrelevant features; and (b) GAs are effective methods for feature selection. However, their scalability to highly dimensional problems, in practice, is still an open problem.

Keywords: character recognition, feature selection, feature weighting, Genetic Algorithms, k-Nearest Neighbour classifiers, optimization

1 Introduction

Computer-based pattern recognition is a process that involves several sub-processes, including preprocessing, feature extraction, classification, and post-processing (Kharma & Ward, 1999). Preprocessing encompasses all those functions that prepare an input pattern for effective and efficient extraction of relevant features. Feature extraction is the measurement of certain attributes of the target pattern (e.g., the coordinates of the centre of gravity). Classification utilizes the values of these attributes to assign a class to the input pattern. In our view, the selection and weighting of the right set of features is the hardest part of building a pattern recognition system.

The ultimate aim of our research work is the automation of the process of feature selection and weighting, within the context of character/symbol recognition systems. Our chosen method of automation is Genetic Algorithms (see section 1.3 for justification). Genetic Algorithms (GAs) have been used for feature selection and weighting in many pattern recognition applications (e.g., texture classification and medical diagnostics). However, their use in feature selection (let alone weighting) in character recognition applications has been infrequent. This fact is made clear in section 2. Recently, the authors demonstrated that GAs can, in principle, be used to configure the real-valued weights of a classifier component of a character recognition system in a near-optimal way (Hussein, Kharma & Ward, 2001). This study subsumes and further expands upon that effort.

Here, we carry out two sets of studies, which in turn produce some unexpected but justified results. The first set (section 4.1) compares the performance of GA-based feature selection to GA-based feature weighting, under various conditions. The second set of studies (section 4.2) evaluates the performance of the better method (which turns out to be feature selection) in terms of optimality and time.
The penultimate part of this paper (section 5) summarizes the lessons learnt from this research effort. The most important conclusions are that (a) feature set selection (or pruning) prior to classification is essential for k-nearest neighbour classifiers in the presence of redundant or irrelevant features; and (b) GAs are effective methods for feature selection: they (almost always) find the optimal feature subsets, and do so within a small fraction of the time required for an exhaustive search. The question of how well our method will scale up to highly dimensional feature spaces remains an open problem. This, as well as other problems appropriate for future research, is listed in the final section. The following sections (1.1 and 1.2) provide the technical reader with an introduction to two directly related areas necessary for the appreciation of the rest of the paper.

1.1 Instance Based Learning Algorithms

Instance based learning algorithms are a class of supervised machine learning algorithms. These algorithms do not construct abstract concepts, but rather base their classification of new instances on their similarity to specific training instances (Aha, 1992). Old training instances are stored in memory, and classification is postponed until new instances are received by the classifier. When a new instance is received, older instances similar in some respects to it are retrieved from memory and used to classify the new instance.

Instance based learning algorithms have the advantages of being able to (a) learn complex target concepts (e.g., functions); and (b) estimate target concepts distinctly for each new instance. In addition, their training is very fast and simple: it only requires storing all the training instances in memory. In contrast, the cost of classifying new instances can be high, because every new instance is compared to every training instance. Hence, efficient indexing of training instances is important. Another disadvantage of these learning algorithms is that their classification accuracy degrades significantly in the presence of noise (in the training instances).

One well-known and widely used instance based algorithm is the k-nearest neighbour algorithm (Dasarathy, 1991). The function it uses for measuring similarity between two instances is based on the Euclidean distance:

$$D(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \qquad \text{(Eq. 1)}$$

where D is the distance, x and y are two instances, x_i and y_i are the i-th attribute values of the x and y instances, and n is the total number of features. To compensate for the difference in units between features, normalization should be performed; this often scales all features to a range between 0 and 1, inclusive.
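To make the mechanics concrete, the following is a minimal sketch of a k-nearest neighbour classifier built around Eq. 1, together with the min-max normalization to [0, 1] mentioned above. It is an illustration under our own assumptions (data layout, function names, the value of k), not code from the paper.

```python
import math
from collections import Counter

def normalize(instances):
    """Scale every feature to the range [0, 1] using min-max normalization."""
    n = len(instances[0])
    lo = [min(inst[i] for inst in instances) for i in range(n)]
    hi = [max(inst[i] for inst in instances) for i in range(n)]
    return [[(inst[i] - lo[i]) / (hi[i] - lo[i]) if hi[i] > lo[i] else 0.0
             for i in range(n)]
            for inst in instances]

def euclidean(x, y):
    """Eq. 1: D(x, y) = sqrt(sum_i (x_i - y_i)^2)."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def knn_classify(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training instances."""
    neighbours = sorted(zip(train_x, train_y),
                        key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

# Tiny illustrative use: two classes of 2-D feature vectors.
# The query is assumed to be given already in the normalized space.
train_x = normalize([[1.0, 10.0], [1.2, 11.0], [8.0, 90.0], [8.5, 95.0]])
train_y = ["a", "a", "b", "b"]
print(knn_classify(train_x, train_y, [0.1, 0.05], k=3))  # -> "a"
```

Note that training here really is just storage: all the computation happens at query time, which is exactly the cost trade-off described above.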
1.2 Feature Selection and Feature Weighting

One major drawback of the Euclidean distance function is its sensitivity to the presence of noise and, particularly, of redundant or irrelevant features. This is because it treats all features of an instance (relevant or not) as equally important to its successful classification. A possible remedy is to assign weights to features. The weights can then be used to reflect the relative relevance of their respective features to correct classification: highly relevant features would be assigned high weights relative to the weights of redundant or irrelevant features. Taking that into account, the Euclidean distance measure can be refined:

$$D(x, y) = \sqrt{\sum_{i=1}^{n} w_i (x_i - y_i)^2} \qquad \text{(Eq. 2)}$$

where w_i is the weight of the i-th feature. In feature weighting, the weights can hold any value in a continuous range (e.g., [0, 1]). The purpose of feature weighting is to find a vector of real-valued weights that optimizes the classification accuracy of some classification or recognition system. Feature selection is different: given a set of n features, feature selection aims to find a subset of m features (where m ≤ n) that maximizes classification accuracy. In this sense, feature selection is a special case of feature weighting in which each weight is restricted to the binary values 0 (feature excluded) and 1 (feature included).
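The sketch below shows how a simple GA might drive feature selection in this setting, reusing the knn_classify function from the previous sketch as a wrapper-style fitness evaluator. Each chromosome is a bit string that restricts the weights w_i of Eq. 2 to {0, 1}; replacing the bits with real-valued genes in [0, 1] would turn the same loop into feature weighting. All operator choices and parameter values (population size, crossover and mutation rates, elitism, truncation selection) are illustrative assumptions, not the configuration used by the authors.

```python
import random

def fitness(chromosome, train_x, train_y, val_x, val_y, k=3):
    """Wrapper fitness: k-NN accuracy on a validation set using only the selected features."""
    idx = [i for i, bit in enumerate(chromosome) if bit]
    if not idx:
        return 0.0  # a chromosome that selects no features cannot classify anything
    project = lambda inst: [inst[i] for i in idx]
    tx = [project(x) for x in train_x]
    hits = sum(knn_classify(tx, train_y, project(x), k) == y
               for x, y in zip(val_x, val_y))
    return hits / len(val_y)

def ga_feature_selection(n_features, train_x, train_y, val_x, val_y,
                         pop_size=30, generations=50, p_cross=0.8, p_mut=0.02):
    """Evolve a population of bit strings; each bit selects (1) or drops (0) one feature."""
    pop = [[random.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda c: fitness(c, train_x, train_y, val_x, val_y),
                        reverse=True)
        next_pop = [list(c) for c in scored[:2]]          # elitism: keep the two best
        while len(next_pop) < pop_size:
            a, b = random.sample(scored[:pop_size // 2], 2)  # truncation selection
            if random.random() < p_cross:                  # one-point crossover
                cut = random.randrange(1, n_features)
                a = a[:cut] + b[cut:]
            child = [bit ^ (random.random() < p_mut) for bit in a]  # bit-flip mutation
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=lambda c: fitness(c, train_x, train_y, val_x, val_y))
```

Because fitness is measured by actually running the classifier, each generation costs pop_size full k-NN evaluations; this is the practical scalability concern for high-dimensional feature spaces raised in the abstract.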


Table of Contents

  • Using Genetic Algorithms for Feature Selection and Weighting in Character Recognition Systems

  • 1 Introduction

    • 1.1 Instance Based Learning Algorithms

    • 1.2 Feature Selection and Feature Weighting

  • 2 Literature Review

    • 2.1 Recognition of Printed Characters

    • 2.2 Recognition of Handwritten Characters

  • 4.1 Comparative Study Results

    • 4.1.1 The effect of varying the number of values that weights can take on the number of selected features

      • Table 1: Accuracy of Recognition and Number of Zero Features for Various Selection and Weighting Schemes

      • Table 2: Accuracy of Recognition and Number of Zero Features for Various Selection and Weighting Schemes, Some with Low Weight Forced to Zero

  • 4.2 Feature Selection Evaluation Results

    • 4.2.1 Convergence of FS to an Optimal or Near-Optimal Set of Features
