IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 17, NO. 2, FEBRUARY 2005

Genetic Evolution Processing of Data Structures for Image Classification

Siu-Yeung Cho, Member, IEEE, and Zheru Chi, Member, IEEE

Abstract—This paper describes a method of structural pattern recognition based on genetic evolution processing of data structures with a neural network representation. Conventionally, one of the most popular learning formulations for data structure processing is Backpropagation Through Structures (BPTS) [7]. The BPTS algorithm has been successfully applied to a number of learning tasks involving structural patterns, such as image, shape, and texture classification. However, BPTS-type algorithms suffer from the long-term dependency problem when learning very deep tree structures. In this paper, we propose a genetic evolution approach for data structure processing. The idea is to tune the learning parameters by genetic evolution with specified chromosome structures. The fitness evaluation, as well as the adaptive crossover and mutation for this structural genetic processing, are also investigated. An application to flower image classification with a structural representation is provided to validate our method. The results obtained strongly support the capability of the proposed approach to classify and recognize flowers in terms of generalization and noise robustness.

Index Terms—Adaptive processing of data structures, genetic algorithm, image classification, neural networks.

(S.-Y. Cho is with the Division of Computing Systems, School of Computer Engineering, Nanyang Technological University, 50 Nanyang Ave., Singapore 639798. E-mail: assycho@ntu.edu.sg. Z. Chi is with the Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong. E-mail: enzheru@polyu.edu.hk. Manuscript received July 2003; revised Jan. 2004; accepted 19 Apr. 2004; published online 17 Dec. 2004.)

1 INTRODUCTION

In many application domains in the field of pattern recognition and classification, it is more appropriate to model objects by data structures. The topological behavior of a structural representation provides significant information for describing the nature of objects. Unfortunately, most connectionist models assume that data are organized in relatively poor structures, such as arrays or sequences, rather than in a hierarchical manner. In recent years, machine learning models conceived for dealing with sequences have been straightforwardly adapted to process data structures. For instance, in image processing, a basic issue is how to understand a particular given scene. Fig. 1 shows a tree representation of a flower image that can be used for content-based flower image retrieval and flower classification. The image can be segmented into two major regions (the background and foreground regions), and flower regions can then be extracted from the foreground region. A tree-structure representation (to some extent a semantic representation) can then be established, and the image content can be better described. The leaf nodes of the tree represent individual flower regions and the root node represents the whole image; the intermediate tree nodes denote combined flower regions. For flower classification, such a representation takes into account both the flower regions and the background. All the flower regions and the background in the tree representation contribute to the flower classification to different extents, partially decided by the tree structure.
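To make the structural representation concrete, the following minimal Python sketch (our illustration, not code from the paper; the class and field names are hypothetical) shows a region-tree node of the kind just described: each node carries an attribute vector u and a list of children, leaves correspond to individual flower regions, and the root corresponds to the whole image.

```python
from dataclasses import dataclass, field
from typing import List, Optional
import numpy as np

@dataclass
class RegionNode:
    """One node of the region tree: a segmented image region."""
    attributes: np.ndarray                       # m-dim feature vector u
    children: List["RegionNode"] = field(default_factory=list)
    label: Optional[str] = None                  # e.g., "background", "flower"

    def is_frontier(self) -> bool:
        # Terminal (leaf) nodes have no children; forward recall starts here.
        return not self.children

# A toy binary tree: root = whole image, children = foreground + background.
leaf1 = RegionNode(np.random.rand(10), label="flower")
leaf2 = RegionNode(np.random.rand(10), label="flower")
fg = RegionNode(np.random.rand(10), [leaf1, leaf2], label="foreground")
bg = RegionNode(np.random.rand(10), label="background")
root = RegionNode(np.random.rand(10), [fg, bg], label="image")
```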
Such tree structures can be processed by the sequence-based models mentioned above through a sequential representation based on the construction of trees. However, this approach has two major drawbacks. First, the sequential mapping of data structures necessarily breaks some regularities inherently associated with the structures and hence yields poor generalization. Second, since the number of nodes grows exponentially with the depth of the trees, a large number of parameters need to be learned, which makes learning difficult and inefficient.

Fig. 1. A tree representation of a flower image.

Neural networks (NNs) for adaptive processing of data structures are of paramount importance for structural pattern recognition and classification [1]. The main motivation of this adaptive processing is that neural networks are able to classify static information or temporal sequences and to perform automatic inference or learning [2], [3]. Sperduti and Starita proposed supervised neural networks for the classification of data structures [4]; this approach is based on generalized recursive neurons [1], [5]. Most recently, some advances in this area have been presented and some preliminary results have been obtained [6], [7], [8]. The basic idea of a learning algorithm for this processing is to extend the Backpropagation Through Time (BPTT) algorithm [9] to encode data structures by recursive neurons. The term "recursive neurons" means that a copy of the same neural network is used to encode every node of the tree structure. In the BPTT algorithm, the gradients of the weights to be updated are computed by backpropagating the error through the time sequence. Similarly, if learning is performed on a data structure such as a directed acyclic graph (DAG), the gradients can be computed by backpropagating the error through the data structure, which is known as the Backpropagation Through Structure (BPTS) algorithm [5].

However, this gradient-based learning algorithm has several shortcomings. First, the rate of convergence is slow, so the learning process cannot be guaranteed to complete within a reasonable time for most complex problems. Although the algorithm can be accelerated simply by using a larger learning rate, this would probably introduce oscillation and might result in a failure to find an optimal solution. Second, gradient-based algorithms are usually prone to local minima [10]. From a theoretical point of view, we believe that gradient-based learning is not very reliable for the rather complex error surfaces formulated in data structure processing. Third, it is extremely difficult for the gradient-based BPTS algorithm to learn a very deep tree structure because of the problem of long-term dependency [11], [12]. Indeed, the gradient contribution disappears at a certain tree level when the error backpropagates through a deep tree structure (i.e., the learning information is latched). This is because the decreasing gradient terms tend to zero: the backpropagated error is recursively multiplied by the derivative (between 0 and 1) of the sigmoid function in each neural node. This results in convergence stalling and yields poor generalization.
In view of the rather complex error surfaces formulated by the adaptive processing of data structures, we need more sophisticated learning schemes to replace the gradient-based algorithm, so as to prevent the learning from converging to a suboptimal solution. In our study, a Genetic-based Neural Network Processing of Data Structures (GNNPoDS) is developed to solve the problems of long-term dependency and local minima. The Genetic Algorithm (GA), or Evolutionary Computing (EC) [13], [14], [15], is a computational model inspired by population genetics. It has been used mainly for function optimization and has been demonstrated to be effective in global optimization; GA has also been successfully applied to many multiobjective optimizations. Genetic evolution learning for NNs [16], [17] has been introduced to perform a global exploration of the search space, thus avoiding the stagnation that is characteristic of local search procedures. There are a number of different ways to implement a GA, as the genetic operations can be combined in various ways. In evolving the parameters of our proposed NN processing, the usual approach is to code the NN as a string obtained by concatenating the parameter values one after another. The structure of the strings corresponds to the parameters to be learned and may vary depending on how we impose the fitness criteria. In our study, two string structures are proposed. The first is called the "whole-in-one" structure: each parameter is represented by a 12-bit code and all parameters are arranged into one long string. Simple fitness criteria based on the error between the target and output values can be applied to this kind of string structure, but convergence is slow because the dimension of the string is large. Moreover, since the string is not a simple chain like a DNA structure, but rather multidimensional in form, performing crossover becomes a rather complicated issue: a simple single-point crossover is not applicable for this structure; instead, a window crossover is suitable, where a fixed window size of crossover segments is optimized. The second string structure is called the "4-parallel" structure: each parameter in four groups is represented by a 12-bit code, and the parameters are arranged into four parametric matrices, each of which is dealt with independently in the neural network processing of data structures. This is a much faster approach than the "whole-in-one" structure, but a correlation among the different groups of parameters to be learned cannot be imposed directly by a fitness evaluation based only on the error between the target and output values. Therefore, introducing an appropriate fitness function is an important issue. Among the many available encoding schemes, binary encoding is applied because of its simplicity. The mutation and crossover size (i.e., the window size in the "whole-in-one" structure) are determined and adjusted according to the best fitness among the population, which improves the GA convergence.

Our proposed GA-based NN processing of data structures is evaluated on flower image classification [18]. In this application, semantic image contents are represented by a tree structure in which the algorithm can characterize the image features at multiple levels, benefiting image classification with a small number of simple features. Experimental results illustrate that our proposed algorithm enhances the learning performance significantly in terms of solution quality and avoidance of the long-term dependency problem in the adaptive processing of data structures.

This paper is organized as follows: The basic idea of the neural network processing of data structures is presented in Section 2, together with a discussion of the long-term dependency problem for this processing. Section 3 presents the genetic evolution of the proposed neural network processing. Section 4 describes the method of generating the flower image representation by means of the tree structure and illustrates the working principle of the proposed application. Section 5 gives the simulation results and a discussion of our study. Finally, a conclusion is drawn in Section 6.
2 NEURAL NETWORK PROCESSING OF DATA STRUCTURES (NNPoDS)

In this paper, the problem of devising neural network architectures and learning algorithms for the adaptive processing of data structures is addressed in the context of classification of structured patterns. The encoding method is based on, and modified from, the research works of [1], [4]. We consider a structured domain D of graphs (a tree is a special case of a graph); in the following discussion, we use either "graph" or "tree" as appropriate. G is a learning set representing the task of the adaptive processing of data structures. The representation by the recursive neural network is shown in Fig. 2.

Fig. 2. An illustration of a data structure with its nodes encoded by a single-hidden-layer neural network. (a) A Directed Acyclic Graph (DAG) and (b) the encoded DAG by recursive neural networks.

As shown in Fig. 2, a copy of the same neural network (shown on the right side of Fig. 2b) is used to encode every node in the graph G. Such an encoding scheme is flexible enough to allow the model to deal with DAGs of different internal structures and with different numbers of nodes. Moreover, the model can naturally integrate structural information into its processing. In the DAG shown in Fig. 2a, the operation runs forward for each graph, i.e., from the terminal nodes (N3 and N4) to the root node (N1). The maximum number of children of a node (the maximum branch factor c) is predefined for a task domain; for instance, a binary tree (in which each node has at most two children) has a maximum branch factor c equal to two. At the terminal nodes there are no inputs from children; the terminal nodes are therefore known as frontier nodes. The forward recall proceeds from the frontier nodes to the root in a bottom-up fashion. The bottom-up processing from a child node to its parent node can be denoted by an operator $q^{-1}$. Supposing a maximum branch factor of c has been predefined, each operator $q_i^{-1}$, $i = 1, 2, \ldots, c$, denotes the input from the $i$th child node into the current node. This operator is similar to the shift operator used in time series representation. Thus, the recursive network for the structural processing is formed as

$x = F_n(A q^{-1} y + B u)$,  (1)

$y = F_p(C x + D u)$,  (2)

where x, u, and y are the n-dimensional output vector of the n hidden-layer neurons, the m-dimensional inputs to the neurons, and the p-dimensional outputs of the neurons, respectively. $q^{-1}$ indicates that the input to the node is taken from its children, so that

$q^{-1} y = \begin{pmatrix} q_1^{-1} y \\ q_2^{-1} y \\ \vdots \\ q_c^{-1} y \end{pmatrix}$.  (3)

The parametric matrix A is defined as

$A = (A_1 \; A_2 \; \cdots \; A_c)$,  (4)

where c denotes the maximum number of children in the graph. A is an $n \times (c \cdot p)$ matrix such that each $A_k$, $k = 1, 2, \ldots, c$, is an $n \times p$ matrix formed by the vectors $a_j$, $j = 1, 2, \ldots, n$. B, C, and D are, respectively, $(n \times m)$-, $(p \times n)$-, and $(p \times m)$-dimensional matrices. $F_n(\cdot)$ and $F_p(\cdot)$ are n- and p-dimensional vector functions, respectively, given by

$F_n(\alpha) = \begin{pmatrix} f(\alpha_1) \\ f(\alpha_2) \\ \vdots \\ f(\alpha_n) \end{pmatrix}$,  (5)

where $f(\alpha)$ is the nonlinear function defined as $f(\alpha) = 1/(1 + e^{-\alpha})$.
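The forward recall of (1)-(3) can be sketched as a simple bottom-up recursion. The Python fragment below assumes the RegionNode class sketched in the Introduction and pads absent children with zero vectors (the paper predefines the maximum branch factor c but does not state a padding convention, so the zero-padding here is an assumption):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(node, params, c, p):
    """Recursive forward recall, eqs. (1)-(2), from frontier nodes to the root.
    params = (A, B, C, D); stacked child outputs play the role of q^{-1}y."""
    A, B, C, D = params
    child_y = [forward(ch, params, c, p) for ch in node.children]
    while len(child_y) < c:              # frontier nodes get all-zero children
        child_y.append(np.zeros(p))
    qy = np.concatenate(child_y[:c])     # (c*p)-dim stacked child outputs
    u = node.attributes                  # m-dim node attributes
    x = sigmoid(A @ qy + B @ u)          # eq. (1): hidden state of the node
    y = sigmoid(C @ x + D @ u)           # eq. (2): p-dim node output
    return y
```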
2.1 Backpropagation Through Structure (BPTS) Algorithm

According to the work of Hammer and Sperschneider [19] on the universal approximation capability of recursive neural networks, a single hidden layer is sufficient to approximate any complicated mapping. The input-output learning task is defined by estimating the parameters A, B, C, and D of the parameterization from a set of training (input-output) examples. Each input-output example is a tree data structure consisting of a number of nodes with their inputs and target outputs: each node's input is described by a set of attributes u, and the target output is denoted by t, a p-dimensional vector. The cost function is defined as a total sum-squared-error function:

$J = \frac{1}{2} \sum_{i=1}^{N_T} (t_i - y_i^R)^T (t_i - y_i^R)$,  (6)

where $N_T$ is the total number of learning data structures and $y^R$ denotes the output at the root node. Note that, in structural learning processing, it is often assumed that the attributes u are available at each node of the tree. The main step of the learning algorithm is the following gradient step:

$\theta(k+1) = \theta(k) - \eta \left. \frac{\partial J}{\partial \theta} \right|_{\theta = \theta(k)}$,  (7)

where $\theta(k)$ denotes the free learning parameters $\theta: \{A, B, C, D\}$ at the kth iteration, $\eta$ is a learning rate, and $\partial J / \partial \theta \,|_{\theta=\theta(k)}$ is the partial derivative of the cost function with respect to $\theta$ evaluated at $\theta = \theta(k)$. The derivation of the learning algorithm involves evaluating the partial derivative of the cost function with respect to the parameters at each node. The general form of these derivatives is

$\frac{\partial J}{\partial \theta} = - \sum_{i=1}^{N_T} (t_i - y_i^R)^T \Lambda(y^R) \, (\nabla_\theta x_i)$,  (8)

where $\Lambda(y)$ is a $p \times p$ diagonal matrix defined by the first derivative of the nonlinear activation function, and $\nabla_\theta x$ is an n-dimensional vector of derivatives of x with respect to the parameters. It can be evaluated as

$\nabla_\theta x = \Lambda(x) A q^{-1} \frac{\partial y}{\partial \theta}$,  (9)

where $\Lambda(x)$ is an $n \times n$ diagonal matrix defined analogously to $\Lambda(y)$. Note that $q^{-1} \partial y / \partial \theta$ essentially repeats the same computation, so the evaluation depends on the structure of the tree. This is called either the folding architecture algorithm [5] or the backpropagation through structure algorithm [4]. The formulation of the structural learning task does not require a priori knowledge of the data structures or any a priori information concerning their internal organization; however, we need to assume that the maximum number of children per node is predefined. The parameterization of the structural processing problem is said to be an overparameterization if the predefined maximum number of children is much greater than that of the real trees, i.e., the recursive network has many more redundant parameters than required to describe the behavior of the tree. Overparameterization may give rise to the problem of local minima in the BPTS learning algorithm. Moreover, the long-term dependency problem may also degrade the learning performance of BPTS through vanishing gradient information when learning deep trees: the learning information may disappear at a certain level of the tree before it reaches the frontier nodes, so that the convergence of BPTS stalls and poor generalization results. A detailed analysis of this problem is given in the next section.
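As a rough illustration of the update (7), the sketch below evaluates the cost (6) with the forward pass from the previous sketch and takes one gradient step. A finite-difference approximation stands in for the BPTS gradient of (8)-(9), purely to keep the example short; BPTS itself backpropagates the error through the tree rather than perturbing each parameter:

```python
import numpy as np

def cost(params, trees, targets, c, p):
    """Eq. (6): total sum-squared error at the root nodes."""
    err = 0.0
    for tree, t in zip(trees, targets):
        d = t - forward(tree, params, c, p)
        err += float(d @ d)
    return 0.5 * err

def gradient_step(params, trees, targets, c, p, eta=0.1, eps=1e-5):
    """Eq. (7): theta(k+1) = theta(k) - eta * dJ/dtheta, with a central
    finite difference as an illustrative stand-in for the BPTS gradient."""
    new_params = []
    for P in params:
        G = np.zeros_like(P)
        for idx, old in np.ndenumerate(P):
            P[idx] = old + eps
            j_plus = cost(params, trees, targets, c, p)
            P[idx] = old - eps
            j_minus = cost(params, trees, targets, c, p)
            P[idx] = old                      # restore the parameter
            G[idx] = (j_plus - j_minus) / (2 * eps)
        new_params.append(P - eta * G)
    return tuple(new_params)
```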
2.2 Long-Term Dependency Problem

For backpropagation learning of multilayer perceptron (MLP) networks, it is well known that, with too many hidden layers, the parameters of the very deep layers are barely updated. This is because the backpropagated errors are multiplied by the derivative of the sigmoidal function, which lies between 0 and 1, so the gradient at very deep layers can become very small. Bengio et al. [11] and Hochreiter and Schmidhuber [20] have analytically explained why backpropagation learning problems with long-term dependencies are difficult. They stated that a recurrent MLP network is able to robustly store information over long temporal sequences when the states of the network stay within the vicinity of a hyperbolic attractor, i.e., when the eigenvalues of the Jacobian are within the unit circle. However, Bengio et al. have shown that if the eigenvalues are inside the unit circle, then the Jacobian product over time steps is an exponentially decreasing function, which implies that the corresponding portion of the gradient becomes insignificant. This behavior is called the effect of vanishing gradient, or forgetting behavior [11].

In this section, we briefly describe the key aspects of the long-term dependency problem in the processing of data structures. The gradient-based learning algorithm updates the set of parameters $\theta: \{A, B, C, D\}$ of the recursive neural network node representation defined in (1) and (2), so the updated parameter can be denoted as

$\theta_{new} = \theta - \eta \nabla_\theta J$,  (10)

where $\eta$ is a learning rate and $\nabla_\theta$ is the operator

$\nabla_\theta = \left[ \frac{\partial}{\partial \theta_1} \; \frac{\partial}{\partial \theta_2} \; \cdots \; \frac{\partial}{\partial \theta_n} \right]$.  (11)

By the chain rule, the gradient can be expressed as

$\nabla_\theta J = - \sum_{i=1}^{N_T} (t_i - y_i^R)^T \nabla_{x^R} y^R \; \nabla_\theta x^R$.  (12)

If we assume that the partial gradients with respect to the parameters of the node representation at different levels of a tree are computed independently, the total gradient is the sum of these partial gradients:

$\nabla_\theta J = - \sum_{i=1}^{N_T} (t_i - y_i^R)^T \nabla_{x^R} y_i^R \left[ \sum_{l=1}^{R} J_x^{R,R-l} \, \nabla_\theta x^l \right]$,  (13)

where $l = 1, \ldots, R$ indexes the levels of a tree and $J_x^{R,R-l} = \nabla_{x^l} x^R$ denotes the Jacobian of (1) expanded over the tree from level R (the root node) back to level l. Following the analysis of Bengio et al. [11], the Jacobian $J_x^{R,n}$ is an exponentially decreasing function of n, since the backpropagated error is multiplied by the derivative of the sigmoidal function, which lies between 0 and 1, so that $\lim_{n \to \infty} J_x^{R,n} = 0$. This implies that the portion of $\nabla_\theta J$ contributed by the bottom levels of trees is insignificant compared with that of the upper levels. This effect of vanishing gradients is the main reason why the BPTS algorithm is not sufficiently reliable for discovering the relationships between desired outputs and inputs, which we term the problem of long-term dependency. We therefore propose a genetic evolution method that avoids this vanishing-gradient effect of the BPTS algorithm, so that the evaluation for updating the parameters becomes more robust for deep tree structures.
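The decay described by (13) is easy to reproduce numerically. The sketch below multiplies per-level Jacobians of the form $\Lambda(x)A$ for a hypothetical recursive network with randomly drawn weights (our illustration, not the authors' experiment); when the relevant eigenvalues stay inside the unit circle, the norm of the product shrinks roughly geometrically with depth:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8                                   # hidden neurons per node
J = np.eye(n)
for level in range(1, 21):
    x = rng.random(n)                   # hidden activations in (0, 1)
    Lam = np.diag(x * (1.0 - x))        # sigmoid derivative, at most 0.25
    A = rng.normal(0.0, 0.5, (n, n))    # hypothetical recursive weights
    J = Lam @ A @ J                     # one more level of backpropagation
    print(level, np.linalg.norm(J))     # norm decays toward zero with depth
```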
3 GENETIC EVOLUTION PROCESSING OF DATA STRUCTURES

Fig. 3. The genetic evolution cycle for the neural network processing of data structures.

The genetic evolution neural network introduces an adaptive and global approach to learning, especially in the reinforcement learning and recurrent neural network learning paradigms, where gradient-based learning often experiences great difficulty in finding the optimal solution [16], [17]. This section presents the use of the genetic algorithm for evolving the neural network processing of data structures. In our study, the major objective is to determine the parameters $\theta: \{A, B, C, D\}$ of the recursive neural network in (1) and (2) over the whole set of data structures. Our proposed genetic approach involves two major considerations. The first is the string representation of the parameters, in the form of either the "whole-in-one" or the "4-parallel" structure; these two representations are discussed in the next section. Based on these two string structures, the objective function for the fitness criterion is the other main consideration. Different string representations and objective functions can lead to quite different learning performance. A typical cycle of the evolution of the learning parameters is shown in Fig. 3. The evolution terminates when the fitness exceeds a predefined value (i.e., the objective function reaches the stopping criterion) or the population has converged.

3.1 String Structure Representation

The genetic algorithm uses binary strings, often termed chromosomes, to encode alternative solutions. In such a representation scheme, each parameter is represented by a number of bits of a certain length, and the recursive neural network is encoded by concatenating all the parameters in the chromosome. The merits of the binary representation lie in its simplicity and generality: it is straightforward to apply classical crossover (such as single-point or multipoint crossover) and mutation to binary strings. Several encoding methods (such as uniform, gray, or exponential) can be used in the binary representation. The gray code is adopted in our study to alleviate the Hamming distance problem: it ensures that the codes of adjacent integers always have a Hamming distance of one, so that the Hamming distance does not increase monotonically with the difference in integer values.

In the string structure representation, a proper string structure for the GA operations is selected depending on the fitness evaluation. A simple choice is the "whole-in-one" structure, in which all parameters are encoded into one long string. The encoding for the "whole-in-one" structure is simple, and the objective function is evaluated directly from the error between the target and root output values of the data structures. However, the dimension may be very high, making the GA operations inefficient. Moreover, this representation suffers from the permutation problem, which is caused by the many-to-one mapping from the chromosome representation to the recursive neural network: two different networks can be functionally equivalent yet have different chromosomes. The permutation problem makes the crossover operator very inefficient and ineffective in producing good offspring. Thus, another string structure representation, called the "4-parallel" structure, is used to overcome this problem. The GA process becomes efficient when it is applied to each group of parameters individually. A separate GA process can be run on each group of parameters in parallel, but the limitation lies in the inability to enforce correlation constraints among the learning parameters of each node. The objective function for this "4-parallel" string structure is therefore designed specifically to evaluate the fitness criteria for the GA operations of structural processing.

From (1) and (2), the recursive network for the structural processing can be rewritten in matrix form as

$\begin{pmatrix} h_1 \\ h_2 \end{pmatrix} = \begin{pmatrix} A & B \\ C & D \end{pmatrix} \begin{pmatrix} q^{-1} y \\ u \end{pmatrix}, \quad \begin{pmatrix} x \\ y \end{pmatrix} = F\begin{pmatrix} h_1 \\ h_2 \end{pmatrix}$,  (14)

where $h_1$ and $h_2$ are used as two dummy vectors. The matrix $\begin{pmatrix} A & B \\ C & D \end{pmatrix}$ can be encoded into one binary string for the "whole-in-one" structure. A very long chromosome is formed as

$\text{chromosome}(A, B, C, D) := \{00100 \ldots 0000110\}\big|_{d = n \cdot (c \cdot p) + n \cdot m + p \cdot n + p \cdot m}$.  (15)

On the other hand, for the "4-parallel" structure representation, four binary strings of dimensions $n \cdot (c \cdot p)$, $n \cdot m$, $p \cdot n$, and $p \cdot m$, respectively, are formed for the parametric matrices A, B, C, and D:

$\text{chromosome}(A) := \{00100 \ldots 0000110\}\big|_{d = n \cdot (c \cdot p)}$,  (16a)

$\text{chromosome}(B) := \{00100 \ldots 0000110\}\big|_{d = n \cdot m}$,  (16b)

$\text{chromosome}(C) := \{00100 \ldots 0000110\}\big|_{d = p \cdot n}$,  (16c)

$\text{chromosome}(D) := \{00100 \ldots 0000110\}\big|_{d = p \cdot m}$.  (16d)

Note that d represents the number of parameters to be learned, so the total size of a chromosome is d times the number of encoding bits.
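A gray-coded 12-bit encoding of a parametric matrix might look as follows (a sketch; the assumption that parameters live in [-1, 1] matches the weight initialization reported in Section 5, but the paper does not state the quantization range explicitly):

```python
import numpy as np

BITS = 12            # each parameter is represented by a 12-bit code
LO, HI = -1.0, 1.0   # assumed parameter range

def to_gray(v: int) -> int:
    return v ^ (v >> 1)

def from_gray(g: int) -> int:
    v = 0
    while g:
        v ^= g
        g >>= 1
    return v

def encode(matrix: np.ndarray) -> str:
    """Concatenate the gray-coded 12-bit quantization of every entry."""
    levels = (1 << BITS) - 1
    q = np.round((matrix.ravel() - LO) / (HI - LO) * levels).astype(int)
    return "".join(format(to_gray(int(v)), f"0{BITS}b") for v in q)

def decode(bits: str, shape) -> np.ndarray:
    levels = (1 << BITS) - 1
    vals = [from_gray(int(bits[i:i + BITS], 2)) / levels * (HI - LO) + LO
            for i in range(0, len(bits), BITS)]
    return np.array(vals).reshape(shape)

A = np.random.uniform(LO, HI, (8, 16))   # e.g., n = 8, c*p = 16
sA = encode(A)                           # one "4-parallel" string
# "whole-in-one": concatenate the strings of A, B, C, and D into one chromosome
assert np.allclose(decode(sA, A.shape), A,
                   atol=(HI - LO) / ((1 << BITS) - 1))
```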
3.2 Objective Function for Fitness Evaluation

The genetic algorithm with arithmetic crossover and nonuniform mutation is employed to optimize the parameters of the neural processing of data structures. The objective function is defined as the mean-squared error between the desired output and the network output at the root node:

$E_a = \frac{\sum_{i=1}^{N_T} (t_i - y_i^R)^T (t_i - y_i^R)}{N_T \cdot p}$,  (17)

where $N_T$ is the total number of data structures in the learning set, and t and $y^R$ denote the p-dimensional vectors of the desired output and the actual output at the root node. For the GA operations, the objective is to maximize the fitness value over chromosomes so as to find the optimal solution. For operations on the "whole-in-one" structure representation, the fitness evaluation can be defined simply from $E_a$:

$\text{fitness}_a = \frac{1}{1 + \sqrt{E_a}}$.  (18)

This fitness applies to the "whole-in-one" structure but cannot be applied directly to the "4-parallel" string structure. The objective function for the "4-parallel" representation is derived as follows. Let an error function $e_i(\theta) = |t_i - y_i|$ be approximated by a first-order Taylor series,

$e_i(\theta) \approx e_i(\theta_0) + \nabla_\theta e_i \cdot \Delta\theta$,  (19)

where $\theta = \{A, B, C, D\}$ represents the parameters of our proposed processing and

$\nabla_\theta = \left\{ \frac{\partial}{\partial A}, \frac{\partial}{\partial B}, \frac{\partial}{\partial C}, \frac{\partial}{\partial D} \right\}$.  (20)

Therefore, (19) becomes

$e_i(\theta) \approx e_i(\theta_0) + \left( -\frac{\partial y_i}{\partial A} \cdot \Delta A - \frac{\partial y_i}{\partial B} \cdot \Delta B - \frac{\partial y_i}{\partial C} \cdot \Delta C - \frac{\partial y_i}{\partial D} \cdot \Delta D \right)$.  (21)

In (21), the first term is the initial error term, while the second term can be regarded as a smoothness constraint given by the output derivatives with respect to the learning parameters. The objective function of this constraint becomes

$E_b = \frac{1}{N_T} \sum_{i=1}^{N_T} \left| \frac{\partial y_i^R}{\partial \theta} \cdot \Delta\theta \right|$.  (22)

The fitness evaluation for the "4-parallel" string structure representation is thus determined as

$\text{fitness}_b = \frac{1}{1 + \sqrt{\lambda E_a + (1 - \lambda) E_b}}, \quad 0 \le \lambda \le 1$,  (23)

where $\lambda$ is a constant and $(1 - \lambda)$ weights the smoothness constraint. Note that the range of this fitness evaluation is within [0, 1]. The smoothness constraint is a trade-off between the GA convergence capability and the correlation among the four groups of parameters. In our study, we empirically set $\lambda = 0.9$.
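In code, the two fitness evaluations (18) and (23) are one-liners once $E_a$ and $E_b$ are available; the helper names below are ours:

```python
import numpy as np

def mean_squared_error(targets, root_outputs) -> float:
    """Eq. (17): E_a over all N_T trees and p root output units."""
    d = np.asarray(targets) - np.asarray(root_outputs)   # shape (N_T, p)
    return float(np.sum(d * d)) / d.size

def fitness_whole_in_one(Ea: float) -> float:
    """Eq. (18): fitness for the "whole-in-one" chromosome."""
    return 1.0 / (1.0 + np.sqrt(Ea))

def fitness_4_parallel(Ea: float, Eb: float, lam: float = 0.9) -> float:
    """Eq. (23): the error term weighted against the smoothness constraint
    E_b; the paper empirically sets lam = 0.9."""
    return 1.0 / (1.0 + np.sqrt(lam * Ea + (1.0 - lam) * Eb))
```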
3.3 Selection Process

Chromosomes in the population are selected for the generation of new chromosomes by a selection scheme. A better chromosome is expected to generate a larger number of offspring and thus has a higher chance of surviving in the subsequent generation. The well-known Roulette Wheel Selection [21] is used as the selection mechanism. Each chromosome in the population is associated with a sector of a virtual wheel whose area is proportional to the chromosome's fitness value: a chromosome with a higher fitness value occupies a larger sector, while one with a lower value takes a smaller sector. The selection rate of a chromosome s is determined by

$\text{rate}(s) = \frac{F - \text{fitness}(s)}{(P_{size} - 1) \cdot F}$,  (24)

where F is the sum of the fitness values of all chromosomes and $P_{size}$ is the size of the chromosome population. In our study, a chromosome is selected if its rate is equal to or smaller than a predefined rate, which we set to 0.6.

Another selection criterion may be imposed on the constant $\lambda$ in the fitness function (23), as follows. Assume that at least one chromosome has been successfully generated in the population P, i.e., $\exists s_i \in P$ such that $E_a(s_i) \to 0$; then the fitness evaluation becomes

$\text{fitness}(s_i) = \frac{1}{1 + \sqrt{(1 - \lambda) E_b(s_i)}}$.  (25)

Consider a chromosome $s_j \in P$ that fails to be chosen in learning, i.e., $E_a(s_j) > 0$ and $\sqrt{\lambda E_a(s_j)} \gg \sqrt{(1 - \lambda) E_b(s_j)}$, so that

$\text{fitness}(s_j) = \frac{1}{1 + \sqrt{\lambda E_a(s_j)}}$.  (26)

Hence, $\lambda$ is selected so as to ensure $\text{fitness}(s_j) < \text{fitness}(s_i)$, which requires

$\sqrt{\lambda E_a(s_j)} > \sqrt{(1 - \lambda) E_b(s_i)}$,  (27)

so that

$\lambda > \frac{E_b(s_i)}{E_a(s_j) + E_b(s_i)}$.  (28)

Since our empirical study sets the constant value $\lambda = 0.9$, a chromosome is successfully selected by satisfying the criterion in (28). To sum up, a chromosome $s_{test}$ will be selected if it satisfies the following conditions:

$\frac{F - \text{fitness}(s_{test})}{(P_{size} - 1) \cdot F} \le 0.6 \quad \text{and} \quad \frac{E_b(s_{test})}{E_a(s_{test}) + E_b(s_{test})} < 0.9$.
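A compact sketch of the selection step, combining the roulette wheel with the rate test of (24); the exact interplay of the two mechanisms is our reading of the text:

```python
import random

def selection_rate(fit: float, F: float, P_size: int) -> float:
    """Eq. (24): chromosomes whose rate is <= the predefined 0.6 are eligible."""
    return (F - fit) / ((P_size - 1) * F)

def roulette_select(population, fitnesses, k, rate_max=0.6):
    """Roulette-wheel selection [21]: sampling probability proportional to
    fitness, gated by the selection-rate test of eq. (24)."""
    F = sum(fitnesses)
    P = len(population)
    eligible = [(s, f) for s, f in zip(population, fitnesses)
                if selection_rate(f, F, P) <= rate_max]
    if not eligible:                       # fall back if the gate empties the pool
        eligible = list(zip(population, fitnesses))
    chroms, fits = zip(*eligible)
    return random.choices(chroms, weights=fits, k=k)
```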
3.4 Crossover and Mutation Operations

There are several ways to implement the crossover operation, depending on the chromosome structure. Single-point crossover is appropriate for the "4-parallel" structure, but it is not applicable to the "whole-in-one" structure because of its high dimension. For the "whole-in-one" encoding, it is more appropriate to implement a window crossover, where the crossover point and the window size are taken within a valid range. The point crossover operation with probability $p_{ca}$ is applied to the "whole-in-one" chromosome: once the probability test has passed (i.e., a random number is smaller than $p_{ca}$), the crossover point is determined. The crossover window size is then determined by the best fitness ($\text{fitness}_{best}$) among the chromosome population; the idea is to force the window size to decrease as the square of the best fitness value increases. The window size is

$W_{size} = (N_{bit} - N_{crossover}) \cdot (1 - \text{fitness}_{best}^2)$,  (29)

where $N_{bit}$ denotes the number of bits in the "whole-in-one" chromosome and $N_{crossover}$ denotes the crossover point in the chromosome. The crossover operation for the "whole-in-one" structure is illustrated in Fig. 4: the parents are separated into two portions by a randomly defined crossover point, the size of the portions is determined by (29), and the new chromosome is formed by combining the shaded portions of the two parents, as indicated in Fig. 4.

For the "4-parallel" chromosome structure, whose strings are smaller than the "whole-in-one" structure, single-point crossover can be applied directly. Four crossover rates are assigned to the four groups of parameters, so that if a random number is smaller than the probability, the new chromosome is mated from the first portion of one parent and the last portion of the other. The crossover operation for the "4-parallel" structure is shown in Fig. 5.

Fig. 4. Window crossover operation for the "whole-in-one" structure.

Fig. 5. Parallel crossover operation for the "4-parallel" string structure.

Mutation introduces variations of the model parameters into the chromosomes. It provides a global searching capability for the GA by randomly altering the values of bits in the chromosomes. Bit mutation is applied to both chromosome structures in bit-string form. This is a random operation that occasionally (with probability $p_{mb}$, typically between 0.01 and 0.05) alters the value of a string bit so as to introduce variation into the chromosome: a bit is flipped if a probability test is satisfied.
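Both operators are short to express on bit strings. The sketch below implements the window crossover of (29) and the bit mutation; treating the offspring as parent 1 with a window copied from parent 2 is our reading of Fig. 4:

```python
import random

def window_crossover(parent1: str, parent2: str, best_fitness: float) -> str:
    """Window crossover for the "whole-in-one" chromosome (Fig. 4). Eq. (29):
    the window shrinks as the best fitness in the population grows."""
    n_bits = len(parent1)
    point = random.randrange(n_bits)                    # crossover point
    w = int((n_bits - point) * (1.0 - best_fitness ** 2))
    return parent1[:point] + parent2[point:point + w] + parent1[point + w:]

def bit_mutation(chrom: str, p_mb: float = 0.03) -> str:
    """Bit mutation: flip each bit with probability p_mb (0.01 to 0.05)."""
    return "".join(('1' if b == '0' else '0') if random.random() < p_mb else b
                   for b in chrom)
```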
4 STRUCTURE-BASED FLOWER IMAGE CLASSIFICATION

Flower classification is a very challenging problem with a wide range of applications, including live plant resource and data management and education on flower taxonomy [18]. There are 250,000 named species of flowering plants, and many plant species have not yet been classified and named. In fact, flower classification, or plant identification, is a very demanding and time-consuming task, which has mainly been carried out by taxonomists and botanists. A significant improvement can be expected if flower classification can be carried out by a machine-learning model with the aid of image processing and computer vision techniques. Machine learning-based flower classification from color images is, and will continue to be, one of the most difficult tasks in computer vision due to the lack of proper models or representations, the large number of biological variations that a species of flowers can take, and imprecise or ambiguous image preprocessing results. There are also still many problems in accurately locating flower regions when the background is complex, owing to the complex structure of flowers and their nature as 3D objects, which adds another dimension of difficulty to modeling.

Flowers can basically be characterized by color, shape, and texture. Color is a main feature for differentiating flowers from the background, which includes leaves, stems, shadows, soil, etc. Color-based domain knowledge can be adopted to delete pixels that do not belong to flower regions. Das et al. [27] proposed an iterative segmentation algorithm with a knowledge-driven mechanism to extract flower regions from the background. Van der Heijden and Vossepoel proposed a general contour-oriented shape dissimilarity measure for comparing flowers of potato species [28]. In another study, a feature extraction and learning approach was developed by Saitoh and Kaneko for recognizing 16 wild flowers [29]: four flower features together with two leaf features were used as the input for training a neural network flower classifier. Quite good performance was achieved by their holistic approach; however, the approach can only handle a single flower orientation per category and cannot be directly extended to several different flower orientations within the same species (i.e., the same species in different orientations and colors).

Image content representation has been a popular research topic in various image processing applications over the past few years. Most approaches represent the image content using only low-level visual features, either globally or locally. It is noted that high-level features (such as Fourier descriptors or wavelet-domain descriptors) cannot characterize image contents accurately in terms of their spatial relationships, whereas local features (such as color, shape, or spatial texture) depend on error-prone segmentation results. In this study, we consider a region-based representation called the binary tree [22], [23], [24]. The construction of the image representation is based on the extraction of the relevant regions in the image. This is typically obtained by region-based segmentation, in which the algorithm extracts the interesting regions of flower images based on a color clustering technique that simulates human visual perception [30]. Once the regions of interest have been extracted, a node is added to the graph for each of these regions. Relevant regions describing the same objects can be merged together based on a merging strategy. Binary trees can thus be formed as a semantic representation whose nodes correspond to the regions of the flower image and whose arcs represent the relationships among regions. Besides the extraction of the structure, a vector of real-valued attributes is computed to describe the image region associated with each node. The features include color information, shading/contrast properties, and invariant shape characteristics. The following sections describe how to construct the binary tree representation for flower images.

Fig. 6 illustrates the system architecture of the structure-based flower image classification. In the learning phase, a set of binary tree patterns representing flower images of different families is generated by the combined processes of segmentation, merging, and feature extraction. All these tree patterns are used to train the model by our proposed genetic evolution processing of data structures. In the classification phase, a query image is classified automatically by the trained neural network, with its binary tree generated by the same processes used for generating the learning examples.

Fig. 6. System architecture of the flower classification.
4.1 Segmentation

A color image is usually given by R (red), G (green), and B (blue) values at every pixel. The difficulty with the RGB color model is that it produces color components that do not closely follow those of the human visual system. A better color model produces components that follow the human understanding of color in terms of H (hue), S (saturation), and I (intensity or luminance) [25]; of these three components, the hue is considered a key component in human perception. However, the HSI color model has several limitations. First, the model gives equal weighting to the RGB components when computing the intensity or luminance of an image, which does not correspond with the brightness of a color as perceived by the eye. Second, the length of the maximum saturation vector varies depending on the hue of the color. Therefore, from the color clustering point of view, it is desirable for the image to be represented by color features that constitute a space with uniform characteristics, such as the $(L^*, a^*, b^*)$ color channel system [26], which has been shown to give good results in segmenting color images. The $(L^*, a^*, b^*)$ values are obtained by transforming the (R, G, B) values into the (X, Y, Z) space, which is further converted into a cube-root system. The transformation is shown below:

$\begin{pmatrix} X \\ Y \\ Z \end{pmatrix} = \begin{pmatrix} 2.7690 & 1.7518 & 1.1300 \\ 1.0000 & 4.5907 & 0.0601 \\ 0.0000 & 0.0565 & 5.5943 \end{pmatrix} \begin{pmatrix} R \\ G \\ B \end{pmatrix}$,  (30a)

$L^* = 116 \left( \frac{Y}{Y_0} \right)^{1/3} - 16$, with $\frac{Y}{Y_0} > 0.01$,  (30b)

$a^* = 500 \left[ \left( \frac{X}{X_0} \right)^{1/3} - \left( \frac{Y}{Y_0} \right)^{1/3} \right]$, with $\frac{X}{X_0}, \frac{Y}{Y_0} > 0.01$,  (30c)

$b^* = 200 \left[ \left( \frac{Y}{Y_0} \right)^{1/3} - \left( \frac{Z}{Z_0} \right)^{1/3} \right]$, with $\frac{Y}{Y_0}, \frac{Z}{Z_0} > 0.01$,  (30d)

where $X_0$, $Y_0$, and $Z_0$ are the (X, Y, Z) values of the reference white color (i.e., 255 for an 8-bit gray-scale image). The cube-root system yields a simpler decision surface in accordance with human color perception. The channels are given by

lightness: $L^*$,  (31a)

hue: $H = \tan^{-1}\!\left(\frac{b^*}{a^*}\right)$,  (31b)

chroma: $C = \sqrt{(a^*)^2 + (b^*)^2}$.  (31c)
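A direct transcription of (30)-(31) in Python; applying (255, 255, 255) as the reference $(X_0, Y_0, Z_0)$ follows the paper's remark for 8-bit images:

```python
import numpy as np

M_RGB2XYZ = np.array([[2.7690, 1.7518, 1.1300],
                      [1.0000, 4.5907, 0.0601],
                      [0.0000, 0.0565, 5.5943]])   # eq. (30a)

def lightness_hue_chroma(rgb, white=(255.0, 255.0, 255.0)):
    """Eqs. (30)-(31): RGB -> (L*, H, C); `white` holds (X0, Y0, Z0)."""
    X, Y, Z = M_RGB2XYZ @ np.asarray(rgb, dtype=float)
    X0, Y0, Z0 = white
    L = 116.0 * np.cbrt(Y / Y0) - 16.0                  # eq. (30b)
    a = 500.0 * (np.cbrt(X / X0) - np.cbrt(Y / Y0))     # eq. (30c)
    b = 200.0 * (np.cbrt(Y / Y0) - np.cbrt(Z / Z0))     # eq. (30d)
    H = np.arctan2(b, a)                                # eq. (31b), hue
    C = np.hypot(a, b)                                  # eq. (31c), chroma
    return L, H, C
```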
The proposed segmentation uses the Euclidean distance to measure the similarity between each selected cluster and the image pixels within the above cube-root system. The first step of our method is to convert the RGB components into the lightness-hue-chroma channels based on (30) and (31). The Euclidean distance between each cluster centroid and an image pixel within the lightness-hue-chroma channels is given by

$D_i = \sqrt{ \left(L^*(x,y) - L_i^*\right)^2 + \left(H(x,y) - H_i\right)^2 + \left(C(x,y) - C_i\right)^2 }$, for $1 \le i \le M$,  (32)

where M is the number of selected clusters, $(L_i^*, H_i, C_i)$ is the ith cluster centroid, and $(L^*(x,y), H(x,y), C(x,y))$ is the image pixel at coordinates x and y within the cube-root system. For clustering the regions of interest, the k-means clustering method [25] is used, such that a pixel (x, y) is identified as belonging to cluster j if $\min_{i} \{D_i(x,y)\} = D_j$. For this computation, the determination of the cluster centroids is very crucial. They are evaluated by

$L_i^* = \frac{1}{N_i} \sum_{L^*(x,y) \in i} L^*(x,y)$,  (33a)

$H_i = \frac{1}{N_i} \sum_{H(x,y) \in i} H(x,y)$,  (33b)

$C_i = \frac{1}{N_i} \sum_{C(x,y) \in i} C(x,y)$,  (33c)

where $N_i$ is the number of pixels assigned to cluster i. The number of assigned clusters is based on the number of the most dominant peaks determined by the k-means clustering within the chroma channel. For example, Fig. 7 illustrates a flower image with a histogram of the chroma channel in which there are two dominant peaks (clusters "a" and "b"), so two clusters can be assigned: one should be the background cluster and the other the foreground cluster. The segmentation results of this example image are shown in Fig. 8, in which the two images (Figs. 8a and 8b) are segmented with the two cluster centroids and the corresponding flower region is extracted as shown in Fig. 8c.

Fig. 7. (a) A single flower image example and (b) its histogram of the chroma channel.

Fig. 8. The segmentation results of a flower image example. (a) Segmentation by cluster centroid "a" in the histogram shown in Fig. 7b. (b) Segmentation by cluster centroid "b" in the histogram shown in Fig. 7b. (c) The extracted flower region by the segmentation result with the selected cluster centroid "b."
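The assignment and centroid steps, (32) and (33), form a standard k-means iteration over (L*, H, C) pixel vectors; a vectorized sketch:

```python
import numpy as np

def assign_clusters(lhc_pixels: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Eq. (32): label each (L*, H, C) pixel with its nearest centroid.
    lhc_pixels: (num_pixels, 3); centroids: (M, 3)."""
    d = np.linalg.norm(lhc_pixels[:, None, :] - centroids[None, :, :], axis=2)
    return np.argmin(d, axis=1)

def update_centroids(lhc_pixels: np.ndarray, labels: np.ndarray, M: int):
    """Eq. (33): each centroid is the mean of the pixels assigned to it."""
    return np.stack([lhc_pixels[labels == i].mean(axis=0) for i in range(M)])
```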
4.2 Merging Strategy and Tree Construction

The idea of creating and processing a tree-structured image representation is an attempt to benefit from the attractive features of the segmentation results based on the method described in the previous section. In our study, we start from the terminal nodes and merge pairs of similar neighboring regions, associated with the child nodes, based on their contents. This merging is operated iteratively by a recursive algorithm until the child nodes of the root node (i.e., the background and foreground regions) are reached. The proposed merging strategy for creating a binary tree is as follows. Assume that a merged region pair is denoted by $O(R_i, R_j)_{i \ne j} \in \Omega_{i,j}$, where $R_i, R_j$ for $i, j = 1, \ldots, P$ denote the P regions, and the entropy function used as the merging criterion for a pair of regions $R_i$ and $R_j$ is $M_{R_i \cup R_j}$. The merging criterion examines the entropy of all pairs of regions to identify which is the maximum, and the merging terminates when the last pair of regions is merged to become the entire image. At each step, the algorithm searches for the pair of regions with the most similar contents, which becomes the pair of child nodes linked to their parent node. The most similar region pair is determined by maximizing the entropy:

$O(R_i, R_j)_{i \ne j} = \arg\max_{O(R_i, R_j) \in \Omega_{i,j}} \left\{ M_{R_i \cup R_j} \right\}_{i \ne j}$.  (34)

The entropy function $M_{R_i \cup R_j}$ of regions $R_i$ and $R_j$ is computed from the color homogeneity of the two subregions and is defined as

$M_{R_i \cup R_j}\Big|_{i \ne j} = \frac{N_{R_i}}{N_T} \sum_{k=1}^{K} p_k^{R_i} \log_2 p_k^{R_i} + \frac{N_{R_j}}{N_T} \sum_{k=1}^{K} p_k^{R_j} \log_2 p_k^{R_j}$,  (35)

where $N_{R_i}$ and $N_{R_j}$ are the numbers of pixels of the two regions $R_i$ and $R_j$, $N_T$ is the total number of pixels of the parent region, K is the number of quantized colors, and $p_k$ represents the percentage of pixels of the kth color in the region. The above computation is performed recursively until no more regions can be merged.

For the flower image shown in Fig. 9a, the image is segmented into four regions, so the algorithm merges them in three steps. In the first step, suppose that the pair of most similar regions is regions "a" and "b," which are merged to create "e." Then, node "e" is merged with region "c" to create "f," corresponding to the foreground region. Finally, node "f" is merged with region "d," corresponding to the background region, to create "g," the root node corresponding to the whole image. The merging sequence is $e = O(a, b) \to f = O(c, e) \to g = O(d, f)$, and the tree constructed is shown in Fig. 9b. The merging order is based on the color homogeneity criterion as well as the number of merged regions. Fig. 10 shows the tree construction results for two other examples from our flower image database.

Fig. 9. Example of region merging to create a binary tree. (a) Four regions (including the background region) created by the segmentation and (b) the four-level binary tree.

Fig. 10. Examples of tree structures constructed from two flower images from the same class in the flower database. (a) Single flower in an image. (b) The segmentation result from (a). (c) The tree constructed based on (b). (d) Three flowers in an image. (e) The segmentation result from (d). (f) The tree constructed based on (e).

Such a region-based binary tree representation considers the foreground flower regions as well as the background containing leaves and tree branches, and it takes into consideration the contribution and distribution of multiple flowers. The representation is therefore more meaningful than using a single flower region or combined flower regions in a flat-vector representation. We believe this is a necessary step toward a more robust semantic image content representation.
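The merging criterion (34)-(35) reduces to comparing a color-homogeneity score over candidate region pairs. A sketch over quantized-color histograms (histogram extraction is assumed done elsewhere):

```python
import numpy as np

def region_entropy_term(hist: np.ndarray, N_parent: int) -> float:
    """One term of eq. (35) for a region with quantized-color histogram
    `hist` (counts over K colors): (N_R / N_T) * sum_k p_k log2 p_k."""
    N_R = hist.sum()
    p = hist[hist > 0] / N_R
    return (N_R / N_parent) * float(np.sum(p * np.log2(p)))

def merge_score(hist_i: np.ndarray, hist_j: np.ndarray) -> float:
    """Eq. (35): score of merging regions R_i and R_j; eq. (34) picks the
    candidate pair with the maximum score at each step."""
    N_T = hist_i.sum() + hist_j.sum()
    return region_entropy_term(hist_i, N_T) + region_entropy_term(hist_j, N_T)
```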
4.3 Feature Extraction

Besides the creation of the tree-structured image representation, the features of each region must be computed and attached to the corresponding node of the tree. The features can be visual features, such as color, texture, and shape, which are very important in characterizing image contents. In the tree structural representation, the content of a region can be well characterized by features including color, shape, and statistical texture attributes. Four attributes describing color, two simple statistical texture features, and four shape attributes are extracted to characterize a region (a node in the binary tree); all 10 attributes are extracted and attached to each node of the tree.

The color attributes include the percentage of the number of quantized colors in the region over that of the whole image, and the percentages of the three most dominant colors in the RGB color space. For each of the 8-bit R, G, and B components, we consider the two most significant bits, so the total number of possible color levels is 64 bins ($4 \cdot 4 \cdot 4$). These four attributes are very useful for characterizing the color property of a region. For the texture features of an image region, we represent the textures of a region in terms of two statistical attributes, the mean and the standard deviation. Apart from the color and texture features, shape features are desirable for characterizing the various flower regions. In fact, it is rather difficult to extract shape descriptors that are insensitive to large variations in image scale, rotation, and translation. In our study, two kinds of features are used to describe the shape of a flower region. Two attributes describe the edge densities in the vertical and horizontal directions of the flower region; the edge densities can be evaluated by finding the area in the histograms of edge information in both directions, and the position of a flower region has no effect on the edge directions. We also represent the shape of a flower region in terms of second-order invariant moments (two attributes); these features are invariant under rotation, scale, translation, and reflection of the image. In total, four attributes represent the shape features. Using a small number of simple features to describe each image region is, in fact, the other main merit of our tree structural representation of flowers.
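A possible packaging of the 10 attributes as one node vector is sketched below. The edge densities and invariant moments are passed in precomputed, since the paper does not detail their computation; the argument list is a hypothetical packaging of those values:

```python
import numpy as np

def node_features(region_pixels_rgb, region_gray, k_colors_region,
                  k_colors_image, edge_density_v, edge_density_h,
                  moment1, moment2):
    """Assemble the 10 attributes attached to each tree node (Section 4.3):
    4 color, 2 texture, and 4 shape attributes."""
    q = region_pixels_rgb // 64                       # 2 most significant bits
    codes = q[:, 0] * 16 + q[:, 1] * 4 + q[:, 2]      # 64-bin (4*4*4) color code
    counts = np.bincount(codes, minlength=64)
    dominant = np.sort(counts)[::-1][:3] / codes.size # 3 dominant-color ratios
    color = [k_colors_region / k_colors_image, *dominant]
    texture = [float(region_gray.mean()), float(region_gray.std())]
    shape = [edge_density_v, edge_density_h, moment1, moment2]
    return np.array(color + texture + shape)          # 10-dim attribute vector
```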
5 EXPERIMENTAL RESULTS AND DISCUSSION

This section reports the performance of flower image classification by the genetic evolution processing of data structures with the structural image representation. The image classification was performed on eight families of flowers, classified in terms of their living features in flowering plant taxonomy. The eight flower families in our experiments are Amaryllidaceae, Asteraceae, Brassicaceae, Clematis, Rosaceae, Liliaceae, Malvaceae, and Violaceae. Fig. 11 shows examples of these eight families from our flower database. Some images contain a single flower and some contain multiple flowers. Most of the flower images were collected by us, and some were downloaded from the Internet. The flower database consists of 933 images from 79 flower species; 575 images were used to generate the learning set for training, and the other 358 images were used for classification. In our study, each image was represented by three to five different trees in accordance with different k values (normally set to 3 to 5) for the k-means clustering during the segmentation process. Therefore, about 3,400 tree-structure patterns were generated across the learning and testing sets.

Fig. 11. Examples of eight different families of flower images. Families: (a) Amaryllidaceae, (b) Asteraceae, (c) Brassicaceae, (d) Clematis, (e) Rosaceae, (f) Liliaceae, (g) Malvaceae, and (h) Violaceae.

In our investigations, we compared the performance of our proposed genetic evolution algorithm with conventional neural classifiers based on the multilayer perceptron (MLP) network and the radial basis function (RBF) network. We also used the backpropagation through structure (BPTS) learning algorithm as a baseline to exhibit our algorithm's ability to overcome the long-term dependency problem. Testing was also performed under different types and conditions of noise. A single-hidden-layer recursive neural network was used to encode the node representation of the tree structures. As there are 10 input attributes and eight families in this classification problem, we set the configuration to 10-8-8. The parameter (weight) initialization is performed randomly in the range [-1, 1] with a uniform distribution.

5.1 Learning Performance Evaluations

The learning methods compared include the MLP and RBF networks with flat-vector input patterns, either region-features-based or node-features-based. The region-features-based input to the MLP and RBF classifiers is a vector of 10 input attributes per region, following the feature extraction methods of Section 4.3; the features were extracted from each region generated by the segmentation method of Section 4.1, and the vector components of all regions are arranged to form a flat vector. The node-features-based input is a long vector of 10 input attributes per node (with the same feature extraction scheme), obtained by arranging the input attributes of each node of the tree constructed by the method of Section 4.2. Our proposed genetic evolution algorithm is also compared with BPTS learning for the structural processing. In these comparative studies, the number of hidden-layer nodes is the same among the different neural classifiers, but the numbers of parameters (weights) used differ, depending on which feature-based input is used (region-based or node-based from the tree). For instance, the number of parameters used for sequential processing with flat-vector input is greater than that of the structural processing. The difference in the number of hidden nodes reflects the different computational complexity of the classifiers. For a classifier with m hidden nodes and n input features, the computational complexities of sequential processing with region-features flat-vector input, sequential processing with node-features flat-vector input, and structural processing of binary trees are, respectively, $\Theta(r \cdot n \cdot m)$, $\Theta((2r - 1) \cdot n \cdot m)$, and $\Theta((2c + n) \cdot m)$, where r and c represent the number of segmented regions and the number of categories, respectively.

The classification results are tabulated in Table 1. The comparative results show that our proposed genetic evolution algorithm exhibits the best performance, with an average classification rate of 86 percent, whereas 60 percent, 65 percent, and 70 percent were obtained, respectively, by the MLP classifier, the RBF classifier, and BPTS learning for the processing of data structures.

Table 1. A Comparison among Different Classifiers with Different Learning Methods in the Classification of Eight Species of Flowers.
5.2 Noise Robustness

Apart from the above classification results, noise sensitivity is another important issue to be evaluated. Experiments were conducted on patterns corrupted by different types and conditions of noise. The flower images were corrupted by three different types of noise, namely "Gaussian," "salt and pepper," and "multiplicative," with noise levels in the range of 1 percent to 10 percent. The noise-corrupted patterns were obtained, respectively, by changing the intensity of each pixel with a certain distribution (i.e., a certain mean and variance), changing the intensity of each uncorrelated pixel with a certain probability, and adding a certain random degree of noise to each pixel based on a uniformly distributed random variable with zero mean. The overall comparative results are tabulated in Table 2. The classification rates were obtained by averaging 20 independent runs under different initializations and different noise conditions. Using the MLP and RBF classifiers with flat-vector input breaks some regularities inherently associated with the data structures, which yields poor generalization, especially under noise corruption. Moreover, the neural processing of data structures by the BPTS learning algorithm suffers from the long-term dependency problem discussed in the previous section. The overall classification rates obtained by our proposed genetic processing of data structures, with both the "whole-in-one" and the "4-parallel" chromosome structures, are around 85 percent without noise and 80 percent under noise conditions; by contrast, approximately 70 percent was obtained by the BPTS learning algorithm. The overall classification rates of the tested methods under different noise conditions are shown in Figs. 12a, 12b, and 12c. The results show that the deviation of classification rates among these methods is smaller at lower noise levels but tends to increase as the noise level increases. The results also illustrate that our approach is more robust to noise.

Table 2. Average Classification Rates by Different Classifiers under Different Noise Conditions.

Fig. 12. Overall classification rates of different classification methods against different noise levels of (a) "Gaussian" noise, (b) "salt and pepper" noise, and (c) "multiplicative" noise.

5.3 Classification on an Extended Data Set

To further evaluate how well our system performs on flower classification, a selection of images corresponding to the studied flower families was downloaded from the Internet (http://plantsdatabase.com/) to extend our test flower database. Each image is represented by three binary trees automatically generated using the combination of segmentation, merging strategy, and feature extraction according to the schemes in Section 4. In this evaluation, five flower species were selected and added in each category, and each species has two to three images under different orientations. The number of flower species was thereby extended to 84, and the total number of testing images became 459. Although the exact characteristics of each test image are unknown, the classification performed by our structure-based classifier can be used to categorize the image from its binary tree representation, which is then visually compared against the image database for retrieval. The classification accuracy is shown by the confusion matrix in Table 3. The result is encouraging and consistent under the extension of the testing flower database to classify a larger number of flower species.

Table 3. Classification Confusion Matrix on the Extended Testing Data Set.
The overall comparative results are tabulated in the table above. The classification rates were obtained by averaging 20 independent runs under different initializations and noise conditions. In fact, using the MLP and RBF classifiers with a flat-vector input breaks some regularities inherently associated with the data structures, which yields poor generalization, especially under noise corruption. Moreover, neural processing of data structures by the BPTS learning algorithm suffers from the long-term dependency problem discussed in the previous section. The overall classification rates obtained by our proposed genetic processing of data structures, with both the "whole-in-one" and the "4-parallel" chromosome structures, are around 85 percent without noise and 80 percent under noise, whereas only about 70 percent was obtained by the BPTS learning algorithm. The overall classification rates of the tested methods under different noise conditions are shown in Figs. 12a, 12b, and 12c. The results show that the deviation of the classification rates among these methods is small at lower noise levels but tends to increase as the noise level increases. The results also illustrate that our approach is more robust to noise.

Fig. 12. Overall classification rates of different classification methods against different noise levels of (a) "Gaussian" noise, (b) "Salt and Pepper" noise, and (c) "Multiplicative" noise.

5.3 Classification on an Extended Data Set

To further evaluate how well our system performs for flower classification, a selection of images corresponding to the studied flower families was downloaded from the Internet (http://plantsdatabase.com/) to extend our test flower database. Each image is represented by three binary trees automatically generated using a combination of segmentation, merging, and feature extraction according to the schemes in Section 4. In this evaluation, five flower species in each category were added, each species having two to three images under different orientations. The number of flower species was thereby extended to 84, and the total number of testing images became 459. Although the exact characteristics of each test image are unknown, the classification performed by our structure-based classifier can be used to categorize the image from its binary tree representation, which is then visually compared with the image database for retrieval. The classification accuracy is shown by the confusion matrix in the table below. The result is encouraging and remains consistent as the testing flower database is extended to cover a larger number of flower species.

TABLE. Classification Confusion Matrix on the Extended Testing Data Set.

CONCLUSION

In this paper, we propose a new approach to image classification, referred to as adaptive processing of data structures with genetic evolution learning. Unlike conventional connectionist and statistical approaches, which typically rely on a static representation of data as vectors of features, patterns can be better represented by directed graphs or trees, which are subsequently processed by specific neural networks. We emphasize that it is extremely difficult for the gradient-based Backpropagation Through Structure (BPTS) algorithm to learn a very deep tree structure because of the long-term dependency problem: the gradient contribution vanishes beyond a certain depth of the tree structure owing to the vanishing-gradient effect, or the forgetting behavior. In our study, we develop a genetic evolution processing to overcome this problem. In the proposed framework, the parameters (weights) are tuned by genetic evolution with adaptive crossover and mutation. Two different chromosome structures, namely, "whole-in-one" and "4-parallel," are proposed. The "4-parallel" structure delivers slightly better results than the "whole-in-one" structure under the specified fitness evaluation, and its convergence rate is also faster.
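A schematic sketch of such a "whole-in-one" evolution loop appears below: all network weights form a single real-valued chromosome initialized in [-1, 1], and the crossover and mutation rates adapt to the fitness spread of the population. The selection scheme, the rate schedule, and the operator details are illustrative placeholders rather than the exact design evaluated above.

```python
import numpy as np

def evolve(fitness, n_weights, pop_size=40, generations=100, seed=0):
    """Tune one flat weight chromosome by genetic evolution with
    adaptive crossover and mutation (schematic sketch only)."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-1.0, 1.0, (pop_size, n_weights))  # [-1, 1] init
    for _ in range(generations):
        fit = np.array([fitness(ind) for ind in pop])
        pop = pop[np.argsort(fit)[::-1]]                 # best first
        spread = fit.max() - fit.mean()
        # Adaptive rates (illustrative): disrupt more once the
        # population stagnates, less while it is still improving.
        p_c, p_m = (0.6, 0.01) if spread > 1e-3 else (0.9, 0.10)
        children = pop[: pop_size // 2].copy()           # truncation selection
        rng.shuffle(children)
        for i in range(0, len(children) - 1, 2):         # arithmetic crossover
            if rng.random() < p_c:
                a = rng.random()
                x, y = children[i].copy(), children[i + 1].copy()
                children[i] = a * x + (1.0 - a) * y
                children[i + 1] = a * y + (1.0 - a) * x
        mask = rng.random(children.shape) < p_m          # per-gene mutation
        children[mask] += rng.normal(0.0, 0.1, int(mask.sum()))
        pop = np.vstack([pop[: pop_size - len(children)], children])
    fit = np.array([fitness(ind) for ind in pop])
    return pop[np.argmax(fit)]
```

In this setting, the fitness function would decode the chromosome into the weights of the recursive network, run the forward pass over the training trees, and return a score that decreases with the error at the root outputs; a "4-parallel" variant would maintain four such sub-chromosomes and evolve them side by side.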
Experimental results on flower image classification have shown the advantages of our proposed algorithm. We considered a region-based binary tree representation that describes the image at multiple levels, and the connectivity between regions is translation invariant. The evaluations of learning performance and noise robustness also show that our approach delivers promising performance for flower classification and recognition and is more robust to noise than the other methods tested.

ACKNOWLEDGMENTS

The work described in this paper is partially supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. PolyU 5119/01E) and an ASD grant from The Hong Kong Polytechnic University (Project No. A408).

Siu-Yeung Cho received the BEng (Hons) degree from the University of Brighton, United Kingdom, in 1994 and the PhD degree from the City University of Hong Kong in August 1999, both in electronic engineering. He is now an assistant professor in the School of Computer Engineering, Nanyang Technological University, Singapore. His research interests include neural networks, pattern recognition, and 3D computer vision. He has published more than 30 technical papers. He is a member of the IEEE.
Zheru Chi received the BEng and MEng degrees from Zhejiang University in 1982 and 1985, respectively, and the PhD degree from the University of Sydney in March 1994, all in electrical engineering. Since February 1995, he has been with The Hong Kong Polytechnic University, where he is now an associate professor in the Department of Electronic and Information Engineering. Since 1997, he has served as a session organizer, session chair, area moderator, and program committee member for a number of international conferences. Dr. Chi was one of the contributors to the Comprehensive Dictionary of Electrical Engineering (CRC Press and IEEE Press, 1999). His research interests include image processing, pattern recognition, example-based machine translation, and computational intelligence. He has coauthored one book and seven book chapters and published more than 100 technical papers. He is a member of the IEEE.