SOME CONTEXT FUZZY CLUSTERING METHODS FOR CLASSIFICATION PROBLEMS

Bui Cong Cuong
Institute of Mathematics, VAST
18 Hoang Quoc Viet, Ha Noi, Viet Nam
ccuong@inbox.com

Le Hoang Son
Ha Noi University of Science, VNU
334 Nguyen Trai, Ha Noi, Viet Nam
sonlh@vnu.edu.vn

Hoang Thi Minh Chau
University of Economic and Technical Industries
456 Minh Khai, Ha Noi, Viet Nam
htmchau.uneti@moet.edu.vn

ABSTRACT
In this paper, we propose a two-context fuzzy clustering algorithm (2C-FCM) and its parallel version, P2C-FCM, for classification problems. Initial experiments show the effectiveness of P2C-FCM and 2C-FCM compared with traditional context FCM (CFCM). P2C-FCM and 2C-FCM serve as the basis for generating fuzzy rules that classify the member countries of the United Nations Organization (UNO) according to the Human Development Index, based on the UNO statistics for 2005.

Categories and Subject Descriptors
I.5.3 [Pattern Recognition]: Clustering – algorithms

General Terms
Algorithms, Experimentation, Theory

Keywords
Parallel Fuzzy Clustering, Fuzzy Rules, Classification

1. INTRODUCTION
The problems of classification and data clustering have been studied for a long time. Recent prominent approaches have concentrated on the fuzzy clustering method (FCM), whose applications range from data analysis, pattern recognition and image segmentation to group-positioning analysis, satellite imagery and financial analysis. With the growing demand for intelligent and highly autonomous systems, it would be beneficial to combine robust learning capabilities with a high level of knowledge interpretability. Fuzzy neurocomputation supports a new paradigm of intelligent information processing [1, 2, 5-8] in which this powerful combination can be achieved. W. Pedrycz et al. have presented several knowledge-based clustering methods, including the context fuzzy C-means method (CFCM). CFCM is also considered a strong aid for rule extraction and data mining [1, 2, 4-8], fields in which fuzzy factors are common and give rise to various research trends.

H. M. Berenji [3] considered FCM as a method for tuning fuzzy rules, and G. Bortolan and W. Pedrycz [2] used CFCM in the design of fuzzy neural networks. In [2], the contexts and the resulting prototypes in the input space are used directly to construct the fuzzy neural networks. FCM and CFCM are thus useful techniques in current computing methodologies.

1.1 Previous work
The first context-based clustering approach was proposed by W. Pedrycz [2] under the name context fuzzy C-means method (CFCM). That study defined a context variable that narrows the original dataset according to conditions on certain dimensions. Because only the subset of the original dataset that is meaningful for the context is used, the speed and efficiency of classification can be improved considerably, and the result focuses on the area that actually contains many relevant points. In particular, context-sensitive FCM concentrates the classification on a subspace determined by the conditions on the dimensions that define the context. The convergence conditions of the CFCM algorithm are analogous to those of standard FCM, as shown in [3]. CFCM is relatively faster than FCM when there are few context variables, and it has been the state-of-the-art algorithm in context-based clustering.

However, we sometimes need to concentrate on the more important variables among the antecedent attributes of fuzzy rules. For example, we may need to look for countries that have a 'high' GDP per capita (GDPPC) and a 'high' Education Index (EI) in the statistics of the United Nations Organization (UNO). Traditional CFCM does not support this analysis, because its definition uses only one context variable. Consequently, another method is needed to deal with this kind of request.

1.2 This work
In this paper, we present a two-context fuzzy clustering method (2C-FCM) and a parallel solution
for this algorithm, called the Parallel Two-Context Fuzzy Clustering Method (P2C-FCM). These methods are applied to generate fuzzy rules in contextual situations for the classification problem. Experiments on the UNO statistics for 2005 show the effectiveness of the proposed algorithms.

The paper is organized as follows. Section 2 reviews the CFCM method. Section 3 is devoted to a concrete 2C-FCM algorithm, which can be applied in the parameter-training phase of the process and to the calculation of the partition matrix. The parallel solution, P2C-FCM, is presented in Section 4. Section 5 presents some evaluations, followed by the application of these clustering methods to classifying the member countries of the UNO according to the Human Development Index, based on the UNO statistics for 2005. Finally, we draw conclusions and outline future work in the last section.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SoICT'10, August 27–28, 2010, Hanoi, Vietnam.
Copyright 2010 ACM 978-1-4503-0105-3/10/08…$10.00.

2. CONTEXT FUZZY C-MEANS METHOD
Given a dataset of n attributes, $X = \{x_1, \dots, x_N\}$, and supposing that missing data have been processed, our purpose is to classify the data into C clusters. We work in the n-dimensional space ($x \in \mathbb{R}^n$), where $x_k$ is the k-th point and $v_i$ is the center of the i-th cluster. We then define a context variable on $Y \subseteq X$ through the map

$$A: Y \to [0,1], \qquad y_k \mapsto f_k = A(y_k).$$

The value $f_k$ can be understood as the degree of relation of the k-th point to the context Y.

Standard FCM is a direction-free construction: it treats every dimension alike, regardless of whether it is an input or an output variable. This may lead to an unreasonable distribution of prototypes (cluster centers), because the algorithm sweeps even over unrelated areas of the data space. Context-sensitive FCM (CFCM), in contrast, concentrates the classification on a subspace determined by conditions on the dimensions that define the context, so only the subset of the original dataset that is meaningful for the context takes part in the algorithm.

There are several ways to define the relation between $f_k$ and the memberships $u_{ik}$ of the k-th point to the clusters, for instance the sum operator (1) or the maximum operator (2):

$$f_k = \sum_{i=1}^{C} u_{ik}, \quad k = 1,\dots,N \tag{1}$$

$$f_k = \max_{i=1,\dots,C} u_{ik}, \quad k = 1,\dots,N \tag{2}$$

Here $u_{ik}$ is the membership value of the k-th point in the i-th cluster, an element of the partition matrix U, defined as follows:

$$\mathcal{U}(f) = \Big\{ u_{ik} \in [0,1] : \sum_{i=1}^{C} u_{ik} = f_k,\ k = 1,\dots,N;\quad 0 < \sum_{k=1}^{N} u_{ik} < N,\ i = 1,\dots,C \Big\}.$$

The basic objective function is

$$J = \sum_{i=1}^{C} \sum_{k=1}^{N} u_{ik}^{m} \, \lVert x_k - v_i \rVert^2,$$

where m is the fuzziness coefficient.

The context fuzzy c-means algorithm has four steps:

Step 1. Initialize the partition matrix $U^{(t)}$ with $t = 0$.

Step 2. Recalculate the center of each cluster:
$$v_i = \frac{\sum_{k=1}^{N} u_{ik}^{m} x_k}{\sum_{k=1}^{N} u_{ik}^{m}}, \quad i = 1,\dots,C.$$

Step 3. Recalculate the matrix $U^{(t+1)}$:
$$u_{ik} = \frac{f_k}{\sum_{j=1}^{C} \left( \dfrac{\lVert x_k - v_i \rVert}{\lVert x_k - v_j \rVert} \right)^{2/(m-1)}}, \quad i = 1,\dots,C;\ k = 1,\dots,N.$$

Step 4. If the error $\lVert U^{(t+1)} - U^{(t)} \rVert$, measured in some matrix norm, is less than a given threshold, the algorithm stops; otherwise, set $t := t + 1$ and return to Step 2.

We arrive at the formula in Step 3 by transforming the condition $U \in \mathcal{U}(f)$ into a standard unconstrained optimization problem with Lagrange multipliers and determining a critical point of the resulting function. Compared with FCM, the only change is the total membership of each point across all clusters: this sum is no longer necessarily equal to 1 but can vary from 0 to 1. It is clear from these formulae that if a point has no meaning in a given context, its context value $f_k$ equals 0 and it plays no role in updating the cluster centers and the membership values. The objective function itself remains unchanged.

3. 2-CONTEXT FUZZY CLUSTERING
Sometimes we need to concentrate on the more important variables among the antecedent attributes of fuzzy rules. The following 2-context fuzzy clustering algorithm (2C-FCM) classifies the data set $\{x_k : x_k \in \mathbb{R}^n\}$. We choose two context variables and define two maps, A and B, on the subspaces Y and Z as follows:

$$A: Y \to [0,1], \qquad y_k \mapsto f_{1k} = A(y_k) \tag{3}$$

$$B: Z \to [0,1], \qquad z_k \mapsto f_{2k} = B(z_k) \tag{4}$$

The objective function is

$$J = \sum_{i=1}^{C} \sum_{k=1}^{N} u_{ik}^{m} \, \lVert x_k - v_i \rVert^2.$$

The 2-context fuzzy clustering algorithm (2C-FCM)
The algorithm consists of six steps:

Step 1. Use FCM with the first chosen context variable to classify the data into $C_1$ clusters with the objective function
$$J = \sum_{j=1}^{C_1} \sum_{k=1}^{N} u_{kj}^{m} \, \lVert x_k - v_j \rVert^2,$$
where $x_k \in \mathbb{R}$, $k = 1,\dots,N$, are the data values of the first context variable. The first resulting partition matrix is
$$U = \Big\{ u_{kj} \in [0,1] : \sum_{j=1}^{C_1} u_{kj} = 1,\ k = 1,\dots,N \Big\}.$$
The output of this step is the matrix U and $C_1$ cluster centers in $\mathbb{R}$.

Step 2. Calculate the first-level context values
$$f_{1k}^{(j)} = u_{kj}, \quad k = 1,\dots,N,\ j = 1,\dots,C_1. \tag{5}$$

Step 3. For each context value $f_{1k}^{(i)}$, $i = 1,\dots,C_1$, use CFCM to classify the second chosen context variable into $C_2$ clusters according to the objective function
$$J = \sum_{j=1}^{C_2} \sum_{k=1}^{N} u_{kj}^{m} \, \lVert x_k - v_j \rVert^2,$$
where $x_k \in \mathbb{R}$, $k = 1,\dots,N$, are the data values of the second context variable. The second resulting partition matrix is
$$U^{(i)} = \Big\{ u_{kj} \in [0,1] : \sum_{j=1}^{C_2} u_{kj} = f_{1k}^{(i)},\ k = 1,\dots,N \Big\}.$$

Step 4. Define the context values for the second chosen attribute:
$$f_{2k}^{(j)} = u_{kj}, \quad k = 1,\dots,N,\ j = 1,\dots,C_2.$$
Because Step 3 is carried out for each of the $C_1$ first-level contexts, we obtain $C_1 C_2$ context values
$$f_{2k}^{(l)}, \quad l = 1,\dots,C_1 C_2. \tag{6}$$

Step 5. For each context value $f_{2k}^{(l)}$, use CFCM to classify the remaining data set into C groups with the objective function
$$J = \sum_{s=1}^{C} \sum_{k=1}^{N} u_{ks}^{m} \, \lVert x_k - v_s \rVert^2,$$
where $x_k \in \mathbb{R}^{n-2}$ are the data values of the $n-2$ remaining attributes. The partition matrix is
$$U^{(l)} = \Big\{ u_{ks} \in [0,1] : \sum_{s=1}^{C} u_{ks} = f_{2k}^{(l)},\ k = 1,\dots,N \Big\}.$$

Step 6. In the final step, we obtain $C_1 C_2 C$ clusters corresponding to the $C_1 C_2$ contexts derived from the chosen context variables. The final partition matrix is
$$U = \Big\{ u_{ks} \in [0,1] : \sum_{s=1}^{C_1 C_2 C} u_{ks} = 1,\ k = 1,\dots,N \Big\}.$$

As we can see, context variables enhance the level of detail of knowledge-based clustering. For example, 2C-FCM lists the countries that have both a 'high' GDPPC and a 'high' EI, whereas CFCM shows only the countries with a 'high' GDPPC or those with a 'high' EI. For this reason, CFCM is sometimes called one-context FCM (1C-FCM). Although the data values outside the context variables are of less importance, they must still be taken into account, on the premise that all data are related to some degree to the given context variables.

4. PARALLEL 2-CONTEXT FCM
The 2C-FCM method increases the level of detail compared with traditional CFCM. However, its computation time also increases, because one more context variable must be handled. Each step of 2C-FCM basically uses CFCM or FCM as a tool to classify specific data values. Assuming that CFCM and FCM have the same complexity here, the total workload of 2C-FCM is:
Step 1: FCM is used once;
Step 3: CFCM is used $C_1$ times;
Step 5: CFCM is used $C_1 C_2$ times.
The complexity of general FCM (CFCM) is $O(n^4)$. Consequently, the complexity of 2C-FCM is equivalent to $O(n^6)$. Although we gain more detailed knowledge, the speed of 2C-FCM is a big obstacle when the dataset is relatively large. In a stock market, for example, where there are many shareholders and many transactions in a short time, the classification is very difficult. Until now, we have not found an optimal solution for this case.

Supercomputers, and especially parallel computation, provide an answer to this problem. Here is the parallel solution for the 2-context fuzzy clustering method.

The parallel 2-context fuzzy clustering algorithm (P2C-FCM)
The algorithm consists of the following steps:

Step 1. Use FCM with the first chosen context variable to classify the data into $C_1$ clusters with the objective function
$$J = \sum_{j=1}^{C_1} \sum_{k=1}^{N} u_{kj}^{m} \, \lVert x_k - v_j \rVert^2,$$
where $x_k \in \mathbb{R}$, $k = 1,\dots,N$, are the data values of the first context variable.

5. EVALUATION
The 2C-FCM and P2C-FCM methods are implemented in C and MPI/C, respectively, and executed on a Linux Cluster 1350 with eight computing nodes of 51.2 GFlops. Each node contains two dual-core Intel Xeon 3.2 GHz processors and 2 GB of RAM. We run the experiments on the UNO dataset for 2005. The scenario consists of three parts:
1) Does 2C-FCM really bring more information than CFCM?
2) What is the optimal number of processors for P2C-FCM?
3) How do CFCM, 2C-FCM and P2C-FCM compare in speed?

First, we use 2C-FCM to classify the UNO dataset with two context variables: Education Index (EI) and GDP per capita (GDPPC). Two clusters are used for each context variable, namely "high EI" and "low EI" for the EI context variable, "high GDPPC" and "low GDPPC" for GDPPC, and "high" and "low" for the remaining dataset. The following figure shows the membership degrees of all countries in the dataset to the first two groups.

In P2C-FCM, the output of Step 1 is the matrix $U_1$ and $C_1$ cluster centers in $\mathbb{R}$:
$$U_1 = \Big\{ u_{kj} \in [0,1] : \sum_{j=1}^{C_1} u_{kj} = 1,\ k = 1,\dots,N \Big\},$$
with cluster centers $V(j)$, $j = 1,\dots,C_1$.

Suppose that there are h processors. The matrix $U_1$ and the $C_1$ cluster centers are split among them as shown in the figure: each processor receives the quotient of $C_1$ by h context values and cluster centers, and the first few processors take the extra context values and cluster centers left over by the remainder. This procedure can be exemplified by the following pseudo-code:

int NumRows = C1 / h;     /* context values and centers per processor */
int Surpluses = C1 % h;   /* leftover context values and centers */
int pos = 1;
For each processor ID:
- Calculate the number of data which will be sent to processor ID:
int NumData = (ID