2013 IEEE RIVF International Conference on Computing & Communication Technologies: Research, Innovation, and Vision for the Future (RIVF)

A max-min learning rule for Fuzzy ART

Nong Thi Hoa, The Duy Bui
Human Machine Interaction Laboratory
University of Engineering and Technology
Vietnam National University, Hanoi

Abstract—Fuzzy Adaptive Resonance Theory (Fuzzy ART) is an unsupervised neural network that clusters data effectively by learning from training data. In the learning process, Fuzzy ARTs update the weight vector of the winning category based on the current input pattern from the training data. Fuzzy ARTs, however, only learn from patterns whose values are smaller than the values of stored patterns. In this paper, we propose a max-min learning rule for Fuzzy ART that learns from all patterns of the training data and reduces the effect of abnormal training patterns. Our learning rule changes the weight vector of the winning category based on the minimal difference between the current input pattern and the old weight vector of the winning category. We have also conducted experiments on seven benchmark datasets to prove the effectiveness of the proposed learning rule. The experimental results show that the clustering results of Fuzzy ART with our learning rule (Max-min Fuzzy ART) are significantly higher than those of other models on complex datasets.

Index Terms—Fuzzy ART; Adaptive Resonance Theory; Clustering; Learning Rule; Unsupervised Neural Network

I. INTRODUCTION

Clustering is an important tool in data mining and knowledge discovery. It discovers hidden similarities and key concepts in data through its ability to group similar items together automatically. Moreover, clustering classifies a large amount of data into a small number of groups, which makes it an invaluable tool for users who need to comprehend large amounts of data.

Fuzzy Adaptive Resonance Theory (Fuzzy ART) is an unsupervised neural network that clusters data effectively. In Fuzzy ART, the weight vectors of categories are updated when the similarity between the input pattern and the winning category satisfies a given condition. Studies on Fuzzy ART can be divided into three categories: developing new models, studying properties of Fuzzy ART, and optimizing the computation of models. In the category of developing new models, Fuzzy ARTs are improved in the learning step in order to increase the clustering ability. Carpenter et al. [1] proposed the first Fuzzy ART, showing the capacity for stable learning of recognition categories. Isawa et al. [2], [3] proposed an additional step, Group Learning, to represent connections between similar categories. Yousuf and Murphey [4] provided an algorithm that allowed Fuzzy ART to update multiple matching categories. However, previous Fuzzy ARTs only learn from patterns whose values are smaller than the values of stored patterns. That is, the weight vector of the winning category is modified only when the values of the input pattern are smaller than the values of that weight vector; otherwise, the weight vector of the winning category is not changed. In other words, some input patterns might not contribute to the learning process of Fuzzy ART, so some important features of these patterns are not learned.

In this paper, we propose a new learning rule for Fuzzy ART, which we name the max-min learning rule. Our learning rule allows all training patterns to contribute to the learning process and reduces the effect of abnormal training patterns. The proposed learning rule updates the weight vector of the winning category based on the minimal difference between the current input pattern and the old weight vector of the winning category. We have conducted experiments on seven benchmark datasets from the UCI and Shape databases to prove the effectiveness of Max-min Fuzzy ART. The results of the experiments show that Max-min Fuzzy ART clusters better than other models.

The rest of this paper is organized as follows. The next section gives the background of Fuzzy ART. Related works are presented in Section III. In Section IV, we present the proposed learning rule. Section V describes the experiments and compares the results of Max-min Fuzzy ART with those of other models.

978-1-4799-1350-3/13/$31.00 ©2013 IEEE

II. BACKGROUND

A. ART Network

Adaptive Resonance Theory (ART) neural networks [5], [6] were developed to address the stability-plasticity dilemma. The general structure of an ART network is shown in the Figure.

Figure. Architecture of an ART network

A typical ART network consists of two layers: an input layer (F1) and an output layer (F2). The input layer contains n nodes, where n is the dimensionality of the input patterns. The number of nodes in the output layer is decided dynamically, and every node in the output layer has a corresponding weight vector. The network dynamics are governed by two sub-systems: an attention subsystem and an orienting subsystem. The attention subsystem proposes a winning neuron (or category), and the orienting subsystem decides whether to accept the winning neuron or not. The network is in a resonant state when the orienting subsystem accepts a winning category.

B. Fuzzy ART Algorithm

Carpenter et al. summarize the Fuzzy ART algorithm in [1].

Input vector: each input I is an M-dimensional vector (I1, ..., IM), where each component Ii is in the interval [0, 1].

Parameters: Fuzzy ART's dynamics are determined by a choice parameter α > 0, a learning rate parameter β ∈ [0, 1], and a vigilance parameter ρ ∈ [0, 1].

The Fuzzy ART algorithm consists of the following five steps.

Step 1: Set up the weight vectors. Each category j corresponds to a vector Wj = (Wj1, ..., WjM) of adaptive weights, or LTM traces. The number of potential categories N (j = 1, ..., N) is arbitrary. Initially

Wj1 = ... = WjM = 1

and each category is said to be uncommitted. Alternatively, the initial weights Wji may be taken greater than 1; larger weights bias the system against the selection of uncommitted nodes, leading to deeper searches of previously coded categories. After a category is selected for coding, it becomes committed. Each LTM trace Wji is monotone non-increasing through time and hence converges to a limit.

Step 2: Choose a winning category. For each input I and category j, the choice function Tj is defined by

Tj(I) = ||I ∧ Wj|| / (α + ||Wj||)   (1)

where the fuzzy AND operator ∧ is defined by

(x ∧ y)i = min(xi, yi)   (2)

and where the norm ||·|| is defined by

||x|| = Σ (i = 1 to M) |xi|   (3)

For notational simplicity, Tj(I) is often written as Tj when the input I is fixed. The category choice is indexed by J, where

TJ = max{Tj : j = 1, ..., N}   (4)

If more than one Tj is maximal, the category j with the smallest index is chosen; in particular, nodes become committed in the order j = 1, 2, 3, ...

Step 3: Test the state of Fuzzy ART. Resonance occurs if the match function of the chosen category meets the vigilance criterion, that is, if

||I ∧ WJ|| / ||I|| ≥ ρ   (5)

in which case the learning process of Step 4 is performed. Mismatch reset occurs if

||I ∧ WJ|| / ||I|| < ρ   (6)

Then the value of the choice function TJ is reset to −1 for the duration of the input presentation, a new index J is chosen by Eq. (4), and the search process continues until the chosen J satisfies Eq. (5) or activates a new category.

Step 4: Perform the learning process. The weight vector of the Jth category, WJ, is updated according to the following equation:

WJ(new) = β(I ∧ WJ(old)) + (1 − β) WJ(old)   (7)

Step 5: Activate a new category. For each input I, if no existing category satisfies Eq. (5), then a new category J becomes active and WJ(new) = I.

C. Fuzzy ART with complement coding

Proliferation of categories is avoided in Fuzzy ART if inputs are normalized; that is, if for some γ > 0

||I|| ≡ γ   (8)

for all inputs I. Normalization can be achieved by preprocessing each incoming vector a. Complement coding represents both a and the complement of a. The complement of a is denoted by ac, where

aci = 1 − ai   (9)

The complement-coded input I to the recognition system is the 2M-dimensional vector

I = (a, ac) = (a1, ..., aM, ac1, ..., acM)   (10)

After complement coding,

||I|| = M   (11)

so inputs preprocessed into complement-coding form are automatically normalized. Where complement coding is used, the initial condition of Step 1 is replaced by

Wj1 = ... = Wj,2M = 1   (12)

III. RELATED WORK

Studies on the theory of Fuzzy ART can be divided into three categories: developing new models, studying properties of Fuzzy ART, and optimizing the computation of models.

In the first category, new models of Fuzzy ART used a general learning rule. Carpenter et al. [7] proposed Fuzzy ARTMAP for incremental supervised learning of recognition categories and multidimensional maps from arbitrary sequences of input sets. This model minimized predictive error and maximized code generalization by increasing the ART vigilance parameter to correct predictive errors. Prediction was improved by training the system several times with different orderings of the input set and then voting; the vote assigns probability estimates to competing predictions for small, noisy, or incomplete data. Isawa et al. [2] proposed an additional step called Group Learning, whose important feature was the creation of connections between similar categories; the model learned not only the weight vectors of categories but also the relations among categories in a group. Isawa et al. [3] then designed an improved Fuzzy ART that combines overlapped categories based on these connections. That study assigned vigilance parameters to individual categories and varied them during the learning process; moreover, the model avoided category proliferation. Yousuf and Murphey [4] proposed an algorithm that compares the weights of every category with the current input pattern simultaneously and allows multiple matching categories to be updated. The model monitored the effects of updating wrong clusters, and the weight scaling of categories depended on the closeness of the weight vectors to the current input pattern.
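For concreteness, the Fuzzy ART presentation cycle reviewed in Section II (complement coding, category choice, vigilance test, and learning) can be sketched in Python. This is a minimal illustration under our own naming (NumPy arrays, a plain list of weight vectors), not the authors' implementation:

```python
import numpy as np

def complement_code(a):
    """Complement coding: I = (a, 1 - a), so ||I|| = M for any a in [0,1]^M."""
    return np.concatenate([a, 1.0 - a])

def fuzzy_art_present(I, W, alpha=0.01, beta=1.0, rho=0.5):
    """One presentation of input I: category choice, vigilance test, learning.
    W is a list of weight vectors; returns (winner index, updated W)."""
    # Choice function T_j = ||I ∧ w_j|| / (alpha + ||w_j||)
    T = [np.minimum(I, w).sum() / (alpha + w.sum()) for w in W]
    # Search categories in descending choice value; the stable sort keeps
    # the smallest index first when several T_j are maximal.
    for j in sorted(range(len(W)), key=lambda k: -T[k]):
        match = np.minimum(I, W[j]).sum() / I.sum()
        if match >= rho:                                   # resonance
            W[j] = beta * np.minimum(I, W[j]) + (1 - beta) * W[j]
            return j, W
    W.append(I.copy())   # mismatch everywhere: commit a new category, w = I
    return len(W) - 1, W
```

With β = 1 (fast learning), a winning all-ones uncommitted category takes the weight vector I ∧ 1 = I, which matches the initialization of a newly committed category.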
In the second category, important properties of Fuzzy ART were studied in order to choose suitable parameters for each Fuzzy ART. Huang et al. [8] presented several vital properties, distinguished into a number of categories; they included template, access, reset, and other properties concerning weight stabilization. Moreover, the effects of the choice parameter and the vigilance parameter on the functionality of Fuzzy ART were presented clearly. Georgiopoulos et al. [9] provided a geometrical and clearer understanding of why, and in which order, categories are chosen for various ranges of the choice parameter. This study is useful when developing properties of learning that pertain to the architecture of neural networks. Anagnostopoulos and Georgiopoulos [10] introduced geometric concepts, namely category regions, into the original framework of Fuzzy ART and Fuzzy ARTMAP. These regions have the same geometrical shape and share many common and interesting properties. They proved properties of the learning process and showed that the training and performance phases do not depend on particular choices of the vigilance parameter in one special region of the vigilance-choice parameter space.

In the third category, studies focused on ways to improve the performance of Fuzzy ART. Burwick and Joublin [11] discussed implementing ART with a non-recursive algorithm to decrease the algorithmic complexity of Fuzzy ART; the complexity dropped from O(N² + MN) down to O(NM), where N is the number of categories and M is the input dimension. Dagher et al. [12] introduced an ordering algorithm for Fuzzy ARTMAP that identifies a fixed order of training-pattern presentation based on the max-min clustering method; combining this algorithm with Fuzzy ARTMAP yields an ordered Fuzzy ARTMAP that exhibits better generalization performance. Cano et al. [13] generated accurate function identifiers for noisy data. This study was supported by theorems that guarantee the possibility of representing an arbitrary function by fuzzy systems. They proposed two neuro-fuzzy identifiers that offer a dual interpretation as a fuzzy logic system or a neural network; moreover, these identifiers can be trained on noisy data without changing the structure of the neural networks or preprocessing the data. Kobayashi et al. [14] proposed a reinforcement learning system that used Fuzzy ART to classify observed information and construct an effective state space; profit sharing was then employed as the reinforcement learning method. Furthermore, this system was used to effectively solve partially observable Markov decision processes.

IV. OUR APPROACH

A. Max-min learning rule of Fuzzy ART

As discussed in Section I, Fuzzy ARTs cannot learn from some important patterns of the training data. Therefore, we propose the max-min learning rule, which learns from all training patterns and reduces the effect of abnormal training patterns. In our learning rule, the weight vector of the winning category is updated based on the minimum difference between the input pattern and the old weight vector of the winning category. A parameter δ controls the effect of each training pattern on the winning category; it is fixed during learning, and we propose a procedure to find an optimized value of δ for each dataset. In this procedure, after δ is set roughly based on the size of the dataset, it is increased or decreased until the clustering result becomes highest.

The learning step is performed by the three following steps:

• Step 1: Determine the minimum difference between the current input pattern and the old weight vector of the winning category, namely the minimum difference of decrease (MDD) and the minimum difference of increase (MDI), formulated by the two equations

MDD = min{ Wji(old) − Ii : Ii < Wji(old), i = 1, ..., M }   (13)

MDI = min{ Ii − Wji(old) : Ii > Wji(old), i = 1, ..., M }   (14)

• Step 2: Find an optimized value of δ by the procedure in the next subsection.

• Step 3: Update the weight vector of the winning category, W, component-wise by the following equation:

Wji(new) = Wji(old) − δ · MDD,  if Ii < Wji(old)
Wji(new) = Wji(old),            if Ii = Wji(old)   (15)
Wji(new) = Wji(old) + δ · MDI,  if Ii > Wji(old)
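The update of Eqs. (13)-(15) can be sketched in Python. This is a minimal sketch with our own function name and NumPy representation; δ is assumed to be supplied by the caller:

```python
import numpy as np

def max_min_update(I, w, delta):
    """Max-min learning rule (Eqs. 13-15): every component of the winner's
    weight vector w moves toward the input I, but only by delta times the
    MINIMUM positive difference, which damps abnormal components."""
    dec = w - I                      # positive where Ii < Wji (decrease side)
    inc = I - w                      # positive where Ii > Wji (increase side)
    mdd = dec[dec > 0].min() if (dec > 0).any() else 0.0   # Eq. (13)
    mdi = inc[inc > 0].min() if (inc > 0).any() else 0.0   # Eq. (14)
    new_w = w.copy()                 # Eq. (15), applied component-wise
    new_w[dec > 0] -= delta * mdd    # components above the input decrease
    new_w[inc > 0] += delta * mdi    # components below the input increase
    return new_w                     # equal components stay unchanged
```

For example, with W(old) = (0.5, 0.6) and I = (0.2, 0.5), the component-wise decreases are 0.3 and 0.1, so MDD = 0.1 and, with δ = 1, both decreasing components move down by exactly 0.1.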
B. Procedure for determining an optimized value of δ

We select a random subset of the dataset with a uniform distribution over the categories. Max-min Fuzzy ART then uses this subset to test the clustering ability for each value of δ. Our procedure consists of three steps:

• Step 1: Set up δ based on the size of the dataset; then calculate the clustering result.
• Step 2: Repeat:
  – Step 2.1: Increase or decrease the value of δ by a small step.
  – Step 2.2: Calculate the clustering result.
  – Step 2.3: Test the clustering result: if it still increases or decreases, return to Step 2.1; repeat until the clustering result is highest.
• Step 3: Return the optimized value of δ.

C. Discussion

The proposed learning rule has two important advantages over the original learning rule:

• All patterns from the training data are learned by Eq. (15). Moreover, the effect level is equal for each training pattern because the parameter δ is fixed.
• The effect of abnormal training patterns is reduced by Eqs. (13) and (14), because MDD and MDI are the minimum difference of decrease and the minimum difference of increase.

The clustering ability of Max-min Fuzzy ART can thus be improved through these improvements of the learning process.

V. EXPERIMENTS

We select seven benchmark datasets from the UCI database and the Shape database for the experiments, namely MONKS, BALANCE-SCALE, D31, R15, WDBC (Wisconsin Diagnostic Breast Cancer), WINE-RED (Wine Quality of red wine), and WINE-WHITE (Wine Quality of white wine). These datasets differ from each other in the number of attributes, categories, and patterns, and in the distribution of categories. Table I shows the characteristics of the selected datasets.

UCI database, available at: http://archive.ics.uci.edu/ml/datasets
Shape database, available at: http://cs.joensuu.fi/sipu/datasets

Table I. CHARACTERISTICS OF DATASETS

Dataset         #Categories  #Attributes  #Patterns
WDBC            2            30           569
WINE-RED        6            11           1599
WINE-WHITE     6            11           4898
BALANCE-SCALE   6            —            625
MONKS           2            —            459
D31             31           2            3100
R15             15           2            600

Max-min Fuzzy ART is implemented as two models: the first model (Original Max-min Fuzzy ART) and the second model with normalized inputs (Complement Max-min Fuzzy ART). Similarly, the Fuzzy ART of Carpenter et al. [1] consists of two models: Original Fuzzy ART and Complement Fuzzy ART. To prove the effectiveness of Max-min Fuzzy ART, we use the following models in the experiments: Original Max-min Fuzzy ART (OriMART), Complement Max-min Fuzzy ART (ComMART), Original Fuzzy ART (OriART), Complement Fuzzy ART (ComART), K-means [15], and Euclidean ART (EucART) [16].

The data of each dataset are normalized to values in [0, 1]. We choose a random vector of each category as the initial weight vector of that category. The parameters of the models are determined so as to obtain the highest clustering results. For each dataset, we run sub-tests with different numbers of patterns. The percentages of successfully clustered patterns are presented in a corresponding table; bold numbers in each table show the results of the best model among the compared models.

A. Testing with the WDBC dataset

The distribution of the two categories is non-uniform at a medium level. The data in Table II show that the clustering ability of Complement Max-min Fuzzy ART is greatly higher than that of the other models in all sub-tests.

Table II. THE PERCENTAGES OF SUCCESSFUL CLUSTERING PATTERNS WITH THE WDBC DATASET

#Record  OriMART  ComMART  OriART  ComART  EucART  K-mean
569      55.89    90.51    35.85   74.17   46.92   16.17
500      51       89.2     36.4    73.6    41.8    18.4
400      40       86.5     39      70.75   33.75   23
300      26       82       32.33   67.33   21      30.67
200      0        92.5     21.5    76      0.5     44.5
100      0        89       —       54      —       45

B. Testing with the D31 dataset

The distribution of the 31 categories is uniform. The results in Table III show that Complement Max-min Fuzzy ART is the best model in every sub-test.

Table III. THE PERCENTAGES OF SUCCESSFUL CLUSTERING PATTERNS WITH THE D31 DATASET

#Record  OriMART  ComMART  OriART  ComART  EucART  K-mean
3100     91.87    94.45    84.74   92.94   92.48   65
2500     91.44    94.16    84.44   91.96   91.64   64.68
2000     90.7     93.95    86.1    91.55   91.2    61.6
1500     90       93.87    85.67   91.6    90.8    59.93
1000     89.5     93.2     86.4    90.9    90.5    66.2
500      84.8     90.6     89.6    86.8    88.6    57.4

C. Testing with the WINE-WHITE dataset

The distribution of the six categories is non-uniform at a high level. The data in Table IV show that the clustering ability of Original Max-min Fuzzy ART is greatly higher than that of the other models in all sub-tests.

Table IV. THE PERCENTAGES OF SUCCESSFUL CLUSTERING PATTERNS WITH THE WINE-WHITE DATASET

#Record  OriMART  ComMART  OriART  ComART   EucART  K-mean
4898     34.73    43.32    21.95   17.78    17.07   32.24
4000     37.75    33.18    23.48   15.425   18.45   32.1
3000     41.57    12.67    28.47   18.13    16.93   30.47
2000     43.6     4.3      26.3    21.3     19.25   29.8
1000     50.8     4.1      11      23.8     23.7    20.7

D. Testing with the BALANCE-SCALE dataset

The distribution of the six categories is non-uniform at a high level. The results in Table V show that the clustering ability of Original Max-min Fuzzy ART is sharply higher than that of the other models in every sub-test, except the last sub-test with the smallest number of records (100 records).

Table V. THE PERCENTAGES OF SUCCESSFUL CLUSTERING PATTERNS WITH THE BALANCE-SCALE DATASET

#Record  OriMART  ComMART  OriART  ComART  EucART  K-mean
625      80.16    67.52    59.52   55.52   45.76   33.6
500      75.2     64.8     57.2    51.2    32.2    28.2
400      70.5     56.25    49.75   46.25   17      31.25
300      67       46.67    46.67   43.33   7.5     27.67
200      50.5     44       46      42      10      25
100      19       38       36      30      —       24

E. Testing with the R15 dataset

The distribution of the 15 categories is uniform. The data in Table VI show that the clustering ability of both Complement Max-min Fuzzy ART and Original Max-min Fuzzy ART is higher than that of the other models in all sub-tests, except the last sub-test with the smallest number of records (100 records).

Table VI. THE PERCENTAGES OF SUCCESSFUL CLUSTERING PATTERNS WITH THE R15 DATASET

#Record  OriMART  ComMART  OriART  ComART  EucART  K-mean
600      98.17    98.17    91.2    97.8    97.8    76
500      97.8     97.8     89.4    97.4    97.4    71.2
400      97.25    97.25    86.8    96.8    96.8    64
300      96.33    96.33    88.3    95.7    95.7    53.7
200      96.5     96.5     93.5    95.5    96      73
100      99       99       95      98      100     100

F. Testing with the MONKS dataset

The distribution of the two categories is uniform. The results in Table VII show that Original Max-min Fuzzy ART clusters better than the other models, except the last sub-test with the smallest number of records (100 records).

Table VII. THE PERCENTAGES OF SUCCESSFUL CLUSTERING PATTERNS WITH THE MONKS DATASET

#Record  OriMART  ComMART  OriART  ComART  EucART  K-mean
459      67.97    41.18    42.92   41.18   65.36   45.75
400      63.25    41.5     44.25   42      60.75   45.25
300      57.67    42.67    44      37.67   47.67   48.67
200      64.5     46       48      41.5    23      59.5
100      78       65       59      52      —       89

G. Testing with the WINE-RED dataset

The distribution of the six categories is non-uniform at a high level. The data in Table VIII show that Original Max-min Fuzzy ART clusters better than the other models in the sub-tests with high numbers of records, and is slightly lower than Complement Fuzzy ART in the first sub-test (lower by 0.13).

Table VIII. THE PERCENTAGES OF SUCCESSFUL CLUSTERING PATTERNS WITH THE WINE-RED DATASET

#Record  OriMART  ComMART  OriART  ComART  EucART  K-mean
1599     25.39    17.32    18.26   25.52   14.26   16.77
1200     33.83    23.08    20.25   18.67   17.75   17
900      41       27.78    21.11   10.78   22      19.22
600      32       26.83    21.5    9.833   28.67   17.83
300      12.67    21.33    25.67   16.33   51.67   24.67

In conclusion, the results from the sub-tests of the seven experiments show that the clustering ability of Max-min Fuzzy ART improves significantly for small datasets with high complexity (many attributes, highly non-uniform distribution of categories). Clustering results are especially high when the dataset contains many records.

VI. CONCLUSION

In this paper, we propose a new learning rule for Fuzzy ART that learns from training data more effectively and improves the clustering ability. Our learning rule learns from every pattern of the training data and reduces the effect of abnormal training patterns. The improvement in clustering results is shown in our experiments with seven benchmark datasets. The experimental results show that the clustering ability of Max-min Fuzzy ART is higher than that of other models for complex datasets with small numbers of patterns. Max-min Fuzzy ART also clusters especially effectively on datasets that contain a high number of patterns.

ACKNOWLEDGEMENTS

This work was supported by Vietnam's National Foundation for Science and Technology Development (NAFOSTED) under Grant Number 102.02-2011.13.

REFERENCES

[1] G. Carpenter, S. Grossberg, and D. B. Rosen, Fuzzy ART: Fast Stable Learning and Categorization of Analog Patterns by an Adaptive Resonance System, Neural Networks, vol. 4, pp. 759-771, 1991.
[2] H. Isawa, M. Tomita, H. Matsushita, and Y. Nishio, Fuzzy Adaptive Resonance Theory with Group Learning and its Applications, International Symposium on Nonlinear Theory and its Applications, no. 1, pp. 292-295, 2007.
[3] H. Isawa, H. Matsushita, and Y. Nishio, Improved Fuzzy Adaptive Resonance Theory Combining Overlapped Category in Consideration of Connections, IEEE Workshop on Nonlinear Circuit Networks, pp. 8-11, 2008.
[4] Yousuf and Y. L. Murphey, A Supervised Fuzzy Adaptive Resonance Theory with Distributed Weight Update, Proceedings of the 7th International Conference on Advances in Neural Networks, Part I, LNCS, no. 6063, pp. 430-435, 2010.
[5] S. Grossberg, Adaptive pattern classification and universal recoding, II: Feedback, expectation, olfaction and illusions, Biological Cybernetics, vol. 23, pp. 187-212, 1976.
[6] S. Grossberg, How does a brain build a cognitive code, in Studies of Mind and Brain: Neural Principles of Learning, Perception, Development, Cognition, and Motor Control (Chap. I), Boston, MA: Reidel Press, 1980.
[7] G. A. Carpenter, S. Grossberg, and N. Markuzon, Fuzzy ARTMAP: an adaptive resonance architecture for incremental learning of analog maps, International Joint Conference on Neural Networks, vol. 3, pp. 309-314, 1992.
[8] J. Huang, M. Georgiopoulos, and G. L. Heileman, Fuzzy ART Properties, Neural Networks, vol. 8, no. 2, pp. 203-213, 1995.
[9] M. Georgiopoulos, H. Fernlund, G. Bebis, and G. Heileman, Fuzzy ART and Fuzzy ARTMAP: Effects of the choice parameter, Neural Networks, pp. 1541-1559, 1996.
[10] G. C. Anagnostopoulos and M. Georgiopoulos, Category regions as new geometrical concepts in Fuzzy-ART and Fuzzy-ARTMAP, Neural Networks, vol. 15, pp. 1205-1221, 2002.
[11] T. Burwick and F. Joublin, Optimal Algorithmic Complexity of Fuzzy ART, Neural Processing Letters, vol. 7, pp. 37-41, 1998.
[12] Dagher, M. Georgiopoulos, G. L. Heileman, and G. Bebis, An ordering algorithm for pattern presentation in fuzzy ARTMAP that tends to improve generalization performance, IEEE Transactions on Neural Networks, vol. 10, no. 4, pp. 768-778, 1999.
[13] M. Cano, Y. Dimitriadis, E. Gomez, and J. Coronado, Learning from noisy information in FasArt and FasBack neuro-fuzzy systems, Neural Networks, vol. 14, pp. 407-425, 2001.
[14] K. Kobayashi, S. Mizuno, T. Kuremoto, and M. Obayashi, A Reinforcement Learning System Based on State Space Construction Using Fuzzy ART, Proceedings of the SICE Annual Conference, no. 1, pp. 3653-3658, 2005.
[15] J. B. MacQueen, Some methods for classification and analysis of multivariate observations, Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, no. 1, pp. 281-297, 1967.
[16] R. Kenaya and K. C. Cheok, Euclidean ART Neural Networks, Proceedings of the World Congress on Engineering and Computer Science, 2008.