24 Using Fuzzy Logic in Data Mining

Lior Rokach
Department of Information System Engineering, Ben-Gurion University, Israel
liorrk@bgu.ac.il

Summary. In this chapter we discuss how fuzzy logic extends the envelope of the main data mining tasks: clustering, classification, regression and association rules. We begin by presenting a formulation of data mining using fuzzy logic attributes. Then, for each task, we provide a survey of the main algorithms and a detailed description (i.e. pseudo-code) of the most popular algorithms. However, this chapter does not discuss neuro-fuzzy techniques in depth, assuming that a dedicated chapter covers this issue.

24.1 Introduction

There are two main types of uncertainty in supervised learning: statistical and cognitive.
Statistical uncertainty deals with the random behavior of nature, and all existing data mining techniques can handle the uncertainty that arises (or is assumed to arise) in the natural world from statistical variations or randomness. While these techniques may be appropriate for measuring the likelihood of a hypothesis, they say nothing about the meaning of the hypothesis.

Cognitive uncertainty, on the other hand, deals with human cognition. Cognitive uncertainty can be further divided into two sub-types: vagueness and ambiguity.

Ambiguity arises in situations with two or more alternatives such that the choice between them is left unspecified. Vagueness arises when there is a difficulty in making a precise distinction in the world.

Fuzzy set theory, first introduced by Zadeh in 1965, deals with cognitive uncertainty and seeks to overcome many of the problems found in classical set theory. For example, a major problem faced by researchers in control theory is that a small change in input results in a major change in output, which throws the whole control system into an unstable state. In addition, the representation of subjective knowledge was artificial and inaccurate. Fuzzy set theory is an attempt to confront these difficulties, and in this chapter we show how it can be used in data mining tasks.

24.2 Basic Concepts of Fuzzy Set Theory

In this section we present some of the basic concepts of fuzzy logic. The main focus, however, is on those concepts used in the induction process when dealing with data mining. Since fuzzy set theory and fuzzy logic are much broader than the narrow perspective presented here, the interested reader is encouraged to read (Zimmermann, 2005).

O. Maimon, L. Rokach (eds.), Data Mining and Knowledge Discovery Handbook, 2nd ed., DOI 10.1007/978-0-387-09823-4_24, © Springer Science+Business Media, LLC 2010
24.2.1 Membership function

In classical set theory, a certain element either belongs or does not belong to a set. Fuzzy set theory, on the other hand, permits the gradual assessment of the membership of elements in relation to a set.

Definition 1. Let U be a universe of discourse, representing a collection of objects denoted generically by u. A fuzzy set A in a universe of discourse U is characterized by a membership function $\mu_A$ which takes values in the interval [0, 1], where $\mu_A(u) = 0$ means that u is definitely not a member of A and $\mu_A(u) = 1$ means that u is definitely a member of A.

The above definition can be illustrated on the vague set Young. In this case the set U is the set of people. For each person in U, we define the degree of membership in the fuzzy set Young. The membership function answers the question "to what degree is person u young?". The easiest way to do this is with a membership function based on the person's age. For example, Figure 24.1 presents the following membership function:

\[
\mu_{Young}(u) =
\begin{cases}
0 & age(u) > 32 \\
1 & age(u) < 16 \\
\dfrac{32 - age(u)}{16} & \text{otherwise}
\end{cases}
\tag{24.1}
\]

Fig. 24.1. Membership function for the young set (horizontal axis: Age; vertical axis: Young Membership).

Given this definition, John, who is 18 years old, has a degree of youth of 0.875. Philip, 20 years old, has a degree of youth of 0.75. Unlike probability theory, degrees of membership do not have to add up to 1 across all objects, and therefore either many or few objects in the set may have high membership. However, an object's membership in a set (such as "young") and in the set's complement ("not young") must still sum to 1.

The main difference between classical set theory and fuzzy set theory is that the latter admits partial set membership. A classical or crisp set, then, is a fuzzy set that restricts its membership values to {0, 1}, the endpoints of the unit interval.
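As a minimal sketch, the membership function of Eq. (24.1) can be written directly in Python. The function name `young_membership` is an assumption made for illustration; the ages of John and Philip are the ones quoted in the text.

```python
def young_membership(age: float) -> float:
    """Degree of membership in the fuzzy set Young, following Eq. (24.1)."""
    if age > 32:
        return 0.0
    if age < 16:
        return 1.0
    return (32 - age) / 16


# Ages quoted in the text: John is 18, Philip is 20.
john = young_membership(18)    # 0.875
philip = young_membership(20)  # 0.75

# Membership in the complement "not young" is 1 minus membership in "young",
# so the two degrees always sum to 1 for the same person.
john_not_young = 1 - john      # 0.125
```

Note that the degrees of John and Philip do not need to sum to 1 with each other; only a set and its complement are tied together in this way.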
Membership functions can be used to represent a crisp set. For example, Figure 24.2 presents a crisp membership function defined as:

\[
\mu_{CrispYoung}(u) =
\begin{cases}
0 & age(u) > 22 \\
1 & age(u) \le 22
\end{cases}
\tag{24.2}
\]

Fig. 24.2. Membership function for the crisp young set (horizontal axis: Age; vertical axis: Crisp Young Membership).

In regular classification problems, we assume that each instance takes one value for each attribute and that each instance is classified into only one of the mutually exclusive classes. To illustrate how fuzzy logic can help data mining tasks, we introduce the problem of modelling the preferences of TV viewers. In this problem there are 3 input attributes:

A = {Time of Day, Age Group, Mood}

and each attribute has the following values:

• dom(Time of Day) = {Morning, Noon, Evening, Night}
• dom(Age Group) = {Young, Adult}
• dom(Mood) = {Happy, Indifferent, Sad, Sour, Grumpy}

The classification can be the movie genre that the viewer would like to watch, such as C = {Action, Comedy, Drama}.

All the attributes are vague by definition. For example, people's feelings of happiness, indifference, sadness, sourness and grumpiness are vague, without any crisp boundaries between them. Although the vagueness of "Age Group" or "Time of Day" can be avoided by indicating the exact age or exact time, a rule induced with a crisp decision tree may then have an artificial crisp boundary, such as "IF Age < 16 THEN action movie". But how about someone who is 17 years of age? Should this viewer definitely not watch an action movie? The viewer's preferred genre may still be vague. For example, the viewer may be in the mood for both comedy and drama movies. Moreover, the association of movies with genres may also be vague. For instance, the movie "Lethal Weapon" (starring Mel Gibson and Danny Glover) is considered to be both a comedy and an action movie.
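The contrast between a crisp boundary and graded membership can be sketched as follows. This is an illustration only: the function names are assumptions, and the genre degrees assigned to the movie are hypothetical values chosen for the example, not figures from the chapter.

```python
def crisp_young(age: float) -> int:
    """Crisp set of Eq. (24.2): membership is either 0 or 1."""
    return 1 if age <= 22 else 0


def fuzzy_young(age: float) -> float:
    """Graded membership of Eq. (24.1)."""
    if age > 32:
        return 0.0
    if age < 16:
        return 1.0
    return (32 - age) / 16


# One year of age flips the crisp membership from 1 to 0,
# while the fuzzy membership only drops from 0.625 to 0.5625.
for age in (22, 23):
    print(age, crisp_young(age), fuzzy_young(age))

# A fuzzy target attribute: one movie may belong to several genres at once.
# The degrees below are hypothetical and serve only to show the representation.
lethal_weapon = {"Action": 0.8, "Comedy": 0.6, "Drama": 0.1}
```

In the crisp case a viewer is assigned to exactly one class, whereas the dictionary above lets the same movie carry a separate degree for each genre in C.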
Fuzzy concepts can be introduced into a classical problem if at least one of the input attributes is fuzzy or if the target attribute is fuzzy. In the example described above, both input and target attributes are fuzzy. Formally, the problem is defined as follows (Yuan and Shaw, 1995):

Each class $c_j$ is defined as a fuzzy set on the universe of objects U. The membership function $\mu_{c_j}(u)$ indicates the degree to which object u belongs to class $c_j$. Each attribute $a_i$ is defined as a linguistic attribute which takes linguistic values from $dom(a_i) = \{v_{i,1}, v_{i,2}, \ldots, v_{i,|dom(a_i)|}\}$. Each linguistic value $v_{i,k}$ is also a fuzzy set defined on U. The membership $\mu_{v_{i,k}}(u)$ specifies the degree to which object u's attribute $a_i$ is $v_{i,k}$. Recall that the membership of a linguistic value can be subjectively assigned or transferred from numerical values by a membership function defined on the range of the numerical value.

Typically, before one can incorporate fuzzy concepts into a data mining application, an expert is required to provide the fuzzy sets for the quantitative attributes, along with their corresponding membership functions. Alternatively, the appropriate fuzzy sets are determined using fuzzy clustering.

24.2.2 Fuzzy Set Operations

Like classical set theory, fuzzy set theory includes the operations union, intersection, complement, and inclusion, but it also includes operations that have no classical counterpart, such as the modifiers concentration and dilation, and the connective fuzzy aggregation. Definitions of fuzzy set operations are provided in this section.

Definition 2. The membership function of the union of two fuzzy sets A and B with membership functions $\mu_A$ and $\mu_B$ respectively is defined as the maximum of the two individual membership functions:

\[
\mu_{A \cup B}(u) = \max\{\mu_A(u), \mu_B(u)\}
\tag{24.3}
\]

Definition 3.
The membership function of the intersection of two fuzzy sets A and B with membership functions $\mu_A$ and $\mu_B$ respectively is defined as the minimum of the two individual membership functions:

\[
\mu_{A \cap B}(u) = \min\{\mu_A(u), \mu_B(u)\}
\tag{24.4}
\]

Definition 4. The membership function of the complement of a fuzzy set A with membership function $\mu_A$ is defined as the negation of the specified membership function:

\[
\mu_{\overline{A}}(u) = 1 - \mu_A(u)
\tag{24.5}
\]

To illustrate these fuzzy operations, we elaborate on the previous example. Recall that John has a degree of youth of 0.875. Additionally, John's happiness degree is 0.254. Thus, the membership of John in the set $Young \cup Happy$ would be max(0.875, 0.254) = 0.875, and his membership in $Young \cap Happy$ would be min(0.875, 0.254) = 0.254.

It is possible to chain operators together, thereby constructing quite complicated sets. It is also possible to derive many interesting sets from chains of rules built up from simple operators. For example, John's membership in the set $\overline{Young} \cup Happy$ would be max(1 − 0.875, 0.254) = 0.254.

The usage of the max and min operators for defining fuzzy union and fuzzy intersection, respectively, is very common. However, it is important to note that these are not the only definitions of union and intersection suited to fuzzy set theory.

Definition 5. The fuzzy subsethood S(A, B) measures the degree to which A is a subset of B:

\[
S(A, B) = \frac{M(A \cap B)}{M(A)}
\tag{24.6}
\]

where M(A) is the cardinality measure of a fuzzy set A and is defined as

\[
M(A) = \sum_{u \in U} \mu_A(u)
\tag{24.7}
\]

The subsethood can be used to measure the truth level of classification rules. For example, given a classification rule such as "IF Age is Young AND Mood is Happy THEN Comedy", we have to calculate $S(Young \cap Happy, Comedy)$ in order to measure the truth level of the classification rule.

24.3 Fuzzy Supervised Learning

In this section we survey supervised methods that incorporate fuzzy sets.
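Before turning to the learning methods, the operations of Section 24.2.2 (Definitions 2 to 5) can be sketched over a small finite universe. This is an illustrative sketch, not the chapter's own implementation: the function names and the `FuzzySet` alias are assumptions. John's degrees and Philip's degree of youth come from the text, while Philip's happiness and both Comedy degrees are hypothetical.

```python
from typing import Dict

# A fuzzy set over a finite universe: object name -> membership degree in [0, 1].
FuzzySet = Dict[str, float]


def fuzzy_union(a: FuzzySet, b: FuzzySet) -> FuzzySet:
    """Eq. (24.3): pointwise maximum of the two membership functions."""
    return {u: max(a[u], b[u]) for u in a}


def fuzzy_intersection(a: FuzzySet, b: FuzzySet) -> FuzzySet:
    """Eq. (24.4): pointwise minimum of the two membership functions."""
    return {u: min(a[u], b[u]) for u in a}


def fuzzy_complement(a: FuzzySet) -> FuzzySet:
    """Eq. (24.5): one minus the membership degree."""
    return {u: 1 - a[u] for u in a}


def cardinality(a: FuzzySet) -> float:
    """Eq. (24.7): sum of membership degrees over the universe."""
    return sum(a.values())


def subsethood(a: FuzzySet, b: FuzzySet) -> float:
    """Eq. (24.6): degree to which A is a subset of B."""
    m_a = cardinality(a)
    if m_a == 0:
        raise ValueError("S(A, B) is undefined when M(A) = 0")
    return cardinality(fuzzy_intersection(a, b)) / m_a


# Hypothetical values: Philip's happiness and both Comedy degrees.
young = {"John": 0.875, "Philip": 0.75}
happy = {"John": 0.254, "Philip": 0.6}
comedy = {"John": 0.5, "Philip": 0.4}

print(fuzzy_union(young, happy)["John"])                    # 0.875
print(fuzzy_intersection(young, happy)["John"])             # 0.254
print(fuzzy_union(fuzzy_complement(young), happy)["John"])  # 0.254

# Truth level of "IF Age is Young AND Mood is Happy THEN Comedy".
rule_truth = subsethood(fuzzy_intersection(young, happy), comedy)
print(round(rule_truth, 3))                                 # 0.766
```

The sketch assumes that all sets are defined over the same universe, so each operation iterates over the keys of its first argument. Other definitions of union and intersection could replace max and min without changing `subsethood`.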
Supervised methods are methods that attempt to discover the relationship between input attributes and a target attribute (sometimes referred to as a dependent variable). The relationship discovered is represented in a structure referred to as a model. Usually, models describe and explain phenomena that are hidden in the dataset, and they can be used for predicting the value of the target attribute given the values of the input attributes.

It is useful to distinguish between two main supervised models: classification models (classifiers) and regression models. Regression models map the input space into a real-valued domain. For instance, a regressor can predict the demand for a certain product given its characteristics. Classifiers, on the other hand, map the input space into pre-defined classes. For instance, classifiers can be used to classify mortgage consumers as good (fully pay back the mortgage on time) or bad (delayed payback).

Fuzzy set theoretic concepts can be incorporated at the input, at the output, or into the backbone of the classifier. The data can be presented in fuzzy terms and the output decision may be provided as fuzzy membership values. In this chapter we will concentrate on fuzzy decision trees.

24.3.1 Growing Fuzzy Decision Tree

A decision tree is a predictive model which can be used to represent classifiers. Decision trees are frequently used in applied fields such as finance, marketing, engineering, medicine and security (Moskovitch et al., 2008). In the opinion of many researchers, decision trees gained popularity mainly due to their simplicity and transparency. Decision trees are self-explanatory. There is no need to be an expert in data mining in order to follow a certain decision tree.