Khoo, Li-Pheng et al. "RClass*: A Prototype Rough-Set and Genetic Algorithms Enhanced Multi-Concept Classification System for Manufacturing Diagnosis." Computational Intelligence in Manufacturing Handbook. Edited by Jun Wang et al. Boca Raton: CRC Press LLC, 2001.

©2001 CRC Press LLC

19 RClass*: A Prototype Rough-Set and Genetic Algorithms Enhanced Multi-Concept Classification System for Manufacturing Diagnosis

19.1 Introduction
19.2 Basic Notions
19.3 A Prototype Multi-Concept Classification System
19.4 Validation of RClass*
19.5 Application of RClass* to Manufacturing Diagnosis
19.6 Conclusions

19.1 Introduction

Inductive learning, or classification of objects from large-scale empirical data sets, is an important research area in artificial intelligence (AI). In recent years, many techniques have been developed to perform inductive learning. Among them, the decision tree learning technique is the most popular. Using such a technique, Quinlan [1992] has successfully developed the Inductive Dichotomizer 3 (ID3) and its later versions, C4.5 and C5.0 (See5.0), in 1986, 1992, and 1997, respectively.

Essentially, decision support is based on human knowledge about a specific part of a real or abstract world. If the knowledge is gained by experience, decision rules can possibly be induced from the empirical training data obtained. In reality, for various reasons, empirical data often has the property of granularity and may be incomplete, imprecise, or even conflicting. For example, in diagnosing a manufacturing system, the opinions of two engineers can be different, or even contradictory. Some earlier inductive learning systems, such as the once-prevailing decision tree learning system ID3, are unable to deal with imprecise and inconsistent information present in empirical training data [Khoo et al., 1999]. Thus, the ability to handle imprecise and inconsistent information has become one of the most important requirements for a classification system.
Li-Pheng Khoo, Nanyang Technological University
Lian-Yin Zhai, Nanyang Technological University

Many theories, techniques, and algorithms have been developed in recent years to deal with the analysis of imprecise or inconsistent data. The most successful ones are fuzzy set theory and the Dempster–Shafer theory of evidence. On the other hand, rough set theory, which was introduced by Pawlak [1982] in the early 1980s, is a relatively new mathematical tool that can be employed to handle uncertainty and vagueness. Basically, rough set theory handles inconsistent information using two approximations, namely the upper and lower approximations. Such a technique is different from fuzzy set theory or the Dempster–Shafer theory of evidence. Furthermore, rough set theory focuses on the discovery of patterns in inconsistent data sets obtained from information sources [Slowinski and Stefanowski, 1989; Pawlak, 1996] and can be used as the basis to perform formal reasoning under uncertainty, machine learning, and rule discovery [Ziarko, 1994; Pawlak, 1984; Yao et al., 1997].

Compared to other approaches to handling uncertainty, rough set theory has its unique advantages [Pawlak, 1996, 1997]. It does not require any preliminary or additional information about the empirical training data, such as a probability distribution in statistics, the basic probability assignment in the Dempster–Shafer theory of evidence, or grades of membership in fuzzy set theory [Pawlak et al., 1995]. Besides, rough set theory is more justified in situations where the set of empirical or experimental data is too small to employ standard statistical methods [Pawlak, 1991]. In less than two decades, rough set theory has rapidly established itself in many real-life applications such as medical diagnosis [Slowinski, 1992], control algorithm acquisition and process control [Mrozek, 1992], and structural engineering [Arciszewski and Ziarko, 1990].
However, most literature related to inductive learning or classification using rough set theory is limited to a binary concept, such as yes or no in decision making, or positive or negative in classification of objects.

Genetic algorithms (GAs) are stochastic and evolutionary search techniques based on the principles of biological evolution, natural selection, and genetic recombination. GAs have received much attention from researchers working on optimization and machine learning [Goldberg, 1989]. Basically, GA-based learning techniques take advantage of the unique search engine of GAs to perform machine learning or to glean probable decision rules from its search space.

This chapter describes the work that led to the development of RClass*, a prototype multi-concept classification system for manufacturing diagnosis. RClass* is based on a hybrid technique that combines the strengths of rough sets, genetic algorithms, and Boolean algebra. In the following sections, the basic notions of rough set theory and GAs are presented. Details of RClass*, its validation, and a case study using the prototype system are also described.

19.2 Basic Notions

19.2.1 Rough Set Theory

Numerous applications of rough set theory have proven its robustness in dealing with uncertainty and vagueness, and many researchers have attempted to combine it with other inductive learning techniques to achieve better results. Yasdi [1995] combined rough set theory with neural networks to deal with learning from imprecise training data. Khoo et al. [1999] developed RClass, a prototype system based on rough sets and a decision-tree learning methodology, and the predecessor of RClass*, for inductive learning in noisy environments.

The approximation space and the lower and upper approximations of a set form two important notions of rough set theory. The approximation space of a rough set is the classification of the domain of interest into disjoint categories [Pawlak, 1991].
Such a classification refers to the ability to characterize all the classes in a domain. The upper and lower approximations represent the classes of indiscernible objects that possess sharp descriptions on concepts but with no sharp boundaries. The basic philosophy behind rough set theory is based on equivalence relations or indiscernibility in the classification of objects.

Rough set theory employs a so-called information table to describe objects. The information about the objects is represented in a structure known as an information system, which can be viewed as a table with its rows and columns corresponding to objects and attributes, respectively (Table 19.1). For example, an information system (S) can be expressed as the 4-tuple

$$S = \langle U, Q, V, \rho \rangle$$

where U is the universe, which contains a finite set of objects; Q is a finite set of attributes; $V = \bigcup_{q \in Q} V_q$, where $V_q$ is the domain of attribute q; and $\rho : U \times Q \to V$ is the information function such that $\rho(x, q) \in V_q$ for every $q \in Q$ and $x \in U$. Any pair (q, v), where $q \in Q$ and $v \in V_q$, is called a descriptor in S.

Table 19.1 shows a typical information system used for rough set analysis, with $x_i$ (i = 1, 2, ..., 10) representing the objects of the set U to be classified, $q_i$ (i = 1, 2) denoting the condition attributes, and d representing the decision attribute. As a result, the $q_i$ and d form the set of attributes, Q. More specifically,

$$U = \{x_1, x_2, \ldots, x_{10}\}; \quad Q = \{q_1, q_2, d\}; \quad V_{q_1} = \{0, 1\}; \quad V_{q_2} = \{0, 1, 2\}; \quad \text{and} \quad V_d = \{0, 1\}.$$

A typical information function, $\rho(x_1, q_1)$, can be expressed as

$$\rho(x_1, q_1) = \{1\}.$$

Any attribute-value pair such as $(q_1, 1)$ is called a descriptor in S.

TABLE 19.1 A Typical Information System Used by Rough Set Theory

Objects    Attributes    Decisions
U          q1    q2      d
x1         1     0       0
x2         1     1       1
x3         1     2       1
x4         0     0       0
x5         0     1       0
x6         0     2       1
x7         0     1       1
x8         0     2       0
x9         1     0       0
x10        0     0       0

Indiscernibility is one of the most important concepts in rough set theory. It is caused by imprecise information about the observed objects. The indiscernibility relation is an equivalence relation on the set U and can be defined in the following manner: if $x, y \in U$ and $P \subseteq Q$, then x and y are indiscernible by the set of attributes P in S when they take the same value on every attribute in P. Mathematically, it can be expressed as follows:

$$x \,\hat{P}\, y \quad \text{if} \quad \rho(x, q) = \rho(y, q) \ \text{for} \ \forall q \in P.$$

For example, using the information system given in Table 19.1, objects $x_5$ and $x_7$ are indiscernible by the set of attributes $P = \{q_1, q_2\}$. The relation can be expressed as $x_5 \,\hat{P}\, x_7$ because the information functions for the two objects are identical and are given by

$$\rho(x_5, q_1, q_2) = \rho(x_7, q_1, q_2) = \{0, 1\}.$$

Hence, it is not possible to distinguish one from the other using the attribute set $\{q_1, q_2\}$.

The equivalence classes of the relation $\hat{P}$ are known as P-elementary sets in S. In particular, when P = Q, these Q-elementary sets are known as the atoms in S. In an information system, concepts can be represented by the decision-elementary sets. For example, using the information system depicted in Table 19.1, the $\{q_1\}$-elementary sets, atoms, and concepts can be expressed as follows:

{q1}-elementary sets
E1 = {x1, x2, x3, x9} for ρ(x, q1) = {1}
E2 = {x4, x5, x6, x7, x8, x10} for ρ(x, q1) = {0}

Atoms
A1 = {x1, x9}    A2 = {x2}    A3 = {x3}    A4 = {x4, x10}
A5 = {x5}        A6 = {x6}    A7 = {x7}    A8 = {x8}

Concepts
C1 = {x1, x4, x5, x8, x9, x10} ⇒ Class = 0 (d = 0)
C2 = {x2, x3, x6, x7} ⇒ Class = 1 (d = 1)

Table 19.1 shows that objects $x_5$ and $x_7$ are indiscernible by condition attributes $q_1$ and $q_2$. Furthermore, they possess different decision attributes. This implies that there exists a conflict (or inconsistency) between objects $x_5$ and $x_7$. Similarly, another conflict exists between objects $x_6$ and $x_8$. Rough set theory offers a means to deal with inconsistency in information systems.
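The grouping of objects into elementary sets can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the implementation used in RClass*: the names `TABLE_19_1` and `elementary_sets` are hypothetical, and the data are the rows of Table 19.1.

```python
from collections import defaultdict

# Rows of Table 19.1: object -> attribute values (q1, q2 are conditions, d is the decision).
TABLE_19_1 = {
    "x1": {"q1": 1, "q2": 0, "d": 0},
    "x2": {"q1": 1, "q2": 1, "d": 1},
    "x3": {"q1": 1, "q2": 2, "d": 1},
    "x4": {"q1": 0, "q2": 0, "d": 0},
    "x5": {"q1": 0, "q2": 1, "d": 0},
    "x6": {"q1": 0, "q2": 2, "d": 1},
    "x7": {"q1": 0, "q2": 1, "d": 1},
    "x8": {"q1": 0, "q2": 2, "d": 0},
    "x9": {"q1": 1, "q2": 0, "d": 0},
    "x10": {"q1": 0, "q2": 0, "d": 0},
}


def elementary_sets(table, attributes):
    """Group objects that share the same values on every attribute in `attributes`.

    Hypothetical helper: each returned set is one P-elementary set,
    keyed by the tuple of attribute values its members share.
    """
    groups = defaultdict(set)
    for obj, row in table.items():
        key = tuple(row[a] for a in attributes)
        groups[key].add(obj)
    return dict(groups)


q1_sets = elementary_sets(TABLE_19_1, ["q1"])            # {q1}-elementary sets
p_sets = elementary_sets(TABLE_19_1, ["q1", "q2"])       # P = {q1, q2}
atoms = elementary_sets(TABLE_19_1, ["q1", "q2", "d"])   # P = Q
concepts = elementary_sets(TABLE_19_1, ["d"])            # decision-elementary sets
```

Under these assumptions, `p_sets[(0, 1)]` contains both x5 and x7, which is the indiscernibility noted in the text, while `atoms` separates them because their decision values differ.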
For a concept (C), the greatest definable set contained in the concept is known as the lower approximation of C, $\underline{R}(C)$. It represents the set of objects (Y) on U that can be certainly classified as belonging to concept C by the set of attributes, R, such that

$$\underline{R}(C) = \bigcup \{\, Y \in U/R : Y \subseteq C \,\}$$

where U/R represents the set of all atoms in the approximation space (U, R). On the other hand, the least definable set containing concept C is called the upper approximation of C, $\overline{R}(C)$. It represents the set of objects (Y) on U that can be possibly classified as belonging to concept C by the set of attributes R such that

$$\overline{R}(C) = \bigcup \{\, Y \in U/R : Y \cap C \neq \emptyset \,\}$$

where U/R represents the set of all atoms in the approximation space (U, R). Elements belonging only to the upper approximation compose the boundary region ($BN_R$) or the doubtful area. Mathematically, a boundary region can be expressed as

$$BN_R(C) = \overline{R}(C) - \underline{R}(C).$$

A boundary region contains a set of objects that cannot be certainly classified as belonging to or not belonging to concept C by a set of attributes, R. Such a concept, C, is called a rough set. In other words, rough sets are sets having non-empty boundary regions.

Using the information system shown in Table 19.1 again, the upper and lower approximations of concepts $C_1$ (d = 0) and $C_2$ (d = 1) can be easily obtained. For example, the lower approximation of concept $C_1$ (d = 0) is given by

$$\underline{R}(C_1) = \{x_1, x_4, x_9, x_{10}\};$$

and its upper approximation is denoted as

$$\overline{R}(C_1) = \{x_1, x_4, x_5, x_6, x_7, x_8, x_9, x_{10}\}.$$

Thus, the boundary region of concept $C_1$ is given by

$$BN_R(C_1) = \overline{R}(C_1) - \underline{R}(C_1) = \{x_5, x_6, x_7, x_8\}.$$

As for concept $C_2$ (d = 1), the approximations can be similarly obtained as follows:

$$\underline{R}(C_2) = \{x_2, x_3\}; \quad \overline{R}(C_2) = \{x_2, x_3, x_5, x_6, x_7, x_8\}; \quad \text{and} \quad BN_R(C_2) = \overline{R}(C_2) - \underline{R}(C_2) = \{x_5, x_6, x_7, x_8\}.$$

As already mentioned, rough set theory offers a powerful means to deal with inconsistency in an information system. The upper and lower approximations make it possible to mathematically describe classes of indiscernible objects that possess sharp descriptions on concepts but with no sharp boundaries.
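The approximations for Table 19.1 can be checked with a small Python sketch. It assumes that R is the set of condition attributes {q1, q2}, so that the blocks of U/R are the groups of objects with identical (q1, q2) values; the function name `approximations` and the tuple layout are hypothetical choices made for this illustration.

```python
from collections import defaultdict

# Table 19.1 as object -> (q1, q2, d).
TABLE_19_1 = {
    "x1": (1, 0, 0), "x2": (1, 1, 1), "x3": (1, 2, 1), "x4": (0, 0, 0),
    "x5": (0, 1, 0), "x6": (0, 2, 1), "x7": (0, 1, 1), "x8": (0, 2, 0),
    "x9": (1, 0, 0), "x10": (0, 0, 0),
}


def approximations(table, decision):
    """Return (lower, upper, boundary) of the concept d == decision.

    Assumption: indiscernibility is taken over the condition attributes (q1, q2).
    """
    # Partition U into blocks of objects with identical condition values.
    blocks = defaultdict(set)
    for obj, (q1, q2, _) in table.items():
        blocks[(q1, q2)].add(obj)

    concept = {obj for obj, row in table.items() if row[2] == decision}

    lower, upper = set(), set()
    for block in blocks.values():
        if block <= concept:      # block lies entirely inside the concept
            lower |= block
        if block & concept:       # block overlaps the concept
            upper |= block
    return lower, upper, upper - lower


lower_c1, upper_c1, boundary_c1 = approximations(TABLE_19_1, 0)
lower_c2, upper_c2, boundary_c2 = approximations(TABLE_19_1, 1)
```

With these assumptions the sketch reproduces the sets given above; in particular both concepts share the boundary region {x5, x6, x7, x8}.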
For example, universe U (Table 19.1) consists of ten objects and can be described using two concepts, namely "d = 0" and "d = 1." As already mentioned, two conflicts, namely objects $x_5$ and $x_7$, and objects $x_6$ and $x_8$, exist in the data set. These conflicts cause the objects to be indiscernible and constitute doubtful areas, which are denoted by $BN_R(0)$ and $BN_R(1)$, respectively (Figure 19.1). The lower approximation of concept "0" is given by object set {x1, x4, x9, x10}, which forms the certain training data set of concept "0." On the other hand, the upper approximation is represented by object set {x1, x4, x5, x6, x7, x8, x9, x10}, which contains the possible training data set of concept "0." Concept "1" can be similarly interpreted.

19.2.2 Genetic Algorithms

As already mentioned, GAs are stochastic and evolutionary search techniques based on the principles of biological evolution, natural selection, and genetic recombination. They simulate the principle of "survival of the fittest" in a population of potential solutions known as chromosomes. Each chromosome represents one possible solution to the problem or a rule in a classification. The population evolves over time through a process of competition whereby the fitness of each chromosome is evaluated using a fitness function. During each generation, a new population of chromosomes is formed in two steps. First, the chromosomes in the current population are selected to reproduce on the basis of their relative fitness. Second, the selected chromosomes are recombined using idealized genetic operators, namely crossover and mutation, to form a new set of chromosomes that are to be evaluated as the new solution of the problem. GAs are conceptually simple but computationally powerful. They are used to solve a wide variety of problems, particularly in the areas of optimization and machine learning [Grefenstette, 1994; Davis, 1991].

Figure 19.2 shows the flow of a typical GA program.
It begins with a population of chromosomes either generated randomly or gleaned from some known domain knowledge. Subsequently, it proceeds to evaluate the fitness of all the chromosomes, select good chromosomes for reproduction, and produce the next generation of chromosomes. More specifically, each chromosome is evaluated according to a given performance criterion or fitness function, and assigned a fitness score. Using the fitness value attained by each chromosome, good chromosomes are selected to undergo reproduction. Reproduction involves the creation of offspring using two operators, namely crossover and mutation (Figure 19.3). By randomly selecting a common crossover site on two parent chromosomes, two new chromosomes are produced. During the process of reproduction, mutation may take place. For example, the binary value of bit 2 in Figure 19.3 has been changed from 0 to 1. The above process of fitness evaluation, chromosome selection, and reproduction of the next generation of chromosomes continues for a predetermined number of generations or until an acceptable performance level is reached.

FIGURE 19.1 Basic notions of rough sets. (Universe U with the lower approximations of concept "0," containing objects 1, 4, 9, and 10, and of concept "1," containing objects 2 and 3, and the shared boundary region $BN_R(0) = BN_R(1)$, containing objects 5, 6, 7, and 8.)

FIGURE 19.2 A typical GA program flow. (Start; generation of a random population of chromosomes; computation of the fitness of individual chromosomes; selection of chromosomes with good fitness; reproduction of the next generation of chromosomes/population; check whether the limit on the number of generations has been reached; End.)
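The two genetic operators and the generational loop described above can be sketched in Python. This is a minimal illustration, not the search engine of RClass*: the function names, the fitness-proportional selection scheme, the default parameter values, and the bit-counting fitness used in the example are all assumptions made here. The crossover and mutation calls reuse the bit strings shown in Figure 19.3.

```python
import random


def crossover(parent1, parent2, site):
    """Single-point crossover: swap the tails of two equal-length bit strings."""
    return parent1[:site] + parent2[site:], parent2[:site] + parent1[site:]


def mutate(chromosome, position):
    """Flip the bit at a 1-based position."""
    i = position - 1
    flipped = "1" if chromosome[i] == "0" else "0"
    return chromosome[:i] + flipped + chromosome[i + 1:]


def evolve(fitness, length=6, pop_size=20, generations=30,
           p_crossover=0.8, p_mutation=0.05, seed=0):
    """Hypothetical generational GA loop following the flow of Figure 19.2."""
    rng = random.Random(seed)
    # Generate a random initial population of bit-string chromosomes.
    population = ["".join(rng.choice("01") for _ in range(length))
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Fitness-proportional selection; the small offset keeps all weights positive.
        weights = [fitness(c) + 1e-9 for c in population]
        next_population = []
        while len(next_population) < pop_size:
            a, b = rng.choices(population, weights=weights, k=2)
            if rng.random() < p_crossover:
                a, b = crossover(a, b, rng.randint(1, length - 1))
            for child in (a, b):
                for pos in range(1, length + 1):
                    if rng.random() < p_mutation:
                        child = mutate(child, pos)
                next_population.append(child)
        population = next_population[:pop_size]
    return max(population, key=fitness)


# Bit strings from Figure 19.3.
new1, new2 = crossover("101100", "001010", 3)
mutated = mutate("100001", 2)

# Toy fitness (an assumption): the number of 1 bits in the chromosome.
best = evolve(lambda c: c.count("1"))
```

Here the loop stops after a fixed number of generations; stopping once an acceptable fitness is reached, as the text also allows, would need an extra check inside the loop.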
FIGURE 19.3 Genetic operators. (Crossover: chromosome 1, 101100, and chromosome 2, 001010, are cut at a common crossover site after the third bit to give new chromosome 1, 101010, and new chromosome 2, 001100. Mutation: 100001 before mutation becomes 110001 after mutation.)

19.3 A Prototype Multi-Concept Classification System

19.3.1 Twin-Concept and Multi-Concept Classification

The basic principle of rough set theory is founded on a twin-concept classification [Pawlak, 1982]. For example, in the information system shown in Table 19.1, an object belongs either to "0" or "1." However, binary-concept classification has limited application in reality, because in most situations objects can be classified into more than two classes. For example, in describing the vibration experienced by rotary machinery such as a turbine in a power plant or a pump in a chemical refinery, it is common to use more than two states, such as normal, slight vibration, mild vibration, and abnormal, rather than just normal or abnormal, to describe the condition. As a result, the twin-concept classification of rough set theory needs to be generalized in order to handle multi-concept problems.

Based on rough set theory, Grzymala-Busse [1992] developed an inductive learning system called LERS to deal with inconsistency in training data. Basically, LERS is able to perform multi-concept classification. However, as observed by Grzymala-Busse [1992], LERS becomes impractical when it encounters a large training data set. This can possibly be attributed to the complexity of its computational algorithm. Furthermore, the rules induced by LERS are relatively complex and difficult to interpret.

19.3.2 The Prototype System: RClass*

19.3.2.1 The Approach

RClass* adopts a hybrid approach that combines the basic notions of rough set theory, the unique search engine of GAs, and Boolean algebraic operations to carry out multi-concept classification. It possesses the ability of

1. Handling inconsistent information. This is treated by rough set principles.
2. Inducing probable decision rules for each concept. This is achieved by using a simple but effective GA-based search engine.
3. Simplifying the decision rules discovered by the GA-based search engine. This is realized using Boolean algebraic operators to simplify the decision rules induced.

Multi-concept classification can be realized using the following procedure.

1. Treat all the concepts (classes) in a training data set as component sets (sets A, B, C, ...) of a universe, U (Figure 19.4).
2. Partition the universe, U, into two sets using one of the concepts, such as A and "not A" (¬A). This implies that the rough set's twin-concept classification can be used to treat concept A and its complement, ¬A.
3. Apply the twin-concept classification to determine the upper and lower approximations of concept A in accordance with rough set theory.
4. Use Steps 2 and 3 repeatedly to classify other concepts on universe U.

19.3.2.2 Framework of RClass*

The framework of RClass* is shown in Figure 19.5. It comprises four main modules, namely a preprocessor, a rough-set analyzer, a GA-based search engine, and a rule pruner. The raw knowledge or data gleaned from a process or experts is stored and subsequently forwarded to RClass* for classification and rule induction. The preprocessor module performs the following tasks:

1. Access input data.
2. Identify attributes and their values.
3. Perform a redundancy check and reorganize the new data set with no superfluous observations for subsequent use.
4. Initialize all the necessary parameters for the GA-based search engine, such as the length of chromosome, population size, number of generations, and the probabilities of crossover and mutation.

The rough set analyzer carries out three subtasks, namely consistency check, concept forming, and approximation. It scans the training data set obtained from the preprocessor module and checks its consistency.
Once an inconsistency is spotted, it will activate the concept partitioner and the approximation operator to carry out analysis using rough set theory. The concept partitioner performs set operations for each concept (class) according to the approach outlined previously. The approximation operator employs the lower and upper approximators to calculate the lower and upper approximations, during which the training data set is split into a certain training data set and a possible training data set. Subsequently, these training sets are forwarded to the GA-based search engine for rule extraction.

FIGURE 19.4 Partitioning of universe U. (Component sets A, B, C, ... of universe U, with U split into concept A and its complement ¬A.)

The GA-based search engine, once invoked, performs the bespoke genetic operations such as crossover, mutation, and reproduction to gather certain rules and possible rules from the certain training data set and possible training data set, respectively.

The rule pruner performs two tasks: pruning (or simplifying) and rule evaluation. It examines all the rules, both certain and possible, extracted by the GA-based search engine and employs Boolean algebraic operators, such as union and intersection, to prune and simplify the rules. During the pruning operation, redundant rules are removed, whereas related rules are clustered and generalized during simplification. As possible rules are not definitely certain, the quality and reliability of these possible rules must be assessed. For every possible rule, RClass* also estimates its reliability using the following index:

$$\text{Reliability index} = \frac{\text{Observation\_Possible\_Rule}}{\text{Observation\_Possible\_Original\_Data}}$$

where Observation_Possible_Rule is the number of observations that are correctly classified by a possible rule, and Observation_Possible_Original_Data is the number of observations with condition attributes covered by the same rule in the original data set. This index can be viewed as the probability of classifying an inconsistent training data set correctly.
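The concept-by-concept partitioning and the reliability index can be sketched together in Python. The sketch is illustrative only: the six-observation data set, the concept labels, and the names `approximations_per_concept` and `reliability_index` are hypothetical and are not taken from RClass*. Each concept A is treated against its complement ¬A, giving a certain training set (lower approximation) and a possible training set (upper approximation) per concept.

```python
from collections import defaultdict

# Hypothetical training data: object -> (condition attribute values, concept).
DATA = {
    "o1": ((0, 0), "normal"),
    "o2": ((0, 0), "normal"),
    "o3": ((0, 1), "slight"),
    "o4": ((0, 1), "abnormal"),   # conflicts with o3: same conditions, other concept
    "o5": ((1, 1), "abnormal"),
    "o6": ((1, 0), "slight"),
}


def approximations_per_concept(data):
    """For every concept A, approximate A against its complement (not A).

    Returns concept -> (certain training set, possible training set).
    """
    # Blocks of objects that are indiscernible on the condition attributes.
    blocks = defaultdict(set)
    for obj, (conditions, _) in data.items():
        blocks[conditions].add(obj)

    result = {}
    for concept in {c for _, c in data.values()}:
        members = {o for o, (_, c) in data.items() if c == concept}
        lower = set().union(*(b for b in blocks.values() if b <= members))
        upper = set().union(*(b for b in blocks.values() if b & members))
        result[concept] = (lower, upper)
    return result


def reliability_index(data, rule_conditions, rule_concept):
    """Correctly classified observations divided by observations the rule covers.

    `rule_conditions` maps an attribute position to the value the rule requires.
    """
    covered = [concept for conditions, concept in data.values()
               if all(conditions[i] == v for i, v in rule_conditions.items())]
    if not covered:
        return 0.0
    return sum(1 for concept in covered if concept == rule_concept) / len(covered)


sets_by_concept = approximations_per_concept(DATA)
# Possible rule (hypothetical): IF a1 = 0 AND a2 = 1 THEN concept = "slight".
reliability = reliability_index(DATA, {0: 0, 1: 1}, "slight")
```

In this toy data set the rule covers the two conflicting observations o3 and o4 and classifies only o3 correctly, so its reliability is 0.5.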
For each certain rule extracted from the certain training data set, RClass* uses a so-called completeness index to indicate the number of observations in the original training data set that are related to the certain rule. Such an index is defined as follows:

$$\text{Completeness index} = \frac{\text{Observation\_Certain\_Rule}}{\text{Observation\_Certain\_Original\_Data}}$$

where Observation_Certain_Rule is the number of observations that are correctly classified by a certain rule, and Observation_Certain_Original_Data is the number of observations with condition attributes covered by the same rule in the original training data. In other words, the completeness index represents the usefulness...

FIGURE 19.5 Framework of RClass*. (Raw information from a process or experts is passed as input data to the classifier, which comprises the preprocessor with attributes identifier, redundancy analysis, and system initializer for the GA configuration; the rough-set analyzer with consistency analysis, concept partitioner, and approximator; the GA-based search engine with GA operator; and the rule pruner with pruning/simplifying and rule evaluation. The knowledge extracted is forwarded to an expert system.)

[...]

... classified into this concept

References

Arciszewski, T. and Ziarko, W. 1990. Inductive Learning in Civil Engineering: A Rough Sets Approach, Microcomputers in Civil Engineering, 5(1): 19-28.
Davis, L. (Ed.) 1991. Handbook of Genetic Algorithms. Van Nostrand Reinhold, New York.
De Jong, K. A., Spears, W. M., and Gordon, D. F. 1993. Using Genetic Algorithms for Concept Learning, Machine Learning, 12(13):...

... by domain experts. They are reasonable and logical. Using the rules extracted, it is envisaged that a knowledge-based system can be developed to assist engineers in diagnosing the equipment.

Defining Terms

Certain rules: Rules that can definitely classify some observations into a certain concept.
Certain training data set: The data set that all the observations contained can be definitely classified into a...
the rules induced are concise, sensible, and complete. For all the rules extracted, RClass* is also able to provide an estimation of the expected reliability. This would assist users in ascertaining the appropriateness of the rules extracted. A case study was used to illustrate the possibility of using RClass* in performing machine diagnosis in a manufacturing environment. In this case, machine vibration...

... and Rule Induction, Int. Journal of Advanced Manufacturing, in press.
Mrozek, A. 1992. Rough Sets in Computer Implementation of Rule-Based Control of Industrial Process. In Intelligent Decision Support: Handbook of Applications and Advances of the Rough Sets Theory, Ed. R. Slowinski, pp. 19-32. Kluwer Academic Publishers, Dordrecht, The Netherlands.
Pawlak, Z. 1982. Rough Sets, Int. Journal of Computer and Information...
... Mathematical & Computer Modeling, 12(10/11): 1347-1357.
Yao, Y. Y., Wong, S. K. M., and Lin, T. Y. 1997. A Review of Rough Sets Models. In Rough Sets and Data Mining: Analysis for Imprecise Data, Ed. T. Y. Lin and N. Cercone, pp. 47-76. Kluwer Academic Publishers, Boston, MA.
Yasdi, R. 1995. Combining Rough Sets Learning and Neural Learning Method to Deal with Uncertain and Imprecise Information, Neurocomputing, 7: 61-84.
Ziarko,...

... are depicted in Table 19.7.

TABLE 19.7 Sample Rules Corresponding to Machine Characteristics

Working States                              Sample Rules Extracted
Machine operating under normal condition    IF (A1 = 1) & (A6 = 1) THEN Machine State = 1
Machine with misalignment problems          IF (A2 = 3) THEN Machine State = 2
Mechanical loosening                        IF (A3 = 3) & (A4 = 3) & (A6 = 3) THEN Machine State = 3
Unbalanced machine                          ...
89-95.
Quinlan, J. R. 1992. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA.
Slowinski, K. 1992. Rough Classification of HSV Patients. In Intelligent Decision Support: Handbook of Applications and Advances of the Rough Sets Theory, Ed. R. Slowinski, pp. 77-94. Kluwer Academic Publishers, Dordrecht, The Netherlands.
Slowinski, R. and Stefanowski, J. 1989. Rough Classification in Incomplete Information...

... time, the certain and possible training data sets are identified. Upon completion, the GA-based search engine is invoked to look for classification rules from the certain and possible training data sets obtained from the rough set analyzer. It randomly generates 50 chromosomes to form an initial population of possible solutions (chromosomes). These chromosomes are coded using the scheme shown in Table 19.4...

... deal with rule induction under uncertainty. RClass* has incorporated a novel approach that extends rough set's twin-concept to perform multi-concept classification. This has made RClass* more practical in dealing with real-life problems compared to its predecessor, RClass. Using RClass*, two kinds of rules, certain rules and possible rules, can be induced from examples. RClass* was validated using an example ...