CAR-Miner: An efficient algorithm for mining class-association rules

Loan T.T. Nguyen (a), Bay Vo (b), Tzung-Pei Hong (c,d), Hoang Chi Thanh (e)

(a) Faculty of Information Technology, VOV College, Ho Chi Minh, Viet Nam
(b) Information Technology College, Ho Chi Minh, Viet Nam
(c) Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan, ROC
(d) Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan, ROC
(e) Department of Informatics, Ha Noi University of Science, Ha Noi, Viet Nam

Expert Systems with Applications 40 (2013) 2305-2311. doi:10.1016/j.eswa.2012.10.035

Keywords: Accuracy; Classification; Class-association rules; Data mining; Tree structure

Abstract

Building a high-accuracy classifier is a key problem in real applications. One family of high-accuracy classifiers is based on association rules. In the past, some studies showed that classification based on association rules (class-association rules, or CARs) achieves higher accuracy than other rule-based methods such as ILA and C4.5. However, mining CARs consumes more time because it mines a complete rule set. Improving the execution time of CAR mining is therefore one of the main problems of this method that needs to be solved. In this paper, we propose a new method for mining class-association rules. Firstly, we design a tree structure for storing the frequent itemsets of the dataset. We then develop some theorems for pruning nodes and computing information in the tree, and, based on these theorems, we propose an efficient algorithm for mining CARs. Experimental results show that our approach is more efficient than previous ones.

1. Introduction

1.1. Motivation

Classification plays an important role in decision support systems. A lot of methods for mining classification rules have been developed, including C4.5 (Quinlan, 1992) and ILA (Tolun & Abu-Soud, 1998; Tolun, Sever, Uludag, & Abu-Soud, 1999). Recently, a new classification method from data mining, called classification based on associations (CBA), has been proposed for mining class-association rules (CARs). This method has more advantages than heuristic and greedy methods in that it can easily remove noise, and its accuracy is thus higher. It also generates a more complete rule set than C4.5 and ILA.

For association rule mining, the target attribute (or class attribute) is not pre-determined, whereas it must be pre-determined in classification problems. Thus, several algorithms for mining classification rules based on association rule mining have been proposed. Examples include classification based on predictive association rules (Yin & Han, 2003), classification based on multiple association rules (Li, Han, & Pei, 2001), classification based on associations (CBA; Liu, Hsu, & Ma, 1998), multi-class, multi-label associative classification (Thabtah, Cowling, & Peng, 2004), multi-class classification based on association rules (Thabtah, Cowling, & Peng, 2005), an associative classifier based on maximum entropy (Thonangi & Pudi, 2005), Noah (Giuffrida, Chu, & Hanssens, 2000), and the use of the equivalence class rule tree (Vo & Le, 2008).
Some studies have also reported that classifiers based on class-association rules are more accurate than traditional methods such as C4.5 and ILA, both theoretically (Veloso, Meira, & Zaki, 2006) and experimentally (Liu et al., 1998). Veloso et al. proposed lazy associative classification (Veloso et al., 2006; Veloso, Meira, Goncalves, & Zaki, 2007; Veloso, Meira, Goncalves, Almeida, & Zaki, 2011), which differs from CAR-based methods in that it predicts the class of an unknown object using rules mined from the dataset projected on that object, instead of rules mined from the whole dataset.

Genetic algorithms have also been applied recently for mining CARs, and several approaches have been proposed. For example, Chien and Chen (2010) proposed a GA-based approach to build classifiers for numeric datasets and applied it to stock trading data. Kaya (2010) proposed a Pareto-optimal genetic approach for building autonomous classifiers. Qodmanan, Nasiri, and Minaei-Bidgoli (2011) proposed a GA-based method that does not require minimum support or minimum confidence thresholds. Yang, Mabu, Shimada, and Hirasawa (2011) proposed an evolutionary approach to rank rules. These algorithms were mainly based on heuristics for building classifiers.

All the above methods focused on the design of algorithms for mining CARs or for building classifiers, but did not discuss their mining time in much depth. Therefore, in this paper, we aim to propose an efficient algorithm for mining CARs based on a tree structure. Section 1.2 presents our contributions.

1.2. Our contributions

In the past, Vo and Le (2008) proposed a method for mining CARs using the equivalence class rule tree (ECR-tree), together with an efficient mining algorithm named ECR-CARM. ECR-CARM scans the dataset only once and uses object identifiers to quickly determine the supports of itemsets. However, generating and testing candidates is quite time-consuming because all itemsets with the same attributes are grouped into one node of the tree. Therefore, when joining two nodes li and lj to create a new node, ECR-CARM has to compare each element of li with each element of lj to check whether they have the same prefix.

In this paper, we design a MECR-tree in which each node contains a single itemset, instead of all itemsets sharing the same attributes. With this tree, some theorems are also derived, and based on them an algorithm is proposed for mining CARs.

1.3. Organization of our paper

The rest of this paper is organized as follows. Section 2 introduces work related to mining CARs. Section 3 presents preliminary concepts. The main contributions are presented in Section 4, in which a tree structure named the MECR-tree is developed and some theorems for quickly pruning candidates are derived; based on the tree and these theorems, we propose an algorithm for mining CARs efficiently. Section 5 shows and discusses the experimental results. Conclusions and future work are presented in Section 6.

2. Related work

Mining CARs is the discovery of all classification rules that satisfy the minimum support (minSup) and minimum confidence (minConf) thresholds. The first method for mining CARs was proposed by Liu et al. (1998). It generates all candidate 1-ruleitems and then calculates their supports to find the ruleitems that satisfy minSup.
It then generates all candidate 2-ruleitems from the 1-ruleitems in the same way, and so on. The authors also proposed a heuristic for building the classifier. The weak point of this method is that it generates a lot of candidates and scans the dataset many times, so it is time-consuming; the algorithm therefore uses a threshold K and only generates k-ruleitems with k ≤ K. In 2000, an improved algorithm was proposed to address the problem of imbalanced datasets (Liu, Ma, & Wong, 2000). The latter has higher accuracy than the former because it uses a hybrid approach for prediction. Li et al. proposed a method based on the FP-tree (Li et al., 2001). The advantage of this method is that it scans the dataset only twice and uses an FP-tree to compress the dataset; it also uses the tree-projection technique to find frequent itemsets. To predict unseen data, this method finds all rules that match the data and uses a weighted χ2 measure to determine the class. Vo and Le proposed another approach based on a tree structure called the equivalence class rule tree (ECR-tree), together with an algorithm called ECR-CARM for mining CARs (Vo & Le, 2008). The algorithm scans the dataset only once and uses the intersection of object identifiers to quickly compute the supports of itemsets. Nguyen, Vo, Hong, and Thanh (2012) then proposed a new method for pruning redundant rules based on a lattice. Thabtah et al. (2004) proposed a multi-class, multi-label associative classification approach for mining CARs. This method uses rules of the form {(Ai1,ai1), (Ai2,ai2), ..., (Aim,aim)} → ci1 ∨ ci2 ∨ ... ∨ cil, where aij is a value of attribute Aij and cij is a class label. Some other class-association rule mining approaches have been presented in the work of Coenen, Leng, and Zhang (2007), Giuffrida et al. (2000), Lim and Lee (2010), Liu, Jiang, Liu, and Yang (2008), Priss (2002), Sun, Wang, and Wong (2006), Thabtah et al. (2005), Thabtah, Cowling, and Hammoud (2006), Thonangi and Pudi (2005), Yin and Han (2003), Zhang, Chen, and Wei (2011), and Zhao, Tsang, Chen, and Wang (2010).

3. Preliminary concepts

Let D be a set of training data with n attributes A1, A2, ..., An and |D| objects (cases). Let C = {c1, c2, ..., ck} be a list of class labels. A specific value of an attribute Ai and a specific class are denoted by the lower-case letters a and c, respectively.

Definition 1. An itemset is a set of pairs, each consisting of an attribute and a specific value, denoted {(Ai1,ai1), (Ai2,ai2), ..., (Aim,aim)}.

Definition 2. A class-association rule r is of the form {(Ai1,ai1), ..., (Aim,aim)} → c, where {(Ai1,ai1), ..., (Aim,aim)} is an itemset and c ∈ C is a class label.

Definition 3. The actual occurrence ActOcc(r) of a rule r in D is the number of rows of D that match r's condition.

Definition 4. The support of a rule r, denoted Sup(r), is the number of rows that match r's condition and belong to r's class.

For example, consider r: {(A,a1)} → y from the dataset in Table 1. We have ActOcc(r) = 3 and Sup(r) = 2 because there are three objects with A = a1, two of which have class y.
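To make these definitions concrete, the following minimal Python sketch (our illustration, not code from the paper) computes ActOcc and Sup over the data of Table 1 below; the row encoding is an assumption made for the example.

```python
# A minimal sketch (our illustration, not the authors' code) of Definitions
# 3 and 4, evaluated on the data of Table 1. Rows are encoded as
# (OID, {attribute: value}, class); this encoding is our assumption.

DATASET = [
    (1, {"A": "a1", "B": "b1", "C": "c1"}, "y"),
    (2, {"A": "a1", "B": "b2", "C": "c1"}, "n"),
    (3, {"A": "a2", "B": "b2", "C": "c1"}, "n"),
    (4, {"A": "a3", "B": "b3", "C": "c1"}, "y"),
    (5, {"A": "a3", "B": "b1", "C": "c2"}, "n"),
    (6, {"A": "a3", "B": "b3", "C": "c1"}, "y"),
    (7, {"A": "a1", "B": "b3", "C": "c2"}, "y"),
    (8, {"A": "a2", "B": "b2", "C": "c2"}, "n"),
]

def matches(itemset, row_values):
    # A row matches the rule's condition if it contains every (attribute, value) pair.
    return all(row_values.get(att) == val for att, val in itemset)

def act_occ(itemset):
    # Definition 3: number of rows of D that match the condition of r.
    return sum(1 for _, vals, _ in DATASET if matches(itemset, vals))

def sup(itemset, cls):
    # Definition 4: rows that match the condition AND belong to r's class.
    return sum(1 for _, vals, c in DATASET if matches(itemset, vals) and c == cls)

rule = ([("A", "a1")], "y")              # r: {(A,a1)} -> y
print(act_occ(rule[0]), sup(*rule))      # prints: 3 2
```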
4. Mining class-association rules

4.1. Tree structure

In this paper, we modify the ECR-tree structure (Vo & Le, 2008) into the MECR-tree structure (M stands for Modification) as follows. In the ECR-tree, all itemsets with the same attributes are arranged into one group and placed in one node; itemsets in different groups are then joined together to form itemsets with more items. This leads to much time being consumed in generating and testing itemsets. In our work, each node in the tree contains only one itemset, along with the following information:

(a) Obidset: the set of object identifiers that contain the itemset;
(b) (#c1, #c2, ..., #ck): a list of integers, where #ci is the number of records in Obidset that belong to class ci; and
(c) pos: a positive integer storing the position of the class with the maximum count, i.e., pos = argmax_{i ∈ [1,k]} {#ci}.

In the ECR-tree, the authors did not store #ci and pos, and thus had to compute them for all nodes. With the MECR-tree, some of these values need not be calculated, by using the theorems presented in Section 4.2.

Table 1. An example of a training dataset.

OID | A  | B  | C  | Class
1   | a1 | b1 | c1 | y
2   | a1 | b2 | c1 | n
3   | a2 | b2 | c1 | n
4   | a3 | b3 | c1 | y
5   | a3 | b1 | c2 | n
6   | a3 | b3 | c1 | y
7   | a1 | b3 | c2 | y
8   | a2 | b2 | c2 | n

For example, consider a node containing the itemset X = {(A,a3), (B,b3)} from Table 1. X is contained in objects 4 and 6, and both of them belong to class y. Therefore, the node ({(A,a3), (B,b3)}, 46, (2,0)), written more simply as 3 × a3b3 / 46(2,0), is generated in the tree. Its pos is 1 (underlined at position 1 of the node in the figures) because the count of class y is the maximum (2 as compared to 0). The compact form saves memory when the tree structure is used to store itemsets: we use a bit representation for the itemset's attributes, so the attribute set AB, for example, is represented as 11 in bits, and the value of this attribute set is 3. With this representation, bitwise operations make joining itemsets faster.
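The node layout of Section 4.1 can be sketched as follows (our Python illustration, not the authors' code; the class and field names are ours, and pos is 0-based here rather than the paper's 1-based position):

```python
# A sketch of an MECR-tree node. Attributes are encoded as a bitmask
# (A=1, B=2, C=4), so a bitwise OR joins attribute sets in one operation;
# `values` maps each attribute bit to its value.

from dataclasses import dataclass, field

@dataclass
class MECRNode:
    att: int                # bitmask over the itemset's attributes
    values: dict            # attribute bit -> value, e.g. {1: "a3", 2: "b3"}
    obidset: frozenset      # OIDs of the objects containing the itemset
    count: tuple            # (#c1, ..., #ck): per-class counts over obidset
    pos: int = field(init=False)  # 0-based index of the majority class

    def __post_init__(self):
        # pos = argmax_i count[i] (the paper's pos, but 0-based here)
        self.pos = max(range(len(self.count)), key=lambda i: self.count[i])

# The node 3 x a3b3 / 46(2,0) of the text: itemset {(A,a3),(B,b3)} with
# att = 1 | 2 = 3, contained in objects 4 and 6, both of class y.
node = MECRNode(att=0b011, values={1: "a3", 2: "b3"},
                obidset=frozenset({4, 6}), count=(2, 0))
assert node.pos == 0        # class y (index 0) has the maximum count
```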
4.2. Proposed algorithm

In this section, some theorems for fast mining of CARs are derived. Based on these theorems, we then propose an efficient algorithm for mining CARs.

Theorem 1. Given two nodes (att1 × values1, Obidset1, (c11, ..., c1k)) and (att2 × values2, Obidset2, (c21, ..., c2k)), if att1 = att2 and values1 ≠ values2, then Obidset1 ∩ Obidset2 = ∅.

Proof. Since att1 = att2 and values1 ≠ values2, there exist val1 ∈ values1 and val2 ∈ values2 such that val1 and val2 have the same attribute but different values. Thus, if a record with identifier OID contains val1, it cannot contain val2. Therefore, for every OID ∈ Obidset1, it can be inferred that OID ∉ Obidset2. Thus, Obidset1 ∩ Obidset2 = ∅. □

In this theorem, we write the itemset in the form att × values for ease of use. Theorem 1 implies that if two itemsets X and Y have the same attributes, they need not be combined into the itemset XY, because Sup(XY) = 0. For example, consider the two nodes 1 × a1 / 127(2,1) and 1 × a2 / 38(0,2), in which Obidset({(A,a1)}) = 127 and Obidset({(A,a2)}) = 38. Obidset({(A,a1), (A,a2)}) = Obidset({(A,a1)}) ∩ Obidset({(A,a2)}) = ∅. Similarly, Obidset({(A,a1), (B,b1)}) = 1 and Obidset({(A,a1), (B,b2)}) = 2; it can be inferred that Obidset({(A,a1), (B,b1)}) ∩ Obidset({(A,a1), (B,b2)}) = ∅ because these two itemsets have the same attributes AB but different values.

Theorem 2. Given two nodes (itemset1, Obidset1, (c11, ..., c1k)) and (itemset2, Obidset2, (c21, ..., c2k)), if itemset1 ⊂ itemset2 and |Obidset1| = |Obidset2|, then ∀i ∈ [1,k]: c1i = c2i.

Proof. Since itemset1 ⊂ itemset2, all records containing itemset2 also contain itemset1, and therefore Obidset2 ⊆ Obidset1. Additionally, since |Obidset1| = |Obidset2|, it follows that Obidset2 = Obidset1, and thus ∀i ∈ [1,k]: c1i = c2i. □

By Theorem 2, when we join two parent nodes into a child node, the itemset of the child node is always a superset of the itemset of each parent node. Therefore, we check their cardinalities, and if they are the same, we need not compute the count for each class or the pos of the child node, because they are the same as in the parent node.

Using these theorems, we develop an algorithm for mining CARs efficiently: by Theorem 1, we need not join two nodes with the same attributes, and by Theorem 2, we need not compute the information of some child nodes.

First of all, the root node of the tree (Lr) is given child nodes such that each node contains a single frequent itemset. After that, procedure CAR-Miner is called with the parameter Lr to mine all CARs from the dataset D.

The CAR-Miner procedure (Fig. 1) considers each node li together with every node lj that follows it in Lr, i.e., with j > i (Lines 2 and 5), to generate a candidate child node O. For each pair (li, lj), the algorithm checks whether li.att ≠ lj.att (Line 6, using Theorem 1). If the attributes are different, it computes the three elements att, values, and Obidset of the new node O (Lines 7-9). Line 10 checks whether the number of object identifiers of li equals that of O (by Theorem 2); if so, the algorithm copies all information from node li to node O (Lines 11-12). Otherwise, it similarly checks lj against O, and if their numbers of object identifiers are the same (Line 13), it copies all information from node lj to node O (Lines 14-15). Otherwise, the algorithm computes O.count from O.Obidset, and then O.pos (Lines 17-18). After computing all the information of node O, the algorithm adds O to Pi (Pi is initialized empty in Line 4) if O.count[O.pos] ≥ minSup (Lines 19-20). Finally, CAR-Miner is called recursively with the new set Pi as its input parameter (Line 21).
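To make the join of Lines 6-18 concrete, here is a matching sketch (ours, not the authors' Fig. 1 code; it reuses the MECRNode class sketched above, and the helper name join and the oid_class mapping are our own):

```python
# Sketch of the join step (Lines 6-18): candidate generation with the
# Theorem 1 and Theorem 2 shortcuts. `oid_class` maps each OID to its
# class index.

def join(li, lj, n_classes, oid_class):
    if li.att == lj.att:                   # Line 6 -- Theorem 1: same
        return None                        # attributes => support 0, skip
    att = li.att | lj.att                  # Line 7: bitwise union of attributes
    values = {**li.values, **lj.values}    # Line 8: union of attribute values
    obidset = li.obidset & lj.obidset      # Line 9: intersect the OID sets
    if len(obidset) == len(li.obidset):    # Lines 10-12 -- Theorem 2:
        count = li.count                   # copy the counts of li, no recount
    elif len(obidset) == len(lj.obidset):  # Lines 13-15: same test against lj
        count = lj.count
    else:                                  # Lines 17-18: count per class
        counts = [0] * n_classes
        for oid in obidset:
            counts[oid_class[oid]] += 1
        count = tuple(counts)
    return MECRNode(att=att, values=values, obidset=obidset, count=count)
```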
The procedure ENUMERATE-CAR(l, minConf) generates a rule from node l. It first computes the confidence of the rule (Line 22); if the confidence satisfies minConf (Line 23), the rule is added to the set of CARs (Line 24).

[Fig. 1. The proposed algorithm for mining CARs.]

4.3. An example

In this section, we use the example in Table 1 to describe the CAR-Miner process with minSup = 10% and minConf = 60%. Fig. 2 shows the results of this process.

[Fig. 2. MECR-tree for the dataset in Table 1.]

The MECR-tree is built from the dataset in Table 1 as follows. First, the root node Lr contains all frequent 1-itemsets: 1 × a1 / 127(2,1), 1 × a2 / 38(0,2), 1 × a3 / 456(2,1), 2 × b1 / 15(1,1), 2 × b2 / 238(0,3), 2 × b3 / 467(3,0), 4 × c1 / 12346(3,2), and 4 × c2 / 578(1,2). After that, procedure CAR-Miner is called with the parameter Lr. We use the node li = 1 × a2 / 38(0,2) as an example for illustrating the CAR-Miner process. li joins with all nodes following it in Lr:

- With node lj = 1 × a3 / 456(2,1): li and lj have the same attribute and different values, so nothing is generated from them (Theorem 1).
- With node lj = 2 × b1 / 15(1,1): their attributes are different, so three elements are computed: O.att = li.att | lj.att = 1 | 2 = 3 (11 in bit presentation); O.values = a2 ∪ b1 = a2b1; and O.Obidset = li.Obidset ∩ lj.Obidset = {3,8} ∩ {1,5} = ∅. Because O.count[O.pos] = 0 < minSup, O is not added to Pi.
- With node lj = 2 × b2 / 238(0,3): their attributes are different, so three elements are computed: O.att = 1 | 2 = 3 (11 in bit presentation); O.values = a2 ∪ b2 = a2b2; and O.Obidset = {3,8} ∩ {2,3,8} = {3,8}. Because |li.Obidset| = |O.Obidset|, the algorithm copies all information from li to O: O.count = li.count = (0,2) and O.pos = 2. Because O.count[O.pos] = 2 ≥ minSup, O is added to Pi ⇒ Pi = {3 × a2b2 / 38(0,2)}.
- With node lj = 2 × b3 / 467(3,0): their attributes are different, so three elements are computed: O.att = 1 | 2 = 3 (11 in bit presentation); O.values = a2 ∪ b3 = a2b3; and O.Obidset = {3,8} ∩ {4,6,7} = ∅. Because O.count[O.pos] = 0 < minSup, O is not added to Pi.
- With node lj = 4 × c1 / 12346(3,2): their attributes are different, so three elements are computed: O.att = 1 | 4 = 5 (101 in bit presentation); O.values = a2 ∪ c1 = a2c1; and O.Obidset = {3,8} ∩ {1,2,3,4,6} = {3}. The algorithm computes the additional information O.count = (0,1) and O.pos = 2. Because O.count[O.pos] = 1 ≥ minSup, O is added to Pi ⇒ Pi = {3 × a2b2 / 38(0,2), 5 × a2c1 / 3(0,1)}.
- With node lj = 4 × c2 / 578(1,2): their attributes are different, so three elements are computed: O.att = 1 | 4 = 5 (101 in bit presentation); O.values = a2 ∪ c2 = a2c2; and O.Obidset = {3,8} ∩ {5,7,8} = {8}. The algorithm computes the additional information O.count = (0,1) and O.pos = 2. Because O.count[O.pos] = 1 ≥ minSup, O is added to Pi ⇒ Pi = {3 × a2b2 / 38(0,2), 5 × a2c1 / 3(0,1), 5 × a2c2 / 8(0,1)}.
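For illustration, the pieces sketched above (DATASET, MECRNode, join) can be assembled into a condensed version of the whole procedure; this is our paraphrase of the prose walkthrough of Fig. 1, not the authors' implementation. Running it on Table 1 reproduces the nodes and the rule derived in this example.

```python
# Condensed sketch of CAR-Miner (reuses DATASET, MECRNode and join from
# the sketches above; ATT_BIT, CLASSES and oid_class are our own names).
from collections import defaultdict

ATT_BIT = {"A": 1, "B": 2, "C": 4}
CLASSES = ["y", "n"]
oid_class = {oid: CLASSES.index(c) for oid, _, c in DATASET}

def enumerate_car(node, min_conf, cars):               # Lines 22-24
    conf = node.count[node.pos] / len(node.obidset)
    if conf >= min_conf:
        cars.append((node.values, CLASSES[node.pos],
                     node.count[node.pos], conf))

def car_miner(nodes, min_sup_count, min_conf, cars):
    for i, li in enumerate(nodes):                     # Lines 2-3
        enumerate_car(li, min_conf, cars)
        p_i = []                                       # Line 4
        for lj in nodes[i + 1:]:                       # Lines 5-18
            o = join(li, lj, len(CLASSES), oid_class)
            if o is not None and o.count[o.pos] >= min_sup_count:
                p_i.append(o)                          # Lines 19-20
        car_miner(p_i, min_sup_count, min_conf, cars)  # Line 21

# Root level Lr: one node per frequent 1-itemset of Table 1 (at minSup = 10%
# of 8 records, every single item here qualifies).
groups = defaultdict(set)
for oid, vals, _ in DATASET:
    for a, v in vals.items():
        groups[(ATT_BIT[a], v)].add(oid)
root = []
for (bit, v), oids in sorted(groups.items()):
    cnt = [0] * len(CLASSES)
    for oid in oids:
        cnt[oid_class[oid]] += 1
    root.append(MECRNode(att=bit, values={bit: v},
                         obidset=frozenset(oids), count=tuple(cnt)))

cars = []
car_miner(root, min_sup_count=0.1 * len(DATASET), min_conf=0.6, cars=cars)
# Among the mined rules: {(A,a2)} -> n with support 2 and confidence 1.0,
# matching the hand-derived result below.
```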
Because their attributes are differ With node lj ẳ 30; 1ị ent, three elements are computed such as O.att = li.att [ lj.att = j = or 111 in bit presentation; O.values = li values [ lj.values = a2b2 [ a2 c1 = a2b2c1, and O.Obidset = li.Obidset \ lj.Obidset = {3,8} \ {3} = {3} = lj.Obidset the algorithm copies all information of lj to O, it means that O.count = lj.count = (0,1) and O.pos = Because the O.count[O.pos] = > minSup, O is added to Pi ) Pi ¼    a2b2c1 3ð0; 1Þ Â a2c2 , we have the  Using the same process for node lj ¼   8ð0; 1Þ Â a2b2c1  a2b2c2 ; result Pi ẳ 30; 1ị 80; 1ị Rules are easily to generate in the same step for traversing li (Line 3) by calling procedure ENUMERATE-CAR(li, minConf) For  a2 example, when traversing node li ẳ , the procedure com380; 2ị putes the confidence of the candidate rule, conf = li.count[li.pos]/ jli.Obidsetj = 2/2 = Because conf P minConf (60%), add rule {(A, a2)} ? n (2,1) into the rule set CARs The meaning of this rule is ‘‘If A = a2 then class = n’’ (support = and confidence = 100%) To show the efficiency of Theorem 2, we can see that the algorithm need not compute the information of some itemsets, such as {3  a2b2,  a1b1c1,  a1b2c1,  a1b3c2,  a2b2c1,  a2b2c2,  a3b1c2,  a3b3c1} Fig Numbers of CARs in the breast dataset for various minSup values Fig Numbers of CARs in the German dataset for various minSup values Experimental results 5.1 Characteristics of experimental datasets The algorithms used in the experiments were coded on a personal computer with C#2008, Windows 7, Centrino  2.53 GHz, and MBs RAM The experimental results were tested in the datasets obtained from the UCI Machine Learning Repository (http:// mlearn.ics.uci.edu) Table shows the characteristics of the experimental datasets The experimental datasets had different features The Breast, German and Vehicle datasets had many attributes and distinctive (values) but had very few numbers of objects (or records) The Led7 dataset had only a few attributes, distinctive values and number of objects Fig Numbers of CARs in the lymph dataset for various minSup values 5.2 Numbers of rules of the experimental datasets Figs 3–7 show the numbers of rules of the datasets in Table for different minimum support thresholds We used a minConf = 50% for all experiments The results from Figs 3–7 show that some datasets had a lot of rules For example, the Lymph dataset had 4,039,186 rules with a Fig Numbers of CARs in the Led7 dataset for various minSup values Table The characteristics of the experimental datasets Dataset #attrs #classes #distinct values #Objs Breast German Lymph Led7 Vehicle 12 21 18 19 2 10 737 1077 63 24 1434 699 1000 148 3200 846 Fig Numbers of CARs in the vehicle dataset for various minSup values 2310 L.T.T Nguyen et al / Expert Systems with Applications 40 (2013) 2305–2311 Fig The execution time for CAR-Miner and ECR-CARM in the breast dataset Fig 12 The execution time for CAR-Miner and ECR-CARM in the vehicle dataset the Breast dataset with a minSup = 0.1% The mining time for the CAR-Miner was 1.517 s, while that for the ECR-CARM was 1:517 17.136 s The ratio was 17:136  100% ¼ 8:85% Conclusions and future work Fig The execution time for CAR-Miner and ECR-CARM in the German dataset This paper proposed a new algorithm for mining CARs using a tree structure Each node in the tree contained some information for fast computation of the support of the candidate rule In addition, using Obidset, we were able to compute the support of itemsets quickly Some theorems were also developed Based on 
6. Conclusions and future work

This paper proposed a new algorithm for mining CARs using a tree structure. Each node in the tree contains the information needed for fast computation of the support of the candidate rule. In addition, using Obidsets, the supports of itemsets can be computed quickly. Some theorems were also developed; based on these theorems, the information of a large number of nodes in the tree need not be computed. With these improvements, the proposed algorithm shows better performance than the previous algorithm in all experimental results.

Mining itemsets from incremental databases has been developed in recent years (Gharib, Nassar, Taha, & Abraham, 2010; Hong & Wang, 2010; Hong, Lin, & Wu, 2009; Hong, Wang, & Tseng, 2011; Lin, Hong, & Lu, 2009). It can save a lot of time and memory when compared with re-mining the integrated database. Therefore, in the future, we will study how to apply this approach to mining CARs.

Acknowledgements

This work was supported by Vietnam's National Foundation for Science and Technology Development (NAFOSTED) under Grant No. 102.01-2012.47. This paper was completed while the second author was visiting the Vietnam Institute for Advanced Study in Mathematics (VIASM), Ha Noi, Viet Nam.

References

Chien, Y. W. C., & Chen, Y. L. (2010). Mining associative classification rules with stock trading data – A GA-based method. Knowledge-Based Systems, 23(6), 605–614.
Coenen, F., Leng, P., & Zhang, L. (2007). The effect of threshold values on association rule based classification accuracy. Data and Knowledge Engineering, 60(2), 345–360.
Gharib, T. F., Nassar, H., Taha, M., & Abraham, A. (2010). An efficient algorithm for incremental mining of temporal association rules. Data and Knowledge Engineering, 69(8), 800–815.
Giuffrida, G., Chu, W. W., & Hanssens, D. M. (2000). Mining classification rules from datasets with large number of many-valued attributes. In 7th International conference on extending database technology: Advances in database technology (EDBT'00) (pp. 335–349). Munich, Germany.
Hong, T. P., & Wang, C. J. (2010). An efficient and effective association-rule maintenance algorithm for record modification. Expert Systems with Applications, 37(1), 618–626.
Hong, T. P., Lin, C. W., & Wu, Y. L. (2009). Maintenance of fast updated frequent pattern trees for record deletion. Computational Statistics and Data Analysis, 53(7), 2485–2499.
Hong, T. P., Wang, C. Y., & Tseng, S. S. (2011). An incremental mining algorithm for maintaining sequential patterns using pre-large sequences. Expert Systems with Applications, 38(6), 7051–7058.
Kaya, M. (2010). Autonomous classifiers with understandable rule using multi-objective genetic algorithms. Expert Systems with Applications, 37(4), 3489–3494.
Li, W., Han, J., & Pei, J. (2001). CMAR: Accurate and efficient classification based on multiple class-association rules. In 1st IEEE international conference on data mining (pp. 369–376). San Jose, CA, USA.
Lim, A. H. L., & Lee, C. S. (2010). Processing online analytics with classification and association rule mining. Knowledge-Based Systems, 23(3), 248–255.
Lin, C. W., Hong, T. P., & Lu, W. H. (2009). The pre-FUFP algorithm for incremental mining. Expert Systems with Applications, 36(5), 9498–9505.
Liu, B., Hsu, W., & Ma, Y. (1998). Integrating classification and association rule mining. In 4th International conference on knowledge discovery and data mining (pp. 80–86). New York, USA.
Liu, B., Ma, Y., & Wong, C. K. (2000). Improving an association rule based classifier. In 4th European conference on principles of data mining and knowledge discovery (pp. 80–86). Lyon, France.
Liu, Y. Z., Jiang, Y. C., Liu, X., & Yang, S. L. (2008). CSMC: A combination strategy for multiclass classification based on multiple association rules. Knowledge-Based Systems, 21(8), 786–793.
Nguyen, T. T. L., Vo, B., Hong, T. P., & Thanh, H. C. (2012). Classification based on association rules: A lattice-based approach. Expert Systems with Applications, 39(13), 11357–11366.
Priss, U. (2002). A classification of associative and formal concepts. In The Chicago Linguistic Society's 38th annual meeting (pp. 273–284). Chicago, USA.
Qodmanan, H. R., Nasiri, M., & Minaei-Bidgoli, B. (2011). Multi objective association rule mining with genetic algorithm without specifying minimum support and minimum confidence. Expert Systems with Applications, 38(1), 288–298.
Quinlan, J. R. (1992). C4.5: Program for machine learning. Morgan Kaufmann.
Sun, Y., Wang, Y., & Wong, A. K. C. (2006). Boosting an associative classifier. IEEE Transactions on Knowledge and Data Engineering, 18(7), 988–992.
Thabtah, F., Cowling, P., & Hammoud, S. (2006). Improving rule sorting, predictive accuracy and training time in associative classification. Expert Systems with Applications, 31(2), 414–426.
Thabtah, F., Cowling, P., & Peng, Y. (2004). MMAC: A new multi-class, multi-label associative classification approach. In 4th IEEE international conference on data mining (pp. 217–224). Brighton, UK.
Thabtah, F., Cowling, P., & Peng, Y. (2005). MCAR: Multi-class classification based on association rule. In 3rd ACS/IEEE international conference on computer systems and applications (pp. 33–39). Tunis, Tunisia.
Thonangi, R., & Pudi, V. (2005). ACME: An associative classifier based on maximum entropy principle. In 16th International conference on algorithmic learning theory (pp. 122–134). LNAI 3734, Singapore.
Tolun, M. R., & Abu-Soud, S. M. (1998). ILA: An inductive learning algorithm for production rule discovery. Expert Systems with Applications, 14(3), 361–370.
Tolun, M. R., Sever, H., Uludag, M., & Abu-Soud, S. M. (1999). ILA-2: An inductive learning algorithm for knowledge discovery. Cybernetics and Systems, 30(7), 609–628.
Veloso, A., Meira, W., Jr., & Zaki, M. J. (2006). Lazy associative classification. In 2006 IEEE international conference on data mining (ICDM'06) (pp. 645–654). Hong Kong, China.
Veloso, A., Meira, W., Jr., Goncalves, M., & Zaki, M. J. (2007). Multi-label lazy associative classification. In 11th European conference on principles of data mining and knowledge discovery (pp. 605–612). Warsaw, Poland.
Veloso, A., Meira, W., Jr., Goncalves, M., Almeida, H. M., & Zaki, M. J. (2011). Calibrated lazy associative classification. Information Sciences, 181(13), 2656–2670.
Vo, B., & Le, B. (2008). A novel classification algorithm based on association rule mining. In The 2008 Pacific Rim knowledge acquisition workshop (held with PRICAI'08) (pp. 61–75). LNAI 5465, Ha Noi, Viet Nam.
Yang, G., Mabu, S., Shimada, K., & Hirasawa, K. (2011). An evolutionary approach to rank class association rules with feedback mechanism. Expert Systems with Applications, 38(12), 15040–15048.
Yin, X., & Han, J. (2003). CPAR: Classification based on predictive association rules. In SIAM international conference on data mining (SDM'03) (pp. 331–335). San Francisco, CA, USA.
Zhang, X., Chen, G., & Wei, Q. (2011). Building a highly-compact and accurate associative classifier. Applied Intelligence, 34(1), 74–86.
Zhao, S., Tsang, E. C. C., Chen, D., & Wang, X. Z. (2010). Building a rule-based classifier – A fuzzy-rough set approach. IEEE Transactions on Knowledge and Data Engineering, 22(5), 624–638.