DSpace at VNU: A method for mining top-rank-k frequent closed itemsets




A method for mining top-rank-k frequent closed itemsets

Loan T.T. Nguyen a,b,∗, Truc Trinh c, Ngoc-Thanh Nguyen d and Bay Vo e

a Division of Knowledge and System Engineering for ICT, Ton Duc Thang University, Ho Chi Minh City, Vietnam
b Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam
c VOV College, Ho Chi Minh City, Vietnam
d Faculty of Computer Science and Management, Wroclaw University of Science and Technology, Wrocław, Poland
e Faculty of Information Technology, Ho Chi Minh City University of Technology, Vietnam

∗ Corresponding author. Loan T.T. Nguyen, Division of Knowledge and System Engineering for ICT, Ton Duc Thang University, Ho Chi Minh City, Vietnam. E-mail: nguyenthithuyloan@tdt.edu.vn

Journal of Intelligent & Fuzzy Systems xx (20xx) x–xx, DOI:10.3233/JIFS-169128, IOS Press
1064-1246/16/$35.00 © 2016 – IOS Press and the authors. All rights reserved
L.T.T. Nguyen et al. / A method for mining top-rank-k frequent closed itemsets

Abstract. Mining frequent closed itemsets (FCIs) is important in mining non-redundant (minimal) association rules. Therefore, many algorithms have been developed for mining FCIs with reduced mining time and memory usage. For mining FCIs, algorithms use the minimum support threshold, minSup, to prune itemsets. However, using a fixed minSup is not suitable for mining top-rank-k FCIs. A large threshold generates only a small number of FCIs, leaving too few FCIs to query when k is large. On the other hand, a small minSup generates a huge number of FCIs, leading to long runtimes and high memory usage. In this paper, we propose a method for mining top-rank-k FCIs without using a fixed minimum support threshold. A strategy is first used to eliminate 1-items that cannot generate FCIs belonging to top-rank-k FCIs. Next, based on the set of candidate 1-items, we propose TRK-FCI, a DCI-Plus-based algorithm, for mining top-rank-k FCIs. In the process of mining top-rank-k FCIs, TRK-FCI automatically increases minSup according to the mined FCIs, efficiently pruning itemsets that cannot belong to top-rank-k FCIs. We also modify the dynamic bit vector (DBV) structure and apply it to reduce memory usage and runtime in the TRK-FCI-DBV algorithm. Experimental results show that TRK-FCI-DBV is more efficient than TRK-FCI for various databases.

Keywords: DCI-Plus, dynamic bit vectors, frequent closed itemsets, top-rank-k frequent closed itemsets

1. Introduction

Data mining is the process of extracting interesting knowledge from data. Various methods for discovering knowledge have been proposed, such as mining traditional association rules [1–4, 6, 7, 23, 31, 36, 37], mining non-redundant association rules [8, 41], mining minimal non-redundant association rules [26, 27], mining most generalization association rules [38], classification using decision trees [13, 20, 30] or ILA [33], classification based on association rules [13, 14, 20, 21], and clustering [22]. Mining association rules has many applications in practice [3, 23]. For mining association rules, frequent itemsets [2, 11, 21, 42], frequent closed itemsets (FCIs) [15, 21, 26, 28, 31, 32, 37, 40–42], or maximal frequent itemsets [12, 19] must be mined. Mining frequent itemsets is often used for generating all association rules that satisfy the minimum support threshold (minSup) and the minimum confidence threshold (minConf) [1, 2, 35, 36], and mining FCIs is used for mining (minimal) non-redundant association rules (i.e., rules considered redundant based on certain criteria are eliminated) [26, 27, 41]. When maximal frequent itemsets are mined, all frequent itemsets or FCIs must still be generated to mine the above kinds of rules, and the database must be scanned to compute the supports of these itemsets.

Mining FCIs is important for pruning redundant rules. The problem was first stated in 1999 by Pasquier et al. [26]. Since then, many algorithms have been developed to enhance the efficiency of mining FCIs, such as those based on FP-tree [11, 28, 40], IT-tree [32, 42], bit vectors [31, 37], and N-lists [15]. To mine FCIs, minSup is set and the FCIs that satisfy the minSup threshold are selected. It is difficult to mine a sufficient number of top-rank-k FCIs in this way because an excessively high threshold will lead to very few FCIs, not enough to query. Conversely, a minSup that is too low will lead to a very large number of FCIs, requiring a lot of memory and time to mine. Therefore, developing efficient algorithms for mining top-rank-k FCIs is necessary.

Some algorithms have been developed for mining top-rank-k frequent itemsets. Deng [5] proposed the NTK algorithm, which uses a Node-list to mine top-rank-k frequent itemsets. The iNTK algorithm, an improved version of NTK proposed by Le et al. [14], uses the subsume concept and the N-list structure to mine top-rank-k itemsets quickly. After that, some algorithms were developed for mining top-k frequent itemsets [29], top-k FCIs [39], and top-k non-redundant association rules [8]. Mining top-rank-k FCIs is important for mining non-redundant association rules. However, to the best of our knowledge, no algorithms have been developed for mining top-rank-k FCIs. Moreover, algorithms developed for mining top-rank-k frequent itemsets or top-k FCIs cannot be applied to mine top-rank-k FCIs.

Therefore, in this paper, we propose the TRK-FCI algorithm, which is based on DCI-Plus [31], for mining top-rank-k FCIs. First, the algorithm finds a set of candidate items that may belong to top-rank-k FCIs, where k is a given threshold. Then, it uses the DCI-Plus algorithm to generate FCIs based on these candidate items. When an FCI is generated, it is directly inserted into a table named tab_k. FCIs with the same support are stored in the same entry. The number of entries in tab_k is at most the threshold k. In the process of mining top-rank-k FCIs, the algorithm automatically increases minSup to reduce the number of FCI candidates that do not belong to tab_k. Because DCI-Plus uses fixed bit vectors, it has high memory usage and runtime for storing and computing the bit vector of a new itemset, checking subsets, and computing the supports of itemsets. TRK-FCI-DBV, an improved version of TRK-FCI, is therefore developed. TRK-FCI-DBV uses the dynamic bit vector (DBV) structure instead of the bit vector structure to reduce mining time and memory usage.

The rest of this paper is organized as follows. Section 2 presents definitions of FCIs and top-rank-k FCIs and states the problem of mining top-rank-k FCIs. In Section 3, we review works related to the problem of mining FCIs, top-k and top-rank-k frequent itemsets, and top-k FCIs. Section 4 describes a method for mining top-rank-k FCIs based on the DCI-Plus algorithm and an improved algorithm based on DBVs. Experimental results on standard databases for TRK-FCI and TRK-FCI-DBV are presented in Section 5. Conclusions and suggestions for future work are given in Section 6.

2. Definitions and problem statement

Let I = {i_1, i_2, ..., i_m} be a set of items and DB = {t_1, t_2, ..., t_n} be a set of transactions, where each t_i (1 ≤ i ≤ n) is a transaction labeled by a unique identifier and contains a set of items in I.

Definition 1 (support of an itemset). Given a DB and an itemset X (X ⊆ I), the support of X, denoted by SUP_X, is the number of transactions containing X in DB.

Definition 2 (frequent itemset). Given a DB and an itemset X (X ⊆ I), X is a frequent itemset if SUP_X ≥ minSup.

Definition 3 (FCI). Given a DB and an itemset X (X ⊆ I), X is called an FCI if no itemset Y exists such that X ⊂ Y and SUP_X = SUP_Y.

Definition 4 (rank of an FCI). Given a set CI including all closed itemsets from a transaction database DB and an FCI X (X ∈ CI),
the rank of X in CI is the number of distinct support values of itemsets in CI that are no less than the support of X. The rank of X is defined as:

R_X = |{SUP_Y | Y ∈ CI ∧ SUP_Y ≥ SUP_X}|

Definition 5 (a top-rank-k FCI). Given a set CI including all closed itemsets from a transaction database DB and a threshold k, an itemset X ∈ CI is called a top-rank-k FCI if and only if R_X is no greater than k, i.e., R_X ≤ k.

Definition 6 (mining top-rank-k FCIs). Given a set CI including all FCIs from a transaction database DB and a threshold k, the goal of mining top-rank-k FCIs is to find the complete set of FCIs whose ranks are no greater than k, i.e., the set {X ∈ CI | R_X ≤ k}.

From Definition 6, the problem of mining top-rank-k FCIs is stated as follows. Given a database DB and a threshold k, mining top-rank-k FCIs is divided into two steps:

Step 1: Mine all closed itemsets in DB, a set called CI.
Step 2: Keep the closed itemsets in CI that satisfy Definition 5.

The above approach is simple but not feasible because the number of closed itemsets in the database is often large. Therefore, finding a direct solution for mining top-rank-k FCIs without mining all closed itemsets is a challenge.

3. Related works

3.1. Mining frequent closed itemsets

The problem of mining FCIs was first proposed in 1999 [26]. Many algorithms for mining FCIs have since been developed to reduce runtime and memory usage. Apriori-based algorithms for this purpose include Close [26] and A-Close [27]. These algorithms generate candidates and compute their closures to find FCIs. Algorithms based on the divide-and-conquer technique have also been developed. Closet [28] uses an FP-tree to compress the database and early pruning to prune non-closed itemsets. Closet+ [40], an improved version of Closet (which uses a bottom-up projection scheme for the FP-tree), uses a hybrid approach: bottom-up for dense databases and top-down for sparse databases. It uses item merging and sub-itemset pruning, which are widely used in other algorithms, applies the subset checking strategy to check closed itemsets quickly, and uses item skipping to eliminate items at high levels that have the same support as that of items at low levels. FPClose [11] is an improved version of Closet+ that uses an FP-array to reduce the number of FP-tree scans when the FP-tree is projected.

CHARM [42] is based on tidsets for quickly computing the supports of itemsets and uses subset checking to prune non-closed itemsets quickly. To check whether a generated itemset is closed, CHARM uses a hash table in which the key of each itemset is the sum of its items. dCHARM, a diffset approach for mining FCIs, was also developed [42]. CloseMiner [32] uses closed tidsets to check whether an itemset is closed. Although CHARM, dCHARM, and CloseMiner have advantages over algorithms based on the horizontal data format, such as Close, A-Close, Closet, Closet+, and FPClose, they must use hash tables to check whether a candidate itemset is closed, and thus closed itemsets must be stored in main memory for easy checking.

DCI-Closed [21] uses tidsets and a non-duplication generation strategy for mining FCIs. DCI-Plus [31], an improved version of DCI-Closed [21], generates FCIs and the minimal generators of each FCI. Because DCI-Closed is based on tidsets, when the tidsets of itemsets are long, a lot of memory is required to store them and the runtime required to compute their intersections with other tidsets is high. To reduce the length of tidsets and the computation time, DCI-Plus uses a BitTable.

3.2. Mining top-rank-k frequent itemsets

Deng proposed the NTK algorithm for mining top-rank-k frequent itemsets [5]. NTK uses the Node-list data structure to represent itemsets and uses a level-wise approach for mining top-rank-k frequent itemsets, i.e., t-patterns are used to form (t+1)-patterns. By using Node-lists, the algorithm does not need to rescan the database to compute the supports of itemsets. A dynamic minSup is used to prune candidates efficiently. Le et al. developed iNTK [14], an improved version of NTK. iNTK uses the subsume concept to generate fewer candidates than NTK, which reduces the time required to generate candidates.

3.3. Mining top-k frequent closed itemsets

Wang et al. [39] proposed the TFP algorithm for mining top-k FCIs, where k is the number of FCIs that need to be mined. TFP uses a divide-and-conquer technique (like FP-Growth) and prunes candidates based on minSup (automatically increased in the process of updating candidates). The authors also used a threshold l to eliminate itemsets whose lengths are smaller than l.

3.4. Mining top-k association rules

In 2012, Fournier-Viger et al. [10] proposed the TopKRules algorithm for mining top-k association rules from datasets. This algorithm uses the minConf value during the mining process of top-k rules. The change of the minSup value depends on the lowest support of itemsets. The TopKRules algorithm is based on the principle of extending rules and on some methods for the early elimination of rules that do not belong to the top-k rules. Fournier-Viger and Tseng also extended TopKRules for mining top-k non-redundant rules [8] and top-k sequential rules [9]. These algorithms are very efficient compared to post-processing methods.

3.5. Dynamic bit vectors

In 2012, Vo et al. [37] proposed the concept of dynamic bit vectors (DBVs) and used it in mining frequent closed itemsets. The DBV of an itemset is a bit vector in which the zero bits at the beginning and the end are removed. This concept saves the memory needed to store bit vectors and the time needed to compute the intersection of bit vectors. Tran et al. extended this concept to mine frequent closed sequences [34]. Le et al. also used DBVs to develop an efficient algorithm for mining frequent closed inter-sequence patterns [16].

4. Proposed algorithms

4.1. TRK-FCI algorithm

In this section, we present the TRK-FCI algorithm for mining top-rank-k FCIs based on BitTable. TRK-FCI uses DCI-Plus [31] to generate candidate closed itemsets and applies some early pruning techniques to prune candidates. First, the algorithm chooses a set of candidate items that may belong to top-rank-k FCIs, where k is a given threshold. Then, it uses the DCI-Plus algorithm to generate FCIs based on these candidate items. When an FCI is generated, it is directly inserted into a table named tab_k. FCIs with the same support are stored in the same entry. The number of entries in tab_k is at most the threshold k. In the process of mining top-rank-k FCIs, the algorithm automatically increases minSup to reduce the number of FCI candidates that do not belong to tab_k.

Fig. 1. TRK-FCI algorithm for mining top-rank-k FCIs.

In the above algorithm, database D is first scanned to compute the BitTable and determine the single items F1. These items are sorted in descending order according to their supports; if two items have the same support, then they are sorted in increasing lexicographical order. Next, the algorithm creates F2 by inserting items from F1 into F2 until the number of items in F2 with distinct BitTables is equal to k. The items in F2 are sorted in increasing order according to their supports; if two items have the same support, then they are sorted in increasing lexicographical order. POST_SET is created by computing the closure of each item in F2. The procedure DCI_CLOSED++ is called with the input CLOSED_SET = ∅, PRE_SET = ∅, POST_SET, and minSup, where minSup is the support of the first item in F2 if the number of items with distinct BitTables in F2 is equal to k; otherwise, minSup is set to …

Fig. 2. DCI_CLOSED++ procedure.

4.2. Illustration

Consider the database in Table 1, which includes 10 transactions and 10 items. Assume that k = 5. The process of mining top-rank-k FCIs is as follows. First, the BitTable and support of each item are obtained, as shown in Table 2. After F1 is sorted, we have F1 = {G, F, E, H, D, C, B, A, J, I}. Next, we choose items from F1 that may belong to top-rank-k FCIs and store them in F2, i.e., F2 = {A, C, D, H, E, F, G} (after sorting). Because A has the same BitTable as C and E has the same BitTable as F, they are grouped into two groups, (A, C) and (E, F), respectively. After grouping, the algorithm computes the closure of each item. The results are shown in Table 3. From Table 3, we have POST_SET = {ACEFG, CEFG, DEF, H, EF, F, G}.

Table 1. Transaction database D

Transaction   Items
1             A, C, E, F, G, H
2             D, E, F, H
3             G, H, I
4             B, D, E, F, G
5             D, E, F, G
6             G, H
7             A, C, E, F, G
8             B, E, F, H
9             D, E, F, G, H
10            D, E, F, G, H, J

Table 2. Items in D with their BitTable and support

Item   BitTable   Support
A      520        0.2
B      68         0.2
C      520        0.2
D      355        0.5
E      879        0.8
F      879        0.8
G      763        0.8
H      919        0.7
I      128        0.1
J      1          0.1

Table 3. BitTable and closure of items in F2

Item   BitTable   Closure   Support
A      520        ACEFG     0.2
C      520        CEFG      0.2
D      355        DEF       0.5
H      919        H         0.7
E      879        EF        0.8
F      879        F         0.8
G      763        G         0.8
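The preparation steps of Section 4.1 can be replayed on the example of Table 1 with a short Python sketch. It is an illustration under stated assumptions, not the authors' implementation: bit vectors are stored as plain integers with transaction 1 as the most significant bit (the encoding that matches the values in Table 2), the helper names `build_bit_table`, `select_candidates`, and `closures` are hypothetical, the candidate selection rule is one reading of the text that happens to reproduce F2 of the example, and the closure of an item is taken over the items that follow it in F2, which is what Table 3 appears to list.

```python
# Transaction database D from Table 1 (transaction id -> items).
DB = {
    1: "ACEFGH", 2: "DEFH", 3: "GHI", 4: "BDEFG", 5: "DEFG",
    6: "GH", 7: "ACEFG", 8: "BEFH", 9: "DEFGH", 10: "DEFGHJ",
}


def build_bit_table(db):
    """Return {item: int}; transaction 1 is the most significant bit (assumed)."""
    n = len(db)
    table = {}
    for tid, items in db.items():
        for item in items:
            table[item] = table.get(item, 0) | (1 << (n - tid))
    return table


def count_bits(bits):
    """Absolute support = number of transactions whose bit is set."""
    return bin(bits).count("1")


def support(bits, n):
    """Relative support, as reported in Tables 2 and 3."""
    return count_bits(bits) / n


def select_candidates(table, k):
    """Assumed selection rule: walk items by descending support and keep an
    item while fewer than k distinct bit vectors were kept, or when its bit
    vector was already kept. The result is sorted by ascending support."""
    f1 = sorted(table, key=lambda it: (-count_bits(table[it]), it))
    seen, f2 = set(), []
    for item in f1:
        bits = table[item]
        if bits in seen or len(seen) < k:
            seen.add(bits)
            f2.append(item)
    return sorted(f2, key=lambda it: (count_bits(table[it]), it))


def closures(f2, table):
    """Closure of each item over the items that follow it in F2: a later item
    joins the closure when its bit vector covers the current item's."""
    post_set = []
    for pos, item in enumerate(f2):
        bits = table[item]
        later = [o for o in f2[pos + 1:] if bits & table[o] == bits]
        post_set.append(item + "".join(later))
    return post_set


BIT_TABLE = build_bit_table(DB)
F2 = select_candidates(BIT_TABLE, k=5)
POST_SET = closures(F2, BIT_TABLE)

print(F2)        # candidate items in processing order
print(POST_SET)  # closures used as the initial POST_SET
```

Using one integer per item keeps the subset test a single AND and comparison, which is the property the BitTable representation relies on; the same test is what the DBV variant of Section 4.3 performs on trimmed vectors.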
Procedure DCI_CLOSED++ is called with the input PRE_SET = ∅, POST_SET, CLOSED_SET = ∅, and minSup = supp(A) = 0.2. The first element of POST_SET (ACEFG) is set to I. Because PRE_SET = ∅ and supp(ACEFG) = minSup, ACEFG is an FCI, and it is put into tab_k with its key, which is its support (0.2). ACEFG is also inserted into PRE_SET. Next, itemset CEFG is processed. Because supp(CEFG) = minSup and the BitTable of CEFG is a subset of the BitTable of ACEFG in PRE_SET, CEFG is pruned. When DEF is processed, because supp(DEF) > minSup and its BitTable is not a subset of the BitTable of any itemset in PRE_SET, DEF is an FCI, and it is put into tab_k with its key, which is its support (0.5).

After that, procedure DCI_CLOSED++ is called with PRE_SET = {ACEFG}, CLOSED_SET_new = DEF, and POST_SET_new = {H, G}. EF and F do not appear in POST_SET_new because they belong to CLOSED_SET. Because CLOSED_SET ≠ ∅, DEF is joined with H to create a new generator (newgen), which is DEFH. Similarly, DEFH is an FCI and is inserted into tab_k with its key (0.3). The procedure is called recursively with parameters PRE_SET = {ACEFG}, CLOSED_SET_new = DEFH, and POST_SET_new = {G}. Because CLOSED_SET ≠ ∅, its generator is DEFHG, and there is no itemset X in PRE_SET such that the BitTable of DEFHG is a subset of the BitTable of X, DEFGH is an FCI and is inserted into tab_k with its key (0.2). Now, POST_SET = ∅ and thus DEF is added into PRE_SET. The process continues by joining DEF with G to form DEFG. DEFG is also an FCI and it is inserted into tab_k with its key (0.4).

The algorithm then starts with a newgen H. H is an FCI and is inserted into tab_k with its key (0.7). Note that the number of entries in tab_k is now 5, which is equal to k. The algorithm continues to insert generated FCIs into tab_k. They include HEF (key is 0.5), HEFG (key is 0.3), and HG (key is 0.5). Consider the process of inserting FCI EF (whose key is 0.8) into tab_k. Because the key of EF is greater than that of the last entry (DEFGH) in tab_k (key is 0.2), DEFGH is removed, EF is inserted into tab_k, and minSup is set to 0.3 (the key of the last entry in tab_k). The algorithm continues to process the other FCIs. The results are shown in Table 4.

Table 4. Top-rank-k FCIs generated according to the TRK-FCI algorithm

k   key/sup   FCIs
1   0.8       {EF}, {G}
2   0.7       {H}
3   0.6       {EFG}
4   0.5       {DEF}, {HEF}, {HG}
5   0.4       {DEFG}

4.3. Improved algorithm

The TRK-FCI algorithm is based on the DCI-Plus algorithm. Because DCI-Plus uses bit vectors to represent the tidsets of items, it requires more memory to store bit vectors and more time to compute the intersection of bit vectors when the number of transactions in the database is large. To reduce the mining time and memory usage, we develop an improved algorithm that uses DBVs instead. Table 5 shows the process of using DBVs for mining top-rank-k FCIs. It gives the items, supports, closures, and DBVs of F2. Procedure DCI_CLOSED++ is the same as that in TRK-FCI, but the operations on the BitTable are replaced by operations on DBVs. The final results are the same as those obtained with TRK-FCI.

Table 5. DBVs, closures, and supports of items in F2

Item   DBV       Closure   Support
A      {0,520}   ACEFG     0.2
C      {0,520}   CEFG      0.2
D      {0,355}   DEF       0.5
H      {0,919}   H         0.7
E      {0,879}   EF        0.8
F      {0,879}   F         0.8
G      {0,763}   G         0.8

5. Experiments

The algorithms used in the experiments were implemented in C# 2012 on a personal computer with an i5-4200U 1.60-GHz CPU and … GB of RAM running Windows 8.1. The experiments were conducted on three databases downloaded from the UCI Machine Learning Repository (http://fimi.ua.ac.be/data). Table 6 shows the characteristics of the experimental databases.

Table 6. Characteristics of experimental databases

Database    # of transactions   # of items
Chess       3196                76
Pumsb       49046               7117
Accidents   340183              468

The experimental databases have different features. The Pumsb and Accidents databases have many transactions (or records), whereas the Chess database is small (3196 transactions).

5.1. Execution time

The efficiency of applying BitTable and DBVs for mining top-rank-k FCIs was evaluated. The experiments were conducted with various values of threshold k for the Accidents, Chess, and Pumsb databases. As the threshold k increases, the number of FCIs increases, which increases the time required to obtain the top-rank-k FCIs. Figures 3 to 5 show that the time required for mining top-rank-k FCIs from the three databases increases with increasing k. TRK-FCI-DBV runs faster than TRK-FCI. For example, consider the Pumsb database with a threshold k of 200. The mining time of TRK-FCI is 179.8 s and that of TRK-FCI-DBV is 130.7 s. Most of the processing time for both algorithms is spent in the itemset expansion stage. TRK-FCI-DBV has a lower processing time because it uses a better data format.

Figs. 3–5. Runtimes of TRK-FCI-DBV and TRK-FCI for the Accidents, Chess, and Pumsb databases.

Figs. 6–8. Memory usage of TRK-FCI-DBV and TRK-FCI for the Chess, Accidents, and Pumsb databases.

5.2. Memory usage

Figures 6 to 8 show that the memory usage for mining top-rank-k FCIs for the three experimental databases increases with increasing threshold k. The memory
required by TRK-FCI-DBV is significantly less than that required by TRK-FCI. Consider the Pumsb database: with a threshold k of 120, the memory usage values of the two algorithms are similar; however, when the threshold k is increased to 200, the memory used by TRK-FCI is nearly double that used by TRK-FCI-DBV.

6. Conclusion and future work

This paper proposed a method for mining top-rank-k FCIs based on DCI-Plus. Two efficient algorithms, TRK-FCI and TRK-FCI-DBV, were proposed. These two algorithms differ in the way they represent data for each itemset, which gives them different mining times and memory usage values. A strategy is used to automatically change minSup to prune candidates in the mining process. The mining time and memory usage of the two algorithms were analyzed to compare the effectiveness of DBV with that of BitTable. In the future, we will study how to prune candidates more efficiently. Moreover, we will try to use other approaches for mining top-rank-k FCIs. We will also expand our research to quantitative databases.

References

[1] R. Agrawal, T. Imielinski and A. Swami, Mining association rules between sets of items in large databases, In Proc. of the 1993 ACM SIGMOD Conference, Washington DC, USA, 1993, pp. 207–216.
[2] R. Agrawal and R. Srikant, Fast algorithms for mining association rules in large databases, In Proc. of the 20th International Conference on Very Large Data Bases, San Francisco, CA, USA, 1994, pp. 487–499.
[3] S. Ayubi, M.K. Muyeba, A. Baraani and J. Keane, An algorithm to mine general association rules from tabular data, Information Sciences 179(20) (2009), 3520–3539.
[4] E. Baralis, L. Cagliero, T. Cerquitelli and P. Garza, Generalized association rule mining with constraints, Information Sciences 194 (2011), 68–84.
[5] Z.H. Deng, Fast mining top-rank-k frequent patterns by using Node-lists, Expert Systems with Applications 41(4) (2014), 1763–1768.
[6] Y.J. Du and H.M. Li, Strategy for mining association rules for web pages based on formal concept analysis, Applied Soft Computing 10 (2010), 772–783.
[7] H.V. Duong and T.C. Truong, An efficient method for mining association rules based on minimum single constraints, Vietnam Journal of Computer Science 2(2) (2015), 67–83.
[8] P. Fournier-Viger and V.S. Tseng, Mining top-k non-redundant association rules, In Proc. of 20th International Symposium, ISMIS 2012, Macau, China, 7661, 2012, pp. 31–40.
[9] P. Fournier-Viger and V.S. Tseng, Mining top-K sequential rules, In Proc. of ADMA 2011, Beijing, China, 7121, 2011, pp. 180–194.
[10] P. Fournier-Viger, C.W. Wu and V.S. Tseng, Mining top-K association rules, In Proc. of Canadian Conference on AI 2012, Toronto, Canada, 7310, 2011, pp. 61–73.
[11] G. Grahne and J. Zhu, Fast algorithms for frequent itemset mining using FP-trees, IEEE Transactions on Knowledge and Data Engineering 17(10) (2005), 1347–1362.
[12] K. Gouda and M.J. Zaki, GenMax: An efficient algorithm for mining maximal frequent itemsets, Data Mining and Knowledge Discovery 11(3) (2005), 223–242.
[13] T.R. Hoens, Q. Qian, N.V. Chawla and Z.H. Zhou, Building decision trees for the multiclass imbalance problem, In Proc. of PAKDD 2012, 2012, pp. 122–134.
[14] Q.H.T. Le, T. Le, B. Vo and B. Le, An efficient and effective algorithm for mining top-rank-k frequent patterns, Expert Systems with Applications 42(1) (2015), 156–164.
[15] T. Le and B. Vo, An N-list-based algorithm for mining frequent closed patterns, Expert Systems with Applications 42(9) (2015), 6648–6657.
[16] B. Le, M.T. Tran and B. Vo, Mining frequent closed inter-sequence patterns efficiently using dynamic bit vectors, Applied Intelligence 43(1) (2015), 74–84.
[17] W. Li, J. Han and J. Pei, CMAR: Accurate and efficient classification based on multiple class-association rules, In Proc. of the 1st IEEE International Conference on Data Mining, San Jose, California, USA, 2001, pp. 369–376.
[18] B. Liu, W. Hsu and Y. Ma, Integrating classification and association rule mining, In Proc. of the 4th International Conference on Knowledge Discovery and Data Mining, New York, USA, 1998, pp. 80–86.
[19] X.B. Liu, K. Zhai and W. Pedrycz, An improved association rules mining method, Expert Systems with Applications 39(1) (2012), 1362–1374.
[20] W.Y. Loh, Classification and regression trees, WIREs Data Mining and Knowledge Discovery 1(1) (2011), 14–23.
[21] C. Lucchese, S. Orlando and R. Perego, Fast and memory efficient mining of frequent closed itemsets, IEEE Transactions on Knowledge and Data Engineering 18(1) (2006), 21–36.
[22] S.T. Mai, X. He, J. Feng, C. Plant and C. Böhm, Anytime density-based clustering of complex data, Knowledge and Information Systems 45(2) (2015), 319–355.
[23] V. Nebot and R. Berlanga, Finding association rules in semantic web data, Knowledge-Based Systems 25 (2012), 51–62.
[24] L.T.T. Nguyen, B. Vo, T.P. Hong and H.C. Thanh, CAR-Miner: An efficient algorithm for mining class-association rules, Expert Systems with Applications 40(6) (2013), 2305–2311.
[25] D. Nguyen, L.T.T. Nguyen, B. Vo and T.P. Hong, A novel method for constrained class association rule mining, Information Sciences 320 (2015), 107–125.
[26] N. Pasquier, Y. Bastide, R. Taouil and L. Lakhal, Discovering frequent closed itemsets for association rules, In Proc. of the 5th International Conference on Database Theory, 1999, pp. 398–416.
[27] N. Pasquier, Y. Bastide, R. Taouil and L. Lakhal, Efficient mining of association rules using closed itemset lattices, Information Systems 24(1) (1999), 25–46.
[28] J. Pei, J. Han and R. Mao, CLOSET: An efficient algorithm for mining frequent closed itemsets, In Proc. of the 5th ACM-SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, Dallas, Texas, USA, 2000, pp. 11–20.
[29] G. Pyun and U. Yun, Mining top-k frequent patterns with combination reducing techniques, Applied Intelligence 41(1) (2014), 76–98.
[30] J.R. Quinlan, Induction of decision trees, Machine Learning 1(1) (1986), 81–106.
[31] J. Sahoo, A.K. Das and A. Goswami, An effective association rule mining scheme using a new generic basis, Knowledge and Information Systems 43(1) (2015), 127–156.
[32] N.G. Singh, S.R. Singh and A.K. Mahanta, CloseMiner: Discovering frequent closed itemsets using frequent closed tidsets, In Proc. of the 5th ICDM, Washington DC, USA, 2005, pp. 633–636.
[33] M.R. Tolun and S.M. Abu-Soud, ILA: An inductive learning algorithm for production rule discovery, Expert Systems with Applications 14(3) (1998), 361–370.
[34] M.T. Tran, B. Le and B. Vo, Combination of dynamic bit vectors and transaction information for mining frequent closed sequences efficiently, Engineering Applications of Artificial Intelligence 38 (2015), 183–189.
[35] B. Vo and B. Le, Mining traditional association rules using frequent itemsets lattice, 39th International Conference on CIE, Troyes, France, 2009, pp. 1401–1406.
[36] B. Vo and B. Le, Interestingness measures for association rules: Combination between lattice and hash tables, Expert Systems with Applications 38(9) (2011), 11630–11640.
[37] B. Vo, T.P. Hong and B. Le, DBV-Miner: A dynamic bit-vector approach for fast mining frequent closed itemsets, Expert Systems with Applications 39(8) (2012), 7196–7206.
[38] B. Vo, T.P. Hong and B. Le, A lattice-based approach for mining most generalization association rules, Knowledge-Based Systems 45 (2013), 20–30.
[39] J. Wang, J. Han, Y. Lu and P. Tzvetkov, TFP: An efficient algorithm for mining top-k frequent closed itemsets, IEEE Transactions on Knowledge and Data Engineering 17(5) (2005), 652–664.
[40] J. Wang, J. Han and J. Pei, CLOSET+: Searching for the best strategies for mining frequent closed itemsets, In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003, pp. 236–245.
[41] M.J. Zaki, Mining non-redundant association rules, Data Mining and Knowledge Discovery 9(3) (2004), 223–248.
[42] M.J. Zaki and C.J. Hsiao, Efficient algorithms for mining closed itemsets and their lattice structure, IEEE Transactions on Knowledge and Data Engineering 17(4) (2005), 462–478.

Date posted: 12/12/2017, 11:55


