DSpace at VNU: EIFDD: An efficient approach for erasable itemset mining of very dense datasets


Appl Intell
DOI 10.1007/s10489-014-0644-8

EIFDD: An efficient approach for erasable itemset mining of very dense datasets

Giang Nguyen · Tuong Le · Bay Vo · Bac Le

© Springer Science+Business Media New York 2015

Abstract Erasable itemset mining, first proposed in 2009, is an interesting problem in supply chain optimization. The dPidset structure, a very effective structure for mining erasable itemsets, was introduced in 2014. The dPidset structure outperforms previous structures such as the PID List and the NC Set, and algorithms based on dPidset can effectively mine erasable itemsets. However, for very dense datasets, the mining time and memory usage are large. Therefore, this paper proposes an effective approach that uses the subsume concept for mining erasable itemsets from very dense datasets. The subsume concept is used to determine early the information of a large number of erasable itemsets without the usual computational cost. Then, the erasable itemsets for very dense datasets (EIFDD) algorithm, which uses the subsume concept and the dPidset structure for the erasable itemset mining of very dense datasets, is proposed. An illustrative example is given to demonstrate the proposed algorithm. Finally, an experiment is conducted to show the effectiveness of EIFDD.

Keywords Pattern mining · Erasable itemset · Subsume concept · Dense datasets

G. Nguyen
Faculty of Information Technology, Ho Chi Minh City University of Technology, Ho Chi Minh City, Vietnam
e-mail: nh.giang@hutech.edu.vn

T. Le (✉) · B. Vo
Division of Data Science, Ton Duc Thang University, Ho Chi Minh City, Vietnam
e-mail: lecungtuong@tdt.edu.vn; tuonglecung@gmail.com

T. Le · B. Vo
Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam

B. Vo
e-mail: vodinhbay@tdt.edu.vn; bayvodinh@gmail.com

B. Le
Faculty of Information Technology, University of Science, VNU, Ho Chi Minh City, Vietnam
e-mail: lhbac@fit.hcmus.edu.vn

1 Introduction

Data mining, the computational process of discovering patterns in
large datasets, has become important due to the fast growth of data. Pattern mining, including frequent itemset mining [3, 9, 12, 16, 19, 21–23, 25, 26, 28, 29], top-rank-k frequent pattern mining [10, 11], and sequential pattern mining [18, 30], is an essential task in data mining, especially for association rule mining [1, 2, 17, 25] and its applications [4, 24]. In recent years, the mining of erasable itemsets [7] and top-rank-k erasable itemsets [5, 20] has been introduced.

Consider the following problem. A manufacturer produces many products, which are created from a number of items (components). Each product brings income for the manufacturer. Unfortunately, in a financial crisis, the budget is insufficient to purchase all necessary components. The erasable itemset mining problem is to find the itemsets that can be erased so as to minimize the loss of the factory's revenue. Managers can utilize the knowledge from these erasable itemsets to make a new production plan via a recommendation system.

There are many algorithms for mining erasable itemsets, including META [7], VME [8], MERIT [6], and MEI [14]. These algorithms were presented and compared in the survey [15]. The MERIT algorithm uses an NC Set generated from a WPPC-tree, whereas the MEI algorithm uses the dPidset structure. Based on the experimental results reported in [14], MEI is currently the most effective algorithm for mining erasable itemsets. However, the runtime and memory usage of MEI are still quite large for very dense datasets, i.e., datasets with a small number of total items, a large number of items for each product, and many co-occurring itemsets. Therefore, this paper proposes an improved algorithm that uses the subsume concept and the dPidset structure to enhance the process of mining erasable itemsets for very dense datasets.

The main contributions of this paper are as follows: (i) the subsume concept for erasable itemsets is described, (ii) a method for determining the subsume index associated with erasable
1-itemsets is proposed, and (iii) the erasable itemsets for very dense datasets (EIFDD) algorithm is proposed. To demonstrate the effectiveness of the proposed method for very dense datasets, experiments were conducted. The experimental results show that EIFDD outperforms the MEI and MERIT algorithms in terms of mining time and memory usage for very dense datasets.

The rest of the paper is organized as follows. Section 2 presents the basic concepts of erasable itemsets. Related work is presented in Section 3. The dPidset structure for quickly mining erasable itemsets, the subsume concept for erasable itemset mining, a fast method for determining the subsume index of erasable 1-itemsets, and EIFDD are introduced in Section 4. Performance studies are presented in Section 5 to show the effectiveness of EIFDD for very dense datasets. Section 6 gives a summary and a number of recommendations.

2 Basic concepts

Deng et al. [7] defined erasable itemsets and the problem of mining erasable itemsets, which are summarized below.

Let all items of a manufacturer be represented as I = {i1, i2, ..., im}. Let DB = {P1, P2, ..., Pn} be a product dataset of this manufacturer, where Pi is a product presented in the form ⟨Items, Val⟩. Items are the items that compose this product and Val is the revenue that the factory gets by selling this product. The example dataset presented in Table 1 is used throughout this article.

Table 1 Example dataset (DBE)

| Product | Items      | Val ($) |
|---------|------------|---------|
| P1      | a, b       | 1,000   |
| P2      | a, b, e    | 200     |
| P3      | c, e       | 150     |
| P4      | b, d, e, f | 50      |
| P5      | c, d, e    | 100     |
| P6      | d, e, f, h | 200     |
| P7      | d, h       | 150     |
| P8      | d, f, h    | 100     |

Definition 1 (revenue of an itemset) Let X (⊆ I) be an itemset. The revenue of X is determined as:

$$g(X) = \sum_{\{P_i \mid X \cap Items(P_i) \neq \emptyset\}} Val(P_i) \tag{1}$$

where:
– g(X) is the revenue of itemset X;
– Val(Pi) is the revenue of product Pi;
– Items(Pi) represents the items used to create Pi.

Example According to Definition 1, g(a) = Val(P1) + Val(P2) = 1200, and g(h) = Val(P6) + Val(P7) + Val(P8) = 450.

Definition 2 (erasable itemset)
Given a threshold ξ and a product dataset DB, let $T = \sum_{P_i \in DB} Val(P_i)$ be the total revenue of the factory. An itemset X is an erasable itemset if and only if:

$$g(X) \le T \times \xi \tag{2}$$

Example Let ξ = 40 %. For DBE, T = 1950. According to Definition 2, g(a) = 1200 > T × ξ = 40 % × 1950 = 780; therefore, item a is not an erasable itemset. In contrast, g(h) = 450 < 780, and hence item h is an erasable itemset. All EIs for DBE with ξ = 40 % are shown in Table 2.

Table 2 All EIs for DBE with ξ = 40 %

| Erasable itemset | Val ($) |
|------------------|---------|
| e                | 700     |
| e, c             | 700     |
| d                | 600     |
| d, h             | 600     |
| d, f             | 600     |
| d, h, f          | 600     |
| d, c             | 750     |
| d, c, h          | 750     |
| d, c, f          | 750     |
| d, c, h, f       | 750     |
| h                | 450     |
| h, f             | 500     |
| h, c             | 700     |
| h, c, f          | 750     |
| f                | 350     |
| f, c             | 600     |
| c                | 250     |

The erasable itemset mining problem is to find all erasable itemsets in the dataset, i.e., all itemsets whose revenue g(X) does not exceed T × ξ.

3 Related work

Deng et al. [7] proposed META, an Apriori-based algorithm, for mining erasable itemsets. However, this algorithm is slow because:
– META scans the database k + 1 times, where k is the maximum level of erasable itemsets.
– The strategy META uses to generate candidate itemsets is naïve: an erasable (k − 1)-itemset X is combined with all remaining erasable (k − 1)-itemsets to generate erasable k-itemsets.

Consequently, Deng and Xu [8] proposed the VME algorithm, which is faster than META. However, VME still has some significant disadvantages:
– It scans the dataset twice to determine which 1-itemsets are erasable (it is well established that scanning a dataset requires considerable computer time and memory).
– It uses an inefficient mechanism for generating candidate erasable k-itemsets in which the set of (k − 1)-itemsets is used to define candidate k-itemsets. For example, given the erasable 2-itemsets {ab, ac, ad, bc, bd, cd}, VME groups these according to their 1-itemset prefixes ({a}, {b}, and {c}) to give three groups: {ab, ac, ad}, {bc, bd}, and {cd}. VME then combines the elements of each group
to create the candidate erasable 3-itemsets, {abc, abd, acd} and {bcd}. However, this process is computationally expensive.
– It stores each product's revenue in the form of a tuple ⟨PID, Val⟩, where PID is the product identifier and Val is the revenue value. This leads to a duplication of data because a ⟨PID, Val⟩ pair can appear in many PID Lists associated with different erasable itemsets. Thus, the VME algorithm requires a lot of memory.
– It uses a strategy whereby the PID List associated with erasable itemset X is a subset of the PID List associated with erasable itemset Y, where X ⊂ Y. This strategy requires significant memory and computational power when large numbers of erasable itemsets are considered.

MERIT [6] uses the concept of NC Sets to reduce memory usage for mining EIs. Although the use of NC Sets gives MERIT some advantages over VME, there are still some disadvantages:
– The weight value of each node code (NC) is stored individually even though it can appear in many erasable itemsets' NC Sets, leading to a lot of duplication.
– It uses a strategy whereby itemset X's NC Set is assumed to be a subset of itemset Y's NC Set if X ⊂ Y. This leads to high memory consumption and high runtime when itemsets are combined to create new nodes.

MEI [14] uses the dPidset structure to quickly determine the information of erasable itemsets. Although its mining time and memory usage are better than those of the above algorithms, MEI's performance for mining erasable itemsets for very dense datasets can be improved.

4 EIFDD algorithm

4.1 dPidset structure

Definition 3 (pidset of an itemset) The pidset of itemset X is denoted as:

$$p(X) = \bigcup_{A \in X} p(A) \tag{3}$$

where:
– A is an item in itemset X;
– p(A) is the pidset of item A, i.e., the set of product identifiers (IDs) which have item A.

Definition 4 (revenue of an itemset based on pidset) The revenue of itemset X, denoted by g(X), is computed easily as follows:

$$g(X) = \sum_{P_i \in p(X)} Val(P_i) \tag{4}$$

Theorem 1 Let XA and XB be two itemsets with the same prefix X (X can
be an empty set). p(XA) and p(XB) are the pidsets of XA and XB, respectively. The pidset of XAB, denoted by p(XAB), is computed as follows:

$$p(XAB) = p(XB) \cup p(XA) \tag{5}$$

Example 1 For DBE, p(ab) = {1, 2, 4} and p(ac) = {1, 2, 3, 5}. According to Theorem 1, p(abc) = p(acb) = p(ab) ∪ p(ac) = {1, 2, 4} ∪ {1, 2, 3, 5} = {1, 2, 3, 4, 5}.

Definition 5 (dPidset of an itemset) The dPidset of itemset XAB, denoted by dP(XAB), is defined based on p(XA) and p(XB) as follows:

$$dP(XAB) = p(XB) \setminus p(XA) \tag{6}$$

where p(XB) \ p(XA) is the set of product IDs which exist only in p(XB).

Example 2 We have p(ab) = {1, 2, 4} and p(ac) = {1, 2, 3, 5}. Based on Definition 5, dP(abc) = p(ac) \ p(ab) = {1, 2, 3, 5} \ {1, 2, 4} = {3, 5}. Note that reversing the order of ab and ac gives a different result: dP(acb) = p(ab) \ p(ac) = {4}.

Theorem 2 Let XA and XB be two itemsets, and let dP(XA) and dP(XB) be the dPidsets of XA and XB, respectively. The dPidset of XAB, denoted by dP(XAB), is computed as follows:

$$dP(XAB) = dP(XB) \setminus dP(XA) \tag{7}$$

Example 3 Based on DBE, p(a) = {1, 2}, p(b) = {1, 2, 4}, and p(c) = {3, 5}. According to Definition 5, dP(ac) = p(c) \ p(a) = {3, 5} and dP(ab) = p(b) \ p(a) = {4}. Based on Theorem 2, dP(abc) = dP(ac) \ dP(ab) = {3, 5} \ {4} = {3, 5}. In Examples 2 and 3, dP(abc) = {3, 5}. Therefore, these examples verify Theorem 2.

Theorem 3 The revenue of XAB, denoted by g(XAB), is determined based on that of XA as follows:

$$g(XAB) = g(XA) + \sum_{P_i \in dP(XAB)} Val(P_i) \tag{8}$$

where g(XA) is the revenue of XA and Val(Pi) is the revenue of Pi.

Example 4 We have p(a) = {1, 2}, p(b) = {1, 2, 4}, and p(c) = {3, 5}, and thus g(a) = 1200, g(b) = 1250, and g(c) = 250. According to Example 3, dP(ac) = {3, 5} and dP(ab) = {4}, and thus g(ac) = g(a) + Val(P3) + Val(P5) = 1200 + 150 + 100 = 1450 and g(ab) = g(a) + Val(P4) = 1200 + 50 = 1250. Since dP(abc) = {3, 5}, g(abc) = g(ab) + Val(P3) + Val(P5) = 1250 + 150 + 100 = 1500.

4.2 Subsume concept

Definition 6 [11] The subsume index of an erasable 1-itemset A, denoted by Subsume(A), is defined as follows:

$$Subsume(A) = \{B \in I_1 \mid p(B) \subseteq p(A)\} \tag{9}$$

Example We have p(a) = {1, 2} and p(b) = {1, 2, 4}. Because p(a) ⊆ p(b), a ∈ Subsume(b).

Theorem 4 Given the subsume index of an item A, Subsume(A) = {a1, a2, ..., am}, the revenue of each of the $2^m - 1$ nonempty subsets of {a1, a2, ..., am} combined with A is equal to the revenue of A.

According to DBE, we have Subsume(d) = {f, h}. Therefore, the $2^m - 1$ nonempty subsets of Subsume(d) are {f, h, fh}. Based on Theorem 4, the revenue of the $2^m - 1$ itemsets formed by combining these subsets with d is equal to g(d). In this case, the revenue of each of {df, dh, dfh} is 600 dollars.

Theorem 5 Let A, B, and C ∈ I1 be three items. If A ∈ Subsume(B) and B ∈ Subsume(C), then A ∈ Subsume(C).

Proof We have A ∈ Subsume(B) and B ∈ Subsume(C), and hence p(A) ⊆ p(B) and p(B) ⊆ p(C). Therefore p(A) ⊆ p(C), and thus the theorem is proven.

4.3 Algorithm for finding subsume index associated with E1

Using the definition of the subsume index associated with erasable 1-itemsets (E1) based on pidsets (Definition 6), this paper proposes Algorithm 1 for finding this index. After determining the pidsets associated with E1 and sorting E1 in descending order of pidset length, the algorithm uses two loops to find the subsume index associated with E1, as shown in Fig. 1.

Fig. 1 Algorithm for finding subsume index associated with E1

4.4 EIFDD algorithm

The EIFDD algorithm is shown in Fig. 2. Firstly, the algorithm scans the dataset to determine E1 with their pidsets, and then sorts E1 in descending order of pidset length. Secondly, the algorithm calls Algorithm 1 to generate the subsume index associated with E1. Thirdly, the algorithm adds all EIs in E1 to the results. Finally, the algorithm calls the Expand E procedure, which uses the divide-and-conquer strategy and the subsume index associated with E1 to mine all erasable itemsets. The processes of this approach are described below.

For erasable 1-itemsets, the algorithm considers the first element, A, with the remaining elements in erasable 1-itemsets to create the erasable 2-itemsets. If A has a number of subsume values, all itemsets that combine A and the $2^m - 1$ nonempty subsets generated from Subsume(A) are considered erasable itemsets without calculating their pidsets and revenues. The remaining elements (∉ Subsume(A)) are combined with A to create 2-itemsets. For elements whose revenues do not exceed T × ξ, the algorithm: (i) adds them and the erasable itemsets that are combined with the $2^m - 1$ nonempty subsets generated from Subsume(A) to the results and (ii) adds them to Enext. The algorithm is called recursively with erasable 2-itemsets as the parameter to create erasable 3-itemsets. The algorithm continues until no EIs are created. The algorithm uses this strategy with all elements in E1 until all itemsets that can be created from the n elements of erasable 1-itemsets are considered.

Fig. 2 EIFDD algorithm

4.5 An illustrative example

Firstly, EIFDD scans DBE to find erasable 1-itemsets and their pidsets with ξ = 40 % (Fig. 3). Secondly, the algorithm finds the subsume index associated with erasable 1-itemsets (Table 3) by calling Algorithm 1 (Fig. 1). Thirdly, the algorithm calls the Expand E procedure for mining all erasable itemsets. The following four steps are used:

Step 1: e is considered with the remaining erasable 1-itemsets. Because Subsume(e) = {c}, ec is added to the results without determining its information. ed, ef, and eh are disqualified because their revenues exceed the threshold. Figure 4 shows the results of this step.

Step 2: d is considered with the remaining erasable 1-itemsets (h, f, and c). h and f belong to Subsume(d); therefore, the algorithm adds df, dh, and dfh to the results without determining their information. d is combined with c to create dc with g(dc) = 750 dollars. dc and the erasable itemsets {dch, dcf, dchf} created from Subsume(d) and dc are added to the results (Fig. 5).

Step 3: h is considered with the remaining erasable 1-itemsets (f and c) to create hf and hc with g(hf) = 500 dollars and g(hc) = 700 dollars. hf and hc are added to the results. They are used to create hfc with g(hfc) = 750 dollars, which is added to the results (Fig. 6).

Step 4: f is considered with c to create fc with g(fc) = 600 dollars. fc is added to the results.

Figure 7 shows all erasable itemsets for DBE with ξ = 40 %. As shown in Fig. 7, the algorithm does not compute and store the pidsets of the nodes {df, dh, dfh, dcf, dch, dcfh, ec}. Therefore, using the subsume concept reduces the runtime and memory usage of erasable itemset mining.

Fig. 3 Erasable 1-itemsets and their pidsets for DBE with ξ = 40 % (item, pidset, revenue): e {2, 3, 4, 5, 6} 700; d {4, 5, 6, 7, 8} 600; h {6, 7, 8} 450; f {4, 6, 8} 350; c {3, 5} 250

Table 3 Subsume index associated with erasable 1-itemsets

| Erasable 1-itemset | Subsume |
|--------------------|---------|
| e                  | c       |
| d                  | f, h    |
| f                  | –       |
| h                  | –       |
| c                  | –       |

Fig. 4 Erasable itemsets for DBE with ξ = 40 % (step 1)
Fig. 5 Erasable itemsets for DBE with ξ = 40 % (step 2)
Fig. 6 Erasable itemsets for DBE with ξ = 40 % (step 3)
Fig. 7 Erasable itemsets for DBE with ξ = 40 % (step 4)

4.6 Complexity analysis

Let m be the number of items in the product dataset. The search space for the enumeration of all erasable itemsets is $2^m - 1$. However, based on the anti-monotone characteristic (if X is inerasable, and Y is a superset of X, Y must also be inerasable), the overall complexity of algorithms such as MEI is $O(r \times n \times 2^l)$, where n is the number of products, l is the length of the longest erasable itemset, and r is the maximum number of erasable itemsets. The EIFDD algorithm reduces the search space for erasable itemset mining using the subsume concept. The complexity of determining the subsume index associated with erasable 1-itemsets is $O(m^2)$. In the best case, all erasable 1-itemsets which are behind an erasable itemset X are in the subsume index of X. Then, the complexity of finding all erasable itemsets is $O(m)$, and the overall complexity in this case is $O(m^2)$. In the worst case, none of the erasable 1-itemsets which are behind an erasable itemset X is in the subsume index of X. Then, the complexity of finding all erasable itemsets is $O(r \times n \times 2^l)$, and the overall complexity in this case is $O(r \times n \times 2^l + m^2)$. Fortunately, for very dense datasets, there are many elements in the subsume index associated with erasable 1-itemsets (see Section 5.1). Therefore, the best case is the most likely.

5 Performance studies

This section reports the performance of the proposed EIFDD algorithm, MEI, and MERIT for a number of dense datasets. All studies in this section were performed on a computer with an Intel Core i3-3110M 2.4-GHz CPU and … GB of RAM. All the programs were implemented in C# using Visual Studio and the .NET Framework (version 4.5.50709).

The experiments were conducted on the datasets Chess, Connect, and Mushroom (downloaded from http://fimi.cs.helsinki.fi/data/). In order to make these datasets look like the product datasets introduced in Section 2, a column was added into each dataset to store the product revenues. To generate revenue values, a function denoted by N(100, 50), where the mean value is 100 and the variance is 50, was created. The features of these datasets are shown in Table 4. These datasets are available at http://sdrv.ms/14eshVm.

Table 4 Features of datasets used in experiments

| Dataset  | Type       | # of products | # of items |
|----------|------------|---------------|------------|
| Chess    | Very Dense | 3,196         | 76         |
| Mushroom | Normal     | 8,124         | 120        |
| Connect  | Dense      | 67,557        | 130        |

5.1 Compactness of subsume index

Table 5 shows the number of subsume values associated with erasable 1-itemsets and the number of erasable 1-itemsets. Note that an erasable 1-itemset can have one or more subsume values. Therefore, the number of subsume values associated with erasable 1-itemsets can be greater than the number of erasable 1-itemsets. The EIFDD algorithm is more effective than the MEI algorithm for datasets with a large number of subsume values associated with erasable 1-itemsets. Figures 8, 9 and 10 show the numbers of nodes with subsume values and the total numbers of nodes for various thresholds for the Chess, Mushroom, and Connect datasets. For a large number of nodes, the pidsets and revenues do not need to be determined. Therefore, the mining time and memory usage are reduced.

Table 5 Numbers of subsume values associated with erasable 1-itemsets

| Dataset  | Threshold (%)           | # of erasable 1-itemsets | # of subsume values associated with erasable 1-itemsets |
|----------|-------------------------|--------------------------|---------------------------------------------------------|
| Chess    | 38, 43, 48, 53, 58      | 33, 36, 38, 39, 41       | 13, 20, 23, 24, 32                                      |
| Mushroom | 1.2, 1.7, 2.2, 2.7, 3.2 | 28, 29, 31, 37, 38       | 18, 18, 18, 42, 42                                      |
| Connect  | 2.3, 2.8, 3.3, 3.8, 4.3 | 33, 34, 35, 37, 39       | 6, 7, 7, …                                              |

Fig. 8 Number of nodes with subsume values and total number of nodes of EIFDD algorithm's results for Chess dataset (# of nodes, in millions, versus threshold in %)
Fig. 9 Number of nodes with subsume values and total number of nodes of EIFDD algorithm's results for Mushroom dataset (# of nodes, in millions, versus threshold in %)
Fig. 10 Number of nodes with subsume values and total number of nodes of EIFDD algorithm's results for Connect dataset (# of nodes, in millions, versus threshold in %)

5.2 Memory usage

This section compares the memory usage of MEI [14] and EIFDD on the three experimental datasets. For the Chess dataset, the memory usage of EIFDD is less than that of MEI for all thresholds (from 38 % to 58 %) (Fig. 11). For the other datasets (Figs. 12 and 13), MEI cannot run with large thresholds (3.2 % for the Mushroom dataset and 4.3 % for the Connect dataset) due to memory limitations. The EIFDD algorithm is thus more efficient than MEI in terms of memory usage.

Fig. 11 Memory usage of EIFDD and MEI algorithms for Chess dataset (memory usage in Mb versus threshold in %)
Fig. 12 Memory usage of EIFDD and MEI algorithms for Mushroom dataset (memory usage in Mb versus threshold in %)
Fig. 13 Memory usage of EIFDD and MEI algorithms for Connect dataset (memory usage in Mb versus threshold in %)

5.3 Mining time

This section compares the mining times of MEI [14] and EIFDD on the experimental datasets. Figures 14, 15 and 16 show that the mining time of EIFDD is much smaller than that of MEI. Moreover, MEI cannot run for some thresholds (3.2 % for Mushroom and 4.3 % for Connect) due to memory limitations.

Fig. 14 Mining time of EIFDD and MEI algorithms for Chess dataset (mining time in seconds versus threshold in %)
Fig. 15 Mining time of EIFDD and MEI algorithms for Mushroom dataset (mining time in seconds versus threshold in %)
Fig. 16 Mining time of EIFDD and MEI algorithms for Connect dataset (mining time in seconds versus threshold in %)

6 Conclusion and future work

This paper proposed a method for mining erasable itemsets from very dense datasets. Firstly, the subsume concept is used to determine early the information of a large number of erasable itemsets without the usual computational cost. Then, a fast procedure for finding the subsume index associated with erasable 1-itemsets based on the dPidset structure is used. The EIFDD algorithm was proposed based on these concepts. An example was presented to demonstrate the proposed algorithm. The
performance studies show that EIFDD outperforms the MEI algorithm in mining erasable itemsets from very dense datasets in terms of mining time and memory usage. In future work, a number of issues will be considered, including mining erasable closed/maximal itemsets, mining erasable itemsets from a data stream, and erasable itemset applications.

Acknowledgments This research is funded by Foundation for Science and Technology Development of Ton Duc Thang University (FOSTECT), website: http://fostect.tdt.edu.vn, under Grant FOSTECT.2015.BR.01.

References

1. Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: VLDB'94
2. Agrawal R, Imielinski T, Swami A (1993) Mining association rules between set of items in large databases. In: SIGMOD'93
3. Calders T, Dexters N, Gillis JJM, Goethals B (2014) Mining frequent itemsets in a stream. Inf Syst 39:233–255
4. Czibula G, Marian Z, Czibula IG (2014) Software defect prediction using relational association rule mining. Inf Sci 264:260–278
5. Deng ZH (2013) Mining top-rank-k erasable itemsets by PID lists. Int J Intell Syst 28(4):366–379
6. Deng ZH, Xu XR (2012) Fast mining erasable itemsets using NC sets. Expert Syst Appl 39(4):4453–4463
7. Deng Z, Fang G, Wang Z, Xu X (2009) Mining erasable itemsets. In: ICMLC'09
8. Deng ZH, Xu XR (2010) An efficient algorithm for mining erasable itemsets. In: ADMA'10:214–225
9. Han J, Pei J, Yin Y (2003) Mining frequent patterns without candidate generation. In: SIGMOD'00:1–12
10. Huynh-Thi-Le Q, Le T, Vo B, Le B (2015) An efficient and effective algorithm for mining top-rank-k frequent patterns. Expert Syst Appl 42(1):156–164
11. Pyun G, Yun U (2014) Mining top-k frequent patterns with combination reducing techniques. Appl Intell 41(1):76–98
12. Pyun G, Yun U, Ryu KH (2014) Efficient frequent pattern mining based on Linear Prefix tree. Knowl-Based Syst 55:125–139
13. Le T, Vo B, Coenen F (2013) An efficient algorithm for mining erasable itemsets using the difference of NC-Sets. In: IEEE SMC'13:2270–2274
14. Le T, Vo B (2014) MEI: an efficient algorithm for mining erasable itemsets. Eng Appl Artif Intell 27:155–166
15. Le T, Vo B, Nguyen G (2014) A survey of erasable itemset mining algorithms. WIREs Data Min Knowl Disc 4(5):356–379
16. Li H, Zhang H, Zhu J, Cao H, Wang Y (2014) Efficient frequent itemset mining methods over time-sensitive streams. Knowl-Based Syst 56:281–298
17. Li Y, Wu J (2014) Interpretation of association rules in multi-tier structures. Int J Approx Reason 55(6):1439–1457
18. Liao VCC, Chen MS (2014) DFSP: a Depth-First SPelling algorithm for sequential pattern mining of biological sequences. Knowl Inf Syst 38(3):623–639
19. Lin KC, Liao I, Chang TP, Lin SF (2014) A frequent itemset mining algorithm based on the Principle of Inclusion-Exclusion and transaction mapping. Inf Sci 276:278–289
20. Nguyen G, Le T, Vo B, Le B (2014) A new approach for mining top-rank-k erasable itemsets. In: ACIIDS'14
21. Nori F, Deypir M, Sadreddini MH (2013) A sliding window based algorithm for frequent closed itemset mining over data streams. J Syst Softw 86(3):615–623
22. Sohrabi MK, Barforoush AA (2013) Parallel frequent itemset mining using systolic arrays. Knowl-Based Syst 37:462–471
23. Song W, Yang B, Xu Z (2008) Index-BitTableFI: An improved algorithm for mining frequent itemsets. Knowl-Based Syst 21:507–513
24. Versichele M, Groote L, Bouuaert MC, Neutens T, Moerman I, Weghe NV (2014) Pattern mining in tourist attraction visits through association rule learning on Bluetooth tracking data: A case study of Ghent, Belgium. Tour Manag 44:67–81
25. Vo B, Coenen F, Le T, Hong T-P (2013) A hybrid approach for mining frequent itemsets. In: IEEE SMC'13:4647–4651
26. Vo B, Le T, Coenen F, Hong TP (2014) Mining frequent itemsets using the N-list and subsume concepts. International Journal of Machine Learning and Cybernetics. http://dx.doi.org/10.1007/s13042-014-0252-2
27. Vo B, Hong TP, Le B (2013) A lattice-based approach for mining most generalization association rules. Knowl-Based Syst 45:20–30
28. Zaki MJ (2000) Scalable algorithms for association mining. IEEE Trans Knowl Data Eng 12(3):372–390
29. Zaki MJ, Gouda K (2003) Fast vertical mining using diffsets. In: SIGKDD'03
30. Zhang B, Lin CW, Gan W, Hong TP (2014) Maintaining the discovered sequential patterns for sequence insertion in dynamic databases. Eng Appl Artif Intell 35:131–142
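
The pidset, revenue, erasable 1-itemset, and subsume index notions of Sections 2 and 4 can be checked on the example dataset DBE with a few lines of Python. The following is only a minimal sketch under stated assumptions, not the authors' C# implementation: all function names are hypothetical, the subsume direction is the one used in the illustrative example (B belongs to Subsume(A) when p(B) ⊆ p(A)), ties in pidset length are broken alphabetically, and ξ is fixed at 40 %.

```python
from itertools import combinations

# Example dataset DB_E (Table 1): product ID -> (items, revenue in dollars).
DB_E = {
    1: ({"a", "b"}, 1000),
    2: ({"a", "b", "e"}, 200),
    3: ({"c", "e"}, 150),
    4: ({"b", "d", "e", "f"}, 50),
    5: ({"c", "d", "e"}, 100),
    6: ({"d", "e", "f", "h"}, 200),
    7: ({"d", "h"}, 150),
    8: ({"d", "f", "h"}, 100),
}


def pidset(itemset, db):
    """Definition 3: IDs of the products that use at least one item of `itemset`."""
    wanted = set(itemset)
    return {pid for pid, (items, _) in db.items() if items & wanted}


def revenue(pids, db):
    """Definition 4: summed revenue of the products whose IDs are in `pids`."""
    return sum(db[pid][1] for pid in pids)


def erasable_1_itemsets(db, xi):
    """Map each erasable item to its pidset, in descending order of pidset length."""
    total = sum(val for _, val in db.values())
    all_items = set().union(*(items for items, _ in db.values()))
    e1 = {}
    for item in all_items:
        pids = pidset({item}, db)
        if revenue(pids, db) <= total * xi:  # Definition 2
            e1[item] = pids
    # Ties in length are broken alphabetically (an arbitrary choice made here).
    return dict(sorted(e1.items(), key=lambda kv: (-len(kv[1]), kv[0])))


def subsume_index(e1):
    """Two nested loops over the sorted E1: B joins Subsume(A) when p(B) is a subset of p(A)."""
    order = list(e1)
    return {a: [b for b in order[i + 1:] if e1[b] <= e1[a]]
            for i, a in enumerate(order)}


def subsumed_itemsets(a, subsume):
    """Theorem 4: A combined with every nonempty subset of Subsume(A)."""
    members = subsume[a]
    return [(a,) + combo
            for size in range(1, len(members) + 1)
            for combo in combinations(members, size)]


E1 = erasable_1_itemsets(DB_E, 0.40)
SUBSUME = subsume_index(E1)

if __name__ == "__main__":
    print(SUBSUME)
    for itemset in subsumed_itemsets("d", SUBSUME):
        print("".join(itemset), revenue(pidset(itemset, DB_E), DB_E))
```

On DBE the sketch yields Subsume(d) = {f, h} and Subsume(e) = {c}, and the itemsets df, dh, and dfh all have revenue 600, the same as g(d), which agrees with Table 2. Only later items in the sorted order are tested against each item, so two items with identical pidsets are not placed in each other's subsume index.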

Date posted: 16/12/2017, 17:55


Table of contents

  • EIFDD: An efficient approach for erasable itemset mining of very

    • Abstract

    • Introduction

    • Basic concepts

    • Related work

    • EIFDD algorithm

      • dPidset structure

      • Subsume concept

      • Algorithm for finding subsume index associated with E1

      • EIFDD algorithm

      • An illustrative example

      • Complexity analysis

    • Performance studies

      • Compactness of subsume index

      • Memory usage

      • Mining time

    • Conclusion and future work

    • Acknowledgments

    • References
