Slide 1
FSKYMINE: A Faster Algorithm for Mining Skyline Frequent Utility Itemsets
Slide 2
Frequent itemset mining (FIM)
High-utility itemset mining (HUIM)
Skyline frequent utility itemset mining (SFUIM)
Slide 3
Limitations of FIM:
1) It is difficult to specify the minSup value.
2) It ignores item utilities such as weight, unit profit, and quantity, although these aspects matter in practical problems.
High-utility itemset mining (HUIM)
Overcomes the second limitation of FIM by using both the profits and the quantities of products in transactions to compute the actual utility values of itemsets.
The problem with both FIM and HUIM is that they require users to choose thresholds for minimum support and minimum utility. Choosing an appropriate threshold is very difficult.
Skyline frequent utility itemset mining (SFUIM)
Slide 4
Skyline frequent utility itemset mining (SFUIM)
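Slide 5
Utility of an item i_p in the transaction T_d:
u(i_p, T_d) = q(i_p, T_d) × p(i_p)
where q(i_p, T_d) is the quantity of i_p in T_d and p(i_p) is its unit profit.
High Utility Itemset
An itemset X is called a high utility itemset iff
u(X) ≥ min_utility
e.g., with min_utility = 30,
{B}: 16 is a low utility itemset; {BD}: 30 is a high utility itemset
Item         A  B  C  D  E  F  G
Unit profit  5  2  1  2  3  1  1
Transactional Database
T1  (A,1) (C,1) (D,1)
T2  (A,2) (C,6) (E,2) (G,5)
T3  (A,1) (B,2) (C,1) (D,6) (E,1) (F,5)
T4  (B,4) (C,3) (D,3) (E,1)
T5  (B,2) (C,2) (E,1) (G,1)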
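
The utility definitions above can be checked with a few lines of Python. This is a minimal sketch, not the authors' implementation: the names `PROFIT`, `DB`, `item_utility`, `itemset_utility` and `is_high_utility` are hypothetical, and the data are the unit profits and transactions of the deck's running example.

```python
# Unit profit p(i) of each item and quantities q(i, T) per transaction,
# copied from the deck's running example.
PROFIT = {"A": 5, "B": 2, "C": 1, "D": 2, "E": 3, "F": 1, "G": 1}
DB = {
    "T1": {"A": 1, "C": 1, "D": 1},
    "T2": {"A": 2, "C": 6, "E": 2, "G": 5},
    "T3": {"A": 1, "B": 2, "C": 1, "D": 6, "E": 1, "F": 5},
    "T4": {"B": 4, "C": 3, "D": 3, "E": 1},
    "T5": {"B": 2, "C": 2, "E": 1, "G": 1},
}


def item_utility(item, tid):
    """u(i, T) = q(i, T) * p(i)."""
    return DB[tid][item] * PROFIT[item]


def itemset_utility(itemset):
    """u(X): sum of u(X, T) over the transactions T that contain all of X."""
    total = 0
    for tid, quantities in DB.items():
        if all(item in quantities for item in itemset):
            total += sum(item_utility(item, tid) for item in itemset)
    return total


def is_high_utility(itemset, min_utility):
    """High utility test, using >= so that {BD}: 30 qualifies at threshold 30."""
    return itemset_utility(itemset) >= min_utility


print(itemset_utility("B"))   # 16
print(itemset_utility("BD"))  # 30
```

Itemsets are passed as strings of single-letter items purely for brevity; any iterable of item names works.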
Slide 8
The utility of an itemset X in the database is the sum of its utilities over all transactions that contain X,
e.g., u({AD}) = u({AD}, T1) + u({AD}, T3) = 7 + 17 = 24
High utility itemsets for min_utility = 30:
{BD}: 30, {ACE}: 31, {BCE}: 31, {BCD}: 34, {BDE}: 36, {BCDE}: 40, {ABCDEF}: 30
Transactional Database
T1  (A,1) (C,1) (D,1)
T2  (A,2) (C,6) (E,2) (G,5)
T3  (A,1) (B,2) (C,1) (D,6) (E,1) (F,5)
T4  (B,4) (C,3) (D,3) (E,1)
T5  (B,2) (C,2) (E,1) (G,1)
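
For a database this small, the high utility itemsets can be enumerated by brute force, which is a convenient way to cross-check the slide. The sketch below is only an illustration under that assumption (it tries all 127 non-empty itemsets and would not scale). The function names are hypothetical, and the unit profits are those of the deck's item table (A=5, B=2, C=1, D=2, E=3, F=1, G=1).

```python
from itertools import combinations

PROFIT = {"A": 5, "B": 2, "C": 1, "D": 2, "E": 3, "F": 1, "G": 1}
DB = {
    "T1": {"A": 1, "C": 1, "D": 1},
    "T2": {"A": 2, "C": 6, "E": 2, "G": 5},
    "T3": {"A": 1, "B": 2, "C": 1, "D": 6, "E": 1, "F": 5},
    "T4": {"B": 4, "C": 3, "D": 3, "E": 1},
    "T5": {"B": 2, "C": 2, "E": 1, "G": 1},
}


def utility(itemset):
    """u(X): summed over every transaction that contains all items of X."""
    return sum(
        sum(q[i] * PROFIT[i] for i in itemset)
        for q in DB.values()
        if all(i in q for i in itemset)
    )


def high_utility_itemsets(min_utility):
    """Exhaustively test every non-empty itemset against the threshold."""
    items = sorted(PROFIT)
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            u = utility(combo)
            if u >= min_utility:
                result["".join(combo)] = u
    return result


huis = high_utility_itemsets(30)
for name, u in sorted(huis.items(), key=lambda kv: (kv[1], kv[0])):
    print(name, u)
```

On this data, {BE} has utility 25 and therefore falls below the threshold, while {BCE} reaches 31.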
Slide 9
An itemset X is said to dominate another itemset Y in D, denoted X ≻ Y, iff f(X) ≥ f(Y) and u(X) ≥ u(Y), where f is the occurrence frequency (support) and u is the utility.
An itemset is a skyline frequent utility itemset (SFUI) iff it is not dominated by any other itemset in the database.
[Tables: item unit profits; transactional database]
Skyline frequent utility itemsets in the example:
{C}: sup=5, util=13; {C, E}: sup=4, util=27; {B, C, E}: sup=3, util=31; {B, C, D, E}: sup=2, util=40.
For example: util({A}) = 5 + 10 + 5 = 20 and sup({A}) = 3.
{A} is dominated by {B, C, E}, so {A} is not an SFUI.
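
The dominance relation can be illustrated by computing (frequency, utility) for every itemset that occurs in the example and keeping the non-dominated ones. This is a brute-force sketch, not the proposed algorithm. Two things are assumptions: the function names are hypothetical, and `dominates` additionally requires the two (frequency, utility) pairs to differ, so that itemsets with identical values do not eliminate each other.

```python
from itertools import combinations

PROFIT = {"A": 5, "B": 2, "C": 1, "D": 2, "E": 3, "F": 1, "G": 1}
DB = {
    "T1": {"A": 1, "C": 1, "D": 1},
    "T2": {"A": 2, "C": 6, "E": 2, "G": 5},
    "T3": {"A": 1, "B": 2, "C": 1, "D": 6, "E": 1, "F": 5},
    "T4": {"B": 4, "C": 3, "D": 3, "E": 1},
    "T5": {"B": 2, "C": 2, "E": 1, "G": 1},
}


def frequency_and_utility(itemset):
    """Return (f(X), u(X)) over the transactions containing all of X."""
    hits = [q for q in DB.values() if all(i in q for i in itemset)]
    return len(hits), sum(q[i] * PROFIT[i] for q in hits for i in itemset)


def dominates(a, b):
    """a, b are (frequency, utility) pairs; a must be at least as good on both
    measures and (assumption) not identical to b."""
    return a[0] >= b[0] and a[1] >= b[1] and a != b


def skyline_itemsets():
    items = sorted(PROFIT)
    stats = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            f, u = frequency_and_utility(combo)
            if f > 0:  # ignore itemsets that never occur
                stats["".join(combo)] = (f, u)
    return {
        name: fu
        for name, fu in stats.items()
        if not any(dominates(other, fu) for other in stats.values())
    }


print(skyline_itemsets())
```

On the example data this keeps exactly {C}, {C, E}, {B, C, E} and {B, C, D, E}.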
Slide 10
SKYMINE2 [9]
Limitations: The algorithm performs many joins of two utility lists and generates many utility lists and potential SFUIs.
[6] Vikram Goyal, Ashish Sureka, and Dhaval Patel. Efficient skyline itemsets mining. In Proceedings of the Eighth International Conference on Computer Science & Software Engineering, pages 119–124. ACM, 2015.
[9] Jerry Chun-Wei Lin, Lu Yang, Philippe Fournier-Viger, Siddharth Dawar, Vikram Goyal, Ashish Sureka, and Bay Vo. A more efficient algorithm to mine skyline frequent-utility patterns. In International Conference on Genetic and Evolutionary Computing, pages 127–135. Springer, 2016.
Slide 11
Proposed Algorithm
FMSFUI (Faster Algorithm For Mining Skyline Frequent-Utility Itemsets)
We propose:
• a mechanism named the remaining transaction-weighted utility co-occurrence of an item pair x, y in a database SD, denoted rtwuc(x, y);
• a data structure named the extent utility list of an itemset in a database.
Slide 12
Transactional Database
T1  (A,1) (C,1) (D,1)
T2  (A,2) (C,6) (E,2) (G,5)
T3  (A,1) (B,2) (C,1) (D,6) (E,1) (F,5)
T4  (B,4) (C,3) (D,3) (E,1)
T5  (B,2) (C,2) (E,1) (G,1)
Revised Transactional Database
T3  (F,5) (D,6) (B,2) (A,1) (E,1) (C,1)
T4  (D,3) (B,4) (E,1) (C,3)
T5  (G,1) (B,2) (E,1) (C,2)
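
The slide does not state how the revised database is ordered. One common choice in utility-list miners is to sort the items of each transaction by ascending transaction-weighted utility (TWU), and the sketch below assumes that rule. On the example it reproduces the T3, T4 and T5 rows shown on the slide, but the rule itself, the unit profits taken from the deck's item table, and the helper names are assumptions.

```python
PROFIT = {"A": 5, "B": 2, "C": 1, "D": 2, "E": 3, "F": 1, "G": 1}
DB = {
    "T1": {"A": 1, "C": 1, "D": 1},
    "T2": {"A": 2, "C": 6, "E": 2, "G": 5},
    "T3": {"A": 1, "B": 2, "C": 1, "D": 6, "E": 1, "F": 5},
    "T4": {"B": 4, "C": 3, "D": 3, "E": 1},
    "T5": {"B": 2, "C": 2, "E": 1, "G": 1},
}


def transaction_utility(quantities):
    """tu(T): total utility of all items in one transaction."""
    return sum(q * PROFIT[i] for i, q in quantities.items())


def twu_table():
    """TWU(i): sum of tu(T) over the transactions that contain item i."""
    twu = {item: 0 for item in PROFIT}
    for quantities in DB.values():
        tu = transaction_utility(quantities)
        for item in quantities:
            twu[item] += tu
    return twu


def revise_database():
    """Reorder every transaction by ascending TWU (assumed ordering rule)."""
    twu = twu_table()
    order = sorted(PROFIT, key=twu.get)
    return {
        tid: [(i, q[i]) for i in order if i in q]
        for tid, q in DB.items()
    }


for tid, row in revise_database().items():
    print(tid, row)
```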
Slide 13
Proposed Algorithm
FMSFUI (Faster Algorithm For Mining Skyline Frequent-Utility Itemsets)
The remaining transaction-weighted utility co-occurrence of an item pair x, y in a database SD is denoted rtwuc(x, y) and defined as the sum, over all transactions containing both x and y, of the remaining transaction-weighted utility co-occurrence of the pair in that transaction.
Calculate rtwuc(x, y)
Slide 14
Revised Transactional Database
T3  (F,5) (D,6) (B,2) (A,1) (E,1) (C,1)
T4  (D,3) (B,4) (E,1) (C,3)
T5  (G,1) (B,2) (E,1) (C,2)
Item         A  B  C  D  E  F  G
Unit profit  5  2  1  2  3  1  1
Slide 15
Trans(xy).itemSetutils = Trans(x).itemSetutils + Trans(x).itemUtils
Trans(xy).itemUtils = Trans(y).itemUtils
Trans(xy).rutils = Trans(y).rutils
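
The three join rules above can be sketched as follows. This is one reading of them, not the authors' code. It assumes that `itemSetutils` holds the utility of the prefix (every item except the last), `itemUtils` the utility of the last item, and `rutils` the utility of the items that follow the last item in the processing order. The order `FGDBAEC`, the initial value 0 for single items, and the names `Entry`, `single_item_list`, `join` and `utility` are all hypothetical.

```python
from dataclasses import dataclass

PROFIT = {"A": 5, "B": 2, "C": 1, "D": 2, "E": 3, "F": 1, "G": 1}
DB = {
    "T1": {"A": 1, "C": 1, "D": 1},
    "T2": {"A": 2, "C": 6, "E": 2, "G": 5},
    "T3": {"A": 1, "B": 2, "C": 1, "D": 6, "E": 1, "F": 5},
    "T4": {"B": 4, "C": 3, "D": 3, "E": 1},
    "T5": {"B": 2, "C": 2, "E": 1, "G": 1},
}
ORDER = "FGDBAEC"  # assumed processing order of the items


@dataclass
class Entry:
    tid: str
    itemset_util: int  # itemSetutils: utility of the prefix in this transaction
    item_util: int     # itemUtils: utility of the last item
    rutil: int         # rutils: utility of the items after the last item


def single_item_list(x):
    """Build the list of a single item x; its prefix is empty (assumed 0)."""
    entries = []
    for tid, q in DB.items():
        if x in q:
            after = [i for i in q if ORDER.index(i) > ORDER.index(x)]
            rutil = sum(q[i] * PROFIT[i] for i in after)
            entries.append(Entry(tid, 0, q[x] * PROFIT[x], rutil))
    return entries


def join(px, py):
    """Combine the lists of Px and Py (same prefix P) into the list of Pxy."""
    by_tid = {e.tid: e for e in py}
    joined = []
    for ex in px:
        ey = by_tid.get(ex.tid)
        if ey is not None:  # keep only transactions containing both
            joined.append(
                Entry(ex.tid, ex.itemset_util + ex.item_util, ey.item_util, ey.rutil)
            )
    return joined


def utility(entries):
    """Utility of the itemset: sum of itemSetutils + itemUtils over entries."""
    return sum(e.itemset_util + e.item_util for e in entries)


b, e, c = single_item_list("B"), single_item_list("E"), single_item_list("C")
be = join(b, e)
bec = join(be, join(b, c))
print(utility(be), utility(bec))  # 25 31
```

Under this reading, the join never needs the list of the shared prefix, because each entry already carries the prefix utility.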
Slide 16
Proposed Algorithm
FMSFUI (Faster Algorithm For Mining Skyline Frequent-Utility Itemsets)
The maximal utility of the frequency value r is denoted umax[r] and defined as the maximal utility of the itemsets having the same frequency value r.
Given an itemset Px with occurrence frequency r, if the sum of the sumItemSetutils and sumItemUtils values of the extent utility list of Px is greater than or equal to umax[r], then Px is a potential skyline frequent-utility itemset.
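
A minimal sketch of the umax bookkeeping described above, assuming umax is stored as a dictionary keyed by frequency and that an unseen frequency starts at 0. The function name `update_umax` is hypothetical, and `utility` stands for sumItemSetutils + sumItemUtils of the itemset's extent utility list.

```python
def update_umax(umax, r, utility):
    """Return True when an itemset with frequency r is a potential SFUI,
    i.e. its utility is at least umax[r]; raise umax[r] in that case."""
    if utility >= umax.get(r, 0):
        umax[r] = utility
        return True
    return False


umax = {}
# (itemset, frequency, utility) triples taken from the running example.
for name, r, u in [("A", 3, 20), ("B", 3, 16), ("BCE", 3, 31), ("BCDE", 2, 40)]:
    print(name, update_umax(umax, r, u))
print(umax)  # {3: 31, 2: 40}
```

Itemsets accepted this way are only candidates: an earlier candidate can still be displaced when a later itemset raises umax for the same or a higher frequency.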
Slide 17
Proposed Algorithm
FMSFUI (Faster Algorithm For Mining Skyline Frequent-Utility Itemsets)
Given itemsets Px and Py such that Px has occurrence frequency r and Py has occurrence frequency r1, if min(Px.sumItemSetutils, …) is less than umax[r] or umax[r1], then Pxy and all extensions of Pxy are not SFUIs.
Slide 18
Performance Evaluation
Compared Algorithms
FSKYMINE (proposed algorithm)
Platform for Experiment
Intel® Core 5 Quad Processor @ 2.30 GHz, 8 GB memory
Implemented in Java, running on Windows 10
Slide 19
Performance evaluation
Slide 20
In this paper, we proposed a fast algorithm, FSKYMINE, for efficiently mining skyline frequent utility itemsets.
We proposed a mechanism, the remaining transaction-weighted utility co-occurrence of an item pair x, y in a database SD, denoted rtwuc(x, y), and a data structure, the extent utility list of an itemset in a database. Based on these, we developed a pruning strategy that reduces the number of join operations when mining skyline frequent utility itemsets.
The proposed algorithm significantly outperforms SKYMINE2.
System Lab, NCKU, Taiwan
Slide 21
Thanks for your attention
Hung Manh Nguyen
Le Quy Don Technical University
Hanoi, Vietnam
Anh Viet Phan
Le Quy Don Technical University
Hanoi, Vietnam
anhpv@mta.edu.vn
Lai Van Pham
Military Science and Technology Institute
Hanoi, Vietnam
garry@cinnamon.is