
CloSpan: Mining Closed Sequential Patterns in Large Datasets



SEQUENTIAL PATTERNS
CloSpan: Mining Closed Sequential Patterns in Large Datasets
Natural Language Processing Lab., NTU, 2006

Outline
- Introduction
- Search Space Pruning
- CloSpan
- Experimental Results
- Conclusions

Introduction
Definition: Sequence, Elements, Subsequence and Sequential Pattern
- A sequence: <(ef) (ab) (df) c b>
- A sequence database: table with columns SID and sequence; SIDs 10, 20, 30, 40 [sequences lost in extraction]
- Elements: items within an element are listed alphabetically
- [lost in extraction] is a subsequence of [lost in extraction]
- Given support threshold min_sup_count = 2, [lost in extraction] is a sequential pattern

Introduction (Cont.)
Definition
- Frequent Sequential Pattern (FS): includes all the sequences whose support is no less than min_sup
- Closed Frequent Sequential Pattern (CS): includes no sequence which has a super-sequence with the same support
- CS ⊆ FS

Introduction (Cont.)
Example: FS and CS
- Sequence database (ID column lost in extraction): (af)dea, eab, e(abf)(bde)
- min_sup_count = [value lost in extraction]
- FS: a:3, b:2, d:2, e:3, f:2, ab:2, ad:2, ae:2, (af):2, ea:3, eb:2, fd:2, fe:2, (af)d:2, (af)e:2, eab:2
- CS: ea:3, (af)d:2, (af)e:2, eab:2

Introduction (Cont.)
Definition: Prefix and Postfix (Projection)
- [prefixes lost in extraction] are prefixes of sequence [lost in extraction]
- Given sequence [lost in extraction]: table with columns Prefix and Postfix/Projection [entries lost in extraction]

Introduction (Cont.)
Definition
- Sequence s = [lost in extraction], and an item [symbol lost in extraction]
- I-Step extension: [formula lost in extraction]. Example: s = [lost in extraction], item = {e}; [lost in extraction] is an I-Step extension of [lost in extraction]
- S-Step extension: [formula lost in extraction]. Example: [lost in extraction] is an S-Step extension of [lost in extraction]

Introduction (Cont.)
Definition: Prefix Search Tree
[Figure: prefix search tree; surviving node labels: as, bi, as, as, bs, bs, bs, ci, di]
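The Introduction definitions (subsequence, support, closed pattern) can be checked against the FS and CS example with a short script. This is a minimal sketch, not the presenters' code: the names `seq`, `is_subsequence`, `support` and `closed_only` are hypothetical, a sequence is modelled as a tuple of frozensets, and the 16 frequent patterns are taken from the slide as given rather than mined.

```python
from typing import Dict, FrozenSet, List, Tuple

Element = FrozenSet[str]
Sequence = Tuple[Element, ...]


def seq(*elements: str) -> Sequence:
    """Build a sequence from element strings, e.g. seq("af", "d") for <(af) d>."""
    return tuple(frozenset(e) for e in elements)


def is_subsequence(pattern: Sequence, sequence: Sequence) -> bool:
    """True if each element of `pattern` is contained in a distinct element
    of `sequence`, in the same order (greedy leftmost matching)."""
    pos = 0
    for element in pattern:
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern element must match a later element
    return True


def support(pattern: Sequence, database: List[Sequence]) -> int:
    """Number of database sequences that contain `pattern`."""
    return sum(is_subsequence(pattern, s) for s in database)


def closed_only(frequent: Dict[Sequence, int]) -> Dict[Sequence, int]:
    """Keep the patterns that have no proper super-sequence of equal support."""
    return {
        p: sup
        for p, sup in frequent.items()
        if not any(
            q != p and sup_q == sup and is_subsequence(p, q)
            for q, sup_q in frequent.items()
        )
    }


# Example database from the FS and CS slide.
DATABASE = [seq("af", "d", "e", "a"), seq("e", "a", "b"), seq("e", "abf", "bde")]

# The 16 frequent patterns listed on the slide (taken as given, not mined here).
FS_PATTERNS = [
    seq("a"), seq("b"), seq("d"), seq("e"), seq("f"),
    seq("a", "b"), seq("a", "d"), seq("a", "e"), seq("af"),
    seq("e", "a"), seq("e", "b"), seq("f", "d"), seq("f", "e"),
    seq("af", "d"), seq("af", "e"), seq("e", "a", "b"),
]

FS = {p: support(p, DATABASE) for p in FS_PATTERNS}
CS = closed_only(FS)
```

With these definitions, `CS` contains ea:3, (af)d:2, (af)e:2 and eab:2, the same four patterns the slide lists. Checking closedness only against other frequent patterns is enough, because a super-sequence with the same support is itself frequent.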
Search Space Pruning
Definition
- Γ(D) (Gamma): total number of items in D
- Equivalence of Projected Database: for two sequences s and s', s ⊑ s' ⇒ (D_s = D_s' ⇔ Γ(D_s) = Γ(D_s'))
Example
- D_f = D_(af) = {de, (de)}
- Γ(D_(af)) = Γ(D_f) = [value lost in extraction]

Search Space Pruning (Cont.)
Definition: Early Termination by Equivalence
- Two sequences s and s', s ⊑ s'
- And also Γ(D_s) = Γ(D_s')
- Then, for every extension item α, support(s ◇ α) = support(s' ◇ α)
Example
- Γ(D_(af)) = Γ(D_f)
- (af)d and (af)e are frequent
- support((af)d) = support(fd)
- support((af)e) = support(fe)
- don't know the support of fd and fe

CloSpan (Cont.)
Example (Cont.)
[Figure: prefix search tree with root nil and nodes as:3, fi:2, ds:2, es:2, bs:2, es:3, as:3, bs:2, next to a table keyed by Γ(Ds) Mod whose entry for D_f holds d:2, e:2 and the sequences de, (de)]

CloSpan (Cont.)
Example (Cont.)
[Figure: prefix search tree with nodes as:3, fi:2, ds:2, as:3, bs:2, es:2, es:3, bs:2]
- Resulting closed patterns: ea:3, (af)d:2, (af)e:2, eab:2

Experimental Results
Synthetic Data: Parameters
- D: number of sequences (in thousands)
- C: average itemsets per sequence
- T: average items per itemset
- N: number of different items (in thousands)
- S: average itemsets in maximal sequences
- I: average items in maximal sequences
Two data sets
- D10 C10 T2.5 N10 S6 I2.5
- D5 C20 T20 N10 S20 I20
Real world datasets
- KDDCup2000
- Gazelle Click Stream

Experimental Results (Cont.)
Synthetic Data: D10 C10 T2.5 N10 S6 I2.5
[Figure not preserved]

Experimental Results (Cont.)
Synthetic Data: D5 C20 T20 N10 S20 I20
[Figure not preserved]
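The projected-database test from the Search Space Pruning slides can be sketched on the same three-sequence example. This is an illustration under stated assumptions, not the CloSpan implementation: the function names (`project`, `projected_database`, `gamma`, `same_projection`) are hypothetical, and the projection follows the usual prefix-projection convention, where the postfix starts right after the earliest match of the pattern and keeps any items of the matched element that sort after the pattern's last item.

```python
from typing import FrozenSet, List, Optional, Tuple

Element = FrozenSet[str]
Sequence = Tuple[Element, ...]
# A postfix is (items left over in the matched element, elements after it).
Postfix = Tuple[Element, Sequence]


def seq(*elements: str) -> Sequence:
    """Build a sequence from element strings, e.g. seq("af", "d") for <(af) d>."""
    return tuple(frozenset(e) for e in elements)


def project(pattern: Sequence, sequence: Sequence) -> Optional[Postfix]:
    """Postfix of `sequence` after the earliest match of a non-empty
    `pattern`, or None if `sequence` does not contain `pattern`."""
    pos = -1
    for element in pattern:
        pos += 1
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return None
    last_item = max(pattern[-1])
    leftover = frozenset(i for i in sequence[pos] if i > last_item)
    return leftover, sequence[pos + 1:]


def projected_database(pattern: Sequence, database: List[Sequence]) -> List[Postfix]:
    """Postfixes of all database sequences that contain `pattern`."""
    postfixes = (project(pattern, s) for s in database)
    return [p for p in postfixes if p is not None]


def gamma(projected: List[Postfix]) -> int:
    """Total number of items in a projected database."""
    return sum(len(left) + sum(len(e) for e in rest) for left, rest in projected)


def same_projection(s: Sequence, s_prime: Sequence, database: List[Sequence]) -> bool:
    """Size-based equivalence test; only meaningful when s is a subsequence of s_prime."""
    return gamma(projected_database(s, database)) == gamma(
        projected_database(s_prime, database)
    )


DATABASE = [seq("af", "d", "e", "a"), seq("e", "a", "b"), seq("e", "abf", "bde")]

D_F = projected_database(seq("f"), DATABASE)
D_AF = projected_database(seq("af"), DATABASE)
EQUIVALENT = same_projection(seq("f"), seq("af"), DATABASE)

# Support of a pattern equals the number of sequences that survive projection.
SUPPORT_FD = len(projected_database(seq("f", "d"), DATABASE))
SUPPORT_AFD = len(projected_database(seq("af", "d"), DATABASE))
```

Here f is a subsequence of (af) and both projections have the same size, so `D_F == D_AF` and fd and (af)d have equal support, as the Early Termination slide states. The sketch keeps whole postfixes, <d e a> and <(bde)>, whereas the slide lists {de, (de)}; the slide appears to show only the items that remain frequent inside the projection, but that reading is an assumption.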
Experimental Results (Cont.)
Real world datasets: KDDCup2000
- 29,369 sequences
- 35,722 sessions
- 87,546 page views
- The average number of sessions in a sequence is around [value lost in extraction]
- The average number of page views in a session is [value lost in extraction]
- The largest session contains 342 views
- The longest sequence has 140 sessions
- The largest sequence contains 651 page views

Experimental Results (Cont.)
[Figure not preserved]

Conclusions
- CloSpan mines frequent closed sequences efficiently
- CloSpan outperforms PrefixSpan

Lexicographic Order
Definition: Lexicographic Order
- t = {i1, i2, …, ik}, i1 ≤ i2 ≤ … ≤ ik
- t' = {j1, j2, …, jl}, j1 ≤ j2 ≤ … ≤ jl
- t [remainder truncated in the source]

Posted: 08/11/2022, 14:03
