
CloSpan: Mining Closed Sequential Patterns in Large Datasets




SEQUENTIAL PATTERNS
Natural Language Processing Lab., NTU, 2006

Outline
  • Introduction
  • Search Space Pruning
  • CloSpan
  • Experimental Results
  • Conclusions

Introduction
  • Definition
    – Sequence, Elements, Subsequence and Sequential Pattern
      · A sequence: <(ef)(ab)(df)cb>
      · A sequence database:
            SID   sequence
            10    …
            20    …
            30    …
            40    …
      · Elements: items within an element are listed alphabetically
      · … is a subsequence of …
      · Given support threshold min_sup_count = 2, … is a sequential pattern

Introduction (Cont.)
  • Definition
    – Frequent Sequential Pattern (FS)
      · Includes all the sequences whose support is no less than min_sup
    – Closed Frequent Sequential Pattern (CS)
      · Includes no sequence that has a super-sequence with the same support
      · CS ⊆ FS

Introduction (Cont.)
  • Example – FS & CS (min_sup_count = 2)
        Sequence
        (af)dea
        eab
        e(abf)(bde)
    – FS: a:3, b:2, d:2, e:3, f:2, ab:2, ad:2, ae:2, (af):2, ea:3, eb:2, fd:2, fe:2, (af)d:2, (af)e:2, eab:2
    – CS: ea:3, (af)d:2, (af)e:2, eab:2

Introduction (Cont.)
  • Definition
    – Prefix and Postfix (Projection)
      · …, …, … and … are prefixes of sequence …
      · Given sequence …:
            Prefix    Postfix / Projection
            …         …

Introduction (Cont.)
  • Definition
    – A sequence $s = \langle t_1, \dots, t_m \rangle$
    – An item $\alpha$
    – I-Step extension: $s \diamond_i \alpha = \langle t_1, \dots, t_m \cup \{\alpha\} \rangle$
      · Ex: s = …, $\alpha$ = {e}; … is an I-Step extension of …
    – S-Step extension: $s \diamond_s \alpha = \langle t_1, \dots, t_m, \{\alpha\} \rangle$
      · Ex: … is an S-Step extension of …

Introduction (Cont.)
  • Definition
    – Prefix Search Tree
      [Figure: prefix search tree; node labels a_s, b_i, a_s, a_s, b_s, b_s, b_s, c_i, d_i, where subscript s marks an S-Step extension and subscript i an I-Step extension]

Search Space Pruning (Cont.)
  • Definition
    – $\Gamma(D)$ (Gamma)
      · Total number of items in $D$
    – Equivalence of Projected Database
      · For two sequences $s$ and $s'$ with $s \sqsubseteq s'$: $D_s = D_{s'} \Leftrightarrow \Gamma(D_s) = \Gamma(D_{s'})$
  • Example
    – $D_f = D_{(af)} = \{de, (de)\}$
    – $\Gamma(D_{(af)}) = \Gamma(D_f) = $ …

Search Space Pruning (Cont.)
  • Definition
    – Early Termination by Equivalence
      · Given two sequences $s$ and $s'$ with $s \sqsubseteq s'$
      · and also $\Gamma(D_s) = \Gamma(D_{s'})$,
      · then $\forall \gamma$, $\mathrm{support}(s \diamond \gamma) = \mathrm{support}(s' \diamond \gamma)$
  • Example
    – $\Gamma(D_{(af)}) = \Gamma(D_f)$
    – (af)d and (af)e are frequent
    – support((af)d) = support(fd)
    – support((af)e) = support(fe)
    – so there is no need to count the supports of fd and fe separately
  [Slide 10]

CloSpan (Cont.)
  • Example (Cont.)
    [Figure: prefix search tree with nodes nil, a_s:3, f_i:2, d_s:2, e_s:2, b_s:2, e_s:3, a_s:3, b_s:2, shown with a table keyed by $\Gamma(D_s)$; the entry shown is D_f = {de, (de)} with d:2, e:2]
  [Slide 36]

CloSpan (Cont.)
  • Example (Cont.)
    [Figure: prefix search tree with nodes a_s:3, f_i:2, d_s:2, a_s:3, b_s:2, e_s:2, e_s:3, b_s:2]
    – Closed sequences found: ea:3, (af)d:2, (af)e:2, eab:2
  [Slide 37]

Experimental Results
  • Synthetic Data
    – Parameters
      · D: Number of sequences, in thousands
      · C: Average itemsets per sequence
      · T: Average items per itemset
      · N: Number of different items, in thousands
      · S: Average itemsets in maximal sequences
      · I: Average items in maximal sequences
    – Two data sets
      · D10 C10 T2.5 N10 S6 I2.5
      · D5 C20 T20 N10 S20 I20
  • Real world datasets
    – KDDCup2000
    – Gazelle Click Stream
  [Slide 38]

Experimental Results (Cont.)
  • Synthetic Data
    [Figure: results on data set D10 C10 T2.5 N10 S6 I2.5]
  [Slide 39]

Experimental Results (Cont.)
  • Synthetic Data
    [Figure: results on data set D5 C20 T20 N10 S20 I20]
  [Slide 40]

Experimental Results (Cont.)
  • Real world datasets
    – KDDCup2000
      · 29,369 sequences
      · 35,722 sessions
      · 87,546 page views
      · The average number of sessions in a sequence is around …
      · The average number of page views in a session is …
      · The largest session contains 342 views
      · The longest sequence has 140 sessions
      · The largest sequence contains 651 page views
  [Slide 41]

Experimental Results (Cont.)
  [Slide 42]

Conclusions
  • CloSpan mines frequent closed sequences efficiently
  • CloSpan outperforms PrefixSpan
  [Slide 43]

Lexicographic Order
  • Definition
    – Lexicographic Order
      · $t = \{i_1, i_2, \dots, i_k\}$, with its items in sorted order
      · $t' = \{j_1, j_2, \dots, j_l\}$, with its items in sorted order
      · t …
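
The FS and CS lists in the Introduction example can be checked by brute force. The sketch below is not CloSpan itself: it grows patterns with S-Step and I-Step extensions, counts support by a plain subsequence test, and then keeps the patterns that have no super-sequence with the same support. The names `contains`, `support`, `mine_frequent`, `closed_only` and `fmt` are hypothetical helpers introduced here, and min_sup_count = 2 is an assumption chosen because it is consistent with the supports listed on the slide.

```python
# Example database from the FS & CS slide. Each sequence is a list of
# elements (itemsets). MIN_SUP = 2 is an assumption, see the lead-in.
DB = [
    [{"a", "f"}, {"d"}, {"e"}, {"a"}],           # (af)dea
    [{"e"}, {"a"}, {"b"}],                       # eab
    [{"e"}, {"a", "b", "f"}, {"b", "d", "e"}],   # e(abf)(bde)
]
MIN_SUP = 2


def contains(seq, pat):
    """Return True if pat is a subsequence of seq (greedy earliest match)."""
    i = 0
    for element in seq:
        if i < len(pat) and pat[i] <= element:
            i += 1
    return i == len(pat)


def support(db, pat):
    """Number of sequences in db that contain pat."""
    return sum(contains(seq, pat) for seq in db)


def mine_frequent(db, min_sup):
    """Enumerate all frequent patterns by pattern growth (no pruning tricks)."""
    items = sorted({x for seq in db for element in seq for x in element})
    frequent = {}

    def grow(pat):
        for x in items:
            # S-Step: append x as a new element.
            candidates = [pat + (frozenset([x]),)]
            # I-Step: add x to the last element, only in alphabetical order.
            if pat and x > max(pat[-1]):
                candidates.append(pat[:-1] + (pat[-1] | {x},))
            for cand in candidates:
                sup = support(db, cand)
                if sup >= min_sup:
                    frequent[cand] = sup
                    grow(cand)

    grow(())
    return frequent


def closed_only(frequent):
    """Keep patterns that have no proper super-sequence with equal support."""
    return {
        p: s
        for p, s in frequent.items()
        if not any(q != p and s2 == s and contains(q, p)
                   for q, s2 in frequent.items())
    }


def fmt(pat):
    """Render a pattern in the slide's notation, e.g. (af)d."""
    parts = []
    for element in pat:
        text = "".join(sorted(element))
        parts.append(text if len(element) == 1 else f"({text})")
    return "".join(parts)


frequent = mine_frequent(DB, MIN_SUP)
closed = closed_only(frequent)
print(len(frequent), "frequent patterns")
print({fmt(p): s for p, s in closed.items()})
```

Under these assumptions the run yields 16 frequent patterns and the four closed ones listed on the slide. Checking closedness only against other frequent patterns is enough, because a super-sequence with the same support is itself frequent.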

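The projected-database example from the Search Space Pruning slides (D_f = D_(af) = {de, (de)}) can be reproduced in the same spirit. This is a minimal sketch, assuming that a projected database is the list of postfixes after the earliest match of the prefix, and that items whose local support falls below min_sup_count = 2 are dropped; the second assumption is made because it yields the slide's {de, (de)}. The helpers `postfix`, `projected_db` and `gamma` are hypothetical names introduced here, not part of any library.

```python
from collections import Counter

# Same example database; elements are alphabetically sorted lists.
DB = [
    [["a", "f"], ["d"], ["e"], ["a"]],           # (af)dea
    [["e"], ["a"], ["b"]],                       # eab
    [["e"], ["a", "b", "f"], ["b", "d", "e"]],   # e(abf)(bde)
]
MIN_SUP = 2  # assumption, consistent with the supports on the slides


def postfix(seq, prefix):
    """Postfix of seq after the earliest match of prefix, or None if absent."""
    i = 0
    for pos, element in enumerate(seq):
        if set(prefix[i]) <= set(element):
            i += 1
            if i == len(prefix):
                last = max(prefix[-1])
                # Items of the matched element that come after the prefix.
                rest = [x for x in element if x > last]
                tail = [list(e) for e in seq[pos + 1:]]
                return ([rest] if rest else []) + tail
    return None


def projected_db(db, prefix, min_sup):
    """Project db on prefix and drop locally infrequent items (assumption)."""
    postfixes = [p for p in (postfix(s, prefix) for s in db) if p is not None]
    # Local support: number of postfixes in which each item occurs.
    local = Counter(x for p in postfixes for x in {y for e in p for y in e})
    pruned = []
    for p in postfixes:
        kept = [[x for x in e if local[x] >= min_sup] for e in p]
        pruned.append([e for e in kept if e])
    return pruned, local


def gamma(projected):
    """Total number of items in a projected database."""
    return sum(len(e) for p in projected for e in p)


d_f, local_f = projected_db(DB, [["f"]], MIN_SUP)
d_af, local_af = projected_db(DB, [["a", "f"]], MIN_SUP)

print(d_f)                        # [[['d'], ['e']], [['d', 'e']]]
print(d_f == d_af)                # True
print(gamma(d_f), gamma(d_af))    # 4 4
print(local_f["d"], local_f["e"])  # 2 2
```

Because the two projected databases are identical, every extension of f has the same support as the corresponding extension of (af), which is the situation the early-termination slide describes. The size 4 printed here depends on the pruning assumption; without pruning, the same code would count the items a and b as well.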
Date posted: 08/11/2022, 14:03
