
Mining sequential inter-object trend patterns from a set of time series


VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY

Thesis, major code 604801, Ho Chi Minh City, 2013

THESIS ASSIGNMENT

Date assigned: 21/1/2013
Date completed: 21/6/2013
Tasks: study time series data and frequent sequential pattern mining; prepare and preprocess the experimental data; propose two mining algorithms; run the experiments and report the results.

ABSTRACT

Nowadays, time series are present in many application domains such as finance, medicine, geology and meteorology. Analyzing and mining time series for useful information and hidden knowledge is very significant in those domains: it helps users such as data analysts and managers gain insight into important temporal relationships between objects and phenomena over time. Therefore, this thesis introduces a notion of frequent temporal inter-object pattern and accordingly proposes two frequent temporal pattern mining algorithms on a set of different time series. Compared with frequent sequential patterns in sequential databases, frequent temporal inter-object patterns are more informative, carrying explicit and exact temporal information automatically discovered from many different time series. The two proposed algorithms, one brute-force and one tree-based, are defined in a level-wise bottom-up approach that deals with the combinatorial explosion problem of the association analysis task. The experimental results show the advantage of the tree-based algorithm in comparison with the brute-force algorithm. As shown in experiments on real financial time series, the results of this thesis can be further used to enhance the temporal rule mining process on time series for decision-making support.

TABLE OF CONTENTS

Chapter 1: Introduction
  1.1 Introduction to the problem
  1.2 Objectives
Chapter 2
  2.1 Time series data
    2.1.1 Time series
    2.1.2 Subsequence
    2.1.3 Match
    2.1.4 Trivial match
    2.1.5 Non-trivial match
    2.1.6 Motif
  2.2 Time series data processing
    2.2.1 Similarity measurement
    2.2.2 Data normalization
  2.3 Frequent sequential patterns
  2.4 PAA (Piecewise Aggregate Approximation)
  2.5 SAX (Symbolic Aggregate Approximation)
  2.6 Time series conversion
  2.7 Allen's temporal relation algebra
  2.8 Generic Dictionary in C#
Chapter 3
  3.2 Frequent pattern mining algorithms
    3.2.1 Apriori algorithm
    3.2.2 FP-Growth algorithm
    3.2.3 GSP algorithm
    3.2.4 PrefixSpan algorithm
Chapter 4: Approach to solving the problem
  4.1 Introduction
    4.3.1 Brute-Force algorithm
    4.3.2 Tree-based algorithm
Chapter 5: Experiments
  5.1 Overview
  5.2 Data preprocessing
    5.4.1 Experiments with fixed min-sup
    5.4.2 Experiments with fixed length
    5.4.3 Checking the number of combinations generated
    5.4.4 Checking the number of candidates generated
  5.5 Results
Chapter 6: Conclusion
  6.1 Summary

5.4.3 Checking the number of combinations generated

The number of Tree Combination Function Calls (Tree-CFC) of the tree-based algorithm is compared with that of the Brute-Force algorithm (BF-CFC). Table 5.11 gives the total number of combinations of each algorithm for the experiment in 5.4.1 (min_sup fixed at 5). Table 5.12 gives the same totals for the experiment in 5.4.2 (length fixed, min-sup varied).

Table 5.11. Total combination function calls of the two algorithms (min_sup = 5)

Time series                     Calls      Length 40        60        80       100
S&P500                          BF-CFC           325      4322     18841     52814
                                Tree-CFC         280      3635     14585     39887
S&P500, Boeing                  BF-CFC          1089     12363     69524    234731
                                Tree-CFC         821      9659     42690    108026
S&P500, Boeing, CAT             BF-CFC          2120     37940    248850   1110838
                                Tree-CFC        1477     25432    124296    322215
S&P500, Boeing, CAT, CSX        BF-CFC          4394     70976    654827   3425875
                                Tree-CFC        3247     46052    223493    572419
S&P500, Boeing, CAT, CSX, DE    BF-CFC          8664    124109   1462348   7594717
                                Tree-CFC        6523     76411    353637    997459

Motif and pattern counts at length 100 (min_sup = 5)

Time series                     Motif#   Pattern#
S&P500                              40        201
S&P500, Boeing                      60        964
S&P500, Boeing, CAT                100       2646
S&P500, Boeing, CAT, CSX           130       5394
S&P500, Boeing, CAT, CSX, DE       167       8943

At length 100 the ratio of BF-CFC to Tree-CFC grows from about 1.3 for one series to about 7.6 for five series.

Table 5.12. Total combination function calls of the two algorithms (length = 100; five min_sup settings in increasing order, the first being min_sup = 5)

Time series                     Calls            1st       2nd       3rd       4th       5th
S&P500                          BF-CFC         52814     29061     16529      8545      4011
                                Tree-CFC       39887     22423     12080      5625      2210
S&P500, Boeing                  BF-CFC        234731     95446     55205     30201     18863
                                Tree-CFC      108026     63246     37646     18953     10754
S&P500, Boeing, CAT             BF-CFC       1110838    291584    154807     82678     51917
                                Tree-CFC      322215    176527    102291     51751     30549
S&P500, Boeing, CAT, CSX        BF-CFC       3425875    580370    282326    142031     83085
                                Tree-CFC      572419    307893    179793     87018     47970
S&P500, Boeing, CAT, CSX, DE    BF-CFC       7597862   1063560    497156    255860    159943
                                Tree-CFC      997459    527015    311257    157503     95401

5.4.4 Checking the number of candidates generated

Table 5.13 gives the total number of candidates generated by each algorithm for the experiment in 5.4.1 (min_sup fixed at 5). The two algorithms generate the same number of candidates in every setting (BF/Tree = 1).

Table 5.13. Total number of candidates generated by the two algorithms (min_sup = 5; BF and Tree values are identical)

Time series                     Length 40        60        80       100
S&P500                                217      3052     11851     32959
S&P500, Boeing                        723      8467     35680     88600
S&P500, Boeing, CAT                  1324     21889    103442    251370
S&P500, Boeing, CAT, CSX             3027     40463    185259    432304
S&P500, Boeing, CAT, CSX, DE         6161     67960    289470    733221

Although the number of candidates is the same, the tree-based algorithm needs fewer combinations to generate them than the Brute-Force algorithm (see the results in section 5.4.3).

5.5 Results

The output of the mining process is the set of frequent sequential trend patterns found in the time series data. Table 5.16 illustrates some frequent patterns from the result set obtained with the following parameters:

- Stocks: (S&P500, Boeing, DE, CAT, CSX)
- Time series length: 365
- min-sup: 15
- Number of motifs: 353
- Number of patterns: 175571

Pattern 1: EE-CATBB-DE. This pattern is frequent between the CAT and DE stocks and shows a temporal relationship between event sequences of the two, involving a fall in the CAT stock.

Frequent patterns of this kind expose relationships between stocks that recur over time, and can support the choice of stocks later on.

5.6 Brute-Force versus tree-based algorithm

The Brute-Force algorithm serves as the baseline: its result set is used to cross-check the result set of the tree-based algorithm. The result sets of the two algorithms match, and the tree-based algorithm has a better execution time than the Brute-Force algorithm. From level L1 onwards, the tree-based algorithm combines elements of L(k-1) to generate level k with fewer combinations than the Brute-Force algorithm. Both algorithms use a max-span threshold to bound the time interval between events. As Table 5.12 shows, raising min-sup sharply reduces the work of both algorithms.

CHAPTER 6: CONCLUSION

6.1 Summary

The thesis mined frequent sequential trend patterns from a set of time series: the data were converted into trend sequences, two mining algorithms were proposed, and experiments were run on real data. The frequent patterns found reveal hidden relationships between the objects behind the time series.

6.2 Contributions of the thesis

- A notion of frequent sequential trend pattern over a set of time series.
- Two mining algorithms, Brute-Force and tree-based. Both were validated on real data. Their output is the set of frequent sequential patterns of the data set, annotated with Allen temporal relations.

6.3 Limitations

- The thesis does not address improvements for the case of large data series.
- The thesis does not determine the time complexity of each algorithm.
- Within the scope of the thesis, the choice of parameters suited to each data domain is not covered.
- The results stop at frequent patterns; using them to derive rules about the relationships between objects is left open.
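The Allen temporal relations that the mined patterns carry can be illustrated with a small sketch. It is not the thesis's implementation: the function name `allen_relation`, the representation of an event as a `(start, end)` pair of integer time indices, and the relation labels are all assumptions made for illustration. It simply classifies two intervals into one of the thirteen relations of Allen's interval algebra.

```python
from typing import Tuple

# Hypothetical representation: an event occupies the interval (start, end),
# with start < end, measured in time-series positions.
Interval = Tuple[int, int]


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the Allen relation that holds from interval a to interval b."""
    a_start, a_end = a
    b_start, b_end = b
    if a_start >= a_end or b_start >= b_end:
        raise ValueError("intervals must satisfy start < end")

    # Disjoint or touching intervals.
    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    if a_end == b_start:
        return "meets"
    if b_end == a_start:
        return "met-by"

    # Intervals that share at least one endpoint.
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"

    # Strict containment.
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"

    # Remaining case: partial overlap with all four endpoints distinct.
    return "overlaps" if a_start < b_start else "overlapped-by"
```

For example, with hypothetical events where one stock's trend spans positions 0 to 4 and another's spans 2 to 6, `allen_relation((0, 4), (2, 6))` yields `"overlaps"`. A pattern miner of the kind described here would record such a label between each pair of events in a candidate pattern.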

Uploaded: 20/03/2022, 01:21
