Data Mining and Knowledge Discovery Handbook, 2nd Edition, part 81


780 Mohamed Medhat Gaber, Arkady Zaslavsky, and Shonali Krishnaswamy

constrained device that generates or receives streams of information. AOG has three main stages. The first two stages are mining, followed by adaptation to the available resources and data stream rates. The last stage merges the generated knowledge structures when the device runs out of memory. AOG has been used in clustering, classification, and frequency counting (Gaber et al., 2005). Figure 39.8 shows a flowchart of the AOG mining process, depicting the sequence of its three stages.

Fig. 39.8. AOG Approach

Definitions, advantages, and disadvantages of all of the above task-based approaches are given in Table 39.3.

39 Data Stream Mining 781

Table 39.3. Task-based Techniques
Technique | Definition | Pros | Cons
Approximation Algorithms | Design algorithms that approximate mining results with error bounds. | Efficiency in running time. | The problem of data rates with regard to the available resources cannot be solved using approximation algorithms.
Sliding Window | Analyzing the most recent data in the stream. | Applicable to most data stream applications. | Does not provide a model for the whole data stream.
Algorithm Output Granularity | Adapting the algorithm parameters according to the data stream rate and memory consumption. | A generic approach that can be used with any mining technique with no or minor modifications. | Has an overhead when running for long periods of time.

39.8 Related Work

The last few years have witnessed the emergence of data management strategies focusing on data stream issues (Babcock et al., 2002). Querying streams and summarizing them into data that can be stored for further analysis are the main processing tasks studied in data stream management systems. Extending query languages, query planning, scheduling, and optimization are the major research activities in this area.

Aurora (Abadi et al., 2003), COUGAR (Yao and Gehrke, 2002), Gigascope (Cranor et al., 2003), STREAM (Arasu et al., 2003), and TelegraphCQ (Krishnamurthy et al., 2003) represent the first generation of data stream management systems. Each is briefly described below:

• STREAM: the STanford stREam datA Manager (STREAM) (Arasu et al., 2003) is a data stream management system that handles multiple continuous data streams and supports long-running continuous queries. The intermediate results of a continuous query are stored in a data structure termed the Scratch Store. The result of a query can be a data stream transferred to the user, or a relation that can also be stored for re-processing. To support continuous queries over data streams, a continuous query language termed CQL has been developed as part of the system. The language supports relation-to-relation, stream-to-relation, and relation-to-stream operators.
• Gigascope: a specialized data stream management system (Cranor et al., 2003) for network monitoring applications. It has its own SQL-like query language termed GSQL. Unlike CQL, the input and output of this language are only data streams. GSQL supports merge, selection, join, and aggregation operations on data streams. Query optimization and performance considerations were addressed in developing the language. The system serves a number of network-related applications, including intrusion detection and traffic analysis.
• TelegraphCQ: a continuous query processing system (Krishnamurthy et al., 2003) built on PostgreSQL, the open-source database system. The system supports creating data streams, sources, wrappers, and queries.
• COUGAR: a data stream management system (Yao and Gehrke, 2002) designed for sensor networks. Local computation in sensor networks is cheaper than transferring the data generated by sensors over wireless connections. Motivated by this, the designers proposed a loosely coupled distributed architecture to answer in-network queries.
• Aurora: a data stream management system (Abadi et al., 2003) with optimization features for load shedding, real-time query scheduling, and QoS assessment. It is mainly designed to deal with very large numbers of data streams.

Querying data streams and mining data streams share some research issues and challenges. The two main constraints for querying data streams are the unbounded memory requirement and the high data rate. Thus, the processing time per data element (record) should be less than the time between consecutive arrivals, i.e., the inverse of the data rate or sampling rate. The unbounded memory requirement compounds the challenge, because it forces approximate rather than exact results. Significant research efforts have been devoted to approximating query results (Babcock et al., 2002, Garofalakis et al., 2002b).

Data stream mining algorithms have adopted some of the techniques introduced in data stream management research. Sampling and load shedding (Muthukrishnan, 2003) are among the basic techniques that were introduced for querying data streams and later extended to the data mining process.

39.9 Future Directions

The field of data stream mining is in a nascent stage of evolution. The last few years have witnessed increased attention to this area of research due to the proliferation of data stream sources. Based on the state of the art in the area and the demands of data streaming applications, we can identify the following future research directions:

• Developing data mining algorithms for wireless sensor networks to serve a number of real-time critical applications.
• Online mining of medical, scientific, and biological data streams generated by medical and biological instruments and the various tools employed in scientific laboratories.
• Hardware solutions for small devices that emit or receive data streams, to enable high-performance computation on such devices.
• Developing software architectures that serve data streaming applications.

39.10 Summary

In this chapter, a review of the state of the art in mining data streams has been presented. Clustering, classification, frequency counting, and time series analysis techniques have been discussed. Different systems that use data stream mining techniques have also been presented. A generalization of the approaches used in developing data stream mining techniques has been given. The approaches have been broadly classified into data-based and task-based strategies. Sampling, load shedding, sketching, synopsis data structure creation, and aggregation represent the data-based approaches. Approximation algorithms, sliding window, and algorithm output granularity are the three approaches that form the task-based approaches. The chapter concludes with pointers to future research directions in the area.

References

A. Arasu, B. Babcock, S. Babu, M. Datar, K. Ito, I. Nishizawa, J. Rosenstein, and J. Widom, STREAM: The Stanford Stream Data Manager (Demonstration Description: Short Overview of System Status and Plans), Proc. of the ACM Intl. Conf. on Management of Data (SIGMOD 2003), June 2003, p. 665.
D. Abadi, D. Carney, U. Cetintemel, M. Cherniack, C. Convey, C. Erwin, E. Galvez, M. Hatoun, J. Hwang, A. Maskey, A. Rasin, A. Singer, M. Stonebraker, N. Tatbul, Y. Xing, R. Yan, and S. Zdonik, Aurora: A Data Stream Management System (Demonstration), Proc. of the ACM SIGMOD International Conference on Management of Data (SIGMOD'03), San Diego, CA, June 2003.
C. Aggarwal, J. Han, J. Wang, and P.S. Yu, A Framework for Clustering Evolving Data Streams, Proc. 2003 Int. Conf. on Very Large Data Bases (VLDB'03), Berlin, Germany, Sept. 2003, pp. 81-92.
C. Aggarwal, J. Han, J. Wang, and P.S. Yu, A Framework for Projected Clustering of High Dimensional Data Streams, Proc. 2004 Int. Conf. on Very Large Data Bases (VLDB'04), Toronto, Canada, Aug. 2004, pp. 852-863.
C. Aggarwal, J. Han, J. Wang, and P.S. Yu, On Demand Classification of Data Streams, Proc. 2004 Int. Conf. on Knowledge Discovery and Data Mining (KDD'04), Seattle, WA, Aug. 2004, pp. 503-508.
I.F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, A Survey on Sensor Networks, IEEE Communications Magazine, Aug. 2002, pp. 102-114.
B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom, Models and Issues in Data Stream Systems, Proc. of PODS, 2002, pp. 1-16.
B. Babcock, M. Datar, and R. Motwani, Load Shedding Techniques for Data Stream Systems (Short Paper), Proc. of the 2003 Workshop on Management and Processing of Data Streams (MPDS 2003), June 2003.
B. Babcock, M. Datar, R. Motwani, and L. O'Callaghan, Maintaining Variance and k-Medians over Data Stream Windows, Proc. of the 22nd Symposium on Principles of Database Systems (PODS 2003), pp. 234-243.
M. Burl, Ch. Fowlkes, J. Roden, A. Stechert, and S. Mukhtar, Diamond Eye: A Distributed Architecture for Image Data Mining, SPIE DMKD, Orlando, April 1999, pp. 197-206.
M. Charikar, L. O'Callaghan, and R. Panigrahy, Better Streaming Algorithms for Clustering Problems, Proc. of the 35th ACM Symposium on Theory of Computing (STOC), 2003, pp. 30-39.
Y.D. Cai, D. Clutter, G. Pape, J. Han, M. Welge, and L. Auvil, MAIDS: Mining Alarming Incidents from Data Streams (System Demonstration), Proc. 2004 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'04), Paris, France, June 2004, pp. 919-920.
Y. Chen, G. Dong, J. Han, B.W. Wah, and J. Wang, Multi-Dimensional Regression Analysis of Time-Series Data Streams, Proc. of the VLDB Conference, 2002, pp. 323-334.
B. Castano, M. Judd, R.C. Anderson, and T. Estlin, Machine Learning Challenges in Mars Rover Traverse Science, Proc. of the ICML 2003 Workshop on Machine Learning Technologies for Autonomous Space Applications.
C. Cranor, T. Johnson, O. Spatscheck, and V. Shkapenyuk, Gigascope: A Stream Database for Network Applications, Proc. of the 2003 ACM SIGMOD International Conference on Management of Data (SIGMOD '03), San Diego, California, June 9-12, 2003, ACM, New York, NY, pp. 647-651.
L. O'Callaghan, N. Mishra, A. Meyerson, S. Guha, and R. Motwani, Streaming-Data Algorithms for High-Quality Clustering, Proc. of the IEEE International Conference on Data Engineering, March 2002, pp. 685-697.
G. Cormode and S. Muthukrishnan, What's Hot and What's Not: Tracking Most Frequent Items Dynamically, PODS 2003, pp. 296-306.
J. Coughlan, Accelerating Scientific Discovery at NASA, SIAM SDM 2004, Florida, USA.
G. Cormode and S. Muthukrishnan, What is New: Finding Significant Differences in Network Data Streams, INFOCOM 2004.
Y. Chi, P.S. Yu, H. Wang, and R.R. Muntz, Loadstar: A Load Shedding Scheme for Classifying Data Streams, The 2005 SIAM International Conference on Data Mining (SIAM SDM'05), 2005.
G. Dong, J. Han, L.V.S. Lakshmanan, J. Pei, H. Wang, and P.S. Yu, Online Mining of Changes from Data Streams: Research Problems and Preliminary Results, Proc. of the 2003 ACM SIGMOD Workshop on Management and Processing of Data Streams, in cooperation with the 2003 ACM-SIGMOD International Conference on Management of Data (SIGMOD'03), San Diego, CA, June 8, 2003.
P. Domingos and G. Hulten, Mining High-Speed Data Streams, Proc. of the Association for Computing Machinery Sixth International Conference on Knowledge Discovery and Data Mining, 2000, pp. 71-80.
P. Domingos and G. Hulten, Catching Up with the Data: Research Issues in Mining Data Streams, Workshop on Research Issues in Data Mining and Knowledge Discovery, Santa Barbara, CA, 2001.
P. Domingos and G. Hulten, A General Method for Scaling Up Machine Learning Algorithms and its Application to Clustering, Proc. of the Eighteenth International Conference on Machine Learning, Williamstown, MA, Morgan Kaufmann, 2001, pp. 106-113.
M. Dunham, Data Mining: Introductory and Advanced Topics, Pearson Education, 2003.
F.J. Ferrer-Troyano, J.S. Aguilar-Ruiz, and J.C. Riquelme, Discovering Decision Rules from Numerical Data Streams, ACM Symposium on Applied Computing (SAC'04), ACM Press, 2004, pp. 649-653.
U.M. Fayyad, Knowledge Discovery in Databases: An Overview, ILP 1997, pp. 3-16.
U.M. Fayyad, Mining Databases: Towards Algorithms for Knowledge Discovery, IEEE Data Eng. Bull. 21(1), 1998, pp. 39-48.
U.M. Fayyad, G.G. Grinstein, and A. Wierse, Information Visualization in Data Mining and Knowledge Discovery, Morgan Kaufmann, 2001.
M.M. Gaber and P.S. Yu, A Holistic Approach for Resource-aware Adaptive Data Stream Mining, Journal of New Generation Computing, Special Issue on Knowledge Discovery from Data Streams, 2006.
V. Ganti, J. Gehrke, and R. Ramakrishnan, Mining Data Streams under Block Evolution, SIGKDD Explorations 3(2), 2002, pp. 1-10.
M. Garofalakis, J. Gehrke, and R. Rastogi, Querying and Mining Data Streams: You Only Get One Look (a Tutorial), SIGMOD Conference 2002, p. 635.
C. Giannella, J. Han, J. Pei, X. Yan, and P.S. Yu, Mining Frequent Patterns in Data Streams at Multiple Time Granularities, in H. Kargupta, A. Joshi, K. Sivakumar, and Y. Yesha (eds.), Next Generation Data Mining, AAAI/MIT, 2003.
A.C. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. Strauss, One-Pass Wavelet Decompositions of Data Streams, TKDE 15(3), 2003, pp. 541-554.
M.M. Gaber, S. Krishnaswamy, and A. Zaslavsky, On-board Mining of Data Streams in Sensor Networks, in S. Bandyopadhyay, U. Maulik, L. Holder, and D. Cook (eds.), Advanced Methods of Knowledge Discovery from Complex Data, Springer Verlag, 2005.
R. Grossman, Supporting the Data Mining Process with Next Generation Data Mining Systems, Enterprise Systems, August 1998.
M.M. Gaber, A. Zaslavsky, and S. Krishnaswamy, Towards an Adaptive Approach for Mining Data Streams in Resource Constrained Environments, Proc. of the Sixth International Conference on Data Warehousing and Knowledge Discovery, Industry Track (DaWaK 2004), Zaragoza, Spain, 30 August - 3 September 2004, Lecture Notes in Computer Science (LNCS), Springer Verlag.
S. Guha, N. Mishra, R. Motwani, and L. O'Callaghan, Clustering Data Streams, Proc. of the Annual Symposium on Foundations of Computer Science, IEEE, November 2000, pp. 359-366.
S. Guha, A. Meyerson, N. Mishra, R. Motwani, and L. O'Callaghan, Clustering Data Streams: Theory and Practice, TKDE Special Issue on Clustering, vol. 15, 2003, pp. 515-528.
D.J. Hand, Statistics and Data Mining: Intersecting Disciplines, ACM SIGKDD Explorations 1(1), June 1999, pp. 16-19.
D.J. Hand, H. Mannila, and P. Smyth, Principles of Data Mining, MIT Press, 2001.
W. Hoeffding, Probability Inequalities for Sums of Bounded Random Variables, Journal of the American Statistical Association (58), 1963, pp. 13-30.
J. Han, J. Pei, and Y. Yin, Mining Frequent Patterns without Candidate Generation, Proc. 2000 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'00), pp. 1-12.
G. Hulten, L. Spencer, and P. Domingos, Mining Time-Changing Data Streams, ACM SIGKDD 2001, pp. 97-106.
M. Henzinger, P. Raghavan, and S. Rajagopalan, Computing on Data Streams, Technical Note 1998-011, Digital Systems Research Center, Palo Alto, CA, May 1998.
T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, New York: Springer, 2001.
P. Indyk, N. Koudas, and S. Muthukrishnan, Identifying Representative Trends in Massive Time Series Data Sets Using Sketches, Proc. of the 26th Int. Conf. on Very Large Data Bases, Cairo, Egypt, September 2000, pp. 363-372.
C. Jin, W. Qian, C. Sha, J.X. Yu, and A. Zhou, Dynamically Maintaining Frequent Items over a Data Stream, Proc. of the 12th ACM Conference on Information and Knowledge Management (CIKM'2003), pp. 287-294.
M. Kantardzic, Data Mining: Concepts, Models, Methods, and Algorithms, Piscataway, NJ: IEEE Press / Wiley-Interscience, 2003.
H. Kargupta, R. Bhargava, K. Liu, M. Powers, P. Blair, S. Bushra, J. Dull, K. Sarkar, M. Klein, M. Vasa, and D. Handy, VEDAS: A Mobile and Distributed Data Stream Mining System for Real-Time Vehicle Monitoring, Proc. of the SIAM International Conference on Data Mining, 2004.
S. Krishnamurthy, S. Chandrasekaran, O. Cooper, A. Deshpande, M. Franklin, J. Hellerstein, W. Hong, S. Madden, V. Raman, F. Reiss, and M. Shah, TelegraphCQ: An Architectural Status Report, IEEE Data Engineering Bulletin 26(1), March 2003.
E. Keogh, J. Lin, and W. Truppel, Clustering of Time Series Subsequences is Meaningless: Implications for Past and Future Research, Proc. of the 3rd IEEE International Conference on Data Mining, Melbourne, FL, Nov. 19-22, 2003, pp. 115-122.
H. Kargupta, B. Park, S. Pittie, L. Liu, D. Kushraj, and K. Sarkar, MobiMine: Monitoring the Stock Market from a PDA, ACM SIGKDD Explorations 3(2), ACM Press, January 2002, pp. 37-46.
B. Krishnamachari and S.S. Iyengar, Efficient and Fault-tolerant Feature Extraction in Sensor Networks, Proc. of the 2nd International Workshop on Information Processing in Sensor Networks (IPSN '03), Palo Alto, California, April 2003.
B. Krishnamachari and S. Iyengar, Distributed Bayesian Algorithms for Fault-tolerant Event Region Detection in Wireless Sensor Networks, IEEE Transactions on Computers 53(3), March 2004.
M. Last, Online Classification of Nonstationary Data Streams, Intelligent Data Analysis 6(2), 2002, pp. 129-147.
Y. Law and C. Zaniolo, An Adaptive Nearest Neighbor Classification Algorithm for Data Streams, Proc. of the 9th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2005), Porto, Portugal, October 3-7, 2005, Springer Verlag, pp. 108-120.
J. Lin, E. Keogh, S. Lonardi, and B. Chiu, A Symbolic Representation of Time Series, with Implications for Streaming Algorithms, Proc. of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, San Diego, CA, June 13, 2003, pp. 2-11.
G.S. Manku and R. Motwani, Approximate Frequency Counts over Data Streams, Proc. of the 28th International Conference on Very Large Data Bases, Hong Kong, China, August 2002, pp. 346-357.
R. Moskovitch, Y. Elovici, and L. Rokach, Detection of Unknown Computer Worms Based on Behavioral Classification of the Host, Computational Statistics and Data Analysis 52(9), 2008, pp. 4544-4566.
S. Muthukrishnan, Data Streams: Algorithms and Applications, Proc. of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, 2003.
O. Nasraoui, C. Cardona, C. Rojas, and F. Gonzalez, Mining Evolving User Profiles in Noisy Web Clickstream Data with a Scalable Immune System Clustering Algorithm, Proc. of WebKDD 2003 - KDD Workshop on Web Mining as a Premise to Effective and Intelligent Web Applications, Washington DC, August 2003, p. 71.
C. Ordonez, Clustering Binary Data Streams with K-means, ACM DMKD 2003.
B. Park and H. Kargupta, Distributed Data Mining: Algorithms, Systems, and Applications, in N. Ye (ed.), Data Mining Handbook, 2002.
E. Perlman and A. Java, Predictive Mining of Time Series Data in Astronomy, ASP Conf. Ser. 295: Astronomical Data Analysis Software and Systems XII, 2003.
S. Papadimitriou, C. Faloutsos, and A. Brockwell, Adaptive, Hands-Off Stream Mining, 29th International Conference on Very Large Data Bases (VLDB), 2003.
S. Pirttikangas, J. Riekki, J. Kaartinen, J. Miettinen, S. Nissila, and J. Roning, Genie of the Net: A New Approach for a Context-Aware Health Club, Proc. of the Joint 12th ECML'01 and 5th European Conference on PKDD'01, Freiburg, Germany, September 3-7, 2001.
L. Rokach, Decomposition Methodology for Classification Tasks: A Meta Decomposer Framework, Pattern Analysis and Applications 9, 2006, pp. 257-271.
L. Rokach, O. Maimon, and R. Arbel, Selective Voting: Getting More for Less in Sensor Fusion, International Journal of Pattern Recognition and Artificial Intelligence 20(3), 2006, pp. 329-350.
A. Srivastava and J. Stroeve, Onboard Detection of Snow, Ice, Clouds and Other Geophysical Processes Using Kernel Methods, Proc. of the ICML'03 Workshop on Machine Learning Technologies for Autonomous Space Applications.
S. Tanner, M. Alshayeb, E. Criswell, M. Iyer, A. McDowell, M. McEniry, and K. Regner, EVE: On-Board Process Planning and Execution, Earth Science Technology Conference, Pasadena, CA, June 11-14, 2002.
N. Tatbul, U. Cetintemel, S. Zdonik, M. Cherniack, and M. Stonebraker, Load Shedding in a Data Stream Manager, Proc. of the 29th International Conference on Very Large Data Bases (VLDB), September 2003.
N. Tatbul, U. Cetintemel, S. Zdonik, M. Cherniack, and M. Stonebraker, Load Shedding on Data Streams, Proc. of the Workshop on Management and Processing of Data Streams (MPDS 03), San Diego, CA, USA, June 8, 2003.
H. Toivonen, Sampling Large Databases for Association Rules, Proc. of the VLDB Conference, 1996.
Y. Yao and J.E. Gehrke, The Cougar Approach to In-Network Query Processing in Sensor Networks, SIGMOD Record 31(3), September 2002, pp. 9-18.
H. Wang, W. Fan, P. Yu, and J. Han, Mining Concept-Drifting Data Streams using Ensemble Classifiers, Proc. of the 9th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Washington DC, USA, Aug. 2003.
Y. Zhu and D. Shasha, Efficient Elastic Burst Detection in Data Streams, The Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2003), 24-27 August 2003, pp. 336-345.

40 Mining Concept-Drifting Data Streams

Haixun Wang¹, Philip S. Yu², and Jiawei Han³
¹ IBM T. J. Watson Research Center, haixun@us.ibm.com
² IBM T. J. Watson Research Center, psyu@us.ibm.com
³ University of Illinois, Urbana-Champaign, hanj@cs.uiuc.edu

Summary. Knowledge discovery from infinite data streams is an important and difficult task. We face two challenges: the overwhelming volume of the streaming data and its concept drifts. In this chapter, we introduce a general framework for mining concept-drifting data streams using weighted ensemble classifiers. We train an ensemble of classification models, such as C4.5, RIPPER, and naive Bayesian, from sequential chunks of the data stream. The classifiers in the ensemble are judiciously weighted based on their expected classification accuracy on the test data under the time-evolving environment. Thus, the ensemble approach improves both the efficiency of learning the model and the accuracy of classification. Our empirical study shows that the proposed methods have a substantial advantage over single-classifier approaches in prediction accuracy, and that the ensemble framework is effective for a variety of classification models.

Key words: Data Mining, concept learning, classifier design and evaluation

40.1 Introduction

Knowledge discovery on streaming data is a research topic of growing interest (Babcock et al., 2002, Chen et al., 2002, Domingos and Hulten, 2000, Hulten et al., 2001). The fundamental problem we need to solve is the following: given an infinite amount of continuous measurements, how do we model them in order to capture time-evolving trends and patterns in the stream, and make time-critical predictions?
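The chunk-based weighted ensemble summarized above can be illustrated with a short program. This is a minimal sketch under stated assumptions, not the chapter's implementation. The base learner is a hypothetical one-feature threshold rule (`Stump`) that stands in for C4.5, RIPPER, or naive Bayes. The ensemble keeps a hypothetical fixed `capacity` of members. Each member's weight is its accuracy on the newest labeled chunk minus the accuracy of always predicting that chunk's majority class. This weighting is an assumption chosen for simplicity, because this excerpt does not spell out the chapter's own formula for expected accuracy. The newest member is also scored on the same chunk it was trained on, which a real system would avoid, for example by using cross-validation.

```python
from dataclasses import dataclass


@dataclass
class Stump:
    """Hypothetical base learner: predict 1 when x >= threshold, else 0."""
    threshold: int

    def predict(self, x: int) -> int:
        return 1 if x >= self.threshold else 0


def train_stump(chunk: list[tuple[int, int]]) -> Stump:
    """Choose the threshold with the fewest training errors on one chunk."""
    candidates = sorted({x for x, _ in chunk})
    candidates.append(candidates[-1] + 1)  # also allow "always predict 0"

    def errors(t: int) -> int:
        return sum(1 for x, y in chunk if Stump(t).predict(x) != y)

    return Stump(min(candidates, key=errors))


def accuracy(clf: Stump, chunk: list[tuple[int, int]]) -> float:
    return sum(1 for x, y in chunk if clf.predict(x) == y) / len(chunk)


def majority_baseline(chunk: list[tuple[int, int]]) -> float:
    """Accuracy of always predicting the chunk's most frequent label."""
    ones = sum(y for _, y in chunk)
    return max(ones, len(chunk) - ones) / len(chunk)


class WeightedEnsemble:
    """Sketch only: the weighting rule is an assumption (see lead-in)."""

    def __init__(self, capacity: int = 3):
        self.capacity = capacity
        self.members: list[tuple[float, Stump]] = []  # (weight, classifier)

    def update(self, chunk: list[tuple[int, int]]) -> None:
        """Train on the newest chunk, re-weight all members on it, keep the best."""
        pool = [clf for _, clf in self.members] + [train_stump(chunk)]
        base = majority_baseline(chunk)
        scored = [(max(0.0, accuracy(clf, chunk) - base), clf) for clf in pool]
        scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep order
        self.members = scored[: self.capacity]

    def predict(self, x: int) -> int:
        """Weighted vote; ties (including all-zero weights) go to class 0."""
        votes = {0: 0.0, 1: 0.0}
        for weight, clf in self.members:
            votes[clf.predict(x)] += weight
        return 1 if votes[1] > votes[0] else 0


def make_chunk(threshold: int, size: int = 20) -> list[tuple[int, int]]:
    """Hypothetical noiseless chunk: the label is 1 iff x >= threshold."""
    return [(x, 1 if x >= threshold else 0) for x in range(size)]


ensemble = WeightedEnsemble(capacity=3)
for t in [10, 10, 10, 5]:  # the concept drifts from threshold 10 to 5
    ensemble.update(make_chunk(t))
print(ensemble.predict(7))  # 1: the member trained after the drift wins
```

After the threshold drifts from 10 to 5, the members trained on the old concept score no better than the majority-class baseline on the new chunk. Their weight therefore drops to zero, and the newly trained member decides the vote.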
Huge data volume and drifting concepts are not unfamiliar to the Data Mining community. One of the goals of traditional Data Mining algorithms is to learn models from large databases with bounded memory. Several classification methods have achieved this, including Sprint (Shafer et al., 1996) and BOAT (Gehrke et al., 1999). Nevertheless, these algorithms require multiple scans of the training data. This makes them inappropriate in the streaming environment, where examples arrive at a higher rate than they can be repeatedly analyzed.

O. Maimon, L. Rokach (eds.), Data Mining and Knowledge Discovery Handbook, 2nd ed., DOI 10.1007/978-0-387-09823-4_40, © Springer Science+Business Media, LLC 2010
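The contrast drawn above between repeated scans and a single pass can be illustrated with a statistic much simpler than a classifier. This is a generic illustration, not one of the methods named in the text. `two_pass_variance` needs the mean before it can sum squared deviations, so it must hold or re-read the whole dataset. `one_pass_variance` instead uses Welford's update to read each element exactly once in constant memory. `sensor_readings` is a hypothetical generator that stands in for a stream. Like a stream, it can be iterated only once.

```python
from collections.abc import Iterable, Iterator


def two_pass_variance(data: list[float]) -> float:
    """Population variance that scans the data twice: once for the mean, once for deviations."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / len(data)


def one_pass_variance(stream: Iterable[float]) -> float:
    """Population variance via Welford's update: each element is read once, O(1) memory."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # uses the already-updated mean
    return m2 / n if n else 0.0


def sensor_readings() -> Iterator[float]:
    """Hypothetical stream source: a generator yields each reading exactly once."""
    for i in range(1, 6):
        yield float(i)


print(one_pass_variance(sensor_readings()))        # 2.0
print(two_pass_variance(list(sensor_readings())))  # 2.0, but only after buffering everything

readings = sensor_readings()
one_pass_variance(readings)
print(list(readings))  # [] : the stream cannot be scanned a second time
```

The two-pass version works here only because the whole input was first copied into a list. An unbounded stream does not allow that.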

Posted: 04/07/2014, 05:21
