S. Sumathi, S.N. Sivanandam
Introduction to Data Mining and its Applications
With 108 Figures and 23 Tables

Studies in Computational Intelligence, Volume 29
Editor-in-chief: Prof. Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, ul. Newelska 01-447 Warsaw, Poland. E-mail: kacprzyk@ibspan.waw.pl

Dr. S. Sumathi, Assistant Professor, Department of Electrical and Electronics Engineering, PSG College of Technology, Coimbatore 641 004, Tamil Nadu, India
Dr. S.N. Sivanandam, Professor and Head, Department of Computer Science and Engineering, PSG College of Technology, P.O. Box 1611, Peelamedu, Coimbatore 641 004, Tamil Nadu, India

Library of Congress Control Number: 2006926723
ISSN print edition: 1860-949X
ISSN electronic edition: 1860-9503
ISBN-10 3-540-34350-4 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-34350-9 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable to prosecution under the German Copyright Law.

Springer is a part of Springer Science+Business Media. springer.com
© Springer-Verlag Berlin Heidelberg 2006

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Cover design: deblik, Berlin. Typesetting by the authors and SPi. Printed on acid-free paper.

Contents

1 Introduction to Data Mining Principles
  1.1 Data Mining and Knowledge Discovery
  1.2 Data Warehousing and Data Mining - Overview
    1.2.1 Data Warehousing Overview
    1.2.2 Concept of Data Mining
  1.3 Summary
  1.4 Review Questions

2 Data Warehousing, Data Mining, and OLAP
  2.1 Data Mining Research Opportunities and Challenges
    2.1.1 Recent Research Achievements
    2.1.2 Data Mining Application Areas
    2.1.3 Success Stories
    2.1.4 Trends that Affect Data Mining
    2.1.5 Research Challenges
    2.1.6 Test Beds and Infrastructure
    2.1.7 Findings and Recommendations
  2.2 Evolving Data Mining into Solutions for Insights
    2.2.1 Trends and Challenges
  2.3 Knowledge Extraction Through Data Mining
    2.3.1 Data Mining Process
    2.3.2 Operational Aspects
    2.3.3 The Need and Opportunity for Data Mining
    2.3.4 Data Mining Tools and Techniques
    2.3.5 Common Applications of Data Mining
    2.3.6 What about Data Mining in Power Systems?
  2.4 Data Warehousing and OLAP
    2.4.1 Data Warehousing for Actuaries
    2.4.2 Data Warehouse Components
    2.4.3 Management Information
    2.4.4 Profit Analysis
    2.4.5 Asset Liability Management
  2.5 Data Mining and OLAP
    2.5.1 Research
    2.5.2 Data Mining
  2.6 Summary
  2.7 Review Questions

3 Data Marts and Data Warehouse
  3.1 Data Marts, Data Warehouse, and OLAP
    3.1.1 Business Process Re-engineering
    3.1.2 Real-World Usage
    3.1.3 Business Intelligence
    3.1.4 Different Data Structures
    3.1.5 Different Users
    3.1.6 Technological Foundation
    3.1.7 Data Warehouse
    3.1.8 Informix Architecture
    3.1.9 Building the Data Warehouse/Data Mart Environment
    3.1.10 History
    3.1.11 Nondetailed Data in the Enterprise Data Warehouse
    3.1.12 Sharing Data Among Data Marts
    3.1.13 The Manufacturing Process
    3.1.14 Subdata Marts
    3.1.15 Refreshment Cycles
    3.1.16 External Data
    3.1.17 Operational Data Stores (ODS) and Data Marts
    3.1.18 Distributed Metadata
    3.1.19 Managing the Warehouse Environment
    3.1.20 OLAP
  3.2 Data Warehousing for Healthcare
    3.2.1 A Data Warehousing Perspective for Healthcare
    3.2.2 Adding Value to your Current Data
    3.2.3 Enhance Customer Relationship Management
    3.2.4 Improve Provider Management
    3.2.5 Reduce Fraud
    3.2.6 Prepare for HEDIS Reporting
    3.2.7 Disease Management
    3.2.8 What to Expect When Beginning a Data Warehouse Implementation
    3.2.9 Definitions
  3.3 Data Warehousing in the Telecommunications Industry
    3.3.1 Implementing One View
    3.3.2 Business Benefit
    3.3.3 A Holistic Approach
  3.4 The Telecommunications Lifecycle
    3.4.1 Current Enterprise Environment
    3.4.2 Getting to the Root of the Problem
    3.4.3 The Telecommunications Lifecycle
    3.4.4 Telecom Administrative Outsourcing
    3.4.5 Choose your Outsourcing Partner Wisely
    3.4.6 Security in Web-Enabled Data Warehouse
  3.5 Security Issues in Data Warehouse
    3.5.1 Performance vs Security
    3.5.2 An Ideal Security Model
    3.5.3 Real-World Implementation
    3.5.4 Proposed Security Model
  3.6 Data Warehousing: To Buy or To Build a Fundamental Choice for Insurers
    3.6.1 Executive Overview
    3.6.2 The Fundamental Choice
    3.6.3 Analyzing the Strategic Value of Data Warehousing
    3.6.4 Addressing your Concerns
    3.6.5 Introducing FellowDSS™
  3.7 Summary
  3.8 Review Questions

4 Evolution and Scaling of Data Mining Algorithms
  4.1 Data-Driven Evolution of Data Mining Algorithms
    4.1.1 Transaction Data
    4.1.2 Data Streams
    4.1.3 Graph and Text-Based data
    4.1.4 Scientific Data
  4.2 Scaling Mining Algorithms to Large DataBases
    4.2.1 Prediction Methods
    4.2.2 Clustering
    4.2.3 Association Rules
    4.2.4 From Incremental Model Maintenance to Streaming Data
  4.3 Summary
  4.4 Review Questions

5 Emerging Trends and Applications of Data Mining
  5.1 Emerging Trends in Business Analytics
    5.1.1 Business Users
    5.1.2 The Driving Force
  5.2 Business Applications of Data Mining
  5.3 Emerging Scientific Applications in Data Mining
    5.3.1 Biomedical Engineering
    5.3.2 Telecommunications
    5.3.3 Geospatial Data
    5.3.4 Climate Data and the Earth’s Ecosystems
  5.4 Summary
  5.5 Review Questions

6 Data Mining Trends and Knowledge Discovery
  6.1 Getting a Handle on the Problem
  6.2 KDD and Data Mining: Background
  6.3 Related Fields
  6.4 Summary
  6.5 Review Questions

7 Data Mining Tasks, Techniques, and Applications
  7.1 Reality Check for Data Mining
    7.1.1 Data Mining Basics
    7.1.2 The Data Mining Process
    7.1.3 Data Mining Operations
    7.1.4 Discovery-Driven Data Mining Techniques:
  7.2 Data Mining: Tasks, Techniques, and Applications
    7.2.1 Data Mining Tasks
    7.2.2 Data Mining Techniques
    7.2.3 Applications
    7.2.4 Data Mining Applications – Survey
  7.3 Summary
  7.4 Review Questions

8 Data Mining: an Introduction – Case Study
  8.1 The Data Flood
  8.2 Data Holds Knowledge
    8.2.1 Decisions From the Data
  8.3 Data Mining: A New Approach to Information Overload
    8.3.1 Finding Patterns in Data, which we can use to Better, Conduct the Business
    8.3.2 Data Mining can be Breakthrough Technology
    8.3.3 Data Mining Process in an Information System
    8.3.4 Characteristics of Data Mining
    8.3.5 Data Mining Technology
    8.3.6 Technology Limitations
    8.3.7 BBC Case Study: The Importance of Business Knowledge
    8.3.8 Some Medical and Pharmaceutical Applications of Data Mining
    8.3.9 Why Does Data Mining Work?
  8.4 Summary
  8.5 Review Questions

9 Data Mining & KDD
  9.1 Data Mining and KDD – Overview
    9.1.1 The Idea of Knowledge Discovery in Databases (KDD)
    9.1.2 How Data Mining Relates to KDD
    9.1.3 The Data Mining Future
  9.2 Data Mining: The Two Cultures
    9.2.1 The Central Issue
    9.2.2 What are Data Mining and the Data Mining Process?
    9.2.3 Machine Learning
    9.2.4 Impact of Implementation
  9.3 Summary
  9.4 Review Questions

10 Statistical Themes and Lessons for Data Mining
  10.1 Data Mining and Official Statistics
    10.1.1 What is New in Data Mining is:
    10.1.2 Goals and Tools of Data Mining
    10.1.3 New Mines: Texts, Web, Symbolic Data?
    10.1.4 Applications in Official Statistics
  10.2 Statistical Themes and Lessons for Data Mining
    10.2.1 An Overview of Statistical Science
    10.2.2 Is Data Mining “Statistical Deja Vu” (All Over Again)?
    10.2.3 Characterizing Uncertainty
    10.2.4 What Can Go Wrong, Will Go Wrong
    10.2.5 Symbiosis in Statistics
  10.3 Summary
  10.4 Review Questions

11 Theoretical Frameworks for Data Mining
  11.1 Two Simple Approaches
    11.1.1 Probabilistic Approach
    11.1.2 Data Compression Approach
  11.2 Microeconomic View of Data Mining
  11.3 Inductive Databases
  11.4 Summary
  11.5 Review Questions

12 Major and Privacy Issues in Data Mining and Knowledge Discovery
  12.1 Major Issues in Data Mining
  12.2 Privacy Issues in Knowledge Discovery and Data Mining
    12.2.1 Revitalized Privacy Threats
    12.2.2 New Privacy Threats
    12.2.3 Possible Solutions
  12.3 The OECD Personal Privacy Guidelines
    12.3.1 Risks Privacy and the Principles of Data Protection
    12.3.2 The OECD Guidelines and Knowledge Discovery
    12.3.3 Knowledge Discovery about Groups
    12.3.4 Legal Systems and other Guidelines
  12.4 Summary
  12.5 Review Questions

13 Active Data Mining
  13.1 Shape Definitions
  13.2 Queries
  13.3 Triggers
    13.3.1 Wave Execution Semantics
  13.4 Summary
  13.5 Review Questions

14 Decomposition in Data Mining - A Case Study
  14.1 Decomposition in the Literature
    14.1.1 Machine Learning
  14.2 Typology of Decomposition in Data Mining
  14.3 Hybrid Models
  14.4 Knowledge Structuring
  14.5 Rule-Structuring Model
  14.6 Decision Tables, Maps, and Atlases
  14.7 Summary
  14.8 Review Questions

15 Data Mining System Products and Research Prototypes
  15.1 How to Choose a Data Mining System
  15.2 Examples of Commercial Data Mining Systems
  15.3 Summary
  15.4 Review Questions

16 Data Mining in Customer Value and Customer Relationship Management
  16.1 Data Mining: A Concept of Customer Relationship Marketing
    16.1.1 Traditional Marketing Research
    16.1.2 Relationship Marketing – the Modern View
    16.1.3 Understanding the Background of Data Mining
    16.1.4 Continuous Relationship Marketing
    16.1.5 Developing the Data Mining Project
    16.1.6 Further Research:
  16.2 Introduction to Customer Acquisition
Artificial Intelligence (UAI-2001), Morgan-Kaufmann, 2001 N Friedman and M Goldszmidt Learning Bayesian Networks From Data Tutorial, American National Conference on Artificial Intelligence (AAAI98), Madison, WI AAAI Press, San Mateo, CA, 1998 N Friedman, M Linial, I Nachman, and D Pe’er, Using Bayesian networks to analyze expression data In Proceedings of the Fourth Annual International Conference on Computational Molecular Biology (RECOMB 2000), ACMSIGACT, April 2000 D E Goldberg Genetic Algorithms in Search, Optimization, and Machine Learning Addison-Wesley, Reading, MA, 1989 C Guerra-Salcedo and D Whitley Genetic Approach to Feature Selection for Ensemble Creation In Proceedings of the 1999 International Conference on Genetic and Evolutionary Computation (GECCO-99) Morgan-Kaufmann, San Mateo, CA, 1999 D Heckerman, D Geiger, and D Chickering, Learning Bayesian networks: The combination of knowledge and statistical data Machine Learning, 20(3):197–243, Kluwer, 1995 R L Haupt and S E Haupt Practical Genetic Algorithms WileyInterscience, New York, NY, 1998 G Harik and F Lobo A parameter-less genetic algorithm Illinois Genetic Algorithms Laboratory technical report 99009, 1999 824 References W H Hsu, M Welge, T Redman, and D Clutter Genetic Wrappers for Constructive Induction in High-Performance Data Mining In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2000), Las Vegas, NV Morgan-Kaufmann, San Mateo, CA, 2000 W H Hsu, M Welge, T Redman, and D Clutter Constructive Induction Wrappers in High-Performance Commercial Data Mining and Decision Support Systems Knowledge Discovery and Data Mining, Kluwer, 2002 R Kohavi and G H John Wrappers for Feature Subset Selection Artificial Intelligence, Special Issue on Relevance, 97 (1–2):273–324, 1997 M Faupel GAJIT genetic algorithm package URL: http://www.angelfire.com/ca/Amnesiac/gajit.html, 2000 T M Mitchell Machine Learning McGraw-Hill, New York, NY, 1997 S L Lauritzen and D J Spiegelhalter Local 
computations with probabilities on graphical structures and their application to expert systems Journal of the Royal Statistical Society, Series B 50, 1988 R E Neapolitan Probabilistic Reasoning in Expert Systems: Theory and Applications Wiley-Interscience, New York, NY, 1990 R M Neal Probabilistic Inference Using Markov Chain Monte Carlo Methods Technical Report CRG-TR-93–1, Department of Computer Science, University of Toronto, 1993 J Pearl and T S Verma, A theory of inferred causation In Principles of Knowledge Representation and Reasoning: Proceedings of the Second International Conference Morgan Kaufmann, San Mateo, CA, 1991 M Raymer, W Punch, E Goodman, P Sanschagrin, and L Kuhn, Simultaneous Feature Extraction and Selection using a Masking Genetic Algorithm, In Proceedings of the th International Conference on Genetic Algorithms, pp 561–567, San Francisco, CA, July, 1997 R D Schacter and M A Peot Simulation approaches to general probabilistic inference on belief networks In Uncertainty in Artificial Intelligence 5, p 221–231, Elsevier Science Publishing Company, New York, NY, 1989 R L Welch Real-Time Estimation of Bayesian Networks In Proceedings of UAI-96, Morgan-Kaufmann, 1996 Aksoy, A And T.B Culver, Effect of sorption assumptions on aquifer remediation designs, Groundwater, 38(2), 200–208, 2000 Albert L.A., and Goldberg, D.E., Efficient Evaluation Genetic Algorithms under Fitness Functions, IlliGAL Report No 2001024, July 2001 Cantu’-Paz, E., A survey of Parallel Genetic Algorithms, Calculateurs Paralleles, Reseaux et Systems Repartis, Vol 10, No 2, pp 141–171, Paris: Hermes, 1998 Cantu’-Paz, E., Designing efficient and accurate parallel genetic algorithms, PhD thesis, 1999 Clement, T P., RT3D - A modular computer code for simulating reactive multi-species transport in 3-Dimensional groundwater aquifers, Battelle Pacific Northwest National Laboratory Research Report, PNNLSA-28967 (http://bioprocesses.pnl.gov/rt3d.htm.), 1997 References 825 Clement, T P, 
Sun, Y., Hooker, B S., and Petersen, J N., Modeling multispecies reactive transport in groundwater, Ground Water Monitoring and Remediation, 18(2), 79–92, 1998 Clement, T P., Johnson, C D., Sun, Y., Klecka, G M., and Bartlett, C., Natural attenuation of chlorinated solvent compounds: Model development and field-scale application, Journal of Contaminant Hydrology, 42, 113–140, 2000 Harik G R., Cantu-Paz E., Goldberg D E., and Miller B L., The gambler’s ruin problem, genetic algorithms and the sizing of populations, In Proceedings of the 1997 IEEE Conference on Evolutionary Computation, pp 7–12, IEEE press, New York, NY, 1997 Gopalakrishnan G., Minsker B., and Goldberg D.E., Optimal sampling in a Noisy Genetic Algorithm for Risk-Based Remediation Design, Journal of Hydroinformatics, in press, 2002 Grefenstette J.J and Fitzpatrick J.M., Genetic search with approximate function evaluations, In Grefenstette, J.J (Ed.), Proceedings of an International Conference on Genetic Algorithms and their Applications, pp 112–120, Hillsdale, NJ, 1985 Hogg, R., and Craig, A., Introduction to Mathematical Statistics Macmillan Publishing Co., Inc., New York, 1978 Liu, Y., and B S Minsker, “Efficient multiscale methods for optimal in situ bioremediation design.” Journal of Water Resources and Planning Management, in press, 2001 McDonald, M.G., and Harbaugh, A.W (1988) “A modular three-dimensional finite-difference ground-water flow model.” Techniques of Water Resources Investigations 06-A1, United States Geological Survey Reed P., Minsker B S., and Goldberg D E., Designing a competent simple genetic algorithm for search and optimization, Water Resources Research, 36(12), 3757–3761, 2000 Reed, P Striking the Balance: Long-Term Groundwater Monitoring Design for Multiple Conflicting Objectives, Ph D Thesis, University of Illinois, 2002 Ritzel, B.J., J.W Eheart, and S Ranjithan, Using genetic algorithms to solve a multiple objective groundwater pollution containment problem, Water Resources 
Research, 30(5), 1589–1603, 1994 Smalley J B., Minsker B S., and Goldberg D E., Riskbased In Situ bioremeditation design using a noisy genetic algorithm, Water Resources Research, 36(20), 3043-3052, 2000 Wang, Q.J., The genetic algorithm and its application to calibrating conceptual runoff models, Water Resource Research, 27(9), 2467–2471, 1991 Wang, M And C Zheng, Optimal remediation policy selection under general conditions, Groundwater, 35(5), 757–764, 1997 John R Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection, MIT Press, Cambridge, MA, USA, 1992 826 References Sean Luke, “Genetic programming produced competitive soccer softbot teams for robocup97,” in Genetic Programming 1998: Proceedings of the Third Annual Conference, John R Koza, Wolfgang Banzhaf, Kumar Chellapilla, Kalyanmoy Deb, Marco Dorigo, David B Fogel, Max H Garzon, David E Goldberg, Hitoshi Iba, and Rick Riolo, Eds., University of Wisconsin, Madison, Wisconsin, USA, 22–25 July 1998, pp 214–222, Morgan Kaufmann Reiko Tanese, “Parallel genetic algorithms for a hypercube,” in Proceedings of the Second International Conference on Genetic Algorithms, John J Grefenstette, Ed 1987, Lawrence Erlbaum Associates, Publishers T C Fogarty and R Huang, “Implementing the genetic algorithm on transputer based parallel processing systems,” in Parallel Problem Solving from Nature, Berlin, Germany, 1991, pp 145–149, Springer Verlag Reiko Tanese, Distributed Genetic Algorithms for Function Optimization, Ph.D thesis, University of Michigan, 1989, Computer Science and Engineering David E Goldberg, “Sizing populations for serial and parallel genetic algorithms,” in Proceedings of the Third International Conference on Genetic Algorithms, J D Schaffer, Ed., San Mateo, CA, 1989, Morgan Kaufman Erick Cant’u-Paz, “Designing efficient master-slave parallel genetic algorithms,” IllGAL Report 97004, The University of Illinois, 1997, Available on-line at: ftp://ftp-illigal.ge.uiuc.edu/pub/ 
papers/IlliGALs/97004.ps.Z Erick Cant’u-Paz, “Designing scalable multi-population parallel genetic algorithms,” IllGAL Report 98009, The University of Illinois, 1998, Available on-line at: ftp://ftp-illigal.ge.uiuc.edu/pub/ papers/IlliGALs/98009.ps.Z Ricardo Bianchini and Christopher Brown, “Parallel genetic algorithms on distributed-memory architectures,” Technical Report 436, The University of Rochester, The University of Rochester, Computer Science Department, Rochester, New York 14627, May 1993 Shyh-Chang Lin, William F Punch, and Erik D Goodman, “Coarse-grain parallel genetic algorithms: Categorization and new approach,” in Proceedings of the Sixth IEEE Symposium on Parallel and Distributed Processing, 1994, pp 28–37 Tsutomu Maruyama, Tetsuya Hirose, and Akihito Konagaya, “A fine-grained parallel genetic algorithm for distributed parallel systems,” in Proceedings of the Fifth International Conference on Genetic Algorithms, Stephanie Forrest, Ed., San Mateo, CA, 1993, pp 184–190, Morgan Kaufman John J Grefenstette, Michael R Leuze, and Chrisila B Pettey, “A parallel genetic algorithm,” in Proceedings of the Second International Conference on Genetic Algorithms, John J Grefenstette, Ed 1987, pp 155–161, Lawrence Erlbaum Associates, Publishers (Hillsdale, NJ) T C Belding, “The distributed genetic algorithm revisited,” in Proceedings of the Sixth International Conference on Genetic Algorithms, L Eschelman, Ed 1995, pp 114– 121, Morgan Kaufmann (San Francisco, CA) David E Goldberg, Kerry Zakrzewski, Brad Sutton, Ross Gadient, Cecilia Chang, Pillar Gallego, Brad Miller, and Eric Cant’u-Paz, “Genetic algo- References 827 rithms: A bibliography,” IlliGAL Report 97011, Illinois Genetic Algorithms Lab University of Illinois at Urbana-Champaign, December 1997 Erick Cant’u-Paz, “A survey of parallel genetic algorithms,” IllGAL Report 97003, The University of Illinois, 1997, Available on-line at: ftp://ftp-illigal.ge.uiuc.edu/pub/papers/ IlliGALs/97003.ps.Z Mariusz 
Nowostawski, “Parallel genetic algorithms in geometry atomic cluster optimisation and other applications,” M.S thesis, School of Computer Science, The University of Birmingham, UK, September 1998, http://studentweb.cs.bham ac.uk/˜mxn/gzipped/mpga-v0.1.ps.gz Shumeet Baluja, “A massively distributed parallel genetic algorithm (mdpga),” Technical Report CMU-CS-92-196R, Carnagie Mellon University, Carnagie Mellon University, Pittsburg, PA, 1992 Shumeet Baluja, “The evolution of genetic algorithms: Towards massive parallelism,” in Proceedings of the Tenth International Conference on Machine Learning, San Mateo, CA, 1993, pp 1–8, Morgan Kaufmann David Goldberg, Kalyanmoy Deb, and Bradley Korb, “Messy genetic algorithms: Motiation, analysis, and first results,” Complex Systems, vol 3, pp 493–530, 1989 Bianchini, R., C Brown Parallel Genetic Algorithms on Distributed-Memory Architectures Technical Report 436, Computer Science Department University of Rochester, Rochester NY, August 1992 Carriero, N., and D Gelernter How to Write Parallel Programs: A First Course Massachusetts: MIT Press, 1991 J G Elias, “Genetic generation of connection patterns for a dynamic artificial neural network,” in COGANN-92, Combinations of Genetic Algorithms and Neural Networks, eds L D Whitley and J D Schaffer, IEEE Computer Society Press, Los Alamitos, CA, pp 38–54, 1992 Brunk, C., Kelly, J & Kohavi, R (1997), MineSet: an integrated system for data mining, in D Heckerman, H Mannila, D Pregibon & R Uthurusamy, eds, ‘Proceedings of the third international conference on Knowledge Discovery and Data Mining’, AAAI Press, pp 135 to 138 http://mineset.sgi.com Fayyad, U M., Piatetsky-Shapiro, G & Smyth, P (1996), ‘The KDD process for extracting useful knowledge from volumes of data’, Communications of the ACM 39(11), 27 to 34 Kohavi, R & Kunz, C (1997), Option decision trees with majority votes, in D Fisher, ed., ‘Machine Learning: Proceedings of the Fourteenth International Conference’, Morgan 
Kaufmann Publishers, Inc., pp 161 to 169 http://robotics.stanford.edu/users/ronnyk Kohavi, R & Sommerfield, D (1998), Targeting business users with decision table classifiers, in R Agrawal, P Stolorz & G Piatetsky-Shapiro, eds, ‘Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining’, AAAI Press, pp 249 to 253 Kohavi, R., Sommerfield, D & Dougherty, J (1997), ‘Data mining using MLC++: A machine learning library in C++’, International Journal on 828 References Artificial Intelligence Tools 6(4), 537 to 566 http://www.sgi.com/Technology/mlc Quinlan, J R (1993), C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo, California Silicon Graphics (1998), MineSet User’s Guide, Silicon Graphics, Inc http://mineset.sgi.com Java Specification Request 73: Java Data Mining (JDM), Version 1.0, Final Review XML for Analysis Specification version 1.0 Predictive Model Markup Language, Version 2.1.2, http://www.dmg.org OLE DB for Data Mining Specification, Version 1.0 SOAP Version 1.2, http://www.w3.org/TR/soap/ WS-Security, http://www-106.ibm.com/developerworks/webservices/library/ ws-secure/ WS-Resource Framework, http://www.globus.org/wsrf/ XML Specification, http://www.w3.org/TR/2000/REC-xml-20001006 The Data-Mining Industry Coming Of Age Gregory Piatetsky-Shapiro, Knowledge Stream Partnerswww.kdnuggets.com/ gpspubs/ieee-intelligentdec-1999-x6032.pdf Current issues in modeling Data Mining processes and results Panos Xeros [pxeros@cti.gr]& Yannis Theodoridis [ytheod@cti.gr] PANDA informal meeting, Athens,19 June 2002 dke.cti.gr/panda/tasks/meetings/2002–06Athens-informal/CTIpresentation-Athens-19June02.ppt The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration I Foster, C Kesselman, J Nick, S Tuecke, Open Grid Service Infrastructure WG, Global Grid Forum, June 22, 2002 Java Specification Request 73: Java Data Mining (JDM)–JDM Public review Draft 2003/11/25: JSR-73 Expert Group ... 