Data Mining: Special Issue in Annals of Information Systems (Stahlbock, Crone, Lessmann), 2009-11-23

DOCUMENT INFORMATION

Basic information

Format
Pages	402
File size	13.04 MB

Content

Annals of Information Systems

Series Editors

Ramesh Sharda
Oklahoma State University
Stillwater, OK, USA

Stefan Voß
University of Hamburg
Hamburg, Germany

For further volumes: http://www.springer.com/series/7573

Robert Stahlbock · Sven F. Crone · Stefan Lessmann
Editors

Data Mining
Special Issue in Annals of Information Systems

Editors

Robert Stahlbock
Department of Business Administration
University of Hamburg
Institute of Information Systems
Von-Melle-Park
20146 Hamburg
Germany
stahlbock@econ.uni-hamburg.de

Sven F. Crone
Department of Management Science
Lancaster University Management School
Lancaster
United Kingdom LA1 4YX
sven.f.crone@crone.de

Stefan Lessmann
Department of Business Administration
University of Hamburg
Institute of Information Systems
Von-Melle-Park
20146 Hamburg
Germany
lessmann@econ.uni-hamburg.de

ISSN 1934-3221
e-ISSN 1934-3213
ISBN 978-1-4419-1279-4
e-ISBN 978-1-4419-1280-0
DOI 10.1007/978-1-4419-1280-0
Springer New York Dordrecht Heidelberg London
Library of Congress Control Number: 2009910538

© Springer Science+Business Media, LLC 2010

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Preface

Data mining has experienced an explosion of interest over the last two decades. It has been established as a sound paradigm to derive knowledge from large, heterogeneous streams of data, often using computationally intensive methods. It continues to attract researchers from multiple disciplines, including computer sciences, statistics, operations research, information systems, and management science. Successful applications include domains as diverse as corporate planning, medical decision making, bioinformatics, web mining, text recognition, speech recognition, and image recognition, as well as various corporate planning problems such as customer churn prediction, target selection for direct marketing, and credit scoring.

Research in information systems equally reflects this inter- and multidisciplinary approach. Information systems research exceeds the software and hardware systems that support data-intensive applications, analyzing the systems of individuals, data, and all manual or automated activities that process the data and information in a given organization.

The Annals of Information Systems devotes a special issue to topics at the intersection of information systems and data mining in order to explore the synergies between information systems and data mining. This issue serves as a follow-up to the International Conference on Data Mining (DMIN), which is held annually in conjunction with WORLDCOMP, the largest annual gathering of researchers in computer science, computer engineering, and applied computing. The special issue includes significantly extended versions of prior DMIN submissions as well as contributions without DMIN context.

We would like to thank the members of the DMIN program committee. Their support was essential for the quality of the conferences and for attracting interesting contributions. We wish to express our sincere gratitude and respect toward Hamid R. Arabnia, general chair of all WORLDCOMP conferences, for his excellent and tireless support, organization, and coordination of all WORLDCOMP conferences. Moreover, we would like to thank the two series editors, Ramesh Sharda and Stefan Voß, for their valuable advice, support, and encouragement. We are grateful for the pleasant cooperation with Neil Levine, Carolyn Ford, and Matthew Amboy from Springer and their professional support in publishing this volume. In addition, we would like to thank the reviewers for their time and their thoughtful reviews. Finally, we would like to thank all authors who submitted their work for consideration to this focused issue. Their contributions made this special issue possible.

Hamburg, Germany    Robert Stahlbock
Hamburg, Germany    Stefan Lessmann
Lancaster, UK       Sven F. Crone

Contents

Data Mining and Information Systems: Quo Vadis?
Robert Stahlbock, Stefan Lessmann, and Sven F. Crone
    1.1 Introduction
    1.2 Special Issues in Data Mining
        1.2.1 Confirmatory Data Analysis
        1.2.2 Knowledge Discovery from Supervised Learning
        1.2.3 Classification Analysis
        1.2.4 Hybrid Data Mining Procedures
        1.2.5 Web Mining
        1.2.6 Privacy-Preserving Data Mining
    1.3 Conclusion and Outlook
    References

Part I Confirmatory Data Analysis

Response-Based Segmentation Using Finite Mixture Partial Least Squares
Christian M. Ringle, Marko Sarstedt, and Erik A. Mooi
    2.1 Introduction
        2.1.1 On the Use of PLS Path Modeling
        2.1.2 Problem Statement
        2.1.3 Objectives and Organization
    2.2 Partial Least Squares Path Modeling
    2.3 Finite Mixture Partial Least Squares Segmentation
        2.3.1 Foundations
        2.3.2 Methodology
        2.3.3 Systematic Application of FIMIX-PLS
    2.4 Application of FIMIX-PLS
        2.4.1 On Measuring Customer Satisfaction
        2.4.2 Data and Measures
        2.4.3 Data Analysis and Results
    2.5 Summary and Conclusion
    References

Part II Knowledge Discovery from Supervised Learning

Building Acceptable Classification Models
David Martens and Bart Baesens
    3.1 Introduction
    3.2 Comprehensibility of Classification Models
        3.2.1 Measuring Comprehensibility
        3.2.2 Obtaining Comprehensible Classification Models
    3.3 Justifiability of Classification Models
        3.3.1 Taxonomy of Constraints
        3.3.2 Monotonicity Constraint
        3.3.3 Measuring Justifiability
        3.3.4 Obtaining Justifiable Classification Models
    3.4 Conclusion
    References

Mining Interesting Rules Without Support Requirement: A General Universal Existential Upward Closure Property
Yannick Le Bras, Philippe Lenca, and Stéphane Lallich
    4.1 Introduction
    4.2 State of the Art
    4.3 An Algorithmic Property of Confidence
        4.3.1 On UEUC Framework
        4.3.2 The UEUC Property
        4.3.3 An Efficient Pruning Algorithm
        4.3.4 Generalizing the UEUC Property
    4.4 A Framework for the Study of Measures
        4.4.1 Adapted Functions of Measure
        4.4.2 Expression of a Set of Measures of Ddconf
    4.5 Conditions for GUEUC
        4.5.1 A Sufficient Condition
        4.5.2 A Necessary Condition
        4.5.3 Classification of the Measures
    4.6 Conclusion
    References

Classification Techniques and Error Control in Logic Mining
Giovanni Felici, Bruno Simeone, and Vincenzo Spinelli
    5.1 Introduction
    5.2 Brief Introduction to Box Clustering
    5.3 BC-Based Classifier
    5.4 Best Choice of a Box System
    5.5 Bi-criterion Procedure for BC-Based Classifier
    5.6 Examples
        5.6.1 The Data Sets
        5.6.2 Experimental Results with BC
        5.6.3 Comparison with Decision Trees
    5.7 Conclusions
    References

Part III Classification Analysis

An Extended Study of the Discriminant Random Forest
Tracy D. Lemmond, Barry Y. Chen, Andrew O. Hatch, and William G. Hanley
    6.1 Introduction
    6.2 Random Forests
    6.3 Discriminant Random Forests
        6.3.1 Linear Discriminant Analysis
        6.3.2 The Discriminant Random Forest Methodology
    6.4 DRF and RF: An Empirical Study
        6.4.1 Hidden Signal Detection
        6.4.2 Radiation Detection
        6.4.3 Significance of Empirical Results
        6.4.4 Small Samples and Early Stopping
        6.4.5 Expected Cost
    6.5 Conclusions
    References

Prediction with the SVM Using Test Point Margins
Süreyya Özöğür-Akyüz, Zakria Hussain, and John Shawe-Taylor
    7.1 Introduction
    7.2 Methods
    7.3 Data Set Description
    7.4 Results
    7.5 Discussion and Future Work
    References

Effects of Oversampling Versus Cost-Sensitive Learning for Bayesian and SVM Classifiers
Alexander Liu, Cheryl Martin, Brian La Cour, and Joydeep Ghosh
    8.1 Introduction
    8.2 Resampling
        8.2.1 Random Oversampling
        8.2.2 Generative Oversampling
    8.3 Cost-Sensitive Learning
    8.4 Related Work
    8.5 A Theoretical Analysis of Oversampling Versus Cost-Sensitive Learning
    ...

... while maintaining the efficiency and feasibility of a rule mining algorithm. The field of logic mining represents a special form of classification rule mining in the sense that the resulting models

Date posted: 23/10/2019, 15:16
