
IT Training Applied Data Mining [Xu, Zong & Yang 2013-06-17]



DOCUMENT INFORMATION

Basic information

Format
Pages 284
File size 5.31 MB

Content

Applied Data Mining

Guandong Xu
University of Technology Sydney
Sydney, Australia

Yu Zong
West Anhui University
Luan, China

Zhenglu Yang
The University of Tokyo
Tokyo, Japan

A SCIENCE PUBLISHERS BOOK

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2013 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Version Date: 20130604

International Standard Book Number-13: 978-1-4665-8584-3 (eBook - PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

Preface

The data era is here. It provides a wealth of opportunities, but also poses challenges for the effective and efficient utilization of huge volumes of data. Data mining research is necessary to derive useful information from large data. The book reviews applied data mining from theoretical basis to practical applications.

The book consists of three main parts: Fundamentals, Advanced Data Mining, and Emerging Applications. In the first part, the authors introduce and review the fundamental concepts and mathematical models which are commonly used in data mining. There are five chapters in this section, which lay a solid base and prepare the necessary skills and approaches for further understanding the remaining parts of the book. The second part comprises three chapters and addresses the topics of advanced clustering, multi-label classification, and privacy preserving, which are all hot topics in applied data mining. In the final part, the authors present some recent emerging applications of applied data mining, i.e., data stream, recommender systems, and social tagging annotation systems. This part introduces the contents in a sequence of theoretical background, state-of-the-art techniques, application cases, and future research directions.

This book combines the fundamental concepts, models, and algorithms in the data mining domain, to serve as a reference for researchers and practitioners from backgrounds as diverse as computer science, machine learning, information systems, artificial intelligence, statistics, operational science, business intelligence, and social science disciplines. Furthermore, this book provides a compilation and summarization for disseminating and reviewing the recent emerging advances in a variety of data mining application arenas, such as advanced data mining, analytics, internet computing, recommender systems, social computing, and applied informatics, from the perspective of developmental practice for emerging research and practical applications. This book will also be useful as a textbook for postgraduate students and senior undergraduate students in related areas.

This book features the following topics:

• Systematically presents and discusses the mathematical background and representative algorithms for data mining, information retrieval, and internet computing.
• Thoroughly reviews the related studies and outcomes conducted on the addressed topics.
• Substantially demonstrates various important applications in the areas of classical data mining, advanced data mining, and emerging research topics such as stream data mining, recommender systems, and social computing.
• Heuristically outlines the open research issues of interdisciplinary research topics, and identifies several future research directions that readers may be interested in.

April 2013
Guandong Xu
Yu Zong
Zhenglu Yang

Contents

Preface

Part I: Fundamentals

1 Introduction
  1.1 Background
    1.1.1 Data Mining: Definitions and Concepts
    1.1.2 Data Mining Process
    1.1.3 Data Mining Algorithms
  1.2 Organization of the Book
    1.2.1 Part 1: Fundamentals
    1.2.2 Part 2: Advanced Data Mining
    1.2.3 Part 3: Emerging Applications
  1.3 The Audience of the Book

2 Mathematical Foundations
  2.1 Organization of Data
    2.1.1 Boolean Model
    2.1.2 Vector Space Model
    2.1.3 Graph Model
    2.1.4 Other Data Structures
  2.2 Data Distribution
    2.2.1 Univariate Distribution
    2.2.2 Multivariate Distribution
  2.3 Distance Measures
    2.3.1 Jaccard distance
    2.3.2 Euclidean Distance
    2.3.3 Minkowski Distance
    2.3.4 Chebyshev Distance
    2.3.5 Mahalanobis Distance
  2.4 Similarity Measures
    2.4.1 Cosine Similarity
    2.4.2 Adjusted Cosine Similarity
    2.4.3 Kullback-Leibler Divergence
    2.4.4 Model-based Measures
  2.5 Dimensionality Reduction
    2.5.1 Principal Component Analysis
    2.5.2 Independent Component Analysis
    2.5.3 Non-negative Matrix Factorization
    2.5.4 Singular Value Decomposition
  2.6 Chapter Summary

3 Data Preparation
  3.1 Attribute Selection
    3.1.1 Feature Selection
    3.1.2 Discretizing Numeric Attributes
  3.2 Data Cleaning and Integrity
    3.2.1 Missing Values
    3.2.2 Detecting Anomalies
    3.2.3 Applications
  3.3 Multiple Model Integration
    3.3.1 Data Federation
    3.3.2 Bagging and Boosting
  3.4 Chapter Summary

4 Clustering Analysis
  4.1 Clustering Analysis
  4.2 Types of Data in Clustering Analysis
    4.2.1 Data Matrix
    4.2.2 The Proximity Matrix
  4.3 Traditional Clustering Algorithms
    4.3.1 Partitional methods
    4.3.2 Hierarchical Methods
    4.3.3 Density-based methods
    4.3.4 Grid-based Methods
    4.3.5 Model-based Methods
  4.4 High-dimensional clustering algorithm
    4.4.1 Bottom-up Approaches
    4.4.2 Top-down Approaches
    4.4.3 Other Methods
  4.5 Constraint-based Clustering Algorithm
    4.5.1 COP K-means
    4.5.2 MPCK-means
    4.5.3 AFCC
  4.6 Consensus Clustering Algorithm
    4.6.1 Consensus Clustering Framework
    4.6.2 Some Consensus Clustering Methods
  4.7 Chapter Summary

5 Classification
  5.1 Classification Definition and Related Issues
  5.2 Decision Tree and Classification
    5.2.1 Decision Tree
    5.2.2 Decision Tree Classification
    5.2.3 Hunt’s Algorithm
  5.3 Bayesian Network and Classification
    5.3.1 Bayesian Network
    5.3.2 Backpropagation and Classification
    5.3.3 Association-based Classification
    5.3.4 Support Vector Machines and Classification
  5.4 Chapter Summary

6 Frequent Pattern Mining
  6.1 Association Rule Mining
    6.1.1 Association Rule Mining Problem
    6.1.2 Basic Algorithms for Association Rule Mining
  6.2 Sequential Pattern Mining
    6.2.1 Sequential Pattern Mining Problem
    6.2.2 Existing Sequential Pattern Mining Algorithms
  6.3 Frequent Subtree Mining
    6.3.1 Frequent Subtree Mining Problem
    6.3.2 Data Structures for Storing Trees
    6.3.3 Maximal and closed frequent subtrees
  6.4 Frequent Subgraph Mining
    6.4.1 Problem Definition
    6.4.2 Graph Representation
    6.4.3 Candidate Generation
    6.4.4 Frequent Subgraph Mining Algorithms
  6.5 Chapter Summary

Part II: Advanced Data Mining

7 Advanced Clustering Analysis
  7.1 Introduction
  7.2 Space Smoothing Search Methods in Heuristic Clustering
    7.2.1 Smoothing Search Space and Smoothing Operator
    7.2.2 Clustering Algorithm based on Smoothed Search Space
  7.3 Using Approximate Backbone for Initializations in Clustering
    7.3.1 Definitions and Background of Approximate Backbone
    7.3.2 Heuristic Clustering Algorithm based on Approximate Backbone
  7.4 Improving Clustering Quality in High Dimensional Space
    7.4.1 Overview of High Dimensional Clustering
...

... data mining algorithms to deal with the change of temporal and spatial data within the database. The representative methods and applications include sequential pattern mining ...

Date posted: 05/11/2019, 13:06