
Data Mining and Statistics for Decision Making [Tufféry 2011-04-18]




DOCUMENT INFORMATION

Structure

  • Data Mining and Statistics for Decision Making

    • Contents

    • Preface

    • Foreword

    • Foreword from the French language edition

    • List of trademarks

    • 1 Overview of data mining

      • 1.1 What is data mining?

      • 1.2 What is data mining used for?

        • 1.2.1 Data mining in different sectors

        • 1.2.2 Data mining in different applications

      • 1.3 Data mining and statistics

      • 1.4 Data mining and information technology

      • 1.5 Data mining and protection of personal data

      • 1.6 Implementation of data mining

    • 2 The development of a data mining study

      • 2.1 Defining the aims

      • 2.2 Listing the existing data

      • 2.3 Collecting the data

      • 2.4 Exploring and preparing the data

      • 2.5 Population segmentation

      • 2.6 Drawing up and validating predictive models

      • 2.7 Synthesizing predictive models of different segments

      • 2.8 Iteration of the preceding steps

      • 2.9 Deploying the models

      • 2.10 Training the model users

      • 2.11 Monitoring the models

      • 2.12 Enriching the models

      • 2.13 Remarks

      • 2.14 Life cycle of a model

      • 2.15 Costs of a pilot project

    • 3 Data exploration and preparation

      • 3.1 The different types of data

      • 3.2 Examining the distribution of variables

      • 3.3 Detection of rare or missing values

      • 3.4 Detection of aberrant values

      • 3.5 Detection of extreme values

      • 3.6 Tests of normality

      • 3.7 Homoscedasticity and heteroscedasticity

      • 3.8 Detection of the most discriminating variables

        • 3.8.1 Qualitative, discrete or binned independent variables

        • 3.8.2 Continuous independent variables

        • 3.8.3 Details of single-factor non-parametric tests

        • 3.8.4 ODS and automated selection of discriminating variables

      • 3.9 Transformation of variables

      • 3.10 Choosing ranges of values of binned variables

      • 3.11 Creating new variables

      • 3.12 Detecting interactions

      • 3.13 Automatic variable selection

      • 3.14 Detection of collinearity

      • 3.15 Sampling

        • 3.15.1 Using sampling

        • 3.15.2 Random sampling methods

    • 4 Using commercial data

      • 4.1 Data used in commercial applications

        • 4.1.1 Data on transactions and RFM data

        • 4.1.2 Data on products and contracts

        • 4.1.3 Lifetimes

        • 4.1.4 Data on channels

        • 4.1.5 Relational, attitudinal and psychographic data

        • 4.1.6 Sociodemographic data

        • 4.1.7 When data are unavailable

        • 4.1.8 Technical data

      • 4.2 Special data

        • 4.2.1 Geodemographic data

        • 4.2.2 Profitability

      • 4.3 Data used by business sector

        • 4.3.1 Data used in banking

        • 4.3.2 Data used in insurance

        • 4.3.3 Data used in telephony

        • 4.3.4 Data used in mail order

    • 5 Statistical and data mining software

      • 5.1 Types of data mining and statistical software

      • 5.2 Essential characteristics of the software

        • 5.2.1 Points of comparison

        • 5.2.2 Methods implemented

        • 5.2.3 Data preparation functions

        • 5.2.4 Other functions

        • 5.2.5 Technical characteristics

      • 5.3 The main software packages

        • 5.3.1 Overview

        • 5.3.2 IBM SPSS

        • 5.3.3 SAS

        • 5.3.4 R

        • 5.3.5 Some elements of the R language

      • 5.4 Comparison of R, SAS and IBM SPSS

      • 5.5 How to reduce processing time

    • 6 An outline of data mining methods

      • 6.1 Classification of the methods

      • 6.2 Comparison of the methods

    • 7 Factor analysis

      • 7.1 Principal component analysis

        • 7.1.1 Introduction

        • 7.1.2 Representation of variables

        • 7.1.3 Representation of individuals

        • 7.1.4 Use of PCA

        • 7.1.5 Choosing the number of factor axes

        • 7.1.6 Summary

      • 7.2 Variants of principal component analysis

        • 7.2.1 PCA with rotation

        • 7.2.2 PCA of ranks

        • 7.2.3 PCA on qualitative variables

      • 7.3 Correspondence analysis

        • 7.3.1 Introduction

        • 7.3.2 Implementing CA with IBM SPSS Statistics

      • 7.4 Multiple correspondence analysis

        • 7.4.1 Introduction

        • 7.4.2 Review of CA and MCA

        • 7.4.3 Implementing MCA and CA with SAS

    • 8 Neural networks

      • 8.1 General information on neural networks

      • 8.2 Structure of a neural network

      • 8.3 Choosing the learning sample

      • 8.4 Some empirical rules for network design

      • 8.5 Data normalization

        • 8.5.1 Continuous variables

        • 8.5.2 Discrete variables

        • 8.5.3 Qualitative variables

      • 8.6 Learning algorithms

      • 8.7 The main neural networks

        • 8.7.1 The multilayer perceptron

        • 8.7.2 The radial basis function network

        • 8.7.3 The Kohonen network

    • 9 Cluster analysis

      • 9.1 Definition of clustering

      • 9.2 Applications of clustering

      • 9.3 Complexity of clustering

      • 9.4 Clustering structures

        • 9.4.1 Structure of the data to be clustered

        • 9.4.2 Structure of the resulting clusters

      • 9.5 Some methodological considerations

        • 9.5.1 The optimum number of clusters

        • 9.5.2 The use of certain types of variables

        • 9.5.3 The use of illustrative variables

        • 9.5.4 Evaluating the quality of clustering

        • 9.5.5 Interpreting the resulting clusters

        • 9.5.6 The criteria for correct clustering

      • 9.6 Comparison of factor analysis and clustering

      • 9.7 Within-cluster and between-cluster sum of squares

      • 9.8 Measurements of clustering quality

        • 9.8.1 All types of clustering

        • 9.8.2 Agglomerative hierarchical clustering

      • 9.9 Partitioning methods

        • 9.9.1 The moving centres method

        • 9.9.2 k-means and dynamic clouds

        • 9.9.3 Processing qualitative data

        • 9.9.4 k-medoids and their variants

        • 9.9.5 Advantages of the partitioning methods

        • 9.9.6 Disadvantages of the partitioning methods

        • 9.9.7 Sensitivity to the choice of initial centres

      • 9.10 Agglomerative hierarchical clustering

        • 9.10.1 Introduction

        • 9.10.2 The main distances used

        • 9.10.3 Density estimation methods

        • 9.10.4 Advantages of agglomerative hierarchical clustering

        • 9.10.5 Disadvantages of agglomerative hierarchical clustering

      • 9.11 Hybrid clustering methods

        • 9.11.1 Introduction

        • 9.11.2 Illustration using SAS Software

      • 9.12 Neural clustering

        • 9.12.1 Advantages

        • 9.12.2 Disadvantages

      • 9.13 Clustering by similarity aggregation

        • 9.13.1 Principle of relational analysis

        • 9.13.2 Implementing clustering by similarity aggregation

        • 9.13.3 Example of use of the R amap package

        • 9.13.4 Advantages of clustering by similarity aggregation

        • 9.13.5 Disadvantages of clustering by similarity aggregation

      • 9.14 Clustering of numeric variables

      • 9.15 Overview of clustering methods

    • 10 Association analysis

      • 10.1 Principles

      • 10.2 Using taxonomy

      • 10.3 Using supplementary variables

      • 10.4 Applications

      • 10.5 Example of use

    • 11 Classification and prediction methods

      • 11.1 Introduction

      • 11.2 Inductive and transductive methods

      • 11.3 Overview of classification and prediction methods

        • 11.3.1 The qualities expected from a classification or prediction method

        • 11.3.2 Generalizability

        • 11.3.3 Vapnik’s learning theory

        • 11.3.4 Overfitting

      • 11.4 Classification by decision tree

        • 11.4.1 Principle of the decision trees

        • 11.4.2 Definitions – the first step in creating the tree

        • 11.4.3 Splitting criterion

        • 11.4.4 Distribution among nodes – the second step in creating the tree

        • 11.4.5 Pruning – the third step in creating the tree

        • 11.4.6 A pitfall to avoid

        • 11.4.7 The CART, C5.0 and CHAID trees

        • 11.4.8 Advantages of decision trees

        • 11.4.9 Disadvantages of decision trees

      • 11.5 Prediction by decision tree

      • 11.6 Classification by discriminant analysis

        • 11.6.1 The problem

        • 11.6.2 Geometric descriptive discriminant analysis (discriminant factor analysis)

        • 11.6.3 Geometric predictive discriminant analysis

        • 11.6.4 Probabilistic discriminant analysis

        • 11.6.5 Measurements of the quality of the model

        • 11.6.6 Syntax of discriminant analysis in SAS

        • 11.6.7 Discriminant analysis on qualitative variables (DISQUAL Method)

        • 11.6.8 Advantages of discriminant analysis

        • 11.6.9 Disadvantages of discriminant analysis

      • 11.7 Prediction by linear regression

        • 11.7.1 Simple linear regression

        • 11.7.2 Multiple linear regression and regularized regression

        • 11.7.3 Tests in linear regression

        • 11.7.4 Tests on residuals

        • 11.7.5 The influence of observations

        • 11.7.6 Example of linear regression

        • 11.7.7 Further details of the SAS linear regression syntax

        • 11.7.8 Problems of collinearity in linear regression: an example using R

        • 11.7.9 Problems of collinearity in linear regression: diagnosis and solutions

        • 11.7.10 PLS regression

        • 11.7.11 Handling regularized regression with SAS and R

        • 11.7.12 Robust regression

        • 11.7.13 The general linear model

      • 11.8 Classification by logistic regression

        • 11.8.1 Principles of binary logistic regression

        • 11.8.2 Logit, probit and log-log logistic regressions

        • 11.8.3 Odds ratios

        • 11.8.4 Illustration of division into categories

        • 11.8.5 Estimating the parameters

        • 11.8.6 Deviance and quality measurement in a model

        • 11.8.7 Complete separation in logistic regression

        • 11.8.8 Statistical tests in logistic regression

        • 11.8.9 Effect of division into categories and choice of the reference category

        • 11.8.10 Effect of collinearity

        • 11.8.11 The effect of sampling on logit regression

        • 11.8.12 The syntax of logistic regression in SAS Software

        • 11.8.13 An example of modelling by logistic regression

        • 11.8.14 Logistic regression with R

        • 11.8.15 Advantages of logistic regression

        • 11.8.16 Advantages of the logit model compared with probit

        • 11.8.17 Disadvantages of logistic regression

      • 11.9 Developments in logistic regression

        • 11.9.1 Logistic regression on individuals with different weights

        • 11.9.2 Logistic regression with correlated data

        • 11.9.3 Ordinal logistic regression

        • 11.9.4 Multinomial logistic regression

        • 11.9.5 PLS logistic regression

        • 11.9.6 The generalized linear model

        • 11.9.7 Poisson regression

        • 11.9.8 The generalized additive model

      • 11.10 Bayesian methods

        • 11.10.1 The naive Bayesian classifier

        • 11.10.2 Bayesian networks

      • 11.11 Classification and prediction by neural networks

        • 11.11.1 Advantages of neural networks

        • 11.11.2 Disadvantages of neural networks

      • 11.12 Classification by support vector machines

        • 11.12.1 Introduction to SVMs

        • 11.12.2 Example

        • 11.12.3 Advantages of SVMs

        • 11.12.4 Disadvantages of SVMs

      • 11.13 Prediction by genetic algorithms

        • 11.13.1 Random generation of initial rules

        • 11.13.2 Selecting the best rules

        • 11.13.3 Generating new rules

        • 11.13.4 End of the algorithm

        • 11.13.5 Applications of genetic algorithms

        • 11.13.6 Disadvantages of genetic algorithms

      • 11.14 Improving the performance of a predictive model

      • 11.15 Bootstrapping and ensemble methods

        • 11.15.1 Bootstrapping

        • 11.15.2 Bagging

        • 11.15.3 Boosting

        • 11.15.4 Some applications

        • 11.15.5 Conclusion

      • 11.16 Using classification and prediction methods

        • 11.16.1 Choosing the modelling methods

        • 11.16.2 The training phase of a model

        • 11.16.3 Reject inference

        • 11.16.4 The test phase of a model

        • 11.16.5 The ROC curve, the lift curve and the Gini index

        • 11.16.6 The classification table of a model

        • 11.16.7 The validation phase of a model

        • 11.16.8 The application phase of a model

    • 12 An application of data mining: scoring

      • 12.1 The different types of score

      • 12.2 Using propensity scores and risk scores

      • 12.3 Methodology

        • 12.3.1 Determining the objectives

        • 12.3.2 Data inventory and preparation

        • 12.3.3 Creating the analysis base

        • 12.3.4 Developing a predictive model

        • 12.3.5 Using the score

        • 12.3.6 Deploying the score

        • 12.3.7 Monitoring the available tools

      • 12.4 Implementing a strategic score

      • 12.5 Implementing an operational score

      • 12.6 Scoring solutions used in a business

        • 12.6.1 In-house or outsourced?

        • 12.6.2 Generic or personalized score

        • 12.6.3 Summary of the possible solutions

      • 12.7 An example of credit scoring (data preparation)

      • 12.8 An example of credit scoring (modelling by logistic regression)

      • 12.9 An example of credit scoring (modelling by DISQUAL discriminant analysis)

      • 12.10 A brief history of credit scoring

      • References

    • 13 Factors for success in a data mining project

      • 13.1 The subject

      • 13.2 The people

      • 13.3 The data

      • 13.4 The IT systems

      • 13.5 The business culture

      • 13.6 Data mining: eight common misconceptions

        • 13.6.1 No a priori knowledge is needed

        • 13.6.2 No specialist staff are needed

        • 13.6.3 No statisticians are needed (‘you can just press a button’)

        • 13.6.4 Data mining will reveal unbelievable wonders

        • 13.6.5 Data mining is revolutionary

        • 13.6.6 You must use all the available data

        • 13.6.7 You must always sample

        • 13.6.8 You must never sample

      • 13.7 Return on investment

    • 14 Text mining

      • 14.1 Definition of text mining

      • 14.2 Text sources used

      • 14.3 Using text mining

      • 14.4 Information retrieval

        • 14.4.1 Linguistic analysis

        • 14.4.2 Application of statistics and data mining

        • 14.4.3 Suitable methods

      • 14.5 Information extraction

        • 14.5.1 Principles of information extraction

        • 14.5.2 Example of application: transcription of business interviews

      • 14.6 Multi-type data mining

    • 15 Web mining

      • 15.1 The aims of web mining

      • 15.2 Global analyses

        • 15.2.1 What can they be used for?

        • 15.2.2 The structure of the log file

        • 15.2.3 Using the log file

      • 15.3 Individual analyses

      • 15.4 Personal analysis

    • Appendix A: Elements of statistics

      • A.1 A brief history

        • A.1.1 A few dates

        • A.1.2 From statistics . . . to data mining

      • A.2 Elements of statistics

        • A.2.1 Statistical characteristics

        • A.2.2 Box and whisker plot

        • A.2.3 Hypothesis testing

        • A.2.4 Asymptotic, exact, parametric and non-parametric tests

        • A.2.5 Confidence interval for a mean: Student's t test

        • A.2.6 Confidence interval of a frequency (or proportion)

        • A.2.7 The relationship between two continuous variables: the linear correlation coefficient

        • A.2.8 The relationship between two numeric or ordinal variables: Spearman’s rank correlation coefficient and Kendall’s tau

        • A.2.9 The relationship between n sets of several continuous or binary variables: canonical correlation analysis

        • A.2.10 The relationship between two nominal variables: the χ² test

        • A.2.11 Example of use of the χ² test

        • A.2.12 The relationship between two nominal variables: Cramér’s coefficient

        • A.2.13 The relationship between a nominal variable and a numeric variable: the variance test (one-way ANOVA test)

        • A.2.14 The Cox semi-parametric survival model

      • A.3 Statistical tables

        • A.3.1 Table of the standard normal distribution

        • A.3.2 Table of Student’s t distribution

        • A.3.3 Chi-Square table

        • A.3.4 Table of the Fisher-Snedecor distribution at the 0.05 significance level

        • A.3.5 Table of the Fisher-Snedecor distribution at the 0.10 significance level

    • Appendix B: Further reading

      • B.1. Statistics and data analysis

      • B.2. Data mining and statistical learning

      • B.3. Text mining

      • B.4. Web mining

      • B.5. R software

      • B.6. SAS software

      • B.7. IBM SPSS software

      • B.8. Websites

    • Index

Content

WILEY SERIES IN COMPUTATIONAL STATISTICS

Stéphane Tufféry, University of Rennes, France
With forewords by Gilbert Saporta and David J. Hand
Translated by Rod Riesco

Data mining is the process of automatically searching large volumes of data for models and patterns, using computational techniques from statistics, machine learning and information theory; it is the ideal tool for this kind of knowledge extraction. Data mining is usually associated with a business's or an organization's need to identify trends and profiles, allowing, for example, retailers to discover patterns on which to base marketing objectives.

This book looks at both classical and modern methods of data mining, such as clustering, discriminant analysis, decision trees, neural networks and support vector machines, with illustrative examples throughout to explain the theory behind these models. Recent methods such as bagging and boosting, decision trees, neural networks, support vector machines and genetic algorithms are also discussed, along with their advantages and disadvantages.

Key features:

  • Presents a comprehensive introduction to all techniques used in data mining and statistical learning.
  • Includes coverage of data mining with R, as well as a thorough comparison of the two industry leaders, SAS and SPSS.
  • Gives practical tips for data mining implementation, as well as the latest techniques and state-of-the-art theory.
  • Looks at a range of methods, tools and applications, from scoring to web mining and text mining, and presents their advantages and disadvantages.
  • Supported by an accompanying website hosting datasets and user analysis.

Business intelligence analysts and statisticians, and compliance and financial experts in both commercial and government organizations across all industry sectors, will benefit from this book.

www.wiley.com/go/decision_making

Wiley Series in Computational Statistics

Consulting Editors:
Paolo Giudici, University of Pavia, Italy
Geof H. Givens, Colorado State University, USA
Bani K. Mallick, Texas A&M University, USA

The Wiley Series in Computational Statistics comprises practical guides and cutting-edge research books on new developments in computational statistics. It features quality authors with a strong applications focus. The texts in the series provide detailed coverage of statistical concepts, methods and case studies in areas at the interface of statistics, computing and numerics. With sound motivation and a wealth of practical examples, the books show in concrete terms how to select and to use appropriate ranges of statistical computing techniques in particular fields of study. Readers are assumed to have a basic understanding of introductory terminology. The series concentrates on applications of computational methods in statistics to fields of bioinformatics, genomics, epidemiology, business, engineering, finance and applied statistics.

Titles in the series:

  • Biegler, Biros, Ghattas, Heinkenschloss, Keyes, Mallick, Marzouk, Tenorio, Waanders, Willcox – Large-Scale Inverse Problems and Quantification of Uncertainty
  • Billard and Diday – Symbolic Data Analysis: Conceptual Statistics and Data Mining
  • Bolstad – Understanding Computational Bayesian Statistics
  • Borgelt, Steinbrecher and Kruse – Graphical Models, 2e
  • Dunne – A Statistical Approach to Neural Networks for Pattern Recognition
  • Liang, Liu and Carroll – Advanced Markov Chain Monte Carlo Methods
  • Ntzoufras – Bayesian Modeling Using WinBUGS

Data Mining and Statistics for Decision Making
Stéphane Tufféry, University of Rennes, France
Translated by Rod Riesco

First published under the title 'Data Mining et Statistique Décisionnelle' by Éditions Technip. © Éditions Technip 2008. All rights reserved. Authorised translation from the French language edition published by Éditions Technip, 2008.

This edition first published 2011. © 2011 John Wiley & Sons, Ltd.

Registered office: John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom.

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book, please see our website at www.wiley.com.

The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book.

This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Library of Congress Cataloging-in-Publication Data:
Tufféry, Stéphane.
Data mining and statistics for decision making / Stéphane Tufféry.
p. cm. – (Wiley series in computational statistics)
Includes bibliographical references and index.
ISBN 978-0-470-68829-8 (hardback)
1. Data mining. 2. Statistical decision. I. Title.
QA76.9.D343T84 2011
006.3'12–dc22
2010039789

A catalogue record for this book is available from the British Library.

Print ISBN: 978-0-470-68829-8
ePDF ISBN: 978-0-470-97916-7
oBook ISBN: 978-0-470-97917-4
ePub ISBN: 978-0-470-97928-0

Typeset in 10/12pt Times Roman by Thomson Digital, Noida, India.

To Paul and Nicole Tufféry, with gratitude and affection.
1.2.1 Data mining in different sectors 1.2.2 Data mining in different applications 1.3 Data mining and statistics 1.4 Data mining and information technology 1.5 Data mining and protection of personal data 1.6 Implementation of data mining 1 4 11 12 16 23 The development of a data mining study 2.1 Defining the aims 2.2 Listing the existing data 2.3 Collecting the data 2.4 Exploring and preparing the data 2.5 Population segmentation 2.6 Drawing up and validating predictive models 2.7 Synthesizing predictive models of different segments 2.8 Iteration of the preceding steps 2.9 Deploying the models 2.10 Training the model users 2.11 Monitoring the models 2.12 Enriching the models 2.13 Remarks 2.14 Life cycle of a model 2.15 Costs of a pilot project 25 26 26 27 30 33 35 36 37 37 38 38 40 41 41 41 Data exploration and preparation 3.1 The different types of data 3.2 Examining the distribution of variables 3.3 Detection of rare or missing values 3.4 Detection of aberrant values 3.5 Detection of extreme values 43 43 44 45 49 52 viii CONTENTS 3.6 3.7 3.8 3.9 3.10 3.11 3.12 3.13 3.14 3.15 Tests of normality Homoscedasticity and heteroscedasticity Detection of the most discriminating variables 3.8.1 Qualitative, discrete or binned independent variables 3.8.2 Continuous independent variables 3.8.3 Details of single-factor non-parametric tests 3.8.4 ODS and automated selection of discriminating variables Transformation of variables Choosing ranges of values of binned variables Creating new variables Detecting interactions Automatic variable selection Detection of collinearity Sampling 3.15.1 Using sampling 3.15.2 Random sampling methods 52 58 59 60 62 65 70 73 74 81 82 85 86 89 89 90 Using commercial data 4.1 Data used in commercial applications 4.1.1 Data on transactions and RFM data 4.1.2 Data on products and contracts 4.1.3 Lifetimes 4.1.4 Data on channels 4.1.5 Relational, attitudinal and psychographic data 4.1.6 Sociodemographic data 4.1.7 When data are unavailable 4.1.8 Technical data 4.2 Special data 4.2.1 Geodemographic data 4.2.2 Profitability 4.3 Data used by business sector 4.3.1 Data used in banking 4.3.2 Data used in insurance 4.3.3 Data used in telephony 4.3.4 Data used in mail order 93 93 93 94 94 96 96 97 97 98 98 98 105 106 106 108 108 109 Statistical and data mining software 5.1 Types of data mining and statistical software 5.2 Essential characteristics of the software 5.2.1 Points of comparison 5.2.2 Methods implemented 5.2.3 Data preparation functions 5.2.4 Other functions 5.2.5 Technical characteristics 5.3 The main software packages 5.3.1 Overview 111 111 114 114 115 116 116 117 117 117 ... What is data mining? 1.2 What is data mining used for? 1.2.1 Data mining in different sectors 1.2.2 Data mining in different applications 1.3 Data mining and statistics 1.4 Data mining and information... information Foreword It is a real pleasure to be invited to write the foreword to the English translation of Stephane Tuffery’s book Data Mining and Statistics for Decision Making Data mining. .. ever, this book covers all the essentials (and more) needed for a clear understanding and proper application of data mining and statistics for decision making Among the new features in this edition,
