
A Practical Guide to Data Mining for Business and Industry [Ahlemeyer-Stubbe & Coleman 2014-05-12]




DOCUMENT INFORMATION

Basic information

Pages: 325
File size: 15.69 MB

Content

A Practical Guide to Data Mining for Business and Industry

Andrea Ahlemeyer-Stubbe
Director Strategic Analytics, DRAFTFCB München GmbH, Germany

Shirley Coleman
Principal Statistician, Industrial Statistics Research Unit, School of Maths and Statistics, Newcastle University, UK

A Practical Guide to Data Mining for Business and Industry presents a user friendly approach to data mining methods and provides a solid foundation for their application. The methodology presented is complemented by case studies to create a versatile reference book, allowing readers to look for specific methods as well as for specific applications. This book is designed so that the reader can cross-reference a particular application or method to sectors of interest. The necessary basic knowledge of data mining methods is also presented, along with sector issues relating to data mining and its various applications.

A Practical Guide to Data Mining for Business and Industry:
• Equips readers with a solid foundation to both data mining and its applications
• Provides tried and tested guidance in finding workable solutions to typical business problems
• Offers solution patterns for common business problems that can be adapted by the reader to their particular areas of interest
• Focuses on practical solutions whilst providing grounding in statistical practice
• Explores data mining in a sales and marketing context, as well as quality management and medicine
• Is supported by a supplementary website (www.wiley.com/go/data_mining) featuring datasets and solutions

Aimed at statisticians, computer scientists and economists involved in data mining as well as students studying economics, business administration and international marketing.

This edition first published 2014
© 2014 John Wiley & Sons, Ltd

Registered Office
John Wiley & Sons, Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.

The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. It is sold on the understanding that the publisher is not engaged in rendering professional services and neither the publisher nor the author shall be liable for damages arising herefrom. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Library of Congress Cataloging-in-Publication Data

Ahlemeyer-Stubbe, Andrea
A practical guide to data mining for business and industry / Andrea Ahlemeyer-Stubbe, Shirley Coleman
pages cm
Includes bibliographical references and index
ISBN 978-1-119-97713-1 (cloth)
1. Data mining. 2. Marketing–Data processing. 3. Management–Mathematical models. I. Title
HF5415.125.A42 2014
006.3′12–dc23 2013047218

A catalogue record for this book is available from the British Library.

ISBN: 978-1-119-97713-1

Set in 10.5/13pt Minion by SPi Publisher Services, Pondicherry, India

Contents

Glossary of terms

Part I  Data Mining Concept

1  Introduction
  1.1  Aims of the Book
  1.2  Data Mining Context
    1.2.1  Domain Knowledge
    1.2.2  Words to Remember
    1.2.3  Associated Concepts
  1.3  Global Appeal
  1.4  Example Datasets Used in This Book
  1.5  Recipe Structure
  1.6  Further Reading and Resources

2  Data Mining Definition
  2.1  Types of Data Mining Questions
    2.1.1  Population and Sample
    2.1.2  Data Preparation
    2.1.3  Supervised and Unsupervised Methods
    2.1.4  Knowledge-Discovery Techniques
  2.2  Data Mining Process
  2.3  Business Task: Clarification of the Business Question behind the Problem
  2.4  Data: Provision and Processing of the Required Data
    2.4.1  Fixing the Analysis Period
    2.4.2  Basic Unit of Interest
    2.4.3  Target Variables
    2.4.4  Input Variables/Explanatory Variables
  2.5  Modelling: Analysis of the Data
  2.6  Evaluation and Validation during the Analysis Stage
  2.7  Application of Data Mining Results and Learning from the Experience

Part II  Data Mining Practicalities

3  All about data
  3.1  Some Basics
    3.1.1  Data, Information, Knowledge and Wisdom
    3.1.2  Sources and Quality of Data
    3.1.3  Measurement Level and Types of Data
    3.1.4  Measures of Magnitude and Dispersion
    3.1.5  Data Distributions
  3.2  Data Partition: Random Samples for Training, Testing and Validation
  3.3  Types of Business Information Systems
    3.3.1  Operational Systems Supporting Business Processes
    3.3.2  Analysis-Based Information Systems
    3.3.3  Importance of Information
  3.4  Data Warehouses
    3.4.1  Topic Orientation
    3.4.2  Logical Integration and Homogenisation
    3.4.3  Reference Period
    3.4.4  Low Volatility
    3.4.5  Using the Data Warehouse
  3.5  Three Components of a Data Warehouse: DBMS, DB and DBCS
    3.5.1  Database Management System (DBMS)
    3.5.2  Database (DB)
    3.5.3  Database Communication Systems (DBCS)
  3.6  Data Marts
    3.6.1  Regularly Filled Data Marts
    3.6.2  Comparison between Data Marts and Data Warehouses
  3.7  A Typical Example from the Online Marketing Area
  3.8  Unique Data Marts
    3.8.1  Permanent Data Marts
    3.8.2  Data Marts Resulting from Complex Analysis
  3.9  Data Mart: Do’s and Don’ts
    3.9.1  Do’s and Don’ts for Processes
    3.9.2  Do’s and Don’ts for Handling
    3.9.3  Do’s and Don’ts for Coding/Programming

4  Data Preparation
  4.1  Necessity of Data Preparation
  4.2  From Small and Long to Short and Wide
  4.3  Transformation of Variables
  4.4  Missing Data and Imputation Strategies
  4.5  Outliers
  4.6  Dealing with the Vagaries of Data
    4.6.1  Distributions
    4.6.2  Tests for Normality
    4.6.3  Data with Totally Different Scales
  4.7  Adjusting the Data Distributions
    4.7.1  Standardisation and Normalisation
    4.7.2  Ranking
    4.7.3  Box–Cox Transformation
  4.8  Binning
    4.8.1  Bucket Method
    4.8.2  Analytical Binning for Nominal Variables
    4.8.3  Quantiles
    4.8.4  Binning in Practice
  4.9  Timing Considerations
  4.10  Operational Issues

5  Analytics
  5.1  Introduction
  5.2  Basis of Statistical Tests
    5.2.1  Hypothesis Tests and P Values
    5.2.2  Tolerance Intervals
    5.2.3  Standard Errors and Confidence Intervals
  5.3  Sampling
    5.3.1  Methods
    5.3.2  Sample Sizes
    5.3.3  Sample Quality and Stability
  5.4  Basic Statistics for Pre-analytics
    5.4.1  Frequencies
    5.4.2  Comparative Tests
    5.4.3  Cross Tabulation and Contingency Tables
    5.4.4  Correlations
    5.4.5  Association Measures for Nominal Variables
    5.4.6  Examples of Output from Comparative and Cross Tabulation Tests
  5.5  Feature Selection/Reduction of Variables
    5.5.1  Feature Reduction Using Domain Knowledge
    5.5.2  Feature Selection Using Chi-Square
    5.5.3  Principal Components Analysis and Factor Analysis
    5.5.4  Canonical Correlation, PLS and SEM
    5.5.5  Decision Trees
    5.5.6  Random Forests
  5.6  Time Series Analysis

6  Methods
  6.1  Methods Overview
  6.2  Supervised Learning
    6.2.1  Introduction and Process Steps
    6.2.2  Business Task
    6.2.3  Provision and Processing of the Required Data
    6.2.4  Analysis of the Data
    6.2.5  Evaluation and Validation of the Results (during the Analysis)
    6.2.6  Application of the Results
  6.3  Multiple Linear Regression for use when Target is Continuous
    6.3.1  Rationale of Multiple Linear Regression Modelling
    6.3.2  Regression Coefficients
    6.3.3  Assessment of the Quality of the Model
    6.3.4  Example of Linear Regression in Practice
  6.4  Regression when the Target is not Continuous
    6.4.1  Logistic Regression
    6.4.2  Example of Logistic Regression in Practice
    6.4.3  Discriminant Analysis
    6.4.4  Log-Linear Models and Poisson Regression
  6.5  Decision Trees
    6.5.1  Overview
    6.5.2  Selection Procedures of the Relevant Input Variables
    6.5.3  Splitting Criteria
    6.5.4  Number of Splits (Branches of the Tree)
    6.5.5  Symmetry/Asymmetry
    6.5.6  Pruning
  6.6  Neural Networks
  6.7  Which Method Produces the Best Model? A Comparison of Regression, Decision Trees and Neural Networks
...

... the satisfaction of unearthing patterns and meaning. This book is the result of detailed study of data and showcases the lessons learnt. A Practical Guide to Data Mining for Business and Industry, ...

Date posted: 17/04/2017, 19:53
