Data mining for dummies brown 2014 09 29

DOCUMENT INFORMATION

Basic information

Pages: 411
File size: 9.73 MB

Content

Data Mining

Data Mining For Dummies®

Published by: John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030-5774, www.wiley.com

Copyright © 2014 by John Wiley & Sons, Inc., Hoboken, New Jersey

Media and software compilation copyright © 2014 by John Wiley & Sons, Inc. All rights reserved.

Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the Publisher. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at www.wiley.com/go/permissions.

Trademarks: Wiley, For Dummies, the Dummies Man logo, Dummies.com, Making Everything Easier, and related trade dress are trademarks or registered trademarks of John Wiley & Sons, Inc. and may not be used without written permission. Samsung and Galaxy S are registered trademarks of Samsung Electronics Co. Ltd. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.

LIMIT OF LIABILITY/DISCLAIMER OF WARRANTY: THE PUBLISHER AND THE AUTHOR MAKE NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE ACCURACY OR COMPLETENESS OF THE CONTENTS OF THIS WORK AND SPECIFICALLY DISCLAIM ALL WARRANTIES, INCLUDING WITHOUT LIMITATION WARRANTIES OF FITNESS FOR A PARTICULAR PURPOSE. NO WARRANTY MAY BE CREATED OR EXTENDED BY SALES OR PROMOTIONAL MATERIALS. THE ADVICE AND STRATEGIES CONTAINED HEREIN MAY NOT BE SUITABLE FOR EVERY SITUATION. THIS WORK IS SOLD WITH THE UNDERSTANDING THAT THE PUBLISHER IS NOT ENGAGED IN RENDERING LEGAL, ACCOUNTING, OR OTHER PROFESSIONAL SERVICES. IF PROFESSIONAL ASSISTANCE IS REQUIRED, THE
SERVICES OF A COMPETENT PROFESSIONAL PERSON SHOULD BE SOUGHT. NEITHER THE PUBLISHER NOR THE AUTHOR SHALL BE LIABLE FOR DAMAGES ARISING HEREFROM. THE FACT THAT AN ORGANIZATION OR WEBSITE IS REFERRED TO IN THIS WORK AS A CITATION AND/OR A POTENTIAL SOURCE OF FURTHER INFORMATION DOES NOT MEAN THAT THE AUTHOR OR THE PUBLISHER ENDORSES THE INFORMATION THE ORGANIZATION OR WEBSITE MAY PROVIDE OR RECOMMENDATIONS IT MAY MAKE. FURTHER, READERS SHOULD BE AWARE THAT INTERNET WEBSITES LISTED IN THIS WORK MAY HAVE CHANGED OR DISAPPEARED BETWEEN WHEN THIS WORK WAS WRITTEN AND WHEN IT IS READ.

For general information on our other products and services, please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993, or fax 317-572-4002.

For technical support, please visit www.wiley.com/techsupport.

Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.

Library of Congress Control Number: 2014935519

ISBN 978-1-118-89317-3 (pbk); ISBN 978-1-118-89316-6 (ebk); ISBN 978-1-118-89319-7 (ebk)

Manufactured in the United States of America

Contents at a Glance

Introduction
Part I: Getting Started with Data Mining
  Chapter 1: Catching the Data-Mining Train
  Chapter 2: A Day in Your Life as a Data Miner 17
  Chapter 3: Teaming Up to Reach Your Goals 49
Part II: Exploring Data-Mining Mantras and Methods 61
  Chapter 4: Learning the Laws of Data Mining 63
  Chapter 5: Embracing the Data-Mining Process 73
  Chapter 6: Planning for Data-Mining Success 89
  Chapter 7: Gearing Up with the Right Software 97
Part III: Gathering the Raw Materials 109
  Chapter 8: Digging into Your Data 111
  Chapter 9: Making New Data 119
  Chapter 10: Ferreting Out Public Data Sources 141
  Chapter 11: Buying Data 163
Part IV: A Data Miner’s Survival Kit 171
  Chapter 12: Getting Familiar with Your Data 173
  Chapter 13: Dealing in Graphic Detail 195
  Chapter 14: Showing Your Data Who’s Boss 219
  Chapter 15: Your Exciting Career in Modeling 245
Part V: More Data-Mining Methods 273
  Chapter 16: Data Mining Using Classic Statistical Methods 275
  Chapter 17: Mining Data for Clues 295
  Chapter 18: Expanding Your Horizons 307
Part VI: The Part of Tens 319
  Chapter 19: Ten Great Resources for Data Miners 321
  Chapter 20: Ten Useful Kinds of Analysis That Complement Data Mining 325
Appendix A: Glossary 333
Appendix B: Data-Mining Software Sources 339
Appendix C: Major Data Vendors 349
Appendix D: Sources and Citations 357
Index 361

Table of Contents

Introduction
  About This Book
  Foolish Assumptions
  Icons Used in This Book
  Beyond the Book
  Where to Go from Here
Part I: Getting Started with Data Mining
  Chapter 1: Catching the Data-Mining Train
    Getting Real about Data Mining
      Not your professor’s statistics
      The value of data mining
      Working for it
    Doing What Data Miners Do 10
      Focusing on the business 10
      Understanding how data miners spend their time 11
      Getting to know the data-mining process 11
      Making models 12
      Understanding mathematical models 12
      Putting information into action 13
    Discovering Tools and Methods 13
      Visual programming 14
      Working quick and dirty 15
      Testing, testing, and testing some more 16
  Chapter 2: A Day in Your Life as a Data Miner 17
    Starting Your Day Off Right 17
      Meeting the team 18
      Exploring with aim 18
      Structuring time with the right process 20
    Understanding Your Business Goals 20
    Understanding Your Data 22
      Describing data 22
      Exploring data 23
      Cleaning data 27
    Preparing Your Data 28
      Taking first steps with the property data 28
      Preparing the ownership change indicator 32
      Merging the datasets 32
      Deriving new variables 34
    Modeling Your Data 40
      Using balanced data 40
      Splitting data 41
      Building a model 43
    Evaluating Your Results 44
      Examining the decision tree 44
      Using a diagnostic chart 46
      Assessing the status of the model 47
    Putting Your Results into Action 48
  Chapter 3: Teaming Up to Reach Your Goals 49
    Nothing Could Be Finer Than to Be a Data Miner 49
      You can be a data miner 50
      Using the knowledge you have 51
    Data Miners Play Nicely with Others 51
      Cooperation is a necessity 51
      Oh, the people you’ll meet! 53
    Working with Executives 56
      Greetings and elicitations 57
      Lining up your priorities 58
      Talking data mining with executives 58
Part II: Exploring Data-Mining Mantras and Methods 61
  Chapter 4: Learning the Laws of Data Mining 63
    1st Law: Business Goals 63
    2nd Law: Business Knowledge 64
    3rd Law: Data Preparation 65
    4th Law: Right Model 66
    5th Law: Pattern 67
    6th Law: Amplification 68
    7th Law: Prediction 69
    8th Law: Value 70
    9th Law: Change 70
  Chapter 5: Embracing the Data-Mining Process 73
    Whose Standard Is It, Anyway? 73
      Approaching the process in phases 74
      Cycling through phases and projects 74
      Documenting your work 75
    Business Understanding 76
    Data Understanding 79
    Data Preparation 82
    Modeling 84
    Evaluation 86
    Deployment 87

Index

  IBM, 341–342
  KNIME.com AG, 342
  KXEN, 342–343
  Megaputer, 343
  Oracle, 343–344
  R Foundation, 344
  RapidMiner, 344
  Revolution Analytics, 345
  sales representatives, engaging with, 106–108
  Salford Systems, 345
  SAS Institute, 345–346
  Statsoft Inc., 346
  Tableau Software, 346–347
  Teradata, 347
  University of Ljubljana, 347
  University of Waikato, 348
  Wolfram Research, 348
verifying data quality, 81–82
video tutorials for software, 308
visual programming
  defined, 338
  general discussion, 14–15
  importing data into, 28–32
  overview, 102–103, 340
  terminology related to, 191
visualization, 338. See also graphs
vocabulary, data mining, 191
voter research, 127–130, 131

•W•
warehouse clubs, 123–124
Warning! icon
web analytics, 331–332
web logs, 121
web page testing, 126–127
web scraping, 23
weighting, 243
weights, linear models, 264
Weka
  associations, creating rules for, 300–303
  associations, importing data for, 298–300
  associations, refining results for, 303–306
  chart matrix, 212
  comments, 225
  exporting data, 230, 233
  interactive scatterplots, 209
  supplier information, 348
  text files, opening in, 185–188
wizards, visual programming interface, 29
Wolberg, William H., 249. See also breast tumor diagnosis data
Wolfram Alpha, 348
Wolfram Research, 348
workflow for decision tree creation, 250–258, 263
writing business cases, 94

•X•
XML, importing, 190
xy pairs, 199

•Z•
Zeroth Law (0th Law) of Data Mining, 71

About the Author

Meta S. Brown helps technical professionals communicate with everybody else. She’s the creator of the Storytelling for Data Analysts and Storytelling for Tech workshops.

Dedication

For Marty, who never gives me a hard time about work. Ever.

Author’s Acknowledgments

A number of experts shared their experience and time to contribute to this book. Each of them is named somewhere in the pages that follow.

The researchers who share data make books like this possible. Sources are cited within the book.

Wiley editors Christopher Morris, Leah Michael, John Edwards, and Kyle Looper are models of professionalism. I wonder if they know how exceptional that is.

Tom Khabaza, technical editor and the world’s best data-mining mentor, is a fountain of knowledge and a real mensch.

Laaren Brown and Lenny Hort, authors, editors, and much more, provided excellent advice galore.

Publisher’s Acknowledgments

Acquisitions Editor: Kyle Looper
Senior Project Editor: Christopher Morris
Copy Editor: John Edwards
Technical Editor: Thomas Khabaza
Editorial Assistant: Claire Johnson
Sr. Editorial Assistant: Cherie Case
Project Coordinator: Patrick Redmond
Cover Image: ©iStock.com/Media Mates Oy

...
successful data mining
✓ Why your data is valuable
✓ How and where to get additional data
✓ Why choosing tools shouldn’t be your first concern
✓ What data-mining techniques...

... search for data on the federal government’s data portal, what common data-mining mistakes you can avoid, and more
✓ The Cheat Sheet for this book can be found at www.dummies.com/cheatsheet/datamining

Date posted: 23/10/2019, 15:32