DOCUMENT INFORMATION
Basic information
Format
Number of pages
726
Size
12.24 MB
Content
Contents

Foreword
Preface to the Second Edition
Preface to the First Edition
Acknowledgments

PART I PRELIMINARIES

Chapter 1 Introduction
  1.1 What Is Data Mining?
  1.2 Where Is Data Mining Used?
  1.3 Origins of Data Mining
  1.4 Rapid Growth of Data Mining
  1.5 Why Are There So Many Different Methods?
  1.6 Terminology and Notation
  1.7 Road Maps to This Book

Chapter 2 Overview of the Data Mining Process
  2.1 Introduction
  2.2 Core Ideas in Data Mining
  2.3 Supervised and Unsupervised Learning
  2.4 Steps in Data Mining
  2.5 Preliminary Steps
  2.6 Building a Model: Example with Linear Regression
  2.7 Using Excel for Data Mining
  PROBLEMS

PART II DATA EXPLORATION AND DIMENSION REDUCTION

Chapter 3 Data Visualization
  3.1 Uses of Data Visualization
  3.2 Data Examples
  3.3 Basic Charts: bar charts, line graphs, and scatterplots
  3.4 Multidimensional Visualization
  3.5 Specialized Visualizations
  3.6 Summary of major visualizations and operations, according to data mining goal
  PROBLEMS

Chapter 4 Dimension Reduction
  4.1 Introduction
  4.2 Practical Considerations
  4.3 Data Summaries
  4.4 Correlation Analysis
  4.5 Reducing the Number of Categories in Categorical Variables
  4.6 Converting a Categorical Variable to a Numerical Variable
  4.7 Principal Components Analysis
  4.8 Dimension Reduction Using Regression Models
  4.9 Dimension Reduction Using Classification and Regression Trees
  PROBLEMS

PART III PERFORMANCE EVALUATION

Chapter 5 Evaluating Classification and Predictive Performance
  5.1 Introduction
  5.2 Judging Classification Performance
  5.3 Evaluating Predictive Performance
  PROBLEMS

PART IV PREDICTION AND CLASSIFICATION METHODS

Chapter 6 Multiple Linear Regression
  6.1 Introduction
  6.2 Explanatory versus Predictive Modeling
  6.3 Estimating the Regression Equation and Prediction
  6.4 Variable Selection in Linear Regression
  PROBLEMS

Chapter 7 k-Nearest Neighbors (k-NN)
  7.1 k-NN Classifier (categorical outcome)
  7.2 k-NN for a Numerical Response
  7.3 Advantages and Shortcomings of k-NN Algorithms
  PROBLEMS

Chapter 8 Naive Bayes
  8.1 Introduction
  8.2 Applying the Full (Exact) Bayesian Classifier
  8.3 Advantages and Shortcomings of the Naive Bayes Classifier
  PROBLEMS

Chapter 9 Classification and Regression Trees
  9.1 Introduction
  9.2 Classification Trees
  9.3 Measures of Impurity
  9.4 Evaluating the Performance of a Classification Tree
  9.5 Avoiding Overfitting
  9.6 Classification Rules from Trees
  9.7 Classification Trees for More Than Two Classes
  9.8 Regression Trees
  9.9 Advantages, Weaknesses, and Extensions
  PROBLEMS

Chapter 10 Logistic Regression
  10.1 Introduction
  10.2 Logistic Regression Model
  10.3 Evaluating Classification Performance
  10.4 Example of Complete Analysis: Predicting Delayed Flights
  10.5 Appendix: Logistic Regression for Profiling
  PROBLEMS

Chapter 11 Neural Nets
  11.1 Introduction
  11.2 Concept and Structure of a Neural Network
  11.3 Fitting a Network to Data
  11.4 Required User Input
  11.5 Exploring the Relationship Between Predictors and Response
  11.6 Advantages and Weaknesses of Neural Networks
  PROBLEMS

Chapter 12 Discriminant Analysis
  12.1 Introduction
  12.2 Distance of an Observation from a Class
  12.3 Fisher's Linear Classification Functions
  12.4 Classification Performance of Discriminant Analysis
  12.5 Prior Probabilities
  12.6 Unequal Misclassification Costs
  12.7 Classifying More Than Two Classes
  12.8 Advantages and Weaknesses
  PROBLEMS

PART V MINING RELATIONSHIPS AMONG RECORDS

Chapter 13 Association Rules
  13.1 Introduction
  13.2 Discovering Association Rules in Transaction Databases
  13.3 Generating Candidate Rules
  13.4 Selecting Strong Rules
  13.5 Summary
  PROBLEMS

Chapter 14 Cluster Analysis
  14.1 Introduction
  14.2 Measuring Distance Between Two Records
  14.3 Measuring Distance Between Two Clusters
  14.4 Hierarchical (Agglomerative) Clustering
  14.5 Nonhierarchical Clustering: The k-Means Algorithm
  PROBLEMS

PART VI FORECASTING TIME SERIES

Chapter 15 Handling Time Series
  15.1 Introduction
  15.2 Explanatory versus Predictive Modeling
  15.3 Popular Forecasting Methods in Business
  15.4 Time Series Components
  15.5 Data Partitioning
  PROBLEMS

Chapter 16 Regression-Based Forecasting
  16.1 Model with Trend
  16.2 Model with Seasonality
  16.3 Model with Trend and Seasonality
  16.4 Autocorrelation and ARIMA Models
  PROBLEMS

Chapter 17 Smoothing Methods
  17.1 Introduction
  17.2 Moving Average
  17.3 Simple Exponential Smoothing
  17.4 Advanced Exponential Smoothing
  PROBLEMS

PART VII CASES

Chapter 18 Cases
  18.1 Charles Book Club
  18.2 German Credit
  18.3 Tayko Software Cataloger
  18.4 Segmenting Consumers of Bath Soap
  18.5 Direct-Mail Fundraising
  18.6 Catalog Cross Selling
  18.7 Predicting Bankruptcy
  18.8 Time Series Case: Forecasting Public Transportation Demand

Index (excerpt)

pivot table; polynomial trend; predicting bankruptcy case; predicting new observations; prediction; prediction error; prediction techniques; predictive accuracy; predictive analytics; predictive modeling; predictive performance; accuracy measures; predictor; preprocessing; principal components; principal components analysis; classification and prediction; labeling; normalizing the data; training data; validation set; weighted averages; weights; principal components scores; principal components weights; prior probability; probabilities; logistic regression; probability plot; profile plot; profiling; discriminant analysis; pruning; public utilities data; cluster analysis; Public_Transportation_Demand data; Public_Transportation_Demand_case; pure; quadratic discriminant analysis; quadratic model; R²; random forests; random sampling; random utility theory; random walk; rank ordering; ranking of records; ratio of costs; rescaling; recommender systems; record; recursive partitioning; redundancy; reference category; reference line; regression; time series; regression trees; residual series; residuals histogram; response; response rate; reweight; RFM segmentation; riding-mower data; CART; discriminant analysis; k-nearest neighbor; logistic regression; visualization; right skewed; right-skewed distribution; RMSE; robust; robust distances; robust to outliers; ROC curve; root-mean-squared error; row; rules; association rules; S&P monthly closing prices; sample; sampling; satellite radio customer data; association rules; scale; scatterplot; animated; color coded; scatterplot matrix; score; seasonal variable; seasonality; second principal component; segmentation; segmenting consumers of bath soap case; self-proximity; SEMMA; sensitivity; sensitivity analysis; separating hyperplane; separating line; Sept 11 travel data; time_series; shampoo sales data; time_series; similarity measures; simple linear regression; simple random sampling; single linkage; singular value decomposition; smoothing; time series; smoothing parameter; souvenir sales data; time_series; spam e-mail data; discriminant analysis; specialized visualization; specificity; split points; splitting values; SQL; standard error of estimate; standardization; standardize; statistical distance; statistics; steps in data mining; stepwise; stepwise regression; stopping tree growth; stratified sampling; subset selection; subset selection in linear regression; subsets; success class; sum of squared deviations; sum of squared errors; sum of squared perpendicular distances; summary statistics; supervised learning; system administrators data; discriminant analysis; logistic regression; target variable; Tayko data; multiple linear regression; Tayko software catalog case; terabyte; terminal node; test data; test partition; test set; time series; dummies; lagged series; residuals; RMS error; window width; time series forecasting; time series partitioning; total SSE; total sum of squared errors; total variability; Toyota Corolla data; classification tree; multiple linear regression; best subsets; forward selection; neural nets; principal components analysis; Toys "R" Us revenues data; data reduction; time series; training; training data; training partition; training set; transfer function; transform; transformation; transformation of variables; transformations; transpose; tree depth; trees search; trend; trend lines; triage strategy; trial; triangle inequality; unbiased; unequal importance of classes; Universal Bank data; k nearest neighbors; classification tree; discriminant analysis; logistic regression; university rankings data; cluster analysis; principal components analysis; unsupervised learning; validation data; validation partition; validation set; variability; between-class; within-class variability; variable; binary; dependent; selection; variable selection; variables; categorical; continuous; nominal; numerical; ordinal; text; variation; between-cluster; within-cluster; visualization; animation; color hue; maps; multiple panels; networks; shape; size; Treemaps; Wal-Mart; Wal-Mart stock data; time series; weight decay; weighted average; weighted sampling; wine data; principal components analysis; within-cluster dispersion; z score; zooming

... Cataloging-in-Publication Data: Shmueli, Galit, 1971- Data mining for business intelligence: concepts, techniques, and applications in Microsoft Office Excel with XLMiner / Galit Shmueli, Nitin R. Patel, ...

... Discovery and Data Mining (KDD) was held in 1995, and there are a variety of definitions of data mining. A concise definition that captures the essence of data mining is: Extracting useful information...

... on inference (determining whether a pattern or interesting result might have happened by chance) is missing in data mining. In comparison to statistics, data mining deals with large datasets in