Data Mining Cookbook_15 ppt

[...]
  opportunities by industry, 6
  profile creation of, 273, 274–275
Auditing and scoring outside, 155–161
B
Back-end validation, 176–177
Backward regression, 103, 105, 109–110, 112, 221, 222, 245, 268
Bad data, 175
Balance bombs, 201
Bayes' Theorem, 121
Behavioral data, 26, 27
Bootstrapping, 138, 140–146, 147, 148
  adjusting, 146
  analysis, 229
  churn modeling and, 270, 271, 272–273
  formula for, 140
  jackknifing versus, 138
  validation using, 138, 140–146, 147, 148, 224, 225, 227–230, 249, 251, 270
Branding on the Web, 316–317
Business builders customers, 201
Business intelligence infrastructure, 33
C
Categorical variables, 69–70, 80–85, 218–220
  churn modeling and, 265, 266–268
  linear predictors development and, 95–97
Champion versus challenger, 166–167, 175
Chi-square statistic, 78, 247
  backward regression and, 105
  categorical variables and, 80, 82, 83, 85, 218–219
  continuous variables and, 210–211
  forward selection method and, 104
  score selection method and, 105
  stepwise regression and, 105
  variable reduction and, 77, 78, 79
Churn, 10. See also Modeling churn (retaining profitable customers)
  case example of, 42
  definition of, 11
Classification trees, 19–20
  example of, 19
  goal of, 19
  interaction detection using, 98
  linear regression versus, 19
  purpose of, 19
  software for building, 98
Classifying data, 54–55
  qualitative, 54
  quantitative, 54
Cleaning the data, 60–70, 188
  categorical variables, 69–70
  continuous variables, 60–69
  missing values, 64–69
  outliers and data errors, 60, 62–64
  Web and, 309
Cluster analysis, 184
  example, 205–206
  performing to discover customer segments, 203–204, 205–206
Collaborative filtering on the Web, 313–316
  ad targeting, 315
  applications in near future, 315
  call centers, 315
  e-commerce, 315
  of information, 313
  knowledge management and, 314
  managing personal resources, 316
  marketing campaigns, 315
  trend in evolution of, 314
  workings of, 9–10
Combining data from multiple offers, 47–48
Computers
  logistic models and, 4
Constructing modeling data set, 44–48
  combining data from multiple offers, 47–48
  developing models from modeled data, 47
  sample size, 44–45
  sampling methods, 45–47
Consummate consumers, 201
Continuous data, 54–55
Continuous variables, 76–79, 157, 210–218, 236
  churn modeling and, 263–265, 266
  means analysis of, 212
  transformation of, 215–216, 267
Cookie, 308
Cooking demonstration, 49–180
  implementing and maintaining model, 151–180
  preparing data for modeling, 51–70
  processing and evaluating model, 101–124
  selecting and transforming variables, 71–99
  validating model, 125–150
Creating modeling data set, 57–59
  sampling, 58–59
Credit bureaus, 3
Credit scoring and risk modeling, 232–233
Cross-sell model, 10
  case example of, 41
  opportunities for, 6, 41
Customer, understanding your, 183–206
  cluster analysis performing to discover customer segments, 203–204, 205–206
  developing customer value matrix for credit card company, 198–203
  importance of, 184–189
  market segmentation keys, 186–189
  profiling and penetration analysis of catalog company's customers, 190–198
  summary of, 204
  types of profiling and segmentation, 184–186
  value analysis, 109–203
Customer acquisition modeling examples, 37–40
Customer database components, 28–29
Customer focus versus product focus, 22–23
Customer insight, gaining in real time, 317–318
Customer loyalty, 258
Customer models, data for, 40–42
Customer profitability, optimizing, 276–278
Customer relationship management (CRM), 284, 285, 286
Customer value matrix development, 198–203
D
Data
  types of, 26–27
  validation, 152–155
Data errors and outliers, 60, 62–64, 69
Data marts, 31
Data mining software
  classification trees, 98
Data preparation for modeling. See Preparing data for modeling
Data requirements review and evaluation, 187–188
Data sources selecting, 25–48
  constructing modeling data set, 44–48
  for modeling, 36–44
  sources of data, 27–36
  summary of, 48
  types of data, 26–27
Data warehouse, 31–35
  definition of, 31
  meta data role, 34
  mistakes and best practices, 34–35
  typical, 32, 45
Dates, 75–76
Decile analysis, 101, 247, 297
  bootstrapping and, 140–141
  calculating, 239
  creating, 113, 116–117, 122–123
  example, 249
  file cut-off determination and, 166
  gains table and, 149
  on key variables, 146–150
  of scored file, 164
  using validation data, 118, 120, 121, 123
Decision tree. See Classification trees
Demographic data, 26
  characteristics of, 27
Demographics and profile analysis, 7–8, 185–186
Descriptive models, 4, 5
Developing models from modeled data, 47
Duration and lifetime value modeling, 284, 285–286
E
E-mail inquiry or response data, 308
Evaluating and processing the model. See Processing and evaluating the model
Exploratory data analysis, 175
External validity of model, 174
F
Factor analysis, 184
File cut-off determination, 166
Financials, calculating, 161–165
Focus on product versus customer, 22–23
Fraud, 253, 254–255
Frequency type of profiling, 185
G
Gains tables and charts, 125–129
  creating, 126–127, 247
  examples, 126, 129, 133, 138, 139, 145, 147, 149
  lifetime value model and, 301, 302, 303, 304
  NPV model, 165
  for score comparison, 250
  two model method and, 127–129
  validation examples, 128, 129, 226–227, 271, 272
Genetic algorithms, 17–18
  example, 18
Goal defining, 4–12
  activation, 10
  attrition, 10–11
  cross-sell and up-sell, 10
  lifetime value, 11–12
  net present value, 11
  profile analysis, 7–8
  response, 8–9
  risk, 9–10
  segmentation, 8
  steps in, 5
H
High-risk customers, avoiding, 231–255. See also Modeling risk
Hiring and teamwork, 21–22
I
Implementing and maintaining the model, 151–180, 230
  back-end validation, 176–177
  calculating the financials, 161–165
  champion versus challenger, 166–167
  checking, 172
  churn and, 273–278
  determining file cut-off, 166
  high-risk customer avoidance and, 251–253
  maintenance, 177–179
  scoring a new file, 151–161
  summary of, 179–180
  tracking, 170–177
  two-model matrix, 167–170
Intelligence architecture of business, 33
Interactions detection, 98–99
Internal validity of model, 174
Interval data, 54
J
Jackknifing, 134–138, 139
L
Lifecycle of model, 175, 177–178
  benchmarking, 177
  rebuild or refresh, 177–178
Life stage as type of profiling, 186
Lifetime value model, 4, 6, 11–12, 281–304. See also Modeling lifetime value
Lift measurement, 127, 136, 137, 141, 143, 224, 225, 226
Linear predictors development, 85–97
  categorical variables, 95–97
  continuous variables, 85–95
Linear regression analysis, 12–14, 208, 209, 295
  examples, 13, 14
  logistic regression versus, 15, 16
  net revenues and, 292
  neural networks versus, 16
List compilers, 36, 41
List fatigue, 173–174
List sellers, 36, 41
Logistic regression, 3–4, 12, 15, 16, 295, 296
  categorical variables and, 95, 218
  continuous variables and, 85, 86, 93
  example, 15, 223
  formula for, 16
  jackknifing and, 135
  linear regression versus, 15, 16, 85
  processing the model and, 102, 221, 222, 245, 246
  variable selection using, 240–241
LTV. See Lifetime value model
M
Mail tracking, 171
Maintaining and implementing the model, 151–180. See also Implementing and maintaining the model
Maintenance of model, 177–179
  model life, 177–178
  model log, 178–179
Market or population changes, 152, 153–154, 155
MC. See Multicollinearity
Meta data, 31
  role of, 34
  types of, 34
[...]
  of, 17
Nominal data, 54
NPV. See Net present value model
O
Objective, setting. See Setting the objective
Objective function, defining, 71–74
  marketing expense, 74
  probability of activation, 72–73
  product profitability, 73
  risk index, 73
Objectives, defining, 187, 207–210, 234–235, 258–263
Offer history database, 30–31
Opportunities by industry targeting model, 6
Ordinal data, 54
Ordinary [...]
Outliers and data errors, 60, 62–64
P
Penetration analysis, 193, 194–198
Planning the menu, 1–48
  considerations for, 2
  selecting data sources, 25–48
  setting the objective, 3–24
Population or market changes, 152, 153–154, 155
Predictive models, 4, 5, 207
Preparing data for modeling, 51–70
  accessing the data, 51–54
  classifying data, 54–55
  cleaning the data, 60–70
  creating modeling data set, 57–59
  reading raw data, 55–57
  summary of, 70
  Web, 309–310
Preparing variables, 235–244, 263–268
Probability of activation, 72–73
PROC UNIVARIATE procedure, 60, 157, 225
  decile identifiers and, 115, 239
  outliers and, [...]
[...]
customers, retaining. See Modeling churn
Profitable customers, targeting. See Modeling lifetime value
Propensity model, 208
Prospect data, 37–40
  case examples for, 38–40
Psychographic data, 26–27
  characteristics of, 27
Q
Quantitative data, 54–55
R
Ratios, 75
Reading raw data, 55–57
Rebuild versus refresh a model, 177–178
Recency type of profiling, 185
Recipes for every occasion, 181–322
[...]
  financial type, 42
  fraud and, 10
  insurance industry and, 9
  selecting data for, 42–44
Risk score, scaling, 252–254
Risky revenue customers, 201
Rotate your lists, 173–174
R-square, 13
  genetic algorithms using, 18
S
Sample size, 44–45
Sampling methods, 45–47, 58–59, 188
Scoring alternate data sets, 130–134, 221
Scoring a new file, 151–161
  data validation, 152–155
  in-house, 152–155
  outside scoring and auditing, 155–161
Segmentation analysis example, [...]
[...]
  variables, 74–76
  developing linear predictors, 85–97
  interactions detection, 98–99
  summary of, 99
  variable reduction, 76–79
Selecting data for modeling, 36–44
  for customer models, 40–42
  prospect data, 37–40
  for risk models, 42–44
Selecting data sources. See Data sources selecting
Selection criteria, different, 152, 154
Selection methods for variables entered/removed, 104, 105
Server logs, 307–[...]
[...]
  ask for, 5, 6–7
  summary of, 23–24
Solicitation mail, 31
Sources of data, 27–36. See also Data sources selecting
  customer database, 28–29
  data warehouse, 31–35
  external, 36
  internal, 27–35
  offer history database, 30–31
  solicitation mail or phone types, 31
  transaction database, 29
  variation in, 153, 155
Splitting the data, 103–104, 105, 108, 198, 199
Stepwise regression, 103, 105, 109, [...]
[...]
  decile analysis on key variables, 146–150
  external, 174
  gains tables and charts, 125–129
  internal, 174
  jackknifing, 134–138, 139
  resampling, 134–146
  scoring alternate data sets, 130–134
  summary of, 150
Validation
  backend, 176–177
  data sets, 103
  using bootstrapping, 224, 225, 227–230, 270, 271, 272–273
Value matrix development, customer, 198–203
Variable reduction, 76–79
Variables, categorical, [...]
[...]
study (Web usage mining), 318–322
  clustering, 311–312
  collaborative filtering, 313–316
  cookie, 308
  e-mail inquiry or response data, 308
  form or user registration, 308
  gaining customer insight in real time, 317–318
  measurements to evaluate web usage, 318–320
  objective defining, 306–307
  path analysis, 310–311
  predictive modeling and classification analyses, 312
  preparing Web data, 309–310
  selecting methodology, 310–316
  sequential patterns, 311
  server logs, 307–308
  sources of Web data, 307–309
  statistics on Web site visits, 320
  summary of, 322
  transaction types, 310
  Web mining versus, 306
  Web purchase data, 308–309
Web sites
  as behavioral data, 26
  direct mail versus, 9
  troubleshooting, 171, 172
[...]

Posted: 21/06/2014, 04:20