Page 338

Extremes

    Lowest      Obs        Highest      Obs
         0(   84821)       4577900(   23869)
         0(   79629)       4828676(   19189)
         0(   74719)       5451884(   69915)
         0(   70318)       5745300(   17281)
         0(   59675)       9250120(   13742)

Page 339

Age of File

Univariate Procedure
Variable=AGE_FIL
Weight= SMP_WGT

Moments

    N            85404      Sum Wgts     729228
    Mean      174.3028      Sum        1.2711E8
    Std Dev   254.3212      Variance   64679.28
    Skewness         .      Kurtosis          .
    USS       2.768E10      CSS        5.5238E9
    CV        145.9077      Std Mean   0.297818
    T:Mean=0  585.2661      Pr>|T|       0.0001
    Num ^= 0     85351      Num > 0       85351
    M(Sign)    42675.5      Pr>=|M|      0.0001
    Sgn Rank  1.8212E9      Pr>=|S|      0.0001

Page 340

Quantiles (Def=5)

    100% Max     666        99%     347
     75% Q3      227        95%     329
     50% Med     169        90%     295
     25% Q1      108        10%      52
      0% Min       0         5%      31
                             1%       5

    Range        666
    Q3 - Q1      119
    Mode         214

Extremes

    Lowest      Obs        Highest      Obs
         0(   84242)           619(   84665)
         0(   81481)           625(   58765)
         0(   80706)           663(   38006)
         0(   80589)           666(   10497)
         0(   76936)           666(   71596)

Page 341

Number of 30-Day Delinquencies

Univariate Procedure
Variable=NO30DAY
Weight= SMP_WGT

Moments

    N            85404      Sum Wgts     729228
    Mean      0.718138      Sum          523686
    Std Dev   5.898548      Variance   34.79287
    Skewness         .      Kurtosis          .
    USS        3347494      CSS         2971415
    CV        821.3675      Std Mean   0.006907
    T:Mean=0  103.9667      Pr>|T|       0.0001
    Num ^= 0     22303      Num > 0       22303
    M(Sign)    11151.5      Pr>=|M|      0.0001
    Sgn Rank  1.2436E8      Pr>=|S|      0.0001

Page 342

Quantiles (Def=5)

    100% Max      43        99%      10
     75% Q3        1        95%       4
     50% Med       0        90%       2
     25% Q1        0        10%       0
      0% Min       0         5%       0
                             1%       0

    Range         43
    Q3 - Q1        1
    Mode           0

Extremes

    Lowest      Obs        Highest      Obs
         0(   85404)            37(   78590)
         0(   85403)            38(   44354)
         0(   85402)            38(   47412)
         0(   85401)            43(    7285)
         0(   85400)            43(   46812)

Page 343

Replacing Missing Values for Income

The first frequency shows the distribution of HOME EQUITY. This is used to create the matrix that displays the mean INCOME by HOME EQUITY range and AGE Group.
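Before the output itself, here is a minimal Python sketch of how a matrix of mean INCOME by HOME EQUITY range and AGE group might be computed. It is not the book's SAS code. The field names (`age`, `home_equity`, `income`), the $100K-wide band edges, and the sample records are all assumptions made for illustration, and the sketch ignores the sample weight (SMP_WGT) used elsewhere in the output.

```python
from collections import defaultdict

# Age groups as labelled in the source output.
AGE_GROUPS = [(25, 34), (35, 44), (45, 54), (55, 65)]


def age_group(age):
    """Return the age-group label, or None if the age falls outside all groups."""
    for low, high in AGE_GROUPS:
        if low <= age <= high:
            return f"{low}-{high}"
    return None


def equity_band(home_equity):
    """Bucket home equity (dollars) into $100K-wide bands; edges are assumed."""
    if home_equity >= 700_000:
        return "$700K+"
    low = max(0, int(home_equity // 100_000)) * 100
    return f"${low}-${low + 100}K"


def mean_income_matrix(records):
    """Mean income per (equity band, age group) cell, skipping missing incomes."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        group = age_group(rec["age"])
        if rec["income"] is None or group is None:
            continue
        cell = (equity_band(rec["home_equity"]), group)
        totals[cell] += rec["income"]
        counts[cell] += 1
    return {cell: totals[cell] / counts[cell] for cell in totals}


# Made-up records for illustration only; not taken from the source data.
sample = [
    {"age": 30, "home_equity": 50_000, "income": 40.0},
    {"age": 33, "home_equity": 80_000, "income": 50.0},
    {"age": 47, "home_equity": 250_000, "income": 76.0},
    {"age": 31, "home_equity": 60_000, "income": None},  # missing income
]
matrix = mean_income_matrix(sample)
```

Records with a missing income are left out of the cell means, so a cell mean could later stand in for the missing value of a record in the same cell. Whether the book uses the matrix this way is an assumption here.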
                                       Cumulative    Cumulative
    HOMEQ_R     Frequency    Percent    Frequency       Percent
    $0-$100K        53723       62.9        53723          62.9
    $100-$20        20285       23.8        74008          86.7
    $200-$30         7248        8.5        81256          95.1
    $300-$40         2269        2.7        83525          97.8
    $400-$50          737        0.9        84262          98.7
    $500-$60          339        0.4        84601          99.1
    $600-$70          213        0.2        84814          99.3
    $700K+            590        0.7        85404         100.0

Mean INCOME by HOME EQUITY and AGE Group

                           Age group
    Home Equity    25 - 34    35 - 44    45 - 54    55 - 65
    $0-$100K           $47        $55        $57        $55
    $100-$20           $70        $73        $72        $68
    $200-$30           $66        $73        $76        $68
    $300-$40           $70        $80        $84        $76
    $400-$50           $89        $93        $94        $93
    $500-$60           $98       $101       $102        $97
    $600-$70           $91       $105       $109       $104
    $700K+             $71       $102       $111       $107

The following regression used INFERRED AGE, HOME EQUITY, CREDIT LINE and TOTAL BALANCES to predict missing values for INCOME.

Step 0   All Variables Entered   R-square = 0.66962678   C(p) = 5.000000000

                     DF     Sum of Squares        Mean Square          F
    Regression        4    372912001.28200    93228000.320499    43230.8
    Error         85315    183983294.99609      2156.51755255
    Total         85319    556895296.27809

                   Parameter      Standard             Type II
    Variable        Estimate         Error      Sum of Squares          F
    INTERCEP     36.87607683    0.25721498     44325117.661102    20554.0
    INFD_AGE2     0.11445815    0.00602800     777500.06857066     360.54
    HOM_EQU2     -0.00000343    0.00000040     158246.95496857      73.38
    CREDLIN2      0.00011957    0.00000120     21410199.091371    9928.14
    TOT_BAL2     -0.00000670    0.00000136      52180.72203473      24.20

Bounds on condition number: 19.28444, 161.5108

Page 345

All variables left in the model are significant at the 0.1000 level.

    OBS    _MODEL_    _TYPE_    _DEPVAR_    _RMSE_     INTERCEP    INFD_AG2
      1    INC_REG    PARMS     INC_EST2    46.4383     36.8761     0.11446

    OBS    HOM_EQU2        CREDLIN2     TOT_BAL2        INC_EST2
      1    -.0000034348    .00011957    -.0000067044          -1

The following print output displays the values for INCOME after regression substitution.

      OBS    INC_EST2    INC_EST3
      436        .        43.3210
     2027        .        44.5382
     4662        .        40.2160
     5074        .        43.5390
     5256        .        41.6833
     5552        .        43.1713
     6816        .        41.4527
     8663        .        62.8130
    10832        .        42.3638
    11052        .        42.4500
    14500        .        41.6961
    14809        .        41.2255
    15132        .        63.4676
    15917        .        41.2685
    16382        .        41.8029
    16788        .        40.8224
    16832        .        34.2062
    16903        .        42.0813
    17201        .        43.5734
    17419        .        42.8501
    17540        .        43.9865
    17700        .        40.3016
    18045        .        42.4653
    18147        .        43.5607
    18254        .        41.3758
    18296        .        41.3878
    18362        .        41.5944
    18597        .        40.7010
    18931        .        51.3921
    19058        .        42.4845
    20419        .        42.0720
    20640        .        40.1936
    22749        .        42.1277
    23201        .        42.6050
    23334        .        42.4007
    23651        .        41.7607
    24227        .        34.0227
    24764        .        39.8351
    25759        .        42.0662

continues

... spreadsheets contain all formulas used to create the tables and charts in Data Mining Cookbook.

Page 359

INDEX

A
Access log, 307, 309
Accessing the data, 51–57
    ASCII and, 52–53
    classifying data, 54–55
    reading raw data, 55–57
Activation models, 10
Adaptive company, 20–23
    hiring and teamwork, 21–22
    product focus versus customer focus, 22–23
American Standard for Computer Information Interchange...

... from the analyses created in SAS. While the steps before and after the model processing can be used in conjunction with any data modeling software package, the code can also serve as a stand-alone modeling template. The model processing steps focus on variable preparation for use in logistic regression. Additional efficiencies in the form of SAS macros for variable processing and validation are included...

... John W. 1977. Exploratory Data Analysis. Reading, MA: Addison-Wesley.

Page 357

WHAT'S ON THE CD-ROM?

The CD-ROM contains step-by-step instructions for developing the data models described in Data Mining Cookbook. Written in SAS code, you can use the contents as a template to create your own models. The content on the CD-ROM is equivalent to taking a three-day course in data modeling. Within chapters...

... to each chapter in Data Mining Cookbook. Within each folder are subfolders containing SAS programs, SAS output, Excel spreadsheets. The programs can be used as templates. You just need to change the data set and variable names. More specific instructions on how to use the programs are included. The output is included to provide a more complete understanding of the recipes in Data Mining Cookbook. The spreadsheets...
    hiring of, 21
    retaining, 22
    teamwork with, 22
Analytics. See Adaptive company
Approval models. See Risk models
ASCII files, 52
    access log and, 307
    codes for, 52, 53
    fixed format, 52–53
    variable format, 53
Attitudinal data, 26–27
Attrition, 4, 10–11. See also Modeling churn
    case example of, 42
    credit cards and silent type of, 258
    defining to optimize profits, 259–261
    definition of, 11
...

... Gordon Linoff. 1997. Data Mining Techniques. New York: John Wiley & Sons.
Berry, Michael J.A., and Gordon Linoff. 1997. Mastering Data Mining. New York: John Wiley & Sons.
Hosmer, David W., and Stanley Lemeshow. 1989. Applied Logistic Regression. New York: John Wiley & Sons.
Hughes, Arthur M. 1994. Strategic Database Marketing. Chicago: Probus Publishing.
Journal of Targeting, Measurement and Analysis for Marketing. London:...

... implement the data models. By adapting this code and using some common sense, it is possible to build a model from the data preparation phase through model development and validation. However, this could take a considerable amount of time and introduce the possibility of coding errors. To simplify this task and make the code easily accessible for a variety of model types, a companion CD-ROM is available for purchase...

... churn, risk, and lifetime or net present value. Detailed code for developing the objective function includes examples from the credit cards, insurance, telecommunications, and catalog industries. The code is well documented and explains the goals and methodology for each step. The only software needed is BASE SAS and SAS/STAT. The spreadsheets used for creating gains tables and lift charts are also included...
Installing the Software

Insert the CD-ROM and launch the readme.htm file in a web browser, or navigate using Windows Explorer to browse the contents of the CD. The model programs and output are in text format that can be opened in any editing software (including SAS) that reads ASCII files. Spreadsheets are in Microsoft Excel 97/2000 and Microsoft Excel 5.0/95. Launch the application (SAS 6.12 or higher)...

Page 358

Hardware Requirements

To use this CD-ROM, your system must meet the following requirements:

Platform/processor/operating system: Windows® 95, NT 4.0 or higher; 200 MHz Pentium
RAM: 64 MB minimum; 128 MB recommended
Hard drive space: Nothing will install to the hard drive, but in order to make a local ...

                                   Cumulative    Cumulative
            Frequency    Percent    Frequency       Percent
    .             594        0.1          594           0.1
    A1           1854        0.3         2448           0.3
    A2            947        0.1         3395           0.5
    A3           2278        0.3         5673           0.8
    A4           2269        0.3         7942           1.1
    B1           1573        0.2         9515           1.3
    B2           1306        0.2        10821           1.5
    B3           1668        0.2        12489           1.7
    B4           1120        0.2        13609           1.9
    C1           2518        0.3        16127           2.2
    C2           5759        0.8        21886           3.0
    C3            404        0.1        22290           3.1
    C4           1194        0.2        23484           3.2
    D1          59097        8.1        82581          11.3
    D2           7114        1.0        89695          12.3
    D3           8268        1.1        97963          13.4
    D4          14128        1.9       112091          15.4
    E1           1614        0.2       113705          15.6
    E2           1091        0.1       114796          15.7
    E3          13479        1.8       128275          17.6
    E4           7695        1.1       135970          18.6
    E5           3808        0.5       139778          19.2
    F1            878        0.1       140656          19.3
    F2           1408        0.2       142064          19.5
    F3           1272        0.2       143336          19.7
    G1           5459        0.7       148795          20.4
    G2          28935        4.0       177730          24.4
    G3          33544        4.6       211274          29.0
    G4          14517        2.0       225791          31.0
    G5           3862        0.5       229653          31.5
    H1             41        0.0       229694          31.5
    H2            153        0.0       229847          31.5
    H3           1550        0.2       231397          31.7

Page 346

(Continued)

      OBS    INC_EST2    INC_EST3
    26608        .        54.4467
    30922        .        42.8928
    31141        .        41.6508
    32963        .        42.6343
    32986        .        41.2255
    34175        .        41.2114
    34702        .        41.6541
    35897        .        47.0708
    36616        .        42.7077
    42978        .        41.3285
    44612        .        53.0752
    45165        .        43.7436
    45959        .        41.1110
    46242        .        41.9122
    46428        .        41.6833
    46439        .        42.5990
    47002        .        42.0267
    47400        .        41.8678
    48237        .        51.3944
    49472        .        44.1610
    50012        .        41.7293
    50059        .        43.1998
    50236        .        42.3850
    50446        .        42.0642
    51312        .        42.6312
    52741        .        42.0676
    53961        .        42.1993
    53972        .        43.8084
    54715        .        43.3766
    55422        .        44.0192
    57848        .        45.8676
    59262        .        41.3399
    59450        .        41.4544
    59512        .        43.7946
    59675        .        41.4544
    60545        .        46.7328
    64254        .        52.3536
    66336        .        42.3752
    69200        .        43.4159
    70318        .        41.5689
    72152        .        41.6255
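
The regression substitution reported in this section can be sketched in Python as follows. This is an illustrative sketch and not the book's SAS code. The coefficients are copied from the regression output for INCOME; the dictionary keys (`income`, `infd_age`, `hom_equ`, `credlin`, `tot_bal`) are hypothetical names chosen for the example, and the predictors are assumed to be in the same units as the variables used to fit the model.

```python
# Parameter estimates copied from the regression output in this section.
# HOM_EQU2 and TOT_BAL2 use the extra digits shown in the parameter listing.
INTERCEPT = 36.87607683
B_INFD_AGE = 0.11445815
B_HOM_EQU = -0.0000034348
B_CREDLIN = 0.00011957
B_TOT_BAL = -0.0000067044


def predict_income(infd_age, hom_equ, credlin, tot_bal):
    """Linear prediction of income from the four regressors."""
    return (
        INTERCEPT
        + B_INFD_AGE * infd_age
        + B_HOM_EQU * hom_equ
        + B_CREDLIN * credlin
        + B_TOT_BAL * tot_bal
    )


def substitute_income(rec):
    """Keep an observed income; otherwise fall back to the regression prediction."""
    if rec["income"] is not None:
        return rec["income"]
    return predict_income(
        rec["infd_age"], rec["hom_equ"], rec["credlin"], rec["tot_bal"]
    )


# Made-up record for illustration only; not taken from the source data.
example = {
    "income": None,
    "infd_age": 40,
    "hom_equ": 100_000,
    "credlin": 50_000,
    "tot_bal": 10_000,
}
filled = substitute_income(example)
```

Observed incomes pass through unchanged, so only records with a missing value receive a predicted one. This mirrors the printed listing, where INC_EST2 is missing and INC_EST3 holds the substituted value.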