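The PROC UNIVARIATE listings that follow (AGE_FIL and NO30DAY, among others) use SMP_WGT as a weight variable, so the moments are weighted. The Python sketch below shows one way the main moments fit together. It assumes the SAS default VARDEF=DF, under which the variance divisor is the number of observations minus one rather than the sum of the weights. The AGE_FIL figures are consistent with this assumption: CSS / (N - 1) = 5.5238E9 / 85403 ≈ 64679, and Std Dev / sqrt(Sum Wgts) = 254.3212 / sqrt(729228) ≈ 0.2978. The function name `weighted_moments` is hypothetical.

```python
import math


def weighted_moments(values, weights):
    """Weighted moments in the style of PROC UNIVARIATE with a WEIGHT variable.

    Assumes the SAS default VARDEF=DF: the variance divisor is the number of
    observations minus one, not the sum of the weights minus one.
    """
    n = len(values)
    sum_w = sum(weights)
    total = sum(w * x for x, w in zip(values, weights))      # "Sum"
    mean = total / sum_w                                     # weighted mean
    uss = sum(w * x * x for x, w in zip(values, weights))    # uncorrected SS
    css = sum(w * (x - mean) ** 2 for x, w in zip(values, weights))  # corrected SS
    variance = css / (n - 1)
    std = math.sqrt(variance)
    std_mean = std / math.sqrt(sum_w)  # standard error of the weighted mean
    return {
        "N": n, "Sum Wgts": sum_w, "Mean": mean, "Sum": total,
        "Std Dev": std, "Variance": variance, "USS": uss, "CSS": css,
        "CV": 100 * std / mean, "Std Mean": std_mean,
        "T:Mean=0": mean / std_mean,
    }


# Small made-up example: three observations with weights 1, 1, 2.
m = weighted_moments([1, 2, 3], [1, 1, 2])
print(m["Mean"], m["CSS"], m["Variance"])  # 2.25 2.75 1.375

# Cross-check against the published AGE_FIL figures.
print(round(254.3212 / math.sqrt(729228), 6))  # Std Mean, about 0.297818
```

The same identities hold for NO30DAY. For example, CV = 100 × 5.898548 / 0.718138 ≈ 821.37, and T:Mean=0 = Mean / Std Mean.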
Extremes

    Lowest      Obs        Highest      Obs
         0  (84821)        4577900  (23869)
         0  (79629)        4828676  (19189)
         0  (74719)        5451884  (69915)
         0  (70318)        5745300  (17281)
         0  (59675)        9250120  (13742)

Age of File

Univariate Procedure

Variable=AGE_FIL
Weight=SMP_WGT

Moments

N              85404     Sum Wgts      729228
Mean        174.3028     Sum         1.2711E8
Std Dev     254.3212     Variance    64679.28
Skewness           .     Kurtosis           .
USS         2.768E10     CSS         5.5238E9
CV          145.9077     Std Mean    0.297818
T:Mean=0    585.2661     Pr>|T|        0.0001
Num ^= 0       85351     Num > 0        85351
M(Sign)      42675.5     Pr>=|M|       0.0001
Sgn Rank    1.8212E9     Pr>=|S|       0.0001

Quantiles (Def=5)

100% Max         666     99%              347
75% Q3           227     95%              329
50% Med          169     90%              295
25% Q1           108     10%               52
0% Min             0     5%                31
                         1%                 5

Range            666
Q3-Q1            119
Mode             214

Extremes

    Lowest      Obs        Highest      Obs
         0  (84242)            619  (84665)
         0  (81481)            625  (58765)
         0  (80706)            663  (38006)
         0  (80589)            666  (10497)
         0  (76936)            666  (71596)

Number of 30-Day Delinquencies

Univariate Procedure

Variable=NO30DAY
Weight=SMP_WGT

Moments

N              85404     Sum Wgts      729228
Mean        0.718138     Sum           523686
Std Dev     5.898548     Variance    34.79287
Skewness           .     Kurtosis           .
USS          3347494     CSS          2971415
CV          821.3675     Std Mean    0.006907
T:Mean=0    103.9667     Pr>|T|        0.0001
Num ^= 0       22303     Num > 0        22303
M(Sign)      11151.5     Pr>=|M|       0.0001
Sgn Rank    1.2436E8     Pr>=|S|       0.0001

Quantiles (Def=5)

100% Max          43     99%               10
75% Q3             1     95%                4
50% Med            0     90%                2
25% Q1             0     10%                0
0% Min             0     5%                 0
                         1%                 0

Range             43
Q3-Q1              1
Mode               0

Extremes

    Lowest      Obs        Highest      Obs
         0  (85404)             37  (78590)
         0  (85403)             38  (44354)
         0  (85402)             38  (47412)
         0  (85401)             43  ( 7285)
         0  (85400)             43  (46812)

Replacing Missing Values for Income

The first frequency table shows the distribution of HOME EQUITY. It is used to create the matrix that displays mean INCOME by HOME EQUITY range and AGE group.
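A matrix like this can be built by averaging INCOME within each combination of home-equity range and age group. The Python sketch below assumes each record is a dict whose banded fields already exist. The keys `homeq_r`, `age_grp`, and `income` and the function name `mean_matrix` are hypothetical stand-ins for the book's variables. Records with missing income are skipped. The text does not say whether the book's cell means are weighted by SMP_WGT, so this version is unweighted.

```python
from collections import defaultdict


def mean_matrix(records, row_key, col_key, value_key):
    """Average value_key within each (row_key, col_key) cell.

    Records whose value is missing (None) are skipped, so each cell mean
    uses only the observed values in that cell.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        value = rec.get(value_key)
        if value is None:
            continue
        cell = (rec[row_key], rec[col_key])
        sums[cell] += value
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


# Hypothetical records with pre-banded home equity and age group.
records = [
    {"homeq_r": "$0-$100K", "age_grp": "25 - 34", "income": 40},
    {"homeq_r": "$0-$100K", "age_grp": "25 - 34", "income": 50},
    {"homeq_r": "$0-$100K", "age_grp": "25 - 34", "income": None},
    {"homeq_r": "$700K+", "age_grp": "55 - 65", "income": 107},
]
matrix = mean_matrix(records, "homeq_r", "age_grp", "income")
print(matrix[("$0-$100K", "25 - 34")])  # 45.0
```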
                                   Cumulative    Cumulative
HOMEQ_R      Frequency   Percent    Frequency       Percent
$0-$100K         53723      62.9        53723          62.9
$100-$20         20285      23.8        74008          86.7
$200-$30          7248       8.5        81256          95.1
$300-$40          2269       2.7        83525          97.8
$400-$50           737       0.9        84262          98.7
$500-$60           339       0.4        84601          99.1
$600-$70           213       0.2        84814          99.3
$700K+             590       0.7        85404         100.0

Mean INCOME by HOME EQUITY and AGE Group

                            Age group
Home Equity   25 - 34   35 - 44   45 - 54   55 - 65
$0-$100K          $47       $55       $57       $55
$100-$20          $70       $73       $72       $68
$200-$30          $66       $73       $76       $68
$300-$40          $70       $80       $84       $76
$400-$50          $89       $93       $94       $93
$500-$60          $98      $101      $102       $97
$600-$70          $91      $105      $109      $104
$700K+            $71      $102      $111      $107

The following regression used INFERRED AGE, HOME EQUITY, CREDIT LINE, and TOTAL BALANCES to predict missing values for INCOME.

Step 0   All Variables Entered     R-square = 0.66962678     C(p) = 5.000000000

               DF      Sum of Squares       Mean Square            F
Regression      4     372912001.28200   93228000.320499      43230.8
Error       85315     183983294.99609     2156.51755255
Total       85319     556895296.27809

               Parameter      Standard            Type II
Variable        Estimate         Error     Sum of Squares          F
INTERCEP     36.87607683    0.25721498    44325117.661102    20554.0
INFD_AG2      0.11445815    0.00602800    777500.06857066     360.54
HOM_EQU2     -0.00000343    0.00000040    158246.95496857      73.38
CREDLIN2      0.00011957    0.00000120    21410199.091371    9928.14
TOT_BAL2     -0.00000670    0.00000136     52180.72203473      24.20

Bounds on condition number: 19.28444, 161.5108

All variables left in the model are significant at the 0.1000 level.

OBS   _MODEL_   _TYPE_   _DEPVAR_    _RMSE_   INTERCEP   INFD_AG2
  1   INC_REG   PARMS    INC_EST2   46.4383    36.8761    0.11446

OBS       HOM_EQU2    CREDLIN2        TOT_BAL2   INC_EST2
  1   -.0000034348   .00011957   -.0000067044         -1

The following listing shows the INCOME values after regression substitution.

  OBS    INC_EST2    INC_EST3
  436           .     43.3210
 2027           .     44.5382
 4662           .     40.2160
 5074           .     43.5390
 5256           .     41.6833
 5552           .     43.1713
 6816           .     41.4527
 8663           .     62.8130
10832           .     42.3638
11052           .     42.4500
14500           .     41.6961
14809           .     41.2255
15132           .     63.4676
15917           .     41.2685
16382           .     41.8029
16788           .     40.8224
16832           .     34.2062
16903           .     42.0813
17201           .     43.5734
17419           .     42.8501
17540           .     43.9865
17700           .     40.3016
18045           .     42.4653
18147           .     43.5607
18254           .     41.3758
18296           .     41.3878
18362           .     41.5944
18597           .     40.7010
18931           .     51.3921
19058           .     42.4845
20419           .     42.0720
20640           .     40.1936
22749           .     42.1277
23201           .     42.6050
23334           .     42.4007
23651           .     41.7607
24227           .     34.0227
24764           .     39.8351
25759           .     42.0662
(continues)

... attrition or churn, risk, and lifetime or net present value. Detailed code for developing the objective function includes examples from the credit card, insurance, telecommunications, and catalog industries. The code is well documented and explains the goals and methodology for each step. The only software needed is BASE SAS and SAS/STAT. The spreadsheets used for creating gains tables and lift charts are ...

... blocks of SAS code used to develop, validate, and implement the data models. By adapting this code and using some common sense, it is possible to build a model from the data preparation phase through model development and validation. However, this could take a considerable amount of time and introduce the possibility of coding errors. To simplify this task and make the code easily accessible for a variety ...

... complete understanding of the recipes in Data Mining Cookbook. The spreadsheets contain all formulas used to create the tables and charts in Data Mining Cookbook.

... Software. Insert the CD-ROM and launch the readme.htm file in a web browser, or navigate using Windows Explorer to browse the contents of the CD. The model programs and output are in text format and can be opened in any editing software (including SAS) that reads ASCII files. Spreadsheets are in Microsoft Excel 97/2000 and Microsoft Excel 5.0/95. Launch the application (SAS 6.12 or higher) and open the file directly ...
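The regression substitution in the income output (the INC_REG model) fills a missing INC_EST2 with the value predicted by the fitted equation. The Python sketch below applies those parameter estimates. Two points are assumptions. First, INC_EST3 is taken to keep the observed INC_EST2 when one exists, because the printed rows show only records where INC_EST2 is missing. Second, each record is a dict keyed by the OUTEST variable names. The function names `predict_income` and `substitute_income` are hypothetical.

```python
# Parameter estimates from the INC_REG output. INTERCEP, INFD_AG2 and
# CREDLIN2 come from the parameter table; HOM_EQU2 and TOT_BAL2 use the
# longer values printed in the OUTEST listing.
COEFFICIENTS = {
    "INFD_AG2": 0.11445815,
    "HOM_EQU2": -0.0000034348,
    "CREDLIN2": 0.00011957,
    "TOT_BAL2": -0.0000067044,
}
INTERCEPT = 36.87607683


def predict_income(rec):
    """Linear prediction of income from the four regressors in rec."""
    return INTERCEPT + sum(coef * rec[name] for name, coef in COEFFICIENTS.items())


def substitute_income(rec):
    """Keep the observed INC_EST2; otherwise use the regression prediction.

    Keeping the observed value is an assumption: the printed rows only
    show records where INC_EST2 is missing.
    """
    observed = rec.get("INC_EST2")
    return observed if observed is not None else predict_income(rec)


# Hypothetical record: income missing, inferred age 40, dollar amounts zero.
rec = {"INC_EST2": None, "INFD_AG2": 40, "HOM_EQU2": 0, "CREDLIN2": 0, "TOT_BAL2": 0}
print(round(substitute_income(rec), 4))  # 41.4544
```

Because the model is linear, each prediction is the intercept plus a weighted sum of the four inputs. This is also why the printed INC_EST3 values cluster near the intercept plus the age term for records with modest balances and credit lines.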
... analyses created in SAS. While the steps before and after the model processing can be used in conjunction with any data modeling software package, the code can also serve as a stand-alone modeling template. The model processing steps focus on variable preparation for use in logistic regression. Additional efficiencies in the form of SAS macros for variable processing and validation are included.

... wish to make changes, you can rename the files and save them to your local hard drive.

Using the Software

The CD is organized into folders that correspond to each chapter in Data Mining Cookbook. Within each folder are subfolders containing SAS programs, SAS output, and Excel spreadsheets. The programs can be used as templates; you just need to change the data set and variable names. More specific instructions ...
...
Y               728945     100.0       729228         100.0

                                   Cumulative    Cumulative
DRIV_IN      Frequency   Percent    Frequency       Percent
A               338410      46.4       338410          46.4
N               333066      45.7       671476          92.1
O                57752       7.9       729228         100.0

MOB_IND ...
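The DRIV_IN counts sum to 729228, the same total as Sum Wgts in the univariate listings, so they appear to be weighted by SMP_WGT. The HOMEQ_R counts, by contrast, sum to the unweighted N of 85404. The Python sketch below builds a PROC FREQ-style table with cumulative columns for either case by summing a weight per level; use a weight of 1 per record for unweighted counts. Levels are sorted by value, which roughly mirrors PROC FREQ's default internal ordering. The function name `frequency_table` is hypothetical.

```python
def frequency_table(pairs):
    """Build PROC FREQ-style rows from (level, weight) pairs.

    Each row is (level, frequency, percent, cumulative frequency,
    cumulative percent). Cumulative percent is computed from the
    cumulative frequency, not by adding rounded percents.
    """
    freq = {}
    for level, weight in pairs:
        freq[level] = freq.get(level, 0) + weight
    total = sum(freq.values())
    rows, cumulative = [], 0
    for level in sorted(freq):  # sort by level value, similar to ORDER=INTERNAL
        cumulative += freq[level]
        rows.append((
            level,
            freq[level],
            round(100 * freq[level] / total, 1),
            cumulative,
            round(100 * cumulative / total, 1),
        ))
    return rows


# The DRIV_IN totals, one aggregated (level, weight) pair per level.
for row in frequency_table([("A", 338410), ("N", 333066), ("O", 57752)]):
    print(row)
```

The printed rows match the DRIV_IN table, for example `('N', 333066, 45.7, 671476, 92.1)`.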