Document information
Pages: 50
File size: 742.38 KB
Finding Minimal Neural Networks for Business Intelligence Applications
Rudy Setiono
School of Computing
National University of Singapore
www.comp.nus.edu.sg/~rudys

Outline
• Introduction
• Feed-forward neural networks
• Neural network training and pruning
• Rule extraction
• Business intelligence applications
• Conclusion
• References
• For discussion: Time-series data mining using neural network rule extraction

Introduction
• Business Intelligence (BI): a set of mathematical models and analysis methodologies that exploit available data to generate information and knowledge useful for complex decision-making processes.
• Mathematical models and analysis methodologies for BI include various inductive learning models for data mining, such as decision trees, artificial neural networks, fuzzy logic, genetic algorithms, support vector machines, and intelligent agents.

Introduction
BI analytical applications include:
• Customer segmentation: What market segments do my customers fall into, and what are their characteristics?
• Propensity to buy: Which customers are most likely to respond to my promotion?
• Fraud detection: How can I tell which transactions are likely to be fraudulent?
• Customer attrition: Which customers are at risk of leaving?
• Credit scoring: Which customers will successfully repay their loans and not default on their credit card payments?
• Time-series prediction.

Feed-forward neural networks
A feed-forward neural network with one hidden layer:
• Input variable values are given to the input units.
• The hidden units compute their activation values using the input values and the connection weight values W.
• The hidden unit activations are given to the output units.
• A decision is made at the output layer according to the activation values of the output units.

Feed-forward neural networks
Hidden unit activation:
• Compute the weighted input: $w_1 x_1 + w_2 x_2 + \dots + w_n x_n$.
• Apply an activation function to this weighted input, for example the logistic function $f(x) = 1/(1 + e^{-x})$.

Neural network training and pruning
Neural network training:
• Find optimal weights (W, V).
• Minimize a function that measures how well the network predicts the desired outputs (class labels).
• Error in prediction for the i-th sample: $e_i = (\text{desired output})_i - (\text{predicted output})_i$
• Sum of squared errors function: $E(W,V) = \sum_i e_i^2$
• Cross-entropy error function: $E(W,V) = -\sum_i \left[ d_i \log p_i + (1 - d_i) \log(1 - p_i) \right]$, where $d_i$ is the desired output, either 0 or 1, and $p_i$ is the predicted output.

Neural network training and pruning
Neural network training:
• Many optimization methods can be applied to find an optimal (W, V):
  o Gradient descent / error backpropagation
  o Conjugate gradient
  o Quasi-Newton method
  o Genetic algorithm
• A network is considered well trained if it can predict the training data and the cross-validation data with acceptable accuracy.

Neural network training and pruning
Neural network pruning: remove irrelevant/redundant network connections.
1. Initialization.
   (a) Let W be the set of network connections that are still present in the network, and
   (b) let C be the set of connections that have been checked for possible removal.
   (c) Initially, W corresponds to all the connections in the fully connected trained network, and C is the empty set.
2. Save a copy of the weight values of all connections in the network.
3. Find w ∈ W with w ∉ C such that, when its weight value is set to 0, the accuracy of the network is least affected.
4. Set the weight of network connection w to 0 and retrain the network.
5. If the accuracy of the network is still satisfactory, then
   (a) Remove w, i.e. set W := W − {w}.
   (b) Reset C := ∅.
   (c) Go to Step 2.
6. Otherwise,
   (a) Set C := C ∪ {w}.
   (b) Restore the network weights with the values saved in Step 2 above.
   (c) If C ≠ W, go to Step 2. Otherwise, stop.

Neural network training and pruning
Pruned neural network for LED recognition (1)
[Figure: seven-segment LED display with segments z1 to z7]
How many hidden units and network connections are needed to recognize all ten digits correctly?

[...]
Neural network training and pruning
Pruned neural network for LED recognition (2)
Raw data:
z1 z2 z3 z4 z5 z6 z7  Digit
 1  1  1  0  1  1  1    0
 0  0  1  0  0  1  0    1
 1  0  1  1  1  0  1    2
 1  0  1  1  0  1  1    3
 0  1  1  1  0  1  0    4
 1  1  0  1  0  1  1    5
 1  1  0  1  1  1  1    6
 1  0  1  0  0  1  0    7
 1  1  1  1  1  1  1    8
 1  1  1  1  0  1  1    9
[Figure: A neural network for data analysis; processed data]

Neural network training and pruning
Pruned neural network for LED recognition (3)
[...]
Many different pruned neural networks can recognize all 10 digits correctly.

Part 2. Novel techniques for data analysis
Neural network training and pruning
Pruned neural network for LED recognition (4): What do we learn?
[Figure: pruned networks for digits 0, 1 and 2; legend: must be on, must be off, doesn't matter]
Classification rules can be extracted from pruned networks. [...]

Business intelligence applications
Experiment 1: CARD datasets
• 30 neural networks were trained for each of the data sets.
• Each neural network starts with one hidden neuron.
• The number of input neurons, including one bias input, was 52.
• The initial weights of the networks were randomly and uniformly generated in the interval [−1, 1]. [...]

[...]
NN (other)        13.95   18.02   18.02
NeuralWorks       14.07   18.37   15.13
NeuroShell        12.73   18.72   15.81
Pruned NN (θ1)    12.21   18.24   15.33
Pruned NN (θ2)    11.65   14.83   12.85

Business intelligence applications
Experiment 1: CARD datasets
• Neural networks with just one hidden unit and very few connections outperform more complex neural networks!
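The hidden-unit activation and the two error functions from the training slides above can be sketched in a few lines of Python. This is a minimal illustration, assuming a single logistic output unit; the function names (`logistic`, `forward`, `sum_squared_error`, `cross_entropy`) are hypothetical and not taken from the slides.

```python
import math


def logistic(x: float) -> float:
    """Logistic activation f(x) = 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def forward(x, W, V):
    """One hidden layer and one output unit (the logistic output is an assumption).

    x: n input values (append a constant 1.0 if a bias input is wanted)
    W: one weight vector of length n per hidden unit
    V: one hidden-to-output weight per hidden unit
    """
    # Weighted input w_1 x_1 + ... + w_n x_n, then the activation, per hidden unit.
    hidden = [logistic(sum(w_k * x_k for w_k, x_k in zip(w, x))) for w in W]
    return logistic(sum(v * h for v, h in zip(V, hidden)))


def sum_squared_error(d, p):
    """E(W, V) = sum_i e_i^2 with e_i = d_i - p_i."""
    return sum((d_i - p_i) ** 2 for d_i, p_i in zip(d, p))


def cross_entropy(d, p, eps=1e-12):
    """E(W, V) = -sum_i [d_i log p_i + (1 - d_i) log(1 - p_i)], with d_i in {0, 1}."""
    total = 0.0
    for d_i, p_i in zip(d, p):
        p_i = min(max(p_i, eps), 1.0 - eps)  # keep log() finite
        total += d_i * math.log(p_i) + (1 - d_i) * math.log(1.0 - p_i)
    return -total
```

For instance, two samples with desired outputs 1 and 0 that are both predicted as 0.5 give a sum of squared errors of 0.5 and a cross-entropy of 2 ln 2.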
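The pruning procedure (Steps 1 to 6 above) can be sketched in pure Python as follows. The sketch rests on several assumptions that the slides do not state: logistic hidden and output units, an extra output-bias weight (the last entry of `V`), plain batch gradient descent on the cross-entropy error for training and retraining, and a hypothetical accuracy threshold `min_acc` that stands in for "still satisfactory". Connections are identified by hypothetical keys `("W", j, k)` (input k to hidden unit j) and `("V", j)` (hidden unit j, or the bias, to the output).

```python
import math
import random  # used in the usage example below


def logistic(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(x, W, V):
    """x includes a constant 1.0 bias input; V[-1] is an (assumed) output bias."""
    h = [logistic(sum(wk * xk for wk, xk in zip(row, x))) for row in W]
    return logistic(sum(v * a for v, a in zip(V, h + [1.0]))), h


def accuracy(X, y, W, V):
    return sum((predict(x, W, V)[0] > 0.5) == (t == 1) for x, t in zip(X, y)) / len(X)


def get_w(W, V, c):
    return W[c[1]][c[2]] if c[0] == "W" else V[c[1]]


def set_w(W, V, c, value):
    if c[0] == "W":
        W[c[1]][c[2]] = value
    else:
        V[c[1]] = value


def train(X, y, W, V, present, epochs=500, lr=0.3):
    """Batch gradient descent on cross-entropy; only connections in `present` change."""
    for _ in range(epochs):
        gW = [[0.0] * len(row) for row in W]
        gV = [0.0] * len(V)
        for x, t in zip(X, y):
            p, h = predict(x, W, V)
            delta = p - t  # dE/dz at a logistic output under cross-entropy
            for j, a in enumerate(h + [1.0]):
                gV[j] += delta * a
            for j, a in enumerate(h):
                dz = delta * V[j] * a * (1.0 - a)
                for k, xk in enumerate(x):
                    gW[j][k] += dz * xk
        for c in present:
            g = gW[c[1]][c[2]] if c[0] == "W" else gV[c[1]]
            set_w(W, V, c, get_w(W, V, c) - lr * g)


def zeroed_accuracy(X, y, W, V, c):
    """Accuracy with connection c temporarily set to 0 (no retraining)."""
    old = get_w(W, V, c)
    set_w(W, V, c, 0.0)
    acc = accuracy(X, y, W, V)
    set_w(W, V, c, old)
    return acc


def prune(X, y, W, V, min_acc, epochs=500):
    # Step 1: W = all connections of the trained network, C = empty set.
    present = {("W", j, k) for j in range(len(W)) for k in range(len(W[j]))}
    present |= {("V", j) for j in range(len(V))}
    checked = set()
    while True:
        candidates = sorted(present - checked)  # sorted: deterministic tie-breaking
        if not candidates:  # Step 6(c): C = W, so stop
            return present
        saved_W, saved_V = [row[:] for row in W], V[:]  # Step 2
        # Step 3: the unchecked connection whose zeroing hurts accuracy least.
        w = max(candidates, key=lambda c: zeroed_accuracy(X, y, W, V, c))
        set_w(W, V, w, 0.0)  # Step 4: zero it and retrain the remaining weights
        train(X, y, W, V, present - {w}, epochs)
        if accuracy(X, y, W, V) >= min_acc:  # Step 5: accept the removal
            present.remove(w)
            checked = set()
        else:  # Step 6: remember w and restore the weights saved in Step 2
            checked.add(w)
            W[:] = saved_W
            V[:] = saved_V
```

As a usage example, train a two-hidden-unit network on the four samples of y = x1 (with an irrelevant input x2) using `train` over all connections, then call `prune(X, y, W, V, min_acc=1.0)`; the returned set holds the surviving connections and `W`, `V` hold the pruned weights. Because rejected trials restore the weights saved in Step 2, the final network keeps the accuracy it had before the last accepted removal.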
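The LED table above can be encoded directly. To illustrate the form of the "must be on / must be off" rules mentioned for the pruned LED networks, the sketch below does not use a neural network at all: it brute-forces, for each digit, the smallest set of segments whose on/off values single that digit out among the ten patterns. It illustrates the rule form only and is not the rule-extraction method of these slides; `LED`, `minimal_rule` and `matches` are hypothetical names.

```python
from itertools import combinations

# Seven-segment patterns z1..z7 from the LED table (1 = segment on).
LED = {
    0: "1110111", 1: "0010010", 2: "1011101", 3: "1011011", 4: "0111010",
    5: "1101011", 6: "1101111", 7: "1010010", 8: "1111111", 9: "1111011",
}


def minimal_rule(digit):
    """Smallest set of segments whose on/off values match `digit` and no other digit.

    Subsets are tried in increasing size (lexicographic within a size);
    the result looks like {"z1": "off", "z2": "off"}.
    """
    target = LED[digit]
    others = [p for d, p in LED.items() if d != digit]
    for size in range(1, 8):
        for subset in combinations(range(7), size):
            # Every other digit must disagree with `digit` on at least one chosen segment.
            if all(any(p[s] != target[s] for s in subset) for p in others):
                return {f"z{s + 1}": "on" if target[s] == "1" else "off" for s in subset}
    return None  # only reachable if two digits shared the same pattern


def matches(rule, pattern):
    """True if a seven-segment pattern satisfies every condition in `rule`."""
    return all((pattern[int(z[1:]) - 1] == "1") == (v == "on") for z, v in rule.items())
```

For example, `minimal_rule(1)` returns `{"z1": "off", "z2": "off"}`: among the ten patterns, only digit 1 has both z1 and z2 off.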
• Rules can be extracted to provide more understanding about the classification. [...]

[...] attributes C57, C58, ..., C63.
• 666 randomly selected samples are used for training and the remaining 334 samples for testing.

Business intelligence applications
Experiment 2: German credit data set
• A pruned network with one hidden unit and 10 input units was found to have [...]

[...] Operating Characteristic (ROC) Curve (AUC) is also computed.

Business intelligence applications
Experiment 1: CARD datasets
• where $\alpha_i$ are the predicted outputs for Class 1 samples, i = 1, 2, …, m, and $\beta_j$ are the predicted outputs for Class 0 samples, j = 1, 2, …, n.
• AUC is a more appropriate performance measure than ACC when the class distribution is skewed. [...]

[...] CARD3 (TS)
• θ is the cut-off point for neural network classification: if the output is greater than θ, then predict Class 1; otherwise predict Class 0.
• θ1 and θ2 are cut-off points selected to maximize the accuracy on the training data and the test data sets, respectively.
• $\mathrm{AUC}_d$ = AUC for the discrete classifier = $(1 - fp + tp)/2$, where tp and fp are the true-positive and false-positive rates at the cut-off.

[...]
Business intelligence applications
Experiment 3: Bene1 and Bene2 credit scoring data sets
• A pruned neural network for Bene1:
[Figure: pruned neural network for Bene1]

Business intelligence applications
Experiment 3: Bene1 and Bene2 credit scoring data sets
• The extracted rules for Bene1 (partial):
Rule R: If Purpose = cash provisioning and Marital status = not married and Applicant [...]
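The AUC used in the CARD experiments can be computed from the predicted outputs $\alpha_i$ (Class 1) and $\beta_j$ (Class 0) by counting correctly ordered pairs. This sketch assumes the standard Wilcoxon-Mann-Whitney form, $\mathrm{AUC} = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n} \mathbf{1}[\alpha_i > \beta_j]$, with ties counted as 1/2 (the tie rule is an assumption); `auc` is a hypothetical helper name.

```python
def auc(alpha, beta):
    """AUC from predicted outputs.

    alpha: outputs for the m Class 1 samples; beta: outputs for the n Class 0 samples.
    Returns the fraction of (Class 1, Class 0) pairs in which the Class 1 sample
    gets the higher output; ties count 1/2 (Wilcoxon-Mann-Whitney form).
    """
    if not alpha or not beta:
        raise ValueError("need at least one sample from each class")
    score = 0.0
    for a in alpha:
        for b in beta:
            if a > b:
                score += 1.0
            elif a == b:
                score += 0.5
    return score / (len(alpha) * len(beta))
```

The double loop costs O(mn), which is fine for data sets of a few hundred samples; for large sets, sorting all outputs and summing ranks gives the same value in O((m + n) log(m + n)).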
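The cut-off rule for θ, the choice of θ that maximizes accuracy, and $\mathrm{AUC}_d = (1 - fp + tp)/2$ can be sketched as below, reading tp and fp as the true-positive and false-positive rates at θ. The helper names and the candidate set for θ (the observed outputs plus one value below all of them) are assumptions. Calling `best_cutoff` on training outputs corresponds to θ1, and calling it on test outputs corresponds to θ2.

```python
def rates(outputs, labels, theta):
    """Predict Class 1 when output > theta; return (tp rate, fp rate, accuracy).

    Assumes both classes occur in `labels` (1 = Class 1, 0 = Class 0).
    """
    pos = sum(1 for t in labels if t == 1)
    neg = len(labels) - pos
    tp = sum(1 for o, t in zip(outputs, labels) if o > theta and t == 1)
    fp = sum(1 for o, t in zip(outputs, labels) if o > theta and t == 0)
    correct = tp + (neg - fp)  # true positives + true negatives
    return tp / pos, fp / neg, correct / len(labels)


def best_cutoff(outputs, labels):
    """Cut-off maximizing accuracy. Candidates are the observed outputs plus one
    value below all of them (predict Class 1 everywhere); ties go to the smallest theta."""
    candidates = [min(outputs) - 1.0] + sorted(set(outputs))
    return max(candidates, key=lambda th: rates(outputs, labels, th)[2])


def auc_discrete(outputs, labels, theta):
    """AUC_d = (1 - fp + tp) / 2 for the classifier fixed at cut-off theta."""
    tp_rate, fp_rate, _ = rates(outputs, labels, theta)
    return (1.0 - fp_rate + tp_rate) / 2.0
```

For outputs `[0.9, 0.7, 0.4, 0.2]` with labels `[1, 1, 0, 0]`, the best cut-off is 0.4 (all four samples correct), whereas θ = 0.7 misses one Class 1 sample and gives $\mathrm{AUC}_d = 0.75$.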
Rule extraction
Re-RX: an algorithm for rule extraction from neural networks
• New pedagogical rule extraction algorithm: Re-RX (Recursive Rule Extraction).
• Handles a mix of discrete/continuous variables without the need for discretization of continuous variables – [...]

[...] Similar scoring models are now also used to estimate the credit risk of entire loan portfolios in the context of Basel II.

Business intelligence applications
• Basel II capital accord: a framework regulating minimum capital requirements for banks.
• Customer data → credit risk score → how much capital to set aside for a portfolio of loans.
• Data collected from various operational systems in the bank, [...]