Document information
Finding Minimal Neural Networks for Business Intelligence Applications

Rudy Setiono
School of Computing
National University of Singapore
www.comp.nus.edu.sg/~rudys

Outline
• Introduction
• Feed-forward neural networks
• Neural network training and pruning
• Rule extraction
• Business intelligence applications
• Conclusion
• References
• For discussion: Time-series data mining using neural network rule extraction

Introduction
• Business Intelligence (BI): A set of mathematical models and analysis methodologies that exploit available data to generate information and knowledge useful for complex decision-making processes.
• Mathematical models and analysis methodologies for BI include various inductive learning models for data mining, such as decision trees, artificial neural networks, fuzzy logic, genetic algorithms, support vector machines, and intelligent agents.

BI analytical applications include:
• Customer segmentation: What market segments do my customers fall into, and what are their characteristics?
• Propensity to buy: Which customers are most likely to respond to my promotion?
• Fraud detection: How can I tell which transactions are likely to be fraudulent?
• Customer attrition: Which customer is at risk of leaving?
• Credit scoring: Which customers will successfully repay their loans and not default on their credit card payments?
• Time-series prediction.

Feed-forward neural networks
A feed-forward neural network with one hidden layer:
• Input variable values are given to the input units.
• The hidden units compute their activation values using the input values and the connection weight values W.
• The hidden unit activations are given to the output units.
• The decision is made at the output layer according to the activation values of the output units.

Hidden unit activation:
• Compute the weighted input: $w_1 x_1 + w_2 x_2 + \dots + w_n x_n$
• Apply an activation function to this weighted input, for example the logistic function $f(x) = 1/(1 + e^{-x})$.

Neural network training and pruning
Neural network training:
• Find an optimal set of weights (W, V).
• Minimize a function that measures how well the network predicts the desired outputs (class labels).
• Error in prediction for the i-th sample: $e_i = (\text{desired output})_i - (\text{predicted output})_i$
• Sum of squared error function: $E(W,V) = \sum_i e_i^2$
• Cross-entropy error function: $E(W,V) = -\sum_i \left[ d_i \log p_i + (1 - d_i) \log(1 - p_i) \right]$, where $d_i$ is the desired output, either 0 or 1, and $p_i$ is the predicted output.
• Many optimization methods can be applied to find an optimal (W, V):
  o Gradient descent / error backpropagation
  o Conjugate gradient
  o Quasi-Newton method
  o Genetic algorithm
• A network is considered well trained if it can predict the training data and the cross-validation data with acceptable accuracy.

Neural network pruning: Remove irrelevant/redundant network connections
1. Initialization.
   (a) Let W be the set of network connections that are still present in the network, and
   (b) let C be the set of connections that have been checked for possible removal.
   (c) W corresponds to all the connections in the fully connected trained network, and C is the empty set.
2. Save a copy of the weight values of all connections in the network.
3. Find w ∈ W and w ∉ C such that when its weight value is set to 0, the accuracy of the network is least affected.
4. Set the weight for network connection w to 0 and retrain the network.
5. If the accuracy of the network is still satisfactory, then
   (a) Remove w, i.e. set W := W − {w}.
   (b) Reset C := ∅.
   (c) Go to Step 2.
6. Otherwise,
   (a) Set C := C ∪ {w}.
   (b) Restore the network weights with the values saved in Step 2 above.
   (c) If C ≠ W, go to Step 2. Otherwise, stop.

Pruned neural network for LED recognition (1)
[Figure: seven-segment LED display with segments z1 to z7]
How many hidden units and network connections are needed to recognize all ten digits correctly?

[...]

Pruned neural network for LED recognition (2)
Raw data:

z1  z2  z3  z4  z5  z6  z7  Digit
1   1   1   0   1   1   1   0
0   0   1   0   0   1   0   1
1   0   1   1   1   0   1   2
1   0   1   1   0   1   1   3
0   1   1   1   0   1   0   4
1   1   0   1   0   1   1   5
1   1   0   1   1   1   1   6
1   0   1   0   0   1   0   7
1   1   1   1   1   1   1   8
1   1   1   1   0   1   1   9

[Figure labels: raw data; a neural network for data analysis; processed data]

Pruned neural network for LED recognition (3)
Many different pruned neural networks can recognize all 10 digits correctly.

Part 2. Novel techniques for data analysis

Pruned neural network for LED recognition (4): What do we learn?
[Figure: segment patterns for digits 0, 1, and 2; legend: must be on, must be off]
Classification rules can be extracted from pruned networks...

Business intelligence applications
Experiment 1: CARD datasets
• 30 neural networks were trained for each of the data sets.
• Each neural network starts with one hidden neuron.
• The number of input neurons, including one bias input, was 52.
• The initial weights of the networks were randomly and uniformly generated in the interval [−1, 1]...

...
NN (other)       13.95  18.02  18.02
NeuralWorks      14.07  18.37  15.13
NeuroShell       12.73  18.72  15.81
Pruned NN (θ1)   12.21  18.24  15.33
Pruned NN (θ2)   11.65  14.83  12.85

Experiment 1: CARD datasets
• Neural networks with just one hidden unit and very few connections outperform more complex neural networks!
• Rules can be extracted to provide more understanding about the classification...

... attributes C57, C58, ..., C63.
• 666 randomly selected samples for training and the remaining 334 samples for testing.

Experiment 2: German credit data set
• A pruned network with one hidden unit and 10 input units was found to have p...

... Operating Characteristic (ROC) Curve (AUC) is also computed.

Experiment 1: CARD datasets
• Here α_i are the predicted outputs for Class 1 samples, i = 1, 2, …, m, and β_j are the predicted outputs for Class 0 samples, j = 1, 2, …, n.
• AUC is a more appropriate performance measure than ACC when the class distribution is skewed...

... CARD3(TS)
• θ is the cut-off point for neural network classification: if the output is greater than θ, then predict Class 1, else predict Class 0.
• θ1 and θ2 are cut-off points selected to maximize the accuracy on the training data and the test data sets, respectively.
• AUCd = AUC for the discrete classifier = (1 − fp + tp)/2

Experiment 3: Bene1 and Bene2 credit scoring data sets
• A pruned neural network for Bene1:

Experiment 3: Bene1 and Bene2 credit scoring data sets
• The extracted rules for Bene1 (partial):
  Rule R: If Purpose = cash provisioning and Marital status = not married and Applicant ...
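The two AUC measures mentioned under Experiment 1 can be illustrated with a small sketch. The slide's own AUC formula is elided in this copy, so the code assumes the standard pairwise (Wilcoxon-Mann-Whitney) estimate over the Class 1 outputs α_i and Class 0 outputs β_j, with ties counted as one half; that choice is an assumption, not something the slides confirm. AUCd follows the stated expression (1 − fp + tp)/2, reading tp and fp as the true and false positive rates at cut-off θ. The function names and sample values are hypothetical.

```python
def auc(alpha, beta):
    """Pairwise AUC estimate (assumed Wilcoxon-Mann-Whitney form).

    alpha: predicted outputs for Class 1 samples (length m)
    beta:  predicted outputs for Class 0 samples (length n)
    Assumption: a tie between alpha_i and beta_j counts as 0.5.
    """
    m, n = len(alpha), len(beta)
    wins = 0.0
    for a in alpha:
        for b in beta:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (m * n)


def auc_discrete(alpha, beta, theta):
    """AUCd = (1 - fp + tp) / 2 for the rule 'predict Class 1 if output > theta'."""
    tp = sum(a > theta for a in alpha) / len(alpha)  # true positive rate
    fp = sum(b > theta for b in beta) / len(beta)    # false positive rate
    return (1.0 - fp + tp) / 2.0


# Hypothetical network outputs, for illustration only.
alpha = [0.9, 0.8, 0.4]        # Class 1 samples
beta = [0.7, 0.3, 0.2, 0.1]    # Class 0 samples

print(auc(alpha, beta))                      # 11 of 12 pairs are ranked correctly
print(auc_discrete(alpha, beta, theta=0.5))  # tp = 2/3, fp = 1/4
```

Unlike accuracy, neither quantity depends on how many samples each class contributes, which is why a measure of this kind is preferred when the class distribution is skewed.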
Rule extraction
Re-RX: an algorithm for rule extraction from neural networks
• New pedagogical rule extraction algorithm: Re-RX (Recursive Rule Extraction)
• Handles a mix of discrete/continuous variables without the need for discretization of continuous variables ...

... Similar scoring models are now also used to estimate the credit risk of entire loan portfolios in the context of Basel II.

Business intelligence applications
• Basel II capital accord: framework regulating minimum capital requirements for banks
• Customer data → credit risk score → how much capital to set aside for a portfolio of loans
• Data collected from various operational systems in the bank,
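The pruning procedure listed as Steps 1 to 6 under "Neural network pruning" can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the network is reduced to a dictionary of named connection weights, and `accuracy`, `retrain`, and `min_accuracy` are hypothetical stand-ins for the real evaluation routine, the real retraining routine, and whatever threshold counts as "still satisfactory". The toy accuracy function and the connection names are invented for the example.

```python
def prune(weights, accuracy, retrain, min_accuracy):
    """Greedy connection pruning following Steps 1-6.

    weights:      dict mapping a connection name to its weight value
    accuracy:     hypothetical callable, weights -> accuracy in [0, 1]
    retrain:      hypothetical callable, (weights, active) -> new weights dict
    min_accuracy: assumed threshold for "still satisfactory"
    """
    weights = dict(weights)
    present = set(weights)   # Step 1: W, connections still in the network
    checked = set()          # Step 1: C, connections already checked

    def accuracy_without(conn):
        trial = dict(weights)
        trial[conn] = 0.0
        return accuracy(trial)

    while present - checked:                   # Step 6(c): stop once C == W
        saved = dict(weights)                  # Step 2: save current weights
        # Step 3: connection whose removal affects accuracy least
        # (sorted() only makes tie-breaking deterministic).
        w = max(sorted(present - checked), key=accuracy_without)
        weights[w] = 0.0                       # Step 4: zero it and retrain
        weights = retrain(weights, present - {w})
        if accuracy(weights) >= min_accuracy:  # Step 5: keep the removal
            present.discard(w)
            checked = set()
        else:                                  # Step 6: undo and mark as checked
            checked.add(w)
            weights = saved
    return weights, present


# Toy stand-ins: accuracy is perfect only while x1->h and h->y are non-zero.
def toy_accuracy(weights):
    ok = weights["x1->h"] != 0.0 and weights["h->y"] != 0.0
    return 1.0 if ok else 0.5


def toy_retrain(weights, active):
    return dict(weights)  # no real retraining in this toy example


initial = {"x1->h": 0.8, "x2->h": -0.1, "x3->h": 0.05, "h->y": 1.2}
pruned, kept = prune(initial, toy_accuracy, toy_retrain, min_accuracy=0.9)
print(sorted(kept))  # ['h->y', 'x1->h']
```

In a real setting `retrain` would run a few epochs of one of the optimization methods listed earlier with the removed connections held at zero, and `accuracy` would be measured on the training and cross-validation data.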
Date posted: 28/04/2014, 10:17