Lecture 11
Model Evaluation 4: Algorithm Comparisons
STAT 479: Machine Learning, Fall 2018
Sebastian Raschka
http://stat.wisc.edu/~sraschka/teaching/stat479-fs2018/
(This work by Sebastian Raschka is licensed under a Creative Commons Attribution 4.0 International License.)

Overview
[Lecture roadmap figure. Model evaluation lectures so far: bias and variance basics, overfitting and underfitting, the holdout method, confidence intervals, repeated holdout, resampling methods, empirical confidence intervals, hyperparameter tuning, cross-validation, and model selection. This lecture: algorithm selection and statistical tests. Still ahead: evaluation metrics.]

Overview, (my) "recommendations"

Performance estimation
▪ Large dataset: 2-way holdout method (train/test split); confidence interval via normal approximation
▪ Small dataset: (repeated) k-fold cross-validation without independent test set; leave-one-out cross-validation without independent test set; confidence interval via 0.632(+) bootstrap

Model selection (hyperparameter optimization) and performance estimation
▪ Large dataset: 3-way holdout method (train/validation/test split)
▪ Small dataset: (repeated) k-fold cross-validation with independent test set; leave-one-out cross-validation with independent test set

Model & algorithm comparison
▪ Large dataset: disjoint training sets + test set (algorithm comparison, AC); McNemar test (model comparison, MC); Cochran's Q + McNemar test (MC)
▪ Small dataset: combined 5x2cv F test (AC); nested cross-validation (AC)

Comparing two machine learning classifiers: McNemar's Test
McNemar's test, introduced by Quinn McNemar in 1947 [1], is a non-parametric statistical test for paired comparisons that can be applied to compare the performance of two machine learning classifiers:

Task                                  | Gaussian data | Paired nominal data
Compare a group to a reference value  | …             | Binomial test
Compare a pair of groups              |               | McNemar's test
Compare two unpaired groups           |               | Fisher's exact test

[1] McNemar, Quinn. "Note on the sampling error of the difference between correlated proportions or percentages." Psychometrika 12.2 (1947): 153-157.
Comparing two machine learning classifiers: McNemar's Test
• Also referred to as a "within-subjects chi-squared test"
• Applied to paired nominal data, based on a version of a 2x2 confusion matrix
• Compares the predictions of two models to each other, rather than listing the false positive, true positive, false negative, and true negative counts of a single model
• The layout of the 2x2 confusion matrix suitable for McNemar's test is shown in the following figure:

                     Model 2 correct | Model 2 wrong
    Model 1 correct        A         |       B
    Model 1 wrong          C         |       D

• Given such a 2x2 confusion matrix, we can compute the accuracy of Model 1 via (A + B) / (A + B + C + D), where A + B + C + D = N is the total number of test examples
• Similarly, we can compute the accuracy of Model 2 as (A + C) / N
• Cells B and C (the off-diagonal entries) tell us how the models differ
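As a concrete illustration that is not part of the original slides, the following minimal NumPy sketch builds this 2x2 table from the predictions of two hypothetical models on the same test set and computes both accuracies; the label and prediction arrays are made-up placeholders. (Raschka's mlxtend library also provides helper functions for building this table and running the test.)

```python
import numpy as np

# Hypothetical ground-truth labels and predictions of two models on the SAME test set
y_true   = np.array([0, 1, 1, 0, 1, 0, 1, 1, 0, 1])
y_model1 = np.array([0, 1, 1, 0, 1, 1, 1, 1, 0, 1])
y_model2 = np.array([0, 1, 0, 0, 1, 0, 1, 1, 1, 1])

correct1 = (y_model1 == y_true)
correct2 = (y_model2 == y_true)

# 2x2 layout for McNemar's test:
#                   Model 2 correct | Model 2 wrong
#   Model 1 correct        A        |       B
#   Model 1 wrong          C        |       D
A = np.sum( correct1 &  correct2)
B = np.sum( correct1 & ~correct2)
C = np.sum(~correct1 &  correct2)
D = np.sum(~correct1 & ~correct2)

N = A + B + C + D
print("Model 1 accuracy:", (A + B) / N)   # row sum for "Model 1 correct"
print("Model 2 accuracy:", (A + C) / N)   # column sum for "Model 2 correct"
```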
Comparing two machine learning classifiers: McNemar's Test
Let's take a look at the following example: what is the prediction accuracy of Model 1 and Model 2 in each subpanel?

Subpanel A:
                     Model 2 correct | Model 2 wrong
    Model 1 correct       9959       |      11
    Model 1 wrong            1       |      29

Subpanel B:
                     Model 2 correct | Model 2 wrong
    Model 1 correct       9945       |      25
    Model 1 wrong           15       |      15

In both subpanel A and subpanel B, the accuracies of Model 1 and Model 2 are 99.7% and 99.6%, respectively:
• Model 1 accuracy, subpanel A: (9959 + 11)/10000 × 100% = 99.7%
• Model 1 accuracy, subpanel B: (9945 + 25)/10000 × 100% = 99.7%
• Model 2 accuracy, subpanel A: (9959 + 1)/10000 × 100% = 99.6%
• Model 2 accuracy, subpanel B: (9945 + 15)/10000 × 100% = 99.6%

In subpanel A:
• Model 1 got 11 predictions right that Model 2 got wrong
• Model 2 got 1 prediction right that Model 1 got wrong
• Based on this 11:1 ratio and on our intuition, does Model 1 perform substantially better than Model 2?

In subpanel B:
• The Model 1 : Model 2 ratio is 25:15
• This is less conclusive about which model is the better one to choose
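Whether these ratios point to a real difference can be quantified with the McNemar test statistic, which is built from the off-diagonal counts B and C (its hypotheses and the continuity correction are discussed later in these slides). As a small illustration of my own rather than the course notebook, the continuity-corrected statistic chi2 = (|B - C| - 1)^2 / (B + C) can be evaluated for both subpanels:

```python
from scipy.stats import chi2

def mcnemar_chi2(B, C):
    """McNemar test statistic with continuity correction, compared against chi2 with 1 df."""
    stat = (abs(B - C) - 1) ** 2 / (B + C)
    p_value = chi2.sf(stat, df=1)
    return stat, p_value

# Subpanel A: B = 11, C = 1   -> chi2 = (10 - 1)^2 / 12 = 6.75
# Subpanel B: B = 25, C = 15  -> chi2 = (10 - 1)^2 / 40 = 2.025
for label, B, C in [("A", 11, 1), ("B", 25, 15)]:
    stat, p = mcnemar_chi2(B, C)
    print(f"Subpanel {label}: chi2 = {stat:.3f}, p = {p:.4f}")
```

With a chi-squared distribution with one degree of freedom, subpanel A yields p ≈ 0.009 (significant at the common alpha = 0.05 level), whereas subpanel B yields p ≈ 0.15, which matches the intuition that the 25:15 case is less conclusive.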
Resampled paired t test
Repeatedly split the dataset into training and test sets, fit models A and B on each training set, and record the difference in test accuracy, ΔACC_i = ACC_i^A - ACC_i^B. The test statistic is

$$ t = \frac{\Delta ACC_{avg} \cdot \sqrt{k}}{\sqrt{\sum_{i=1}^{k} \left(\Delta ACC_i - \Delta ACC_{avg}\right)^2 / (k - 1)}}, \qquad \Delta ACC_{avg} = \frac{1}{k} \sum_{i=1}^{k} \Delta ACC_i $$

Here, k is the number of times we split the dataset into train/test sets, and H0 is that the accuracies are equal. Two independence violations! (The test sets overlap across the resampling rounds, and so do the training sets, so the differences are not independent.)

K-fold cross-validation with paired t test
The same statistic is computed over the folds of a k-fold cross-validation; here, k is the number of folds we use, ΔACC_i = ACC_i^A - ACC_i^B is the accuracy difference on the i-th fold, and H0 is again that the accuracies are equal. (The non-overlapping test folds remove one of the two independence violations, but the training sets still overlap.)
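Below is a minimal sketch of the k-fold cross-validated paired t test, written with scikit-learn and SciPy; the two classifiers and the Iris dataset are arbitrary placeholders rather than the examples used in the course.

```python
import numpy as np
from scipy.stats import t as t_dist
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf_a = LogisticRegression(max_iter=1000)
clf_b = DecisionTreeClassifier(random_state=1)

k = 10
deltas = []
for train_idx, test_idx in StratifiedKFold(n_splits=k, shuffle=True, random_state=1).split(X, y):
    acc_a = clf_a.fit(X[train_idx], y[train_idx]).score(X[test_idx], y[test_idx])
    acc_b = clf_b.fit(X[train_idx], y[train_idx]).score(X[test_idx], y[test_idx])
    deltas.append(acc_a - acc_b)          # ΔACC_i = ACC_i^A - ACC_i^B

deltas = np.array(deltas)
delta_avg = deltas.mean()
t_stat = delta_avg * np.sqrt(k) / np.sqrt(np.sum((deltas - delta_avg) ** 2) / (k - 1))
p_value = 2 * t_dist.sf(np.abs(t_stat), df=k - 1)   # two-sided, k-1 degrees of freedom
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```

Swapping the StratifiedKFold splitter for repeated random train/test splits turns this into the resampled paired t test from the previous slide.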
 "Computational/Empirical" Methods Sebastian Raschka STAT 479: Machine Learning FS 2018 33 Recap: Model Selection with 3-way Holdout Original dataset Training set Training set Test set Validation set Test set Change hyperparameters and repeat Machine learning algorithm Fit Predictive model Sebastian Raschka Evaluate Final performance estimate STAT 479: Machine Learning FS 2018 34 Recap: Model Selection with k-fold Cross.-Val 1) good or bad ? Training Training set 2) good or bad ? 3) good or bad ? Training set Training set K-FOLD CROSS- 4) good or bad ? VALIDATION Model Model Model Model Model Model Model Model Test set Model Training Training Evaluation Selection Selection & Evaluation Model Validation set Test set Model Evaluation Test set Model Model Model Training set Training set Selection Model Evaluation Test set … Training set Sebastian Raschka STAT 479: Machine Learning FS 2018 35 Recap: Model Selection with k-fold Cross.-Val Training 1) Training set 3) Training set Training set Training K-FOLD CROSSVALIDATION 4) Model Model Model Model Model Model Model Model Test set Model Training 2) Evaluation Selection Selection & Evaluation Model Validation set Test set Model Evaluation Test set Model Model Model Training set Training set Selection Model Evaluation Test set … Training set Sebastian Raschka STAT 479: Machine Learning FS 2018 36 Nested Cross-Validation 
F test for comparing classifiers
Looney, S. W. (1988). "A statistical technique for comparing the accuracies of several classifiers." Pattern Recognition Letters, 8(1), 5-9.

Assume a test set of size N_ts and L independent classifiers with accuracies ACC_1, ..., ACC_L, and let ACC_avg be their average accuracy. With L_j denoting the number of classifiers that correctly classified the j-th test example, compute the sums of squares

$$ SSC = N_{ts} \sum_{i=1}^{L} ACC_i^2 - N_{ts} \cdot L \cdot ACC_{avg}^2 $$

$$ SSO = \frac{1}{L} \sum_{j=1}^{N_{ts}} L_j^2 - N_{ts} \cdot L \cdot ACC_{avg}^2 $$

$$ SST = N_{ts} \cdot L \cdot ACC_{avg} (1 - ACC_{avg}), \qquad SS_{COMB} = SST - SSC - SSO $$

the mean squares

$$ MSC = \frac{SSC}{L - 1}, \qquad MS_{COMB} = \frac{SS_{COMB}}{(L - 1)(N_{ts} - 1)} $$

and the test statistic

$$ F = \frac{MSC}{MS_{COMB}} $$
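The following NumPy sketch applies this F test to a binary matrix of per-example correctness (rows = test examples, columns = classifiers). The matrix here is randomly generated toy data, and the p-value is taken from an F distribution with (L - 1) and (L - 1)(N_ts - 1) degrees of freedom, matching the two mean squares above.

```python
import numpy as np
from scipy.stats import f as f_dist

# correct[j, i] = 1 if classifier i classified test example j correctly (toy data)
rng = np.random.default_rng(0)
correct = (rng.random((50, 3)) < np.array([0.85, 0.80, 0.78])).astype(int)

n_ts, L = correct.shape
acc = correct.mean(axis=0)             # ACC_1, ..., ACC_L
acc_avg = acc.mean()
L_j = correct.sum(axis=1)              # number of classifiers correct on example j

ssc = n_ts * np.sum(acc ** 2) - n_ts * L * acc_avg ** 2
sso = np.sum(L_j ** 2) / L - n_ts * L * acc_avg ** 2
sst = n_ts * L * acc_avg * (1 - acc_avg)
ss_comb = sst - ssc - sso

msc = ssc / (L - 1)
ms_comb = ss_comb / ((L - 1) * (n_ts - 1))
F = msc / ms_comb
p_value = f_dist.sf(F, L - 1, (L - 1) * (n_ts - 1))
print(f"F = {F:.3f}, p = {p_value:.3f}")
```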
Combined 5x2cv F test for comparing supervised classification learning algorithms
Alpaydin, Ethem. "Combined 5×2 cv F test for comparing supervised classification learning algorithms." Neural Computation 11.8 (1999): 1885-1892.

More robust than Dietterich's (1998) 5x2cv + paired t test. Using the same 5x2 cross-validation quantities as above,

$$ f = \frac{\sum_{i=1}^{5} \sum_{j=1}^{2} \left(\Delta ACC_i^{(j)}\right)^2}{2 \sum_{i=1}^{5} s_i^2} $$

is approximately F-distributed with 10 and 5 degrees of freedom.
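The sketch below runs the 5x2 cross-validation procedure with scikit-learn and computes both the Dietterich t statistic and the Alpaydin combined F statistic from the same accuracy differences; the classifiers and the breast-cancer dataset are placeholders chosen for illustration only, not taken from the course materials.

```python
import numpy as np
from scipy.stats import f as f_dist, t as t_dist
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
clf_a = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf_b = DecisionTreeClassifier(random_state=1)

diffs = np.zeros((5, 2))                 # ΔACC_i^(j): 5 repetitions x 2 folds
for i in range(5):
    # one repetition of 2-fold CV: split the data into two non-overlapping halves
    X1, X2, y1, y2 = train_test_split(X, y, test_size=0.5, random_state=i, stratify=y)
    for j, (X_tr, y_tr, X_te, y_te) in enumerate([(X1, y1, X2, y2), (X2, y2, X1, y1)]):
        acc_a = clf_a.fit(X_tr, y_tr).score(X_te, y_te)
        acc_b = clf_b.fit(X_tr, y_tr).score(X_te, y_te)
        diffs[i, j] = acc_a - acc_b

means = diffs.mean(axis=1)                          # ΔACC_avg,i
s2 = ((diffs - means[:, None]) ** 2).sum(axis=1)    # s_i^2

t_stat = diffs[0, 0] / np.sqrt(s2.mean())           # Dietterich's 5x2cv t statistic
f_stat = (diffs ** 2).sum() / (2 * s2.sum())        # Alpaydin's combined F statistic

print(f"t = {t_stat:.3f}, p = {2 * t_dist.sf(abs(t_stat), df=5):.3f}")   # t with 5 df
print(f"f = {f_stat:.3f}, p = {f_dist.sf(f_stat, 10, 5):.3f}")           # F with 10 and 5 df
```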
Back to "computational/empirical" methods

Recap: model selection with 3-way holdout
[Figure: the original dataset is split into a training set, a validation set, and a test set; the machine learning algorithm is fit on the training set, hyperparameters are changed and the fit repeated based on validation performance, and the resulting predictive model is evaluated once on the test set to obtain the final performance estimate.]
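As a quick illustration of the 3-way split (my own sketch, not the course code), scikit-learn's train_test_split can simply be applied twice:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# First split off the test set, then carve a validation set out of the remainder
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=1, stratify=y)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=1, stratify=y_tmp)
# -> 60% training, 20% validation, 20% test
```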
Recap: model selection with k-fold cross-validation
[Figure, shown on two consecutive slides: the data is split into a training set and an independent test set; k-fold cross-validation on the training set is used to judge each candidate model or hyperparameter setting ("good or bad?"); after model selection, the chosen model is refit on the training data and evaluated once on the test set.]
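In scikit-learn, this recap workflow is commonly expressed as a cross-validated grid search on the training portion followed by a single test-set evaluation; the estimator and parameter grid below are placeholder choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1, stratify=y)

# k-fold cross-validation on the training set for hyperparameter selection
search = GridSearchCV(DecisionTreeClassifier(random_state=1),
                      param_grid={"max_depth": [1, 2, 3, 4, None]},
                      cv=10)
search.fit(X_train, y_train)        # refits the best setting on the full training set

print("Best params:", search.best_params_)
print("Test accuracy:", search.score(X_test, y_test))   # single, final performance estimate
```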
 check for "model stability" • Finally:
Nested cross-validation for algorithm selection
• Outer loop: use the average performance over the outer folds as the generalization performance estimate, and check the results across folds for "model stability"
• Finally: as usual, fit the chosen model on the whole dataset for deployment
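A minimal nested cross-validation sketch with scikit-learn (my own illustration; the algorithms, grids, and fold counts are arbitrary placeholders): the inner GridSearchCV handles hyperparameter tuning, while the outer cross_val_score loop provides the generalization estimate used to compare the algorithms.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Each candidate algorithm gets its own inner (tuning) cross-validation
candidates = {
    "logreg": GridSearchCV(LogisticRegression(max_iter=1000),
                           param_grid={"C": [0.01, 0.1, 1.0, 10.0]}, cv=2),
    "tree":   GridSearchCV(DecisionTreeClassifier(random_state=1),
                           param_grid={"max_depth": [1, 2, 3, None]}, cv=2),
}

for name, inner_search in candidates.items():
    # Outer 5-fold loop: every outer training fold is tuned by the inner search
    outer_scores = cross_val_score(inner_search, X, y, cv=5)
    print(f"{name}: {outer_scores.mean():.3f} +/- {outer_scores.std():.3f}")
```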
Conclusions, (my) "recommendations"
[The closing slide repeats the recommendation summary from the overview near the beginning of this lecture: which evaluation strategies and statistical tests to use for performance estimation, for model selection plus performance estimation, and for model & algorithm comparison, on large vs. small datasets.]

Code examples
https://github.com/rasbt/stat479-machine-learning-fs18/blob/master/11_eval-algo/11_eval-algo_code.ipynb

Overview
[The lecture roadmap figure from the beginning is repeated; the "next lecture" marker points to evaluation metrics.]

Excerpts from slides that are only partially included in this copy:

McNemar's test: hypotheses
"... McNemar's test, we formulate the
• null hypothesis: the probabilities p(B) and p(C) are the same
• alternative hypothesis: the performances of the two models are not equal
The McNemar test statistic ..."

Continuity correction
"... have the obvious result of reducing the absolute value of the difference, [B - C], by unity." [2]

Multiple hypothesis testing issue
"Conduct an omnibus test under the null hypothesis that there is no difference between the classification accuracies. If the omnibus test led to the rejection of the null ..."

[2] Edwards, Allen L. "Note on the 'correction for continuity' in testing the significance of the difference between correlated proportions." Psychometrika 13.3 (1948): 185-187.
