Uploaded: 05/07/2014, 21:20
... interpolated KN, instead of one D parameter for each N-gram length, there are three: D_1 for N-grams whose count is 1, D_2 for N-grams whose count is 2, and D_3 for N-grams whose count is 3 or more. The ... held-out data, or by a theoretically-derived formula analogous to the Ney et al. formula for the one-discount case: D_r = r − (r + 1) Y N_{r+1} / N_r, for 1 ≤ r ≤ 3, where Y = N_1 / (N_1 + 2 N_2), ... al. formula discounts, backoff absolute discounting with (3) Ney et al. formula discounts and with (4) one empirically optimized discount, (5) modified interpolated KN with Chen-Goodman formula...
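As a concrete illustration of the Chen-Goodman formula quoted above, the three discounts can be computed directly from the count-of-count statistics N_1...N_4 of a given N-gram order. The sketch below assumes those statistics have already been collected from training data; the function name and the example numbers are illustrative only.

```python
def modified_kn_discounts(n1, n2, n3, n4):
    """Chen-Goodman discounts D_1, D_2, D_3+ from count-of-counts n_r,
    where n_r is the number of N-grams of this order occurring exactly r times.

    D_r = r - (r + 1) * Y * N_{r+1} / N_r,  with  Y = N_1 / (N_1 + 2 * N_2).
    """
    y = n1 / (n1 + 2 * n2)
    d1 = 1 - 2 * y * n2 / n1
    d2 = 2 - 3 * y * n3 / n2
    d3plus = 3 - 4 * y * n4 / n3
    return d1, d2, d3plus

# Hypothetical bigram count-of-counts, purely for illustration
print(modified_kn_discounts(n1=250_000, n2=100_000, n3=60_000, n4=40_000))
```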
Uploaded: 20/02/2014, 09:20
Scientific report: "An Improved Parser for Data-Oriented Lexical-Functional Analysis" (doc)
... the Homecentre An Improved Parser for Data-Oriented Lexical-Functional Analysis Rens Bod Informatics Research Institute, University of Leeds, Leeds LS2 9JT, UK, & Institute for Logic, Language ... same f-structure unit (see Bod 2000c for some illustrations of these metrics for LFG-DOP). 5.1 Comparing the two fragment estimators We were first interested in comparing the performance of the simple RF ... derivation for an LFG-DOP representation R is a sequence of fragments the first of which is labeled with S and for which the iterative application of the composition operation produces R. For an...
Uploaded: 20/02/2014, 18:20
Báo cáo khoa học: "Project for production of closed-caption TV programs for the hearing impaired" docx
... processing methods for automatic text summarisation. The first is to compute key words in a text and importance measures for each sentence, and then select important sentences for the text. The ... improvement and evaluation of the system in the second stage. Project for production of closed-caption TV programs for the hearing impaired Takahiro Wakao Telecommunications Advancement ... Japanese. Tanahashi D. (1998) Study on Caption Presentation for TV news programs for the hearing impaired, Waseda University, Department of Information and Computer Science (master's thesis)...
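The first method described above (keyword frequencies turned into per-sentence importance scores, then selection of the highest-scoring sentences) can be illustrated in a few lines. This is a generic frequency-based sketch of that idea, not the project's actual summariser; the tokenisation and the stop-word list are simplifications.

```python
import re
from collections import Counter

STOPWORDS = frozenset({"the", "a", "an", "and", "of", "to", "in", "is", "for"})

def top_sentences(text, k=3):
    """Score each sentence by the frequency of its non-stop words in the whole
    text and return the k highest-scoring sentences in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(w for w in words if w not in STOPWORDS)

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens if t not in STOPWORDS) / max(len(tokens), 1)

    ranked = set(sorted(sentences, key=score, reverse=True)[:k])
    return [s for s in sentences if s in ranked]
```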
Uploaded: 08/03/2014, 06:20
Báo cáo khoa học: "An Improved Heuristic for Ellipsis Processing*" ppt
... will be formally equivalent in predictive power to a grammar for elliptical forms. Though the heuristic is independent of the individual grammar, designating expansion paths and transformations ... of a transformed previous path using the same phrases as in the transformed form ("I was angry"). Replacement ellipsis is processed by substituting the elliptical input for contig- ... a transformed previous path. There are two contributions of the work. First, our method allows for expansion ellipsis. Second, it accounts for combinations of previous sentence form and...
Uploaded: 08/03/2014, 18:20
Improved algorithm for AdaBoost with SVM base classifiers
... be known a priori before boosting begins. In practice, knowledge of such a bound is very difficult to obtain. AdaBoost, on the other hand, is adaptive in that it adapts to the error rates of the individual base hypotheses. This is the basis of its name: "Ada" is short for "adaptive". The algorithm for AdaBoost is given below:
1. Input: a set of training samples with labels D = {(x_1, y_1), ..., (x_n, y_n)}, x_i ∈ X, y_i ∈ Y = {−1, +1}; a base learner algorithm; the number of cycles T.
2. Initialize the weights of the samples: w_1(i) = 1/n, for all i = 1, ..., n.
...
(5) Set the weight of base classifier C_t: α_t = (1/2) ln((1 − ε_t) / ε_t).
(6) Update the training samples' weights: w_{t+1}(i) = w_t(i) exp(−α_t y_i h_t(x_i)) / Z_t, where Z_t is a normalization factor chosen so that Σ_i w_{t+1}(i) = 1.
4. Output: the final classifier H(x) = sign(Σ_t α_t h_t(x)), the weighted majority vote of the T base classifiers, where α_t is the weight assigned to h_t.
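The update rules above translate directly into a short boosting loop. The following is a minimal Python sketch, not the authors' implementation: it assumes scikit-learn's SVC as the RBF base learner, trains each round on a weighted bootstrap re-sample (the pseudo-code above re-weights the samples; the proposed variant explicitly uses a re-sampling technique), and all function and variable names are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

def adaboost_rbf(X, y, T=10, C=1000.0, gamma=0.1, seed=0):
    """Discrete AdaBoost with RBF-SVM base classifiers (illustrative sketch).

    X: (n, d) array of samples; y: labels in {-1, +1}.
    Returns the fitted base classifiers and their weights alpha_t.
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    rng = np.random.default_rng(seed)
    n = len(X)
    w = np.full(n, 1.0 / n)                      # step 2: w_1(i) = 1/n
    classifiers, alphas = [], []
    for t in range(T):
        # draw a bootstrap sample according to the current weights (re-sampling)
        idx = rng.choice(n, size=n, replace=True, p=w)
        h = SVC(kernel="rbf", C=C, gamma=gamma).fit(X[idx], y[idx])
        pred = h.predict(X)
        eps = w[pred != y].sum()                 # weighted training error of h_t
        if eps <= 0.0 or eps >= 0.5:             # skip degenerate rounds
            continue
        alpha = 0.5 * np.log((1.0 - eps) / eps)  # step (5)
        w = w * np.exp(-alpha * y * pred)        # step (6)
        w = w / w.sum()                          # normalize by Z_t
        classifiers.append(h)
        alphas.append(alpha)
    return classifiers, alphas

def adaboost_predict(classifiers, alphas, X):
    """H(x) = sign(sum_t alpha_t * h_t(x))."""
    X = np.asarray(X, dtype=float)
    votes = sum(a * h.predict(X) for h, a in zip(classifiers, alphas))
    return np.sign(votes)
```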
3. AN IMPROVED ALGORITHM FOR ADABOOST WITH SVM BASE CLASSIFIERS
Diversity is known to be an important factor affecting the generalization performance of ensemble methods [11][12]; it means that the errors made by different base classifiers are uncorrelated. If each base classifier is moderately accurate and the base classifiers disagree with each other, their uncorrelated errors will be removed by the voting process, so good ensemble results are achieved [13]. This also applies to AdaBoostRBFSVM. Studies that use SVM as the base learner of AdaBoost have been reported [7][8][9][10], and they showed good generalization performance.
For AdaBoost it is known that there exists a dilemma between the base learners' accuracy and diversity [14]: the more accurate two base learners become, the less they can disagree with each other. How, then, should SVM base learners be selected for AdaBoost: accurate but not diverse learners, or diverse but not very accurate ones? If an accuracy-diversity balance can be kept among the base classifiers, a superior AdaBoost result will be obtained, but there is no generally effective way to reach such a balance. We therefore analyze the case of using RBFSVM as the base classifier of AdaBoost. The problem of model selection is very important for ...
In order to avoid the problem resulting from using a single, fixed σ for all RBFSVM base classifiers, and to get good classification performance, it is necessary to find a suitable σ for each RBFSVM base classifier. Because an SVM can reach comparably good classification performance when a roughly suitable C is given and the variance of the training samples is used as the Gaussian width σ of the RBFSVM, in this paper the variance of the training samples seen by each base classifier is used as its Gaussian width σ. This generates a set of moderately accurate RBFSVM classifiers for AdaBoost, and an improved algorithm is obtained; we call it the variable σ-AdaBoostRBFSVM.
In the proposed algorithm the obtained SVM base learners are mostly moderately accurate, which gives the chance to obtain more uncorrelated base learners. By adjusting the σ value according to the variance of the training samples, a set of SVM base learners with different learning abilities is obtained. The proposed variable σ-AdaBoostRBFSVM is therefore expected to achieve higher generalization performance than AdaBoostSVM, which uses a single, fixed σ for all RBFSVM base classifiers. Without loss of generality, a re-sampling technique is used.
The algorithm for variable σ-AdaBoostRBFSVM:
1. Input: a set of training samples with labels D = {(x_1, y_1), ..., (x_n, y_n)}, x_i ∈ X, y_i ∈ Y = {−1, +1}; the base learner; the number of cycles T.
2. Initialize the weights of the samples: w_1(i) = 1/n, for all i = 1, ...
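The only change relative to fixed-σ AdaBoostSVM is how each round's Gaussian width is chosen: it is taken from the spread of the training samples drawn for that round. Below is a minimal sketch of that single step. The paper does not spell out the exact variance estimator, so the pooled per-feature variance used here, and the conversion to scikit-learn's gamma = 1/(2σ²), are assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def fit_variable_sigma_rbf(X_round, y_round, C=1000.0):
    """Fit one RBF-SVM base classifier whose Gaussian width sigma is set from
    the variance of the training samples drawn for this boosting round
    (an illustrative reading of the method described above)."""
    # Pooled per-feature sample variance, used as the Gaussian width sigma
    # following the paper's wording; this interpretation is an assumption.
    sigma = float(np.mean(np.var(X_round, axis=0)))
    sigma = max(sigma, 1e-12)             # guard against a degenerate re-sample
    gamma = 1.0 / (2.0 * sigma ** 2)      # sklearn RBF: exp(-gamma * ||x - x'||^2)
    return SVC(kernel="rbf", C=C, gamma=gamma).fit(X_round, y_round)
```

In the boosting loop sketched earlier, the line constructing SVC(...) on the re-sampled data would be replaced by a call to this function, so each base learner gets a width tied to its own re-sample and the ensemble gains learners of different capacities.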
4. EXPERIMENTS AND RESULTS
To evaluate the performance of the variable σ-AdaBoostRBFSVM and to make an experimental comparison with AdaBoostSVM, which uses a single, fixed σ for all RBFSVM base classifiers, classification experiments were conducted on the Westontoynonliner data set and the Wine data set [8], and the results are given below. The SVM we used is from the Steve Gunn SVM Toolbox. The Westontoynonliner data set consists of 1000 samples of 2 classes, each sample having 52 attributes. The Wine data set consists of 178 samples of 3 classes, each sample having 13 attributes; class 1 is used as the positive class and the other two classes form the negative class in the classification experiments. When comparing the algorithms, the SVMs in the variable σ-AdaBoostRBFSVM, AdaBoostSVM and the single SVM are trained with the same parameter C = 1000, and the number of base classifiers is T = 10.
For the Westontoynonliner data set, the training and testing samples are chosen randomly from the given data; 50, 150, 200, 300 and 500 training samples and 128 testing samples are used. For SVM and AdaBoostSVM, the Gaussian width σ of the RBFSVM is set to 12. For the variable σ-AdaBoostRBFSVM and AdaBoostSVM, 1/2 to 1/10 of the training samples are used to train each base classifier, and the average correct classification rates over 3 randomly chosen testing sets are calculated. Fig. 1 gives the performance comparison for the Westontoynonliner data set; axis X indicates the number of training samples and axis Y the correct classification rate. In the figures, Ada-SVM stands for AdaBoostSVM and improved Ada-SVM for the variable σ-AdaBoostRBFSVM.
Fig. 1. Performance comparison for the Westontoynonliner data set.
For the Wine data set, the training and testing samples were also chosen randomly; 50, 80, 100, 130 and 150 training samples and 79 testing samples are used. For SVM and AdaBoostSVM, the Gaussian width σ of the RBFSVM is set to 2, 6 and 12, and the average correct classification rates for randomly chosen testing sets are calculated. For the variable σ-AdaBoostRBFSVM and AdaBoostSVM, 1/2 to 1/8 of the training samples are used to train each base classifier, and the average correct classification rates over 3 randomly chosen testing sets are calculated. Fig. 2 gives the performance comparison for the Wine data set, with the same axes and legend.
Fig. 2. Performance comparison for the Wine data set.
From Fig. 1 and Fig. 2 we can see that AdaBoostSVM and a single SVM have almost the same classification performance, while our improved AdaBoostRBFSVM clearly improves the average correct classification rate. For the Wine data set the distribution of the training samples is unbalanced, with 59 training samples in the positive class and 119 in the negative class; Fig. 2 shows that the variable σ-AdaBoostRBFSVM is more efficient for an unbalanced data set. Compared with using a single SVM, the benefit of the improved AdaBoostRBFSVM is its advantage in model selection; compared with AdaBoost using a single, fixed σ for all RBFSVM base classifiers, it has better generalization performance.
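The evaluation protocol described above (random splits at several training-set sizes, with correct classification rates averaged over three randomly chosen test sets) can be written compactly. The sketch below is only an illustration of that protocol using scikit-learn utilities, not the authors' original scripts (which were based on the Steve Gunn MATLAB toolbox); the helper name and the way the test subset is carved out are assumptions.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def average_accuracy(fit_fn, X, y, n_train, n_test, repeats=3, seed=0):
    """Average correct classification rate over `repeats` random splits.

    fit_fn(X_tr, y_tr) must return an object with a .predict method,
    e.g. a thin wrapper around the boosting sketch shown earlier."""
    rates = []
    for r in range(repeats):
        X_tr, X_rest, y_tr, y_rest = train_test_split(
            X, y, train_size=n_train, random_state=seed + r)
        model = fit_fn(X_tr, y_tr)
        X_te, y_te = X_rest[:n_test], y_rest[:n_test]
        rates.append(accuracy_score(y_te, model.predict(X_te)))
    return float(np.mean(rates))
```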
5. CONCLUSIONS
AdaBoost is a general method for improving the accuracy of any given learning algorithm. After analyzing the relation between the performance of AdaBoost and the performance of its base classifiers, this paper studied how to improve the classification performance of AdaBoostSVM.
There is an inconsistency between the accuracy and the diversity of the base classifiers, and this inconsistency affects the generalization performance of the algorithm; how to deal with the dilemma between base classifier accuracy and diversity is therefore very important for improving the performance of AdaBoost. A new variable σ-AdaBoostSVM was proposed that adjusts the kernel parameter of each base classifier according to the distribution of the training samples. The proposed algorithm improves classification performance by striking a balance between the accuracy and the diversity of the base classifiers. Experimental results on the benchmark data sets indicate the effectiveness of the proposed algorithm, and also indicate that it is more efficient for unbalanced data sets.

Acknowledgements
This work is supported by the Natural Science Basic Research Plan in Shaanxi Province of China under Grant 2004F36, and partially supported by NSFC under Grant 50505051.
References
[1] L. G. Valiant, "A theory of the learnable", Communications of the ACM, vol. 27, no. 11, pp. 1134-1142, November 1984.
[2] R. E. Schapire, "The strength of weak learnability", Machine Learning, vol. 5, no. 2, pp. 197-227, 1990.
[3] Y. Freund, "Boosting a weak learning algorithm by majority", Information and Computation, vol. 121, no. 2, pp. 256-285, 1995.
[4] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting", Journal of Computer and System Sciences, vol. 55, no. 1, pp. 119-139, August 1997.
[5] R. E. Schapire, Y. Singer, P. Bartlett, and W. Lee, "Boosting the margin: a new explanation for the effectiveness of voting methods", The Annals of Statistics, vol. 26, no. 5, pp. 1651-1686, 1998.
[6] V. Vapnik, Statistical Learning Theory, John Wiley and Sons, New York, 1998.
[7] G. Valentini and T. G. Dietterich, "Bias-variance analysis of support vector machines for the development of SVM-based ensemble methods", Journal of Machine Learning Research, vol. 5, pp. 725-775, 2004.
[8] D. Pavlov and J. Mao, "Scaling-up support vector machines using boosting algorithm", in Proceedings of ICPR 2000.
[9] H.-C. Kim, S. Pang, H.-M. Je, D. Kim, and S. Y. Bang, "Constructing support vector machine ensemble", Pattern Recognition, vol. 36, no. 12, pp. 2757-2767, Dec 2003.
[10] X. Li, L. Wang, and E. Sung, "A study of AdaBoost with SVM based weak learners", in Proceedings of IJCNN 2005.
[11] P. Melville and R. J. Mooney, "Creating diversity in ensembles using artificial data", Information Fusion, vol. 6, no. 1, pp. 99-111, Mar 2005.
[12] L. I. Kuncheva and C. J. Whitaker, "Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy", Machine Learning, vol. 51, no. 2, pp. 181-207, May 2003.
[13] H. W. Shin and S. Y. Sohn, "Selected tree classifier combination based on both accuracy and error diversity", Pattern Recognition, vol. 38, pp. 191-197, 2005.
[14] T. G. Dietterich, "An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization", Machine Learning, vol. 40, no. 2, pp. 139-157, Aug 2000.
Uploaded: 24/04/2014, 13:05
SYNTHESIZE AND INVESTIGATE THE CATALYTIC ACTIVITY OF THREE-WAY CATALYSTS BASED ON MIXED METAL OXIDES FOR THE TREATMENT OF EXHAUST GASES FROM INTERNAL COMBUSTION ENGINE
... systems based on metallic oxides Metal oxides are an alternative to NMs as catalysts for complete oxidation. The most active single metal oxides for combustion of VOCs are the oxides of Cu, Co, Mn, ... before and after aging at 800 °C in a flow containing 57% steam for 24 h 74 Figure 3.32 XRD patterns of MnCoCe catalysts before and after aging in a flow containing 57 vol.% H2O at 800 °C for ... mixture of single oxides and soot in TG-DTA (DSC) diagrams 63 Table 3.6 Catalytic activity of single oxides for soot treatment 63 Table 3.7 T_max of mixture of multiple oxides and soot determined...
Uploaded: 12/05/2014, 17:14
APPLYING OF ULTRASOUND FOR PRODUCTION OF NANOMATERIALS
... Nanomaterials, for example metal oxides or carbon nanotubes, tend to be agglomerated when mixed into a liquid, while the production of nanomaterials requires effective dispersion and obtaining of uniform ... this is extended to metal oxides, the cooling rate required for many oxides is well beyond that which can be obtained using the cold quenching method. This is why glass-former materials ... Livage J. Amorphous transition metal oxides // J. Phys., 1981. - vol. 42. - 981. 8. M. Sugimoto. Amorphous characteristics in spinel ferrites containing glassy oxides // J. Magn. Magn. Mater., 1994....
Uploaded: 11/06/2014, 12:29
Báo cáo hóa học: " Torque teno virus: an improved indicator for viral pathogens in drinking waters" pot
... are used instead [2,3]. In water supply systems, monitoring for total coliforms, fecal coliforms, and E. coli is regulated under the Total Coliform Rule (TCR) [4]. However, these bacterial indicators ... such as coliforms, are not fully protective of public health from enteric viruses in water sources. Waterborne disease outbreaks have occurred in systems that tested negative for coliforms, and ... Torque teno virus: an improved indicator for viral pathogens in drinking waters Jennifer...
Uploaded: 20/06/2014, 01:20
Báo cáo hóa học: " Research Article An Improved Flowchart for Gabor Order Tracking with Gaussian Window as the Analysis Window" doc
Uploaded: 21/06/2014, 07:20
Báo cáo hóa học: " An Improved Algorithm for the Piecewise-Smooth Mumford and Shah Model in Image Segmentation" docx
Uploaded: 22/06/2014, 22:20
AN IMPROVED INTERPOLATION METHOD FOR CHANNEL ESTIMATION IN AN ORTHOGONAL FREQUENCY DIVISION MULTIPLEXING (OFDM) SYSTEM USING PILOT SIGNALS
Uploaded: 26/05/2013, 21:28
BT 3 Optimization of medium for indole-3-acetic acid production using Pantoea agglomerans strain PVM
Uploaded: 06/08/2013, 21:06
Approach to Zero Emission Processes in Food Industry - Case Study for Soy-Sauce Production Process -
Uploaded: 05/09/2013, 08:40
Analysis of Phosphorus Behavior in the Giant Reed for Phytoremediation and the Biomass Production System
... resources, which can potentially be used as a substitute for fossil fuel. To date, many studies on wetlands for water treatments have been performed (e.g., Rousseau et al., 2008), and various plants ... nitrogen transformation in reed constructed wetlands, Desalination 235(1-3), 93-101. Yalcuk A. and Ugurlu A. (2009) Comparison of horizontal and vertical constructed wetland systems for landfill ... reeds that will remain in the next year are cropped before the start of the dying-down period; cropping with roots and rhizomes except for the remaining roots and rhizomes; all of the above-ground...
Uploaded: 05/09/2013, 09:38
Different physical and chemical pretreatments of wheat straw for enhanced biobutanol production in simultaneous saccharification and fermentation
Uploaded: 05/09/2013, 15:28
Methane cracking over commercial carbons for hydrogen production
Uploaded: 05/09/2013, 15:28
The Use of Plant Cell Biotechnology for the Production of Phytochemicals
... development of an information base on genetic, cellular, and molecular levels is now a prerequisite for the use of plants or plant cell cultures for biotechnological applications for the following ... cells is now routinely performed for plant cell cultures in order to permit accurate assessment of growth and metabolite production rates. The availability of plant cells for quantitative measurement ... already been achieved for the production of several high-value secondary metabolites through cell cultivation in bioreactors. For example, valuable progress has been achieved for paclitaxel (Taxol),...
Uploaded: 25/10/2013, 05:20
English On Business Practical English For International Executives_THE PRODUCTION MEETING
Uploaded: 25/10/2013, 15:20