Abstract
Boosting methods are known to exhibit noticeable overfitting on some datasets, while being immune to overfitting on others. In this paper we show that standard boosting algorithms are not well suited to tasks with overlapping classes. This inadequacy is likely to be the major source of boosting overfitting on real-world data. To verify our conclusion we use the fact that any task with overlapping classes can be reduced to a deterministic task with the same Bayesian separating surface. This reduction is achieved by removing "confusing samples" – samples that are misclassified by a "perfect" Bayesian classifier. We propose an algorithm for removing confusing samples and experimentally study the behavior of AdaBoost trained on the resulting data sets. Experiments confirm that removing confusing samples helps boosting to reduce the generalization error and to avoid overfitting on both synthetic and real-world data. The process of removing confusing samples also provides an accurate error prediction based on work with the training set alone.
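A minimal sketch of the idea described above, assuming Python with scikit-learn: since the "perfect" Bayesian classifier is unknown in practice, the sketch substitutes out-of-fold class-posterior estimates from a generic classifier (a random forest, chosen here purely as an illustrative stand-in, not as the authors' procedure) to flag confusing samples, removes them, and then trains standard AdaBoost on the cleaned set. The function name remove_confusing_samples and all parameter choices are hypothetical.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier

def remove_confusing_samples(X, y, posterior_estimator=None, cv=5):
    # Approximate the unknown Bayes decision rule with out-of-fold posterior
    # estimates; any reasonably calibrated classifier could be used here.
    if posterior_estimator is None:
        posterior_estimator = RandomForestClassifier(n_estimators=200, random_state=0)
    proba = cross_val_predict(posterior_estimator, X, y, cv=cv, method="predict_proba")
    classes = np.unique(y)                       # column order of predict_proba
    bayes_label = classes[np.argmax(proba, axis=1)]
    keep = bayes_label == y                      # keep samples the estimated Bayes rule classifies correctly
    return X[keep], y[keep]

# Usage on synthetic data with overlapping classes (flip_y adds label noise).
X, y = make_classification(n_samples=1000, n_informative=5, flip_y=0.2, random_state=0)
X_clean, y_clean = remove_confusing_samples(X, y)
booster = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=200)
booster.fit(X_clean, y_clean)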
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vezhnevets, A., Barinova, O. (2007). Avoiding Boosting Overfitting by Removing Confusing Samples. In: Kok, J.N., Koronacki, J., Mantaras, R.L.d., Matwin, S., Mladenič, D., Skowron, A. (eds) Machine Learning: ECML 2007. ECML 2007. Lecture Notes in Computer Science, vol 4701. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74958-5_40
DOI: https://doi.org/10.1007/978-3-540-74958-5_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74957-8
Online ISBN: 978-3-540-74958-5