Abstract
Bagging can be interpreted as an approximation of random aggregating, an ideal ensemble method in which base learners are trained on data sets drawn at random from an unknown probability distribution. When large training sets are available, an approximate realization of random aggregating can be obtained through subsampled bagging. In this paper we perform an experimental bias–variance analysis of bagged and random aggregated ensembles of Support Vector Machines, in order to quantitatively evaluate their theoretically predicted variance-reduction properties. Experimental results with small samples show that random aggregating, implemented through subsampled bagging, reduces the variance component of the error by about 90%, while bagging, as expected, achieves a smaller reduction. The bias–variance analysis also explains why ensemble methods based on subsampling techniques can be successfully applied to large data mining problems.
© 2004 Springer-Verlag Berlin Heidelberg
Cite this paper
Valentini, G. (2004). Random Aggregated and Bagged Ensembles of SVMs: An Empirical Bias–Variance Analysis. In: Roli, F., Kittler, J., Windeatt, T. (eds) Multiple Classifier Systems. MCS 2004. Lecture Notes in Computer Science, vol 3077. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-25966-4_26
Print ISBN: 978-3-540-22144-9
Online ISBN: 978-3-540-25966-4