Abstract
Bagging can be interpreted as an approximation of random aggregating, an ideal ensemble method in which base learners are trained on data sets drawn at random from an unknown probability distribution. When large training sets are available, an approximate realization of random aggregating can be obtained through subsampled bagging. In this paper we perform an experimental bias–variance analysis of bagged and random aggregated ensembles of Support Vector Machines, in order to quantitatively evaluate their theoretically predicted variance-reduction properties. Experimental results with small samples show that random aggregating, implemented through subsampled bagging, reduces the variance component of the error by about 90%, while bagging, as expected, achieves a smaller reduction. The bias–variance analysis also explains why ensemble methods based on subsampling techniques can be successfully applied to large data mining problems.
© 2004 Springer-Verlag Berlin Heidelberg
Cite this paper
Valentini, G. (2004). Random Aggregated and Bagged Ensembles of SVMs: An Empirical Bias–Variance Analysis. In: Roli, F., Kittler, J., Windeatt, T. (eds) Multiple Classifier Systems. MCS 2004. Lecture Notes in Computer Science, vol 3077. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-25966-4_26
Print ISBN: 978-3-540-22144-9
Online ISBN: 978-3-540-25966-4