Abstract
Support vector machines (SVM) are strong classifiers, but large datasets can lead to prohibitively long computation times and high memory requirements. SVM ensembles, in which each individual SVM sees only a fraction of the data, are one approach to overcoming this barrier. Continuing related work in this field, we construct SVM ensembles with Bagging and Boosting. As a new idea, we analyze SVM ensembles in which different kernel types (linear, polynomial, RBF) are combined within the ensemble. The goal is to train one strong SVM ensemble classifier for large datasets with less time and memory than a single SVM trained on all the data. Our experiments provide evidence for the following findings: Combining different kernel types can yield an ensemble classifier stronger than each individual SVM trained on all training data and stronger than ensembles built from a single kernel type alone. Boosting is only productive if each individual SVM is made sufficiently weak; otherwise we observe overfitting. Even for very small training sample sizes, and thus greatly reduced time and memory requirements, the ensemble approach often delivers accuracies similar or close to those of a single SVM trained on all data.
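The ensemble idea described in the abstract can be sketched as follows: train each member SVM on a small bootstrap sample, cycle through the kernel types within the ensemble, and combine predictions by majority vote. This is a minimal illustration assuming scikit-learn's `SVC` (a LIBSVM wrapper); the dataset, member count, and sample fraction are illustrative choices, not the paper's experimental setup.

```python
# Sketch: mixed-kernel SVM ensemble via bagging with majority vote.
# All parameter choices here are illustrative, not taken from the paper.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

kernels = ["linear", "poly", "rbf"]  # mix kernel types inside the ensemble
n_members, frac = 9, 0.2             # each SVM sees only a fraction of the data

members = []
for i in range(n_members):
    # Bootstrap sample: a small fraction of the training set, drawn with replacement
    idx = rng.choice(len(X_tr), size=int(frac * len(X_tr)), replace=True)
    clf = SVC(kernel=kernels[i % len(kernels)], gamma="scale")
    clf.fit(X_tr[idx], y_tr[idx])
    members.append(clf)

# Majority vote over the binary predictions of all ensemble members
votes = np.stack([m.predict(X_te) for m in members])
y_hat = (votes.mean(axis=0) > 0.5).astype(int)
acc = (y_hat == y_te).mean()
print(f"ensemble accuracy: {acc:.3f}")
```

Because each member trains on only 20% of the data, the per-member quadratic-programming problem is far smaller than a single SVM on the full set, which is the source of the time and memory savings the abstract refers to.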
References
Breiman, L. (1996): Bagging predictors. In: Machine Learning, 24(2), 123–140.
Caputo, B., Sim, K., Furesjo, F., & Smola, A. (2002). Appearance-based object recognition using SVMs: Which kernel should I use? In Proceedings of the NIPS Workshop on Statistical Methods for Computational Experiments in Visual Processing and Computer Vision, Whistler.
Chang, C., & Lin, C. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3), 27.
Chang, E. Y., Zhu, K., et al. (2008). PSVM: Parallelizing support vector machines on distributed computers. Advances in Neural Information Processing Systems, 20, 16.
Cortes, C., Mohri, M., & Rostamizadeh, A. (2012). Ensembles of kernel predictors. arXiv preprint arXiv:1202.3712.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.
Crammer, K., & Singer, Y. (2002). On the algorithmic implementation of multiclass kernel-based vector machines. Journal of Machine Learning Research, 2, 265–292.
Cristianini, N., & Shawe-Taylor, J. (2000). Support vector machines. Cambridge: Cambridge University Press.
Dimitriadou, E., Hornik, K., Leisch, F., Meyer, D., & Weingessel, A. (2008). Misc functions of the department of statistics (e1071), TU Wien. R package, version 1.5-18. http://CRAN.R-project.org/package=e1071.
Freund, Y., & Schapire, R. E. (1995). A decision-theoretic generalization of on-line learning and an application to boosting. In Proceedings of the Second European Conference on Computational Learning Theory (EuroCOLT) (pp. 23–37).
Kim, H.-C., Pang, S., Je, H.-M., Kim, D., & Bang, S. Y. (2003). Constructing support vector machine ensemble. Pattern Recognition, 36(12), 2757–2767.
Lin, H.-T., & Li, L. (2008). Support vector machinery for infinite ensemble learning. The Journal of Machine Learning Research, 9, 285–312.
Mercer, J. (1909). Functions of positive and negative type, and their connection with the theory of integral equations. Philosophical Transactions of the Royal Society of London, 209, 415–446.
Meyer, O., Bischl, B., & Weihs, C. (2013). Support vector machines on large data sets: Simple parallel approaches. In M. Spiliopoulou, et al. (Eds.), Data analysis, machine learning and knowledge discovery. New York: Springer.
Pavlov, D., Mao, J., & Dom, B. (2000). Scaling-up Support Vector Machines using boosting algorithm. In Proceedings of the 15th International Conference on Pattern Recognition (Vol. 2, pp. 219–222). IEEE, Barcelona.
Schölkopf, B., & Smola, A. (2002). Learning with kernels: Support vector machines, regularization, optimization and beyond. Cambridge, MA: MIT Press.
Wang, S., Mathew, A., Chen, Y., Xi, L., Ma, L., & Lee, J. (2009). Empirical analysis of support vector machine ensemble classifiers. Expert Systems with Applications, 36(3), 6466–6476.
Weston, J., & Watkins, C. (1999). Support Vector Machines for multi-class pattern recognition. In Proceedings of the 7th European Symposium on Artificial Neural Networks (ESANN) (Vol. 99, pp. 61–72).
Wickramaratna, J., Holden, S. B., & Buxton, B. (2001). Performance degradation in boosting. In J. Kittler & F. Roli (Eds.), Proceedings of the 2nd International Workshop on Multiple Classifier Systems (pp. 11–21). Cambridge: Cambridge University Press.
Yu, H., Yang, J., Han, J., & Li, X. (2005). Making SVMs scalable to large data sets using hierarchical cluster indexing. Data Mining and Knowledge Discovery, 11(3), 295–321.
Acknowledgements
This work has been partially supported by the Bundesministerium für Bildung und Forschung (BMBF) under the grant SOMA (AiF FKZ 17N1009) and by the Cologne University of Applied Sciences under the research focus grant COSA.
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Stork, J., Ramos, R., Koch, P., Konen, W. (2015). SVM Ensembles Are Better When Different Kernel Types Are Combined. In: Lausen, B., Krolak-Schwerdt, S., Böhmer, M. (eds) Data Science, Learning by Latent Structures, and Knowledge Discovery. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44983-7_17
DOI: https://doi.org/10.1007/978-3-662-44983-7_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44982-0
Online ISBN: 978-3-662-44983-7
eBook Packages: Mathematics and Statistics (R0)