Abstract
The performance of support vector machines in nonlinearly separable classification problems strongly relies on the kernel function. Toward an automatic machine learning approach for this technique, many works have addressed the challenge of automatically learning well-performing kernels for support vector machines. However, these works have been carried out without a thorough analysis of the set of components that influence the behavior of support vector machines and their interaction with the kernel. These components are related in an intricate way, and it is difficult to provide a comprehensible analysis of their joint effect. In this paper, we try to fill this gap by introducing the steps necessary to understand these interactions and by providing clues that help the research community decide where to place the emphasis. First, we identify all the factors that affect the final performance of support vector machines in relation to the elicitation of kernels. Next, we analyze the factors independently or in pairs and study the influence each component has on the final classification performance, providing recommendations and insights into the kernel setting for support vector machines.
Acknowledgements
This work has been supported by the Spanish Ministry of Science and Innovation (projects TIN2016-78365-R and PID2019-104966GB-I00) and the Basque Government (projects KK-2020/00049 and IT1244-19, and the ELKARTEK program). Jose A. Lozano is also supported by BERC 2018-2021 (Basque Government) and BCAM Severo Ochoa accreditation SEV-2017-0718 (Spanish Ministry of Science and Innovation).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Roman, I., Santana, R., Mendiburu, A. et al. In-depth analysis of SVM kernel learning and its components. Neural Comput & Applic 33, 6575–6594 (2021). https://doi.org/10.1007/s00521-020-05419-z
DOI: https://doi.org/10.1007/s00521-020-05419-z