Neural Processing Letters, Volume 50, Issue 3, pp 2899–2923

Learning Distance Metric for Support Vector Machine: A Multiple Kernel Learning Approach

  • Weiqi Zhang
  • Zifei Yan
  • Gang Xiao
  • Hongzhi Zhang
  • Wangmeng Zuo


Recent work in distance metric learning has significantly improved the performance of k-nearest neighbor classification. However, the metrics learned by these methods are not tailored to support vector machines (SVMs), which are among the most popular classification algorithms that use distance metrics to compare samples. To investigate whether a distance metric and a kernel classifier can be learned jointly, we propose a new parameterization scheme that incorporates the squared Mahalanobis distance into the Gaussian RBF kernel, and we formulate kernel learning within a generalized multiple kernel learning framework geared towards SVM classification. We demonstrate the effectiveness of the proposed algorithm on UCI machine learning datasets of varying sizes and difficulties and on two real-world datasets. Experimental results show that, using a spectral projected gradient descent optimizer, the proposed model achieves classification accuracy competitive with state-of-the-art methods at comparable execution time.
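As a rough illustration of the central idea, the sketch below evaluates a Gaussian RBF kernel in which the squared Euclidean distance is replaced by a squared Mahalanobis distance, k(x, y) = exp(-(x - y)^T M (x - y)). It is not the paper's parameterization or its multiple kernel learning formulation, neither of which is given in the abstract. The function name `mahalanobis_rbf_kernel`, the factorization M = L^T L used to keep M positive semidefinite, and the random data are all assumptions made for this example.

```python
import numpy as np


def mahalanobis_rbf_kernel(X, Y, M):
    """Return K with K[i, j] = exp(-(X[i] - Y[j])^T M (X[i] - Y[j])).

    Hypothetical helper for illustration only; M is assumed to be a
    symmetric positive semidefinite (d, d) matrix.
    """
    diff = X[:, None, :] - Y[None, :, :]  # pairwise differences, shape (n, m, d)
    sq_dist = np.einsum("nmd,de,nme->nm", diff, M, diff)  # squared Mahalanobis distances
    return np.exp(-sq_dist)


# Toy data (assumed): 5 samples with 3 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))

# Any M of the form L^T L is positive semidefinite (assumed construction).
L = rng.normal(size=(3, 3))
M = L.T @ L

K = mahalanobis_rbf_kernel(X, X, M)

# With M = gamma * I the kernel reduces to the ordinary Gaussian RBF kernel.
gamma = 0.5
K_rbf = mahalanobis_rbf_kernel(X, X, gamma * np.eye(3))
```

A Gram matrix computed this way could, for instance, be passed to an SVM solver that accepts precomputed kernels. In a metric learning setting M would be optimized rather than fixed as it is here.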


Metric learning; Multiple kernel learning; Gaussian RBF kernel; Support vector machines



This work is partly supported by the National Science Foundation of China (NSFC) Project under Contract Nos. 61671182, 61102037, 61471146 and 61871381.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Harbin Institute of Technology, Harbin, China
  2. No. 211 Hospital of PLA, Harbin, China
