Neural Computing and Applications

, Volume 20, Issue 5, pp 703–715 | Cite as

An unsupervised approach to learn the kernel functions: from global influence to local similarity

  • M. Ehsan Abbasnejad
  • Dhanesh Ramachandram
  • Rajeswari Mandava
Original Article

Abstract

Recently there has been a steep growth in the development of kernel-based learning algorithms. The intrinsic problem in such algorithms is the selection of the optimal kernel for the learning task of interest. In this paper, we propose an unsupervised approach to learn a linear combination of kernel functions, such that the resulting kernel best serves the objectives of the learning task. This is achieved through measuring the influence of each point on the structure of the dataset. This measure is calculated by constructing a weighted graph on which a random walk is performed. The measure of influence in the feature space is probabilistically related to the input space that yields an optimization problem to be solved. The optimization problem is formulated in two different convex settings, namely linear and semidefinite programming, dependent on the type of kernel combination considered. The contributions of this paper are twofold: first, a novel unsupervised approach to learn the kernel function, and second, a method to infer the local similarity represented by the kernel function by measuring the global influence of each point toward the structure of the dataset. The proposed approach focuses on the kernel selection which is independent of the kernel-based learning algorithm. The empirical evaluation of the proposed approach with various datasets shows the effectiveness of the algorithm in practice.

Keywords

Learning the kernels Kernel methods Support vector machine Kernel-PCA 

References

  1. 1.
    Amari S, Wu S (1999) Improving support vector machine classifiers by modifying kernal functions. Neural Netw 12(6):783–789. doi:10.1016/S0893-6080(99)00032-5
  2. 2.
    Bach FR, Lanckriet GRG, Jordan MI (2004) Multiple kernel learning, conic duality, and the smo algorithm. In: ICML ’04: proceedings of the twenty-first international conference on Machine learning. ACM, New York, NY, USA, p 6. doi:10.1145/1015330.1015424
  3. 3.
    Ben-Hur A, Horn D, Siegelmann HT, Vapnik V (2002) Support vector clustering. J Mach Learn Res 2:125–137MATHCrossRefGoogle Scholar
  4. 4.
    Burges CJ (1998) A tutorial on support vector machines for pattern recognition. Data Min Knowl Discov 2:121–167CrossRefGoogle Scholar
  5. 5.
    Burges CJC (1999) Geometry and invariance in kernel based methods. pp 89–116Google Scholar
  6. 6.
    Chang CC, Lin CJ (2001) LIBSVM: a library for support vector machines. Software available at: http://www.csie.ntu.edu.tw/cjlin/libsvm
  7. 7.
    Coifman RR, Lafon S (2006) Diffusion maps. Appl Comput Harmon Anal 21(1):5–30. doi:10.1016/j.acha.2006.04.006. http://www.dx.doi.org/10.1016/j.acha.2006.04.006
  8. 8.
    Cristianini N, Shawe-taylor J, Elissee A, Kandola J (2002) On kernel-target alignment. In: Advances in neural information processing systems 14. MIT Press, pp 367–373Google Scholar
  9. 9.
    Duan K, Keerthi SS, Poo AN (2003) Evaluation of simple performance measures for tuning svm hyperparameters. Neurocomputing 51:41–59. doi:10.1016/S0925-2312(02)00601-X. http://www.sciencedirect.com/science/article/B6V10-4625PVR-1/2/cc81e4581eca9413c3784c75c1ba0ee1 Google Scholar
  10. 10.
    Graf A, Smola A, Borer S (2003) Classification in a normalized feature space using support vector machines. Neural Netw, IEEE Trans 14(3):597–605. doi:10.1109/TNN.2003.811708 CrossRefGoogle Scholar
  11. 11.
    Grant M, Boyd S (2008) Graph implementations for nonsmooth convex programs. http://www.dx.doi.org/10.1007/978-1-84800-155-8_7
  12. 12.
    Grant M, Boyd S (2009) Cvx: Matlab software for disciplined convex programming web page and software. http://www.stanford.edu/boyd/cvx
  13. 13.
    Guo Y, Gao J, Kwan P (2008) Twin kernel embedding. Pattern Anal Mach Intell, IEEE Trans 30(8):1490–1495. doi:10.1109/TPAMI.2008.74 CrossRefGoogle Scholar
  14. 14.
    Herbster M, Pontil M, Wainer L (2005) Online learning over graphs. In: ICML ’05: proceedings of the 22nd international conference on Machine learning. ACM, New York, NY, USA, pp 305–312. http://www.doi.acm.org/10.1145/1102351.1102390
  15. 15.
    Kandola J, Shawe-Taylor J, Cristianini N (2002) On the extensions of kernel alignment. NC-TR-2002-120. http://www.eprints.ecs.soton.ac.uk/9745
  16. 16.
    Kandola J, Shawe-Taylor J, Cristianini N (2002) Optimizing kernel alignment over combinations of kernel. http://www.eprints.ecs.soton.ac.uk/9746/
  17. 17.
    Kim SJ, Zymnis A, Magnani A, Koh K, Boyd S (2008) Learning the kernel via convex optimization. pp 1997–2000. doi:10.1109/ICASSP.2008.4518030
  18. 18.
    Kondor RI, Lafferty J (2002) Diffusion kernels on graphs and other discrete structures. In: Proceedings of the ICML, pp 315–322Google Scholar
  19. 19.
    Kulis B, Sustik M, Dhillon I (2006) Learning low-rank kernel matrices. In: ICML ’06: Proceedings of the 23rd international conference on Machine learning. ACM, New York, NY, USA, pp 505–512. http://www.doi.acm.org/10.1145/1143844.1143908
  20. 20.
    Vandenberghe L, Vandenberghe SB, Boyd S (1996) Semidefinite programming. SIAM Rev 38(1):49–95. http://www.jstor.org/stable/2132974 Google Scholar
  21. 21.
    Lafon S, Lee A (2006) Diffusion maps and coarse-graining: a unified framework for dimensionality reduction, graph partitioning, and data set parameterization. Pattern Anal Mach Intell, IEEE Trans 28(9):1393–1403. doi:10.1109/TPAMI.2006.184 CrossRefGoogle Scholar
  22. 22.
    Lanckriet GRG, Cristianini N, Bartlett P, Ghaoui LE, Jordan MI (2004) Learning the kernel matrix with semidefinite programming. J Mach Learn Res 5:27–72Google Scholar
  23. 23.
    Marie Szafranski YG, Rakotomamonjy A (2008) Composite kernel learning. In: ICMLGoogle Scholar
  24. 24.
    Meila M, Shi J (2001) A random walks view of spectral segmentationGoogle Scholar
  25. 25.
    Micchelli CA, Pontil M (2005) Learning the kernel function via regularization. J Mach Learn Res 6:1099–1125MathSciNetGoogle Scholar
  26. 26.
    Mika S, Ratsch G, Weston J, Scholkopf B, Mullers K (1999) Fisher discriminant analysis with kernels, pp 41–48. doi:10.1109/NNSP.1999.788121
  27. 27.
    Muller KR, Mika S, Ratsch G, Tsuda K, Scholkopf B (2001) An introduction to kernel-based learning algorithms. Neural Netw, IEEE Trans 12(2):181–201. doi:10.1109/72.914517 CrossRefGoogle Scholar
  28. 28.
    Nadler B, Lafon S, Coifman R, Kevrekidis I (2007) Diffusion maps-a probabilistic interpretation for spectral embedding and clustering algorithms. http://www.dx.doi.org/10.1007/978-3-540-73750-6_10
  29. 29.
    Rakotomamonjy A, Bach F, Canu S, Grandvalet Y (2007) More efficiency in multiple kernel learning. In: ICML ’07: Proceedings of the 24th international conference on Machine learning. ACM, New York, NY, USA, pp 775–782. http://www.doi.acm.org/10.1145/1273496.1273594
  30. 30.
    Scholkopf B, Smola A, Muller KR (1998) Nonlinear component analysis as a kernel eigenvalue problem. Neural Comp 10(5):1299–1319. http://neco.mitpress.org/cgi/content/abstract/10/5/1299 Google Scholar
  31. 31.
    Shaw B, Jebara T (2009) Structure preserving embedding. In: ICML ’09: Proceedings of the 26th annual international conference on machine learning. ACM, New York, NY, USA, pp 937–944.doi:10.1145/1553374.1553494
  32. 32.
    Smola A, Hofmann T, Scholkopf B (2007) Kernel methods in machine learning. Ann Stat 36:1171–1220. http://eprints.pascal-network.org/archive/00003984 Google Scholar
  33. 33.
    Society FRKCAM (1997) Spectral graph theory. Regional conference series in mathmatics. American Mathematical Society, Providence, RIGoogle Scholar
  34. 34.
    Terence Sim SB, Bsat M (2001) The cmu pose, illumination, and expression (pie) database of human faces. Technical Reports CMU-RI-TR-01-02, Robotics Institute, Pittsburgh, PAGoogle Scholar
  35. 35.
    Vapnik V (1999) An overview of statistical learning theory. Neural Netw, IEEE Trans 10(5):988–999. doi:10.1109/72.788640 CrossRefGoogle Scholar
  36. 36.
    Weinberger KQ, Packer BD, Saul LK (2005) Nonlinear dimensionality reduction by semidefinite programming and kernel matrix factorization. In: Proceedings of the tenth international workshop on artificial intelligence and statistics, pp 381–388Google Scholar
  37. 37.
    Weinberger KQ, Saul LK (2006) An introduction to nonlinear dimensionality reduction by maximum variance unfolding. In: Unfolding, Proceedings of the 21st national conference on artificial intelligence. AAAIGoogle Scholar
  38. 38.
    Weinberger KQ, Sha F, Saul LK (2004) Learning a kernel matrix for nonlinear dimensionality reduction. In: ICML ’04: Proceedings of the twenty-first international conference on Machine learning. ACM, New York, NY, USA, p 106. doi:10.1145/1015330.1015345
  39. 39.
    Xiong H, Swamy M, Ahmad M (2005) Optimizing the kernel in the empirical feature space. Neural Netw, IEEE Trans 16(2):460–474. doi:10.1109/TNN.2004.841784 CrossRefGoogle Scholar
  40. 40.
    Yang MH, Ahuj N, Kriegman D (2000) Face recognition using kernel eigenfaces. In: Image Processing, 2000. Proceedings. 2000 International Conference on, vol 1, pp 37–40. doi:10.1109/ICIP.2000.900886
  41. 41.
    Zhang X, Lee WS (2006) Hyperparameter learning for graph based semi-supervised learning algorithms. In: NIPS. http://books.nips.cc/papers/files/nips19/NIPS2006_0148.pdf
  42. 42.
    Zhengdong Lu, Dhillon PJI (2009) Geometry-aware metric learning. In: ICML ’09: Proceedings of the 26nd international conference on Machine learning. ACMGoogle Scholar

Copyright information

© Springer-Verlag London Limited 2010

Authors and Affiliations

  • M. Ehsan Abbasnejad
    • 1
  • Dhanesh Ramachandram
    • 1
  • Rajeswari Mandava
    • 1
  1. 1.Computer Vision Research Group, School of Computer SciencesUniversiti Sains MalaysiaPenangMalaysia

Personalised recommendations