An Efficient and Effective Multiple Empirical Kernel Learning Based on Random Projection

Neural Processing Letters

Abstract

Multiple empirical kernel learning (MEKL) has been demonstrated to be flexible and effective because it introduces multiple kernels. However, MEKL also incurs a large computational cost in practice. In this paper we therefore adopt the random projection (RP) technique to construct a low-dimensional feature space efficiently, and develop an efficient and effective MEKL, named MEKLRP, that reduces this computational cost. The proposed MEKLRP randomly selects a subset \(S'\) of \(p\) samples from the whole training set \(S\) of \(N\) samples, and then uses \(S'\) to generate \(M\) different empirical kernel maps (EKMs) \(\{\Phi^{rpe}_l(x)\}_{l=1}^M\). MEKLRP then maps each sample \(x\) into \(\Phi_l^{rpe}(x)\), \(l=1,\dots,M\), and finally feeds the transformed samples into our previous MEKL framework. The contributions of MEKLRP are as follows. First, MEKLRP exploits the randomness of RP to reduce the computational cost of the matrix eigendecomposition from \(O(N^3)\) to \(O(p^3)\). Second, MEKLRP maintains approximate separability at a certain margin and preserves most of the discriminant information in the low-dimensional space, owing to the behavior of RP in kernel-based learning. Third, MEKLRP enjoys a lower generalization risk bound than the corresponding original learning machine, as established through Rademacher complexity.
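To make the construction concrete, the following is a minimal NumPy sketch (not the authors' implementation) of the subset-based empirical-kernel-map step described above: it draws a random subset \(S'\) of \(p\) samples, eigendecomposes each \(p \times p\) kernel matrix (hence the \(O(p^3)\) rather than \(O(N^3)\) cost), and maps all \(N\) samples into the \(M\) resulting feature spaces. The RBF kernels, function names, and parameter values are illustrative assumptions; the subsequent step of feeding these features into the MEKL framework is omitted.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    """Gaussian (RBF) kernel matrix between the rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def mekl_rp_features(X, p, gammas, seed=None, tol=1e-10):
    """Hypothetical sketch: map all N samples into M empirical kernel
    feature spaces built from a random subset S' of p training samples.

    Returns a list of M arrays, each of shape (N, r_l)."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    idx = rng.choice(N, size=p, replace=False)    # random subset S'
    S_sub = X[idx]
    feature_maps = []
    for gamma in gammas:                          # one EKM per kernel
        K_pp = rbf_kernel(S_sub, S_sub, gamma)    # p x p Gram matrix
        w, V = np.linalg.eigh(K_pp)               # O(p^3) instead of O(N^3)
        keep = w > tol                            # drop near-null directions
        # Empirical kernel map: Phi(x) = Lambda^{-1/2} V^T [k(s_1,x),...,k(s_p,x)]^T
        P = V[:, keep] / np.sqrt(w[keep])
        K_Np = rbf_kernel(X, S_sub, gamma)        # N x p cross-kernel matrix
        feature_maps.append(K_Np @ P)             # (N, r_l) explicit features
    return feature_maps

# Usage: M = 3 RBF kernels, subset of p = 50 out of N = 500 samples.
X = np.random.default_rng(0).standard_normal((500, 10))
maps = mekl_rp_features(X, p=50, gammas=[0.1, 1.0, 10.0], seed=1)
print([F.shape for F in maps])
```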

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China under Grants 61272198 and 21176077, the Innovation Program of the Shanghai Municipal Education Commission under Grant 14ZZ054, the Fundamental Research Funds for the Central Universities, the Shanghai Key Laboratory of Intelligent Information Processing under Grant IIPL-2012-003, and the Provincial Key Laboratory for Computer Information Processing Technology of Soochow University.

Author information

Correspondence to Zhe Wang or Daqi Gao.

Cite this article

Wang, Z., Fan, Q., Jie, W. et al. An Efficient and Effective Multiple Empirical Kernel Learning Based on Random Projection. Neural Process Lett 42, 715–744 (2015). https://doi.org/10.1007/s11063-014-9385-2
