An Efficient and Effective Multiple Empirical Kernel Learning Based on Random Projection

Neural Processing Letters

Abstract

Multiple empirical kernel learning (MEKL) has been demonstrated to be flexible and effective because it introduces multiple kernels. However, MEKL also incurs a large computational cost in practice. In this paper we therefore adopt the random projection (RP) technique to construct a low-dimensional feature space efficiently, and develop an efficient and effective MEKL, named MEKLRP, that reduces this computational cost. The proposed MEKLRP randomly selects a subset \(S'\) of \(p\) samples from the whole training set \(S\) of \(N\) samples, and then uses \(S'\) to generate \(M\) different empirical kernel maps (EKMs) \(\{\Phi^{rpe}_l(x)\}_{l=1}^M\). MEKLRP then maps each sample \(x\) into \(\Phi_l^{rpe}(x)\), \(l=1,\dots,M\), and finally feeds the transformed samples into our previous MEKL framework. The contributions of MEKLRP are as follows. First, MEKLRP exploits the randomness of RP to reduce the computational cost of the matrix eigendecomposition from \(O(N^3)\) to \(O(p^3)\). Second, MEKLRP maintains approximate separability at a certain margin and preserves most of the discriminant information in the low-dimensional space, owing to the behavior of RP in kernel-based learning. Third, MEKLRP enjoys a lower generalization risk bound than the corresponding original learning machine, as established through Rademacher complexity.
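To make the construction concrete, the following is a minimal NumPy sketch (not the authors' implementation) of the subset-based empirical-kernel-map step described above: it draws a random subset \(S'\) of \(p\) samples, eigendecomposes each \(p \times p\) kernel matrix (hence the \(O(p^3)\) rather than \(O(N^3)\) cost), and maps all \(N\) samples into the \(M\) resulting feature spaces. The RBF kernels, function names, and parameter values are illustrative assumptions; the subsequent step of feeding these features into the MEKL framework is omitted.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    """Gaussian (RBF) kernel matrix between the rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def mekl_rp_features(X, p, gammas, seed=None, tol=1e-10):
    """Hypothetical sketch: map all N samples into M empirical kernel
    feature spaces built from a random subset S' of p training samples.

    Returns a list of M arrays, each of shape (N, r_l)."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    idx = rng.choice(N, size=p, replace=False)    # random subset S'
    S_sub = X[idx]
    feature_maps = []
    for gamma in gammas:                          # one EKM per kernel
        K_pp = rbf_kernel(S_sub, S_sub, gamma)    # p x p Gram matrix
        w, V = np.linalg.eigh(K_pp)               # O(p^3) instead of O(N^3)
        keep = w > tol                            # drop near-null directions
        # Empirical kernel map: Phi(x) = Lambda^{-1/2} V^T [k(s_1,x),...,k(s_p,x)]^T
        P = V[:, keep] / np.sqrt(w[keep])
        K_Np = rbf_kernel(X, S_sub, gamma)        # N x p cross-kernel matrix
        feature_maps.append(K_Np @ P)             # (N, r_l) explicit features
    return feature_maps

# Usage: M = 3 RBF kernels, subset of p = 50 out of N = 500 samples.
X = np.random.default_rng(0).standard_normal((500, 10))
maps = mekl_rp_features(X, p=50, gammas=[0.1, 1.0, 10.0], seed=1)
print([F.shape for F in maps])
```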

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China under Grants 61272198 and 21176077, the Innovation Program of the Shanghai Municipal Education Commission under Grant 14ZZ054, the Fundamental Research Funds for the Central Universities, the Shanghai Key Laboratory of Intelligent Information Processing under Grant IIPL-2012-003, and the Provincial Key Laboratory for Computer Information Processing Technology of Soochow University.

Author information

Correspondence to Zhe Wang or Daqi Gao.

Cite this article

Wang, Z., Fan, Q., Jie, W. et al. An Efficient and Effective Multiple Empirical Kernel Learning Based on Random Projection. Neural Process Lett 42, 715–744 (2015). https://doi.org/10.1007/s11063-014-9385-2
