Abstract
Multiple empirical kernel learning (MEKL) is demonstrated to be flexible and effective due to introducing multiple kernels. But MEKL also brings a large computational complexity in practice. Therefore, in this paper we adopt the random projection (RP) technique to efficiently construct the low-dimensional feature space, and then develop an efficient and effective MEKL named MEKLRP so as to decrease the computational complexity. The proposed MEKLRP randomly selects a subset \(S'\) of \(p\) samples from the whole training set \(S\) of \(N\) samples, and then utilizes \(S'\) to generate \(M\) different EKMs \(\{\Phi ^{rpe}_l(x)\}_{l=1}^M\). Following that, MEKLRP maps each sample \(x\) into \(\Phi _l^{rpe}(x), l=1...M\). Finally, MEKLRP applies the transformed samples into our previous MEKL framework. We highlight the contributions of the MEKLRP as follows. Firstly, the MEKLRP adopts the random characteristic of RP and efficiently decreases the computational cost of the matrix eigen-decomposition from \(O(N^3)\) to \(O(p^3)\). Secondly, the MEKLRP maintains an approximate separability at one certain margin and preserves most of the discriminant information in a low-dimensional space since the characteristic of RP in kernel-based learning. Thirdly, the MEKLRP behaves a lower generalization risk bound than its corresponding original learning machine according to the Rademacher complexity.
Similar content being viewed by others
References
Achlioptas D (2003) Database-friendly random projections: johnson-lindenstrauss with binary coins. J Comput Syst Sci 66(4):671–687
Arriaga R, Vempala S (2006) An algorithmic theory of learning: robust concepts and random projection. Mach Learn 63(2):161–182
Bach FR, Lanckriet GR, Jordan MI (2004) Multiple kernel learning, conic duality, and the smo algorithm. In: Proceedings of the twenty-first international conference on machine learning. ACM, p 6
Bache K, Lichman M (2013) UCI machine learning repository [http://archive.ics.uci.edu/ml]
Balcan M, Blum A, Vempala S (2006) Kernels as features: on kernels, margins, and low-dimensional mappings. Mach Learn 65(1):79–94
Bartlett P, Mendelson S (2003) Rademacher and gaussian complexities: risk bounds and structural results. J Mach Lear Res 3:463–482
Bingham E, Mannila H (2001) Random projection in dimensionality reduction: applications to image and text data. In: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining. pp 245–250
Boutsidis C, Zouzias A, Drineas P (2010) Random projections for \(k\)-means clustering. arXiv preprint arXiv:1011.4632
Calderon-Niquin M, Valverde-Rebaza J (2012) Multiple kernel learning based on local and nonlinear combinations. In: 2012 XXXVIII Conferencia Latinoamericana En Informatica (CLEI). IEEE, pp 1–7
Candès EJ, Romberg J, Tao T (2006) Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans Inf Theory 52(2):489–509
Chen X, Qi C (2014) Nonlinear neighbor embedding for single image super-resolution via kernel mapping. Signal Proc 94:6–22
Chen Z, Li J, Wei L, Xu W, Shi Y (2011) Multiple-kernel svm based multiple-task oriented data mining system for gene expression data analysis. Expert Syst Appl 38(10):12151–12159
Cristianini N, Shawe-Taylor J (2000) An introduction to support vector machines and other kernel-based learning methods. Cambridge university press, New York
Dasgupta S, Gupta A (2002) An elementary proof of the johnson-lindenstrauss lemma. Random Struct Algorithm 22(1):60–65
Farquhar J, Hardoon D, Meng H, Shawe-taylor J, Szedmak S (2005) Two view learning: Svm-2k, theory and practice. Adv Neural Inf Proc Syst 18:355–362
Goel N, Bebis G, Nefian A (2005) Face recognition experiments with random projection. In: Defense and Security. International Society for Optics and Photonics, pp 426–437
Goutte C, Gaussier E (2005) A probabilistic interpretation of precision, recall and f-score, with implication for evaluation. In: Advances in information retrieval. Springer, pp 345–359
Hino H (2013) Gaussian multiple kernel learning with entropy power inequality. In: 2013 IEEE International Workshop on Machine Learning for Signal Processing (MLSP), pp 1–6
Indyk P, Motwani R (1998) Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of the thirtieth annual ACM symposium on Theory of computing. pp 604–613
Izenman AJ (2008) Linear discriminant analysis. Springer, New York
Johnson W, Lindenstrauss J (1984) Extensions of Lipschitz mappings into a Hilbert space. In: Conference in modern analysis and probability (New Haven, Conn., 1982), volume 26. American Mathematical Society, pp 189–206
Kaski S (1997) Data exploration using self-organizing maps. In: Acta Polytechnica Scandinavica: Mathematics, Computing and Management in Engineering Series NO. 82. Citeseer
Kim SJ, Magnani A, Boyd S (2006) Optimal kernel selection in kernel fisher discriminant analysis. In: Proceedings of the 23rd international conference on Machine learning. pp 465–472
Koltchinskii V (2001) Rademacher penalties and structural risk minimization. IEEE Trans Inf Theory 47(5):1902–1914
Koltchinskii V, Panchenko D (2000) Rademacher processes and bounding the risk of function learning. In: High Dimensional Probability II, volume 47. Springer, pp 443–457
Kressel UHG (1999) Advances in kernel methods. In: Pairwise Classification and Support Vector Machines. MIT Press, pp 255–268
Lanckriet GRG, Cristianini N, Bartlett P, Ghaoui LE, Jordan MI (2004) Learning the kernel matrix with semidefinite programming. J Mach Learn Res 5:27–72
Landauer T, Foltz P, Laham D (1998) An introduction to latent semantic analysis. J Mach Learn Res 25:259–284
Liang J, Chen L, Chen X (2012) Discriminant kernel learning using hybrid regularization. Neural Proc Lett 36(3):257–273
Liang Z, Liu N (2013) Efficient feature scaling for support vector machines with a quadratic kernel. Neural Processing Letters pp 1–12
Linial M, Linial N, Tishby N, Yona G (1997) Global self-organization of all known protein sequences reveals inherent biological signatures1. J Mole Biol 268(2):539–556
Liu X, Wang L, Yin J, Zhu E, Zhang J (2013) An efficient approach to integrating radius information into multiple kernel learning. IEEE Trans Cybern 43(2):557–569
Lkeski J (2003) Ho-kashyap classifier with generalization control. Pattern Recognition Letters 24(14):2281–2290
Lu J, Plataniotis KN, Venetsanopoulos AN (2003) Face recognition using kernel direct discriminant analysis algorithms. IEEE Trans Neural Netw 14(1):117–126
Mendelson S (2002) Rademacher averages and phase transitions in glivenko-cantelli classes. IEEE Trans Inf Theory 48(1):251–263
Papadimitriou CH, Tamaki H, Raghavan P, Vempala S (1998) Latent semantic indexing: A probabilistic analysis. In: Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems. pp 159–168
Rudelson M, Vershynin R (2013) Hanson-wright inequality and sub-gaussian concentration. Electron Commun Prob 18(82):1–9
Scholkopf B, Mika S, Burges CJC, Knirsch P, Muller KR, Ratsch G, Smola AJ (1999) Input space versus feature space in kernel-based methods. IEEE Trans Neural Netw 10(5):1000–1017
Sonnenburg S, Rätsch G, Schäfer C (2006) A general and efficient multiple kernel learning algorithm. 18:1273–1280
Sonnenburg S, Rätsch G, Schäfer C, Schölkopf B (2006) Large scale multiple kernel learning. The Journal of Machine Learning Research 7:1531–1565
Valentini G (2005) An experimental bias-variance analysis of svm ensembles based on resampling techniques. IEEE Trans Syst Man Cybern Part B 35(6):1252–1271
Wang Z, Chen S, Sun T (2008) Multik-mhks: a novel multiple kernel learning algorithm. IEEE Trans Pattern Anal Mach Intell 30(2):348–353
Wang Z, Jie W, Chen S, Gao D (2013) Random projection ensemble learning with multiple empirical kernels. Knowledge-Based Syst 37:388–393
Wang Z, Jie W, Gao D (2013) A novel multiple nyström-approximating kernel discriminant analysis. Neurocomputing 119:385–398
Wang Z, Xu J, Gao D, Fu Y (2013) Multiple empirical kernel learning based on local information. Neural Comput Appl 23(7–8):2113–2120
Welling M (2005) Fisher linear discriminant analysis. Department of Computer Science, University of Toronto, 3
Wu P, Duan F, Guo P (2013) Multiple kernel learning method using mrmr criterion and kernel alignment. In: Neural Information Processing. Springer, pp 113–120
Xiong H (2009) A unified framework for kernelization: The empirical kernel feature space. In: Chinese Conference on Pattern Recognition 2009 (CCPR 2009). IEEE, pp 1–5
Xu QS, Liang YZ (2001) Monte carlo cross validation. Chemom Intell Lab Syst 56(1):1–11
Xu X, Tsang IW, Xu D (2013) Soft margin multiple kernel learning. IEEE Trans Neural Netw Learn Syst 24(5):749–761
Yan F, Mikolajczyk K, Barnard M, Cai H, Kittler J (2010) lp-norm multiple kernel fisher discriminant analysis for object and image categorisation. In: 2010 IEEE Conference on Computer Vision and Pattern Recognition. pp 3626–3632
Yang B, Bu Y (2009) Multiple kernel learning using regularized ho-kashyap classifier in empirical kernel mapping space. In: Fifth International Conference on Natural Computation (ICNC’09), volume 1. IEEE, pp 209–212
Yang H, Xu Z, Ye J, King I, Lyu MR (2011) Efficient sparse generalized multiple kernel learning. IEEE Trans Neural Netw 22(3):433–446
Ye J (2005) Characterization of a family of algorithms for generalized discriminant analysis on undersampled problems. J Mach Learn Res 6:483–502
Ye J, Li T, Xiong T, Janardan R (2004) Using uncorrelated discriminant analysis for tissue classification with gene expression data. IEEE/ACM Trans Comput Biol Bioinform 1(4):181–190
Acknowledgments
This work was partially supported by Natural Science Foundations of China under Grant Nos. 61272198 and 21176077, Innovation Program of Shanghai Municipal Education Commission under Grant No. 14ZZ054, the Fundamental Research Funds for the Central Universities, Shanghai Key Laboratory of Intelligent Information Processing of China under Grant No. IIPL-2012-003, and Provincial Key Laboratory for Computer Information Processing Technology of Soochow University.
Author information
Authors and Affiliations
Corresponding authors
Rights and permissions
About this article
Cite this article
Wang, Z., Fan, Q., Jie, W. et al. An Efficient and Effective Multiple Empirical Kernel Learning Based on Random Projection. Neural Process Lett 42, 715–744 (2015). https://doi.org/10.1007/s11063-014-9385-2
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11063-014-9385-2