Abstract
Measures of statistical dependence between random variables have been successfully applied in many machine learning tasks, such as independent component analysis, feature selection, clustering and dimensionality reduction. This success rests on the fact that many existing learning tasks can be cast as problems of dependence maximization (or minimization). Motivated by this, we present a unifying view of kernel learning via statistical dependence estimation. The key idea is that a good kernel should maximize the statistical dependence between the kernel-mapped data and the class labels. The dependence is measured by the Hilbert–Schmidt independence criterion (HSIC), which computes the Hilbert–Schmidt norm of the cross-covariance operator between mapped samples in the corresponding Hilbert spaces and is traditionally used to measure the statistical dependence between random variables. As a special case of kernel learning, we propose a Gaussian kernel optimization method for classification that maximizes the HSIC, where two forms of Gaussian kernels (the spherical kernel and the ellipsoidal kernel) are considered. Extensive experiments on real-world data sets from the UCI benchmark repository validate the superiority of the proposed approach in terms of both prediction accuracy and computational efficiency.
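The criterion described in the abstract can be made concrete with the standard biased empirical HSIC estimator of Gretton et al.: given an input kernel matrix K and a label kernel matrix L over n samples, HSIC(K, L) = tr(KHLH) / (n - 1)^2, where H = I - (1/n)11^T is the centering matrix. The sketch below illustrates the kernel-selection idea for the spherical Gaussian kernel; the grid search over widths, the function names, and the delta label kernel L[i, j] = 1 if y_i = y_j (0 otherwise) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of HSIC-based Gaussian kernel selection, assuming a simple
# grid search and a delta label kernel; illustrative only, not the paper's code.
import numpy as np

def hsic(K, L):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def gaussian_kernel(X, sigma):
    """Spherical Gaussian kernel: k(x, x') = exp(-||x - x'||^2 / (2 sigma^2))."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # pairwise squared distances
    return np.exp(-d2 / (2.0 * sigma ** 2))

def select_sigma(X, y, sigmas):
    """Pick the kernel width maximizing HSIC between kernel and class labels."""
    L = (y[:, None] == y[None, :]).astype(float)  # delta label kernel (assumed)
    return max(sigmas, key=lambda s: hsic(gaussian_kernel(X, s), L))
```

For the ellipsoidal kernel, the same objective applies with a per-feature width vector in place of the single sigma; a gradient-based optimizer over those widths would then replace the grid search.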
Acknowledgements
This work is supported in part by the National Natural Science Foundation of China (No. 61562003), the Natural Science Foundation of Jiangxi Province of China (No. 20161BAB202070), and the China Scholarship Council (No. 201508360144). The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.