Measuring Statistical Dependence with Hilbert-Schmidt Norms

  • Arthur Gretton
  • Olivier Bousquet
  • Alex Smola
  • Bernhard Schölkopf
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3734)


We propose an independence criterion based on the eigenspectrum of covariance operators in reproducing kernel Hilbert spaces (RKHSs), consisting of an empirical estimate of the Hilbert-Schmidt norm of the cross-covariance operator (we term this a Hilbert-Schmidt Independence Criterion, or HSIC). This approach has several advantages over previous kernel-based independence criteria. First, the empirical estimate is simpler than any other kernel dependence test, and requires no user-defined regularisation. Second, there is a clearly defined population quantity which the empirical estimate approaches in the large sample limit, with exponential convergence guaranteed between the two: this ensures that independence tests based on HSIC do not suffer from slow learning rates. Finally, we show in the context of independent component analysis (ICA) that the performance of HSIC is competitive with that of previously published kernel-based criteria, and of other recently published ICA methods.
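The abstract does not spell out the empirical estimate. As a minimal sketch, the code below assumes the biased estimator tr(KHLH)/(m-1)^2 given in the full paper, where K and L are the kernel Gram matrices of the two samples and H = I - (1/m)11^T is the centring matrix. The Gaussian kernel, the bandwidth `sigma`, the restriction to scalar samples, and the function names `gaussian_gram`, `center` and `hsic` are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def gaussian_gram(xs, sigma=1.0):
    """Gram matrix K[i][j] = exp(-(x_i - x_j)^2 / (2 sigma^2)) for scalar samples."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in xs] for a in xs]


def center(K):
    """Double-centre a square matrix: returns HKH with H = I - (1/m) 11^T."""
    m = len(K)
    row_mean = [sum(row) / m for row in K]
    col_mean = [sum(K[i][j] for i in range(m)) / m for j in range(m)]
    total_mean = sum(row_mean) / m
    return [
        [K[i][j] - row_mean[i] - col_mean[j] + total_mean for j in range(m)]
        for i in range(m)
    ]


def hsic(xs, ys, sigma=1.0):
    """Biased empirical HSIC, tr(KHLH) / (m - 1)^2, for paired scalar samples."""
    m = len(xs)
    if m != len(ys) or m < 2:
        raise ValueError("xs and ys must be paired samples of length >= 2")
    Kc = center(gaussian_gram(xs, sigma))
    Lc = center(gaussian_gram(ys, sigma))
    # H is idempotent and both centred matrices are symmetric, so
    # tr(KHLH) = tr((HKH)(HLH)) = sum over i, j of Kc[i][j] * Lc[i][j].
    trace = sum(Kc[i][j] * Lc[i][j] for i in range(m) for j in range(m))
    return trace / (m - 1) ** 2


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the demo is repeatable
    x = [rng.gauss(0.0, 1.0) for _ in range(100)]
    independent = [rng.gauss(0.0, 1.0) for _ in range(100)]
    dependent = [v * v for v in x]  # nonlinear function of x
    print("HSIC(x, independent noise):", hsic(x, independent))
    print("HSIC(x, x^2):              ", hsic(x, dependent))
```

Values near zero are consistent with independence, while larger values indicate dependence. Note that the kernel bandwidth is a kernel parameter rather than a regulariser, so the sketch stays in line with the abstract's claim that no user-defined regularisation is needed.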


Keywords: Independent Component Analysis; Covariance Operator; Reproducing Kernel Hilbert Space; Independence Criterion





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Arthur Gretton (1)
  • Olivier Bousquet (2)
  • Alex Smola (3)
  • Bernhard Schölkopf (1)

  1. MPI for Biological Cybernetics, Tübingen, Germany
  2. Pertinence, Paris, France
  3. National ICT Australia, Canberra, Australia
