Admissible kernels for RKHS embedding of probability distributions

  • Liangzhi Chen
  • Thomas Hotz
  • Haizhang Zhang
Regular Article


Measuring the similarity of two probability distributions is important in many applications of statistics. Embedding such distributions into a reproducing kernel Hilbert space (RKHS) has many favorable properties, and the choice of the reproducing kernel is crucial in this approach. We study this choice by considering the similarity of two distributions from the same class. In particular, we investigate when the RKHS embedding is “admissible” in the sense that the distance between the embeddings becomes smaller when the expectations get closer or when the variance increases to infinity. We give conditions under which the widely used translation-invariant reproducing kernels are admissible. We also extend the study to multivariate non-symmetric Gaussian distributions.
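To make the two limiting behaviours described above concrete, here is a minimal Python sketch. It is not the authors' method and does not reproduce their admissibility conditions. It assumes the Gaussian kernel k(x, y) = exp(-(x - y)^2 / (2 s^2)) as one example of a translation-invariant kernel, and it takes the distance between embeddings to be the RKHS norm of the difference of the kernel mean embeddings (the maximum mean discrepancy). For two univariate Gaussians N(mu1, sigma^2) and N(mu2, sigma^2) with a common variance, the squared distance then has the closed form 2 s / sqrt(s^2 + 2 sigma^2) * (1 - exp(-(mu1 - mu2)^2 / (2 (s^2 + 2 sigma^2)))), which shrinks as |mu1 - mu2| decreases and tends to 0 as sigma grows. The function names below are hypothetical.

```python
import math
import random


def gaussian_kernel(x: float, y: float, s: float) -> float:
    """Translation-invariant Gaussian kernel k(x, y) = exp(-(x - y)^2 / (2 s^2))."""
    return math.exp(-((x - y) ** 2) / (2.0 * s * s))


def mmd2_closed_form(mu1: float, mu2: float, sigma: float, s: float) -> float:
    """Squared RKHS distance between the mean embeddings of N(mu1, sigma^2)
    and N(mu2, sigma^2) under gaussian_kernel with bandwidth s.

    Uses E[k(X, Y)] = s / sqrt(s^2 + t) * exp(-d^2 / (2 (s^2 + t)))
    whenever X - Y ~ N(d, t); for independent draws t = 2 sigma^2.
    """
    var_diff = 2.0 * sigma * sigma          # variance of X - Y
    denom = s * s + var_diff
    scale = s / math.sqrt(denom)            # equals E k(X, X') and E k(Y, Y')
    cross = scale * math.exp(-((mu1 - mu2) ** 2) / (2.0 * denom))  # E k(X, Y)
    # ||m_P - m_Q||^2 = E k(X, X') + E k(Y, Y') - 2 E k(X, Y)
    return 2.0 * scale - 2.0 * cross


def _offdiag_sum(zs, s):
    """Sum of k(z_i, z_j) over ordered pairs with i != j."""
    total = 0.0
    for i in range(len(zs)):
        for j in range(i + 1, len(zs)):
            total += gaussian_kernel(zs[i], zs[j], s)
    return 2.0 * total  # the kernel is symmetric, so each unordered pair counts twice


def mmd2_unbiased(xs, ys, s):
    """Unbiased U-statistic estimate of the squared MMD (each sample needs >= 2 points)."""
    m, n = len(xs), len(ys)
    kxy = sum(gaussian_kernel(a, b, s) for a in xs for b in ys)
    return (_offdiag_sum(xs, s) / (m * (m - 1))
            + _offdiag_sum(ys, s) / (n * (n - 1))
            - 2.0 * kxy / (m * n))


if __name__ == "__main__":
    s = 1.0
    # Common variance fixed at sigma = 1: the distance shrinks as the means approach.
    for delta in (2.0, 1.0, 0.5, 0.0):
        print(f"|mu1 - mu2| = {delta}: {mmd2_closed_form(0.0, delta, 1.0, s):.6f}")
    # Mean gap fixed at 1: the distance shrinks as the common variance grows.
    for sigma in (1.0, 10.0, 100.0):
        print(f"sigma = {sigma}: {mmd2_closed_form(0.0, 1.0, sigma, s):.3e}")
    # Monte Carlo sanity check of the closed form.
    rng = random.Random(0)
    xs = [rng.gauss(0.0, 1.0) for _ in range(400)]
    ys = [rng.gauss(1.0, 1.0) for _ in range(400)]
    est = mmd2_unbiased(xs, ys, s)
    print(f"estimate {est:.4f} vs closed form {mmd2_closed_form(0.0, 1.0, 1.0, s):.4f}")
```

With s = 1, sigma = 1 and |mu1 - mu2| = 1 the closed form equals 2/sqrt(3) (1 - exp(-1/6)), roughly 0.177. The U-statistic estimator is included only as a sampling check of that formula; it costs O(n^2) kernel evaluations, so it is kept to a few hundred points.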


Keywords: Gaussian distributions; Reproducing kernels; RKHS embedding; Translation-invariant kernels; Radially decreasing functions




Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. School of Data and Computer Science, Sun Yat-sen University, Guangzhou, People’s Republic of China
  2. Institute of Mathematics, Ilmenau, Germany
  3. School of Data and Computer Science and Guangdong Province Key Laboratory of Computational Science, Sun Yat-sen University, Guangzhou, People’s Republic of China
