Designing a Metric for the Difference between Gaussian Densities

  • Karim T. Abou–Moustafa
  • Fernando De La Torre
  • Frank P. Ferrie
Part of the Advances in Intelligent and Soft Computing book series (AINSC, volume 83)


Measuring the difference between two multivariate Gaussians is central to statistics and machine learning. Traditional measures based on the Bhattacharyya coefficient or the symmetric Kullback-Leibler divergence do not satisfy the metric properties (notably the triangle inequality) required by many algorithms. This paper proposes a metric for Gaussian densities. Like the Bhattacharyya distance and the symmetric Kullback-Leibler divergence, the proposed metric reduces the difference between two Gaussians to the difference between their parameters. Based on the proposed metric we introduce a symmetric and positive semi-definite kernel between Gaussian densities. We illustrate the benefits of the proposed metric in two settings: (1) a supervised problem, where we learn a low-dimensional projection that maximizes the distance between Gaussians, and (2) an unsupervised spectral clustering problem, where the similarity between samples is measured with the proposed kernel.
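The paper's own metric is not reproduced in this abstract. For reference, the two baseline measures it names, the Bhattacharyya distance and the symmetric (Jeffreys) Kullback-Leibler divergence, have closed forms for Gaussians that depend only on the means and covariances. A minimal NumPy sketch of these standard formulas (not the proposed metric itself):

```python
import numpy as np

def kl_gauss(mu1, S1, mu2, S2):
    """KL divergence KL(N(mu1, S1) || N(mu2, S2)); asymmetric in its arguments."""
    d = mu1.shape[0]
    S2_inv = np.linalg.inv(S2)
    diff = mu2 - mu1
    return 0.5 * (np.trace(S2_inv @ S1) + diff @ S2_inv @ diff - d
                  + np.log(np.linalg.det(S2) / np.linalg.det(S1)))

def sym_kl(mu1, S1, mu2, S2):
    """Symmetric KL (Jeffreys) divergence: symmetric, but not a metric
    because it violates the triangle inequality."""
    return 0.5 * (kl_gauss(mu1, S1, mu2, S2) + kl_gauss(mu2, S2, mu1, S1))

def bhattacharyya(mu1, S1, mu2, S2):
    """Bhattacharyya distance: a mean term plus a covariance term."""
    S = 0.5 * (S1 + S2)
    diff = mu1 - mu2
    term_mean = 0.125 * diff @ np.linalg.inv(S) @ diff
    term_cov = 0.5 * np.log(np.linalg.det(S) /
                            np.sqrt(np.linalg.det(S1) * np.linalg.det(S2)))
    return term_mean + term_cov
```

Both measures reduce the comparison to the parameters (mu, Sigma), which is the property the abstract says the proposed metric shares, while additionally satisfying the metric axioms.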


Keywords: Covariance Matrix · Triangle Inequality · Mahalanobis Distance · Spectral Clustering · Dissimilarity Measure





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Karim T. Abou–Moustafa (1)
  • Fernando De La Torre (2)
  • Frank P. Ferrie (1)
  1. Centre of Intelligent Machines (CIM), McGill University, Montreal, Canada
  2. The Robotics Institute, Carnegie Mellon University, Pittsburgh, USA
