Learning Mixtures by Simplifying Kernel Density Estimators



Gaussian mixture models are a widespread tool for modeling varied and complex probability density functions. They can be estimated by various means, often using Expectation–Maximization or Kernel Density Estimation. In addition to these well-known algorithms, new and promising stochastic modeling methods include Dirichlet Process mixtures and k-Maximum Likelihood Estimators. Most of these methods, including Expectation–Maximization, lead to compact models but may be expensive to compute. Kernel Density Estimation, on the other hand, yields large models that are computationally cheap to build. In this chapter we present new methods to obtain high-quality models that are both compact and fast to compute. This is accomplished by simplifying the Kernel Density Estimator. The simplification is a clustering method based on k-means-like algorithms. Like all k-means algorithms, our method relies on the computation of divergences and centroids, and we use two different divergences (and their associated centroids): Bregman and Fisher-Rao. Along with the description of the algorithms, we describe pyMEF, a Python library designed for the manipulation of mixtures of exponential families. Unlike most other existing tools, this library allows the use of any exponential family instead of being limited to a particular distribution. Being generic, the library makes it possible to rapidly explore the available exponential families in order to choose the one best suited to a particular application. We evaluate the proposed algorithms by building mixture models on examples from a bio-informatics application. The quality of the resulting models is measured in terms of log-likelihood and of Kullback–Leibler divergence.
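The simplification idea can be illustrated with a minimal sketch, under stated assumptions: univariate Gaussian kernels with a single shared bandwidth, the Kullback–Leibler divergence as the Bregman divergence, and the moment-matching centroid that goes with it. This is not the pyMEF API; the function names `kl_gauss`, `simplify_kde` and `mixture_logpdf` are hypothetical and written with NumPy only.

```python
import numpy as np

def kl_gauss(m1, v1, m2, v2):
    """KL(N(m1, v1) || N(m2, v2)) for univariate Gaussians (v = variance)."""
    return 0.5 * (np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def simplify_kde(points, bandwidth, k, n_iter=50, seed=0):
    """Cluster the kernels of a Gaussian KDE into at most k components.

    Hypothetical helper: each data point carries one kernel N(x_i, h^2)
    with weight 1/n; a k-means-like loop alternates KL assignment and
    moment-matching centroid updates.
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(points, dtype=float)
    n = len(mu)
    var = np.full(n, bandwidth ** 2)
    w = np.full(n, 1.0 / n)

    # Initialise the centroids with k kernels picked at random.
    idx = rng.choice(n, size=k, replace=False)
    c_mu, c_var = mu[idx].copy(), var[idx].copy()
    c_w = np.full(k, 1.0 / k)

    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid in the sense of KL(kernel || centroid).
        d = kl_gauss(mu[:, None], var[:, None], c_mu[None, :], c_var[None, :])
        new_labels = d.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments are stable, so the centroids are too
        labels = new_labels

        # Update step: the centroid is the weighted mean of the expectation
        # parameters (E[x], E[x^2]), i.e. moment matching within each cluster.
        c_w = np.zeros(k)
        for j in range(k):
            mask = labels == j
            if not mask.any():
                continue  # empty cluster: weight stays 0, dropped at the end
            wj = w[mask].sum()
            m = (w[mask] * mu[mask]).sum() / wj
            second = (w[mask] * (var[mask] + mu[mask] ** 2)).sum() / wj
            c_w[j], c_mu[j], c_var[j] = wj, m, second - m ** 2

    keep = c_w > 0
    return c_w[keep], c_mu[keep], c_var[keep]

def mixture_logpdf(x, w, mu, var):
    """Log-density of a univariate Gaussian mixture at each point of x."""
    x = np.asarray(x, dtype=float)[:, None]
    comp = np.log(w) - 0.5 * np.log(2 * np.pi * var) - 0.5 * (x - mu) ** 2 / var
    mx = comp.max(axis=1, keepdims=True)  # log-sum-exp for numerical stability
    return (mx + np.log(np.exp(comp - mx).sum(axis=1, keepdims=True)))[:, 0]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 200)])
    w, mu, var = simplify_kde(data, bandwidth=0.3, k=2)
    print("weights:", w, "means:", mu, "variances:", var)
    print("mean log-likelihood:", mixture_logpdf(data, w, mu, var).mean())
```

The simplified model has at most `k` components instead of one per data point, and its average log-likelihood on the data can be compared with that of the full estimator. A Fisher-Rao variant would replace both the divergence and the centroid computation and is not shown here.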


Keywords: Kernel Density Estimation, Simplification, Expectation–Maximization, k-means, Bregman, Fisher-Rao



The authors would like to thank Julie Bernauer (INRIA team Amib, LIX, École Polytechnique) for insightful discussions about the bio-informatics application of our work and for providing us with the presented dataset. FN would like to thank Dr Kitano and Dr Tokoro for their support.



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Laboratoire d’Informatique, École Polytechnique, Palaiseau, France
  2. Sony Computer Science Laboratories Inc, Tokyo, Japan
