Monte Carlo Information-Geometric Structures

  • Frank Nielsen
  • Gaëtan Hadjeres
Part of the Signals and Communication Technology book series (SCT)


Exponential families and mixture families are parametric probability models that can be geometrically studied as smooth statistical manifolds with respect to any statistical divergence, such as the Kullback–Leibler (KL) divergence or the Hellinger divergence. When a statistical manifold is equipped with the KL divergence, the induced structure is dually flat, and the KL divergence between distributions amounts to an equivalent Bregman divergence on their corresponding parameters. In practice, however, evaluating the Bregman generators of mixture/exponential families requires computing definite integrals that are either too time-consuming (in the exponentially large discrete support case) or do not admit a closed-form formula (in the continuous support case). In these cases, the dually flat construction remains theoretical and cannot be used by information-geometric algorithms. To bypass this problem, we consider stochastic Monte Carlo (MC) estimation of those integral-based mixture/exponential family Bregman generators. We show that, under natural assumptions, these MC generators are almost surely Bregman generators. We define a series of dually flat information geometries, termed Monte Carlo Information Geometries (MCIGs), that increasingly finely approximate the intractable geometry. The advantage of the MCIG construction is that it enables practical use of the Bregman algorithmic toolbox on a wide range of probability distribution families. We demonstrate our approach with a clustering task on a mixture family manifold. We then show how to generate MCIGs for an arbitrary separable statistical divergence between distributions belonging to the same parametric family of distributions.
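To make the MC generator idea concrete, here is an illustrative sketch (not taken from the chapter): for a toy one-dimensional mixture family with two fixed Gaussian components, the Bregman generator is the Shannon negentropy F(θ) = ∫ m_θ(x) log m_θ(x) dx, which we estimate by importance sampling against a fixed proposal sample. The two components, the uniform proposal on a truncated support, and the finite-difference gradient are all hypothetical choices made for this example.

```python
import numpy as np

# Fixed components of a toy 1D mixture family (illustrative choice):
# two unit-variance Gaussians centered at 0 and 4.
def p0(x):
    return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)

def p1(x):
    return np.exp(-0.5 * (x - 4.0)**2) / np.sqrt(2 * np.pi)

def m(theta, x):
    """Mixture density m_theta = (1 - theta) p0 + theta p1, theta in (0, 1)."""
    return (1.0 - theta) * p0(x) + theta * p1(x)

# One fixed i.i.d. proposal sample shared by all theta, so that the MC
# generator F_mc is a deterministic convex function of theta.
# Uniform proposal on [-10, 14]: the mixture mass outside is negligible.
rng = np.random.default_rng(0)
xs = rng.uniform(-10.0, 14.0, 100_000)
q = 1.0 / 24.0  # uniform proposal density on the truncated support

def F_mc(theta):
    """Monte Carlo estimate of the Bregman generator
    F(theta) = int m_theta(x) log m_theta(x) dx  (Shannon negentropy)."""
    mt = m(theta, xs)
    return np.mean(mt * np.log(mt) / q)

def grad_F_mc(theta, eps=1e-5):
    """Central finite-difference gradient of the MC generator."""
    return (F_mc(theta + eps) - F_mc(theta - eps)) / (2.0 * eps)

def bregman(theta1, theta2):
    """Bregman divergence B_F(theta1 : theta2) induced by the MC generator;
    it approximates KL(m_theta1 : m_theta2) for the mixture family."""
    return F_mc(theta1) - F_mc(theta2) - (theta1 - theta2) * grad_F_mc(theta2)
```

Since each term m_θ(x_j) log m_θ(x_j) is of the form t log t with t affine in θ, the MC generator is (almost surely) convex, so `bregman` is a genuine Bregman divergence on the parameter space and can be plugged directly into the standard Bregman toolbox, e.g. Bregman k-means for the clustering task mentioned above.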


Computational information geometry · Statistical manifold · Dually flat information geometry · Bregman generator · Stochastic Monte Carlo integration · Mixture family · Exponential family · Clustering



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Sony Computer Science Laboratories, Tokyo, Japan
  2. Sony Computer Science Laboratory, Paris, France
