Statistics and Computing, Volume 25, Issue 4, pp 767–780

The Poisson transform for unnormalised statistical models

  • Simon Barthelmé
  • Nicolas Chopin


Unlike standard statistical models, unnormalised statistical models specify the likelihood function only up to a constant. While such models are natural and popular, the lack of normalisation makes inference much more difficult. Extending classical results on the multinomial-Poisson transform (Baker, J. Royal Stat. Soc. 43(4):495–504, 1994), we show that inferring the parameters of an unnormalised model on a space \(\Omega\) can be mapped onto an equivalent problem of estimating the intensity of a Poisson point process on \(\Omega\). The unnormalised statistical model now specifies an intensity function that does not need to be normalised. Effectively, the normalisation constant may now be inferred as just another parameter, at no loss of information. The result can be extended to cover non-IID models, which include, for example, unnormalised models for sequences of graphs (dynamical graphs) or for sequences of binary vectors. As a consequence, we prove that unnormalised parametric inference in non-IID models can be turned into a semi-parametric estimation problem. Moreover, we show that the noise-contrastive estimation method of Gutmann and Hyvärinen (J. Mach. Learn. Res. 13(1):307–361, 2012) can be understood as an approximation of the Poisson transform, and can be extended to non-IID settings. We use our results to fit spatial Markov chain models of eye movements, where the Poisson transform allows us to turn a highly non-standard model into vanilla semi-parametric logistic regression.
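To make the idea of "the normalisation constant as just another parameter" concrete, here is a short derivation in our own notation (a sketch consistent with the abstract, not a quotation of the paper's theorem). Write \(f_\theta\) for the unnormalised density on \(\Omega\), \(Z(\theta) = \int_\Omega f_\theta(x)\,\mathrm{d}x\), and \(x_1,\dots,x_n\) for IID observations, so the log-likelihood is \(L(\theta) = \sum_{i=1}^n \log f_\theta(x_i) - n\log Z(\theta)\). Up to an additive constant, the Poisson-process log-likelihood with intensity \(n\,e^{\nu} f_\theta(x)\) is

\[ \mathcal{M}(\theta,\nu) = \sum_{i=1}^{n}\{\log f_\theta(x_i) + \nu\} - n\int_\Omega \exp\{\log f_\theta(x) + \nu\}\,\mathrm{d}x . \]

Setting \(\partial\mathcal{M}/\partial\nu = n - n e^{\nu} Z(\theta) = 0\) gives \(\nu^\star(\theta) = -\log Z(\theta)\), and substituting back yields \(\mathcal{M}(\theta,\nu^\star(\theta)) = L(\theta) - n\). Maximising \(\mathcal{M}\) jointly therefore recovers the maximum-likelihood \(\theta\) and estimates \(-\log Z\) as an ordinary parameter. Replacing the integral by \(m\) reference points drawn from a known density \(q\) and classifying data versus reference points gives a logistic regression with log-odds \(\nu + \log f_\theta(z) - \log q(z) - \log(m/n)\), which is the noise-contrastive estimation approximation. The Python sketch below is a hypothetical toy, not the authors' implementation: it assumes a one-dimensional unnormalised Gaussian model \(\log f_\theta(x) = -\theta x^2/2\) on \(\Omega = \mathbb{R}\) and an arbitrarily chosen reference density \(q = N(0, 2^2)\); the names `fit_nce_logistic`, `m_ratio` and `q_scale` are illustrative.

```python
import numpy as np


def fit_nce_logistic(x_data, m_ratio=5, q_scale=2.0, n_iter=30, seed=0):
    """Hypothetical toy: noise-contrastive fit of log f_theta(x) = -theta * x**2 / 2.

    Returns (theta_hat, nu_hat), where nu_hat estimates -log Z(theta).
    """
    rng = np.random.default_rng(seed)
    x_data = np.asarray(x_data, dtype=float)
    n = x_data.size
    m = m_ratio * n
    # Reference ("noise") points from the assumed known density q = N(0, q_scale^2).
    y_ref = rng.normal(0.0, q_scale, size=m)
    z = np.concatenate([x_data, y_ref])
    labels = np.concatenate([np.ones(n), np.zeros(m)])

    # Known offset in the log-odds: -log q(z) - log(m / n).
    log_q = -0.5 * (z / q_scale) ** 2 - np.log(q_scale * np.sqrt(2.0 * np.pi))
    offset = -log_q - np.log(m / n)

    # Linear predictor nu * 1 + theta * (-z^2 / 2), i.e. nu + log f_theta(z).
    X = np.column_stack([np.ones_like(z), -0.5 * z**2])

    def loglik(beta):
        eta = X @ beta + offset
        # Bernoulli log-likelihood; logaddexp avoids overflow in log(1 + e^eta).
        return np.sum(labels * eta - np.logaddexp(0.0, eta))

    # Start with exp(nu) * f_theta equal to q, so every initial log-odds is -log(m/n).
    beta = np.array([-np.log(q_scale * np.sqrt(2.0 * np.pi)), 1.0 / q_scale**2])
    for _ in range(n_iter):
        eta = X @ beta + offset
        p = 0.5 * (1.0 + np.tanh(0.5 * eta))  # numerically stable sigmoid
        grad = X.T @ (labels - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, grad)  # Newton (ascent) direction
        # Halve the step until the log-likelihood does not decrease.
        t, current = 1.0, loglik(beta)
        while loglik(beta + t * step) < current and t > 1e-8:
            t *= 0.5
        beta = beta + t * step
    nu_hat, theta_hat = beta
    return theta_hat, nu_hat
```

Applied to, say, 5000 draws from \(N(0,1)\), the estimates should come out close to \(\theta = 1\) and \(\nu = -\log\sqrt{2\pi} \approx -0.919\). Newton iterations with step halving are just one simple solver choice; any logistic-regression routine that accepts a fixed offset would serve the same purpose.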


Keywords (machine-generated, not supplied by the authors): Logistic regression; Point process; Normalisation constant; Markov chain model; Bregman divergence

References


  1. Baddeley, A., Berman, M., Fisher, N.I., Hardegen, A., Milne, R.K., Schuhmacher, D., Shah, R., Turner, R.: Spatial logistic regression and change-of-support in Poisson point processes. Electron. J. Stat. 4, 1151–1201 (2010)
  2. Baddeley, A., Coeurjolly, J.-F., Rubak, E., Waagepetersen, R.: Logistic regression for spatial Gibbs point processes. Biometrika 101(2), 377–392 (2014)
  3. Baker, S.G.: The multinomial-Poisson transformation. J. Royal Stat. Soc. 43(4), 495–504 (1994)
  4. Barthelmé, S., Trukenbrod, H., Engbert, R., Wichmann, F.: Modeling fixation locations using spatial point processes. J. Vis. 13(12), 1 (2013)
  5. Bengio, Y., Delalleau, O.: Justifying and generalizing contrastive divergence. Neural Comput. 21(6), 1601–1621 (2009)
  6. Caimo, A., Friel, N.: Bayesian inference for exponential random graph models. Soc. Netw. 33(1), 41–55 (2011)
  7. Engbert, R., Trukenbrod, H.A., Barthelmé, S., Wichmann, F.A.: Spatial statistics and attentional dynamics in scene viewing. J. Vis. 15(1), 14 (2014)
  8. Foulsham, T., Kingstone, A., Underwood, G.: Turning the world around: patterns in saccade direction vary with picture orientation. Vis. Res. 48(17), 1777–1790 (2008)
  9. Geyer, C.J.: Estimating normalizing constants and reweighting mixtures in Markov chain Monte Carlo. Technical Report 568, School of Statistics, University of Minnesota (1994)
  10. Girolami, M., Lyne, A.-M., Strathmann, H., Simpson, D., Atchade, Y.: Playing Russian roulette with intractable likelihoods. arXiv:1306.4032 (2013)
  11. Gu, M.G., Zhu, H.-T.: Maximum likelihood estimation for spatial models by Markov chain Monte Carlo stochastic approximation. J. Royal Stat. Soc. 63(2), 339–355 (2001)
  12. Gutmann, M., Hirayama, J.: Bregman divergence as general framework to estimate unnormalized statistical models. In: Cozman, F.G., Pfeffer, A. (eds.) Uncertainty in Artificial Intelligence, pp. 283–290. AUAI Press, Barcelona (2011)
  13. Gutmann, M.U., Hyvärinen, A.: Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics. J. Mach. Learn. Res. 13(1), 307–361 (2012)
  14. Hastie, T., Tibshirani, R., Friedman, J.H.: The Elements of Statistical Learning. Springer, New York (2003). corrected edition
  15. Hinton, G.E.: Training products of experts by minimizing contrastive divergence. Neural Comput. 14(8), 1771–1800 (2002)
  16. Kienzle, W., Franz, M.O., Schölkopf, B., Wichmann, F.A.: Center-surround patterns emerge as optimal predictors for human saccade targets. J. Vis. 9(5), 7 (2009)
  17. Kingman, J.F.C.: Poisson Processes (Oxford Studies in Probability). Oxford University Press, London (1993)
  18. Li, P., König, A.C.: Theory and applications of b-bit minwise hashing. Commun. ACM 54(8), 101–109 (2011)
  19. Minka, T.: Divergence Measures and Message Passing. Technical report, Microsoft Research Technical Report (2005)
  20. Mnih, A., Kavukcuoglu, K.: Learning word embeddings efficiently with noise-contrastive estimation. In: Burges, C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K. (eds.) Advances in Neural Information Processing Systems 26, pp. 2265–2273. Curran Associates Inc, Red Hook (2013)
  21. Mnih, A., Teh, Y.W.: A fast and simple algorithm for training neural probabilistic language models. In: Proceedings of the 29th International Conference on Machine Learning, pp. 1751–1758 (2012a)
  22. Mnih, A., Teh, Y.W.: A fast and simple algorithm for training neural probabilistic language models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012, Edinburgh, Scotland, UK, 26 June–1 July, 2012 (2012b)
  23. Møller, J., Pettitt, A.N., Reeves, R., Berthelsen, K.K.: An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants. Biometrika 93(2), 451–458 (2006)
  24. Murray, I., Ghahramani, Z., MacKay, D.: MCMC for doubly-intractable distributions. In: Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence (UAI) (2006)
  25. Pihlaja, M., Gutmann, M., Hyvärinen, A.: A family of computationally efficient and simple estimators for unnormalized statistical models. In: UAI 2010, Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence, Catalina Island, CA, USA, 8–11 July 2010, pp. 442–449 (2010)
  26. Salakhutdinov, R., Hinton, G.E.: Deep Boltzmann machines. In: International Conference on Artificial Intelligence and Statistics, pp. 448–455 (2009)
  27. Schölkopf, B., Smola, A.J.: Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, 1st edn. The MIT Press, Cambridge (2001)
  28. Tatler, B., Vincent, B.: The prominence of behavioural biases in eye guidance. Vis. Cogn. 17(6), 1029–1054 (2009)
  29. Van der Vaart, A.W.: Asymptotic Statistics. Cambridge University Press, Cambridge (2000)
  30. Walker, S.G.: Posterior sampling when the normalizing constant is unknown. Commun. Stat. 40(5), 784–792 (2011)
  31. Wang, C., Komodakis, N., Paragios, N.: Markov random field modeling, inference & learning in computer vision & image understanding: a survey. Comput. Vis. Image Underst. 117(11), 1610–1627 (2013)
  32. Welling, M., Teh, Y.W.: Bayesian learning via stochastic gradient Langevin dynamics. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11), pp. 681–688 (2011)
  33. Wood, S.: Generalized Additive Models: An Introduction with R (Chapman & Hall/CRC Texts in Statistical Science), 1st edn. Chapman and Hall/CRC, Boca Raton (2006)
  34. Wood, S.N.: Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. J. Royal Stat. Soc. 73(1), 3–36 (2011)

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. CNRS, GIPSA-Lab, Grenoble, France
  2. CREST-ENSAE and HEC, Paris, France
