Measure Transformer Semantics for Bayesian Machine Learning

  • Johannes Borgström
  • Andrew D. Gordon
  • Michael Greenberg
  • James Margetson
  • Jurgen Van Gael
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6602)


The Bayesian approach to machine learning amounts to inferring posterior distributions of random variables from a probabilistic model of how the variables are related (that is, a prior distribution) and a set of observations of variables. There is a trend in machine learning towards expressing Bayesian models as probabilistic programs. As a foundation for this kind of programming, we propose a core functional calculus with primitives for sampling prior distributions and observing variables. We define combinators for measure transformers, based on theorems in measure theory, and use these to give a rigorous semantics to our core calculus. The original features of our semantics include its support for discrete, continuous, and hybrid measures, and, in particular, for observations of zero-probability events. We compile our core language to a small imperative language that has a straightforward semantics via factor graphs, data structures that enable many efficient inference algorithms. We use an existing inference engine for efficient approximate inference of posterior marginal distributions, treating thousands of observations per second for large instances of realistic models.
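As a rough illustration of the sample/observe style of probabilistic programming the abstract describes, the sketch below writes a tiny discrete model and computes a posterior marginal by exhaustive enumeration. This is an assumption-laden stand-in, not the paper's Fun calculus or its measure-transformer semantics, and the function name is hypothetical; it only shows how conditioning on an observation reweights the prior.

```python
import itertools

def posterior_first_coin_heads():
    # Prior: sample two independent fair coins.
    # Observation: at least one coin came up heads.
    # Query: posterior probability that the first coin is heads.
    weight = {True: 0.0, False: 0.0}
    for c1, c2 in itertools.product([True, False], repeat=2):
        p = 0.25                  # probability of each joint outcome
        if c1 or c2:              # observe: condition on the evidence
            weight[c1] += p       # keep only consistent outcomes
    total = weight[True] + weight[False]
    return weight[True] / total   # normalize to get the posterior

print(posterior_first_coin_heads())  # → 0.6666...
```

Enumeration only works for finite discrete models; handling continuous and hybrid measures, and observations of zero-probability events, is precisely what the paper's measure-transformer combinators address.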



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Johannes Borgström (1)
  • Andrew D. Gordon (1)
  • Michael Greenberg (2)
  • James Margetson (1)
  • Jurgen Van Gael (1)

  1. Microsoft Research, USA
  2. University of Pennsylvania, USA
