Data-Driven Synthesis of Full Probabilistic Programs

  • Sarah Chasins
  • Phitchaya Mangpo Phothilimthana
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10426)


Probabilistic programming languages (PPLs) provide users with a clean syntax for concisely representing probabilistic processes and easy access to sophisticated built-in inference algorithms. Unfortunately, writing a PPL program by hand can be difficult for non-experts, because it requires extensive knowledge of statistics and deep insight into the data. To make the modeling process easier, we have created a tool that synthesizes PPL programs from relational datasets. Our synthesizer uses the input data to generate a program sketch, then applies simulated annealing to complete the sketch. We introduce a data-guided approach to the program mutation stage of simulated annealing; this innovation allows our tool to scale to synthesizing complete probabilistic programs from scratch. We find that our synthesizer produces accurate programs from 10,000-row datasets in 21 s on average.
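
The general idea of completing a sketch by simulated annealing with data-guided mutations can be illustrated with a toy example. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a single-column dataset, a fixed sketch of the form x ~ Gaussian(??mu, ??sigma), a log-likelihood score, and a linear cooling schedule. The names log_likelihood, data_guided_mutation, and anneal, the subsample size of 20, and the starting candidate are all hypothetical choices made for illustration.

```python
import math
import random
import statistics


def log_likelihood(data, mu, sigma):
    """Gaussian log-likelihood of the data under candidate hole values."""
    n = len(data)
    squared_error = sum((x - mu) ** 2 for x in data)
    return (-n * math.log(sigma * math.sqrt(2 * math.pi))
            - squared_error / (2 * sigma ** 2))


def data_guided_mutation(data, candidate, rng):
    """Mutate one hole by re-estimating it from a random subsample of the data.

    This is an assumed stand-in for a data-guided mutation: the proposal
    comes from the data rather than from blind random perturbation.
    """
    mu, sigma = candidate
    sample = rng.sample(data, k=min(20, len(data)))
    if rng.random() < 0.5:
        mu = statistics.fmean(sample)
    else:
        sigma = max(statistics.stdev(sample), 1e-6)  # keep sigma positive
    return (mu, sigma)


def anneal(data, steps=500, t0=1.0, seed=0):
    """Fill the holes (mu, sigma) of the assumed Gaussian sketch."""
    rng = random.Random(seed)
    current = (0.0, 1.0)  # arbitrary starting candidate (assumption)
    current_score = log_likelihood(data, *current)
    best, best_score = current, current_score
    for step in range(steps):
        # Linear cooling; the small constant avoids division by zero.
        temperature = t0 * (1 - step / steps) + 1e-9
        proposal = data_guided_mutation(data, current, rng)
        score = log_likelihood(data, *proposal)
        # Always accept improvements; accept worse candidates with a
        # probability that shrinks as the temperature drops.
        accept = score >= current_score or (
            rng.random() < math.exp((score - current_score) / temperature))
        if accept:
            current, current_score = proposal, score
            if score > best_score:
                best, best_score = proposal, score
    return best, best_score
```

Calling anneal on a list of at least two floats returns the best (mu, sigma) pair found and its score. Because each proposal is re-estimated from the data, the search in this toy setting tends to reach the high-likelihood region within a few steps, which is the intuition behind guiding mutations with the data.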



We thank Dawn Song and Rastislav Bodik for their thoughtful feedback. This work is supported in part by NSF Grants CCF–1139138, CCF–1337415, NSF ACI–1535191, and Graduate Research Fellowship DGE–1106400, a Microsoft Research PhD Fellowship, a grant from the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences Energy Frontier Research Centers program under Award Number FOA–0000619, and grants from DARPA FA8750–14–C–0011 and DARPA FA8750–16–2–0032, as well as gifts from Google, Intel, Mozilla, Nokia, and Qualcomm.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Sarah Chasins (1)
  • Phitchaya Mangpo Phothilimthana (1)
  1. University of California, Berkeley, USA
