# Data-Driven Synthesis of Full Probabilistic Programs

## Abstract

Probabilistic programming languages (PPLs) give users a clean syntax for concisely representing probabilistic processes, along with easy access to sophisticated built-in inference algorithms. Unfortunately, writing a PPL program by hand can be difficult for non-experts because it requires extensive knowledge of statistics and deep insight into the data. To make the modeling process easier, we have created a tool that synthesizes PPL programs from relational datasets. Our synthesizer leverages the input data to generate a program sketch, then applies simulated annealing to complete the sketch. We introduce a data-guided approach to the program mutation stage of simulated annealing; this innovation allows our tool to scale to synthesizing complete probabilistic programs from scratch. We find that our synthesizer produces accurate programs from 10,000-row datasets in 21 s on average.
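To make the idea of data-guided mutation inside simulated annealing concrete, the following is a minimal toy sketch, not the synthesizer described in this paper. It assumes a "sketch" with a single hole, the parameters of one Gaussian column, and it assumes that a mutation proposes new parameters estimated from a random subsample of the data rather than drawing them blindly. All names (`gaussian_log_likelihood`, `data_guided_mutation`, `anneal`) and all constants (subsample size, cooling rate, step count) are hypothetical choices made for illustration.

```python
import math
import random


def gaussian_log_likelihood(data, mu, sigma):
    """Log-likelihood of `data` under a Gaussian with mean mu and std sigma."""
    n = len(data)
    squared_error = sum((x - mu) ** 2 for x in data)
    return (-n * math.log(sigma * math.sqrt(2 * math.pi))
            - squared_error / (2 * sigma ** 2))


def data_guided_mutation(data, rng, sample_size=50):
    """Propose (mu, sigma) estimated from a random subsample of the data.

    Assumption: this stands in for a data-guided mutation; a blind mutation
    would instead perturb the parameters without looking at the data.
    """
    sample = rng.sample(data, min(sample_size, len(data)))
    mu = sum(sample) / len(sample)
    variance = sum((x - mu) ** 2 for x in sample) / len(sample)
    return mu, max(math.sqrt(variance), 1e-6)  # keep sigma strictly positive


def anneal(data, steps=200, start_temp=1.0, cooling=0.97, seed=0):
    """Simulated annealing over completions of the one-hole toy sketch."""
    rng = random.Random(seed)
    current = (0.0, 1.0)  # arbitrary initial completion of the hole
    current_score = gaussian_log_likelihood(data, *current)
    best, best_score = current, current_score
    temp = start_temp
    for _ in range(steps):
        candidate = data_guided_mutation(data, rng)
        candidate_score = gaussian_log_likelihood(data, *candidate)
        delta = candidate_score - current_score
        # Always accept improvements; accept worse candidates with a
        # probability that shrinks as the temperature cools.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current, current_score
        temp *= cooling
    return best, best_score


if __name__ == "__main__":
    data_rng = random.Random(1)
    column = [data_rng.gauss(5.0, 2.0) for _ in range(1000)]
    (mu, sigma), score = anneal(column)
    print(f"mu={mu:.2f} sigma={sigma:.2f} log-likelihood={score:.1f}")
```

In this toy setting every proposal already sits near a plausible region of the parameter space because it is computed from the data, which is the intuition behind letting the data steer the mutation step. A full synthesizer would also mutate program structure, which this sketch does not attempt.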

## Notes

### Acknowledgements

We thank Dawn Song and Rastislav Bodik for their thoughtful feedback. This work is supported in part by NSF Grants CCF–1139138, CCF–1337415, NSF ACI–1535191, and Graduate Research Fellowship DGE–1106400, a Microsoft Research PhD Fellowship, a grant from the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences Energy Frontier Research Centers program under Award Number FOA–0000619, and grants from DARPA FA8750–14–C–0011 and DARPA FA8750–16–2–0032, as well as gifts from Google, Intel, Mozilla, Nokia, and Qualcomm.
