Probabilistic Program Analysis

  • Matthew B. Dwyer
  • Antonio Filieri
  • Jaco Geldenhuys
  • Mitchell Gerrard
  • Corina S. Păsăreanu
  • Willem Visser
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10223)


This paper provides a survey of recent work on adapting techniques for program analysis to compute probabilistic characterizations of program behavior. We survey how the frameworks of data flow analysis and symbolic execution have incorporated information about input probability distributions to quantify the likelihood of properties of program states. We identify themes that relate and distinguish a variety of techniques that have been developed over the past 15 years in this area. In doing so, we point out opportunities for future research that builds on the strengths of different techniques.
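As a rough illustration of what quantifying the likelihood of a program property can mean, the sketch below counts the inputs that satisfy each path condition of a tiny program and turns the counts into probabilities. It is not taken from the paper: the example program, the names `PATH_CONDITIONS` and `path_probabilities`, and the assumption of independent, uniformly distributed integer inputs over a small bounded domain are all hypothetical, and brute-force enumeration stands in for the model counting techniques a real analysis would use.

```python
from fractions import Fraction
from itertools import product

# Hypothetical program under analysis (not from the paper):
#   if x + y > 10:  outcome "high"
#   elif x > y:     outcome "mid"
#   else:           outcome "low"
# Each outcome is described by the path condition that leads to it.
PATH_CONDITIONS = {
    "high": lambda x, y: x + y > 10,
    "mid": lambda x, y: x + y <= 10 and x > y,
    "low": lambda x, y: x + y <= 10 and x <= y,
}


def path_probabilities(domain):
    """Return the probability of each path, assuming x and y are drawn
    independently and uniformly from `domain` (an illustrative assumption)."""
    values = list(domain)
    total = len(values) ** 2
    counts = {name: 0 for name in PATH_CONDITIONS}
    # Brute-force model counting: enumerate every input pair and record
    # which path condition it satisfies.
    for x, y in product(values, repeat=2):
        for name, condition in PATH_CONDITIONS.items():
            if condition(x, y):
                counts[name] += 1
    return {name: Fraction(count, total) for name, count in counts.items()}


probs = path_probabilities(range(10))
for name, p in probs.items():
    print(name, p)
```

Because the path conditions partition the input space, the resulting probabilities sum to one; a non-uniform input distribution would require weighting each input instead of simply counting it.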


Data flow analysis, Symbolic execution, Abstract interpretation, Model checking, Probabilistic program, Markov decision processes



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Matthew B. Dwyer (1)
  • Antonio Filieri (2)
  • Jaco Geldenhuys (4)
  • Mitchell Gerrard (1)
  • Corina S. Păsăreanu (3)
  • Willem Visser (4)

  1. University of Nebraska – Lincoln, Lincoln, USA
  2. Imperial College London, London, UK
  3. Carnegie Mellon Silicon Valley and NASA Ames Research Center, Santa Clara, USA
  4. University of Stellenbosch, Stellenbosch, South Africa
