Formal Methods in System Design, Volume 53, Issue 2, pp 189–220

Learning analysis strategies for octagon and context sensitivity from labeled data generated by static analyses

  • Kihong Heo
  • Hakjoo Oh
  • Hongseok Yang


We present a method for automatically learning an effective strategy for clustering variables for the Octagon analysis from a given codebase. The learned strategy works as a preprocessor of Octagon. Given a program to be analyzed, the strategy is first applied to the program and clusters the variables in it. We then run a partial variant of the Octagon analysis that tracks relationships among variables within the same cluster, but not across different clusters. The notable aspect of our learning method is that, although it is based on supervised learning, it does not require manually labeled data. The method does not ask a human to indicate which pairs of program variables in the given codebase should be tracked. Instead, it uses the impact pre-analysis for Octagon from our previous work and automatically labels variable pairs in the codebase as positive or negative. We implemented our method on top of a static buffer-overflow detector for C programs and tested it against open source benchmarks. Our experiments show that the partial Octagon analysis with the learned strategy scales up to 100 KLOC and is 33× faster than the one with the impact pre-analysis (which itself is significantly faster than the original Octagon analysis), while increasing false alarms by only 2%. The general idea behind our method is applicable to other types of static analyses as well. We demonstrate that our method is also effective for learning a strategy for context-sensitivity of interval analysis.
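The preprocessing step described above can be illustrated with a minimal sketch. It assumes that clusters are formed as connected components of the variable pairs a classifier marks as positive; the abstract does not state how clusters are built, so this is one plausible reading rather than the authors' algorithm. The names `cluster_variables`, `tracked_pairs`, and `toy_classifier` are hypothetical, and the toy classifier is a hand-written lookup standing in for a learned model.

```python
from itertools import combinations


def cluster_variables(variables, predict_related):
    """Group variables so that every pair predicted as related shares a cluster.

    Assumption: clusters are connected components of the positive pairs.
    """
    parent = {v: v for v in variables}

    def find(v):
        # Follow parent links to the representative, halving the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in combinations(variables, 2):
        if predict_related(a, b):
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for v in variables:
        clusters.setdefault(find(v), set()).add(v)
    return sorted(sorted(c) for c in clusters.values())


def tracked_pairs(clusters):
    """Pairs a partial relational analysis would track: same-cluster pairs only."""
    return {frozenset(p) for c in clusters for p in combinations(c, 2)}


# Hypothetical stand-in for the learned classifier: a fixed set of positive pairs.
POSITIVE = {frozenset(("i", "n")), frozenset(("n", "len")), frozenset(("p", "q"))}


def toy_classifier(a, b):
    return frozenset((a, b)) in POSITIVE


clusters = cluster_variables(["i", "n", "len", "p", "q", "tmp"], toy_classifier)
print(clusters)
print(len(tracked_pairs(clusters)))
```

Under this assumption, a pair such as `i` and `len` ends up tracked even though the classifier did not mark it directly, because both variables are linked through `n`. Pairs that span clusters, such as `i` and `p`, are never tracked, which is where the cost saving over a full relational analysis would come from.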


Static analysis · Machine learning · Relational analysis · Context-sensitivity



This work was supported by the Samsung Research Funding & Incubation Center of Samsung Electronics under Project Number SRFC-IT1701-09. It was also supported by an Institute for Information & Communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (No. 2015-0-00565, Development of Vulnerability Discovery Technologies for IoT Software Security; No. 2017-0-00184, Self-Learning Cyber Immune Technology Development).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2017

Authors and Affiliations

  1. University of Pennsylvania, Philadelphia, USA
  2. Korea University, Seoul, Korea
  3. KAIST, Daejeon, Korea
