Static Value Analysis of Python Programs by Abstract Interpretation

  • Aymeric Fromherz
  • Abdelraouf Ouadjaout
  • Antoine Miné
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10811)

Abstract

We propose a static analysis by abstract interpretation for a significant subset of Python to infer variable values, run-time errors, and uncaught exceptions. Python is a high-level language with dynamic typing, a class-based object system, complex control structures such as generators, and a large library of builtin objects. This makes static reasoning on Python programs challenging. The control flow is highly dependent on the type of values, which we thus infer accurately.
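To make the dependence of control flow on types concrete, here is a small illustrative Python fragment. It is not taken from the paper; the names `describe`, `Vec`, and `total` are hypothetical. It shows that both an explicit branch and the meaning of an operator such as `+` are decided by the run-time type of the operands, which is why an analysis would need type information before it can follow the control flow.

```python
def describe(x):
    # The branch taken depends only on the run-time type of x.
    if isinstance(x, int):
        return x + 1
    # For any other type, this call fails with AttributeError
    # unless the value happens to provide an upper() method.
    return x.upper()


class Vec:
    """Hypothetical user class that overloads `+`."""

    def __init__(self, n):
        self.n = n

    def __add__(self, other):
        return Vec(self.n + other.n)


def total(a, b):
    # `a + b` dispatches on the type of `a`: integer addition,
    # string concatenation, or a user-defined __add__ method.
    return a + b
```

Calling `total(1, 2)`, `total("a", "b")`, and `total(Vec(1), Vec(2))` runs three different pieces of code from the same source line, and `describe(1.5)` raises an uncaught `AttributeError`.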

As Python lacks a formal specification, we first present a concrete collecting semantics of reachable program states. We then propose a non-relational flow-sensitive type and value analysis based on simple abstract domains for each type, and handle non-local control such as exceptions through continuations. We show how to infer relational numeric invariants by leveraging the type information we gather. Finally, we propose a relational abstraction of generators to count the number of available elements and prove that no StopIteration exception is raised.
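As a rough illustration of what a simple non-relational value domain can look like, the sketch below implements the classic interval domain with join and widening, and uses it to analyze a counting loop. This is a minimal sketch under stated assumptions, not the domains or the implementation used in the paper; `Interval` and `analyze_counter` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Interval:
    """Abstract value: every number between lo and hi (inclusive)."""
    lo: float
    hi: float

    def join(self, other):
        # Least upper bound: smallest interval containing both.
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

    def widen(self, other):
        # Widening: any bound that is still growing jumps to infinity,
        # which forces the fixpoint iteration to terminate.
        lo = self.lo if other.lo >= self.lo else -math.inf
        hi = self.hi if other.hi <= self.hi else math.inf
        return Interval(lo, hi)

    def add(self, other):
        # Abstract counterpart of numeric addition.
        return Interval(self.lo + other.lo, self.hi + other.hi)


def analyze_counter():
    """Abstractly run `i = 0; while <unknown>: i = i + 1`.

    Returns the interval inferred for i at the loop head. The loop
    condition is treated as unknown, so no upper bound is recovered.
    """
    state = Interval(0, 0)           # i = 0
    one = Interval(1, 1)
    while True:
        after_body = state.add(one)  # i = i + 1
        new_state = state.widen(state.join(after_body))
        if new_state == state:       # fixpoint reached
            return state
        state = new_state
```

In this sketch the analysis infers that `i` stays non-negative, which is sound but imprecise; a relational domain could additionally track how `i` relates to other variables.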

Our prototype implementation is still under heavy development: it does not support some Python features, such as recursion or the compile builtin, and it handles only a small part of the builtin objects and standard library. Nevertheless, we present preliminary experimental results on analyzing actual, if small, Python code from a benchmarking application and a regression test suite.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Carnegie Mellon University, Pittsburgh, USA
  2. Sorbonne Université, CNRS, Laboratoire d’Informatique de Paris 6, LIP6, Paris, France
