
Acta Informatica, Volume 53, Issue 4, pp 327–356

Symbolic automata for representing big code

  • Hila Peleg
  • Sharon Shoham
  • Eran Yahav
  • Hongseok Yang
Original Article

Abstract

Analysis of massive codebases (“big code”) presents an opportunity for drawing insights about programming practice and enabling code reuse. One of the main challenges in analyzing big code is finding a representation that captures sufficient semantic information, can be constructed efficiently, and is amenable to meaningful comparison operations. We present a formal framework for representing code in large codebases. In our framework, the semantic descriptor for each code snippet is a partial temporal specification that captures the sequences of method invocations on an API. The main idea is to represent partial temporal specifications as symbolic automata—automata where transitions may be labeled by variables, and a variable can be substituted by a letter, a word, or a regular language. Using symbolic automata, we construct an abstract domain for static analysis of big code, capturing both the partialness of a specification and the precision of a specification. We show interesting relationships between lattice operations of this domain and common operators for manipulating partial temporal specifications, such as building a more informative specification by consolidating two partial specifications, and comparing partial temporal specifications.
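To make the central idea concrete, the following is a minimal sketch of an automaton whose transitions may be labeled by variables, together with substitution of a variable by a word. It is not the article's formal definition: the class name `SymbolicAutomaton`, the convention that variables start with `?`, and the method names are all assumptions made for illustration, and substitution by a regular language is left out.

```python
from itertools import count


class SymbolicAutomaton:
    """NFA whose transition labels are API method names or variables (hypothetical sketch)."""

    def __init__(self, initial, accepting, transitions):
        self.initial = initial
        self.accepting = set(accepting)
        self.transitions = set(transitions)  # (source, label, target) triples

    @staticmethod
    def is_variable(label):
        # Assumed convention for this sketch only: variables start with "?".
        return label.startswith("?")

    def substitute(self, variable, word):
        """Return a new automaton where each `variable` edge becomes a path spelling `word`."""
        if not word:
            raise ValueError("this sketch only handles non-empty words")
        fresh = count()
        result = set()
        for (src, label, dst) in self.transitions:
            if label != variable:
                result.add((src, label, dst))
                continue
            # Chain fresh intermediate states between src and dst.
            current = src
            for letter in word[:-1]:
                nxt = ("fresh", variable, next(fresh))
                result.add((current, letter, nxt))
                current = nxt
            result.add((current, word[-1], dst))
        return SymbolicAutomaton(self.initial, self.accepting, result)

    def accepts(self, word):
        """Standard NFA acceptance; unresolved variable edges are ignored."""
        states = {self.initial}
        for letter in word:
            states = {
                dst
                for (src, label, dst) in self.transitions
                if src in states and label == letter and not self.is_variable(label)
            }
        return bool(states & self.accepting)


# Partial specification: open, then some unknown behavior ?x, then close.
partial = SymbolicAutomaton(0, {3}, {(0, "open", 1), (1, "?x", 2), (2, "close", 3)})
concrete = partial.substitute("?x", ["read", "write"])
print(concrete.accepts(["open", "read", "write", "close"]))  # True
print(partial.accepts(["open", "read", "write", "close"]))   # False
```

In this sketch the variable `?x` stands for the unknown part of a partial specification; replacing it with the word `read write` yields a more informative, fully concrete specification.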

Notes

Acknowledgments

The research was partially supported by The Israeli Science Foundation (Grant No. 965/10) and EU’s FP7 Program/ERC Agreement No. 615688. Yang was partially supported by EPSRC. Peleg was partially supported by EU’s FP7 Program/ERC Agreement No. 321174. Shoham was partially supported by BSF Grant No. 2012259.

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Hila Peleg: Tel Aviv University, Tel Aviv, Israel
  • Sharon Shoham: Tel Aviv-Yaffo Academic College, Tel Aviv, Israel
  • Eran Yahav: Technion, Haifa, Israel
  • Hongseok Yang: University of Oxford, Oxford, UK
