Formal Methods in System Design

, Volume 50, Issue 2–3, pp 249–288 | Cite as

Z3str2: an efficient solver for strings, regular expressions, and length constraints

  • Yunhui Zheng
  • Vijay Ganesh
  • Sanu Subramanian
  • Omer Tripp
  • Murphy Berzish
  • Julian Dolby
  • Xiangyu Zhang
Article

Abstract

In recent years, string solvers have become an essential component in many formal verification, security analysis, and bug-finding tools. Such solvers typically support a theory of string equations, the length function, and the regular-expression membership predicate. These enable considerable expressive power, which comes at the cost of slow solving time, and in some cases even non-termination. We present three techniques, designed for word-based SMT string solvers, to mitigate these problems: (1) detecting overlapping variables, which is essential to avoiding common cases of non-termination; (2) pruning of the search space via bi-directional integration between the string and integer theories, enabling new cross-domain heuristics; and (3) a binary search based heuristic, allowing the procedure to skip unnecessary string length queries and converge on consistent length assignments faster for large strings. We have implemented above techniques atop the Z3-str solver, resulting in a significantly more robust and efficient solver, dubbed Z3str2, for the quantifier-free theory of string equations, the regular-expression membership predicate, and linear arithmetic over the length function. We report on a series of experiments over four sets of challenging real-world benchmarks, where we compare Z3str2 with five different string solvers: S3, CVC4, Kaluza, PISA and Stranger. Each of these tools utilizes a different solving strategy and/or string representation (based e.g. on words, bit vectors or automata). The results point to the efficacy of our proposed techniques, which yield dramatic performance improvement. We also demonstrate performance improvements enabled by Z3str2 in the context of symbolic execution for string-manipulating programs. We observe that the techniques presented here are of broad applicability, and can be integrated into other string solvers to improve their performance.

Keywords

String constraint solver SMT solver String analysis 

References

  1. 1.
    Z3str2 String Constraint Solver. https://sites.google.com/site/z3strsolver/
  2. 2.
  3. 3.
  4. 4.
  5. 5.
    Personal communications with the Stranger team. 2015Google Scholar
  6. 6.
    Abdulla PA, Atig MF, Chen Y-F, Holík L, Rezine A, Rümmer P, Stenman J (2014) String constraints for verification. In: Proceedings of the 26th international conference on computer aided verification, CAV’14, pp 150–166Google Scholar
  7. 7.
    Barrett C, Fontaine P, Tinelli C (2016) The Satisfiability Modulo Theories Library (SMT-LIB). http://www.SMT-LIB.org
  8. 8.
    Bjørner N, Tillmann N, Voronkov A (2009) Path feasibility analysis for string-manipulating programs. In: Proceedings of the 15th international conference on tools and algorithms for the construction and analysis of systems, TACAS ’09, pp 307–321Google Scholar
  9. 9.
    Cristian C, Daniel D, Dawson E (2008) Klee: Unassisted and automatic generation of high-coverage tests for complex systems programs. In: Proceedings of the 8th USENIX conference on operating systems design and implementation, OSDI’08. USENIX Association, Berkeley, pp 209–224Google Scholar
  10. 10.
    Chipounov V, Kuznetsov V, Candea G (2011) S2e: a platform for in-vivo multi-path analysis of software systems. In: Proceedings of the sixteenth international conference on architectural support for programming languages and operating systems. ASPLOS XVI. ACM, New York, pp 265–278Google Scholar
  11. 11.
    Christensen AS, Møller A, Schwartzbach MI (2003) Precise analysis of string expressions. In: Proceedings of the 10th international conference on static analysis, SAS’03, pp 1–18Google Scholar
  12. 12.
    De Moura L, Bjørner N (2008) Z3: an efficient smt solver. In: Proceedings of the theory and practice of software, 14th international conference on Tools and algorithms for the construction and analysis of systems, TACAS’08, pp 337–340Google Scholar
  13. 13.
    Ganesh V, Dill DL (2007) A decision procedure for bit-vectors and arrays. In: Proceedings of the 19th international conference on computer aided verification, CAV’07, pp 519–531Google Scholar
  14. 14.
    Ganesh V, Minnes M, Solar-Lezama A, Rinard M (2012) Word equations with length constraints: what’s decidable? In: HVC’12Google Scholar
  15. 15.
    Ghosh I, Shafiei N, Li G, Chiang W-F (2013) JST: an automatic test generation tool for industrial java applications with strings. In: Proceedings of the 2013 international conference on software engineering, ICSE ’13, pp 992–1001Google Scholar
  16. 16.
    Hooimeijer P, Weimer W (2010) Solving string constraints lazily. In: Proceedings of the IEEE/ACM international conference on automated software engineering, ASE ’10, pp 377–386Google Scholar
  17. 17.
    Hopcroft JE, Motwani R, Ullman JD (2007) Introduction to automata theory, languages, and computation. Pearson/Addison Wesley, ReadingMATHGoogle Scholar
  18. 18.
    Jeon J, Qiu X, Fetter-Degges J, Foster JS, Solar-Lezama A (2016) Synthesizing framework models for symbolic execution. In: Proceedings of the 38th international conference on software engineering, ICSE ’16. ACM, New York, pp 156–167Google Scholar
  19. 19.
    Jeż A (2013) Recompression: word equations and beyond. In: Developments in language theory, Lecture Notes in Computer Science, pp 12–26Google Scholar
  20. 20.
    Karhumäki J, Mignosi F, Plandowski W (2000) The expressibility of languages and relations by word equations. J ACM 47(3):483–505MathSciNetCrossRefMATHGoogle Scholar
  21. 21.
    Kausler S (2014) Evaluation of string constraint solvers using dynamic symbolic execution. Master’s Thesis, Boise State UniversityGoogle Scholar
  22. 22.
    Kausler S, Sherman E (2014) Evaluation of string constraint solvers in the context of symbolic execution. In: Proceedings of the 29th ACM/IEEE international conference on automated software engineering, ASE ’14. ACM, New York, pp 259–270Google Scholar
  23. 23.
    Kiezun A, Ganesh V, Guo PJ, Hooimeijer P, Ernst MD (2009) Hampi: a solver for string constraints. In: Proceedings of the eighteenth international symposium on software testing and analysis, ISSTA ’09, pp 105–116Google Scholar
  24. 24.
    Li G, Andreasen E, Ghosh I (2014) SymJS: automatic symbolic testing of javascript web applications. In: Proceedings of the 22Nd ACM SIGSOFT international symposium on foundations of software engineering, FSE 2014, pp 449–459Google Scholar
  25. 25.
    Li G, Ghosh I (2013) PASS: string solving with parameterized array and interval automaton. In: 9th international Haifa verification conference, HVC ’13, pp 15–31Google Scholar
  26. 26.
    Liang T, Reynolds A, Tinelli C, Barrett C, Deters M (2014) A dpll(t) theory solver for a theory of strings and regular expressions. In: Proceedings of the 26th international conference on computer aided verification, CAV’14, pp 646–662Google Scholar
  27. 27.
    Makanin GS (1977) The problem of solvability of equations in a free semigroup. Math Sbornik 103:147–236 (1977). English transl. in Math USSR Sbornik 32Google Scholar
  28. 28.
    Matiyasevich Y (2007) Word equations, Fibonacci numbers, and Hilbert’s tenth problem. In: Workshop on Fibonacci wordsGoogle Scholar
  29. 29.
    Plandowski W (2004) Satisfiability of word equations with constants is in pspace. J ACM 51(3):483–496MathSciNetCrossRefMATHGoogle Scholar
  30. 30.
    Plandowski W (2006) An efficient algorithm for solving word equations. In: Proceedings of the 38th annual ACM symposium on theory of computing, STOC ’06, pp 467–476Google Scholar
  31. 31.
    Qu X, Robinson B (2011) A case study of concolic testing tools and their limitations. In: 2011 international symposium on empirical software engineering and measurement (ESEM). IEEE Computer Society, pp 117–126Google Scholar
  32. 32.
    Redelinghuys G, Visser W, Geldenhuys J (2012) Symbolic execution of programs with strings. In: Proceedings of the South African Institute for computer scientists and information technologists conference, SAICSIT ’12, pp 139–148Google Scholar
  33. 33.
    Saxena P, Akhawe D, Hanna S, Mao F, McCamant S, Song D (2010) A symbolic execution framework for javascript. In: Proceedings of the 2010 IEEE symposium on security and privacy, SP ’10, pp 513–528Google Scholar
  34. 34.
    Schulz K (1992) Makanin’s algorithm for word equations-two improvements and a generalization. In: Schulz K (ed) Word equations and related topics, Lecture Notes in Computer Science, vol 572. Springer, Berlin, pp 85–150CrossRefGoogle Scholar
  35. 35.
    Tateishi T, Pistoia M, Tripp O (2013) Path- and index-sensitive string analysis based on monadic second-order logic. ACM Trans Softw Eng Methodol 22(4):33:1–33:33CrossRefGoogle Scholar
  36. 36.
    Trinh M-T, Chu D-H, Jaffar J (2014) S3: a symbolic string solver for vulnerability detection in web applications. In: Proceedings of the 2014 ACM SIGSAC conference on computer and communications security, CCS ’14, pp 1232–1243Google Scholar
  37. 37.
    Xie X, Liu Y, Le W, Li X, Chen H (2015) S-looper: automatic summarization for multipath string loops. In: Proceedings of the 2015 international symposium on software testing and analysis, ISSTA 2015. ACM, New York, pp 188–198Google Scholar
  38. 38.
    Yu F, Alkhalaf M, Bultan T (2010) Stranger: an automata-based string analysis tool for php. In: Proceedings of the 16th international conference on tools and algorithms for the construction and analysis of systems, TACAS’10, pp 154–157Google Scholar
  39. 39.
    Yu F, Bultan T, Ibarra OH (2009) Symbolic string verification: combining string analysis and size analysis. In: Proceedings of the 15th international conference on tools and algorithms for the construction and analysis of systems, TACAS ’09, pp 322–336Google Scholar
  40. 40.
    Zheng Y, Ganesh V, Subramanian S, Tripp O, Dolby J, Zhang X (2015) Effective search-space pruning for solvers of string equations, regular expressions and length constraints. Technical report. https://sites.google.com/site/z3strsolver/publications
  41. 41.
    Zheng Y, Zhang X, Ganesh V (2013) Z3-str: a z3-based string solver for web application analysis. In: Proceedings of the 2013 9th joint meeting on foundations of software engineering, ESEC/FSE 2013, pp 114–124Google Scholar

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. 1.IBM T. J. Watson Research CenterYorktown HeightsUSA
  2. 2.GoogleMountain ViewUSA
  3. 3.University of WaterlooWaterlooCanada
  4. 4.Purdue UniversityWest LafayetteUSA

Personalised recommendations