Skip to main content

Twinning Automata and Regular Expressions for String Static Analysis

Part of the Lecture Notes in Computer Science book series (LNTCS,volume 12597)


In this paper we formalize \(\textsc {Tarsis}\), a new abstract domain based on the abstract interpretation theory that approximates string values through finite state automata. The main novelty of \(\textsc {Tarsis}\) is that it works over an alphabet of strings instead of single characters. On the one hand, such an approach requires a more complex and refined definition of the widening operator, and the abstract semantics of string operators. On the other hand, it is in position to obtain strictly more precise results than state-of-the-art approaches. We implemented a prototype of \(\textsc {Tarsis}\), and we applied it to some case studies taken from some of the most popular Java libraries manipulating string values. The experimental results confirm that \(\textsc {Tarsis}\) is in position to obtain strictly more precise results than existing analyses.


  • String analysis
  • Static analysis
  • Abstract interpretation

This is a preview of subscription content, access via your institution.

Buying options

USD   29.95
Price excludes VAT (USA)
  • DOI: 10.1007/978-3-030-67067-2_13
  • Chapter length: 24 pages
  • Instant PDF download
  • Readable on all devices
  • Own it forever
  • Exclusive offer for individuals only
  • Tax calculation will be finalised during checkout
USD   84.99
Price excludes VAT (USA)
  • ISBN: 978-3-030-67067-2
  • Instant PDF download
  • Readable on all devices
  • Own it forever
  • Exclusive offer for individuals only
  • Tax calculation will be finalised during checkout
Softcover Book
USD   109.99
Price excludes VAT (USA)
Fig. 1.
Fig. 2.
Fig. 3.
Fig. 4.
Fig. 5.
Fig. 6.
Fig. 7.
Fig. 8.


  1. 1.

  2. 2.

    The set \(\varSigma ^*_\mathsf {P}\) can be easily computed collecting the constant strings in \(\mathsf {P}\) by visiting its abstract syntax tree and then computing their substrings.

  3. 3.

    For the sake of readability, in the program examples presented in this paper the plus operation between strings corresponds to the string concatenation.

  4. 4.

    The abstract semantics of concat does not add any further important technical detail to the paper hence it is not reported.

  5. 5.

    Note that this is a decidable check since \(\mathtt{A}\) and \(\mathtt{A}'\) are cycle-free, otherwise the interval \([-1, +\infty ]\) would be returned in the first case.

  6. 6.

    \(\mathsf {\textsc {Fa}_{/\equiv }}\) source code:

  7. 7.

    \(\textsc {Tarsis}\) source code:


  1. Abdulla, P.A., et al.: Efficient handling of string-number conversion. In: Donaldson, A.F., Torlak, E. (eds.) Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2020, London, UK, 15–20 June 2020, pp. 943–957. ACM (2020).

  2. Abdulla, et al.: String constraints for verification. In: Biere, A., Bloem, R. (eds.) Computer Aided Verification - 26th International Conference, CAV 2014, Held as Part of the Vienna Summer of Logic, VSL 2014, Vienna, Austria, 18–22 July 2014. Proceedings. Lecture Notes in Computer Science, vol. 8559, pp. 150–166. Springer (2014).

  3. Abdulla, P.A., Atig, M.F., Diep, B.P., Holík, L., Janku, P.: Chain-free string constraints. In: Chen, Y., Cheng, C., Esparza, J. (eds.) Automated Technology for Verification and Analysis - 17th International Symposium, ATVA 2019, Taipei, Taiwan, 28–31 October 2019, Proceedings. Lecture Notes in Computer Science, vol. 11781, pp. 277–293. Springer (2019).

  4. Almashfi, N., Lu, L.: Precise string domain for analyzing javascript arrays and objects. In: 2020 3rd International Conference on Information and Computer Technologies (ICICT), pp. 17–23 (2020)

    Google Scholar 

  5. Amadini, R., Gange, G., Stuckey, P.J.: Dashed strings for string constraint solving. Artif. Intell. 289, 103368 (2020).

    CrossRef  MathSciNet  Google Scholar 

  6. Arceri, V., Maffeis, S.: Abstract domains for type juggling. Electron. Notes Theor. Comput. Sci. 331, 41–55 (2017).

    CrossRef  MATH  Google Scholar 

  7. Arceri, V., Mastroeni, I.: A sound abstract interpreter for dynamic code. In: Hung, C., Cerný, T., Shin, D., Bechini, A. (eds.) SAC 2020: The 35th ACM/SIGAPP Symposium on Applied Computing, online event, [Brno, Czech Republic], 30 March–3 April 2020. pp. 1979–1988. ACM (2020).

  8. Arceri, V., Mastroeni, I., Xu, S.: Static analysis for Ecmascript string manipulation programs. Appl. Sci. 10, 3525 (2020).

    CrossRef  Google Scholar 

  9. Arceri, V., Olliaro, M., Cortesi, A., Mastroeni, I.: Completeness of abstract domains for string analysis of javascript programs. In: Hierons, R.M., Mosbah, M. (eds.) Theoretical Aspects of Computing - ICTAC 2019–16th International Colloquium, Hammamet, Tunisia, 31 October–4 November 2019, Proceedings. Lecture Notes in Computer Science, vol. 11884, pp. 255–272. Springer (2019).

  10. Bartzis, C., Bultan, T.: Widening arithmetic automata. In: Alur, R., Peled, D.A. (eds.) Computer Aided Verification, 16th International Conference, CAV 2004, Boston, MA, USA, 13–17 July 2004, Proceedings. Lecture Notes in Computer Science, vol. 3114, pp. 321–333. Springer (2004).

  11. Chen, T., Hague, M., Lin, A.W., Rümmer, P., Wu, Z.: Decision procedures for path feasibility of string-manipulating programs with complex operations. Proc. ACM Program. Lang. 3(POPL), 49:1–49:30 (2019).

  12. Choi, T., Lee, O., Kim, H., Doh, K.: A practical string analyzer by the widening approach. In: Kobayashi, N. (ed.) Programming Languages and Systems, 4th Asian Symposium, APLAS 2006, Sydney, Australia, 8–10 November 2006, Proceedings. Lecture Notes in Computer Science, vol. 4279, pp. 374–388. Springer (2006).

  13. Christensen, A.S., Møller, A., Schwartzbach, M.I.: Precise analysis of string expressions. In: Cousot, R. (ed.) Static Analysis, 10th International Symposium, SAS 2003, San Diego, CA, USA, 11–13 June 2003, Proceedings. Lecture Notes in Computer Science, vol. 2694, pp. 1–18. Springer (2003).

  14. Cortesi, A., Olliaro, M.: M-string segmentation: A refined abstract domain for string analysis in C programs. In: Pang, J., Zhang, C., He, J., Weng, J. (eds.) 2018 International Symposium on Theoretical Aspects of Software Engineering, TASE 2018, Guangzhou, China, 29–31 August 2018, pp. 1–8. IEEE Computer Society (2018).

  15. Cortesi, A., Zanioli, M.: Widening and narrowing operators for abstract interpretation. Comput. Lang. Syst. Struct. 37(1), 24–42 (2011).

    CrossRef  MATH  Google Scholar 

  16. Costantini, G., Ferrara, P., Cortesi, A.: A suite of abstract domains for static analysis of string values. Softw. Pract. Exp. 45(2), 245–287 (2015).

    CrossRef  Google Scholar 

  17. Cousot, P., Cousot, R.: Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In: Graham, R.M., Harrison, M.A., Sethi, R. (eds.) Conference Record of the Fourth ACM Symposium on Principles of Programming Languages, Los Angeles, California, USA, January 1977, pp. 238–252. ACM (1977).

  18. Cousot, P., Cousot, R.: Systematic design of program analysis frameworks. In: Aho, A.V., Zilles, S.N., Rosen, B.K. (eds.) Conference Record of the Sixth Annual ACM Symposium on Principles of Programming Languages, San Antonio, Texas, USA, January 1979, pp. 269–282. ACM Press (1979).

  19. Cousot, P., Cousot, R.: Abstract interpretation frameworks. J. Log. Comput. 2(4), 511–547 (1992).

    CrossRef  MathSciNet  MATH  Google Scholar 

  20. Cousot, P., Halbwachs, N.: Automatic discovery of linear restraints among variables of a program. In: Aho, A.V., Zilles, S.N., Szymanski, T.G. (eds.) Conference Record of the Fifth Annual ACM Symposium on Principles of Programming Languages, Tucson, Arizona, USA, January 1978, pp. 84–96. ACM Press (1978).

  21. D’Antoni, L., Veanes, M.: Minimization of symbolic automata. In: Jagannathan, S., Sewell, P. (eds.) The 41st Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2014, San Diego, CA, USA, 20–21 January 2014, pp. 541–554. ACM (2014).

  22. Davis, M.D., Sigal, R., Weyuker, E.J.: Computability, Complexity, and Languages: Fund. of Theor. CS. Academic Press Professional, Inc. (1994)

    Google Scholar 

  23. D’Silva, V.: Widening for Automata. MsC Thesis, Inst. Fur Inform. - UZH (2006)

    Google Scholar 

  24. Giacobazzi, R., Ranzato, F., Scozzari, F.: Making abstract interpretations complete. J. ACM 47(2), 361–416 (2000).

  25. Madsen, M., Andreasen, E.: String analysis for dynamic field access. In: Cohen, A. (ed.) Compiler Construction - 23rd International Conference, CC 2014, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2014, Grenoble, France, 5–13 April, 2014. Proceedings. Lecture Notes in Computer Science, vol. 8409, pp. 197–217. Springer (2014).

  26. Mastroeni, I., Nikolic, D.: Abstract program slicing: From theory towards an implementation. In: Dong, J.S., Zhu, H. (eds.) Formal Methods and Software Engineering - 12th International Conference on Formal Engineering Methods, ICFEM 2010, Shanghai, China, 17–19 November 2010. Proceedings. Lecture Notes in Computer Science, vol. 6447, pp. 452–467. Springer (2010).

  27. Mastroeni, I., Zanardini, D.: Abstract program slicing: an abstract interpretation-based approach to program slicing. ACM Trans. Comput. Log. 18(1), 7:1–7:58 (2017).

  28. Midtgaard, J., Nielson, F., Nielson, H.R.: A parametric abstract domain for lattice-valued regular expressions. In: Rival, X. (ed.) Static Analysis - 23rd International Symposium, SAS 2016, Edinburgh, UK, 8–10 September 2016, Proceedings. Lecture Notes in Computer Science, vol. 9837, pp. 338–360. Springer (2016).

  29. Park, C., Im, H., Ryu, S.: Precise and scalable static analysis of jquery using a regular expression domain. In: Ierusalimschy, R. (ed.) Proceedings of the 12th Symposium on Dynamic Languages, DLS 2016, Amsterdam, The Netherlands, 1 November 2016, pp. 25–36. ACM (2016).

  30. Preda, M.D., Giacobazzi, R., Lakhotia, A., Mastroeni, I.: Abstract symbolic automata: mixed syntactic/semantic similarity analysis of executables. In: Rajamani, S.K., Walker, D. (eds.) Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2015, Mumbai, India, 15–17 January 2015, pp. 329–341. ACM (2015).

  31. Rival, X., Mauborgne, L.: The trace partitioning abstract domain. ACM Trans. Program. Lang. Syst. 29(5), 26-es, August 2007.

  32. Veanes, M.: Applications of symbolic finite automata. In: Konstantinidis, S. (ed.) Implementation and Application of Automata - 18th International Conference, CIAA 2013, Halifax, NS, Canada, July 16–19, 2013. Proceedings. Lecture Notes in Computer Science, vol. 7982, pp. 16–23. Springer (2013).

  33. Wang, H., Chen, S., Yu, F., Jiang, J.R.: A symbolic model checking approach to the analysis of string and length constraints. In: Huchard, M., Kästner, C., Fraser, G. (eds.) Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, ASE 2018, Montpellier, France, 3–7 September 2018, pp. 623–633. ACM (2018).

  34. Yu, F., Alkhalaf, M., Bultan, T., Ibarra, O.H.: Automata-based symbolic string analysis for vulnerability detection. Formal Meth. Syst. Des. 44(1), 44–70 (2013).

    CrossRef  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations


Corresponding author

Correspondence to Vincenzo Arceri .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and Permissions

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Verify currency and authenticity via CrossMark

Cite this paper

Negrini, L., Arceri, V., Ferrara, P., Cortesi, A. (2021). Twinning Automata and Regular Expressions for String Static Analysis. In: Henglein, F., Shoham, S., Vizel, Y. (eds) Verification, Model Checking, and Abstract Interpretation. VMCAI 2021. Lecture Notes in Computer Science(), vol 12597. Springer, Cham.

Download citation

  • DOI:

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-67066-5

  • Online ISBN: 978-3-030-67067-2

  • eBook Packages: Computer ScienceComputer Science (R0)