Precise Analysis of String Expressions

  • Aske Simon Christensen
  • Anders Møller
  • Michael I. Schwartzbach
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2694)


We perform static analysis of Java programs to answer a simple question: which values may occur as results of string expressions? The answers are summarized for each expression by a regular language that is guaranteed to contain all possible values. We present several applications of this analysis, including statically checking the syntax of dynamically generated expressions, such as SQL queries. Our analysis constructs flow graphs from class files and generates a context-free grammar with a nonterminal for each string expression. The language of this grammar is then widened into a regular language through a variant of an algorithm previously used for speech recognition. The collection of resulting regular languages is compactly represented as a special kind of multi-level automaton from which individual answers may be extracted. If a program error is detected, examples of invalid strings are automatically produced. We present extensive benchmarks demonstrating that the analysis is efficient and produces results of useful precision.
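Producing example invalid strings amounts to finding a word that lies in an expression's approximated language but outside the expected syntax. Below is a minimal sketch of one standard way to do this. It assumes both languages are available as DFAs with partial transition functions. The product automaton is explored breadth-first, and the search stops at the first state accepted by the expression automaton but not by the syntax automaton.

This is the textbook product construction, not necessarily the authors' implementation. The `DFA` class, `find_counterexample`, and the toy comma-separated-list languages are all hypothetical.

```python
from collections import deque
from typing import Dict, FrozenSet, Optional, Tuple


class DFA:
    """A DFA with a partial transition function: a missing (state, char)
    entry means the input is rejected (an implicit dead state)."""

    def __init__(self, start: int, accept: FrozenSet[int],
                 delta: Dict[Tuple[int, str], int]):
        self.start = start
        self.accept = accept
        self.delta = delta


def find_counterexample(expr: DFA, valid: DFA) -> Optional[str]:
    """Return a shortest string in L(expr) that is not in L(valid),
    or None if L(expr) is a subset of L(valid)."""
    # Product state: (state of expr, state of valid, or None once valid is dead).
    start = (expr.start, valid.start)
    parent = {start: None}  # product state -> (previous product state, char)
    queue = deque([start])
    while queue:
        a, b = queue.popleft()
        if a in expr.accept and (b is None or b not in valid.accept):
            # Walk the parent links back to the start to rebuild the witness.
            chars = []
            node = (a, b)
            while parent[node] is not None:
                node, c = parent[node]
                chars.append(c)
            return "".join(reversed(chars))
        # Follow only expr's transitions: the witness must be accepted by expr.
        for (state, c), a2 in expr.delta.items():
            if state != a:
                continue
            b2 = None if b is None else valid.delta.get((b, c))
            nxt = (a2, b2)
            if nxt not in parent:
                parent[nxt] = ((a, b), c)
                queue.append(nxt)
    return None


# Hypothetical approximations for two Java loops, hand-encoded here
# (in the analysis they would be derived from the widened grammar):
#   s = "x,"; while (...) s = s + "x,";   ->  (x,)+
trailing = DFA(0, frozenset({2}), {(0, "x"): 1, (1, ","): 2, (2, "x"): 1})
#   s = "x";  while (...) s = s + ",x";   ->  x(,x)*
correct = DFA(0, frozenset({1}), {(0, "x"): 1, (1, ","): 2, (2, "x"): 1})
# Expected syntax of a comma-separated list: x(,x)*
valid = DFA(0, frozenset({1}), {(0, "x"): 1, (1, ","): 2, (2, "x"): 1})

print(repr(find_counterexample(trailing, valid)))  # 'x,'
print(find_counterexample(correct, valid))         # None
```

Because the search is breadth-first, the returned witness is a shortest one, which keeps error reports short. The regular language over-approximates the possible values, so a reported witness may occasionally be one that the program can never actually produce.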


Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Aske Simon Christensen (1, 2)
  • Anders Møller (1, 2)
  • Michael I. Schwartzbach (1, 2)
  1. BRICS, Denmark
  2. Department of Computer Science, University of Aarhus, Denmark