Abstract Parsing: Static Analysis of Dynamically Generated String Output Using LR-Parsing Technology

  • Kyung-Goo Doh
  • Hyunha Kim
  • David A. Schmidt
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5673)


We combine LR(k)-parsing technology and data-flow analysis to analyze, in advance of execution, the documents generated dynamically by a program. Based on the document language’s context-free reference grammar and the program’s control structure, the analysis predicts how the documents will be generated and parses the predicted documents. Our strategy remembers context-free structure by computing abstract LR-parse stacks. The technique is implemented in Objective Caml and has statically validated a suite of PHP programs that dynamically generate HTML documents.
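As a rough illustration of the idea in the abstract, the sketch below runs a data-flow analysis whose facts are sets of LR parse stacks. Everything in it is an assumption made for illustration and none of it is taken from the paper: the three-token list grammar, the hand-built SLR(1) tables, the tuple-based program representation (`emit`, `seq`, `choice`, `loop`), and the function names `step`, `analyze`, and `validates`. In particular, the loop case simply iterates until no new stacks appear, which terminates only when the set of reachable stacks is finite; the sketch does not attempt to show how unbounded stacks are handled.

```python
# Hypothetical toy grammar (not from the paper). Tokens: o = <ul>, i = one <li> item, c = </ul>
#   D -> o L c        L -> L i        L -> (empty)
# Hand-built SLR(1) tables for this grammar; "$" marks end of output.
ACTION = {
    (0, "o"): ("shift", 2),
    (1, "$"): ("accept",),
    (2, "i"): ("reduce", "L", 0), (2, "c"): ("reduce", "L", 0),
    (3, "i"): ("shift", 5), (3, "c"): ("shift", 4),
    (4, "$"): ("reduce", "D", 3),
    (5, "i"): ("reduce", "L", 2), (5, "c"): ("reduce", "L", 2),
}
GOTO = {(0, "D"): 1, (2, "L"): 3}


def step(stack, tok):
    """Feed one token to one LR stack (a tuple of states).

    Returns the new stack after reductions and a shift, the string
    "accept", or None if the token is a parse error for this stack.
    """
    while True:
        act = ACTION.get((stack[-1], tok))
        if act is None:
            return None
        if act[0] == "shift":
            return stack + (act[1],)
        if act[0] == "accept":
            return "accept"
        _, lhs, n = act                       # reduce: pop n states, then goto
        stack = stack[:len(stack) - n]
        stack = stack + (GOTO[(stack[-1], lhs)],)


def analyze(stmt, stacks, errors):
    """Transfer function: set of stacks before stmt -> set of stacks after it."""
    kind = stmt[0]
    if kind == "emit":                        # program prints one token
        out = set()
        for s in stacks:
            r = step(s, stmt[1])
            if r is None:
                errors.append((s, stmt[1]))
            else:
                out.add(r)
        return out
    if kind == "seq":
        for sub in stmt[1]:
            stacks = analyze(sub, stacks, errors)
        return stacks
    if kind == "choice":                      # if/else with unknown condition
        return analyze(stmt[1], stacks, errors) | analyze(stmt[2], stacks, errors)
    if kind == "loop":                        # zero or more iterations
        seen, frontier, rounds = set(stacks), set(stacks), 0
        while frontier:
            rounds += 1
            if rounds > 1000:                 # guard: sketch assumes finitely many stacks
                raise RuntimeError("stack set did not stabilise")
            frontier = analyze(stmt[1], frontier, errors) - seen
            seen |= frontier
        return seen
    raise ValueError(kind)


def validates(program):
    """True if every predicted output string parses as a complete document."""
    errors = []
    final = analyze(program, {(0,)}, errors)
    return not errors and bool(final) and all(step(s, "$") == "accept" for s in final)


# A program that prints <ul>, any number of items, then </ul>.
good = ("seq", [("emit", "o"), ("loop", ("emit", "i")), ("emit", "c")])
# A program where one branch emits an item and never closes the list.
bad = ("seq", [("emit", "o"), ("choice", ("emit", "c"), ("emit", "i"))])

print(validates(good), validates(bad))
```

Because the grammar for the list body is left-recursive, each item is reduced as soon as the next token arrives, so the loop reaches a fixed point after two rounds and the analysis never has to enumerate individual output strings.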


Keywords (machine-generated, not supplied by the authors): Regular Expression; Program Point; Grammatical Structure; Call Graph; Injection Attack





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Kyung-Goo Doh (1)
  • Hyunha Kim (1)
  • David A. Schmidt (2)
  1. Hanyang University, Ansan, South Korea
  2. Kansas State University, Manhattan, Kansas, USA
