XML Validation for Context-Free Grammars

  • Yasuhiko Minamide
  • Akihiko Tozawa
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4279)

Abstract

String expression analysis conservatively approximates the possible string values generated by a program. We consider the validation of a context-free grammar obtained by the analysis against XML schemas and develop two algorithms for deciding inclusion L(G₁) ⊆ L(G₂), where G₁ is a context-free grammar and G₂ is either an XML-grammar or a regular hedge grammar. The algorithms for XML-grammars and regular hedge grammars have exponential and doubly exponential time complexity, respectively. We have incorporated the algorithms into the PHP string analyzer and validated several publicly available PHP programs against the XHTML DTD. The experiments show that both algorithms are efficient in practice although they have exponential complexity.
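To make the inclusion question concrete, the sketch below decides L(G₁) ⊆ L(G₂) in a much simpler setting than the one the abstract addresses: G₂ is assumed to be a regular language given as a deterministic finite automaton, not an XML-grammar or a regular hedge grammar. It is the classical fixed-point construction over state pairs and is not the authors' algorithm. The function name `cfg_included_in_dfa`, the dictionary encodings, and the tag tokens are all illustrative assumptions.

```python
def cfg_included_in_dfa(productions, start, delta, q0, accepting):
    """Decide L(G) ⊆ L(A) for a context-free grammar G and a DFA A.

    productions: dict mapping each nonterminal to a list of right-hand
                 sides, each a tuple of symbols (assumed encoding).
    delta:       dict mapping (state, terminal) to the next state;
                 a missing entry is treated as a move to a dead state.
    """
    dead = object()  # implicit rejecting sink that makes the DFA total
    states = {q0, dead} | {p for (p, _) in delta} | set(delta.values())

    # reach[X] holds the pairs (p, q) such that some word derived
    # from X drives the automaton from state p to state q.
    reach = {nt: set() for nt in productions}

    def targets(state, symbol):
        if symbol in productions:  # nonterminal: use pairs found so far
            return {q for (p, q) in reach[symbol] if p == state}
        return {delta.get((state, symbol), dead)}  # terminal: one DFA step

    changed = True
    while changed:  # iterate to the least fixed point
        changed = False
        for nt, right_hand_sides in productions.items():
            for rhs in right_hand_sides:
                for p in states:
                    current = {p}
                    for symbol in rhs:
                        current = {q for s in current
                                   for q in targets(s, symbol)}
                    for q in current:
                        if (p, q) not in reach[nt]:
                            reach[nt].add((p, q))
                            changed = True

    # Every word derivable from the start symbol must end in an
    # accepting state when read from the initial state.
    return all(q in accepting for (p, q) in reach[start] if p == q0)


# Hypothetical example: the automaton accepts any run of opening tags
# followed by any run of closing tags.
dfa = {(0, "<a>"): 0, (0, "</a>"): 1, (1, "</a>"): 1}
nested = {"S": [("<a>", "S", "</a>"), ()]}      # <a>^n </a>^n
reversed_tags = {"S": [("</a>", "S", "<a>"), ()]}  # </a>^n <a>^n

nested_ok = cfg_included_in_dfa(nested, "S", dfa, 0, {0, 1})
reversed_ok = cfg_included_in_dfa(reversed_tags, "S", dfa, 0, {0, 1})
```

Because the automaton is deterministic and total, each word has exactly one run, so checking the end states reachable from the initial state is enough. The same shortcut is not available when the right-hand side is itself context-free, where inclusion is undecidable in general; restricting G₂ to a schema formalism, as the abstract does, is what keeps the problem decidable.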

Keywords

Regular Expression; Inclusion Problem; Tree Automaton; Tree Transducer; Validation Algorithm
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Yasuhiko Minamide (1)
  • Akihiko Tozawa (2)
  1. Department of Computer Science, University of Tsukuba
  2. IBM Research, Tokyo Research Laboratory, IBM Japan, Ltd.
