XML Validation for Context-Free Grammars
String expression analysis conservatively approximates the set of string values a program can generate. We consider validating a context-free grammar obtained by such an analysis against XML schemas, and develop two algorithms for deciding the inclusion L(G₁) ⊆ L(G₂), where G₁ is a context-free grammar and G₂ is either an XML-grammar or a regular hedge grammar. The algorithm for XML-grammars has exponential time complexity, and the algorithm for regular hedge grammars has doubly exponential time complexity. We have incorporated the algorithms into the PHP string analyzer and validated several publicly available PHP programs against the XHTML DTD. The experiments show that both algorithms are efficient in practice despite their exponential complexity.
Keywords: Regular Expression, Inclusion Problem, Tree Automaton, Tree Transducer, Validation Algorithm
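To make the inclusion problem L(G₁) ⊆ L(G₂) concrete, the following is a minimal sketch of a bounded check. It is not one of the two algorithms described in the abstract: it only enumerates strings of a toy context-free grammar up to a derivation budget and tests each against a DTD-like content model, so it can find counterexamples but cannot prove inclusion. The grammar, the content model, and all function names are hypothetical and chosen for illustration only.

```python
import re

# Hypothetical toy CFG (playing the role of G1): keys are nonterminals,
# every other symbol is a terminal tag token.
GRAMMAR = {
    "S": [["<ul>", "ITEMS", "</ul>"]],
    "ITEMS": [["ITEM"], ["ITEM", "ITEMS"]],
    "ITEM": [["<li>", "</li>"], ["<li>", "S", "</li>"]],
}

# Hypothetical DTD-like content models (playing the role of G2):
# element name -> regex over child element names, each followed by a space.
CONTENT_MODEL = {
    "ul": r"(li )+",
    "li": r"(ul )?",
}


def derive(grammar, symbols, budget):
    """Yield terminal strings reachable by at most `budget` leftmost rule applications."""
    for i, sym in enumerate(symbols):
        if sym in grammar:
            if budget == 0:
                return  # budget exhausted with a nonterminal left: abandon this branch
            for rhs in grammar[sym]:
                yield from derive(grammar, symbols[:i] + rhs + symbols[i + 1:], budget - 1)
            return
    yield symbols  # no nonterminal left


def is_valid(tokens):
    """Check that tags are balanced and every element's children match its content model."""
    stack = [["#root", ""]]  # each frame: [element name, child names seen so far]
    for tok in tokens:
        if tok.startswith("</"):
            name, children = stack.pop()
            if name != tok[2:-1]:
                return False  # mismatched or unopened closing tag
            if not re.fullmatch(CONTENT_MODEL.get(name, r"(?!)"), children):
                return False  # children violate the content model (or element unknown)
        else:
            name = tok[1:-1]
            stack[-1][1] += name + " "
            stack.append([name, ""])
    # Everything closed, and the document consists of exactly one root <ul> element.
    return len(stack) == 1 and stack[0][1] == "ul "


def counterexample(grammar, start, budget):
    """Return a generated string that fails validation, or None if none is found within the budget."""
    for tokens in derive(grammar, [start], budget):
        if not is_valid(tokens):
            return "".join(tokens)
    return None
```

For instance, adding an empty production for `ITEMS` lets the grammar generate `<ul></ul>`, which the content model for `ul` rejects; `counterexample` reports that string. A result of `None` only means no violation was found within the budget, which is why a decision procedure such as the ones in the paper is needed for a sound answer.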