Foundations of Fast Communication via XML


Communication with XML often involves pre-agreed document types. In this paper, we propose an offline parser generation approach to enhance online processing performance for documents conforming to a given DTD. Our examination of DTDs and the languages they define demonstrates the existence of ambiguities. We present an algorithm that maps DTDs to deterministic context-free grammars defining the same languages. We prove the grammars to be LL(1) and LALR(1), making them suitable for standard parser generators. Our experiments show the superior performance of generated optimized parsers. Our results generalize from DTDs to XML schema specifications with certain restrictions, most notably the absence of namespaces, which exceed the scope of context-free grammars.
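As a rough illustration of the idea of turning a DTD into a grammar that can be parsed with one token of lookahead, the sketch below hand-translates a small, hypothetical DTD into productions and a predictive recursive-descent parser. It is not the authors' mapping algorithm or generated code: the DTD, the grammar, and all names (`tokenize`, `Parser`, `parse_msg`) are assumptions made for this example, and the tokenizer ignores attributes, entities, and comments.

```python
import re

# Hypothetical DTD, used only for illustration:
#   <!ELEMENT msg  (to, body*)>
#   <!ELEMENT to   (#PCDATA)>
#   <!ELEMENT body (#PCDATA)>
#
# One possible hand-derived grammar over tag and text tokens:
#   Msg    -> <msg> To Bodies </msg>
#   To     -> <to> Text </to>
#   Bodies -> Body Bodies | (empty)
#   Body   -> <body> Text </body>
#   Text   -> TEXT | (empty)

TOKEN = re.compile(r"<(/?)(\w+)>|([^<]+)")


def tokenize(doc):
    """Split a document into (kind, value) tokens; attributes are not handled."""
    tokens = []
    for m in TOKEN.finditer(doc):
        close, name, text = m.groups()
        if name:
            tokens.append(("close" if close else "open", name))
        elif text.strip():
            tokens.append(("text", text.strip()))
    tokens.append(("eof", ""))
    return tokens


class Parser:
    """Predictive parser: every decision looks at the next token only."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def expect(self, kind, value=None):
        tok = self.peek()
        if tok[0] != kind or (value is not None and tok[1] != value):
            raise SyntaxError(f"expected {kind} {value!r}, got {tok}")
        self.pos += 1
        return tok[1]

    def text(self):
        # Text -> TEXT | (empty), chosen by the kind of the next token
        if self.peek()[0] == "text":
            return self.expect("text")
        return ""

    def leaf(self, name):
        # <name> Text </name>
        self.expect("open", name)
        content = self.text()
        self.expect("close", name)
        return content

    def msg(self):
        # Msg -> <msg> To Bodies </msg>
        self.expect("open", "msg")
        to = self.leaf("to")
        bodies = []
        # Bodies -> Body Bodies | (empty): continue only on an opening <body>
        while self.peek() == ("open", "body"):
            bodies.append(self.leaf("body"))
        self.expect("close", "msg")
        self.expect("eof")
        return {"to": to, "bodies": bodies}


def parse_msg(doc):
    return Parser(tokenize(doc)).msg()
```

Calling `parse_msg("<msg><to>alice</to><body>hi</body></msg>")` returns a dictionary with the recipient and the list of bodies, while a document that omits `<to>` raises `SyntaxError` at the first offending token. Because the parser is specialized to one document type, it never has to consult the DTD at run time.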





Cite this article

Löwe, W.M., Noga, M.L. & Gaul, T.S. Foundations of Fast Communication via XML. Annals of Software Engineering 13, 357–379 (2002).



  • Operating System
  • Superior Performance
  • Processing Performance
  • Schema Specification
  • Document Type