Learning XML Grammars

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2001)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2123)

Abstract

We sketch possible applications of grammatical inference techniques to problems arising in the context of XML. The idea is to infer document type definitions (DTDs) of XML documents in three situations: when the original DTD is missing, when a DTD should be (re)designed, or when a DTD should be restricted to a more user-oriented view on a subset of the given DTD. The usefulness of such an approach is underlined by the importance of knowing appropriate DTDs; this knowledge can be exploited, e.g., for optimizing database queries based on XML.
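
To make the inference task concrete, the following is a minimal sketch and not the method of the paper. It assumes the sample documents are available as strings, collects for each element name the child-name sequences observed under it (the positive examples a grammatical inference algorithm would generalize from), and then emits a deliberately over-general content model per element. The function names, the fallback to (#PCDATA) for elements without child elements, and the "any child, any order" generalization are all illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def collect_child_sequences(documents):
    """Map each element name to the set of child-name sequences seen under it."""
    samples = defaultdict(set)
    for text in documents:
        root = ET.fromstring(text)
        for element in root.iter():
            # One positive example per element occurrence: its children, in order.
            samples[element.tag].add(tuple(child.tag for child in element))
    return samples


def naive_content_model(sequences):
    """Over-general model: any observed child, in any order, any number of times."""
    names = sorted({name for seq in sequences for name in seq})
    if not names:
        # Assumption: elements never seen with child elements hold character data.
        return "(#PCDATA)"
    return "(" + " | ".join(names) + ")*"


def sketch_dtd(documents):
    """Emit one <!ELEMENT ...> declaration per element name, sorted by name."""
    samples = collect_child_sequences(documents)
    return "\n".join(
        f"<!ELEMENT {tag} {naive_content_model(seqs)}>"
        for tag, seqs in sorted(samples.items())
    )


if __name__ == "__main__":
    docs = [
        "<book><title>A</title><chapter>x</chapter><chapter>y</chapter></book>",
        "<book><title>B</title><chapter>z</chapter></book>",
    ]
    print(sketch_dtd(docs))
```

The content model produced here accepts far more than the samples suggest (for instance, a book with no title). A real inference step would replace naive_content_model with an algorithm that learns a tighter regular expression from the collected sequences.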

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fernau, H. (2001). Learning XML Grammars. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2001. Lecture Notes in Computer Science, vol 2123. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44596-X_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42359-1

  • Online ISBN: 978-3-540-44596-8

  • eBook Packages: Springer Book Archive
