Learning XML Grammars

Fernau, Henning

doi:10.1007/3-540-44596-X_7

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2123)

Included in the following conference series:

International Workshop on Machine Learning and Data Mining in Pattern Recognition

Abstract

We sketch possible applications of grammatical inference techniques to problems arising in the context of XML. The idea is to infer document type definitions (DTDs) of XML documents in three situations: when the original DTD is missing, when a DTD should be (re)designed, or when a DTD should be restricted to a more user-oriented view on a subset of the given DTD. The usefulness of such an approach is underlined by the importance of knowing appropriate DTDs; this knowledge can be exploited, e.g., for optimizing database queries based on XML.
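To make the notion of inferring a DTD from sample documents concrete, the following is a minimal sketch in Python. It is not the grammatical inference method of the paper (the abstract does not spell one out); it is a deliberately naive generalization that records, for each element, which child elements were ever observed and emits the content model `(a|b|...)*`. The function names `infer_naive_dtd` and `to_dtd` and the sample documents are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET


def infer_naive_dtd(documents):
    """Map each element name to a crude DTD content model.

    Assumption: any child ever seen under an element may occur
    any number of times and in any order (an over-generalization).
    """
    children = {}  # tag -> distinct child tags, in order of first appearance
    has_text = {}  # tag -> True if some occurrence held non-whitespace text
    for doc in documents:
        root = ET.fromstring(doc)
        for elem in root.iter():
            seen = children.setdefault(elem.tag, [])
            text_found = bool((elem.text or "").strip())
            for child in elem:
                if child.tag not in seen:
                    seen.append(child.tag)
                # Text following a child belongs to the parent (mixed content).
                if (child.tail or "").strip():
                    text_found = True
            has_text[elem.tag] = has_text.get(elem.tag, False) or text_found

    model = {}
    for tag, kids in children.items():
        if kids and has_text[tag]:
            model[tag] = "(#PCDATA|" + "|".join(kids) + ")*"
        elif kids:
            model[tag] = "(" + "|".join(kids) + ")*"
        elif has_text[tag]:
            model[tag] = "(#PCDATA)"
        else:
            model[tag] = "EMPTY"
    return model


def to_dtd(model):
    """Render the inferred content models as <!ELEMENT ...> declarations."""
    return "\n".join(f"<!ELEMENT {tag} {content}>" for tag, content in model.items())


if __name__ == "__main__":
    samples = [
        "<book><title>A</title><author>X</author></book>",
        "<book><title>B</title><year>2001</year></book>",
    ]
    print(to_dtd(infer_naive_dtd(samples)))
```

The sketch only ever generalizes from positive examples and never recovers order or cardinality constraints such as `(title, author+)`; learning tighter content models from the observed child sequences is where grammatical inference techniques for regular languages would come in.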

Author information

Authors and Affiliations

Wilhelm-Schickard-Institut für Informatik, Universität Tübingen, Sand 13, D-72076, Tübingen, Germany
Henning Fernau

Editor information

Editors and Affiliations

Institute of Computer Vision and Applied Computer Sciences, Arno-Nitzsche-Str. 45, 04277, Leipzig, Germany
Petra Perner

About this paper

Cite this paper

Fernau, H. (2001). Learning XML Grammars. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2001. Lecture Notes in Computer Science, vol 2123. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44596-X_7

DOI: https://doi.org/10.1007/3-540-44596-X_7
Published: 26 July 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42359-1
Online ISBN: 978-3-540-44596-8
eBook Packages: Springer Book Archive
