Abstract
We sketch possible applications of grammatical inference techniques to problems arising in the context of XML. The idea is to infer document type definitions (DTDs) of XML documents in three situations: when the original DTD is missing, when a DTD is to be (re)designed, or when a given DTD is to be restricted to a more user-oriented view on a subset of it. The usefulness of such an approach is underlined by the importance of knowing appropriate DTDs; this knowledge can be exploited, e.g., for optimizing database queries based on XML.
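To make the idea of inferring a DTD from example documents concrete, the following is a minimal sketch and not the method of the chapter. It assumes a deliberately naive generalization rule: for each element, the observed child sequences are collected, runs of identical children are collapsed, and a child is marked `+` if it was ever seen repeated. The function names (`child_sequences`, `content_model`, `infer_dtd`) and the sample documents are hypothetical. Mixed content is ignored, and the resulting content models are not guaranteed to be deterministic as XML 1.0 requires.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from itertools import groupby


def child_sequences(documents):
    """Map each element name to the set of child-tag sequences observed for it."""
    observed = defaultdict(set)
    for text in documents:
        root = ET.fromstring(text)
        for element in root.iter():
            observed[element.tag].add(tuple(child.tag for child in element))
    return observed


def content_model(sequences):
    """Build a crude content model from positive examples only.

    Assumption of this sketch: runs of the same child are collapsed, and a
    position is written as `name+` if any example repeated it.
    """
    shapes = {}       # collapsed tag sequence -> per-position "seen repeated" flags
    optional = False  # True if the element was also seen without children
    for seq in sequences:
        if not seq:
            optional = True
            continue
        runs = [(tag, len(list(group))) for tag, group in groupby(seq)]
        shape = tuple(tag for tag, _ in runs)
        flags = shapes.setdefault(shape, [False] * len(shape))
        for i, (_, count) in enumerate(runs):
            flags[i] = flags[i] or count > 1
    if not shapes:
        # No child elements were ever observed; treat the element as text-only.
        return "(#PCDATA)"
    alternatives = [
        "(" + ", ".join(tag + ("+" if rep else "") for tag, rep in zip(shape, flags)) + ")"
        for shape, flags in sorted(shapes.items())
    ]
    model = alternatives[0] if len(alternatives) == 1 else "(" + " | ".join(alternatives) + ")"
    return model + "?" if optional else model


def infer_dtd(documents):
    """Return one <!ELEMENT ...> declaration per element name, sorted by name."""
    observed = child_sequences(documents)
    return "\n".join(
        f"<!ELEMENT {name} {content_model(seqs)}>"
        for name, seqs in sorted(observed.items())
    )


# Hypothetical sample documents, used only for illustration.
SAMPLES = [
    "<book><title>A</title><author>X</author><author>Y</author></book>",
    "<book><title>B</title><author>Z</author></book>",
]

if __name__ == "__main__":
    print(infer_dtd(SAMPLES))
```

On the two sample documents, the sketch generalizes the two observed `book` contents into the single declaration `<!ELEMENT book (title, author+)>`. Grammatical inference methods of the kind the abstract refers to replace this ad hoc rule with learners for well-defined language classes.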
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fernau, H. (2001). Learning XML Grammars. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2001. Lecture Notes in Computer Science, vol 2123. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44596-X_7
DOI: https://doi.org/10.1007/3-540-44596-X_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42359-1
Online ISBN: 978-3-540-44596-8
eBook Packages: Springer Book Archive