Grammatical tree matching

  • Pekka Kilpeläinen
  • Heikki Mannila
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 644)

Abstract

In structured text databases, documents are represented as parse trees, and different tree matching notions can be used as primitives for query languages. Two useful notions of tree matching, tree inclusion and tree pattern matching, both seem to require superlinear time. In this paper we give a general sufficient condition for a tree matching problem to be solvable in linear time, and apply it to tree pattern matching and tree inclusion. The application is based on the notion of a nonperiodic parse tree. We argue that most text documents can be modeled in a natural way using grammars yielding nonperiodic parse trees. We show how the knowledge that the target tree is nonperiodic can be used to obtain linear-time algorithms for the tree matching problems. We also discuss the preprocessing of patterns for grammatical tree matching.
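
As a point of reference for the superlinear baseline mentioned in the abstract, the following is a minimal sketch of naive ordered tree pattern matching, not the linear-time method of the paper. It assumes one common definition: a pattern node matches a target node when the labels agree and the i-th pattern child matches the i-th target child. The names `Node`, `WILDCARD`, `matches_at`, and `find_matches`, as well as the sample document tree, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


# Hypothetical marker: a childless pattern node with this label matches any subtree.
WILDCARD = "*"


def matches_at(pattern: Node, target: Node) -> bool:
    """Check whether `pattern` matches with its root placed on `target`."""
    if pattern.label == WILDCARD and not pattern.children:
        return True
    if pattern.label != target.label:
        return False
    # Assumption: the target node may have extra children to the right.
    if len(pattern.children) > len(target.children):
        return False
    return all(matches_at(p, t) for p, t in zip(pattern.children, target.children))


def find_matches(pattern: Node, target: Node) -> List[Node]:
    """Try the pattern at every target node, visiting nodes in preorder."""
    found = []
    stack = [target]
    while stack:
        node = stack.pop()
        if matches_at(pattern, node):
            found.append(node)
        stack.extend(reversed(node.children))
    return found


# A small parse tree for a document with two sections.
doc = Node("article", [
    Node("title"),
    Node("section", [Node("title"), Node("para")]),
    Node("section", [Node("title"), Node("para"), Node("para")]),
])
pattern = Node("section", [Node("title"), Node("para")])
hits = find_matches(pattern, doc)
```

Because the pattern is tried at every target node, this sketch can take time proportional to the product of the pattern size and the target size in the worst case, which is the kind of cost that a linear-time algorithm avoids.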

Keywords

Regular Expression; Query Language; Match Problem; Parse Tree; Recursive Call
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 1992

Authors and Affiliations

  • Pekka Kilpeläinen (1)
  • Heikki Mannila (1)
  1. Department of Computer Science, University of Helsinki, Helsinki, Finland