Tree k-Grammar Models for Natural Language Modelling and Parsing

  • Jose L. Verdú-Mas
  • Mikel L. Forcada
  • Rafael C. Carrasco
  • Jorge Calera-Rubio
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2396)

Abstract

In this paper, we compare three approaches to building a probabilistic context-free grammar for natural language parsing from a treebank corpus: (1) a model that simply extracts the rules contained in the corpus and counts the number of occurrences of each rule; (2) a model that also stores information about the parent node’s category; and (3) a model that estimates the probabilities according to a generalized k-gram scheme for trees with k = 3. The last model allows for faster parsing and considerably decreases the perplexity of test samples.
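As a rough illustration of the first two approaches only, the sketch below estimates rule probabilities by relative frequency from a tiny hand-made treebank, with and without parent-category annotation. The tuple encoding of trees, the toy sentences, the `^` naming of annotated categories, and the function names `rules` and `estimate` are all assumptions made for this example; they are not taken from the paper, and the k = 3 tree k-gram model is not covered here.

```python
from collections import Counter

# Hypothetical toy treebank: a tree is (label, child, child, ...),
# and a leaf word is a plain string.
TOY_TREEBANK = [
    ("S", ("NP", "she"), ("VP", ("V", "sees"), ("NP", "stars"))),
    ("S", ("NP", "stars"), ("VP", ("V", "shine"))),
]


def rules(tree, parent=None, annotate_parent=False):
    """Yield the (lhs, rhs) rules used in a tree.

    With annotate_parent=True, every nonterminal below the root is
    renamed to "label^parent_label".
    """
    label, children = tree[0], tree[1:]
    lhs = f"{label}^{parent}" if annotate_parent and parent else label
    rhs = []
    for child in children:
        if isinstance(child, str):
            rhs.append(child)  # terminal word
        else:
            rhs.append(f"{child[0]}^{label}" if annotate_parent else child[0])
            yield from rules(child, label, annotate_parent)
    yield lhs, tuple(rhs)


def estimate(treebank, annotate_parent=False):
    """Relative-frequency estimate: count(lhs -> rhs) / count(lhs)."""
    counts = Counter(
        rule
        for tree in treebank
        for rule in rules(tree, annotate_parent=annotate_parent)
    )
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}


plain = estimate(TOY_TREEBANK)
annotated = estimate(TOY_TREEBANK, annotate_parent=True)
```

In this toy data the plain model pools every NP expansion into a single distribution, whereas the annotated model keeps separate distributions for NP under S and NP under VP. That is the kind of extra context the second approach stores.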

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Jose L. Verdú-Mas (1)
  • Mikel L. Forcada (1)
  • Rafael C. Carrasco (1)
  • Jorge Calera-Rubio (1)
  1. Departament de Llenguatges i Sistemes Informàtics, Universitat d’Alacant, Alacant, Spain
