
Grammars, Volume 4, Issue 1, pp 41–65

Tabulation for Multi-Purpose Partial Parsing

  • Vitor Jorge Rocio
  • Gabriel Pereira Lopes
  • Eric de la Clergerie
Article

Abstract

Efficient partial parsing systems (chunkers) are needed in many natural language application areas because they always produce partially parsed text, even when the text does not fully fit the existing lexica and grammars. Partially parsed corpora are necessary for extracting various kinds of information that can then be fed back into those systems, thereby increasing their processing power. In this paper, we propose an efficient partial parsing scheme, based on chart parsing, that is flexible enough to support both normal parsing tasks and the diagnosis, in previously obtained partial parses, of the possible causes (kinds of faults) that led to partial rather than complete parses. Using the built-in tabulation capabilities of the DyALog system, we implemented a partial parser that runs as fast as the best non-deterministic parsers. We elaborate on the implementation of two different grammar formalisms: Definite Clause Grammars (DCG) extended with head declarations, and Bound Movement Grammars (BMG).

Keywords: constituent movement, head-driven parsing, partial parsing, tabulation


Copyright information

© Kluwer Academic Publishers 2001

Authors and Affiliations

  • Vitor Jorge Rocio (1)
  • Gabriel Pereira Lopes (2)
  • Eric de la Clergerie (3)
  1. CENTRIA – Departamento de Informática, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Monte de Caparica, Portugal
  2. CENTRIA – Departamento de Informática, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Monte de Caparica, Portugal
  3. INRIA, Rocquencourt, Le Chesnay, France
