Mining Parsing Results for Lexical Correction: Toward a Complete Correction Process of Wide-Coverage Lexicons

  • Lionel Nicolas
  • Benoît Sagot
  • Miguel A. Molinero
  • Jacques Farré
  • Éric de La Clergerie
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5603)

Abstract

The coverage of a parser depends mostly on the quality of the underlying grammar and lexicon. Developing a lexicon that is both complete and accurate is an intricate and demanding task. We introduce an automatic process for detecting missing, incomplete and erroneous entries in a morphological and syntactic lexicon, and for suggesting correction hypotheses for these entries. The detection of dubious lexical entries is tackled by two different techniques: the first is based on a specific statistical model, and the second benefits from information provided by a part-of-speech tagger. Correction hypotheses for dubious lexical entries are generated by studying which modifications could improve the successful parse rate of the sentences in which they occur. This process brings together various techniques based on taggers, parsers and statistical models. We report on its application to improving a large-coverage morphological and syntactic French lexicon, the Lefff.

Keywords

Lexical acquisition and correction, wide coverage lexicon, error mining, tagger, entropy classifier, syntactic parser

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Lionel Nicolas (1)
  • Benoît Sagot (2)
  • Miguel A. Molinero (3)
  • Jacques Farré (1)
  • Éric de La Clergerie (2)

  1. Équipe RL, Laboratoire I3S, UNSA + CNRS, Sophia Antipolis, France
  2. Projet ALPAGE, INRIA Rocquencourt + Paris 7, Le Chesnay, France
  3. Grupo LYS, Univ. de A Coruña, A Coruña, Spain
