Fine Tuning Features and Post-processing Rules to Improve Named Entity Recognition

  • Óscar Ferrández
  • Antonio Toral
  • Rafael Muñoz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3999)

Abstract

This paper presents a Named Entity Recognition (NER) system for Spanish that combines machine learning and knowledge-based approaches. Our contribution is twofold: first, a discussion of how to select the best features for a machine learning NER system; second, an error study of this system, which led us to create a set of general post-processing rules. Both are explained in detail and then evaluated. Feature selection yields an improvement of around 2.3% over the results of our previous system, while applying the post-processing rules adds an improvement of around 3.6%, for a final F-score of 83.37%.

Keywords

Hidden Markov Model, Natural Language Processing, Voting Strategy, Named Entity Recognition

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Óscar Ferrández (1)
  • Antonio Toral (1)
  • Rafael Muñoz (1)
  1. Natural Language Processing and Information Systems Group, Department of Software and Computing Systems, University of Alicante, Spain
