Two-Stage Named-Entity Recognition Using Averaged Perceptrons

  • Lars Buitinck
  • Maarten Marx
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7337)

Abstract

We describe a simple approach to named-entity recognition (NER), aimed initially at Dutch but potentially applicable to other languages. Our NER system employs a two-stage architecture, with handcrafted but dataset-independent features for both stages, and is on a par with state-of-the-art systems described in the literature. Notably, our approach does not depend on language-specific assets such as gazetteers. The resulting system is quite fast and is implemented in fewer than 500 lines of code.
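To make the idea concrete, here is a minimal Python sketch of an averaged perceptron used in a two-stage tagger. It is an illustration only, not the authors' implementation. The abstract does not say what the two stages do, so the split into "detect entity tokens" followed by "assign a type" is an assumption, as are the `AveragedPerceptron` class, the `token_features` function, the label names, and the toy training data.

```python
from collections import defaultdict


class AveragedPerceptron:
    """Multiclass perceptron that returns weights averaged over all training steps."""

    def __init__(self, labels):
        self.labels = sorted(labels)
        self.weights = defaultdict(float)  # (feature, label) -> current weight
        self._totals = defaultdict(float)  # (feature, label) -> weight summed over steps
        self._stamps = defaultdict(int)    # (feature, label) -> step of last change
        self._step = 0

    def score(self, features, label):
        return sum(self.weights.get((f, label), 0.0) for f in features)

    def predict(self, features):
        # Ties are broken in favour of the alphabetically first label.
        return max(self.labels, key=lambda y: self.score(features, y))

    def _update(self, key, delta):
        # Lazily add the old weight for every step during which it was unchanged.
        self._totals[key] += (self._step - self._stamps[key]) * self.weights[key]
        self._stamps[key] = self._step
        self.weights[key] += delta

    def fit(self, samples, epochs=5):
        for _ in range(epochs):
            for features, gold in samples:
                guess = self.predict(features)
                if guess != gold:
                    for f in features:
                        self._update((f, gold), 1.0)
                        self._update((f, guess), -1.0)
                self._step += 1
        # Replace each weight by its average over all training steps.
        for key, w in list(self.weights.items()):
            total = self._totals[key] + (self._step - self._stamps[key]) * w
            self.weights[key] = total / self._step if self._step else 0.0
        return self


def token_features(token):
    # Hypothetical gazetteer-free features, chosen only for this illustration.
    return [
        "bias",
        "capitalized=" + str(token[:1].isupper()),
        "lower=" + token.lower(),
    ]


# Toy data (invented for the example): stage 1 separates entity tokens from others.
stage1_data = [("Amsterdam", "ENT"), ("Maarten", "ENT"), ("loopt", "O"), ("in", "O"),
               ("Utrecht", "ENT"), ("de", "O"), ("Lars", "ENT"), ("woont", "O")]
# Stage 2 assigns a type to tokens that stage 1 marked as entities.
stage2_data = [("Amsterdam", "LOC"), ("Maarten", "PER"),
               ("Utrecht", "LOC"), ("Lars", "PER")]

detector = AveragedPerceptron({"ENT", "O"}).fit(
    [(token_features(t), y) for t, y in stage1_data])
typer = AveragedPerceptron({"PER", "LOC"}).fit(
    [(token_features(t), y) for t, y in stage2_data])


def tag(tokens, detector, typer):
    """Run stage 1 on every token, then stage 2 only on detected entity tokens."""
    tags = []
    for token in tokens:
        feats = token_features(token)
        tags.append(typer.predict(feats) if detector.predict(feats) == "ENT" else "O")
    return tags
```

Calling `tag(["Lars", "woont", "in", "Amsterdam"], detector, typer)` runs both stages on a short Dutch sentence. Averaging the weights over all training steps, rather than keeping only the final weights, is what distinguishes the averaged perceptron from the plain one; it tends to reduce sensitivity to the order of the training examples.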

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Lars Buitinck (1)
  • Maarten Marx (1)
  1. Information and Language Processing Systems, Informatics Institute, University of Amsterdam, The Netherlands
