Abstract
We describe a simple approach to named-entity recognition (NER), aimed initially at the Dutch language, but potentially applicable to other languages. Our NER system employs a two-stage architecture, with handcrafted but dataset-independent features for both stages, and performs on a par with state-of-the-art systems described in the literature. Notably, our approach does not depend on language-specific assets such as gazetteers. The resulting system is quite fast and is implemented in fewer than 500 lines of code.
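The abstract does not spell out the learner or what each stage does; the paper's title names averaged perceptrons. As a rough illustration only, the following is a minimal sketch of a multiclass averaged perceptron over sparse string features. The class name, the feature strings (`cap`, `lower`, `w=...`), and the two-label setup are hypothetical choices made for this example, not the authors' features or implementation.

```python
from collections import defaultdict


class AveragedPerceptron:
    """Multiclass perceptron whose final weights are averaged over training steps."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.weights = defaultdict(float)  # (feature, label) -> current weight
        self._totals = defaultdict(float)  # running sum of weight * steps in force
        self._stamps = defaultdict(int)    # step at which each weight last changed
        self._step = 0

    def score(self, features, label):
        return sum(self.weights.get((f, label), 0.0) for f in features)

    def predict(self, features):
        # Ties are broken in favour of the label listed first.
        return max(self.labels, key=lambda y: self.score(features, y))

    def _bump(self, key, delta):
        # Credit the old value for the steps it was in force, then change it.
        self._totals[key] += (self._step - self._stamps[key]) * self.weights[key]
        self._stamps[key] = self._step
        self.weights[key] += delta

    def update(self, features, gold):
        # One online step: only a wrong prediction changes the weights.
        self._step += 1
        guess = self.predict(features)
        if guess != gold:
            for f in features:
                self._bump((f, gold), 1.0)
                self._bump((f, guess), -1.0)

    def average(self):
        # Replace each weight by its average over all training steps.
        if self._step == 0:
            return
        for key, w in list(self.weights.items()):
            total = self._totals[key] + (self._step - self._stamps[key]) * w
            self.weights[key] = total / self._step


# Toy data with hypothetical features: a capitalisation flag and the word itself.
train = [
    (["cap", "w=Amsterdam"], "ENT"),
    (["lower", "w=de"], "O"),
]
model = AveragedPerceptron(["O", "ENT"])
for _ in range(2):
    for features, gold in train:
        model.update(features, gold)
model.average()
print(model.predict(["cap", "w=Rotterdam"]))
```

Averaging keeps late updates from dominating the final model. A two-stage system could train one such classifier per stage, though how the stages divide the work is not stated in the abstract.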
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Buitinck, L., Marx, M. (2012). Two-Stage Named-Entity Recognition Using Averaged Perceptrons. In: Bouma, G., Ittoo, A., Métais, E., Wortmann, H. (eds) Natural Language Processing and Information Systems. NLDB 2012. Lecture Notes in Computer Science, vol 7337. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31178-9_17
DOI: https://doi.org/10.1007/978-3-642-31178-9_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31177-2
Online ISBN: 978-3-642-31178-9
eBook Packages: Computer Science (R0)