Combining Data-Driven Systems for Improving Named Entity Recognition

Kozareva, Zornitsa; Ferrández, Oscar; Montoyo, Andres; Muñoz, Rafael; Suárez, Armando

doi:10.1007/11428817_8

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3513)

Abstract

The increasing flow of digital information requires the extraction, filtering and classification of pertinent information from large volumes of text. An important preprocessing step for these tasks is Named Entity Recognition (NER). In this paper we propose a completely automatic NER system, which identifies proper names in texts and classifies them into a set of predefined categories of interest such as Person names, Organizations (companies, government organizations, committees, etc.) and Locations (cities, countries, rivers, etc.). We examined the differences between the language models learned by different data-driven systems performing the same NLP task, and how these differences can be exploited to yield higher accuracy than the best individual system. Three NE classifiers (Hidden Markov Models, Maximum Entropy and a Memory-based learner) are trained on the same corpus data; after they are compared, their outputs are combined using a voting strategy. The results are encouraging: 98.5% accuracy for recognition and 84.94% accuracy for classification of NEs were achieved for Spanish.
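The voting combination described in the abstract can be illustrated with a minimal sketch of per-token majority voting. The sketch makes several assumptions. It assumes CoNLL-style BIO tags (B-PER, I-ORG, O, ...). It assumes that the three systems tag exactly the same token sequence. It also uses a hypothetical tie-break rule: when no tag wins a strict majority, it keeps the tag of one designated classifier. The abstract does not give the paper's exact voting scheme or its evaluation measure. The functions `majority_vote`, `strip_types` and `accuracy`, and the toy tags, are illustrative only.

```python
from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[Sequence[str]], fallback: int = 0) -> list[str]:
    """Combine per-token tags from several classifiers by majority voting.

    predictions[i][t] is the tag that classifier i assigns to token t.
    If no tag gets a strict majority, the tag of classifier `fallback`
    is kept (a hypothetical tie-break, e.g. the best single system).
    """
    if not predictions:
        raise ValueError("need at least one classifier output")
    n_tokens = len(predictions[0])
    if any(len(p) != n_tokens for p in predictions):
        raise ValueError("all classifiers must tag the same tokens")
    combined = []
    for t in range(n_tokens):
        votes = Counter(p[t] for p in predictions)
        tag, count = votes.most_common(1)[0]
        # Strict majority: more than half of the classifiers agree.
        if 2 * count > len(predictions):
            combined.append(tag)
        else:
            combined.append(predictions[fallback][t])
    return combined


def strip_types(tags: Sequence[str]) -> list[str]:
    """Drop the entity category, keeping only B/I/O boundary information."""
    return [tag.split("-", 1)[0] for tag in tags]


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Token-level accuracy: fraction of tokens whose tag matches the gold tag."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Toy example (hypothetical data, not from the paper).
tokens = ["Juan", "Pérez", "trabaja", "en", "Telefónica", "en", "Madrid"]
gold   = ["B-PER", "I-PER", "O", "O", "B-ORG", "O", "B-LOC"]
hmm    = ["B-PER", "I-PER", "O", "O", "B-LOC", "O", "B-LOC"]
maxent = ["B-PER", "I-PER", "O", "O", "B-ORG", "O", "B-ORG"]
mbl    = ["B-PER", "O",     "O", "O", "B-ORG", "O", "B-LOC"]

combined = majority_vote([hmm, maxent, mbl])
for name, tags in [("HMM", hmm), ("MaxEnt", maxent), ("MBL", mbl), ("Vote", combined)]:
    rec = accuracy(strip_types(tags), strip_types(gold))  # boundaries only
    cls = accuracy(tags, gold)                            # boundaries + category
    print(f"{name:7s} recognition={rec:.3f} classification={cls:.3f}")
```

In this toy sentence, each individual system mislabels one token (6/7 correct), and the vote recovers the gold tags. Stripping the categories separates boundary detection (recognition) from category assignment (classification), which loosely parallels the two figures reported in the abstract. This token-level reading is an assumption, not the paper's evaluation protocol.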

Author information

Authors and Affiliations

Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante
Zornitsa Kozareva, Oscar Ferrández, Andres Montoyo, Rafael Muñoz & Armando Suárez

Editor information

Editors and Affiliations

Department of Software and Computing Systems, University of Alicante, Spain
Andrés Montoyo
Grupo de investigación del Procesamiento del Lenguaje y Sistemas de Información, Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Alicante, Spain
Rafael Muñoz
Lab. CEDRIC, CNAM, Paris, France
Elisabeth Métais

About this paper

Cite this paper

Kozareva, Z., Ferrández, O., Montoyo, A., Muñoz, R., Suárez, A. (2005). Combining Data-Driven Systems for Improving Named Entity Recognition. In: Montoyo, A., Muñoz, R., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2005. Lecture Notes in Computer Science, vol 3513. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11428817_8

DOI: https://doi.org/10.1007/11428817_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26031-8
Online ISBN: 978-3-540-32110-1
eBook Packages: Computer Science; Computer Science (R0)
