Improving the Performance of a Named Entity Extractor by Applying a Stacking Scheme

Troyano, José A.; Díaz, Víctor J.; Enríquez, Fernando; Romero, Luisa

doi:10.1007/978-3-540-30498-2_30

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3315)

Abstract

In this paper we investigate how to improve the performance of a Named Entity Extraction (NEE) system by applying machine learning techniques and corpus transformation. The main resources used in our experiments are the publicly available tagger TnT and a corpus of Spanish texts in which named-entity occurrences are marked with BIO tags. We split the NEE task into two subtasks: 1) Named Entity Recognition (NER), which identifies the group of words that make up the name of an entity, and 2) Named Entity Classification (NEC), which determines the category of a named entity. We have focused on improving the NER task by generating four different taggers from the same training corpus and combining them with a stacking scheme. This raises the NER result from a baseline F_{β=1} value of 81.84 to 88.37. When a NEC module is added to the NER system, the performance of the whole NEE task also improves, from a baseline of 66.07 to 70.47.
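
As an illustration of the BIO encoding and the F_{β=1} measure mentioned in the abstract, the following minimal Python sketch turns a tag sequence into entity spans and scores a predicted sequence against a gold one. It is not the authors' evaluation code. It assumes plain B/I/O tags without category suffixes (the NER subtask only) and exact-match scoring of whole spans; the function names `bio_to_spans` and `f_beta` and the example sentence are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from plain B/I/O tags (assumed: no category suffixes)."""
    spans: Set[Span] = set()
    start = None  # index where the currently open entity began, if any
    for i, tag in enumerate(tags):
        # B and O always end an open entity; an I with no open entity
        # is treated leniently as the start of a new one.
        if tag in ("B", "O") or start is None:
            if start is not None:
                spans.add((start, i))
                start = None
            if tag != "O":
                start = i
    if start is not None:
        spans.add((start, len(tags)))
    return spans


def f_beta(gold: List[str], pred: List[str], beta: float = 1.0) -> float:
    """F-measure over exactly matching spans (assumed scoring convention)."""
    gold_spans = bio_to_spans(gold)
    pred_spans = bio_to_spans(pred)
    correct = len(gold_spans & pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical example: "Juan Pérez visitó Sevilla ayer"
gold_tags = ["B", "I", "O", "B", "O"]
pred_tags = ["B", "O", "O", "B", "O"]  # the tagger cut the first name short
score = f_beta(gold_tags, pred_tags)
```

In this example only one of the two gold entities is recovered exactly, so precision and recall are both 0.5 and `score` is 0.5. The figures in the abstract are expressed as percentages, so the same result would be reported as 50.00.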

Author information

Authors and Affiliations

Department of Languages and Computer Systems, University of Seville, Av. Reina Mercedes s/n, 41012, Sevilla, Spain
José A. Troyano, Víctor J. Díaz, Fernando Enríquez & Luisa Romero

Editor information

Editors and Affiliations

Laboratorio Nacional de Informatica Avanzada, A.C. Lania, Rebsamen 80, Col. Centro, 91000, Xalapa, Veracruz, Mexico
Christian Lemaître
Department of Computer Science, National Institute of Astrophysics, Optics and Electronics (INAOE), Luis E. Erro 1, 72840, Sta. Maria Tonantzintla, Puebla, Mexico
Carlos A. Reyes
Computer Science Department, National Institute of Astrophysics, Optics and Electronics, Luis Enrique Erro 1, 72840, Tonantzintla, México
Jesús A. González

About this paper

Cite this paper

Troyano, J.A., Díaz, V.J., Enríquez, F., Romero, L. (2004). Improving the Performance of a Named Entity Extractor by Applying a Stacking Scheme. In: Lemaître, C., Reyes, C.A., González, J.A. (eds) Advances in Artificial Intelligence – IBERAMIA 2004. Lecture Notes in Computer Science, vol 3315. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30498-2_30

DOI: https://doi.org/10.1007/978-3-540-30498-2_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23806-5
Online ISBN: 978-3-540-30498-2
eBook Packages: Springer Book Archive
