Abstract
In this paper we investigate how to improve the performance of a Named Entity Extraction (NEE) system by applying machine learning techniques and corpus transformation. The main resources used in our experiments are the publicly available tagger TnT and a corpus of Spanish texts in which named entity occurrences are tagged with BIO tags. We split the NEE task into two subtasks: 1) Named Entity Recognition (NER), which identifies the group of words that make up the name of an entity, and 2) Named Entity Classification (NEC), which determines the category of a named entity. We have focused our work on improving the NER task, generating four different taggers from the same training corpus and combining them using a stacking scheme. We improve on the NER baseline (Fβ=1 value of 81.84), reaching a value of 88.37. When a NEC module is added to the NER system, the performance of the whole NEE task also improves: a value of 70.47 is achieved from a baseline of 66.07.
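To make the BIO encoding and the Fβ=1 figures quoted in the abstract concrete, the following is a minimal Python sketch of how entity spans can be read off a B/I/O tag sequence and scored against a gold standard. It is an illustration under stated assumptions, not the authors' evaluation code: the function names `bio_to_spans` and `f_beta`, the exact-span matching criterion, and the handling of a stray `I` tag are all hypothetical choices, and the tag sequences are invented. The sketch returns a value in [0, 1], whereas the abstract reports scores scaled to 100.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Collect the token spans marked as entities by a B/I/O tag sequence."""
    spans: List[Span] = []
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            # B always opens a new entity. Treating a stray I as an
            # opening tag is an assumption made for this sketch.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            # O closes the entity that was open, if any.
            spans.append((start, i))
            start = None
        # An I tag inside an open entity simply extends it.
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def f_beta(gold: List[str], predicted: List[str], beta: float = 1.0) -> float:
    """F-measure over exactly matching entity spans (assumed criterion)."""
    gold_spans = set(bio_to_spans(gold))
    pred_spans = set(bio_to_spans(predicted))
    correct = len(gold_spans & pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Invented example: the gold annotation has a three-token entity and a
# one-token entity; the prediction truncates the first entity.
gold = ["O", "O", "O", "B", "I", "I", "O", "O", "B"]
pred = ["O", "O", "O", "B", "O", "O", "O", "O", "B"]

print(bio_to_spans(gold))
print(f_beta(gold, pred))
```

In this example only one of the two predicted spans matches a gold span exactly, so precision and recall are both 0.5. With β = 1 the measure is the harmonic mean of precision and recall, which is why a tagger that truncates entity boundaries is penalised on both sides.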
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Troyano, J.A., Díaz, V.J., Enríquez, F., Romero, L. (2004). Improving the Performance of a Named Entity Extractor by Applying a Stacking Scheme. In: Lemaître, C., Reyes, C.A., González, J.A. (eds) Advances in Artificial Intelligence – IBERAMIA 2004. IBERAMIA 2004. Lecture Notes in Computer Science, vol 3315. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30498-2_30
DOI: https://doi.org/10.1007/978-3-540-30498-2_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23806-5
Online ISBN: 978-3-540-30498-2
eBook Packages: Springer Book Archive