
Improving the Performance of a Named Entity Extractor by Applying a Stacking Scheme

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3315)

Abstract

In this paper we investigate how to improve the performance of a Named Entity Extraction (NEE) system by applying machine learning techniques and corpus transformation. The main resources used in our experiments are the publicly available tagger TnT and a corpus of Spanish texts in which named entity occurrences are marked with BIO tags. We split the NEE task into two subtasks: 1) Named Entity Recognition (NER), which identifies the group of words that make up the name of an entity, and 2) Named Entity Classification (NEC), which determines the category of a named entity. We have focused on improving the NER task by generating four different taggers from the same training corpus and combining them with a stacking scheme. This raises the NER baseline from an F_{β=1} value of 81.84 to 88.37. When a NEC module is added to the NER system, the performance of the whole NEE task also improves, from a baseline of 66.07 to 70.47.
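To make the reported figures concrete, the following is a minimal Python sketch of how an F_{β=1} score can be computed for the NER subtask from BIO-tagged output. It is an illustration under stated assumptions, not the authors' evaluation code: the function names `bio_to_spans` and `f_beta`, the bare `B`/`I`/`O` tag set without category suffixes, the exact-match criterion for spans, and the toy tag sequences are all hypothetical choices made for this example.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a sequence of B/I/O tags (hypothetical helper)."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            # A new entity begins here. A stray I with no preceding B is
            # treated as a beginning; this is an assumption of the sketch.
            if start is not None:
                spans.add((start, i))
            start = i
        elif tag == "O":
            # An O tag closes any entity that is currently open.
            if start is not None:
                spans.add((start, i))
            start = None
        # Otherwise the tag is I inside an open entity: keep extending it.
    if start is not None:
        spans.add((start, len(tags)))
    return spans


def f_beta(gold: Set[Span], predicted: Set[Span], beta: float = 1.0) -> float:
    """F-measure over exactly matching spans; beta = 1 weighs P and R equally."""
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy data, invented for illustration only.
gold_tags = ["B", "I", "O", "B", "O", "B", "I", "I"]
pred_tags = ["B", "I", "O", "O", "O", "B", "I", "O"]

gold_spans = bio_to_spans(gold_tags)
pred_spans = bio_to_spans(pred_tags)
score = f_beta(gold_spans, pred_spans)
print(f"F(beta=1) = {100 * score:.2f}")
```

In this toy case only one of the two predicted spans matches a gold span exactly, so precision is 1/2 and recall is 1/3. A span whose boundary is off by one token counts as an error for both precision and recall, which is why combining taggers that disagree on boundaries can plausibly move the score.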

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Troyano, J.A., Díaz, V.J., Enríquez, F., Romero, L. (2004). Improving the Performance of a Named Entity Extractor by Applying a Stacking Scheme. In: Lemaître, C., Reyes, C.A., González, J.A. (eds) Advances in Artificial Intelligence – IBERAMIA 2004. IBERAMIA 2004. Lecture Notes in Computer Science, vol 3315. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30498-2_30

  • DOI: https://doi.org/10.1007/978-3-540-30498-2_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23806-5

  • Online ISBN: 978-3-540-30498-2

  • eBook Packages: Springer Book Archive
