
Segment Representations in Named Entity Recognition

Part of the Lecture Notes in Computer Science book series (LNAI, volume 9302)

Abstract

In this paper we study the effects of various segment representations on the named entity recognition (NER) task. The segment representation is responsible for mapping multi-word entities onto the classes used by the chosen machine learning approach. The choice of segment representation in an NER system is usually arbitrary and made without proper testing. Some authors have compared segment representations such as BIO, BIEO, and BILOU, but they usually compared only two of them. Our goal is to show that the segment representation problem is more complex and that selecting the best approach is not straightforward. We provide experiments with a wide set of segment representations. All the representations are tested using two popular machine learning algorithms, Conditional Random Fields and Maximum Entropy, on four languages: English, Spanish, Dutch, and Czech.
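To illustrate what mapping multi-word entities onto classes means in practice, the following Python sketch encodes entity spans as per-token tags under the three representations named above. It is a minimal sketch, not the authors' implementation. The span format `(start, end, label)` with an exclusive `end`, the function name `encode`, and the choice to tag a single-token entity with `B` under BIEO are assumptions made for this example, because conventions differ between papers. BIO follows the IOB2 convention here, in which every entity starts with `B`.

```python
from typing import List, Tuple

# A span is (start, end, label) with `end` exclusive, e.g. (0, 2, "PER").
# This span format is an assumption made for this illustration.
Span = Tuple[int, int, str]


def encode(n_tokens: int, spans: List[Span], scheme: str) -> List[str]:
    """Map entity spans to per-token tags under a segment representation.

    Assumed conventions (they vary in the literature):
      BIO   - B on the first token, I on the rest (IOB2 variant).
      BIEO  - B first, E last, I in between; a one-token entity gets B.
      BILOU - B first, L last, I in between, U for a one-token entity.
    Tokens outside any span get O. Overlapping spans are not supported:
    a later span simply overwrites the tags of an earlier one.
    """
    if scheme not in ("BIO", "BIEO", "BILOU"):
        raise ValueError(f"unknown scheme: {scheme}")
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        length = end - start
        if length <= 0 or start < 0 or end > n_tokens:
            raise ValueError(f"bad span: {(start, end, label)}")
        for i in range(start, end):
            if scheme == "BILOU" and length == 1:
                prefix = "U"  # unit-length entity
            elif i == start:
                prefix = "B"  # first token of the entity
            elif i == end - 1 and scheme == "BIEO":
                prefix = "E"  # last token of a multi-token entity
            elif i == end - 1 and scheme == "BILOU":
                prefix = "L"  # last token of a multi-token entity
            else:
                prefix = "I"  # any other token inside the entity
            tags[i] = f"{prefix}-{label}"
    return tags


if __name__ == "__main__":
    tokens = ["John", "Smith", "visited", "New", "York", "City", "and", "Prague", "."]
    spans = [(0, 2, "PER"), (3, 6, "LOC"), (7, 8, "LOC")]
    for scheme in ("BIO", "BIEO", "BILOU"):
        print(scheme, encode(len(tokens), spans, scheme))
```

With k entity types, these schemes yield 2k+1 (BIO), 3k+1 (BIEO), and 4k+1 (BILOU) classes, so the choice of representation directly changes the label set that a classifier such as a CRF or Maximum Entropy model has to learn.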




Author information

Correspondence to Michal Konkol.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Konkol, M., Konopík, M. (2015). Segment Representations in Named Entity Recognition. In: Král, P., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science, vol 9302. Springer, Cham. https://doi.org/10.1007/978-3-319-24033-6_7


  • DOI: https://doi.org/10.1007/978-3-319-24033-6_7


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24032-9

  • Online ISBN: 978-3-319-24033-6

  • eBook Packages: Computer Science, Computer Science (R0)