Evaluating Corpora for Named Entity Recognition Using Character-Level Features

  • Casey Whitelaw
  • Jon Patrick
Conference paper

DOI: 10.1007/978-3-540-24581-0_78

Part of the Lecture Notes in Computer Science book series (LNCS, volume 2903)
Cite this paper as:
Whitelaw C., Patrick J. (2003) Evaluating Corpora for Named Entity Recognition Using Character-Level Features. In: Gedeon T.D., Fung L.C.C. (eds) AI 2003: Advances in Artificial Intelligence. AI 2003. Lecture Notes in Computer Science, vol 2903. Springer, Berlin, Heidelberg

Abstract

We present a new collection of training corpora for evaluation of language-independent named entity recognition systems. For the five languages included in this initial release, Basque, Dutch, English, Korean, and Spanish, we provide an analysis of the relative difficulty of the NER task for both the language in general, and as a supervised task using these corpora. We construct three strongly language-independent systems, each using only orthographic features, and compare their performance on both seen and unseen data. We achieve improved results through combining these classifiers, showing that ensemble approaches are suitable when dealing with language-independent problems.

Keywords

computational linguistics, machine learning

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Casey Whitelaw (1)
  • Jon Patrick (1)

  1. Sydney Language Technology Research Group, Capital Markets Co-operative Research Centre, University of Sydney
