Inducing Context Gazetteers from Encyclopedic Databases for Named Entity Recognition

  • Han-Cheol Cho
  • Naoaki Okazaki
  • Kentaro Inui
Conference paper

DOI: 10.1007/978-3-642-37453-1_31

Volume 7818 of the book series Lecture Notes in Computer Science (LNCS)
Cite this paper as:
Cho HC., Okazaki N., Inui K. (2013) Inducing Context Gazetteers from Encyclopedic Databases for Named Entity Recognition. In: Pei J., Tseng V.S., Cao L., Motoda H., Xu G. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2013. Lecture Notes in Computer Science, vol 7818. Springer, Berlin, Heidelberg

Abstract

Named entity recognition (NER) is a fundamental task for mining valuable information from unstructured and semi-structured texts. State-of-the-art NER models mostly employ a supervised machine learning approach that depends heavily on local contexts. However, recent research has demonstrated that non-local contexts at the sentence or document level can improve recognition performance. In this paper, we propose the use of a context gazetteer, a list of contexts with which entity names can co-occur, as new non-local context information. We build the context gazetteer from an encyclopedic database because manually annotated data are often too scarce to yield rich and sophisticated context patterns. In addition, we use dependency paths as sentence-level non-local context, which captures contexts that are more closely related syntactically to entity mentions than the linear context used in traditional NER. In our experiments, we build a context gazetteer of gene names and apply it to a biomedical NER task. High-confidence context patterns appear in various forms: some resemble a predicate–argument structure, whereas others take unexpected forms. The experimental results show that the proposed model, which uses both entity and context gazetteers, improves both precision and recall over a strong baseline model, demonstrating the usefulness of the context gazetteer.
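The idea of a context gazetteer can be illustrated with a minimal sketch. The abstract does not specify how patterns are extracted or scored, so everything below is an assumption made for illustration: the function name `induce_context_gazetteer`, the thresholds `min_count` and `min_conf`, the confidence definition (the fraction of a context's occurrences in which it follows a known entity name), and the use of the immediately following word as the context. The paper itself uses dependency paths, which would require a syntactic parser; the adjacent word is only a stand-in here.

```python
from collections import Counter


def induce_context_gazetteer(sentences, entity_names, min_count=2, min_conf=0.8):
    """Collect contexts that frequently follow known entity names.

    sentences    : list of token lists (already tokenized text)
    entity_names : set of lowercased single-token entity names
    Returns a dict mapping each kept context word to its confidence.

    Illustrative assumption: the "context" of a token is the word right
    after it. This replaces the dependency paths described in the abstract.
    """
    total = Counter()        # how often each context word follows any token
    with_entity = Counter()  # how often it follows a known entity name

    for tokens in sentences:
        # Pair every token with the word that follows it.
        for tok, nxt in zip(tokens, tokens[1:]):
            context = nxt.lower()
            total[context] += 1
            if tok.lower() in entity_names:
                with_entity[context] += 1

    gazetteer = {}
    for context, n in total.items():
        confidence = with_entity[context] / n
        # Keep only contexts that are both frequent and reliable.
        if n >= min_count and confidence >= min_conf:
            gazetteer[context] = confidence
    return gazetteer


# Toy data, invented for this example only.
sentences = [
    ["BRCA1", "encodes", "a", "protein"],
    ["TP53", "encodes", "a", "kinase"],
    ["The", "protein", "binds", "DNA"],
]
gene_names = {"brca1", "tp53"}

gazetteer = induce_context_gazetteer(sentences, gene_names)
print(gazetteer)
```

On this toy input only the context "encodes" survives the thresholds, because it occurs twice and follows a gene name both times. In an NER model, a match against such a gazetteer could then serve as an additional feature alongside the usual entity gazetteer features.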

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Han-Cheol Cho (1)
  • Naoaki Okazaki (2, 3)
  • Kentaro Inui (2)
  1. Suda Lab., Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan
  2. Inui and Okazaki Lab., Graduate School of Information Science, Tohoku University, Sendai, Japan
  3. Japan Science and Technology Agency (JST), Japan