Chapter

Advances in Knowledge Discovery and Data Mining

Volume 7818 of the series Lecture Notes in Computer Science, pp. 378-389

Inducing Context Gazetteers from Encyclopedic Databases for Named Entity Recognition

  • Han-Cheol Cho, affiliated with Suda Lab., Graduate School of Information Science and Technology, the University of Tokyo
  • Naoaki Okazaki, affiliated with Inui and Okazaki Lab., Graduate School of Information Science, Tohoku University; Japan Science and Technology Agency (JST)
  • Kentaro Inui, affiliated with Inui and Okazaki Lab., Graduate School of Information Science, Tohoku University

Abstract

Named entity recognition (NER) is a fundamental task for mining valuable information from unstructured and semi-structured texts. State-of-the-art NER models mostly employ a supervised machine learning approach that depends heavily on local contexts. However, recent research has demonstrated that non-local contexts at the sentence or document level can improve recognition performance. In this paper, we propose the use of a context gazetteer, a list of contexts with which entity names can co-occur, as a new source of non-local context information. We build the context gazetteer from an encyclopedic database because manually annotated data are often too scarce to yield rich and sophisticated context patterns. In addition, dependency paths are used as sentence-level non-local context to capture contexts that are more syntactically related to entity mentions than the linear context used in traditional NER. In our experiments, we build a context gazetteer of gene names and apply it to a biomedical NER task. High-confidence context patterns appear in various forms: some resemble predicate–argument structures, whereas others take unexpected forms. The experimental results show that the proposed model, which uses both entity and context gazetteers, improves both precision and recall over a strong baseline model, demonstrating the usefulness of the context gazetteer.
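
To make the idea of a context gazetteer concrete, the following minimal Python sketch collects words that frequently appear next to known entity names and keeps those with a high co-occurrence ratio. It is an illustration under stated assumptions, not the authors' method: the abstract describes dependency-path contexts mined from an encyclopedic database, whereas this sketch uses adjacent tokens and a handful of toy sentences. The function name `build_context_gazetteer`, the confidence definition (share of a word's occurrences that are adjacent to a known entity), and the thresholds `min_confidence` and `min_count` are all hypothetical.

```python
from collections import Counter


def build_context_gazetteer(sentences, entity_names,
                            min_confidence=0.5, min_count=2):
    """Return {context_word: confidence} for words seen next to known entities.

    Assumption: confidence = (occurrences adjacent to a known entity name)
    / (all occurrences). Adjacent tokens stand in for the dependency-path
    contexts mentioned in the abstract.
    """
    near_entity = Counter()  # occurrences adjacent to a known entity name
    total = Counter()        # all occurrences of the word
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok in entity_names:
                continue  # entity names themselves are not contexts
            total[tok] += 1
            # Left and right neighbours (empty slices at sentence edges).
            neighbours = tokens[max(0, i - 1):i] + tokens[i + 1:i + 2]
            if any(n in entity_names for n in neighbours):
                near_entity[tok] += 1
    gazetteer = {}
    for word, hits in near_entity.items():
        confidence = hits / total[word]
        if total[word] >= min_count and confidence >= min_confidence:
            gazetteer[word] = confidence
    return gazetteer


# Toy data (hypothetical): two gene names and four tokenized sentences.
GENE_NAMES = {"BRCA1", "TP53"}
SENTENCES = [
    ["BRCA1", "phosphorylates", "the", "protein"],
    ["TP53", "phosphorylates", "the", "substrate"],
    ["the", "kinase", "phosphorylates", "the", "protein"],
    ["the", "protein", "binds", "the", "receptor"],
]

context_gazetteer = build_context_gazetteer(SENTENCES, GENE_NAMES)
print(context_gazetteer)
```

In an NER model, a match against such a gazetteer could be exposed as an extra feature for the tokens near the matched context, alongside the usual entity-gazetteer features.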