Unsupervised Named-Entity Recognition: Generating Gazetteers and Resolving Ambiguity

  • David Nadeau
  • Peter D. Turney
  • Stan Matwin
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4013)


In this paper, we propose a named-entity recognition (NER) system that addresses two major limitations frequently discussed in the field. First, the system requires no human intervention such as manually labeling training data or creating gazetteers. Second, the system can handle more than the three classical named-entity types (person, location, and organization). We describe the system’s architecture and compare its performance with a supervised system. We experimentally evaluate the system on a standard corpus, with the three classical named-entity types, and also on a new corpus, with a new named-entity type (car brands).
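The abstract does not say how the gazetteers are generated or how ambiguity is resolved. The sketch below is only a hypothetical illustration of where ambiguity comes from once such lists exist. It assumes each entity type has a plain set of strings (the `GAZETTEERS` dictionary, the `MAX_LEN` limit and the `lookup` function are invented for this example). It does a greedy longest-match lookup over a tokenized sentence and flags any span that appears in more than one list. It does not build the lists and does not resolve the ambiguity.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy gazetteers. In the described system such lists would be
# generated automatically; here they are hard-coded for illustration only.
GAZETTEERS: Dict[str, Set[str]] = {
    "person": {"Henry Ford", "Jordan"},
    "location": {"Ottawa", "Jordan"},
    "organization": {"National Research Council"},
    "car_brand": {"Ford", "Toyota"},
}

MAX_LEN = 4  # longest phrase (in tokens) to look up; an arbitrary choice


def lookup(tokens: List[str]) -> List[Tuple[int, int, Set[str]]]:
    """Greedy longest-match gazetteer lookup.

    Returns (start, end, types) for each matched span, with `end` exclusive.
    A span whose `types` set holds more than one label is ambiguous and
    would need a separate resolution step (not implemented here).
    """
    matches = []
    i = 0
    while i < len(tokens):
        found = None
        # Try the longest candidate first so "Henry Ford" wins over "Ford".
        for j in range(min(len(tokens), i + MAX_LEN), i, -1):
            phrase = " ".join(tokens[i:j])
            types = {t for t, names in GAZETTEERS.items() if phrase in names}
            if types:
                found = (i, j, types)
                break
        if found:
            matches.append(found)
            i = found[1]  # continue after the matched span
        else:
            i += 1
    return matches


if __name__ == "__main__":
    sentence = "Henry Ford never visited Jordan , but Ford sells cars in Ottawa ."
    tokens = sentence.split()
    for start, end, types in lookup(tokens):
        status = "ambiguous" if len(types) > 1 else "unambiguous"
        print(" ".join(tokens[start:end]), sorted(types), status)
```

With these toy lists, "Jordan" matches both the person and the location list and is reported as ambiguous. "Ford" alone is only a car brand, while the longer "Henry Ford" is matched as a person because longer spans are tried first.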


Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • David Nadeau (1, 2)
  • Peter D. Turney (1)
  • Stan Matwin (2, 3)
  1. National Research Council, Institute for Information Technology, Canada
  2. School of Information Technology and Engineering, University of Ottawa, Canada
  3. Institute for Computer Science, Polish Academy of Sciences, Poland