Unsupervised Named-Entity Recognition: Generating Gazetteers and Resolving Ambiguity

Nadeau, David; Turney, Peter D.; Matwin, Stan

doi:10.1007/11766247_23

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4013)

Abstract

In this paper, we propose a named-entity recognition (NER) system that addresses two major limitations frequently discussed in the field. First, the system requires no human intervention such as manually labeling training data or creating gazetteers. Second, the system can handle more than the three classical named-entity types (person, location, and organization). We describe the system’s architecture and compare its performance with a supervised system. We experimentally evaluate the system on a standard corpus, with the three classical named-entity types, and also on a new corpus, with a new named-entity type (car brands).

Author information

Authors and Affiliations

National Research Council, Institute for Information Technology, Canada
David Nadeau & Peter D. Turney
School of Information Technology and Engineering, University of Ottawa, Canada
David Nadeau & Stan Matwin
Institute for Computer Science, Polish Academy of Sciences, Poland
Stan Matwin

Editor information

Editors and Affiliations

Department of Computer Science and Software Engineering, Laval University, G1K 7P4, Québec, Canada
Luc Lamontagne
Département IFT-GLO, Pavillon Adrien-Pouliot, Université Laval, G1K-7P4, Québec, Canada
Mario Marchand

Cite this paper

Nadeau, D., Turney, P.D., Matwin, S. (2006). Unsupervised Named-Entity Recognition: Generating Gazetteers and Resolving Ambiguity. In: Lamontagne, L., Marchand, M. (eds) Advances in Artificial Intelligence. Canadian AI 2006. Lecture Notes in Computer Science, vol 4013. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11766247_23

DOI: https://doi.org/10.1007/11766247_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34628-9
Online ISBN: 978-3-540-34630-2
eBook Packages: Computer Science, Computer Science (R0)
