Training a Named Entity Recognizer on the Web

  • David Urbansky
  • James A. Thom
  • Daniel Schuster
  • Alexander Schill
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6997)

Abstract

In this paper, we introduce an approach for training a Named Entity Recognizer (NER) on the web, starting from a set of seed entities. Creating training data for NERs is tedious and time-consuming, and it becomes more difficult as the set of entity types to be learned and recognized grows. Named Entity Recognition is a building block of natural language processing and is widely used in fields such as question answering, tagging, and information retrieval. Our NER can be trained on a set of entity names of different types and can be extended whenever a new entity type needs to be recognized. This extensibility broadens the practical applications of the NER.
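
As a rough illustration of how seed entities might be turned into training data, the following Python sketch tags seed mentions in a piece of text. It is not the authors' implementation: the function name `annotate_with_seeds`, the inline tag format, and the example seeds are assumptions made here, and in the setting the abstract describes the text would come from web pages rather than a hard-coded string.

```python
import re


def annotate_with_seeds(text, seeds):
    """Wrap every seed mention in `text` with <TYPE>...</TYPE> tags.

    `seeds` maps an entity type to a list of seed names (hypothetical
    format chosen for this sketch). Matching is case-sensitive.
    """
    # Map each seed name to its entity type.
    name_to_type = {
        name: entity_type
        for entity_type, names in seeds.items()
        for name in names
    }
    if not name_to_type:
        return text

    # Longest names first, so "New York City" wins over "New York".
    ordered = sorted(name_to_type, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(n) for n in ordered) + r")(?!\w)"
    )

    def tag(match):
        entity_type = name_to_type[match.group(1)]
        return f"<{entity_type}>{match.group(1)}</{entity_type}>"

    return pattern.sub(tag, text)


# Illustrative seeds; a new entity type is added by adding a new key.
seeds = {
    "PERSON": ["Jim Carrey"],
    "CITY": ["New York", "New York City"],
}
annotated = annotate_with_seeds("Jim Carrey moved to New York City.", seeds)
print(annotated)
```

In this sketch, supporting a new entity type only requires another key in `seeds`, which mirrors the extensibility the abstract mentions. The annotated text could then serve as input to any supervised sequence tagger.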

Keywords

Training Data; Entity Type; Computational Linguistics; Entity Recognition; Supervise Machine Learning
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • David Urbansky (1)
  • James A. Thom (2)
  • Daniel Schuster (1)
  • Alexander Schill (1)
  1. Dresden University of Technology, Germany
  2. RMIT University, Australia
