
Training a Named Entity Recognizer on the Web

  • Conference paper
Web Information System Engineering – WISE 2011 (WISE 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6997)

Abstract

In this paper, we introduce an approach for training a Named Entity Recognizer (NER) on the web from a set of seed entities. Creating training data for NERs is tedious and time-consuming, and it becomes harder as the number of entity types to be learned and recognized grows. Named Entity Recognition is a building block of natural language processing and is widely used in fields such as question answering, tagging, and information retrieval. Our NER can be trained on a set of entity names of different types and can be extended whenever a new entity type needs to be recognized. This extensibility broadens the practical applicability of the NER.
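The abstract does not specify the learning model, so the following is only a minimal sketch of the general seed-based idea, not the authors' implementation. It assumes that every occurrence of a seed name in retrieved web text can be treated as a labelled example of that name's type, and it swaps in a multinomial naive Bayes classifier over the words surrounding each occurrence. Web retrieval is replaced by a hard-coded list of snippets, and the names `context_tokens`, `annotate`, and `ContextNER` are hypothetical.

```python
# Hypothetical sketch: seed-based auto-annotation plus a naive Bayes context classifier.
# Not the paper's method; web search is replaced by hard-coded snippets.
import math
import re
from collections import Counter, defaultdict

TOKEN = re.compile(r"\w+")


def context_tokens(text, start, end, window=3):
    """Return up to `window` lowercase tokens on each side of text[start:end]."""
    left = TOKEN.findall(text[:start].lower())[-window:]
    right = TOKEN.findall(text[end:].lower())[:window]
    return left + right


def annotate(seeds, snippets, window=3):
    """Turn every seed occurrence into an (entity type, context tokens) example.

    `seeds` maps an entity type to a list of known names of that type.
    """
    examples = []
    for snippet in snippets:
        for etype, names in seeds.items():
            for name in names:
                for m in re.finditer(re.escape(name), snippet):
                    examples.append(
                        (etype, context_tokens(snippet, m.start(), m.end(), window))
                    )
    return examples


class ContextNER:
    """Multinomial naive Bayes over context tokens with add-one smoothing."""

    def __init__(self):
        self.type_counts = Counter()              # examples per entity type
        self.token_counts = defaultdict(Counter)  # context-token counts per type
        self.vocab = set()

    def train(self, examples):
        for etype, tokens in examples:
            self.type_counts[etype] += 1
            self.token_counts[etype].update(tokens)
            self.vocab.update(tokens)

    def classify(self, tokens):
        """Return the entity type with the highest log posterior for `tokens`."""
        total = sum(self.type_counts.values())
        best, best_score = None, -math.inf
        for etype, n in self.type_counts.items():
            denom = sum(self.token_counts[etype].values()) + len(self.vocab)
            score = math.log(n / total)
            for t in tokens:
                score += math.log((self.token_counts[etype][t] + 1) / denom)
            if score > best_score:
                best, best_score = etype, score
        return best


seeds = {"CITY": ["Paris", "Berlin"], "COMPANY": ["Google", "Siemens"]}
snippets = [  # stand-ins for text retrieved from web pages
    "She moved to the city of Paris last year.",
    "The city of Berlin is large.",
    "Shares of Google rose after the company reported earnings.",
    "Shares of Siemens fell as the company cut jobs.",
]
ner = ContextNER()
ner.train(annotate(seeds, snippets))

query = "They visited the city of Lyon in May."
i = query.find("Lyon")
print(ner.classify(context_tokens(query, i, i + len("Lyon"))))  # prints CITY
```

Under these assumptions, adding a new entity type only means adding its seed names to `seeds`, collecting snippets that mention them, and retraining; no hand-labelled corpus is needed. A real system would also have to detect candidate entity boundaries, which this sketch takes as given.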

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Urbansky, D., Thom, J.A., Schuster, D., Schill, A. (2011). Training a Named Entity Recognizer on the Web. In: Bouguettaya, A., Hauswirth, M., Liu, L. (eds) Web Information System Engineering – WISE 2011. WISE 2011. Lecture Notes in Computer Science, vol 6997. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24434-6_7

  • DOI: https://doi.org/10.1007/978-3-642-24434-6_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24433-9

  • Online ISBN: 978-3-642-24434-6

  • eBook Packages: Computer Science, Computer Science (R0)