Training a Named Entity Recognizer on the Web

Urbansky, David; Thom, James A.; Schuster, Daniel; Schill, Alexander

doi:10.1007/978-3-642-24434-6_7

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6997)

Included in the following conference series:

International Conference on Web Information Systems Engineering

Abstract

In this paper, we introduce an approach for training a Named Entity Recognizer (NER) from a set of seed entities on the web. Creating training data for NERs is tedious and time-consuming, and it becomes more difficult as the set of entity types to be learned and recognized grows. Named Entity Recognition is a building block in natural language processing and is widely used in fields such as question answering, tagging, and information retrieval. Our NER can be trained on a set of entity names of different types and can be extended whenever a new entity type should be recognized. This extensibility broadens the practical applications of the NER.
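
The abstract does not spell out how seed entities become training data, so the following is only a minimal sketch of the general idea of seed-based annotation, not the authors' implementation. It assumes that text mentioning the seeds has already been retrieved (for example from web search results) and labels exact occurrences of seed names with BIO tags. The function name `annotate_with_seeds`, the tokenizer, and the example seeds and type labels are all hypothetical.

```python
import re

# Hypothetical tokenizer: runs of word characters, or single punctuation marks.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def annotate_with_seeds(text, seeds):
    """Label the tokens of `text` with BIO tags using a seed dictionary.

    `seeds` maps an entity name (e.g. "Jim Carrey") to a type label
    (e.g. "PERSON"). Longer seed names are matched first, so a multi-word
    name takes precedence over a shorter name it contains.
    This is an illustrative assumption, not the method of the paper.
    """
    tokens = TOKEN_PATTERN.findall(text)
    tags = ["O"] * len(tokens)

    # Tokenize every seed name the same way as the text, longest first.
    seed_entries = sorted(
        ((TOKEN_PATTERN.findall(name), label) for name, label in seeds.items()),
        key=lambda entry: len(entry[0]),
        reverse=True,
    )

    for name_tokens, label in seed_entries:
        n = len(name_tokens)
        if n == 0:
            continue
        for i in range(len(tokens) - n + 1):
            span_is_free = all(tag == "O" for tag in tags[i:i + n])
            if tokens[i:i + n] == name_tokens and span_is_free:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label

    return list(zip(tokens, tags))


# Seeds of two types; a new type is added by adding entries to the dictionary.
seeds = {"Jim Carrey": "PERSON", "Dresden": "CITY"}
labeled = annotate_with_seeds("Jim Carrey visited Dresden.", seeds)
```

Under these assumptions, supporting a new entity type only requires adding seed names with a new label and re-running the annotation, which mirrors the extensibility the abstract describes. The labeled token sequences could then be passed to any sequence tagger as training data.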

Author information

Authors and Affiliations

Dresden University of Technology, Germany
David Urbansky, Daniel Schuster & Alexander Schill
RMIT University, Australia
James A. Thom

Editor information

Editors and Affiliations

Information Engineering Laboratory, CSIRO ICT Centre, Australia
Athman Bouguettaya
Digital Enterprise Research Institute (DERI), National University of Ireland, IDA Business Park, Lower Dangan, Galway, Ireland
Manfred Hauswirth
College of Computing, Georgia Institute of Technology, 266 Ferst Drive, 30332-0765, Atlanta, GA, USA
Ling Liu

About this paper

Cite this paper

Urbansky, D., Thom, J.A., Schuster, D., Schill, A. (2011). Training a Named Entity Recognizer on the Web. In: Bouguettaya, A., Hauswirth, M., Liu, L. (eds) Web Information System Engineering – WISE 2011. WISE 2011. Lecture Notes in Computer Science, vol 6997. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24434-6_7

DOI: https://doi.org/10.1007/978-3-642-24434-6_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24433-9
Online ISBN: 978-3-642-24434-6
eBook Packages: Computer Science, Computer Science (R0)
