Abstract
In this paper, we introduce an approach for training a Named Entity Recognizer (NER) from a set of seed entities on the web. Creating training data for NERs is tedious and time-consuming, and it becomes more difficult as the set of entity types to be learned and recognized grows. Named Entity Recognition is a building block of natural language processing and is widely used in fields such as question answering, tagging, and information retrieval. Our NER can be trained on a set of entity names of different types and can be extended whenever a new entity type needs to be recognized. This extensibility broadens the practical applicability of the NER.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Urbansky, D., Thom, J.A., Schuster, D., Schill, A. (2011). Training a Named Entity Recognizer on the Web. In: Bouguettaya, A., Hauswirth, M., Liu, L. (eds) Web Information System Engineering – WISE 2011. WISE 2011. Lecture Notes in Computer Science, vol 6997. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24434-6_7
DOI: https://doi.org/10.1007/978-3-642-24434-6_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24433-9
Online ISBN: 978-3-642-24434-6
eBook Packages: Computer Science, Computer Science (R0)