Entity Recognition in Information Extraction

  • Conference paper
Intelligent Information and Database Systems (ACIIDS 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8397)

Abstract

Detecting and resolving entities is an important step in information retrieval applications. Humans recognize entities from context, but information extraction systems (IES) must apply sophisticated algorithms to do so. This paper describes the development and implementation of an entity recognition algorithm. The implemented system is integrated with an IES that derives triples from unstructured text, which makes the triples more valuable for query answering because they refer to identified entities. A dictionary of entities and their contexts is built from information extracted from Wikipedia. The entity recognition computes a score based on context similarity, measured by cosine similarity with a tf-idf weighting scheme, and on string similarity. The implemented system shows good accuracy on Wikipedia articles, is domain independent, and recognizes entities of arbitrary types.
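The scoring idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything specific here is an assumption: the smoothed idf variant, the use of `difflib.SequenceMatcher` as the string similarity, the linear combination with a hypothetical weight `alpha`, and the toy dictionary, which is made-up data rather than anything taken from the paper.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification for this sketch)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict) per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed idf (assumed variant): terms present in every document keep a weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(mention, mention_context, dictionary, alpha=0.7):
    """Rank entries by alpha * context score + (1 - alpha) * string score.

    `alpha` is a hypothetical weight; the paper may combine the scores differently.
    """
    names = list(dictionary)
    docs = [tokenize(mention_context)] + [tokenize(dictionary[n]) for n in names]
    vectors = tfidf_vectors(docs)
    scored = []
    for name, vec in zip(names, vectors[1:]):
        context_score = cosine(vectors[0], vec)
        string_score = SequenceMatcher(None, mention.lower(), name.lower()).ratio()
        scored.append((alpha * context_score + (1 - alpha) * string_score, name))
    return sorted(scored, reverse=True)


# Toy dictionary: entity name -> context text (illustrative data only).
dictionary = {
    "Paris (France)": "capital city of France on the river Seine",
    "Paris (Texas)": "city in Lamar County Texas United States",
}
ranking = rank_candidates("Paris", "the capital of France lies on the Seine", dictionary)
best_entity = ranking[0][1]
```

In this toy case both candidates match the mention string about equally well, so the context score decides the ranking, which is the situation where a context dictionary is most useful.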

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Hanafiah, N., Quix, C. (2014). Entity Recognition in Information Extraction. In: Nguyen, N.T., Attachoo, B., Trawiński, B., Somboonviwat, K. (eds) Intelligent Information and Database Systems. ACIIDS 2014. Lecture Notes in Computer Science, vol 8397. Springer, Cham. https://doi.org/10.1007/978-3-319-05476-6_12

  • DOI: https://doi.org/10.1007/978-3-319-05476-6_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-05475-9

  • Online ISBN: 978-3-319-05476-6

  • eBook Packages: Computer Science, Computer Science (R0)
