AGDISTIS - Graph-Based Disambiguation of Named Entities Using Linked Data

  • Ricardo Usbeck
  • Axel-Cyrille Ngonga Ngomo
  • Michael Röder
  • Daniel Gerber
  • Sandro Athaide Coelho
  • Sören Auer
  • Andreas Both
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8796)

Abstract

Over the last decades, several billion pages have been made available on the Web. The ongoing transition from the current Web of unstructured data to the Web of Data still requires scalable and accurate approaches for extracting structured data in RDF (Resource Description Framework) from these pages. One of the key steps towards extracting RDF from text is the disambiguation of named entities. Several approaches aim to tackle this problem, but they still achieve poor accuracy. We address this drawback by presenting AGDISTIS, a novel knowledge-base-agnostic approach for named entity disambiguation. Our approach combines the Hypertext-Induced Topic Search (HITS) algorithm with label expansion strategies and string similarity measures. Based on this combination, AGDISTIS can efficiently detect the correct URIs for a given set of named entities within an input text. We evaluate our approach on eight different datasets against state-of-the-art named entity disambiguation frameworks. Our results indicate that AGDISTIS outperforms the state-of-the-art approach by up to 29% F-measure.
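To make the combination of string similarity and HITS more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy knowledge base with hypothetical `ex:` URIs, uses trigram Jaccard similarity as the string measure, omits label expansion, and assumes that the candidate with the highest HITS authority score is chosen; the function names and the threshold value are illustrative only.

```python
from collections import defaultdict


def trigrams(s):
    """Return the set of lowercase character trigrams of s."""
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}


def trigram_similarity(a, b):
    """Jaccard overlap of trigram sets (an assumed similarity measure)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def hits(edges, iterations=20):
    """Plain HITS on a directed edge list; returns (hub, authority) scores."""
    nodes = {n for edge in edges for n in edge}
    incoming, outgoing = defaultdict(set), defaultdict(set)
    for source, target in edges:
        outgoing[source].add(target)
        incoming[target].add(source)
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores of nodes pointing to n, then normalise.
        auth = {n: sum(hub[m] for m in incoming[n]) for n in nodes}
        norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub: sum of authority scores of nodes that n points to, then normalise.
        hub = {n: sum(auth[m] for m in outgoing[n]) for n in nodes}
        norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth


def disambiguate(mentions, labels, edges, threshold=0.25):
    """Map each mention to the similar-labelled URI with the highest authority."""
    _, auth = hits(edges)
    result = {}
    for mention in mentions:
        candidates = [uri for uri, label in labels.items()
                      if trigram_similarity(mention, label) >= threshold]
        result[mention] = max(candidates, key=lambda u: auth.get(u, 0.0),
                              default=None)
    return result


# Hypothetical toy knowledge base: URI -> label, plus directed links.
labels = {
    "ex:Barack_Obama": "Barack Obama",
    "ex:Washington_DC": "Washington, D.C.",
    "ex:George_Washington": "George Washington",
    "ex:White_House": "White House",
}
edges = [
    ("ex:Barack_Obama", "ex:White_House"),
    ("ex:White_House", "ex:Washington_DC"),
    ("ex:Barack_Obama", "ex:Washington_DC"),
    ("ex:George_Washington", "ex:Mount_Vernon"),
]

result = disambiguate(["Obama", "Washington"], labels, edges)
print(result)
# {'Obama': 'ex:Barack_Obama', 'Washington': 'ex:Washington_DC'}
```

In this toy graph, "Washington" matches two labels by string similarity alone; the link structure breaks the tie because `ex:Washington_DC` is pointed to by the other entities mentioned in the text, whereas `ex:George_Washington` has no incoming links.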

Keywords

Input Text; Name Entity Recognition; Entity Recognition; Expansion Policy; Coreference Resolution
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Ricardo Usbeck (1, 2)
  • Axel-Cyrille Ngonga Ngomo (1)
  • Michael Röder (1, 2)
  • Daniel Gerber (1)
  • Sandro Athaide Coelho (3)
  • Sören Auer (4)
  • Andreas Both (2)

  1. University of Leipzig, Germany
  2. R & D, Unister GmbH, Germany
  3. Federal University of Juiz de Fora, Brazil
  4. University of Bonn & Fraunhofer IAIS, Germany
