Retrieving Textual Evidence for Knowledge Graph Facts

  • Gonenc Ercan
  • Shady Elbassuoni
  • Katja Hose
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11503)


Knowledge graphs have become vital resources for semantic search and provide users with precise answers to their information needs. Knowledge graphs often consist of billions of facts, typically encoded in the form of RDF triples. In most cases, these facts are extracted automatically and can thus be susceptible to errors. For many applications, it can therefore be very useful to complement knowledge graph facts with textual evidence. For instance, such evidence can help users make informed decisions about the validity of the facts that are returned as part of an answer to a query. In this paper, we therefore propose an approach that, given a knowledge graph and a text corpus, retrieves the top-k most relevant textual passages for a given set of facts. Since our goal is to retrieve short passages, we develop a set of IR models combining exact matching through the Okapi BM25 model with semantic matching using word embeddings. To evaluate our approach, we built an extensive benchmark consisting of facts extracted from YAGO and text passages retrieved from Wikipedia. Our experimental results demonstrate the effectiveness of our approach in retrieving textual evidence for knowledge graph facts.
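The combination of exact and semantic matching described above can be illustrated with a minimal sketch. The abstract does not specify how the two signals are combined, so everything below is an assumption made for illustration: the linear interpolation weight `alpha`, the centroid-of-embeddings passage representation, the function names, the toy two-dimensional embeddings, and the example passages. It is not the models proposed in the paper.

```python
import math
from collections import Counter


def bm25_scores(query, passages, k1=1.2, b=0.75):
    """Okapi BM25 score of every tokenized passage for a tokenized query."""
    n = len(passages)
    avgdl = sum(len(p) for p in passages) / n
    # Document frequency: number of passages containing each term.
    df = Counter(term for p in passages for term in set(p))
    scores = []
    for p in passages:
        tf = Counter(p)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(p) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def centroid(tokens, emb):
    """Mean embedding of the tokens that have a vector, or None if none do."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return None
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0]))]


def cosine(u, v):
    """Cosine similarity; 0.0 when either vector is missing or all-zero."""
    if u is None or v is None:
        return 0.0
    dot = sum(a * c for a, c in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(c * c for c in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query, passages, emb, alpha=0.5, k=2):
    """Indices of the top-k passages under an assumed linear interpolation
    of max-normalized BM25 and centroid cosine similarity."""
    lexical = bm25_scores(query, passages)
    top = max(lexical) or 1.0  # avoid dividing by zero when nothing matches
    q_vec = centroid(query, emb)
    combined = []
    for i, p in enumerate(passages):
        semantic = cosine(q_vec, centroid(p, emb))
        combined.append((alpha * lexical[i] / top + (1 - alpha) * semantic, i))
    combined.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in combined[:k]]


# Hypothetical toy data: a fact verbalized as keywords, plus four passages.
query = ["einstein", "born", "ulm"]
passages = [
    "einstein was born in ulm".split(),
    "einstein developed relativity".split(),
    "ulm is the birthplace of einstein".split(),
    "the danube is a river".split(),
]
# Hand-made 2-d vectors standing in for pretrained word embeddings.
emb = {
    "born": [1.0, 0.0], "birthplace": [0.9, 0.1],
    "einstein": [0.0, 1.0], "relativity": [0.0, 0.9],
    "ulm": [0.5, 0.5],
}
print(rank(query, passages, emb, k=2))
```

In this toy setup, the third passage never uses the word "born", so exact matching alone credits it only for "einstein" and "ulm"; the embedding term adds credit for "birthplace" being close to "born". In practice the hand-made vectors would be replaced by pretrained embeddings, and `alpha` would need to be tuned on held-out data.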



This research was partially funded by the Danish Council for Independent Research (DFF) under grant agreement no. DFF-8048-00051B and Aalborg University’s Talent Programme.



Copyright information

© Springer Nature Switzerland AG 2019

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Authors and Affiliations

  1. Aalborg University, Aalborg, Denmark
  2. Informatics Institute Ankara, Hacettepe University, Ankara, Turkey
  3. American University of Beirut, Beirut, Lebanon
