
Textual Entailment for Effective Triple Validation in Object Prediction

  • Conference paper
  • Published in: The Semantic Web – ISWC 2023 (ISWC 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14265)


Abstract

Knowledge base population seeks to expand knowledge graphs with facts that are typically extracted from a text corpus. Recently, language models pretrained on large corpora have been shown to contain factual knowledge that can be retrieved using cloze-style strategies. Such an approach enables zero-shot recall of facts, showing competitive results in object prediction compared to supervised baselines. However, prompt-based fact retrieval can be brittle and depends heavily on the prompts and context used, which may produce unintended or hallucinatory results. We propose to use textual entailment to validate facts extracted from language models through cloze statements. Our results show that triple validation based on textual entailment improves language model predictions across different training regimes. Furthermore, we show that entailment-based triple validation is also effective for validating candidate facts extracted from other sources, including existing knowledge graphs and text passages in which named entities are recognized.
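The pipeline described above — generate candidate objects for a subject and relation, then validate each (subject, relation, object) triple with a textual-entailment model — can be illustrated with a minimal sketch. The relation name, template wording, and 0.5 threshold below are illustrative assumptions, not the paper's exact configuration; a real `entail_score` would wrap an off-the-shelf MNLI model such as the DeBERTa checkpoint listed in the paper's footnotes.

```python
def verbalize_triple(subj, relation, obj, templates):
    """Turn a (subject, relation, object) candidate into a natural-language
    hypothesis for an NLI model, using a per-relation template."""
    return templates[relation].format(subj=subj, obj=obj)


def validate_triple(premise, hypothesis, entail_score, threshold=0.5):
    """Accept the candidate fact when the entailment probability of the
    hypothesis, given the evidence premise, clears the threshold.
    `entail_score` stands in for an NLI model (e.g. an MNLI-finetuned
    DeBERTa); here it is any callable returning a probability."""
    return entail_score(premise, hypothesis) >= threshold


# Hypothetical relation template; the paper's actual verbalizations may differ.
templates = {"CountryBordersWithCountry": "{subj} shares a border with {obj}."}

premise = "France and Spain share a land border along the Pyrenees."
hypothesis = verbalize_triple(
    "France", "CountryBordersWithCountry", "Spain", templates
)

# Stub scorer standing in for a real NLI model.
accepted = validate_triple(premise, hypothesis, lambda p, h: 0.93)
```

In this sketch the candidate object could come from any source — a cloze prompt over a language model, an existing knowledge graph, or named entities recognized in retrieved text — since the entailment check only sees the verbalized triple and the evidence passage.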


Notes

  1. https://blog.google/products/search/introducing-knowledge-graph-things-not/.

  2. The KBP evaluation track of the TAC [14] is a long-running initiative. However, its manual system evaluation makes the evaluation hard to reproduce for new systems.

  3. See the RDF schema in https://www.w3.org/TR/rdf-primer/#properties.

  4. While rdf:type is the standard property used to state that a resource is an instance of a class, some knowledge graphs may use other ad hoc properties.

  5. Due to the diverse nature of the MISC category, we do not consider it.

  6. https://lm-kbc.github.io/2022/.

  7. https://pypi.org/project/duckduckgo-search/.

  8. https://spacy.io/models/en#en_core_web_trf.

  9. https://huggingface.co/microsoft/deberta-v2-xlarge-mnli.

  10. https://huggingface.co/microsoft/deberta-v3-xsmall.

  11. https://huggingface.co/boychaboy/MNLI_bert-large-cased.

  12. https://github.com/satori2023/Textual-Entailment-for-Effective-Triple-Validation-in-Object-Prediction.

  13. https://github.com/lm-kbc/dataset.

  14. https://huggingface.co/deepset/deberta-v3-large-squad2.

  15. https://rajpurkar.github.io/SQuAD-explorer/.

  16. The relation mapping can be found in the paper repository.

  17. https://github.com/huggingface/transformers/tree/v4.24.0/examples/pytorch/text-classification.

  18. https://github.com/huggingface/transformers/tree/v4.24.0/examples/pytorch/question-answering.

  19. https://github.com/Babelscape/rebel/blob/main/src/train.py.

  20. https://github.com/expertailab/Textual-Entailment-for-Effective-Triple-Validation-in-Object-Prediction.

References

  1. Adel, H., Schütze, H.: Type-aware convolutional neural networks for slot filling. J. Artif. Intell. Res. 66, 297–339 (2019)


  2. Alivanistos, D., Santamaría, S., Cochez, M., Kalo, J., van Krieken, E., Thanapalasingam, T.: Prompting as probing: using language models for knowledge base construction. In: Singhania, S., Nguyen, T.P., Razniewski, S. (eds.) LM-KBC 2022 Knowledge Base Construction from Pre-trained Language Models 2022, pp. 11–34. CEUR Workshop Proceedings, CEUR-WS.org (2022)


  3. Balazevic, I., Allen, C., Hospedales, T.: TuckER: tensor factorization for knowledge graph completion. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, pp. 5185–5194. Association for Computational Linguistics (2019). https://doi.org/10.18653/v1/D19-1522. https://aclanthology.org/D19-1522

  4. Balog, K.: Populating knowledge bases. In: Balog, K. (ed.) Entity-Oriented Search. TIRS, vol. 39, pp. 189–222. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-93935-3_6


  5. Bentivogli, L., Clark, P., Dagan, I., Giampiccolo, D.: The seventh PASCAL recognizing textual entailment challenge. In: Proceedings of the Text Analysis Conference (TAC 2011) (2011)


  6. Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., Yakhnenko, O.: Translating embeddings for modeling multi-relational data. In: Advances in Neural Information Processing Systems, vol. 26. Curran Associates, Inc. (2013). https://papers.nips.cc/paper/2013/hash/1cecc7a77928ca8133fa24680a88d2f9-Abstract.html

  7. Bouraoui, Z., Camacho-Collados, J., Schockaert, S.: Inducing relational knowledge from BERT. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 7456–7463 (2020)


  8. Bowman, S.R., Angeli, G., Potts, C., Manning, C.D.: A large annotated corpus for learning natural language inference. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, pp. 632–642. Association for Computational Linguistics (2015). https://doi.org/10.18653/v1/D15-1075. https://aclanthology.org/D15-1075

  9. Cao, B., et al.: Knowledgeable or educated guess? Revisiting language models as knowledge bases. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online, pp. 1860–1874. Association for Computational Linguistics (2021). https://doi.org/10.18653/v1/2021.acl-long.146. https://aclanthology.org/2021.acl-long.146

  10. Chen, Z., Feng, Y., Zhao, D.: Entailment graph learning with textual entailment and soft transitivity. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, pp. 5899–5910. Association for Computational Linguistics (2022). https://doi.org/10.18653/v1/2022.acl-long.406. https://aclanthology.org/2022.acl-long.406

  11. Dagan, I., Glickman, O., Magnini, B.: The PASCAL recognising textual entailment challenge. In: Quiñonero-Candela, J., Dagan, I., Magnini, B., d’Alché-Buc, F. (eds.) MLCW 2005. LNCS (LNAI), vol. 3944, pp. 177–190. Springer, Heidelberg (2006). https://doi.org/10.1007/11736790_9


  12. Galárraga, L., Razniewski, S., Amarilli, A., Suchanek, F.M.: Predicting completeness in knowledge bases. In: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, WSDM 2017, New York, NY, USA, pp. 375–383. Association for Computing Machinery (2017). https://doi.org/10.1145/3018661.3018739

  13. Gerber, D., et al.: DeFacto - temporal and multilingual deep fact validation. Web Semant. 35(P2), 85–101 (2015). https://doi.org/10.1016/j.websem.2015.08.001

  14. Getman, J., Ellis, J., Strassel, S., Song, Z., Tracey, J.: Laying the groundwork for knowledge base population: nine years of linguistic resources for TAC KBP. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan. European Language Resources Association (ELRA) (2018). https://aclanthology.org/L18-1245

  15. Goodfellow, I.J., Mirza, M., Da, X., Courville, A.C., Bengio, Y.: An empirical investigation of catastrophic forgetting in gradient-based neural networks. CoRR abs/1312.6211 (2013)


  16. Guo, Z., Schlichtkrull, M., Vlachos, A.: A survey on automated fact-checking. Trans. Assoc. Comput. Linguist. 10, 178–206 (2022). https://doi.org/10.1162/tacl_a_00454


  17. Hosseini, M.J., Cohen, S.B., Johnson, M., Steedman, M.: Open-domain contextual link prediction and its complementarity with entailment graphs. In: Findings of the Association for Computational Linguistics: EMNLP 2021, Punta Cana, Dominican Republic, pp. 2790–2802. Association for Computational Linguistics (2021). https://doi.org/10.18653/v1/2021.findings-emnlp.238. https://aclanthology.org/2021.findings-emnlp.238/

  18. Huang, L., Sil, A., Ji, H., Florian, R.: Improving slot filling performance with attentive neural networks on dependency structures. In: EMNLP (2017)


  19. Huguet Cabot, P.L., Navigli, R.: REBEL: relation extraction by end-to-end language generation. In: Findings of the Association for Computational Linguistics: EMNLP 2021, Punta Cana, Dominican Republic, pp. 2370–2381. Association for Computational Linguistics (2021). https://doi.org/10.18653/v1/2021.findings-emnlp.204. https://aclanthology.org/2021.findings-emnlp.204

  20. Jaradeh, M.Y., Singh, K., Stocker, M., Auer, S.: Triple classification for scholarly knowledge graph completion. In: Proceedings of the 11th on Knowledge Capture Conference, pp. 225–232 (2021)


  21. Ji, H., Grishman, R.: Knowledge base population: successful approaches and challenges. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, Oregon, USA, pp. 1148–1158. Association for Computational Linguistics (2011). https://aclanthology.org/P11-1115

  22. Ji, S., Pan, S., Cambria, E., Marttinen, P., Yu, P.S.: A survey on knowledge graphs: representation, acquisition, and applications. IEEE Trans. Neural Netw. Learn. Syst. 33(2), 494–514 (2022). https://doi.org/10.1109/TNNLS.2021.3070843


  23. Kim, J., Choi, K.s.: Unsupervised fact checking by counter-weighted positive and negative evidential paths in a knowledge graph. In: Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, pp. 1677–1686. International Committee on Computational Linguistics (2020). https://doi.org/10.18653/v1/2020.coling-main.147. https://aclanthology.org/2020.coling-main.147

  24. Lewis, M., et al.: BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, pp. 7871–7880. Association for Computational Linguistics (2020). https://doi.org/10.18653/v1/2020.acl-main.703. https://aclanthology.org/2020.acl-main.703

  25. Li, T., Huang, W., Papasarantopoulos, N., Vougiouklis, P., Pan, J.Z.: Task-specific pre-training and prompt decomposition for knowledge graph population with language models. arXiv abs/2208.12539 (2022)


  26. Liu, N.F., Gardner, M., Belinkov, Y., Peters, M.E., Smith, N.A.: Linguistic knowledge and transferability of contextual representations. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, Minnesota, pp. 1073–1094. Association for Computational Linguistics (2019). https://doi.org/10.18653/v1/N19-1112. https://aclanthology.org/N19-1112

  27. MacCartney, B., Manning, C.D.: Modeling semantic containment and exclusion in natural language inference. In: Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), Manchester, UK, pp. 521–528. Coling 2008 Organizing Committee (2008). https://aclanthology.org/C08-1066

  28. McCloskey, M., Cohen, N.J.: Catastrophic interference in connectionist networks: the sequential learning problem. In: Psychology of Learning and Motivation, vol. 24, pp. 109–165. Elsevier (1989)


  29. Miller, G.A.: WordNet: a lexical database for English. Commun. ACM 38(11), 39–41 (1995)


  30. Peters, M.E., et al.: Knowledge enhanced contextual word representations. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 43–54 (2019)


  31. Petroni, F., et al.: Language models as knowledge bases? In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, pp. 2463–2473. Association for Computational Linguistics (2019). https://doi.org/10.18653/v1/D19-1250. https://aclanthology.org/D19-1250

  32. Poerner, N., Waltinger, U., Schütze, H.: E-BERT: efficient-yet-effective entity embeddings for BERT. In: Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 803–818 (2020)


  33. Qin, G., Eisner, J.: Learning how to ask: querying LMs with mixtures of soft prompts. In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, pp. 5203–5212. Association for Computational Linguistics (2021). https://doi.org/10.18653/v1/2021.naacl-main.410. https://aclanthology.org/2021.naacl-main.410

  34. Richardson, K., Hu, H., Moss, L., Sabharwal, A.: Probing natural language inference models through semantic fragments. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 05, pp. 8713–8721 (2020). https://doi.org/10.1609/aaai.v34i05.6397. https://ojs.aaai.org/index.php/AAAI/article/view/6397

  35. Rodrigo, Á., Peñas, A., Verdejo, F.: Overview of the answer validation exercise 2008. In: Peters, C., et al. (eds.) CLEF 2008. LNCS, vol. 5706, pp. 296–313. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-04447-2_35


  36. Sainz, O., Gonzalez-Dios, I., Lopez de Lacalle, O., Min, B., Agirre, E.: Textual entailment for event argument extraction: zero- and few-shot with multi-source learning. In: Findings of the Association for Computational Linguistics: NAACL 2022, Seattle, United States, pp. 2439–2455. Association for Computational Linguistics (2022). https://doi.org/10.18653/v1/2022.findings-naacl.187. https://aclanthology.org/2022.findings-naacl.187/

  37. Sainz, O., de Lacalle, O.L., Labaka, G., Barrena, A., Agirre, E.: Label verbalization and entailment for effective zero and few-shot relation extraction. arXiv abs/2109.03659 (2021)


  38. Serra, J., Suris, D., Miron, M., Karatzoglou, A.: Overcoming catastrophic forgetting with hard attention to the task. In: International Conference on Machine Learning, pp. 4548–4557. PMLR (2018)


  39. Shi, B., Weninger, T.: Open-world knowledge graph completion. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, no. 1 (2018). https://doi.org/10.1609/aaai.v32i1.11535. https://ojs.aaai.org/index.php/AAAI/article/view/11535

  40. Shin, T., Razeghi, Y., Logan IV, R.L., Wallace, E., Singh, S.: AutoPrompt: eliciting knowledge from language models with automatically generated prompts. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, pp. 4222–4235. Association for Computational Linguistics (2020). https://doi.org/10.18653/v1/2020.emnlp-main.346. https://aclanthology.org/2020.emnlp-main.346

  41. Shiralkar, P., Flammini, A., Menczer, F., Ciampaglia, G.L.: Finding streams in knowledge graphs to support fact checking. In: 2017 IEEE International Conference on Data Mining (ICDM), pp. 859–864 (2017). https://doi.org/10.1109/ICDM.2017.105

  42. Singhania, S., Nguyen, T.P., Razniewski, S.: LM-KBC: knowledge base construction from pre-trained language models. In: The Semantic Web Challenge on Knowledge Base Construction from Pre-trained Language Models 2022, co-located with the 21st International Semantic Web Conference (ISWC 2022), Hangzhou, China, vol. 3274 (2022). https://ceur-ws.org/Vol-3274/paper1.pdf

  43. Speer, R., Chin, J., Havasi, C.: ConceptNet 5.5: an open multilingual graph of general knowledge. In: Thirty-First AAAI Conference on Artificial Intelligence (2017)


  44. Surdeanu, M., Ji, H.: Overview of the English slot filling track at the TAC2014 knowledge base population evaluation. In: Proceedings of the Text Analysis Conference (TAC 2014) (2014)


  45. Syed, Z.H., Röder, M., Ngomo, A.-C.N.: Unsupervised discovery of corroborative paths for fact validation. In: Ghidini, C., et al. (eds.) ISWC 2019. LNCS, vol. 11778, pp. 630–646. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-30793-6_36


  46. Syed, Z.H., Röder, M., Ngonga Ngomo, A.C.: Factcheck: validating RDF triples using textual evidence. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, CIKM 2018, pp. 1599–1602. Association for Computing Machinery, New York (2018). https://doi.org/10.1145/3269206.3269308

  47. Tenney, I., et al.: What do you learn from context? Probing for sentence structure in contextualized word representations. In: International Conference on Learning Representations (2019). https://openreview.net/forum?id=SJzSgnRcKX

  48. Thorne, J., Vlachos, A.: Automated fact checking: task formulations, methods and future directions. In: Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, New Mexico, USA, pp. 3346–3359. Association for Computational Linguistics (2018). https://aclanthology.org/C18-1283

  49. Toutanova, K., Chen, D., Pantel, P., Poon, H., Choudhury, P., Gamon, M.: Representing text for joint embedding of text and knowledge bases. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, pp. 1499–1509. Association for Computational Linguistics (2015). https://doi.org/10.18653/v1/D15-1174. https://aclanthology.org/D15-1174

  50. Vrandečić, D., Krötzsch, M.: Wikidata: a free collaborative knowledgebase. Commun. ACM 57(10), 78–85 (2014)


  51. Wang, S., Fang, H., Khabsa, M., Mao, H., Ma, H.: Entailment as Few-Shot Learner (2021). arXiv:2104.14690

  52. West, R., Gabrilovich, E., Murphy, K., Sun, S., Gupta, R., Lin, D.: Knowledge base completion via search-based question answering. In: Proceedings of the 23rd International Conference on World Wide Web, WWW 2014, pp. 515–526. Association for Computing Machinery, New York (2014). https://doi.org/10.1145/2566486.2568032

  53. Williams, A., Nangia, N., Bowman, S.: A broad-coverage challenge corpus for sentence understanding through inference. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), New Orleans, Louisiana, pp. 1112–1122. Association for Computational Linguistics (2018). https://doi.org/10.18653/v1/N18-1101. https://aclanthology.org/N18-1101

  54. Wu, L., Petroni, F., Josifoski, M., Riedel, S., Zettlemoyer, L.: Scalable zero-shot entity linking with dense entity retrieval. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 6397–6407 (2020)


  55. Yao, L., Mao, C., Luo, Y.: KG-BERT: BERT for knowledge graph completion. arXiv preprint arXiv:1909.03193 (2019)

  56. Zha, H., Chen, Z., Yan, X.: Inductive Relation Prediction by BERT. In: Proceedings of the First MiniCon Conference (2022). https://aaai-022.virtualchair.net/poster_aaai7162

  57. Zhou, X., Zhang, Y., Cui, L., Huang, D.: Evaluating commonsense in pre-trained language models. In: AAAI (2020)



Acknowledgement

We are grateful to the European Commission (EU Horizon 2020 EXCELLENT SCIENCE - Research Infrastructure under grant agreement No. 101017501 RELIANCE) and ESA (Contract No. 4000135254/21/NL/GLC/kk FEPOSI) for the support received to carry out this research.

Author information


Corresponding author

Correspondence to Andrés García-Silva.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

García-Silva, A., Berrío, C., Gómez-Pérez, J.M. (2023). Textual Entailment for Effective Triple Validation in Object Prediction. In: Payne, T.R., et al. The Semantic Web – ISWC 2023. ISWC 2023. Lecture Notes in Computer Science, vol 14265. Springer, Cham. https://doi.org/10.1007/978-3-031-47240-4_5


  • DOI: https://doi.org/10.1007/978-3-031-47240-4_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-47239-8

  • Online ISBN: 978-3-031-47240-4

  • eBook Packages: Computer Science, Computer Science (R0)
