Abstract
Knowledge base population seeks to expand knowledge graphs with facts that are typically extracted from a text corpus. Recently, language models pretrained on large corpora have been shown to contain factual knowledge that can be retrieved using cloze-style strategies. Such an approach enables zero-shot recall of facts, showing competitive results in object prediction compared to supervised baselines. However, prompt-based fact retrieval can be brittle and depends heavily on the prompts and context used, which may produce results that are unintended or hallucinatory. We propose to use textual entailment to validate facts extracted from language models through cloze statements. Our results show that triple validation based on textual entailment improves language model predictions in different training regimes. Furthermore, we show that entailment-based triple validation is also effective for validating candidate facts extracted from other sources, including existing knowledge graphs and text passages where named entities are recognized.
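The abstract describes two verbalization steps: turning a candidate triple into a cloze statement to query a language model, and turning it into a premise/hypothesis pair so an entailment (NLI) model can validate the extracted fact. The sketch below illustrates only that verbalization logic; the template strings and the names `RELATION_TEMPLATES`, `cloze_statement`, and `entailment_pair` are hypothetical, not the paper's actual implementation.

```python
# Illustrative templates mapping relations to natural-language patterns.
# These are assumed examples, not the paper's relation mapping.
RELATION_TEMPLATES = {
    "birthplace": "{subject} was born in {object}.",
    "employer": "{subject} works for {object}.",
}

def cloze_statement(subject, relation, mask_token="[MASK]"):
    """Cloze query for object prediction: the object slot is masked
    so a masked language model can fill it in."""
    return RELATION_TEMPLATES[relation].format(subject=subject, object=mask_token)

def entailment_pair(subject, relation, candidate_object, evidence):
    """Premise = supporting text, hypothesis = verbalized triple.
    An NLI model that scores 'entailment' high for this pair would
    accept the candidate triple; 'contradiction' would reject it."""
    hypothesis = RELATION_TEMPLATES[relation].format(
        subject=subject, object=candidate_object)
    return evidence, hypothesis

print(cloze_statement("Marie Curie", "birthplace"))
# → Marie Curie was born in [MASK].

premise, hypothesis = entailment_pair(
    "Marie Curie", "birthplace", "Warsaw",
    "Maria Sklodowska, later known as Marie Curie, was born in Warsaw.")
print(hypothesis)
# → Marie Curie was born in Warsaw.
```

In practice the premise/hypothesis pair would be passed to a pretrained NLI model (e.g. one fine-tuned on MNLI [85]), and the triple kept only if the entailment probability exceeds a threshold.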
Notes
- 2. The KBP evaluation track of the TAC [14] is a long-running initiative. However, manual system evaluation makes it hard to reproduce evaluations for new systems.
- 3. See the RDF schema at https://www.w3.org/TR/rdf-primer/#properties.
- 4. While rdf:type is the standard property used to state that a resource is an instance of a class, some knowledge graphs may use other ad hoc properties.
- 5. Due to the diverse nature of the MISC category, we do not consider it.
- 16. The relation mapping can be found in the paper repository.
References
Adel, H., Schütze, H.: Type-aware convolutional neural networks for slot filling. J. Artif. Intell. Res. 66, 297–339 (2019)
Alivanistos, D., Santamaría, S., Cochez, M., Kalo, J., van Krieken, E., Thanapalasingam, T.: Prompting as probing: using language models for knowledge base construction. In: Singhania, S., Nguyen, T.P., Razniewski, S. (eds.) LM-KBC 2022 Knowledge Base Construction from Pre-trained Language Models 2022, pp. 11–34. CEUR Workshop Proceedings, CEUR-WS.org (2022)
Balazevic, I., Allen, C., Hospedales, T.: TuckER: tensor factorization for knowledge graph completion. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, pp. 5185–5194. Association for Computational Linguistics (2019). https://doi.org/10.18653/v1/D19-1522. https://aclanthology.org/D19-1522
Balog, K.: Populating knowledge bases. In: Balog, K. (ed.) Entity-Oriented Search. TIRS, vol. 39, pp. 189–222. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-93935-3_6
Bentivogli, L., Clark, P., Dagan, I., Giampiccolo, D.: The seventh PASCAL recognizing textual entailment challenge. In: Proceedings of the Text Analysis Conference (TAC 2011) (2011)
Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., Yakhnenko, O.: Translating embeddings for modeling multi-relational data. In: Advances in Neural Information Processing Systems, vol. 26. Curran Associates, Inc. (2013). https://papers.nips.cc/paper/2013/hash/1cecc7a77928ca8133fa24680a88d2f9-Abstract.html
Bouraoui, Z., Camacho-Collados, J., Schockaert, S.: Inducing relational knowledge from BERT. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 7456–7463 (2020)
Bowman, S.R., Angeli, G., Potts, C., Manning, C.D.: A large annotated corpus for learning natural language inference. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, pp. 632–642. Association for Computational Linguistics (2015). https://doi.org/10.18653/v1/D15-1075. https://aclanthology.org/D15-1075
Cao, B., et al.: Knowledgeable or educated guess? Revisiting language models as knowledge bases. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online, pp. 1860–1874. Association for Computational Linguistics (2021). https://doi.org/10.18653/v1/2021.acl-long.146. https://aclanthology.org/2021.acl-long.146
Chen, Z., Feng, Y., Zhao, D.: Entailment graph learning with textual entailment and soft transitivity. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, pp. 5899–5910. Association for Computational Linguistics (2022). https://doi.org/10.18653/v1/2022.acl-long.406. https://aclanthology.org/2022.acl-long.406
Dagan, I., Glickman, O., Magnini, B.: The PASCAL recognising textual entailment challenge. In: Quiñonero-Candela, J., Dagan, I., Magnini, B., d’Alché-Buc, F. (eds.) MLCW 2005. LNCS (LNAI), vol. 3944, pp. 177–190. Springer, Heidelberg (2006). https://doi.org/10.1007/11736790_9
Galárraga, L., Razniewski, S., Amarilli, A., Suchanek, F.M.: Predicting completeness in knowledge bases. In: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, WSDM 2017, New York, NY, USA, pp. 375–383. Association for Computing Machinery (2017). https://doi.org/10.1145/3018661.3018739
Gerber, D., et al.: Defacto-temporal and multilingual deep fact validation. Web Semant. 35(P2), 85–101 (2015). https://doi.org/10.1016/j.websem.2015.08.001
Getman, J., Ellis, J., Strassel, S., Song, Z., Tracey, J.: Laying the groundwork for knowledge base population: nine years of linguistic resources for TAC KBP. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan. European Language Resources Association (ELRA) (2018). https://aclanthology.org/L18-1245
Goodfellow, I.J., Mirza, M., Xiao, D., Courville, A.C., Bengio, Y.: An empirical investigation of catastrophic forgetting in gradient-based neural networks. CoRR abs/1312.6211 (2013)
Guo, Z., Schlichtkrull, M., Vlachos, A.: A survey on automated fact-checking. Trans. Assoc. Comput. Linguist. 10, 178–206 (2022). https://doi.org/10.1162/tacl_a_00454
Hosseini, M.J., Cohen, S.B., Johnson, M., Steedman, M.: Open-domain contextual link prediction and its complementarity with entailment graphs. In: Findings of the Association for Computational Linguistics: EMNLP 2021, Punta Cana, Dominican Republic, pp. 2790–2802. Association for Computational Linguistics (2021). https://doi.org/10.18653/v1/2021.findings-emnlp.238. https://aclanthology.org/2021.findings-emnlp.238/
Huang, L., Sil, A., Ji, H., Florian, R.: Improving slot filling performance with attentive neural networks on dependency structures. In: EMNLP (2017)
Huguet Cabot, P.L., Navigli, R.: REBEL: relation extraction by end-to-end language generation. In: Findings of the Association for Computational Linguistics: EMNLP 2021, Punta Cana, Dominican Republic, pp. 2370–2381. Association for Computational Linguistics (2021). https://doi.org/10.18653/v1/2021.findings-emnlp.204. https://aclanthology.org/2021.findings-emnlp.204
Jaradeh, M.Y., Singh, K., Stocker, M., Auer, S.: Triple classification for scholarly knowledge graph completion. In: Proceedings of the 11th on Knowledge Capture Conference, pp. 225–232 (2021)
Ji, H., Grishman, R.: Knowledge base population: successful approaches and challenges. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, Oregon, USA, pp. 1148–1158. Association for Computational Linguistics (2011). https://aclanthology.org/P11-1115
Ji, S., Pan, S., Cambria, E., Marttinen, P., Yu, P.S.: A survey on knowledge graphs: representation, acquisition, and applications. IEEE Trans. Neural Netw. Learn. Syst. 33(2), 494–514 (2022). https://doi.org/10.1109/TNNLS.2021.3070843
Kim, J., Choi, K.s.: Unsupervised fact checking by counter-weighted positive and negative evidential paths in a knowledge graph. In: Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, pp. 1677–1686. International Committee on Computational Linguistics (2020). https://doi.org/10.18653/v1/2020.coling-main.147. https://aclanthology.org/2020.coling-main.147
Lewis, M., et al.: BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, pp. 7871–7880. Association for Computational Linguistics (2020). https://doi.org/10.18653/v1/2020.acl-main.703. https://aclanthology.org/2020.acl-main.703
Li, T., Huang, W., Papasarantopoulos, N., Vougiouklis, P., Pan, J.Z.: Task-specific pre-training and prompt decomposition for knowledge graph population with language models. arXiv abs/2208.12539 (2022)
Liu, N.F., Gardner, M., Belinkov, Y., Peters, M.E., Smith, N.A.: Linguistic knowledge and transferability of contextual representations. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, Minnesota, pp. 1073–1094. Association for Computational Linguistics (2019). https://doi.org/10.18653/v1/N19-1112. https://aclanthology.org/N19-1112
MacCartney, B., Manning, C.D.: Modeling semantic containment and exclusion in natural language inference. In: Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), Manchester, UK, pp. 521–528. Coling 2008 Organizing Committee (2008). https://aclanthology.org/C08-1066
McCloskey, M., Cohen, N.J.: Catastrophic interference in connectionist networks: the sequential learning problem. In: Psychology of Learning and Motivation, vol. 24, pp. 109–165. Elsevier (1989)
Miller, G.A.: WordNet: a lexical database for English. Commun. ACM 38(11), 39–41 (1995)
Peters, M.E., et al.: Knowledge enhanced contextual word representations. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 43–54 (2019)
Petroni, F., et al.: Language models as knowledge bases? In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, pp. 2463–2473. Association for Computational Linguistics (2019). https://doi.org/10.18653/v1/D19-1250. https://aclanthology.org/D19-1250
Poerner, N., Waltinger, U., Schütze, H.: E-BERT: efficient-yet-effective entity embeddings for BERT. In: Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 803–818 (2020)
Qin, G., Eisner, J.: Learning how to ask: querying LMs with mixtures of soft prompts. In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, pp. 5203–5212. Association for Computational Linguistics (2021). https://doi.org/10.18653/v1/2021.naacl-main.410. https://aclanthology.org/2021.naacl-main.410
Richardson, K., Hu, H., Moss, L., Sabharwal, A.: Probing natural language inference models through semantic fragments. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 05, pp. 8713–8721 (2020). https://doi.org/10.1609/aaai.v34i05.6397. https://ojs.aaai.org/index.php/AAAI/article/view/6397
Rodrigo, Á., Peñas, A., Verdejo, F.: Overview of the answer validation exercise 2008. In: Peters, C., et al. (eds.) CLEF 2008. LNCS, vol. 5706, pp. 296–313. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-04447-2_35
Sainz, O., Gonzalez-Dios, I., Lopez de Lacalle, O., Min, B., Agirre, E.: Textual entailment for event argument extraction: zero- and few-shot with multi-source learning. In: Findings of the Association for Computational Linguistics: NAACL 2022, Seattle, United States, pp. 2439–2455. Association for Computational Linguistics (2022). https://doi.org/10.18653/v1/2022.findings-naacl.187. https://aclanthology.org/2022.findings-naacl.187/
Sainz, O., de Lacalle, O.L., Labaka, G., Barrena, A., Agirre, E.: Label verbalization and entailment for effective zero and few-shot relation extraction. arXiv abs/2109.03659 (2021)
Serra, J., Suris, D., Miron, M., Karatzoglou, A.: Overcoming catastrophic forgetting with hard attention to the task. In: International Conference on Machine Learning, pp. 4548–4557. PMLR (2018)
Shi, B., Weninger, T.: Open-world knowledge graph completion. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, no. 1 (2018). https://doi.org/10.1609/aaai.v32i1.11535. https://ojs.aaai.org/index.php/AAAI/article/view/11535
Shin, T., Razeghi, Y., Logan IV, R.L., Wallace, E., Singh, S.: AutoPrompt: eliciting knowledge from language models with automatically generated prompts. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, pp. 4222–4235. Association for Computational Linguistics (2020). https://doi.org/10.18653/v1/2020.emnlp-main.346. https://aclanthology.org/2020.emnlp-main.346
Shiralkar, P., Flammini, A., Menczer, F., Ciampaglia, G.L.: Finding streams in knowledge graphs to support fact checking. In: 2017 IEEE International Conference on Data Mining (ICDM), pp. 859–864 (2017). https://doi.org/10.1109/ICDM.2017.105
Singhania, S., Nguyen, T.P., Razniewski, S.: LM-KBC: knowledge base construction from pre-trained language models. In: The Semantic Web Challenge on Knowledge Base Construction from Pre-trained Language Models 2022, co-located with the 21st International Semantic Web Conference (ISWC 2022), Hangzhou, China, vol. 3274 (2022). https://ceur-ws.org/Vol-3274/paper1.pdf
Speer, R., Chin, J., Havasi, C.: ConceptNet 5.5: an open multilingual graph of general knowledge. In: Thirty-First AAAI Conference on Artificial Intelligence (2017)
Surdeanu, M., Ji, H.: Overview of the English slot filling track at the TAC2014 knowledge base population evaluation. In: Proceedings of Text Analysis Conference (TAC 2014) (2014)
Syed, Z.H., Röder, M., Ngomo, A.-C.N.: Unsupervised discovery of corroborative paths for fact validation. In: Ghidini, C., et al. (eds.) ISWC 2019. LNCS, vol. 11778, pp. 630–646. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-30793-6_36
Syed, Z.H., Röder, M., Ngonga Ngomo, A.C.: Factcheck: validating RDF triples using textual evidence. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, CIKM 2018, pp. 1599–1602. Association for Computing Machinery, New York (2018). https://doi.org/10.1145/3269206.3269308
Tenney, I., et al.: What do you learn from context? Probing for sentence structure in contextualized word representations. In: International Conference on Learning Representations (2019). https://openreview.net/forum?id=SJzSgnRcKX
Thorne, J., Vlachos, A.: Automated fact checking: task formulations, methods and future directions. In: Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, New Mexico, USA, pp. 3346–3359. Association for Computational Linguistics (2018). https://aclanthology.org/C18-1283
Toutanova, K., Chen, D., Pantel, P., Poon, H., Choudhury, P., Gamon, M.: Representing text for joint embedding of text and knowledge bases. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, pp. 1499–1509. Association for Computational Linguistics (2015). https://doi.org/10.18653/v1/D15-1174. https://aclanthology.org/D15-1174
Vrandečić, D., Krötzsch, M.: Wikidata: a free collaborative knowledgebase. Commun. ACM 57(10), 78–85 (2014)
Wang, S., Fang, H., Khabsa, M., Mao, H., Ma, H.: Entailment as Few-Shot Learner (2021). arXiv:2104.14690
West, R., Gabrilovich, E., Murphy, K., Sun, S., Gupta, R., Lin, D.: Knowledge base completion via search-based question answering. In: Proceedings of the 23rd International Conference on World Wide Web, WWW 2014, pp. 515–526. Association for Computing Machinery, New York (2014). https://doi.org/10.1145/2566486.2568032
Williams, A., Nangia, N., Bowman, S.: A broad-coverage challenge corpus for sentence understanding through inference. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), New Orleans, Louisiana, pp. 1112–1122. Association for Computational Linguistics (2018). https://doi.org/10.18653/v1/N18-1101. https://aclanthology.org/N18-1101
Wu, L., Petroni, F., Josifoski, M., Riedel, S., Zettlemoyer, L.: Scalable zero-shot entity linking with dense entity retrieval. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 6397–6407 (2020)
Yao, L., Mao, C., Luo, Y.: KG-BERT: BERT for knowledge graph completion. arXiv preprint arXiv:1909.03193 (2019)
Zha, H., Chen, Z., Yan, X.: Inductive relation prediction by BERT. In: Proceedings of the First MiniCon Conference (2022). https://aaai-022.virtualchair.net/poster_aaai7162
Zhou, X., Zhang, Y., Cui, L., Huang, D.: Evaluating commonsense in pre-trained language models. In: AAAI (2020)
Acknowledgement
We are grateful to the European Commission (EU Horizon 2020 EXCELLENT SCIENCE - Research Infrastructure under grant agreement No. 101017501 RELIANCE) and ESA (Contract No. 4000135254/21/NL/GLC/kk FEPOSI) for the support received to carry out this research.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
García-Silva, A., Berrío, C., Gómez-Pérez, J.M. (2023). Textual Entailment for Effective Triple Validation in Object Prediction. In: Payne, T.R., et al. The Semantic Web – ISWC 2023. ISWC 2023. Lecture Notes in Computer Science, vol 14265. Springer, Cham. https://doi.org/10.1007/978-3-031-47240-4_5
DOI: https://doi.org/10.1007/978-3-031-47240-4_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47239-8
Online ISBN: 978-3-031-47240-4
eBook Packages: Computer Science (R0)