Abstract
Structured knowledge bases (KBs) are a foundation of many intelligent applications, yet are notoriously incomplete. Language models (LMs) have recently been proposed for unsupervised knowledge base completion (KBC), but despite encouraging initial results, questions regarding their suitability remain open. Existing evaluations often fall short because they evaluate only on popular subjects, or sample facts that already exist in KBs. In this work, we introduce a novel, more challenging benchmark dataset, and a methodology tailored for a realistic assessment of the KBC potential of LMs. For automated assessment, we curate a dataset called WD-Known, which provides an unbiased random sample of Wikidata, containing over 3.9 million facts. In a second step, we perform a human evaluation on predictions that are not yet in the KB, as only this provides real insights into the added value over existing KBs. Our key finding is that biases in the dataset conception of previous benchmarks lead to a systematic overestimate of LM performance for KBC. However, our results also reveal strong areas of LMs. We could, for example, perform a significant completion of Wikidata on the relations nativeLanguage, by a factor of \(\sim \)21 (from 260k to 5.8M) at 82% precision, and citizenOf, by a factor of \(\sim \)0.3 (from 4.2M to 5.3M) at 90% precision. Moreover, we find that LMs possess surprisingly strong generalization capabilities: even on relations where most facts were not directly observed in LM training, prediction quality can be high. We open-source the benchmark dataset and code (https://github.com/bveseli/LMsForKBC).
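To make the LM-based KBC setting concrete, the following minimal sketch shows the cloze-style probing of a masked LM for a single Wikidata relation. The model choice (bert-base-cased), the prompt template, and the predict_object helper are illustrative assumptions, not the paper's exact pipeline.

```python
# Minimal sketch: probing a masked LM with a cloze prompt for the
# nativeLanguage relation. Model, template, and helper names are illustrative.
from transformers import pipeline

# Hypothetical prompt template; the paper's actual templates may differ.
TEMPLATE = "The native language of {subject} is [MASK]."

fill_mask = pipeline("fill-mask", model="bert-base-cased")

def predict_object(subject: str, top_k: int = 5):
    """Return the LM's top-k candidate objects for (subject, nativeLanguage, ?)."""
    prompt = TEMPLATE.format(subject=subject)
    return [(r["token_str"].strip(), r["score"]) for r in fill_mask(prompt, top_k=top_k)]

if __name__ == "__main__":
    # High-confidence candidates could be checked against existing KB facts
    # for automated scoring, or handed to human annotators when the fact
    # is not yet in the KB.
    for token, score in predict_object("Jean-Paul Sartre"):
        print(f"{token}\t{score:.3f}")
```

In the evaluation described above, such predictions are scored automatically against the WD-Known sample, while predictions absent from Wikidata are assessed by human annotators.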
Notes
- 1.
- 2.
Wikidata might broadly fall in between, as its aim is human-curated quality, but major portions are imported semi-automatically from other sources.
- 3.
There is often a terminological confusion here: automated editing is omnipresent on Wikidata, but the bots performing it typically execute meticulously pre-defined edit and insertion tasks (e.g., based on other structured sources) rather than relying on statistical inference.
- 4.
- 5.
- 6.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Veseli, B., Singhania, S., Razniewski, S., Weikum, G. (2023). Evaluating Language Models for Knowledge Base Completion. In: Pesquita, C., et al. The Semantic Web. ESWC 2023. Lecture Notes in Computer Science, vol 13870. Springer, Cham. https://doi.org/10.1007/978-3-031-33455-9_14
DOI: https://doi.org/10.1007/978-3-031-33455-9_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-33454-2
Online ISBN: 978-3-031-33455-9
eBook Packages: Computer Science (R0)