
Explanation of Link Predictions on Knowledge Graphs via Levelwise Filtering and Graph Summarization

  • Conference paper
  • First Online:
The Semantic Web (ESWC 2024)

Abstract

Link Prediction methods aim at predicting missing facts in Knowledge Graphs (KGs), which are inherently incomplete. Several methods rely on Knowledge Graph Embeddings, numerical representations of the elements in a KG. Embeddings are effective and scalable for large KGs; however, they lack explainability. Kelpie is a recent and versatile framework that provides post-hoc explanations for embedding-based predictions by revealing the facts that enabled them. However, problems have been recognized with filtering potential explanations and with the resulting overload of candidates. We aim at enhancing Kelpie by targeting three goals: reducing the number of candidates, producing explanations at different levels of detail, and improving the effectiveness of the explanations. To accomplish these goals, we adopt a semantic similarity measure to enhance the filtering of potential explanations, and we focus on a condensed representation of the search space in the form of a quotient graph based on entity types. Three quotient formulations of different granularity are considered to reduce the risk of losing valuable information. We conduct a quantitative and qualitative experimental evaluation of the proposed solutions, using Kelpie as a baseline.


Notes

  1. https://github.com/rbarile17/kelpiePP.

  2. https://dbpedia.org/sparql.

  3. https://databus.dbpedia.org/ontologies/dbpedia.org/ontology--DEV.



Acknowledgments

This work was partially supported by project FAIR - Future AI Research (PE00000013), spoke 6 - Symbiotic AI (https://future-ai-research.it/), under the NRRP MUR program funded by the NextGenerationEU and by project HypeKG - Hybrid Prediction and Explanation with Knowledge Graphs (H53D23003700006), under PRIN 2022 program funded by MUR.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Roberto Barile.

Editor information

Editors and Affiliations

Ethics declarations

Disclosure of Interests

The authors declare that they have no competing interests.

Appendices

A Appendix: Repair of Inconsistent and Unsatisfiable Ontologies

We integrated the datasets DB100K and DB50K with OWL schema axioms retrieved from various sources. The resulting ontologies were inconsistent; hence, we repaired them manually. Moreover, YAGO4-20 turned out to contain unsatisfiable classes. We identified the causes of these problems by running the explanation facility of the reasoner. In this appendix, we report some insights into the adjustments we performed to make DB100K consistent and YAGO4-20 satisfiable, hence “reasonable”. The full list of adaptations is available in our GitHub repository.

Table 5. Hyper-parameters of the models

For DB100K, we modified the types of certain instances. For instance, our SPARQL query retrieving class assertions returned both \(\textit{Politician}\) and \(\textit{TimePeriod}\) for several entities. These types led to inconsistencies, as \(\textit{Politician}\) is a subclass of \(\textit{Person}\), which, in turn, is disjoint with \(\textit{TimePeriod}\). For these entities, we kept only the type \(\textit{Politician}\).
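This kind of clash can be detected mechanically: expand each asserted type through the subclass hierarchy and check the result against the disjointness axioms. The following minimal Python sketch illustrates the idea on the example above; the toy hierarchy and axiom set are assumptions for illustration, not our actual repair tooling (we used the explanation facility of a reasoner).

```python
# Toy subclass hierarchy: child -> parent (mirrors the DB100K example)
SUBCLASS = {"Politician": "Person"}

# Classes declared pairwise disjoint
DISJOINT = {frozenset({"Person", "TimePeriod"})}

def ancestors(cls):
    """Return cls together with all of its superclasses."""
    seen = {cls}
    while cls in SUBCLASS:
        cls = SUBCLASS[cls]
        seen.add(cls)
    return seen

def clashes(types):
    """Return the disjointness axioms violated by a set of asserted types."""
    expanded = set().union(*(ancestors(t) for t in types))
    return [pair for pair in DISJOINT if pair <= expanded]

# Before the repair: Politician expands to Person, clashing with TimePeriod
print(clashes({"Politician", "TimePeriod"}))
# After keeping only Politician: no clash
print(clashes({"Politician"}))
```

The repair then amounts to dropping one of the clashing type assertions, as we did by keeping only \(\textit{Politician}\).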

We also modified schema axioms in certain cases to preserve triples that would otherwise have led to inconsistencies. For instance, we modified the range of the property \(\textit{location}\). We recall that the range of a property p specifies the classes whose instances can occur as objects in triples with predicate p. We changed the range from \(\textit{Place}\) to \(\textit{Place} \sqcup \textit{Company}\); this adjustment also accommodates triples having \(\textit{location}\) as predicate and an instance of \(\textit{Company}\) as object.
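In description-logic notation, this repair replaces the original range axiom with a strictly weaker one (a schematic rendering of the adjustment described above):

```latex
% Original range axiom: every object of a "location" triple must be a Place
\exists\, \textit{location}^{-}.\top \sqsubseteq \textit{Place}
% Repaired range axiom: objects may be Places or Companies
\exists\, \textit{location}^{-}.\top \sqsubseteq \textit{Place} \sqcup \textit{Company}
```

Since every model of the repaired axiom that assigns objects to \(\textit{Place}\) still satisfies the original intent, the weakening preserves all previously consistent triples while admitting the \(\textit{Company}\)-valued ones.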

In other cases, we needed to remove triples. For instance, we removed the triple \(\langle \textit{Subramanian\_Swamy}, \textit{region}, \textit{Economics} \rangle \). It caused an inconsistency because the type of \(\textit{Economics}\) is \(\textit{University}\), which in turn is a descendant of \(\textit{Agent}\) in the class hierarchy, whereas the range of \(\textit{region}\) is \(\textit{Place}\), which is disjoint with \(\textit{Agent}\).

For YAGO4-20, we removed certain \(\textit{subClassOf}\) axioms. For instance, the class \(\textit{Districts\_of\_Slovakia}\) was declared a subclass of \(\textit{AdministrativeArea}\), \(\textit{Product}\), and \(\textit{CreativeWork}\). It was unsatisfiable because \(\textit{AdministrativeArea}\) is a subclass of \(\textit{Localization}\), which is disjoint with \(\textit{CreativeWork}\). We kept only \(\textit{AdministrativeArea}\) and \(\textit{Product}\) as super-classes.
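The unsatisfiability argument can be spelled out as a short entailment chain (schematic notation based on the axioms just described):

```latex
\textit{Districts\_of\_Slovakia} \sqsubseteq \textit{AdministrativeArea} \sqcap \textit{CreativeWork} \\
\textit{AdministrativeArea} \sqsubseteq \textit{Localization} \qquad
\textit{Localization} \sqcap \textit{CreativeWork} \sqsubseteq \bot \\
\Rightarrow\; \textit{Districts\_of\_Slovakia} \sqsubseteq
  \textit{Localization} \sqcap \textit{CreativeWork} \sqsubseteq \bot
```

Removing the \(\textit{subClassOf}\) axiom towards \(\textit{CreativeWork}\) breaks the chain, making the class satisfiable again.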

B Appendix: Hyper-parameters

In this appendix, we report in Table 5 the hyper-parameters adopted to train each model on each dataset. We employed the same hyper-parameter values for the post-training performed during explanation extraction and for retraining the models during explanation evaluation.

Note that:

  • D is the embedding dimension; in the models we adopted, entity and relation embeddings always have the same dimension

  • p is the exponent of the p-norm

  • Lr is the learning rate

  • B is the batch size

  • Ep is the number of epochs

  • \(\gamma \) is the margin in the Pairwise Ranking Loss

  • N is the number of negative triples generated for each positive triple

  • \(\omega \) is the size of the convolutional kernels

  • Drop is the training dropout rate, specifically:

    • in is the input dropout

    • h is the dropout applied after a hidden layer

    • feat is the feature dropout

We adopted random search to find the hyper-parameter values, with the exception of B and Ep. For B, we adopted the value that best balanced execution time and parallelism. For Ep, we performed early stopping during training, with a maximum of 1000 epochs and a patience threshold of 5, and reported the epoch at which training stopped; we then used that value as the number of epochs in post-training and evaluation. Furthermore, as in Kelpie, for TransE we adopted the learning rate (Lr) values in Table 5 during training and evaluation, but a different value for post-training. For TransE, the batch size (B) is particularly large (2048) and usually far exceeds the number of triples featuring an entity. This affects post-training because, in each post-training epoch, the entity would benefit from only one optimization step. We compensated for this by increasing Lr to 0.01.
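The effect of the large batch size on post-training comes down to simple arithmetic: the number of gradient steps per epoch is the entity's triple count divided by B, rounded up. A minimal sketch (the function name and the 150-triple entity are illustrative assumptions, not values from the paper):

```python
import math

def optimization_steps_per_epoch(n_entity_triples, batch_size):
    """Gradient steps a post-trained entity receives per epoch, assuming
    only the triples featuring that entity are used for post-training."""
    return math.ceil(n_entity_triples / batch_size)

# With TransE's batch size of 2048, an entity occurring in, say, 150 triples
# collapses into a single batch, hence one optimization step per epoch:
print(optimization_steps_per_epoch(150, 2048))  # 1
```

With so few steps per epoch, each step must move the embedding further, which is why raising Lr to 0.01 for post-training restores the balance.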

Rights and permissions

Reprints and permissions

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Barile, R., d’Amato, C., Fanizzi, N. (2024). Explanation of Link Predictions on Knowledge Graphs via Levelwise Filtering and Graph Summarization. In: Meroño Peñuela, A., et al. The Semantic Web. ESWC 2024. Lecture Notes in Computer Science, vol 14664. Springer, Cham. https://doi.org/10.1007/978-3-031-60626-7_10


  • DOI: https://doi.org/10.1007/978-3-031-60626-7_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-60625-0

  • Online ISBN: 978-3-031-60626-7

  • eBook Packages: Computer Science, Computer Science (R0)
