
Explanation of Link Predictions on Knowledge Graphs via Levelwise Filtering and Graph Summarization

  • Conference paper
  • First Online:
The Semantic Web (ESWC 2024)

Abstract

Link Prediction methods aim at predicting missing facts in Knowledge Graphs (KGs), which are inherently incomplete. Several methods rely on Knowledge Graph Embeddings, numerical representations of the elements in a KG. Embeddings are effective and scalable for large KGs; however, they lack explainability. Kelpie is a recent and versatile framework that provides post-hoc explanations for embedding-based predictions by revealing the facts that enabled them. However, problems have been recognized with filtering potential explanations and with the resulting overload of candidates. We aim at enhancing Kelpie by targeting three goals: reducing the number of candidates, producing explanations at different levels of detail, and improving the effectiveness of the explanations. To accomplish these goals, we adopt a semantic similarity measure to enhance the filtering of potential explanations, and we focus on a condensed representation of the search space in the form of a quotient graph based on entity types. Three quotient formulations of different granularity are considered to reduce the risk of losing valuable information. We conduct a quantitative and qualitative experimental evaluation of the proposed solutions, using Kelpie as a baseline.


Notes

  1. https://github.com/rbarile17/kelpiePP.

  2. https://dbpedia.org/sparql.

  3. https://databus.dbpedia.org/ontologies/dbpedia.org/ontology--DEV.



Acknowledgments

This work was partially supported by project FAIR - Future AI Research (PE00000013), spoke 6 - Symbiotic AI (https://future-ai-research.it/), under the NRRP MUR program funded by the NextGenerationEU and by project HypeKG - Hybrid Prediction and Explanation with Knowledge Graphs (H53D23003700006), under PRIN 2022 program funded by MUR.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Roberto Barile.

Editor information

Editors and Affiliations

Ethics declarations

Disclosure of Interests

The authors declare that they have no competing interests.

Appendices

A Appendix: Repair of Inconsistent and Unsatisfiable Ontologies

We integrated the datasets DB100K and DB50K with OWL schema axioms retrieved from various sources. The resulting ontologies were inconsistent; hence, we repaired them manually. Moreover, YAGO4-20 turned out to contain unsatisfiable classes. We identified the causes of these problems by running the explanation facility of the reasoner. In this appendix, we report some insights into the adjustments we performed to make DB100K consistent and YAGO4-20 satisfiable, hence “reasonable”. The full list of adaptations is available in our GitHub repository.

Table 5. Hyper-parameters of the models

For DB100K, we modified the types of certain instances. For instance, our SPARQL query retrieving class assertions returned both \(\textit{Politician}\) and \(\textit{TimePeriod}\) for several entities. These types led to inconsistencies, as \(\textit{Politician}\) is a subclass of \(\textit{Person}\), which, in turn, is disjoint with \(\textit{TimePeriod}\). For these entities, we kept only the type \(\textit{Politician}\).
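This kind of clash can be detected mechanically: expand each asserted type through the subclass hierarchy and check the result against the disjointness axioms. The following minimal Python sketch illustrates the idea on the example above; the toy hierarchy and axiom set are assumptions for illustration, not our actual repair tooling (we used the explanation facility of a reasoner).

```python
# Toy subclass hierarchy: child -> parent (mirrors the DB100K example)
SUBCLASS = {"Politician": "Person"}

# Classes declared pairwise disjoint
DISJOINT = {frozenset({"Person", "TimePeriod"})}

def ancestors(cls):
    """Return cls together with all of its superclasses."""
    seen = {cls}
    while cls in SUBCLASS:
        cls = SUBCLASS[cls]
        seen.add(cls)
    return seen

def clashes(types):
    """Return the disjointness axioms violated by a set of asserted types."""
    expanded = set().union(*(ancestors(t) for t in types))
    return [pair for pair in DISJOINT if pair <= expanded]

# Before the repair: Politician expands to Person, clashing with TimePeriod
print(clashes({"Politician", "TimePeriod"}))
# After keeping only Politician: no clash
print(clashes({"Politician"}))
```

The repair then amounts to dropping one of the clashing type assertions, as we did by keeping only \(\textit{Politician}\).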

We also modified schema axioms in certain cases to preserve triples that would otherwise have led to inconsistencies. For instance, we modified the range of the property \(\textit{location}\). We recall that the range of a property p specifies the classes whose instances can occur as objects in triples with predicate p. We changed the range from \(\textit{Place}\) to \(\textit{Place} \sqcup \textit{Company}\); this adjustment also accommodates triples having \(\textit{location}\) as predicate and an instance of \(\textit{Company}\) as object.
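In description-logic notation, this repair replaces the original range axiom with a strictly weaker one (a schematic rendering of the adjustment described above):

```latex
% Original range axiom: every object of a "location" triple must be a Place
\exists\, \textit{location}^{-}.\top \sqsubseteq \textit{Place}
% Repaired range axiom: objects may be Places or Companies
\exists\, \textit{location}^{-}.\top \sqsubseteq \textit{Place} \sqcup \textit{Company}
```

Since every model of the repaired axiom that assigns objects to \(\textit{Place}\) still satisfies the original intent, the weakening preserves all previously consistent triples while admitting the \(\textit{Company}\)-valued ones.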

In other cases, we needed to remove triples. For instance, we removed the triple \(\langle \textit{Subramanian\_Swamy}, \textit{region}, \textit{Economics} \rangle \). It caused an inconsistency because the type of \(\textit{Economics}\) is \(\textit{University}\), which in turn is a descendant of \(\textit{Agent}\) in the class hierarchy, whereas the range of \(\textit{region}\) is \(\textit{Place}\), which is disjoint with \(\textit{Agent}\).

For YAGO4-20, we removed certain \(\textit{subClassOf}\) axioms. For instance, the class \(\textit{Districts\_of\_Slovakia}\) was declared a subclass of \(\textit{AdministrativeArea}\), \(\textit{Product}\), and \(\textit{CreativeWork}\). It was unsatisfiable because \(\textit{AdministrativeArea}\) is a subclass of \(\textit{Localization}\), which is disjoint with \(\textit{CreativeWork}\). We kept only \(\textit{AdministrativeArea}\) and \(\textit{Product}\) as super-classes.
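The unsatisfiability argument can be spelled out as a short entailment chain (schematic notation based on the axioms just described):

```latex
\textit{Districts\_of\_Slovakia} \sqsubseteq \textit{AdministrativeArea} \sqcap \textit{CreativeWork} \\
\textit{AdministrativeArea} \sqsubseteq \textit{Localization} \qquad
\textit{Localization} \sqcap \textit{CreativeWork} \sqsubseteq \bot \\
\Rightarrow\; \textit{Districts\_of\_Slovakia} \sqsubseteq
  \textit{Localization} \sqcap \textit{CreativeWork} \sqsubseteq \bot
```

Removing the \(\textit{subClassOf}\) axiom towards \(\textit{CreativeWork}\) breaks the chain, making the class satisfiable again.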

B Appendix: Hyper-parameters

In this appendix, we report in Table 5 the hyper-parameters adopted to train each model on each dataset. We employed the same hyper-parameter values for the post-training performed during explanation extraction and for retraining the models during explanation evaluation.

Note that:

  • D is the embedding dimension; in the models we adopted, entity and relation embeddings always have the same dimension

  • p is the exponent of the p-norm

  • Lr is the learning rate

  • B is the batch size

  • Ep is the number of epochs

  • \(\gamma \) is the margin in the Pairwise Ranking Loss

  • N is the number of negative triples generated for each positive triple

  • \(\omega \) is the size of the convolutional kernels

  • Drop is the training dropout rate, specifically:

    • in is the input dropout

    • h is the dropout applied after a hidden layer

    • feat is the feature dropout

We adopted random search to find the hyper-parameter values, with the exception of B and Ep. For B, we adopted the value that best balanced execution time and parallelism. For Ep, we performed early stopping during training, with a maximum of 1000 epochs and a patience threshold of 5, and reported the epoch at which training stopped; we then used that value as the number of epochs in post-training and evaluation. Furthermore, as in Kelpie, for TransE we adopted the learning rate (Lr) values in Table 5 during training and evaluation, but a different value for post-training. For TransE, the batch size (B) is particularly large (2048) and usually far exceeds the number of triples featuring an entity. This affects post-training because, in each post-training epoch, the entity would benefit from only one optimization step. We compensated for this by increasing Lr to 0.01.
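The effect of the large batch size on post-training comes down to simple arithmetic: the number of gradient steps per epoch is the entity's triple count divided by B, rounded up. A minimal sketch (the function name and the 150-triple entity are illustrative assumptions, not values from the paper):

```python
import math

def optimization_steps_per_epoch(n_entity_triples, batch_size):
    """Gradient steps a post-trained entity receives per epoch, assuming
    only the triples featuring that entity are used for post-training."""
    return math.ceil(n_entity_triples / batch_size)

# With TransE's batch size of 2048, an entity occurring in, say, 150 triples
# collapses into a single batch, hence one optimization step per epoch:
print(optimization_steps_per_epoch(150, 2048))  # 1
```

With so few steps per epoch, each step must move the embedding further, which is why raising Lr to 0.01 for post-training restores the balance.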

Rights and permissions

Reprints and permissions

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Barile, R., d’Amato, C., Fanizzi, N. (2024). Explanation of Link Predictions on Knowledge Graphs via Levelwise Filtering and Graph Summarization. In: Meroño Peñuela, A., et al. The Semantic Web. ESWC 2024. Lecture Notes in Computer Science, vol 14664. Springer, Cham. https://doi.org/10.1007/978-3-031-60626-7_10


  • DOI: https://doi.org/10.1007/978-3-031-60626-7_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-60625-0

  • Online ISBN: 978-3-031-60626-7

  • eBook Packages: Computer Science, Computer Science (R0)
