Supervised Knowledge Aggregation for Knowledge Graph Completion

Part of the Lecture Notes in Computer Science book series (LNCS, volume 13261)

Abstract

We explore data-driven rule aggregation based on latent feature representations in the context of knowledge graph completion. For a given query and a collection of rules obtained by a symbolic rule learning system, we propose end-to-end trainable aggregation functions that combine the rules into a confidence score answering the query. Despite using latent feature representations for rules, the proposed models remain fully interpretable in terms of the underlying symbolic approach. Our models consistently improve over the base learner and achieve competitive results on various benchmark knowledge graphs, and they outperform the current state-of-the-art on a biomedical knowledge graph by a significant margin. We argue that our approach is particularly well suited for link prediction tasks on large multi-relational knowledge graphs with several million triples where the queries of interest focus on one specific target relation.

Notes

  1. The project repository and code can be found at this URL.

  2. All derivations throughout the work are equivalent for the head/subject direction.

  3. The common gradient approximation for the \(\mathtt {Max}\) function is, e.g., \(\nabla \max(y_1, y_2) = [1, 0]\) for \(y_1 > y_2\) and \([0, 1]\) otherwise.
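
As an aside (our illustration, not part of the paper), this is also the convention PyTorch follows when backpropagating through a max, which a minimal check confirms:

```python
import torch

# Note 3's subgradient convention: the gradient of max(y_1, y_2) is
# routed entirely to the larger input.
y = torch.tensor([2.0, 1.0], requires_grad=True)
y.max().backward()
print(y.grad)  # tensor([1., 0.]) since y_1 > y_2
```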


Author information

Correspondence to Patrick Betz.

A Experimental Details

A.1 Model Input

On the highest abstraction level, our models take as input a list of rules and output a real-valued score. More precisely, for a query \(q=(s,p,?)\) (the head direction is handled analogously) we collect the \(top\_n\) answer candidates \(c_i\) proposed by AnyBURL, that is, the candidates generated by at least one rule. For each of these candidates, the respective list of rules defines the model input, and the descriptions in the main text apply. Finally, we obtain a vector of scores and thereby a ranking over all candidate answers \(c_i\). At test time, this ranking can be used directly for the evaluation. At training time, we distinguish the true answer/candidate \(c^{*}\) from the remaining candidates \(c'\), which we filter with the training set, i.e., we exclude a \(c'\ne c^{*}\) if the triple \((s,p,c')\) exists in the training set. For a ranking loss, we can then calculate the query loss of q as explained in Sect. 5.2. For an arbitrary loss function such as cross-entropy, \(c^*\) defines the true candidate and the remaining candidates \(c'\) define the reference candidates, or pseudo-negative candidates.
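
As a minimal sketch of this pipeline (our own illustration with hypothetical helper names, not the authors' code):

```python
def score_candidates(query, rule_engine, aggregator, top_n=100):
    """Hypothetical sketch of the input pipeline described above.

    `rule_engine.candidates` is assumed to return, for a query q=(s,p,?),
    the top_n candidates c_i together with the rules that generated them;
    `aggregator` maps a list of rules to a real-valued confidence score.
    """
    candidates = rule_engine.candidates(query, top_n=top_n)  # {c_i: [rules]}
    return {c: aggregator(rules) for c, rules in candidates.items()}

def split_training_candidates(scores, true_answer, train_triples, query):
    """Filter the pseudo-negatives: a candidate c' != c* is excluded
    whenever the triple (s, p, c') already exists in the training set."""
    s, p = query[:2]
    negatives = {c: sc for c, sc in scores.items()
                 if c != true_answer and (s, p, c) not in train_triples}
    return scores[true_answer], negatives
```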

A.2 Hyperparameters

For all experiments, we use a maximum top-n of 100, the Adagrad optimizer, and a batch size of 256. Training uses early stopping based on the validation set. LibKGE-based configuration files for the experiments are provided in the supplementary material.
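
A skeleton of the training setup under these settings might look as follows (a sketch with placeholder model/data interfaces, not the released code):

```python
import torch

def train(model, train_loader, validate, lr, max_epochs=500, patience=5):
    # Settings from A.2: Adagrad optimizer; the batch size of 256 is
    # configured in train_loader; early stopping on the validation set.
    optimizer = torch.optim.Adagrad(model.parameters(), lr=lr)
    best_mrr, epochs_without_improvement = 0.0, 0
    for epoch in range(max_epochs):
        for batch in train_loader:
            optimizer.zero_grad()
            loss = model.query_loss(batch)  # hypothetical: e.g., the ranking loss of Sect. 5.2
            loss.backward()
            optimizer.step()
        mrr = validate(model)  # e.g., filtered MRR on the validation split
        if mrr > best_mrr:
            best_mrr, epochs_without_improvement = mrr, 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
```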

Sparse Aggregator. The hyperparameters we tune are the dropout on the latent features, the latent dimension d, and the learning rate lr. For Hetionet we set d = 10, dropout = 0.15, and lr = 0.9. For FB15k-237 we set d = 40, dropout = 0.4, and lr = 0.02. For WNRR we set d = 50, dropout = 0.4, and lr = 0.03. For CoDEx-M we set d = 40, dropout = 0.4, and lr = 0.02. For all experiments we use \(\lambda = 5\) when training on the mean-rank loss.

Dense Aggregator. The dense aggregator follows the architecture of the PyTorch BERT encoder, with the modifications explained in Sect. 5.3. We use 4 heads and 4 layers throughout all experiments; the feed-forward dimensionality within the encoder is 256. For Hetionet we use d = 20, dropout = 0.15, and lr = 0.01. For the remaining datasets we use d = 56, dropout = 0.15, and lr = 0.005. For all experiments, we cap the number of rules per input list at 50 for the dense aggregator.
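
For concreteness, a rough approximation of such an encoder using PyTorch built-ins and the stated values (4 heads, 4 layers, feed-forward dimension 256); the authors' model additionally modifies the encoder as described in Sect. 5.3, so this is only a sketch:

```python
import torch
import torch.nn as nn

class DenseAggregatorSketch(nn.Module):
    """Hedged approximation: a BERT-style encoder over (at most 50) rule embeddings."""
    def __init__(self, num_rules, d=56, heads=4, layers=4, ff_dim=256, dropout=0.15):
        super().__init__()
        self.rule_emb = nn.Embedding(num_rules, d)
        layer = nn.TransformerEncoderLayer(
            d_model=d, nhead=heads, dim_feedforward=ff_dim,
            dropout=dropout, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)
        self.score = nn.Linear(d, 1)

    def forward(self, rule_ids, padding_mask):
        # rule_ids: (batch, <=50) rule indices; padding_mask: True where padded.
        x = self.encoder(self.rule_emb(rule_ids),
                         src_key_padding_mask=padding_mask)
        # Pool over non-padded positions and map to a scalar confidence.
        mask = (~padding_mask).unsqueeze(-1).float()
        pooled = (x * mask).sum(1) / mask.sum(1).clamp(min=1.0)
        return self.score(pooled).squeeze(-1)
```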

A.3 Rule Sets

The base data for our experiments are the rules learned with AnyBURL, which are processed in a pre-processing pipeline to generate the inputs for the aggregators as explained above. For all datasets we exclude AC2 rules and rules with an empty body. This leaves the AnyBURL performance mostly unchanged; nevertheless, we report the most recent AnyBURL results published by the authors.

For WNRR, rules are mined for 3600 s and we set the maximum length of cyclical rules to 5, as suggested in the AnyBURL documentation. All learned rules are processed for training the models on this dataset. For the remaining datasets, the default AnyBURL parameters are used. Here, we prune the learned rule sets slightly and only process rules that made at least 5 (10) true predictions for the sparse (dense) aggregator. On Hetionet, rules are learned for 1000 s for both dense and sparse. On FB15k-237, rules are learned for 3600 (500) s for sparse (dense). Finally, on CoDEx-M, rules are learned for 1000 (500) s for sparse (dense).
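
The pruning step itself is straightforward; a sketch under an assumed (rule, true-prediction-count) representation, which differs from the actual AnyBURL file format:

```python
def prune_rules(rules, min_true_predictions):
    # `rules`: list of (rule_string, true_prediction_count) pairs -- an
    # assumed in-memory format; the real AnyBURL rule files differ.
    return [r for r, count in rules if count >= min_true_predictions]

# Thresholds from A.3: at least 5 true predictions for the sparse
# aggregator and 10 for the dense aggregator.
example = [("h(X,Y) <= b(X,Y)", 12), ("h(X,Y) <= c(X,A), d(A,Y)", 3)]
print(prune_rules(example, min_true_predictions=5))   # keeps only the first rule
print(prune_rules(example, min_true_predictions=10))  # keeps only the first rule
```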


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Betz, P., Meilicke, C., Stuckenschmidt, H. (2022). Supervised Knowledge Aggregation for Knowledge Graph Completion. In: , et al. The Semantic Web. ESWC 2022. Lecture Notes in Computer Science, vol 13261. Springer, Cham. https://doi.org/10.1007/978-3-031-06981-9_5

  • DOI: https://doi.org/10.1007/978-3-031-06981-9_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-06980-2

  • Online ISBN: 978-3-031-06981-9
