Efficient energy-based embedding models for link prediction in knowledge graphs

Abstract

We focus on the problem of link prediction in Knowledge Graphs, with the goal of discovering new facts. To this end, Energy-Based Models for Knowledge Graphs, which embed entities and relations in continuous vector spaces, have been widely used. Their main limitation lies in the parameter learning phase, which may require a large amount of time to converge to optimal solutions. In this article, we first propose a unified view of different Energy-Based Embedding Models. Then, to improve the model training phase, we propose the adoption of adaptive learning rates. We show that, by adopting adaptive learning rates during training, we can improve the efficiency of the parameter learning process by an order of magnitude, while obtaining more accurate link prediction models in a significantly lower number of iterations. We extensively evaluate the proposed learning procedure on a variety of new models: our results show a significant improvement over state-of-the-art link prediction methods on two large Knowledge Graphs, namely WordNet and Freebase.

Notes

  1. http://www.w3.org/TR/rdf11-concepts/

  2. This description is taken from the Freebase KG (Bollacker et al. 2008).

  3. For readability reasons, we describe entities and relations using an intuitive way of writing down triples as text rather than using the pure RDF syntax.

  4. State of the LOD Cloud 2014: http://lod-cloud.net/

  5. Available at https://developers.google.com/freebase/data

  6. If X is a continuous random variable, then \(Z(\beta) = \int_{\tilde{x} \in \mathcal{X}} e^{-\beta E(\tilde{x})}\).

  7. https://github.com/pminervini/ebemkg/

References

  1. Airoldi, E.M., Blei, D.M., Fienberg, S.E., & Xing, E.P. (2008). Mixed membership stochastic blockmodels. Journal of Machine Learning Research, 9, 1981–2014.


  2. Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R., & Hellmann, S. (2009). DBpedia - A crystallization point for the web of data. Journal of Web Semantics, 7(3), 154–165.

  3. Bollacker, K.D., Evans, C., Paritosh, P., Sturge, T., & Taylor, J. (2008). Freebase: a collaboratively created graph database for structuring human knowledge. In Wang, J.T. (Ed.) Proceedings of the ACM SIGMOD international conference on management of data, SIGMOD 2008 (pp. 1247–1250). Vancouver: ACM.

  4. Bordes, A., & Gabrilovich, E. (2015). Constructing and mining web-scale knowledge graphs: WWW 2015 tutorial. In Gangemi, A., Leonardi, S., & Panconesi, A. (Eds.) Proceedings of the 24th international conference on world wide web companion, WWW 2015 - companion volume: ACM.

  5. Bordes, A., & Gabrilovich, E. (2014). Constructing and mining web-scale knowledge graphs: KDD 2014 tutorial. In The 20th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’14 (p. 1967).

  6. Bordes, A., Weston, J., Collobert, R., & Bengio, Y. (2011). Learning structured embeddings of knowledge bases. In Burgard, W. et al. (Eds.) Proceedings of the twenty-fifth AAAI conference on artificial intelligence, AAAI 2011. San Francisco: AAAI Press.

  7. Bordes, A., Usunier, N., García-Durán, A., Weston, J., & Yakhnenko, O. (2013). Translating embeddings for modeling multi-relational data. In Burges, C.J.C. et al. (Eds.) Proceedings of the 27th Annual Conference on Neural Information Processing Systems (pp. 2787–2795). Lake Tahoe, Nevada.

  8. Bordes, A., Glorot, X., Weston, J., & Bengio, Y. (2014). A semantic matching energy function for learning with multi-relational data - application to word-sense disambiguation. Machine Learning, 94(2), 233–259.

  9. Chang, K., Yih, W., Yang, B., & Meek, C. (2014). Typed tensor decomposition of knowledge bases for relation extraction. In Moschitti, A. et al. (Eds.) Proceedings of the 2014 conference on empirical methods in natural language processing, EMNLP 2014, October 25-29, 2014. A meeting of SIGDAT, a Special Interest Group of the ACL (pp. 1568–1579). Doha: ACL.

  10. Dean, J., Corrado, G., Monga, R., Chen, K., Devin, M., Le, Q.V., Mao, M.Z., Ranzato, M., Senior, A.W., Tucker, P.A., Yang, K., & Ng, A.Y. (2012). Large scale distributed deep networks. In Bartlett, P.L. et al. (Eds.) Proceedings of the 26th Annual Conference on Neural Information Processing Systems (pp. 1232–1240). Lake Tahoe, Nevada.

  11. De Raedt, L., Dries, A., Thon, I., Van den Broeck, G., & Verbeke, M. (2015). Inducing probabilistic relational rules from probabilistic examples. In Yang, Q., & Wooldridge, M. (Eds.) Proceedings of the 24th international joint conference on artificial intelligence (IJCAI), 2015 (pp. 1835–1843): AAAI Press.

  12. Dong, X., Gabrilovich, E., Heitz, G., Horn, W., Lao, N., Murphy, K., Strohmann, T., Sun, S., & Zhang, W. (2014). Knowledge vault: a web-scale approach to probabilistic knowledge fusion. In Macskassy, S.A. et al. (Eds.) The 20th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’14 (pp. 601–610). New York: ACM.

  13. Duchi, J. C., Hazan, E., & Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12, 2121–2159.


  14. Getoor, L., & Taskar, B. (2007). Introduction to statistical relational learning: MIT Press.

  15. Jenatton, R., Roux, N.L., Bordes, A., & Obozinski, G. (2012). A latent factor model for highly multi-relational data. In Bartlett, P.L. et al. (Eds.) Proceedings of the 26th Annual Conference on Neural Information Processing Systems (pp. 3176–3184). Lake Tahoe, Nevada.

  16. Kemp, C., Tenenbaum, J. B., Griffiths, T. L., Yamada, T., & Ueda, N. (2006). Learning systems of concepts with an infinite relational model. In Proceedings, the twenty-first national conference on artificial intelligence and the eighteenth innovative applications of artificial intelligence conference (pp. 381–388). Boston: AAAI press.

  17. Koller, D., & Friedman, N. (2009). Probabilistic graphical models: principles and techniques: MIT Press.

  18. LeCun, Y., Chopra, S., Hadsell, R., Ranzato, M., & Huang, F. (2006). A tutorial on energy-based learning. In Bakir, G. et al. (Eds.) Predicting structured data: MIT press.

  19. Mahdisoltani, F., Biega, J., & Suchanek, F. M. (2015). YAGO3: A Knowledge base from multilingual Wikipedias. In CIDR, 2015, seventh biennial conference on innovative data systems research, Online Proceedings.

  20. Miller, G. A. (1995). Wordnet: a lexical database for English. Communications of the ACM, 38(11), 39–41.


  21. Miller, K.T., Griffiths, T.L., & Jordan, M.I. (2009). Nonparametric latent feature models for link prediction. In Bengio, Y. et al. (Eds.). Vancouver: Curran Associates, Inc.

  22. Nickel, M., Tresp, V., & Kriegel, H. (2011). A three-way model for collective learning on multi-relational data. In Getoor, L. et al. (Eds.) Proceedings of the 28th international conference on machine learning, ICML 2011 (pp. 809–816). Bellevue: Omnipress.

  23. Rettinger, A., Nickles, M., & Tresp, V. (2009). Statistical relational learning with formal ontologies. In Buntine, W.L. et al. (Eds.) Machine learning and knowledge discovery in databases, european conference, ECML PKDD 2009, Bled, Slovenia September 7-11, 2009, Proceedings, Part II. LNCS, (Vol. 5782 pp. 286–301): Springer.

  24. Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, 533–536.

  25. Schaul, T., Antonoglou, I., & Silver, D. (2014). Unit tests for stochastic optimization. In International conference on learning representations. Banff.

  26. Socher, R., Chen, D., Manning, C.D., & Ng, A.Y. (2013). Reasoning with neural tensor networks for knowledge base completion. In Burges, C. J. C. et al. (Eds.) Proceedings of the 27th Annual Conference on Neural Information Processing Systems (pp. 926–934). Lake Tahoe, Nevada.

  27. Wang, Y. J., & Wong, G. Y. (1987). Stochastic blockmodels for directed graphs. Journal of the American Statistical Association, 82(397), 8–19.


  28. Xu, Z., Tresp, V., Yu, K., & Kriegel, H. (2006). Infinite hidden relational models. In UAI’06, Proceedings of the 22nd conference in uncertainty in artificial intelligence. Cambridge: AUAI Press.

  29. Zeiler, M. D. (2012). ADADELTA: An adaptive learning rate method. arXiv:1212.5701.

Author information

Corresponding author

Correspondence to Claudia d’Amato.

About this article

Cite this article

Minervini, P., d’Amato, C. & Fanizzi, N. Efficient energy-based embedding models for link prediction in knowledge graphs. J Intell Inf Syst 47, 91–109 (2016). https://doi.org/10.1007/s10844-016-0414-7

Keywords

  • Energy-based embedding models
  • Link predictions
  • RDF knowledge graphs