Journal of Intelligent Information Systems, Volume 47, Issue 1, pp 91–109

Efficient energy-based embedding models for link prediction in knowledge graphs

  • Pasquale Minervini
  • Claudia d’Amato
  • Nicola Fanizzi


We focus on the problem of link prediction in Knowledge Graphs, with the goal of discovering new facts. To this end, Energy-Based Models that embed the entities and relations of a Knowledge Graph in continuous vector spaces have been widely used. The main limitation to their applicability lies in the parameter learning phase, which may require a large amount of time to converge to optimal solutions. In this article, we first propose a unified view of different Energy-Based Embedding Models. Then, to improve the model training phase, we propose the adoption of adaptive learning rates. We show that adopting adaptive learning rates during training improves the efficiency of the parameter learning process by an order of magnitude, while yielding more accurate link prediction models in a significantly lower number of iterations. We extensively evaluate the proposed learning procedure on a variety of new models: our results show a significant improvement over state-of-the-art link prediction methods on two large Knowledge Graphs, namely WordNet and Freebase.
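As a rough illustration of the idea summarised above, the following sketch trains a tiny energy-based embedding model with an adaptive learning rate. The abstract does not specify the energy function, the loss, or the adaptive scheme, so everything concrete here is an assumption made for illustration: a translation-style squared-L2 energy, a margin ranking loss against corrupted objects, and AdaGrad as the adaptive rule. The function names (`energy`, `energy_grads`, `adagrad_step`, `train`) and all hyperparameters are hypothetical and are not taken from the article.

```python
import math
import random


def energy(e_s, r_p, e_o):
    """Assumed translation-style energy: squared L2 norm of e_s + r_p - e_o.
    Lower energy means the triple is considered more plausible."""
    return sum((s + r - o) ** 2 for s, r, o in zip(e_s, r_p, e_o))


def energy_grads(e_s, r_p, e_o):
    """Gradients of energy() w.r.t. the subject, predicate and object vectors."""
    d = [2.0 * (s + r - o) for s, r, o in zip(e_s, r_p, e_o)]
    return d, d, [-x for x in d]


def adagrad_step(param, grad, hist, lr=0.1, eps=1e-8):
    """In-place AdaGrad update: each coordinate gets its own step size,
    lr divided by the root of its accumulated squared gradients."""
    for i, g in enumerate(grad):
        hist[i] += g * g
        param[i] -= lr * g / (math.sqrt(hist[i]) + eps)


def train(triples, n_entities, n_relations, dim=8, margin=1.0, epochs=100, seed=0):
    """Margin-based training on (subject, predicate, object) index triples."""
    rng = random.Random(seed)
    known = set(triples)
    E = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(n_entities)]
    R = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(n_relations)]
    hE = [[0.0] * dim for _ in range(n_entities)]   # AdaGrad history, entities
    hR = [[0.0] * dim for _ in range(n_relations)]  # AdaGrad history, relations
    for _ in range(epochs):
        for s, p, o in triples:
            # Corrupt the object; skip known facts and the subject itself so
            # that every update below touches a distinct vector.
            o_neg = rng.randrange(n_entities)
            if o_neg == s or (s, p, o_neg) in known:
                continue
            loss = margin + energy(E[s], R[p], E[o]) - energy(E[s], R[p], E[o_neg])
            if loss <= 0:
                continue  # margin already satisfied, no update needed
            # Compute all gradients before applying any update.
            gs_pos, gp_pos, go_pos = energy_grads(E[s], R[p], E[o])
            gs_neg, gp_neg, go_neg = energy_grads(E[s], R[p], E[o_neg])
            adagrad_step(E[s], [a - b for a, b in zip(gs_pos, gs_neg)], hE[s])
            adagrad_step(R[p], [a - b for a, b in zip(gp_pos, gp_neg)], hR[p])
            adagrad_step(E[o], go_pos, hE[o])
            adagrad_step(E[o_neg], [-g for g in go_neg], hE[o_neg])
    return E, R


if __name__ == "__main__":
    # Toy graph: a single relation forming the chain 0 -> 1 -> 2 -> 3.
    facts = [(0, 0, 1), (1, 0, 2), (2, 0, 3)]
    E, R = train(facts, n_entities=4, n_relations=1)
    print("energy of a training fact:", energy(E[0], R[0], E[1]))
    print("energy of a corrupted triple:", energy(E[0], R[0], E[3]))
```

With a fixed global learning rate, `adagrad_step` would reduce to `param[i] -= lr * g`; the per-coordinate history is what makes the step size adaptive in this sketch.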


Energy-based embedding models; Link predictions; RDF knowledge graphs



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Pasquale Minervini (1)
  • Claudia d’Amato (1)
  • Nicola Fanizzi (1)
  1. Department of Computer Science, University of Bari, Bari, Italy
