Simple and automated negative sampling for knowledge graph embedding

  • Regular Paper
  • Published in: The VLDB Journal

Abstract

Negative sampling, which samples negative triplets from the non-observed ones in a knowledge graph (KG), is an essential step in KG embedding. Recently, generative adversarial networks (GANs) have been introduced into negative sampling. By sampling negative triplets with large gradients, these methods avoid the vanishing-gradient problem and thus obtain better performance. However, they make the original model more complex and harder to train. In this paper, motivated by the observation that negative triplets with large gradients are important but rare, we propose NSCaching, which directly keeps track of such triplets with a cache. In this way, our method acts as a “distilled” version of previous GAN-based methods, which does not waste training time on additional parameters to fit the full distribution of negative triplets. However, how to sample from and how to update the cache are two critical questions. We propose to solve these issues with automated machine learning techniques, and the automated version also covers GAN-based methods as special cases. A theoretical explanation of NSCaching is also provided, justifying its superiority over the fixed sampling scheme. Besides, we further extend NSCaching with the skip-gram model for graph embedding. Finally, extensive experiments show that our method yields significant improvements on various KG embedding models as well as the skip-gram model, and outperforms the state-of-the-art negative sampling methods.
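
To make the idea above concrete, here is a minimal sketch of cache-based negative sampling in Python, written for illustration only (it is not the authors' released implementation; the names NegativeCache, score_fn, cache_size and pool_size are hypothetical). It keeps, for each observed triplet, a small cache of candidate negatives, samples training negatives from that cache, and periodically refreshes the cache so that negatives the current model scores highly (i.e., the rare large-gradient ones) are retained.

```python
# Illustrative sketch of cache-based negative sampling (not the authors' code).
# score_fn is assumed to be any KG-embedding scoring function that assigns
# higher scores to "harder" negative triplets.
import numpy as np


class NegativeCache:
    def __init__(self, n_entities, cache_size=30, pool_size=50, seed=0):
        self.n_entities = n_entities
        self.cache_size = cache_size
        self.pool_size = pool_size
        self.rng = np.random.default_rng(seed)
        self.cache = {}  # positive triplet (h, r, t) -> array of candidate negative tails

    def _init_cache(self, triplet):
        # Lazily fill a triplet's cache with uniformly random entities.
        if triplet not in self.cache:
            self.cache[triplet] = self.rng.integers(0, self.n_entities, self.cache_size)

    def sample(self, triplet):
        """Draw one negative tail entity uniformly from the triplet's cache."""
        self._init_cache(triplet)
        return self.rng.choice(self.cache[triplet])

    def update(self, triplet, score_fn):
        """Refresh the cache: mix current entries with fresh random candidates and
        keep (probabilistically) those the current model scores highest."""
        self._init_cache(triplet)
        h, r, _ = triplet
        fresh = self.rng.integers(0, self.n_entities, self.pool_size)
        candidates = np.concatenate([self.cache[triplet], fresh])
        scores = np.array([score_fn(h, r, c) for c in candidates])
        probs = np.exp(scores - scores.max())  # softmax over candidate scores
        probs /= probs.sum()
        keep = self.rng.choice(len(candidates), self.cache_size, replace=False, p=probs)
        self.cache[triplet] = candidates[keep]
```

In a training loop, one would draw negatives via `sample` for each observed triplet and call `update` every few iterations with the current model's scoring function; refreshing the cache probabilistically rather than greedily keeps some exploration alongside exploitation.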


Notes

  1. The mean rank (MR) metric, which is given in the conference version [67], is removed in the journal version since (i) space is limited, and (ii) the mean rank is easily influenced by low-ranking samples.

  2. https://github.com/cai-lw/KBGAN

References

  1. Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein GAN. In: ICLR (2017)

  2. Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.: DBpedia: a nucleus for a web of open data. In: The Semantic Web, pp. 722–735. Springer (2007)

  3. Bengio, Y., Louradour, J., Collobert, R., Weston, J.: Curriculum learning. In: ICML, pp. 41–48 (2009)

  4. Bergstra, J., Bardenet, R., Bengio, Y., Kégl, B.: Algorithms for hyper-parameter optimization. In: NIPS, pp. 2546–2554 (2011)

  5. Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. JMLR 13, 281–305 (2012)

  6. Bollacker, K., Evans, C., Paritosh, P., Sturge, T., Taylor, J.: Freebase: a collaboratively created graph database for structuring human knowledge. In: ACM SIGMOD, pp. 1247–1250 (2008)

  7. Bordes, A., Chopra, S., Weston, J.: Question answering with subgraph embeddings. In: Conference on EMNLP, pp. 615–620 (2014)

  8. Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., Yakhnenko, O.: Translating embeddings for modeling multi-relational data. In: NIPS, pp. 2787–2795 (2013)

  9. Bose, A., Ling, H., Cao, Y.: Adversarial contrastive estimation. In: ACL (Volume 1: Long Papers), pp. 1021–1032 (2018)

  10. Cai, H., Zheng, V., Chang, K.: A comprehensive survey of graph embedding: problems, techniques, and applications. IEEE TKDE 30(9), 1616–1637 (2018)

  11. Cai, L., Wang, W.: KBGAN: adversarial learning for knowledge graph embeddings. NAACL 1, 1470–1480 (2018)

  12. Chen, L., Yuan, F., Jose, J., Zhang, W.: Improving negative sampling for word representation using self-embedded features. In: WSDM, pp. 99–107 (2018)

  13. Dettmers, T., Minervini, P., Stenetorp, P., Riedel, S.: Convolutional 2D knowledge graph embeddings. In: AAAI (2018)

  14. Ding, J., Quan, Y., He, X., Li, Y., Jin, D.: Reinforced negative sampling for recommendation with exposure data. In: IJCAI, pp. 2230–2236. AAAI Press (2019)

  15. Dong, L., Gabrilovich, E., Heitz, G., Horn, W., Lao, N., Murphy, K., Strohmann, T., Sun, S., Zhang, W.: Knowledge vault: a web-scale approach to probabilistic knowledge fusion. In: ACM SIGKDD, pp. 601–610 (2014)

  16. Dori, D.: Visweb-the visual semantic web: unifying human and machine knowledge representations with object-process methodology. VLDB J. 13(2), 120–147 (2004)

  17. Fedus, W., Goodfellow, I., Dai, A.: MaskGAN: better text generation via filling in the _. In: ICLR (2018)

  18. Gao, H., Huang, H.: Self-paced network embedding. In: ACM SIGKDD, pp. 1406–1415 (2018)

  19. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: International Conference on Artificial Intelligence and Statistics, pp. 249–256 (2010)

  20. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: NIPS, pp. 2672–2680 (2014)

  21. Grover, A., Leskovec, J.: node2vec: scalable feature learning for networks. In: ACM SIGKDD, pp. 855–864 (2016)

  22. Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.: Improved training of Wasserstein GANs. In: NIPS, pp. 5767–5777 (2017)

  23. Hutter, F., Hoos, H., Leyton-Brown, K.: Sequential model-based optimization for general algorithm configuration. In: LION, pp. 507–523 (2011)

  24. Hutter, F., Kotthoff, L., Vanschoren, J. (eds.): Automated Machine Learning: Methods, Systems, Challenges. Springer (2018)

  25. Ji, G., He, S., Xu, L., Liu, K., Zhao, J.: Knowledge graph embedding via dynamic mapping matrix. ACL 1, 687–696 (2015)

  26. Kadlec, R., Bajgar, O., Kleindienst, J.: Knowledge base completion: baselines strike back. In: The 2nd Workshop on Representation Learning for NLP, pp. 69–74 (2017)

  27. Kazemi, S., Poole, D.: SimplE embedding for link prediction in knowledge graphs. In: NeurIPS, pp. 4289–4300 (2018)

  28. Kingma, D., Ba, J.: Adam: a method for stochastic optimization. Technical report, arXiv:1412.6980 (2014)

  29. Kok, S., Domingos, P.: Statistical predicate invention. In: ICML, pp. 433–440 (2007)

  30. Koller, D., Friedman, N., Džeroski, S., Sutton, C., McCallum, A., Pfeffer, A., Abbeel, P., Wong, M., Heckerman, D., Meek, C., et al.: Introduction to Statistical Relational Learning. The MIT Press (2007)

  31. Kumar, M., Packer, B., Koller, D.: Self-paced learning for latent variable models. In: NIPS, pp. 1189–1197 (2010)

  32. Lao, N., Mitchell, T., Cohen, W.: Random walk inference and learning in a large scale knowledge base. In: Conference on EMNLP, pp. 529–539. ACL (2011)

  33. Li, J., Tao, C., Feng, Y., Zhao, D., Yan, R. et al.: Sampling matters! an empirical study of negative sampling strategies for learning of matching models in retrieval-based dialogue systems. In: Proceedings of the 2019 EMNLP-IJCNLP, pp. 1291–1296 (2019)

  34. March, J.: Exploration and exploitation in organizational learning. Organ. Sci. 2(1), 71–87 (1991)

  35. McCallum, A., Nigam, K., Rennie, J., Seymore, K.: Automating the construction of internet portals with machine learning. Inf. Retrieval 3(2), 127–163 (2000)

  36. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: ICLR (2013)

  37. Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representations of words and phrases and their compositionality. In: NIPS, pp. 3111–3119 (2013)

  38. Mikolov, T., Yih, W., Zweig, G.: Linguistic regularities in continuous space word representations. In: NAACL, pp. 746–751 (2013)

  39. Mottin, D., Lissandrini, M., Velegrakis, Y., Palpanas, T.: Exemplar queries: a new way of searching. VLDB J. 25(6), 741–765 (2016)

  40. Needell, D., Ward, R., Srebro, N.: Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm. In: NIPS, pp. 1017–1025 (2014)

  41. Nickel, M., Murphy, K., Tresp, V., Gabrilovich, E.: A review of relational machine learning for knowledge graphs. Proc. IEEE 104(1), 11–33 (2015)

  42. Noy, N., Gao, Y., Jain, A., Narayanan, A., Patterson, A., Taylor, J.: Industry-scale knowledge graphs: lessons and challenges. Queue 17(2), 48–75 (2019)

  43. Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al.: PyTorch: an imperative style, high-performance deep learning library. In: NeurIPS, pp. 8024–8035 (2019)

  44. Perozzi, B., Al-Rfou, R., Skiena, S.: DeepWalk: online learning of social representations. In: ACM SIGKDD, pp. 701–710 (2014)

  45. Rawat, A., Chen, J., Yu, F., Suresh, A., Kumar, S.: Sampled softmax with random Fourier features. In: NeurIPS, pp. 13857–13867 (2019)

  46. Rendle, S., Freudenthaler, C., Gantner, Z., Schmidt-Thieme, L.: BPR: Bayesian personalized ranking from implicit feedback. In: Conference on UAI, pp. 452–461. AUAI Press (2009)

  47. Suchanek, F., Kasneci, G., Weikum, G.: YAGO: a core of semantic knowledge. In: WWW, pp. 697–706 (2007)

  48. Sun, Z., Deng, Z., Nie, J., Tang, J.: RotatE: knowledge graph embedding by relational rotation in complex space. In: ICLR (2019)

  49. Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., Mei, Q.: LINE: large-scale information network embedding. In: WWW, pp. 1067–1077 (2015)

  50. Toutanova, K., Chen, D.: Observed versus latent features for knowledge base and text inference. In: Workshop on CVSMC, pp. 57–66 (2015)

  51. Trouillon, T., Dance, C., Gaussier, É., Welbl, J., Riedel, S., Bouchard, G.: Knowledge graph completion via complex tensor factorization. JMLR 18(1), 4735–4772 (2017)

  52. Wang, H., Wang, J., Wang, J., Zhao, M., Zhang, W., Zhang, F., Xie, X., Guo, M.: GraphGAN: graph representation learning with generative adversarial nets. In: AAAI (2018)

  53. Wang, J., Yu, L., Zhang, W., Gong, Y., Xu, Y., Wang, B., Zhang, P., Zhang, D.: IRGAN: a minimax game for unifying generative and discriminative information retrieval models. In: ACM SIGIR, pp. 515–524 (2017)

  54. Wang, P., Li, S., Pan, R.: Incorporating GAN for negative sampling in knowledge representation learning. In: AAAI (2018)

  55. Wang, Q., Mao, Z., Wang, B., Guo, L.: Knowledge graph embedding: a survey of approaches and applications. IEEE TKDE 29(12), 2724–2743 (2017)

  56. Wang, Z., Zhang, J., Feng, J., Chen, Z.: Knowledge graph embedding by translating on hyperplanes. AAAI 14, 1112–1119 (2014)

  57. Welford, B.: Note on a method for calculating corrected sums of squares and products. Technometrics 4(3), 419–420 (1962)

  58. Williams, R.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 8(3–4), 229–256 (1992)

  59. Wu, C., Manmatha, R., Smola, A., Krahenbuhl, P.: Sampling matters in deep embedding learning. In: Proceedings of the ICCV, pp. 2840–2848 (2017)

  60. Yang, B., Yih, W., He, X., Gao, J., Deng, L.: Embedding entities and relations for learning and inference in knowledge bases. In: ICLR (2017)

  61. Yao, Q., Wang, N., Jair Escalante, H., Guyon, I., Hu, Y., Li, Y., Tu, W., Yang, Q., Yu, Y.: Taking human out of learning applications: a survey on automated machine learning. Technical report, arXiv preprint (2018)

  62. Ying, R., He, R., Chen, K., Eksombatchai, P., Hamilton, W., Leskovec, J.: Graph convolutional neural networks for web-scale recommender systems. In: ACM SIGKDD, pp. 974–983 (2018)

  63. Zhang, C., Li, Y., Du, N., Fan, W., Yu, P.: On the generative discovery of structured medical knowledge. In: SIGKDD, pp. 2720–2728 (2018)

  64. Zhang, F., Yuan, N., Lian, D., Xie, X., Ma, W.: Collaborative knowledge base embedding for recommender systems. In: ACM SIGKDD, pp. 353–362 (2016)

  65. Zhang, S., Yao, L., Sun, A., Tay, Y.: Deep learning based recommender system: a survey and new perspectives. ACM Comput. Surv. 52(1), 1–38 (2019)

  66. Zhang, W., Chen, T., Wang, J., Yu, T.: Optimizing top-n collaborative filtering via dynamic negative item sampling. In: ACM SIGIR, pp. 785–788 (2013)

  67. Zhang, Y., Yao, Q., Shao, Y., Chen, L.: NSCaching: simple and efficient negative sampling for knowledge graph embedding. In: ICDE, pp. 614–625 (2019)

  68. Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: ICML, pp. 1–9 (2015)

  69. Zou, L., Chen, L., Özsu, M., Zhao, D.: Answering pattern match queries in large graph databases via graph embedding. VLDB J. 21(1), 97–120 (2012)

Author information

Corresponding author

Correspondence to Quanming Yao.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A Appendix: Proof of Theorem 1

Proof

Following [68], we consider a more general optimization formulation as \( \min _{\mathbf {w}} \bar{F}(\mathbf {w}) \equiv \frac{1}{n} \sum \nolimits _{i = 1}^n \phi _{i}(\mathbf {w}) + \lambda r(\mathbf {w})\), which covers (11) as a special case with \(\lambda = 0\). Then, the stochastic training gives \(\mathbf {w}^{t + 1}\) as

$$\begin{aligned} \mathbf {w}^{t + 1} = \arg \min _{\mathbf {w}} \left[ (np_{i_t}^t)^{-1}\mathbf {w}^{\top } \nabla \phi _{i_t}(\mathbf {w}^t) + \lambda r(\mathbf {w}) + \frac{1}{\eta _t} \mathscr {B}_{\psi }(\mathbf {w}, \mathbf {w}^t) \right] , \end{aligned}$$
(18)

where \(\mathscr {B}_{\psi }\) is a Bregman distance function measuring the difference between \(\mathbf {w}\) and \(\mathbf {w}^t\). Based on (18), we state Theorem 3 in [68] as

Theorem 2

([68]) Let \(\mathbf {w}^t\) be generated from (18). Assume that \(\mathscr {B}_{\psi }\) is \(\sigma \)-strongly convex w.r.t. a norm \(\Vert \cdot \Vert \) and that \(\bar{F}\) and r are convex. If \(\eta _t = \eta \), then the following inequality holds for any \(T \ge 1\):

$$\begin{aligned}&\frac{1}{T}\sum \nolimits _{t =1}^T \mathbb {E}[\bar{F}(\mathbf {w}^t)] - \mathbb E[\bar{F}(\mathbf {w}^*)] \\&\le \frac{1}{T}\left[ \frac{1}{\eta } \mathscr {B}_{\psi }(\mathbf {w}^*, \mathbf {w}^1) + \frac{\eta }{2\sigma } \sum \nolimits _{t = 1}^T \mathbb {E} \Vert \nicefrac {\nabla \phi _{i_t}(\mathbf {w}^t)}{ n \mathbf {p}_{i_t}^t }\Vert ^2 \right] \end{aligned}$$

where the expectation is taken with respect to the distribution \(\mathbf {p}^t\).

In our Algorithm 5, we do not have r and thus \(\lambda = 0\) in (11). Besides, \(\mathscr {B}_{\psi }(\mathbf {w}, \mathbf {w}^t) = \frac{1}{2} \Vert \mathbf {w} - \mathbf {w}^t \Vert ^2\) in our case, so \(\sigma = 1\). Putting these together, Theorem 1 follows directly from Theorem 2. \(\square \)
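
For concreteness, substituting \(\lambda = 0\), \(\sigma = 1\) and \(\mathscr {B}_{\psi }(\mathbf {w}, \mathbf {w}^t) = \frac{1}{2}\Vert \mathbf {w} - \mathbf {w}^t\Vert ^2\) into the bound of Theorem 2 gives the following specialized inequality (shown here only as a sketch of the derivation; the precise statement of Theorem 1 is given in the main text):

$$\begin{aligned}&\frac{1}{T}\sum \nolimits _{t=1}^T \mathbb {E}[\bar{F}(\mathbf {w}^t)] - \mathbb {E}[\bar{F}(\mathbf {w}^*)] \\&\quad \le \frac{1}{T}\left[ \frac{1}{2\eta } \Vert \mathbf {w}^* - \mathbf {w}^1 \Vert ^2 + \frac{\eta }{2} \sum \nolimits _{t=1}^T \mathbb {E} \left\| \nicefrac {\nabla \phi _{i_t}(\mathbf {w}^t)}{n \mathbf {p}_{i_t}^t} \right\| ^2 \right] , \end{aligned}$$

which makes explicit that the sampling distribution \(\mathbf {p}^t\) affects the bound only through the expected squared norm of the re-weighted stochastic gradients.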

About this article

Cite this article

Zhang, Y., Yao, Q. & Chen, L. Simple and automated negative sampling for knowledge graph embedding. The VLDB Journal 30, 259–285 (2021). https://doi.org/10.1007/s00778-020-00640-7

