Improving Knowledge Graph Embedding Using Locally and Globally Attentive Relation Paths
Abstract
The incompleteness of knowledge graphs has motivated many researchers to propose methods to automatically infer missing facts in knowledge graphs. Knowledge graph embedding has been an active research area for knowledge graph completion, with great improvement from the early TransE to the current state-of-the-art ConvKB. ConvKB considers a knowledge graph as a set of triples, and employs a convolutional neural network to capture global relationships and transitional characteristics between entities and relations in the knowledge graph. However, it only utilizes the triple information, and ignores the rich information contained in relation paths. In fact, a path of one relation describes the relation from some aspect in a fine-grained way. Therefore, it is beneficial to take relation paths into consideration for knowledge graph embedding. In this paper, we present a novel convolutional neural network-based embedding model, PConvKB, which improves knowledge graph embedding by incorporating relation paths locally and globally. Specifically, we introduce an attention mechanism to measure the local importance of relation paths. Moreover, we propose a simple yet effective measure, DIPF, to compute the global importance of relation paths. Experimental results show that our model achieves substantial improvements over state-of-the-art methods.
Keywords
Knowledge graph embedding, Link prediction, Triple classification, Convolutional neural network, Attention mechanism
1 Introduction
Large-scale knowledge graphs such as Freebase [3], DBpedia [1], and Wikidata [38] store real-world facts in the form of triples (head, relation, tail), abbreviated as (h, r, t), where head and tail are entities and relation represents the relationship between head and tail. They are important resources for many intelligent applications such as question answering and web search. Although current knowledge graphs consist of billions of triples, they are still far from complete and are missing crucial facts, e.g., 75% of the person entities in Freebase have no known nationality [8], which hampers their usefulness in the aforementioned applications.
Various methods have been proposed to address this problem, and knowledge graph embedding methods have attracted increasing attention in recent years. The main idea of knowledge graph embedding is to embed entities and relations of a knowledge graph into a continuous vector space and predict missing facts by manipulating the entity and relation embeddings involved. Among knowledge graph embedding methods, the translation-based models are simple and efficient, and also perform well. For example, given a triple (h, r, t), the most well-known translation-based model TransE [5] models the relation r as a translation vector \(\mathbf {r}\) connecting the embeddings \(\mathbf {h}\) and \(\mathbf {t}\) of the two entities, i.e., \(\mathbf {h}+\mathbf {r} \approx \mathbf {t}\). It performs well on simple relations, i.e., 1-to-1 relations, but poorly on complicated relations, i.e., 1-to-N, N-to-1 and N-to-N relations. To address this issue, TransH [41], TransR [20] and TransD [14] were proposed. Unfortunately, these models are less simple and efficient than TransE. Nickel et al. [26] present HolE, which uses circular correlation to combine the expressive power of the tensor product with the simplicity and efficiency of TransE.
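To make the translation idea concrete, the following minimal sketch scores a triple by the distance between \(\mathbf {h}+\mathbf {r}\) and \(\mathbf {t}\). The choice of the L2 norm, the function name `transe_score`, and the toy 2-dimensional vectors are assumptions made for illustration only; they do not come from a trained model.

```python
import math

def transe_score(h, r, t):
    """Implausibility score ||h + r - t||_2; a lower value means a more plausible triple."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

# Hypothetical toy embeddings, chosen so that paris + capital_of == france.
paris = [1.0, 0.0]
capital_of = [0.0, 1.0]
france = [1.0, 1.0]
japan = [3.0, -2.0]

valid_score = transe_score(paris, capital_of, france)
invalid_score = transe_score(paris, capital_of, japan)
print(valid_score, invalid_score)
```

In this toy setting the valid triple receives a score of zero, while the corrupted triple receives a strictly larger score, which is the ordering an embedding model is trained to produce.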
In this paper, we present a path-augmented CNN-based model, which incorporates relation paths for knowledge graph embedding. Specifically, we first introduce an attention mechanism to automatically measure the local importance of each path for the given entity pair; then, inspired by inverse document frequency, we propose degree-guided inverse path frequency to compute the global importance of each path. Finally, we improve knowledge graph embedding by incorporating locally and globally attentive relation paths. Our main contributions are as follows:
We present a path-augmented CNN-based knowledge graph embedding model, which improves the embedding model by incorporating relation paths locally and globally.
We introduce an attention mechanism to model the local importance of relation paths for knowledge graph embedding.
We propose a simple yet effective measure, degree-guided inverse path frequency, to compute the global importance of relation paths for knowledge graph embedding.
In addition, we apply three pooling operations to aggregate convolutional feature maps, which greatly reduces the number of parameters.
The experimental results on four benchmark datasets show that our model achieves state-of-the-art performance.
2 Preliminaries
2.1 Problem Definition
Let \( \mathcal {G} \) be a knowledge graph, i.e., a collection of valid factual triples (h, r, t), where \( h, t \in \mathcal {E} \) and \( r \in \mathcal {R} \); \(\mathcal {E}\) is the entity set and \(\mathcal {R}\) is the relation set. In knowledge graph completion, embedding methods aim to define a score function f that gives an implausibility score for each triple (h, r, t) such that valid triples receive lower scores than invalid triples.
2.2 ConvKB
In this section, we briefly describe the state-of-the-art CNN-based model ConvKB, which we choose as the base of our model.
It is obvious that ConvKB only learns from triples, ignoring the rich knowledge contained in relation paths, which can lead to poor performance.
3 Our Proposed Model
3.1 PConvKB
The computation of local and global importances is detailed in Sects. 3.2 and 3.3, respectively.
3.2 Measuring Local Importances of Relation Paths by Attention Mechanism
3.3 Measuring Global Importances of Relation Paths by Degree-Guided Inverse Path Frequency
The attention mechanism focuses only on the set of relation paths P(h, t) of the given entity pair (h, t) connected by the relation r. It does not consider that a path in this set may also occur between other entity pairs that are connected by other relations. Typically, the more sets of relation paths a path occurs in, the less important the path is. Therefore, inspired by inverse document frequency [10, 16], a weighting function widely used for measuring how informative each word is in a set of documents, we propose the Degree-guided Inverse Path Frequency (DIPF) to model the global importance of each path in the set of relation paths.
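The intuition borrowed from inverse document frequency can be sketched as follows. Note that this is only the plain IDF analogue, with each path set playing the role of a document and each path the role of a word; it is not the DIPF measure itself, whose degree-guided form is not reproduced in this sketch. The function name `inverse_path_frequency` and the toy relation names are hypothetical.

```python
import math

def inverse_path_frequency(path_sets):
    """Plain IDF over path sets: log(number of sets / number of sets containing the path)."""
    n_sets = len(path_sets)
    counts = {}
    for paths in path_sets:
        for path in set(paths):  # count each path at most once per set
            counts[path] = counts.get(path, 0) + 1
    return {path: math.log(n_sets / count) for path, count in counts.items()}

# Hypothetical path sets for three entity pairs; a path is a tuple of relation names.
common_path = ("located_in", "part_of")
rare_path = ("born_in", "capital_of")
path_sets = [
    [common_path, rare_path],
    [common_path],
    [common_path],
]

weights = inverse_path_frequency(path_sets)
print(weights[common_path], weights[rare_path])
```

A path that appears in every path set receives weight zero, whereas a path that appears in only one set receives the largest weight, matching the statement that paths occurring in more sets are less important.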
3.4 Aggregating Feature Maps Using Pooling Operation
3.5 Model Training
3.6 Complexity Analysis
We compare the parameter size and computational complexity of our model PConvKB with ConvKB. Let \(N_{e}\) denote the number of entities, \(N_{r}\) the number of relations, K the embedding dimension, S the number of triples for learning, P the expected number of relation paths connecting two entities, and L the expected length of relation paths. The parameter size of PConvKB is equal to the parameter size of ConvKB, i.e., \((N_{e}+N_{r})K\). For each iteration in optimization, the computational complexity of PConvKB is O(SKPL), and the computational complexity of ConvKB is O(SK).
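As a rough worked example of these expressions, the sketch below evaluates \((N_{e}+N_{r})K\) for the four benchmark datasets and the ratio between the two per-iteration costs, which reduces to \(P \cdot L\). The embedding dimension K = 100 and the values chosen for P and L are assumptions made purely for illustration; the helper names are hypothetical.

```python
def embedding_parameter_count(num_entities, num_relations, dim):
    """Parameter size (N_e + N_r) * K of the entity and relation embeddings."""
    return (num_entities + num_relations) * dim

def per_iteration_cost_ratio(paths_per_pair, path_length):
    """Ratio O(S*K*P*L) / O(S*K) = P * L between the two per-iteration costs."""
    return paths_per_pair * path_length

# (#Entity, #Relation) taken from the dataset statistics table.
DATASETS = {
    "WN18": (40943, 18),
    "FB15k": (14951, 1345),
    "WN18RR": (40943, 11),
    "FB15k-237": (14541, 237),
}
K = 100  # assumed embedding dimension

sizes = {name: embedding_parameter_count(n_e, n_r, K)
         for name, (n_e, n_r) in DATASETS.items()}
for name, size in sizes.items():
    print(f"{name}: {size:,} embedding parameters")

# Assumed P = 5 paths per entity pair and L = 3 relations per path.
print("cost ratio:", per_iteration_cost_ratio(5, 3))
```

Under these assumptions the parameter count is identical for both models, while the per-iteration cost of the path-augmented model grows linearly in both the number and the length of the paths considered.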
4 Experiments
For a fair comparison, we evaluate our model on two tasks: link prediction [5] and triple classification [33]. Both of them evaluate the accuracy of predicting unseen triples from different viewpoints.
4.1 Datasets
Statistics of the experimental datasets
Dataset  #Entity  #Relation  #Train  #Valid  #Test 

WN18  40,943  18  141,442  5,000  5,000 
FB15k  14,951  1,345  483,142  50,000  59,071 
WN18RR  40,943  11  86,835  3,034  3,134 
FB15k-237  14,541  237  272,115  17,535  20,466 
4.2 Comparison Methods
To demonstrate the effectiveness of our model, we compare PConvKB against a variety of knowledge graph embedding methods developed in recent years.

TransE [5] is one of the most widely used knowledge graph embedding methods.

TransH [41] associates each relation with a relation-specific hyperplane to alleviate the complex relations problem.

TransD [14] considers not only the complex relations but also the diversity of entities, by embedding entities and relations into separate entity spaces and relation-specific spaces.

HolE [26] uses circular correlation, a novel compositional operator, to capture rich interactions of embeddings.

ConvE [7] is the first CNN-based model for knowledge graph embedding.

ConvKB [22] improves ConvE by taking the transitional characteristic (i.e., one of the most useful intuitions for knowledge graph completion) into consideration.

CapsE [23] combines a convolutional neural network with a capsule network [29] for knowledge graph embedding.
4.3 Link Prediction
The link prediction task is to complete a triple (h, r, t) with h or t missing, i.e., to predict the missing h given (r, t) or the missing t given (h, r).
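The two metrics reported for this task, mean rank (MR) and Hits@10, can be computed from the rank of the correct entity among all candidates. The sketch below assumes that a lower score means a more plausible triple, consistent with the implausibility score in the problem definition, and resolves ties optimistically; the function names and toy numbers are hypothetical.

```python
def rank_of_correct(scores, correct_index):
    """1-based rank of the correct candidate when sorting by ascending score."""
    correct = scores[correct_index]
    return 1 + sum(1 for s in scores if s < correct)

def mean_rank(ranks):
    """MR: average rank of the correct entity; lower is better."""
    return sum(ranks) / len(ranks)

def hits_at_k(ranks, k=10):
    """Hits@k in percent: share of test cases whose correct entity ranks in the top k."""
    return 100.0 * sum(1 for r in ranks if r <= k) / len(ranks)

# Scores of four candidate tails for one query (h, r, ?); index 2 is the correct tail.
example_rank = rank_of_correct([0.9, 0.2, 0.5, 0.1], correct_index=2)

# Hypothetical ranks collected over four test queries.
ranks = [1, 3, 12, 40]
print(example_rank, mean_rank(ranks), hits_at_k(ranks, k=10))
```

MR is sensitive to a few badly ranked queries, whereas Hits@10 only records whether the correct entity appears near the top, which is why the two metrics are usually reported together.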
Experimental results on link prediction. Hits@10 is reported in %. The best score is in bold, and the second-best score is underlined. For comparison methods, the values in black are the results listed in the original publications, except that ConvKB uses the version implemented in [23], which has been reported to perform significantly better than the original one. The values in blue are obtained by implementations from the OpenKE repository.
Model  WN18 MR  WN18 Hits@10  FB15k MR  FB15k Hits@10  WN18RR MR  WN18RR Hits@10  FB15k-237 MR  FB15k-237 Hits@10
TransE  –  –  125  47.1  –  –  –  – 
TransH  388  82.3  87  64.4  –  –  –  – 
TransD  212  92.2  91  77.3  –  –  –  – 
HolE  –  94.9  –  73.9  –  –  –  – 
ConvE  504  95.5  64  87.3  5277  48.0  246  49.1 
CapsE  –  –  –  –  719  56.0  303  59.3 
ConvKB  –  –  –  –  763  56.7  254  53.2 
PConvKB (local)  212  95.3  58  89.6  733  57.0  267  57.5 
PConvKB (global)  249  93.8  63  89.1  749  56.8  283  56.2 
PConvKB  196  96.3  54  91.4  691  57.4  245  59.8 
Implementation Details. Following the previous work [41], we use the common Bernoulli trick to generate the head or tail entities when sampling invalid triples. As in ConvKB [22], we use entity and relation embeddings produced by TransE to initialize the entity and relation embeddings in PConvKB. We use the pretrained 100-dimensional GloVe word embeddings [28] to train the TransE model, and employ the TransE implementation provided by [25]. We select the learning rate from \(\{5e^{-6}, 1e^{-5}, 5e^{-5}, 1e^{-4}\}\) and the number of filters from \(\{50, 100, 200, 400\}\). We fix the batch size at 128 and set the \(L_{2}\)-regularizer \(\lambda \) at 0.001 in our objective function. We run PConvKB for up to 150 epochs and monitor the Hits@10 score after every 10 training epochs to choose optimal hyperparameters. We obtain the highest Hits@10 scores on the validation set with a learning rate of \(5e^{-5}\) and 400 filters on WN18; a learning rate of \(1e^{-5}\) and 50 filters on FB15k; a learning rate of \(5e^{-6}\) and 400 filters on WN18RR; and a learning rate of \(1e^{-5}\) and 200 filters on FB15k-237. For comparison methods, we use the code released by [7, 11] and [22].
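The Bernoulli trick mentioned above can be sketched as follows. For each relation, the head is replaced with probability tph / (tph + hpt), where tph is the average number of tails per head and hpt the average number of heads per tail, as described for TransH [41]. The function names and toy triples are hypothetical, and this sketch does not filter out corrupted triples that happen to be valid.

```python
import random
from collections import defaultdict

def head_replacement_probabilities(triples):
    """Per relation, the probability tph / (tph + hpt) of corrupting the head."""
    tails_of = defaultdict(set)  # (relation, head) -> distinct tails
    heads_of = defaultdict(set)  # (relation, tail) -> distinct heads
    for h, r, t in triples:
        tails_of[(r, h)].add(t)
        heads_of[(r, t)].add(h)

    tails_per_head = defaultdict(list)
    heads_per_tail = defaultdict(list)
    for (r, _h), tails in tails_of.items():
        tails_per_head[r].append(len(tails))
    for (r, _t), heads in heads_of.items():
        heads_per_tail[r].append(len(heads))

    probabilities = {}
    for r in tails_per_head:
        tph = sum(tails_per_head[r]) / len(tails_per_head[r])
        hpt = sum(heads_per_tail[r]) / len(heads_per_tail[r])
        probabilities[r] = tph / (tph + hpt)
    return probabilities

def corrupt(triple, entities, probabilities, rng):
    """Replace the head or the tail of a valid triple with a random entity."""
    h, r, t = triple
    if rng.random() < probabilities[r]:
        return (rng.choice(entities), r, t)
    return (h, r, rng.choice(entities))

# Hypothetical toy graph: "likes" is 1-to-N, "born_in" is N-to-1.
triples = [
    ("a", "likes", "x"), ("a", "likes", "y"), ("a", "likes", "z"),
    ("b", "born_in", "x"), ("c", "born_in", "x"),
]
entities = sorted({e for h, _, t in triples for e in (h, t)})
probs = head_replacement_probabilities(triples)
negative = corrupt(triples[0], entities, probs, random.Random(0))
print(probs, negative)
```

For a 1-to-N relation the head is corrupted more often, which lowers the chance of accidentally generating a valid triple as a negative example.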
1. PConvKB obtains the best MR and highest Hits@10 scores on the four benchmark datasets, demonstrating the effectiveness of incorporating relation paths for knowledge graph embedding.
2. Among PConvKB, PConvKB (local) and PConvKB (global), PConvKB obtains the best performance, which indicates that considering relation paths both locally and globally is beneficial for knowledge graph embedding.
3. PConvKB does better than the closely related model ConvKB on all experimental datasets, especially on FB15k-237, where PConvKB gains significant improvements of \(275 - 247 = 28\) in MR (about 10.1% relative improvement) and \(59.8\% - 54.7\% = 5.1\%\) absolute improvement in Hits@10.
4.4 Triple Classification
The triple classification task is to determine whether a given triple (h, r, t) is correct or not, i.e., binary classification on a triple.
Evaluation Protocol. We follow the same protocol as in [33]. For each triple in the test set and validation set, we construct one negative triple by switching entities from test triples and validation triples, respectively. The triple classification decision rule is: for a triple (h, r, t), if its implausibility score is below the relation-specific threshold \( \sigma _{r} \), predict positive, otherwise negative. The relation-specific threshold \( \sigma _{r} \) is determined by maximizing classification accuracy on the validation set. The triple classification accuracy is the percentage of triples in the test set that are classified correctly.
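The decision rule and the threshold selection can be sketched as below. The candidate thresholds (every observed validation score plus one value above the maximum) and all names and toy scores are assumptions for illustration; any search that maximizes validation accuracy would serve the same purpose.

```python
def accuracy(scores, labels, threshold):
    """Share of triples classified correctly by the rule: score < threshold => positive."""
    correct = sum((s < threshold) == y for s, y in zip(scores, labels))
    return correct / len(scores)

def fit_thresholds(validation):
    """validation maps each relation to a list of (implausibility score, is_valid) pairs."""
    thresholds = {}
    for relation, pairs in validation.items():
        scores = [s for s, _ in pairs]
        labels = [y for _, y in pairs]
        candidates = sorted(set(scores)) + [max(scores) + 1.0]
        # max() keeps the first (smallest) candidate among equally accurate ones.
        thresholds[relation] = max(candidates, key=lambda c: accuracy(scores, labels, c))
    return thresholds

def classify(score, relation, thresholds):
    """Predict positive when the score is below the relation-specific threshold."""
    return score < thresholds[relation]

# Hypothetical validation scores for a single relation.
validation = {
    "born_in": [(0.1, True), (0.3, True), (0.8, False), (0.9, False)],
}
thresholds = fit_thresholds(validation)
print(thresholds, classify(0.5, "born_in", thresholds))
```

Fitting one threshold per relation lets relations with different score ranges be separated independently, which a single global threshold could not do.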
Experimental results on triple classification (%). The best score is in bold, and the second-best score is underlined.
Model  WN18  FB15k  WN18RR  FB15k-237 

TransE  87.6  82.9  74.0  75.6 
TransH  96.5  85.7  77.0  77.0 
TransD  96.4  86.1  76.3  77.0 
HolE  88.1  82.6  71.4  70.3 
ConvE  95.4  87.3  78.3  78.2 
CapsE  96.5  88.4  79.6  79.5 
ConvKB  96.4  87.9  79.1  80.1 
PConvKB (local)  97.5  88.1  79.7  80.6 
PConvKB (global)  96.9  87.6  79.4  80.9 
PConvKB  97.6  89.5  80.3  82.1 
1. On the whole, PConvKB yields the best performance on the four benchmark datasets, which is consistent with the results of link prediction, and further illustrates that taking relation paths into consideration is beneficial for knowledge graph embedding.
2. More specifically, on FB15k-237, the accuracy of triple classification improves from 80.6% for PConvKB (local) to 82.1% for PConvKB, and from 80.9% for PConvKB (global) to 82.1% for PConvKB. This demonstrates that considering the importance of relation paths both locally and globally better improves knowledge graph embedding.
5 Related Work
Various methods have been proposed for knowledge graph embedding, such as general linear-based models [6], bilinear-based models [13, 27, 34], translation-based models [5, 9, 14, 15, 20, 41, 43], and neural network-based models [4, 7, 22, 23, 31, 32, 33]. We refer to [24, 39] for a recent survey. In this section, we focus on the most relevant neural network-based models, and briefly review the other related methods.
Socher et al. [33] introduce neural tensor networks for knowledge graph embedding, which allow mediated interaction of entity embeddings via a tensor. Schlichtkrull et al. [31] present relational graph convolutional networks for knowledge graph completion. Shi and Weninger [32] present a shared variable neural network model called ProjE, which fills in missing facts in a knowledge graph by learning joint embeddings of entities and relations. Dettmers et al. [7] present a multi-layer convolutional network model, namely ConvE, which uses 2D convolutions over embeddings to predict missing links in knowledge graphs. Nguyen et al. [22] present a CNN-based embedding model, i.e., ConvKB. It applies a CNN to explore the global relationships among same-dimensional entries in each embedding triple, which generalizes the transitional characteristics in the transition-based embedding models. Nguyen et al. [23] present CapsE, which combines a CNN with capsule networks [29] for knowledge graph embedding. All these models treat a knowledge graph as a collection of triples, and disregard the rich information that exists in relation paths.
There are several translation-based models [12, 19, 37, 42, 44] that incorporate relation paths to improve the embeddings of entities and relations. However, they rely fully on hand-designed features to measure the importance of each path, which are not differentiable and cannot be adjusted during training. Moreover, they are all built on translation-based models and are not suitable for CNN-based models. To the best of our knowledge, PConvKB is the first attempt to incorporate relation paths into a CNN-based embedding model.
6 Conclusion
In this paper, we present a novel CNN-based embedding model, PConvKB, which improves knowledge graph embedding by incorporating relation paths locally and globally. In particular, we introduce an attention mechanism to measure the local importance of relation paths. Moreover, we propose a simple yet effective measure, DIPF, to compute the global importance of relation paths. We evaluate our model on link prediction and triple classification. Experimental results show that our model achieves substantial improvements over state-of-the-art methods.
Notes
Acknowledgments
We thank the anonymous reviewers for their valuable comments. This work was supported by the National Natural Science Foundation of China (Grant No. 61872045) and the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (Grant No. 61921003).
References
 1.Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.: DBpedia: a nucleus for a web of open data. In: Aberer, K., et al. (eds.) ASWC/ISWC 2007. LNCS, vol. 4825, pp. 722–735. Springer, Heidelberg (2007). https://doi.org/10.1007/9783540762980_52
 2.Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: 3rd International Conference on Learning Representations, ICLR 2015, Conference Track Proceedings, San Diego, CA, USA, 7–9 May 2015 (2015). http://arxiv.org/abs/1409.0473
 3.Bollacker, K., Evans, C., Paritosh, P., Sturge, T., Taylor, J.: Freebase: a collaboratively created graph database for structuring human knowledge. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pp. 1247–1250. ACM (2008)
 4.Bordes, A., Glorot, X., Weston, J., Bengio, Y.: A semantic matching energy function for learning with multi-relational data - application to word-sense disambiguation. Mach. Learn. 94(2), 233–259 (2014). https://doi.org/10.1007/s1099401353636
 5.Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., Yakhnenko, O.: Translating embeddings for modeling multi-relational data. In: Advances in Neural Information Processing Systems, pp. 2787–2795 (2013)
 6.Bordes, A., Weston, J., Collobert, R., Bengio, Y.: Learning structured embeddings of knowledge bases. In: Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2011, San Francisco, California, USA, 7–11 August 2011 (2011). http://www.aaai.org/ocs/index.php/AAAI/AAAI11/paper/view/3659
 7.Dettmers, T., Minervini, P., Stenetorp, P., Riedel, S.: Convolutional 2D knowledge graph embeddings. In: Thirty-Second AAAI Conference on Artificial Intelligence (2018)
 8.Dong, X., et al.: Knowledge vault: a web-scale approach to probabilistic knowledge fusion. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 601–610. ACM (2014)
 9.Ebisu, T., Ichise, R.: TorusE: knowledge graph embedding on a Lie group. In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI2018), the 30th Innovative Applications of Artificial Intelligence (IAAI2018), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI2018), New Orleans, Louisiana, USA, 2–7 February 2018, pp. 1819–1826 (2018). https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16227
 10.Ghosh, S., Desarkar, M.S.: Class specific TF-IDF boosting for short-text classification: application to short-texts generated during disasters. In: Companion of the The Web Conference 2018 on The Web Conference 2018, WWW 2018, Lyon, France, 23–27 April 2018, pp. 1629–1637 (2018). https://doi.org/10.1145/3184558.3191621
 11.Han, X., et al.: OpenKE: an open toolkit for knowledge embedding. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018: System Demonstrations, Brussels, Belgium, 31 October–4 November 2018, pp. 139–144 (2018). https://aclanthology.info/papers/D182024/d182024
 12.Huang, W., Li, G., Jin, Z.: Improved knowledge base completion by the path-augmented TransR model. In: Li, G., Ge, Y., Zhang, Z., Jin, Z., Blumenstein, M. (eds.) KSEM 2017. LNCS (LNAI), vol. 10412, pp. 149–159. Springer, Cham (2017). https://doi.org/10.1007/9783319635583_13
 13.Jenatton, R., Roux, N.L., Bordes, A., Obozinski, G.: A latent factor model for highly multi-relational data. In: Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a Meeting Held at 3–6 December 2012, Lake Tahoe, Nevada, United States, pp. 3176–3184 (2012). http://papers.nips.cc/paper/4744alatentfactormodelforhighlymultirelationaldata
 14.Ji, G., He, S., Xu, L., Liu, K., Zhao, J.: Knowledge graph embedding via dynamic mapping matrix. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), vol. 1, pp. 687–696 (2015)
 15.Ji, G., Liu, K., He, S., Zhao, J.: Knowledge graph completion with adaptive sparse transfer matrix. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, Arizona, USA, 12–17 February 2016, pp. 985–991 (2016). http://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/view/11982
 16.Kim, D., Seo, D., Cho, S., Kang, P.: Multi-co-training for document classification using various document representations: TF-IDF, LDA, and Doc2Vec. Inf. Sci. 477, 15–29 (2019). https://doi.org/10.1016/j.ins.2018.10.006
 17.Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: 3rd International Conference on Learning Representations, ICLR 2015, Conference Track Proceedings, San Diego, CA, USA, 7–9 May 2015 (2015). http://arxiv.org/abs/1412.6980
 18.Li, X., et al.: Beyond RNNs: positional self-attention with co-attention for video question answering. In: The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, 27 January–1 February 2019, pp. 8658–8665 (2019). https://aaai.org/ojs/index.php/AAAI/article/view/4887
 19.Lin, Y., Liu, Z., Luan, H., Sun, M., Rao, S., Liu, S.: Modeling relation paths for representation learning of knowledge bases. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, Lisbon, Portugal, 17–21 September 2015, pp. 705–714 (2015). http://aclweb.org/anthology/D/D15/D151082.pdf
 20.Lin, Y., Liu, Z., Sun, M., Liu, Y., Zhu, X.: Learning entity and relation embeddings for knowledge graph completion. In: Twenty-Ninth AAAI Conference on Artificial Intelligence (2015)
 21.Miller, G.A.: WordNet: a lexical database for English. Commun. ACM 38(11), 39–41 (1995)
 22.Nguyen, D.Q., Nguyen, T.D., Nguyen, D.Q., Phung, D.: A novel embedding model for knowledge base completion based on convolutional neural network. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), vol. 2, pp. 327–333 (2018)
 23.Nguyen, D.Q., Vu, T., Nguyen, T.D., Nguyen, D.Q., Phung, D.: A capsule network-based embedding model for knowledge graph completion and search personalization. arXiv preprint arXiv:1808.04122 (2018)
 24.Nguyen, D.Q.: An overview of embedding models of entities and relationships for knowledge base completion. CoRR abs/1703.08098 (2017). http://arxiv.org/abs/1703.08098
 25.Nguyen, D.Q., Sirts, K., Qu, L., Johnson, M.: STransE: a novel embedding model of entities and relationships in knowledge bases. In: NAACL HLT 2016: The 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego California, USA, 12–17 June 2016, pp. 460–466 (2016). https://www.aclweb.org/anthology/N161054/
 26.Nickel, M., Rosasco, L., Poggio, T.: Holographic embeddings of knowledge graphs. In: Thirtieth AAAI Conference on Artificial Intelligence (2016)
 27.Nickel, M., Tresp, V., Kriegel, H.: A three-way model for collective learning on multi-relational data. In: Proceedings of the 28th International Conference on Machine Learning, ICML 2011, Bellevue, Washington, USA, 28 June–2 July 2011, pp. 809–816 (2011). https://icml.cc/2011/papers/438_icmlpaper.pdf
 28.Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, A Meeting of SIGDAT, a Special Interest Group of the ACL, Doha, Qatar, 25–29 October 2014, pp. 1532–1543 (2014). https://www.aclweb.org/anthology/D141162/
 29.Sabour, S., Frosst, N., Hinton, G.E.: Dynamic routing between capsules. In: Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, Long Beach, CA, USA, 4–9 December 2017, pp. 3856–3866 (2017). http://papers.nips.cc/paper/6975dynamicroutingbetweencapsules
 30.Saeedan, F., Weber, N., Goesele, M., Roth, S.: Detail-preserving pooling in deep networks. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, 18–22 June 2018, pp. 9108–9116 (2018). http://openaccess.thecvf.com/content_cvpr_2018/html/Saeedan_DetailPreserving_Pooling_in_CVPR_2018_paper.html
 31.Schlichtkrull, M., Kipf, T.N., Bloem, P., van den Berg, R., Titov, I., Welling, M.: Modeling relational data with graph convolutional networks. In: Gangemi, A., et al. (eds.) ESWC 2018. LNCS, vol. 10843, pp. 593–607. Springer, Cham (2018). https://doi.org/10.1007/9783319934174_38
 32.Shi, B., Weninger, T.: ProjE: embedding projection for knowledge graph completion. In: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, California, USA, 4–9 February 2017, pp. 1236–1242 (2017). http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14279
 33.Socher, R., Chen, D., Manning, C.D., Ng, A.Y.: Reasoning with neural tensor networks for knowledge base completion. In: Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a Meeting Held at 5–8 December 2013, Lake Tahoe, Nevada, United States, pp. 926–934 (2013). http://papers.nips.cc/paper/5028reasoningwithneuraltensornetworksforknowledgebasecompletion
 34.Sutskever, I., Salakhutdinov, R., Tenenbaum, J.B.: Modelling relational data using Bayesian clustered tensor factorization. In: Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a Meeting Held at 7–10 December 2009, Vancouver, British Columbia, Canada, pp. 1821–1828 (2009). http://papers.nips.cc/paper/3863modellingrelationaldatausingbayesianclusteredtensorfactorization
 35.Tong, Z., Tanaka, G.: Hybrid pooling for enhancement of generalization ability in deep convolutional neural networks. Neurocomputing 333, 76–85 (2019). https://doi.org/10.1016/j.neucom.2018.12.036
 36.Toutanova, K., Chen, D.: Observed versus latent features for knowledge base and text inference. In: Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality, pp. 57–66 (2015)
 37.Toutanova, K., Lin, V., Yih, W., Poon, H., Quirk, C.: Compositional learning of embeddings for relation paths in knowledge base and text. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, Volume 1: Long Papers, Berlin, Germany, 7–12 August 2016 (2016). http://aclweb.org/anthology/P/P16/P161136.pdf
 38.Vrandečić, D., Krötzsch, M.: Wikidata: a free collaborative knowledge base. Commun. ACM 57, 78–85 (2014)
 39.Wang, Q., Mao, Z., Wang, B., Guo, L.: Knowledge graph embedding: a survey of approaches and applications. IEEE Trans. Knowl. Data Eng. 29(12), 2724–2743 (2017). https://doi.org/10.1109/TKDE.2017.2754499
 40.Wang, W., Chen, Z., Hu, H.: Hierarchical attention network for image captioning. In: The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, 27 January–1 February 2019, pp. 8957–8964 (2019). https://aaai.org/ojs/index.php/AAAI/article/view/4924
 41.Wang, Z., Zhang, J., Feng, J., Chen, Z.: Knowledge graph embedding by translating on hyperplanes. In: Twenty-Eighth AAAI Conference on Artificial Intelligence (2014)
 42.Xiong, S., Huang, W., Duan, P.: Knowledge graph embedding via relation paths and dynamic mapping matrix. In: Woo, C., Lu, J., Li, Z., Ling, T.W., Li, G., Lee, M.L. (eds.) ER 2018. LNCS, vol. 11158, pp. 106–118. Springer, Cham (2018). https://doi.org/10.1007/9783030013912_18
 43.Yuan, J., Gao, N., Xiang, J.: TransGate: knowledge graph embedding with shared gate structure. In: The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, 27 January–1 February 2019, pp. 3100–3107 (2019). https://aaai.org/ojs/index.php/AAAI/article/view/4169
 44.Zhang, M., Wang, Q., Xu, W., Li, W., Sun, S.: Discriminative path-based knowledge graph embedding for precise link prediction. In: Pasi, G., Piwowarski, B., Azzopardi, L., Hanbury, A. (eds.) ECIR 2018. LNCS, vol. 10772, pp. 276–288. Springer, Cham (2018). https://doi.org/10.1007/9783319769417_21