
1 Introduction

Research on knowledge graphs has produced a variety of methods for aligning knowledge graph entities. Traditional entity alignment methods can only exploit the surface symbolic information of knowledge graph data, and therefore struggle to realize entity alignment between knowledge graphs both efficiently and accurately.

This paper proposes an entity alignment method based on joint knowledge representation and an improved neural tensor network (NTN). We regard entity alignment as a binary classification problem, improve the evaluation function of the NTN, and use aligned entity-pair vectors as the input of the alignment relationship model. If the "the Same As" relationship exists between an input entity pair, the evaluation function of the model returns a high score; otherwise it returns a low score. The entity alignment task is then completed based on the scores of the candidate entities.

2 Related Work

2.1 Joint Knowledge Representation Learning

The purpose of knowledge representation learning is to embed entities and relationships into a low-dimensional vector space while preserving as much of the original semantic and structural information as possible. TransE opened up a series of translation-based methods that learn vectorized representations of entities and relationships to support downstream applications such as entity alignment, relationship reasoning, and triple classification. However, TransE handles many-to-one and one-to-many relations poorly. To improve TransE on such multi-mapping relations, TransH, TransR, and TransD were proposed. These TransE variants embed entities differently for different relationships, improving knowledge representation learning for multi-mapping relations at the cost of increased model complexity. In addition, there are non-translation-based methods, including UM [1], SE, DistMult, and HolE [2], which do not model relations as translation embeddings.

2.2 Evaluation of the Similarity of the Neural Tensor Network

The goal of similarity evaluation is to measure the degree of similarity between entities. The BootEA model [3] addresses the problem of very limited training data during knowledge representation learning: it iteratively labels likely aligned entity pairs, adds them to the training of the knowledge embedding model, and constrains the alignment data generated in each iteration. The similarity evaluation methods of most existing models belong to traditional string or vector similarity calculations. For example, KL divergence [4] is used to measure the amount of information lost when one vector approximates another; Euclidean distance and Manhattan distance [5] are used as distance functions for entities mapped into vector space; and many models use cosine similarity [6] to compute entity similarity.

3 Entity Alignment Algorithm

3.1 Algorithm Framework

This paper proposes an entity alignment method based on a neural tensor network, which consists of two parts: joint knowledge representation and neural tensor network similarity evaluation. The overall framework of this method is illustrated in Fig. 1. We use \(\mathrm{G}\) to represent a knowledge graph and \({\mathrm{G}}^{2}\) to represent the set of unordered knowledge graph pairs. For a pair \(({G}_{1},{\mathrm{G}}_{2})\in {\mathrm{G}}^{2}\), \(\mathrm{E}\) is defined as the entity set of a knowledge graph \(\mathrm{G}\) and \(\mathrm{R}\) as its relation set. \(\mathrm{T }= (\mathrm{h},\mathrm{ r},\mathrm{ t})\) denotes a positive entity-relation triple in the knowledge graph \(\mathrm{G}\), where \(\mathrm{h},\mathrm{ t }\in \mathrm{ E}\) and \(\mathrm{r}\in \mathrm{R}\); the vectors \({e}_{h}\), \({e}_{r}\), and \({e}_{t}\) represent the embeddings of the head entity \(\mathrm{h}\), relation \(\mathrm{r}\), and tail entity \(\mathrm{t}\), respectively.

We regard the alignment relationship "the Same As" as a special relationship between entities, as shown in Fig. 2, and perform alignment-specific translation operations between aligned entities to constrain the joint training of the two knowledge graphs and learn a joint knowledge representation.

Formally, given two aligned entities \({e}_{1}\in {\mathrm{E}}_{1}\) and \({\mathrm{e}}_{2}\in {\mathrm{E}}_{2}\), we assume that an alignment relation \({r}^{Same}\) holds between them, so that \({e}_{1}+{r}^{Same}\cong {e}_{2}\). The energy function of the joint knowledge representation is defined as:

$$E\left({e}_{1},{r}^{Same},{e}_{2}\right)=\| {e}_{1}+{r}^{Same}-{e}_{2}\| $$
(1)
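As a minimal sketch of Eq. (1), the alignment energy is simply the norm of the translation residual between the two aligned embeddings. The function name and the toy vectors below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def translation_energy(e1, r_same, e2):
    """Eq. (1): E(e1, r^Same, e2) = || e1 + r^Same - e2 ||.

    Lower energy means the pair (e1, e2) better satisfies the
    alignment translation e1 + r^Same ~= e2.
    """
    return float(np.linalg.norm(e1 + r_same - e2))

# Toy vectors: a perfectly aligned counterpart has zero energy,
# while an unrelated entity has higher energy.
e1 = np.array([0.1, 0.2, 0.3])
r_same = np.array([0.05, 0.0, -0.05])
e2_aligned = e1 + r_same
e2_other = np.array([0.9, -0.4, 0.7])

assert translation_energy(e1, r_same, e2_aligned) < translation_energy(e1, r_same, e2_other)
```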
Fig. 1.

NtnEA method framework

Fig. 2.

Learning process of joint knowledge representation

The similarity evaluation models in Sect. 2.2 do not exploit the underlying semantic and structural information of the entity vectors. We therefore consider the neural tensor network, which has proved very effective in knowledge reasoning, i.e., in modeling the relationship between two vectors and inferring the relation that holds between entities, as shown in Fig. 3. Inspired by this, this paper uses the NTN as the alignment model to infer whether the "the Same As" alignment relationship holds between two entities to be aligned. The method uses the tensor function to treat entity alignment as a binary classification problem, and the evaluation function of the neural tensor network is:

$$S\left({e}_{1},{e}_{2}\right)={u}^{T}f({{e}_{1}}^{T}{W}^{\left[1:k\right]}{e}_{2}+V\left(\begin{array}{c}{e}_{1}\\ {e}_{2}\end{array}\right)+b)$$
(2)

where \(\mathrm{f }=\mathrm{ tanh}\) is a nonlinear function; \({\mathrm{W}}^{[1:\mathrm{k}]}\in {\mathrm{R}}^{\mathrm{d }\times \mathrm{ d }\times \mathrm{ k}}\) is a three-dimensional tensor; \(\mathrm{d}\) is the dimension of the entity embedding vectors and \(\mathrm{k}\) is the number of tensor slices; \(\mathrm{V }\in {\mathrm{R}}^{2\mathrm{d }\times \mathrm{ k}}\) and \(\mathrm{b }\in {\mathrm{R}}^{k}\) are the parameters of the linear part of the evaluation function; and \(\mathrm{u }\in {\mathrm{R}}^{k}\) is the output weight vector.
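The evaluation function of Eq. (2) can be sketched in NumPy as follows; the function name and the random shapes are hypothetical choices for illustration only:

```python
import numpy as np

def ntn_score(e1, e2, W, V, b, u):
    """Eq. (2): S(e1, e2) = u^T tanh( e1^T W^[1:k] e2 + V [e1; e2] + b ).

    e1, e2 : (d,)  entity embedding vectors
    W      : (k, d, d)  tensor with k bilinear slices
    V      : (k, 2d)    linear part of the evaluation function
    b      : (k,)       bias, u : (k,) output weights
    """
    # e1^T W[slice] e2 for every slice -> vector of length k
    bilinear = np.einsum('i,kij,j->k', e1, W, e2)
    linear = V @ np.concatenate([e1, e2]) + b
    return float(u @ np.tanh(bilinear + linear))

# Shape check with random parameters (values are meaningless here).
d, k = 4, 3
rng = np.random.default_rng(0)
s = ntn_score(rng.normal(size=d), rng.normal(size=d),
              rng.normal(size=(k, d, d)), rng.normal(size=(k, 2 * d)),
              rng.normal(size=k), rng.normal(size=k))
```

Because tanh is bounded, the score is always finite and its magnitude is controlled by u, which is convenient for the ranking loss introduced later.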

In ordinary triples, the relationship between the head entity and the tail entity is directional and irreversible. For alignment triples, however, the alignment relationship between entities is undirected; that is, both triples hold for an aligned entity pair \((\mathrm{A},\mathrm{ B})\): \(\left(\mathrm{A},\mathrm{theSameAs},\mathrm{B}\right)\) and \(\left(\mathrm{B},\mathrm{theSameAs},\mathrm{A}\right)\).

The triple embedding part of Fig. 1 illustrates this. We therefore optimize the evaluation function as:

$$S\left({e}_{1},{e}_{2}\right)={u}^{T}f\left(\mathrm{mean}\left({e_{1}}^{T}{W}^{\left[1:k\right]}{e}_{2}+V\left(\begin{array}{c}{e}_{1}\\ {e}_{2}\end{array}\right),\ {e_{2}}^{T}{W}^{\left[1:k\right]}{e}_{1}+V\left(\begin{array}{c}{e}_{2}\\ {e}_{1}\end{array}\right)\right)+b\right)$$
(3)
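The symmetrized evaluation function of Eq. (3) averages the pre-activations of the two directions, which makes the score order-independent. A self-contained sketch (function name and random parameters are illustrative assumptions):

```python
import numpy as np

def ntn_score_sym(e1, e2, W, V, b, u):
    """Eq. (3): average the two directions (e1, e2) and (e2, e1)
    before the bias and nonlinearity, so S(e1, e2) == S(e2, e1)."""
    def pre_act(a, c):
        bilinear = np.einsum('i,kij,j->k', a, W, c)
        return bilinear + V @ np.concatenate([a, c])
    mean = 0.5 * (pre_act(e1, e2) + pre_act(e2, e1))
    return float(u @ np.tanh(mean + b))

# Symmetry check: swapping the entities does not change the score.
d, k = 4, 3
rng = np.random.default_rng(1)
e1, e2 = rng.normal(size=d), rng.normal(size=d)
W, V = rng.normal(size=(k, d, d)), rng.normal(size=(k, 2 * d))
b, u = rng.normal(size=k), rng.normal(size=k)
assert np.isclose(ntn_score_sym(e1, e2, W, V, b, u),
                  ntn_score_sym(e2, e1, W, V, b, u))
```

Swapping e1 and e2 merely exchanges the two terms inside the mean, which is exactly the undirected behavior expected of the "the Same As" relation.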

The final loss function is as follows:

$$L(\mathit{\Omega})=\mathop{\sum}\nolimits_{i=1}^{N}\mathop{\sum}\nolimits_{c=1}^{C}\mathit{max}\left(0,\,1-S\left({T}^{i}\right)+S\left({T}_{c}^{i}\right)\right)+\lambda {\Vert \mathit{\Omega} \Vert }_{2}^{2}$$
(4)

where \(\Omega \) is the set of all parameters. \({T}_{c}^{i}\) is the \({\mathrm{c}}^{th}\) negative example of the \({\mathrm{i}}^{th}\) positive example.

Fig. 3.

Neural tensor network relational reasoning process

3.2 Algorithm Flow

The algorithm description of the specific NtnEA model is shown in Algorithm 1.


4 Experiment

4.1 Datasets

This experiment compares entity alignment methods based on knowledge representation learning. To facilitate horizontal comparison across methods and to evaluate NtnEA on cross-language entity alignment tasks, we use the widely adopted DBP15K [7] data set, which contains three cross-language subsets built from the multilingual versions of the DBpedia knowledge base: \({\mathrm{DBP}}_{\mathrm{ZH}-\mathrm{EN}}\) (Chinese-English), \({\mathrm{DBP}}_{\mathrm{JP}-\mathrm{EN}}\) (Japanese-English), and \({\mathrm{DBP}}_{\mathrm{FR}-\mathrm{EN}}\) (French-English). Each subset contains 15,000 aligned entity pairs.

4.2 Training and Evaluation

To verify the effectiveness of the proposed method on the knowledge graph alignment task, the following commonly used methods were selected as experimental baselines:

  • MTransE, the linear transformation between two vector spaces established by TransE;

  • IPTransE, which embeds entities from different knowledge graphs into a unified vector space, and iteratively uses predicted anchor points to improve performance;

  • AlignE [6] uses ε-truncated uniform negative sampling and parameter exchange to realize the embedded representation of the knowledge graph. It is a variant of BootEA method without bootstrapping;

  • AVR-GCN, which uses VR-GCN as the network embedding model to learn entity representations and relation representations simultaneously, and applies this network to the task of multi-relational network alignment.

To verify the algorithm experimentally, we first learn the vectorized representations of entities and relations in a low-dimensional embedding space on the DBP15K data set. During training, the dimension d of the vector space is selected from \(\{\mathrm{50, 80, 100, 150}\}\), the learning rate λ from \(\{{10}^{-2},{10}^{-3},{10}^{-4}\}\), and the number of negative samples n from \(\{1, 3, 5, 15, 30\}\). The three data sets are trained separately, and the optimal parameter configurations are: (1) ZH-EN: \(\mathrm{d}=100\), \(\uplambda =0.001\), \(\mathrm{n}=5\); (2) JP-EN: \(\mathrm{d}=100\), \(\uplambda =0.001\), \(\mathrm{n}=3\); (3) FR-EN: \(\mathrm{d}=100\), \(\uplambda =0.003\), \(\mathrm{n}=5\).

The aligned entity data of each cross-language data set is split in a 3:7 ratio. As shown in Fig. 4, as the number of tensor slices k increases, the model becomes more complex and its performance improves; however, the number of parameters also grows with k. Balancing the two, the optimal configuration of the neural tensor network model is \(\uplambda =0.0005\), \(\mathrm{k}=200\).

Fig. 4.

Hit@1 indicator curve for different values of k

4.3 Experimental Results and Analysis

Following the experimental settings of the previous section, entity alignment experiments were performed on the three cross-language data sets of DBP15K; the results are shown in Table 1. The results of MTransE, IPTransE, AlignE, and AVR-GCN are taken from the literature [8]. The table shows that the two NtnEA methods improve significantly over the baseline methods MTransE and IPTransE on the Hit@k and MRR indicators. For example, the Hit@10 values of NtnEA on the three cross-language data sets of DBP15K are 82.00, 78.07, and 77.10, respectively, an average increase of 10.7% over the AlignE model.
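For reference, the Hit@k and MRR indicators used throughout this section can be computed from the rank of each entity's true counterpart among the candidates. A minimal sketch with hypothetical ranks (the toy data below is not from the experiments):

```python
def hit_at_k(ranks, k):
    """Fraction of test entities whose true counterpart ranks in the top k."""
    return sum(r <= k for r in ranks) / len(ranks)

def mrr(ranks):
    """Mean reciprocal rank of the true counterpart (1 = ranked first)."""
    return sum(1.0 / r for r in ranks) / len(ranks)

# ranks[i] = rank of the true aligned entity for test entity i
ranks = [1, 3, 2, 15, 1]
assert hit_at_k(ranks, 1) == 0.4   # 2 of 5 counterparts ranked first
assert hit_at_k(ranks, 10) == 0.8  # 4 of 5 within the top 10
```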

This paper uses the semantic structural information of triple data and, through joint knowledge representation, integrates more alignment information; the results show that its alignment effect is significantly better than that of alignment methods based purely on knowledge representation learning, such as MTransE and IPTransE. Between the two NtnEA variants, the NtnEA model outperforms the NtnEA(Orig) model, which supports treating the head and tail entities of alignment triples as an undirected structure under the "the Same As" relation. On the three cross-language data sets, the Hit@10 and MRR indicators of both NtnEA(Orig) and NtnEA exceed those of MTransE and IPTransE. However, there is no obvious advantage over the more advanced AVR-GCN model on the Hit@1 indicator, which reflects alignment accuracy.

Table 2 shows that when training the similarity evaluation model, the more prior aligned seed pairs the training set contains, the better the model performs on the entity alignment task.

Table 1. Comparison of entity alignment results
Table 2. Comparison results under different seed set partition ratios Hit@k index

5 Conclusions

This paper introduced a cross-knowledge-graph entity alignment model based on a neural tensor network. The model consists of two parts: joint knowledge representation learning and neural tensor network similarity evaluation. The method was verified experimentally, and the results show that it achieves good entity alignment performance under the given experimental conditions. Compared with previous algorithms, the Hit@5 and Hit@10 indicators are improved, but the improvement on Hit@1 is not obvious, which means the method still falls short in alignment accuracy.