Knowledge graph embedding methods for entity alignment: experimental review

Fanourakis, Nikolaos; Efthymiou, Vasilis; Kotzinos, Dimitris; Christophides, Vassilis

doi:10.1007/s10618-023-00941-9

Published: 29 June 2023

Volume 37, pages 2070–2137, (2023)

Data Mining and Knowledge Discovery


Abstract

In recent years, we have witnessed the proliferation of knowledge graphs (KGs) in various domains, aiming to support applications like question answering, recommendations, etc. A frequent task when integrating knowledge from different KGs is to find which subgraphs refer to the same real-world entity, a task widely known as entity alignment. Recently, embedding methods have been used for entity alignment; they learn a vector-space representation of entities that preserves their similarity in the original KGs. A wide variety of supervised, unsupervised, and semi-supervised methods have been proposed that exploit both factual (attribute-based) and structural (relation-based) information of entities in the KGs. Still, a quantitative assessment of their strengths and weaknesses in real-world KGs according to different performance metrics and KG characteristics is missing from the literature. In this work, we conduct the first meta-level analysis of popular embedding methods for entity alignment, based on a statistically sound methodology. Our analysis reveals statistically significant correlations of different embedding methods with various meta-features extracted from KGs, and ranks the methods in a statistically significant way according to their effectiveness across all real-world KGs of our testbed. Finally, we study interesting trade-offs in terms of the methods’ effectiveness and efficiency.


1 Introduction

In recent years, we have witnessed the proliferation of knowledge graphs (KGs) in various domains, aiming to support applications like entity search (Dong et al. 2014), question answering (Ahmetaj et al. 2021), and recommendations (Tarus et al. 2018). Typically, KGs store machine-readable descriptions of real-world entities (e.g., people, movies, books) that capture both relational and factual information. In this work, we refer to an entity description as an identifiable set of property-value pairs that abstracts several data formats, such as relational, RDF, or property graphs.

As different KGs may independently describe the same real-world entity, a crucial task when integrating knowledge from several KGs is to align their entity descriptions. Entity alignment (EA), also known as entity resolution (Christophides et al. 2015, 2021), aims to identify pairs of descriptions from different KGs that refer to the same real-world entity, which we call matching pairs or, simply, matches.

Figure 1 shows an example of two KGs, each containing four entity descriptions, represented as nodes, with their properties, represented as edges. The entities described in those KGs can be aligned. For instance, node $v_1$ in $KG_1$ is a description of the director Stanley Kubrick, providing his name and birth year as attributes and, as relations, the facts that he directed and wrote the movie The Shining and that he also directed the entity represented by $v_3$. Node $v_1$ should be aligned with node $v_5$ in $KG_2$, even though the “name” attribute is called “label” in $KG_2$ and its value, “S. Kubrick”, is slightly different from the name value, “Stanley Kubrick”, used in $v_1$. The birth year attribute is also missing in $v_5$, and its only relation to $v_6$ is “directed”, lacking the relation “wrote” that $KG_1$ provides between $v_1$ and $v_2$. Similarly, the other entity alignments in the example of Fig. 1 should be $v_2$ with $v_6$ (describing the movie “The Shining”), $v_3$ with $v_7$ (describing the movie “Barry Lyndon”), and $v_4$ with $v_8$ (describing the actor Philip Stone).

One way to implement entity alignment as a machine learning (ML) task is to learn a vector-space representation of symbolic KGs, known as embeddings. Numerical representations (embeddings) of KGs are preferred over symbolic ones in various ML tasks (link/node/subgraph prediction, matching, etc.), as they potentially mitigate the symbolic, linguistic and schematic heterogeneity of independently created KGs and thus aim to simplify knowledge reasoning. The idea is to embed the nodes (entities) and edges (relations or attributes) of a KG into a low-dimensional vector space. In particular, we would like similar entities in the original KG to be close to each other in the embedding space and dissimilar entities to be far from each other. To this end, both positive (i.e., actual KG edges) and negative (i.e., synthetic edges that do not exist in the actual KG) samples of the KGs are used. This way, by measuring the distance between the embeddings of two entities in the vector space, we can decide whether they match. Any available ground truth regarding the alignment of entities, called seed alignment, can be used for training and/or evaluating an embedding method.

Learning low-dimensional representations of KGs in a way such that the semantic relatedness of entities is captured by the geometrical structures of an embedding space is a challenging task that gave birth to numerous methods. Embedding-based entity alignment methods essentially exploit the relational part (i.e., entity structural neighborhood) and the factual part (i.e., entity names/identities, attributes that represent literals) of descriptions. We refer to the former as relation-based methods, e.g., MTransE (Chen et al. 2017), MTransE+RotatE (Sun et al. 2020), RDGCN (Wu et al. 2019), RREA (Mao et al. 2020b), and to the latter as attribute-based methods, e.g., MultiKE (Zhang et al. 2019), AttrE (Trisedya et al. 2019), KDCoE (Chen et al. 2018), BERT_INT (Tang et al. 2020). Although this research direction is rapidly growing, there are still several open questions regarding the underlying assumptions of methods, as well as the efficiency and effectiveness of entity alignment in realistic settings. In particular, in this work we address the following missing insights from the literature:

Q1.
Characteristics of methods. What are the critical factors that affect the effectiveness of relation-based (e.g., negative sampling, range of neighborhood) and attribute-based methods (e.g., usage of literals), and how sensitive are the methods to hyperparameter tuning?
Q2.
Families of methods. What is the improvement in the effectiveness of embedding-based entity alignment methods if we consider not only the structural relations of entities, but also their attribute values?
Q3.
Effectiveness vs Efficiency Tradeoff. Is the runtime overhead of each method worth paying, with respect to the achieved effectiveness?
Q4.
Characteristics of datasets. To which characteristics of the datasets (e.g., sparsity, number of entity pairs in seed alignment, heterogeneity in terms of literals, predicate names and entity names) are supervised, semi-supervised and unsupervised methods sensitive?

Although several recent works (Zeng et al. 2021; Choudhary et al. 2021; Wang et al. 2017; Sun et al. 2020; Zhang et al. 2020; Zhao et al. 2022; Jiang et al. 2021; Wang et al. 2021; Leone et al. 2022; Zhang et al. 2022; Chaurasiya et al. 2022) survey embedding-based entity alignment methods, only a few of them (Sun et al. 2020; Zhang et al. 2020; Zhao et al. 2022; Leone et al. 2022; Zhang et al. 2022; Chaurasiya et al. 2022) conduct an experimental evaluation to obtain useful insights. The conclusions drawn from Zhang et al. (2020) are limited, as it leaves out some representative methods in embedding-based alignment, such as MTransE (no negative sampling), KDCoE (semi-supervised, exploiting long textual descriptions^{Footnote 1}), RREA (semi-supervised, exploiting structural information), AttrE (unsupervised), and BERT_INT (supervised, exploiting both structural and factual information), while neither RREA nor BERT_INT was part of OpenEA (Sun et al. 2020), as both were published later. In addition, Leone et al. (2022) does not include MTransE, MTransE+RotatE, AttrE, or KDCoE, and RREA is not included in Leone et al. (2022), Zhang et al. (2022), or Chaurasiya et al. (2022) either. Moreover, benchmarking efforts such as Zhang et al. (2020), OpenEA (Sun et al. 2020), EAE (Zhao et al. 2022), Leone et al. (2022), Zhang et al. (2022) and Chaurasiya et al. (2022) do not shed light on questions Q1, Q2 and Q3, which we address in our work. Furthermore, previous works have considered only a subset of the dataset characteristics we study, such as the density of the KGs and the similarity of entity names, to answer question Q4.

To compare the effectiveness and efficiency of the methods in realistic settings, we have extended the testbed of KG pairs usually considered in related empirical studies. Specifically, OpenEA, EAE and Leone et al. (2022), Zhang et al. (2022), Chaurasiya et al. (2022) employ only datasets with a low number of entities featuring descriptions and literal values. In our testbed, we have included five additional datasets whose unique characteristics allow us to draw new insights regarding the evaluated EA methods, which were not previously reported in Sun et al. (2020), Zhang et al. (2020), Zhao et al. (2022), Leone et al. (2022), Zhang et al. (2022), Chaurasiya et al. (2022). More precisely, supervised methods that exploit KG relations, such as RDGCN, are outperformed by unsupervised (AttrE) and semi-supervised (KDCoE) methods that exploit the similarity of literals in datasets of lower density but with rich factual information (i.e., attributes).

Rather than simply reporting the raw experimental results, we conduct a meta-level analysis, aiming to find statistically significant correlations between the methods and the dataset characteristics (meta-features). Furthermore, we consider the non-parametric Friedman test (Demsar 2006) and the post-hoc Nemenyi test (Nemenyi 1963), in order to perform a pairwise comparison of the methods and rank them based on their effectiveness across all datasets of our testbed. Finally, we are interested in the methods’ training time curve and potential effectiveness vs efficiency trade-offs instead of simply reporting overall runtimes (as in OpenEA and EAE).

In a nutshell, the contributions of this work are the following:

In Sect. 2, we present a qualitative comparison of state-of-the-art embedding-based entity alignment methods that span from supervised, i.e., MTransE (Chen et al. 2017), MTransE+RotatE (Sun et al. 2020), MultiKE (Zhang et al. 2019), RDGCN (Wu et al. 2019), RREA(basic) (Mao et al. 2020b), BERT_INT (Tang et al. 2020), to unsupervised, i.e., AttrE (Trisedya et al. 2019), and semi-supervised, i.e., KDCoE (Chen et al. 2018), RREA(semi) (Mao et al. 2020b), paradigms. They are representative methods of different embedding families covering both relation- and attribute-based, but also considering one-hop and multi-hop neighborhoods in KGs, as well as different negative sampling strategies.
In Sect. 3, we describe our framework for a fair empirical comparison of the different methods. We detail the extended testbed of datasets that exhibit diverse characteristics (w.r.t. KG density, entity naming, textual descriptions, etc.) usually encountered in reality along with the corresponding pre-processing pipelines. We additionally introduce the evaluation protocol and metrics capturing different aspects of the methods’ effectiveness.
In Sect. 4, we report and analyze the results of a series of experiments, including a comparison to a state-of-the-art non-embedding-based method, PARIS (Suchanek et al. 2011), conducted to answer the four open questions introduced previously, using a reliable, statistically sound methodology. First, we discover a statistically significant ranking of the methods according to their effectiveness across all real-world KGs of our testbed. Then, we study interesting trade-offs in terms of their effectiveness and efficiency. Last but not least, we extract statistically significant correlations between the methods’ performance and various characteristics of our datasets (i.e., meta-features).

Finally, the main conclusions drawn from our experiments, as well as the plans for future work, are discussed in Sect. 5.

2 Entity alignment with KG embeddings

In this section, we first formally define the entity alignment problem on Knowledge Graphs (KGs), along with some related constraints. Then, we provide a qualitative comparison of entity alignment methods based on KG embedding.

2.1 The entity alignment problem

Following the typical notation used in the literature (Zhang et al. 2019; Wang et al. 2020), we assume that entities (with the corresponding entity names)^{Footnote 2} are described in KGs by a collection of edges $\left<h,r,t\right>$, whose head h is always an entity and whose tail t may be either another entity, in which case we call this edge a relation edge and r a relation, or a literal (e.g., number, date, string), in which case we call this edge an attribute edge and r an attribute with its corresponding attribute name^{Footnote 2}. We represent a knowledge graph as $KG = (E, R, A, L, X, Y)$, where E is a set of entities, R is a set of relations, A is a set of attributes, L is a set of literals, and $X \subseteq (E \times R \times E)$ and $Y \subseteq (E \times A \times L)$ are the sets of relation and attribute edges of the KG, respectively. Given a source $KG_1 = (E_1, R_1, A_1, L_1, X_1, Y_1)$ and a target $KG_2 = (E_2, R_2, A_2, L_2, X_2, Y_2)$, the task of entity alignment is to find pairs of matching entities $M = \{(e_i,e_j)\in E_1 \times E_2 \mid e_i \equiv e_j \}$, where “$\equiv$” denotes the equivalence relationship (Zhang et al. 2019; Wang et al. 2020). A subset $\delta \subseteq M$ of the matching pairs may be used as a seed alignment for training. For instance, in Fig. 1, the entities of the two KGs are $E_1 = \{v_1, v_2, v_3, v_4\}$ and $E_2 = \{v_5, v_6, v_7, v_8\}$. The relations are $R_1 = \{cast, directed, wrote\}$ and $R_2 = \{directed, actedIn\}$, while the attributes are $A_1 = \{name, birth\text{-}year, title\}$ and $A_2=\{label\}$, and the literals are $L_1 = \{``Stanley Kubrick'', ``1928'', ``The Shining''\}$ and $L_2 = \{``S. Kubrick'', ``Barry Lyndon'', ``P. Stone''\}$.
The relation edges are $X_1 = \{(v_1,directed,v_2), (v_1,wrote,v_2), (v_1,directed,v_3), (v_2,cast,v_4)\}$ and $X_2 = \{(v_5,directed,v_6), (v_8,actedIn,v_6), (v_8,actedIn,v_7)\}$, while the attribute edges are $Y_1 = \{(v_1,name,``Stanley Kubrick''), (v_1,birth\text{-}year,``1928''), (v_2,title,``The Shining'')\}$ and $Y_2 = \{(v_5,label,``S. Kubrick''), (v_8,label,``P. Stone''), (v_7,label,``Barry Lyndon'')\}$. The task of entity alignment is to find the matches (denoted by dashed edges in Fig. 1) $M = \{(v_1,v_5), (v_2,v_6), (v_3,v_7), (v_4,v_8)\}$.
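The formal model above can be mirrored directly in code. The following Python sketch is only an illustration: the `KG` named tuple and the `build_kg` helper are hypothetical names, and the sketch derives $E_2$, $R_2$, $A_2$ and $L_2$ from the edge sets $X_2$ and $Y_2$ of Fig. 1 as listed in the text. The seed alignment `delta` is an arbitrary example subset of M.

```python
from typing import NamedTuple


class KG(NamedTuple):
    """KG = (E, R, A, L, X, Y) as defined in Sect. 2.1 (hypothetical container)."""
    E: frozenset  # entities
    R: frozenset  # relations
    A: frozenset  # attributes
    L: frozenset  # literals
    X: frozenset  # relation edges (h, r, t) with t in E
    Y: frozenset  # attribute edges (h, a, l) with l in L


def build_kg(relation_edges, attribute_edges):
    """Derive E, R, A and L from the edge sets X and Y."""
    X = frozenset(relation_edges)
    Y = frozenset(attribute_edges)
    E = frozenset({h for h, _, _ in X} | {t for _, _, t in X} | {h for h, _, _ in Y})
    R = frozenset(r for _, r, _ in X)
    A = frozenset(a for _, a, _ in Y)
    L = frozenset(lit for _, _, lit in Y)
    return KG(E, R, A, L, X, Y)


# KG_2 of Fig. 1, built from the edge sets X_2 and Y_2 listed in the text
kg2 = build_kg(
    relation_edges=[("v5", "directed", "v6"), ("v8", "actedIn", "v6"),
                    ("v8", "actedIn", "v7")],
    attribute_edges=[("v5", "label", "S. Kubrick"), ("v8", "label", "P. Stone"),
                     ("v7", "label", "Barry Lyndon")],
)

# Gold matches of Fig. 1 and a hypothetical seed alignment delta (a subset of M)
M = {("v1", "v5"), ("v2", "v6"), ("v3", "v7"), ("v4", "v8")}
delta = {("v1", "v5"), ("v2", "v6")}
assert delta <= M

print(sorted(kg2.E))  # ['v5', 'v6', 'v7', 'v8']
print(sorted(kg2.R))  # ['actedIn', 'directed']
```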

In practice, all the evaluated entity alignment methods rely on a number of assumptions/constraints, as listed below:

Every entity is assumed to be the head of at least one relation edge (so we do not consider entities that are disconnected from the rest of the KG):

$\forall e \in E, \exists r \in R, t \in E: (e, r, t) \in X.$
1-to-1 constraint: Every entity in $E_1$ should be matched to exactly one entity in $E_2$: $\forall e_i \in E_1\; \exists e_j \in E_2: (e_i, e_j) \in M \wedge \nexists e_j' \in E_2 \setminus \{e_j\}: (e_i, e_j') \in M$, and vice versa: $\forall e_j \in E_2\; \exists e_i \in E_1: (e_i, e_j) \in M \wedge \nexists e_i' \in E_1 \setminus \{e_i\}: (e_i', e_j) \in M$. This also implies that $\mid M \mid = \mid E_1 \mid = \mid E_2 \mid$.
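Both assumptions are easy to check on concrete data. The sketch below uses hypothetical helper names and plain Python sets of pairs and triples; it is an illustration, not part of any of the evaluated methods.

```python
def every_entity_is_a_head(E, X):
    """First assumption: each entity in E is the head of at least one relation edge in X."""
    heads = {h for h, _, _ in X}
    return all(e in heads for e in E)


def is_one_to_one(M, E1, E2):
    """1-to-1 constraint: every entity of E1 and of E2 occurs in exactly one pair of M."""
    left = [ei for ei, _ in M]
    right = [ej for _, ej in M]
    return (len(left) == len(set(left)) and set(left) == set(E1)
            and len(right) == len(set(right)) and set(right) == set(E2))


E1, E2 = {"v1", "v2", "v3", "v4"}, {"v5", "v6", "v7", "v8"}
M = {("v1", "v5"), ("v2", "v6"), ("v3", "v7"), ("v4", "v8")}
print(is_one_to_one(M, E1, E2))                   # True
print(is_one_to_one(M | {("v1", "v6")}, E1, E2))  # False: v1 and v6 now occur twice
print(every_entity_is_a_head({"a", "b"}, {("a", "r", "b"), ("b", "r", "a")}))  # True
print(every_entity_is_a_head({"a", "b", "c"}, {("a", "r", "b")}))             # False
```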

2.2 Knowledge graph embeddings for entity alignment

KG embedding methods aim to learn a low-dimensional vector-space representation of symbolic KGs, known as embeddings. The idea is to embed the nodes (entities) and edges (relations or attributes) of a KG in an embedding space in a way that preserves their similarity in the original KG. Embedding methods have been shown to be effective in many machine learning tasks, such as node classification (Kipf and Welling 2017), which aims to assign entity types to KG nodes, or link prediction (Bordes et al. 2013; Sun et al. 2019), which aims to find missing relations between entities in a single KG. Lately, several embedding-based methods have also been proposed for entity alignment, exploiting either relation edges (relation-based methods), such as MTransE (Chen et al. 2017), MTransE+RotatE (Sun et al. 2020), RDGCN (Wu et al. 2019), RREA(basic) (Mao et al. 2020b) and RREA(semi) (Mao et al. 2020b), or attribute edges (attribute-based methods), such as MultiKE (Zhang et al. 2019), AttrE (Trisedya et al. 2019), KDCoE (Chen et al. 2018), and BERT_INT (Tang et al. 2020).

Figure 2 depicts the building blocks of embedding-based entity alignment methods: (i) The embedding module $S_K$ that encodes the entities of each KG in an embedding space (L1 for $KG_1$ and L2 for $KG_2$) according to the relational part (i.e., entity structural neighborhood) and/or the factual part (i.e., entity names/identities, literals/text) of descriptions. (ii) The alignment module $S_A$ that aligns the produced entity embeddings using the seed alignment (supervised), attribute-value similarity (unsupervised), or both (semi-supervised). It brings the entities of the two KGs into a comparable embedding space, using one of three techniques known as sharing, swapping and mapping, so that the alignment result can be generated according to a distance metric (e.g., Euclidean). Sharing and swapping directly update the entity embeddings produced by the embedding module according to the available similarity evidence of entities, while mapping learns a linear transformation between the two embedding spaces of the KGs to be aligned. In the rest of this section, we detail popular methods that implement those modules.
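Once both KGs are embedded in a comparable space, the alignment result can be read off by a nearest-neighbor search. The NumPy sketch below assumes Euclidean distance and toy random embeddings, and its function names are hypothetical. It greedily picks the closest $KG_2$ entity for each $KG_1$ entity, so, unlike the 1-to-1 constraint of Sect. 2.1, two $KG_1$ entities could select the same $KG_2$ entity. The Hits@1-style helper only scores the toy result.

```python
import numpy as np


def align_by_nearest_neighbor(emb1, emb2):
    """Match each KG_1 entity (row of emb1) to its closest KG_2 entity (row of emb2).

    Returns the index of the nearest KG_2 entity for every KG_1 entity and the full
    (n1, n2) Euclidean distance matrix. No 1-to-1 constraint is enforced.
    """
    diff = emb1[:, None, :] - emb2[None, :, :]   # shape (n1, n2, d)
    dist = np.sqrt((diff ** 2).sum(axis=-1))      # shape (n1, n2)
    return dist.argmin(axis=1), dist


def hits_at_1(pred, gold):
    """Fraction of KG_1 entities whose nearest KG_2 entity is the true match."""
    return float(np.mean(pred == gold))


# Toy data: KG_2 embeddings are a permuted, slightly perturbed copy of KG_1's
rng = np.random.default_rng(0)
emb1 = rng.normal(size=(5, 8))
gold = np.array([2, 0, 4, 1, 3])       # KG_1 entity i matches KG_2 entity gold[i]
emb2 = np.empty_like(emb1)
emb2[gold] = emb1 + 0.01 * rng.normal(size=emb1.shape)

pred, dist = align_by_nearest_neighbor(emb1, emb2)
print(pred, hits_at_1(pred, gold))
```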

2.2.1 Embedding module

KG embedding methods proposed for the task of link prediction are used to implement the embedding module of entity alignment methods. Several families of KG embedding methods for link prediction have been proposed in the literature, e.g., Yang et al. (2015), Trouillon et al. (2016), Nickel and Kiela (2017). In this paper, we are interested in contrasting translational methods such as TransE (Bordes et al. 2013) and RotatE (Sun et al. 2019) with Graph Neural Networks such as Graph Convolutional Networks (GCNs) (Kipf and Welling 2017) and Graph Attention Networks (GATs) (Velickovic et al. 2018).

2.2.1.1 Translational methods Translational methods use distance-based scoring functions to optimize a margin-based loss function and learn the embeddings of entities in a KG. A distance-based scoring function measures the plausibility of a relation edge $\left<h,r,t\right>$, i.e., the distance between the embeddings of the head and the tail entities, given the embedding of the relation. A margin-based loss function, which these methods minimize, requires the score of each positive edge to be lower than the score of a negative edge by at least a certain margin. Translational methods mainly differ in the degree to which they are able to capture complex graph structures, such as cycles, through the operator adopted in their scoring function.

TransE (Bordes et al. 2013) is one of the most widely used translational KG embedding methods. In this method, both entities and relations are represented in the same vector space, and the relation r is modeled as a translation from the head entity h to the tail entity t. If $\left<h,r,t\right> \in X$, then the embedding ${{\textbf {t}}}$ of t should be close to the embedding ${{\textbf {h}}}$ of h plus the vector ${{\textbf {r}}}$ of r, i.e., ${{\textbf {h}}}+{{\textbf {r}}}\approx {{\textbf {t}}}$. Formally, TransE minimizes the margin-based loss function:

$$\begin{aligned} J_{S E}=\sum _{x \in X} \sum _{x' \in X'} \max \left( 0, \gamma +f\left( {\textbf{x}}\right) -f\left( \mathbf {x'}\right) \right) , \end{aligned}$$

(1)

where $f\left( {\textbf{h}},{\textbf{r}},{\textbf{t}}\right) = \Vert {\textbf{h}}+{\textbf{r}}-{\textbf{t}} \Vert$ (typically the $L_1$ or $L_2$ norm) is the scoring function, X is the set of positive relation edges (relation edges that exist in the KG), $X'$ is the set of negative relation edges (relation edges that do not exist in the KG), and $\gamma$ is the margin hyperparameter. Each negative edge $x' \in X'$ is created by replacing the head or the tail of a positive edge in X with a random entity, ensuring that $x' \notin X$.
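Eq. (1) and the negative sampling step can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: integer entity and relation ids, the $L_1$ norm in f, one negative per positive edge, and the literal double sum of Eq. (1) over all $(x, x')$ pairs. The function names are hypothetical.

```python
import numpy as np


def transe_score(h, r, t, p=1):
    """f(h, r, t) = ||h + r - t||_p (L1 by default); lower means a more plausible edge."""
    return np.linalg.norm(h + r - t, ord=p, axis=-1)


def corrupt(edges, num_entities, rng):
    """One negative per positive edge: replace the head or the tail with a random entity,
    re-drawing until the corrupted edge is not a positive edge (x' not in X)."""
    positives = {tuple(int(v) for v in edge) for edge in edges}
    negatives = []
    for h, r, t in (tuple(int(v) for v in edge) for edge in edges):
        while True:
            e = int(rng.integers(num_entities))
            cand = (e, r, t) if rng.random() < 0.5 else (h, r, e)
            if cand not in positives:
                negatives.append(cand)
                break
    return np.array(negatives)


def transe_margin_loss(ent, rel, pos, neg, gamma=1.0):
    """Eq. (1): sum over every (x, x') in X x X' of max(0, gamma + f(x) - f(x'))."""
    f_pos = transe_score(ent[pos[:, 0]], rel[pos[:, 1]], ent[pos[:, 2]])
    f_neg = transe_score(ent[neg[:, 0]], rel[neg[:, 1]], ent[neg[:, 2]])
    return np.maximum(0.0, gamma + f_pos[:, None] - f_neg[None, :]).sum()


rng = np.random.default_rng(0)
ent = rng.normal(size=(4, 3))                      # 4 entities, dimension 3
rel = rng.normal(size=(2, 3))                      # 2 relations
pos = np.array([[0, 0, 1], [1, 1, 2], [2, 0, 3]])  # (head, relation, tail) ids
neg = corrupt(pos, num_entities=4, rng=rng)
print(transe_margin_loss(ent, rel, pos, neg))
```

Pairing each positive edge only with its own negatives, rather than with all of $X'$, would be a natural variant for larger graphs.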

RotatE (Sun et al. 2019) is a translation-based embedding model that, unlike TransE, can infer various relation patterns, such as symmetry. Specifically, RotatE maps entities and relations to the complex vector space and defines each relation as a rotation from the head entity to the tail entity (Fig. 3). Given a relation edge $\left<h,r,t\right>$, we expect that ${{\textbf {t}}} = {{\textbf {h}}} \circ {{\textbf {r}}}$, where $\circ$ denotes the Hadamard (Million 2007) (element-wise) product and every element of ${{\textbf {r}}}$ has modulus 1, so that multiplying by it rotates the corresponding element of ${{\textbf {h}}}$ in the complex plane.

This model aims to minimize the margin-based loss function

$$\begin{aligned} L=-\log \sigma \left( \gamma -d_{r}({\textbf{h}}, {\textbf{r}}, {\textbf{t}})\right) -\sum _{i=1}^{n} \frac{1}{k} \log \sigma \left( d_{r}\left( {\textbf{h}}_{i}^{\prime }, {\textbf{r}}, {\textbf{t}}_{i}^{\prime }\right) -\gamma \right) , \end{aligned}$$

(2)

by decreasing the distance of positive relation edges and increasing the distance of negative relation edges, where $d_r\left( {\textbf{h}}, {\textbf{r}}, {\textbf{t}}\right) = \Vert {\textbf{h}}\circ {\textbf{r}}-{\textbf{t}} \Vert$ is the distance-based scoring function, $\gamma$ is a fixed margin, $\sigma$ is the sigmoid function, and $(h'_i,r,t'_i)$ is the i-th negative edge. RotatE, like TransE, creates the negative relation edges by randomly replacing the head or the tail of positive relation edges.
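The rotation view of RotatE can be checked numerically. In this NumPy sketch (function names hypothetical), a relation is stored as real phases so that every element of $\mathbf{r}$ has modulus 1 by construction, the norm in $d_r$ is taken as the sum of element-wise moduli (an assumption), and Eq. (2) is evaluated for a single positive edge with its negatives weighted uniformly by $1/k$, assuming $n = k$ negatives. The example shows a symmetric relation (all phases equal to $\pi$), which TransE could only represent with $\mathbf{r} \approx \mathbf{0}$.

```python
import numpy as np


def rotate_distance(h, phase, t):
    """d_r(h, r, t) = ||h o r - t|| with r = exp(i * phase), so every |r_k| = 1.

    h, t: complex entity embeddings; phase: real rotation angles of the relation.
    The norm is the sum of element-wise moduli (an assumption of this sketch).
    """
    r = np.exp(1j * phase)
    return np.abs(h * r - t).sum(axis=-1)


def log_sigmoid(x):
    """Numerically stable log(sigma(x)) = -log(1 + exp(-x))."""
    return -np.logaddexp(0.0, -x)


def rotate_loss(d_pos, d_negs, gamma):
    """Eq. (2) for one positive edge, weighting each of the k negatives by 1/k."""
    d_negs = np.asarray(d_negs, dtype=float)
    return -log_sigmoid(gamma - d_pos) - log_sigmoid(d_negs - gamma).sum() / len(d_negs)


# A relation whose phases all equal pi (r = -1) is symmetric:
# t = h o r implies h = t o r, because r o r = 1.
rng = np.random.default_rng(0)
h = rng.normal(size=4) + 1j * rng.normal(size=4)
phase = np.full(4, np.pi)
t = h * np.exp(1j * phase)
print(rotate_distance(h, phase, t), rotate_distance(t, phase, h))  # both close to 0
print(rotate_loss(0.0, [10.0, 12.0], gamma=6.0))
```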

2.2.1.2 Graph Neural Network Methods Translational methods cannot handle various complex graph structures. For example, TransE (Bordes et al. 2013) cannot handle triangular structures like the one in Fig. 4, because it requires the three equations $v_1+r_a \approx v_2$, $v_2+r_a \approx v_3$ and $v_1+r_a \approx v_3$ to hold at the same time. This is impossible for a non-trivial relation embedding: combining the first two equations gives $v_1+2 r_a \approx v_3$, which, together with the third equation $v_1+r_a \approx v_3$, forces $r_a \approx 0$ and thus $v_1 \approx v_2 \approx v_3$, collapsing the three entities onto the same point.

To cope with such structures, Graph Neural Network (GNN) methods have been proposed. GNNs learn entity embeddings by recursively aggregating the representations of neighboring nodes. They essentially rely on message passing, according to which each graph node recursively receives and aggregates features (node representations) from its neighbors in order to represent the local graph structure.

There is a range of GNN variants that implement different aggregation strategies. In this section, we focus on standard graph convolutional networks (GCNs) (Kipf and Welling 2017) and graph attention networks (GATs) (Velickovic et al. 2018), since they are the core of both RDGCN (Wu et al. 2019) and RREA (Mao et al. 2020b), two of the methods that we evaluate in this study and describe in Sect. 2.3.

GCN (Kipf and Welling 2017) takes as input the randomly initialized entity embeddings of the KG, which is treated as an undirected graph. Then, it learns a set of layer-specific weights, known as filters or kernels, that are multiplied with the input embeddings. In essence, it acts as a sliding window across the KG that learns entity features while preserving useful structural information from the neighborhoods. GCN uses the following function

$$\begin{aligned} H_{i}^{(l+1)}=\sigma \left( \sum _{j \in {\mathcal {N}}_i} \frac{1}{c_{i j}} W^{(l)} H_{j}^{(l)}\right) \end{aligned}$$

(3)

to compute the entity embeddings of layer l+1 from those of layer l, where $\sigma$ is an activation function, ${\mathcal {N}}_i$ is the set of one-hop neighbors of the central entity i (including i itself, through an added self-loop), $c_{ij}$ is a normalization constant that defines an isotropic average computation (each neighbor contributes equally to updating the embedding of the central entity), $W^{(l)}$ is a trainable layer-specific weight matrix for feature transformation, and $H^{(l)}$ are the entity embeddings of layer l. In other words, the new embedding of a central entity is a normalized sum of the transformed embeddings of the entity itself and of its neighbors.
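Eq. (3) can be written as a single matrix product. The sketch below assumes NumPy, a dense adjacency matrix, ReLU as $\sigma$, and $c_{ij} = \sqrt{|\mathcal{N}_i||\mathcal{N}_j|}$, the symmetric normalization used by Kipf and Welling; setting $c_{ij} = |\mathcal{N}_i|$ instead would give a plain mean over the neighborhood. `gcn_layer` is a hypothetical name.

```python
import numpy as np


def gcn_layer(adj, H, W, activation=lambda x: np.maximum(x, 0.0)):
    """One GCN layer, Eq. (3): H_i^(l+1) = sigma(sum_{j in N_i} (1/c_ij) W H_j^(l)).

    adj: (n, n) symmetric 0/1 adjacency matrix (the KG treated as undirected).
    H:   (n, d_in) layer-l embeddings, one row per entity.
    W:   (d_out, d_in) trainable weight matrix of layer l.
    c_ij = sqrt(|N_i| * |N_j|) is assumed (Kipf and Welling's symmetric normalization).
    """
    A_hat = adj + np.eye(adj.shape[0])            # self-loops: i belongs to N_i
    deg = A_hat.sum(axis=1)                       # |N_i|, self-loop included
    norm = A_hat / np.sqrt(np.outer(deg, deg))    # 1/c_ij for j in N_i, 0 otherwise
    return activation(norm @ H @ W.T)             # row i: sum_j (1/c_ij) W H_j


# Path graph 0 - 1 - 2 with one-hot inputs and an identity weight matrix
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
out = gcn_layer(adj, H=np.eye(3), W=np.eye(3))
print(out.round(3))
```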

GAT (Velickovic et al. 2018) extends the aggregation function of GCN with an attention mechanism that assigns a different weight to each neighbor of a central entity. GAT uses the following aggregation function

$$\begin{aligned} H_{i}^{(l+1)}=\sigma \left( \sum _{j \in {\mathcal {N}}_i} \alpha _{i j}^{(l)} z_{j}^{(l)}\right) \end{aligned}$$

(4)

that computes the entity embeddings of layer l+1 from those of layer l, where $\sigma$ is an activation function, ${\mathcal {N}}_i$ is the set of one-hop neighbors of the central entity i, $H^{(l)}$ are the entity embeddings of layer l, $z^{(l)}_{i}$ is the linearly transformed embedding of entity i, and $\alpha _{i j}^{(l)}$ is the normalized attention coefficient of neighbor j. Particularly, $z^{(l)}_{i}$ and $\alpha _{i j}^{(l)}$ are calculated as:

$$\begin{aligned} z_{i}^{(l)}=W^{(l)} H_{i}^{(l)} \end{aligned}$$

(5)

and

$$\begin{aligned} \alpha _{i j}^{(l)}=\frac{\exp \left( e_{i j}^{(l)}\right) }{\sum _{k \in {\mathcal {N}}_i} \exp \left( e_{i k}^{(l)}\right) }, \end{aligned}$$

(6)

where

$$e_{{ij}}^{{(l)}} = {\text{LeakyReLU}}\left( {{\mathbf{\vec{a}}}^{{(l)^{T} }} \left( {z_{i}^{{(l)}} \parallel z_{j}^{{(l)}} } \right)} \right),$$

(7)

${\mathbf{\vec{a}}}^{{(l)^{T} }}$ is a learnable weight vector, LeakyReLU is a variant of the activation function ReLU (Parisi et al. 2022), and $\Vert$ is the concatenation operation.
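Eqs. (4)-(7) translate into a few array operations. This NumPy sketch assumes a single attention head, self-loops in $\mathcal{N}_i$, a LeakyReLU slope of 0.2 and $\tanh$ as $\sigma$ (choices made for illustration only), and `gat_layer` is a hypothetical name. It exploits the fact that $\mathbf{\vec a}^T (z_i \Vert z_j)$ splits into a term for $z_i$ plus a term for $z_j$.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def gat_layer(adj, H, W, a, activation=np.tanh):
    """One GAT layer following Eqs. (4)-(7).

    adj: (n, n) 0/1 adjacency; self-loops are added so that i is in N_i (an assumption).
    H: (n, d_in) layer-l embeddings; W: (d_out, d_in); a: (2 * d_out,) attention vector.
    Returns the new embeddings and the attention matrix alpha (row i: weights over N_i).
    """
    n = adj.shape[0]
    mask = (adj + np.eye(n)) > 0
    Z = H @ W.T                                   # Eq. (5): z_i = W h_i
    d = Z.shape[1]
    # Eq. (7): a^T [z_i || z_j] splits into a[:d] . z_i + a[d:] . z_j
    e = leaky_relu((Z @ a[:d])[:, None] + (Z @ a[d:])[None, :])
    e = np.where(mask, e, -np.inf)                # non-neighbors get zero weight
    e = e - e.max(axis=1, keepdims=True)          # stabilize the softmax
    alpha = np.exp(e)
    alpha /= alpha.sum(axis=1, keepdims=True)     # Eq. (6): softmax over N_i
    return activation(alpha @ Z), alpha           # Eq. (4)


rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H, W, a = rng.normal(size=(3, 4)), rng.normal(size=(2, 4)), rng.normal(size=4)
out, alpha = gat_layer(adj, H, W, a)
print(alpha.round(3))
```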

2.2.2 Alignment module

For the alignment module $S_A$, there exist three techniques: sharing, swapping and mapping (Sun et al. 2020). We describe them below, while we extensively compare them in Sect. 2.5.2.

2.2.2.1 Sharing Sharing aims to iteratively update the already produced entity embeddings, in order to minimize the embedding distance between each entity e and its aligned entity $e'$ from the seed alignment $\delta$:

$$\begin{aligned} S_{A}=\sum _{\left( e, e'\right) \in \delta \left( K G_{i}, K G_{j}\right) } {\mid \mid } {\textbf{e}} - \mathbf {e'}{\mid \mid }. \end{aligned}$$

(8)

In Fig. 5, we show the entity embeddings of $KG_1$ and $KG_2$ in the L1 and L2 embedding spaces, respectively, as well as the updates of the embeddings of the seed-alignment entities that minimize their embedding distance. For simplicity, we use only part of the seed alignment, so only the blue and the orange entities are considered aligned. Thus, assuming that aligned entities of two different KGs have similar spatial positions, this technique adjusts the axes of the two embedding spaces so that the vectors of the same entity in the two KGs overlap. It is worth mentioning that we start from two KGs encoded in two different embedding spaces (the output of the embedding module) and end up with two KGs encoded in a unified embedding space.
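The sharing objective can be illustrated with plain gradient descent. As an assumption of this sketch, it minimizes the squared distance $\Vert \mathbf{e} - \mathbf{e'} \Vert^2$ instead of the distance of Eq. (8), because its gradient is smooth, and it moves both sides of each hypothetical seed pair; entities outside the seed alignment are left untouched, as in Fig. 5. The function names are hypothetical.

```python
import numpy as np


def sharing_loss(emb1, emb2, seed):
    """Eq. (8): sum of ||e - e'|| over seed pairs (i in KG_1, j in KG_2)."""
    return sum(np.linalg.norm(emb1[i] - emb2[j]) for i, j in seed)


def sharing_step(emb1, emb2, seed, lr=0.1):
    """One gradient step on sum ||e - e'||^2 (squared variant of Eq. (8), an assumption).

    The gradient with respect to e is 2 (e - e'); only seed entities are updated.
    """
    emb1, emb2 = emb1.copy(), emb2.copy()
    for i, j in seed:
        grad = 2.0 * (emb1[i] - emb2[j])
        emb1[i] -= lr * grad
        emb2[j] += lr * grad
    return emb1, emb2


rng = np.random.default_rng(0)
emb1, emb2 = rng.normal(size=(4, 2)), rng.normal(size=(4, 2))
seed = [(0, 1), (2, 3)]          # hypothetical seed pairs ("blue" and "orange")
start = sharing_loss(emb1, emb2, seed)
new1, new2 = emb1, emb2
for _ in range(20):
    new1, new2 = sharing_step(new1, new2, seed)
print(start, sharing_loss(new1, new2, seed))
```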

2.2.2.2 Swapping Swapping is a variation of sharing that produces extra positive edges while preserving the same objective as sharing. For instance, given two aligned entity pairs $(h,h') \in \delta (KG_1,KG_2)$ and $(t,t') \in \delta (KG_1,KG_2)$ and a relation edge (h, r, t) of $KG_1$, swapping produces two new positive edges $(h^{\prime },r,t)$ and $(h,r,t^{\prime })$ and feeds them to the KG embedding model (embedding module) as positive relation edges, in order to increase the training data, which benefits the quality of the embeddings, as we describe in Sects. 2.5.1.2 and 2.5.6. Swapping does not introduce a new loss function.
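The edge generation of swapping is a simple set operation. In this sketch (hypothetical names), the seed alignment is a dictionary from $KG_1$ entities to their $KG_2$ counterparts; for each relation edge, a swapped copy is produced for every aligned endpoint, which yields exactly the two new edges of the example above when both h and t are aligned.

```python
def swap_edges(edges, seed):
    """Generate extra positive edges by swapping aligned entities.

    edges: iterable of (h, r, t) relation edges of one KG.
    seed:  dict mapping an entity to its aligned counterpart in the other KG.
    For (h, r, t), emits (h', r, t) if h is aligned and (h, r, t') if t is aligned.
    """
    new_edges = set()
    for h, r, t in edges:
        if h in seed:
            new_edges.add((seed[h], r, t))
        if t in seed:
            new_edges.add((h, r, seed[t]))
    return new_edges


# Fig. 1: (v1, directed, v2) with seed pairs (v1, v5) and (v2, v6)
print(sorted(swap_edges([("v1", "directed", "v2")], {"v1": "v5", "v2": "v6"})))
```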

2.2.2.3 Mapping Mapping aims to learn a matrix M as a linear transformation on entity vectors from $L_i$ to $L_j$, in order to minimize the embedding distance of each linearly transformed entity e and its aligned entity $e'$ from the seed alignment $\delta$:

$$\begin{aligned} S_{A}=\sum _{\left( e, e^{\prime }\right) \in \delta \left( K G_{i}, K G_{j}\right) } {\mid \mid }{\textbf{M}}_{i j} {\textbf{e}}-\mathbf {e'}{\mid \mid }. \end{aligned}$$

(9)

In Fig. 6, we show the entity embeddings of $KG_1$ and $KG_2$ in the L1 and L2 embedding spaces, respectively, as well as the process in which we learn the matrix $M_{ij}$ that linearly transforms entities from L1 to L2. During this process, the linearly transformed entities of $KG_1$ should be close to their aligned entities of $KG_2$ according to the seed alignment. For simplicity, we use only part of the seed alignment, so only the blue and the orange entities are considered aligned. In contrast to sharing and swapping, mapping learns the correspondence between the two embedding spaces (i.e., the linear transformation from $L_1$ to $L_2$) without assuming that aligned entities occupy similar positions in the two spaces. More precisely, it does not force the vectors of aligned entities to overlap; instead, it treats the learned mapping as a topological transformation (one-to-one correspondence) from $L_1$ to $L_2$, keeping the two KGs encoded in two different embedding spaces.
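For illustration, a mapping matrix can be estimated in closed form. This NumPy sketch replaces the norm of Eq. (9) with its square, so that M is the ordinary least-squares solution over the seed pairs, rather than being trained jointly with the embeddings; the data are synthetic and `learn_mapping` is a hypothetical name.

```python
import numpy as np


def learn_mapping(src, tgt):
    """Least-squares M minimizing sum_k ||M src_k - tgt_k||^2 over seed pairs.

    src, tgt: (n_seed, d) embeddings of aligned entities in L_i and L_j.
    Solves src @ M.T ~= tgt in closed form; a squared-loss stand-in for Eq. (9).
    """
    M_T, *_ = np.linalg.lstsq(src, tgt, rcond=None)
    return M_T.T


rng = np.random.default_rng(0)
d = 4
M_true = rng.normal(size=(d, d))        # hidden transformation L_i -> L_j
src = rng.normal(size=(20, d))          # 20 hypothetical seed entities in L_i
tgt = src @ M_true.T                    # their aligned counterparts in L_j
M = learn_mapping(src, tgt)

new_entity = rng.normal(size=d)         # an entity outside the seed alignment
print(np.allclose(M @ new_entity, M_true @ new_entity))
```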

2.3 Knowledge graph embeddings using relations

In this section, we discuss relation-based KG embedding methods, all of which are supervised. These methods use only the structural information (relation edges) for learning the entity embeddings.

MTransE (Chen et al. 2017) is a translation-based model for multilingual KG embeddings, but it is also applicable to general-purpose KGs, capturing their structure. The objective is to minimize the loss function

$$\begin{aligned} J = S_{K} + \alpha S_{A}, \end{aligned}$$

(10)

where $S_K$ is the loss function of the embedding module, $S_A$ is the loss function of the alignment module, and $\alpha$ is a factor that weights $S_A$ against $S_K$. As the loss function $S_K$ of the embedding module, MTransE uses a simplified version of TransE (Eq. 1) in which no negative relation edges are considered, while as the loss function $S_A$ of the alignment module it uses mapping (Eq. 9).
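Eq. (10) can be assembled from Eqs. (1) and (9). The sketch below computes the MTransE objective for given embeddings, following the description above: TransE without negative edges for $S_K$ and mapping for $S_A$. Summing $S_K$ over both KGs, the toy data and the function name are assumptions.

```python
import numpy as np


def mtranse_objective(ent1, rel1, X1, ent2, rel2, X2, M, seed, alpha=1.0):
    """J = S_K + alpha * S_A (Eq. 10).

    S_K: TransE scores ||h + r - t|| of positive edges only (no negatives),
         summed over both KGs (an assumption of this sketch).
    S_A: mapping loss of Eq. (9), ||M e - e'|| over seed pairs (i in KG_1, j in KG_2).
    X1, X2: integer arrays of (head, relation, tail) ids.
    """
    def s_k(ent, rel, X):
        return np.linalg.norm(ent[X[:, 0]] + rel[X[:, 1]] - ent[X[:, 2]], axis=1).sum()

    s_a = sum(np.linalg.norm(M @ ent1[i] - ent2[j]) for i, j in seed)
    return s_k(ent1, rel1, X1) + s_k(ent2, rel2, X2) + alpha * s_a


# Toy check: KG_2 is KG_1 rotated by 90 degrees, and M is that rotation
M = np.array([[0.0, -1.0], [1.0, 0.0]])
ent1, rel1 = np.array([[0.0, 0.0], [1.0, 0.0]]), np.array([[1.0, 0.0]])
ent2, rel2 = ent1 @ M.T, rel1 @ M.T
X = np.array([[0, 0, 1]])                # entity 0 --r0--> entity 1 in both KGs
seed = [(0, 0), (1, 1)]
print(mtranse_objective(ent1, rel1, X, ent2, rel2, X, M, seed))  # 0.0
```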

MTransE+RotatE (Sun et al. 2020) is a variation of MTransE that uses RotatE (Eq. 2) as $S_K$ in Eq. 10, and sharing (Eq. 8) as $S_A$, instead of TransE and mapping, respectively.

RDGCN (Wu et al. 2019) leverages GCNs (described in Sect. 2.2.1.2) to incorporate structural information in the entity embeddings. Particularly, given $KG_1$ and $KG_2$, RDGCN constructs a primal (entity) graph $G^e$ by merging $KG_1$ and $KG_2$, and its dual (relation) graph $G^r$, by creating a node in $G^r$ for every relation type of $G^e$ and connecting two nodes in $G^r$ if the corresponding relations in $G^e$ share the same head or tail entities.
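The construction of the dual relation graph can be sketched as follows. This is our reading of the rule stated above, with hypothetical names: two relations are connected when their head sets or their tail sets overlap, and any edge weights that RDGCN may assign are omitted.

```python
from collections import defaultdict
from itertools import combinations


def dual_relation_graph(edges):
    """Build the dual (relation) graph G^r of a merged primal graph G^e.

    edges: (h, r, t) relation edges of KG_1 and KG_2 merged.
    Nodes are relation types; r1 and r2 are connected if they share a head entity
    or share a tail entity (our reading of the rule; edge weights are omitted).
    """
    heads, tails = defaultdict(set), defaultdict(set)
    for h, r, t in edges:
        heads[r].add(h)
        tails[r].add(t)
    dual_edges = {
        (r1, r2)
        for r1, r2 in combinations(sorted(heads), 2)
        if heads[r1] & heads[r2] or tails[r1] & tails[r2]
    }
    return set(heads), dual_edges


# X_2 of Fig. 1 plus one hypothetical, unrelated edge
nodes, dual = dual_relation_graph([
    ("v5", "directed", "v6"), ("v8", "actedIn", "v6"), ("v8", "actedIn", "v7"),
    ("v9", "wrote", "v10"),
])
print(sorted(nodes), dual)
```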

Then, it uses a graph attention mechanism (a dual attention layer that assigns different importance to each neighbor’s contribution) to make $G^e$ and $G^r$ interact, so that the resulting entity representations in $G^e$ capture relation information; these representations are then fed to a GCN that captures the structure of the neighborhood (Eq. 3). The resulting entity embeddings are refined using the mapping alignment technique (Sect. 2.2.2.3). The loss function that RDGCN aims to minimize is

$$\begin{aligned} L=\sum _{(e_i, e_j) \in \delta , (e_i', e_j') \notin \delta } \text {max}\left( 0, d(\mathbf {e_i}, \mathbf {e_j}) - d\left( \mathbf {e_i'}, \mathbf {e_j'}\right) +\gamma \right) , \end{aligned}$$

(11)

where $(e_i, e_j)$ are entity pairs from the seed alignment $\delta$, $(e_i', e_j')$ are negative samples generated by replacing $e_i$ or $e_j$ with a random entity, and d is the embedding distance function used in mapping (Sect. 2.2.2.3).
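A minimal sketch of the loss in Eq. 11 follows. It assumes Euclidean distance for $d$ and draws one negative pair per seed pair by replacing $e_i$ or $e_j$ with a random candidate entity; the candidate list and the fair-coin choice of which side to corrupt are illustrative assumptions.

```python
import math
import random

def euclidean(u, v):
    """Assumed embedding distance d (Euclidean); the actual mapping distance may differ."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def margin_alignment_loss(seed, emb, candidates, gamma, rng):
    """Eq. (11)-style hinge loss over seed pairs.
    For each positive pair (e_i, e_j), one side is replaced by a random candidate
    to form the negative pair (e_i', e_j')."""
    loss = 0.0
    for ei, ej in seed:
        corrupt = rng.choice(candidates)
        neg_i, neg_j = (corrupt, ej) if rng.random() < 0.5 else (ei, corrupt)
        loss += max(0.0, euclidean(emb[ei], emb[ej])
                    - euclidean(emb[neg_i], emb[neg_j]) + gamma)
    return loss
```

A pair contributes nothing once its negative counterpart is at least $\gamma$ farther apart than the positive pair.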

RREA (Mao et al. 2020b) integrates GCNs and GATs (described in Sect. 2.2.1) with a Relational Reflection Transformation, in order to obtain relation-specific embeddings for KG entities. In contrast to standard GCNs and GATs, this transformation uses a matrix that is constrained to be orthogonal, so that it reflects entity embeddings across different relational hyperplanes. Because the matrix is orthogonal, the norms of the entity embeddings and the relative distances between entities remain unchanged in the relational space.
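One standard way to obtain an orthogonal matrix that reflects vectors across a hyperplane is the Householder form $\mathbf{M}_r = \mathbf{I} - 2\mathbf{h}_r\mathbf{h}_r^{\top}$ with a unit vector $\mathbf{h}_r$, which, to our understanding, is how RREA parameterizes its relational reflection. The sketch below checks the property stated above: reflected embeddings keep their norms and pairwise distances. Function names are hypothetical.

```python
import math

def reflection_matrix(r):
    """Householder reflection M = I - 2 u u^T, with u = r / ||r|| (so M is orthogonal)."""
    n = math.sqrt(sum(x * x for x in r))
    u = [x / n for x in r]
    d = len(u)
    return [[(1.0 if i == j else 0.0) - 2.0 * u[i] * u[j] for j in range(d)]
            for i in range(d)]

def matvec(M, v):
    """Multiply a matrix M (list of rows) by a vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def norm(v):
    return math.sqrt(sum(x * x for x in v))
```

Applying the same reflection twice returns the original vector, which is another way to see that no information is lost by the transformation.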

More precisely, RREA stacks multiple GNN layers to capture and aggregate multi-hop neighborhood information for each entity. The embedding of entity $e_i$ at layer $l+1$ is obtained from the embeddings of layer $l$ as follows:

$$\begin{aligned} {\varvec{h}}_{e_{i}}^{l+1}={\text {ReLU}}\left( \sum _{e_{j} \in {\mathcal {N}}_{e_{i}}^{e}} \sum _{r_{k} \in R_{i j}} \alpha _{i j k}^{l} {\varvec{M}}_{r_{k}} {\varvec{h}}_{e_{j}}^{l}\right) , \end{aligned}$$

(12)

where ReLU (Parisi et al. 2022) is an activation function, ${\mathcal {N}}^{e}_{e_i}$ is the set of neighboring entities of $e_i$, $R_{ij}$ denotes the set of relations between $e_i$ and $e_j$, $M_{r_{k}}$ is the relational reflection matrix of $r_k$, and $\alpha _{i j k}^{l}$ is a weight coefficient of $M_{r_{k}}$ (similar to GAT). The final entity embedding is the concatenation of the embeddings from all layers. In addition, to include relational information around entities, RREA concatenates the sum of the relation embeddings with the entity embeddings to obtain dual-aspect embeddings. The resulting entity embeddings are refined using the sharing alignment technique (Sect. 2.2.2.1). The loss function that RREA aims to minimize is the following:

$$\begin{aligned} L=\sum _{\left( e_{i}, e_{j}\right) \in P} \max \left( {\text {dist}}\left( e_{i}, e_{j}\right) -{\text {dist}}\left( e_{i}^{\prime }, e_{j}^{\prime }\right) +\lambda , 0\right) , \end{aligned}$$

(13)

where $P$ is the set of seed-aligned entity pairs, $e'_i$ and $e'_j$ form the negative pair of $e_i$ and $e_j$, generated using truncated uniform negative sampling (Sun et al. 2018; Zhu et al. 2019; Cao et al. 2019), and dist is the embedding distance function used in sharing.
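The layer update of Eq. 12 and the concatenation of layer outputs can be sketched as below. For simplicity, the weights $\alpha_{ijk}$ are given as fixed inputs and shared across layers (in RREA they are computed per layer), the input embeddings are included as the first block of the concatenation, and the dual-aspect relation part is omitted; all names are hypothetical.

```python
def matvec(M, v):
    """Multiply a matrix M (list of rows) by a vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def relu(v):
    return [max(0.0, x) for x in v]

def rrea_layer(e_i, neighbors, h, refl, att):
    """One step of Eq. (12) for entity e_i.
    neighbors[e_i]: list of (e_j, r_k) pairs; refl[r_k]: reflection matrix M_rk;
    att[(e_i, e_j, r_k)]: precomputed weight alpha_ijk (assumed given)."""
    acc = [0.0] * len(h[e_i])
    for e_j, r_k in neighbors.get(e_i, []):
        msg = matvec(refl[r_k], h[e_j])          # reflect the neighbor embedding
        w = att[(e_i, e_j, r_k)]
        acc = [a + w * m for a, m in zip(acc, msg)]
    return relu(acc)

def rrea_embed(entities, neighbors, h0, refl, att, num_layers):
    """Stack num_layers layers and concatenate the input and every layer's output."""
    layers = [h0]
    for _ in range(num_layers):
        prev = layers[-1]
        layers.append({e: rrea_layer(e, neighbors, prev, refl, att) for e in entities})
    return {e: [x for layer in layers for x in layer[e]] for e in entities}
```

Stacking more layers lets information from entities several hops away reach $e_i$, while the concatenation keeps the representation of every hop distance available to the alignment module.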

The methodology described above refers to the basic version of RREA, RREA(basic). RREA also comes with a semi-supervised version, RREA(semi), which iteratively proposes likely aligned entity pairs to enrich the training set. Following Mao et al. (2020a), an entity pair $(e_i,e_j)$ is proposed as aligned if $e_i$ and $e_j$ are mutual nearest neighbors, i.e., each is the other's nearest entity in the embedding space.
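The mutual-nearest-neighbor criterion used by RREA(semi) can be sketched over a similarity matrix between KG1 and KG2 entities. This is an illustrative sketch: how the similarities are computed, and whether already-aligned entities are excluded from the candidates, are not modeled here.

```python
def mutual_nearest_pairs(sim):
    """Propose (i, j) when j is i's most similar KG2 entity and i is j's most
    similar KG1 entity. sim[i][j]: similarity of KG1 entity i and KG2 entity j."""
    n_src, n_tgt = len(sim), len(sim[0])
    best_tgt = [max(range(n_tgt), key=lambda j: sim[i][j]) for i in range(n_src)]
    best_src = [max(range(n_src), key=lambda i: sim[i][j]) for j in range(n_tgt)]
    return [(i, best_tgt[i]) for i in range(n_src) if best_src[best_tgt[i]] == i]
```

Requiring agreement in both directions is stricter than a one-sided nearest-neighbor rule, which reduces the number of wrong pairs added to the training set.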

2.4 Knowledge graph embeddings using attributes

In this section, we focus on attribute-based KG embedding methods. These methods use not only the structural information of the KGs (relation edges) but also the attribute values (literals) to learn the entity embeddings. In addition, in many methods the attribute embeddings help enrich the seed alignment or even refine the entity embeddings. Depending on how they use the seed alignment, we categorize the attribute-based methods as supervised, semi-supervised, and unsupervised.

2.4.1 Supervised

MultiKE (Zhang et al. 2019) first constructs the embedding of each literal $l$, consisting of the words $o_1, \ldots, o_n$, as

$$\begin{aligned} \phi (l)={\text {encode}}\left( \left[ {\text {LP}}\left( o_{1}\right) ; {\text {LP}}\left( o_{2}\right) ; \ldots ; \textrm{LP}\left( o_{n}\right) \right] \right) , \end{aligned}$$

(14)

where $\mathrm{LP}(o_i)$ is the pre-trained word embedding of word $o_i$, $\mathrm{encode}(\cdot )$ is an encoder that compresses the concatenated embeddings, and $[\,;\,]$ is the concatenation operation. If a word $o_i$ is out of vocabulary (i.e., it has no pre-trained embedding), MultiKE builds its embedding from pre-trained character embeddings. Then, it learns entity embeddings by exploring three different views: the name view $\Theta ^{(1)}$, the relation view $\Theta ^{(2)}$, and the attribute view $\Theta ^{(3)}$.
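A sketch of Eq. 14 under explicit assumptions: literals are padded or truncated to a fixed number of words, an out-of-vocabulary word is represented by the mean of its character vectors, and $\mathrm{encode}(\cdot)$ is replaced by a hypothetical linear projection `W` standing in for MultiKE's trained encoder.

```python
def literal_embedding(words, word_vecs, char_vecs, max_words, W):
    """Sketch of Eq. (14).
    - In-vocabulary words use their pre-trained vector LP(o_i).
    - OOV words fall back to the mean of their character vectors (our assumption).
    - Vectors are padded/truncated to max_words, concatenated, and compressed by W,
      a hypothetical linear stand-in for MultiKE's trained encoder."""
    dim = len(next(iter(word_vecs.values())))
    pieces = []
    for w in words[:max_words]:
        if w in word_vecs:
            pieces.append(word_vecs[w])
        else:
            chars = [char_vecs[c] for c in w if c in char_vecs] or [[0.0] * dim]
            pieces.append([sum(col) / len(chars) for col in zip(*chars)])
    pieces += [[0.0] * dim] * (max_words - len(pieces))   # pad to a fixed length
    flat = [x for p in pieces for x in p]                  # the [ ; ] concatenation
    return [sum(wij * xj for wij, xj in zip(row, flat)) for row in W]
```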

Given an entity e, its name view ($\Theta ^{(1)}$) is defined as

$$\begin{aligned} {\textbf{e}}^{(1)}=\phi ({\text {name}}(e)), \end{aligned}$$

(15)

where $name(\cdot )$ is the name of the entity.

For the relation view $\Theta ^{(2)}$, it adopts TransE to learn the entity embeddings of the two KGs, minimizing the following loss function

$$\begin{aligned} {\mathcal {L}}\left( \Theta ^{(2)}\right) =\sum _{(h, r, t) \in X^+ \cup X^-} \log \left( 1+\exp \left( -\zeta _{(h, r, t)} f_{\textrm{rel}}({\textbf{h}}, {\textbf{r}}, {\textbf{t}})\right) \right) , \end{aligned}$$

(16)

where $X^+ = X_1 \cup X_2$ are the relation edges that exist in the two KGs, $X^-$ are relation edges that do not exist in the two KGs (negative relation edges, created as in TransE), $f_{\textrm{rel}}$ is the scoring function of TransE (Eq. 1), and $\zeta _{(h, r, t)} \in \{-1,1\}$ denotes whether (h, r, t) is a positive or a negative edge.
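The loss of Eq. 16 can be written as below, assuming that $f_{\textrm{rel}}$ is used as a plausibility score, i.e., the negated TransE distance $-\|\mathbf{h}+\mathbf{r}-\mathbf{t}\|$, so that higher means more plausible. With this sign convention, positive edges ($\zeta=1$) are pushed toward high scores and negative edges ($\zeta=-1$) toward low ones. Names and toy vectors are hypothetical.

```python
import math

def transe_plausibility(h, r, t):
    """Assumed f_rel: -||h + r - t|| (higher means more plausible)."""
    return -math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(h, r, t)))

def logistic_loss(edges, ent, rel):
    """Eq. (16): sum of log(1 + exp(-zeta * f_rel(h, r, t))) over labeled edges,
    with zeta = +1 for positive and -1 for negative relation edges."""
    total = 0.0
    for h, r, t, zeta in edges:
        score = transe_plausibility(ent[h], rel[r], ent[t])
        total += math.log1p(math.exp(-zeta * score))
    return total
```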

For the attribute view $\Theta ^{(3)}$, again it uses TransE to learn the embeddings exploiting the attributes and their values, aiming to minimize the loss function

$$\begin{aligned} {\mathcal {L}}\left( \Theta ^{(3)}\right) =\sum _{(h, a, l) \in Y^+} \log \left( 1+\exp \left( -f_{\text{ attr } }\left( {\textbf{h}}^{(3)}, {\textbf{a}}, {\textbf{l}}\right) \right) \right) , \end{aligned}$$

(17)

where $Y^+ = Y_1 \cup Y_2$ are the attribute edges of the two KGs and $f_{\text{ attr }}\left( {\textbf{h}}^{(3)}, {\textbf{a}}, {\textbf{l}}\right) = - \mid {\textbf{h}}^{(3)} - \textbf{CNN}(\langle {\textbf{a}} ; {\textbf{l}}\rangle ) \mid$, which uses a convolutional neural network (CNN) representation of an attribute $a$ and its literal value $l$, computed as follows:

$$\begin{aligned} \textbf{CNN}(\langle {\textbf{a}} ; {\textbf{l}}\rangle )=\sigma ({\text {vec}}(\sigma (\langle {\textbf{a}} ; {\textbf{l}}\rangle * \Omega )) {\textbf{W}}), \end{aligned}$$

(18)

where $\langle \,;\, \rangle$ denotes the concatenation operation, ${\textbf{a}}$ is the embedding of an attribute $a$, ${\textbf{l}}$ is the embedding of a literal $l$, $\Omega$ is the CNN kernel, $\sigma$ is the activation function, and ${\textbf{W}}$ is a trainable weight matrix.
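To illustrate Eq. 18, the sketch below uses a single $2 \times k$ filter $\Omega$ slid along the embedding dimension of the stacked matrix $\langle \mathbf{a}; \mathbf{l}\rangle$, the logistic sigmoid for $\sigma$, and a projection `W` of shape $(d-k+1) \times d_{out}$. The single filter and the choice of sigmoid are simplifying assumptions, not MultiKE's configuration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def attr_cnn(a, l, kernel, W):
    """Sketch of Eq. (18) with one 2 x k filter (kernel plays the role of Omega).
    <a; l> is the 2 x d matrix with rows a and l; the filter slides along the
    embedding dimension, sigma is applied, the feature map is flattened (vec),
    multiplied by W of shape (d - k + 1) x d_out, and sigma is applied again."""
    k = len(kernel[0])
    feats = [sigmoid(sum(kernel[0][m] * a[s + m] + kernel[1][m] * l[s + m] for m in range(k)))
             for s in range(len(a) - k + 1)]
    return [sigmoid(sum(f * W[i][j] for i, f in enumerate(feats))) for j in range(len(W[0]))]

def f_attr(h3, a, l, kernel, W):
    """f_attr = -||h^(3) - CNN(<a; l>)||: higher means a more plausible attribute edge."""
    c = attr_cnn(a, l, kernel, W)
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(h3, c)))
```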

For refining the entity embeddings of the relation and the attribute views, MultiKE minimizes the following two loss functions, respectively:

$$\begin{aligned} \begin{aligned}&{\mathcal {L}}_{\textrm{CE}}\left( \Theta ^{(2)}\right) =\sum _{(h, r, t) \in {\mathcal {X}}'} \log \left( 1+\exp \left( -f_{\textrm{rel}}\left( \hat{{\textbf{h}}}^{(2)}, {\textbf{r}}, {\textbf{t}}^{(2)}\right) \right) \right) \\&\quad\quad\quad +\sum _{(h, r, t) \in {\mathcal {X}}''} \log \left( 1+\exp \left( -f_{\textrm{rel}}\left( {\textbf{h}}^{(2)}, {\textbf{r}}, \hat{{\textbf{t}}}^{(2)}\right) \right) \right) \end{aligned} \end{aligned}$$

(19)

and

$$\begin{aligned} {\mathcal {L}}_{\textrm{CE}}\left( \Theta ^{(3)}\right) =\sum _{(h, a, l) \in {\mathcal {Y}}'} \log \left( 1+\exp \left( -f_{\textrm{attr}}\left( \hat{{\textbf{h}}}^{(3)}, {\textbf{a}}, {\textbf{l}}\right) \right) \right) , \end{aligned}$$

(20)

where $(h, {\hat{h}})$ and $(t, {\hat{t}})$ are entity pairs in the seed alignment, ${\mathcal {X}}'$ and ${\mathcal {X}}''$ are the relation edges whose head and tail entities, respectively, are in the seed alignment, and ${\mathcal {Y}}'$ are the attribute edges whose head entities are in the seed alignment.

For the entity alignment (alignment module), MultiKE produces a set of aligned relations $S_{rel}$ and a set of aligned attributes $S_{attr}$ that are used to minimize the following loss function (cross-KG relation inference):

$$\begin{aligned} {\mathcal {L}}_{\textrm{CRA}}\left( \Theta ^{(2)}\right) =\sum _{(h, r, t) \in {\mathcal {X}}^{\prime \prime \prime }} {\text {sim}}(r, {\hat{r}}) \log \left( 1+\exp \left( -f_{\textrm{rel}}\left( {\textbf{h}}^{(2)}, \hat{{\textbf{r}}}, {\textbf{t}}^{(2)}\right) \right) \right) , \end{aligned}$$

(21)

where ${\mathcal {X}}'''$ are the relation edges whose relations are in $S_{rel}$ and

$$\begin{aligned} {\text {sim}}(r, {\hat{r}})=\alpha _{1} \text { cosine} (\phi ({\text {name}}(r)), \phi ({\text {name}}({\hat{r}})))+ (1-\alpha _{1}) \text { cosine} ({\textbf{r}}, \hat{{\textbf{r}}}) \end{aligned}$$

(22)

is the similarity measure used to decide whether two relations are aligned, combining their name similarity (from literal embeddings) and their semantic similarity (from relation embeddings), weighted by $\alpha_1$.
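Eq. 22 is a convex combination of two cosine similarities; a direct transcription follows. The argument names are hypothetical: the first pair of vectors would come from the literal encoder $\phi$ applied to the relation names, the second pair from the relation embeddings.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def relation_similarity(name_r, name_r_hat, r, r_hat, alpha1):
    """Eq. (22): alpha1 * cos(phi(name(r)), phi(name(r_hat)))
    + (1 - alpha1) * cos(r, r_hat)."""
    return alpha1 * cosine(name_r, name_r_hat) + (1 - alpha1) * cosine(r, r_hat)
```

For instance, with $\alpha_1 = 0.6$, identical name embeddings and orthogonal relation embeddings give a similarity of 0.6.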

Finally, MultiKE jointly learns the final entity embeddings from the different views in a unified embedding space, by minimizing the loss function

$$\begin{aligned} {\mathcal {L}}_{\text{ ITC } }({\tilde{\textbf{H}}}, {\textbf{H}})=\sum _{i=1}^{D} \mid \mid {\tilde{\textbf{H}}}-{\textbf{H}}^{(i)}\mid \mid , \end{aligned}$$

(23)

where ${\tilde{\textbf{H}}}=\bigcup _{i=1}^{D} {\textbf{H}}^{(i)}$, ${\textbf{H}}^{(i)}$ is the entity embedding of the $i$-th view, and $D$ is the number of views. For entity alignment, it uses swapping (Sect. 2.2.2.2).

BERT_INT (Tang et al. 2020) uses a well-known language model, BERT (Devlin et al. 2019), to embed entities based on their factual information (e.g., descriptions, names), together with an interaction model that computes interactions between entities instead of aggregating their neighbors, since such aggregation in many cases causes noisy matches.

More precisely, for each entity e, it applies a pre-trained basic BERT unit that accepts the factual information as input, aiming to minimize the following loss function

$$\begin{aligned} {\mathcal {L}}=\sum _{\left( e, e^{\prime +}\right) \in {\mathcal {D}}} \max \left\{ 0, g\left( e, e^{\prime +}\right) -g\left( e, e^{\prime -}\right) +m\right\} \end{aligned}$$

(24)

to fine-tune BERT. Here, ${\mathcal {D}}$ is the seed alignment, $e^{\prime +}$ is the correctly aligned counterpart of $e$ known from the seed alignment, $e^{\prime -}$ is a negative entity from the other KG selected by truncated uniform negative sampling (Sun et al. 2018), $m$ is the margin, and $g$ is the $\ell_1$ distance between the embeddings $C(e)$ and $C(e^{\prime })$ produced by the BERT unit.

The interaction model consists of name/description-view, neighbor-view, and attribute-view interactions. First, for the name/description interaction, it computes the cosine similarity of the embeddings generated by BERT. Then, the neighbor-view interaction compares the names/descriptions of each neighbor pair (also considering their neighboring relations and multi-hop neighbors), producing a similarity matrix. The similarity matrix is then processed by a dual aggregation function to extract similarity vectors, i.e., the entity embeddings. Afterwards, rather than learning entity embeddings by aggregating their attributes, it compares each attribute pair and, as in the neighbor view, learns attribute similarity vectors. Finally, a unified dual aggregation function is applied to extract features from the neighbor-view and attribute-view interactions and generate the final entity embeddings.
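The dual aggregation step can be pictured with the sketch below, which is our reading of the general idea rather than BERT_INT's exact formulation: the neighbor similarity matrix is max-pooled along rows and along columns (the two directions), and each direction is summarized by Gaussian kernel features. The kernel means, the width, and the small epsilon inside the logarithm are hypothetical choices.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def kernel_pool(max_sims, mus, sigma):
    """Gaussian-kernel soft counts of the max-pooled similarities, one feature per
    kernel mean mu; the epsilon only guards against log(0)."""
    return [math.log(sum(math.exp(-(s - mu) ** 2 / (2 * sigma ** 2)) for s in max_sims) + 1e-10)
            for mu in mus]

def dual_aggregation(neigh_a, neigh_b, mus, sigma):
    """Compare every neighbor of entity A with every neighbor of entity B,
    max-pool the similarity matrix in both directions (rows and columns), and
    turn each direction into kernel features; their concatenation is the
    neighbor-view similarity vector."""
    sim = [[cosine(x, y) for y in neigh_b] for x in neigh_a]
    row_max = [max(row) for row in sim]
    col_max = [max(col) for col in zip(*sim)]
    return kernel_pool(row_max, mus, sigma) + kernel_pool(col_max, mus, sigma)
```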

2.4.2 Semi-supervised

KDCoE (Chen et al. 2018) leverages a weakly aligned KG for semi-supervised entity alignment using long textual descriptions of entities (typically a couple of sentences). Given a small seed alignment, it iteratively co-trains two embedding models: one on the structure of the KG (KGEM) and another on the textual descriptions of the entities (DEM). In each iteration, the two models alternately propose a new set of aligned entity pairs to enrich the seed alignment. The process runs until one of the models has no entity pair to propose.
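The alternating co-training loop can be sketched as follows. Each proposer is a hypothetical callable standing in for KGEM or DEM: it is assumed to retrain its model on the current alignment and return candidate pairs. The loop stops as soon as one model has nothing new to propose; `max_rounds` is an added safety bound, not part of the description above.

```python
def co_train(seed, proposers, max_rounds=100):
    """Alternating co-training loop (sketch).
    Each proposer (re)trains on the current alignment and returns candidate
    aligned pairs; stop as soon as one model has nothing new to propose."""
    aligned = set(seed)
    for _ in range(max_rounds):
        for propose in proposers:
            new_pairs = set(propose(aligned)) - aligned
            if not new_pairs:
                return aligned
            aligned |= new_pairs
    return aligned
```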

KGEM is essentially MTransE (Eq. 10), using TransE (Eq. 1) as the embedding module (including negative samples) and mapping (Eq. 9) as the alignment module. After training, if the embedding distance between an entity $e$ and its closest entity ${\hat{e}}'$ (based on the distance function of mapping) is lower than a threshold, the pair $(e,{\hat{e}}^{\prime })$ is proposed as aligned. DEM uses an encoder that processes each textual description as a sequence of pre-trained word embedding vectors and learns the description embedding ${\textbf{d}}_e$. The learning objective of DEM is to maximize the log-likelihood of each seed-aligned pair, i.e., of an entity $e$ and its counterpart $e'$, in terms of their description embeddings, by minimizing the following loss function:

$$\begin{aligned} S_{D} =\sum _{\left( e, e^{\prime }\right) \in \delta }-L L_{1}-L L_{2} =\sum _{\left( e, e^{\prime }\right) \in \delta }-\log \left( P\left( e \mid e^{\prime }\right) \right) -\log \left( P\left( e^{\prime } \mid e\right) \right) \end{aligned}$$

(25)

where $\delta$ is the seed alignment and

$$\begin{aligned}{} & {} L L_{1} =\log \sigma \left( {\textbf{d}}_{e}^{\top } {\textbf{d}}_{e^{\prime }}\right) +\sum _{k=1}^{\mid B_{d}\mid } {\mathbb {E}}_{e_{k} \sim U\left( e_{k} \in E_{L_{i}}\right) }\left[ \log \sigma \left( -{\textbf{d}}_{e_{k}}^{\top } {\textbf{d}}_{e^{\prime }}\right) \right] \end{aligned}$$

(26)

$$\begin{aligned}{} & {} L L_{2} =\log \sigma \left( {\textbf{d}}_{e}^{\top } {\textbf{d}}_{e^{\prime }}\right) +\sum _{k=1}^{\mid B_{d}\mid } {\mathbb {E}}_{e_{k} \sim \textrm{U}\left( e_{k} \in E_{L_{j}}\right) }\left[ \log \sigma \left( -{\textbf{d}}_{e}^{\top } {\textbf{d}}_{e_{k}}\right) \right] , \end{aligned}$$

(27)

where ${\textbf{d}}_e$ and ${\textbf{d}}_{e'}$ are the description embeddings of the two aligned entities $e$ and $e'$, the unrelated entities $e_k$ are drawn from a uniform distribution $U$, and $\mid B_{d} \mid$ is the batch sampling size (the number of negative samples). Intuitively, the encoder aims to increase the dot product between the description embeddings of aligned entities and to decrease it between those of unrelated entities. After training, if the embedding distance between ${\textbf{d}}_e$ and its closest description embedding ${\textbf{d}}_{e'}$ is lower than a threshold (different from the one used for KGEM), the pair $(e, e')$ is proposed as aligned.
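The per-pair objective of Eqs. 25-27 can be sketched as follows, with the expectations replaced by sums over given sampled negatives: `negs_replacing_e` stands for the $\mathbf{d}_{e_k}$ in Eq. 26 (replacing $e$) and `negs_replacing_e2` for those in Eq. 27 (replacing $e'$). The description encoder itself is not modeled; the embeddings are assumed precomputed, and all names are hypothetical.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(sigma(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dem_pair_loss(d_e, d_e2, negs_replacing_e, negs_replacing_e2):
    """-LL_1 - LL_2 for one seed pair (e, e'), sampling-based version of Eqs. (26)-(27)."""
    ll1 = log_sigmoid(dot(d_e, d_e2)) + sum(log_sigmoid(-dot(dk, d_e2)) for dk in negs_replacing_e)
    ll2 = log_sigmoid(dot(d_e, d_e2)) + sum(log_sigmoid(-dot(d_e, dk)) for dk in negs_replacing_e2)
    return -ll1 - ll2
```

The loss is close to zero when the aligned descriptions have a large positive dot product and the sampled negatives have a large negative one, matching the intuition given above.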

For generating entity embeddings from textual descriptions, KDCoE utilizes a self-attention Gated Recurrent Unit (GRU), in order to preserve the sequence of words in a textual description and remove information irrelevant for the prediction, while sharing information across the different descriptions. For more details on this, we refer the reader to Chen et al. (2018).

2.4.3 Unsupervised

AttrE (Trisedya et al. 2019) is an unsupervised method that leverages both structural embeddings and attribute character embeddings for entity alignment. Instead of relying on a seed alignment to refine the structural embeddings, it uses the factual information to minimize the embedding distance between entities that have similar attribute character embeddings.

As shown in Fig. 7, AttrE consists of four modules: the schema (predicate) alignment module (a preprocessing step), the structure embedding module $J_{S E}$, the attribute character embedding module $J_{C E}$, and the alignment module $J_{SIM}$. The last three are trained jointly by minimizing

$$\begin{aligned} J=J_{S E}+J_{C E}+J_{SIM}. \end{aligned}$$

(28)

The predicate alignment module merges the two KGs and renames the predicates of $KG_1$ and $KG_2$ that have similar names using a unified naming scheme. For example, lgd:hasCountry and dbp:country (with lgd and dbp being the prefixes of the Linked Geo Data and DBpedia namespaces, respectively) are both converted to :country. To find predicates with similar names, it compares the postfixes of the predicates' URIs using a Levenshtein-distance-based string similarity; if this similarity is greater than a predefined threshold, the two predicates are considered similar.
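The predicate-name comparison can be sketched with the classic dynamic-programming Levenshtein distance. The normalization $1 - \text{dist}/\max(|s|,|t|)$, the lower-casing, and the way the postfix is extracted are assumptions for illustration, not necessarily AttrE's exact procedure.

```python
def levenshtein(s, t):
    """Classic dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]

def local_name(uri):
    """Postfix of a predicate URI or prefixed name, e.g. 'dbp:country' -> 'country'."""
    for sep in ("#", "/", ":"):
        uri = uri.rsplit(sep, 1)[-1]
    return uri

def predicates_similar(p1, p2, threshold):
    """Hypothetical normalized similarity 1 - dist / max_len, compared to a threshold."""
    a, b = local_name(p1).lower(), local_name(p2).lower()
    sim = 1.0 - levenshtein(a, b) / max(len(a), len(b), 1)
    return sim >= threshold
```

For lgd:hasCountry and dbp:country, the lower-cased postfixes hascountry and country are at edit distance 3, i.e., a normalized similarity of 0.7 under this assumed normalization.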

AttrE adopts TransE (Eq. 1) to learn the structure embeddings by minimizing the $J_{SE}$ loss function, in which each relation edge is weighted by $\alpha =\frac{{\text {count}}(r)}{{\mid }X{\mid }}$, where count(r) is the number of occurrences of relation $r$ and ${\mid }X{\mid }$ is the total number of relation edges in the merged KG; thus, more frequent relations receive larger weights.

To learn the attribute character embedding, AttrE minimizes the following objective function:

$$\begin{aligned} J_{C E}=\sum _{y \in Y} \sum _{y' \in Y'} \max \left( 0,\left[ \gamma +w\left( f\left( y\right) -f\left( y'\right) \right) \right] \right) , \end{aligned}$$

(29)

where

$$\begin{aligned} f\left( \left<h,a,l\right>\right) = {\mid }{\textbf{h}}+{\textbf{a}}-f_{a}(l){\mid } \end{aligned}$$

(30)

and $Y$ is the set of positive attribute edges, $Y'$ is the set of negative attribute edges (generated by replacing the head entity with a random entity), $w$ weights the edges with aligned predicates, and $f_{a}(l)=\sum _{n=1}^{N}\left( \frac{\sum _{i=1}^{t} \sum _{j=i}^{n} {\textbf{c}}_{j}}{t-i-1}\right)$ is an n-gram-based compositional function that encodes the literal $l$ of attribute $a$ in the embedding space, where ${\textbf{c}}_{j}$ is the embedding of the $j$-th character of $l$ and $t$ is the number of characters in $l$. At the end of this process, entities that have similar attribute embeddings should also have similar entity embeddings. In the example of Fig. 7, lgd:5147 has an embedding similar to that of :Germany, since their label attributes (Germany) have similar embeddings.
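The exact form of $f_a$ is given above; the sketch below implements only a simplified reading of the n-gram compositional idea, not a transcription of that formula: for each $n$ from 1 to $N$, it averages the summed character vectors of all n-grams of the literal and adds up the results. Character vectors are assumed given, and unknown characters map to zero vectors; all names are hypothetical.

```python
def ngram_encode(literal, char_vecs, max_n):
    """Simplified n-gram compositional encoder (not AttrE's exact f_a):
    for n = 1..max_n, average the summed character vectors over all n-grams
    of the literal, then add the per-n averages together."""
    dim = len(next(iter(char_vecs.values())))
    vecs = [char_vecs.get(c, [0.0] * dim) for c in literal]
    out = [0.0] * dim
    for n in range(1, max_n + 1):
        grams = [vecs[i:i + n] for i in range(len(vecs) - n + 1)]
        if not grams:  # literal shorter than n
            continue
        for g in grams:
            for d in range(dim):
                out[d] += sum(v[d] for v in g) / len(grams)
    return out
```

Because the output depends only on the characters of the literal, two entities whose labels share most of their characters obtain close attribute vectors, which is what lets $J_{SIM}$ pull their entity embeddings together.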

Having learned the structure embeddings and the attribute embeddings, AttrE combines them to produce the final entity embeddings, by minimizing the loss function

$$\begin{aligned} J_{SIM}=\sum _{e \in E_{1} \cup E_{2}}\left[ 1-\cos \left( {\textbf{e}}_{se}, {\textbf{e}}_{ce}\right) \right] , \end{aligned}$$

(31)

where $\mathbf {e_{se}}$ is the structure embedding of $e$ and $\mathbf {e_{ce}}$ is the attribute character embedding of $e$. For example, the embeddings of the entities lgd:5147 and :Germany will be updated so that their structure and attribute embeddings are close. Moreover, the entities lgd:2401 and dbp:Kromsdorf have tail entities with similar embeddings, so they will also end up with similar embeddings. To generate the alignment results, AttrE reports two entities as aligned if the cosine similarity of their entity embeddings is greater than a predefined threshold.
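A sketch of Eq. 31 and of the threshold-based alignment step, assuming the structure, character, and final entity embeddings are given as plain vectors. The threshold value and entity identifiers are hypothetical, and this naive version compares all cross-KG pairs without enforcing one-to-one matches.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def j_sim(struct_emb, char_emb):
    """Eq. (31): sum over entities of 1 - cos(e_se, e_ce)."""
    return sum(1.0 - cosine(struct_emb[e], char_emb[e]) for e in struct_emb)

def align(emb1, emb2, threshold):
    """Report (e1, e2) pairs whose embeddings have cosine similarity above threshold."""
    return [(a, b) for a in emb1 for b in emb2 if cosine(emb1[a], emb2[b]) > threshold]
```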

2.5 Qualitative comparison of embedding methods

In this section, we describe the general assumptions of the evaluated methods, while we also compare the embedding-based entity alignment methods from different perspectives. For this purpose, we summarize in Tables 1 and 2 the basic characteristics and techniques of the methods, with respect to eight main categories: embedding module, literal size, alignment module, learning, schema alignment, embedding initialization, and negative sampling on relations and attributes, as described next.

Table 1 Method categories
