1 Introduction

Matching data instances that refer to the same real-world entity is a long-standing problem. It establishes connections among multiple data sources and is critical to data integration and cleaning [33]. The task has therefore been actively studied; in the database community, for instance, various entity matching (EM) (also called entity resolution (ER)) strategies have been proposed to train a (supervised) classifier that predicts whether a pair of data records match [9, 33].

Recently, due to the emergence and proliferation of knowledge graphs (KGs), matching entities in KGs has drawn much attention from both academia and industry. Distinct from traditional data matching, it brings its own challenges. In particular, it emphasizes the use of KG structures for matching and exhibits unique data characteristics, e.g., imbalanced class distribution and little attributive textual information. As a consequence, although viable, it is difficult to train an effective classifier that infers the equivalence between KG entities by following the traditional EM pipeline. Hence, much effort has been dedicated to specifically addressing the matching of entities in KGs, which is also referred to as entity alignment (EA).

Early solutions to EA are mainly unsupervised [22, 42], i.e., no labeled data is assumed. They utilize discriminative features of entities (e.g., entity descriptions and relational structures) to infer equivalent entity pairs but are hampered by the heterogeneity of independently constructed KGs [44].

To mitigate this issue, recent solutions to EA employ a few labeled pairs as seeds to guide learning and prediction [8, 14, 27, 37, 47]. In short, they embed the symbolic representations of KGs as low-dimensional vectors such that the semantic relatedness of entities is captured by the geometric structures of the embedding spaces [4], and the seed pairs are leveraged to produce unified entity representations. In the testing stage, they match entities based on the unified entity embeddings. These methods are coined embedding-based EA methods and have exhibited state-of-the-art performance on existing benchmarks.

To be more specific, the embedding-based EA pipeline can be roughly divided into two major stages, i.e., representation learning and matching KGs in entity embedding spaces (or embedding matching for short). The former encodes the KG structures into low-dimensional vectors and establishes connections between independent KGs via the calibration or transformation of (seed) entity embeddings [44]; the latter computes pairwise scores between source and target entities based on these embeddings and then makes alignment decisions according to the pairwise scores. Although this field has been actively explored, existing efforts are mainly devoted to the representation learning stage [24, 26, 61], while embedding matching did not receive much attention until very recently [29, 53]. The majority of existing EA solutions adopt a simple algorithm to realize this stage, i.e., DInf, which first leverages common similarity metrics such as cosine similarity to calculate the pairwise similarity scores between entity embeddings and then matches a source entity to its most similar target entity according to these scores [47]. Nevertheless, such an intuitive strategy merely reaches a local optimum for each individual entity and completely overlooks the (global) interdependence among the matching decisions for different entities [55].

To address the shortcomings of DInf, advanced strategies have been devised [12, 44, 49, 53, 55, 56]. Some of them inject the modeling of global interdependence into the computation of pairwise scores [12, 44, 53], while others directly improve the alignment decision-making process by imposing collective matching constraints [49, 55, 56]. These efforts demonstrate the significance of matching KGs in entity embedding spaces from at least three major aspects: (1) It is an indispensable step of EA, which takes as input the entity embeddings (generated by the representation learning stage) and outputs matched entity pairs. (2) Its performance is crucial to the overall EA results; e.g., an effective algorithm can improve the alignment results by up to 88% [53]. (3) It empowers EA with explainability, as it unveils the decision-making process of alignment. We use the following example to further illustrate the significance of the embedding matching process.

Example Figure 4.1 presents three representative cases of EA. The KG pairs to be aligned are first encoded into embeddings via the representation learning models. Next, the embedding matching algorithms produce the matched entity pairs based on the embeddings. In the most ideal case where the two KGs are identical, e.g., case (a), an ideal representation learning model would embed equivalent entities into exactly the same place in the low-dimensional space, and the simple DInf algorithm would attain perfect results. Nevertheless, in the majority of practical scenarios, e.g., cases (b) and (c), the two KGs exhibit high structural heterogeneity. Thus, even an ideal representation learning model might generate different embeddings for equivalent entities. In this case, adopting the simple DInf strategy is likely to produce false entity pairs, such as \((u_5, v_3)\) in case (b).

Fig. 4.1

Three cases of EA. Dashed lines between KGs denote the seed entity pairs. Entities with the same subscripts are equivalent. In the embedding space, the circles with two colors represent that the corresponding entities in the two KGs have the same embeddings

Worse still, as pointed out in previous works [44, 59], existing representation learning methods for EA cannot fully capture the structural information (possibly due to their inner design mechanisms or their inability to deal with scarce supervision signals). Under these settings, e.g., case (c), the distribution of entity embeddings in the low-dimensional space becomes irregular, and the simple embedding matching algorithm DInf falls short, producing incorrect entity pairs \((u_3, v_1)\) and \((u_5, v_1)\). Thus, in these practical cases, an effective embedding matching algorithm is crucial to inferring the correct matches. For instance, by exploiting a collective embedding matching algorithm that imposes the 1-to-1 alignment constraint, the correct matches, i.e., \((u_3, v_3)\) and \((u_5, v_5)\), are likely to be restored.

While the study of matching KGs in entity embedding spaces is rapidly progressing, there is no systematic survey or comparison of these solutions [44]. We do notice that several survey papers cover embedding-based EA frameworks [44, 52, 57, 58, 59], but they all only briefly introduce the embedding matching module (mostly mentioning only the DInf algorithm). In this chapter, we aim to fill this gap by surveying current solutions for matching KGs in entity embedding spaces and providing a comprehensive empirical evaluation of these methods with the following features:

(1) Systematic Survey and Fair Comparison

Albeit essential to the alignment performance, existing embedding matching strategies have not yet been compared directly. Instead, they are integrated with representation learning models and then evaluated and compared with each other (as a whole). This, however, cannot provide a fair comparison of the embedding matching strategies themselves, since the differences among them can be offset by other influential factors, such as the choice of representation learning models or input features. Therefore, in this chapter, we exclude irrelevant factors and provide a fair comparison of current matching algorithms for KGs in entity embedding spaces at both the theoretical and empirical levels.

(2) Comprehensive Evaluation and Detailed Discussion

To fully appreciate the effectiveness of embedding matching strategies, we conduct extensive experiments on a wide range of EA settings, i.e., with different representation learning models, with various input features, and on datasets at different scales. We also analyze the complexity of these algorithms and evaluate their efficiency/scalability under each experimental setting. Based on the empirical results, we discuss their respective strengths and weaknesses.

(3) New Experimental Settings and Insights

Through empirical evaluation and analysis, we discover that the current mainstream evaluation setting, i.e., 1-to-1 constrained EA, oversimplifies real-life alignment scenarios. Hence, we identify two experimental settings that better reflect the challenges in practice, i.e., alignment with unmatchable entities and a new setting of non-1-to-1 alignment. We compare the embedding matching algorithms under these challenging settings to provide further insights.

Contributions

We make the following contributions:

  • We systematically and comprehensively survey and compare state-of-the-art algorithms for matching KGs in entity embedding spaces (Sect. 4.3).

  • We evaluate and compare the state-of-the-art embedding matching algorithms on a wide range of EA datasets and settings, as well as reveal their strengths and weaknesses. The codes of these algorithms are organized and integrated into an open-source library, EntMatcher, publicly available at https://github.com/DexterZeng/EntMatcher (Sect. 4.4).

  • We identify experimental settings that better mirror real-life challenges and construct a new benchmark dataset, where deeper insights into the algorithms are obtained via empirical evaluations (Sect. 4.5).

  • Based on our evaluation and analysis, we provide useful insights into the design trade-offs of existing works and suggest promising directions for the future development of matching KGs in entity embedding spaces (Sect. 4.6).

2 Preliminaries

In this section, we first present the task formulation of EA and its general framework. Next, we introduce the studies related to alignment inference and clarify the scope of this chapter. Finally, we present the key assumptions of embedding-based EA.

2.1 Task Formulation and Framework

Task Formulation

A KG \(\mathcal {G}\) is composed of triples \(\{(s, p, o)\}\), where \(s, o \in \mathcal {E}\) represent entities and \(p \in \mathcal {P}\) denotes the predicate (relation). Given a source KG \(\mathcal {G}_s\) and a target KG \(\mathcal {G}_t\), the task of EA is formulated as discovering new (equivalent) entity pairs \(\mathcal {M} = \{(u, v)| u \in \mathcal {E}_s, v \in \mathcal {E}_t, u\Leftrightarrow v\}\) by using pre-annotated (seed) entity pairs \(\mathcal {S}\) as anchors, where \(\Leftrightarrow \) represents the equivalence between entities and \(\mathcal {E}_s\) and \(\mathcal {E}_t\) denote the entity sets in \(\mathcal {G}_s\) and \(\mathcal {G}_t\), respectively.
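To make the inputs concrete, the following minimal Python sketch shows one possible way to represent the two KGs and the seed alignment; all entity and relation identifiers are illustrative and not taken from any benchmark.

# Hypothetical toy inputs for the EA task: each KG is a list of (s, p, o)
# triples, and the seed alignment S is a list of pre-annotated entity pairs.
source_kg = [("u1", "bornIn", "u2"), ("u1", "worksAt", "u3")]
target_kg = [("v1", "birthPlace", "v2"), ("v1", "employer", "v3")]
seed_pairs = [("u1", "v1")]   # known equivalences used as anchors
# The goal of EA is to discover the remaining equivalent pairs,
# e.g., (u2, v2) and (u3, v3), for the test entities.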

General Framework

The pipeline of state-of-the-art embedding-based EA solutions can be divided into two stages, i.e., representation learning and embedding matching, as shown in Fig. 4.2. The general algorithm can be found in Algorithm 1.

Fig. 4.2

The pipeline of embedding-based EA. Dashed lines denote the pre-annotated alignment links

Algorithm 1: General algorithm of embedding-based EA

The majority of studies on EA are devoted to the representation learning stage. They first utilize KG embedding techniques such as TransE [4] and GCN [20] to capture the KG structure information and generate entity structural representations. Next, based on the assumption that equivalent entities from different KGs possess similar neighboring KG structures (and in turn similar embeddings), they leverage the seed entity pairs as anchors and progressively project individual KG embeddings into a unified space through training, resulting in the unified entity representations \(\boldsymbol {E}\). There have already been several survey papers concentrating on representation learning approaches for EA, and we refer the interested readers to these works [2, 44, 57, 59].

Next, we introduce the embedding matching process, which is the focus of this chapter, as well as its related work.

2.2 Related Work and Scope

Matching KGs in Entity Embedding Spaces

After obtaining the unified entity representations \(\boldsymbol {E}\), where equivalent entities from different KGs are assumed to have similar embeddings, the embedding matching stage (also frequently referred to as the alignment inference stage [44]) produces alignment results by comparing the embeddings of entities from different KGs. Concretely, it first calculates the pairwise scores between source and target entity embeddings according to a specific metric. The pairwise scores are then organized in matrix form as \(\boldsymbol {S}\). Next, various matching algorithms are put forward to align entities according to the pairwise scores. The most common algorithm is Greedy, described in Algorithm 2. It directly matches a source entity to the target entity with the highest pairwise score according to \(\boldsymbol {S}\). Over the last few years, advanced solutions [12, 15, 28, 29, 44, 49, 53, 55, 56, 60] have been devised to improve the embedding matching performance, and in this work, we focus on surveying and comparing these algorithms for matching KGs in entity embedding spaces.

Algorithm 2: Greedy\((\mathcal {E}_s,\mathcal {E}_t, \boldsymbol {S})\)

for \(u \in \mathcal {E}_s\) do
  \(v^* = \operatorname *{\arg \max }_{v\in \mathcal {E}_t}\boldsymbol {S}(u, v)\);
  \(\mathcal {M} \gets \mathcal {M}\cup \{(u,v^*)\}\);
return \(\mathcal {M}\);
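As a concrete illustration, the following is a minimal NumPy sketch of the Greedy routine in Algorithm 2 (not the EntMatcher implementation); it assumes the pairwise scores \(\boldsymbol {S}\) are given as an \(|\mathcal {E}_s| \times |\mathcal {E}_t|\) array and that entity identifiers are held in plain lists.

import numpy as np

def greedy(source_ids, target_ids, S):
    """Match each source entity to its highest-scoring target entity."""
    matches = set()
    for i, u in enumerate(source_ids):
        j = int(np.argmax(S[i]))              # locally optimal target for u
        matches.add((u, target_ids[j]))
    return matches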

Matching KGs in Symbolic Spaces

Before the emergence of embedding-based EA, there have already been many conventional frameworks that match KGs in symbolic spaces [17, 41, 42]. While some are based on equivalence reasoning mandated by OWL semantics [17], some leverage similarity computation to compare the symbolic features of entities [42]. However, these solutions are not comparable to algorithms for matching KGs in entity embedding spaces, as (1) they cover both the representation learning and embedding matching stages in embedding-based EA and (2) the required inputs are different from those of embedding matching algorithms. Thus, we do not include them in our experimental evaluation, while they have already been compared in the survey papers covering the overall embedding-based EA frameworks [44, 59].

The matching of relations (or ontologies) between KGs has also been studied by prior symbolic works [41, 42]. Nevertheless, compared with entities, relations are usually fewer in number, of various granularities [36], and under-explored in embedding-based approaches [51]. Hence, in this work, we exclude relevant studies on this topic and focus on the matching of entities.

The task of entity resolution (ER) [9, 16, 35], also known as entity matching, deduplication, or record linkage, can be regarded as the general case of EA [59]. It assumes that the input is relational data, where each data object usually has a large amount of textual information described in multiple attributes. Nevertheless, in this chapter, we focus on EA approaches, which strive to align KGs and mainly rely on graph representation learning techniques to model the KG structure and generate entity structural embeddings for alignment. Therefore, the discussion and comparison with ER solutions is beyond the scope of this work.

Matching Data Instances via Deep Learning

Entity matching (EM) between databases has also been greatly advanced by utilizing pre-trained language models for expressive contextualization of database records [10, 33]. These deep learning (DL)-based EM solutions devise end-to-end neural models that learn to classify an entity pair as matching or non-matching and then feed the test entity pairs into the trained models to obtain classification results [5, 25, 33]. Nevertheless, this procedure differs from the focus of our study, as both its training and testing stages involve representation learning and matching. Besides, these solutions are not suitable for matching KGs in entity embedding spaces, since (1) they require adequate labeled data to train the neural classification models, but the training data in EA is much smaller than the test data, which could result in overfitting; (2) they would suffer from severe class imbalance in EA, where an entity and all of its nonequivalent entities in the other KG constitute many negative samples, while there is usually only one positive sample for this entity; and (3) they depend on the attributive text information of data records for training, while EA underlines the use of KG structure, which provides far fewer useful features for model training. In the experiments, we adapt DL-based EM models to tackle EA, and the results are not promising. This will be further discussed in Sect. 4.4.3.

Existing Surveys on EA

There are several survey papers covering EA frameworks [44, 52, 57, 58, 59]. Some articles provide a high-level discussion of embedding-based EA frameworks, experimentally evaluate and compare these works, and offer guidelines for potential practitioners [44, 58, 59]. Specifically, Zhao et al. propose a general EA framework to encompass existing works and then evaluate them under a wide range of settings. Nevertheless, they only briefly mention DInf and SMat in the embedding matching stage [59]. Sun et al. survey EA approaches and develop an open-source library to evaluate existing works. However, they merely introduce DInf, SMat, and Hun. and overlook the comparison among these algorithms. Besides, they point out that current approaches devote their main efforts to learning expressive embeddings that capture entity features but neglect the alignment inference (i.e., embedding matching) stage [44]. Zhang et al. empirically evaluate state-of-the-art embedding-based EA methods in an industrial context and particularly investigate the influence of the sizes and biases of seed mappings. They evaluate each method as a whole and do not examine the embedding matching process [58].

Two recent survey papers include the latest efforts on embedding-based EA and give a more self-contained explanation of each technique. Zhang et al. provide a tutorial-type survey, but for embedding matching, they merely introduce the nearest neighbor search strategy, i.e., DInf [57]. Zeng et al. mainly introduce representation learning methods and their applications to EA, but they neglect the embedding matching stage [52].

In all, existing EA survey articles focus on the representation learning process and only briefly introduce the embedding matching module (mostly mentioning only the DInf algorithm). In this work, we systematically survey and empirically evaluate the algorithms designed for the embedding matching process in KG alignment and present comprehensive results and insightful discussions.

Scope of This Work

This study aims to survey and empirically compare the algorithms for matching KGs in entity embedding spaces, i.e., various implementations of Embedding_Matching( ) in Algorithm 1, on a wide range of EA experimental settings.

2.3 Key Assumptions

Notably, existing embedding-based EA solutions have a fundamental assumption, i.e., that equivalent entities in different KGs possess similar (ideally, isomorphic) neighboring structures. Under this assumption, effective representation learning models would transform the structures of equivalent entities into similar entity embeddings. Based on the entity embeddings, the embedding matching stage would then assign higher (resp., lower) pairwise similarity scores to equivalent (resp., nonequivalent) entity pairs and finally make accurate alignment decisions by coordinating matches according to the pairwise scores.

Besides, current EA evaluation settings assume that the entities in different KGs conform to the 1-to-1 constraint. That is, each \(u \in \mathcal {E}_s\) has one and only one equivalent entity \(v \in \mathcal {E}_t\) and vice versa. However, we contend that this assumption is in fact impractical and provide detailed experiments and discussions in Sect. 4.5.2.

3 Alignment Inference Algorithms

In this section, we introduce the algorithms for alignment inference, i.e., Embedding_Matching( ) in Algorithm 1.

3.1 Overview

We first provide an overview and comparison of matching algorithms for KGs in entity embedding spaces in Table 4.1. As mentioned in Sect. 4.2, embedding matching comprises two stages, i.e., pairwise score computation and matching. The baseline approach DInf adopts existing similarity metrics to calculate the similarity between entity embeddings and generate the pairwise scores in the first stage, and then it leverages Greedy for matching. In pursuit of better alignment performance, more advanced embedding matching strategies have been put forward. Some (i.e., CSLS, RInf, and Sink.) optimize the pairwise score computation process and produce more accurate pairwise scores, while others (i.e., Hun., SMat, and RL) take into account the global alignment dynamics during the matching process, rather than greedily pursuing the local optimum for each entity, so that more correct matches can be generated through coordination under the global constraint.

Table 4.1 Overview and comparison of state-of-the-art algorithms for matching KGs in entity embedding spaces

We further identify two notable characteristics of matching KGs in entity embedding spaces, i.e., whether the matching leverages the 1-to-1 constraint, and the direction of the matching. Regarding the former, Hun. and SMat explicitly exert the 1-to-1 constraint on the matching process. RL relaxes the strict 1-to-1 constraint by allowing non-1-to-1 matches. The greedy strategies normally do not take this constraint into consideration, except for Sink., which implicitly implements the 1-to-1 constraint in a progressive manner when calculating the pairwise scores. As for the direction of matching, Greedy only considers a single direction at a time and overlooks the influence from the reverse direction. Thus, the resultant source-to-target alignment results are not necessarily equal to the target-to-source ones. By improving the pairwise score computation, CSLS, RInf, and Sink. actually model and integrate the bidirectional alignments, whereas they still adopt Greedy to produce the final results. Among the non-greedy methods, Hun. and SMat fully consider the bidirectional alignments and produce a matching agreed upon by both directions, while RL is unidirectional.

Next, we describe these methods in detail.

3.2 Simple Embedding Matching

DInf is the most common implementation of Embedding_Matching( ), described in Algorithm 3. Assume both KGs contain n entities. The time and space complexity of DInf is \(O(n^2)\).

Algorithm 3: \(\textsf { DInf} (\mathcal {E}_s,\mathcal {E}_t, \boldsymbol {E})\)

Derive similarity matrix \(\boldsymbol {S}\) based on \(\boldsymbol {E}\);
\(\mathcal {M} \gets \) Greedy\((\mathcal {E}_s,\mathcal {E}_t,\boldsymbol {S})\);
return \(\mathcal {M}\);
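A minimal sketch of DInf is shown below, assuming cosine similarity as the metric, NumPy embedding matrices of shape \((|\mathcal {E}_s|, d)\) and \((|\mathcal {E}_t|, d)\), and the greedy sketch given after Algorithm 2; it is illustrative rather than the EntMatcher implementation.

import numpy as np

def d_inf(source_ids, target_ids, E_s, E_t):
    # Row-normalize the embeddings so the dot product equals cosine similarity
    E_s = E_s / np.linalg.norm(E_s, axis=1, keepdims=True)
    E_t = E_t / np.linalg.norm(E_t, axis=1, keepdims=True)
    S = E_s @ E_t.T                             # pairwise similarity matrix
    return greedy(source_ids, target_ids, S)    # greedy() from the earlier sketch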

3.3 CSLS Algorithm

The cross-domain similarity local scaling (CSLS) algorithm [23] is introduced to mitigate the hubness and isolation issues of entity embeddings in EA [44]. The hubness issue refers to the phenomenon where some entities (known as hubs) frequently appear as the most similar entities of many other entities in the vector space, while the isolation issue means that some outliers are isolated from any point cluster. Thus, CSLS increases the similarity scores associated with isolated entity embeddings and conversely decreases those of vectors lying in dense areas [23]. Formally, the CSLS pairwise score between source entity u and target entity v is:

$$\displaystyle \begin{aligned} \textsf{ CSLS} (u,v) = 2\boldsymbol{S}(u,v) - \phi(u) - \phi(v), {} \end{aligned} $$
(4.1)

where \(\boldsymbol {S}\) is the similarity matrix derived from \(\boldsymbol {E}\) using similarity metrics, \(\phi (u) = \frac {1}{k}\sum _{v^\prime \in \mathcal {N}_u}\boldsymbol {S}(u,v^\prime )\) is the mean similarity score between the source entity u and its top-k most similar entities \(\mathcal {N}_u\) in the target KG, and \(\phi (v)\) is defined similarly. The mean similarity scores of all source and target entities are denoted in vector form as \(\phi _s\) and \(\phi _t\), respectively. To generate the matched entity pairs, it further applies Greedy on the CSLS matrix (i.e., \(\boldsymbol {S}_{\textsf { CSLS} }\)). Algorithm 4 describes the detailed procedure of CSLS.

Algorithm 4: CSLS\((\mathcal {E}_s,\mathcal {E}_t, \boldsymbol {E},k)\)

Derive similarity matrix \(\boldsymbol {S}\) based on \(\boldsymbol {E}\);
Calculate the mean values of the top-k similarity scores of entities in \(\mathcal {E}_s\) and \(\mathcal {E}_t\), resulting in \(\phi _s\) and \(\phi _t\), respectively;
\(\boldsymbol {S}_{\textsf { CSLS} } = 2\boldsymbol {S} - \phi _s -\phi _t^\top \);
\(\mathcal {M} \gets \) Greedy\((\mathcal {E}_s,\mathcal {E}_t,\boldsymbol {S}_{\textsf { CSLS} })\);
return \(\mathcal {M}\);
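The following NumPy sketch computes the CSLS-adjusted score matrix of Eq. (4.1); it is an illustrative re-implementation that assumes \(\boldsymbol {S}\) is the \(|\mathcal {E}_s| \times |\mathcal {E}_t|\) similarity matrix, and Greedy is then applied to the returned matrix.

import numpy as np

def csls_scores(S, k=1):
    # Mean of each source entity's top-k similarities toward target entities
    phi_s = np.sort(S, axis=1)[:, -k:].mean(axis=1)       # shape (|E_s|,)
    # Mean of each target entity's top-k similarities toward source entities
    phi_t = np.sort(S, axis=0)[-k:, :].mean(axis=0)       # shape (|E_t|,)
    # S_CSLS(u, v) = 2 * S(u, v) - phi(u) - phi(v)
    return 2 * S - phi_s[:, None] - phi_t[None, :]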

Complexity

The time and space complexity is \(O(n^2)\). Practically, it requires more time and space than DInf, as it needs to generate the additional CSLS matrix.

3.4 Reciprocal Embedding Matching

Zeng et al. [53] formulate the EA task as a reciprocal recommendation process [38] and offer a reciprocal embedding matching strategy, RInf, to model and integrate the bidirectional preferences of entities when inferring the matching results. Formally, the pairwise score of source entity u toward target entity v is defined as:

$$\displaystyle \begin{aligned} p_{u,v} = \boldsymbol{S}(u,v) - \max_{u^\prime\in \mathcal{E}_s}\boldsymbol{S}(v,u^\prime) + 1, {} \end{aligned} $$
(4.2)

where \(\boldsymbol {S}\) is the similarity matrix derived from \(\boldsymbol {E}\), \(0\leq p_{u,v} \leq 1\), and a larger \(p_{u,v}\) denotes a higher degree of preference. As such, the matrix forms of the source-to-target and target-to-source preference scores are denoted as \(\boldsymbol {P}_{s,t}\) and \(\boldsymbol {P}_{t,s}\), respectively. Next, it converts the preference matrix \(\boldsymbol {P}\) into the ranking matrix \(\boldsymbol {R}\) and then averages the two ranking matrices, resulting in the reciprocal preference matrix \(\boldsymbol {P}_{s \leftrightarrow t}\) that encodes the bidirectional alignment information. Finally, it adopts Greedy to generate the matched entity pairs.

Algorithm 5: \(\textsf { RInf} (\mathcal {E}_s,\mathcal {E}_t, \boldsymbol {E})\)

Derive similarity matrix \(\boldsymbol {S}\) based on \(\boldsymbol {E}\);
for \(u \in \mathcal {E}_s\) do
  for \(v \in \mathcal {E}_t\) do
    Calculate \(p_{u,v}\) and \(p_{v,u}\) (cf. Eq. (4.2));
Collect the preference scores, resulting in \(\boldsymbol {P}_{s,t}\) and \(\boldsymbol {P}_{t,s}\);
Convert \(\boldsymbol {P}_{s,t}\) and \(\boldsymbol {P}_{t,s}\) into \(\boldsymbol {R}_{s,t}\) and \(\boldsymbol {R}_{t,s}\), respectively;
\(\boldsymbol {P}_{s \leftrightarrow t} = (\boldsymbol {R}_{s,t} + \boldsymbol {R}_{t,s}^\top )/2\);
\(\mathcal {M} \gets \) Greedy\((\mathcal {E}_s,\mathcal {E}_t, -\boldsymbol {P}_{s \leftrightarrow t})\);
return \(\mathcal {M}\);
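Below is a compact NumPy sketch of this reciprocal procedure (illustrative only); ranks are obtained with a double argsort, and smaller aggregated ranks indicate stronger mutual preference, so Greedy is run on the negated matrix.

import numpy as np

def r_inf(source_ids, target_ids, S):
    # Preference of each source entity toward targets (Eq. 4.2) and vice versa
    P_st = S - S.max(axis=0, keepdims=True) + 1        # subtract each target's best score
    P_ts = S.T - S.T.max(axis=0, keepdims=True) + 1    # subtract each source's best score
    # Convert preference scores into rankings (1 = most preferred)
    R_st = (-P_st).argsort(axis=1).argsort(axis=1) + 1
    R_ts = (-P_ts).argsort(axis=1).argsort(axis=1) + 1
    # Reciprocal preference matrix: average of the two ranking matrices
    P_rec = (R_st + R_ts.T) / 2
    return greedy(source_ids, target_ids, -P_rec)      # greedy() from the earlier sketch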

Complexity

Algorithm 5 describes the detailed procedure of RInf. The time complexity is \(O(n^2\lg n)\) [53], and the space complexity is \(O(n^2)\). Practically, it requires more space than DInf and CSLS, due to the computation of the similarity, preference, and ranking matrices. Notably, two variant methods, i.e., RInf-wr and RInf-pb, are proposed to reduce the memory and time consumption brought by the reciprocal modeling. More details can be found in [53].

3.5 Embedding Matching as Assignment

Some very recent studies [29, 49] propose to model the embedding matching process as the linear assignment problem. They first use similarity metrics to calculate pairwise similarity scores based on \(\boldsymbol {E}\). Then they adopt the Hungarian algorithm [21] to solve the task of assigning source entities to target entities according to the pairwise scores. The objective is to maximize the sum of the pairwise similarity scores of the final matched entity pairs while observing the 1-to-1 assignment constraint. In this work, we use the Hungarian algorithm implemented by Jonker and Volgenant [18] and denote it as Hun.\((\mathcal {E}_s,\mathcal {E}_t,\boldsymbol {E})\).
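As an illustration, the following sketch solves the same assignment formulation with SciPy's linear_sum_assignment, used here as a stand-in for the Jonker-Volgenant implementation mentioned above; \(\boldsymbol {S}\) is the pairwise similarity matrix.

import numpy as np
from scipy.optimize import linear_sum_assignment

def hun(source_ids, target_ids, S):
    # Maximize the sum of pairwise similarities under the 1-to-1 constraint
    row_idx, col_idx = linear_sum_assignment(S, maximize=True)
    return {(source_ids[i], target_ids[j]) for i, j in zip(row_idx, col_idx)}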

Besides, the Sinkhorn operation [31] (or Sink. for short) is also adopted to solve the assignment problem [12, 15, 29], which converts the similarity matrix \(\boldsymbol {S}\) into a doubly stochastic matrix \(\boldsymbol {S}_{sinkhorn}\) that encodes the entity correspondence information. Specifically,

$$\displaystyle \begin{aligned} \begin{aligned} &Sinkhorn^l(\boldsymbol{S}) = \Gamma_c(\Gamma_r(Sinkhorn^{l-1}(\boldsymbol{S})));\\ &\boldsymbol{S}_{sinkhorn} = \lim_{l\rightarrow\infty} Sinkhorn^l(\boldsymbol{S}), \end{aligned} {} \end{aligned} $$
(4.3)

where \(Sinkhorn^0(\boldsymbol {S}) = \exp (\boldsymbol {S})\) and \(\Gamma _c\) and \(\Gamma _r\) refer to the column and row-wise normalization operators of a matrix. Since the number of iterations l is limited, the Sinkhorn operation can only obtain an approximate 1-to-1 assignment solution in practice [29]. Then \(\boldsymbol {S}_{sinkhorn}\) is forwarded to Greedy to obtain the alignment results.

Algorithm 6: Sink.\((\mathcal {E}_s,\mathcal {E}_t, \boldsymbol {E}, l)\)

Derive similarity matrix \(\boldsymbol {S}\) based on \(\boldsymbol {E}\);
\(\boldsymbol {S}_{sinkhorn} = Sinkhorn^l(\boldsymbol {S})\) (cf. Eq. (4.3));
\(\mathcal {M} \gets \) Greedy\((\mathcal {E}_s,\mathcal {E}_t,\boldsymbol {S}_{sinkhorn})\);
return \(\mathcal {M}\);
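A minimal NumPy sketch of the Sinkhorn operation in Eq. (4.3) is given below (illustrative; l rounds of alternating row and column normalization applied to exp(S)), after which Greedy is run on the returned matrix.

import numpy as np

def sinkhorn(S, l=100):
    M = np.exp(S)                                  # Sinkhorn^0(S)
    for _ in range(l):
        M = M / M.sum(axis=1, keepdims=True)       # row-wise normalization
        M = M / M.sum(axis=0, keepdims=True)       # column-wise normalization
    return M                                       # approximately doubly stochastic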

Complexity

For Hun., the time complexity is \(O(n^3)\), and the space complexity is \(O(n^2)\). Algorithm 6 describes the procedure of Sink.. The time complexity of Sink. is \(O(ln^2)\) [29], and the space complexity is \(O(n^2)\). In practice, both algorithms require more space than DInf, since they need to store intermediate results.

3.6 Stable Embedding Matching

To consider the interdependence among alignment decisions, the embedding matching process is formulated as the stable matching problem [13] by Zeng et al. [55] and Zhu et al. [60]. It is proved that for any two equally sized sets of members, each of whom ranks the members of the opposing set, there exists a bijection between the two sets such that no pair of members from opposite sets would prefer to be matched to each other rather than to their assigned partners [11]. Specifically, these works first produce the similarity matrix \(\boldsymbol {S}\) based on \(\boldsymbol {E}\) using similarity metrics. Next, they generate the rankings of members in the opposing set according to the pairwise similarity scores. Finally, they use the Gale-Shapley algorithm [40] to solve the stable matching problem. This procedure is denoted as SMat\((\mathcal {E}_s,\mathcal {E}_t,\boldsymbol {E})\).

Complexity

SMat has a time complexity of \(O(n^2\lg n)\) (since for each entity, the ranking of entities on the opposite side needs to be computed) and a space complexity of \(O(n^2)\).
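For illustration, a minimal source-proposing Gale-Shapley sketch is shown below; it assumes \(|\mathcal {E}_s| = |\mathcal {E}_t|\) and that both sides derive their preferences from the similarity matrix \(\boldsymbol {S}\), and it is a simplified stand-in for the SMat procedure.

import numpy as np

def smat(source_ids, target_ids, S):
    n = len(source_ids)
    order = np.argsort(-S, axis=1)        # each source's targets, most similar first
    next_prop = [0] * n                   # next target each source will propose to
    engaged = {}                          # target index -> source index
    free = list(range(n))
    while free:
        i = free.pop()
        j = int(order[i][next_prop[i]])
        next_prop[i] += 1
        if j not in engaged:
            engaged[j] = i                          # target accepts first proposal
        elif S[i, j] > S[engaged[j], j]:
            free.append(engaged[j])                 # target trades up; old partner freed
            engaged[j] = i
        else:
            free.append(i)                          # proposal rejected; source tries again
    return {(source_ids[i], target_ids[j]) for j, i in engaged.items()}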

3.7 RL-Based Embedding Matching

The embedding matching process is cast as a classic sequential decision problem in [56]. Given a sequence of source entities (and their embeddings), the goal is to decide to which target entity each source entity aligns. The work devises a reinforcement learning (RL)-based framework that learns to optimize the decision-making for all entities, rather than optimizing every single decision separately. More details can be found in the original paper [56], and we use RL\((\mathcal {E}_s,\mathcal {E}_t,\boldsymbol {E})\) to denote it.

Complexity

Note that it is difficult to deduce the time complexity for this neural RL model. Instead, we provide the empirical time costs in the experiments. The space complexity is \(O(n^2)\).

4 Main Experiments

In this section, we compare the algorithms for matching KGs in entity embedding spaces on the mainstream EA evaluation setting (1-to-1 alignment).

4.1 EntMatcher: An Open-Source Library

To ensure comparability, we re-implemented all compared algorithms in Python under a unified framework and established an open-source library, EntMatcher. The architecture of the EntMatcher library is presented in the blue block of Fig. 4.3; it takes as input the unified entity embeddings \(\boldsymbol {E}\) and produces the matched entity pairs. It has the following three major features:

Fig. 4.3

Architecture of the EntMatcher library and additional modules required by the experimental evaluation

Loosely Coupled Design

There are three independent modules in EntMatcher, and we have implemented the representative methods in each module. Users are free to combine the techniques in each module to develop new approaches, or to implement their own designs by following the templates in the modules.

Reproduction of Existing Approaches

To support our experimental study, we tried our best to re-implement all existing algorithms by using EntMatcher. For instance, the combination of cosine similarity, CSLS, and Greedy reproduces the CSLS algorithm in Sect. 4.3.3; and the combination of cosine similarity, None, and Hun. reproduces the Hun. algorithm in Sect. 4.3.5. The specific hyper-parameter settings are elaborated in Sect. 4.4.2.

Flexible Integration with Other Modules in EA

EntMatcher is highly flexible and can be directly called during the development of standalone EA approaches. Besides, users may also use EntMatcher as the backbone and call other modules. For instance, to conduct the experimental evaluations in this work, we implemented the representation learning and auxiliary information modules to generate the unified entity embeddings \(\boldsymbol {E}\), as shown in the white blocks of Fig. 4.3. More details are elaborated in the next subsection. Finally, EntMatcher is also compatible with existing open-source EA libraries (that mainly focus on representation learning) such as OpenEA and EAkit.

4.2 Experimental Settings

The current EA evaluation setting assumes that the entities in the source and target KGs are 1-to-1 matched (cf. Sect. 4.2.3). Although this assumption simplifies real-world scenarios, where some entities are unmatchable or might be aligned to multiple entities on the other side, it does reflect the core challenge of EA. Therefore, following the existing literature, we mainly compare the embedding matching algorithms under this setting and postpone the evaluation of the more challenging real-life scenarios to Sect. 4.5.

Datasets

We used popular EA benchmarks for evaluation: (1) DBP15K, which comprises three multilingual KG pairs extracted from DBpedia [1], English to Chinese (\({\mathtt {DBP15K}_{\mathtt {ZH-EN}}}\)), English to Japanese (\({\mathtt {DBP15K}_{\mathtt {JA-EN}}}\)), and English to French (\({\mathtt {DBP15K}_{\mathtt {FR-EN}}}\)); (2) SRPRS, which is a sparser dataset that follows real-life entity distribution, including two multilingual KG pairs extracted from DBpedia, English to French (\({\mathtt {SRPRS}_{\mathtt {EN-FR}}}\)) and English to German (\({\mathtt {SRPRS}_{\mathtt {EN-DE}}}\)), and two mono-lingual KG pairs, DBpedia to Wikidata [46] (\({\mathtt {SRPRS}_{\mathtt {DBP-WD}}}\)) and DBpedia to YAGO [43] (\({\mathtt {SRPRS}_{\mathtt {DBP-YG}}}\)); and (3) DWY100K, which is a larger dataset consisting of two mono-lingual KG pairs: DBpedia to Wikidata (D-W) and DBpedia to YAGO (D-Y). The detailed statistics can be found in Table 4.2, where the numbers of entities, relations, triples, gold links, and the average entity degree are reported. Regarding the gold alignment links, we adopted 70% as test set, 20% for training, and 10% for validation.

Table 4.2 Dataset statistics

Hardware Configuration and Hyper-Parameter Setting

Our experiments were performed on a Linux server that is equipped with an Intel Core i7-4790 CPU running at 3.6GHz, NVIDIA GeForce GTX TITAN X GPU, and 128 GB RAM. We followed the configurations presented in the original papers of these algorithms and tuned the hyper-parameters on the validation set. Specifically, for CSLS, we set k to 1, except on the non-1-to-1 setting where we set it to 5. Similarly, regarding RInf, we changed the maximum operation in Eq. (4.2) to top-k average operation on the non-1-to-1 setting, where k is set to 5. As to Sink., we set l to 100. For RL, we found the hyper-parameters in the original paper could already produce the best results and directly adopted them. The rest of the approaches, i.e., DInf, Hun., and SMat, do not contain hyper-parameters.

Evaluation Metric

We utilized the F1 score as the evaluation metric, which is the harmonic mean of precision and recall, where precision is computed as the number of correct matches divided by the number of matches found by a method, and recall is computed as the number of correct matches found by a method divided by the number of gold matches. Note that recall is equivalent to the Hits@1 metric used in some previous works.
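A minimal sketch of this metric, assuming the predicted and gold matches are given as sets of entity pairs:

def f1_score(predicted, gold):
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0     # equals Hits@1 here
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)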

Representation Learning Models

Since representation learning is not the focus of this work, we adopted two frequently used models, i.e., RREA [30] and GCN [47]. Concretely, GCN is one of the simplest models, which uses graph convolutional networks to learn the structural embeddings, while RREA is one of the best-performing solutions, which leverages relational reflection transformation to obtain relation-specific entity embeddings.

Auxiliary Information for Alignment

Some works leverage auxiliary information in KGs (e.g., entity attributes, descriptions, and pictures) to complement the KG structure. Specifically, this auxiliary information is first encoded into low-dimensional vectors and then fused with the structural embeddings to provide more accurate entity representations for the subsequent embedding matching stage [44]. Although EA underlines the use of the graph structure for alignment [59], for a more comprehensive evaluation, we examined the influence of auxiliary information on the matching results by following previous works and using entity name embeddings to facilitate alignment [34, 59]. We also combined these two channels of information with equal weights to generate the fused similarity matrix for matching.
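For illustration, the equal-weight fusion can be sketched as follows, assuming the structural and name channels each yield an \(|\mathcal {E}_s| \times |\mathcal {E}_t|\) cosine similarity matrix on the same scale:

def fuse_similarities(S_struct, S_name, w=0.5):
    # Equal weights (w = 0.5) combine the structural and name-based channels
    return w * S_struct + (1 - w) * S_name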

Similarity Metric

After obtaining the unified entity representations \(\boldsymbol {E}\), a similarity metric is required to produce pairwise scores and generate the similarity matrix \(\boldsymbol {S}\). Frequent choices include the cosine similarity [6, 30, 45], the Euclidean distance [7, 24], and the Manhattan distance [48, 50]. In this work, we followed mainstream works and adopted the cosine similarity.

4.3 Main Results and Comparison

We first evaluate with only structural information and report the results in Table 4.3, where R- and G- refer to using RREA and GCN to generate the structural embeddings, respectively, and DBP and SRP denote DBP15K and SRPRS, respectively. Next, we supplement the structural information with name embeddings and report the results in Table 4.4, where N- and NR- refer to using only the name embeddings and to fusing the name embeddings with the RREA structural representations, respectively. Note that, on existing datasets, all the entities in the test set can be matched, and all the algorithms are devised to find a target entity for each test source entity. Hence, the number of matches found by a method equals the number of gold matches, and consequently the precision value is equal to the recall value and the F1 score [56].

Table 4.3 The F1 scores of only using structural information
Table 4.4 The F1 scores of using auxiliary information

Overall Performance

First, without delving into the embedding matching algorithms, we analyze the general results. Specifically, using RREA to learn structural representations brings better performance than using GCN, showing that representation learning strategies are crucial to the overall alignment performance. When the entity name information is introduced, we observe that this auxiliary information alone can already provide a very accurate signal for alignment. This is because the equivalent entities in the different KGs of current datasets share very similar or even identical names. After fusing the semantic and structural information, the alignment performance is further lifted, with most of the approaches reaching F1 scores over 0.9.

Effectiveness Comparison of Embedding Matching Algorithms

From the tables, it is evident that: (1) Overall, Hun. and Sink. attain much better results than the other strategies. Specifically, Hun. takes full account of the global matching constraints and strives to reach a globally optimal matching under the objective of maximizing the sum of pairwise similarity scores. Moreover, the 1-to-1 constraint it exerts aligns with the present evaluation setting, where the source and target entities are 1-to-1 matched. Sink., on the other hand, implicitly implements the 1-to-1 constraint during pairwise score computation and still adopts Greedy to produce the final results, where non-1-to-1 matches might exist. (2) DInf attains the worst performance. This is because it directly adopts the similarity scores, which suffer from the hubness and isolation issues [44]. Besides, it leverages Greedy, which merely reaches the local optimum for each entity. (3) The performance of RInf, CSLS, SMat, and RL is closely matched. RInf and CSLS improve upon DInf by mitigating the hubness issue and enhancing the quality of the pairwise scores. SMat and RL, on the other hand, improve upon DInf by modeling the interactions among the matching decisions for different entities.

Furthermore, we conduct a deeper analysis of these approaches and identify the following patterns:

Pattern 1. If the highest pairwise similarity scores of source entities are close, RInf and CSLS (resp., SMat and RL) attain relatively better (resp., worse) performance. Specifically, in Table 4.3, where RInf consistently (and CSLS sometimes) attains superior results to SMat and RL, the average standard deviation (STD) values of the top five pairwise similarity scores of source entities (cf. Fig. 4.4) are very small, unveiling that the top scores are close and difficult to differentiate. In contrast, in Table 4.4, where SMat and RL outperform RInf and CSLS, the corresponding STD values are relatively large. This is because RInf and CSLS aim to make the scores more distinguishable, and hence they are more effective in cases where the top similarity scores are very close (i.e., low STD values). On the contrary, when the top similarity scores are already discriminative (e.g., Table 4.4), RInf and CSLS become less useful, while SMat and RL can still make improvements by using the global constraints to enforce deviation from the local optima.

Fig. 4.4

Statistics of the pairwise similarity scores (i.e., STD of the top five scores)

Pattern 2. On sparser datasets, the superiority of Sink. and Hun. over the rest of the methods becomes less significant. This is based on the observation that on SRPRS, other matching algorithms (RInf in particular) attain much closer performance to Sink. and Hun.. Such a pattern could be attributed to the fact that, on sparser datasets, entities normally have fewer connections with others, i.e., lower average entity degree (in Table 4.2), where representation learning strategies might fail to fully capture the structural signals for alignment and the resultant pairwise scores become less accurate. These inaccurate scores could mislead the matching process and hence limit the effectiveness of the top-performing methods, i.e., Sink. and Hun.. In other words, sparser KG structures are more likely to (partially) break the fundamental assumption on KG structure similarity (cf. Sect. 4.2.3).

Efficiency Analysis

We compare the time and space efficiency of these methods on the medium-sized datasets in Fig. 4.5. Since the costs on KG pairs from the same dataset are very similar, we report the average time and space costs under each setting in the interest of space.

Fig. 4.5

Efficiency comparison. Shapes in blue denote methods that improve pairwise scores, while shapes in black denote those exerting global constraints (except for DInf). (a) Time cost (in seconds). (b) Memory space cost (in GB)

Specifically, we observe that: (1) The simple algorithm DInf is the most efficient approach. (2) Among the advanced approaches, CSLS is the most efficient, closely following DInf. (3) The efficiency of RInf and Hun. is comparable. While Hun. consumes relatively less memory space than RInf, its time efficiency is less stable, and it tends to run slower on datasets with less accurate pairwise scores. (4) The space efficiency of Sink. is close to that of RInf and Hun., whereas it has much higher time costs, which largely depend on the value of l. (5) RL is the least time-efficient approach, while SMat is the least space-efficient algorithm. RL requires more time on datasets with less accurate pairwise scores, where its pre-processing module fails to produce promising results [56]. The memory consumption of SMat is high, as it needs to store a large amount of intermediate matching results. In all, we can conclude that advanced embedding matching algorithms generally require more time and memory space, among which the methods incorporating global matching constraints tend to be the least efficient.

Comparison with DL-Based EM Approaches

We utilize the deepmatcher Python package [33], which provides built-in neural networks and utilities for training and applying state-of-the-art deep learning models for entity matching, to address EA. Specifically, we use the structural and name embeddings, respectively, to replace the attributive text inputs of deepmatcher and then train the neural model with the labeled data. For each positive entity pair, we randomly sample ten negative ones. In the testing stage, for each source entity, we feed the entity pairs formed with all the target entities into the trained classifier and regard the entity pair with the highest predicted score as the result.

In the final results, only a few entities are correctly aligned, showing that DL-based EM approaches cannot handle EA well, which can be ascribed to the insufficient labeled data, the imbalanced class distribution, and the lack of attributive text information, as discussed in Sect. 4.2.2.

4.4 Results on Large-Scale Datasets

Next, we provide the results on the relatively larger dataset, i.e., DWY100K, which also reflect the scalability of these algorithms. The results are presented in Table 4.5. The general pattern is similar to that on G-DBP (i.e., using GCN on DBP15K), where Sink. and Hun. obtain the best results, followed by RInf. The performance of CSLS and RL is close, outperforming DInf by over 20%.

Table 4.5 The F1 scores on DWY100K using GCN

We compare the efficiency of these algorithms in Table 4.5, where \(\bar {T}\) refers to the average time cost and Mem. denotes whether the memory space required by the model can be covered by our experimental environment. We observe that, on larger datasets, most of the performant algorithms have poor efficiency and scalability (e.g., RInf, Sink., and Hun.). Note that in [53], two variants of RInf, i.e., RInf-wr and RInf-pb, are proposed to improve its scalability at the cost of a small performance drop, which is empirically validated in Table 4.5. This also reveals that more scalable matching algorithms for KGs in entity embedding spaces should be devised.

4.5 Analysis and Insights

We provide further experiments and discussions in this subsection.

On Efficiency and Scalability

The simple algorithm DInf is the most efficient and scalable, as it merely involves the most basic computation and matching operations. CSLS is slightly less efficient than DInf due to the update of the pairwise similarity scores. It also has good scalability. Although RInf adopts a similar idea to CSLS, it involves an additional ranking process, which brings much higher time and memory consumption and makes it less scalable. Sink. repeatedly conducts the normalization operation, and thus its time efficiency mainly depends on the value of l. Its scalability is also limited by its memory consumption, since it needs to store intermediate results, as revealed in Table 4.5.

Regarding the methods that exert global constraints, Hun. is efficient on medium-sized datasets, but it is not scalable due to its high time complexity and memory consumption. SMat is space-inefficient even on the medium-sized datasets, which limits its scalability. In comparison, RL has more stable time and space costs and can scale to large datasets, and the main influencing factor is the accuracy of the pairwise scores. This is because RL has a pre-processing step that filters out confident matched entity pairs and excludes them from the time-consuming RL learning process [56]. More confident matched entity pairs are filtered out when the pairwise scores are more accurate.

On Effectiveness of Improving Pairwise Score Computation

We compare and discuss the strategies for improving the pairwise score computation, i.e., CSLS, RInf, and Sink.

Both CSLS and RInf aim to mitigate the hubness and isolation issues in the raw pairwise scores (from different starting points). In particular, we observe that, by setting k (in Eq. (4.1)) of CSLS to 1, the difference between RInf and CSLS is reduced to the extra ranking process of RInf, and the results in Tables 4.3 and 4.4 validate that this ranking process consistently brings better performance. This is because the ranking operation amplifies the differences among the scores and prevents this information from being lost after the bidirectional aggregation [53]. However, it is noteworthy that the ranking process brings much higher time and memory consumption, as can be observed from the empirical results.

We then analyze the influence of the k value in CSLS. As shown in Fig. 4.6, a larger k leads to worse performance. This is because a larger k implies a smaller \(\phi \) value in Eq. (4.1) (where the top-k highest scores are averaged), and the resultant pairwise scores become less distinctive. This also validates the effectiveness of the design in RInf (cf. Eq. (4.2)), where only the maximum value is considered when computing the preference score. Nevertheless, in Sect. 4.5.2, we reveal that setting k to 1 is only useful in the 1-to-1 alignment setting.

Fig. 4.6

F1 scores of CSLS with varying k value. (a) On R-DBP. (b) On G-SRP

As for Sink., it adopts an extreme approach to optimizing the pairwise scores, which encourages each source (resp., target) entity to have only one positive pairwise score with a target (resp., source) entity and scores of 0 with the rest of the target (resp., source) entities. Thus, it in fact progressively and implicitly implements the 1-to-1 alignment constraint during the pairwise score computation as l increases and is particularly useful in the present 1-to-1 evaluation setting of EA. In Fig. 4.7, we further examine the influence of l in Eq. (4.3) on the alignment results of Sink., which meets our expectation that the larger the l value, the better the distribution of the resultant pairwise scores fits the 1-to-1 constraint, and thus the higher the alignment performance. Nevertheless, a larger l also implies a longer processing time. Therefore, by tuning on the validation set, we set l to 100 to balance effectiveness and efficiency.

Fig. 4.7

F1 scores of Sink. with varying l value. (a) On R-DBP. (b) On G-SRP

On Effectiveness of Exerting Global Constraints

Next, we compare and discuss the methods that exert global constraints on the embedding matching process, i.e., Hun., SMat, and RL.

It is evident that Hun. is the most performant approach, as it fits well with the present EA setting and can secure an optimal solution toward maximizing the sum of the pairwise scores. Specifically, the current EA setting rests on two notable assumptions (cf. Sect. 4.2.3). With these two assumptions, EA can be transformed into the linear assignment problem, which aims to maximize the sum of the pairwise scores under the 1-to-1 constraint [29]. Thus, the algorithms for solving the linear assignment problem, e.g., Hun., can attain remarkably high performance on EA. However, these two assumptions do not necessarily hold on all occasions, which could influence the effectiveness of Hun.. For instance, as revealed in Pattern 2, on sparse datasets (e.g., SRPRS), the neighboring structures of some equivalent entities are likely to be different, and the effectiveness of Hun. is limited. In addition, the 1-to-1 alignment constraint does not necessarily hold in practice, which will be discussed in Sect. 4.5.

In comparison, SMat merely aims to attain a stable matching, and the resultant entity pairing can be sub-optimal under the present evaluation setting. RL, on the other hand, relaxes the 1-to-1 constraint and only deviates slightly from greedy matching, and hence its results are not very promising.

Overall Comparison and Conclusion

Finally, we compare the algorithms all together and draw the following conclusions under the 1-to-1 alignment setting: (1) The best performing methods are Hun. and Sink.. Nevertheless, they have low scalability. (2) CSLS and RInf achieve the best balance between effectiveness and efficiency. While CSLS is more efficient, RInf is more effective. (3) SMat and RL tend to attain better results when the accuracy of the pairwise scores is high. Nevertheless, they require relatively more time.

Combining Embedding Matching Algorithms

As described above, CSLS, RInf, and Sink. mainly improve the computation of pairwise scores, while Hun., SMat, and RL exert the global constraints during the matching process. Thus, by using the EntMatcher library, we aim to investigate whether the combination of these strategies would lead to better matching performance.

The results are reported in Table 4.6, where we can observe that: (1) Hun. is already effective given the raw pairwise scores, and using CSLS or RInf to improve the pairwise scores does not change, and can even bring down, the performance. (2) For SMat and RL, using CSLS or RInf to improve the raw pairwise scores consistently leads to better results. (3) Looking from the other side, while applying Hun., SMat, and RL on top of CSLS improves its performance, such additional operations bring down the results of RInf. This is because, by modeling entity preferences and converting them to rankings, RInf has already discarded the information contained in the original pairwise scores, and exerting global constraints on the reciprocal preference scores is no longer beneficial.

Table 4.6 F1 scores of combined embedding matching algorithms

Hence, we can conclude that, generally speaking, combining algorithms designed for different embedding matching stages leads to better alignment results, with two exceptions: Hun., which does not benefit from refined pairwise scores, and RInf, which is harmed by exerting global constraints on top of it.

Alignment Results Analysis

We further analyze the sets of matched entity pairs produced by the compared algorithms. Specifically, we examine the difference between the correct results found by different methods and report the pairwise difference ratios in the heat map of Fig. 4.8. The difference ratio is defined as \(|\mathcal {C}-\mathcal {R}|/|\mathcal {C}|\), where \(\mathcal {C}\) and \(\mathcal {R}\) denote the correctly aligned entity pairs produced by the algorithms in the column and row, respectively.
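In code, the difference ratio between two result sets can be computed as follows (a hypothetical helper, with entity pairs represented as hashable tuples):

```python
def difference_ratio(col_correct, row_correct):
    """|C - R| / |C|: the fraction of correct pairs found by the column
    algorithm (C) that the row algorithm (R) fails to find."""
    c, r = set(col_correct), set(row_correct)
    return len(c - r) / len(c) if c else 0.0
```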

Fig. 4.8

Alignment results analysis on \({\mathtt {DBP15K}_{\mathtt {ZH-EN}}}\) of R-DBP. (a) Heat map of the difference ratios of correct matching results. (b) Proportions of correct results found by RInf, Hun., and SMat

From Fig. 4.8a, we can observe that: (1) The elements in the matrix (except those on the secondary diagonal) are above 0, showing that these matching algorithms produce complementary correct matches. (2) The results of Hun. and Sink., CSLS and RInf, and SMat and RL are similar; that is, the algorithms in each pair produce very similar correct matches (i.e., with low difference ratios and light colors). (3) The columns of Hun., Sink., and RInf have relatively darker colors, revealing that they tend to discover matches that the other methods fail to detect.

We further select three representative methods, i.e., RInf, Hun., and SMat, and provide a more detailed analysis in Fig. 4.8b. It is evident that these approaches do produce complementary results, which calls for an ensemble framework that integrates the alignment results produced by different matching algorithms.

5 New Evaluation Settings

In this section, we conduct experiments on settings that can better reflect real-life challenges.

5.1 Unmatchable Entities

Current EA literature largely overlooks the unmatchable issue, i.e., the case where one KG contains entities that the other KG does not. For instance, when aligning YAGO 4 and IMDB, only 1% of entities in YAGO 4 are film-related and possibly have equivalent entities in IMDB, while the other 99% of entities in YAGO 4 necessarily have no match in IMDB [59]. Hence, we aim to evaluate the embedding matching algorithms in terms of dealing with unmatchable entities.

Datasets and Evaluation Settings

Following [54], we adapt the KG pairs in DBP15K to include unmatchable entities, resulting in DBP15K+. The detailed construction procedure can be found in [54].

As for the evaluation metric, we follow the main experimental setting and adopt the F1 score. Unlike in the 1-to-1 alignment setting, this adapted dataset contains unmatchable entities, and the precision and recall values are not necessarily equal, since some methods also align unmatchable entities.
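For clarity, precision, recall, and F1 over sets of predicted and gold entity pairs can be computed as below (a hypothetical helper; pairs are assumed to be hashable tuples):

```python
def evaluate(pred_pairs, gold_pairs):
    """Precision, recall, and F1 over sets of predicted and gold entity
    pairs. With unmatchable entities, a method may output pairs for
    entities that have no gold counterpart, so precision and recall differ."""
    pred, gold = set(pred_pairs), set(gold_pairs)
    correct = len(pred & gold)
    p = correct / len(pred) if pred else 0.0
    r = correct / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```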

Noteworthily, the original settings of SMat and Hun. require that the numbers of entities on the two sides be equal. Thus, we add dummy nodes to the side with fewer entities to restore this setting and then apply SMat and Hun. The corresponding results are reported in Table 4.7.
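A minimal sketch of this dummy-node trick, assuming the pairwise scores are given as a rectangular matrix and Hun. is realized via SciPy; the dummy score of 0 is an illustrative choice:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_with_dummies(sim, dummy_score=0.0):
    """Pad the smaller side with dummy entities so the score matrix is
    square, run 1-to-1 matching, and discard pairs involving dummies."""
    n_src, n_tgt = sim.shape
    n = max(n_src, n_tgt)
    padded = np.full((n, n), dummy_score)
    padded[:n_src, :n_tgt] = sim
    rows, cols = linear_sum_assignment(padded, maximize=True)
    # keep only matches between real source and real target entities
    return [(r, c) for r, c in zip(rows, cols) if r < n_src and c < n_tgt]
```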

Table 4.7 F1 scores on DBP15K+

Alignment Results

Table 4.7 shows that Hun. attains the best results, followed by SMat. The superior results are partially due to the addition of dummy nodes, which could mitigate the unmatchable issue to a certain degree. The results of RInf and Sink. are close, outperforming CSLS and RL. DInf still achieves the worst performance.

Besides, by comparing the results on DBP15K+ with those on the original dataset DBP15K (cf. Table 4.3), we observe that: (1) After including the unmatchable entities, the F1 scores drop for all methods. This is because most current embedding matching algorithms are greedy, i.e., they retrieve a target entity for each source entity (including the unmatchable ones), which leads to very low precision. For the rest of the methods, e.g., Hun. and SMat, the unmatchable entities also mislead the matching process and thus affect the final results. (2) Unlike on DBP15K, where the performance of Sink. and Hun. is close, on DBP15K+ Hun. largely outperforms Sink., as Hun. does not necessarily align a target entity to each source entity and hence has higher precision. (3) Overall, existing algorithms for matching KGs in entity embedding spaces lack the capability of dealing with unmatchable entities.

5.2 Non-1-to-1 Alignment

Next, we study the setting where the source and target entities do not strictly conform to the 1-to-1 constraint, so as to better assess these algorithms for matching KGs in entity embedding spaces. Non-1-to-1 alignment is common in practice, especially when two KGs contain entities at different granularities, or when one KG is noisy and involves duplicate entities. To the best of our knowledge, this is among the first attempts to identify and investigate this issue.

Dataset Construction

Present EA benchmarks are constructed according to the 1-to-1 constraint. Thus, in this work, we establish a new dataset that involves non-1-to-1 alignment relationships. Specifically, we obtain the pre-annotated linksFootnote 11 between Freebase [3] and DBpedia [1] and preserve the entities that are involved in 1-to-many, many-to-1, and many-to-many alignment relationships. Then, we retrieve the relational triples that contain these entities from the respective KGs, which also introduces new entities.

Next, we detect the links among the newly added entities and add them into the alignment links. Finally, the resultant dataset, FB_DBP_MUL, contains 44,716 entities, 164,882 triples, and 22,117 gold links, among which 20,353 are non-1-to-1 links and 1,764 are 1-to-1 links.Footnote 12 The specific statistics are also presented in Table 4.2.

Evaluation Settings

To keep the integrity of the links among entities, we sample the training, validation, and test sets from the gold links according to the principle that links involving the same entity should not be distributed across different sets. The sizes of the final training, validation, and test sets follow an approximate ratio of 7:1:2. We compare the entity pairs produced by the embedding matching algorithms against the gold test links and report the precision (P), recall (R), and F1 values.
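One possible way to realize such a split is to group the gold links into connected components (so that links sharing an entity end up in the same component) and then distribute whole components across the three sets; the sketch below uses a simple union-find and only illustrates the principle, not the chapter's exact procedure.

```python
import random
from collections import defaultdict

def split_links(gold_links, ratios=(0.7, 0.1, 0.2), seed=0):
    """Split gold links into train/valid/test such that links sharing an
    entity always land in the same split (grouped via union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # tag source/target IDs to keep the two entity spaces disjoint
    for src, tgt in gold_links:
        union(('s', src), ('t', tgt))

    groups = defaultdict(list)
    for src, tgt in gold_links:
        groups[find(('s', src))].append((src, tgt))

    buckets = list(groups.values())
    random.Random(seed).shuffle(buckets)

    train, valid, test, total = [], [], [], len(gold_links)
    for bucket in buckets:
        # fill train, then valid, then test according to the target ratios
        if len(train) < ratios[0] * total:
            train.extend(bucket)
        elif len(valid) < ratios[1] * total:
            valid.extend(bucket)
        else:
            test.extend(bucket)
    return train, valid, test
```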

Alignment Results

It is evident from Table 4.8 that, compared with 1-to-1 alignment, the results change significantly on the new dataset. Specifically: (1) RInf and CSLS attain the best F1 scores, whereas the results are still not very promising (e.g., with F1 scores lower than 0.1 when using GCN). (2) Sink. and Hun. achieve much worse results compared with their performance on the 1-to-1 alignment datasets. (3) The results of SMat and RL are even inferior to those of the simple baseline DInf. The main reason for these changes is that the non-1-to-1 alignment links pose great challenges to existing embedding matching algorithms. Specifically, DInf, CSLS, RInf, Sink., and RL only align the single target entity with the highest score to a given source entity, failing to discover the other alignment links that also involve this source entity. SMat and Hun., in turn, impose the 1-to-1 constraint during the matching process, which falls short in the non-1-to-1 setting, thus leading to inferior results. Therefore, this calls for the study of embedding matching algorithms targeted at non-1-to-1 alignment.

Table 4.8 Results on the non-1-to-1 alignment dataset

Discussion of the k Value in CSLS and RInf

In Fig. 4.9, we report the performance of CSLS and RInf with varying k values on FB_DBP_MUL. It shows that, generally, a larger k leads to better results. This is because, in the non-1-to-1 setting, an entity is likely to be matched to several entities on the other side, where it is more appropriate to consider the top-k values, rather than the sole maximum value, when refining the pairwise scores.

Fig. 4.9

F1 scores with varying k value on FB_DBP_MUL. (a) With GCN. (b) With RREA

6 Summary and Future Direction

In this section, we summarize the observations and insights made from our evaluation and provide possible future research directions.

(1) The Investigation into Matching KGs in Embedding Spaces Has Not Yet Made Substantial Progress

Although there are a few algorithms tailored for matching KGs in embedding spaces, e.g., CSLS, RInf, and RL, under the most popular EA evaluation setting (with the 1-to-1 alignment constraint) they are outperformed by the classic general matching algorithm, i.e., Hun. Hence, there is still much room for improving the matching of KGs in embedding spaces.

(2) No Existing Embedding Matching Algorithm Prevails Under All Experimental Settings

The strategies designed to solve the linear assignment problem attain the best performance under the 1-to-1 setting, while they fall short in more practical and challenging scenarios, since the new settings (e.g., non-1-to-1 alignment) no longer satisfy the assumptions of these optimization algorithms. Similarly, although the methods for improving the computation of pairwise scores achieve superior results in the non-1-to-1 alignment scenario, they are outperformed by other solutions under the unmatchable setting. Therefore, each evaluation setting poses its own challenge to the embedding matching process, and currently there is no consistent winner.

(3) The Adaptation from General Matching Algorithms Requires Careful Design

Among the embedding matching algorithms, Hun. and SMat are general matching algorithms that have been applied to many other related tasks. Although directly adopting these general strategies to tackle EA is simple and effective, they may well fall short in some scenarios, as alignment on KGs poses its own challenges, e.g., the matching is not necessarily 1-to-1 constrained, or the pairwise scores are inaccurate. Thus, it is suggested to take full account of the characteristics of the alignment setting when adapting general matching algorithms to cope with matching KGs in entity embedding spaces.

(4) Scalability and Efficiency Should Be Given More Attention

Existing advanced embedding matching algorithms have poor scalability, due to the additional resource-consuming operations that contribute to the alignment performance, such as the ranking process in RInf and the 1-to-1 constraint exerted by Hun. and SMat. Besides, space efficiency is also a critical issue. As shown in Sect. 4.4.4, most of the approaches have rather high memory costs on large-scale datasets. Therefore, considering that in practice there are many more entities, scalability and efficiency should be considered during algorithm design.

(5) The Practical Evaluation Settings Are Worth Further Investigation

Under the unmatchable and non-1-to-1 alignment settings, the performance of existing algorithms is not promising. A possible future direction is to introduce the notion of probability and leverage the probabilistic reasoning frameworks [19, 39], which have higher flexibility, to produce the alignment results.

7 Conclusion

This paper conducts a comprehensive survey and evaluation of algorithms for matching KGs in entity embedding spaces. We evaluate seven state-of-the-art strategies in terms of effectiveness and efficiency on a wide range of datasets, including two experimental settings that better mirror real-life challenges. We identify the strengths and weaknesses of these algorithms under different settings. We hope the experimental results will be valuable for researchers and practitioners in developing more effective and scalable embedding matching algorithms.