1 Background

In the past few years, there has been a significant increase in the use and development of KGs and their various applications. These KGs store world knowledge as triples (i.e., \(<\)entity, relation, entity\(>\)), where each entity refers to a distinct real-world object and each relation represents a connection between two such objects. Since triples share entities, they are inherently interconnected, forming a large and complex graph of knowledge. There now exist many general KGs (e.g., DBpedia [1], YAGO [52], Google’s Knowledge Vault [14]) and domain-specific KGs (e.g., medical [48] and scientific KGs [56]). KGs have been utilized to improve a wide range of downstream applications, including but not limited to keyword search [64], fact-checking [30], and question answering [12, 28].

A knowledge graph, denoted as \(G = (E, R, T)\), is a graph that consists of three main components: a set of entities E, a set of relations R, and a set of triples T, where \(T \subseteq E \times R \times E\) represents the directed edges in the graph. In the set of triples T, a single triple \((h, r, t)\) represents a relationship between a head entity h and a tail entity t through a specific relation r. Each entity in the graph is identified by a unique identifier, such as http://dbpedia.org/resource/Spain in the case of DBpedia.
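For illustration, the following minimal Python sketch represents a KG by its triple set T, from which the entity set E and relation set R can be derived; the DBpedia-style identifiers are used only as examples.

```python
from typing import NamedTuple, Set

class Triple(NamedTuple):
    head: str      # identifier of the head entity h
    relation: str  # identifier of the relation r
    tail: str      # identifier of the tail entity t

# A KG G = (E, R, T): E and R are induced by the triple set T.
triples: Set[Triple] = {
    Triple("dbr:Spain", "dbo:capital", "dbr:Madrid"),
    Triple("dbr:Madrid", "dbo:country", "dbr:Spain"),
}
entities = {t.head for t in triples} | {t.tail for t in triples}  # E
relations = {t.relation for t in triples}                         # R
```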

In practice, KGs are typically constructed from a single data source, making it difficult to achieve comprehensive coverage of a given domain [46]. To improve the completeness of a KG, one popular strategy is to integrate information from other KGs that may contain supplementary or complementary data. For instance, a general KG may only include basic information about a scientist, while a scientific domain-specific KG may provide additional details such as a biography and a list of publications. To combine knowledge across multiple KGs, a crucial step is to align the equivalent entities in different KGs, a task known as entity alignment (EA) [7, 25].

Given a source KG \(G_1 = (E_1, R_1, T_1)\), a target KG \(G_2 = (E_2, R_2, T_2)\), and a set of seed entity pairs (the training set) \(S = \{(u,v) \mid u\in E_1, v\in E_2, u \leftrightarrow v\}\), where \(\leftrightarrow \) denotes equivalence (i.e., u and v refer to the same real-world object), the task of EA is to discover the remaining equivalent entity pairs, i.e., those in the test set.
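In the benchmarks discussed later (Sect. 3), the seed pairs are typically a fixed fraction (e.g., 30%) of the gold-standard pairs, with the rest held out for testing. A minimal sketch of this split (the function name and the fixed random seed are ours, chosen purely for illustration):

```python
import random
from typing import List, Set, Tuple

def train_test_split(gold_pairs: List[Tuple[str, str]], seed_ratio: float = 0.3
                     ) -> Tuple[Set[Tuple[str, str]], Set[Tuple[str, str]]]:
    """Split gold equivalent pairs into seed (training) pairs S and test pairs."""
    pairs = gold_pairs[:]             # copy so the input list is left untouched
    random.Random(42).shuffle(pairs)  # fixed seed for reproducibility
    cut = int(len(pairs) * seed_ratio)
    return set(pairs[:cut]), set(pairs[cut:])
```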


Fig. 1.1 An example of EA: two mirrored KGs whose entities have English and Spanish names, respectively. Entity identifiers are placed in square brackets; the prefixes of entity identifiers and the full relation identifiers are omitted for clarity. Seed entity pairs are connected by dashed lines

Example Figure 1.1 shows a partial English KG (KG\({ }_{\text{EN}}\)) and a partial Spanish KG (KG\({ }_{\text{ES}}\)) concerning the director Alfonso Cuarón. Note that each entity in a KG has a unique identifier; for example, the movie “Roma” in the source KG is uniquely identified by Roma(film). Given the seed entity pair, i.e., Mexico from KG\({ }_{\text{EN}}\) and Mexico from KG\({ }_{\text{ES}}\), EA aims to find the equivalent entity pairs in the test set, e.g., returning Roma(ciudad) in KG\({ }_{\text{ES}}\) as the target entity corresponding to the source entity Roma(city) in KG\({ }_{\text{EN}}\).

Broadly speaking, current EA methods address the problem by assuming that equivalent entities in different KGs share similar local structures and by applying representation learning techniques to embed entities as data points in a low-dimensional feature space. With effective entity embeddings, the pairwise dissimilarity of two entities can be computed as the distance between their data points, which allows us to judge whether the entities match.
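For illustration, the sketch below performs the nearest-neighbor matching step that this paradigm implies; the random vectors merely stand in for the output of a real representation learning model, and Euclidean distance is one of several plausible choices of dissimilarity.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for learned embeddings: one row per entity.
src_emb = rng.normal(size=(1000, 128))  # embeddings of source-KG entities
tgt_emb = rng.normal(size=(1200, 128))  # embeddings of target-KG entities

def predict_counterparts(src_emb: np.ndarray, tgt_emb: np.ndarray) -> np.ndarray:
    """For each source entity, return the index of the nearest target entity
    by Euclidean distance (smaller distance = more likely a match)."""
    # Pairwise squared distances via ||a-b||^2 = ||a||^2 + ||b||^2 - 2ab.
    dist = (
        (src_emb ** 2).sum(axis=1, keepdims=True)
        + (tgt_emb ** 2).sum(axis=1)
        - 2.0 * src_emb @ tgt_emb.T
    )
    return dist.argmin(axis=1)

predictions = predict_counterparts(src_emb, tgt_emb)  # shape: (1000,)
```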

2 Related Work

While the problem of EA was introduced only a few years ago, the more generic version of the problem, namely identifying entity records that refer to the same real-world entity across different data sources, has been investigated from various angles by different communities, under the names of entity resolution (ER) [15, 18, 45], entity matching [13, 42], record linkage [8, 34], deduplication [16], instance/ontology matching [20, 35, 49, 50, 51], link discovery [43, 44], and entity linking/entity disambiguation [11, 29]. Next, we describe the related work and the scope of this book.

2.1 Entity Linking

Entity linking (EL), also called entity disambiguation, is the task of recognizing entity mentions in natural language text and linking them to the corresponding entities in a given reference catalog, which is usually a knowledge graph. For example, given the mention “Rome,” the task is to determine whether it refers to the city in Italy, a movie, or another entity, and to link it to the right entity in the reference catalog. Prior studies in EL [21, 22, 29, 36, 68] have used various sources of information to disambiguate entity mentions, including surrounding words, prior probabilities of certain target entities, already disambiguated entity mentions, and background knowledge from sources such as Wikipedia. However, much of this information, such as the textual context of a mention or the prior probability of a target entity given a mention, is not available when aligning KGs. Moreover, EL is concerned with mapping natural language text to a KG, while this book investigates the mapping of entities between two KGs.

2.2 Entity Resolution

Entity resolution, also referred to as entity matching, deduplication, or record linkage, assumes that the input is relational data, where each data object typically carries a large amount of textual information spread across multiple attributes. Therefore, various similarity or distance functions are used to measure the similarity between two objects, e.g., the Jaro-Winkler distance for comparing names and numerical distance for comparing dates. Based on these similarity measures, both rule-based and machine learning-based methods can be employed to classify two objects as matching or non-matching [9].

To clarify further, in ER tasks, the attributes of data objects are first aligned, which can be done manually or automatically. Then, the similarity or distance functions are used to calculate the similarities between corresponding attribute values of the two objects. Finally, the similarity scores between the aligned attributes are combined or aggregated to determine the overall similarity between the two objects. This process allows rule-based or machine learning-based methods to classify pairs of objects as either matching or non-matching, based on the computed similarity scores [32, 45].
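As a minimal sketch of this pipeline (using Python’s standard-library SequenceMatcher as a stand-in for a production string metric such as Jaro-Winkler, with attribute weights and the decision threshold chosen purely for illustration):

```python
from difflib import SequenceMatcher

def name_sim(a: str, b: str) -> float:
    """String similarity in [0, 1]; a real system might use Jaro-Winkler instead."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def year_sim(y1: int, y2: int, scale: float = 10.0) -> float:
    """Numerical similarity in [0, 1] that decays with the absolute difference."""
    return max(0.0, 1.0 - abs(y1 - y2) / scale)

def match(rec1: dict, rec2: dict, threshold: float = 0.8) -> bool:
    # Step 1: attributes are assumed to be already aligned (name<->name, year<->year).
    # Step 2: per-attribute similarity scores.
    sims = {
        "name": name_sim(rec1["name"], rec2["name"]),
        "year": year_sim(rec1["birth_year"], rec2["birth_year"]),
    }
    # Step 3: aggregate with illustrative weights; a learned model could replace this rule.
    score = 0.7 * sims["name"] + 0.3 * sims["year"]
    return score >= threshold

print(match({"name": "Alfonso Cuaron", "birth_year": 1961},
            {"name": "Alfonso Cuarón", "birth_year": 1961}))  # True
```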

2.3 Entity Resolution on KGs

Certain ER methods are designed specifically for KGs and focus solely on binary relations, i.e., graph-shaped data; they are sometimes called instance/ontology matching approaches [49, 50]. Graph-shaped data comes with its own challenges: (1) Entities in graph-shaped data often lack detailed textual descriptions and may be represented only by a name with a minimal amount of accompanying information. (2) Unlike classical databases, which assume that all fields of a record are present, KGs are built on the Open World Assumption: the absence of an attribute of an entity in the KG does not necessarily mean that it does not exist in reality. (3) KGs have their own predefined semantics. At a basic level, these can take the form of a taxonomy of classes; in more complex cases, KGs can be endowed with an ontology of logical axioms.

In the past 20 years, various techniques have been developed to address the specific challenges of KGs, particularly in the context of the Semantic Web and the Linked Open Data cloud [26]. These techniques can be categorized along several different dimensions:

  • Scope. Alignment techniques target different parts of a KG: some align the entities of two KGs, others align the relation names (the schema) between KGs, and still others align the class taxonomies of two KGs; a few techniques tackle all three tasks at once. This book focuses on the first task, aligning entities in KGs.

  • Background knowledge. Certain techniques rely on an ontology (T-box) as background knowledge, particularly those that participate in the Ontology Alignment Evaluation Initiative (OAEI). This book, however, focuses on techniques that do not require such prior knowledge and can operate without an ontology.

  • Training. Some techniques for aligning knowledge graphs are unsupervised and operate directly on the input data, without any training data or training phase; examples include PARIS [51] and SiGMa [35]. Other approaches learn mappings between entities from predefined seeds. This book focuses on the latter class of approaches.

Most supervised or semi-supervised approaches to entity alignment build on recent advances in deep learning [23]. These approaches primarily rely on graph representation learning techniques to model the structure of knowledge graphs and to generate entity embeddings for alignment. We use the term “entity alignment (EA) approaches” to refer to these supervised or semi-supervised approaches, which are the main focus of this book. In the next chapter, however, we include PARIS [51] for comparison as a representative of the unsupervised approaches, as well as AgreementMakerLight (AML) [17] as a representative of unsupervised systems that use background knowledge. For other systems, we refer the reader to existing surveys [9, 33, 41, 43].

In addition, since EA pursues the same goal as ER, it can be deemed a special but nontrivial case of ER. In this light, general ER approaches can be adapted to the problem of EA, and we include representative ER methods for comparison (to be detailed in Chap. 2).

Existing Benchmarks

To assess the effectiveness of EA methods, several synthetic datasets, such as DBP15K and DWY100K, were created using the inter-language and reference links already present in DBpedia. Chapter 2 contains more extensive statistics about these datasets.

Notably, the OAEI promotes a knowledge graph track. While existing EA benchmarks provide only instance-level information, the KGs in that track include both schema and instance information, which would lead to an unfair evaluation of current EA approaches that do not assume the availability of ontology information. Hence, these datasets are not used in this book.

3 Evaluation Settings

This section provides an introduction to the evaluation settings that are commonly used for the EA task.

Datasets

Three representative datasets are commonly used:

  • DBP15K [53]. This dataset comprises three pairs of multilingual KGs extracted from DBpedia: Chinese-English (\({\mathtt {DBP15K}_{\mathtt {ZH-EN}}}\)), Japanese-English (\({\mathtt {DBP15K}_{\mathtt {JA-EN}}}\)), and French-English (\({\mathtt {DBP15K}_{\mathtt {FR-EN}}}\)). Each KG pair comes with 15,000 inter-language links, which serve as gold standards.

  • DWY100K [54]. The dataset consists of two pairs of mono-lingual knowledge graphs, namely, \({\mathtt {DWY100K}_{\mathtt {DBP-WD}}}\) and \({\mathtt {DWY100K}_{\mathtt {DBP-YG}}}\). These pairs were extracted from DBpedia, Wikidata, and YAGO 3, and each one contains 100,000 pairs of entities. The extraction process is similar to that of DBP15K, except that the inter-language links have been replaced with reference links that connect these knowledge graphs.

  • SRPRS. Guo et al. [24] observed that the KGs in previous EA datasets, such as DBP15K and DWY100K, are overly dense and do not accurately reflect the degree distributions observed in real-life KGs. In response, they developed a new EA benchmark that uses reference links in DBpedia to establish KGs whose degree distributions better reflect real-life situations. The resulting benchmark includes both cross-lingual (\({\mathtt {SRPRS}_{\mathtt {EN-FR}}}\), \({\mathtt {SRPRS}_{\mathtt {EN-DE}}}\)) and mono-lingual KG pairs (\({\mathtt {SRPRS}_{\mathtt {DBP-WD}}}\), \({\mathtt {SRPRS}_{\mathtt {DBP-YG}}}\)), where EN, FR, DE, DBP, WD, and YG denote DBpedia (English), DBpedia (French), DBpedia (German), DBpedia, Wikidata, and YAGO 3, respectively. Each KG pair comprises 15,000 pairs of entities.

Table 1.1 provides a summary of the datasets used in this study. Each KG pair includes relational triples, cross-KG entity pairs (30% of which are seed entity pairs and used for training), and attribute triples. The cross-KG entity pairs serve as gold standards.

Table 1.1 Statistics of EA benchmarks and our constructed dataset

Degree Distribution

Figure 1.2 presents the degree distributions of entities in the datasets, which provides insights into the characteristics of these datasets. The degree of an entity is defined as the number of triples in which the entity is involved. Entities with higher degrees tend to have richer neighboring structures. The degree distributions of the different KG pairs in each dataset are very similar. Thus, for brevity, we present only one KG pair’s degree distribution in Fig. 1.2.
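Such degree statistics can be computed directly from the triple set, for example as in the following straightforward sketch, where both the head and the tail of a triple contribute to the respective entity’s degree:

```python
from collections import Counter
from typing import Iterable, Tuple

def degree_distribution(triples: Iterable[Tuple[str, str, str]]) -> Counter:
    """Count, for each degree value d, how many entities occur in exactly d triples."""
    entity_degree: Counter = Counter()
    for head, _relation, tail in triples:
        entity_degree[head] += 1
        entity_degree[tail] += 1
    # Histogram: degree value -> number of entities with that degree.
    return Counter(entity_degree.values())

triples = [("a", "r1", "b"), ("a", "r2", "c"), ("b", "r1", "c")]
print(sorted(degree_distribution(triples).items()))  # [(2, 3)]: all 3 entities have degree 2
```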

Fig. 1.2 Degree distributions on different datasets. The X-axis denotes entity degree. The left Y-axis represents the number of entities (corresponding to bars), while the right Y-axis represents the percentage of entities with a degree lower than a given x value (corresponding to lines). Panels: (a-1, a-2) DBP15K (ZH, EN); (b-1, b-2) DWY100K (DBP, WD); (c-1, c-2) SRPRS (EN, FR); (d-1, d-2) our dataset (DBP, FB)

The sub-figures in series (a) correspond to the DBP15K dataset. As shown, entities with a degree of 1 comprise the largest proportion, while the number of entities generally decreases with increasing degree values, with some fluctuations. It is worth noting that the coverage curve approximates a straight line, as the number of entities changes only slightly when the degree increases from 2 to 10.

The (b) set of figures is related to DWY100K. This dataset has a distinct structure from (a), as there are no entities with a degree of 1 or 2. Additionally, the number of entities reaches its highest point at degree 4 and then decreases as the entity degree increases.

The (c) set of figures is related to SRPRS. It is clear that the degree distribution of entities in this dataset is more realistic, with entities of lower degrees making up a larger proportion. This is due to its well-thought-out sampling approach. Additionally, the (d) set of figures corresponds to the dataset we created, which will be discussed in Chap. 2.

Evaluation Metrics

Most existing EA solutions use Hits@k (\(k=1, 10\)) and mean reciprocal rank (MRR) as their evaluation metrics. To make a prediction, the target entities are ranked in ascending order of their distance scores to the source entity. Hits@k measures the proportion of source entities whose ground-truth target appears among the k nearest target entities; Hits@1 is therefore the most direct measure of alignment accuracy.

MRR denotes the average of the reciprocal ranks of the ground truths. Note that higher Hits@k and MRR indicate better performance. Unless otherwise specified, the results of Hits@k are represented in percentages.
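For concreteness, given the 1-indexed rank of each ground-truth target in the sorted candidate list, both metrics can be computed as in the following sketch (the rank values are illustrative):

```python
from typing import Sequence

def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of test pairs whose ground-truth target is ranked within the top k."""
    return sum(r <= k for r in ranks) / len(ranks)

def mrr(ranks: Sequence[int]) -> float:
    """Mean of the reciprocal ranks of the ground truths."""
    return sum(1.0 / r for r in ranks) / len(ranks)

ranks = [1, 3, 2, 1, 15]  # illustrative 1-indexed ranks of the ground truths
print(f"Hits@1  = {hits_at_k(ranks, 1):.1%}")   # 40.0%
print(f"Hits@10 = {hits_at_k(ranks, 10):.1%}")  # 80.0%
print(f"MRR     = {mrr(ranks):.3f}")            # (1 + 1/3 + 1/2 + 1 + 1/15) / 5 = 0.580
```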