Synonyms
Entity matching; Object deduplication; Record linkage; Reference reconciliation
Definition
Let \( \mathcal {E} \) denote a set of entities in a domain, described using a set of attributes \( {\mathcal {A}} \). Each entity \( E \in {\mathcal {E}} \) is associated with zero, one, or more values for each attribute \( A \in {\mathcal {A}} \). For each entity in \( {\mathcal {E}} \), there can be a set of records \( \mathcal {R} \), provided by one or more sources over the attributes \( {\mathcal {A}} \), where each record provides at most one value for an attribute. We consider atomic values (string, number, date, time, etc.) as attribute values, and allow multiple representations of the same value, as well as erroneous values, in records. Entity resolution takes as input the records provided by the sources and decides which records refer to the same entity; in particular, it computes a partitioning \( {\mathcal {P}} \) of \( {\mathcal {R}} \), such that records in each partition...
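As an illustration of the input and output described above, the following is a minimal sketch rather than a method prescribed by this entry. It assumes that records are dictionaries holding at most one atomic value per attribute, that values are compared with a simple string similarity, and that two records are linked when their average similarity reaches a threshold; the partitioning is then taken as the connected components of the linked pairs. The function names, the threshold of 0.8, and the sample records are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def value_similarity(a, b):
    """Similarity in [0, 1] between two atomic values, compared as normalized strings."""
    a, b = str(a).strip().lower(), str(b).strip().lower()
    return SequenceMatcher(None, a, b).ratio()


def record_similarity(r1, r2):
    """Average value similarity over the attributes that both records provide."""
    shared = set(r1) & set(r2)
    if not shared:
        return 0.0
    return sum(value_similarity(r1[a], r2[a]) for a in shared) / len(shared)


def resolve(records, threshold=0.8):
    """Partition record indices: link similar pairs, then take connected components."""
    parent = list(range(len(records)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Assumption: a fixed threshold decides whether two records match.
    for i, j in combinations(range(len(records)), 2):
        if record_similarity(records[i], records[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical records: different representations and a misspelling of one name.
records = [
    {"name": "Jane Smith", "city": "New York"},
    {"name": "jane smith", "city": "New York "},
    {"name": "Jane Smyth", "city": "New York"},
    {"name": "Robert Brown", "city": "Chicago"},
]

print(resolve(records))  # [[0, 1, 2], [3]]
```

The all-pairs comparison is quadratic in the number of records, and taking connected components can merge records that are only linked through a chain of near matches; both are simplifications made to keep the sketch short.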
Recommended Reading
Newcombe HB, Kennedy JM, Axford SJ, James AP. Automatic linkage of vital records. Science. 1959;130(3381):954–59.
Fellegi IP, Sunter AB. A theory for record linkage. J Am Stat Assoc. 1969;64(328):1183–210.
Cohen WW. Integration of heterogeneous databases without common domains using queries based on textual similarity. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 1998. p. 201–12.
Cohen WW, Ravikumar P, Fienberg SE. A comparison of string distance metrics for name-matching tasks. In: Proceedings of the 3rd International Workshop on Information Integration on the Web; 2003. p. 73–8.
Hernandez MA, Stolfo SJ. The merge/purge problem for large databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 1995. p. 127–38.
Winkler WE. Using the EM algorithm for weight computation in the Fellegi-Sunter model of record linkage. In: Proceedings of the Section on Survey Research Methods; 1988. p. 667–71.
Sarawagi S, Bhamidipaty A. Interactive deduplication using active learning. In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2002. p. 269–78.
Dey D. Entity matching in heterogeneous databases: a logistic regression approach. Decis Support Syst. 2008;44(3):740–47.
Hassanzadeh O, Chiang F, Miller RJ, Lee HC. Framework for evaluating clustering algorithms in duplicate detection. Proc. VLDB Endowment. 2009;2(1):1282–293.
Dong X, Halevy AY, Madhavan J. Reference reconciliation in complex information spaces. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2005. p. 85–96.
Chaudhuri S, Sarma AD, Ganti V, Kaushik R. Leveraging aggregate constraints for deduplication. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2007. p. 437–48.
Guo S, Dong XL, Srivastava D, Zajac R. Record linkage with uniqueness constraints and erroneous values. Proc. VLDB Endowment. 2010;3(1):417–28.
Li P, Dong XL, Maurino A, Srivastava D. Linking temporal records. Proc. VLDB Endowment. 2011;4(11):956–67.
McCallum AK, Nigam K, Ungar LH. Efficient clustering of high-dimensional data sets with application to reference matching. In: Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2000. p. 169–78.
Kolb L, Thor A, Rahm E. Load balancing for MapReduce-based entity resolution. In: Proceedings of the 28th International Conference on Data Engineering; 2012. p. 618–29.
Gruenheid A, Dong XL, Srivastava D. Incremental record linkage. Proc. VLDB Endowment. 2014;7(9):697–708.
Fan W, Jia X, Li J, Ma S. Reasoning about record matching rules. Proc. VLDB Endowment. 2009;2(1):407–18.
Bansal N, Blum A, Chawla S. Correlation clustering. In: Proceedings of the 19th International Conference on Machine Learning; 2002. p. 238–47.
Baxter R, Christen P, Churches T. A comparison of fast blocking methods for record linkage. In: Proceedings of the ACM SIGKDD Workshop on Data Cleaning, Record Linkage, and Object Consolidation; 2003. p. 253–68.
Köpcke H, Thor A, Rahm E. Evaluation of entity resolution approaches on real-world match problems. Proc. VLDB Endowment. 2010;3(1):484–93.
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Dong, X.L., Srivastava, D. (2018). Entity Resolution. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_2547
DOI: https://doi.org/10.1007/978-1-4614-8265-9_2547
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering