Record Linkage

Christen, Peter; Winkler, William E.

doi:10.1007/978-1-4899-7502-7_712-1


Abstract

Many data mining and machine learning projects require information from various data sources to be integrated and linked before they can be used for further analysis. A crucial task of such data integration is to identify which records refer to the same real-world entities across databases when no common entity identifiers are available and when records can contain errors and variations. This process of record linkage therefore has to rely upon the attributes that are available in the databases to be linked. For databases that contain personal information, for example, of customers, taxpayers, or patients, these are commonly their names, addresses, phone numbers, and dates of birth.

To improve the scalability of the linkage process, blocking or indexing techniques are commonly applied to limit the comparison of records to pairs or groups that likely correspond to the same entity. Records are compared using a variety of comparison functions, most commonly approximate string comparators that account for typographical errors and variations in textual attributes. The compared records are then classified into matches, non-matches, and potential matches, depending upon the decision model used. If training data in the form of true matches and non-matches are available, supervised classification techniques can be employed. However, in many practical record linkage applications, no ground truth data are available, and therefore unsupervised approaches are required. An approach known as probabilistic record linkage is commonly employed. In this article we provide an overview of record linkage with an emphasis on the classification aspects of this process.
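The three steps named in the abstract (blocking, approximate comparison, and classification into matches, non-matches, and potential matches) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy records, the postcode blocking key, the m- and u-probabilities, and both thresholds are assumptions made up for illustration. The standard-library `difflib.SequenceMatcher` stands in for the approximate string comparators used in practice, and the summed log-ratio weights follow the general style of the Fellegi-Sunter formulation of probabilistic record linkage.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from math import log2

# Hypothetical toy records: (record id, first name, surname, postcode).
DB_A = [
    ("a1", "peter", "christen", "2600"),
    ("a2", "mary", "smith", "2000"),
]
DB_B = [
    ("b1", "pete", "christen", "2600"),
    ("b2", "mary", "smyth", "2000"),
    ("b3", "john", "miller", "2600"),
]

# Assumed per-attribute probabilities (illustrative values, not estimated):
# m = P(attribute agrees | true match), u = P(attribute agrees | non-match).
M_U = {1: (0.90, 0.05), 2: (0.95, 0.01)}  # tuple position -> (m, u)


def block_key(rec):
    """Blocking key: only records sharing a postcode are compared."""
    return rec[3]


def candidate_pairs(db_a, db_b):
    """Yield record pairs that fall into the same block."""
    index = defaultdict(list)
    for rec_b in db_b:
        index[block_key(rec_b)].append(rec_b)
    for rec_a in db_a:
        for rec_b in index.get(block_key(rec_a), []):
            yield rec_a, rec_b


def agrees(s, t, threshold=0.85):
    """Approximate agreement; SequenceMatcher is a stand-in comparator."""
    return SequenceMatcher(None, s, t).ratio() >= threshold


def match_weight(rec_a, rec_b):
    """Sum log2 agreement/disagreement weights over the compared attributes."""
    weight = 0.0
    for pos, (m, u) in M_U.items():
        if agrees(rec_a[pos], rec_b[pos]):
            weight += log2(m / u)
        else:
            weight += log2((1 - m) / (1 - u))
    return weight


def classify(weight, lower=-3.0, upper=8.0):
    """Two assumed thresholds split pairs into the three classes."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "potential match"


def link(db_a, db_b):
    """Return {(id_a, id_b): class} for all candidate pairs."""
    return {
        (a[0], b[0]): classify(match_weight(a, b))
        for a, b in candidate_pairs(db_a, db_b)
    }


if __name__ == "__main__":
    for pair, label in link(DB_A, DB_B).items():
        print(pair, label)
```

With these toy inputs, blocking reduces the six possible cross-database pairs to three candidates. In a real application the m- and u-probabilities would be estimated from the data or from training pairs rather than fixed by hand, and pairs classified as potential matches would typically go to manual review.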

Author information

Authors and Affiliations

Research School of Computer Science, The Australian National University, Canberra, Australia
Peter Christen
US Census Bureau, Suitland, MD, USA
William E. Winkler

Corresponding author

Correspondence to Peter Christen.

Editor information

Editors and Affiliations

School of Computer Science & Engineering (CSE), University of New South Wales, Sydney, New South Wales, Australia
Claude Sammut
School of Computer Science & Software Engineering, Monash University, Melbourne, Victoria, Australia
Geoffrey I. Webb

Cite this entry

Christen, P., Winkler, W.E. (2016). Record Linkage. In: Sammut, C., Webb, G. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7502-7_712-1

DOI: https://doi.org/10.1007/978-1-4899-7502-7_712-1
Received: 05 May 2016
Accepted: 05 May 2016
Published: 17 June 2016
Publisher Name: Springer, Boston, MA
Online ISBN: 978-1-4899-7502-7
eBook Packages: Springer Reference Computer Sciences, Reference Module Computer Science and Engineering

Chapter history

Latest: Record Linkage
Published: 25 March 2023
DOI: https://doi.org/10.1007/978-1-4899-7502-7_712-2

Original: Record Linkage
Published: 17 June 2016
DOI: https://doi.org/10.1007/978-1-4899-7502-7_712-1
