Definition/Introduction
Probabilistic matching differs from the simplest data matching technique, deterministic matching. In deterministic matching, two records are said to match if one or more identifiers are identical. Deterministic record linkage is a good option when the records in the data sets share common identifiers and those identifiers are of relatively high quality. Probabilistic matching is a statistical approach that measures the probability that two records represent the same subject or individual, based on whether they agree or disagree on the various identifiers (Dusetzina et al. 2014).
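A minimal sketch of the deterministic rule, assuming records are Python dictionaries and reading "one or more identifiers" as a caller-chosen set of identifiers that must all agree exactly. The function name `deterministic_match` and the field names (`id_number`, `last_name`, `birth_date`) are hypothetical. The second call shows how a one-letter spelling variant defeats an exact comparison.

```python
from typing import Mapping, Sequence

Record = Mapping[str, str]


def deterministic_match(a: Record, b: Record, keys: Sequence[str]) -> bool:
    """True when every identifier named in `keys` is present and identical in both records."""
    if not keys:
        raise ValueError("at least one identifier is required")
    for key in keys:
        va, vb = a.get(key), b.get(key)
        # A missing identifier cannot confirm a match, so treat it as a failure.
        if va is None or vb is None or va != vb:
            return False
    return True


# Hypothetical records; field names and values are illustrative only.
r1 = {"id_number": "A1001", "last_name": "Smith", "birth_date": "1980-01-02"}
r2 = {"id_number": "A1001", "last_name": "Smyth", "birth_date": "1980-01-02"}

print(deterministic_match(r1, r2, ["id_number"]))               # True
print(deterministic_match(r1, r2, ["id_number", "last_name"]))  # False ("Smith" != "Smyth")
```

The rule is only as reliable as the identifiers chosen: with a clean, shared key it works well, while typos or missing values in that key produce missed matches.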
Probabilistic matching calculates composite linkage weights from likeness scores for identifier values and uses thresholds to classify each record pair as a match, a nonmatch, or a possible match. The quality of the resulting matches can depend on one's confidence in how the matching rules are specified (Zhang and Stevens 2012). The approach is designed to use a wider set of data elements and all available identifiers for matching...
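The weighting and threshold steps can be sketched in the spirit of Fellegi and Sunter (1969), listed under Further Readings. This is an illustration under stated assumptions, not the procedure of any cited author: the per-field m and u probabilities, the 0.85 likeness cutoff, the field names, and the two thresholds are all hypothetical, and the ratio from Python's `difflib.SequenceMatcher` stands in for whatever likeness score a production system would use. Each field adds log2(m/u) to the composite weight when it agrees and log2((1 - m)/(1 - u)) when it disagrees.

```python
import math
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Mapping, Optional


@dataclass(frozen=True)
class FieldParams:
    m: float  # assumed P(field agrees | the pair is a true match)
    u: float  # assumed P(field agrees | the pair is not a match)


def likeness(x: str, y: str) -> float:
    """Similarity in [0, 1]; difflib's ratio stands in for a likeness score."""
    return SequenceMatcher(None, x.strip().lower(), y.strip().lower()).ratio()


def field_weight(agrees: bool, p: FieldParams) -> float:
    """Fellegi-Sunter style log2 likelihood-ratio weight for one field."""
    if agrees:
        return math.log2(p.m / p.u)          # positive: evidence for a match
    return math.log2((1 - p.m) / (1 - p.u))  # negative: evidence against


def composite_weight(a: Mapping[str, str], b: Mapping[str, str],
                     params: Mapping[str, FieldParams],
                     cutoff: float = 0.85) -> float:
    """Sum per-field weights; a field 'agrees' when its likeness >= cutoff."""
    total = 0.0
    for name, p in params.items():
        va: Optional[str] = a.get(name)
        vb: Optional[str] = b.get(name)
        if not va or not vb:
            continue  # missing value: no evidence either way
        total += field_weight(likeness(va, vb) >= cutoff, p)
    return total


def classify(weight: float, lower: float, upper: float) -> str:
    """Apply the two thresholds to label the record pair."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "nonmatch"
    return "possible match"  # typically routed to clerical review


# Hypothetical m/u values and thresholds, for illustration only.
PARAMS = {
    "first_name": FieldParams(m=0.90, u=0.05),
    "last_name": FieldParams(m=0.95, u=0.01),
    "birth_date": FieldParams(m=0.97, u=0.003),
}
LOWER, UPPER = 0.0, 10.0

base = {"first_name": "Maria", "last_name": "Garcia", "birth_date": "1975-03-14"}
pairs = {
    "case and typo": {"first_name": "maria", "last_name": "Garcia", "birth_date": "1975-03-41"},
    "surname differs": {"first_name": "Maria", "last_name": "Lopez", "birth_date": "1975-03-14"},
    "two fields differ": {"first_name": "Mario", "last_name": "Garcia", "birth_date": "1982-11-30"},
}
for label, other in pairs.items():
    w = composite_weight(base, other, PARAMS)
    print(f"{label}: {w:.2f} -> {classify(w, LOWER, UPPER)}")
```

With these made-up settings, the first pair is classified as a match despite the case difference and the transposed digits, the second falls between the thresholds as a possible match, and the third is a nonmatch. Widening the gap between `LOWER` and `UPPER` sends more pairs to manual review; in practice m and u are usually estimated from the data (for example with the EM algorithm) rather than fixed by hand.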
Further Readings
Dusetzina, S. B., Tyree, S., Meyer, A. M., et al. (2014). Linking data for health services research: A framework and instructional guide. Rockville: Agency for Healthcare Research and Quality (US).
Fellegi, I. P., & Sunter, A. B. (1969). A theory for record linkage. Journal of the American Statistical Association, 64, 1183–1210.
Schumacher, S. (2007). Probabilistic versus deterministic data matching: Making an accurate decision. Information Management Special Reports. Washington, DC: The Office of the National Coordinator for Health Information Technology (ONC).
Winkler, W. E. (1999). The state of record linkage and current research problems. Washington, DC: Statistical Research Division, US Census Bureau.
Zhang, T., & Stevens, D. W. (2012). Integrated data system person identification: Accuracy requirements and methods. https://ssrn.com/abstract=2512590; https://doi.org/10.2139/ssrn.2512590.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this entry
Cite this entry
Zhang, T. (2018). Probabilistic Matching. In: Schintler, L., McNeely, C. (eds) Encyclopedia of Big Data. Springer, Cham. https://doi.org/10.1007/978-3-319-32001-4_501-1
DOI: https://doi.org/10.1007/978-3-319-32001-4_501-1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32001-4
Online ISBN: 978-3-319-32001-4
eBook Packages: Springer Reference Business and Management; Reference Module Humanities and Social Sciences; Reference Module Business, Economics and Social Sciences