Encyclopedia of Big Data

Living Edition
Editors: Laurie A. Schintler, Connie L. McNeely

Probabilistic Matching

Living reference work entry
DOI: https://doi.org/10.1007/978-3-319-32001-4_501-1


Probabilistic matching differs from the simplest data matching technique, deterministic matching. In deterministic matching, two records are said to match if one or more identifiers are identical. Deterministic record linkage is a good option when the entities in the data sets share common identifiers and the data are of relatively high quality. Probabilistic matching is a statistical approach to measuring the probability that two records represent the same subject or individual, based on whether they agree or disagree on the various identifiers (Dusetzina et al. 2014).
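As a rough illustration of the deterministic rule described above, the following minimal Python sketch declares two records a match only when they agree exactly on every identifier in a chosen key set. The function name `deterministic_match`, the field names, and the choice to treat a missing value as a disagreement are assumptions made for this example; they are not taken from the cited sources.

```python
def deterministic_match(rec_a, rec_b, keys):
    """Return True if both records carry identical values for every key.

    `keys` is the (hypothetical) set of identifiers chosen for linkage,
    for example ["ssn"] or ["last_name", "birth_date"].
    A missing value is treated as a disagreement (an assumption).
    """
    return all(
        rec_a.get(k) is not None and rec_a.get(k) == rec_b.get(k)
        for k in keys
    )


# Illustrative records (made-up values).
a = {"ssn": "123-45-6789", "last_name": "Smith"}
b = {"ssn": "123-45-6789", "last_name": "Smyth"}

print(deterministic_match(a, b, ["ssn"]))
print(deterministic_match(a, b, ["ssn", "last_name"]))
```

The second call shows the weakness of exact comparison: a single spelling variation in one identifier is enough to reject the pair, which is why this approach depends on high-quality identifiers.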

Probabilistic matching calculates composite linkage weights based on likeness scores for identifier values and uses thresholds to classify each record pair as a match, nonmatch, or possible match. The quality of the resulting matches can depend upon one’s confidence in the specification of the matching rules (Zhang and Stevens 2012). It is designed to work using a wider set of data elements and all available identifiers for matching...
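The weight-and-threshold idea can be sketched in a few lines of Python. The example below follows the general form usually associated with Fellegi and Sunter (1969): each identifier contributes log2(m/u) when the two records agree and log2((1-m)/(1-u)) when they disagree, where m is the probability of agreement among true matches and u the probability of agreement among nonmatches. The m and u values, the field names, the two thresholds, and the use of exact agreement in place of a graded likeness score are all assumptions chosen for illustration, not values from the cited sources.

```python
import math

# Hypothetical (m, u) probabilities per identifier; real values would be
# estimated from the data being linked.
M_U = {
    "last_name": (0.95, 0.01),
    "birth_date": (0.98, 0.003),
    "zip": (0.90, 0.05),
}


def field_weight(agree, m, u):
    """Log-likelihood-ratio weight for one identifier comparison."""
    if agree:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def composite_weight(rec_a, rec_b, params=M_U):
    """Sum the per-identifier weights into a total linkage weight."""
    total = 0.0
    for field, (m, u) in params.items():
        # Simplification: exact equality only; missing values count as
        # disagreement.
        agree = rec_a.get(field) is not None and rec_a.get(field) == rec_b.get(field)
        total += field_weight(agree, m, u)
    return total


def classify(weight, lower=0.0, upper=10.0):
    """Apply two (hypothetical) thresholds to a total linkage weight."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "nonmatch"
    return "possible match"


# Illustrative records (made-up values).
r1 = {"last_name": "Smith", "birth_date": "1980-02-01", "zip": "21201"}
r2 = {"last_name": "Smyth", "birth_date": "1980-02-01", "zip": "21201"}
r3 = {"last_name": "Jones", "birth_date": "1975-07-15", "zip": "90210"}

print(classify(composite_weight(r1, r1)))
print(classify(composite_weight(r1, r2)))
print(classify(composite_weight(r1, r3)))
```

Pairs that fall between the two thresholds are the ones typically set aside for clerical review. Moving either threshold trades false matches against missed matches, which is one way the specification of the matching rules affects the quality of the result.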


Keywords: Matching Probability; Link Plus; Deterministic Matching; Link King; Total Link Weight

Further Readings

  1. Dusetzina, S. B., Tyree, S., Meyer, A. M., et al. (2014). Linking data for health services research: A framework and instructional guide. Rockville: Agency for Healthcare Research and Quality (US).
  2. Fellegi, I. P., & Sunter, A. B. (1969). A theory for record linkage. Journal of the American Statistical Association, 64, 1183–1210.
  3. Schumacher, S. (2007). Probabilistic versus deterministic data matching: Making an accurate decision, information management special reports. Washington, DC: The Office of the National Coordinator for Health Information Technology (ONC).
  4. Winkler, W. E. (1999). The state of record linkage and current research problems. Washington, DC: Statistical Research Division, US Census Bureau.
  5. Zhang, T., & Stevens, D. W. (2012). Integrated data system person identification: Accuracy requirements and methods. https://ssrn.com/abstract=2512590; https://doi.org/10.2139/ssrn.2512590.

Authors and Affiliations

  1. Department of Accounting, Finance and Economics, Merrick School of Business, University of Baltimore, Baltimore, USA