Deterministic and Probabilistic Record Linkage: an Application to Primary Care Data

  • Giulia Carreras
  • Monica Simonetti
  • Claudio Cricelli
  • Francesco Lapi
Part of the topical collection: Systems-Level Quality Improvement


In recent decades, the availability of electronic records routinely collected in various health care settings has increased. The data sources include clinical databases, such as primary care databases, and administrative databases, such as electronic health records of hospital admissions, in-hospital procedures, and reimbursed medications. These data present opportunities for innovative research to improve patient care and to inform decisions in public health and clinical practice [1, 2].

To take advantage of the available data sources, linking procedures are essential. Record linkage consists of matching the records of two or more datasets by means of common identifiers [3].

The difficulty of record linkage varies with the structure and the quality of the databases being linked. The linking variables may not uniquely identify an individual, may contain errors, or may be missing.
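To make these difficulties concrete, the following minimal Python sketch links two toy datasets by exact agreement on a set of common identifiers. The records, the field names (`surname`, `birth_date`, `sex`) and the helper functions are hypothetical illustrations, not the data or the procedure used in this study.

```python
# Hypothetical toy records: all names and values are illustrative assumptions.
clinical = [
    {"id": "c1", "surname": "rossi", "birth_date": "1950-03-12", "sex": "M"},
    {"id": "c2", "surname": "bianchi", "birth_date": "1962-07-01", "sex": "F"},
    {"id": "c3", "surname": "verdi", "birth_date": None, "sex": "M"},  # missing value
]
admin = [
    {"id": "a1", "surname": "rossi", "birth_date": "1950-03-12", "sex": "M"},
    {"id": "a2", "surname": "bianci", "birth_date": "1962-07-01", "sex": "F"},  # typo
    {"id": "a3", "surname": "verdi", "birth_date": "1948-11-30", "sex": "M"},
]

# Linking variables assumed to be shared by both datasets.
KEYS = ("surname", "birth_date", "sex")


def link_key(record):
    """Return the tuple of linking values, or None if any value is missing."""
    values = tuple(record[k] for k in KEYS)
    return None if any(v is None for v in values) else values


def exact_link(left, right):
    """Pair records whose linking variables agree exactly on every field."""
    index = {}
    for rec in right:
        key = link_key(rec)
        if key is not None:
            index.setdefault(key, []).append(rec["id"])
    pairs = []
    for rec in left:
        key = link_key(rec)
        if key is not None:
            for right_id in index.get(key, []):
                pairs.append((rec["id"], right_id))
    return pairs


links = exact_link(clinical, admin)
print(links)
```

In this toy example only the first pair is linked: the misspelled surname and the missing birth date each prevent an exact match, even though the remaining fields agree.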

Two approaches to record linkage are possible, namely the deterministic and the probabilistic...
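The contrast between the two approaches can be sketched in a few lines of Python. The text above does not specify either method, so this is only an illustration under stated assumptions: the deterministic rule requires exact agreement on every field, while the probabilistic score follows the classical Fellegi and Sunter formulation [4], summing log-likelihood-ratio weights per field. The m- and u-probabilities below are made-up values, and skipping missing fields is one common convention rather than the only one.

```python
import math

# Assumed (m, u) per field: m = P(agree | true match), u = P(agree | non-match).
# These numbers are invented for illustration only.
M_U = {
    "surname": (0.95, 0.01),
    "birth_date": (0.98, 0.003),
    "sex": (0.99, 0.5),
}


def deterministic_match(a, b):
    """Link only if every field is present and agrees exactly."""
    return all(a[f] is not None and a[f] == b[f] for f in M_U)


def match_weight(a, b):
    """Sum of Fellegi-Sunter agreement and disagreement weights (base 2)."""
    total = 0.0
    for field, (m, u) in M_U.items():
        if a[field] is None or b[field] is None:
            continue  # assumption: a missing value contributes no evidence
        if a[field] == b[field]:
            total += math.log2(m / u)  # agreement weight, positive
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight, negative
    return total


rec_a = {"surname": "bianchi", "birth_date": "1962-07-01", "sex": "F"}
rec_b = {"surname": "bianci", "birth_date": "1962-07-01", "sex": "F"}  # typo

print(deterministic_match(rec_a, rec_b))
print(round(match_weight(rec_a, rec_b), 2))
```

Here the deterministic rule rejects the pair because of the surname typo, whereas the probabilistic weight stays positive because the agreement on birth date outweighs the disagreement on surname. In practice the weight would be compared with thresholds chosen by the analyst to classify pairs as links, possible links, or non-links.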



This study has been supported by the Italian College of General Practitioners and Primary Care.


References

  1. Morrato, E. H., Elias, M., and Gericke, C. A., Using population-based routine data for evidence-based health policy decisions: Lessons from three examples of setting and evaluating national health policy in Australia, the UK and the USA. J Public Health (Oxf) 29(4):463–471, 2007.
  2. De Coster, C., Quan, H., Finlayson, A. et al., Identifying priorities in methodological research using ICD-9-CM and ICD-10 administrative data: Report from an international consortium. BMC Health Serv Res 6(1):77, 2006.
  3. Leicester, G., Goldacre, M., Simmons, H. et al., Computerized linking of medical records: Methodological guidelines. J Epidemiol Community Health 47:316–319, 1993.
  4. Fellegi, I. P., and Sunter, A. B., A theory for record linkage. J Am Stat Ass 64(328):1183–1210, 1969.
  5. Contiero, P., Tittarelli, A., Tagliabue, G., Maghini, A., Fabiano, S., Crosignani, P., and Tessandori, R., The EpiLink record linkage software. Methods Inf Med 44(1):66–71, 2005.
  6. Christen, P., Data Matching: Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection. Springer Science & Business Media, 2012.
  7. Christen, P., and Goiser, K., Quality and complexity measures for data linkage and deduplication. Quality Measures in Data Mining 43:127–151, 2007.
  8. Wasi, N., and Flaaen, A., Record linkage using STATA: Pre-processing, linking and reviewing utilities. The Stata Journal 15:672–697, 2014.
  9. Sariyar, M., and Borg, A., The RecordLinkage package: Detecting errors in data. The R Journal 2:61–67, 2010.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Giulia Carreras (1)
  • Monica Simonetti (2)
  • Claudio Cricelli (3)
  • Francesco Lapi (2)

  1. Cancer Prevention and Research Institute, Florence, Italy
  2. Health Search, Italian College of General Practitioners and Primary Care, Florence, Italy
  3. Italian College of General Practitioners and Primary Care, Florence, Italy
