European Journal of Epidemiology, Volume 33, Issue 6, pp 601–605

Performing studies using the UK Clinical Practice Research Datalink: to link or not to link?

  • Laura McDonald
  • Anna Schultze
  • Robert Carroll
  • Sreeram V. Ramagopalan


The Clinical Practice Research Datalink (CPRD) is a repository of electronic medical records collected during routine primary care clinical practice in the UK, and is one of the most widely used sources of real-world data for healthcare research. Although CPRD provides access to comprehensive longitudinal patient records, the data do not fully capture diagnoses or outcomes that occur in secondary care, nor do they fully capture deaths. We provide here an overview of CPRD and of the potential bias introduced by using unlinked data in certain situations. Linkage of CPRD to other datasets can help to overcome these limitations. We discuss when to consider linkage to secondary care data, disease-specific data sources or official mortality data when conducting research using CPRD.
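To make the ascertainment problem concrete, the following is a minimal sketch using toy, hypothetical records; it is not real CPRD, HES or ONS data and does not reflect the actual structure of those datasets. It assumes each source yields dated outcome events keyed by a shared pseudonymised patient identifier (the names `patid`, `Event` and `first_event_per_patient` and the source labels are illustrative assumptions). Restricting to primary care records alone misses hospital-only and death-only events, and can also record an event later than it actually occurred.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical record type for illustration only; field names are assumptions,
# not the CPRD, HES or ONS schema.
@dataclass(frozen=True)
class Event:
    patid: int        # pseudonymised patient identifier shared across sources (assumed)
    source: str       # "CPRD" (primary care), "HES" (hospital) or "ONS" (death registration)
    event_date: date


def first_event_per_patient(events, sources):
    """Return the earliest outcome date per patient, using only the given sources."""
    first = {}
    for e in events:
        if e.source not in sources:
            continue  # ignore sources not available in this analysis
        if e.patid not in first or e.event_date < first[e.patid]:
            first[e.patid] = e.event_date
    return first


# Toy events, invented purely to illustrate the mechanism.
toy_events = [
    Event(1, "CPRD", date(2015, 3, 1)),
    Event(1, "HES", date(2015, 2, 20)),    # hospital admission precedes the GP record
    Event(2, "HES", date(2016, 7, 9)),     # event recorded only in hospital data
    Event(3, "ONS", date(2017, 1, 5)),     # fatal event recorded only at death registration
    Event(4, "CPRD", date(2014, 11, 30)),
]

unlinked = first_event_per_patient(toy_events, {"CPRD"})
linked = first_event_per_patient(toy_events, {"CPRD", "HES", "ONS"})
print(len(unlinked), len(linked))   # 2 4
print(unlinked[1], linked[1])       # 2015-03-01 2015-02-20
```

In this toy example, the unlinked analysis identifies two of four patients with the outcome and dates patient 1's event later than the hospital record does; both kinds of error can bias incidence and risk estimates.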


Keywords: CPRD · HES · Data linkage · Epidemiology · Primary care · Secondary care



We are also grateful to staff at the CPRD for their work and research, which contributes to the continuous improvement of studies using CPRD data.

Authors’ Contributions

Laura McDonald (LM) and Sreeram Ramagopalan (SR) are full-time employees of Bristol-Myers Squibb. Anna Schultze (AS) and Robert Carroll (RC) are full-time employees of Evidera. The authors report no other competing interests. All authors have extensive experience in performing observational epidemiological studies using CPRD. After reading a number of publications that did not consider linked CPRD data, SR conceived the article to provide guidance to researchers. LM identified all research articles comparing linked and unlinked CPRD data and wrote the first draft of the manuscript. AS and RC worked on further drafts of the manuscript. All authors reviewed and approved the manuscript as submitted.


Copyright information

© Springer Science+Business Media B.V., part of Springer Nature 2018

Authors and Affiliations

  1. Centre for Observational Research and Data Sciences, Bristol-Myers Squibb, Uxbridge, UK
  2. Real-World Evidence, Evidera, London, UK
