European Journal of Epidemiology, Volume 29, Issue 8, pp 551–558

When the entire population is the sample: strengths and limitations in register-based epidemiology

  • Lau Caspar Thygesen
  • Annette Kjær Ersbøll


Studies based on databases, medical records and registers are used extensively in epidemiological research today. Despite this increasing use, no well-developed methodological literature on the use and evaluation of population-based registers is available. This is so even though register-based studies differ from studies with researcher-collected data: the data are not collected by the researcher, all persons in a population are available, and traditional statistical analyses that treat sampling error as the main source of uncertainty may not be relevant. We present the main strengths and limitations of register-based studies, the biases that are especially important in such studies, and methods for evaluating the completeness and validity of registers. The main strengths are that the data already exist and valuable follow-up time has already passed, that study populations are complete, which minimizes selection bias, and that the data are collected independently of the study. The main limitations are that necessary information may be unavailable, that the researcher does not control data collection, that confounder information is lacking, that information on data quality is missing, that truncation at the start of follow-up makes it difficult to distinguish prevalent from incident cases, and that there is a risk of data dredging. We conclude that register-based studies, which include all persons in a population, can follow them for decades and become available relatively fast, are important data sources for modern epidemiology, but that their data limitations must be acknowledged.
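Two of the evaluation tasks named in the abstract, completeness and validity, can be illustrated with a short sketch. It assumes two common choices that the abstract itself does not spell out: completeness is estimated with two-source capture-recapture (Chapman's bias-corrected form of the Lincoln-Petersen estimator), and validity is summarized as sensitivity, specificity and positive predictive value of register diagnoses against a reference standard such as medical record review. The function names and all counts are hypothetical. Capture-recapture further assumes that the two sources are independent, that the population is closed, and that records can be linked without error (for example on a personal identification number); if these assumptions fail, the estimate is biased.

```python
"""Sketch: evaluating register completeness and validity (illustrative only)."""


def chapman_total(n_a: int, n_b: int, n_ab: int) -> float:
    """Chapman's bias-corrected two-source capture-recapture estimate of total cases.

    n_a:  cases recorded in source A (e.g. the register being evaluated)
    n_b:  cases recorded in an independent source B
    n_ab: cases found in both sources after record linkage
    Assumes independent sources, a closed population and error-free linkage.
    """
    if not 0 <= n_ab <= min(n_a, n_b):
        raise ValueError("overlap must lie between 0 and min(n_a, n_b)")
    return (n_a + 1) * (n_b + 1) / (n_ab + 1) - 1


def completeness(n_register: int, n_other: int, n_both: int) -> float:
    """Share of the estimated total number of cases that the register captured."""
    return n_register / chapman_total(n_register, n_other, n_both)


def validity(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Agreement of register diagnoses with a reference standard.

    tp: register positive, reference positive
    fp: register positive, reference negative
    fn: register negative, reference positive
    tn: register negative, reference negative
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),  # positive predictive value
    }


if __name__ == "__main__":
    # Hypothetical counts: 90 cases in the register, 60 in a second source, 50 in both.
    print(f"estimated total cases: {chapman_total(90, 60, 50):.1f}")
    print(f"register completeness: {completeness(90, 60, 50):.3f}")
    # Hypothetical validation sample compared against medical record review.
    print(validity(tp=180, fp=20, fn=30, tn=770))
```

With these made-up counts the estimated total is about 107.8 cases, so the register would capture roughly 83% of them. Note that many validation studies review only register-positive records; such a design yields the positive predictive value, whereas sensitivity and specificity also require a sample of register-negative persons.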


Keywords: Registers; Database management systems; Epidemiology; Bias; Nordic countries


Copyright information

© Springer Science+Business Media Dordrecht 2014

Authors and Affiliations

  1. National Institute of Public Health, University of Southern Denmark, Copenhagen K, Denmark
