International Journal of Public Health, Volume 64, Issue 3, pp 441–450

Identifying diabetes cases in health administrative databases: a validation study based on a large French cohort

  • Sonsoles Fuentes (corresponding author)
  • Emmanuel Cosson
  • Laurence Mandereau-Bruno
  • Anne Fagot-Campagna
  • Pascale Bernillon
  • Marcel Goldberg
  • Sandrine Fosse-Edorh
  • CONSTANCES-Diab Group
Original Article



In the French national health insurance information system (SNDS), three diabetes case definition algorithms are applied to identify diabetic patients. The objective of this study was to validate these algorithms using data from a large cohort.


The CONSTANCES cohort (Cohorte des consultants des Centres d’examens de santé) comprises a randomly selected sample of adults living in France. Between 2012 and 2014, data from 45,739 participants, recorded in a self-administered questionnaire and during a medical examination, were linked to the SNDS. Two gold standards were defined: known diabetes and pharmacologically treated diabetes. Sensitivity, specificity, positive and negative predictive values (PPV, NPV), and kappa coefficients (k) were estimated.
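The accuracy measures listed above all derive from a 2×2 table that cross-classifies an algorithm's result against a gold standard. The Python sketch below shows one way to compute them. The function name `validation_metrics`, the `ValidationMetrics` container, and the counts in the usage example are illustrative assumptions; they are not taken from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationMetrics:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    kappa: float


def validation_metrics(tp: int, fp: int, fn: int, tn: int) -> ValidationMetrics:
    """Compute accuracy measures from a 2x2 table.

    tp: algorithm positive, gold standard positive
    fp: algorithm positive, gold standard negative
    fn: algorithm negative, gold standard positive
    tn: algorithm negative, gold standard negative
    """
    gold_pos, gold_neg = tp + fn, fp + tn
    alg_pos, alg_neg = tp + fp, fn + tn
    if min(gold_pos, gold_neg, alg_pos, alg_neg) == 0:
        raise ValueError("every row and column total must be non-zero")

    n = tp + fp + fn + tn
    observed = (tp + tn) / n  # observed agreement
    # Agreement expected by chance, from the marginal totals.
    expected = (alg_pos * gold_pos + alg_neg * gold_neg) / (n * n)

    return ValidationMetrics(
        sensitivity=tp / gold_pos,
        specificity=tn / gold_neg,
        ppv=tp / alg_pos,
        npv=tn / alg_neg,
        kappa=(observed - expected) / (1 - expected),  # Cohen's kappa
    )


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not study data).
    print(validation_metrics(tp=90, fp=10, fn=10, tn=890))
```

Sensitivity and specificity depend only on the column totals of the gold standard, whereas PPV, NPV and kappa also vary with prevalence. Comparing these measures across cohorts therefore requires some care.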


All three algorithms had specificities and NPVs over 99%. For identifying known and pharmacologically treated diabetes, respectively, sensitivities were 73% and 77% for algorithm A, 86% and 97% for algorithm B, and 93% and 99% for algorithm C. Algorithm C had the highest k (0.95) when known diabetes was the gold standard, and algorithm B had the highest k (0.98) when pharmacologically treated diabetes was the gold standard.


The SNDS is an excellent source for diabetes surveillance and diabetes research, since the case definition algorithms applied to it perform very well.


Keywords: Information systems; Diabetes; Algorithms; Validation studies; CONSTANCES



The CONSTANCES cohort is supported by the Caisse Nationale d’Assurance Maladie des travailleurs salariés (CNAMTS). CONSTANCES is accredited as a “National Infrastructure for Biology and health” by the governmental Investissements d’avenir programme and was funded by the Agence nationale de la recherche (grant ANR-11-INBS-0002). CONSTANCES also receives funding from MSD, AstraZeneca and Lundbeck, managed by INSERM-Transfert. This study received funding from the Interministerial Mission for Combating Drugs and Addictive Behaviors (“Mission Interministérielle de Lutte contre les Drogues et les Conduites Addictives”, MILDECA). None of the authors are salaried by the funders of the CONSTANCES cohort. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Compliance with ethical standards

Conflict of interest

All authors declared no potential conflict of interest relevant to this article.

Research involving human participants and/or animals

This article does not contain any studies with human participants performed by any of the authors.

Informed consent

The CONSTANCES study was approved by the authorities regulating ethical data collection in France (CCTIRS: Comité Consultatif pour le Traitement des Informations Relatives à la Santé; CNIL: Commission Nationale de l’Informatique et des Libertés), and all participants signed an informed consent form.

Supplementary material

Supplementary material 1 (DOCX 70 kb)



Copyright information

© Swiss School of Public Health (SSPH+) 2018

Authors and Affiliations

  • Sonsoles Fuentes (1), corresponding author
  • Emmanuel Cosson (2, 3)
  • Laurence Mandereau-Bruno (1)
  • Anne Fagot-Campagna (4)
  • Pascale Bernillon (1)
  • Marcel Goldberg (5)
  • Sandrine Fosse-Edorh (1)
  • CONSTANCES-Diab Group
  1. Santé publique France (SpF), Saint-Maurice, France
  2. Department of Endocrinology-Diabetology-Nutrition, AP-HP, Jean Verdier Hospital, Paris 13 University, Sorbonne Paris Cité, CRNH-IdF, CINFO, Bondy, France
  3. Sorbonne Paris Cité, UMR U1153 Inserm/U1125 Inra/Cnam/Université Paris 13, Bobigny, France
  4. Strategy and Research Department, Caisse nationale de l’assurance maladie, Paris, France
  5. Population-Based Epidemiological Cohorts Unit, Inserm UMS 011, Villejuif, France
