Cancer Causes & Control, Volume 18, Issue 5, pp 561–569

Agreement of diagnosis and its date for hematologic malignancies and solid tumors between Medicare claims and cancer registry data

  • Soko Setoguchi
  • Daniel H. Solomon
  • Robert J. Glynn
  • E. Francis Cook
  • Raisa Levin
  • Sebastian Schneeweiss
Original Paper



Claims data may be a suitable source for studying associations between drugs and cancer. However, linkage between cancer registry data and claims data that include pharmacy-dispensing information is not always available. We examined the accuracy of claims-based definitions of incident cancers and of their dates of diagnosis.


Four claims-based definitions were developed to identify incident leukemia, lymphoma, lung, colorectal, stomach, and breast cancer. We identified a cohort of subjects aged ≥65 (1997–2000) from Pennsylvania Medicare and drug benefit program data linked with the state cancer registry. We calculated the sensitivity, specificity, and positive predictive value of the claims-based definitions, using the registry as the gold standard. We further assessed the agreement between the diagnosis dates from the two data sources.


All definitions had very high specificity (≥98%), while sensitivity varied between 40% and 90%. Test characteristics did not vary systematically by age group. The date of first diagnosis according to Medicare data tended to be later than the date recorded in the registry data, except for breast cancer. The differences in dates of first diagnosis were within 14 days for 75% to 88% of the cases. Bias due to outcome misclassification from our claims-based definitions of cancer was minimal in our example of a cohort study.
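One way to summarize date agreement of this kind is sketched below, assuming each case has one claims-based and one registry diagnosis date. The function name, the signed-difference convention (claims minus registry), and the example dates are hypothetical; the study's exact procedure is not described here.

```python
from datetime import date


def date_agreement(pairs, window_days=14):
    """Summarize agreement between (claims_date, registry_date) pairs.

    Differences are claims minus registry, so a positive value means the
    claims-based date is later than the registry date.
    """
    diffs = [(claims - registry).days for claims, registry in pairs]
    within = sum(1 for d in diffs if abs(d) <= window_days)
    later = sum(1 for d in diffs if d > 0)
    return {
        "share_within_window": within / len(diffs),
        "share_claims_later": later / len(diffs),
    }


# Hypothetical date pairs, for illustration only.
pairs = [
    (date(1998, 3, 10), date(1998, 3, 3)),    # claims 7 days later
    (date(1999, 6, 1), date(1999, 6, 1)),     # same day
    (date(1997, 11, 20), date(1997, 10, 1)),  # claims 50 days later
    (date(2000, 1, 5), date(2000, 1, 12)),    # claims 7 days earlier
]
summary = date_agreement(pairs)
print(summary)
```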


Claims data can identify incident hematologic malignancies and solid tumors with very high specificity and with sufficient agreement in the date of first diagnosis. The impact of bias due to outcome misclassification, and thus the usefulness of claims-based cancer definitions as cancer outcome markers in etiologic studies, needs to be assessed for each study setting.


Incident cancer, Medicare claims, Cancer registry, Date of diagnosis, Agreement, Sensitivity, Specificity, Positive predictive value, Misclassification



Copyright information

© Springer Science+Business Media B.V. 2007

Authors and Affiliations

  • Soko Setoguchi (1)
  • Daniel H. Solomon (1, 2)
  • Robert J. Glynn (1)
  • E. Francis Cook (3, 4)
  • Raisa Levin (1)
  • Sebastian Schneeweiss (1, 4)

  1. Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, USA
  2. Division of Rheumatology, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, USA
  3. Division of General Internal Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, USA
  4. Department of Epidemiology, Harvard School of Public Health, Boston, USA
