Implications of missing data on reported breast cancer mortality

  • Epidemiology

Breast Cancer Research and Treatment

Abstract

Background

National cancer registries are valuable tools for analyzing patterns of care and clinical outcomes; however, missing data may compromise the accuracy and generalizability of analyses that rely on them. We sought to evaluate the association between missing data and overall survival (OS).

Methods

Using the National Cancer Database (NCDB) and the Surveillance, Epidemiology, and End Results (SEER) Program, we assessed data missingness among patients diagnosed with invasive breast cancer from 2010 to 2014. Key variables included demographic (age, race, ethnicity, insurance, education, income), tumor (grade, ER, PR, HER2, TNM stages), and treatment (surgery in both databases; chemotherapy and radiation in NCDB) variables. OS was compared between patients with and without missing data using Cox proportional hazards models.
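The missingness classification described above can be illustrated with a minimal sketch. It is not the authors' code: the field names, the toy records, and the helper `make_record` are hypothetical, and chemotherapy and radiation are left out of the treatment domain for simplicity. The study compared OS with Cox proportional hazards models; because fitting a Cox model requires a statistics library, this sketch substitutes a plain Kaplan-Meier estimate to compare groups.

```python
from collections import defaultdict

# Hypothetical field names grouped into the three domains named in the abstract.
KEY_VARIABLES = {
    "demographic": ["age", "race", "ethnicity", "insurance", "education", "income"],
    "tumor": ["grade", "er", "pr", "her2", "t_stage", "n_stage", "m_stage"],
    "treatment": ["surgery"],
}


def missing_domains(record):
    """Return the domains in which a record lacks at least one key variable."""
    return {
        domain
        for domain, names in KEY_VARIABLES.items()
        if any(record.get(name) is None for name in names)
    }


def kaplan_meier(times, events, horizon):
    """Kaplan-Meier survival probability at `horizon`.

    times  -- follow-up time per patient (e.g. months)
    events -- 1 if the patient died at that time, 0 if censored
    """
    deaths = defaultdict(int)
    exits = defaultdict(int)
    for t, e in zip(times, events):
        exits[t] += 1
        deaths[t] += e
    at_risk = len(times)
    survival = 1.0
    for t in sorted(exits):
        if t > horizon:
            break
        if deaths[t]:
            survival *= 1 - deaths[t] / at_risk
        at_risk -= exits[t]  # deaths and censored patients both leave the risk set
    return survival


def make_record(**overrides):
    """Build a toy record with every key variable present unless overridden."""
    record = {name: "known" for names in KEY_VARIABLES.values() for name in names}
    record.update(overrides)
    return record


# Toy cohort (invented for illustration): (record, months of follow-up, death flag).
patients = [
    (make_record(), 60, 0),
    (make_record(grade=None), 20, 1),
    (make_record(income=None, her2=None), 45, 1),
    (make_record(), 72, 0),
]

incomplete = [p for p in patients if missing_domains(p[0])]
complete = [p for p in patients if not missing_domains(p[0])]
share_incomplete = len(incomplete) / len(patients)

os_complete = kaplan_meier([p[1] for p in complete], [p[2] for p in complete], 60)
os_incomplete = kaplan_meier([p[1] for p in incomplete], [p[2] for p in incomplete], 60)
print(share_incomplete, os_complete, os_incomplete)
```

With real registry extracts, the same flag could serve as the covariate in a Cox model, which is the comparison the study reports.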

Results

Overall, 775,996 patients in the NCDB and 263,016 in SEER were identified; at least 1 key variable was missing for 29% and 13% of patients, respectively. Of those, the large majority (NCDB 80%; SEER 88%) were missing tumor variables. Compared with patients with complete data, patients with missing data had a greater risk of death: NCDB HR 1.23 (99% CI 1.21–1.25) and SEER HR 2.11 (99% CI 2.05–2.18). Patients with complete tumor data had higher unadjusted 5-year OS estimates than the entire sample: NCDB 82.7% vs 81.8% and SEER 83.5% vs 81.7%.
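To make the size of this gap concrete, the short sketch below converts the reported 5-year OS figures into 5-year all-cause mortality (100 minus OS) and shows how much lower the estimate is when only patients with complete tumor data are analyzed. The calculation is a simple illustration based on the rounded percentages above, not an analysis performed by the authors, and the function name `mortality_gap` is hypothetical.

```python
# 5-year unadjusted OS (%) as reported in the abstract.
REPORTED_OS = {
    "NCDB": {"entire_sample": 81.8, "complete_tumor_data": 82.7},
    "SEER": {"entire_sample": 81.7, "complete_tumor_data": 83.5},
}


def mortality_gap(entire_os, complete_os):
    """Return (absolute gap in percentage points, gap relative to full-sample mortality in %)."""
    mortality_all = 100.0 - entire_os
    mortality_complete = 100.0 - complete_os
    absolute = mortality_all - mortality_complete
    relative = 100.0 * absolute / mortality_all
    return round(absolute, 1), round(relative, 1)


for registry, os_values in REPORTED_OS.items():
    gap = mortality_gap(os_values["entire_sample"], os_values["complete_tumor_data"])
    print(registry, gap)
```

Under these assumptions, restricting to complete tumor data lowers the 5-year mortality estimate by 0.9 percentage points in the NCDB and 1.8 percentage points in SEER, or roughly 5% and 10% of the full-sample mortality.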

Conclusions

Missingness of select variables is not uncommon in large national cancer registries and is associated with worse OS. Excluding patients with missing variables may introduce unintended bias into analyses and yield findings that underestimate breast cancer mortality.

Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Acknowledgements

The National Cancer Data Base (NCDB) is a joint project of the Commission on Cancer (CoC) of the American College of Surgeons and the American Cancer Society. The CoC's NCDB and the hospitals participating in the CoC NCDB are the source of the de-identified data used herein; they have not verified and are not responsible for the statistical validity of the data analysis or the conclusions derived by the authors.

Funding

This work was in part supported by Duke Cancer Institute through NIH grant P30CA014236 (PI: Kastan) for the Biostatistics Core.

Author information

Contributions

Jennifer K. Plichta: conceptualization, methodology, data analysis, writing (original draft, review, and editing), project administration. Christel N. Rushing: methodology, resources, data curation, formal analysis, writing (review and editing). Holly C. Lewis: data review, writing (original draft, review, and editing). Marguerite M. Rooney: data review, writing (original draft, review, and editing). Dan G. Blazer: data analysis, writing (review and editing). Samantha Thomas: data analysis, writing (review and editing). E. Shelley Hwang: data analysis, writing (review and editing). Rachel A. Greenup: conceptualization, methodology, resources, data analysis, writing (review and editing), project administration.

Corresponding author

Correspondence to Jennifer K. Plichta.

Ethics declarations

Conflict of interest

The authors report no proprietary or commercial interest in any product mentioned or concept discussed in this article. The authors have no relevant financial or non-financial interests to disclose. Dr. J. Plichta is a recipient of research funding from the Color Foundation (PI: Plichta). The content of this manuscript is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 420 kb)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Plichta, J.K., Rushing, C.N., Lewis, H.C. et al. Implications of missing data on reported breast cancer mortality. Breast Cancer Res Treat 197, 177–187 (2023). https://doi.org/10.1007/s10549-022-06764-4

  • DOI: https://doi.org/10.1007/s10549-022-06764-4
