Abstract
Background
National cancer registries are valuable tools for analyzing patterns of care and clinical outcomes; however, missing data may affect the accuracy and generalizability of findings drawn from them. We sought to evaluate the association between missing data and overall survival (OS).
Methods
Using the National Cancer Database (NCDB) and the Surveillance, Epidemiology, and End Results (SEER) Program, we assessed data missingness among patients diagnosed with invasive breast cancer from 2010 to 2014. Key variables included demographic (age, race, ethnicity, insurance, education, income), tumor (grade, ER, PR, HER2, TNM stages), and treatment (surgery in both databases; chemotherapy and radiation in NCDB) variables. OS was compared between patients with and without missing data using Cox proportional hazards models.
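The comparison described above can be illustrated with a minimal sketch, which is not the authors' analysis code. It flags records that lack at least one key variable and then fits an unadjusted Cox model with that flag as the only covariate. The variable groupings, field names (`months`, `died`), and the eight toy records are all hypothetical, and the hand-written Newton-Raphson fit stands in for a full statistical package.

```python
import math

# Hypothetical groupings, loosely following the categories named in Methods.
KEY_VARIABLES = {
    "demographic": ["age", "race", "insurance"],
    "tumor": ["grade", "er", "her2"],
    "treatment": ["surgery"],
}

def missing_categories(record):
    """Return the categories in which the record lacks at least one key variable."""
    return {
        category
        for category, names in KEY_VARIABLES.items()
        if any(record.get(name) is None for name in names)
    }

def cox_hazard_ratio(times, events, flags, iterations=25):
    """Fit a Cox model with one binary covariate by Newton-Raphson
    (Breslow handling of ties) and return exp(beta)."""
    beta = 0.0
    for _ in range(iterations):
        score = 0.0
        information = 0.0
        for t_i, d_i, x_i in zip(times, events, flags):
            if not d_i:
                continue  # censored records contribute only through risk sets
            # Risk set: everyone still under observation at time t_i.
            s0 = s1 = 0.0
            for t_j, x_j in zip(times, flags):
                if t_j >= t_i:
                    weight = math.exp(beta * x_j)
                    s0 += weight
                    s1 += weight * x_j
            mean = s1 / s0
            score += x_i - mean
            information += mean * (1.0 - mean)  # variance of a binary covariate
        if information == 0.0:
            break
        step = score / information
        beta += step
        if abs(step) < 1e-9:
            break
    return math.exp(beta)

# Toy data (invented for illustration): months of follow-up, death indicator,
# and the single variable set to missing (None means a complete record).
COMPLETE = {"age": 60, "race": "white", "insurance": "private",
            "grade": 2, "er": "positive", "her2": "negative", "surgery": "yes"}
toy = [(1, True, "grade"), (2, True, "her2"), (3, True, None), (4, True, "grade"),
       (5, True, None), (6, True, "insurance"), (7, True, None), (8, False, None)]

records = []
for months, died, missing_name in toy:
    record = dict(COMPLETE, months=months, died=died)
    if missing_name is not None:
        record[missing_name] = None
    records.append(record)

flags = [1 if missing_categories(r) else 0 for r in records]
share_missing = sum(flags) / len(records)
hazard_ratio = cox_hazard_ratio(
    [r["months"] for r in records], [r["died"] for r in records], flags
)
print(f"missing >=1 key variable: {share_missing:.0%}; unadjusted HR: {hazard_ratio:.2f}")
```

In this toy sample the flagged records tend to have earlier deaths, so the fitted hazard ratio exceeds 1. A real analysis would rely on an established survival library, include adjustment covariates where appropriate, and report confidence intervals.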
Results
Overall, 775,996 patients in the NCDB and 263,016 in SEER were identified; 29% and 13%, respectively, were missing at least 1 key variable. Of those, the overwhelming majority (NCDB 80%; SEER 88%) were missing tumor variables. Compared with patients with complete data, those with missing data had a greater risk of death: NCDB HR 1.23 (99% CI 1.21–1.25) and SEER HR 2.11 (99% CI 2.05–2.18). Patients with complete tumor data had higher unadjusted 5-year OS estimates than the entire sample: NCDB 82.7% vs 81.8% and SEER 83.5% vs 81.7%.
Conclusions
Missingness of select variables is not uncommon within large national cancer registries and is associated with worse OS. Excluding patients with missing variables may introduce unintended bias into analyses and yield findings that underestimate breast cancer mortality.
Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
The National Cancer Data Base (NCDB) is a joint project of the Commission on Cancer (CoC) of the American College of Surgeons and the American Cancer Society. The CoC's NCDB and the hospitals participating in the CoC NCDB are the source of the de-identified data used herein; they have not verified and are not responsible for the statistical validity of the data analysis or the conclusions derived by the authors.
Funding
This work was supported in part by the Duke Cancer Institute through NIH grant P30CA014236 (PI: Kastan) for the Biostatistics Core.
Author information
Contributions
Jennifer K. Plichta: conceptualization, methodology, data analysis, writing (original draft, review, and editing), project administration. Christel N. Rushing: methodology, resources, data curation, formal analysis, writing (review and editing). Holly C. Lewis: data review, writing (original draft, review, and editing). Marguerite M. Rooney: data review, writing (original draft, review, and editing). Dan G. Blazer: data analysis, writing (review and editing). Samantha Thomas: data analysis, writing (review and editing). E. Shelley Hwang: data analysis, writing (review and editing). Rachel A. Greenup: conceptualization, methodology, resources, data analysis, writing (review and editing), project administration.
Ethics declarations
Conflict of interest
The authors report no proprietary or commercial interest in any product mentioned or concept discussed in this article. The authors have no relevant financial or non-financial interests to disclose. Dr. J. Plichta is a recipient of research funding from the Color Foundation (PI: Plichta). The content of this manuscript is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.
About this article
Cite this article
Plichta, J.K., Rushing, C.N., Lewis, H.C. et al. Implications of missing data on reported breast cancer mortality. Breast Cancer Res Treat 197, 177–187 (2023). https://doi.org/10.1007/s10549-022-06764-4
DOI: https://doi.org/10.1007/s10549-022-06764-4