Implications of missing data on reported breast cancer mortality

Plichta, Jennifer K.; Rushing, Christel N.; Lewis, Holly C.; Rooney, Marguerite M.; Blazer, Dan G.; Thomas, Samantha M.; Hwang, E. Shelley; Greenup, Rachel A.

doi:10.1007/s10549-022-06764-4

Epidemiology
Published: 05 November 2022

Volume 197, pages 177–187, (2023)
Breast Cancer Research and Treatment

Jennifer K. Plichta^1,2,3 (ORCID: 0000-0002-7411-0558),
Christel N. Rushing^3,4,
Holly C. Lewis^1,
Marguerite M. Rooney^1,
Dan G. Blazer^1,3,
Samantha M. Thomas^3,4,5,
E. Shelley Hwang^1,3 &
Rachel A. Greenup^1,2,3

Abstract

Background

National cancer registries are valuable tools to analyze patterns of care and clinical outcomes; yet, missing data may impact the accuracy and generalizability of these data. We sought to evaluate the association between missing data and overall survival (OS).

Methods

Using the National Cancer Database (NCDB) and the Surveillance, Epidemiology, and End Results (SEER) Program, we assessed data missingness among patients diagnosed with invasive breast cancer from 2010 to 2014. Key variables included demographic (age, race, ethnicity, insurance, education, income), tumor (grade, ER, PR, HER2, TNM stages), and treatment (surgery in both databases; chemotherapy and radiation in NCDB) variables. OS was compared between those with and without missing data using Cox proportional hazards models.
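As a rough illustration of the missingness assessment described above, the following Python sketch flags records that lack at least one key variable and groups the gaps by variable type. The field names, the coding of missing values (None, empty string, "unknown"), and the toy records are all assumptions made for this example; the actual NCDB and SEER variable names and missing-value codes differ, and this is not the authors' code.

```python
# Hypothetical field names; real NCDB/SEER variable names and codes differ.
KEY_VARIABLES = {
    "demographic": ["age", "race", "ethnicity", "insurance", "education", "income"],
    "tumor": ["grade", "er", "pr", "her2", "t_stage", "n_stage", "m_stage"],
    "treatment": ["surgery"],
}


def is_missing(value):
    """Treat None, empty strings, and 'unknown' as missing (an assumed coding)."""
    if value is None:
        return True
    return isinstance(value, str) and value.strip().lower() in {"", "unknown"}


def missing_groups(record):
    """Return the variable groups in which this record lacks at least one key variable."""
    return {
        group
        for group, fields in KEY_VARIABLES.items()
        if any(is_missing(record.get(field)) for field in fields)
    }


def summarize(records):
    """Count records missing any key variable and records missing tumor data."""
    flagged = [missing_groups(r) for r in records]
    n_any = sum(1 for groups in flagged if groups)
    n_tumor = sum(1 for groups in flagged if "tumor" in groups)
    return {"n": len(records), "missing_any": n_any, "missing_tumor": n_tumor}


# Toy records, invented for illustration only.
complete = {field: "known" for fields in KEY_VARIABLES.values() for field in fields}
records = [
    complete,
    {**complete, "grade": "unknown"},  # missing a tumor variable
    {**complete, "income": None},      # missing a demographic variable
]
print(summarize(records))
```

A per-record flag of this kind could then serve as the exposure variable in a survival model comparing patients with and without missing data.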

Results

Overall, 775,996 patients in the NCDB and 263,016 in SEER were identified; 29% and 13%, respectively, were missing at least 1 key variable. Of those, the overwhelming majority (NCDB 80%; SEER 88%) were missing tumor variables. Compared with patients with complete data, those with missing data had a greater risk of death: NCDB HR 1.23 (99% CI 1.21–1.25) and SEER HR 2.11 (99% CI 2.05–2.18). Patients with complete tumor data had higher unadjusted 5-year OS estimates than the entire sample: NCDB 82.7% vs 81.8% and SEER 83.5% vs 81.7%.
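To give a sense of scale, the percentages above can be converted into approximate patient counts. The short Python sketch below does only that arithmetic. Because it starts from the rounded percentages in the abstract, the resulting counts are approximations and are not figures reported by the authors.

```python
# Cohort sizes and rounded percentages as stated in the abstract:
# (patients identified, % missing >= 1 key variable, % of those missing tumor variables)
COHORTS = {
    "NCDB": (775_996, 29, 80),
    "SEER": (263_016, 13, 88),
}


def approximate_counts(cohorts):
    """Back out approximate patient counts from the rounded percentages."""
    counts = {}
    for name, (n_total, pct_any, pct_tumor) in cohorts.items():
        n_any = round(n_total * pct_any / 100)     # missing at least 1 key variable
        n_tumor = round(n_any * pct_tumor / 100)   # of those, missing tumor variables
        counts[name] = {"missing_any": n_any, "missing_tumor": n_tumor}
    return counts


for name, result in approximate_counts(COHORTS).items():
    print(name, result)
```

Under these assumptions, roughly 225,000 NCDB patients and 34,000 SEER patients would have been missing at least one key variable.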

Conclusions

Missingness of select variables is not uncommon within large national cancer registries and is associated with worse OS. Excluding patients with missing variables may introduce unintended bias into analyses and yield findings that underestimate breast cancer mortality.
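The direction of this bias can be seen in a toy calculation. The Python sketch below uses an entirely hypothetical cohort of 100 patients (80 with complete data, 20 with missing data, with survival proportions invented for illustration) and ignores censoring, so it is far simpler than the Cox and survival analyses in the study. It shows only that dropping a subgroup with worse survival raises the apparent survival of the remaining sample.

```python
def five_year_os(patients):
    """Proportion of patients alive at 5 years (no censoring, for illustration only)."""
    alive = sum(1 for p in patients if p["alive_at_5y"])
    return alive / len(patients)


# Hypothetical cohort: 68 of 80 complete cases and 14 of 20 missing-data cases survive.
complete_cases = [{"complete": True, "alive_at_5y": i < 68} for i in range(80)]
missing_cases = [{"complete": False, "alive_at_5y": i < 14} for i in range(20)]
cohort = complete_cases + missing_cases

os_all = five_year_os(cohort)                                      # 82 / 100
os_complete = five_year_os([p for p in cohort if p["complete"]])   # 68 / 80

print(f"Entire sample:       {os_all:.1%}")
print(f"Complete cases only: {os_complete:.1%}")
```

In this invented example the complete-case estimate exceeds the whole-cohort estimate, which corresponds to understating mortality when patients with missing data are excluded.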

Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Acknowledgements

The National Cancer Data Base (NCDB) is a joint project of the Commission on Cancer (CoC) of the American College of Surgeons and the American Cancer Society. The CoC's NCDB and the hospitals participating in the CoC NCDB are the source of the de-identified data used herein; they have not verified and are not responsible for the statistical validity of the data analysis or the conclusions derived by the authors.

Funding

This work was in part supported by Duke Cancer Institute through NIH grant P30CA014236 (PI: Kastan) for the Biostatistics Core.

Author information

Authors and Affiliations

Department of Surgery, Duke University Medical Center, DUMC 3513, Durham, NC 27710, USA
Jennifer K. Plichta, Holly C. Lewis, Marguerite M. Rooney, Dan G. Blazer, E. Shelley Hwang & Rachel A. Greenup
Department of Population Health Sciences, Duke University Medical Center, DUMC 3513, Durham, NC 27710, USA
Jennifer K. Plichta & Rachel A. Greenup
Duke Cancer Institute, Durham, NC, USA
Jennifer K. Plichta, Christel N. Rushing, Dan G. Blazer, Samantha M. Thomas, E. Shelley Hwang & Rachel A. Greenup
Biostatistics Shared Resource, Duke Cancer Institute, Durham, NC, USA
Christel N. Rushing & Samantha M. Thomas
Department of Biostatistics & Bioinformatics, Duke University, Durham, NC, USA
Samantha M. Thomas

Contributions

Jennifer K. Plichta: conceptualization, methodology, data analysis, writing (original draft, review, and editing), project administration. Christel N. Rushing: methodology, resources, data curation, formal analysis, writing (review and editing). Holly C. Lewis: data review, writing (original draft, review, and editing). Marguerite M. Rooney: data review, writing (original draft, review, and editing). Dan G. Blazer: data analysis, writing (review and editing). Samantha M. Thomas: data analysis, writing (review and editing). E. Shelley Hwang: data analysis, writing (review and editing). Rachel A. Greenup: conceptualization, methodology, resources, data analysis, writing (review and editing), project administration.

Corresponding author

Correspondence to Jennifer K. Plichta.

Ethics declarations

Conflict of interest

The authors report no proprietary or commercial interest in any product mentioned or concept discussed in this article. The authors have no relevant financial or non-financial interests to disclose. Dr. J. Plichta is a recipient of research funding from the Color Foundation (PI: Plichta). The content of this manuscript is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 420 kb)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Plichta, J.K., Rushing, C.N., Lewis, H.C. et al. Implications of missing data on reported breast cancer mortality. Breast Cancer Res Treat 197, 177–187 (2023). https://doi.org/10.1007/s10549-022-06764-4

Received: 25 April 2022
Accepted: 06 October 2022
Published: 05 November 2022
Issue Date: January 2023
DOI: https://doi.org/10.1007/s10549-022-06764-4
