Large Datasets for Disparities Research in Breast Cancer

  • Breast Cancer Disparities (LA Newman, Section Editor)
  • Current Breast Cancer Reports

Abstract

Purpose of Review

Breast cancer disparities affect how different populations are impacted by breast cancer incidence, mortality, and survival. We provide an overview of large datasets that scientists can use to study disparities in breast cancer outcomes.

Recent Findings

Many large datasets are accessible at little or no cost to disparities researchers who have a project plan. Yet only two datasets have been used extensively in breast cancer disparities publications. Other datasets combine administrative claims, molecular, electronic health record, patient-reported, imaging, and clinical trial data in ways that could benefit disparities research.

Summary

Many existing datasets lack sufficient diversity or detail in key disparity variables. With this review of the different datasets available and their potential pitfalls, researchers will be better equipped to conduct studies that can identify and solve the problems that lead to health outcome disparities for breast cancer patients.


Acknowledgments

The authors would like to thank all the dataset representatives who responded to the questionnaire. We also thank Kristine De La Torre, PhD, for assistance with identifying datasets of interest and data collection. Finally, we thank Susan G. Komen for funding work for all three of the authors.

Author information

Corresponding author

Correspondence to Alex Cheng.

Ethics declarations

Conflict of Interest

Mia Levy reports ESAB from Personalis, Inc.; royalties from GenomOncology, Inc.; and serving on the advisory board for Roche DIS outside the submitted work. Alex Cheng and Jerome Jourquin declare no conflicts of interest relevant to this manuscript.

Human and Animal Rights and Informed Consent

This article does not contain any studies with human or animal subjects performed by any of the authors.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the Topical Collection on Breast Cancer Disparities

About this article

Cite this article

Cheng, A., Jourquin, J. & Levy, M. Large Datasets for Disparities Research in Breast Cancer. Curr Breast Cancer Rep 12, 140–148 (2020). https://doi.org/10.1007/s12609-020-00367-y

  • DOI: https://doi.org/10.1007/s12609-020-00367-y
