A New Method of Identifying Pathologic Complete Response After Neoadjuvant Chemotherapy for Breast Cancer Patients Using a Population-Based Electronic Medical Record System

Wu, Guosong; Cheligeer, Cheligeer; Brisson, Anne-Marie; Quan, May Lynn; Cheung, Winson Y.; Brenner, Darren; Lupichuk, Sasha; Teman, Carolin; Basmadjian, Robert Barkev; Popwich, Brittany; Xu, Yuan

doi:10.1245/s10434-022-12955-6


Global Health Services Research
Published: 21 December 2022

Volume 30, pages 2095–2103 (2023)

Annals of Surgical Oncology

Guosong Wu PhD^1,2,
Cheligeer Cheligeer PhD^2,3,
Anne-Marie Brisson MD, MSc^4,
May Lynn Quan MD, MSc^1,4,6,
Winson Y. Cheung MD, MPH^5,6,
Darren Brenner PhD^1,4,
Sasha Lupichuk MD, MSc^4,
Carolin Teman MD^7,
Robert Barkev Basmadjian MSc^1,
Brittany Popwich BSc^1,4,6 &
Yuan Xu MD, PhD^1,4,6


Abstract

Background

Accurate identification of pathologic complete response (pCR) from population-based electronic narrative data in a timely and cost-efficient manner is critical. This study aimed to derive and validate a set of natural language processing (NLP)-based machine-learning algorithms to capture pCR from surgical pathology reports of breast cancer patients who underwent neoadjuvant chemotherapy (NAC).
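As a rough illustration of what an NLP step over surgical pathology reports can look like, the sketch below turns free report text into binary finding features using regular expressions plus a tiny NegEx-style negation check (a negation cue earlier in the same sentence cancels the finding). The finding names, patterns, and negation cues are hypothetical; the abstract does not describe which features the authors extracted or how.

```python
from __future__ import annotations

import re

# Hypothetical finding patterns, for illustration only; not the study's features.
FINDINGS = {
    "residual_invasive": re.compile(r"residual invasive (?:ductal |lobular )?carcinoma"),
    "dcis": re.compile(r"ductal carcinoma in situ|\bdcis\b"),
    "nodal_metastasis": re.compile(r"metasta\w* carcinoma"),
}
# Hypothetical negation cues: a cue earlier in the same sentence negates a finding.
NEGATION_CUES = re.compile(r"\b(?:no|negative for|without|absence of)\b")


def extract_features(report: str) -> dict[str, int]:
    """Map a free-text pathology report to binary (0/1) finding features."""
    features = {name: 0 for name in FINDINGS}
    # Split on rough sentence boundaries so a cue only affects its own sentence.
    for sentence in re.split(r"[.;\n]+", report.lower()):
        for name, pattern in FINDINGS.items():
            for match in pattern.finditer(sentence):
                preceding = sentence[: match.start()]
                if not NEGATION_CUES.search(preceding):
                    features[name] = 1  # at least one affirmed (non-negated) mention
    return features


report = (
    "Breast: no residual invasive carcinoma. Residual ductal carcinoma in situ present. "
    "Lymph nodes: negative for metastatic carcinoma (0/3)."
)
print(extract_features(report))
```

Under the FDA definition of pCR (ypT0/Tis ypN0: no residual invasive cancer in the breast or sampled regional lymph nodes, with residual DCIS allowed), a report with `residual_invasive == 0` and `nodal_metastasis == 0`, like the example above, would be a pCR candidate. A sentence window is a crude negation scope; real reports with lists, synoptic templates, and section headings usually need more careful handling.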

Methods

This retrospective cohort study included all patients with invasive breast cancer who underwent NAC followed by curative-intent surgery during an admission to any of the four tertiary acute care hospitals in Calgary, Alberta, Canada, between 1 January 2010 and 31 December 2017. Surgical pathology reports were extracted and processed with NLP. Decision tree classifiers were constructed and validated against chart review results. The machine-learning algorithms were evaluated with a set of performance metrics: sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, area under the receiver operating characteristic curve (AUC), and F1 score.
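To make the decision-tree step concrete, here is a minimal CART-style classifier written from scratch: at each node it greedily picks the binary feature whose split gives the lowest weighted Gini impurity, then recurses up to a fixed depth. This is a simplified stand-in (binary features only, no pruning, no class weights, no probability output for AUC), not the authors' model; in practice a library implementation such as scikit-learn's `DecisionTreeClassifier` would normally be used. The feature names and toy labels are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    prediction: int              # majority class among training rows at this node
    feature: str | None = None   # splitting feature; None marks a leaf
    left: Node | None = None     # subtree for rows where feature == 0
    right: Node | None = None    # subtree for rows where feature == 1


def gini(labels: list[int]) -> float:
    """Gini impurity of 0/1 labels, 2p(1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def fit(rows: list[dict[str, int]], labels: list[int], depth: int = 0, max_depth: int = 2) -> Node:
    # Majority vote; ties go to class 0 (no pCR).
    node = Node(prediction=int(2 * sum(labels) > len(labels)))
    if depth >= max_depth or gini(labels) == 0.0:
        return node  # stop at the depth limit or when the node is pure
    best_feature, best_score = None, gini(labels)
    for feature in rows[0]:
        left = [y for x, y in zip(rows, labels) if x[feature] == 0]
        right = [y for x, y in zip(rows, labels) if x[feature] == 1]
        if not left or not right:
            continue  # this feature does not split the current rows
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:  # strict '<' keeps the first feature on ties
            best_feature, best_score = feature, score
    if best_feature is None:
        return node
    node.feature = best_feature
    for value in (0, 1):
        idx = [i for i, x in enumerate(rows) if x[best_feature] == value]
        child = fit([rows[i] for i in idx], [labels[i] for i in idx], depth + 1, max_depth)
        if value == 0:
            node.left = child
        else:
            node.right = child
    return node


def predict(node: Node, row: dict[str, int]) -> int:
    """Walk from the root to a leaf and return its majority class."""
    while node.feature is not None:
        node = node.right if row[node.feature] == 1 else node.left
    return node.prediction


# Hypothetical toy data: label 1 (pCR) means no residual invasive disease and
# no nodal metastasis; residual DCIS alone does not rule out pCR.
rows = [
    {"residual_invasive": 0, "dcis": 0, "nodal_metastasis": 0},
    {"residual_invasive": 0, "dcis": 1, "nodal_metastasis": 0},
    {"residual_invasive": 1, "dcis": 0, "nodal_metastasis": 0},
    {"residual_invasive": 1, "dcis": 1, "nodal_metastasis": 1},
    {"residual_invasive": 0, "dcis": 0, "nodal_metastasis": 1},
    {"residual_invasive": 1, "dcis": 0, "nodal_metastasis": 1},
]
labels = [1, 1, 0, 0, 0, 0]
tree = fit(rows, labels)
print(tree.feature, tree.left.feature)  # residual_invasive nodal_metastasis
```

On this toy data the tree splits on `residual_invasive` and then `nodal_metastasis`, and never uses `dcis`. Capping the depth is one simple way to keep such a tree small enough to read, which is a common reason for choosing decision trees in clinical text classification.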

Results

The study included 351 female patients, of whom 102 (29%) achieved pCR after NAC. The high-sensitivity algorithm achieved a sensitivity of 90.5% (95% confidence interval [CI], 69.6–98.9%), a PPV of 76% (95% CI, 59.6–87.2%), an accuracy of 88.6% (95% CI, 78.7–94.9%), an AUC of 0.891 (95% CI, 0.795–0.987), and an F1 score of 82.61. The high-PPV algorithm reached a sensitivity of 85.7% (95% CI, 63.7–97%), a PPV of 81.8% (95% CI, 63.4–92.1%), an accuracy of 90% (95% CI, 80.5–95.9%), an AUC of 0.888 (95% CI, 0.790–0.985), and an F1 score of 83.72. The high-F1-score algorithm performed equivalently to the high-PPV algorithm.
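All of the threshold-based metrics above come from a 2×2 confusion matrix of algorithm output against chart review, and the F1 score is the harmonic mean of PPV and sensitivity. The sketch below computes them and adds an exact (Clopper-Pearson) binomial 95% CI found by bisection. Two assumptions to flag loudly: the abstract does not report the confusion matrices, so the counts in the usage lines (a hypothetical 70-report validation set with 21 pCR cases) were back-calculated only because they reproduce the reported percentages; and the abstract does not say which interval method was used for each metric (predictive values are often given other intervals, such as logit-based ones), so `clopper_pearson` is a generic illustration rather than the study's method.

```python
from __future__ import annotations

from math import comb


def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Diagnostic-accuracy metrics from a 2x2 table (reference standard: chart review)."""
    sensitivity = tp / (tp + fn)
    ppv = tp / (tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "ppv": ppv,
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        # Harmonic mean of PPV (precision) and sensitivity (recall);
        # algebraically equal to 2TP / (2TP + FP + FN).
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
    }


def _binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(g, target: float, increasing: bool, iters: int = 60) -> float:
    """Solve g(p) = target on [0, 1] for a monotone function g."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (g(mid) < target) == increasing:
            lo = mid  # root lies above mid
        else:
            hi = mid  # root lies at or below mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided (1 - alpha) CI for the proportion x / n."""
    # Lower bound solves P(X >= x | p) = alpha / 2 (increasing in p).
    lower = 0.0 if x == 0 else _bisect(lambda p: 1 - _binom_cdf(x - 1, n, p), alpha / 2, True)
    # Upper bound solves P(X <= x | p) = alpha / 2 (decreasing in p).
    upper = 1.0 if x == n else _bisect(lambda p: _binom_cdf(x, n, p), alpha / 2, False)
    return lower, upper


# Hypothetical counts (not reported in the abstract), back-calculated to match
# the high-sensitivity results: TP=19, FP=6, FN=2, TN=43.
m = confusion_metrics(tp=19, fp=6, fn=2, tn=43)
print({k: round(100 * v, 2) for k, v in m.items()})
print(clopper_pearson(19, 21))  # exact 95% CI for a sensitivity of 19/21
```

With these counts the function returns a sensitivity of 90.48, PPV of 76.0, accuracy of 88.57, and F1 of 82.61, in line with the reported high-sensitivity figures; the equally hypothetical counts TP=18, FP=4, FN=3, TN=45 give an F1 of 83.72 and an accuracy of 90% for the high-PPV algorithm. AUC is not covered here because it needs ranked scores rather than a single threshold.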

Conclusion

The developed algorithms demonstrated excellent accuracy in identifying pCR from surgical pathology reports of breast cancer patients who received NAC treatment.



Acknowledgment

This project, entitled “Building Pipeline to Transform Real-World Data to Evidence to Improve Cancer Care,” was supported by the Canadian Cancer Society (CCS).

Author information

Authors and Affiliations

Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
Guosong Wu PhD, May Lynn Quan MD, MSc, Darren Brenner PhD, Robert Barkev Basmadjian MSc, Brittany Popwich BSc & Yuan Xu MD, PhD
The Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
Guosong Wu PhD & Cheligeer Cheligeer PhD
Alberta Health Services, Calgary, AB, Canada
Cheligeer Cheligeer PhD
Departments of Oncology, Community Health Sciences, and Surgery, and The Centre for Health Informatics, Cumming School of Medicine, University of Calgary, 3280 Hospital Drive NW, Calgary, AB, T2N 4Z6, Canada
Anne-Marie Brisson MD, MSc, May Lynn Quan MD, MSc, Darren Brenner PhD, Sasha Lupichuk MD, MSc, Brittany Popwich BSc & Yuan Xu MD, PhD
Department of Radiology, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
Winson Y. Cheung MD, MPH
Department of Surgery, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
May Lynn Quan MD, MSc, Winson Y. Cheung MD, MPH, Brittany Popwich BSc & Yuan Xu MD, PhD
Department of Pathology and Laboratory Medicine, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
Carolin Teman MD


Corresponding author

Correspondence to Yuan Xu MD, PhD.

Ethics declarations

DISCLOSURE

The authors declare that they have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information


Supplementary file1 (DOCX 22 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Wu, G., Cheligeer, C., Brisson, AM. et al. A New Method of Identifying Pathologic Complete Response After Neoadjuvant Chemotherapy for Breast Cancer Patients Using a Population-Based Electronic Medical Record System. Ann Surg Oncol 30, 2095–2103 (2023). https://doi.org/10.1245/s10434-022-12955-6


Received: 02 August 2022
Accepted: 01 December 2022
Published: 21 December 2022
Issue Date: April 2023
DOI: https://doi.org/10.1245/s10434-022-12955-6
