Predicting treatment response, upfront or as early as possible, is important for offering BC patients individualised treatment. Currently, evaluation of response to NACT includes anatomical imaging, functional imaging (metabolic evaluation through PET/CT), and possibly biomarker evaluation [10, 25]. To the best of the authors’ knowledge, this study is the first to investigate the use of AI on baseline DM to predict treatment response. In this report, we present the results of a deep learning–based method applied to baseline DM and its ability to identify patients who subsequently achieved pCR, resulting in an AUC of 0.71.
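Because the AUC is the performance measure used throughout this discussion, a minimal sketch of how it can be computed may be helpful. The function name `roc_auc` and the labels and scores below are hypothetical illustrations, not data or code from this study. The sketch uses the rank interpretation of the AUC: the probability that a randomly chosen pCR case receives a higher model score than a randomly chosen non-pCR case, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """Return the AUC for binary labels (1 = pCR, 0 = non-pCR) and model scores.

    Uses the pairwise (Mann-Whitney) formulation: the fraction of
    (pCR, non-pCR) pairs in which the pCR case has the higher score,
    counting tied scores as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one pCR and one non-pCR case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 3 pCR and 4 non-pCR patients (illustrative values only).
labels = [1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.6, 0.7, 0.2, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.2f}")
```

Under this reading, an AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of pCR from non-pCR cases.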
AI and treatment response evaluation
Tahmassebi et al. conducted a study using machine learning based on both pre- and during-NACT MRI (N = 38) with residual cancer burden as the outcome measure (class zero being defined as pCR), which yielded an AUC of 0.86. Qu et al. applied a deep learning–based method to MRI (N/training = 244, N/validation = 58) using pCR as the outcome measure and reported an AUC of 0.55 using pre-NACT data, compared with an AUC of 0.97 when using post-NACT data or the combination of pre- and post-NACT MRI. Sutton et al. applied machine learning to pre- and post-NACT MRI (N/training = 222, N/validation = 56) and showed an AUC between 0.78 and 0.83. In the latter model, the molecular subtype was added to radiomics. In the I-SPY TRIAL breast MRI database, a CNN algorithm applied to MRI (N = 131) showed an AUC of 0.72. Similarly, a CNN used in a pre-NACT MRI study by Ha et al. (N = 141) showed an AUC as high as 0.98. Cain et al. built multivariate machine learning models (logistic regression and a support vector machine) based on pre-NACT MRI (N/training = 144, N/validation = 144), which resulted in an AUC of 0.71. Despite the difference in modality, our results suggest that the performance of our AI model on DM is in the range of that reported for models based on pre-NACT MRI. The most obvious advantage of DM is its easy accessibility worldwide, in contrast to the more expensive and complicated imaging methods of MRI and PET/CT.
Predictive factors for NACT response
It is well known that different BC subtypes, with their heterogeneous biology, respond differently to NACT. Generally, the most aggressive BC subtypes are associated with higher pCR rates. On the other hand, the relevance of pCR as an outcome measure is less certain for luminal BC, which is often considered a less aggressive subtype [32, 33]. In addition to BC subtype and immunohistochemical parameters, immunological markers (such as tumour-infiltrating lymphocytes), tumour-genetic profiles (which are commonly used in the adjuvant setting), and immune-associated signatures hold predictive information at baseline [34, 35, 36]. However, the tumour and its relation to the surrounding tissue must also be taken into account. The local microenvironment and the systemic host characteristics also influence tumour response, yet these properties are not routinely included in medical decision-making and treatment algorithms [11, 37]. Increased mammographic density, MRI background parenchymal enhancement, higher age and higher body mass index have been suggested to be associated with lower rates of pCR [13, 38, 39, 40]. In addition, multiple studies have investigated dynamic predictive factors, for example, predicting pCR status from changes in various biomarkers, such as tumour immune microenvironments, measurements of cell loss and circulating tumour cells. Many tumour response studies using structural and functional imaging have been published, covering both conventional and state-of-the-art modalities, including mammography, tomosynthesis, ultrasound, MRI, PET and shear wave elastography [44, 45, 46, 47]. Evaluating AI in breast tomosynthesis would be an interesting line of research for future studies. To fine-tune predictive information, many nomograms have been developed that combine multiple parameters with the aim of optimising precision in estimating the response to NACT.
Recently, the concept has been further developed by evaluating the predictive performance of machine learning using clinical and pathological data.
Implications of identifying pCR/non-pCR
In order to individualise NACT treatment, more biomarkers, including imaging biomarkers, are needed. Tools for early discrimination of responders from non-responders could aid clinical decision-making, motivate patients to continue treatment, and enable the concept of response-guided treatment as introduced in the GeparTrio study. Since response-guided treatment currently lacks convincing evidence of benefit, the common strategy is to complete NACT unless evident progression or intolerable side effects occur [11, 37]. Early identification of patients who are not likely to achieve pCR after subsequently administered NACT has the potential to improve tailored treatment and to escalate or de-escalate treatment accordingly. On the other hand, in the post-NACT setting, the potential clinical gain is mostly a surgical matter; if imaging in combination with minimally invasive procedures could correctly identify pCR to a sufficiently high degree, further invasive surgery may not be needed for these patients.
Digital mammograms: AI versus breast radiologists
To put the performance of our AI model in relation to that of radiologists, we note that seven experienced radiologists evaluating DM jointly reached an AUC of 0.71 in discriminating between pCR and non-pCR at the post-NACT time point (unpublished data from the NeoDense trial). The baseline NeoDense study-specific protocol was not designed for the radiologists to estimate subsequent pCR status after completion of NACT; therefore, a more direct comparison between performances at the pre-NACT time point was not possible. Likewise, comparison with MRI is not possible since this modality is not currently used for this purpose at the study sites.
Strengths and limitations
We present the results of a relatively large BC patient cohort who received NACT according to clinical routine. Conventional imaging was used according to the local routine at the time; thus, MRI, as used by many other researchers, was not available. While this hinders direct comparison, the use of DM makes our study unique since, to the best of our knowledge, no literature is available concerning the application of AI to baseline DM to predict treatment response during NACT for BC patients. Before AI training, a test set of patients was set aside for final assessment, enhancing the validity of our results. The concerns with a binary outcome such as pCR must be briefly acknowledged. Many post-NACT pathological assessment scores also reflect partial responses, possibly providing a more nuanced prognosis. The importance of such graded scores is most evident when considering salvage adjuvant chemotherapy, for which patients with residual cancer burden score 1 (“near-pCR”) show as good an outcome as patients who achieved pCR. Nevertheless, pCR is still the most widely accepted endpoint in NACT studies.
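The idea of setting aside a test set before any training can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in this study: the function name `stratified_holdout`, the 20% test fraction, the stratification by pCR status and the patient identifiers are all hypothetical. The split is made at the patient level so that no patient contributes images to both the training and the test set.

```python
import random


def stratified_holdout(patient_ids, pcr_status, test_fraction=0.2, seed=0):
    """Split patients into training and test sets, stratified by pCR status.

    test_fraction and seed are illustrative assumptions; the source does not
    state the values used. Splitting by patient (not by image) keeps all
    images of one patient on the same side of the split.
    """
    rng = random.Random(seed)
    train, test = [], []
    for cls in (0, 1):  # 0 = non-pCR, 1 = pCR
        group = [p for p, y in zip(patient_ids, pcr_status) if y == cls]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Hypothetical cohort: 20 patients, 5 of whom achieved pCR.
ids = [f"patient_{i:02d}" for i in range(20)]
status = [1] * 5 + [0] * 15
train_ids, test_ids = stratified_holdout(ids, status)
print(len(train_ids), "training patients,", len(test_ids), "test patients")
```

Stratifying keeps the proportion of pCR cases similar in both sets, which matters when the outcome is relatively uncommon and the test set is small.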
A limitation of our study is the heterogeneity of the cohort in terms of both BC subtype and time period for NACT treatment. Unfortunately, the cohort was not large enough to perform subgroup analyses according to BC subtype, since AI modelling demands a large number of images. Here, we briefly address possible concerns regarding the long recording period (2005–2019) and possible changes in NACT treatment during this period. For both cohorts, the standard NACT consisted of series of FEC or EC followed by series of taxanes (docetaxel or paclitaxel), combined with HER2 blockade (trastuzumab/pertuzumab) in the case of HER2-positive tumours. Thus, the NACT regimen was consistent during the recording period, and we therefore believe its impact on our results to be minor.
Next, we will train AI using dynamic DM from three time points during NACT and further explore explainable AI by identifying the areas on the mammograms that the AI finds most informative in order to generate “heat maps.” In addition, the concept of AI-guided response evaluation during NACT could be applied to medical images of other organs.