Introduction

Female breast cancer has surpassed lung cancer as the most commonly diagnosed cancer, and it is the leading cause of cancer-related death among women worldwide [1]. Axillary lymph node metastasis (ALNM) is one of the essential prognostic factors for breast cancer and guides therapy decisions [2]. Currently, the gold standard for diagnosing ALNM is still pathological examination. Axillary lymph node dissection (ALND) and sentinel lymph node biopsy (SLNB) are the procedures most commonly recommended in clinical practice. However, both are invasive and carry complications such as numbness, seroma, lymphedema, and infection [3]. SLNB has also been criticized for its high false-negative rate [4]. A noninvasive and precise approach to evaluating axillary lymph node status preoperatively would therefore be beneficial, reducing unnecessary lymph node surgery and patient distress.

Breast magnetic resonance imaging (MRI), as a noninvasive method, has been widely used in clinical practice for a variety of indications, including screening of high-risk women, tumor staging, and assessment of response to neoadjuvant chemotherapy (NACT). Typical breast MRI protocols include multiple sequences, for instance, T1-weighted imaging (T1WI), T2-weighted imaging (T2WI), diffusion-weighted imaging (DWI), and dynamic contrast-enhanced MRI (DCE-MRI). MRI features derived mainly from DCE and DWI can independently predict lymph node status. A low apparent diffusion coefficient (ADC) value and rim enhancement of the tumor in patients with breast cancer have been associated with lymph node metastasis [5, 6]. However, manual annotation of tumor imaging characteristics is generally limited to a few qualitative descriptors and is subject to observer variability [7, 8].

Radiomics, a hot research topic in recent years, is the process of converting medical images into mineable data by extracting high-throughput quantitative features [9]. The subsequent analysis of these features can expose intratumor heterogeneity and provide potential noninvasive biomarkers for clinical decision support [9, 10]. A radiomic nomogram, a graphic representation of a model that combines a radiomic signature with clinical characteristics, has improved the prediction of axillary lymph node metastasis in breast cancer [11].

In combination with deep learning features automatically learned by convolutional neural networks, radiomics has shown excellent performance in cancer diagnosis [12, 13]. Compared with predefined handcrafted radiomics features, deep features of the tuned model are high-level features learned directly from image pixels in a data-driven way, which could supply additional predictive information and improve model performance. To date, most studies have used traditional radiomics methods and obtained moderate performance by manually extracting features from only one sequence, dynamic contrast-enhanced MRI [11, 14, 15]. Few radiomics studies have combined multiple sequences for the prediction of ALNM [16, 17]. Until now, deep learning has rarely been combined with radiomics for the prediction of ALNM in breast cancer.

Therefore, the purpose of this study was to develop a noninvasive radiomic signature from preoperative DCE-MRI and DWI of the primary tumor, combined with clinicopathologic factors, to predict ALN metastasis in patients with invasive breast cancer. Such a signature may help identify patients who are very likely free of lymph node invasion and thereby reduce unnecessary invasive procedures.

Materials and Methods

Patients

This retrospective study was approved by the Institutional Ethics Committee of our hospital, and the informed consent requirement was waived (No. XHEC-D-2022–236). Between February 1, 2018, and June 30, 2020, a total of 488 lesions in 479 patients were included as the training/testing cohort. Nine patients had simultaneous bilateral breast cancer. The inclusion criteria were (1) contrast-enhanced MRI examination performed before surgery or biopsy; (2) histologically confirmed primary invasive breast cancer; and (3) SLN biopsy or ALND to evaluate the status of the ALN. The exclusion criteria were (1) biopsy or chemoradiotherapy before the MRI examination; (2) insufficient image quality; and (3) incomplete clinicopathological data.

Finally, a total of 488 lesions (166 lymph node metastasis positive and 322 negative) in 479 patients (mean age, 58.0 ± 11.8 years; age range, 28–89 years) who met the criteria, had DCE-MRI of the same spatial resolution, and had complete clinicopathologic characteristics were included in this study. The included patients were divided into two cohorts by time. Lesions diagnosed between February 2018 and October 2019 were assigned to a training cohort (n = 366, 123 positive LN/243 negative LN), and lesions diagnosed between November 2019 and June 2020 were included as an independent testing cohort (n = 122, 44 positive LN/78 negative LN). The recruitment pathway is shown in Supplementary material 1.

Clinicopathological Characteristics

Baseline clinical and histopathological data were collected from patient medical records and postoperative histopathology reports, including patient age, menopausal status, LN palpability, LN status (LN with macrometastasis or micrometastasis was considered positive), status of human epidermal growth factor receptor 2 (HER2), estrogen receptor (ER), and progesterone receptor (PR), Ki-67 index, histological tumor type and grade, and multifocality (yes or no). Tumor size obtained from the MRI report was also considered a clinical characteristic. The details of patient characteristics in the training and testing cohorts are presented in Table 1.

Table 1 Clinical and histopathological characteristics of patients in the entire, training, and testing cohorts

Imaging Acquisition

Imaging was performed on a 3.0 T whole-body MRI scanner (Ingenia, Philips, Netherlands). The patients were placed in the prone position with both breasts in an eight-channel phased-array breast coil. Contrast-enhanced T1-weighted imaging (T1 + C), T2-weighted imaging (T2WI), and diffusion-weighted imaging with quantitatively measured apparent diffusion coefficient (DWI-ADC) were acquired for analysis. The acquisition parameters of the protocols are given in Supplementary material 2. The enhanced T1 high-resolution isotropic volume excitation (e_THRIVE) images were obtained once before and four times after the intravenous injection of gadopentetate dimeglumine (Gd-DTPA; Beilu, Beijing, China) at a dose of 0.1 mmol/kg and a flow rate of 2 mL/s, followed by a 20 mL normal saline flush.

Tumor Segmentation

The slice showing the largest cross-section of the tumor was selected for analysis by a radiologist (Y.C., with 4 years of experience in breast MRI). The tumor was segmented manually using ITK-SNAP (open-source software, version 3.8, http://www.itksnap.org). Tumor segmentation was performed by the same radiologist, who was blinded to the clinical and pathological information of the patients, and all contours were reviewed by a senior radiologist (W.L., with more than 10 years of experience). For radiomic feature extraction, we used the DWI (images of b = 800 s/mm2), the ADC map, and the second and fourth postcontrast phases of the DCE sequence (dyn2 and dyn4). The tumor region of interest (ROI) was first delineated manually on the dyn2 image and on the DWI and was then copied to the corresponding sequence (dyn4 and ADC map, respectively), followed by manual adjustment of the contours on these sequences as needed. In patients with ipsilateral multifocal or multicentric lesions, only the largest tumor lesion was segmented for analysis.

Radiomics Features Extraction, Selection, and Radiomics Signature Construction

A total of 1000 deep learning features were extracted from each ROI by a Densely Connected Convolutional Network (DenseNet121) [18] pretrained on ImageNet, using Keras 2.0.5 with Python 3.7. Features of the two DCE phases and of the DWI-ADC sequence were extracted separately.

We used two methods to select and rank the most significant features prior to modeling ALN metastasis status: the maximum relevance minimum redundancy (mRMR) algorithm and the least absolute shrinkage and selection operator (LASSO) technique. First, mRMR was used to eliminate redundant features while retaining features highly correlated with the label. Then, LASSO with tenfold cross-validation was used to choose the optimal subset of features for the final model. To account for the class imbalance of the data, we used the Synthetic Minority Oversampling Technique (SMOTE) to oversample the minority class.
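
The oversampling idea can be illustrated with a minimal pure-Python sketch of SMOTE-style interpolation. It is not the implementation used in this study (the text does not name one); the function name `smote_sketch`, the neighbor count `k`, the seed, and the toy feature vectors are assumptions made for illustration only.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples.

    Each synthetic sample lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbors.
    Illustrative only: name, k, and seed are assumptions.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by distance to the base sample.
        others = [s for s in minority if s is not base]
        others.sort(key=lambda s: squared_distance(base, s))
        neighbor = rng.choice(others[:k])
        # Interpolate at a random position between base and neighbor.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy two-feature minority class (hypothetical values, not study data).
minority_samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote_sketch(minority_samples, n_new=3)
```

As a general precaution, synthetic samples of this kind are usually generated from the training data only, so that the testing cohort contains real cases exclusively.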

A radiomics score (rad-score) for each image sequence and for their combination was calculated by summing the selected features weighted by their corresponding coefficients.
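
A weighted sum of this kind can be sketched in a few lines. The function name `rad_score`, the optional `intercept` argument, and the example numbers are hypothetical; the actual features and coefficients are those reported in the supplementary formula.

```python
def rad_score(features, coefficients, intercept=0.0):
    """Linear combination of selected features and their coefficients.

    The intercept term is an assumption; set it to 0.0 for a pure
    weighted sum of the selected features.
    """
    if len(features) != len(coefficients):
        raise ValueError("features and coefficients must have the same length")
    return intercept + sum(c * f for c, f in zip(coefficients, features))


# Hypothetical values for three selected features (not taken from the study).
example_features = [0.5, 2.0, -1.0]
example_coefficients = [0.4, -0.1, 0.3]
example_score = rad_score(example_features, example_coefficients)
```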

Clinicopathological Model, Combined Model, and Nomogram Establishment

Univariate analysis was applied to identify statistically significant characteristics (P < 0.05) among the candidate clinicopathological variables: patient age, menopausal status, LN palpability, status of HER2, ER, and PR, Ki-67 index, histological tumor type and grade, multifocality, and tumor size. Backward stepwise multivariate logistic regression was used to construct the clinicopathological model from the significant clinicopathological characteristics and the combined model from the clinical features together with the radiomics score. To provide clinicians and patients with an individualized and easy-to-use tool for LN metastasis prediction, the combined model was presented as a nomogram.
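
Because a nomogram is a graphic form of a logistic regression model, the probability it reads out can be sketched as the logistic transform of a linear predictor. The intercept, coefficients, and patient values below are hypothetical placeholders, not the fitted values of this study.

```python
import math


def predicted_probability(intercept, coefficients, values):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear_predictor = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def odds_ratio(coefficient):
    """Odds ratio for a one-unit increase in a predictor: exp(b)."""
    return math.exp(coefficient)


# Hypothetical coefficients, in order: rad-score, LN palpability (0/1),
# tumor size, Ki-67 index. These are NOT the fitted values of the study.
hypothetical_intercept = -2.0
hypothetical_coefficients = [1.0, 1.5, 0.3, 0.01]
hypothetical_patient = [0.2, 1, 2.0, 30]

probability = predicted_probability(
    hypothetical_intercept, hypothetical_coefficients, hypothetical_patient
)
```

With these placeholder numbers the linear predictor is 0.6, so the predicted probability is about 0.65; a nomogram performs the same calculation by converting each term into points on a shared axis.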

Model Performance Evaluation

The radiomics signature, clinicopathological model, and combined model were constructed on the basis of data from the training cohort, and the performance of these models was then evaluated in the independent testing cohort.

Receiver operating characteristic (ROC) curve analysis was performed, and the area under the ROC curve (AUC) and accuracy were used to evaluate the performance of the models. The optimal cutoff threshold was identified by maximizing the Youden index (sensitivity + specificity − 1). The AUC, sensitivity, specificity, accuracy, positive predictive value (PPV), and negative predictive value (NPV) were then calculated using the cutoff identified on the ROC curve of the training cohort, and the same cutoff was applied to the testing cohort.
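
The evaluation steps described here can be sketched in pure Python as follows. The function names and the toy scores are assumptions for illustration; the study itself used R and Python packages that are not named in the text.

```python
def confusion_counts(scores, labels, cutoff):
    """Count TP, FP, TN, FN when score >= cutoff is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
    tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 1)
    return tp, fp, tn, fn


def ratio(numerator, denominator):
    """Safe division; returns NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def youden_cutoff(scores, labels):
    """Observed score that maximizes sensitivity + specificity - 1."""
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(scores)):
        tp, fp, tn, fn = confusion_counts(scores, labels, cutoff)
        j = ratio(tp, tp + fn) + ratio(tn, tn + fp) - 1.0
        if j > best_j:  # on ties, the lowest cutoff is kept
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


def auc(scores, labels):
    """Probability that a positive case outscores a negative one (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def metrics_at(scores, labels, cutoff):
    """Sensitivity, specificity, accuracy, PPV, and NPV at a fixed cutoff."""
    tp, fp, tn, fn = confusion_counts(scores, labels, cutoff)
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Toy data (hypothetical): the cutoff is chosen on the training scores only
# and then reused unchanged on the testing scores.
train_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
train_labels = [0, 0, 0, 1, 0, 1, 1, 1]
test_scores = [0.2, 0.5, 0.3, 0.8]
test_labels = [0, 1, 1, 0]

cutoff, youden_j = youden_cutoff(train_scores, train_labels)
train_auc = auc(train_scores, train_labels)
test_metrics = metrics_at(test_scores, test_labels, cutoff)
```

Fixing the cutoff on the training cohort, as in the text, avoids tuning the threshold on the data used for the final evaluation.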

The DeLong test was used to compare the models according to their AUC values. The agreement between the LN metastasis predictions and the actual outcomes was assessed using a calibration curve. In addition, the Hosmer–Lemeshow test was used to assess the calibration of the combined nomogram. To investigate the clinical usefulness of the deep learning signature, we used decision curve analysis (DCA) to estimate the standardized net benefit (sNB) at different threshold probabilities.
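
The quantity plotted in a decision curve can be sketched under the commonly used definition of net benefit, NB = TP/n − (FP/n) × pt/(1 − pt), with the standardized net benefit taken as NB divided by the prevalence. Whether the study used exactly this formulation is an assumption, and the function names and toy data are hypothetical.

```python
def net_benefit(probabilities, labels, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probabilities, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probabilities, labels) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Net benefit of the reference strategy that treats every patient."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


def standardized_net_benefit(nb, labels):
    """Net benefit divided by prevalence, so that the maximum is 1."""
    return nb / (sum(labels) / len(labels))


# Toy predicted probabilities and outcomes (hypothetical, not study data).
toy_probabilities = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]

nb_model = net_benefit(toy_probabilities, toy_labels, threshold=0.5)
nb_all = net_benefit_treat_all(toy_labels, threshold=0.5)
snb_model = standardized_net_benefit(nb_model, toy_labels)
```

The treat-none strategy has a net benefit of 0 at every threshold, so a model is useful at a given threshold when its curve lies above both 0 and the treat-all curve.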

Statistical Analysis

The Mann–Whitney U test and the χ2 test or Fisher’s exact test were used to assess differences in continuous and categorical variables, respectively. All statistical analyses for the present study were performed with R (version 3.5.1) and Python (version 3.7.0). A two-tailed P value < 0.05 indicated statistical significance.

Results

Radiomics Signature Construction

After feature extraction and selection, 23 features were selected to build the radiomics signature: 2 from dyn2, 7 from dyn4, 7 from DWI, and 7 from ADC. The rad-score, based on the selected features weighted by their coefficients, was calculated using the formula in Supplementary material 3. There was a statistically significant difference in rad-score between the LN metastasis-positive and LN metastasis-negative groups in both the training cohort (P < 0.0001) and the testing cohort (P = 0.0035). The radiomic scores of patients are shown in Fig. 1.

Fig. 1

Boxplots of the radiomic score in the training cohort (a) and testing cohort (b). Label 0 indicates axillary lymph node metastasis negative, and label 1 indicates axillary lymph node metastasis positive

As shown in Fig. 2, in the training cohort, the AUCs of the dyn2, dyn4, DWI, ADC, and combined radiomic signatures were 0.66, 0.72, 0.70, 0.69, and 0.76, respectively. In the testing cohort, the corresponding AUCs were 0.63, 0.63, 0.65, 0.63, and 0.66, respectively.

Fig. 2

a, b ROC curves of the dyn2, dyn4, DWI, ADC, and combined radiomic signatures for predicting LN metastasis in the training and testing cohorts

Construction of Clinicopathological Model and Combined Nomogram

After univariate analysis and backward stepwise multivariate logistic regression analysis, three clinicopathological factors were selected to construct the clinicopathological model: LN palpability (odds ratio (OR) = 6.04; 95% confidence interval (CI) = 3.06–12.54; P = 0.004), tumor size on MRI (OR = 1.45; 95% CI = 1.18–1.80; P = 0.104), and Ki-67 (OR = 1.01; 95% CI = 1.00–1.02; P = 0.099). We then developed a nomogram based on the radiomic signature and these three clinicopathological factors (Fig. 3).

Fig. 3

Nomogram for prediction of LN metastasis. Each value of a variable corresponds to a number of points on the scale at the top of the graph, and the sum of the points for all variables gives the total points; a line drawn from the total points axis down to the bottom axis gives the probability of LN metastasis

Classification Performance

The calibration of the nomogram was tested using the Hosmer–Lemeshow test, which yielded a non-significant result (P = 0.23 in the training cohort), providing evidence of good calibration (Fig. 4a, b). The nomogram displayed an AUC of 0.80 (95% CI [0.75, 0.84]) for predicting LN metastases in the training cohort, and the sensitivity, specificity, and accuracy were 56%, 85%, and 72%, respectively. In the testing cohort, it retained acceptable discriminative ability, with an AUC of 0.71 (95% CI [0.61, 0.81]), and the sensitivity, specificity, and accuracy were 65%, 80%, and 75%, respectively (Fig. 5a, b, Table 2).

Fig. 4

a, b Calibration curve of the nomogram for the training cohort and testing cohort. The calibration curve of the model shows the agreement between the predicted probability (x-axis) and actual probability (y-axis) in LN metastasis. The solid line in the middle represents the perfect prediction, and the dotted line represents the predictive power of the nomogram. The closer the solid line is to the dotted line, the better the predictive power of the model

Fig. 5

a, b ROC curves of the radiomic signature, clinicopathological model, and combined nomogram for predicting LN metastasis in the training and testing cohorts

Table 2 Prediction performance of training and testing cohorts of the combined model

Clinical Use

The decision curve analysis for the nomogram is shown in Fig. 6. It indicated that when the threshold probability is within the range of 0.07 to 0.85, the net benefit of using the nomogram to predict LN metastasis is greater than that of the treat-all or treat-none scheme.

Fig. 6

Decision curve analysis of the nomogram. The vertical axis shows the standardized net benefit, and the two horizontal axes display the correspondence between the risk threshold and the cost–benefit ratio. The closer a decision curve is to the yellow and black curves, the more similar the net benefit of the model is to that obtained under the assumption that “none” or “all” patients have positive labels (LN metastasis in the current study). Decision curves can also be used to compare the net benefit of different models at a specific threshold probability or over a range of thresholds. The higher the decision curve of a model, the larger its net benefit

Discussion

To date, many works have evaluated the effectiveness of breast radiomics in breast cancer diagnosis, identification, prognosis, or response to therapy, using imaging information produced by different techniques (US, mammography, and MRI) [16, 19–21]. However, fewer studies have evaluated the use of breast radiomics in predicting axillary lymph node metastasis, especially using deep learning features. In this study, we developed a radiomic signature based on features extracted from DCE-MRI and DWI-ADC; the combined radiomic signature estimated LN metastasis with an AUC of 0.76 in the training cohort and 0.66 in the testing cohort. Combined with clinicopathological characteristics, the nomogram predicted LN metastases with an AUC of 0.71, a sensitivity of 65%, a specificity of 80%, and an accuracy of 75% in the independent testing cohort.

In addition, some strengths of our study should be highlighted. Firstly, most previous studies were single-center retrospective studies with fewer than 400 patients. Our study includes 479 patients with 488 breast lesions, which places it among the larger cohorts in the current literature.

Secondly, in this study, we applied a pretrained DenseNet121 neural network [18] for radiomics feature extraction, which has a denser connectivity pattern than traditional convolutional networks. It has been widely used in different medical tasks, such as discrimination of pancreatic cysts [22] and assessment of LN status and diagnosis of COVID-19 [23,24,25], showing accuracy comparable to that of other deep learning models [23, 24]. The application of deep learning features extracted by a neural network from anatomical and functional MRI scans provides a promising new approach to quantifying intratumor heterogeneity.

Thirdly, previous studies have shown that many clinicopathological characteristics are correlated with LN metastasis, such as histological type, LN palpability, and multifocality [11, 26]. Our results were superior to those of studies based only on clinicopathologic information, all of which failed to reach an AUC > 0.8 [27]. We combined the radiomic signature with clinicopathologic characteristics, including LN palpability, tumor size on MRI, and Ki-67, which improved the prediction performance, raising the AUC from 0.76 to 0.80 in the training cohort.

Fourthly, regarding the choice of the contrast-enhanced phase, there is currently no consensus on which phase yields the features with the best predictive value. For the same purpose as ours, Han et al. [11] and Liu et al. [14] applied radiomic features extracted from the first contrast-enhanced phase, obtaining AUCs of 0.78 and 0.81, respectively. Song et al. [28] extracted features from the second contrast-enhanced phase and obtained an AUC of 0.805. For tumor-infiltrating lymphocyte (TIL) prediction, Tang et al. [29] found that image features extracted from the delayed phases can help improve model performance. Therefore, both the second and fourth contrast-enhanced phases were used in our study. We found that the dyn4 model outperformed the dyn2 model in identifying ALNM in the training cohort (AUC 0.72 vs. 0.66), and the final rad-score formula contained more features from dyn4 than from dyn2.

Fifthly, this study differs from previous radiomic studies in that we used not only two contrast-enhanced phases but also additional DWI-ADC sequences to characterize intratumor heterogeneity more robustly. Most previous works used only the contrast-enhanced sequence, which is the main sequence for detection and characterization of breast lesions but might be inadequate to reflect tumor heterogeneity. DWI with ADC assesses the restriction of water molecule diffusion, which could increase the specificity of MRI for predicting ipsilateral axillary LN metastases in patients with newly diagnosed breast cancer [30]. Dong et al. [31] combined only the T2-weighted and diffusion sequences for the prediction of sentinel lymph node metastasis, with the aim of avoiding intravenous contrast medium, and obtained an AUC of 0.805. Yu et al. [16] used contrast-enhanced T1-weighted imaging (T1 + C), T2WI, and DWI-ADC sequences to construct a radiomic signature that identified ALNM with an AUC of 0.88 and predicted 3-year disease-free survival (DFS) with an AUC of 0.81. With a combination of DCE-MRI and DWI-ADC sequences, we constructed a radiomic signature that predicted LN metastasis with an AUC of 0.76 in the training cohort. In addition, we noticed that the selected radiomic features contained fewer DCE features than DWI-ADC features. This observation merits further investigation in future work.

The greatest advantage of our approach is that LNs can be evaluated preoperatively and without the injury caused by lymph node biopsy. The clinicopathological-radiomic nomogram, which incorporates the radiomic signature and clinicopathological factors, stratified breast cancer patients according to their risk of ALNM. It could be used to guide further treatment planning in breast cancer, avoiding unnecessary SLNB or ALND and the corresponding complications.

There are several limitations associated with this study. First, the dataset used to extract deep learning features came from a single MR scanner in a single institution with a consistent scanning protocol. This allows maximum reproducibility in the extraction and analysis of radiomic characteristics, but our findings require future multicenter validation in larger datasets to achieve high-level evidence and to better serve clinical application. Second, it is worth noting that radiomics, like other techniques, has some technical limitations, including sensitivity to image acquisition and reconstruction parameters [32]. A convolutional neural network has been shown to be superior to radiomic analysis for the classification of enhancing lesions as benign or malignant at multiparametric breast MRI [33]. Although we used a pretrained neural network for feature extraction and combined it with radiomics methods, more advanced neural networks could be applied directly to the classification task of predicting ALNM in the future.

Conclusions

This study described the application of MRI-based deep learning radiomics to predict axillary lymph node metastasis in patients with breast cancer. Both the clinicopathological-radiomic nomogram and the deep learning signature may be valuable in clinical decision-making and provide a noninvasive approach to structuring the treatment strategy.