Background

Breast MRI is widely used for breast cancer diagnosis and treatment evaluation [1]. Dynamic contrast-enhanced (DCE) sequences, which use a contrast agent, can provide both morphological and hemodynamic cues for lesion diagnosis. However, a high false-positive rate and background parenchymal enhancement limit the diagnostic specificity of DCE [2, 3].

Diffusion-weighted imaging (DWI), a noninvasive contrast agent-free method, has been established for breast MR imaging and could improve the diagnostic specificity of lesions suspicious for breast cancer [4, 5]. Conventional DWI (mono-exponential model) with 2 to 3 b-values for the measurement of the apparent diffusion coefficient (ADC) is the most commonly used diffusion fitting model for the characterization of breast lesions [6]. Furthermore, several studies have suggested that biexponential (BE), stretched-exponential (SE), or diffusion kurtosis imaging (DKI), fitting with multi-b-value sequences, could provide more accurate information about water diffusion [7,8,9,10]. Le Bihan et al. [11, 12] proposed the intravoxel incoherent motion (IVIM) model, a kind of BE fitting, to separately calculate fast and slow diffusion components. The SE model was introduced by Bennett et al. [13] to depict the heterogeneity of intravoxel diffusion rates and the distributed diffusion effect. The DKI model was proposed by Jensen et al. [14] to reflect the complexity of the microenvironment. Since DWI with different fitting models may demonstrate different aspects of tissue properties [7, 15, 16], informative radiomics features could be derived from these models to better characterize breast lesions.

Radiomics-based analysis profiles lesions with extensive morphological and textural features that subsequent classification models use to improve differential diagnosis, prognosis prediction, and tumor subtype diagnosis [17,18,19,20]. Previous radiomics studies [21,22,23] of breast lesions focused more on the modalities of T2WI and DCE. Fewer studies [17, 24, 25] have considered the value of multiparametric DWI. However, to our knowledge, no study has compared the radiomics features of these different diffusion imaging approaches in the differentiation of breast lesions. Further study of multiparametric DWI using more effective machine learning methods is needed to better understand its predictive value in breast cancer diagnosis.

We hypothesized that radiomics-based analysis can improve the diagnostic accuracy for breast lesions of multi-b-value sequences fitted with the mono-exponential (ME), BE, SE and DKI models. The purpose of this study is to compare radiomics features and the mean values of diffusion metrics in the assessment of breast lesions with four machine learning methods, i.e., random forest (RF), L1 regularization combined with linear regression (L1R-LR), principal component analysis combined with linear regression (PCA-LR), and support vector machine (SVM).

Methods

Study design and patient selection

This retrospective study used a prospectively acquired data set and was approved by the local Institutional Review Board (IRB) and the governmental review board. Written informed consent was obtained from each participant. From February 2018 to November 2018, 622 women with lesions suspicious for breast cancer on mammography or ultrasonography (i.e., BI-RADS category 4 or 5) underwent MRI examinations with multi-b DWI. The exclusion criteria were as follows: previous treatment for a malignancy (N = 13), no histopathological results (N = 27), motion artifacts (N = 5), lesions not visible on the DWI maps (N = 25), and a mean goodness-of-fit of the diffusion fitting model below 0.8 (N = 10). Ultimately, 542 women (mean age, 51 years; age range, 24–84 years) with 542 lesions were enrolled in this study.

MR imaging

All breast MRI examinations were performed on a 1.5 T MR scanner (MAGNETOM Aera, Siemens Healthcare, Erlangen, Germany) with a dedicated 18-channel phased-array breast coil. The breast MR examinations included fat-suppressed T2-weighted fast spin-echo imaging, T1-weighted imaging (T1WI), DWI, and DCE T1WI. All MR imaging examinations were performed before biopsy. The parameters of the above sequences are shown in Additional file 1: Appendix S1.

Image postprocessing and lesion segmentation

After data acquisition, all images were processed with N4ITK for data normalization. The data were then assessed by KS and WC (with 8 years and 12 years of experience in breast imaging, respectively) to identify all lesions by using the DWI source images with a b value of 1000 s/mm2, T2-weighted images, and the first phase of postcontrast T1-weighted images. Clinical information and the X-ray and US images were provided to the radiologists. The lesions were manually segmented on the DW images (b1000) on all visible sections, resulting in a three-dimensional volume of each lesion. Lesions were segmented along their inner border to minimize partial volume effects. All volumes of interest (VOIs) were manually segmented and labeled with a free open-source software package (ITK-SNAP, version 3.4.0, http://www.itksnap.org). An overview of our workflow is illustrated in Fig. 1.

Fig. 1

Workflow of image processing. a MRI data of multi-b value sequences and quantitative maps from ME, BE, SE and DKI models. b 3D segmentations of lesions shown as surface shaded 3D renderings. c Extraction of radiomics features, i.e., First-order, Shape, GLCM, GLSZM and GLDM. d Radiomics analysis using four models (RF, SVM, PCA-LR, and L1R-LR), and e ROC curve analysis. ROC curves are used for the comparison of four methods, and diagnostic performance of radiomics features and mean diffusion metrics

Diffusion data analysis and processing

All diffusion parameter maps were generated using in-house MATLAB software (MathWorks, Natick, MA, USA). The software first applied a Gaussian filter with a full width at half maximum of 3 mm to suppress noise in the diffusion images before the pixel-by-pixel fitting process. The four diffusion models are described as follows:

  1. ME model

    ADC maps were generated according to the following equation:

    $$S_{b}/S_{0} = \exp\left(-b \cdot \mathrm{ADC}\right),$$

    where Sb represents the signal intensity in the presence of diffusion sensitization, and S0 represents the signal intensity in the absence of diffusion sensitization. The ADC_all-b maps were generated by using all 13 b values. The ADC0–1000 maps were generated by using b values of 0 and 1000 s/mm2.

  2. BE_IVIM model

    The IVIM parameters were fitted using the following IVIM model (proposed by Le Bihan et al. [11, 12]):

    $$S_{b}/S_{0} = \left(1 - f\right) \cdot \exp\left(-b \cdot D\right) + f \cdot \exp\left(-b \cdot D^{*}\right),$$

    where D is the true diffusion as reflected by the pure molecular diffusion, f is the fractional perfusion related to microcirculation, and D* is the pseudo-diffusion coefficient that represents perfusion-related diffusion or incoherent microcirculation.

  3. SE model

    The SE model was used to obtain the molecular water diffusion heterogeneity index (α) and the distributed diffusion coefficient (DDC) through the following equation:

    $$S_{b}/S_{0} = \exp\left[-\left(b \cdot \mathrm{DDC}\right)^{\alpha}\right],$$

    where α is related to the intravoxel molecular water diffusion heterogeneity, which ranges from 0 to 1. A numerically high α value represents low intravoxel diffusion heterogeneity (approaching mono-exponential decay). DDC represents the mean intravoxel diffusion rate.

  4. DKI model

    Calculation of DKI parameters was performed by fitting the following nonlinear equation:

    $$S_{b}/S_{0} = \exp\left(-b \cdot D + \frac{1}{6} \cdot b^{2} \cdot D^{2} \cdot K\right),$$

    where K is a unitless parameter that quantifies the deviation of water motion from a Gaussian distribution. K is zero for perfectly Gaussian diffusion, and a large K indicates considerable deviation from Gaussian behaviour. D is the ADC corrected for non-Gaussian bias.
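
The four signal equations above can be written out directly. The following is a minimal Python sketch for illustration only; it is not the in-house MATLAB software used in this study. All function names are hypothetical, the b values in the example are placeholders rather than the 13 b values of the actual protocol, and the log-linear least-squares fit is an assumed fitting strategy for the ME model.

```python
import math


def me_signal(b, adc):
    """ME model: S_b/S_0 = exp(-b * ADC)."""
    return math.exp(-b * adc)


def ivim_signal(b, d, d_star, f):
    """BE_IVIM model: (1 - f) * exp(-b * D) + f * exp(-b * D*)."""
    return (1.0 - f) * math.exp(-b * d) + f * math.exp(-b * d_star)


def se_signal(b, ddc, alpha):
    """SE model: exp(-(b * DDC)^alpha), with alpha between 0 and 1."""
    return math.exp(-((b * ddc) ** alpha))


def dki_signal(b, d, k):
    """DKI model: exp(-b * D + (1/6) * b^2 * D^2 * K)."""
    return math.exp(-b * d + (b ** 2) * (d ** 2) * k / 6.0)


def two_point_adc(s0, sb, b):
    """ADC from one b = 0 signal and one diffusion-weighted signal."""
    return math.log(s0 / sb) / b


def loglinear_adc(b_values, signals):
    """ADC as the negative slope of a least-squares line through ln(S) vs. b.

    Assumed fitting strategy for illustration; the original fit may differ.
    """
    n = len(b_values)
    log_s = [math.log(s) for s in signals]
    mean_b = sum(b_values) / n
    mean_y = sum(log_s) / n
    sxy = sum((b - mean_b) * (y - mean_y) for b, y in zip(b_values, log_s))
    sxx = sum((b - mean_b) ** 2 for b in b_values)
    return -sxy / sxx


# Placeholder b values (s/mm2), NOT the protocol of this study.
example_b = [0, 200, 500, 1000, 2000]
example_s = [me_signal(b, 1.2e-3) for b in example_b]
fitted_adc = loglinear_adc(example_b, example_s)
```

With noise-free mono-exponential data, both `two_point_adc` and `loglinear_adc` recover the ADC used to simulate the signal. The BE, SE and DKI models each reduce to the ME model when f = 0, α = 1, or K = 0, respectively, which gives a quick sanity check for any implementation.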

Feature extraction

Radiomics features were calculated using the PyRadiomics Python package (version 2.1.2), and the recommended default settings were used for the analysis [26]. From each map, 100 features were extracted, comprising 18 first-order (FO) features, 14 shape features, 22 gray level co-occurrence matrix (GLCM) features, 16 gray level run length matrix (GLRLM) features, 16 gray level size zone matrix (GLSZM) features, and 14 gray level dependence matrix (GLDM) features. Details of the extracted features are shown in Additional file 1: Appendix S2. In total, 900 features were extracted from the nine maps. The intraclass correlation coefficients (ICCs) were used to determine the interobserver reproducibility of the radiomics features [27].
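
To make the first-order feature class concrete, the sketch below computes three FO features (10th percentile, median, and skewness) from a list of voxel values inside a VOI. It is a plain-Python illustration, not the PyRadiomics implementation. The function names are hypothetical, and the definitions used here (linearly interpolated percentiles and population-moment skewness) are assumptions that should be checked against the PyRadiomics documentation before any comparison with its output.

```python
import math


def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between ranks."""
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of an empty VOI is undefined")
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def skewness(values):
    """Third central moment divided by the 1.5 power of the second."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0  # flat region: treat skewness as zero
    return m3 / m2 ** 1.5


def first_order_summary(voxels):
    """Return the three illustrative FO features for one VOI."""
    return {
        "10Percentile": percentile(voxels, 10),
        "Median": percentile(voxels, 50),
        "Skewness": skewness(voxels),
    }


# Hypothetical voxel values from a parameter map inside one VOI.
summary = first_order_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
```

A low 10th percentile on a diffusion map captures the most restricted voxels in the lesion, which a VOI mean averages away.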

The mean diffusion metrics of ME (mADCall-b, mADC0–1000), BE (mD, mD*, mf), SE (mDDC, mα), and DKI (mK, mD) were extracted from the radiomics set for separate analysis. Feature importance (FI) was calculated by using random forest. Feature importance was determined as the mean decrease in the impurity of the random forest as previously described [28].
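
The mean decrease in impurity can be illustrated for a single binary split. The sketch below assumes Gini impurity and binary labels (1 = malignant, 0 = benign); the function names are hypothetical. A random forest accumulates such decreases, weighted by the fraction of samples reaching each node, over every node that splits on a given feature and then averages across trees. This sketch shows only the per-split quantity.

```python
def gini(labels):
    """Gini impurity of a set of binary labels (0 or 1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)  # equals 1 - p^2 - (1 - p)^2


def impurity_decrease(feature, labels, threshold):
    """Decrease in Gini impurity when splitting at feature <= threshold."""
    left = [y for x, y in zip(feature, labels) if x <= threshold]
    right = [y for x, y in zip(feature, labels) if x > threshold]
    n = len(labels)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted_children


# Toy example: an informative feature separates the two classes perfectly,
# whereas an uninformative one leaves both child nodes as mixed as the parent.
toy_labels = [0, 0, 1, 1]
informative = impurity_decrease([1.0, 2.0, 3.0, 4.0], toy_labels, 2.5)
uninformative = impurity_decrease([1.0, 3.0, 2.0, 4.0], toy_labels, 2.5)
```

In this toy case the informative feature yields the full parent impurity of 0.5 as its decrease, and the uninformative feature yields zero.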

RF, L1R, PCA, and SVM

The 542 subjects were randomly and equally divided into a training set containing 271 subjects and an independent testing set containing the remaining 271 subjects. The ratios of malignant and benign subjects in the training set and the testing set were equal to the ratio in the whole dataset. The RF, SVM, PCA-LR, and L1R-LR algorithms were all based on the most widely used machine learning Python package, i.e., Scikit-learn [29]. For RF, the parameters were set to the default values, except that the number of trees was 100 and the maximum depth of the tree was 3. For L1 regularization (L1R), the features were selected implicitly by the L1 regularization of the linear classifier. L1R forced the coefficients of the linear model to be sparse, so that only a small subset of radiomics features contributed to the final results. For PCA, 100 features were selected based on their power to differentiate benign from malignant lesions in the training set, ranked by lowest P values. Then, the first 10 principal components were chosen for the linear model for prediction. The parameter settings of both PCA and L1R followed the widely used strategies in other MRI-based radiomics studies for breast cancer [23]. For SVM, we used the radial basis function (RBF) kernel. The parameters were optimized with respect to the training set. The hyperparameters of the above four methods are shown in Additional file 1: Appendix S3. The classifiers were trained using repeated tenfold cross-validation (CV; 100 repetitions) in the training set, and their diagnostic performance was then evaluated in the independent testing set using the area under the receiver operating characteristic (ROC) curve. A more detailed description of the frequencies of the features of RF during the 100 repetitions of tenfold CV is shown in Additional file 1: Appendix S4.
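
The data partitioning described above can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the Scikit-learn code used in the study: the function names, the random seed, and the rule for assigning the odd case of each class are all hypothetical, and the CV folds here are not stratified.

```python
import random


def stratified_half_split(labels, seed=0):
    """Split sample indices into two halves while preserving class ratios."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        half = len(idx) // 2
        # Odd class size: the extra case goes to the training set
        # unless the training set is already the larger of the two.
        if len(idx) % 2 == 1 and len(train) <= len(test):
            half += 1
        train.extend(idx[:half])
        test.extend(idx[half:])
    return sorted(train), sorted(test)


def repeated_kfold(n_samples, n_splits=10, n_repeats=100, seed=0):
    """Yield (fit, validation) index lists for repeated k-fold CV."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # new random partition for every repetition
        for k in range(n_splits):
            val = order[k::n_splits]
            held_out = set(val)
            fit = [i for i in order if i not in held_out]
            yield fit, val


# Cohort composition reported in this study: 333 malignant, 209 benign.
labels = [1] * 333 + [0] * 209
train_idx, test_idx = stratified_half_split(labels)
```

Because both class sizes are odd, an exactly equal class ratio in the two halves is not possible; with the rule assumed here the training set holds 166 malignant and 105 benign cases and the testing set holds 167 malignant and 104 benign cases, giving 271 subjects each.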

Statistical analysis

A goodness-of-fit evaluation was performed for the fitting of the BE, SE and DKI models by using MATLAB (MathWorks), and the R2 value was calculated [9]. ROC curves were generated for the mean diffusion metrics (ME-mADCall b, ME-mADC0–1000, BE-mD, BE-mD*, BE-mf, SE_mDDC, SE_mα, DKI-mK, and DKI-mD), and the ROC curves of all 9 DWI image sets were calculated for the RF, L1R, PCA, and SVM models for comparison. The ROC curves of the 9 diffusion-related image sets were calculated from the results obtained by the CV models in the independent testing set. To compare the AUCs of the mean diffusion metrics and radiomics features, the McNemar test was used for the paired cases. All these comparisons were run 100 times, and the mean P values were obtained. Bonferroni adjustment was performed to control for α error inflation [29]. A P value less than 0.05/23 (0.00217) was regarded as indicating a significant difference. All statistical evaluations were performed by using software developed either with the Python programming language [30] or with MATLAB software.
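
Three of the quantities used in this section can be computed in a few lines. The sketch below is illustrative only: the function names are hypothetical, R2 is assumed to be the standard coefficient of determination, and the AUC is computed as the probability that a randomly chosen malignant case scores higher than a randomly chosen benign case (ties count one half), which is equivalent to the area under the empirical ROC curve.

```python
def r_squared(observed, fitted):
    """Coefficient of determination between measured and fitted signals."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 = malignant, 0 = benign; higher score = more suspicious.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison significance level after Bonferroni adjustment."""
    return alpha / n_comparisons


# Threshold used in this study: 0.05 divided by 23 comparisons.
threshold = bonferroni_threshold(0.05, 23)
```

For a metric such as ADC, where lower values are expected in malignant lesions, the sign of the metric would be flipped before it is passed as a score.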

Results

Image quality of multi-b diffusion weighted imaging

The mean R2 value for the BE model fit was 0.90 ± 0.06. The mean R2 value for the SE model fit was 0.95 ± 0.03. The mean R2 value for the DKI model fit was 0.99 ± 0.01.

The signal intensity of malignant lesions on the map of b2500 was 113.25 ± 31.53. The signal intensity of benign lesions on the map of b2500 was 36.83 ± 10.73. The signal-to-noise ratio (SNR) of b2500 was 30.01 ± 10.16. The contrast-to-noise ratio (CNR) of b2500 was 2.25 ± 0.67. The lesion contrast on the map of b2500 was 3.20 ± 1.04. A case of 23 datasets is shown in Additional file 1: Appendix S5.

Patient demographic characteristics

There was a significant difference in demographic characteristics between patients with malignant lesions and patients with benign lesions (55.0 ± 12.2 vs. 50.3 ± 11.6, P < 0.001).

Pathological features

Of the 542 lesions, 333 were malignant, and 209 were benign. The malignant lesions included ductal carcinoma in situ (N = 28), lobular carcinoma in situ (N = 1), invasive carcinoma (N = 274), invasive lobular carcinoma (N = 1), invasive solid papillary carcinoma (N = 9), malignant phyllodes tumors (N = 3), mucinous carcinoma (N = 8), metaplastic cancer (N = 1), diffuse large B-cell lymphoma (N = 2), encapsulated papillary carcinoma (N = 3), and invasive micropapillary carcinoma (N = 3). Benign lesions included fibroadenoma (N = 101), benign phyllodes tumors (N = 3), fibrocystic change (N = 4), cyst combined with chronic infection (N = 6), papilloma (N = 54), usual ductal hyperplasia (N = 16), fat necrosis (N = 1), and adenosis (N = 24).

Comparison of RF, L1R-LR, PCA-LR, and SVM in the diagnosis of breast lesions with multi-b diffusion-weighted imaging

The AUCs of RF in the differential diagnosis of breast lesions ranged from 0.80 (BE_D*) to 0.85 (BE_D), whereas the AUCs of PCA-LR ranged from 0.53 (SE_DDC) to 0.78 (BE_D*). The AUCs of L1R-LR and SVM ranged from 0.53 (SE_DDC) to 0.83 (ME_ADC0–1000) and from 0.51 (SE_DDC) to 0.81 (ME_ADC0–1000), respectively.

The top five image sets with the highest AUCs by RF were BE_D (0.85), ME_ADCall b (0.84), DKI_K (0.84), ME_ADC0–1000 (0.83) and DKI_D (0.83). The results of all AUCs by RF are shown in Table 1. The top five image sets with the highest mean AUCs across the four methods were ME_ADC0–1000 (0.81), BE_D (0.81), ME_ADCall b (0.81), DKI_D (0.80), and DKI_K (0.80).

Table 1 Comparisons between radiomics and mean diffusion metrics

Details on the top five image sets with the highest mean AUCs by RF, SVM, L1R-LR, and PCA-LR are shown in Table 2. The comparisons between RF and L1R, and between PCA and SVM are shown in Additional file 1: Appendix S6.

Table 2 Diagnostic performance of ME_ADC0–1000, BE_IVIM_D, ME_ADCall b, DKI-D and DKI-K by using RF, L1R-LR, PCA-LR, and SVM, respectively

RF achieved the highest frequency of the highest AUCs compared with L1R-LR, PCA-LR, and SVM (8/9 vs. 1/9 vs. 0/9 vs. 0/9, P < 0.001). The mean AUCs of the nine image sets by RF, L1R-LR, PCA-LR and SVM were 0.82, 0.78, 0.73, and 0.76, respectively.

Diagnostic performance comparison of radiomics features by RF and the mean values of diffusion metrics

The interobserver reproducibility of radiomics feature extraction was satisfactory, with ICCs greater than 0.80 for all extracted features. The AUCs of the radiomics features for the differential diagnosis of breast lesions ranged from 0.80 (BE_D*) to 0.85 (BE_D), with a sensitivity of 83% to 88%, and a specificity of 74% to 82%. The AUCs of the mean diffusion metrics ranged from 0.54 (BE_mf) to 0.79 (ME_mADC0–1000), with a sensitivity of 74% to 88%, and a specificity of 41% to 71%. The AUCs of the radiomics features for the differential diagnosis of breast lesions were higher than those of the corresponding mean diffusion metrics. For the differentiation of benign and malignant breast lesions, the AUCs of the mean diffusion metrics ME_mADCall-b, ME_mADC0–1000, BE_mD, BE_mD*, BE_mf, SE_mα, and DKI_mK differed significantly from the AUCs of the corresponding radiomics features (all P < 0.002). Details of the comparison are shown in Table 3.

Table 3 Diagnostic performance of multi-b diffusion maps based on ME, BE, SE and DKI models

Importance of diffusion-related radiomics features

Details of the importance of all radiomics features are shown in Additional file 1: Appendix S7. Regarding the radiomics features computed from the nine image sets, the five most important features were FO-10 percentile (FI = 0.043), FO-Median (FI = 0.030), Shape-Sphericity (FI = 0.030), FO-Skewness (FI = 0.029), and Shape-Flatness (FI = 0.026).

Of the radiomics features computed from the map of BE_IVIM_D, which had the highest AUC (0.85), the top five most important features were FO-10 percentile (FI = 0.07), FO-Skewness (FI = 0.06), FO-Minimum (FI = 0.04), GLCM-Cluster Shade (FI = 0.04), and FO-Median (FI = 0.02). Details of the top 20 important features of BE_IVIM_D are shown in Fig. 2. The ROC curve of BE_IVIM_D is shown in Fig. 3.

Fig. 2

Top 20 radiomics features of BE_IVIM_D, ranked by the mean decrease in impurity of RF

Fig. 3

ROC curve analysis of BE_IVIM_D for radiomics-based analysis with RF, L1R, PCA, and SVM, respectively

Discussion

Based on the experimental results of this study, the BE_IVIM_D map (with the highest AUC by RF) and the FO-10 percentile feature (with the highest FI by RF) from the radiomics-based analysis of multiparametric DWI are recommended for the characterization of breast lesions. Furthermore, we found that the diagnostic performance of multiparametric DWI-derived radiomics was superior to that of the mean diffusion metrics in differentiating between benign and malignant breast lesions. This finding suggests that radiomics-based analysis of multiparametric DWI may improve the classification of breast lesions.

The majority of radiomics-based analyses in breast MRI research utilize T2WI, contrast T1WI, and conventional DWI [17, 18, 31, 32]. To the best of our knowledge, this is the first study that extensively explored radiomics from multi-b-value maps and their commonly used fitting models (ME, BE, SE, and DKI), which could reflect more details of both Gaussian and non-Gaussian water diffusion distributions in tumors. Bickelhaupt et al. [17] demonstrated that the radiomics features of DKI can help differentiate malignant breast lesions from benign lesions. However, they only used the DKI fitting model, and their scan sequences comprised both single-shot echo-planar imaging (ss-EPI) in 95 patients and readout-segmented echo-planar imaging (rs-EPI) in 127 patients. We used four clinically used diffusion fitting models and a larger sample (542 lesions). Moreover, all the patients in our study were scanned with rs-EPI, which has significantly higher image quality and lesion conspicuity than ss-EPI, as suggested by previous studies [33, 34].

Many radiomics-based machine learning methods can be used for lesion classification [17, 18, 35, 36]. In this study, we extensively explored four promising algorithms, namely RF, L1R-LR, PCA-LR and SVM, which have been shown to be highly effective in previous radiomics studies [23, 37]. We found that the ADC0–1000 map attained the highest mean AUC across the four algorithms, indicating that the mono-exponential model already provides substantial diagnostic information for breast cancer. Furthermore, RF achieved the highest AUC most frequently (8/9 image sets). This finding further corroborates the robustness and strong generalization power of RF [28]. Thus, in our further analysis, both the calculation of feature importance and the comparison of AUCs were based on the results of RF.

The most predictive image set by RF (i.e., with the highest AUC, sensitivity and specificity) was BE_IVIM_D. Of note, BE_IVIM_D can remove the influence of perfusion and therefore reflects the true diffusion coefficient, better reflecting water movement in living tissue. This may be the reason why the radiomics features computed from BE_IVIM_D provide more accurate information on water diffusion in breast cancer classification. Furthermore, the most important radiomics feature of BE_IVIM_D is the FO-10-percentile. Unlike previously reported studies [38,39,40], our experimental results did not suggest that texture features can attain better performance than the FO features on BE_IVIM_D. Accordingly, the FO features may be more predictive of lesion malignancy. In addition, the FO-10-percentile was also the most important feature across all nine image sets (FI = 0.043) for the differentiation of benign and malignant breast lesions, indicating that first-order features remain important cues in multiparametric DWI for the differential diagnosis of breast lesions.

Our study has several limitations. First, all lesions in this study were drawn manually, which was time-consuming. Thus, automated lesion segmentation will be implemented in our future study to improve the objectivity of lesion boundaries and to expedite preprocessing. Second, our multi-b value sequences were acquired with a fixed protocol, whereas the choice of optimal b-values could vary across different institutions. Accordingly, the generalizability of our findings remains to be verified in an external independent dataset, which was not available for this study. Finally, this study employed all extracted diffusion-related radiomics features for breast cancer diagnosis, and no feature selection strategy was implemented. In future studies, we will conduct feature selection to optimize the construction of radiomics models.

Conclusions

In conclusion, the BE_IVIM_D map and the FO-10-percentile feature, analyzed with RF, enabled accurate differentiation between malignant and benign breast lesions. Radiomics features computed from multiparametric DWI performed better than the mean values of diffusion metrics in distinguishing benign and malignant breast lesions. Hence, our study may shed light on the applicability of radiomics from multiparametric DWI to the clinical diagnosis of breast lesions.