Introduction

Contrast-enhanced breast magnetic resonance imaging (MRI) is widely used in detecting and diagnosing breast cancer, screening high-risk groups, staging, preoperative planning, and assessing treatment effectiveness [1,2,3,4,5]. In the field of imaging diagnosis of breast lesions, contrast-enhanced MRI has demonstrated higher diagnostic capabilities than mammography and ultrasonography. Although breast MRI exhibits high sensitivity, its specificity is insufficient, leading to the issue of false positives [6, 7].

The imaging diagnosis of contrast-enhanced breast MRI follows the Breast Imaging Reporting and Data System (BI-RADS) guidelines, incorporating known terms that suggest benign or malignant findings. Although features of malignant lesions have been extensively evaluated, lexicons characterizing benign lesions are limited to circumscribed round or oval shapes, slow-persistent enhancing patterns, and dark internal septations, among others, often resulting in overlap with malignant findings [8].

Although some reports have suggested patient age, lesion size, location, and T2-weighted signal intensity as indicators for benign or malignant diagnosis, these factors are not addressed in the BI-RADS assessment criteria [8,9,10,11]. Moreover, few studies have compared or combined these factors with BI-RADS to develop a comprehensive approach for distinguishing benign from malignant lesions.

BI-RADS recommends biopsy for category 4 lesions with a 2% or higher probability of malignancy [8]. However, because performing biopsies indiscriminately is not ideal, it is important to establish robust criteria for discerning benign cases from malignant cases. A solid basis for evaluation that thoroughly considers the possibility of a benign outcome can reduce unnecessary biopsies.

Gadobutrol, a macrocyclic gadolinium-based contrast agent (GBCA) with double the gadolinium concentration of other agents [12], offers superior image quality in breast cancer imaging [13,14,15]. However, further research is needed on its impact on breast MRI accuracy.

In this study, we compiled contrast-enhanced MR images of benign and malignant breast lesions using gadobutrol from multiple institutions and aimed to elucidate the imaging features for discriminating between malignant and benign lesions by comparing and analyzing various imaging findings, including the BI-RADS 2013 assessment criteria, along with the clinical characteristics of patients.

Materials and methods

Patient selection and clinicopathological factor evaluation

We conducted a retrospective multicenter study involving 11 hospitals in Japan. This study was approved by the Ethical Review Board on Clinical Studies of each participating institution. Because many subjects in this study had completed their medical treatment and were difficult to contact, consent was handled through an opt-out procedure by public announcement.

The inclusion criteria were as follows: (1) women with BI-RADS category 3 or 4 lesions on MRI; (2) patients aged 20–69 years; and (3) patients who have been informed of having a malignant lesion, if present. The exclusion criteria were as follows: (1) patients with a history of anaphylactoid or anaphylactic reaction to any contrast media; (2) those with impaired renal function (estimated glomerular filtration rate [eGFR] < 30 mL/min/1.73 m²); (3) pregnant or breastfeeding women; (4) patients with a history of treatment for breast cancer; (5) patients undergoing drug therapy for breast cancer; and (6) patients judged for any reason as being ineligible for participation in this study by the investigator.

This study aimed to elucidate the characteristics of benign lesions by collecting multiple samples; to achieve this, we included benign and malignant cases in a 1:1 ratio. The target number of subjects was 200 (100 benign and 100 malignant lesions). Among patients with suspected malignant breast lesions on mammography or ultrasonography and MRI with gadobutrol between July 1, 2015 and January 31, 2018 at the participating centers, benign cases that did not meet any of the exclusion criteria were selected retrospectively from January 31, 2018. The benign cases were selected sequentially from those that underwent MRI examinations with gadobutrol. For each benign case, the malignant case with the imaging date closest to that of the benign case was chosen as a control.
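The pairing of each benign case with the malignant case closest in imaging date can be sketched as a simple nearest-date match. The following is only an illustration: the case identifiers and dates are hypothetical, and matching without replacement in the order the benign cases are listed is an assumption, since the text does not state how ties or reuse of malignant cases were handled.

```python
from datetime import date

def match_controls(benign_dates, malignant_dates):
    """Pair each benign case with the unused malignant case closest in MRI date.

    Assumptions of this sketch: matching is without replacement, and benign
    cases are processed in the order given.
    """
    available = dict(malignant_dates)  # case id -> MRI date
    pairs = {}
    for benign_id, benign_date in benign_dates.items():
        if not available:
            break  # no malignant cases left to match
        closest_id = min(
            available,
            key=lambda cid: abs((available[cid] - benign_date).days),
        )
        pairs[benign_id] = closest_id
        del available[closest_id]
    return pairs

# Hypothetical identifiers and dates, for illustration only
benign = {"B1": date(2018, 1, 20), "B2": date(2017, 11, 2)}
malignant = {
    "M1": date(2017, 10, 30),
    "M2": date(2018, 1, 25),
    "M3": date(2016, 5, 1),
}
pairs = match_controls(benign, malignant)
print(pairs)
```

Here B1 is paired with M2 (5 days apart) and B2 with M1 (3 days apart).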

Benign lesions were defined as those with a benign histological diagnosis after MRI examination or without a histological diagnosis of cancer within 1 year. Malignant lesions were defined as all cases with a histological diagnosis of breast cancer within 1 year after MRI.

The following data were examined from the medical records: (1) date of birth, (2) height, (3) weight, (4) history of breast treatment (e.g., surgery, radiation, and chemotherapy), (5) menstrual history, (6) menstrual cycle, (7) date of the onset of the most recent menstrual period at the time of MRI, (8) pregnancy history, (9) childbirth history, (10) history of female hormone use including low-dose oral contraceptives at the time of MRI, and (11) histopathology results.

MRI protocols

MRI examinations were performed using a 1.5 or 3 T system. 3D fast gradient echo (GRE) T1-weighted images with fat suppression of the entire breasts of either side were acquired with a dedicated breast coil. No restrictions were set regarding the acquisition plane, repetition time, echo time (TE), or flip angle. The prescribed slice thickness was 1–3 mm. Gadobutrol (Gadovist 1.0®, Bayer AG, Germany) was administered at a dose of 0.1 mmol/kg body weight of gadolinium. Dynamic MR images were acquired before and in at least two phases after bolus injection of the contrast medium, namely an early phase (1–2 min) and a delayed phase (5–7 min); the injection was followed by a saline flush using an automatic injector. T1- and fat-suppressed T2-weighted images were also obtained consistently. Figure 1 shows the representative MR images of this study.

Fig. 1 Representative MR images. A represents T1-weighted image, B represents fat-suppressed T2-weighted image, C represents fat-suppressed T1-weighted image, D represents early post-contrast phase, E represents late post-contrast phase, and F represents delayed post-contrast phase

Image analysis

In this study, two radiologists (the first, one of three radiologists with 7–12 years of experience, and the second, a radiologist with 22 years of experience) evaluated breast MR images by consensus without knowledge of the patients’ clinical progress. They assessed lesion characteristics, including lesion location (i.e., side, quadrant, and depth), lesion size, fibroglandular tissue (FGT), background parenchymal enhancement (BPE) (i.e., level and symmetry), signal intensity in T1, and signal intensity in T2. Moreover, for lesions categorized as masses, shape, margin, internal enhancement, and kinetic pattern were evaluated. For lesions classified as non-mass enhancement (NME), they assessed their distribution and internal enhancement patterns. The evaluation of the images was based on the Breast Imaging Reporting and Data System, 5th edition [8]. The medical viewing system EV Insite R (PSP Co., Tokyo, Japan) was used, which offers reading tools, such as window width–window level adaptation, panning, and zooming.

Statistical analysis

We performed univariate logistic regression analysis, with the case (benign) or control (malignant) group as the dependent variable and clinical and imaging characteristics as the independent variables. Univariate logistic regression analysis was performed by binarizing multiple independent variables wherever possible. Subsequently, we conducted a multivariate logistic regression analysis, with the case (benign) or control (malignant) group as the dependent variable and included age as an adjustment factor. All imaging characteristics that showed a p-value < 0.2 and a sufficient sample size in the univariate logistic regression analysis were entered as independent variables. We then conducted a multivariate decision tree analysis. Furthermore, we devised a predictive model using multivariate logistic regression analysis, and the area under the curve (AUC) was calculated using receiver operating characteristic (ROC) curve analysis. All statistical analyses were performed using Statistical Package for the Social Sciences (version 26; IBM Corp., Armonk, NY, USA), and p-values < 0.05 were used to denote statistical significance.
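As an illustration of the univariate step, the sketch below fits a logistic regression with a single binarized predictor by Newton-Raphson and applies the p < 0.2 screening threshold using a Wald test. The analyses in this study were run in SPSS; this pure-Python code, the function names, the coding of malignancy as 1, and the 2×2 counts are all hypothetical and serve only to show the calculation.

```python
import math

def score_and_information(b0, b1, x, y):
    """Gradient and information-matrix entries of the log-likelihood."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return g0, g1, h00, h01, h11

def fit_univariate_logistic(x, y, n_iter=25):
    """Fit logit P(y=1) = b0 + b1*x; returns (b0, b1, standard error of b1)."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0, g1, h00, h01, h11 = score_and_information(b0, b1, x, y)
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Wald standard error from the inverse information at the final estimate
    _, _, h00, h01, h11 = score_and_information(b0, b1, x, y)
    se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    return b0, b1, se_b1

# Hypothetical data: x = 1 if a binarized finding is present, y = 1 if malignant
x = [1] * 40 + [0] * 40
y = [1] * 30 + [0] * 10 + [1] * 10 + [0] * 30

b0, b1, se_b1 = fit_univariate_logistic(x, y)
odds_ratio = math.exp(b1)
z = b1 / se_b1
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
enter_multivariate = p_value < 0.2          # screening threshold from the text
print(round(odds_ratio, 3), round(p_value, 6), enter_multivariate)
```

With a single binary predictor, the fitted odds ratio equals the cross-product ratio of the 2×2 table (here 30 × 30 / (10 × 10) = 9), which gives a quick sanity check on the fit.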

Results

In our study, all 100 malignant cases underwent histological diagnosis. Of the 100 malignant cases, 66 were diagnosed with invasive ductal carcinoma of no special type, 4 were diagnosed with mucinous carcinoma, 3 were diagnosed with other special types of invasive ductal carcinoma, 3 were diagnosed with invasive lobular carcinoma, and 24 were diagnosed with ductal carcinoma in situ. In the benign group, 91 cases were confirmed as benign using biopsy: 30 fibroadenomas, 24 intraductal papillomas, 14 mastopathies, 8 lobular tumors, 6 sclerosing adenomas, and 9 cases of other histologic types. Among the malignant group, 51 were diagnosed through surgery and 49 through image-guided biopsy, while among the benign group, 29 were diagnosed through surgery, 62 through image-guided biopsy, and 9 through follow-up observation.

Table 1 shows patient characteristics, and Table 2 summarizes the imaging findings. Table 3 presents the results of the univariate analysis, revealing significant differences in independent variables, such as age, lesion location (i.e., quadrant and depth), mass (i.e., shape, margin, internal enhancement, kinetic-initial phase, and kinetic-delayed phase), NME distribution, FGT, BPE level, and T2 signal intensity.

Table 1 Patient characteristics
Table 2 Summary of imaging findings
Table 3 Univariate analysis

Multivariate logistic regression and decision tree analyses were performed on 151 cases of mass to develop prediction models. Figure 2 and Table 4 show the results of the multivariate logistic regression analysis, and Fig. 3 shows the results of the decision tree analysis.

Fig. 2 Receiver operating characteristic analysis for mass

Table 4 Logistic regression analysis for mass
Fig. 3 Decision tree analysis for mass

According to the multivariate model with logistic regression analysis, older age, lesion location (quadrant: lower outer quadrant [LOQ], upper outer quadrant [UOQ], upper inner quadrant [UIQ], or lower inner quadrant [LIQ]), margin (i.e., irregular or spiculated), and kinetic-delayed phase (i.e., plateau or washout) significantly increased the risk of malignancy. Moreover, internal enhancement (i.e., heterogeneous or rim enhancement) and FGT (i.e., fatty, scattered, or heterogeneous) were included in the model as factors that substantially increased the risk of malignancy, although they were not statistically significant. The AUC of the model was 0.925 (95% confidence interval [CI] 0.881–0.970) in the ROC analysis.
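For readers less familiar with the AUC, it can be read as the probability that a randomly chosen malignant lesion receives a higher predicted score than a randomly chosen benign lesion. A minimal sketch of that pairwise calculation follows; the scores and labels are hypothetical and are not taken from the study data.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, lab in zip(scores, labels) if lab == 1]
    negatives = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical predicted probabilities of malignancy (label 1 = malignant)
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)
print(auc)  # 13 of 16 pairs are ranked correctly -> 0.8125
```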

According to the decision tree analysis, margin, age, and lesion location (depth) were diagnostic indicators of the model. The sensitivity, specificity, and positive predictive value of this model were 0.770 (95% CI 0.658–0.860), 0.922 (95% CI 0.838–0.971), and 0.848 (95% CI 0.780–0.901), respectively.
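Sensitivity, specificity, and positive predictive value are all proportions taken from a 2×2 table of predicted versus actual class, each with its own confidence interval. The sketch below shows the arithmetic. The counts are hypothetical, and the Wilson score interval is an assumption made for illustration, since the text does not state which interval method was used.

```python
import math

def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z=1.96)."""
    p = successes / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2.0 * total)) / denom
    half_width = (
        z * math.sqrt(p * (1.0 - p) / total + z * z / (4.0 * total * total))
    ) / denom
    return center - half_width, center + half_width

def diagnostic_metrics(tp, fn, fp, tn):
    """Point estimates and Wilson intervals from a 2x2 confusion table."""
    return {
        "sensitivity": (tp / (tp + fn), wilson_ci(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_ci(tn, tn + fp)),
        "ppv": (tp / (tp + fp), wilson_ci(tp, tp + fp)),
    }

# Hypothetical counts, not taken from the study
metrics = diagnostic_metrics(tp=40, fn=10, fp=5, tn=45)
for name, (estimate, (low, high)) in metrics.items():
    print(f"{name}: {estimate:.3f} (95% CI {low:.3f}-{high:.3f})")
```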

Multivariate logistic regression analysis and decision tree analysis were performed on 49 cases of NME to develop prediction models. Figure 4 and Table 5 show the results of the multivariate logistic regression analysis, and Fig. 5 shows the results of the decision tree analysis for NME.

Fig. 4 Receiver operating characteristic analysis for non-mass enhancement

Table 5 Logistic regression analysis for NME
Fig. 5 Decision tree analysis for non-mass enhancement

According to this multivariate model with logistic regression analysis, the BPE level (i.e., minimal and mild) and distribution (i.e., linear and segmental) significantly increased the risk of malignancy. The AUC of the model was 0.829 (95% CI 0.706–0.951) in the ROC analysis.

According to the decision tree analysis, BPE level and distribution were diagnostic indicators of the model. The sensitivity, specificity, and positive predictive value of this model were 0.538 (95% CI 0.334–0.734), 0.913 (95% CI 0.720–0.989), and 0.714 (95% CI 0.567–0.834), respectively.

Discussion

Breast MRI detects abnormalities and determines whether these abnormalities are benign or malignant. Achieving a definitive diagnosis using MRI alone is challenging, and additional information is necessary to assess malignancy and ensure proper management. Using terms based on the BI-RADS definition allows for the consideration of findings from the perspective of whether they are more suggestive of malignancy or benignity [16, 17].

In this study, we identified important diagnostic terms for discriminating benign cases from malignant ones by performing univariate and multivariate logistic regression and decision tree analyses.

Univariate analysis revealed significant differences in mass descriptors (i.e., shape, margin, internal enhancement, and kinetics) and NME distribution, which are included in the BI-RADS assessment criteria and have been emphasized in previous diagnoses. However, the importance of other factors has not been considered. In particular, age was shown to be a highly significant factor in the diagnosis. It is clinically evident that younger individuals often have benign lesions [18]; however, this factor may be less noticeable when examining images alone, making it worthy of attention. Furthermore, in this study, the lesion site (i.e., quadrant and depth) was useful in distinguishing benign cases from malignant ones. Intraductal papilloma, which is a benign tumor, has been shown to have a high incidence in the subareolar and superficial areas [19]. This study included 24 cases of intraductal papilloma, which may have influenced the results. Although lesion location is not included in the BI-RADS diagnostic criteria, it can serve as a reference when making a diagnosis. FGT and BPE levels have been suggested to be associated with the occurrence of breast cancer and implicated as factors related to malignancy, indicating their significance in image diagnostics [20, 21]. Although previous studies have reported that lesion size is an independent factor in diagnosing solitary breast masses and that incorporating lesion size information into the BI-RADS-MRI 2013 descriptors enables more precise categorization [9], this study did not establish a relationship between lesion size and the likelihood of malignancy. These findings may be influenced by subjective assessments or selection bias, necessitating further investigation.

Multivariate logistic regression analysis of masses showed that age, lesion location (quadrant), margin, and kinetic-delayed phase significantly increased the risk of malignancy. The proposed model showed high diagnostic performance in the ROC analysis (AUC of 0.925 [95% CI 0.881–0.970]). In contrast, the decision tree analysis of masses showed that age is useful in the diagnostic process when the margin is circumscribed, and location is useful when the margin is irregular or spiculated. Even if the mass margin is circumscribed, the frequency of malignancy increases with age; therefore, caution should be exercised if the patient is over 64 years of age. If the margin of the mass is irregular or spiculated, the likelihood of malignancy is higher when it is located in the middle or posterior part of the breast. This may be related to previous studies reporting that triple-negative breast cancers and breast cancers arising in BRCA2 mutation carriers tend to originate from the posterior portion of the breast [22, 23]. However, this study did not investigate BRCA gene mutations or breast cancer subtypes; therefore, further investigation is needed.
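The two branches described above can be written as a small rule, shown here only to make the logic explicit. This is not the fitted tree in Fig. 3: the function name, the category labels, and the boolean output are hypothetical, and a result of False means only that neither branch described in the text raises additional concern.

```python
def needs_extra_caution(margin, age, depth):
    """Encode the two decision-tree branches described in the text.

    Circumscribed margin: caution if the patient is over 64 years of age.
    Irregular or spiculated margin: higher likelihood of malignancy when the
    lesion lies in the middle or posterior part of the breast.
    """
    if margin == "circumscribed":
        return age > 64
    if margin in ("irregular", "spiculated"):
        return depth in ("middle", "posterior")
    raise ValueError(f"unexpected margin descriptor: {margin!r}")

# Hypothetical examples
print(needs_extra_caution("circumscribed", age=70, depth="anterior"))  # True
print(needs_extra_caution("circumscribed", age=45, depth="posterior")) # False
print(needs_extra_caution("spiculated", age=45, depth="posterior"))    # True
```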

Multivariate logistic regression analysis of NME showed that BPE level and distribution significantly increased the risk of malignancy. The model designed for NME had a lower diagnostic performance (AUC of 0.829 [95% CI 0.706–0.951]) than the model designed for mass, possibly because of the small sample size. In this study, the analysis for NME showed that malignancy is suggested when the distribution is segmental and the BPE level is minimal. This may indicate that BPE levels influence the diagnosis of NME. There have been several reports on NME, each reporting the frequency of malignancy according to the BI-RADS lexicon. The reported frequency of malignancy in NME varies widely, ranging from 25% to 83.6%, and the frequency of each finding also varies [24]. However, few reports have focused on the combination of findings or on characteristics specific to benign cases. The present study, which focused on benign lesions, demonstrated that BPE level and NME distribution are important findings to consider as indicative of benignity in actual readings. In interpreting images, it is crucial not to regard BPE as an abnormal finding. Additionally, in cases of NME, it should be recognized that distributions other than segmental often suggest a benign lesion.

Tozaki et al. reported that information on the shape/edge of lesions, heterogeneity within tumors, and kinetic information are useful for distinguishing benign lesions from malignant ones [25]. Moreover, An et al. reported the usefulness of the heterogeneity of internal lesion patterns and low apparent diffusion coefficient values in predicting malignant lesions [26]. However, even for frequently encountered histologic types, definitively classifying a benign tumor as benign on MRI can be challenging. Differentiating intraductal papillomas from ductal carcinomas is difficult, and some reports have suggested that a low early-phase enhancement rate and evolution of the DCE-MRI enhancement pattern from homogeneous or heterogeneous enhancement to rim enhancement are more likely to suggest intraductal papilloma [27]. However, even with core biopsy, accurate classification remains difficult. Fibroadenomas, which are known to have high T2 signal intensity, are difficult to differentiate from phyllodes tumors or mucinous carcinomas, which show similar imaging characteristics [28,29,30]. This study suggests that in older patients, especially those over 64 years of age, even circumscribed masses may have a higher likelihood of malignancy. In younger patients, considering fibroadenomas or phyllodes tumors may enhance diagnostic accuracy, while in older patients, keeping mucinous carcinoma in mind could improve diagnostic precision.

This multicenter study collected only breast MR images acquired using gadobutrol. The high paramagnetic effect of gadobutrol provides a higher relaxivity, which is associated with image quality, than that of other macrocyclic GBCAs [13, 14]. It has been reported that gadobutrol may reduce the contrast between breast cancer and background parenchyma in premenopausal patients, and that breast cancers are less likely to show "washout" and more likely to show "plateau" kinetics [15]. In this study, we constructed diagnostic models using multivariate logistic regression and decision tree analyses; however, caution is necessary when applying these models to breast MR images acquired with contrast agents other than gadobutrol.

To achieve high diagnostic accuracy, appropriately integrating clinical and imaging features is essential. Combining multiple imaging modalities allows the use of complementary information, enabling a more comprehensive evaluation. Machine learning and deep learning algorithms are gaining attention in medical imaging because they offer the potential to extract patterns from vast amounts of data and construct diagnostic models [31,32,33,34,35,36]. This can enhance the diagnostic accuracy, providing a more accurate and efficient diagnostic capability. Further research is crucial to determine the true effectiveness of these approaches for breast MRI.

This study has several limitations. First, there is a case selection bias because the study focused on Japanese individuals and selected equal numbers of malignant and benign cases from clinical cases across multiple facilities. Second, nine of the 100 benign cases were not pathologically confirmed as benign. Although this study investigated indicators for distinguishing benign cases from malignant ones, a detailed comparison between individual cases in terms of pathological and imaging findings was impossible. Third, there is a potential for interobserver variability in assessments based on the findings. However, because we used the established BI-RADS terminology, which has already been validated, and adopted a straightforward judgment, the impact of this limitation is considered minimal. Finally, this study lacks a prospective diagnostic evaluation, and further verification is required for different scenarios, such as screening or preoperative settings. Additionally, continuous research to identify new observations is essential.

In conclusion, we conducted a multicenter collaborative study on breast MRI involving Japanese individuals using gadobutrol as the sole contrast agent. Our findings emphasize the significance of incorporating age, lesion location, and BPE level alongside the BI-RADS lexicon morphology features for more accurate determinations, potentially enhancing the differentiation between benign and malignant diagnoses in clinical practice.