Improved characterization of sub-centimeter enhancing breast masses on MRI with radiomics and machine learning in BRCA mutation carriers

Objectives To investigate whether radiomics features extracted from MRI of BRCA-positive patients with sub-centimeter breast masses can be coupled with machine learning to differentiate benign from malignant lesions using model-free parameter maps.
Methods In this retrospective study, BRCA-positive patients who had an MRI from November 2013 to February 2019 that led to a biopsy (BI-RADS 4) or imaging follow-up (BI-RADS 3) for sub-centimeter lesions were included. Two radiologists assessed all lesions independently and in consensus according to BI-RADS. Radiomics features were calculated using open-source CERR software. Univariate analysis and multivariate modeling were performed to identify significant radiomics features and clinical factors to be included in a machine learning model to differentiate malignant from benign lesions.
Results Ninety-six BRCA mutation carriers (mean age at biopsy = 45.5 ± 13.5 years) were included. Consensus BI-RADS classification assessment achieved a diagnostic accuracy of 53.4%, sensitivity of 75% (30/40), specificity of 42.1% (32/76), PPV of 40.5% (30/74), and NPV of 76.2% (32/42). The machine learning model combining five parameters (age, lesion location, GLCM-based correlation from the pre-contrast phase, first-order coefficient of variation from the 1st post-contrast phase, and SZM-based gray level variance from the 1st post-contrast phase) achieved a diagnostic accuracy of 81.5%, sensitivity of 63.2% (24/38), specificity of 91.4% (64/70), PPV of 80.0% (24/30), and NPV of 82.1% (64/78).
Conclusions Radiomics analysis coupled with machine learning improves the diagnostic accuracy of MRI in characterizing sub-centimeter breast masses as benign or malignant compared with qualitative morphological assessment with BI-RADS classification alone in BRCA mutation carriers.
Key Points
• Radiomics and machine learning can help differentiate benign from malignant breast masses even if the masses are small and morphological features are benign.
• Radiomics and machine learning analysis showed improved diagnostic accuracy, specificity, PPV, and NPV compared with qualitative morphological assessment alone.
Electronic supplementary material The online version of this article (10.1007/s00330-020-06991-7) contains supplementary material, which is available to authorized users.


Introduction
Women who inherit BRCA1 or BRCA2 mutations lack tumor suppressor proteins that repair damaged DNA [1]. These women have an increased risk of developing breast cancer at a younger age compared with women who do not have these mutations. MRI is the most sensitive imaging modality for breast cancer detection; therefore, the American Cancer Society and the American College of Radiology recommend yearly mammography in BRCA mutation carriers starting at age 30 years and yearly MRI beginning at age 25 years [2][3][4][5][6][7].
A significant proportion (45%) of BRCA1-related cancers are seen only on MRI [8] where they tend to be cellular with round pushing margins rather than scirrhous with irregular infiltrating margins as seen in other breast cancers. Therefore, early/small tumors may not exhibit classic malignant features but rather may exhibit a benign imaging appearance [9]. As these cancers are also more likely to be high grade and frequently triple negative (hormone receptor and HER-2 negative), the threshold for the recommendation of a biopsy should be low [10,11]. Prior studies [12,13] showed how benign morphology is common in invasive cancers of less than 5 mm in diameter regardless of BRCA mutation status and suggested that all masses representing an interval change as well as lesions increasing in size should lead to a biopsy. Unfortunately, BRCA carriers are also more prone to developing benign tumors of the breast [14,15], resulting in numerous benign biopsies during their life unless prophylactic mastectomy is performed.
To avoid missing significant cancers as well as exposing women to unnecessary biopsies, additional tools to help discriminate benign from malignant lesions should be used to predict the likelihood of malignancy. Radiomics analysis involves the quantitative assessment of the pixel intensity arrangement within specific regions of interest (ROIs) and extracts quantitative features that can be used for further disease characterization. Initial results in women at average risk of breast cancer indicate that radiomics analysis and machine learning (ML) are of value in distinguishing benign and malignant small breast masses [16].
The purpose of our study was to investigate whether radiomics features extracted from MRI of BRCA-positive patients with sub-centimeter breast masses can be coupled with machine learning to differentiate benign from malignant lesions using model-free parameter maps.

Study population
This was a retrospective Health Insurance Portability and Accountability Act-compliant study conducted at Memorial Sloan Kettering Cancer Center. The study was approved by the Institutional Review Board (protocol number 19-119) and the need for written informed consent was waived.
A review of the Department of Radiology database was performed to identify consecutive patients with genetic testing results available and who had an MRI from November 2013 to February 2019 that led to a biopsy or a short-term follow-up. We identified 430 patients. Our inclusion criteria were as follows: BRCA1- or BRCA2-positive patients; breast masses with the longest diameter ≤ 10 mm; and BI-RADS 3, 4, or 5 on MRI further assessed with follow-up or vacuum-assisted breast biopsy (MRI or ultrasound-guided) yielding benign or malignant histology. Findings described as non-mass enhancements on MRI were not included. We excluded patients with mutations other than BRCA1 and BRCA2 and those with a follow-up of less than 2 years when biopsy was not performed (BI-RADS 3, and BI-RADS 4 when the target was not visualized at the time of biopsy).

Breast MRI technique
Breast MRI was performed on either a 1.5-T or a 3-T magnet (Signa; GE) using an 8-channel or 16-channel dedicated surface breast coil. The imaging sequences are included in Table 1.

Imaging assessment by radiologists
All images were independently assessed by two dedicated fellowship-trained breast radiologists in one session (R1: R.L., and R2: I.D., both with 4 years of experience in breast imaging and interpreting breast MRI) blinded to the final histopathological diagnoses and to prior or subsequent conventional imaging and MRI. For each lesion, the following morphological features were assessed according to the BI-RADS lexicon on post-contrast-enhanced T1-weighted images: lesion shape, margin, and internal enhancement characteristics. Readers also assigned a BI-RADS classification. Lesion size was measured as the single largest diameter. On T2-weighted and DW images, signal intensity, morphology, background parenchymal enhancement (BPE), and fibroglandular tissue (FGT) for each breast were also assessed. Time-intensity kinetic curve analysis (signal enhancement in relation to time after contrast injection) was performed on a dedicated workstation with a commercially available computer-aided diagnosis system (OsiriX, OsiriX Foundation) by R1. The reader qualitatively assessed the kinetic curve pattern as washout, plateau, or persistent, according to the BI-RADS lexicon. The location of lesions within the breast (anterior, middle, or posterior depth) was also assessed by R1.
After independent review was conducted, the cases in which there was disagreement between the two readers were re-reviewed in consensus to generate an overall consensus assessment.

Reference standard
Preferentially, histopathology was used as the reference standard established by either image-guided needle biopsy or surgery. In two patients who had benign high-risk lesions on biopsy, the histological report from the surgical biopsy was recorded to confirm the benign nature of the lesion. When biopsy was not performed, stability of more than 2 years on follow-up MRI was considered benign.

Radiomics analysis
Digital Imaging and Communications in Medicine (DICOM) images from the DCE-MRI and non-contrast-enhanced T1-weighted MRI were loaded into the open-source image processing tool OsiriX. Both radiologists reviewed the images in consensus before delineating the ROIs, and R1 manually delineated the ROIs, tracing the borders of each lesion to include the entire enhancing lesion.
Given the small size of the lesions sampled, yielding a small number of pixels per slice, an in-house code written in MATLAB (The MathWorks, Inc.) was used to input the ROIs into the open-source CERR software environment (freely available through GitHub), which calculated the radiomics features [17]. Data were reduced to 16 gray levels and only an interpixel distance of one was considered (for small lesions, higher interpixel distances are not appropriate and would reduce counting statistics drastically). CERR analysis resulted in 102 radiomics features sub-divided into six categories: 22 first-order features, 26 features based on the gray level co-occurrence matrix (GLCM), 16 features based on the run length matrix (RLM), 16 features based on the size zone matrix (SZM), 17 features based on the neighborhood gray level dependence matrix, and 5 features based on the neighborhood gray tone difference matrix. Since patients were scanned at either 1.5 T (27 benign cases and 17 malignant cases) or 3 T (49 benign cases and 23 malignant cases), ComBat harmonization (Supplemental Info A1) was employed prior to statistical analysis to remove center effects [18].
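As a rough illustration of the gray-level reduction and co-occurrence step described above, the sketch below quantizes a patch to 16 gray levels, builds a GLCM at an interpixel distance of one, and computes the GLCM correlation feature. It is a minimal NumPy sketch and not CERR's implementation: the function names, the rectangular patch (real ROIs are irregular masks), the min-max quantization, and the single horizontal offset are all assumptions made for illustration.

```python
import numpy as np


def quantize(roi, levels=16):
    """Linearly rescale intensities to integer gray levels 0..levels-1 (assumed scheme)."""
    roi = np.asarray(roi, dtype=float)
    lo, hi = roi.min(), roi.max()
    if hi == lo:  # flat patch: everything falls in level 0
        return np.zeros(roi.shape, dtype=int)
    q = np.floor((roi - lo) / (hi - lo) * levels).astype(int)
    return np.clip(q, 0, levels - 1)


def glcm(q, levels=16, offset=(0, 1)):
    """Symmetric, normalized co-occurrence matrix for one pixel offset (distance 1 here)."""
    dr, dc = offset
    rows, cols = q.shape
    counts = np.zeros((levels, levels))
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[q[r, c], q[r2, c2]] += 1
    counts = counts + counts.T  # count each pair in both directions
    return counts / counts.sum()


def glcm_correlation(p):
    """Correlation between the gray levels of neighboring pixels."""
    i, j = np.indices(p.shape)
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt((((i - mu_i) ** 2) * p).sum())
    sd_j = np.sqrt((((j - mu_j) ** 2) * p).sum())
    if sd_i == 0 or sd_j == 0:  # undefined for a single gray level
        return float("nan")
    return float((((i - mu_i) * (j - mu_j)) * p).sum() / (sd_i * sd_j))


# Hypothetical 4 x 8 patch with a smooth left-to-right intensity ramp.
patch = np.tile(np.arange(8), (4, 1))
print(glcm_correlation(glcm(quantize(patch))))
```

With only a few dozen pixels per slice, a 16 x 16 matrix is already sparsely populated, which is the counting-statistics concern behind restricting the analysis to 16 gray levels and a distance of one.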
Univariate analysis was initially performed to select significant radiomics features able to differentiate between benign and malignant lesions. An AUC cutoff of ≥ 0.65 was used to reduce the number of features of interest. Correlation analysis was then employed to further remove redundant features. For any significant correlations in which the Spearman rank correlation coefficient > 0.9, the feature with the lowest AUC was removed from consideration. This resulted in a more manageable number of features for subsequent multivariate modeling. Using a fine Gaussian support vector machine, perfect separation of benign and malignant cases was obtained. To limit data overfitting, a fivefold cross-validation was employed to develop a robust ML model which should produce similar results for new data.

Table 1 Imaging sequences

Sequence: Axial fat-suppressed 2D T2-weighted imaging
Parameters: TR, 5000-6000 ms; TE, 90-110 ms; refocusing flip angle, "auto"; slice thickness, 3 mm; gap, 0 mm; field of view, 34-38 cm; matrix size, 320 × 320; bandwidth, 125 kHz for 1.5 T and 83 kHz for 3.0 T; parallel imaging, "ASSET"

Sequence: Axial non-fat-suppressed 3D T1-weighted imaging

Sequence: Axial fat-suppressed 3D T1-weighted imaging using a Volume Image Breast Assessment (VIBRANT) gradient echo. One sequence before and 3 sequences after intravenous administration of a gadolinium-based contrast agent
Parameters: TR, 4-4.5 ms; TE, 2.1 ms; flip angle, 10°; bandwidth, 62 kHz; field of view, 34-38 cm; matrix size, 320 × 192 (for 1.5 T) and 300 × 300 (for 3.0 T); slice thickness, 1.1 mm; gap, 0 mm; parallel imaging, "ASSET"

Sequence: Axial DWI using single-shot with echo-planar imaging (EPI)
Parameters: 2 b-values (b = 0, 800); TR, 6000 ms; TE, "minimum"; flip angle, 90°; field of view, 34-38 cm; matrix size, 128 × 128 (for 1.5 T), 256 × 256 (for 3 T); fat suppression, "special"; dual shims, "on"; slice thickness, 4-5 mm; parallel imaging, "ASSET"

ADC mapping available in 65 lesions
ASSET, array spatial sensitivity encoding technique; TR, repetition time; TE, echo time
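The two-stage feature reduction described in the radiomics analysis (an AUC cutoff of ≥ 0.65 followed by removal of features with Spearman correlation > 0.9) might be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function name `select_features`, the greedy keep-the-higher-AUC ordering, the use of the absolute correlation, and the direction-independent AUC (max of AUC and 1 − AUC) are assumptions, and the significance check on each correlation is omitted.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.metrics import roc_auc_score


def select_features(X, y, auc_cutoff=0.65, rho_cutoff=0.9):
    """Return indices of retained features and the per-feature AUCs."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    aucs = np.array([roc_auc_score(y, X[:, k]) for k in range(X.shape[1])])
    aucs = np.maximum(aucs, 1.0 - aucs)  # ignore the direction of the effect

    # Visit candidates from highest to lowest AUC so that, within any highly
    # correlated pair, the feature with the lower AUC is the one discarded.
    candidates = [int(k) for k in np.argsort(-aucs) if aucs[k] >= auc_cutoff]
    kept = []
    for k in candidates:
        redundant = any(
            abs(spearmanr(X[:, k], X[:, m])[0]) > rho_cutoff for m in kept
        )
        if not redundant:
            kept.append(k)
    return kept, aucs


# Synthetic example (not study data): two redundant informative features
# and one uninformative feature.
y = np.array([0] * 60 + [1] * 40)
f0 = 10.0 * y + (np.arange(100) % 7)
f1 = 3.0 * f0 + 1.0              # monotone copy of f0, so Spearman rho = 1
f2 = np.tile(np.arange(20.0), 5)  # same distribution in both classes
kept, aucs = select_features(np.column_stack([f0, f1, f2]), y)
print(kept, aucs)
```

Because the filter runs on the whole dataset before any model is fitted, it sits outside the cross-validation loop, which is the caveat the limitations paragraph raises about feature selection.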

Statistical analysis
Statistical analysis was conducted using SAS (version 9.4, SAS Institute). Continuous variables were summarized using means (± standard deviation) and medians (range); categorical variables were summarized using proportions. Univariate analysis using the chi-square test or Fisher's exact test was performed to assess associations between the imaging parameters (from independent and consensus assessment) with disease status (malignant vs. benign). p values < 0.05 were considered significant. To determine inter-observer agreement, weighted Cohen's κ was used to assess ordinal parameters, while simple Cohen's κ was used to assess the inter-observer agreement for nominal parameters. For radiomics data, statistical analysis was performed using SPSS (version 25, IBM Corp.) and MATLAB (R2017b, The MathWorks, Inc.). Univariate analysis was performed to identify radiomics features that were significantly different between malignant and benign lesions. Since the number of patients was not large (especially in the malignant cohort), normality in the malignant and benign cohort distributions was tested using the Shapiro-Wilk test and Q-Q plots. For a minority (21/102) of normally distributed features, a two-tailed independent t test was used to determine the significant features. For the majority of non-normally distributed features (81/102), the Mann-Whitney U test for two independent samples was used to determine the significant features.
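The per-feature choice between a t test and a Mann-Whitney U test described above can be sketched with SciPy. This is a hedged illustration, not the SPSS/MATLAB workflow used in the study: the function name `compare_feature` is hypothetical, normality is judged by the Shapiro-Wilk test alone (the Q-Q plot inspection is not reproduced), and the example values are synthetic.

```python
import numpy as np
from scipy.stats import mannwhitneyu, norm, shapiro, ttest_ind


def compare_feature(benign, malignant, alpha=0.05):
    """Pick the two-sample test from a normality check and return (test name, p value)."""
    # Treat the feature as normal only if both cohorts pass Shapiro-Wilk.
    normal = shapiro(benign)[1] > alpha and shapiro(malignant)[1] > alpha
    if normal:
        return "t test", ttest_ind(benign, malignant)[1]  # two-tailed by default
    return "Mann-Whitney U", mannwhitneyu(benign, malignant, alternative="two-sided")[1]


# Synthetic, near-perfectly normal cohort values, shifted between groups.
normal_scores = norm.ppf((np.arange(1, 41) - 0.5) / 40)
print(compare_feature(normal_scores, normal_scores + 2.0))

# Synthetic, strongly right-skewed cohort values.
skewed = np.exp(np.linspace(0.0, 6.0, 40))
print(compare_feature(skewed, 5.0 * skewed))
```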
Clinical factors considered as potential predictors of malignancy (age, BRCA status, menopausal status, and lesion location) were assessed for statistically significant associations with disease status using the Mann-Whitney U test (for age) and the Pearson chi-square test (for all other clinical factors). Significant clinical factors were incorporated into multivariate modeling along with significant radiomics features to produce a robust ML model for discriminating between benign and malignant lesions. All ML modelling was performed using a predefined Gaussian support vector machine.
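To show why cross-validation matters for a fine Gaussian support vector machine, the sketch below fits one to pure-noise features and compares resubstitution accuracy with fivefold cross-validated accuracy. It uses scikit-learn rather than the MATLAB tools used in the study. The mapping of the "fine Gaussian" preset to a kernel scale of sqrt(P)/4 (so gamma = 16/P), the box constraint C = 1, the standardization step, and the synthetic data with 76 benign and 40 malignant labels and 11 features are all assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data: the features are random noise, unrelated to the labels.
rng = np.random.default_rng(0)
n_benign, n_malignant, n_features = 76, 40, 11
X = rng.normal(size=(n_benign + n_malignant, n_features))
y = np.array([0] * n_benign + [1] * n_malignant)


def fine_gaussian_svm(n_features):
    """RBF SVM with a narrow kernel (assumed kernel scale sqrt(P)/4)."""
    kernel_scale = np.sqrt(n_features) / 4.0
    svm = SVC(kernel="rbf", gamma=1.0 / kernel_scale**2, C=1.0)
    return make_pipeline(StandardScaler(), svm)


# Resubstitution: train and score on the same cases.
train_accuracy = fine_gaussian_svm(n_features).fit(X, y).score(X, y)

# Fivefold cross-validation: each case is scored by a model that never saw it.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(fine_gaussian_svm(n_features), X, y, cv=cv)

print(f"resubstitution accuracy: {train_accuracy:.3f}")
print(f"cross-validated accuracy: {cv_scores.mean():.3f}")
```

A narrow Gaussian kernel lets the classifier memorize individual training cases, so a high resubstitution accuracy on its own says little about performance on new data.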

Patient population and breast lesion characteristics
The study population included 96 patients (Fig. 1). Table 2 and Fig. 2 show the patient and breast lesion characteristics. Figures 3 and 4 are examples of benign and malignant breast masses included in this study. After segmentation, the median benign lesion size was 514.5 pixels (range 85-2425 pixels) and the median malignant lesion size was 816 pixels (range 66-2116 pixels).

Imaging assessment by radiologists
Consensus BI-RADS classification achieved a sensitivity of 75%, specificity of 42.1%, PPV of 40.5%, NPV of 76.2%, and accuracy of 53.4%. Time-intensity kinetic curve analysis was performed on 109/116 lesions; 7 lesions were not analyzed due to motion-related artifacts. Progressive contrast enhancement was present in 54.3% of patients with benign lesions (38/70) and in 23% of patients with malignant lesions (9/39); there was a statistically significant association with disease status based on kinetic analysis (p = 0.01). Table 3 shows the results from univariate analysis according to independent assessments by the two radiologists. Table 4 shows the results from univariate analysis according to overall consensus assessment as well as according to singular assessment performed for kinetics and lesion location, BRCA mutation status, and menopausal status. In consensus reading, there was no significant association with disease status based on margin (p = 0.11), shape (p = 0.97), enhancement pattern (p = 0.05), T2 signal intensity (p = 0.16), DWI (p = 0.54), BPE (p = 0.32), or BRCA mutation status (BRCA1 vs. BRCA2, p = 0.79). There was a statistically significant association with disease status based on lesion location within the breast (p = 0.03), menopausal status (p = 0.0001), and BI-RADS classification (p < 0.001).
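The consensus BI-RADS performance figures follow directly from the counts reported in the abstract (30/40, 32/76, 30/74, and 32/42). The short sketch below recomputes them. The function name `diagnostic_metrics` is hypothetical, and the false-negative and false-positive counts (10 and 44) are derived from those fractions rather than stated explicitly in the text.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard diagnostic performance measures from a 2 x 2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),  # malignant lesions correctly flagged
        "specificity": tn / (tn + fp),  # benign lesions correctly cleared
        "ppv": tp / (tp + fp),          # flagged lesions that are malignant
        "npv": tn / (tn + fn),          # cleared lesions that are benign
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Consensus BI-RADS reading: 30 of 40 malignant and 32 of 76 benign lesions
# were classified correctly.
consensus = diagnostic_metrics(tp=30, fn=10, tn=32, fp=44)
for name, value in consensus.items():
    print(f"{name}: {100 * value:.1f}%")
```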

Radiomics analysis
ML model using only the first post-contrast phase
At univariate analysis, 37/102 radiomics features were found to be significantly different between benign and malignant lesions (Supplemental Table S1). The AUC cutoff of ≥ 0.65 reduced the number of features of interest to 21/102. Correlation analysis was then used to remove redundant features (Supplemental Table S2). Using a fine Gaussian support vector machine with all 11 parameters, a perfect separation of benign and malignant cases was obtained, demonstrating 100% accuracy. However, this ML model undoubtedly overfitted the data (Supplemental Table S3).
After fivefold cross-validation, LASSO (least absolute shrinkage and selection operator) was used to further reduce the number of parameters. The final ML model utilized three parameters (GLCM-based correlation, SZM-based gray level nonuniformity normalized, and SZM-based zone emphasis). This ML model achieved a diagnostic accuracy of 75% but it can be regarded as a robust ML model which should produce similar results for new data (Supplemental Table S4). This ML model achieved a sensitivity of 55.0% (22/40), specificity of 85.5% (65/76), PPV of 66.7% (22/33), and NPV of 78.3% (65/83).
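How an L1 (LASSO) penalty prunes a parameter set can be illustrated with a small sketch. The text does not state how LASSO was configured, so the L1-penalized logistic regression, the penalty strength `C=0.05`, and the synthetic data (one informative feature among 11) are all assumptions made for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data: 76 benign and 40 malignant cases, 11 candidate parameters.
rng = np.random.default_rng(0)
y = np.array([0] * 76 + [1] * 40)
X = rng.normal(size=(116, 11))
X[:, 0] += 3.0 * y  # only parameter 0 carries information about the label

# Standardize so the penalty treats all parameters on the same scale.
X_std = StandardScaler().fit_transform(X)

# The L1 penalty drives the coefficients of weak predictors to exactly zero.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.05)
lasso.fit(X_std, y)

selected = [k for k, w in enumerate(lasso.coef_[0]) if w != 0.0]
print("parameters retained:", selected)
```

Parameters whose coefficients shrink to zero drop out of the model, which is how a larger candidate set can be cut down to a handful of retained parameters.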

ML model combining radiomics features from all dynamic phases and clinical factors
The results for the ML model using all dynamic phases and clinical factors are provided in the Supplemental Data (Supplemental Info A2, Table S6, Table S7, Table S8). This ML model resulted in a diagnostic accuracy of 81.5% and can be regarded as a robust model. The results from all radiomics models are illustrated in Table 5.

Discussion
In this study, we investigated whether radiomics analysis and ML with MRI can accurately differentiate sub-centimeter benign from malignant lesions in BRCA mutation carriers using model-free parameter maps. We demonstrated that radiomics analysis coupled with ML aids in the differentiation of benign and malignant enhancing sub-centimeter masses in these patients. The T2-weighted signal intensity and DW imaging did not help to differentiate benign from malignant lesions. While larger cancers have been well-described and characterized on MRI, sub-centimeter lesions, particularly those less than 0.5 cm, have traditionally been regarded as being too small to characterize according to morphological descriptors, negatively impacting accuracy. With advancements in hardware and software, the spatial resolution of MRI has improved, allowing not only the detection but also the morphologic characterization of small enhancing lesions [19]. Meissnitzer et al [13] showed that sub-centimeter invasive breast cancers often present with benign morphologic features such as persistent enhancement (30%) and high T2 signal (17%). Raza et al [20] demonstrated that breast cancers smaller than 5 mm tend to present with circumscribed margins (71%), benign shape (67%), and benign kinetic characteristics (41%). The presence of a BRCA mutation is an additional confounding factor as breast cancers in this population often present with benign morphologic features (e.g., oval shape and well-defined margins) on MRI and can resemble a fibroadenoma or a cyst in 23-38% of cases [12,20]. Yet, these cancers are more aggressive with fast growth rates and a short lead time [20].
Our results confirmed that for sub-centimeter masses in BRCA mutation carriers, morphologic BI-RADS descriptors are not particularly useful for breast cancer diagnosis; there was only moderate inter-rater agreement for morphology although there was at least substantial inter-rater agreement for the BI-RADS assessment categories. Compared with Ha et al [21] who concluded that any T2 hypointense enhancing focus representing an interval change should be biopsied rather than undergo short-term follow-up, we found no significant difference in T2 signal intensity between benign and malignant lesions. This is in agreement with Zhang et al who also showed that T2-weighted imaging does not significantly contribute to differentiating benign from malignant lesions [22]. In addition, we found that DWI signal analysis did not contribute to the accuracy of assessing these lesions, which can in part be explained by its limited spatial resolution which makes it challenging to accurately evaluate sub-centimeter masses.
Several studies have shown that radiomics and machine learning can be used as adjunct tools to support radiologist image interpretation in differentiating benign from malignant lesions using mammography [23], digital breast tomosynthesis [24], and MRI [16,25]. A study by Truhn et al [26] demonstrated that a CNN was superior to radiomics analysis in differentiating benign from malignant breast masses but both were inferior to the assessment performed by the radiologist. However, for this study, the authors included lesions with an overall average diameter of 22.4 ± 20.3 mm; thus, their results could be due to the fact that larger lesions are easier to characterize as benign or malignant by just analyzing BI-RADS descriptors.
Our study shows a more accurate means of differentiating benign from malignant lesions in BRCA mutation carriers. Gibbs et al evaluated the utility of radiomics and ML from DCE-based parameter maps to diagnose small breast lesions in the general population [16]. The best AUC was 0.78 ± 0.12 and their results showed that radiomics can potentially improve the evaluation of small, benign-appearing breast masses, with increased PPV (fewer biopsies needed) and NPV (more cancers diagnosed) compared with the currently used BI-RADS classification alone. In our study population of BRCA mutation carriers, our data indicate that radiomics analysis and ML can in fact spare women from unnecessary biopsies for benign-appearing small breast nodules. Three radiomics features (coefficient of variation, cluster prominence, and Haralick correlation) were able to separate benign from malignant masses with a diagnostic accuracy of 79.3% when only the first post-contrast scan, combined with clinical data, was used in a ML model.
Recently, alternative abbreviated protocols have been proposed for screening women [19,28] to reduce scan time by acquiring only one pre-contrast and one early post-contrast T1-weighted image set. In agreement with the results of Gibbs et al [16], our results showed that delayed post-contrast phases did not add any significant discriminative value to the analysis. This study therefore provides indirect evidence for the potential use of radiomics analysis in abbreviated protocols which have been recently proposed as an alternative for screening high-risk women with dense breast tissue [19] without concerns regarding a decrease in specificity related to the lack of information of enhancement kinetics in the delayed phases.
This study has limitations. By using only single-center data, it is difficult to predict how the developed models might perform with data acquired under different imaging protocols, especially in the case of poorer spatial resolution and greater slice thickness. We included only sub-centimeter breast masses which do not constitute many pixels in an image, leading to lower spatial resolution and fewer pixels in the final ROI and an increased proportion of pixels that can be regarded as potentially contaminated by partial volume effects. To ensure adequate counting statistics, we decreased the data to only 16 gray levels (vs. 32 or 64 gray levels that have previously been employed in breast MRI) [29]. Another limitation is the relatively small sample size of 116 breast masses due to our strict inclusion criteria. With only 40 cases in the malignant group, feature selection was performed prior to any cross-validation fold.
In conclusion, radiomics analysis coupled with machine learning improves the diagnostic accuracy in small breast masses in BRCA mutation carriers compared with the qualitative morphological assessment with BI-RADS classification alone. Further studies, preferentially multi-center studies in larger patient cohorts, are needed to confirm these promising results.

Compliance with ethical standards
Guarantor The scientific guarantor of this publication is Katja Pinker, MD PhD.

Conflict of interest
The authors of this manuscript declare no relationships with any companies, whose products or services may be related to the subject matter of the article.
Statistics and biometry Two of the authors (Peter Gibbs, PhD, and Almir Bitencourt, MD, PhD) have significant statistical expertise.
Informed consent Written informed consent was waived by the Institutional Review Board.
Ethical approval Institutional Review Board approval was obtained.

Methodology
• retrospective
• observational
• performed at one institution
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.