Introduction

Ductal carcinoma in situ (DCIS) is a noninvasive neoplasm of the breast that carries an unpredictable risk of progressing to invasive cancer [1]. As breast cancer screening has become widely available through routine imaging including mammography, the incidence of DCIS has gradually increased to account for approximately one-quarter of new breast cancer diagnoses [2]. While pure DCIS has a very good prognosis, approximately 14–43% of cases are upgraded to invasive cancer after surgery [3,4,5], owing to the inherently heterogeneous histopathologic features of breast cancer [6]. Therefore, patients with DCIS are treated in the same way as those with invasive cancer and undergo surgery, radiation therapy, and hormonal therapy [7].

Ongoing trials are investigating active monitoring as an alternative to current standard cancer treatment for low-grade DCIS, which is typically associated with favorable prognosis and often manifests as calcifications on mammography [8,9,10]. However, studies suggest that candidates presumed to have low-risk DCIS who undergo active monitoring will still face a 5–12% risk of invasive upgrade [11,12,13,14]. Currently, there are no factors that reliably predict invasive upgrade after surgery or invasive progression during surveillance.

In an attempt to identify DCIS candidates for less aggressive treatment including active monitoring, previous studies have incorporated clinical or imaging variables to predict invasive upgrade of DCIS after surgery [15,16,17,18]. These studies relied on radiologists’ subjective image interpretation, which limits the reproducibility and consistency of the data. There has also been an attempt to predict invasive upgrade of DCIS using a machine-learning model based on mammographic radiomic features. However, beyond the difficulty of incorporating such a model into clinical practice, the complexity of extracting and using radiomic features in actual clinical settings remains a challenge [19]. Recently, artificial intelligence (AI)-based computer-aided detection/diagnosis (CAD) algorithms were approved for commercial use in mammography interpretation [20], and these software programs provide quantitative numeric data for abnormalities detected on mammography images. The quantitative data obtained from AI-CAD may be more consistent and objective than radiologists’ interpretations. However, to the best of our knowledge, no studies have evaluated whether abnormality scores provided by AI-CAD systems for mammographic interpretation can predict DCIS upgrade.

Thus, the purpose of this study was to evaluate whether the quantitative abnormality scores provided by AI-CAD for mammography interpretation can be used to predict invasive upgrade in DCIS diagnosed with percutaneous biopsy.

Methods

The institutional review board (IRB) of Severance Hospital, Yonsei University (IRB approval No: 4-2022-0519) approved this retrospective study and waived the requirement for informed consent based on its study design.

Study population

We searched our institutional database for women who were diagnosed with DCIS via percutaneous biopsy, including core needle biopsy and imaging-guided vacuum-assisted biopsy (VAB). From January 2015 to December 2019, 743 DCIS were diagnosed in 717 women via image-guided percutaneous biopsy. The exclusion criteria were as follows: (1) women who were surgically treated for breast cancer in the ipsilateral breast (n = 58), (2) women who had invasive cancers diagnosed in the ipsilateral breast (n = 78), (3) women who underwent neoadjuvant chemotherapy due to invasive cancers in the contralateral breast (n = 33), (4) women who were lost to follow-up after DCIS diagnosis (n = 61), and (5) women who only had analog mammograms from outside hospitals that were inadequate for AI-CAD analysis (n = 73) (Fig. 1).
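
The cohort arithmetic implied by these exclusion criteria can be checked with a short sketch. It assumes that the five exclusion counts are mutually exclusive and counted per lesion, which the text does not state explicitly; the variable and function names are illustrative only.

```python
# Cohort arithmetic from the exclusion criteria listed above.
# Assumption: the exclusion counts do not overlap and are counted per lesion.
initial_lesions = 743

exclusions = {
    "prior surgery for ipsilateral breast cancer": 58,
    "invasive cancer diagnosed in the ipsilateral breast": 78,
    "neoadjuvant chemotherapy for contralateral invasive cancer": 33,
    "lost to follow-up after DCIS diagnosis": 61,
    "analog outside mammograms inadequate for AI-CAD": 73,
}


def remaining_after_exclusions(start, excluded):
    """Subtract each exclusion count in turn and return the final cohort size."""
    remaining = start
    for reason, n in excluded.items():
        remaining -= n
        print(f"excluded {n:>3} ({reason}); remaining: {remaining}")
    return remaining


final_lesions = remaining_after_exclusions(initial_lesions, exclusions)
print("lesions available for analysis:", final_lesions)
```

Under that assumption, the remaining count equals the 440 lesions reported in the Results.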

Fig. 1 Flowchart of patient selection

The clinical characteristics of patients including age, family history of breast cancer, personal history of breast cancer, presence of bilateral breast cancer, presence of related symptoms such as palpability or nipple discharge, method of percutaneous biopsy, and tumor size were extracted from our electronic medical record (EMR) system.

Mammography examinations and interpretation

One of two dedicated digital mammography units was used for the mammography examinations (Senographe DS, GE Medical Systems; Lorad Selenia, Hologic). Standard mediolateral oblique (MLO) and craniocaudal (CC) mammograms and magnification views with 90° lateral and craniocaudal projections, if required, were obtained for all patients.

One board-certified, breast-dedicated radiologist with 14 years of experience in breast imaging (J.H.Y.) retrospectively reviewed the baseline mammograms that were collected routinely before biopsy. Mammographic features of abnormalities that correlated to the biopsy-proven DCIS were categorized into the following four categories: (1) mammographically occult (DCIS detected on supplemental ultrasound (US)), (2) mass/asymmetry/distortion, (3) calcifications only, and (4) combined mass/asymmetry/distortion with calcifications (referred to as “combined features”). Final assessments according to the American College of Radiology Breast Imaging Reporting And Data System (ACR BI-RADS) [21] were also determined by the radiologist during the retrospective review. The radiologist was blinded to the final surgical diagnosis.

Mammography analysis using AI-CAD

A commercially available AI-CAD algorithm (Lunit INSIGHT for Mammography, version 1.1.4.3, Lunit Inc., Seoul, Korea) that was previously validated through a multinational study [22] was used to analyze the mammograms. The algorithm, based on ResNet-34, a popular deep convolutional neural network (CNN) architecture, was trained using 31,604 cancer-positive mammograms and 19,625 benign mammograms with pixel-level labels indicating lesion locations annotated by 12 breast-dedicated radiologists. The algorithm marks regions of interest (ROIs) for abnormalities on mammograms and provides a corresponding abnormality score (referred to as the AI-CAD score, ranging from 0 to 100%) per view.

In this study, we employed a three-pronged approach for AI-CAD scores: (1) numerical AI-CAD score provided in raw numbers (ranging from 0 to 100%); (2) AI-CAD scores dichotomized into < 50% and ≥ 50%; and (3) graded AI-CAD score of < 25%, 25–50%, 50–75%, and ≥ 75%.
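
The three representations of the AI-CAD score can be illustrated with a minimal sketch. The stated grade boundaries overlap at 25%, 50%, and 75%, so the sketch assumes lower-inclusive bins (a score of exactly 50% falls in the 50–75% grade), chosen to agree with the "≥ 50%" and "≥ 75%" cut-offs. That boundary rule and the function names are assumptions, not details given in the text.

```python
def dichotomize(score):
    """Return the dichotomized AI-CAD category for a score in percent (0-100)."""
    if not 0 <= score <= 100:
        raise ValueError("AI-CAD score must lie between 0 and 100")
    return ">=50%" if score >= 50 else "<50%"


def grade(score):
    """Return the four-level AI-CAD grade for a score in percent (0-100).

    Assumption: bins are lower-inclusive, so 25 -> '25-50%', 50 -> '50-75%',
    and 75 -> '>=75%'.
    """
    if not 0 <= score <= 100:
        raise ValueError("AI-CAD score must lie between 0 and 100")
    if score < 25:
        return "<25%"
    if score < 50:
        return "25-50%"
    if score < 75:
        return "50-75%"
    return ">=75%"


def three_pronged(score):
    """Bundle the raw, dichotomized, and graded representations of one score."""
    return {"raw": score, "dichotomized": dichotomize(score), "graded": grade(score)}


# Example scores only; any value between 0 and 100 can be passed in.
for example_score in (42, 92):
    print(three_pronged(example_score))
```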

Histopathology at percutaneous biopsy

Information regarding nuclear grade (low, intermediate, or high grade) and presence of comedonecrosis was collected from the pathology reports from percutaneous biopsy. Tumors on percutaneous biopsy specimens were histologically classified using the World Health Organization criteria [23].

Statistical analysis

Ground truth (pure DCIS or invasive cancer) was confirmed after surgery. The Shapiro-Wilk and Kolmogorov-Smirnov tests were used to assess the normality of age, tumor size, and AI-CAD scores. As the normality assumption was not satisfied, median values of these variables were calculated and compared. The Mann-Whitney U test was used to compare clinicopathological variables between pure DCIS and DCIS with invasive upgrade. Mammographic variables such as imaging features on mammography, ACR BI-RADS final assessment, and median AI-CAD score were also compared between pure DCIS and DCIS with invasive upgrade using the Mann-Whitney U test and Fisher’s exact test.
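
For readers less familiar with Fisher’s exact test, the following is a minimal pure-Python sketch of the two-sided test for a 2 × 2 table. It is not the SAS procedure used in this study. The example table is the classic textbook "tea tasting" layout, not study data, and the function name is illustrative.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact p-value for a 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance so that ties with the observed table are kept.
    p_value = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(p_value, 1.0)


# Textbook example (NOT study data): rows are groups, columns are outcomes.
example_table = [[3, 1], [1, 3]]
print(round(fisher_exact_two_sided(example_table), 4))
```

In this study the same kind of comparison would apply to, for example, dichotomized AI-CAD score against upgrade status, although the cell counts for that table are not given in the text.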

Univariable logistic regression analysis using clinicopathological variables and mammographic variables was performed to assess predictive factors for invasive upgrade in DCIS. Subsequently, multivariable logistic regression analysis was performed to identify independent predictive mammographic variables after adjusting for clinicopathological variables. Variables with p-values less than 0.05 in the univariable logistic regression analysis were included in the multivariable logistic regression analysis. The predictive performance of the multivariable models was evaluated with the area under the receiver operating characteristic curve (AUROC). A subgroup analysis was conducted specifically on DCIS detected on mammography, referred to as “mammographically detected DCIS.” This analysis excluded cases that were mammographically occult, to simulate situations where supplemental screening with imaging modalities other than mammography is not common.
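
As a reminder of what the AUROC measures, the sketch below computes it from its pairwise definition: the probability that a randomly chosen upgraded lesion receives a higher predicted value than a randomly chosen pure DCIS lesion, with ties counted as one half. The scores are hypothetical and the function name is illustrative; the study itself derived AUROCs from multivariable model outputs in SAS.

```python
def auroc(positive_scores, negative_scores):
    """AUROC via pairwise comparison of positive and negative scores.

    Each (positive, negative) pair contributes 1 if the positive score is
    higher, 0.5 if the two are tied, and 0 otherwise.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical predicted values (NOT study data).
upgraded = [92, 80, 40]   # lesions upgraded to invasive cancer
pure_dcis = [70, 30, 40]  # lesions confirmed as pure DCIS
print(round(auroc(upgraded, pure_dcis), 3))
```

An AUROC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.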

All statistical analyses were performed using SAS (version 9.4, SAS Inc.). p-values less than 0.05 were considered statistically significant.

Results

In total, 440 DCIS diagnosed via percutaneous biopsy in 420 women were included in this study. Three hundred thirty-three were diagnosed using US-guided core needle biopsy, and 107 through VAB. Twenty women were diagnosed with DCIS in both breasts, and these lesions were all included in our analysis as AI-CAD for mammography provides a per-breast analysis. Within the VAB group, 37 patients were diagnosed via stereotactic biopsy (mammogram-guided VAB), while 70 were diagnosed using US-guided VAB. None of the enrolled patients underwent biopsy under MRI guidance. Of the 440 lesions that were diagnosed as DCIS by percutaneous biopsy, 117 (26.6%) lesions were upgraded to invasive disease after surgery (Fig. 1). Breast US is commonly employed as a supplemental imaging tool for screening breast cancer in Korea, and a significant proportion of our population (22.5%, 99 of 440) had DCIS detected on supplemental screening US. In the subgroup analysis of mammographically detected DCIS (n = 341), 103 (30.2%) lesions were upgraded to invasive disease after surgery. The mean patient age was 52.8 years (range, 28–85 years).
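
The proportions quoted above follow directly from the reported counts, as the short sketch below shows. The upgrade rate among mammographically occult lesions is derived here by subtraction (14 of 99) and is not a figure reported in the text; the variable names are illustrative.

```python
def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# Counts reported in the text.
total_lesions, total_upgraded = 440, 117
detected_lesions, detected_upgraded = 341, 103

# Derived by subtraction (not reported directly): mammographically occult lesions.
occult_lesions = total_lesions - detected_lesions
occult_upgraded = total_upgraded - detected_upgraded

rates = {
    "upgrade rate, all DCIS": pct(total_upgraded, total_lesions),
    "upgrade rate, mammographically detected": pct(detected_upgraded, detected_lesions),
    "share detected on supplemental US": pct(occult_lesions, total_lesions),
    "upgrade rate, mammographically occult (derived)": pct(occult_upgraded, occult_lesions),
}

for label, value in rates.items():
    print(f"{label}: {value}%")
```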

Table 1 compares clinicopathological factors between pure DCIS and DCIS with invasive upgrade. Unilateral DCIS at diagnosis, presence of symptoms, core needle biopsy as the biopsy method, larger tumor size, intermediate or high nuclear grade, and presence of comedonecrosis were all significantly associated with DCIS with invasive upgrade (all p < 0.05). Similar findings were seen in the subgroup of mammographically detected DCIS, except that the presence of bilateral DCIS and comedonecrosis did not show significant differences between pure DCIS and DCIS with invasive upgrade (p = 0.138 and 0.086, respectively).

Table 1 Comparison of clinicopathological factors between pure DCIS and DCIS with invasive upgrade

Table 2 compares mammographic factors between pure DCIS and DCIS with invasive upgrade. Significantly higher rates of combined features, BI-RADS 4C and 5 assessments, higher median AI-CAD scores, and higher rates of AI-CAD scores ≥ 50% and ≥ 75% were seen in DCIS with invasive upgrade (all p < 0.001) for both the total DCIS and mammographically detected DCIS.

Table 2 Comparison of mammographic factors between pure DCIS and DCIS with invasive upgrade

Prediction of upgrade to invasive carcinoma

For the total 440 DCIS, univariable logistic regression analysis demonstrated that clinicopathological variables including unilateral DCIS at diagnosis, presence of related symptoms, core needle biopsy as the biopsy method, larger tumor size, intermediate or high nuclear grade, and presence of comedonecrosis were predictors of invasive upgrade (all p < 0.05). In the subgroup of mammographically detected lesions (n = 341), presence of related symptoms, biopsy method, and nuclear grade were significantly associated with invasive upgrade (Additional file 1: Table S1).

For mammographic variables, combined features, higher BI-RADS assessments, and higher AI-CAD scores, either as raw numbers or in grades, were significantly associated with invasive upgrade (all p < 0.05) in both the total DCIS and mammographically detected DCIS (Additional file 1: Table S1).

Table 3 summarizes the multivariable logistic regression analysis results in the total DCIS group. After adjusting for clinicopathological variables, mammographic variables of combined features (odds ratio (OR): 2.225, 95% confidence interval (CI): 1.068–4.633, p = 0.033), BI-RADS 4C and 5 assessments (OR: 2.473, 95% CI: 1.135–5.390, p = 0.023 and OR: 5.190, 95% CI: 2.175–12.384, p < 0.001, respectively), higher AI-CAD score (OR: 1.009, 95% CI: 1.003–1.016, p = 0.007), AI-CAD score ≥ 50% (OR: 1.960, 95% CI: 1.130–3.399, p = 0.017), and AI-CAD score ≥ 75% (OR: 2.306, 95% CI: 1.233–4.313, p = 0.009) were all independent predictors of invasive upgrade. Multivariable logistic regression analysis in mammographically detected DCIS showed that combined features (OR: 2.194, 95% CI: 1.057–4.557, p = 0.035) and higher AI-CAD score (OR: 1.008, 95% CI: 1.000–1.017, p = 0.047) were independent predictors of invasive upgrade (Table 4). However, the final BI-RADS assessment, AI-CAD score ≥ 50%, and AI-CAD score ≥ 75% were not independent predictors of invasive upgrade in mammographically detected DCIS.
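
Two small calculations may help in reading these odds ratios. The sketch below assumes (1) that the reported intervals are Wald-type intervals, symmetric on the log-odds scale, and (2) that the OR for the raw AI-CAD score is expressed per one percentage point of score. Neither is stated explicitly in the text, and the function names are illustrative.

```python
from math import erfc, exp, log, sqrt

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_p_from_ci(odds_ratio, ci_low, ci_high):
    """Approximate two-sided p-value implied by an OR and its 95% CI.

    Assumption: the CI is a Wald interval, i.e. exp(beta +/- Z_95 * SE).
    """
    beta = log(odds_ratio)
    standard_error = (log(ci_high) - log(ci_low)) / (2 * Z_95)
    z = beta / standard_error
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal.
    return erfc(abs(z) / sqrt(2))


def rescale_or(or_per_unit, units):
    """Odds ratio for a change of `units` in a continuous predictor."""
    return exp(units * log(or_per_unit))


# Combined features in the total DCIS group: OR 2.225 (95% CI 1.068-4.633).
print(round(wald_p_from_ci(2.225, 1.068, 4.633), 3))

# Raw AI-CAD score in the total DCIS group: OR 1.009, assumed per 1 point.
print(round(rescale_or(1.009, 10), 3))  # implied OR for a 10-point higher score
```

Under these assumptions, the p-value recovered for combined features is close to the reported 0.033, and an OR of 1.009 per point corresponds to roughly 9% higher odds of upgrade for a score that is 10 points higher.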

Table 3 Multivariable analysis for predictors of invasive upgrade in the total 440 DCIS patients
Table 4 Multivariable analysis for predictors of invasive upgrade in the 341 mammographically detected DCIS patients

Discussion

In this study, we investigated the clinicopathologic and mammographic factors associated with DCIS upgrade to invasive cancer after surgery. Clinicopathologic factors such as biopsy method and nuclear grade on biopsy were significant predictors of invasive upgrade. Among mammographic factors, combined features, BI-RADS 4C and 5 assessments, and AI-CAD scores (raw numbers, ≥ 50%, and ≥ 75%) were independent predictors of invasive upgrade. In the subgroup analysis of mammographically detected DCIS, the AI-CAD score in raw numbers remained an independent predictor of invasive upgrade.

Our results showed that DCIS presenting as combined mass/asymmetry/distortion with calcifications had a significantly higher likelihood of being upgraded to invasive cancer. In the multivariable logistic regression analysis, OR was 2.225 and 2.194 for the total 440 DCIS and the mammographically detected DCIS, respectively. The association between DCIS presenting as soft tissue lesions combined with calcifications and invasive upgrade has been previously reported [3, 24]. In the meta-analysis by Brennan et al. [3], DCIS appearing as a mass with calcifications on mammography was significantly associated with invasive upgrade with an OR of 1.83. A higher level of suspicion for breast cancer indicated by a higher BI-RADS final assessment made by a radiologist was also associated with an increased rate of DCIS with invasive upgrade, as in prior studies [3, 25]. When suspicious mass/asymmetry/distortion is accompanied by suspicious calcifications, final assessments are commonly elevated. Therefore, the association between combined features and invasive upgrade of DCIS aligns with the association between higher BI-RADS assessment and DCIS upgrade.

Radiologists’ evaluation of abnormal mammographic findings has inherent limitations due to subjectivity and low reproducibility [26]. Notable disagreements have been observed among radiologists with various levels of experience when using BI-RADS descriptors and assessments for mammographic interpretation [27, 28]. In this respect, image analysis data provided by AI-CAD can be used not only to assist image interpretation but also as objective imaging biomarkers. We therefore evaluated the analysis data of a commercially available AI-CAD system designed for mammographic interpretation to see whether the information could be used as a biomarker to predict invasive upgrade in DCIS diagnosed by percutaneous biopsy. Our results showed that the AI-CAD score could potentially predict invasive upgrade in DCIS: higher AI-CAD score (OR 1.009), AI-CAD score ≥ 50% (OR 1.960), and AI-CAD score ≥ 75% (OR 2.306) were independent predictors of invasive upgrade in the total DCIS, and a higher AI-CAD score was an independent predictor of invasive upgrade (OR 1.008) in mammographically detected DCIS. Based on our results, DCIS with higher AI-CAD scores, especially those in the highest grade of ≥ 75%, can be considered at higher risk of having invasive components (Figs. 2 and 3).

Fig. 2 A 49-year-old female patient with pure ductal carcinoma in situ. The patient was referred to our hospital due to a screening-detected calcification in the left lower medial breast. The calcification was assessed as BI-RADS category 4B by the breast radiologist (A) and the AI-CAD score was 42% (B). A stereotactic biopsy was performed, confirming ductal carcinoma in situ. Following partial mastectomy, the final pathology revealed pure ductal carcinoma in situ with intermediate nuclear grade and the presence of comedonecrosis

Fig. 3 A 37-year-old female patient with ductal carcinoma in situ with invasive upgrade. The patient was referred to our hospital due to a screening-detected calcification in the right upper outer and upper central breast. The calcification was assessed as BI-RADS category 4B by the breast radiologist (A) and the AI-CAD score was 92% (B). A stereotactic biopsy was performed, confirming ductal carcinoma in situ. However, after nipple-sparing mastectomy, the final pathology revealed invasive ductal carcinoma. No metastatic lymph nodes were found on sentinel lymph node biopsy

When combining clinicopathologic variables with AI-CAD scores to construct a predictive model for invasive upgrade, we achieved acceptable diagnostic performance with an AUROC of 0.699–0.703 for total DCIS (Table 3) and an AUROC of 0.677–0.688 for mammographically detected DCIS (Table 4). These results are comparable to the AUROCs of 0.71 (95% CI, 0.67–0.75) and 0.70 (95% CI, 0.68–0.73) reported in prior studies, where algorithms predicting invasive upgrade were developed by applying CNN to the mammography images of biopsy-proven DCIS patients [29, 30]. Notably, our study utilized a commercially available AI algorithm designed for breast cancer detection, rather than an AI algorithm specifically trained to predict invasive upgrade. In this context, comparable outcomes can be considered highly promising, as this achievement suggests that pre-existing AI technologies can be used to predict invasive upgrade.

Predicting DCIS upgrade before surgery is crucial for surgical decision-making, as the overall incidence of axillary lymph node metastasis is around 5% but rises to 20% for preoperatively underestimated lesions [31,32,33]. This highlights the need to perform sentinel lymph node biopsy in patients with a strong preoperative prediction of upgrade, to avoid additional completion surgery. Moreover, given the current trend towards less aggressive treatment for low-risk DCIS including the omission of surgery [8,9,10, 34], accurately identifying DCIS with invasive components becomes vital in selecting the right patients for active monitoring. Our results show that in addition to clinicopathological predictors such as biopsy method or nuclear grade, imaging features evaluated either by radiologists or by AI-CAD can be used as biomarkers for predicting invasive upgrade. We look forward to future studies that can validate our results.

Interestingly, the imaging predictors that significantly predicted invasive upgrade in total DCIS (BI-RADS assessment by the radiologist, AI-CAD score ≥ 50%, and AI-CAD score ≥ 75%) lost their significance within the mammographically detected DCIS subgroup. Approximately 78.6% (268 of 341) of mammographically detected DCIS in our study presented with calcifications, and this may have contributed to this result. Calcifications representative of DCIS have a broad range of imaging features, from round or amorphous, benign-appearing calcifications to suspicious fine pleomorphic or fine linear branching calcifications, so the BI-RADS assessments may have been evenly distributed among the BI-RADS 4 subcategories. Radiologists encounter difficulties assessing abnormalities with ambiguous, overlapping imaging features, and the same seems to hold for AI-CAD, as dichotomized or graded AI-CAD scores could not independently predict upgrade in mammographically detected DCIS. Nonetheless, raw numerical AI-CAD scores remained significant even in mammographically detected DCIS, supporting our claim that quantitative AI-CAD assessment has the potential to provide more substantial predictive value than radiologists’ assessments when identifying invasive upgrade within DCIS diagnosed through percutaneous biopsy.

There are several limitations to our study. First, this was a retrospective study performed in a single institution, and therefore, selection bias was inevitable. Also, the women included in this study were all Asian, and most had mammographically dense breasts (88.4% with breasts assessed as parenchymal density grade C or D). Second, mammographic features of DCIS and the BI-RADS final assessments were determined by one experienced breast radiologist. Results may differ when readers with varying levels of experience are involved in mammography interpretation. Third, we only used one commercially available AI-CAD system, and our results cannot be generalized to other AI platforms.

Conclusion

In conclusion, when applying AI-CAD for mammography, the AI-CAD score was an independent predictor of invasive upgrade in DCIS. Higher AI-CAD scores, especially those in the highest grade of ≥ 75%, can be used as an objective imaging biomarker to predict invasive upgrade in DCIS diagnosed with percutaneous biopsy.