Introduction

Screening mammograms frequently reveal the presence of calcifications, a majority of which are associated with benign changes in breast tissue. However, a minority of these calcifications may be linked to invasive breast cancer (IBC) or its non-obligatory precursor, ductal carcinoma in situ (DCIS) [1].

The introduction of systematic mammographic screening [1,2,3] has significantly increased DCIS detection, as up to 90% of DCIS lesions are detected due to calcifications on mammography [4]. Nevertheless, the anticipated decrease in mortality due to increased DCIS treatment has not been as significant as expected [1], supporting the corpus of evidence that many DCIS lesions would remain harmless if left untreated [5]. Currently, it is not possible to differentiate accurately between calcifications associated with DCIS that will progress to IBC, i.e., high-risk DCIS, and those that remain indolent, i.e., low-risk DCIS. Consequently, all DCIS cases receive treatment, resulting in overtreatment of low-risk DCIS [6,7,8,9,10].

Given the heterogeneity of DCIS in terms of morphology, biology, genetics, and outcome [11], mammographic calcification patterns and distributions may reflect the disease’s heterogeneity.

The American College of Radiology’s Breast Imaging-Reporting and Data System (BI-RADS) standardizes these patterns, classifying calcifications with a suspicious morphology into four categories: (1) amorphous, (2) coarse heterogeneous, (3) fine pleomorphic, and (4) fine linear or fine linear branching [12]. Not all mammographically observed calcifications are related to DCIS or invasive breast cancer. Amorphous calcifications are associated with malignancy in only about 20% of cases, whereas the positive predictive value of fine linear calcifications is higher (70–80%) [13, 14]. In DCIS, calcifications are most commonly linear, linear branching, and fine pleomorphic, in a linear distribution [15].

Investigating the association between calcification morphology descriptors (CMDs) and specific clinicopathological factors of DCIS lesions, such as grade, receptor-based surrogate subtypes based on hormone receptors and HER2, and the risk of local DCIS or IBC recurrence could provide clinicians with valuable insights into the likelihood of DCIS progression and lesion aggressiveness. Accurately determined CMDs may enable clinicians to make better-informed decisions regarding the necessity of further diagnostic procedures, such as biopsies. However, despite numerous studies, the prognostic value of mammographic calcification descriptors remains unclear [16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48].

To address this knowledge gap, we conducted a systematic review and meta-analysis to assess the association between mammographic CMDs and clinicopathological factors in women with DCIS, while evaluating the overall quality of evidence and identifying sources of bias.

Materials and methods

This systematic review is reported according to the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) [49]. The study protocol was registered under study ID CRD42022341599 in PROSPERO [50], an international prospective register of systematic reviews.

A comprehensive literature search was performed in Medline via Ovid, Embase.com, and Web of Science Core Collection (SCI-expanded, SSCI, A&HCI, ESCI) according to Bramer et al. [51]. Non-peer-reviewed sources such as Google and other grey literature sources were excluded from the search. The literature search focused on English-language articles published from 2000 (when BI-RADS was implemented in the Netherlands) and was conducted on January 25, 2022. Schematically, the search was as follows: (calcinosis AND (mammography OR BI-RADS)) AND (DCIS OR breast cancer). Both thesaurus terms (in Medline via Ovid and Embase.com) and free text terms were used where applicable. Conference abstracts were excluded based on the publication type metadata. The search strategy did not employ any additional filters or draw from previous searches. The scope and syntax of the search were verified by a second information specialist. Supplementary Table S1 provides a comprehensive description of the search strategy.

The search results were imported into EndNote 20 [52] to remove duplicate records and retrieve articles. Duplicates were removed using the Bramer method, a specialized technique designed to increase accuracy when compared to automatic deduplication by a reference management system [53]. Initial screening of titles and abstracts was conducted by both M.M.L. and S.D. using the Rayyan app [54]. Studies were considered eligible if they detailed the mammographic CMDs related to clinicopathological factors such as grade, receptor-based surrogate subtypes based on hormone receptors and HER2, and risk of local DCIS or IBC recurrence. M.M.L. and S.D. independently screened the remaining full-text articles for inclusion, resolving discrepancies through group discussion with team members. Exclusion criteria were documented as follows: (1) non-original data (e.g., reviews, editorials, and guidelines), (2) non-English articles, (3) preclinical studies (e.g., animal or in vitro studies), (4) case reports and very small studies (i.e., studies including fewer than 20 DCIS patients with calcifications), (5) arterial calcifications, (6) breast imaging techniques other than mammography or experimental breast imaging modalities (e.g., ultrasound), and (7) calcification morphology not described. Subsequently, the reference lists of included articles were examined for any relevant articles not identified by the search. Finally, articles with overlapping patient data were excluded, retaining only the largest series.

No other sources were searched, and no other methods were used.

Data extraction and definitions

After the selection process, a custom form was used to extract several study characteristics of the included articles. These characteristics included study details (reference, country, study design), patient and lesion characteristics (setting of recruitment, single- or multi-center, follow-up time in years if applicable, number of DCIS or IBC lesions presenting as calcifications (only), age in years, histopathological size of the lesion), outcome measurement details (type of assay), exposure measurement details (imaging system, method of detection, number of (blinded) readers, calcification classification system), and necessary information for quality assessment.

The numerical results were documented by cross-tabulating each CMD’s absolute numbers concerning clinicopathological factors.
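The cross-tabulation described above yields, for each CMD group and clinicopathological factor, a 2×2 table from which a study-level odds ratio can be derived. The following is a minimal sketch in Python, not the authors' extraction code: the counts are hypothetical, and the log-scale (Woolf) confidence interval and the 0.5 continuity correction are assumptions that the text does not specify.

```python
import math

def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio and log-scale (Woolf) 95% CI from a 2x2 table.

    a: CMD group of interest, factor present
    b: CMD group of interest, factor absent
    c: reference CMD group, factor present
    d: reference CMD group, factor absent
    """
    if min(a, b, c, d) == 0:
        # Assumption: add 0.5 to every cell when any cell is empty
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper

# Hypothetical counts: high-risk vs. low-risk CMDs, high grade yes/no
or_, lower, upper = odds_ratio(30, 10, 20, 40)
print(f"OR = {or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Working on the log scale keeps the interval asymmetric around the odds ratio, which matches how the pooled estimates are reported later in the text.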

Authors were not contacted for missing data. Missing information was noted as “not specified” or “not available”.

Quality assessment

The Quality in Prognostic Studies (QUIPS) tool [55] was used to assess the risk of bias in prognostic and non-prognostic outcomes, covering six domains: study participation, study attrition, exposure measurement, outcome measurement, study confounding, and statistical analysis and reporting. Each domain was evaluated using three to six related questions. Supplementary Table S2 provides more details on the assessment tool. Each study was rated as having a low, moderate, or high risk of bias for each domain, with low risk scored as one, moderate risk as two, and high risk as three. The study attrition domain was only rated for prognostic studies that involved follow-up. Some studies reported both prognostic (e.g., recurrence and progression to IBC) and non-prognostic outcomes, with the QUIPS tool applied separately for these outcomes.

To ensure consistency in the interpretation of the QUIPS criteria, the first five papers’ results were compared between the two reviewers. Then, M.M.L. and S.D. independently performed quality assessment of the included studies. A low-high discrepancy was defined as a difference in risk of bias rating between the two reviewers for a specific domain. When the reviewers did not reach a consensus in case of a low-high discrepancy, they sought the help of a third reviewer (A.W.B.D.), to reach a consensus decision.

The average risk of bias per QUIPS domain was compared between studies published between 2000 and 2010 and those published between 2010 and 2022 using a t-test, with high risk of bias scored as three, moderate risk as two, and low risk as one.
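To illustrate the comparison of average risk-of-bias scores between the two publication periods, the sketch below computes a two-sample t statistic on the 1 to 3 scores. The ratings are hypothetical, and the use of Welch's unequal-variance form is an assumption, since the text only states that a t-test was used. The p-value is omitted because the Python standard library has no t-distribution.

```python
import math
from statistics import mean, variance

SCORE = {"low": 1, "moderate": 2, "high": 3}

def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

# Hypothetical ratings for one QUIPS domain in the two periods
earlier = [SCORE[r] for r in ["high", "high", "moderate", "high", "moderate"]]
later = [SCORE[r] for r in ["low", "moderate", "low", "moderate", "low", "low"]]

t, df = welch_t(earlier, later)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A positive t here means a higher average score, and therefore a higher risk of bias, in the earlier period.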

Certainty of evidence

The certainty of evidence (CoE) for studies included in the meta-analysis was assessed using a modified GRADE (Grading of Recommendations, Assessment, Development, and Evaluations) instrument [56]. Since all included studies were observational, the initial overall quality of evidence was graded as low with a score of 0. Subsequently, the overall quality rating was adjusted based on the following domains: risk of bias, inconsistency, indirectness, imprecision, and publication bias. A domain rated as having a “moderate to low quality of evidence” was downgraded by one point, while a domain rated as having a “very low quality of evidence” was downgraded by two points. Supplementary Table S3 contains the reasons for downgrading.

The risk of bias was determined using the average QUIPS tool score. Inconsistency (heterogeneity) was determined through I2 values and the p-value of Cochran’s Q-test (P(Q)). I2 values above 50% and a significant Q-test indicated high heterogeneity between the studies. Indirectness was assessed by examining differences between studies in population characteristics, exposure, and outcome measurements. Imprecision was evaluated based on the spread of point estimates, the width of the 95% confidence intervals, and the overlap of confidence intervals. Publication bias was assessed using Egger’s test. To account for the test’s lack of power without a representative number of studies [57], it was applied only to outcomes with at least 10 studies. The GRADE score was downgraded by one point for outcomes whose publication bias could not be assessed.
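The heterogeneity measures used for the inconsistency domain can be sketched as follows. This is an illustrative Python implementation of the standard definitions of Cochran's Q and the I2 statistic, applied to hypothetical study-level log odds ratios and variances. It is not the authors' code, and the p-value of Q (a chi-square tail probability with k − 1 degrees of freedom) is left out because the standard library does not provide it.

```python
def cochran_q_i2(effects, variances):
    """Cochran's Q and the I2 statistic (in percent).

    effects:   study-level log odds ratios
    variances: their within-study variances
    """
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I2 is the share of total variation beyond chance, truncated at zero
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2

# Hypothetical log odds ratios and variances from four studies
log_ors = [0.2, 1.1, 1.6, 0.5]
variances = [0.10, 0.15, 0.20, 0.12]

q, i2 = cochran_q_i2(log_ors, variances)
print(f"Q = {q:.2f}, I2 = {i2:.0f}%")
```

Under the rule described above, an I2 value above 50% together with a significant Q-test would be read as high heterogeneity.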

Data synthesis

The total number of studies assessing the same clinicopathological factor in relation to CMDs was cross-tabulated along with the number of studies finding a significant association with CMDs.

The associations between CMDs and clinicopathological factors were synthesized in pooled odds ratios (pORs) after grouping CMDs into three risk categories. CMDs were categorized into a low-risk, intermediate-risk, and high-risk group for meta-analysis since not all studies used BI-RADS descriptors or analyzed different CMDs separately. The low-risk group included calcifications with punctate or amorphous morphology and served as the reference group. The intermediate-risk group consisted of calcifications with a coarse heterogeneous or (fine) pleomorphic morphology, while the high-risk group included fine linear calcifications, based on the difference in positive predictive value for the presence of DCIS. To categorize CMDs into risk groups, non-BI-RADS descriptors were aligned with BI-RADS descriptors (Supplementary Table S4) based on similar descriptions (e.g., “linear” and “casting-type”, as both describe calcifications arranged in a line).
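The grouping of descriptors into risk categories amounts to a lookup table. The sketch below covers only the descriptors named in the text. Placing "fine linear branching" in the high-risk group is an assumption based on the BI-RADS category it shares with fine linear calcifications, and the full alignment of non-BI-RADS terms (Supplementary Table S4) is not reproduced here.

```python
# Descriptor-to-risk-group lookup; keys are lower-case descriptor names
RISK_GROUP = {
    "punctate": "low",
    "amorphous": "low",
    "coarse heterogeneous": "intermediate",
    "fine pleomorphic": "intermediate",
    "pleomorphic": "intermediate",
    "fine linear": "high",
    "fine linear branching": "high",  # assumption, see lead-in
    # Non-BI-RADS terms aligned by similar description (example from the text)
    "linear": "high",
    "casting-type": "high",
}

def risk_group(descriptor):
    """Return the risk group for a descriptor, or None if it is not mapped."""
    return RISK_GROUP.get(descriptor.strip().lower())

for name in ["Amorphous", "fine pleomorphic", "Casting-type", "round"]:
    print(name, "->", risk_group(name))
```

Returning None for unmapped descriptors makes it explicit which terms would still need a manual alignment decision.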

Random-effects models were employed to calculate the pORs for CMDs and clinicopathological factors, allowing for heterogeneity between studies. The Mantel-Haenszel method was used to combine binary effect estimates (ORs) across studies. The Paule-Mandel estimator was used to model between-study variance (tau2) in calculating the pORs, as this estimator is suitable for studies with small sample sizes and binary outcomes. A forest plot was used to visually represent the pOR for each factor and risk group.

Studies were excluded from the meta-analysis if they did not report effect sizes or used categories (e.g., effect size measures, follow-up period, definitions or methods of measurement for exposure or outcome, adjustment factors, and analytical methods) that were not comparable with the other studies assessing that specific clinicopathological factor.

Subgroup analyses were not feasible due to the limited number of studies per clinicopathological factor and insufficient information on relevant subgroups (e.g., method of detection, calcification classification systems, and number of readers).

All analyses were performed with R version 4.2.2 (R Foundation for Statistical Computing, Vienna, Austria) using the meta R package (version 6.1). Pooled estimates were reported in combination with 95% confidence intervals (CI), and two-sided p-values of < 0.05 were used to determine statistical significance.

Results

Through an extensive search that included reference cross-checking of relevant articles, a total of 4715 unique articles were collected. An initial screening, based only on studies’ titles, led to the exclusion of 3946 articles. A subsequent evaluation of the remaining 769 articles, based on their abstracts, led to the removal of 666 articles. Among these, about 44% were excluded because they were not in English, presented non-original data, or were classified as case reports. Further analysis of the remaining 103 articles using full-text assessment (Supplementary Table S5) resulted in the exclusion of another 74 articles. The main reason for exclusion at this stage was the lack of an adequate description of calcification morphology.

Ultimately, 29 studies met the strict inclusion criteria, which covered CMDs and associated clinicopathological factors in patients diagnosed with DCIS. The results section is further organized in three sections: (i) study characteristics, presenting an overview of the included studies and patient populations; (ii) results of synthesis, consolidating the extracted outcomes from the included studies; and (iii) assessment of bias using the QUIPS tool, evaluating the risk of bias across the selected studies.

Figure 1 outlines the approach used for the systematic literature search and subsequent study selection process.

Fig. 1

Overview of the Medline, EMBASE, and Web of Science literature search and selection process of eligible articles. The searches were performed on January 25, 2022. Note that 4713 articles were identified, of which 29 met our inclusion criteria. Abbreviations: DCIS, ductal carcinoma in situ; IBC, invasive breast cancer

Study characteristics

In all 29 studies, the data were collected retrospectively from hospital registries, national registries, or clinical trials. Cohort (n = 27), case–control (n = 1), and case-cohort within a randomized controlled trial (n = 1) designs were used. Twenty were single-center studies, and nine involved multiple centers. While 20 studies exclusively studied DCIS, nine studies included patients with both DCIS and IBC. The number of DCIS patients with calcifications per study ranged from 32 to 1783. Fifteen studies described calcification morphology according to the BI-RADS system, while the remaining 14 studies used non-BI-RADS descriptors. In ten studies, the lesions were described as only screen-detected, while in another ten studies, they were reported to be both screen-detected and non-screen-detected. The remaining studies did not specify the method of lesion detection. Thirteen studies specified using mammograms with calcifications only, without other mammographic abnormalities.

Table 1 shows the characteristics of the selected studies between 2000 and 2022.

Table 1 Characteristics of studies reporting on mammographic morphology of calcifications associated with clinicopathological factors

Reported clinicopathological factors in relation to CMDs

A total of 29 studies investigated 17 distinct factors concerning CMDs (Table 2), with 28 studies assessing non-prognostic outcomes, including high grade (n = 16), (micro)invasion (n = 8), (comedo)necrosis (n = 7), HER2 overexpression (n = 6), ER positivity (n = 6), age (n = 3), Ki67 or proliferation (n = 2), histological size (n = 2), neoductgenesis (n = 2), calcification distribution (n = 2), margin status (n = 1), comedocarcinoma (n = 1), multicentricity (n = 1), tenascin-C (n = 1), and Oncotype DX score (n = 1). Furthermore, five studies assessed prognostic outcomes including recurrence (n = 4) and DCIS progression to IBC (n = 1).

Table 2 Overview of clinicopathological factors that were assessed in the studies

Data synthesis and meta-analysis

Out of the 17 clinicopathological factors reported across 29 studies, 14 factors were significantly associated with CMDs in at least one study (Table 2). A meta-analysis was conducted for five clinicopathological factors, deemed sufficiently homogeneous across 20 studies (Fig. 2): high grade (n = 11), HER2 overexpression (n = 4), ER positivity (n = 4), (comedo)necrosis (n = 5), and the presence of (micro)invasion (n = 5). The meta-analysis shows the aggregated results for low-, intermediate-, and high-risk CMDs concerning the clinicopathological factors.

Fig. 2

The meta-analysis results for each clinicopathological factor in a forest plot. For the calcification morphology descriptor (CMD) risk groups, the pooled odds ratios (pORs), 95% confidence intervals and associated p-values are shown. Furthermore, associated heterogeneity measures (I2, P(Q)) and publication bias (Egger’s p-value), as well as certainty of evidence summarized in the GRADE score are given. The low-risk CMDs served as a reference. Per CMD risk-group, details on the studies (number, number of calcified lesions, and number of cases) are given

High-risk CMDs demonstrated a significant association with four clinicopathological factors including high grade (pOR, 4.92; 95% CI, 2.64–9.17), (comedo)necrosis (pOR, 3.46; 95% CI, 1.29–9.30), (micro)invasion (pOR, 1.53; 95% CI, 1.03–2.27), and ER positivity (pOR, 0.33; 95% CI, 0.12–0.89). High-risk CMDs were negatively associated with ER positivity, indicating a reduced incidence of ER positivity in high-risk versus low-risk CMDs.

Intermediate-risk CMDs were significantly associated with high grade (pOR, 2.07; 95% CI, 1.44–2.96) and (comedo)necrosis (pOR, 2.58; 95% CI, 1.87–3.54), while showing an increased pOR of 1.66 (95% CI, 0.92–2.99) with a p-value of 0.09 for (micro)invasion.
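The relationship between a reported confidence interval and its p-value can be checked with a short calculation. The sketch below recovers the standard error of the log odds ratio from the 95% CI and derives a two-sided p-value under a normal approximation. This approximation is an assumption: the authors' software may have used a different interval method, so the result is only an approximate cross-check. Applied to the (micro)invasion estimate for intermediate-risk CMDs, it gives a p-value of roughly 0.09, in line with the reported value.

```python
import math

def p_from_ci(or_, lower, upper, z=1.96):
    """Two-sided p-value implied by an OR and its 95% CI (normal approximation)."""
    se = (math.log(upper) - math.log(lower)) / (2 * z)  # SE of log(OR)
    z_stat = math.log(or_) / se
    return math.erfc(abs(z_stat) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

# Values reported in the text for (micro)invasion
p_intermediate = p_from_ci(1.66, 0.92, 2.99)  # intermediate-risk CMDs
p_high = p_from_ci(1.53, 1.03, 2.27)          # high-risk CMDs
print(f"intermediate-risk: p = {p_intermediate:.3f}")
print(f"high-risk:         p = {p_high:.3f}")
```

A confidence interval that excludes one, as for the high-risk group, corresponds to a p-value below 0.05 under this approximation.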

Heterogeneity measures I2 and P(Q) revealed inconsistencies in the estimates reported in the included studies concerning high grade (I2, P(Q): 54%, p = 0.002 for high-risk and 47%, p = 0.04 for intermediate-risk CMDs), ER positivity (I2, P(Q): 49%, p = 0.12 for high-risk CMDs), (comedo)necrosis (I2, P(Q): 52%, p = 0.08 for high-risk CMDs), and invasion (I2, P(Q): 55%, p = 0.06 for intermediate-risk CMDs).

Neither high-risk CMDs (pOR, 1.80; 95% CI, 0.28–11.46) nor intermediate-risk CMDs (pOR, 0.72; 95% CI, 0.19–2.82) were significantly associated with HER2. One contributing study by Zhou et al. [47] reported odds ratios below one, indicating a reduced risk. A considerable discrepancy existed between odds ratios calculated from the different included studies, reflected in the heterogeneity measures I2 > 83% and P(Q) ≤ 0.001.

Egger’s test was not significant for high grade, indicating no evidence of publication bias, while publication bias could not be determined for the other outcomes due to the small number of studies.

Certainty of evidence

According to the GRADE approach, the certainty of evidence for all outcomes can be rated as low (Supplementary Table S3), as all studies were observational. The calculated GRADE score denoted the level of insufficient evidence or bias across five domains (risk of bias according to the QUIPS tool, heterogeneity, indirectness, imprecision, and publication bias). The highest GRADE score of −11 (high-risk and intermediate-risk CMDs combined) was identified for ER positivity and high grade, indicating the highest level of evidence. The GRADE scores for the other outcomes were as follows: invasion (−13), (comedo)necrosis (−14), and HER2 overexpression (−16). The next section evaluates the risk of bias across the selected studies in detail using the QUIPS tool.

Risk of bias per QUIPS domain

To further understand the reliability of the included studies, a thorough assessment of bias was conducted using the QUIPS tool. The risk of bias was assessed across five study domains, namely study participation, exposure measurement, outcome measurement, study confounding, and statistical analysis and reporting. For studies measuring prognostic outcomes a sixth domain, study attrition, was also evaluated (Fig. 3).

Fig. 3

Risk of bias per QUIPS domain for each individual study with (a) non-prognostic outcome(s) and (b) prognostic outcome(s)

Across the 29 studies, five out of 170 individually rated QUIPS domains exhibited a low-high discrepancy between the two reviewers in their rating of bias. Following consultation with the third reviewer (A.W.B.D.), these domains were assigned a moderate risk of bias rating. This suggests that the discrepancies in the assessment of bias using the QUIPS tool were limited.

The study participation domain revealed eight studies with a high risk of bias and 15 with a moderate risk in either prognostic or non-prognostic outcomes. The downgrading of studies was primarily attributed to small sample sizes and inadequate description of study groups, data collection criteria, and methods or reasons for missing data.

In the exposure domain, seven studies exhibited a high risk of bias, while 13 demonstrated a moderate risk. The downgrading mainly resulted from situations where only one reader determined the CMDs, or when crucial details were omitted, such as whether the readers were blinded to the outcome and how consensus was achieved between readers.

The outcome measurement domain indicated three studies with a high risk of bias and 11 studies with a moderate risk. High-risk studies were characterized by a severe lack of detail regarding the definition and method of measuring the outcome variable. Moderate-risk studies contained insufficient information on the measurement of outcome variables and, if applicable, blinding of reviewers.

With regard to the confounding domain, most individual studies did not adjust their results for potential confounders. Five studies were rated as having a high risk of bias in this domain and 18 as having a moderate risk of bias. High-risk studies failed to account for potential confounding through matching, stratification, or the initial assembly of comparable groups. Prognostic studies were rated as moderate or high risk when they did not adjust for treatment or age in their statistical analyses. Studies with a design that somewhat limited the risk of confounding were rated as moderate risk.

The statistical analysis and reporting domain predominantly displayed a low risk of bias. However, in thirteen studies, this domain was rated as moderate, because the analysis was not powerful enough to prove or disprove the hypothesis. This occasionally occurred for individual CMD groups, e.g., when chi-square tests were applied to small sample sizes.

In the study attrition domain, three prognostic studies were rated as having a moderate risk of bias because the follow-up or characteristics of women who completed the study and those who did not were not described.

Notably, the average risk of bias was significantly higher in the exposure measurement (p = 0.01) and confounding (p = 0.025) domains for studies published before 2010 compared to those published after 2010.

Discussion

Data synthesis and meta-analysis

This systematic review comprehensively synthesizes the existing literature examining the associations between calcification morphology descriptors (CMDs) and clinicopathological factors in women diagnosed with DCIS, with the aim of distinguishing high-risk from low-risk DCIS lesions based on CMDs, which may aid clinical decision making. A total of 29 studies were identified, evaluating 17 clinicopathological factors, of which five were deemed appropriate for meta-analysis.

The meta-analysis revealed a significant association between fine linear calcifications, i.e., the high-risk group, and features of aggressiveness including high grade, presence of (comedo)necrosis, and (micro)invasion. An inverse association was observed with ER positivity.

Intermediate-risk CMDs, i.e., coarse heterogeneous and fine pleomorphic calcifications, were significantly associated with high grade and (comedo)necrosis. The associations were generally similar to those of the fine linear calcifications, albeit weaker. The presence of calcifications in DCIS is thought to be due to active secretion of calcium into the ducts by (malignant) epithelial cells in non-comedocarcinoma and calcification of necrotic debris in comedocarcinoma [58]. The observed association between fine linear calcifications and the presence of (comedo)necrosis, high grade, and (micro)invasion might be attributed to the rapid growth and frequent cell death that occur within poorly differentiated DCIS, culminating in calcification deposition along the ductal structures and their linear appearance on mammography. Fine linear calcifications may thus be more strongly associated with aggressive malignancy, as their linear appearance suggests a duct lumen filled with calcified necrotic debris [58].

With respect to HER2 overexpression, a positive association with high-risk CMDs was found, whereas a negative association was identified with intermediate-risk CMDs; however, neither association was statistically significant. The results were inconsistent, due to one study [47] presenting contradictory findings compared to other studies [16, 17, 27] that reported on HER2. The study’s characteristics did not provide a clear explanation for this discrepancy, with the only apparent difference being the use of tissue microarrays rather than tissue resections. Nevertheless, a meta-analysis conducted in 2013 by Elias et al. [59] which aimed to identify imaging features of HER2 overexpression in multiple imaging modalities and included IBC lesions next to DCIS lesions, discovered a significant association between HER2 overexpression and high-risk CMDs on mammography. For intermediate-risk calcifications, they found a positive, non-significant association with pleomorphic calcifications and no association with coarse calcifications.

Considering the prior findings that ER-positive and HER2-negative (luminal) breast cancers are generally less aggressive than ER-negative and HER2-positive invasive breast cancers [60], the positive association of linear calcifications with HER2 overexpression and their negative association with ER positivity in our review were not unexpected. Further research is warranted to elucidate the role of receptor subtypes in the risk profile of calcifications and associated lesions.

Twelve other clinicopathological factors were examined in relation to CMDs, in addition to the ones that were suitable for inclusion in the meta-analysis, and several studies reported either significant or non-significant associations for these factors. Among these, the factors related to the risk of recurrence, including the Oncotype DX score, showed a significant association in two out of five studies. However, additional cohorts and standardized methods are required to validate the evidence on these factors.

Our comprehensive analysis suggests a potential association between CMDs and the aggressiveness of lesions, particularly in the progression from DCIS to IBC. This conclusion is supported by O’Grady et al. [61], and Tot et al. [62], who presented evidence from a selection of important clinicopathological factors and outcomes in their respective non-systematic reviews. However, it is essential to note that these are narrative reviews without described strategies to identify and mitigate reporting bias. The majority of the referenced studies were centered on IBC lesions, with some of them solely examining presence of calcifications or comparing specific calcification morphologies to their absence. Moreover, Tot et al. [62] grouped different calcifications into two main categories: those mostly occupying the ducts (including casting-type and skipping stone-like calcifications) and those predominantly involving the TDLUs (including crushed stone-like and powdery calcification). This resulted in less detailed information about different mammographic CMDs. To confirm the significance of CMDs as a prognostic biomarker, CMDs should be studied in terms of prognostic outcomes, such as survival or recurrence rates, next to clinicopathological factors.

Certainty of evidence and sources of bias

Using the GRADE approach and the QUIPS tool, we revealed uncertainty in the findings of the meta-analysis and identified the most common sources of bias in the relevant articles. The primary source of bias originated from the study participation domain due to small sample sizes and inadequately described study groups. The majority of the included studies utilized data from hospital or national registries, which could have influenced the reported estimates. Retrospective registry-based studies depend on the quality, size, and completeness of relevant variables and features of the registries used. Frequently, these variables were not mentioned or properly described.

The exposure domain was also frequently rated as moderate or high risk of bias because CMD determination was often conducted by a single reader, which is problematic because the assessment of these qualitative descriptors is prone to inter- and intra-observer variability. While radiologists strongly agree on the presence of calcifications, they agree to a lesser degree on the classification of the observed calcification morphology [63]. Further standardization of the descriptors is therefore essential for using CMDs in medical decision-making. Specifically, methods that can extract high-quality features from radiological images, such as radiomics and deep learning, hold the potential to accurately discriminate between calcifications associated with low- and high-risk DCIS [64,65,66]. In addition to AI and radiomics, other imaging modalities and image enhancement techniques (e.g., noise reduction and contrast manipulation) could be considered depending on accessibility, numbers, and costs, as mammography images suffer from low contrast and background noise, making breast cancer diagnosis challenging. Accurate prediction models could allow different calcification types to be assessed more reliably and objectively, overcoming the substantial inter-reader variability among radiologists. Ultimately, this may aid in the clinical management of lesions associated with such calcifications.

Bias due to inter-reader variability could also have occurred in the outcome domain for the clinicopathological factor grade, meaning that both radiomics and pathomics methods might improve risk stratification of calcifications.

Concerning the confounding domain, most studies reported on the morphology of calcifications in isolation, not considering other descriptors that can aid in further risk stratification and control for confounding factors such as distribution, size, and clinicopathological factors. Some studies also assessed distribution, as this is another calcification descriptor often used in clinical practice, but not in combination with morphology. Hence, the results from this review reflect univariable associations only, which can lead to biased estimates and incorrect conclusions if relevant covariates are omitted.

The studies published after 2010 had significantly lower risk of bias scores in the exposure and confounding domains as compared to the older studies from before 2010, indicating an improvement in study and evaluation methods for this research question.

Limitations and strengths

As with the majority of systematic reviews, the design of the current study is subject to potential limitations [67]. Systematic reviews employ a retrospective, observational research design, and as such are susceptible to systematic and random error. The majority of known errors in systematic reviews arise during the selection and reporting stages [68]. To mitigate the risk of these errors, an information specialist (S.M.) was consulted in advance to define all the steps and judgments in the systematic review process and to conduct the search of the articles. Furthermore, M.M.L. and S.D. piloted the screening, quality assessment, and data extraction process to improve accurate interpretation and discussed discrepancies between their results with A.W.B.D. or the whole team.

Studies with a high risk of bias according to the QUIPS tool were not excluded in our meta-analysis given that the risk of bias was relatively high for all studies, predominantly due to small sample sizes and inconsistencies in the evaluation and registration of calcifications. These biases have affected the reported pOR estimates, as well as the performance of tests of heterogeneity, publication bias, and other sample size effects summarized in the GRADE score for each clinicopathological factor and calcification group. For a more nuanced interpretation of the meta-analysis results, the ORs and QUIPS scores per domain for each study were reported.

Nonetheless, this is the first systematic review and meta-analysis to assess the association between mammographic CMDs and clinicopathological factors in women with DCIS. The primary strength of such a meta-analysis lies in its capacity to enhance the identification of associations and uncover the sources of heterogeneity between reported estimates across studies. Indeed, using the QUIPS tool, we identified the most frequently occurring biases in included studies by assessing the association between CMDs and clinicopathological factors in a standardized manner.

In conclusion, our meta-analysis demonstrates that specific mammographic calcification morphologies are related to clinicopathological factors associated with lesion aggressiveness in women with DCIS.

This systematic review also showed a high risk of bias and heterogeneity between studies. Therefore, these findings need to be verified through high-quality studies that use homogeneous cohorts and standardized, reliable calcification assessment systems. Future radiomics and deep learning studies may help to extract relevant calcification features that carry prognostic information in DCIS lesions and, ultimately, to distinguish between high-risk and low-risk DCIS lesions and reduce overtreatment of DCIS.