Introduction

Epidemiologic studies have consistently demonstrated that elevated mammographic density is a strong and independent risk factor for sporadic breast cancer, conferring relative risks of 4- to 5-fold when comparing women with high versus low mammographic density [1]. Although mammographic density has a strong heritable component [2]-[10], whether it is associated with hereditary breast cancer risk remains under debate [11],[12]. Up to half of all hereditary breast cancer cases can be attributed to autosomal dominant mutations in two genes, BRCA1 and BRCA2 [13]. Among women with BRCA1/2 mutations, nearly 50% may be expected to develop breast cancer by age 50 years [13]. The ability to identify high-risk patients through analysis of mammographic images could have clinically significant implications for breast cancer screening and prevention strategies.

Utilizing a computer-assisted method to characterize percent mammographic density (PMD), we have previously reported that mammographic density is not associated with BRCA1/2 mutation status [14], a finding consistent with those from prior studies [12],[15]-[18]. In contrast, Huo et al. and Li et al. used computerized radiographic texture analysis of a retro-areolar region-of-interest (ROI) to distinguish between mutation carriers and low-risk women; mutation carriers had a breast parenchymal texture pattern that was characterized as being coarse with low contrast [19],[20].

Radiographic texture analysis (RTA) has long been utilized in medical imaging research, but investigators have taken different approaches when using texture analysis of mammographic images [19]-[30]. Broadly, the extracted mammographic features are described as gray-level magnitude-based features, which describe variation of gray-value intensities and ignore spatial relationships (for example, percent density), and texture-based features, which characterize the higher-order statistics of the spatial radiographic patterns.

Multiple investigators have evaluated whether texture-based features capture a component of risk beyond that of mammographic density [19],[22]-[26],[31],[32], but only Huo et al. and Li et al. have suggested that this method might accurately classify subjects according to BRCA1/2 mutation status [19],[20]. These findings, though promising, were based on the analysis of 30 BRCA1/2 mutation carriers. This study represents replication and validation of their results in a larger, independent dataset.

Methods

Study populations and data collection

The study populations have been described previously [14]. Briefly, the NCI Clinical Genetics Branch Breast Imaging Study evaluated breast cancer screening modalities in women who were at high genetic risk of breast cancer. From 2001 to 2007, 200 women were enrolled in this study, including 170 women with proven deleterious BRCA1/2 mutations and 30 proven mutation-negative women from the same families. Participants were seen at the NIH Clinical Center (NCI Protocol #01-C-0009; NCT-00012415) and underwent a physical examination, nipple fluid aspiration, breast duct lavage, standard clinical four-view screening mammogram and breast magnetic resonance imaging (MRI), which were reviewed by the study radiologist (CKC). See prior reports for additional details related to study design [33],[34]. The NCI Institutional Review Board (IRB) approved the study, and all participants provided informed consent.

The NCI/National Naval Medical Center (NNMC) Susceptibility to Breast Cancer Study was a cross-sectional study of the association between mammographic density and genes involved in estrogen metabolism. From 2000 to 2006, 219 women with a documented personal history of breast cancer and 488 controls were enrolled. Participants were enrolled from the patient population at the NNMC and other referring institutions and the NIH Clinical Center (NNMC Protocol #NNMC.2000.0010; NCI Protocol #00-C-0079; NCT-00004565). Mammograms obtained within the year prior to enrollment were reviewed by two study radiologists (CKC and CEG). Study participants did not undergo BRCA1/2 mutation testing. Five-year Gail assessment [35] and Pedigree assessment tool (PAT) [36] scores were calculated for all controls. The PAT is a point-scoring system that uses family cancer history to identify women who are at high risk of hereditary breast cancer (that is, >10% risk of being a BRCA1/2 mutation carrier) [36]-[38]. A PAT score ≥8.0 has been associated with 100% sensitivity and 93% specificity for detecting mutation carriers, and a PAT score <8.0 has been associated with a negative predictive value of 100% [36]. For the current study, control subjects with low scores by both models were classified as having low risk of breast cancer; they were highly unlikely to be BRCA1/2 mutation carriers. The IRBs of the NNMC and NCI approved this study, and all participants provided written informed consent.

Participants from both studies completed self-administered questionnaires which captured demographic characteristics, current weight and height, medical and reproductive history, and personal and familial history of cancer. Questionnaire items were compared between studies, and common response categories were combined in order to create a harmonized analytic database.

Analytic sample

A flow diagram of the criteria utilized to derive the analytic sample of BRCA1/2 mutation carriers and non-carriers is depicted in Figure 1.

Figure 1

Flow diagram depicting the eligibility criteria used to derive the analytic sample of BRCA1/2 mutation carriers and non-carriers. PAT, Pedigree assessment tool.

The NCI Clinical Genetics Branch’s Breast Imaging Study

After excluding 22 women with prevalent breast cancer (11 BRCA1 carriers, 11 BRCA2 carriers), one BRCA1 carrier with prevalent ovarian cancer, and five women with missing mammographic density readings (three BRCA1 carriers, one BRCA2 carrier, and one non-carrier, whose mammograms were given to the patients for care in their home communities prior to being digitized), the final study population included 143 mutation carriers and 29 non-carriers (the latter from mutation-positive families) eligible for analysis. Of these, images from six mutation carriers and three non-carriers were deemed ineligible for analysis of computer-extracted texture features for various reasons (for example, breast area too small for ROI placement or image artifacts), resulting in a total of 137 mutation carriers (88 BRCA1- and 49 BRCA2-positive) and 26 non-carriers in our analytic sample.

The NCI/NNMC Susceptibility to Breast Cancer Study

For the purposes of this report, the analytic sample was restricted to controls with available mammographic density readings, who were determined to be at low-to-average breast cancer risk. After excluding controls with missing density readings (n = 226), 262 potentially eligible women remained. Of these, 153 women had a 5-year Gail score ≥1.67, three women were missing Gail scores, 15 women had PAT scores ≥8, and one woman had a personal history of skin cancer, type unspecified; these 172 women were excluded, resulting in 90 non-carriers eligible for analysis. Of these, images from 16 women were deemed ineligible for analysis of computer-extracted texture features for the reasons described above and were excluded, yielding 74 women at low-to-average risk of breast cancer for our analytic sample. Medians (ranges) for their maternal PAT, paternal PAT and 5-year Gail scores were 0 (0, 7), 0 (0, 5), and 1.2 (0.3, 1.6), respectively. Given the rarity of BRCA1/2 mutations in the general population, and the low PAT scores, these 74 women were assumed to be mutation-negative. For the sake of simplicity, combining these women with the 26 known mutation-negative subjects from the Breast Imaging Study, we use the term “non-carriers” in this report to describe these 100 women.

Assessment of mammographic density

Analog mammographic films from both studies were digitized at 0.095 mm (267 dots per inch) in pixel size and 8-bit quantization in gray level. The details of the digitization process have been described previously [14]. Participants from both studies had standardized, quantitative calculations of PMD measured in digitized craniocaudal views by the same experienced study mammographer (CKC), using an interactive computerized thresholding method developed at the NIH Clinical Center (MEDx™ version 3.44, Medical Numerics, Germantown, MD, USA). We have previously reported that the intra-observer agreement for PMD assessed in 100 paired sets using MEDx was 0.89 [14]. In addition, we found that Cumulus™ measures of PMD were strongly and positively correlated with those assessed by MEDx (r = 0.84, P <0.0001) [14].

Computerized assessment of mammographic parenchymal patterns

Regions-of-interest (ROIs) measuring 256 by 256 pixels were manually selected by the same investigator (LL) without knowledge of BRCA1/2 mutation status, from the central breast region behind the nipple on digitized craniocaudal projections (Figure 2). Detailed explanations of the effects of ROI extraction, ROI size, and ROI location on RTA have been reported elsewhere [20]. These ROIs were used in the subsequent analytic step to extract and characterize the gray-level magnitude-based and parenchymal texture-based features of the digitized mammograms.

Figure 2

A sample region-of-interest (ROI) selected from central breast region behind the nipple on a digitized mammogram.

Radiographic texture analysis (RTA) of computer-extracted features

The detailed descriptions of the 38 computer-extracted parenchymal texture features (mathematical descriptors used in the RTA) have been reported previously [19],[20],[27]-[30],[39]-[41]; their feature numbers, names and definitions are summarized in Additional file 1: Table S1. For ease of interpretation, gray-level magnitude (M)-based features were assigned alpha-numeric descriptors ranging from M1 to M9, and texture (T)-based features were assigned descriptors ranging from T1 to T29. We assessed the internal reliability of the reader’s ROI placement by randomly submitting a masked set of 91 mammograms (Susceptibility to Breast Cancer Study (n = 27); Breast Imaging Study (n = 64)) for re-selection of the ROIs and re-analysis by the RTA algorithms. The intraclass correlation coefficient (ICC) was calculated to assess the intra-observer reliability of the RTA features following manual re-selection of the ROIs.
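The reliability check described above can be illustrated with a short sketch. The text does not state which form of the ICC was used, so the one-way random-effects form, ICC(1,1), is assumed here purely for illustration; the function name `icc_oneway` and the example feature values are hypothetical and are not study data.

```python
from statistics import mean


def icc_oneway(pairs):
    """One-way random-effects ICC(1,1) for repeated measurements.

    `pairs` is a list of per-subject tuples, one value per reading
    (here: original and re-selected ROI). This form of the ICC is an
    assumption, not necessarily the one used in the study.
    """
    n = len(pairs)      # number of subjects (mammograms)
    k = len(pairs[0])   # readings per subject
    grand = mean(v for p in pairs for v in p)
    subject_means = [mean(p) for p in pairs]

    # Between-subject and within-subject mean squares.
    msb = k * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    msw = sum(
        (v - m) ** 2 for p, m in zip(pairs, subject_means) for v in p
    ) / (n * (k - 1))

    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical values of one RTA feature from two ROI placements.
readings = [(0.10, 0.11), (0.20, 0.19), (0.30, 0.31), (0.40, 0.42)]
print(round(icc_oneway(readings), 3))
```

In this form the ICC approaches 1 when the two ROI placements give nearly identical feature values relative to the spread between women.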

Statistical analyses

Selection of participants for the training and testing datasets

After exclusions, 237 subjects were eligible for analysis: 137 BRCA1/2 mutation carriers and 100 non-carriers. We divided these women into a training set, used to develop discrimination models to distinguish carriers from non-carriers, and a testing set, used to evaluate how well the discrimination model distinguished carriers from non-carriers. From the 100 non-carriers, 6 were randomly selected from each non-carrier quintile of age, for a total of 30 non-carriers, to comprise the testing set. Likewise, from the 137 mutation carriers, 6 women were randomly selected from each carrier quintile of age, yielding 30 carriers for the testing set. The remaining 177 women comprised the training dataset. Baseline characteristics were compared between BRCA1/2 mutation carriers and non-carriers within the training and testing datasets using the two-sample t-test for independent samples (assuming equal variances) for continuous measures and the chi-square test for discrete measures.
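The age-stratified selection of the testing set can be sketched as follows. This is a minimal illustration, not the procedure actually run: the rank-based quintile boundaries, the helper name `quintile_split`, the random seeds, and the simulated ages are all assumptions, since the text does not describe how ties or uneven quintiles were handled.

```python
import random


def quintile_split(ages, per_quintile=6, seed=0):
    """Return (test_idx, train_idx) by sampling within age quintiles.

    `ages` holds one age per subject in a single group (carriers or
    non-carriers). Rank-based quintile boundaries are an assumption.
    """
    rng = random.Random(seed)
    order = sorted(range(len(ages)), key=lambda i: ages[i])
    n = len(order)
    test = []
    for q in range(5):
        # Rank-based slice covering the q-th fifth of the group.
        stratum = order[q * n // 5:(q + 1) * n // 5]
        test.extend(rng.sample(stratum, per_quintile))
    test_set = set(test)
    train = [i for i in range(n) if i not in test_set]
    return sorted(test), train


# Simulated (hypothetical) ages for 100 non-carriers and 137 carriers.
rng = random.Random(1)
noncarrier_ages = [rng.randint(30, 75) for _ in range(100)]
carrier_ages = [rng.randint(25, 55) for _ in range(137)]

nc_test, nc_train = quintile_split(noncarrier_ages)
c_test, c_train = quintile_split(carrier_ages, seed=2)
print(len(nc_test), len(c_test), len(nc_train) + len(c_train))
# prints: 30 30 177
```

Sampling a fixed number of women per quintile within each group keeps the age distribution of the testing set close to that of the group it was drawn from.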

Stepwise feature selection using linear discriminant analysis

Utilizing the 177 subjects in the training dataset, we employed stepwise feature selection using linear discriminant analysis, in which RTA features were iteratively added to and removed from the group of selected features based on a feature selection criterion, namely Wilks’ lambda [42],[43]. In each iteration step, linear discriminant analysis was used to calculate the discriminant scores, which were then used to compute the Wilks’ lambda. The F-statistic was applied to determine whether a particular feature contributed significantly (P-value <0.05) to the performance of the linear discriminant analysis in each step. Details of stepwise feature selection using linear discriminant analysis are described in Additional file 2. The stepwise feature selection was performed 177 times, by leaving out one woman from the training set each time. To be included as a classifier for distinguishing carriers from non-carriers, a feature had to be selected in at least half of these 177 analyses.
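The final rule, keeping only features chosen in at least half of the leave-one-out runs, can be sketched as below. The stepwise selector itself (Wilks' lambda with an F-test) is not implemented here; the helper name `stable_features`, the placeholder feature names, and the per-run selections are hypothetical and do not reproduce the study's results.

```python
from collections import Counter


def stable_features(selections, min_fraction=0.5):
    """Keep features chosen in at least `min_fraction` of the runs.

    `selections` has one collection of feature names per leave-one-out
    run, i.e. the output of whatever stepwise selector is used.
    """
    # Count each feature at most once per run.
    counts = Counter(name for run in selections for name in set(run))
    threshold = min_fraction * len(selections)
    return sorted(name for name, c in counts.items() if c >= threshold)


# Hypothetical selections from 177 leave-one-out runs.
runs = [["F1", "F2", "F3"]] * 100 + [["F1", "F3", "F4"]] * 77
print(stable_features(runs))
```

Here F2 (100 of 177 runs) passes the threshold while F4 (77 of 177 runs) does not, which is the kind of stability filter the paragraph describes.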

Merging of computer-extracted features

The RTA features selected in the linear discriminant analysis were combined using a Bayesian artificial neural network (BANN) algorithm (Additional file 2). The output from BANN was converted to an estimate (probability score) of the likelihood of being within the BRCA1/2 mutation carrier group. These probability scores were evaluated for their capacity to serve as an image-based marker of risk in the independent testing data set by assessing whether their distribution differed between BRCA1/2 mutation carriers and non-carriers. In order to assess how mammographic density might influence the discrimination performance, we also developed (training data) and tested (testing data) a modified BANN classifier in which percent mammographic density was forced to be included along with the same selected RTA features. Both the linear discriminant analysis and the BANN algorithm were completed in MatLab™ (The MathWorks, Inc. Natick, MA, USA).

Performance evaluation and related statistical analyses

Spearman’s rank correlation coefficient was used to describe the relationships of the selected computerized texture features with PMD, age and each other. The ability of the BANN-trained classifier to distinguish between BRCA1/2 mutation carriers and non-carriers was evaluated in the testing dataset using several approaches. We evaluated the relation between the BANN-trained classifier output and BRCA1/2 mutation status in univariate and multivariable logistic regression analysis, first adjusted for age as a continuous variable, and then adjusted for age and PMD. For comparison purposes, we evaluated the relationships of (a) PMD alone and (b) the modified BANN-trained classifier, which included PMD, with BRCA1/2 mutation status in both univariate and age-adjusted multivariable logistic regression analyses. In sensitivity analyses, we additionally adjusted for baseline characteristics that differed by mutation status.

Because carriers were on average approximately 10 years younger than non-carriers [14], we also performed age-matched sensitivity analysis in the testing data. First, we applied the BANN-trained classifier from the original training dataset to testing datasets restricted to pairs of BRCA1/2 mutation carriers and non-carriers who were randomly selected and matched on age within ±3 years (that is, 19 mutation carriers, 19 non-carriers) and ±1 year (that is, 17 mutation carriers, 17 non-carriers). Within the age-matched testing datasets, the Wilcoxon signed rank test was used to examine the mean paired difference in the BANN probability score between carriers and non-carriers. We performed a similar paired difference analysis of BANN probability scores based on the selected features and PMD. In an additional sensitivity analysis, we removed women older than age 55 years from both the training and testing datasets, and repeated the analysis conducted with the combined dataset.

The utility of the computer-extracted RTA features, as well as the output from BANN in the task of differentiating the two groups, was also evaluated by using receiver operating characteristic (ROC) analysis [44],[45]. The area under the fitted ROC curve (AUC) was used to evaluate the inherent discriminant capacity of the decision variable. The AUC measures the probability that a randomly-selected carrier will have a greater probability score than a randomly-selected non-carrier. The ROCKIT™ software package (ROCKIT, version 1.1b) [46] was used to evaluate the statistical significance of the difference between two AUC values (that is, the AUC from the BANN-trained classifier was compared with the AUC from PMD alone) [47]. We used two methods to obtain age-adjusted estimates of the AUC values explained by the BANN-trained classifier. In the first method, we restricted our test-set to pairs age-matched within ±3 and ±1 years, as defined above. In an alternate approach, we computed individual AUCs within age strata, in which the testing dataset was divided into three age strata: 25 to <35 years, 35 to <45 years, and 45 to 55 years. The AUCs were computed within each age stratum, and then were averaged to yield the AUC across the age strata. Except where noted above, analyses were completed using SAS statistical software (SAS 9.2 software, SAS Institute Inc., Cary, NC, USA). Probability values <0.05 were considered to be statistically significant. All tests of statistical significance were two-tailed.
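The probabilistic reading of the AUC given above, and the averaging across age strata, can be illustrated with a small sketch. Note the substitution: this computes the empirical (Mann-Whitney) AUC, not the fitted ROC estimate obtained from ROCKIT, and the equal weighting of strata is an assumption. The function names and the example scores are hypothetical.

```python
def empirical_auc(carrier_scores, noncarrier_scores):
    """Probability that a random carrier outscores a random non-carrier.

    Ties count one half (Mann-Whitney form). This is the empirical
    AUC, not a fitted ROC curve estimate.
    """
    wins = 0.0
    for c in carrier_scores:
        for n in noncarrier_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(carrier_scores) * len(noncarrier_scores))


def stratified_auc(strata):
    """Unweighted mean of per-stratum AUCs.

    `strata` is a list of (carrier_scores, noncarrier_scores) pairs,
    one per age stratum. Equal weighting is an assumption.
    """
    aucs = [empirical_auc(c, n) for c, n in strata]
    return sum(aucs) / len(aucs)


# Hypothetical probability scores in three age strata.
strata = [
    ([0.8, 0.6, 0.7], [0.5, 0.65, 0.3]),
    ([0.9, 0.4], [0.5, 0.2]),
    ([0.7], [0.6, 0.75]),
]
print(round(stratified_auc(strata), 2))
```

Computing the AUC within each stratum compares carriers only with non-carriers of similar age, so the averaged value is less affected by the age difference between the groups.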

Results

Distribution of patient characteristics in the training and testing datasets

The baseline characteristics of BRCA1/2 mutation carriers and non-carriers stratified by training and testing datasets are shown in Table 1. Compared with non-carriers, the BRCA1/2 mutation carriers were statistically significantly younger, more likely to be white, nulliparous or to have a later age at first birth, and to have undergone surgical menopause. As previously reported, age-adjusted mean PMD did not differ between BRCA1/2 carriers and non-carriers [14].

Table 1 Baseline characteristics of BRCA1/2 mutation carriers and non-carriers according to the training and testing datasets

Because women were randomly selected from age quintiles within each risk group for the training and testing datasets, the age distribution of non-carriers in the training set (n = 70) was similar to that of the testing set (n = 30) (P = 0.44). Likewise, the age distribution of the carriers in the training set (n = 107) was similar to those in the testing set (n = 30) (P = 0.95). The distributions of PMD within the risk groups were also similar between training and testing sets (non-carriers: P = 0.66; BRCA1/2 carriers: P = 0.47). There were no statistically significant differences in body mass index (BMI) between the risk groups or between the training and testing datasets.

Descriptive characteristics of selected computer-extracted features

The ICCs between duplicate measurements of the 38 computer-extracted RTA features for the 91 women with repeated readings ranged from 0.79 to 0.99, documenting high reliability of ROI selection and analysis (Additional file 1: Table S1). Additional file 1: Figure S1 shows the number of times that each feature was selected in the 177 leave-one-case-out feature selection analyses of the training data. Of the 9 gray-level magnitude- and 29 texture-based computerized features explored using the training dataset, two gray-level magnitude- (that is, M1: AVE; M2: MinCDF) and two texture-based features (that is, T1: Energy; T2: MaxF (COOC)) were selected more than half the time, and were therefore included in subsequent BANN models. A third gray-level feature, “Balance”, was selected in sensitivity analyses in which the training dataset was truncated at the upper age-limit of mutation carriers. The distribution of values for the selected features of Energy and Balance is shown in the scatter plot in Figure 3. This plot demonstrates that the parenchymal texture features of mutation carriers tend to have low Energy, that is, they are less homogeneous, with a coarse pattern.

Figure 3

Scatterplot of the computer-extracted parenchymal features of Energy and Balance for BRCA1/2 mutation carriers and non-carriers. Energy, a texture-based feature, was identified as distinguishing between carriers and non-carriers; Balance, a gray-level magnitude-based feature, was selected in age-matched analyses. Compared with non-carriers, mutation carriers tended to have a parenchymal texture with low Energy.

Table 2 presents descriptive information related to the selected features. On average, mutation carriers tended to have lower values for the gray-level magnitude-based features, and texture-based features were less homogeneous as compared with the non-carriers. With regard to the three selected gray-level magnitude-based features, the feature characterizing the average gray value within the ROI (“AVE”) was positively correlated with PMD (r = 0.31, P <0.0001), whereas the Balance feature was inversely correlated with PMD (r = -0.32, P <0.0001). A weak inverse correlation was observed between the MinCDF feature (that is, the gray value corresponding to the 5% region cutoff on the cumulative density function) and PMD (r = -0.13, P = 0.04); MinCDF was positively correlated with age (r = 0.23, P = 0.0005). Modest statistically significant inverse correlations were observed between PMD and both of the selected texture-based features, Energy and MaxF (COOC), which are measures of image homogeneity (Energy: r = -0.30, P < 0.0001; MaxF (COOC): r = -0.24, P = 0.0002). These selected texture-based features were positively correlated with age.

Table 2 Descriptive characteristics of selected computer-extracted features

The selected gray-level magnitude-based features (AVE, MinCDF, and Balance) were strongly correlated with each other; however, of the three gray-level magnitude-based features, only MinCDF was statistically significantly and positively correlated with the two selected texture-based features (Additional file 1: Table S2). The selected texture-based features, Energy and MaxF (COOC), were strongly and positively correlated with one another (r = 0.90, P <0.0001) (Additional file 1: Table S2). There were no statistically significant mean differences in the selected computer-extracted feature measures between the training and testing data sets (P-value range from Wilcoxon rank sum test = 0.17 to 0.45; data not shown). Likewise, the descriptive characteristics of and correlations between the selected computer-extracted features in the testing dataset were consistent with those observed for the training and testing datasets combined (data not shown).

Relationships between computer-extracted mammographic features and BRCA1/2 mutation status: original training and testing datasets

Table 3 shows the results for the ability of the BANN-trained classifier, developed using the selected feature subset, to distinguish between BRCA1/2 mutation carriers and non-carriers in the independent testing dataset. The AUC (standard error, SE) for the BANN-trained classifier of 0.68 (0.07) was an improvement over the AUC (SE) for PMD alone (0.59 (0.07)); however, the two AUC statistics were not significantly different from one another (P = 0.52), likely due to the small sample size. One SD increase in the probability score from the BANN-trained classifier, developed using the features selected in the original training dataset, was associated in the testing data with about a two-fold increase in the odds of predicting BRCA1/2 mutation status in both unadjusted (odds ratio (OR) = 2.00, 95% CI: 1.59, 2.51, P = 0.02) and age-adjusted (OR = 1.93, 95% CI: 1.53, 2.42, P = 0.03) models. Additional adjustment for PMD did not alter the observed age-adjusted OR. The findings were nearly identical when the BANN-trained classifier, modified to include PMD (that is, Features + PMD in Table 3), was used, and when adjusting for baseline characteristics that differed by mutation status (that is, parity, age at first birth, oral contraceptive use, and surgical menopause) (data not shown).

Table 3 Ability of trained classifier to distinguish between BRCA1/2 mutation carriers and non-carriers in testing dataset

Relationships between computer-extracted mammographic features and BRCA1/2 mutation status: sensitivity analyses utilizing an age-matched testing dataset

By virtue of the Breast Imaging Study eligibility criteria, the BRCA1/2 mutation carriers were on average approximately 10 years younger than the non-carriers. We therefore conducted a series of age-matched sensitivity analyses. First, the testing dataset was restricted to pairs of BRCA1/2 mutation carriers and non-carriers matched on age within ±3 years (Additional file 1: Table S3). The mean paired differences in the probability scores from the trained classifiers developed using selected features alone and the features plus PMD were statistically significantly greater than zero (P = 0.02 and P = 0.02, respectively). Using the same age-matched testing dataset, the corresponding AUC (SE) values for the BANN-trained classifier without and with PMD were 0.71 (0.09) and 0.72 (0.08), respectively. When matching on age within ±1 year, the findings were similar, although the mean paired difference in the probability score was no longer statistically significant (P = 0.06 for features alone and P = 0.08 for features + PMD). Computing AUCs within age strata yielded comparable results (data not shown).

We performed additional sensitivity analyses by removing women above age 55 years (the upper limit of age among the Breast Imaging Study participants) from both the training and testing datasets. This resulted in 96 women in the training data (48 mutation carriers and 48 non-carriers) and 38 women in the testing data (19 carriers and 19 non-carriers) matched on age within ±3 years. The mean paired difference in the probability score from the BANN-trained classifier, developed using the newly-selected feature subset (MinCDF, MaxF (COOC), Balance), was of borderline statistical significance (P = 0.055). Forcing PMD into the BANN-trained classifier did not substantially alter our ability to distinguish between BRCA1/2 mutation carriers and non-carriers (P = 0.06). The AUC (SE) for the BANN-trained classifier to distinguish between BRCA1/2 mutation carriers and non-carriers was 0.72 (0.09) for features alone and 0.71 (0.09) for the features plus PMD. The results from these sensitivity analyses are consistent with those from our primary analyses based on the original testing dataset.

In contrast to the differences we observed in the BANN-trained classifier between BRCA1/2 mutation carriers and non-carriers, we did not observe any statistically significant mean paired differences in PMD between the test-set pairs age-matched within ±3 or ±1 years or when using the age-restricted dataset (Additional file 1: Table S3, P = 0.83, P = 1.00, and P = 0.83, respectively).

Discussion

We investigated relationships between computer-extracted mammographic texture features and BRCA1/2 mutation status among women without breast cancer, and identified novel mammographic texture features (AVE, MinCDF, Energy, MaxF (COOC)) that appear to distinguish BRCA1/2 mutation carriers from non-carriers. We had previously observed no difference in percent density obtained from the entire mammogram by BRCA1/2 mutation status in this same population, motivating our search for new informative parenchymal characteristics based on radiographic texture analysis within a retro-areolar ROI. These associations changed minimally when we included PMD in models with the four selected texture features. Thus, the associations we have identified between specific RTA features and mutation status are independent of any possible modifying effect of mammographic density, which in both our prior work and that of others appears no different in mutation carriers than that observed in the general population [12],[14]-[18]. The strength of the RTA feature associations was attenuated when mutation carriers were age-matched to non-carriers, likely due to reduced sample size. Our study adds to the existing RTA literature [19],[20] by analyzing the largest number of mutation carriers yet studied in this manner, and our findings indicate that computer-extracted mammographic features provide some additional information for identifying women likely to carry BRCA1/2 mutations. The RTA classifier we have identified could prove a useful adjunct to mammographic interpretation both in women from families with many affected relatives in whom no genetic susceptibility has yet been identified and in families known to have mutations in these genes. However, because the positive predictive value of such a test would be low in the general population, owing to the rarity of these mutations, the strength of the association we found is not high enough for screening a general population to identify candidates for mutation testing.

The texture-based features Energy and MaxF (COOC) - which describe the spatial distribution pattern for tissue homogeneity - and AVE and MinCDF - which provide gray-level magnitude information on tissue denseness - were the strongest RTA predictors of mutation status within a given ROI. The RTA texture-based features selected in this study characterize similar parenchymal attributes found in previous studies on digitized screen/film mammograms [19],[27],[28],[30], such that BRCA1/2 mutation carriers tend to have retro-areolar parenchymal patterns that are coarse in texture. It is important to note that a given parenchymal attribute may be described by multiple computer-extracted features. For example, image homogeneity can be measured by Energy and the largest number of a gray-value pair in the co-occurrence matrix (MaxF (COOC)), as selected in this study, or by the first moment of the power spectrum (FMP) or Coarseness, which Huo et al. and Li et al. previously found to be associated with BRCA1/2 mutation status [19],[27]. In addition, our findings are consistent with two case-control studies reporting that mammograms of coarse texture are associated with increased breast cancer risk [23],[24]. In these studies, however, simultaneous inclusion of the texture features in a model with PMD did not improve breast cancer risk prediction [23],[24].
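To make the two co-occurrence-based homogeneity measures concrete, the sketch below computes them for tiny example images. The exact feature definitions are given in Additional file 1: Table S1 and are not reproduced here, so this assumes the conventional forms: Energy as the sum of squared entries of the normalized gray-level co-occurrence matrix, and MaxF (COOC) as its largest entry. The single horizontal offset, the number of gray levels, the function names, and the toy images are all illustrative assumptions.

```python
def cooccurrence(image, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for one pixel offset."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                # Count the gray-value pair (pixel, offset neighbor).
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    return [[v / total for v in row] for row in counts]


def energy(p):
    """Sum of squared co-occurrence probabilities (assumed definition)."""
    return sum(v * v for row in p for v in row)


def max_frequency(p):
    """Largest co-occurrence probability (assumed form of MaxF (COOC))."""
    return max(v for row in p for v in row)


# Toy images: a uniform patch and a two-level checkerboard.
flat = [[0] * 4 for _ in range(4)]
checker = [[(r + c) % 2 for c in range(4)] for r in range(4)]

for name, img in (("flat", flat), ("checker", checker)):
    p = cooccurrence(img, levels=2)
    print(name, energy(p), max_frequency(p))
```

The uniform patch concentrates all pairs in one matrix cell and so has the highest possible Energy and MaxF, whereas the patterned image spreads pairs over several cells and scores lower on both, which matches their description in the text as measures of image homogeneity.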

Although we found that the selected texture features improved our ability to distinguish between mutation carriers and non-carriers when compared with PMD alone (the difference in AUC did not reach statistical significance), ours was a cross-sectional study evaluating features associated with BRCA1/2 mutation status rather than subsequent risk of developing breast cancer. Prior studies have questioned the importance of mammographic density for breast cancer risk prediction among BRCA1/2 mutation carriers [11]; further research is warranted to investigate the predictive value of computer-extracted texture features among this high-risk patient population. We currently have no information on the association between the RTA classifier and the risk of breast cancer per se among BRCA mutation carriers. While it may seem logical to assume that women with the BRCA-related RTA mammographic texture pattern will actually be at increased risk of breast cancer, this has not yet been established. Further clinical development of the RTA classifier will require proof of this hypothesized association; we strongly recommend that a new study with that question as its primary study endpoint be undertaken.

Mutation carriers tended to have lower values for the RTA gray-level magnitude-based features selected in this study, suggesting that their breasts were less dense in the retro-areolar region as compared with the non-carriers. This finding is inconsistent with prior studies suggesting that mutation carriers have gray-level magnitude-based features that are low in contrast [19],[30]. It is possible that differences in film digitizers and/or digital mammographic image acquisition systems between studies could influence RTA, particularly for the gray-level magnitude-based features which have been previously shown to be sensitive to the effects of variable gain [48]. Consistent with the idea that texture-based features are more robust than gray-level magnitude-based features across systems of varying gain [48], a prior study, which utilized full-field digital mammograms (FFDM) to identify high-risk features, resulted in selection of only spatial distribution texture-based features [49]. Hence, the gray-level magnitude-based features that were related to mutation status in our study population may not be generalizable to FFDM. This is not surprising as image processing of FFDM permits the degree of contrast in the image to be manipulated, such that contrast may be increased in the dense areas of the breast in order to maximize mammographic sensitivity [50]. As clinical practice is rapidly shifting toward digital breast imaging, this work should set the stage for applying the strategies described herein to newer images from mutation carriers as they become available.

Our research method was also limited by the need for manual placement of retro-areolar ROIs; however, manual ROI reselection for a randomly selected subset of participants was found to be highly reliable, both in this study and as reported previously [20]. Automation of ROI placement could be applied in future work. Our study had several strengths, including the largest number of mutation carriers and non-carriers yet studied in this manner, assessment of digitized images that was completely masked to mutation status, and evaluation of the proposed classifier in independent test data. Although the discriminatory accuracy of the RTA classifier was modest (AUC = 0.68) and lower than would be desirable for a diagnostic test, it compares favorably with the AUC statistics reported for most breast cancer risk models [51]. Further, we performed extensive sensitivity analyses, and our findings persisted in the presence of multiple potential confounding factors, including age and PMD. Although statistical power was limited for the age-matched sensitivity analyses, these analyses provided an important confirmatory means of controlling for age, and their results consistently suggested a relation between computer-extracted mammographic texture pattern features and mutation status. Thus, our findings warrant validation in larger independent clinical studies.
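
To make the AUC figure quoted above more concrete, the following minimal Python sketch computes an AUC as the probability that a randomly chosen mutation carrier receives a higher classifier score than a randomly chosen non-carrier, with ties counted as one half. The function name and the scores are hypothetical and invented purely for illustration; they are not data from this study, and this is not the evaluation code used in the analysis.

```python
def auc_from_scores(carrier_scores, noncarrier_scores):
    """Return the AUC as the fraction of (carrier, non-carrier) pairs in
    which the carrier has the higher score; tied pairs count as 0.5."""
    wins = 0.0
    for c in carrier_scores:
        for n in noncarrier_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(carrier_scores) * len(noncarrier_scores))


# Hypothetical classifier scores (assumed values, for illustration only).
carriers = [0.9, 0.6, 0.4]
noncarriers = [0.7, 0.5, 0.3, 0.2]

print(auc_from_scores(carriers, noncarriers))
```

Under this reading, an AUC of 0.5 corresponds to a score that carries no information about mutation status and 1.0 to perfect separation, so a value of 0.68 indicates that the carrier outranks the non-carrier in roughly two of every three such pairs.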

The biology of mammographic density is poorly understood [52],[53], and the biologic correlates of texture-based features are even less well characterized. Nevertheless, evidence from animal models and human breast tissues suggests underlying biological differences in the molecular histology and pathology of the breast by BRCA1/2 mutation status [54]-[57]. While it is possible that our results are related to true anatomical differences between carriers and non-carriers as reflected in their parenchymal patterns, other biologic factors, such as biochemical differences, also need to be explored.

Conclusions

Several noteworthy clinical implications flow from our results. First, we confirm an important observation, previously made by Huo et al. and Li et al. [19],[20] but not widely appreciated in the clinical community: the digitized mammographic image contains computer-extractable information not captured during routine radiologic interpretation, which may permit improved, real-time risk stratification among women undergoing screening mammography. Nonetheless, the tools used in this analysis are at an early stage; further development of these techniques might identify additional, more strongly correlated features. In the current instance, our computer model was significantly correlated with the presence of deleterious mutations in BRCA1/2, conferring a two-fold increase in the likelihood of being a mutation carrier per one SD increase in the probability score. If the interpreting radiologist were made aware of this information while reading clinical mammographic images, it could alter image interpretation by increasing the prior probability of disease in subjects with the BRCA-related pattern. The model’s ability to distinguish between BRCA1/2 mutation carriers and non-carriers might, in the context of a positive family history of breast and/or ovarian cancer, serve as an indicator to consider formal genetic risk assessment in persons who have not been previously tested. Integration of breast imaging data with family history and breast tumor markers could be formally assessed by estimating the added value of our image-based probability score to existing statistical models that are used to predict BRCA1/2 mutations [58]. Although the mathematical and statistical concepts involved in generating the RTA classifier are complex, a great deal of work has already been done on the details of this methodology. Should the RTA classifier be validated clinically, this algorithm is amenable to a user-friendly implementation.
The current data do not yet support these clinical applications, but they provide a solid basis for extending this novel research into larger, more rigorously designed studies utilizing digital imaging modalities. Our findings also serve as a reminder of the importance of keeping an open mind about novel applications of old technologies. This value-added strategy may improve the cost-benefit ratios of tried, true and readily available clinical tests, without the development costs associated with an entirely new technology.
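
As an illustration of what a two-fold association per one SD increase in the probability score would mean numerically, the sketch below assumes that the two-fold figure is an odds ratio from a logistic model, which is an assumption made here for illustration. The function names, the score SD of 0.2, and the 10% starting probability are all hypothetical and are not values from this study.

```python
import math


def beta_for_target_or(target_or, score_sd):
    """Log-odds coefficient per raw score unit that yields the given
    odds ratio per one-SD increase in the score."""
    return math.log(target_or) / score_sd


def odds_ratio_per_sd(beta_per_unit, score_sd):
    """Odds ratio associated with a one-SD increase in the score."""
    return math.exp(beta_per_unit * score_sd)


def shifted_probability(prior_probability, odds_ratio):
    """Apply an odds ratio to a starting probability and return the
    resulting probability."""
    prior_odds = prior_probability / (1.0 - prior_probability)
    new_odds = prior_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Hypothetical inputs (assumed for illustration only).
score_sd = 0.2
beta = beta_for_target_or(2.0, score_sd)

print(odds_ratio_per_sd(beta, score_sd))
print(shifted_probability(0.10, 2.0))
```

Because an odds ratio multiplies odds rather than probabilities, a doubling of the odds does not double the probability: under these assumed inputs, a 10% starting probability rises to about 18%, not 20%.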

Additional files