Background

Women with mammographically dense breasts are at a higher risk of developing breast cancer than women with more fatty breasts. The risk of developing breast cancer can be four to six times higher in women with breast density in the top quartile of the population compared to the bottom quartile [1, 2]. Why breast density is predictive of future cancer occurrence is not fully known. What is known is that breast density is not homogeneous. Some of the earliest measures of breast density categorized the appearance of mammograms by the patterns projected from the heterogeneity of the tissue [3]. However, the description of the heterogeneity, or “texture”, has not been incorporated in the standardized reporting of breast density categories in the Breast Imaging-Reporting and Data System (BI-RADS) [4], or in quantitative measures of volumetric breast density obtained with methods such as Volpara (Matakina, Wellington, New Zealand) and Quantra (Hologic, Inc., Marlborough, MA, USA) [5].

Breast density texture can be described using numerous statistical descriptors of the distribution and spatial relationship of grayscale values in the image pixels. Texture has been studied as a breast cancer risk factor independent of average breast density [6–11], but the results have not been adequately adjusted for breast density and other risk factors. For example, Byng et al. reported a significant negative correlation between regional skewness, fractal dimension, and cancer risk [7]. However, Torres-Mejia et al. [6] reported that the regional skewness and fractal dimensions had no association with breast cancer after adjusting for other risk factors and overall breast density. One feature, lacunarity, remained significant [6]. Manduca et al. found that skewness and kurtosis did not predict breast cancer risk [8], but did find associations for the Markovian, run length, Laws, wavelet, and Fourier transformations. After adjustment for planar mammographic percent density (PD), each feature attenuated only slightly and retained statistical significance; however, simultaneous inclusion of these features in a model with PD did not significantly improve the ability to predict breast cancer [8]. Other studies have shown that differences in texture and density features are related to predisposing mutations and tumor type including BRCA1/BRCA2 mutation carriers [12–14] and estrogen receptor (ER) status [15–17]. Thus, the density patterns of the parenchymal tissue have attracted clinical attention because of their potential to offer additional information about subtype and cancer biology. However, it remains unknown if breast texture descriptors will help better identify women at high risk of breast cancer from standard screening mammograms.

To this end, we amassed a library of imaging features previously reported on in the breast imaging and general imaging literature as candidate descriptors of breast tissue characteristics. In this study, we investigated the association of these descriptors and breast cancer risk using prospectively acquired mammograms from five breast cancer epidemiology studies. We also examined the association of these descriptors to tumor type and ER status.

Methods

Study design

This study is a large, comprehensive pooled analysis of five case–control studies, two of which were nested within cohorts, to examine the association between texture of mammographic density and breast cancer risk and breast cancer subtypes.

Study population

The studies and populations used in this analysis have been previously described elsewhere [16]. Briefly, the participating studies included the Mayo Mammography Health Study (MMHS) [18], the Nurses’ Health Studies (NHS and NHSII) [19], the Mayo Clinic Mammography Study (MCMAM) [20], and the San Francisco Bay Area Breast Cancer SPORE and San Francisco Mammography Registry (SFMR) at the University of California San Francisco (UCSF) [21]. Breast cancer cases diagnosed within 6 months of mammography were excluded from all studies. We collected covariate data from medical record review (MCMAM), and self-administered questionnaires (NHS, NHSII, SFMR), or both (MMHS). Information was obtained before (NHS, NHSII) or at the time of (MMHS, MCMAM, SFMR) screening mammogram. The Institutional Review Boards at the Mayo Clinic, Brigham and Women’s Hospital, UCSF, and the Connecticut Department of Public Health Human Investigations Committee reviewed and approved these studies. Informed consent was obtained or implied by return of questionnaires (NHS, NHSII).

There were 9353 women with screening visits during the study period from all studies. For MMHS and SFMR only, due to study design, large batches of cases were digitized at one time followed later by batches of matched controls. Thus, to ensure no bias due to potential confounding by digitization, we only included those cases and matched controls that were digitized in the same batches, resulting in a substantially reduced sample for these two studies. To ensure that no bias was associated with study exclusions due to digitizer in these two studies, we compared the included cancer cases to the excluded cancer cases. We found that the eligible vs. excluded cases did not differ in terms of their demographic and clinical characteristics (P > 0.05). Similarly, matched controls were compared against the whole study population and were found to be comparable (data not shown). Overall, 6523 women (69.7% of population) from MMHS and SFMR were excluded and 2830 women were eligible for our case–control set. Among the eligible women, mammograms of 1171 breast cancer cases and 1659 controls were analyzed.

Mammogram digitization and harmonization

For this study, the craniocaudal (cc) views of screening examinations of both breasts were digitized at each respective study site. The cc view images were more conducive to being analyzed automatically with our algorithms; also, not all studies had mediolateral oblique views available. The MMHS screen-film mammograms were digitized on the Array 2905 laser digitizer (Array Corporation, The Netherlands) that has 50-μm (limiting) pixel spacing with 12-bit grayscale bit depth. The MCMAM mammograms were digitized on a Lumiscan 85 scanner with 12-bit grayscale bit depth and 0.100 × 0.100 mm² pixel size. For mammograms provided by the SFMR, digitization was performed using two digitizers, an R2 ImageChecker with 16-bit dynamic range and 150-μm pixel size, and a Vidar Diagnostic Pro (Vidar Systems Corporation) with 16-bit dynamic range and 169-μm pixel size. For NHS and NHSII, film mammograms were digitized at 261 μm per pixel with a Lumisys 85 laser film scanner (Lumisys, Sunnyvale, CA, USA) or a VIDAR CAD PRO Advantage scanner (VIDAR Systems Corporation, Herndon, VA, USA) with a comparable resolution of 150 dots per inch and 12-bit depth. To minimize effects of the film digitization process, we performed a harmonization procedure by rescaling all images to have the same pixel size and dynamic range. The final spatial resolution was set to 160 μm using the Matlab “imresize” function with default parameters (bicubic interpolation). The dynamic range of all images was converted into 16-bit grayscale by multiplication with the appropriate coefficient.
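The two harmonization quantities described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' Matlab code: the function names are hypothetical, and the full-range linear gray-level scaling is an assumption, since the text does not state which coefficient was used.

```python
TARGET_PIXEL_UM = 160.0  # final pixel spacing stated in the text
TARGET_BITS = 16         # final grayscale depth stated in the text


def resize_factor(source_pixel_um: float) -> float:
    """Scale factor for an image-resize routine (values below 1 shrink the image)."""
    return source_pixel_um / TARGET_PIXEL_UM


def rescale_gray(value: int, source_bits: int) -> int:
    """Map a gray value from source_bits depth onto the 16-bit range.

    Assumption: full-range linear scaling; the coefficient is not given in the paper.
    """
    source_max = (1 << source_bits) - 1
    target_max = (1 << TARGET_BITS) - 1
    return round(value * target_max / source_max)
```

For example, a 50-μm scan would be resized by a factor of 50/160, and a saturated 12-bit pixel would map to the 16-bit maximum under the stated assumption.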

Assessment of mammographic density

To quantify PD, two semi-automatic threshold techniques were applied: Cumulus [22] (all studies besides SFMR) and UCSF custom software [23] (SFMR study; comparable to Cumulus). A test at the beginning of the study demonstrated that there was high correlation between the UCSF and Cumulus methods. As documented in [16], similar results are obtained from an average of both breasts and from a randomly selected side. We quantified PD on the contralateral breast for cases and the corresponding side for matched controls for all studies except NHS and NHSII, where the average PD of the left and right views was used. Only one reader read the images at each site. To match PD measures between readers and studies, we standardized the readings by removing the study-specific age trends, standardizing the variability across studies, and incorporating the known age trend in PD into the standardized PD. Details of this standardization procedure have been previously published [16].

Breast texture measurements

We implemented 46 candidate image texture features in our mammography image analysis program (Table 1). Features were measured on both left and right cc views for all subjects. The texture analysis was performed in the entire breast area. The entire breast area was automatically segmented from the background by global thresholding. Texture measures were grouped by the type of statistical description. Features derived from the histogram of the mammographic grayscale values were grouped as “Gray-Level Histogram” and include the image Standard Deviation, Skewness, Kurtosis, and Balance [7, 22, 24–26]. The second-order features described the spatial relationships between pixel intensities. We derived these second-order features using two matrices: the gray-level co-occurrence matrix (GLCM) [24, 25, 27] and the neighborhood gray-tone difference matrix (NGTDM) [24, 28]. The GLCM defined the distribution of co-occurring values at a given pixel offset in the image. Because co-occurrence matrices were often large and sparse, various metrics were used to describe the features of the matrix. The GLCM was created by the Matlab “graycomatrix” function with a number of gray levels equal to 16 and offset = [0 1], which relates to horizontal proximity of the pixels. The features used to describe a GLCM are often called Haralick features [27], and include Energy, Entropy, Dissimilarity, Contrast, Homogeneity, Correlation, Mean and Variance. In the textural analysis, the GLCM Entropy represents image pixel spatial disorder (e.g., heavy heterogeneous textures versus a flat gray level and smooth textures). The GLCM Energy represents local homogeneity and is a measure opposite to GLCM Entropy. This texture feature describes the degree of texture uniformity: a more homogeneous texture has a higher Energy. For example, an image with only constant grayscale pixels has Energy equal to 1. Other similar texture features from this table are GLCM Homogeneity and Dissimilarity. Homogeneity measures how uniform the non-zero entries in the GLCM are. This feature represents the existence of repetitions in texture. An image with irregular texture elements and irregular spatial positions is characterized by low Homogeneity. An image that contains repetitive structures has high Homogeneity. Dissimilarity is a measure that defines the variation of gray level pairs in an image. It is very similar to Contrast, differing only in the weight.
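As an illustration of the GLCM descriptors discussed above, the following Python sketch builds a horizontal co-occurrence matrix (offset [0 1]) and computes five of the named features. It is a simplified stand-in for Matlab's “graycomatrix”, not the authors' implementation. It assumes the image has already been quantized to integer levels 0 to 15, and it uses one common set of feature definitions (for example, Homogeneity weighted by 1/(1 + |i − j|)); the exact formulas used in the study are not stated in the text.

```python
import math


def glcm_horizontal(image, levels=16):
    """Normalized co-occurrence matrix for each pixel and its right-hand neighbor.

    `image` is a list of rows of integers in range(levels) (assumed pre-quantized).
    """
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for row in image:
        for left, right in zip(row, row[1:]):
            counts[left][right] += 1
            total += 1
    if total == 0:
        raise ValueError("image needs at least two columns")
    return [[c / total for c in row] for row in counts]


def glcm_features(p):
    """Energy, Entropy, Contrast, Dissimilarity and Homogeneity of a normalized GLCM."""
    energy = entropy = contrast = dissimilarity = homogeneity = 0.0
    for i, row in enumerate(p):
        for j, pij in enumerate(row):
            if pij == 0.0:
                continue  # empty cells contribute nothing to any feature
            d = abs(i - j)
            energy += pij * pij
            entropy -= pij * math.log2(pij)
            contrast += pij * d * d       # quadratic weight
            dissimilarity += pij * d      # linear weight
            homogeneity += pij / (1 + d)  # assumed weighting, see lead-in
    return {
        "energy": energy,
        "entropy": entropy,
        "contrast": contrast,
        "dissimilarity": dissimilarity,
        "homogeneity": homogeneity,
    }
```

With these definitions a constant image yields Energy 1 and Entropy 0, in line with the description above, and Contrast and Dissimilarity differ only in the weight applied to |i − j|.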

Table 1 Image texture features that are currently defined for all study participants

The NGTDM is a column matrix, which was first defined by Amadasun and King [28]. This matrix was derived by calculating the gray level difference between pixels with a certain gray level and their neighboring pixels. The NGTDM features included were Coarseness, Contrast, Complexity, Strength and Busyness [24, 28]. One feature, the mean gradient, was from a group of features called the Edge Frequency Analysis group. Lastly, Fourier and fractal analysis groups defined the remainder of the features. Fourier transform (FT) operations were used to estimate features in the frequency domain: root mean square (FT_RMS), first (FT_FMP) and second (FT_SMP) moments of power spectrum, and fractal dimension (FD) from power spectrum exponent (FT_FD) [29]. To define fractal qualities, shapes within the image were created using the pixels at a percentage threshold value of the total contrast (i.e., FD_TH_X, for threshold at X = 5, 10, 15…85%). These features were derived by a box counting method. Further fractal features include FD of the standard deviation (FD_Sigma), intercept of the plot of the standard deviation of the high-frequency image as a function of the size of the kernel (CD_Yint), slope of the plot of the standard deviation of the high-frequency image as a function of the size of the kernel (CD_Slope), standard deviation of the mean value of the breast pixel rows (HZ_PROJ), FD of the surface of the breast considering that the gray value represents the height (FD_CALDWELL) [30, 31], and Minkowski fractal dimension (FD_Minkowski) derived from morphological image operations [29]. The FD_Minkowski is similar to the box counting fractal dimensions (i.e., FD_TH variables). It is calculated by an image dilation procedure using disk-shaped structuring elements of different scales. As a result of edge frequency analysis, the mean gradient parameter was created. We previously demonstrated the utility of this set of features for derivation of volumetric breast density by a statistical model approach [32].
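The box counting idea behind the FD_TH_X features can be sketched as follows. This is a generic illustration under stated assumptions rather than the authors' procedure: the helper names are hypothetical, the threshold is taken as a fraction of the image's gray-level range, and the dimension is estimated on whatever binary shape is passed in (extraction of an outline from the thresholded region is not shown).

```python
import math


def threshold_mask(image, fraction):
    """Binary mask of pixels at or above `fraction` of the gray-level range (assumed rule)."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    cut = lo + fraction * (hi - lo)
    return [[value >= cut for value in row] for row in image]


def box_count(mask, size):
    """Number of size-by-size boxes that contain at least one set pixel."""
    occupied = set()
    for r, row in enumerate(mask):
        for c, is_set in enumerate(row):
            if is_set:
                occupied.add((r // size, c // size))
    return len(occupied)


def box_counting_dimension(mask, sizes=(1, 2, 4, 8, 16)):
    """Least-squares slope of log(count) against log(1/size)."""
    xs, ys = [], []
    for size in sizes:
        n = box_count(mask, size)
        if n > 0:
            xs.append(math.log(1.0 / size))
            ys.append(math.log(n))
    if len(xs) < 2:
        raise ValueError("mask is empty or too few box sizes were given")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

As a sanity check, a filled square gives a dimension close to 2 and a straight line close to 1; irregular shapes fall at non-integer values.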

Assessment of tumor characteristics

Tumor type (invasive vs. ductal carcinoma in situ (DCIS)) and ER status were available using Northern and Southern California Surveillance Epidemiology and End Results programs for SFMR, pathology reports or immunohistochemical analysis of tumor microarrays for NHS and NHSII, and state and clinic cancer registries for MMHS and MCMAM.

Statistical analysis

Risk factors and PD phenotypes were harmonized on the eligible cases and controls. For all subjects, concordance between features measured on left and right sides was evaluated. Lin’s concordance correlation coefficients were used to summarize the correlation between left and right sides. Values ranged from 0.50 to 0.98 with a median of 0.85. Given this, we chose to average sides to reduce noise in the measurements. To avoid issues with outliers and violations of distributional assumptions, the averaged features were normalized within each study using a normal transformation of the ranks. All analyses were performed using the normalized features. Logistic regression models evaluated the overall breast cancer associations with each normalized feature as a continuous variable and results are presented as odds ratio (OR) per 1 standard deviation (SD). All models were adjusted for age (continuous), body mass index (BMI) (continuous), first-degree family history of breast cancer (yes vs. no vs. unknown), PD (continuous), and study. To assess whether there were differences in associations by study, we included and tested an interaction term for texture feature by study. Study-specific results were also examined and summarized. The top 15 of 46 analyzed features that were significant (p < 0.05) in the case–control models were selected for further analysis. Polytomous logistic regression models were fitted to examine associations of features with respect to invasive/DCIS breast cancers and ER status. Contrasts were constructed within the polytomous model framework to test for differences of feature associations between tumor subgroups (p-het). SAS version 9.3 was used for analyses and two-sided p values < 0.05 were considered to be statistically significant. Pearson correlation coefficients were used to examine correlations among features and also correlations of features with PD among control subjects. 
Dendrograms were created to illustrate clustering among the significant features, age, body mass index (BMI), and PD on data from controls. A hierarchical clustering method using averaged distance was utilized as implemented in “proc cluster” in SAS.
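The left/right concordance check and the rank-based normalization described in this section can be sketched in Python as follows. The analyses in the study were run in SAS; this is only an illustrative equivalent. The function names are hypothetical, and the rank offset (rank − 0.5)/n is an assumption, since the text does not say which variant of the normal transformation of ranks was used.

```python
from statistics import NormalDist, fmean


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient using population (1/n) moments."""
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = fmean([(a - mean_x) ** 2 for a in x])
    var_y = fmean([(b - mean_y) ** 2 for b in y])
    cov = fmean([(a - mean_x) * (b - mean_y) for a, b in zip(x, y)])
    return 2 * cov / (var_x + var_y + (mean_x - mean_y) ** 2)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def rank_normalize(values):
    """Map values to standard-normal scores via their ranks (assumed offset of 0.5)."""
    n = len(values)
    normal = NormalDist()
    return [normal.inv_cdf((r - 0.5) / n) for r in average_ranks(values)]
```

In use, `lins_ccc` would be applied to the left and right measurements of one feature, and `rank_normalize` to the side-averaged values within a single study, so that outliers no longer dominate the scale of the feature.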

Results

The baseline case and control characteristics of the eligible population are shown in Table 2. The cases had a stronger family history and were more likely to have higher PD compared with controls. Case and control groups were of similar age, BMI, menopause status, and parity. The baseline characteristics of the study population separated by study site are presented in Additional file 1 (Table S1). The NHSII population differs from the other sites in age (lower), premenopausal prevalence, and PD (higher). Within each study site, the baseline characteristics show the same trends between cases and controls as described above.

Table 2 Baseline characteristics of study population matched by age, date of mammogram, and study

The top 15 of 46 analyzed features had a statistically significant (p < 0.05) association with breast cancer after adjustment for age, BMI, family history, PD, and study (Table 3). It should be noted that the features mostly follow the same trend across studies even though some are not significant in their separate OR estimation, and there was no evidence of study heterogeneity for any feature (p > 0.05 for all). Study-specific estimates for SFMR were often not consistent with other studies. In sensitivity analysis, we excluded SFMR to explore the impact of these differences and found similar results (data not shown). Three features with the strongest association were FD_TH_75, Energy, and Entropy. Increases in the FD_TH_75 and Energy feature values were associated with a decreased risk of breast cancer while increasing Entropy was associated with an increased risk of breast cancer. The fractal dimension features were separated into two groups. The first group described the fractal dimensions in the densest pixels, and contained features FD_TH_60, FD_TH_65, FD_TH_70, FD_TH_75, FD_TH_80, FD_TH_85, and FD_Minkowski. All these features were significant and were associated with a decrease in cancer risk with the most significant association OR (95% confidence interval (CI)) per 1 SD = 0.87 (0.79–0.95) for FD_TH_75. The second feature group described fractal dimensions in the lower density (less opaque) pixels: FD_TH_10 and FD_TH_15. In contrast to the first group, they were associated with an increase in breast cancer risk. Energy and Entropy demonstrate opposite associations with cancer, with OR (95% CI) 0.88 (0.81–0.96) and 1.14 (1.05–1.25), respectively. The GLCM features Homogeneity and Dissimilarity showed opposite trends with OR (95% CI) 1.10 (1.01–1.20) and 0.91 (0.83–0.99), respectively. Table 3 also reports the results of area under the curve (AUC) analysis of different feature models. For the baseline model (adjusted for age, BMI, family history, PD, and study), AUC was 0.617 and with the top feature (FD_TH_75) it was 0.621, suggesting a modest increase in discrimination with the addition of this texture feature.
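For readers less familiar with how an OR per 1 SD and its confidence interval follow from a logistic regression coefficient, the arithmetic can be sketched as below. This is a generic illustration assuming a Wald-type interval (the text does not state the interval method); the function names are hypothetical, and the coefficient and standard error in the usage note are back-calculated from the rounded values reported above, not taken from the study's model output.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def odds_ratio_with_ci(beta, se):
    """Odds ratio and Wald 95% CI for a coefficient on a feature scaled to 1 SD."""
    return (
        math.exp(beta),
        math.exp(beta - Z_95 * se),
        math.exp(beta + Z_95 * se),
    )


def coefficient_from_report(odds_ratio, lower, upper):
    """Approximate beta and SE implied by a reported OR and its 95% CI."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    return beta, se
```

Applying `coefficient_from_report(0.87, 0.79, 0.95)` gives a coefficient of roughly −0.14 per SD, and feeding that result back into `odds_ratio_with_ci` reproduces the reported interval after rounding to two decimals.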

Table 3 The top 15 of 46 analyzed features that were significant (p < 0.05) in the case–control models

Figure 1 shows the dendrogram of the clustering of the top 15 features and clinical risk factors (PD, age, BMI) restricted to the control subjects (see Additional file 2: Figure S1 for clustering results restricted to the cases). The features separated into two primary clusters. Within the first cluster, features FD_TH_60 through FD_TH_85 formed a subcluster separate from the other non-feature risk factors. Interestingly, the clinical risk factors (BMI, age, PD) form a subcluster with Kurtosis and Busyness independent of other features. The second main cluster includes pairs of Entropy/Energy, Dissimilarity/Homogeneity, and FD_TH_10/FD_TH_15. The intercorrelation of each feature and risk factor calculated using control subjects is shown in Table 4 (see Additional file 1: Table S2 for intercorrelation calculated using case subjects). Interestingly, PD is highly correlated with features such as FD_TH_75, FD_Minkowski and Kurtosis from the same primary cluster group. However, the features of the second primary cluster show no or negligible association with PD.

Fig. 1

Dendrogram of cluster analysis of the top 15 features with PD, age, and BMI. Similar features cluster together. Percent density groups closely with body mass index (BMI) and age. The figure is restricted to the controls

Table 4 Pearson correlation coefficient for the top 15 significant features

Figure 2 shows representative images with similar densities but different feature values for the FD_TH_75 feature. We selected images with FD_TH_75 values in the top and bottom 20% of values matched by BMI, PD, age, case status, and study. The top row of Fig. 2 has similar low PD (17%) while the bottom row has a relatively high PD (67%). The inner black lines in each breast image delineate the tissue used to describe FD_TH_75. The outer black lines delineate the tissue used to describe FD_TH_15. The top left and bottom left images show FD_TH_75 values in the top 20th percentile while the top right and bottom right images show values in the bottom 20th percentile.

Fig. 2

Representative images with similar densities but different groups: FD_TH_75 values in the top and bottom 20% of values matched by BMI, PD, age, case status, and study. The top row has similar low PD (17%) while the bottom row has a relatively high PD (67%). The inner black lines in each breast image delineate the tissue used to describe FD_TH_75. The outer black lines delineate the tissue used to describe FD_TH_15. The top left and bottom left images show FD_TH_75 values in the top 20th percentile while the top right and bottom right images show values in the bottom 20th percentile

In Table 5, the breast cancer risk associated with DCIS and invasive cancer is shown for the 15 most significant features found overall, adjusted for age, BMI, and PD. While invasive cancers have approximately the same significant features as the all-cancer results in Table 3, DCIS showed a smaller number of significant associations with features. FD_TH_10 and FD_TH_15 were significantly associated with DCIS risk, but not with invasive cancer. Five features were significantly associated with the ER+ cases (Table 5) while no features were significantly associated with ER– status, although power was limited. The patterns of association were similar for risk of DCIS, invasive breast cancer, and ER+ and ER– breast cancer.

Table 5 Risk of either DCIS or invasive cancer associated with each feature

Discussion

The combined results of five separate studies, including 1171 cancer cases and 1659 controls, were used to study the association of mammographic textural features on film-screen mammograms, independent of PD, with breast cancer risk overall and defined by tumor type and ER status. Of the 46 features studied, several candidate features demonstrated an association with breast cancer overall. The addition of individual texture features to the baseline model (adjusted for age, BMI, family history, PD, and study) demonstrated modest increases in the discriminatory ability of the model. The patterns of association were found to be similar for the risk of DCIS, invasive breast cancer, and ER+ and ER– breast cancer, although there were differences in magnitude of the associations between invasive/DCIS, ER+/ER– status cancer subtypes, and specific features. We also found that many mammographic features associated with breast cancer were not correlated with PD, a desirable quality for potentially improving the discrimination of risk-prediction models. Specifically, the GLCM Entropy/Energy and Homogeneity/Dissimilarity, Busyness, FD_TH_15, and FD_TH_10 features may be tested in combination with PD in risk-prediction models.

In previous reports, there have been few examples of texture features that are associated with cancer independent of PD. Torres-Mejia et al. [6] found no significant breast cancer risk association of fractal features after adjusting for PD, and Manduca et al. [8] found that features did not add additional significance when adjusted for PD. We found several fractal dimension features associated with breast cancer risk (FD_TH_5 through FD_TH_85), but the direction of the association depended on the threshold level used to create the line profiles. An example was given of the FD_TH_75 (line profile outlining highly dense tissue) and FD_TH_15 (line profile outlining the edge of the compressed area) in Fig. 2. Thus, the reversal in association from high to low risk is associated with defining fractal characteristics in different types of tissue. Another fractal dimension feature, FD_Minkowski, showed a decreased association with cancer risk similar to FD_TH_75. These measures are closely mathematically related as noted by their clustering in the dendrogram. Unlike other studies, the association of FD_Minkowski feature with breast cancer risk [6] remained significant after adjustment for PD and other risk factors.

Other associated features include the paired features Entropy and Energy as well as Homogeneity and Dissimilarity. Entropy is intuitively expected to be significant for breast cancer risk because tissue with high entropy is more heterogeneous. Energy is associated with a reduced risk of breast cancer because it is related to tissue with more homogeneous texture. The features that denoted more coarseness increased risk and those that were less coarse did not increase risk or were protective. The Pearson correlation coefficients show the features in both pairs are highly negatively correlated. The protective character of Dissimilarity (or Contrast) is not intuitive. We can speculate that finer structure has high contrast and has similar behavior to fractal dimension. Other studies have reported an important role for mammographic textures such as fractal dimensions, GLCM parameters, and the Fourier power spectrum in distinguishing between BRCA1/BRCA2 gene mutations and cancer risks [29, 33]. These results are consistent with the results of our study. The fractal dimension and GLCM features derived in our study also demonstrate a significant association with breast cancer risk. The cause and underlying biology of mammographic feature association to breast cancer risk is complex. The features responsible for increased cancer risk are likely to be a measure of image heterogeneity or a degree of local tissue disorganization. Mammograms visualize breast tissue patterns consisting of epithelial and stromal cells, collagen, and fat. These tissue components communicate and interact with each other. Each component may influence the risk and progression of breast cancer [34]. Entropy was associated with an increased risk of breast cancer and represents a measure of spatial disorder that likely reflects a degree of tissue heterogeneity. It could be associated with processes at the cellular level, where increased entropy has been used as a metaphor for the progressive, irreversible loss of initial order (e.g., by acquiring mutations) in the cell [35]. Another significant feature, FD_TH_75, associated with a decreased risk of breast cancer, is also related to tissue heterogeneity but in the opposite direction. As shown in Fig. 2 (top right and bottom right images), FD_TH_75 values in the bottom 20th percentile represent highly heterogeneous tissue.

Our study had the following limitations. First, many films, especially from the SFMR, were excluded due to temporal inconsistencies with the digitization of cases and controls. Harmonization procedures were needed to rescale the spatial dimensions and dynamic range. Ideally, all images would have been digitized on one digitizer, or been in a native digital format (versus film). We also had few ER– and DCIS cancer subtypes, limiting our power for these subtypes. For example, the FD_TH_10 and FD_TH_15 features look promising to differentiate DCIS from invasive cancer because, even with fewer cases, they showed significance for DCIS and were not significant for invasive cancers. However, the heterogeneity p values to test for differences in effect between DCIS and invasive cancer subgroups were p = 0.09 and p = 0.21 for FD_TH_15 and FD_TH_10, respectively. Finally, film mammography has largely been replaced by full-field digital mammography systems as well as three-dimensional tomosynthesis systems. However, texture features measured using film mammograms have been shown to be in good agreement with those measured using digital mammography systems [36]. For future validation of the proposed texture features, it will be important to add MLO view mammograms, to estimate rotation-invariant measures by averaging GLCM features over the four rotations (0, 45, 90, 135 degrees), and to apply the features to tomosynthesis slices and projections.

Conclusions

We conclude that the description of breast density texture from mammograms shows promise as an independent risk factor for breast cancer and for potentially differentiating between risks of cancer subtypes. For future work, we plan to assess risk prediction combining mammographic density and features assessed on digital mammography and tomosynthesis images.