Background

The incidence and mortality rates of breast cancer remain extremely high despite advances in screening and treatment [1]. In the USA, it is estimated that, in 2016, there will be 246,660 new cases of invasive breast cancer and 40,450 breast cancer deaths [2]. Therefore, better strategies are urgently needed to identify women at high risk for developing breast cancer who could benefit the most from supplemental screening and preventive therapies [3, 4].

Unfortunately, to date, the broadly available risk assessment models cannot identify high-risk women reliably within the general population. Current models predict either the risk of carrying a high-risk genetic mutation such as BRCA1/2 (e.g., Claus model, BOADICEA, and BRCAPRO) or the risk of developing breast cancer over time with or without such a mutation (e.g., Gail model, BOADICEA, Rosner-Colditz model, and Tyrer-Cuzick model) [5]. These models have only modest discriminatory capacity and continuing efforts are needed to improve these models at the individual level [6]. In addition, genetic susceptibility models are only useful in the familial setting (where cancer pedigree history is known) and are not of relevance to the general population where the great majority of women have no relevant family history. Therefore, in striving to tailor breast cancer screening recommendations for the individual woman [7] it is crucial to develop more accurate risk assessment models that can be easily adopted in routine clinical practice.

While mammography remains the cornerstone of early breast cancer detection [8], it also provides a readily accessible method to assess the distribution of fatty and dense, or fibroglandular (stromal and epithelial), tissues in the breast. In x-ray imaging, fatty tissue appears radiographically lucent, or darker, and dense tissue is radio-opaque, or brighter. Mammographic percent density (PD), a measure of the relative amount of fibroglandular tissue within the breast, has been shown to be related to screening sensitivity and specificity and has also been established as a strong independent risk factor for breast cancer [9–12]. Studies have repeatedly shown significant associations with breast cancer risk for both qualitative and quantitative breast density measures and a potential to improve cancer risk assessment models [13, 14]. Recent legislation in several US states mandates notification of breast density [15], and substantial research continues to be devoted to accurate measurement of this key biomarker and to its incorporation into risk prediction models [9, 16].

Compared to the global image measure of breast density, parenchymal texture descriptors can provide more refined, localized descriptors to characterize the complexity as well as the morphological distribution of the breast parenchymal patterns. Breast density measures are generally dichotomous; that is, each area or voxel of breast density measured in the mammogram is compared to a threshold and labeled “dense” or “not dense”, without reflecting the broader range and spatial distribution of the various breast parenchymal elements. Parenchymal textural features have been proposed as imaging markers that could not only identify parenchymal changes associated with breast cancer development [17–19], but also relate to the subtypes and grading of subsequent breast malignancies [20–22]. In addition, there is growing evidence that textural features of the breast parenchyma reflect inherent, independent, biologic risk factors associated with cancer development, and may thus have the potential to augment breast density in assessing an individual woman’s risk of developing cancer [23–25]. Therefore, efforts to incorporate breast parenchymal texture analyses in breast cancer risk assessment have recently also gained substantial momentum.

This article reviews approaches to quantitate mammographic textural features and methods to incorporate these features into breast cancer risk assessment models, focusing primarily on novel computerized approaches. A systematic review of the literature in PubMed was performed to identify all original articles published up to April 2016 that evaluated computational measures of mammographic texture in breast cancer risk assessment. The following keywords were used in combination: “texture” or “parenchymal patterns” or “image features”, “mammography” or “mammogram”, and “breast cancer risk” or “mammographic risk”. To broaden the search, the “related articles” function provided in PubMed was also used, and all articles and citations obtained were reviewed. The references from all the articles identified were also examined for further relevant studies. The last search was conducted on 29 April 2016. Studies not considered relevant to the scope of the review were excluded; other exclusion criteria included: study not published in the English language, full text not available, letter to the editor, and duplicate publication. In the rest of this manuscript, we summarize key methodological details and evaluation results from the 44 research papers identified by the search and discuss future challenges in this promising research field.

Mammographic texture analysis using automatically extracted features

The value of characterizing the mammographic texture of the breast parenchyma in breast cancer risk estimation was originally demonstrated in the pioneering studies of Wolfe [26, 27], Boyd et al. [28–30], Gram et al. [31], and Brisson et al. [32], proposing visually assessed, qualitative or quantitative classifications which were based on the extent and the characteristics of breast densities in a mammogram. These early approaches have been used by several groups, generally reporting elevated risks among women with more complex parenchymal tissue patterns [33–45]. Nevertheless, these studies also observed increased heterogeneity and low reproducibility in corresponding risk estimates due to subjectivity and inter-observer variation in visual appraisal of the mammogram [33–45]. By introducing computerized texture features to automate the characterization of breast parenchymal patterns, later studies addressed the limitations of visual classifications and re-established the potential of texture descriptors in breast cancer risk assessment [46–50]. Since then, this research field has continuously been evolving. A variety of quantitative methodologies have been developed, involving different techniques to sample the breast and multiple texture descriptors to characterize the texture properties of the sampled regions of interest (ROIs) from cranio-caudal (CC) and mediolateral-oblique (MLO) view mammograms (Table 1).

Table 1 Key studies in automated parenchymal texture analysis for breast cancer risk assessment

In most studies, texture analysis has been performed within a single ROI in the breast (Table 1). This single ROI is usually placed in the retroareolar breast area, while, in some cases, it can be a larger region corresponding to the entire breast or to the largest rectangular box inscribed within the breast (Fig. 1a and b). In an attempt to capture the granularity and heterogeneity of the parenchymal texture within the breast, more recent studies have estimated texture in multiple ROIs throughout the breast (Fig. 1c and d). A lattice-based strategy which splits the entire breast into multiple square patches was proposed by Zheng et al. [51, 52], showing that, with respect to single ROI methodologies, this breast sampling technique may improve risk assessment, with performance being maximized when smaller patches (6.3 × 6.3 mm²) are used. Multiple ROIs defined at various scales of breast tissue density were used by Sun et al. [53], where it was shown that fusing features from different density scales may prove to be another effective way to enhance the cancer prediction performance.
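
As an illustration of lattice-based sampling, the following minimal Python sketch tiles an image with square patches, keeps only those lying entirely inside a breast mask, and summarizes each patch with one value. It is not the implementation of Zheng et al. [51, 52]; the function names, the nested-list image representation, the rule of discarding partially masked patches, and the 0.1 mm pixel spacing are all assumptions made here for illustration.

```python
def mm_to_pixels(size_mm, pixel_spacing_mm):
    """Convert a physical patch size to a whole number of pixels."""
    return max(1, round(size_mm / pixel_spacing_mm))


def lattice_patches(image, mask, patch_px):
    """Yield (row, col, patch) for every lattice cell lying fully inside the mask."""
    n_rows, n_cols = len(image), len(image[0])
    for r in range(0, n_rows - patch_px + 1, patch_px):
        for c in range(0, n_cols - patch_px + 1, patch_px):
            rows = range(r, r + patch_px)
            cols = range(c, c + patch_px)
            # Assumption: a patch is kept only if all of its pixels are breast tissue.
            if all(mask[i][j] for i in rows for j in cols):
                yield r, c, [image[i][c:c + patch_px] for i in rows]


def patch_mean(patch):
    """Placeholder texture feature: mean intensity of the patch."""
    values = [v for row in patch for v in row]
    return sum(values) / len(values)


def feature_map(image, mask, patch_px, feature=patch_mean):
    """Map each retained patch origin (row, col) to its feature value."""
    return {(r, c): feature(p) for r, c, p in lattice_patches(image, mask, patch_px)}


# Toy 4 x 4 "mammogram" with one background pixel in the top-right corner.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
mask = [[True] * 4 for _ in range(4)]
mask[0][3] = False
toy_map = feature_map(image, mask, patch_px=2)
```

Any per-patch descriptor can be passed as `feature`; under the assumed 0.1 mm pixel spacing, `mm_to_pixels(6.3, 0.1)` gives a patch side of 63 pixels.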

Fig. 1

Regions of interest (ROIs) used in texture analysis. a single ROIs selected in the retro-areolar breast area, b the entire breast and the largest rectangular box inscribed within the breast, studied as single ROIs, c multiple ROIs at multiple scales of density, and d multiple ROIs defined by a lattice covering the entire breast

The texture descriptors used in breast cancer risk assessment to date can be broadly classified into five feature groups (Table 2), each of which reveals different aspects of the mammographic texture (Fig. 2): 1) gray-level intensity/histogram features [54–56]; 2) co-occurrence (Haralick/Markovian) descriptors [57]; 3) run-length features [58, 59]; 4) structural/pattern measures [46, 60–65]; and 5) multi-resolution/spectral features [53, 61, 64, 66, 67]. Gray-level intensity histogram features are common first-order statistics which describe the distribution of gray-level intensity within the breast tissue. The co-occurrence features also consider the spatial relationships of pixel intensities in different directions and are based on the gray-level co-occurrence matrix (GLCM), which encodes the relative frequency of neighboring intensity values. Run-length features capture the coarseness of texture in specified directions by measuring strings of consecutive pixels (i.e., runs) which have the same gray-level intensity along specific linear orientations. Fine textures tend to contain more short runs with similar gray-level intensities, while coarse textures have longer runs with different gray-level intensities. Structural features capture the architectural composition of the parenchyma by characterizing the tissue complexity, the directionality of flow-like structures in the breast, and intensity variations between central and neighboring pixels. Finally, multi-resolution/spectral features use spatial frequency transforms, such as Fourier, wavelet/Gabor, and the power spectrum, to characterize intrinsic periodic texture structures that repeat over multiple scales.
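
To make the first three feature groups concrete, the sketch below computes one representative descriptor from each on tiny quantized images: histogram skewness, GLCM contrast for horizontally adjacent pixels, and short-run emphasis from horizontal runs. This is an illustrative sketch under simplifying assumptions (integer gray levels, a single direction, a non-symmetric GLCM); the function names are chosen here and do not correspond to any implementation in the reviewed studies.

```python
from collections import Counter


def skewness(pixels):
    """First-order feature: skewness of the gray-level distribution."""
    n = len(pixels)
    mean = sum(pixels) / n
    m2 = sum((p - mean) ** 2 for p in pixels) / n
    m3 = sum((p - mean) ** 3 for p in pixels) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def glcm(image, dr=0, dc=1):
    """Normalized co-occurrence frequencies of gray-level pairs at offset (dr, dc)."""
    counts = Counter()
    n_rows, n_cols = len(image), len(image[0])
    for r in range(n_rows):
        for c in range(n_cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < n_rows and 0 <= c2 < n_cols:
                counts[(image[r][c], image[r2][c2])] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def glcm_contrast(p):
    """Co-occurrence feature: expected squared gray-level difference."""
    return sum(prob * (i - j) ** 2 for (i, j), prob in p.items())


def horizontal_runs(image):
    """Lengths of maximal runs of identical gray levels along each row."""
    runs = []
    for row in image:
        length = 1
        for prev, cur in zip(row, row[1:]):
            if cur == prev:
                length += 1
            else:
                runs.append(length)
                length = 1
        runs.append(length)
    return runs


def short_run_emphasis(runs):
    """Run-length feature: close to 1 when short runs dominate (fine texture)."""
    return sum(1 / (length * length) for length in runs) / len(runs)


fine = [[0, 1, 0, 1], [1, 0, 1, 0]]    # pixel-scale alternation
coarse = [[0, 0, 1, 1], [0, 0, 1, 1]]  # broader blocks
```

On these toy images the fine pattern yields higher contrast and higher short-run emphasis than the coarse one, which is the behavior the run-length description above leads one to expect.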

Table 2 Parenchymal texture descriptors for breast cancer risk assessment; texture descriptors which have been examined in association with breast cancer risk, classified into five feature groups
Fig. 2

Characterization of parenchymal patterns using computerized texture analysis. Examples of feature maps showing the distribution of texture values in the breast, generated by the application of the lattice-based strategy of Zheng et al. [51] to an MLO-view full-field digital mammogram. (a) Grey-level histogram, (b) Co-occurrence, (c) Run-length, (d) Structural, and (e) Multi-resolution

Towards a new breast cancer risk assessment paradigm based on mammographic texture descriptors

The proposed methodologies have been applied primarily to digitized film-screen mammograms and more recently to full-field digital mammograms. Texture descriptors have been evaluated in a few prospective and a larger number of retrospective case–control studies, where their discriminatory capacity in breast cancer prediction was typically assessed in terms of the area under the ROC curve (AUC), measuring their ability to distinguish between cancer cases and controls (Table 3). The potential of mammographic texture in breast cancer risk assessment has also been investigated in studies with BRCA1/2 mutation carriers, where the AUC was evaluated in terms of the performance of the texture features in predicting a woman’s risk of carrying this high-risk genetic mutation. Although hereditary breast cancers account for only 5–10 % of incident breast cancers, women who inherit a mutated form of the BRCA1/2 gene have up to 87 % risk of developing breast cancer by the age of 70 years [68]. As such, and considering that mammographic PD has not been associated with BRCA1/2 mutation status [69–71], the ability of texture to identify potential BRCA1/2 carriers could have important value in risk stratification.
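
For reference, the AUC of a risk score equals the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The short sketch below computes it directly from that definition; the function name and the toy scores are illustrative assumptions, not data from any study reviewed here.

```python
def auc(case_scores, control_scores):
    """AUC as the fraction of case-control pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for s_case in case_scores:
        for s_control in control_scores:
            if s_case > s_control:
                wins += 1.0
            elif s_case == s_control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical texture-based risk scores for three cases and three controls.
toy_auc = auc([0.9, 0.6, 0.4], [0.5, 0.3, 0.4])
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of cases from controls, which is the scale on which the values in Table 3 should be read.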

Table 3 Breast cancer prediction capacity of automated characterization of the parenchymal patterns

Associations of parenchymal texture with breast cancer in case–control studies

Byng et al. [60] were the first to evaluate automatically calculated parenchymal texture descriptors directly as independent risk factors for breast cancer. The authors reported on data from a prospective case–control study using 354 incident cases diagnosed with histologically verified invasive breast carcinoma at least 1 year after their entry in the Canadian National Breast Screening Study, and 354 age-matched controls with at least 7 years of negative follow-up. Two grey-level intensity histogram texture features were estimated in screen-film mammograms; specifically, skewness averaged over individual 6.2 × 6.2 mm² patches in the breast and the fractal dimension estimated by considering the entire breast as a single ROI. For both features, the results showed moderate relative risk (RR) after adjustments for the effects of other risk factors, i.e., age at menarche, menopausal status, age at first time pregnancy, number of live births, family history of breast carcinoma, height, and weight (RR = 3.35 and RR = 3.35 for skewness and fractal dimension, respectively), while no additional contribution to risk was found in models that incorporated breast density measures. Similar conclusions were reported by Torres-Mejia et al. [72] who estimated the same texture features and lacunarity, a measure of the degree of structural variation in image intensities within the breast, from prospectively collected data of 111 breast cancer cases and 3100 controls.
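
The idea behind a fractal dimension can be illustrated with box counting on a binary image: count the boxes of side b that contain any foreground pixel, then take the slope of log(count) against log(1/b). The sketch below is a simplified stand-in and not the estimator used by Byng et al. [60]; binarizing the image, the choice of box sizes, and the function names are assumptions made here for illustration.

```python
import math


def box_count(binary, box):
    """Number of box x box cells containing at least one foreground pixel."""
    n_rows, n_cols = len(binary), len(binary[0])
    count = 0
    for r in range(0, n_rows, box):
        for c in range(0, n_cols, box):
            rows = range(r, min(r + box, n_rows))
            cols = range(c, min(c + box, n_cols))
            if any(binary[i][j] for i in rows for j in cols):
                count += 1
    return count


def box_counting_dimension(binary, boxes=(1, 2, 4, 8)):
    """Least-squares slope of log(count) versus log(1 / box size).

    Assumes the image contains at least one foreground pixel.
    """
    xs = [math.log(1.0 / b) for b in boxes]
    ys = [math.log(box_count(binary, b)) for b in boxes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Two sanity checks: a filled square and a single straight line of pixels.
filled = [[1] * 16 for _ in range(16)]
line = [[1] * 16] + [[0] * 16 for _ in range(15)]
```

The filled square and the single line give slopes of 2 and 1, respectively; intermediate values indicate structure between a line and a fully filled plane.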

The promising results of these early studies were followed by retrospective studies using more complex parenchymal texture descriptors [61, 64, 73–76]. Wei et al. [73] investigated the associations of breast cancer risk with run-length features, using two different implementations of run-length statistics: namely, the conventional approach for calculating the runs of pixels in one direction and an extension for the two-dimensional space [76]. The authors found that the run-length measures calculated in the retroareolar region of the breast could serve as an additional risk factor that could not be explained by established breast cancer risk factors (i.e., age, BMI, family history of breast cancer, and number of previous biopsies) and breast density. A mammographic texture resemblance (MTR) marker based on multi-scale Gaussian features was proposed by Nielsen et al. [61]. This marker demonstrated case–control discriminatory performance of AUC = 0.60–0.63 in two independent cohorts within the Dutch screening program [61] and the Mayo Mammography Health Study [61, 64], while performance was optimized by an aggregate marker combining MTR with density measures (AUC = 0.66). Gaussian derivative features at multiple scales were examined in a cross-sectional study with MLO-view film mammograms of 245 cancer cases and 245 controls from the Nijmegen risk-assessment study [74]. In this work, derivative features were extracted using an anatomically oriented breast coordinate system and, compared to breast PD, demonstrated enhanced breast cancer prediction ability (AUC = 0.63 versus 0.56). Finally, a preliminary study on the dual-tree complex wavelet transform showed that wavelet features alone may have value in risk assessment [75].

In an attempt to identify highly discriminative texture descriptors from multiple feature groups and develop optimal combinations that maximize the case–control classification performance, research groups have also explored comprehensive sets of multi-parametric features reflecting various aspects of mammographic texture [51, 53, 56, 66, 77, 78]. Following an evaluation of more than 1000 co-occurrence, run-length, Laws, wavelet, and Fourier features in prior film mammograms of 246 cases and 522 controls, Manduca et al. [66] identified individual features which, when estimated at a coarse scale of a single ROI covering the entire breast, provided strong prediction for future breast cancer (odds ratio per 1 SD = 1.36–1.50, AUC = 0.61–0.62). In another retrospective study with 864 cancer cases and 418 controls, a three-step variable selection process separated 46 highly discriminative features from a total number of 470 features initially calculated [56]. When fed to multivariable logistic regression models adjusted for established breast cancer risk factors, these features demonstrated an AUC of 0.79 and an odds ratio of 2.88, while the additional inclusion of breast PD did not lead to any further performance improvement.

Promising results from rich feature sets were also recently reported for digital mammograms. In a study with CC-view digital mammograms of 141 cases and 199 controls, a total number of 765 features were computed from ROIs defined at multiple density scales [53]. From these features, an optimal set of 12 features was selected and yielded an AUC of 0.73 in separating the two study subgroups using a support vector machine classifier. Zheng et al. [51] retrospectively analyzed MLO-view digital mammograms of 106 cases and 318 controls, where 30 candidate features were extracted from multiple adjacent ROIs covering the entire parenchyma. The authors showed a collective discriminatory capacity of AUC = 0.85, with the fractal dimension, run-length, co-occurrence, and gray-level histogram features being more frequently selected than local binary and edge-enhancing index features in classification models. Furthermore, preliminary comparisons of the parenchymal patterns of estrogen-receptor positive (ER+) and negative (ER–) cancer cases measured with the same methodology [51] showed that subtype-specific breast cancer risk assessment based on mammographic textures may also be feasible [79]. Finally, to assess the combined discriminatory ability of texture analysis in CC and MLO views, Tan et al. [78] designed an artificial neural network model to fuse the features extracted from the two views. Following an evaluation of 79 features calculated from a single ROI, corresponding to either the entire breast or the dense tissue areas of the breast, on 430 cases and 440 controls, the highest performance of the proposed fusion model (AUC = 0.73) was obtained for the run-length features of the dense tissue. The authors also demonstrated a classification performance of similar magnitude (AUC = 0.71) for the same fusion model when applied on texture features of the entire breast for a larger dataset of 821 cancer cases and 1084 controls [77].

Assessing the risk of carrying a high-risk gene mutation

The potential of mammographic texture in breast cancer risk assessment has also been demonstrated in studies with BRCA1/2 carriers, where texture features from a single 25.6 × 25.6 mm² retro-areolar ROI in CC mammographic views were shown to predict a woman’s risk of carrying this high-risk genetic mutation. The first study addressing this topic extracted a comprehensive feature set of grey-level intensity statistics, co-occurrence features, and multi-scale texture measures based on Fourier transform analysis [80]. In film mammograms of 30 BRCA1/2 carriers and 142 low-risk women, most features demonstrated high individual discriminatory capacity (AUC > 0.68), while the collective performance of the features that were deemed significant in multivariable models reached AUC values of 0.91 and 0.92 in the entire database and in an age-matched subgroup, respectively [55]. Using the same image dataset, the authors also showed a promising individual classification performance for structural measures such as edge frequency (AUC = 0.78) [81], for different implementations of the fractal dimension (AUC = 0.74–0.93) [81, 82], and for power law spectral analysis (AUC = 0.90) [83].

These results were recently replicated and validated in datasets with digital mammograms [71, 84] and larger numbers of high-risk women [71, 84, 85]. A similar design of texture analysis in the retroareolar breast region combined with a Bayesian Artificial Neural Network (BANN) for the classification task was applied to 1) film mammograms of 137 mutation carriers and 100 low-risk women [85], and 2) digital mammograms of 53 mutation carriers, 75 women with unilateral cancer, and 328 low-risk women [71, 84]. The first analysis conferred a two-fold increase in the odds of predicting BRCA1/2 mutation status, and an AUC of 0.68 for texture features alone and 0.72 for the features plus breast PD [85]. In the second analysis, AUC values of 0.82 and 0.73 were obtained between mutation carriers and low-risk women, and between unilateral cancer and low-risk women, respectively [84]; these evaluation results were also retained in age-matched subgroup analysis (0.81 and 0.70, respectively) [84] without any significant improvement from the inclusion of breast PD (0.81 and 0.68, respectively) [71].

Beyond established risk factors in breast cancer risk assessment

A comparison of the evaluation results published to date (Table 3), focusing primarily on cross-validated experiments, suggests that more comprehensive sets of multi-parametric texture features [51, 53, 54, 56, 71, 77, 84] may be more effective in predicting breast cancer than a single feature group. However, the literature lacks extensive comparative studies on the same datasets, and generalizations should therefore be made with caution. While the implementation of texture analysis, including both the location and size of ROIs [51, 52, 54] and the specific texture measures, appears to have an effect on texture classification performance, all studies have consistently shown the highly promising, independent role of automated texture analysis in breast cancer risk assessment. Specifically, parenchymal texture descriptors have demonstrated a strong cross-validated ability in predicting both risk for breast cancer (0.58 ≤ AUC ≤ 0.85) and BRCA1/2 mutation status (0.53 ≤ AUC ≤ 0.93). Moreover, texture performance has been shown to be either comparable to or significantly higher than the performance of breast PD (0.51 ≤ AUC ≤ 0.62 and 0.53 ≤ AUC ≤ 0.59, respectively), as reported in studies where texture and density measures were comparatively evaluated on the same datasets [51, 56, 61, 66, 71, 73–75, 85].

In addition, a number of related findings suggest that texture analysis is able to provide complementary information about a woman’s risk of developing breast cancer which cannot be captured by breast PD and other established risk factors. Texture descriptors have been weakly or moderately correlated with breast PD [55, 61, 64, 71–73, 85–87], and weakly correlated with risk factors as reflected in the Gail and Claus risk scores [80, 86]. Moreover, texture descriptors deemed as strong predictors of breast cancer retained significance when breast PD, age, BMI, family history of breast cancer, parity, age at first term pregnancy, number of previous breast biopsies, menopause, and hormonal use, all shown to be associated with breast cancer risk, were simultaneously considered in classification models (Table 3). Finally, with age-matched datasets or model adjustments for age, most studies evaluating the capacity of parenchymal texture features in risk assessment have ruled out possible confounding due to differences in age, a major breast cancer risk factor, thereby showing a strong potential for computerized texture descriptors in augmenting breast cancer risk assessment.

Future directions

Moving forward, experiments evaluating the relative performance of different implementations of texture analysis, using the same evaluation methodology (i.e., dataset and classification model), are necessary to develop more robust and reproducible quantitative mammographic phenotypes of breast cancer risk. Future studies to test the incremental value added by computerized textural measures in predicting breast cancer will require: (a) the design of large age-matched datasets; (b) the selection of an effective classification model, where different previously used models (Table 3) could be comparatively examined; (c) model adjustments to rule out possible confounding due to differences in major risk factors; and (d) validation of the classification performance in independent datasets.

In an attempt to add anatomical meaning to texture analysis, which may also give additional discrimination power to feature classification, increasing attention is currently given to the incorporation of breast anatomy in texture analysis pipelines. Brandt et al. [74] first introduced an anatomically oriented breast coordinate system which allows for anatomical correspondences across mammograms of the same woman or different women. In preliminary analyses using the proposed coordinate system, the authors have demonstrated that anatomy-driven Gaussian derivative features are able to (a) effectively separate cancer cases and controls [74], (b) quantify the effect of hormone replacement therapy as a change in the breast parenchymal patterns [88], and (c) demonstrate specific regions of the breast parenchyma where breast cancer risk is mainly expressed [89]. More recently, Gastounioti et al. [90–92] showed that the discriminatory capacity of texture descriptors is further enhanced by an anatomy-driven polar grid for anatomical breast sampling and a breast-anatomy-weighted texture signature which considers the spatial position and the underlying tissue composition of individual ROIs to summarize the parenchymal texture properties of the breast.

Another emerging technology is deep learning [93], which may prove a valuable addition in texture analysis for breast cancer risk assessment [94–96]. Deep learning involves automated learning, from raw image data, of hierarchical representations useful for pattern detection and classification, in a supervised mode via neural networks with multiple hidden layers or in an unsupervised mode via autoencoders. The few available studies which have applied deep learning in the particular field show a promising role in risk scoring (AUC = 0.61–0.65) [94, 96]. Further, preliminary comparisons against two previously presented methodologies with handcrafted texture features [56, 64] suggest that it may be better to “let the data speak” instead of modeling prior assumptions [94]. Additional experimentation with deep learning, as well as future comparisons with the state-of-the-art texture analysis techniques, is warranted to better explore the potential of this novel technology.

Digital breast tomosynthesis (DBT), an emerging x-ray technology [97] in which quasi three-dimensional (3D) images are reconstructed from a limited number of low-dose x-ray source projections [98, 99], is increasingly being implemented clinically due to improvements in sensitivity and specificity compared to imaging with digital mammography alone [100]. By imaging the breast in 3D, DBT alleviates the effect of tissue superimposition, offering superior tissue visualization, which in turn may allow for better characterization of the breast parenchyma compared to two-dimensional mammography [86, 87]. The extension of the parenchymal texture analysis descriptors for volumetric texture analysis in DBT is, therefore, an important future challenge towards developing superior texture features which can optimize image-driven breast cancer risk assessment.

Another challenging future step which would establish the predictive value of texture analysis is the validation of parenchymal texture measures in prospectively collected data. Large-scale studies involving multiple screening centers, imaging machines, and image acquisition settings are also of major importance towards validating their predictive capacity and robustness to heterogeneous image data [101]. Furthermore, the literature lacks large-scale longitudinal studies monitoring changes in automated parenchymal texture descriptors over successive mammograms, which could elucidate the mechanisms of breast cancer development [11] and the causal relations between the texture risk scoring and breast cancer [102, 103]. Finally, crucial questions to be addressed in such rich datasets are the causes of inter-woman variation in mammographic parenchymal patterns [104, 105] and the relation of texture risk markers to the subsequent location and grading of tumors, disease mortality, and treatment effects [20, 21].

The valuable risk markers provided by parenchymal texture analysis could also leverage the relatively new, yet promising, paradigms of radiomics [106] and radiogenomics [107] for breast cancer, aiming to convert breast images into comprehensive measurable data and to delve into the interaction between these data and genetic variants. These novel approaches may pave the way to revealing correlations with the genomic diversity present in breast cancer, understanding how biological processes are reflected in quantitative breast imaging phenotypes, and defining novel clinical biomarkers or biological surrogates [108–111], thus improving personalized breast cancer screening, monitoring, and treatment selection.

Conclusions

Automated breast parenchymal texture analysis has the potential to elucidate imaging phenotypes of breast cancer risk, which is valuable in accelerating the translation of individualized risk stratification into routine breast cancer screening and prevention strategies. Future work addressing technical challenges in this field and large prospective studies are expected to further enhance and establish the predictive value of parenchymal texture measures for inclusion in breast cancer risk assessment models in clinical practice.