Introduction

Accurate estimation of breast cancer risk could enable the identification of high-risk women who might be most likely to benefit from specific interventions while allowing low-risk women to safely avoid unnecessary screening and procedures. Successful application of this strategy could help to reduce breast cancer mortality and limit false-positive screening tests.

Breast cancer risk models are generally useful to predict the number of cancers that will develop in populations but cannot identify which individuals will develop cancer [1•]. Thus, risk models are useful to determine sample sizes needed for adequate statistical power in clinical trials but have less value for patient management. Accordingly, discovering new breast cancer risk factors and refining the assessment of established ones is important for improving risk assessment. In this review, we discuss four emerging topics in breast cancer risk prediction: 1) etiological heterogeneity; 2) genetic susceptibility; 3) mammographic breast density; and 4) molecular histology: involution of normal breast tissue (Table 1).

Table 1 Emerging topics in breast cancer risk prediction: Evidence and Opportunities

Etiological Heterogeneity

Breast cancer can be classified into distinctive, clinically relevant molecular subtypes based on mRNA profiling [2–4, 5••]. Detailed molecular characterization of breast cancer has revealed increasing biological diversity, which has been matched by a growing recognition that risk factor associations vary by tumor subtype.

Molecular epidemiological studies using immunohistochemistry for tumor subtyping show that reproductive risk factors are more strongly linked to estrogen receptor (ER)-positive or progesterone receptor (PR)-positive cancers than to receptor negative tumors [6, 7]. These relationships are compatible with the probable importance of cumulative exposure to sex-steroid hormones in the pathogenesis of ER-positive breast cancer [8].

In contrast to cancers that are ER-positive/PR-positive and human epidermal growth factor receptor 2 (HER2)-negative (“luminal” molecular subtype), triple-negative (TN) breast cancers are less strongly related to reproductive factors [9•]. In particular, basal-like cancers (a subset of TN tumors) are clinically aggressive and associated with early onset, African American race, and BRCA1 germline mutations [10–13]. Data suggest that basal-like cancers may underlie many of the differences in risk factor associations between ER-positive and ER-negative cancers [9•].

Nulliparity

Nulliparity is associated with increased breast cancer risk overall but does not seem to increase risk of TN tumors [7, 9•] and may even be protective [10, 14•, 15]. In a pooled analysis, including up to 35,568 cases within the Breast Cancer Association Consortium (BCAC), women diagnosed with ER-negative breast cancers were less likely to be nulliparous than women diagnosed with ER-positive tumors (P = 3 × 10⁻⁶), and the frequency of nulliparity was lowest among TN tumors (13 % among women with TN cancers vs. 17 % among women with luminal cancers). In a subset of 12 population-based BCAC studies, nulliparity was associated with increased risk of luminal tumors but not TN cancers [9•]. Similarly, nulliparity was not associated with risk for TN tumors in another population-based, case-control study not included within the BCAC analysis [16•]. Moreover, nulliparity was associated with decreased risk of TN cancer in the Women’s Health Initiative (WHI) cohort [14•] and decreased risk of ER-negative/PR-negative breast cancer in the Black Women’s Health Study [15] and the Carolina Breast Cancer Study [10], which included a high percentage of African American women. In addition, increasing age at first birth, which is an established breast cancer risk factor, was not associated with increased risk for TN tumors in several studies [9•, 14•, 16•].

Breastfeeding

Breastfeeding for long durations has been associated with a modest reduction in breast cancer risk in two meta-analyses [17, 18], one of which included prospective data [17]. However, some studies have found that breastfeeding is associated specifically with a substantial risk reduction for ER-negative/PR-negative or basal-like tumors but not for ER-positive cancers [10, 15], whereas another did not find significant protection for any tumor type [14•]. In addition, data suggest that breastfeeding eliminates the association between high parity and increased risk of ER-negative/PR-negative or basal-like breast cancers [10, 15, 19]. Limited data also suggest that breastfeeding may be particularly protective for women with a family history of breast cancer [20] and carriers of germline BRCA1 mutations [21].

Age at Menarche

Most investigations suggest that early age at menarche is more strongly associated with increased risk of hormone receptor-positive than receptor-negative cancers [6, 7, 9•, 14•], with one investigation finding the strongest relationship with PR status [9•]. However, data suggest that the frequency of early age at menarche does not differ significantly between women with TN tumors and women with luminal tumors [9•].

Obesity

Premenopausal obesity is protective against breast cancer, whereas postmenopausal obesity is associated with increased risk [22, 23]. Obesity may produce several potentially procarcinogenic effects, related to sex-steroid hormones, growth factors, and inflammation [24]. In one report, waist-to-hip ratio (WHR), a measure of central obesity, was related to increased risk for TN tumors among both premenopausal women (odds ratio (OR) = 1.8; 95 % confidence interval (CI) = 1.0, 3.4; P trend = 0.07) and postmenopausal women (OR = 2.7; 95 % CI = 1.3, 5.4; P trend = 0.006) [10]. Obesity has been related to increased risk for TN tumors in two studies of predominantly white women [19, 25], and a similar suggestion was found in the pooled BCAC analysis [9•].

Other Risk Factors

Menopausal hormone therapy (MHT) use has been associated with increased risk of ER-positive but not ER-negative tumors [26–28], whereas reported associations between oral contraceptive use and breast cancer subtypes are more variable [14•, 16•, 29]. A positive family history increases risk, irrespective of ER status [9•, 30], although risks may be greatest for basal-like cancers [9•].

Etiological Heterogeneity: Risk Prediction and Future Directions

Data suggest that the Gail Model provides more accurate risk prediction of ER-positive breast cancer than ER-negative tumors [31]. The development of a risk model for ER-positive breast cancer could enable more specific identification of women most likely to benefit from chemoprevention with endocrine agents, which may only reduce risk for hormone-dependent tumors. By analogy, models could be developed to identify women at high-risk for other specific molecular subtypes of breast cancer that could be prevented with targeted interventions. However, implementation of multiple risk models would be complex compared with the use of an omnibus risk prediction model for all tumor types.

Genetic Susceptibility

Women who have a family history of breast cancer are personally at increased risk, suggesting the importance of genetic factors in the etiology of the disease [32]. Furthermore, risk varies with the number of affected relatives, the closeness of their relationship (i.e., first or second degree), and the relatives’ ages at diagnosis [32]. Risk is similar for women with an affected maternal relative and those with an affected paternal relative. A meta-analysis of 74 studies showed that women who have a first-degree relative with breast cancer are at approximately 2-fold increased risk, whereas those with an affected second-degree relative are at approximately 1.5-fold increased risk [32]. Having a relative diagnosed by age 40 years increases risk fivefold, whereas having a relative diagnosed at age 60 years or older increases risk by 40 %. In contrast to a positive family history, a negative family history provides limited risk information, because 80 % to 90 % of breast cancers occur among women without an affected close relative. Increasingly, genetic research has focused on identifying specific markers of risk, including uncommon variants conferring large or moderate risk and common variants conferring small increases in risk.

Genetic Risk Factors for Breast Cancer: Genes with High or Moderate Penetrance

An estimated 57 % of women with BRCA1 mutations and 49 % of those with BRCA2 mutations will develop breast cancer by age 70 years [33]. Cancers among BRCA2 mutation carriers differ from those among BRCA1 carriers in that they more closely resemble the range of cancer subtypes that occur in the general population (i.e., predominantly ER-positive) and are diagnosed at older ages. Other high-penetrance mutations linked to elevated breast cancer risk are discussed elsewhere, including TP53 in Li-Fraumeni syndrome [34], PTEN in Cowden syndrome [35, 36], and STK11/LKB1 in Peutz-Jeghers syndrome [37]. Mutations in CDH1 (which encodes E-cadherin) have been linked to increased risk, especially for lobular cancers [38].

Mutations with moderate penetrance (ATM, CHEK2, BRIP1, and PALB2) [39–42] confer a twofold to fourfold increase in breast cancer risk and are quite rare in the general population. CHEK2 mutations may confer susceptibility specifically to luminal breast cancer subtypes, including tumors with lobular histology [43–45]. Clear links between mutations in ATM, PALB2, and BRIP1 and breast cancers with specific molecular or pathologic characteristics have not been established. Efforts to identify additional moderately penetrant susceptibility mutations are ongoing.

Common Variants of Low Penetrance

A prior report based on 13 common low-penetrance genetic variants suggested that they explained approximately 8.3 % of familial risk compared with 5 % for moderately penetrant genes and 22 % for highly penetrant mutations [46•]. Genome-wide association studies (GWAS) have identified more than 30 common low-penetrance variants as of this writing, the majority of which are single nucleotide polymorphism (SNP) markers, which confer relative risks of breast cancer ranging from 1.04 to 1.4 per allele. SNPs have been associated with important breast cancer features, such as ER, PR or HER2 status, grade, and histology [47–54].

Genetic Susceptibility: Risk Prediction and Future Directions

Analyses suggest that adding known SNPs to breast cancer risk models produces only modest improvements in risk prediction [55•]. In addition, strong gene-gene or gene-environment interactions that would identify subsets of high-risk women have not been found [56–58], nor have variants been strongly linked to breast cancer outcomes [59]. Research on the functional consequences of SNPs on carcinogenic processes is in its infancy, and the identification of SNPs in noncoding regions of the genome raises intriguing questions about the mechanisms that mediate the risk associated with these variants. Identification of variants related to ER-negative breast cancers and early onset tumors is a research priority. A web-based computer program for assessing risk among patients with a family history of breast and/or ovarian cancer has been developed (BOADICEA) [60]. Finally, a polygenic model that combines multiple common susceptibility variants to predict risk has been developed, which could inform future public health recommendations about screening women based on level of risk [54, 61, 62].
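To make the idea of combining common variants concrete, the following minimal sketch multiplies per-allele relative risks under a log-additive model with no gene-gene interaction. This is an illustrative assumption, not the published polygenic model: the function name, the four per-allele values (chosen only to fall within the 1.04 to 1.4 range reported for common variants), and the allele counts are all hypothetical, and the result is expressed relative to a woman carrying no risk alleles at these markers rather than relative to the population average.

```python
import math


def polygenic_relative_risk(per_allele_rr, risk_allele_counts):
    """Combine SNPs under an assumed log-additive (multiplicative) model.

    per_allele_rr: per-allele relative risk for each SNP (hypothetical values).
    risk_allele_counts: number of risk alleles carried at each SNP (0, 1, or 2).
    Returns the risk relative to a woman with zero risk alleles at every SNP.
    """
    if len(per_allele_rr) != len(risk_allele_counts):
        raise ValueError("one allele count is needed per SNP")
    log_score = 0.0
    for rr, count in zip(per_allele_rr, risk_allele_counts):
        if count not in (0, 1, 2):
            raise ValueError("allele counts must be 0, 1, or 2")
        # Each risk allele adds log(rr) on the log scale.
        log_score += count * math.log(rr)
    return math.exp(log_score)


# Hypothetical per-allele relative risks, not estimates for any real SNP.
ILLUSTRATIVE_RR = [1.04, 1.10, 1.25, 1.40]

low = polygenic_relative_risk(ILLUSTRATIVE_RR, [0, 0, 0, 0])
high = polygenic_relative_risk(ILLUSTRATIVE_RR, [2, 2, 2, 2])
print(f"no risk alleles: {low:.2f}; two risk alleles at every SNP: {high:.2f}")
```

Because few women sit at either extreme of the allele-count distribution, the spread of risk across most of the population under such a model is much narrower than the spread between these two profiles, which is consistent with the modest gains in prediction noted above.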

Mammographic Breast Density

Mammographic breast density (MBD) reflects the tissue composition of the breast: high MBD corresponds to a greater percentage of fibroglandular tissue relative to fat, whereas low MBD is related to a higher percentage of fat relative to nonfatty tissue. Although high MBD is related to some breast cancer risk factors, such as nulliparity, a positive family history, and MHT use, many cross-sectional and prospective studies have consistently demonstrated that high MBD is a strong and independent breast cancer risk factor, conferring relative risks (RRs) of fourfold to fivefold when comparing women with highest to lowest MBD (reviewed in [63]). Dense areas on a mammogram, which appear white (Fig. 1b), may mask tumors, leading to delayed detection; however, high MBD is related to long-term prospective increases in tumor incidence, independent of its effects on detection [64]. Given that elevated MBD is the strongest risk factor for nonfamilial breast cancer apart from age and gender [65, 66], and that many women have high MBD [66], assessment of MBD represents a potentially useful breast cancer risk assessment tool.

Fig. 1

Digitized mammograms from NCI Polish Breast Cancer Study participants; panels a and b show breasts of low and high mammographic breast density (MBD), respectively.

Little is known about the biology of high MBD and why it is related to breast cancer risk. MBD has a substantial heritable component; it is estimated that approximately two-thirds of the variance in density is genetically determined [67], suggesting that density acts at early stages in carcinogenesis. As described by Boyd et al. [68], density at young ages may be a key risk marker, because it reflects the number of undifferentiated cells that are vulnerable to carcinogenic insults before the differentiating effects of pregnancy and age-related involution.

Methods for Assessing Mammographic Breast Density

Methods for assessing MBD have become increasingly quantitative, reproducible, and automated. MBD can be estimated visually as the extent of dense tissue in mammograms (e.g., Wolfe’s parenchymal patterns and the American College of Radiology Breast Imaging-Reporting and Data System (BI-RADS)) or quantitatively as an absolute area or as a percentage of total breast area using planimetry or computer-assisted methods, which are more reliable. Technological advances enable measurement of density as a volume (rather than an area) and employ methods such as magnetic resonance imaging (MRI) and ultrasound (reviewed in [69]).
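The two area-based quantities mentioned above, absolute dense area and percent density, can be illustrated with a minimal sketch. It assumes a simple grey-level threshold separates dense from nondense pixels within a previously outlined breast region; the function name, the threshold, and the toy image are hypothetical and do not correspond to any specific planimetry or computer-assisted tool.

```python
def percent_density(pixels, breast_mask, dense_threshold):
    """Return (dense_area, breast_area, percent_dense) in pixel units.

    pixels: 2D list of grey levels from a digitized mammogram.
    breast_mask: 2D list of booleans, True where a pixel lies inside the breast.
    dense_threshold: grey level at or above which a breast pixel is counted
        as dense (an operator-chosen value in this sketch).
    """
    breast_area = 0
    dense_area = 0
    for pixel_row, mask_row in zip(pixels, breast_mask):
        for value, in_breast in zip(pixel_row, mask_row):
            if in_breast:
                breast_area += 1
                if value >= dense_threshold:
                    dense_area += 1
    if breast_area == 0:
        raise ValueError("breast mask is empty")
    return dense_area, breast_area, 100.0 * dense_area / breast_area


# Toy 3 x 3 image: the first column is background outside the breast.
toy_pixels = [[0, 40, 200],
              [0, 90, 220],
              [0, 60, 180]]
toy_mask = [[False, True, True],
            [False, True, True],
            [False, True, True]]

dense, total, pct = percent_density(toy_pixels, toy_mask, dense_threshold=150)
print(f"dense area = {dense} px, breast area = {total} px, percent density = {pct:.1f} %")
```

Both measures depend on where the breast outline and the threshold are placed, which is one reason that reproducibility differs between visual, semi-automated, and fully automated approaches.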

Clinical Relevance of Mammographic Breast Density

High MBD is clinically important because women with dense breasts are more likely to develop interval cancers (undetected by screening mammography). Furthermore, most studies have found that high MBD increases risk for both ER-positive and ER-negative cancers [70–73]. MBD and its associations with risk also do not appear to differ between carriers of high-penetrance mutations, such as BRCA1/2, and noncarriers [74, 75]. Therefore, understanding the mechanisms that mediate the risk related to high MBD may increase our knowledge about etiological factors that contribute to most subtypes of breast cancer and could facilitate the development of prevention strategies with broad impact.

High MBD has been linked to increased risk of breast cancer recurrence (reviewed in [76]). In addition, MBD may reflect the breast cancer risk associated with use of exogenous hormones. Specifically, limited data suggest that women whose MBD declines after receiving endocrine agents for chemoprevention [77] or adjuvant treatment [78] are more likely to benefit from these medications than those whose MBD does not fall. In contrast, elevated MBD in the context of MHT or oral contraceptive use may be associated with increased breast cancer risk [79]. Thus, MBD is potentially a strong, modifiable “biosensor” of breast cancer risk, which may have utility in multiple populations and in different clinical settings.

Mammographic Breast Density: Risk Prediction and Future Directions

Several studies [80–83] have suggested that adding MBD to the Gail model may improve breast cancer risk prediction modestly, and efforts to incorporate MBD in newer risk models are ongoing [84]. Of those studies incorporating MBD into the Gail model, three [80, 81, 83] evaluated the addition of BI-RADS density categories and the fourth [82] used a quantitative measure of MBD as assessed by planimetry. These results show modest but consistent improvements in risk prediction by adding MBD to existing risk prediction models [80–82]. However, the potential gains in risk prediction that might be realized by using automated, quantitative measures of density obtained through full-field digital mammography or other emerging technologies have not been fully explored. In addition, data related to MBD and breast cancer risk in non-white populations are limited. Finally, elevated MBD may produce its strongest effect among young women who are below the age of initiation of mammographic screening but who might benefit from preventive interventions [66]. Evaluating density without exposing young women to ionizing radiation is critical, but such approaches have not yet been implemented in clinical practice.

Subjective visual assessment of MBD often is performed clinically, but this approach has only moderate interrater reliability [85, 86] and assigns most women to two of four possible categories, thereby providing limited risk discrimination. Measurement of MBD is limited by properties of mammography, including: 1) two-dimensional representation; 2) need for compression, which distorts tissue architecture and varies between examinations; and 3) ionizing radiation exposure, which poses a cancer risk.

It is hoped that more accurate, precise measurement of MBD is achievable through technological advances, which will increase its clinical utility. Methods for measuring breast density as a volume and in specific regions of the breast using digital mammography with density phantoms [87] and other breast imaging modalities are rapidly evolving [69]. Nonionizing technologies, such as MRI and ultrasound tomography, may be ideally suited to assess volumetric density in young or high-risk women or in situations where it is desirable to perform more frequent measurements [88•]. These evolving technologies also may offer further opportunities to increase accuracy in measurement and to identify stronger risk associations.

Molecular Histology: Involution of Normal Breast Tissue

“Molecular histology” may be defined as the sum total of all microscopic and molecular characteristics of the breast [89]. With aging, the breast undergoes dramatic structural and compositional changes, including a reduction in epithelium, followed successively by stromal and then adipose tissue replacement [90–92]. Recent studies suggest that analysis of terminal duct lobular units (TDLUs), the benign structures from which most breast cancers arise, may be useful for breast cancer risk prediction [92–94]. TDLU involution begins before menopause, progresses with aging, and varies substantially among women, reflecting differences in reproductive history and other factors [90, 91, 93]. Thus, TDLU involution could represent a global measure of risk, which reflects the interaction of genetic and environmental risk factors over time.

The number of TDLUs in the breast, like MBD, declines with aging, although breast cancer risk increases [95]. As has been proposed for MBD [88•], this apparent paradox may be reconciled by viewing these factors as proxies for exposure to carcinogenic influences, culminating during critical periods of heightened susceptibility to malignant transformation, such as before a first birth. From this perspective, women with less TDLU involution are postulated to have had more at-risk epithelium over their lifetimes or during vulnerable periods.

TDLU involution may be viewed as a reduction in the number of TDLUs or simplification of the structure of individual TDLUs, manifested as a shorter diameter or a reduced number of acini (functional subunits of TDLUs). A retrospective analysis of 8,736 benign breast biopsies found that TDLU involution was absent in 18.6 %, partial in 59.5 %, and complete in 21.9 % [93]. The percentage of biopsies with complete TDLU involution increased with age, reaching 53.1 % among women aged 70 years or older. Having given birth, a strong family history of breast cancer, use of menopausal hormones, or having proliferative breast disease was associated with less TDLU involution. Preliminary analyses suggest that many of these relationships hold in normal breast tissues donated by volunteers (unpublished observation).

Compared with population-based rates, Milanese et al. reported that women with TDLU involution categorized as “none” had increased breast cancer risk (RR = 1.88; 95 % CI = 1.59, 2.21), as did women with “partial” involution (RR = 1.47; 95 % CI = 1.33, 1.61); risk for women with “complete” involution was close to unity [93]. Levels of TDLU involution stratified women’s breast cancer risk irrespective of other factors, including family history, parity, age at first birth biopsy and the presence of hyperplasia or atypical hyperplasia. Similar but less significant relationships with risk were reported in a smaller case-control comparison nested within another cohort [94]. Data indicate that levels of TDLU involution are similar throughout both breasts [96], suggesting that TDLU involution represents a global marker of risk. Breasts that contain more TDLUs (less involution) are associated with high MBD [95].
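For readers less familiar with ratio measures, the relative risks quoted above can be checked for internal consistency. The sketch below assumes, as is conventional but not stated in the source, that each 95 % CI was constructed symmetrically on the log scale (a Wald-type interval with z = 1.96); under that assumption the geometric midpoint of the limits should reproduce the point estimate, and the width of the interval yields the standard error of the log relative risk. The function name is hypothetical.

```python
import math


def log_scale_summary(lower, upper, z=1.96):
    """Return (midpoint_estimate, se_log_ratio) implied by a ratio CI.

    Assumes the interval is symmetric on the log scale, i.e.
    exp(log(estimate) +/- z * se).
    """
    if not 0 < lower < upper:
        raise ValueError("limits must satisfy 0 < lower < upper")
    log_lower, log_upper = math.log(lower), math.log(upper)
    midpoint = math.exp((log_lower + log_upper) / 2)
    se_log_ratio = (log_upper - log_lower) / (2 * z)
    return midpoint, se_log_ratio


# Values quoted in the text: category -> (reported RR, lower limit, upper limit).
REPORTED = {"none": (1.88, 1.59, 2.21), "partial": (1.47, 1.33, 1.61)}

summaries = {}
for category, (estimate, lower, upper) in REPORTED.items():
    midpoint, se = log_scale_summary(lower, upper)
    summaries[category] = (midpoint, se)
    print(f"{category}: reported RR {estimate:.2f}, "
          f"log-scale midpoint {midpoint:.2f}, SE of log RR {se:.3f}")
```

The narrower interval for the "partial" category corresponds to a smaller standard error, as would be expected if that category contained more women.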

Molecular Histology: Risk Prediction and Future Directions

The degree of TDLU involution is generally inversely related to factors that increase breast cancer risk. However, these associations raise questions as to whether TDLU involution represents an independent risk marker. In one analysis, both MBD and TDLU involution were significantly associated with breast cancer risk after mutual adjustment. Compared with women who had low MBD and complete involution, women who had high MBD and no involution had an RR of 4.08 (95 % CI = 1.72, 9.68) [97•]. Limited data also suggest that TDLU involution levels in tissues surrounding breast cancers may represent a marker of etiological heterogeneity. Specifically, TDLUs associated with basal-like cancers showed significantly less involution than those surrounding luminal cancers in age-adjusted analyses [98].

Molecular histology generally, and TDLU analysis specifically, may offer an objective method for breast cancer risk assessment that reflects cumulative effects of exposures on the target organ. The approach bypasses concerns related to imperfect recall of medical history and may reflect important effects of unknown risk factors and interactions among factors, both genetic and environmental [89]. Despite subjectivity, data suggest that visual and computer-assisted assessment of TDLU characteristics yield reproducible results [98, 99] (and unpublished data). A limitation of TDLU analysis is that it requires access to tissue; however, breast biopsies are common and precede a cancer diagnosis among many women. Another significant challenge is that when TDLUs are not identified in a biopsy of limited dimensions, it is difficult to judge whether this represents complete involution (i.e., most TDLUs have been replaced by stroma throughout the breast) or sampling of a nonrepresentative area.

Improving criteria for visually categorizing TDLU involution and developing image analysis tools for quantifying TDLU metrics may increase the utility of this approach for risk prediction [99]. It is unclear which metrics of TDLU involution best predict breast cancer risk. Given that increased expression of ER in benign epithelium has been associated with elevated breast cancer risk [100], it is hoped that molecular analysis of TDLUs might be useful for risk prediction, if valid techniques can be developed.

Conclusions

Developing improved approaches for breast cancer risk assessment offers a potential means of targeting interventions to women most likely to benefit, which could lead to reduced mortality, lowered costs, and more efficient screening. We explore four possible approaches for achieving that objective: accounting for etiological heterogeneity, testing for genetic susceptibility factors, measuring MBD, and assessing TDLU involution. These topics reflect the overall complexity of breast cancer risk prediction, given the enormous diversity among women and the breast cancers that they develop. Future research will reveal whether understanding the heterogeneity of this complex disease represents a gateway to progress or a challenge to clinical translation.