Introduction

Risk factors for breast cancer development are thought to promote carcinogenesis by inducing proliferative epithelial changes, but emerging data suggest that stromal and adipose tissue components of the breast may also play crucial roles in early stages of breast carcinogenesis [1,2,3]. A previous study by Troester and colleagues, for example, found that associations between some risk factors and involution of the breast epithelium were modified by the mammary stroma [4]. Further, adipose tissue features, such as crown-like structures, that are reflective of increased levels of proinflammatory mediators, aromatase expression, and possibly elevated breast cancer risk, have been found to be more prevalent among obese than normal weight postmenopausal women [5].

In support of a stromal role in breast cancer etiopathogenesis, a recent study by our group found a context-dependent role of the stroma to either prevent or promote breast cancer development in women with benign breast disease (BBD) [6]. Among BBD patients with non-proliferative disease, we observed increasing stromal proportion to be strongly protective against breast cancer development, whereas among those with proliferative disease increasing stroma was associated with increasing breast cancer risk. We also found the relative abundance of epithelium to stroma on BBD biopsies, i.e., the epithelium-to-stroma proportion (ESP), to be strongly associated with risk of future invasive breast cancer, independently of BBD histological classification [6]. In another study of BBD patients from the Nurses’ Health Study (NHS), Vellal and colleagues found a similar metric on BBD biopsies, i.e., the epithelium-to-stroma ratio (ESR), to be independently associated with elevated risk of future invasive breast cancer [7]. In both studies, the association between ESP/ESR and breast cancer risk was stronger among women with non-proliferative than proliferative BBD.

Morphologically, non-proliferative diseases more closely resemble the normal adult female breast than proliferative diseases [8]. Accordingly, results from previous studies may be indicative of the role of disruptive changes in the epithelial and stromal equilibrium in the pathogenesis of breast cancer [6, 7]. Most previous studies that have examined the association between epidemiological factors and tissue composition metrics have relied on measures of involuting epithelial structures called terminal duct lobular units (TDLUs) and/or were largely conducted within BBD populations [9,10,11,12,13,14,15,16]. Both have limitations. TDLUs do not capture information on tissue composition metrics beyond epithelial changes, and results from BBD populations may be limited by the impact of the underlying BBD pathology on breast tissue composition. For example, results from our previous BBD study were in support of associations between individual breast cancer risk factors and quantitative tissue composition metrics, including ESP, but the underlying BBD lesion was the strongest predictor of variations in breast tissue composition [6].

The main aim of this study was, therefore, to investigate the associations between several breast cancer risk factors and quantitative tissue composition metrics in normal breast biopsies, individually (epithelium, stroma, and adipose tissue) and in combination as fibroglandular tissue (epithelium plus stroma) and ESP (the proportion of fibroglandular tissue that is epithelium, relative to stroma), among women participating in the Susan G. Komen Tissue Bank.

Methods and materials

Study population

Participants in this study were women without a personal history of breast cancer who voluntarily donated breast tissues to the Susan G. Komen Tissue Bank (KTB). Details of the KTB project (http://komentissuebank.iu.edu/) have been described elsewhere [17,18,19]. In brief, the KTB is a continuously growing biorepository that collects, stores, and annotates histologically normal breast tissue donated by volunteers. Participants were generally women ≥ 18 years of age at donation with no breast implants and not receiving strong blood thinners or radiation to the chest. In total, 5382 tissue donations from 4906 women had been recorded in the KTB by 2019. For women with multiple donations (n = 476), we used the earliest donation corresponding to the time when questionnaires were administered. Women without hematoxylin and eosin (H&E)-stained images (n = 526) were excluded from the analytical population. In addition, we excluded women above 75 years of age (n = 51) and those without data on age (n = 3), those who were pregnant and/or breastfeeding at the time of tissue collection (n = 59), those who previously had breast cancer (n = 149), and those without data on menopausal status (n = 10). The final analytical population comprised 4108 women who donated tissues between 2008 and 2019 and for whom we could retrieve the corresponding digitized H&E-stained sections (Additional file 1: Fig. S1). At the time of donation/enrollment, the participants provided written informed consent and were enrolled under a protocol approved by the Indiana University Institutional Review Board and the National Institutes of Health Office of Human Subjects Research (NIH OHSR #4508).
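The participant flow described above can be checked with simple arithmetic. The following is a minimal sketch, not part of the study's analysis code: the counts are taken from the text, whereas the variable and function names are hypothetical. It assumes that the exclusion groups do not overlap and that each woman with multiple donations contributed exactly one additional donation, which is what the reported totals imply.

```python
# Counts reported in the text; names below are illustrative only.
TOTAL_DONATIONS = 5382
WOMEN_ENROLLED = 4906

# Assumption: exclusion groups are mutually exclusive.
EXCLUSIONS = {
    "no H&E-stained image": 526,
    "older than 75 years": 51,
    "missing age": 3,
    "pregnant and/or breastfeeding at collection": 59,
    "previous breast cancer": 149,
    "missing menopausal status": 10,
}


def analytical_population(enrolled, exclusions):
    """Subtract every exclusion count from the number of enrolled women."""
    return enrolled - sum(exclusions.values())


# Extra donations beyond one per woman (compare with n = 476 in the text).
repeat_donations = TOTAL_DONATIONS - WOMEN_ENROLLED
final_n = analytical_population(WOMEN_ENROLLED, EXCLUSIONS)

print(f"repeat donations: {repeat_donations}")
print(f"final analytical population: {final_n}")
```

Under these assumptions, the exclusions account exactly for the difference between the 4906 enrolled women and the 4108 women in the analytical population.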

Exposure assessment

The methodology for exposure assessment within KTB has been described elsewhere [13, 18]. In general, detailed information on sociodemographic, medical, reproductive, menstrual, and lifestyle factors, as well as information on gynecologic surgeries and mammographic screening, was collected by means of self-administered questionnaires. Relevant exposures for this analysis included age (years; < 30, 30–39, 40–49, 50–59, 60–75) at the time of tissue donation, race/ethnicity (Black, White, Asian/Other), age at menarche (years; categorized as ≤ 12, 13, ≥ 14), parity (gravid vs nulligravid), number of live births (0, 1, 2, ≥ 3), age at first full-term birth (years; < 25, 25–29, ≥ 30), breastfeeding (ever vs never; duration), body mass index (BMI; < 25, 25–29, ≥ 30 kg/m2), hormonal birth control use (yes vs no), menopausal status (post- vs premenopausal), bilateral oophorectomy (yes vs no), menopausal hormone therapy (MHT) use (never, former, current) and MHT type, smoking status (never, former, current), alcohol intake, and family history of breast cancer (FHBC) in a first-degree relative (present vs absent).

Breast tissue collection

Up to four tissue cores were biopsied from the upper outer quadrant of the right or left breast using a standard 9-gauge (since 2010) or 10-gauge (prior to 2010) needle and one core was fixed in 10% buffered formalin [13, 18]. The formalin-fixed and paraffin-embedded (FFPE) tissue blocks that were prepared from that core were sectioned and stained using H&E staining according to standard laboratory procedures [13, 18]. Archival, digitized, H&E-stained sections were shared with the Molecular and Digital Pathology Laboratory (MDPL) of the Division of Cancer Epidemiology and Genetics (DCEG) at the National Cancer Institute (NCI), USA, for downstream tissue composition analysis (see below).

Machine learning characterization of tissue composition metrics

Digitized H&E-stained slides were archived using the Halo Link digital image repository (Indica Labs, Albuquerque, NM) at the US National Cancer Institute (NCI). Image analysis was performed using the Halo Client computational pathology software (Indica Labs, Albuquerque, NM). A custom-built, random forest, tissue classifier algorithm was trained by two pathologists (MA and MAD) to develop an optimized, 85-datapoint, tissue classifier script. By annotating regions on randomly selected representative images composed of epithelium, stroma, and adipose tissue, the random forest algorithm was trained to identify, segment, and quantify areas (in mm2) on each slide composed of epithelium (42 datapoints), stroma (37 datapoints), and adipose tissue (6 datapoints), as shown in Fig. 1 (Red: epithelium; Green: stroma; Yellow: adipose tissue). In previous reproducibility analyses [6], we demonstrated excellent concordance (Spearman’s correlation coefficients ≥ 0.95) between scripts that were independently trained by two pathologists to identify and segment all three tissue types. Training and centralized image analysis were performed masked to all patient characteristics. Percent epithelium, stroma, and adipose tissue were calculated by dividing the absolute value of each histologic metric (in mm2) by the total tissue area (i.e., epithelium + stroma + adipose tissue, mm2) on each slide and multiplying by 100. Percent fibroglandular tissue area was calculated by adding epithelial and stromal area on the slides, dividing by total tissue area, and multiplying by 100. Percent ESP was calculated by dividing the epithelial area by total fibroglandular tissue area and multiplying by 100, as we previously described [6].
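The percentage calculations described above can be illustrated with a short sketch. This is not the Halo software output or the study's code. The function name, dictionary keys, and example areas are hypothetical, and the handling of slides with no fibroglandular tissue (ESP returned as None) is an assumption that the text does not specify.

```python
def composition_metrics(epithelium_mm2, stroma_mm2, adipose_mm2):
    """Return percent tissue composition metrics for one slide.

    Inputs are segmented areas in mm2. ESP is the share of fibroglandular
    tissue (epithelium + stroma) that is epithelium.
    """
    total = epithelium_mm2 + stroma_mm2 + adipose_mm2
    if total <= 0:
        raise ValueError("total tissue area must be positive")

    fibroglandular = epithelium_mm2 + stroma_mm2
    # Assumption: ESP is undefined when a slide has no fibroglandular tissue.
    esp = 100.0 * epithelium_mm2 / fibroglandular if fibroglandular > 0 else None

    return {
        "pct_epithelium": 100.0 * epithelium_mm2 / total,
        "pct_stroma": 100.0 * stroma_mm2 / total,
        "pct_adipose": 100.0 * adipose_mm2 / total,
        "pct_fibroglandular": 100.0 * fibroglandular / total,
        "pct_esp": esp,
    }


# Hypothetical slide: 2 mm2 epithelium, 18 mm2 stroma, 80 mm2 adipose tissue.
example = composition_metrics(2.0, 18.0, 80.0)
for name, value in example.items():
    print(name, value)
```

Note that ESP uses fibroglandular area, not total tissue area, as its denominator, so it is unaffected by the amount of adipose tissue on the slide.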

Fig. 1

Machine learning analysis of quantitative tissue composition metrics. Digitized hematoxylin and eosin-stained slides were used to optimize machine learning-based tissue classification scripts. A custom-built, random forest, tissue classifier algorithm (Indica Labs, Albuquerque, NM) was trained by two pathologists to develop an optimized, 85-datapoint, tissue classifier script. By annotating regions on randomly selected representative H&E images composed of epithelium, stroma, and adipose tissue (A), the random forest algorithm was trained to identify, segment, and quantify areas (in mm2) on each slide composed of epithelium (42 datapoints), stroma (37 datapoints), and adipose tissue (6 datapoints), as shown in (B) (Red: epithelium; Green: stroma; Yellow: adipose tissue). C and D show high-power views of the machine’s capacity to identify regions on the slide composed of adipose tissue (C) as well as epithelium and stroma (D).

Statistical analysis

Kruskal–Wallis tests were used to test differences in tissue composition metrics by participant characteristics. The associations of host (age, race/ethnicity, FHBC, menopause), reproductive (age at menarche, pregnancy (gravid vs nulligravid), number of live births, age at first full-term birth (AFFB), breastfeeding), and lifestyle (smoking, alcohol intake, BMI, hormonal birth control, MHT use) factors with tissue composition metrics (epithelium, stroma, adipose tissue, fibroglandular tissue, and ESP) were assessed in linear regression models. All tissue composition metrics were square root transformed to better approximate the normal distribution assumed by the linear regression models. Partially adjusted models included age and tissue area, and fully adjusted models included all of the variables under consideration. Associations of breastfeeding, number of live births, and AFFB with tissue composition metrics were assessed in models restricted to previously pregnant women. Analyses were performed overall and stratified by menopausal status. In the overall model, bilateral oophorectomy and uterine ablation were included separately to examine their effects on tissue composition metrics. In stratified analyses, individuals who had a bilateral oophorectomy, irrespective of age, or uterine ablation after the age of 55 years, were considered postmenopausal. Locally weighted scatter plot smoothing (Lowess) functions were used to plot the residuals from multivariable linear regression models for each tissue composition metric as a function of age. Lowess plots were constructed overall, and separately for pre- and postmenopausal women, parous and nulliparous women, normal and overweight/obese women, and for Black and White women. For race/ethnicity comparisons, plots were restricted to comparisons between Black and White women due to the very small numbers of individuals in the other racial and ethnic groups.
To explore whether BMI impacted parity and race-related curves, we conducted sensitivity analyses by stratifying Lowess plots for parity and race by BMI categories (i.e., normal versus overweight or obese). In sensitivity analyses, we also assessed whether parity impacted the race-related curves by creating separate plots for nulliparous and parous women. Data on the majority of the risk factors were complete. Missing values (Additional file 5: Table S1) were addressed by including missing-value indicators in the models. In sensitivity analyses, we compared this approach with multiple imputation and found the results to be similar (Additional file 5: Table S2). Further, we removed AFFB, which had the largest proportion of missing values (48.9%), from our model and compared models with and without AFFB (Additional file 5: Table S3). Although the results were similar, the model containing AFFB explained more variability in ESP than the model without AFFB (0.057 vs 0.045, respectively). Accordingly, AFFB was retained in models. All analyses were performed using R version 4.2 and all p values were two sided. Lowess plots were created using Stata statistical software version 16.1.
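Two of the data preparation steps described above, the square root transformation of the outcomes and the missing-value indicator approach for categorical covariates, can be sketched as follows. The study's analyses were performed in R; this Python sketch is illustrative only. The function names, category labels, and example values are hypothetical, and the choice of reference level is an assumption.

```python
import math


def sqrt_transform(values):
    """Square root transform a list of non-negative tissue metrics."""
    return [math.sqrt(v) for v in values]


def dummy_code_with_missing(values, levels, reference):
    """Dummy-code a categorical covariate, keeping missing values.

    Returns one 0/1 column per non-reference level plus a 'missing'
    column, so participants with missing data stay in the model.
    """
    columns = [lv for lv in levels if lv != reference] + ["missing"]
    rows = []
    for v in values:
        key = "missing" if v is None else v
        if key != "missing" and key not in levels:
            raise ValueError(f"unexpected level: {v!r}")
        rows.append([1 if key == c else 0 for c in columns])
    return columns, rows


# Hypothetical example: age at first full-term birth (AFFB) categories,
# with None marking a participant whose AFFB is missing.
affb_levels = ["<25", "25-29", ">=30"]
affb_values = ["<25", None, ">=30", "25-29"]
columns, design = dummy_code_with_missing(affb_values, affb_levels, "<25")

print(columns)
for row in design:
    print(row)
print(sqrt_transform([4.0, 9.0, 0.25]))
```

With this coding, a participant in the reference category has zeros in every column, while a participant with a missing value has a one only in the "missing" column. The resulting columns would then enter the linear regression alongside the other covariates.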

Results

Descriptive characteristics of analytical population

The characteristics of study participants are shown in Table 1. On average, participants were 43.8 years at the time of tissue donation (range = 18–75 years). Of the 4108 participants, 2696 (65.6%) were premenopausal while 1412 (34.4%) were postmenopausal. The majority (72%) of the participants were Non-Hispanic White, while ~ 18% were Black or African American and 9% were Asian or belonged to other ethnic groups (including Native Hawaiian/Pacific Islander, Alaskan native, Filipino, Japanese, Mixed race, and others). Most of the participants had a college degree (32%) or graduate/professional degree (25%). Two-thirds of the participants were either overweight or obese (BMI ≥ 25 kg/m2).

Table 1 Characteristics of women volunteers who donated normal breast tissue to the US-based Susan G. Komen Tissue Bank that were included in the current study (N = 4108)

Associations of age and menopausal status with breast tissue composition metrics

The distributions of all the tissue composition metrics varied statistically significantly by age and menopausal status (Table 2). With the exception of adipose tissue, which was higher among older than younger women and among postmenopausal than premenopausal women, all tissue composition metrics were higher among younger than older women and among premenopausal than postmenopausal women. In multivariable linear regression models (Table 3), increasing age remained statistically significantly associated with decreasing epithelium, stroma, and fibroglandular tissue and with increasing adipose tissue, but not with ESP. On the other hand, compared with premenopausal women, postmenopausal women had significantly lower epithelium and ESP (Table 3).

Table 2 Distributions of quantitative tissue composition metrics according to characteristics of women volunteers who donated normal breast tissue to the US-based Susan G. Komen Tissue Bank that were included in the current study (N = 4108)
Table 3 Beta coefficients and 95% confidence intervals (CIs) for the associations of host, reproductive, and lifestyle factors with quantitative tissue composition metrics among the 4108 women who donated normal breast tissue to the US-based Susan G Komen Tissue Bank that were included in the current study

Patterns of age- and menopause-related change in the Lowess curves were similar to those in the regression models for all the tissue composition metrics, with the exception of ESP. Unlike epithelium, stroma, and fibroglandular tissue, which declined with increasing age, ESP showed a bimodal age distribution: it increased from age 18 and peaked around 40 years of age, decreased from 40 until 55 years of age, and then increased with age thereafter (Fig. 2). Similar patterns of bimodal ESP distributions were seen with respect to menopausal status, with the first peak among premenopausal women occurring around age 30–40 years and a later peak among postmenopausal women occurring around 60–70 years of age. The bimodal age distribution of ESP corresponded to differences in the rates of decline of epithelial and stromal tissues by age. In multivariable linear regression models, stromal decline was ~ 34 times higher than epithelial decline before age 40 years, but this ratio fell to ~ 2 times between 40 and 55 years and rose again to ~ 10 times after 55 years of age.

Fig. 2

Relationship between age and menopause with quantitative tissue composition metrics of the normal breast. Locally weighted scatter plot smoothing (Lowess) functions were used to plot the residuals estimated in multivariable linear regression models for each tissue composition metric as a function of age. Lowess plots were constructed overall (A) and stratified by menopausal status (B). Pre- and postmenopausal status were defined by combining information on self-reported menopausal status, age (< 55 years (premenopausal) versus ≥ 55 years (postmenopausal)), bilateral oophorectomy, and having had a uterine ablation

Associations of reproductive factors with breast tissue composition metrics

Increasing age at menarche was associated with higher stromal and fibroglandular tissue but with lower adipose tissue and lower ESP (Table 2). Compared with women who had never been pregnant, previously pregnant women had statistically significantly higher ESP and this increased with increasing number of live births. Among parous women, those who breastfed had higher epithelium, stroma, fibroglandular tissue, and ESP but lower adipose tissue. In addition, increasing duration of breastfeeding was associated with increasing epithelium, stroma, fibroglandular tissue, and ESP but with decreasing adipose tissue (Table 2). The distributions of tissue composition metrics did not differ by age at first full-term birth.

In multivariable linear regression models (Table 3), parity and increasing number of live births remained statistically significantly associated with higher epithelium and higher ESP, with a strong linear trend in the magnitude of these associations with increasing stroma (Additional file 2: Fig. S2). Breastfeeding remained statistically significantly associated with increasing stromal and fibroglandular tissue and with decreasing adipose tissue and ESP (Table 3). Increasing duration of breastfeeding was associated with increasing stroma and fibroglandular tissue and with decreasing adipose tissue. Although different strata of breastfeeding duration were associated with ESP, there was no statistically significant trend in the association between duration of breastfeeding and ESP. The observed associations of parity, increasing number of live births, and breastfeeding with the individual tissue composition metrics were evident in both pre- (Additional file 5: Table S4) and post- (Additional file 5: Table S5) menopausal women.

Separate Lowess plots for nulliparous and parous women revealed a rapid increase in ESP among parous women from 20 years, peaking around 40 years, declining slightly between 40 and 60 years, increasing again after 60 years, and remaining higher for parous than nulliparous women throughout life. Among nulliparous women, epithelium decreased progressively up to around 50 years, after which it began to increase, surpassing levels in parous women around 65 years. In contrast, ESP decreased progressively up to 50 years among nulliparous women and remained fairly constant afterward (Fig. 3A).

Fig. 3

Relationships between parity, body mass index, and race with quantitative tissue composition metrics of the normal breast. Locally weighted scatter plot smoothing (Lowess) functions were used to plot the residuals estimated in multivariable linear regression models for each tissue composition metric as a function of age. Lowess plots were constructed separately for parous and nulliparous women, normal and overweight/obese women, and for Black and White women. Lowess plots were restricted to comparisons between Black and White women due to the small number of individuals in the other ethnic classes

Associations of body mass index (BMI) with breast tissue composition metrics

The distributions of all tissue composition metrics varied by BMI categories (Table 2). Increasing BMI was associated with lower proportions of epithelium, stroma, and fibroglandular tissue but with higher proportions of adipose tissue (Table 2). Further, compared with normal weight women, overweight and obese women had statistically significantly higher levels of ESP. These associations persisted in multivariable linear regression models adjusted for other factors (Table 3). In separate plots for normal versus overweight/obese women (Fig. 3B), epithelial, stromal, and fibroglandular tissue components declined with age, while adipose tissue increased, in both groups. However, while decreases in epithelium (i.e., consistent with lobular involution) continued throughout life among normal weight women, age-related decline in epithelial tissue was not evident among overweight/obese women after 50 years. We also found ESP levels to be slightly higher among normal than overweight/obese women before 50 years of age, with a rapid decline among normal weight women after age 50 (Fig. 3B). Conversely, ESP levels were fairly constant among overweight/obese women before 50 years of age after which a rapid increase was observed causing ESP to be markedly higher among overweight/obese than normal weight women after 50 years (Fig. 3B).

Associations of race and ethnicity with breast tissue composition metrics

Of the tissue composition metrics, the distributions of stroma and ESP varied statistically significantly by race and ethnicity (Table 2). In general, White women had the highest amount of stromal tissue (median = 18.0%), while Black women had the lowest (median = 15.1%). Conversely, ESP was highest among Black women (median = 6.7%) and lowest among White women (median = 5.7%). The difference in stroma by race/ethnicity was not statistically significant in multivariable models; however, epithelium was statistically significantly higher, while ESP was suggestively higher, among Black than White women (Table 3). In separate Lowess plots for Black and White women, the pattern of age-related decline in ESP differed by race. ESP was higher for Black than White women between 20 and 45 years of age, similar for Black and White women 40–60 years of age, and higher among White than Black women above 60 years of age (Fig. 3C). This pattern of association did not differ by parity status or BMI (Additional file 3: Fig. S3).

Associations of other factors with breast tissue composition metrics

The distributions of individual tissue composition metrics varied according to several other factors, including FHBC, average number of alcoholic drinks per week, bilateral oophorectomy (Tables 2 and 3), as well as MHT use among postmenopausal women (Additional file 5: Table S5). While a positive FHBC was positively associated with ESP (Table 3), increasing number of alcoholic drinks per week (Table 3) and current use of combined estrogen and progesterone MHT formulation (Additional file 5: Table S5) were statistically significantly inversely associated with ESP in multivariable models. We did not find statistically significant associations between use of hormonal birth control among premenopausal women and any tissue composition metric.

Discussion

By examining breast cancer risk factors in relation to quantitative tissue composition metrics of the normal breast, we showed that joint variations in both epithelial and stromal tissue composition may be critical for breast carcinogenesis. In particular, our findings suggest that both epithelial and stromal tissues involute toward fat and that imbalance in the rate of stromal and epithelial involution can manifest as high ESP, which may represent a feature of the mammary tissue ecosystem that is conducive to carcinogenesis [6]. The bimodal age- and menopause-related peaks in ESP that we found correspond to the widely reported early-onset (premenopausal) and late-onset (postmenopausal) peaks in breast cancer incidence [20,21,22]. For most solid cancers, incidence rates increase linearly with age, but the pattern is different for female breast cancer, which is characterized by an initial linear increase up to around age 50 years, after which the slope changes to a downward trend and then resumes at a slower rate of increase with advancing age [22]. The point at which the incidence curve changes is known as “Clemmesen’s hook” [23,24,25], a characteristic of female breast cancers that occurs around the perimenopausal period of life, when ovarian function begins to decline until after its cessation at menopause.

Results from epidemiological studies have shown that the bimodal pattern of breast cancer incidence correlates with differences in breast tumor biology [22, 26]. Tumors occurring among younger/premenopausal women tend to be more aggressive than those occurring among older/postmenopausal women. However, tissue correlates of this phenomenon have yet to be fully characterized. Our findings of age- and menopause-related bimodal ESP distributions suggest that epithelial and stromal tissues in the breast jointly respond to aging- and menopause-related changes in endogenous hormones. Anomalies at critical points in this process may manifest as variations in ESP that mirror, and might explain, the breast cancer incidence curve, Clemmesen’s hook, as well as age- and menopause-related differences in tumor biology.

Age-specific heterogeneity in breast cancer incidence and molecular subtype has been shown to characterize breast cancer risk relationships for parity, BMI, and race/ethnicity [27]. For instance, parity is associated with decreased breast cancer risk among older women but with increased risk among women younger than 30–44 years [27,28,29,30]. In the current study, parity was strongly associated with higher ESP, with statistically significant dose-dependent ESP increases with increasing number of live births. Observed associations between parity/increasing number of live births and ESP were driven by positive associations with epithelium and inverse associations with stroma, findings that are consistent with those from previous studies [10, 11]. We did not observe qualitative age interactions between parity and ESP. Instead, our observed bimodal age distribution of ESP was present among parous but not nulliparous women. The first ESP peak among parous women occurred around 30–45 years, which corresponds to the well-documented parity-related increased risk of early-onset breast cancer [31, 32]. The second ESP peak occurred after 55 years of age, and although not consistent with the documented protective effect of parity among older women [33], the apparent inconsistency may be due to etiologic heterogeneity of breast cancer with respect to parity/nulliparity [34, 35]. In general, parity is associated with increased risk of basal-like breast cancer, while nulliparity is associated with an increased risk of luminal breast cancer [32, 34, 36]. Although basal-like tumors tend to predominate among younger women, recent data have shown a bimodal age distribution in the incidence of this tumor subtype [37], which is consistent with our observation of a bimodal age distribution of ESP among parous women. 
In contrast to basal-like tumors, luminal tumors tend to predominate among nulliparous women and at older ages [38,39,40,41,42], which is also consistent with our findings of increasing epithelial tissue among older nulliparous as opposed to parous women.

Presumably, nulliparity might increase the risk of luminal breast cancer through an intrinsically epithelial-proliferation pathway while parity may increase the risk of basal-like breast cancer via stromal-epithelial crosstalk. The former idea is supported by results from studies showing strong associations between nulliparity and highly proliferating luminal tumors, defined by expression of the proliferation marker Ki67 [43] and is buttressed by data from experimental studies showing that parity induces terminal differentiation of luminal epithelial cells as well as downregulation of growth factors and the upregulation of growth inhibitory signals [44]. Conversely, parity may increase risk of aggressive/basal-like breast cancers by disrupting stromal-epithelial homeostasis, a notion that is supported by studies showing that stromal remodeling and perturbed immune response mechanisms constitute pathways by which parity influences breast cancer risk [3, 45,46,47]. In addition to the strong association that we observed between parity and increasing ESP, our observations that the magnitude of this association increased with increasing stromal (as opposed to adipose tissue) content support the potential role of stromal-epithelial crosstalk in mediating parity-related breast carcinogenesis. In the current study, breastfeeding was associated with lower ESP but not epithelium. A previous study reported an inverse association between breastfeeding and adipose tissue content, which is consistent with our findings [11]. Breastfeeding is thought to attenuate parity-related increased risk of aggressive breast cancers [48, 49]. Conceivably, our finding of an inverse association between breastfeeding and ESP, which appears to be driven by increased stromal content with increasing breastfeeding duration, may suggest that breastfeeding’s protective effect might be partly mediated through post-lactational stromal restoration.

The association of elevated BMI with breast cancer incidence varies by age [27, 50]. Among women younger than 50 years, being overweight or obese is associated with decreased breast cancer risk, but risk increases among these women thereafter. In the current study, we found a strong association between elevated BMI and increasing ESP. Differences in ESP between women with normal versus overweight/obese BMI were highest after 50 years of age, corresponding to the age period during which elevated BMI is associated with increased breast cancer risk. Among women younger than 50 years, however, overweight/obese BMI was associated with slightly lower ESP than normal BMI, which is consistent with the lower risk of breast cancer among women with overweight/obese than normal BMI below 50 years of age [51]. The relatively higher ESP among normal than overweight/obese women between 30 and 50 years appears to be due to the correspondingly lower stromal proportion among women with normal BMI. On the other hand, the markedly higher ESP among overweight/obese than normal weight women after 50 years of age appears to be driven by a combination of increasing epithelium and decreasing stroma. These tissue-level observations reflect the complex relationships between BMI, aging, and breast cancer risk among pre- and postmenopausal women [52].

Our findings might also hold clues into differences in age-related incidence and tumor biology among racial groups [53]. We found that ESP was higher among Black than White women before 40 years, but this declined with advancing age in parallel with increasing ESP among White women leading to a crossover around 55 years, after which ESP levels were higher among White than Black women. It is unclear why ESP levels were higher among younger Black than White women and vice versa among older women, but this pattern is reminiscent of the higher rates of early-onset breast cancer among Black than White women and of later-onset breast cancer among White than Black women [53]. Although this analysis was based on self-reported race and ethnicity, our findings are consistent with those from a previous analysis within this population that found TDLU levels to be higher among women of African than European genetic ancestry [54]. Given the link between higher TDLU levels and TNBC [55, 56], our findings with respect to epithelial and ESP differences by race buttress the notion that changes in mammary tissue composition may reflect cumulative exposure to endogenous and exogenous breast cancer risk factors over the lifespan, holding clues into the etiopathogenesis of breast cancer subtypes.

Having a positive FHBC is a strong risk factor for breast cancer development. However, the tissue pathways by which FHBC influences breast cancer risk are yet to be fully defined. Results from a previous study suggested that polygenic risk scores for breast cancer development were associated with TDLU involution [57]. Here, we found positive FHBC to be associated with higher ESP, which is consistent with its association with increased breast cancer risk. We also found varying but less consistent associations between other risk factors and individual tissue composition metrics. Having had a bilateral oophorectomy was suggestively associated with lower epithelial and fibroglandular tissue, a low-risk tissue phenotype that is consistent with the reduced risk of breast cancer among women who have had a bilateral oophorectomy [58, 59]. Current use of MHT, particularly the combined estrogen and progesterone formulation, was associated with higher stromal and fibroglandular tissue, correspondingly lower adipose tissue, and lower ESP. While the association between MHT use and higher fibroglandular tissue is consistent with its association with higher mammographic density [60], a radiological representation of the amount of fibroglandular tissue in the breast, and elevated breast cancer risk [61, 62], its association with lower ESP is not consistent with its risk-increasing role. As with MHT use, we found an increasing number of alcoholic drinks per week to be inversely associated with ESP. In line with data from epidemiological studies suggesting that use of combined MHT and alcohol consumption are associated with elevated risk of hormone receptor-positive (ER+, mostly low grade) but not receptor-negative (ER-, mostly high grade) breast cancers [43, 63,64,65,66,67,68,69], our observed associations of combined MHT use and alcohol consumption with breast tissue composition metrics may provide further clues into the role of variations in exposure-tissue interactions in the etiopathogenesis of breast cancer subtypes.

This study has several important strengths, including the application of high-accuracy machine learning algorithms for the detailed and centralized assessment of quantitative tissue composition metrics on digitized, H&E-stained biopsy specimens from over 4000 normal breast tissue donors. To the best of our knowledge, this is the largest analysis of its kind to date to investigate associations between several questionnaire-based risk factors and quantitative tissue composition metrics of the normal breast. The large sample size allowed us to conduct analyses overall and stratified by menopausal status and other relevant characteristics. We were able to control for several potential confounders in our analyses and to conduct relevant sensitivity analyses, all of which strongly support the internal validity of our findings. Nevertheless, the current analysis is not without limitations. For instance, we were unable to examine longitudinal changes in tissue composition metrics. Also, we were unable to directly evaluate the potential impact of sociodemographic, environmental, and socioeconomic factors on our BMI and race-related findings, but all our estimates were adjusted for educational level as a surrogate for socioeconomic status. The use of questionnaire-based data may be associated with measurement error and recall bias. However, the accuracy and reproducibility of self-reports for many of the factors that were significantly associated with breast tissue composition metrics in the current study have been previously documented to be high [70,71,72]. Moreover, measurement errors in exposure assessment are very unlikely to be differential by tissue composition metrics and, if they exist, will be more likely to bias the results toward the null. We did not have information on time since last pregnancy or time since weaning, so we were unable to evaluate temporal changes in the magnitude of the associations between pregnancy or weaning and tissue composition metrics. Nevertheless, pregnancy and breastfeeding history were significantly associated with ESP even at advanced ages, suggesting that time since last pregnancy or breastfeeding may not confound our observed associations. Although this study was based on a population of self-selected volunteers, BCRAT (or Gail) scores of absolute breast cancer risk for participants in this study were normally distributed (Additional file 4: Fig. S4), as in the general population, which lends credence to the generalizability of the findings. Nevertheless, the majority of the study participants were US-based, otherwise healthy, White women, which might limit the generalizability of these findings to other racial groups or populations.

In conclusion, we investigated the relationships of host, lifestyle, and reproductive factors with quantitative tissue composition metrics of the normal breast, including epithelium, stroma, adipose tissue, fibroglandular tissue, and histologic ESP (a metric of the proportion of fibroglandular tissue that is epithelium relative to stroma). We found several established breast cancer risk factors to be associated with individual tissue metrics, including novel observations with respect to ESP. In particular, age, menopausal status, parity, breastfeeding history, BMI, race, FHBC, alcohol intake, and MHT use demonstrated heterogeneous associations with ESP consistent with their documented associations with incidence of molecular breast cancer subtypes. Overall, our findings provide critical insights into the role of stromal-epithelial interactions in breast cancer etiology, with implications for our understanding of the histogenesis of breast cancer subtypes. Conceivably, variations in tissue composition metrics on biopsy, particularly ESP, could serve as intermediate markers of risk and might be used to inform breast cancer prevention strategies for women.