Background

Breast cancer risk prediction models can help women and their health providers make decisions about screening and chemoprevention. While women aged 50 are uniformly included in mammographic screening recommendations, the guidelines regarding at what age to start screening are inconsistent, varying from age 40 to 50, particularly for women without a family history of breast cancer (https://www.uspreventiveservicestaskforce.org/Page/Document/UpdateSummaryFinal/breast-cancer-screening1 [1,2,3,4,5,6,7]). Improvements in individualized risk assessment would therefore be particularly valuable for women younger than 50 to decide when to start mammographic screening. A risk prediction model with high accuracy could also help women decide whether to take tamoxifen for breast cancer prevention. Younger women are more likely to benefit from tamoxifen than older women because they have lower risks of tamoxifen-related adverse events [8,9,10,11,12,13]. Nonetheless, an accurate estimate of risk of breast cancer is critical in calculating the benefit-risk index for these women.

The Gail model 2 [14] is the most widely studied breast cancer risk prediction model for women without a strong family history of breast cancer or an inherited mutation associated with high susceptibility. The breast cancer risk factors in the model are age, age at menarche, age at first live birth, number of previous breast biopsies, history of atypical hyperplasia, and first-degree family history of breast cancer [14]. The Gail model 2 was initially developed using data from white women, and race/ethnicity-specific adaptations of the model were subsequently developed. The model was implemented in the National Cancer Institute’s Breast Cancer Risk Assessment Tool (BCRAT), which is available online. The model has been validated in studies in the USA and several Western European countries, including studies of younger women [15,16,17,18,19,20,21,22,23]. It has been shown in most studies to be well calibrated [14, 15, 23], i.e., it predicts fairly accurately the number of women who will develop breast cancer overall and in subgroups defined by risk factors. However, the model has limited discriminatory accuracy, i.e., it does not clearly separate women who subsequently develop cancer from those who do not [15].

We recently showed that the premenopausal circulating concentration of anti-Müllerian hormone (AMH), a marker of ovarian reserve, is associated with risk of breast cancer [24]. Circulating testosterone concentration, measured before [25,26,27,28,29,30] or after menopause [31,32,33,34,35,36,37,38], has also been consistently associated with breast cancer risk. AMH and testosterone are fairly stable during the menstrual cycle and temporal reliability studies have shown that a single measurement of AMH or testosterone can be used to rank premenopausal women with regard to their average hormone level over a several-year period with reasonable accuracy [25, 34, 39,40,41,42]. They are also relatively inexpensive to measure. Thus, these two hormones are good candidate biomarkers for inclusion in breast cancer risk prediction models for younger women, who have large fluctuations in other hormone-related biomarkers during the menstrual cycle.

The objective of this study was to evaluate whether adding circulating AMH and/or testosterone measurements to the Gail model improves its discriminatory accuracy among women aged 35–50.

Methods

Study subjects

Participants in a nested case-control study in a consortium of ten prospective cohorts from the USA, UK, Italy, and Sweden [24] were included in this study. The parent cohorts were the Generations Study (BGS); CLUE II; Columbia, MO Serum Bank (CSB); Guernsey Cohort; New York University Women’s Health Study (NYUWHS); Nurses’ Health Studies (NHS) I and II; Northern Sweden Mammary Screening Cohort (NSMSC); Hormones and Diet in the Etiology of Breast Cancer (ORDET); and the Sister Study (Sister). A brief description of the cohorts can be found in Ge et al. [24]. Each cohort was approved by its institutional review board, and informed consent was obtained from each participant.

Incident breast cancer cases were ascertained by each cohort through self-report on follow-up questionnaires and/or linkages with local, regional, or national cancer registries. All cases of incident invasive breast cancer diagnosed among women who were 35–50 years old at the time of blood donation were included except in the NHS cohorts, which further limited case selection to women who were premenopausal and between the ages of 35 and 50 at diagnosis. Controls were selected within each cohort using incidence density sampling. One control was selected for each case (except for the Sister Study, which matched 1:2). Matching variables included age, date of blood donation, and race/ethnicity [24]. Many of the cohorts matched on additional variables, for example, phase or day of menstrual cycle and technical sample characteristics, such as time between collection and processing. Women who had ever used hormone therapy (HT) or were current users of oral contraceptives (OCs) were excluded.

Laboratory measurements

AMH was measured in serum or plasma samples from women who were premenopausal at the time of blood donation using the picoAMH assay (ANSH laboratories) [24]. Women who had AMH concentrations below the lowest detectable value (LDV) (< 10% of samples for eight cohorts and < 20% for the remaining two cohorts) were classified into the lowest quartile for analyses (see “Statistical methods”). Because it has previously been shown that postmenopausal women have AMH concentrations below the LDV [43, 44], we did not measure AMH in postmenopausal women (23 cases and 40 controls) but also classified them into the lowest quartile.

Total testosterone was measured for all subjects in CLUE II, NHS, and NSMSC and for the matched sets for which it was not measured previously for the other cohorts. Measurements were done in the Immunochemical Core Laboratory of the Mayo Clinic by liquid chromatography-tandem mass spectrometry (LC-MS/MS). Assay coefficients of variation (CVs) were calculated using blinded quality control samples. For AMH, the mean intra-batch CV was 5.1% and the inter-batch CV was 21.4%. For testosterone, all intra- and inter-batch CVs were ≤ 10.6%. Previous testosterone measurements were performed as described in [25, 26, 29, 45,46,47,48].

Statistical methods

Relative risk estimation

We estimated cohort-specific relative risks (RRs) associated with the breast cancer risk factors included in the Gail model and with each of the biomarkers (testosterone and AMH) using conditional logistic regression (odds ratio estimates are referred to throughout as relative risks, by convention). Cohort-specific RRs were combined to obtain consortium-wide RR estimates using the random-effects meta-analytic method. The I² statistic and Q-tests were used to assess heterogeneity across cohorts.
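As an illustration of the pooling step, the sketch below combines cohort-specific log RRs with the DerSimonian-Laird random-effects estimator and reports Cochran's Q and I². The choice of the DerSimonian-Laird estimator is an assumption (the text specifies only a random-effects meta-analytic method), and the function name and the three input estimates are hypothetical.

```python
import math

def random_effects_pool(log_rrs, ses):
    """Pool cohort-specific log relative risks (DerSimonian-Laird; assumed estimator)."""
    w = [1.0 / se ** 2 for se in ses]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * y for wi, y in zip(w, log_rrs)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_rrs))
    df = len(log_rrs) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-cohort variance, truncated at zero
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # I²: percentage of total variation attributed to heterogeneity
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]  # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_star, log_rrs)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    return {
        "rr": math.exp(pooled),
        "ci": (math.exp(pooled - 1.96 * se_pooled), math.exp(pooled + 1.96 * se_pooled)),
        "q": q,
        "i2": i2,
        "tau2": tau2,
    }

# Hypothetical cohort-specific RRs and standard errors of the log RR
example = random_effects_pool(
    [math.log(1.4), math.log(1.6), math.log(1.5)], [0.20, 0.25, 0.30]
)
```

In practice the log RRs and their standard errors would come from the cohort-specific conditional logistic regression fits.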

We used the same coding as the BCRAT for age at menarche (< 12 years, 12 to 13, or ≥ 14) and age at first live birth (< 20, 20 to 24, 25 to 29/nulliparous, or ≥ 30 years) [14]. Family history of breast cancer was coded using a three-category variable (0/1/> 1 affected relative(s)). For cohorts that collected family history as a yes/no variable, women who responded yes were included in the intermediate category (1 affected relative). History of breast biopsy was coded as yes/no. We did not include an interaction between breast biopsy and age (< 50/≥ 50 years) because this study was restricted to younger women (≤ 50). The interaction term between age at first birth and number of affected relatives was not statistically significant for any cohort and was thus not included in the model. To be consistent with BCRAT, which imputes missing data to the lowest risk category, we imputed missing data as follows: age at menarche: ≥ 14 for 35 cases (1.5%) and 49 controls (1.9%); age at first live birth: < 20 for 5 cases (0.2%) and 7 controls (0.3%); and number of breast biopsies: 0 for 42 cases (1.8%) and 40 controls (1.6%). Data on history of atypical hyperplasia were not available from any of the cohorts and this variable was set to the lowest risk category, as is the case when “unknown” is entered in the BCRAT. Because we could not exclude the possibility that cohort differences in the AMH and testosterone concentration distributions were related to collection/handling/storage of samples [24], biomarkers were categorized into quartiles using cohort-specific cutpoints and modeled as ordered categorical variables.
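The cohort-specific categorization of biomarkers can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function names, the use of all supplied samples to derive cutpoints, the percentile interpolation rule, and the treatment of undetectable values as zero when deriving cutpoints are all assumptions.

```python
from bisect import bisect_left
from statistics import quantiles

def quartile_cutpoints(values):
    """Return the 25th, 50th, and 75th percentiles for one cohort."""
    return quantiles(values, n=4, method="inclusive")

def assign_quartile(value, cutpoints):
    """Map a concentration to quartile 1-4.

    None stands for a value below the lowest detectable value (or not
    measured in a postmenopausal woman) and is placed in quartile 1.
    """
    if value is None:
        return 1
    # Number of cutpoints strictly below the value, plus one
    return bisect_left(cutpoints, value) + 1

def categorize_by_cohort(records):
    """records: list of (cohort, value) pairs; returns quartiles in the same order."""
    by_cohort = {}
    for cohort, value in records:
        # Assumption: undetectable values count as 0.0 when deriving cutpoints
        by_cohort.setdefault(cohort, []).append(0.0 if value is None else value)
    cuts = {cohort: quartile_cutpoints(vals) for cohort, vals in by_cohort.items()}
    return [assign_quartile(value, cuts[cohort]) for cohort, value in records]
```

Because cutpoints are derived separately within each cohort, two cohorts whose assays differ by a constant scale factor yield the same quartile assignments, which is the motivation given above for this coding.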

Absolute risk estimation

We used the method described by Gail et al. [22, 49] to estimate the 5-year absolute breast cancer risk for each participant. We used consortium-wide estimates of RRs for the Gail variables and biomarkers (calculated as described above), consortium-based estimates of attributable risk fractions, and population-based breast cancer incidence and mortality rates. Attributable risk fractions were estimated using consortium-wide RR estimates and distributions of the Gail variables and biomarkers in the cases (excluding the Sister Study because all women in this study had a family history of breast cancer) [49]. Breast cancer incidence and competing mortality (i.e., non-breast cancer mortality) rates were obtained from the countries of the participating cohorts (US, UK, Italy, and Sweden) for the relevant 5-year age categories (35–39, 40–44, 45–49) and calendar years of blood collection (Additional file 1: Table S1).
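The structure of this calculation can be sketched as below, assuming hazards that are constant within one-year steps: the population incidence rate is multiplied by (1 − attributable risk fraction) to obtain a baseline hazard, scaled by the woman's RR, and accumulated over 5 years while accounting for competing mortality. The function names, the single attributable risk value, and all rates shown are hypothetical and are not the values in Additional file 1: Table S1.

```python
import math

def rate_for_age(rates, age):
    """Look up the annual rate for the age band containing `age`."""
    for (low, high), rate in rates.items():
        if low <= age <= high:
            return rate
    raise ValueError(f"no rate available for age {age}")

def absolute_risk(start_age, years, rel_risk, attributable_risk, incidence, mortality):
    """Projected probability of breast cancer within `years` of `start_age`."""
    risk = 0.0      # cumulative probability of a breast cancer diagnosis
    survival = 1.0  # probability of being alive and cancer-free so far
    for age in range(start_age, start_age + years):
        # Baseline hazard = population incidence x (1 - AR), scaled by the woman's RR
        h_cancer = rate_for_age(incidence, age) * (1.0 - attributable_risk) * rel_risk
        h_other = rate_for_age(mortality, age)  # competing (non-breast cancer) mortality
        total = h_cancer + h_other
        if total == 0.0:
            continue
        p_any_event = 1.0 - math.exp(-total)
        # Share of events in this year that are breast cancer diagnoses
        risk += survival * (h_cancer / total) * p_any_event
        survival *= 1.0 - p_any_event
    return risk

# Hypothetical annual rates per woman, by age band
incidence = {(35, 39): 0.0006, (40, 44): 0.0012, (45, 54): 0.0019}
mortality = {(35, 39): 0.0010, (40, 44): 0.0015, (45, 54): 0.0025}
five_year_risk = absolute_risk(42, 5, rel_risk=1.5, attributable_risk=0.3,
                               incidence=incidence, mortality=mortality)
```

With no competing mortality and a constant hazard h, the function reduces to 1 − exp(−5h), which is a convenient check on the implementation.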

For comparison, we also calculated the 5-year absolute risks of developing breast cancer using the BCRAT SAS macro (available at: https://dceg.cancer.gov/tools/risk-assessment/bcrasasmacro), which uses US population-based RR estimates [8, 14, 15, 22]. We refer to results using these calculations as “BCRAT” (to distinguish them from results based on RRs estimated from our dataset, called “Gail model”).

Assessment of discriminatory accuracy

We estimated the area under the receiver operating characteristic curve (AUC) based on the 5-year absolute risk estimates from the BCRAT, the Gail model, and the Gail model with addition of AMH and/or testosterone. Summary AUCs were estimated from the cohort-specific AUCs using random-effects meta-analytic methods. AUCs were also estimated within subgroups, i.e., by age, estrogen receptor (ER) status of the tumor, and Gail risk score (< 1%/≥ 1%), and for women without a family history of breast cancer. AUCs are expressed throughout as percentages (AUC × 100) for ease of interpretation. Finally, we assessed reclassification of 5-year absolute risks upon addition of biomarkers.
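For readers less familiar with the AUC, the sketch below computes it as the proportion of case-control pairs in which the case has the higher predicted risk (ties counted as one half), expressed on the 0–100 scale used here. It is a minimal illustration with hypothetical inputs: it compares all cases with all controls within a cohort and makes no adjustment for the matched design.

```python
def auc_percent(case_risks, control_risks):
    """AUC x 100: chance that a random case outranks a random control."""
    wins = 0.0
    for case in case_risks:
        for control in control_risks:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5  # ties contribute one half
    return 100.0 * wins / (len(case_risks) * len(control_risks))

# Hypothetical 5-year absolute risk estimates
cases = [0.012, 0.008, 0.015]
controls = [0.010, 0.009, 0.007]
example_auc = auc_percent(cases, controls)
```

A value of 50 corresponds to no discrimination and 100 to perfect separation of cases from controls.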

Results

Descriptive characteristics of the cases and controls are shown in Table 1. By design, women were between the ages of 35 and 50 at blood donation. About 40% of cases donated blood samples in the 5 years preceding breast cancer diagnosis. Consistent with known breast cancer risk factor associations, cases were more likely than controls to have had a breast biopsy, to have a family history of breast cancer, and to be nulliparous or have had their first live birth after age 30. The vast majority of women had low to average BCRAT 5-year risk scores (over half of the women had a risk < 1%), as expected in a study of younger women.

Table 1 Descriptive characteristics of invasive breast cancer cases and matched controls

Table 2 shows the RR estimates for invasive breast cancer associated with Gail model risk factors and biomarkers. The RRs for the Gail model variables did not change appreciably with the addition of biomarkers to the model. When individually added to the Gail model, AMH was associated with a 55% increase in risk and testosterone with a 27% increase in risk for the 4th vs. 1st quartiles; when added together, AMH was associated with a 53%, and testosterone with a 22%, increase. Table 2 also shows the attributable risk fraction estimates for each unit increase in risk factor or biomarker. For Gail model variables, the risk attributable to age at menarche was low (< 1%), while attributable risks were higher for family history of breast cancer (7%), history of breast biopsy (8%), and age at first pregnancy (18%). The attributable risk for a one-quartile increase in AMH was 19% and for testosterone 9%. In a sensitivity analysis restricted to the five US cohorts included in our study, the attributable risks calculated using US population risk factor distributions were similar to estimates based on risk factor distributions in the cases (data not shown) [22, 49,50,51]. Cohort-specific RR estimates for invasive breast cancer from the model including both biomarkers are shown in Additional file 1: Figure S1. Tests for heterogeneity by cohort were not statistically significant. Removing one cohort at a time from the analysis did not change the RRs appreciably (data not shown).

Table 2 Relative risks calculated using random-effects meta-analysis and attributable risk fractions

Figure 1 and Table 3 show the AUCs based on BCRAT, the Gail model, and the Gail model with biomarkers. The summary AUC for invasive breast cancer using the BCRAT was 55.0 (95% CI 53.1, 56.8). The AUC in our implementation of the Gail model was very similar (AUC 55.3, 95% CI 53.4, 57.1). The AUC increased with the addition of AMH (AUC 57.6, 95% CI 55.7, 59.5), testosterone (AUC 56.2, 95% CI 54.4, 58.1), and both AMH and testosterone (AUC 58.1, 95% CI 56.2, 59.9). The percent increase relative to the Gail model was statistically significant for the model including AMH (4.2%, p = 0.007) and the model including both AMH and testosterone (5.1%, p = 0.001), but not testosterone alone (1.6%, p = 0.086). AUCs were similar when both in situ and invasive cases were considered together (Additional file 1: Figure S4).
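The relative increases quoted above follow from the summary AUCs as (new AUC − Gail AUC) / Gail AUC × 100. The short sketch below reproduces the arithmetic from the rounded AUCs reported in the text; the function name is hypothetical, and the published percentages were presumably computed from unrounded values.

```python
def relative_increase(base_auc, new_auc):
    """Percent increase of new_auc relative to base_auc."""
    return 100.0 * (new_auc - base_auc) / base_auc

gail = 55.3
with_amh = round(relative_increase(gail, 57.6), 1)           # AMH added
with_testosterone = round(relative_increase(gail, 56.2), 1)  # testosterone added
with_both = round(relative_increase(gail, 58.1), 1)          # both biomarkers added
```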

Fig. 1 Area under the receiver operating characteristic curve (AUC) estimates and 95% confidence intervals

Table 3 AUCs by subgroups

Table 3 also shows AUCs in subgroups. Small improvements in AUCs with the addition of both biomarkers to the Gail model were observed in each age-at-blood-donation subgroup, with the largest increase (3.5, a relative increase of 6.0%) for women ages 45–50, for whom the Gail model also had the highest AUC (58.6). AUC improvements for women with a 5-year risk lower than 1% were greater (3.0, a relative increase of 5.7%) than those for women with risk of at least 1% (1.0, a relative increase of 1.7%). AUC improvement was larger for ER-positive tumors (2.8, a relative increase of 5.0%) than ER-negative tumors (0.3, a relative increase of 0.5%). We also found that the AUC increased (4.0, a relative increase of 7.6%) with the addition of biomarkers for the subgroup of women without a family history of breast cancer, but less so for women with a family history (2.2, a relative increase of 4.4%).

Figure 2 shows the histograms displaying absolute risk estimates of cases and controls for the Gail model with and without testosterone and AMH. Though there was substantial overlap between the distributions in cases and controls, the distribution was skewed to the right for cases. Adding the biomarkers resulted in a slight shift of the distribution to the right for cases (9.3% had risk estimates move from below to above 1%, while 8.1% moved down, Table 4) and a slight shift to the left for controls (8.7% had risk estimates move from below to above 1%, while 10.4% moved down, Table 4).
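The reclassification percentages can be tabulated as sketched below, using the 1% 5-year risk threshold. The function name and the example risk values are hypothetical; the calculation would be run separately for cases and for controls.

```python
def reclassification(old_risks, new_risks, threshold=0.01):
    """Percent of women crossing the risk threshold when biomarkers are added."""
    n = len(old_risks)
    pairs = list(zip(old_risks, new_risks))
    moved_up = sum(1 for old, new in pairs if old < threshold <= new)
    moved_down = sum(1 for old, new in pairs if new < threshold <= old)
    return {
        "moved_up_pct": 100.0 * moved_up / n,
        "moved_down_pct": 100.0 * moved_down / n,
    }

# Hypothetical risks from the Gail model without and with biomarkers
gail_only = [0.008, 0.012, 0.009, 0.015]
gail_plus_biomarkers = [0.011, 0.009, 0.008, 0.016]
summary = reclassification(gail_only, gail_plus_biomarkers)
```

An informative marker should move more cases up than down across the threshold, and more controls down than up.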

Fig. 2 Reclassification of absolute 5-year risk of breast cancer with the addition of AMH and testosterone to the Gail model

Table 4 Absolute risk reclassification upon adding AMH and testosterone to the Gail model

Discussion

Circulating AMH and testosterone moderately increased the discriminatory accuracy of the Gail breast cancer risk prediction model among women ages 35–50 in our study of 1762 invasive cases and 1890 matched controls. Discriminatory accuracy improved with the addition of either AMH or testosterone, though the improvement was statistically significant only for AMH. In the model including both biomarkers, we observed an AUC increase from 55.3 to 58.1 (relative increase of 5.1%). Overall, inclusion of biomarkers tended to moderately increase 5-year risk estimates for cases and reduce estimates for controls.

The increase in AUC resulting from the addition of biomarkers was slightly higher in analyses limited to women without a family history of breast cancer than that observed in analyses including all women. This is of interest because the majority of breast cancers occur among women without a family history. Further, women without a family history are the group in which improvements in risk prediction could have the most impact, since it is already recommended that women with a family history start screening early (https://www.uspreventiveservicestaskforce.org/Page/Document/UpdateSummaryFinal/breast-cancer-screening1).

While risk prediction models applicable to younger women would be valuable for screening and preventive treatment decision-making, less work has focused on this group of women as compared to older women [52,53,54]. To our knowledge, risk prediction estimation has been assessed for premenopausal women from the general population in six studies [55,56,57,58,59,60]. Most of these assessed or modified the Gail model, but some had extensive missing data for Gail model variables [55, 57] or did not assess discriminatory accuracy [57]. Others developed new models for which validation has not yet been attempted in independent studies [55, 60]. Testosterone was added to the Gail model in one study that included premenopausal women [56]. In that study of 430 cases/684 controls, the addition of hormones, including testosterone, to the Gail model did not result in any change in the AUC for premenopausal women [56]. In contrast to that finding, the increase in AUC that we observed with the addition of testosterone is in agreement with the premenopausal testosterone-breast cancer risk association that has been consistently observed [25,26,27,28,29,30]. AMH has not been included in breast cancer risk prediction models previously.

Some studies, though not all [61, 62], have reported correlations of BMI with testosterone and AMH in premenopausal women [39, 63, 64]. These correlations have generally been weak, including in our study (Spearman partial correlations with BMI among controls, adjusted for cohort and age, were 0.06 for testosterone and −0.07 for AMH). This suggests that including BMI in the model, though it would be easier than including biomarkers because BMI does not require a blood draw, would not capture the impact of AMH and testosterone on breast cancer risk.

The AUC increases with the addition of AMH and testosterone were greater for ER-positive than ER-negative tumors, as expected since AMH was more strongly associated with risk of ER-positive than ER-negative tumors in our study [24]. Though AMH and estrogen concentrations are not strongly correlated in premenopausal women [39, 64], AMH is strongly associated with age at menopause, at which time estrogen exposure decreases. This association may explain the greater improvement in prediction of estrogen-sensitive tumors than ER-negative tumors with the inclusion of AMH in the Gail model.

Several other risk factors have been proposed for inclusion in the Gail model to improve discriminatory accuracy, with varying applicability to premenopausal women. Mammographic density has been shown to increase the discriminatory accuracy of the Gail model in several studies [51, 55, 65, 66], but density is not available yet to women deciding when to begin screening. Endogenous hormones other than AMH and testosterone, such as estrogen, progesterone, and prolactin, fluctuate during the menstrual cycle and/or are not consistently associated with risk in premenopausal women [31, 67]. Common, low-penetrance genetic risk factors may also have utility for risk prediction in younger women. Single nucleotide polymorphisms (SNPs), and their combined risk scores (ranging from 6 to 77 SNPs across studies), have increased Gail model AUCs (AUC increases of 0.6–7.0) in most studies [54, 59, 68,69,70,71,72,73,74,75], including among younger women [59]. Inclusion of a 77-SNP score increased the AUC from 0.64 to 0.66 (64 to 66 on the percentage scale used here) among women < 50 years of age [59], an increase comparable to that observed with the addition of AMH and testosterone. Because most genetic variants that are associated with breast cancer risk are not in hormone-related genes, they are likely to contribute to risk prediction independently of AMH and testosterone. Thus, models including both genetic variants and hormone biomarkers as a panel may perform better than models including only one type of marker.

We could not directly assess the calibration of the model including biomarkers because AMH and testosterone were measured only in matched case-control sets; thus, the expected number of cases in the full cohorts using the model including biomarkers could not be estimated [76]. Another method to indirectly assess calibration is inverse probability weighting [77], which uses the probability of being selected into the nested case-control study as a weighting factor to estimate the expected number of cases in the cohort. However, closely matched nested case-control studies, as in this consortium, yield high selection probabilities for a substantial proportion of controls because the risk sets from which controls are selected can be very small. For example, for the 496 controls in the NYUWHS, we would expect an average selection probability of ~ 10% (5600 cohort participants were between the ages of 35 and 50 at enrollment), but the average probability was 35%. The controls in this study provided insufficient information about the full cohort, precluding the assessment of calibration [76].
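To illustrate why small risk sets produce high selection probabilities, the sketch below computes the probability that a woman is ever sampled as a control, as one minus the product over all risk sets she belongs to of the probability of not being picked. This form of the inclusion probability is an assumption about how such weights are commonly built; it is not taken from [77], and the function name and inputs are hypothetical.

```python
def inclusion_probability(eligible_counts, controls_per_case=1):
    """Probability of being sampled as a control at least once.

    eligible_counts: for each case whose risk set contains this woman,
    the number of eligible non-cases in that (matched) risk set.
    """
    p_never = 1.0
    for n_eligible in eligible_counts:
        # Chance of being picked from this risk set, capped at 1
        p_pick = min(1.0, controls_per_case / n_eligible)
        p_never *= 1.0 - p_pick
    return 1.0 - p_never

loosely_matched = inclusion_probability([1000, 1000, 1000])  # large risk sets
closely_matched = inclusion_probability([2, 2])              # very small risk sets
```

With large risk sets the probability stays close to zero, whereas a woman who belongs to just two risk sets of two eligible women each has a 75% chance of selection, so her inverse-probability weight carries little information about the rest of the cohort.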

Our study included past users of oral contraceptives (> 65%) [24], but not current users because AMH levels decrease during oral contraceptive use [62, 78, 79]. Thus, our results apply only to women not on oral contraceptives.

In addition to its large size, a major strength of our study is its prospective design. Samples collected prior to diagnosis are valuable for measuring biomarkers that can be affected by the diagnosis and/or treatment of breast cancer. Another strength is that detailed epidemiological data on breast cancer risk factors were collected from all cohorts.

Conclusions

In conclusion, we observed moderate increases in the discriminatory accuracy of the Gail model 2 for women aged 35–50 with the addition of AMH and testosterone. Combining these markers with others (e.g., SNPs) may improve risk prediction models, though the improvement in discriminatory accuracy will remain limited until new markers with stronger associations with breast cancer risk are identified [80, 81].