Introduction

The Breast and Ovarian Analysis of Disease Incidence and Carrier Estimation Algorithm (BOADICEA) breast cancer model was originally developed to predict breast cancer risk for women using pedigree-level family history information and genetic testing results on rare pathogenic variants in high- and moderate-risk genes [1, 2]. This model has been updated (version 5.0) to include reproductive and lifestyle factors and the recently developed polygenic risk score (PRS) based on 313 common germline variants [3] for applications in both general and high-risk populations [4, 5]. The Tyrer-Cuzick, or International Breast Cancer Intervention Study (IBIS), model [6], which is commonly used in clinical and research settings, also includes extensive family history and comprehensive risk factor information and has been updated to include PRS information. We recently evaluated the performance of the Tyrer-Cuzick model (version 8.0) without PRS in a prospective cohort of women of European ancestry from the general population [7]. Here, we perform a comparative validation of the extended BOADICEA and Tyrer-Cuzick models incorporating the 313-variant PRS in the same prospective cohort.

The original BOADICEA and Tyrer-Cuzick models are considered in clinical guidelines [8, 9] for the management of women with a family history of breast cancer and have been implemented in user-friendly risk assessment tools that can incorporate PRS (BOADICEA: https://canrisk.org; IBIS: https://ibis.ikonopedia.com). Risk assessment using PRS is increasingly available commercially through genetic services and is marketed to clinicians [10]. Given the widespread use of these models and their capability to combine PRS with other risk factors in flexible risk prediction tools, comparative prospective validation of these models and their extensions is critical to assess how accurately they identify women at different risk levels for risk-stratified screening, surveillance or prevention strategies.

We report results from a prospective comparative validation of the extended BOADICEA model (v.5), incorporating reproductive and lifestyle risk factors and the 313-variant PRS, and of the Tyrer-Cuzick model (v.8) with the same PRS, in the Generations Study, a population-based cohort study of UK women [11].

Methods

We used data from a nested case-control sample within the Generations Study (2003–2012), a prospective cohort of over 113,000 UK women aged 16–102 years; details are given elsewhere [7, 11]. The comparative validation analyses of 5-year absolute risk of breast cancer were based on 1337 women aged 23–75 years, including 619 women diagnosed with incident breast cancer within 5 years of study recruitment, who had information on the PRS and on the risk factors used in both the BOADICEA (v.5) and Tyrer-Cuzick (v.8) models (Supplementary Fig. 1). Supplementary Table 1 summarizes the questionnaire-based risk factors and the 313-variant PRS for these women.

To update the original BOADICEA model, the relative risks for the risk factors and PRS were derived using the literature-based approach [3, 7]; further details are given in Lee et al. [4]. In this model, the family history association, described by a residual polygenic component, was adjusted to account for the PRS explaining ~20% of breast cancer familial aggregation. The PRS was added to the Tyrer-Cuzick model (v.8) using the approach described in Brentnall et al. [12], in which the family history association was not adjusted for the PRS and the two were assumed to act multiplicatively on the risk of developing breast cancer. The comparative validation analyses used the standardized model calibration and discrimination methods implemented in the Individualized Coherent Absolute Risk Estimator (iCARE) tool [13] (details in the Supplement). Briefly, model calibration was assessed in terms of relative and absolute risk by comparing observed and expected quantities overall and within risk categories, and model discrimination was assessed using the area under the receiver operating characteristic curve (AUC).
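The observed-versus-expected and AUC comparisons described above can be illustrated with a minimal Python sketch. It assumes an unweighted cohort in which each woman has a predicted 5-year absolute risk and a 0/1 indicator of incident breast cancer within 5 years; it does not reproduce iCARE, which additionally accounts for the nested case-control sampling design. The function names and the log-scale Poisson approximation used for the confidence interval of the ratio of expected to observed cases (E/O) are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def eo_by_risk_group(
    risk: Sequence[float], case: Sequence[int], n_groups: int = 10
) -> List[Tuple[float, int, float, float, float]]:
    """Hypothetical helper: split women into equal-sized groups by predicted risk
    (deciles when n_groups=10) and return, from lowest to highest risk group,
    (expected, observed, E/O, lower 95% limit, upper 95% limit)."""
    order = sorted(range(len(risk)), key=lambda i: risk[i])
    n = len(order)
    out = []
    for g in range(n_groups):
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        expected = sum(risk[i] for i in idx)   # sum of predicted absolute risks
        observed = sum(case[i] for i in idx)   # number of incident cases
        if observed == 0:
            out.append((expected, observed, math.nan, math.nan, math.nan))
            continue
        eo = expected / observed
        # Assumption: treat the observed count as Poisson, so SE(log E/O) ~ 1/sqrt(O).
        half_width = 1.96 / math.sqrt(observed)
        out.append((expected, observed, eo,
                    eo * math.exp(-half_width), eo * math.exp(half_width)))
    return out


def auc(risk: Sequence[float], case: Sequence[int]) -> float:
    """Mann-Whitney estimate of the AUC: the probability that a randomly chosen case
    has a higher predicted risk than a randomly chosen non-case (ties count 1/2)."""
    cases = [r for r, c in zip(risk, case) if c == 1]
    controls = [r for r, c in zip(risk, case) if c == 0]
    wins = 0.0
    for rc in cases:
        for rn in controls:
            wins += 1.0 if rc > rn else 0.5 if rc == rn else 0.0
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    # Toy data for illustration only (not from the Generations Study).
    risk = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10]
    case = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1]
    for row in eo_by_risk_group(risk, case, n_groups=2):
        print(row)
    print("AUC:", auc(risk, case))
```

An E/O above 1 indicates that a model predicts more cases than are observed (overestimation), and an E/O below 1 indicates underestimation.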

Results

For women younger than 50 years, the original and extended BOADICEA models (with PRS, and with PRS plus reproductive/lifestyle factors) showed good calibration of relative and absolute risk (Fig. 1). In the highest decile of predicted 5-year absolute risk, the extended model with PRS and reproductive/lifestyle factors was better calibrated than both the original model and the extended model with PRS only, with ratios of expected to observed numbers of cases (E/O) of 0.97 [95% confidence interval (CI) 0.51–1.86], 0.83 (0.44–1.56) and 0.85 (0.44–1.63), respectively. Adding the PRS and risk factors to the original model led to a modest improvement in AUC, from 69.1% (63.5–74.6%) to 69.7% (64.1–75.2%). Incorporating risk factors did not improve the discrimination of the original model (data not shown) or of the extended model with PRS (Fig. 1). The Tyrer-Cuzick model with PRS had similar discrimination [AUC: 69.4% (63.8–75.0%)] to the extended BOADICEA model with PRS and risk factors but showed evidence of overestimation in the highest risk decile [E/O: 1.54 (0.81–2.92)].

Fig. 1

Calibration and discrimination of 5-year risk predictions of breast cancer for women younger than 50 years in the nested case-control sample of the Generations Study cohort, with risk categories based on deciles of predicted 5-year absolute risk. Validation results are shown for the original BOADICEA model, which incorporates pedigree-level family history information; its two extensions, which add to the original model (i) the recently developed PRS based on 313 common germline variants and (ii) the 313-variant PRS together with reproductive and lifestyle factors; and the IBIS model (version 8.0) after including the 313-variant PRS. Estimates and 95% CIs of the calibration slope and intercept are reported based on a linear regression relating the decile-specific observed proportion of cases within 5 years to the average predicted 5-year absolute risk. AUC = area under the curve, χ² = chi-square goodness-of-fit test, BOADICEA = Breast and Ovarian Analysis of Disease Incidence and Carrier Estimation Algorithm, IBIS = International Breast Cancer Intervention Study, PRS = polygenic risk score, E/O = expected to observed number of cases, CI = confidence interval
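The calibration slope and intercept described in the caption can be sketched as an ordinary least-squares fit with one point per risk decile. This minimal Python sketch assumes that the decile-specific observed proportion is the outcome and the average predicted risk is the covariate, so that perfect calibration corresponds to a slope of 1 and an intercept of 0; the function name is hypothetical, and the 95% CIs reported in the figures are not computed here.

```python
from typing import Sequence, Tuple


def calibration_slope_intercept(
    mean_predicted: Sequence[float], observed_proportion: Sequence[float]
) -> Tuple[float, float]:
    """Hypothetical helper: OLS fit of observed proportion on mean predicted risk,
    one point per risk decile. Returns (slope, intercept)."""
    n = len(mean_predicted)
    if n != len(observed_proportion) or n < 2:
        raise ValueError("need matching sequences with at least two risk groups")
    mx = sum(mean_predicted) / n
    my = sum(observed_proportion) / n
    sxx = sum((x - mx) ** 2 for x in mean_predicted)
    sxy = sum((x - mx) * (y - my)
              for x, y in zip(mean_predicted, observed_proportion))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept
```

Under this parameterization, a slope below 1 suggests predictions that are too extreme (too high at the top of the risk distribution and too low at the bottom), whereas a slope above 1 suggests predictions that are too compressed.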

The original and extended BOADICEA models also showed good calibration of relative and absolute risk for women aged 50 years or older (Fig. 2), particularly for women in the highest risk decile [E/O: 0.95 (0.56–1.62) for the original model, 1.07 (0.63–1.82) for the extended model with PRS, and 1.09 (0.66–1.80) for the extended model with PRS and risk factors]. In this age group, incorporating the PRS and risk factors led to a substantial improvement in AUC, from 56.8% (52.9–60.6%) to 64.6% (60.9–68.2%). Adding risk factors substantially improved the discrimination of the original model (data not shown) and of the extended model with PRS (Fig. 2). The Tyrer-Cuzick model with PRS had discrimination comparable to that of the extended BOADICEA model with PRS and risk factors; however, it substantially overestimated risk for women in the highest risk decile [E/O: 1.73 (1.03–2.90)]. Overestimation of risk in the high-risk deciles was present in the Tyrer-Cuzick model both with and without the PRS (Supplementary Fig. 2).

Fig. 2

Calibration and discrimination of 5-year risk predictions of breast cancer for women aged 50 years or older in the nested case-control sample of the Generations Study cohort, with risk categories based on deciles of predicted 5-year absolute risk. Validation results are shown for the original BOADICEA model, which incorporates pedigree-level family history information; its two extensions, which add to the original model (i) the recently developed PRS based on 313 common germline variants and (ii) the 313-variant PRS together with reproductive and lifestyle factors; and the IBIS model (version 8.0) after including the 313-variant PRS. Estimates and 95% CIs of the calibration slope and intercept are reported based on a linear regression relating the decile-specific observed proportion of cases within 5 years to the average predicted 5-year absolute risk. AUC = area under the curve, χ² = chi-square goodness-of-fit test, BOADICEA = Breast and Ovarian Analysis of Disease Incidence and Carrier Estimation Algorithm, IBIS = International Breast Cancer Intervention Study, PRS = polygenic risk score, E/O = expected to observed number of cases, CI = confidence interval

Discussion

Our study shows that the extended BOADICEA model, which adds reproductive and lifestyle factors and the 313-variant PRS to family history information, predicted 5-year absolute risk of breast cancer more accurately than the Tyrer-Cuzick model with the same PRS for women in the highest risk decile in the Generations Study, a UK-based prospective cohort.

Previous studies in women of European ancestry provided evidence that the Tyrer-Cuzick model without PRS overestimates absolute risk for women in the highest risk decile [7, 14]. Two recent studies that incorporated PRSs with fewer genetic variants into this model showed good calibration in terms of relative risk but did not evaluate absolute risk calibration [12, 15]. Our results showed overestimation of absolute risk for women in the high-risk deciles, possibly because the family history association was not attenuated to account for the substantial familial aggregation explained by the PRS. This can inflate breast cancer risk estimates, particularly for women with a family history of breast cancer, who are more prevalent in the high-risk deciles. Accounting for the correlation between the PRS and family history would likely reduce this overestimation; future studies are needed to investigate the extent of this reduction.
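The mechanism proposed above can be illustrated with a minimal Python sketch that combines family history and PRS relative risks multiplicatively, with and without attenuating the family history term. All inputs are hypothetical; absolute risk is computed under a simple proportional hazards assumption that ignores competing mortality; and the attenuation heuristic (shrinking the family history log relative risk by the share of familial aggregation attributed to the PRS) is an illustrative assumption, not the adjustment implemented in BOADICEA or Tyrer-Cuzick. A real model would also recalibrate the baseline so that average population risk is preserved; that step is omitted.

```python
import math


def absolute_risk(baseline_cum_hazard: float, relative_risk: float) -> float:
    """Absolute risk over the interval under proportional hazards,
    ignoring competing mortality (simplifying assumption)."""
    return 1.0 - math.exp(-baseline_cum_hazard * relative_risk)


def attenuate_family_history_rr(rr_fh: float, share_explained_by_prs: float) -> float:
    """Hypothetical heuristic: shrink log(RR_FH) by the share of familial
    aggregation (on the log scale) assumed to be captured by the PRS."""
    return math.exp((1.0 - share_explained_by_prs) * math.log(rr_fh))


# Hypothetical inputs for illustration only; not estimates from this study.
H0 = 0.01       # 5-year baseline cumulative hazard
RR_FH = 2.0     # relative risk associated with a given family history
RR_PRS = 1.8    # relative risk for a woman with a high PRS
SHARE = 0.20    # ~20% of familial aggregation explained by the PRS (see Methods)

unadjusted = absolute_risk(H0, RR_FH * RR_PRS)
attenuated = absolute_risk(H0, attenuate_family_history_rr(RR_FH, SHARE) * RR_PRS)
print(f"5-year risk, unadjusted product: {unadjusted:.4f}")
print(f"5-year risk, attenuated family history: {attenuated:.4f}")
```

Because the unadjusted product counts twice the part of the family history association that the PRS already captures, it gives the higher risk; under this heuristic the ratio of the two combined relative risks is RR_FH raised to the power SHARE, so the gap grows with the strength of the family history.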

Strengths of the current analyses include the use of the Generations Study, a relatively recent population-based cohort with a wide age range of participating women, and the comparison of two widely used risk prediction tools that can incorporate PRS. With the increasing availability of PRS (e.g., in the USA), rigorous comparative evaluation of models that combine PRS with other risk factor information is critical to assess their suitability for clinical and research applications. Moreover, model calibration was assessed both overall and within risk categories, in particular for women at the extremes of risk, for whom prevention and screening are most relevant. The BOADICEA model and its extensions have already been implemented in the CanRisk tool, and the current study provides some evidence that this tool gives accurate risk predictions for the UK general population. Further evaluation of this tool in both general and high-risk populations is needed before widespread clinical application. Future research on risk model development and validation for women of non-European ancestry is also warranted.

To summarize, the extended BOADICEA model with PRS and reproductive/lifestyle factors identified women of European ancestry at elevated 5-year risk of breast cancer more accurately than the Tyrer-Cuzick model with PRS. As disease risk prediction with PRS becomes more available through genetic services in some countries (e.g., the USA), these and similar analyses may help inform the choice of risk models for developing risk-stratified breast cancer prevention and screening strategies for women of European ancestry.