Breast cancer pathology and stage are better predicted by risk stratification models that include mammographic density and common genetic variants

Purpose To improve breast cancer risk stratification to enable more targeted early detection/prevention strategies that will better balance risks and benefits of population screening programmes. Methods We studied 9362 of 57,902 women in the Predicting-Risk-Of-Cancer-At-Screening (PROCAS) study who were unaffected by breast cancer at study entry and provided DNA for a polygenic risk score (PRS). The PRS was analysed alongside mammographic density (density residual, DR) and standard risk factors (Tyrer-Cuzick model, TC) to assess future risk of breast cancer by tumour stage, receptor expression and pathology. Results For the 195 prospective incident breast cancers, the prediction based on TC/DR/PRS was informative for subsequent breast cancer overall [IQR-OR 2.25 (95% CI 1.89–2.68)] with excellent calibration (0.99). The model performed particularly well in predicting stage 2+ cancers [IQR-OR 2.69 (95% CI 2.02–3.60)] and ER+ BCs [IQR-OR 2.36 (95% CI 1.93–2.89)]. DR was most predictive for HER2+ and stage 2+ cancers but did not discriminate as well between poor and extremely good prognosis BC as either Tyrer-Cuzick or PRS. In contrast, PRS gave the highest OR for incident stage 2+ cancers [IQR-OR 1.79 (95% CI 1.30–2.46)]. Conclusions A combined approach using Tyrer-Cuzick/DR/PRS provides accurate risk stratification, particularly for poor prognosis cancers. This provides support for reducing the screening interval in high-risk women and increasing the screening interval in low-risk women defined by this model. Electronic supplementary material The online version of this article (10.1007/s10549-019-05210-2) contains supplementary material, which is available to authorized users.


Introduction
Breast cancer is the most commonly diagnosed cancer in women worldwide. In familial breast cancer, just over half of all cases are explained by a known genetic component [1][2][3], predominantly pathogenic variants in BRCA1 or BRCA2 and single nucleotide polymorphisms (SNPs). SNPs account for more of the familial risk than all pathogenic variants in high or moderate-risk breast cancer genes [1][2][3]. SNPs also explain a large proportion of risk in women who develop breast cancer without a family history. Therefore, at a population level, SNPs are more informative than screening for moderate and high-risk gene variants [4,5]. Depending on the genotype of susceptibility SNPs (i.e. 0, 1 or 2 risk alleles) and the individual odds ratios for each risk allele, a risk estimate can be derived to create a polygenic risk score (PRS) [6].
At present, breast cancer risk prediction models include classical risk factors, for example, current age, family history, age at menarche, first full-term pregnancy and menopause, body mass index, type and number of breast biopsies and use of hormone replacement therapy (HRT; dose, type and duration) [7,8]. In addition, high mammographic density is a well-established breast cancer risk factor, and several studies show that incorporating mammographic density improves the accuracy of risk prediction models [9,10]. Recent studies have considered the value of including SNP genotype data in risk prediction algorithms, with promising results [11][12][13][14].
We collected data on classical risk factors, mammographic density and 18 breast cancer susceptibility SNPs (SNP18) on women who did not have breast cancer at entry to PROCAS [14,15]. Recently, we showed that by combining mammographic density and SNP18 data with the Tyrer-Cuzick (TC) risk prediction model v6, women aged 46-73 years could be accurately divided into four 10-year risk groups (< 2%, low; 2-3.49%, average; 3.5-4.99%, above average; and ≥ 5%, moderate/high) [14][15][16]. However, improvements in risk stratification are required, to define groups more precisely and reduce the large numbers at average risk.
Here, we report on the incidence rates and pathology in risk groups defined by TC/mammographic density/SNP18 in the PROCAS study.

Methods
A total of 57,902 women aged between 46 and 73 years from the Greater Manchester area were recruited to the PROCAS study between October 2009 and June 2015. Women were recruited at the time of attendance for mammographic screening in the National Health Service Breast Screening Programme (NHSBSP). Standard breast cancer risk factors were collected using self-completed two-page questionnaires. Saliva samples were collected from 9899 women after their initial study mammogram at drop-in days at several centres in Greater Manchester. In addition, samples were specifically collected from women with breast cancer (invasive breast cancer or ductal carcinoma in situ) subsequently diagnosed after recruitment to the study. All saliva samples were collected before January 2014.
The PROCAS study was approved by the North Manchester Research Ethics Committee (Ref. 09/H1008/81).
Saliva samples were collected to extract DNA for SNP genotyping. DNA samples were stored at − 20 °C. The 18 SNPs (Supplemental Table 1) were genotyped as previously described [14,15], blinded to whether the patient had developed breast cancer, by a custom designed Sequenom MassARRAY iPLEX assay or TaqMan® SNP Genotyping Assay. Per-allele odds ratios (OR) were derived from published OR and allele frequency as described previously by normalising around a relative risk of 1.0 [14,15]. Briefly, the PRSs were calculated by multiplying the per-allele OR for each SNP (when a single SNP failed the woman was given an arbitrary score of 1.0 for that SNP). The PRS was used in further statistical analyses. Mammographic density was estimated by two readers using visual analogue scales, as previously described [10]. Density was adjusted for BMI and age and reported as a 'density residual' (DR) and was also expressed as a predictive odds ratio [10]. Women with bilateral breast cancer on prevalent study screen or with breast implants had no assessable VAS score and were given a nominal OR of 1.0 for density residual in the combined analysis of all three measures.
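The PRS calculation described above can be illustrated with a minimal sketch. It assumes a multiplicative per-allele model and Hardy-Weinberg genotype frequencies for the normalisation around a relative risk of 1.0; the exact normalisation is given in the cited references [14,15], so these are assumptions of the sketch. The SNP names, odds ratios and allele frequencies are hypothetical and are not taken from Supplemental Table 1.

```python
from math import prod


def normalised_genotype_ors(per_allele_or, risk_allele_freq):
    """Return ORs for 0, 1 and 2 risk alleles, scaled so the population mean is 1.0.

    Assumption of this sketch: multiplicative per-allele effect and
    Hardy-Weinberg genotype frequencies.
    """
    p = risk_allele_freq
    raw = [1.0, per_allele_or, per_allele_or ** 2]
    freqs = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
    mean_risk = sum(f * r for f, r in zip(freqs, raw))
    return [r / mean_risk for r in raw]


def polygenic_risk_score(genotypes, snp_table):
    """Multiply the normalised OR of every SNP; a failed genotype (None) scores 1.0."""
    factors = []
    for snp_id, (per_allele_or, freq) in snp_table.items():
        n_risk_alleles = genotypes.get(snp_id)
        if n_risk_alleles is None:
            factors.append(1.0)  # failed SNP: arbitrary score of 1.0, as in the text
        else:
            ors = normalised_genotype_ors(per_allele_or, freq)
            factors.append(ors[n_risk_alleles])
    return prod(factors)


# Hypothetical two-SNP panel: snp_id -> (per-allele OR, risk allele frequency)
snp_table = {"snp_a": (1.20, 0.30), "snp_b": (1.10, 0.50)}
prs = polygenic_risk_score({"snp_a": 1, "snp_b": None}, snp_table)
```

Because each SNP is normalised to a population mean of 1.0, a woman with an average genotype profile receives a PRS close to 1.0, and a failed SNP leaves the product unchanged.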
TC v6 10-year risk was calculated based on the questionnaires completed by PROCAS participants at study entry. The questionnaire included data on age, age at first full-term pregnancy, BMI (from height/weight), number of affected first and second-degree relatives, history of previous breast biopsy, parity and ethnicity. We have previously demonstrated very low correlation between SNP18 and TC, and between SNP18 and DR [15]. Thus, no adjustments were made for the SNP18 PRS when it was incorporated into the 10-year risk estimate, as SNP18 was almost completely calibrated (observed to expected odds ratio for SNP18 was 0.99, 95% CI 0.70-1.26).
Clinical endpoints examined in the present study were breast cancer characteristics obtained from histopathology reports: invasive tumour vs ductal carcinoma in situ (DCIS), invasive tumour grade, stage, and estrogen receptor (ER) and human epidermal growth factor receptor 2 (HER2) status. Further, breast cancer cases were subdivided into a category of extremely good prognosis (EGP), which has not been defined previously but was used to capture cancers that would either have been unlikely to cause a patient death or might never have presented clinically, i.e. invasive tumours that were both stage 1 and grade 1 occurring in those > 60 years, intermediate grade DCIS in those > 55 years, or low-grade DCIS at any age.
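The EGP definition can be written as a simple rule, sketched below. The function name, argument names and category labels are hypothetical, and the sketch assumes that the age thresholds refer to age at diagnosis, which the text does not state explicitly.

```python
def is_extremely_good_prognosis(tumour_type, grade, stage, age_at_diagnosis):
    """Apply the EGP rule from the Methods.

    tumour_type: "invasive" or "dcis" (hypothetical labels).
    grade: 1, 2 or 3 for invasive tumours; "low", "intermediate" or "high" for DCIS.
    stage: tumour stage for invasive tumours (ignored for DCIS).
    age_at_diagnosis: assumed to be the age the thresholds refer to.
    """
    if tumour_type == "invasive":
        # Stage 1 and grade 1, in women older than 60 years
        return stage == 1 and grade == 1 and age_at_diagnosis > 60
    if tumour_type == "dcis":
        if grade == "low":
            return True  # low-grade DCIS at any age
        if grade == "intermediate":
            return age_at_diagnosis > 55
    return False
```

For example, `is_extremely_good_prognosis("dcis", "intermediate", None, 58)` is classed as EGP, whereas a grade 1, stage 1 invasive tumour at age 60 is not, because the threshold is strictly greater than 60 years.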
10-year breast cancer risk was stratified into four main groups: < 2%; 2-3.49%; 3.5-4.99% and ≥ 5% risk, as previously described [15], the latter group combining the UK National Institute for Health and Care Excellence (NICE) defined high and moderate risk groups, for which additional screening and chemoprevention are recommended [16]. Additionally, the moderate/high-risk group was split further as per NICE guidelines into moderate (5-7.99%) and high-risk (8%+), and an additional very low risk group was defined with a < 1.5% 10-year risk (the mean 10-year population risk at age 40 years, at which screening is not recommended in the UK). Prospective follow-up was censored at date of BC diagnosis, date of death (from NHS Hospital Episode Statistics data) or the date cancer databases were last checked via the cancer registries (30/09/2017).
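The thresholds above translate directly into a lookup, sketched here for illustration. The function names are hypothetical; the group labels follow those used in the Introduction, and risk is assumed to be supplied as a percentage.

```python
def ten_year_risk_group(risk_percent):
    """Assign one of the four main 10-year risk groups."""
    if risk_percent < 2.0:
        return "low"
    if risk_percent < 3.5:
        return "average"
    if risk_percent < 5.0:
        return "above average"
    return "moderate/high"


def nice_subgroup(risk_percent):
    """Split the >= 5% group into NICE moderate (5-7.99%) and high (8%+)."""
    if risk_percent >= 8.0:
        return "high"
    if risk_percent >= 5.0:
        return "moderate"
    return None  # below the NICE moderate-risk threshold


def is_very_low_risk(risk_percent):
    """Very low risk: below 1.5% 10-year risk (a subset of the < 2% group)."""
    return risk_percent < 1.5
```

Note that the very low risk group is nested within the low-risk group, so a woman with a 1.2% 10-year risk is counted in both.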
Incident breast cancers were those occurring after DNA sample collection. DR and TC were also studied in the wider PROCAS cohort.

Statistical methods
Two sets of cases were analysed: (1) all breast cancers diagnosed subsequent to completion of the questionnaire and (2) those diagnosed subsequent to saliva collection for analysis of DNA.
Odds ratios per interquartile range (IQR-OR) with 95% Wald confidence intervals were calculated by logistic regression, using the natural logarithm for each of the risk factors assessed (TC, DR, SNP18), to examine the relationship between cases and controls for the different pathological types (invasive tumour grade, DCIS, stage 2+, ER+, ER−, HER2+ and all breast cancer).
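As an illustration of how an odds ratio per interquartile range can be derived, the sketch below converts a fitted log-odds coefficient for a log-transformed risk factor into an IQR-OR with a Wald interval. It assumes that the coefficient `beta` and its standard error `se` have already been estimated by logistic regression, and that the IQR is taken over the supplied log-transformed values; which women contribute to the quartiles in the actual analysis is not stated in the text, so that choice is an assumption here.

```python
import math
import statistics


def iqr_odds_ratio(beta, se, log_values, z=1.96):
    """Return (IQR-OR, lower, upper) for a log-transformed risk factor.

    beta, se: log-odds coefficient per unit of the log risk factor and its
              standard error (assumed to come from a logistic regression).
    log_values: natural-log risk factor values used to define the quartiles.
    """
    q1, _, q3 = statistics.quantiles(log_values, n=4, method="inclusive")
    iqr = q3 - q1
    point = math.exp(beta * iqr)
    lower = math.exp((beta - z * se) * iqr)
    upper = math.exp((beta + z * se) * iqr)
    return point, lower, upper
```

Scaling by the IQR puts risk factors measured on different scales (TC, DR and SNP18) on a comparable footing, since each OR then compares the 75th with the 25th percentile of its own distribution.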
In the prospective analysis following saliva collection, relative risks were estimated with 95% confidence intervals from an exact Poisson method. Expected risk from the model was obtained by converting expected incidence to cumulative hazard i.e., cumulative hazard H = − log (1 − incidence) and summing.
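The expected count and an observed-to-expected ratio with an exact Poisson interval can be sketched as follows. The interval shown is the standard exact (Garwood-type) construction found by bisection on the Poisson distribution function; the text does not specify which exact method was used, so this is an assumption, and all function names are hypothetical.

```python
import math


def expected_cases(individual_risks):
    """Sum cumulative hazards H = -log(1 - incidence) over all women."""
    return sum(-math.log(1.0 - r) for r in individual_risks)


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu), summed term by term."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total


def _bisect_mean(k, target, lo, hi, iterations=200):
    """Find mu with poisson_cdf(k, mu) == target; the CDF falls as mu rises."""
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if poisson_cdf(k, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def exact_poisson_interval(observed, alpha=0.05):
    """Exact two-sided interval for a Poisson mean given an observed count."""
    search_max = observed + 10.0 * math.sqrt(observed + 1.0) + 10.0
    if observed == 0:
        lower = 0.0
    else:
        lower = _bisect_mean(observed - 1, 1.0 - alpha / 2.0, 0.0, search_max)
    upper = _bisect_mean(observed, alpha / 2.0, 0.0, search_max)
    return lower, upper


def observed_to_expected(observed, expected, alpha=0.05):
    """Return the O:E ratio with its exact Poisson confidence limits."""
    lower, upper = exact_poisson_interval(observed, alpha)
    return observed / expected, lower / expected, upper / expected


# Counts reported in the Results: 195 observed, 195.5 expected by TC/DR/SNP18
ratio, ratio_lower, ratio_upper = observed_to_expected(195, 195.5)
```

Summing cumulative hazards rather than raw incidences keeps the expected count consistent with a Poisson model for the observed number of cases.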

DNA cohort
Of the 9899 women with saliva DNA, 537 were excluded as having previously been diagnosed with breast cancer. Of the remaining 9362 women unaffected at study entry, 270 were diagnosed at the prevalent mammogram in PROCAS with 30 being diagnosed after their prevalent study mammogram but before saliva DNA collection. This left 9062 who had not been diagnosed with breast cancer before DNA collection.
In these 9062 women there were 56,057.6 women-years of follow-up (mean 6.19 years, IQR 5.46-6.96), indicating that most women had two further screening mammograms within the study period. Two were lost to follow-up and 167 had died at the time of analysis, 6 from an incident breast cancer.

Incident breast cancers
There were 195 incident breast cancers (26 (13.3%) of which were DCIS) with 195.5 expected by a combined TC/DR/SNP18 risk calculation (O:E ratio = 0.997). There were 184.3 expected by TC assessment alone (Table 1). The 10-year rates of breast cancer in each group were accurately predicted (Fig. 1), with point estimates within the expected range, even in the very low-risk group (< 1.5%). The combined model performed well in predicting subsequent breast cancer overall (IQR-OR 2.25 (1.89-2.68)).
Table 1 Women unaffected by breast cancer at entry with DNA collected, by combined 10-year risk groups. (a) The 1.5% risk group includes women also in the < 2% group, to assess which is the better low-risk threshold

All with DNA including prevalent
Utilising all the available tumour pathology from prevalent (only contralateral breast density was used, as previously described [14,15]) and incident cancers from 9362 women combined, TC, DR and SNP18 were individually predictive of breast cancer as a whole, as well as most pathological subtypes. However, the strength of prediction of each individual risk factor (TC, DR and SNP18) varied considerably for several tumour subtypes (Fig. 2).
In view of the potential benefits of lower prediction probabilities when the cancer is grade 1 or EGP, where there is a great risk of overtreatment, we looked at the full TC/DR/SNP18 prediction model for such cancers (Table 3). Grade 1 cancers made up a higher proportion of all invasive breast cancers in the low-risk group: 23/67 (34.3%) compared to 15/66 (22.7%) in the moderate and 8/58 (13.8%) in the high-risk group (P = 0.007; Table 3). The proportion of EGP as defined in the methods was also significantly higher at 20/85 (23.5%) in the low-risk group compared to the moderate/high-risk group at 16/155 (10.3%); P = 0.008 (Table 1).

Of the women who have died in the whole DNA/PROCAS cohort, thirteen died as a result of a breast cancer diagnosis since entering PROCAS. Nine of the thirteen deaths (69%) were in those with a 10-year risk > 3.5% and nine of the thirteen were stage 2+ at diagnosis. The TC/DR/SNP18 score may be useful to define a group of women attending for their first population screening mammogram (aged 46-52 years in the UK) who do not require screening for a 10-year period. For example, the group aged 46-52 years at < 2% 10-year risk in PROCAS had a mean 10-year incidence of only 1.4% (5 of 547 developed breast cancer after 3383.27 years of follow-up), and for the 262 women at < 1.5% 10-year risk there was only 1 breast cancer in 1648.43 years, giving a rate of 0.6 per 1000 women-years (approximately 0.6% over 10 years).

Discussion
We have shown that combining TC, DR and SNP18 improves the accuracy of breast cancer risk stratification over each factor independently and helps predict which women are more likely to develop better and worse prognosis cancers. Previously, we demonstrated that there was a higher proportion of cancers that are interval and stage 2+ in women at high/moderate risk than in those at low risk [15]. The present study adds a further 1.3 years of prospective follow-up, 30 more incident cancers and details of the pathology of the breast cancers diagnosed. We are unable to find any previous report of a prospective risk stratification study that has shown differences in grade and stage of breast cancers in relation to risk. The results suggest that, in a 3-yearly mammography screening programme, more frequent imaging, potentially including newer techniques such as automated breast ultrasound (ABUS), contrast-enhanced mammography or MRI, may be indicated to down-stage breast cancers in the moderate/high-risk groups. Although high stage invasive cancers still occur in the low-risk group, this has to be balanced against the higher proportion of extremely good prognosis cancers, which is more than double that in the high/moderate-risk group (23.5% vs 10.3%). A case could be made to assess women at screening entry at around 46-50 years of age; those at a 10-year risk of < 2%, or perhaps < 1.5%, could be counselled that screening is not indicated now and that they should be reassessed in a further 10 years. In countries with 2-yearly screening programmes, reducing the frequency to 3-yearly in the 33% at < 2% risk seems a reasonable option to offset the cost of the extra screening suggested in those at above average risk.
Although the commercial cost of a SNP PRS is relatively high, an Illumina onco-array, which allows testing of over 300 potential breast cancer SNPs, currently costs around US$70 per person; with saliva collection, DNA extraction and analysis, a cost of US$100 is feasible, and the test would only need to be undertaken once.
The high proportion of stage 2+ and interval cancers in women with high mammographic density is likely to be due in part to 'masking' of smaller cancers in areas of dense fibroglandular tissue on mammography. This does not, however, explain the overall better prediction of post-prevalent stage 2+ cancers using SNP18.
Before any such change to screening intervals is introduced, its effectiveness could be enhanced by better identification of the worse prognosis ER− breast cancers. Although TC risk does to some extent predict these cancers, it is less effective than for ER+ cancers. SNP18 has little predictive value, as would be expected, since the majority of individual SNPs are associated with ER+ tumours [17]; only three predict ER− cancer risk, and these are also predictive for breast cancer in women with BRCA1 pathogenic variants, where the cancers are predominantly ER− [6]. The recent identification of ten new SNPs for ER− disease [18], alongside ten that were already discovered, may provide a SNP20 for ER− breast cancer. A SNP20 PRS for ER− breast cancer would likely provide a more accurate prediction of the more lethal ER− disease along with TC and density. Assuming 10-15% of breast cancers in the age range 46-73 years are ER−, an acceptably low total risk of breast cancer of < 2%, or perhaps < 1.5%, alongside a rate of ER− disease of < 0.3%, could constitute a reasonable threshold to delay further screening until reassessment at 10 years, unless risk factors change in the interim.
We have identified a strong link between mammographic density and HER2+ breast cancer and this has also been highlighted recently [19].
The present study has limitations. We have used a definition for extremely good prognosis breast cancer that is likely to contain the great majority of potential overdiagnosis. However, grade 1 stage 1 breast cancers are still capable of causing breast cancer deaths, and if women are removed from screening these cancers are likely to reach a higher stage before becoming symptomatic. Equally, intermediate grade DCIS could become at least a grade 2 invasive breast cancer if untreated, with the possibility of a later breast cancer-related death. Some data were missing: two women were lost to follow-up and 18/407 (4.4%) invasive breast cancers had missing grade and ER status. Strengths include the large number of fully genotyped women in the prospective analysis with mammographic density information.
In conclusion, in this study, we report that the risk groups not only define incidence rates but also the pathology and prognosis of the breast cancers that develop in them, with important implications for screening and preventive strategies. The current study confirms the added accuracy of risk prediction using a combined TC/DR/SNP18 approach and that it can define a sizeable group (32.5%) of women who have a low (< 2%) 10-year risk. Furthermore, cancers identified in this group are significantly more likely to have an extremely good prognosis [17]. This work provides important evidence for a risk-stratified approach to breast cancer screening with an assessment at first mammogram around age 45-50 where extra screening in the high-risk group to reduce the risk of stage 2+ cancers could be offset by reducing or eliminating screening in the larger low-risk group, where the benefits of screening may be outweighed by false-positive screens and the potential for over-diagnosis and over-treatment.