Introduction

Breast cancer risk models estimate the chance that a woman will develop breast cancer in the future, and more accurate assessment is needed to guide prevention and screening strategies [1]. Risk is often assessed using the Gail (or Breast Cancer Risk Assessment Tool) and Tyrer-Cuzick [or International Breast Intervention Study (IBIS)] models [2–5]. The Gail model was originally developed using a case–control study of women attending screening in the United States [4] with invasive and ductal carcinoma in situ (DCIS) cases, but its absolute rates are calibrated to invasive cancer. The Gail model is based on eight questions, covering age, hormonal factors, benign disease and the number of first-degree relatives affected by breast cancer, and it has been shown to be well calibrated in the general population [6]. The Tyrer-Cuzick model was developed by pooling relative risks from overview studies and was initially used to assess eligibility for a prevention trial (IBIS-I) [5]. It is calibrated to invasive and DCIS cancer rates and includes many of the Gail risk factors, although some are handled differently, including a more complex model for family history of the disease. The Tyrer-Cuzick model has not yet been validated in a prospective screening setting, but it has been compared with the Gail model in cohorts with a strong family history [7–9].

Mammographic density appears as white (radiopaque) areas on a mammogram, and it is often measured visually as a percentage of the total breast area. Dense breasts have more fibroglandular tissue and less fat than non-dense breasts, and it is well established that women with dense breasts are at higher risk of breast cancer [10]. Density could be measured routinely when a woman attends screening, but it is currently incorporated in neither the Tyrer-Cuzick model nor the Gail model. Some work to combine breast density with classical hormonal and familial risk factors has been based on the Breast Imaging-Reporting and Data System (BI-RADS) visual density classification [11], which has been seen to produce a relative risk of approximately 2–4-fold between the highest and lowest of its four categories [12]. Results from incorporating BI-RADS density into risk models have been mixed [13]: some have concluded that BI-RADS density added minimally to the Gail model, whereas others have shown that it adds useful information to the risk factors used in the Gail model [12, 14]. A limitation of BI-RADS density is that approximately 80 % of women fall into the middle two categories, where the risk difference is more modest [12].

Another visually assessed density measure is the percentage of the area of the breast containing fibroglandular tissue. Methods for this have been observed to produce a 4–6-fold risk difference for dense versus non-dense breasts [15], and they predict response to tamoxifen prevention [16] and both tamoxifen and aromatase inhibitors in the adjuvant setting [17, 18]. Some previous work has found continuous measures of percentage density to be useful in combination with classical risk factors. In particular, Chen et al. [19] conducted a case–control study of women recruited to a screening study in the United States during the 1970s, and Warwick et al. [20] reported a nested case–control study of women at high risk of breast cancer from the IBIS-I trial, mostly in the 1990s. Visual assessment has significant drawbacks, including the time needed if using a computer-aided system such as Cumulus [21], as well as inter- and intrareader variability [22]; however, for risk prediction, it is currently the standard by which to judge newer methods because it has consistently been shown to be a strong risk factor [15].

The objective of this study was to assess whether visually assessed percentage density might improve the Tyrer-Cuzick and Gail risk models for risk assessment of women attending screening in the United Kingdom. We did so using a prospective screening cohort of women enrolled in the Predicting Risk of Breast Cancer at Screening (PROCAS) study from Manchester, UK [23].

Methods

Cohort

All women invited for routine mammographic screening between October 2009 and September 2013 across 15 screening areas in Greater Manchester, UK, were mailed a questionnaire, study information and a consent form. The two-page questionnaire was designed to collect family history as well as hormonal and lifestyle risk factors for breast cancer (http://www.uhsm.nhs.uk/research/Documents/PROCAS%20Questionnaire.pdf). Each completed questionnaire was imported into a database and verified using a set of rules to check for inconsistencies. Screening mammograms were collected and stored. The earliest mammograms were film (20 %), but the majority used GE Senographe Essential full-field digital mammography (GE Healthcare, Chalfont St Giles, UK).

Ethics, consent and permissions

Consent was obtained at the time of screening. The study was approved by Central Manchester Research Ethics Committee (reference 09/H1008/81).

Study design

The primary clinical endpoint was diagnosis of breast cancer [International Classification of Diseases, Tenth revision, codes C50/D05: invasive breast cancer/ductal carcinoma in situ (DCIS)] from entry screen onwards, as identified through the National Health Service Breast Screening Programme (NHSBSP) system and the Somerset and North West Cancer Intelligence services. In secondary analysis, we considered invasive breast cancer only. Prospective breast cancers occurred between October 2009 and September 2014, and the median follow-up was 3.2 years. Cancers detected less than 100 days after enrolment were defined as being detected at the entry screen.

Two widely used risk models were chosen before the study began: (1) Tyrer-Cuzick (IBIS version 6.0: http://www.ems-trials.org/riskevaluator/) [5] and (2) Gail (April 2014 version: http://www.cancer.gov/bcrisktool/) [4]. Both were used to assess 10-year risk. The questionnaire had three limitations for the models: (1) limited information was collected on unaffected relatives (Tyrer-Cuzick); (2) previous breast biopsy was recorded, but not the number of biopsies (Gail); and (3) the type of benign disease (including proliferative disease and atypical hyperplasia) was not recorded (Tyrer-Cuzick). Unaided visually assessed density was the primary density measure. It was assessed using all available screening mammograms (usually four: craniocaudal and mediolateral oblique views of each breast) and scored independently by two readers on a standard visual analogue scale from 0 % to 100 %; the percentages were scanned using computer software. In total, 18 professionals assessed density: 10 radiologists, 2 breast physicians and 6 advanced practitioner radiographers, many of whom had participated in an earlier study of density [24].

For women without breast cancer, the mean percentage from two readers and four mammogram views was used; for all cancers, only the contralateral breast was used. For practical reasons, both breasts were assessed for all participants, including those diagnosed with breast cancer at the entry screen. Mammograms were therefore reread to assess possible bias from readers scoring density higher in women they thought had cancer. The four readers who had carried out the most readings reassessed the 101 densest contralateral breasts from cancers diagnosed at the first screen and 101 non-cancers matched by density and year of acquisition. Each mammogram was reread by two readers independently and blinded to case status. On rereading, mean density decreased from 49.5 % to 44.2 % in cancers and from 49.8 % to 42.4 % in non-cancers, which was likely due to regression to the mean from selecting the highest-density mammograms. Density dropped by 2.0 % (95 % CI −0.7 to 4.7 %) more for non-cancers than for cancers (P = 0.182 by Wilcoxon test), so we concluded that any potential bias in risk estimates was negligible.

Assay methods

Tumour pathology characteristics were assessed in a standardised manner as required by pathologists reporting in the NHSBSP [25].

Statistical analysis

The study was designed so that 600 screen-detected and interval breast cancers were expected between the first two screening rounds. This gave the study approximately 90 % power at 5 % two-sided significance to detect a hypothetical binary breast cancer risk factor with a relative risk of 1.3 present in 50 % of the population, 1.5 in 30 % of the population or 2.0 in 15 % of the population.
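The paragraph does not state how the power figures were derived. One common approximation is sketched below under our own assumptions: a two-sided Wald test on the log odds ratio, a rare outcome so that the OR approximates the relative risk, exposure prevalence among non-cases equal to the population prevalence, and about 50,000 non-cases. Under these assumptions the first scenario (relative risk 1.3 in 50 %) gives roughly 89 % power; because the authors' exact method is not reported, the sketch need not reproduce the quoted figures for the other scenarios. The function name `power_binary_factor` is hypothetical.

```python
import math
from statistics import NormalDist


def power_binary_factor(rr, prevalence, n_cases=600, n_controls=50_000, alpha=0.05):
    """Approximate power of a two-sided Wald test on the log odds ratio for a
    binary risk factor (assumption: rare outcome, so OR is close to RR)."""
    # Expected exposure prevalence among cases under a multiplicative relative risk
    p_case = prevalence * rr / (prevalence * rr + 1 - prevalence)
    # Assumption: non-cases have the population exposure prevalence
    p_ctrl = prevalence
    log_or = math.log((p_case / (1 - p_case)) / (p_ctrl / (1 - p_ctrl)))
    # Woolf variance of the log OR from the expected 2x2 cell counts
    var = (1 / (n_cases * p_case) + 1 / (n_cases * (1 - p_case))
           + 1 / (n_controls * p_ctrl) + 1 / (n_controls * (1 - p_ctrl)))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(log_or) / math.sqrt(var) - z_crit)


for rr, prev in [(1.3, 0.50), (1.5, 0.30), (2.0, 0.15)]:
    print(f"RR {rr} in {prev:.0%} of the population: power ~ {power_binary_factor(rr, prev):.2f}")
```

Using cell-count variances keeps the sketch transparent; a likelihood ratio test or a more conservative allowance for losses would give different numbers.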

Projected 10-year risk was taken as the primary predictor, partly because under current UK guidelines [3], women with a 10-year risk of 5–8 % or greater than 8 % would qualify for prevention and additional screening. Ten-year risk is also the default in the Tyrer-Cuzick model.
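A minimal sketch of the banding described above. The function name `uk_risk_group` is hypothetical, and counting exactly 5 % and exactly 8 % in the 5–8 % band is our assumption rather than a detail taken from the guideline, which should be consulted for the authoritative boundaries.

```python
def uk_risk_group(ten_year_risk_pct):
    """Map a projected 10-year risk (in %) to the bands quoted in the text.
    Boundary handling (5 % and 8 % fall in the 5-8 % band) is an assumption."""
    if ten_year_risk_pct > 8:
        return "greater than 8 %"
    if ten_year_risk_pct >= 5:
        return "5-8 %"
    return "below 5 %"


risks = [1.8, 4.9, 5.0, 7.6, 8.0, 9.3]
print([uk_risk_group(r) for r in risks])
```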

Measurement error in breast density was assessed by fitting a linear mixed-effects model by restricted maximum likelihood and computing the corresponding intraclass correlation coefficient [26]. Breast cancer risk factors used in the models were summarised using categories for continuous factors, chosen so that the reference group was either an established standard or an average group containing approximately the middle half of the cohort. Age-adjusted odds ratios (ORs) were estimated by logistic regression with age as a continuous covariate. Percentage mammographic density was adjusted for age and body mass index (BMI) via a ‘density residual’, obtained by fitting a linear regression of density against age, BMI and type of mammogram (digital or film) [20] (see Additional file 1). This made density more independent of the risk models, and combined density and risk model projections were obtained by multiplying the Tyrer-Cuzick or Gail model expected risk by the observed breast density risk. In the analysis of risk factors, we used ORs, profile likelihood confidence intervals (CIs) and likelihood ratio (LR) χ2 statistics from continuous predictors [each with 1 degree of freedom (df)].

Logistic regression models were fitted to assess the calibration of the predicted logarithmic ORs, with the observed risk modelled as the predicted risk, on the logarithmic OR scale, multiplied by an estimated calibration coefficient. Observed and expected ORs were plotted using a normal kernel smoother with bandwidth chosen by 10-fold cross-validation [27] and were also summarised by decile of predicted risk. Calibration of absolute risks was not assessed. The area under the receiver operating characteristic curve (AUC), with DeLong CIs [28], was a secondary measure of discrimination. Calibration and discrimination were also assessed by age subgroup, using a likelihood ratio test for interaction.
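The density residual and the multiplicative combination can be sketched as follows. The exact specification is in Additional file 1, so this sketch assumes an ordinary least-squares fit on the raw percentage scale and a log-linear density effect with a slope `beta` estimated elsewhere; the function names and the synthetic data are hypothetical.

```python
import numpy as np


def density_residual(density, age, bmi, is_digital):
    """Residual from an OLS fit of percent density on age, BMI and
    mammogram type (1 = digital, 0 = film)."""
    X = np.column_stack([np.ones_like(age), age, bmi, is_digital])
    coef, *_ = np.linalg.lstsq(X, density, rcond=None)
    return density - X @ coef


def combined_risk(model_risk, residual, beta):
    """Multiply a risk-model projection by a density relative risk.
    Assumption: log-linear density effect; because the fit includes an
    intercept the residuals average zero, so exp(beta * residual) is a
    relative risk against a woman with average adjusted density."""
    return model_risk * np.exp(beta * residual)


# Hypothetical synthetic cohort, for illustration only
rng = np.random.default_rng(1)
n = 1000
age = rng.uniform(47, 73, n)
bmi = rng.normal(27, 5, n)
is_digital = (rng.random(n) < 0.8).astype(float)
density = np.clip(90 - 0.4 * age - 0.9 * bmi - 3 * is_digital + rng.normal(0, 10, n), 0, 100)

resid = density_residual(density, age, bmi, is_digital)
print(round(float(resid.mean()), 6))
```

Working with a residual that is uncorrelated with age and BMI avoids double counting factors that the risk models already include.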

All P values were two-sided. Analysis was conducted using GNU-R version 2.15.1 statistical software [29].

Results

Cohort

Between October 2009 and September 2013, 201,187 women were invited to breast cancer screening and 130,332 attended (65 %). Of these, 51,744 women (40 %) consented to join the study; 750 had a breast cancer diagnosed from entry screen until the end of follow-up. To assess breast density as a risk factor, the following exclusions were made: 756 who had a previous diagnosis of breast cancer (29 with prospective cancer); 11 bilateral breast cancers and 7 for whom the side was unknown; 122 who had no visual assessment of breast density available (2 prospective cancers); 14 who were older than 73 years of age at enrolment (0 cancers); and 206 who had a breast implant (4 cancers). This left 50,628 women who were breast cancer–free before entry screen, with 697 breast cancers diagnosed after enrolment, of which 567 (81 %) were invasive, 128 were DCIS and 2 were unknown.
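The participant flow above can be checked with simple arithmetic. The sketch assumes, as the counts imply, that all 11 women with bilateral cancer and all 7 with unknown side were among the 750 cancers.

```python
# Participant flow from the Results text
invited, attended = 201_187, 130_332
consented_women, consented_cancers = 51_744, 750

# (women excluded, cancers among them)
exclusions = {
    "previous breast cancer": (756, 29),
    "bilateral breast cancer": (11, 11),
    "side of cancer unknown": (7, 7),
    "no visual density assessment": (122, 2),
    "older than 73 at enrolment": (14, 0),
    "breast implant": (206, 4),
}

remaining_women = consented_women - sum(w for w, _ in exclusions.values())
remaining_cancers = consented_cancers - sum(c for _, c in exclusions.values())

print(f"attendance {attended / invited:.0%}, consent {consented_women / attended:.0%}")
print(remaining_women, remaining_cancers, f"invasive {567 / remaining_cancers:.0%}")
```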

The majority [n = 556 (80 %)] of cancers were diagnosed at entry screen, 28 (4 %) between 100 days and 2.5 years after enrolment and 110 (16 %) more than 2.5 years after enrolment. For three women, the timing was unknown at the time of analysis. Ethnic or other origin was recorded for 48,807 women (96 %), and 46,491 women (92 % of the cohort) were reported as white (453 Jewish); the remainder were Asian (n = 739), black (n = 556), mixed race or ethnicity (n = 262) and other (n = 759).

The number of women assessed for breast density per reader ranged from 104 to 16,121 [interquartile range (IQR) 1600–9842]. The percentage of variance in density explained by reader differences was estimated to be 11 %. The median absolute difference in percentage density between readers was 10.00 % (IQR 4.75–17.75 %). The intraclass correlation coefficient between left and right sides was 93 %.
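The left–right intraclass correlation can be illustrated with a one-way random-effects estimator. The study fitted a linear mixed-effects model by restricted maximum likelihood; for balanced data with two readings per woman, the one-way ANOVA estimator below coincides with REML whenever the between-woman variance estimate is non-negative. It does not model crossed reader effects, so it does not reproduce the 11 % reader-variance figure. The function name and synthetic data are hypothetical.

```python
import numpy as np


def icc_oneway(pairs):
    """One-way random-effects ICC(1) for an (n_subjects, k) array of repeated
    measurements, e.g. left- and right-breast density per woman."""
    x = np.asarray(pairs, dtype=float)
    n, k = x.shape
    subject_means = x.mean(axis=1)
    grand_mean = x.mean()
    # Between- and within-subject mean squares from a one-way ANOVA
    ms_between = k * np.sum((subject_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((x - subject_means[:, None]) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical data: between-woman SD 15, left/right noise SD 4
rng = np.random.default_rng(7)
true_density = rng.normal(30, 15, size=5000)
pairs = true_density[:, None] + rng.normal(0, 4, size=(5000, 2))
print(round(float(icc_oneway(pairs)), 3))  # population value is 225 / 241, about 0.93
```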

Analysis and presentation

Distributions of breast cancer risk factors in the cohort are shown in Table 1. In summary, the majority of the screening-age cohort were postmenopausal (72 %) or perimenopausal (18 %); 8 % were currently using hormone replacement therapy. Most women were parous (87 %), on average first giving birth at age 24 years (IQR 20–27), and most were overweight (62 % with BMI >25 kg/m2). Twelve percent of women reported first-degree relatives with breast cancer, and 14 % reported a prior breast biopsy.

Table 1 Breast cancer risk factors in the cohort

The risk models provided useful information for discrimination (Table 2). The IQR-ORs were 1.22 for the Gail model and 1.36 for Tyrer-Cuzick, but the AUCs were modest at 0.55 and 0.57, respectively. The Tyrer-Cuzick model provided more than twice as much information as the Gail model in terms of the likelihood ratio χ2 (49.2 vs 19.7). However, the Gail model performed better for cancers detected after the entry screen (see Additional file 1): its IQR-OR was 1.35 (95 % CI 1.11–1.62), similar to the 1.36 (95 % CI 1.12–1.63) for the Tyrer-Cuzick model. The findings were not materially affected when we restricted attention to invasive cancer (Table 2).
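The IQR-OR is not defined in this section. Assuming the usual definition, it is the odds ratio comparing the 75th with the 25th percentile of a continuous predictor, computed from a per-unit log-odds coefficient of a logistic regression. For the risk models the predictor would plausibly be the log of predicted 10-year risk, but that choice is our assumption; the function name is hypothetical.

```python
import numpy as np


def iqr_odds_ratio(beta, predictor):
    """OR comparing the 75th with the 25th percentile of a continuous
    predictor, given a per-unit log-odds coefficient `beta`."""
    q1, q3 = np.percentile(predictor, [25, 75])
    return float(np.exp(beta * (q3 - q1)))


# Illustration: predictor 0..100 has IQR 50, so beta = log(2) / 50 gives OR 2
print(iqr_odds_ratio(np.log(2) / 50, np.arange(101)))
```

Scaling by the IQR puts predictors with different units, such as a model's log risk and a density residual, on a comparable footing.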

Table 2 Performance of risk models and breast density

ORs derived from 10-year risks were not well calibrated, being 60 % (95 % CI 44–74 %) of expected for Tyrer-Cuzick and 46 % (95 % CI 26–65 %) of expected for the Gail model. Figure 1a, b illustrates this graphically: the shrinkage in the risk distribution is reflected by the line of fit, and observed risks by decile were lower than expected in the high-risk groups and higher than expected in the low-risk groups.
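We read the "percentage of expected" figures as the slope from a logistic regression of case status on the predicted log OR, so that a slope of 0.60 means predicted log ORs were about 40 % too extreme. A minimal Newton-Raphson (IRLS) sketch of that fit follows; the function name, the intercept value and the synthetic data are hypothetical, and the original analysis was run in R rather than with this code.

```python
import numpy as np


def calibration_slope(y, log_or_pred, n_iter=25):
    """Fit logit P(y = 1) = a + b * log_or_pred by Newton-Raphson and return b.
    b < 1 means the predicted ORs are more extreme than those observed."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(log_or_pred, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    ybar = y.mean()
    coef = np.array([np.log(ybar / (1 - ybar)), 0.0])  # start at the overall log-odds
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ coef)))
        grad = X.T @ (y - p)                          # score vector
        hess = X.T @ (X * (p * (1 - p))[:, None])     # Fisher information
        coef = coef + np.linalg.solve(hess, grad)     # Newton-Raphson step
    return coef[1]


# Hypothetical cohort in which true log ORs are 0.6 times the predicted ones
rng = np.random.default_rng(2015)
n = 400_000
log_or_pred = rng.normal(0.0, 0.5, n)
p_true = 1 / (1 + np.exp(-(-4.3 + 0.6 * log_or_pred)))
y = (rng.random(n) < p_true).astype(float)
print(round(float(calibration_slope(y, log_or_pred)), 2))
```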

Fig. 1

Calibration and spread of risk from the models and density. The predicted and observed odds ratios from (a) the Tyrer-Cuzick model and (b) the Gail model in the cohort are shown. c Histogram of observed risk. O vs E is the estimate from a logistic regression of the logarithmic predicted odds ratio. TC Tyrer-Cuzick 10-year risk, DR density residual

Overall, the Gail model was not more informative than age (AUC 0.55, 95 % CI 0.52–0.57). However, Table 3 shows that it added independent information to age (LR χ2 9.2), as did the Tyrer-Cuzick model (LR χ2 48.8). There was little evidence of interaction by age group [Gail LR χ2 3.9 (df = 5), P = 0.57; Tyrer-Cuzick, LR χ2 5.8 (df = 5), P = 0.33].

Table 3 Model performance, by age group

Visually assessed breast density was inversely correlated with BMI (Spearman correlation coefficient −0.38) and age (Spearman correlation coefficient −0.19). Density was lower for digital mammograms (median 24 %, IQR 14–37 %) than for film (27 %, 15–44 %) (P < 0.001). It was higher in women with cancer (median 28 %, IQR 18–41 %) than in those without (24 %, 14–38 %) (P < 0.001). After density was adjusted for age, BMI and type of mammogram in the density residual, the LR χ2 more than doubled, from 27.4 to 61.4, and adjusted density was a stronger univariate risk factor than either model (IQR-OR 1.48, AUC 0.59).

Mammographic density added substantial and statistically significant information to both models and increased the AUC by 0.04 for each (Table 2). An OR relative to the sample mean was estimated for each woman from the risk models alone and in combination with breast density. Figure 1c shows the distributions of these ORs, which demonstrate that more women were accurately assigned high and low risks when density was added. Table 4 cross-tabulates incidence by risk groups from the models alone and with density. Inspection similarly shows that adding density helped to identify more high- and low-risk women accurately and that the proportion of women with a predicted Tyrer-Cuzick 10-year risk greater than 8 % more than doubled, from 1.2 % to 2.7 %.
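The AUC used as the secondary discrimination measure equals the Mann–Whitney probability that a randomly chosen case has a higher predicted risk than a randomly chosen non-case, with ties counted as one half. A minimal sketch follows; it does not compute the DeLong CIs used in the paper, the function name is hypothetical, and the pairwise comparison uses memory proportional to cases × non-cases, so a rank-based version would suit a full cohort better.

```python
import numpy as np


def auc_mann_whitney(scores_cases, scores_noncases):
    """AUC as P(case score > non-case score), ties counting one half.
    Builds a cases x non-cases comparison matrix, so it suits modest sizes."""
    cases = np.asarray(scores_cases, dtype=float)[:, None]
    noncases = np.asarray(scores_noncases, dtype=float)[None, :]
    wins = (cases > noncases).sum() + 0.5 * (cases == noncases).sum()
    return float(wins / (cases.size * noncases.size))


print(auc_mann_whitney([3, 4], [1, 2]))  # perfect separation
print(auc_mann_whitney([1, 2], [1, 2]))  # no discrimination
```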

Table 4 Breast cancer incidence cross-classified by 10-year risk groups from the Tyrer-Cuzick and Gail models when combined with breast density

Most (80 %) of the cancers were diagnosed at entry, but the distribution of breast density and residual was similar in cancers diagnosed at screening versus later (Fig. 2). Furthermore, the IQR-ORs for adjusted density were 1.48 (95 % CI 1.32–1.64) at entry screen and 1.49 (95 % CI 1.20–1.85) thereafter (see Additional file 1). This suggests that density was predictive of future cancer, as well as being a cross-sectional risk factor.

Fig. 2

Breast density and residual by time of diagnosis since enrolment. a and c Histograms and empirical cumulative distribution functions for breast density. b and d Histograms and empirical cumulative distribution functions for age, body mass index and type of image adjusted residual. The cancers are split into those diagnosed within 100 days of entry (<100-d) and more than 100 days (100-d+). A Wilcoxon test for the difference between <100 days and 100+ days yielded P = 0.34 for visual analogue scale (VAS) and P = 0.98 for the residual.

Discussion

The results of univariate analysis of breast cancer risk factors shown in Table 1 broadly agreed with the literature [5]. In this predominantly postmenopausal cohort, age at menopause yielded the most information on the basis of LR χ2 statistics. Age at menopause is not in the Gail model, which partly explains why the Tyrer-Cuzick model performed better overall. It was also interesting to note that a self-reported previous biopsy conferred more information than did the presence of affected first-degree relatives. Mammographic density was a stronger risk factor than the risk factors used by the Tyrer-Cuzick or Gail models, after adjustment for age and BMI.

Although the risk models were found to provide useful information for risk stratification without mammographic density, their relative risks were not well calibrated. A partial reason is that some risk factors were not as strong as expected, the most noticeable being family history [30].

Our study differs from some other work in that we used a percentage visual assessment rather than the four-category BI-RADS scale. An advantage is that the visual percentage is not designed to assess masking, where dense tissue may make it more difficult to detect cancer in mammograms. This is relevant because in the most recent version (fifth edition) of the BI-RADS scale, specific reference to percentage density was removed [11] to focus on masking. It remains to be seen whether this change in emphasis affects the distribution of BI-RADS density, as well as its relationship with risk.

Earlier studies have also shown that continuous measures of percentage density are useful in combination with classical risk factors. Chen et al. [19] used a case–control study of women (the majority aged 35–74 years) recruited into a screening study in the United States during the 1970s and proposed an improved model with some of the Gail model factors, weight and categorised percentage density. While their study demonstrated that density is very useful after allowing for classical factors, the model does not appear to have been widely adopted in clinical practice, nor does it seem to have been externally validated. Warwick et al. [20] showed that density added to the Tyrer-Cuzick model in a case–control analysis of relatively young women (90 % aged 40–60 years), mostly from the United Kingdom, who were at high risk of breast cancer, which is a different source population from that targeted in our present study. An issue affecting the earlier studies is that they recruited when film rather than digital mammography was used, and when populations were less obese. Our work adds to this literature by measuring how well commonly used risk estimation models performed in a UK screening cohort, and by showing how these models might be improved with breast density. The findings might help to inform the design of risk-adapted screening and prevention strategies in the United Kingdom and elsewhere.

Some limitations of the study include the following. Firstly, the risk factors were self-reported via a questionnaire. Secondly, breast density was used to predict risk at the same screen, and the subgroup analysis by time of diagnosis since enrolment was limited. More follow-up is needed to address this issue. Thirdly, the visually assessed score required human judgement, which might make it unreliable for routine use in a screening program [31], although the same applies to BI-RADS density. Exploration of automated methods is ongoing in a subset of the cohort. Fourthly, readers may have been able to partially identify which mammograms had cancer at baseline. However, a small study suggested such bias is negligible, and almost identical relationships with breast cancer risk were observed in the cancers diagnosed after the entry screen, which was also found in an earlier screening study of the same density measure [32]. Thus, we believe this bias is extremely small, if present at all.

Two further criticisms might be made of the study. Firstly, it might be argued that the Gail model predicts the absolute risk of invasive cancer and does not include DCIS. However, its relative risk model was fitted to invasive cancer and DCIS [4], which partly justifies using this endpoint in the analysis. The endpoint was also specified before inspection of the data in a statistical analysis plan that focused on relative risks rather than absolute rates, and a secondary analysis of invasive cancers did not suggest a bias against the Gail model from including DCIS. Secondly, one might argue that the risk models were designed to predict only the future risk of women with a negative mammogram. We feel that it is not unreasonable to apply the implied relative risks from risk models to a cross-sectional study, albeit with a recalibration, because previous work has found breast cancer risk factors to have similar magnitudes in case–control studies as in cohorts (see, e.g., [30]). Further, many of the participants will have had a previous negative screening mammogram, and we did not find a significant interaction between model performance and age.

Conclusions

We used a prospective cohort from the United Kingdom to test whether visually assessed mammographic density added to the Tyrer-Cuzick and Gail models for women who attend breast cancer screening. Discrimination measured by changes in LR χ2 was doubled, and a larger proportion of women could be accurately classified to be at more than a moderately high risk when density was combined with either model. However, the AUC values remained modest, so there is still more to be done.

To our knowledge, this study is the first in which the Tyrer-Cuzick risk model has been evaluated in a prospective screening setting. Although the analysis was limited by its cross-sectional nature and by issues such as lead time, the Tyrer-Cuzick model was found to provide useful information for risk assessment. The Gail model has been well validated for use in North America, but in this UK setting it was outperformed by the Tyrer-Cuzick model.

In conclusion, the data in this report are relevant to designing improvements to the national screening program in the United Kingdom. By combining the Tyrer-Cuzick model with breast density, we identified 72 % of the population who had an average or below-average risk of breast cancer, and they had proportionally fewer breast cancers. Such results might be used to inform modelling of the effect of different strategies for risk-adapted screening and prevention.