Introduction

Breast cancer is the most common malignancy worldwide [1]. In Scotland, it constitutes 28.8% of all cancers among women, with 1 in every 9 women carrying a risk of developing it in her lifetime (https://www.scotpho.org.uk/).

Based on mRNA expression profiling, breast cancer has been classified into ‘intrinsic’ or molecular subtypes that differ in treatment and survival outcomes [2]. The characteristics of these molecular subtypes are largely distinguished by expression of various combinations of tumour markers such as oestrogen receptor (ER), progesterone receptor (PR), human epidermal growth factor receptor-2 (HER2) and the Ki67 tumour proliferation marker. Although gene profiling is considered the gold standard for classification of molecular subtypes, given the cost and lack of genetic profiling in clinical practice, a similar classification defined by immunohistochemistry (IHC) staining is a well-accepted surrogate [3, 4]. The St. Gallen Expert Panel recommends using ER, PR and HER2, along with tumour grade as a proxy for Ki67 index when the latter is unknown, in defining the subtypes [4]. Based on IHC characterisation, the molecular subtypes are: luminal A-like, luminal B-like (HER2−), luminal B-like (HER2+), HER2-enriched and triple-negative breast cancer (TNBC). As the luminal-like cancers (ER/PR+) express hormone receptors, they can be effectively treated with molecularly targeted hormone therapy and generally have a better prognosis. Because TNBC, the most aggressive subtype, lacks these therapeutic targets (ER/PR and HER2), chemotherapy along with surgery is the primary treatment option [5, 6].

Reproductive factors have been well documented as key breast cancer risk factors with direct associations observed with early age at menarche, nulliparity, late age at menopause and first birth, and limited breastfeeding [7, 8]. Data also suggest that there is a temporal relationship with time since last birth, where a short-term increase in breast cancer risk is observed 3–5 years after last birth [9, 10], before a long-term protective effect of parity is observed compared to nulliparity.

Within Scotland’s renowned, high-quality routine electronic health records, the Scottish Cancer Registry (SMR06) is an excellent resource to investigate risk factors for cancer incidence. In Scotland, ER status data collection began in 1997, and PR and HER2 data collection started in 2009, almost a decade earlier than other registries in the UK. We have recently reported the high quality of these data and shown distinct temporal trends by molecular subtypes and observed increasing incidence of ER+ subtypes among women of screening age (50–70 years), among whom about half of all cases are diagnosed [11].

In this study, we aimed to assess whether reproductive risk factors differ by molecular subtype among invasive breast cancer cases diagnosed in Scotland, using a ‘case–case’ approach. A case–case analysis examines risk factor associations by comparing cases of one molecular subtype with cases of another subtype, without describing risk factor patterns in women without breast cancer [12].

Methods

Data sources and study population

All residents of Scotland (defined as residing in the UK for 3 months or longer) are registered with a GP practice. Records from within the UK can be added for UK residents. There are no accurate records of emigration outside the UK (Scotland); however, the registry includes a variable that indicates whether a woman emigrated (to England or a different country). These numbers were very small (< 20 cases over the whole study period had an embarkment date recorded). The Information Services Division (ISD) of Public Health Scotland holds population-level National Health Service (NHS) data for Scotland, which can be deterministically linked using the Community Health Index (CHI) number, a unique patient identifier. Probabilistic linkage provides < 4% false positive and < 2% false negative linkage (https://www.scotpho.org.uk/publications/overview-of-key-data-sources/scottish-national-data-schemes/isd-linked-database). Incident primary breast cancer cases were identified using data from the Scottish Cancer Registry (https://www.isdscotland.org/Health-Topics/Cancer/Scottish-Cancer-Registry/), which attains an average of 95.4% breast cancer case ascertainment and is over 99% complete [13] (https://www.isdscotland.org/Quality-Indicators). All primary invasive breast cancers (defined on the basis of the International Classification of Diseases, 10th revision code C50) diagnosed in women aged 20 years or older between 1997 and 2016 were ascertained (https://www.ndc.scot.nhs.uk/National-Datasets) (https://www.isdscotland.org/Quality-Indicators).

Approval for the analysis was obtained from the Public Benefit and Privacy Panel (PBPP) of NHS Scotland, and analyses were conducted in the Scottish National Safe Haven (PBPP Reference Number 1718-0057).

Maternity data

CHI number and probabilistic matching were used to link cancer registry data (SMR06) to Scottish Morbidity Records maternity inpatient and day case records (SMR02), which were available from 1981. To improve completeness of maternity data, the study excluded women who were ≥ 16 years of age (i.e. already in their reproductive years) in 1981, resulting in a cohort of women born in 1966 or thereafter. Data on number of births, age at first birth and time since last birth, including both live births and stillbirths, were calculated. The number of births was derived from the number of maternity records each woman held in SMR02. The maternal age from the first maternity record for a parous woman was considered as her age at first birth. Time since last birth was calculated as the time from the most recent birth preceding a cancer diagnosis.
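
The derivation of the three reproductive variables can be illustrated with a minimal Python sketch. It is not the code used in the study (analyses were run in Stata), and the record layout, field names and function name are hypothetical. It also assumes one maternity record per birth and counts only births preceding the diagnosis date, which are simplifying assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class MaternityRecord:
    """One hypothetical SMR02-style delivery record (field names are assumptions)."""
    delivery_date: date
    maternal_age: int  # maternal age in completed years at delivery


@dataclass
class ReproductiveHistory:
    number_of_births: int
    age_at_first_birth: Optional[int]
    years_since_last_birth: Optional[float]


def summarise_history(
    records: List[MaternityRecord], diagnosis_date: date
) -> ReproductiveHistory:
    """Derive the three reproductive variables from births preceding diagnosis."""
    # Keep only deliveries before the cancer diagnosis, in chronological order.
    prior = sorted(
        (r for r in records if r.delivery_date < diagnosis_date),
        key=lambda r: r.delivery_date,
    )
    if not prior:
        # No birth records: the woman is treated as (assumed) nulliparous.
        return ReproductiveHistory(0, None, None)
    first, last = prior[0], prior[-1]
    # Time since last birth in years, using an average year length of 365.25 days.
    years = (diagnosis_date - last.delivery_date).days / 365.25
    return ReproductiveHistory(len(prior), first.maternal_age, years)
```

A woman with no maternity records before her diagnosis is returned with zero births and undefined age at first birth and time since last birth, mirroring the assumed nulliparity used in the analysis.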

Molecular subtypes definition

The Scottish Cancer Registry (SMR06) records the receptor status for breast cancers using immunohistochemistry (IHC) staining for ER, PR and HER2, and for borderline IHC HER2 results the status based on fluorescence in situ hybridization (https://www.isdscotland.org/Cancer-Registration-Definitions). While ER status for breast cancer became available in SMR06 in 1997, recording of information on PR and HER2 status commenced only in 2009 (https://www.isdscotland.org/Cancer-Registration-Definitions). As we aimed to evaluate the subtypes based on ER, PR and HER2 status, we focused on cases diagnosed from 2009 onwards. Due to non-availability of data on Ki67 labelling index, tumour grade was employed as a proxy for distinguishing the luminal subtypes [4]. The outcome variable, breast cancer subtype, was derived from four variables in SMR06: ER status, PR status, HER2 status and histological grade of the tumour. The five subtypes were defined as: ‘luminal A-like’ [ER/PR+ HER2− grade 1 or 2], ‘luminal B-like (HER2−)’ [ER/PR+ HER2− grade 3], ‘luminal B-like (HER2+)’ [ER/PR+ HER2+], ‘HER2-overexpressed’ [ER-PR-HER2+], and ‘triple-negative breast cancer’ or ‘TNBC’ [ER-PR-HER2−]. SMR02 and SMR06 datasets were linked by ISD using a pseudonymised CHI.
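
The subtype definitions above can be written as a short decision rule. The Python sketch below is illustrative only: the function name and boolean inputs are hypothetical, and it assumes that ‘ER/PR+’ means ER-positive and/or PR-positive and that receptor status is already complete for every case.

```python
from typing import Optional


def classify_subtype(
    er_pos: bool, pr_pos: bool, her2_pos: bool, grade: Optional[int]
) -> Optional[str]:
    """Map receptor status and tumour grade to the five IHC-defined subtypes."""
    # Assumption: 'ER/PR+' is read as ER-positive and/or PR-positive.
    hormone_receptor_pos = er_pos or pr_pos
    if hormone_receptor_pos:
        if her2_pos:
            return "luminal B-like (HER2+)"
        # Grade stands in for Ki67 to separate the HER2-negative luminal subtypes.
        if grade in (1, 2):
            return "luminal A-like"
        if grade == 3:
            return "luminal B-like (HER2-)"
        return None  # grade missing: the subtype cannot be assigned
    # ER-negative and PR-negative tumours are split by HER2 status alone.
    return "HER2-overexpressed" if her2_pos else "TNBC"
```

For example, `classify_subtype(True, False, False, 2)` returns `"luminal A-like"`, whereas the same receptor profile with grade 3 returns `"luminal B-like (HER2-)"`.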

The cohort was limited to women with complete data on IHC-defined molecular marker status and tumour grade. Further restriction to women born in 1966 or later with a breast cancer diagnosis between 2009 and 2016 resulted in a cohort of women diagnosed at 50 years of age or younger.
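
As a rough illustration of why these restrictions cap age at diagnosis at 50 years, the inclusion rule can be sketched as follows. The function and argument names are hypothetical, and working in calendar years rather than exact dates is a simplification.

```python
def in_cohort(birth_year: int, diagnosis_year: int, has_complete_markers: bool) -> bool:
    """Apply the three cohort restrictions to one case (calendar years only)."""
    return (
        has_complete_markers
        and birth_year >= 1966
        and 2009 <= diagnosis_year <= 2016
    )


# Oldest possible case: born in 1966 and diagnosed in 2016.
max_age_at_diagnosis = 2016 - 1966
```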

Statistical analyses

A total of 431 cases (10%) had missing subtype data and were excluded from analyses. To provide finer adjustment for age, we used 5-year age categories in regression models (20–35, 36–40, 41–45, 46–50). Age distribution at diagnosis of breast cancer, number of births, age at first birth and time since last birth were computed for each breast cancer subtype. Pearson’s chi-square tests were used to test for differences between subtypes in the distribution of reproductive risk factors of interest. We determined the correlation of age at diagnosis and each reproductive risk factor by computing Spearman’s correlation coefficients [14]. Polytomous logistic regression models adjusted for age at diagnosis of breast cancer were used to estimate odds ratios (OR) and 95% confidence intervals (CI) with the most common subtype, luminal A-like, as the reference group. We tested for interaction of age using a likelihood ratio test (LRT) in polytomous logistic regression models with and without an interaction term for each reproductive risk factor of interest. Tests were considered statistically significant at the 5% level. Stata MP V14 (College Station, TX) was used for all analyses.
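
To make the case–case odds ratio concrete, the Python sketch below computes an unadjusted OR with a Wald 95% CI from a 2 × 2 table of one subtype versus the reference subtype. This is a simplification and not the study model: the reported estimates come from age-adjusted polytomous logistic regression in Stata. For a single binary exposure with no covariates, a polytomous model would yield the same OR for each subtype against the reference. The function name and the counts in the example are hypothetical.

```python
import math
from typing import Tuple


def case_case_odds_ratio(
    exposed_subtype: int,
    unexposed_subtype: int,
    exposed_reference: int,
    unexposed_reference: int,
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Unadjusted OR and Wald CI for one subtype versus the reference subtype."""
    # Cross-product ratio of the 2 x 2 table (all four cells must be non-zero).
    odds_ratio = (exposed_subtype * unexposed_reference) / (
        unexposed_subtype * exposed_reference
    )
    # Standard error of log(OR) from the four cell counts (Woolf's method).
    se_log_or = math.sqrt(
        1 / exposed_subtype
        + 1 / unexposed_subtype
        + 1 / exposed_reference
        + 1 / unexposed_reference
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study tables:
# 60 parous and 40 nulliparous cases of one subtype,
# 50 parous and 50 nulliparous cases of the reference subtype.
or_estimate, ci_lower, ci_upper = case_case_odds_ratio(60, 40, 50, 50)
```

An OR above 1 here means the exposure is more frequent among cases of the comparison subtype than among reference-subtype cases. It does not by itself indicate whether the exposure raises or lowers absolute risk of either subtype.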

Results

The final study population included 4,108 women with breast cancer diagnosed at or below 50 years of age with data available to assign breast cancer subtype, after excluding 9.7% of the initial cohort with missing hormone status or tumour grade data (data not shown). There was a significant relationship between age at diagnosis and missingness of subtype, with patients aged 41–50 years less likely to have missing subtype data (66.1% vs 59.6%), although the mean age was similar between those not missing subtype (mean age = 41.8, SD = 5.3) and those missing subtype (mean age = 40.9, SD = 5.2). Luminal A-like was the most common subtype (40%) and HER2-overexpressed was the least common (5%, Fig. 1).

Fig. 1 Distribution of breast cancer subtypes defined by immunohistochemistry and tumour grade among 4,108 women born after 1965 who had breast cancer diagnosed in Scotland between 2009 and 2016. Luminal A-like (n = 1650), luminal B-like (HER2−) (n = 998), luminal B-like (HER2+) (n = 629), HER2-overexpressed (n = 214), triple-negative (n = 617)

The distributions of age at diagnosis of breast cancer, number of births, age at first birth and time since last birth by the five breast cancer subtypes are presented in Table 1. Overall, 34% of breast cancers occurred in patients aged 40 years or younger and 66% in patients between 41 and 50 years of age. The proportion of luminal A-like tumours diagnosed in the age group 46–50 years (31.9%) was higher than the proportion of the other subtypes diagnosed in this age group (ranging from 19.7 to 22.6%). Women with the luminal A-like subtype had the highest proportion with no birth records (30.5% assumed nulliparous, i.e. 69.5% with one or more birth records) and the highest proportion of breast cancer diagnoses made six or more years after their most recent birth (82.8%). Women with HER2-overexpressed tumours and TNBC had higher proportions with one or more birth records (79.0% and 75.7%, respectively) and, compared to luminal tumours, lower proportions of diagnoses made six or more years after last birth (70.2% and 69.1%, respectively). The chi-square test revealed no statistically significant differences in age at first birth by subtype (Table 1). A significant correlation between age at diagnosis and time since last birth was observed (Spearman R2 = 0.66, p < 0.001), as with age at first birth (Spearman R2 = 0.10), but not number of births (Spearman R2 = 0.19, p = 0.28).

Table 1 Descriptive characteristics of the cohort comprising women born after 1965 and diagnosed with primary invasive breast cancer between 2009 and 2016 in Scotland, stratified by surrogate molecular subtypes

Women with TNBC were significantly more likely to have at least one birth record (relative to no birth records) than those with luminal A-like tumours (Table 2). Although based on fewer cases, a similar association was observed for women with HER2-overexpressed tumours, who were more likely to have three or more births (relative to no birth records) than women with luminal A-like tumours, in addition to a statistically significant test for trend across all subtypes. We observed a significant interaction between age at diagnosis and number of births (LRT p = 0.05); hence, we also present models adjusted for an interaction term. These results showed similar relationships; however, the estimates had wider confidence intervals.

Table 2 Association of number of births among women born after 1965 and diagnosed with primary invasive breast cancer between 2009 and 2016 in Scotland by molecular subtypes adjusted for age at diagnosis with and without an interaction term for age at diagnosis

Table 3 shows the case–case analysis for age at first birth by subtype. Women with luminal B-like (HER2+) tumours were more likely than women with luminal A-like tumours to have a later age at first birth. In contrast, women with TNBC were less likely than women with luminal A-like tumours to have an older age at first birth. We did not observe statistical evidence of an interaction with age (LRT p = 0.40).

Table 3 Association of age at first birth among parous women born after 1965 and diagnosed with primary invasive breast cancer between 2009 and 2016 in Scotland by surrogate molecular subtypes adjusted for age at diagnosis

When compared to the luminal A-like subtype, TNBC cases were significantly less likely to have last given birth > 10 years ago (relative to ≤ 2 years ago) (Table 4). Other subtypes did not show a clear association for time since last birth. We did not observe statistical evidence of an interaction with age (LRT p = 0.34).

Table 4 Association of time since most recent birth among parous women born after 1965 and diagnosed with primary invasive breast cancer between 2009 and 2016 in Scotland who had their last birth prior to diagnosis by surrogate molecular subtypes adjusted for age at diagnosis

Discussion

Using Scottish cancer registry data linked to maternity health records, we show that parity, number of births and time since last birth to diagnosis of breast cancer differ by IHC-defined molecular subtypes of breast cancer among women ≤ 50 years of age at diagnosis of breast cancer. Breast cancer aetiology in younger women is not fully understood as few risk factors have been identified. Furthermore, few opportunities for early detection of breast cancer are available for younger women beyond genetic counselling for high-risk families.

Multiple reports and pooled analyses have recently evaluated IHC and mRNA expression profiling defined molecular subtypes of breast cancer and consistently show a positive association with parity for triple-negative or basal-like breast tumours [15,16,17,18,19,20]. Interestingly, significant differences in the incidence of breast cancer exist for different ethnic and racial groups that also frequently have different reproductive histories [21]. Consistent with these data, we also found evidence of heterogeneity in reproductive history across IHC-defined molecular subtypes of breast cancer in this Scottish cohort. Women with ER- tumours (HER2-overexpressed and TNBC) were more likely to have a higher number of births compared to women with luminal A-like subtype. Unlike ER- cancers, we did not observe heterogeneity in number of births between luminal B-like (HER2+) and luminal A-like, which concurs with other reports [22,23,24,25]. Time since last birth showed differential associations by subtype, where women with TNBC or luminal B-like (HER2+) were less likely than women with luminal A-like tumours to have a longer time between their most recent birth and diagnosis of breast cancer. Findings for TNBC correspond well with the existing studies [26, 27].

Parity confers a dual effect on the risk of breast cancer with an augmented risk observed in the initial years following pregnancy (3–5 years, or even up to 10–15 years) [28,29,30], possibly by stimulating the growth of cells that have undergone initial stages of malignant change and also due to the immunosuppressive effects of pregnancy [28, 31]. It is only subsequent to this phase that the protective effect of parity sets in [32, 33] owing to the differentiation of normal breast cells that have the potential to undergo malignant transformation. While this has been observed for ER+ breast cancers (luminal A-like) [8, 22, 34], an increased risk of ER- breast cancer continues to persist even in the longer term [25, 27, 35]. We did not observe significant differences across subtypes for age at first birth. However, TNBC cases were more likely to have a younger age at first birth when compared to luminal A-like cases (approximately 16% versus 12.5% of patients for age at first birth < 20 years). A similar, statistically significant association has been reported by other studies [22, 23, 35,36,37]. Luminal B-like (HER2−) cases showed no statistically significant difference from luminal A-like for any of the three risk factors of interest, even though studies have reported an inverse association with number of births and a positive association with age at first birth for this subtype [38, 39].

ER- breast cancers are less likely than ER+ breast cancers to be detected through screening [40], and predictive modelling of breast cancer risk has been proposed as a possible solution for personalised medicine and risk-stratified screening [41,42,43]. Modelling studies using UK data suggest such risk-stratified screening approaches could reduce overdiagnosis and improve cost-effectiveness while maintaining the benefits of screening [44].

The key strengths of our study are the high-quality longitudinal data collected within the Scottish Cancer Registry for the entire population, and the availability and high level of completeness of molecular marker and tumour grade data (≤ 10% missing data). Another strength of the study is the inclusion of women diagnosed at age 50 years or below. Although breast cancer is less common within this age range, the tumours are more aggressive with poor prognosis making it important to identify and implement effective approaches to prevention amongst this age group [45]. Moreover, breast cancer incidence appears to be increasing in younger age groups in recent years in Scotland [11] and other populations such as the United States [46].

Although this is one of the largest studies of breast cancer among young women, a limitation is the modest number of cases for rarer tumour subtypes, especially HER2-overexpressed (5% of all cases), potentially reducing the statistical power of analyses for these tumour subtypes. Our study did not assess incidence or risk of breast cancer, which would require comparisons to controls or the general population. In addition, we cannot exclude some residual confounding by age at diagnosis since we did observe some association between age and missing subtype data. Future work including a comparison cohort of women not diagnosed with breast cancer would add further updated information about the role of reproductive history as a risk factor for breast cancer, including, in due course, for women whose breast cancer is diagnosed at older ages. Other limitations of our study were the potential for incomplete maternity records for women whose children were born outside Scotland, and the lack of data on other factors such as breastfeeding and on more detailed mRNA expression or mutation profiling of the cancers.

In conclusion, our data highlight the value of integrating molecular data from tumours with routinely collected health records data for understanding cancer epidemiology. There is scope for future analysis using the cancer registry linked to other datasets, including community prescription records, and primary care records, to provide more detailed information on the role and patterns of key risk factors and possible new aetiologic or prognostic factors for subtypes of breast and other cancers.