Background

Women in Africa have lower incidence rates of breast cancer (BC) than women in developed countries (age-standardized rates (ASR) per 100,000 of 36 vs. 74), but higher mortality rates (ASR of 17 vs. 15) [1]. Relative survival (RS) from BC in sub-Saharan Africa (SSA) also varies by stage and country-level human development index (HDI): 5-year RS after breast cancer diagnosis is highest in Mauritius at 83.2% and lowest in Uganda at 12.1%, while in Kenya it ranges between 40.1% and 64% according to data abstracted from the Eldoret and Nairobi Cancer Registries, respectively [2]. Moreover, survival differences in SSA persist at any given breast cancer stage, with the lowest 3-year breast cancer-specific survival observed in Nigeria at 38% compared with 68% in Black women from Namibia, underscoring as yet unexplained determinants of survival [3]. In Kenya, national figures indicate that BC is the most frequently diagnosed cancer among women, representing 20.8% of all cancer cases, and the second most common cause of cancer mortality [4].

Although advanced stage at presentation, lack of awareness about BC, and limited access to available screening and treatment options [5] contribute to disparate mortality rates, whether the incidence of more aggressive breast cancers is higher in African women remains controversial. Women of African descent present with BC a decade earlier than their Caucasian counterparts [6, 3], and even after correcting for risk factor distribution, their tumors still tend to be estrogen receptor (ER) negative [7], suggesting an interplay of other biologic and genetic differences that remain largely unexplored.

Breast cancer can be divided into several molecular subtypes based on gene expression profiling, which are subsequently corroborated by a panel of immunohistochemical (IHC) markers including ER, progesterone receptor (PR), human epidermal growth factor receptor 2 (HER2), proliferation marker Ki-67, cytokeratin (CK) 5/6, and epidermal growth factor receptor (EGFR). Epidemiologic studies have demonstrated that the BC risk associated with established risk factors, including genetic and environmental/lifestyle factors, differs across breast cancer subtypes [8], which highlights the importance of developing subtype-specific risk prediction and prevention strategies [9]. Overwhelmingly, existing breast cancer risk prediction models have been derived from women of European ancestry, and some studies have noted poor performance in African women [10]. This is likely explained by the differential associations of risk factors such as parity and obesity with ER-positive and ER-negative cancers and by the higher frequency of ER-negative cancers among African women. In addition, the prevalence of breast cancer risk factors, including genetic background and environmental exposures, differs markedly between indigenous African women and European and even African American women. Notably, women in African countries are more likely to have high exposure to infectious agents (malaria and other parasites) and a low prevalence of traditional BC risk factors (including low or late parity, lack of breastfeeding, obesity, and exogenous hormone use), which may contribute to differences in the risk of different BC subtypes. Furthermore, there is great variation in genetic structure and exposures as well as in breast cancer subtype distributions across different African populations [11, 7, 12].
Therefore, studies in diverse indigenous African populations will allow for a broader capture of associations between risk factors and tumor subtypes, particularly for exposures and subtypes that are in general very rare but are prevalent in African populations. Findings from these studies will improve our understanding of risk factor heterogeneity and our ability to develop risk prediction models that are better tailored for specific African populations.

In this study, using carefully annotated risk factor and pathology data collected from 838 BC patients enrolled from multiple hospitals across Kenya, we aimed to evaluate the distributions of established BC risk factors across BC subtypes.

Methods

Study population and risk factor data

The study has been described previously [13]. In brief, 838 pathologically confirmed BC cases were collected across Kenya between March 2012 and May 2015. Cases came from 15 hospitals/health facilities, which we grouped into 5 network/regional groups: Aga Khan University (AKU) hospitals (including AKU hospitals at Kisumu, Mombasa, and Nairobi), AIC Kijabe Hospital, Nyeri Provincial General Hospital (PGH), St Mary’s Mission Hospital (Nairobi), and others (Supplementary Table 1). The grouping was based on whether the institutions were public, faith-based, or private. Institutional ethics approval was obtained. Socio-demographic, clinical, reproductive, and known breast cancer risk factor data were collected using a standardized questionnaire.

Pathology, immunohistochemical data, and molecular subtypes

Pathologic characteristics including histologic grade, histologic tumor type, tumor size, lymph node stage, lymphovascular invasion, and ER/PR/HER2 status were extracted from the clinical database. Central pathology review and IHC for ER/PR/HER2 of all breast carcinoma tissue were done at AKU Hospital, Nairobi, and interpreted by SS and ZM. AKU Pathology department is a College of American Pathologists accredited laboratory and as such enrolls in proficiency testing schemes for breast biomarkers. Additional slides were cut at 5 μm and subjected to IHC stains for EGFR, CK5/6, and Ki67 (Dako Monoclonal mouse anti-human antibodies were used; wild type EGFR polyclonal antibody in a dilution of 1:200, CK5/6 clone D5/16 B4 ready to use, Ki-67 Clone MIB-1, ready to use) according to the manufacturer specifications as previously described [13], with appropriate control tissues included, and stained on the DAKO Autostainer link instrument.

ER and PR tumor expression were considered positive by IHC with ≥ 1% nuclear staining. HER2 expression was determined by IHC and fluorescence in situ hybridization (FISH), the latter in case of an equivocal HER2 IHC result. An IHC score of 3+ or a FISH-positive test result was defined as HER2-positive [14]. Ki-67 was considered high if 20% or more of the cells showed nuclear staining based on St Gallen recommendation [15].
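The positivity rules above can be written as a few small predicates. The following is a minimal sketch rather than the study's actual scoring code: the function names are hypothetical, and treating a missing FISH result after an IHC 2+ score as "undetermined" is an assumption, since the text only states that FISH was used for equivocal IHC results.

```python
from typing import Optional


def er_pr_positive(percent_nuclear_staining: float) -> bool:
    """ER or PR is positive when at least 1% of tumor nuclei stain."""
    return percent_nuclear_staining >= 1.0


def her2_positive(ihc_score: int, fish_positive: Optional[bool] = None) -> Optional[bool]:
    """HER2 is positive for an IHC score of 3+ or a positive FISH result.

    Assumption: an IHC score of 2+ is the equivocal category that is
    deferred to FISH; None is returned when FISH was not performed.
    """
    if ihc_score == 3 or fish_positive is True:
        return True
    if ihc_score == 2 and fish_positive is None:
        return None  # equivocal IHC without FISH: status undetermined
    return False


def ki67_high(percent_nuclear_staining: float) -> bool:
    """Ki-67 is high when 20% or more of the cells show nuclear staining."""
    return percent_nuclear_staining >= 20.0
```

Returning `None` for an unresolved HER2 result keeps such cases distinguishable from true negatives, which matters because cases with missing HER2 status are excluded from subtype assignment.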

We used Ki-67 status (low/high) to discriminate luminal A and B and used tumor grade as a surrogate for patients with missing Ki-67 [16]. For EGFR and CK5/6, a result was considered positive for any amount of cytoplasmic or membranous staining in any percentage of tumor cells as per the recommendations from the British Columbia study for defining the Basal subtype of breast cancer [17].

Molecular subtypes were defined based on previous clinically validated guidelines [18] (Fig. 1): luminal A: ER+ and/or PR+, HER2−, and low Ki-67/histologic grade (I or II); luminal B-HER2+: ER+ and/or PR+, and HER2+; luminal B-high proliferative: ER+ and/or PR+, HER2−, and high Ki-67/histologic grade (III); HER2-enriched: ER−, PR−, and HER2+; and triple-negative (TN): ER−, PR−, and HER2−. Because of the small sample size, in the primary subtype analysis we grouped the two luminal B subtypes into a single subtype for risk factor associations. For patients with EGFR and CK5/6 data available, we further stratified TN patients into core basal-like (CK5/6+ and/or EGFR+) and five-negative (CK5/6− and EGFR−).
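The subtype definitions can be summarized as a decision rule. This is an illustrative sketch only, with hypothetical function and argument names; it encodes the rules as stated in the text and in Fig. 1, including the use of histologic grade as a surrogate when Ki-67 is missing. Returning `None` for unclassifiable cases, and accepting a single positive basal marker when the other is missing, are assumptions of the sketch.

```python
from typing import Optional


def molecular_subtype(er: bool, pr: bool, her2: Optional[bool],
                      ki67_high: Optional[bool] = None,
                      grade: Optional[int] = None) -> Optional[str]:
    """Assign an IHC-based subtype; returns None when data are insufficient."""
    if her2 is None:
        return None  # missing HER2 status: case cannot be subtyped
    if not (er or pr):
        return "HER2-enriched" if her2 else "triple-negative"
    if her2:
        return "luminal B-HER2+"
    # Hormone receptor positive and HER2 negative: Ki-67 separates
    # luminal A from luminal B; grade is the surrogate if Ki-67 is missing.
    if ki67_high is None:
        if grade is None:
            return None
        ki67_high = grade == 3
    return "luminal B-high proliferative" if ki67_high else "luminal A"


def tn_subgroup(ck56: Optional[bool], egfr: Optional[bool]) -> Optional[str]:
    """Split triple-negative cases by basal markers (CK5/6, EGFR)."""
    if ck56 or egfr:
        return "core basal-like"
    if ck56 is None or egfr is None:
        return None  # cannot confirm that both markers are negative
    return "five-negative"
```

For example, `molecular_subtype(True, True, False, grade=3)` falls back on grade III and is assigned to the luminal B-high proliferative group.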

Fig. 1

Breast tumor subtype definition in Kenyan breast cancer patients (N=838). *Tumor grade was used to determine tumor subtype in the absence of Ki-67: if tumor grade was low or intermediate, the tumor subtype was defined as “Luminal A”; if tumor grade was high, the tumor subtype was defined as “Luminal B HER2-”. †Seventeen cases are not included due to their missing HER2 status. ‡Forty-five cases are not included due to their missing CK5/6 and EGFR status. CK5/6, cytokeratin 5/6; EGFR, epidermal growth factor receptor; ER, estrogen receptor; HER2, human epidermal growth factor receptor-2; PR, progesterone receptor

Statistical analysis

Distributions of breast cancer risk factors, including sociodemographic, reproductive, and tumor pathologic characteristics, were described for the overall study population and compared across hospital groups using the chi-squared test or Fisher’s exact test. Multivariable polytomous logistic regression models were used to determine associations between BC risk factors and tumor molecular subtypes (with ER-positive or luminal A as the reference category).

All regression models were fully adjusted for the same covariates (except for where noted): age at diagnosis, BMI, age at menarche, age at first pregnancy, number of children, averaged breastfeeding duration, age at menopause, family history of breast cancer in 1st degree female relatives, highest education level, and occupation. A two-tailed P value less than 0.05 was considered statistically significant. All analyses were performed with SAS v9.4 statistical software (SAS Institute Inc.).
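To make the reported effect measure concrete, the sketch below computes an unadjusted odds ratio with a Wald 95% confidence interval from a 2×2 table. It is a simplified illustration, not the study's method: the analyses here used multivariable polytomous logistic regression in SAS, whereas this example ignores all covariates, and the counts and the function name `odds_ratio_wald` are hypothetical.

```python
import math


def odds_ratio_wald(a: int, b: int, c: int, d: int, z_crit: float = 1.96):
    """Unadjusted odds ratio and Wald confidence interval for a 2x2 table.

    a: ER-negative patients with the risk factor
    b: ER-negative patients without the risk factor
    c: ER-positive patients with the risk factor (reference group)
    d: ER-positive patients without the risk factor (reference group)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) by the Woolf (logit) method.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z_crit * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z_crit * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, chosen only for illustration.
or_, lo, hi = odds_ratio_wald(40, 60, 50, 150)
print(f"OR = {or_:.2f}, 95% CI = {lo:.2f}, {hi:.2f}")
```

In a case-only design such as this one, the odds ratio compares the distribution of a risk factor between tumor subtypes; it does not estimate the risk of developing breast cancer.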

Results

Descriptive analysis of sociodemographic and reproductive characteristics

There were 838 invasive breast cancer cases with complete data on ER and PR status after exclusion of ductal carcinoma in situ (DCIS) cases (n=21) and cases without any data for tumor subtype (n=8). Fifty-four percent of patients were diagnosed under 50 years of age, 69% had BMI ≥ 25 kg/m2 at diagnosis, and 61% lived in rural areas. Our study population was also characterized by late age at menarche (≥ 13 years, 92%), young age at first pregnancy (< 25 years, 70%), having 3 or more children (68%), high prevalence of breastfeeding (95%), and long breastfeeding duration (≥ 1 year per child, 80%) (Table 1).

Table 1 Distributions of breast cancer risk factors in Kenyan breast cancer patients, overall and by hospitals (N=838)

Compared to patients admitted to the other 4 hospital groups, AKU patients were more likely to be overweight or obese (79%), have a tertiary education (45%), have their first pregnancy at ≥ 25 years (35%), have < 3 children (39%), and have a shorter breastfeeding duration per child. This is expected given that AKU is a private health facility whose patients are generally of higher socioeconomic status than those at the other facilities.

Distributions of tumor subtypes and pathologic characteristics in the overall study population and by hospitals

The distribution of tumor subtypes defined by IHC markers is presented in Fig. 1 and Table 2. Overall, 69.5%, 59.4%, and 27.4% of patients were ER+, PR+, and HER2+, respectively. After classifying BC into molecular subtypes, 34.8%, 35.8%, 10.7%, and 18.6% of patients had luminal A, luminal B, HER2-enriched, and TN breast cancers, respectively. More than 90% of patients had tumors larger than 2 cm (2–< 5 cm, 53.5%; ≥ 5 cm, 38.9%) and had intermediate-to-high tumor grade (intermediate, 45.9%; high, 49.1%). Sixty-one percent of tumors showed lymphovascular invasion. Nearly half of patients received definitive surgery, either lumpectomy or mastectomy; among these, 91% had stage II or higher disease, and among cases with lymph node metastases, 39.5% were positive for extra-nodal extension. AKU patients were more likely to have small (≤ 2 cm) and early-stage tumors (P < 0.01). Patients admitted to Kijabe and Nyeri hospitals had higher proportions of tumors with lymphovascular invasion: 71.4% and 69.1%, respectively. There was no statistically significant difference in the distribution of molecular subtypes (defined by ER, PR, and HER2) across hospitals (P = 0.08).

Table 2 Distributions of tumor characteristics in Kenyan breast cancer patients, overall and by hospitals (N=838)

Associations between breast cancer risk factors and tumor subtypes ER, PR, and HER2

Results of adjusted associations between risk factors and ER status are shown in Table 3. Compared to ER-positive patients, ER-negative patients were more likely to have higher parity (OR = 2.03, 95% CI = 1.11, 3.72, Ptrend = 0.021, comparing ≥ 5 to ≤ 2 children). ER-negative patients were also more likely to have longer cumulative breastfeeding duration (OR = 2.38, 95% CI = 1.33, 4.24; comparing ≥ 62 to < 39 months); however, these positive associations were no longer significant after adjusting for the number of children. Indeed, analyzing parity and breastfeeding variables together showed that the association was driven by parity (Table 3). In addition, the average duration of breastfeeding per child did not vary significantly by ER status. Overall, we observed similar associations for PR to those for ER (Supplementary Table 2). BMI, either overall or by menopausal status, did not vary significantly by ER or PR status. When stratified by HER2 status, we found that, compared to HER2-negative patients, HER2-positive patients were less likely to be obese (OR = 0.58, 95% CI = 0.34, 0.97, Ptrend = 0.038), especially among postmenopausal women (OR = 0.26, 95% CI = 0.10, 0.62, Ptrend = 0.0026) (Supplementary Table 2). Similar results were observed when we restricted the analysis to early-stage patients (OR = 0.76, 95% CI = 0.59, 0.98, Ptrend = 0.038), suggesting that the association was unlikely to be due to reverse causation.

Table 3 Associations between breast cancer risk factors and ER status in Kenyan breast cancer patients (N=838)

Given that several risk factors and clinical variables varied by hospital group (Tables 1 and 2), we next tested whether the observed associations varied among patients admitted to different hospital groups. In this analysis, we selected five key risk factors (i.e., BMI, age at first pregnancy, number of children, mean breastfeeding duration per child, and the combination of number of children and cumulative breastfeeding duration) and stratified their associations with ER or HER2 (for BMI) status by the five hospital groups (Fig. 2 and Supplementary Figure 1; Supplementary Tables 3 and 4). With the exception of Nyeri, the associations with ER were fairly consistent across hospitals for age at first birth, parity, and breastfeeding (Fig. 2). In contrast, the association between BMI and HER2 appeared to be driven by AKU patients (Supplementary Figure 1), among whom obesity was significantly more prevalent than among patients in other hospitals; however, this pattern was also observed among patients at Kijabe Hospital.

Fig. 2

Associations between key breast cancer risk factors and ER status by hospitals. Odds ratios (OR) and 95% confidence intervals (CI) were calculated from multivariable logistic regression models with ER status as the outcome variable (ER+ as reference), adjusting for categorized age at diagnosis and BMI

We further evaluated the associations between the risk factors and ER in younger (< 50 years) and older (≥ 50 years) women separately. In general, the associations with most risk factors were similar in younger and older women, except that we observed an association between older age at menarche and ER-negative patients in older (OR = 2.25, 95% CI = 1.04, 4.84, P = 0.038, comparing ≥ 15 to ≤ 13 years) but not in younger women (OR = 0.98, 95% CI = 0.52, 1.87, P = 0.96, comparing ≥ 15 to ≤ 13 years) (Supplementary Table 5).

Associations between breast cancer risk factors and molecular subtypes

Table 4 shows the associations between BC risk factors and molecular subtypes defined by joint receptor status. Compared to luminal A patients, luminal B patients (combining luminal B-HER2+ and luminal B-high proliferative) were more likely to have lower parity (patients with 3 or 4 children, OR = 0.47, 95% CI = 0.28, 0.79, p = 0.005; with 5 or more children, OR = 0.45, 95% CI = 0.23, 0.87, p = 0.018, compared to patients with 1 or 2 children). HER2-enriched patients were less likely to be obese (OR = 0.36, 95% CI = 0.16, 0.81, p = 0.013, comparing ≥ 30 to < 25 kg/m2) or to have older age at menopause (OR = 0.38, 95% CI = 0.15, 0.997, p = 0.049, comparing ≥ 50 to < 50 years). The HER2-BMI association appeared to be stronger among postmenopausal women (OR = 0.24, 95% CI = 0.07, 0.81, p = 0.022) than among premenopausal women. Overall, cumulative or average breastfeeding duration did not vary significantly across subtypes. When the number of children was considered jointly with breastfeeding or age at first birth, luminal B patients with four or more children appeared to have shorter cumulative breastfeeding duration and later age at first birth than luminal A patients (Table 4). Further stratifying luminal B and TN subtypes did not reveal additional associations (Supplementary Table 6).
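Reported odds ratios, confidence intervals, and P values of this kind can be checked for internal consistency. The sketch below assumes the intervals are Wald intervals, which are symmetric on the log scale; the text does not state this explicitly, so it is an assumption, and the function name `wald_summary` is hypothetical. Under that assumption, the geometric mean of the two bounds should reproduce the odds ratio, and the interval width gives the standard error from which an approximate two-sided P value follows.

```python
import math


def wald_summary(odds_ratio: float, lower: float, upper: float,
                 z_crit: float = 1.96):
    """Recover the log-scale midpoint, standard error, and two-sided P value.

    Assumes a Wald interval: log(OR) +/- z_crit * SE.
    """
    midpoint = math.sqrt(lower * upper)  # geometric mean of the bounds
    se_log_or = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se_log_or
    # Two-sided P value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return midpoint, se_log_or, p_value


# HER2-enriched vs. luminal A, obesity: OR = 0.36, 95% CI = 0.16, 0.81.
midpoint, se, p = wald_summary(0.36, 0.16, 0.81)
print(f"midpoint = {midpoint:.2f}, SE = {se:.3f}, P = {p:.3f}")
```

For this estimate the recovered midpoint matches the reported odds ratio and the approximate P value is close to the reported p = 0.013; small discrepancies are expected because the published values are rounded.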

Table 4 Associations between breast cancer risk factors and tumor molecular subtypes in Kenyan breast cancer patients (N=821*)

We also conducted a number of sensitivity analyses to evaluate the impact on our main conclusions of using grade to define subtypes when Ki-67 was missing and of removing nulliparous women from analyses of age at first birth. Overall, the results were similar to those from the original analyses (Supplementary Tables 7, 8, 9).

Discussion

Knowledge of the etiology of early-onset breast cancers is particularly lacking across populations given their rarity. Studying African populations, in which risk factor profiles differ and onset is almost a decade earlier, could provide new insights into breast cancer etiology given the etiologic and molecular subtype heterogeneity across diverse populations.

Data are limited from Africa, where some breast cancer-associated risk and protective factors, such as parity and breastfeeding, have markedly different distributions. The overall risk factor distribution for BC patients in our study is similar to that in a large case-control study from Ghana [19], but is strikingly different from that of other populations, including African Americans [20, 21, 22]. For example, among BC patients in Ghana and Kenya, > 60% of women had ≥ 3 children, > 80% had their first child before age 25 years, and > 90% had breastfed, with an average breastfeeding duration per child of nearly two years. In contrast, among African American BC patients in the African American Breast Cancer Epidemiology and Risk (AMBER) consortium, only 35% had ≥ 3 children and > 40% had never breastfed [21]. Similarly, the prevalence of obesity (BMI > 30 kg/m2, 41.7% in AMBER vs. 29.4% in Kenya) and early age at menarche (< 13 years, 52.3% in AMBER vs. 8.5% in Kenya) was much higher in AMBER [22, 23] than in Kenya. On the other hand, the frequencies of ER-negative cancers (AMBER: 33.9%; Kenya: 30.5%) and TNBC (AMBER: 15.3%; Kenya: 18.6%) were similar in AMBER and Kenya, and lower than among BC patients in Ghana (ER−: 50%; TNBC: 28%).

Parity has been reported to have a dual effect on breast cancer risk: it is protective against ER+ disease but increases the risk of ER− disease, especially among younger women [24, 21]. Despite the heterogeneity in parity-related exposures, the differential effect of parity by ER status has been consistently reported across different populations [25, 21, 19, 26]. Although we were not able to compare relative risks associated with parity in different molecular subtypes due to the case-only design, our finding of higher parity in ER-negative than in ER-positive patients is consistent with results from previous case-control studies [19, 26]. In particular, taking advantage of the much higher parity among patients in Kenya, we observed that the association of parity with ER status was dose-dependent, with the greatest difference by ER status observed among women with five or more children. Similarly, in a population where the vast majority of women had their first child before the age of 30 years, we found an association between younger age at first birth and ER-negative breast cancer, consistent with previous studies [27, 26, 28], supporting increased parity as a risk factor for ER-negative breast cancers across multiple populations. We observed that luminal B patients, both luminal B/high proliferative and luminal B/HER2+, had fewer children than luminal A patients. These results are in line with data from the Nurses' Health Study reporting greater risk reductions associated with parity for luminal B than for luminal A tumors [25], suggesting that parity may have a stronger protective effect against luminal B than against luminal A disease. However, using data from a Malaysian case series, we found that luminal B patients were more likely to be parous and to have breastfed compared to luminal A patients [26]. These inconsistent results warrant further investigation, especially in diverse populations.

Investigations of associations between breastfeeding and breast cancer risk by receptor status have resulted in inconsistent findings, with some showing a similar protective effect for all subtypes [29], and others showing a stronger protection against ER-negative especially TNBC [30]. In the Ghana study in which the frequency of ER-negative breast cancer especially TNBC was higher (28% vs 18% of tumors) than in the Kenya study, the increased risk of parity was offset by more extended breastfeeding, which was only seen among patients < 50 years of age in ER-negative but not in ER-positive patients, while in older women, extended breastfeeding showed an inverse association regardless of ER status yet a stronger association for ER-positive patients [19]. We did not observe significant differences of breastfeeding by ER or by intrinsic subtype, either in all women or by age. The inconsistent findings between different African populations with similar parity and breastfeeding characteristics highlight the complexity of subtype-specific risk associations and the importance of conducting large molecular epidemiologic studies in diverse African populations.

Obesity is a known risk factor for breast cancer in postmenopausal women but is protective in premenopausal women [31]. Obesity can disrupt several biological pathways, leading to insulin resistance and affecting the synthesis of endogenous sex hormones [32, 33]. When we examined the association of obesity with molecular subtypes, we found that patients with HER2-enriched BC were less likely to have a high BMI. Although we cannot completely rule out reverse causality due to weight loss associated with breast cancer, it is unlikely that the association we observed is entirely driven by reverse causation, since BMI did not vary significantly by tumor stage in our study. In addition, the association was stronger among AKU patients, who were more likely to have early-stage disease than patients from other hospitals. Our findings are consistent with a Polish breast cancer case-control study, which found that in premenopausal women, HER2 expression was inversely associated with BMI adjusted for the 4 markers (adjusted p-trend = 0.01) [34]. Our findings are also similar to those of a study conducted in Malaysia, which showed that women with HER2-enriched and TNBC tumors were significantly less likely to be obese than those with the luminal A subtype [26]. Our results are likewise in line with the analysis of African Americans in the AMBER consortium [22] and a pooled analysis of nine studies in the National Cancer Institute (NCI) cohort consortium [27], which showed that, among postmenopausal women, higher recent BMI was associated with increased risk of ER-positive cancer but was either associated with decreased risk of ER-negative tumors (AMBER) or not associated with ER-negative BC (NCI cohort consortium). Notably, the association with BMI observed in our study was mostly driven by HER2 status rather than by TNBC, which is more similar to the findings in the Malaysian study [26].

The strengths of our study include representation of BC cases from multiple hospitals in Kenya, well-annotated risk factor questionnaire and clinical data, and centralized high-quality biomarker assessment in a unique East African population.

This study was limited by the retrospective collection of risk factor data and possible reverse causation, as well as the case-only design, which prohibited us from estimating relative risks associated with each risk factor. Further, despite being the largest BC study of this type conducted in Kenya, the sample size was still relatively small to evaluate risk factors in rare tumor subtypes, especially in age-stratified analyses.

Conclusion

In summary, our findings, based on data from an indigenous African population with unique risk factor profiles, add to the growing body of knowledge regarding the etiologic heterogeneity of breast cancer molecular subtypes among geographically diverse ethnic groups. Further investigations of genetic and environmental factors that modify breast cancer risk in African populations are recommended. Inclusion of diverse regional population groups from sub-Saharan Africa in global breast cancer studies may help provide a better understanding of the subtype-specific breast cancer risk etiology, which will be critical for the development of risk prediction models in African populations.