Introduction

Breast cancer is the most common cancer among women, accounting for 11.7% of all new cancers and 24.5% of all female cancers. Moreover, it is the fifth leading cause of cancer death overall and the leading cause in women [1]. Epidemiological studies indicate that advances in detection methods such as mammographic screening led to increasing incidence rates of breast cancer during the 1980s and 1990s in many countries. Conversely, widespread screening and reduced menopausal hormone therapy caused a decreased incidence during the early 2000s. However, breast cancer incidence is rising due to changes in lifestyle, sociocultural, and environmental factors. High body mass index (BMI) resulting from a sedentary lifestyle and a high-calorie, junk-food diet, night-shift work, and reproductive and gynecologic factors, including hormonal changes, fewer pregnancies, and reduced lactation, have been identified as risk factors for the disease. Therefore, identifying diagnostic and prognostic markers of the disease is a prominent focus of oncology research [2].

It is estimated that 5–10% of breast cancers are hereditary; most cases, however, are sporadic and influenced by genetic and environmental risk factors, although most of the underlying genetic mechanisms have not been fully defined [3]. Among genetic indicators, polymorphisms are common genomic variations in the general population that have been identified as potential genetic markers for risk assessment. Compared with high-penetrance mutations, however, they are typically associated with moderate risk [4]. Although candidate gene studies have introduced various loci [5], in recent years high-throughput genome-wide association studies have identified many genetic loci associated with the risk of breast cancer, establishing breast cancer as a polygenic complex disease [6].

Caspase 8 protein (CASP8), a 55 kDa cysteine protease, is a member of the caspase family and a key apoptosis signaling molecule. It contributes to inducing cell death, particularly through the death receptor pathway. CASP8 was one of the first low-penetrance loci found to be associated with the risk of breast cancer in candidate gene studies [7,8,9,10]. Furthermore, efforts to identify new variations in fine-mapping [11, 12] and genome-wide association [13] studies have provided evidence that several CASP8 variants are associated with breast cancer risk. Given the importance of allelic variations associated with cancers, including breast cancer [7,8,9,10,11,12,13], this study aimed to investigate the association of CASP8 polymorphisms, haplotypes, and diplotypes with breast cancer risk, prognosis, and clinicopathological features in a northeastern population of Iran.

Materials and methods

Study population

This study was approved by the Ethics Committee of Mashhad University of Medical Sciences under the ethical approval number: IR.MUMS.REC.1394.188. All participants signed a written informed consent at the time of study entry.

Because CASP8 had not been assessed in previous research in Iran, the allele frequencies of its variants in our population were not available, and the exact sample size required for adequate study power (80%) could not be calculated. Consequently, a pilot sample size was chosen based on similar studies in this field, most of which suggested 200–400 samples in each group. The final study population, however, included 1008 participants. The breast cancer group included 455 patients (152 new cases diagnosed between 2016 and 2018 and 303 patients diagnosed between 1987 and 2016 and followed in this period) referred to academic teaching hospitals of Mashhad University of Medical Sciences. The control group consisted of 553 healthy people referred to clinicians between 2016 and 2018 for screening, whose health was confirmed by clinical breast exam (CBE) and mammography. Sociodemographic data were collected using a questionnaire and included age, age at menarche, menopause, and first gestation, BMI, history of lactation and abortion, and physical activity.

A pedigree was drawn for each participant to check the family history of cancer and to identify relatives among the participants. The Manchester Score (MS) was used to estimate the probability of harboring BRCA1/2 mutations [14], and cases highly suspected of hereditary cancer were excluded. After excluding five patients with probable hereditary breast cancer (MS greater than 10), 450 sporadic cancer subjects entered the study as the patient group.

The histopathological data, including breast tumor subtype, stage, grade, and receptor status (ER, PR, and HER2), were extracted from patients’ medical records. Categorization was performed according to the standard protocols of the World Health Organization (WHO) [15], the American Joint Committee on Cancer (AJCC) [16], and the American Society of Clinical Oncology (ASCO) [17]. All cases were followed, and new events, including recurrence, secondary tumors, and metastasis, were documented.

Blood collection and DNA extraction

Five mL of peripheral blood was collected using a Vacuette K2-EDTA blood collection tube (Greiner Bio-One, USA). The salting-out method was used to isolate DNA [18]. The quality and quantity of the extracted DNA were evaluated by gel electrophoresis and an Epoch™ Microplate Spectrophotometer (BioTek Instruments Inc., Winooski, VT, USA). Samples were aliquoted at a concentration of 150 ng/µL and stored at -20 °C until polymerase chain reaction (PCR) analysis.

SNP selection

Twelve validated polymorphisms of the CASP8 gene were selected from different gene regions, including the 5’ UTR (promoter), exon, intron, and 3’ UTR regions. Polymorphisms were selected based on several criteria. The first was validation of the association in numerous GWAS studies, which denotes a strong association with breast cancer risk in different populations. We also selected SNPs located in the same region so that haplotype analysis could be performed to examine the overall effect of these polymorphisms. Finally, we selected markers with an acceptable minor allele frequency (MAF > 5%) and heterozygosity (> 10%) to achieve the highest possible study power. Characteristics of the selected polymorphisms are shown in Additional file 1: Supplementary Table 1.
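The frequency criteria above can be illustrated with a minimal Python sketch. It assumes that heterozygosity means the expected heterozygosity 2pq under Hardy-Weinberg proportions, which the text does not specify; the function names and genotype counts are hypothetical and serve only as an example.

```python
def allele_stats(n_major_hom, n_het, n_minor_hom):
    """Return (minor allele frequency, expected heterozygosity) from genotype counts."""
    n = n_major_hom + n_het + n_minor_hom
    if n == 0:
        raise ValueError("no genotypes supplied")
    # Each homozygote carries two copies of its allele, each heterozygote one.
    q = (2 * n_minor_hom + n_het) / (2 * n)
    maf = min(q, 1 - q)
    # Assumption: heterozygosity is taken as the expected value 2pq.
    expected_het = 2 * maf * (1 - maf)
    return maf, expected_het


def passes_selection(maf, het, maf_min=0.05, het_min=0.10):
    """Apply the thresholds quoted in the text (MAF > 5%, heterozygosity > 10%)."""
    return maf > maf_min and het > het_min


if __name__ == "__main__":
    # Hypothetical genotype counts for one marker.
    maf, het = allele_stats(60, 30, 10)
    print(maf, het, passes_selection(maf, het))
```

A marker with very rare minor alleles, for example hypothetical counts of 98, 2, and 0, would fail both thresholds under this sketch.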

Genotyping

Genotyping was performed using several PCR-based methods. rs3834129, rs12990906, rs3754934, rs3817578, and rs10931936 were genotyped using Tetra-ARMS-PCR; rs2037815 and rs7608692 using allele-specific PCR; and rs3769821 and rs1045485 using RFLP-PCR. rs1045487 and rs6435074 were genotyped by HRM (LightCycler® 96 Instrument, Roche Molecular Systems, Inc.), and rs13113 by TaqMan (TaqMan® SNP Genotyping Assays, catalog number 4351379; Rotor-Gene 6000™ real-time analyzer (Applied Biosystems)). Primers were designed using Primer1, Gene Runner, and WASP (Web-based Allele-Specific PCR assay), and evaluated using OligoAnalyzer and Mfold. The designed primer sequences are shown in Additional file 1: Supplementary Table 2.

Amplification reactions and protocols are shown in Additional file 1: Supplementary Tables 3 & 4. For quality control, 5% of samples were randomly re-genotyped to verify the genotyping results. In addition, three randomly chosen samples were Sanger sequenced to validate the genotyping method for each marker. Sequencing was done using the outer primers for polymorphisms genotyped by Tetra-ARMS-PCR, and new primers flanking both sides of the genotyped region were designed for the other variations.

Haplotype and diplotype analysis

The distribution of haplotypes and diplotypes was assessed using PHASE software version 2.1.1 for Windows [19]. Linkage disequilibrium (LD) was calculated with the 2LD program version 1.00 and evaluated by the D′ statistic, which measures the deviation between the observed and expected haplotype frequencies [20].
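To make the D′ statistic concrete, the following minimal Python sketch computes it for two biallelic loci from a haplotype frequency and the two allele frequencies. It is an illustration of the standard definition only, not the 2LD implementation; the function name and the example frequencies are hypothetical.

```python
def d_prime(p_ab, p_a, p_b):
    """Normalized linkage disequilibrium D' for two biallelic loci.

    p_ab: observed frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    # D is the deviation of the observed haplotype frequency from the
    # frequency expected under independence (p_a * p_b).
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        raise ValueError("D' is undefined when a locus is monomorphic")
    return d / d_max


if __name__ == "__main__":
    # Hypothetical frequencies, for illustration only.
    print(d_prime(0.30, 0.60, 0.40))
```

Values near 0 indicate that the observed haplotype frequency is close to the product of the allele frequencies, while values near 1 or -1 indicate strong disequilibrium.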

Statistical analysis

The Hardy-Weinberg Equilibrium (HWE) assumption was assessed in the case and control samples using the χ² test with one degree of freedom. Data are shown in Additional file 1: Supplementary Table 5. Normality was assessed using the Kolmogorov-Smirnov (K-S) test. Normally distributed continuous variables were compared between the two groups using the independent-samples t-test, and non-normally distributed variables using the Mann-Whitney U test. ANOVA or the Kruskal-Wallis test was used to compare more than two groups. Categorical variables were compared with the chi-square or Fisher’s exact test, as appropriate. Correlations between variables were tested using the Pearson correlation test for normally distributed variables and the Spearman correlation test for non-normally distributed variables.
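As an illustration of the HWE check described above, the following minimal Python sketch computes the goodness-of-fit χ² statistic for one biallelic marker and its p-value with one degree of freedom. It is not the SPSS procedure used in the study; the function name and genotype counts are hypothetical, and the p-value uses the identity that the upper tail of a χ² distribution with one degree of freedom equals erfc(sqrt(χ²/2)).

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium (1 df)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("marker is monomorphic; the test is undefined")
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical genotype counts in a control group.
    chi2, p_value = hwe_chi_square(280, 220, 53)
    print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

Under the significance threshold used in this study, a p-value below 0.05 would indicate deviation from HWE for that marker.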

The associations of alleles, genotypes, haplotypes, and diplotypes with breast cancer risk, breast cancer risk factors, and histopathological status were assessed by logistic regression. Odds ratios (ORs) and 95% confidence intervals (CIs) were calculated for the measured risk factors. Multivariate logistic regression was applied to identify the variables independently associated with the risk of breast cancer. The backward logistic regression (LR) method was used to select variables for the multivariable analysis. The results were also adjusted for potential confounders such as BMI, age at first gestation, and menopause status in the logistic regression analysis.
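For a single binary exposure, such as carrier status under a dominant model, the unadjusted odds ratio and its Wald 95% CI can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a 2×2 table rather than the adjusted logistic regression models fitted in the study, and the function name and counts are hypothetical.

```python
import math


def odds_ratio_ci(case_exposed, case_unexposed, ctrl_exposed, ctrl_unexposed, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval from a 2x2 table."""
    cells = (case_exposed, case_unexposed, ctrl_exposed, ctrl_unexposed)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    odds_ratio = (case_exposed * ctrl_unexposed) / (case_unexposed * ctrl_exposed)
    # Standard error of log(OR) is the square root of the summed reciprocals.
    se_log_or = math.sqrt(sum(1 / c for c in cells))
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: carriers vs. non-carriers in cases and controls.
    or_, lo, hi = odds_ratio_ci(180, 270, 170, 383)
    print(f"OR = {or_:.2f}, 95% CI ({lo:.2f}-{hi:.2f})")
```

An interval that excludes 1 corresponds to a significant association at the 0.05 level in this unadjusted setting; adjusted estimates require a regression model with the confounders included.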

Overall survival (OS) time was defined as the time between diagnosis, dated by the first biopsy confirming the disease, and death due to cancer or last contact. Kaplan–Meier plots with the log-rank test and Cox proportional hazards regression were used to examine the associations between different covariates and overall survival. Hazard ratios (HRs) and 95% CIs were calculated from the Cox models.
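The Kaplan–Meier estimate referred to above can be sketched in a few lines of Python. This is a minimal illustration of the product-limit estimator, not the SPSS routine used in the study; the function name and the follow-up data are hypothetical, and an event value of 1 is assumed to mean death while 0 means censoring at last contact.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up durations (e.g. months from diagnosis)
    events: 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each time with a death.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored subjects both leave the risk set.
        n_at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Hypothetical follow-up times in months and event indicators.
    for t, s in kaplan_meier([12, 30, 45, 60, 60, 90], [1, 0, 1, 0, 0, 1]):
        print(t, round(s, 3))
```

Censored subjects contribute to the risk set until their last contact, which is why they lower the denominator at later event times without producing a step in the curve.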

Statistical analysis was performed using SPSS 16.0 (IBM, USA), and a P-value less than 0.05 was considered significant.

Results

Characteristics of the population

After excluding 5 patients with hereditary breast cancer, 450 breast cancer patients (mean age = 47.20 ± 10.41) and 553 healthy individuals (mean age = 45.88 ± 11.51) were studied. The characteristics of breast cancer cases and cancer-free controls have been shown in Table 1. Furthermore, tumor features of breast cancer patients have been reported in Table 2.

Table 1 The characteristics of breast cancer cases and cancer-free controls
Table 2 Distribution of tumour characteristics of Breast cancer cases

Menstrual status was significantly different between the two groups (p < 0.001). There was no significant difference in lactation and abortion history between the groups (p > 0.05). BMI differed significantly (p < 0.001), with a mean of 27.65 ± 5.05 kg/m² in patients and 25.36 ± 4.36 kg/m² in healthy subjects. When this index was classified into two groups, below and above 25, the percentage of people with a BMI above 25 was higher in the patient group than in the control group (p < 0.001).

Evaluation of clinicopathologic features indicated that the most common tumor type in the study population was invasive ductal carcinoma, accounting for 75.1% of the specimens examined. In situ, lobular, and metastatic tumors were less prevalent. Examination of tumor grade and stage showed that most patients (56%) had low-grade tumors, and 50.7% of patients were identified in the early stages of the disease (1 and 2). In terms of tumor size, small tumors were the most frequent (64.9% of all specimens). Lymph node status showed that 47.4% of patients were lymph node-positive, most often with 1 to 3 involved nodes. Assessment of hormone receptor status showed that estrogen or progesterone receptors were positive in more than 60% of patients, and HER2 overexpression was observed in 22.9% of patients.

Evaluation of overall survival in patients showed that 5-year overall survival was 90%, and 10-year overall survival was 85%.

Association of CASP8 genotypes, haplotypes and diplotypes with breast cancer risk

Hardy–Weinberg equilibrium in the healthy controls is shown in Additional file 1: Supplementary Table 5. For polymorphisms that deviated from Hardy–Weinberg equilibrium, the genotyping results were verified by re-genotyping a random 5% of samples, and the results were consistent with the original genotypes. Statistical analysis showed that rs3834129 was associated with breast cancer risk in the dominant (II + ID vs. DD) (pAdj = 0.034) and recessive (ID + DD vs. II) (pAdj = 0.014) models. In the dominant model, rs2037815 G-allele carriers (GA + GG) (pAdj = 0.031), rs7608692 A-allele carriers (GA + AA) (pAdj = 0.006), and rs10931936 T-allele carriers (TT + CT) (pAdj < 0.001) had a higher risk of breast cancer. On the other hand, carriers of the rs3754934 A allele (CA + AA) had a reduced risk of breast cancer in the dominant model (pAdj = 0.004). We did not find a significant association between breast cancer risk and rs3769821, rs6435074, rs3817578, rs1045485, rs1045487, or rs13113 in our study population. Allele and genotype frequencies are reported in Table 3. Further information about the analyses based on different genetic models is given in Additional file 1: Supplementary Table 6, and significant findings are shown in Tables 4 and 5.

Table 3 The frequency of alleles and genotypes of CASP8 polymorphisms in breast cancer and healthy groups
Table 4 Association of CASP8 polymorphism, haplotypes and diplotypes with breast cancer risk, the clinico-pathological features and overall survival
Table 5 Association of CASP8 polymorphism with overall survival

Among the haplotypes, the CTG haplotype of rs3817578-rs10931936-rs1045485, with a prevalence of 18.8%, was associated with an increased risk of breast cancer (pAdj < 0.001). Two 4-SNP haplotypes, two 5-SNP haplotypes, and one 6-SNP haplotype were also associated with the risk of breast cancer in the study population. Since the frequency of identified haplotypes with more SNPs was lower than 10%, they were not investigated in this study. Diplotypes were also identified using the haplotype data. Among the identified diplotypes with a frequency of more than 10%, four [rs3817578-rs10931936-rs1045485 (CCG-CTG), rs3817578-rs10931936-rs1045485 (CCC, CTG), rs3754934-rs3817578-rs10931936-rs1045485 (CCCG-CCTG), and rs3754934-rs3817578-rs10931936-rs1045485-rs1045487 (CCCGG-CCTGG)] were associated with breast cancer risk. Significant results are reported in Tables 4 and 5.

Association of CASP8 polymorphisms, haplotypes and diplotypes with clinicopathological features and overall survival

Genotypes, haplotypes, and diplotypes were extensively analyzed for a potential correlation/association with breast cancer clinicopathological characteristics and overall survival. Significant results have been presented in Tables 4 and 5.

Evaluation of the genotypes with respect to clinicopathological features showed associations of rs3834129 (p = 0.034) and rs2037815 (p = 0.026) with menstrual age, rs1045487 with age at diagnosis (p = 0.022), rs13113 with BMI (p = 0.029), rs7608692 with molecular category (p = 0.039), and rs3754934 with ER status (p = 0.008).

Haplotype analysis identified a four-SNPs haplotype correlated with ER status (p < 0.001). Furthermore, three six-SNPs diplotypes were correlated with the stage of the disease (p = 0.017), HER2 status (p = 0.043), and BMI (p = 0.004).

Evaluation of overall survival in patients showed that 10-year overall survival was 87% (Fig. 1A). Overall survival comparison between different genetic models of rs3754934 polymorphism showed that the C allele was associated with a lower risk of death than the A allele [p = 0.022; HR = 0.46, 95% CI (0.23–0.89)] in all patients (Fig. 1B), as well as in hormone-positive group [p = 0.038; HR = 0.37, 95% CI (0.14–0.95)] (Fig. 1C). Furthermore, the CC genotype was associated with a lower risk of death than the AA genotype in the hormone-positive group [p = 0.002; HR = 0.09, 95% CI (0.02–0.43)] (Fig. 1D). However, we did not find any haplotypes and diplotypes associated with overall survival.

Fig. 1 Overall survival curves in the total population (A and B) and in hormone receptor-positive breast cancer patients (C and D). A: Kaplan–Meier overall survival curve of patients with breast cancer in the total population; B: Kaplan–Meier overall survival curves for rs3754934 alleles (A vs. C) in all breast cancer patients; C: Kaplan–Meier overall survival curves for rs3754934 alleles (A vs. C) in hormone receptor-positive breast cancer patients; D: Kaplan–Meier overall survival curves for rs3754934 genotypes (AA & AC vs. CC) in hormone receptor-positive breast cancer patients

Discussion

Dysregulation of apoptosis is well established in the pathogenesis of cancer. Several genomic variations in CASP8, a key element of apoptosis, have been reported in association with breast cancer [21]. Furthermore, its overexpression can induce programmed cell death in breast tumors [22, 23]. Our results indicate that variations in CASP8 are associated with the risk of breast cancer as well as with clinicopathological features.

For rs3834129, the most prevalent validated variant, the I/D and D/D genotypes were associated with a 1.32-fold and 1.42-fold lower risk of breast cancer, respectively, indicating a dose-dependent effect of the deletion allele similar to that reported in a Chinese population [24]. While a large study in Europeans found no significant association [25], a meta-analysis has confirmed a reduced risk of breast cancer in association with the deletion allele, resulting in a reduction in the overall risk of cancer in the Asian and Caucasian populations but not in Africans [26]. Consistent with the 47% increased risk of the disease observed here for rs7608692 A-allele carriers in the dominant model, data from a meta-analysis showed the association of the A allele with a 35% increased risk of cancer in the Asian population [27]. In addition, rs10931936 may increase the risk of breast cancer by up to 73%, and carriers of the T allele in the dominant model also had a two-fold increased risk. In a GWAS in England, the association of rs10931936 with breast cancer was reported with a 13% increased risk [11]. This result was confirmed by a 7% increased risk in the European population [28]. However, a study of in situ breast cancer patients reported no association between this polymorphism and breast cancer risk [29]. While A-allele carriers of the rs3754934 polymorphism in the dominant model had a 51% reduced risk of breast cancer in our population, a study of this variant in the British population did not indicate a significant association [11].

Association studies have confirmed the higher statistical power of haplotype analysis compared with the analysis of alleles or genotypes alone [30, 31]. In this regard, haplotype analysis indicated that combinations of multiple CASP8 loci, including 3-SNP, 4-SNP, and 5-SNP haplotypes, were associated with a 58–78% increased risk of breast cancer in the study population. In two previous studies considering different polymorphisms of CASP8, several haplotypes, including rs7608692, rs3834129, rs3817578, and rs1045485, were reported to be associated with a 28–31% increased risk of breast cancer [11, 12]. In these studies, the two polymorphisms rs3834129 and rs1045485 were introduced as prominent risk-related variants, in line with the present study.

While one previous study did not find such associations [11], another reported that some CASP8 variants were related to pathological factors [32]. Regarding age, the associated markers may be useful in setting up a direct-to-consumer test for early diagnosis in routine screening or for the assessment of prognosis. Previous findings have shown that patients diagnosed at younger ages had more aggressive features and worse prognoses than those diagnosed at older ages [33]. These results suggest that the genetic architecture of the disease may differ between older and younger patients, and that as yet unknown genetic factors may be responsible for different tumor behaviors. However, many of the molecular mechanisms of these effects are unknown, and functional studies are required to identify common pathways and potential diagnostic and prognostic targets.

Polymorphisms are recognized as important prognostic markers because they can alter the uptake and absorption of chemotherapy drugs and may influence the response to chemotherapy and, ultimately, the outcome of the disease [34, 35]. In the study population, however, only CASP8 rs3754934 showed a relationship with prognosis. Previously, the rs3769821 [36] and rs1045485 [37] polymorphisms have been reported to be associated with an increased risk of death in advanced lung adenocarcinoma and breast cancer, respectively. Also, the rs3834129 deletion allele was associated with poor prognosis in the German population, which contrasts with the protective effect of this allele on breast cancer risk [37].

Conclusion

The present study with a carefully selected range of genetic markers across the CASP8 gene region can add more evidence to the literature about the overall role of the gene in breast cancer and improve the information about the genetic basis of the disease. Based on the results of this study, which was conducted for the first time in the Northeastern female population of Iran, CASP8 gene polymorphisms, haplotypes, and diplotypes may be used as predictive markers for the risk and prognosis of breast cancer. In addition, identified haplotypes and diplotypes which carry certain risk-related alleles may have the ability to be used in multigenic tests to calculate individual risk levels for personalized medicine purposes.

These findings also suggest that the allele frequencies of the studied variants in the Iranian population differ from those in reports on other Asian populations. This may indicate profound differences in the genetic background of populations and, consequently, different effects of alleles. Given that the eleven variants studied in this project were studied for the first time in Iran, the rigorously quality-controlled frequencies obtained in this project can be used to calculate the appropriate sample size for future studies. In addition, identifying the mechanism of action of these haplotypes can help to clarify the tumorigenic process and may open new windows to the identification of therapeutic targets.