Breast cancer is the most common cancer among females and accounts for 14% of female cancer deaths worldwide [1]. In Korea, breast cancer is the second most common cancer, accounting for 14.7% in 2008, and its incidence rate has increased by 6.8% annually during the last 6 years. The age-specific incidence rates of breast cancer among Korean women differ somewhat from those of Western countries, peaking in the 45–49 years age group, with a much higher proportion of young-onset breast cancer than in Western countries [2–4]. While breast cancer incidence has increased over the past 30–40 years, mortality has remained stable or even decreased in the last 10–15 years, a likely result of earlier detection and improved treatment strategies [5]. However, breast cancer survival can vary considerably not only among different ethnic groups but also across different subtypes of breast cancer. These differences have been partially explained by the traditional prognostic and predictive factors related to clinical and pathological features of breast cancer [6]. Previous candidate gene studies and, more recently, genome-wide association studies have implicated inherited factors as influences on breast cancer survival. However, the reported effect sizes were very small to moderate and do not fully explain the heterogeneity in breast cancer survival [7–9].

MicroRNAs (miRNAs) are a major class of endogenous short non-coding RNAs that post-transcriptionally modulate gene expression in a sequence-specific manner [10]. The role of miRNAs in human cancer pathogenesis has been well established by the identification of aberrant miRNA expression in many types of cancer, including breast cancer [11]. It has also been suggested that perturbations in miRNA expression facilitate key pathways involved in cancer progression, such as inflammation, cancer cell invasion, angiogenesis, and metastasis [12]. Some miRNAs have been reported to be associated with the invasive and metastatic phenotype of breast cancer cell lines and to be correlated with metastatic tumor tissues. On the other hand, some miRNAs serve as metastasis suppressors, and their expression has been shown to be frequently downregulated or lost in both breast cancer cell lines and metastatic foci [13].

There is increasing evidence that genetic variants in precursor or mature miRNAs, in their biogenesis pathway genes, or in miRNA binding sites of target mRNAs are associated with the risk of cancer development [14]. More recently, genetic variants in miRNA-related genes have been shown to be associated with survival in several types of cancer, including colorectal cancer [15], renal cell carcinoma [16], ovarian cancer [17] and head and neck cancer [18].

Although the role of miRNA biogenesis pathway genes in cancer development and progression has been well established, the association between genetic variants in these pathway genes and breast cancer survival is still unknown. Here, we selected 41 SNPs in 14 candidate genes (AGO1, AGO2, DICER1, DGCR8, DROSHA, FMR1, GEMIN3, GEMIN4, HIWI, RAN, TARBP2, XPO5, p68 and p72) involved in the canonical miRNA biogenesis pathway [19, 20] and evaluated their associations with breast cancer survival.


Study population

The REMARK criteria were used to report our data [21]. The study subjects were derived from a previous study that investigated the association between polymorphisms in miRNA biogenesis pathway genes and breast cancer risk [22]. The Seoul Breast Cancer Study (SeBCS) has been described in that study. Briefly, histologically confirmed breast cancer cases (n = 3,497) and controls (n = 1,273) were recruited by the Seoul National University Hospital and Asan Medical Center between 2001 and 2007. A questionnaire was administered by trained interviewers to collect information on demographic, reproductive and other lifestyle factors. Peripheral blood was drawn into 10-mL heparinized tubes and stored at −70°C until genotyping. Those with a previous history of cancer (n = 156) or a previous history of hysterectomy or oophorectomy (n = 268) were excluded. After the additional exclusion of subjects with no or insufficient DNA samples (n = 760), 559 patients were available for genotyping. Patients were followed up to March 2010 using a retrospective chart review with a standard protocol to collect clinicopathological features, treatments, and outcomes such as recurrence and death. Vital status was additionally checked through the death registry of the Korea National Statistical Office in May 2011.

Patients diagnosed with in situ breast cancer (n = 49), benign breast cancer (n = 13), or multiple cancers at diagnosis (n = 2), and those with incomplete linkage of clinical and death records due to errors in the resident registration number (n = 7), were additionally excluded from the survival analysis. Thus, a total of 488 invasive breast cancer patients were included in the final analysis. The baseline characteristics by study inclusion are presented in Additional file 1: Table S1.

Informed consent was obtained from every patient when the questionnaire was administered. The study design was approved by the Committee on Human Research of Seoul National University Hospital (IRB No. H-0503-144-004).

SNP selection and genotyping

The selection of genes and SNPs has been previously described in detail [22]. Using data from the International HapMap Project, dbSNP (NCBI), and web-based SNP selection tools [23], we selected 41 haplotype tagging SNPs (htSNPs), defined by linkage disequilibrium (LD) with a squared pairwise correlation coefficient r2 > 0.90, in the genomic region from 5 kb upstream to 3 kb downstream of the largest cDNA isoform of each gene, with a minor allele frequency (MAF) > 0.05 in Asian populations: 35 SNPs in 12 genes in the miRNA biogenesis pathway and 6 SNPs in 2 genes involved in hormonal regulation of miRNA biogenesis. To measure the LD between SNPs, LD parameters (r2 and D′) were calculated using Haploview version 4.2 software (Whitehead Institute, Cambridge, MA). In the case of multiple potentially functional SNPs within the same haplotype block, only one SNP was included in our analyses. The proportion of common variants tagged with r2 greater than 0.8 ranged from 32.4% to 100% [22].
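As an illustration of the two LD parameters named above, the following minimal sketch computes r2 and D′ for a pair of biallelic SNPs from haplotype and allele frequencies. It is not the Haploview implementation; the function name `ld_stats` and its arguments are hypothetical, and D′ is reported as an absolute value.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float) -> tuple[float, float]:
    """Return (r2, D') for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    Both loci are assumed polymorphic (0 < p_a, p_b < 1).
    """
    # Raw disequilibrium coefficient D
    d = p_ab - p_a * p_b
    # r2 normalizes D^2 by the product of the four allele frequencies
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    # D' normalizes |D| by its maximum possible value given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return r2, d_prime
```

Under a tagging threshold such as r2 > 0.90, one SNP of a pair exceeding the threshold would be retained as the tag for the other.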

Genotyping for 41 SNPs was conducted using the TaqMan assay (Applied Biosystems, Carlsbad, California). Primers and probes are available upon request from the authors. A randomly selected 5% of samples and negative controls were included to ensure the accuracy of genotyping. The concordance rates for quality-control samples were >95.0% for all assays. For all 41 SNPs investigated, no significant deviation from Hardy-Weinberg equilibrium was observed among the controls (P > 0.01). We discarded one SNP in TARBP2 (rs2280448) that showed a minor allele frequency of less than 5%.
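The Hardy-Weinberg check can be sketched as a one-degree-of-freedom chi-square goodness-of-fit test on genotype counts. The text does not state which HWE test was used, so the chi-square form here is an assumption, and `hwe_chi2` is a hypothetical helper name.

```python
import math


def hwe_chi2(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Chi-square test (1 df) for Hardy-Weinberg equilibrium.

    Takes observed counts of the three genotypes and returns (chi2, p_value).
    Assumes the SNP is polymorphic, so every expected count is above zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

With the threshold quoted above, a SNP would be flagged when the returned p-value in controls falls at or below 0.01.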

Statistical analyses

The differences in patient characteristics by study inclusion were assessed by Pearson’s χ² test for categorical variables and Student's t-test for continuous variables.

The primary outcomes were disease-free survival (DFS) and overall survival (OS). For the DFS analysis, we included only patients with stage I-III disease (n = 480), excluding patients diagnosed with stage IV disease (n = 8). DFS time was defined as the time from the date of surgery until the date of the first locoregional recurrence, first distant metastasis, second primary cancer or death from any cause. Patients known to be alive with no evidence of disease progression were censored at the last follow-up date or 31 March 2010, whichever came first. OS time was defined as the time from diagnosis until the date of death from any cause, with censoring at the date of the last follow-up or 31 May 2011, whichever came first.
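The DFS definition above can be expressed as a small helper that returns a follow-up time and an event flag. This is a sketch only: the function name, the use of years as the time unit, and the handling of events recorded after the censoring date (treated here as censored) are assumptions that the text does not specify.

```python
from datetime import date
from typing import Optional

# Administrative cut-off for DFS follow-up, as stated in the text
DFS_CUTOFF = date(2010, 3, 31)


def dfs_time(surgery: date, event_date: Optional[date], last_follow_up: date,
             cutoff: date = DFS_CUTOFF) -> tuple[float, int]:
    """Return (time_in_years, event_flag) for one patient.

    event_date is the earliest of recurrence, distant metastasis, second
    primary cancer or death, or None if no such event was recorded.
    """
    # Censor at last follow-up or the cut-off, whichever comes first
    censor_date = min(last_follow_up, cutoff)
    if event_date is not None and event_date <= censor_date:
        return (event_date - surgery).days / 365.25, 1
    return (censor_date - surgery).days / 365.25, 0
```

OS time would follow the same pattern, starting from the date of diagnosis and using 31 May 2011 as the cut-off.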

The Kaplan-Meier method was used to estimate the survival function, and differences across survival curves were examined using the log-rank test. For the DFS analysis, Cox proportional hazards regression models were used to estimate hazard ratios (HRs) and 95% confidence intervals (CIs), adjusting for age at diagnosis, tumor size (≤ 2 cm and > 2 cm), lymph-node involvement (no and yes), histologic grade (I-II and III), nuclear grade (I-II and III), estrogen receptor (ER) status (positive and negative), progesterone receptor (PR) status (positive and negative), and hormone receptor therapy (yes and no) in the final model. For the OS analysis, tumor size (≤ 2 cm and > 2 cm), lymph-node involvement (no and yes), distal organ metastasis (no and yes), histologic grade (I-II and III) and hormone receptor therapy (yes and no) were included in the final model. Other covariates considered but not included in the final model were menopausal status, adjuvant chemotherapy, and radiotherapy, because these variables did not alter the HRs significantly after adjusting for other covariates (statistical significance for inclusion in the final model was set at P < 0.20) and a model adjusted for all covariates made no substantial difference to the results. The significance of the full versus reduced model was calculated with an F-test.
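For readers unfamiliar with the product-limit estimator, a minimal standard-library sketch of the Kaplan-Meier calculation is given here. It is illustrative only (the analyses in this study were run in SAS), and the function name `kaplan_meier` is hypothetical.

```python
def kaplan_meier(times: list[float], events: list[int]) -> list[tuple[float, float]]:
    """Return [(time, survival_probability)] at each distinct event time.

    times:  follow-up time for each subject
    events: 1 if the event occurred at that time, 0 if the subject was censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving past t
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        # Events and censored subjects both leave the risk set after t
        n_at_risk -= removed
    return curve
```

For four subjects with times 1, 2, 3, 4 and the second subject censored, the estimate steps to 0.75, 0.375 and 0.0 at times 1, 3 and 4.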

The proportional hazards assumption of the Cox model was examined by graphical evaluation of Schoenfeld residual plots. We generated dummy variables indicating missing (1) and non-missing (0) values for all covariates, and all missing covariate values were coded as ‘-1’. When fitting the regression models, each covariate was included together with its respective dummy variable for missingness so that all patients were included in the statistical models.
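The missing-indicator coding described above can be sketched as follows. The helper name `missing_indicator` and the use of `None` to represent a missing value are assumptions made for illustration.

```python
from typing import Optional


def missing_indicator(values: list[Optional[float]]) -> tuple[list[float], list[int]]:
    """Recode one covariate for the missing-indicator method.

    Returns (coded_values, missing_flags): missing entries are coded as -1
    and flagged with 1; observed entries keep their value and are flagged 0.
    """
    coded: list[float] = []
    flags: list[int] = []
    for v in values:
        if v is None:
            coded.append(-1)
            flags.append(1)
        else:
            coded.append(v)
            flags.append(0)
    return coded, flags
```

Both returned columns would then enter the regression model together, so that no patient is dropped because of a missing covariate.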

For the individual SNP analysis, we tested three genetic models (additive, dominant and recessive) to evaluate the significance of each SNP, and the best-fitting model for each SNP was selected as the one with the smallest P value.
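The three genetic models correspond to three numeric codings of the same genotype. A minimal sketch follows, with hypothetical function names, where the input is the number of minor alleles a subject carries (0, 1 or 2).

```python
def encode_genotype(minor_allele_count: int) -> dict[str, int]:
    """Code one genotype under the additive, dominant and recessive models."""
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("minor_allele_count must be 0, 1 or 2")
    return {
        "additive": minor_allele_count,             # per-allele effect
        "dominant": int(minor_allele_count >= 1),   # carriers vs non-carriers
        "recessive": int(minor_allele_count == 2),  # minor homozygotes vs others
    }


def best_model(p_values: dict[str, float]) -> str:
    """Pick the model with the smallest P value, as described in the text."""
    return min(p_values, key=p_values.get)
```

Selecting the smallest P value across three models is itself a form of multiple testing, which is one reason the subsequent correction step matters.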

Given the number of SNPs investigated, the Benjamini-Hochberg false discovery rate (FDR) method was used to assess the statistical significance after correction for multiple comparisons. We considered FDR < 0.05 as being noteworthy [24].
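The Benjamini-Hochberg adjustment can be sketched in a few lines. This is a generic step-up implementation written for illustration, not the SAS procedure used in the study, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A SNP would be considered noteworthy under the criterion above when its adjusted value is below 0.05.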

For the haplotype analysis, subjects with missing data at one or more of the five polymorphic sites were excluded. Assuming HWE, the expectation-maximization algorithm was used to calculate the maximum likelihood estimates of the haplotype frequencies using SAS PROC HAPLOTYPE. Haplotypes with frequencies of less than 5% in all patients were combined into one group. The most common haplotype was compared with the other haplotypes with adjustment for selected covariates. Haplotype data were treated as a categorical variable and were incorporated as dummy variables in the Cox model. Cumulative effects of SNPs were assessed by counting the number of high-risk genotypes in each subject. High-risk genotypes were defined as the genotypes carrying the risk-conferring allele shown to be significantly associated with DFS or OS. The number of high-risk genotypes was categorized into low-, medium-, and high-risk groups, and HRs and 95% CIs were calculated for each group in comparison with the low-risk group.
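The cumulative-effect score is a simple count per subject. The sketch below uses the DFS grouping reported in the Results (0 to 2, 3, and 4 to 6 high-risk genotypes); the dictionary of high-risk genotypes is a partial illustration based on two SNPs described in the Results, and the function names are hypothetical.

```python
# Illustrative subset only: genotypes reported as high-risk for two SNPs
HIGH_RISK = {
    "rs11786030": {"AG", "GG"},  # AGO2
    "rs1057035": {"TC", "CC"},   # DICER1
}


def count_high_risk(genotypes: dict[str, str],
                    high_risk: dict[str, set[str]] = HIGH_RISK) -> int:
    """Count how many of a subject's genotypes fall in the high-risk sets."""
    return sum(1 for snp, risk_set in high_risk.items()
               if genotypes.get(snp) in risk_set)


def risk_group(n_high_risk: int) -> str:
    """Map the count onto the three groups used for the DFS analysis."""
    if n_high_risk <= 2:
        return "low"
    if n_high_risk == 3:
        return "medium"
    return "high"
```

A subject with an ungenotyped SNP contributes nothing for that SNP in this sketch; how such subjects were handled in the actual analysis is not stated.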

All statistical procedures were conducted using SAS version 9.2 (SAS Institute, Cary, NC). All P values reported were two-sided.


The differences in patient characteristics by study inclusion were evaluated (Additional file 1: Table S1). None of the clinical characteristics compared differed significantly between the two groups, other than the distribution of age. The patients included in this study were younger than those excluded (mean (standard deviation), 46.6 (11.0) vs. 47.9 (9.8), P = 0.02); however, menopausal status was not significantly different between the groups.

Table 1 Baseline characteristics of subjects by follow up status

Of the 488 invasive breast cancer patients included in this study, 323 (66.5%) were premenopausal women, 449 (92.0%) had ductal carcinoma, 37 (7.6%) had lobular, mucinous or papillary carcinoma, and the histologic type was unknown in 2 cases. Over 83% of all cases (n = 408) had early-stage disease (stage I-IIB).

During a median follow-up of 4.24 years (range, 0.1-8.3 years) for DFS, there were 76 recurrences, 11 second primary cancers, and 3 deaths among the 480 patients diagnosed with stage I-III disease. In addition, during a median follow-up of 6.24 years (range, 0.2-9.5 years) for OS, there were 41 deaths from any cause among the 488 patients. Table 1 summarizes univariate and multivariate-adjusted HRs for DFS and OS by patient characteristics. There were significant differences across the survival curves for DFS and OS according to tumor size, lymph-node involvement, metastasis, histologic grade, nuclear grade, ER status, adjuvant chemotherapy and hormone receptor therapy (log-rank P < 0.05, data not shown). In the multivariate analysis, tumor size, lymph-node involvement, histologic grade, nuclear grade, ER status, PR status, and hormone receptor therapy remained independent and significant prognostic factors for DFS, and tumor size, lymph-node involvement, metastasis, histologic grade and hormone receptor therapy remained so for OS (P < 0.20).

Results from the association analyses for 40 SNPs and breast cancer prognosis are presented in Table 2. There were seven SNPs significantly associated with breast cancer survival. Two SNPs in AGO2 (rs11786030 and rs2292779) and DICER1 rs1057035 were associated with both DFS and OS. Two SNPs in HIWI (rs4759659 and rs11060845) and DGCR8 rs9606250 were associated with DFS only, while DROSHA rs874332 and GEMIN4 rs4968104 were associated with OS only (P < 0.05, Table 2). After correction for multiple comparisons, statistical significance was retained only for the association between AGO2 rs2292779 and OS.

Table 2 Association of SNPs in microRNA biogenesis pathway genes and breast cancer survival

The most significant association was observed for the AGO2 rs11786030 G allele. Cases with the AG/GG genotypes had a 2.62-fold increased risk of disease progression (95% confidence interval (CI), 1.41-4.88) and a 2.41-fold increased risk of death (95% CI, 1.05-5.50). AGO2 rs2292779 was also associated with poor prognosis of breast cancer. The minor allele (G) of rs2292779 was associated with a 1.42-fold increased risk of disease progression in a dose-dependent manner (95% CI, 1.05-1.87), and the association with the risk of death was stronger, with an adjusted HR of 2.94 in the recessive model (95% CI, 1.18-4.35). In addition, rs1057035, located in the 3’UTR of DICER1, was associated with both DFS and OS. The TC/CC genotypes of rs1057035 were associated with a 1.72-fold increased risk of disease progression (95% CI, 1.00-2.99) and a 2.08-fold increased risk of death.

DGCR8 rs9606250 and two SNPs in HIWI (rs4759659 and rs11060845) were associated with DFS only. The variant allele (T) of DGCR8 rs9606250 was significantly associated with DFS, with an adjusted HR of 0.21 (95% CI, 0.05-0.84) in the dominant model. The HIWI rs4759659 variant allele (A) was associated with a decreased risk of disease progression (per-allele HR, 0.50; 95% CI, 0.29-0.85), whereas the rs11060845 variant allele (T) was associated with an increased risk of disease progression.

The DROSHA rs874332 C allele was associated with an increased risk of death in the recessive model (adjusted HR TT/TC vs CC, 2.24; 95% CI, 1.21-4.17), and the GEMIN4 rs4968104 A allele was associated with a decreased risk of death in a dose-dependent manner (per-allele HR, 0.46; 95% CI, 0.21-0.99).

We conducted haplotype analysis for 13 genes other than p68 and XPO5. We found that common haplotypes of AGO2 were associated with both DFS and OS (Table 3). The four most common haplotypes were G-A-T-C-A, C-A-T-C-G, G-A-T-C-G, and G-A-C-C-A, with respective frequencies of 37.1%, 26.6%, 19.6% and 3.5%, accounting for 87.1% in all patients. The G-A-T-C-G haplotype was associated with a significantly increased risk of disease progression and death compared with the most common haplotype, G-A-T-C-A (adjusted HR, 2.66; 95% CI, 1.49-4.73; P = 0.001 and adjusted HR, 3.11; 95% CI, 1.34-7.23; P = 0.008, respectively). No significant associations were observed for haplotypes of the other genes (data not shown).

Table 3 Association of AGO2 haplotypes and breast cancer survival

We also found cumulative effects of SNPs on DFS and OS. Compared with subjects carrying 0 to 2 high-risk genotypes, those carrying 3 and 4–6 high-risk genotypes had an increased risk of disease progression, with adjusted HRs of 2.16 (95% CI, 1.18-3.93) and 4.47 (95% CI, 2.45-8.14), respectively (HR for trend, 2.11; P for trend, 6.11E-07). A similar pattern was observed for OS, although the association was stronger than that for DFS (HR for trend, 2.80; P for trend, 3.30E-05) (Table 4). The Kaplan-Meier survival function was consistent with the proportional hazards assumption (data not shown).

Table 4 Cumulative effect analysis by the number of unfavorable genotypes in miRNA biogenesis pathway genes and disease free survival


In this study, we found seven SNPs significantly associated with breast cancer prognosis, as well as a gene-dosage effect of an increasing number of high-risk genotypes on DFS and OS of breast cancer. Two SNPs in AGO2 (rs11786030 and rs2292779) and DICER1 rs1057035 were associated with both DFS and OS. Two SNPs in HIWI (rs4759659 and rs11060845) and DGCR8 rs9606250 were associated with DFS only, while DROSHA rs874332 and GEMIN4 rs4968104 were associated with OS only. Furthermore, a specific AGO2 haplotype was also associated with both DFS and OS.

AGO2 plays a key role in miRNA-mediated gene silencing as a component of the miRNA-induced silencing complex that directly binds miRNAs and mediates cleavage of target mRNA [25]. There is emerging evidence from in vitro analyses and clinical samples that abnormal expression or enzymatic function of AGO2 is associated with cancer development and progression. In breast cancer cell lines, overexpression of AGO2 was shown to induce the transformed phenotype [26]. Moreover, variations in the genomic structure of AGO2, such as copy number changes or frameshift mutations, have been reported to be associated with several cancers, including multiple myeloma, gastric and colorectal cancer [27, 28].

We observed that the variant alleles of rs2292779 and rs11786030 in AGO2 were both associated with poor DFS and poor OS. In our previous study, we reported that SNP rs3864659 in AGO2 is associated with a reduction in breast cancer risk. However, we did not observe a significant association between rs3864659 and breast cancer survival, implying that it plays different roles in tumor development and in survival of breast cancer patients. The rs2292779 G allele was more significantly and strongly associated with OS than with DFS, while the effect sizes of rs11786030 were similar for OS and DFS. Furthermore, the haplotype containing the minor alleles of both SNPs (rs2292779 and rs11786030) was also significantly associated with both DFS and OS, showing a slightly larger effect than those estimated from the single SNPs. AGO2 rs2292778 was not genotyped in our study, but it is a strong proxy for AGO2 rs2292779, based on the HapMap Chinese in Beijing (CHB) (r2 = 0.94), Japanese in Tokyo (JPT) (r2 = 0.95) and CEPH European ancestry (CEU) (r2 = 1.00) populations. It is intriguing that AGO2 rs2292778 is a highly evolutionarily conserved allele (conservation score = 0.992) although it resides outside the coding region of AGO2, suggesting a potential functional effect of these variants [25]. However, whether the function or expression level of AGO2 is regulated via the proxy or the identified SNP remains to be answered through experimental study.

The SNP rs9606250 in DGCR8 was found to be associated with longer DFS. However, we could not evaluate the effect of rs9606250 on OS due to the low frequency of minor allele homozygotes. DGCR8 (Pasha) is a double-stranded RNA-binding protein involved in processing primary miRNA precursors into pre-miRNAs as a component of a multi-protein complex with the RNase III enzyme DROSHA (RNASEN). A possible role of the DGCR8 gene in the clinical outcome of breast cancer has recently been reported [25]. It was shown that impaired miRNA processing through knockdown of DGCR8 facilitates breast cancer cell invasion. Interestingly, both rs9605062 and rs9606250 in DGCR8 are in high LD with rs9606232 (CHB and JPT, r2 = 1.00; CEU, r2 = 0.83), which is located at a conserved transcription-binding site of DGCR8. Thus, it is plausible that the level or timing of gene expression might be regulated by these variants, although more detailed functional studies are needed to provide evidence for the underlying mechanism. In previous studies, three SNPs in DGCR8 were associated with OS in ovarian cancer, consistent with our result [17], although no significant association was found between genetic variants in DGCR8 and survival of patients with colorectal cancer [15], head and neck cancer [18] or renal cell carcinoma [16].

We also observed that the variant allele of HIWI rs4759659 was associated with longer DFS, although it was not associated with OS. In addition, HIWI rs11060845 was associated with an increased risk of disease progression in the present study, whereas this SNP was associated with reduced breast cancer susceptibility in our previous study, implying that the same variant has different effects on cancer development and progression. HIWI (PIWIL1) is suggested to play an important role in stem cell renewal, division, germ cell proliferation, RNA silencing and translational regulation [29]. In addition, HIWI expression has been shown to be associated with tumor development, including gastric cancer [30] and pancreatic adenocarcinoma [31]. Furthermore, the expression level of HIWI has been shown to be associated with cancer survival in esophageal squamous cell carcinoma [32] and soft-tissue sarcoma [21]. It is notable that HIWI is significantly overexpressed in several metastatic tumor tissues compared with benign hyperplasia or chronic inflammatory lesions, and that its expression is positively correlated with Ang-2 and Tie-2, which play a key role in angiogenesis [33]. However, further study is warranted to determine whether the identified variant leads to decreased expression of the HIWI gene and whether the effect of this variant on breast cancer microinvasion or metastasis is mediated by the angiogenic pathway.

We also found that the variant allele of DROSHA rs874332 was associated with poor OS in breast cancer. DROSHA (RNASEN) is an RNase III enzyme that, together with DGCR8, mediates the processing of pri-miRNAs into pre-miRNAs. The significance of aberrant expression levels of DROSHA as a potential prognostic factor has been substantiated in several studies on esophageal cancer [34], cervical neoplastic progression [35], ovarian cancer [36], and neuroblastoma [37].

In breast cancer, downregulation of DROSHA and/or DICER was preferentially observed in distinct subgroups [38]. In addition, haplotypes of DROSHA and DICER, but not individual SNPs, were associated with altered survival and recurrence of renal cell carcinoma [16]. Through an in vitro functional study, Noh et al. demonstrated that less efficient miRNA processing caused by knockdown of DROSHA, DICER and DGCR8 is responsible for reduced levels of mature forms of tumor-suppressive miRNAs, and ultimately facilitates breast cancer cell invasion by upregulating the expression of urokinase-type plasminogen activator (uPA) [39]. Considering that DROSHA rs642321, a proxy of rs874332 (CHB, r2 = 0.93; JPT, r2 = 0.87; CEU, r2 = 0.82), is located in the 3’-untranslated region (UTR) of DROSHA and is a predicted miRNA binding site, it is plausible that rs874332 could be associated with translational repression and/or mRNA destabilization of DROSHA through miRNA-mRNA interaction.

We performed a polygenic risk model analysis to investigate the cumulative effects of SNPs on breast cancer survival. By combining six SNPs for DFS and five SNPs for OS, we observed a strong dose–response trend toward an increasing risk of disease progression or death with an increasing number of high-risk genotypes. This cumulative effect of variants in miRNA biogenesis pathway genes on breast cancer survival is consistent with the notion that germline genetic variation could be associated with cancer progression and survival in a polygenic manner.

Although the present study included tagging SNPs in most currently known miRNA biogenesis pathway genes, future studies should investigate a comprehensive list of genetic variants in miRNA genes and in target binding sites in the 3’-untranslated region to understand how deregulation of the miRNA network, potentially caused by these genetic variants, is implicated in breast cancer prognosis.

There are several limitations in this study. We could not evaluate the association between the selected SNPs and breast cancer-specific mortality because data on the cause of death were not available. Thus, the results on the association with OS must be interpreted cautiously and need to be confirmed in a study of breast cancer-specific survival. The primary limitation is the limited power to detect potential effects of genetic variants on breast cancer prognosis, especially for the several SNPs with low minor allele frequencies. With the current sample size, this study has only 1 to 66% statistical power to detect effect sizes of 1.10 to 1.40 (Bonferroni-corrected P value = 0.001). Given the limited power and the loss of significance after adjustment for multiple comparisons, some of the SNPs we identified might be chance findings. Furthermore, the risk profiles of the genetic variants identified in this study could manifest differently in different ethnic populations, given differences in MAF and underlying genomic structure, and interactions with environmental factors and other modifiers could affect the penetrance of these SNPs.


We evaluated a comprehensive list of tagging SNPs in most currently known miRNA biogenesis pathway genes and identified associations between several putative genetic variants in these genes and breast cancer survival. To the best of our knowledge, this is the first study to provide evidence suggesting that common genetic variants in miRNA biogenesis pathway genes may affect breast cancer survival, individually and jointly. To elucidate the risk-conferring effects of the SNPs observed in this study, further epidemiological studies in large, independent populations and functional studies are needed to validate our findings.