Background

MicroRNAs (miRNAs) are small non-coding single-stranded RNA molecules (21–25 nucleotides) involved in negative regulation of gene expression [1, 2]. miRNAs are transcribed from long microRNA primary transcripts (pri-miRNAs) containing miRNA precursors (pre-miRNAs, stem-loop molecules of 55–70 nucleotides). Pre-miRNAs are usually generated via the canonical pathway [3], where pri-miRNAs are cleaved by the complex formed by the nuclear ribonuclease (RNase) III DROSHA and the RNA-binding protein DGCR8 (DiGeorge syndrome critical region gene 8) [4]. Following transport to the cytoplasm by XPO5 (exportin 5) and the nuclear protein Ran-GTP, they are processed by DICER1 [5, 6]. The resultant duplex contains the mature miRNA strand and a complementary miRNA* strand; the miRNA* strand is released and degraded, while the mature miRNA is loaded into the RNA-induced silencing complex (RISC), where it associates with the Argonaute/EIF2C (Ago) proteins [7, 8]. Each miRNA may bind to up to 200 gene targets, and a single target may carry binding sites for several different miRNAs, making a highly complex web of interactions affecting a variety of biological pathways [9, 10].

miRNAs are very likely to play an important role in cancer biology because they regulate important cellular processes including growth, differentiation and cell survival [11–16]. Several mechanisms change miRNA synthesis and expression in relation to cancer, including point mutations in miRNA and mRNA sequences, loss or mutation of the promoter regions of specific miRNA clusters, epigenetic changes and alterations in pathways related to dsRBD proteins [17–19]. Single nucleotide polymorphisms (SNPs) in miRNA genes (miR-SNPs) are one example of point mutation (albeit one occurring in the past) that could affect miRNA function in one of three possible ways: by altering transcription of the primary miRNA transcript, by altering processing of the pri-miRNA and pre-miRNA, or by modulating miRNA-mRNA interactions [20–23]. As a result, miR-SNPs have been associated with different types of cancer, including chronic lymphocytic leukaemia and gastric, lung and thyroid carcinoma [24–27].

Breast cancer is the second most commonly diagnosed cancer worldwide (1.67 million new cases, 25%) and the most common type of cancer for women in developed countries (793,684 new cases, 28%) [28]. Recent evidence suggests a role for miR-SNPs in breast cancer susceptibility, including work by Hu et al. in which the presence of mutant alleles of MIR196A2 rs11614913 and MIR499A rs3746444 significantly increased breast cancer risk in Chinese women [29]. However, in genetic association analysis of Caucasian populations and functional studies in breast cancer cell lines performed by Hoffman et al., the MIR196A2 rs11614913 variant was significantly associated with reduced risk of breast cancer, as well as with less efficient processing of MIR196A2 and reduced capacity to regulate target genes, indicating that additional factors may be at work in this SNP’s effect on breast cancer [30]. Additionally, Kontorovich et al. found 2 SNPs, rs6505162 and rs895819, located in the MIR423 and MIR27A precursors respectively, to be significantly associated with decreased risk of breast cancer in BRCA2 mutation carriers from a Jewish population [31]. rs895819 was also significantly associated with reduced risk of developing breast cancer in families with a history of non-BRCA related breast cancer in a later study performed by Yang et al. [32]. Finally, rs2910164, a miR-SNP located in the 3p strand of MIR146A, was found to be associated with a younger age of diagnosis in familial breast cancer for BRCA1 mutation carriers [33].

In this study, we have investigated a panel of miR-SNPs present in miRNA genes previously identified to be involved in the pathophysiological mechanisms of breast cancer. Our initial selection included 24 variants located in 10 miRNA genes or in close proximity which were genotyped using multiplex PCR and matrix-assisted laser desorption/ionisation time-of-flight mass spectrometry (MALDI-TOF MS) analysis. Genotyping results of the 19 miR-SNPs successfully genotyped and included in this study identified six previously reported microRNA SNPs as non-polymorphic. From the remaining polymorphisms twelve miR-SNPs showed no significant differences between cases and controls and one variant in MIR145 was identified to be associated with breast cancer susceptibility in both of our Australian Caucasian breast cancer case-control populations.

Methods

Study populations

Two independent Australian Caucasian breast cancer case populations were available for our study: the Genomics Research Centre Breast Cancer (GRC-BC) population and part of the Griffith University-Cancer Council Queensland Breast Cancer Biobank (GU-CCQ BB). We initially conducted single nucleotide polymorphism genotyping in the GRC-BC population. This consisted of DNA samples from 173 breast cancer patients from South East Queensland and DNA samples from 187 healthy age- and sex-matched females with no personal and/or familial history of breast, ovarian or any other type of cancer, collected at the Genomics Research Centre Clinic, Southport, with research approved by Griffith University’s Human Ethics Committee (Approval: MSC/07/08/HREC and PSY/01/11/HREC) and the Queensland University of Technology Human Research Ethics Committee (Approval: 1400000104). Breast cancer samples comprised prevalent breast cancer cases diagnosed prior to their inclusion in this study. All participants supplied informed written consent. The average age was 57.52 years for cases and 57 years for controls.

Further validation of genotyping results was performed on a subset of the GU-CCQ BB population. A total of 679 DNA samples from breast cancer patients residing in Queensland with a histologically confirmed diagnosis of invasive breast cancer were used to validate genotyping of miR-SNPs. Patient samples had been collected by the Genomics Research Centre in collaboration with the Cancer Council of Queensland as part of a 5-year population-based longitudinal study since January 2010. Patients included in this study were between 33 and 80 years of age, with an average age of 60.16 years, and they were screened for personal and/or familial history of breast, ovarian or any other type of cancer. Control population for the GU-CCQ BB was established from 2 sources: The control group for this cohort comprised genotyping data taken from 201 healthy females belonging to the phase 1 European population from the 1000 Genomes Project. Efforts were made to select a subgroup of individuals that were comparable to the case group in terms of age, ethnicity and sex [34].

Genomic DNA sample preparation from whole human blood

Genomic DNA was extracted from whole blood samples using a modified salting out method described previously [35, 36]. DNA samples were evaluated by spectrophotometry using the Thermo Scientific NanoDrop™ 8000 UV-Vis Spectrophotometer (Thermo Fisher Scientific Inc., Wilmington, DE, USA) to determine DNA yield and 260/280 ratios [37–39]. Samples with a reading below 1.7 for their 260/280 ratio were purified using an ethanol precipitation protocol to guarantee DNA sample purity [40].

miRNA SNP selection

Figure 1 shows the selection process we followed to determine miRNA SNPs (miR-SNPs) that could be included in our study. Two datasets, “The whole miRNA-disease association data” and “The miRNA function set data”, from the Human miRNA Disease Database (HMDD) created by Lu et al. [41] and updated in January 2012, were used to select 8 diseases and/or pathological characteristics and 24 biological and/or cellular functions related to breast cancer (See Table 1). As shown in Fig. 1, we picked the 50 miRNA genes from each dataset that were present in the majority of selected features for inclusion in the following steps. To maximise the potential for identification of biologically relevant molecules, this list was narrowed down to the 25 miRNA genes in each dataset with the strongest evidence using two main criteria: involvement in the largest number of selected features from each group, followed by a literature search to confirm the number of publications showing significant relationships to cancer biology or known functional effects of polymorphisms within the miRNA itself. Following this, we chose 10 miRNA genes from the 25 genes on both lists, again prioritising by number of functions and publications, and conducted a search to identify SNPs using both the dbSNP database from the National Center for Biotechnology Information (NCBI) [42] and the 1000 Genomes project browser [43]. Final selection of SNPs followed this algorithm: all miR-SNPs located inside the pre-miRNA gene were automatically included, as were miR-SNPs located up to 500 bp upstream or downstream of the pre-miRNA. SNPs located more than 500 bp from the 3’ or 5’ end were chosen only if they had a previously reported minor allele frequency higher than 5% in Caucasian populations. As a result, 56 microRNA SNPs were identified in this preliminary selection (data not shown; see Fig. 1).
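The final inclusion rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' actual workflow: the `CandidateSnp` record, its field names and the `is_selected` function are hypothetical, and the distance is assumed to be measured from the nearest end of the pre-miRNA (0 for variants inside it).

```python
from dataclasses import dataclass
from typing import Optional

FLANK_LIMIT_BP = 500    # distance cut-off stated in the text
MAF_THRESHOLD = 0.05    # minor allele frequency cut-off stated in the text


@dataclass
class CandidateSnp:
    """Hypothetical record for one candidate variant (illustration only)."""
    rs_id: str
    distance_bp: int                 # 0 if inside the pre-miRNA, else bp to its nearest end
    caucasian_maf: Optional[float]   # previously reported MAF, None if not reported


def is_selected(snp: CandidateSnp) -> bool:
    """Apply the inclusion rule from the SNP selection paragraph."""
    # Inside the pre-miRNA, or within 500 bp of either end: always included.
    if snp.distance_bp <= FLANK_LIMIT_BP:
        return True
    # Further away: included only with a reported MAF above 5%.
    return snp.caucasian_maf is not None and snp.caucasian_maf > MAF_THRESHOLD


if __name__ == "__main__":
    # Placeholder identifiers, not real rs numbers.
    candidates = [
        CandidateSnp("snp_inside", 0, None),
        CandidateSnp("snp_near", 450, 0.01),
        CandidateSnp("snp_far_common", 2000, 0.12),
        CandidateSnp("snp_far_rare", 2000, 0.02),
    ]
    for snp in candidates:
        print(snp.rs_id, is_selected(snp))
```

Treating a missing frequency as a failure of the MAF criterion is a design choice of this sketch; the text does not say how unreported frequencies were handled.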

Fig. 1

MicroRNA SNP (miR-SNP) selection algorithm using the Human miRNA Disease Database (HMDD). This flow chart shows the workflow for selection of preliminary miR-SNPs included in the genotyping study. Abbreviations: dbSNP, single nucleotide polymorphism database; MAF, minor allele frequency; miRNA, microRNA; NCBI, National Center for Biotechnology Information; SNP, single nucleotide polymorphism

Table 1 Selected features from the Human miRNA disease database (HMDD) included in present study

Primer design

Using the MassARRAY® Assay Design Suite v1.0 software (SEQUENOM Inc., San Diego, CA, USA) we were able to create a single multiplex PCR genotyping assay containing 24 miR-SNPs from our preliminary selection (See Table 2). We designed forward and reverse PCR primers and one iPLEX® (extension) primer and verified that the mass of extension primers differed by at least 30 Da among different SNPs and by 5 Da between alternative alleles of the same marker to achieve successful marker and allele identification by mass spectrometry analysis. Primers were manufactured by Integrated DNA Technologies (IDT®) Pte. Ltd. (Baulkham Hills, NSW 2153, Australia) and primer information is shown in Table 3.
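The mass-spacing requirement described above (at least 30 Da between different SNPs and at least 5 Da between alleles of the same marker) can be expressed as a simple pairwise check. The sketch below is not part of the MassARRAY® Assay Design Suite; the function name, the data layout and the example masses are all hypothetical and serve only to illustrate the constraint.

```python
from itertools import combinations

MIN_DA_BETWEEN_SNPS = 30.0     # minimum spacing between peaks of different markers
MIN_DA_BETWEEN_ALLELES = 5.0   # minimum spacing between alleles of one marker


def spacing_violations(assay):
    """Return pairs of peaks that sit too close together.

    `assay` maps a marker name to a dict of allele -> expected mass in Da.
    """
    peaks = [
        (marker, allele, mass)
        for marker, alleles in assay.items()
        for allele, mass in alleles.items()
    ]
    problems = []
    for (m1, a1, x1), (m2, a2, x2) in combinations(peaks, 2):
        # Same marker: allele-level limit; different markers: SNP-level limit.
        limit = MIN_DA_BETWEEN_ALLELES if m1 == m2 else MIN_DA_BETWEEN_SNPS
        gap = abs(x1 - x2)
        if gap < limit:
            problems.append((f"{m1}:{a1}", f"{m2}:{a2}", gap))
    return problems


if __name__ == "__main__":
    # Made-up masses for two placeholder markers (not values from Table 3).
    example = {
        "snp_1": {"C": 5200.0, "T": 5216.0},
        "snp_2": {"A": 5230.0, "G": 5246.0},
    }
    for first, second, gap in spacing_violations(example):
        print(f"{first} and {second} are only {gap:.1f} Da apart")
```

An empty result means every pair of peaks in the multiplex satisfies both limits.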

Table 2 List of the miRNA SNPs included in multiplex primer design using the MassARRAY® Assay Design Suite v1.0 software (SEQUENOM Inc., San Diego, CA, USA)
Table 3 Primer sequences for the miRNA SNPs included in genotyping study using multiplex PCR reaction and MALDI-TOF MS

Primary multiplex PCR

Genotyping was undertaken following the iPLEX™ GOLD genotyping protocol using the iPLEX® Gold Reagent Kit (SEQUENOM Inc., San Diego, CA, USA). Primer extension reactions were performed according to the instructions for the SEQUENOM linear adjustment method included in the iPLEX™ GOLD genotyping protocol (SEQUENOM Inc., San Diego, CA, USA). All reactions were performed using Applied Biosystems® MicroAmp® EnduraPlate™ Optical 96-Well Clear Reaction Plates with Barcode (Life Technologies Australia Pty Ltd., Mulgrave, VIC, Australia) and an Applied Biosystems® Veriti® 96-Well Thermal Cycler (Life Technologies Australia Pty Ltd., Mulgrave, VIC, Australia).

MALDI-TOF MS analysis and data analysis

A total of 12-16 nl of each iPLEX® reaction product were transferred onto a SpectroCHIP® II G96 (SEQUENOM Inc., San Diego, CA, USA) using SEQUENOM® MassARRAY® Nanodispenser (SEQUENOM Inc., San Diego, CA, USA). SpectroCHIP® analysis was carried out by SEQUENOM® MassArray® Analyzer 4 and the SpectroAcquire software Version 4.0 (SEQUENOM Inc., San Diego, CA, USA). Finally data analysis for genotype determination was done using the MassARRAY® Typer software version 4.0 (SEQUENOM Inc., San Diego, CA, USA). In order to confirm the genotypes obtained, randomly selected samples (5 each for case and control cohorts) from each genotype (n = 240) were validated by Sanger Sequencing to ensure accuracy of genotyping results. In all cases, the Sanger Sequencing confirmed the genotyping obtained using MassARRAY.

Statistical analysis

Statistical analysis of genotypes and alleles was conducted using PLINK software version 1.07 (http://pngu.mgh.harvard.edu/purcell/plink/) [44]. The α for p-values was set at 0.05 to determine statistically significant association with breast cancer. Genotype and allele frequencies for each miRNA SNP in our case and control populations were established and we used Hardy-Weinberg equilibrium (HWE) to evaluate deviation between observed and expected frequencies for identification of unexpected population or genotyping biases [45, 46]. We performed chi-square analysis to evaluate differences in genotype and allele frequencies between cases and controls for each independent population [47]. Finally we calculated odds ratios (OR) with 95% confidence intervals (CI) to assess disease risk.
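For readers who want to see the arithmetic behind these tests, the following standard-library Python sketch implements textbook versions of the three calculations: a chi-square goodness-of-fit test for HWE, a 2 × 2 allelic chi-square test, and an odds ratio with a Woolf (log-based) 95% confidence interval. It is not PLINK's implementation, which may use different methods (for example an exact test for HWE), and all function names and counts shown are hypothetical.

```python
import math


def chi2_p_value_1df(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg proportions for a polymorphic biallelic SNP."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, chi2_p_value_1df(stat)


def allelic_chi_square(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square (no continuity correction) on a 2 x 2 allele count table."""
    n = case_a + case_b + ctrl_a + ctrl_b
    numerator = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2
    denominator = ((case_a + case_b) * (ctrl_a + ctrl_b)
                   * (case_a + ctrl_a) * (case_b + ctrl_b))
    stat = numerator / denominator
    return stat, chi2_p_value_1df(stat)


def odds_ratio_ci(case_a, case_b, ctrl_a, ctrl_b, z=1.96):
    """Odds ratio for allele A with a Woolf confidence interval (z=1.96 for 95%)."""
    odds_ratio = (case_a * ctrl_b) / (case_b * ctrl_a)
    se_log = math.sqrt(1 / case_a + 1 / case_b + 1 / ctrl_a + 1 / ctrl_b)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Made-up counts for illustration; these are not data from this study.
    print(hwe_chi_square(25, 50, 25))
    print(allelic_chi_square(60, 40, 40, 60))
    print(odds_ratio_ci(60, 40, 40, 60))
```

The HWE function assumes both alleles are observed; monomorphic markers would give expected counts of zero and need to be excluded beforehand.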

Results and discussion

MicroRNAs are small non-coding RNA molecules responsible for gene regulation at the translational level. They require a very complex series of nuclear and cytoplasmic processes for their synthesis and to achieve their functional effects on genes involved in key cellular functions like replication and cell differentiation [7, 10, 13]. Because of the important biological mechanisms they regulate, they are likely to play a role in the development and progression of cancer [12, 15, 16]. They have been shown to have different roles in various types of cancer including breast cancer [18, 24]. Breast cancer is a cause for concern since recent reports by the IARC show it has very high incidence and mortality rates around the world [28]. Analysis of single nucleotide polymorphisms in well-defined case-control cohorts has provided information on miR-SNPs involved in the pathophysiology of different types of cancer including breast cancer [25–27, 29–33]. Therefore we selected a panel of 24 miR-SNPs related to 9 miRNA genes previously identified to play a role in breast cancer to genotype in our Australian Caucasian breast cancer case-control populations. Genotyping of our selected miRNA variants in the GRC-BC cohort showed six of them to be non-polymorphic (rs73798217, rs112394324, rs35301225, rs1547354, rs7050391 and rs3851812), and another five of the chosen miR-SNPs failed to deliver genotypes (rs2829801, rs7395206, rs2858059, rs1143770 and rs77585961).

On the other hand, genotype and allele frequencies of the remaining 13 miR-SNPs in the GRC-BC cases and controls were closely similar to those found in HapMap for Caucasian populations, and they were also in Hardy-Weinberg equilibrium (HWE) (p > 0.05). Ultimately, however, chi-square analysis of genotyping results for 12 SNPs located in or near seven miRNA genes (MIR210, MIR221, MIR222, MIR21, MIRLET7A1, MIRLET7A2 and MIR145) showed no significant differences in genotype and allele frequencies between cases and controls in our GRC-BC population (See Tables 4, 5, 6, 7, 8 and 9).

Table 4 Allele and genotype frequencies for miRNA SNPs in MIR210 obtained from the GRC-BC population
Table 5 Allele and genotype frequencies for miRNA SNPs in MIR221 and MIR222 from the GRC-BC Population
Table 6 Allele and genotype frequencies for miRNA SNPs in MIR21 obtained from the GRC-BC population
Table 7 Allele and genotype frequencies for miRNA SNPs in MIRLET7A1 obtained from the GRC-BC cohort
Table 8 Allele and genotype frequencies for miRNA SNPs in MIRLET7A2 obtained from the GRC-BC population
Table 9 Allele and genotype frequencies for rs55945735 located in MIR145 obtained from the GRC-BC population

In contrast, chi-square analysis in our GRC-BC cohort identified a significant difference at the allelic level for rs353291 (p = 0.041), although the difference in genotype frequencies was not significant (p = 0.09) (See Table 10). We then proceeded to genotype this SNP in the GU-CCQ BB population, where the genotype and allele frequencies also closely matched HapMap frequencies for Caucasian populations and cases and controls were in HWE. Statistical analysis of genotyping in our replication population showed similar findings to those obtained in the GRC-BC cohort (Table 10): we found significant differences between cases and controls in the GU-CCQ BB population in both allele frequencies (p = 0.006) and genotype frequencies (p = 0.02). Finally we calculated odds ratios for alleles in the GRC-BC and GU-CCQ BB populations to be 1.37 (CI 95%: 1.01–1.84) and 1.31 (CI 95%: 1.04–1.70) respectively, suggesting the presence of the C allele at this locus increases the risk of developing breast cancer. It should be noted, however, that after correction for multiple testing the allelic association is not significant in either population, since the Bonferroni-corrected α is 2.08 × 10⁻³. Nevertheless, these results are still potentially interesting, particularly considering the similarity of allelic significance in both independent case-control cohorts, and they point to the need for further genotyping in extended populations.
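The Bonferroni threshold quoted above can be reproduced with a short calculation. The sketch below assumes that the correction divides the nominal α of 0.05 by 24 tests, one per miR-SNP in the multiplex assay; this divisor is an inference from the reported value and is not stated explicitly in the text. The function name is hypothetical.

```python
NOMINAL_ALPHA = 0.05
N_TESTS = 24  # assumption: one test per miR-SNP in the multiplex panel


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# Allelic p-values for rs353291 as reported in the text.
allelic_p_values = {"GRC-BC": 0.041, "GU-CCQ BB": 0.006}

corrected_alpha = bonferroni_alpha(NOMINAL_ALPHA, N_TESTS)
significant = {
    cohort: p < corrected_alpha for cohort, p in allelic_p_values.items()
}

if __name__ == "__main__":
    print(f"corrected alpha = {corrected_alpha:.2e}")
    for cohort, passed in significant.items():
        print(cohort, "significant after correction:", passed)
```

Under this assumption neither cohort's allelic p-value falls below the corrected threshold, which matches the statement in the text.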

Table 10 Allele and genotype frequencies for SNP rs353291 located in MIR145 obtained from the GRC-BC and GU-CCQ BB cohorts

miR-SNP rs353291 is located 450 bp upstream from the MIR145 gene, inside the miRNA 143 host gene transcript, on the long arm of chromosome 5 (region 32) at position 148,810,746. To the best of our knowledge there are no previous association studies of this miR-SNP and breast cancer risk in Caucasians or in populations of other ethnicities. However, MIR145 has been previously reported to play a role in cancer biogenesis and progression. Downregulation of MIR145 is a common finding previously reported in colorectal [48, 49], bladder [50], lung [51, 52] and oesophageal [53] cancers, possibly leading to poor prognosis. Similarly, it has also been found to have a tumour suppressor role in breast tissue, particularly in the myoepithelial/basal cell compartment, and it is downregulated in breast cancer tissue samples [54–56]. Research using breast cancer cell lines showed it regulates genes involved in modulation of apoptosis [57, 58]. Finally, downregulation of MIR145 has been associated with more aggressive behaviour of breast malignancies based on experiments performed in both breast cancer cell lines and tissue samples [59, 60]. However, the molecular mechanisms leading to decreased expression of MIR145 remain unknown, and our finding could help to clarify these mechanisms through further functional validation in breast cancer cell culture and/or animal models. It is also possible that rs353291 is linked to a different nearby SNP not considered in this research that may have direct functional effects on MIR145, so additional sequencing/genotyping studies may need to be performed prior to functional assessment of the link between rs353291 and breast cancer.

Conclusions

Our findings indicate that miR-SNP rs353291 is associated with an increased risk of developing breast cancer. To the best of our knowledge, this is the first report on breast cancer risk association for this variant in individuals of Caucasian background. This finding could potentially explain the role of MIR145 in breast cancer previously documented in the literature, but it requires confirmation via functional studies using cell culture or animal models. It also requires further validation in larger Caucasian populations as well as in cohorts of individuals with different ethnic backgrounds before it can be translated into clinical applications used in breast cancer diagnosis or treatment.