Introduction

MicroRNAs (miRNAs) are small, non-coding, single-stranded RNAs ranging in size between 18 and 22 nucleotides; they are typically excised from longer, 60- to 110-nucleotide stem-loop precursors [1, 2]. miRNAs are involved in fundamental biological processes, including development, differentiation, apoptosis, and proliferation, and are believed to act predominately as post-transcriptional regulators that can either degrade their mRNA targets or repress their translation [3]. A single miRNA may have multiple mRNA targets, and up to 30% of human genes may be regulated by miRNAs [4, 5].

Aberrant expression of miRNAs in cancer was initially identified in B-cell chronic lymphocytic leukemia [6], and miRNA dysregulation has been subsequently reported for many tumor types in which, depending on the specific target mRNA(s), they may act either as tumor suppressor genes or as oncogenes [7, 8]. In breast cancer, post-diagnosis miRNA levels have been shown to correlate with a number of tumor characteristics, including stage, vascular invasion, proliferative index, and estrogen receptor/progesterone receptor (ER/PR) status [9, 10], and may have prognostic value.

miRNAs have recently been found in human serum and plasma, where they appear to be resistant to RNase degradation and thus relatively stable, even in stored samples [11]. This stability has made miRNAs appealing candidates for epidemiologic studies of stored samples, particularly since miRNA profiling requires only small amounts of serum or plasma [12]. The use of circulating miRNA profiles as potential early-detection cancer markers has generated considerable interest [13–16], although data addressing such application remain sparse. Initial studies have suggested that serum levels of miRNAs may differ between diagnosed cancer cases and controls [17], and several recent case-control studies of breast cancer have reported evidence of differential miRNA expression levels in serum [18–21]. These studies have shown little agreement, perhaps because some have measured only a few miRNAs whereas others have used more comprehensive miRNA screens but with a small number of subjects. None has used samples obtained prior to diagnosis. Use of such prospective samples avoids a number of important potential biases (for example, differential selection and processing of cases and controls or the possibility that the differences observed in case samples are the result of biopsy, cancer treatments, behavioral changes, stress, or other factors experienced by cases but not controls).

Here, we report on a study that prospectively collected serum samples from 205 women who subsequently developed breast cancer and 205 women who remained cancer-free and that used microarrays to comprehensively assess known miRNAs.

Materials and methods

Study population

The Sister Study [22] is a prospective cohort study of 50,884 women and was designed to examine the environmental and genetic determinants of breast cancer. The cohort has been previously described [23]; briefly, women from the US or Puerto Rico were eligible to enroll if they themselves had never had breast cancer but had a full or half-sister who had breast cancer. At baseline interview, all participants provided extensive information, including family history, reproductive history, and information about potential risk factors. Informed consent and blood samples were obtained during a home visit. For women who subsequently developed breast cancer, detailed information on diagnosis was collected from medical records and self-report. Pathology reports were abstracted for tumor grade, stage, and other information, including status for ER, PR, and HER-2 (human epidermal growth factor receptor 2) expression. The study was approved by the Institutional Review Board of the National Institute of Environmental Health Sciences, National Institutes of Health, and the Copernicus Group Institutional Review Board.

Selection of cases and controls

We designed a matched-pair nested case-control study. We selected patients who had confirmed invasive breast cancer, who completed enrollment by August 2008, and whose diagnosis occurred within 18 months following blood draw (n = 242). We excluded 29 cases who lacked a serum sample or whose sample had integrity issues during collection and shipping and eight cases whose sample had limited volume, leaving 205 cases that are the focus of our study. For each case, a matched control was selected from the 50,884 participants on the basis of the following criteria: no history of cancer (other than non-melanoma skin cancer), having completed enrollment by August 2008, an available blood sample, same race (non-Hispanic white, black, Hispanic, or other), similar age at enrollment (within 5 years), and similar date of blood draw (within 2 months). Three replicate serum samples from each of three women (nine samples in total) who were not participants in the study but who provided blood samples that were collected and processed in the same manner as Sister Study participants were used to provide technical replicates.
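The matching criteria above can be read as a simple eligibility check. The following Python sketch is illustrative only: the `Participant` record, the 61-day approximation of "within 2 months", and the closest-age tie-break in `pick_control` are assumptions made for the example, not details reported for the Sister Study.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Participant:
    # Hypothetical record layout; field names are assumptions for illustration.
    pid: str
    race: str                 # "non-Hispanic white", "black", "Hispanic", or "other"
    age_at_enrollment: float  # years
    blood_draw: date
    enrolled_by_cutoff: bool  # completed enrollment by August 2008
    cancer_free: bool         # no cancer history other than non-melanoma skin cancer
    has_sample: bool          # blood sample available


def is_eligible_control(case: Participant, cand: Participant,
                        max_age_gap: float = 5.0,
                        max_draw_gap_days: int = 61) -> bool:
    """Apply the stated matching criteria to one candidate control."""
    return (
        cand.pid != case.pid
        and cand.cancer_free
        and cand.enrolled_by_cutoff
        and cand.has_sample
        and cand.race == case.race
        and abs(cand.age_at_enrollment - case.age_at_enrollment) <= max_age_gap
        and abs((cand.blood_draw - case.blood_draw).days) <= max_draw_gap_days
    )


def pick_control(case: Participant, pool: List[Participant]) -> Optional[Participant]:
    """Return the eligible candidate closest in age (assumed tie-break), or None."""
    eligible = [c for c in pool if is_eligible_control(case, c)]
    if not eligible:
        return None
    return min(eligible,
               key=lambda c: abs(c.age_at_enrollment - case.age_at_enrollment))
```

In practice a selected control would also be removed from the pool so that it is not matched to a second case; that bookkeeping is omitted here.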

Assignment to extraction batches and array chip lot

To minimize possible processing and chip lot effects, samples were assigned to processing batches of seven to nine pairs, and batches had similar distributions of age, race, and date of enrollment. For array hybridization, each batch was assigned to one of two different chip lots ('A' and 'B') in a manner designed to ensure a balance of these same characteristics. The nine replicates (described above) were assigned to the same batch and chip lot. Laboratory personnel were blinded to case-control status and other phenotype information.

RNA extraction, labeling, and hybridization

Total RNA was extracted in batches by using a Total RNA purification kit (cat. no. 17200; Norgen Biotek Corp., Thorold, ON, Canada). In accordance with the manufacturer's recommendation not to exceed 200 µL per column, 400 µL of total serum from each individual was split into two equal 200-µL aliquots and then processed separately following the manufacturer's recommended protocol for total RNA purification from serum. An on-column DNase digestion was added before sample elution by using an RNase-Free DNase I Kit (cat. no. 25710; Norgen Biotek Corp.), and the two aliquots were subsequently pooled. Fixed volumes rather than fixed amounts of RNA were used in accordance with other studies [24].

Total RNA (8 µL) was directly labeled by using Flash Tag Biotin HSR Labeling kits (cat. no. HSR30FTA; Genisphere, LLC, Hatfield, PA, USA) in accordance with the instructions of the manufacturer. RNA was heated to 80°C for 10 minutes before labeling to inactivate any residual DNase activity. RNA was hybridized for 42 hours to the GeneChip miRNA 2.0 array (cat. no. 901755; Affymetrix Inc., Santa Clara, CA, USA [25]). The GeneChip miRNA 2.0 arrays contain 100% miRBase version 15 coverage of 131 organisms and contain probes for 3,439 human non-coding RNAs (ncRNAs), including 1,105 miRNAs and 2,334 other ncRNAs (including scaRNAs and snoRNAs). The arrays were washed and stained by using standard Affymetrix protocols and scanned by using an Affymetrix GCS 3000 7G Scanner. Feature intensities were extracted by using miRNA 2.0 array library files. Array hybridization and scanning were completed by Precision Biomarker Resources, Inc. (Evanston, IL, USA). The average Spearman correlation coefficient values for three sets of three technical replicates were all above 0.8 (Additional file 1). Array data were deposited into the NCBI Gene Expression Omnibus (GSE44281).
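The replicate concordance check can be sketched as a mean pairwise Spearman correlation within one set of technical replicates. This is a minimal pure-Python illustration, assuming each replicate is a list of probe intensities in the same probe order; the function names are hypothetical, and the study does not state how its averages were computed.

```python
from itertools import combinations
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def mean_pairwise_spearman(replicates: Sequence[Sequence[float]]) -> float:
    """Average Spearman correlation over all pairs within one replicate set."""
    pairs = list(combinations(replicates, 2))
    return sum(spearman(a, b) for a, b in pairs) / len(pairs)
```

For a set of three replicates this averages three pairwise correlations; a set would pass the check described above if the result exceeded 0.8.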

Replication samples and qRT-PCR

An independent set of 10 women were used to validate selected miRNAs via quantitative reverse transcription-polymerase chain reaction (qRT-PCR). Five women who provided consent and blood samples but who developed breast cancer prior to completing enrollment were selected as cases, along with five controls who also provided consent and blood samples and who were cancer-free but did not complete enrollment. Total RNA was extracted from serum samples of these women as described above with the addition of Synthetic C. elegans miScript miRNA Mimic (cat. no. MSY0000010; Qiagen, Valencia, CA, USA). Synthetic cel-39 was spiked-in at a final concentration of 0.25 fmol/µL prior to extraction and used as a PCR normalization control. The RNA concentration, reverse transcription, and pre-PCR steps were carried out in accordance with a previously published protocol [26]. ExoSAP-IT (cat. no. 78250; Affymetrix Inc.) treatment followed by column purification (cat. no. 28004; Qiagen) in accordance with the protocol of the manufacturer was used to purify the pre-PCR product. Individual PCR was run in triplicate by using 1 µL of purified pre-PCR product. The reaction contained the following components: 2x Taqman universal master mix (cat. no. 4324018; ABI, Carlsbad, CA, USA), 1 µM forward primer, 1 µM universal reverse primer, and 0.2 µM probe. The reaction was run on a Bio-Rad CFX 384 Real-Time System (Bio-Rad Laboratories, Inc., Hercules, CA, USA) by using the following parameters: 55°C for 2 minutes, 95°C for 10 minutes, followed by 40 cycles of 95°C for 15 seconds and 55°C for 1 minute. PCR cycle threshold (Ct) values were recorded for each target gene and for normalization controls and were averaged across three independent runs. Primers for miR-222, miR-181a, miR-1825, and miR-18a were custom-ordered from IDT (San Diego, CA, USA) by using previously published sequences [26]. Primers for cel-39 were designed in the same fashion as above and custom-ordered from IDT.

To determine the best candidate miRNA for PCR normalization in our data set, we ran the array expression data from the 47 miRNAs expressed in almost all individuals through the NormFinder software [27]. NormFinder uses a model-based variance estimation approach [28]. Using these results, we selected as a qRT-PCR normalization control miR-1825, which showed one of the highest stability values across the 410 cases and controls and had blood levels that were similar to those of the three target miRNAs. We used the average of miR-1825 and an external spike-in cel-39 control, a strategy shown to be effective for controlling both technical and biologic variability in qRT-PCR assays from serum [17, 24]. The efficiencies of the four PCR assays (for miR-181a, miR-18a, miR-222, and miR-1825) were similar (Additional file 2). Normalized relative expression was based on Ct values and calculated as 1/(Ct_gene − Ct_norm), where Ct_gene is the Ct of the target miRNA and Ct_norm is the Ct of the normalization control.
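As a worked illustration of the normalization formula, the sketch below assumes that the normalization Ct is the arithmetic mean of the miR-1825 and cel-39 Ct values and that replicate Ct values are averaged first; both are readings of the text rather than confirmed details, and the function names are hypothetical.

```python
from typing import Sequence


def mean_ct(ct_values: Sequence[float]) -> float:
    """Average Ct across replicate runs."""
    return sum(ct_values) / len(ct_values)


def normalized_relative_expression(ct_gene: float,
                                   ct_mir1825: float,
                                   ct_cel39: float) -> float:
    """Return 1 / (Ct_gene - Ct_norm).

    Ct_norm is assumed here to be the mean of the endogenous (miR-1825)
    and spike-in (cel-39) control Ct values.
    """
    ct_norm = (ct_mir1825 + ct_cel39) / 2.0
    delta = ct_gene - ct_norm
    if delta == 0:
        raise ValueError("Ct_gene equals Ct_norm; the ratio is undefined")
    return 1.0 / delta


# Example with made-up Ct values: target 30, controls 26 and 24.
example = normalized_relative_expression(30.0, 26.0, 24.0)
```

With these made-up values the normalization Ct is 25, the difference is 5 cycles, and the normalized relative expression is 0.2. A smaller Ct difference (target closer to the controls) gives a larger value.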

Data processing and statistical analysis

miRNA expression intensity values were background-corrected and normalized across arrays by using the robust multichip average method [29]. The intensity data used in all analyses were log2-transformed.

For each array, the miRNA probe set signals were compared with the distribution of signals for anti-genomic probes that had matching GC content (miRNA QC Tool, version 1.0.33.0), and in accordance with the recommendation of the manufacturer, a Wilcoxon rank-sum test P value of less than 0.06 was used to identify miRNAs above background. Subsequent analysis was restricted to 414 miRNAs that exceeded background levels in at least 50 women. Conditional logistic regression was used to identify which of those 414 miRNA probes were differentially expressed between cases and controls. Because analysis of circulating miRNAs in prospectively collected samples is still exploratory, we (like some other investigators of circulating miRNAs [30, 31]) regard these results as descriptive and not as tests of hypotheses and so provide P values that are unadjusted for multiple comparisons.

The association between miRNAs and the tumor characteristics of hormone receptor status (ER, PR, and HER-2) and lymph node status was tested in a case-only logistic analysis adjusted for race. Chip lot and batch were specified as random effect variables. All statistical analyses were performed by using R 2.15.

Pathway analysis with Ingenuity Pathway Analysis

miRNAs found to be significantly associated with case-control status were further analyzed with Ingenuity Pathway Analysis (IPA) [32]. Using IPA's microRNA target filter, we generated a list of predicted mRNA targets for each of the 21 significant miRNAs. The list was then restricted to the mRNAs listed in the IPA database as experimentally verified targets of any of the 21 miRNAs. This mRNA target list was then used to run a canonical pathway analysis.

Results

A large number of miRNAs are detected in serum

In total, 410 serum samples from breast cancer cases (n = 205) and controls (n = 205) were analyzed in this study; baseline characteristics of the cases and controls are summarized in Table 1. Of the 1,105 human miRNAs, 414 miRNAs were detected above background threshold levels in at least 50 women. Forty-seven miRNAs were detected above background in 400 or more women (Table 2), and miR-16 showed the highest average expression. Even though expression of miRNAs showed considerable inter-individual variation, several miRNAs, including miR-1825 and miR-1228, were relatively constant among women (Figure 1).

Table 1 Demographic characteristics of study population
Table 2 Number of microRNAs detected above background
Figure 1. A large number of microRNAs (miRNAs) are detected in serum. Box-and-whisker plots showing the log2-normalized expression for the 47 miRNAs that are expressed above background in 400 or more individuals. Expression levels were adjusted for batch and chip lot across all samples. The black line represents the median, and the top and bottom of the box represent the upper and lower quartiles, respectively. Dots represent the outliers.

Discovery of differentially expressed miRNAs in serum

When paired case-control analysis of the 414 miRNAs expressed above background was used, 21 miRNAs showed significantly different levels in cases and controls (P ≤0.05) (Table 3). The differences were small, ranging from 4% to 19%. Higher miRNA expression in women destined to become cases was significantly more common (16 of 21 miRNAs) than would be expected by chance alone (binomial test, two-tailed P <0.05). Differential miRNA expression was not stronger in women close to their time of diagnosis, but sample size was small and all cases were diagnosed within 18 months of blood draw (data not shown). Using qRT-PCR on a small independent replication set of five cases and five controls, we further examined the three miRNAs (miR-18a, miR-181a, and miR-222) with the highest expression in cases. As predicted, all three miRNAs showed higher levels in cases, although none was statistically significant in this small set of women (Additional file 3).
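The binomial comparison for the direction of change (16 of 21 miRNAs higher in cases) can be reproduced with an exact test against a null probability of 0.5. The sketch below is a minimal Python illustration; the function name is hypothetical, and doubling the smaller tail is one common two-sided convention that is exact here only because the null distribution is symmetric.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided P value for k successes in n trials under p = 0.5.

    Doubles the smaller tail probability and caps the result at 1.
    """
    total = 2 ** n
    upper = sum(comb(n, i) for i in range(k, n + 1)) / total   # P(X >= k)
    lower = sum(comb(n, i) for i in range(0, k + 1)) / total   # P(X <= k)
    return min(1.0, 2.0 * min(upper, lower))


# 16 of the 21 differentially expressed miRNAs were higher in cases.
p_direction = binomial_two_sided_p(16, 21)
```

For 16 of 21 the upper tail is 27,896/2,097,152, so the two-sided P value is about 0.027, consistent with the reported P <0.05.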

Table 3 Twenty-one differentially expressed microRNAs with a P value of not more than 0.05

The impact of miRNA alterations on regulatory pathways

To explore potential biological associations, we ran IPA on the 82 experimentally verified mRNA targets of the 21 differentially expressed miRNAs. Sixteen IPA canonical pathways, including molecular mechanisms of cancer, were enriched as were other cancer-related pathways, including p53 signaling, cyclins and cell cycle regulation, and Myc-mediated apoptosis signaling (Additional file 4).

miRNA expression association with tumor characteristics

To investigate the potential association of serum miRNA expression with tumor characteristics in the 205 women who later developed breast cancer, we subclassified them into groups based on tumor characteristics (Table 4) and performed a case-case comparison. There was no evidence of significant differences in serum miRNA levels based on tumor ER or PR staining characteristics. In comparisons of serum samples from the 25 women who developed HER-2-positive tumors with 147 samples from women who developed HER-2-negative tumors, there were seven miRNAs with significantly different expression (P ≤0.05); one miRNA was overexpressed and six miRNAs were underexpressed in serum from women who developed HER-2-positive tumors (Figure 2A and Additional file 5). Case-case comparison of serum from women who subsequently developed lymph node-negative tumors (pN0, n = 153) with that of women who developed lymph node-positive tumors (pN1, pN2, or pN3, n = 52) revealed 10 differentially expressed miRNAs (P ≤0.05); five were overexpressed and five were underexpressed in serum from women who developed node-positive tumors (Figure 2B and Additional file 6).

Table 4 Patient tumor characteristics
Figure 2. Serum microRNA (miRNA) expression is associated with tumor subtype. (A) Serum miRNAs significantly associated with HER-2 expression (negative differences correspond to lower levels in women developing tumors with overexpression) (P ≤0.05). (B) Serum miRNAs significantly associated with nodal status (pN1 or higher versus pN0) (P ≤0.05). P values and percentage change were determined by using a linear mixed model.

Discussion

miRNA profiles are gaining interest as potential diagnostic or prognostic markers for breast cancer [33]. However, existing studies have been limited by sample size or the number of miRNAs analyzed, and none has used prospectively collected samples [18, 31, 34]. Our study minimized potential biases by profiling global serum miRNA expression patterns in samples obtained from women prior to clinical diagnosis (mean time to diagnosis was 10 months). We found a set of 21 miRNAs differentially expressed in serum samples from 205 women who subsequently developed breast cancer compared with 205 women who remained cancer-free during the time of follow-up. The differences in miRNA levels were small and include both overexpression and underexpression of miRNAs in the cases, and overexpression was significantly more frequent than would be expected had the association been random. Published reports of primary breast tumors or cell lines have examined seven of the 21 differentially expressed miRNAs we found, and all seven showed agreement with the direction of change in our case serum samples (Table 3). IPA of the mRNA targets of these differentially expressed miRNAs suggested gene enrichment for cancer-related signaling pathways. Although the absolute differences in miRNA levels between serum samples of cases and controls are quite small, differences pre-date clinical diagnosis and may reflect important pathways for breast cancer development.

miR-18a, miR-181a, and miR-222 showed the highest percentage difference between cases and controls in our study; qRT-PCR of these miRNAs in a small independent replication set of cases and controls, though not statistically significant, replicated the direction of change for all three. These three miRNAs have been suggested to act as oncogenes through regulation of their potential target mRNAs. miR-18a is part of the oncogenic miR-17-92 cluster, which is often overexpressed in solid tumors, including breast [35]. Overexpression of this cluster is believed to cooperate with c-Myc in stimulating proliferation by negatively regulating E2F1 [36, 37]. Increased expression of miR-181a in the bone marrow of patients with breast cancer has been reported to be associated with shorter disease-free survival, higher grade, and breast cancer recurrence [38]. miR-181a is believed to target the tumor suppressor gene programmed cell death protein 4 (PDCD4) [38], which inhibits neoplastic transformation [39]. In breast cancer cell lines, miR-222 overexpression has been reported to be associated with tamoxifen resistance through targeting the cell cycle inhibitor p27 (Kip1) [40]. miR-222 has also been reported to increase proliferation of ERα-negative cells while reducing the expression of various tumor suppressor proteins [41], and expression of miR-222 has been reported to increase cell migration in the epithelial-to-mesenchymal transition acting downstream of the RAS-RAF-MEK oncogenic pathway [42].

Interestingly, two recent case-control studies have provided evidence that both miR-222 and miR-181a are overexpressed in the serum of patients with breast cancer. One used sequencing by oligonucleotide ligation and detection (SOLiD) of serum samples obtained prior to surgery from 13 breast cancer cases compared with samples from 10 healthy controls and found 26 miRNAs that were overexpressed in cases, including miR-222 and miR-181a; overexpression of miR-222 was validated in an independent group of 50 cases and 50 controls by using qRT-PCR [20]. A second study used Solexa sequencing combined with TaqMan low-density array chips on serum samples obtained prior to surgery from 48 breast cancer cases and 48 controls; 10 miRNAs were found to be overexpressed in the cases, and four were validated by using qRT-PCR in an independent group of 76 cases and 76 controls [21]. That study also found overexpression of miR-222 [21]. These studies, combined with our prospective study, provide a growing body of evidence that miR-222 measured in blood is associated with breast cancer.

Among cases, we compared the serum miRNA profiles of women with different tumor characteristics, including hormone status (ER, PR, and HER-2) and nodal status. Although there were no differences by ER or PR status, there were differences by HER-2 and lymph node status. Of the seven miRNAs differentially expressed in the serum of women who developed HER-2-overexpressing breast tumors, miR-93, miR-183, and miR-29a have been reported to be associated with breast cancer in previous studies [20, 43, 44]. In our study, miR-93 was underexpressed in the serum of women who developed HER-2-overexpressing breast tumors; interestingly, miR-93 expression was recently shown to induce a more differentiated cell phenotype in breast cancer cell lines, and expression of miR-93 in mouse mammary fat pads blocked tumor development and metastases [44]. Of the 10 miRNAs differentially expressed in the serum of women with tumors that spread to the lymph nodes (pN1 or higher), four (miR-145, miR-124, miR-125b, and miR-320) have been reported to be associated with breast cancer in previous studies [45–48]. Of these, miR-320 is of particular interest as we found three miR-320 family members (miR-320b, miR-320d, and miR-320e) to be underexpressed in the serum of women who developed lymph node-positive breast tumors. miR-320 has been reported to be decreased in breast tumor tissue, and downregulation of miR-320 (through loss of phosphatase and tensin homolog, PTEN) has been shown to promote tumor proliferation and invasiveness in mouse models; expression of miR-320 distinguished human normal breast stroma from tumor stroma and was correlated with recurrence [49]. A study comparing miRNA expression in inflammatory breast cancer (IBC) with non-IBC also found miR-320 to be downregulated in the more aggressive IBC group of tumors [50]. Thus, loss of miR-320 expression may be associated with a higher likelihood of lymph node involvement and a more aggressive metastatic phenotype.

Although miRNAs that are differentially expressed between tumor and normal tissue are more frequently downregulated in tumor tissue [7], our study (like others [20, 21]) has found that circulating miRNAs that differ in levels between breast cancer patients and controls are more frequently at higher levels in case blood samples. The mechanism underlying circulating miRNA stability is still being investigated. One model involves the active release of miRNAs from cells in membrane-bound microvesicles, including exosomes and shedding vesicles [51–53]. There is evidence that microvesicles can deliver miRNAs to recipient cells and trigger changes in target mRNA levels [54]. A recent report has shown that vesicle-encapsulated miRNAs represent only a minor portion of circulating miRNAs but that a significant portion of circulating miRNAs are associated with Argonaute2 (Ago2) [55], the effector component of the miRNA-induced silencing complex [56]. Both models support the possibility that miRNAs may be actively released into circulation and could act as signaling molecules able to regulate their target mRNA expression in recipient cells. Cancer-associated miRNAs in the circulation could also originate from immunocytes in the tumor microenvironment or from a response mediated by the body's systemic response to disease [57].

Conclusions

We find some evidence of differences in miRNA serum levels between women who subsequently developed cancer and women who remained clinically cancer-free. The magnitude of these differences is small, and this may limit their clinical application as circulating early-detection markers for breast cancer. This is the first study to use prospectively collected samples; limitations include a relatively short follow-up period and a sample size that is not large enough to fully explore etiologic versus diagnostic relevance. Our study was carried out within a cohort of women who each had a sister with breast cancer, putting the former at about twofold increased risk, and so the differences that we observed may not be generalizable to women without a similar family history.