Background

MicroRNAs (miRNAs) are short 21-23 nt sequences that regulate gene expression post-transcriptionally [1, 2]. Two processes, mRNA destabilization and translational repression, are believed to occur as a result of miRNA-targeted gene regulation [3]. Many miRNA target prediction strategies rely on sequence matches between the miRNA seed region (positions 2-7 from the 5'-end) and well-conserved sites on the 3'-UTR [4, 5]. Identification of several factors contributing to specificity of 3'-UTR target sites has helped improve target prediction methods [6]. However, not all target sites reside on the 3'-UTR; a few reports have shown that 5'-UTR and coding sequence (CDS) sites are functional as well [7–12].

Translation initiation in eukaryotes is postulated to follow the ribosome scanning model [13], possibly constrained by multiple cis-elements on the 5'-UTR such as secondary structure [14], 5'-terminal oligopyrimidine tracts [15] and upstream AUG (uAUG) nucleotides [16]. It is known that uAUGs cause a reduction in translational efficiency, therefore acting as a strong negative regulator of gene expression [13]. Comparative genomic analysis has revealed that uAUGs are conserved in mammalian 5'-UTRs to a greater extent than in other segments of mRNAs, genes harboring them mainly coding for transcription factors [17]. uAUGs may form alternative start sites forming upstream open reading frames (uORF), which are known to reduce efficiency of translation, possibly by translation of the uORF-encoded peptide [18]. It has been noted that a uAUG/uORF can inhibit translation independent of a downstream secondary structure or its position relative to other uAUGs before the main ORF [19, 20].

Unlike the start codon of the main ORF, which in good initiation context is typically identified by the consensus Kozak sequence [21], many of the uAUGs are in sub-optimal context for translation [16]. Some groups have been able to assay for in vitro-translated uORFs [22, 23], which are not, however, readily detectable unless fused to a reporter gene [24, 25]. One study showed that translation repression is not dependent on the encoded peptide sequence [23], which suggests that the peptide action may be non-specific. Further, Kwon et al. demonstrated that addition of a synthetic peptide encoded by a uORF did not alter translation of the protein-coding gene even though the uORF on the 5'-UTR was able to repress translation [24].

Moreover, previous studies have reported that the uAUGs' effect on translation repression is specific to tissue type: though mRNAs containing uAUGs are expressed ubiquitously, the proteins are expressed only in specific tissues [19, 26]. If indeed the translation of uORF limits downstream ORF translation, why does this repression occur only in certain cell-lines and tissues? There appears to be an additional mechanism of translation repression through uAUG other than upstream-encoded peptides.

Earlier, through computational analysis we discovered the presence of genome-wide sites on 5'-UTRs that interact with 3'-ends of miRNAs, a few of which were experimentally validated [12]. In this report we identify a subset of these miRNA interactions specific to the uAUG that occur preferentially through the 3'-end of the mature miRNA sequence. Based on our findings, we hypothesize that miRNAs expressed in one cell type but not in others may account for differences in protein expression in the cell types without changes in mRNA levels. Using miRNA expression data and results from prior work done with the KLF9 gene in HeLa and N2A cells, we demonstrate the validity of our hypothesis. Our results suggest the role of miRNAs in cases where uAUG confers tissue-specific protein expression of the target mRNA.

Results

uAUGs are potential miRNA target sites

An earlier study of excess conservation of uAUGs used a total of 1955 pairwise alignments of human and mouse 5'-UTR sequences [17]. The authors generated the alignments after careful pre-processing steps to remove any coding sequences that may have been mis-annotated as leader sequences. We used this well-curated alignment data to compile sequences containing uAUGs from human 5'-UTRs (see Methods), generating a total of 4009 11-mers centered on uAUG. The number of uAUGs per 5'-UTR ranges from one to 20, with 68% of the 1955 human 5'-UTRs containing at most two (Fig. 1A). Churbanov et al. [17] showed that upstream AUG triplets were more conserved than any other triplet on the 5'-UTR. In order to investigate conservation patterns around the uAUG we looked at the identities of nucleotides in subsequences of the extracted 11-mers. The uAUG sequences appear to be highly conserved between human and mouse UTRs, with all 7-mers having 100% identity and roughly 70% of 11-mers being conserved (Fig. 1B). This indicates that the nucleotides surrounding uAUGs are well conserved between the two mammalian 5'-UTRs.

Figure 1

Number of uAUGs in 5'-UTRs and their conservation. (A) Distribution of uAUGs in human 5'-UTR sequences. (B) Fraction of uAUG-containing n-mer sequences conserved in human and mouse 5'-UTRs.

Mature human miRNA sequences (miRBase, version 11.0) [27] were downloaded and categorized as conserved (471 sequences) or non-conserved (206 sequences) miRNAs (see Methods). To reveal preferential interaction with any portion of the miRNA we split each sequence into its 5'- and 3'-ends, the former containing the seed region. We then looked for sequence matches between miRNA ends and the uAUG-containing sequences generated. This was done in two steps: 1) a thermodynamics-based search using RNAhybrid [28] with a ΔG cutoff ≤ -14 kcal mol⁻¹, followed by 2) a filter step to look for 7 or more consecutive matches with zero or one GU wobbles. To control for spurious hits, the number of interacting pairs was compared to the number obtained after shuffling the mature miRNA sequences and repeating the search procedure.

We observed many predicted interactions between uAUG sequences and the two miRNA ends, characterized by a dependency on conservation of miRNAs. Only conserved miRNAs showed a significant number of interactions, while non-conserved miRNAs were no better than their shuffled cohorts (Fig. 2A and 2B). There were a number of 7-mer Watson-Crick complementary matches between the 5'-ends of conserved miRNAs and uAUG sequences (Fig. 2A). Interestingly, there seemed to be a greater number of such interactions at the 3'-ends (Fig. 2A), which suggests a preference for pairing between uAUGs and 3'-ends. These interactions arose from 46 conserved miRNAs and 263 unique uAUG motifs of length 7 or more (Table 1). Further, when we included at most one GU wobble the only significant result that persisted was the interaction with the 3'-ends of conserved miRNAs (Fig. 2B). Previously, we conducted a genome-wide motif study of 5'-UTRs and 3'-UTRs and observed a strikingly similar propensity for interaction between 5'-UTRs and 3'-ends of miRNAs, a few of which were validated [12]. Another study reported similar observations wherein 5'-UTR and coding regions participate in binding the 3'-end of the highly conserved miRNA let-7 [10]. The preference for interaction with 3'-ends suggests the role of non-seed region matches in the 5'-UTR, while seed-region matches prevail in the 3'-UTR. This may explain why very few known endogenous targets on the 5'-UTR exhibit seed matches [29]. We conducted a brief GO-term investigation into the nature of genes containing the uAUGs listed in Table 1. Out of a total of 1071 genes that contained these uAUGs we were able to retrieve annotations for 678 genes. The majority of these 678 were found to be involved in transcription factor activity (see Additional file 1).

Table 1 MicroRNAs predicted to interact with uAUG-containing motifs
Figure 2

Interaction of miRNAs with uAUG sequences. Each predicted interaction is characterized by a 7-mer consecutive match between the indicated half of mature miRNA (5p and 3p for the 5'- and 3'-end, respectively) and uAUG sequence with ΔG₃₇ ≤ -14 kcal mol⁻¹. Grey bars represent actual counts and white bars represent average number of counts over 1000 repetitions of miRNA shuffling. Error bars represent the standard deviations. Significant outcomes are indicated with the corresponding p-values. (A, B) Number of interactions between uAUG sequences (4009 in total) and conserved and non-conserved miRNAs (471 and 206 in total, respectively) without GU wobbles (A) and with at most one GU wobble (B). (C, D) Number of interactions between conserved miRNAs and uAUG sequences (2935 conserved and 1074 non-conserved) without GU wobbles (C) and with at most one GU wobble (D).

As about 73% of the 11-mers were found to be conserved between human and mouse 5'-UTRs (2935 out of 4009), we investigated whether the interactions with conserved miRNAs were a function of uAUG sequence conservation. Results showed no dependence on uAUG conservation when not allowing GU wobbles (Fig. 2C). However, when allowing at most one GU wobble, only conserved uAUGs exhibited significant interactions with 3'-ends of miRNAs (Fig. 2D).

The above results indicate that uAUGs may participate in highly sequence-specific Watson-Crick base-pairing with miRNAs, particularly towards the 3'-ends. The fact that inclusion of a GU wobble still resulted in a significant number of interactions between the 3'-ends and uAUGs suggests functionality.

Expressed miRNAs may bind endogenous uAUG sites

The analyses that follow are based on experiments with genes that contain uAUGs in their 5'-UTRs, drawing upon sequence data and results from previous experiments that attribute translational repression to the uAUGs. We also used miRNA expression evidence from several sources - these references are consolidated in the form of meta-data (Table 2). We extracted 11-mer sequences containing uAUGs for these genes and looked for interactions with conserved miRNAs using the search strategy outlined above. Based on the observations in Fig. 2A and 2B, we allowed one GU wobble for interactions with the 3'-end and none with the 5'-end. Many of the genes contain multiple uAUGs/uORFs that have different inhibitory effects on translation. We assigned discrete values to these uAUGs that reflect their ability to repress expression of a downstream reporter. These were obtained by comparing the effect of the uAUG either on a control construct or on a construct where the uAUG under consideration is mutated or deleted. The values range from 1× to 6×, where 1× indicates that the uAUG is least repressive or does not show any effect. Sequences that limit the expression of reporter to half or one-third the control or mutant case are assigned a value of 2× or 3× respectively, and so on.
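The scoring scheme described above can be illustrated with a minimal sketch. The function name `repression_score` and the rounding rule are assumptions made for illustration only; the text states that expression reduced to one half or one third of the reference corresponds to 2× or 3×, but does not say how intermediate values were binned.

```python
def repression_score(reference_expr, uaug_expr):
    """Discrete fold-repression value (1, 2, 3, ...) for one uAUG.

    reference_expr: reporter expression of the control construct, or of the
                    construct in which the uAUG is mutated or deleted.
    uaug_expr:      reporter expression with the intact uAUG.

    Assumption: the fold change is rounded to the nearest integer and
    floored at 1, so a uAUG with no effect (or a slight increase) scores 1.
    """
    if uaug_expr <= 0:
        raise ValueError("expression with the uAUG must be positive")
    fold = reference_expr / uaug_expr
    return max(1, round(fold))
```

Under these assumptions, a reporter reduced to half of the reference level scores 2 and one reduced to a third scores 3, matching the 2× and 3× categories in the text.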

Table 2 Genes used in analysis along with references

We not only observed complementary matches with conserved miRNA sequences but also confirmed the presence of many of the predicted miRNAs in cell-lines where repression was observed (Table 3). There also appears to be an association between repressive strength of uAUGs and miRNA target predictions. Two uAUGs that have little or no effect on repression, indicated by '1×' in Table 3, lack miRNA interaction sites. Conversely, uAUGs with strong repressive potential (2×-6×) are complementary to expressed miRNAs, except for the first uAUG in the ADH5/FDH gene and the second uAUG of the KLF13 gene, where expression of the predicted miRNAs has not been detected. In cases where there is more than one uAUG, more than one miRNA may act in a combinatorial manner to produce a net repressive effect. This is in line with observations of interactions between many miRNAs and a single 3'-UTR [30]. These observations suggest that some of the uAUG sequences are miRNA-specific and functional target sites.

Table 3 Genes containing uAUGs predicted to interact with expressed miRNAs

KLF genes are probable 5'-UTR miRNA targets

Kruppel-like factors (KLFs) are transcriptional regulators that contain a characteristic zinc-finger domain and are known to play a role in differentiation and other cellular events [31, 32]. There are as many as 15 members in this family, seven of them containing at least one uAUG. Using the criteria set above we identified 7-mer matches between uAUG-containing sequences and miRNAs in all seven of these genes (Table 4). Two of these, KLF9 and KLF13, also called BTEB1 and RFLAT-1 respectively, are known to be translationally regulated by uAUGs in their 5'-UTRs [19, 26]. The uAUGs in these two genes have been implicated in cell-specific control of protein expression though their respective transcripts are present in many other tissues, suggesting a post-transcriptional mechanism of gene regulation [19, 26].

Table 4 uAUGs from members of the KLF family predicted to interact with conserved miRNAs

Specifically, protein expression of KLF9, whose 5'-UTR contains 10 uAUGs, is limited to brain tissue though its mRNA is expressed ubiquitously [19]. The 5'-UTR, particularly the portion containing uAUGs 6 and 7, suppressed reporter gene translation in HeLa cells but not in mouse neuroblastoma (N2A) cells [19]. This observation was even more intriguing because peptides from the two uORFs starting from uAUG6 and uAUG7 have not been detected [19]. Similarly, though KLF13 mRNA is expressed in multiple tissues, protein expression was only detected in adult spleen and lung tissues [33]. While KLF13 mRNA levels are constant throughout T-cell activation, KLF13 protein is only expressed later on in the activation process [26]. Presence of several uAUGs in its 5'-UTR down-regulated translation of the reporter gene in Jurkat T-cells and, to a lesser degree, in HEK293 cells [26].

We decided to focus our analysis on KLF9 uAUGs since the effects of wild-type and mutant constructs used to elucidate the roles of uAUGs were demonstrated in both cell-lines relevant to tissue specificity. We extracted uAUG 11-mers from the KLF9 5'-UTR sequence used in the experimental study [19] and searched for interactions with both ends of conserved miRNAs. Since the 5'-UTR study for KLF9 was also done in the mouse neuroblastoma (N2A) cell line, we used both mouse and human miRNAs in the analysis. All uAUGs except uAUG5 and uAUG8 interacted with at least one miRNA (Table 5). The ninth uAUG was predicted to interact with as many as five miRNAs. Most of these predicted miRNAs are expressed in HeLa cells but not in N2A cells, including those that match uAUG6 and uAUG7. Out of 26 human miRNAs predicted to interact with uAUGs (Table 5) 16 are reported to be expressed in HeLa cells, whereas out of 18 mouse miRNAs predicted only 5 are reported to be expressed in N2A cells.

Table 5 KLF9 uAUGs predicted to interact with miRNAs in HeLa cells

Regulatory roles of each uAUG/uORF may be studied by mutating one or more of the uAUGs to mitigate repression. In the case of KLF9, mutation of uAUG6, uAUG7 or both relieved translation repression [19]. However, uAUG6 inhibits translation to a greater extent than uAUG7: the translation efficiency of the uAUG6 mutant construct was 5 times that of the wild-type construct, compared to a two-fold increase for the uAUG7 mutant, based on figure seven from Imataka et al. [19]. Interestingly, five human miRNAs are predicted to interact with uAUG6, of which two are expressed in the HeLa cell line and none in N2A cells (Table 5 and Additional file 2). Only one expressed miRNA, hsa-miR-31, is predicted to bind uAUG7. If these two uAUGs are indeed miRNA interaction sites, their mutation should presumably eliminate interactions with the miRNAs predicted in Table 5. To test this assumption, we repeated the analysis using mutated uAUG sequences that had been shown to relieve translational repression. When mutated, uAUGs implicated in mediation of translation repression in KLF9 showed fewer predicted interactions with miRNAs (Table 5, sequences m6 and m7) compared to wild-type sequences. Moreover, there was little evidence for expression of miRNAs matching mutated uAUG sequences.

Discussion

Though uAUGs are known to act in post-transcriptional control of gene expression, there is no clear account of the mechanism involved when differences in activity of uAUGs exist across cell or tissue types. While studying uAUGs and miRNAs independent of one another, researchers observed that uAUGs affect gene expression by reducing protein levels while maintaining mRNA levels, just as with miRNA-mediated gene regulation.

Target sites for miRNAs have conventionally been thought to reside on conserved regions of the 3'-UTR and are predicted to bind the seed-region of a miRNA, while 5'-UTRs are thought to lack them [5, 29]. Using a combination of thermodynamic and sequence-based searches, we found many uAUG sites on the 5'-UTR that are predicted to interact with miRNAs. Interactions with uAUGs were, however, restricted to conserved miRNAs, as we found no significant interactions with non-conserved miRNAs. A likely reason might be that exon sequences, which also harbor uAUGs, are under selective pressure, causing conserved miRNAs to also evolve with them while non-conserved miRNAs are under no such constraint. On a genome-wide scale it was similarly noted that interactions with 5'-UTR sequences came mainly from conserved miRNAs [12]. Though both ends of conserved miRNAs exhibited a significant number of interactions, we found a propensity for 3'-end interactions with uAUGs. These possibly constitute a subset of many such interactions identified earlier that were shown, using miRNAs and genes of interest, to cause repression [12]. Forman et al. have also shown in silico that a well-conserved miRNA, let-7, is predicted to base-pair with the 5'-UTRs through the remainder of the miRNA apart from the seed portion [10]. The signal-to-noise ratio observed in the interaction between uAUG motifs and miRNAs surpassed those in our genome-wide motif study, thereby suggesting the importance of this interaction. Based on this evidence, we hypothesized that the overlap in miRNA and uAUG function may arise from underlying sequence-specific interactions.

Examining many genes where uAUGs have regulatory properties, we demonstrated the connection between uAUG-mediated repression and the likelihood that they serve as binding sites for conserved miRNAs. miRNA expression data support this link, confirming the presence of miRNAs in cell-lines where reporter translation is affected by uAUGs. Further, we predict that many uAUGs in the KLF family of genes are miRNA-binding sites. Two uAUGs in the well-studied KLF9 are proven down-regulators of protein expression, with regulation observed only in HeLa cells. Many miRNAs likely to interact with these two sequences were found to be expressed in the HeLa and not in N2A cells, where regulation was not observed.

As mentioned in a previous study and also demonstrated by the GO-term analysis in our results, many genes that contain uAUGs are transcription factors [17]. Two reports show that several miRNAs and transcription factors in C. elegans and mammals are involved in feedback circuits [34, 35]. Expanding these analyses to include transcription factors containing uAUGs in the 5'-UTRs might reveal more such miRNA-transcription factor regulatory networks.

Several other pieces of evidence point to the possible interaction between miRNAs and uAUGs on the 5'-UTRs. Orom et al. showed that miR-10a binds sequences downstream of a 5'-oligopyrimidine tract (5'-TOP) on RPS16, a gene encoding a ribosomal protein, to regulate its translation [9]. This exact binding site on the 5'-UTR was thought to be responsible for conferring cell-specific translational regulation [15]. Taken together with these findings, our results suggest that miRNAs can also interact with uAUG sequences and confer tissue specificity. This would constitute a unifying mechanism of translation repression for miRNAs and uAUGs. We specifically propose that the interaction of miRNAs with uAUGs may impede the progress of the scanning 40S ribosome subunit. Through reporter gene experiments we have shown that miR-34a can induce translation repression by binding to the 5'-UTR of its predicted target (AXIN2) in the absence of the 3'-UTR [12]. We also noted that repression was much higher when both UTRs were present. Based on these results we envision two plausible scenarios: a) repression caused by binding of the 3'-end of miRNAs to uAUGs in the 5'-UTR, independent of a separate miRNA molecule that may bind to the 3'-UTR, or b) synergistic repression by both the seed region and 3'-end of the miRNA due to simultaneous action on the 3'-UTR and 5'-UTR, respectively. Interestingly, primer extension (toeprint) analysis reveals the presence of a 40S ribosomal subunit at the start codon on miRNA-repressed mRNAs [36]. The same technique also reveals stalling of ribosomes in the vicinity of uAUGs [24, 25, 37]. Furthermore, Ago2, a member of the Argonaute family of proteins [38, 39] and a component of the functional micro-ribonucleoprotein (miRNP) complex, was found to co-sediment with 40S-containing complexes [36]. These facts indicate that miRNAs associated with miRNPs may recognize uAUG sequences as target sites and prevent translation.

Conclusions

In this manuscript we present observations that suggest a miRNA role in translational control by uAUG cis-elements on the 5'-UTR. Specifically, we identified many interactions between uAUG sequences and conserved miRNAs which suggest a sequence-specific binding mechanism between these post-transcriptional regulatory factors. We also presented evidence to show that miRNAs possibly bind to uAUGs that inhibit translation of downstream reporters in cells where the miRNAs are expressed, thus explaining differential control. This expands the range of probable miRNA targets to include many endogenous sites on the 5'-UTR.

Current knowledge has led us to think of miRNAs and uAUGs as distinct regulatory mechanisms. While distinct functions of miRNAs or uAUGs are found in other contexts, our study unifies them as a single translational repression phenomenon whereby uAUGs act as miRNA target sites and translation is hindered.

Methods

uAUG sequences

Pairwise alignments between 5'-UTRs of human and mouse cDNAs were downloaded from the ftp site listed in Churbanov et al. [17]. From each alignment we extracted uAUG 11-mer sequences from the human 5'-UTR beginning at position -4 and ending at position +7, with the 'A' being designated as +1 (e.g. NNNNAUGNNNN, where N is any nucleotide). Sequences of length 7 to 10 nt (e.g. AUGNNNN, NNNNAUGN, etc.) were considered when the uAUG appeared towards the beginning or end of an alignment. Only uAUG sequences sharing 100% identity with the mouse homolog were categorized as conserved, while others were considered non-conserved uAUGs. Experimentally characterized uAUG sequences in Table 3 were obtained from the references listed in Table 2. For the KLF family of genes in Table 4, uAUG sequences were extracted from the 5'-UTR portions of the full RefSeq mRNA.
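The window extraction described above can be sketched as follows. This is an illustrative reading of the procedure, not the code used in the study: the function names are hypothetical, and the conservation check assumes the two alignment rows have equal length and are gap-free within the window.

```python
def to_rna(seq):
    """Upper-case a sequence and write it in the RNA alphabet."""
    return seq.upper().replace("T", "U")


def extract_uaug_windows(utr, flank=4, min_len=7):
    """Return (AUG start index, window) pairs for every AUG in a 5'-UTR.

    Each window spans `flank` nt on either side of the AUG (11 nt by
    default, i.e. positions -4 to +7 with the A as +1). Windows are
    truncated when the AUG lies near an end of the sequence and are kept
    only if at least `min_len` nt remain.
    """
    utr = to_rna(utr)
    windows = []
    start = utr.find("AUG")
    while start != -1:
        left = max(0, start - flank)
        right = min(len(utr), start + 3 + flank)
        if right - left >= min_len:
            windows.append((start, utr[left:right]))
        start = utr.find("AUG", start + 1)
    return windows


def is_conserved(human_row, mouse_row, start, flank=4):
    """True if the human window is 100% identical to the aligned mouse columns.

    Assumption: both rows come from the same pairwise alignment and contain
    no gaps inside the window.
    """
    left = max(0, start - flank)
    right = min(len(human_row), start + 3 + flank)
    return to_rna(human_row[left:right]) == to_rna(mouse_row[left:right])
```

For a toy sequence such as GGCCAUGAAAUUUAUGC, the first AUG yields a full 11-mer, while the second, lying one nucleotide from the 3' end, yields a truncated 8-mer.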

MicroRNA sequences

For the motif analysis, mature miRNA sequences were downloaded from miRBase (version 11.0) [27]. miRNAs present in at least one other species (e.g. hsa-let-7d and mmu-let-7d), irrespective of conservation at the nucleotide level, were categorized as conserved miRNAs (471 in total) and others as non-conserved miRNAs (206 in total). miRNAs were then split into their 5'- and 3'-halves to check for any preferential interaction with one end or the other.
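A minimal sketch of the two operations described above is given below. The name-based conservation rule follows the text (the same miRNA name appearing under another species prefix), but the function names are hypothetical and the exact split point between the two halves is an assumption: here the 5'-half receives the extra nucleotide when the length is odd.

```python
def split_halves(mirna):
    """Split a mature miRNA (5'->3') into its 5'- and 3'-halves.

    Assumption: the split is at the midpoint, with the 5'-half taking the
    extra nucleotide for odd-length sequences.
    """
    mid = (len(mirna) + 1) // 2
    return mirna[:mid], mirna[mid:]


def conserved_names(names, species="hsa"):
    """Names of `species` miRNAs that also occur in at least one other species.

    Conservation is judged by name only (e.g. hsa-let-7d and mmu-let-7d),
    irrespective of nucleotide-level identity.
    """
    species_by_family = {}
    for name in names:
        prefix, _, family = name.partition("-")
        species_by_family.setdefault(family, set()).add(prefix)
    return {
        f"{species}-{family}"
        for family, prefixes in species_by_family.items()
        if species in prefixes and len(prefixes) > 1
    }
```

With the toy list hsa-let-7d, mmu-let-7d and a hypothetical human-only entry, only hsa-let-7d would be classed as conserved.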

Sequence complementarity search

A two-step strategy was employed in looking for matches between uAUG 11-mers and miRNA sequences. First, the thermodynamic search program RNAhybrid [28] was used with the -e option (ΔG) set to ≤ -14 kcal mol⁻¹. Next, hits with at least seven consecutive nucleotide matches, allowing zero or one GU wobbles, were selected.
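The second (filter) step can be illustrated with the sketch below. It does not reproduce the RNAhybrid energy calculation; it only checks, over all ungapped antiparallel alignments, for a run of consecutive base pairs with a bounded number of GU wobbles. The function names and the ungapped-alignment simplification are assumptions made for illustration.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def _pair_state(a, b):
    """Classify a base pair as 'wc', 'gu' or None (mismatch)."""
    if (a, b) in WATSON_CRICK:
        return "wc"
    if (a, b) in WOBBLE:
        return "gu"
    return None


def _longest_run(states, max_wobbles):
    """Longest stretch without mismatches and with at most max_wobbles GU pairs."""
    best = left = wobbles = 0
    for right, state in enumerate(states):
        if state is None:          # a mismatch ends the current run
            left, wobbles = right + 1, 0
            continue
        if state == "gu":
            wobbles += 1
        while wobbles > max_wobbles:   # shrink from the left to drop a wobble
            if states[left] == "gu":
                wobbles -= 1
            left += 1
        best = max(best, right - left + 1)
    return best


def longest_paired_run(target, mirna, max_wobbles=0):
    """Best consecutive pairing between a target and a miRNA end (both 5'->3')."""
    rev = mirna[::-1]  # read the miRNA 3'->5' so it lines up antiparallel
    best = 0
    for shift in range(-(len(rev) - 1), len(target)):
        states = [
            _pair_state(target[i + shift], rev[i])
            for i in range(len(rev))
            if 0 <= i + shift < len(target)
        ]
        best = max(best, _longest_run(states, max_wobbles))
    return best


def passes_filter(target, mirna, min_run=7, max_wobbles=0):
    """True if the pair shows at least min_run consecutive matches."""
    return longest_paired_run(target, mirna, max_wobbles) >= min_run
```

In the analysis described here, such a filter would be applied only to pairs that already met the RNAhybrid energy cutoff, with `max_wobbles` set to 0 or 1 depending on the comparison.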

Shuffling procedure and significance testing

miRNA sequences were shuffled such that the nucleotide composition of each sequence was kept intact. The search strategy above was repeated over 1000 shuffling iterations and the average number of interactions was calculated. The resulting distribution of the number of interactions was assumed to be normal and significance was calculated using a Z-test.
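The shuffling control and Z-test can be sketched as below. This is a hedged illustration rather than the original implementation: `count_hits` stands for any function that returns the number of predicted interactions for a list of miRNA sequences, the function names are hypothetical, and the one-sided (upper-tail) p-value is an assumption, since the text does not state whether the test was one- or two-sided.

```python
import math
import random
import statistics


def shuffle_sequence(seq, rng):
    """Permute the nucleotides of one sequence, preserving its composition."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


def shuffle_z_test(observed, mirnas, count_hits, n_iter=1000, seed=0):
    """Compare an observed hit count with counts from shuffled miRNAs.

    Returns (mean, sd, z, p), where mean and sd describe the shuffled
    counts and p is the upper-tail probability under a normal
    approximation (assumed one-sided).
    """
    rng = random.Random(seed)
    null_counts = [
        count_hits([shuffle_sequence(m, rng) for m in mirnas])
        for _ in range(n_iter)
    ]
    mean = statistics.fmean(null_counts)
    sd = statistics.stdev(null_counts)
    if sd == 0:
        # Degenerate null distribution: no spread to standardize against.
        z = math.inf if observed > mean else 0.0
    else:
        z = (observed - mean) / sd
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return mean, sd, z, p
```

The mean and standard deviation returned here correspond to the white bars and error bars described for Figure 2, under the stated assumptions.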

GO-term analysis

We used the BiNGO plugin for Cytoscape [40] to determine the molecular functions in H. sapiens that are over-represented in the set of genes that contain uAUGs from Table 1. We filtered out automatic annotations (evidence code: IEA) before beginning the analysis and used the default settings for all other options provided by the software package.

miRNA expression

For miRNAs from Landgraf et al.'s study [41], we used their web visualization tool to assess the presence or absence of miRNAs in a given cell-line. For data from Chen et al.'s study [42], we used a p-value cutoff of 0.01 to report the miRNA as expressed. We obtained expression evidence for miRNAs of interest in N2A cells from Hohjoh et al.'s [43] study through personal communication. Expression data from Lawrie et al.'s [44] and Takada et al.'s [45] studies were obtained directly from the manuscripts and supplementary information.