LncRNAs - a new layer of genome regulatory information

It is now well appreciated that less than two percent of the human genome codes for proteins and the majority of the genome gives rise to non-protein-coding RNAs (ncRNAs) [1], which are predicted to play essential roles in a variety of biological processes [2, 3].

The focus of this review is long ncRNAs (known as lncRNAs), which constitute the biggest class of ncRNAs with approximately 10,000 lncRNA genes so far annotated in humans [4]. lncRNAs are RNA polymerase II (RNAPII) transcripts that lack an open reading frame and are longer than 200 nucleotides. This size cut-off distinguishes lncRNAs from small RNAs such as microRNAs, piwi-interacting RNAs (piRNAs), small nucleolar RNAs (snoRNAs) and small interfering RNAs (siRNAs) and arises from RNA preparation methods that capture RNA molecules above this size. Although the function of most lncRNAs is unknown, the number of characterized lncRNAs is growing and many publications suggest they play roles in negatively or positively regulating gene expression in development, differentiation and human disease [2, 5-10]. lncRNAs may regulate protein-coding (pc) gene expression at both the posttranscriptional and transcriptional level. Posttranscriptional regulation could occur by lncRNAs acting as competing endogenous RNAs to regulate microRNA levels, by modulating mRNA stability and translation through homologous base pairing or, as in the example of NEAT1, by nuclear retention of mRNAs [11]. In this review we focus on the regulation at the transcriptional level.

Modes of transcriptional regulation by lncRNAs

Regulation of transcription is considered to be an interplay of tissue and developmental-specific transcription factors (TFs) and chromatin modifying factors acting on enhancer and promoter sequences to facilitate the assembly of the transcription machinery at gene promoters. With a growing number of lncRNAs implicated in transcriptional gene regulation, this view may need refinement to include networks of tissue and developmental-stage specific lncRNAs that complement known regulators to tightly control gene expression and thereby organism complexity [12, 13]. Transcriptional regulation by lncRNAs could work either in cis or in trans, and could negatively or positively control pc gene expression. lncRNAs work in cis when their effects are restricted to the chromosome from which they are transcribed, and work in trans when they affect genes on other chromosomes.
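
The cis/trans distinction drawn above can be written down as a simple rule. The following is a minimal sketch, not an established classification tool: the `Locus` type and the `regulation_mode` function are hypothetical names introduced here, and the chromosome labels in the usage note are the human assignments, included only for illustration. Note that the sketch compares chromosome names only, so it cannot capture the stricter requirement that a cis-acting lncRNA affects the same chromosome copy from which it is transcribed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    """A named locus and the chromosome it lies on (hypothetical helper type)."""
    name: str
    chromosome: str


def regulation_mode(lncrna: Locus, targets: list[Locus]) -> str:
    """Return 'cis' if every target lies on the lncRNA's own chromosome, else 'trans'.

    Simplification: only chromosome names are compared, not alleles.
    """
    if not targets:
        raise ValueError("at least one target locus is required")
    if all(t.chromosome == lncrna.chromosome for t in targets):
        return "cis"
    return "trans"
```

For example, `regulation_mode(Locus("HOTAIR", "chr12"), [Locus("HOXD", "chr2")])` returns `"trans"`, whereas `regulation_mode(Locus("HOTTIP", "chr7"), [Locus("HOXA", "chr7")])` returns `"cis"`.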

Regulation in trans

Some significant examples of lncRNAs that act in trans are those that can influence the general transcriptional output of a cell by directly affecting RNAPII activity (Figure 1a,b). One example is the 331 nucleotide 7SK lncRNA, which represses transcription elongation by preventing the P-TEFb transcription factor from phosphorylating the RNAPII carboxy-terminal domain (CTD) [14] (Figure 1a). Another example is the 178 nucleotide B2 lncRNA, a general repressor of RNAPII activity upon heat shock [15]. The B2 lncRNA acts by binding RNAPII and inhibiting phosphorylation of its CTD by TFIIH, thus disturbing the ability of RNAPII to bind DNA [16, 17].

Figure 1

Long non-protein-coding RNAs (lncRNAs) act at different levels to regulate protein coding gene expression. lncRNAs can inhibit general protein-coding (pc) gene expression in trans (a) by preventing transcription factor (TF) activity (7SK lncRNA) or (b) by inhibiting RNAPII binding to DNA (B2 lncRNA). Xist lncRNA is transcribed from the X inactivation center (XIC) and inactivates a whole chromosome in cis (c) by recruiting epigenetic modifiers (EM). lncRNAs can regulate specific genes, acting in trans like HOTAIR (d) or in cis like HOTTIP (e) by directly recruiting epigenetic modifiers to certain genomic loci. In both cases the lncRNA binds EMs via a specific sequence or structure and targets them to promoter regions via DNA/RNA interaction elements to affect expression of the respective pc gene. Transcription of a lncRNA through a pc gene promoter or a cis-regulatory element (RE) affects pc gene expression in cis independent of the lncRNA product (f) by mechanisms discussed in the text. Both DNA strands are shown as separate boxes to indicate lncRNA transcription over the pc gene promoter in the antisense orientation. For details see text.

Regulation in trans can also act locus-specifically. While the ability of lncRNAs to act locus-specifically to regulate a set of genes was first demonstrated for imprinted genes where lncRNA expression was shown to silence from one to ten flanking genes in cis [18-20], lncRNAs that lie outside imprinted gene clusters, such as the HOTAIR lncRNA, were later found also to have locus-specific action. HOTAIR is expressed from the HOXC cluster and was shown to repress transcription in trans across 40 kb of the HOXD cluster [21]. HOTAIR interacts with Polycomb repressive complex 2 (PRC2) and is required for repressive histone H3 lysine-27 trimethylation (H3K27me3) of the HOXD cluster. Targeting of epigenetic modifiers (EMs) by lncRNAs provided a much sought-after model to explain how EMs gain locus specificity (Figure 1d), and has since been suggested as a general mechanism for trans-acting lncRNAs [22, 23].

Regulation in cis

In contrast to trans-acting lncRNAs, which act via their RNA product, cis-acting lncRNAs have the possibility to act in two fundamentally different modes. The first mode depends on a lncRNA product. The major example of general cis-regulation is induction of X inactivation by the Xist lncRNA in female mammals. Xist is expressed from one of the two X chromosomes and induces silencing of the whole chromosome [24] (Figure 1c). As an example of locus-specific regulation it has been proposed that enhancer RNAs activate corresponding genes in cis via their product [25]. A well-studied cis-acting lncRNA acting through its product is the human HOTTIP lncRNA that is expressed in the HOXA cluster and activates transcription of flanking genes. HOTTIP was shown to act by binding WDR5 in the MLL histone modifier complex, thereby bringing histone H3 lysine-4 trimethylation (H3K4me3) to promoters of the flanking genes [26]. Such a mechanism in which a nascent lncRNA transcript binds and delivers epigenetic modifiers to its target genes while still attached to the elongating RNAPII is generally termed ‘tethering’ and is often used to explain cis-regulation by lncRNAs [23, 27] (Figure 1e). It was also proposed to act in plants. In Arabidopsis thaliana, the COLDAIR lncRNA is initiated from an intron of the FLC pc gene and silences it by targeting repressive chromatin marks to the locus to control flowering time [28].

In contrast, the second mode of cis regulation by lncRNAs involves the process of transcription itself, which is a priori cis-acting (Figure 1f). Several lines of evidence suggest that the mere process of lncRNA transcription can affect gene expression if RNAPII traverses a regulatory element or changes general chromatin organization of the locus. In this review we discuss this underestimated role for lncRNA transcription in inducing protein-coding gene silencing or activation in cis, and overview possible mechanisms for this action in mammalian and non-mammalian organisms. Finally, we describe experimental strategies to distinguish lncRNAs acting as a transcript from those acting through transcription.

Mechanisms by which lncRNA transcription silences gene expression

Transcription-mediated silencing, also referred to as ‘transcriptional interference’ (TI), is defined here as a case in which the act of transcription of one gene can repress in cis the functional transcription of another gene [29, 30]. TI has been reported in unicellular and multicellular organisms [30]. Mechanistic details are still largely unclear, but TI could theoretically act at several stages in transcription: by influencing enhancer or promoter activity or by blocking RNAPII elongation, splicing or polyadenylation. All that would be required is that an RNAPII initiated from an 'interfering' promoter traverses a 'sensitive' DNA regulatory sequence. TI has mainly been reported at overlapped promoters [31-35], but there are also examples where TI acts downstream of the promoter. In mouse, overlapping transcription controls polyadenylation choice of two imprinted genes [36, 37]. In Saccharomyces cerevisiae, collisions between elongating antisense RNAPIIs can lead to stalling of both polymerases that is resolved by ubiquitylation-directed proteolysis, and this has been proposed to be a regulatory mechanism [38]. However, it is unknown if RNAPII collisions occur sufficiently frequently in vivo in yeast or other organisms to offer a means of regulating convergent genes, or if this mechanism could lead to an interfering RNAPII eliminating its sensitive collision partner. Despite these examples, the most common reports of TI concern an overlapped promoter, and in the following sections we describe studies investigating the molecular mechanisms underlying interference at the promoter.

Transcriptional interference acting by promoter nucleosome repositioning

DNA in the nucleus is organized into chromatin with the organizational scaffold consisting of nucleosomes, each with two copies of H3, H4, H2A and H2B histones [39]. Nucleosomes can be densely packed, interfering with protein-DNA interactions, or relaxed, facilitating these interactions [40]. The transcription process, which generates single-stranded DNA as RNAPII progresses along a gene locus, can directly affect nucleosome positioning [41-43] (reviewed in [44, 45]). Thus, lncRNA transcription could cause TI by depositing nucleosomes in a manner unfavorable for TF binding on promoters or enhancers. An example of this mechanism is the silencing of the yeast SER3 pc gene by transcriptional overlap by the SRG1 lncRNA (Figure 2a) [46]. SRG1 transcription increases nucleosome density at the overlapped SER3 promoter. Deletion of three transcription elongation factors that are associated with the elongating polymerase and are necessary for nucleosome repositioning (SPT16, SPT6, SPT2) [47-49] abolished the silencing effect without stopping transcription of the overlapping lncRNA SRG1 [50, 51], indicating the necessity of chromatin reorganization for silencing. In contrast, deletion of epigenetic modifiers (such as SET1/2 histone methyltransferases and SET3C/RPD3S deacetylases described later) did not affect silencing, showing that nucleosome positioning, but not changes in histone modifications, is responsible for repression. The experiments did not directly exclude a role for the SRG1 lncRNA product, but the silencing can be explained solely by the process of transcription [44, 45]. TI by nucleosome repositioning may be a general mechanism in yeast, as the RNAPII elongation and chromatin organization factors responsible for SER3 silencing are also known to be involved in the suppression of transcription initiation from cryptic promoters within the body of actively transcribed genes [52, 53].
Since genes controlling RNAPII elongation and chromatin organization are largely conserved, it is possible that lncRNAs could use similar nucleosome repositioning silencing in mammals. This is supported by the example that chromatin reassembly factors are necessary for silencing an HIV provirus when integrated into an actively transcribed host gene in a human cell system [54].

Figure 2

Transcription interference-mediated silencing by chromatin changes. (a) Top: in yeast the absence of SRG1 lncRNA allows transcription machinery assembly at the SER3 protein coding gene promoter. Bottom: SRG1 lncRNA transcription causes dense nucleosome packing over the downstream SER3 pc gene promoter that blocks TF binding and pc gene expression. (b) Top: in yeast the absence of IRT1 lncRNA allows IME1 pc gene expression. Bottom: RNAPII transcribing the IRT1 lncRNA carries EMs that deposit repressive histone modifications at the IME1 promoter (EM1 - methyltransferases). These modifications allow the binding of other EMs that remove active histone modifications (EM2 - deacetylases) and cause a repressive chromatin environment that blocks TF binding leading to silencing. (c) Top: in a healthy human, LUC7L and HBA2 pc genes do not overlap and are both expressed. Bottom: a chromosomal deletion of the LUC7L transcriptional stop signal (red ‘stop’ box) causes transcription of the LUC7L pc gene through the promoter of the HBA2 pc gene. By an unknown mechanism this aberrant transcription causes DNA methylation and silencing of the HBA2 promoter. For details see Figure 1g and text.

Transcriptional interference acting by promoter histone modifications

Promoter-associated nucleosomes carry post-translational histone tail modifications that reflect the activity state of the promoter and also influence accessibility of DNA binding factors involved in transcription [55]. Active gene promoters correlate with H3 and H4 acetylation and with H3K4me3, while inactive promoters do not and, in mammals, they also gain repressive histone marks such as H3K9me3 or H3K27me3. Some histone modifying enzymes have been shown to bind and travel with elongating RNAPII [56, 57], so it is possible that lncRNA transcription can induce TI by affecting histone modifications at the promoter of an overlapped target gene. For example, in yeast the SET1/2 methyltransferases, which induce H3K4me2 and H3K36me3 in the body of transcribed genes, bind and travel with elongating RNAPII [58-60]. These modifications in turn recruit the SET3C/RPD3S histone deacetylase complexes to create a chromatin environment repressive for transcription initiation [61-63].

Two studies indicate that this is a mechanism used by lncRNAs to induce TI in yeast. In the first study the IME1 pc gene, which induces gametogenesis in diploid S. cerevisiae cells but is repressed in haploid cells, was shown to be silenced by the IRT1 lncRNA that overlaps its promoter [64]. Genetic experiments repositioning the IRT1 lncRNA distant from IME1 on the same chromosome showed that IRT1 transcriptional overlap of the IME1 promoter is necessary for silencing. Interestingly, the instability of the IRT1 lncRNA product and its non-specific cellular localization indicated the lncRNA product is unlikely to play a role in the silencing mechanism. Instead, IRT1 lncRNA transcription through the IME1 promoter reduced recruitment of the essential POG1 transcription factor, increased nucleosome density and induced the SET1/2 mediated cascades of histone modifications, which were shown to be necessary for silencing [64] (Figure 2b). In the second study lncRNA transcription was shown to be causative for silencing of the GAL1 and GAL10 genes, involved in galactose metabolism in S. cerevisiae. GAL10 and GAL1 are divergently transcribed from a bidirectional promoter. The 4 kb lncRNA, called GAL10-ncRNA, initiates in the body of the GAL10 gene, and is transcribed through the GAL10/GAL1 promoter antisense to the GAL10 gene. GAL10-ncRNA transcription induces SET2-mediated establishment of H3K36me3 along its gene body, thereby recruiting RPD3S-dependent deacetylation that resulted in reduced transcription factor binding and repression of the GAL1/GAL10 promoter [65]. Both SET3C and RPD3S are proposed to have a general role in repressing cryptic promoters within gene bodies [61, 66] and a genome-wide study implied a role for SET3C in overlapping lncRNA-mediated silencing of a set of pc genes in yeast [66]. This indicates that the mechanism described above might be widely used to control gene expression in yeast. 
Although similar studies have not been described for the mammalian genome, H3K36me3 marks the body of transcribed genes in mammals, raising the possibility that such TI mechanisms could be conserved [56, 57].

Transcriptional interference acting by promoter DNA methylation

In mammalian genomes DNA methylation is generally associated with silent CpG island promoters, but the majority of CpG island promoters remain methylation free independent of their expression status [67-69]. The process of de novo methylation depends on the DNMT3A/3B methyltransferases and the catalytically inactive DNMT3L homologue and requires histones lacking H3K4me3, ensuring that active promoters remain methylation-free [70]. Notably, while DNA methylation at the promoter blocks transcription initiation, methylation in the gene body does not. Two important examples in humans based on genetic analyses indicate that DNA methylation can be involved in TI-induced silencing, although the causality between DNA methylation and silencing is still a matter of discussion [67]. One study of a patient with inherited α-thalassemia identified a deletion of the LUC7L 3' end that allowed aberrant transcription of LUC7L through the downstream HBA2 gene, causing its silencing and the disease phenotype [71] (Figure 2c). Mouse models that mimicked the deleted genomic locus showed that the main cause of silencing was the acquisition of DNA methylation at the HBA2 promoter. Notably, DNA methylation acquisition was not simply the consequence of an inactive promoter, as removal of HBA2 transcription by deleting its TATA box did not induce methylation. The sequence of the LUC7L gene and thus the aberrant RNA product was also not essential for HBA2 silencing, as replacing the LUC7L gene body with another protein-coding gene did not remove the repressive effect. In a second example, a subset of Lynch syndrome patients display DNA methylation and inactivation of the mismatch repair MSH2 gene that correlates with aberrant transcription from the flanking EPCAM gene that carries a 3' deletion [72].

In both these examples, the molecular details of methylation establishment and the mechanism by which the methylation machinery targets the overlapped promoter are as yet unknown. However, the data so far show that it is a cis-acting mechanism as only the allele carrying the deletion silences the overlapped protein-coding gene. In addition, although a role for the aberrant RNA product was not excluded, it appears unlikely that mutation-induced transcription of two independent intergenic chromosomal regions in the described diseases produces lncRNA products with similar repressive functions. Interestingly, the silencing of imprinted pc genes by lncRNAs is also often correlated with the gain of DNA methylation on the silent pc gene promoter [73]. In the case of the Igf2r gene, this DNA methylation mark is not necessary for initiation or maintenance of the silent state but seems to play a role in reinforcing the silent state [35, 74].

Transcriptional interference in the absence of chromatin changes at the silenced promoter

In addition to RNAPII acting as a carrier of chromatin modifying enzymes, other TI models predict that RNAPII from one promoter traversing across another promoter can interfere with its activity without introducing chromatin changes [30, 75, 76]. An indication that such a mechanism can be used by lncRNAs in mammals comes from a study that used a genetic approach to dissect the silencing function of the imprinted mouse Airn lncRNA [77, 78]. Airn is an inefficiently spliced 118 kb lncRNA expressed on paternally inherited chromosomes that overlaps and silences the promoter of the Igf2r pc gene - a dose-sensitive and essential embryonic growth suppressor [18, 79] (Figure 3a). To determine if Airn transcription or its lncRNA product were required for silencing, homologous recombination in embryonic stem cells was used to shorten the length of Airn, either before or after the Igf2r promoter, by insertion of a polyadenylation cassette [35]. Notably, only shortened Airn variants that traversed the Igf2r promoter induced silencing. Furthermore, while Igf2r silencing is normally accompanied by DNA methylation, repressive histone marks and chromatin compaction of the silent Igf2r promoter [80, 81], Igf2r silencing was not dependent on DNA methylation - in contrast to the silencing of HBA2 by aberrant LUC7L transcription described above. Instead, Airn transcriptional overlap interfered with the accumulation of functional RNAPII on the Igf2r promoter in the presence of open chromatin [35]. Additional support for Igf2r silencing by Airn transcriptional interference is provided by genetic experiments that used an inducible Airn promoter to silence Igf2r at different stages of embryonic stem cell differentiation [74]. 
The demonstration that Airn transcription is continuously required for Igf2r silencing and that its silencing efficiency decreases when the Igf2r promoter is strongly expressed provides support for a model whereby RNAPII initiated from an 'interfering' promoter interferes with transcription initiation from a 'sensitive' promoter.
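
The logic of the truncation experiment can be made explicit with a small interval model. The sketch below is an illustration under stated assumptions rather than an analysis used in the cited studies: under a pure transcription-based model, silencing is predicted only when the interfering transcript traverses the sensitive promoter. All function names are hypothetical, the promoter position and truncation sites are arbitrary illustrative coordinates (only the 118 kb full length is taken from the text), and both elements are placed on one coordinate axis measured from the interfering promoter.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Half-open interval [start, end) in base pairs along the locus."""
    start: int
    end: int


def transcribed_region(tss: int, polya_site: int) -> Interval:
    """Region covered by RNAPII between initiation and the polyadenylation site."""
    if polya_site <= tss:
        raise ValueError("polyadenylation site must lie downstream of the TSS")
    return Interval(tss, polya_site)


def traverses(transcript: Interval, element: Interval) -> bool:
    """True if the transcribed region covers the whole regulatory element."""
    return transcript.start <= element.start and element.end <= transcript.end


def predicted_silencing(tss: int, polya_site: int, sensitive_promoter: Interval) -> bool:
    """Prediction of the transcription-only model: silencing requires traversal."""
    return traverses(transcribed_region(tss, polya_site), sensitive_promoter)


# Hypothetical layout: the interfering promoter is at position 0 and the
# sensitive promoter lies 50 kb downstream (arbitrary value for illustration).
SENSITIVE_PROMOTER = Interval(50_000, 51_000)

full_length = predicted_silencing(0, 118_000, SENSITIVE_PROMOTER)     # True
stopped_before = predicted_silencing(0, 10_000, SENSITIVE_PROMOTER)   # False
stopped_after = predicted_silencing(0, 60_000, SENSITIVE_PROMOTER)    # True
```

In this model a transcript truncated after the sensitive promoter still silences, whereas one truncated before it does not, which is the pattern reported for the shortened Airn variants.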

Figure 3

Transcription interference-mediated silencing without chromatin changes. (a) Top: a wild-type maternal allele does not express Airn lncRNA as its promoter is repressed by a DNA methylation imprint, thus allowing the Igf2r gene to be active. Middle: on the wild-type paternal allele Airn transcription overlaps with and silences the Igf2r pc gene promoter, independent of the Airn lncRNA product. The silent Igf2r promoter is marked by increased nucleosome density and DNA methylation in the absence of active histone modifications. Bottom: increased nucleosome density, loss of active histone marks and DNA methylation are not necessary for Igf2r repression as demonstrated by the FAP allele that moved the Airn promoter close to the Igf2r promoter and silenced Igf2r in the absence of repressive chromatin features. (b) Top: a hypothetical enhancer activates a pc gene by direct long-range DNA interactions. Bottom: transcription of a lncRNA overlapping the enhancer interferes with the DNA interaction and thereby silences the pc gene. For details see Figure 1g, Figure 2d and text.

To date, other examples of lncRNAs acting by this mechanism in mammals are lacking. It has been suggested that silencing of an alternative promoter of the mouse Fpgs pc gene is an example of transcription inducing silencing without introducing chromatin changes [82], but this system has not been subjected to a similar genetic analysis and alternative explanations remain possible. How RNAPII from an interfering promoter is able to suppress functional transcription of the overlapped promoter remains to be determined, but stalling of the interfering RNAPII elongating over the sensitive promoter has been suggested to block access of essential TFs [30, 83]. This mechanism should not be confused with the phenomenon of genome-wide RNAPII pausing at promoters, which represents an intermediate step between RNAPII initiation and elongation phases and might be a common mechanism regulating differential gene expression in metazoans [84, 85].

The above examples describe repressive effects from RNAPII transcribing lncRNAs through promoters of silenced genes. However, transcriptional interference might also disrupt enhancer function when RNAPII traverses an enhancer, and this is an attractive model to explain the repression of a cluster of genes by a lncRNA in a tissue-specific manner [75] (Figure 3b). This situation arises in two imprinted gene clusters where the Airn and Kcnq1ot1 lncRNAs each overlap one gene, but silence multiple genes in cis in a tissue-specific manner. The repressive histone methyltransferase EHMT2 has been shown to be necessary in the placenta to silence one of the three genes controlled by Airn [86]. The Kcnq1ot1 lncRNA has been shown to silence multiple genes in placental cells by the action of repressive POLYCOMB histone modifying enzymes [87, 88]. In both cases, a direct role for the lncRNA in targeting the histone modifying complexes was proposed, based on the findings that the lncRNAs interact with the respective histone modifying complex. This correlation-based evidence is, however, not sufficient to rule out the possibility that both lncRNAs silence distant genes by transcription alone (reviewed in [75, 76]). In support of a transcription-based model, it was shown that Kcnq1ot1 silences at least one gene by regulating chromatin flexibility and access to enhancers [89]. This is consistent with a two-step model whereby lncRNA transcription initiates silencing of non-overlapped genes by enhancer interference, then repressive histone modifying enzymes maintain that silencing.

lncRNA transcription creating a permissive chromatin environment

Enhancers are genetic elements that bind transcription factors facilitating transcription machinery assembly at nearby promoters [90, 91]. RNAPII transcripts up to 2 kb long are transcribed bi-directionally from some neuronal enhancers (termed enhancer or eRNAs) [91, 92]. Transcription of eRNAs positively correlated with expression of nearby mRNAs and a model was proposed, but not yet experimentally tested, in which their transcription establishes a chromatin landscape that supports enhancer function (Figure 4a). lncRNA transcription, either by opening chromatin or inhibiting repressor protein binding, could similarly result in gene or locus activation. One example of this is the process of V(D)J recombination, which joins elements of the V, D and J multigene family by chromosomal rearrangements to create functional B cell immunoglobulins and T cell receptors [93] (Figure 4b). The V, D and J genes lie next to each other on the same chromosome and antisense intergenic transcription through these genes is detected prior to the recombination process [94]. Genetic experiments have shown that intergenic lncRNA transcription is required for both B and T cell V(D)J recombination [95, 96]. Similar correlations between intergenic transcription and gene expression were observed for the mouse β-globin locus [97] where promoter deletion experiments showed that lncRNA transcription was responsible for stable, active and hyper-accessible chromatin [98].

Figure 4

Transcription of lncRNA creates a permissive chromatin environment. (a) Top left: an inactive enhancer with closed chromatin cannot activate the pc gene. Top right, bottom left: transcription of the enhancer opens chromatin. Bottom right: open chromatin at the enhancer allows TF binding and interaction with and activation of the pc gene promoter. (b) V(D)J recombination. From top to bottom: 1, D and J segments are joined and the V region has closed chromatin; 2, antisense transcription through the V region opens the chromatin and allows recombination factors to bind; 3, a V segment is joined to the DJ segment. (c) Top: at a bidirectional promoter a lncRNA and a pc gene are transcribed in opposite directions. The promoter is always in an open chromatin conformation as either the lncRNA or the pc gene is transcribed, which is thought to reduce transcriptional noise. Bottom: a unidirectional pc gene promoter can acquire a closed chromatin conformation due to stochastic TF binding, which is thought to increase transcriptional noise. Noise defines the variation of expression of a transcript between genetically identical cells caused by the stochastic binding of TFs regulated by the local chromatin environment. For details see Figure 1g, Figure 2d, Figure 3c and text.

lncRNA transcription and locus activation

Other examples indicate that lncRNA transcription activates gene expression by blocking access of repressor complexes to chromatin. In Drosophila, intergenic non-coding transcription at the BITHORAX complex (BX-C) is implicated in reversing POLYCOMB group (PCG)-mediated gene silencing and is correlated with an active chromatin state [99]. This mode of action was later suggested to be a general mechanism where the act of transcription serves as an epigenetic switch that relieves PCG-mediated gene silencing by recruiting epigenetic modifiers to induce gene expression and generate stable and heritable active chromatin [100]. In line with this hypothesis, intergenic transcription through PCG response elements (PREs) in the BX-C cluster is not only found during embryogenesis but also in late stage larvae, indicating that continuous transcription is required to keep genes active [101]. In mouse and human, a similar role for PRE transcription has been proposed. An analysis of lncRNA transcription in the human HOXA cluster revealed a positive correlation between lncRNA transcription and the loss of PCG/chromatin interactions that precedes HOXA gene activation [102]. Additionally, lncRNAs have been identified at promoter regions of PCG-regulated genes in mouse cells; while their role is not yet clear, it has been suggested that they either promote or interfere with PCG binding at target genes [103, 104].

A further example of a lncRNA mediating chromatin opening was described at the S. cerevisiae PHO5 gene. Transcription of an antisense lncRNA that initiates near the 3' end of PHO5 and overlaps its gene body and promoter is associated with rapid activation of PHO5 by enabling nucleosome eviction. Biochemical inhibition of RNAPII elongation as well as genetic disruption of lncRNA elongation demonstrated a direct role in PHO5 activation [105]. The association of lncRNA transcription with gene activation needs, however, to be considered within the framework that most protein-coding gene promoters in yeast and mammalian cells give rise to a bidirectional antisense lncRNA transcript [106, 107]. To date it is unclear if promoter-associated bidirectional lncRNAs represent spurious transcription in the context of open chromatin [108, 109] or are required to maintain open chromatin. In the latter case enhanced TF binding ensures accessible chromatin that allows more constant pc gene expression within a cell population [110] (Figure 4c).

Strategies for distinguishing a role for the lncRNA product from that of its transcription

Following genome-wide lncRNA mapping, functional studies so far have mainly focused on lncRNA products [7, 111]. As it becomes clear that lncRNAs can act through their transcription, it is important to identify strategies to determine the function and mode of action of each particular lncRNA. One common starting point to determine lncRNA function has been RNA interference (RNAi)-mediated knockdown, despite long-standing observations that the RNAi machinery in mammalian cells is located in the cytoplasm [112]. While there is evidence that some RNA-induced silencing complex (RISC) components are found in the nucleus, functional complexes are specifically loaded in the cytoplasm, prohibiting the application of RNAi strategies for nuclear localized lncRNAs [113]. In contrast, antisense oligonucleotides (ASO) that work via an RNaseH-dependent pathway will deplete nuclear-localized lncRNAs [114, 115]. However, three additional points of caution should be noted. First, non-specific effects arising from nuclear transfection reagents [116] have confused some observations. One critical validation step for knockdown studies would be a rescue experiment in which the lncRNA, modified to be invulnerable to the knockdown, is expressed as a transgene under the same transfection conditions [111]. Second, some results have highlighted major differences when functional studies used post-transcriptional depletion strategies in cell lines in contrast to genetic studies in the organism. Notable examples are Neat1 [117], Malat1 [116, 118, 119] and Hotair [120] where studies of mice carrying genetically disrupted alleles of these three lncRNAs failed to reproduce phenotypes deduced from cell lines following RNAi, ASO or over-expression studies. Third, while knockdown experiments may elucidate the function of lncRNAs acting through their product, the function of cis-acting lncRNAs that depend only on transcription will not be disturbed.

Features such as subcellular localization, half-life and steady-state abundance would form a good basis to allow functional tests to be designed. In addition, knowledge of the lncRNA splicing efficiency, conservation of splicing pattern in multiple tissues and species, an estimation of transcript repeat content and, finally, an accurate mapping of lncRNA 5' and 3' ends are essential preliminary steps. We have previously proposed that a subclass of lncRNAs, ‘macro’ lncRNAs, show RNA biology hallmarks such as inefficient splicing, extreme length, high repeat content, lack of conservation and a short half-life. These features are also indicators that the lncRNA product is less important than the act of transcription [121]. Once RNA biology features are known, experiments can be designed to distinguish between a role for the lncRNA product or its transcription.

The caveats of posttranscriptional knockdown experiments described above make it clear that genetic strategies are optimal for testing lncRNA function. These strategies include manipulating the endogenous locus to delete the promoter or the whole gene, or to shorten the transcript using inserted polyadenylation signals, as described for several examples above. This may appear a formidable task given that lncRNAs in the human genome may outnumber protein-coding genes [4]; however, suitable cell systems already exist. These include haploid cell lines with transcriptional stop signal insertions in most human genes that are screened by RNA sequencing [122], gene targeting by engineered zinc-finger nucleases [123] or CRISPR systems [124], and mouse embryonic stem cells, which have efficient rates of homologous targeting [125, 126].

These genetic strategies could be applied to determine if the lncRNA is functional and if its function requires the lncRNA product or only depends on the act of transcription (Figure 5). Once these answers are obtained, it will be useful to test whether additional chromatin features are involved. This could include chromatin accessibility assays to address nucleosome density in the regulated gene, as well as mapping of histone modifications, DNA methylation, and the presence of RNAPII and other transcription machinery components. These studies have been made easier in the mouse and human genomes by the publicly available ENCODE data [127]. As lncRNA identification becomes easier due to improved sequencing and bioinformatics tools, the number of annotated lncRNA transcripts is rising sharply [4, 128]. It is therefore a high priority to determine which lncRNAs are functional and which represent spurious transcription [109, 129]. To date only a relatively small number of mammalian lncRNAs have clearly been shown to regulate gene expression, and most attention has centered on lncRNAs that act through their transcription product [23]. With the recent demonstration that for some mammalian lncRNAs the act of their transcription is sufficient for function [35], it seems likely that a number of other lncRNAs act in a similar way. If the findings and approaches described above are used as guidelines, many new lncRNAs regulating genes by the act of transcription are likely to be discovered.

Figure 5

Strategies to distinguish between the function of a lncRNA product and its transcription. Both DNA strands are shown as separate boxes to indicate a lncRNA transcribed from the top reverse strand, overlapping a pc gene transcribed from the bottom forward strand in antisense orientation. A silencing function of the lncRNA can be predicted by an anti-correlating expression pattern. (a) Left: the lncRNA silencing effect is mediated by tethering of the lncRNA product at the site of transcription, sequence-specific binding of an EM to the lncRNA and guidance of the EM to the pc gene promoter. Right: silencing is mediated by a transcription process independent of the lncRNA product. (b) Posttranscriptional knockdown removes the lncRNA product, thus reversing a lncRNA product-mediated effect (left) but not the transcription-mediated effect (right). (c) lncRNA promoter deletion removes both lncRNA product- (left) and transcription-mediated (right) effects. (d) Truncation experiments inserting transcriptional stop signals at different positions within the lncRNA gene identify the functional region of the lncRNA gene (RNAPII is not shown). Left: the lncRNA is only functional when the region corresponding to the EM binding site is present. Right: the lncRNA is only functional when its transcription crosses the promoter of the overlapped pc gene. For details see Figure 1g, Figure 2e, Figure 3c, Figure 4d and text.
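
The reasoning in panels (b) to (d) can be written down as a small decision procedure. The Python sketch below is only an illustration of that logic for a silencing lncRNA. The function names, the boolean inputs and the truncation lengths are hypothetical and do not come from any published pipeline. A knockdown that fails to relieve silencing is treated here as pointing to a transcription-mediated effect, which assumes the knockdown depleted the lncRNA efficiently.

```python
from enum import Enum


class Mode(Enum):
    NOT_FUNCTIONAL = "no evidence that the lncRNA silences the pc gene"
    PRODUCT = "consistent with a lncRNA product-mediated effect"
    TRANSCRIPTION = "consistent with a transcription-mediated effect"


def infer_mode(promoter_deletion_relieves: bool, knockdown_relieves: bool) -> Mode:
    """Combine the outcomes of panels (b) and (c) for a silencing lncRNA.

    Both arguments are hypothetical summaries of an experiment: True means
    the overlapped pc gene was de-repressed after the manipulation.
    """
    # Panel (c): promoter deletion removes both product- and
    # transcription-mediated effects, so it tests function as such.
    if not promoter_deletion_relieves:
        return Mode.NOT_FUNCTIONAL
    # Panel (b): knockdown reverses only a product-mediated effect.
    if knockdown_relieves:
        return Mode.PRODUCT
    return Mode.TRANSCRIPTION


def functional_boundary(truncations):
    """Panel (d): bracket the position that the lncRNA must reach to silence.

    truncations: list of (transcribed_length_bp, silencing_retained) pairs,
    one per inserted transcriptional stop signal (values are illustrative).
    Returns (longest truncation that lost silencing,
             shortest truncation that retained silencing);
    the lower bound is 0 and the upper bound is None when no such allele exists.
    """
    lost = [length for length, retained in truncations if not retained]
    kept = [length for length, retained in truncations if retained]
    lower = max(lost) if lost else 0
    upper = min(kept) if kept else None
    return lower, upper


if __name__ == "__main__":
    print(infer_mode(promoter_deletion_relieves=True, knockdown_relieves=False).value)
    print(functional_boundary([(1000, False), (5000, False), (20000, True)]))
```

In this sketch the truncation series does not by itself separate the two modes. Whether the bracketed interval contains an EM binding site or the promoter of the overlapped pc gene has to be read from the annotation of the locus.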