Emerging roles of non-coding RNAs in epigenetic regulation

Recent deep sequencing surveys of mammalian genomes have unexpectedly revealed pervasive and complex transcription and identified tens of thousands of RNA transcripts that do not code for proteins. These non-coding RNAs (ncRNAs) highlight the central role of RNA in gene regulation. ncRNAs are arbitrarily divided into two main groups: the first includes small RNAs, such as miRNAs, piRNAs, and endogenous siRNAs, that usually range from 20 to 30 nt, while the second includes long non-coding RNAs (lncRNAs), which are typically more than 200 nt in length. These ncRNAs were initially thought to merely regulate gene expression at the post-transcriptional level, but recent studies have indicated that ncRNAs, especially lncRNAs, are extensively associated with diverse chromatin remodeling complexes and target them to specific genomic loci to alter DNA methylation or histone status. These findings suggest an emerging theme of ncRNAs in epigenetic regulation. In this review, we discuss the wide spectrum of ncRNAs in the regulation of DNA methylation and chromatin state, outline the key questions that need to be investigated, and acknowledge the elegant design of these intriguing macromolecules.


INTRODUCTION
In 1956, Francis Crick described a "central dogma", stating that genetic information flowed unidirectionally from DNA to RNA to protein (Crick, 1970). Around 20 years later, Paul and Duerksen surprisingly found that biochemically purified chromatin fractions contain more than twice as much RNA as DNA (Paul and Duerksen, 1975), raising the suspicion that RNA may be the primary organizer of chromatin structure and epigenetic regulation. This underlying "dark matter" came to be widely appreciated through transcriptome sequencing, which revealed that while ~1% of the mammalian genome carries protein-coding potential, more than 75% of the genome is transcribed to produce a huge number of non-coding transcripts, including rRNAs, tRNAs, snRNAs, miRNAs, siRNAs, piRNAs, and especially lncRNAs (Djebali et al., 2012; Lander et al., 2001; Venter et al., 2001). Although such pervasive and complex transcription is well accepted, the functionality of this vast number of transcripts needs to be further explored. Some think most ncRNAs are transcriptional noise or by-products of transcription, whereas others claim they are functional (Hangauer et al., 2013; Jacquier, 2009; Wilusz et al., 2009). Moreover, RNA and chromatin are clearly intertwined, but the mechanisms linking RNA to chromatin are still unclear. In this review, we first describe the transcriptional regulation of small ncRNAs and then focus on the epigenetic regulation of lncRNAs in diverse biological systems.
Non-coding RNAs are defined as functional molecules but without protein-coding ability. Based simply on their sizes, ncRNAs are arbitrarily divided into small and long RNAs (over 200 nt). Over the last two decades, several types of small non-coding RNAs, such as miRNAs, piRNAs, and endogenous siRNAs, have been discovered through genetic mapping. With the advent of deep-sequencing technology, an emerging small RNA world has been uncovered (Brosnan and Voinnet, 2009). These small ncRNAs play important roles in numerous cellular processes, such as cell proliferation, cell apoptosis, DNA damage, and induced pluripotency (He and Hannon, 2004). The abnormal expression of small ncRNAs has been shown to cause tumors in many reports (Di Leva and Croce, 2010).
The discovery of lncRNAs occurred a bit earlier than that of miRNAs, with H19 and Xist, involved in imprinting and dosage compensation, being the first two identified lncRNAs (Bartolomei et al., 1991; Brannan et al., 1990; Brown et al., 1991; Brown et al., 1992; Pachnis et al., 1988). lncRNAs can originate from intronic, exonic, or intergenic regions (St Laurent et al., 2015). Based on their genomic location relative to protein-coding genes, lncRNAs have been further classified into four categories: (i) antisense lncRNAs are transcribed in the opposite direction of protein-coding genes and partially overlap them; (ii) intronic lncRNAs are initiated in the introns of protein-coding genes without overlapping exons; (iii) bidirectional lncRNAs initiate in a divergent fashion from the promoter region of the nearest protein-coding gene; and (iv) intergenic lncRNAs reside in intergenic regions without overlapping any annotated protein-coding genes (Figure 1). Recent global run-on sequencing in mammalian cells further identified a large number of lncRNAs derived from promoter and enhancer regions. Although their expression levels are very low and the transcripts are relatively unstable, active transcription of these RNAs is critical for optimal gene induction. Many of these enhancer lncRNAs belong to the intergenic lncRNA family, whereas promoter lncRNAs may belong to either the antisense lncRNA or bidirectional lncRNA classes (Rinn and Chang, 2012).
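As an illustration only, the four location-based categories described above can be written as a simple decision rule. The following Python sketch is not taken from any of the cited studies: the `Gene` and `Transcript` records, the 1,000-bp promoter window used to call a divergent transcript, and the order in which the rules are checked are all assumptions made here for clarity.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Gene:
    """Hypothetical protein-coding gene record (coordinates are inclusive)."""
    start: int
    end: int
    strand: str                      # "+" or "-"
    exons: List[Tuple[int, int]]     # (start, end) of each exon


@dataclass
class Transcript:
    """Hypothetical lncRNA record on the same chromosome as the gene."""
    start: int
    end: int
    strand: str                      # "+" or "-"


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if two inclusive intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def tss(feature) -> int:
    """Transcription start site: left end on '+', right end on '-'."""
    return feature.start if feature.strand == "+" else feature.end


def is_divergent(lnc: Transcript, gene: Gene, window: int) -> bool:
    """Opposite strand, starting upstream of the gene TSS within `window` bp."""
    if lnc.strand == gene.strand:
        return False
    if gene.strand == "+":
        distance = tss(gene) - tss(lnc)
    else:
        distance = tss(lnc) - tss(gene)
    return 0 < distance <= window


def classify_lncrna(lnc: Transcript, gene: Gene, promoter_window: int = 1000) -> str:
    """Assign one of the four categories relative to the nearest coding gene."""
    if overlaps(lnc.start, lnc.end, gene.start, gene.end):
        hits_exon = any(overlaps(lnc.start, lnc.end, s, e) for s, e in gene.exons)
        if not hits_exon:
            return "intronic"            # inside the gene, no exon overlap
        if lnc.strand != gene.strand:
            return "antisense"           # opposite strand, overlaps the gene
        return "sense_overlapping"       # outside the four categories above
    if is_divergent(lnc, gene, promoter_window):
        return "bidirectional"
    return "intergenic"
```

Under these assumptions, a minus-strand transcript ending 100 bp upstream of a plus-strand gene is called bidirectional, whereas the same transcript placed far from any gene is called intergenic. Real annotation pipelines use more elaborate rules, so this should be read as a reading aid for the classification, not as a reference implementation.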

SMALL ncRNAs IN EPIGENETIC REGULATION
Small regulatory RNAs, including miRNAs and siRNAs, are 20-23 nt RNAs that were initially thought to regulate gene expression by slicing mRNA or inhibiting translation (He and Hannon, 2004). Currently, around 1,800 miRNAs have been identified in the human genome. The number of miRNAs is still increasing exponentially along with advancements in high-throughput sequencing technology (Griffiths-Jones et al., 2006; Kozomara and Griffiths-Jones, 2014; Luo et al., 2015). miRNAs were initially thought to function only in 3′ untranslated regions (3′UTRs), though recent studies show that miRNA functional sites are highly diverse in the cell (Chi et al., 2009; Helwak et al., 2013; Xue et al., 2013). Although thousands of miRNAs have been annotated in mammalian genomes, defining their functional target genes remains a major challenge in the field. Thus, an efficient and unbiased high-throughput method would be valuable in revisiting the principle of miRNA-mRNA interactions and re-evaluating the functional mechanism of miRNAs.

Figure 1
The genomic landscape of RNA classification. lncRNAs are defined by their relative proximity to the nearest protein-coding gene. Coding exons are represented as cyan boxes, while non-coding exons are shown in other colors. Schematic representation of lncRNAs organized bidirectionally (purple), antisense to coding genes (red), in the introns of coding genes (rose), in intergenic regions (blue) or associated with a promoter (green).
Although miRNAs have not been reported to be directly involved in epigenetic regulation in mammalian cells, several groups have found that aberrant expression of miRNAs can alter the global DNA or chromatin state by restricting the activity of a related chromatin remodeling enzyme (Benetti et al., 2008; Denis et al., 2011; Garzon et al., 2009; Yuan et al., 2011). Considering the extensive base pairing of miRNAs with pervasively transcribed nascent RNAs, it would not be surprising if the miRNA pathway directly participated in epigenetic control of gene expression. Indeed, two recent reports demonstrated DNA methylation changes mediated by the miRNA pathway in plants (Khraiwesh et al., 2010; Wu et al., 2010). In line with this observation, genome-wide mapping of Argonaute 2 (AGO2), the key component of the RNA-induced silencing complex (RISC), revealed that over 60% of AGO2-miRNA sites are located in intronic, promoter, or intergenic regions (Xue et al., 2013), implying an undocumented role of miRNAs in transcription or epigenetic regulation.
The silencing of repetitive regions, such as centromeres, telomeres, and the mat locus, by siRNA-mediated RNA interference has been well characterized in Schizosaccharomyces pombe. Several groups have demonstrated that depletion of key components of the RNAi machinery, such as Dicer (Dcr1), Argonaute (Ago1), and the RNA-dependent RNA polymerase homolog Rdp1, causes aberrant accumulation of long non-coding RNAs from centromeric repeats and a corresponding loss of histone H3 lysine 9 methylation (H3K9me), thus impairing the function of the centromeres (Volpe et al., 2002). Such long non-coding transcripts are not detected in wild-type cells, as Dicer actively processes them into small RNAs. These small RNAs in turn guide RNA-induced transcriptional silencing (RITS) complexes containing Ago1, Chp1, and Tas3 to centromeric regions through base pairing with nascent transcripts or complementary DNA sequences (Noma et al., 2004; Verdel et al., 2004). Meanwhile, the siRNA-mediated targeting tethers Clr4 methyltransferase complexes to repetitive regions to methylate histone H3 at lysine 9. H3K9me in turn recruits chromodomain-containing proteins, such as Chp1, Chp2, and Swi6, to initiate the spreading and establishment of heterochromatin domains. Interestingly, RITS complexes have been shown to interact with another complex called the RDRC (RNA-dependent RNA polymerase complex), which contains Rdp1 and Dicer. Recruitment of the RDRC allows nascent transcripts to be processed into siRNAs, which initiate another round of loading and tethering and thus form a self-reinforcing loop that efficiently sets up heterochromatin structure at the centromere and other repetitive regions of the genome (Motamedi et al., 2004).
Such siRNA-mediated epigenetic regulation is also observed in Arabidopsis, in which a specific argonaute family protein, AGO4, is responsible for locus-specific siRNA accumulation as well as DNA and histone methylation (Zilberman et al., 2003). This RNA-directed transcriptional silencing and histone methylation is also conserved in mammalian cells. Several groups have demonstrated that transfected siRNAs targeting promoter regions are able to induce DNA methylation and histone H3 lysine 9 dimethylation, although this is dependent on a different argonaute protein family member, Ago1 (Han et al., 2007; Kim et al., 2006; Morris et al., 2004). Moreover, recent massively parallel sequencing of small RNAs from mouse oocytes and embryonic stem cells identified a large number of endogenous siRNAs and piRNAs, most of which are derived from retrotransposons, bidirectional transcription, and antisense transcripts; however, whether these siRNAs and piRNAs are involved in heterochromatin formation in mouse cells is still unclear (Babiarz et al., 2008; Watanabe et al., 2008). In line with these reports, an RNA component has been described as essential for the maintenance of H3K9me-marked pericentric heterochromatin (Maison et al., 2002). Moreover, depletion of Dicer, a key component of the RNAi machinery, can cause significant defects in centromeric silencing in mouse embryonic stem cells and DT40 cell lines (Fukagawa et al., 2004; Kanellopoulou et al., 2005). Recent sequencing efforts revealed a strong correlation between H3K9 methylation and repetitive elements (Lippman et al., 2004), which account for two-thirds of the human genome. These findings raise the intriguing possibility that the RNAi machinery may have profound roles in epigenome maintenance and regulation. Further mechanistic studies of the small ncRNAs involved in this process may help us to understand the inheritance of epigenetic regulation.
Besides miRNAs and siRNAs, another class of small non-coding RNAs called piRNAs (piwi-interacting RNAs) has also been reported to play important roles in epigenetic regulation. In contrast to small RNA-mediated gene silencing, piRNAs were initially found to promote euchromatic histone modifications in Drosophila melanogaster (Yin and Lin, 2007). A recent genome-wide survey indicates that Piwi and piRNAs are extensively associated with chromatin, and that piRNAs are sufficient to recruit HP1a and Su(var)3-9 to specific genomic loci to repress RNA polymerase II transcription. This piRNA-mediated repression has also been reported in other species, such as in the Aplysia brain, in which a specific piRNA is able to target Piwi protein complexes to the promoter region of the transcriptional repressor CREB2 and mediate its methylation at a conserved CpG island (Rajasethupathy et al., 2012). A similar mode of regulation may also exist in other somatic tissues, but this is still unclear.

LncRNA IDENTIFICATION AND CLASSIFICATION
The first identified lncRNA, H19, is one of the most abundant lncRNAs in developing mouse embryos; it is actively transcribed from the maternal allele and reciprocally imprinted with insulin-like growth factor 2 (Igf2), which is paternally expressed. Aberrant expression of H19 is closely linked to genetic disorders such as Beckwith-Wiedemann syndrome and Silver-Russell syndrome (Bartolomei et al., 1991; Pachnis et al., 1988). Although it has been shown that H19 plays important roles in cell proliferation and tumorigenesis, complete deletion of H19 has no observable phenotype in mice (Ripoche et al., 1997). Recently, two groups reported that H19 functions as the precursor of miR-675 or as a specific sponge for let-7 family members, strongly arguing against its functionality as an lncRNA in imprinting balance (Cai and Cullen, 2007; Kallen et al., 2013). However, how H19 controls imprinting is still unclear. In 1991, another lncRNA, Xist, was cloned from the human X-inactivation region. It originates from the inactivated X chromosome and was found to encode a 17-kb RNA that can coat the chromosome in cis. The essential role of Xist in X-chromosome inactivation strongly suggests that a non-coding RNA itself could be a driving force in altering DNA and chromatin states (Lee, 2012).
Around 15 years later, along with a consortium effort in epigenome mapping, ~1,600 lncRNAs were identified in the mouse genome by checking Pol II-transcribed genes with a signature of H3K4me3 and H3K36me3 chromatin features (K4-K36 domains). Now, the total membership of lncRNAs exceeds 200,000, although the number is still debated. This large number of lncRNAs immediately elicited questions about their functionality (Xie et al., 2014). However, lncRNAs show more evolutionarily conserved sequences than intronic and untranscribed intergenic regions, suggesting that most of them are biologically functional in mammals (Bu et al., 2015). Moreover, lncRNA expression patterns are strongly associated with diverse key biological processes, such as DNA damage, immune responses, and induced pluripotency, again indicating their potential functionality (Song et al., 2014). In terms of the physical localization of lncRNAs relative to protein-coding genes, a subset of lncRNAs is predominantly localized in the nucleus, while the majority exclusively resides in the cytosol. Nuclear lncRNAs play important roles in epigenetic regulation, whereas cytoplasmic lncRNAs have been shown to regulate mRNA stability and translation.

lncRNAs IN EPIGENETIC CONTROL
Though a large number of lncRNAs have been identified, the mechanistic and functional roles for most lncRNAs are still unknown. Genetic studies have revealed that several lncRNAs, such as H19, Xist, Kcnq1ot1, AIR, HOTAIR, and ANRIL, are associated with imprinting and heterochromatin formation. The common lessons from these lncRNAs are that they facilitate the recruitment of chromatin modifying enzymes to specific genomic loci to change the chromatin or DNA state (Rinn and Chang, 2012). Such actions are employed by most lncRNAs. It has been reported that around 40% of lncRNAs are directly associated with diverse chromatin-modifying complexes, including PRC2, Co-REST, and SMCX (Khalil et al., 2009). The extensive number of associations between lncRNAs and chromatin modifying enzymes suggests in general that lncRNAs directly guide chromatin-remodeling proteins to target loci and function as scaffolds for the recruitment of transcription factors to activate or repress gene expression.
lncRNAs can function either in cis or in trans. For example, lincRNA-P21, a direct transcriptional target of P53, is activated by P53 upon DNA damage; it functions as a transcriptional repressor by recruiting the RNA-binding protein hnRNP K to the promoter region of P53 target genes. The detailed mechanisms of how lincRNA-P21 targets hnRNP K to P53 target genes and how the whole complex inhibits transcription are still unclear (Huarte et al., 2010). However, a recent study indicated that lincRNA-P21 represses somatic reprogramming by further recruiting the H3K9 methyltransferase SETDB1 and the DNA methyltransferase DNMT1, together with hnRNP K, to the promoter region of pluripotency genes, thus sustaining histone and DNA methylation levels and repressing pluripotency gene expression (Bao et al., 2015). This hnRNP K-dependent recruiting mechanism may also be applicable to P53-dependent transcriptional repression (Figure 2). Like lincRNA-P21, HOTAIR is another well-characterized lncRNA involved in epigenetic regulation in a trans manner. It can simultaneously bind two histone-modifying complexes, PRC2 and LSD1, via two highly structured RNA ends (Figure 2). This combination coordinates H3K27 methylation with H3K4 demethylation, thus ensuring the stable silencing of the HOXD locus. As a strong indicator of breast cancer, elevated expression of HOTAIR is also coordinated with re-targeting of PRC2 complexes to the promoter region of metastasis suppressor genes to silence their transcription by altering H3K27me levels (Gupta et al., 2010). Unlike HOTAIR, Xist and Kcnq1ot1 mainly function in cis; they both interact with the histone methyltransferase complex PRC2 to elicit locus-specific changes in histone H3K27 methylation (Rinn and Chang, 2012).
A recent genome-wide loss-of-function screening of lncRNAs in embryonic stem cells revealed that most lncRNAs function in trans, though this is still debated (Guttman et al., 2011). Several studies have indicated that cis or trans regulation is dependent on the relative expression level of a specific lncRNA in a specific cellular context. For example, Xist is a well-characterized lncRNA that functions specifically in cis to regulate dosage compensation; however, ectopic expression of Xist also induces de novo heterochromatin formation outside the X chromosome (Jeon and Lee, 2011). A similar phenomenon is also observed for HOTAIR (Gupta et al., 2010). Together, these observations suggest that how an lncRNA functions is also strictly dependent on its relative concentration and stability. Once the concentration of an lncRNA reaches a sufficient level, it may acquire the ability to diffuse to ectopic sites to elicit global gene expression changes.

THE EMERGING THEME OF AN RNA-CHROMATIN NETWORK
As RNA is a critical component for chromatin organization and RNA is twice as prevalent as DNA in purified chromatin fractions, RNA-chromatin networks are likely to be primary determinants of epigenetic control (Maison et al., 2002; Paul and Duerksen, 1975). In the following sections, we will focus on the lncRNA Xist to illustrate how these intriguing lncRNAs are designed to orchestrate crosstalk between the RNA and chromatin worlds. Xist is an X-inactive specific transcript originating from the X-inactivation center (Xic). To balance X-chromosome gene expression between males and females, Xist is transcribed from the X chromosome to be silenced and coats the entire chromosome in cis (Figure 3). Once Xist is transcriptionally activated, it recruits PRC2 complexes via the RepA element. Meanwhile, the RepC element in Xist specifically binds the YY1 transcription factor, which in turn localizes Xist-PRC2 complexes to the nucleation center of the X chromosome and initiates its inactivation (Lee, 2012). Several models were proposed over the past two decades to describe how Xist-PRC2 complexes spread along the X chromosome for the de novo establishment of H3K27me3 domains. To map Xist loading sites globally, two groups employed an antisense oligo-mediated affinity purification approach to identify Xist loading positions during X-chromosome inactivation. They revealed that Xist-PRC2 complexes first target several gene-rich regions of the X chromosome to shut down transcription and then spread locally to gene-poor regions (Simon et al., 2013). Importantly, jumping of the Xist-PRC2 complex from the nucleation center to distal loading sites seems to be dependent on the 3D genome architecture, as most of the loading sites are physically proximal to the nucleation center (Engreitz et al., 2013).
Moreover, all the initial loading sites of the Xist-PRC2 complexes are located in transcriptionally active regions, indicating that transcribed nascent RNAs may be the initial signal for Xist-PRC2 targeting. The remaining question is how the Xist-PRC2 complexes are targeted to these nascent transcripts.
Two recent reports provided initial clues for how Xist-PRC2 is loaded and how transcription is silenced at loading sites. Antisense affinity purification-mediated quantitative mass spectrometry revealed several proteins that directly interact with Xist, many of which are RNA-binding proteins (Chu et al., 2015; McHugh et al., 2015). One of these, called SHARP, is able to directly interact with the SMRT co-repressor and HDAC3, both of which are essential for Pol II transcriptional silencing of the X chromosome (McHugh et al., 2015). Moreover, HDAC3 is able to initiate global histone H3 deacetylation in Xist territories, thus explaining the long-standing observation of hypoacetylation on the entire silenced X chromosome, which appears very early upon initiation of X-chromosome inactivation. Together, these results raise the intriguing hypothesis that RNA-binding proteins in Xist complexes may provide an RNA code for the initial loading and targeting of Xist-PRC2.

Figure 3
The action model of lncRNA Xist in X-chromosome inactivation. Xist is transcribed from the X-inactivation center located on the X chromosome to be silenced. After activation, it will first recruit PRC2 complexes through the RepA sequence, and then transcription factor YY1 will bridge the complex onto the chromatin to methylate the future Xi. Later on, the Xist-PRC2/SHARP/HDAC3 complexes will jump from the X-inactivation center to initial loading sites, which were recently found to be gene-rich regions with high transcription activity. SHARP/SMRT/HDAC3 complexes will shut down transcription at initial loading sites and elicit histone H3 deacetylation. The Xist-PRC2 complex then will spread to gene-poor regions and methylate the entire chromosome.

CONCLUSIONS AND PROSPECTS
The ability of RNA-binding proteins (RBPs) to bind to non-coding regulatory RNAs and target them to specific nascent transcripts provides an ideal bridge between the RNA and chromatin worlds. A unique property of RBPs is that they can recognize specific RNA motifs, called RNA codes, in nascent RNAs. This may partially explain why a basal level of transcription is critical for lncRNA complex targeting. As lncRNAs tend to associate with several RNA-binding proteins, each of which has a preferred RNA sequence to bind, an array of different RBPs may acquire the ability to bind a long stretch of RNA sequence containing all of the corresponding RNA codes. This is likely the primary determinant that tethers lncRNA complexes to specific genomic loci (Figure 4).
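To make the idea of a combinatorial RNA code concrete, the following Python sketch scans a nascent RNA sequence for windows that contain the code of every RBP in a complex. It is a deliberately simplified illustration and not a published method: the function name `find_code_arrays`, the 40-nt default window, and the exact-match treatment of each code are assumptions made here, and the RBP names and motifs in the usage note are hypothetical placeholders. Real RBP codes are degenerate and their order and spacing within an array may matter.

```python
from typing import Dict, List


def find_code_arrays(rna: str, rbp_codes: Dict[str, str], window: int = 40) -> List[int]:
    """Return start positions of windows that contain every RBP code.

    rna       : nascent transcript sequence (DNA letters are converted to RNA)
    rbp_codes : mapping of RBP name -> short RNA code (assumed exact match)
    window    : window length in nt (an arbitrary choice for this sketch)
    """
    # Normalise to upper-case RNA so that DNA-style input also works.
    rna = rna.upper().replace("T", "U")
    codes = [code.upper().replace("T", "U") for code in rbp_codes.values()]

    starts = []
    # Slide a fixed-size window along the transcript; a sequence shorter
    # than the window is treated as a single window.
    for start in range(max(1, len(rna) - window + 1)):
        segment = rna[start:start + window]
        # A window qualifies only if all codes are present, mirroring the
        # idea that the whole set of RBPs must be able to bind together.
        if all(code in segment for code in codes):
            starts.append(start)
    return starts
```

For example, with the hypothetical codes `{"RBP_A": "AUGGC", "RBP_B": "CCUUA", "RBP_C": "GAGAG"}`, a transcript that carries all three motifs within the chosen window yields at least one start position, whereas a transcript lacking any one of them yields an empty list. This all-or-none behaviour is the property the combinatorial model relies on for targeting specificity.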

Figure 4
Combinatorial RNA codes may determine the specificity of lncRNA/remodeling complex targeting. Each RNA-binding protein (RBP) has a specific RNA-binding code, which tends to be a 4-8-nt RNA sequence (Ray et al., 2013). Several RNA codes in a specific arrangement may form an RNA code array. The corresponding RNA-binding proteins in lncRNA complexes may recognize the array containing RNA codes for all RNA-binding proteins; thus, the RBPs and nascent RNA sequence together act as the primary determinants for targeting of lncRNA complexes to specific chromatin loci.

ncRNAs, especially lncRNAs, are emerging as molecular scaffolds to organize nuclear compartments and macromolecular machinery in order to execute sophisticated and precise regulation of diverse biological processes. Understanding how lncRNA complexes are organized and targeted to specific loci, as well as where these loci are located, will be helpful in establishing new paradigms for epigenetic regulation.

Compliance and ethics
The author(s) declare that they have no conflict of interest.