Introduction

Over the past decade, intensive studies have demonstrated that long non-coding RNAs (lncRNAs) are key regulators of embryonic development [1], DNA damage responses [2], and human diseases such as neuronal disorders [3], heart dysfunction [4], and cancers [5, 6]. LncRNAs can act as effectors that direct biological processes. Compared with DNA, RNA is more mobile and flexible, can fold into distinct structures, and can interact with DNA or other RNAs by base pairing. Revealing the molecular mechanisms by which lncRNAs regulate these biological functions relies on studying their interactions with DNA, RNA, and proteins. Several methodologies have been developed to systematically identify RNA-interacting chromatin, RNA-interacting proteins, and RNA-RNA interactions in vivo, thus uncovering the protein-RNA-DNA network (Table 1).

Table 1 In vivo mapping of RNA interactomes

Chromatin associated RNAs

The first studies showing that RNA constitutes a significant fraction of chromatin date back more than 50 years [41, 42]. Later studies provided evidence that the nuclear matrix of an interphase nucleus is made of insoluble proteins and RNA [43]. RNase A treatment leads to clumping of chromatin onto the nuclear lamina and nucleolus [44, 45], indicating an indispensable role for RNA in nuclear architecture. However, identifying the specific RNA that contributes to a specific nuclear phenotype faced technical challenges before the 1990s. Recently, a burst of studies has used next-generation sequencing to characterize the interactions between lncRNAs and chromatin. Together with lncRNA ablation experiments, accumulating evidence suggests that RNA-protein complexes carry out various functions, such as the formation of chromatin compartments, gene regulation, and inter- or intra-chromosomal interactions [44, 45] (Fig. 1).

Fig. 1

lncRNA-interacting hub in the nucleus. LncRNAs serve as scaffolds for protein complexes and bring two or more distant DNA loci together. RNA is involved in maintaining nuclear architecture, facilitates chromatin looping, and directs inter- and intra-chromosomal interactions

It has been reported that the CCCTC-binding transcription factor (CTCF), which promotes chromatin loop formation, binds to thousands of RNAs [46]. Depletion of the steroid receptor RNA activator (SRA), a noncoding RNA associated with CTCF, reduced CTCF-mediated insulator activity at the IGF2/H19 imprinting control region and increased IGF2 expression [47]. It was also reported that the lncRNA Jpx activates Xist expression by evicting CTCF from the Xist promoter [48]. These results imply that nuclear lncRNAs cooperate with RNA-binding proteins to regulate gene expression, perhaps by modulating chromatin conformation.

It has been proposed that lncRNA transcription guides genome organization [49]. For example, Oakes et al. demonstrated that inhibition of rRNA transcription leads to nucleolar disassembly [50]. Conversely, inducing transcription of rRNA genes inserted at new chromosomal locations can generate nucleolus-like structures [51]. These results suggest that rRNA transcription could be responsible for nucleolar organization. More recently, Nozawa et al. suggested that chromatin-associated RNAs form a chromatin mesh with scaffold attachment factor A (SAF-A), also known as heterogeneous nuclear ribonucleoprotein U (HNRNPU) [52]. They showed that SAF-A oligomerization, which depends on ATP and RNA, drives chromatin decompaction, whereas its monomerization compacts large-scale chromatin structure. The global change in large-scale chromatin caused by SAF-A depletion did not dramatically alter gene expression; nevertheless, it resulted in excessive DNA damage and genomic instability [52].

Xist RNA regulates gene silencing and chromatin conformation

Xist RNA is one of the best-studied lncRNAs involved in epigenetic regulation and gene silencing. It is a 17-kb lncRNA expressed from the inactive X chromosome (Xi) [53] that coats the entire chromosome and represses gene expression by recruiting repressive complexes such as the polycomb complexes PRC1 and PRC2 [54,55,56]. In addition, Xist plays an important role in three-dimensional (3D) conformation and maintains the heterochromatin structure of the Xi (Fig. 2). When Xist was depleted, topologically associated domains (TADs) were restored in cis and the Xi adopted a conformation resembling that of the active X chromosome (Xa) [20]. Lee et al. also showed that Xist repels positive chromatin factors such as BRG1 (also known as SMARCA4) and cohesin from the Xi [20, 57] to prevent the formation of TADs and chromatin superloops. At a higher order of chromatin structure, mammalian chromosomes are organized into alternating “A/B compartments” in 3D space. Compartments usually contain chromatin of similar states, with A compartments being actively transcribed and gene-rich, and B compartments being transcriptionally inactive and gene-poor [58, 59]. Remarkably, ablating SMCHD1, an architectural protein, revealed another layer of compartments, S1/S2, on the Xi [60]. These results indicate that SMCHD1 binds to Xist and facilitates the folding of the S1/S2 compartments into a compartment-less (more compact) structure on the Xi (Fig. 2). Altogether, robust evidence shows that a lncRNA can regulate gene expression and 3D chromatin conformation through its ability to recruit epigenetic factors or to act as a decoy or scaffold for protein complexes.

Fig. 2

Xist RNA tethers epigenetic regulators and contributes to the 3D conformation of the X chromosome. Xist RNA recruits repressive complexes such as the polycomb complexes PRC1 and PRC2 to the inactive X chromosome (Xi) to facilitate heterochromatin formation, thus leading to gene silencing. On the other hand, Xist binds to BRG1 and cohesin and repels them from the Xi to prevent TAD (topologically associated domain) formation. Xist mediates Xi folding in 3D space by tethering SMCHD1, which facilitates the merging of chromatin compartments. A/B compartments are first fused into “S1” and “S2” compartments, which, after SMCHD1 recruitment, merge further into the compact Xi structure

Enhancer RNAs promote chromatin looping

Enhancer RNAs were first described in two early studies that used genome-wide sequencing approaches such as RNA-seq and ChIP-seq [61, 62]. These studies demonstrated that enhancers are occupied by RNA polymerase II (RNAP II) and transcribed into a class of ncRNAs termed enhancer RNAs (eRNAs). The epigenetic features of enhancers include histone 3 lysine 4 monomethylation (H3K4me1), histone 3 lysine 27 acetylation (H3K27ac), histone variants (H2AZ, H3.3), co-activators (the Mediator complex), and an open chromatin architecture (DNase I hypersensitivity) [63]. The expression of eRNAs can be regulated by various stimuli, such as estrogen or androgen acting through the estrogen receptor (ER) or androgen receptor (AR) [64,65,66]. It is generally believed that eRNAs are unstable: they are transcribed quickly after induction and degraded rapidly [62]. Several lines of evidence suggest that eRNAs can promote enhancer-promoter looping or facilitate RNAP II loading, thus upregulating their target genes [65, 67]. For example, the gonadotropin hormone α-subunit gene is regulated by an eRNA in a cell-type-specific manner [68, 69]. Depletion of this eRNA results in loss of the interactions between the enhancer and promoter of the gonadotropin gene [68]. These results further support the idea that lncRNAs can direct chromatin looping.

lncRNAs mediate interchromosomal interactions

Homologous chromosome pairing rarely occurs in somatic cells under normal growth conditions, with only a few exceptions. Transvection, an epigenetic phenomenon that can lead to either gene activation or repression, depends on such pairing, which was first observed in 1908 in Drosophila, where homologous chromosomes are closely paired in somatic nuclei [70]. X chromosome pairing is one of the best-known examples of somatic homologous pairing in mammals [71]. Tsix, a lncRNA, is highly expressed in undifferentiated embryonic stem (ES) cells and antagonizes Xist action. During differentiation of female ES cells, transient homologous pairing occurs, and Xist RNA expression increases to initiate chromosome-wide silencing. It was reported that Tsix RNA mediates the homologous pairing between the two X chromosomes, thus breaking the epigenetic symmetry between them and driving the random choice of which X chromosome remains active and which is inactivated [72, 73]. Moreover, another lncRNA derived from the ends of the sex chromosomes, dubbed PAR-TERRA (telomeric repeat-containing RNA), facilitates pairing by clustering the ends of the sex chromosomes and creates a hub that constrains the DNA loci in 3D space [10]. Beyond the X chromosomes, several studies indicate that somatic allelic pairing also occurs at a number of autosomal loci, such as Oct4 and various cytokine genes [74,75,76,77]. Although how interchromosomal pairing affects gene expression and epigenetic regulation remains elusive, it has been proposed that aligning the two homologous alleles allows bi-allelically bound transcription factors to redistribute onto one allele to reach their lowest free-energy state (the aggregated state) [72, 78, 79], thus inducing the transition from biallelic to monoallelic expression.

Methods for RNA-chromatin interactions

To map RNA-associated chromatin in vivo, Chu et al. and Simon et al. used biotinylated antisense DNA oligonucleotide probes to capture a specific chromatin-associated RNA [7, 8] (Table 1 and Fig. 3); the methods are named ChIRP-seq (chromatin isolation by RNA purification) and CHART-seq (capture hybridization analysis of RNA targets), respectively. Cells are first fixed with either glutaraldehyde (ChIRP) or formaldehyde (CHART), and chromatin is sheared into small fragments by sonication. To minimize the noise caused by non-specific interactions of the DNA probes with chromatin DNA or proteins, CHART includes an RNase H step to elute the RNA-chromatin complexes. Because RNase H specifically degrades the RNA strand of RNA-DNA hybrids, only lncRNA complexes hybridized to the DNA probes are eluted. Later, a hybrid method called CHIRT-seq, which combines glutaraldehyde fixation with RNase H elution, was developed to identify genomic binding sites of TERRA RNA [10]. Many studies have used ChIRP-seq to determine the genomic binding sites of lncRNAs, including NEAT1, MALAT1, HOTAIR, MEG3, TERC, and LTR ERV-9 [80, 81]. CHART-seq has also successfully identified the genomic binding sites of Xist, NEAT1, and MALAT1 [8, 9].
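The antisense capture principle shared by ChIRP, CHART, and CHIRT can be illustrated with a minimal sketch that tiles antisense DNA probes along a target RNA. The function names, the 20-nt probe length, and the 80-nt spacing are illustrative assumptions rather than parameters taken from the cited protocols; real probe design would also screen for specificity and accessibility.

```python
# Hypothetical helpers: tile antisense DNA probes along a target RNA sequence.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}


def antisense_dna(rna_segment):
    """Return the DNA oligo (written 5'->3') that base-pairs with an RNA segment."""
    return "".join(COMPLEMENT[base] for base in reversed(rna_segment.upper()))


def tile_probes(rna, probe_len=20, spacing=80):
    """Place one probe every (probe_len + spacing) nt.

    probe_len and spacing are illustrative defaults, not protocol values.
    Returns a list of (start_position, probe_sequence) tuples.
    """
    step = probe_len + spacing
    probes = []
    for start in range(0, len(rna) - probe_len + 1, step):
        target = rna[start:start + probe_len]
        probes.append((start, antisense_dna(target)))
    return probes
```

For a 250-nt toy RNA, `tile_probes` with these defaults yields probes starting at positions 0, 100, and 200.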

Fig. 3

Mapping RNA-chromatin interactions. In ChIRP-seq, CHART-seq, and CHIRT-seq, RNA-associated chromatin complexes are crosslinked by formaldehyde or glutaraldehyde and captured by antisense oligos that target a specific lncRNA, and the DNA fragments associated with the RNA-protein complexes are sequenced. For all-to-all interactions (MARGI-seq, ChAR-seq, and GRID-seq), a linker is ligated to connect DNA and RNA, and the DNA-RNA chimeras are sequenced. HiChIRP-seq combines Hi-C and ChIRP to identify the interacting chromatin (inter- and intra-chromatin interactions) of a specific lncRNA

Recently, several methods were developed to reveal all interactions between RNA and DNA in the nucleus, including MARGI-seq (mapping RNA-genome interactions), ChAR-seq (chromatin-associated RNA sequencing), and GRID-seq (global RNA interactions with DNA) [11,12,13]. The idea behind these methods is to ligate chromatin-associated RNAs to their target genomic sequences by proximity ligation through a linker, thus forming RNA-DNA chimeric fragments. These techniques revealed hundreds of chromatin-associated RNAs, including previously known lncRNAs and a large set of non-coding RNAs bound to active promoters, enhancers, and super-enhancers in a tissue-specific manner [11,12,13]. In principle, all-to-all mapping can discover every RNA-chromatin interaction; in practice, coverage depends on the ligation efficiency, the distance between the RNA and genomic DNA, and the abundance of each RNA.
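Because the RNA and DNA parts of each chimera are joined through a known linker, a first computational step is to split every read at the linker. The sketch below shows the idea under stated assumptions: the linker sequence `GGTTCC` is a placeholder, the RNA-derived part is assumed to lie upstream of the linker, and the 15-nt minimum arm length is arbitrary. None of these values come from the MARGI-seq, ChAR-seq, or GRID-seq protocols.

```python
# Hypothetical sketch: split an RNA-DNA chimeric read at a known linker sequence.
def split_chimera(read, linker, min_len=15):
    """Return (rna_part, dna_part), or None if the read is not a usable chimera.

    Assumes the RNA-derived sequence precedes the linker and the DNA-derived
    sequence follows it; the real orientation depends on the library design.
    """
    idx = read.find(linker)
    if idx == -1:
        return None  # no linker found: not a chimera
    rna_part = read[:idx]
    dna_part = read[idx + len(linker):]
    if len(rna_part) < min_len or len(dna_part) < min_len:
        return None  # one arm is too short to be mapped reliably
    return rna_part, dna_part
```

Each returned arm would then be aligned separately, the RNA part to the transcriptome and the DNA part to the genome, to call one RNA-chromatin contact per read.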

As discussed above, lncRNAs may serve as scaffolds that bring two or more distant DNA loci into spatial proximity. To understand how a specific RNA interacts with chromatin in 3D space, Mumbach et al. developed HiChIRP, a method that combines Hi-C and ChIRP [14]. They incorporated azido-CTP into chromatin contacts, captured RNA-chromatin complexes using biotinylated probes, used copper-free dibenzocyclooctyne (DIBO) click chemistry to covalently attach biotin for subsequent enrichment of the contacts, and sequenced the DNA contacts. They performed HiChIRP on 7SK small nuclear RNA (snRNA), telomerase RNA component (TERC), and lincRNA-EPS [14]. They found that thousands of loops were enriched in 7SK HiChIRP, some of them at promoters and active regulatory elements. They also showed that TERC is associated not only with loops formed between telomeres but also with enhancer-promoter loops at some oncogenes, implying a role for TERC outside of telomeres. Their results therefore provide insight into how lncRNAs mediate inter- or intra-chromatin looping.

Methods for RNA-protein interactions

RNA-binding proteins (RBPs) are associated with a large number of human disorders, such as autoimmune and neurologic diseases [82,83,84]. Notable examples include the fragile X mental retardation protein (FMRP) in fragile X syndrome [83], the neuron-specific Nova and Hu proteins in paraneoplastic neurologic degenerations [84], and the small nuclear ribonucleoproteins (snRNPs) in systemic lupus erythematosus [82]. To identify the RNAs that interact with these proteins in vivo, Ule et al. combined UV cross-linking with immunoprecipitation (CLIP) to pull down RNA-protein complexes [15] (Table 1 and Fig. 4), after which the captured RNAs are sequenced. Because CLIP relies on reverse transcription reading through the residual amino acids that remain covalently attached to the RNA at the cross-link site, cDNAs tend to truncate prematurely immediately before the cross-linked nucleotide [15]. This problem was later resolved by self-circularizing the cDNAs so that truncated cDNAs can be PCR-amplified, yielding individual-nucleotide resolution UV cross-linking and immunoprecipitation (iCLIP), which maps protein-RNA interactions precisely [16]. Because UV irradiation at 254 nm crosslinks RNA to protein with low efficiency in CLIP, Tuschl and colleagues developed PAR-CLIP (photoactivatable ribonucleoside-enhanced crosslinking and immunoprecipitation) [17], which improves crosslinking efficiency by incorporating 4-thiouridine (4SU) into nascent RNA transcripts in living cells. The cells are then irradiated with UV light at 365 nm to induce efficient crosslinking of the 4SU-labeled RNAs to interacting proteins (Table 1). PAR-CLIP has been applied to identify the transcriptome-wide binding sites of several RBPs and microRNA-containing ribonucleoprotein complexes.
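Since truncated cDNAs end immediately before the cross-linked nucleotide, the cross-link position can be inferred from where each read starts. The following sketch illustrates that inference; it assumes reads are given as 0-based, half-open genomic intervals `[start, end)` with a strand, and that each read begins at the first nucleotide of the truncated cDNA. The function names are hypothetical and do not correspond to any published iCLIP pipeline.

```python
from collections import Counter


# Hypothetical sketch: infer cross-link sites from truncated-cDNA read starts.
def crosslink_site(start, end, strand):
    """Return the genomic position one nucleotide upstream of the read's 5' end.

    Coordinates are 0-based and half-open: the read covers [start, end).
    """
    if strand == "+":
        return start - 1  # 5' end is at `start`; upstream is start - 1
    if strand == "-":
        return end        # 5' end is at `end - 1`; upstream (in RNA sense) is `end`
    raise ValueError("strand must be '+' or '-'")


def count_crosslinks(reads):
    """Count reads supporting each (position, strand) cross-link site."""
    counts = Counter()
    for start, end, strand in reads:
        counts[(crosslink_site(start, end, strand), strand)] += 1
    return counts
```

Positions supported by many independent read starts are the candidate protein-RNA contact sites.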

Fig. 4

Mapping RNA-protein interactions. In CLIP-seq, iCLIP-seq, and PAR-CLIP-seq, RNA-protein complexes are crosslinked by UV light and pulled down using antibodies against a specific protein of interest, and the captured RNAs are sequenced. In RAP-MS, ChIRP-MS, and iDRiP-MS, RNA-protein complexes are crosslinked by UV light or formaldehyde and captured by antisense oligos, and the pulled-down proteins are identified by mass spectrometry. RBR-ID uses protein-RNA photocrosslinking and quantitative MS to identify all proteins that interact with RNA

Recently, researchers have developed methods that pull down RNA-protein complexes using antisense oligos complementary to the target RNA sequence. This type of capture therefore does not require antibodies against the proteins of interest. In 2015, three groups established RNA-centric capture methods, namely iDRiP-MS (identification of direct RNA interacting proteins), RAP-MS (RNA antisense purification coupled with mass spectrometry), and ChIRP-MS (comprehensive identification of RNA-binding proteins by mass spectrometry), to determine Xist RNA-interacting proteins [18,19,20] (Table 1 and Fig. 4). In all three studies, cells were first crosslinked to preserve RNA-protein interactions, and the RNA-protein complexes were then purified with biotinylated antisense oligos under highly denaturing conditions. ChIRP-MS uses formaldehyde to crosslink RNA-protein complexes, whereas RAP-MS and iDRiP-MS use UV light (Table 1 and Fig. 4). Because UV light is a short-range crosslinker, the RNA-protein interactions revealed by RAP-MS and iDRiP-MS tend to be direct. In contrast, formaldehyde crosslinking fixes much larger macromolecular networks and generally identifies both direct and indirect interactors.

To identify all RBPs, RBR-ID (proteomic identification of RNA-binding regions) introduces 4SU into RNAs, crosslinks RNA to proteins with UV light, and compares the mass spectra of 4SU-treated and untreated samples [21]. Because an RNA-crosslinked peptide has a different mass, the signal intensity of the peptide is generally lower in the crosslinked sample than in the non-crosslinked sample (Table 1 and Fig. 4). Using RBR-ID, about 800 known and previously unknown RBPs, including chromatin factors (CTCF, ATRX, HDAC1, DNMT3, EZH2, TET1, TET2, and HP1), were identified in mouse embryonic stem cells [21].
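The intensity comparison described above can be expressed as a per-peptide log2 ratio between the 4SU-treated and untreated samples, where strongly negative values point to candidate RNA-binding regions. This is only a sketch of the principle: the pseudocount, the threshold of -1, and the function names are assumptions for illustration, and the published RBR-ID analysis applies its own normalization and statistics.

```python
import math


# Hypothetical sketch: flag peptides whose intensity drops after 4SU crosslinking.
def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of peptide intensity in the +4SU sample over the -4SU sample.

    The pseudocount avoids division by zero for peptides missing in one sample.
    """
    return math.log2((treated + pseudocount) / (control + pseudocount))


def depleted_peptides(intensities, threshold=-1.0):
    """Return peptides whose log2 fold change is at or below the threshold.

    `intensities` maps a peptide identifier to (treated, control) intensities;
    the threshold is an illustrative cutoff, not a published value.
    """
    return [
        peptide
        for peptide, (treated, control) in intensities.items()
        if log2_fold_change(treated, control) <= threshold
    ]
```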

Methods for RNA-RNA interactions

Unlike DNA, RNA is typically single-stranded and can fold into 3D structures that range from simple helical elements to complex tertiary structures and quaternary ribonucleoprotein assemblies [85]. Changes in RNA structure in response to cellular conditions directly affect RNA function. Recently, high-throughput techniques that combine nuclease digestion [22, 23, 86] or chemical probing [24,25,26, 28] with next-generation sequencing have revealed the single-stranded and double-stranded regions of RNA molecules. FragSeq (fragmentation sequencing) [22] uses nuclease P1, which specifically cleaves single-stranded nucleic acids (Table 1 and Fig. 5). PARS (parallel analysis of RNA structure) [23] maps RNA structure by digesting RNA with RNase V1 or S1 nuclease to determine the double-stranded or single-stranded regions, respectively. In SHAPE-Seq (selective 2′-hydroxyl acylation analyzed by primer extension sequencing) [24,25,26], RNAs are treated with chemical probes (such as 1M7) that covalently modify the RNA in loops and bulges, thus blocking reverse transcription at the modified sites. However, because these methods rely on in vitro transcription and in vitro folding of RNAs, they may not reflect RNA structures in vivo.
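One common way to combine the two PARS digests is a per-nucleotide log2 ratio of RNase V1 reads to S1 nuclease reads, so that positive values suggest a double-stranded position and negative values a single-stranded one. The sketch below assumes this log-ratio form and a pseudocount of 1; both are stated here as assumptions, since the text above does not give the scoring formula.

```python
import math


# Hypothetical sketch: per-nucleotide structure score from V1 and S1 read counts.
def pars_score(v1_reads, s1_reads, pseudocount=1.0):
    """log2(V1 / S1) with a pseudocount; > 0 suggests paired, < 0 unpaired."""
    return math.log2((v1_reads + pseudocount) / (s1_reads + pseudocount))


def classify_positions(v1_counts, s1_counts):
    """Label each position 'ds', 'ss', or 'na' (no evidence either way)."""
    labels = []
    for v1, s1 in zip(v1_counts, s1_counts):
        score = pars_score(v1, s1)
        if score > 0:
            labels.append("ds")
        elif score < 0:
            labels.append("ss")
        else:
            labels.append("na")
    return labels
```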

Fig. 5

Profiling RNA structure and RNA-RNA interactions. FragSeq, PARS and SHAPE-seq reveal in vitro RNA structures by nuclease or chemical probing. DMS-seq, icSHAPE-seq, and COMRADES map in vivo RNA structures by modifying RNA with chemicals. CLASH and hiCLIP can identify RNA-RNA duplexes bound by a specific protein. MARIO, PARIS, and SPLASH map all RNA-RNA interactions by introducing a linker to ligate RNA and RNA

To profile genome-wide RNA structures in vivo, Ding et al. developed DMS-seq [28] (Table 1 and Fig. 5). In DMS-seq, cells are treated with dimethyl sulfate (DMS), which methylates unprotected adenines (As) and cytosines (Cs) in RNA, causing reverse transcriptase to stall one nucleotide before the DMS-modified A or C during cDNA synthesis. A year later, Chang’s group developed icSHAPE (in vivo click selective 2′-hydroxyl acylation and profiling experiment) [27] to probe RNA secondary structures in living cells at all four bases. They used a cell-permeable SHAPE reagent, NAI-N3, which is NAI (2-methylnicotinic acid imidazolide) carrying an added azide group, to label flexible regions of RNA. After RNA isolation, a biotin moiety is selectively added to the NAI-N3-modified RNA by copper-free click chemistry, allowing biotin-streptavidin purification of the modified RNA after fragmentation. By comparing in vivo and in vitro icSHAPE profiles, they observed that RNAs are generally less folded in vivo, suggesting that RNA structures depend largely on the intracellular environment. Moreover, they found that regulatory RNAs, such as lncRNAs and primary microRNA (miRNA) precursors, preserve their structures better than mRNAs in vivo. To selectively enrich specific RNA molecules, COMRADES (cross-linking of matched RNAs and deep sequencing) combines RNA capture with click chemistry to probe RNA structures and RNA-RNA interactions [29]. In COMRADES, cells are crosslinked with azide-modified psoralen, the target RNA is captured using biotinylated probes, and a copper-free click reaction then attaches a biotin moiety to the cross-linked regions, allowing a second selection of the cross-linked regions for sequencing (Table 1 and Fig. 5).
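The DMS-seq readout described at the start of this paragraph can be sketched as follows: because reverse transcriptase stalls one nucleotide before the modified base, the modified position is taken as the nucleotide immediately upstream of each read start, and only stops that fall on an A or C are kept. The input format (0-based read start positions on a single transcript) and the function name are assumptions made for illustration.

```python
from collections import Counter


# Hypothetical sketch: tally DMS-induced reverse-transcription stops on A and C.
def dms_stop_counts(transcript, read_starts):
    """Count inferred DMS-modified positions on one transcript.

    `read_starts` holds 0-based positions where cDNA-derived reads begin.
    The modified base is assumed to be at start - 1; stops on G or U are
    ignored because DMS reactivity is reported for A and C.
    """
    counts = Counter()
    for start in read_starts:
        pos = start - 1
        if pos >= 0 and transcript[pos] in "AC":
            counts[pos] += 1
    return counts
```

Positions with many stops are the unprotected, likely single-stranded As and Cs; in practice the counts would be compared against an untreated control.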

In addition, an RNA molecule can base-pair with other RNA molecules to form RBP-bound RNA duplexes (such as a miRNA and its target RNA). To identify miRNA targets, Tollervey et al. developed CLASH (crosslinking, ligation, and sequencing of hybrids) [30, 31] and used it to identify the RNA duplexes that interact with human AGO1 (Table 1 and Fig. 5). In CLASH, RNA-protein complexes are UV-crosslinked and purified with antibodies against the RNA-binding protein, the two strands of each RNA-RNA hybrid are then ligated, and the resulting chimeric RNAs are deep sequenced. Similarly, hiCLIP [32] can map RNA-RNA duplexes bound by RBPs. In hiCLIP, a linker-adapter is introduced at the ligation step to improve the ligation efficiency of RNA-RNA duplexes (Table 1 and Fig. 5). Moreover, ligation of two interacting RNAs into a chimera has been applied in other methods, including MARIO (mapping RNA interactome in vivo) [33], PARIS [34], and SPLASH [35], which map all RNA-RNA interactions (RNA structures and interactome) in living cells.

R-loops in gene regulation and genomic instability

R-loops are three-stranded nucleic acid structures in which a strand of RNA hybridizes with one strand of DNA while the other DNA strand loops out. The R-loop structure was first described in 1976, when Thomas et al. showed that RNA could hybridize to double-stranded DNA in the presence of 70% formamide in vitro [87]. A year later, Roberts and Sharp used the R-loop hybridization technique to map an adenovirus 2 (Ad2) mRNA to its genome and found that some DNA sequences of the Ad2 coding gene did not hybridize with the mature RNA, suggesting that the Ad2 genome contains non-coding sequences, later known as introns [88, 89]. Recently, genome-wide studies have shown that R-loops occur in vivo and are especially enriched in promoter regions [36]. Ginno et al. demonstrated that R-loop formation is involved in gene regulation through its potential to protect DNA from methylation [36]. Moreover, recent reports showed that the antisense lncRNA TARID (TCF21 antisense RNA inducing promoter demethylation) forms an R-loop at the TCF21 promoter, thus facilitating GADD45A binding, local DNA demethylation, and TCF21 expression [90, 91]. Another example is the lncRNA GATA3-AS1, which forms an R-loop structure with the central intron of GATA3-AS1 and tethers the MLL H3K4 methyltransferase to the GATA3 gene locus, thereby regulating GATA3 expression [92]. Furthermore, it has recently been proposed that R-loops act as intrinsic Pol II promoters that induce lncRNA transcription [93]. Depletion of R-loops by overexpressing RNase H1 causes a selective reduction of antisense lncRNA transcription [93].

It is generally believed that R-loops form in cis during transcription, when a nascent RNA hybridizes to the DNA template behind the moving RNA polymerase [94]. However, research in yeast suggests that RNA:DNA hybrids can also form in trans, meaning that an RNA transcribed at one locus hybridizes with homologous DNA at another locus [95], and that these hybrids are likely involved in homologous recombination. Moreover, studies have shown that genome instability arises from lesions generated by R-loop formation [95, 96]. Because transcription and replication share a common DNA template, replication forks that encounter the transcription machinery produce transcription-replication collisions that lead to DNA damage. Persistent RNA:DNA hybrids could therefore pose a risk of genomic rearrangements [96].

Methods for mapping R-loops

To map R-loops on a genome-wide scale, Ginno et al. developed DRIP-seq (DNA-RNA immunoprecipitation coupled to high-throughput sequencing), which uses the S9.6 antibody [36] to specifically purify RNA:DNA hybrids (Table 1 and Fig. 6); the captured DNA fragments are then sequenced. Conventional DRIP-seq generally produces a robust signal but has limited (approximately kilobase) resolution, a relatively high background, and no strand specificity. Another method, DRIPc-seq (DNA:RNA immunoprecipitation followed by cDNA conversion coupled to high-throughput sequencing), builds on DRIP but profiles R-loops by strand-specific RNA sequencing [40]. DRIPc-seq increases the resolution of R-loop profiling and provides strand specificity. However, the ability of DRIP to reveal authentic in vivo R-loops has been questioned because the immunoprecipitation is usually performed on isolated genomic DNA without any crosslinking, so R-loops could be disrupted or formed in vitro after cell lysis. Dumelie et al. developed bisDRIP-seq (bisulfite-based DRIP-seq) [38], which selectively labels the single-stranded DNA that loops out from an R-loop. They used bisulfite, applied at the time of cell lysis, to convert cytosine residues into uracil residues within the genomic regions that contain single-stranded DNA. Their results showed that bisDRIP-seq can map R-loops at near-nucleotide resolution and detect R-loop boundaries. In contrast to DRIP, R-ChIP (RNase H chromatin immunoprecipitation) uses RNase H, which specifically recognizes DNA:RNA hybrids, to map R-loops in vivo [39]. By expressing a catalytically dead enzyme, R-ChIP captures R-loops with a standard ChIP-seq protocol, which involves both fixation to stabilize the R-loop/RNase H complex and sonication to increase the resolution (Table 1 and Fig. 6).
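The bisulfite labeling used in bisDRIP-seq can be read out computationally as cytosine-to-thymine differences between a read and the reference, because a uracil produced by bisulfite conversion is sequenced as thymine after amplification. The sketch below assumes an ungapped alignment of the read at a known offset on the same strand as the reference; the function name and input format are hypothetical and do not reproduce the published analysis.

```python
# Hypothetical sketch: locate bisulfite-converted cytosines in an aligned read.
def converted_cytosines(reference, read, offset=0):
    """Return reference positions where a C in the reference reads as T.

    Assumes an ungapped, same-strand alignment of `read` starting at
    `offset` in `reference`.
    """
    positions = []
    for i, base in enumerate(read):
        ref_base = reference[offset + i]
        if ref_base == "C" and base == "T":
            positions.append(offset + i)
    return positions
```

Runs of converted cytosines on one strand would mark the looped-out single-stranded DNA, and the edges of such runs approximate the R-loop boundaries.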

Fig. 6

Profiling R-loops (RNA-DNA hybrids). DRIP-seq and bisDRIP-seq use S9.6 antibodies to pull down DNA:RNA hybrids (R-loops), and the captured DNA fragments are sequenced. DRIPc-seq builds on DRIP-seq, and the captured RNA fragments are reverse-transcribed and sequenced. R-ChIP captures R-loops using a standard ChIP-seq protocol to pull down the complex of the R-loop with catalytically dead RNase H

lncRNAs mediate phase separation of membrane-less organelles

Cellular organelles such as mitochondria and the Golgi apparatus are enclosed by lipid bilayer membranes, which form compartments and separate biological processes within a cell. In contrast, membraneless structures are formed through a process known as liquid-liquid phase separation (LLPS) and are made of RNA-protein droplets [97]. One of the most prominent membraneless structures in the nucleus is the nucleolus [98], which produces ribosomal RNA and consists of a variety of proteins and RNAs. Studies have shown that rRNA transcription is important for nucleolar assembly [50, 51, 99, 100]. Other membraneless structures, such as paraspeckles, Cajal bodies (CBs), histone locus bodies (HLBs), and promyelocytic leukemia (PML) bodies, are also found in the nucleoplasm as phase-separated droplets (Fig. 7), while stress granules and processing bodies (P-bodies) are RNA granules in the cytoplasm. DNA is typically absent from the interior of these liquid-like droplets, whereas lncRNAs serve as scaffolds for their formation and maintenance [101,102,103].

Fig. 7

RNA-protein droplets in cells. RNA granules are made of protein-RNA complexes, and their formation is driven by liquid-liquid phase separation. Nucleoli, paraspeckles, Cajal bodies (CBs), histone locus bodies (HLBs), and promyelocytic leukemia (PML) bodies are RNA assemblies in the nucleus. P-bodies and stress granules are found in the cytoplasm. RNAs act as regulatory elements that control the sizes and properties of these assemblies. An imbalance in the RNA/protein ratio in such assemblies could lead to human diseases

How do RNAs promote phase separation synergistically with protein-protein interactions? In vitro studies reported that RNA repeats, such as trinucleotide repeats and G-quadruplexes, can undergo a phase transition to form either a condensed liquid or a gel-like state [104] through multivalent base pairing between RNA molecules. Droplet-like assemblies of RNA are associated with certain neurodegenerative diseases, including Huntington disease, muscular dystrophy, and amyotrophic lateral sclerosis [105]. It has been proposed that such gel-like RNA foci may contribute to neuronal dysfunction in vivo. A recent study showed that RNA plays an important role in the phase behavior of prion-like RBPs, such as TDP43 and FUS [106], which are largely soluble in the nucleus but form solid pathological aggregates when mislocalized to the cytoplasm. In vitro studies indicated that the RNA/protein ratio is important for droplet formation and phase separation [106]. Remarkably, reduction of nuclear RNAs or disruption of RNA binding leads to excessive phase separation and the formation of cytosolic assemblies in cells [106], suggesting that the higher RNA concentration in the nucleus acts as a buffer that prevents the RBP aggregation seen in the cytoplasm. The question then becomes how the RNA/protein ratio is regulated in such droplets. Hondele et al. recently reported that RNA flux into and out of phase-separated organelles is controlled by RNA-dependent DEAD-box ATPases (DDXs), which contain low-complexity domains (LCDs) that are crucial for the formation of multivalent meshworks within membraneless droplets [107]. In addition, Ries et al. showed that methylation of mRNAs triggers phase separation of endogenous compartments, such as P-bodies, stress granules, or neuronal RNA granules [108]. These studies indicate that the abundance of nuclear RNAs and the modifications of RNA contribute to the dynamics of such membraneless organelles.
However, intriguing questions remain open. How do RNA structures affect phase separation? How are RNA-mediated droplets involved in chromatin organization and gene regulation? What signals trigger the reorganization of such structures?

Conclusions

The Human Genome and ENCODE Projects have shown that more than 90% of the genome is transcribed [1, 72], most of it into non-coding RNAs. However, the functions of most non-coding RNAs remain largely unknown. In the last decade, new techniques combined with high-throughput sequencing have profiled the interactions among DNA, RNA, and proteins, facilitating studies of lncRNA function in cells. LncRNAs can act as guides, decoys, or scaffolds for protein complexes to mediate epigenetic regulation [109]. Chromatin-associated lncRNAs are involved in nuclear architecture and chromatin conformation. Given that lncRNAs are long and mobile, they could serve as bridges that mediate chromatin looping and drive inter- or intra-chromosomal interactions [44, 45]. Moreover, RNA can hybridize with DNA to form R-loops, which contribute to gene regulation and genomic instability. RNA also mediates liquid-liquid phase separation by serving as a multivalent scaffold for RBP binding, thus regulating the size and dynamics of the membraneless organelles in which biological processes take place. We have only begun to scratch the surface of the lncRNA world; much remains unknown, and plenty of work needs to be done. Because RNA molecules have specific sequences, it is feasible to design therapeutics that block their actions using antisense oligos, RNA interference, or aptamers. We hope that a better understanding of the mechanisms of lncRNA action will make RNA-centric therapies viable options for treating human diseases.