Introduction

Long non-coding RNAs (lncRNAs) are typically defined as RNAs longer than 200 nucleotides without significant coding potential, often playing regulatory roles in mammalian systems (Winkle et al. 2021). Because this class of RNA molecules has been found to be important for processes in cancer, development, and brain function, there is keen interest in the pharmaceutical community (Kashi et al. 2016; Hon et al. 2017). However, the enormous size of these RNAs, which are often kilobases or tens of kilobases in length, makes the prospect of drugging them daunting. If the drug is a small molecule, then which of the 10,000 bases of a 10 kb lncRNA should be targeted? If the drug is an antisense oligonucleotide, which region of the RNA should be targeted? Which regions should not be targeted?

The steps of pre-clinical trials, clinical trials, and regulatory approval have been in the news lately regarding COVID-19 vaccine development. Similar steps are required for protein-based drugs, such as anti-viral therapeutics, cancer drugs, anti-depressants, antibiotics, and disease-related therapies (Matthews et al. 2016). However, before these steps can begin, target identification, lead generation, lead optimization, and drug candidate selection must take place. Each of these stages requires considerable structural characterization. In the case of protein-based drugs, often a high-resolution 3-D structure of the target protein is solved by either X-ray crystallography or cryo-EM, followed by binding pocket characterization, hit identification, lead development, and lead optimization (Grey and Thompson 2010). Currently, there are no high-resolution 3-D structures of lncRNAs.

In addition to drug development, structural biology has been quite useful for understanding protein mechanism. Since a protein’s mechanism and function are often determined by the other molecules that the protein interacts with, the 3-D structure of the protein can directly reveal its mechanism, as the structure provides the details of how the protein fits with its interaction partner, i.e., the details of how the protein works. Many researchers in fact define protein mechanism as the relationship between structure and function. This is exemplified by the fact that, if the positions of amino acids in a protein were hypothetically to change drastically, then the function would likely also change drastically, if not be lost entirely.

Relative to the history of structural biology in mechanistic studies and drug development in the protein community, and the fact that high-resolution structures of lncRNAs have not yet been solved, lncRNA mechanism is not well understood at the molecular level of detail. Without a clear understanding of structure, structure–function relationships, and mechanism, lncRNA drug discovery is in its early stages. In the case of lncRNAs, we expect understanding mechanism will require determination of the structure–function relationship for the RNA, and determination of the structure–function relationship will require solving the lncRNA 3-D structure at high resolution, similar to how structure–function relationships and mechanisms were worked out for proteins. Thus, we anticipate that solving structures of lncRNAs will be an important stage for determination of lncRNA mechanism and for lncRNA drug discovery.

Although long non-coding RNAs have been shown to be important in development, epigenetics, stem cell biology, plant biology, RNA processing, hormone response, cancer, and brain function (Rinn and Chang 2012; Klattenhoff et al. 2013; Mercer and Mattick 2013; Swiezewski et al. 2009; Ulitsky and Bartel 2013; Gong and Maquat 2011; Kaneko et al. 2014; Heard et al. 1999; Rocha et al. 2014; Boumil and Lee 2001; Davidovich et al. 2013, 2015; Cech and Steitz 2014; Brown et al. 2014; Dharap et al. 2012; Ponting et al. 2009; Derrien et al. 2012), many researchers have avoided 3-D structural studies because (i) they believe the RNAs are unstructured, (ii) they believe structural studies of lncRNAs are too difficult, or (iii) they are unaware of the success of structural biology techniques in other fields of RNA biology, such as RNAi, CRISPR-Cas9, protein synthesis, splicing, and bacterial metabolism (Doherty and Doudna 2000; Wilson and Doudna 2013; Pyle 2016; Voorhees and Ramakrishnan 2013; Montange and Batey 2008; Frank and Gonzalez 2010; Hashem and Frank 2018). Because of this reluctance to study lncRNA structures, structure–function relationships for these RNAs have lagged behind those in other sub-fields of RNA biology. The hesitancy, however, is not necessarily justified. Reason (i) is not necessarily true, in light of the physical properties of RNA: since the bases and backbone of RNA are polar, Watson–Crick and non-Watson–Crick base pairs form for almost any RNA sequence. This propensity to form base pairs, combined with well-known non-specific backbone-to-base and backbone-to-backbone (often ion-mediated) interactions, results in the tendency of RNA to ‘stick to itself’ and form intricate secondary and tertiary structures. Reason (ii) has merit; however, breakthroughs in cryo-EM have proved the feasibility of 3-D studies of purely RNA systems, producing cryo-EM structures of riboswitch RNAs, frameshifting pseudoknot elements of mRNAs, and tRNA-like structures (Zhang et al. 2019, 2020; Kappel et al. 2020; Sherlock et al. 2021). As for reason (iii), X-ray crystallography, small-angle X-ray scattering, nuclear magnetic resonance spectroscopy, and cryo-EM have enjoyed enormous success in determining 3-D structures of other RNA systems (Pyle 2016; Montange and Batey 2008; Zhang et al. 2019, 2020; Kappel et al. 2020; Sherlock et al. 2021; Liu et al. 2021; Roy et al. 2017a; Torabi et al. 2021; Pollack 2011). Here, we describe lncRNA functional studies, review high-resolution structure–function relationships in other RNA systems, and discuss early results and the prospects for higher-resolution structure–function studies in lncRNAs.

Long non-coding RNAs (lncRNAs)

Long non-coding RNAs (lncRNAs) are often found in mammalian epigenetic systems, exceed 200 nucleotides in length, are polyadenylated, alternatively spliced, and low in abundance, and display relatively low sequence conservation. A subset of the non-coding RNAs (Numata et al. 2003; Carninci et al. 2005), long non-coding RNAs have been shown to have specificity to tissue type and developmental stage (Ponjavic et al. 2007; Dinger et al. 2008; Rinn and Chang 2012). Many genome-wide studies have been performed to identify large classes of lncRNAs associated with environmental changes, tissues, and diseases (Rinn and Chang 2012). Loss-of-function studies have been performed to characterize functional roles of lncRNAs (Charles Richard and Eichhorn 2018). Biochemical and low-resolution methods have been used to obtain structural information, yielding glimpses of lncRNA structure (Novikova et al. 2013a). High-resolution structural biology techniques have been instrumental in determining structure–function relationships in other classes of RNA (riboswitches, ribozymes, and ribosomes) (Westhof 2015; Reyes et al. 2009). These structure–function relationships enable a more precise understanding of mechanism in terms of structural dynamics, thermodynamics, kinetics, and Mg2+ effects. Yet, few studies have examined lncRNA mechanism at the atomistic level of detail (Novikova et al. 2013a).

Structure–function relationships

Structure–function relationships have been critical in understanding biological systems in molecular detail. Since the inception of structural biology, 3-D structures of proteins have led to breakthroughs in understanding protein binding, protein complex formation, ligand binding, and self-assembly, all of which are important throughout biology and biomedicine. In biological systems, we often first know that a molecule is important, and even what it does, but not how it does it. The ‘how,’ in the case of a protein, is then worked out by solving the protein’s 3-D structure and relating it to its function. Once we know the ‘how,’ we can begin to understand the molecule in context and start thinking about drugging the molecule. In the case of protein molecules, their function almost always hinges on interaction with another molecule, such as another protein, RNA, or DNA molecule. Solving the structure of the protein in isolation and complexed with its target molecules produces invaluable information about its function and about the structure–function relationship. A wide variety of techniques have been developed to gain information about the 3-D structures of proteins and protein complexes, including X-ray crystallography, X-ray free-electron laser crystallography, cryo-EM, NMR, and small-angle scattering (Adams et al. 2013; Sekhar and Kay 2019; Glaeser 2019; Rambo and Tainer 2013; Smith et al. 2018; Gruner and Lattman 2015). X-ray crystallography has been a leading technique for many decades. For example, the molecular basis of the biological functions of lysozyme, ATP synthase, and ion channels was provided by their X-ray crystal structures (Blake et al. 1965; Doyle et al. 1998; Boyer 1997). In addition to revealing the mechanism of a molecule, structural studies (Hunter 1997) have also led to new drugs, as in the case of Plexxikon (scaffold-based drug discovery) and Zelboraf (metastatic melanoma) (Gul and Zimmermann 2017).
More recently, cryo-EM has taken center stage in protein structural biology. For example, cryo-EM structures of the SARS-CoV-2 spike protein in various states were used to optimize stable spike constructs for mRNA-based vaccines (Ma et al. 2021). In the case of nucleic acids, the DNA double helix structure immediately led to an understanding of the role of DNA in the cell as the carrier of reproducible information (Watson and Crick 1953). More recently, cryo-EM structures of nucleosome complexes have produced new insights into chromatin organization and gene regulation (Han et al. 2020; Takizawa et al. 2020). On the whole, high-resolution 3-D structures have been instrumental in determining mechanism, discovering drugs, and identifying function in a large number of biomolecular systems.

Structural studies of RNA systems

As far fewer RNA systems have been studied relative to protein systems, RNA structural biology has lagged behind protein structural biology considerably. However, as described below, high-resolution structures have been obtained for several classes of RNAs, leading to important insights into their structure–function relationships.

Self-splicing introns

Some of the earliest RNA-only systems solved to high resolution are the group I and group II introns (Pyle 2016). Using X-ray crystallography, these structures revealed the overall 3-D architecture of the RNA, detailed local RNA–RNA interaction motifs connecting the RNA together, the role of Mg2+ ions in the structure, and how the 2-D secondary structure maps translate into 3-D structures. Importantly, the 3-D structures were critical in determining the mechanism of catalysis for splicing, answering questions that were difficult or impossible to solve using other methods.

Riboswitch RNAs

Riboswitch RNAs are regulatory stretches of RNA commonly residing in the 5’-UTR of mRNA in bacterial metabolism-related genes (Montange and Batey 2008; Breaker 2011). These RNAs control gene expression by detecting environmental molecules through ligand-binding 3-D folds that alter the regulatory behavior of the RNA. In a riboswitch, one sequence has two competing secondary structures (and two competing tertiary structures). The presence of ligand shifts the equilibrium to one structure, altering the gene expression ON/OFF state. The majority of riboswitches were discovered with cell-free, in vitro chemical probing studies revealing the ligand dependence of the secondary structure, supported by in vivo functional studies. These in vitro secondary structures were later validated by in vitro high-resolution X-ray crystallographic 3-D structures (Serganov and Patel 2012). The dynamics of these systems have been studied using small-angle X-ray scattering (SAXS) experiments and molecular dynamics simulations (Zhang et al. 2014). SAXS and biochemical studies have also revealed that ligand-free conformations tend to be extended and flexible, whereas ligand-bound conformations tend to be compact and ordered. Most recently, molecular dynamics simulations have been used to integrate crystallographic, biochemical, and SAXS data, elucidating the operational principles of riboswitches and their dependence on magnesium (Roy et al. 2017a, 2019, 2017b; Hayes et al. 2014, 2015; Hennelly et al. 2013).

Ribonucleoprotein complexes

Several ribonucleoprotein complexes have been studied structurally, including the ribosome, RNA processing complexes, and the spliceosome. The ribosome is perhaps the most extensively studied ribonucleoprotein complex (Jobe et al. 2019). Structural studies have been attempted since the 1970s, commencing with biochemical studies to determine the secondary structure of the small subunit ribosomal RNA (16S) and large subunit ribosomal RNA (23S) (Rummel and Noller 1973; Woese et al. 1980; Noller et al. 1981; Noller and Woese 1981). Neutron scattering enabled the rough placement of proteins in 3-D space relative to the ribosome complex (Engelman and Moore 1976; Moore et al. 1975). Early cryo-EM studies yielded the morphologies of the two subunits, the tRNA and mRNA ligands, the ribosomal proteins, and various conformations of the ribosome (Frank and Gonzalez 2010). Details were filled in with X-ray crystallography structures (Voorhees and Ramakrishnan 2013). High-resolution cryo-EM enabled studies of ribosomes in a wide variety of functional states, for a variety of different species (Hashem and Frank 2018). With structures in hand, structural dynamics studies have been performed, integrating cryo-EM, single-molecule FRET, and large-scale molecular dynamics simulations. These studies provide a comprehensive picture of the molecular mechanism of the ribosome, characterizing the energy landscape and transition rates in the context of the detailed structures of the beginning, ending, and a plethora of intermediate states for various stages of protein synthesis (Sanbonmatsu 2012, 2019, 2006; Morse et al. 2020; Sanbonmatsu et al. 2005; Tung and Sanbonmatsu 2004; Girodat et al. 2020; Wasserman et al. 2016; Ferguson et al. 2015; Munro et al. 2009).

RNA processing

3-D structures of macromolecular complexes that process RNA molecules have yielded important insights. Passmore and co-workers used X-ray crystallography to obtain high-resolution structures of Saccharomyces cerevisiae Pan2 in complex with RNA to show that Pan2 recognizes the stacked, helical conformation of poly(A) RNA (Kumar et al. 2019). This complex was reconstituted in a cell-free, in vitro system (Tang et al. 2019). They also used a combination of crystallography and electron microscopy to obtain structures of CPF/CPSF, a multi-protein complex essential for formation of mRNA 3’ ends, showing that the process requires incorporation of the Ysh1 endonuclease into an eight subunit core complex (Hill et al. 2019).

Spliceosome

The high-resolution structures of a large number of full spliceosome complexes have been solved using cryo-EM over the past five years in a wide variety of splicing states. These structures were the culmination of decades of biochemical and genetic work, as well as lower-resolution cryo-EM structures of complexes along with high-resolution crystallography structures of smaller sub-regions of the complex (Yan et al. 2019). The spliceosome complex assembles on the pre-mRNA through a variety of protein and RNA interactions that work together to recognize specific splicing sites. This is followed by RNA-based catalysis of cleavage and ligation, removing the intron stretches of RNA and reconnecting the remaining RNA to form the mRNA. Like the ribosome, the spliceosome has a rich history in mechanism and structural studies and, in terms of structural studies, is one of the most important ribonucleoprotein complexes (Fica and Nagai 2017; Fica 2020; Wilkinson et al. 2020; Smathers and Robart 2019). Compared with ribosome operation, spliceosome operation is significantly more complex: factors are continuously coming on and off the complex during the myriad substeps required for splicing. It has been hypothesized that in humans, the composition of the complex may be transcript-specific. Furthermore, in addition to undergoing changes in tertiary structure, the secondary structure of the RNA also changes, requiring major rearrangements of the RNA. Although, from an RNA structure standpoint, spliceosome operation is more complex than ribosome operation, the spliceosome may present a more apt analog to a lncRNA molecular machine, since the complex is more dynamic, both in terms of the composition of the complex and in terms of the conformational changes required for the RNA (Wilkinson et al. 2020).

3-D structural techniques used to study other classes of RNAs

High-resolution techniques have been used to determine structures for a number of other classes of RNA systems, such as riboswitches, ribozymes, introns, ribosomes, and spliceosomes. In terms of techniques, nuclear magnetic resonance (NMR) spectroscopy can be used to study small systems. This method has the advantage of capturing precise information about the dynamics of the RNA, multiple configurations, and rates of transition between configurations (Liu et al. 2021). NMR has been used to obtain such information for a variety of riboswitches and regions of viral RNAs, as well as a small region of Xist RepA lncRNA (Duszczyk et al. 2008). X-ray crystallography is a traditional form of high-resolution structure determination used for small- and medium-sized RNA systems. High-resolution structures have been determined for riboswitches, ribozymes, introns, and ribosomes. Cryogenic electron microscopy (cryo-EM) can be used to determine high-resolution structures for medium-sized and large protein systems and ribonucleoprotein systems. To date, this method has determined a wide variety of structures for ribonucleoprotein complexes, including many ribosome complexes and several spliceosome complexes. Quite recently, the method has been used to obtain medium-resolution structures of several RNA-only systems, including riboswitches and regions of viral RNAs (Zhang et al. 2019, 2020; Kappel et al. 2020; Sherlock et al. 2021).

Studies of long non-coding RNAs

Loss-of-function studies have identified important lncRNAs, in terms of their functional roles in the cell, including epigenetic sensing and recruitment, sponging, P-bodies, scaffolding, RNA processing (lncRNAbnb1/2), and hormone response (Gong and Maquat 2011). Knockdown studies also improve understanding. Knockdowns of Braveheart showed that this lncRNA is critical for lineage commitment in cardiomyocytes (Klattenhoff et al. 2013). CRISPR/Cas9 knockout studies have expanded the number of clear causal roles of lncRNAs. CRISPR/Cas9 knockout of an 11-nucleotide r-turn RNA motif showed that this structural motif is critical for the overall function of Braveheart (Xue et al. 2016). Knockouts showed a major reduction in embryoid body beating assays, along with dramatic decreases in normal development. Protein binding studies offer some insight into mechanism. In pulldowns and SAXS analysis, Braveheart was shown to bind zinc finger protein CNBP (Kim et al. 2020). Several genome-wide studies have been performed to identify proteins that bind to Xist (Minajigi et al. 2015a).

Mechanisms of lncRNAs

One of the earliest discovered lncRNAs is Xist (X-inactive specific transcript), responsible for inactivation of the X chromosome during development (Lee and Jaenisch 1997). More recently, several lncRNAs have been associated with HOX gene systems during development (Rinn and Chang 2012). The 1/2sbs-lncRNA controls mRNA decay by hybridizing with mRNA to form a platform for STAU1 protein binding, triggering degradation of mRNA (Gong and Maquat 2011). Other lncRNAs are required for p21 activation (Huarte et al. 2010), stem cell reprogramming (Guttman et al. 2011), and stress response (Kino et al. 2010).

LncRNAs with phenotypes

Although the physiological relevance of many of the reported lncRNAs has not been determined, many lncRNAs have been shown to possess important, visible phenotypes (Li and Chang 2014). In addition to Xist, required for dosage compensation, the Braveheart lncRNA has been shown to be required for lineage commitment in cardiomyocytes (Klattenhoff et al. 2013). FENDRR lncRNA is required for heart, lung, and gastrointestinal development (Sauvageau et al. 2013). Linc-brn1b is required for neocortex development (Sauvageau et al. 2013). The COOLAIR lncRNA is required in A. thaliana for cold-timed flowering (Swiezewski et al. 2009). Additionally, the NEAT1 lncRNA has the clear phenotype of being critical for paraspeckle formation (Naganuma et al. 2012; Nakagawa and Hirose 2012; Sasaki et al. 2009).

LncRNA–protein interactions

Many studies have been performed to determine the protein partners of lncRNAs and elucidate the functions of these RNA–protein interactions (Davidovich et al. 2015; Minajigi et al. 2015a). Lee and co-workers developed an RNA-centric proteomic method (iDRIP) to determine the Xist lncRNA interactome, showing cohesin repulsion and an RNA-directed chromosome conformation (Chu et al. 2021; Minajigi et al. 2015b). The group also identified lncRNAs associated with Polycomb repressive complex PRC2 using RIP-seq (Zhao et al. 2010). Carninci and co-workers developed a new technology to map genome-wide RNA–chromatin interactions in intact nuclei (RNA And DNA Interacting Complexes Ligated and sequenced, RADICL-seq) (Bonetti et al. 2020). This proximity ligation-based methodology identifies patterns of genome occupancy for different classes of transcripts (Bonetti et al. 2020).

2-D Structural studies of lncRNAs: LncRNA secondary structure studies using chemical probing

Genome-wide studies of secondary structure have revealed that lncRNAs are more structured than mRNAs, but less structured than ribosomal RNAs (Wan et al. 2014, 2013, 2012; Ouyang et al. 2013; Kertesz et al. 2010; Ding et al. 2014; Rouskin et al. 2014). Detailed secondary structure studies of complete, intact lncRNA systems show that some lncRNAs are hierarchically structured with sub-domains containing modular RNA secondary structure motifs (Novikova et al. 2012; Ilik et al. 2013; Somarowthu et al. 2015). Studies of Malat-1 and related lncRNAs show that the 3’-end forms a triple helix, protecting it from RNase degradation (Brown et al. 2014; Wilusz et al. 2012, 2008). Other studies have elucidated lncRNA–protein interactions, emphasizing the need for detailed structural studies and mechanistic studies at the molecular and atomistic level (Chu et al. 2015; Spitale et al. 2015).

LncRNAs tend to have relatively low sequence identity and are often described as non-conserved. Some non-coding RNAs (miRNAs and rRNAs) have very high sequence identity (> 78% in nucleic acid sequence identity) (Griffiths-Jones et al. 2003). In contrast, many other important classes of non-coding RNAs have relatively low sequence identity (nucleic acid sequence identity of ~ 50%-65%), but secondary structures that are conserved across thousands of sequences. For example, riboswitches, which regulate metabolism in bacteria, typically have sequence identities of only 50%–65%, but have secondary structures conserved across thousands of species (Griffiths-Jones et al. 2003). The U2 and U4 spliceosomal RNAs have sequence identities < 60% but secondary structures conserved for > 9000 sequences. The 5S ribosomal RNA has sequence identity of ~ 60% but secondary structure conserved over 229,000 sequences. The group I intron has decidedly low sequence identity (~ 36%) but structure conserved across 60,000 species (Griffiths-Jones et al. 2003).
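The contrast between sequence identity and secondary structure conservation can be made concrete with a small sketch. The Python below is illustrative only: the function names (percent_identity, base_pairs, fraction_pairs_supported) are hypothetical, the two sequences are invented toy examples rather than real riboswitch or lncRNA sequences, and "conserved structure" is simplified here to mean that every pair in a shared dot-bracket structure can be formed by a Watson–Crick or G–U pair.

```python
# Toy illustration (all names and sequences are hypothetical, not from the cited studies).
CANONICAL_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def base_pairs(dot_bracket: str) -> list:
    """Return (i, j) index pairs encoded by a pseudoknot-free dot-bracket string."""
    stack, pairs = [], []
    for i, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket string")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced dot-bracket string")
    return pairs


def fraction_pairs_supported(sequence: str, dot_bracket: str) -> float:
    """Fraction of pairs in the structure that the sequence can form canonically."""
    if len(sequence) != len(dot_bracket):
        raise ValueError("sequence and structure must have equal length")
    pairs = base_pairs(dot_bracket)
    if not pairs:
        return 0.0
    seq = sequence.upper()
    supported = sum((seq[i], seq[j]) in CANONICAL_PAIRS for i, j in pairs)
    return supported / len(pairs)


# Two invented hairpins that share one structure but differ in sequence.
structure = "((((....))))"
seq_a = "GGCAGAAAUGCC"
seq_b = "AUGCUUCGGCAU"
print(percent_identity(seq_a, seq_b))
print(fraction_pairs_supported(seq_a, structure), fraction_pairs_supported(seq_b, structure))
```

In this toy case both sequences support every pair of the shared hairpin even though their sequence identity is very low, which is the kind of pattern a sequence-only search would miss.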

RNAs with low sequence identity are difficult to find using conventional search algorithms such as BLAST. However, knowledge of secondary structure dramatically enhances the search success. A wide variety of computational techniques exist to predict RNA secondary structure, using free-energy estimates, multiple sequence alignment and direct coupling analysis, machine learning, or a combination of these (Yao et al. 2017; Dallaire and Major 2016; Parisien and Major 2008; Mathews 2019; Spasic et al. 2018; Tan et al. 2017; Eggenhofer et al. 2016; Lorenz et al. 2016a, 2016b; Pucci et al. 2020). These can be highly effective for a range of RNAs. With the growing number of possibilities for long-range interactions, pseudoknots, and multiway junctions, the number of potential RNA secondary structure folds grows exponentially with sequence length, making the task of predicting long non-coding RNA secondary structure formidable. In many RNA systems, in vitro chemical probing experiments have produced highly accurate secondary structures, subsequently verified by X-ray crystallography. In the case of riboswitches, RNA secondary structures were determined experimentally for a single species using in vitro chemical probing of the RNA in cell-free reconstituted systems (Regulski and Breaker 2008; Winkler et al. 2002, 2004; Mandal et al. 2003, 2004; Sudarsan et al. 2006, 2008; Cheah et al. 2007). Next, this structure was used as a fingerprint to find the structure in thousands of other species, despite the low sequence identity (Weinberg et al. 2007). These secondary structures determined from cell-free systems by chemical probing were verified by X-ray crystallography (Montange and Batey 2008, 2006; Batey et al. 2004; Gilbert et al. 2008; Stoddard et al. 2010).
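To give a feel for how quickly the space of possible folds grows with length, the following sketch counts pseudoknot-free secondary structures using a standard combinatorial recurrence. It is a simplified illustration, not a method from the studies cited above: it assumes any two nucleotides may pair, enforces only a minimum hairpin loop size (the min_loop parameter, chosen here as 3), and ignores pseudoknots, which would increase the count further. The function name count_structures is hypothetical.

```python
def count_structures(length: int, min_loop: int = 3) -> int:
    """Count pseudoknot-free secondary structures on `length` nucleotides.

    Assumptions (illustrative only): every pair of nucleotides can base pair,
    and a pair must enclose at least `min_loop` nucleotides.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    counts = [1] * (length + 1)  # counts[n] = number of structures on n nucleotides
    for n in range(1, length + 1):
        # Case 1: the last nucleotide is unpaired.
        total = counts[n - 1]
        # Case 2: the last nucleotide pairs with position k (0-based), splitting the
        # chain into k nucleotides to the left and n - 2 - k enclosed nucleotides.
        for k in range(0, n - 1 - min_loop):
            total += counts[k] * counts[n - 2 - k]
        counts[n] = total
    return counts[length]


for n in (10, 50, 100, 200, 1000):
    digits = len(str(count_structures(n)))
    print(f"{n:5d} nt: a {digits}-digit number of possible structures")
```

Python integers have arbitrary precision, so the counts can be evaluated directly even for kilobase lengths; the number of digits increases roughly linearly with length, which is the signature of exponential growth.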

To determine the RNA secondary structure of lncRNA molecules, strategies similar to those used to determine the original 16S rRNA secondary structure (Woese et al. 1980; Noller et al. 1981; Noller and Woese 1981) and the riboswitches (Winkler et al. 2003) have been employed. Chemical probing experiments determine nucleotides that are highly mobile and likely to reside in looping regions, as well as those nucleotides with low mobility, likely to participate in Watson–Crick base pairs. To cope with the large RNA size, 3S (Shot-Gun Secondary Structure) can be used, which probes the entire RNA first and then probes shorter segments of the RNA in successive rounds of probing (Novikova et al. 2012, 2013b). By matching signals of short segments with full RNA experiments, modular sub-domains are identified, for which a secondary structure is often readily discernable. The resulting secondary structure can be used to improve existing phylogenetic sequence alignments and, in principle, can be used to find instances of the lncRNA not previously found in other species (Hawkes et al. 2016).
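The matching step described above, in which the probing signal of a short segment is compared with the signal for the same nucleotides in the full-length RNA, can be sketched as a simple profile comparison. This is a hedged illustration rather than the published 3S analysis: the use of a Pearson correlation, the 0.8 threshold, the function names, and the reactivity values are all assumptions introduced here for the example.

```python
from math import sqrt


def pearson(x, y) -> float:
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no shape information
    return cov / sqrt(var_x * var_y)


def segment_matches_full(full_profile, segment_profile, start, threshold=0.8) -> bool:
    """True if a segment probed in isolation reproduces the full-length signal.

    `start` is the 0-based position of the segment within the full-length RNA.
    The threshold is an arbitrary illustrative cutoff, not a published value.
    """
    window = full_profile[start:start + len(segment_profile)]
    if len(window) != len(segment_profile):
        raise ValueError("segment extends past the end of the full-length profile")
    return pearson(window, segment_profile) >= threshold


# Invented per-nucleotide reactivities (high = flexible, low = likely paired).
full_length = [0.10, 0.90, 0.80, 0.10, 0.05, 0.70, 1.20, 0.90, 0.10, 0.00]
segment_a = [0.12, 0.85, 0.82, 0.09, 0.06]  # probed alone, covers positions 0-4
segment_b = [0.90, 0.10, 0.10, 0.80, 0.90]  # probed alone, covers positions 5-9

print(segment_matches_full(full_length, segment_a, start=0))
print(segment_matches_full(full_length, segment_b, start=5))
```

Under these assumptions, a segment whose isolated profile tracks the full-length profile is a candidate modular sub-domain, whereas a segment whose profile changes when excised likely depends on contacts with the rest of the RNA.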

An interesting case is the 873 nt steroid receptor RNA activator lncRNA in humans (SRA-1). This lncRNA co-activates the hormone response in human T-47D cells and co-immunoprecipitates with a large number of important proteins, including several hormone receptors (estrogen receptor, progesterone receptor, androgen receptor, glucocorticoid receptor, and thyroid receptor) (Yao et al. 2010; Xu et al. 2009; Colley et al. 2008; Huet et al. 2014). Binding assays in in vitro cell-free reconstituted systems have shown strong binding to the pseudouridinylase Pus1p, estrogen receptor, thyroid receptor, the sex reversal factor DAX-1, and the epigenetic factor SHARP. While the primary function of SRA-1 is to co-activate the hormone response, a speculated secondary function involving the binding of SRA-1 to its cognate protein SRAP has recently been shown not to occur (SRA-1 does not bind to SRAP) (McKay et al. 2014).

A previous study demonstrated that SRA-1 contains four modular secondary structure sub-domains, each containing multiple secondary structure motifs. The secondary structure was consistent with four different probing techniques (SHAPE, DMS, in-line, and RNase V1). Binding studies have shown that SHARP binds to the helix 12/helix 13 (H12/13) domain (Arieti et al. 2014).

Because the probing signal in vivo may be obscured by multiple proteins binding to the RNA (Davidovich et al. 2013, 2015), in vitro studies establish an important ab initio structure. Among high-resolution 3-D structures, there are few known cases where an in vitro structure of an intact, individual RNA has been shown to differ from its corresponding in vivo structure. For example, the vast majority of crystallographic structures of RNAs, which are determined in vitro, have either (i) been validated in vivo or (ii) not been disproven in vivo. In the case of riboswitch RNAs, crystallographic data strongly support the initial secondary structures determined by the chemical probing techniques discussed above.

On the whole, determination of the precise and detailed secondary structure of lncRNAs allows classification into (i) highly structured RNAs with sub-domains and complex structural motifs, such as multiway junctions; (ii) loosely structured RNAs with multiple stem-loops, but lacking hierarchical domain structure and complex motifs; and (iii) unstructured, disordered RNAs, which lack secondary structure.

3-D studies of long non-coding RNAs at low resolution

Studies of tertiary interactions in long non-coding RNAs. Pyle and co-workers used UV crosslinking to identify individual tertiary interactions in lncRNA systems (Liu et al. 2017).

Small-angle X-ray scattering (SAXS). Small-angle X-ray scattering studies have been used to characterize the 3-D structure of RNA systems that are too flexible to be studied with X-ray crystallography. Often, RNA molecules sample a multitude of conformations. SAXS can characterize the distribution of configurations sampled. In addition, SAXS can be a first step toward higher-resolution structure determination, as the requirements for sample preparation are much less stringent than for X-ray crystallography or for higher-resolution cryo-EM. Recently, low-resolution structures of the Braveheart lncRNA and Braveheart-CNBP ribonucleoprotein complex were determined using SAXS (Kim et al. 2020). The structures were consistent with 2-D secondary structures determined via chemical probing, with secondary structure domains fairly well-separated in 3-D physical space. The molecule was found to be somewhat flexible: multiple all-atom 3-D configurations were consistent with the 3-D volume reconstructions derived from the SAXS data. However, the SAXS data demonstrated compaction upon Mg2+ titration, which is clear evidence of well-defined tertiary structures in the RNA system. This is similar to riboswitch systems, which still sample well-defined 3-D structures, even in their ligand-free states, known to be extended and flexible. Additionally, Braveheart underwent significant reorganization upon protein binding, as evidenced by the substantial change in scattering profiles and corresponding 3-D volume reconstructions as a result of CNBP binding.
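Compaction of the kind described above is commonly quantified by the radius of gyration (Rg), which can be estimated from the low-angle part of a scattering curve with the Guinier approximation, ln I(q) ≈ ln I(0) − (Rg²/3) q². The sketch below fits this straight line by least squares. It is a generic illustration and not the analysis used in the cited study: the function name guinier_fit is hypothetical, and the two curves are synthetic, generated from assumed Rg values purely to show the comparison.

```python
from math import exp, log, sqrt


def guinier_fit(q, intensity):
    """Estimate (Rg, I0) by a linear least-squares fit of ln I against q^2.

    Valid only in the Guinier region (roughly q * Rg below about 1.3);
    the caller is assumed to have already restricted the data to that range.
    """
    if len(q) != len(intensity) or len(q) < 2:
        raise ValueError("need at least two matching (q, I) points")
    x = [qi * qi for qi in q]
    y = [log(i) for i in intensity]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("non-negative slope: data are not in a Guinier regime")
    intercept = mean_y - slope * mean_x
    return sqrt(-3.0 * slope), exp(intercept)


def synthetic_curve(q, rg, i0=1.0):
    """Ideal Guinier intensities for an assumed Rg (synthetic data, not measurements)."""
    return [i0 * exp(-(qi * rg) ** 2 / 3.0) for qi in q]


q_values = [0.005 + 0.002 * i for i in range(10)]  # inverse angstroms, hypothetical
rg_low_mg, _ = guinier_fit(q_values, synthetic_curve(q_values, rg=55.0))
rg_high_mg, _ = guinier_fit(q_values, synthetic_curve(q_values, rg=45.0))
print(f"Rg without added Mg2+ (synthetic): {rg_low_mg:.1f} A")
print(f"Rg with added Mg2+ (synthetic):    {rg_high_mg:.1f} A")
```

A decrease in the fitted Rg across a titration series is one simple readout of compaction; real data would additionally require buffer subtraction, error weighting, and checks for aggregation.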

Atomic force microscopy (AFM) studies of lncRNAs. AFM has been used to characterize the 3-D structure of lncRNA systems without solution. In these experiments, MEG3 displayed tertiary structure consistent with 2-D secondary structures determined by chemical probing (Uroda et al. 2019). Bachelet and co-workers used fast AFM scanning to quantify the motion of HOTAIR lncRNA, describing the anatomy and intrinsic properties of HOTAIR (Spokoini-Stern et al. 2020).

Fluorescence correlation spectroscopy (FCS). FCS has been used to characterize the size of lncRNA systems in 3-D, in terms of extended vs. compact. In one FCS study, lncRNAs (e.g., HOTAIR) were found to be more compact than mRNA transcripts, but less compact than ribosomes (Borodavka et al. 2016).

Expansion of structural tools to study long noncoding RNAs at high resolution

High-resolution structural studies of lncRNA systems will undoubtedly reveal new information about their mechanisms. As early studies present evidence for tertiary contacts, at minimum, cryo-EM studies of lncRNAs may reveal structured tertiary motifs surrounded by flexible regions or large swaths of RNA. At the other extreme, these studies may uncover highly structured ribonucleoprotein complexes, or even structured RNA-only systems. The past decade of lncRNA research has clearly shown that lncRNAs represent a highly diverse class of RNAs with a wide range of functional roles. Thus, a wide range of structural content may be observed, ranging from highly dynamic to highly structured. Higher-resolution structural studies will be able to shed light on structure–function relationships, in terms of specific protein binding partners, RNA binding partners, DNA binding partners, conformational changes, and roles in pathways. These studies may also offer insight into the evolution of lncRNAs. Since lncRNAs often have fairly low sequence identity, structure–function studies will enable analysis of conservation in terms of more general measures, such as 2-D structure, 3-D structural RNA motifs, 3-D RNA–protein binding motifs, RNA dynamics, and RNA function (Hezroni et al. 2015; Ulitsky 2016).