Introduction

Nine human hereditary neurodegenerative disorders, including Huntington's disease (HD) and several spinocerebellar ataxias (SCAs), are caused by the abnormal expansion of CAG repeats located in the translated sequences of different genes [1] (Fig. 1). For example, HD is triggered by the expression of at least 36 CAG repeats located in exon 1 of the HTT gene, which encodes the large and multifunctional huntingtin protein (Htt) [2]. SCA1 is caused by at least 39 CAG repeats present in exon 8 of the ATXN1 gene, which codes for the ataxin-1 protein involved in RNA metabolism and transcriptional regulation [3], and SCA3 (also known as Machado–Joseph disease) is induced by at least 60 CAG repeats that occur in exon 10 of the ATXN3 gene encoding a protein with deubiquitinating activity [4]. The CAG repeats are translated into polyglutamine tracts in protein products, and the entire group of resulting disorders is known as the polyglutamine (polyQ) diseases. Each polyQ disorder primarily affects a different population of neurons, although the causative genes are widely expressed in the central nervous system and peripheral tissues. The expanded repeats are thought to exert their pathogenic effects at the protein level, mainly through a gain of toxic function by the mutant protein [1]. For comparison, myotonic dystrophy type 1 (DM1), which is another example of a triplet repeat expansion disease (TRED), is caused by 50–3,000 copies of CTG tandem repeats located in the 3′ untranslated sequence of the dystrophia myotonica protein kinase gene (DMPK) [5]. In DM1, the gain of toxic function by mutant transcript containing CUG repeats is proposed to explain its pathogenesis [6, 7].
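The minimum repeat counts quoted above can be collected into a small lookup, which makes the comparison between diseases explicit. The following is a minimal sketch that uses only the thresholds stated in the text; the dictionary layout and the function name `is_in_pathogenic_range` are illustrative assumptions, and intermediate or reduced-penetrance alleles are not modeled.

```python
# Minimum pathogenic repeat counts as quoted in the paragraph above.
# The data structure and function name are illustrative assumptions.
PATHOGENIC_THRESHOLD = {
    "HD": 36,    # CAG repeats in exon 1 of HTT
    "SCA1": 39,  # CAG repeats in exon 8 of ATXN1
    "SCA3": 60,  # CAG repeats in exon 10 of ATXN3
    "DM1": 50,   # CTG repeats in the 3' untranslated sequence of DMPK
}


def is_in_pathogenic_range(disease: str, repeat_count: int) -> bool:
    """Return True if repeat_count reaches the threshold quoted for the disease."""
    return repeat_count >= PATHOGENIC_THRESHOLD[disease]


if __name__ == "__main__":
    for disease, count in [("HD", 35), ("HD", 36), ("SCA3", 45), ("DM1", 50)]:
        print(disease, count, is_in_pathogenic_range(disease, count))
```

Such a lookup is only a bookkeeping aid for the numbers cited here and is not a diagnostic criterion.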

Fig. 1 Comparison of the lengths of the CAG repeat tracts that occur in polyQ disease-related transcripts and the CUG repeats in the transcript responsible for DM1. The normal repeat range is marked in green, and the mutant repeat range is marked in red and specified. The repeat range marked in gray refers to unidentified or undefined tracts. The starting threshold for DM1 mutation (50 repeats) is denoted by a hatched line

The pathogenic number of CAG repeats in genes implicated in polyQ diseases is, in most cases, lower than the number of CUG repeats in the DMPK gene of DM1 patients, and CAG expansions typically span a much narrower range of repeat lengths (Fig. 1). Nevertheless, a considerable fraction of HD and SCA patients harbor mutant CAG repeats of a length corresponding to the lower range of CUG repeat lengths found in DM1. The expression levels of individual polyQ disease genes and the DMPK gene differ within and between tissues; however, they all belong to the moderate or low expression category [8, 9]. The structural features of the CAG repeats are similar to those of the CUG repeats: both form hairpin structures in transcripts when the repeat tracts are long enough [10–16]. The repeats of normal length form small and semistable hairpins, and mutant repeats form long hairpins that are more stable [17]. According to the results of crystallographic studies, the CAG repeat duplexes and CUG repeat duplexes show a high degree of similarity [18, 19], suggesting that the stem portion of these hairpins may share some protein-binding properties [20, 21].

In this review, we first discuss the main cellular and molecular hallmarks of the mutant protein and mutant RNA gain-of-function mechanisms from studies of polyQ diseases and DM1 pathogenesis. We then present recent findings indicating that transcripts containing expanded CAG repeats might also be toxic and contribute to the pathogenesis of polyQ disorders. Finally, we focus on the experimental models used to demonstrate RNA toxicity in polyQ diseases. We discuss the features of these models and propose generating new cellular and animal models dedicated to elucidating the toxic effects triggered by mutant RNA and protein in polyQ diseases.

Mechanisms of protein toxicity in polyQ diseases and RNA toxicity in DM1

PolyQ toxicity

The protein products of genes that undergo mutations leading to polyQ diseases differ in their cellular functions and cover a wide range of molecular weights [1]. A common feature of these proteins is the presence of a polyQ tract that is expanded as the result of the mutation. However, the exact nature of protein gain-of-function toxicity in polyQ diseases remains a subject of debate [22, 23], and two mechanisms have been considered: the gain of a new toxic function by the mutant protein and the enhancement of the normal protein function to toxic levels. In either case, the key to understanding the details of such pathogenic mechanisms is the identification and characterization of new toxic protein–protein interactions and imbalanced normal interactions. A systematic search for proteins associated with normal and expanded Htt has recently been performed using quantitative proteomic analysis [24]. The data showed that the proteins that preferentially interacted with mutant Htt are enriched for intrinsically disordered proteins and proteins involved in key cellular functions, such as energy production, protein trafficking, RNA modification and translation, mitochondrial functions, cellular stress, and cell death. These results confirmed and expanded earlier findings regarding the cellular processes and pathways that are altered in HD [25–27].

The pathogenesis of polyQ diseases is believed to be initiated by the expanded polyQ tract, which is thought to acquire a β-sheet conformation and cause the mutant protein to misfold [28]. Alternatively, the expanded polyQ stretch may stabilize one of several conformations characteristic of the wild-type protein that may normally exist in equilibrium [22]; such an effect enhances some intermolecular interactions of the mutant protein at the cost of other interactions. In addition to the polyQ tract, the context of other protein domains may also contribute to pathogenesis, as demonstrated in animal models of spinal–bulbar muscular atrophy [29], SCA1 [30], and HD [31]. In several instances (reviewed in [32–34]), single amino acid substitutions in functional domains of relevant proteins resulted in dramatic changes of disease phenotypes, pointing to the critical role of protein toxicity in polyQ diseases.

The cellular hallmarks of polyQ diseases are nuclear and cytoplasmic amyloid-like aggregates formed by mutant proteins, their proteolytic fragments [35, 36], or by fragments translated from aberrantly spliced shorter mRNAs [37] that sequester other proteins, predominantly those containing unstructured regions [24, 28, 38]. The formation of amyloid-like deposits is also observed in Parkinson's and Alzheimer's diseases, with which the protein toxicity of polyQ diseases is often compared. However, in efforts aimed at demonstrating RNA toxicity in polyQ diseases, reference to DM1 and other RNA-triggered TREDs, e.g., SCA8 [39] and HDL2 (Huntington's disease-like 2) [40, 41], is both appropriate and justified [20, 21, 42–44].

CUG repeat toxicity

DM1 is the prototype of TREDs caused by toxic RNA [6, 7]. Terms such as “toxic RNA” and “RNA toxicity” are used to reflect the fact that the disease-causing mutation is passed from the gene to the transcript only, while the protein structure is unaffected by the mutation. The hallmarks of cells expressing expanded CUG repeats are nuclear RNA foci [45], which are toxic to cells because they sequester important cellular proteins, reducing their normal, functional levels (reviewed in [46, 47]). One such protein trapped in mutant RNA foci is the muscleblind-like 1 (MBNL1) alternative splicing factor [48]; together with the concomitant upregulation of the antagonistic splicing factor CUG-BP1 [49], MBNL1 deficiency results in the misregulation of the developmentally regulated alternative splicing of numerous MBNL1-regulated genes. The compromised functions of some of these genes have been linked to the clinical symptoms of DM1, e.g., the altered splicing of insulin receptor in insulin resistance [50], chloride channel 1 in myotonia [51], sarcoplasmic/endoplasmic reticulum Ca2+ ATPase 1 and 2 in muscle wasting [52], cardiac troponin T in cardiac abnormalities [53], and Tau (MAPT) in cognitive deficits [54]. The binding of the multifunctional helicases p68 and p72 to mutant CUG repeats facilitates the binding of MBNL1 to the repeats [55] and likely links the DM1 pathogenesis pathway to the nuclear step of microRNA biogenesis, a process in which these helicases are involved as cofactors of the ribonuclease Drosha complex [56]. Direct evidence was provided for the crosstalk between the DM1 pathogenesis pathway and the cytoplasmic ribonuclease Dicer step of microRNA biogenesis, in which the MBNL1 protein is also implicated [57]. It was demonstrated that MBNL1 sequestered by mutant CUG repeats cannot effectively function in its newly recognized role as a pre-miRNA cleavage regulator, resulting in the compromised expression and function of some miRNAs, including miR-1, an effect linked to DM1-related heart defects. The expanded CUG repeats were also shown to trigger immune responses in DM1 tissue in which the PKR, OAS, TLR3, and RIG-I genes were activated [58].

Accumulating evidence for RNA toxicity in polyQ diseases

Although the contribution of RNA toxicity to the pathogenesis of polyQ diseases was previously postulated based on the structural similarities between the CAG and CUG repeats in transcripts [10, 17], the first compelling experimental evidence for such an involvement was provided in 2008 by Bonini and colleagues [59]. Using a Drosophila melanogaster model system, the researchers demonstrated that the expression of both translated and untranslated RNA molecules containing long hairpin-forming CAG repeat tracts causes neurodegeneration, whereas the expression of RNA containing CAA-interrupted CAG repeats resulted in considerably less pronounced neurodegenerative features. CAA triplets also encode glutamine but are unable to form a hairpin structure [11], and their occurrence within CAG repeat tracts [60, 61] destabilizes CAG repeat hairpins [15].
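The logic of the CAA-interruption control can be illustrated with a short sketch: a pure CAG tract and a CAA-interrupted tract of the same codon count encode an identical polyQ stretch but differ in how long their uninterrupted CAG runs are. The function names and the example sequences are hypothetical, and using the longest uninterrupted CAG run as a rough indicator of hairpin-forming potential is a simplifying assumption, not a structure prediction.

```python
# Both codons specify glutamine (Q) in the standard genetic code.
CODON_TO_AA = {"CAG": "Q", "CAA": "Q"}


def split_codons(rna: str) -> list:
    """Split an in-frame RNA tract into codons."""
    if len(rna) % 3 != 0:
        raise ValueError("tract length must be a multiple of 3")
    return [rna[i:i + 3] for i in range(0, len(rna), 3)]


def translate_tract(rna: str) -> str:
    """Translate a tract consisting only of CAG and CAA codons."""
    return "".join(CODON_TO_AA[codon] for codon in split_codons(rna))


def longest_pure_cag_run(rna: str) -> int:
    """Length, in codons, of the longest uninterrupted run of CAG codons."""
    best = run = 0
    for codon in split_codons(rna):
        run = run + 1 if codon == "CAG" else 0
        best = max(best, run)
    return best


if __name__ == "__main__":
    pure = "CAG" * 12                        # hypothetical pure tract
    interrupted = ("CAG" * 3 + "CAA") * 3    # hypothetical CAA-interrupted tract
    for name, tract in [("pure", pure), ("interrupted", interrupted)]:
        print(name, translate_tract(tract), longest_pure_cag_run(tract))
```

Because the encoded protein is the same in both cases, a phenotypic difference between two such constructs is attributed to the transcript rather than to the polyQ tract.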

Other studies that were performed in nonhuman model systems provided further evidence for the toxicity of CAG repeats. Hsiao and colleagues showed that in transgenic Caenorhabditis elegans, similar to CUG repeats, CAG repeats of toxic length formed nuclear foci that colocalized with CeMbl, which is the worm homolog of MBNL1 [62]. Worms expressing the toxic CAG repeats showed shortened lifespan and reduced motility rates, which is consistent with the phenotypes of worms expressing toxic CUG repeats. As shown by Pan and colleagues, the expression of 200 CAG repeats from the 3′UTR of EGFP was toxic in transgenic mouse muscle and testis tissues in which the expression of the transgene was directed [63]. Importantly, ribonuclear foci and abnormal phenotypes were observed in the absence of a polyQ product expression.

The formation of CAG repeat RNA foci that colocalize with MBNL1 protein was, for the first time, demonstrated by Cooper and colleagues through the expression of very long exogenous CAG repeats in monkey COSM6 cells [64]. A question subsequently addressed using human cell lines was whether fewer CAG repeats could also form nuclear foci and cause alternative splicing aberrations similar to those known for DM1. The results showed that these effects are triggered in both HeLa and neuroblastoma SK-N-MC cells transiently expressing mutant CAG repeats [65]. In addition, we demonstrated that the expanded CAG repeats present in HTT and ATXN3 transcripts also occur within intranuclear MBNL1-positive RNA foci in human HD and SCA3 fibroblasts. Although the focus formation was accompanied by the aberrant alternative splicing of several endogenous MBNL1-dependent transcripts, this only occurred in cells expressing approximately 70 CAG repeats and not a lower number of approximately 45 CAGs [65]. Very recently, Bates and colleagues have demonstrated the CAG repeat length-dependent aberrant splicing of HTT exon 1, which gives rise to a short polyadenylated exon 1-intron 1 mRNA that is translated into the toxic HTT exon 1 protein [37]. The aberrant splicing is triggered by the increased binding of the alternative splicing factor SRSF6 to expanded CAG repeats. Thus, it is likely that the pathogenesis of polyQ disorders involves, at least to some extent, the primary elements of the RNA gain-of-function mechanism that is well characterized in DM1. Consistent with the above nuclear effects, a protein that is responsible for the nuclear export of transcripts harboring expanded CAG repeats has been identified by Chan and colleagues as the small nuclear RNA U2-associated factor U2AF65 [66]. This factor binds directly to expanded CAG repeats and forms a complex with the nuclear export receptor NXF1, and its reduction in symptomatic tissues is linked to the accumulation of mutant RNA in the nucleus.

An RNAi mechanism has been shown to be involved in pathogenesis in several in vivo and cellular models of polyQ diseases. As demonstrated by Bonini and colleagues [67] and Richards and colleagues [44] in two independent studies, the co-expression of exogenous expanded CAG and CUG repeats in fly cells resulted in a neurodegenerative phenotype. The complementary repeats likely formed a double-stranded RNA that was cleaved by Dicer-2 into 21 nt CAG/CUG repeat siRNAs, which were active in silencing in an Ago2-dependent process. Margolis and colleagues [68] discovered a natural antisense transcript, HTTAS, at the HD repeat locus that contains the CUG repeat tract. This transcript was shown to regulate sense HTT transcript expression in a process dependent on the repeat length and partly dependent on Dicer and the RISC pathway. The involvement of an RNAi mechanism was also demonstrated by Marti and colleagues in human cell lines expressing mutant CAG repeats flanked by the HTT exon 1 sequence [69]. The mutant CAG repeats, both translated and untranslated, gave rise to a toxic small RNA species (sCAG) in a Dicer-dependent manner and caused a downstream silencing effect in an Ago2-dependent manner. Again, the cytotoxic effects were triggered only by CAG repeats and not by CAA repeats. The involvement of the miRNA biogenesis pathway in the pathogenesis of polyQ diseases is only circumstantial, based on the ability of the MBNL1 [16, 64, 65] and p68 [55] proteins to bind to both CUG repeats and, though less efficiently, CAG repeats.
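The complementarity that underlies the proposed double-stranded RNA formation can be checked directly: the reverse complement of a CAG repeat tract is a CUG repeat tract, and a 21 nt product corresponds to exactly seven triplets. The sketch below assumes perfect Watson-Crick pairing and ignores duplex-end structure and strand selection; the function name is illustrative.

```python
# Watson-Crick complements for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


if __name__ == "__main__":
    cag_21nt = "CAG" * 7                     # seven triplets = 21 nt
    print(len(cag_21nt))
    print(reverse_complement(cag_21nt) == "CUG" * 7)
```

This is why co-expressed CAG and CUG transcripts, or a sense transcript and a natural antisense transcript spanning the same repeat locus, can in principle base-pair over the entire repeat tract.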

A recent study by Chan and colleagues revealed that CAG repeat expansion induces nucleolar stress in polyglutamine diseases [70], demonstrating that the expanded CAG repeat RNA interacts directly with nucleolin and that this interaction triggers a number of aberrant downstream cellular processes, eventually leading to apoptosis. These downstream effects include hypermethylation of the upstream control element of the rRNA promoter and the inhibition of rRNA transcription. The reduced level of rRNA results in the accumulation of free ribosomal proteins, and the interaction of these proteins with MDM2 E3 ubiquitin ligase causes mitochondrial p53 accumulation. The interaction between p53 and the antiapoptotic protein Bcl-xL causes the oligomerization of the proapoptotic protein Bak at the mitochondrial membrane and allows for cytochrome c release to the cytosol, an event that activates the caspase cascade and induces apoptosis [70]. Collectively, several mechanisms of RNA toxicity have been revealed that are likely to participate in the pathogenesis of polyQ diseases; however, the entire scale of toxic RNA contribution to the disorders and the range of cellular processes and pathways altered by mutant transcripts remain unknown.

Experimental models used to trace RNA toxicity in polyQ diseases

Various genetic models have been used to demonstrate the first examples of CAG repeat RNA toxicity described in the previous section. The characteristics of these model systems and the main results obtained from their application are compiled in Table 1. It is apparent that this type of RNA toxicity has thus far been modeled in invertebrates (nematode worms and fruit flies) as well as in mammalian systems, including mice and monkey and human cell lines. However, a common issue in modeling human diseases in lower organisms is the relevance to human disease of the results obtained with simpler model systems. Indeed, relevance is hard to establish merely from a sufficient similarity of the genetic pathways operating in model organisms and humans. For polyQ diseases and TREDs in general, this task is more difficult, as these late-onset diseases only occur naturally in humans and not in shorter living animals in which the experimental induction of similar effects requires stronger triggers, i.e., longer repeat tracts and in some cases higher expression levels. In most studies, transgene expression was measured, but its level was not related to that of the endogenous gene (the homologue of the human polyQ gene) or of any other gene. Most authors related the expression of a single construct to the expression of other transgene variants. In the case of the mouse models R6/2, BACHD, and YAC128, the expression level of the transgene did not differ much from that of the homologous endogenous gene [71, 72]. For constructs containing CMV promoters that were tested in mammalian systems, the transgene expression level was reported to be very high [64, 65, 73]. The worm and fly systems satisfactorily fulfilled the requirement of genetic similarity, and both systems were capable of demonstrating some molecular and phenotypic hallmarks of RNA toxicity. Although more relevant and potentially more informative than the fly systems, transgenic mouse models have less frequently been used, and human cellular models have been used for this purpose only twice.

Table 1 Characteristics of model systems used for the investigation of RNA-mediated toxicity caused by expanded CAG repeats. The table is organized according to the species of model organism and includes human cell lines. Only mutant repeat tracts used in the studies are listed. Constructs containing untranslated repeat tracts are underlined. Note that some constructs contained either pure CAG or CAA repeats, and some contained other CAA-interrupted CAG repeats. The CAG repeats interrupted with CAA or pure CAA tracts are presented in bold

The genetic systems shown in Table 1 differ considerably with regard to their molecular design. Most models are transgenic and express exogenous expanded repeats that have the potential to trigger pathogenesis by a variety of mechanisms involving aberrant interactions with host proteins. Of importance in these systems is the intragenic localization of repeated sequences. In some experimental systems, the repeats are located within the open reading frame (ORF) and are typically transcribed and translated; the expressed mutant repeats are present within the sequence context of either the full-length human disease transcript or only a fragment of this transcript. Alternatively, the repeats are placed within the 3′UTR sequence of a heterologous gene and are only transcribed. The genes hosting mutant repeats are usually a marker gene, such as GFP or DsRed. In addition, the length and purity of the CAG repeat tracts are important factors in the design of a model system; the repeated sequence typically falls into the upper range of repeat lengths found in human subjects suffering from the disease and is often modified to contain CAA repeats.

The ideal model of a polyQ disease would be one that closely recapitulates the human disease with regard to most of its aspects [36]. Thus, human cell lines are well suited to study the details of the molecular mechanisms that are implicated in triggering pathogenesis. The rapid development of induced pluripotent stem (iPS) cell technology enables somatic cells to be transformed into iPS cells and then differentiated into neuronal cell lines [74]. Moreover, the advent of gene-editing technologies has begun to enable the precise engineering of human disease-related endogenous genes [75]. Both technologies may provide highly informative cellular models for future studies. The expression of mutant exogenous full-length cDNA using the well-established technologies of transgenesis may also be considered as an option for the construction of in vitro models of polyQ diseases. A good transgenic mouse model of polyQ disease should express the CAG repeat mutation within the sequence context of the entire human gene, with sufficient upstream and downstream regulatory sequences to ensure the proper spatial and temporal expression of the human gene at natural levels [72, 76]. In general, model systems expressing a full-length human mRNA, despite showing milder disease phenotypes, are better suited for studies aimed at discovering new mechanisms of RNA toxicity than systems expressing only shorter or longer mRNA fragments. In the full-length RNA models, the mutant transcript has a chance to go through all steps of the cellular journey of an mRNA, from its “birth” in the nucleus to its “death” in the cytoplasm; therefore, all steps altered by the mutation may be identified.

A serious difficulty in identifying the involvement of RNA toxicity in polyQ diseases is the overlap with the polyQ toxicity in models expressing CAG repeats from the ORF. Placing the repeat in the untranslated sequence of a heterologous gene is frequently used to overcome this difficulty and trace the effects of mutant RNA. However, preventing transcript translation by the mutation of the START codon in a polyQ disease gene or its cDNA resolves this problem while preserving the natural sequence context and expression level for the repeats. Based on the assumption that the non-AUG-initiated translation of repeated sequences [77] is not a very common phenomenon, the START codon mutation may indeed be considered a better solution. By mutating the START codon also in constructs expressing CAA repeats or CAA-interrupted CAG repeats, the four model types depicted in Fig. 2 may be generated. With these in vitro and in vivo models, critical experiments using both transcriptome-wide and more focused candidate approaches need to be performed to obtain clear-cut answers regarding the role of mutant transcripts in the pathogenesis of polyQ diseases.

Fig. 2 Four types of models that can be used to separate RNA toxicity and protein toxicity in the investigation of the pathogenesis of CAG repeat diseases. The depicted models express one of two types of repeated sequences, either CAG or CAA. Both sequences code for glutamine, but only the CAG repeat forms a hairpin structure in transcripts. In these types of models, both repeats are expressed in their translated or untranslated forms, depending on the absence or presence of an AUG START codon mutation. Potentially toxic entities, an expanded CAG repeat-containing transcript and a polyQ-containing protein, are marked in red; a transcript containing expanded CAA repeats, presumed to be nontoxic, is marked in blue
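The four model types form a two-by-two design (repeat sequence by START codon status), and the toxic entities expected in each can be enumerated as in the following sketch. It rests on two assumptions taken from the text: only CAG repeat transcripts are potentially toxic at the RNA level, and non-AUG-initiated translation is negligible, so a START codon mutation abolishes polyQ production. The function name and the entity labels are illustrative.

```python
from itertools import product


def expected_toxic_entities(repeat: str, start_codon_intact: bool) -> set:
    """Return the potentially toxic entities expected for one model type.

    Assumes that START codon mutation fully prevents translation
    (i.e., non-AUG-initiated translation is ignored).
    """
    entities = set()
    if repeat == "CAG":
        entities.add("RNA")            # hairpin-forming CAG repeat transcript
    if start_codon_intact:
        entities.add("polyQ protein")  # CAG and CAA both encode glutamine
    return entities


# Enumerate the four model types: repeat sequence x START codon status.
MODELS = {
    (repeat, intact): expected_toxic_entities(repeat, intact)
    for repeat, intact in product(("CAG", "CAA"), (True, False))
}

if __name__ == "__main__":
    for (repeat, intact), entities in MODELS.items():
        status = "translated" if intact else "untranslated"
        print(repeat, status, sorted(entities))
```

Comparing phenotypes across the four cells of this design separates effects attributable to the transcript alone, to the protein alone, and to both together, with the untranslated CAA model serving as the presumed nontoxic baseline.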

Concluding remarks

Research on RNA toxicity in polyQ diseases is still in its infancy. At this stage, there is a marked disproportion between the ability of a model organism to closely reproduce human disease and the frequency with which the model system has been used in the search for RNA toxicity hallmarks. Nevertheless, most of the genetic models provided in Table 1 have succeeded in demonstrating the first examples of RNA toxicity caused by expanded CAG repeats. Future studies along this line will require the more extensive use of dedicated mouse models and the next generation of human cellular models.

The studies performed thus far have revealed the involvement of several cellular processes and pathways in CAG repeat RNA toxicity. These altered processes include aberrant alternative splicing, transcript nuclear transport and export, RNA interference, and nucleolar stress resulting in apoptosis. All these processes are triggered by aberrant interactions between cellular proteins and mutant RNA. The proteins that participate in direct or indirect interactions with mutant CAG repeat transcripts need to be identified in further studies, and their role in pathogenesis needs to be elucidated. Such efforts may help to determine the contribution of RNA toxicity to the overall toxicity caused by both toxic factors: the protein and the RNA. This endeavor may also provide the basis to answer the question of whether the RNA and protein toxicities interact with each other to enhance pathogenesis or whether these toxic processes are parallel and independent. Another issue to be clarified is whether the RNA and protein toxicities require the same or different repeat length thresholds to become operative. The latter alternative is likely to be the case, and a higher number of repeats may be required to initiate the RNA-mediated toxic processes.

Finally, depending on the scale and repeat length thresholds of the various pathogenic mechanisms triggered by mutant RNA, these mechanisms may affect causative therapeutic approaches to polyQ diseases to a greater or lesser extent. Should RNA toxicity be proven to contribute significantly to disease development and progression, both the inhibition of mutant protein synthesis and the blockage or degradation of mutant transcript will be required.