Introduction

Expansion of (CTG)n•(CAG)n trinucleotide repeat (TNR) sequences at distinct chromosomal loci is the mutation common to multiple neurological diseases including myotonic dystrophy type 1 (DM1), Huntington disease (HD), Huntington disease-like 2 (HDL2), dentatorubral-pallidoluysian atrophy (DRPLA), spinal and bulbar muscular atrophy (SBMA), and several forms of spinocerebellar ataxia (SCA). The polyglutamine diseases HD, DRPLA, SBMA, and SCA1, 3, 6, 7, 17 result from increases of (CAG)n repeats in the coding (nontemplate) strand for mRNA synthesis of the cognate genes ((CAG)n in RNA) to produce mutant polyglutamine proteins with toxic gain-of-function [1]. In contrast, (CTG)n•(CAG)n expansion at the DMPK 3' UTR alters the chromatin structure of the region, downregulates transcription of the locus and, like expansion at the JPH3 gene, produces poly-(CUG) pre-mRNAs (in DM1 and HDL2 patients, respectively) that sequester the MBNL (CUG) binding proteins, leading to trans-dominant interference with the normal splicing of multiple RNAs. Finally, bidirectional transcription at the SCA8 locus can result in expression of both a polyglutamine protein and a (CUG)n expansion transcript, which may represent a toxic gain-of-function at both the protein and RNA levels.

Trinucleotide repeat expansion requires DNA synthesis, either during DNA replication or repair. The effects of replication origin proximity, replication polarity, and replication inhibition support replication-based models of TNR instability in mitotic cells [2-9]. Hairpin formation by DNA polymerase slippage is a likely mechanism for changes in TNR repeat length [10-12]. Hairpin structure formation by DNA polymerase slippage at (CTG)n•(CAG)n sequences has been well documented in vitro [13, 14] and can result in either insertion or deletion mutations. However, hairpins have also been postulated to arise during replication fork reversal and postreplication repair [2, 15, 16], Okazaki fragment maturation [17-19], base excision repair [20], nucleotide excision repair [21-26] or repair of structures induced by R-loop formation during transcription [25, 27]. Current models of (CTG)n•(CAG)n instability during replication or repair envision that hairpin formation on the newly synthesized DNA strand leads to TNR expansion if the hairpin is sufficiently long-lived to serve as template in a subsequent round of replication. Conversely, stable hairpin formation in the leading or lagging template strand would lead to contraction of the repeat in the next round of replication (Figure 1).

Figure 1

Hairpin-induced trinucleotide repeat instability. The TNR is indicated by gray lines, flanking DNA by black lines. (a) Nascent-strand hairpin formation results in over-replication of a segment of the TNR in one chromatid. A second round of replication of the hairpin strand fixes the expanded allele in the genome. (b) Template-strand hairpin formation results in under-replication of a segment of the TNR in one chromatid. A second round of replication of the nonhairpin strand fixes the contracted allele in the genome. (Reprinted from [7] with permission)
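The two outcomes in Figure 1 can be restated as simple repeat-count arithmetic. The sketch below is only an illustration of the model as described in the caption, not a simulation of replication: the function name, its arguments, and the assumption that every repeat in the hairpin is gained (nascent strand) or lost (template strand) are ours.

```python
# Illustrative arithmetic only. The function name and parameters are
# hypothetical and do not come from any cited study.

def fixed_allele_length(tract_repeats: int, hairpin_repeats: int,
                        hairpin_strand: str) -> int:
    """Repeat count of the allele fixed after the second round of replication.

    hairpin_strand is "nascent" (Figure 1a, expansion) or
    "template" (Figure 1b, contraction).
    """
    if tract_repeats < 0 or not 0 <= hairpin_repeats <= tract_repeats:
        raise ValueError("hairpin must fit within the repeat tract")
    if hairpin_strand == "nascent":
        # Assumed: the hairpin segment is synthesized again (over-replication).
        return tract_repeats + hairpin_repeats
    if hairpin_strand == "template":
        # Assumed: the hairpin segment is bypassed (under-replication).
        return tract_repeats - hairpin_repeats
    raise ValueError("hairpin_strand must be 'nascent' or 'template'")


if __name__ == "__main__":
    print(fixed_allele_length(25, 5, "nascent"))
    print(fixed_allele_length(25, 5, "template"))
```

Under these assumptions a 5-repeat hairpin in a 25-repeat tract gives a 30-repeat allele when it forms on the nascent strand and a 20-repeat allele when it forms on the template strand.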

The salient observation that TNR instability in humans and mice can occur in postmitotic cells argues that repair mechanisms, instead of replication origin-dependent mitotic DNA replication, are involved in TNR instability in these tissues [2, 5, 28-30]. In this vein, it has been proposed that the process of transcription stimulates TNR instability due to the formation of hairpin or other non-B DNA structures in the single stranded nontemplate DNA, or in the template strand upon RNA displacement. These structures may be targets for DNA repair processes such as transcription-coupled repair, nucleotide excision repair, mismatch repair, or double-stranded DNA break repair [24, 27, 31].

Following extensive linkage analysis in myotonic dystrophy families [32-34], in 1992 several laboratories reported that expansion of the (CTG)n•(CAG)n repeat region in the 3' untranslated region of the dystrophia myotonica protein kinase gene was highly correlated with the occurrence of congenital DM [35-37]. Strong correlations also exist between (CTG)n•(CAG)n repeat length and the occurrence of Huntington disease [38, 39], although second site modifier genes and epigenetic mechanisms play a significant role in the appearance of HD symptoms. In general, unaffected individuals display fewer than 30 (CTG)n•(CAG)n repeats at the DM1 or HD locus. Trinucleotide repeat (TNR) tracts in the range of 30-40 repeats are termed premutation alleles (DM1) or intermediate alleles of incomplete penetrance (HD), while TNRs of 42 or more repeats have been associated with complete penetrance of HD [40] and increased expansion frequency during intergenerational transfer or somatic development in DM1 families [2]. The phenomenon of 'genetic anticipation' is a hallmark of the (CTG)n•(CAG)n TNR instability disorders, in which an increase in the number of microsatellite repeats is correlated with an earlier age of onset and heightened severity of the disease in successive generations. Genetic anticipation reflects the bias towards expansion over contraction of long (CTG)n•(CAG)n tracts, and may be explained by the greater tendency of extended repeats to adopt non-B form DNA structures prone to progressive expansion.
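The repeat-length ranges quoted above can be summarized as a small lookup. This is a sketch of the ranges as stated in the preceding paragraph, not a diagnostic rule: the function name and category labels are hypothetical, and a tract of exactly 41 repeats falls between the quoted ranges and is reported as such rather than assigned to either class.

```python
# Sketch of the ranges quoted in the text; names and labels are hypothetical.

def classify_repeat_length(n_repeats: int) -> str:
    """Map a (CTG)n•(CAG)n repeat count at the DM1 or HD locus to the
    category given in the text."""
    if n_repeats < 0:
        raise ValueError("repeat count cannot be negative")
    if n_repeats < 30:
        return "typical of unaffected individuals"
    if n_repeats <= 40:
        return "premutation (DM1) or intermediate, incomplete penetrance (HD)"
    if n_repeats >= 42:
        return "associated with complete penetrance of HD"
    # Only n_repeats == 41 reaches this point.
    return "not covered by the ranges quoted"


if __name__ == "__main__":
    for n in (12, 35, 41, 95):
        print(n, classify_repeat_length(n))
```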

(CTG)n•(CAG)n expansion can have pathological effects on local chromatin structure and gene expression, as well as dominant negative effects on RNA metabolism and protein function [41-44]. This review will focus primarily on the structure and instability of (CTG)n•(CAG)n trinucleotide repeat sequences in eucaryotic cells. For further background on the metabolism of (CTG)n•(CAG)n sequences in bacterial cells, the reader is referred to several excellent research articles and reviews [5, 6, 45-51].

(CTG)n•(CAG)n hairpins in vitro

NMR, melting and chemical modification analyses confirm that both (CTG)n and (CAG)n oligonucleotides as short as 6-10 repeats can form stable hairpin structures with mismatched base pairs [21, 52-55]. Although short (CTG)n or (CAG)n hairpins represent stable structures, they rapidly convert to duplex DNA in the presence of the complementary oligonucleotide through loop-loop and stem-stem interactions without prior denaturation [52]. Considered with the greater thermodynamic stability of duplex vs. cruciform DNA, and the inhibition of cruciform formation by single base mismatches [56, 57], these observations suggest that unless otherwise stabilized, e.g. by protein binding or high negative superhelicity [2, 58], apposed hairpin structures formed in vivo by transcription or replication fork regression to "chicken foot" structures would resolve to duplex DNA, and disfavor TNR expansions. Notably, (CTG)25 or (CAG)25 hairpins form duplex DNA approximately 5-fold more slowly than (CTG)10 or (CAG)10 hairpins in the presence of their complementary strands, despite similar thermal stabilities of the hairpins [55]. Thus, longer hairpins may also have longer half-lives in vivo.

Compared to nonrepetitive palindromic sequences, which would require half of the palindrome to become single stranded prior to hairpin formation, the free energy required to nucleate short hairpin formation may be provided by negative superhelicity [2, 58]. The greater number of hairpin configurations in a repetitive TNR would be expected to increase the entropy of hairpin formation and decrease the free energy. Thus, in addition to the slower dissolution of longer hairpins by their complementary sequences, hairpin formation may occur more rapidly in longer repeats, competing against SSB binding.

To assess the mechanism of hairpin instability, Panigrahi et al. used in vitro replication of (CTG)79•(CAG)79 repeats driven by the SV40 T antigen (T-ag) helicase. The remaining enzymatic machinery of DNA synthesis is endogenous to the host cell. Plasmids in which the (CAG)n sequence was the lagging strand template showed an expansion bias, while plasmids containing the opposite TNR replication orientation ((CTG)n in the lagging strand template) displayed a preference for contraction [59]. To the extent that the replication fork driven by the strong T-ag helicase mimics the activity and interactions of the cellular Cdc45/MCM2-7/GINS replicative helicase [60, 61] with replication fork stabilizing proteins, the effect of TNR orientation relative to the replication origin on (CTG)n•(CAG)n stability implies that unstable TNR structures can be processed to expansions or contractions depending on DNA replication polarity.

The question of why a contraction bias is observed in rapidly dividing eucaryotic cells [16] was addressed by Delagoutte et al. using a primer extension model of (CTG)n•(CAG)n replication, in which replication by T4 DNA polymerase through short (CAG)n or (CTG)n TNRs was inhibited relative to polymerization through non-structure forming repeats. (CAG)n repeats blocked replication more efficiently than (CTG)n repeats, and this difference was eliminated by the addition of E. coli or T4 single strand binding (SSB) proteins [62]. Based on the preferential binding of SSB to lagging strand template DNA and the more efficient blockage of polymerization by the (CAG)n vs. (CTG)n template, the authors proposed a 'template-push' model in which the contraction bias of TNRs with (CTG)n in the lagging strand template is not the result of lagging strand structure formation while single stranded, but is the result of extrusion of the leading strand (CAG)n template and replication across the abasic bottom of the hairpin in order to maintain contact between DNA polymerase and replicative helicase [62]. Alternatively, transient release of the leading strand hairpin template from the stalled polymerase could allow hairpin slippage or migration in the 5' → 3' direction away from the replication fork [13, 63, 64], and reestablishment of a functional primer-template junction.

Annealing of single stranded plasmid DNA to complementary strands containing excess (CTG)n or (CAG)n sequences, which formed hairpins as large as 25 repeats, yielded products that showed accurate repair in human cell extracts [56, 65-68]. The requirement for PCNA, and the nick-dependence of accurate repair, suggested that mismatch repair (MMR) proteins that function during replication in vivo might play a role in hairpin resolution in vitro. Indeed, in one study, the repair of short (CTG)1-3 slip out structures was reported to increase with increasing concentrations of MutSβ (Msh2/Msh3) in cell extracts [69]. Nevertheless, in contrast to the apparent requirement for MMR proteins for (CTG)n•(CAG)n expansions in transgenic mice [70-72], MMR proteins, or the nucleotide excision repair (NER) protein XPG, were not essential for repair of longer (CTG)20-25 hairpins in cell extracts [66, 67, 69]. It remains possible that there are alternative pathways for hairpin removal. Additionally, the formation of stable hairpins in advance in these assays may have bypassed the contribution of MMR, NER or other pathways to hairpin repair.

Yeast models of (CTG)n•(CAG)n instability

Numerous genetic analyses have been performed in S. cerevisiae to characterize the effect of (CTG)n•(CAG)n sequences on DNA replication and chromosome fragility, and to identify proteins that affect (CTG)n•(CAG)n trinucleotide repeat stability. Between these studies there are some disparities that are likely due to differences in the repeat sequence ((CTG)n•(CAG)n, (GAA)n•(TTC)n, (CGG)n•(CCG)n), its environment (plasmid vs. chromosome; leading vs. lagging strand replication polarity), number of repeats in the microsatellite tract, genetic background, and the sensitivity of the assay to small changes in repeat length [73-82]. For example, the frequency of expansions was approximately 500-fold greater when (CTG)25 was in the lagging strand template than when (CAG)25 was in the lagging strand template [17]. While the frequencies of (CTG)25 or (CAG)25 expansions, and (CTG)50 or (CAG)50 contractions, were unaffected in msh2 mutants [17, 83], (CTG)13 expansion was stimulated by mutation of postreplication repair genes rad18 (hRAD18, binding partner of hUBE2A/B), rad5 (hSMARCA3), and PRR-specific alleles of pol30 (hPCNA) [74].

When analyzed by two-dimensional gel electrophoresis, (CTG)80•(CAG)80 sequences showed only modest effects on replication fork progress, irrespective of replication polarity. In contrast, (CGG)40•(CCG)40 repeats imposed strong blocks to fork progression [75]. Surprisingly, in an assay that used reversion to 5-fluoroorotic acid resistance (FOAR) to quantitate TNR expansions, comparable rates of repeat instability were found for (CGG)25 or (CTG)25 (lagging strand template) TNRs. In the presence of the rfc1-1 mutation, which blocks PCNA loading and lagging strand Okazaki fragment synthesis, the expansion rates of (CGG)25 and (CTG)25 increased ~40-50 fold and ~2-3 fold respectively. One interpretation of this result is that inhibition of lagging strand synthesis can promote expansions in the leading strand nascent DNA. However, since (CGG)n and (CTG)n repeats in the lagging strand template characteristically show a strong bias towards contraction, this assay may not have revealed the full relationship between replication stalling and TNR instability. In a similar assay, expansion of (CAG)25•(CTG)25 was increased ~100 fold when (CAG) was in the lagging vs. leading strand template, and a rad27Δ mutant in the Okazaki flap endonuclease (hFEN-1) enhanced (CTG)n•(CAG)n expansion an additional 100 fold, irrespective of replication orientation [84].

Bhattacharya and Lahue [85] reported that (CTG)13 (lagging strand template) expansion was markedly (~40 fold) increased in srs2 helicase mutants, while (CTG)25 expansion was increased ~5 fold in the same cells. These rates were minimally affected by mutation of the RecQ helicase sgs1 or either of the homologous recombination proteins rad51 or rad52, arguing against unequal sister chromatid exchange as a mechanism of expansion, consistent with the absence of exchange of markers flanking expanded alleles in human patients [86, 87]. These results differed from those of Kerrest et al., who reported that the fragility of yeast artificial chromosomes (YACs) containing longer (CTG)70•(CAG)70 TNRs, which are above the expansion threshold, increased significantly in sgs1Δ or srs2Δ helicase mutants. Deletion of the homologous recombination protein genes mitigated the effect of the srs2Δ mutation in either orientation, and decreased the effect of the sgs1Δ mutation in the (CTG)70 orientation, but exacerbated the effect of the sgs1Δ mutation in the (CAG)70 orientation [15]. A simple relationship between YAC fragility, TNR length and replication polarity was difficult to ascertain for these mutants, leading the authors to suggest that multiple pathways coexist that involve Srs2, Sgs1, Rad51 and Rad52 in the repair of replication fork damage due to hairpin forming sequences of different lengths and orientations.

In a screen for mutants that affected (CTG)n•(CAG)n instability, disruption of the replication fork stabilization complex protein genes Mrc1, Tof1 or Csm3 selectively enhanced contractions of a (CAG)20-URA3 (lagging strand template) reporter, independent of replication checkpoint and DNA damage checkpoint factors [76], and control experiments demonstrated that mutation of the same fork stabilization complex proteins did not affect the stability of a non-structure forming (CTA)n repeat. In contrast, mutation of the fork stabilization complex proteins or the DNA damage checkpoint proteins Ddc1, Rad9, Rad17, Mec1, Ddc2, Rad24, Mec3, Rad53 or Chk1 led to increased expansion of a (CAG)13-URA3 reporter. These results suggest that Mrc1, Tof1 and Csm3 may maintain TNR length through coupling of the DNA polymerase and replicative helicase to prevent the formation of hairpins, whereas the DNA damage checkpoint is involved in stabilization of the replisome after the formation of hairpin structures.

Assays using longer TNRs (85-155 repeats) in mrc1, rad9, mec1, ddc2, rad17, rad24, chk1, or rad53 mutant strains found elevated chromosome breakage due to expanded (CAG)n•(CTG)n tracts and increased instability (primarily contractions) of a (CAG)n (lagging strand template) reporter [77, 78]. The inherently greater instability of long (CTG)n•(CAG)n repeats in wild type strains may have masked the effects of some checkpoint mutants; nevertheless, these studies indicate that distinct protein complexes respond to forms of DNA replicative stress that differ in size or geometry, and underscore the correlation between noncanonical (CTG)n•(CAG)n structures, checkpoint activation and chromosome breakage [88, 89]. Indeed, Sundararajan et al. have recently shown that long (CTG)n•(CAG)n tracts induce chromosomal double strand breaks in yeast, and that the Mre11/Rad50/Xrs2 complex is necessary for blocking chromosome fragility and inhibiting (CAG)70 TNR instability (expansion and contraction) by both homologous recombination and NHEJ pathways [90].

The homologous recombination protein Rad52 was also required to protect the (CAG)70-URA3 reporter from length instability and chromosome breakage in the presence of mutations in the alternative clamp loader Ctf18-Dcc1-Ctf8-RFC (Ctf18-RFC) [91]. Ctf18-RFC was previously thought to promote PCNA loading and unloading during replication fork navigation through sister chromatid cohesion (SCC) complexes [92, 93]. Gellon et al. showed that Ctf18-RFC is required for TNR stability independent of its role in SCC, in parallel to a pathway involving the Mrc1 protein, which couples the leading strand polymerase ε and the replicative helicase at the replication fork and acts in signaling during the intra-S phase checkpoint and the DNA damage response [94-96]. On the replication fork lagging strand, Ctf4 collaborates with MCM10 to link DNA polymerase α to the MCM2-7 helicase [97-99]. Like the mrc1 mutant, a ctf4 deletion mutant is associated with chromosomal instability; ctf4 rad52 double mutants grow poorly and produce a high percentage of inviable cells [100], and ctf4 mrc1 mutants are inviable [101].

Taken together these studies in yeast suggest a cellular fail-safe strategy of overlapping pathways to (i) prevent the formation of stable hairpin structures by maintaining the rate of replisome movement and coupling of leading and lagging strand polymerases to the replicative helicase, (ii) restore hairpin structures to duplex DNA by repair helicases, and (iii) recruit postreplication repair machinery to excise hairpins.

Mouse models of (CTG)n•(CAG)n instability

Murine models of several trinucleotide expansion diseases including Huntington disease, myotonic dystrophy type 1, Fragile X syndrome, and Friedreich's ataxia have been generated by random integration of pathological length repeat tracts or knock-in at homologous genetic sites, and have reproduced many, though not all, phenotypes of the associated disease. These models typically show tissue-specific, expansion-biased patterns of instability similar to those in humans, including expansions in germ cells, early embryos and adults. Although intergenerational (CTG)n•(CAG)n expansion is typically smaller in transgenic mice than humans [16, 102], some recent studies have reported relatively large expansions during parent-to-offspring transmission [103, 104]. Among the cis-acting modulators of TNR instability in murine systems are the sequence and length of the TNR [105], the presence of human flanking DNA [27, 106-108], the chromosomal integration site [109, 110], chromatin structure [41, 43, 111-115], and replication polarity [4].

Relevant to studies of the relationship between DNA replication and (CTG)n•(CAG)n stability is a comparison of origin activity at the DMPK locus in human cells and transgenic mice [4]. In this work, two origins were mapped upstream and downstream of the DMPK (CTG)n•(CAG)n repeat in both control and DM1 human fibroblasts. Transgenic mice bearing a single copy of a ~45-kb genomic region of the expanded DM1 locus containing (CTG)>300•(CAG)>300 repeats showed high levels of intergenerational and somatic repeat instability [116]. The transcriptional activity of the DM1 locus and tissue-specific patterns of instability were similar to those of DM1 individuals. Unlike in humans, however, when origin activity (abundance of nascent DNA) was quantitated over the ~45-kb human DM1 transgene from pancreatic cells of mice bearing either >300 (DM328) or 20 (DM20) [117] repeats, neither the upstream nor the downstream origin was active in DM20 mice, and only the upstream origin was inactive in DM328 mice [4]. Thus, the nuclear environment of the transgene may also modulate its replication origin activity and downstream effects on TNR stability.

Conversely, pathological length (CTG)n•(CAG)n transgenes could induce local heterochromatinization and position effect variegation (PEV) upon integration. With overexpression of the heterochromatin organizing protein HP1β, PEV increased only in transgenes containing the TNRs [43]. These data suggest that the integration site and the transgene may each exert biological pressure for or against integration of a DNA fragment at a particular genomic site. While the influences of chromosome environment are manifold, cis-effects of the integration site on TNR instability are generally diminished with increasing length of the microsatellite repeat tract and the human flanking DNA [27, 107].

Transcription is a possible cis-acting modifier that could lead to tissue-specific TNR instability, although the constitutive expression of the associated disease genes in humans argues against this model, and no correlation was observed between instability and stable mRNA levels in DM1 [118], HD [119] or SCA7 ((CTG)92•(CAG)92) transgenic mice [115]. Nevertheless, secondary attributes of transcription, e.g. DNA supercoiling, histone modification, or DNA repair induced by the formation of RNA-DNA hybrid loops [25, 120, 121], may indirectly account for the activation of ATR or ATM pathways [122-124], and transcription-induced (CTG)n•(CAG)n contraction has been reported in non-murine systems [26, 125-129].

Specific trans-acting factors have been implicated in (CTG)n•(CAG)n instability based on crosses between transgenic TNR mice and mice defective in DNA mismatch repair or base excision repair [20, 71, 72, 102, 130, 131]. In bacteria, MMR relies primarily on three proteins: MutS, MutL and MutH [132]. The MutS dimer recognizes the mismatch and enlists a MutL dimer that then recruits the MutH endonuclease to initiate nick-directed repair. In eukaryotes there are at least six homologs to the MutS and MutL proteins [133, 134]. The major MutS homologue is MSH2, which can heterodimerize either with MSH6 to form MutSα, which binds to single base mismatches, or with MSH3 to form MutSβ, which recognizes short insertion/deletion loops. The PMS2/MLH1 (MutLα) heterodimer interacts with mismatches recognized by MutSα or MutSβ to trigger downstream excision and resynthesis reactions.

The murine mismatch repair genes MSH2 and MSH3 are essential for germinal and somatic expansion, and PMS2 is required for somatic instability, of long (> 84 repeat) (CTG)n•(CAG)n TNRs in transgenic mice [16, 70-72, 102, 116, 130, 131, 135, 136]. In vitro, MutSβ binding to (CAG)n hairpins nominally reduced its ATPase activity, suggesting that MutSβ might mask hairpin structures from repair [137, 138]. However, a later study reported that the ATPase activity of MutSβ bound to (CAG)n hairpins was similar to that of the enzyme bound to nonhairpin duplex DNA, and that (CAG)n hairpin binding of MutSβ did not change its catalytic efficiency (kcat/Km) [139]. Thus, the precise mechanism by which the MMR system is involved in TNR instability in vivo remains unresolved.

When transgenic mice carrying the HD (CAG)n repeat were crossed with mice lacking the base excision repair (BER) glycosylase OGG1 (7,8-dihydro-8-oxoguanine DNA glycosylase), which is responsible for the removal of 7,8-dihydro-8-oxoguanine (8-oxoG), the most common oxidized base in DNA, age-dependent somatic expansion was largely suppressed [20]. In an in vitro model of base excision repair, incision of 8-oxoG within a (CAG)n tract by APE1 and extension by the major BER polymerase, polβ, resulted in expansion of the repeat tract [20, 140], leading to the hypothesis of a toxic oxidation cycle in which hairpin loops form during long-patch repair of bases damaged by reactive oxygen species. The hairpin may be protected from the endonuclease activity of FEN1 by MutSβ [141], or FEN1 may promote the ligation of hairpin-containing flaps [140]. Repeated cycles of oxidation, repair and expansion would promote progressive age-dependent expansion. The presence of additional glycosylases that can act on 8-oxoG (and other oxidized bases) implies a unique function for OGG1 in (CAG)n expansion during BER, which is not yet understood [142]. Similar experiments in which DM328 mice ((CTG)>300•(CAG)>300 repeats) were crossed with mice deficient in Rad52, Rad54 or DNA-PKcs showed little effect on TNR stability, arguing that neither homologous recombination nor nonhomologous end joining is involved in expansion or contraction [135].

A difference between yeast and murine systems is the reported absence of an effect of Fen1 loss on TNR instability in DM1 knock-in transgenic mice [143]. Fen1 knockdown (80-90%) did not affect the stability of the HD (CAG)27•(CTG)27 repeats in human cells, but Fen1 haploinsufficiency did induce expansion of transgenic (CAG)120•(CTG)120 repeats [144]. Constitutively low levels of Fen1 have also been implicated in inducing (CAG)n•(CTG)n instability in the striatum vs. cerebellum of HD mice [145].

The similarity of tissue-specific patterns of somatic mosaicism in multiple mouse models also suggests the influence of additional tissue-specific factors affecting TNR stability [104, 110, 118, 119, 146]. Further, it has been proposed that sex-specific trans-acting factors are responsible for differences in intergenerational instability between murine and human systems [103]. A recent genomic study compared the phenotype of tissue-specific patterns of (CAG)n•(CTG)n instability of HdhQ111 (HD homolog) transgenic mice with microarray analysis of gene expression [147]. The collective expression signature of a group of 150 genes was highly correlated with tissue instability, although no single gene expression pattern was absolutely predictive of instability. Instability indices were highest in nondividing striatum (highly affected in HD) and liver cells, and lowest in testis and umbilical cord. Comparison of HdhQ111/111 and Hdh+/+ littermates showed that the instability of normal or mutant striata was significantly higher than the instability index of cerebellum. The authors concluded that mutant and wild type striata have similar tendencies towards TNR expansion, but the HD (CAG)n•(CTG)n microsatellite does not expand in normal striatum because it is not of sufficient length to be susceptible to additional processes involved in expansion. Possibly, transient or short-lived fluctuations in protein function or DNA structure occur frequently in specific tissues to increase the susceptibility of long TNRs to expansion.

In contrast to previous reports that MMR and BER proteins contribute to (CTG)n•(CAG)n expansion in mice, expression levels of DNA repair genes including MSH2, MSH3, and OGG1 did not correlate with the tissue specificity of somatic instability [147]. To address the caveat that steady state RNA levels may not reflect changes in protein abundance, the authors confirmed by immunoblot that Cbp and MSH2 protein levels were indistinguishable in Hdh+/+ and HdhQ111/+ mice. However, further inspection of the data revealed that 63 of 74 genes whose downregulation showed weak-to-medium range Pearson coefficient correlation to TNR instability are involved in DNA metabolism. The authors concluded that pathways including cell cycle, metabolism and neurotransmission act in combination to generate tissue-specific patterns of instability, and that multiple tissue factors reflect the level of somatic instability in different tissues. Components of any of these pathways may represent second site genetic modifiers that contribute to the tissue- and cell type-specific variation of (CTG)n•(CAG)n TNR instability observed in mice and humans.

Human models of (CTG)n•(CAG)n instability

During human and mouse development, DM1 tracts tend to expand in premeiotic spermatogonia, and large alleles subsequently contract during later stages of spermatogenesis and early in male development. In females, large expansions can be observed in nondividing oocytes, and full mutations are inherited almost exclusively from the mother. Thus, it has been proposed that two mechanisms of instability apply to (CTG)n•(CAG)n repeats: as in oocytes, expansions occur by DNA repair, while contractions characteristic of male development are the result of DNA replication [142]. In both male and female DM1 patients (and transgenic mouse models), (CTG)n•(CAG)n TNR tracts also show a significant level of somatic instability that increases with age in a tissue-specific manner. In possible support of a dual mechanism model for expansions and contractions, DM328 transgenic mice made deficient in DNA ligase I displayed reduced (CTG)n•(CAG)n instability upon maternal transmission, but showed no effect on paternal transmission or somatic instability [148]. Although studies such as these are valuable in identifying candidate genes affecting (CTG)n•(CAG)n instability, non-human model systems arguably do not recapitulate all aspects of microsatellite expansion disease in human cells due to differences in chromatin structure, cell division and DNA replication rates, and cell type. Hence, several investigators have turned to analyses of patient-derived cells, human embryonic stem (hES) cells [149-151], and other human cell model systems [7].

Recent PCR and immunoblot studies reported that MMR (MSH2, MSH3, MSH6) gene expression and protein levels of VUB03_DM1 and VUB19_DM1 hES cells were as high as in MMR proficient HeLa cells and stable during culture in the undifferentiated state, when (CTG)n•(CAG)n repeat length increased significantly [151, 152]. Following differentiation to osteoblast progenitor-like cells, sharp decreases in MSH2, MSH3 and MSH6 levels were correlated with stabilization of (CTG)n•(CAG)n repeat lengths of the VUB03_DM1 and VUB19_DM1 hES cells [152]. These results imply that either the reduction in MMR protein expression, the decrease in cell proliferation during hES cell differentiation, or other trans-acting factors, may be related to (CTG)n•(CAG)n TNR stabilization.

In a human fibrosarcoma model that scores contraction of an intronic (CTG)95•(CAG)95 repeat by cell survival under HPRT+ selection (HAT medium), ~25-fold induction of transcription of randomly integrated HPRT cassettes increased contraction ~15-fold, to roughly 0.001% of cells. Transcription-induced contraction frequencies accumulated at the same rate in proliferating and confluent cells that differed by 10-fold in rates of cell division. siRNA knockdown of proteins involved in mismatch repair (MSH2, MSH3), and transcription-coupled nucleotide excision repair (CSB, ERCC1, XPA, XPG, TFIIS, BRCA1, BARD1) decreased the frequency of contractions 2- to 3-fold [25, 26, 126, 153], indicating that these pathways likely play a role in TNR expansion in postmitotic cells. Supporting a role for NER in somatic alteration of (CAG)n•(CTG)n repeat length, tissue-specific decreases of (CAG)n•(CTG)n instability in the striatum, cerebral cortex and hippocampus of Xpa-/- SCA1 mice have recently been reported [154]. Nevertheless, a parsimonious mechanism for transcription-induced instability that addresses both supporting and conflicting evidence is not yet available [25].

The effect of DNA replication on the stability of (CTG)n•(CAG)n sequences has been studied in SV40 origin plasmids replicating in COS-1 cells [155] or T-ag supplemented HeLa cell extracts [59]. While these systems do not duplicate the chromatin structure of genomic DNA, and replication does not utilize components of the ORC-dependent replisome that interact with the cellular DNA damage signaling and repair machinery, (CTG)n•(CAG)n instability in these plasmids was sensitive to TNR length, leading/lagging strand replication polarity, and distance to the viral replication origin. Hence, strong evidence supports both replication-dependent and replication-independent mechanisms of (CTG)n•(CAG)n instability.

A pharmacological approach to reducing the length of (CTG)n•(CAG)n TNRs was used by Hashem et al. in lymphoblast cell lines derived from DM1 patients ((CTG)~770•(CAG)~770 repeats) [156]. Short term treatment with several DNA damaging drugs (ethylmethanesulfonate, mitomycin C, mitoxantrone, doxorubicin) led to the accumulation of smaller (CTG)n•(CAG)n repeat alleles in the cell population, often to fewer than 100 repeats. The rate of shift in the population profile indicated that the effects of the drugs were on the TNRs directly, rather than through mitotic selection. This result is significant given the tendency of the DM1 (CTG)n•(CAG)n repeats to expand rather than contract in patients and in culture. A similar study by Yang et al. showed that the replication inhibitors aphidicolin (which inhibits both leading and lagging strand DNA polymerases [157]) and emetine (which selectively blocks lagging strand Okazaki fragment synthesis [158]), but not mimosine (which induces DNA double strand breaks and arrests cells in late G1 phase [159, 160]), increased the rate of (CTG)n expansion in DM1 fibroblast cells. In these experiments only the expanded DM1 allele ((CTG)~220•(CAG)~220) was altered, leaving the normal allele, (CTG)12•(CAG)12, unaffected. Aphidicolin and emetine enhanced the magnitude of short expansions in almost 100% of cells approximately three-fold, while up to 25% of cells gained more than 120 repeats. Likewise, in kidney cells from Dmt-D transgenic mice carrying (CTG)160•(CAG)160 repeats, Gomes-Pereira and Monckton [161] showed that prolonged exposure to the nucleoside analog and chain elongation inhibitor cytosine arabinoside, the intercalating mutagen ethidium bromide, the DNA methylation inhibitor 5-azacytidine, and aspirin reduced the rate of repeat expansion, while exposure to caffeine, which uncouples DNA replication and repair from cell cycle checkpoints, increased the rate of expansion.

An alternative approach to the study of (CTG)n•(CAG)n instability was taken by Liu et al., who constructed a clonal HeLa cell line containing a single FLP recombinase target site into which (CTG)n•(CAG)n repeats of various lengths were integrated alongside the human c-myc replication origin in either replication orientation at the same ectopic chromosomal site [7]. In these HeLa/c-myc:(CTG)n•(CAG)n cell lines, the (CTG)n•(CAG)n tracts displayed time-, replication polarity-, and repeat length-dependence of instability. Moreover, treatment of these cells with emetine, FEN1 siRNA or low dose aphidicolin rapidly (< 10 population doublings) and efficiently induced instability of the premutation length (CTG)45•(CAG)45 and disease-related (CTG)102•(CAG)102 TNRs, but not normal length (CTG)12•(CAG)12 TNRs. For all three treatments (low dose aphidicolin, emetine, FEN1 siRNA) there was a bias towards contraction when (CTG)102 was in the lagging strand template, and towards expansion when (CAG)102 was in the lagging strand template. The presence of (CAG)n in the lagging strand template is the same replication polarity that has generated (CTG)n•(CAG)n expansions in all other model systems [5, 162]. Additional RNAi experiments using these cells (GL, ML, submitted) have confirmed the results of yeast studies in which mutation of Tof1 (human Timeless), Csm3 (human Tipin), or Mrc1 (human Claspin) dramatically increased similar patterns of (CTG)n•(CAG)n instability.

In general, the treatment of cultured human or transgenic mouse cells with DNA damaging drugs or replication inhibitors demonstrates that environmental agents can modulate (CTG)n•(CAG)n microsatellite instability, and that agents that cause acute DNA damage or repair are at least three orders of magnitude more efficient at inducing TNR instability than transcription-induced destabilization [153], although a unifying mechanism for explaining the observed changes has not emerged.

Towards clarifying the relationship between replication and expansion of the DMPK (CTG)n•(CAG)n TNR, Cleary et al. analyzed origin activity across the DMPK locus in age-, tissue- and sex-matched human control and DM1 fibroblasts [4]. These experiments revealed two replication origins, upstream and downstream of the DMPK (CTG)n•(CAG)n repeats in both control and DM1 cells. Our laboratory has independently confirmed the presence of origins upstream and downstream of the DMPK (CTG)n•(CAG)n repeats in matched DM1 and non-DM1 cells. The upstream origin coincides with that found by Cleary et al., while the downstream origin is approximately 2 kb closer to the TNR (GL, ML, submitted). Cleary et al. also mapped the activity of these origins in transgenic mice containing the ~45 kb DMPK locus from control (DM20, (CTG)~20•(CAG)~20) or DM1 (DM328, (CTG)>300•(CAG)>300) and found that only the origin downstream of the expanded (CTG)>300•(CAG)>300 TNR was active. It is thus possible that the downstream origin (which positions (CAG)n in the lagging strand template) is responsible for replication and expansion of the TNR in the transgenic DM328 cells. However, extension of this interpretation to human cells or other chromosomal environments is clouded by the observations that, in contrast to the transgenes, both upstream and downstream origins were equally active in human control and DM1 fibroblasts, while integration of the nonexpanded control (DM20) DMPK locus resulted in inactivation of both upstream and downstream origins in the transgenic mice.

As discussed above, (CTG)n•(CAG)n instability is believed to result from the formation of hairpins in template strand DNA leading to contractions, or in newly synthesized DNA leading to expansions. Nevertheless, direct proof of hairpin formation in vivo has been lacking. To test for the presence of hairpins in vivo, synthetic zinc finger proteins (ZFPs) were engineered that specifically recognize either the (CTG)n strand or (CAG)n strand of the DMPK TNR, and fused to the FokI nuclease catalytic domain. The resulting zinc finger nucleases (ZFNs) dimerize only after zinc finger binding to their respective DNA substrate, which activates the nuclease catalytic domains. As diagrammed in Figure 2, heterodimerization of ZFNCTG and ZFNCAG is required to cleave Watson-Crick duplex DNA. However, hairpin DNA presents the same sequence ((CTG)n or (CAG)n) on both legs of the stem, and can be cleaved by a ZFNCTG or ZFNCAG homodimer, respectively. Expression of the ZFNs in the HeLa/c-myc:(CTG)n•(CAG)n cell lines followed by PCR across the ectopic (CTG)n•(CAG)n TNRs demonstrated directly that hairpins form in vivo on both leading strand and lagging strand templates. Moreover, ZFN cleavage was inhibited in serum-deprived nondividing cells, implying that hairpin formation in this system is replication-dependent [7].

Figure 2

Predicted modes of ZFN binding. (a) Binding of a ZFNCTG and ZFNCAG heterodimer capable of cleaving heteroduplex DNA. FokI CD, FokI catalytic domain; ZFPGCT, (GCT)-recognition zinc finger protein; ZFPAGC, (AGC)-recognition zinc finger protein. (b) Predicted modes of ZFNCTG monomer binding to heteroduplex DNA (upper) or homodimeric ZFNCTG capable of cleaving (CTG) hairpin DNA (lower). (Reprinted from [7] with permission)
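The dimerization requirement described above amounts to a small truth table, which the following Python sketch makes explicit. It is an illustrative simplification rather than a model taken from [7]: the function names are hypothetical, and the only rule encoded is the assumption that cleavage requires a ZFN bound to each of the two arms of the substrate. Binding affinity, spacer length, and chromatin context are ignored.

```python
REPEATS = ("CTG", "CAG")


def arms_presented(structure, hairpin_strand=None):
    """Return the repeat sequence on each of the two arms a ZFN dimer must bind.

    structure      -- "duplex" or "hairpin"
    hairpin_strand -- "CTG" or "CAG"; required when structure is "hairpin"
    """
    if structure == "duplex":
        # Watson-Crick duplex: one strand is (CTG)n, the other is (CAG)n.
        return ("CTG", "CAG")
    if structure == "hairpin":
        if hairpin_strand not in REPEATS:
            raise ValueError("hairpin_strand must be 'CTG' or 'CAG'")
        # A hairpin folds one strand on itself, so both legs carry the same repeat.
        return (hairpin_strand, hairpin_strand)
    raise ValueError("structure must be 'duplex' or 'hairpin'")


def can_cleave(expressed_zfns, structure, hairpin_strand=None):
    """True if the expressed ZFNs can form an active dimer on the substrate.

    expressed_zfns -- set of repeats recognized by the ZFNs present,
                      e.g. {"CTG"} for ZFNCTG alone.
    """
    return all(arm in expressed_zfns
               for arm in arms_presented(structure, hairpin_strand))


# ZFNCTG alone: no cleavage of duplex DNA, but cleavage of a (CTG)n hairpin.
print(can_cleave({"CTG"}, "duplex"))           # False
print(can_cleave({"CTG"}, "hairpin", "CTG"))   # True
# Both ZFNs together: the heterodimer can cleave duplex DNA.
print(can_cleave({"CTG", "CAG"}, "duplex"))    # True
```

Under this rule, cleavage observed in cells expressing a single ZFN reports on hairpin DNA rather than duplex DNA, which is the basis for using single-ZFN expression as a structure-specific probe.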

Conclusions

(CTG)n and (CAG)n trinucleotide repeat sequences can form stable hairpins in vitro and in vivo; however, there is a facile transition of (CTG)n or (CAG)n hairpins to duplex in the presence of their complementary sequences in vitro. This suggests that other factors prolong the lifetime of (CTG)n and (CAG)n hairpins in vivo, among which may be MMR complexes, negative supercoiling behind replication or transcription forks, replication fork reversal, and protein, RNA, or leading strand binding of the hairpin complement. In HeLa/c-myc:(CTG)n•(CAG)n cells in culture, the rapid and efficient cleavage of hairpins in vivo by sequence- and structure-specific synthetic zinc finger nucleases, compared to the relatively extended time required before the appearance of expansions or contractions, raises another alternative, namely that hairpins are common but short-lived in vivo, and rarely result in TNR instability unless DNA replication or repair is perturbed. The efficient and accurate repair of preformed hairpins in cell extracts is consistent with this notion.

The instability of (CTG)n•(CAG)n repeats and the frequency of chromosome breakage are increased by mutations in yeast replisome proteins. These findings strengthen the link between replication fork instability, hairpin formation, the intra-S phase checkpoint, and DNA damage responses. The similar phenotypes of mutations in yeast replisome proteins and knockdown of orthologous human proteins suggest that evolutionarily conserved pathways operate to stabilize replication forks and maximize the integrity of replication.

Not surprisingly, cell cycle and checkpoint pathways appear to play a role in murine (CTG)n•(CAG)n stability. An outstanding difference between transgenic mouse systems and the human in vitro repair systems is the apparent contribution of the MMR proteins to instability in mice and the absence of their effect on in vitro repair. One possibility is that the preformation of stable hairpin substrates for in vitro repair may bypass an in vivo effect of chromatin structure, DNA metabolism, or MMR proteins.

Several fundamental questions concerning the mechanism of (CTG)n•(CAG)n instability remain to be addressed. For example, do contractions and expansions occur as consequences of the same process of replication, replication restart or postreplication repair? Do contractions and expansions occur in different phases of the mitotic cycle? Do contractions (or expansions) occur preferentially on the leading or lagging strand during replication? Are different pathways involved in the instability of various length TNRs? Does DNA damage promote hairpin formation? Which repair mechanisms are responsible for TNR instability in postmitotic cells? What is the mechanism of transcription-induced instability?

The use of yeast and transgenic mouse mutants, and of RNAi to produce human cells and cell extracts deficient in specific functions, promises to give insight into these questions, and thereby to reveal second site genetic modifiers of TNR instability that can be used in prognosis and therapy.