1 Introduction

Huntington’s disease (HD) is caused by an expanded CAG repeat in exon 1 of the HTT gene. Wild-type alleles contain between 9 and 26 repeats, whereas disease-causing alleles contain at least 36 repeats. The longer the repeat, the earlier the onset of motor symptoms, although there is considerable variation. Intermediate alleles containing between 27 and 35 repeats do not cause disease but are a risk factor for further expansion into the pathogenic range in the subsequent generation. By definition, repeat expansion requires DNA synthesis. In dividing cells this may occur in the context of DNA replication or DNA repair, whereas in nondividing (“postmitotic”) cells such as neurons only DNA repair pathways can be implicated. There is increasing evidence from genetics, animal models, and cellular studies for the involvement of DNA repair in HD pathogenesis.
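The repeat-length ranges quoted above can be written as a simple lookup. The following is a minimal sketch for illustration only: the function name `classify_htt_allele` is hypothetical, and grouping counts below 9 with wild-type alleles is an assumption, since the text gives no category for them.

```python
def classify_htt_allele(cag_repeats: int) -> str:
    """Classify an HTT allele by CAG repeat count, using the ranges quoted in the text."""
    if cag_repeats < 0:
        raise ValueError("repeat count cannot be negative")
    if cag_repeats >= 36:
        # At least 36 repeats: disease-causing range.
        return "disease-causing"
    if cag_repeats >= 27:
        # 27 to 35 repeats: not disease-causing, but at risk of further expansion.
        return "intermediate"
    # 9 to 26 repeats is the wild-type range given in the text.
    # Assumption: counts below 9 are grouped with wild-type here.
    return "wild-type"


if __name__ == "__main__":
    for n in (17, 30, 36, 42):
        print(n, classify_htt_allele(n))
```

A classification of this kind says nothing about age at onset, which varies considerably even among alleles of the same length.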

1.1 Genetic Evidence for the Role of DNA Repair in HD

A recent genome-wide association study looking for single-nucleotide polymorphisms associated with particularly early or late motor onset of disease identified significant signals in or near DNA repair genes [1]. A pathway analysis of these data detected a strong signal in genes of the DNA damage response, implicating this pathway in modifying age at motor onset of disease. More specifically, signals were enriched for pathways involving mismatch repair genes, and these findings have since been corroborated by the association of a coding variant in MSH3 with HD progression ([1, 2]; see chapter by Stone and Holmans for more detail [3]).

In theory, DNA repair could be involved in HD either at the level of the CAG repeat in the HTT gene or downstream of HTT protein function. Looking across the spinocerebellar ataxias, which are caused by expanded CAG repeats in various genes, highlights the same genetic signal as in HD, suggesting that there is a common pathogenic mechanism acting on CAG repeats in DNA [4, 5]. It remains possible that DNA repair is also affected by mutant HTT protein, or that an expanded polyglutamine can have effects on DNA repair outside of its protein context [6], but most evidence points toward repair processes affecting CAG repeat stability in DNA.

1.2 CAG Repeat Instability in HD

DNA microsatellites consist of short (up to 5 base pair (bp)) sequences that are repeated in tandem multiple times at specific genomic loci. Microsatellites are prone to expansion or contraction of the number of repeated units that they contain, and this relatively high mutation rate has made them useful as markers in genetic linkage and forensic studies [7]. Repeat sequences that cause disease when expanded beyond a threshold length are a specific instance of microsatellite instability. Many of these diseases are neurological, for reasons that remain unclear, although microsatellite instability is also observed in neoplasia [8,9,10]. HD is one of a group of neurological diseases caused by CAG repeat expansion above a threshold length. Repeat expansion can occur in germ line and non-germ line (somatic) cells with different consequences. Intergenerational repeat expansion through the germ line underpins genetic anticipation, where disease onset occurs earlier in successive generations [11]. Even in diseases where it is hard to measure the length of the repeat, disease anticipation occurs, suggesting that germ line expansion of the repeat also occurs [12]. Sperm analysis of repeat lengths in HD shows wide variation with a trend toward expansion rather than contraction, with many mildly expanded alleles and a few very long expansions [13,14,15,16]. This is also seen in multiple mouse lines where expansion occurs over many generations, leading to much longer alleles than those originally in the founders [17]. In HD, as in most other CAG repeat diseases and in microsatellite transmission in general, germ line expansion is most marked on paternal transmission of the mutated allele, for unclear reasons [13, 18, 19].

Somatic expansion of repeats occurs in many different cell types, including both those that divide and those that are terminally differentiated (such as neurons), and is observed in many repeat expansion diseases [11, 20, 21]. In HD, mouse models and human postmortem brain analyses have shown that somatic CAG repeat expansion occurs over time in a tissue-dependent manner, and is inversely correlated with age at disease onset [22]. The marked somatic expansion observed in striatum has led to the hypothesis that repeat expansion in striatal medium spiny neurons drives HD pathogenesis. However, large expansions also occur in some nonneural tissues such as liver, which are not obviously involved in HD [21, 23]. Disease-associated somatic instability is modulated by the DNA damage response, and particularly by the mismatch repair system [9, 24, 25].

Therefore, by assaying repeat length and stability we can gain insight into the role of DNA repair in HD. Here we detail methods for CAG repeat sizing, and the cellular and mouse models that have been used to explore DNA repair in HD.
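To illustrate what assaying repeat stability in a bulk sample can mean in practice, the sketch below summarizes a list of per-molecule repeat counts relative to the most common (modal) length. It is not a published instability metric. The function name and output keys are hypothetical, the input is assumed to contain counts from the expanded allele only, and the modal length is assumed to approximate the inherited allele.

```python
from collections import Counter
from typing import Dict, Sequence


def summarize_repeat_lengths(lengths: Sequence[int]) -> Dict[str, float]:
    """Summarize per-molecule CAG repeat counts relative to the modal length."""
    if not lengths:
        raise ValueError("need at least one repeat length")
    counts = Counter(lengths)
    # Assumption: the modal length approximates the inherited allele.
    # Ties are broken by taking the shortest of the most frequent lengths.
    top = max(counts.values())
    modal = min(length for length, c in counts.items() if c == top)
    n = len(lengths)
    return {
        "modal_length": modal,
        # Fraction of molecules longer than the mode (putative expansions).
        "fraction_expanded": sum(1 for x in lengths if x > modal) / n,
        # Fraction of molecules shorter than the mode (putative contractions).
        "fraction_contracted": sum(1 for x in lengths if x < modal) / n,
        # Mean signed change in repeat units relative to the mode.
        "mean_change": sum(x - modal for x in lengths) / n,
    }


if __name__ == "__main__":
    # Hypothetical sample: mostly 42 repeats, with a few gains and one loss.
    sample = [42] * 6 + [43] * 2 + [45] + [41]
    print(summarize_repeat_lengths(sample))
```

Reporting the fractions alongside the mean matters because a handful of large expansions and many small ones can give the same average.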

1.3 Methods for Measuring CAG Repeat Length

Assessing the sequence and organization of DNA in repetitive genomic regions is difficult. Various methods have been developed over the last 25 years to probe the CAG repeat in HTT, and these are outlined in Table 1. The most straightforward methods utilize PCR amplification of the repeat, but there are inherent errors such as amplification bias toward smaller alleles, and “PCR stutter” arising from DNA synthesis in a repetitive region. Nonamplification methods such as genomic DNA digestion and Southern blotting avoid PCR bias but are laborious, require much more input DNA, and are semiquantitative at best. In addition, most methods assess repeats in bulk reactions, i.e., utilizing input DNA from thousands of different cells (from tissue or culture) at once. Large changes in repeat number can be observed, but rare alleles in individual cells will be missed and calculating repeat length averages is not informative for subtle effects. Input DNA dilution as in small-pool PCR can obviate these issues but is labor-intensive and prone to contamination. Next-generation sequencing (NGS) technologies using typical short reads (up to 100 bp) can be used to assess wild-type alleles with few CAG repeats, but alignment of alleles containing expanded repeats is usually difficult, if not impossible. New long-read NGS offers exciting opportunities to revolutionize repeat sequence and length assessment: by generating sequencing reads of up to 80 kbp, long repeats can be sequenced, sized, and phased in one reaction, and there is the potential to multiplex many samples to increase throughput [33]. Currently most of these NGS methods work on amplicons across repeats, but techniques where repeats can be sequenced without amplification are in development. Costs are high, but are likely to decrease rapidly as the new technology becomes more widespread and embedded into standard analyses.
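For sequencing reads that span the whole repeat, sizing can in principle be reduced to counting consecutive CAG units. The sketch below is a simplified illustration rather than any of the methods in Table 1: the function name is hypothetical, the read is assumed to be error-free and already oriented so that the repeat reads as CAG, and any interruption ends the run.

```python
import re

# One or more consecutive CAG units; the non-capturing group makes
# findall return each whole run rather than only its last unit.
CAG_RUN = re.compile(r"(?:CAG)+")


def longest_cag_tract(read: str) -> int:
    """Return the number of units in the longest uninterrupted CAG run in a read."""
    runs = CAG_RUN.findall(read.upper())
    # Each run is a multiple of 3 bases long, so integer division gives units.
    return max((len(run) // 3 for run in runs), default=0)


if __name__ == "__main__":
    # Hypothetical read: arbitrary flanks around 20 CAG units, followed by an
    # interrupting CAA and a single further CAG that is not counted in the run.
    read = "GCCTTC" + "CAG" * 20 + "CAACAG" + "CCGCCA"
    print(longest_cag_tract(read))
```

Real reads contain sequencing errors, so a practical tool would need to tolerate mismatches within the tract; exact matching as used here would split a long repeat at the first error and undercount it.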

Table 1 In vitro methods for measuring CAG repeat length in HD

2 Cellular Models to Assess CAG Repeat Stability In Vivo

All methods for accurately sizing CAG repeats require DNA extraction from cells or tissue and subsequent analysis (Table 1). Although these assays can give an estimate of repeat length at a specific time point, they do not provide any information about the intracellular dynamics of the repeat instability reactions. To assess these repeat dynamics various models have been developed in bacterial, yeast, mammalian, and cell-free systems to try to provide insight into CAG repeat instability (Table 2). Most of these models utilize the fact that longer CAG repeats can interfere with gene expression and/or splicing as the basis for a selectable reporter system. Frameshifting assays are not useful here as the loss or gain of trinucleotide units leaves the translational frame unaffected. The reporter system can be plasmid-based or, more usefully, integrated into the host cell genome. Most recently a GFP minigene reporter system has been developed with the ability to identify expansions and contractions in the same population of living cells—this could provide an amenable system for testing DNA repair variants isolated in genetic studies [49]. All of the assay systems described in Table 2 highlight the inherent instability of the CAG repeat, the length threshold of 30–40 repeats seen in many human diseases, and show that contractions and expansions can occur in many genomic contexts. The systems are much more sensitive than PCR methods for identifying rare repeat instability events (the cellular reporter systems act as biosensors to enrich for contractions/expansions) but often require PCR validation of results. DNA repair gene knockouts and variants have been assessed effectively in these systems [41, 49]. 
However, interpretation of results is limited by the uncertain relevance of the cell lines to human disease: cells may be nonhuman, they are usually dividing in culture, often immortalized, and the CAG repeat arrays are usually within an artificial reporter construct rather than a physiologically relevant context. Many long CAG repeats show a bias toward contraction in dividing cellular models, in contrast to HD neurons in vivo where expansions are favored. In addition, the time frame of cellular experiments is usually days to weeks, which is insignificant in the context of the human HD time course, which is measured in years.

Table 2 Cell-based models of CAG repeat instability

Primary and embryonic stem cells from mouse models of HD and human induced pluripotent stem cells from HD patients have also been cultured extensively and repeat lengths assessed [52,53,54]. These cells have the advantage of disease-relevant repeat contexts but require downstream in vitro repeat length analysis and are harder to manipulate genetically than the reporter systems above. In the future a reporter system that can be deployed in a multiwell plate design with automated readouts to show repeat instability in living cells, and ideally in cells derived from HD patients, would be the ultimate goal.

3 HD Mouse Models and DNA Repair

Many transgenic and knockin mouse models of HD have been generated since the discovery of the causative CAG expansion in HTT (reviewed in chapter by Bates [55]). These models recapitulate some of the characteristics of human disease and have been used to investigate the role of DNA repair processes in repeat stability and correlated phenotypes. Although very useful for assessing the roles of genes and mutations at the level of the whole organism, these models all have limitations in their representation of human HD. For example, mice have short lifespans and do not develop a disease phenotype unless genetically modified; most of the mice used have very long CAG repeats (>100) in order to drive a measurable phenotype in experimental time frames, whereas >90% of human HD patients have 40–50 CAG repeats. In addition, although many DNA repair genes are well conserved from mice to humans, the increased complexity of the DNA damage response in humans, and the need to respond to DNA damage accumulated in neurons over decades (compared with months in mice), imply that results from mice will not necessarily hold in humans. However, multiple lines of evidence have implicated DNA repair in CAG repeat instability and HD pathogenesis, and so many repair genes have been tested in mouse models.

Length- and age-dependent somatic CAG repeat expansion in striatal neurons of murine HD models correlates with worsening motor and behavioral phenotypes in the animals. However, it is extremely difficult to distinguish between pathology arising from the starting (long) repeat and that which may be related to somatic expansion of that repeat. Many studies have measured repeat stability in the context of DNA repair mutations to implicate DNA repair processes in HD (Table 3). The strongest evidence is for the involvement of the mismatch repair and base excision repair pathways: knockout of several genes in these pathways, in a range of HD models, prevents CAG repeat expansion in both germ line and striatum, and may be correlated with improved phenotype. There is some specificity of factors within these pathways: for example, Msh2 and Msh3 knockouts differ in somatic and germ line effects; Ogg1 and Neil1 affect CAG repeats, whereas other glycosylases (e.g., Mpg) do not. These findings suggest that there may be specific effects of certain repair enzymes on CAG repeats. One further difficulty with data interpretation arises because DNA repair mutant mice may have other deleterious phenotypes. For example, Msh2 null mice show methylation tolerance and predisposition to lymphomas [70]. Nevertheless, the weight of evidence suggests that DNA repair enzymes do affect CAG repeats in mouse models of HD, and may be involved in pathogenesis.

Table 3 Mouse models used to assess the role of DNA repair in HD

4 Discussion and Future Directions

There is now considerable evidence linking DNA repair with HD. Repair pathways are most likely to intersect with HD at the level of the CAG repeat, but mechanistic detail remains elusive. For example, it is unclear whether DNA repair can stimulate repeat expansion in an unbroken DNA duplex, or whether preexisting DNA damage (e.g., a single- or double-strand break) is required. DNA transactions that melt the duplex, such as transcription or replication, have been linked to repeat instability in specific systems but their relevance to human disease is uncertain [71]. How might DNA transcription or replication trigger repeat instability? Duplex unwinding exposes single-stranded DNA, which is then susceptible to damage by exogenous or endogenous agents, leading to a requirement for DNA repair. Repair could then stimulate repeat instability. However, such DNA transactions also require an open chromatin conformation and recruitment of transcription/replication factors, and will cause local changes in DNA supercoiling, all of which may also affect the stability of CAG repeats. Experimental approaches to disentangle these and other possibilities involve inducing DNA damage, measuring global effects in cells, and then, in the context of HD, measuring repeat stability. DNA damage can be induced with exogenous agents both nonspecific (e.g., hydrogen peroxide to cause oxidative damage; topoisomerase I inhibitors such as camptothecin to induce single-strand breaks) and specific (e.g., CRISPR/Cas9 or zinc-finger nucleases to induce targeted single- or double-strand breaks) to probe different aspects of DNA repair. DNA damage in cellular systems can be measured using comet or alkaline comet assays (for strand breaks), or γH2AX immunofluorescence (for double-strand breaks in nondividing cells), and repair measured through unscheduled DNA synthesis (UDS [72]), but none of these assays is particularly specific or quantitative [73].
In addition, any assays involving perturbation of DNA repair systems are prone to significant confounders such as neoplasia and genomic instability, a particular issue in cells (e.g., induced pluripotent stem cells) that may be prone to accumulation of genetic rearrangements in the laboratory.

Over 450 human genes are now implicated in the DNA damage response (DDR), and although there are linear pathways within the DDR, such as mismatch repair or base excision repair, it is becoming increasingly clear that there is considerable functional redundancy between factors in different pathways, and the interplay between them is complex [73]. Such safeguards are crucial for maintaining genomic stability in vivo but make experimental investigation much more difficult. In HD only a few DDR genes have been investigated so far. These were mainly chosen on theoretical grounds linking microsatellite instability and mismatch repair. Results have shown that these genes can influence CAG stability in HD models, but do not show whether they are relevant to human disease. Recent human genetic data are starting to link actual genetic modifiers of HD onset and progression to variants in DNA repair genes [1, 2]. Researchers can now introduce disease modifiers identified from the actual HD population into their model systems of DNA repair and should be able to draw much stronger, disease-relevant, mechanistic conclusions. Development of such assay systems will allow more efficient testing of potential therapeutic compounds that affect these mechanisms.