Huntington’s disease (HD) was precisely defined in 1993 when the huntingtin gene was cloned and the disease-causing CAG nucleotide expansion was defined, showing an inverse correlation between expansion length and age at onset [1]. The problem of HD was daunting, as the gene encoded one of the largest non-membrane proteins in the human proteome at 3144 amino acids or 350 kDa. Over the next 25 years, the normal function of huntingtin also eluded strict definition. Cell biology and biochemical studies revealed the protein to have no enzymatic activity and no conserved functional motifs, with the exception that proteins with polyglutamine tracts were often transcriptionally active and huntingtin contained HEAT repeats common to several protein scaffolds [2, 3]. Interactome studies showed huntingtin at the heart of several complexes [4], and it has been localized to the nucleus [5]; the endoplasmic reticulum [6]; early, recycling, and late endosomes [7]; the neuronal postsynaptic density [8]; the mitotic spindle [9, 10]; and the primary cilium [11, 12]. This ubiquitous localization made it difficult to define one primary function of the protein and instead suggested that huntingtin is a large, mobile scaffold participating in many dynamic complexes, with location and conformation sensitive to cell stress events.

From 1993, the predominant hypothesis of toxicity in HD was protein misfolding. The presence of > 37 continuous CAG repeats, hence, continuous glutamine residues, would lead to protein misfolding and accumulation of inclusions that triggered neuronal dysfunction and eventually neurodegeneration [13]. This hypothesis was heavily influenced by research in Alzheimer’s and Parkinson’s diseases, which were also associated with misfolded protein [14]. It was subsequently reinforced by studies in early mouse models in which a small fragment of huntingtin representing 3% of the total protein was overexpressed in trans [15]. The extreme polyglutamine length required to induce phenotypes in mice also caused abundant inclusions, and the protein misfolding hypothesis guided the field away from other viable avenues of research for many years.

Motivated by the need to find novel mechanisms and therapeutic targets in people, there has been a more recent return to unbiased statistical genetics. Although CAG expansion accurately predicts disease and correlates with age at onset, CAG length at typical clinical alleles, with a mean length of 43 repeats [16], is actually poor at predicting age at onset, with a variability of decades. Genome-wide association studies (GWAS) were therefore conducted to define modifiers of age at onset and disease severity.

A Return to an Unbiased Approach Nets New Leads in HD in DNA Damage Repair

A large global GWAS of 6000 to 9000 HD patients with broad genetic diversity and numbers for high statistical power primarily identified genes involved in DNA damage repair as significant modulators of age at onset and disease severity, with some pathways related to redox signaling and mitochondrial health [17,18,19]. There was a surprising lack of protein homeostasis regulators and the conclusion was that disease modifiers of HD are mainly involved in DNA repair mechanisms (Table 1). The most recent analyses make it clear that the length of uninterrupted CAG repeats, and not the length of the encoded polyglutamine tract, has the strongest association with age at onset [19, 23].
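The distinction between uninterrupted CAG length and polyglutamine length can be made concrete with a short Python sketch. It relies on one fact not stated above: both CAG and CAA codons encode glutamine, so a CAA interruption shortens the pure CAG run without shortening the polyglutamine tract. The function names and the toy alleles are hypothetical illustrations, and the sequences are assumed to be in frame; this is not the method used in the cited studies.

```python
# Glutamine is encoded by both CAG and CAA (standard genetic code).
GLUTAMINE_CODONS = {"CAG", "CAA"}


def codons(seq):
    """Split an in-frame DNA sequence into codons, ignoring a trailing partial codon."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def longest_run(items, predicate):
    """Length of the longest consecutive run of items satisfying predicate."""
    best = current = 0
    for item in items:
        current = current + 1 if predicate(item) else 0
        best = max(best, current)
    return best


def uninterrupted_cag_length(seq):
    """Longest run of consecutive CAG codons."""
    return longest_run(codons(seq), lambda c: c == "CAG")


def polyglutamine_length(seq):
    """Longest run of consecutive glutamine codons (CAG or CAA)."""
    return longest_run(codons(seq), lambda c: c in GLUTAMINE_CODONS)


# Hypothetical toy alleles (assumption, for illustration only):
# 42 pure CAGs followed by a CAA-CAG tail, versus the same tract
# with the CAA interruption replaced by CAG.
canonical = "CAG" * 42 + "CAA" + "CAG"
no_interruption = "CAG" * 42 + "CAG" + "CAG"

for name, allele in [("canonical", canonical), ("no_interruption", no_interruption)]:
    print(name, uninterrupted_cag_length(allele), polyglutamine_length(allele))
# canonical 42 44
# no_interruption 44 44
```

Both toy alleles encode 44 consecutive glutamines, yet they differ in uninterrupted CAG length (42 versus 44), which is the quantity reported to track most strongly with age at onset [19, 23].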

Table 1 Disease-modifying DNA repair genes

The single nucleotide polymorphisms (SNPs) in DNA repair genes significant to HD are also significant in some spinocerebellar ataxias (SCAs), a related group of CAG repeat diseases [20]. This suggests that DNA repair pathways underlie a common genetic mechanism of disease [20].

In retrospect, the GWAS results should have been anticipated. Many age-onset neurodegenerative diseases are known to be caused by mutations in bona fide DNA repair factors. Tyrosyl DNA-phosphodiesterase 1 (TDP1), aprataxin (APTX), and polynucleotide kinase/phosphatase (PNKP) are three factors involved in repairing broken DNA strands [24]. A series of neurodegenerative and neurodevelopmental disorders have been associated with mutations affecting the functions of these proteins. Mutations in TDP1 and APTX cause spinocerebellar ataxia with axonal neuropathy (SCAN1) [25] and ataxia-oculomotor apraxia 1 (AOA1) [26], respectively. Mutations in PNKP cause microcephaly with seizures (MCSZ) [27] and ataxia-oculomotor apraxia 4 (AOA4) [28], while decreased PNKP activity is associated with spinocerebellar ataxia type 3 (SCA3 or Machado-Joseph disease) [29]. Mutations in the X-ray repair cross-complementing protein, XRCC1, have been linked to ataxia-ocular motor apraxia-XRCC1 (AOA-XRCC1) [30]. XRCC1 interacts with DNA ligase III, polymerase beta, and poly ADP-ribose polymerase 1 (PARP1) to participate in base excision repair (BER) to correct either mismatched or damaged bases. This repair pathway is critical in neurons that are postmitotic, as they cannot rely on DNA replication proofreading to repair the mismatched or damaged adducts.

One of the first identified genetic ataxias was ataxia telangiectasia, or Louis–Bar syndrome. This is a pediatric-onset ataxia, with high risk of cancer later in life due to defective DNA repair caused by loss-of-function mutations in ataxia telangiectasia mutated (ATM) [31]. Similar to huntingtin, ATM is a large 350-kDa scaffolding protein, rich in HEAT repeats, and ubiquitous in location, found in the nucleus, cytoplasm, vesicles, primary cilium, and mitotic spindle [32]. However, unlike huntingtin, it has a known catalytic activity as a serine-threonine kinase [32]. As a signaling factor, ATM modifies DNA repair factors, as well as the genomic integrity regulators tumor protein p53 (TP53) [33] and mouse double minute 2 homolog (MDM2) [34]. Understanding the broader landscape of DNA repair defects in neurodegenerative diseases will shed light on GWAS leads and why they affect HD age at onset and severity, and more importantly, where there could be targets for disease modification.

The Huntingtin Protein and DNA Damage Repair

In 2014, ATM was seen as a modifier of HD phenotypes in a mouse model, as crosses with heterozygous null ATM mice resulted in a milder phenotype [35]. What drove that study was the observation that there is elevated DNA damage in HD and SCA mouse models [36], and this damage preceded protein aggregation [37]. Indeed, even in human HD fibroblasts with only 43 CAG repeats, a phenotype of elevated DNA damage can be detected [38]. In an HD clinical study in 2018, elevated damage was seen longitudinally in HD patient peripheral blood mononuclear cells, and this damage preceded mitochondrial markers of dysfunction [39]. Similarly in 2019, another clinical study measured increased DNA damage in prodromal HD in the blood cell population [40]. Thus, from cells to mouse models, and most importantly to patient clinical samples, there is a clear progressive level of DNA damage in HD. This DNA damage starts in the prodromal phase of disease and is present with typical clinical alleles.

Although these observations and GWAS results provide hints of DNA damage mechanisms in HD, one question is the exact cause of the damage. Inhibition of DNA repair in postmitotic neurons would be sufficient to cause an accumulation of DNA lesions that occur through normal metabolic processes. Furthermore, during human aging, decreased mitochondrial efficiency leads to increased reactive oxygen and sulfur species, and this decreased efficiency is amplified in neurodegeneration [41]. In HD, these problems may be caused by impaired function of the huntingtin protein.

Huntingtin protein localization and conformation are stress dependent. Huntingtin is a reactive oxygen species (ROS) sensor and will relocate from the outer leaf of the pervasive endoplasmic reticulum bilayer to the nucleus upon oxidation of a single methionine at position 8 [42]. ROS also increase the number of nuclear speckles: liquid–liquid phase separated droplets at which huntingtin colocalizes with DNA repair factors including ATM [38, 42, 43, 44]. Huntingtin acts as a scaffold that can localize to DNA damage, and modulates its associated complex in the presence of ROS stress [38]. Importantly, localization to damage is dependent on ATM kinase activity [38]. Huntingtin was identified in the transcription-coupled repair (TCR) complex that finds lesions and mediates repair during transcription. Mutant huntingtin impairs the function of the TCR complex components PNKP and ataxin-3, leading to elevated DNA damage and ATM hyperactivation [45].

Thus, we have a new role of huntingtin scaffolding activity in DNA repair, in which dysregulation may explain the predominance of DNA repair pathways identified by GWAS. However, within the genes uncovered by GWAS and subsequent SNP genotyping, there are very specific, seemingly unrelated repair pathways that have appeared. Mismatch repair (MMR) is represented by SNPs in several genes, whereas one of the most significant hits, FAN1, is associated with interstrand cross-link (ICL) repair.

Mismatch Repair and Somatic CAG Expansion

The mismatch repair (MMR) pathway is of particular interest because of its influence on somatic expansion of CAG repeats, which can undergo progressive length increases over time, particularly in the brain [46]. Somatic expansion of huntingtin is associated with earlier age of HD onset [47] and more severe symptoms [48]. In mouse models, expansion is prevented upon deletion of MMR genes, mutS homologs 2 and 3 (msh2 and msh3), mutL homolog 1 (mlh1), or PMS1 homolog 2 (pms2) [49,50,51,52]. These proteins normally work in concert to repair insertion-deletion loops in the DNA, but they also drive disease-associated repeat expansions [53].

The initial evidence from mouse models has since been bolstered by human data, clearly implicating the MMR pathway in disease progression. Although the chromosomal region bearing MLH1 was identified by GWAS, it did not initially reach genome-wide significance [17]; a subsequent SNP genotyping study then confirmed a modifier haplotype at the MLH1 locus [21]. The genomic region bearing MSH3 was identified by a second GWAS with more in-depth measures of disease progression [18], and PMS2 was among the SNP locations independently confirmed by genotyping [20]. Disease-associated SNPs near the MLH1, MSH3, PMS2, and PMS1 loci have recently been confirmed and the importance of uninterrupted CAG repeats within the huntingtin gene itself, which also impacts somatic instability, is now clear [19, 23]. PMS2 and MSH3 have also been implicated in SCA1 and myotonic dystrophy, respectively [20, 22]. Modifier genes are summarized in Table 1.

Although it is likely that MMR proteins influence pathology via somatic expansion, this may not be the sole mechanism, as tissue specificity of somatic expansion correlates with striatal neurodegeneration in HD, but not in SCAs [54], despite ubiquitous transcription of the CAG-containing proteins. It is possible that MMR proteins also contribute to disease modification through other DNA repair factors, including TP53 [55], breast cancer associated-1 (BRCA1) [56], and ATM [57]. The MMR machinery has also been implicated in the repair of several types of DNA damage [58,59,60], including ICLs [61].

In the patient population, pathology is likely to be caused by a combination of these mechanisms. The fact that MMR proteins and scaffolds such as huntingtin and ATM are common components of many DNA repair complexes [38, 56, 57, 60] offers the intriguing possibility that mutant huntingtin protein dysfunction may affect CAG expansion within its own gene. DNA repair mechanisms are paradoxical in that the pathways are very specific to types of DNA damage, yet many players operate in more than one pathway [62]. Most DNA damage repair mechanisms have been studied through the disease lens of cancer or universal base mechanisms from prokaryotes, which may explain why huntingtin was never identified previously in DNA damage repair complexes: there appear to be some very basic differences critical to neuronal populations. It is well established that the bulk of neuronal DNA damage is acquired by oxidative damage rather than replication errors [63], and novel mechanisms such as the generation of double-strand breaks in targeted promoters of active neurons have more recently been uncovered [64].

FAN1 and Interstrand Cross-Link Repair

Another lead gene identified in HD GWAS is the Fanconi anemia FANCD2- and FANCI-associated nuclease 1 (FAN1) [17, 20, 21]. FAN1 is well defined in the repair of interstrand cross-links (ICLs) as a nuclease inducing strand cleavage around ICL adducts [65]. These covalent structures, if not repaired, distort DNA and suppress transcription by preventing strand separation [66]. Like other forms of DNA damage, ICLs can occur as a result of cellular metabolism, and can form between strands at abasic sites produced by oxidative damage and base excision repair [67].

Importantly, FAN1 has also been linked to somatic expansion in fragile X-related disorders [68] and HD [69]. In contrast to MMR proteins, FAN1 expression is associated with reduced expansion and later age at onset, in a mechanism that surprisingly does not require nuclease activity [69]. As with many other factors in DNA damage repair, such as XRCC1 and ATM, FAN1 is both an enzyme and a scaffold, which is a critical consideration when interpreting genetic data and assuming the target is enzymatic inhibition alone.

The relative contributions of somatic expansion suppression and ICL repair to pathology have yet to be elucidated. These seemingly disparate pathways may be connected by products of oxidative DNA damage, such as N6-furfuryladenine (N6FFA), which has been defined as a beneficial compound in HD models and a potential drug lead as described below [43, 70].

DNA Repair Mechanisms in HD May Uncover New Targets and Drug Leads

N6FFA was identified in a screen for compounds affecting the phosphorylation of serines 13 and 16 within the N17 domain of huntingtin. N17 comprises the first 17 amino acids that modulate the location of the massive 3144 amino acid protein as a “tail that wags the dog” [6, 71]. This modification is deficient in HD [9, 44] and its restoration is beneficial in HD model systems [9, 43, 72, 73] making it an attractive therapeutic target.

N6FFA is a naturally occurring product of oxidative DNA damage and an excretory metabolite [74, 75]. Furan moieties are formed in the presence of reactive oxygen species and further react by addition to adenosine. This modified base forms bulky lesions in DNA and causes mismatches at adjacent bases, requiring BER and MMR mechanisms for repair [76, 77]. The excised product, N6FFA, is salvaged to the ATP analog, KTP, via the promiscuous activity of adenine phosphoribosyltransferase (APRT) [78]. KTP can be used as a “neo-substrate” by kinases with large enzymatic pockets, including CK2, the kinase responsible for phosphorylating the huntingtin N17 domain [9, 43]. This would be especially important under conditions of DNA damage, which are associated with low levels of ATP [79]. CK2 is an important player in the DNA damage response, having numerous substrates and activating ATM at the very first steps of DNA mismatch recognition [80].

Huntingtin found at sites of DNA damage is phosphorylated at N17 [38], and N6FFA, APRT, and CK2 colocalize with huntingtin at damage sites [43]. We therefore proposed that in proximity to DNA damage, N6FFA is salvaged to KTP by APRT, and CK2 uses locally generated KTP to phosphorylate huntingtin and its other substrates. In this way, KTP would enhance the ability of CK2 to phosphorylate its targets, in proximity to DNA damage sites, under conditions of depleted ATP. By this mechanism, a product of DNA damage signals the activation of repair proteins, resulting in the production of additional signaling molecules in a feed-forward loop that dampens naturally once N6FFA adducts are corrected (Fig. 1, left panel).

Fig. 1 Proposed mechanism of N6FFA action. Left panel: Products of oxidative DNA damage are excised by the repair machinery yielding N6FFA, which is salvaged to KTP by APRT. KTP is used by CK2 to phosphorylate its targets, including normal huntingtin. The signal dampens naturally as adducts are repaired. Middle panel: Expanded huntingtin is inefficiently phosphorylated and impaired in its scaffolding function, resulting in the accumulation of adducts. Without N6FFA excision and conversion to KTP, the signal is stifled. Right panel: Exogenous N6FFA is salvaged by APRT, providing a source of KTP to activate CK2 signaling and restore repair.

We hypothesize that polyglutamine expansion inhibits the phosphorylation of N17 by CK2 [9, 44] and impairs the scaffolding function of huntingtin in the process of DNA repair [38]. Excision of damaged bases would therefore be diminished, leaving N6FFA trapped in DNA and shutting down the critical signaling by CK2 via KTP in a negative cascade (Fig. 1, middle panel). This mechanism provides an opportunity for therapeutic intervention: the feed-forward repair signaling would be restored by adding free N6FFA in trans, which is salvaged to KTP [78] for use by CK2 on huntingtin and its other targets (Fig. 1, right panel). Indeed, N6FFA was beneficial in mouse cortical neuron assays as well as mouse models to correct HD phenotypes, with an intriguing loss of huntingtin inclusions in mouse brains [43] that could be attributed to the restoration of energy levels as described in the next section.

Interestingly, N6FFA that remains trapped in DNA because mutant huntingtin impairs BER and MMR can further oxidize to generate an ICL, in addition to DNA–RNA and DNA–protein cross-links [77, 81]. Thus, N6FFA can be considered the focal point of an age-onset mechanism of increased ROS stress, the generation of DNA mismatches implicating MMR, and the maturation of the adduct to an ICL, which highlights FAN1.

Understanding the different mechanisms by which GWAS hits and expanded huntingtin affect DNA repair will be indispensable in the development of therapeutic strategies going forward. Regardless of mechanism, however, the consequences of DNA damage accumulation may be similar, including changes in transcription and epigenetic signatures, as well as important effects on energy metabolism.

DNA Repair Is Intimately Linked to Energy Defects in Neurodegeneration

The development of cerebellar ataxia in AOA-XRCC1 is associated with the elevation of poly ADP-ribose (PAR) levels [30]. The production of PAR chains by PARPs is one of the first steps in the process of DNA damage repair. These PAR chains transiently recruit DNA repair factors to adducts, including XRCC1, but the rapid polymerization of ADP-ribose comes at the great metabolic cost of draining nicotinamide adenine dinucleotide (NAD+) levels [82] and inhibiting glycolysis [83]. Transient energy drops are relieved quickly by the breakdown of PAR by PAR glycohydrolase (PARG) [84], and nucleotide-salvaging enzymes are critical to restore intraneuronal nucleotide levels [85]. However, in AOA-XRCC1, the production of PAR persists due to low XRCC1 levels, leading to toxicity. Excessive and persistent PAR production can lead to neuronal death by the unique mechanism of parthanatos, which involves the nuclear translocation of apoptosis-inducing factor (AIF) from mitochondria [86].

Hyper-PARylation can also lead to protein aggregation. In 2018, this was highlighted elegantly in the Parkinson’s disease context showing PAR chains could nucleate alpha-synuclein aggregation [87]. Aggregation is also a downstream consequence of ROS-mediated energy depletion, as ATP-dependent chaperones struggle to maintain protein quality under conditions of oxidative stress [88, 89]. There is recent interest in repurposing PARP inhibitors aside from cancer treatment [90], but the caveat is that most PARP inhibitors are designed to be toxic to transformed cells by trapping PARP enzymes on DNA. One solution may be PARP expression knockdown with new generations of anti-sense oligonucleotides [91].

The relevance of PARylation to HD has yet to be determined, but from GWAS we may have a hint that ties hyper-PARylation to the ATM/huntingtin complex. Another major lead in HD GWAS is the ribonucleoside-diphosphate reductase subunit M2B, or RRM2B, also known as P53R2. RRM2B is a critical ribonucleotide salvager; when it is null, the result is a severe, fatal pediatric disorder called mitochondrial DNA depletion syndrome (MDDS), which affects muscle, the brain, and the respiratory tract [92]. Like huntingtin, RRM2B is activated by the TP53 tumor suppressor [93], and is a known ATM interactor [94]. The ribonucleoside-diphosphate reductase activity could be critical downstream of PARG to salvage ADP back from hydrolyzed PAR chains during neuronal energy crisis. Thus, we can hypothesize a mechanism of disease modification in which RRM2B catalyzes the conversion of poly ADP-ribose chains back to critical adenosine ribonucleosides.

HD GWAS in the Bigger Picture of Polyglutamine Diseases

Given the significance of HD genetic modifier SNPs to some spinocerebellar ataxias, we may find an intersection of molecular mechanisms of disease between HD and other CAG repeat disorders with respect to DNA repair. One node is the ATM complex, but huntingtin has also been defined at the TCR complex [45] along with ataxin-3, the affected protein in SCA3, and PNKP, the protein mutated in MCSZ [27]. Ataxin-3 has also been implicated by others in the double-strand break response [95]. CAG expansion in the androgen receptor, a transcription factor involved in DNA damage repair signaling, leads to spinal and bulbar muscular atrophy (SBMA), or Kennedy’s disease [96, 97]. CAG expansion in the TATA box-binding protein (TBP) causes SCA17, and TBP localizes to damaged DNA [98]. DNA repair has also been implicated in SCA1, as overexpression of the DNA repair factors replication protein A1 and high mobility group box 1 in mouse and Drosophila models corrects motor phenotypes [99, 100]. Thus, we may anticipate increased relevance of DNA repair in many late age–onset neurodegenerative diseases, as the natural increase in ROS stress during human aging and its effects on DNA/RNA oxidation are tempting mechanisms to explain why these diseases typically occur later in life.

Increased DNA oxidation and hyper-PARylation appeared in neurodegenerative disease studies in the late 1990s [101, 102], but the early study of PAR chains and relevance of oxidation to disease mechanism were not further explored in favor of various amyloid hypotheses of late age–onset neurodegeneration. Although guanine oxidation products were the focus of these studies as a biomarker of DNA oxidation, it is not clear that all DNA bases are equally modified under oxidative stress nor processed in a similar manner. Adenosine bases are subject to nucleotide salvage in neurons and have unique utility after base excision repair to be salvaged back to adenosine nucleosides for energy production [85]. Neurons are highly metabolically active and thus generate high ROS loads, yet oxidative base damage to DNA cannot be repaired by DNA replication as in mitotic cell types. Brain subregions can also transiently flip to aerobic glycolysis or Warburg metabolism to generate ATP at times of energy stress [103], but this energy supply comes at a high cost of increased ROS stress. The burden of reactive oxygen levels on mitochondria, which have a diminished efficiency with human aging, might explain how mitochondrial dysfunction is a common aspect of all neurodegenerative diseases [104].

To fully understand the implications of defective DNA repair in neurodegeneration, it is important to understand how DNA repair imparts a severe energy stress on neurons. High rates of neuronal metabolism mean that even as energy is depleted, ROS by-products cause damage that further drains the energy supply. In this way, neurons must struggle to maintain energy levels and even a minor deficiency in a DNA repair protein would amount to undue neuronal death over time (Fig. 2).

Fig. 2 DNA repair and neuronal energy homeostasis. The combined energetic costs of high metabolic rate, and repair of metabolism by-product–mediated damage, make neurons vulnerable to minor deficiencies in DNA repair proteins with age.

The involvement of DNA repair pathways in neurodegenerative diseases presents a number of opportunities for therapeutic intervention. Although drugs against key potential targets have been developed for cancer, preclinical data on their applicability to neurodegenerative diseases are needed, as the goal of anticancer therapy is to kill affected cells, whereas the goal of antineurodegeneration therapy is to save affected cells. The immediate goals in researching DNA repair in neurodegeneration are to define the roles of DNA repair factors, in terms of their scaffolding versus enzymatic functions, and to precisely define therapeutic mechanisms of either inhibiting enzymatic activity or modulating protein levels, or both. Testing in the most clinically relevant models and special attention to measures of genotoxicity will help pave the way to the clinic.