Base excision repair

Base excision repair (BER) is one of the most important DNA repair pathways. It corrects environmentally induced DNA damage as well as damage that arises spontaneously through alkylation, oxidation, and deamination events during normal metabolic processes [1]. BER is also responsible for repairing small, non-helix-distorting lesions that may be induced by chemical carcinogens [2]. In addition, it repairs abasic sites, which may arise spontaneously as a function of temperature fluctuation or as intermediates in the DNA repair process [3]. Compared with other repair machinery, such as nucleotide excision repair, the core components of the BER machinery have been well conserved from bacteria to humans, both structurally and functionally [4–7], underscoring the vital role BER plays in maintaining genome integrity. BER is believed to be the simplest and best defined of all DNA repair processes. The molecular mechanism of BER has been resolved to the level of tertiary structure for all core components [8–10].

The BER pathway functions by a series of well-coordinated enzymatic events which can overall be divided into two steps. The first step of BER is the recognition and excision of a damaged base or an abasic site by a series of specific DNA glycosylases. The next step involves the sequential action of different proteins which correct DNA by template-directed insertion of one or a few nucleotides, starting at the damaged site.

The first step of BER relies on glycosylases that recognize and remove the damaged base through N-glycosylic bond hydrolysis to generate abasic or apurinic/apyrimidinic (AP) sites. These AP sites are identical to those produced by spontaneous DNA depurination or depyrimidination. Each of the glycosylases has specificity for a relatively narrow, partially overlapping spectrum of lesions, and may function as a monofunctional enzyme (exclusively removes the damaged base) or a bifunctional enzyme (removes the damaged base and incises the DNA backbone) [11]. Bifunctional glycosylases possess AP-lyase activity that cleaves the phosphodiester bond 3' to the AP site by a β- or β,δ-elimination mechanism, generating 3' α,β-unsaturated aldehyde and 5'-phosphate products at the termini [12]. This terminus is then cleaved by a pivotal enzyme, APE1 (also called HAP1 or Ref-1), which has endonuclease and phosphodiesterase activities [13]. Monofunctional DNA glycosylases need the assistance of APE1 to hydrolyse the phosphodiester bond at the 5' end of the AP site via its endonuclease activity, producing a single strand break (SSB) with a normal 3'-hydroxyl group and an abnormal 5'-deoxyribose 5-phosphate (dRP) residue [14].

Numerous prokaryotic and eukaryotic DNA glycosylases have been isolated and purified, and their substrate specificities have been determined using various types of substrates. Moreover, the crystal structures of numerous DNA glycosylases have been solved. Data obtained from structure determination have allowed glycosylases to be grouped into several major structural families by architectural fold [7], including helix-hairpin-helix (HhH) [15], helix-two-turn-helix (H2TH) [16], and uracil DNA glycosylases (UDGs) [17].

In humans about 11 monofunctional and bifunctional glycosylases have been identified [12, 18] (Table 1). The human monofunctional glycosylases include methyl-CpG-binding domain protein 4 (MBD4), N-methylpurine-DNA glycosylase (MPG), single-strand selective monofunctional uracil DNA glycosylase (SMUG1), thymine/uracil mismatch glycosylase (TDG), uracil-DNA glycosylase 2 (UDG2), uracil DNA N-glycosylase (UNG1), and UNG2. The bifunctional glycosylases include A/G-specific adenine DNA glycosylase (MUTYH), endonuclease VIII-like 1 (NEIL1), endonuclease III-like protein 1 (NTH1), and N-glycosylase/DNA lyase (OGG1).

Table 1 DNA glycosylases involved in BER in human (modified from refs. [12, 18])

Determination of the crystal structures of glycosylases has revealed differences in the folds and in the specific residues used to recognize damaged bases. This information, coupled with experimental characterization of DNA base repair processes, has allowed the enzymatic mechanisms of cleavage by DNA glycosylases to be elucidated. Based on their specific mechanisms for recognition of damage, DNA glycosylases can generally be grouped into those that remove oxidative damage, deamination products, and alkylation damage.

As presented in Table 1, in human cells, the DNA glycosylases that are involved in the removal of oxidized bases include MUTYH, NEIL1, NTH1, OGG1, and SMUG1. NEIL1, NTH1 and SMUG1 catalyze excision of oxidized pyrimidines, such as 5-OHC, whereas MUTYH and OGG1 repair oxidized purines, such as 8-oxoG. The human glycosylases that remove uracil are uracil-DNA glycosylase (UNG/UDG), thymine-DNA glycosylase (TDG), SMUG1, and MBD4. The only glycosylase known to be involved in repair of alkylation damage is MPG.
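The groupings described above can be written down as a small lookup structure. The following is a minimal sketch only: the enzyme names and groupings are transcribed from the text, while the dictionary layout and the function names (`enzyme_type`, `lesion_classes`) are hypothetical and chosen purely for illustration. Note that the text lists UNG/UDG generically among the uracil-removing enzymes, so that entry is kept as written rather than resolved to UNG1, UNG2, or UDG2.

```python
# Enzyme names and groupings are taken from the text; the data layout and
# function names are illustrative assumptions, not an established API.
MONOFUNCTIONAL = {"MBD4", "MPG", "SMUG1", "TDG", "UDG2", "UNG1", "UNG2"}
BIFUNCTIONAL = {"MUTYH", "NEIL1", "NTH1", "OGG1"}

LESION_CLASSES = {
    "oxidized pyrimidines": {"NEIL1", "NTH1", "SMUG1"},
    "oxidized purines": {"MUTYH", "OGG1"},
    "uracil": {"UNG/UDG", "TDG", "SMUG1", "MBD4"},
    "alkylation damage": {"MPG"},
}


def enzyme_type(name: str) -> str:
    """Return 'monofunctional' or 'bifunctional' for a listed glycosylase."""
    if name in MONOFUNCTIONAL:
        return "monofunctional"
    if name in BIFUNCTIONAL:
        return "bifunctional"
    raise KeyError(f"{name} is not in the list given in the text")


def lesion_classes(name: str) -> list:
    """Return, in alphabetical order, the lesion classes an enzyme acts on."""
    return sorted(cls for cls, enzymes in LESION_CLASSES.items() if name in enzymes)


if __name__ == "__main__":
    print(len(MONOFUNCTIONAL | BIFUNCTIONAL), "glycosylases listed")
    print("OGG1:", enzyme_type("OGG1"), lesion_classes("OGG1"))
    print("SMUG1:", enzyme_type("SMUG1"), lesion_classes("SMUG1"))
```

The lookup makes the partial overlap in substrate range visible: SMUG1, for example, appears under both oxidized pyrimidines and uracil.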

To complete repair after glycosylase action, BER can proceed through two different sub-pathways, the short-patch and the long-patch pathway. These pathways are differentiated by the enzymes involved and the number of nucleotides removed. When BER is initiated by bifunctional glycosylases, short-patch repair is the main pathway, whereas BER initiated by monofunctional glycosylases may proceed through either pathway [45].

In the short-patch pathway, DNA polymerase β (Pol β), which is recruited upon direct interaction of damaged DNA with APE1, extends the 3'-OH terminus by inserting a single nucleotide and at the same time removes the 5' terminal deoxyribose phosphate (5'-dRP) through its dRP lyase activity [46]. Finally, the single strand nick is sealed by either DNA ligase I or DNA ligase III in a complex with the scaffolding protein XRCC1 [47, 48].

Long-patch BER is initiated in a manner similar to the short-patch pathway. Initially, 2 to 12 nucleotides are incorporated by the sequential action of different DNA polymerases (Pol β, δ, or ε), which elongate the 3' end by a few nucleotides and displace a DNA fragment containing the 5' deoxyribophosphate [49]. Next, this flap structure is excised by a specific flap endonuclease, FEN1 [50, 51]. In addition to FEN1, DNA synthesis and strand displacement are stimulated by the combined presence of proliferating cell nuclear antigen (PCNA) [52], replication factor C (RFC) [53], and poly(ADP-ribose) polymerase 1 (PARP1) [54, 55]. Finally, the intact DNA strand can be restored by DNA ligase I or III. However, since this pathway is stimulated by PCNA, it has been suggested that ligase I is the predominant enzyme because of its interaction with PCNA [56].
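The distinction between the two sub-pathways can be summarized as a small record per pathway. This is a sketch under stated assumptions: the patch sizes, polymerases, flap endonuclease, and ligases are those named in the text, whereas the `SubPathway` class and the `classify_patch` function are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SubPathway:
    """Illustrative summary of one BER sub-pathway (values from the text)."""
    name: str
    min_nucleotides: int
    max_nucleotides: int
    polymerases: Tuple[str, ...]
    flap_endonuclease: Optional[str]
    ligases: Tuple[str, ...]


SHORT_PATCH = SubPathway(
    name="short-patch",
    min_nucleotides=1,
    max_nucleotides=1,
    polymerases=("Pol β",),
    flap_endonuclease=None,  # no flap is formed when one nucleotide is replaced
    ligases=("DNA ligase I", "DNA ligase III/XRCC1"),
)

LONG_PATCH = SubPathway(
    name="long-patch",
    min_nucleotides=2,
    max_nucleotides=12,
    polymerases=("Pol β", "Pol δ", "Pol ε"),
    flap_endonuclease="FEN1",
    ligases=("DNA ligase I", "DNA ligase III"),
)


def classify_patch(n_nucleotides: int) -> SubPathway:
    """Map the number of replaced nucleotides to the matching sub-pathway."""
    for pathway in (SHORT_PATCH, LONG_PATCH):
        if pathway.min_nucleotides <= n_nucleotides <= pathway.max_nucleotides:
            return pathway
    raise ValueError(f"{n_nucleotides} nucleotides is outside the ranges given in the text")


if __name__ == "__main__":
    for n in (1, 5, 12):
        p = classify_patch(n)
        print(n, "->", p.name, "| flap endonuclease:", p.flap_endonuclease)
```

Classifying by patch size alone is a simplification; as noted above, the initiating glycosylase also influences which sub-pathway is used.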

MutY homologue (MUTYH)

Structure of the MUTYH gene

The first human homologue of the E. coli MutY gene was cloned by Slupska et al. [37]. By screening a human cDNA sequence database they identified an expressed sequence tag (EST), which was then used to probe a human bacterial artificial chromosome (BAC) library to isolate the genomic human mutY gene, referred to as MUTYH and also called MYH. A full-length MUTYH cDNA clone was also isolated from a human brain tissue cDNA library, and the gene was localized to the short arm of chromosome 1 between p32.1 and p34.3. MUTYH encompasses 7.1 kb and has 16 exons encoding a 535 amino acid protein displaying 41% identity with the E. coli protein [37]. Ohtsubo and co-workers [38] identified 10 forms of MUTYH transcripts, each with a different 5' sequence or first exon and each alternatively spliced, which were sub-grouped into 3 types: isoform MUTYHα (splice variants α1, 2, 3, and 4), isoform MUTYHβ (splice variants β1, 3, and 5), and isoform MUTYHγ (splice variants γ2, 3, and 4). The authors also showed that MUTYH protein encoded by type α mRNA possesses a mitochondrial targeting sequence (MTS), consisting of the amino terminal 14 residues, which is required for its localization in the mitochondria [57], while proteins encoded by type β and γ mRNAs lack the MTS and are localized in the nuclei. MUTYH α3 is the major isoform expressed in most cells and corresponds to the cDNA sequence isolated and characterized by Slupska et al. [37]. In addition to the human gene, homologues of MutY have been cloned from other mammals, including cow [58], mouse [59] and rat [60].

The protein structure and function of MUTYH

The open reading frame (ORF) of the full-length major isoform of MUTYH (MUTYH α3) encodes a 535 amino acid protein [38]. This protein displays very high identity with MutY homologues from other mammals, 78 and 74.1% identity with mouse and rat MutY, respectively [61]. Structurally, MUTYH reveals extensive homology and conservation with established structural domains found in E. coli MutY and many prokaryotic and eukaryotic BER enzymes, in addition to domains unique to MUTYH (Fig. 1).

Figure 1 Diagram of the functional domains of MUTYH (adapted from [107]).

MutY and its homologues contain a catalytic domain which shares several motifs with other glycosylases, including the helix-hairpin-helix (HhH), pseudo-HhH and a [4Fe-4S] iron-sulphur cluster in the N terminus [62, 63]. The catalytic domain has a high overall similarity to endonuclease III, which excises oxidized pyrimidines [64, 65]. HhH (aa 114–273 in MUTYH) functions to detect, recognize and remove adenines opposite 8-oxoG by binding the phosphate backbone of the substrate. It includes a highly conserved aspartic acid residue (Asp222 in MUTYH), which is required for nucleophilic attack of the adenine base (reviewed in ref. [61]). MutY has a special carboxy-terminal domain that is not found in other BER glycosylases, with sequence and structural homology to MutT (an 8-oxo-dGTPase). This domain has an important role in the recognition of 8-oxoG, because truncation of the domain results in loss of discrimination between 8-oxoG:A and G:A mispairs [66–68]. In addition, the ORF of MUTYH contains other domains that have been shown to be binding sites for other proteins. By using HeLa nuclear extracts, Parker and co-workers [69] demonstrated that MUTYH contains an APE1 binding site (aa 300), a replication protein A (RPA) binding site (aa 6–32), and a PCNA binding site (aa 505–527).

The primary function of MUTYH as a BER DNA glycosylase is to excise adenines or 2-hydroxy-adenines (2-OH-A) misincorporated opposite 7,8-dihydro-8-oxo-guanine (8-oxoG or GO) [70]. GO is one of the most stable products of DNA damage resulting from reactive oxygen species (ROS) [71]. It is also mutagenic, because this oxidized form of the guanine base can pair with adenine as well as cytosine with equal frequency during DNA replication and thus has the potential to cause a high rate of G:C to T:A transversions [72, 73]. In E. coli, MutY, together with MutM (formamidopyrimidine-DNA glycosylase, FPG) and MutT, plays an important role in reducing the mutagenic effects of GO lesions [74, 75]. MutM removes GO paired with cytosine and introduces a single strand gap as a result of the accompanying AP-lyase activity [75], whereas MutT hydrolyzes 8-oxo-dGTP and depletes it from the nucleotide pool [73]. Similar DNA error-avoiding mechanisms have also been shown in human cells, whereby MTH1 (MutT homologue), OGG1 (MutM homologue), and MUTYH (MutY homologue) have been proposed to function in the reduction of GO in the human genome [76].
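How an unrepaired GO lesion becomes a G:C to T:A transversion over two rounds of replication can be traced step by step. The sketch below is a deliberately simplified illustration, not a model of the actual enzymology: the function names are hypothetical, GO is always assumed to mispair with adenine in the first round, and the outcome of MUTYH action (repair synthesis restoring cytosine opposite GO, followed by removal of GO by OGG1) is an assumption made for the example rather than a statement from the text.

```python
# Watson-Crick partners for undamaged bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def insert_opposite(template_base: str, go_mispairs: bool = True) -> str:
    """Return the base inserted opposite a template base ('GO' = 8-oxoG)."""
    if template_base == "GO":
        # GO can pair with adenine as well as cytosine; this sketch forces
        # the mispair by default so that the mutagenic route is shown.
        return "A" if go_mispairs else "C"
    return COMPLEMENT[template_base]


def follow_lesion(mutyh_active: bool) -> list:
    """Trace one base pair from oxidation through replication or repair."""
    history = ["G:C", "GO:C"]  # guanine is oxidized to GO
    new_base = insert_opposite("GO")  # replication round 1
    history.append(f"GO:{new_base}")
    if mutyh_active:
        # Assumption: MUTYH excises the adenine, repair synthesis places
        # cytosine opposite GO, and OGG1 then removes GO from the GO:C pair.
        history.append("GO:C")
        history.append("G:C")
    else:
        # Replication round 2: the adenine-containing strand is the template,
        # so thymine is inserted where guanine originally stood.
        history.append(f"{insert_opposite(new_base)}:{new_base}")
    return history


if __name__ == "__main__":
    print("without MUTYH:", " -> ".join(follow_lesion(False)))
    print("with MUTYH:   ", " -> ".join(follow_lesion(True)))
```

Without adenine excision the trace ends at T:A, which is the transversion described above; with excision the original G:C pair can be recovered.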

Yang and co-workers demonstrated that the cloned cDNA of MUTYH [37, 77] complemented the mutator phenotype of a mutY mutant E. coli strain and prevented G:C to T:A transversions [78]. Suppressive activity of MUTYH against G:C to T:A transversions has also been shown in human cells in vivo [79].

Unlike OGG1, MUTYH possesses no detectable AP-lyase activity [78]. The authors also demonstrated that APE1 catalytic activity is required for the formation of cleaved AP DNA and that APE1 stimulates MUTYH glycosylase activity by increasing the formation of the MUTYH-DNA complex. Similar findings were also reported by Parker and co-workers, who showed that MUTYH interacts with APE1, PCNA, and RPA, suggesting a role in long-patch BER [69]. Despite the structural homology between MUTYH and its bacterial counterpart, the human MUTYH protein efficiently removes 2-OH-A from 2-OH-A:G mismatches [38], whereas the bacterial protein removes 2-OH-A from substrates containing 2-OH-A:G pairs very poorly [80] (Figs. 2 and 3).

Figure 2 Base pairing properties of 8-oxoG residues in DNA.

Figure 3 Different pathways for repairing damaged G residues caused by reactive oxygen species (adapted from [123]).

Recent findings from experiments employing the full-length structure of MutY cross-linked to DNA containing 8-oxoG have shed light on the mechanism of recognition and removal of 8-oxoG:A mismatches by MutY. Although the precise mechanism is still unclear, the results of these experiments suggest that MutY relies heavily on the recognition of GO to locate A bases for excision. For example, Fromme and co-workers [68] demonstrated that in the X-ray crystal structure of an inactive variant (Asp144Asn) of B. stearothermophilus MutY there are extensive contacts with 8-oxoG but minimal contacts with adenine. In another instance, Bernards and co-workers [82] conducted time-resolved fluorescence experiments of the MutY A-excision reaction using 8-oxoG:A substrates and found a multiphase reaction profile, with a fast process associated with changes at 8-oxoG and a slower process associated with altering the environment of the adenine. MutY exposes its target by deeply penetrating the DNA helix, interrupting helical stacking on both strands, encircling the DNA with its catalytic core and MutT-like domains, and rotating the phosphodiester bonds surrounding the nucleotide, causing the target adenine to be flipped out of the DNA helix [68, 83]. The [4Fe-4S] iron-sulphur cluster contained in the catalytic core of MutY plays a significant role in facilitating DNA binding at the damaged region and catalyzing the removal of adenine [84]. This iron-sulphur cluster also plays a DNA-dependent electron transfer role, which enables MutY to quickly and efficiently seek out damaged sites in the genome using DNA-mediated charge transport [85].

Cross-talk between MUTYH and other DNA repair enzymes

Evidence is accumulating in the literature implicating the interactions of DNA glycosylases and non-BER pathways, although in most cases the details of the mechanism are not well understood [86]. In addition to their participation in BER, DNA glycosylases have been reported to interact with nucleotide excision repair (NER) and mismatch repair (MMR).

MUTYH-initiated BER and MMR pathways share common features in terms of function and timing of action. Functionally, both pathways participate in repairing DNA lesions resulting from DNA oxidation. Both pathways act immediately after DNA replication to increase the fidelity of replication, and both must distinguish newly synthesized DNA strands from their parental counterparts [37, 87–93].

The molecular evidence of the interaction of MUTYH and MMR proteins was first obtained from an elegant study by Gu and co-workers [94]. In their experiment they found that hMUTYH directly interacts with hMSH6 in the hMSH2/hMSH6 (hMutSα) heterodimer, which functions to bind to the mismatches and initiate the repair on the daughter DNA strands [95]. They also observed that this physical protein interaction can occur in the absence of DNA. Based on their findings, the authors further hypothesized that proteins involved in DNA replication, mismatch repair and base excision repair may exist as a multiple-protein complex and that hMUTYH may be orientated in the replication fork to recognize 8-oxoG on the parental strands and to excise misincorporated A on the daughter strand [94]. This hypothesis is supported by some evidence. Both hMUTYH and hMSH6 interact with replication proteins PCNA and RPA [69, 96, 97]. Moreover, both proteins show overexpression during S phase and colocalize with PCNA at replication foci [69, 87, 96].

Although to date there is no direct evidence for the physical interactions between MUTYH and proteins involved in NER, there is evidence indicating that DNA glycosylases are coupled to the NER pathway, more specifically to the subpathway transcription-coupled repair (TCR).

One example is the repair of thymine glycols which are normally repaired by thymine glycol DNA glycosylase. Leadon and Cooper [98] reported that thymine glycols generated in NER- and BER-proficient human cells following ionizing radiation exposure are removed in a biased fashion from the transcribed strand of an expressed gene. In this case, TCR of thymine glycols exists in the same locus in cells where BER is active, suggesting that the thymine glycol-DNA glycosylase is coupled to TCR. It is logical therefore that the question arises whether such coupled repair exists for other lesions that are normally repaired by BER glycosylases.

It has been proposed that the crucial event in transcription-coupled NER (TC-NER) is the stalling of an elongating RNA polymerase II at a lesion, which recruits the repair proteins to the damage site [99–102]. Therefore, the ability of a lesion on the transcribed strand to block the RNA polymerase transcription complex has been assumed to be crucial for TCR. Kathe and co-workers [103] found that DNA base damage does not block transcriptional elongation by RNA polymerase II in HeLa cell nuclear extracts, but single-strand breaks do. Single-strand breaks are known to be common BER processing intermediates. It is therefore tempting to speculate that a strand break produced by a DNA glycosylase at an oxidative lesion in a transcription bubble would block the RNA polymerase apparatus, which in turn would signal TC-NER to take place.

MUTYH variants and their role in carcinogenesis

Cheadle and Sampson [104] have presented a comprehensive review of MUTYH variants and their diagnostic implications. They reported that as of the end of 2006, 30 mutations that are predicted to produce truncated proteins have been reported in MUTYH, consisting of 11 nonsense, 9 small insertion/deletions and 10 splice site variants. Moreover, 52 missense variants and three small inframe insertion/deletions have been reported that are distributed throughout the gene [104]. To date, as presented in Table 2, the list of reported MUTYH variants has grown. Of the reported variants, Y165C and G382D together account for approximately 73% of reported MUTYH variants [104] and have been commonly identified in Caucasian populations, including American, British, Danish, Finnish, Dutch, Italian, and Portuguese (reviewed in refs. [61, 105]). Beside these two variants, the occurrence of the rest of the variants is rare, although recurrent variants have been observed in some populations and will be discussed later in this section.
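The counts quoted from the review can be cross-checked with a few lines of arithmetic. This is only a consistency check on the figures given in the text; the variable names are illustrative, and the combined total is derived here rather than quoted from ref. [104].

```python
# Counts as quoted in the text for variants reported up to the end of 2006.
truncating = {
    "nonsense": 11,
    "small insertion/deletion": 9,
    "splice site": 10,
}
non_truncating = {
    "missense": 52,
    "small in-frame insertion/deletion": 3,
}

total_truncating = sum(truncating.values())
total_non_truncating = sum(non_truncating.values())
total_variants = total_truncating + total_non_truncating

if __name__ == "__main__":
    print("predicted truncating variants:", total_truncating)
    print("missense and in-frame variants:", total_non_truncating)
    print("all variants in this tally:", total_variants)
    print(f"truncating share: {total_truncating / total_variants:.1%}")
```

The three truncating categories sum to the 30 mutations stated in the text, which confirms that the breakdown is internally consistent.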

Table 2 List of reported MUTYH variants

As seen in Table 2, most of the variants reported, as is usually the case with most genes, are categorized as single nucleotide polymorphisms (SNPs). SNPs are characterized by single base changes in genes and other DNA sequences and are discovered by DNA re-sequencing, single strand conformation polymorphism (SSCP), or nucleotide probing technologies, such as pyrosequencing, real time PCR and multiplex ligation-dependent probe amplification (MLPA). These single base changes give rise to missense and nonsense variants and may or may not result in an obvious phenotypic trait.

Over the past decade the determination of SNPs has become an important method of characterizing individual differences in genetic makeup, as compared to the "normal" genome sequence. Of potentially greater importance are the SNPs that may increase an individual's susceptibility to malignancy because of subtle differences in the polymorphic protein products due to missense mutations, especially when such variations are present in key structural areas.

Evidence supporting a role for defective MUTYH in human carcinogenesis is accumulating. The first study to establish such a role was that of Al-Tassan and co-workers [114], who described a British family in which three siblings with colorectal cancer and adenomatous polyposis were compound heterozygous for two germline mutations in the MUTYH gene that result in MUTYH proteins containing the amino acid substitutions Tyr165Cys (Y165C) and Gly382Asp (G382D). Unaffected family members were either homozygous normal or heterozygous for one of the mutations, suggesting an autosomal recessive pattern of inheritance for the phenotype of multiple adenomas. The authors further showed that E. coli proteins carrying the equivalent substitutions are severely impaired in base excision repair. The Y165C mutation is located in the HhH motif, which is highly conserved in all mammalian MutY homologues as well as in E. coli MutY [62, 118, 119]. Functionally, it completely abolishes adenine glycosylase activity toward the A:8-oxoG mispair. The G382D mutation resides in the MutT-like domain and also gives rise to decreased adenine removal [114].
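The autosomal recessive pattern described for this family can be expressed as a simple genotype rule. The sketch below is illustrative only: it considers just the two substitutions named in the text, uses a hypothetical "WT" label for the normal allele, and treats any biallelic combination as predictive of the phenotype, which ignores penetrance and any other variants.

```python
# The two substitutions named in the text; "WT" is a hypothetical label
# for the normal allele, introduced only for this example.
PATHOGENIC = {"Y165C", "G382D"}
WILD_TYPE = "WT"


def classify_genotype(allele_a: str, allele_b: str) -> str:
    """Describe a two-allele MUTYH genotype."""
    for allele in (allele_a, allele_b):
        if allele != WILD_TYPE and allele not in PATHOGENIC:
            raise ValueError(f"unrecognized allele: {allele}")
    mutant = [a for a in (allele_a, allele_b) if a in PATHOGENIC]
    if not mutant:
        return "homozygous normal"
    if len(mutant) == 1:
        return "heterozygous"
    if mutant[0] == mutant[1]:
        return "homozygous mutant"
    return "compound heterozygous"


def predicted_affected(allele_a: str, allele_b: str) -> bool:
    """Under a recessive model, only biallelic genotypes are predicted affected."""
    return classify_genotype(allele_a, allele_b) in {
        "homozygous mutant",
        "compound heterozygous",
    }


if __name__ == "__main__":
    for genotype in [("WT", "WT"), ("Y165C", "WT"), ("Y165C", "G382D")]:
        print(genotype, classify_genotype(*genotype), predicted_affected(*genotype))
```

The rule reproduces the family data as reported: the compound heterozygous siblings are predicted affected, while homozygous normal and heterozygous relatives are not.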

Since this discovery, which was the first to demonstrate a direct link between a defective human DNA repair gene and predisposition to colorectal cancer, considerable work has established the relationship between mutations in MUTYH and colorectal adenomas and carcinomas, and this disorder is now referred to as MUTYH-associated polyposis (MAP) [105, 107, 120]. Testing for MUTYH variants is currently recommended for patients who have clinical features of familial adenomatous polyposis (FAP) but either do not have inherited mutations in APC or have a family history consistent with recessive inheritance, as is the case for MAP [121].

As seen in Table 2, many variants of MUTYH have been reported in the literature in addition to Y165C and G382D. These variants range from simple one-nucleotide substitutions to insertions/deletions (indels) affecting more than one nucleotide. Some of these variants may give rise to non-functional MUTYH. Many of them, however, are silent, cause missense changes, or are small in-frame indels. Without functional and/or segregation analyses it is difficult to determine the effect of these variants. In some cases, however, it is possible to predict the consequences of a mutation on the basis of its type, its location in the MUTYH sequence, and the corresponding position in the structure of MUTYH.

Examples of variants whose consequences for MUTYH function may be predicted from their position in the structure of MUTYH are P18L [113], V22M [114], and G25D [113]. These three SNPs are located in the RPA binding site in the N-terminus of MUTYH and could interfere with the localization of MUTYH to the site of DNA replication. In other instances, the variants E466X [109] and Y90X [107], which were found in individuals of Indian and Pakistani descent, respectively, encode truncated MUTYH protein. In these variants, X indicates a stop codon, which results in premature termination of protein synthesis and in turn gives rise to non-functional MUTYH protein.
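The positional reasoning used for these variants can be sketched as a small parser for the one-letter notation. This is a minimal sketch under stated assumptions: the domain coordinates and the 535-residue length are those quoted earlier in the text, the function name `describe_variant` is hypothetical, and it is assumed that variant positions use the same residue numbering as those coordinates. The text gives no coordinates for the MutT-like domain, so variants in that region (such as G382D) are not assigned to a domain here.

```python
import re

PROTEIN_LENGTH = 535  # residues in the major isoform, as stated in the text

# Domain coordinates as given in the text (inclusive residue ranges).
DOMAINS = {
    "RPA binding site": (6, 32),
    "HhH": (114, 273),
    "PCNA binding site": (505, 527),
}

# One-letter notation such as P18L or E466X, where X denotes a stop codon.
VARIANT_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def describe_variant(code: str) -> dict:
    """Parse a variant code and report its type and any overlapping domain."""
    match = VARIANT_PATTERN.match(code)
    if match is None:
        raise ValueError(f"cannot parse variant code: {code}")
    reference, position, changed = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= PROTEIN_LENGTH:
        raise ValueError(f"position {position} lies outside the protein")
    if changed == "X":
        kind = "nonsense"
    elif changed == reference:
        kind = "silent"
    else:
        kind = "missense"
    return {
        "position": position,
        "kind": kind,
        "domains": [n for n, (start, end) in DOMAINS.items() if start <= position <= end],
        # A stop at codon N leaves at most N - 1 residues of the protein.
        "residues_before_stop": position - 1 if kind == "nonsense" else None,
    }


if __name__ == "__main__":
    for code in ("P18L", "V22M", "G25D", "Y90X", "E466X", "Y165C", "G382D"):
        print(code, describe_variant(code))
```

A lookup of this kind only flags where a change falls; as the surrounding discussion stresses, it cannot replace functional or segregation analysis.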

Nonetheless, the consequences of most of the missense variants are not obvious. Moreover, some missense variants affect residues that are not conserved in the bacterial MutY enzymes, so it is difficult to make predictions (reviewed in ref. [122]). This, coupled with their collective frequency and the lack of functional data, poses major difficulties for molecular diagnostics, since many will be benign polymorphisms.

Mutations in the MUTYH gene and defective MUTYH activities are just beginning to be identified in human cancers. Consequently, it is obvious that more information about the clinical and molecular properties of MAP is needed to aid in the diagnosis and treatment of affected patients and family members. Such knowledge might also provide insight into how MUTYH mutations contribute more globally to malignancies other than colorectal cancer.