1 Introduction

Chemical modifications of DNA bases (Fig. 1) have fundamental biological roles in virtually every living organism. In both prokaryotes and many eukaryotes, cytosine can be methylated at the carbon-5 (C5) position by cytosine-C5 methyltransferases (MTases) to generate 5-methylcytosine (5mC) (Goll and Bestor 2005; Kumar et al. 1994). In higher eukaryotes, the ten-eleven translocation (TET) family of 5mC dioxygenases uses α-ketoglutarate (αKG) and Fe(II) to oxidize the methyl group of 5mC in discrete, successive reactions, generating 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), and 5-carboxylcytosine (5caC) (Ito et al. 2011; Kriaucionis and Heintz 2009; Tahiliani et al. 2009). In prokaryotes, 5mC and 5hmC can be introduced de novo into the genome during phage invasions, as both modified bases can be synthesized prior to incorporation into the phage genome during DNA synthesis (Warren 1980). After DNA synthesis, phage glucosyltransferases can modify 5hmC within the genome to generate glucosylated 5hmC (5ghmC) (Kornberg et al. 1961; Lehman and Pratt 1960). Beyond cytosine-C5 modifications, the exocyclic amine groups of cytosine and adenine can be methylated in prokaryotes to generate N4-methylcytosine (N4mC) and N6-methyladenine (N6mA) (Cheng 1995; Jeltsch 2002). Crystal structures of DNA modification enzymes solved to date have consistently shown that the target nucleotide is rotated out of the double helix for the reaction, a process called base flipping.

Fig. 1 Chemical modifications of DNA. (a) Cytosine-C5 modifications: enzymes and proteins involved in writing, reading, and erasing the modifications via base-flipping mechanisms. (b) Adenine-N6 methylation: enzymes involved in writing and erasing DNA adenine N6 methylation. (c) Cytosine-N4 methylation

In addition to the modification writers, modified base readers have also been shown to flip the target base for recognition. Mammalian and plant SET- and RING-associated (SRA) domains recognize 5mC within the genome by base flipping (Arita et al. 2008; Avvakumov et al. 2008; Hashimoto et al. 2008; Rajakumara et al. 2011) and have been characterized as nonenzymatic base flippers. Since their first discovery in eukaryotes, SRA domains have also been found in prokaryotes, where they recognize 5mC, 5hmC, and/or 5ghmC to coordinate restriction activity in a modification-dependent manner (Horton et al. 2012, 2014a, b, c). In addition to SRA, the bacterial modified cytosine restriction B (McrB) enzyme also flips 5mC for recognition but is structurally distinct from other known base flippers (Sukackaite et al. 2012). Structural homologs of McrB across different phyla may recognize modified bases in a similar way.

A brief survey of DNA base modifications in both prokaryotes and eukaryotes reveals that two major families of enzymes, methyltransferases and dioxygenases, are involved in writing DNA modifications in the four forms of modified cytosine: 5mC, 5hmC, 5fC, and 5caC. In plants, the 5mC DNA glycosylase repressor of silencing 1 (ROS1) can excise 5mC and 5hmC (in vitro) (Gong et al. 2002; Jang et al. 2014; Hong et al. 2014), and in mammals, thymine DNA glycosylase (TDG) can excise 5fC and 5caC (He et al. 2011; Maiti and Drohat 2011; Zhang et al. 2012; Hashimoto et al. 2012a). These discoveries effectively link the base excision repair pathway to DNA demethylation/demodification, by which epigenetic signals encoded in the modified cytosines can be reversed. DNA glycosylases represent the most structurally diverse family of enzymes that are involved in base flipping (also known as nucleotide flipping) (Brooks et al. 2013). Thus, base flipping is not restricted to writers and readers, but has been adopted by DNA glycosylases for erasing DNA modifications as well. Together, structural characterizations of writers, readers, and erasers of DNA base modifications in prokaryotes and eukaryotes effectively showcase base flipping as a general mechanism for regulating and translating fundamental epigenetic signals.

2 Base Flipping for Methylation of DNA Bases

2.1 Bacterial DNMTs (HhaI, TaqI, PvuII)

Biological methylation participates in a wide range of regulatory processes and uses S-adenosyl-l-methionine (AdoMet) as the primary methyl group donor. The methyl group of AdoMet is bound to a positively charged sulfur atom, which predisposes it to nucleophilic attack. During the methylation reaction, AdoMet donates the methyl group and becomes S-adenosyl-l-homocysteine (AdoHcy). A number of different families of methyltransferases use AdoMet as a cofactor, targeting diverse substrates ranging from small molecules to large macromolecules such as DNA, RNA, proteins, lipids, and polysaccharides. The atoms subjected to methylation also vary, including carbon (C), nitrogen (N), oxygen (O), sulfur (S), and several metals. AdoMet-dependent DNA methyltransferases were first discovered in bacterial restriction-modification systems (Roberts et al. 2015). The known structures of AdoMet-dependent DNA methyltransferases share a common “MTase fold” characterized by a mixed seven-stranded β sheet (6↓ 7↑ 5↓ 4↓ 1↓ 2↓ 3↓) in which strand 7 is inserted between strands 5 and 6, antiparallel to the others (Cheng and Roberts 2001) (Fig. 2a).

Fig. 2 Writers of DNA modifications. (a) Prokaryotic DNA methyltransferases involved in three different types of DNA methylation have similar structures. DNA is colored in red, and AdoMet/AdoHcy is colored in blue. Flipped bases are shown in green. (b) E. coli AlkB and eukaryotic TET are dioxygenases with common structural folds. αKG is colored in blue, and metal in the active sites is colored in orange

M.HhaI was the first DNA methyltransferase to be structurally characterized (Cheng et al. 1993). It contains an N-terminal MTase domain and a C-terminal target recognition domain (TRD) (Cheng 1995). M.HhaI is a cytosine-C5 methyltransferase that methylates the first cytosine within 5′-GCGC-3′ recognition sequences and prevents R.HhaI restriction activity at the site (Roberts et al. 1976, 2015). Before the structure was available, the proposed mechanism predicted that the catalytic Cys81 would make a nucleophilic attack on the C6 of cytosine to form a covalent complex, followed by transferring the methyl group from AdoMet to cytosine-C5 and releasing the covalent intermediate (Wu and Santi 1985, 1987). In 1994, the crystal structure of M.HhaI-DNA complex with AdoMet was solved as a trapped covalent enzyme-DNA intermediate using 5-fluorocytosine and directly supported the proposed mechanism, presenting the catalytic cysteine covalently linked to C6 and showing methylated C5 adjacent to AdoHcy (Klimasauskas et al. 1994). Yet the most striking aspect of the structure was that both the MTase and the TRD of the enzyme work simultaneously to bind DNA and flip the target base into the active site pocket. The mechanism of DNA base access by base flipping has since been described as the framework for other DNA methyltransferases (Cheng and Roberts 2001).

After the first structure of the M.HhaI-DNA complex was solved, crystal structures of many other DNA methyltransferase-DNA complexes followed. Besides cytosine-C5 methylation, adenine exocyclic N6 methylation is also a critical modification in prokaryotic DNA (Fig. 1b) and in eukaryotic RNA (Low et al. 2001; Hattman 2005; Jia et al. 2013; Niu et al. 2013). Recent studies have shown that the Drosophila genome also harbors N6mA in DNA (Zhang et al. 2015). The structure of the adenine-N6 methyltransferase M.TaqI in complex with DNA and a nonreactive AdoMet analog was solved in 2001 (Goedecke et al. 2001). The enzyme methylates adenine within the 5′-TCGA-3′ sequence and has a two-domain organization similar to that of M.HhaI, with the conserved N-terminal MTase domain but a quite distinct C-terminal TRD (Cheng 1995; Goedecke et al. 2001). The ternary structure is remarkably reminiscent of M.HhaI, with a flipped adenine in the active site, where the methyl group of the AdoMet analog is positioned near the N6 of the flipped adenine. Instead of a catalytic cysteine residue as in M.HhaI, the asparagine 105 side chain and the backbone oxygen of the following proline make hydrogen bonds with the adenine-N6 amine group, potentially modulating the direct transfer of the methyl group from AdoMet. A similar mode of interaction is seen in the active site of the T4 phage DNA adenine methyltransferase (T4 Dam), which flips the adenine in the 5′-GATC-3′ sequence and in which an aspartate residue (Asp171) contacts the adenine-N6 (Horton et al. 2005). Besides adenine-N6 methylation, cytosine-N4 methylation is another type of DNA methylation (Fig. 1c). For example, M.PvuII methylates the exocyclic amine of the central cytosine within 5′-CAGCTG-3′ (Blumenthal et al. 1985; Bheemanaik et al. 2006). The structure of M.PvuII is available only in an AdoMet-bound form without DNA, yet it shares many features with other methyltransferases in terms of domain organization and AdoMet interactions (Gong et al. 1997; Bheemanaik et al. 2006).
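
The recognition sequences and target bases named in this section can be summarized in a short Python sketch that locates candidate target positions on one strand of a DNA string. The sites and target bases come from the text; the function name `find_targets`, the table layout, and the example sequence are hypothetical choices made for illustration, and the sketch deliberately ignores the complementary strand and all enzymology.

```python
# Recognition sequence, 0-based offset of the methylated base within it,
# and the resulting modification, as described in the text.
# The table layout itself is an illustrative assumption.
SITES = {
    "M.HhaI": ("GCGC", 1, "5mC"),       # first cytosine of 5'-GCGC-3'
    "M.TaqI": ("TCGA", 3, "N6mA"),      # adenine of 5'-TCGA-3'
    "T4 Dam": ("GATC", 1, "N6mA"),      # adenine of 5'-GATC-3'
    "M.PvuII": ("CAGCTG", 3, "N4mC"),   # central cytosine of 5'-CAGCTG-3'
}


def find_targets(seq, enzyme):
    """Return (position, base, modification) for every site on the given strand."""
    site, offset, modification = SITES[enzyme]
    seq = seq.upper()
    hits = []
    start = seq.find(site)
    while start != -1:
        pos = start + offset
        hits.append((pos, seq[pos], modification))
        # Advance by one so overlapping sites (e.g. GCGCGC) are also found.
        start = seq.find(site, start + 1)
    return hits


if __name__ == "__main__":
    example = "AAGCGCTTGATCCAGCTGTCGA"  # made-up sequence
    for name in SITES:
        print(name, find_targets(example, name))
```

Each hit marks the base that the corresponding enzyme would have to flip out of the helix to reach; the other bases of the site are read in place.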

2.2 Mammalian DNMTs (DNMT1, DNMT3A/3L)

Structural features of classic prokaryotic methyltransferases are extensively shared by the mammalian DNA methyltransferases DNMT1, DNMT3A, and DNMT3B. They are all cytosine-C5 methyltransferases containing an MTase domain with a catalytic cysteine and a TRD. DNMT1 is primarily implicated in methylation of the daughter strand during DNA replication to maintain the methylation pattern encoded in the mother strand by preferentially recognizing hemi-methylated DNA in CpG dinucleotide context (Li et al. 1992; Yoder et al. 1997). On the other hand, DNMT3A and DNMT3B are considered de novo methyltransferases that can methylate CpG sites as well as non-CpG sites (Ramsahoye et al. 2000; Gowher and Jeltsch 2001; Suetake et al. 2003). Such differences in substrate specificities are partly due to the involvement of other domains outside the catalytic fragment. For example, a CXXC domain and a BAH1 domain within DNMT1 hinder methylation of unmethylated CpG sites (Song et al. 2011), whereas DNMT3A and DNMT3B do not contain such domains and can readily methylate them.

2.3 Implications of DNA Methyltransferase Dimers (DNMT3A/3L and EcoP15I)

Besides serving as a catalytic domain, the MTase domain can participate in protein-protein interactions, as exemplified by the DNMT3A MTase domain interacting with the naturally inactive MTase-like domain of DNMT3L, a scaffold protein that binds the histone H3 tail to guide DNMT3A activity by forming a 3L-3A-3A-3L tetramer (Jia et al. 2007; Ooi et al. 2007). Interestingly, a multi-subunit prokaryotic DNA N6mA methyltransferase, EcoP15I, contains a DNA MTase dimer in which one monomer is involved in target base flipping and the other in the recognition of DNA base context (Gupta et al. 2015). Thus, dimerization of two structurally comparable proteins with divergent functions may be a mechanism for fine-tuning genomic DNA modifications.

2.4 Plant DNMTs

Plant DNA MTases have functions similar to those of their mammalian counterparts. Met1 is homologous to mammalian DNMT1 and is responsible for the maintenance of CpG methylation, whereas domain rearranged methyltransferase 2 (DRM2) is involved in de novo DNA methylation (Goll and Bestor 2005; Law and Jacobsen 2010). DRM2 contains a rearranged MTase domain, such that its N-terminal half is equivalent to the C-terminal half of the conventional MTase fold and vice versa. A structural study of a DRM2 family MTase domain has revealed that the rearranged domain still forms a classic MTase structure and functions as a homodimer (Zhong et al. 2014), analogous to the DNMT3A-3L heterodimer. In addition to Met1 and DRM2, plants also have plant-specific DNA methyltransferases, such as CMT2 and CMT3, which are specifically involved in non-CpG methylation (Stroud et al. 2014; Lindroth et al. 2001; Zemach et al. 2013). The higher diversity of the MTase family in plants compared to mammals suggests that DNA methylation may be more dynamically regulated in plants than in mammals.

3 Base Flipping in Oxidative Modifications of Methylated Bases

3.1 Eukaryotic TET Enzymes

5mC is by far the most widely studied modified base. Yet, if 5mC has been considered “the fifth” base, 5hmC is increasingly being labeled “the sixth” base and has garnered much attention recently. The existence of 5hmC in bacteriophage, modified from 2′-deoxycytidine before integration into the viral genome (Warren 1980), was first reported in 1953 (Wyatt and Cohen 1953). In 1993, a novel J base (β-D-glucosyl-hydroxymethyluracil) was discovered in trypanosomes, in which J-binding proteins (JBP1 and JBP2) are involved in oxidizing the C5-methyl group of thymine during J-base synthesis by using αKG and Fe(II) as cofactors to generate 5-hydroxymethyluracil (Gommers-Ampt et al. 1993; Borst and Sabatini 2008). In 2009, the mammalian JBP homologs, the TET enzymes, were discovered to oxidize the methyl group of 5mC to generate 5hmC (Tahiliani et al. 2009; Kriaucionis and Heintz 2009). Subsequent analysis revealed that TET enzymes can further oxidize 5hmC to 5fC and then to 5caC (Ito et al. 2011). TET enzymes have also been shown to convert thymine (5-methyluracil) to 5-hydroxymethyluracil by oxidizing the C5-methyl group of thymine (Pfaffeneder et al. 2014; Pais et al. 2015).
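
The stepwise nature of TET oxidation can be sketched as a small state table in Python. The transitions are those listed in the text (5mC to 5hmC to 5fC to 5caC, and thymine to 5-hydroxymethyluracil, abbreviated here as 5hmU); the names `TET_NEXT` and `tet_oxidize` are hypothetical, and treating 5caC and 5hmU as end points is a simplifying assumption of this sketch rather than a statement about the enzymes.

```python
# One TET oxidation step per entry, following the reactions named in the text.
TET_NEXT = {
    "5mC": "5hmC",
    "5hmC": "5fC",
    "5fC": "5caC",
    "T": "5hmU",  # thymine (5-methyluracil) -> 5-hydroxymethyluracil
}


def tet_oxidize(base, rounds=1):
    """Apply up to `rounds` oxidation steps; stop early at a base with no listed step."""
    for _ in range(rounds):
        if base not in TET_NEXT:
            break  # assumed end point in this sketch (e.g. 5caC)
        base = TET_NEXT[base]
    return base


if __name__ == "__main__":
    for n in range(4):
        print(n, tet_oxidize("5mC", rounds=n))
```

Modeling each round as a separate call mirrors the description of discrete reactions: the product of one round is simply the substrate of the next.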

JBP/TET homologs are present across many eukaryotic organisms, including the amoeboflagellate Naegleria gruberi (Iyer et al. 2013; Hashimoto et al. 2014b). Crystal structures of Naegleria gruberi TET-like (NgTET) and human TET2 (hTET2) in complex with 5mC-, 5hmC-, and 5fC-containing DNA have been characterized (Hashimoto et al. 2014b, 2015a; Hu et al. 2013a, 2015). All TET structures show a flipped base positioned in the active site pocket close to N-oxalylglycine (NOG), an inactive αKG analog, and a divalent metal such as Fe(II) or Mn(II), the combination used to stall the enzyme in the pre-reaction state. Some of the features of flipped base recognition observed in DNMT-DNA complex structures (Cheng and Roberts 2001; Horton et al. 2005) can also be seen in the structures of TET-DNA complexes. The flipped base in the active site of a TET enzyme in complex with DNA is stabilized by π stacking interactions involving an aromatic residue such as Phe295 in NgTET (Hashimoto et al. 2014b) and Tyr1902 in hTET2 (Hu et al. 2013a). Also, polar residues such as Asn147, His297, and Asp234 in NgTET contact O2, N3, and N4, respectively, to guide substrate specificity (Hashimoto et al. 2014b), and the methyl or the hydroxymethyl group is oriented toward NOG and Fe(II)/Mn(II) (Hashimoto et al. 2015a; Hu et al. 2015). Often, active site pockets for flipped bases not only contain residues for base recognition but also specifically orient the base for distinct reactions depending on the type of enzyme. Base flipping is therefore a common mechanism applied by different classes of enzymes, such as AdoMet-dependent methyltransferases and αKG- and Fe(II)-dependent dioxygenases, to recognize and stabilize the target base for specific reactions.

3.2 AlkB and Homologs

Similar to TET enzymes, eukaryotic homologs of E. coli AlkB such as FTO and ALKBH5 are also αKG- and Fe(II)-dependent dioxygenases that can oxidize the methyl group of N6mA within mRNA to yield demethylated adenine (Jia et al. 2011; Zheng et al. 2013; Zhu and Yi 2014). Indeed, TET-DNA complex structures are remarkably comparable to the structure of the AlkB-DNA complex, and both TET enzymes and AlkB homologs perform base flipping as part of their reaction mechanism (Hu et al. 2013a; Iyer et al. 2013; Hashimoto et al. 2014b; Zhu and Yi 2014; McDonough et al. 2010). Common structural folds include two twisted β-sheets in the core where the active site is formed (Fig. 2b). However, the two enzyme families differ in an important way. TET enzymes oxidize a CH3 group attached to an inert carbon atom (cytosine or thymine C5). The resulting product, 5hmC (or 5hmU), is very stable and can undergo further oxidation in subsequent rounds of reaction to generate more highly oxidized products. On the other hand, FTO and ALKBH5 likely generate an N6-hydroxymethyladenine intermediate in which the oxidized carbon is attached to a reactive nitrogen atom (adenine-N6). This intermediate spontaneously releases the hydroxymethyl group as formaldehyde and decomposes to adenine, the final “demethylated” product (Hashimoto et al. 2015b) (Fig. 1b). Therefore, AlkB and its homologs are demethylases, whereas TET enzymes should not be designated as demethylases but are more appropriately understood as “writers” that generate additional marks on 5mC within genomes to alter epigenetic signals.

Several biochemical observations suggest that modified cytosines beyond 5mC may form distinct epigenetic signals. Many 5mCpG readers such as methyl-CpG binding domain (MBD) proteins have shown significantly reduced binding affinity toward 5hmC when compared to 5mC within CpG context (Hashimoto et al. 2012b), whereas some proteins may preferentially bind 5hmC (Spruijt et al. 2013). DNMT1 has a significantly reduced affinity toward hemi-hydroxymethylated DNA substrate compared to hemi-methylated DNA (Hashimoto et al. 2012b), suggesting that methylation marks altered by TET enzymes can be lost in subsequent DNA replications. In addition, the RNA polymerase II transcription rate can be specifically reduced by 5fC and 5caC (Kellinger et al. 2012; Wang et al. 2015). These findings strongly point to the possibility that modifications beyond 5mC are distinct signals, and much future work is needed to elucidate how the modified bases are differently implicated in larger biological contexts.

4 Base Flipping in the Recognition of Modified Bases

4.1 Eukaryotic SRA Domains

The function of 5mC and N6mA in prokaryotes was classically understood in the context of restriction-modification systems, in which methylated bacterial DNA is protected from restriction digestion (Wilson and Murray 1991). Effects of DNA methylation are fundamentally determined by the way the methyl groups alter various protein-DNA interactions. In eukaryotes, genomic 5mC bases are widely involved in various regulatory processes to control gene expression, chromatin states, and genomic stability that are highly relevant in the human disease context (Robertson 2005). Such penetrating biological implications can be partly attributed to a large number of protein-DNA interactions that are potentially affected by DNA methylation in a direct manner. Evidence shows that several transcription factors are prevented from DNA binding when the binding site is methylated (Tate and Bird 1993), whereas several MBD family proteins are specific 5mCpG readers, as previously mentioned (Klose and Bird 2006). The interface between methylated DNA and its biological effects can be further complicated by the involvement of the nucleosome context which is closely interwoven with DNA methylation (Hashimoto et al. 2010; Cedar and Bergman 2009).

The initial discovery of 5mC-binding proteins raised the possibility that other readers are involved in modified base recognition. In 2007, another family of 5mC readers was discovered in plants as a part of VIM1 and was termed the SET- and RING-associated (SRA) domain (Woo et al. 2007). The mammalian homolog of VIM1 is Uhrf1, which can associate with DNMT1 during post-replicative maintenance of DNA methylation (Bostick et al. 2007). In the following year, three crystal structures of the mammalian Uhrf1 SRA domain in complex with 5mC-containing DNA were reported (Hashimoto et al. 2008; Avvakumov et al. 2008; Arita et al. 2008). The structures revealed that SRA recognizes 5mC by base flipping, although it is not a DNA-modifying enzyme like the methyltransferases or dioxygenases. SRA is also structurally distinct from other base flippers and is characterized by a twisted β-sheet fold resembling a half-moon shape (Fig. 3a). Remarkably, the 5mC-binding pocket of SRA features familiar modes of base recognition, exemplified by π stacking interactions, recognition of N3 and N4 by the Asp474 side chain, and a van der Waals contact between the C5-methyl group of the flipped 5mC and the Cβ of Ser486.

Fig. 3 Readers of DNA modifications. (a) Prokaryotic and eukaryotic SRA domains recognize C5-modified cytosine via base flipping and are similar in structures. (b) Crystal structure of prokaryotic McrB-N monomer flipping 5mC

Interestingly, the SRA of Uhrf2 binds 5hmC with a slightly higher preference compared to 5mC, and the crystal structure of Uhrf2 SRA in complex with 5hmC-containing DNA is available (Zhou et al. 2014). In the structure, 5hmC is flipped and stabilized, and the OH moiety of the hydroxymethyl group is contacted by the backbone carbonyl groups of Thr508 and Gly509 in the active site pocket which is slightly larger in size compared to that of Uhrf1 SRA. Therefore, the eukaryotic SRA domain has been characterized as a base-flipping domain that recognizes both 5mC and 5hmC.

4.2 Prokaryotic SRA Domains

Recently, SRA domains have been rediscovered in prokaryotes in families of modification-dependent restriction enzymes that recognize modified bases and introduce a double-stranded break some distance away. MspJI was among the first such enzymes to be reported; it recognizes hemi-modified 5mC or 5hmC through its N-terminal SRA-like domain and cleaves the DNA with its C-terminal endonuclease domain (Cohen-Karni et al. 2011). The crystal structure of MspJI has been solved with substrate DNA, revealing an SRA-like structure in the N-terminal modification recognition domain that flips the target 5mC (Cohen-Karni et al. 2011; Horton et al. 2014c). Despite the lack of amino acid sequence conservation between the eukaryotic Uhrf1/2 SRA and the MspJI SRA, all SRA domains feature a twisted β-sheet fold with a half-moon shape (Fig. 3a).

As more modification-dependent restriction enzymes have been identified, some have been found to differ in their specificities toward 5mC, 5hmC, and 5ghmC. AbaSI, unlike MspJI, has an N-terminal Vsr-like endonuclease domain and a C-terminal SRA-like domain (Borgaro and Zhu 2013; Horton et al. 2014a). Its SRA domain seems to preferentially recognize 5ghmC and 5hmC compared to 5mC, as the relative rate of cleavage of DNA containing the corresponding modification is 5ghmC:5hmC:5mC = 8000:500:1 (Wang et al. 2011). The structural features within SRA domains that fine-tune such specificities await future characterization.
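
To make the reported AbaSI cleavage ratio concrete, the pairwise fold preferences can be worked out directly. The snippet below is only arithmetic on the 8000:500:1 ratio quoted above; the names `RELATIVE_RATE` and `fold_preference` are hypothetical.

```python
# Relative cleavage rates for AbaSI as quoted in the text (5ghmC:5hmC:5mC).
RELATIVE_RATE = {"5ghmC": 8000, "5hmC": 500, "5mC": 1}


def fold_preference(preferred, other):
    """Ratio of relative cleavage rates between two modified bases."""
    return RELATIVE_RATE[preferred] / RELATIVE_RATE[other]


if __name__ == "__main__":
    print(fold_preference("5ghmC", "5hmC"))  # 16.0
    print(fold_preference("5hmC", "5mC"))    # 500.0
    print(fold_preference("5ghmC", "5mC"))   # 8000.0
```

Read this way, the discrimination between 5hmC and 5mC (500-fold) is far larger than that between 5ghmC and 5hmC (16-fold).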

4.3 McrB-N as Distinct 5mC Reader

Modification-dependent restriction enzymes also utilize yet another 5mC recognition domain (Fig. 3b). The N-terminus of McrB (McrB-N) recognizes 5mC next to adenine within 5′-ACCGGT-3′ sequences, and McrC associates with McrB to provide endonuclease activity (Sutherland et al. 1992; Gast et al. 1997). The crystal structure of McrB-N in complex with 5mC-containing DNA shows a flipped 5mC in the binding pocket, revealing a novel fold distinct from that of any other known base flipper (Sukackaite et al. 2012). The pocket displays the familiar π stacking of the flipped 5mC with aromatic residues and a van der Waals contact between the C5-methyl group and the side chain of Leu68. So far, SRA is the only known modified base reader in eukaryotes that flips the target base, and no eukaryotic homolog of McrB-N has been identified. However, the history of the discovery of base flippers suggests a strong possibility that structural homologs of McrB-N are present in a wide spectrum of phyla.

4.4 DpnI as N6mA Reader

While base flipping seems to be a major mechanism by which a modified DNA base can be recognized, it should be noted that modified bases can also be recognized by some transcription factors in a sequence-dependent context (Spruijt et al. 2013; Hu et al. 2013b), none of which involves base flipping. Along with the previously mentioned MBD family proteins that recognize 5mC within the simple dinucleotide CpG sequence, certain mammalian zinc-finger family proteins such as Kaiso (Buck-Koehntop et al. 2012), Zfp57 (Liu et al. 2012), Klf4 (Liu et al. 2014), and Egr1 (Hashimoto et al. 2014a) bind 5mC within specific sequences via a common structural motif (Liu et al. 2013; Hashimoto et al. 2015b). In addition, another zinc-finger transcription factor, WT1 (Hashimoto et al. 2014a), and the basic helix-loop-helix (bHLH) family Tcf3-Ascl1 heterodimer (Golla et al. 2014) can specifically bind 5caC within their consensus sequences. In prokaryotes, DpnI harbors a C-terminal winged-helix (WH) domain that recognizes the methyl group of N6mA within the 5′-GATC-3′ sequence through van der Waals interactions involving Trp138 (Mierzejewska et al. 2014). Therefore, DNA modifications may regulate transcription factor-binding sites in much more dynamic and selective ways than previously understood.

5 Base Flipping in Removing Modified and Unmodified Bases

5.1 Mammalian Thymine DNA Glycosylase (TDG)

The discovery of TET-generated modified cytosine bases has provided fresh insight into a long sought-after pathway of 5mC demethylation/demodification within mammalian genomes (reviewed in Zhu 2009). In the base excision repair pathway, DNA glycosylases cleave the glycosidic bond between the deoxyribose and the target base and represent the most structurally diverse family of base-flipping enzymes (Brooks et al. 2013). Initially, it was hypothesized that 5mC is removed by 5mC DNA glycosylase(s), as mammalian 5mC DNA glycosylase activity had been reported (Vairapandi and Duker 1993, 1996; Vairapandi et al. 2000). However, the glycosylase involved was never identified. After the discovery of TET enzymes, mammalian TDG, which generally removes uracil or thymine mismatched to guanine, was surprisingly revealed to excise 5fC and 5caC to establish genome-wide DNA demethylation (He et al. 2011; Maiti and Drohat 2011; Hashimoto et al. 2012a; Zhang et al. 2012). The crystal structure of the human TDG catalytic domain in complex with 5caC-containing DNA was also solved (Fig. 4a), presenting the flipped base in the active site, where the C5-carboxyl moiety of 5caC is specifically recognized by the side chain of Asn157 and the backbone amide of Tyr152 (Zhang et al. 2012). The discovery that TDG excises 5fC and 5caC has effectively linked the base excision repair pathway to DNA demethylation in mammalian systems.

Fig. 4 Erasers of DNA modifications. (a) Crystal structure of human TDG flipping 5caC opposite guanine. (b) Crystal structure of Geobacillus stearothermophilus endonuclease III in complex with DNA. Iron-sulfur cluster is colored in orange and yellow

5.2 Plant ROS1

In plants, paradoxically, bona fide 5mC DNA glycosylases were clearly demonstrated and identified in 2002 (Gong et al. 2002), approximately a decade before TET and TDG were implicated in DNA demethylation. In Arabidopsis, four closely related 5mC DNA glycosylases exist: ROS1, DME, DML2, and DML3 (Gong et al. 2002; Morales-Ruiz et al. 2006; Gehring et al. 2006; Ortega-Galisteo et al. 2008). They have a catalytic glycosylase domain homologous to E. coli endonuclease III (Fig. 4b), a helix-hairpin-helix (HhH) fold DNA glycosylase that harbors an iron-sulfur cluster-binding site and excises damaged pyrimidines (Ponferrada-Marin et al. 2009, 2011; Mok et al. 2010). Both ROS1 and DME have been shown to excise 5mC in vivo and in vitro (Gong et al. 2002; Ponferrada-Marin et al. 2009; Gehring et al. 2006; Mok et al. 2010), and they are shown to excise 5hmC, but not 5fC and 5caC in vitro (Hong et al. 2014; Jang et al. 2014; Brooks et al. 2014). Thus, plant ROS1 and mammalian TDG have mutually exclusive substrate specificities for 5mC, 5hmC, 5fC, and 5caC; the first two are substrates for ROS1 and the latter two for TDG (Hashimoto et al. 2012a; Hong et al. 2014). One of the most surprising aspects of plant 5mC DNA glycosylases is that they excise the target base only when both the catalytic glycosylase domain and the C-terminal domain are present (Hong et al. 2014; Mok et al. 2010). The C-terminal domain of ROS1 is conserved only among plant 5mC DNA glycosylases and has been shown to strongly associate with the catalytic domain, suggesting that domain-domain interactions are important for target base recognition and excision (Hong et al. 2014).
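
The mutually exclusive substrate ranges of ROS1 and TDG described above can be written down as a simple lookup in Python. The substrate sets are restricted to the four C5-modified cytosines discussed in the text (5hmC excision by ROS1 being an in vitro observation); the names `EXCISED_BY` and `erasers_for` are hypothetical, and other substrates of either glycosylase are deliberately left out of this sketch.

```python
# Substrates among the four C5-modified cytosines, per the text.
# Other substrates of these glycosylases are intentionally omitted here.
EXCISED_BY = {
    "ROS1": {"5mC", "5hmC"},  # plant 5mC DNA glycosylase (5hmC shown in vitro)
    "TDG": {"5fC", "5caC"},   # mammalian thymine DNA glycosylase
}


def erasers_for(base):
    """Return the glycosylases in the table that excise the given modified base."""
    return sorted(name for name, substrates in EXCISED_BY.items() if base in substrates)


if __name__ == "__main__":
    for base in ("5mC", "5hmC", "5fC", "5caC"):
        print(base, erasers_for(base))
    # No modified cytosine is shared between the two enzymes in this table.
    print(EXCISED_BY["ROS1"].isdisjoint(EXCISED_BY["TDG"]))
```

The disjoint sets make the contrast explicit: in this simplified picture, the two less oxidized forms are handled by ROS1 and the two more oxidized forms by TDG.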

While TDG and ROS1 have been clearly implicated in DNA demethylation pathways, the jury is still out on whether other pathways contribute to DNA demethylation. In addition to the previously mentioned mammalian 5mC DNA glycosylase activities, 5hmC DNA glycosylase activity was observed in a calf thymus extract (Cannon et al. 1988). A recent proteomic study has revealed that several mammalian DNA glycosylases such as NTH1, OGG1, NEIL1, and NEIL2 bind 5mC- and 5hmC-containing DNA in a modification-specific manner (Spruijt et al. 2013), though by themselves they do not have glycosylase activity against 5mC or 5hmC (Hong et al. 2014).

The 5mC DNA glycosylase activity by ROS1 is interesting from a standpoint of historical characterization of DNA glycosylases as DNA damage repair enzymes. In a given genome, there can be many types of damaged bases, and their diversity is on par with many classes of DNA glycosylases that are structurally distinct (Brooks et al. 2013). On the other hand, 5mC in plants is not considered a damaged base and exists in substantial amounts in the Arabidopsis genome (Zhang et al. 2006). Thus, ROS1 must be regulated and specifically targeted to a certain genomic location to initiate DNA demethylation (Zheng et al. 2008; Qian et al. 2012). In addition to 5mC, ROS1 is comparably active on thymine mismatched to guanine and on some damaged pyrimidines, suggesting that ROS1 can be involved in both DNA demethylation and DNA damage repair (Ponferrada-Marin et al. 2009, 2010). Such dual functionality can be applied to TDG as well, which not only excises thymine or uracil mismatched to G during the process of DNA mismatch repair but also excises 5fC and 5caC base paired with guanine for DNA demodification in mammals (He et al. 2011; Maiti and Drohat 2011; Hashimoto et al. 2012a; Zhang et al. 2012).

5.3 Archaeal PabI Activity as Adenine DNA Glycosylase

Interestingly, the archaeal Pyrococcus abyssi PabI enzyme was initially thought to be a restriction endonuclease but has recently been re-characterized as a sequence-specific adenine DNA glycosylase (Miyazono et al. 2014). PabI is comparable to the MutY family of mismatch repair DNA glycosylases, which excise a target adenine mismatched to 8-oxoguanine (Fromme et al. 2004). However, PabI is remarkably distinct from MutY, because PabI excises adenine that is correctly base paired to thymine in a targeted manner. It is therefore possible that DNA glycosylases have adapted to function in processes beyond DNA damage repair by removing benign bases for various forms of biological regulation.

Conclusions

First observed in 1994 in the crystal structure of M.HhaI with DNA, base flipping is now understood as a common mode of protein-DNA/RNA interaction adopted by structurally and functionally distinct classes of proteins across various phyla. Base flipping is the only known mechanism for establishing DNA modifications in a targeted manner via DNA methyltransferases and TET dioxygenases. SRA, once considered a eukaryote-specific base-flipping 5mC reader, was later shown to be widely prevalent in prokaryotic systems for recognizing several modified bases including 5mC, 5hmC, and 5ghmC. In addition to SRA, more structurally diverse classes of modified base readers have been discovered in prokaryotes, such as the base-flipping 5mC reader McrB-N and the N6mA-recognizing WH domain of DpnI (which uses a non-base-flipping mechanism). Also, DNA glycosylases are base flippers primarily characterized as DNA repair enzymes, though not all DNA glycosylases flip a base/nucleotide for base excision, as shown by the very recent example of bacterial AlkD (Mullins et al. 2015). Today, DNA demodification is considered a bona fide output of the base excision repair pathway through DNA glycosylases such as mammalian TDG and plant ROS1, whose mechanism of action again involves base flipping. In an era in which DNA modifications are considered critical and increasingly complex epigenetic signals, this simple but elegant structural mechanism for protein-DNA interaction is preserved as a truly ubiquitous framework.