Introduction

Mapping the epigenetic modifications of DNA and RNA is becoming increasingly crucial for understanding their diverse biological functions. At least 17 and 160 types of chemical modifications have been discovered in DNA and RNA, respectively (Raiber et al., 2017; Boccaletto et al., 2018). DNA modifications play important roles in many biological processes and diseases, including development (Greenberg and Bourc’his, 2019), aging (Unnikrishnan et al., 2019) and cancer (Koch et al., 2018). These modifications do not interfere with Watson-Crick base pairing, but because they lie in the major groove of the double helix, they can affect DNA-protein interactions. In the mammalian genome, methylation at the 5th carbon of cytosine (5-methylcytosine, or 5mC) is the predominant DNA modification and is also called the “fifth base” (Greenberg and Bourc’his, 2019). This methylation is catalyzed by DNA methyltransferases (DNMTs) and occurs mostly at symmetrical CpG dinucleotides, although a small percentage of methylation at CHG and CHH sequences (where H corresponds to A, T or C) is also observed in embryonic stem (ES) cells. Although levels differ between tissues, mammalian genomes exhibit particularly high CpG methylation, with 70% to 80% of CpGs methylated (Li and Zhang, 2014).

Other modifications besides 5mC have also been found in mammalian DNA. In 2009, two groups independently reported the existence of 5-hydroxymethylcytosine (5hmC) in the mammalian genome, which is now widely accepted as the “sixth base”. Tahiliani et al. showed that the ten-eleven translocation 1 (TET1) enzyme catalyzes the conversion of 5mC to 5hmC (Tahiliani et al., 2009), while Kriaucionis and Heintz demonstrated the presence of 5hmC in mouse brain (Kriaucionis and Heintz, 2009). Successive TET-mediated oxidation of 5hmC yields 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC) (He et al., 2011; Ito et al., 2011; Pfaffeneder et al., 2011). These two oxidation products are hypothesized to be intermediates in an active DNA demethylation pathway, in which they are excised by thymine DNA glycosylase (TDG) and replaced with unmodified cytosine through the base excision repair (BER) pathway (He et al., 2011).

Similar to DNA, cellular RNA is decorated with diverse chemical modifications, which participate in all aspects of RNA metabolism. The multitude of RNA modifications adds a new layer to gene regulation, giving rise to the emerging field of “RNA epigenetics” or “epitranscriptomics” (He, 2010; Frye et al., 2016; Gilbert et al., 2016; Roundtree et al., 2017). Recently developed high-throughput sequencing technologies for detecting RNA modifications have greatly accelerated functional studies in epitranscriptomics (Li et al., 2016b). Here, we primarily focus on mRNA modifications, including N6-methyladenosine (m6A), N6,2’-O-dimethyladenosine (m6Am), N1-methyladenosine (m1A), 5-methylcytosine (m5C), 5-hydroxymethylcytosine (hm5C), N4-acetylcytidine (ac4C), pseudouridine (Ψ) and N7-methylguanosine (m7G).

Growing interest in the functions of DNA and RNA modifications and their underlying molecular mechanisms has driven progress in chemical and biochemical tools for detecting specific modifications within genomes and transcriptomes. Conversely, these new technologies have expanded our knowledge of DNA and RNA modifications. In this review, we mainly focus on high-throughput detection strategies for DNA (Fig. 1) and RNA (Fig. 2) modifications, the biological findings they have enabled, and the questions that remain to be addressed.

Figure 1

Single-base resolution methods for quantitatively profiling mammalian DNA modifications of cytosine. For 5-methylcytosine (5mC) mapping, there are three methods: whole-genome bisulfite sequencing (WGBS), TET-assisted pyridine borane sequencing (TAPS) and enzymatic methyl-sequencing (EM-Seq). For single-base resolution mapping of 5-hydroxymethylcytosine (5hmC), there are four methods: oxidative bisulfite sequencing (oxBS-Seq), TET-assisted bisulfite sequencing (TAB-Seq), APOBEC-coupled epigenetic sequencing (ACE-Seq) and chemical-assisted C-to-T conversion of 5hmC sequencing (hmC-CATCH). Four sequencing methods map 5-formylcytosine (5fC): chemically assisted bisulfite sequencing (fCAB-Seq), reduced BS-Seq (redBS-Seq), M.SssI methylase-assisted bisulfite sequencing (MAB-Seq) and 5fC cyclization-enabled C-to-T transition (fC-CET). Chemical modification-assisted bisulfite sequencing (CAB-Seq) is a single-base resolution sequencing method for mapping 5caC.

Figure 2

Chemical structures, modification enzymes and high-throughput detection strategies of modifications in the transcriptome

Base-resolution sequencing for DNA modifications

Base-resolution sequencing of the predominant DNA modification: 5mC

Bisulfite sequencing (BS-Seq) is regarded as the gold standard for 5mC detection. It exploits the differential reactivity of bisulfite toward methylated C (5mC and 5hmC) and unmethylated C: bisulfite deaminates unmethylated C to uracil (U), whereas methylated C resists deamination. After PCR amplification, methylated C is therefore read as C and unmethylated C as T. Whole-genome bisulfite sequencing (WGBS), the whole-genome implementation of BS-Seq, has been widely used for DNA methylation profiling because it provides single-base resolution with full genome coverage (Fig. 1). In WGBS, the methylation status at each genomic position is read out as digital counts of C and T, and the conversion efficiency of unmethylated C exceeds 99%; this combination of high resolution and precision has made WGBS the most widely accepted method for charting the DNA methylation landscape (Adey and Shendure, 2012; Kobayashi and Kono, 2012; Yamaguchi et al., 2012; Kobayashi et al., 2013; Shirane et al., 2013). However, the harsh bisulfite treatment degrades the majority of the DNA (Tanaka and Okamoto, 2007), which severely limits its application to precious low-input DNA samples, despite several efforts to improve DNA recovery (Smallwood et al., 2014; Clark et al., 2017). Moreover, because unmethylated cytosines, which account for nearly 95% of all cytosines in the mammalian genome, are converted to thymine, bisulfite treatment reduces the sequence complexity of the template DNA, leading to low mapping rates, uneven genome coverage and inherent biases. Last but not least, bisulfite conversion cannot distinguish between 5mC and 5hmC, because it reports only their combined signal.
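To make the digital C/T readout concrete, the following Python sketch models BS-Seq under the simplifying assumption of complete, error-free conversion. The names `BS_READOUT`, `bs_readout` and `methylation_level` are illustrative and not part of any published pipeline; real methylation callers must also handle strand, sequencing errors and incomplete conversion.

```python
# Illustrative model of BS-Seq readout, assuming complete, error-free conversion.

# After bisulfite treatment and PCR, unmodified C is deaminated to U and
# read as T, whereas 5mC and 5hmC resist deamination and are read as C.
BS_READOUT = {"C": "T", "5mC": "C", "5hmC": "C"}


def bs_readout(state: str) -> str:
    """Base reported at a cytosine position after BS-Seq."""
    return BS_READOUT[state]


def methylation_level(c_count: int, t_count: int) -> float:
    """Estimate the combined 5mC + 5hmC level at one site.

    Each read contributes a digital C or T call, so the level is the
    fraction of C calls among all C/T calls covering the site.
    """
    total = c_count + t_count
    if total == 0:
        raise ValueError("site has no C/T coverage")
    return c_count / total


# 30 reads cover a CpG site: 24 report C and 6 report T.
print(f"{methylation_level(24, 6):.2f}")  # prints 0.80
# 5mC and 5hmC give identical readouts, so BS-Seq alone cannot separate them.
print(bs_readout("5mC") == bs_readout("5hmC"))  # prints True
```

Because 5mC and 5hmC map to the same base, the estimated level is necessarily their combined signal, which is the limitation noted above.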

Recently, two bisulfite-free, whole-genome, base-resolution DNA methylation sequencing methods have been developed as alternatives to WGBS. TET-assisted pyridine borane sequencing (TAPS) (Liu et al., 2019b) detects 5mC and 5hmC: both are first oxidized by TET to 5caC, then reduced to dihydrouracil (DHU) by pyridine borane with around 98% conversion efficiency, and finally read out as T after PCR amplification (Fig. 1). Note that the conversion rates of TAPS and bisulfite sequencing measure different things: TAPS converts modified C, whereas bisulfite converts unmodified C. In a like-for-like comparison, TAPS has a lower false-positive rate (unmodified C falsely detected as modified) than bisulfite sequencing (0.23% versus 0.6%). TAPS also achieves a higher mapping rate and quality, more even coverage and lower sequencing cost than bisulfite sequencing. Because it relies on mild enzymatic and chemical reactions, TAPS works effectively with as little as 1 ng of genomic DNA and with circulating cell-free DNA, illustrating its potential for clinical applications involving challenging samples. As a nondestructive method, TAPS preserves DNA fragments longer than 10 kilobases, which enabled the recent development of targeted long-read TAPS (lrTAPS) (Liu et al., 2020) for accurate long-range methylation sequencing and phasing with third-generation sequencing technologies such as Nanopore and SMRT sequencing. Variants of TAPS allow selective sequencing: adding β-glucosyltransferase (βGT) to protect 5hmC enables 5mC-only sequencing (TAPSβ), whereas replacing the TET enzyme with potassium perruthenate for oxidation enables 5hmC-only sequencing (chemical-assisted pyridine borane sequencing, CAPS) (Liu et al., 2019b).
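The inverted readout of TAPS, and why its conversion rate is not directly comparable with that of bisulfite sequencing, can be sketched as follows. This is a minimal illustration that assumes a simple two-rate error model of our own choosing (`corrected_level` is hypothetical, not the correction used in the TAPS study). The 0.23% false-positive rate comes from the text; the 2% false-negative rate is an assumption loosely derived from the ~98% conversion efficiency.

```python
# Illustrative model of TAPS readout and a simple error correction.
# The error model in corrected_level is an assumption for illustration only.

# TAPS: 5mC/5hmC -> (TET) 5caC -> (pyridine borane) DHU -> read as T.
# Unmodified C is untouched and read as C: the inverse of BS-Seq.
TAPS_READOUT = {"C": "C", "5mC": "T", "5hmC": "T"}


def taps_level(c_count: int, t_count: int) -> float:
    """Raw 5mC + 5hmC level at a site: the fraction of T calls."""
    total = c_count + t_count
    if total == 0:
        raise ValueError("site has no C/T coverage")
    return t_count / total


def corrected_level(observed: float, false_pos: float, false_neg: float) -> float:
    """Correct an observed level for imperfect conversion.

    Assumed model: observed = true * (1 - false_neg) + (1 - true) * false_pos,
    where false_pos is the rate at which unmodified C is called modified and
    false_neg the rate at which modified C escapes conversion.
    """
    true = (observed - false_pos) / (1.0 - false_neg - false_pos)
    return min(1.0, max(0.0, true))  # clamp sampling noise into [0, 1]


# An unmethylated site observed exactly at the 0.23% false-positive rate
# corrects to zero.
print(corrected_level(0.0023, false_pos=0.0023, false_neg=0.02))  # prints 0.0
```

Under this model, a lower false-positive rate matters most at lowly methylated sites, where the background term dominates the observed signal.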

Another bisulfite-free method, the enzyme-based Enzymatic Methyl-Seq (EM-Seq) developed by New England Biolabs, first uses TET2 to oxidize 5mC (ultimately to 5caC) and βGT to glucosylate 5hmC to 5gmC. These products are protected in the next step from deamination by APOBEC3A, a DNA deaminase of the AID/APOBEC family, whereas unmodified C is deaminated to U (Fig. 1). EM-Seq showed higher mapping efficiency and more uniform GC coverage than BS-Seq. However, because EM-Seq still converts all unmodified C to U, it shares the low sequence complexity of bisulfite libraries, and with a low DNA input of 100 pg the PCR duplicate rate reached 84.5% while only 10.8% of reads were usable (Vaisvila et al., 2019).
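The loss of sequence complexity shared by WGBS and EM-Seq can be illustrated with a toy calculation: convert every unprotected C to T in silico and compare per-base Shannon entropy before and after. The 16-nt sequence and the helper names are hypothetical, and entropy serves here only as a rough proxy for the alphabet reduction that hampers read mapping.

```python
import math
from collections import Counter


def convert_unmodified(seq: str, protected: set[int]) -> str:
    """In silico conversion of one strand: every C whose 0-based position is
    not in `protected` (i.e. not 5mC/5hmC) is read as T."""
    return "".join(
        "T" if base == "C" and i not in protected else base
        for i, base in enumerate(seq)
    )


def shannon_entropy(seq: str) -> float:
    """Per-base Shannon entropy in bits; 2.0 is the maximum for 4 letters."""
    n = len(seq)
    return -sum((k / n) * math.log2(k / n) for k in Counter(seq).values())


toy = "ACGT" * 4  # hypothetical 16-nt sequence with balanced composition
print(shannon_entropy(toy))  # prints 2.0
converted = convert_unmodified(toy, protected=set())
print(converted)  # prints ATGTATGTATGTATGT
print(shannon_entropy(converted))  # prints 1.5: C has left the alphabet
```

Protecting a few positions (for example, methylated CpGs) restores some C, but because most genomic cytosines are unmethylated, the converted library is dominated by a three-letter alphabet.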

Base-resolution sequencing of the “sixth base”: 5hmC

In-depth investigation of the biological functions of 5hmC requires elucidating its distribution in genomes, preferably at single-nucleotide resolution. Two modified BS-Seq methods have been developed for mapping 5hmC. Oxidative bisulfite sequencing (oxBS-Seq) (Booth et al., 2012) is based on selective, quantitative chemical oxidation of 5hmC by potassium perruthenate (KRuO4) to 5fC, which is then converted to U by bisulfite treatment, giving an overall 5hmC-to-U conversion rate of 94.5% (Fig. 1). Because oxBS-Seq reads only 5mC as C, whereas BS-Seq reads both 5mC and 5hmC as C, the absolute level and precise position of 5hmC are determined by subtracting the oxBS-Seq signal from the BS-Seq signal, as demonstrated in mouse embryonic stem cells (Booth et al., 2012, 2013). High sequencing depth is required for high-confidence 5hmC mapping with oxBS-Seq, because each estimate is the difference between two independent, random-sampling-based BS-Seq measurements.
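The oxBS-Seq subtraction, and why it demands deep sequencing, can be illustrated with a short Python sketch. It assumes complete conversion and treats the BS-Seq and oxBS-Seq libraries as independent binomial samples; the function names and read counts are hypothetical.

```python
import math


def c_fraction(c_count: int, t_count: int) -> float:
    """Fraction of C calls among C/T calls at one site."""
    return c_count / (c_count + t_count)


def oxbs_5hmc(bs: tuple[int, int], oxbs: tuple[int, int]) -> tuple[float, float]:
    """Estimate 5hmC and its standard error from (C, T) counts at one site.

    BS-Seq reads 5mC + 5hmC as C; oxBS-Seq reads only 5mC as C, because
    5hmC is oxidized to 5fC and deaminated. The difference estimates 5hmC.
    The standard error assumes two independent binomial samples.
    """
    p_bs, n_bs = c_fraction(*bs), sum(bs)
    p_ox, n_ox = c_fraction(*oxbs), sum(oxbs)
    estimate = p_bs - p_ox  # can dip below zero through sampling noise alone
    se = math.sqrt(p_bs * (1 - p_bs) / n_bs + p_ox * (1 - p_ox) / n_ox)
    return estimate, se


# Same underlying levels (80% C in BS-Seq, 70% C in oxBS-Seq) at two depths.
for bs, ox in [((80, 20), (70, 30)), ((800, 200), (700, 300))]:
    est, se = oxbs_5hmc(bs, ox)
    print(f"depth {sum(bs)}: 5hmC = {est:.2f} +/- {se:.3f}")
```

At 100 reads per library, the standard error (about 0.06) is comparable to the 0.10 5hmC estimate itself; a tenfold increase in depth shrinks it to about 0.02.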

TET-assisted bisulfite sequencing (TAB-Seq) is an approach in which 5hmC is first glucosylated by βGT and 5mC is then oxidized to 5caC by TET1 (Fig. 1). In the subsequent bisulfite treatment, 5hmC is read as C while C, 5mC, 5fC and 5caC are read as T, allowing direct mapping of 5hmC at single-base resolution (Yu et al., 2012a, b). TAB-Seq achieves over 96% conversion of 5mC to T in genomic DNA while protecting over 90% of 5hmC from conversion. It has been used to generate a high-confidence map of the widespread distribution of 5hmC across the whole genome of mouse embryonic stem cells (mESCs) and to reveal strand asymmetry and sequence bias at 5hmC sites. TAB-Seq analysis also showed that 5hmC is highly enriched at distal regulatory elements.

More recently, two bisulfite-free approaches have been developed to map 5hmC at base resolution. Chemical-assisted C-to-T conversion of 5hmC sequencing (hmC-CATCH) selectively oxidizes 5hmC to 5fC with potassium ruthenate (K2RuO4), with a conversion efficiency of ~94%, and then chemically labels the resulting 5fC so that it is read as T after PCR (Zeng et al., 2018) (Fig. 1). hmC-CATCH thus detects 5hmC directly as T without affecting unmodified C or 5mC. Potassium ruthenate was shown to cause less DNA damage than potassium perruthenate, enabling 5hmC mapping from nanoscale amounts of genomic DNA, which is especially beneficial for biological and clinical samples of limited quantity. Furthermore, this method was applied to cell-free DNA (cfDNA) from healthy donors and cancer patients, revealing the base-resolution hydroxymethylome of human cfDNA for the first time.

Another method, APOBEC-coupled epigenetic sequencing (ACE-Seq) (Schutsky et al., 2018), is an enzymatic, bisulfite-free method for base-resolution sequencing of 5hmC (Fig. 1). Like EM-Seq, it relies on AID/APOBEC deamination: 5hmC is first protected by βGT, after which APOBEC deaminates unmodified C and 5mC to U, so that only 5hmC remains C after PCR amplification. ACE-Seq achieved conversion rates of 99.9% for cytosine and 99.5% for 5mC, while 98.5% of 5hmC remained as C. Unlike conventional bisulfite-based methods, ACE-Seq is nondestructive, which allows high-confidence 5hmC profiles with up to 1000-fold less DNA input. Using ACE-Seq, 5hmC in tissue-derived cortical excitatory neurons was found to be almost entirely confined to CG dinucleotides. Similarly, Li et al. reported APOBEC3A-mediated deamination sequencing (AMD-Seq) for base-resolution localization of 5hmC (Li et al., 2018).
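The readout logic of the 5hmC methods described above can be summarized as a lookup table. This Python sketch assumes idealized, complete conversion and protection and omits 5fC and 5caC for brevity; `READOUTS`, `is_direct` and `hmc_level` are illustrative names. It shows why TAB-Seq, ACE-Seq and hmC-CATCH read out 5hmC directly, whereas oxBS-Seq must be paired with BS-Seq.

```python
# Idealized per-read readouts, assuming complete conversion and protection.
READOUTS = {
    # 5mC and 5hmC both read as C: only their sum is measured.
    "BS-Seq": {"C": "T", "5mC": "C", "5hmC": "C"},
    # 5hmC oxidized to 5fC and deaminated: needs subtraction from BS-Seq.
    "oxBS-Seq": {"C": "T", "5mC": "C", "5hmC": "T"},
    # 5hmC glucosylated by βGT; 5mC oxidized by TET1 to 5caC, then deaminated.
    "TAB-Seq": {"C": "T", "5mC": "T", "5hmC": "C"},
    # 5hmC glucosylated by βGT; APOBEC deaminates C and 5mC.
    "ACE-Seq": {"C": "T", "5mC": "T", "5hmC": "C"},
    # 5hmC oxidized to 5fC, labeled and read as T; C and 5mC untouched.
    "hmC-CATCH": {"C": "C", "5mC": "C", "5hmC": "T"},
}


def is_direct(method: str) -> bool:
    """True if 5hmC is the only state whose readout differs from C and 5mC,
    so a single library reports 5hmC without subtraction."""
    table = READOUTS[method]
    return table["C"] == table["5mC"] != table["5hmC"]


def hmc_level(method: str, c_count: int, t_count: int) -> float:
    """5hmC fraction at one site from a direct method's C/T counts."""
    if not is_direct(method):
        raise ValueError(f"{method} does not read out 5hmC directly")
    signal = READOUTS[method]["5hmC"]
    hits = c_count if signal == "C" else t_count
    return hits / (c_count + t_count)


for name in READOUTS:
    print(f"{name:10s} direct 5hmC readout: {is_direct(name)}")
```

Note that TAB-Seq and ACE-Seq report 5hmC as C, whereas hmC-CATCH reports it as T; either way a single library suffices, which removes the two-library sampling noise of the subtraction approach.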

Base-resolution sequencing of 5fC

5fC chemically assisted bisulfite sequencing (fCAB-Seq) was the first quantitative method for sequencing 5fC at single-base resolution in genomic DNA (Song et al., 2013) (Fig. 1). In fCAB-Seq, 5fC is reacted with O-ethylhydroxylamine (EtONH2) to form a derivative that cannot be converted to U during the subsequent bisulfite treatment. The precise genomic locations of 5fC can therefore be identified at single-base level by comparing EtONH2-treated BS-Seq with conventional BS-Seq of the same sample. With fCAB-Seq, 5fC at endogenous loci could be detected at levels as low as a few percent. Another bisulfite-based method, reduced BS-Seq (redBS-Seq), was developed to quantitatively detect 5fC in genomic DNA at single-base resolution (Booth et al., 2014); it selectively reduces 5fC to 5hmC with sodium borohydride (NaBH4) before BS-Seq, so that 5fC is read as C and can be identified by comparison with conventional BS-Seq (Fig. 1). Using redBS-Seq, 5fC was shown to be negatively correlated with 5hmC at sites where both modifications were present. The 5fC protection rate is 50%–60% for fCAB-Seq and nearly 97% for redBS-Seq.
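Both fCAB-Seq and redBS-Seq infer 5fC from the difference between a 5fC-protecting library and conventional BS-Seq, so the protection rate directly scales the observable signal. The sketch below assumes a simple linear model (our assumption, not taken from the original studies) and uses 0.55 as a stand-in for the 50%–60% fCAB-Seq protection rate; `fc_level` and the site values are hypothetical. The same subtraction logic underlies CAB-Seq for 5caC.

```python
def fc_level(treated_c: float, bs_c: float, protection: float) -> float:
    """Estimate the 5fC fraction at one site.

    Assumed model: conventional BS-Seq reads 5fC as T, so bs_c = 5mC + 5hmC;
    a 5fC-protecting library (fCAB-Seq or redBS-Seq) reads a fraction
    `protection` of 5fC as C, so treated_c = 5mC + 5hmC + protection * 5fC.
    """
    return (treated_c - bs_c) / protection


# Hypothetical site: 5mC + 5hmC = 0.50 and true 5fC = 0.04.
true_fc, background = 0.04, 0.50
for name, protection in [("fCAB-Seq", 0.55), ("redBS-Seq", 0.97)]:
    treated = background + protection * true_fc
    raw = treated - background  # uncorrected difference between libraries
    corrected = fc_level(treated, background, protection)
    print(f"{name}: raw {raw:.4f}, corrected {corrected:.4f}")
```

At a site with 4% 5fC, the raw fCAB-Seq difference is only about 2.2 percentage points versus about 3.9 for redBS-Seq, which makes low-level 5fC harder to distinguish from sampling noise unless the protection rate is accounted for.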

Another bisulfite-dependent genome-wide method, methylase-assisted bisulfite sequencing (MAB-Seq), can quantitatively detect 5fC and 5caC simultaneously at single-base resolution (Guo et al., 2014; Wu et al., 2014; Neri et al., 2015) (Fig. 1). In this approach, genomic DNA is first treated with the CpG methyltransferase M.SssI, which efficiently methylates unmodified CpG dinucleotides to 5mCpG. In the subsequent bisulfite treatment, only 5fC and 5caC are deaminated and read as T, whereas C (now methylated), 5mC and 5hmC at CpGs are read as C. MAB-Seq converts 84.7% of 5fC and 99.5% of 5caC, and revealed strong strand asymmetry of active demethylation within palindromic CpGs. Using this method, 5fC and 5caC in ESCs were found to occur at active promoters and enhancers and to be associated with TET and TDG. Combining MAB-Seq with Tdg depletion showed that 5fC and 5caC are generated and excised, indicating dynamic TET/TDG-mediated DNA demethylation. MAB-Seq can be further combined with sodium borohydride reduction to map 5caC and 5fC separately at single-base resolution (Wu et al., 2016).
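The MAB-Seq readout can be written as a small decision function. This is a simplified sketch that assumes complete M.SssI methylation of unmodified CpGs and ideal bisulfite behavior apart from the reported 84.7% (5fC) and 99.5% (5caC) conversion rates; `mab_readout` and `expected_t_fraction` are hypothetical names.

```python
def mab_readout(state: str, in_cpg: bool = True) -> str:
    """Idealized MAB-Seq readout for one cytosine state.

    M.SssI converts unmodified CpG cytosines to 5mC, which bisulfite leaves
    as C; 5fC and 5caC are deaminated and read as T. Outside CpG context
    M.SssI does not act, so unmodified C is deaminated and read as T.
    """
    if state in ("5fC", "5caC"):
        return "T"
    if state == "C" and not in_cpg:
        return "T"
    return "C"  # C (methylated by M.SssI at CpG), 5mC and 5hmC


def expected_t_fraction(fc: float, cac: float,
                        eff_fc: float = 0.847, eff_cac: float = 0.995) -> float:
    """Expected T fraction at a CpG given 5fC and 5caC levels, using the
    reported conversion rates as defaults."""
    return eff_fc * fc + eff_cac * cac


states = ["C", "5mC", "5hmC", "5fC", "5caC"]
print({s: mab_readout(s) for s in states})
# {'C': 'C', '5mC': 'C', '5hmC': 'C', '5fC': 'T', '5caC': 'T'}
```

The T signal reports 5fC and 5caC together; separating them requires an additional step such as the sodium borohydride reduction mentioned above.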

Two bisulfite-free sequencing methods have been developed to map 5fC genome-wide at single-base resolution. In fC-CET (5fC cyclization-enabled C-to-T transition), an azido derivative of 1,3-indandione (AI) is used to selectively label 5fC (Xia et al., 2015) (Fig. 1). The azide group in the labeling adduct enables efficient enrichment of 5fC-containing DNA fragments, which, given the low genomic abundance of 5fC, greatly reduces the cost of whole-genome 5fC sequencing compared with fCAB-Seq and redBS-Seq. With this method, genome-wide 5fC maps were obtained at single-base level for the first time in both Tdg fl/fl and Tdg−/− mESCs without noticeable DNA degradation, showing limited overlap between 5fC and 5hmC. In addition, the first single-cell 5fC sequencing method, chemical-labeling-enabled C-to-T conversion sequencing (CLEVER-Seq), was introduced based on malononitrile labeling of 5fC (Zhu et al., 2017), achieving a conversion rate of ∼86.4% at 5fC sites. CLEVER-Seq revealed highly dynamic 5fC profiles and their intrinsic heterogeneity at single-base resolution in mouse embryos and mESCs, and indicated that 5fC abundance in promoter regions could regulate expression of the corresponding genes.

Base-resolution sequencing of 5caC

Chemical modification-assisted bisulfite sequencing (CAB-Seq) has been developed to sequence 5caC at base resolution (Lu et al., 2013) (Fig. 1). In CAB-Seq, 5caC is protected as an amide in a 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide hydrochloride (EDC)-mediated reaction; the modified base is not converted to U during bisulfite treatment and is therefore read as C. Because 5caC is read as T in conventional BS-Seq, 5caC can be detected by subtracting the BS-Seq signal from the CAB-Seq signal. Building on CAB-Seq, DNA immunoprecipitation-coupled CAB-Seq (DIP-CAB-Seq) (Lu et al., 2015), a pre-enrichment-based bisulfite sequencing strategy, was developed to map 5fC and 5caC genome-wide at single-base resolution in both wild-type and Tdg knockout mouse ESCs, revealing only very limited overlap between 5fC and 5caC.

Antibody- or immunoprecipitation (IP)-based mapping methods for modified DNA

Although this review focuses on base-resolution sequencing methods, antibody- or IP-based strategies have traditionally been widely used to detect DNA modifications because they are simple and inexpensive. Methylated DNA immunoprecipitation (MeDIP) (Weber et al., 2005) uses a 5mC-specific antibody to recognize and pull down 5mC-containing DNA fragments. Similarly, 5hmC, 5fC and 5caC can be recognized with specific antibodies (Ficz et al., 2011; Stroud et al., 2011; Shen et al., 2013).

hmC-seal (Song et al., 2011b) is an antibody-independent method for profiling 5hmC in genomic DNA that combines selective chemical labeling with the highly specific, tight biotin-streptavidin interaction to selectively pull down labeled fragments. Using hmC-seal, researchers found that 5hmC signatures in cell-free DNA could serve as diagnostic biomarkers for human cancers (Li et al., 2017a; Song et al., 2017).

Despite their low sequencing cost, antibody- or IP-based methods for modified DNA are not quantitative and do not provide base-resolution information. In addition, their specificity depends heavily on antibody quality, and high background noise can result from cross-reactivity with off-target sites and from the intrinsic affinity of IgG for short unmodified DNA repeats (Booth et al., 2015; Lentini et al., 2018). Profiles of modified DNA obtained by antibody- or IP-based methods should therefore be interpreted with care.

Sequencing of N6-methyladenine (6mA) in DNA

Despite its scarcity in mammalian DNA, 6mA has attracted increasing attention since its presence in various eukaryotic genomes was confirmed in 2015 (Fu et al., 2015; Greer et al., 2015; Zhang et al., 2015). LC-MS/MS can quantify the 6mA/A ratio with high sensitivity and can detect 6mA at very low abundance. DNA 6mA sequencing mainly relies on antibody enrichment, which is prone to background noise and off-target binding as described above (Lentini et al., 2018). Third-generation sequencing methods, discussed in the third part of this review, are also used to identify 6mA in DNA. However, recent studies revealed that sample contamination, RNA contamination, technological limitations and antibody nonspecificity may cause serious problems in the quantification and sequencing of 6mA in mammalian genomic DNA, casting doubt on the significance of 6mA in the mammalian genome (O’Brown et al., 2019; Douvlataniotis et al., 2020; Musheev et al., 2020). Nevertheless, 6mA could be a regulatory mark in mammalian mitochondrial DNA (mtDNA) (Hao et al., 2020).

The biogenesis and sequencing approaches for RNA modifications

The most prevalent internal mRNA modification: m6A

m6A is the most prevalent internal modification in eukaryotic mRNA. It is primarily installed by a methyltransferase complex (the “writers”) consisting of METTL3 and METTL14 together with additional protein subunits (including WTAP, VIRMA, HAKAI, Zc3h13, and RBM15/15B) (Harper et al., 1990; Bokar et al., 1994; Liu et al., 2014; Ping et al., 2014; Schwartz et al., 2014b; Wang et al., 2014; Patil et al., 2016; Wen et al., 2018; Yue et al., 2018) (Fig. 2A). Another methyltransferase, METTL16, has been shown to methylate MAT2A mRNA (Pendleton et al., 2017) (Fig. 2A). m6A can be removed by FTO and ALKBH5 (the “erasers”) (Jia et al., 2011; Zheng et al., 2013) (Fig. 2A) and is therefore a reversible modification. Dynamic m6A methylomes have been identified in physiological processes, across tissues, and in response to stimuli (Dominissini et al., 2012; Meyer et al., 2012; Schwartz et al., 2013; Zhou et al., 2015; Cao and Li, 2016; Roundtree et al., 2017; Liu et al., 2019c; Xiao et al., 2019). As an important epitranscriptomic mark, m6A plays critical roles in mRNA splicing, polyadenylation, export, translation, stability and structure.

Most high-throughput sequencing methods for m6A rely on an m6A-specific antibody. For instance, m6A/MeRIP-Seq uses the antibody to identify thousands of m6A peaks in mammalian mRNA (Dominissini et al., 2012; Meyer et al., 2012). PA-m6A-Seq, m6A-CLIP, and miCLIP use UV-induced antibody-RNA crosslinking to obtain base-resolution m6A profiles (Chen et al., 2015; Ke et al., 2015; Linder et al., 2015). m6A-LAIC-Seq compares RNA abundances in m6A-positive and m6A-negative fractions to quantify m6A stoichiometry transcriptome-wide (Molinie et al., 2016). Endoribonuclease-based strategies (MAZTER-Seq and m6A-REF-Seq) have been developed as examples of antibody-independent m6A sequencing (Garcia-Campos et al., 2019; Zhang et al., 2019b). Another antibody-free method, DART-Seq, uses an APOBEC1-YTH fusion protein to induce C-to-U editing at sites adjacent to m6A, thereby identifying m6A sites (Meyer, 2019). Very recently, two chemical labeling methods (m6A-label-seq and m6A-SEAL) have also been developed (Shu et al., 2020; Wang et al., 2020). Although m6A has been profiled extensively, caution is still warranted when using any specific detection method. For instance, antibody-based methods can be affected by intrinsic antibody bias and by binding to particular RNA sequences or other modifications (Schwartz et al., 2013; Linder et al., 2015). Endoribonuclease-based methods do not pre-enrich m6A sites and have motif preferences, so they detect only a subset of m6A sites. For chemical labeling methods, labeling efficiency still needs to be improved. New methods are therefore still needed to facilitate the study of m6A.

Despite these advances supporting crucial roles of m6A in various cellular and physiological processes, many questions remain about how m6A regulates gene expression. FTO, the first RNA demethylase shown to erase m6A both in vivo and in vitro, binds to exonic and intronic regions of pre-mRNA (Jia et al., 2011; Fu et al., 2013; Bartosovic et al., 2017). FTO-mediated m6A demethylation regulates alternative splicing and translation (Bartosovic et al., 2017; Yu et al., 2018). FTO dynamically regulates RNA m6A in response to heat shock stress, UV-induced DNA damage and viral infection (Zhou et al., 2015; Gokhale et al., 2016; Xiang et al., 2017). Moreover, FTO-mediated m6A demethylation affects cell growth and plays an oncogenic role in cancer cells (Cui et al., 2017a; Li et al., 2017c; Su et al., 2018). The demethylase activity of FTO is therefore important for diverse physiological processes. A recent study reported that in a liver-specific Fto-transgenic mouse model, Fto can demethylate both internal m6A and cap m6Am (Zhou et al., 2018). Another study found that FTO preferentially demethylates m6Am over m6A (Mauer et al., 2017). Further investigations found that FTO shows different substrate preferences for m6A and m6Am in polyadenylated RNA in the nucleus versus the cytoplasm, and can also mediate tRNA m1A demethylation (Wei et al., 2018). Collectively, FTO can demethylate multiple substrates, but it remains unclear how FTO coordinates the demethylation of these modifications and what regulatory role it plays through each substrate.

m6A can play an important role in pre-mRNA splicing. An initial study revealed that m6A peaks are overrepresented in alternative exons, suggesting that m6A may regulate mRNA splicing (Dominissini et al., 2012). Further investigations reported that perturbing m6A writers, erasers, or readers affects splicing. Among writers, depletion of Mettl3 in mouse embryonic stem cells (mESCs) significantly affects alternative splicing (Geula et al., 2015), and METTL16 can modify the MAT2A transcript and regulate its intron retention (Pendleton et al., 2017). Among erasers, depletion of ALKBH5 was shown to alter splicing in HeLa cells (Zheng et al., 2013); FTO preferentially binds to intronic regions of pre-mRNA, and depletion of FTO in HEK293T and mouse 3T3-L1 cells also changes pre-mRNA splicing (Zhao et al., 2014; Bartosovic et al., 2017). Among readers, the nuclear reader YTHDC1 regulates splicing of m6A-methylated mRNAs by recruiting splicing factors (SRSF3 and SRSF10) (Xiao et al., 2016); HNRNPA2B1, another nuclear m6A reader, has effects on alternative splicing similar to those of METTL3 (Alarcon et al., 2015); and HNRNPC and HNRNPG regulate the expression and alternative splicing of target mRNAs via an m6A switch (Liu et al., 2015, 2017). A more recent study reported that m6A is deposited on nascent RNA and can regulate the kinetics of RNA splicing (Louloupi et al., 2018). Despite this extensive evidence for a role of m6A in splicing, one study reported that mRNA m6A can be deposited before splicing but is not required for splicing in mESCs (Ke et al., 2017). Future investigations are thus needed to determine how m6A directly or indirectly affects pre-mRNA splicing and which transcripts are regulated by m6A in different biological contexts.

m6A has intricate functions in diverse viral infections. m6A can be deposited in the RNAs of Zika virus (ZIKV), hepatitis C virus (HCV), influenza A virus (IAV), simian virus 40 (SV40), and human immunodeficiency virus 1 (HIV-1). m6A negatively regulates ZIKV and HCV infection (Gokhale et al., 2016; Lichinchi et al., 2016b), whereas it promotes gene expression and replication of IAV and SV40 (Courtney et al., 2017; Tsai et al., 2018). Conflicting regulatory roles of m6A have been reported for HIV-1: m6A was shown to enhance HIV-1 gene expression and replication (Kennedy et al., 2016; Lichinchi et al., 2016a), but also to inhibit HIV-1 infection by decreasing reverse transcription (RT) of HIV-1 (Tirumuru et al., 2016; Lu et al., 2018). In addition, m6A methylation of host-cell mRNA also regulates responses to viral infection (Liu et al., 2019d; Wang et al., 2019). Together, these studies show that m6A is an important epitranscriptomic mark in controlling viral infection, but it remains unclear how m6A regulates viral infection and why it has different regulatory outcomes for different viruses.

At the beginning of mRNA: m6Am

When the first nucleotide adjacent to the 5’ cap is adenosine, it is 2’-O-methylated (Am), and Am can be further methylated by the methyltransferase PCIF1 to form m6Am (Akichika et al., 2019; Boulias et al., 2019; Sendinc et al., 2019; Sun et al., 2019) (Fig. 2B). Similar to m6A, the N6-methyl group of m6Am can also be removed by FTO (Mauer et al., 2017; Wei et al., 2018) (Fig. 2B). Because m6A-specific antibodies do not distinguish between m6Am and m6A, m6A/MeRIP-Seq and miCLIP can be used to detect m6Am at transcription start sites (TSSs) (Linder et al., 2015; Sun et al., 2019). Recently, a more specific detection method for m6Am has also been developed: m6Am-Exo-Seq uses a 5’ exonuclease to deplete internal m6A-containing RNA fragments and enrich the capped 5’ terminus of mRNA, followed by immunoprecipitation (IP) with an anti-m6A antibody (Sendinc et al., 2019). Nevertheless, all current m6Am sequencing technologies still rely on anti-m6A antibodies, so unbiased and specific m6Am detection methods are still needed to better understand the m6Am methylome.

The presence of m6Am was originally suggested to alter mRNA stability (Mauer et al., 2017), but this finding was recently challenged. Transcripts with an m6Am cap were proposed to have enhanced stability in HEK293T cells, and FTO knockdown caused a global increase in the expression of m6Am-containing mRNAs (Mauer et al., 2017). However, the authors did not separate the effects of m6Am from those of internal m6A. Another study found that FTO depletion does not noticeably affect the expression of mRNAs containing only m6Am in HEK293T cells (Wei et al., 2018). Further studies revealed that loss of m6Am in PCIF1 knockout (KO) HEK293T or MEL624 cells does not significantly affect the levels of m6Am-containing mRNAs either (Akichika et al., 2019; Sendinc et al., 2019). Conflicting observations have also been reported: when mRNAs in the lower and upper halves of gene expression were examined separately, only the half-life of m6Am-containing mRNAs in the lower half was significantly decreased in PCIF1 KO HEK293T cells (Boulias et al., 2019). m6Am also appears to influence mRNA translation (Akichika et al., 2019; Sendinc et al., 2019). Collectively, research on the regulatory function of m6Am in mRNA is still at an early stage, and this function remains to be fully explored.

Another well-known methylated adenosine: m1A

m1A is an isomer of m6A, with the methyl group attached to the N1 instead of the N6 position. m1A is known to be present in tRNA and rRNA and has recently also been identified in mRNA (Dominissini et al., 2016; Li et al., 2016a, 2017b; Safra et al., 2017). Like m6A, m1A in RNA is a dynamic and reversible modification. The methyltransferase complex TRMT6-TRMT61A installs a subset of m1A in mRNA, while another set of methyltransferases, TRMT61B and TRMT10C, catalyzes the formation of m1A in mitochondrial mRNA (Li et al., 2017b; Safra et al., 2017) (Fig. 2C). The removal of m1A from RNA can be catalyzed by ALKBH1, ALKBH3, and FTO (Dominissini et al., 2016; Li et al., 2016a; Liu et al., 2016; Wei et al., 2018) (Fig. 2C).

Recently, several groups independently developed transcriptome-wide approaches to map m1A methylomes (m1A-ID-Seq, m1A-Seq, m1A-MAP, and m1A-Seq-TGIRT) (Dominissini et al., 2016; Li et al., 2016a, 2017b; Safra et al., 2017). Because the N1-methyl group disrupts Watson-Crick base pairing, m1A causes termination or misincorporation during RT; m1A sites can therefore be identified at single-base resolution after IP with a commercial antibody followed by sequencing. Demethylase treatment or Dimroth rearrangement is further used to remove the RT signature of m1A as an additional validation step. m1A-MAP identified 473 m1A sites in human mRNA (Li et al., 2017b), whereas m1A-Seq-TGIRT detected only 15 (Safra et al., 2017), owing to its limited sensitivity (Xiong et al., 2018). Consistent with this, all m1A sites identified by m1A-Seq-TGIRT are included in the more comprehensive m1A-MAP list. In addition, independent studies have reported an m1A/A ratio of about 0.01%–0.05% in human mRNA (Dominissini et al., 2016; Li et al., 2016a; Ueda et al., 2017; Xu et al., 2017), supporting the existence of hundreds to thousands of m1A sites in mRNA (Dominissini et al., 2016; Li et al., 2016a, 2017b) rather than the handful of sites detected by m1A-Seq-TGIRT (Safra et al., 2017). In-depth analysis further revealed potential reasons for the insensitivity of m1A-Seq-TGIRT, including severe read duplication, rRNA contamination, significant RNA degradation, low efficiency of the Dimroth reaction and limited sequencing depth (Xiong et al., 2018). Very recently, new approaches (m1A-IP-Seq and m1A-quant-Seq) using an evolved reverse transcriptase that reads through m1A more efficiently also reported hundreds of m1A sites, further corroborating its prevalence in mRNA (Zhou et al., 2019).
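The logic of calling m1A from RT signatures, with demethylase or Dimroth treatment as a negative control, can be sketched in Python as follows. The coverage, rate and fold-change thresholds are hypothetical defaults chosen for illustration, not values used by m1A-MAP, m1A-Seq-TGIRT or the other cited methods, and real pipelines also model sequencing errors and replicates.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """RT signature at one adenosine in one library."""
    signature: int  # reads with a misincorporation or RT stop at the site
    coverage: int   # reads covering the site

    @property
    def rate(self) -> float:
        return self.signature / self.coverage if self.coverage else 0.0


def call_m1a(untreated: SiteCounts, treated: SiteCounts,
             min_cov: int = 20, min_rate: float = 0.05,
             min_fold: float = 2.0) -> bool:
    """Hypothetical m1A call: require a clear RT signature in the untreated
    library that drops after demethylase or Dimroth treatment."""
    if untreated.coverage < min_cov or treated.coverage < min_cov:
        return False  # too few reads to judge either library
    if untreated.rate < min_rate:
        return False  # signature indistinguishable from background
    if treated.rate == 0:
        return True   # signature fully removed by treatment
    return untreated.rate / treated.rate >= min_fold


print(call_m1a(SiteCounts(30, 200), SiteCounts(3, 200)))    # True: signal drops 10-fold
print(call_m1a(SiteCounts(100, 200), SiteCounts(98, 200)))  # False: persists, e.g. a SNP
```

A site whose mismatch rate persists after treatment, such as a genomic SNP, is rejected, which is the purpose of the validation step.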

Chemical modifications in cytosine: m5C, hm5C, and ac4C

m5C is formed by methylation at the C5 position of cytosine and is present in tRNA, rRNA, and mRNA (Dubin and Taylor, 1975). In mRNA, NSUN2 is the main m5C methyltransferase (Squires et al., 2012; Hussain et al., 2013b; Yang et al., 2017) (Fig. 2D). Borrowing from m5dC detection in DNA, m5C in RNA can be detected at single-base resolution by a modified bisulfite treatment (Schaefer et al., 2009; Squires et al., 2012): unmodified cytosines are deaminated to uracil and read as T, whereas m5C resists deamination and is read as C. To avoid priming on incompletely deaminated RNA templates, G-free ACT random hexamers were used to prime RT of the bisulfite-treated poly(A)-enriched RNA (Yang et al., 2017). The mRNA export adaptor ALYREF and the DNA/RNA-binding protein YBX1 have been identified as m5C readers (Yang et al., 2017, 2019; Chen et al., 2019). In addition, several groups have independently developed other strategies to detect m5C: Aza-IP uses a cytidine analogue, 5-azacytidine, to form a covalent adduct with the methyltransferase, so that m5C targets can be enriched and subsequently sequenced (Khoddami and Cairns, 2013); miCLIP of m5C (distinct from m6A miCLIP) exploits the covalent bond formed between the C271A mutant of NSUN2 and its substrate to enrich and detect m5C targets (Hussain et al., 2013b); and m5C-RIP uses an m5C-specific antibody to identify m5C peaks in bacterial, archaeal, yeast and plant transcriptomes (Edelheit et al., 2013; Cui et al., 2017b). Among these, bisulfite sequencing is the most widely used because it offers single-base resolution and is potentially quantitative. However, it also has limitations: its harsh chemical and thermal conditions cause RNA loss, making it insensitive for detecting m5C in low-abundance RNA, and unconverted cytosines and other bisulfite-resistant cytosine modifications may produce false positives (Hussain et al., 2013a; Gilbert et al., 2016; Shafik et al., 2016). Aza-IP and miCLIP of m5C, by contrast, are bisulfite-independent and can pre-enrich m5C targets, but they require overexpression of the methyltransferase, which may cause false positives through nonspecific targeting by the highly expressed and potentially mislocalized enzyme. Therefore, more sensitive and accurate m5C detection methods are still needed (Yuan et al., 2019).
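Because bisulfite converts unmodified C to U (read as T) while m5C is read as C, a per-site m5C level can be estimated from read counts. The sketch below assumes a hypothetical input of C and T read counts at one reference cytosine, plus a non-conversion rate estimated separately (for example from an unmethylated spike-in control). The correction and binomial test are one simple, common choice, not the procedure of any specific study cited here. The non-conversion term is meant to account for the unconverted cytosines mentioned above as a source of false positives.

```python
import math


def m5c_level(c_reads: int, t_reads: int, non_conversion: float = 0.0) -> float:
    """Estimate the modified fraction at one cytosine after bisulfite sequencing.

    If a fraction m of molecules carry m5C and unmodified C escapes conversion
    with probability nc (`non_conversion`), the expected C fraction is
    m + (1 - m) * nc, so m = (observed - nc) / (1 - nc), clipped to [0, 1].
    """
    if not 0.0 <= non_conversion < 1.0:
        raise ValueError("non_conversion must be in [0, 1)")
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("no reads cover this position")
    observed = c_reads / total
    corrected = (observed - non_conversion) / (1.0 - non_conversion)
    return min(max(corrected, 0.0), 1.0)


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance of seeing at least k
    unconverted reads out of n if the site were actually unmodified."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


level = m5c_level(30, 70, non_conversion=0.01)  # about 0.293
p_value = binomial_tail(30, 100, 0.01)          # vanishingly small
p_background = binomial_tail(1, 100, 0.01)      # about 0.63, consistent with noise
```

With 30 C reads out of 100, the site is very unlikely to be explained by a 1% non-conversion rate, whereas a single C read out of 100 is entirely compatible with background.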

m5C can be further oxidized by ten-eleven translocation (TET) family enzymes to form hm5C (Fu et al., 2014; Delatte et al., 2016; Shen et al., 2018) (Fig. 2E). Similar to m5C-RIP, hMeRIP-Seq relies on an anti-hm5C antibody and detected over 3,000 hm5C peaks in Drosophila mRNA (Delatte et al., 2016). Additionally, the N4 position of cytosine can be acetylated by the acetyltransferase NAT10 to form ac4C, which is present in tRNA, rRNA, and mRNA (Dong et al., 2016; Arango et al., 2018) (Fig. 2F). acRIP-Seq used an ac4C-specific antibody and identified over 4,000 ac4C peaks in the human transcriptome (Arango et al., 2018). However, both the hm5C and ac4C detection strategies rely on specific antibodies and cannot reach single-base resolution, which hinders functional studies of these modifications. Learning from the success of single-base and quantitative m6A sequencing technologies, optimized methods are expected to be developed (Yuan et al., 2019).

Rotational isomerization of uridine: Ψ

Ψ, known as the “fifth nucleotide” of RNA, is the most abundant RNA modification and is widely present in tRNA, rRNA, snRNA, and mRNA (Karijolich et al., 2015). The formation of Ψ is catalyzed by two kinds of pseudouridine synthases (PUSs): “stand-alone” PUSs, which require no cofactor, and RNA-dependent PUSs, which use box H/ACA small nucleolar RNAs (snoRNAs) as guides to recognize their substrates (Song and Yi, 2019) (Fig. 2G). In humans, the stand-alone synthases PUS1, PUS7 and TRUB1 and the RNA-dependent synthase DKC1 have been reported to catalyze a subset of Ψ in mRNA (Carlile et al., 2014; Schwartz et al., 2014a; Li et al., 2015), but it remains unclear whether other PUSs also modify mRNA.

High-throughput sequencing methods for Ψ (Ψ-Seq, Pseudo-Seq, PSI-Seq, and CeU-Seq) rely on N-cyclohexyl-N’-β-(4-methylmorpholinium) ethylcarbodiimide (CMC), which specifically labels Ψ (Carlile et al., 2014; Lovejoy et al., 2014; Schwartz et al., 2014a; Li et al., 2015). During RT, the CMC-Ψ adduct causes RT to stop one nucleotide 3′ of the labeled Ψ, enabling the detection of 100–400 Ψ sites in human mRNA at base resolution (Carlile et al., 2014; Lovejoy et al., 2014; Schwartz et al., 2014a). However, these methods cannot pre-enrich Ψ sites and may miss Ψ in low-abundance RNA. CeU-Seq uses a CMC derivative, azido-CMC (N3-CMC), to allow pre-enrichment of Ψ-containing RNA through biotin pulldown, and identified about 2,000 Ψ sites in human mRNA (Li et al., 2015). Indeed, the Ψ/U ratio in mammalian mRNA measured by LC-MS/MS (about 0.2%–0.6%) is comparable to the m6A content (Li et al., 2015), which further supports the existence of thousands of Ψ sites in mRNA. CMC chemistry can also be coupled to high-resolution qPCR analysis to conveniently detect locus-specific Ψ sites in mRNA and lncRNA (Lei and Yi, 2017). Moreover, bisulfite treatment converts Ψ into a monobisulfite adduct, which produces a deletion signature at Ψ sites during RT; RBS-Seq exploits this chemistry to detect Ψ (Khoddami et al., 2019). However, like conventional CMC labeling, RBS-Seq cannot pre-enrich Ψ sites; it identified 322 Ψ sites in mRNA, and even in abundant tRNA it failed to detect all known Ψ sites. Recently, DM-Ψ-Seq, which combines CMC labeling with demethylase treatment, has been developed to map Ψ sites across the tRNAome (Song et al., 2019).
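The CMC readout described above reduces to comparing RT-stop rates between CMC-treated and untreated libraries and shifting each enriched stop by one nucleotide toward the 5′ end. The sketch below is a simplified illustration under stated assumptions: reads are mapped in the RNA's 5′→3′ coordinates, each read's RT stop has already been converted to a position, and the `min_cov` and `min_delta` cutoffs and function names are hypothetical. Published pipelines such as Ψ-Seq or Pseudo-Seq use their own scoring schemes.

```python
def stop_rate(stops: dict, coverage: dict, pos: int) -> float:
    """Fraction of reads covering `pos` whose RT stop maps there."""
    cov = coverage.get(pos, 0)
    return stops.get(pos, 0) / cov if cov else 0.0


def call_psi_sites(stops_cmc, cov_cmc, stops_ctrl, cov_ctrl,
                   min_cov=10, min_delta=0.1):
    """Return candidate Ψ positions from CMC-induced RT stops.

    stops_*: position -> number of reads whose RT stop maps to that position
    cov_*:   position -> number of reads covering that position
    The CMC-Ψ adduct stops RT one nucleotide 3′ of Ψ, so an excess of stops
    at position p points to Ψ at p - 1 (positions in RNA 5′->3′ coordinates).
    """
    calls = []
    for pos in stops_cmc:
        if cov_cmc.get(pos, 0) < min_cov or cov_ctrl.get(pos, 0) < min_cov:
            continue  # not enough reads in one of the libraries
        delta = (stop_rate(stops_cmc, cov_cmc, pos)
                 - stop_rate(stops_ctrl, cov_ctrl, pos))
        if delta >= min_delta:
            calls.append(pos - 1)
    return sorted(calls)


# Toy data: strong CMC-dependent stops at 51 (Ψ at 50); background at 80.
stops_cmc, cov_cmc = {51: 30, 80: 5}, {51: 100, 80: 100}
stops_ctrl, cov_ctrl = {51: 2, 80: 4}, {51: 100, 80: 100}
psi_sites = call_psi_sites(stops_cmc, cov_cmc, stops_ctrl, cov_ctrl)
print(psi_sites)  # [50]
```

Subtracting the untreated stop rate removes positions where RT stalls for reasons unrelated to CMC, such as RNA structure or fragmentation ends.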

CMC labeling is not perfect: the alkaline treatment step can degrade RNA, and not all Ψ sites are labeled with equal efficiency. These factors may contribute to the low overlap among the Ψ sites identified in mRNA by different methods. By contrast, Ψ sites in abundant noncoding RNAs such as rRNA were highly consistent across methods, suggesting that RNA abundance, and thus effective sequencing depth, strongly influences the resulting modification list. Further comparisons of different methods have revealed other factors that need to be considered, such as varying sequencing depth, different bioinformatic algorithms and cutoffs, and distinct cell lines and/or growth conditions (Li et al., 2016b; Zaringhalam and Papavasiliou, 2016). On the other hand, given the dynamic nature of Ψ modification, it is likely that only a subset of pseudouridylation events has been reported. Thus, further improvements in Ψ profiling, with quantification and higher sensitivity, are still needed.

Not only a cap modification, but also an internal modification: m7G

m7G is a well-known mRNA cap modification. It is also prevalent in tRNA and has recently been identified at internal positions in mRNA as well (Chu et al., 2018; Malbec et al., 2019; Zhang et al., 2019a). METTL1-WDR4, known as a tRNA m7G methyltransferase complex, installs a subset of internal m7G in mRNA (Fig. 2H). Both antibody-based and chemical-labeling sequencing methods have been developed to map m7G methylomes. m7G-MeRIP-Seq used an m7G-specific antibody to identify over 2,000 internal m7G peaks in the mammalian transcriptome (Zhang et al., 2019a). m7G miCLIP-Seq uses cross-linking-induced truncations and mutations to detect m7G (Malbec et al., 2019). m7G-Seq adopts a reduction-induced depurination reaction to generate an abasic site at m7G positions, which can then be labeled with biotin and pulled down. The labeled sites cause misincorporation during RT, yielding a base-resolution map of the m7G methylome (Zhang et al., 2019a). Using these high-throughput detection strategies, two groups independently found that internal m7G in mRNA plays regulatory roles in translation (Malbec et al., 2019; Zhang et al., 2019a). Because METTL1-installed m7G in tRNA is also required for translation and the modifying enzyme is shared between mRNA and tRNA (Lin et al., 2018), it would be interesting to probe the function of m7G in mRNA separately.

Long-read sequencing for DNA and RNA modifications

Most sequencing methods described above rely on next-generation sequencing, which is limited by short read length. In contrast, third-generation sequencing methods, including PacBio Single-Molecule Real-Time (SMRT) sequencing (Ardui et al., 2018; Wenger et al., 2019) and Oxford Nanopore sequencing (Clarke et al., 2009; Jain et al., 2018), have been developed to enable long-read, single-molecule sequencing of DNA and RNA. Apart from much longer read lengths, both SMRT and Nanopore sequencing also allow direct readout of DNA and RNA modifications.

SMRT sequencing identifies nucleobases by detecting fluorescently labeled nucleotides as a DNA polymerase incorporates them, and it can also detect base modifications such as 5mC, 5hmC and 6mA from changes in polymerase kinetics (Flusberg et al., 2010). Genome-wide mapping of 5hmC at single-base resolution in mESCs was achieved by chemical labeling-mediated SMRT sequencing (Song et al., 2011a): chemical labeling enables affinity enrichment of 5hmC-containing DNA fragments and enhances the kinetic signal of 5hmC during SMRT sequencing. SMRT sequencing can detect 6mA in DNA; however, caution is needed, because it overestimates 6mA levels in samples where 6mA is rare (O’Brown et al., 2019). Moreover, it is possible to detect m6A and secondary structure in RNA by SMRT sequencing combined with reverse transcription (Vilfan et al., 2013).
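Kinetic detection in SMRT sequencing is commonly summarized as an interpulse duration (IPD) ratio: polymerase kinetics change at many modified bases, often as longer pauses, so the mean IPD observed on native DNA is compared with that on an unmodified reference (for example, amplified DNA). The sketch below computes per-position IPD ratios from hypothetical lists of IPD values; `min_obs`, `threshold` and the function names are illustrative assumptions, and it does not reproduce PacBio's own analysis software.

```python
from statistics import mean


def ipd_ratios(native: dict, control: dict, min_obs: int = 5) -> dict:
    """Per-position IPD ratio = mean native IPD / mean control IPD.

    native, control: position -> list of IPD measurements (arbitrary units).
    Positions with fewer than `min_obs` measurements in either sample are skipped.
    """
    ratios = {}
    for pos, nat in native.items():
        ctl = control.get(pos, [])
        if len(nat) < min_obs or len(ctl) < min_obs:
            continue
        ratios[pos] = mean(nat) / mean(ctl)
    return ratios


def flag_modified(ratios: dict, threshold: float = 2.0) -> list:
    """Positions whose IPD ratio reaches a (hypothetical) cutoff."""
    return sorted(pos for pos, r in ratios.items() if r >= threshold)


native = {10: [1.0, 1.2, 0.9, 1.1, 1.0], 11: [3.0, 2.8, 3.2, 3.1, 2.9]}
control = {10: [1.0] * 5, 11: [1.0, 1.1, 0.9, 1.0, 1.0]}
ratios = ipd_ratios(native, control)
print(flag_modified(ratios))  # [11]
```

As the 6mA example above shows, a fixed ratio cutoff can still produce many false positives when the true modification is rare, because background fluctuations then outnumber genuine sites.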

In Oxford Nanopore sequencing, different molecules produce characteristic changes in ionic current as they pass through a nanoscale pore, and these signatures are used to discriminate nucleosides in DNA or RNA (Venkatesan and Bashir, 2011; Jain et al., 2016; Garalde et al., 2018). Nanopore sequencing reads native DNA or RNA directly and in real time, without PCR amplification or cDNA conversion (Rand et al., 2017; Simpson et al., 2017; Garalde et al., 2018). It can be applied to detect many kinds of modified bases, such as 5mC, 5hmC and 6mA in DNA and m6A, inosine, m5C, Ψ, and m7G in RNA, as well as RNA secondary structure and G-quadruplexes (Li et al., 2013; Simpson et al., 2017; Garalde et al., 2018; Wongsurawat et al., 2018; Liu et al., 2019a; Smith et al., 2019; Viehweger et al., 2019; Workman et al., 2019).
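One common way to use these current signatures is comparative: signal from native molecules at a given position is compared with signal from an unmodified control, such as in vitro transcribed RNA or amplified DNA. The sketch below scores one position by the difference in mean current divided by the pooled standard deviation. This effect-size score, the example pA values and the function name are illustrative assumptions, not the method of any tool cited above; real analyses also require aligning the raw signal to the reference sequence and proper statistics across many reads.

```python
from statistics import mean, stdev


def current_shift(native: list, control: list) -> float:
    """Standardized shift in mean ionic current (pA) at one position:
    (mean_native - mean_control) / pooled standard deviation.
    Requires at least two measurements per sample."""
    pooled_sd = ((stdev(native) ** 2 + stdev(control) ** 2) / 2) ** 0.5
    if pooled_sd == 0:
        raise ValueError("no variance in either sample; shift is undefined")
    return (mean(native) - mean(control)) / pooled_sd


# Toy per-read mean currents at one position (hypothetical values).
native_pa = [95.0, 96.0, 94.0, 95.0]
control_pa = [90.0, 91.0, 89.0, 90.0]
shift = current_shift(native_pa, control_pa)
print(round(shift, 2))  # 6.12
```

Scaling the raw difference by the spread of the measurements keeps a small but consistent shift from being dismissed, and a large but noisy one from being over-called.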

Although third-generation sequencing is promising for the direct detection of DNA and RNA modifications, its high error rate and immature base-calling currently limit practical application. Combining certain of the sequencing methods mentioned above with third-generation sequencing, as in lrTAPS (Liu et al., 2020), could provide highly accurate long-read epigenetic sequencing.

Conclusion and outlook

In summary, we have highlighted recent advances in mapping methods for DNA and RNA modifications and the biological discoveries made by applying them. Collectively, these methods set the stage for systematic investigation of the functional significance of DNA and RNA modifications in biological processes and human diseases. However, this pace of advancement needs to continue in order to develop affordable and accurate assays for DNA and RNA modifications, especially at the most phenotypically relevant sites, with the eventual goal of bringing these assays into routine clinical use.

Abbreviations

5caC, 5-carboxylcytosine; 5fC, 5-formylcytosine; 5gmC, 5-(β-glucosyloxymethyl) cytosine; 5hmC, 5-hydroxymethylcytosine; 5mC, 5-methylcytosine; 6mA, N6-methyladenine; ac4C, N4-acetylcytidine; ACE-Seq, APOBEC-coupled epigenetic sequencing; AI, azido derivative of 1,3-indandione; AID/APOBEC, activation-induced (cytidine) deaminase/apolipoprotein B mRNA editing enzyme; AMD-Seq, APOBEC3A-mediated deamination sequencing; APOBEC3A, apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like 3A; BER, base excision repair; BS, bisulfite sequencing; C, cytosine; CAB-Seq, chemical modification-assisted bisulfite sequencing; CLEVER-Seq, chemical-labeling-enabled C-to-T conversion sequencing; CMC, N-cyclohexyl-N’-β-(4-methylmorpholinium) ethylcarbodiimide; DHU, dihydrouracil; DIP-CAB-Seq, DNA immunoprecipitation-coupled CAB-Seq; DNMTs, DNA methyltransferases; EM-Seq, Enzymatic Methyl-Seq; EDC, 1-ethyl-3-[3-dimethylaminopropyl] carbodiimide hydrochloride; EtONH2, O-ethylhydroxylamine; fCAB-Seq, 5fC chemically assisted bisulfite sequencing; fC-CET, 5fC cyclization-enabled C-to-T transition; FFPE, formalin-fixed paraffin-embedded; hm5C, 5-hydroxymethylcytosine; hmC-CATCH, chemical-assisted C-to-T conversion of 5hmC sequencing; IP, immunoprecipitation; K2RuO4, potassium ruthenate; KRuO4, potassium perruthenate; lncRNA, long noncoding RNA; lrTAPS, long-read TAPS; m1A, N1-methyladenosine; m6A, N6-methyladenosine; m6Am, N6,2’-O-dimethyladenosine; m5C, 5-methylcytosine; m7G, N7-methylguanosine; MAB-Seq, M.SssI methylase-assisted bisulfite sequencing; MeDIP, methylated DNA immunoprecipitation; mESCs, mouse embryonic stem cells; mRNA, messenger RNA; NaBH4, sodium borohydride; oxBS-Seq, oxidative bisulfite sequencing; PCR, polymerase chain reaction; rRNA, ribosomal RNA; RT, reverse transcription; SMRT, PacBio Single-Molecule Real-Time; snRNA, small nuclear RNA; snoRNA, small nucleolar RNA; T, thymine; TAB-Seq, TET-assisted bisulfite sequencing; TAPS, TET-assisted pyridine borane sequencing; TDG, thymine DNA glycosylase; Tdg KO mESCs, Tdg knockout mESCs; TET1, ten-eleven translocation 1; tRNA, transfer RNA; U, uracil; WGBS, whole-genome bisulfite sequencing; β-GT, β-glucosyltransferase; Ψ, pseudouridine.