Mapping the epigenetic modifications of DNA and RNA

Over 17 and 160 types of chemical modifications have been identified in DNA and RNA, respectively. Interest in understanding the various biological functions of DNA and RNA modifications has led to the cutting-edge fields of epigenomics and epitranscriptomics. The development of chemical and biological tools to detect specific modifications in the genome or transcriptome has greatly facilitated their study. Here, we review recent technological advances in this rapidly evolving field. We focus on high-throughput detection methods and biological findings for these modifications, and discuss questions that remain to be addressed. We also summarize third-generation sequencing methods, which enable long-read and single-molecule sequencing of DNA and RNA modifications.


INTRODUCTION
Mapping the epigenetic modifications of DNA and RNA has become increasingly crucial for understanding their diverse biological functions. At least 17 and 160 types of chemical modifications have been discovered in DNA and RNA, respectively (Raiber et al., 2017; Boccaletto et al., 2018). DNA modification plays important roles in several biological processes and diseases, including development (Greenberg and Bourc'his, 2019), aging (Unnikrishnan et al., 2019) and cancer (Koch et al., 2018). These modifications do not interfere with Watson-Crick pairing but, lying in the major groove of the double helix, affect DNA-protein interactions. In the mammalian genome, methylation at the 5th carbon of cytosine (5-methylcytosine, or 5mC) is the most predominant DNA modification and is also called the "fifth base" (Greenberg and Bourc'his, 2019). The reaction is catalyzed by DNA methyltransferases (DNMTs), and 5mC is mostly found in the context of symmetrical CpG dinucleotides, although a small percentage of methylation at CHG and CHH sequences (where H corresponds to A, T or C) is also observed in embryonic stem (ES) cells. While showing tissue-specific differences, mammalian genomes exhibit particularly high CpG methylation levels: 70% to 80% of CpGs are methylated.
Other modifications apart from 5mC have also been found in mammalian DNA. In 2009, two groups independently reported the existence of 5-hydroxymethylcytosine (5hmC) in the mammalian genome, which is now widely accepted as the "sixth base". Tahiliani et al. showed that the ten-eleven translocation 1 (TET1) enzyme catalyses the conversion of 5mC to 5hmC (Tahiliani et al., 2009), while Kriaucionis and Heintz demonstrated the presence of 5hmC in mouse brain (Kriaucionis and Heintz, 2009). Further successive oxidations mediated by TET result in the formation of 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC) (Ito et al., 2011; Pfaffeneder et al., 2011). These two oxidative products are hypothesized to be intermediates in an active DNA demethylation pathway, in which they are excised by thymine DNA glycosylase (TDG) and restored to unmodified cytosines through the base excision repair (BER) pathway.
Lin-Yong Zhao and Jinghui Song have contributed equally to this work.
Interest in understanding the functions of DNA and RNA modifications, as well as the related molecular mechanisms, has been growing, which drives progress in developing chemical and biochemical tools to detect specific modifications within genomes and transcriptomes. In turn, the development of new technologies contributes to increased knowledge of DNA and RNA modifications. In this review, we mainly focus on high-throughput detection strategies for DNA (Fig. 1) and RNA (Fig. 2) modifications, the biological findings they have enabled, and questions that remain to be addressed.

BASE-RESOLUTION SEQUENCING FOR DNA MODIFICATIONS
Base-resolution sequencing of the predominant DNA modification: 5mC
Bisulfite sequencing (BS-Seq) is regarded as the gold standard for 5mC detection. It is based on the differential reactivity of bisulfite toward methylated C (5mC and 5hmC) and unmethylated C: bisulfite treatment deaminates unmethylated C to uracil (U), whereas methylated C resists deamination. After the subsequent PCR amplification, methylated C is read as C, while unmethylated C is read as T. Whole-genome bisulfite sequencing (WGBS), the whole-genome implementation of BS-Seq, has been widely used in DNA methylation profiling, as it provides single-base resolution with full genome coverage (Fig. 1). In WGBS, the readouts of methylated or unmethylated C at individual genomic locations are digital counts, resulting in high resolution and precision, with an unmethylated C conversion efficiency of over 99%. This makes WGBS the most widely accepted method for charting the DNA methylation landscape (Adey and Shendure, 2012; Kobayashi and Kono, 2012; Yamaguchi et al., 2012; Kobayashi et al., 2013; Shirane et al., 2013). However, the harsh bisulfite treatment degrades the majority of the DNA (Tanaka and Okamoto, 2007), which severely limits its application to precious, low-input DNA samples, even though several efforts have been made to improve DNA recovery (Smallwood et al., 2014; Clark et al., 2017). Moreover, since unmethylated cytosines, which account for nearly 95% of the total cytosine in the mammalian genome, are converted to thymine, bisulfite treatment reduces the sequence complexity of the template DNA, leading to low mapping rates, uneven genome coverage and inherent biases. Last but not least, bisulfite conversion cannot distinguish between 5mC and 5hmC: it only provides the combined signal of the two.
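The bisulfite readout described above can be illustrated with a minimal Python sketch. It is not taken from any WGBS pipeline: the function names, the toy sequence and the read counts are hypothetical, and conversion is assumed to be complete.

```python
def bisulfite_convert(seq, protected_positions):
    """Return the sequence read out after bisulfite treatment and PCR.

    Assumption: conversion is complete, so every unprotected C reads as T,
    while a C at a protected position (5mC or 5hmC) still reads as C.
    """
    protected = set(protected_positions)
    return "".join(
        "T" if base == "C" and i not in protected else base
        for i, base in enumerate(seq)
    )


def methylation_level(c_reads, t_reads):
    """Fraction of reads covering one cytosine that still read as C."""
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("no coverage at this site")
    return c_reads / total


# Toy example: only the C at index 1 is methylated (hypothetical data).
converted = bisulfite_convert("ACGTCCGA", protected_positions=[1])
level = methylation_level(c_reads=15, t_reads=5)
print(converted, level)
```

Because the level is a ratio of digital counts, it reports the combined 5mC + 5hmC signal at that site, which is why bisulfite conversion alone cannot separate the two marks.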
Recently, two bisulfite-free whole-genome base-resolution DNA methylation sequencing methods have been developed to replace WGBS. TET-assisted pyridine borane sequencing (TAPS) (Liu et al., 2019b) was introduced to detect 5mC and 5hmC: both are first oxidized by TET to 5caC, then reduced to dihydrouracil (DHU) using pyridine borane with around 98% conversion efficiency, and subsequently read as T after PCR amplification (Fig. 1). It should be noted that the conversion rates of TAPS and bisulfite sequencing are different measurements. When compared like-for-like, TAPS has a lower false positive rate (unmodified C falsely detected as modified; 0.23%) than bisulfite sequencing (0.6%). Compared with bisulfite sequencing, TAPS further demonstrates a higher mapping rate and quality, more even coverage and lower sequencing cost. Through mild enzymatic and chemical reactions, TAPS can work effectively with as …
Figure 1. Single-base resolution methods for quantitatively profiling mammalian DNA modifications of cytosine. For 5-methylcytosine (5mC) mapping there are three methods: whole-genome bisulfite sequencing (WGBS), TET-assisted pyridine borane sequencing (TAPS) and enzymatic methyl-sequencing (EM-Seq). For 5-hydroxymethylcytosine (5hmC) mapping at single-base resolution there are four methods: oxidative bisulphite sequencing (oxBS-Seq), TET-assisted bisulphite sequencing (TAB-Seq), APOBEC-coupled epigenetic sequencing (ACE-Seq) and chemical-assistant C-to-T conversion of 5hmC sequencing (hmC-CATCH). There are four sequencing methods for mapping 5-formylcytosine (5fC): chemically assisted bisulfite sequencing (fCAB-Seq), reduced BS-Seq (redBS-Seq), M.SssI methylase-assisted bisulfite sequencing (MAB-Seq) and 5fC cyclization-enabled C-to-T transition of 5fC (fC-CET). Chemical modification-assisted bisulfite sequencing (CAB-Seq) is a single-base resolution sequencing method to map 5caC.
Base-resolution sequencing of the "sixth base": 5hmC
In-depth investigation of the biological functions of 5hmC requires elucidating its distribution patterns in genomes, preferably at single-nucleotide resolution. Two modified BS-Seq methods have been developed for mapping 5hmC. Oxidative bisulfite sequencing (oxBS-Seq) (Booth et al., 2012) is based on the selective and quantitative chemical oxidation of 5hmC using potassium perruthenate (KRuO4) to produce 5fC, which is subsequently converted to U by bisulfite treatment, with an overall 5hmC-to-U conversion rate of 94.5% (Fig. 1). The absolute level and precise position of 5hmC are determined by subtracting the oxBS-Seq signal from the BS-Seq signal, as demonstrated in mESCs (Booth et al., 2012, 2013). High sequencing depth is required to achieve high-confidence 5hmC mapping with oxBS-Seq, because the estimate comes from subtracting two random sampling-based BS-Seq experiments. TET-assisted bisulfite sequencing (TAB-Seq) is an approach in which 5hmC is first protected using βGT, and 5mC is subsequently oxidized to 5caC by TET1 (Fig. 1). After the following bisulfite treatment, 5hmC is detected as C, while C, 5mC, 5fC and 5caC are read as T, offering a strategy for the direct mapping of 5hmC at single-base resolution (Yu et al., 2012a, b). TAB-Seq can achieve a conversion rate of over 96% for 5mC to T in genomic DNA, with over 90% of 5hmC protected from conversion. It has been applied not only to confirm, with high confidence, the widespread distribution of 5hmC across the whole genome of mouse embryonic stem cells (mESCs), but also to demonstrate strand asymmetry and sequence bias at 5hmC sites. Additionally, TAB-Seq analysis showed that 5hmC is highly enriched at distal regulatory elements.
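The depth requirement of subtraction-based 5hmC calling can be made concrete with a short sketch. It assumes independent binomial sampling in the two libraries and a normal approximation for the standard error; the function name and the read counts are hypothetical and do not come from the oxBS-Seq publications.

```python
import math


def hmc_by_subtraction(bs_c, bs_total, oxbs_c, oxbs_total):
    """Estimate the 5hmC level at one site as BS-Seq level minus oxBS-Seq level.

    BS-Seq reports 5mC + 5hmC as C; oxBS-Seq reports only 5mC as C.
    Returns (estimate, standard_error), treating the two libraries as
    independent binomial samples (an assumption of this sketch).
    """
    p_bs = bs_c / bs_total
    p_ox = oxbs_c / oxbs_total
    estimate = p_bs - p_ox
    se = math.sqrt(
        p_bs * (1 - p_bs) / bs_total + p_ox * (1 - p_ox) / oxbs_total
    )
    return estimate, se


# Same underlying levels (80% vs 60% C calls) at 10x and 1000x coverage.
est_low, se_low = hmc_by_subtraction(8, 10, 6, 10)
est_high, se_high = hmc_by_subtraction(800, 1000, 600, 1000)
print(est_low, se_low, est_high, se_high)
```

In this toy case the standard error at 10x coverage is as large as the 5hmC estimate itself, whereas at 1000x it is ten times smaller, which illustrates why subtraction-based methods need deep sequencing.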
More recently, two bisulfite-free approaches have been developed to map 5hmC at base resolution. Chemical-assistant C-to-T conversion of 5hmC sequencing (hmC-CATCH) is based on the selective oxidation of 5hmC to 5fC by potassium ruthenate (K2RuO4), with a conversion efficiency of ∼94%, followed by chemical labeling of 5fC and its conversion to T during PCR (Zeng et al., 2018) (Fig. 1). hmC-CATCH allows direct detection of 5hmC as T without affecting unmodified C or 5mC. Potassium ruthenate was shown to cause less DNA damage than potassium perruthenate and to enable the mapping of 5hmC with nanoscale amounts of genomic DNA, which is especially beneficial for biological and clinical samples available only in limited amounts. Furthermore, this method was applied to the cell-free DNA (cfDNA) of healthy donors and cancer patients, and revealed the base-resolution hydroxymethylome of human cfDNA for the first time.
Another method, APOBEC-coupled epigenetic sequencing (ACE-Seq) (Schutsky et al., 2018), has been developed as a bisulfite-free, enzymatic method for base-resolution sequencing of 5hmC (Fig. 1). Similar to EM-Seq, it first protects 5hmC with βGT and then uses AID/APOBEC to deaminate unmodified C and 5mC to U, so that 5hmC remains as C after PCR amplification. ACE-Seq achieved 99.9% and 99.5% conversion rates for cytosine and 5mC, respectively, while 98.5% of 5hmC remained as C. Compared with conventional bisulfite-based methods, ACE-Seq is non-destructive, which allows high-confidence 5hmC profiles to be obtained with up to 1000-fold less DNA input. Using ACE-Seq, 5hmC was found to be almost entirely confined to CG dinucleotides in tissue-derived cortical excitatory neurons. Similarly, Li et al. reported APOBEC3A-mediated deamination sequencing (AMD-seq), which was also established for localization analysis of 5hmC at base resolution.
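As a compact summary of the readouts described in this section, the sketch below encodes how unmodified C, 5mC and 5hmC are read after each treatment. The table is transcribed from the descriptions in the text (5mC reading as C in oxBS-Seq is implied by the subtraction design); the dictionary and function names are hypothetical, and real conversion is never 100% efficient.

```python
# Base read after treatment and PCR, per method and cytosine state.
READOUT = {
    "BS-Seq":    {"C": "T", "5mC": "C", "5hmC": "C"},
    "oxBS-Seq":  {"C": "T", "5mC": "C", "5hmC": "T"},
    "TAPS":      {"C": "C", "5mC": "T", "5hmC": "T"},
    "TAB-Seq":   {"C": "T", "5mC": "T", "5hmC": "C"},
    "hmC-CATCH": {"C": "C", "5mC": "C", "5hmC": "T"},
    "ACE-Seq":   {"C": "T", "5mC": "T", "5hmC": "C"},
}


def distinguishable(method, state_a, state_b):
    """True if the method reads the two cytosine states as different bases."""
    table = READOUT[method]
    return table[state_a] != table[state_b]


def direct_5hmc_methods():
    """Methods whose 5hmC readout differs from both unmodified C and 5mC."""
    return sorted(
        method
        for method in READOUT
        if distinguishable(method, "5hmC", "C")
        and distinguishable(method, "5hmC", "5mC")
    )


print(direct_5hmc_methods())
print(distinguishable("BS-Seq", "5mC", "5hmC"))
```

The lookup makes the design difference explicit: BS-Seq, oxBS-Seq and TAPS each merge 5hmC with another state, so 5hmC can only be obtained from them by comparison with a second experiment, whereas TAB-Seq, hmC-CATCH and ACE-Seq give 5hmC a unique readout.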
Base-resolution sequencing of 5fC
5fC chemically assisted bisulfite sequencing (fCAB-Seq) was the first quantitative method to sequence 5fC at single-base resolution in genomic DNA (Fig. 1). In fCAB-Seq, 5fC is modified with O-ethylhydroxylamine (EtONH2) to form a derivative that cannot be converted to U during the following BS-Seq. Therefore, the precise genomic locations of 5fC can be identified at single-base level by comparing EtONH2-treated BS-Seq with conventional BS-Seq of the same sample. With fCAB-Seq, low-abundance 5fC at endogenous loci could be detected at levels down to only a few percent. Another bisulfite-based method, termed reduced BS-Seq (redBS-Seq), was developed to quantitatively detect 5fC in genomic DNA at single-base resolution (Booth et al., 2014); it is based on the selective reduction of 5fC to 5hmC by sodium borohydride (NaBH4) followed by BS-Seq (Fig. 1). Using redBS-Seq, 5fC was shown to be negatively correlated with 5hmC at locations where the two appeared simultaneously. The 5fC protection rate is 50%-60% for fCAB-Seq, while it is nearly 97% for redBS-Seq.
Another bisulfite-dependent genome-wide method, termed methylase-assisted bisulfite sequencing (MAB-Seq), can quantitatively detect 5fC and 5caC simultaneously at single-base resolution (Guo et al., 2014; Wu et al., 2014; Neri et al., 2015) (Fig. 1). In this approach, genomic DNA is first treated with the CpG methyltransferase M.SssI, which efficiently methylates CpG dinucleotides. Since unmodified CpGs in the original genomic DNA are thereby methylated to 5mCpG, the following bisulfite treatment deaminates only 5fC and 5caC, which are read as T, while C, 5mC and 5hmC are read as C in the subsequent sequencing. MAB-Seq, in which 84.7% of 5fC and 99.5% of 5caC are efficiently converted, reveals strong strand asymmetry of active demethylation within palindromic CpGs. Using this method, 5fC and 5caC in ESCs were found to occur at active promoters and enhancers and to be associated with TET and TDG. MAB-Seq combined with Tdg depletion showed that the generation and excision of 5fC and 5caC reflect dynamic DNA demethylation activity mediated by TET/TDG. MAB-Seq can be further combined with sodium borohydride reduction to map 5caC and 5fC separately at single-base resolution.
Two bisulfite-free sequencing methods have been developed to map 5fC at single-base resolution on a genomic scale. In fC-CET (5fC cyclization-enabled C-to-T transition), an azido derivative of 1,3-indandione (AI) was used to achieve selective labelling of 5fC (Xia et al., 2015) (Fig. 1). The azide group in the labelling adduct enabled efficient enrichment of 5fC-containing DNA fragments, which, considering the limited abundance of 5fC in the genome, largely reduced the sequencing cost of whole-genome 5fC detection compared with fCAB-Seq and redBS-Seq. With this method, genome-wide 5fC maps were obtained at the single-base level for the first time in both Tdg fl/fl mESCs and Tdg −/− mESCs with no noticeable DNA degradation, demonstrating a limited overlap with 5hmC. Moreover, the first single-cell 5fC sequencing method, termed chemical-labeling-enabled C-to-T conversion sequencing (CLEVER-Seq), was introduced based on malononitrile labeling of 5fC (Zhu et al., 2017). With this method, a conversion rate of ∼86.4% was observed at 5fC sites. In addition, the highly dynamic 5fC profile and its intrinsic heterogeneity were revealed at single-base resolution for mouse embryos and mESCs, and the abundance of 5fC in promoter regions could regulate the expression of the corresponding genes.

Base-resolution sequencing of 5caC
Chemical modification-assisted bisulfite sequencing (CAB-Seq) has been developed to sequence 5caC at base resolution (Lu et al., 2013) (Fig. 1). In CAB-Seq, 5caC is protected as an amide in a 1-ethyl-3-[3-dimethylaminopropyl] carbodiimide hydrochloride (EDC)-catalyzed reaction; the amide cannot be converted to U during bisulfite treatment and is thereby read as C. Therefore, 5caC can be detected by subtracting the BS-Seq signal from the CAB-Seq signal. Based on CAB-Seq, DNA immunoprecipitation-coupled CAB-Seq (DIP-CAB-Seq), a pre-enrichment-based bisulfite sequencing strategy, was developed to map 5fC and 5caC genome-wide at single-base resolution in both WT and Tdg KO mouse ESCs, and showed that only a very limited overlap exists between 5fC and 5caC.
Antibody- or immunoprecipitation (IP)-based mapping methods for modified DNA
While we focus on base-resolution sequencing methods, antibody- or IP-based DNA modification detection strategies have traditionally been widely used because they are simple and low-cost. Methylated DNA immunoprecipitation (MeDIP) (Weber et al., 2005) uses a 5mC-specific antibody to recognize and pull down DNA fragments carrying the 5mC modification. Similarly, 5hmC, 5fC and 5caC can be recognized with specific antibodies (Ficz et al., 2011; Stroud et al., 2011; Shen et al., 2013).
A method for profiling 5hmC in genomic DNA, termed hmC-seal (Song et al., 2011b), was developed as an antibody-independent approach based on selective chemical labeling and the extremely specific and tight biotin-streptavidin interaction, which can then be used to perform selective pull-down. Using hmC-seal to profile 5hmC, researchers found that 5hmC signatures in cell-free DNA could serve as diagnostic biomarkers for human cancers (Song et al., 2017).
Despite their low sequencing cost, antibody- or IP-based methods for modified DNA are not quantitative and do not offer base-resolution information. In addition, their specificity depends heavily on the quality of the antibody, and high background noise can result from cross-reactivity with off-target sites and the intrinsic affinity of IgG for short unmodified DNA repeats (Booth et al., 2015; Lentini et al., 2018). Therefore, profiles of modified DNA detected by antibody- or IP-based methods should be interpreted with care.
Sequencing of N6-methyladenine (6mA) in DNA
In spite of its scarcity in mammalian DNA, 6mA has attracted increasing attention since its presence in various eukaryotic genomes was confirmed in 2015 (Greer et al., 2015; Zhang et al., 2015). LC-MS/MS can quantify the 6mA/A ratio with high sensitivity and is able to detect 6mA at very low abundance. DNA 6mA sequencing mainly relies on antibody enrichment, which is prone to background noise and off-target binding as described above (Lentini et al., 2018). Third-generation sequencing methods are also used to identify 6mA in DNA; these are discussed in the third part. However, recent studies revealed that sample contamination, RNA contamination, technological limitations and antibody non-specificity may cause serious problems in the quantification and sequencing of 6mA in mammalian genomic DNA, casting doubt on the significance of 6mA in the mammalian genome (O'Brown et al., 2019; Douvlataniotis et al., 2020; Musheev et al., 2020). Nevertheless, 6mA could be a regulatory mark in mammalian mitochondrial DNA (mtDNA) (Hao et al., 2020).
Most high-throughput sequencing methods for m6A rely on an m6A-specific antibody. For instance, m6A/MeRIP-Seq uses the antibody to identify thousands of m6A peaks in mammalian mRNA (Dominissini et al., 2012; Meyer et al., 2012). PA-m6A-Seq, m6A-CLIP and miCLIP utilize UV-induced antibody-RNA crosslinking to obtain base-resolution m6A profiles (Ke et al., 2015; Linder et al., 2015). m6A-LAIC-Seq compares RNA abundances in m6A-positive and m6A-negative fractions to quantify m6A stoichiometry on a transcriptome-wide scale (Molinie et al., 2016). Endoribonuclease-based strategies to detect m6A (MAZTER-Seq and m6A-REF-Seq) have been developed, providing examples of antibody-independent m6A sequencing methods (Garcia-Campos et al., 2019; Zhang et al., 2019b). Another antibody-free m6A sequencing method, DART-Seq, utilizes a fused APOBEC1-YTH protein to induce C-to-U editing at sites adjacent to m6A, thus identifying m6A sites (Meyer, 2019). Very recently, two chemical labeling methods (m6A-label-seq and m6A-SEAL) have also been developed (Shu et al., 2020; Wang et al., 2020). Although m6A has been profiled extensively, caution is still needed when using specific methods for m6A detection. For instance, the antibody-based methods can be influenced by the intrinsic bias of the antibody and by its binding to particular RNA sequences or other modifications (Linder et al., 2015). The endoribonuclease-based methods do not pre-enrich m6A sites and have a motif preference, and thus detect only a subset of m6A sites. For the chemical labeling methods, labeling efficiency needs to be improved. Hence, new methods are still desired to facilitate the study of m6A.
Despite these advances supporting the crucial roles of m6A in various cellular and physiological processes, many questions remain in our understanding of m6A-mediated regulation of gene expression. FTO, the first RNA demethylase shown both in vivo and in vitro to erase m6A, binds to exon and intron regions of pre-mRNA (Fu et al., 2013; Bartosovic et al., 2017). FTO-mediated demethylation of m6A has regulatory roles in alternative splicing and translation (Bartosovic et al., 2017; Yu et al., 2018). FTO dynamically regulates m6A in RNA in response to heat shock stress, UV-induced DNA damage and virus infection (Gokhale et al., 2016; Xiang et al., 2017). Moreover, FTO-mediated m6A demethylation affects cell growth and plays an oncogenic role in cancer cells (Cui et al., 2017a; Li et al., 2017c; Su et al., 2018). Therefore, the demethylase activity of FTO is very important for diverse physiological processes. A recent study reported that, in a liver-specific Fto-transgenic mouse model, Fto can mediate demethylation of both internal m6A and cap m6Am. Moreover, another study found that FTO preferentially demethylates m6Am rather than m6A (Mauer et al., 2017). Further investigations found that FTO shows differential substrate preferences for m6A and m6Am in polyadenylated RNA in the nucleus versus the cytoplasm, and can mediate tRNA m1A demethylation as well (Wei et al., 2018). Collectively, FTO can demethylate multiple substrates, but it is still unclear how FTO coordinates the demethylation of multiple modifications and what its regulatory roles are for each methylated substrate.
m6A can play an important role in pre-mRNA splicing. An initial study revealed that m6A peaks are overrepresented in alternative exons, suggesting that m6A may have regulatory functions in mRNA splicing (Dominissini et al., 2012). Further investigations reported that perturbation of m6A writers, erasers or readers affects splicing. For m6A writers, the depletion of Mettl3 in mouse embryonic stem cells (mESCs) significantly affects alternative splicing (Geula et al., 2015); METTL16 can modify the MAT2A transcript and regulate intron retention of MAT2A (Pendleton et al., 2017). For m6A erasers, the depletion of ALKBH5 was shown to alter splicing in HeLa cells (Zheng et al., 2013); FTO preferentially binds to intronic regions of pre-mRNA, and the depletion of FTO in HEK293T and mouse 3T3-L1 cells also results in changes in pre-mRNA splicing (Bartosovic et al., 2017). For m6A readers, the nuclear reader YTHDC1 regulates splicing of m6A-methylated mRNAs by recruiting splicing factors (SRSF3 and SRSF10) (Xiao et al., 2016); HNRNPA2B1, another nuclear reader of m6A, elicits consequences on alternative splicing similar to those of METTL3 (Alarcon et al., 2015); HNRNPC and HNRNPG regulate the expression as well as the alternative splicing of their target mRNAs via an m6A-switch (Liu et al., 2015, 2017). A more recent study reported that m6A is deposited on nascent RNA and can regulate the kinetics of RNA splicing (Louloupi et al., 2018). Despite this rich information supporting a role of m6A in splicing, one study claims that mRNA m6A modification can be deposited before splicing but is not required for splicing in mESCs. Thus, future investigations are still needed to determine how m6A directly or indirectly affects pre-mRNA splicing and which transcripts are regulated by m6A in different biological contexts.
m6A has intricate functions during diverse viral infections. m6A can be deposited in the RNAs of Zika virus (ZIKV), hepatitis C virus (HCV), influenza A virus (IAV), simian virus 40 (SV40) and human immunodeficiency virus 1 (HIV-1). m6A negatively regulates the infection of ZIKV and HCV (Lichinchi et al., 2016b), while it promotes gene expression and replication of IAV and SV40 (Courtney et al., 2017; Tsai et al., 2018). Conflicting regulatory roles of m6A have been reported for HIV-1: m6A was shown to enhance HIV-1 gene expression and replication (Lichinchi et al., 2016a), but was also found to inhibit HIV-1 infection by decreasing the reverse transcription (RT) of HIV-1 (Tirumuru et al., 2016; Lu et al., 2018). In addition, m6A methylation in the mRNA of host cells also has regulatory roles in the response to viral infection (Liu et al., 2019d; Wang et al., 2019). Together, m6A is an important epitranscriptomic mark for controlling viral infection, but it is still unclear how m6A regulates viral infection and why it has different regulatory outputs for different viruses.

At the beginning of mRNA: m6Am
The first adenosine proximal to the 5' cap is 2'-O-methylated adenosine (Am), which can be further methylated by the methyltransferase PCIF1 to form m6Am (Akichika et al., 2019; Boulias et al., 2019; Sendinc et al., 2019; Sun et al., 2019) (Fig. 2B). Similar to m6A, the N6-methyl group of m6Am can also be removed by FTO (Mauer et al., 2017; Wei et al., 2018) (Fig. 2B). Since m6A-specific antibodies do not distinguish between m6Am and m6A, m6A/MeRIP-Seq and miCLIP can be used to detect m6Am at transcription start sites (TSSs) (Linder et al., 2015; Sun et al., 2019). Recently, a more specific detection method for m6Am has also been developed: m6Am-Exo-Seq utilizes a 5' exonuclease to deplete internal m6A-containing RNA fragments and enrich the capped 5' terminus of mRNA, followed by immunoprecipitation (IP) with an antibody against m6A (Sendinc et al., 2019). Nevertheless, all current m6Am sequencing technologies still rely on an anti-m6A antibody; thus, further development of unbiased and specific m6Am detection methods is still desired to help us better understand the m6Am methylome.
The presence of m6Am was originally suggested to alter mRNA stability (Mauer et al., 2017); however, this finding was recently challenged. Transcripts with an m6Am cap were proposed to have enhanced stability in HEK293T cells, and FTO knockdown causes a global increase in the expression level of m6Am-containing mRNAs (Mauer et al., 2017). However, the authors did not separate the effects of m6Am from those of internal m6A. Another study found that FTO depletion does not noticeably affect the expression levels of mRNAs containing only m6Am in HEK293T cells (Wei et al., 2018). Further studies revealed that the loss of m6Am in PCIF1 knockout (KO) HEK293T or MEL624 cells does not significantly affect the level of m6Am-containing mRNAs either (Akichika et al., 2019; Sendinc et al., 2019). Conflicting observations have also been reported: when mRNAs in the lower and upper halves of gene expression were examined separately, only the half-life of m6Am-containing mRNAs in the lower half was significantly decreased in PCIF1 KO HEK293T cells. On the other hand, it appears that m6Am also influences mRNA translation (Akichika et al., 2019; Sendinc et al., 2019). Collectively, the study of the regulatory function of m6Am in mRNA is still at an early stage, and this function remains to be fully explored.
Another well-known methylated adenosine: m1A
m1A is an isomer of m6A, with the methyl group attached to the N1 instead of the N6 position. m1A is known to be present in tRNA and rRNA, and has recently also been identified in mRNA (Dominissini et al., 2016; Li et al., 2016a, 2017b; Safra et al., 2017). Similar to m6A, m1A in RNA is a dynamic and reversible modification. The methyltransferase complex TRMT6-TRMT61A is responsible for the installation of a subset of m1A in mRNA, while another set of methyltransferases, TRMT61B and TRMT10C, catalyzes the formation of m1A in mitochondrial mRNA (Li et al., 2017b; Safra et al., 2017) (Fig. 2C). The reversal of m1A in RNA can be catalyzed by ALKBH1, ALKBH3 and FTO (Dominissini et al., 2016; Li et al., 2016a; Liu et al., 2016; Wei et al., 2018) (Fig. 2C).
Recently, several groups have independently developed transcriptome-wide approaches to map m1A methylomes (m1A-ID-Seq, m1A-Seq, m1A-MAP and m1A-Seq-TGIRT) (Dominissini et al., 2016; Li et al., 2016a, 2017b; Safra et al., 2017). During RT, m1A causes termination or misincorporation; thus, m1A sites can be identified at single-base resolution after IP with a commercial antibody and sequencing. Moreover, demethylase treatment or Dimroth rearrangement is further used to remove the RT signatures of m1A as an additional validation step. m1A-MAP identified 473 m1A sites in human mRNA (Li et al., 2017b); however, m1A-Seq-TGIRT detected only 15 m1A sites in human mRNA (Safra et al., 2017), owing to its limited sensitivity. This is further exemplified by the fact that all the m1A sites identified by m1A-Seq-TGIRT are included in the more comprehensive m1A list from m1A-MAP. In addition, independent studies have reported that the m1A/A ratio in human mRNA is about 0.01%-0.05% (Dominissini et al., 2016; Li et al., 2016a; Ueda et al., 2017; Xu et al., 2017), supporting the existence of hundreds to thousands of m1A sites in mRNA (Dominissini et al., 2016; Li et al., 2016a, 2017b) instead of just the handful of sites detected by m1A-Seq-TGIRT (Safra et al., 2017). Moreover, in-depth analysis revealed potential reasons for the insensitivity of m1A-Seq-TGIRT, including severe read duplication, rRNA contamination, significant RNA degradation, low efficiency of the Dimroth reaction and limited sequencing depth. Very recently, new approaches (m1A-IP-Seq and m1A-quant-Seq) utilizing an evolved reverse transcriptase that reads through m1A more efficiently also reported hundreds of m1A sites, further corroborating its prevalence in mRNA.
Chemical modifications in cytosine: m5C, hm5C, and ac4C
m5C is formed by methylation at the C5 position of cytosine and is present in tRNA, rRNA and mRNA (Dubin and Taylor, 1975). In mRNA, NSUN2 is the main m5C methyltransferase (Squires et al., 2012; Hussain et al., 2013b; Yang et al., 2017) (Fig. 2D). Drawing lessons from m5dC detection in DNA, m5C in RNA can be detected at single-base resolution by a modified bisulfite treatment (Schaefer et al., 2009; Squires et al., 2012). To avoid potential annealing to inefficiently deaminated RNA templates, ACT random hexamers devoid of Gs were used to prime bisulfite-treated poly(A)-enriched RNA samples for RT (Yang et al., 2017). The mRNA export adapter ALYREF and the DNA/RNA binding protein YBX1 have been identified as m5C readers (Yang et al., 2017; Chen et al., 2019). In addition, several groups independently developed strategies to detect m5C: Aza-IP utilizes a cytidine analogue, 5-azacytidine, to form a covalent adduct with the methyltransferase, allowing m5C targets to be enriched and subsequently sequenced (Khoddami and Cairns, 2013); miCLIP of m5C (different from m6A miCLIP) exploits the formation of a covalent bond between the C271A mutant of NSUN2 and its substrate to detect the enriched m5C targets (Hussain et al., 2013b); m5C-RIP uses an m5C-specific antibody to identify m5C peaks in bacterial, archaeal, yeast and plant transcriptomes (Edelheit et al., 2013; Cui et al., 2017b). Among these, bisulfite sequencing is the most widely used; it offers single-base resolution and is potentially quantitative. However, it also has limitations: the harsh chemical and thermal conditions can lead to loss of RNA, making the method insensitive for detecting m5C in low-abundance RNA, and unconverted cytosines and other cytosine modifications resistant to bisulfite treatment may result in false-positive detection (Hussain et al., 2013a; Gilbert et al., 2016; Shafik et al., 2016). Furthermore, Aza-IP and miCLIP of m5C are bisulfite-independent and can pre-enrich m5C targets, but they require over-expression of the methyltransferase, which may lead to false-positive detection through nonspecific targeting by the highly expressed and potentially mis-localized enzyme within the cell. Therefore, the development of more sensitive and accurate m5C detection methods is still desired.
m5C can be further oxidized by ten-eleven translocation (TET) family enzymes to form hm5C (Delatte et al., 2016; Shen et al., 2018) (Fig. 2E). Similar to m5C-RIP, hMeRIP-Seq relied on an anti-hm5C antibody to detect over 3,000 hm5C peaks in Drosophila mRNA (Delatte et al., 2016). Additionally, the N4 position of cytosine can be acetylated by the acetyltransferase NAT10 to form ac4C, which is present in tRNA, rRNA and mRNA (Dong et al., 2016; Arango et al., 2018) (Fig. 2F). acRIP-Seq exploited an anti-ac4C antibody and identified over 4,000 ac4C peaks in the human transcriptome (Arango et al., 2018). However, the detection strategies for both hm5C and ac4C are based on specific antibodies and cannot reach single-base resolution, which hinders functional studies of these RNA modifications. Thus, learning from the success of single-base and quantitative m6A sequencing technologies, optimized methods are expected to be developed.
The rotation isomerization of uridine: Ψ
Ψ, known as the "fifth nucleotide" of RNA, is the most abundant modification in RNA and is widely present in tRNA, rRNA, snRNA, and mRNA (Karijolich et al., 2015). The formation of Ψ is catalyzed by two kinds of pseudouridine synthases (PUSs): "stand-alone" PUSs, which require no cofactor, and RNA-dependent PUSs, which require box H/ACA small nucleolar RNAs (snoRNAs) as guides to recognize substrates (Fig. 2G). In humans, the stand-alone synthases PUS1, PUS7 and TRUB1 and the RNA-dependent synthase DKC1 have been reported to catalyze a subset of Ψ in mRNA (Carlile et al., 2014; Schwartz et al., 2014a; Li et al., 2015), but it is still unclear whether other PUSs can also modify mRNA.
High-throughput sequencing methods for Ψ (Ψ-Seq, Pseudo-Seq, PSI-Seq, and CeU-Seq) rely on a chemical, N-cyclohexyl-N'-β-(4-methylmorpholinium)ethylcarbodiimide (CMC), which can specifically label Ψ (Carlile et al., 2014; Lovejoy et al., 2014; Schwartz et al., 2014a; Li et al., 2015). During RT, the CMC-Ψ adduct causes a stop one nucleotide 3′ to the labeled Ψ site, enabling the detection of 100-400 Ψ sites in human mRNA at base resolution (Carlile et al., 2014; Lovejoy et al., 2014; Schwartz et al., 2014a). However, these methods cannot pre-enrich Ψ sites and may miss Ψ in low-abundance RNA. CeU-Seq utilized a CMC derivative, azido-CMC (N3-CMC), to allow the pre-enrichment of Ψ-containing RNA through biotin pulldown, and identified about 2,000 Ψ sites in human mRNA (Li et al., 2015). In fact, the ratio of Ψ/U in mammalian mRNA as measured by LC-MS/MS (about 0.2%-0.6%) is comparable to the content of m6A (Li et al., 2015), which further supports the existence of thousands of Ψ sites in mRNA. The CMC chemistry can also be coupled to high-resolution qPCR analysis to conveniently detect locus-specific Ψ sites in mRNA and lncRNA (Lei and Yi, 2017). Moreover, bisulfite treatment converts Ψ into a monobisulfite adduct, which causes a deletion signature at Ψ sites during RT. Exploiting this reaction, RBS-Seq has been developed to detect Ψ (Khoddami et al., 2019). However, like ordinary CMC labeling, this method cannot pre-enrich Ψ sites and identified only 322 Ψ sites in mRNA; even for abundant tRNA, RBS-Seq failed to detect all known Ψ sites. Recently, by combining CMC labeling and demethylase treatment, DM-Ψ-Seq has been developed to detect Ψ sites globally in the tRNAome.
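The RT-stop signature of CMC labeling described above lends itself to a simple per-site calculation. The sketch below is a minimal illustration rather than the pipeline of any of the cited methods: it assumes per-position counts of cDNA termination events and read coverage for a CMC-treated sample and for a mock-treated control (the control, the function names and both cutoffs are assumptions), and it scores each uridine by the stop ratio observed one nucleotide 3′ of it.

```python
from typing import List, Sequence


def stop_ratio(stops: Sequence[int], coverage: Sequence[int], pos: int) -> float:
    """Fraction of reads covering `pos` whose cDNA terminates at `pos`."""
    cov = coverage[pos]
    return stops[pos] / cov if cov > 0 else 0.0


def call_psi(sequence: str,
             stops_cmc: Sequence[int], cov_cmc: Sequence[int],
             stops_mock: Sequence[int], cov_mock: Sequence[int],
             min_cov: int = 20,          # assumed coverage cutoff
             min_diff: float = 0.2) -> List[int]:
    """Return 0-based indices of uridines (RNA written 5'->3') with a
    CMC-dependent RT stop one nucleotide 3' of the uridine."""
    candidates = []
    for i, base in enumerate(sequence):
        stop_pos = i + 1  # RT halts one nucleotide 3' of the labeled site
        if base != "U" or stop_pos >= len(sequence):
            continue
        if cov_cmc[stop_pos] < min_cov or cov_mock[stop_pos] < min_cov:
            continue
        diff = (stop_ratio(stops_cmc, cov_cmc, stop_pos)
                - stop_ratio(stops_mock, cov_mock, stop_pos))
        if diff >= min_diff:
            candidates.append(i)
    return candidates
```

Subtracting the mock stop ratio is one way to discount stops caused by RNA structure or fragmentation rather than by the CMC adduct. The coverage cutoff also makes the dependence on transcript abundance explicit: sites in low-abundance RNA simply fail the filter.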
CMC labeling is not perfect. The alkaline treatment step can lead to RNA degradation, and not all Ψ sites are labeled equally. These issues may have contributed to the low overlap among the Ψ sites identified in mRNA by different methods. Yet, Ψ sites in abundant non-coding RNAs (rRNA, etc.) were highly correlated, suggesting that RNA abundance, and thus sequencing depth, influences the list of detected modifications. Further comparisons of different methods have revealed other factors that need to be considered, such as varied sequencing depth, different bioinformatics algorithms and cutoffs, and distinct cell lines and/or growth conditions (Li et al., 2016b; Zaringhalam and Papavasiliou, 2016). On the other hand, considering the dynamic nature of Ψ modification, it is likely that only a subset of pseudouridylation events has been reported. Thus, further improvements for Ψ profiling with quantification and higher sensitivity are still needed.
Not only a cap modification, but also an internal modification: m7G
m7G is a well-known mRNA cap modification. It is also prevalent in tRNA and has recently been identified at internal positions in mRNA as well (Chu et al., 2018; Malbec et al., 2019; Zhang et al., 2019a). METTL1-WDR4, known as a tRNA m7G methyltransferase complex, installs a subset of internal m7G in mRNA (Fig. 2H). Both antibody-based and chemical labeling sequencing methods have been developed to map m7G methylomes. m7G-MeRIP-Seq used an m7G-specific antibody to identify over 2,000 internal m7G peaks in the mammalian transcriptome (Zhang et al., 2019a). m7G miCLIP-Seq utilized cross-linking-induced truncation and mutation to detect m7G (Malbec et al., 2019). m7G-Seq adopted a reduction-induced depurination reaction to generate an abasic site at m7G positions, which can be further labeled with biotin and subsequently pulled down. The labeled m7G sites can cause misincorporation during RT, thus yielding a base-resolution map of the m7G methylome (Zhang et al., 2019a). Benefiting from high-throughput detection strategies, two groups independently found that internal m7G in mRNA plays regulatory roles in translation (Malbec et al., 2019; Zhang et al., 2019a). Considering that METTL1-modified m7G in tRNA is also required for translation and that the modification enzymes are shared between mRNA and tRNA (Lin et al., 2018), it would be interesting to separately probe its function in mRNA.
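The misincorporation signature used for base-resolution m7G mapping can be illustrated with a per-position mismatch rate. The following sketch assumes that base counts at a reference G are available for a chemically treated sample and for an untreated control; the control, the function names and the thresholds are hypothetical and serve only to show the idea.

```python
from typing import Dict


def mismatch_rate(base_counts: Dict[str, int], ref_base: str) -> float:
    """Fraction of reads whose base call differs from the reference base."""
    total = sum(base_counts.values())
    if total == 0:
        return 0.0
    return (total - base_counts.get(ref_base, 0)) / total


def is_candidate_m7g(treated: Dict[str, int], control: Dict[str, int],
                     ref_base: str = "G",
                     min_cov: int = 20,        # assumed coverage cutoff
                     min_gain: float = 0.1) -> bool:
    """Flag a position whose mismatch rate rises after treatment."""
    if sum(treated.values()) < min_cov or sum(control.values()) < min_cov:
        return False
    gain = mismatch_rate(treated, ref_base) - mismatch_rate(control, ref_base)
    return gain >= min_gain
```

Comparing against an untreated sample helps separate treatment-dependent misincorporation from SNPs and sequencing errors, which raise the mismatch rate in both samples alike.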

LONG-READ SEQUENCING FOR DNA AND RNA MODIFICATIONS
Most sequencing methods described above work with next-generation sequencing, which is limited by short read length. In contrast, third-generation sequencing methods, including PacBio Single-Molecule Real-Time (SMRT) sequencing (Ardui et al., 2018; Wenger et al., 2019) and Oxford Nanopore sequencing (Clarke et al., 2009; Jain et al., 2018), have been developed to enable long-read and single-molecule sequencing of DNA and RNA. Apart from the much longer read length, both SMRT and Nanopore sequencing also allow direct readout of DNA and RNA modifications.
SMRT sequencing, which distinguishes nucleobases in DNA through the fluorescently labelled nucleotides incorporated by the polymerase, can also detect base modifications such as 5mC, 5hmC and 6mA based on polymerase kinetics (Flusberg et al., 2010). Genome-wide mapping of 5hmC at single-base resolution in mESCs was realized by chemical labeling-mediated SMRT sequencing (Song et al., 2011a). Chemical labeling enables the affinity enrichment of 5hmC-containing DNA fragments and increases the kinetic signal of 5hmC during SMRT sequencing. SMRT sequencing can detect 6mA in DNA; however, caution is warranted because it overestimates the 6mA level in DNA samples in which 6mA is rare (O'Brown et al., 2019). Moreover, it is possible to detect m6A in RNA and RNA secondary structure by SMRT sequencing combined with reverse transcription (Vilfan et al., 2013).
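The kinetic signal mentioned above is commonly summarized as an inter-pulse duration (IPD) ratio, i.e. the mean time between successive nucleotide incorporations at a position in the native sample divided by that in an unmodified control. The term "IPD ratio" and everything in the sketch below (function names, data layout, the ratio cutoff) are assumptions added for illustration; they do not describe the analysis of any cited study.

```python
from statistics import mean
from typing import Dict, List, Sequence


def ipd_ratio(native_ipds: Sequence[float], control_ipds: Sequence[float]) -> float:
    """Mean IPD in the native sample relative to an unmodified control."""
    if not native_ipds or not control_ipds:
        raise ValueError("both samples need at least one IPD measurement")
    return mean(native_ipds) / mean(control_ipds)


def flag_positions(native: Dict[int, List[float]],
                   control: Dict[int, List[float]],
                   min_ratio: float = 2.0) -> List[int]:   # assumed cutoff
    """Return positions where the polymerase is slowed at least `min_ratio`-fold."""
    flagged = []
    for pos in sorted(native):
        if pos in control and ipd_ratio(native[pos], control[pos]) >= min_ratio:
            flagged.append(pos)
    return flagged
```

A fixed cutoff of this kind is prone to false positives when the modification is rare, which is consistent with the caution about 6mA overestimation noted above.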
As for Oxford Nanopore sequencing, different molecules generate different ionic currents when they pass through the nanoscale pore, and these currents serve as characteristic signatures to discriminate nucleosides in DNA or RNA (Venkatesan and Bashir, 2011; Jain et al., 2016; Garalde et al., 2018). Nanopore sequencing can directly read DNA or RNA in real time without PCR amplification or cDNA conversion (Rand et al., 2017; Simpson et al., 2017; Garalde et al., 2018). It can be applied to detect different kinds of modified bases, such as 5mC, 5hmC and 6mA in DNA and m6A, inosine, m5C, Ψ, and m7G in RNA, as well as RNA secondary structure and G-quadruplexes (Simpson et al., 2017; Garalde et al., 2018; Wongsurawat et al., 2018; Liu et al., 2019a; Smith et al., 2019; Viehweger et al., 2019; Workman et al., 2019).
Although third-generation sequencing is promising for directly detecting DNA and RNA modifications, its high error rate and immature base-calling currently limit practical application. Combining certain sequencing methods mentioned above with third-generation sequencing could provide highly accurate long-read epigenetic sequencing, as in lrTAPS (Liu et al., 2020).

OPEN ACCESS
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.