Abstract
Bisulfite conversion of genomic DNA combined with next-generation sequencing (BS-seq) is widely used to measure the methylation state of a whole genome, the methylome, at single-base resolution. However, analysis of BS-seq data still poses a considerable challenge. Here we summarize the challenges of BS-seq mapping as they apply to both base-space and color-space data. We also explore the effect of sequencing errors and contaminants on inferred methylation levels and recommend the most appropriate way to analyze this type of data.
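As a rough illustration of how a methylation level is inferred from bisulfite-converted reads, the sketch below compares one already-aligned top-strand read with its reference: a C that survives conversion is called methylated, and a C read as T is called unmethylated. The function names `call_methylation` and `methylation_level` are hypothetical and are not taken from the article or from any published tool. The sketch assumes an ungapped alignment of equal length and ignores the reverse strand, base qualities and color-space encoding.

```python
from typing import List, Optional, Tuple


def call_methylation(reference: str, read: str) -> List[Tuple[int, str]]:
    """Call methylation at each reference cytosine covered by the read.

    Assumes the read is already aligned, ungapped, to the top strand
    (an illustrative simplification, not a full BS-seq pipeline).
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must have the same length")

    calls = []
    for pos, (ref_base, read_base) in enumerate(zip(reference.upper(), read.upper())):
        if ref_base != "C":
            continue  # only reference cytosines are informative
        if read_base == "C":
            calls.append((pos, "methylated"))    # C protected from conversion
        elif read_base == "T":
            calls.append((pos, "unmethylated"))  # C converted and read as T
        else:
            calls.append((pos, "other"))         # mismatch, e.g. SNP or error
    return calls


def methylation_level(calls: List[Tuple[int, str]]) -> Optional[float]:
    """Fraction of informative calls that are methylated, or None if none."""
    methylated = sum(1 for _, state in calls if state == "methylated")
    unmethylated = sum(1 for _, state in calls if state == "unmethylated")
    total = methylated + unmethylated
    return methylated / total if total else None


if __name__ == "__main__":
    example_calls = call_methylation("ACGTCCGA", "ACGTTTGA")
    print(example_calls)
    print(methylation_level(example_calls))
```

Under this simple rule, a sequencing error that turns a T into a C at a reference cytosine is indistinguishable from genuine methylation, which is one way errors can bias inferred methylation levels.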
Acknowledgements
This work was funded by the Biotechnology and Biological Sciences Research Council, UK. A.F. and B.K. received infrastructure support from the Deutsche Forschungsgemeinschaft Excellence Cluster 'Inflammation at Interfaces'.
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–3 and Supplementary Table 1 (PDF 507 kb)
Cite this article
Krueger, F., Kreck, B., Franke, A. et al. DNA methylome analysis using short bisulfite sequencing data. Nat Methods 9, 145–151 (2012). https://doi.org/10.1038/nmeth.1828
DOI: https://doi.org/10.1038/nmeth.1828
- Springer Nature America, Inc.