Abstract
Several chemical modifications of cytosine are important for the regulation and, ultimately, the functional expression of the genome. To date, no single experiment can capture all of these modifications, so integrative experimental designs are needed to fully characterize cytosine methylation and its chemical derivatives. This chapter describes Lux, a generative probabilistic model for the integrative analysis of cytosine methylation and its oxidized variants. Lux jointly analyzes partially orthogonal bisulfite sequencing data sets. By integrating across experimental designs composed of multiple parallel, destructive genomic measurements, it estimates the proportions of several cytosine modifications in a single sample. Lux also accounts for the measurement variation introduced by imperfect experimental steps. This variation can be quantified with appropriate spike-in controls, which allows Lux to deconvolve the measurements and accurately recover the underlying signal.
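To make "deconvolving imperfect measurements" concrete, here is a deliberately simplified Python sketch of the kind of forward model the abstract describes. It covers only two assays (BS-seq and oxBS-seq) and three cytosine states (C, 5mC, 5hmC). It is not Lux's code or API. The function names, the parameters `bs_eff`, `ox_eff` and `seq_err`, the symmetric sequencing-error model, and the assumption that oxidized 5hmC converts like unmodified C are all illustrative assumptions. The sketch evaluates the posterior on a grid under a flat Dirichlet(1, 1, 1) prior. Lux itself performs hierarchical Bayesian inference and handles more modifications and assays.

```python
import math
from typing import Dict, Tuple

# Illustrative sketch only: names, parameters and the simplified chemistry
# below are assumptions, not Lux's actual model or API.
Theta = Tuple[float, float, float]  # proportions of (C, 5mC, 5hmC)


def prob_read_c(theta: Theta, experiment: str, bs_eff: float,
                ox_eff: float, seq_err: float) -> float:
    """Probability that a read shows C at a site with proportions theta."""
    c, mc, hmc = theta
    if experiment == "BS":
        # Unmodified C is converted (read as T) with probability bs_eff;
        # 5mC and 5hmC are protected and read as C.
        p = c * (1.0 - bs_eff) + mc + hmc
    elif experiment == "oxBS":
        # 5hmC is oxidized with probability ox_eff; the oxidized base is
        # assumed to convert like unmodified C. Unoxidized 5hmC reads as C.
        p_hmc = (1.0 - ox_eff) + ox_eff * (1.0 - bs_eff)
        p = c * (1.0 - bs_eff) + mc + hmc * p_hmc
    else:
        raise ValueError(f"unknown experiment: {experiment}")
    # Symmetric sequencing error: a C/T call is flipped with prob seq_err.
    return p * (1.0 - seq_err) + (1.0 - p) * seq_err


def estimate_bs_eff(n_c: int, n_total: int, seq_err: float) -> float:
    """Point estimate of bs_eff from a fully unmethylated spike-in, obtained
    by inverting f = (1 - bs_eff)(1 - e) + bs_eff * e for bs_eff."""
    f = n_c / n_total
    return (1.0 - seq_err - f) / (1.0 - 2.0 * seq_err)


def posterior_mean(counts: Dict[str, Tuple[int, int]], bs_eff: float,
                   ox_eff: float, seq_err: float, grid: int = 200) -> Theta:
    """Posterior mean of theta under a flat Dirichlet(1, 1, 1) prior,
    approximated on a regular grid over the probability simplex.

    counts maps an experiment name to (reads showing C, total reads).
    """
    if not 0.0 < seq_err < 0.5:
        raise ValueError("seq_err must lie in (0, 0.5) to keep log(p) finite")
    points, log_liks = [], []
    for i in range(grid + 1):
        for j in range(grid + 1 - i):
            theta = (i / grid, j / grid, (grid - i - j) / grid)
            ll = 0.0
            for experiment, (n_c, n_total) in counts.items():
                p = prob_read_c(theta, experiment, bs_eff, ox_eff, seq_err)
                # Binomial log-likelihood, dropping the constant log C(n, k).
                ll += n_c * math.log(p) + (n_total - n_c) * math.log(1.0 - p)
            points.append(theta)
            log_liks.append(ll)
    # Flat prior: posterior weights are proportional to the likelihood.
    top = max(log_liks)
    weights = [math.exp(ll - top) for ll in log_liks]
    total = sum(weights)
    return tuple(
        sum(w * pt[k] for w, pt in zip(weights, points)) / total
        for k in range(3)
    )


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    bs = estimate_bs_eff(n_c=20, n_total=1000, seq_err=0.01)
    counts = {"BS": (796, 1000), "oxBS": (519, 1000)}
    print(posterior_mean(counts, bs_eff=bs, ox_eff=0.95, seq_err=0.01))
```

The design mirrors the abstract's argument. Neither assay alone separates 5mC from 5hmC: in BS-seq both read as C, while oxBS-seq removes most 5hmC from the C signal. Combining the two likelihoods makes the proportions identifiable. The spike-in step shows how an efficiency parameter can be calibrated and then fed into the deconvolution. Here that is done as a separate point estimate, which is a simplification of modeling it jointly.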
Key words
- DNA methylation
- Bayesian analysis
- Hierarchical generative modeling
- 5-methylcytosine oxidation
- Bisulfite sequencing
- BS-seq/oxBS-seq/TAB-seq/fCAB-seq/CAB-seq/redBS-seq/MAB-seq
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Äijö, T., Bonneau, R., Lähdesmäki, H. (2018). Generative Models for Quantification of DNA Modifications. In: Mamitsuka, H. (eds) Data Mining for Systems Biology. Methods in Molecular Biology, vol 1807. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8561-6_4
DOI: https://doi.org/10.1007/978-1-4939-8561-6_4
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-8560-9
Online ISBN: 978-1-4939-8561-6
eBook Packages: Springer Protocols