Abstract
The 3D structure of the genome is central to understanding how disease-associated genetic variants in the noncoding genome regulate their target genes. Genome architecture spans large-scale structures that are determined by fine-grained regulatory elements, which makes it challenging to predict the effects of sequence and structural variants. Experimental approaches for mapping chromatin interactions remain costly and time-consuming, limiting their use for interrogating, at scale, the changes in chromatin architecture associated with genomic variation. Computational models that predict chromatin interactions have either interpreted chromatin at coarse resolution or failed to capture the long-range dependencies of larger sequence contexts. To bridge this gap, we previously developed deepC, a deep neural network approach that predicts chromatin interactions from DNA sequence at megabase scale. deepC employs dilated convolutional layers to cover a large sequence context while still interpreting the DNA sequence at single base pair resolution. Transfer learning from convolutional weights trained to predict a compendium of chromatin features across cell types allows deepC to predict cell type-specific chromatin interactions from DNA sequence alone. Here, we present a detailed workflow for predicting chromatin interactions with deepC. We detail the necessary data pre-processing steps, guide the reader through deepC model training, and demonstrate how to use trained models to predict chromatin interactions and the effect of sequence variation on genome architecture.
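To illustrate why dilated convolutions can reach a large sequence context without giving up base pair resolution, the following is a minimal sketch that computes the receptive field of a stack of stride-1 1D convolutions. The kernel width, the number of layers, and the doubling dilation schedule are illustrative assumptions, not the settings used by deepC, and `receptive_field` is a hypothetical helper written for this example.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field, in input positions, of stacked stride-1 1D convolutions.

    Each layer with dilation d widens the receptive field by
    (kernel_size - 1) * d positions.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


# Assumed, illustrative configuration (not deepC's actual settings):
# 10 layers, kernel width 3.
n_layers = 10
kernel_size = 3

# Dilation doubles with every layer: 1, 2, 4, ..., 512.
doubling = [2 ** i for i in range(n_layers)]
dilated_rf = receptive_field(kernel_size, doubling)

# Same depth and kernel width without dilation.
plain_rf = receptive_field(kernel_size, [1] * n_layers)

print(f"dilated stack:   {dilated_rf} bp")
print(f"undilated stack: {plain_rf} bp")
```

Under these assumptions the receptive field grows exponentially with depth when the dilation doubles per layer, but only linearly without dilation. Because no pooling or striding is involved, every input base still contributes at its own position.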
Copyright information
© 2023 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Schwessinger, R. (2023). Predicting Chromatin Interactions from DNA Sequence Using DeepC. In: Oliveira, P.H. (eds) Computational Epigenomics and Epitranscriptomics. Methods in Molecular Biology, vol 2624. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2962-8_3
DOI: https://doi.org/10.1007/978-1-0716-2962-8_3
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2961-1
Online ISBN: 978-1-0716-2962-8
eBook Packages: Springer Protocols