
Predicting Chromatin Interactions from DNA Sequence Using DeepC

  • Protocol
Computational Epigenomics and Epitranscriptomics

Part of the book series: Methods in Molecular Biology (MIMB, volume 2624)

Abstract

The 3D structure of the genome is central to understanding how disease-associated genetic variants in the noncoding genome regulate their target genes. Genome architecture spans large-scale structures that are determined by fine-grained regulatory elements, which makes it challenging to predict the effects of sequence and structural variants. Experimental approaches for mapping chromatin interactions remain costly and time-consuming, limiting their use for interrogating, at scale, the changes in chromatin architecture associated with genomic variation. Computational models that predict chromatin interactions have either interpreted chromatin at coarse resolution or failed to capture the long-range dependencies of larger sequence contexts. To bridge this gap, we previously developed deepC, a deep neural network approach that predicts chromatin interactions from DNA sequence at megabase scale. deepC employs dilated convolutional layers to capture a large sequence context while simultaneously interpreting the DNA sequence at single base pair resolution. Transfer learning of convolutional weights, trained to predict a compendium of chromatin features across cell types, allows deepC to predict cell type-specific chromatin interactions from DNA sequence alone. Here, we present a detailed workflow to predict chromatin interactions with deepC. We detail the necessary data pre-processing steps, guide the reader through deepC model training, and demonstrate how to employ trained models to predict chromatin interactions and the effect of sequence variations on genome architecture.
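The key architectural idea above, that dilated convolutions let a network see a large sequence context while still reading every individual base pair, can be illustrated with a minimal NumPy sketch. It assumes a one-hot (A, C, G, T) input encoding, a kernel width of 3 and a doubling dilation schedule; these are illustrative choices, not deepC's published architecture, and the helper names (`one_hot`, `dilated_conv1d`, `receptive_field`) are hypothetical.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as a (length, 4) array; N and other symbols become all-zero rows."""
    idx = {b: i for i, b in enumerate(BASES)}
    out = np.zeros((len(seq), 4), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        if base in idx:
            out[pos, idx[base]] = 1.0
    return out


def dilated_conv1d(x: np.ndarray, w: np.ndarray, dilation: int) -> np.ndarray:
    """'Valid' 1D convolution (cross-correlation, as in deep learning libraries) with dilation.

    x: (length, in_channels); w: (kernel_size, in_channels, out_channels).
    """
    k, _, c_out = w.shape
    span = (k - 1) * dilation + 1  # input positions covered by one output
    n_out = x.shape[0] - span + 1
    y = np.zeros((n_out, c_out), dtype=x.dtype)
    for t in range(n_out):
        taps = x[t : t + span : dilation]  # the k positions t, t+d, ..., t+(k-1)d
        y[t] = np.einsum("ki,kio->o", taps, w)
    return y


def receptive_field(kernel_size: int, dilations: list[int]) -> int:
    """Number of input base pairs that influence one output of a stack of dilated convs."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Usage: five layers with random (untrained) weights over a 160 bp sequence.
rng = np.random.default_rng(0)
dilations = [1, 2, 4, 8, 16]  # hypothetical doubling schedule
h = one_hot("ACGT" * 40)
channels = 4
for d in dilations:
    w = rng.normal(size=(3, channels, 8))
    h = np.maximum(dilated_conv1d(h, w, d), 0.0)  # ReLU nonlinearity
    channels = 8

print(h.shape, receptive_field(3, dilations))  # (98, 8) 63
```

With five layers of width-3 kernels, doubling the dilation widens the receptive field to 63 bp, compared with 11 bp for the same stack without dilation, while each output still depends on individual base pairs rather than pooled bins. Because the receptive field grows exponentially with depth under this schedule, adding layers is how a dilated network can in principle reach megabase-scale context.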

Copyright information

© 2023 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Schwessinger, R. (2023). Predicting Chromatin Interactions from DNA Sequence Using DeepC. In: Oliveira, P.H. (eds) Computational Epigenomics and Epitranscriptomics. Methods in Molecular Biology, vol 2624. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2962-8_3

  • DOI: https://doi.org/10.1007/978-1-0716-2962-8_3

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-2961-1

  • Online ISBN: 978-1-0716-2962-8

  • eBook Packages: Springer Protocols