Abstract
The three-dimensional organization of the human genome is of crucial importance for gene regulation. Results from high-throughput chromosome conformation capture techniques show that the CCCTC-binding factor (CTCF) plays an important role in chromatin interactions, and that CTCF-mediated chromatin loops mostly occur between convergent CTCF-binding sites. However, it is still unclear whether sequence patterns other than the convergent CTCF motifs contribute to the formation of chromatin loops and, if so, which ones. To discover the complex sequence patterns underlying chromatin loop formation, we have developed a deep learning model, called DeepCTCFLoop, that predicts whether a chromatin loop can form between a pair of convergent CTCF motifs using only the DNA sequences of the motifs and their flanking regions. Our results suggest that DeepCTCFLoop can accurately distinguish convergent CTCF motif pairs that form chromatin loops from those that do not. It significantly outperforms CTCF-MP, a machine learning model based on word2vec and boosted trees, when only DNA sequences are used. Moreover, we show that the DNA binding motifs of ASCL1, SP2 and ZNF384 may facilitate the formation of chromatin loops in addition to convergent CTCF motifs. To our knowledge, this is the first published study to use deep learning techniques to discover the sequence motif patterns underlying CTCF-mediated chromatin loop formation. Our results provide useful information for understanding the mechanism of 3D genome organization. The source code and datasets used in this study for model construction are freely available at https://github.com/BioDataLearning/DeepCTCFLoop.
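To make the "DNA sequences only" input concrete, the following minimal sketch shows one common way a pair of anchor sequences (each a CTCF motif plus its flanking region) could be turned into a numeric matrix for a neural network. The one-hot scheme, the function names `one_hot_encode` and `encode_motif_pair`, and the choice to concatenate the two anchors are illustrative assumptions; the abstract does not specify how DeepCTCFLoop encodes its input.

```python
# Illustrative sketch only: the encoding scheme and helper names are
# assumptions, not taken from the DeepCTCFLoop source code.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def one_hot_encode(seq):
    """Map a DNA string to a list of 4-element rows.

    Bases outside A/C/G/T (for example N) become an all-zero row.
    """
    return [list(ONE_HOT.get(base, (0, 0, 0, 0))) for base in seq.upper()]


def encode_motif_pair(left_seq, right_seq):
    """Concatenate the encodings of the two anchor sequences into one L x 4 matrix."""
    return one_hot_encode(left_seq) + one_hot_encode(right_seq)


# Toy sequences; real inputs would be motifs with their flanking regions.
pair = encode_motif_pair("ACGT", "TTNA")
print(len(pair), len(pair[0]))
```

A matrix of this shape is the usual input to convolutional layers, whose filters can then be inspected as candidate sequence motifs.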
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Kuang, S., Wang, L. (2020). Deep Learning of CTCF-Mediated Chromatin Loops in 3D Genome Organization. In: Măndoiu, I., Murali, T., Narasimhan, G., Rajasekaran, S., Skums, P., Zelikovsky, A. (eds) Computational Advances in Bio and Medical Sciences. ICCABS 2019. Lecture Notes in Computer Science(), vol 12029. Springer, Cham. https://doi.org/10.1007/978-3-030-46165-2_7
DOI: https://doi.org/10.1007/978-3-030-46165-2_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-46164-5
Online ISBN: 978-3-030-46165-2
eBook Packages: Computer Science (R0)