Abstract
Background
In the human genome, distal enhancers are involved in regulating target genes through proximal promoters by forming enhancer-promoter interactions. Although recently developed high-throughput experimental approaches have allowed us to recognize potential enhancer-promoter interactions genome-wide, it is still largely unclear to what extent the sequence-level information encoded in our genome helps guide such interactions.
Methods
Here we report a new computational method (named “SPEID”) using deep learning models to predict enhancer-promoter interactions based on sequence-based features only, when the locations of putative enhancers and promoters in a particular cell type are given.
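As a rough illustration of what "sequence-based features only" can mean at the input level, the sketch below one-hot encodes an enhancer sequence and a promoter sequence so that they could be fed to a neural network as two separate inputs. The helper names (`one_hot`, `encode_pair`), the A/C/G/T channel order, and the all-zero treatment of unknown bases are assumptions made for this example and are not taken from the SPEID implementation.

```python
# Minimal sketch, assuming a plain one-hot encoding of DNA.
# Function names and the handling of unknown bases are hypothetical.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows in A, C, G, T order.

    Bases outside ACGT (for example N) are encoded as all zeros.
    """
    index = {base: i for i, base in enumerate(BASES)}
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


def encode_pair(enhancer_seq, promoter_seq):
    """Return the encoded enhancer and promoter as two separate inputs."""
    return one_hot(enhancer_seq), one_hot(promoter_seq)


# Toy sequences for illustration only; real inputs would be far longer.
enh, prom = encode_pair("ACGTN", "ttag")
```

Each encoded sequence is a length-by-4 matrix, which is the usual shape expected by one-dimensional convolutional layers operating on DNA.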
Results
Our results across six different cell types demonstrate that SPEID is effective in predicting enhancer-promoter interactions as compared to state-of-the-art methods that only use information from a single cell type. As a proof-of-principle, we also applied SPEID to identify somatic non-coding mutations in melanoma samples that may have reduced enhancer-promoter interactions in tumor genomes.
Conclusions
This work demonstrates that deep learning models can help reveal that sequence-based features alone are sufficient to reliably predict enhancer-promoter interactions genome-wide.
Acknowledgements
We thank the members of the Ma lab, especially Yang Zhang, Yuchuan Wang, Ruochi Zhang, and Dechao Tian, for helpful discussions. We also thank Yihang Shen for technical assistance. This work was supported in part by the National Science Foundation (1252522 to Shashank Singh, 1054309 and 1262575 to Jian Ma) and the National Institutes of Health (HG007352 and DK107965 to Jian Ma).
Additional information
Author summary: Distal enhancers in the human genome regulate target genes by interacting with promoters, forming enhancer-promoter interactions (EPIs). Experimental approaches have allowed us to recognize potential EPIs genome-wide, but it is unclear how the sequence information encoded in our genome helps guide such interactions. Here we report a novel machine learning tool (named SPEID) using deep neural networks that predicts EPIs directly from the DNA sequences, given locations of putative enhancers and promoters. We also apply SPEID to identify mutations that may have reduced EPIs in melanoma genomes. This work demonstrates that sequence-based features are sufficient to predict EPIs genome-wide.
About this article
Cite this article
Singh, S., Yang, Y., Póczos, B. et al. Predicting enhancer-promoter interaction from genomic sequence with deep neural networks. Quant Biol 7, 122–137 (2019). https://doi.org/10.1007/s40484-019-0154-0
DOI: https://doi.org/10.1007/s40484-019-0154-0