Prokaryotic and eukaryotic promoters identification based on residual network transfer learning

  • Research Paper
Bioprocess and Biosystems Engineering

Abstract

Promoters are relevant to research on many diseases, such as coronary heart disease, diabetes and tumors, and identifying promoters is a fundamental task. Deep learning is widely used for promoter sequence recognition. Although deep models can recognize promoters quickly and accurately, they rely on large amounts of high-quality data. We therefore applied transfer learning to a deep residual network (ResNet), a typical deep network built on residual learning, to reduce the network's heavy dependence on large amounts of data in promoter prediction. We represented promoter sequences with binary one-hot encoding and used ResNet to extract feature representations from organisms with a large amount of promoter data. We then transferred the learned structural parameters to target organisms with insufficient promoter data to improve the generalization performance of ResNet on those organisms. We evaluated the approach on the promoter datasets of four organisms (Bacillus subtilis, Escherichia coli, Saccharomyces cerevisiae and Drosophila melanogaster). After deep transfer, the AUCs of ResNet’s promoter prediction were 0.8537 in prokaryotes and 0.8633 in eukaryotes, increases of 0.1513 and 0.1376, respectively.
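As an illustration of the binary one-hot representation mentioned in the abstract, the following is a minimal sketch only. The A/C/G/T column order, the all-zero row for unknown bases, and the name `one_hot_encode` are assumptions made for this example; the abstract does not specify the authors' exact encoding.

```python
# Minimal sketch of binary one-hot encoding for a DNA sequence.
# Assumption: columns are ordered A, C, G, T; the abstract does not state the layout.
NUCLEOTIDES = "ACGT"


def one_hot_encode(sequence):
    """Return one 4-element binary row per base.

    Bases outside A/C/G/T (for example N) become an all-zero row,
    which is an assumption of this sketch, not a detail from the paper.
    """
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    encoded = []
    for base in sequence.upper():
        row = [0] * len(NUCLEOTIDES)
        if base in index:
            row[index[base]] = 1
        encoded.append(row)
    return encoded


if __name__ == "__main__":
    # Print each base next to its encoded row for a short example sequence.
    example = "TATAAT"
    for base, row in zip(example, one_hot_encode(example)):
        print(base, row)
```

A sequence of length L becomes an L x 4 binary matrix under this layout, which is the kind of fixed-width input a convolutional network such as ResNet can consume.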

Fig. 1
Fig. 2
Fig. 3
Fig. 4

[…] source domain E. coli transferred to the target domain B. subtilis, using transfer learning (TL) and the baseline (B). B Line chart of AUC and Acc for the source domain B. subtilis transferred to the target domain E. coli, using transfer learning (TL) and the baseline (B). C Line chart of AUC and Acc for the source domain D. melanogaster transferred to the target domain S. cerevisiae, using transfer learning (TL) and the baseline (B). D Line chart of AUC and Acc for the source domain S. cerevisiae transferred to the target domain D. melanogaster, using transfer learning (TL) and the baseline (B)

Fig. 5
Fig. 6

[…] source domain and 50% of the target domain trained the model. C The pretrained model obtained from the ImageNet [33] database was fine-tuned using 50% of the target domain. D The model was pretrained in the source domain, then transferred to the target domain and fine-tuned using 50% of the target domain

References

  1. Kondapalli MS, Galimudi RK, Gundapaneni KK, Padala C, Cingeetham A, Gantala S, Ali A, Shyamala N, Sahu SK, Nallari P (2016) Mmp 1 circulating levels and promoter polymorphism in risk prediction of coronary artery disease in asymptomatic first degree relatives. Gene 595(1):115–120. https://doi.org/10.1016/j.gene.2016.09.041

  2. Gantala SR, Kondapalli MS, Kummari R, Padala C, Tupurani MA, Kupsal K, Galimudi RK, Gundapaneni KK, Puranam K, Shyamala N (2018) Collagenase-1 (-1607 1g/2g), gelatinase-a (-1306 c/t), stromelysin-1 (-1171 5a/6a) functional promoter polymorphisms in risk prediction of type 2 diabetic nephropathy. Gene 673(5):22–31. https://doi.org/10.1016/j.gene.2018.06.007

  3. Saif I, Kasmi Y, Allali K, Ennaji MM (2018) Prediction of DNA methylation in the promoter of gene suppressor tumor. Gene 651(20):166–173. https://doi.org/10.1016/j.gene.2018.01.082

  4. Towsey M, Timms P, Hogan J, Mathews SA (2008) The cross-species prediction of bacterial promoters using a support vector machine. Comput Biol Chem 32(5):359–366. https://doi.org/10.1016/j.compbiolchem.2008.07.009

  5. Demeler B, Zhou G (1991) Neural network optimization for E. coli promoter prediction. Nucleic Acids Res 19(7):1593–1599. https://doi.org/10.1093/nar/19.7.1593

  6. Silva SDAE, Forte F, Sartor ITS, Andrighetti T, Gerhardt GJL, Longaray Delamare AP, Echeverrigaray S (2014) DNA duplex stability as discriminative characteristic for Escherichia coli σ54- and σ28- dependent promoter sequences. Biologicals 42(1):22–28. https://doi.org/10.1016/j.biologicals.2013.10.001

  7. Coelho RV, de Avila E, Silva S, Echeverrigaray S, Delamare APL (2018) Bacillus subtilis promoter sequences data set for promoter prediction in gram-positive bacteria. Data Brief 19:264–270. https://doi.org/10.1016/j.dib.2018.05.025

  8. Lin H, Liang Z, Tang H, Chen W (2019) Identifying sigma70 promoters with novel pseudo nucleotide composition. IEEE/ACM Trans Comput Biol Bioinf 16(4):1316–1321. https://doi.org/10.1109/TCBB.2017.2666141

  9. Rahman MS, Aktar U, Jani MR, Shatabda S (2019) Ipromoter-fsen: identification of bacterial σ70 promoter sequences using feature subspace based ensemble classifier. Genomics 111(5):1160–1166. https://doi.org/10.1016/j.ygeno.2018.07.011

  10. Oubounyt M, Louadi Z, Tayara H, Chong KT (2019) Deepromoter: robust promoter predictor using deep learning. Front Genet 10:286. https://doi.org/10.3389/fgene.2019.00286

  11. Kalkatawi M, Magana-Mora A, Jankovic B, Bajic VB (2019) Deepgsr: an optimized deep-learning structure for the recognition of genomic signals and regions. Bioinformatics 35(7):1125–1132. https://doi.org/10.1093/bioinformatics/bty752

  12. Amin R, Rahman CR, Ahmed S, Sifat M, Shatabda S (2020) Ipromoter-bncnn: a novel branched cnn based predictor for identifying and classifying sigma promoters. Bioinformatics 36(19):4869–4875. https://doi.org/10.1093/bioinformatics/btaa609

  13. Umarov RK, Solovyev VV (2017) Recognition of prokaryotic and eukaryotic promoters using convolutional deep learning neural networks. PLoS One 12(2):e0171410. https://doi.org/10.1371/journal.pone.0171410

  14. Cai M, Hao Nguyen C, Mamitsuka H, Li L (2021) Xgsea: cross-species gene set enrichment analysis via domain adaptation. Brief Bioinform 22(5):a406. https://doi.org/10.1101/2020.07.21.213645

  15. Engelen JEV, Hoos HH (2020) A survey on semi-supervised learning. Mach Learn 109(2):373–440. https://doi.org/10.1007/s10994-019-05855-6

  16. Settles B (2010) Active learning literature survey. University of Wisconsin-Madison. http://digital.library.wisc.edu/1793/60660

  17. Weiss K, Khoshgoftaar TM, Wang D (2016) A survey of transfer learning. J Big Data 3(1):9. https://doi.org/10.1186/s40537-016-0043-6

  18. Li L, Cai M (2019) Cross-species data classification by domain adaptation via discriminative heterogeneous maximum mean discrepancy. IEEE/ACM Trans Comput Biol Bioinf 18(1):312–324. https://doi.org/10.1109/tcbb.2019.2914103

  19. Pan SJ, Yang Q (2010) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10):1345–1359. https://doi.org/10.1109/TKDE.2009.191

  20. Zhang W, Li R, Zeng T, Sun Q, Kumar S, Ye J, Ji S (2020) Deep model based transfer and multi-task learning for biological image analysis. Proc Tenth ACM SIGKDD Int Conf Knowl Discov Data Mining 6(2):1475–1484. https://doi.org/10.1145/2783258.2783304

  21. Sevakula RK, Singh V, Verma NK, Kumar C, Cui Y (2019) Transfer learning for molecular cancer classification using deep neural networks. IEEE/ACM Trans Comput Biol Bioinf 16(6):2089–2100. https://doi.org/10.1109/TCBB.2018.2822803

  22. Wang S, Li Z, Yu Y, Xu J (2017) Folding membrane proteins by deep transfer learning. Cell Syst 5(3):202–211. https://doi.org/10.1016/j.cels.2017.09.001

  23. Giorgi JM, Bader GD (2018) Transfer learning for biomedical named entity recognition with neural networks. Bioinformatics 34(23):4087–4094. https://doi.org/10.1093/bioinformatics/bty449

  24. Hanson J, Litfin T, Paliwal K, Zhou Y (2019) Identifying molecular recognition features in intrinsically disordered regions of proteins by transfer learning. Bioinformatics 36(4):1107–1113. https://doi.org/10.1093/bioinformatics/btz691

  25. Sharifi-Noghabi H, Peng S, Zolotareva O, Collins CC, Ester M (2020) Aitl: adversarial inductive transfer learning with input and output space adaptation for pharmacogenomics. Bioinformatics 36(Supplement_1):i380–i388. https://doi.org/10.1101/2020.01.24.918953

  26. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), p 770–778. https://doi.org/10.1109/CVPR.2016.90

  27. He W, Jia C, Duan Y, Zou Q (2018) 70propred: a predictor for discovering sigma70 promoters based on combining multiple features. BMC Syst Biol. https://doi.org/10.1186/s12918-018-0570-1

  28. Umarov R, Kuwahara H, Li Y, Gao X, Solovyev V (2019) Promoter analysis and prediction in the human genome using sequence-based deep learning models. Bioinformatics 35(16):2730–2737. https://doi.org/10.1093/bioinformatics/bty1068

  29. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. Comput Sci. https://arxiv.org/abs/1409.1556v6

  30. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436–444. https://doi.org/10.1038/nature14539

  31. Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw 61:85–117. https://doi.org/10.1016/j.neunet.2014.09.003

  32. Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. ICML 37:448–456. https://doi.org/10.5555/3045118.3045167

  33. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical image database. In: IEEE Conference on Computer Vision and Pattern Recognition, p 248–255. https://doi.org/10.1109/CVPR.2009.5206848

  34. Li QZ, Lin H (2006) The recognition and prediction of sigma70 promoters in Escherichia coli k-12. J Theor Biol 242(1):135–141. https://doi.org/10.1016/j.jtbi.2006.02.007

  35. Song K (2012) Recognition of prokaryotic promoters based on a novel variable-window z-curve method. Nucleic Acids Res 40(3):963–971. https://doi.org/10.1093/nar/gkr795

  36. Lin H, Li QZ (2011) Eukaryotic and prokaryotic promoter prediction using hybrid approach. Theor Biosci 130:91–100. https://doi.org/10.1007/s12064-010-0114-8

  37. Lai HY, Zhang ZY, Su ZD (2019) iProEP: a computational predictor for predicting promoter. Mol Ther-Nucleic Acids 17:337–346. https://doi.org/10.1016/j.omtn.2019.05.028

Funding

This work was supported by the Key Project of Basic Science and Frontier Technology of Chongqing [No. cstc2015jcyjB0493] and the Fundamental Research Funds for the Central Universities [No. 106112017CDJPT160001].

Author information

Contributions

XL: conceptualization, writing—review and editing, resources. YX: writing—original draft, software, visualization. YL: investigation. LT: data curation.

Corresponding author

Correspondence to Xiao Liu.

Ethics declarations

Conflict of interest

Non-financial interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 52 KB)

About this article

Cite this article

Liu, X., Xu, Y., Luo, Y. et al. Prokaryotic and eukaryotic promoters identification based on residual network transfer learning. Bioprocess Biosyst Eng 45, 955–967 (2022). https://doi.org/10.1007/s00449-022-02716-w

  • DOI: https://doi.org/10.1007/s00449-022-02716-w
