EPDRNA: A Model for Identifying DNA–RNA Binding Sites in Disease-Related Proteins

The Protein Journal

Abstract

Protein–DNA and protein–RNA interactions are involved in many biological processes, regulate many cellular functions, and are associated with many human diseases. To understand the molecular mechanisms of protein–DNA and protein–RNA binding, it is important to identify which residues in a protein sequence bind DNA or RNA. At present, few methods specifically identify DNA- and RNA-binding sites in disease-related proteins. In this study, we therefore combined four machine learning algorithms into an ensemble classifier (EPDRNA) to predict DNA- and RNA-binding sites in disease-related proteins. The dataset used to build the model was collated from the UniProt and PDB databases, and the position-specific scoring matrix (PSSM), physicochemical properties, and amino acid type were used as features. EPDRNA adopted soft voting and, in 10-fold cross-validation on the training sets, achieved a best AUC of 0.73 for DNA-binding sites and 0.71 for RNA-binding sites. To further verify the model's performance, we assessed EPDRNA's prediction of DNA-binding and RNA-binding sites on independent test datasets. EPDRNA achieved 85% recall and 25% precision on the protein–DNA independent test set, and 82% recall and 27% precision on the protein–RNA independent test set. The EPDRNA web server is freely available at http://www.s-bioinformatics.cn/epdrna.
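
As a rough illustration of the soft-voting idea mentioned in the abstract, the sketch below averages per-residue binding probabilities from four classifiers and then computes precision and recall. It is not the authors' implementation: the abstract does not name the four algorithms, so the probabilities, the labels, the 0.5 decision threshold, and the function names `soft_vote` and `precision_recall` are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def soft_vote(prob_by_model: Sequence[Sequence[float]]) -> List[float]:
    """Soft voting: average each residue's binding probability over all models."""
    n_models = len(prob_by_model)
    n_residues = len(prob_by_model[0])
    return [
        sum(model[i] for model in prob_by_model) / n_models
        for i in range(n_residues)
    ]


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Precision and recall for the positive (binding) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical binding probabilities from four classifiers for six residues.
probs = [
    [0.9, 0.2, 0.6, 0.4, 0.1, 0.7],
    [0.8, 0.4, 0.5, 0.6, 0.2, 0.6],
    [0.7, 0.3, 0.7, 0.5, 0.3, 0.4],
    [0.6, 0.3, 0.6, 0.1, 0.2, 0.7],
]
labels = [1, 0, 0, 0, 0, 1]  # hypothetical ground truth: 1 = binding residue

avg = soft_vote(probs)
threshold = 0.5  # assumed cutoff; the abstract does not state one
preds = [1 if p >= threshold else 0 for p in avg]
precision, recall = precision_recall(labels, preds)
print(preds, round(precision, 2), round(recall, 2))
```

In this toy case every true binding residue is recovered, but one non-binding residue is also flagged, so recall is higher than precision. The same trade-off, on a much larger scale, is what the reported recall and precision figures on the independent test sets describe.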


Acknowledgements

The authors are grateful to the author of PyMOL, Warren Lyford DeLano, and acknowledge the authors of DB-Bind and DRNApred for making their methods available. The authors also thank the anonymous reviewers for their valuable suggestions and comments, which have improved this paper. The work was supported by the National Natural Science Foundation of China (No. 62262050) and the Special Fund of the National Natural Science Foundation of China (No. 62141204).

Author information

Contributions

YongE Feng designed the project, performed the analysis, and drafted the manuscript. CanZhuang Sun collected the data, carried out the computation of binding sites, and set up the web server. All authors read and approved the final manuscript.

Corresponding author

Correspondence to YongE Feng.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sun, C., Feng, Y. EPDRNA: A Model for Identifying DNA–RNA Binding Sites in Disease-Related Proteins. Protein J (2024). https://doi.org/10.1007/s10930-024-10183-3

  • DOI: https://doi.org/10.1007/s10930-024-10183-3
