XPredRBR: Accurate and Fast Prediction of RNA-Binding Residues in Proteins Using eXtreme Gradient Boosting

  • Lei Deng
  • Zuojin Dong
  • Hui Liu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10847)

Abstract

A variety of studies have shown that protein-RNA interactions play a vital role in many fundamental cellular processes, such as protein synthesis, mRNA processing, mRNA assembly, ribosome function and eukaryotic spliceosomes. Identification of RNA-binding residues (RBR) in proteins is a key step toward understanding the mutual recognition mechanism underlying protein-RNA interactions. In this paper, we propose a novel method, XPredRBR, to predict RNA-binding residues in proteins by exploiting the eXtreme Gradient Boosting (XGBoost) algorithm. Two new types of predictive features, derived from the residue interaction network and from solvent exposure, are combined with conventional sequence features and structural neighborhood features to predict RBR. We carried out empirical experiments on two datasets to demonstrate the performance of the proposed method. In 10-fold cross-validation, our method achieved an accuracy of 0.861, a sensitivity of 0.872, an MCC of 0.584 and an AUC of 0.941 on the RBP170 dataset. On the independent test set RBP101, XPredRBR outperformed three traditional classifiers and seven existing RNA-binding residue prediction methods. A case study on chain E of the protein 3PLA illustrated that XPredRBR effectively identified most RNA-binding and non-RNA-binding sites. Furthermore, XPredRBR is much faster than our previous method, PredRBR. These experimental results show that the proposed method achieves state-of-the-art performance in predicting RNA-binding residues in proteins.
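For readers unfamiliar with the reported measures, the following is a minimal sketch of how accuracy, sensitivity and the Matthews correlation coefficient (MCC) are conventionally computed from binary per-residue predictions. It assumes binding residues are the positive class (label 1). The function names `confusion_counts` and `binary_metrics` are hypothetical and chosen for illustration only; this is not the authors' evaluation code, and AUC is omitted because it requires ranked prediction scores rather than hard labels.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = binding, 0 = non-binding)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, specificity and MCC from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Sensitivity: fraction of true binding residues that were recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of non-binding residues correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # MCC stays informative when the two classes are imbalanced.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Made-up labels for illustration only; not data from the paper.
    y_true = [1, 1, 0, 0, 1, 0, 0, 0]
    y_pred = [1, 0, 0, 1, 1, 0, 0, 0]
    print(binary_metrics(*confusion_counts(y_true, y_pred)))
```

Binding residues are typically a minority of all residues in a protein, so accuracy alone can look high even for a weak predictor. This is presumably why MCC and AUC are reported alongside it.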

Keywords

Protein-RNA interactions; eXtreme gradient boosting; RNA-binding residues

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. School of Software, Central South University, Changsha, China
  2. Lab of Information Management, Changzhou University, Changzhou, China
