
XPredRBR: Accurate and Fast Prediction of RNA-Binding Residues in Proteins Using eXtreme Gradient Boosting

  • Conference paper
Bioinformatics Research and Applications (ISBRA 2018)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 10847)


Abstract

A variety of studies have shown that protein-RNA interactions play a vital role in many fundamental cellular processes, such as protein synthesis, mRNA processing, mRNA assembly, ribosome function and eukaryotic spliceosomes. Identifying RNA-binding residues (RBR) in proteins is a key step toward understanding the mutual recognition mechanism underlying protein-RNA interactions. In this paper, we propose a novel method, XPredRBR, that predicts RNA-binding residues in proteins using the eXtreme Gradient Boosting (XGBoost) algorithm. Two new types of predictive features, derived from the residue interaction network and from solvent exposure, are combined with conventional sequence features and structural neighborhood features to predict RBR. We carried out empirical experiments on two datasets to demonstrate the performance of the proposed method. In 10-fold cross-validation on the RBP170 dataset, our method achieved an accuracy of 0.861, a sensitivity of 0.872, an MCC of 0.584 and an AUC of 0.941. On the independent test set RBP101, XPredRBR outperformed three traditional classifiers and seven existing methods for predicting RNA-binding residues. A case study on chain E of the protein 3PLA illustrated that XPredRBR effectively identified most RNA-binding and non-RNA-binding sites. Furthermore, XPredRBR is much faster than our previous method, PredRBR. These experimental results show that the proposed method achieves state-of-the-art performance in predicting RNA-binding residues in proteins.
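For readers unfamiliar with the four metrics reported in the abstract, the following is a minimal sketch of how accuracy, sensitivity, MCC and AUC are commonly computed for a binary residue-level classifier. It is not the authors' evaluation code: the function names, the 0.5 decision threshold and the toy labels and scores are all assumptions made for illustration, and none of the numbers come from the RBP170 or RBP101 datasets.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of residues classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(tp, fn):
    """Fraction of true binding residues that were recovered (recall)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) definition.

    Equals the probability that a random positive is scored above a random
    negative, with ties counted as one half. Quadratic in the input size,
    so suitable only for small illustrative inputs.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = RNA-binding residue, 0 = non-binding residue.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]
threshold = 0.5  # assumed cut-off, not taken from the paper
y_pred = [1 if s >= threshold else 0 for s in scores]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print("accuracy   ", round(accuracy(tp, tn, fp, fn), 3))
print("sensitivity", round(sensitivity(tp, fn), 3))
print("MCC        ", round(mcc(tp, tn, fp, fn), 3))
print("AUC        ", round(auc(y_true, scores), 3))
```

MCC is worth reporting alongside accuracy because binding residues are usually a minority class: a predictor that labels every residue as non-binding can score a high accuracy while its MCC stays at zero.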

H. Liu—This work was supported by National Natural Science Foundation of China under grants No. 61672541 and No. 61672113, and Natural Science Foundation of Hunan Province under grant No. 2017JJ3287.

Corresponding author

Correspondence to Hui Liu.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Deng, L., Dong, Z., Liu, H. (2018). XPredRBR: Accurate and Fast Prediction of RNA-Binding Residues in Proteins Using eXtreme Gradient Boosting. In: Zhang, F., Cai, Z., Skums, P., Zhang, S. (eds) Bioinformatics Research and Applications. ISBRA 2018. Lecture Notes in Computer Science, vol 10847. Springer, Cham. https://doi.org/10.1007/978-3-319-94968-0_14

  • DOI: https://doi.org/10.1007/978-3-319-94968-0_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-94967-3

  • Online ISBN: 978-3-319-94968-0

  • eBook Packages: Computer Science, Computer Science (R0)
