Abstract
A variety of studies have shown that protein-RNA interactions play a vital role in many fundamental cellular processes, such as protein synthesis, mRNA processing, mRNA assembly, ribosome function and eukaryotic spliceosomes. Identification of RNA-binding residues (RBR) in proteins is a key step in understanding the mutual recognition mechanism underlying protein-RNA interactions. In this paper, we propose a novel method, XPredRBR, to predict the RNA-binding residues in proteins by exploiting the eXtreme Gradient Boosting (XGBoost) algorithm. Two new types of predictive features, derived from the residue interaction network and from solvent exposure, are combined with conventional sequence features and structural neighborhood features to predict RBR. We carried out empirical experiments on two datasets to demonstrate the performance of the proposed method. In 10-fold cross-validation, our method achieved an accuracy of 0.861, a sensitivity of 0.872, an MCC of 0.584 and an AUC of 0.941 on the RBP170 dataset. On an independent test set, RBP101, XPredRBR outperformed three traditional classifiers and seven existing methods for predicting RNA-binding residues. A case study on chain E of the 3PLA protein illustrated that XPredRBR effectively identified most RNA-binding and non-RNA-binding sites. Furthermore, XPredRBR is much faster than our previous method, PredRBR. These experimental results show that the proposed method achieves state-of-the-art performance in predicting RNA-binding residues in proteins.
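For readers unfamiliar with the evaluation measures quoted above, the following minimal sketch shows how accuracy, sensitivity, the Matthews correlation coefficient (MCC) and the area under the ROC curve (AUC) are commonly computed for a binary, per-residue classifier. It is an illustration under stated assumptions, not the evaluation code used for XPredRBR: the function names (`binary_metrics`, `auc`) and the toy labels and scores are hypothetical, with label 1 taken to mean an RNA-binding residue and label 0 a non-binding residue.

```python
import math


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity and MCC from binary labels (1 = binding, 0 = non-binding)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity, "mcc": mcc}


def auc(y_true, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 3 binding and 5 non-binding residues.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.3, 0.1, 0.2, 0.4, 0.05, 0.7]

metrics = binary_metrics(y_true, y_pred)
area = auc(y_true, scores)
print(metrics, area)
```

MCC is often reported alongside accuracy for this kind of task because binding residues are usually a minority class, and accuracy alone can look high for a predictor that rarely predicts the positive class.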
H. Liu: This work was supported by the National Natural Science Foundation of China under grants No. 61672541 and No. 61672113, and by the Natural Science Foundation of Hunan Province under grant No. 2017JJ3287.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Deng, L., Dong, Z., Liu, H. (2018). XPredRBR: Accurate and Fast Prediction of RNA-Binding Residues in Proteins Using eXtreme Gradient Boosting. In: Zhang, F., Cai, Z., Skums, P., Zhang, S. (eds) Bioinformatics Research and Applications. ISBRA 2018. Lecture Notes in Computer Science, vol 10847. Springer, Cham. https://doi.org/10.1007/978-3-319-94968-0_14
DOI: https://doi.org/10.1007/978-3-319-94968-0_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-94967-3
Online ISBN: 978-3-319-94968-0
eBook Packages: Computer Science (R0)