Amino Acids, Volume 35, Issue 2, pp 295–302

PRINTR: Prediction of RNA binding sites in proteins using SVM and profiles

  • Y. Wang
  • Z. Xue
  • G. Shen
  • J. Xu


Protein–RNA interactions play a key role in a number of biological processes, such as protein synthesis, mRNA processing, and the assembly and function of ribosomes and eukaryotic spliceosomes. Reliable identification of RNA-binding sites in RNA-binding proteins is important for functional annotation and site-directed mutagenesis. We developed a novel method for predicting protein residues that interact with RNA using a support vector machine (SVM) and position-specific scoring matrices (PSSMs). Two cases were considered in predicting protein residues at RNA-binding surfaces: in one, the sequence information of a protein chain known to interact with RNA is given; in the other, the structural information is given. In total, five different inputs were tested. Using PSI-BLAST profiles and predicted secondary structure as input, the approach yields a Matthews correlation coefficient (MCC) of 0.432 in 7-fold cross-validation, the best among all previously reported RNA-binding site prediction methods. When structural information is given, we obtained an MCC of 0.457 with PSSMs, observed secondary structure, and solvent accessibility assigned by DSSP as input. A web server implementing the prediction method is available at the following URL:

Keywords: Protein–RNA interactions – RNA-binding sites – Support vector machine – Multiple sequence alignment 
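
As a rough illustration of two ingredients named in the abstract, the sketch below shows one common way to turn a PSSM into per-residue feature vectors (a sliding window centred on the residue) and how the Matthews correlation coefficient is computed from binary predictions. It is not the authors' implementation: the window half-width, the zero padding at chain ends, and the function names `window_features` and `matthews_cc` are all assumptions made for this example, and the SVM training step itself is omitted.

```python
import math


def window_features(pssm, center, half_width=3):
    """Flatten the PSSM rows in a window around `center` into one vector.

    `pssm` is a list of rows (one row of scores per residue). The window
    half-width and the zero padding past the chain ends are assumptions
    for this sketch; the abstract does not state them.
    """
    n_cols = len(pssm[0])
    features = []
    for i in range(center - half_width, center + half_width + 1):
        if 0 <= i < len(pssm):
            features.extend(pssm[i])
        else:
            features.extend([0] * n_cols)  # pad positions outside the chain
    return features


def matthews_cc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = binding)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Conventionally return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy data, invented for illustration only.
toy_pssm = [[1, 2], [3, 4], [5, 6]]
first_residue = window_features(toy_pssm, center=0, half_width=1)

toy_true = [1, 1, 0, 0, 1, 0, 0, 0]
toy_pred = [1, 0, 0, 0, 1, 1, 0, 0]
toy_mcc = matthews_cc(toy_true, toy_pred)
```

MCC is a natural choice for this task because binding residues are usually a minority class: unlike raw accuracy, it stays near zero for a predictor that labels every residue as non-binding.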





Copyright information

© Springer-Verlag 2008

Authors and Affiliations

  1. Institute of Biophysics and Biochemistry, School of Life Science, Huazhong University of Science and Technology, Wuhan City, China
  2. Software College, Huazhong University of Science and Technology, Wuhan City, China
  3. Department of Control Science and Engineering, Huazhong University of Science and Technology, Wuhan City, China
