PRINTR: Prediction of RNA binding sites in proteins using SVM and profiles
Protein–RNA interactions play a key role in a number of biological processes, such as protein synthesis, mRNA processing, and the assembly and function of ribosomes and eukaryotic spliceosomes. Reliable identification of RNA-binding sites in RNA-binding proteins is important for functional annotation and site-directed mutagenesis. We developed a novel method for predicting the protein residues that interact with RNA, using a support vector machine (SVM) and position-specific scoring matrices (PSSMs). Two cases were considered in predicting residues at RNA-binding surfaces: in the first, only the sequence of a protein chain known to interact with RNA is given; in the second, its structural information is also given. In total, five different inputs were tested across the two cases. Using PSI-BLAST profiles and predicted secondary structure, the approach yields a Matthews correlation coefficient (MCC) of 0.432 in 7-fold cross-validation, the best among all previously reported RNA-binding site prediction methods. When structural information is given, with PSSMs, observed secondary structure, and solvent accessibility assigned by DSSP as input, the MCC rises to 0.457. A web server implementing the prediction method is available at http://188.8.131.52/printr/.
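Performance above is reported as an MCC over 7-fold cross-validation, with PSSM rows as part of the SVM input. The sketch below shows the standard MCC formula and one common way of turning per-residue PSSM rows into a fixed-length feature vector for a residue-level classifier. The sliding-window encoding, the zero-padding at chain ends, the default half-width, and the names `mcc` and `window_features` are illustrative assumptions; the abstract does not describe PRINTR's exact encoding or window size.

```python
import math
from typing import List, Sequence


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion counts.

    Returns 0.0 when any marginal sum is zero (the denominator would vanish),
    which is a common convention rather than part of the definition.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def window_features(
    pssm: Sequence[Sequence[float]], center: int, half_width: int = 5
) -> List[float]:
    """Concatenate PSSM rows in a window centred on residue `center`.

    Hypothetical encoding (assumption, not taken from the paper): window
    positions that fall outside the chain are padded with zero rows, so every
    residue yields (2 * half_width + 1) * n_columns features.
    """
    n_cols = len(pssm[0])
    feats: List[float] = []
    for i in range(center - half_width, center + half_width + 1):
        if 0 <= i < len(pssm):
            feats.extend(pssm[i])
        else:
            feats.extend([0.0] * n_cols)  # pad beyond the chain termini
    return feats
```

For example, `mcc(tp=8, tn=80, fp=2, fn=10)` scores a predictor that finds most non-binding residues but misses many binding ones. Pooling confusion counts across all 7 folds before calling `mcc` and averaging per-fold MCC values can give slightly different numbers; the abstract does not state which was used.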