Abstract
Knowledge of protein-protein interaction sites is vital for determining the functions of proteins and their involvement in different pathways. Support Vector Machines (SVMs) have been proposed in recent years for predicting protein-protein interface residues, primarily from single amino acid sequence inputs. We investigate which amino acid features are best suited for use with an SVM in predicting residues at protein-protein interfaces. The candidate features were amino acid composition, hydrophobic character, secondary structure propensity, accessible surface area, and evolutionary information generated from PSI-BLAST profiles. A backward elimination procedure showed that the combination of amino acid composition, accessible surface area, and evolutionary information from PSI-BLAST profiles gave the best performance. The present approach achieved an overall prediction accuracy of 74.2% on 77 individual proteins collected from the Protein Data Bank, which is higher than previously reported accuracies.
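The backward elimination step mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the group names, the `backward_elimination` helper, and the `toy_evaluate` scorer are all hypothetical. In a real pipeline the scorer would presumably be the cross-validated accuracy of an SVM trained on the selected feature groups; here it returns contrived integer scores so the example runs with the standard library alone.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

Subset = FrozenSet[str]


def backward_elimination(
    groups: Iterable[str],
    evaluate: Callable[[Subset], int],
) -> Tuple[Subset, int, List[Tuple[Subset, int]]]:
    """Greedy backward elimination over named feature groups.

    Starts from all groups and repeatedly drops the single group whose
    removal gives the highest score, stopping once every removal would
    lower the score.
    """
    current: Subset = frozenset(groups)
    best_score = evaluate(current)
    history = [(current, best_score)]
    while len(current) > 1:
        # Score every subset obtained by dropping exactly one group.
        candidates = [(evaluate(current - {g}), g) for g in sorted(current)]
        score, dropped = max(candidates)
        if score < best_score:
            break  # every removal hurts, so keep the current subset
        current = current - {dropped}
        best_score = score
        history.append((current, best_score))
    return current, best_score, history


# Hypothetical feature groups, named after the candidates in the abstract.
GROUPS = ["composition", "hydrophobicity", "secondary_structure", "asa", "pssm"]

# Contrived contributions for illustration only; NOT results from the paper.
TOY_GAIN = {"composition": 10, "asa": 8, "pssm": 12}


def toy_evaluate(subset: Subset) -> int:
    """Stand-in scorer: informative groups add points, the others cost one."""
    gain = sum(TOY_GAIN.get(g, 0) for g in subset)
    penalty = sum(1 for g in subset if g not in TOY_GAIN)
    return 50 + gain - penalty


selected, score, history = backward_elimination(GROUPS, toy_evaluate)
print(sorted(selected), score)
```

Ties are resolved in favour of dropping a group, which biases the search toward smaller feature sets. That is a design choice of this sketch, not something stated in the abstract.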
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Nguyen, M.N., Rajapakse, J.C., Duan, KB. (2007). Amino Acid Features for Prediction of Protein-Protein Interface Residues with Support Vector Machines. In: Marchiori, E., Moore, J.H., Rajapakse, J.C. (eds) Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics. EvoBIO 2007. Lecture Notes in Computer Science, vol 4447. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71783-6_18
DOI: https://doi.org/10.1007/978-3-540-71783-6_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71782-9
Online ISBN: 978-3-540-71783-6
eBook Packages: Computer Science, Computer Science (R0)
