Amino Acid Features for Prediction of Protein-Protein Interface Residues with Support Vector Machines

  • Minh N. Nguyen
  • Jagath C. Rajapakse
  • Kai-Bo Duan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4447)


Knowledge of protein-protein interaction sites is vital for determining a protein's function and its involvement in different pathways. Support Vector Machines (SVMs) have been proposed in recent years to predict protein-protein interface residues, primarily from single amino acid sequence inputs. We investigate which amino acid features are best used with SVMs for predicting residues at protein-protein interfaces. The candidate features were amino acid composition, hydrophobicity of amino acids, secondary structure propensity of amino acids, accessible surface areas, and evolutionary information generated by PSI-BLAST profiles. Under a backward elimination procedure, the combination of amino acid composition, accessible surface areas, and evolutionary information generated by PSI-BLAST profiles gave the best performance. The present approach achieved an overall prediction accuracy of 74.2% on 77 individual proteins collected from the Protein Data Bank, which is better than previously reported accuracies.
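
The backward elimination over feature groups mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so the greedy rule and stopping criterion below are assumptions: start from all feature groups, repeatedly drop the one group whose removal raises the score the most, and stop when no removal helps. The names `backward_elimination`, `toy_score`, and `HYPOTHETICAL_POINTS` are hypothetical, and the scoring numbers are made up solely to exercise the loop; they are not results from the paper. In practice the scoring callable would be something like the cross-validated accuracy of an SVM trained on the selected groups.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

Subset = FrozenSet[str]


def backward_elimination(
    groups: Iterable[str],
    score: Callable[[Subset], float],
) -> Tuple[Subset, float, List[Tuple[Subset, float]]]:
    """Greedy backward elimination over named feature groups.

    Starts with every group and, at each step, removes the single group
    whose removal yields the highest score. Stops when no removal
    strictly improves on the current score (an assumed stopping rule).
    """
    current: Subset = frozenset(groups)
    best = score(current)
    history = [(current, best)]
    while len(current) > 1:
        # Score each subset obtained by dropping exactly one group.
        candidates = [(score(current - {g}), g) for g in sorted(current)]
        cand_score, dropped = max(candidates, key=lambda c: c[0])
        if cand_score <= best:
            break  # no single removal improves the score
        current = current - {dropped}
        best = cand_score
        history.append((current, best))
    return current, best, history


# Made-up integer "accuracy points" per group, for illustration only.
HYPOTHETICAL_POINTS = {
    "composition": 5,
    "hydrophobicity": -1,
    "ss_propensity": -2,
    "asa": 6,
    "pssm": 8,
}


def toy_score(subset: Subset) -> int:
    """Stand-in for cross-validated classifier accuracy (hypothetical)."""
    return 50 + sum(HYPOTHETICAL_POINTS[g] for g in subset)


selected, best, history = backward_elimination(HYPOTHETICAL_POINTS, toy_score)
for subset, value in history:
    print(sorted(subset), value)
```

Passing the scorer as a callable keeps the search loop independent of the classifier, so the same loop can be reused with any model or evaluation protocol.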





Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • Minh N. Nguyen (1)
  • Jagath C. Rajapakse (1, 2)
  • Kai-Bo Duan (1)
  1. BioInformatics Research Centre, School of Computer Engineering, Nanyang Technological University, Singapore
  2. Singapore-MIT Alliance, Singapore
