Amino Acid Features for Prediction of Protein-Protein Interface Residues with Support Vector Machines

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNTCS, volume 4447)

Abstract

Knowledge of protein-protein interaction sites is vital for determining proteins' functions and their involvement in different pathways. In recent years, Support Vector Machines (SVMs) have been proposed to predict protein-protein interface residues, primarily from single amino acid sequence inputs. We investigate which amino acid features are best suited for use with SVMs to predict residues at protein-protein interfaces. The optimal feature set was derived by examining features such as amino acid composition, hydrophobic character of amino acids, secondary structure propensity of amino acids, accessible surface areas, and evolutionary information generated from PSI-BLAST profiles. A backward elimination procedure showed that amino acid composition, accessible surface areas, and evolutionary information from PSI-BLAST profiles together gave the best performance. The present approach achieved an overall prediction accuracy of 74.2% on 77 individual proteins collected from the Protein Data Bank, which is better than previously reported accuracies.
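The abstract names a backward elimination procedure over feature groups but does not spell out its steps. The Python sketch below assumes a common greedy variant: start from all feature groups, drop the one group whose removal scores best, and stop once every possible removal would lower the score. The `evaluate` callback stands in for training and cross-validating an SVM on the chosen groups. The group names, `TOY_GAIN` values, and `toy_evaluate` are hypothetical, invented purely for illustration, and are not results from the paper.

```python
from typing import Callable, FrozenSet, Iterable, Tuple

# A scorer maps a set of feature-group names to a quality score
# (in practice, e.g., cross-validated SVM accuracy; here an assumption).
Scorer = Callable[[FrozenSet[str]], float]


def backward_elimination(groups: Iterable[str],
                         evaluate: Scorer) -> Tuple[FrozenSet[str], float]:
    """Greedy backward elimination over named feature groups.

    Repeatedly removes the single group whose removal yields the highest
    score, as long as that score is not lower than the current one.
    Ties favour the smaller set. Stops when every removal hurts or only
    one group remains.
    """
    current = frozenset(groups)
    best_score = evaluate(current)
    while len(current) > 1:
        # Score every subset obtained by dropping exactly one group.
        candidates = [(evaluate(current - {g}), g) for g in sorted(current)]
        cand_score, drop = max(candidates, key=lambda t: t[0])
        if cand_score < best_score:
            break  # every removal lowers the score; keep the current set
        current = current - {drop}
        best_score = cand_score
    return current, best_score


# Hypothetical per-group contributions, invented for illustration only;
# they are NOT measurements from the paper.
TOY_GAIN = {
    "composition": 0.03,
    "hydrophobicity": -0.01,
    "ss_propensity": -0.005,
    "asa": 0.04,
    "pssm": 0.08,
}


def toy_evaluate(subset: FrozenSet[str]) -> float:
    # Stand-in for SVM accuracy: a baseline plus additive group effects.
    return 0.60 + sum(TOY_GAIN[g] for g in sorted(subset))


if __name__ == "__main__":
    selected, score = backward_elimination(TOY_GAIN, toy_evaluate)
    print(sorted(selected), round(score, 3))
```

With these made-up contributions the procedure drops the hydrophobicity and secondary-structure groups and keeps composition, ASA, and PSSM features. Accepting removals that leave the score unchanged is a design choice that prefers smaller feature sets; a stricter variant would require a strict improvement.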

Editor information

Elena Marchiori, Jason H. Moore, Jagath C. Rajapakse

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Nguyen, M.N., Rajapakse, J.C., Duan, KB. (2007). Amino Acid Features for Prediction of Protein-Protein Interface Residues with Support Vector Machines. In: Marchiori, E., Moore, J.H., Rajapakse, J.C. (eds) Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics. EvoBIO 2007. Lecture Notes in Computer Science, vol 4447. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71783-6_18

  • DOI: https://doi.org/10.1007/978-3-540-71783-6_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71782-9

  • Online ISBN: 978-3-540-71783-6

  • eBook Packages: Computer Science, Computer Science (R0)
