Identification of interface residues in protease-inhibitor and antigen-antibody complexes: a support vector machine approach

  • Original Article
Neural Computing & Applications

Abstract

In this paper, we describe a machine learning approach for sequence-based prediction of protein-protein interaction sites. A support vector machine (SVM) classifier was trained to predict whether or not a surface residue is an interface residue (i.e., is located in the protein-protein interaction surface), based on the identity of the target residue and its ten sequence neighbors. Separate classifiers were trained on proteins from two categories of complexes, antigen-antibody and protease-inhibitor. The effectiveness of each classifier was evaluated using leave-one-out (jack-knife) cross-validation. Interface and non-interface residues were classified with relatively high sensitivity (82.3% and 78.5%) and specificity (81.0% and 77.6%) for proteins in the antigen-antibody and protease-inhibitor complexes, respectively. The correlation between predicted and actual labels was 0.430 and 0.462, indicating that the method performs substantially better than chance (zero correlation). Combined with recently developed methods for identifying surface residues from sequence information, this method offers a promising way to predict the residues involved in protein-protein interactions from sequence information alone.
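As an illustration of the input representation and the correlation measure mentioned in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes that the ten sequence neighbors are the five residues on each side of the target, that each residue is one-hot encoded over the 20 standard amino acids, and that the reported correlation is the Matthews correlation coefficient computed from confusion counts. The function names (encode_window, confusion_counts, sensitivity, correlation_coefficient) are hypothetical. Specificity is left out because its definition differs between papers and the abstract does not state which one was used.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
HALF_WINDOW = 5  # assumption: five neighbors on each side of the target


def encode_window(sequence, position, half_window=HALF_WINDOW):
    """One-hot encode the target residue and its sequence neighbors.

    Positions that fall off either end of the sequence, and residues
    outside the 20 standard amino acids, are encoded as all zeros
    (an illustrative choice, not taken from the paper).
    """
    features = []
    for i in range(position - half_window, position + half_window + 1):
        bits = [0] * len(AMINO_ACIDS)
        if 0 <= i < len(sequence):
            index = AMINO_ACIDS.find(sequence[i])
            if index >= 0:
                bits[index] = 1
        features.extend(bits)
    return features


def confusion_counts(actual, predicted):
    """Return (TP, TN, FP, FN) for binary labels, 1 = interface residue."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    """Fraction of true interface residues that were predicted as such."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def correlation_coefficient(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0 corresponds to chance level."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

With these assumptions, each surface residue becomes a 220-dimensional binary vector (11 positions times 20 amino acids) that could be passed to any SVM implementation. Under leave-one-out cross-validation, the predictions collected for the held-out cases would be pooled and scored with confusion_counts and correlation_coefficient.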

Acknowledgements

This research was supported in part by grants from the National Science Foundation (0219699), the National Institutes of Health (GM066387), and the Iowa State University Plant Science Institute.

Author information

Corresponding author

Correspondence to Changhui Yan.

About this article

Cite this article

Yan, C., Honavar, V. & Dobbs, D. Identification of interface residues in protease-inhibitor and antigen-antibody complexes: a support vector machine approach. Neural Comput & Applic 13, 123–129 (2004). https://doi.org/10.1007/s00521-004-0414-3

  • DOI: https://doi.org/10.1007/s00521-004-0414-3
