Using Product Kernels to Predict Protein Interactions

  • Shawn Martin
  • W. Michael Brown
  • Jean-Loup Faulon
Chapter
Part of the Advances in Biochemical Engineering/Biotechnology book series (ABE, volume 110)

Abstract

There is a wide variety of experimental methods for the identification of protein interactions. This variety has in turn spurred the development of numerous computational approaches for modeling and predicting protein interactions. These methods range from detailed structure-based methods capable of operating on only a single pair of proteins at a time to approximate statistical methods capable of making predictions on multiple proteomes simultaneously. In this chapter, we provide a brief discussion of the relative merits of different experimental and computational methods available for identifying protein interactions. Then we focus on the application of our particular (computational) method using Support Vector Machine product kernels. We describe our method in detail and discuss the application of the method for predicting protein–protein interactions, β-strand interactions, and protein–chemical interactions.

β-strand interactions, Product kernels, Protein–chemical interactions, Protein–protein interactions, Support Vector Machines

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Shawn Martin (1)
  • W. Michael Brown (1)
  • Jean-Loup Faulon (2)
  1. Computational Biology, Sandia National Laboratories, Albuquerque, USA
  2. Computational Bioscience, Sandia National Laboratories, Albuquerque, USA
