Using a Stochastic AdaBoost Algorithm to Discover Interactome Motif Pairs from Sequences

  • Huan Yu
  • Minping Qian
  • Minghua Deng
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4115)


The protein interactome is an important research focus in the post-genomic era. Identifying interacting motif pairs is essential for exploring the mechanism of protein interactions. We describe a stochastic AdaBoost approach for discovering motif pairs from known interactions and from pairs of proteins that are presumed not to interact. Our interacting motif pairs are validated against multiple-chain PDB structures and are more significant than those selected by a traditional statistical method. Furthermore, in a cross-validated comparison, our model predicts interactions between proteins with higher sensitivity (66.42%) and specificity (87.38%) than the Naive Bayes model and the dominating model.
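As a rough illustration of the kind of procedure the abstract names, the following is a minimal sketch of AdaBoost with decision stumps over binary motif-pair features, followed by a sensitivity and specificity calculation. It is not the authors' implementation. The abstract does not say how randomness enters the algorithm, so the sketch assumes that each boosting round evaluates only a random subset of candidate features. The function names, the parameters `rounds`, `sample_size`, and `seed`, and the toy data are all hypothetical.

```python
import math
import random


def stump_output(x, j, polarity):
    """Weak learner: vote `polarity` if motif-pair feature j is present, else the opposite."""
    return polarity if x[j] else -polarity


def train_stochastic_adaboost(X, y, rounds=10, sample_size=2, seed=0):
    """X: binary feature vectors (1 = motif pair occurs in the protein pair).
    y: labels, +1 (interacting) or -1 (presumed non-interacting).
    Returns a list of (feature_index, polarity, alpha) weak learners."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n  # uniform example weights to start
    model = []
    for _ in range(rounds):
        # Assumption: the stochastic step is a random subset of candidate features.
        candidates = rng.sample(range(d), min(sample_size, d))
        best = None
        for j in candidates:
            for polarity in (1, -1):
                # Weighted error of this stump on the training set.
                err = sum(wi for wi, xi, yi in zip(w, X, y)
                          if stump_output(xi, j, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, j, polarity)
        err, j, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((j, polarity, alpha))
        # Standard AdaBoost reweighting: misclassified examples gain weight.
        w = [wi * math.exp(-alpha * yi * stump_output(xi, j, polarity))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_output(x, j, polarity) for j, polarity, alpha in model)
    return 1 if score >= 0 else -1


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    Assumes both classes occur in y_true."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (made up): feature 0 separates the two classes perfectly.
X = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 0]]
y = [1, 1, -1, -1]
model = train_stochastic_adaboost(X, y, rounds=5, sample_size=3, seed=1)
preds = [predict(model, x) for x in X]
sens, spec = sensitivity_specificity(y, preds)
```

In a cross-validated comparison such as the one the abstract reports, `sensitivity_specificity` would be computed on held-out protein pairs rather than on the training set as in this toy run.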


Keywords: Motif Pair; Traditional Statistical Method; RCSB Protein Data; Feature Vector Extraction; Protein Interactome





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Huan Yu (1)
  • Minping Qian (1)
  • Minghua Deng (1)
  1. LMAM, School of Mathematical Sciences and Center for Theoretical Biology, Peking University, Beijing, P.R. China
