Computational Prediction of Protein-Protein Interactions

  • John C. Obenauer
  • Michael B. Yaffe
Part of the Methods in Molecular Biology book series (MIMB, volume 261)


Eukaryotic proteins typically contain one or more modular domains such as kinases, phosphatases, and phosphopeptide-binding domains, as well as characteristic sequence motifs that direct post-translational modifications such as phosphorylation, or mediate binding to specific modular domains. A computational approach to predict protein interactions on a proteome-wide basis would therefore consist of identifying modular domains and sequence motifs from protein primary sequence data, creating sequence specificity-based algorithms to connect a domain in one protein with a motif in another in “interaction space,” and then graphically constructing possible interaction networks. Computational methods for predicting modular domains in proteins have been quite successful, but identifying the short sequence motifs these domains recognize has been more difficult. We are developing improved methods to identify these motifs by combining experimental and computational techniques with databases of sequences and binding information. Scansite is a web-accessible program that predicts interactions between proteins using experimental binding data from peptide library and phage display experiments. This program focuses on domains important in cell signaling, but it can, in principle, be used for other interactions if the domains and binding motifs are known. This chapter describes in detail how to use Scansite to predict the binding partners of an input protein, and how to find all proteins that contain a given sequence motif.
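As a rough illustration of what a sequence specificity-based scan can look like, the sketch below slides a small position-specific weight matrix along a protein sequence and reports windows whose score reaches a cutoff. It is not Scansite's actual algorithm or scoring scheme: the matrix values, the requirement for a central Ser/Thr, the threshold, and the names `TOY_MATRIX` and `scan_motif` are all assumptions made up for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical weight matrix, one dict per motif position (residue -> weight).
# The values are invented for illustration and are not experimental data.
TOY_MATRIX: List[Dict[str, float]] = [
    {"R": 2.0, "K": 1.0},  # position -2 relative to the central residue
    {"R": 1.0, "K": 1.0},  # position -1
    {"S": 1.0, "T": 1.0},  # position  0 (assumed phospho-acceptor)
    {"P": 2.0},            # position +1
    {},                    # position +2 (no preference)
]


def scan_motif(
    sequence: str,
    matrix: List[Dict[str, float]],
    center: int,
    required: str,
    threshold: float,
) -> List[Tuple[int, str, float]]:
    """Return (1-based position of central residue, window, score) for each hit."""
    hits = []
    width = len(matrix)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Skip windows whose central residue is not one of the required residues.
        if window[center] not in required:
            continue
        # Sum the per-position weights; residues absent from a column score 0.
        score = sum(column.get(residue, 0.0) for column, residue in zip(matrix, window))
        if score >= threshold:
            hits.append((start + center + 1, window, score))
    return hits


if __name__ == "__main__":
    # Made-up sequence used only to exercise the function.
    for position, window, score in scan_motif("MARKSPLAARRTPGG", TOY_MATRIX, 2, "ST", 5.0):
        print(position, window, score)
```

Running the same function over every sequence in a protein database, rather than a single input protein, would correspond loosely to the second task mentioned above: finding all proteins that contain a given motif.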

Key Words

Protein-protein interactions; phosphorylation sites; peptide library; phage display; sequence motifs; bioinformatics



Copyright information

© Humana Press Inc., Totowa, NJ 2004

Authors and Affiliations

  • John C. Obenauer (1)
  • Michael B. Yaffe (1)
  1. Center for Cancer Research, Massachusetts Institute of Technology, Cambridge
