Reconstructing the Topology of Protein Complexes

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2007)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4453)

Abstract

Recent advances in high-throughput experimental techniques have enabled the production of a wealth of protein interaction data, rich in both quantity and variety. While this quantity and variety make the data difficult to model, they also offer an opportunity to gain insight into protein behavior by combining multiple perspectives. Recent work on the modularity of protein interactions has shown that it can be useful to reason about protein interactions at the level of domain interactions. We present proctor, a learning algorithm for reconstructing the internal topology of protein complexes by reasoning at the domain level about both direct protein interaction data (Y2H) and protein co-complex data (AP-MS). Other methods have attempted to use data from both kinds of assays, but they usually require that co-complex data first be transformed into pairwise interaction data under a spoke or clique model; proctor does not require this transformation. We apply proctor to data from eight high-throughput datasets, encompassing 5,925 proteins, essentially all of the yeast proteome. First, we show that proctor outperforms other algorithms for predicting domain-domain and protein-protein interactions from Y2H and AP-MS data. Then we show that our algorithm can reconstruct the internal topology of AP-MS purifications, revealing known complexes such as Arp2/3 and RNA polymerase II, and suggesting new complexes along with their corresponding topologies.
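To make the spoke and clique models mentioned in the abstract concrete, the sketch below shows how a single AP-MS purification is usually turned into pairwise interactions under each model. It illustrates the transformation that proctor is said to avoid; it is not part of proctor itself. The function names and the protein labels A to D are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def spoke_pairs(bait, preys):
    """Spoke model: pair the bait with each prey it pulled down."""
    return {frozenset((bait, prey)) for prey in preys if prey != bait}


def clique_pairs(bait, preys):
    """Clique (matrix) model: pair every two proteins in the purification."""
    members = sorted({bait, *preys})
    return {frozenset(pair) for pair in combinations(members, 2)}


# Hypothetical purification: bait A co-purifies with preys B, C and D.
bait, preys = "A", ["B", "C", "D"]
spoke = spoke_pairs(bait, preys)
clique = clique_pairs(bait, preys)

print(sorted(tuple(sorted(p)) for p in spoke))
print(sorted(tuple(sorted(p)) for p in clique))
```

For a purification with one bait and n preys, the spoke model asserts n pairs and the clique model asserts n(n+1)/2 pairs. Neither count need match the true number of physical contacts inside the complex, which is one reason to model the purification directly instead of converting it.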

Editor information

Terry Speed, Haiyan Huang

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Bernard, A., Vaughn, D.S., Hartemink, A.J. (2007). Reconstructing the Topology of Protein Complexes. In: Speed, T., Huang, H. (eds) Research in Computational Molecular Biology. RECOMB 2007. Lecture Notes in Computer Science, vol 4453. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71681-5_3

  • DOI: https://doi.org/10.1007/978-3-540-71681-5_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71680-8

  • Online ISBN: 978-3-540-71681-5

  • eBook Packages: Computer Science, Computer Science (R0)
