Abstract
Recent advances in high-throughput experimental techniques have enabled the production of a wealth of protein interaction data, rich in both quantity and variety. While the sheer quantity and variety of data present special difficulties for modeling, they also present unique opportunities for gaining insight into protein behavior by leveraging multiple perspectives. Recent work on the modularity of protein interactions has revealed that reasoning about protein interactions at the level of domain interactions can be quite useful. We present proctor, a learning algorithm for reconstructing the internal topology of protein complexes by reasoning at the domain level about both direct protein interaction data (Y2H) and protein co-complex data (AP-MS). While other methods have attempted to use data from both these kinds of assays, they usually require that co-complex data be transformed into pairwise interaction data under a spoke or clique model, a transformation we do not require. We apply proctor to data from eight high-throughput datasets, encompassing 5,925 proteins, essentially all of the yeast proteome. First we show that proctor outperforms other algorithms for predicting domain-domain and protein-protein interactions from Y2H and AP-MS data. Then we show that our algorithm can reconstruct the internal topology of AP-MS purifications, revealing known complexes like Arp2/3 and RNA polymerase II, as well as suggesting new complexes along with their corresponding topologies.
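To make concrete the transformation that other methods rely on, the following minimal sketch shows how a single AP-MS purification is usually turned into pairwise interactions under the spoke and clique models. It illustrates only the conversion the abstract says proctor does not require; it is not part of proctor itself. The function names and the protein labels ("B", "P1", ...) are hypothetical.

```python
from itertools import combinations


def spoke_pairs(bait, preys):
    """Spoke model: assume the bait interacts directly with every prey."""
    return {frozenset((bait, prey)) for prey in preys if prey != bait}


def clique_pairs(bait, preys):
    """Clique model: assume every pair of co-purified proteins interacts."""
    members = sorted({bait, *preys})
    return {frozenset(pair) for pair in combinations(members, 2)}


# Hypothetical purification: bait "B" pulls down three prey proteins.
bait, preys = "B", ["P1", "P2", "P3"]
spoke = spoke_pairs(bait, preys)
clique = clique_pairs(bait, preys)

print(len(spoke))   # 3 pairs: the bait with each prey
print(len(clique))  # 6 pairs: all pairs among the 4 proteins
```

Both models impose a fixed topology on the purification: the spoke model asserts only bait-prey contacts, while the clique model asserts all of them. Neither says which proteins actually touch inside the complex, which is the question the abstract sets out to address.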
Copyright information
© 2007 Springer Berlin Heidelberg
Cite this paper
Bernard, A., Vaughn, D.S., Hartemink, A.J. (2007). Reconstructing the Topology of Protein Complexes. In: Speed, T., Huang, H. (eds) Research in Computational Molecular Biology. RECOMB 2007. Lecture Notes in Computer Science, vol 4453. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71681-5_3
DOI: https://doi.org/10.1007/978-3-540-71681-5_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71680-8
Online ISBN: 978-3-540-71681-5
eBook Packages: Computer Science, Computer Science (R0)