Abstract
The ever-accumulating wealth of knowledge about protein interactions and about the domain architecture of the proteins involved in different organisms offers ways to understand the intricate interplay between interactome and proteome. Ultimately, combining these sources of information will allow the prediction of interactions among proteins for which only the domain composition is known. Based on the currently available protein–protein interaction and domain data of Saccharomyces cerevisiae and Drosophila melanogaster, we introduce a novel method, Maximum Specificity Set Cover (MSSC), to predict potential protein–protein interactions. Using known interactions and the domain architectures of the interacting proteins as training sets, the algorithm employs a set cover approach to select the domain pairs that explain the underlying protein interactions with the highest degree of specificity. While MSSC in its basic version considers only domain pairs as the driving force behind interactions, we also modified the algorithm to account for combinations of more than two domains that govern a protein–protein interaction. This approach allows us to predict previously unknown protein–protein interactions in S. cerevisiae and D. melanogaster with a sensitivity and specificity that clearly outperform other approaches. As a proof of concept, we also observe high levels of co-expression and decreasing Gene Ontology (GO) distances between interacting proteins. Although our results are very encouraging, we observe that the quality of the predictions depends significantly on the quality of the interactions used as the training set of the algorithm. The algorithm is part of a Web portal available at http://ppi.cse.nd.edu.
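The basic, pairwise form of the set cover idea can be illustrated with a short Python sketch. It is not the authors' implementation: the specificity score (the fraction of protein pairs carrying a domain pair that are known to interact), the greedy "most specific first" selection rule, the tie-breaking, and all function names, domains and toy proteins below are hypothetical choices made for illustration. The multi-domain extension is not shown.

```python
from itertools import combinations


def domain_pairs(domains_p, domains_q):
    """Unordered domain pairs formed by one domain of p and one domain of q."""
    return {tuple(sorted((dp, dq))) for dp in domains_p for dq in domains_q}


def mssc_select(protein_domains, interactions):
    """Hypothetical greedy sketch: pick the most specific domain pairs until
    every training interaction is explained. Interactions are assumed to be
    between distinct proteins that all appear in protein_domains."""
    # How many protein pairs in the training proteome carry each domain pair.
    total = {}
    for p, q in combinations(sorted(protein_domains), 2):
        for pair in domain_pairs(protein_domains[p], protein_domains[q]):
            total[pair] = total.get(pair, 0) + 1

    # Which known interactions each domain pair could explain.
    covers = {}
    for p, q in interactions:
        for pair in domain_pairs(protein_domains[p], protein_domains[q]):
            covers.setdefault(pair, set()).add(frozenset((p, q)))

    # Assumed specificity: interacting pairs / all pairs carrying the domain pair.
    specificity = {pair: len(hits) / total[pair] for pair, hits in covers.items()}

    uncovered = {frozenset(i) for i in interactions}
    chosen = []
    while uncovered:
        candidates = [pair for pair in covers if covers[pair] & uncovered]
        if not candidates:
            break  # remaining interactions involve proteins without domains
        # Highest specificity first; ties broken by new coverage, then by name.
        best = max(candidates,
                   key=lambda pair: (specificity[pair], len(covers[pair] & uncovered), pair))
        chosen.append(best)
        uncovered -= covers[best]
    return chosen


def predict(protein_domains, chosen, p, q):
    """Call p-q an interaction if the pair carries any selected domain pair."""
    return bool(domain_pairs(protein_domains[p], protein_domains[q]) & set(chosen))


# Hypothetical toy data: five proteins and three known interactions.
protein_domains = {
    "A": {"SH3"},
    "B": {"PRM"},
    "C": {"SH3", "PH"},
    "D": {"PRM", "PH"},
    "E": {"PH"},
}
interactions = [("A", "B"), ("C", "D"), ("D", "E")]

chosen = mssc_select(protein_domains, interactions)
print(chosen)                                      # [('PH', 'PH'), ('PRM', 'SH3')]
print(predict(protein_domains, chosen, "C", "E"))  # True
print(predict(protein_domains, chosen, "A", "E"))  # False
```

In this toy run, PH-PH is selected ahead of PH-PRM even though both explain the same two training interactions, because PH-PH occurs in fewer protein pairs overall (specificity 2/3 versus 2/5). The selected domain pairs then flag C-E and A-D as candidate interactions.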
Acknowledgments
Danny Chen was supported in part by the NSF under Grant CCF-0515203. Jesús Izaguirre was supported by partial funding from NSF grants IOB-0313730, CCR-0135195, and DBI-0450067. Stefan Wuchty was supported by the Northwestern Institute of Complexity (NICO).
Copyright information
© 2009 Humana Press, a part of Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Kanaan, S.P., Huang, C., Wuchty, S., Chen, D.Z., Izaguirre, J.A. (2009). Inferring Protein–Protein Interactions from Multiple Protein Domain Combinations. In: Ireton, R., Montgomery, K., Bumgarner, R., Samudrala, R., McDermott, J. (eds) Computational Systems Biology. Methods in Molecular Biology, vol 541. Humana Press. https://doi.org/10.1007/978-1-59745-243-4_3
DOI: https://doi.org/10.1007/978-1-59745-243-4_3
Publisher Name: Humana Press
Print ISBN: 978-1-58829-905-5
Online ISBN: 978-1-59745-243-4
eBook Packages: Springer Protocols