Computational Reconstruction of Protein–Protein Interaction Networks: Algorithms and Issues

  • Eric Franzosa
  • Bolan Linghu
  • Yu Xia
Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 541)

Abstract

Accurate mapping of protein–protein interaction networks in model organisms is a crucial first step toward subsequent quantitative study of the organization and evolution of biological systems. Data quality of experimental interactome maps can be assessed and improved by integrating multiple sources of evidence using machine learning methods. Here we describe the commonly used algorithms for predicting protein–protein interaction by genome data integration, and discuss several important yet often overlooked issues in computational reconstruction of protein–protein interaction networks.
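
As a rough illustration of what integrating multiple sources of evidence can look like, the sketch below combines discrete evidence features under a Naïve Bayes assumption: each feature contributes a likelihood ratio estimated from gold-standard interacting and non-interacting pairs, and the ratios are multiplied into the prior odds. The function names, the pseudocount smoothing, and the toy feature values are assumptions made for this example. They are not taken from the chapter, and this is not presented as the authors' implementation.

```python
from collections import Counter
from math import prod


def likelihood_ratios(pos_values, neg_values, pseudocount=1.0):
    """Estimate P(value | interacting) / P(value | non-interacting)
    for one discrete feature from gold-standard pairs.

    The pseudocount (an assumption of this sketch) keeps ratios finite
    when a value is unseen in one of the two gold-standard sets.
    """
    values = set(pos_values) | set(neg_values)
    pos_counts, neg_counts = Counter(pos_values), Counter(neg_values)
    k = len(values)
    ratios = {}
    for v in values:
        p_pos = (pos_counts[v] + pseudocount) / (len(pos_values) + pseudocount * k)
        p_neg = (neg_counts[v] + pseudocount) / (len(neg_values) + pseudocount * k)
        ratios[v] = p_pos / p_neg
    return ratios


def posterior_odds(prior_odds, feature_values, lr_tables):
    """Naive Bayes combination: features are treated as conditionally
    independent, so their likelihood ratios multiply the prior odds."""
    return prior_odds * prod(lr[v] for lr, v in zip(lr_tables, feature_values))


# Toy gold-standard data, made up purely for illustration.
coexpression_lr = likelihood_ratios(
    ["high", "high", "high", "low"],   # feature values for interacting pairs
    ["high", "low", "low", "low"],     # feature values for non-interacting pairs
)
shared_function_lr = likelihood_ratios(
    ["yes", "yes", "yes", "no"],
    ["no", "no", "no", "yes"],
)

# Score one candidate pair with a hypothetical prior odds of 0.01.
odds = posterior_odds(0.01, ["high", "yes"], [coexpression_lr, shared_function_lr])
```

A pair would be predicted as interacting when its posterior odds exceed a chosen cutoff. The independence assumption is what lets the features be scored separately; correlated evidence sources would be over-counted by this scheme.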

Key words

Protein–protein interaction; machine learning; protein network; data integration; Naïve Bayes; logistic regression

Notes

Acknowledgments

Y.X. thanks Mark Gerstein for advice and support.


Copyright information

© Humana Press, a part of Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Eric Franzosa (1)
  • Bolan Linghu (1)
  • Yu Xia (2)

  1. Bioinformatics Program, Boston University, Boston, USA
  2. Bioinformatics Unit, Branch of Research Resources, National Institute on Aging, National Institutes of Health, Boston, USA