Structural Pattern Discovery in Protein–Protein Interaction Networks

  • Tamás Nepusz
  • Alberto Paccanaro
Part of the Springer Handbooks book series (SHB)


Most proteins in a cell do not act in isolation but carry out their function through interactions with other proteins. Elucidating these interactions is therefore central to our understanding of cellular function and organization. Experimental techniques developed in recent years have made it possible to measure protein interactions on a genomic scale for several model organisms. The resulting datasets have a natural representation as weighted graphs, also known as protein–protein interaction (PPI) networks. This chapter presents some recent advances in computational methods for analyzing these networks, with the aim of revealing their structural patterns. In particular, we focus on methods for uncovering modules that correspond to protein complexes, and on random graph models that can be used to de-noise large-scale PPI networks. Section 23.1 describes the state-of-the-art techniques and algorithms for complex detection, defines measures for assessing the quality of the predicted complexes, and presents a benchmark of the detection algorithms on four PPI networks. Section 23.2 moves beyond protein complexes and explores other structural patterns of PPI networks using random graph models.
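The weighted-graph representation mentioned above can be illustrated with a minimal sketch. The protein identifiers, the confidence weights, and the helper names `build_ppi_network` and `weighted_degree` are all hypothetical and chosen only for illustration; they are not taken from the chapter or from any dataset.

```python
from collections import defaultdict

# Hypothetical interactions: (protein A, protein B, confidence weight in [0, 1]).
# Identifiers and weights are illustrative assumptions, not real measurements.
interactions = [
    ("P1", "P2", 0.9),
    ("P2", "P3", 0.8),
    ("P1", "P3", 0.7),
    ("P3", "P4", 0.2),
]


def build_ppi_network(edges):
    """Build an undirected weighted graph as a dict of dicts."""
    graph = defaultdict(dict)
    for a, b, weight in edges:
        # Store each interaction in both directions because it is undirected.
        graph[a][b] = weight
        graph[b][a] = weight
    return graph


def weighted_degree(graph, protein):
    """Sum of the weights of all interactions involving one protein."""
    return sum(graph[protein].values())


network = build_ppi_network(interactions)
```

In this toy network, P1, P2 and P3 are joined by high-weight edges, which is the kind of densely connected module that complex detection methods look for, while the low-weight edge to P4 could plausibly be noise.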


Keywords: Gene Ontology; Random Graph; Random Graph Model; Reference Complex; Geometric Random Graph





co-affinity purification followed by mass spectrometry
area under curve
biological process
Biological General Repository for Interaction Datasets
cellular component
directed acyclic graph
Gene Ontology
geometric random graph
Human Protein Reference Database
hierarchical random graph
Kyoto Encyclopedia of Genes and Genomes
maximum a posteriori
Markov clustering
molecular complex detection
membership function
Munich Information Center for Protein Sequences
maximum likelihood
maximum matching ratio
protein–protein interaction
positive predictive value
ribonucleic acid
restricted neighborhood search clustering
receiver operating characteristic
Structural Classification of Proteins
yeast two-hybrid
logistic regression



Copyright information

© Springer-Verlag 2014

Authors and Affiliations

  1. Department of Biological Physics, Eötvös Loránd University, Budapest, Hungary
  2. Department of Computer Science, University of London, Egham, UK
