Structural Pattern Discovery in Protein–Protein Interaction Networks

Abstract

Most proteins in a cell do not act in isolation but carry out their functions through interactions with other proteins. Elucidating these interactions is therefore central to our understanding of cellular function and organization. Experimental techniques developed in recent years have made it possible to measure protein interactions on a genomic scale for several model organisms. These datasets have a natural representation as weighted graphs, known as protein–protein interaction (PPI) networks. This chapter presents some recent advances in computational methods for analyzing these networks and revealing their structural patterns. In particular, we focus on methods for uncovering modules that correspond to protein complexes, and on random graph models that can be used to de-noise large-scale PPI networks. Section 23.1 describes the state-of-the-art techniques and algorithms, defines measures for assessing the quality of the predicted complexes, and presents a benchmark of the detection algorithms on four PPI networks. Section 23.2 moves beyond protein complexes and explores other structural patterns of PPI networks using random graph models.
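
As a rough illustration of the weighted-graph representation mentioned in the abstract, the sketch below stores a PPI network as an undirected adjacency map in Python. The protein names, the confidence scores, and the helper names `build_ppi_graph` and `weighted_degree` are hypothetical and chosen only for this example; they are not taken from the chapter or from any of the datasets it discusses.

```python
from collections import defaultdict


def build_ppi_graph(interactions):
    """Build an undirected weighted graph as a dict of dicts.

    `interactions` is an iterable of (protein_a, protein_b, weight) tuples,
    where the weight is assumed to be a confidence score for the interaction.
    """
    graph = defaultdict(dict)
    for a, b, weight in interactions:
        # Store each edge in both directions because the graph is undirected.
        graph[a][b] = weight
        graph[b][a] = weight
    return dict(graph)


def weighted_degree(graph, protein):
    """Sum of the edge weights incident to `protein`."""
    return sum(graph[protein].values())


# Hypothetical proteins and confidence scores, for illustration only.
interactions = [("P1", "P2", 0.9), ("P2", "P3", 0.4), ("P1", "P3", 0.7)]
ppi = build_ppi_graph(interactions)
```

A dict-of-dicts layout keeps neighbor lookups cheap, which is the access pattern most clustering methods rely on; a sparse matrix would be a reasonable alternative for larger networks.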

Abbreviations

3-D: three-dimensional
AP-MS: co-affinity purification followed by mass spectrometry
AUC: area under curve
BP: biological process
BioGRID: biological general repository for interaction datasets
CC: cellular component
DAG: directed acyclic graph
GO: gene ontology
GRG: geometric random graph
HPRD: human protein reference database
HRG: hierarchical random graph
KEGG: Kyoto encyclopedia of genes and genomes
MAP: maximum a posteriori
MCL: Markov clustering
MCODE: molecular complex detection
MF: membership function
MIPS: Munich Information Center for Protein Sequences
ML: maximum likelihood
MMR: maximum matching ratio
PPI: protein–protein interaction
PPV: positive predictive value
RNA: ribonucleic acid
RNSC: restricted neighborhood search clustering
ROC: receiver operating characteristic
SCOP: structural classification of proteins
Y2H: yeast two-hybrid
log: logistic regression

Copyright information

© Springer-Verlag 2014

Authors and Affiliations

  1. Department of Biological Physics, Eötvös Loránd University, Budapest, Hungary
  2. Department of Computer Science, University of London, Egham, UK
