Analysis Strategy of Protein–Protein Interaction Networks

Part of the book series: Methods in Molecular Biology (MIMB, volume 939)

Abstract

Protein interactions, and the networks they form, play a key role in many cellular processes, and disruption of protein interaction interfaces may lead to the development of many diseases. In this chapter, we briefly introduce background knowledge on protein–protein interactions, followed by a detailed explanation of various analyses, from basic to advanced, together with related tools and databases. VisANT (http://visant.bu.edu), a free Web-based software platform for the integrative visualization, mining, analysis, and modeling of biological networks, is used as the main tool for all examples in this chapter.

Notes

  1. If you are unsure about the edge-list format, export a network as an edge list with the menu File → Export as Tab-Delimited File → All and use the exported file as a template.

References

  1. Phizicky EM, Fields S (1995) Protein-protein interactions: methods for detection and analysis. Microbiol Rev 59(1):94–123

  2. Berggard T, Linse S, James P (2007) Methods for the detection and analysis of protein-protein interactions. Proteomics 7(16):2833–2842

  3. Sobott F, Robinson CV (2002) Protein complexes gain momentum. Curr Opin Struct Biol 12(6):729–734

  4. McCammon MG et al (2002) Screening transthyretin amyloid fibril inhibitors: characterization of novel multiprotein, multiligand complexes by mass spectrometry. Structure 10(6):851–863

  5. von Mering C et al (2002) Comparative assessment of large-scale data sets of protein-protein interactions. Nature 417(6887):399–403

  6. Lu L, Arakaki AK, Lu H, Skolnick J (2003) Multimeric threading-based prediction of protein-protein interactions on a genomic scale: application to the Saccharomyces cerevisiae proteome. Genome Res 13(6A):1146–1154

  7. Aloy P, Russell RB (2004) Ten thousand interactions for the molecular biologist. Nat Biotechnol 22(10):1317–1321

  8. Ito T et al (2001) A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci USA 98(8):4569–4574

  9. Uetz P et al (2000) A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature 403(6770):623–627

  10. Hart GT, Ramani AK, Marcotte EM (2006) How complete are current yeast and human protein-interaction networks? Genome Biol 7(11):120

  11. Hu Z, Snitkin ES, DeLisi C (2008) VisANT: an integrative framework for networks in systems biology. Brief Bioinform 9(4):317–325

  12. Hu Z et al (2007) VisANT 3.0: new modules for pathway visualization, editing, prediction and construction. Nucleic Acids Res 35(Web Server issue):W625–W632

  13. Hu Z et al (2005) VisANT: data-integrating visual framework for biological networks and modules. Nucleic Acids Res 33(Web Server issue):W352–W357

  14. Hu Z, Mellor J, Wu J, DeLisi C (2004) VisANT: an online visualization and analysis tool for biological interaction data. BMC Bioinformatics 5:17

  15. Hu Z, Mellor J, DeLisi C (2004) Analyzing networks with VisANT. In: Baxevanis A, Davison D, Page R, Petsko G, Stein L, Stormo G (eds) Current protocols in bioinformatics. Wiley, Hoboken

  16. Hu Z et al (2009) VisANT 3.5: multi-scale network visualization, analysis and inference based on the gene ontology. Nucleic Acids Res 37(Web Server issue):W115–W121

  17. Hermjakob H et al (2004) The HUPO PSI’s molecular interaction format—a community standard for the representation of protein interaction data. Nat Biotechnol 22:177–183

  18. Linghu B, Snitkin ES, Hu Z, Xia Y, Delisi C (2009) Genome-wide prioritization of disease genes and identification of disease-disease associations from an integrated human functional linkage network. Genome Biol 10(9):R91

  19. Linghu B et al (2008) High-precision high-coverage functional inference from integrated data sources. BMC Bioinformatics 9:119

  20. Bader GD, Cary MP, Sander C (2006) Pathguide: a pathway resource list. Nucleic Acids Res 34(Database issue):D504–D506

  21. Breitkreutz BJ et al (2008) The BioGRID Interaction Database: 2008 update. Nucleic Acids Res 36(Database issue):D637–D640

  22. Aranda B et al (2010) The IntAct molecular interaction database in 2010. Nucleic Acids Res 38(Database issue):D525–D531

  23. Zanzoni A et al (2002) MINT: a Molecular INTeraction database. FEBS Lett 513(1):135–140

  24. Mewes HW et al (2008) MIPS: analysis and annotation of genome information in 2007. Nucleic Acids Res 36(Database issue):D196–D201

  25. Cherry JM et al (1998) SGD: Saccharomyces Genome Database. Nucleic Acids Res 26(1):73–79

  26. Wilson RJ, Goodman JL, Strelets VB (2008) FlyBase: integration and improvements to query tools. Nucleic Acids Res 36(Database issue):D588–D593

  27. Keshava Prasad TS et al (2009) Human Protein Reference Database—2009 update. Nucleic Acids Res 37(Database issue):D767–D772

  28. Mellor JC, Yanai I, Clodfelter KH, Mintseris J, DeLisi C (2002) Predictome: a database of putative functional links between proteins. Nucleic Acids Res 30(1):306–309

  29. von Mering C et al (2007) STRING 7—recent developments in the integration and prediction of protein interactions. Nucleic Acids Res 35(Database issue):D358–D362

  30. UniProt Consortium (2008) The universal protein resource (UniProt). Nucleic Acids Res 36(Database issue):D190–D195

  31. Bruford EA et al (2008) The HGNC Database in 2008: a resource for the human genome. Nucleic Acids Res 36(Database issue):D445–D448

  32. Schuster-Bockler B, Bateman A (2008) Protein interactions in human genetic diseases. Genome Biol 9(1):R9

  33. Yeger-Lotem E et al (2004) Network motifs in integrated cellular networks of transcription-regulation and protein-protein interaction. Proc Natl Acad Sci USA 101(16):5934–5939

  34. Zhang LV et al (2005) Motifs, themes and thematic maps of an integrated Saccharomyces cerevisiae interaction network. J Biol 4(2):6

  35. Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL (2008) GenBank. Nucleic Acids Res 36(Database issue):D25–D30

  36. Rogers A et al (2008) WormBase 2007. Nucleic Acids Res 36(Database issue):D612–D617

  37. Goh KI et al (2007) The human disease network. Proc Natl Acad Sci USA 104(21):8685–8690

  38. Tong AH et al (2004) Global mapping of the yeast genetic interaction network. Science 303(5659):808–813

  39. Jeong H, Tombor B, Albert R, Oltvai ZN, Barabasi AL (2000) The large-scale organization of metabolic networks. Nature 407(6804):651–654

  40. Albert R, Jeong H, Barabasi AL (2000) Error and attack tolerance of complex networks. Nature 406(6794):378–382

  41. Gavin AC et al (2002) Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 415(6868):141–147

  42. Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabasi AL (2002) Hierarchical organization of modularity in metabolic networks. Science 297(5586):1551–1555

  43. Babu MM, Luscombe NM, Aravind L, Gerstein M, Teichmann SA (2004) Structure and evolution of transcriptional regulatory networks. Curr Opin Struct Biol 14(3):283–291

  44. del Sol A, O’Meara P (2005) Small-world network approach to identify key residues in protein-protein interaction. Proteins 58(3):672–682

  45. King OD (2004) Comment on “Subgraphs in random networks”. Phys Rev E Stat Nonlin Soft Matter Phys 70(5 Pt 2):058101, author reply 058102

  46. Itzkovitz S, Milo R, Kashtan N, Ziv G, Alon U (2003) Subgraphs in random networks. Phys Rev E Stat Nonlin Soft Matter Phys 68(2 Pt 2):026127

  47. Shen-Orr SS, Milo R, Mangan S, Alon U (2002) Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet 31(1):64–68

  48. Ashburner M et al (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25(1):25–29

  49. Huang DW et al (2007) The DAVID Gene Functional Classification Tool: a novel biological module-centric algorithm to functionally analyze large gene lists. Genome Biol 8(9):R183

  50. Hu Z et al (2007) Towards zoomable multidimensional maps of the cell. Nat Biotechnol 25(5):547–554

  51. Lee TI et al (2002) Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298(5594):799–804

  52. Milo R et al (2002) Network motifs: simple building blocks of complex networks. Science 298(5594):824–827

  53. Endy D, Brent R (2001) Modelling cellular behaviour. Nature 409(6818):391–395

  54. Stamm S (2002) Signals and their transduction pathways regulating alternative splicing: a new dimension of the human genome. Hum Mol Genet 11(20):2409–2416

  55. Boulos MN (2003) The use of interactive graphical maps for browsing medical/health Internet information resources. Int J Health Geogr 2(1):1

  56. Green ML, Karp PD (2006) The outcomes of pathway database computations depend on pathway ontology. Nucleic Acids Res 34(13):3687–3697

  57. Fraser AG, Marcotte EM (2004) A probabilistic view of gene function. Nat Genet 36(6):559–564

  58. Guimera R, Nunes Amaral LA (2005) Functional cartography of complex metabolic networks. Nature 433(7028):895–900

  59. Ihmels J et al (2002) Revealing modular organization in the yeast transcriptional network. Nat Genet 31(4):370–377

  60. Bar-Joseph Z et al (2003) Computational discovery of gene modules and regulatory networks. Nat Biotechnol 21(11):1337–1342

  61. Wu J, Hu Z, DeLisi C (2006) Gene annotation and network inference by phylogenetic profiling. BMC Bioinformatics 7:80

  62. Oltvai ZN, Barabasi AL (2002) Systems biology. Life’s complexity pyramid. Science 298(5594):763–764

  63. Rhee SY, Wood V, Dolinski K, Draghici S (2008) Use and misuse of the gene ontology annotations. Nat Rev Genet 9(7):509–515

  64. Reimand J, Tooming L, Peterson H, Adler P, Vilo J (2008) GraphWeb: mining heterogeneous biological networks for gene modules with functional significance. Nucleic Acids Res 36(Web Server issue):W452–W459

  65. Zhang M et al (2008) Interactive analysis of systems biology molecular expression data. BMC Syst Biol 2:23

  66. Brohee S et al (2008) NeAT: a toolbox for the analysis of biological networks, clusters, classes and pathways. Nucleic Acids Res 36(Web Server issue):W444–W451

  67. Alibes A, Canada A, Diaz-Uriarte R (2008) PaLS: filtering common literature, biological terms and pathway information. Nucleic Acids Res 36(Web Server issue):W364–W367

  68. Antonov AV, Schmidt T, Wang Y, Mewes HW (2008) ProfCom: a web tool for profiling the complex functionality of gene groups identified from high-throughput data. Nucleic Acids Res 36(Web Server issue):W347–W351

  69. Lee T, Desai VG, Velasco C, Reis RJ, Delongchamp RR (2008) Testing for treatment effects on gene ontology. BMC Bioinformatics 9(Suppl 9):S20

  70. Salomonis N et al (2007) GenMAPP 2: new features and resources for pathway analysis. BMC Bioinformatics 8:217

  71. Zhu J et al (2007) GO-2D: identifying 2-dimensional cellular-localized functional modules in gene ontology. BMC Genomics 8:30

  72. Antonov AV, Tetko IV, Mewes HW (2006) A systematic approach to infer biological relevance and biases of gene network structures. Nucleic Acids Res 34(1):e6

  73. Draghici S et al (2003) Onto-Tools, the toolkit of the modern biologist: Onto-Express, Onto-Compare, Onto-Design and Onto-Translate. Nucleic Acids Res 31(13):3775–3781

  74. Khatri P, Bhavsar P, Bawa G, Draghici S (2004) Onto-Tools: an ensemble of web-accessible, ontology-based tools for the functional design and interpretation of high-throughput gene expression experiments. Nucleic Acids Res 32(Web Server issue):W449–W456

  75. Khatri P et al (2007) Onto-Tools: new additions and improvements in 2006. Nucleic Acids Res 35(Web Server issue):W206–W211

  76. Khatri P, Draghici S (2005) Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics 21(18):3587–3595

  77. Wang JZ, Du Z, Payattakool R, Yu PS, Chen CF (2007) A new method to measure the semantic similarity of GO terms. Bioinformatics 23(10):1274–1281

  78. Maglott D, Ostell J, Pruitt KD, Tatusova T (2007) Entrez gene: gene-centered information at NCBI. Nucleic Acids Res 35(Database issue):D26–D31

  79. Benjamini Y, Drai D, Elmer G, Kafkafi N, Golani I (2001) Controlling the false discovery rate in behavior genetics research. Behav Brain Res 125(1–2):279–284

  80. Barry WT, Nobel AB, Wright FA (2005) Significance analysis of functional categories in gene expression studies: a structured permutation approach. Bioinformatics 21(9):1943–1949

  81. Mootha VK et al (2003) PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet 34(3):267–273

  82. Subramanian A et al (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 102(43):15545–15550

  83. Volinia S et al (2004) GOAL: automated gene ontology analysis of expression profiles. Nucleic Acids Res 32(Web Server issue):W492–W499

  84. Zhou X, Kao MC, Wong WH (2002) Transitive functional annotation by shortest-path analysis of gene expression data. Proc Natl Acad Sci USA 99(20):12783–12788

Author information

Correspondence to Zhenjun Hu.

Appendix

1.1 Mathematical Definition of Metagraph

A metagraph \( G_{\mathrm{m}} = \{V, E\} \) consists of a finite set V of nodes and a finite set E of edges. The nodes of a metagraph can be written as \( V = \{V_{\mathrm{s}}, V_{\mathrm{m}}\} \), where \( V_{\mathrm{s}} \) represents simple nodes, as defined for a simple graph, and \( V_{\mathrm{m}} \) represents metanodes. Throughout, the subscript s denotes a simple node or edge and the subscript m denotes a metanode or metaedge. Each metanode \( v_{\mathrm{m}} \in V_{\mathrm{m}} \) contains a subgraph consisting of child nodes and the edges connecting them. In addition, each node \( v \in V \) represents a set of its instance nodes, i.e., \( v = \{ v_i \mid i > 0 \} \), where \( v_i \) is an instance node of \( v \). Instance nodes share exactly the same identity but can have instance-specific properties. The statement that two metanodes share a node means that each metanode contains an instance of that node.

A metanode \( v_{\mathrm{m}} \) has two states, expanded or contracted. The expanded state displays the internal subgraph (that is, it places all child nodes and their connections into the graph), while the contracted state replaces this subgraph with a single node. Different combinations of metanode states for a given metagraph yield multiple views, each an abstract representation of the same underlying data. The change between views of a given metagraph is defined as the dynamics of the metagraph, as shown in Fig. 1D, E.

Edges in a metagraph can be written as \( E = \{E_{\mathrm{s}}, E_{\mathrm{m}}\} \), where \( E_{\mathrm{s}} \) represents simple edges, as defined for a simple graph, and \( E_{\mathrm{m}} \) represents metaedges. Each metaedge \( e_{\mathrm{m}} = e_{v_{\mathrm{m}},v} \in E_{\mathrm{m}} \) is associated with at least one contracted metanode \( v_{\mathrm{m}} \) and is transient: it appears when the metanode is contracted and disappears when one or both of the connected metanodes are expanded. In other words, a metaedge is derived from the properties of the two nodes it connects. The most common derivation is connection transfer. For example, when metanodes M1 and M2 are contracted in Fig. 1E, the connection between C and E is transferred to M1 and M2. A metaedge can also be derived from other properties of the metanodes: the metaedge between M2 and M3 in Fig. 1E arises because the two metanodes share node E. In general, the derivation of a metaedge can be written as \( e_{v_{\mathrm{m}},v} = g(v_{\mathrm{m}}, v) \), where g is an aggregation function and \( v \in V \) can be either a metanode or a simple node.
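The two derivation rules above (connection transfer and shared nodes) can be illustrated with a small Python sketch. This is not VisANT's implementation: the function name `metagraph_view`, the restriction to flat (non-nested) metanodes, and the choice of g as "move every simple edge onto the visible representatives of its endpoints, and link contracted metanodes that share a child" are assumptions made for illustration. A node that belongs only to expanded metanodes is simply treated as visible on its own.

```python
from itertools import combinations


def metagraph_view(edges, members, contracted):
    """Return the edges visible in one view of a flat (non-nested) metagraph.

    edges: simple edges as (u, v) pairs between simple nodes
    members: metanode name -> set of child node names (a child may be shared)
    contracted: names of the metanodes currently in the contracted state

    The result maps each visible edge (a frozenset of two endpoints) to the set
    of rules that produced it: "transfer" and/or "shared".
    """

    def representatives(node):
        # An instance of the node is hidden inside every contracted metanode that
        # holds it; if no contracted metanode holds it, the node itself is visible.
        hosts = {m for m in contracted if node in members[m]}
        return hosts or {node}

    view = {}
    # Connection transfer: re-attach each simple edge to the visible representatives.
    for u, v in edges:
        for ru in representatives(u):
            for rv in representatives(v):
                if ru != rv:  # edges inside a single contracted metanode are hidden
                    view.setdefault(frozenset((ru, rv)), set()).add("transfer")
    # Shared-node rule: two contracted metanodes that share a child get a metaedge.
    for m1, m2 in combinations(sorted(contracted), 2):
        if members[m1] & members[m2]:
            view.setdefault(frozenset((m1, m2)), set()).add("shared")
    return view


# Hypothetical example in the spirit of Fig. 1: M2 and M3 share node E.
edges = [("A", "C"), ("C", "E"), ("D", "E"), ("E", "F"), ("F", "G")]
members = {"M1": {"A", "C"}, "M2": {"D", "E"}, "M3": {"E", "F"}}
print(sorted(sorted(e) for e in metagraph_view(edges, members, {"M1", "M2"})))
# -> [['F', 'G'], ['F', 'M2'], ['M1', 'M2']]
```

Contracting M2 and M3 instead yields a metaedge between them that is produced by both rules, because they share E and are also linked through the transferred D–E and E–F edges.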

1.2 Download and Run VisANT as a Local Application

VisANT has four running modes in total, two of which require a local copy of VisANT. For detailed instructions on the other modes, visit http://visant.bu.edu and click the link “Run VisANT.” Running VisANT as a local application is recommended when handling large-scale networks, such as networks with more than 100,000 nodes and edges, because you can then specify how much memory VisANT may use. In addition, a local application allows VisANT to access local resources directly, for example to load and save network files; it also allows users to develop VisANT plugins and to run a list of batch commands in the background without any user interface (batch mode).

The only drawback of running VisANT as a local application is that it can easily become out of date, because VisANT is under active development. Fortunately, VisANT checks for updates automatically and shows an icon near the Help menu when an update is available. Users can click either the icon or the corresponding menu item to upgrade VisANT to the latest version. To download and run VisANT as a local application:

  1. If not already installed, download and install the Java 2 Platform, Standard Edition, version 1.4 or higher (http://java.sun.com/javase/downloads/index.jsp).

  2. Go to http://visant.bu.edu and click on the link “Download,” then click the link “Latest Version of VisANT.”

  3. Select a directory in which to save the file “VisAnt.jar”.

    VisAnt.jar is only about 400 KB in size, so the download should take less than a minute. No installation is needed to run VisANT.

  4. To launch VisANT, double-click VisAnt.jar.

  5. Alternatively, to launch VisANT from the command line, open a DOS (command prompt) window in Windows or a shell window in other operating systems, go to the directory where VisAnt.jar is located, and run the command:

    java -Xmx512M -classpath VisAnt.jar cagt.bu.visant.VisAntApplet

    where 512M is the maximum amount of memory (in megabytes) that VisANT can use. Increase this number if you have a large network or get the “run out of memory” error.

  6. The VisANT main window will appear (Fig. 4).

  7. To exit VisANT, close the VisANT main window, use the File → Exit menu option, or press the key combination ALT + X.

1.3 GO Term Enrichment Analysis

The following four steps describe how GO term enrichment analysis (GOTEA) works in VisANT. For illustration, they consider only one metanode, G, and calculate the enrichment score for only one target GO term, T.

Step 1: Fully annotate all of the nodes in G with gene names and GO terms.

Step 2: Calculate a density score for each node based on the topology and on the GO term similarity to T. A vector \( D_G \) of density scores for the genes in G is computed, with the element of \( D_G \) for the ith gene denoted \( D_i \). The density score evaluates the impact of the other genes in G on the ith gene, according to both their GO term similarity and their topological distance to the ith gene. \( D_i \) is defined as:

$$ {D_i} = \sum\limits_{{j \in G}} {{{\log }_2}\left[ {\left( {\frac{{{M_j}}}{\alpha }} \right)\Theta ({M_j} - \alpha ) + \Theta (\alpha - {M_j})} \right]} {{\text{e}}^{{ - \beta {d_{{ij}}}}}}, $$

where the step function,

$$ \Theta (x - y) = \left\{ {\begin{array}{ll} 1 \hfill & {x \geqslant y} \hfill \\0 \hfill & {x < y,} \hfill \\\end{array} } \right. $$

ensures that \( D_i \geq 0 \). \( M_j \) is a measure of GO term similarity calculated from the graph structure of the GO term hierarchy [85]. A significance threshold, α, controls the contribution that gene j makes to \( D_i \): genes with \( M_j < \alpha \) contribute nothing, so a larger α filters out more weakly similar genes. The shortest distance between genes i and j in the topology of G is denoted \( d_{ij} \) and is calculated with the Floyd–Warshall algorithm. Shorter distances are assumed to make an exponentially greater contribution to the density than longer distances, with the steepness of the exponential decay set by the parameter β: a larger β makes the decay steeper, so choosing a smaller β lets more distant genes contribute to the density. Taken together, the parameters α and β control the sensitivity and selectivity of the density.
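As a rough illustration of Step 2, the sketch below computes \( D_i \) for every gene of a small metanode. Floyd–Warshall supplies the hop distances \( d_{ij} \), and the bracketed log term is transcribed literally from the formula, including \( \Theta(0) = 1 \). The `similarity` values simply stand in for \( M_j \); the GO similarity measure itself [85] is not reproduced, and the function names are hypothetical.

```python
import math


def floyd_warshall(nodes, edges):
    """All-pairs shortest-path lengths (hop counts) in an undirected, unweighted graph."""
    dist = {u: {v: (0 if u == v else math.inf) for v in nodes} for u in nodes}
    for u, v in edges:
        dist[u][v] = dist[v][u] = 1
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def step(x):
    """Theta(x): 1 for x >= 0, else 0."""
    return 1.0 if x >= 0 else 0.0


def gotea_density(nodes, edges, similarity, alpha, beta):
    """D_i = sum_j log2[(M_j/alpha) Theta(M_j - alpha) + Theta(alpha - M_j)] * exp(-beta * d_ij).

    similarity[j] plays the role of M_j, the GO similarity of gene j to the target term T.
    Genes in another connected component have d_ij = inf and contribute exp(-inf) = 0
    (this requires beta > 0).
    """
    dist = floyd_warshall(nodes, edges)
    weight = {
        j: math.log2((similarity[j] / alpha) * step(similarity[j] - alpha)
                     + step(alpha - similarity[j]))
        for j in nodes
    }
    return {i: sum(weight[j] * math.exp(-beta * dist[i][j]) for j in nodes) for i in nodes}


# Hypothetical metanode G: a path a - b - c in which only gene a is similar to T.
scores = gotea_density(["a", "b", "c"], [("a", "b"), ("b", "c")],
                       {"a": 0.8, "b": 0.1, "c": 0.1}, alpha=0.4, beta=1.0)
print({g: round(s, 3) for g, s in scores.items()})  # -> {'a': 1.0, 'b': 0.368, 'c': 0.135}
```

Gene a's signal decays by a factor of \( e^{-\beta} \) per hop, which is why b and c receive smaller, but non-zero, densities.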

Step 3: Another vector of density scores, \( D_{NG} \), is computed from a randomly chosen subset of genes representative of the background distribution. The background consists of all genes annotated by NCBI.

Step 4: Statistical significance for rejecting the null hypothesis is determined by a permutation test. For statistical robustness, step 3 is repeated n times. After n iterations, the number of times the average density score of the randomly chosen genes exceeds the average density score of the genes in G is counted and used to compute the final p-value (Fig. 23).
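Steps 3 and 4 amount to an empirical permutation test. The sketch below assumes a hypothetical callable `mean_density(genes)` that returns the average density score obtained when the given genes are scored as in Step 2 (how VisANT places random genes on the topology of G is not specified here). It reports the fraction of n random draws whose average exceeds the observed one.

```python
import random


def permutation_p_value(mean_density, module_genes, background_genes, n=1000, seed=0):
    """Fraction of n random gene sets whose average density exceeds that of module_genes.

    mean_density: hypothetical callable, list of genes -> average density score
    module_genes: genes of metanode G
    background_genes: a sequence (e.g. list) of background genes to sample from
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    observed = mean_density(module_genes)
    exceed = 0
    for _ in range(n):
        sample = rng.sample(background_genes, len(module_genes))
        if mean_density(sample) > observed:
            exceed += 1
    # Some implementations report (exceed + 1) / (n + 1) so that p is never exactly 0.
    return exceed / n


def toy_density(genes):
    """Toy stand-in for illustration: genes are numbers and the 'density' is their mean."""
    return sum(genes) / len(genes)


print(permutation_p_value(toy_density, [10, 10, 10], list(range(10)), n=200))  # -> 0.0
```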

Fig. 23. VisANT upgrade.

These four steps can be applied to multiple tests by using multiple metanodes and multiple target GO terms. In this case, the p-values are corrected using FDR methods (79). Specifically, \( \mathrm{FDR} = p \times m / k \), where m is the total number of GO terms tested and k is the rank of the GO term under consideration when the p-values are sorted in ascending order. GOTEA also offers an option to identify representative GO terms among all its discoveries, based on approaches that identify the most informative GO terms (84).
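The correction \( \mathrm{FDR} = p \times m / k \) can be sketched as follows. The function name is hypothetical; ranks come from sorting the p-values in ascending order, with ties kept in input order. The optional `monotone` step, which enforces non-decreasing values along the ranking and caps them at 1 as in the full Benjamini–Hochberg procedure, is an addition not described in the text.

```python
def fdr_adjust(p_values, monotone=False):
    """Return p * m / k for each p-value, where k is its rank (1 = smallest p)."""
    m = len(p_values)
    # Indices in ascending order of p; sorted() is stable, so ties keep input order.
    order = sorted(range(m), key=lambda idx: p_values[idx])
    fdr = [0.0] * m
    for rank, idx in enumerate(order, start=1):
        fdr[idx] = p_values[idx] * m / rank
    if monotone:
        # Step-up adjustment: walk from the largest p-value down, carrying the running
        # minimum (starting at 1.0, which also caps every value at 1).
        running = 1.0
        for idx in reversed(order):
            running = min(running, fdr[idx])
            fdr[idx] = running
    return fdr


# Hypothetical p-values for three GO terms.
print([round(q, 3) for q in fdr_adjust([0.01, 0.04, 0.03])])                 # -> [0.03, 0.04, 0.045]
print([round(q, 3) for q in fdr_adjust([0.01, 0.04, 0.03], monotone=True)])  # -> [0.03, 0.04, 0.04]
```

The same function applies unchanged to NMEA, with one p-value per metanode instead of per GO term.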

1.4 Network Module Enrichment Analysis

Network module enrichment analysis (NMEA) is implemented in a manner similar to GOTEA. Where GOTEA uses GO term similarities, NMEA uses p-values from t-tests comparing the expression values of two phenotypes.

Step 1: Fetch the expression profile of each gene in a given module (i.e., a metanode, denoted M hereafter) from formatted user input. The input should include an adequate number of samples with comparable phenotypes (e.g., normal and disease).

Step 2: A vector \( D_M \) of density scores is computed for the genes in M, with the element of \( D_M \) for the ith gene denoted \( D_i \). \( D_i \) is defined as:

$$ D_i = \sum_{j \in M} \log_2\left[ \left(\frac{\alpha}{M_j}\right)\Theta(\alpha - M_j) + \Theta(M_j - \alpha) \right] e^{-\beta d_{ij}}, $$

where the step function,

$$ \Theta (x - y) = \left\{ {\begin{array}{ll} 1 \hfill & {x \geqslant y} \hfill \\0 \hfill & {x < y,} \hfill \\\end{array} } \right. $$

ensures that \( D_i \geq 0 \). \( M_j \) is the p-value from a two-tailed t-test of differential expression between two phenotypes (for example, normal and disease). The parameters α and β control the sensitivity and selectivity of the density, as described in the previous section.

The density score evaluates the impact of the other genes in M on the ith gene, according to both their t-test p-values (an indicator of differential expression) and their topological distances to the ith gene.
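Compared with GOTEA, only the weight term changes: it now rewards small p-values. A minimal sketch, assuming precomputed shortest-path distances (for example from Floyd–Warshall) and per-gene p-values. The function names are hypothetical, and the tiny floor on p, which avoids division by zero, is an assumption the text does not discuss.

```python
import math


def step(x):
    """Theta(x): 1 for x >= 0, else 0."""
    return 1.0 if x >= 0 else 0.0


def nmea_weight(p, alpha):
    """log2[(alpha/M_j) Theta(alpha - M_j) + Theta(M_j - alpha)]; positive only when p < alpha."""
    p = max(p, 1e-300)  # assumption: floor p so that alpha / p is always defined
    return math.log2((alpha / p) * step(alpha - p) + step(p - alpha))


def nmea_density(dist, p_values, alpha, beta):
    """D_i = sum over genes j in M of nmea_weight(M_j) * exp(-beta * d_ij).

    dist[i][j]: shortest-path length between genes i and j inside module M
    p_values[j]: two-tailed t-test p-value of gene j (M_j)
    """
    weight = {j: nmea_weight(p, alpha) for j, p in p_values.items()}
    return {i: sum(weight[j] * math.exp(-beta * dist[i][j]) for j in p_values)
            for i in p_values}


# Hypothetical two-gene module x - y in which only x is differentially expressed.
dist = {"x": {"x": 0, "y": 1}, "y": {"x": 1, "y": 0}}
scores = nmea_density(dist, {"x": 0.015625, "y": 0.3}, alpha=0.0625, beta=1.0)
print({g: round(s, 3) for g, s in scores.items()})  # -> {'x': 2.0, 'y': 0.736}
```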

Step 3: Another vector of density scores, \( D_{NM} \), is computed after randomly shuffling the phenotype labels to obtain a representative sample of the background distribution.
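Unlike GOTEA, NMEA randomizes the phenotype labels rather than the genes. The sketch below shuffles the labels, recomputes the per-gene p-values, and counts how often a hypothetical `module_score` callable (for example, the mean of the NMEA density scores) exceeds its value under the true labels. To stay within the Python standard library, it replaces the exact two-tailed t-test with Welch's t statistic under a normal approximation, which is only reasonable for moderately large groups; a proper t-test (e.g. SciPy's `scipy.stats.ttest_ind`) would be used in practice.

```python
import math
import random
from statistics import NormalDist, mean, variance


def approx_t_test_p(a, b):
    """Two-tailed p-value for a difference in means (Welch t, normal approximation)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return 1.0  # assumption: treat zero-variance genes as uninformative
    t = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(t)))


def gene_p_values(expression, labels):
    """expression[gene] holds one value per sample, aligned with labels (0 or 1)."""
    result = {}
    for gene, values in expression.items():
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        result[gene] = approx_t_test_p(group0, group1)
    return result


def phenotype_permutation_p(expression, labels, module_score, n=200, seed=0):
    """Fraction of label shuffles whose module score exceeds the score under the true labels."""
    rng = random.Random(seed)
    observed = module_score(gene_p_values(expression, labels))
    shuffled = list(labels)
    exceed = 0
    for _ in range(n):
        rng.shuffle(shuffled)  # group sizes are preserved; only the assignment changes
        if module_score(gene_p_values(expression, shuffled)) > observed:
            exceed += 1
    return exceed / n


# Hypothetical module with one clearly differentially expressed gene (6 normal + 6 disease).
expression = {"g1": [0.0, 0.1, 0.2, 0.1, 0.0, 0.2, 5.0, 5.1, 5.2, 5.1, 5.0, 5.2]}
labels = [0] * 6 + [1] * 6


def count_significant(p_values):
    """Toy module score: number of genes with p < 0.05."""
    return sum(1 for p in p_values.values() if p < 0.05)


print(phenotype_permutation_p(expression, labels, count_significant))  # -> 0.0
```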

Step 4: Statistical significance for rejecting the null hypothesis is determined by a permutation test. For statistical robustness, step 3 is repeated n times. After n iterations, the number of times the average density score obtained with shuffled phenotypes exceeds the average density score of the genes in M is counted and used to compute the final p-value.

When NMEA is applied to multiple metanodes, the p-values must be corrected by FDR in the same way as described above for GOTEA. In this case, \( \mathrm{FDR} = p \times m / k \) as before, but m is the total number of metanodes and k is the rank of the metanode under consideration.

Copyright information

© 2013 Springer Science+Business Media New York

About this protocol

Cite this protocol

Hu, Z. (2013). Analysis Strategy of Protein–Protein Interaction Networks. In: Mamitsuka, H., DeLisi, C., Kanehisa, M. (eds) Data Mining for Systems Biology. Methods in Molecular Biology, vol 939. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-107-3_11

  • DOI: https://doi.org/10.1007/978-1-62703-107-3_11

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-62703-106-6

  • Online ISBN: 978-1-62703-107-3

  • eBook Packages: Springer Protocols
