Analysis Strategy of Protein–Protein Interaction Networks

Hu, Zhenjun

doi:10.1007/978-1-62703-107-3_11

Protocol
First Online: 01 January 2012

Part of the book series: Methods in Molecular Biology ((MIMB,volume 939))

Abstract

Protein interactions, and the networks they form, play a key role in many cellular processes, and distortion of protein-interacting interfaces may lead to the development of many diseases. In this chapter, we briefly introduce background knowledge on protein–protein interactions, followed by a detailed explanation of various analyses, from basic to advanced, as well as related tools and databases. VisANT (http://visant.bu.edu), a free Web-based software platform for the integrative visualization, mining, analysis, and modeling of biological networks, is used as the main tool for all examples in this section.

Notes

1. When you are uncertain about the edge-list format, you can always export a network as an edge list with the menu File→Export as Tab-Delimited File→All and follow the exported examples.

References

Phizicky EM, Fields S (1995) Protein-protein interactions: methods for detection and analysis. Microbiol Rev 59(1):94–123
Berggard T, Linse S, James P (2007) Methods for the detection and analysis of protein-protein interactions. Proteomics 7(16):2833–2842
Sobott F, Robinson CV (2002) Protein complexes gain momentum. Curr Opin Struct Biol 12(6):729–734
McCammon MG et al (2002) Screening transthyretin amyloid fibril inhibitors: characterization of novel multiprotein, multiligand complexes by mass spectrometry. Structure 10(6):851–863
von Mering C et al (2002) Comparative assessment of large-scale data sets of protein-protein interactions. Nature 417(6887):399–403
Lu L, Arakaki AK, Lu H, Skolnick J (2003) Multimeric threading-based prediction of protein-protein interactions on a genomic scale: application to the Saccharomyces cerevisiae proteome. Genome Res 13(6A):1146–1154
Aloy P, Russell RB (2004) Ten thousand interactions for the molecular biologist. Nat Biotechnol 22(10):1317–1321
Ito T et al (2001) A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci USA 98(8):4569–4574
Uetz P et al (2000) A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature 403(6770):623–627
Hart GT, Ramani AK, Marcotte EM (2006) How complete are current yeast and human protein-interaction networks? Genome Biol 7(11):120
Hu Z, Snitkin ES, DeLisi C (2008) VisANT: an integrative framework for networks in systems biology. Brief Bioinform 9(4):317–325
Hu Z et al (2007) VisANT 3.0: new modules for pathway visualization, editing, prediction and construction. Nucleic Acids Res 35(Web Server issue):W625–W632
Hu Z et al (2005) VisANT: data-integrating visual framework for biological networks and modules. Nucleic Acids Res 33(Web Server issue):W352–W357
Hu Z, Mellor J, Wu J, DeLisi C (2004) VisANT: an online visualization and analysis tool for biological interaction data. BMC Bioinformatics 5:17
Hu Z, Mellor J, DeLisi C (2004) Analyzing networks with VisANT. In: Baxevanis A, Davison D, Page R, Petsko G, Stein L, Stormo G (eds) Current protocols in bioinformatics. Wiley, Hoboken
Hu Z et al (2009) VisANT 3.5: multi-scale network visualization, analysis and inference based on the gene ontology. Nucleic Acids Res 37(Web Server issue):W115–W121
Hermjakob H et al (2004) The HUPO PSI’s molecular interaction format—a community standard for the representation of protein interaction data. Nat Biotechnol 22:177–183
Linghu B, Snitkin ES, Hu Z, Xia Y, Delisi C (2009) Genome-wide prioritization of disease genes and identification of disease-disease associations from an integrated human functional linkage network. Genome Biol 10(9):R91
Linghu B et al (2008) High-precision high-coverage functional inference from integrated data sources. BMC Bioinformatics 9:119
Bader GD, Cary MP, Sander C (2006) Pathguide: a pathway resource list. Nucleic Acids Res 34(Database issue):D504–D506
Breitkreutz BJ et al (2008) The BioGRID Interaction Database: 2008 update. Nucleic Acids Res 36(Database issue):D637–D640
Aranda B et al (2010) The IntAct molecular interaction database in 2010. Nucleic Acids Res 38(Database issue):D525–D531
Zanzoni A et al (2002) MINT: a Molecular INTeraction database. FEBS Lett 513(1):135–140
Mewes HW et al (2008) MIPS: analysis and annotation of genome information in 2007. Nucleic Acids Res 36(Database issue):D196–D201
Cherry JM et al (1998) SGD: Saccharomyces Genome Database. Nucleic Acids Res 26(1):73–79
Wilson RJ, Goodman JL, Strelets VB (2008) FlyBase: integration and improvements to query tools. Nucleic Acids Res 36(Database issue):D588–D593
Keshava Prasad TS et al (2009) Human Protein Reference Database—2009 update. Nucleic Acids Res 37(Database issue):D767–D772
Mellor JC, Yanai I, Clodfelter KH, Mintseris J, DeLisi C (2002) Predictome: a database of putative functional links between proteins. Nucleic Acids Res 30(1):306–309
von Mering C et al (2007) STRING 7—recent developments in the integration and prediction of protein interactions. Nucleic Acids Res 35(Database issue):D358–D362
UniProt Consortium (2008) The universal protein resource (UniProt). Nucleic Acids Res 36(Database issue):D190–D195
Bruford EA et al (2008) The HGNC Database in 2008: a resource for the human genome. Nucleic Acids Res 36(Database issue):D445–D448
Schuster-Bockler B, Bateman A (2008) Protein interactions in human genetic diseases. Genome Biol 9(1):R9
Yeger-Lotem E et al (2004) Network motifs in integrated cellular networks of transcription-regulation and protein-protein interaction. Proc Natl Acad Sci USA 101(16):5934–5939
Zhang LV et al (2005) Motifs, themes and thematic maps of an integrated Saccharomyces cerevisiae interaction network. J Biol 4(2):6
Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL (2008) GenBank. Nucleic Acids Res 36(Database issue):D25–D30
Rogers A et al (2008) WormBase 2007. Nucleic Acids Res 36(Database issue):D612–D617
Goh KI et al (2007) The human disease network. Proc Natl Acad Sci USA 104(21):8685–8690
Tong AH et al (2004) Global mapping of the yeast genetic interaction network. Science 303(5659):808–813
Jeong H, Tombor B, Albert R, Oltvai ZN, Barabasi AL (2000) The large-scale organization of metabolic networks. Nature 407(6804):651–654
Albert R, Jeong H, Barabasi AL (2000) Error and attack tolerance of complex networks. Nature 406(6794):378–382
Gavin AC et al (2002) Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 415(6868):141–147
Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabasi AL (2002) Hierarchical organization of modularity in metabolic networks. Science 297(5586):1551–1555
Babu MM, Luscombe NM, Aravind L, Gerstein M, Teichmann SA (2004) Structure and evolution of transcriptional regulatory networks. Curr Opin Struct Biol 14(3):283–291
del Sol A, O’Meara P (2005) Small-world network approach to identify key residues in protein-protein interaction. Proteins 58(3):672–682
King OD (2004) Comment on “Subgraphs in random networks”. Phys Rev E Stat Nonlin Soft Matter Phys 70(5 Pt 2):058101, author reply 058102
Itzkovitz S, Milo R, Kashtan N, Ziv G, Alon U (2003) Subgraphs in random networks. Phys Rev E Stat Nonlin Soft Matter Phys 68(2 Pt 2):026127
Shen-Orr SS, Milo R, Mangan S, Alon U (2002) Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet 31(1):64–68
Ashburner M et al (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25(1):25–29
da Huang W et al (2007) The DAVID Gene Functional Classification Tool: a novel biological module-centric algorithm to functionally analyze large gene lists. Genome Biol 8(9):R183
Hu Z et al (2007) Towards zoomable multidimensional maps of the cell. Nat Biotechnol 25(5):547–554
Lee TI et al (2002) Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298(5594):799–804
Milo R et al (2002) Network motifs: simple building blocks of complex networks. Science 298(5594):824–827
Endy D, Brent R (2001) Modelling cellular behaviour. Nature 409(6818):391–395
Stamm S (2002) Signals and their transduction pathways regulating alternative splicing: a new dimension of the human genome. Hum Mol Genet 11(20):2409–2416
Boulos MN (2003) The use of interactive graphical maps for browsing medical/health Internet information resources. Int J Health Geogr 2(1):1
Green ML, Karp PD (2006) The outcomes of pathway database computations depend on pathway ontology. Nucleic Acids Res 34(13):3687–3697
Fraser AG, Marcotte EM (2004) A probabilistic view of gene function. Nat Genet 36(6):559–564
Guimera R, Nunes Amaral LA (2005) Functional cartography of complex metabolic networks. Nature 433(7028):895–900
Ihmels J et al (2002) Revealing modular organization in the yeast transcriptional network. Nat Genet 31(4):370–377
Bar-Joseph Z et al (2003) Computational discovery of gene modules and regulatory networks. Nat Biotechnol 21(11):1337–1342
Wu J, Hu Z, DeLisi C (2006) Gene annotation and network inference by phylogenetic profiling. BMC Bioinformatics 7:80
Oltvai ZN, Barabasi AL (2002) Systems biology. Life’s complexity pyramid. Science 298(5594):763–764
Rhee SY, Wood V, Dolinski K, Draghici S (2008) Use and misuse of the gene ontology annotations. Nat Rev Genet 9(7):509–515
Reimand J, Tooming L, Peterson H, Adler P, Vilo J (2008) GraphWeb: mining heterogeneous biological networks for gene modules with functional significance. Nucleic Acids Res 36(Web Server issue):W452–W459
Zhang M et al (2008) Interactive analysis of systems biology molecular expression data. BMC Syst Biol 2:23
Brohee S et al (2008) NeAT: a toolbox for the analysis of biological networks, clusters, classes and pathways. Nucleic Acids Res 36(Web Server issue):W444–W451
Alibes A, Canada A, Diaz-Uriarte R (2008) PaLS: filtering common literature, biological terms and pathway information. Nucleic Acids Res 36(Web Server issue):W364–W367
Antonov AV, Schmidt T, Wang Y, Mewes HW (2008) ProfCom: a web tool for profiling the complex functionality of gene groups identified from high-throughput data. Nucleic Acids Res 36(Web Server issue):W347–W351
Lee T, Desai VG, Velasco C, Reis RJ, Delongchamp RR (2008) Testing for treatment effects on gene ontology. BMC Bioinformatics 9(Suppl 9):S20
Salomonis N et al (2007) GenMAPP 2: new features and resources for pathway analysis. BMC Bioinformatics 8:217
Zhu J et al (2007) GO-2D: identifying 2-dimensional cellular-localized functional modules in gene ontology. BMC Genomics 8:30
Antonov AV, Tetko IV, Mewes HW (2006) A systematic approach to infer biological relevance and biases of gene network structures. Nucleic Acids Res 34(1):e6
Draghici S et al (2003) Onto-Tools, the toolkit of the modern biologist: Onto-Express, Onto-Compare, Onto-Design and Onto-Translate. Nucleic Acids Res 31(13):3775–3781
Khatri P, Bhavsar P, Bawa G, Draghici S (2004) Onto-Tools: an ensemble of web-accessible, ontology-based tools for the functional design and interpretation of high-throughput gene expression experiments. Nucleic Acids Res 32(Web Server issue):W449–W456
Khatri P et al (2007) Onto-Tools: new additions and improvements in 2006. Nucleic Acids Res 35(Web Server issue):W206–W211
Khatri P, Draghici S (2005) Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics 21(18):3587–3595
Wang JZ, Du Z, Payattakool R, Yu PS, Chen CF (2007) A new method to measure the semantic similarity of GO terms. Bioinformatics 23(10):1274–1281
Maglott D, Ostell J, Pruitt KD, Tatusova T (2007) Entrez gene: gene-centered information at NCBI. Nucleic Acids Res 35(Database issue):D26–D31
Benjamini Y, Drai D, Elmer G, Kafkafi N, Golani I (2001) Controlling the false discovery rate in behavior genetics research. Behav Brain Res 125(1–2):279–284
Barry WT, Nobel AB, Wright FA (2005) Significance analysis of functional categories in gene expression studies: a structured permutation approach. Bioinformatics 21(9):1943–1949
Mootha VK et al (2003) PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet 34(3):267–273
Subramanian A et al (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 102(43):15545–15550
Volinia S et al (2004) GOAL: automated gene ontology analysis of expression profiles. Nucleic Acids Res 32(Web Server issue):W492–W499
Zhou X, Kao MC, Wong WH (2002) Transitive functional annotation by shortest-path analysis of gene expression data. Proc Natl Acad Sci USA 99(20):12783–12788

Author information

Authors and Affiliations

Bioinformatics Program, Boston University, Boston, MA, USA
Zhenjun Hu

Corresponding author

Correspondence to Zhenjun Hu.

Editor information

Editors and Affiliations

Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto, 611-0011, Japan
Hiroshi Mamitsuka
Dept. Biomedical Engineering, Boston University, Cummington St. 44, Boston, 02215, Massachusetts, USA
Charles DeLisi
Inst. Chemical Research, Bioinformatics Center, Kyoto University, Gokasho, Uji, 611-0011, Kyoto, Japan
Minoru Kanehisa

Appendix

1.1 Mathematical Definition of Metagraph

A metagraph $ G_{\rm m} = \{ V, E \} $ consists of a finite set $ V $ of nodes and a finite set $ E $ of edges. The nodes of a metagraph can be denoted as $ V = \{ V_{\rm s}, V_{\rm m} \} $, where $ V_{\rm s} $ represents simple nodes as generally defined in a simple graph and $ V_{\rm m} $ represents the metanodes. The subscript s denotes a simple node/edge and the subscript m denotes a metanode/metaedge. Each metanode $ v_{\rm m} \in V_{\rm m} $ contains a subgraph consisting of child nodes and the edges that connect them. In addition, each node $ v \in V $ represents a set of its instance nodes, i.e., $ v = \{ v_i \mid i > 0 \} $, where $ v_i $ is an instance node of $ v $. Instance nodes share exactly the same identity but can have individual-specific properties. The statement that two metanodes share a node implies that each metanode contains an instance of the same node.

A metanode $ v_{\rm m} $ has two states, expanded or contracted: the expanded state manifests the internal subgraph (that is, places all child nodes with their connections into the graph), while the contracted state replaces this subgraph with a single node. The combination of different states of the metanodes of a given metagraph results in multiple views that are abstract representations of the same underlying data. The change of views for a given metagraph is defined as the dynamics of the metagraph, as shown in Fig. 1D, E.

Edges in a metagraph can be denoted as $ E = \{ E_{\rm s}, E_{\rm m} \} $, where $ E_{\rm s} $ represents simple edges as generally defined in a simple graph and $ E_{\rm m} $ represents metaedges. Each metaedge $ e_{\rm m} = e_{v_{\rm m}, v} \in E_{\rm m} $ is associated with at least one contracted metanode $ v_{\rm m} $ and is transient: it appears when the metanode is contracted and disappears when one or both of the connected metanodes are expanded, i.e., the metaedge is derived from the properties of the two connected nodes. The most common derivation of a metaedge is connection transfer. For example, when metanodes M1 and M2 are contracted in Fig. 1E, the connection between C and E is transferred to M1 and M2. However, a metaedge can also be derived from other properties of the metanode. The metaedge shown in Fig. 1E is derived because the two metanodes M2 and M3 share the same node E. The derivation of the metaedge can be generalized as $ e_{v_{\rm m}, v} = g(v_{\rm m}, v) $, where $ g $ is the aggregation function and $ v \in V $ can be either a metanode or a simple node.
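The definition above can be illustrated with a minimal Python sketch. It is not VisANT's implementation: the class `Metagraph`, its method names, and the membership of M1, M2, and M3 beyond the nodes C and E named in the text are assumptions made for illustration. The sketch derives the transient metaedges of a view in the two ways described, by connection transfer and by shared child nodes.

```python
from itertools import combinations


class Metagraph:
    """Toy metagraph (hypothetical API): simple edges, metanodes, derived metaedges."""

    def __init__(self):
        self.simple_edges = set()   # E_s: (u, v) pairs between simple nodes
        self.children = {}          # V_m: metanode name -> set of child node names
        self.contracted = set()     # metanodes currently shown as a single node

    def add_edge(self, u, v):
        self.simple_edges.add((u, v))

    def add_metanode(self, name, members):
        self.children[name] = set(members)

    def contract(self, name):
        self.contracted.add(name)

    def expand(self, name):
        self.contracted.discard(name)

    def _visible(self, node):
        """Contracted metanodes that hide `node`, or the node itself if none does."""
        owners = {m for m in self.contracted if node in self.children[m]}
        return owners or {node}

    def view_edges(self):
        """Edges of the current view: simple edges plus transient metaedges."""
        edges = set()
        # Connection transfer: an edge with a hidden endpoint moves to the metanode.
        for u, v in self.simple_edges:
            for a in self._visible(u):
                for b in self._visible(v):
                    if a != b:
                        edges.add(frozenset((a, b)))
        # Shared-node derivation: contracted metanodes sharing a child are linked.
        for m1, m2 in combinations(sorted(self.contracted), 2):
            if self.children[m1] & self.children[m2]:
                edges.add(frozenset((m1, m2)))
        return edges


g = Metagraph()
g.add_edge("C", "E")
g.add_metanode("M1", ["A", "B", "C"])   # members other than C are assumed
g.add_metanode("M2", ["D", "E"])        # members other than E are assumed
g.add_metanode("M3", ["E", "F"])        # members other than E are assumed

g.contract("M1")
g.contract("M2")
print(g.view_edges())   # the C-E connection is transferred to M1 and M2

g.contract("M3")
print(g.view_edges())   # adds the metaedge derived from the shared node E
```

Expanding every metanode again returns the view to the single simple edge between C and E, which is the sense in which metaedges are transient.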

1.2 Download and Run VisANT as a Local Application

VisANT has four running modes in total, and two of them require a local copy of VisANT. Please visit http://visant.bu.edu and click the link “Run VisANT” for detailed instructions on the other modes. Running VisANT as a local application is recommended when handling a large-scale network, such as a network with more than 100,000 nodes and edges, because you can specify the amount of memory that VisANT can use. In addition, a local application allows VisANT to access local resources directly, for example to load and save network files; it also allows the user to develop VisANT plugins and to run a list of batch commands in the background without any user interface (batch mode).

The only drawback of running VisANT as a local application is that it easily becomes out of date, because VisANT is under active development. Fortunately, VisANT checks for updates automatically, and an icon is shown near the Help menu when an update is available. Users can click either the icon or the corresponding menu item to upgrade VisANT to the latest version. To download and run VisANT as a local application:

1. If not already installed, download and install the Java 2 Platform, Standard Edition, version 1.4 or higher (http://java.sun.com/javase/downloads/index.jsp).
2. Go to http://visant.bu.edu and click on the link “Download,” then click the link “Latest Version of VisANT.”
3. Select a directory in which to save the file “VisAnt.jar.”

The VisAnt.jar file is only about 400 K in size, and the download should take less than 1 min. No installation is needed to run VisANT.
4. To launch VisANT, double-click VisAnt.jar.
5. To launch VisANT by an alternative means, open a DOS window in Windows, or a shell window in other operating systems, go to the directory where VisAnt.jar is located, and run the command:

java -Xmx512M -classpath VisAnt.jar cagt.bu.visant.VisAntApplet

where 512M indicates the maximum amount of memory that VisANT can use. Increase this number if you have a large network or you get the “run out of memory” error.
6. The VisANT main window will appear (Fig. 4).
7. To exit VisANT, close the VisANT main window, use the File → Exit menu option, or press the key combination ALT + X.
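For scripted use, the launch command from step 5 can be assembled programmatically so that the memory limit is easy to change. The following is a small sketch only; the helper name `visant_command` and its parameters are hypothetical, while the Java options and the class name are the ones given in the command above.

```python
import shlex


def visant_command(jar_path="VisAnt.jar", max_heap_mb=512):
    """Build the argument list for the launch command shown in step 5.

    `visant_command` is a hypothetical helper; only the Java options and
    the class name come from the protocol text.
    """
    if max_heap_mb <= 0:
        raise ValueError("max_heap_mb must be positive")
    return [
        "java",
        f"-Xmx{max_heap_mb}M",          # maximum memory VisANT may use
        "-classpath", jar_path,
        "cagt.bu.visant.VisAntApplet",
    ]


cmd = visant_command(max_heap_mb=2048)
print(shlex.join(cmd))
# To actually start VisANT, pass the list to subprocess.run(cmd, check=True)
# from the directory that contains VisAnt.jar.
```

Building the command as a list rather than a single string avoids quoting problems when the path to VisAnt.jar contains spaces.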

1.3 GO Term Enrichment Analysis

The four steps here describe how GOTEA works in VisANT. For illustration purposes, the following steps take only one metanode, G, into account and calculate only the enrichment score of one target GO term, T.

Step 1: Fully annotate all of the nodes in G with gene names and GO terms.

Step 2: Calculate density scores for each node based upon the topology and the GO term similarity to T. A vector $ D^{G} $ of density scores for the genes in G is computed, with the element of $ D^{G} $ for the ith gene denoted $ D_i $. The density score is used to evaluate the impact of other genes in G on the ith gene, according to both the GO term similarity and the topological distance to the ith gene. $ D_i $ is defined as:

$$ D_i = \sum_{j \in G} \log_2 \left[ \left( \frac{M_j}{\alpha} \right) \Theta(M_j - \alpha) + \Theta(\alpha - M_j) \right] e^{-\beta d_{ij}}, $$

where the step function,

$$ \Theta(x - y) = \begin{cases} 1 & x \geqslant y \\ 0 & x < y, \end{cases} $$

ensures that $ D_i \geqslant 0 $. $ M_j $ is a measure of the GO term similarity calculated based upon the graph structure of the GO term hierarchy [85]. A significance threshold, α, is used to control the contribution that gene j makes to $ D_i $. For larger α, a greater number of less significant genes (those with $ M_j < \alpha $) are filtered out and do not contribute to $ D_i $. The shortest distance between genes i and j given the topology of G is denoted $ d_{ij} $ and is calculated with the Floyd–Warshall algorithm. We assume that shorter distances make an exponentially greater contribution to the density than do longer distances, with the steepness of the exponential determined by the parameter β. With the weight $ e^{-\beta d_{ij}} $ as written above, a smaller β lets more distant genes contribute to the density, whereas a larger β concentrates it on near neighbors. Taken together, the parameters α and β are used to control the sensitivity and selectivity of the density.

Step 3: Another vector of density scores, $ D^{NG} $, is computed based upon a randomly chosen subset of genes representative of the background distribution. The background consists of all genes annotated by NCBI.

Step 4: Statistical significance for rejecting the null hypothesis is determined by a permutation test. For statistical robustness, step 3 is repeated n times. The number of times the average density score of randomly chosen genes is found to be larger than the average density score of genes in G is counted after n iterations and used to compute the final p-value (Fig. 23).
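Steps 2 and 4 can be sketched in Python as follows. This is an illustrative reading of the formula above, not the VisANT code: the function names, the toy three-gene network, and the similarity values are assumptions, and the final p-value is taken to be the fraction of random iterations that exceed the observed mean, which the text implies but does not state explicitly. The sketch also assumes an undirected, unweighted network and β > 0.

```python
import math


def floyd_warshall(nodes, edges):
    """All-pairs shortest path lengths (in hops) for an undirected graph."""
    dist = {u: {v: (0 if u == v else math.inf) for v in nodes} for u in nodes}
    for u, v in edges:
        dist[u][v] = dist[v][u] = 1
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def gotea_term(m_j, alpha):
    """Bracketed log2 term: log2(M_j / alpha) if M_j > alpha, 0 if M_j < alpha."""
    theta = lambda x, y: 1 if x >= y else 0
    return math.log2((m_j / alpha) * theta(m_j, alpha) + theta(alpha, m_j))


def density_scores(nodes, edges, similarity, alpha, beta):
    """D_i for every gene i; unreachable genes get weight exp(-inf) = 0."""
    dist = floyd_warshall(nodes, edges)
    return {
        i: sum(gotea_term(similarity[j], alpha) * math.exp(-beta * dist[i][j])
               for j in nodes)
        for i in nodes
    }


def permutation_p_value(observed_mean, random_means):
    """Assumed form: fraction of random mean densities above the observed mean."""
    exceed = sum(1 for r in random_means if r > observed_mean)
    return exceed / len(random_means)


# Toy metanode: a path a - b - c with made-up similarity values M_j.
nodes = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c")]
similarity = {"a": 0.8, "b": 0.2, "c": 0.4}
scores = density_scores(nodes, edges, similarity, alpha=0.1, beta=math.log(2))
print(scores)
```

With β = ln 2 each additional hop halves a gene's contribution, so gene a, which carries the highest similarity itself, ends up with the largest density in this toy network.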

These four steps can be carried out for multiple testing by using multiple metanodes and multiple target GO terms. In this case, the p-values are corrected using FDR methods (79). Specifically, $ \text{FDR} = p \times m / k $, where m is the total number of GO terms tested and k is the rank of the GO term under consideration. GOTEA also has an option to identify representative GO terms from all of its discoveries, based upon approaches that identify the most informative GO term (84).
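As a worked illustration of the correction FDR = p × m / k, the sketch below ranks a list of p-values and rescales each one. Ranking in ascending order of p-value and capping the result at 1 are assumptions that follow common practice; the function name `fdr_adjust` is hypothetical.

```python
def fdr_adjust(p_values):
    """Return p * m / k for each p-value, with k its ascending rank (1-based).

    Assumptions: ranks are assigned in ascending order of p-value and
    the adjusted value is capped at 1.0.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda idx: p_values[idx])
    adjusted = [0.0] * m
    for rank, idx in enumerate(order, start=1):
        adjusted[idx] = min(1.0, p_values[idx] * m / rank)
    return adjusted


# Three GO terms tested (m = 3) with made-up p-values.
print(fdr_adjust([0.01, 0.04, 0.03]))
```

In this example the smallest p-value has rank 1 and is multiplied by 3, while the largest has rank 3 and is left unchanged.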

1.4 Network Module Enrichment Analysis

NMEA is implemented in a manner similar to GOTEA. Where GOTEA uses GO term similarities, NMEA uses p-values from t-tests on the expression values of two phenotypes.

Step 1: Fetch the expression profile of each gene in a given module (i.e., metanode, denoted M in the following context) from formatted user input. The input should include an adequate number of samples with comparable phenotypes (e.g., normal and disease).

Step 2: A vector $ D^{M} $ of density scores for the genes in M is computed, with the element of $ D^{M} $ for the ith gene denoted $ D_i $. $ D_i $ is defined as:

$$ D_i = \sum_{j \in M} \log_2 \left[ \left( \frac{\alpha}{M_j} \right) \Theta(\alpha - M_j) + \Theta(M_j - \alpha) \right] e^{-\beta d_{ij}}, $$

where the step function,

$$ \Theta(x - y) = \begin{cases} 1 & x \geqslant y \\ 0 & x < y, \end{cases} $$

ensures that $ D_i \geqslant 0 $. $ M_j $ is the p-value from a two-tailed t-test of differential expression between two phenotypes (for example, normal and disease). The parameters α and β are used to control the sensitivity and selectivity of the density as described in the previous section.

The density score is used to evaluate the impact of other genes in M on the ith gene, according to both the p-value calculated by the t-test (an indicator of differential expression) and their topological distances to the ith gene.

Step 3: Another vector of density scores, $ D^{NM} $, is computed by randomly shuffling the phenotypes to obtain a representative sampling of the background distribution.

Step 4: Statistical significance for rejecting the null hypothesis is determined by a permutation test. For statistical robustness, step 3 is repeated n times. The number of times the average density score obtained with shuffled phenotypes is found to be larger than the average density score of genes in M is counted after n iterations and used to compute the final p-value.
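The four NMEA steps can be sketched in Python in the same spirit. Everything here is illustrative: the function names are hypothetical, the per-gene test is supplied by the caller (in practice a two-tailed t-test, for example `scipy.stats.ttest_ind`), and `toy_p_value` is a made-up stand-in that is not a real t-test. The final p-value is again assumed to be the fraction of shuffles whose mean density exceeds the observed one, and per-gene p-values are assumed to be strictly positive.

```python
import math
import random


def nmea_term(p_value, alpha):
    """Bracketed log2 term: log2(alpha / p) if p < alpha, 0 if p > alpha."""
    theta = lambda x, y: 1 if x >= y else 0
    return math.log2((alpha / p_value) * theta(alpha, p_value) + theta(p_value, alpha))


def mean_density(p_values, dist, alpha, beta):
    """Average of D_i over the genes of the module."""
    genes = list(p_values)
    scores = [
        sum(nmea_term(p_values[j], alpha) * math.exp(-beta * dist[i][j]) for j in genes)
        for i in genes
    ]
    return sum(scores) / len(scores)


def nmea_p_value(expression, labels, dist, p_value_fn, alpha, beta, n_iter=200, seed=0):
    """Permutation test: shuffle phenotype labels and compare mean densities."""
    rng = random.Random(seed)

    def score(current_labels):
        pvals = {g: p_value_fn(v, current_labels) for g, v in expression.items()}
        return mean_density(pvals, dist, alpha, beta)

    observed = score(labels)
    exceed = 0
    for _ in range(n_iter):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        if score(shuffled) > observed:
            exceed += 1
    return exceed / n_iter


def toy_p_value(values, labels):
    """Stand-in for a t-test (NOT a real test): shrinks as group means diverge."""
    a = [v for v, lab in zip(values, labels) if lab == "normal"]
    b = [v for v, lab in zip(values, labels) if lab == "disease"]
    return 1.0 / (1.0 + abs(sum(a) / len(a) - sum(b) / len(b)))


# Made-up module of two adjacent genes measured in four samples.
expression = {"g1": [1.0, 1.2, 3.0, 3.3], "g2": [2.0, 2.1, 2.0, 2.2]}
labels = ["normal", "normal", "disease", "disease"]
dist = {"g1": {"g1": 0, "g2": 1}, "g2": {"g1": 1, "g2": 0}}
p = nmea_p_value(expression, labels, dist, toy_p_value, alpha=0.5, beta=1.0)
print(p)
```

Passing the test as a function keeps the sketch free of third-party dependencies and makes the shuffling logic, which is the part specific to NMEA, easy to follow.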

When applying NMEA to multiple metanodes, the p-value must be corrected by FDR in a manner similar to that described above for GOTEA. In this case, $ \text{FDR} = p \times m / k $ as before, but m is the total number of metanodes and k is the rank of the metanode under consideration.

About this protocol

Cite this protocol

Hu, Z. (2013). Analysis Strategy of Protein–Protein Interaction Networks. In: Mamitsuka, H., DeLisi, C., Kanehisa, M. (eds) Data Mining for Systems Biology. Methods in Molecular Biology, vol 939. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-107-3_11

DOI: https://doi.org/10.1007/978-1-62703-107-3_11
Published: 08 September 2012
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-62703-106-6
Online ISBN: 978-1-62703-107-3
eBook Packages: Springer Protocols
