Predicting Node Characteristics from Molecular Networks
Abstract
A large number of genome-scale networks, including protein–protein and genetic interaction networks, are now available for several organisms. In parallel, many studies have focused on analyzing, characterizing, and modeling these networks. Beyond investigating the topological characteristics such as degree distribution, clustering coefficient, and average shortest-path distance, another area of particular interest is the prediction of nodes (genes) with a given characteristic (labels) – for example prediction of genes that cause a particular phenotype or have a given function. In this chapter, we describe methods and algorithms for predicting node labels from network-based datasets with an emphasis on label propagation algorithms (LPAs) and their relation to local neighborhood methods.
Key words
Functional linkage networks, Gene function prediction, Label propagation

1 Introduction
This table summarizes some examples of widely used interaction networks whose edges are predictive of cofunctionality between the connected genes or proteins
Network | Experimental methodology | Description
---|---|---
Cocomplex network | Copurification (affinity capture) | An interaction is inferred between members of subunits in a purified protein complex using affinity purification and one or more additional fractionation steps
Protein interaction networks | Protein-fragment complementation assay (PCA) | Two proteins of interest are fused to complementary fragments of a reporter protein; when the proteins of interest interact, the reporter protein fluoresces
Protein interaction networks | Two-hybrid | A bait protein is expressed as a DNA-binding domain (DBD) fusion and a prey protein as a transcriptional activation domain (TAD) fusion; interaction is measured by reporter gene activation
Genetic interaction networks | Synthetic genetic array (SGA) | A genetic interaction between two genes is inferred when the mutation (or deletion) of both genes results in a phenotype that is unexpected given the single mutations of each gene. For example, in yeast, a synthetic lethal genetic interaction is inferred whenever a double mutation of two nonessential genes results in lethality
Genetic interaction networks | Diploid-based synthetic lethality analysis on microarrays (dSLAM) |
Colocalization networks | Green fluorescent protein (GFP) fusion | An interaction is inferred from colocalization of two proteins in the cell, including codependent association of proteins with promoter DNA in chromatin immunoprecipitation experiments
Coexpression networks | Microarray | Quantification of changes in the expression level of genes by measuring the abundance of their corresponding mRNA in different conditions, based on hybridization of labeled mRNA to known probes
Coexpression networks | Serial analysis of gene expression (SAGE) | Quantification of transcript abundance by cloning and sequencing of the extracted mRNA
Transcriptional regulatory networks | ChIP-on-chip | Combines chromatin immunoprecipitation (ChIP) with a microarray (chip) to determine the binding sites of DNA-binding proteins on a genome-wide basis
Coinheritance networks | Shared phylogenetic profiles | Coinheritance networks are derived from phylogenetic profiles that summarize the presence/absence of homologous proteins in various species
In the case of function prediction, positive gene labels can be obtained from databases such as Gene Ontology (GO) (6), KEGG (7), and MIPS (8). These databases provide both a controlled vocabulary for describing categories of gene function and curated lists of genes annotated to these functions. For disease prioritization, positive gene lists can be obtained from the Human Phenotype Ontology (9) and the OMIM database (10), which provide genes associated with various phenotypic abnormalities and genetic diseases.
Here, we focus on algorithms that solve a binary classification problem. If necessary, it is easy to generalize these approaches to predict multiple functions/diseases per gene. Formally, given a network over all entities (genes or proteins), and a set of binary labels where positives are labeled as +1, the goal is to predict which unlabeled nodes are likely to be positives. The “guilt-by-association” approach only considers direct neighbors of nodes when making predictions (1). However, often indirect interactions and global network topology can improve the prediction performance (11, 12, 13). As such, a large number of models and algorithms have been proposed that consider both direct and indirect interactions (12, 13, 14, 15, 16, 17, 18). For example, label propagation algorithms (LPAs) assign continuous scores (predicted labels) to all nodes in the network while considering both local and global network topology. In addition to offering a principled way of incorporating indirect interactions, the complexity of LPAs scales with the number of edges in the network and thus LPAs are computationally feasible for very large networks – empirically, less than 0.1% of total possible edges are often observed in real-world networks.
In this chapter, we focus on describing commonly used algorithms for predicting gene labels (gene function or involvement in a disease) from network-based data. In particular, as done in (19), we categorize such algorithms into two broad categories: those that use the node’s local neighborhood and those that use global network topology when predicting node labels. For the latter category, we mainly focus on LPAs. As we show, LPAs are closely related to local neighborhood approaches; we describe an approximation framework for LPAs that allows us to directly derive the local neighborhood methods.
The rest of this chapter is organized as follows: In Subheading 2.1, we describe how to construct networks from various high-throughput data sources, with the purpose of using these networks for prioritizing genes or predicting gene function; in Subheading 2.2, we review algorithms for predicting node labels from a network; in Subheading 2.3, we describe several methods for constructing networks from multiple high-throughput data sources; and in Subheading 2.4, we describe several online resources for gene prioritization and predicting gene and protein function.
2 Methods
The task of gene function prediction requires three components: a network, which can be constructed from one or many different high-throughput data sources; a set of positive genes (see Note 1); and an algorithm for making predictions from the network. In the next section, we describe how to construct individual networks that represent the evidence for cofunctionality implied by individual high-throughput datasets.
2.1 Constructing Networks for Predicting Gene Function
To predict gene function, we assume that we are provided with a network whose nodes correspond to genes and whose edges represent the strength of the evidence for cofunctionality between the connected genes. These networks are called functional linkage networks (FLNs). There are several different types of FLNs that support good function prediction performance including those whose edge weights represent coexpression, genetic interaction, protein interaction, coinheritance, colocalization, or shared domain composition of the connected genes. A number of studies have demonstrated a drastic improvement in the accuracy of function prediction when multiple data sources are combined (13, 17, 18, 20, 21, 22, 23, 24). Below, we first review how to construct individual networks from high-throughput data sources; in Subheading 2.3, we describe how to combine multiple networks into one composite network for input to a label prediction algorithm.
We broadly classify networks into those that are derived from interaction-based data and those that are derived from profile-based data. The former includes networks derived from protein and genetic interaction datasets, and the latter includes those derived from gene expression profiles or patterns of protein localization.
For profile-based datasets, such as gene expression, the edges in the corresponding network are constructed from pairwise similarity scores. Determining the appropriate similarity metric for a given data type is an active area of research. For example, much work has been carried out on constructing coexpression networks (25, 26). Here, we present a simple method that performs well on a variety of data sources (as used in refs. 17, 27). In particular, for many types of profile-based data, we have found that the Pearson Correlation Coefficient (PCC) results in networks with comparable or better performance than other similarity metrics. For binary profile-based data such as protein localization, prior to taking the PCC, we use the following background correction that significantly improves the resulting FLNs: given a binary matrix B with n rows (genes) and d columns (features), we set all 1’s in column i to \( -\mathrm{log}({p}_{i}^{(1)})\) and all 0’s to \( \mathrm{log}(1-{p}_{i}^{(1)})\), where \( {p}_{i}^{(1)}\) is the probability of observing a 1 in column i. In this way, genes that share “rare” features will have higher correlation than those that share “common” features (see Note 2).
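As a concrete illustration, the following Python sketch (assuming NumPy; the function name `background_corrected_pcc` and the toy matrix are invented for this example) computes the background correction and PCC described above for a binary profile matrix:

```python
import numpy as np

def background_corrected_pcc(B):
    """Background correction followed by Pearson correlation, as described above,
    for a binary (genes x features) matrix B; returns a symmetric (genes x genes)
    similarity matrix."""
    B = np.asarray(B, dtype=float)
    p1 = np.clip(B.mean(axis=0), 1e-12, 1 - 1e-12)      # P(feature = 1), clipped to avoid log(0)
    X = np.where(B == 1, -np.log(p1), np.log(1 - p1))   # 1 -> -log(p), 0 -> log(1 - p)
    W = np.corrcoef(X)                                   # PCC between gene profiles (rows)
    np.fill_diagonal(W, 0)                               # no self-edges
    return W

# Toy example: 4 genes, 5 binary localization features
B = np.array([[1, 0, 0, 1, 0],
              [1, 0, 0, 1, 0],
              [0, 1, 1, 0, 0],
              [0, 1, 1, 0, 1]])
print(background_corrected_pcc(B))
```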
For interaction-based data, we can always use the binary interactions alone as the final network. However, several studies have observed a drastic improvement in performance when constructing FLNs using the PCC between the interaction partners (interaction profiles) of genes or proteins (28, 29, 30). Calculating the PCC on frequency-corrected data, as described above, further improves performance (28).
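The same idea can be sketched for interaction-based data by treating each gene's row of the (binary, symmetric) adjacency matrix as its interaction profile; the toy adjacency matrix below is invented for illustration:

```python
import numpy as np

# A is an invented binary, symmetric adjacency matrix (genes x genes) from an
# interaction screen; each gene's interaction profile is its row of A.
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)

p1 = np.clip(A.mean(axis=0), 1e-12, 1 - 1e-12)       # interaction frequency per gene
X = np.where(A == 1, -np.log(p1), np.log(1 - p1))    # frequency correction, as above
W = np.corrcoef(X)                                   # FLN: PCC between interaction profiles
np.fill_diagonal(W, 0)
print(W)
```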
2.2 Predicting Node Labels from Networks
In this section, we review and discuss algorithms for predicting binary labels using networks. In particular, we focus on LPAs and their relationship to simpler direct and indirect neighbor methods.
Notation
Here, we assume we are given a network, represented as a symmetric matrix W. Assuming there are n nodes (genes) in the network, then W is an n × n matrix with entries w_{ji} = w_{ij} ≥ 0 representing the weighted edge between nodes i and j. For example, in the case of protein–protein interaction, w_{ij} can be binary (indicating the absence or presence of a physical interaction) or weighted according to the score (e.g., −log of p-value) of the interaction between proteins i and j. We represent the labels using a vector \( \overrightarrow{y}\in {\{0,1\}}^{n}\), where positive nodes (e.g., those involved in a given function of interest) are labeled as +1 and unlabeled nodes are labeled as 0. Ideally, in addition to the positives, some nodes should serve as negative examples and thus would be assigned a negative label (y_{i} = −1); however, such negative examples are rarely available – here, we assume we only have positive and unlabeled nodes.
Local Neighborhood Approaches
In guilt-by-association, genes are assigned a score based on their direct connections to the positive genes. For example, in (1), a gene is predicted to have the same function as those of the majority of its direct neighbors. More recent studies have extended guilt-by-association to include second-degree (indirect) interactions (13) or consider a small neighborhood around the positively labeled genes (31) when assigning scores to the unlabeled genes. Below, we present a single general framework that serves as the basis for deriving existing local neighborhood methods.
In the context of the above representation, several existing direct and indirect neighbor-based methods define node scores as \( {f}_{i}={\displaystyle \sum _{j=1}^{n}{p}_{ij}{y}_{j}}+{\displaystyle \sum _{j=1}^{n}{[{\widehat{P}}^{2}]}_{ij}{y}_{j}}\) (1), where \( P={D}^{-1}W\) is the row-normalized (Markov transition) matrix with entries \( {p}_{ij}\), D is the diagonal matrix of weighted node degrees \( {d}_{i}={\displaystyle \sum _{j}{w}_{ij}}\), and \( {\widehat{P}}^{2}\) is a modified version of \( {P}^{2}\) with some entries set to zero (or otherwise modified). For example, in the BioPIXIE graph search, the second summation only includes the top k genes that had the highest direct-neighbor score; if node j is not among the top-scoring direct neighbors of the positive genes, then \( {[{\widehat{P}}^{2}]}_{ij}=0\).
In this section, we have described a simple framework for assigning node scores based on the labels of other nodes in their local neighborhood. In the next section, we show that the local neighborhood approaches, as presented in Eq. 1, can be thought of as an approximation to LPAs.
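A minimal Python sketch of the scoring scheme in Eq. 1 is shown below (assuming NumPy; the function name is illustrative, and restricting the second-order paths to the top-k intermediate genes is one plausible reading of the BioPIXIE-style heuristic rather than its exact implementation):

```python
import numpy as np

def neighborhood_scores(W, y, k=20):
    """Local neighborhood scoring in the spirit of Eq. 1. The second-order term is
    restricted to paths through the top-k genes by direct-neighbor score (one plausible
    reading of the BioPIXIE-style restriction)."""
    W = np.asarray(W, dtype=float)
    y = np.asarray(y, dtype=float)
    d = W.sum(axis=1)
    d[d == 0] = 1.0                            # avoid division by zero for isolated nodes
    P = W / d[:, None]                         # row-normalized (Markov transition) matrix
    direct = P @ y                             # first summation: direct-neighbor scores
    top_k = np.argsort(direct)[::-1][:k]       # genes with the highest direct-neighbor score
    M = np.zeros((len(y), len(y)))
    M[top_k, top_k] = 1.0                      # selector for allowed intermediate genes
    P2_hat = P @ M @ P                         # restricted second-order transition matrix
    return direct + P2_hat @ y                 # Eq. 1

# Toy usage: 5-node path graph, node 0 is the only labeled positive
W = np.diag(np.ones(4), 1)
W = W + W.T
y = np.array([1, 0, 0, 0, 0], dtype=float)
print(neighborhood_scores(W, y, k=2))
```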
Label Propagation Algorithms
Several variants of LPA have been used in gene function prediction including the works of (16, 34, 35). Here, we review an LPA derived from the Gaussian fields algorithm as it allows us to analytically calculate the final node scores (reviewed in ref. 36) and it has been shown to produce state-of-the-art performance in gene function prediction in yeast and mouse (17, 23).
The basic assumption of LPA is that the score of node i at iteration (or time) r can be computed from a weighted combination of the scores of its neighbors at the previous iteration and its initial label y_{i}. In its simplest form, we can state this principle as \( {f}_{i}^{(r)}=\lambda {\displaystyle \sum _{j=1}^{n}{w}_{ij}{f}_{j}^{(r-1)}}+(1-\lambda ){y}_{i}\) (2), where \( \lambda \) is a constant such that \( 0<\lambda <1\), thus making \( {f}_{i}^{(r)}\) a convex combination of the scores of its neighbors and its initial label.
However, in the above formulation for node scores \( {f}_{i}^{(r)}\), nodes with high weighted degree that have positive neighbors will end up influencing the scores of many nodes in their local neighborhood; it is standard practice to correct for this effect by normalizing W. In particular, we can normalize W in two different ways: (1) by dividing each row by its row sum, thus using the Markov transition matrix in place of W, \( P={D}^{-1}W\), resulting in the expression \( {f}_{i}^{(r)}=\lambda {\displaystyle \sum _{j=1}^{n}\frac{{w}_{ij}}{{d}_{i}}{f}_{j}^{(r-1)}}+(1-\lambda ){y}_{i}=\lambda {\displaystyle \sum _{j=1}^{n}{p}_{ij}{f}_{j}^{(r-1)}}+(1-\lambda ){y}_{i}\), or (2) by performing a symmetric normalization \( \dot{W}={D}^{-1/2}W{D}^{-1/2}\), resulting in the expression \( {f}_{i}^{(r)}=\lambda {\displaystyle \sum _{j=1}^{n}\frac{{w}_{ij}}{\sqrt{{d}_{i}{d}_{j}}}{f}_{j}^{(r-1)}}+(1-\lambda ){y}_{i}\). These two choices of normalization result in slightly different node scores. We first pursue the asymmetric normalization, thus using the Markov transition matrix P, as it will allow us to directly compare LPA to the local neighborhood method presented in Eq. 1.
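The following Python sketch (assuming NumPy; names are illustrative) implements the iterative update with the symmetric normalization:

```python
import numpy as np

def label_propagation(W, y, lam=0.5, tol=1e-8, max_iter=1000):
    """Iterative LPA update with symmetric normalization of W."""
    W = np.asarray(W, dtype=float)
    y = np.asarray(y, dtype=float)
    d = W.sum(axis=1)
    d[d == 0] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(d)
    W_dot = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]   # D^{-1/2} W D^{-1/2}
    f = y.copy()
    for _ in range(max_iter):
        f_new = lam * (W_dot @ f) + (1 - lam) * y            # propagate, then pull back to labels
        if np.max(np.abs(f_new - f)) < tol:                  # stop when scores have converged
            return f_new
        f = f_new
    return f

# Toy usage: 5-node path graph, node 0 is the only positive
W = np.diag(np.ones(4), 1)
W = W + W.T
y = np.array([1, 0, 0, 0, 0], dtype=float)
print(label_propagation(W, y))
```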
Iterating this update with the normalized matrix P to convergence yields the closed-form solution \( \overrightarrow{f}=(1-\lambda ){(I-\lambda P)}^{-1}\overrightarrow{y}\), which, by Taylor's matrix expansion (see Note 3), can be written as \( \overrightarrow{f}=(1-\lambda ){\displaystyle \sum _{r=0}^{\infty }{\lambda }^{r}{P}^{r}}\overrightarrow{y}\) (3). This representation clarifies the connection between LPA and the local neighborhood method as presented in Eq. 1: truncating the series after the second-order term recovers a weighted combination of direct and second-order neighbor scores. However, there is a major difference: in LPA, because we multiply \( {P}^{r}\) by \( {\lambda }^{r}<1\), propagation from the positive nodes declines rapidly with paths of increasing length, whereas there are no such guarantees when using Eq. 1. This fact offers an explanation for the rapid decline in the performance of neighborhood methods as the path length increases beyond two (e.g., as observed in (11)).
Pursuing instead the symmetric normalization, the corresponding solution is \( \overrightarrow{f}=(1-\lambda ){(I-\lambda \dot{W})}^{-1}\overrightarrow{y}\) (4). Again, using Taylor's theorem, we can correspondingly derive the series form \( \overrightarrow{f}=(1-\lambda ){\displaystyle \sum _{r=0}^{\infty }{\lambda }^{r}{\dot{W}}^{r}}\overrightarrow{y}\), since we also have that the largest magnitude eigenvalue of \( \lambda \dot{W}\) is again less than 1 (37).
The solution to LPA depends on a parameter \( 0<\lambda <1\). Note that larger values of \( \lambda \) allow for more influence from paths of increasing length; in contrast, with small \( \lambda \), only short paths (e.g., those with path lengths of 2 or 3) influence the final solution. In practice, \( \lambda \) can be set using cross-validation (see Note 4).
In this section, we presented a formulation of LPA that is based on iterative propagation of scores to direct neighbors; it is also possible to derive the solution to LPA by solving a convex optimization problem (33), as we describe in Note 5. In particular, we have derived LPA for two different normalizations of the matrix W: using the asymmetric matrix P or the symmetrically normalized matrix \( \dot{W}\) (see Note 6). This formulation allows us to generalize several other LPAs: for example, the RankProp algorithm (34) uses the asymmetrically normalized P (as in Eq. 3). In contrast, the FunctionalFlow algorithm (16) does not explicitly set a decay parameter \( \lambda \) or downweight the influence of hubs by normalization; these criteria are implicitly enforced by always propagating to the shortest-distance neighbors first and by subtracting out-flow from in-flow. We have also shown that approximating the LPA solution using Taylor's matrix expansion results in an algorithm very similar to the local neighborhood method.
In addition to LPAs, discrete Markov Random Fields (DMRFs) present another class of methods that consider the global network topology when making predictions. In particular, DMRFs can be viewed as a discrete version of LPA where the predicted scores are constrained to be binary: i.e., \( \overrightarrow{f}\in {\{0,1\}}^{n}\). However, this integer constraint makes exact inference in DMRFs intractable. Previous studies have nonetheless used DMRFs for predicting gene function, either using simulated annealing (38) or approximating the solution using coordinate descent (39); in either case, solving for the node scores in a DMRF typically requires considerable computational effort. We note that several studies have found that methods based on discrete MRFs do not perform any better than LPAs or neighborhood-based methods (16, 19).
2.3 Constructing a Composite Network from Multiple Data Sources
As we discussed in Subheading 1, previous studies have shown that combining multiple high-throughput data sources into a single functional linkage network (FLN) results in better prediction performance. There are two broad categories of methods for constructing FLNs: (1) probabilistic FLNs, where edges between two genes represent the probability of their cofunctionality (functional coupling), and (2) FLNs constructed by weighted summation of the underlying networks, each constructed from a different data source. Probabilistic FLNs are commonly constructed using Bayesian networks; in fact, most existing methods are similar to Naïve Bayes (13, 40, 41, 42, 43). In the following, we describe the second approach for constructing FLNs. For a more extensive review on the subject, we refer the reader to (44).
A simple and widely used approach for combining multiple networks constructs a composite network, denoted by a weighted matrix W^{ *}, as the average of the individual networks: e.g., \( {W}^{*}=\frac{1}{D}{\displaystyle \sum _{d=1}^{D}{W}_{d}}\), where we have D networks represented as W_{d} (24). We can extend this approach by taking a weighted sum: \( {W}^{*}={\displaystyle \sum _{d=1}^{D}{\alpha }_{d}{W}_{d}}\), where each network W_{d} is weighted by the coefficient \( {\alpha }_{d}\) (17, 18, 21, 22). For example, we can set these coefficients to downweight redundant networks and ignore irrelevant ones. The coefficients \( {\alpha }_{d}\) are often set to optimize the performance of the composite network in predicting a single gene function or a group of gene functions. As an example, (17) used linear regression to determine the coefficients \( \overrightarrow{\alpha }\) (see Note 7).
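A hedged Python sketch of these two combination strategies is given below (assuming NumPy; `combine_networks` and `fit_network_weights` are illustrative names, and the regression only illustrates the structure of the idea; the actual procedure in (17) uses a bias-adjusted target and constrained regression):

```python
import numpy as np

def combine_networks(W_list, alpha=None):
    """Combine individual FLNs into one composite network, either by simple
    averaging (alpha=None) or by a weighted sum with coefficients alpha."""
    W_stack = np.stack([np.asarray(W, dtype=float) for W in W_list])
    if alpha is None:
        return W_stack.mean(axis=0)                    # W* = (1/D) sum_d W_d
    alpha = np.asarray(alpha, dtype=float)
    return np.tensordot(alpha, W_stack, axes=1)        # W* = sum_d alpha_d W_d

def fit_network_weights(W_list, y):
    """Rough sketch of setting alpha by regressing edge weights onto a
    label-derived target matrix (positive-positive pairs set to 1)."""
    y = np.asarray(y, dtype=float)
    T = np.outer(y, y)                                 # target co-annotation matrix
    X = np.stack([np.asarray(W, dtype=float).ravel() for W in W_list], axis=1)
    alpha, *_ = np.linalg.lstsq(X, T.ravel(), rcond=None)
    return np.clip(alpha, 0, None)                     # keep network weights nonnegative
```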
To summarize, combining multiple networks into a single graph typically improves label propagation performance. In this section, we briefly described two simple and widely used methods for combining multiple networks. In the next section, we list online resources for prioritizing genes and predicting gene function from multiple heterogeneous data sources.
2.4 Online Resources for Predicting Gene Function
Summary of Web-based resources for on-demand prediction of gene function from FLNs
Resource | Organisms | Algorithm | Flexibility in choosing FLNs
---|---|---|---
GeneMANIA (49) | Yeast, mouse, human, fly, worm, Arabidopsis thaliana | Network integration: choice of (a) linear regression, (b) averaging, or (c) averaging while accounting for redundancy. Label prediction: LPA on the combined network | Allows users to choose any combination of the available FLNs (by type or based on a publication) and the network integration method. Additionally, users can upload their own networks
FunCoup (50) | Yeast, mouse, human, fly, worm, rat, Arabidopsis thaliana, Ciona intestinalis | Network integration: Naïve Bayes. Label prediction: Naïve Bayes classifiers, which make predictions based on direct neighbors in the combined network | Allows users to choose a network type (for example, coexpression) but offers no flexibility in using specific networks from a given publication
STRING (51) | 630 organisms | Network integration: individual scoring of FLNs prior to combination using a noisy-OR model. Label prediction: considers direct neighbors only | All predictions are made from a fixed network
BioPIXIE (13) | Yeast | Network integration: Naïve Bayes. Label prediction: direct and second-order neighbor heuristic (BioPIXIE graph search) | All predictions are made from a fixed network
MouseNet (52) | Mouse | Network integration: Naïve Bayes. Label prediction: direct and second-order neighbor heuristic (BioPIXIE graph search) | All predictions are made from a fixed network
HEFalMp (43) | Human | Network integration: regularized Naïve Bayes. Label prediction: direct and second-order neighbor heuristic | All predictions are made from a fixed network
3 Conclusions
In this chapter, we describe how to construct networks from high-throughput data sources and how to combine multiple networks into a composite functional linkage network. In addition, we provide a review of algorithms for predicting node labels from networks; such algorithms are often applied to predict genes that are involved in a given function or result in a specific phenotype. In particular, we focus on describing label propagation algorithms and their relation to simpler neighborhood-based methods. As we show, these two types of algorithms are closely related. This observation may explain why several studies have found that neighborhood methods result in similar performance to LPAs (16, 45) – noting that the average shortest-path distance in molecular interaction networks often follows the same distribution as in small-world networks; in other words, most nodes are connected by a small number of steps (e.g., 3 or 4). Finally, we summarize several online resources for gene prioritization and predicting gene function.
4 Notes
- 1.
Positive Labels for Gene Function Prediction. As mentioned in the Introduction, the set of positive labels can be derived from online databases such as Gene Ontology (GO) (6), MIPS (8), and KEGG (7). GO is one of the most widely used annotation databases, covering a large number of organisms. GO defines three hierarchies for describing properties of gene products: Biological Process, Molecular Function, and Cellular Component. The categories defined in GO range from very broad properties that comprise hundreds of genes (e.g., biological regulation) to very specific properties that comprise only a few genes (e.g., positive regulation of mitosis). Algorithms for predicting gene function are often tested on categories that have between 10 and 300 annotations; these categories have enough positives for training without being too broad (23). For a discussion on choosing informative GO categories, see ref. 46.
- 2.
Sparsification of FLNs. To construct an FLN from profile-based data, we can use a similarity metric such as the PCC; for example, in a coexpression network, the edge between genes i and j is the PCC between their expression profiles. However, since the PCC is often nonzero for many pairs of genes, we can often improve performance (in both accuracy and computation time) by sparsifying PCC-derived networks. A common way to do this is to keep the top m interacting partners for each gene and set the rest to zero; m can range from 50 to 100, as done in (17, 21, 23). A second approach is to set a threshold value t, where all PCCs smaller than t are set to zero (25).
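A short Python sketch of the top-m sparsification (assuming NumPy; the function name is illustrative):

```python
import numpy as np

def sparsify_top_m(W, m=50):
    """Keep only the top m strongest edges per gene and re-symmetrize
    (m = 50-100 is the range suggested in the note above)."""
    W = np.asarray(W, dtype=float).copy()
    np.fill_diagonal(W, 0)
    keep = np.zeros_like(W, dtype=bool)
    for i in range(W.shape[0]):
        top = np.argsort(W[i])[::-1][:m]    # indices of gene i's strongest partners
        keep[i, top] = True
    keep = keep | keep.T                    # keep an edge if either endpoint ranks it highly
    return np.where(keep, W, 0.0)
```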
- 3.
Taylor's Matrix Expansion Theorem. To be able to use the matrix expansion theorem, we need to ensure that the largest magnitude eigenvalue of \( \lambda P\) is less than 1: \( \underset{i}{\mathrm{max}}\left|{\sigma }_{i}\right|<1\). We use the Perron-Frobenius Theorem (PFT) to show that this condition holds: from the PFT we have that \( \underset{i}{\mathrm{min}}{\displaystyle \sum _{j=1}^{n}\lambda {p}_{ij}}\le \underset{i}{\mathrm{max}}\left|{\sigma }_{i}\right|\le \underset{i}{\mathrm{max}}{\displaystyle \sum _{j=1}^{n}\lambda {p}_{ij}}\). Note that \( {\displaystyle \sum _{j=1}^{n}{p}_{ij}=1}\) for all i; since we multiply by the constant \( 0<\lambda <1\), the maximum row sum of \( \lambda P\), and hence its largest magnitude eigenvalue, is less than 1.
- 4.
Setting the Parameter \( \lambda \) in LPA. We can set the parameter \( \lambda \) using cross-validation. To do so, we investigate the performance of various settings of \( \lambda \) on a validation set (a fraction of the training data held out from training). In practice, we have found that setting \( \lambda \approx 0.5\) (when solving LPA in Eq. 4) results in good performance in a wide variety of prediction tasks.
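A minimal Python sketch of this cross-validation procedure is given below (assuming NumPy; the helper names and the mean-rank criterion are illustrative choices, and AUC or precision at k can be substituted):

```python
import numpy as np

def lpa_closed_form(W, y, lam):
    """Closed-form LPA scores (Eq. 4): f = (1 - lam)(I - lam * W_dot)^{-1} y."""
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)
    d[d == 0] = 1.0
    W_dot = W / np.sqrt(np.outer(d, d))
    return (1 - lam) * np.linalg.solve(np.eye(len(y)) - lam * W_dot, y)

def choose_lambda(W, y, lambdas=(0.1, 0.25, 0.5, 0.75, 0.9), n_folds=3, seed=0):
    """Pick lambda by hiding a fold of known positives, propagating from the rest,
    and keeping the lambda that ranks the held-out positives highest."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    folds = np.array_split(rng.permutation(np.flatnonzero(y)), n_folds)
    best_lam, best_score = None, -np.inf
    for lam in lambdas:
        fold_scores = []
        for held_out in folds:
            if len(held_out) == 0:
                continue
            y_train = y.copy()
            y_train[held_out] = 0                        # hide this fold's positives
            f = lpa_closed_form(W, y_train, lam)
            ranks = np.argsort(np.argsort(-f))           # rank 0 = highest score
            fold_scores.append(-ranks[held_out].mean())  # higher (less negative) is better
        if fold_scores and np.mean(fold_scores) > best_score:
            best_lam, best_score = lam, np.mean(fold_scores)
    return best_lam
```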
- 5.
LPA as the Solution to a Convex Optimization Problem. LPA as proposed by (33, 47) is derived using the following objective function: $$ \mathrm{arg}\underset{\overrightarrow{f}}{\mathrm{min}}\;c{\displaystyle \sum _{i=1}^{n}{({f}_{i}-{y}_{i})}^{2}}+\frac{1}{2}{\displaystyle \sum _{i,j=1}^{n}{w}_{ij}{\left(\frac{{f}_{i}}{\sqrt{{d}_{i}}}-\frac{{f}_{j}}{\sqrt{{d}_{j}}}\right)}^{2}}, $$ (5) where c > 0 is a constant. This equation can be written in matrix form as \( \mathrm{arg}\underset{\overrightarrow{f}}{\mathrm{min}}\;c{(\overrightarrow{f}-\overrightarrow{y})}^{T}(\overrightarrow{f}-\overrightarrow{y})+{\overrightarrow{f}}^{T}\dot{L}\overrightarrow{f}\), with \( \dot{L}=I-\dot{W}\) known as the normalized graph Laplacian (recall that \( \dot{W}={D}^{-1/2}W{D}^{-1/2}\) is the symmetrically normalized weight matrix W). Differentiating and setting the derivative to zero, we get \( \overrightarrow{f}=c{\left((1+c)I-\dot{W}\right)}^{-1}\overrightarrow{y}\). Using \( \lambda =\frac{1}{1+c}\), we can write this as \( \overrightarrow{f}=(1-\lambda ){(I-\lambda \dot{W})}^{-1}\overrightarrow{y}\). Thus, this version of LPA uses the symmetrically normalized W that we used to derive Eq. 4.
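For completeness, the differentiation step can be written out as follows (a routine verification of the stated solution):

```latex
% Differentiating the matrix form of (5), J(f) = c (f - y)^T (f - y) + f^T \dot{L} f,
% and using the symmetry of \dot{L} = I - \dot{W}:
\nabla_{\vec{f}}\, J
  = 2c\,(\vec{f}-\vec{y}) + 2\dot{L}\vec{f} = 0
  \;\Longrightarrow\; \bigl((1+c)I - \dot{W}\bigr)\vec{f} = c\,\vec{y}
  \;\Longrightarrow\; \vec{f} = c\bigl((1+c)I - \dot{W}\bigr)^{-1}\vec{y}.
```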
- 6.
LPA with Symmetrically and Asymmetrically Normalized Weight Matrices. Empirically, we have observed better performance when using the symmetrically normalized \( \dot{W}\), and so we suggest using this form of LPA in practice. Furthermore, because \( \dot{W}\) is symmetric, we can solve Eq. 4 efficiently using the Conjugate Gradient method, which scales with the number of nonzero elements in \( \dot{W}\).
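A minimal sketch of this approach using SciPy's sparse conjugate gradient solver (the function name is illustrative):

```python
import numpy as np
from scipy.sparse import csr_matrix, diags, identity
from scipy.sparse.linalg import cg

def lpa_conjugate_gradient(W, y, lam=0.5):
    """Solve (I - lam * W_dot) f = (1 - lam) y with conjugate gradient, where
    W_dot = D^{-1/2} W D^{-1/2} and W is a sparse symmetric weight matrix."""
    W = csr_matrix(W)
    d = np.asarray(W.sum(axis=1)).ravel()
    d[d == 0] = 1.0
    D_inv_sqrt = diags(1.0 / np.sqrt(d))
    W_dot = D_inv_sqrt @ W @ D_inv_sqrt
    A = identity(W.shape[0], format="csr") - lam * W_dot   # system matrix (I - lam * W_dot)
    f, info = cg(A, (1 - lam) * np.asarray(y, dtype=float))
    if info != 0:
        raise RuntimeError("conjugate gradient did not converge")
    return f
```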
- 7.
Constructing a Composite Network by Averaging FLNs. Several previous studies have found that simple averaging of the underlying networks often results in composite networks with performance comparable to those constructed with more sophisticated methods, for example, those that assign network weights \( {\alpha }_{d}\) (24, 48). However, simple averaging suffers when many of the underlying networks are redundant (or represent similar information); for example, there are often many more coexpression networks than networks of other data types. One simple way to correct for this redundancy is to group networks by type and assign each type an equal weight: e.g., \( {W}^{*}=\frac{1}{3}{W}_{\mathrm{exp}}+\frac{1}{3}{W}_{gi}+\frac{1}{3}{W}_{pi},\) where W_{exp}, W_{gi}, and W_{pi} represent the average network of all coexpression, genetic interaction, and protein interaction networks, respectively.
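A short Python sketch of this type-grouped averaging (assuming NumPy; the dictionary structure is an illustrative choice):

```python
import numpy as np

def average_by_type(networks):
    """Average networks within each data type, then give each type equal weight.
    `networks` maps a type name (e.g. 'coexpression') to a list of (genes x genes) arrays."""
    type_means = [np.mean(np.stack([np.asarray(W, dtype=float) for W in W_list]), axis=0)
                  for W_list in networks.values()]
    return np.mean(np.stack(type_means), axis=0)   # equal weight per data type
```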
References
- 1.Marcotte, E.M., et al., Detecting protein function and protein-protein interactions from genome sequences. Science, 1999. 285(5428): p. 751–3.
- 2.Wu, X., et al., Network-based global inference of human disease genes. Mol Syst Biol, 2008. 4: p. 189.
- 3.Aerts, S., et al., Gene prioritization through genomic data fusion. Nat Biotechnol, 2006. 24(5): p. 537–44.
- 4.Sharan, R., I. Ulitsky, and R. Shamir, Network-based prediction of protein function. Mol Syst Biol, 2007. 3: p. 88.
- 5.Oti, M. and H.G. Brunner, The modular nature of genetic diseases. Clin Genet, 2007. 71(1): p. 1–11.
- 6.Ashburner, M., et al., Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet, 2000. 25(1): p. 25–9.
- 7.Ogata, H., et al., KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res, 1999. 27(1): p. 29–34.
- 8.Ruepp, A., et al., The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes. Nucleic Acids Res, 2004. 32(18): p. 5539–45.
- 9.Robinson, P.N., et al., The Human Phenotype Ontology: a tool for annotating and analyzing human hereditary disease. Am J Hum Genet, 2008. 83(5): p. 610–5.
- 10.Hamosh, A., et al., Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res, 2005. 33(Database issue): p. D514–7.
- 11.Chua, H.N., W.K. Sung, and L. Wong, Exploiting indirect neighbours and topological weight to predict protein function from protein-protein interactions. Bioinformatics, 2006. 22(13): p. 1623–30.
- 12.Zhou, X., M.C. Kao, and W.H. Wong, Transitive functional annotation by shortest-path analysis of gene expression data. Proc Natl Acad Sci USA, 2002. 99(20): p. 12783–8.
- 13.Myers, C.L., et al., Discovery of biological networks from diverse functional genomic data. Genome Biol, 2005. 6(13): p. R114.
- 14.Karaoz, E., et al., Protective role of melatonin and a combination of vitamin C and vitamin E on lung toxicity induced by chlorpyrifos-ethyl in rats. Exp Toxicol Pathol, 2002. 54(2): p. 97–108.
- 15.Deng, M., et al., Prediction of protein function using protein-protein interaction data. J Comput Biol, 2003. 10(6): p. 947–60.
- 16.Nabieva, E., et al., Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps. Bioinformatics, 2005. 21 Suppl 1: p. i302–10.
- 17.Mostafavi, S., et al., GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function. Genome Biol, 2008. 9 Suppl 1: p. S4.
- 18.Tsuda, K., H. Shin, and B. Scholkopf, Fast protein classification with multiple networks. Bioinformatics, 2005. 21 Suppl 2: p. ii59–65.
- 19.Murali, T.M., C.J. Wu, and S. Kasif, The art of gene function prediction. Nat Biotechnol, 2006. 24(12): p. 1474–5; author reply 1475–6.
- 20.Deng, M., T. Chen, and F. Sun, An integrated probabilistic model for functional prediction of proteins. J Comput Biol, 2004. 11(2–3): p. 463–75.
- 21.Mostafavi, S. and Q. Morris, Fast integration of heterogeneous data sources for predicting gene function with limited annotation. Bioinformatics, 2010.
- 22.Lanckriet, G.R., et al., A statistical framework for genomic data fusion. Bioinformatics, 2004. 20(16): p. 2626–35.
- 23.Pena-Castillo, L., et al., A critical assessment of Mus musculus gene function prediction using integrated genomic evidence. Genome Biol, 2008. 9 Suppl 1: p. S2.
- 24.Pavlidis, P., et al., Learning gene functional classifications from multiple data types. J Comput Biol, 2002. 9(2): p. 401–11.
- 25.Zhang, B. and S. Horvath, A general framework for weighted gene co-expression network analysis. Stat Appl Genet Mol Biol, 2005. 4: p. Article17.
- 26.Yona, G., et al., Effective similarity measures for expression profiles. Bioinformatics, 2006. 22(13): p. 1616–22.
- 27.Warde-Farley, D., et al., The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function. Nucleic Acids Res, 2010. Accepted (Webserver Issue).
- 28.Costanzo, M., et al., The genetic landscape of a cell. Science, 2010. 327(5964): p. 425–31.
- 29.Tong, A.H., et al., Global mapping of the yeast genetic interaction network. Science, 2004. 303(5659): p. 808–13.
- 30.Weirauch, M.T., et al., Information-based methods for predicting gene function from systematic gene knock-downs. BMC Bioinformatics, 2008. 9: p. 463.
- 31.Hishigaki, H., et al., Assessment of prediction accuracy of protein function from protein-protein interaction data. Yeast, 2001. 18(6): p. 523–31.
- 32.Schwikowski, B., P. Uetz, and S. Fields, A network of protein-protein interactions in yeast. Nat Biotechnol, 2000. 18(12): p. 1257–61.
- 33.Zhou, D., et al., Learning with Local and Global Consistency, in Neural Information Processing Systems. 2003, MIT Press: Vancouver, BC, Canada.
- 34.Weston, J., et al., Protein ranking: from local to global structure in the protein similarity network. Proc Natl Acad Sci USA, 2004. 101(17): p. 6559–63.
- 35.Hu, P., H. Jiang, and A. Emili, Predicting protein functions by relaxation labelling protein interaction network. BMC Bioinformatics, 2010. 11 Suppl 1: p. S64.
- 36.Bengio, Y., O. Delalleau, and N. Le Roux, Label Propagation and Quadratic Criterion, in Semi-Supervised Learning, O. Chapelle, B. Scholkopf, and A. Zien, Editors. 2006, MIT Press.
- 37.Chung, F., Spectral Graph Theory. Number 92 in CBMS Regional Conference Series in Mathematics. 1999: American Mathematical Society.
- 38.Vazquez, A., et al., Global protein function prediction from protein-protein interaction networks. Nat Biotechnol, 2003. 21(6): p. 697–700.
- 39.Karaoz, U., et al., Whole-genome annotation by using evidence integration in functional-linkage networks. Proc Natl Acad Sci USA, 2004. 101(9): p. 2888–93.
- 40.Fraser, A.G. and E.M. Marcotte, A probabilistic view of gene function. Nat Genet, 2004. 36(6): p. 559–64.
- 41.Lee, I., et al., A probabilistic functional network of yeast genes. Science, 2004. 306(5701): p. 1555–8.
- 42.Myers, C.L. and O.G. Troyanskaya, Context-sensitive data integration and prediction of biological networks. Bioinformatics, 2007. 23(17): p. 2322–30.
- 43.Huttenhower, C., et al., Exploring the human genome with functional maps. Genome Res, 2009. 19(6): p. 1093–106.
- 44.Noble, W.S. and A. Ben-Hur, Integrating Information for Protein Function Prediction, in Bioinformatics - From Genomes to Therapies, T. Lengauer, Editor. 2007, Wiley-VCH Verlag GmbH & Co KGaA: Weinheim, Germany.
- 45.Song, J. and M. Singh, How and when should interactome-derived clusters be used to predict functional modules and protein function? Bioinformatics, 2009. 25(23): p. 3143–50.
- 46.Myers, C.L., et al., Finding function: evaluation methods for functional genomic data. BMC Genomics, 2006. 7: p. 187.
- 47.Zhu, X., J. Lafferty, and Z. Ghahramani, Semi-supervised learning using Gaussian fields and harmonic functions, in International Conference on Machine Learning. 2003: Washington DC, USA.
- 48.Lewis, D.P., T. Jebara, and W.S. Noble, Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure. Bioinformatics, 2006. 22(22): p. 2753–60.
- 49.Warde-Farley, D., et al., The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function. Nucleic Acids Res, 2010. 38 Suppl: p. W214–20.
- 50.Alexeyenko, A. and E.L. Sonnhammer, Global networks of functional coupling in eukaryotes from comprehensive data integration. Genome Res, 2009. 19(6): p. 1107–16.
- 51.von Mering, C., et al., STRING: known and predicted protein-protein associations, integrated and transferred across organisms. Nucleic Acids Res, 2005. 33(Database issue): p. D433–7.
- 52.Guan, Y., et al., A genomewide functional network for the laboratory mouse. PLoS Comput Biol, 2008. 4(9): p. e1000165.