Toy examples
The TiWnet is compared with the graph lasso method (Friedman et al. 2007) and with its non-invariant counterpart Wnet on artificial data. The graph lasso maximizes the standard Wishart likelihood under a sparsity penalty on the inverse covariance matrix, see (2). Wnet replaces the invariant Wishart used in TiWnet with the standard Wishart (1), but uses otherwise exactly the same MCMC code.
Sample generation
For these experiments we implemented a data generator that mimics the assumed generative model shown in Fig. 3. First, a sparse inverse covariance matrix \(\varPsi\in\mathbb{R}^{n\times n}\) with n=25 is sampled. Networks with uniformly sampled node degrees are relatively easy to reconstruct for most methods, while networks with “hubs” are better suited for showing differences. Hubs are nodes with high degree; they appear naturally in many real networks, which are often scale-free, i.e. their node degrees follow a power law. We simulate such networks by drawing node degrees from a Pareto(\(7\times10^{-5}\), 0.5)-distribution and using these values as parameters in a binomial model for sampling 0/1 entries in the rows/columns of Ψ. The signs of these entries are randomly flipped, and the entries are scaled with samples from a Gamma or uniform distribution (see below for a precise description of the distribution of the edge weights). The diagonal elements are set to the row sums of absolute values plus a small constant ϵ (=0.1) to ensure full rank. We draw d vectors \(\boldsymbol{x}^{o}_{i} \in\mathbb{R}^{n}\) from \(\mathcal {N}(\boldsymbol{0}_{n}, \varPsi )\) and arrange them as columns in \(X^{o}\). \(S^{o} = \frac{1}{d}X^{o}(X^{o})^{t}\) is then a central Wishart matrix. To study the effect of biased measurements, we randomly generate biases \(b_{i}\) (i=1,…,d), resulting in the mean-shifted vectors \(\boldsymbol{x}_{i}\) in Fig. 3. The resulting matrix S is non-central Wishart with non-centrality matrix \(\varTheta=\varSigma^{-1}MM^{t}\), where \(M=\boldsymbol{1}\boldsymbol{b}^{t}\). In fact, we always sample two i.i.d. replicates of the matrices \(S^{o}\) and S, and we use the second ones as a test set to tune all model parameters of the respective methods (the \(\ell_{1}\) regularization parameter in graph lasso and the corresponding λ-parameter in the prior \(P_{2}(\varPsi)\) of TiWnet and Wnet) by maximizing the predictive likelihood on this test set. In order to separate the effects of parameter tuning from the “true” differences in the models themselves, we additionally compared all models by tuning them to the same sparsity level. Figure 7 shows an example network drawn from our data generator together with a Gamma(2,4)-distribution of the absolute values of the edge weights.
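The construction of Ψ described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the generator used for the experiments: the function name `sample_precision`, the rescaling of heavy-tailed draws to edge probabilities, the rule for combining two node propensities, and the reading of Gamma(2,4) as shape 2 and scale 4 are all hypothetical choices.

```python
import random

def sample_precision(n=25, eps=0.1, seed=0):
    """Sample a sparse, symmetric, strictly diagonally dominant matrix."""
    rng = random.Random(seed)
    # Assumption: heavy-tailed node propensities rescaled to (0, 0.6];
    # this stands in for the Pareto-distributed node degrees of the text.
    raw = [rng.paretovariate(1.5) for _ in range(n)]
    top = max(raw)
    prob = [0.6 * r / top for r in raw]

    psi = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Assumption: an edge appears with the mean of both propensities.
            if rng.random() < 0.5 * (prob[i] + prob[j]):
                sign = rng.choice((-1.0, 1.0))       # random sign flip
                weight = rng.gammavariate(2.0, 4.0)  # shape 2, scale 4 (assumed)
                psi[i][j] = psi[j][i] = sign * weight
    for i in range(n):
        # Diagonal = row sum of absolute off-diagonal values plus eps.
        psi[i][i] = sum(abs(psi[i][j]) for j in range(n) if j != i) + eps
    return psi

psi = sample_precision()
degrees = [sum(1 for j, v in enumerate(row) if j != i and v != 0.0)
           for i, row in enumerate(psi)]
print("node degrees:", sorted(degrees, reverse=True))
```

Because every diagonal entry exceeds the sum of absolute off-diagonal entries in its row, the symmetric matrix is strictly diagonally dominant with a positive diagonal and therefore positive definite, which is the full-rank property the constant ϵ is meant to guarantee.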
Simulations
In a first experiment, we compare the performance of TiWnet with graph lasso and Wnet. The quality of the reconstructed networks is measured as follows: A binary vector l of size n(n−1)/2 encoding the presence of an edge in the upper triangular part of Ψ is treated as “true” edge labels, and this vector is compared with a vector \(\hat {\boldsymbol{l}}\) containing the absolute values of the elements in the reconstructed \(\hat {\varPsi}\) after zeroing those elements in \(\hat{\boldsymbol{l}}\) which are not sign-consistent with the nonzero entries in Ψ (meaning that sign-inconsistent estimates will always be counted as errors). The agreement of l and \(\hat{\boldsymbol{l}}\) is measured with the F-measure, i.e. the highest harmonic mean of precision and recall under thresholding the elements in \(\hat{\boldsymbol{l}}\). The left panel in Fig. 8 shows boxplots of F-scores obtained in 20 experiments with randomly generated Ψ-matrices for graph lasso, TiWnet, and Wnet. For graph lasso, a series of \(\hat{\varPsi}\) estimates with increasing \(\ell_{1}\) penalty parameter is computed using the glassopath function from the glasso R-package. For the MCMC-based methods TiWnet and Wnet, \(\hat{\varPsi}\) is computed as the sample average of networks drawn from the Gibbs samples after a certain burn-in period. The right panel shows the outcome of a Friedman test (i.e. non-parametric ANOVA) with post-hoc analysis for assessing the significance of the differences, see figure caption for further details. From the results we conclude that for the methods relying on the standard Wishart distribution (i.e. graph lasso and Wnet), column centering does not overcome the problem of model mismatch due to column biases. Further, TiWnet using only the pairwise distances D performs as well as graph lasso on the original (not shifted) data. Note that for the original \(S^{o}\), graph lasso might indeed serve as a “gold standard”, since the model assumptions are exactly met. Finally, the invariance properties of the likelihood used in TiWnet are indeed essential for its good performance, since its non-invariant counterpart Wnet uses exactly the same MCMC code (apart from using the standard Wishart likelihood, of course).
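The F-measure described above can be made concrete with a short sketch. It assumes that the true network is given as a vector of signs (−1, 0, +1) over the upper triangle and the estimate as a vector of signed weights; the function name `best_f_score` and the toy vectors are hypothetical.

```python
def best_f_score(true_signed, est):
    """Highest F-score over all thresholds on the absolute estimates.

    true_signed: entries in {-1, 0, +1} for the upper triangle of Psi.
    est: signed estimated weights in the same order.
    """
    labels = [1 if t != 0 else 0 for t in true_signed]
    scores = []
    for t, e in zip(true_signed, est):
        # An estimate whose sign contradicts a true edge is zeroed,
        # so it can never be counted as a recovered edge.
        scores.append(0.0 if (t != 0 and t * e < 0) else abs(e))

    best = 0.0
    for thr in sorted(set(s for s in scores if s > 0)):
        pred = [1 if s >= thr else 0 for s in scores]
        tp = sum(1 for p, l in zip(pred, labels) if p and l)
        fp = sum(1 for p, l in zip(pred, labels) if p and not l)
        fn = sum(1 for p, l in zip(pred, labels) if not p and l)
        if tp == 0:
            continue
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best

# Toy example: the second and fifth estimates have the wrong sign.
print(best_f_score([1, -1, 0, 0, 1], [0.9, 0.5, 0.4, 0.0, -0.3]))
```

Note that the threshold is chosen with knowledge of the true labels, which is why this score is an optimistic, oracle-type measure.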
The left column of Fig. 9 shows the networks reconstructed by the different methods (networks with highest predictive likelihood for graph lasso and sample average in the case of TiWnet and Wnet). The right column depicts the thresholded networks according to the best F-score with respect to the known ground truth. The reconstructed networks in the left column of Fig. 9 show that the graph lasso networks are very dense, and that thresholding the edge weights is essential for a high F-score. Note, however, that such thresholding is only possible if the ground truth is known. The average TiWnet/Wnet result is also dense, since it represents the empirical distribution of networks sampled during the MCMC iterations. Thresholding the edges is also essential here, but for the MCMC models we can easily compute a truly sparse network by annealing the Markov chain without having access to the ground truth. Studying this effect further leads us to a second experiment, in which we directly compare the lasso-type networks reconstructed using a sequence of \(\ell_{1}\) regularization parameters with the “frozen” TiWnet after annealing. In this comparison, however, we do not allow further thresholding of the edge weights when computing the F-score (i.e. we replace the entries in \(\hat {\boldsymbol{l}}\) by their sign). The left panel in Fig. 10 shows that TiWnet clearly outperforms all other methods. We conclude that model selection in the lasso methods does not work satisfactorily, probably because the \(\ell_{1}\) penalty not only sparsifies the solution, but also globally shrinks the parameters. As a result, truly sparse solutions have a relatively small predictive likelihood. Further, in the case of TiWnet, the annealing mechanism in our MCMC sampler produces very sparse networks of very high quality. The direct comparison with the non-invariant Wnet model shows that the invariance in the Wishart likelihood is indeed the essential ingredient of TiWnet.
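The remark that an ℓ1 penalty both sparsifies and shrinks can be illustrated in the scalar case, where the penalized least-squares problem has the well-known soft-thresholding solution. This is only a one-dimensional analogue and not the graph lasso itself; the function names and the example values are hypothetical.

```python
def soft_threshold(z, lam):
    """Minimizer of 0.5 * (w - z)**2 + lam * abs(w) over w."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def hard_threshold(z, lam):
    """Keeps large values untouched; shown only for contrast."""
    return z if abs(z) > lam else 0.0

estimates = [3.0, -1.5, 0.4, -0.2]
for lam in (0.5, 1.0):
    soft = [soft_threshold(z, lam) for z in estimates]
    hard = [hard_threshold(z, lam) for z in estimates]
    print("lambda =", lam, "soft:", soft, "hard:", hard)
```

In this analogue, a penalty large enough to remove the small entries also pulls every surviving entry towards zero by the same amount, which is consistent with the observation that very sparse lasso solutions fit held-out data poorly.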
The results of the previous experiment depend crucially on the model selection step. To exclude differences caused by model selection, in a third experiment we additionally investigated the performance of the models after tuning all of them to the same sparsity level as the annealed network obtained by TiWnet. The results are presented in Fig. 11: TiWnet clearly outperforms its competitors. Inspecting the recovered networks for the graph lasso, we see that under these restrictive sparsity constraints, the lasso selection has particular difficulty recovering the edges connecting hubs in the network.
In a fourth experiment, we test how these results depend on the validity of the model assumptions. TiWnet in its simplest form uses only three levels for edge weights: 0,+1,−1. It is clear that this simple model will have problems recovering networks with a very high dynamic range of edge weights (the generalization to more than 3 levels, however, is straightforward). Since the edge weight distribution in the previous experiments was relatively concentrated around the mode of the gamma distribution (see Fig. 7), we changed the distribution to a uniform distribution over the interval [0.2,20]. This choice implies a uniform dynamic range over two decades. The performance of TiWnet measured in terms of the F-score, however, did not change significantly, see the top row in Fig. 12 in comparison to Fig. 8.
In order to further test the robustness under model mismatches, in a fifth experiment we replaced the Gaussian distribution used to produce \(X^{o}\) with a Student-t distribution in our data generator. The resulting plot of F-scores (Fig. 12, bottom row) has the same overall structure as in Fig. 8, which shows that TiWnet is relatively robust under such model mismatches. In summary, we conclude from these experiments that TiWnet significantly outperforms its competitors, and that the main reason for this good performance is indeed the invariant Wishart likelihood.
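The role of invariance in these experiments can be checked numerically on a small example. The sketch below assumes that the n objects are the rows of the n×d data matrix, that a bias is added to each column, and that squared distances are obtained from the inner-product matrix through the standard identity D_kl = S_kk + S_ll − 2S_kl. For simplicity the data are isotropic Gaussian rather than drawn from a sampled Ψ, and all names are hypothetical.

```python
import random

def gram(x):
    """S = (1/d) X X^t for an n x d matrix given as a list of rows."""
    d = len(x[0])
    return [[sum(a * b for a, b in zip(r, s)) / d for s in x] for r in x]

def squared_distances(s):
    """D_kl = S_kk + S_ll - 2 S_kl."""
    n = len(s)
    return [[s[k][k] + s[l][l] - 2.0 * s[k][l] for l in range(n)]
            for k in range(n)]

def max_gap(a, b):
    """Largest absolute entrywise difference between two matrices."""
    return max(abs(u - v) for ra, rb in zip(a, b) for u, v in zip(ra, rb))

rng = random.Random(0)
n, d = 4, 6
x_o = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
bias = [rng.uniform(-5.0, 5.0) for _ in range(d)]      # one bias per column
x = [[v + b for v, b in zip(row, bias)] for row in x_o]  # X = X^o + 1 b^t

s_o, s = gram(x_o), gram(x)
max_s_gap = max_gap(s, s_o)
max_d_gap = max_gap(squared_distances(s), squared_distances(s_o))
print("change in S:", max_s_gap)
print("change in D:", max_d_gap)
```

The inner products change under the shift, whereas the pairwise distances agree up to rounding error, so a likelihood that depends on the data only through D cannot be misled by column biases.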
Real-world examples
A module network of Escherichia coli genes
For inferring module networks in a biological context, we applied TiWnet to a published dataset of promoter activity data from ≈1100 Escherichia coli operons (Zaslaver et al. 2006). The promoter activities were recorded with high temporal resolution as the bacteria progressed through a classical growth curve experiment experiencing a “diauxic shift”. Certain groups of genes are induced or repressed during specific stages of this growth curve. Cluster analysis of the promoter activity data was performed using a spherical Gaussian mixture model with shared variance σ: \(p(x)=\sum_{k}\pi_{k} \mathcal{N}(x|\mu_{k},\sigma)\) along with a Dirichlet-process prior to automatically select the number of clusters. This revealed the presence of 14 distinct gene clusters (see expression profiles of nodes in Fig. 13). Network inference with TiWnet was carried out on a Bhattacharyya kernel \(K_{B}\) computed over the Gaussian clusters, where \(K_{B}(k,j)=\exp(-\|\mu_{k} - \mu_{j}\|^{2}/ (8\sigma^{2}))\) (see Jebara et al. 2004). When the clusters were analyzed, genes known to be co-regulated were predominantly found in the same or nearby clusters with positive partial correlations. For example, during the diauxic shift experiment, the transcriptional activator CRP induces a certain set of genes in a specific growth phase (Keseler et al. 2011). Strikingly, of the 72 known CRP-regulated operons in the dataset, 43 genes are found in cluster 6 or the four neighboring clusters (3,9,11,13). Likewise, genes involved in specific molecular functions (those coding for proteins involved in amino acid biosynthesis pathways) were found in close proximity in the network, for example in nodes 1 and 2 (Fig. 13). Physiologically, this co-regulation makes sense since protein biosynthesis (carried out by the ribosome) depends on a constant supply of synthesized amino acids. Thus TiWnet can successfully identify connections between genes that are co-regulated by the same molecular factor or are involved in interlinked molecular processes.
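The kernel formula above translates directly into code. The following sketch evaluates K_B for a handful of made-up cluster means; the function name `bhattacharyya_kernel`, the means and the value of σ are hypothetical and only serve to show the computation.

```python
import math

def bhattacharyya_kernel(means, sigma):
    """K_B(k, j) = exp(-||mu_k - mu_j||^2 / (8 sigma^2)) for all pairs."""
    def k(a, b):
        sq_dist = sum((u - v) ** 2 for u, v in zip(a, b))
        return math.exp(-sq_dist / (8.0 * sigma ** 2))
    return [[k(a, b) for b in means] for a in means]

# Hypothetical cluster means in a two-dimensional feature space.
means = [[0.0, 0.0], [2.0, 0.0], [0.0, 4.0]]
K = bhattacharyya_kernel(means, sigma=1.0)
for row in K:
    print([round(v, 4) for v in row])
```

The matrix is symmetric with unit diagonal, and clusters whose means are far apart relative to σ receive a similarity close to zero.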
“Landscape” of chemical compounds with in vitro activity against HIV-1
As a second real-world example TiWnet is used to reconstruct a network of chemical compounds. We enriched a small list of compounds identified in an AIDS antiviral screen by NCI/NIH available at http://dtp.nci.nih.gov/docs/aids/searches/list.html#NPorA with all currently available anti-HIV drugs, yielding a set of 86 compounds. Chemical hashed fingerprints were computed from the chemical structure of the compounds that was encoded in SMILES strings (Weininger 1988). The Tanimoto kernel, a similarity matrix S of inner-product type, is constructed by the pairwise Tanimoto association scores (Rogers and Tanimoto 1960) between the compounds. Since the geometric position of the underlying Euclidean space is unclear, we again relied heavily on the geometric invariance inherent in TiWnet. The resulting network (Fig. 14) shows several disconnected components which nicely correspond to chemical classes (the node colors). Currently available anti-HIV drugs are indicated by their chemical and commercial names alongside their 2D-structures depicting the chemical similarity underlying this network. These drugs belong to the functional groups “Nucleoside reverse transcriptase inhibitors (NRTI)”, “Non-nucleoside reverse transcriptase inhibitors (NNRTI)”, “Protease inhibitors”, “Integrase inhibitors”, or “Entry inhibitors”, and most compounds of a certain functional type cluster together in the graph. Medically, this network can be very useful to predict “cross resistance” between resistant HIV-1 variants and drugs and is especially distinctive for NRTIs. The pairs lamivudine-emtricitabine, tenofovir-abacavir, and d4T-zidovudine(ZDV) show almost the same resistance profiles (Johnson et al. 2010). This similarity is very well reflected by our network where these pairs are in close proximity.
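A Tanimoto similarity matrix over hashed fingerprints can be sketched as follows. The sketch assumes that each fingerprint is stored as the set of its on-bits and uses the intersection-over-union form that is common in cheminformatics; whether this matches the exact scoring used for Fig. 14 is an assumption, and the compound labels and bit positions are invented for illustration.

```python
def tanimoto(a, b):
    """Shared on-bits divided by on-bits present in either fingerprint."""
    a, b = set(a), set(b)
    union = len(a | b)
    # Convention (assumed): two empty fingerprints count as identical.
    return len(a & b) / union if union else 1.0

# Hypothetical fingerprints given as sets of on-bit positions.
fingerprints = {
    "compound_A": {1, 2, 3, 8},
    "compound_B": {2, 3, 8, 9},
    "compound_C": {20, 21},
}
names = sorted(fingerprints)
S = [[tanimoto(fingerprints[p], fingerprints[q]) for q in names] for p in names]
for name, row in zip(names, S):
    print(name, [round(v, 3) for v in row])
```

Only the similarities enter the computation, so no coordinate system for the compounds has to be chosen, which is in line with the reliance on geometric invariance mentioned above.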
It is worth noting that graph lasso has similar difficulties on this dataset as in the toy examples. When following the solution path by varying the penalty parameter, it is difficult to find a good compromise between sparsity and connectivity: the obtained graphs are either very dense, making them difficult to plot and harder to interpret, or increasingly sparse, in which case several interesting structural connections are lost because many singleton nodes are created. For a graphical depiction, see Figs. 1–3 in Supplementary material A. The R and C++ source code for this experiment using TiWnet is available at http://bmda.cs.unibas.ch/TiWnet.
The “Landscape” of glycosidase enzymes of Escherichia coli
In yet another real-world experiment, we use TiWnet to extract the network of glycosidase enzymes of Escherichia coli. Every enzyme is represented by its vectorized contact map computed from its PDB (Protein Data Bank) file. A contact map is a compact representation of the topological information of the 3D protein structure present in the PDB file as a symmetric, binary 2D matrix of pairwise inter-residue contacts: for a protein with R amino acid residues, the contact map (see Fig. 15) is an R×R binary matrix CM where \(\mathit{CM}_{ij}=1\) if residues i and j are similar and 0 otherwise. The starting point for TiWnet is the contact map representation of an enzyme whose row-wise vectors serve as strings. To obtain the pairwise distances between strings in these contact maps, we compute the Normalized Compression Distance (NCD) (Li et al. 2004), which is an approximation to the Normalized Information Distance (NID). The NID (Li et al. 2004) is a distance metric minimizing any admissible metric between objects. Given strings x and y, NID is proportional to the length of the shortest program that computes x|y as well as y|x and is defined as
$$\mathit{NID}(x,y) = \frac{\max\{K(x|y),K(y|x)\}}{\max\{K(x),K(y)\}} = \frac {K(xy)-\min\{K(x),K(y)\}}{\max\{K(x),K(y)\}} $$
where K(x) is the Kolmogorov complexity of the string x. The real-world approximated version of NID is given by NCD and is calculated as follows:
$$\mathit{NCD}(x,y) = \frac{C(xy)-\min\{C(x),C(y)\}}{\max\{C(x),C(y)\}}, $$
where C(xy) represents the size of the file obtained by compressing the concatenation of x and y. We use the ProCKSI-Server (Barthel et al. 2007; Krasnogor and Pelta 2004) to compute NCD(x,y).
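The NCD formula can be tried out with any general-purpose compressor. The sketch below uses zlib from the Python standard library as a stand-in; it is not the compressor or the pipeline of the ProCKSI server, and the two byte strings are artificial examples rather than contact-map rows.

```python
import random
import zlib

def csize(data):
    """Compressed size C(.) in bytes, here with zlib at maximum level."""
    return len(zlib.compress(data, 9))

def ncd(x, y):
    """NCD(x, y) = (C(xy) - min{C(x), C(y)}) / max{C(x), C(y)}."""
    cx, cy, cxy = csize(x), csize(y), csize(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

rng = random.Random(0)
row_a = b"01" * 500                                      # highly regular string
row_b = bytes(rng.randrange(256) for _ in range(1000))   # irregular string

print("NCD(a, a):", ncd(row_a, row_a))
print("NCD(a, b):", ncd(row_a, row_b))
```

A string compared with itself yields a small value, because the second copy adds little to the compressed size, while two unrelated strings yield a value close to one. Real compressors are only approximations of Kolmogorov complexity, so the values depend on the compressor and on the length of the strings.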
The network extracted by TiWnet from the NCD values is shown in Fig. 16. The network shows a clear formation of subnets of enzymes, indicated by node colors. To further analyze the obtained subnets, we look at their corresponding Gene Ontology (GO) annotations. The GO annotations are part of a Directed Acyclic Graph (DAG), covering three orthogonal taxonomies: molecular function, biological process and cellular component. For two subnets (shown in dotted circles in Fig. 16), we inspect the GO subgraphs that are subsets of the entire GO graph. The three taxonomic components of the GO subgraphs explain the proteins in these subnets and show the relevance of these proteins through the color-scaling scheme where red accounts for highly frequent enzymes. As depicted, the GO subgraphs plotted for the two subnets consist of many highly significant enzymes, emphasizing that the subnets obtained using TiWnet are not random, but instead consist of groups of enzymes having shared annotations. Subnets of this kind are useful for identifying the most important GO domains for a given set of enzymes and also suggest biological areas that warrant further study.