Advertisement

BMC Bioinformatics

, 20:144 | Cite as

Network meta-analysis correlates with analysis of merged independent transcriptome expression data

  • Christine Winter
  • Robin Kosch
  • Martin Ludlow
  • Albert D. M. E. Osterhaus
  • Klaus JungEmail author
Open Access
Methodology article
Part of the following topical collections:
  1. Transcriptome analysis

Abstract

Background

Using meta-analysis, high-dimensional transcriptome expression data from public repositories can be merged to make group comparisons that have not been considered in the original studies. Merging of high-dimensional expression data can, however, implicate batch effects that are sometimes difficult to be removed. Removing batch effects becomes even more difficult when expression data was taken using different technologies in the individual studies (e.g. merging of microarray and RNA-seq data). Network meta-analysis has so far not been considered to make indirect comparisons in transcriptome expression data, when data merging appears to yield biased results.

Results

We demonstrate in a simulation study that the results from analyzing merged data sets and the results from network meta-analysis are highly correlated in simple study networks. In the case that an edge in the network is supported by multiple independent studies, network meta-analysis produces fold changes that are closer to the simulated ones than those obtained from analyzing merged data sets. Finally, we also demonstrate the practicability of network meta-analysis on a real-world data example from neuroinfection research.

Conclusions

Network meta-analysis is a useful means to make new inferences when combining multiple independent studies of molecular, high-throughput expression data. This method is especially advantageous when batch effects between studies are hard to get removed.

Keywords

Fold change Gene expression Meta-analysis Network meta-analysis Research synthesis 

Introduction

Network meta-analysis has been widely used for aggregating results of clinical trials to make direct and indirect inferences about treatment effects, and several methodical concepts for network meta-analysis have been proposed [1, 2, 3]. Published examples of network meta-analysis are for example the comparison of the efficacy of different treatments against each other [4], the comparison of different therapies [5], or the study of safety of different drugs [6]. In contrast to ‘traditional’ meta-analysis which aggregates studies on the same study question, network meta-analysis also involves studies on different study questions which are linked by pairwise same treatment groups. Treatment comparisons that have not been studied in the original studies can indirectly be made within the network meta-analysis. Thus, inferences about group comparisons which are not linked within the network of study groups from the original studies are possible. While ‘traditional’ meta-analysis has already been used to merge the results of high-dimensional gene expression studies from microarray or RNA-seq experiments, and this topic has also been elaborated methodically [7, 8, 9], the relatively new methodology of network meta-analysis has not been considered for such data so far. Examples of ‘traditional’ meta-analysis of high-dimensional expression data are for example the identification of genes differentially expressed in cancer [10] or in neurological tissues [11, 12].

The aim of this work is to compare network meta-analysis as a tool for indirect inferences with the analysis of merged gene expression data. Since most journals in the area of high-dimensional expression data demand submitting authors to deposit their original data in public repositories such as Gene Expression Omnibus (GEO) [13] or ArrayExpress (AE) [14] alternatives to meta-analysis and network meta-analysis have opened up: the direct merging and subsequent joint analysis of the original data. Merging of original data can also be an approach of making indirect comparisons. However, merging becomes difficult if the data was taken using devices from different manufacturer or even different technologies. Problems in data merging may for example arise when expression data in some studies were taken by means of DNA microarrays as continuous fluorescence values [15] and by means of RNA-seq as read counts [16] in other studies. In some cases, batch effects between different types of expression data can be removed [17, 18]. However, even after applying a batch effect removing step onto the merged data false discoveries may occur as was shown by [19]. Therefore, merging of results in form of meta-analysis appears to be advantages in such cases, meaning that meta-analyses should be preferred over data merging strategies. Hereupon, the question arises how comparable indirect inferences from network meta-analysis and from the analysis of merged data are.

In this article, we evaluate the possibility of indirect group comparisons using either the strategy of data merging or of network meta-analysis. Specifically, we study how strong the lists of differentially expressed genes detected in indirect group comparisons by either type of analysis differ. Furthermore, we study how strong the indirect fold changes of genes determined by the two ways of analysis are correlated, and how strong they are correlated to the true fold changes. After briefly describing the approaches of network meta-analysis and the alternative analysis variant based on merged data sets, we demonstrate the benefits and limitations of either approach in a simulation study and on a data example of high-dimensional gene expression data from infection research.

Methods

Consider a study network with n different experimental groups. Let further m denote the number of possible pairwise group comparisons in this network. Thus, a graph is formed with n nodes and m edges. In practice, not all m edges will be covered by direct study internal comparisons. In this case, mdirectm denotes the number of existing comparisons for which effect estimates are available directly from at least one study. One goal of the network meta-analysis is to obtain estimates for the non-existing mindirect=mmdirect comparisons. The study networks depicted in Fig. 1 consist of n=3 nodes and mdirect=2 directly available comparisons, while for mindirect=1 pair of study groups no direct comparisons exist from the original studies. Thus, the whole study network consists of m=3 edges. In the study network at the bottom of Fig. 1, the comparison of treatment A versus control is supported by three independent studies. Thus, the number of available independent comparisons can be even larger than mdirect. We therefore introduce \(m^{\prime }_{direct}\geq m_{direct}\) as the number of available independent comparisons in the network.
Fig. 1

Schemes of study networks. Networks were either simulated or represent the infection example. Top: two studies are connected by a similar control group. (This scenario is evaluated in simulations no. 1a and no. 1b and by the infection example.). Bottom: the edge representing the comparison between treatment A and control is supported by three independent studies (This scenario is evaluated in simulation no. 2.)

Differential expression analysis can either be performed on the \(m^{\prime }_{direct}\) available individual studies so that results can be merged in a network meta-analysis. Alternatively, differential expression analysis can be performed on the merged data. Both variants allow for direct and indirect inferences.

Differential testing

As method for differential testing between each pair {k,k} of experimental groups (kk;k,k=1,...,n) we use the linear models implemented in the R-package ‘limma’ [20]. After fitting this model to the data, we obtain for each gene g (g=1,...,G) the estimated regression coefficient and its related standard error from the result object from the ‘eBayes’ function of the ‘limma’-package:
$$ \hat\beta_{g}\hspace{.5cm}\text{and}\hspace{.5cm}SE\left(\hat\beta_{g}\right). $$
(1)

In these linear models, the regression coefficients can be interpreted in the sense of the log fold change of a gene between two experimental groups. Besides, test results in form of a p-value per gene are procuced, of course. Fold changes, standard errors and p-values can then be used in the network meta-analysis to bring together the results of the individual studies and also to make indirect comparisons.

Network meta-analysis

To estimate regression coefficients and their standard errors within the network of comparisons (direct as well as indirect comparisons) we employ the method proposed by [2] which we briefly sketch in the following and refer the reader to this publication for further details. The calculations of the network meta-analysis are done separately for each gene g (g=1,...,G). Here, G is the number of genes jointly studied in all independent studies. Genes for which the expression measurements are not available in all studies are excluded from the analysis. To determine in this network the log fold change of gene g and its standard error related to the comparisons of all m pairs of groups {k,k}(kk,andk,k=1,...,n), a (mdirect′×mdirect′) weight matrix W is constructed first, with diagonal elements \(1/SE(\hat \beta)^{2}\) and with all other entries being equal zero. With this weight matrix, comparisons with a high standard error get less weight in the network. Furthermore, the regression coefficients \(\hat \beta _{g}\) from the individual comparisons are stored in the vector x. Next, an (mdirect′×n) matrix B is constructed where each row represents one of the \(m^{\prime }_{direct}\) available comparisons, and where the connections of the nodes to each other are represented. Therefore, in each row of B, a 1 is put in the column related to the node of experimental group k and a -1 one is put in the column related to the other group k of the available comparison represented by this row. All other elements are zero. Thus, matrix B shows for which pairs of experimental groups, results of differential expression analysis are available from the original studies. Using the matrices W and B, a Laplacian matrix as used in graph theory and its Moore-Penrose inverse are calculated as follows:
$$ \mathbf{L}=\mathbf{B}^{T}\mathbf{WB}, \text{ and } \mathbf{L}^{+}=(\mathbf{L}-\mathbf{J}/n)^{-1}+\mathbf{J}/n, $$
(2)
where J is an (n×n) matrix of ones. The variances of the log fold changes in the network meta-analysis can then be determined by the (n×n) matrix R with entries
$$ \mathbf{R}_{k,k'}=\mathbf{L}^{+}_{k,k}+\mathbf{L}^{+}_{k',k'}-2\mathbf{L}^{+}_{k,k'}. $$
(3)

Note that R is symmetric, i.e. \(\phantom {\dot {i}\!}R_{k,k'}=R_{k',k}\). The standard errors for each comparison in the network meta-analysis are then given by \(\sqrt {\mathbf {R}}\).

In order to calculate estimates of the direct log fold changes in this network, stored in vector v of length \(m^{\prime }_{direct}\), the following equation is used:
$$ \mathbf{v}=\mathbf{BL}^{+}\mathbf{B}^{T}\mathbf{Wx}. $$
(4)

In the case that \(m^{\prime }_{direct}=m_{direct}\), the elements of v are equal to the input fold changes stored in x. In cases where \(m^{\prime }_{direct}>m_{direct}\), the elements of v for network edges which are supported by multiple studies are a summary of the fold changes from these studies. The fold changes for the indirect comparisons can be obtained by a subtraction procedure between the elements of v. This subtraction procedure is detailed by the example code provided within [2] (cf. three fold for-loop to construct the matrix ‘all’ in their example code). To perform these calculations in our simulations and in the analysis of the infection data, we employ the R-package ‘netmeta’ that provides the implementation of the methods by [2].

Example R-code that shows how to use the ‘limma’-results in the package ‘netmeta’ is provided as supplementary material (Additional file 1).

Batch effect removal in merged data sets

A regular problem when merging data from different studies are batch effects. Therefore, we base our simulation study on a gene expression model that includes additive and multiplicative batch effects [17]. This model was recommended as a results of a systematic comparison by [18]. We refer to a further comparison of methods for batch effect removal in the discussion section of this work. In this model, the gene expression level of gene g in group j and study i is drawn by
$$ Y_{ijg}=\alpha_{g} + \beta_{gj} + \gamma_{ig} + \rho_{ig}\epsilon_{ijg}, $$
(5)

where αg and βgj are the overall and the group specific expression level, respectively. The components γig and ρig are an additive and multiplicative batch effet, respectively, and εijg is the overall error. Estimation and removal of these types of batch effects are implemented in the ‘ComBat’ function of the R-package ‘sva’.

Results

To evaluate network meta-analysis of transcriptome profiles and to compare the results with the analysis of merged data sets, we ran a simulation study and applied the methods to an example from neuroinfection research. All simulation scenarios were first performed using 500 runs and then repeated with 1000 runs which led to the same conclusions. Therefore, the authors considered 1000 runs a appropriate choice.

Simulation study

Simulation no. 1a represents two studies on two different diseases (A and B), each study involving samples from diseased individuals and from healthy controls. In practice, a researcher would usually be interested in a comparison of two diseases from a similar area (e.g. different cancers or different infectious diseases). While the individual studies provide the direct comparison between samples from the disease group versus control samples, data merging or network meta-analysis can be used to make the indirect comparison of the samples from the two disease groups (Fig. 1). The comparison of the transcriptome expression data from the disease groups could provide insights about their differences, e.g. which genes are highly expressed under disease A but not under disease B. Assuming, alternatively, not a scenario with diseases but with different treatments (where the control group represents untreated samples) the indirect comparison of the different treatment samples could uncover which genes are influenced by treatment A but not by treatment B.

In the simulation, the parameters of the model specified by Eq. (5) were mainly drawn from the normal distribution except for the multiplicative batch effect which was drawn from the inverse Gamma distribution (Table 1). Using the inverse Gamma distribution was also proposed by [17] to obtain values distributed around 1. Hence, for most genes, the multiplicative effect is rather weak. For both studies, different values of the distribution parameters were chosen for the batch effects. Note also, that the term for the fold change, βgj, was set to zero for the control groups. In total, we simulate data for G=100 genes which is enough, here, to compare ranking lists from differential expression analysis. Sample sizes per group were chosen as n1=n2=10 in this simulation.
Table 1

Setting of simulation parameters

Group/group

α g

β gj

γ ig

ρ ig

ε ijg

Study 1: Control

\(\mathcal {N}(0, 1)\)

0

\(\mathcal {N}(0, 1)\)

InvGamma(1,1)

\(\mathcal {N}(0, 1)\)

Study 1: Disease A

\(\mathcal {N}(0, 1)\)

\(\mathcal {N}(0, 1)\)

\(\mathcal {N}(0, 1)\)

InvGamma(1,1)

\(\mathcal {N}(0, 1)\)

Study 2: Control

\(\mathcal {N}(0, 1)\)

0

\(\mathcal {N}(2, 1)\)

InvGamma(1,2)

\(\mathcal {N}(0, 1)\)

Study 2: Disease B

\(\mathcal {N}(0, 1)\)

\(\mathcal {N}(0, 1)\)

\(\mathcal {N}(2, 1)\)

InvGamma(1,2)

\(\mathcal {N}(0, 1)\)

Study 3: Control

\(\mathcal {N}(0, 1)\)

0

\(\mathcal {N}(0, 1)\)

InvGamma(1,1)

\(\mathcal {N}(0, 1)\)

Study 3: Disease A

\(\mathcal {N}(0, 1)\)

\(\mathcal {N}(0, 1)\)

\(\mathcal {N}(0, 1)\)

InvGamma(1,1)

\(\mathcal {N}(0, 1)\)

Study 4: Control

\(\mathcal {N}(0, 1)\)

0

\(\mathcal {N}(0, 1)\)

InvGamma(1,1)

\(\mathcal {N}(0, 1)\)

Study 4: Disease A

\(\mathcal {N}(0, 1)\)

\(\mathcal {N}(0, 1)\)

\(\mathcal {N}(0, 1)\)

InvGamma(1,1)

\(\mathcal {N}(0, 1)\)

Simulation nos. 1a and 1b involve studies 1 and 2, only, while simulation no. 2 involves all four studies

Comparing in simulation no. 1a the ranks of p-values and ranks of log fold changes from the network meta-analysis versus those from the merged data analysis, high correlations can be observed (Fig. 2). The results of both analysis variants would therefore lead to similar biological conclusions. If we look at the true simulated fold changes, β.1β.2, and correlate them with either the fold changes from the network meta-analysis or from the merged data analysis, again no large differences between the two analysis variants could be observed. Taken from 1000 simulation runs, the mean (+/- standard deviation) correlation between the true fold changes and those from the network meta-analysis or from the merged data was 0.74 +/- 0.18 each (Additional file 2).
Fig. 2

Correlation between results of data merging versus results of network meta-analysis. Smoothed scatterplots representing the ranks of p-values (top) and of log fold changes (bottom), respectively, resulting from network meta-analysis versus the results from the analysis of merged data in the simulation of two independent studies (Simulation no. 1a). The plots represent the results from 1 of 1000 simulation runs

In order to study how the correlation between the true fold changes and those from either network meta-analysis or from the merged data analysis changes when sample sizes are increased, the simulation scenario was extended with sample sizes per group being increased from n1=n2=100 to n1=n2=1000 by steps of 100 (simulation no. 1b). Again, 1000 runs were performed for each value of the sample sizes. In this simulation, the correlation increases when the sample size per group was increased (Fig. 3). Still, no relevant differences in the correlation can be seen between the two analysis variants, i.e. for each sample size the corresponding two boxplots are nearly identical.
Fig. 3

Precision of fold change estimation in a simple scenario. Boxplots representing the correlation between true and estimated logFC versus sample size per group observed in the analysis of merged data and in network meta-analysis. 1000 simulation runs of two independent studies were performed per sample size (Simulation no. 1b). Both analysis variants show nearly the same correlation which increases with increasing sample sizes

Network meta-analysis also allows that an edge of the network is supported by multiple studies (Fig. 1 bottom). In simulation no. 2, we generated data from three independent studies to support the comparison between treatment A and control, and data from one study to support the comparison between treatment B and control. In this scenario, the correlation between true and calculated logFC was overall higher when using network meta-analysis than in the analysis of the merged data (Fig. 4).
Fig. 4

Precision of fold change estimation in more complex scenarios. Correlation between true and estimated logFC observed in the analysis of merged data and in network meta-analysis from 1000 runs of simulation scenario no. 2., where one edge of the network is represented by multiple independent studies

Examples: transcriptome expression profiles in neuroinfectious diseases

Our real world example involved transcriptome expression profiles from ZIKA virus (ZIKV) infected neural progenitor cells [21] as well as expression profiles of differentiated NT-neurons infected with herpes simplex virus 1 (HSV1). No journal publication is available for the latter study. Both data sets were selected from GEO with accession numbers GSE80434 (African ZIKVM and mock infected samples only) and GSE24725, respectively. Furthermore, both studies follow a two group design with the infected cells compared to control samples. ZIKV is a mosquito-borne Flavivirus, first discovered in 1947 in Uganda [22]. HSV1 belongs to the class of Herpesviridae and is transmitted by direct contact. The capacity of both viruses to infect neural tissues following initial systemic virus spread means that a network meta-analysis can be helpful to identify genes that show a different expression in hosts infected by either virus [23]. The intersection of both studies was G=7912 genes that were subjected to the joint analysis. In order to compare the expression profiles of ZIKV and HSV1 infected neural cells we first merged both data sets, performed the batch effect removal and finally differential expression analysis. We denote the resulting p-values and log fold changes by pmerged and logFCmerged, respectively. As second analysis variant, we performed network meta-analysis obtaining pnet and logFCnet, respectively.

In general, the order of the p-values and log fold changes in both analysis variants were highly but not perfectly correlated (Fig. 5). Thus, the top selected genes can differ between the two strategies, and biological conclusions can vary. Variations in the biological interpretation from both analysis strategies will be discussed in the last chapter.
Fig. 5

Correlation of results in infection data set. Smoothed scatterplots representing the ranks of p-values (top) and log fold changes (bottom), respectively, resulting from network meta-analysis versus the results from the analysis of merged ZIKV and HSW1 data sets

In addition to the differential expression analysis, we studied how gene set enrichment analysis changes when using either merged data analysis or network meta-analysis. Therefore, ranked gene lists resulting from differential expression analysis were subjected to GO term enrichment analyses. In total 4860 GO terms were analysed. Based on the merged data analysis, 43 GO terms were significantly enriched among the differentially expressed genes between ZIKV and HSV1, while 67 GO terms were selected when using network meta-analysis. The overlap of these two sets included 13 GO terms that would contribute to the biological interpretation regardless of the type of analysis. Again, the commonalities and differences in biological interpretation will be discussed in the last chapter.

Discussion

Differences in biological interpretation

The analyses of the infection data by data merging and network meta-analyis, respectively, have shown commonalities and differences in the results. This can have consequences on the biological interpretation as will be demonstrated in the following.

In general, among the top 10 genes selected with both analysis variants (Table 2) are genes with diverse functions in cell recruitment, apoptosis or neuronal development. Some of these genes were already described in connection with the development of other neuropathic diseases, in particular with Alzheimer’s, Parkinson’s and Huntington’s Disease.
Table 2

Top 10 differentially expressed genes (i.e., with the smallest p-values), selected from either merged data sets (left) and network meta-analysis (right), respectively

Rank

Merged analysis

Network meta-analysis

1

COX7B

RHO

2

CXCR3

LTB

3

LTB

CXCR3

4

RHO

COX7B

5

TNFAIP2

TPO

6

SF1

SLC39A2

7

ENO1

HAL

8

SLC4A1

TNFAIP2

9

PNMA1

PNMA1

10

H1FX

MFN2

Bold names indicate 6 genes that appear in both top 10 lists

Looking at the 6 genes that occur in both top 10 lists, CXCR3 is expressed on activated T-lymphocytes, natural killer cells and on B-lymphocyte subsets and mediates T-cell migration into inflammatory areas of the nervous system during viral infection [24, 25]. Furthermore, CXCR3-deficient mice showed an increased mortality rate (associated with higher viral load) after West Nile Virus (WNV) or dengue virus infection [26, 27] that can also lead to neuropathic diseases. In contrast, an elevated level of viral clearance was observed during HSV-1 encephalitis in CXCR3-deficient mice resulting in reduced clinical signs and decreased mortality [28, 29]. CXCR3 activation lead to transactivation of pro-inflammatory genes, and initiation of apoptosis in neurons. To prevent neuronal cell death during WNV Encephalitis, WNV-infected cells induce TNF α-regulated signaling pathways which result in down regulation of CXCR3 [30]. COX7B is one of the small, nucleus-encoded subunits of cytochrome c-oxidase, the terminal complex in the mitochondrial respiratory chain. The small subunits have regulatory functions and play an essential role in complex assembly [31]. Furthermore, mutations lead to microcephaly, indicating a role for COX7B in brain and eye development [32]. Expression changes of COX7B have also been described during the development of neurodegenerative diseases [23, 33, 34]. Anti-PNMa1 autoantibodies can be found in patients with paraneoplastic neurological disorders [35] in connection with brainstem or limbic encephalitis, hypothalamic disorder and dementia. Furthermore, PNMA1 expression is also increased in apoptotic neurons, although the underlying mechanism is poorly understood [36]. Lymphotoxin B (LTB) is a type II membrane protein encoded by the LTB gene and plays a key role during lymph node development, LTB gene deletion in mice leads to a lack of peripheral lymph nodes and Peyer’s patches [37]. LTB only binds its receptor LTBR, leading to NF κB activation and cell death [38, 39]. With respect to ZIKV and HSV-1, these 6 genes could play a similar role.

Among the top 10 genes selected by the analysis of the merged data are ENO1, H1FX, SF1, SLC4A1, which only occur in the network meta-analysis from rank 469 and below, and would probably not be considered in a biological interpretation of the results. ENO1 catalyzes the penultimate step in glycolysis, but is also involved in regulation processes, such as inflammatory cell recruitment [40] and tumor suppression [41]. The protein interacts with ZIKV non-structural proteins and is able to influence cell proliferation and differentiation [42, 43]. H1FX belongs to the histone H1 family. H1 linker histones bind the nucleosomal core particle around the DNA entry and exit sites and stabilize the chromatin structure. In this way, H1 proteins are involved in transcriptional regulation, but also play a role in cell proliferation and differentiation. All H1 variants have the same general structure, but differ in their functions [44]. SLC4A1 is a chloride-bicarbonate exchanger expressed in erythrocytes and intercalated cells of renal collecting ducts. Mutations of SLC4A1 have been described associated with distal renal tubular necrosis and haemolytic anemia [45]. Little is known about SF1 in connection with neuroinfection.

In contrast, the top10 genes selected by network meta-analysis included HAL, MFN2, TPO, SLC39A2 which also occur among the top20 list obtained from the analysis of the merged data. Therefore, these 4 genes would eventually be regarded in the interpretation of both analysis results. Mitofusin 2 (MFN2) GTPase is a mitochondrial membrane protein that is also crucial in mitochondria metabolism [46]. Furthermore, MFN2 is involved in activation of the inflammasome in macrophages during virus infection [47]. The loss of MFN2 lead to an enhanced virus-induced synthesis of IFN β and decreased viral reproduction [48]. Thyroid Peroxidase (TPO) is expressed in the thyroid gland and is essential for thyroid hormonogenesis. Nevertheless, TPO promotor also contains a specific NF κB binding site, leading to transactivation after (LPS) stimulation [49].

In the gene-set enrichment analysis 13 GO terms were identified regardless of the type of analysis. In general, these 13 GO terms could hardly be related to either the neurological or infection context. However, some of the enriched GO terms have been described in connection with viral infection. Polyoma virus infected cells showed an upregulation of genes associated with positive regulation of cell proliferation (GO:0008284) [50]. The term GO:0006977 (DNA damage response, signal transduction by p53 class mediator resulting in cell cycle arrest) is enriched in in neoplastic cells infected with Epstein-Barr Virus (EBV), another member of the herpesvirus family [51]. If only network meta-analysis was performed, GO:0006915 (apoptosis) was selected for example. The term GO:0006915 was identified to be overrepresented in retinal epithelium cells after infection with West Nile virus compared to uninfected cells [52]. In contrast, if only the merged data were analysed, the term GO:0007049 (cell cycle) was selected, which was detected to be enriched among differentially expressed genes in patients with EBV associated infectious mononucleosis [53].

Methodical issues

We have demonstrated in a simulation study and by the analysis of a real-world example that network meta-analysis is a useful tool to make additional inferences from multiple independent studies with high-dimensional molecular expression data. While the results of network meta-analysis are highly correlated with the results of merged data analysis in simply study networks, network meta-analysis showed a higher correlation with the true fold changes than merged data analysis when one edge of the network was supported by multiple independent studies. This might indicate that the step of batch effect removal does not work well in the latter case. In our data analysis we used the ‘ComBat’ method to remove batch effects, and we used the same model for generating the simulation data. Thus, our results could be too optimistic with respect to the performance of the approach of analyzing the merged data. In practice, there may also be other types of batch effects which are not considered by the ‘ComBat’ model. Another batch effect removal approach, ‘FAbatch’, was proposed by Hornung et al. [54] that failed in their evaluation only in the case of extremely outlying batches or in cases where batch effects were very weak compared to the biological signal. Hornung et al. also provide a more detailed discussion on different batch effect models and methods for batch effect removal. These specific cases where batch effect removal fails are also an argument in favor of the network meta-analysis approach. Furthermore, as mentioned in the introduction, batch effect removal might also be a critical step when expression data was taken by different platforms.

In order to further study the issue of multiple batches we generated principal component plots under the simple and under the more complex study network scenarios, each before and after the step of batch effects removal (Additional file 3). Therein, samples of the control groups cluster together after batch effect removal, but samples from the different disease groups form separate clusters that may be represented by different batches. While most methods for batch effect removal have been devised for scenarios with dichotomous target variables (e.g. control versus diseased), typical scenarios of study networks involve multiple groups of different diseases, and these may be represented by different batches. By circumventing the step of batch effect removal, network meta-analysis can provide a helpful alternative over the analysis of merged data sets when there is uncertainty regarding the performance of the batch removal step.

Regarding the number G of genes involved in the analysis, we mentioned in the methods part that genes that are not involved in all studies of the network will be dropped from the analysis. In the case of larger study networks and when using network meta-analysis it would be easily possible to study sub-networks and thus to re-include some of the omitted genes. When using the data merging approach, studying sub-networks with some of the omitted genes re-included would require to newly perform the data merging with batch effect and normalization steps which would in summary make the results from the different sub-networks hard to compare.

Regarding the method for network meta-analysis, we have so far only used the methods by [2], implemented in the R-package ‘netmeta’. A comparison with the results of other network meta-analysis approaches would be an interesting addition which we intend for our future research.

Notes

Acknowledgements

None.

Funding

This work was supported by the Niedersachsen-Research Network on Neuroinfectiology (N-RENNT) of the Ministry of Science and Culture of Lower Saxony. The funding body was not involved in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.

Availability of data and materials

All expression data can be publicly derived from databases NCBI Gene Expression Omnibus or EMBL-EBI ArrayExpress with Databank ID given in “Examples: transcriptome expression profiles in neuroinfectious diseases” section.

Authors’ contributions

KJ formulated the idea for network meta-analysis of high-throughput expression data, performed the simulation studies and supervised the analyses. CW and RK performed the data analysis and contributed to the simulation studies. CW, ML and AO performed the biological interpretation of the results. All authors contributed in writing the manuscript. All authors read and approved the final manuscript.

Ethics approval and consent to participate

No human, animal or plant derived strains have been used directly. This study is instead based on retrospectively analysed, publicly available data.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary material

12859_2019_2705_MOESM1_ESM.r (3 kb)
Additional file 1 Example R-code for network meta-analysis. R-code that demonstrates how fold changes and their standard errors as obtained from ‘limma’ are used for network meta-analysis in the ‘netmeta’-R-package. (R 3 kb)
12859_2019_2705_MOESM2_ESM.png (5 kb)
Additional file 2 Correlation between true and estimated logFC (Simulation no. 1a). Boxplots representing the correlation between true and estimated logFC versus sample size per group observed in the analysis of merged data and in network meta-analysis. 1000 simulation runs of two independent studies were performed with samples of n=10 per group (Simulation no. 1a). (PNG 6 kb)
12859_2019_2705_MOESM3_ESM.pdf (136 kb)
Additional file 3 Principal component plots of merged data before and after batch effect removal. PCA plots of samples within a simple study network (top) or more complex study network (bottom) before and after batch effect removal. After batch effect removal the samples of the control groups cluster together. (PDF 136 kb)

References

  1. 1.
    Lumley T. Network meta-analysis for indirect treatment comparisons. Stat Med. 2002; 21(16):2313–24.PubMedGoogle Scholar
  2. 2.
    Rücker G. Network meta-analysis, electrical networks and graph theory. Res Synth Methods. 2012; 3(4):312–24.PubMedGoogle Scholar
  3. 3.
    Dias S, Sutton AJ, Ades A, Welton NJ. Evidence synthesis for decision making 2: a generalized linear modeling framework for pairwise and network meta-analysis of randomized controlled trials. Med Dec Making. 2013; 33(5):607–17.Google Scholar
  4. 4.
    Sobieraj DM, Coleman CI, Pasupuleti V, Deshpande A, Kaw R, Hernandez AV. Comparative efficacy and safety of anticoagulants and aspirin for extended treatment of venous thromboembolism: A network meta-analysis. Thromb Res. 2015; 135(5):888–96.PubMedGoogle Scholar
  5. 5.
    Lipinski MJ, Benedetto U, Escarcega RO, Biondi-Zoccai G, Lhermusier T, Baker NC, Torguson R, Brewer Jr HB, Waksman R. The impact of proprotein convertase subtilisin-kexin type 9 serine protease inhibitors on lipid levels and outcomes in patients with primary hypercholesterolaemia: a network meta-analysis. Eur Heart J. 2015; 37(6):536–45.PubMedGoogle Scholar
  6. 6.
    Trelle S, Reichenbach S, Wandel S, Hildebrand P, Tschannen B, Villiger PM, Egger M, Jüni P. Cardiovascular safety of non-steroidal anti-inflammatory drugs: network meta-analysis. BMJ. 2011; 342:7086.Google Scholar
  7. 7.
    Tseng GC, Ghosh D, Feingold E. Comprehensive literature review and statistical considerations for microarray meta-analysis. Nucleic Acids Res. 2012; 40(9):3785–99.PubMedPubMedCentralGoogle Scholar
  8. 8.
    Rau A, Marot G, Jaffrézic F. Differential meta-analysis of RNA-seq data from multiple studies. BMC Bioinformatics. 2014; 15(1):91.PubMedPubMedCentralGoogle Scholar
  9. 9.
    Sudmant PH, Alexis MS, Burge CB. Meta-analysis of RNA-seq expression data across species, tissues and studies. Genome Biol. 2015; 16(1):287.PubMedPubMedCentralGoogle Scholar
  10. 10.
    Rhodes DR, Yu J, Shanker K, Deshpande N, Varambally R, Ghosh D, Barrette T, Pandey A, Chinnaiyan AM. Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression. Proc Natl Acad Sci U S A. 2004; 101(25):9309–14.PubMedPubMedCentralGoogle Scholar
  11. 11.
    Logotheti M, Papadodima O, Venizelos N, Chatziioannou A, Kolisis F. A comparative genomic study in schizophrenic and in bipolar disorder patients, based on microarray expression profiling meta-analysis. Sci World J. 2013. Article ID 685917.Google Scholar
  12. 12.
    Kosch R, Delarocque J, Claus P, Becker SC, Jung K. Gene expression profiles in neurological tissues during West Nile virus infection: a critical meta-analysis. BMC Genomics. 2018; 19(1):530.PubMedPubMedCentralGoogle Scholar
  13. 13.
    Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Holko M, Yefanov A, Lee H, Zhang N, Robertson CL, Serova N, Davis S, Soboleva A. NCBI GEO: archive for functional genomics data sets – update. Nucleic Acids Res. 2012; 41(D1):991–5.Google Scholar
  14. 14.
    Kolesnikov N, Hastings E, Keays M, Melnichuk O, Tang YA, Williams E, Dylag M, Kurbatova N, Brandizi M, Burdett T, Megy K, Pilicheva E, Rustici G, Tikhonov A, Parkinson H, Petryszak R, Sarkans U, Brazma A. ArrayExpress update – simplifying data submissions. Nucleic Acids Res. 2014; 43(D1):1113–6.Google Scholar
  15. 15.
    Schena M, Shalon D, Davis RW, Brown PO. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science. 1995; 270(5235):467–70.Google Scholar
  16. 16.
    Anders S, Pyl PT, Huber W. Htseq – a python framework to work with high-throughput sequencing data. Bioinformatics. 2015; 31(2):166–9.PubMedGoogle Scholar
  17. 17.
    Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical bayes methods. Biostatistics. 2007; 8(1):118–27.PubMedPubMedCentralGoogle Scholar
  18. 18.
    Lazar C, Meganck S, Taminau J, Steenhoff D, Coletta A, Molter C, Weiss-Solís DY, Duque R, Bersini H, Nowé A. Batch effect removal methods for microarray gene expression data integration: a survey. Brief Bioinform. 2012; 14(4):469–90.PubMedGoogle Scholar
  19. 19.
    Nygaard V, Rødland EA, Hovig E. Methods that remove batch effects while retaining group differences may lead to exaggerated confidence in downstream analyses. Biostatistics. 2016; 17(1):29–39.PubMedGoogle Scholar
  20. 20.
    Smyth GK. Limma: linear models for microarray data. In: Bioinformatics and computational biology solutions using R and Bioconductor. New York: Springer: 2005. p. 397–420.Google Scholar
  21. 21.
    Zhang F, Hammack C, Ogden SC, Cheng Y, Lee EM, Wen Z, Qian X, Nguyen HN, Li Y, Yao B, et al. Molecular signatures associated with ZIKV exposure in human cortical neural progenitors. Nucleic Acids Res. 2016; 44(18):8610–20.PubMedPubMedCentralGoogle Scholar
  22. 22.
    Dick G, Kitchen S, Haddow A. Zika virus (I). isolations and serological specificity. Trans R Soc Trop Med Hyg. 1952; 46(5):509–20.PubMedGoogle Scholar
  23. 23.
    Kong P, Lei P, Zhang S, Li D, Zhao J, Zhang B. Integrated microarray analysis provided a new insight of the pathogenesis of Parkinson’s disease. Neurosci Lett. 2018; 662:51–8.PubMedGoogle Scholar
  24. 24.
    Liu MT, Chen BP, Oertel P, Buchmeier MJ, Armstrong D, Hamilton TA, Lane TE. Cutting edge: the T cell chemoattractant IFN-inducible protein 10 is essential in host defense against viral-induced neurologic disease. J Immunol. 2000; 165(5):2327–30.PubMedGoogle Scholar
  25. 25.
    Loetscher M, Gerber B, Loetscher P, Jones SA, Piali L, Clark-Lewis I, Baggiolini M, Moser B. Chemokine receptor specific for IP10 and mig: structure, function, and expression in activated T-lymphocytes. J Exp Med. 1996; 184(3):963–9.PubMedGoogle Scholar
  26. 26.
    Hsieh M-F, Lai S-L, Chen J-P, Sung J-M, Lin Y-L, Wu-Hsieh BA, Gerard C, Luster A, Liao F. Both CXCR3 and CXCL10/IFN-inducible protein 10 are required for resistance to primary infection by dengue virus. J Immunol. 2006; 177(3):1855–63.PubMedGoogle Scholar
  27. 27.
    Zhang B, Chan YK, Lu B, Diamond MS, Klein RS. CXCR3 mediates region-specific antiviral T cell trafficking within the central nervous system during west nile virus encephalitis. J Immunol. 2008; 180(4):2641–9.PubMedGoogle Scholar
  28. 28.
    Lundberg P, Openshaw H, Wang M, Yang H-J, Cantin E. Effects of CXCR3 signaling on development of fatal encephalitis and corneal and periocular skin disease in HSV-infected mice are mouse-strain dependent. Investig Ophthalmol Vis Sci. 2007; 48(9):4162–70.Google Scholar
  29. 29.
    Zimmermann J, Hafezi W, Dockhorn A, Lorentzen EU, Krauthausen M, Getts DR, Müller M, Kühn JE, King NJ. Enhanced viral clearance and reduced leukocyte infiltration in experimental herpes encephalitis after intranasal infection of CXCR3-deficient mice. J Neurovirol. 2017; 23(3):394–403.PubMedGoogle Scholar
  30. 30.
    Zhang B, Patel J, Croyle M, Diamond MS, Klein RS. TNF- α-dependent regulation of CXCR3 expression modulates neuronal survival during West Nile virus encephalitis. J Neuroimmunol. 2010; 224(1):28–38.PubMedPubMedCentralGoogle Scholar
  31. 31.
    Li Y, Park J-S, Deng J-H, Bai Y. Cytochrome coxidase subunit IV is essential for assembly and respiratory function of the enzyme complex. J Bioenerg Biomembr. 2006; 38(5-6):283–91.PubMedPubMedCentralGoogle Scholar
  32. 32.
    Indrieri A, van Rahden VA, Tiranti V, Morleo M, Iaconis D, Tammaro R, D’Amato I, Conte I, Maystadt I, Demuth S, et al. Mutations in COX7B cause microphthalmia with linear skin lesions, an unconventional mitochondrial disease. Am J Hum Genet. 2012; 91(5):942–9.PubMedPubMedCentralGoogle Scholar
  33. 33.
    Naughton BJ, Duncan FJ, Murrey DA, Meadows AS, Newsom DE, Stoicea N, White P, Scharre DW, Mccarty DM, Fu H. Blood genome-wide transcriptional profiles reflect broad molecular impairments and strong blood-brain links in alzheimer’s disease. J Alzheimers Dis. 2015; 43(1):93–108.PubMedPubMedCentralGoogle Scholar
  34. 34.
    Zhang L, Guo X, Chu J, Zhang X, Yan Z, Li Y. Potential hippocampal genes and pathways involved in Alzheimer’s disease: a bioinformatic analysis. Genet Mol Res. 2015; 14:7218–32.PubMedGoogle Scholar
  35. 35.
    Dalmau J, Gultekin SH, Voltz R, Hoard R, DesChamps T, Balmaceda C, Batchelor T, Gerstner E, Eichen J, Frennier J, et al. Ma1, a novel neuron-and testis-specific protein, is recognized by the serum of patients with paraneoplastic neurological disorders. Brain. 1999; 122(1):27–39.PubMedGoogle Scholar
  36. 36.
    Chen H-L, D’mello SR. Induction of neuronal cell death by paraneoplastic Ma1 antigen. J Neurosci Res. 2010; 88(16):3508–19.PubMedPubMedCentralGoogle Scholar
  37. 37.
    Ware CF. Network communications: lymphotoxins, LIGHT, and TNF. Annu Rev Immunol. 2005; 23:787–819.PubMedGoogle Scholar
  38. 38.
    Browning JL, Ngam-ek A, Lawton P, DeMarinis J, Tizard R, Chow EP, Hession C, O’Brine-Greco B, Foley SF, Ware CF. Lymphotoxin β, a novel member of the TNF family that forms a heteromeric complex with lymphotoxin on the cell surface. Cell. 1993; 72(6):847–56.PubMedGoogle Scholar
  39. 39.
    VanArsdale TL, VanArsdale SL, Force WR, Walter BN, Mosialos G, Kieff E, Reed JC, Ware CF. Lymphotoxin- β receptor signaling complex: role of tumor necrosis factor receptor-associated factor 3 recruitment in cell death and activation of nuclear factor κb. Proc Natl Acad Sci. 1997; 94(6):2460–5.PubMedGoogle Scholar
  40. 40.
    Plow EF, Hoover-Plow J. The functions of plasminogen in cardiovascular disease. Trends Cardiovasc Med. 2004; 14(5):180–6.PubMedGoogle Scholar
  41. 41.
    Ejeskär K, Krona C, Carén H, Zaibak F, Li L, Martinsson T, Ioannou PA. Introduction of in vitro transcribed ENO1 mRNA into neuroblastoma cells induces cell death. BMC Cancer. 2005; 5(1):161.PubMedPubMedCentralGoogle Scholar
  42. 42.
    Kazmirchuk T, Dick K, Burnside DJ, Barnes B, Moteshareie H, Hajikarimlou M, Omidi K, Ahmed D, Low A, Lettl C, et al. Designing anti-Zika virus peptides derived from predicted human-Zika virus protein-protein interactions. Comput Biol Chem. 2017; 71:180–7.PubMedGoogle Scholar
  43. 43.
    Schmechel D, Brightman M, Marangos P. Neurons switch from non-neuronal enolase to neuron-specific enolase during differentiation. Brain Res. 1980; 190(1):195–214.PubMedGoogle Scholar
  44. 44.
    Izzo A, Kamieniarz K, Schneider R. The histone H1 family: specific members, specific functions?. Biol Chem. 2008; 389(4):333–43.PubMedGoogle Scholar
  45. 45.
    Fawaz NA, Beshlawi IO, Al Zadjali S, Al Ghaithi HK, Elnaggari MA, Elnour I, Wali YA, Al-Said BB, Rehman JU, Pathare AV, et al. dRTA and hemolytic anemia: first detailed description of SLC4A1 A858D mutation in homozygous state. Eur J Haematol. 2012; 88(4):350–5.PubMedGoogle Scholar
  46. 46.
    Bach D, Pich S, Soriano FX, Vega N, Baumgartner B, Oriola J, Daugaard JR, Lloberas J, Camps M, Zierath JR, et al. Mitofusin-2 determines mitochondrial network architecture and mitochondrial metabolism a novel regulatory mechanism altered in obesity. J Biol Chem. 2003; 278(19):17190–7.PubMedGoogle Scholar
  47. 47.
    Ichinohe T, Yamazaki T, Koshiba T, Yanagi Y. Mitochondrial protein mitofusin 2 is required for NLRP3 inflammasome activation after RNA virus infection. Proc Natl Acad Sci. 2013; 110(44):17963–8.PubMedGoogle Scholar
  48. 48.
    Yasukawa K, Oshiumi H, Takeda M, Ishihara N, Yanagi Y, Seya T, Kawabata S-i, Koshiba T. Mitofusin 2 inhibits mitochondrial antiviral signaling. Sci Signal. 2009; 2(84):47.Google Scholar
  49. 49.
    Nazar M, Nicola JP, Vélez ML, Pellizas CG, Masini-Repiso AM. Thyroid peroxidase gene expression is induced by lipopolysaccharide involving nuclear factor (NF)- κb p65 subunit phosphorylation. Endocrinology. 2012; 153(12):6114–25.PubMedGoogle Scholar
  50. 50.
    Grinde B, Gayorfar M, Rinaldo CH. Impact of a polyomavirus (BKV) infection on mRNA expression in human endothelial cells. Virus Res. 2007; 123(1):86–94.PubMedGoogle Scholar
  51. 51.
    Wu S, Zhang X, Li Z-M, Shi Y-X, Huang J-J, Xia Y, Yang H, Jiang W-Q. Partial least squares based gene expression analysis in ebv-positive and ebv-negative posttransplant lymphoproliferative disorders. Asian Pac J Cancer Prev. 2013; 14(11):6347–50.PubMedGoogle Scholar
  52. 52.
    Munoz-Erazo L, Natoli R, Provis JM, Madigan MC, King NJC. Microarray analysis of gene expression in West Nile virus–infected human retinal pigment epithelium. Mol Vis. 2012; 18:730.PubMedPubMedCentralGoogle Scholar
  53. 53.
    Poorebrahim M, Salarian A, Najafi S, Abazari MF, Aleagha MN, Dadras MN, Jazayeri SM, Ataei A, Poortahmasebi V. Regulatory network analysis of Epstein-Barr virus identifies functional modules and hub genes involved in infectious mononucleosis. Arch Virol. 2017; 162(5):1299–309.PubMedGoogle Scholar
  54. 54.
    Hornung R, Boulesteix A-L, Causeur D. Combining location-and-scale batch effect adjustment with data cleaning by latent factor adjustment. BMC Bioinformatics. 2016; 17(1):27.PubMedPubMedCentralGoogle Scholar

Copyright information

© The Author(s) 2019

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Authors and Affiliations

  1. 1.Institute for Animal Breeding and GeneticsUniversity of Veterinary Medicine HannoverHannoverGermany
  2. 2.Research Center for Emerging Infections and Zoonoses, University of Veterinary Medicine HannoverHannoverGermany

Personalised recommendations