Methods of Analysis and Meta-Analysis for Identifying Differentially Expressed Genes

  • Panagiota I Kontou
  • Athanasia Pavlopoulou
  • Pantelis G. Bagos
Part of the Methods in Molecular Biology book series (MIMB, volume 1793)


Microarray approaches are widely used high-throughput techniques to assess simultaneously the expression of thousands of genes under certain conditions and study the effects of certain treatments, diseases, and developmental stages. The traditional way to perform such experiments is to design oligonucleotide hybridization probes that correspond to specific genes and then measure the expression of the genes in order to determine which of them are up- or down-regulated compared to a condition that is used as a control. However, individual experiments cannot capture the bigger picture of how a biological system works; data integration from multiple experimental studies and external data repositories is therefore necessary to understand the function of genes and their expression patterns under certain conditions. Consequently, methods for handling, integrating, comparing, interpreting and visualizing microarray data are needed. The selection of an appropriate method for analysing microarray datasets is not an easy task. In this chapter, we provide an overview of the various methods developed for microarray data analysis, as well as suggestions for choosing the appropriate method for microarray meta-analysis.

Key words

Gene expression; Microarrays; Differentially expressed genes; Meta-analysis; Statistical tests; Multiple comparisons

1 Introduction

Gene expression microarrays have been used in various applications, including the identification of novel genes associated with certain diseases (most notably cancers), tumor classification, and prediction of patient outcome.

In a microarray experiment, the mRNA levels of thousands of genes are measured simultaneously in tissue samples. One basic method for preparing microarrays is spotting arrays on plates. Each spot on a microarray plate is designed to contain multiple identical copies of single DNA strands, fragments or oligonucleotides that represent specific gene coding regions, and are referred to as “probes.” Each spot or a set of spots corresponds to one gene. The order of the probes on the chip is stored in a computer database so that results can be obtained easily. Probes are designed in such a way that they are uniquely complementary to purified RNA or DNA fragments which are fluorescently or radioactively prelabeled. The probes are then hybridized to their corresponding target sequences. The more RNA or DNA fragments get attached to a spot, the higher the fluorescent or radioactive signal; thus, the intensity of a set of spots represents the expression of a gene. After thorough washing to remove non-specifically bound sequences, the raw microarray data are obtained by laser scanning or autoradiographic imaging. Microarrays can be fabricated using various technologies.

In spotted microarrays, the probes, which are oligonucleotides, cDNA or small fragments of PCR products that correspond to mRNAs, are “spotted” onto the surface. Spotted microarrays are “customizable”, since the researcher can choose the probes for each experimental study [1]. In oligonucleotide microarrays, with Agilent and Affymetrix being the most popular platforms, the probes are short sequences designed to be complementary to parts of the target sequence so that a gene is represented by a set of probes (probe-set) instead of a single probe. Contrary to spotted microarrays, the probes are synthesized directly onto the surface. The length of the oligo sequences depends on the specific experimental needs [2].

Dual-channel (or two-color) microarrays are typically hybridized with cDNA prepared from two samples to be compared (e.g., diseased tissue versus healthy tissue). These samples are labeled with two different fluorophores (e.g., Cy3 and Cy5) that have different emission wavelengths. Relative signal intensities of each fluorophore are used to measure differential gene expression [3]. In single-channel (or one-color) microarrays, contrary to dual-channel microarrays, the samples to be compared are labeled with a single fluorophore. Relative signal intensities for each probe or probe-set reflect the expression level of the labeled target sequence. The main representatives of single-channel microarray platforms are Affymetrix, Illumina and Agilent [4].

The selection of a microarray platform is based on cost, chip availability for the species under analysis, genome coverage, the starting amount of RNA, the quality of array manufacturing, the validity and availability of software tools for image analysis, and intra-platform variability [5, 6].

After hybridization, image analysis is performed [7], followed by pre-filtering/masking for microarray signal correction. Background signal adjustment is also recommended before scaling. Normalization is performed to adjust microarray data for effects that are attributed to technology variations [8]. Typical normalization methods include the rank invariant normalization [9], quantile [10] and LOWESS/LOESS methods [9]. For many types of commercial arrays, suites of R-BioConductor-based packages [11], such as RMA (Robust Multi-array Average expression measure) [12] and the MAS 5.0 algorithm [13], are used to perform consecutive background adjustments and data normalization.

After pre-processing and normalization (and potentially some other steps, such as filtering, imputation of missing values and standardization), we usually end up with an expression matrix that contains the expression values for each probe. The objectives of an analysis may be classified into three broad classes: identification of differentially expressed genes (DEGs), classification/class prediction, and clustering.

Clustering of microarray data seeks to group genes based on specific features in a biologically meaningful manner. Clustering operates in an unsupervised manner. There are clustering methods that require the number of clusters to be defined beforehand and methods where the number of clusters is automatically defined. There are several clustering methods available, the most popular being the Unweighted Pair Group Method with Arithmetic Mean (UPGMA), as well as other hierarchical clustering methods for tree-based representations. Evolutionary tree-based algorithms such as Neighbor Joining could also be applied in clustering. In the widely used k-means algorithm, the number of clusters is pre-defined. Another popular clustering algorithm, the Self-Organizing Map (SOM) [14], produces ordered low-dimensional representations of an input data space, and is particularly well suited for exploratory data analysis. Most of the aforementioned methods are implemented in BioConductor [11], Expander [15] and Hierarchical Clustering Explorer (HCE) [16].

Classification of microarray data refers to class prediction from gene expression patterns. In classification, the classes, two or more, (e.g., healthy individuals vs. diseased), are predefined and a classifier is built to discriminate between the classes in future applications [17, 18], most notably screening and diagnosis [19]. A wide variety of supervised methods have been designed for classification, including Neural Networks [20], Support Vector Machines [21], Graphical Models [22], genetic algorithms [23], nearest neighbour classifiers and many other statistical methods such as shrunken centroids [24] and Partial Least Squares and Discriminant analysis [25]. Due to the large number of features given as input to the various classifiers, a subsequent problem is to select the subset of features that can be used efficiently by the classifier. This problem is known as feature selection in machine learning and statistics [26]. A great number of feature selection methods tailored for microarray studies have been developed. Comparison of such methods in gene expression classification can be found in several excellent reviews and evaluation studies [27, 28, 29].

The topic of this review is the description of methodologies for the identification of DEGs. The main objective is to identify which genes are differentially expressed, that is, up- or down-regulated, under different conditions. Ideally, the identification of DEGs is a simple procedure reduced to a statistical test for the equality of means (e.g., t-test, see below). Microarray datasets, however, are characterised by several key distinctive features such as a small number of samples, a large number of variables and an excessive amount of noise; therefore, several advanced statistical methods have been proposed to handle these issues efficiently. Moreover, the generation of similar datasets from various laboratories highlights the need for combining these datasets in order to increase the sample size. This approach, which is termed “meta-analysis” in the medical literature, has become increasingly popular in recent years, and a variety of meta-analysis methods have been developed. In this review, we explore the statistical methods for analysis and meta-analysis of DEGs arising from microarray experiments by focusing on the simple case of comparing only two classes (disease-healthy, treated-non treated etc.). First, the analysis methods for detecting DEGs, starting with the well-known t-test, as well as the various modifications of these methods proposed for different microarray datasets, are described. Afterwards, the methods for meta-analysis of microarray datasets are presented. Moreover, the related software implementations are listed, as well as the novel variants of these methods. Examples of microarray data analysis and meta-analysis are also presented.

2 Methods

2.1 Methods of Analysis of Differentially Expressed Genes

Earlier microarray publications assessed differential expression merely in terms of fold-change (FC), with an FC of ±2 being considered a reliable cut-off value. However, FC cut-off values do not take intra-dataset variability into account or ensure reproducibility. Moreover, the FC-based ranking is not adequate, since a gene with larger variance in expression values has a higher probability of having a larger statistic value. On the other hand, it has been suggested that FC-based methods result in lists of DEGs that are more reproducible. However, reproducibility does not signify accuracy, and the question of whether to use FC to identify DEGs is essentially biological, rather than statistical [30].
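
As a minimal sketch of the FC criterion described above, the following Python snippet computes a signed fold change from two group means on the raw (non-log) intensity scale and applies the ±2 cut-off. The function names, the signed-FC convention (ratios below 1 are reported as negative reciprocals) and the example values are illustrative assumptions, not taken from the cited studies.

```python
def signed_fold_change(mean_condition, mean_control):
    """Signed fold change on the raw intensity scale (illustrative convention)."""
    ratio = mean_condition / mean_control
    # Assumed convention: down-regulation is reported as a negative reciprocal,
    # so that a halving of expression gives -2 rather than 0.5.
    return ratio if ratio >= 1 else -1 / ratio


def passes_fc_cutoff(mean_condition, mean_control, cutoff=2.0):
    """True if the absolute signed fold change reaches the cut-off."""
    return abs(signed_fold_change(mean_condition, mean_control)) >= cutoff


fc_up = signed_fold_change(200.0, 100.0)    # twofold up-regulation
fc_down = signed_fold_change(50.0, 100.0)   # twofold down-regulation
```

Note that the cut-off uses only the group means, which is why it cannot account for within-group variability.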

2.1.1 t-Test

The t-test assesses whether the means of two groups have statistically significant differences. The one sample t-test is a statistical method used to determine the mean difference between the sample and the known or hypothesized value of the population mean. The null hypothesis assumes that there are no significant differences between the sample and the population mean. In real life applications, however, the one sample t-test is used for testing statistical differences in paired observations, either measuring the same sample (for instance, an individual before and after treatment) or, more generally, samples that are somehow grouped (matched observations):
$$ {\overline{X}}_1-{\overline{X}}_2={\overline{X}}_D $$
To calculate the value of the one sample t-test, we use the formula:
$$ t=\frac{{\overline{X}}_D}{S_D/\sqrt{n}} $$

In this analysis, subscripts 1 and 2 denote the two conditions, the mean difference of which is assumed to be zero according to the null hypothesis; S_D is the standard deviation of the paired differences and n is the number of pairs. The t-statistic is compared against a t-distribution with n − 1 degrees of freedom.
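
The paired statistic above can be sketched in a few lines of Python using only the standard library. The function name and the expression values are hypothetical and serve only to illustrate the formula; obtaining a p-value would additionally require the t-distribution with n − 1 degrees of freedom.

```python
import math
import statistics


def paired_t(x1, x2):
    """Return (t, degrees of freedom) for the paired t-test on x1 - x2."""
    diffs = [a - b for a, b in zip(x1, x2)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD with n - 1 in the denominator
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1


# Hypothetical log2 expression of one gene in four matched samples
x1 = [8.0, 10.0, 7.0, 11.0]   # e.g., after treatment
x2 = [7.0, 8.0, 6.0, 9.0]     # e.g., before treatment
t_stat, df = paired_t(x1, x2)
```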

The two sample t-test is used to compare two independent population means. The null hypothesis here assumes that the means of two populations are equal. The t-statistic assuming equal variances can be calculated as follows:
$$ t=\frac{{\overline{X}}_1-{\overline{X}}_2}{S_{\mathrm{p}}\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}} $$
where the pooled standard deviation Sp is given by the formula:
$$ {S}_{\mathrm{p}}=\sqrt{\frac{\left({n}_1-1\right){S_1}^2+\left({n}_2-1\right){S_2}^2}{n_1+{n}_2-2}} $$

The t-statistic is compared against a t-distribution with n1 + n2 − 2 degrees of freedom.

In the case of unequal variances, the appropriate t-statistic to test the equality of population means is calculated as:
$$ t=\frac{{\overline{X}}_1-{\overline{X}}_2}{S_{{\overline{X}}_1-{\overline{X}}_2}} $$
$$ {S}_{{\overline{X}}_1-{\overline{X}}_2}=\sqrt{\frac{{S_1}^2}{n_1}+\frac{{S_2}^2}{n_2}} $$
However, in this case the asymptotic distribution is difficult to obtain, and a reasonable approximation is to use a t-distribution with degrees of freedom (d.f.) given by the formula:
$$ \mathrm{d}.\mathrm{f}.=\frac{{\left({S_1}^2/{n}_1+{S_2}^2/{n}_2\right)}^2}{{\left({S_1}^2/{n}_1\right)}^2/\left({n}_1-1\right)+{\left({S_2}^2/{n}_2\right)}^2/\left({n}_2-1\right)} $$
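
The equal-variance and unequal-variance statistics above can be sketched side by side as follows. The function names and the data are hypothetical; the code only evaluates the formulas given in this section and leaves the p-value lookup to a statistical package.

```python
import math
import statistics


def pooled_t(x1, x2):
    """Two-sample t-statistic assuming equal variances, with its d.f."""
    n1, n2 = len(x1), len(x2)
    v1, v2 = statistics.variance(x1), statistics.variance(x2)
    sp = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    t = (statistics.mean(x1) - statistics.mean(x2)) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def unequal_variance_t(x1, x2):
    """Two-sample t-statistic without the equal-variance assumption."""
    n1, n2 = len(x1), len(x2)
    a = statistics.variance(x1) / n1
    b = statistics.variance(x2) / n2
    t = (statistics.mean(x1) - statistics.mean(x2)) / math.sqrt(a + b)
    # Approximate degrees of freedom from the formula above
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


x1 = [5.0, 6.0, 7.0, 8.0]
x2 = [1.0, 2.0, 3.0, 4.0]
t_pooled, df_pooled = pooled_t(x1, x2)
t_unequal, df_unequal = unequal_variance_t(x1, x2)
```

In this example the two groups have equal sizes and equal sample variances, so both versions give the same t-statistic and the approximate degrees of freedom coincide with n1 + n2 − 2.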

A drawback of the t-test in microarray data analysis is that most microarray experiments contain only a few samples in each group (n1 and n2), so the assumption of normality may not hold and cannot be checked reliably. Thus, several alternatives to the t-test have been proposed in the literature.

2.1.2 Resampling Methods

Bootstrap [31, 32] is a statistical method for estimating the sampling distribution of an estimator by sampling with replacement from the original sample. Bootstrap provides an ideal alternative method when no formula for the sampling distribution is available or when available formulas make inappropriate assumptions (e.g., small sample size, non-normal distribution). The accuracy of bootstrapping depends on the number of observations in the original sample and the number of replications. A crudely estimated sampling distribution is adequate to calculate, for instance, a standard error; a better estimate is needed for constructing a 95% confidence interval. There are various methods for constructing a Bootstrap confidence interval from the resampled statistics, such as the normal approximation method, the bias corrected method, the percentile method and the t-percentile method [33]. Generally, replications of the order of 1000 produce very accurate estimates, although more may be needed for the accurate estimation of p-values. Only 50–200 replications are needed for estimating standard errors, though this may have implications in meta-analysis (see below). Various methods have been proposed for estimating the adequate number of replications [34, 35]. The Bootstrap has been applied in microarray experiments and empirical evidence suggests that it produces accurate estimates, at least for moderate sample sizes [36]. For really small sample sizes (i.e., <10), various modifications to the standard bootstrap method have been proposed [37, 38].
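
A minimal sketch of a bootstrap standard error for the difference of two group means is given below. It resamples each group separately with replacement; the function name, the default of 200 replications (within the 50–200 range mentioned above), the fixed seed and the data are illustrative assumptions.

```python
import random
import statistics


def bootstrap_se(x1, x2, n_boot=200, seed=0):
    """Bootstrap standard error of mean(x1) - mean(x2)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_boot):
        r1 = rng.choices(x1, k=len(x1))  # resample with replacement
        r2 = rng.choices(x2, k=len(x2))
        diffs.append(statistics.mean(r1) - statistics.mean(r2))
    # The spread of the resampled statistic estimates its standard error
    return statistics.stdev(diffs)


x1 = [5.0, 6.0, 7.0, 8.0]
x2 = [1.0, 2.0, 3.0, 4.0]
se = bootstrap_se(x1, x2)
```

Confidence intervals or p-values would need many more replications than this default, as noted above.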

A conceptually different resampling method is the permutation test. This is a type of statistical significance test where the distribution of the test statistic under the null hypothesis is obtained by calculating all possible values of the test statistic following rearrangements of the labels on the observations. If under the null hypothesis the labels are exchangeable, then the resulting tests generate exact significance levels. Confidence intervals can then be derived from the tests. The theory has evolved from the works of Fisher and Pitman in the 1930s (reviewed in Kaiser [39]). For small samples, all possible permutations can be evaluated; however, for sample sizes >15, a random sample of the permutations is used instead, hence the name Monte Carlo permutation. An important assumption underlying a permutation test is that under the null hypothesis the observations are exchangeable. A consequence of this is that tests of difference in location (e.g., t-test) require equal variance. In this respect, the permutation t-test has the same weakness as the classical Student’s t-test (i.e., the Behrens–Fisher problem). In general, since the permutation test computes a p-value by counting the times the permuted test statistic is at least as large as the observed one, a large number of replications is required (typically of the order of 1000 or more). Permutation tests have been used for the analysis of microarray data [40]. When sample sizes are very small, the number of distinct permutations can be severely restricted, and combining the permutation-derived test statistics across genes has been proposed. However, since the null distribution of the test statistics under permutation is not the same for all genes, this can have a negative impact on p-value estimation [41].
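
For small samples, the exact permutation test described above can be sketched by enumerating every relabeling of the pooled observations. For simplicity, this sketch uses the absolute difference of group means as the test statistic rather than the t-statistic; the function name and the data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p(x1, x2):
    """Two-sided exact permutation p-value for the difference in means."""
    pooled = list(x1) + list(x2)
    indices = range(len(pooled))
    observed = abs(mean(x1) - mean(x2))
    extreme = total = 0
    # Enumerate every way of assigning len(x1) observations to group 1
    for chosen in combinations(indices, len(x1)):
        in_group1 = set(chosen)
        g1 = [pooled[i] for i in in_group1]
        g2 = [pooled[i] for i in indices if i not in in_group1]
        # Small tolerance guards against floating-point ties
        if abs(mean(g1) - mean(g2)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


x1 = [5.0, 6.0, 7.0, 8.0]
x2 = [1.0, 2.0, 3.0, 4.0]
p_value = exact_permutation_p(x1, x2)  # 2 of the 70 relabelings are as extreme
```

With four samples per group there are only 70 distinct relabelings, so the smallest attainable two-sided p-value is 2/70. This illustrates how severely very small sample sizes restrict the permutation distribution.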

Bootstrap and permutation methods are readily available in major statistical packages like Stata [42] and R [43]. There are various implementations of the Bootstrap available in Stata (bootstrap command) and in R (boot package). Permutation tests can also be performed using the permute and permtest (for paired observations) commands in Stata, as well as the perm package in R. In the Supplement, we give examples of performing bootstrapping and the permutation t-test in Stata.

2.1.3 Bayesian Methods

The Bayesian methods provide an intuitively appealing framework for handling most of the problems encountered in the analysis of microarray data. Several Bayesian methods have been developed that replace the t-test, which is one of the simplest and most widely used statistical methods in microarray expression data analysis. These Bayesian methods share some common features but also differ markedly according to various criteria, especially in the prior distribution for the hyperparameters. Moreover, some of these methods are oriented towards hypothesis testing by relying on the Bayes Factor to compare the null against the alternative hypothesis [44, 45, 46, 47]; other methods are oriented towards parameter estimation and compute credible intervals for the parameters of interest, for example the difference of the means [48, 49]. One of the advantages of the t-test is that its simplicity allows in many cases a closed-form expression to be derived, especially for the Bayes Factor [44, 45, 46, 47], whereas other methods rely on Markov Chain Monte Carlo (MCMC) to sample from the posterior distribution [48, 49]. Another major advantage of the Bayesian methods is that, within the Bayesian framework, one can incorporate not only the uncertainty regarding the parameters and the small sample size, but also multiple testing, which is very important in microarray analysis [44, 50, 51].

There are several software implementations available of the aforementioned Bayesian methods. For instance, the Bayes Factor method of Rouder and coworkers [45], which is known as the Jeffreys–Zellner–Siow (JZS) t-test, is available as a web-calculator, as well as an R package. The Savage–Dickey (SD) t-test, proposed by Wetzels and coworkers [48], is inspired by the JZS t-test and retains its key concepts. It is, however, applicable to a wider range of statistical problems, since it allows researchers to test order restrictions and applies to two-sample situations with unequal variance. The SD t-test is also implemented in an R package that uses WinBUGS. Finally, the BEST (Bayesian Estimation Supersedes the t-test) software package provides a Bayesian alternative to the t-test that yields much richer information than a simple p-value, such as complete distributions of credible values for the effect size, the difference of means between groups, the difference of standard deviations, and the normality of the data within the groups [49]. The BEST package is implemented in R and is also available as an online calculator. Moreover, the BEST method is implemented in the Bayesian First Aid package, which aims to provide user-friendly Bayesian alternatives to the most widely used estimation commands.

2.1.4 Penalized t-Test

As we have already mentioned, the ordinary t-test is not recommended for microarray experiments because a large t-statistic can be driven by an unrealistically small value for S². Genes with small sample variances, possibly due to a very small sample size, have a good chance of giving a large t-statistic even if they are not expressed differentially. Alternative methods have been proposed in order to alleviate such problems, namely, penalized, moderated or regularized t-tests. Most of these methods are characterized by an empirical Bayesian justification, and hence share a lot of common features with the Bayesian methods; other methods mainly consist of ad-hoc rules. In any case, all of these methods apply certain types of modification to the denominator in the t-test formula by increasing the variance [52]. Thus, they all have the same interpretation as an ordinary t-statistic, except that the standard errors have been moderated across genes. Baldi and Long were among the first to discuss Bayesian methods for the t-test in the context of microarray experiments [53, 54] and they preferred to develop an empirical Bayesian regularized t-test method with variance equal to:
$$ {S}_{\mathrm{Cyber}\hbox{-} \mathrm{T}}^2=\frac{\nu_0{\sigma_0}^2+\left(n-1\right){S}^2}{\nu_0+n-2} $$

This method is implemented in the web-server Cyber-T and in R. The parameter ν0 represents the degree of confidence in the background variance σ0² versus the empirical variance. In Cyber-T, the value of ν0 is user defined; the smaller the value of n, the larger the value of ν0 should be. A simple rule of thumb is to assume that K > 2 data points are needed to properly estimate the standard deviation and keep n + ν0 = K. This allows a more flexible treatment of situations in which the number n of available data points varies from gene to gene. The default value of K is 10. In particular, by using this approach, the empirical variance is supplemented by ν0 “pseudo-observations” with a background variance σ0². For σ0, one could use the standard deviation of the entire dataset or of particular categories of genes. Cyber-T uses a flexible approach under which the background standard deviation is estimated by pooling together all the neighbouring genes contained in a window of size w (the default value of w is 101, corresponding to 50 genes on either side of the gene under consideration).
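
The regularized variance and the rule of thumb n + ν0 = K can be sketched as follows. This is not the Cyber-T implementation: the function name is hypothetical, the background variance is passed in directly instead of being estimated from a window of neighbouring genes, and setting ν0 to zero when n ≥ K is an assumption made here.

```python
def regularized_variance(s2, n, sigma0_sq, k=10):
    """Regularized variance following the formula above.

    s2        : empirical variance S^2 of the gene
    n         : number of observations for the gene
    sigma0_sq : background variance (assumed to be supplied by the caller)
    k         : total of real plus pseudo-observations, n + nu0 = K
    """
    nu0 = max(k - n, 0)  # assumption: no pseudo-observations once n >= K
    return (nu0 * sigma0_sq + (n - 1) * s2) / (nu0 + n - 2)


# A gene with an unrealistically small empirical variance and n = 4,
# so nu0 = 6 pseudo-observations are added with background variance 0.25
var_reg = regularized_variance(s2=0.01, n=4, sigma0_sq=0.25)
```

Here the regularized variance is pulled from 0.01 towards the background value, which prevents a tiny denominator from inflating the t-statistic.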

Another empirical Bayesian method is the method of Lönnstedt and Speed [55] which uses the moderated variance:
$$ {S}_{\mathrm{LS}}^2=a+{S}^2 $$
where the penalty a is estimated from the mean and standard deviation of the sample variances S². Later, Smyth [56] provided a variant of this formula, which is implemented in the well-known limma (linear models for microarray data) package:
$$ {S}_{limma}^2=\frac{\nu_0{\sigma_0}^2+{nS}^2}{\nu_0+n} $$
Here, ν0 and σ0 (d0 and s0 in Smyth’s notation) are estimated from the data with the method of moments using an empirical Bayesian approach. The limma method is one of the most widely used methods for analysing DEGs, and it is available as a Bioconductor package in R. Tusher et al. [57] and Efron et al. [58] also used a penalized t-statistic:
$$ {S}_{\mathrm{SAM}}=a+S $$

This differs slightly from the previous statistics in that the penalty a is applied to the sample standard deviation S rather than to the sample variance S². Tusher et al. [57], in the so-called “Significance Analysis of Microarrays (SAM)” method, chose a to minimize the coefficient of variation of the absolute t-values, while Efron et al. [58] used the 90th percentile of the S values as a. These choices are based on empirical rather than theoretical considerations. SAM is one of the oldest and most widely used methods; it is available as an Excel plugin and is also implemented in several R packages (samr, ema).
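
A minimal sketch of the penalized statistic, with the penalty a set to the 90th percentile of the S values as in Efron et al. [58], is shown below. The inputs are assumed to be per-gene mean differences and S values computed beforehand; the function names, the nearest-rank percentile definition and the example values are illustrative assumptions and do not reproduce the SAM software.

```python
import math


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def penalized_statistics(mean_diffs, s_values, pct=90):
    """Return the penalty a and the statistics diff / (a + S) for every gene."""
    a = nearest_rank_percentile(s_values, pct)
    return a, [d / (a + s) for d, s in zip(mean_diffs, s_values)]


# Ten hypothetical genes; the first and last have the same ratio diff / S = 2
s_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
mean_diffs = [0.2] + [0.0] * 8 + [2.0]
a, d = penalized_statistics(mean_diffs, s_values)
```

Although the first and last genes have the same unpenalized ratio, the penalty ranks the gene with a small difference and a small S well below the gene with a large difference.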

Ranking Analysis of Microarray data (RAM) uses another type of regularization, which is based on the observation that even when the sample size is small, the fudging effect using the modified t-statistic is still quite strong [59]. In particular, small sample size often leads to an unreasonably large value of a that dominates the test statistic and consequently reduces the power of the analysis. Thus, the authors proposed the following:
$$ {S}_{\mathrm{RAM}}=\left\{\begin{array}{l}1+S\ \mathrm{if}\ {\overline{X}}_D>S<1\\ {}S\ \mathrm{otherwise}\end{array}\right. $$

RAM is based on the comparisons between a set of ranked t-statistics and a set of ranked Z-values (a set of ranked estimated null scores) yielded by a “randomly splitting” (RS) approach instead of the permutation approach used by SAM. Results obtained from simulated and real microarray data revealed that RAM is more efficient in the identification of DEGs under undesirable conditions such as small sample size, a large fudge factor, or mixture distribution of noises compared to SAM.

The regularised t-statistics have many desired properties. In particular, they are easily computed, have a natural interpretation, and are less computationally intensive compared to the full Bayesian methods and the resampling approaches. Moreover, simulation studies [30] have shown that regularised t-statistics are superior to the ordinary t-statistic for detecting DEGs, even when the sample size is very small (n < 10). The penalized t-statistics can also be extended in several ways to apply to more general experimental situations. A disadvantage is that the null distribution of the modified t-statistic is not standard. Baldi and Long [53], as well as Smyth [56], rely on a modified t-distribution with adjusted degrees of freedom. On the other hand, methods such as SAM use permutations in order to calculate the False Discovery Rate (FDR, see below).

2.1.5 Other Methods

As we have already mentioned, earlier microarray publications estimated differential expression of genes based solely on FC. The moderated t-tests, on the other hand, borrow information across genes; they perform better, providing estimates of statistical significance and results more in line with FC rankings. However, even these contemporary statistical tests permit genes with relatively small FCs to be considered statistically significant, probably because of a very small denominator in the t-statistic formula.

Hence, it is increasingly required in the literature that DEGs meet both p-value and FC criteria. Several authors require that genes satisfy an acceptable level of statistical significance and then rank significant genes by FC with an arbitrarily set cut-off. There are also authors who first apply an FC cut-off and then rank genes according to their p-value. Other authors declare genes as differentially expressed on the basis that they simultaneously show an FC larger than a given threshold value and satisfy the criterion for p-value. Such combined criteria are suggested to identify more biologically relevant sets of genes and even provide a much better inter-platform agreement compared to FC and p-values alone [60].

TREAT (t-tests relative to a threshold) is used to introduce statistical formalism to these approaches. This method is an extension of the empirical Bayesian moderated t-statistic presented by Smyth (i.e., limma), and can be used to test whether the true differential gene expression is greater than a given threshold value. By including the FC threshold value of interest in a formal hypothesis test, the method achieves reliable p-values for identifying genes with differential expression that is biologically relevant [60]. TREAT has been shown to perform well in both real and simulated data.

Similar considerations have led to the development of the weighted average difference (WAD) method for ranking DEGs [61]. The authors observed that some genes which are falsely declared to be highly differentially expressed tend to display lower expression levels. In this way, the “true” DEGs cannot be identified because the relative error is increased at lower signal intensities. WAD uses the average gene expression difference and relative average signal intensity in a way such that highly expressed genes are top ranked on the average for the different conditions:
$$ \mathrm{WAD}=\left({\overline{X}}_1-{\overline{X}}_2\right)\frac{\overline{X}-{\min}_p\left(\overline{X}\right)}{\max_p\left(\overline{X}\right)-{\min}_p\left(\overline{X}\right)} $$
where \( \overline{X}=\left({\overline{X}}_1+{\overline{X}}_2\right)/2 \) and the max (or min) indicates the maximum (or minimum) value, respectively, in an average expression vector of \( \overline{X} \) among the p genes analysed (on a log scale). WAD was compared to several other methods and the results showed that it outperforms them in terms of both sensitivity and specificity.
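
The WAD formula can be sketched directly from the per-gene group means (assumed to be on a log scale). The function name and the three example genes are hypothetical; the sketch also assumes that not all genes share the same average expression, since the weight would otherwise be undefined.

```python
def wad_scores(mean1, mean2):
    """WAD score per gene from the group means of two conditions."""
    avg = [(a + b) / 2 for a, b in zip(mean1, mean2)]
    lo, hi = min(avg), max(avg)
    # Weight in [0, 1]: relative average signal intensity of each gene
    return [(a - b) * (x - lo) / (hi - lo) for a, b, x in zip(mean1, mean2, avg)]


# Three hypothetical genes with the same mean difference (2 on a log scale)
# but different average expression levels
mean1 = [4.0, 10.0, 13.0]
mean2 = [2.0, 8.0, 11.0]
scores = wad_scores(mean1, mean2)
```

All three genes show the same difference between conditions, yet the score grows with the average signal intensity, so the most highly expressed gene is ranked first.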

Finally, the RankProduct (RP) method is based on calculating rank products from replicate experiments, in a fast and simple way. This method seeks to alleviate the above-mentioned problems by relying on biologically significant FC, providing at the same time an estimate of the statistical significance. The RP method is essentially a non-parametric method for detecting DEGs in microarray experiments [62, 63]. The genes are ranked according to FC and then analysis is performed separately for up-regulated and down-regulated genes. For instance, for an up-regulated gene g with rank \( {r}_{g,i}^{\mathrm{up}} \) in replicate i = 1, 2, …, k, the rank product is given by the geometric mean:
$$ {\mathrm{RP}}_g^{\mathrm{up}}={\left({\prod}_{i=1}^k{r}_{g,i}^{\mathrm{up}}\right)}^{1/k} $$
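
The rank product for up-regulated genes can be sketched as follows. This is not the RankProd package: the function name and the data layout (one list of log fold changes per replicate) are assumptions, ties are ignored, and no significance assessment is performed.

```python
import math


def rank_products_up(fold_changes):
    """Rank product per gene; fold_changes[i][g] is the log FC of gene g in replicate i."""
    k = len(fold_changes)
    n_genes = len(fold_changes[0])
    log_sum = [0.0] * n_genes
    for rep in fold_changes:
        # Rank 1 is assigned to the most strongly up-regulated gene
        order = sorted(range(n_genes), key=lambda g: rep[g], reverse=True)
        for rank, g in enumerate(order, start=1):
            log_sum[g] += math.log(rank)
    # Geometric mean of the ranks, computed on the log scale
    return [math.exp(s / k) for s in log_sum]


# Two hypothetical replicates, three genes
fold_changes = [
    [2.0, 0.5, -1.0],   # ranks: gene 0 -> 1, gene 1 -> 2, gene 2 -> 3
    [1.5, -0.5, 0.2],   # ranks: gene 0 -> 1, gene 2 -> 2, gene 1 -> 3
]
rp = rank_products_up(fold_changes)
```

A small rank product indicates a gene that is consistently near the top of the list across replicates; down-regulated genes would be handled by ranking in the opposite direction.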

The RP method is available as an R package (RankProd), and is also supported by the webserver RankProdIt.

The use of exact calculation and permutation methods has been proposed to determine the statistical significance. These approaches have serious limitations as they are computationally demanding. Approximation methods have also been proposed, but these usually provide inaccurate estimates in the tail of the p-value distribution. Lately, however, a method to determine upper bounds and accurate approximate p-values of the RP statistic has been developed, decreasing the computational time significantly. The R code for this method is available at [64].

The RP method has been reported to perform more reliably and consistently compared to SAM, even on highly noisy data. In realistic simulated microarray datasets, RP is more robust and accurate for sorting genes based on differential expression compared to t-statistics, especially for replicate numbers n < 10. This method performs particularly well on data contaminated by abnormal random noise and heterogeneous samples. RP, however, assumes equal measurement variance for all genes and tends to give overly optimistic p-values when this assumption does not apply. Therefore, appropriate variance-stabilizing normalization should be performed on the data prior to calculating the RP values. If applicable, another rank-based variant of RP, that is, average ranks, provides a suitable alternative with comparable performance.

2.2 Meta-Analysis of Microarrays

Meta-analysis is the statistical technique for combining data from multiple independent but related studies [65]. In particular, meta-analysis can be used to identify a treatment effect that is consistent among studies. In case the treatment effect varies among studies, meta-analysis may be used to identify the cause of this variation. Hypotheses cannot be inferred and validated based solely on the results of a single study, as the results typically vary between studies; instead, data across studies should be combined [66]. Meta-analysis applies universal formulas to a number of different studies. Nowadays, the GEO and ArrayExpress databases provide the option to compare the normalized raw data across many experiments and organisms, allowing in this way comparative gene expression profiling.

In this section, we provide a practical guide that could enable the reader to make informed decisions on how to conduct a meta-analysis of microarray data.

Issue 1: Selection of Appropriate Microarray Datasets

The first, and most critical, step in an experimental study is to clearly state objectives. Meta-analysis enables the identification of DEGs among multiple samples in order to improve classification within and across platforms, detect redundancy across diverse datasets, identify differentially co-expressed genes, and infer networks of genetic interactions. The second step of meta-analysis is to set eligibility criteria, either biological (e.g., tissue type, disease) or technical (e.g., one-channel versus two-channel detection, density of microarrays, technological platform). Based on these criteria, literature searches are performed, using appropriate key terms, to retrieve relevant studies. These studies can be complemented by microarray data available in public databases that conform to the MIAME (Minimum Information About a Microarray Experiment) guidelines.

Issue 2: Data Acquisition from Studies

The genes found to be differentially expressed in a given study constitute the published gene lists (PGLs) which are either included in the main text or provided as supplementary material. The gene expression data matrices (GEDM) contain preprocessed expression values of every probe-set and sample for one gene. The published GEDM cannot be used directly as input for meta-analysis because of the different algorithms used for processing raw data in the original studies, which may generate heterogeneous, non-comparable results.

Issue 3: Preprocessing of Datasets from Diverse Platforms

To enable consistent analysis of all datasets, bias introduced by the preprocessing algorithms should be eliminated. To this end, feature-level extraction output (FLEO) files, such as CEL files, should be obtained and converted to GEDM suitable for meta-analysis. Multiple studies from the same platform should be preprocessed using a single algorithm. If the studies were conducted on different platforms, they should be preprocessed with comparable algorithms so that their results can be combined.

Issue 4: Promiscuous Hybridization between Probes and Genes

The datasets are annotated using UniGene or RefSeq gene identifiers, collectively referred to as GeneIDs. Multiple probes can hybridize with the same GeneID, as UniGene represents a cluster of sequences that correspond to a unique gene. Conversely, one non-specific probe can cross-hybridize with multiple GeneIDs due to imperfect specificity. There are also probes with inadequate sequence information that cannot hybridize with any GeneID. One approach to resolve the “many-to-many” relationships between probes and genes is to include in the meta-analysis only probes that are associated with a single gene, and exclude the promiscuous probes that are associated with more than one gene; however, important information can be lost. Averaging the expression profiles prior to meta-analysis is not recommended either, given that probe binding affinity differences affect the gene expression measurements. It is therefore recommended to apply descriptive statistics, thereby reducing the “many-to-many” into “one-to-one” relationship between probe and GeneID for each study [66, 67, 68].

Issue 5: Choosing a Meta-Analysis Technique

The choice of meta-analysis technique depends on the type of response (e.g., binary, continuous, survival). In this review, we focus on the two-class comparison of microarrays, where the objective is to identify genes expressed differentially between two conditions. In such cases, there are three broad categories of statistical methods for meta-analysis, which make use of effect sizes, p-values and ranks, respectively.

2.2.1 Effect Size

The first statistical method is a standard approach for meta-analysis using fixed or random effects. In principle, any suitable effect size can be used in meta-analysis; in practice, however, most authors advocate the standardized mean difference:
$$ {d}_i=\frac{X_{1i}-{X}_{2i}}{S_{\mathrm{p}i}} $$
X1i and X2i are the means of the two groups under comparison in the ith study, and Spi is the pooled standard deviation given by:
$$ {S}_{\mathrm{p}i}=\sqrt{\frac{\left({n}_{1i}-1\right){S}_{1i}^2+\left({n}_{2i}-1\right){S}_{2i}^2}{n_{1i}+{n}_{2i}-2}} $$
In research synthesis, the sample estimate of the standardized mean difference is referred to as Cohen’s d [69]. Nevertheless, d has the tendency to overestimate the absolute value in small samples. This bias introduced by d can be corrected using the so-called Hedges’ g, which generates an unbiased estimate. A correction factor, called J, is employed to convert from d to Hedges’ g. Although there is an exact formula for J, researchers often use an approximation given by \( {g}_i=J{d}_i={d}_i-3{d}_i/\left(4{n}_i-9\right) \), where \( {n}_i={n}_{1i}+{n}_{2i} \) is the total sample size of the ith study. The estimated variance of d is given by:
$$ \operatorname{var}\left({d}_i\right)={s}_i^2=\left(\frac{1}{n_{1i}}+\frac{1}{n_{2i}}\right)+\frac{d_i^2}{2\left({n}_{1i}+{n}_{2i}\right)} $$
When g is used, \( \operatorname{var}(g)={J}^2\operatorname{var}(d) \). In any case, it is straightforward to obtain a pooled estimate of d (or g):
$$ \widehat{d}=\frac{\sum \limits_{i=1}^k{w}_i{d}_i}{\sum \limits_{i=1}^k{w}_i} $$
This is the well-known inverse-variance estimate used in meta-analysis with \( {w}_i=1/{s}_i^2 \) [65, 70]. The above method assumes homogeneity of the effect across studies, which is a rather restrictive assumption. In the case of between-study heterogeneity, we hypothesize that the true effect varies between studies, \( {d}_i\sim N\left(d,{s}_i^2+{\tau}^2\right) \) and, therefore, an additive component of the between-studies variance (\( {\tau}^2 \)) needs to be estimated (random-effects model). The most commonly used method for estimating \( {\tau}^2 \) is the non-iterative method of moments proposed by DerSimonian and Laird [71], even though there are several alternative methods including iterative procedures [72]. In case \( {\tau}^2=0 \), the random-effects and the fixed-effects estimates coincide. In the random-effects case, the weights are calculated by:
$$ {w}_i={\left({\tau}^2+{s}_i^2\right)}^{-1} $$
and subsequently Eq. (18) is applied in order to obtain the overall estimate. In any case, inferences about the overall effect are based on the normal approximation since:
$$ \operatorname{var}\left(\widehat{d}\right)=\frac{1}{\sum \limits_{i=1}^k{w}_i} $$
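The calculations of this subsection can be sketched in a few lines of Python. The following is a minimal illustration, not the code of GeneMeta, metaMA or MetaDE: the function names are hypothetical, the input is assumed to be per-study summary statistics for a single gene, and the DerSimonian and Laird moment estimator is written in its usual textbook form, which is not spelled out in the text above.

```python
import math

def hedges_g(mean1, mean2, sd1, sd2, n1, n2):
    """Return (g, var_g) for one study from group summary statistics."""
    # Pooled standard deviation and Cohen's d
    sp = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))
    d = (mean1 - mean2) / sp
    var_d = (1 / n1 + 1 / n2) + d ** 2 / (2 * (n1 + n2))
    # Approximate small-sample correction factor J
    j = 1 - 3 / (4 * (n1 + n2) - 9)
    return j * d, j ** 2 * var_d

def pool(effects, variances, tau2=0.0):
    """Inverse-variance pooled estimate and its variance (tau2=0: fixed effects)."""
    weights = [1 / (v + tau2) for v in variances]
    total = sum(weights)
    estimate = sum(w * e for w, e in zip(weights, effects)) / total
    return estimate, 1 / total

def dersimonian_laird_tau2(effects, variances):
    """Method-of-moments estimate of the between-studies variance."""
    w = [1 / v for v in variances]
    fixed, _ = pool(effects, variances)
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    if c <= 0:  # happens with a single study; no heterogeneity can be estimated
        return 0.0
    return max(0.0, (q - (len(effects) - 1)) / c)

def random_effects(effects, variances):
    """Return (pooled estimate, its variance, tau2, z-score)."""
    tau2 = dersimonian_laird_tau2(effects, variances)
    estimate, variance = pool(effects, variances, tau2)
    return estimate, variance, tau2, estimate / math.sqrt(variance)
```

When the estimated τ² is zero, `random_effects` returns the same pooled value as `pool` with its default argument, in line with the statement that the two models then coincide.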

In the case of a matched design (e.g., use of same individuals before and after treatment), there is a very similar formula, except that the natural unit of deviation is the standard deviation of the difference scores, and so this is the value that is likely to be reported or calculated from the data.

As we have already noted, this approach is based on common practices in meta-analysis and thus it was advocated early in the literature [73, 74]. However, to handle the problem of small sample size and non-normal data, most authors suggest a type of correction for calculating the statistical significance. Therefore, instead of relying on the normal approximation, they propose the permutation test. Although Choi and coworkers [73] suggest permutations to calculate p-values, a faster solution is offered in the Bioconductor package GeneMeta, which assumes a normal distribution on the z-scores after checking the reliability of this hypothesis by a QQ plot. In general, all the aforementioned resampling methods can be used, with bootstrapping being, probably, the most advantageous since it requires a smaller number of replications. The bootstrap or the permutation methods can also be used in different settings. One option would be to perform an analysis for each study separately, obtain a corrected estimate of variance and then use this in order to calculate the weights for the meta-analysis. Another option would be to perform the analysis in a single step using the resampling strategy (bootstrap or permutation) in a stratified manner, in which the studies are treated as strata.
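The single-step, stratified option can be illustrated as follows. This is only a sketch under simplifying assumptions: it handles one gene, uses a fixed-effects pooled z-score for brevity, computes a two-sided p-value, and assumes the expression values within a study are not all identical (otherwise the pooled standard deviation is zero). All function names and the default number of permutations are hypothetical.

```python
import math
import random

def smd(group1, group2):
    """Standardized mean difference and its variance for one study."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    ss1 = sum((x - m1) ** 2 for x in group1)
    ss2 = sum((x - m2) ** 2 for x in group2)
    sp = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))
    d = (m1 - m2) / sp
    return d, (1 / n1 + 1 / n2) + d ** 2 / (2 * (n1 + n2))

def pooled_z(studies):
    """Fixed-effects z-score; each study is a pair (group1, group2)."""
    stats = [smd(g1, g2) for g1, g2 in studies]
    weights = [1 / v for _, v in stats]
    estimate = sum(w * d for w, (d, _) in zip(weights, stats)) / sum(weights)
    return estimate * math.sqrt(sum(weights))

def stratified_permutation_p(studies, n_perm=2000, seed=0):
    """Two-sided permutation p-value with the studies treated as strata."""
    rng = random.Random(seed)
    observed = abs(pooled_z(studies))
    hits = 0
    for _ in range(n_perm):
        shuffled = []
        for g1, g2 in studies:
            values = list(g1) + list(g2)
            rng.shuffle(values)  # labels are permuted within each study only
            shuffled.append((values[:len(g1)], values[len(g1):]))
        if abs(pooled_z(shuffled)) >= observed:
            hits += 1
    # Adding one to numerator and denominator avoids a p-value of exactly zero
    return (hits + 1) / (n_perm + 1)
```

Because samples are never exchanged between studies, systematic differences between platforms or laboratories do not enter the permutation distribution.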

Following another approach, inference could be based on the ratio of means instead of the standardized difference [75]. This approach has the distinct advantage that it uses a measure related to the well-known FC. Thus, the statistic would be:
$$ {\gamma}_i=\log \left(\frac{X_{1i}}{X_{2i}}\right) $$
with estimated variance equal to:
$$ \operatorname{var}\left({\gamma}_i\right)={s}_i^2=\frac{1}{n_{1i}}\frac{S_{1i}^2}{X_{1i}^2}+\frac{1}{n_{2i}}\frac{S_{2i}^2}{X_{2i}^2} $$
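As a small illustration of the two formulas above, the log ratio of means and its variance can be computed from group summary statistics as follows. The function name is hypothetical, and the sketch assumes that both group means are strictly positive (i.e., unlogged intensities).

```python
import math

def log_ratio_of_means(mean1, mean2, sd1, sd2, n1, n2):
    """Return (gamma, var_gamma) for one study; both means must be positive."""
    gamma = math.log(mean1 / mean2)
    variance = sd1 ** 2 / (n1 * mean1 ** 2) + sd2 ** 2 / (n2 * mean2 ** 2)
    return gamma, variance

# Example: a two-fold difference between conditions with five replicates each
gamma, variance = log_ratio_of_means(200.0, 100.0, 20.0, 10.0, 5, 5)
fold_change = math.exp(gamma)  # back-transform to the familiar FC scale
```

The resulting pairs of effect size and variance can then be passed to any inverse-variance pooling routine.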

All the standard methods reported above can be easily used with this effect size and its variance. The ratio of means has also been used for data other than gene expression, and, in general, it performs well even in small samples [76]. Lately, the ratio of geometric means has also been proposed, especially for skewed data, and its application in the meta-analysis of gene expression data could be also investigated [77]. The points mentioned above regarding bootstrap and permutation are also applicable to this effect size.

The aforementioned methods, since they are standard methods for meta-analysis, can be easily extended to a Bayesian framework [78]. Several studies have been performed to this end, and source code to fit the model is available [79, 80]. In general, Conlon and coworkers [79, 80] use in their models a structure similar to the one Gotardo and coworkers use in their model for single studies; an additional level is added, though, to account for multiple studies. The main problem with the Bayesian methods is the increased computational complexity and time needed to perform the analysis, especially when a large number of genes is investigated, which perhaps limits their applicability. The WinBUGS code to fit the models of Conlon and coworkers is available at

Finally, another promising approach is to use the moderated effect sizes calculated by methods such as limma, instead of the typical effect sizes, in the traditional meta-analysis. This is a two-step method relying in the first step on an advanced method for regularized t-test [81]. Then, given that \( t=d\sqrt{\tilde{n}} \) with \( \tilde{n}={n}_1{n}_2/\left({n}_1+{n}_2\right) \), the moderated t-statistic is converted to an effect size and a traditional random effects meta-analysis is performed. Another modification of this work is that instead of using the approximation for the variance of d, the exact calculation given by Hedges is used. This approach is implemented in the R package metaMA.

Several major meta-analysis methods for DEG analysis, including fixed effects and random effects methods, as well as methods for combining p-values and ranks (see next sections), are implemented in R packages, such as GeneMeta and metaMA. The most complete package, however, is MetaDE, which also offers functionality for preprocessing the data, as well as for displaying the results graphically [82]. Stata lacks a meta-analysis command dedicated to microarrays, but several of the methods mentioned here can be easily implemented. As a proof of concept, we describe in the Appendix several approaches for performing random effects meta-analysis. One approach consists of performing the analysis for each study separately (using bootstrap or permutation) and then combining the results in the usual way. Another approach would be to perform meta-analysis in a single step and run the bootstrap or permutation simulation as a wrapper method; both bootstrap and permutation should then be performed in a stratified manner, treating the studies as strata.

2.2.2 Ranks

Another class of methods for meta-analysis consists of methods that combine ranks. There are several different approaches, the common denominator of which is that if the same gene is repeatedly at the top of the list ordered by up- or down-regulated genes in replicate experiments, the gene is more likely to be declared as differentially expressed. The Rank Product method, which we have already described in the context of a single study, uses FC to rank genes and calculates the products of ranks across samples and studies [83]. A similar method, Rank Sum, uses the sum of ranks instead, but all other calculations are identical. The RankProd software is available at:

A related method, termed METRADISC (Meta-analysis of Rank Discovery Dataset), is based on the same principle, but it is more general [84, 85]. The ranking within each study is performed with any available method (FC, t-test, p-value etc.) and then the average rank of a particular gene across studies is calculated. The overall mean can be weighted or unweighted; the weighted overall mean resembles the traditional methods for meta-analysis. The between-study heterogeneity of the study-specific ranks can also be computed. METRADISC is implemented in R and is also available as a stand-alone application. The methods that use ranks are quite robust and can combine studies using different methods. However, the statistical inferences are based on Monte Carlo permutation tests, which may be time-consuming.
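The core of both rank-based approaches can be sketched as follows. This is an illustration only and not the RankProd or METRADISC implementation: the function names and the layout of the input table are assumptions, ties are broken by input order, the average rank is unweighted, and the Monte Carlo permutation step needed for significance is omitted.

```python
import math

def rank_descending(scores):
    """Rank genes within one study; rank 1 is the largest score."""
    order = sorted(range(len(scores)), key=lambda g: -scores[g])
    ranks = [0] * len(scores)
    for position, gene in enumerate(order, start=1):
        ranks[gene] = position
    return ranks

def combine_ranks(score_table):
    """score_table[s][g] is the score (e.g., log FC) of gene g in study s.

    Returns the product of ranks and the unweighted average rank per gene.
    """
    rank_table = [rank_descending(study) for study in score_table]
    n_studies = len(rank_table)
    n_genes = len(rank_table[0])
    rank_product = [math.prod(r[g] for r in rank_table) for g in range(n_genes)]
    average_rank = [sum(r[g] for r in rank_table) / n_studies for g in range(n_genes)]
    return rank_product, average_rank

# Three genes measured in two studies: gene 0 is at the top of both lists
products, averages = combine_ranks([[3.0, 1.0, 2.0], [2.5, 0.5, 1.0]])
```

Genes with small rank products or small average ranks are the candidates for up-regulation; ranking the negated scores gives the corresponding list for down-regulation.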

The rank-based methods offer several advantages compared to traditional approaches, including the FC criterion, fewer assumptions under the model, and robustness with noisy data and/or low numbers of replicates. These methods overcome heterogeneity across multiple datasets and combine them to achieve increased sensitivity and reliability. Of particular note, these methods do not require the simultaneous normalization of multiple datasets using the same technique, solving in this way a key issue in microarray meta-analysis pre-processing. Moreover, the rank-based methods transform the actual expression values into ranks, and thus they can integrate datasets produced by a wide variety of platforms (Affymetrix oligonucleotide arrays, two-color cDNA arrays etc.). Finally, the rank-based methods are quite general and therefore can be applied to different types of data, such as proteomics or genetic association data.

2.2.3 Combination of p-values

Another class of methods that is popular in meta-analysis of microarray studies [86] involves combination of p-values. It is widely accepted that Fisher’s seminal work on the combination of p-values [87] was the origin of meta-analysis [88]. Fisher noted that since p-values from k independent samples are uniform random variables under the null hypothesis, minus twice the sum of their logarithms follows a χ² distribution with 2k degrees of freedom:
$$ U=-2\sum \limits_{i=1}^k\log \left({p}_i\right)=-2\log \left(\prod \limits_{i=1}^k{p}_i\right) $$
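Fisher's statistic is simple to compute. The sketch below is an illustration with a hypothetical function name; it evaluates the upper tail of the χ² distribution with the closed-form expression that holds for an even number of degrees of freedom, so no statistical library is needed. It assumes that all p-values are strictly positive.

```python
import math

def fisher_combined_p(pvalues):
    """Return (U, combined p-value) for k independent p-values in (0, 1]."""
    k = len(pvalues)
    u = -2 * sum(math.log(p) for p in pvalues)
    # Upper tail of chi-square with 2k df: exp(-U/2) * sum_{i<k} (U/2)^i / i!
    half = u / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return u, math.exp(-half) * total
```

With a single study the combined p-value equals the input p-value, which is a convenient sanity check.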
Bailey and Gribskov, in a different context, showed that the same probability can be calculated easily with their QFAST algorithm without relying on the χ² distribution [89]. Edgington suggested using the sum of the p-values in order to obtain a pooled estimate [90]:
$$ p=\frac{{\left(\sum \limits_{i=1}^k{p}_i\right)}^k}{k!} $$
Later, the same author suggested the use of a contrast such as [91]
$$ \overline{p}=\frac{\sum \limits_{i=1}^k{p}_i}{k} $$
in which case \( U=\left(0.5-\overline{p}\right)\sqrt{12k} \) approximately follows a N(0,1) distribution, since the mean of k independent uniform p-values has variance 1/(12k). A more sophisticated method was presented by Zaykin and coworkers, the so-called truncated product method (TPM). Their procedure was to use the product of only those p-values less than a specific cut-off value (τ) to evaluate the probability of such a product, or a smaller value, under the overall hypothesis that all k null hypotheses are true [92]. The formula used is:
$$ W=\prod \limits_{i=1}^k{\left({p}_i\right)}^{I\left({p}_i\le \tau \right)} $$
The authors provide an explicit formula for this p-value:
$$ P\left(W\le w\right)=\sum \limits_{r=1}^k\left(\begin{array}{c}k\\ {}r\end{array}\right){\left(1-\tau \right)}^{k-r}\left(w\sum \limits_{s=0}^{r-1}\frac{{\left(r\log \tau -\log w\right)}^s}{s!}I\left(w\le {\tau}^r\right)+{\tau}^rI\left(w>{\tau}^r\right)\right) $$
where the summation index r runs over the possible numbers of \( {p}_i \)’s that are less than τ. Zaykin et al. also showed by simulation that this formula is quite robust for detecting deviations from the overall hypothesis. Of particular note, when \( \tau =\min \left({p}_i\right) \), the method reduces to the well-known Sidak’s correction, and when τ = 1, W becomes \( W=\prod \limits_{i=1}^k{p}_i \). In the latter case, the method provides Fisher’s combined p-value without the need of looking up the cumulative probability from the tail of a chi-square distribution:
$$ P\left(W\le w\right)=w\sum \limits_{i=0}^{k-1}\frac{{\left(-\log w\right)}^i}{i!} $$
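The truncated product method can be sketched directly from the formulas above. The function below is a hypothetical illustration rather than the code distributed by Zaykin and coworkers; the default cut-off τ = 0.05 is an assumption, the outer sum is taken over r = 1, …, k, and a p-value of 1 is returned when no p-value passes the cut-off (W is then the empty product). All p-values are assumed to be strictly positive.

```python
import math

def truncated_product_p(pvalues, tau=0.05):
    """P(W <= w) for the product W of the p-values that do not exceed tau."""
    k = len(pvalues)
    selected = [p for p in pvalues if p <= tau]
    if not selected:
        return 1.0  # W = 1: no evidence against the overall null hypothesis
    log_w = sum(math.log(p) for p in selected)
    w = math.exp(log_w)
    total = 0.0
    for r in range(1, k + 1):
        # Probability weight for exactly r of the k p-values falling below tau
        coeff = math.comb(k, r) * (1 - tau) ** (k - r)
        if w <= tau ** r:
            a = r * math.log(tau) - log_w
            inner = w * sum(a ** s / math.factorial(s) for s in range(r))
        else:
            inner = tau ** r
        total += coeff * inner
    return total
```

Setting `tau=1` keeps every p-value, and the function then returns Fisher's combined p-value in the closed form shown in the last equation.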

Interestingly, this is the exact formula from the QFAST method of Bailey and Gribskov, presented independently a few years earlier. Source code for implementing TPM is available online. The different approaches for combining p-values have been compared in several evaluation studies [93, 94]. Most of the methods presented in this section are implemented in the metap command available in Stata and R.

Nevertheless, combining p-values presents serious problems relative to combining effect sizes, for example when the studies test different null hypotheses. Moreover, when p-values are combined, the direction of the association is not taken into consideration; therefore, all p-values have to be one-sided, otherwise up- and down-regulated genes have to be combined separately. Finally, these methods cannot quantify the magnitude of the association (the effect size), and, most importantly, do not account for between-studies heterogeneity. A method developed by Stouffer partially overcomes these limitations, by combining the equivalent Z-scores instead of p-values [95]:
$$ \overline{Z}=\frac{\sum \limits_{i=1}^k{Z}_i}{\sqrt{k}} $$
This method does not account for the differences in the size of studies. Thus, a weighted variant can be formulated:
$$ \overline{Z}=\frac{\sum_{i=1}^k{w}_i{Z}_i}{\sqrt{\sum_{i=1}^k{w}_i^2}} $$
with weights being proportional to the square-root of the sample size for each study
$$ {w}_i=\sqrt{n_i} $$
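Both variants of Stouffer's method can be sketched with the Python standard library. The function name is hypothetical; the sketch assumes one-sided p-values strictly between 0 and 1 that all refer to the same direction of regulation, and it uses the standard weighted form, in which the weighted sum of the Z-scores is divided by the square root of the sum of squared weights so that the result is N(0,1) under the null hypothesis.

```python
import math
from statistics import NormalDist

def stouffer_z(one_sided_pvalues, sample_sizes=None):
    """Return (combined Z, combined one-sided p-value).

    Without sample sizes all studies get equal weight; otherwise the
    weights are the square roots of the study sample sizes.
    """
    normal = NormalDist()
    z_scores = [normal.inv_cdf(1 - p) for p in one_sided_pvalues]
    if sample_sizes is None:
        weights = [1.0] * len(z_scores)
    else:
        weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    combined = numerator / math.sqrt(sum(w * w for w in weights))
    return combined, 1 - normal.cdf(combined)
```

When all studies have the same size, the weighted and unweighted versions give the same combined Z-score.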
Yet, this method does not account for between-studies variability and, also, recent evidence from genetic association studies [96] suggests that this weighting scheme is suboptimal. Zhou and coworkers [96] demonstrated that the optimal weights are proportional to \( {\left(1/{n}_{1i}+1/{n}_{2i}\right)}^{-1} \), providing the foundation in this way for a random effects meta-analysis (even without the actual effect sizes). Notably, the peculiarity of microarray experiments allows the (unavailable) effect sizes to be estimated accurately as follows: from the Z-statistic a hypothetical effect size, d*, is calculated that would correspond to the same significance level:
$$ {Z}_i={d}_i^{\ast }/ se\left({d}_i^{\ast}\right)\Rightarrow {d}_i^{\ast }={Z}_i se\left({d}_i^{\ast}\right) $$
The standard error of this hypothetical effect size is given from Eq. (17). Thus, the formula for d* is:
$$ {d}_i^{\ast }={Z}_i\sqrt{\left(\frac{n_{1i}+{n}_{2i}}{n_{1i}+{n}_{2i}-{Z}_i^2/2}\right)\left(\frac{1}{n_{1i}}+\frac{1}{n_{2i}}\right)} $$
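The conversion from a Z-score to the hypothetical effect size can be sketched as follows. The function name is an assumption made for illustration; the variance is the one given earlier for the standardized mean difference, and the guard reflects the fact that the formula is only defined when Z²/2 is smaller than the total sample size.

```python
import math

def effect_from_z(z, n1, n2):
    """Return (d_star, var_d_star) reproducing the Z-score of one study."""
    n = n1 + n2
    if z * z / 2 >= n:
        raise ValueError("Z-score too large for the given sample sizes")
    scale = (n / (n - z * z / 2)) * (1 / n1 + 1 / n2)
    d_star = z * math.sqrt(scale)
    variance = (1 / n1 + 1 / n2) + d_star ** 2 / (2 * n)
    return d_star, variance

# Round trip: dividing d* by its standard error recovers the original Z-score
d_star, variance = effect_from_z(2.0, 10, 10)
recovered_z = d_star / math.sqrt(variance)
```

The resulting effect sizes and variances can be supplied to any standard random-effects routine.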

Using this (hypothetical) effect size and its variance, standard methods for random effects meta-analysis can be applied easily. This approach requires only the Z-score, which can be either acquired directly or calculated from the p-value, the direction of association, and the number of replicates for each condition. This simple approach inherits all the desirable properties of the method of Stouffer and, at the same time, performs optimal weighting, quantifies the association and enables random-effects meta-analysis in order to account for between-studies heterogeneity. If the original data are analysed with standard methods, the estimated d’s are accurate. If, however, a modified version of the t-test or a resampling method for the statistical significance is used, some discrepancies may be expected; nevertheless, the Z-score and the statistical significance (p-value) of the overall effect are accurate. A Stata program that implements this method and compares it against other methods for combining p-values is given in the Supplement.

2.3 Multiple Comparisons

A typical microarray experiment measures the expression of several thousand genes simultaneously across different conditions. When investigating potential DEGs between two conditions, each gene is treated independently and a t-test (or any other test described above) is usually performed on each gene separately. The expected number of false positives (i.e., genes falsely declared as DEGs) is proportional to the number of tests performed and the critical significance level (p-value cut-off). When a t-test is performed, the null hypothesis (H0) is usually the hypothesis of no difference between the gene’s expression levels, whereas the alternative hypothesis (H1) is that the expression levels differ. If the p-value is less than the chosen significance level, then the null hypothesis is rejected. If the null hypothesis holds for all genes and 10,000 genes are tested at a 5% level of significance, about 500 genes (10,000 × 0.05) are expected to be declared significant by chance alone. Thus, it is important to correct the p-value when performing a statistical test on a group of genes. This is the purpose of multiple testing correction methods.

Multiple-comparison correction methods take as input a list of p-values and an uncorrected critical p-value and calculate a corrected critical p-value for the rejection of null hypotheses. These methods are classified into two categories, the ones that control the family-wise error rate (FWER) and the ones that control the False Discovery Rate (FDR). In general, a FWER-controlling method defines a corrected critical p-value for a set of null hypotheses. Usually, this level of significance is lower than the uncorrected one. The most common procedure to control the FWER is the Bonferroni correction [97], where the critical value (α) for an individual test is calculated by dividing the FWER (usually 0.05) by the number of tests. Thus, for 10,000 genes (i.e., number of tests), the critical value for an individual test would be α = 0.05/10,000 = 5 × 10⁻⁶; genes with p-value < 5 × 10⁻⁶ are declared as differentially expressed. Thus, writing p for the uncorrected critical p-value and n for the number of tests, the corrected critical p-value under the Bonferroni correction is given by:
$$ {p}_{\mathrm{cor}(i)}={p}_{(i)}/n $$
The Bonferroni correction is easily applied and intuitive, but it is very conservative. Another commonly used method is that of Sidak [98], for which the corrected critical p-value is:
$$ {p}_{\mathrm{cor}(i)}=1-{\left(1-{p}_{(i)}\right)}^{1/n} $$
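The two FWER corrections can be compared numerically with a short sketch. The function names are hypothetical, and the formulas are read here as corrected critical values, that is, thresholds against which the individual p-values are compared.

```python
import math

def bonferroni_critical(alpha, n_tests):
    """Per-test critical value under the Bonferroni correction."""
    return alpha / n_tests

def sidak_critical(alpha, n_tests):
    """Per-test critical value under the Sidak correction.

    Computed as 1 - (1 - alpha)**(1/n) using log1p/expm1, which avoids
    loss of precision when n is large.
    """
    return -math.expm1(math.log1p(-alpha) / n_tests)

def declare_significant(pvalues, critical_value):
    """Indices of the genes whose p-value falls below the critical value."""
    return [gene for gene, p in enumerate(pvalues) if p < critical_value]

# Thresholds for 10,000 genes at a family-wise error rate of 0.05
bonferroni = bonferroni_critical(0.05, 10000)
sidak = sidak_critical(0.05, 10000)
```

For 10,000 tests the Sidak threshold is only marginally larger than the Bonferroni threshold, so in practice the two corrections select nearly the same genes.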

Other popular methods used for multiple testing correction in microarray analysis and meta-analysis are the methods proposed by Holland et al. [99] and Holm [100].

Benjamini and Hochberg [101] proposed a method which controls the FDR instead of the FWER. FDR-controlling procedures are designed to control the expected proportion of rejected null hypotheses that were incorrect rejections (“false discoveries”). FDR-controlling procedures provide less stringent control of Type I errors compared to FWER-controlling procedures, which control the probability of at least one Type I error. Thus, FDR-controlling procedures have greater power (i.e., they detect more differences as statistically significant), at the cost of increased rates of Type I errors. In this method, the p-values of the genes are ranked from the smallest to the largest. In this way, the smallest p-value has a rank of i = 1, the next smallest has i = 2, etc. Then, each individual p-value is compared against its Benjamini–Hochberg critical value, (i/n)Q, where i is the rank, n is the total number of tests, and Q is the chosen FDR. The largest p-value that satisfies p < (i/n)Q is significant; moreover, all of the p-values smaller than it are also significant, even the ones that are not less than their own Benjamini–Hochberg critical value. In terms of the uncorrected critical p-value, the corrected critical value for the p-value of rank i is:
$$ {p}_{\mathrm{cor}(i)}=\frac{i}{n}{p}_{(i)} $$
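The step-up logic of the procedure is easy to get wrong, so a minimal sketch may help. The function name is hypothetical; the comparison uses "less than or equal to", as in the usual statement of the Benjamini–Hochberg procedure, which differs from the strict inequality in the text only when a p-value is exactly equal to its critical value.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return a list of booleans marking the genes declared significant."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda gene: pvalues[gene])
    largest_rank = 0
    for rank, gene in enumerate(order, start=1):
        # Compare each p-value with its own critical value (i/n)Q
        if pvalues[gene] <= rank / n * q:
            largest_rank = rank
    significant = [False] * n
    # Everything ranked at or below the largest passing rank is significant
    for gene in order[:largest_rank]:
        significant[gene] = True
    return significant

# The p-values of rank 2 and 3 miss their own critical values (0.02, 0.03)
# but are still declared significant because the p-value of rank 4 passes
flags = benjamini_hochberg([0.001, 0.035, 0.036, 0.039, 0.6], q=0.05)
```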

Other popular multiple comparison correction methods which control FDR in microarray analysis and meta-analysis are the methods proposed by Benjamini and Yekutieli [102], Benjamini and Liu [103, 104], and Benjamini, Krieger and Yekutieli [105]. The methods described above are implemented in the multproc command in Stata and in the multcomp package in R.

3 Closing Remarks

Microarray experiments enable researchers to analyze a vast amount of genetic information in a single experimental run. Therefore, the expression of multiple genes can be measured simultaneously under specific conditions. The use of DNA microarrays is very promising for understanding the effect of genes on diseases, as well as for drug discovery and development. Microarray experiments combined with bioinformatics analysis can reveal a great deal of information about a biological system and its dynamics. Such approaches, though, like any other emerging technology, come with shortcomings and disadvantages.

One of the main problems concerning microarray experiments is the lack of standardization. As a result, the data collected from different microarray platforms cannot be compared accurately, or even be replicated. In an evaluation study, Ioannidis and coworkers found that a large proportion of published studies could not be reproduced, either partially or completely [106]. This was mainly attributed to data unavailability and incomplete data annotation or specification of data processing and analysis. The authors called for stricter publication rules that will enforce public data availability and explicit description of data processing and analysis. The issue of comparing data generated by different platforms has long been under investigation [107] and filtering of probes has been shown to significantly improve intra-platform data comparability [108].

Methods for combining different datasets in a meta-analysis can help researchers to alleviate some of the problems mentioned above [109]. However, issues mentioned earlier such as the lack of standardization remain important obstacles for the development of such methods. In this chapter, we presented the available methods and provided information about the various available implementations of these statistical methods. In the recent literature, there are various studies that compare the different methods [110, 111, 112]. Notably, the lack of standardization is also apparent in the literature pertinent to studies in meta-analysis of microarrays, since different methods and combinations of these methods have been used in the recent literature. We have shown that, especially in the case of meta-analysis of effect sizes, various combinations of methods can be used. The final choice depends on the available software, the number of genes analysed, the number of studies and the different platforms to be combined. We have also shown that a meta-analysis of p-values can be performed under a random-effects method. It is also worth mentioning that, apart from the commonly used DerSimonian and Laird estimator, there are many available methods for calculating the between-studies variance in a random-effects meta-analysis; some of these methods are better suited for small and heterogeneous samples [113]. Moreover, there are approximate Bayesian methods for meta-analysis that do not rely on simulations and are hence faster [114]. Taken together, the points above suggest that there is plenty of room for improvements in microarray meta-analysis methodology and software, both in terms of accuracy and speed. A recent systematic search in PubMed resulted in the empirical evaluation of 333 articles based on microarray meta-analysis studies [115]. The results of this evaluation were very interesting, since, apart from the three general classes of methods presented earlier (effect size, rank, p-values), a large proportion of the published studies was found to be conducted using the “inappropriate” method of pooling datasets. This is a well-known issue in the meta-analysis literature, and this approach of pooling datasets in order to simply create a larger one is not recommended, since it can lead to various types of bias. Vote counting, in which one counts the number of studies in which a gene was declared significant, is another commonly used approach that is not recommended either.

In this chapter, we presented a review of the methodological issues pertaining to microarray data analysis and meta-analysis. The relevant microarray databases and available software were also presented. Moreover, statistical methods of microarray data analysis were illustrated by a case study.


References

1. Bammler T, Beyer RP, Bhattacharya S et al (2005) Standardizing global gene expression analysis between laboratories and across platforms. Nat Methods 2(5):351–356
2. Pease AC, Solas D, Sullivan EJ et al (1994) Light-generated oligonucleotide arrays for rapid DNA sequence analysis. Proc Natl Acad Sci U S A 91(11):5022–5026
3. Tang T, Francois N, Glatigny A et al (2007) Expression ratio evaluation in two-colour microarray experiments is significantly improved by correcting image misalignment. Bioinformatics 23(20):2686–2691
4. Suarez E, Burguete A, McLachlan GJ (2009) Microarray data analysis for differential expression: a tutorial. P R Health Sci J 28(2):89–104
5. Bosotti R, Locatelli G, Healy S et al (2007) Cross platform microarray analysis for robust identification of differentially expressed genes. BMC Bioinformatics 8(Suppl 1):S5
6. Tan PK, Downey TJ, Spitznagel EL Jr et al (2003) Evaluation of gene expression measurements from commercial microarray platforms. Nucleic Acids Res 31(19):5676–5684
7. Yang YH, Buckley MJ, Speed TP (2001) Analysis of cDNA microarray images. Brief Bioinform 2(4):341–349
8. Quackenbush J (2002) Microarray data normalization and transformation. Nat Genet 32(Suppl):496–501
9. Tseng GC, Oh MK, Rohlin L et al (2001) Issues in cDNA microarray analysis: quality filtering, channel normalization, models of variations and assessment of gene effects. Nucleic Acids Res 29(12):2549–2557
10. Bolstad BM, Irizarry RA, Astrand M et al (2003) A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19(2):185–193
11. Reimers M, Carey VJ (2006) Bioconductor: an open source framework for bioinformatics and computational biology. Methods Enzymol 411:119–134
12. Irizarry RA, Hobbs B, Collin F et al (2003) Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4(2):249–264
13. Pepper SD, Saunders EK, Edwards LE et al (2007) The utility of MAS5 expression summary and detection call algorithms. BMC Bioinformatics 8:273
14. Tamayo P, Slonim D, Mesirov J et al (1999) Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc Natl Acad Sci U S A 96(6):2907–2912
15. Shamir R, Maron-Katz A, Tanay A et al (2005) EXPANDER–an integrative program suite for microarray data analysis. BMC Bioinformatics 6:232
16. Seo J, Shneiderman B (2002) Interactively exploring hierarchical clustering results. Computer 35(7):80–86
17. Golub TR, Slonim DK, Tamayo P et al (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5439):531–537
18. Radmacher MD, McShane LM, Simon R (2002) A paradigm for class prediction using gene expression profiles. J Comput Biol 9(3):505–511
19. Simon R, Radmacher MD, Dobbin K et al (2003) Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification. J Natl Cancer Inst 95(1):14–18
20. Khan J, Wei JS, Ringner M et al (2001) Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat Med 7(6):673–679
21. Furey TS, Cristianini N, Duffy N et al (2000) Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics 16(10):906–914
22. Bura E, Pfeiffer RM (2003) Graphical methods for class prediction using dimension reduction techniques on DNA microarray data. Bioinformatics 19(10):1252–1258
23. Ooi C, Tan P (2003) Genetic algorithms applied to multi-class prediction for the analysis of gene expression data. Bioinformatics 19(1):37–44
24. Tibshirani R, Hastie T, Narasimhan B et al (2002) Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc Natl Acad Sci U S A 99(10):6567–6572
25. Nguyen DV, Rocke DM (2002) Tumor classification by partial least squares using microarray gene expression data. Bioinformatics 18(1):39–50
26. Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182
27. Saeys Y, Inza I, Larrañaga P (2007) A review of feature selection techniques in bioinformatics. Bioinformatics 23(19):2507–2517
28. Li T, Zhang C, Ogihara M (2004) A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression. Bioinformatics 20(15):2429–2437
29. Ma S, Huang J (2008) Penalized feature selection and classification in bioinformatics. Brief Bioinform 9(5):392–403
30. Witten D, Tibshirani R (2007) A comparison of fold-change and the t-statistic for microarray data analysis. Analysis 1776:58–85
31. Efron B (1982) The jackknife, the bootstrap and other resampling plans, vol 38. SIAM, Philadelphia
32. Efron B, Tibshirani R (1993) An introduction to the bootstrap. Chapman & Hall/CRC, Boca Raton, FL
33. Efron B (1987) Better bootstrap confidence intervals. J Am Stat Assoc 82(397):171–185
34. Andrews DW, Buchinsky M (2000) A three-step method for choosing the number of bootstrap repetitions. Econometrica 68(1):23–51
35. Davidson R, MacKinnon JG (2000) Bootstrap tests: how many bootstraps? Econome Rev 19(1):55–68
36. Meuwissen TH, Goddard ME (2004) Bootstrapping of gene-expression data improves and controls the false discovery rate of differentially expressed genes. Genet Sel Evol 36(2):191–205
37. Jiang W, Simon R (2007) A comparison of bootstrap methods and an adjusted bootstrap approach for estimating the prediction error in microarray classification. Stat Med 26(29):5320–5334
  38. 38.
    Neuhauser M, Jockel KH (2006) A bootstrap test for the analysis of microarray experiments with a very small number of replications. Appl Bioinforma 5(3):173–179. doi:535 [pii]CrossRefGoogle Scholar
  39. 39.
    Kaiser J (2007) An exact and a Monte Carlo proposal to the fisher–pitman permutation tests for paired replicates and for independent samples. Stata J 7(3):402–412CrossRefGoogle Scholar
  40. 40.
    Tsai CA, Chen YJ, Chen JJ (2003) Testing for differentially expressed genes with microarray data. Nucleic Acids Res 31(9):e52PubMedPubMedCentralCrossRefGoogle Scholar
  41. 41.
    Yang H, Churchill G (2007) Estimating p-values in small microarray experiments. Bioinformatics 23(1):38–43. doi:btl548 [pii]PubMedCrossRefGoogle Scholar
  42. 42.
    StataCorp (2013) Stata statistical software: release. StataCorp LP, College Station, TX, p 13Google Scholar
  43. 43.
    R Core Team (2016) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, AustriaGoogle Scholar
  44. 44.
    Gottardo R, Pannucci JA, Kuske CR et al (2003) Statistical analysis of microarray data: a Bayesian approach. Biostatistics 4(4):597–620. PubMedCrossRefGoogle Scholar
  45. 45.
    Rouder JN, Speckman PL, Sun D et al (2009) Bayesian t tests for accepting and rejecting the null hypothesis. Psychon Bull Rev 16(2):225–237. PubMedCrossRefGoogle Scholar
  46. 46.
    Gönen M, Johnson WO, Lu Y et al (2005) The Bayesian two-sample t test. Am Stat 59(3):252–257CrossRefGoogle Scholar
  47. 47.
    Wang M, Liu G (2015) A simple two-sample Bayesian t-test for hypothesis testing. Am Stat 70(2):195–201CrossRefGoogle Scholar
  48. 48.
    Wetzels R, Raaijmakers JGW, Jakab E et al (2009) How to quantify support for and against the null hypothesis: a flexible WinBUGS implementation of a default Bayesian t test. Psychon Bull Rev 16(4):752–760. PubMedCrossRefGoogle Scholar
  49. 49.
    Kruschke JK (2013) Bayesian estimation supersedes the t test. J Exp Psychol Gen 142(2):573–603. PubMedCrossRefGoogle Scholar
  50. 50.
    Gonen M (2010) The Bayesian t-test and beyond. Methods Mol Biol 620:179–199. PubMedCrossRefGoogle Scholar
  51. 51.
    Fox RJ, Dimmic MW (2006) A two-sample Bayesian t-test for microarray data. BMC Bioinformatics 7:126. doi:1471-2105-7-126 [pii]PubMedPubMedCentralCrossRefGoogle Scholar
  52. 52.
    Kooperberg C, Aragaki A, Strand AD et al (2005) Significance testing for small microarray experiments. Stat Med 24(15):2281–2298PubMedCrossRefGoogle Scholar
  53. 53.
    Baldi P, Long AD (2001) A Bayesian framework for the analysis of microarray expression data: regularized t -test and statistical inferences of gene changes. Bioinformatics 17(6):509–519PubMedCrossRefGoogle Scholar
  54. 54.
    Kayala MA, Baldi P (2012) Cyber-T web server: differential analysis of high-throughput data. Nucleic Acids Res 40(Web Server issue):W553–W559. PubMedPubMedCentralCrossRefGoogle Scholar
  55. 55.
    Lönnstedt I, Speed T (2002) Replicated microarray data. Stat Sin 12:31–46Google Scholar
  56. 56.
    Smyth GK (2005) Limma: linear models for microarray data. In: Bioinformatics and computational biology solutions using R and bioconductor. Springer, New York, pp 397–420CrossRefGoogle Scholar
  57. 57.
    Tusher VG, Tibshirani R, Chu G (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A 98(9):5116–5121. PubMedPubMedCentralCrossRefGoogle Scholar
  58. 58.
    Efron B, Tibshirani R, Storey JD et al (2001) Empirical Bayes analysis of a microarray experiment. J Am Stat Assoc 96(456):1151–1160CrossRefGoogle Scholar
  59. 59.
    Tan YD, Fornage M, Fu YX (2006) Ranking analysis of microarray data: a powerful method for identifying differentially expressed genes. Genomics 88(6):846–854. doi:S0888-7543(06)00237-0 [pii]PubMedPubMedCentralCrossRefGoogle Scholar
  60. 60.
    McCarthy DJ, Smyth GK (2009) Testing significance relative to a fold-change threshold is a TREAT. Bioinformatics 25(6):765–771PubMedPubMedCentralCrossRefGoogle Scholar
  61. 61.
    Kadota K, Nakai Y, Shimizu K (2008) A weighted average difference method for detecting differentially expressed genes from microarray data. Algorithms Mol Biol 3:8. PubMedPubMedCentralCrossRefGoogle Scholar
  62. 62.
    Breitling R, Armengaud P, Amtmann A et al (2004) Rank products: a simple, yet powerful, new method to detect differentially regulated genes in replicated microarray experiments. FEBS Lett 573(1-3):83–92. PubMedCrossRefGoogle Scholar
  63. 63.
    Breitling R, Herzyk P (2005) Rank-based methods as a non-parametric alternative of the T-statistic for the analysis of biological microarray data. J Bioinforma Comput Biol 3(05):1171–1189CrossRefGoogle Scholar
  64. 64.
    Heskes T, Eisinga R, Breitling R (2014) A fast algorithm for determining bounds and accurate approximate p-values of the rank product statistic for replicate experiments. BMC Bioinformatics 15(1):1CrossRefGoogle Scholar
  65. 65.
    Normand SL (1999) Meta-analysis: formulating, evaluating, combining, and reporting. Stat Med 18(3):321–359PubMedCrossRefGoogle Scholar
  66. 66.
    Ramasamy A, Mondry A, Holmes CC et al (2008) Key issues in conducting a meta-analysis of gene expression microarray datasets. PLoS Med 5(9):e184PubMedPubMedCentralCrossRefGoogle Scholar
  67. 67.
    Shi L, Reid LH, MACQ Consortium et al (2006) The microarray quality control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol 24(9):1151–1161. PubMedCrossRefGoogle Scholar
  68. 68.
    Zeeberg BR, Riss J, Kane DW et al (2004) Mistaken identifiers: gene name errors can be introduced inadvertently when using excel in bioinformatics. BMC bioinformatics 5:80. PubMedPubMedCentralCrossRefGoogle Scholar
  69. 69.
    Cohen J (1988) Statistical power analysis for the behavioral sciences, 2nd edn. L. Erlbaum, Hillsdale, New JerseyGoogle Scholar
  70. 70.
    Petiti DB (1994) Meta-analysis decision analysis and cost-effectiveness analysis. In: Monographs in epidemiology and biostatistics, vol 24. Oxford University Press, OxfordGoogle Scholar
  71. 71.
    DerSimonian R, Laird N (1986) Meta-analysis in clinical trials. Control Clin Trials 7:177–188PubMedPubMedCentralCrossRefGoogle Scholar
  72. 72.
    Thompson SG, Sharp SJ (1999) Explaining heterogeneity in meta-analysis: a comparison of methods. Stat Med 18(20):2693–2708PubMedCrossRefGoogle Scholar
  73. 73.
    Choi JK, Yu U, Kim S et al (2003) Combining multiple microarray studies and modeling interstudy variation. Bioinformatics 19(Suppl 1):i84–i90PubMedCrossRefGoogle Scholar
  74. 74.
    Stevens JR, Doerge RW (2005) Combining affymetrix microarray results. BMC Bioinformatics 6:57. doi:1471-2105-6-57 [pii]PubMedPubMedCentralCrossRefGoogle Scholar
  75. 75.
    Hu P, Greenwood CM, Beyene J (2009) Using the ratio of means as the effect size measure in combining results of microarray experiments. BMC Syst Biol 3:106. PubMedPubMedCentralCrossRefGoogle Scholar
  76. 76.
    Friedrich JO, Adhikari NK, Beyene J (2008) The ratio of means method as an alternative to mean differences for analyzing continuous outcome variables in meta-analysis: a simulation study. BMC Med Res Methodol 8(1):1CrossRefGoogle Scholar
  77. 77.
    Friedrich JO, Adhikari NK, Beyene J (2012) Ratio of geometric means to analyze continuous outcomes in meta-analysis: comparison to mean differences and ratio of arithmetic means using empiric data and simulation. Stat Med 31(17):1857–1886. PubMedCrossRefGoogle Scholar
  78. 78.
    Sutton AJ, Abrams KR (2001) Bayesian methods in meta-analysis and evidence synthesis. Stat Methods Med Res 10(4):277–303PubMedCrossRefGoogle Scholar
  79. 79.
    Conlon EM, Song JJ, Liu A (2007) Bayesian meta-analysis models for microarray data: a comparative study. BMC Bioinformatics 8:80. doi:1471-2105-8-80 [pii]PubMedPubMedCentralCrossRefGoogle Scholar
  80. 80.
    Conlon EM, Song JJ, Liu JS (2006) Bayesian models for pooling microarray studies with multiple sources of replications. BMC Bioinformatics 7:247. doi:1471-2105-7-247 [pii]PubMedPubMedCentralCrossRefGoogle Scholar
  81. 81.
    Marot G, Foulley JL, Mayer CD et al (2009) Moderated effect size and P-value combinations for microarray meta-analyses. Bioinformatics 25(20):2692–2699. PubMedCrossRefGoogle Scholar
  82. 82.
    Wang X, Kang DD, Shen K et al (2012) An R package suite for microarray meta-analysis in quality control, differentially expressed gene analysis and pathway enrichment detection. Bioinformatics 28(19):2534–2536. doi:bts485 [pii]PubMedPubMedCentralCrossRefGoogle Scholar
  83. 83.
    Hong F, Breitling R, McEntee CW et al (2006) RankProd: a bioconductor package for detecting differentially expressed genes in meta-analysis. Bioinformatics 22(22):2825–2827. doi:btl476 [pii]PubMedCrossRefGoogle Scholar
  84. 84.
    Zintzaras E, Ioannidis JP (2008) Meta-analysis for ranked discovery datasets: theoretical framework and empirical demonstration for microarrays. Comput Biol Chem 32(1):38–46. doi:S1476-9271(07)00119-3 [pii]PubMedCrossRefGoogle Scholar
  85. 85.
    Zintzaras E, Ioannidis JP (2012) METRADISC-XL: a program for meta-analysis of multidimensional ranked discovery oriented datasets including microarrays. Comput Methods Prog Biomed 108(3):1243–1246. CrossRefGoogle Scholar
  86. 86.
    Hess A, Iyer H (2007) Fisher's combined p-value for detecting differentially expressed genes using Affymetrix expression arrays. BMC Genomics 8:96. doi:1471-2164-8-96 [pii]PubMedPubMedCentralCrossRefGoogle Scholar
  87. 87.
    Fisher RA (1946) Statistical methods for research workers, 10th edn. Oliver and Boyd, EdinburghGoogle Scholar
  88. 88.
    Jones DR (1995) Meta-analysis: weighing the evidence. Stat Med 14(2):137–149PubMedCrossRefGoogle Scholar
  89. 89.
    Bailey TL, Gribskov M (1998) Combining evidence using p-values: application to sequence homology searches. Bioinformatics 14(1):48–54PubMedCrossRefGoogle Scholar
  90. 90.
    Edgington ES (1972) An additive method for combining probability values from independent experiments. J Psychol 80(2):351–363CrossRefGoogle Scholar
  91. 91.
    Edgington ES (1972) A normal curve method for combining probability values from independent experiments. J Psychol 82(1):85–89CrossRefGoogle Scholar
  92. 92.
    Zaykin DV, Zhivotovsky LA, Westfall PH et al (2002) Truncated product method for combining P-values. Genet Epidemiol 22(2):170–185. PubMedCrossRefGoogle Scholar
  93. 93.
    Loughin TM (2004) A systematic comparison of methods for combining p-values from independent tests. Comput Stat Data Anal 47(3):467–485CrossRefGoogle Scholar
  94. 94.
    Cousins RD (2007) Annotated bibliography of some papers on combining significances or p-values. arXiv preprint arXiv:07052209Google Scholar
  95. 95.
    Stouffer SA, Suchman EA, De Vinney L et al (1951) Studies in social psychology in world war II. In: The American soldier: adjustment during army life, vol Vol. 1. Princeton University Press, PrincetonGoogle Scholar
  96. 96.
    Zhou B, Shi J, Whittemore AS (2011) Optimal methods for meta-analysis of genome-wide association studies. Genet Epidemiol 35(7):581–591. PubMedPubMedCentralCrossRefGoogle Scholar
  97. 97.
    Dudoit S, Yang YH, Callow MJ, et al (2000) Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Technical report # 578Google Scholar
  98. 98.
    Sidak Z (1967) Rectangular confidence regions for the means of multivariate normal distributions. J Am Stat Assoc 62:626–633Google Scholar
  99. 99.
    Holland BS, Copenhaver MD (1987) An improved sequentially rejective bonferroni test procedure. Biometrics 43(2):417–423CrossRefGoogle Scholar
  100. 100.
    Holm S (1979) A simple sequentially rejective multiple test procedure. Scand J Stat 6:65–70Google Scholar
  101. 101.
    Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc B Methodol 57(1):289–300Google Scholar
  102. 102.
    Benjamini Y, Yekutieli D (2001) The control of the false discovery rate in multiple testing under dependency. Ann Stat 29:1165–1188CrossRefGoogle Scholar
  103. 103.
    Benjamini Y, Liu W (1999) A distribution-free multiple test procedure that controls the false discovery rate. Tel Aviv University, Tel AvivGoogle Scholar
  104. 104.
    Benjamini Y, Liu W (1999) A step-down multiple hypotheses testing procedure that controls the false discovery rate under independence. J Stat Plan Inference 82(1):163–170CrossRefGoogle Scholar
  105. 105.
    Benjamini Y, Krieger AM, Yekutieli D (2006) Adaptive linear step-up procedures that control the false discovery rate. Biometrika 93(3):491–507. CrossRefGoogle Scholar
  106. 106.
    Ioannidis JP, Allison DB, Ball CA et al (2009) Repeatability of published microarray gene expression analyses. Nat Genet 41(2):149–155PubMedCrossRefGoogle Scholar
  107. 107.
    Jarvinen AK, Hautaniemi S, Edgren H et al (2004) Are data from different gene expression microarray platforms comparable? Genomics 83(6):1164–1168. PubMedCrossRefGoogle Scholar
  108. 108.
    Hwang KB, Kong SW, Greenberg SA et al (2004) Combining gene expression data from different generations of oligonucleotide arrays. BMC Bioinformatics 5:159. doi:1471-2105-5-159 [pii]PubMedPubMedCentralCrossRefGoogle Scholar
  109. 109.
    Moreau Y, Aerts S, De Moor B et al (2003) Comparison and meta-analysis of microarray data: from the bench to the computer desk. Trends Genet 19(10):570–577. doi:S0168952503002336 [pii]PubMedCrossRefGoogle Scholar
  110. 110.
    Chang LC, Lin HM, Sibille E et al (2013) Meta-analysis methods for combining multiple expression profiles: comparisons, statistical characterization and an application guideline. BMC Bioinformatics 14:368. PubMedPubMedCentralCrossRefGoogle Scholar
  111. 111.
    Hong F, Breitling R (2008) A comparison of meta-analysis methods for detecting differentially expressed genes in microarray experiments. Bioinformatics 24(3):374–382PubMedCrossRefGoogle Scholar
  112. 112.
    Campain A, Yang YH (2010) Comparison study of microarray meta-analysis methods. BMC Bioinformatics 11:408. PubMedPubMedCentralCrossRefGoogle Scholar
  113. 113.
    Thorlund K, Wetterslev J, Awad T et al (2011) Comparison of statistical inferences from the DerSimonian–Laird and alternative random-effects model meta-analyses–an empirical assessment of 920 Cochrane primary outcome meta-analyses. Res Synth Methods 2(4):238–253PubMedCrossRefGoogle Scholar
  114. 114.
    Abrams K, Sanso B (1998) Approximate Bayesian inference for random effects meta-analysis. Stat Med 17(2):201–218PubMedCrossRefGoogle Scholar
  115. 115.
    Tseng GC, Ghosh D, Feingold E (2012) Comprehensive literature review and statistical considerations for microarray meta-analysis. Nucleic Acids Res 40(9):3785–3799. PubMedPubMedCentralCrossRefGoogle Scholar

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Panagiota I Kontou (1)
  • Athanasia Pavlopoulou (1, 2)
  • Pantelis G. Bagos (1) (Email author)

  1. Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece
  2. International Biomedicine and Genome Institute (iBG-Izmir), Dokuz Eylul University, Izmir, Turkey
