DNA microarray technology allows researchers to monitor the expression of thousands of genes under different conditions and to measure the levels of thousands of different DNA molecules at a given point in the life of an organism, tissue or cell. Many diseases characterised by unregulated gene expression, DNA replication, cell division and cell death can be detected early using microarrays. One of the major objectives of microarray experiments is to identify genes that are differentially expressed under various conditions. Detecting differential gene expression under two different conditions is very important in biological studies because it allows us to identify experimental variables that affect different biological processes. Most of the tests available in the literature assume a normal distribution. However, this assumption may not hold for real-life data, particularly microarray data.
A test is proposed for identifying differentially expressed genes in replicated microarray experiments conducted under two different conditions. The proposed test makes no assumption about the distribution of the parent population and is therefore strictly nonparametric. We calculate the p-value and the asymptotic power function of the proposed test statistic. Using Monte Carlo simulation, the proposed test statistic is compared with some of its competitors under normal, gamma and exponential populations. Its application is illustrated using microarray data. The proposed test is robust and highly efficient when the populations are non-normal.
The author would like to thank Prof. PK Sen, University of North Carolina, Chapel Hill; Prof. Z Govindarajulu, University of Kentucky; and the referee for their valuable feedback.
This work was not supported by any research grant, and there is no conflict of interest.
The algorithm for calculating the p-value.
Step-1 Generate a random sample X from population 1 under the null hypothesis.
Step-2 Generate a random sample Y from population 2 under the null hypothesis.
Step-3 Calculate sample X.
Step-4 Calculate sample Y.
Step-5 Calculate the test statistic Q.
Step-6 Repeat Steps 1 to 5 N = 10,000 times.
Step-7 Find n1, the number of times Q exceeded 0.
Step-8 p-value = n1/N.
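The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the statistic Q is not defined in this excerpt, so `placeholder_q` (a difference of sample medians) is a hypothetical stand-in, and both null populations are assumed to be standard normal. Step-7 as printed compares Q with 0; the sketch exposes that cut-off as the `threshold` parameter, so a different cut-off, such as a value of Q computed from observed data, can be passed instead.

```python
import random
import statistics


def placeholder_q(x, y):
    """Hypothetical stand-in for Q (not the paper's statistic):
    difference of the two sample medians."""
    return statistics.median(x) - statistics.median(y)


def monte_carlo_p_value(m, n, threshold=0.0, n_rep=10_000, seed=0,
                        statistic=placeholder_q):
    """Estimate a right-tail p-value as n1 / N.

    m, n      -- sample sizes for populations 1 and 2
    threshold -- cut-off that Q is compared against (0 in Step-7 as printed)
    n_rep     -- number of Monte Carlo replications N
    """
    rng = random.Random(seed)
    n1 = 0
    for _ in range(n_rep):
        # Steps 1-2: draw X and Y under the null hypothesis
        # (assumed here: both populations are standard normal).
        x = [rng.gauss(0.0, 1.0) for _ in range(m)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Steps 3-5: compute the test statistic from the two samples.
        q = statistic(x, y)
        # Step 7: count how often Q exceeds the cut-off.
        if q > threshold:
            n1 += 1
    # Step 8: p-value = n1 / N.
    return n1 / n_rep


if __name__ == "__main__":
    print(monte_carlo_p_value(m=5, n=5, threshold=0.0, n_rep=2_000, seed=1))
```

Passing a different function as `statistic` swaps in another test statistic without changing the resampling loop.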
Critical value of Q for the right-tail test at 5% level of significance for the sample sizes (m, n) using the re-sampling scheme (table AI).
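A right-tail critical value of this kind can be approximated by simulating the null distribution of the statistic and reading off an upper quantile. The sketch below is illustrative only: the paper's re-sampling scheme and the definition of Q are not given in this excerpt, so `placeholder_q` is a hypothetical stand-in statistic, the null populations are assumed to be standard normal, and the resulting numbers are not the entries of table AI.

```python
import random
import statistics


def placeholder_q(x, y):
    """Hypothetical stand-in for Q (not the paper's statistic):
    difference of the two sample medians."""
    return statistics.median(x) - statistics.median(y)


def right_tail_critical_value(m, n, alpha=0.05, n_rep=10_000, seed=0,
                              statistic=placeholder_q):
    """Approximate the right-tail critical value for sample sizes (m, n)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_rep):
        # Assumed null model: both samples come from a standard normal.
        x = [rng.gauss(0.0, 1.0) for _ in range(m)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        draws.append(statistic(x, y))
    draws.sort()
    # Pick the sorted draw that leaves roughly a fraction alpha
    # of the simulated values above it.
    k = min(n_rep - 1, int((1 - alpha) * n_rep))
    return draws[k]


if __name__ == "__main__":
    for sizes in [(5, 5), (10, 10)]:
        print(sizes, right_tail_critical_value(*sizes, n_rep=2_000, seed=1))
```

Under this sketch, a right-tail test at the 5% level would reject the null hypothesis when the observed statistic exceeds the returned value.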
Mathur, S. A Robust Statistical Method for Detecting Differentially Expressed Genes. Appl-Bioinformatics 4, 247–251 (2005). https://doi.org/10.2165/00822942-200504040-00004
- Microarray Data
- Normal Mixture
- Gene AC002378
- Monte Carlo Simulation Technique
- Nominal Significance Level