A Robust Statistical Method for Detecting Differentially Expressed Genes

Abstract

DNA microarray technology allows researchers to monitor the expression of thousands of genes under different conditions, and to measure the levels of thousands of different DNA molecules at a given point in the life of an organism, tissue or cell. Many diseases characterised by unregulated gene expression, DNA replication, cell division and cell death can be detected early using microarrays. One of the major objectives of microarray experiments is to identify genes that are differentially expressed under various conditions. Detecting differential gene expression under two different conditions is very important in biological studies, because it allows us to identify experimental variables that affect different biological processes. Most of the tests available in the literature assume a normal distribution. However, the assumption of normality may not hold for real-life data, particularly microarray data.

A test is proposed for the identification of differentially expressed genes in replicated microarray experiments conducted under two different conditions. The proposed test makes no assumption about the distribution of the parent population; thus, it is strictly nonparametric in nature. We calculate the p-value and the asymptotic power function of the proposed test statistic. The proposed test statistic is compared with some of its competitors under normal, gamma and exponential populations using the Monte Carlo simulation technique. The application of the proposed test statistic is illustrated using microarray data. The proposed test is robust and highly efficient when populations are non-normal.




Acknowledgements

The author would like to thank Prof. PK Sen, University of North Carolina, Chapel Hill; Prof. Z Govindarajulu, University of Kentucky; and the referee for their valuable feedback.

This work was not supported by any research grant, and there is no conflict of interest.

Author information

Corresponding author

Correspondence to Dr Sunil Mathur.

Appendix

Appendix I

The algorithm for calculating the p-value.

Step-1 Generate a random sample X from population 1 under the null hypothesis.

Step-2 Generate a random sample Y from population 2 under the null hypothesis.

Step-3 Calculate sample X.

Step-4 Calculate sample Y.

Step-5 Calculate the test statistic Q.

Step-6 Repeat the process N = 10,000 times.

Step-7 Find n1, the number of times Q exceeded 0.

Step-8 p-value = n1/N.
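
The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the definition of Q is not given in this excerpt, so `placeholder_q` (a difference of sample medians) is a hypothetical stand-in and not the proposed statistic; both null populations are assumed to be standard normal; and the value that Q is compared against is passed in as `threshold` rather than fixed.

```python
import numpy as np


def placeholder_q(x, y):
    """Hypothetical stand-in for Q (difference of sample medians).

    This is NOT the statistic proposed in the article; it only makes
    the sketch runnable.
    """
    return float(np.median(x) - np.median(y))


def monte_carlo_p_value(threshold, m, n, statistic=placeholder_q,
                        n_rep=10_000, seed=0):
    """Estimate a right-tail p-value by simulation under the null.

    threshold : value the simulated statistic is compared against
    m, n      : sizes of sample X and sample Y
    n_rep     : number of Monte Carlo repetitions (N in the appendix)
    """
    rng = np.random.default_rng(seed)
    n_exceed = 0
    for _ in range(n_rep):
        # Steps 1-2: draw both samples under the null hypothesis
        # (standard normal populations are an assumption of this sketch).
        x = rng.standard_normal(m)
        y = rng.standard_normal(n)
        # Steps 3-5: compute the test statistic from the two samples.
        q = statistic(x, y)
        # Step 7: count how often the statistic exceeds the threshold.
        if q > threshold:
            n_exceed += 1
    # Step 8: the p-value is the proportion of exceedances.
    return n_exceed / n_rep
```

A call such as `monte_carlo_p_value(0.0, m=5, n=5)` returns the simulated proportion of repetitions in which the stand-in statistic exceeded the threshold; swapping in another function for `statistic` leaves the rest of the procedure unchanged.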

Appendix II

Critical values of Q for the right-tail test at the 5% level of significance for the sample sizes (m, n), obtained using the re-sampling scheme (table A1).

Table A1

Critical value (CV) of the proposed test statistic Q for the one-tail test at the 5% level of significance for the sample sizes (m, n)
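
One way a right-tail critical value could be obtained by simulation is to take the upper 5% point of the statistic's simulated null distribution. The sketch below assumes this approach; the re-sampling scheme itself is not described in this excerpt. The function names, the standard normal null populations and the example statistic `mean_difference` are all hypothetical choices made for illustration, not the proposed statistic Q.

```python
import numpy as np


def simulated_critical_value(m, n, statistic, alpha=0.05,
                             n_rep=10_000, seed=0):
    """Approximate the right-tail critical value of `statistic`.

    Draws n_rep pairs of samples of sizes (m, n) from standard normal
    populations (an assumption of this sketch) and returns the
    (1 - alpha) quantile of the simulated statistics.
    """
    rng = np.random.default_rng(seed)
    null_stats = np.array([
        statistic(rng.standard_normal(m), rng.standard_normal(n))
        for _ in range(n_rep)
    ])
    return float(np.quantile(null_stats, 1.0 - alpha))


def mean_difference(x, y):
    """Hypothetical example statistic (difference of sample means)."""
    return float(np.mean(x) - np.mean(y))


# Example: critical value for sample sizes (m, n) = (10, 10).
cv = simulated_critical_value(10, 10, mean_difference)
```

Under this scheme the null hypothesis would be rejected at the 5% level when the observed statistic exceeds `cv`; repeating the call over a grid of (m, n) gives a table of the same shape as the one described above.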

Cite this article

Mathur, S. A Robust Statistical Method for Detecting Differentially Expressed Genes. Appl-Bioinformatics 4, 247–251 (2005). https://doi.org/10.2165/00822942-200504040-00004

Keywords

  • Microarray Data
  • Normal Mixture
  • Gene AC002378
  • Monte Carlo Simulation Technique
  • Nominal Significance Level