Statistical Issues in Microarray Data Analysis

  • Willem A. Rensink
  • Samuel P. Hazen
Part of the Methods in Molecular Biology™ book series (MIMB, volume 323)


Microarrays quantitatively measure the abundance of specific RNA transcripts by hybridizing a sample to a solid-state grid of oligonucleotides or amplicons. The prospect of measuring the entire transcriptome is extremely alluring, but, as with any experiment, it should be approached with caution and careful consideration. The level of confidence we can assign to the results depends on the skill with which the experiment is conducted, the quality of the experimental design and subsequent analysis, and, most important, the statistical power of the study. Any microarray experiment consists of several components: (1) carrying out an appropriately designed (replicated) plant experiment; (2) array processing, which includes several steps of data acquisition and normalization; and (3) analysis of expression data to identify differentially expressed genes and overall patterns of expression. Numerous software packages are available to assist in performing these steps, and it is not our intent to provide a software user’s manual or a statistical review. Rather, we aim to provide a brief user’s explanation of these components and to present the commonly used methods.
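As a rough illustration of components (2) and (3), the sketch below normalizes a set of arrays and then scores one gene for differential expression. It is a minimal example under stated assumptions, not the procedure of any particular software package: quantile normalization and a Welch t statistic are used here only as familiar representatives of "normalization" and "differential expression" testing, the function names `quantile_normalize` and `welch_t` are hypothetical, and the intensities are invented log2 values.

```python
import math
from statistics import mean, variance


def quantile_normalize(arrays):
    """Give every array the same empirical distribution.

    `arrays` is a list of arrays; each array is a list of log2
    intensities with one value per gene, in the same gene order.
    Ties are not treated specially (assumption: tied values are
    ranked by their position in the list).
    """
    n_genes = len(arrays[0])
    # Sort each array, then average across arrays at each rank.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [mean(column) for column in zip(*sorted_arrays)]

    normalized = []
    for a in arrays:
        # Gene indices ordered from lowest to highest intensity.
        order = sorted(range(n_genes), key=lambda i: a[i])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = rank_means[rank]
        normalized.append(out)
    return normalized


def welch_t(x, y):
    """Welch t statistic for one gene measured in two groups of replicates.

    Each group needs at least two replicates, and the groups must not
    both have zero variance (the standard error would be zero).
    """
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


# Hypothetical data: two arrays, three genes each.
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(arrays))

# Hypothetical replicate measurements of one gene in two conditions.
print(welch_t([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

In a real experiment the statistic would be computed for every gene, and the resulting tests would need to be corrected for multiple comparisons before any gene is declared differentially expressed.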

Key Words

Microarray data analysis; experimental design; normalization; differential expression; cluster analysis


Copyright information

© Humana Press Inc. 2006

Authors and Affiliations

  • Willem A. Rensink (1)
  • Samuel P. Hazen (2)
  1. The Institute for Genomic Research, Rockville
  2. The Scripps Research Institute, La Jolla
