Gene expression data: The technology and statistical analysis

  • Editor’s Invited Article
Journal of Agricultural, Biological, and Environmental Statistics

Abstract

The desire to view the simultaneous behavior of genes affected by a stimulus at the total genome level has brought the scientific world to a new place in history. It is now commonplace to have an experiment that investigates the expression of thousands of genes across treatments and time points. Biologists are quickly understanding that in order to make sense of these data and the variation that is inherent in the experimental process, statistical models need to be employed. This article presents important aspects of the two most common microarray technologies, the spotted array and the oligonucleotide array, for the purpose of identifying common and unique features of each technology and the data produced. Statistical models are suggested, and the statistical literature reviewed, in an attempt to bring some level of simplicity to the daunting task of analyzing these data. We include two examples, each based upon one of the different technologies, suggest a statistical model, and present the results of the analyses in hopes of providing both encouragement and guidance to readers wanting to become more involved in this exciting field known as genomics.


Author information

Corresponding author

Correspondence to B. A. Craig.

Additional information

All three authors are also affiliated with the Computational Genomics Facility, Purdue University, West Lafayette, IN 47907.

About this article

Cite this article

Craig, B.A., Black, M.A. & Doerge, R.W. Gene expression data: The technology and statistical analysis. JABES 8, 1–28 (2003). https://doi.org/10.1198/1085711031256

  • DOI: https://doi.org/10.1198/1085711031256
