Abstract
The desire to view the simultaneous behavior of genes affected by a stimulus at the whole-genome level has brought the scientific world to a new point in history. It is now commonplace for an experiment to investigate the expression of thousands of genes across treatments and time points. Biologists are quickly coming to understand that statistical models are needed to make sense of these data and of the variation inherent in the experimental process. This article presents important aspects of the two most common microarray technologies, the spotted array and the oligonucleotide array, in order to identify the features that the technologies and their data share and those unique to each. Statistical models are suggested and the statistical literature is reviewed in an attempt to bring some simplicity to the daunting task of analyzing these data. We include two examples, each based on one of the two technologies, suggest a statistical model, and present the results of the analyses in the hope of providing both encouragement and guidance to readers who want to become more involved in the exciting field of genomics.
References
Amaratunga, D., and Cabrera, J. (2001), “Analysis of Data from DNA Microchips,” Journal of the American Statistical Association, 96, 1161–1169.
Anderson, M., and Roberts, J. (1998), Arabidopsis, Sheffield, England: Sheffield Academic Press.
Baggerly, K. A., Coombes, K. R., Hess, K. R., Stivers, D. N., Abruzzo, L. V., and Zhang, W. (2001), “Identifying Differentially Expressed Genes in cDNA Microarray Experiments,” Journal of Computational Biology, 8, 639–659.
Baldi, P., and Long, A. D. (2001), “A Bayesian Framework for the Analysis of Microarray Expression Data: Regularized t-Test and Statistical Inferences of Gene Changes,” Bioinformatics, 17, 509–519.
Benjamini, Y., and Hochberg, Y. (1995), “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing,” Journal of the Royal Statistical Society, Series B, 57, 289–300.
Black, M. (2002), “Statistical Issues in the Design and Analysis of Spotted Microarray Experiments,” unpublished Ph.D. thesis, Purdue University.
Black, M. A., and Doerge, R. W. (2001), “Calculation of the Minimum Number of Replicate Spots Required for Detection of Significant Gene Expression Fold Change in Microarray Experiments,” in Proceedings of the Conference on Applied Statistics in Agriculture, ed. G. Milliken, pp. 144–158.
— (2002), “Calculation of the Minimum Number of Replicate Spots Required for Detection of Significant Gene Expression Fold Change in Microarray Experiments,” Bioinformatics, 18, 1609–1616.
Carson, J. A., Nettleton, D., and Reecy, J. M. (2001), “Differential Gene Expression in the Rat Soleus Muscle During Early Work Overload-Induced Hypertrophy,” FASEB Journal, 15, U261–U281.
Chen, Y., Dougherty, E. R., and Bittner, M. L. (1997), “Ratio-Based Decisions and the Quantitative Analysis of cDNA Microarray Images,” Journal of Biomedical Optics, 2, 364–374.
Chu, T.-M., Weir, B., and Wolfinger, R. (2002), “A Systematic Statistical Linear Modeling Approach to Oligonucleotide Array Experiments,” Mathematical Biosciences, 176, 35–51.
Clement, K., Viguerie, N., Diehn, M., Alizadeh, A., Barbe, P., Thalamas, C., Storey, J. D., Brown, P. O., Barsh, G. S., and Langin, D. (2002), “In Vivo Regulation of Human Skeletal Muscle Gene Expression by Thyroid Hormone,” Genome Research, 12, 281–291.
Craig, B. A., Vitek, O., Black, M. A., Tanurdzic, M., and Doerge, R. W. (2001), “Designing Microarray Experiments: Chips, Dips, Flips and Skips,” in Proceedings of the Conference on Applied Statistics in Agriculture, ed. G. Milliken, pp. 159–182.
Daniel, W. W. (1990), Applied Nonparametric Statistics (2nd ed.), Boston: PWS-Kent Publishing.
Dudoit, S., Fridlyand, J., and Speed, T. P. (2002), “Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data,” Journal of the American Statistical Association, 97, 77–87.
Dudoit, S., Yang, Y. H., Callow, M. J., and Speed, T. P. (2000), “Statistical Methods for Identifying Differentially Expressed Genes in Replicated cDNA Microarray Experiments,” Technical Report 578, Statistics Department, University of California at Berkeley.
Duggan, D. J., Bittner, M., Chen, Y., Meltzer, P., and Trent, J. M. (1999), “Expression Profiling Using cDNA Microarrays,” Nature Genetics Supplement, 21, 10–14.
Durbin, B., Hardin, J., Hawkins, D., and Rocke, D. (2002), “A Variance-Stabilizing Transformation for Gene-Expression Microarray Data,” Bioinformatics, 18, S105–S110.
Efron, B., Tibshirani, R., Storey, J. D., and Tusher, V. (2001), “Empirical Bayes Analysis of a Microarray Experiment,” Journal of the American Statistical Association, 96, 1151–1160.
Eisen, M., and Brown, P. O. (1999), “DNA Arrays for Analysis of Gene Expression,” Methods in Enzymology, 303, 179–205.
Eisen, M., Spellman, P., Brown, P., and Botstein, D. (1998), “Cluster Analysis of Genome-Wide Expression Patterns,” in Proceedings of the National Academy of Sciences, 95, 14863–14868.
Finnegan, E., Genger, R., Peacock, W., and Dennis, E. (1998), “DNA Methylation in Plants,” Annual Review of Plant Physiology and Plant Molecular Biology, 49, 223–247.
Fodor, S. P., Read, J. L., Pirrung, M. C., Stryer, L., Lu, A. T., and Solas, D. (1991), “Light-Directed, Spatially Addressable Parallel Chemical Synthesis,” Science, 251, 767–773.
Hegde, P., Qi, R., Abernathy, K., Gay, C., Dharap, S., Gaspard, R., Hughes, J. E., Snesrud, E., Lee, N., and Quackenbush, J. (2000), “A Concise Guide to cDNA Microarray Analysis,” BioTechniques, 29, 548–562.
Hoaglin, D. C., Mosteller, F., and Tukey, J. W. (2000), Understanding Robust and Exploratory Data Analysis, New York: Wiley.
Hochberg, Y., and Tamhane, A. C. (1987), Multiple Comparison Procedures, New York: Wiley.
Holm, S. (1979), “A Simple Sequentially Rejective Multiple Test Procedure,” Scandinavian Journal of Statistics, 6, 65–70.
Huber, W., von Heydebreck, A., Sultmann, H., Poustka, A., and Vingron, M. (2002), “Variance Stabilization Applied to Microarray Data Calibration and to the Quantification of Differential Expression,” Bioinformatics, 1, 1–9.
Irizarry, R. A., Hobbs, B., Collin, F., Beazer-Barclay, Y. D., Antonellis, K. J., Scherf, U., and Speed, T. P. (in press), “Exploration, Normalization, and Summaries of High Density Oligonucleotide Array Probe Level Data,” Biostatistics.
Jin, W., Riley, R. M., Wolfinger, R. D., White, K. P., Passador-Gurgel, G., and Gibson, G. (2001), “The Contributions of Sex, Genotype and Age to Transcriptional Variance in Drosophila melanogaster,” Nature Genetics, 29, 389–395.
Kendziorski, C., Zhang, Y., Lan, H., and Attie, A. (2002), “The Efficiency of mRNA Pooling in Microarray Experiments,” Technical Report 172, Department of Biostatistics, University of Wisconsin-Madison.
Kerr, M. K., and Churchill, G. A. (2001a), “Experimental Design for Gene Expression Microarrays,” Biostatistics, 2, 183–201.
— (2001b), “Statistical Design and the Analysis of Gene Expression Microarray Data,” Genetical Research, 77, 123–128.
Kerr, M. K., Martin, M., and Churchill, G. A. (2000), “Analysis of Variance for Gene Expression Microarray Data,” Journal of Computational Biology, 7, 819–837.
Lee, M. T., Kuo, F. C., Whitmore, G. A., and Sklar, J. (2000), “Importance of Replication in Microarray Gene Expression Studies: Statistical Methods and Evidence From Repetitive cDNA Hybridizations,” in Proceedings of the National Academy of Sciences, 97, 9834–9839.
Li, C., and Wong, W. H. (2001a), “Model-Based Analysis of Oligonucleotide Arrays: Expression Index Computation and Outlier Detection,” in Proceedings of the National Academy of Sciences, 98, 31–36.
— (2001b), “Model-Based Analysis of Oligonucleotide Arrays: Model Validation, Design Issues and Standard Error Application,” Genome Biology, 2, research0032.1–research0032.11.
Lockhart, D. J., Dong, H. L., Byrne, M. C., Follettie, M. T., Gallo, M. V., Chee, M. S., Mittman, M., Wang, C. W., Kobayashi, M., Horton, H., and Brown, E. L. (1996), “Expression Monitoring by Hybridization to High-Density Oligonucleotide Arrays,” Nature Biotechnology, 14, 1675–1680.
Mills, J. C., and Gordon, J. L. (2001), “A New Approach for Filtering Noise from High-Density Oligonucleotide Microarray Datasets,” Nucleic Acids Research, 29, No. 15, e72.
Munneke, B. (2001), “Null Model Methods for Cluster Analysis of Gene Expression Data,” unpublished Ph.D. thesis, Purdue University.
Nadon, R., and Shoemaker, J. (2002), “Statistical Issues with Microarrays: Processing and Analysis,” Trends in Genetics, 18, 265–271.
Newton, M. A., Kendziorski, C. M., Richmond, C. S., Blattner, F. R., and Tsui, K. W. (2001), “On Differential Variability of Expression Ratios: Improving Statistical Inference About Gene Expression Changes from Microarray Data,” Journal of Computational Biology, 8, 37–52.
Nguyen, D. V., Arpat, A. B., Wang, N., and Carroll, R. J. (2002), “DNA Microarray Experiments: Biological and Technological Aspects,” Biometrics, 58, 701–717.
Pease, A. C., Solas, D., Sullivan, E. J., Cronin, M. T., Holmes, C. P., and Fodor, S. P. (1994), “Light-Generated Oligonucleotide Arrays for Rapid DNA Sequence Analysis,” in Proceedings of the National Academy of Sciences, 91, 5022–5026.
Richmond, C. S., Glasner, J. D., Mau, R., Jin, H., and Blattner, F. R. (1999), “Genome-Wide Expression Profiling in Escherichia coli K-12,” Nucleic Acids Research, 27, 3821–3835.
Saiki, R. K., Gelfand, D. H., Stoffel, S., Scharf, S. J., Higuchi, R., Horn, G. T., Mullis, K. B., and Erlich, H. A. (1988), “Primer-Directed Enzymatic Amplification of DNA with a Thermostable DNA Polymerase,” Science, 239, 487–491.
Schadt, E. E., Li, C., Su, C., and Wong, W. H. (2000), “Analyzing High-Density Oligonucleotide Gene Expression Array Data,” Journal of Cellular Biochemistry, 80, 192–202.
Schena, M., Shalon, D., Davis, R. W., and Brown, P. O. (1995), “Quantitative Monitoring of Gene Expression Patterns with a Complementary DNA Microarray,” Science, 270, 467–470.
Schena, M., Shalon, D., Heller, R., Chai, A., Brown, P. O., and Davis, R. W. (1996), “Parallel Human Genome Analysis: Microarray-Based Expression Monitoring of 1,000 Genes,” in Proceedings of the National Academy of Sciences, 93, 10614–10619.
Thomas, J. G., Olson, J. M., Tapscott, S. J., and Zhao, L. P. (2001), “An Efficient and Robust Statistical Modeling Approach to Discover Differentially Expressed Genes Using Genomic Expression Profiles,” Genome Research, 11, 1227–1236.
Tusher, V. G., Tibshirani, R., and Chu, G. (2001), “Significance Analysis of Microarrays Applied to the Ionizing Radiation Response,” in Proceedings of the National Academy of Sciences, 98, 5116–5121.
Vongs, A., Kakutani, T., Martienssen, R., and Richards, E. (1993), “Arabidopsis thaliana DNA Methylation Mutants,” Science, 260, 1926–1928.
Weller, J. I., Song, J. Z., Heyen, D. W., Lewin, H. A., and Ron, M. (1998), “A New Approach to the Problem of Multiple Comparisons in the Genetic Dissection of Complex Traits,” Genetics, 150, 1699–1706.
Westfall, P. H., and Young, S. S. (1993), Resampling-Based Multiple Testing: Examples and Methods for p-value Adjustment, New York: Wiley.
Wolfinger, R. D., Gibson, G., Wolfinger, E. D., Bennett, L., Hamadeh, H., Bushel, P., Afshari, C., and Paules, R. S. (2001), “Assessing Gene Significance from cDNA Microarray Expression Data via Mixed Models,” Journal of Computational Biology, 8, 625–637.
Yang, Y. H., Buckley, M. J., Dudoit, S., and Speed, T. P. (2000), “Comparison of Methods for Image Analysis on cDNA Microarray Data,” Technical Report 584, Statistics Department, University of California at Berkeley.
Yang, Y. H., Dudoit, S., Lu, P., Lin, D. M., Peng, V., Ngai, J., and Speed, T. P. (2002), “Normalization for cDNA Microarray Data: A Robust Composite Method Addressing Single and Multiple Slide Systematic Variation,” Nucleic Acids Research, 30, e15.
Yang, Y. H., and Speed, T. (2002), “Design Issues for cDNA Microarray Experiments,” Nature Reviews Genetics, 3, 579–588.
Additional information
All three authors are also affiliated with the Computational Genomics Facility, Purdue University, West Lafayette, IN 47907.
About this article
Cite this article
Craig, B.A., Black, M.A. & Doerge, R.W. Gene expression data: The technology and statistical analysis. JABES 8, 1–28 (2003). https://doi.org/10.1198/1085711031256
DOI: https://doi.org/10.1198/1085711031256