Statistical Issues in the Design and Analysis of Gene Expression Microarray Studies of Animal Models

Journal of Mammary Gland Biology and Neoplasia

Abstract

Appropriate statistical design and analysis of gene expression microarray studies are critical for drawing valid and useful conclusions from expression profiling studies of animal models. In this paper, several aspects of study design are discussed, including the number of animals needed for a study to be adequately powered, the usefulness of replication and pooling, and the allocation of samples to arrays. Data preprocessing methods for both cDNA dual-label spotted arrays and Affymetrix-style oligonucleotide arrays are reviewed. High-level analysis strategies are briefly discussed for each of three types of study aim: class comparison, class discovery, and class prediction. For class comparison, methods are discussed for identifying genes differentially expressed between classes while guarding against unacceptably high numbers of false positive findings. For class discovery, various clustering methods are discussed. Class prediction methods are briefly reviewed, with emphasis on the importance of proper validation of predictors.

Author information

Correspondence to Lisa M. McShane.

About this article

Cite this article

McShane, L.M., Shih, J.H. & Michalowska, A.M. Statistical Issues in the Design and Analysis of Gene Expression Microarray Studies of Animal Models. J Mammary Gland Biol Neoplasia 8, 359–374 (2003). https://doi.org/10.1023/B:JOMG.0000010035.57912.5a
