Statistical Issues in cDNA Microarray Data Analysis

  • Gordon K. Smyth
  • Yee Hwa Yang
  • Terry Speed
Part of the Methods in Molecular Biology book series (MIMB, volume 224)


Statistical considerations are frequently to the fore in the analysis of microarray data, as researchers sift through massive amounts of data and adjust for various sources of variability in order to identify the important genes among the many that are measured. This chapter summarizes some of the issues involved and provides a brief review of the analysis tools that are available to researchers to deal with these issues.





Copyright information

© Humana Press Inc. 2003

Authors and Affiliations

  • Gordon K. Smyth (1)
  • Yee Hwa Yang (2)
  • Terry Speed (1, 3)

  1. Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
  2. Division of Biostatistics, University of California, San Francisco
  3. Department of Statistics, University of California-Berkeley, Berkeley
