Associating phenotypes with molecular events: recent statistical advances and challenges underpinning microarray experiments

  • Review Paper
  • Functional & Integrative Genomics

Abstract

Progress in mapping the genome and developments in array technologies have provided large amounts of information for delineating the roles of genes involved in complex diseases and quantitative traits. Because complex phenotypes are determined by a network of interrelated biological traits, typically involving multiple intercorrelated genetic and environmental factors that interact hierarchically, microarrays hold a great deal of latent information. The analysis of microarray data, however, remains a bottleneck. In this paper, we review recent advances in statistical analyses for associating phenotypes with molecular events underpinning microarray experiments. Classical statistical procedures for analyzing phenotypes in genetics are reviewed first, followed by descriptions of the statistical procedures for linking molecular events to measured gene expression phenotypes (microarray-based gene expression) and to observed phenotypes such as disease status. These statistical procedures include (1) prior analysis, such as data quality control and normalization, to minimize the effects of experimental artifacts and random noise; (2) gene selection and differentiation procedures based on inferential statistics for class comparisons; (3) analysis of dynamic temporal patterns through exploratory statistics such as unsupervised clustering and supervised classification and prediction; and (4) assessment of the reliability of microarray studies using real-time PCR, and of reproducibility across many studies and multiple platforms. In addition, post-analysis that relates the discovered gene expression patterns to pathway and functional analysis of the selected genes is considered, in order to increase our understanding of interconnected gene processes.
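
As an illustration of the gene selection step (2), the following is a minimal sketch of how a list of per-gene p-values from a class comparison might be adjusted for multiple testing before genes are selected. It assumes the Benjamini-Hochberg false discovery rate procedure as one common choice; the abstract does not specify which correction the review recommends. The function names `bh_adjust` and `select_genes` and the example p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_genes(pvalues, fdr=0.05):
    """Indices of genes whose adjusted p-value is at or below the FDR level."""
    return [i for i, q in enumerate(bh_adjust(pvalues)) if q <= fdr]


# Hypothetical p-values for four genes (illustrative only, not real data).
example_pvalues = [0.001, 0.2, 0.03, 0.8]
selected = select_genes(example_pvalues, fdr=0.05)
```

With thousands of genes tested at once, controlling the false discovery rate in this way is typically less conservative than a family-wise correction, which is one reason it is widely used in microarray class comparisons.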



Acknowledgements

The authors thank Rudi Appels for the critical reading of, and comments on, this manuscript. This work was supported in part by the National Institute of General Medical Sciences Grant (no. 5P20GM67650-02).

Author information

Corresponding author

Correspondence to Yulan Liang.

About this article

Cite this article

Liang, Y., Kelemen, A. Associating phenotypes with molecular events: recent statistical advances and challenges underpinning microarray experiments. Funct Integr Genomics 6, 1–13 (2006). https://doi.org/10.1007/s10142-005-0006-z

  • DOI: https://doi.org/10.1007/s10142-005-0006-z
