Abstract
Microarray technology allows gene expression levels to be measured simultaneously on a whole-genome scale. Rapid progress in this technology has generated both a wealth of information and challenges in drawing inferences from such massive data sets. Bayesian statistical modeling offers an alternative to frequentist methodologies and has several features that make it advantageous for the analysis of microarray data. These include the incorporation of prior information, flexible exploration of arbitrarily complex hypotheses, easy inclusion of nuisance parameters, and relatively well developed methods for handling missing data.
Recent developments in Bayesian methodology have generated a variety of techniques for identifying differentially expressed genes, finding genes with similar expression profiles, and uncovering underlying gene regulatory networks. Bayesian methods will undoubtedly become more common in the future because of their great utility in microarray analysis.
Notes
1. Note that in statistical terminology, a parameter is an aspect of a population distribution (e.g. a mean, a variance, a correlation). The word parameter does not denote the measured values of the subject characteristics under study. These are referred to as variables.
Acknowledgements
Supported in part by NIH grants T32AR007450, R01DK56366, P30DK56336, P01AG11915, R01AG018922, P20CA093753, R01AG011653, U24DK058776 and R01ES09912, NSF grants 0090286 and 0217651, and a grant from the UAB-HSF-GEF.
None of the authors have a conflict of interest directly relevant to the content of this review.
Cite this article
Yang, D., Zakharkin, S.O., Page, G.P. et al. Applications of Bayesian Statistical Methods in Microarray Data Analysis. Am J Pharmacogenomics 4, 53–62 (2004). https://doi.org/10.2165/00129785-200404010-00006