Applications of Bayesian Statistical Methods in Microarray Data Analysis

American Journal of Pharmacogenomics

Abstract

Microarray technology allows one to measure gene expression levels simultaneously on the whole-genome scale. The rapid progress of this technology generates both a great wealth of information and challenges in making inferences from such massive data sets. Bayesian statistical modeling offers an alternative to frequentist methodologies, and has several features that make it advantageous for the analysis of microarray data. These include the incorporation of prior information, flexible exploration of arbitrarily complex hypotheses, easy inclusion of nuisance parameters, and relatively well-developed methods to handle missing data.

Recent developments in Bayesian methodology have generated a variety of techniques for identifying differentially expressed genes, finding genes with similar expression profiles, and uncovering underlying gene regulatory networks. Bayesian methods will undoubtedly become more common in the future because of their great utility in microarray analysis.
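
As a rough illustration of what "incorporation of prior information" can mean in practice, the sketch below applies the textbook conjugate normal update to the mean log expression ratio of a single gene. It is not taken from the article: the function name `normal_posterior`, the prior settings, the assumption of a known noise variance, and the four example log ratios are all hypothetical choices made for illustration.

```python
from statistics import fmean


def normal_posterior(prior_mean, prior_var, observations, noise_var):
    """Posterior mean and variance of a normal mean under a normal prior.

    Assumes (for illustration only) that the measurement noise variance
    is known and that `observations` is a non-empty list of log ratios.
    """
    n = len(observations)
    prior_precision = 1.0 / prior_var   # weight carried by the prior
    data_precision = n / noise_var      # weight carried by the data
    post_var = 1.0 / (prior_precision + data_precision)
    # The posterior mean is a precision-weighted average of the prior mean
    # and the sample mean.
    post_mean = post_var * (
        prior_precision * prior_mean + data_precision * fmean(observations)
    )
    return post_mean, post_var


# Hypothetical replicate log ratios for one gene, with a prior centered on
# "no change" (0.0) and unit prior and noise variances.
log_ratios = [1.0, 1.0, 1.0, 1.0]
post_mean, post_var = normal_posterior(0.0, 1.0, log_ratios, 1.0)
print(post_mean, post_var)
```

With these inputs the posterior mean is about 0.8, so the estimate is pulled from the sample mean of 1.0 toward the prior mean of 0. The posterior variance (about 0.2) is smaller than either the prior variance or the variance of the sample mean alone. With more replicates the data would dominate and the shrinkage would shrink accordingly.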

Notes

  1. Note that in statistical terminology, a parameter is an aspect of a population distribution (e.g. a mean, a variance, a correlation). The word parameter does not denote the measured values of the subject characteristics under study. These are referred to as variables.

Acknowledgements

Supported in part by NIH grants T32AR007450, R01DK56366, P30DK56336, P01AG11915, R01AG018922, P20CA093753, R01AG011653, U24DK058776 and R01ES09912, NSF grants 0090286 and 0217651, and a grant from the UAB-HSF-GEF.

None of the authors have a conflict of interest directly relevant to the content of this review.

Author information

Corresponding author

Correspondence to David B. Allison.

About this article

Cite this article

Yang, D., Zakharkin, S.O., Page, G.P. et al. Applications of Bayesian Statistical Methods in Microarray Data Analysis. Am J Pharmacogenomics 4, 53–62 (2004). https://doi.org/10.2165/00129785-200404010-00006

  • DOI: https://doi.org/10.2165/00129785-200404010-00006
