Applied Bioinformatics, Volume 4, Issue 4, pp 263–276

Random Walk Models for Bayesian Clustering of Gene Expression Profiles

Original Research Article


The analysis of temporal gene expression profiles is a topic of increasing interest in functional genomics. Model-based clustering methods are particularly attractive because they can capture the dynamic nature of these data and identify the optimal number of clusters. We have defined a new Bayesian method that addresses several important issues left unsolved by currently available approaches: time dislocations in gene expression, non-stationarity of the processes generating the data, and data collected on an irregular temporal grid. Our method, which is based on random walk models, requires only mild a priori assumptions about the nature of the data-generating processes and explicitly models inter-gene variability within each cluster. It was first validated on simulated datasets and then applied to a dataset of serum-stimulated fibroblasts. In all cases the results were promising, indicating that the method can be helpful in functional genomics research.
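The abstract describes the modelling idea only at a high level. The sketch below is a minimal, hypothetical illustration of how a random-walk prior can score a candidate clustering of profiles sampled on an irregular time grid; it is not the authors' implementation. It assumes: (i) each cluster's shared profile is a first-order random walk whose increment variance grows in proportion to the time gap between samples, which is what lets unevenly spaced time points be handled; (ii) inter-gene variability within a cluster is represented by an extra gene-specific random walk; (iii) Gaussian measurement noise; and (iv) fixed, hypothetical hyperparameters `lam2`, `tau2`, `gene_lam2` and `sigma2`. All function names and the toy data are illustrative. Handling of time dislocations, hyperparameter estimation and the search over partitions and cluster numbers are not reproduced here.

```python
import math
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def rw_covariance(times: Sequence[float], lam2: float, tau2: float) -> Matrix:
    """Prior covariance of a first-order random walk observed at sorted, possibly
    irregular, sampling times. Each increment f(t_k) - f(t_{k-1}) is assumed to
    have variance lam2 * (t_k - t_{k-1}), so
    Cov[f(t_i), f(t_j)] = tau2 + lam2 * (min(t_i, t_j) - t_0)."""
    t0 = times[0]
    return [[tau2 + lam2 * (min(ti, tj) - t0) for tj in times] for ti in times]


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def gaussian_logpdf(y: Sequence[float], cov: Matrix) -> float:
    """Log density of y under a zero-mean multivariate normal with covariance cov."""
    low = cholesky(cov)
    n = len(y)
    z = [0.0] * n
    for i in range(n):  # forward substitution: solve L z = y
        z[i] = (y[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    quad = sum(v * v for v in z)  # equals y^T cov^{-1} y
    logdet = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return -0.5 * (quad + logdet + n * math.log(2.0 * math.pi))


def cluster_log_evidence(profiles: List[Sequence[float]], times: Sequence[float],
                         lam2: float = 1.0, tau2: float = 1.0,
                         gene_lam2: float = 0.1, sigma2: float = 0.1) -> float:
    """Log marginal likelihood of the genes in one cluster.

    Assumed model for gene g: y_g(t) = f(t) + d_g(t) + noise, where f is the
    shared cluster random walk, d_g is a gene-specific random walk starting at 0
    (inter-gene variability) and noise ~ N(0, sigma2). Because everything is
    Gaussian, f and d_g integrate out analytically into one covariance matrix.
    """
    s_f = rw_covariance(times, lam2, tau2)
    s_d = rw_covariance(times, gene_lam2, 0.0)
    m, n = len(times), len(profiles)
    cov = [[0.0] * (n * m) for _ in range(n * m)]
    for g in range(n):
        for h in range(n):
            for i in range(m):
                for j in range(m):
                    v = s_f[i][j]  # shared profile couples every pair of genes
                    if g == h:  # deviation and noise affect the same gene only
                        v += s_d[i][j] + (sigma2 if i == j else 0.0)
                    cov[g * m + i][h * m + j] = v
    y = [x for p in profiles for x in p]  # stack the profiles gene by gene
    return gaussian_logpdf(y, cov)


def partition_log_evidence(partition: List[List[str]],
                           data: Dict[str, Sequence[float]],
                           times: Sequence[float], **hyper: float) -> float:
    """Score a candidate clustering as the sum of its clusters' log evidences."""
    return sum(cluster_log_evidence([data[g] for g in cluster], times, **hyper)
               for cluster in partition)


if __name__ == "__main__":
    times = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]  # irregular sampling grid (toy values)
    data = {
        "a1": [0.0, 0.4, 0.9, 1.8, 2.9, 3.1],
        "a2": [0.1, 0.5, 1.0, 1.7, 3.0, 3.2],
        "b1": [0.0, -0.5, -1.0, -1.9, -3.0, -3.0],
        "b2": [-0.1, -0.4, -1.1, -2.0, -2.9, -3.1],
    }
    for part in ([["a1", "a2"], ["b1", "b2"]], [["a1", "b1"], ["a2", "b2"]]):
        print(part, round(partition_log_evidence(part, data, times), 2))
```

With the toy data, grouping the two rising profiles together and the two falling ones together should receive a higher score than mixing them. Because the score is additive over clusters, candidate partitions (including ones with different numbers of clusters) can be compared directly; how the search over partitions is organised is a separate design choice that this sketch leaves open.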



This work was supported in part by the Progetto di Ricerca di Interesse Nazionale (PRIN) 2003 grant ‘Dynamic modelling of gene expression profiles’ from the Italian Ministry of Education.

The authors have no conflicts of interest that are directly relevant to the content of this article.

Copyright information

© Adis Data Information BV 2005

Authors and Affiliations

  • Fulvia Ferrazzi (1)
  • Paolo Magni (1)
  • Riccardo Bellazzi (1)
  1. Dipartimento di Informatica e Sistemistica, Università di Pavia, Pavia, Italy