Clustering Short Gene Expression Profiles

  • Ling Wang
  • Marco Ramoni
  • Paola Sebastiani
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3909)


The unsupervised cluster analysis of data from temporal or dose-response experiments is one of the most important and challenging tasks of microarray data analysis. Here we present an extension of CAGED (Cluster Analysis of Gene Expression Dynamics), one of the most commonly used programs, to identify similar gene expression patterns measured in either short time-course or dose-response microarray experiments. Compared to the initial version of CAGED, in which gene expression temporal profiles are modeled by autoregressive equations, the new method uses polynomial models to incorporate time/dosage information into the model, and objective priors to include information about background noise in gene expression data. In its current formulation, CAGED results may change according to the parametrization. In the new formulation, we make the results invariant to reparametrization by using proper prior distributions on the model parameters. We compare the results obtained by our approach with those generated by STEM and show that our method can identify the correct number of clusters and allocate gene expression profiles to the correct clusters in simulated data, and that it produces more meaningful Gene Ontology enriched clusters in data from real microarray experiments.
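
To illustrate what modeling short profiles with polynomials in time or dose can look like, the following minimal Python sketch fits a low-degree polynomial to each profile and measures how much the fit worsens when two profiles are forced to share one curve. It is not the CAGED implementation: the abstract describes a Bayesian method with prior distributions on the model parameters, whereas this sketch swaps in ordinary least squares and a residual-sum-of-squares comparison. The function names (`design_matrix`, `fit_profile`, `merge_cost`) and the example data are hypothetical.

```python
import numpy as np

def design_matrix(times, degree):
    """Columns 1, t, t^2, ..., t^degree for the given time or dose points."""
    return np.vander(np.asarray(times, dtype=float), N=degree + 1, increasing=True)

def fit_profile(times, profile, degree):
    """Least-squares polynomial fit; returns coefficients and residual sum of squares."""
    X = design_matrix(times, degree)
    y = np.asarray(profile, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coef) ** 2))
    return coef, rss

def merge_cost(times, profile_a, profile_b, degree):
    """Increase in RSS when two profiles share one polynomial instead of one each.

    Illustrative stand-in for a Bayesian merge score (assumption, not from the paper).
    """
    _, rss_a = fit_profile(times, profile_a, degree)
    _, rss_b = fit_profile(times, profile_b, degree)
    X = design_matrix(times, degree)
    X_joint = np.vstack([X, X])  # both profiles observed at the same points
    y_joint = np.concatenate([np.asarray(profile_a, dtype=float),
                              np.asarray(profile_b, dtype=float)])
    coef, *_ = np.linalg.lstsq(X_joint, y_joint, rcond=None)
    rss_joint = float(np.sum((y_joint - X_joint @ coef) ** 2))
    return rss_joint - (rss_a + rss_b)

# Hypothetical example: five time points, three made-up profiles.
times = [0, 1, 2, 3, 4]
rising = [1, 3, 5, 7, 9]
rising_too = [1, 3, 5, 7, 9]
falling = [1, -1, -3, -5, -7]

cost_similar = merge_cost(times, rising, rising_too, degree=2)
cost_different = merge_cost(times, rising, falling, degree=2)
```

A merge cost near zero suggests the two profiles follow the same trend, while a large cost suggests they belong to different clusters. Because the design matrix is built from the actual time or dose values, unevenly spaced measurements are handled without change.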


Gene Expression Data; Microarray Experiment; Polynomial Model; Ease Score; Natural Logarithmic Scale
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Ling Wang (1)
  • Marco Ramoni (2)
  • Paola Sebastiani (1)
  1. Department of Biostatistics, Boston University School of Public Health, Boston, USA
  2. Children’s Hospital Informatics Program, Harvard Medical School, Boston, USA
