Clustering Time-Series Gene Expression Data with Unequal Time Intervals

  • Luis Rueda
  • Ataul Bari
  • Alioune Ngom
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5410)


Clustering time-series gene expression data is a challenging problem with its own particular constraints: the order of the time points cannot be changed, because exchanging two or more of them would produce quite different results and lead to erroneous biological conclusions. We focus on issues related to clustering gene expression temporal profiles and devise a novel algorithm for clustering temporal expression profiles from microarray data. The proposed method introduces the concept of profile alignment, which is achieved by minimizing the area between two aligned profiles. The overall pattern of expression in the time series is captured by combining agglomerative clustering with profile alignment, and the optimal number of clusters for a given dataset is determined by means of a variant of a clustering index. The effectiveness of the approach is demonstrated on two well-known datasets, yeast and serum, and corroborated with a set of pre-clustered yeast genes, on which the method attains very high classification accuracy even though it is an unsupervised scheme.
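The idea of aligning two profiles before comparing them can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: profiles are treated as piecewise-linear curves through the measured time points, alignment is a single vertical shift, the "area" is taken as the integrated squared difference after the shift, and clusters are merged with average linkage down to a user-supplied number of clusters `k`. The function names (`trapezoid`, `optimal_shift`, `aligned_distance`, `agglomerate`) are hypothetical. Unequal time intervals enter through the interval widths `t1 - t0`.

```python
from itertools import combinations


def trapezoid(times, values):
    """Exact integral of the piecewise-linear curve through (times, values)."""
    return sum((t1 - t0) * (v0 + v1) / 2.0
               for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:]))


def optimal_shift(times, x, y):
    """Vertical shift a minimizing the integral of (x(t) - y(t) - a)^2 (assumed criterion)."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    # Setting the derivative with respect to a to zero gives the time-weighted mean of diff.
    return trapezoid(times, diff) / (times[-1] - times[0])


def aligned_distance(times, x, y):
    """Integrated squared difference between x and the optimally shifted y."""
    a = optimal_shift(times, x, y)
    e = [xi - yi - a for xi, yi in zip(x, y)]
    # On each linear segment, the integral of e(t)^2 is h * (e0^2 + e0*e1 + e1^2) / 3.
    return sum((t1 - t0) * (e0 * e0 + e0 * e1 + e1 * e1) / 3.0
               for t0, t1, e0, e1 in zip(times, times[1:], e, e[1:]))


def agglomerate(times, profiles, k):
    """Average-linkage agglomerative clustering (an assumed linkage) down to k clusters."""
    clusters = [[i] for i in range(len(profiles))]

    def linkage(a, b):
        total = sum(aligned_distance(times, profiles[i], profiles[j])
                    for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        # Merge the pair of clusters with the smallest average aligned distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # i < j, so index i is unaffected by the deletion
    return clusters
```

Under these assumptions, two profiles that differ only by a constant offset have an aligned distance of zero, so they fall into the same cluster regardless of their absolute expression levels. Choosing `k` by a clustering validity index, as the abstract describes, is left out of the sketch.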


Microarrays gene expression time series data clustering 





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Luis Rueda (1)
  • Ataul Bari (1)
  • Alioune Ngom (1)
  1. School of Computer Science, University of Windsor, Windsor, Canada
