Annals of Operations Research, Volume 263, Issue 1–2, pp 405–428

Clustering of short time-course gene expression data with dissimilar replicates

Data Mining and Analytics


Microarrays are used in genetics and medicine to examine large numbers of genes simultaneously through their expression levels under a given condition, such as a disease of interest. The information from these experiments can be enriched by following the expression levels through time and across biological replicates. This study proposes an algorithm that clusters genes according to the similarity of their behavior through time. The algorithm also aims to highlight the genes that behave differently between replicates and to separate the constant genes, which keep their baseline expression levels throughout the study. Finally, we use cluster validation techniques to suggest a sensible number of clusters when it is not known a priori. The illustrations show that the proposed algorithm offers a fast approach to clustering genes by the similarity of their behavior, and that it separates the constant genes and the genes with dissimilar replicates without any need for pre-processing. It also succeeds in suggesting the correct number of clusters when that number is not known.


Microarray gene expression; Short time-series; Replication; Distance; Clustering; Cluster validation



The authors would like to thank the Academic Writing Center at Middle East Technical University for their writing consultation and editorial help.



Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. Department of Psychiatry and Neuropsychology, Maastricht University, Maastricht, The Netherlands
  2. Department of Statistics, Middle East Technical University, Ankara, Turkey
  3. Department of Industrial Engineering, Middle East Technical University, Ankara, Turkey
