Clustering of short time-course gene expression data with dissimilar replicates

Cinar, Ozan; Ilk, Ozlem; Iyigun, Cem

doi:10.1007/s10479-017-2583-3

Data Mining and Analytics
Published: 25 July 2017

Volume 263, pages 405–428 (2018)

Annals of Operations Research

Abstract

Microarrays are used in genetics and medicine to examine large numbers of genes simultaneously through their expression levels under a given condition, such as a disease of interest. The information from these experiments can be enriched by following the expression levels through time and across biological replicates. The purpose of this study is to propose an algorithm that clusters genes with respect to the similarities between their behaviors through time. The algorithm also aims to highlight the genes that behave differently between replicates and to separate the constant genes that keep their baseline expression levels throughout the study. Finally, we aim to feature cluster validation techniques to suggest a sensible number of clusters when it is not known a priori. The illustrations show that the proposed algorithm offers a fast approach to clustering genes with respect to their behavior similarities, and that it separates the constant genes and the genes with dissimilar replicates without any need for pre-processing. Moreover, it is also successful at suggesting the correct number of clusters when that number is not known.

References

Alonso, A., Berrendero, J., Hernandez, A., & Justel, A. (2006). Time series clustering based on forecast densities. Computational Statistics and Data Analysis, 51, 762–776.
Bar-Joseph, Z. (2004). Analyzing time series gene expression data. Bioinformatics, 20(16), 2493–2503.
Bar-Joseph, Z., Gerber, G. K., Gifford, D. K., Jaakkola, T. S., & Simon, I. (2003). Continuous representations of time-series gene expression data. Journal of Computational Biology, 10(3–4), 341–356.
Caiado, J., Crato, N., & Pena, D. (2006). A periodogram-based metric for time series classification. Computational Statistics and Data Analysis, 50, 2668–2684.
Celeux, G., Martin, O., & Lavergne, C. (2005). Mixture of linear mixed models for clustering gene expression profiles from repeated microarray experiments. Statistical Modelling, 5(3), 243–267.
Cho, R. J., Campbell, M. J., Winzeler, E. A., Steinmetz, L., Conway, A., Wodicka, L., Wolfsberg, T. G., et al. (1998). A genome-wide transcriptional analysis of the mitotic cell cycle. Molecular Cell, 2(1), 65–73.
Chu, S., DeRisi, J., Eisen, M., Mulholland, J., Botstein, D., & Brown, P. O. (1998). The transcriptional program of sporulation in budding yeast. Science, 282(5389), 699–705.
Corduas, M., & Piccolo, D. (2008). Time series clustering and classification by the autoregressive metric. Computational Statistics and Data Analysis, 52, 1860–1872.
Díaz, S. P., & Vilar, J. A. (2010). Comparing several parametric and nonparametric approaches to time series clustering: A simulation study. Journal of Classification, 27, 333–362.
Do, J. H., & Choi, D. (2008). Clustering approaches to identifying gene expression patterns from DNA microarray data. Molecules and Cells, 25(2), 279.
Eisen, M. B., Spellman, P. T., Brown, P. O., & Botstein, D. (1998). Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences, 95(25), 14,863–14,868.
Ernst, J., Nau, G. J., & Bar-Joseph, Z. (2005). Clustering short time series gene expression data. Bioinformatics, 21(suppl 1), i159–i168.
Galbraith, J., & Jiaqing, L. (1999). Cluster and discriminant analysis on time series as a research tool. UTIP Working Paper Number 6, The University of Texas at Austin, Austin: Lyndon B
Hackstadt, A. J., & Hess, A. M. (2009). Filtering for increased power for microarray data analysis. BMC Bioinformatics, 10(1), 1.
Hakamada, K., Okamoto, M., & Hanai, T. (2006). Novel technique for preprocessing high dimensional time-course data from DNA microarray: Mathematical model-based clustering. Bioinformatics, 22(7), 843–848.
Heard, N. A., Holmes, C. C., Stephens, D. A., Hand, D. J., & Dimopoulos, G. (2005). Bayesian coclustering of Anopheles gene expression time series: Study of immune defense response to multiple experimental challenges. Proceedings of the National Academy of Sciences of the United States of America, 102(47), 16,939–16,944.
Heyer, L. J., Kruglyak, S., & Yooseph, S. (1999). Exploring expression data: Identification and analysis of coexpressed genes. Genome Research, 9(11), 1106–1115.
Irigoien, I., Vives, S., & Arenas, C. (2011). Microarray time course experiments: Finding profiles. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 8(2), 464–475.
Kakizawa, Y., Shumway, R. H., & Taniguchi, M. (1998). Discrimination and clustering for multivariate time series. Journal of the American Statistical Association, 93, 328–340.
Khan, J., Simon, R., Bittner, M., Chen, Y., Leighton, S. B., Pohida, T., et al. (1998). Gene expression profiling of alveolar rhabdomyosarcoma with cDNA microarrays. Cancer Research, 58(22), 5009–5013.
Kim, B. R., Zhang, L., Berg, A., Fan, J., & Wu, R. (2008). A computational approach to the functional clustering of periodic gene-expression profiles. Genetics, 180(2), 821–834.
Liao, T. W. (2005). Clustering of time series data: A survey. Pattern Recognition, 38(11), 1857–1874.
Luan, Y., & Li, H. (2004). Model-based methods for identifying periodically expressed genes based on time course microarray gene expression data. Bioinformatics, 20(3), 332–339.
Maulik, U., & Bandyopadhyay, S. (2002). Performance evaluation of some clustering algorithms and validity indices. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(12), 1650–1654.
McLachlan, G. J., Peel, D., Basford, K. E., & Adams, P. (1999). The EMMIX software for the fitting of mixtures of normal and t-components. Journal of Statistical Software, 4(2), 1–14.
Möller-Levet, C. S., Klawonn, F., Cho, K. H., Yin, H., & Wolkenhauer, O. (2005). Clustering of unevenly sampled gene expression time-series data. Fuzzy Sets and Systems, 152(1), 49–66.
Ng, S. K., McLachlan, G. J., Wang, K., Jones, L. B. T., & Ng, S. W. (2006). A mixture model with random-effects components for clustering correlated gene-expression profiles. Bioinformatics, 22(14), 1745–1752.
Peddada, S., Harris, S., Zajd, J., & Harvey, E. (2005). Oriogen: Order restricted inference for ordered gene expression data. Bioinformatics, 21(20), 3933–3934.
Ramoni, M. F., Sebastiani, P., & Kohane, I. S. (2002). Cluster analysis of gene expression dynamics. Proceedings of the National Academy of Sciences, 99(14), 9121–9126.
Schliep, A., Schönhuth, A., & Steinhoff, C. (2003). Using hidden Markov models to analyze gene expression time course data. Bioinformatics, 19(suppl 1), i255–i263.
Spellman, P. T., Sherlock, G., Zhang, M. Q., Iyer, V. R., Anders, K., Eisen, M. B., et al. (1998). Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell, 9(12), 3273–3297.
Storey, J. D., Xiao, W., Leek, J. T., Tompkins, R. G., & Davis, R. W. (2005). Significance analysis of time course microarray experiments. Proceedings of the National Academy of Sciences of the United States of America, 102(36), 12,837–12,842.
Szekely, G. J., & Rizzo, M. L. (2005). Hierarchical clustering via joint between-within distances: Extending Ward’s minimum variance method. Journal of Classification, 22(2), 151–183.
Tamayo, P., Slonim, D., Mesirov, J., Zhu, Q., Kitareewan, S., Dmitrovsky, E., et al. (1999). Interpreting patterns of gene expression with self-organizing maps: Methods and application to hematopoietic differentiation. Proceedings of the National Academy of Sciences, 96(6), 2907–2912.
Vilar, J. A., Alonso, A., & Vilar, J. M. (2010). Non-linear time series clustering based on non-parametric forecast densities. Computational Statistics and Data Analysis, 54, 2850–2865.
Vilar, J. M., Vilar, J. A., & Pertega, S. (2009). Classifying time series data: A nonparametric approach. Journal of Classification, 26, 3–28.

Acknowledgements

The authors would like to thank the Academic Writing Center at Middle East Technical University for their writing consultation and editorial help.

Author information

Ozlem Ilk and Cem Iyigun have contributed equally to this work.

Authors and Affiliations

Department of Psychiatry and Neuropsychology, Maastricht University, Maastricht, The Netherlands
Ozan Cinar
Department of Statistics, Middle East Technical University, Ankara, Turkey
Ozlem Ilk
Department of Industrial Engineering, Middle East Technical University, Ankara, Turkey
Cem Iyigun

Corresponding author

Correspondence to Ozan Cinar.

Appendix

Irigoien et al. (2011) proposed a method for clustering time-course gene expression levels. The procedure involves several steps: the gene profiles are first standardized, and genes with dissimilar replicates or constant patterns are then filtered out. Distances between the remaining profiles are measured with Procrustes statistics. The default Irigoien procedure refers to this methodology with Procrustes statistics. Irigoien et al. also embed their Procrustes statistics in k-means and compare the resulting method with the usual k-means algorithms based on Euclidean and correlation distances.
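The standardize-then-filter steps described above can be illustrated with a minimal Python sketch. It is not the implementation of Irigoien et al. (2011): a correlation distance stands in for their Procrustes statistic, and the function names and the thresholds `max_rep_distance` and `min_sd` are hypothetical choices made only for this illustration.

```python
from statistics import mean, pstdev

def standardize(profile):
    """Center a profile to mean 0 and scale it to unit standard deviation."""
    m, s = mean(profile), pstdev(profile)
    if s == 0:
        raise ValueError("a constant profile cannot be standardized")
    return [(x - m) / s for x in profile]

def correlation_distance(a, b):
    """1 - Pearson correlation of two standardized profiles of equal length.
    Assumption: used here in place of a Procrustes statistic."""
    r = sum(x * y for x, y in zip(a, b)) / len(a)
    return 1.0 - r

def filter_genes(replicates, max_rep_distance=0.5, min_sd=1e-8):
    """Split genes into kept, constant, and dissimilar-replicate groups.

    `replicates` maps a gene name to a pair of replicate time-course
    profiles. Both thresholds are illustrative, not published values.
    """
    kept, constant, dissimilar = {}, [], []
    for gene, (rep1, rep2) in replicates.items():
        flat1 = pstdev(rep1) <= min_sd
        flat2 = pstdev(rep2) <= min_sd
        if flat1 and flat2:
            constant.append(gene)      # flat in both replicates
        elif flat1 or flat2:
            dissimilar.append(gene)    # flat in only one replicate
        else:
            z1, z2 = standardize(rep1), standardize(rep2)
            if correlation_distance(z1, z2) > max_rep_distance:
                dissimilar.append(gene)
            else:
                kept[gene] = (z1, z2)
    return kept, constant, dissimilar
```

Only the genes returned in `kept` would be passed on to a clustering step in a filter-first workflow of this kind.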

ORIOGEN is a Java-based software package used to cluster time-course gene expression profiles (Peddada et al. 2005). ORIOGEN requires candidate gene profiles to be specified, such as increasing or cyclical profiles. Each gene is then tested against each candidate profile by computing a goodness-of-fit statistic. Based on these statistics, each gene is assigned to one profile, and the genes assigned to the same profile constitute a cluster at the end of the algorithm.
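The idea of assigning each gene to the best-fitting candidate profile can be sketched in Python as follows. This is only an illustration of profile matching, not ORIOGEN itself: the Pearson correlation with a fixed template is an assumed stand-in for ORIOGEN's goodness-of-fit statistic, and the template values and function names are hypothetical.

```python
from statistics import mean, pstdev

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    ma, mb = mean(a), mean(b)
    sa, sb = pstdev(a), pstdev(b)
    if sa == 0 or sb == 0:
        return 0.0
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)
    return cov / (sa * sb)

def assign_to_profiles(genes, candidates):
    """Assign every gene to the candidate profile it fits best.

    `genes` maps gene names to expression profiles and `candidates` maps
    profile names to template shapes over the same time points. The fit
    score is a plain correlation, chosen only for this sketch.
    """
    clusters = {name: [] for name in candidates}
    for gene, profile in genes.items():
        best = max(candidates, key=lambda name: pearson(profile, candidates[name]))
        clusters[best].append(gene)
    return clusters
```

With templates such as `{"increasing": [0, 1, 2, 3, 4], "cyclical": [0, 1, 0, -1, 0]}`, each returned list holds the genes whose profile correlates most strongly with that template.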

EMMIX is a software package proposed by McLachlan et al. (1999) for fitting mixture models with multivariate normal or t-distributed components. The model is fitted by maximum likelihood, with the likelihood maximized by an Expectation-Maximization (EM) algorithm. Two parameters, for random effects and mixture effects, can be used to specify the initial step of the EM algorithm. These two parameters can also be specified so that they determine the clusters to which the subjects are assigned.
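To make the EM step concrete, the following Python sketch fits a univariate Gaussian mixture from user-supplied starting values. It is a simplified illustration under stated assumptions and not EMMIX code: EMMIX handles multivariate normal and t components, whereas this sketch is one-dimensional, and the function names, the fixed iteration count, and the variance floor are hypothetical choices.

```python
import math

def normal_pdf(x, mu, var):
    """Density of a univariate normal distribution with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gaussian_mixture(data, means, variances, weights, n_iter=100):
    """Run EM for a univariate Gaussian mixture from the given starting values.

    Returns the fitted means, variances, mixing weights, and a hard cluster
    label per point taken from the responsibilities of the last E-step.
    """
    k = len(means)
    means, variances, weights = list(means), list(variances), list(weights)
    resp = [[1.0 / k] * k for _ in data]
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens] if total > 0 else [1.0 / k] * k)
        # M-step: weighted updates of the mixing weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0:
                continue  # empty component: keep its previous parameters
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor guards against collapse
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return means, variances, weights, labels
```

The starting means, variances, and weights play the role of the initial step mentioned above: different starting values can lead EM to different local maxima of the likelihood.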

About this article

Cite this article

Cinar, O., Ilk, O. & Iyigun, C. Clustering of short time-course gene expression data with dissimilar replicates. Ann Oper Res 263, 405–428 (2018). https://doi.org/10.1007/s10479-017-2583-3

Issue Date: April 2018
DOI: https://doi.org/10.1007/s10479-017-2583-3
