Abstract
Cluster analysis has been widely used to explore the expression profiles of thousands of genes measured in microarray experiments and to identify a small number of similar genes (objects) for more detailed biological investigation. However, most clustering algorithms tend to produce loose clusters that contain too many genes. In this paper, we propose a Bayesian tight clustering method for time course gene expression data, which selects a small number of closely related genes and constructs tight clusters from these genes alone.
Cite this article
Joo, Y., Casella, G. & Hobert, J. Bayesian model-based tight clustering for time course data. Comput Stat 25, 17–38 (2010). https://doi.org/10.1007/s00180-009-0159-7