Improved Robustness in Time Series Analysis of Gene Expression Data by Polynomial Model Based Clustering
Microarray experiments produce large data sets that often contain noise and a considerable amount of missing data. Typical clustering methods, such as hierarchical clustering or partitional algorithms, can be adversely affected by such data. This paper introduces a method that overcomes the problems associated with noise and missing data by modelling the time series data with polynomials and using these models to cluster the data. Similarity measures for polynomials are given that comply with commonly used standard measures. The polynomial model based clustering is compared with standard clustering methods under different conditions and applied to a real gene expression data set. The polynomial model based method gives significantly better results than the standard methods as the levels of noise and missing data increase.
Keywords: Gene Expression Data, Time Series Analysis, Polynomial Model, Improve Robustness, Direct Cluster
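The idea summarised in the abstract (fit a polynomial to each expression profile, then cluster the fitted models rather than the raw values) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the polynomial degree, the integrated squared difference used as a continuous analogue of Euclidean distance, the plain average-linkage clustering, the synthetic profiles, and the function names `fit_polynomial`, `polynomial_distance`, `average_linkage` and `cluster_profiles`.

```python
import numpy as np
from numpy.polynomial import Polynomial


def fit_polynomial(times, values, degree=2):
    """Least-squares polynomial fit that skips missing (NaN) observations."""
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    observed = ~np.isnan(values)
    if observed.sum() <= degree:
        raise ValueError("too few observed points for the requested degree")
    # convert() maps the fit back to the unscaled time axis so that
    # polynomials from different profiles can be subtracted directly.
    return Polynomial.fit(times[observed], values[observed], degree).convert()


def polynomial_distance(p, q, t_start, t_end):
    """Square root of the integral of (p - q)^2 over [t_start, t_end].

    Assumed stand-in for a Euclidean-style measure on polynomials.
    """
    antiderivative = ((p - q) ** 2).integ()
    area = antiderivative(t_end) - antiderivative(t_start)
    return float(np.sqrt(max(area, 0.0)))  # guard against tiny negatives


def average_linkage(dist, n_clusters):
    """Naive agglomerative clustering on a precomputed distance matrix."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = np.mean([dist[i][j] for i in clusters[a] for j in clusters[b]])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(len(dist), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels


def cluster_profiles(times, profiles, degree=2, n_clusters=2):
    """Fit one polynomial per profile, then cluster the fitted models."""
    models = [fit_polynomial(times, row, degree) for row in profiles]
    n = len(models)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = polynomial_distance(
                models[i], models[j], times[0], times[-1]
            )
    return average_linkage(dist, n_clusters)


# Synthetic demonstration: three rising and three falling profiles,
# with added noise and two missing values per profile.
rng = np.random.default_rng(0)
times = np.linspace(0.0, 1.0, 10)
rising = 2.0 * times
falling = 2.0 - 2.0 * times
profiles = np.array([rising] * 3 + [falling] * 3)
profiles = profiles + rng.normal(scale=0.1, size=profiles.shape)
for row in range(len(profiles)):
    profiles[row, [(row + 2) % 10, (row + 6) % 10]] = np.nan  # missing data

labels = cluster_profiles(times, profiles, degree=2, n_clusters=2)
print(labels)
```

Because each profile is reduced to a fitted curve, missing time points are simply left out of the fit instead of being imputed, and the least-squares fit smooths out part of the noise before any distances are computed. This is one plausible reading of why a model based approach might degrade more gracefully than clustering the raw values.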