PAKDD 2008: Advances in Knowledge Discovery and Data Mining pp 759-766
Constrained Clustering for Gene Expression Data Mining
Abstract
Constrained clustering algorithms can incorporate domain-dependent constraints into the clustering process to achieve better results. However, most existing constrained clustering algorithms are k-means-like methods, which can only handle distance-based similarity measures. In this paper, we propose a constrained hierarchical clustering method, called Correlational-Constrained Complete Link (C-CCL), for gene expression analysis. It takes gene-pair constraints into account while using correlation coefficients as the similarity measure. We compared C-CCL with a correlational version of the COP-k-Means method (C-CKM) on a real yeast dataset. Evaluated with two validation measures, C-CCL substantially outperforms C-CKM in clustering quality.
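The abstract does not give the details of C-CCL, so the following is only a minimal sketch of the general idea it names: complete-link agglomerative clustering that uses Pearson correlation as the similarity measure and respects gene-pair constraints. The function name `constrained_complete_link`, the `must_link` / `cannot_link` parameters, and the stopping rule (a target cluster count `k`) are assumptions made for illustration and are not taken from the paper.

```python
import itertools
import numpy as np

def constrained_complete_link(X, k, must_link=(), cannot_link=()):
    """Illustrative sketch, NOT the paper's C-CCL algorithm.

    X           : array of shape (n_genes, n_conditions)
    k           : assumed stopping rule, the target number of clusters
    must_link   : pairs of gene indices forced into the same cluster
    cannot_link : pairs of gene indices kept in different clusters
    """
    n = X.shape[0]
    sim = np.corrcoef(X)               # sim[i, j] = Pearson r of genes i, j
    clusters = [{i} for i in range(n)]
    forbidden = {frozenset(p) for p in cannot_link}

    def violates(ca, cb):
        # True if merging ca and cb would join a cannot-link pair.
        return any(frozenset((i, j)) in forbidden for i in ca for j in cb)

    def link(ca, cb):
        # Complete link on a similarity: the least similar cross pair.
        return min(sim[i, j] for i in ca for j in cb)

    # Apply must-link pairs first (assumes the constraints are consistent;
    # no check is made for conflicts with cannot-link pairs).
    for a, b in must_link:
        ca = next(c for c in clusters if a in c)
        cb = next(c for c in clusters if b in c)
        if ca is not cb:
            clusters.remove(cb)
            ca |= cb

    # Repeatedly merge the most similar pair of clusters that is allowed.
    while len(clusters) > k:
        candidates = [
            (link(ca, cb), ia, ib)
            for (ia, ca), (ib, cb)
            in itertools.combinations(enumerate(clusters), 2)
            if not violates(ca, cb)
        ]
        if not candidates:             # every remaining merge is forbidden
            break
        _, ia, ib = max(candidates)    # ia < ib, so deleting ib is safe
        clusters[ia] |= clusters[ib]
        del clusters[ib]

    labels = np.empty(n, dtype=int)
    for cluster_id, members in enumerate(clusters):
        labels[list(members)] = cluster_id
    return labels
```

Because the similarity is a correlation rather than a distance, the merge step picks the pair of clusters whose least correlated members are still the most correlated. If the cannot-link constraints rule out every remaining merge, the sketch stops early and may return more than `k` clusters.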
Keywords
Hierarchical Clustering; Constrained Clustering; Gene Expression Mining; Microarray Analysis