Constrained Clustering for Gene Expression Data Mining

  • Vincent S. Tseng
  • Lien-Chin Chen
  • Ching-Pin Kao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5012)

Abstract

Constrained clustering algorithms can incorporate domain-dependent constraints into the clustering process to achieve better results. However, most existing constrained clustering algorithms are k-means-like methods, which can only handle distance-based similarity measures. In this paper, we propose a constrained hierarchical clustering method, called Correlational-Constrained Complete Link (C-CCL), for gene expression analysis; it takes gene-pair constraints into account while using correlation coefficients as the similarity measure. We compared C-CCL with a correlational version of the COP-k-Means method (C-CKM) on a real yeast dataset, evaluating both clustering methods with two validation measures. The results show that C-CCL substantially outperforms C-CKM in clustering quality.
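
To make the general idea concrete, the following is a minimal Python sketch of complete-link agglomerative clustering that uses Pearson correlation as the similarity measure and respects pairwise constraints. It is not the authors' C-CCL algorithm, whose details are not given in the abstract. It assumes the gene-pair constraints take the common must-link / cannot-link form, and the names `pearson` and `constrained_complete_link` are hypothetical helpers introduced only for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def constrained_complete_link(profiles, must_link, cannot_link, k):
    """Agglomerate genes into at most k clusters (illustrative, not the paper's C-CCL).

    Assumption: constraints are (i, j) index pairs of must-link / cannot-link type.
    """
    n = len(profiles)
    sim = [[pearson(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    forbidden = {frozenset(p) for p in cannot_link}

    # Seed clusters: genes joined by must-link pairs start in the same cluster.
    clusters = [{i} for i in range(n)]
    for a, b in must_link:
        ca = next(c for c in clusters if a in c)
        cb = next(c for c in clusters if b in c)
        if ca is not cb:
            clusters.remove(cb)
            ca |= cb

    def linkage(c1, c2):
        # Complete link with a similarity measure: the least similar pair decides.
        return min(sim[i][j] for i in c1 for j in c2)

    def allowed(c1, c2):
        # A merge is allowed only if it joins no cannot-link pair.
        return not any(frozenset((i, j)) in forbidden for i in c1 for j in c2)

    while len(clusters) > k:
        candidates = [(linkage(c1, c2), c1, c2)
                      for c1, c2 in combinations(clusters, 2) if allowed(c1, c2)]
        if not candidates:
            break  # no remaining merge satisfies the cannot-link constraints
        _, c1, c2 = max(candidates, key=lambda t: t[0])
        clusters.remove(c2)
        c1 |= c2
    return sorted(sorted(c) for c in clusters)


# Toy profiles: genes 0 and 1 rise together, genes 2 and 3 fall together.
profiles = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [8, 6, 4, 2]]
unconstrained = constrained_complete_link(profiles, [], [], k=2)
separated = constrained_complete_link(profiles, [], [(0, 1)], k=2)
```

Because the similarity is a correlation rather than a distance, the complete-link criterion here takes the minimum pairwise correlation between two clusters and merges the pair of clusters with the largest such value. The sketch does not check whether the must-link and cannot-link sets contradict each other, and it may return more than `k` clusters when the cannot-link constraints block every remaining merge.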

Keywords

Hierarchical Clustering, Constrained Clustering, Gene Expression Mining, Microarray Analysis

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Vincent S. Tseng (1)
  • Lien-Chin Chen (1)
  • Ching-Pin Kao (1)
  1. Dept. of Computer Science and Information Engineering, National Cheng Kung University, Taiwan, R.O.C.
