ISPA 2005: Parallel and Distributed Processing and Applications - ISPA 2005 Workshops, pp 159-167
COMPACT: A Comparative Package for Clustering Assessment
Abstract
Numerous algorithms exist for clustering data points from large-scale genomic experiments such as sequencing, gene expression and proteomics. These algorithms may rest on distinct principles and can differ in both performance and results. Choosing an appropriate clustering method is a significant and often overlooked step in extracting information from large-scale datasets, and the choice may significantly influence the biological interpretation of the data. We present an easy-to-use and intuitive tool that compares several clustering methods within the same framework. The interface is named COMPACT, for Comparative-Package-for-Clustering-Assessment. COMPACT first reduces the dataset’s dimensionality using the Singular Value Decomposition (SVD) method, and only then employs various clustering techniques. Besides being simple and performing well on high-dimensional data, it provides visualization tools for evaluating the results. COMPACT was tested on a variety of datasets, from classical benchmarks to large-scale gene-expression experiments. COMPACT is configurable and can be extended with newly added algorithms.
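The two-stage idea in the abstract (reduce dimensionality with SVD, then cluster in the reduced space) can be illustrated with a minimal sketch. This is not COMPACT's implementation: the abstract does not say which clustering algorithms it includes or how it preprocesses the data, so plain k-means stands in for the clustering step, the data are synthetic, and the names `reduce_with_svd`, `kmeans` and `make_toy_data` are hypothetical helpers introduced only for this example.

```python
import numpy as np


def reduce_with_svd(data, n_components):
    """Project the rows of `data` onto its top right singular vectors.

    Assumption: no centering or normalization is applied; a real
    pipeline may preprocess the data differently.
    """
    # Economy-size SVD: data = U @ diag(s) @ Vt
    _, _, vt = np.linalg.svd(data, full_matrices=False)
    return data @ vt[:n_components].T


def kmeans(points, k, n_iter=50):
    """Lloyd's k-means with a deterministic farthest-point start.

    Used here only as a stand-in for "various clustering techniques".
    """
    centers = [points[0]]
    for _ in range(1, k):
        # Squared distance of every point to its nearest chosen center
        dists = np.min(
            [np.sum((points - c) ** 2, axis=1) for c in centers], axis=0
        )
        centers.append(points[np.argmax(dists)])
    centers = np.array(centers, dtype=float)

    for _ in range(n_iter):
        # Assign each point to its nearest center
        sq = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster is empty
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j)
            else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def make_toy_data(seed=0):
    """Synthetic data: two groups of 20 samples in 50 dimensions."""
    rng = np.random.default_rng(seed)
    group_a = rng.normal(0.0, 1.0, size=(20, 50))
    group_b = rng.normal(0.0, 1.0, size=(20, 50))
    group_b[:, :5] += 8.0  # shift the second group along five features
    return np.vstack([group_a, group_b])


data = make_toy_data()
reduced = reduce_with_svd(data, n_components=2)  # 50 dimensions -> 2
labels = kmeans(reduced, k=2)
```

Clustering in the reduced space keeps the distance computations cheap and discards directions that carry mostly noise. How many singular components to keep is a tuning choice that this sketch fixes arbitrarily at two.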
Keywords
Cluster Algorithm; Singular Value Decomposition; Yeast Cell Cycle; Quantum Cluster; Real Classification