Ensembles Based on Random Projections to Improve the Accuracy of Clustering Algorithms
We present an algorithmic scheme for unsupervised cluster ensembles, based on randomized projections between metric spaces, by which a substantial dimensionality reduction is obtained. Multiple clusterings are performed on random subspaces, approximately preserving the distances between the projected data, and then they are combined using a pairwise similarity matrix; in this way the accuracy of each “base” clustering is maintained, and the diversity between them is improved. The proposed approach is effective for clustering problems characterized by high dimensional data, as shown by our preliminary experimental results.
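The scheme described above can be illustrated with a minimal sketch. The abstract does not say which projection, base clustering algorithm, or combination rule is used, so everything concrete here is an assumption made for illustration: a Gaussian random matrix scaled by 1/sqrt(d') as the projection, a plain Lloyd's k-means as the base algorithm, and connected components of the thresholded pairwise similarity matrix as the consensus step. All function names and parameters (`random_projection`, `projection_ensemble`, `target_dim`, `n_base`, `threshold`) are hypothetical.

```python
import math
import random


def random_projection(data, target_dim, rng):
    """Map each row of `data` to `target_dim` dimensions with a Gaussian random matrix."""
    source_dim = len(data[0])
    scale = 1.0 / math.sqrt(target_dim)  # keeps squared distances unbiased in expectation
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(target_dim)]
              for _ in range(source_dim)]
    return [[sum(row[i] * matrix[i][j] for i in range(source_dim))
             for j in range(target_dim)] for row in data]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, iterations=20):
    """Plain Lloyd's k-means; an assumed stand-in for any base clustering algorithm."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: squared_distance(p, centers[c]))
                  for p in points]
        # Move every center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def similarity_matrix(labelings):
    """Fraction of base clusterings in which each pair of points shares a cluster."""
    n = len(labelings[0])
    return [[sum(l[i] == l[j] for l in labelings) / len(labelings)
             for j in range(n)] for i in range(n)]


def consensus_clusters(similarity, threshold=0.5):
    """Assumed combination rule: connected components of the graph that links
    pairs whose similarity exceeds `threshold`."""
    n = len(similarity)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and similarity[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def projection_ensemble(data, k, target_dim, n_base=10, seed=0):
    """Cluster `n_base` independent random projections of `data` and combine them."""
    rng = random.Random(seed)
    labelings = [kmeans(random_projection(data, target_dim, rng), k, rng)
                 for _ in range(n_base)]
    return consensus_clusters(similarity_matrix(labelings))
```

Calling `projection_ensemble(data, k=2, target_dim=5)` on a list of equal-length numeric rows returns one consensus label per row. Because each base clustering sees a different random subspace, occasional poor runs are outvoted in the similarity matrix; other consensus rules, such as hierarchical clustering of the same matrix, could be substituted for the thresholding step.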
Keywords: High Dimensional Data, Random Projection, Multiple Clusterings, Cluster Ensemble, Random Subspace