Ensembles Based on Random Projections to Improve the Accuracy of Clustering Algorithms

  • Alberto Bertoni
  • Giorgio Valentini
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3931)


We present an algorithmic scheme for unsupervised cluster ensembles based on randomized projections between metric spaces, which yield a substantial dimensionality reduction. Multiple clusterings are performed on random subspaces that approximately preserve the distances between the data, and are then combined using a pairwise similarity matrix. In this way the accuracy of each “base” clustering is maintained while the diversity among them is improved. Our preliminary experimental results show that the proposed approach is effective for clustering problems characterized by high dimensional data.
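The scheme outlined in the abstract (project, cluster each projection, combine through a pairwise similarity matrix) can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: the projection is a Gaussian random matrix, the base clusterer is a plain k-means, and the final partition is read off the similarity matrix by thresholding it at 0.5 and taking connected components. All function names (`random_projection`, `kmeans`, `co_association`, `consensus`, `make_blobs`) are hypothetical and chosen only for illustration.

```python
import math
import random


def random_projection(data, d_out, rng):
    """Project each row of `data` onto d_out dimensions (assumed Gaussian matrix)."""
    d_in = len(data[0])
    scale = 1.0 / math.sqrt(d_out)
    # Entries ~ N(0, 1/d_out), so squared distances are preserved in expectation.
    proj = [[rng.gauss(0.0, 1.0) * scale for _ in range(d_out)] for _ in range(d_in)]
    return [[sum(x[i] * proj[i][j] for i in range(d_in)) for j in range(d_out)]
            for x in data]


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(data, k, rng, iters=20):
    """Plain Lloyd iterations; an assumed stand-in for the base clustering."""
    centers = rng.sample(data, k)
    labels = [0] * len(data)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(x, centers[c])) for x in data]
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def co_association(data, k, d_out, n_runs, seed=0):
    """Pairwise similarity: fraction of runs in which two points share a cluster."""
    rng = random.Random(seed)
    n = len(data)
    sim = [[0.0] * n for _ in range(n)]
    for _ in range(n_runs):
        labels = kmeans(random_projection(data, d_out, rng), k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    sim[i][j] += 1.0 / n_runs
    return sim


def consensus(sim, threshold=0.5):
    """Assumed combiner: connected components of the graph sim > threshold."""
    n = len(sim)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def make_blobs(n_per_blob, dim, rng):
    """Hypothetical toy data: two well-separated Gaussian blobs."""
    blob_a = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_per_blob)]
    blob_b = [[rng.gauss(5.0, 0.1) for _ in range(dim)] for _ in range(n_per_blob)]
    return blob_a + blob_b


if __name__ == "__main__":
    data = make_blobs(10, 50, random.Random(1))
    sim = co_association(data, k=2, d_out=5, n_runs=10, seed=0)
    print(consensus(sim))
```

Each run draws a fresh projection, so the base clusterings differ from one another while each still works on approximately distance-preserving data. The threshold combiner is only one simple way to turn the similarity matrix into a partition; other choices, such as applying a hierarchical clustering to the matrix, fit the same outline.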


Keywords: High Dimensional Data; Random Projection; Multiple Clusterings; Cluster Ensemble; Random Subspace





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Alberto Bertoni (1)
  • Giorgio Valentini (1)
  1. DSI, Dipartimento di Scienze dell’Informazione, Università degli Studi di Milano, Milano, Italia
