Ensemble Clustering of High Dimensional Data with FastMap Projection

  • Imran Khan
  • Joshua Zhexue Huang
  • Nguyen Thanh Tung
  • Graham Williams
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8643)

Abstract

In this paper, we propose an ensemble clustering method for high-dimensional data that uses FastMap projection to generate subspace component data sets. In comparison with the popular random sampling and random projection approaches, FastMap projection preserves the clustering structure of the original data in the component data sets, so the performance of ensemble clustering improves significantly. We present two methods for measuring how well the generated component data sets preserve the clustering structure. The comparison results show that FastMap preserves the clustering structure better than random sampling and random projection. Experiments on three real data sets were conducted with three data generation methods and three consensus functions. The results show that ensemble clustering with FastMap projection outperforms ensemble clustering with random sampling and random projection.
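
The component-generation step builds on the FastMap projection of Faloutsos and Lin (1995): each derived coordinate is obtained by projecting every object onto the line through two distant pivot objects, and residual distances are reused for the next coordinate. The sketch below is a minimal illustration of that generic FastMap procedure, assuming a Euclidean base distance and the usual farthest-point pivot heuristic; the function name `fastmap` and its interface are hypothetical and not taken from the paper.

```python
import numpy as np

def fastmap(X, k, seed=None):
    """Project the rows of X into k dimensions with FastMap (Faloutsos & Lin, 1995).

    Minimal sketch: Euclidean base distance, farthest-point pivot heuristic.
    Returns an (n, k) array of projected coordinates.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    coords = np.zeros((n, k))

    def dist2(i, j, col):
        # Squared distance in the residual space after removing the
        # contribution of the first `col` FastMap coordinates.
        d2 = np.sum((X[i] - X[j]) ** 2)
        d2 -= np.sum((coords[i, :col] - coords[j, :col]) ** 2)
        return max(d2, 0.0)

    for col in range(k):
        # Pivot heuristic: start from a random object, take the farthest
        # object as pivot a, then the object farthest from a as pivot b.
        o = rng.integers(n)
        a = max(range(n), key=lambda i: dist2(o, i, col))
        b = max(range(n), key=lambda i: dist2(a, i, col))
        dab2 = dist2(a, b, col)
        if dab2 == 0.0:  # all remaining residual distances are zero
            break
        dab = np.sqrt(dab2)
        for i in range(n):
            # Cosine-law projection onto the line through the two pivots.
            coords[i, col] = (dist2(a, i, col) + dab2 - dist2(b, i, col)) / (2.0 * dab)
    return coords
```

In an ensemble setting of the kind described above, one would typically run such a projection several times with different random pivots or target dimensionalities, cluster each component data set (for example with k-means), and combine the resulting partitions with a consensus function; the specific generation and consensus schemes evaluated by the authors are described in the paper itself.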

Keywords

Ensemble clustering · FastMap · Random sampling · Random projection · Consensus function


Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Imran Khan (1)
  • Joshua Zhexue Huang (1, 2)
  • Nguyen Thanh Tung (1)
  • Graham Williams (1)

  1. Shenzhen Key Laboratory of High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
  2. College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
