Trends and Applications in Knowledge Discovery and Data Mining

Volume 8643 of the series Lecture Notes in Computer Science pp 483-493

Ensemble Clustering of High Dimensional Data with FastMap Projection

  • Imran Khan (Shenzhen Key Laboratory of High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences)
  • Joshua Zhexue Huang (Shenzhen Key Laboratory of High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; College of Computer Science and Software Engineering, Shenzhen University)
  • Nguyen Thanh Tung (Shenzhen Key Laboratory of High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences)
  • Graham Williams (Shenzhen Key Laboratory of High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences)

Abstract

In this paper, we propose an ensemble clustering method for high dimensional data that uses FastMap projection to generate subspace component data sets. Compared with the popular random sampling and random projection methods, FastMap projection preserves the clustering structure of the original data in the component data sets, which significantly improves the performance of ensemble clustering. We present two methods for measuring how well the generated component data sets preserve the clustering structure. The comparison shows that FastMap preserves the clustering structure better than random sampling and random projection. Experiments were conducted on three real data sets with three data generation methods and three consensus functions. The results show that ensemble clustering with FastMap projection outperforms ensemble clustering with random sampling and with random projection.
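To make the projection step concrete, the following is a minimal Python sketch of the classic FastMap procedure, assuming Euclidean distances and NumPy. The abstract does not specify the authors' implementation, so the function name `fastmap`, the `seed` parameter, and the pivot heuristic (random start, then two farthest-point jumps) are illustrative assumptions rather than the paper's method.

```python
import numpy as np


def fastmap(X, k, seed=None):
    """Project the rows of X onto k FastMap axes (Euclidean distances assumed).

    Illustrative sketch only; names and the pivot heuristic are assumptions.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Y = np.zeros((n, k))

    def residual_sq_dist(i, col):
        # Squared distance from object i to every object, minus the part
        # already explained by the first `col` extracted coordinates.
        d2 = ((X - X[i]) ** 2).sum(axis=1)
        d2 -= ((Y[:, :col] - Y[i, :col]) ** 2).sum(axis=1)
        return np.maximum(d2, 0.0)  # clip tiny negatives from rounding

    for col in range(k):
        # Pivot heuristic: random start, jump to the farthest object (b),
        # then to the object farthest from b (a).
        start = int(rng.integers(n))
        b = int(np.argmax(residual_sq_dist(start, col)))
        d_b = residual_sq_dist(b, col)
        a = int(np.argmax(d_b))
        d_a = residual_sq_dist(a, col)
        d_ab = d_a[b]  # squared residual distance between the two pivots
        if d_ab <= 1e-12:
            break  # no residual spread left; remaining axes stay zero
        # Cosine-law projection of every object onto the line through a and b.
        Y[:, col] = (d_a + d_ab - d_b) / (2.0 * np.sqrt(d_ab))
    return Y


if __name__ == "__main__":
    # Hypothetical usage: varying the seed changes the pivots, which gives
    # several different low dimensional component data sets for an ensemble.
    data = np.random.default_rng(0).normal(size=(100, 50))
    components = [fastmap(data, 5, seed=s) for s in range(3)]
    print([c.shape for c in components])
```

Each component data set produced this way could then be clustered separately and the resulting partitions combined by a consensus function. Under the Euclidean assumption each axis is an orthogonal projection, so distances in a component never exceed the original distances.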

Keywords

Ensemble clustering; FastMap; Random sampling; Random projection; Consensus function