Evaluation of Subspace Clustering Quality

  • Urszula Markowska-Kaczmar
  • Arletta Hurej
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5271)

Abstract

Subspace clustering methods seek clusters in different subspaces of a data set instead of searching for them in the full feature space. This raises the problem of how to evaluate the quality of the clustering results. In this paper we present our method for estimating subspace clustering quality, which is based on an adaptation of the Davies-Bouldin Index to subspace clustering. The assumptions underlying the metric are presented first. Then the proposed metric is formally described. Next, it is verified experimentally using our clustering method IBUSCA. The experiments have shown that its value reflects the quality of a subspace clustering; thus it can serve as an alternative when no expert evaluation is available.
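
For orientation, the Davies-Bouldin Index that the proposed metric adapts is, in its common form, the average over clusters i of the worst-case ratio (S_i + S_j) / d(c_i, c_j). Here S_i is the mean distance of cluster i's points to its centroid c_i, and lower values indicate compact, well-separated clusters. The plain-Python sketch below computes that classic index. The optional `subspaces` argument is a hypothetical illustration of why a subspace adaptation is needed: it compares each pair of clusters only on the union of their relevant dimensions. That rule is an assumption made for this example and is not the formula defined in the paper.

```python
import math
from typing import List, Optional, Sequence, Set

Point = Sequence[float]


def _dist(a: Point, b: Point, dims: Sequence[int]) -> float:
    """Euclidean distance between a and b, restricted to the given dimensions."""
    return math.sqrt(sum((a[d] - b[d]) ** 2 for d in dims))


def _centroid(points: List[Point], n_dims: int) -> List[float]:
    """Per-dimension mean of the points."""
    return [sum(p[d] for p in points) / len(points) for d in range(n_dims)]


def davies_bouldin(
    clusters: List[List[Point]],
    subspaces: Optional[List[Set[int]]] = None,
) -> float:
    """Davies-Bouldin index (lower is better).

    subspaces=None gives the classic full-space index.
    HYPOTHETICAL subspace variant (illustrative assumption, not the paper's
    metric): subspaces[i] lists the relevant dimensions of cluster i, and each
    pair (i, j) is compared only on subspaces[i] | subspaces[j].
    """
    k = len(clusters)
    if k < 2:
        raise ValueError("need at least two clusters")
    n_dims = len(clusters[0][0])
    if subspaces is None:
        subspaces = [set(range(n_dims))] * k
    centroids = [_centroid(c, n_dims) for c in clusters]

    def scatter(i: int, dims: Sequence[int]) -> float:
        # Mean distance of cluster i's points to its centroid in `dims`.
        return sum(_dist(p, centroids[i], dims) for p in clusters[i]) / len(clusters[i])

    total = 0.0
    for i in range(k):
        worst = 0.0  # largest similarity ratio of cluster i to any other cluster
        for j in range(k):
            if i == j:
                continue
            dims = sorted(subspaces[i] | subspaces[j])
            sep = _dist(centroids[i], centroids[j], dims)
            ratio = math.inf if sep == 0 else (scatter(i, dims) + scatter(j, dims)) / sep
            worst = max(worst, ratio)
        total += worst
    return total / k


if __name__ == "__main__":
    # Two clusters that are tight in dimension 0 but spread along dimension 1.
    c = [(0.0, 0.0), (0.0, 100.0)]
    d = [(10.0, 0.0), (10.0, 100.0)]
    print(davies_bouldin([c, d]))                         # 10.0
    print(davies_bouldin([c, d], subspaces=[{0}, {0}]))  # 0.0
```

In the usage example, the full-space index is dominated by the spread along dimension 1, which these clusters do not depend on, while the hypothetical subspace variant ignores it. This is the kind of distortion that motivates adapting the index to subspace clusterings.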

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Urszula Markowska-Kaczmar (1)
  • Arletta Hurej (1)

  (1) Institute of Applied Informatics, Wroclaw University of Technology, Wroclaw, Poland