An examination of procedures for determining the number of clusters in a data set

  • André Hardy
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)

Summary

A problem common to all clustering techniques is the difficulty of deciding how many clusters are present in the data. The aim of this paper is to compare three methods based on the hypervolume criterion with four other well-known methods. The procedures for determining the number of clusters are evaluated on artificial data sets. To provide a variety of solutions, the data sets are analysed by six clustering methods. Finally, we assess the performance of each method and give some guidance for choosing between them.
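In the hypervolume approach, as it is usually described, the points are modelled as a homogeneous Poisson process observed on a union of disjoint convex sets. Under that model, a partition into k clusters is scored by the sum of the hypervolumes (Lebesgue measures) of the convex hulls of its clusters, and smaller is better. The Python sketch below computes that score in two dimensions, where hypervolume reduces to area. It is an illustration under that assumption, not the author's implementation. The function names are hypothetical, and the sketch does not implement any of the three procedures for choosing the number of clusters that the paper evaluates.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Point = Tuple[float, float]


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        # z-component of (a - o) x (b - o); > 0 means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices: Sequence[Point]) -> float:
    """Shoelace formula; degenerate hulls (< 3 vertices) have zero area."""
    n = len(vertices)
    if n < 3:
        return 0.0
    s = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def hypervolume_criterion(points: Sequence[Point],
                          labels: Sequence[Hashable]) -> float:
    """Sum of convex-hull areas over the clusters defined by `labels`."""
    clusters: Dict[Hashable, List[Point]] = {}
    for p, k in zip(points, labels):
        clusters.setdefault(k, []).append(p)
    return sum(polygon_area(convex_hull(c)) for c in clusters.values())


pts = [(0, 0), (1, 0), (1, 1), (0, 1), (5, 5), (6, 5), (6, 6), (5, 6)]
print(hypervolume_criterion(pts, [0] * 4 + [1] * 4))  # two unit squares: 2.0
print(hypervolume_criterion(pts, [0] * 8))            # one merged hull: 11.0
```

In this toy example, splitting the points into two well-separated groups gives a much smaller total hull area than treating them as one cluster. For a fixed k, the criterion is minimised over partitions. The raw value keeps shrinking as k grows, which is why a separate rule is needed to decide the number of clusters.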

Keywords

Hull 



Copyright information

© Springer-Verlag Berlin Heidelberg 1994

Authors and Affiliations

  • André Hardy (1, 2)
  1. Unité de Statistique, Département de Mathématique, Facultés Universitaires N.-D. de la Paix, Namur, Belgium
  2. Facultés Universitaires Saint-Louis, Bruxelles, Belgium
