Abstract
A problem common to all clustering techniques is the difficulty of deciding the number of clusters present in the data. The aim of this paper is to assess the performance of the best stopping rules from Milligan and Cooper's (1985) study on specific artificial data sets containing a particular cluster structure. To provide a variety of solutions, the data sets are analysed by four clustering procedures. We also compare these results with those obtained by three methods based on the hypervolume clustering criterion.
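To make the idea of a stopping rule concrete, the following is a minimal sketch, not the authors' experimental setup. It assumes Ward's agglomerative method as the clustering procedure and the Calinski and Harabasz (1974) index as the stopping rule; the abstract does not say which procedures or rules were used, so both are illustrative choices. The toy data set and all function names are hypothetical.

```python
from itertools import combinations


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward_partitions(points, k_min, k_max):
    """Agglomerate with Ward's criterion; return {k: clusters} for k_min..k_max."""
    clusters = [[p] for p in points]
    found = {}

    def cost(pair):
        # Increase in within-cluster sum of squares if the pair is merged.
        a, b = clusters[pair[0]], clusters[pair[1]]
        weight = len(a) * len(b) / (len(a) + len(b))
        return weight * sq_dist(centroid(a), centroid(b))

    while len(clusters) >= k_min:
        if len(clusters) <= k_max:
            found[len(clusters)] = [list(c) for c in clusters]
        if len(clusters) == 1:
            break
        i, j = min(combinations(range(len(clusters)), 2), key=cost)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return found


def calinski_harabasz(clusters):
    """Ratio of between- to within-cluster dispersion, each scaled by its d.o.f."""
    n = sum(len(c) for c in clusters)
    k = len(clusters)
    grand = centroid([p for c in clusters for p in c])
    between = sum(len(c) * sq_dist(centroid(c), grand) for c in clusters)
    within = sum(sq_dist(p, centroid(c)) for c in clusters for p in c)
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical toy data: three well-separated groups of five points each.
offsets = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1)]
centres = [(0, 0), (10, 0), (5, 9)]
data = [(cx + dx, cy + dy) for cx, cy in centres for dx, dy in offsets]

partitions = ward_partitions(data, k_min=2, k_max=6)
scores = {k: calinski_harabasz(c) for k, c in partitions.items()}
best_k = max(scores, key=scores.get)  # the rule selects the k with the largest index
print(best_k)
```

On this toy layout the index peaks at k = 3, the number of groups that were planted. Real data sets with overlapping or elongated clusters are much less clear-cut, which is the kind of situation the comparison described in the abstract is meant to probe.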
References
Beale, E. M. L. (1969): Euclidean cluster analysis. Bulletin of the International Statistical Institute, 43, 2, 92–94.
Calinski, T., and Harabasz, J. (1974): A dendrite method for cluster analysis. Communications in Statistics, 3, 1–27.
Duda, R.O., and Hart, P.E. (1973): Pattern Classification and Scene Analysis. Wiley, New York.
Goodman, L.A., and Kruskal, W.H. (1954): Measures of association for cross-classifications. Journal of the American Statistical Association, 49, 732–764.
Gordon, A.D. (1997): How many clusters? An investigation of five procedures for detecting nested cluster structure, in Proceedings of the IFCS-96 Conference, Kobe (in print).
Hardy, A., and Rasson, J.P. (1982): Une nouvelle approche des problèmes de classification automatique. Statistique et Analyse des données, 7, 41–56.
Hardy, A. (1983): Statistique et classification automatique: Un modèle - Un nouveau critère - Des algorithmes - Des applications. Ph.D Thesis, F.U.N.D.P., Namur, Belgium.
Hardy, A. (1994): An examination of procedures for determining the number of clusters in a data set, in New Approaches in Classification and Data Analysis, E. Diday et al. (Editors), Springer-Verlag, Paris, 178–185.
Milligan, G.W. and Cooper, M.C. (1985): An examination of procedures for determining the number of clusters in a data set. Psychometrika, 50, 159–179.
Ripley, B.D., and Rasson, J.P. (1977): Finding the edge of a Poisson Forest. Journal of Applied Probability, 14, 483–491.
Sarle, W.S. (1983): Cubic Clustering Criterion. Technical Report: A-108, SAS Institute Inc., Cary, NC, USA.
Wishart, D. (1978): CLUSTAN User Manual, 3rd edition, Program Library Unit, University of Edinburgh.
Copyright information
© 1998 Springer-Verlag Berlin · Heidelberg
About this paper
Cite this paper
Hardy, A., Andre, P. (1998). An investigation of nine procedures for detecting the structure in a data set. In: Rizzi, A., Vichi, M., Bock, HH. (eds) Advances in Data Science and Classification. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-72253-0_4
DOI: https://doi.org/10.1007/978-3-642-72253-0_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64641-9
Online ISBN: 978-3-642-72253-0
eBook Packages: Springer Book Archive