An investigation of nine procedures for detecting the structure in a data set

  • Conference paper
Advances in Data Science and Classification

Abstract

A problem common to all clustering techniques is the difficulty of deciding the number of clusters present in the data. The aim of this paper is to assess the performance of the best stopping rules from Milligan and Cooper’s (1985) study on specific artificial data sets containing a particular cluster structure. To provide a variety of solutions, the data sets are analysed by four clustering procedures. We also compare these results with those obtained by three methods based on the hypervolume clustering criterion.
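The abstract does not say which stopping rules are evaluated, so the following is only an illustrative sketch of how a stopping rule of this kind can work. It uses the Calinski and Harabasz (1974) index from the reference list: for a partition of n points into k clusters, the index is the between-cluster sum of squares divided by (k - 1), over the within-cluster sum of squares divided by (n - k), and the number of clusters with the largest index is retained. The function names (`calinski_harabasz`, `choose_k`) and the toy data are assumptions made for illustration, not the authors' implementation.

```python
from math import inf
from typing import Dict, List, Sequence

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def calinski_harabasz(points: List[Point], labels: List[int]) -> float:
    """Index [B / (k - 1)] / [W / (n - k)] for one partition."""
    clusters: Dict[int, List[Point]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    n, k = len(points), len(clusters)
    if k < 2 or k >= n:
        raise ValueError("the index is only defined for 2 <= k < n")
    overall = centroid(points)
    between = 0.0  # B: between-cluster sum of squares
    within = 0.0   # W: within-cluster sum of squares
    for members in clusters.values():
        c = centroid(members)
        between += len(members) * sq_dist(c, overall)
        within += sum(sq_dist(p, c) for p in members)
    if within == 0.0:
        return inf  # every point coincides with its cluster centroid
    return (between / (k - 1)) / (within / (n - k))


def choose_k(points: List[Point], partitions: Dict[int, List[int]]) -> int:
    """Return the candidate k whose partition maximises the index."""
    return max(partitions, key=lambda k: calinski_harabasz(points, partitions[k]))


if __name__ == "__main__":
    # Toy data (hypothetical): two well-separated groups of three points.
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    candidates = {
        2: [0, 0, 0, 1, 1, 1],  # the two natural groups
        3: [0, 0, 2, 1, 1, 1],  # one group split further
    }
    print(choose_k(data, candidates))
```

In practice the candidate partitions would come from cutting the hierarchy produced by a clustering procedure at successive levels; here they are written out by hand to keep the sketch self-contained.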

References

  • Beale, E.M.L. (1969): Euclidean cluster analysis. Bulletin of the International Statistical Institute, 43, 2, 92–94.

  • Calinski, T., and Harabasz, J. (1974): A dendrite method for cluster analysis. Communications in Statistics, 3, 1–27.

  • Duda, R.O., and Hart, P.E. (1973): Pattern Classification and Scene Analysis. Wiley, New York.

  • Goodman, L.A., and Kruskal, W.H. (1954): Measures of association for cross-classifications. Journal of the American Statistical Association, 49, 732–764.

  • Gordon, A.D. (1997): How many clusters? An investigation of five procedures for detecting nested cluster structure, in Proceedings of the IFCS-96 Conference, Kobe (in print).

  • Hardy, A., and Rasson, J.P. (1982): Une nouvelle approche des problèmes de classification automatique. Statistique et Analyse des données, 7, 41–56.

  • Hardy, A. (1983): Statistique et classification automatique: Un modèle - Un nouveau critère - Des algorithmes - Des applications. Ph.D. Thesis, F.U.N.D.P., Namur, Belgium.

  • Hardy, A. (1994): An examination of procedures for determining the number of clusters in a data set, in New Approaches in Classification and Data Analysis, E. Diday et al. (Editors), Springer-Verlag, Paris, 178–185.

  • Milligan, G.W., and Cooper, M.C. (1985): An examination of procedures for determining the number of clusters in a data set. Psychometrika, 50, 159–179.

  • Ripley, B.D., and Rasson, J.P. (1977): Finding the edge of a Poisson Forest. Journal of Applied Probability, 14, 483–491.

  • Sarle, W.S. (1983): Cubic Clustering Criterion. Technical Report: A-108, SAS Institute Inc., Cary, NC, USA.

  • Wishart, D. (1978): CLUSTAN User Manual, 3rd edition, Program Library Unit, University of Edinburgh.

Copyright information

© 1998 Springer-Verlag Berlin · Heidelberg

About this paper

Cite this paper

Hardy, A., Andre, P. (1998). An investigation of nine procedures for detecting the structure in a data set. In: Rizzi, A., Vichi, M., Bock, HH. (eds) Advances in Data Science and Classification. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-72253-0_4

  • DOI: https://doi.org/10.1007/978-3-642-72253-0_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-64641-9

  • Online ISBN: 978-3-642-72253-0

  • eBook Packages: Springer Book Archive
