Clustering Methods: A History of k-Means Algorithms

Abstract

This paper surveys historical issues related to the well-known k-means algorithm in cluster analysis. It traces the different versions of this algorithm back to their original authors and identifies the applications that motivated them. We sketch various generalizations (with references also to Diday’s work) and thereby underline the usefulness of the k-means approach in data analysis.

References

  • ANDERBERG, M.R. (1973): Cluster analysis for applications. Academic Press, New York.

  • BIJNEN, E.J. (1973): Cluster analysis. Tilburg University Press, Tilburg, Netherlands.

  • BOCK, H.-H. (1969): The equivalence of two extremal problems and its application to the iterative classification of multivariate data. Paper presented at the Workshop ‘Medizinische Statistik’, February 1969, Forschungsinstitut Oberwolfach.

  • BOCK, H.-H. (1974): Automatische Klassifikation. Theoretische und praktische Methoden zur Strukturierung von Daten (Clusteranalyse). Vandenhoeck & Ruprecht, Göttingen.

  • BOCK, H.-H. (1985): On some significance tests in cluster analysis. Journal of Classification 2, 77–108.

  • BOCK, H.-H. (1983): A clustering algorithm for choosing optimal classes for the chi-square test. Bull. 44th Session of the International Statistical Institute, Madrid, Contributed Papers, Vol. 2, 758–762.

  • BOCK, H.-H. (1986): Loglinear models and entropy clustering methods for qualitative data. In: W. Gaul, M. Schader (Eds.): Classification as a tool of research. North Holland, Amsterdam, 19–26.

  • BOCK, H.-H. (1987): On the interface between cluster analysis, principal component analysis, and multidimensional scaling. In: H. Bozdogan, A.K. Gupta (Eds.): Multivariate statistical modeling and data analysis. Reidel, Dordrecht, 17–34.

  • BOCK, H.-H. (1992): A clustering technique for maximizing φ-divergence, noncentrality and discriminating power. In: M. Schader (Ed.): Analyzing and modeling data and knowledge. Springer, Heidelberg, 19–36.

  • BOCK, H.-H. (1996a): Probability models and hypotheses testing in partitioning cluster analysis. In: P. Arabie, L.J. Hubert, G. De Soete (Eds.): Clustering and classification. World Scientific, Singapore, 377–453.

  • BOCK, H.-H. (1996b): Probabilistic models in partitional cluster analysis. Computational Statistics and Data Analysis 23, 5–28.

  • BOCK, H.-H. (1996c): Probabilistic models in cluster analysis. In: A. Ferligoj, A. Kramberger (Eds.): Developments in data analysis. Proc. Intern. Conf. on ‘Statistical data collection and analysis’, Bled, 1994. FDV, Metodoloski zvezki, 12, Ljubljana, Slovenia, 3–25.

  • BOCK, H.-H. (2003): Convexity-based clustering criteria: theory, algorithms, and applications in statistics. Statistical Methods & Applications 12, 293–317.

  • BRYANT, P. (1988): On characterizing optimization-based clustering methods. Journal of Classification 5, 81–84.

  • CHARLES, C. (1977): Régression typologique. Rapport de Recherche no. 257. IRIA-LABORIA, Le Chesnay.

  • COX, D.R. (1957): Note on grouping. J. Amer. Statist. Assoc. 52, 543–547.

  • DALENIUS, T. (1950): The problem of optimum stratification I. Skandinavisk Aktuarietidskrift 1950, 203–213.

  • DALENIUS, T., GURNEY, M. (1951): The problem of optimum stratification. II. Skandinavisk Aktuarietidskrift 1951, 133–148.

  • DIDAY, E. (1971): Une nouvelle méthode de classification automatique et reconnaissance des formes: la méthode des nuées dynamiques. Revue de Statistique Appliquée XIX(2), 1970, 19–33.

  • DIDAY, E. (1972): Optimisation en classification automatique et reconnaissance des formes. Revue Française d’Automatique, Informatique et Recherche Opérationnelle (R.A.I.R.O.) VI, 61–96.

  • DIDAY, E. (1973): The dynamic clusters method in nonhierarchical clustering. Intern. Journal of Computer and Information Sciences 2(1), 61–88.

  • DIDAY, E. et al. (1979): Optimisation en classification automatique. Vol. I, II. Institut National de Recherche en Informatique et en Automatique (INRIA), Le Chesnay, France.

  • DIDAY, E., GOVAERT, G. (1974): Classification avec distance adaptative. Comptes Rendus Acad. Sci. Paris 278 A, 993–995.

  • DIDAY, E., GOVAERT, G. (1977): Classification automatique avec distances adaptatives. R.A.I.R.O. Information/Computer Science 11(4), 329–349.

  • DIDAY, E., SCHROEDER, A. (1974a): The dynamic clusters method in pattern recognition. In: J.L. Rosenfeld (Ed.): Information Processing 74. Proc. IFIP Congress, Stockholm, August 1974. North Holland, Amsterdam, 691–697.

  • DIDAY, E., SCHROEDER, A. (1974b): A new approach in mixed distribution detection. Rapport de Recherche no. 52, Janvier 1974. INRIA, Le Chesnay.

  • DIDAY, E., SCHROEDER, A. (1976): A new approach in mixed distribution detection. R.A.I.R.O. Recherche Opérationnelle 10(6), 75–106.

  • FISHER, W.D. (1958): On grouping for maximum heterogeneity. J. Amer. Statist. Assoc. 53, 789–798.

  • FORGY, E.W. (1965): Cluster analysis of multivariate data: efficiency versus interpretability of classifications. Biometric Society Meeting, Riverside, California, 1965. Abstract in Biometrics 21 (1965) 768.

  • GALLEGOS, M.T. (2002): Maximum likelihood clustering with outliers. In: K. Jajuga, A. Sokolowski, H.-H. Bock (Eds.): Classification, clustering, and data analysis. Springer, Heidelberg, 248–255.

  • GALLEGOS, M.T., RITTER, G. (2005): A robust method for cluster analysis. Annals of Statistics 33, 347–380.

  • GRÖTSCHEL, M., WAKABAYASHI, Y. (1989): A cutting plane algorithm for a clustering problem. Mathematical Programming 45, 59–96.

  • HANSEN, P., JAUMARD, B. (1997): Cluster analysis and mathematical programming. Mathematical Programming 79, 191–215.

  • HARTIGAN, J.A. (1975): Clustering algorithms. Wiley, New York.

  • HARTIGAN, J.A., WONG, M.A. (1979): A k-means clustering algorithm. Applied Statistics 28, 100–108.

  • JANCEY, R.C. (1966a): Multidimensional group analysis. Australian J. Botany 14, 127–130.

  • JANCEY, R.C. (1966b): The application of numerical methods of data analysis to the genus Phyllota Benth. in New South Wales. Australian J. Botany 14, 131–149.

  • JARDINE, N., SIBSON, R. (1971): Mathematical taxonomy. Wiley, New York.

  • JENSEN, R.E. (1969): A dynamic programming algorithm for cluster analysis. Operations Research 17, 1034–1057.

  • KAUFMAN, L., ROUSSEEUW, P.J. (1987): Clustering by means of medoids. In: Y. Dodge (Ed.): Statistical data analysis based on the L1-norm and related methods. North Holland, Amsterdam, 405–416.

  • KAUFMAN, L., ROUSSEEUW, P.J. (1990): Finding groups in data. Wiley, New York.

  • LERMAN, I.C. (1970): Les bases de la classification automatique. Gauthier-Villars, Paris.

  • LLOYD, S.P. (1957): Least squares quantization in PCM. Bell Telephone Labs Memorandum, Murray Hill, NJ. Reprinted in: IEEE Trans. Information Theory IT-28 (1982), vol. 2, 129–137.

  • MACQUEEN, J. (1967): Some methods for classification and analysis of multivariate observations. In: L.M. LeCam, J. Neyman (Eds.): Proc. 5th Berkeley Symp. Math. Statist. Probab. 1965/66. Univ. of California Press, Berkeley, vol. I, 281–297.

  • MARANZANA, F.E. (1963): On the location of supply points to minimize transportation costs. IBM Systems Journal 2, 129–135.

  • MULVEY, J.M., CROWDER, H.P. (1979): Cluster analysis: an application of Lagrangian relaxation. Management Science 25, 329–340.

  • PÖTZELBERGER, K., STRASSER, H. (2001): Clustering and quantization by MSP partitions. Statistics and Decision 19, 331–371.

  • POLLARD, D. (1982): A central limit theorem for k-means clustering. Annals of Probability 10, 919–926.

  • RAO, M.R. (1971): Cluster analysis and mathematical programming. J. Amer. Statist. Assoc. 66, 622–626.

  • SCHNEEBERGER, H. (1967): Optimale Schichtung bei proportionaler Aufteilung mit Hilfe eines iterativen Analogrechners. Unternehmensforschung 11, 21–32.

  • SCLOVE, S.L. (1977): Population mixture models and clustering algorithms. Commun. in Statistics, Theory and Methods, A6, 417–434.

  • SODEUR, W. (1974): Empirische Verfahren zur Klassifikation. Teubner, Stuttgart.

  • SOKAL, R.R., SNEATH, P.H. (1963): Principles of numerical taxonomy. Freeman, San Francisco-London.

  • SPÄTH, H. (1975): Cluster-Analyse-Algorithmen zur Objektklassifizierung und Datenreduktion. Oldenbourg Verlag, München-Wien.

  • SPÄTH, H. (1979): Algorithm 39: Clusterwise linear regression. Computing 22, 367–373. Correction in Computing 26 (1981), 275.

  • SPÄTH, H. (1985): Cluster dissection and analysis. Wiley, Chichester.

  • STANGE, K. (1960): Die zeichnerische Ermittlung der besten Schätzung bei proportionaler Aufteilung der Stichprobe. Zeitschrift für Unternehmensforschung 4, 156–163.

  • STEINHAUS, H. (1956): Sur la division des corps matériels en parties. Bulletin de l’Académie Polonaise des Sciences, Classe III, vol. IV, no. 12, 801–804.

  • STRECKER, H. (1957): Moderne Methoden in der Agrarstatistik. Physica, Würzburg, p. 80 etc.

  • VICHI, M. (2005): Clustering including dimensionality reduction. In: D. Baier, R. Decker, L. Schmidt-Thieme (Eds.): Data analysis and decision support. Springer, Heidelberg, 149–156.

  • VINOD, H.D. (1969): Integer programming and the theory of grouping. J. Amer. Statist. Assoc. 64, 506–519.

  • VOGEL, F. (1975): Probleme und Verfahren der Numerischen Klassifikation. Vandenhoeck & Ruprecht, Göttingen.

  • WINDHAM, M.P. (1986): A unification of optimization-based clustering algorithms. In: W. Gaul, M. Schader (Eds.): Classification as a tool of research. North Holland, Amsterdam, 447–451.

  • WINDHAM, M.P. (1987): Parameter modification for clustering criteria. Journal of Classification 4, 191–214.

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Bock, HH. (2007). Clustering Methods: A History of k-Means Algorithms. In: Brito, P., Cucumel, G., Bertrand, P., de Carvalho, F. (eds) Selected Contributions in Data Analysis and Classification. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73560-1_15
