Clustering Methods: A History of k-Means Algorithms

Bock, Hans-Hermann

doi:10.1007/978-3-540-73560-1_15

Chapter

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

This paper surveys some historical issues related to the well-known k-means algorithm in cluster analysis. It shows to which authors the different versions of this algorithm can be traced back, and which were the underlying applications. We sketch various generalizations (with references also to Diday’s work) and thereby underline the usefulness of the k-means approach in data analysis.

References

ANDERBERG, M.R. (1973): Cluster analysis for applications. Academic Press, New York.
BIJNEN, E.J. (1973): Cluster analysis. Tilburg University Press, Tilburg, Netherlands.
BOCK, H.-H. (1969): The equivalence of two extremal problems and its application to the iterative classification of multivariate data. Paper presented at the Workshop ‘Medizinische Statistik’, February 1969, Forschungsinstitut Oberwolfach.
BOCK, H.-H. (1974): Automatische Klassifikation. Theoretische und praktische Methoden zur Strukturierung von Daten (Clusteranalyse). Vandenhoeck & Ruprecht, Göttingen.
BOCK, H.-H. (1985): On some significance tests in cluster analysis. Journal of Classification 2, 77–108.
BOCK, H.-H. (1983): A clustering algorithm for choosing optimal classes for the chi-square test. Bull. 44th Session of the International Statistical Institute, Madrid, Contributed Papers, Vol 2, 758–762.
BOCK, H.-H. (1986): Loglinear models and entropy clustering methods for qualitative data. In: W. Gaul, M. Schader (Eds.): Classification as a tool of research. North Holland, Amsterdam, 19–26.
BOCK, H.-H. (1987): On the interface between cluster analysis, principal component analysis, and multidimensional scaling. In: H. Bozdogan, A.K. Gupta (Eds.): Multivariate statistical modeling and data analysis. Reidel, Dordrecht, 17–34.
BOCK, H.-H. (1992): A clustering technique for maximizing φ-divergence, noncentrality and discriminating power. In: M. Schader (Ed.): Analyzing and modeling data and knowledge. Springer, Heidelberg, 19–36.
BOCK, H.-H. (1996a): Probability models and hypotheses testing in partitioning cluster analysis. In: P. Arabie, L.J. Hubert, G. De Soete (Eds.): Clustering and classification. World Scientific, Singapore, 377–453.
BOCK, H.-H. (1996b): Probabilistic models in partitional cluster analysis. Computational Statistics and Data Analysis 23, 5–28.
BOCK, H.-H. (1996c): Probabilistic models in cluster analysis. In: A. Ferligoj, A. Kramberger (Eds.): Developments in data analysis. Proc. Intern. Conf. on ‘Statistical data collection and analysis’, Bled, 1994. FDV, Metodoloski zvezki, 12, Ljubljana, Slovenia, 3–25.
BOCK, H.-H. (2003): Convexity-based clustering criteria: theory, algorithms, and applications in statistics. Statistical Methods & Applications 12, 293–317.
BRYANT, P. (1988): On characterizing optimization-based clustering methods. Journal of Classification 5, 81–84.
CHARLES, C. (1977): Regression typologique. Rapport de Recherche no. 257. IRIA-LABORIA, Le Chesnay.
COX, D.R. (1957): Note on grouping. J. Amer. Statist. Assoc. 52, 543–547.
DALENIUS, T. (1950): The problem of optimum stratification I. Skandinavisk Aktuarietidskrift 1950, 203–213.
DALENIUS, T., GURNEY, M. (1951): The problem of optimum stratification. II. Skandinavisk Aktuarietidskrift 1951, 133–148.
DIDAY, E. (1971): Une nouvelle méthode de classification automatique et reconnaissance des formes: la méthode des nuées dynamiques. Revue de Statistique Appliquée XIX(2), 1970, 19–33.
DIDAY, E. (1972): Optimisation en classification automatique et reconnaissance des formes. Revue Française d’Automatique, Informatique et Recherche Opérationelle (R.A.I.R.O.) VI, 61–96.
DIDAY, E. (1973): The dynamic clusters method in nonhierarchical clustering. Intern. Journal of Computer and Information Sciences 2(1), 61–88.
DIDAY, E. et al. (1979): Optimisation en classification automatique. Vol. I, II. Institut National de Recherche en Informatique et en Automatique (INRIA), Le Chesnay, France.
DIDAY, E., GOVAERT, G. (1974): Classification avec distance adaptative. Comptes Rendus Acad. Sci. Paris 278 A, 993–995.
DIDAY, E., GOVAERT, G. (1977): Classification automatique avec distances adaptatives. R.A.I.R.O. Information/Computer Science 11(4), 329–349.
DIDAY, E., SCHROEDER, A. (1974a): The dynamic clusters method in pattern recognition. In: J.L. Rosenfeld (Ed.): Information Processing 74. Proc. IFIP Congress, Stockholm, August 1974. North Holland, Amsterdam, 691–697.
DIDAY, E., SCHROEDER, A. (1974b): A new approach in mixed distribution detection. Rapport de Recherche no. 52, Janvier 1974. INRIA, Le Chesnay.
DIDAY, E., SCHROEDER, A. (1976): A new approach in mixed distribution detection. R.A.I.R.O. Recherche Opérationelle 10(6), 75–106.
FISHER, W.D. (1958): On grouping for maximum heterogeneity. J. Amer. Statist. Assoc. 53, 789–798.
FORGY, E.W. (1965): Cluster analysis of multivariate data: efficiency versus interpretability of classifications. Biometric Society Meeting, Riverside, California, 1965. Abstract in Biometrics 21 (1965) 768.
GALLEGOS, M.T. (2002): Maximum likelihood clustering with outliers. In: K. Jajuga, A. Sokolowski, H.-H. Bock (Eds.): Classification, clustering, and data analysis. Springer, Heidelberg, 248–255.
GALLEGOS, M.T., RITTER, G. (2005): A robust method for cluster analysis. Annals of Statistics 33, 347–380.
GRÖTSCHEL, M., WAKABAYASHI, Y. (1989): A cutting plane algorithm for a clustering problem. Mathematical Programming 45, 59–96.
HANSEN, P., JAUMARD, B. (1997): Cluster analysis and mathematical programming. Mathematical Programming 79, 191–215.
HARTIGAN, J.A. (1975): Clustering algorithms. Wiley, New York.
HARTIGAN, J.A., WONG, M.A. (1979): A k-means clustering algorithm. Applied Statistics 28, 100–108.
JANCEY, R.C. (1966a): Multidimensional group analysis. Australian J. Botany 14, 127–130.
JANCEY, R.C. (1966b): The application of numerical methods of data analysis to the genus Phyllota Benth. in New South Wales. Australian J. Botany 14, 131–149.
JARDINE, N., SIBSON, R. (1971): Mathematical taxonomy. Wiley, New York.
JENSEN, R.E. (1969): A dynamic programming algorithm for cluster analysis. Operations Research 17, 1034–1057.
KAUFMAN, L., ROUSSEEUW, P.J. (1987): Clustering by means of medoids. In: Y. Dodge (Ed.): Statistical data analysis based on the L₁-norm and related methods. North Holland, Amsterdam, 405–416.
KAUFMAN, L., ROUSSEEUW, P.J. (1990): Finding groups in data. Wiley, New York.
LERMAN, I.C. (1970): Les bases de la classification automatique. Gauthier-Villars, Paris.
LLOYD, S.P. (1957): Least squares quantization in PCM. Bell Telephone Labs Memorandum, Murray Hill, NJ. Reprinted in: IEEE Trans. Information Theory IT-28 (1982), vol. 2, 129–137.
MACQUEEN, J. (1967): Some methods for classification and analysis of multivariate observations. In: L.M. LeCam, J. Neyman (Eds.): Proc. 5th Berkeley Symp. Math. Statist. Probab. 1965/66. Univ. of California Press, Berkeley, vol. I, 281–297.
MARANZANA, F.E. (1963): On the location of supply points to minimize transportation costs. IBM Systems Journal 2, 129–135.
MULVEY, J.M., CROWDER, H.P. (1979): Cluster analysis: an application of Lagrangian relaxation. Management Science 25, 329–340.
PÖTZELBERGER, K., STRASSER, H. (2001): Clustering and quantization by MSP partitions. Statistics and Decision 19, 331–371.
POLLARD, D. (1982): A central limit theorem for k-means clustering. Annals of Probability 10, 919–926.
RAO, M.R. (1971): Cluster analysis and mathematical programming. J. Amer. Statist. Assoc. 66, 622–626.
SCHNEEBERGER, H. (1967): Optimale Schichtung bei proportionaler Aufteilung mit Hilfe eines iterativen Analogrechners. Unternehmensforschung 11, 21–32.
SCLOVE, S.L. (1977): Population mixture models and clustering algorithms. Commun. in Statistics, Theory and Methods, A6, 417–434.
SODEUR, W. (1974): Empirische Verfahren zur Klassifikation. Teubner, Stuttgart.
SOKAL, R.R., SNEATH, P.H. (1963): Principles of numerical taxonomy. Freeman, San Francisco-London.
SPÄTH, H. (1975): Cluster-Analyse-Algorithmen zur Objektklassifizierung und Datenreduktion. Oldenbourg Verlag, München-Wien.
SPÄTH, H. (1979): Algorithm 39: Clusterwise linear regression. Computing 22, 367–373. Correction in Computing 26 (1981), 275.
SPÄTH, H. (1985): Cluster dissection and analysis. Wiley, Chichester.
STANGE, K. (1960): Die zeichnerische Ermittlung der besten Schätzung bei proportionaler Aufteilung der Stichprobe. Zeitschrift für Unternehmensforschung 4, 156–163.
STEINHAUS, H. (1956): Sur la division des corps matériels en parties. Bulletin de l’Académie Polonaise des Sciences, Classe III, vol. IV, no. 12, 801–804.
STRECKER, H. (1957): Moderne Methoden in der Agrarstatistik. Physica, Würzburg, p. 80 etc.
VICHI, M. (2005): Clustering including dimensionality reduction. In: D. Baier, R. Decker, L. Schmidt-Thieme (Eds.): Data analysis and decision support. Springer, Heidelberg, 149–156.
VINOD, H.D. (1969): Integer programming and the theory of grouping. J. Amer. Statist. Assoc. 64, 506–519.
VOGEL, F. (1975): Probleme und Verfahren der Numerischen Klassifikation. Vandenhoeck & Ruprecht, Göttingen.
WINDHAM, M.P. (1986): A unification of optimization-based clustering algorithms. In: W. Gaul, M. Schader (Eds.): Classification as a tool of research. North Holland, Amsterdam, 447–451.
WINDHAM, M.P. (1987): Parameter modification for clustering criteria. Journal of Classification 4, 191–214.

Author information

Authors and Affiliations

Institute of Statistics, RWTH Aachen University, D-52056, Aachen, Germany
Hans-Hermann Bock

Editor information

Editors and Affiliations

Faculty of Economics, University of Porto, Rua Dr. Roberto Frias, 4200-464, Porto, Portugal
Paula Brito
ESG UQAM, 315 East, Sainte-Catherine Street, Montreal, Quebec, H2X 3X2, Canada
Guy Cucumel
Department Lussi, ENST Bretagne, 2 rue de la Châtaigneraie, CS 17607, 35576, Cesson-Sévigné Cedex, France
Patrice Bertrand
Centre of Computer Science (CIn), Federal University of Pernambuco (UFPE), Av. Prof. Luiz Freire s/n Cidade Universitária, CEP 50740-540, Recife-PE, Brazil
Francisco de Carvalho

About this chapter

Cite this chapter

Bock, HH. (2007). Clustering Methods: A History of k-Means Algorithms. In: Brito, P., Cucumel, G., Bertrand, P., de Carvalho, F. (eds) Selected Contributions in Data Analysis and Classification. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73560-1_15

DOI: https://doi.org/10.1007/978-3-540-73560-1_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73558-8
Online ISBN: 978-3-540-73560-1
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
