Abstract
This paper surveys historical issues related to the well-known k-means algorithm in cluster analysis. It traces the different versions of this algorithm back to their original authors and identifies the applications that motivated them. We sketch various generalizations (with references also to Diday’s work) and thereby underline the usefulness of the k-means approach in data analysis.
References
ANDERBERG, M.R. (1973): Cluster analysis for applications. Academic Press, New York.
BIJNEN, E.J. (1973): Cluster analysis. Tilburg University Press, Tilburg, Netherlands.
BOCK, H.-H. (1969): The equivalence of two extremal problems and its application to the iterative classification of multivariate data. Paper presented at the Workshop ‘Medizinische Statistik’, February 1969, Forschungsinstitut Oberwolfach.
BOCK, H.-H. (1974): Automatische Klassifikation. Theoretische und praktische Methoden zur Strukturierung von Daten (Clusteranalyse). Vandenhoeck & Ruprecht, Göttingen.
BOCK, H.-H. (1985): On some significance tests in cluster analysis. Journal of Classification 2, 77–108.
BOCK, H.-H. (1983): A clustering algorithm for choosing optimal classes for the chi-square test. Bull. 44th Session of the International Statistical Institute, Madrid, Contributed Papers, Vol. 2, 758–762.
BOCK, H.-H. (1986): Loglinear models and entropy clustering methods for qualitative data. In: W. Gaul, M. Schader (Eds.): Classification as a tool of research. North Holland, Amsterdam, 19–26.
BOCK, H.-H. (1987): On the interface between cluster analysis, principal component analysis, and multidimensional scaling. In: H. Bozdogan, A.K. Gupta (Eds.): Multivariate statistical modeling and data analysis. Reidel, Dordrecht, 17–34.
BOCK, H.-H. (1992): A clustering technique for maximizing φ-divergence, noncentrality and discriminating power. In: M. Schader (Ed.): Analyzing and modeling data and knowledge. Springer, Heidelberg, 19–36.
BOCK, H.-H. (1996a): Probability models and hypotheses testing in partitioning cluster analysis. In: P. Arabie, L.J. Hubert, G. De Soete (Eds.): Clustering and classification. World Scientific, Singapore, 377–453.
BOCK, H.-H. (1996b): Probabilistic models in partitional cluster analysis. Computational Statistics and Data Analysis 23, 5–28.
BOCK, H.-H. (1996c): Probabilistic models in cluster analysis. In: A. Ferligoj, A. Kramberger (Eds.): Developments in data analysis. Proc. Intern. Conf. on ‘Statistical data collection and analysis’, Bled, 1994. FDV, Metodoloski zvezki, 12, Ljubljana, Slovenia, 3–25.
BOCK, H.-H. (2003): Convexity-based clustering criteria: theory, algorithms, and applications in statistics. Statistical Methods & Applications 12, 293–317.
BRYANT, P. (1988): On characterizing optimization-based clustering methods. Journal of Classification 5, 81–84.
CHARLES, C. (1977): Regression typologique. Rapport de Recherche no. 257. IRIA-LABORIA, Le Chesnay.
COX, D.R. (1957): Note on grouping. J. Amer. Statist. Assoc. 52, 543–547.
DALENIUS, T. (1950): The problem of optimum stratification I. Skandinavisk Aktuarietidskrift 1950, 203–213.
DALENIUS, T., GURNEY, M. (1951): The problem of optimum stratification. II. Skandinavisk Aktuarietidskrift 1951, 133–148.
DIDAY, E. (1971): Une nouvelle méthode de classification automatique et reconnaissance des formes: la méthode des nuées dynamiques. Revue de Statistique Appliquée XIX(2), 1971, 19–33.
DIDAY, E. (1972): Optimisation en classification automatique et reconnaissance des formes. Revue Française d’Automatique, Informatique et Recherche Opérationnelle (R.A.I.R.O.) VI, 61–96.
DIDAY, E. (1973): The dynamic clusters method in nonhierarchical clustering. Intern. Journal of Computer and Information Sciences 2(1), 61–88.
DIDAY, E. et al. (1979): Optimisation en classification automatique. Vol. I, II. Institut National de Recherche en Informatique et en Automatique (INRIA), Le Chesnay, France.
DIDAY, E., GOVAERT, G. (1974): Classification avec distance adaptative. Comptes Rendus Acad. Sci. Paris 278 A, 993–995.
DIDAY, E., GOVAERT, G. (1977): Classification automatique avec distances adaptatives. R.A.I.R.O. Information/Computer Science 11(4), 329–349.
DIDAY, E., SCHROEDER, A. (1974a): The dynamic clusters method in pattern recognition. In: J.L. Rosenfeld (Ed.): Information Processing 74. Proc. IFIP Congress, Stockholm, August 1974. North Holland, Amsterdam, 691–697.
DIDAY, E., SCHROEDER, A. (1974b): A new approach in mixed distribution detection. Rapport de Recherche no. 52, Janvier 1974. INRIA, Le Chesnay.
DIDAY, E., SCHROEDER, A. (1976): A new approach in mixed distribution detection. R.A.I.R.O. Recherche Opérationnelle 10(6), 75–106.
FISHER, W.D. (1958): On grouping for maximum heterogeneity. J. Amer. Statist. Assoc. 53, 789–798.
FORGY, E.W. (1965): Cluster analysis of multivariate data: efficiency versus interpretability of classifications. Biometric Society Meeting, Riverside, California, 1965. Abstract in Biometrics 21 (1965) 768.
GALLEGOS, M.T. (2002): Maximum likelihood clustering with outliers. In: K. Jajuga, A. Sokolowski, H.-H. Bock (Eds.): Classification, clustering, and data analysis. Springer, Heidelberg, 248–255.
GALLEGOS, M.T., RITTER, G. (2005): A robust method for cluster analysis. Annals of Statistics 33, 347–380.
GRÖTSCHEL, M., WAKABAYASHI, Y. (1989): A cutting plane algorithm for a clustering problem. Mathematical Programming 45, 59–96.
HANSEN, P., JAUMARD, B. (1997): Cluster analysis and mathematical programming. Mathematical Programming 79, 191–215.
HARTIGAN, J.A. (1975): Clustering algorithms. Wiley, New York.
HARTIGAN, J.A., WONG, M.A. (1979): A k-means clustering algorithm. Applied Statistics 28, 100–108.
JANCEY, R.C. (1966a): Multidimensional group analysis. Australian J. Botany 14, 127–130.
JANCEY, R.C. (1966b): The application of numerical methods of data analysis to the genus Phyllota Benth. in New South Wales. Australian J. Botany 14, 131–149.
JARDINE, N., SIBSON, R. (1971): Mathematical taxonomy. Wiley, New York.
JENSEN, R.E. (1969): A dynamic programming algorithm for cluster analysis. Operations Research 17, 1034–1057.
KAUFMAN, L., ROUSSEEUW, P.J. (1987): Clustering by means of medoids. In: Y. Dodge (Ed.): Statistical data analysis based on the L1-norm and related methods. North Holland, Amsterdam, 405–416.
KAUFMAN, L., ROUSSEEUW, P.J. (1990): Finding groups in data. Wiley, New York.
LERMAN, I.C. (1970): Les bases de la classification automatique. Gauthier-Villars, Paris.
LLOYD, S.P. (1957): Least squares quantization in PCM. Bell Telephone Labs Memorandum, Murray Hill, NJ. Reprinted in: IEEE Trans. Information Theory IT-28 (1982), vol. 2, 129–137.
MACQUEEN, J. (1967): Some methods for classification and analysis of multivariate observations. In: L.M. LeCam, J. Neyman (Eds.): Proc. 5th Berkeley Symp. Math. Statist. Probab. 1965/66. Univ. of California Press, Berkeley, vol. I, 281–297.
MARANZANA, F.E. (1963): On the location of supply points to minimize transportation costs. IBM Systems Journal 2, 129–135.
MULVEY, J.M., CROWDER, H.P. (1979): Cluster analysis: an application of Lagrangian relaxation. Management Science 25, 329–340.
PÖTZELBERGER, K., STRASSER, H. (2001): Clustering and quantization by MSP partitions. Statistics and Decision 19, 331–371.
POLLARD, D. (1982): A central limit theorem for k-means clustering. Annals of Probability 10, 919–926.
RAO, M.R. (1971): Cluster analysis and mathematical programming. J. Amer. Statist. Assoc. 66, 622–626.
SCHNEEBERGER, H. (1967): Optimale Schichtung bei proportionaler Aufteilung mit Hilfe eines iterativen Analogrechners. Unternehmensforschung 11, 21–32.
SCLOVE, S.L. (1977): Population mixture models and clustering algorithms. Commun. in Statistics, Theory and Methods, A6, 417–434.
SODEUR, W. (1974): Empirische Verfahren zur Klassifikation. Teubner, Stuttgart.
SOKAL, R.R., SNEATH, P.H. (1963): Principles of numerical taxonomy. Freeman, San Francisco-London.
SPÄTH, H. (1975): Cluster-Analyse-Algorithmen zur Objektklassifizierung und Datenreduktion. Oldenbourg Verlag, München-Wien.
SPÄTH, H. (1979): Algorithm 39: Clusterwise linear regression. Computing 22, 367–373. Correction in Computing 26 (1981), 275.
SPÄTH, H. (1985): Cluster dissection and analysis. Wiley, Chichester.
STANGE, K. (1960): Die zeichnerische Ermittlung der besten Schätzung bei proportionaler Aufteilung der Stichprobe. Zeitschrift für Unternehmensforschung 4, 156–163.
STEINHAUS, H. (1956): Sur la division des corps matériels en parties. Bulletin de l’Académie Polonaise des Sciences, Classe III, vol. IV, no. 12, 801–804.
STRECKER, H. (1957): Moderne Methoden in der Agrarstatistik. Physica, Würzburg, p. 80 etc.
VICHI, M. (2005): Clustering including dimensionality reduction. In: D. Baier, R. Decker, L. Schmidt-Thieme (Eds.): Data analysis and decision support. Springer, Heidelberg, 149–156.
VINOD, H.D. (1969): Integer programming and the theory of grouping. J. Amer. Statist. Assoc. 64, 506–519.
VOGEL, F. (1975): Probleme und Verfahren der Numerischen Klassifikation. Vandenhoeck & Ruprecht, Göttingen.
WINDHAM, M.P. (1986): A unification of optimization-based clustering algorithms. In: W. Gaul, M. Schader (Eds.): Classification as a tool of research. North Holland, Amsterdam, 447–451.
WINDHAM, M.P. (1987): Parameter modification for clustering criteria. Journal of Classification 4, 191–214.
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Bock, HH. (2007). Clustering Methods: A History of k-Means Algorithms. In: Brito, P., Cucumel, G., Bertrand, P., de Carvalho, F. (eds) Selected Contributions in Data Analysis and Classification. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73560-1_15
DOI: https://doi.org/10.1007/978-3-540-73560-1_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73558-8
Online ISBN: 978-3-540-73560-1
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)