Beyond K-means: Clusters Identification for GIS

  • Andreas Hamfelt
  • Mikael Karlsson
  • Tomas Thierfelder
  • Vladislav Valkovsky
Part of the Lecture Notes in Geoinformation and Cartography book series (LNGC, volume 5)


Clustering is an important concept in the analysis of GIS data. Because such systems can hold very large amounts of data, the time complexity of clustering algorithms is critical. K-means is a popular clustering algorithm for large-scale systems because its time complexity is linear in the number of data points. However, K-means requires a priori knowledge of the number of clusters and a subsequent selection of their initial centroids. We propose a method that lets K-means automatically find the number of clusters and their associated centroids. Moreover, we consider a recursive extension of the algorithm that improves the visibility of the results at different levels of abstraction, in order to support the decision-making process.
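To make the limitation concrete, the following is a minimal sketch of standard (Lloyd-style) K-means in plain Python. It is not the method proposed in this chapter; it only illustrates that the textbook algorithm needs its initial centroids, and therefore the number of clusters, as input. The function name `kmeans`, its parameters, and the sample data are assumptions made for illustration.

```python
import math
import random


def kmeans(points, initial_centroids, max_iter=100):
    """Standard (Lloyd-style) K-means on a list of coordinate tuples.

    The caller must supply the initial centroids, and therefore k;
    this is the a priori input the abstract refers to.
    """
    centroids = [tuple(c) for c in initial_centroids]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid,
        # costing O(n * k * d) distance work per pass.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, labels


if __name__ == "__main__":
    random.seed(0)
    # Two hypothetical point groups standing in for spatial features.
    data = [(random.gauss(0, 0.5), random.gauss(0, 0.5)) for _ in range(50)]
    data += [(random.gauss(5, 0.5), random.gauss(5, 0.5)) for _ in range(50)]
    centers, labels = kmeans(data, initial_centroids=[data[0], data[-1]])
    print(centers)
```

In this sketch the choice of `initial_centroids` fixes both the number of clusters and the starting positions, so a poor guess for either can give a poor partition. That dependence is what an automatic selection method would aim to remove.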



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Andreas Hamfelt (1)
  • Mikael Karlsson (2)
  • Tomas Thierfelder (3)
  • Vladislav Valkovsky (1)

  1. Informatics and Media, Uppsala University, Uppsala, Sweden
  2. Eins SAP Consulting, Stockholm, Sweden
  3. Department of Energy and Technology, Swedish University of Agricultural Sciences, Uppsala, Sweden
