Statistical Methods and Applications, Volume 10, Issue 1–3, pp 237–256

A constrained k-means clustering algorithm for classifying spatial units

  • G. Damiana Costanzo
Statistical Applications


In some classification problems it may be important to impose constraints on the set of allowable solutions. In regional taxonomy in particular, urban and regional studies often try to segment a set of territorial units into groups that are homogeneous with respect to a set of socio-economic variables while also respecting contiguous neighbourhoods. The objects in a class are thus required not only to be similar to one another but also to form a spatially contiguous set. The rationale is that if a spatially varying phenomenon influences the objects, as may happen with geographical units, and this spatial information is ignored when constructing the classes, then the phenomenon is less likely to be detected. In this paper a constrained version of the k-means clustering method (MacQueen, 1967; Ball and Hall, 1967) and a new algorithm implementing it are proposed; the algorithm is based on the efficient algorithm of Hartigan and Wong (1979). It has proved useful in zoning two large regions in Italy (Calabria and Puglia).
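The contiguity requirement described above can be illustrated with a minimal Python sketch. It is not the algorithm proposed in the paper, which builds on Hartigan and Wong (1979); it assumes a simpler nearest-centroid transfer rule, and the names `is_connected`, `constrained_kmeans`, `adj` and `start` are hypothetical. A unit may move only to the class of one of its neighbours, and only if the class it leaves stays non-empty and contiguous.

```python
import numpy as np

def is_connected(members, adj):
    """True if the units in `members` form one connected block under `adj`."""
    members = [int(m) for m in members]
    if not members:
        return False  # an empty class is not allowed
    member_set = set(members)
    seen, stack = {members[0]}, [members[0]]
    while stack:  # depth-first search restricted to the class
        u = stack.pop()
        for v in np.flatnonzero(adj[u]):
            v = int(v)
            if v in member_set and v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(member_set)

def constrained_kmeans(X, adj, labels, max_iter=100):
    """Illustrative contiguity-constrained k-means (assumed transfer rule).

    X: (n, p) data; adj: (n, n) 0/1 contiguity matrix;
    labels: initial partition whose classes are already contiguous.
    """
    X = np.asarray(X, dtype=float)
    adj = np.asarray(adj)
    labels = np.array(labels, dtype=int)
    for _ in range(max_iter):
        moved = False
        for i in range(len(X)):
            old = int(labels[i])
            rest = [j for j in np.flatnonzero(labels == old) if j != i]
            # Unit i may leave only if its class stays non-empty and contiguous.
            if not is_connected(rest, adj):
                continue
            # Only classes that touch unit i are admissible destinations.
            neighbours = np.flatnonzero(adj[i])
            candidates = {int(c) for c in labels[neighbours]} - {old}
            best = old
            best_d = np.sum((X[i] - X[labels == old].mean(axis=0)) ** 2)
            for c in sorted(candidates):
                d = np.sum((X[i] - X[labels == c].mean(axis=0)) ** 2)
                if d < best_d:
                    best, best_d = c, d
            if best != old:
                labels[i] = best
                moved = True
        if not moved:
            break
    return labels

# Toy example: six units on a line, each contiguous with its neighbours.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
adj = np.zeros((6, 6), dtype=int)
for i in range(5):
    adj[i, i + 1] = adj[i + 1, i] = 1
start = [0, 0, 0, 0, 0, 1]  # contiguous but poorly balanced initial partition
result = constrained_kmeans(X, adj, start)
print(result)
```

In this toy case the two units near 5.0 migrate one at a time into the second class, because only boundary units can change class without breaking contiguity. The sketch recomputes centroids after every transfer for clarity rather than speed.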

Key words

k-means clustering, constrained optimisation, contiguity matrix, spatial data, regional taxonomy, segmentation




  1. Anania G, Cersosimo D, Costanzo GD (2001) Le Calabrie contemporanee. Un'analisi delle caratteristiche degli ambiti economico produttivi sub-regionali. In: Scelte pubbliche, strategie private e sviluppo economico in Calabria. Conoscere per Decidere, Rubbettino, Soveria Mannelli, 333–380
  2. Ball GH, Hall DJ (1967) A clustering technique for summarizing multivariate data. Behavioural Science 12, 153–155
  3. Batagelj V (1984) Agglomerative methods in clustering with constraints. Preprint Series Dept. Math. Univ. Ljubljana 22 (102), 5–19
  4. Caliński T, Harabasz J (1974) A dendrite method for cluster analysis. Communications in Statistics 3, 1–27
  5. Christofides N (1975) Graph theory. Academic Press, London
  6. Cressie NAC (1993) Statistics for spatial data. Wiley, New York
  7. De Soete G, DeSarbo WS, Furnas GW, Carroll JD (1984) The estimation of ultrametric and path trees from rectangular proximity data. Psychometrika 49, 289–310
  8. De Soete G, Carroll JD (1994) K-means clustering in a low-dimensional Euclidean space. In: Diday E et al. (eds.) New approaches in classification and data analysis, pp. 212–219. Springer, Berlin Heidelberg New York
  9. DeSarbo WS, Mahajan V (1984) Constrained classification: the use of a priori information in cluster analysis. Psychometrika 49, 187–215
  10. Ferligoj A, Batagelj V (1982) Clustering with relational constraint. Psychometrika 47, 413–426
  11. Ferligoj A, Batagelj V (1983) Some types of clustering with relational constraint. Psychometrika 48, 541–552
  12. Ferligoj A, Batagelj V (1992) Direct multicriteria clustering algorithms. Journal of Classification 9 (1), 43–61
  13. Ferligoj A, Batagelj V (1998) Constrained clustering problems. In: Proceedings of IFCS '98, Rome, 541–522
  14. Ferligoj A, Batagelj V (2000) Clustering relational data. In: Gaul W, Opitz O, Schader M (eds.) Data analysis. Springer, Berlin Heidelberg New York, 3–15
  15. Gordon AD (1973) Classifications in the presence of constraints. Biometrics 29, 821–827
  16. Gordon AD (1980) Methods of constrained classification. In: Tomassone R (ed.) Analyse de données et informatique. INRIA, Le Chesnay, 149–160
  17. Gordon AD (1999) Classification. Chapman & Hall, London
  18. Gordon AD (1987) Parsimonious trees. Journal of Classification 4, 85–101
  19. Gordon AD (1996) A survey of constrained classification. Computational Statistics & Data Analysis 21, 17–29
  20. Gordon AD (1996a) How many clusters? An investigation of five procedures for detecting nested cluster structure. In: Hayashi C et al. (eds.) Data science, classification, and related methods. Springer, Berlin Heidelberg New York, 109–116
  21. Gordon AD, Vichi M (2001) Fuzzy partition models for fitting a set of partitions. Psychometrika 66, 229–248
  22. Harary F (1969) Graph theory. Addison-Wesley, Reading, MA
  23. Hartigan JA (1975) Clustering algorithms. Wiley, New York
  24. Hartigan JA, Wong MA (1979) Algorithm AS 136: A k-means clustering algorithm. Applied Statistics 28 (1), 100–108
  25. Hubert LJ (1974) Some applications of graph theory to clustering. Psychometrika 39 (3), 283–308
  26. Lebart L (1978) Programme d'agrégation avec contraintes. Les Cahiers de l'Analyse des Données 3, 275–287
  27. Lechevallier Y (1980) Classification sous contraintes. In: Diday E et al. (eds.) Optimisation en classification automatique. INRIA, Paris, 677–696
  28. Lefkovitch LP (1980) Conditional clustering. Biometrics 36, 43–58
  29. Legendre P (1987) Constrained clustering. In: Legendre P et al. (eds.) Developments in numerical ecology. Springer, Berlin Heidelberg New York
  30. MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: LeCam LM et al. (eds.) Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, Statistics. University of California Press, Berkeley, CA, 281–298
  31. Maravalle M, Simeone B, Naldini R (1997) Clustering on trees. Computational Statistics & Data Analysis 24, 217–234
  32. Milligan GW, Cooper MC (1985) An examination of procedures for determining the number of clusters in a data set. Psychometrika 50, 159–179
  33. Mills G (1967) The determination of local government boundaries. Operational Research Quarterly 18, 243–255
  34. Monestiez P (1977) Méthode de classification automatique sous contraintes spatiales. Statistique et Analyse des Données 3, 75–84
  35. Murtagh F (1985) A survey of algorithms for contiguity-constrained clustering and related problems. Computer Journal 28, 82–88
  36. Openshaw S (1977) A geographical solution to scale and aggregation problems in region-building, partitioning and spatial modelling. Transactions of the Institute of British Geographers 52, 247–258
  37. Seber GAF (1984) Multivariate observations. Wiley, New York
  38. Späth H (1980) Cluster analysis algorithms. Ellis Horwood, Chichester
  39. Taylor PJ (1973) Some implications of the spatial organizations of elections. Transactions of the Institute of British Geographers 60, 121–136
  40. Upton G, Fingleton B (1985) Spatial data analysis by example, vol. 1. Wiley, New York
  41. Vicari D (1990) Indici per la scelta del numero dei gruppi. Metron 49, 473–492
  42. Webster R (1977) Quantitative and numerical methods in soil classification and survey. Clarendon Press, Oxford New York
  43. Wilson RJ (1996) Introduction to graph theory. Addison Wesley Longman, England
  44. Zani S (1993) Classificazione di unità territoriali e spaziali. In: Zani S (ed.) Metodi statistici per le analisi territoriali. Franco Angeli, Milano, 93–121

Copyright information

© Springer-Verlag 2001

Authors and Affiliations

  • G. Damiana Costanzo (1)
  1. Dipartimento di Economia e Statistica, Università della Calabria, Arcavacata di Rende, Italy
