ClustGeo: an R package for hierarchical clustering with spatial constraints

  • Original Paper
Computational Statistics

Abstract

In this paper, we propose a Ward-like hierarchical clustering algorithm that includes spatial/geographical constraints. The algorithm takes as input two dissimilarity matrices \(D_0\) and \(D_1\), along with a mixing parameter \(\alpha \in [0,1]\). The dissimilarities can be non-Euclidean and the weights of the observations can be non-uniform. The first matrix gives the dissimilarities in the “feature space” and the second gives the dissimilarities in the “constraint space”. The criterion minimized at each stage is a convex combination of the homogeneity criterion calculated with \(D_0\) and the homogeneity criterion calculated with \(D_1\). The idea is then to choose a value of \(\alpha\) that increases spatial contiguity without excessively degrading the quality of the solution with respect to the variables of interest, i.e. those of the feature space. The procedure is illustrated on a real dataset using the R package ClustGeo.
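The mixed criterion described in the abstract can be illustrated with a small, naive sketch. It assumes (this is an assumption, not taken from the text above) that the homogeneity criterion of a cluster \(C\) is the pseudo-inertia \(I(C) = \sum_{i,j \in C} \frac{w_i w_j}{2\mu_C} d_{ij}^2\) with \(\mu_C = \sum_{i \in C} w_i\), and that the convex combination puts weight \(1-\alpha\) on \(D_0\) and \(\alpha\) on \(D_1\), so that \(\alpha = 0\) ignores the constraint space. At each step it merges the two clusters whose union least increases the mixed criterion. It recomputes every candidate merge from scratch, so it is only suitable for tiny inputs; it is not the ClustGeo implementation, and the function names `pseudo_inertia`, `ward_like` and `explained` are hypothetical.

```python
import numpy as np


def pseudo_inertia(D, w, members):
    """Pseudo-inertia of one cluster: sum_{i,j in C} w_i w_j d_ij^2 / (2 mu_C)."""
    idx = np.asarray(members)
    sq = D[np.ix_(idx, idx)] ** 2          # squared dissimilarities inside C
    ww = np.outer(w[idx], w[idx])          # products of observation weights
    return float((ww * sq).sum() / (2.0 * w[idx].sum()))


def mixed_inertia(D0, D1, w, members, alpha):
    """Convex combination of the feature-space and constraint-space criteria."""
    return ((1 - alpha) * pseudo_inertia(D0, w, members)
            + alpha * pseudo_inertia(D1, w, members))


def ward_like(D0, D1, alpha, w=None, k=2):
    """Naive agglomeration down to k clusters, minimizing the mixed criterion.

    Weights default to uniform 1/n (an assumption of this sketch).
    """
    n = D0.shape[0]
    w = np.full(n, 1.0 / n) if w is None else np.asarray(w, dtype=float)
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                merged = clusters[a] + clusters[b]
                # Increase of the mixed within-cluster criterion caused by the merge.
                delta = (mixed_inertia(D0, D1, w, merged, alpha)
                         - mixed_inertia(D0, D1, w, clusters[a], alpha)
                         - mixed_inertia(D0, D1, w, clusters[b], alpha))
                if best is None or delta < best[0]:
                    best = (delta, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]                    # b > a, so index a is unaffected
    return clusters


def explained(D, w, clusters):
    """Share of the total pseudo-inertia of D explained by a partition."""
    total = pseudo_inertia(D, w, list(range(len(w))))
    within = sum(pseudo_inertia(D, w, c) for c in clusters)
    return 1.0 - within / total


if __name__ == "__main__":
    # Toy data: features pair {0,1} and {2,3}; locations pair {0,2} and {1,3}.
    x = np.array([0.0, 0.1, 5.0, 5.1])
    y = np.array([0.0, 10.0, 0.1, 10.1])
    D0 = np.abs(np.subtract.outer(x, x))   # feature-space dissimilarities
    D1 = np.abs(np.subtract.outer(y, y))   # constraint-space dissimilarities
    w = np.full(4, 0.25)
    for alpha in (0.0, 0.5, 1.0):
        part = ward_like(D0, D1, alpha, w=w, k=2)
        print(alpha, part, explained(D0, w, part), explained(D1, w, part))
```

On the toy data, \(\alpha = 0\) groups observations by feature and \(\alpha = 1\) groups them by location. Printing the share of pseudo-inertia explained in both spaces over a grid of \(\alpha\) values is one plausible way to look for a value that raises spatial homogeneity while keeping most of the feature-space quality.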

References

  • Ambroise C, Govaert G (1998) Convergence of an EM-type algorithm for spatial clustering. Pattern Recognit Lett 19(10):919–927

  • Ambroise C, Dang M, Govaert G (1997) Clustering of spatial data by the EM algorithm. In: Soares A, Gómez-Hernández J, Froidevaux R (eds) geoENV I: geostatistics for environmental applications. Springer, Berlin, pp 493–504

  • Bécue-Bertaut M, Kostov B, Morin A, Naro G (2014) Rhetorical strategy in forensic speeches: multidimensional statistics-based methodology. J Class 31(1):85–106

  • Bécue-Bertaut M, Alvarez-Esteban R, Sánchez-Espigares JA (2017) Xplortext: statistical analysis of textual data R package. R package version 1.0. https://cran.r-project.org/package=Xplortext. Accessed 26 Oct 2017

  • Bourgault G, Marcotte D, Legendre P (1992) The multivariate (co)variogram as a spatial weighting function in classification methods. Math Geol 24(5):463–478

  • Chavent M, Kuentz-Simonet V, Labenne A, Saracco J (2017) ClustGeo: hierarchical clustering with spatial constraints. R package version 2.0. https://cran.r-project.org/package=ClustGeo. Accessed 14 July 2017

  • Dehman A, Ambroise C, Neuvial P (2015) Performance of a blockwise approach in variable selection using linkage disequilibrium information. BMC Bioinform 16:148

  • Duque JC, Dev B, Betancourt A, Franco JL (2011) ClusterPy: library of spatially constrained clustering algorithms, RiSE-group (research in spatial economics). EAFIT University. Version 0.9.9. http://www.rise-group.org/risem/clusterpy/. Accessed 19 July 2017

  • Ferligoj A, Batagelj V (1982) Clustering with relational constraint. Psychometrika 47(4):413–426

  • Gordon AD (1996) A survey of constrained classification. Comput Stat Data Anal 21:17–29

  • Lance GN, Williams WT (1967) A general theory of classificatory sorting strategies. 1. Hierarchical systems. Comput J 9:373–380

  • Legendre P (2014) const.clust: Space- and time-constrained clustering package. http://adn.biol.umontreal.ca/~numericalecology/Rcode/. Accessed 30 Mar 2014

  • Legendre P, Legendre L (2012) Numerical ecology, vol 24. Elsevier, New York

  • Miele V, Picard F, Dray S (2014) Spatially constrained clustering of ecological networks. Methods Ecol Evol 5(8):771–779

  • Murtagh F (1985a) Multidimensional clustering algorithms. Compstat lectures. Physica, Vienna

  • Murtagh F (1985b) A survey of algorithms for contiguity-constrained clustering and related problems. Comput J 28:82–88

  • Oliver M, Webster R (1989) A geostatistical basis for spatial weighting in multivariate classification. Math Geol 21(1):15–35

  • Strauss T, von Maltitz MJ (2017) Generalising Ward’s method for use with Manhattan distances. PLoS ONE. https://doi.org/10.1371/journal.pone.0168288

  • Vignes M, Forbes F (2009) Gene clustering via integrated Markov models combining individual and pairwise features. IEEE/ACM Trans Comput Biol Bioinform (TCBB) 6(2):260–270

  • Ward JH Jr (1963) Hierarchical grouping to optimize an objective function. J Am Stat Assoc 58(301):236–244

Acknowledgements

The authors are grateful to the editor and the anonymous referees for their valuable comments, which led to several improvements of this article.

Author information

Correspondence to Marie Chavent.

About this article

Cite this article

Chavent, M., Kuentz-Simonet, V., Labenne, A. et al. ClustGeo: an R package for hierarchical clustering with spatial constraints. Comput Stat 33, 1799–1822 (2018). https://doi.org/10.1007/s00180-018-0791-1

  • DOI: https://doi.org/10.1007/s00180-018-0791-1
