
Spatio-temporal climate regionalization using a self-organized clustering approach



The authors present a novel self-organized climate regionalization (CR) method that obtains a spatial clustering of regions based on the explained variance of the physical measurements within their coverage. The method enables a microscopic characterization of the probabilistic spatial extent of climate regions, using the statistics of the obtained clusters. It also allows the macroscopic behaviour of climate regions to be studied through time, using the dissimilarity among cluster size probability histograms. The presented method is based on the Second-Order Data-Coupled Clustering (SODCC) algorithm; its main advantages are that SODCC is robust to the selection of tunable parameters and that it does not require a regular or homogeneous grid. Moreover, the SODCC method has higher spatial resolution and lower computational complexity, and it allows a more direct physical interpretation of the outputs than other existing CR methods, such as Empirical Orthogonal Function (EOF) or Rotated Empirical Orthogonal Function (REOF) analysis. These facts are illustrated with an example of winter wind speed regionalization in the Iberian Peninsula over the period 1979–2014. The study also reveals that the North Atlantic Oscillation (NAO) strongly influences the wind distribution in the Iberian Peninsula in a subset of the years in the considered period.



  1. A vector is a matrix consisting of a single column of elements and the symbol \((\cdot)^{\top}\) indicates the transpose operation.

  2. Note that the algorithm source code can be found at https://github.com/MihChi/SODCC.


  1. Argüeso D, Hidalgo-Muñoz JM, Gámiz-Fortis SR, Esteban-Parra MJ, Dudhia J, Castro-Díez Y (2011) Evaluation of WRF parameterizations for climate studies over Southern Spain using a multistep regionalization. J Climate 24(21):5633–5651

  2. Azorin-Molina C, Vicente-Serrano SM, McVicar TR, Jerez S, Sanchez-Lorenzo A, López-Moreno JI, Revuelto J, Trigo R, Lopez-Bustins JA, Espírito-Santo F (2014) Homogenization and assessment of observed near-surface wind speed trends over Spain and Portugal, 1961–2011. J Climate 27(10):3692–3712

  3. Badr HS, Zaitchik BF, Dezfuli AK (2014) Climate regionalization through hierarchical clustering: options and recommendations for Africa. AGU Fall Meeting Abstracts

  4. Badr HS, Zaitchik BF, Dezfuli AK (2015) A tool for hierarchical climate regionalization. Earth Sci Inform 8(4):949–958

  5. Baeriswyl PA, Rebetez M (1997) Regionalization of precipitation in Switzerland by means of principal component analysis. Theor Appl Climatol 58(1–2):31–41

  6. Baik J, Silverstein JW (2006) Eigenvalues of large sample covariance matrices of spiked population models. J Multivar Anal 97(6):1382–1408

  7. Biehl M, Mietzner A (1994) Statistical mechanics of unsupervised structure recognition. J Phys A Math Gen 27(6):1885

  8. Bierstedt SE, Hünicke B, Zorita E, Wagner S, Gómez-Navarro JJ (2014) Variability of daily winter wind speed distribution over Northern Europe during the past millennium in regional and global climate simulations. Clim Past 2(12):317–338

  9. Björnsson H, Venegas S (1997) A manual for EOF and SVD analyses of climatic data. CCGCR Report 97(1):112–134

  10. Branick ML (1997) A climatology of significant winter-type weather events in the contiguous United States, 1982–94. Weather Forecast 12(2):193–207

  11. Burn DH (1989) Cluster analysis as applied to regional flood frequency. J Water Resour Plan Manag 115(5):567–582

  12. Burningham H, French J (2013) Is the NAO winter index a reliable proxy for wind climate and storminess in Northwest Europe? Int J Climatol 33(8):2036–2049

  13. Carvalho M, Melo-Gonçalves P, Teixeira J, Rocha A (2016) Regionalization of Europe based on a K-Means cluster analysis of the climate change of temperatures and precipitation. Phys Chem Earth Parts A/B/C 94:22–28

  14. Cassou C, Terray L, Hurrell JW, Deser C (2004) North Atlantic winter climate regimes: spatial asymmetry, stationarity with time, and oceanic forcing. J Climate 17(5):1055–1068

  15. Chidean MI, Morgado E, del Arco E, Ramiro-Bargueño J, Caamaño AJ (2015a) Scalable data-coupled clustering for large scale WSN. IEEE Trans Wireless Commun 15:4681–4694

  16. Chidean MI, Muñoz-Bulnes J, Ramiro-Bargueño J, Caamaño AJ, Salcedo-Sanz S (2015b) Spatio-temporal trend analysis of air temperature in Europe and Western Asia using data-coupled clustering. Global Planet Change 129:45–55

  17. Chidean MI, Caamaño AJ, Ramiro-Bargueño J, Casanova-Mateo C, Salcedo-Sanz S (2018) Spatio-temporal analysis of wind resource in the Iberian Peninsula with data-coupled clustering. Renew Sustain Energy Rev 81:2684–2694

  18. Comrie AC, Glenn EC (1998) Principal components-based regionalization of precipitation regimes across the southwest United States and northern Mexico, with an application to monsoon precipitation variability. Climate Res 10(3):201–215

  19. Dee DP, Uppala SM, et al (2011) The ERA–interim reanalysis: configuration and performance of the data assimilation system. Q J Roy Meteorol Soc 137(656):553–597

  20. Doi T, Fujita I (2014) Cross-matching: a modified cross-correlation underlying threshold energy model and match-based depth perception. Front Comput Neurosci 8(127):1–15

  21. Dommenget D, Latif M (2002) A cautionary note on the interpretation of EOFs. J Climate 15(2):216–225

  22. Gimeno L, Ribera P, Iglesias R, de la Torre L, García R, Hernández E (2002) Identification of empirical relationships between indices of ENSO and NAO and agricultural yields in Spain. Clim Res 21(2):165–172

  23. Irwin SE (2015) Assessment of the regionalization of precipitation in two Canadian climate regions: a fuzzy clustering approach. Ph.D. thesis, The University of Western Ontario

  24. Jerez S, Trigo R, Vicente-Serrano SM, Pozo-Vázquez D, Lorente-Plazas R, Lorenzo-Lacruz J, Santos-Alamillos F, Montávez J (2013) The impact of the North Atlantic oscillation on renewable energy resources in southwestern Europe. J Appl Meteorol Climatol 52(10):2204–2225

  25. Jolliffe I (2002) Principal component analysis, 2nd edn. Springer

  26. Kaiser HF (1958) The varimax criterion for analytic rotation in factor analysis. Psychometrika 23(3):187–200

  27. Kim KY, Hamlington B, Na H (2015) Theoretical foundation of cyclostationary EOF analysis for geophysical and climatic variables: concepts and examples. Earth Sci Rev 150:201–218

  28. Knapp PA, Grissino-Mayer HD, Soulé PT (2002) Climatic regionalization and the spatio-temporal occurrence of extreme single-year drought events (1500–1998) in the interior Pacific Northwest, USA. Quatern Res 58(3):226–233

  29. Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1):79–86

  30. Lian T, Chen D (2012) An evaluation of rotated EOF analysis and its application to tropical Pacific SST variability. J Climate 25(15):5361–5373

  31. Lorente-Plazas R, Montávez JP, Jimenez PA, Jerez S, Gómez-Navarro JJ, García-Valero JA, Jimenez-Guerrero P (2015) Characterization of surface winds over the Iberian Peninsula. Int J Climatol 35(6):1007–1026

  32. Nadler B (2008) Finite sample approximation results for principal component analysis: a matrix perturbation approach. Ann Stat, 2791–2817

  33. Nojarov P (2017) Genetic climatic regionalization of the Balkan Peninsula using cluster analysis. J Geogr Sci 27(1):43–61

  34. Önol B, Semazzi FHM (2009) Regionalization of climate change simulations over the Eastern Mediterranean. J Climate 22(8):1944–1961

  35. Prieto L, García R, Díaz J, Hernández E, Del Teso T (2002) NAO influence on extreme winter temperatures in Madrid (Spain). Annales Geophysicae 20(12):2077–2085

  36. Qu B, Gabric AJ, Zhu J, Lin D, Qian F, Zhao M (2012) Correlation between sea surface temperature and wind speed in Greenland Sea and their relationships with NAO variability. Water Sci Eng 5(3):304–315

  37. Regonda SK, Zaitchik BF, Badr HS, Rodell M (2016) Using climate regionalization to understand climate forecast system version 2 (CFSv2) precipitation performance for the Conterminous United States (CONUS). Geophys Res Lett 43(12):6485–6492

  38. Richman M (1986) Rotation of principal components. J Climatol 6(3):293–335

  39. Santos-Alamillos F, Thomaidis N, Quesada-Ruiz S, Ruiz-Arias J, Pozo-Vázquez D (2016) Do current wind farms in Spain take maximum advantage of spatiotemporal balancing of the wind resource? Renew Energy 96:574–582

  40. Sarma AK, Hazarika J (2014) GCM based fuzzy clustering to identify homogeneous climatic regions of North-East India. World Academy of Science, Engineering and Technology, International Journal of Environmental, Chemical, Ecological, Geological and Geophysical Engineering 8(12):807–814

  41. Shahriar F, Montazeri M, Momeni M, Freidooni A (2015) Regionalization of the climatic areas of Qazvin province using multivariate statistical methods. Mod Appl Sci 9(2):123

  42. Trigo R, Osborn TJ, Corte-Real JM (2002) The North Atlantic oscillation influence on Europe: climate impacts and associated physical mechanisms. Climate Res 20(1):9–17

  43. Trigo R, Pozo-Vázquez D, Osborn TJ, Castro-Díez Y, Gámiz-Fortis S, Esteban-Parra MJ (2004) North Atlantic Oscillation influence on precipitation, river flow and water resources in the Iberian Peninsula. Int J Climatol 24(8):925–944

  44. Troccoli A, Muller K, Coppin P, Davy R, Russell C, Hirsch AL (2012) Long-term wind speed trends over Australia. J Climate 25(1):170–183

  45. Unal Y, Kindap T, Karaca M (2003) Redefining the climate zones of Turkey using cluster analysis. Int J Climatol 23(9):1045–1055

  46. Vicsek T, Family F (1984) Dynamic scaling for aggregation of clusters. Phys Rev Lett 52:1669–1672

  47. von Storch H, Zwiers FW (1999) Statistical analysis in climate research. Cambridge University Press

  48. Ward JH (1963) Hierarchical grouping to optimize an objective function. J Am Stat Assoc 58(301):236–244

  49. White D, Richman M, Yarnal B (1991) Climate regionalization and rotation of principal components. Int J Climatol 11(1):1–25

  50. Working Group on Surface Pressure (2016) Download Climate Timeseries. North Atlantic Oscillation (NAO). https://www.esrl.noaa.gov/psd/gcos_wgsp/Timeseries/NAO/. Accessed: 2016-01-22

  51. Xu G, Kailath T (1994) Fast subspace decomposition. IEEE Trans Signal Process 42(3):539–551

  52. Yu L, Zhong S, Bian X, Heilman WE (2015) Temporal and spatial variability of wind resources in the United States as derived from the climate forecast system reanalysis. J Climate 28(3):1166–1183

  53. Zhang Y, Moges S, Block P (2016) Optimal cluster analysis for objective regionalization of seasonal precipitation in regions of high spatial–temporal variability: application to Western Ethiopia. J Climate 29(10):3697–3717

  54. Zishka KM, Smith PJ (1980) The climatology of cyclones and anticyclones over North America and surrounding ocean environs for January and July, 1950-77. Mon Weather Rev, 387–401



ERA-Interim data provided courtesy of ECMWF. The authors would like to thank the editing and reviewing effort of the anonymous peers and the editor.


This work has been partially supported by the Autonomous Community of Madrid (Grant Ref. S2013/MAE-2835) and by the Ministerio de Economía y Competitividad of Spain (Grant Ref. TIN2017-85887-C2-2-P).

Author information

Correspondence to Antonio J. Caamaño.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



Appendix A: Phase transition of correlation matrices with a finite number of samples

Using matrix perturbation analysis, it is possible to set a lower bound on the number of samples needed to recover the largest eigenvalue of the corresponding correlation matrix (Nadler 2008).

Consider the data matrix \(\mathbf {X}\in {\mathbb R}^{N\times M}\), where M is the number of available data samples for each of the N measuring stations or grid sites that we aim to analyze. Its covariance matrix can be defined as \({\Sigma } = \mathbf {M_{X}}\mathbf {M_{X}}^{\top }/M\), where \(\mathbf {M_{X}}\) is the mean-centered data matrix, i.e. the transformation of matrix \(\mathbf {X}\) such that the mean of each row is zero.
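The definition above can be sketched in a few lines of NumPy. This is a minimal illustration under assumed conventions (rows are stations, columns are time samples); the function name `covariance_matrix` and the synthetic data are hypothetical and not taken from the SODCC source code.

```python
import numpy as np

def covariance_matrix(X):
    """Return (M_X, Sigma) for a data matrix X of shape (N, M).

    N is the number of stations (rows), M the number of samples (columns).
    """
    X = np.asarray(X, dtype=float)
    M = X.shape[1]
    # Mean-center each row so that every station has zero temporal mean.
    M_X = X - X.mean(axis=1, keepdims=True)
    # Sigma = M_X M_X^T / M, an N x N symmetric matrix.
    Sigma = M_X @ M_X.T / M
    return M_X, Sigma

# Synthetic example: 3 stations, 1000 samples (illustrative values only).
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(3, 1000))
M_X, Sigma = covariance_matrix(X)
```

With this normalization (division by M rather than M − 1), the result agrees with `np.cov(X, bias=True)`.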

Let the noise power of the measurement be defined as \(\sigma^{2}\) and the squared modulus of the first EOF of Σ as \(||\mathbf{v}||^{2}\). The noise is taken to be simply a white-noise stochastic physical process that perturbs all EOFs equally in time and space. Thus, the Signal to Noise Ratio (SNR) needed to resolve the first EOF (the one with the highest explained variance) can be defined as

$$ \mathsf{SNR}_{v}=||\mathbf{v}||^{2}/\sigma^{2} $$

It has been shown that the stochastic self-adjoint matrix Σ experiences a phase transition for M samples obtained from N measuring stations (with N/M constant as N and M go to \(\infty \)) when \(M/N \geq \mathsf {SNR}_{v}^{-2}\) (Nadler 2008, Eq. 2.19). The phase transition can be detected in the largest eigenvalue, as it “arises” from the set of degenerate noise eigenvalues to become a signal eigenvalue. This phase transition has also been identified in the fields of unsupervised learning (via the replica method) (Biehl and Mietzner 1994) and statistics (via the Stieltjes transform) (Baik and Silverstein 2006).

This, in fact, provides a threshold on the minimum number of samples M needed to detect the eigenvalue with a minimum \(\mathsf{SNR}_{v}\) as

$$ M\geq \frac{N}{\mathsf{SNR}_{v}^{2}} $$
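The transition can be illustrated numerically with a rank-one signal in white Gaussian noise. The sketch below is an assumption-laden toy model, not the authors' experiment: the function `top_eigenvalue`, the sizes N = 100 and M = 400, and the two SNR values are hypothetical. The noise-only reference level used here is the well-known Marchenko-Pastur upper edge \(\sigma^{2}(1+\sqrt{N/M})^{2}\), which is not stated in the text above.

```python
import numpy as np

def top_eigenvalue(N, M, snr, sigma2=1.0, seed=0):
    """Largest eigenvalue of the sample covariance of a rank-one signal plus noise."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=N)
    v *= np.sqrt(snr * sigma2) / np.linalg.norm(v)   # enforce ||v||^2 / sigma2 = snr
    s = rng.normal(size=M)                           # unit-variance time series
    X = np.outer(v, s) + np.sqrt(sigma2) * rng.normal(size=(N, M))
    M_X = X - X.mean(axis=1, keepdims=True)          # mean-center each row
    Sigma = M_X @ M_X.T / M
    return np.linalg.eigvalsh(Sigma)[-1]             # eigvalsh sorts ascending

N, M = 100, 400                              # M / N = 4, so the critical SNR_v is 0.5
noise_edge = (1 + np.sqrt(N / M)) ** 2       # Marchenko-Pastur upper edge, sigma2 = 1
strong = top_eigenvalue(N, M, snr=4.0)       # well above the critical SNR_v
weak = top_eigenvalue(N, M, snr=0.1)         # well below the critical SNR_v

print(f"noise edge: {noise_edge:.2f}, strong: {strong:.2f}, weak: {weak:.2f}")
```

In this toy setting the largest eigenvalue for the strong signal should separate clearly from the noise edge, while for the weak signal it should stay close to that edge, which is the qualitative behaviour the bound describes.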

Throughout the present work, the working threshold used to separate signal eigenvalues from noise eigenvalues establishes that the minimum number of samples needed to obtain at least one signal eigenvalue is

$$ M=4\times N $$

This assumption is valid as long as the only intervening noise in the mixture of the EOF is additive, white, and Gaussian.
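As a small worked example of the two formulas above, the bound \(M\geq N/\mathsf{SNR}_{v}^{2}\) can be evaluated directly. Read against that bound, the working threshold M = 4 × N would sit exactly at \(\mathsf{SNR}_{v}=0.5\); this is an inference from the two expressions, not a statement made in the text. The helper names `min_samples` and `implied_snr` and the value N = 50 are hypothetical.

```python
import math

def min_samples(N, snr):
    """Smallest integer M that satisfies M >= N / snr**2."""
    return math.ceil(N / snr ** 2)

def implied_snr(N, M):
    """SNR_v for which M samples lie exactly on the bound M = N / SNR_v**2."""
    return math.sqrt(N / M)

N = 50                              # illustrative cluster size (assumed)
M_needed = min_samples(N, 0.5)      # samples needed to resolve SNR_v = 0.5
snr_at_4N = implied_snr(N, 4 * N)   # SNR_v implied by the M = 4N threshold
print(M_needed, snr_at_4N)
```

Since the bound depends only on the ratio N/M, the implied \(\mathsf{SNR}_{v}\) is the same for any cluster size under this reading.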

Thus, we can see that the phase transition sets a limit on the minimum number of samples M for the covariance matrix \({\Sigma } = \mathbf {M_{X}}\mathbf {M_{X}}^{\top }/M\) to be well conditioned, i.e. non-singular. By using a consistent threshold for the extraction of the EOFs, regardless of the size of the originating covariance matrix (i.e. from small to large clusters or the complete network), we ensure the coherence and comparability of the results at any spatial scale.

Appendix B: Fast subspace decomposition

In this work, we use the Fast Subspace Decomposition (FSD) algorithm to estimate the number of eigenvalues \(\hat d\) that account for a given amount of explained variance (Xu and Kailath 1994). The FSD algorithm estimates the first \(\hat d\) Rayleigh-Ritz (RR) eigenvalues and eigenvectors (spanning the signal subspace) up to the \(\hat d=d\) iteration, where d is the value to be estimated. FSD is based on the Lanczos method and has \(O(N^{2}d)\) computational complexity, much lower than that of the traditional eigendecomposition, which is \(O(N^{3})\).

For a data matrix \(\mathbf {X}\in {\mathbb R}^{N\times M}\) with \(\mathbf {M_{X}}\) the centered data matrix and \({\Sigma } = \mathbf {M_{X}}\mathbf {M_{X}}^{\top }/M\) the covariance matrix, the FSD statistic \(\varphi _{\hat d}\) is defined as (Xu and Kailath 1994):

$$ \varphi_{\hat d}=M(N-\hat d)\log\left[\frac{\sqrt{\frac{1}{N-\hat d}\left(||{\Sigma}||^{2}-{\sum}_{n=1}^{\hat d}{\theta_{n}^{2}}\right)}}{\frac{1}{N-\hat d}\left(\mathsf{Tr}{\Sigma}-{\sum}_{n=1}^{\hat d}\theta_{n}\right)} \right] $$

where ||⋅|| is the Frobenius norm, \(\theta_{n}\) are the RR eigenvalues, and \(\mathsf{Tr}{\Sigma}\) is the trace of matrix Σ. In each iteration, the \(\varphi _{\hat d}\) statistic is computed; for \(\hat d\geq d+1\), \(\varphi _{\hat d}\) approaches a \(\chi^{2}\) distribution with \((1/2)(N-\hat d)(N-\hat d+1)-1\) degrees of freedom.
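To make the statistic concrete, the sketch below evaluates \(\varphi_{\hat d}\) for a given covariance matrix, taking the sums over the first \(\hat d\) eigenvalues (the reading assumed here). Two simplifications are assumptions of this example and not part of FSD: exact eigenvalues from a full eigendecomposition stand in for the Rayleigh-Ritz values that the Lanczos iteration would provide, and the function name `fsd_statistic` and the diagonal test matrix are hypothetical.

```python
import numpy as np

def fsd_statistic(Sigma, d_hat, M):
    """Evaluate the FSD statistic for a candidate signal dimension d_hat.

    Exact eigenvalues are used here as a stand-in for Rayleigh-Ritz values.
    """
    N = Sigma.shape[0]
    # Largest d_hat eigenvalues (eigvalsh sorts ascending, so reverse first).
    theta = np.linalg.eigvalsh(Sigma)[::-1][:d_hat]
    # Residual squared Frobenius norm and residual trace after removing theta.
    resid_sq = np.linalg.norm(Sigma, "fro") ** 2 - np.sum(theta ** 2)
    resid_tr = np.trace(Sigma) - np.sum(theta)
    numerator = np.sqrt(resid_sq / (N - d_hat))   # RMS of the remaining eigenvalues
    denominator = resid_tr / (N - d_hat)          # mean of the remaining eigenvalues
    return M * (N - d_hat) * np.log(numerator / denominator)

# One dominant eigenvalue (10) on top of a flat noise floor (1, 1, 1).
Sigma = np.diag([10.0, 1.0, 1.0, 1.0])
phi_0 = fsd_statistic(Sigma, d_hat=0, M=100)   # signal not yet removed
phi_1 = fsd_statistic(Sigma, d_hat=1, M=100)   # signal removed, flat remainder
print(phi_0, phi_1)
```

The ratio inside the logarithm compares the root mean square of the remaining eigenvalues with their mean, so the statistic is close to zero when the remainder is flat (pure white noise) and grows when signal eigenvalues are still present.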

Finally, it has been proven that, for M samples, the following inequality holds (Xu and Kailath 1994)

$$ \varphi_{\hat d}\leq \gamma_{\hat d}\ c(M) $$

where \(\gamma _{\hat d}\) is a threshold computed a priori as the end tail of the \(\chi^{2}\) distribution according to the required amount of explained variance. Since in the present work the minimum explained variance of the \(\hat d\) eigenvalues estimated by FSD is set to 90%, we calculated the value \(\gamma _{\hat d}\) such that it accounts for 0.1 of the area of the corresponding \(\chi^{2}\) distribution. Also, the function c(M) must comply with the following:

$$ \lim\limits_{M\rightarrow\infty}\frac{c(M)}{M}=0\quad \text{and} \quad \lim\limits_{M\rightarrow\infty}\frac{c(M)}{\log \log M}=\infty $$

In practice, its asymptotic behaviour must be “slower” than linear but “faster” than \(\log \log \); functions such as \(c(M)=\log (M)\) or \(c(M)=\sqrt {\log (M)}\) can be used.
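The two limits can be checked numerically for the candidate functions mentioned above. The sketch below only tabulates trends over an assumed set of sample sizes, so it illustrates the conditions rather than proving them; the helper `growth_ratios` and the chosen values of M are hypothetical.

```python
import math

def growth_ratios(c, M):
    """Return (c(M) / M, c(M) / log(log(M))) for a candidate function c."""
    return c(M) / M, c(M) / math.log(math.log(M))

candidates = {
    "log(M)": math.log,
    "sqrt(log(M))": lambda M: math.sqrt(math.log(M)),
}

# Sample sizes chosen large enough for the asymptotic trend to be visible.
sample_sizes = [1e4, 1e8, 1e16, 1e32]
table = {
    name: [growth_ratios(c, M) for M in sample_sizes]
    for name, c in candidates.items()
}

for name, rows in table.items():
    for M, (over_M, over_loglog) in zip(sample_sizes, rows):
        print(f"{name:>13}  M={M:.0e}  c/M={over_M:.2e}  c/loglog={over_loglog:.3f}")
```

For both candidates the first ratio shrinks and the second grows over this range, although the growth for \(\sqrt{\log (M)}\) is very slow.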

Appendix C: Schematic of the method

In this work, we use the SODCC algorithm as the core of a climate data analysis procedure. In order to facilitate both the understanding of our results and conclusions and reproducibility of this analysis, in this appendix, we include a general scheme of the complete analysis performed in this work (Fig. 14). We also include detailed schematic representations of the performed simulations in Figs. 15 and 16 and of the analysis carried out in Fig. 17.

Fig. 14

Schematic representation of the simulations and result analysis performed in this work. Each inset is zoomed in Figs. 15, 16, and 17, respectively

Fig. 15

Schematic representation of the cluster initialization stage of the SODCC algorithm

Fig. 16

Schematic representation of the cluster growing stage of the SODCC algorithm

Fig. 17

Schematic representation of the result analysis performed in this work


Chidean, M.I., Caamaño, A.J., Casanova-Mateo, C. et al. Spatio-temporal climate regionalization using a self-organized clustering approach. Theor Appl Climatol (2020). https://doi.org/10.1007/s00704-019-03082-6
