## Abstract

The authors present a novel self-organized climate regionalization (CR) method that obtains a spatial clustering of regions based on the explained variance of the physical measurements within their coverage. Using the statistics of the obtained clusters, this method enables a microscopic characterization of the probabilistic spatial extent of climate regions. It also allows the macroscopic behaviour of climate regions to be studied through time by means of the dissimilarity among cluster size probability histograms. The main advantages of the presented method, which is based on the Second-Order Data-Coupled Clustering (SODCC) algorithm, are that it is robust to the selection of tunable parameters and that it does not require a regular or homogeneous grid. Moreover, the SODCC method has higher spatial resolution and lower computational complexity, and allows for a more direct physical interpretation of the outputs, than other existing CR methods such as Empirical Orthogonal Function (EOF) or Rotated Empirical Orthogonal Function (REOF) analysis. These facts are illustrated with an example of winter wind speed regionalization in the Iberian Peninsula over the period 1979–2014. This study also reveals that the North Atlantic Oscillation (NAO) has a strong influence on the wind distribution in the Iberian Peninsula in a subset of years of the considered period.


## Notes

1. A vector is a matrix consisting of a single column of elements, and the symbol \((\cdot )^{\top }\) indicates the transpose operation.
2. The algorithm source code can be found at https://github.com/MihChi/SODCC.

## References

Argüeso D, Hidalgo-Muñoz JM, Gámiz-Fortis SR, Esteban-Parra MJ, Dudhia J, Castro-Díez Y (2011) Evaluation of WRF parameterizations for climate studies over Southern Spain using a multistep regionalization. J Climate 24(21):5633–5651

Azorin-Molina C, Vicente-Serrano SM, McVicar TR, Jerez S, Sanchez-Lorenzo A, López-Moreno JI, Revuelto J, Trigo R, Lopez-Bustins JA, Espírito-Santo F (2014) Homogenization and assessment of observed near-surface wind speed trends over Spain and Portugal, 1961–2011. J Climate 27(10):3692–3712

Badr HS, Zaitchik BF, Dezfuli AK (2014) Climate regionalization through hierarchical clustering: options and recommendations for Africa. AGU Fall Meeting Abstracts

Badr HS, Zaitchik BF, Dezfuli AK (2015) A tool for hierarchical climate regionalization. Earth Sci Inform 8(4):949–958

Baeriswyl PA, Rebetez M (1997) Regionalization of precipitation in Switzerland by means of principal component analysis. Theor Appl Climatol 58(1–2):31–41

Baik J, Silverstein JW (2006) Eigenvalues of large sample covariance matrices of spiked population models. J Multivar Anal 97(6):1382–1408

Biehl M, Mietzner A (1994) Statistical mechanics of unsupervised structure recognition. J Phys A Math Gen 27(6):1885

Bierstedt SE, Hünicke B, Zorita E, Wagner S, Gómez-Navarro JJ (2014) Variability of daily winter wind speed distribution over Northern Europe during the past millennium in regional and global climate simulations. Clim Past 2(12):317–338

Björnsson H, Venegas S (1997) A manual for EOF and SVD analyses of climatic data. CCGCR Report 97(1):112–134

Branick ML (1997) A climatology of significant winter-type weather events in the contiguous United States, 1982–94. Weather Forecast 12(2):193–207

Burn DH (1989) Cluster analysis as applied to regional flood frequency. J Water Resour Plan Manag 115(5):567–582

Burningham H, French J (2013) Is the NAO winter index a reliable proxy for wind climate and storminess in Northwest Europe? Int J Climatol 33(8):2036–2049

Carvalho M, Melo-Gonçalves P, Teixeira J, Rocha A (2016) Regionalization of Europe based on a K-Means cluster analysis of the climate change of temperatures and precipitation. Phys Chem Earth Parts A/B/C 94:22–28

Cassou C, Terray L, Hurrell JW, Deser C (2004) North Atlantic winter climate regimes: spatial asymmetry, stationarity with time, and oceanic forcing. J Climate 17(5):1055–1068

Chidean MI, Morgado E, del Arco E, Ramiro-Bargueño J, Caamaño AJ (2015a) Scalable data-coupled clustering for large scale WSN. IEEE Trans Wireless Commun 15:4681–4694

Chidean MI, Muñoz-Bulnes J, Ramiro-Bargueño J, Caamaño AJ, Salcedo-Sanz S (2015b) Spatio-temporal trend analysis of air temperature in Europe and Western Asia using data-coupled clustering. Global Planet Change 129:45–55

Chidean MI, Caamaño AJ, Ramiro-Bargueño J, Casanova-Mateo C, Salcedo-Sanz S (2018) Spatio-temporal analysis of wind resource in the Iberian Peninsula with data-coupled clustering. Renew Sustain Energy Rev 81:2684–2694

Comrie AC, Glenn EC (1998) Principal components-based regionalization of precipitation regimes across the southwest United States and northern Mexico, with an application to monsoon precipitation variability. Climate Res 10(3):201–215

Dee DP, Uppala SM, et al (2011) The ERA–interim reanalysis: configuration and performance of the data assimilation system. Q J Roy Meteorol Soc 137(656):553–597

Doi T, Fujita I (2014) Cross-matching: a modified cross-correlation underlying threshold energy model and match-based depth perception. Front Comput Neurosci 8(127):1–15

Dommenget D, Latif M (2002) A cautionary note on the interpretation of EOFs. J Climate 15(2):216–225

Gimeno L, Ribera P, Iglesias R, de la Torre L, García R, Hernández E (2002) Identification of empirical relationships between indices of ENSO and NAO and agricultural yields in Spain. Clim Res 21(2):165–172

Irwin SE (2015) Assessment of the regionalization of precipitation in two Canadian climate regions: a fuzzy clustering approach. Ph.D. thesis, The University of Western Ontario

Jerez S, Trigo R, Vicente-Serrano SM, Pozo-Vázquez D, Lorente-Plazas R, Lorenzo-Lacruz J, Santos-Alamillos F, Montávez J (2013) The impact of the North Atlantic oscillation on renewable energy resources in southwestern Europe. J Appl Meteorol Climatol 52(10):2204–2225

Jolliffe I (2002) Principal component analysis, 2nd edn. Springer

Kaiser HF (1958) The varimax criterion for analytic rotation in factor analysis. Psychometrika 23(3):187–200

Kim KY, Hamlington B, Na H (2015) Theoretical foundation of cyclostationary EOF analysis for geophysical and climatic variables: concepts and examples. Earth Sci Rev 150:201–218

Knapp PA, Grissino-Mayer HD, Soulé PT (2002) Climatic regionalization and the spatio-temporal occurrence of extreme single-year drought events (1500–1998) in the interior Pacific Northwest, USA. Quatern Res 58(3):226–233

Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1):79–86

Lian T, Chen D (2012) An evaluation of rotated EOF analysis and its application to tropical Pacific SST variability. J Climate 25(15):5361–5373

Lorente-Plazas R, Montávez JP, Jimenez PA, Jerez S, Gómez-Navarro JJ, García-Valero JA, Jimenez-Guerrero P (2015) Characterization of surface winds over the Iberian Peninsula. Int J Climatol 35(6):1007–1026

Nadler B (2008) Finite sample approximation results for principal component analysis: a matrix perturbation approach. Ann Stat, 2791–2817

Nojarov P (2017) Genetic climatic regionalization of the Balkan Peninsula using cluster analysis. J Geogr Sci 27(1):43–61

Önol B, Semazzi FHM (2009) Regionalization of climate change simulations over the Eastern Mediterranean. J Climate 22(8):1944–1961

Prieto L, García R, Díaz J, Hernández E, Del Teso T (2002) NAO influence on extreme winter temperatures in Madrid (Spain). Annales Geophysicae 20(12):2077–2085

Qu B, Gabric AJ, Zhu J, Lin D, Qian F, Zhao M (2012) Correlation between sea surface temperature and wind speed in Greenland Sea and their relationships with NAO variability. Water Sci Eng 5(3):304–315

Regonda SK, Zaitchik BF, Badr HS, Rodell M (2016) Using climate regionalization to understand climate forecast system version 2 (CFSv2) precipitation performance for the Conterminous United States (CONUS). Geophys Res Lett 43(12):6485–6492

Richman M (1986) Rotation of principal components. J Climatol 6(3):293–335

Santos-Alamillos F, Thomaidis N, Quesada-Ruiz S, Ruiz-Arias J, Pozo-Vázquez D (2016) Do current wind farms in Spain take maximum advantage of spatiotemporal balancing of the wind resource? Renew Energy 96:574–582

Sarma AK, Hazarika J (2014) GCM based fuzzy clustering to identify homogeneous climatic regions of North-East India. World Academy of Science, Engineering and Technology, International Journal of Environmental, Chemical, Ecological, Geological and Geophysical Engineering 8(12):807–814

Shahriar F, Montazeri M, Momeni M, Freidooni A (2015) Regionalization of the climatic areas of Qazvin province using multivariate statistical methods. Mod Appl Sci 9(2):123

Trigo R, Osborn TJ, Corte-Real JM (2002) The North Atlantic oscillation influence on Europe: climate impacts and associated physical mechanisms. Climate Res 20(1):9–17

Trigo R, Pozo-Vázquez D, Osborn TJ, Castro-Díez Y, Gámiz-Fortis S, Esteban-Parra MJ (2004) North Atlantic Oscillation influence on precipitation, river flow and water resources in the Iberian Peninsula. Int J Climatol 24(8):925–944

Troccoli A, Muller K, Coppin P, Davy R, Russell C, Hirsch AL (2012) Long-term wind speed trends over Australia. J Climate 25(1):170–183

Unal Y, Kindap T, Karaca M (2003) Redefining the climate zones of Turkey using cluster analysis. Int J Climatol 23(9):1045–1055

Vicsek T, Family F (1984) Dynamic scaling for aggregation of clusters. Phys Rev Lett 52:1669–1672

von Storch H, Zwiers FW (1999) Statistical analysis in climate research. Cambridge University Press

Ward JH (1963) Hierarchical grouping to optimize an objective function. J Am Stat Assoc 58(301):236–244

White D, Richman M, Yarnal B (1991) Climate regionalization and rotation of principal components. Int J Climatol 11(1):1–25

Working Group on Surface Pressure (2016) Download Climate Timeseries. North Atlantic Oscillation (NAO). https://www.esrl.noaa.gov/psd/gcos_wgsp/Timeseries/NAO/. Accessed: 2016-01-22

Xu G, Kailath T (1994) Fast subspace decomposition. IEEE Trans Signal Process 42(3):539–551

Yu L, Zhong S, Bian X, Heilman WE (2015) Temporal and spatial variability of wind resources in the United States as derived from the climate forecast system reanalysis. J Climate 28(3):1166–1183

Zhang Y, Moges S, Block P (2016) Optimal cluster analysis for objective regionalization of seasonal precipitation in regions of high spatial–temporal variability: application to Western Ethiopia. J Climate 29(10):3697–3717

Zishka KM, Smith PJ (1980) The climatology of cyclones and anticyclones over North America and surrounding ocean environs for January and July, 1950-77. Mon Weather Rev, 387–401

## Acknowledgements

ERA-Interim data provided courtesy of ECMWF. The authors would like to thank the editing and reviewing effort of the anonymous peers and the editor.

## Funding

This work has been partially supported by the Autonomous Community of Madrid (Grant Ref. S2013/MAE-2835) and by the Ministerio de Economía y Competitividad of Spain (Grant Ref. TIN2017-85887-C2-2-P).

## Additional information

### Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Appendices

### Appendix A: Phase transition of correlation matrices with a finite number of samples

Using matrix perturbation analysis, it is possible to set a lower bound on the amount of samples to recover the largest eigenvalue of the corresponding correlation matrix (Nadler 2008).

Consider the data matrix \(\mathbf {X}\in {\mathbb R}^{N\times M}\), where *M* is the number of available data samples for each of the *N* measuring stations or grid sites to be analyzed. Its covariance matrix is defined as \({\Sigma } = \mathbf {M_{X}}\mathbf {M_{X}}^{\top }/M\), where \(\mathbf {M_{X}}\) is the mean-centered data matrix, i.e. the transformation of matrix **X** such that the mean of each row is zero.
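The centering and covariance computation described above can be sketched in plain Python. This is only an illustrative sketch: the function names `center_rows` and `covariance` and the toy data are assumptions made here, not part of the published SODCC code.

```python
def center_rows(X):
    """Subtract each row's mean so that every station's series has zero mean."""
    centered = []
    for row in X:
        mean = sum(row) / len(row)
        centered.append([x - mean for x in row])
    return centered


def covariance(X):
    """Return Sigma = M_X M_X^T / M for an N x M data matrix X (rows = stations)."""
    MX = center_rows(X)
    M = len(X[0])  # number of samples per station
    return [[sum(a * b for a, b in zip(ri, rj)) / M for rj in MX] for ri in MX]


# Hypothetical toy data: N = 2 stations, M = 4 samples each.
X = [[1.0, 2.0, 3.0, 4.0],
     [2.0, 4.0, 6.0, 8.0]]
Sigma = covariance(X)  # N x N symmetric matrix
```

Note that the normalization is by *M*, as in the definition in the text, rather than by *M* − 1.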

Let the noise power of the measurement be defined as *σ*^{2} and the squared modulus of the first EOF of Σ as ||**v**||^{2}. The noise of the first EOF is simply a white-noise stochastic physical process that perturbs all EOFs equally in time and space. Thus, the signal-to-noise ratio (\(\mathsf {SNR}_{v}\)) needed to resolve the first EOF (the one with the highest explained variance) can be defined as

\[\mathsf {SNR}_{v} = \frac{\|\mathbf {v}\|^{2}}{\sigma ^{2}}.\]

It has been shown that the stochastic self-adjoint matrix Σ undergoes a *phase transition* for *M* samples obtained from *N* measuring stations (with *N*/*M* constant as *N* and *M* go to \(\infty \)) when \(M/N \geq \mathsf {SNR}_{v}^{-2}\) (Nadler 2008, Eq. 2.19). The phase transition can be detected in the largest eigenvalue, which emerges from the set of degenerate noise eigenvalues to become a signal eigenvalue. This phase transition has also been identified in the fields of unsupervised learning (via the replica method) (Biehl and Mietzner 1994) and statistics (via the Stieltjes transform) (Baik and Silverstein 2006).

This, in fact, provides a threshold on the minimum number of samples *M* needed to detect the eigenvalue with a minimum \(\mathsf {SNR}_{v}\):

\[M \geq \frac{N}{\mathsf {SNR}_{v}^{2}}.\]

Throughout the present work, the working threshold used to separate *signal eigenvalues* from *noise eigenvalues* establishes that the minimum number of samples needed to obtain at least one signal eigenvalue is

This assumption is valid as long as the only intervening noise in the mixture of the EOF is additive, white, and Gaussian.

Thus, the phase transition sets a limit on the minimum number of samples *M* for the covariance matrix \({\Sigma } = \mathbf {M_{X}}\mathbf {M_{X}}^{\top }/M\) to be well conditioned, i.e. non-singular. By using a consistent threshold for the extraction of the EOFs, regardless of the size of the originating covariance matrix (i.e. from small to large clusters or the complete network), we ensure the coherence and comparability of the results at any spatial scale.
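The sample-size bound implied by the phase-transition condition \(M/N \geq \mathsf {SNR}_{v}^{-2}\) can be sketched as follows. The function names are hypothetical, the inputs are illustrative values rather than figures from this study, and the sketch assumes the additive white Gaussian noise model stated above.

```python
import math


def min_samples(n_stations, snr):
    """Smallest integer M satisfying M / N >= SNR^-2."""
    if n_stations <= 0 or snr <= 0:
        raise ValueError("n_stations and snr must be positive")
    return math.ceil(n_stations / snr ** 2)


def has_signal_eigenvalue(n_stations, n_samples, snr):
    """True when the condition M / N >= SNR^-2 holds for the given sizes."""
    return n_samples / n_stations >= snr ** -2


# Hypothetical example: 100 stations observed at an SNR of 0.5.
needed = min_samples(100, 0.5)
```

A lower SNR raises the required number of samples quadratically, which is why the same threshold can be applied to small clusters and to the complete network alike.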

### Appendix B: Fast subspace decomposition

In this work, we use the Fast Subspace Decomposition (FSD) algorithm to estimate the number of eigenvalues \(\hat d\) that account for a given amount of explained variance (Xu and Kailath 1994). The FSD algorithm iteratively estimates the first \(\hat d\) Rayleigh-Ritz (RR) eigenvalues and eigenvectors (spanning the signal subspace), stopping at the iteration in which \(\hat d=d\), where *d* is the value to be estimated. FSD is based on the Lanczos method and has *O*(*N*^{2}*d*) computational complexity, much lower than the *O*(*N*^{3}) of the traditional eigendecomposition.
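To illustrate the Lanczos and Rayleigh-Ritz idea on which FSD builds, the sketch below runs a few Lanczos steps on a small symmetric matrix and extracts the RR (Ritz) values of the resulting tridiagonal matrix by bisection. It is not the FSD algorithm itself: it has no test statistic or stopping rule, and the uniform start vector, the function names and the toy matrix are all assumptions of this example.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def lanczos(A, k):
    """Run up to k Lanczos steps on a symmetric matrix A (list of rows).

    Returns the diagonal (alphas) and off-diagonal (betas) of the
    tridiagonal matrix T whose eigenvalues are the Rayleigh-Ritz values.
    """
    n = len(A)
    q = [1.0 / math.sqrt(n)] * n            # assumed uniform start vector
    basis, alphas, betas = [], [], []
    for _ in range(k):
        w = [dot(row, q) for row in A]      # one matrix-vector product, O(N^2)
        alphas.append(dot(w, q))
        basis.append(q)
        for b in basis:                     # full reorthogonalization
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        beta = math.sqrt(dot(w, w))
        if len(alphas) == k or beta < 1e-12:
            break
        betas.append(beta)
        q = [wi / beta for wi in w]
    return alphas, betas


def count_below(alphas, betas, x):
    """Sturm count: number of eigenvalues of T that are smaller than x."""
    count, d = 0, 1.0
    for i, a in enumerate(alphas):
        d = a - x - (betas[i - 1] ** 2 / d if i > 0 else 0.0)
        if d == 0.0:
            d = -1e-300                     # avoid a division by zero
        if d < 0.0:
            count += 1
    return count


def ritz_values(alphas, betas, tol=1e-10):
    """Eigenvalues of T, largest first, by bisection on the Sturm count."""
    m = len(alphas)
    off = [0.0] + [abs(b) for b in betas] + [0.0]
    # Gershgorin bounds (padded) enclose every eigenvalue of T.
    lo0 = min(a - off[i] - off[i + 1] for i, a in enumerate(alphas)) - 1.0
    hi0 = max(a + off[i] + off[i + 1] for i, a in enumerate(alphas)) + 1.0
    values = []
    for target in range(m, 0, -1):          # target-th smallest eigenvalue
        lo, hi = lo0, hi0
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if count_below(alphas, betas, mid) >= target:
                hi = mid
            else:
                lo = mid
        values.append(0.5 * (lo + hi))
    return values


# Hypothetical toy covariance matrix with known eigenvalues 5, 2 and 1.
A = [[5.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
alphas, betas = lanczos(A, 3)
ritz = ritz_values(alphas, betas)
```

Each Lanczos step costs one matrix-vector product, so *d* steps cost *O*(*N*^{2}*d*), in line with the complexity quoted above. With fewer steps than the matrix size, the largest Ritz value is an approximation from below of the largest eigenvalue.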

For a data matrix \(\mathbf {X}\in {\mathbb R}^{N\times M}\) with \(\mathbf {M_{X}}\) the centered data matrix and \({\Sigma } = \mathbf {M_{X}}\mathbf {M_{X}}^{\top }/M\) the covariance matrix, the FSD statistic \(\varphi _{\hat d}\) is defined as (Xu and Kailath 1994):

where ||⋅|| is the Frobenius norm, *θ*_{n} are the RR eigenvalues, and \(\mathrm {Tr}\,{\Sigma }\) is the trace of matrix Σ. The \(\varphi _{\hat d}\) statistic is computed in each iteration and, for \(\hat d\geq d+1\), it approaches a *χ*^{2} distribution with \((1/2)(N-\hat d)(N-\hat d+1)-1\) degrees of freedom.

Finally, it has been proven that, for *M* samples, the following equation is valid (Xu and Kailath 1994)

where \(\gamma _{\hat d}\) is a threshold computed *a priori* from the end tail of the *χ*^{2} distribution according to the required amount of explained variance. Since in the present work the \(\hat d\) eigenvalues estimated by FSD must explain at least 90% of the variance, we calculated the value \(\gamma _{\hat d}\) such that it accounts for 0.1 of the area of the corresponding *χ*^{2} distribution. In addition, the function *c*(*M*) must satisfy the following:

\[\lim _{M\to \infty } \frac{c(M)}{M} = 0, \qquad \lim _{M\to \infty } \frac{c(M)}{\log \log M} = \infty .\]

In practice, its asymptotic behaviour must be “slower” than linear but “faster” than \(\log \log \); functions such as \(c(M)=\log (M)\) or \(c(M)=\sqrt {\log (M)}\) can be used.
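As a rough illustration of these quantities, the following sketch computes the degrees of freedom given in the text and approximates a *χ*^{2} threshold that leaves 0.1 of the area in the tail. Several points are assumptions of this example and not taken from the authors' code: the function names are hypothetical, the "end tail" is read as the upper tail, and the Wilson-Hilferty cube-root approximation stands in for an exact quantile routine.

```python
import math
from statistics import NormalDist


def fsd_degrees_of_freedom(n_stations, d_hat):
    """Degrees of freedom (1/2)(N - d)(N - d + 1) - 1 quoted in the text."""
    r = n_stations - d_hat
    return r * (r + 1) // 2 - 1


def chi2_upper_tail_threshold(dof, tail_area=0.1):
    """Approximate x such that P(chi2 with `dof` d.o.f. > x) = tail_area.

    Uses the Wilson-Hilferty approximation, which is an assumption of
    this sketch and loses accuracy for very small `dof`.
    """
    z = NormalDist().inv_cdf(1.0 - tail_area)
    h = 2.0 / (9.0 * dof)
    return dof * (1.0 - h + z * math.sqrt(h)) ** 3


def c_log(M):
    """One admissible choice mentioned in the text: c(M) = log(M)."""
    return math.log(M)


def c_sqrt_log(M):
    """Another admissible choice: c(M) = sqrt(log(M))."""
    return math.sqrt(math.log(M))


# Hypothetical sizes: N = 10 stations, candidate rank d_hat = 3.
dof = fsd_degrees_of_freedom(10, 3)
gamma = chi2_upper_tail_threshold(dof, tail_area=0.1)
```

Both choices of *c*(*M*) grow without bound but more slowly than *M*, which is the behaviour required of the function in the text.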

### Appendix C: Schematic of the method

In this work, we use the SODCC algorithm as the core of a climate data analysis procedure. In order to facilitate both the understanding of our results and conclusions and reproducibility of this analysis, in this appendix, we include a general scheme of the complete analysis performed in this work (Fig. 14). We also include detailed schematic representations of the performed simulations in Figs. 15 and 16 and of the analysis carried out in Fig. 17.

## About this article

### Cite this article

Chidean, M.I., Caamaño, A.J., Casanova-Mateo, C. *et al.* Spatio-temporal climate regionalization using a self-organized clustering approach. *Theor Appl Climatol* (2020). https://doi.org/10.1007/s00704-019-03082-6
