1 Introduction

Climate change is an increasingly visible global problem with significant impacts on society as a whole and on ecosystems. As a result, research on climate change, especially studies relating to temperature, has become increasingly important and continues to expand. As stated in the Sixth Assessment Report of the Intergovernmental Panel on Climate Change (IPCC), the global mean surface temperature (GMST) in 2011–2020 was 1.1 °C higher than in 1850–1900, with the rate of warming accelerating after the 1970s (IPCC, 2021). Moreover, in the very near future (2025–2050), the GMST may warm by as much as 0.25 °C per decade, according to the climate model predictions of Samset et al. (2020) and Tebaldi et al. (2021).

Although these numbers may seem small, their effects are remarkable: global warming manifests as prolonged droughts, heat waves and forest fires. For instance, over the last fifteen years, Europe has experienced extreme heat waves (2003, 2010, 2015, 2018; Kuglitsch et al. 2010; Russo et al. 2015; Molina et al. 2020). According to Calheiros et al. (2021), weather and climate conditions such as high temperatures, moderate annual precipitation and prolonged dry spells cause a large number of forest fires and burnt areas around the globe, particularly in areas of the Iberian Peninsula (IP). Climate projections indicate that these conditions will persist for the foreseeable future, with some projections predicting more intense, prolonged and frequent extreme heat events in Europe during the twenty-first century, with a greater impact on the IP and Mediterranean regions (IPCC 2014; King and Karoly 2017; Dosio et al. 2018; Vicedo-Cabrera et al. 2018).

Analysing the temperature changes experienced by the IP, and the projections concerning them, becomes more manageable when it draws on studies of how these temperature variations have occurred in the IP. This is particularly true if the changes are defined by zones or sub-regions, since temperature in the IP shows spatial variations strongly influenced by distance from the sea and by complex orography, producing a marked climate gradient from north to south (Lorenzo and Alvarez 2022). Moreover, analyses that take temperature extremes into account are especially valuable, since most studies in the climate field have focused on mean climate trends, which reveal little about unusual changes (Gebremichael et al. 2022).

The extreme temperature changes experienced in a geographical area or sub-region can be understood by clustering the time series (TS) of these temperatures at points or areas distributed over the region. Since TS clustering identifies interesting patterns in TS data sets, finding TS clusters is valuable in different domains, for example in responding to anomalies or novelties, detecting discordance, recognising dynamic change in TS, and predicting, recommending and discovering patterns (Aghabozorgi et al. 2015). Different studies on TS clustering (Warren Liao 2005; Rani and Sikka 2012; Aghabozorgi et al. 2015; Ergüner Özkoç 2021) agree that there are three main approaches, depending on whether one works directly with the raw data, indirectly with features or characteristics extracted from the raw data, or indirectly with models built from the raw data. Studies in several domains have used TS clustering to define patterns or find matching TS (Keogh and Smyth 1997; Huhtala et al. 1999; Wang and Wang 2000; Möller-Levet et al. 2003; Huiting et al. 2006; Guo et al. 2008; Lee et al. 2010, 2018; Li and Wei 2020; Shi et al. 2021).

This study is framed within TS clustering based on the feature extraction approach, and it proposes a procedure to cluster TS by trend, seasonality and main autocorrelations so that patterns of change in maximum temperature (TMAX) can be identified for each zone of the IP during the period 1931–2009. The novelty of this methodology lies in the decomposition of the TS using singular spectrum analysis (SSA), a well-developed TS analysis and forecasting methodology whose applications range from nonparametric TS decomposition and filtering to parameter estimation and forecasting (Golyandina and Korobeynikov 2014). First, in this decomposition process, three components associated with the trend, seasonality and residual of the initial TS are reconstructed, allowing the parameters that describe these components to be extracted. Second, a representation of each TS is obtained from a feature vector built from the calculated parameters, which allows the TS to be clustered using unsupervised learning algorithms such as k-means (Hartigan and Wong 1979), k-medoids (Park and Jun 2009), hierarchical agglomerative clustering (HA; Lukasová 1979) and Kohonen self-organising maps (SOM; Kohonen and Oja 1996), well-known and representative conventional algorithms that use the Euclidean distance. Finally, in the experiment on a climatic database, after comparing the clusters obtained with the different methods, a hybrid approach combining HA and k-means, called hkmeans, was selected as the clustering algorithm to identify TS that are similar and follow a pattern. A set of 1776 points from a 25 × 25 km² grid elaborated through kriging spatial interpolation by the “Servicio de Desarrollos Climatológicos” of the Spanish State Meteorological Agency was used.
This grid includes points distributed throughout Spain, Portugal and the closest areas of the Atlantic Ocean and the Mediterranean Sea; for each point, a monthly TS of TMAX from January 1931 to December 2009 is considered. The way in which the TS are represented here optimises the clustering process, since one of the main advantages of SSA decomposition is its ability to remove the noise of a series (one of its main applications), together with the study of its spectral profile. In addition, the reduced dimensionality lowers computational cost and increases speed. Therefore, SSA is used not only to decompose a series into components, determine its spectral profile or estimate its parameters, but also as a basis for recognising characteristics and patterns in ensembles of TS. The clustering results of this study identify three differentiated zones: zone 1 is situated in the north of the IP, in areas with the lowest TMAX and a larger relative increase than the other areas; zone 2 is located further south, in areas with the highest TMAX and a slight decline over an extended period; and zone 3 lies towards the centre, in areas with intermediate TMAX that show an increase over time. In addition, the identified zones show different seasonal variations.

The remainder of this document is organised as follows. Section 2 describes the TS decomposition method, which uses SSA in a sequential manner. Section 3 proposes the new method for defining the trend, seasonality and autoregression patterns of TS, including extracting TS trend, seasonality and residual components and clustering and defining patterns. Section 4 presents the results of the method. Section 5 presents the main conclusions of the paper.

2 Sequential SSA decomposition method

In this section, the theory of the sequential SSA decomposition method for extracting TS components is briefly presented.

SSA is a technique known for its application in TS analysis and prediction, and it has recently been used to analyse digital images and other objects that are not necessarily flat or rectangular and may contain gaps. SSA is a unique and attractive methodology for solving a wide range of problems in different areas related to TS and digital images, as it naturally combines parametric and nonparametric techniques (Golyandina et al. 2018).

This technique is based on the singular value decomposition (SVD) of a specific matrix obtained from a TS and aims to decompose original TS into a small number of interpretable components, such as a trend that is smooth and slowly varying, with oscillatory components that are periodic, pure quasiperiodic or amplitude-modulated, and noise without any pattern or structure (Golyandina et al. 2001; Zhigljavsky 2011; Xiao et al. 2014; Golyandina and Korobeynikov 2014).

As stated in Golyandina et al. (2001), SSA consists of four steps: embedding, SVD, grouping and diagonal averaging. In some references, steps 1 and 2 of the generic SSA scheme are combined in the decomposition stage, whereas steps 3 and 4 are combined in the reconstruction stage. Several additive components of the original TS are obtained through SSA. In the following, the SSA method is presented formally.

Input: \({\mathbb{T}}=\left({t}_{1},{t}_{2},\ldots,{t}_{N}\right),\) the initial TS, a one-dimensional series of length \(N\).

Result: A decomposition of \({\mathbb{T}}\) into a sum of identifiable components, \({\mathbb{T}}={\widetilde{\mathbb{T}}}_{1}+{\widetilde{\mathbb{T}}}_{2}+\cdots+{\widetilde{\mathbb{T}}}_{m}\).

2.1 Step 1: Embedding

The so-called trajectory matrix is obtained through the equation \(X=\mathcal{T}({\mathbb{T}})\), where \(\mathcal{T}\) is a linear map that transforms the TS \({\mathbb{T}}\) into a matrix of order \(L\times K\), and where \(L\) is an integer that is called the window length, \(1<L<N\), and \(K=N-L+1\).

The trajectory matrices belong to the set of Hankel matrices, \({\mathcal{M}}_{L,K}^{(\mathcal{H})}\), in which all elements along each anti-diagonal are equal. If \(N\) and \(L\) are fixed, then there is a one-to-one correspondence between trajectory matrices and TS.

The trajectory matrix \({\varvec{X}}\) constructed from lagged vectors, which are generated from the TS \({\mathbb{T}}\), can be represented in the following way:

$$\mathcal{T}\left({\mathbb{T}}\right)=\begin{pmatrix}{t}_{1}&{t}_{2}&{t}_{3}&\cdots &{t}_{K}\\{t}_{2}&{t}_{3}&{t}_{4}&\cdots &{t}_{K+1}\\{t}_{3}&{t}_{4}&{t}_{5}&\cdots &{t}_{K+2}\\ \vdots &\vdots &\vdots &\ddots &\vdots \\{t}_{L}&{t}_{L+1}&{t}_{L+2}&\cdots &{t}_{N}\end{pmatrix}.$$
(1)
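As an illustration, the embedding step can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation; the function name `trajectory_matrix` is ours:

```python
import numpy as np

def trajectory_matrix(ts, L):
    """Embed a 1-D series of length N into its L x K trajectory (Hankel) matrix, K = N - L + 1."""
    ts = np.asarray(ts, dtype=float)
    K = ts.size - L + 1
    # Column j holds the lagged vector (t_{j+1}, ..., t_{j+L}); anti-diagonals are constant.
    return np.column_stack([ts[j:j + L] for j in range(K)])
```

Because the anti-diagonals are constant, the matrix is Hankel, and the series can be read back off its first row and last column.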

2.2 Step 2: Decomposition of \({\varvec{X}}\) into a sum of rank-1 matrices

The result obtained in this step is the following decomposition:

$${\varvec{X}}=\sum_{i}{{\varvec{X}}}_{i},\quad {{\varvec{X}}}_{i}={\sigma }_{i}{U}_{i}{V}_{i}^{T},$$
(2)

where \({U}_{i}\in {R}^{L}\) and \({V}_{i}\in {R}^{K}\) are vectors such that \(\Vert {U}_{i}\Vert =1\) and \(\Vert {V}_{i}\Vert =1\) for all \(i\), and the \({\sigma }_{i}\) are nonnegative numbers.

If such a decomposition is performed using conventional SVD, the corresponding SSA method is “Basic SSA.” The SVD of the matrix \({\varvec{X}}\) is calculated via the eigenvalues and eigenvectors of the matrix \(S={\varvec{X}}{{\varvec{X}}}^{{\varvec{T}}}\) of size \(L\times L\). Here, \({\lambda }_{1},\ldots,{\lambda }_{d}\) denote the eigenvalues of the matrix \(S\) listed in decreasing order of magnitude \(({\lambda }_{1}\ge \cdots\ge {\lambda }_{d}\ge 0)\), whereas \({U}_{1},\ldots,{U}_{d}\) denote the corresponding orthonormal system of eigenvectors of \(S\), considering that \(d=L\) in the generic full-rank case. Since \({V}_{i}={{\varvec{X}}}^{{\varvec{T}}}{U}_{i}/\sqrt{{\lambda }_{i}}\ (i=1,\ldots,d)\) are the factor vectors, \({{\varvec{X}}}_{i}=\sqrt{{\lambda }_{i}}{U}_{i}{V}_{i}^{T}\) are elementary matrices of rank 1. Thus, the SVD of the trajectory matrix can be written as follows:

$${\varvec{X}}={{\varvec{X}}}_{1}+\cdots+{{\varvec{X}}}_{d}.$$
(3)

The collection \((\sqrt{{\lambda }_{i}},{U}_{i},{V}_{i})\) is called the eigentriple of order \(i\) and consists of the singular value \({\sigma }_{i}=\sqrt{{\lambda }_{i}}\), an eigenvector \({U}_{i}\) (the left singular vector) and a factor vector \({V}_{i}\) (the right singular vector).
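Step 2 reduces to a standard SVD call. The sketch below (our naming, not the authors' code) returns the singular values and the rank-1 elementary matrices of Eq. (3):

```python
import numpy as np

def ssa_svd(X):
    """Decompose the trajectory matrix X into rank-1 elementary matrices (Eq. 3)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Each X_i = sigma_i * U_i * V_i^T; (sigma_i, U_i, V_i) is the i-th eigentriple.
    elementary = [s[i] * np.outer(U[:, i], Vt[i]) for i in range(s.size)]
    return s, elementary
```

Summing the elementary matrices recovers \({\varvec{X}}\) exactly, which is Eq. (3).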

2.3 Step 3: Grouping

The input of this step consists of expansion (2) and a specification of how to group its components. The index set \(\{1,2,\ldots,d\}\) is partitioned into \(m\) disjoint subsets \({I}_{1},{I}_{2},\dots ,{I}_{m}\), where each \(I=\left\{{i}_{1},{i}_{2},\dots ,{i}_{p}\right\}\subset \left\{1,2,\dots ,d\right\}\) is a subset of indices. The resulting matrix \({{\varvec{X}}}_{I}\) corresponding to group \(I\) is defined as follows:

$${{\varvec{X}}}_{I}={{\varvec{X}}}_{{i}_{1}}+{{\varvec{X}}}_{{i}_{2}}+\cdots+{{\varvec{X}}}_{{i}_{p}}.$$
(4)

Thus, if a partition is specified in \(m\) disjoint subsets of the index set \(\left\{\mathrm{1,2},\dots ,d\right\}\), then, by expansion (2), the result of the grouping step leads to the following decomposition:

$${\varvec{X}}={{\varvec{X}}}_{{I}_{1}}+{{\varvec{X}}}_{{I}_{2}}+\cdots+{{\varvec{X}}}_{{I}_{m}}.$$
(5)

The above procedure for choosing the sets \({I}_{1},{I}_{2},\dots ,{I}_{m}\) is called the eigentriple grouping procedure. The grouping of expansion (2), where \({I}_{j}=\left\{j\right\}\), is called elementary.

2.4 Step 4: Reconstruction

In this step, each matrix \({{\varvec{X}}}_{{I}_{k}}\) from the grouped decomposition (5) is transformed back into the form of the input object \({\mathbb{T}}\), a TS of length \(N\). To do this, each matrix \({{\varvec{X}}}_{{I}_{k}}\) is hankelised and, by means of the one-to-one correspondence between Hankel matrices and TS, converted into a new series of length \(N\). Thus, applying diagonal averaging to \({{\varvec{X}}}_{{I}_{k}}\) produces a reconstructed series \({\widetilde{\mathbb{T}}}_{k}\) of length \(N\) (for more details, see Sect. 1.1.2.6 of Golyandina et al. 2018).

Consequently, the resulting decomposition of the input object \({\mathbb{T}}\) is the following:

$${\mathbb{T}}={\widetilde{\mathbb{T}}}_{1}+{\widetilde{\mathbb{T}}}_{2}+\cdots+{\widetilde{\mathbb{T}}}_{m}.$$
(6)

If the grouping is elementary, the reconstructed objects \({\widetilde{\mathbb{T}}}_{k}\) in (6) are called elementary components.
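Diagonal averaging admits a direct implementation (a sketch with our naming). Applied to a matrix that is already Hankel, it recovers the corresponding series exactly:

```python
import numpy as np

def diagonal_averaging(Xk):
    """Average the anti-diagonals of an L x K matrix into a series of length N = L + K - 1."""
    L, K = Xk.shape
    out = np.zeros(L + K - 1)
    cnt = np.zeros(L + K - 1)
    for i in range(L):
        out[i:i + K] += Xk[i]   # row i contributes to anti-diagonals i .. i + K - 1
        cnt[i:i + K] += 1
    return out / cnt            # divide each anti-diagonal sum by its element count
```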

The SSA parameters, i.e., the window length \(L\) and the way in which the matrices \({{\varvec{X}}}_{{I}_{k}}\) are grouped, strongly influence the outcome of the decomposition and depend on the properties of the initial TS and the objective of the analysis. One notion that helps in choosing these parameters is separability. The separability of two TS, \({\mathbb{T}}_{N}^{(1)}\) and \({\mathbb{T}}_{N}^{(2)}\), means that \({\mathbb{T}}_{N}^{(1)}\) can be extracted from the observed sum \({\mathbb{T}}_{N}^{(1)}+{\mathbb{T}}_{N}^{(2)}\). According to Golyandina et al. (2001), SSA can approximately separate signals and noise, sinusoidal waves with different frequencies, trends and seasonality, etc.

Recommendations for the choice of window length can be derived from the (approximate) separability conditions. For example, \(L\) should be large enough (\(L\approx N/2\)), and if an existing periodic component with one or several known periods must be extracted, it is advisable to choose a window length divisible by the largest period. The choice of SSA parameters is discussed in Golyandina (2010).

SSA can be performed sequentially, which is recommended when the TS structure is complex (Golyandina et al. 2012). Sequential SSA consists of two stages: in the first stage, the extraction of the TS trend with a small \(L\) is performed, and in the second stage, the periodic components of the residue are detected and extracted with \(L\approx N/2\).
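Putting the four steps together, a two-stage sequential SSA can be sketched as follows. This is a simplified illustration, not the authors' implementation: the eigentriple grouping (ET1 for the trend, ET1–4 of the residual for seasonality) is an assumption that in practice must be verified as described in the text.

```python
import numpy as np

def ssa_reconstruct(ts, L, idx):
    """Reconstruct the component of `ts` spanned by the eigentriples in `idx` (Basic SSA)."""
    ts = np.asarray(ts, dtype=float)
    N = ts.size
    K = N - L + 1
    X = np.column_stack([ts[j:j + L] for j in range(K)])        # embedding
    U, s, Vt = np.linalg.svd(X, full_matrices=False)            # decomposition
    Xg = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in idx)      # grouping
    out = np.zeros(N); cnt = np.zeros(N)                        # diagonal averaging
    for i in range(L):
        out[i:i + K] += Xg[i]
        cnt[i:i + K] += 1
    return out / cnt

def sequential_ssa(ts, period=12):
    """Stage 1: trend with a small L; stage 2: seasonality from the residual with L ~ N/2."""
    ts = np.asarray(ts, dtype=float)
    trend = ssa_reconstruct(ts, L=period, idx=[0])              # ET1 assumed to carry the trend
    resid1 = ts - trend
    L2 = (ts.size // 2 // period) * period                      # largest L <= N/2 divisible by the period
    seasonal = ssa_reconstruct(resid1, L=L2, idx=[0, 1, 2, 3])  # ET1-4 assumed seasonal
    return trend, seasonal, resid1 - seasonal
```

By construction, the three returned components sum exactly to the input series.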

3 Trend, seasonality and autoregression SSA-based TS pattern identification

In TS data mining, feature extraction is one of the dimensionality reduction procedures. Features extracted from series concisely represent the relevant characteristics of each TS as a finite set of inputs for a clustering algorithm that can discern the similarities and differences of TS (Wang and Hyndman 2006). In this paper, features are extracted from complete series, rather than subsequences, to look for complete TS that have similar patterns (i.e., similar trend, seasonality and autoregression patterns).

3.1 Pattern identification algorithm

The algorithm proposed in this study to identify trend, seasonality and autoregression patterns in TS can be summarised in the following steps:

1. Perform sequential SSA to extract the three components of the initial series associated with the trend, seasonality and residual.

2. Model the extracted series so that their associated characteristics can be extracted.

   2.1. Trend component: from \({\mathbb{T}}_{trend}=\mu -\beta t\), estimate \(\mu \) and \(\beta \) using linear regression, where \({\mathbb{T}}_{trend}\) is the extracted trend series and \(t\) is time.

   2.2. Seasonality component: from \({\mathbb{T}}_{seasonal}={c}_{1}\mathrm{sin}(2\pi t/w)+{c}_{2}\mathrm{cos}(2\pi t/w)\), estimate \({c}_{1}\) and \({c}_{2}\) using linear regression, where \(w\) is the period, \({\mathbb{T}}_{seasonal}\) is the extracted seasonal component and \(t\) is time.

   2.3. Residual component: from \({\mathbb{T}}_{residual}\), obtain an autoregressive (AR) model and calculate the autocorrelations \({\varphi }_{1}, {\varphi }_{2}, {\varphi }_{3}\) and \({\varphi }_{w}\), where \(w\) is the period and \({\mathbb{T}}_{residual}\) is the extracted residual component.

   For each initial series, a feature vector is constructed from the estimated parameters.

3. Use a conventional clustering algorithm to group similar TS.

4. Average the initial series of each group to obtain their representative patterns based on the defined characteristics.
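Steps 2.1–2.3 above can be sketched as follows. This is an illustrative implementation with our own naming: for step 2.3 we compute sample autocorrelations directly rather than fitting an explicit AR model, and the linear fit returns an intercept-plus-slope parameterisation (the sign convention for \(\beta \) in step 2.1 differs only by the sign of the slope).

```python
import numpy as np

def feature_vector(trend, seasonal, residual, w=12):
    """Build an 8-feature representation [mu, beta, c1, c2, phi_1, phi_2, phi_3, phi_w]."""
    t = np.arange(len(trend), dtype=float)
    # Step 2.1: linear fit of the trend component (slope and intercept).
    beta, mu = np.polyfit(t, trend, 1)
    # Step 2.2: harmonic regression of the seasonal component at period w.
    A = np.column_stack([np.sin(2 * np.pi * t / w), np.cos(2 * np.pi * t / w)])
    (c1, c2), *_ = np.linalg.lstsq(A, seasonal, rcond=None)
    # Step 2.3: sample autocorrelations of the residual at lags 1, 2, 3 and w.
    r = np.asarray(residual, dtype=float) - np.mean(residual)
    acf = lambda k: np.dot(r[:-k], r[k:]) / np.dot(r, r)
    return np.array([mu, beta, c1, c2, acf(1), acf(2), acf(3), acf(w)])
```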

The new algorithm is explained step by step below.

3.2 Extracting the trend, seasonality and residual TS

The trend and seasonality series, together with the residual of the original TS, are extracted using the sequential SSA decomposition approach of the pattern identification algorithm proposed in this paper. In sequential SSA decomposition, TS are decomposed into a small number of components (i.e., trend, seasonality and residual) in two stages (Golyandina et al. 2012). When the trend shape is complex, the TS cannot be completely decomposed at once, so the decomposition is performed sequentially (Golyandina and Korobeynikov 2014). Thus, sequential SSA is applied to each initial TS: in the first stage, the trend component is extracted by choosing an \(L\) value as small as possible but divisible by the identified period \(w\), which smooths the TS containing a periodic component. In the second stage, using the maximum \(L\) value (\(L\approx N/2\) and divisible by \(w\)) for greater separability, the seasonality of the residual generated in the first stage is extracted, generating a new TS for the residual part.

As noted in Sect. 2, the four steps of SSA are tightly connected, each one building directly on the previous one, and all of them are fundamental to the representation process proposed here, which starts with the extraction of the series \({\mathbb{T}}_{trend}\), \({\mathbb{T}}_{seasonal}\) and \({\mathbb{T}}_{residual}\) (all of length \(N\)) described in this section. Thus, correct identification of \(w\) and a correct choice of \(L\) are essential.

This process generates three TS for each initial TS, allowing them to be modelled and the parameters of interest to be calculated.

3.3 Constructing a feature vector for each TS

After modelling the extracted TS according to steps 2.1 to 2.3 of the proposed algorithm and estimating the corresponding parameters, a feature vector is constructed for each TS.

Following the decomposition of a TS \({\mathbb{T}}_{i}\) by means of sequential SSA and the reconstruction of its trend, seasonal and residual components, which are modelled according to steps 2.1 to 2.3 of the proposed algorithm, the feature vector or representation of \({\mathbb{T}}_{i}\), denoted as \({C}_{i}=\left[{\mu }_{i},{\beta }_{i},{c}_{i1},{c}_{i2},{\varphi }_{i1}, {\varphi }_{i2}, {\varphi }_{i3},{\varphi }_{iw}\right]\), can be constructed.

3.4 Obtaining similar TS

A variety of methods have been developed to assess whether two TS are similar or belong to the same group. In this paper, the Euclidean distance is employed as a similarity criterion, and the unsupervised learning algorithm hkmeans is used to group the feature vectors derived from the trend, seasonality and residual TS.

Given a dataset of \(n\) TS \(\left\{{\mathbb{T}}_{1}, {\mathbb{T}}_{2},\dots ,{\mathbb{T}}_{n}\right\}\), where each \({\mathbb{T}}_{i}\) is represented by a feature vector \({C}_{i}=\left[{\mu }_{i},{\beta }_{i},{c}_{i1},{c}_{i2},{\varphi }_{i1}, {\varphi }_{i2}, {\varphi }_{i3},{\varphi }_{iw}\right]\), the unsupervised hkmeans algorithm produces a partition \(P=\left\{{P}_{1}, {P}_{2},\dots , {P}_{k}\right\}\) in which the \({C}_{i}\) vectors are clustered according to the Euclidean distance as a similarity measure; this is called characteristic-based or representation-based TS clustering. Each \({P}_{i}\) is called a cluster, where \(D=\bigcup_{i=1}^{k}{P}_{i}\) and \({P}_{i}\cap {P}_{j}=\varnothing , \forall i\ne j\).

In the clustering process, each feature vector \({C}_{i}\) is normalised by centring each feature to a mean of 0 and scaling it to a standard deviation of 1 (Bro and Smilde 2003). Afterwards, a distance matrix is computed from these vectors using the Euclidean distance as the similarity measure. Finally, this matrix is used as input to the clustering algorithms. Consequently, the clustering process is optimised, noise is eliminated, and computational costs are reduced.
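The normalisation and distance computation can be sketched as follows (feature-wise autoscaling, in the spirit of Bro and Smilde 2003; the function name is ours):

```python
import numpy as np

def euclidean_distance_matrix(C):
    """Autoscale each feature (mean 0, sd 1), then compute pairwise Euclidean distances."""
    C = np.asarray(C, dtype=float)
    Z = (C - C.mean(axis=0)) / C.std(axis=0)
    diff = Z[:, None, :] - Z[None, :, :]   # broadcast all pairs of rows
    return np.sqrt((diff ** 2).sum(axis=-1))
```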

3.5 Representative patterns of each group

After clustering, the original TS of each group \({g}_{i}\) are averaged to calculate a new TS \({\mathbb{T}}_{{g}_{i}}\), which serves as the representative prototype of group \({g}_{i}\). Subsequently, steps 2.1 to 2.3 of the proposed algorithm are applied to define the trend, seasonality and autoregression patterns.
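The prototype computation is a per-group average of the original series (a sketch; names ours):

```python
import numpy as np

def group_prototypes(series, labels):
    """Average the original TS of each cluster to obtain its representative pattern."""
    series = np.asarray(series, dtype=float)   # shape: (n_series, N)
    labels = np.asarray(labels)
    return {g: series[labels == g].mean(axis=0) for g in np.unique(labels)}
```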

4 Empirical study

4.1 Data preparation

A set of 1776 points from a 25 × 25 km² grid over the IP, elaborated through kriging spatial interpolation by the “Servicio de Desarrollos Climatológicos” of the Spanish State Meteorological Agency, was used. According to Fig. 1, this grid includes points distributed over Spain, Portugal, southern France, North Africa and the closest areas of the Atlantic Ocean and the Mediterranean Sea; for each point, a monthly TS of TMAX from January 1931 to December 2009, with 948 observations, is considered.

Fig. 1

Spatial distribution of the points on the IP

4.2 TS analysis

As shown in Fig. 2, the TS of monthly mean TMAX at the points of this IP grid are clearly seasonal, with temperature peaks in the summer months and troughs in the winter months. Moreover, a variable trend is observed throughout the period, as shown in Fig. 6.

Fig. 2

Average monthly TMAX at a point in IP from January 1931 to December 2009

4.3 Decomposition and reconstruction of SSA

Sequential SSA was applied as a data preprocessing tool due to its capacity for component separability, which facilitates dimensionality reduction, TS representation and result interpretability.

Figure 6 shows that the trend shape of the studied TS is complex, so the decomposition is performed sequentially. First, the trend is extracted. For such a changing trend shape, extraction is similar to smoothing (Golyandina and Korobeynikov 2014): a window length \(L=12\) is chosen to smooth the TS containing a periodic component, as in a moving average procedure. The window length must be as small as possible while divisible by the period, which in this case is 12, as seen in the periodogram in Fig. 3, suggesting that the seasonality consists of sine waves with the indicated period. Since the series is monthly, the horizontal axis of this graph varies from 0 to 6; the peak at point 0 is related to the long-term trend, whereas the peaks at points 1 and 2 correspond to the annual and semi-annual cycles, respectively. The same behaviour was generally noted in all TS in this study.

Fig. 3

Periodogram of the initial TS

After performing the first decomposition, it is confirmed that the first eigentriple corresponds to the trend, whereas the other eigentriples contain high-frequency components and are therefore not related to the trend. This can be seen in Figs. 4 and 5, which show the shapes of the six leading eigenvectors and the reconstructions produced by each of the six eigentriples. Additionally, the leading eigenvector has practically constant coordinates; thus, it corresponds to pure smoothing by the Bartlett filter (Golyandina et al. 2012). Notably, the first reconstructed component in Fig. 5 produces exactly the reconstructed trend shown in Fig. 6.

Fig. 4

Eigenvectors in the 1st stage (L = 12) of the initial TS

Fig. 5

Elemental reconstructed series in the 1st stage (L = 12) of the initial TS

Fig. 6

Initial series and estimated trend in the 1st stage (L = 12)

After extracting the trend, the seasonality is extracted from the residual generated in the first decomposition stage. Figure 7 shows the periodogram of this residual, demonstrating that the seasonality consists of sinusoidal waves with periods of 12 and 6; the peak at point 0 seen in Fig. 3 is no longer observed, since the trend has been removed.

Fig. 7

2nd stage periodogram of the 1st stage residual

Since the length of the TS is \(N=948\), to guarantee better separability the window length is taken as \(L=468\), the largest value that satisfies \(L\le N/2\) and is divisible by 12.

To properly identify the sought sinusoidal waves, eigenvalue plots, eigenvector scatter plots and the correlation matrix of the elementary components (\({\varvec{W}}\)-matrix) are used. Figure 8 shows several steps formed by approximately equal eigenvalues, each of which generates a pair of eigenvectors corresponding to a sine wave. This is confirmed in Fig. 9, which shows two almost regular polygons, ET1–2 and ET3–4, corresponding to periods of 12 and 6; these arise from the seasonality and are clearly explained by the periodogram in Fig. 7.

Fig. 8

2nd stage: Eigenvalues (L = 468)

Fig. 9

2nd stage: Scatter plots for pairs of eigenvectors (L = 468)

The components are organised according to the order of the periodogram values at these frequencies (Golyandina and Korobeynikov 2014). The correlation matrix in Fig. 10 demonstrates that the indicated component pairs show high within-pair correlations, whereas the correlation between pairs is almost zero. Some further pairs of eigentriples satisfy the same properties, but they are treated as noise because the periods to which they correspond are not interpretable for monthly data.

Fig. 10

2nd stage: W-correlation matrix (L = 468)

Thus, from the previous identification, the seasonality is extracted by reconstructing the grouped eigentriples ET1–4. As shown in Fig. 11, the original TS corresponds to the residual of the 1st stage, the TS F1 is the extracted seasonality, and the TS Residuals is the final noise or residual generated in the 2nd stage. The noise residuals obtained are heterogeneous (Golyandina and Korobeynikov 2014).

Fig. 11

2nd stage: TS Residual of 1st stage and extracted seasonal component (L = 468)

The resulting sequential SSA decomposition is shown in Fig. 12.

Fig. 12

Original TS and its trend-periodic residual decomposition

4.4 Parameter estimation for the representation

Considering the trend, seasonality and noise TS extracted through sequential SSA, the parameters corresponding to each one are estimated, as indicated in steps 2.1 to 2.3 of the proposed algorithm. The parameters estimated by modelling each component of the initial TS are shown in Table 1.

Table 1 Parameters associated with the trend, seasonality and noise components extracted by SSA; they are used to obtain the feature vectors for the clustering

The procedure is applied to the distinct TS considered in the dataset, and a feature vector is constructed for each TS.

4.5 TS clustering

When clustering methods are applied to any dataset (whether it has a cluster structure or not), the data is divided into groups. Thus, the clustering tendency of the new dataset formed by the feature vectors of the TS is first evaluated. The Hopkins statistic evaluates the spatial randomness of the data by measuring the probability that the dataset was generated by a uniform distribution, so it can be used to assess the clustering tendency of a dataset (Cross and Jain 1982; Banerjee and Davé 2004). If the value of the statistic is close to 0.5, the data is considered random; if it is close to 1.0, the dataset contains very well-defined clusters. For this dataset, a Hopkins statistic of 0.8550996 was obtained, indicating that the dataset has natural clusters.
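A possible implementation of the Hopkins statistic is sketched below. This is a common variant, not necessarily the authors' exact computation: details such as the sample size `m` vary across references.

```python
import numpy as np

def hopkins(X, m=None, seed=0):
    """Hopkins statistic: ~0.5 for spatially random data, near 1 for clustered data."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    m = m or max(1, n // 10)
    lo, hi = X.min(axis=0), X.max(axis=0)

    def nearest(points, exclude_self):
        dists = []
        for p in points:
            dd = np.sort(np.sqrt(((X - p) ** 2).sum(axis=1)))
            dists.append(dd[1] if exclude_self else dd[0])
        return np.array(dists)

    # u: distances from m uniform random probes (in the bounding box) to their nearest data point.
    u = nearest(rng.uniform(lo, hi, size=(m, d)), exclude_self=False)
    # w: nearest-neighbour distances of m sampled data points (excluding themselves).
    w = nearest(X[rng.choice(n, m, replace=False)], exclude_self=True)
    return u.sum() / (u.sum() + w.sum())
```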

Since the dataset shows a clustering tendency, this study proceeds with clustering. Initially, four clustering algorithms, i.e., k-means, k-medoids, HA and SOM, are applied to the dataset and compared using internal clustering validation measures.

Internal validation measures the goodness of a clustering structure without using external information not present in the data (Kremer et al. 2011; Song and Zhang 2008; Brun et al. 2007). This approach is usually based on compactness and separation criteria, because clustering aims to place similar objects within a single cluster and distinct objects in different clusters (Kraus et al. 2011; Zhao and Karypis 2002). Three such internal validation indices are selected: the silhouette index (Rousseeuw 1987), the Dunn index (Dunn 1974) and the connectivity index (Liu et al. 2013). These indices suggest that, among the applied algorithms, the best for clustering the data into two groups is HA. However, although there are no objective criteria for choosing the number of clusters, the popular silhouette (Rousseeuw 1987), elbow (Thorndike 1953) and GAP (Tibshirani et al. 2001) methods all estimate the optimal number of clusters to be 3. For three clusters, the selected internal validation indices point to HA and k-means as the methods to consider, as noted in Table 2. The connectivity index measures the extent to which elements are placed in the same cluster as their nearest neighbours in the data space and should be minimised. In contrast, the Dunn index should be maximised, since a dataset has compact, well-separated clusters when the cluster diameters are small and the distances between clusters are large. The silhouette index should also be maximised, as it measures how well each observation is clustered and estimates the average distance between clusters.

Table 2 Internal measures of cluster validation: the connectivity, Dunn and Silhouette indices of the clustering results of Hierarchical, k-means, PAM and SOM for the IP, which measure the compactness and stability of clustering

Considering what these indices suggest, and in order to improve the clustering results, a hybrid method called hierarchical k-means (hkmeans) is selected to cluster the TS according to their extracted features. The hybrid hkmeans method combines the two algorithms suggested by the indices in three steps. In the first step, it computes the hierarchical clustering result and cuts the tree into k clusters. In the second step, it computes the centre, or mean, of each group. In the third step, it runs k-means using the set of cluster centres defined in the previous step as the initial cluster centres (Kassambara 2017; Lee et al. 2010). Although k-means is one of the most popular clustering algorithms, it has some limitations: the number of clusters must be specified in advance, the initial centroids are selected randomly, and the final clustering solution is sensitive to that initial random selection. In hkmeans, however, the initial centroids for k-means are defined by the hierarchical clustering result, which mitigates these limitations and improves the final k-means clustering.
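The three steps above can be sketched directly. This is a minimal Python rendering of the hkmeans procedure as described (the cited implementations are in R); the function name and scikit-learn/SciPy calls are our own choices.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans

def hkmeans(X, k, seed=0):
    """Hierarchical k-means:
    1) hierarchical clustering (Ward), cut the tree into k clusters;
    2) compute the mean of each group;
    3) run k-means initialised at those means."""
    # Step 1: Ward hierarchy, cut into k clusters
    labels = fcluster(linkage(X, method="ward"), t=k, criterion="maxclust")
    # Step 2: centroids of the hierarchical groups
    centres = np.vstack([X[labels == g].mean(axis=0) for g in np.unique(labels)])
    # Step 3: k-means seeded with those centroids (deterministic init, so n_init=1)
    km = KMeans(n_clusters=k, init=centres, n_init=1, random_state=seed).fit(X)
    return km.labels_, km.cluster_centers_
```

Because the initialisation in step 3 is deterministic, repeated runs give the same partition, which is exactly the sensitivity-to-initialisation problem of plain k-means that hkmeans is meant to remove.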

Here, hkmeans is applied to the set formed by the feature vectors of the TS. In the first step, an HA algorithm with Ward’s linkage method is considered. The final clusters are shown in Fig. 13.

Fig. 13
figure 13

Distribution of the final groupings of the TS

To retrieve information on the groups obtained, the centroids of each group are examined; these are the averages of each feature used to build the feature vectors and therefore represent each cluster. Cluster 1 contains points in the IP with lower temperatures than the points in the other clusters, although its values vary considerably and increase over time. The seasonal component of these TS typically corresponds to a cyclical variation described by \({c}_{1}=-4.62810\) and \({c}_{2}=-5.20363\), and the main autocorrelation of the residual component is negative. Cluster 2, in contrast, contains points in the IP where the average TMAX reaches the highest values but decreases slightly over time, with a seasonal component whose amplitude of cyclical variation is defined on average by \({c}_{1}=-4.96816\) and \({c}_{2}=-5.13317\); the main autocorrelation of its residual component is positive and has a higher absolute value than that of the other groups. Cluster 3 is characterised by points in the IP where the average TMAX is intermediate with respect to the other clusters and experiences a positive change over time. Its seasonal component generally shows a cyclical variation described by \({c}_{1}=-6.07224\) and \({c}_{2}=-8.18860\), and the main autocorrelation of its residual component is negative, with the lowest absolute value of the three clusters. Figure 14 illustrates the distribution of the points in the IP according to the clusters obtained (the grid is composed of longitudes and latitudes in UTM coordinates).

Fig. 14
figure 14

Distribution of points in IP according to geographical location and clusters

As can be seen in the map in Fig. 14, Cluster 1 includes areas of northern Spain, northern Portugal and southern France, as well as the Mediterranean region. In Cluster 2, most of the points are distributed over the Mediterranean Sea, with the remaining points in southern Spain, northern Africa and the Atlantic Ocean. In Cluster 3, the points are primarily distributed across Spanish and Portuguese territory.

When the resulting patterns are analysed using the series obtained from the centroids of each cluster, three zones of the IP are clearly differentiated: the north and central zones show an increase in temperature over time, whereas the south shows a slight decrease. The north zone of the IP, which contains the areas with the lowest TMAX, experienced an increase in TMAX of 0.2034 °C per decade between 1931 and 2009, whereas the central zone showed an increase of 0.135 °C per decade. In contrast, the south zone, which contains the areas with the highest TMAX, showed a slight decline.

Moreover, different seasonal variations were noted for each zone: the north zone shows its largest variations in winter months, whereas the central zone shows its variations in spring and autumn months. The south zone does not show any marked differences in monthly variation.

5 Conclusion

This paper proposes a novel method for clustering TS data that considers their trend, seasonality and residual components. The approach represents each TS as a feature vector constructed by extracting these components via SSA decomposition. The results demonstrate reasonable groupings based on the defined features.

The proposed procedure can be applied to discover patterns in TS datasets, extract valuable information, and perform exploratory analysis on large TS datasets to support modelling efforts. The experiments allowed for spatial exploration and description of the variations of TMAX in the IP from 1931 to 2009 based on different zones defined by their trend and monthly variation.

Furthermore, this method could be used to test TS datasets with varying lengths or seasonal periods, as it is not restricted to TS data with uniform characteristics. Future research could explore the applicability of this method in multivariate TS analysis, as SSA can also be utilised for decomposition of such types of series.