Abstract
This paper proposes a clustering approach for multivariate time series with time-varying parameters in a multiway framework. Although clustering techniques based on time series distribution characteristics have been extensively studied, methods based on time-varying parameters have only recently been explored and are missing for multivariate time series. This paper fills the gap by proposing a multiway approach for distribution-based clustering of multivariate time series. To show the validity of the proposed clustering procedure, we provide both a simulation study and an application to real air quality time series data.
1 Introduction
Clustering time series is an important tool for the analysis of real data in several contexts, like biology, medicine, environmental sciences, engineering and finance. When clustering time series data, it is important to define a proper distance (Liao 2005). Distances based on the distributional characteristics of the time series are commonly considered (e.g. Nanopoulos et al. 2001; Wang et al. 2006; Fulcher and Jones 2014; D’Urso et al. 2017; Bastos and Caiado 2021). The idea of considering distribution characteristics originates from Nanopoulos et al. (2001), who introduced the use of skewness and kurtosis in the clustering process. Later, Wang et al. (2006) and Fulcher and Jones (2014) proposed clustering approaches based on multiple features, including the static first four moments. In particular, using a partitioning clustering algorithm, D’Urso et al. (2017) proposed an approach based on time series’ extremes, with static parameters estimated from a Generalized Extreme Value (GEV) distribution. Similarly, Mattera et al. (2021) considered parameters estimated from a Skewed Generalized Error Distribution (SGED) to account for skewness and heavy tails. Recently, Bastos and Caiado (2021) considered a set of features, including distribution characteristics, for clustering financial time series. However, the use of distribution parameters is not limited to clustering economic and financial time series. For example, Wang et al. (2011) proposed the use of parameters estimated from a Weibull distribution for clustering gene expression data.
The use of distribution parameters for clustering is well motivated by the high performances, in terms of clustering quality, reported in previous studies.
According to previous studies (for an overview of time series clustering approaches, see Maharaj et al. 2019), it is possible to classify time series with similar distribution parameters by using, in the clustering algorithm, a dissimilarity matrix computed on the differences between the estimated parameters.
As highlighted by time series analysis studies, the use of static distribution parameters may not work with real time series data. The statistical models for time series with time-varying parameters have been categorized by Cox (1981) into two main classes, namely the observation-driven and the parameter-driven models. We focus our attention on the first type of model. In observation-driven models, the time variation of the parameters is modeled through autoregressive approaches, where the parameters at a given time t are a function of lagged values. This approach, which simplifies the likelihood evaluation, is very popular in applied statistics and econometrics (e.g. see Creal et al. 2013; Harvey 2013; Harvey and Sucarrat 2014; Caivano and Harvey 2014; Koopman et al. 2016). Examples of observation-driven models are the ARCH (Engle 1982) and the GARCH of Bollerslev (1986) for the variance, the Autoregressive Conditional Skewness (ARCS) of Harvey and Siddique (1999) for the skewness, and the ARCSK of León et al. (2005) for modeling time variation in both skewness and kurtosis. More recently, Creal et al. (2013) proposed a very general approach to model time variation of the parameters for any kind of probability distribution. They developed a new statistical model, called Generalized Autoregressive Score (GAS), which uses the score function of the specified density as the source of time variation in the model’s parameters.
Although clustering techniques based on time series’ distribution characteristics have been extensively studied, approaches based on time-varying parameters have only recently been explored, in Cerqueti et al. (2021, 2022).
However, these two contributions have some weaknesses. The approach proposed in Cerqueti et al. (2021) is based on the selection of a target parameter. Although in some cases it can be of interest to study clusters obtained according to a single distributional feature (e.g. the variance or the skewness), this approach can be less accurate when alternative features are also relevant in grouping the time series. Cerqueti et al. (2022) overcome the problem of selecting a target parameter by using more parameters jointly, focusing on the use of unconditional and conditional quantities in the clustering process. We have to acknowledge that their unconditional distribution-based clustering provides results that are very close to those obtained with static parameters, even if the clustering interpretation is much more interesting. Most importantly, neither of the two approaches can handle the case of multivariate time series.
In this paper, we propose a multiway clustering approach that jointly considers multiple time-varying parameters in the definition of the clusters. We note that, with univariate time series with time-varying parameters, the data form a 3D tensor, while with multivariate ones they form a 4D tensor. In line with previous studies, we estimate the time-varying parameters with the GAS model.
To show the validity of the proposed multiway clustering procedure, we provide a simulation study with both univariate and multivariate time series. Moreover, we show an application to real multivariate air pollution time series data. In particular, we aim to identify cities characterized by the same temporal evolution of air pollution, considering the Particulate Matter (PM) time series variables as air quality indicators.
Studying air pollution clusters is important for policy makers. Indeed, there is clear evidence that poor air quality leads to adverse effects on human health (e.g. see Dominici et al. 2003; Anderson et al. 2012). In particular, there is a strong association between PM and respiratory and cardiovascular diseases (see Rajagopalan et al. 2018). Moreover, there is a significant association between high levels of air pollution and the number of COVID-19 cases (Copat et al. 2020). Since exposure to PM is dangerous to human health, policy makers of local governments pay particular attention to air quality monitoring (e.g. see Gao et al. 2011). In this framework, cluster analysis is an important tool for detecting groups of regions and/or cities with the same levels of air pollution (for a review see Govender and Sivakumar 2020).
Our analysis suggests the relevance of the proposed clustering approach in the development of public policies aimed at reducing the environmental impact in specific cities and/or geographical areas.
The paper is structured as follows. In Sect. 2, we describe the multiway clustering procedure in detail. In particular, in Sect. 2.1 we introduce preliminaries and notation, and in Sect. 2.2 we present the proposed clustering procedure. Sections 2.3 and 2.4 discuss two particular cases, with time-varying parameters estimated from a Gaussian and a Generalized-t distribution, respectively. Section 3 provides experimental results with simulated data, while in Sect. 4 we show the empirical relevance of the proposed approach in the context of environmental quality monitoring. Final remarks and possible future research directions are discussed in the last section.
2 Multiway clustering with time-varying parameters
Although many studies have documented evidence of time-varying parameters, and several statistical tools have been developed for modeling time variation in the parameters (e.g. see León et al. 2005; Harvey 2013; Creal et al. 2013; Harvey and Sucarrat 2014; Caivano and Harvey 2014), a clustering approach based on time-varying parameters has only recently been explored.
In what follows, we propose a clustering approach for multivariate time series based on a multi-step algorithm (see e.g. Košmelj 1986; Košmelj and Batagelj 1990). We place ourselves in the Relationship Matrices Analysis framework (for a clear illustration of this approach, see e.g. D’Urso 2004), where the dissimilarity between units is determined by considering a relationship matrix (e.g. correlation, distance, etc.) between pairs of elements.
2.1 Preliminaries and notation
Let N be the number of statistical units and K the number of time series variables of length T. Distribution-based clustering approaches have mainly been developed for clustering univariate time series, i.e. in the presence of N statistical units and \(K=1\) variable. By denoting the single \(K=1\) variable as \(y_t\), we have that \(y_{n,t}\) represents the values of the time series variable \(y_t\) for the nth statistical unit.
To assist the reader, we first present the notation used for univariate time series characterized by static distribution parameters. Let \({\mathbf {Y}} =\{y_{n,t}: n=1,\ldots ,N; \;t=1,\ldots ,T\}\) be the dataset matrix containing the N univariate time series (i.e., the statistical units), whose nth element is \(\{y_{n,t}: t=1,\ldots ,T\}\). Therefore:
Let us suppose that each column of (1) is generated by a probability density function \(p(\cdot )\) characterized by J parameters, so that we call \(f_{n,j}\) the jth static distribution parameter associated with the nth statistical unit. For example, in the case where \(p(\cdot )\) is a Gaussian distribution, we have \(J=2\) parameters, so that \(f_{n,1}=\mu _n\) and \(f_{n,2}=\sigma ^2_n\) are, respectively, the mean and the variance of the nth statistical unit. Therefore, the number of parameters J depends on the underlying distributional assumption. For a general density \(p(\cdot )\), a distribution-based clustering considers the following \((N \times J)\) matrix \(\mathbf {F}\) as the input of the algorithm:
where the distribution parameters \(f_{n,j}\) can be estimated with maximum likelihood.
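As an illustrative sketch of this step (in Python with NumPy; the function name is ours, and only the Gaussian case, where the ML estimates have closed form, is shown):

```python
import numpy as np

def static_parameter_matrix(Y):
    """Build the (N x J) matrix F of static Gaussian ML estimates:
    one row per unit, column 0 the mean, column 1 the variance."""
    Y = np.asarray(Y, dtype=float)   # shape (N, T): N univariate series
    mu_hat = Y.mean(axis=1)          # ML estimate of the mean
    var_hat = Y.var(axis=1)          # ML estimate of the variance (divisor T)
    return np.column_stack([mu_hat, var_hat])
```

For a non-Gaussian \(p(\cdot )\), the two closed-form columns would be replaced by numerically maximised likelihood estimates, one column per parameter.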
In the case of \(K\ge 2\) multivariate time series, we define \(y_{n,k,t}\) \((n=1,\dots ,N;\; k=1,\dots ,K;\; t=1,\dots ,T)\) as the value of the kth variable at time t for the nth statistical unit. Therefore, in the case of multivariate time series, the matrix (1) becomes a 3D tensor:
By considering static distribution parameters with \(K\ge 2\), the matrix (2) has a 3D tensorial representation, with elements \(f_{n,k,j}\) representing the jth static distribution parameter associated with the kth variable of the nth unit.
We are now in a position to introduce our contribution to the methodological setting of time-varying parameters in the multivariate time series context. Specifically, we introduce time variation in the parameters of multivariate time series. In this case, the \(f_{n,k,j}\)s in the 3D tensorial representation are time series themselves. Therefore, by considering time-varying parameters for multivariate time series (3), the matrix (2) becomes the following 4D tensor, called \(\tilde{\mathbf {F}}\):
where \(f_{n,k,j,t}\) denotes the jth distribution parameter for the kth variable of the nth statistical unit at time t. Clearly, the general formulation in (4) also includes the univariate time-dependent case (\(K=1\)) and the static univariate case (\(K=1\) and \(T=1\)).
In this paper, starting from the multivariate time series data (3), we first estimate the terms appearing in equation (4). Then, we consider the multivariate time-varying parameters as the input of the clustering procedure. In order to model and estimate the time-varying parameters in (4), following previous studies, we use the Generalized Autoregressive Score (GAS) model of Creal et al. (2013); for details about the GAS model see “Appendix 2”. The estimated time-varying parameters \(\hat{f}_{n,k,j,t}\) are then used as the input of the clustering procedure. The similarity between statistical units is defined by the degree to which the distribution parameters, for each variable, vary over time.
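For reference, the generic GAS(1,1) updating equation of Creal et al. (2013), in their notation, reads:

```latex
f_{t+1} = \omega + A\, s_t + B\, f_t, \qquad
s_t = S_t \nabla_t, \qquad
\nabla_t = \frac{\partial \log p\left(y_t \mid f_t; \theta \right)}{\partial f_t},
```

where \(S_t\) is a scaling matrix, commonly taken as (a power of) the inverse Fisher information; see “Appendix 2” for the full specification used here.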
2.2 The clustering procedure
The proposed clustering procedure, inspired by the double-step approaches for clustering longitudinal data (Košmelj 1986; Košmelj and Batagelj 1990), can be outlined as follows.
Let \(f_{n,k,j,t}\) be the realization of the jth time-varying parameter associated with the kth variable for the nth statistical unit at time t (4); we define \(\rho _{n,k,j,l}\) as the estimated autocorrelation at lag \(l\) \((l=1,\dots ,L)\) of the jth time-varying parameter associated with the kth variable of the nth unit.
In the first step of the clustering procedure, we compute \(N \times K\) distance matrices \({\mathbf {D}}_{n,k} = \left\{ d_{n, k, j, j^{\prime }}: j, j^{\prime }=1,\dots , J ; j \ne j^{\prime }\right\} \), for each \(n=1,\dots ,N; \;k=1,\dots ,K\). In line with previous studies (see e.g. Cerqueti et al. 2021), we consider an ACF-based distance between two time-varying parameters \(j\) and \(j^{\prime }\):
Therefore, each matrix \({\mathbf {D}}_{n,k}\) can be written as follows:
Note that each \({\mathbf {D}}_{n,k}\) is a square matrix of order J, symmetric and with a null diagonal. In the second step of the procedure, we aim to cluster the N statistical units on the basis of a dissimilarity measure among the matrices \({\mathbf {D}}_{n,k}\). Let \(\mathbf {L}_{n,k}\) be the lower triangular of \({\mathbf {D}}_{n,k}\):
Since each \({\mathbf {D}}_{n,k}\) is square and symmetric with a null diagonal, we can vectorize its lower triangular \(\mathbf {L}_{n,k}\) without losing information. The vectorized lower triangular, called \({\text{vec}}(\mathbf {L}_{n,k})\), can be written as follows:
Note that \( {\text{vec}}\left( \mathbf {L}_{n,k}\right) \) has length equal to \(J(J-1)/2\).
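A minimal sketch of these two ingredients, the ACF-based distance between estimated parameter paths and the vectorization of the lower triangular (Python with NumPy; the lag truncation \(L\) and the function names are our own choices, not the paper’s):

```python
import numpy as np

def acf(x, L):
    """Sample autocorrelations of a series at lags 1..L."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[l:], x[:-l]) / denom for l in range(1, L + 1)])

def acf_distance(f_j, f_jp, L=10):
    """Euclidean distance between the lag-1..L autocorrelation profiles
    of two estimated time-varying parameter paths."""
    return float(np.sqrt(np.sum((acf(f_j, L) - acf(f_jp, L)) ** 2)))

def vec_lower(D):
    """Vectorize the strictly lower triangular part of the symmetric
    matrix D_{n,k}; the result has length J*(J-1)/2."""
    rows, cols = np.tril_indices(D.shape[0], k=-1)
    return D[rows, cols]
```

Two identical parameter paths give a distance of zero, and increasingly different autocorrelation profiles give larger distances.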
In the second step, we define, for each kth variable, the matrix \(\mathbf {X}_k\) whose rows are given by the N vectors \({\text{vec}}(\mathbf {L}_{n,k})\):
Therefore, each \(\mathbf {X}_{k}\) is of dimension \(N \times [J(J-1)/2]\). The generic element of \( \mathbf {X}_{k}\) is denoted by \(x_{k,n,r}\) \((r=1,\dots ,J(J-1)/2)\). Then, we can define the kth \({\mathbf {D}}_{k}\) distance matrix with dimension \(N \times N\), whose generic element \( d_{k, n, n^{\prime }}\) can be written as follows:
Each kth distance \({\mathbf {D}}_k\) contains the information about dissimilarity of the N statistical units computed considering the kth variable.
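Stacking the vectorized lower triangulars row by row and computing pairwise Euclidean distances between units gives each \({\mathbf {D}}_{k}\); a sketch (names are illustrative):

```python
import numpy as np

def unit_distance_matrix(X_k):
    """Given X_k of shape (N, J*(J-1)/2), one row per unit, return the
    N x N matrix D_k of pairwise Euclidean distances between rows."""
    diff = X_k[:, None, :] - X_k[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))
```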
In order to consider the information included in each of the K variables jointly, in the third phase we compute a synthesis of the K distance matrices \({\mathbf {D}}_k\) through the DISTATIS algorithm (for details see “Appendix 3”). The resulting consensus squared Euclidean distance matrix \(\varvec{\tilde{D}}\) (37) has as generic element \(\tilde{d}_{n,n^\prime }\) and represents the synthesis of the K distances in (10).
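The synthesis step can be sketched as follows. This is a compact reading of the DISTATIS logic (double-centre each table, weight the tables by the leading eigenvector of their congruence matrix, map the compromise back to squared distances) and omits some of the normalisations of the published algorithm, so it is an assumption-laden illustration, not the exact procedure of Abdi et al. (2005):

```python
import numpy as np

def distatis_consensus(D_list):
    """Consensus squared-distance matrix in the spirit of DISTATIS."""
    N = D_list[0].shape[0]
    C = np.eye(N) - np.ones((N, N)) / N          # centering matrix
    # cross-product tables, each normalised by its first eigenvalue
    S = []
    for D in D_list:
        Sk = -0.5 * C @ (D ** 2) @ C
        S.append(Sk / np.linalg.eigvalsh(Sk).max())
    K = len(S)
    # congruence (RV) coefficients between the K tables
    R = np.empty((K, K))
    for i in range(K):
        for j in range(K):
            R[i, j] = np.sum(S[i] * S[j]) / np.sqrt(
                np.sum(S[i] * S[i]) * np.sum(S[j] * S[j]))
    _, V = np.linalg.eigh(R)
    alpha = np.abs(V[:, -1])
    alpha = alpha / alpha.sum()                  # table weights
    S_plus = sum(a * Sk for a, Sk in zip(alpha, S))   # compromise table
    d = np.diag(S_plus)
    return d[:, None] + d[None, :] - 2 * S_plus  # back to squared distances
```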
In the last step, we use the resulting consensus distance matrix in the Partitioning Around Medoids (PAM) algorithm (Kaufman and Rousseeuw 1990) to obtain the clusters. The PAM algorithm minimizes the sum of the elements of matrix \(\varvec{\tilde{D}}\) between each unit and its cluster medoid, the medoid itself being one of the units. In formulas, we have the following minimization problem:
Clearly, univariate time series clustering is a special case where \(K=1\). In this particular framework, we deal with a 3D tensor whose three dimensions are the N statistical units, the J parameters and the T time occasions. Essentially, the clustering procedure in the univariate framework is very similar to the one explained so far; the only difference is that we do not need to compute a consensus matrix.
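The PAM minimization can be sketched by exhaustive search over medoid sets, which is feasible for small N (real implementations use the build/swap heuristic of Kaufman and Rousseeuw 1990; names here are illustrative):

```python
import numpy as np
from itertools import combinations

def pam_exact(D, C):
    """Pick the C medoids minimising the total distance of every unit
    to its nearest medoid, by exhaustive search over medoid sets."""
    N = D.shape[0]
    best_cost, best_medoids = np.inf, None
    for medoids in combinations(range(N), C):
        cost = D[:, list(medoids)].min(axis=1).sum()
        if cost < best_cost:
            best_cost, best_medoids = cost, medoids
    labels = D[:, list(best_medoids)].argmin(axis=1)
    return list(best_medoids), labels
```

Feeding it the consensus matrix \(\varvec{\tilde{D}}\) and the desired number of clusters C returns the medoid units and a cluster label for each unit.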
2.3 Example with Gaussian density
Let us consider the data structure shown in (3) where each \(y_{n,k,t}\) time series follows a Gaussian distribution with timevarying parameters. In this case, the predictive density can be written as follows:
where \(\mu _{n,k,t}\) is the time-varying mean, \(\sigma ^2_{n,k,t}\) the time-varying variance, \({\mathcal {F}}_{n,k,t}\) is the information set and \(\theta _{n,k} = \left[ \omega _{n,k}, {\text {diag}}\left( {\mathbf {A}}_{n,k}\right) , {\text {diag}}\left( {{\mathbf {B}}}_{n,k}\right) \right] \) contains the parameters estimated by the following Gaussian-GAS(1,1) process:
where \(f_{n,k,t}\) is the vector containing the time-varying parameters \(f_{n,k,j,t} = \left[ f_{n,k,1,t}, f_{n,k,2,t}\right] = \left[ \mu _{n,k,t}, \sigma ^2_{n,k,t}\right] \) and \(s_{n,k,t}\) is the scaled score, with conditional scores equal to:
where \(\nabla _{n,k,1,t}\) is the score related to the time-varying mean (i.e. \(j=1\)) and \(\nabla _{n,k,2,t}\) is the score related to the time-varying variance (i.e. \(j=2\)). In summary, the model’s variables and parameters are:
In the univariate case (i.e. \(K=1\)), we compute the matrices \({\mathbf {D}}_n\) according to formula (5). In the case of the Gaussian distribution, the matrices \({\mathbf {D}}_n\) can be written as follows:
The value \(d_{\mu _n, \sigma ^2_n}\) summarises the difference between the \(J=2\) parameters. Two units n and \(n^{\prime }\) can be considered similar if \( d_{\mu _n, \sigma ^2_n}\) is close to \( d_{\mu _{n^{\prime }}, \sigma ^2_{n^{\prime }}}\). According to the procedure highlighted so far, we vectorize the lower triangular of each \({\mathbf {D}}_n\). In the particular case of the Gaussian density, however, the vectorization results in a single point, i.e. \(d_{\mu _n, \sigma ^2_n}\). Therefore, we concatenate the values of \({\text{vec}}\left( \mathbf {L}_n\right) \) as follows:
obtaining a vector of dimension \(N\times 1\). The Euclidean distance among the values of the vector \(\mathbf {X}\) provides the distance matrix used for the implementation of the PAM algorithm. Note that these arguments apply whenever a probability distribution with \(J=2\) parameters is specified.
Let us now analyse the case in which the K multivariate time series are studied jointly with their time-varying parameters. For each of the N units, we consider the generic kth distance matrix:
Then, we vectorize the lower triangular of each kth matrix. By concatenating these values we obtain the following vector:
Each \(\mathbf {X}_k\) is used to define a dissimilarity matrix \({\mathbf {D}}_k\). To obtain a synthesis, we apply the DISTATIS algorithm of Abdi et al. (2005). Hence, we find a consensus matrix \(\varvec{\tilde{D}}\) that is then employed as the distance in the PAM algorithm (11).
2.4 Example with Generalized-t density
Let us consider the data structure shown in (3), where each \(y_{n,k,t}\) time series follows a Generalized-t distribution with \(J=3\) time-varying parameters. The density of a Generalized-t distribution with time-varying parameters can be written as follows:
with location \(\mu _{n,k,t}\), scale \(\phi _{n,k,t}\) and shape \(\nu _{n,k,t}>2\), where \({\mathcal {F}}_{n,k,t}\) is the information set and \(\theta _{n,k} = \left[ \omega _{n,k}, {\text {diag}}\left( {\mathbf {A}}_{n,k}\right) , {\text {diag}}\left( {\mathbf {B}}_{n,k}\right) \right] \) contains the parameters estimated by the following t-GAS(1,1) process:
where, differently from the Gaussian example, \(f_{n,k,j,t} = \left[ f_{n,k,1,t}, f_{n,k,2,t}, f_{n,k,3,t}\right] = \left[ \mu _{n,k,t}, \phi _{n,k,t}, \nu _{n,k,t}\right] \). The scaled scores, \(s_{n,k,t}\), are equal to:
with \(\psi (\cdot )\) being the digamma function. Hence, \(\nabla _{n,k,1,t}\) is the score related to the time-varying location (i.e. \(j=1\)), \(\nabla _{n,k,2,t}\) the score related to the time-varying scale (i.e. \(j=2\)) and \(\nabla _{n,k,3,t}\) the score related to the time-varying shape (i.e. \(j=3\)). Finally, the model’s variables and parameters are:
Let us discuss, first, the univariate case. We estimate the time-varying parameters by means of the t-GAS(1,1) process (29). Then, we compute the matrices \({\mathbf {D}}_n\) according to formula (5). In the case of the Generalized-t distribution, the matrices \({\mathbf {D}}_n\) can be written as follows:
According to the procedure highlighted so far, we vectorize the lower triangular of each \({\mathbf {D}}_n\). The vectorization results into the following vector:
Then, by concatenating the vectors \({\text{vec}}\left( \mathbf {L}_n\right) \) we have:
where each column of \(\mathbf {X}\) represents the nth statistical unit to be clustered and the rows are the dissimilarities among the time-varying parameters. The Euclidean distance among the columns of the matrix \(\mathbf {X}\) provides the distance matrix among the N units. Note that when the probability distribution has \(J>2\) time-varying parameters, the vector \(\mathbf {X}\) (15) becomes a matrix.
Let us now analyse the case in which the K multivariate time series are studied jointly with their time-varying parameters. For each nth unit, let us consider the kth ACF-based distance matrices:
For each kth variable, we vectorize the lower triangular. By concatenating these values we define the following matrix:
As in the example with the Gaussian density, each \(\mathbf {X}_k\) is used to define a dissimilarity matrix \({\mathbf {D}}_k\), whose generic element is defined in (10). To obtain a synthesis of the K dissimilarity matrices, we apply the DISTATIS algorithm (see “Appendix 3”). Hence, we find a consensus matrix \(\varvec{\tilde{D}}\) that is then employed as the distance in the PAM algorithm (11).
3 Experimental results with simulated data
To show the validity of the proposed clustering procedure, we provide an application to simulated data. We generate several alternative simulation schemes, based on time series simulated from the following Gaussian-GAS processes:
with parameters calibrated on the basis of real time series data. In the case of univariate time series, i.e. with \(K=1\), we provide 90 alternative simulation schemes, comparing the clustering accuracy assuming the following DGPs:

DGP I: N/2 time series of length T from (24) and N/2 time series of length T from (25);
DGP II: N/2 time series of length T from (24) and N/2 time series of length T from (26);
DGP III: N/2 time series of length T from (24) and N/2 time series of length T from (27);
DGP IV: N/2 time series of length T from (25) and N/2 time series of length T from (26);
DGP V: N/2 time series of length T from (25) and N/2 time series of length T from (27);
DGP VI: N/2 time series of length T from (26) and N/2 time series of length T from (27);
under three different sample sizes \(N=\{10, 30, 60\}\) and five time series lengths, namely \(T=\{150, 250, 500, 1000, 2000\}\). Therefore, we also evaluate how the performance of the clustering algorithm is affected by the number of statistical units N and the time series length T, considering six combinations of the alternative DGPs. For all the simulations we assume \(C=2\) clusters.
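A minimal sketch of the kind of Gaussian-GAS(1,1) data-generating process used here (the parameter values below are illustrative, not the calibrated ones of the paper; the scaled scores use inverse-information scaling, so \(s_{1,t}=y_t-\mu _t\) and \(s_{2,t}=(y_t-\mu _t)^2-\sigma ^2_t\)):

```python
import numpy as np

def simulate_gaussian_gas(T, omega, A, B, seed=0):
    """Simulate one Gaussian-GAS(1,1) path with time-varying mean and
    variance: f_{t+1} = omega + A * s_t + B * f_t."""
    rng = np.random.default_rng(seed)
    omega, A, B = (np.asarray(v, dtype=float) for v in (omega, A, B))
    f = omega / (1.0 - B)                    # start at the unconditional level
    y = np.empty(T)
    for t in range(T):
        mu, sigma2 = f[0], max(f[1], 1e-8)   # guard the variance
        y[t] = mu + np.sqrt(sigma2) * rng.standard_normal()
        s = np.array([y[t] - mu, (y[t] - mu) ** 2 - sigma2])
        f = omega + A * s + B * f            # GAS(1,1) updating equation
    return y

# illustrative parameters (not the calibrated values of the paper)
y = simulate_gaussian_gas(500, omega=[0.0, 0.01], A=[0.05, 0.05], B=[0.9, 0.9])
```

Each DGP scenario would then pair N/2 paths from one parameter setting with N/2 paths from another.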
The proposed clustering approach is compared with two clustering algorithms. The first benchmark is a standard PAM approach, where cluster analysis is conducted on the original time series rather than on their time-varying parameters. The second benchmark is the approach of Cerqueti et al. (2021), which considers the autocorrelation of a target time-varying parameter for clustering. In the case of the Gaussian density, we consider the Cerqueti et al. (2021) algorithm with both mean and variance targeting. Differently, the approach proposed in this paper jointly considers all the time-varying parameters in the clustering process.
The performances of the algorithms are compared in terms of adjusted Rand Index (ARI, Hubert and Arabie 1985), averaged over 100 trials as in Park and Jun (2009).
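The ARI can be computed from the contingency table of the true and estimated partitions; a self-contained implementation of the Hubert and Arabie (1985) formula:

```python
import numpy as np
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index from the contingency table of two partitions."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    _, c_idx = np.unique(labels_true, return_inverse=True)
    _, k_idx = np.unique(labels_pred, return_inverse=True)
    table = np.zeros((c_idx.max() + 1, k_idx.max() + 1), dtype=int)
    np.add.at(table, (c_idx, k_idx), 1)
    index = sum(comb(int(n), 2) for n in table.ravel())
    a = sum(comb(int(n), 2) for n in table.sum(axis=1))
    b = sum(comb(int(n), 2) for n in table.sum(axis=0))
    expected = a * b / comb(labels_true.size, 2)
    max_index = (a + b) / 2
    return (index - expected) / (max_index - expected)
```

The index equals 1 for identical partitions (up to label permutation) and is close to 0 for random ones.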
The results in the case of \(N=10\) time series are shown in Table 1.
We notice that the proposed approach provides the best classification in all the considered simulated scenarios. Moreover, the clustering accuracy improves as the time series length increases. For example, looking at the results for scenario I, with short time series (\(T=500\)) the ARI is equal to 0.38, while with \(T=2000\) it reaches 0.88. This pattern is consistent across all the considered scenarios. The validity of clustering based on time-varying parameters is also highlighted by the good performances of the targeting approaches relative to clustering on the original time series. Furthermore, clustering based on the variance leads to much more accurate results than mean-based clustering, confirming the results of Cerqueti et al. (2021).
Nevertheless, the average adjusted Rand Indices vary across the simulations. The maximum value is reached in simulated scenario II, where the proposed clustering approach provides an ARI equal to 0.98 with \(T=2000\). Similarly, in scenario IV we obtain an ARI equal to 0.95 with very long time series. We find the lowest ARI in scenario V, with a value equal to 0.4. However, also in this case the proposed approach outperforms all the considered alternatives. In particular, the second best for scenario V is the clustering approach with variance targeting, which shows an ARI equal to 0.3, a much lower performance than our proposal. To explore the variability of the estimated ARI, it is possible to analyze the boxplots. For example, Fig. 1 shows the ARI boxplots for the simulations obtained with the six alternative DGPs, considering a time series length of \(T=2000\) and \(N=10\).
According to Fig. 1, we observe that the ARI obtained with the proposed approach is often characterized by lower variability and a higher median value than the alternatives. Although the variability associated with the conditional mean targeting approach is generally lower than that of the other clustering approaches, Fig. 1 shows that its ARI values are often below those obtained with the proposed clustering procedure. As shown in Table 1, Fig. 1 confirms that the clustering approach with conditional variance targeting is the most competitive among the considered alternatives. The boxplots referring to the other time series lengths T are not reported here because the results are very similar to those shown in Fig. 1. Indeed, the variability of the estimated ARI associated with the proposed approach is always lower than that obtained with the conditional variance clustering approach, which is the second best. Moreover, although the conditional mean clustering and the benchmark based on raw data show similar or lower variability than the proposed approach, their median and average values are much lower.
The results in the case of \(N=30\) and \(N=60\) time series are shown in “Appendix 1”, in Tables 7 and 8, respectively. Substantially, the performance of the proposed clustering procedure is not affected by the number of statistical units in the sample. Indeed, the outperformance in terms of adjusted Rand Index achieved by the proposed clustering procedure is confirmed. Furthermore, also in these cases we observe higher clustering performances as the time series length T increases. The boxplots with \(T=2000\) and \(N=30\) or \(N=60\) are reported in Figs. 17 and 18, “Appendix 1”, showing results similar to those of Fig. 1. The unreported boxplots with lower time series lengths T and a higher number of statistical units (i.e. \(N=30\) and \(N=60\)) share the same patterns as those shown in “Appendix 1”.
Then, we consider an alternative simulation scenario where multivariate time series are studied jointly. In particular, we compare the proposed clustering algorithm based on time-varying parameters with the multi-step algorithm discussed in Košmelj (1986) and Košmelj and Batagelj (1990), which is based on the raw time series rather than on their distribution parameters.
Also in this case we consider six combinations of the DGPs discussed above (24)-(27), where the K time series variables for a given nth unit are simulated from the same DGP. For example, in the multivariate version of scenario I, we simulate a first set of N/2 time series with K variables through (24) and another set of N/2 time series with K variables through (25). In other words, the K variables assume different values but are generated by the same DGP. As in the simulations with univariate time series, we consider, for each DGP scenario, different time series lengths \(T=150, 250, 500, 1000, 2000\) and different sample sizes \(N=10, 30, 60\). Therefore, we end up with 90 additional simulation schemes.
The results for \(N=10\) are shown in Table 2.
Compared with the benchmark approach, the results in terms of average ARI are outstanding, especially for long time series lengths T. For example, in scenario I of Table 2 the average ARI is equal to 0.96 for the proposed approach, while the benchmark provides a random partition with an ARI close to 0. Similarly good results are achieved in scenario IV, where the ARI associated with the proposed clustering approach is equal to 0.98. Moreover, in these simulations the lowest average ARIs associated with the developed clustering procedure are always close to 0.6 for long time series. For example, in simulated scenario V it is equal to 0.6, versus a value of 0 for the benchmark.
With shorter time series the results are still good. For example, in scenario I we obtain an ARI equal to 0.8 with \(T=1000\) and 0.4 with \(T=500\). Unfortunately, not all the simulated scenarios show high performances with very short time series (\(T=250\) and \(T=150\)). The results obtained with \(T=150\) are very close to those with \(T=250\). The best result is achieved in scenario IV, where the average ARI is equal to 0.3. However, in many cases the average ARI is similar to the benchmark. Therefore, these results confirm that the proposed clustering approach works particularly well in the presence of longer time series. This can be explained by the very good performances of the ACF-based distance with long time series data. Conversely, it is known that the performances of the ACF-based distance tend to be less accurate with short time series.
From these simulations it is evident that the benchmark model is always associated with a very low adjusted Rand Index. The strong performances of the proposed approach can be justified by the DGP, characterized by time variation in the distribution parameters. With the right specification of the DGP, the clustering quality resulting from the use of time-varying parameters is very satisfactory.
As in the univariate case, to explore the variability of the estimated ARI it is possible to analyze the boxplots. For example, Fig. 2 shows the ARI boxplots for the simulations obtained with the six alternative DGPs, considering \(K=2\) variables and a time series length of \(T=2000\).
The proposed clustering procedure performs particularly well in the simulated scenarios with DGP I, DGP II and DGP IV. Indeed, in these cases the variability of the estimated ARI is very low, also compared with the conditional variance targeting approach, which represents the second best. The median ARI for the proposed procedure equals the maximum value of 1 in these simulated scenarios. Quite similar conclusions can be derived from the other scenarios. Furthermore, we observe that the proposed clustering procedure is characterized by lower variability and higher median values than the alternatives, although the variability of the results under DGPs III, V and VI is higher than under DGPs I, II and IV. These results confirm those for univariate time series. As in the univariate case, the boxplots referring to simulated scenarios with other time series lengths T are not reported, because the results in these cases are close to those shown in Fig. 2. Indeed, considering the ARI obtained with the proposed clustering approach, we find a variability that is always lower than (or, in some simulated scenarios, very similar to) that associated with the ARI of the conditional variance clustering approach, which is the best among the considered alternatives also in the multivariate case. The ARIs associated with the other two alternative clustering approaches (i.e. conditional mean and raw-data-based) generally show the same variability as our procedure, but with much lower median and average values. Although in some simulated scenarios the conditional mean clustering shows lower variability (e.g. with DGP IV and \(T=150\), or with DGP V and \(T=250\)), this lower variability comes at a cost: lower clustering performances. Therefore, the analysis of the boxplots also shows that the proposed procedure outperforms the considered alternatives.
Finally, we evaluate how the performance changes as the sample size N increases. The results of the simulations with \(N=30\) and \(N=60\) are shown in Tables 9 and 10 in the “Appendix 1”.
As in the univariate setting, also for multivariate time series the number of statistical units to be clustered does not affect the clustering quality. Tables 9 and 10 confirm the very good performance of the proposed clustering approach with very long time series. Scenarios I and IV provide the best results, with average ARI equal to 0.97 and 0.99, respectively. The benchmark model shows very poor performance, confirming that when the distribution parameters change over time, a clustering approach based on the raw time series should not be used. Finally, also with increasing N, we observe very high clustering performance for medium and long time series, whereas the good performance for short time series is not robust across all the simulations.
The boxplots with \(T=2000\) and \(N=30\) and \(N=60\) are reported in Figs. 19 and 20 in the “Appendix 1”. The results are similar to those shown in Fig. 2. The unreported boxplots, associated with simulated scenarios with shorter time series lengths T and \(N=30\) and \(N=60\), share the same patterns as those shown in the “Appendix 1”. Overall, the boxplots with different numbers of statistical units, i.e. \(N=30\) and \(N=60\), do not differ from the case \(N=10\).
4 Application to air quality time series data
In what follows we show an application of the proposed clustering procedure to environmental time series with the aim of identifying groups of cities characterized by the same levels of air quality.
4.1 Data
Air quality monitoring is conducted by means of stations that measure the content of atmospheric pollutants and weather conditions. By aggregating these data, it is possible to obtain the air quality patterns for a given region or city. Air quality is also related to many of the United Nations Sustainable Development Goals. For example, the development of policies aimed at reducing the emission of pollutants into the air is directly related to climate mitigation targets, access to clean energy services, waste management, and other aspects of socioeconomic development (Lu et al. 2015; Rafaj et al. 2018).
The application to real data is conducted on the most important cities in India^{Footnote 5}. In particular, we considered daily air quality time series of Particulate Matter (PM), namely PM2.5 and PM10 (particles with diameters up to 2.5 and 10 microns, respectively), over the period 1 January 2020–1 June 2020. The data at the city level are aggregated considering the many stations placed within each city^{Footnote 6}. The final sample is characterized by \(N=15\) units (i.e. the cities) observed for \(T=182\) time periods.
The air pollution time series are shown in Figs. 3 (PM2.5) and 4 (PM10).
The PM2.5 and PM10 time series present some similarities in their patterns across all the cities. For example, we observe that most cities show a reduction in air pollution during the period 03/2020–06/2020 according to both variables. However, there are also significant differences among the cities: some are characterized by negative trends (e.g. Kolkata and Mumbai) whereas others show more stable patterns (e.g. Gurugram and Jaipur).
The presence of deterministic trends in the air pollution time series indicates that the underlying processes are not stationary. As discussed in Blasques et al. (2022), stationarity of the observed time series is needed to ensure consistency of the maximum likelihood estimator in the case of model misspecification for GAS processes. For this reason, we prefer to analyze the rates of change of air pollution, which carry the same information for the problem at hand, i.e. clustering cities with the same levels of air quality.
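The rate-of-change transformation mentioned above can be sketched as follows; the simulated PM2.5 series with a deterministic negative trend is purely illustrative:

```python
import numpy as np

# Illustrative daily PM2.5 series (in levels) with a deterministic negative trend
rng = np.random.default_rng(0)
t = np.arange(182)
levels = 120.0 - 0.3 * t + rng.normal(0.0, 5.0, size=t.size)

# Rate of change: (y_t - y_{t-1}) / y_{t-1}; the trend in levels is removed
rates = np.diff(levels) / levels[:-1]

# The transformed series fluctuates around zero, unlike the trending levels
print(rates.mean())
```

The transformed series loses one observation but fluctuates around a roughly constant level, which is what the maximum likelihood theory for GAS filters requires.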
4.2 Results with Gaussian density
Figures 5 and 6 show the patterns of the estimated time-varying mean under the hypothesis of a Gaussian distribution, while Figs. 7 and 8 show the time-varying variance.
The time-varying parameters provide some useful information about the patterns of air pollution. For example, considering the PM2.5 variable, Coimbatore and Jaipur show a lower level of variability in the conditional mean, which fluctuates around a constant value with some spikes associated with days of very low air quality. In contrast, Gurugram and Kolkata are characterized by high variability in the conditional mean. These results are confirmed by the analysis of the conditional variances shown in Fig. 7, with cities like Coimbatore and Jaipur characterized by quite flat conditional variances and others, like Gurugram and Kolkata, showing the typical pattern of conditionally heteroskedastic processes. The city of Hyderabad, instead, presents a very peculiar pattern for the conditional variance, which differs from the variances observed in the other cities. Considering the PM10 time series (Fig. 8), we also observe clear differences in the time-varying parameters. Coimbatore and Visakhapatnam are characterized by conditional means with low variability, reflecting the quite flat structure of their conditional variances. Also in the case of PM10, we recognize that Hyderabad has a very peculiar pattern of the conditional variance. Therefore, we suspect that this city may be an outlier.
We compared the partition obtained using the proposed clustering approach with the one based on the raw time series and with the two clustering approaches involving parameter targeting, by means of the Average Silhouette Width (ASW) criterion. The results are shown in Fig. 9.
In Fig. 9, the solid line represents the ASW of the proposed clustering procedure based on time-varying parameters, whereas the dashed lines show the ASW values associated with the benchmarks for different numbers of clusters. We note that both the proposed procedure and the benchmarks select \(C=3\) as the optimal number of clusters, but our procedure provides a better partition (the line associated with our procedure is always above those of the benchmarks).
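The ASW-based selection of the number of clusters can be sketched as follows. The PAM routine below is a naive alternating implementation with a farthest-first initialization (not the exact algorithm of Kaufman and Rousseeuw 1990), and the toy distance matrix is illustrative, not the air quality data:

```python
import numpy as np
from sklearn.metrics import silhouette_score

def init_medoids(dist, k):
    """Farthest-first seeding: most central unit first, then spread out."""
    medoids = [int(np.argmin(dist.sum(axis=0)))]
    while len(medoids) < k:
        medoids.append(int(np.argmax(dist[:, medoids].min(axis=1))))
    return np.array(medoids)

def pam(dist, n_clusters, n_iter=100):
    """Naive PAM: alternate unit assignment and medoid update."""
    medoids = init_medoids(dist, n_clusters)
    for _ in range(n_iter):
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(n_clusters):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue
            within = dist[np.ix_(members, members)].sum(axis=0)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return np.argmin(dist[:, medoids], axis=1)

# Toy consensus distance matrix: 15 units in three well-separated groups
rng = np.random.default_rng(1)
x = np.repeat([0.0, 5.0, 10.0], 5) + rng.normal(0.0, 0.3, size=15)
D = np.abs(x[:, None] - x[None, :])

# Choose the C that maximizes the ASW on the precomputed distance matrix
scores = {C: silhouette_score(D, pam(D, C), metric="precomputed")
          for C in range(2, 6)}
best = max(scores, key=scores.get)
print(best)
```

With well-separated groups, the ASW curve peaks at the true number of clusters, mirroring how \(C=3\) is selected in Fig. 9.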
The resulting partitions are shown in Table 3.
Although the groups’ composition differs across the considered clustering procedures, some similarities can be highlighted. For example, some cities are clustered together according to all the considered approaches; examples are the cities of Ahmedabad, Amaravati and Coimbatore, but also Delhi and Gurugram. This means that the same levels of air quality characterize these cities. However, despite these similarities, the clustering results are different. First of all, our procedure highlights the presence of an outlier, the city of Hyderabad, which is the only unit belonging to cluster 3. By contrast, no outliers are identified by the benchmark clustering approaches based on raw data and on variance targeting, while the conditional mean targeting approach considers the city of Ahmedabad as an outlier.
As a consequence, the groups’ sizes also differ. Indeed, the benchmark clustering algorithm based on raw data assigns the cities to similarly sized clusters, with six cities placed in cluster 1, four cities in cluster 2 and five cities in cluster 3. The mean targeting approach assigns most cities to cluster 2 and four cities to cluster 3. The variance targeting approach does not highlight any outlier, placing most cities in cluster 2 and two cities in clusters 1 and 3. In contrast, our procedure places most Indian cities in cluster 1 (ten units) and the remaining part in cluster 2 (four units). By looking at the average values of air quality within the clusters (see Table 4), we suppose that the resulting classification could imply some differences in environmental policies.
The proposed clustering procedure allows us to identify the cities characterized by low air quality, i.e. high levels of the PM2.5 and PM10 indicators. More precisely, the cities belonging to cluster 2 show the highest levels of particulate matter (PM) in the air. Conversely, the cities in cluster 1 show lower average values; therefore, cluster 1 includes cities with better air quality. Hyderabad is considered an outlier because of the conditional variance patterns of its air pollution indicators, shown in Figs. 7 and 8. These results suggest that air quality should be improved in the cities belonging to cluster 2, which should be more closely monitored.
4.3 Results with Generalized-t density
To evaluate the impact of the modelling hypothesis on the final results, we also assessed how the clusters change under an alternative distributional assumption. In the case of environmental time series, which are heavy-tailed (e.g. see Muller 2016; Williams et al. 2020), it can be more appropriate to use a conditional non-Gaussian model. Thanks to the flexibility of the GAS model, the proposed clustering procedure can be extended to the case of non-Gaussian distributions. In Sect. 2.4 we introduced the Generalized-t distribution-based clustering procedure. Starting from the same dataset discussed in Sect. 4.1, in what follows we apply the proposed clustering procedure under the Generalized-t distributional assumption.
Figures 10 and 11 show the time series of the estimated time-varying location under the hypothesis of a Generalized-t distribution, Figs. 12 and 13 show the estimated time-varying scale and Figs. 14 and 15 show the time-varying shape for both the PM2.5 and PM10 time series.
The time-varying location parameters, shown in Figs. 10 and 11, are characterized by fluctuations around a constant long-run value. Two exceptions are the time-varying locations of Patna (PM2.5) and Bengaluru (PM10), which show a positive and a negative trend, respectively. The time-varying scale parameters are shown in Figs. 12 and 13. Moreover, the cities of Hyderabad and Bengaluru show time-varying scale parameters of PM2.5 and PM10 that are very different from those of the other cities. Therefore, the city of Bengaluru could be considered as a possible outlier in terms of both location and scale. Finally, the time-varying shape parameters are shown in Figs. 14 and 15. Interestingly, the time-varying shape parameters are characterized by stationary patterns followed by a large peak. The city of Gurugram is characterized by two large peaks in the variable PM10.
We compared the partition obtained using the proposed clustering approach with those of the selected benchmarks by means of the Average Silhouette Width (ASW) criterion. The results are shown in Fig. 16.
In terms of ASW, the proposed approach achieves the highest value among the alternatives, about 0.9 with \(C=3\) clusters. We note that the ASW curve associated with the proposed clustering procedure is always above those of the alternative approaches, suggesting that it provides a better partition.
Some differences and similarities with the results obtained under the Gaussian distributional assumption can be highlighted. For example, as in the Gaussian case, the proposed clustering procedure maximizes the ASW with \(C=3\). This suggests that a partition with three clusters is probably the most appropriate for the analyzed dataset. However, under the Generalized-t distributional assumption, the benchmark approaches indicate the presence of \(C=2\) clusters for the location and scale targeting approaches and \(C=5\) clusters for the shape targeting approach. The raw data-based approach also suggests the presence of \(C=3\) clusters.
It is important to highlight that, under the Generalized-t assumption, all the clustering algorithms improve their performance compared to the Gaussian case. This suggests that the Generalized-t distribution better describes the considered environmental time series.
The resulting partitions are shown in Table 5.
We note that, in the case of conditional scale targeting, most cities are grouped together, with the exception of Hyderabad and Bengaluru. This can be due to the time patterns of their conditional scale parameters for PM2.5 (Hyderabad) and PM10 (Bengaluru). Looking at the partition obtained with the conditional location targeting, Bengaluru and Patna are placed in cluster 2 because of the peculiar patterns of PM10 (Bengaluru) and PM2.5 (Patna). The conditional shape targeting provides a partition with two outliers, Bengaluru and Hyderabad. The proposed clustering procedure provides a partition that takes all the time-varying parameters into account jointly. Therefore, it identifies a single outlier in the sample: the city of Bengaluru, whose location (PM10) and scale (PM10) patterns are very different from those of the other cities.
Then, we consider the average values of air quality variables PM2.5 and PM10 within the clusters. Table 6 highlights interesting differences among the groups obtained with the proposed approach.
According to PM2.5, we observe that cluster 3 includes cities with very high average values, whereas cluster 1 is more heterogeneous in its composition. The city of Bengaluru has a relatively low value of PM2.5, close to the first quartile of the distribution. In cluster 3 we have two cities with high PM2.5 average values: Delhi is the city with the maximum PM2.5 average value, while Kolkata has a value close to the third quartile of the distribution. Similar patterns can be found looking at the average values of the PM10 variable.
5 Final remarks
Clustering time series according to their distribution parameters is a widely explored topic. In this framework, some recent contributions consider time variation in the distribution parameters, but only in the case of univariate time series. This paper provides a clustering procedure based on time-varying parameters for multivariate time series.
Clustering multivariate time series with time-varying parameters is not straightforward because the data structure is a 4D tensor. The four dimensions are: (1) the statistical units, (2) the time, (3) the variables, and (4) the distribution parameters. In the proposed multiway clustering procedure, we adopt a multi-step approach where, first, a dissimilarity matrix is computed for each 3D tensor included in the 4D tensor. Then, starting from these distance matrices, the consensus matrix is computed by the DISTATIS algorithm (Abdi et al. 2005). The final partition is obtained by using the consensus matrix as input to the PAM algorithm.
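The multi-step scheme can be sketched as follows; the array shapes are illustrative, and the equal-weight average in the second step is a simplified stand-in for the DISTATIS compromise described in “Appendix 3”:

```python
import numpy as np

# Illustrative 4D tensor: N units x T periods x J variables x P parameters
rng = np.random.default_rng(0)
N, T, J, P = 10, 250, 2, 2
tensor = rng.normal(size=(N, T, J, P))

# Step 1: for each parameter p, one N x N dissimilarity matrix between units,
# computed here as the Euclidean distance between the flattened T x J slabs
distance_matrices = []
for p in range(P):
    slab = tensor[:, :, :, p].reshape(N, -1)
    diff = slab[:, None, :] - slab[None, :, :]
    distance_matrices.append(np.sqrt((diff ** 2).sum(axis=2)))

# Step 2: consensus matrix; equal weights stand in for the DISTATIS weights
consensus = sum(distance_matrices) / len(distance_matrices)

# Step 3: the consensus matrix is then used as the input of the PAM algorithm
print(consensus.shape)
```

The essential point is the dimensionality reduction: the 4D problem collapses into a single \(N \times N\) matrix on which a standard partitioning algorithm can operate.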
An extensive simulation study, conducted considering different time series lengths, sample sizes and numbers of variables, compares the performance of the proposed clustering procedure with that of a standard multi-step clustering procedure for 3D tensors applied to the raw time series. For all the considered scenarios, the proposed approach outperforms the alternatives. The usefulness of the proposed clustering is discussed through an application to environmental time series on air quality. As a further support to the validity of our procedure, we notice that the proposed procedure performs well in partitioning the considered dataset, although the time series considered in the application are not very long. To this aim, we compare the clusters obtained using the proposed approach with those obtained considering a standard multi-step clustering approach for multiway data.
Some future research developments can be highlighted. Firstly, we notice that the procedure developed in the paper can be used for clustering any 4D tensor. Therefore, it can also be adopted for clustering 4D tensors that do not include time-varying parameters. Secondly, we also highlight that the proposed approach could be extended to account for co-moments, such as covariance, co-skewness and co-kurtosis. This aspect is relevant when the time series show a cross-dependence structure in the higher moments of the distribution. A third line of future research concerns the weighting of the distribution parameters. Indeed, in the present paper we implicitly assign equal weights to the different time-varying parameters. However, as shown in Cerqueti et al. (2022), it could be interesting to assign different weights to the distribution parameters and search for the optimal weights. This aspect should be taken into account in future studies. Finally, the proposed clustering approach can be extended to include spatial dependence in the data. Spatial dependence arises when dealing with statistical units that are observed over both time and space, such as provinces, cities or countries. Therefore, the extension of the proposed clustering procedure to the spatio-temporal setting represents another interesting future research line.
Notes
Note that, following (5), it should be \({\mathbf {D}}_{n,1}\). However, to avoid overloading the notation, we write \({\mathbf {D}}_{n,1} = {\mathbf {D}}_n\).
When \(J\ge 3\), instead, by concatenating the \({\text{vec}}\left( \mathbf {L}_n\right) \) we obtain a matrix \(\mathbf {X}\) of dimension \([J(J-1)]/2\times N\), thus turning the problem into a 2D clustering one.
The boxplots of all the considered time series lengths are available upon request.
Also in this case, the entire set of the boxplots of all the considered time series lengths is available upon request.
The considered cities are the following: Ahmedabad, Amaravati, Bengaluru, Chandigarh, Coimbatore, Delhi, Gurugram, Guwahati, Hyderabad, Jaipur, Kolkata, Mumbai, Patna, Thiruvananthapuram and Visakhapatnam. Other cities have been removed because of missing values.
Data about the monitoring stations can be retrieved at the following link: https://cpcb.nic.in/. The final dataset at the city level is available on request.
References
Abdi H, O’Toole AJ, Valentin D et al (2005) DISTATIS: the analysis of multiple distance matrices. In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05) workshops. IEEE, p 42
Abdi H, Williams LJ, Valentin D et al (2012) STATIS and DISTATIS: optimum multitable principal component analysis and three-way metric multidimensional scaling. Wiley Interdiscip Rev Comput Stat 4(2):124–167
Anderson JO, Thundiyil JG, Stolbach A (2012) Clearing the air: a review of the effects of particulate matter air pollution on human health. J Med Toxicol 8(2):166–175
Bastos JA, Caiado J (2021) On the classification of financial data with domain agnostic features. Int J Approx Reason 138:1–11
Blasques F, van Brummelen J, Koopman SJ et al (2022) Maximum likelihood estimation for score-driven models. J Econom 227(2):325–346
Bollerslev T (1986) Generalized autoregressive conditional heteroskedasticity. J Econom 31(3):307–327
Caivano M, Harvey A (2014) Time-series models with an EGB2 conditional distribution. J Time Ser Anal 35(6):558–571
Cerqueti R, Giacalone M, Mattera R (2021) Model-based fuzzy time series clustering of conditional higher moments. Int J Approx Reason 134:34–52
Cerqueti R, D’Urso P, De Giovanni L et al (2022) Weighted score-driven fuzzy clustering of time series with a financial application. Expert Syst Appl 198:116752
Copat C, Cristaldi A, Fiore M et al (2020) The role of air pollution (PM and NO2) in COVID-19 spread and lethality: a systematic review. Environ Res 191:110129
Cox DR (1981) Statistical analysis of time series: some recent developments. Scand J Stat 8:93–115
Creal D, Koopman SJ, Lucas A (2013) Generalized autoregressive score models with applications. J Appl Econom 28(5):777–795
Dominici F, Sheppard L, Clyde M (2003) Health effects of air pollution: a statistical review. Int Stat Rev 71(2):243–276
D’Urso P (2004) Fuzzy c-means clustering models for multivariate time-varying data: different approaches. Int J Uncertain Fuzziness Knowl-Based Syst 12(03):287–326
D’Urso P, Maharaj EA, Alonso AM (2017) Fuzzy clustering of time series using extremes. Fuzzy Sets Syst 318:56–79
Engle RF (1982) Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econom J Econom Soc 50:987–1007
Escoufier Y (1980) L’analyse conjointe de plusieurs matrices de données. Biométrie et temps 58:59–76
Fulcher BD, Jones NS (2014) Highly comparative feature-based time-series classification. IEEE Trans Knowl Data Eng 26(12):3026–3037
Gao H, Chen J, Wang B et al (2011) A study of air pollution of city clusters. Atmos Environ 45(18):3069–3077
Govender P, Sivakumar V (2020) Application of k-means and hierarchical clustering techniques for analysis of air pollution: a review (1980–2019). Atmos Pollut Res 11(1):40–56
Harvey AC (2013) Dynamic models for volatility and heavy tails: with applications to financial and economic time series, vol 52. Cambridge University Press, Cambridge
Harvey CR, Siddique A (1999) Autoregressive conditional skewness. J Financ Quant Anal 34:465–487
Harvey A, Sucarrat G (2014) EGARCH models with fat tails, skewness and leverage. Comput Stat Data Anal 76:320–338
Hubert L, Arabie P (1985) Comparing partitions. J Classif 2(1):193–218
Kaufman L, Rousseeuw PJ (1990) Finding groups in data. An introduction to cluster analysis. Wiley Series in Probability and Mathematical Statistics Applied Probability and Statistics
Koopman SJ, Lucas A, Scharth M (2016) Predicting time-varying parameters with parameter-driven and observation-driven models. Rev Econ Stat 98(1):97–110
Košmelj K (1986) A twostep procedure for clustering time varying data. J Math Sociol 12(3):315–326
Košmelj K, Batagelj V (1990) Cross-sectional approach for clustering time varying data. J Classif 7(1):99–109
León Á, Rubio G, Serna G (2005) Autoregressive conditional volatility, skewness and kurtosis. Q Rev Econ Finance 45(4–5):599–618
Liao TW (2005) Clustering of time series data: a survey. Pattern Recogn 38(11):1857–1874
Lu Y, Nakicenovic N, Visbeck M et al (2015) Policy: five priorities for the UN Sustainable Development Goals. Nature 520(7548):432–433
Maharaj EA, D’Urso P, Caiado J (2019) Time series clustering and classification. CRC Press, Cambridge
Mattera R, Giacalone M, Gibert K (2021) Distribution-based entropy weighting clustering of skewed and heavy-tailed time series. Symmetry 13(6):959
Muller NZ (2016) Power laws and air pollution. Environ Model Assess 21(1):31–52
Nanopoulos A, Alcock R, Manolopoulos Y (2001) Feature-based classification of time-series data. Int J Comput Res 10(3):49–61
Park HS, Jun CH (2009) A simple and fast algorithm for k-medoids clustering. Expert Syst Appl 36(2):3336–3341
Rafaj P, Kiesewetter G, Gül T et al (2018) Outlook for clean air in the context of sustainable development goals. Glob Environ Change 53:1–11
Rajagopalan S, Al-Kindi SG, Brook RD (2018) Air pollution and cardiovascular disease: JACC state-of-the-art review. J Am Coll Cardiol 72(17):2054–2070
Salkind NJ (2006) Encyclopedia of measurement and statistics. SAGE Publications, London
Thiébaut B et al (1977) Etude de la pluviosité au moyen de la méthode STATIS. Revue de statistique appliquée 25(2):57–81
Wang X, Smith K, Hyndman R (2006) Characteristic-based clustering for time series data. Data Min Knowl Disc 13(3):335–364
Wang H, Wang Z, Li X et al (2011) A robust approach based on Weibull distribution for clustering gene expression data. Algorithms Mol Biol 6(1):1–9
Williams G, Schäfer B, Beck C (2020) Superstatistical approach to air pollution statistics. Phys Rev Res 2(1):013019
Funding
Open access funding provided by Università degli Studi di Roma La Sapienza within the CRUI-CARE Agreement.
Appendices
Appendix 1: Simulation study: more results
See Figs. 17, 18, 19, 20 and Tables 7, 8, 9, 10.
Appendix 2: The GAS model
The GAS model is based on the assumption that, for each \(n\)th unit, the time series variable \(y_{n,t}\) is generated by the following observation density \(p(\cdot )\):
\[ y_{n,t} \sim p\left( y_{n,t} \mid f_{n,t}, {\mathcal {F}}_{n,t-1}; \theta _n \right) \qquad (28) \]
where \(\theta _n\) is a vector of static parameters, \({\mathcal {F}}_{n,t}\) is the information set at time \(t\), and \(f_{n,j,t}\) \((j=1,\ldots ,J)\) are the \(J\) time-varying parameters implied by the probability distribution. The information set \({\mathcal {F}}_{n,t}\) is built from the previous realizations of the time series \(y_{n,t}\) and of its time-varying parameters \(f_{n,j,t}\). The Generalized Autoregressive Score model of order one, the GAS(1, 1), can be written as:
\[ f_{n,j,t+1} = \omega _{n,j} + {\mathbf {A}}_{n,j}\, s_{n,j,t} + {\mathbf {B}}_{n,j}\, f_{n,j,t} \]
where \(\omega _{n,j}\) is a real vector and \({\mathbf {A}}_{n,j}\) and \({\mathbf {B}}_{n,j}\) are diagonal matrices. All the parameters \(\omega _{n,j}, {\mathbf {A}}_{n,j}, {\mathbf {B}}_{n,j}\) are collected in the vector \(\theta _n\). Moreover, \(s_{n,j,t}\) is the scaled score of the conditional density (28) at time \(t\) with respect to the \(j\)th parameter of the \(n\)th time series:
\[ s_{n,j,t} = S_{n,t}\, \nabla _{n,j,t} \]
with \(\nabla _{n,j,t}\) being the conditional score:
\[ \nabla _{n,j,t} = \frac{\partial \log p\left( y_{n,t} \mid f_{n,j,t}, {\mathcal {F}}_{n,t-1}; \theta _n \right) }{\partial f_{n,j,t}} \]
and \(S_{n,t}\) a scaling matrix that depends on the probability distribution; it is usually set equal to (a power of) the inverse of the Fisher information matrix, or to the identity matrix in the case of no scaling.
In other words, in the GAS model we suppose that the evolution of the time-varying parameter \(f_{n,j,t}\) depends both on the vector \(s_{n,j,t}\), proportional to the score of the density, and on an autoregressive component.
A useful feature of the GAS model is that the vector \(\theta _n\) can be estimated by maximum likelihood (Creal et al. 2013).
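To fix ideas, the recursion above can be simulated for the simplest case of a Gaussian density with a time-varying mean: with inverse-information scaling, the scaled score reduces to the prediction error \(y_t - \mu_t\). The parameter values below are illustrative, not estimates from the paper:

```python
import numpy as np

# Simulate a univariate Gaussian GAS(1,1) with time-varying mean mu_t and
# fixed sigma; the scaled score is s_t = y_t - mu_t (illustrative parameters)
rng = np.random.default_rng(42)
T = 2000
omega, A, B, sigma = 0.05, 0.10, 0.95, 1.0

mu = np.empty(T)
y = np.empty(T)
mu[0] = omega / (1.0 - B)  # start at the unconditional mean
for t in range(T - 1):
    y[t] = mu[t] + sigma * rng.normal()
    s_t = y[t] - mu[t]                      # scaled score
    mu[t + 1] = omega + A * s_t + B * mu[t]  # GAS(1,1) updating equation
y[-1] = mu[-1] + sigma * rng.normal()

# The filtered mean fluctuates around omega / (1 - B)
print(mu.mean())
```

The score term pushes \(\mu_t\) toward the new observation while the autoregressive term \(B \mu_t\) keeps the path smooth, which is the mean-reverting behaviour visible in Figs. 5, 6, 10 and 11.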
Appendix 3: Compromise computation with DISTATIS
The Distance STATIS (DISTATIS, see Abdi et al. 2005) approach is used with the aim of synthesizing many distance matrices computed on the same set of statistical units. The main idea behind DISTATIS is to transform each of these distance matrices into a cross-product matrix and, then, to synthesize the several obtained matrices with a STATIS algorithm (Escoufier 1980; Thiébaut et al. 1977). Therefore, the final result of the DISTATIS approach is the definition of a so-called compromise matrix as the best synthesis of the original distance matrices.
Thus, starting from \(K\) \((k=1,\ldots ,K)\) distance matrices \({\mathbf {D}}_k\), the first step of DISTATIS is to transform each distance matrix \({\mathbf {D}}_k\) into a cross-product matrix \(\varvec{\tilde{S}}_{k}\):
\[ \varvec{\tilde{S}}_{k} = -\frac{1}{2}\, \Xi \, {\mathbf {D}}_k \, \Xi ^{\prime } \qquad (32) \]
where \(\Xi = I_N - 1_N \mathbf {m}^{\prime }\), \(I_N\) is the identity matrix of dimension N (where N is the number of the observed statistical units), \(1_N\) is a vector of ones and \(\mathbf {m}\) is a vector of N equal elements \(m_n=1/N\text { }(\text {for } n=1,\ldots ,N)\).
The initial transformation (32) is necessary because the original distance matrices cannot be directly analyzed with STATIS, since they are not positive semi-definite. This transformation is particularly relevant when we start from Euclidean distance matrices because, in this case, it is completely reversible, i.e. each Euclidean distance matrix can be perfectly reconstituted from its corresponding cross-product matrix and vice versa (Abdi et al. 2012). The matrices \(\varvec{\tilde{S}}_{1}, \ldots , \varvec{\tilde{S}}_{K}\) are often normalized prior to the analysis such that, for example, the sum of their squared elements is equal to one or their first eigenvalue is equal to one.
In the second step, we search for the compromise matrix. The compromise matrix is a cross-product matrix that gives the best compromise of the individual cross-product matrices. It is obtained as a weighted average of these matrices. Therefore, it is necessary to derive an optimal set of weights by considering the degree of similarity among the K cross-product matrices \(\varvec{\tilde{S}}_{1}, \ldots , \varvec{\tilde{S}}_{K}\).
The degree of similarity between two generic cross-product matrices \(\varvec{\tilde{S}}_{k}\) and \(\varvec{\tilde{S}}_{k^{\prime }}\) is computed by means of the \(R_V\) coefficient, defined as:
\[ R_V(k, k^{\prime }) = \frac{ {\text {tr}}\left( \varvec{\tilde{S}}_{k}\, \varvec{\tilde{S}}_{k^{\prime }} \right) }{ \sqrt{ {\text {tr}}\left( \varvec{\tilde{S}}_{k}\, \varvec{\tilde{S}}_{k} \right) \, {\text {tr}}\left( \varvec{\tilde{S}}_{k^{\prime }}\, \varvec{\tilde{S}}_{k^{\prime }} \right) } } \]
The \(R_V(k, k^{\prime })\) coefficients for each pair \(k\) and \(k^{\prime }\) are the generic elements of a so-called cosine matrix \(\mathbf {C}\). By construction, the \(R_V\) coefficients fall in the interval \([-1, 1]\). This means that, considering two distance matrices \(k\) and \(k^{\prime }\), they perfectly agree on the position of the units if \(c_{k, k^{\prime }}=1\), provide opposite results if \(c_{k, k^{\prime }}=-1\), and are orthogonal if \(c_{k, k^{\prime }}=0\).
To find the optimal weights to use for calculating the compromise matrix, DISTATIS computes the eigendecomposition of the cosine matrix \(\mathbf {C}\):
\[ \mathbf {C} = \mathbf {P} \Lambda \mathbf {P}^{\prime } \]
where \(\mathbf {P}\) is the matrix of eigenvectors \(\mathbf {p}_1, \ldots , \mathbf {p}_K\) and \(\Lambda \) is the diagonal matrix of the eigenvalues of \(\mathbf {C}\). Let us define the optimal weights vector \(\varvec{\alpha }\), with generic element \(\alpha _k\) \((k=1,\ldots ,K)\), computed as:
\[ \varvec{\alpha } = \mathbf {p}_{1} \left( \mathbf {1}^{\prime } \mathbf {p}_{1} \right) ^{-1} \]
where \(\mathbf {p}_{1}\) is the first eigenvector of \(\mathbf {C}\). Therefore, the compromise is computed as:
\[ \varvec{\tilde{S}} = \sum _{k=1}^{K} \alpha _{k}\, \varvec{\tilde{S}}_{k} \]
Starting from the cross-product matrix \(\varvec{\tilde{S}}\) it is possible to obtain the following Euclidean squared distance matrix \(\varvec{\tilde{D}}\) (see Salkind 2006):
\[ \varvec{\tilde{D}} = \varvec{\tilde{s}}\, \mathbf {1}^{\prime } + \mathbf {1}\, \varvec{\tilde{s}}^{\prime } - 2 \varvec{\tilde{S}} \qquad (37) \]
with \(\varvec{\tilde{s}}\) being the vector containing the diagonal elements of \(\varvec{\tilde{S}}\), i.e. \(\varvec{\tilde{s}}={\text {diag}}\left( \varvec{\tilde{S}}\right) \). Therefore, (37) represents the consensus matrix between two or more distance matrices.
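The whole computation, from the distance matrices to the consensus in (37), can be sketched as follows. This is a simplified version in which each cross-product matrix is normalized by its first eigenvalue, as mentioned above; the two input distance matrices are illustrative:

```python
import numpy as np

def distatis_compromise(distance_matrices):
    """DISTATIS compromise of squared-distance matrices on the same N units
    (a sketch following Abdi et al. 2005)."""
    N = distance_matrices[0].shape[0]
    centering = np.eye(N) - np.ones((N, N)) / N  # Xi = I_N - 1_N m', m_n = 1/N
    # Step 1: distances -> cross-products, normalized to first eigenvalue one
    S = []
    for D in distance_matrices:
        s = -0.5 * centering @ D @ centering.T
        S.append(s / np.linalg.eigvalsh(s)[-1])
    # Step 2: cosine matrix of R_V coefficients between cross-products
    K = len(S)
    C = np.empty((K, K))
    for i in range(K):
        for j in range(K):
            C[i, j] = np.sum(S[i] * S[j]) / np.sqrt(
                np.sum(S[i] ** 2) * np.sum(S[j] ** 2))
    # Step 3: weights from the first eigenvector of C, rescaled to sum to one
    _, eigvecs = np.linalg.eigh(C)
    p1 = np.abs(eigvecs[:, -1])  # eigh sorts eigenvalues in ascending order
    alpha = p1 / p1.sum()
    # Step 4: compromise cross-product, then back to squared distances (37)
    S_comp = sum(a * s for a, s in zip(alpha, S))
    d = np.diag(S_comp)
    return d[:, None] + d[None, :] - 2.0 * S_comp

# Two illustrative squared Euclidean distance matrices on the same six units
rng = np.random.default_rng(0)
X1 = rng.normal(size=(6, 2))
X2 = X1 + rng.normal(0.0, 0.1, size=(6, 2))
D1 = ((X1[:, None, :] - X1[None, :, :]) ** 2).sum(axis=2)
D2 = ((X2[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
Dc = distatis_compromise([D1, D2])
print(Dc.shape)
```

The returned matrix is symmetric with a zero diagonal and non-negative entries, so it can be fed directly into PAM as the consensus dissimilarity.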
Cerqueti, R., Mattera, R. & Scepi, G. Multiway clustering with time-varying parameters. Comput Stat (2022). https://doi.org/10.1007/s00180-022-01294-5
Keywords
 Generalized Autoregressive Score
 Dynamic Conditional Score
 Time-varying parameters
 Time series clustering
 Multiway data
 Air quality