Abstract
We present a new way to find clusters in large vectors of time series using a measure of similarity between two time series, the generalized cross correlation. This measure compares the determinant of the correlation matrix, up to some lag k, of the bivariate vector with the determinants of the corresponding correlation matrices of the two univariate time series. A matrix of similarities among the series based on this measure is used as the input of a clustering algorithm. The procedure is automatic, can be applied to large data sets, and is useful for finding groups in dynamic factor models. The clustering method is illustrated with Monte Carlo experiments and a real data example.
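The determinant comparison described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact formula, so the normalisation used here (one minus the ratio of the joint determinant to the product of the two univariate determinants, raised to the power 1/(k+1)) is an assumed form, and the function names `lagged_matrix`, `gcc`, `gcc_matrix` and `single_linkage` are hypothetical. The choice of single linkage as the clustering algorithm is likewise an assumption made for brevity.

```python
import numpy as np

def lagged_matrix(x, k):
    """Columns are x_t, x_{t-1}, ..., x_{t-k}; the result has T-k rows."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    return np.column_stack([x[k - j : T - j] for j in range(k + 1)])

def gcc(x, y, k):
    """Determinant-based similarity in [0, 1] (assumed normalisation)."""
    X, Y = lagged_matrix(x, k), lagged_matrix(y, k)
    # Correlation matrices up to lag k: univariate blocks and the joint one.
    R_xx = np.atleast_2d(np.corrcoef(X, rowvar=False))
    R_yy = np.atleast_2d(np.corrcoef(Y, rowvar=False))
    R_joint = np.corrcoef(np.hstack([X, Y]), rowvar=False)
    ratio = np.linalg.det(R_joint) / (np.linalg.det(R_xx) * np.linalg.det(R_yy))
    ratio = min(max(ratio, 0.0), 1.0)  # guard against rounding error
    return 1.0 - ratio ** (1.0 / (k + 1))

def gcc_matrix(series, k):
    """Symmetric matrix of pairwise similarities."""
    n = len(series)
    S = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            S[i, j] = S[j, i] = gcc(series[i], series[j], k)
    return S

def single_linkage(D, n_clusters):
    """Naive agglomerative single linkage on a dissimilarity matrix D."""
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(D[i, j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # b > a, so index a stays valid
    labels = np.empty(len(D), dtype=int)
    for c, members in enumerate(clusters):
        labels[members] = c
    return labels

def ar1(rng, T, phi):
    """Simulate an AR(1) path with standard normal innovations."""
    f = np.zeros(T)
    for t in range(1, T):
        f[t] = phi * f[t - 1] + rng.standard_normal()
    return f

# Toy data: two groups of three series, each group driven by its own factor.
rng = np.random.default_rng(0)
T, k = 400, 2
f1, f2 = ar1(rng, T, 0.5), ar1(rng, T, 0.5)
series = [f1 + 0.3 * rng.standard_normal(T) for _ in range(3)]
series += [f2 + 0.3 * rng.standard_normal(T) for _ in range(3)]

S = gcc_matrix(series, k)           # similarities
labels = single_linkage(1.0 - S, 2) # cluster on dissimilarities 1 - S
```

With k = 0 this similarity reduces to the squared contemporaneous correlation, which is a convenient sanity check. For larger problems the naive linkage loop would be replaced by a library routine; it is written out here only to keep the sketch dependent on NumPy alone.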
Acknowledgements
This research has been supported by Consejo Superior de Investigaciones Científicas (Grant No. ECO2015-66593-P) of MINECO/FEDER/UE. We thank Professor Tomohiro Ando for making the code available and for kindly answering some questions regarding its implementation.
Cite this article
Alonso, A.M., Peña, D. Clustering time series by linear dependency. Stat Comput 29, 655–676 (2019). https://doi.org/10.1007/s11222-018-9830-6
DOI: https://doi.org/10.1007/s11222-018-9830-6