Clustering time series by linear dependency

Alonso, Andrés M.; Peña, Daniel

doi:10.1007/s11222-018-9830-6

Published: 05 September 2018

Volume 29, pages 655–676, (2019)

Statistics and Computing

Abstract

We present a new way to find clusters in large vectors of time series by using a measure of similarity between two time series, the generalized cross correlation. This measure compares the determinant of the correlation matrix up to some lag k of the bivariate vector with those of the two univariate time series. A matrix of similarities among the series based on this measure is used as input to a clustering algorithm. The procedure is automatic, can be applied to large data sets, and is useful for finding groups in dynamic factor models. The clustering method is illustrated with some Monte Carlo experiments and a real data example.
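The abstract describes the measure only in words, so the following Python sketch is one possible reading of it rather than the authors' implementation. It assumes the similarity takes the form 1 - (|R_joint| / (|R_x| |R_y|))^(1/(k+1)), where R_joint is the sample correlation matrix of the stacked lagged vectors (x_t, ..., x_{t-k}, y_t, ..., y_{t-k}) and R_x, R_y are its diagonal blocks. The 1/(k+1) exponent, the function names (`lagged_matrix`, `gcc`, `gcc_distance_matrix`, `cluster_series`), and the choice of average-linkage hierarchical clustering are all assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def lagged_matrix(x, k):
    """Rows are (x_t, x_{t-1}, ..., x_{t-k}) for t = k, ..., T-1."""
    T = len(x)
    return np.column_stack([x[k - j: T - j] for j in range(k + 1)])


def gcc(x, y, k):
    """Assumed similarity: 1 - (|R_joint| / (|R_x| |R_y|)) ** (1 / (k + 1)).

    Near 0 when the two series are linearly independent up to lag k,
    near 1 when they are strongly linearly dependent.
    """
    X = lagged_matrix(np.asarray(x, dtype=float), k)
    Y = lagged_matrix(np.asarray(y, dtype=float), k)
    # Sample correlation matrix of the 2(k+1) stacked lagged variables.
    R = np.corrcoef(np.hstack([X, Y]), rowvar=False)
    m = k + 1
    # Log-determinants avoid underflow when the matrices are near singular.
    _, logdet_joint = np.linalg.slogdet(R)
    _, logdet_x = np.linalg.slogdet(R[:m, :m])
    _, logdet_y = np.linalg.slogdet(R[m:, m:])
    return 1.0 - np.exp((logdet_joint - logdet_x - logdet_y) / m)


def gcc_distance_matrix(series, k):
    """Pairwise dissimilarities 1 - gcc, with zeros on the diagonal."""
    n = len(series)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = 1.0 - gcc(series[i], series[j], k)
    return D


def cluster_series(series, k, n_clusters):
    """Average-linkage hierarchical clustering on the dissimilarities."""
    D = gcc_distance_matrix(series, k)
    Z = linkage(squareform(D, checks=False), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")
```

Calling `cluster_series(series, k=2, n_clusters=2)` on a list of equal-length one-dimensional arrays returns one integer label per series. Here the lag k and the number of clusters are supplied by the caller; the abstract states that the authors' procedure is automatic, which this sketch does not attempt to reproduce.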

Acknowledgements

This research has been supported by Consejo Superior de Investigaciones Científicas (Grant No. ECO2015-66593-P) of MINECO/FEDER/UE. We thank Professor Tomohiro Ando for making his code available and for kindly answering some questions regarding the implementation.

Author information

Authors and Affiliations

Department of Statistics and Institute Flores de Lemus, Universidad Carlos III de Madrid, Getafe, Spain
Andrés M. Alonso
Department of Statistics and Institute UC3M-BS of Financial Big Data, Universidad Carlos III de Madrid, Getafe, Spain
Daniel Peña

Corresponding author

Correspondence to Andrés M. Alonso.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Alonso, A.M., Peña, D. Clustering time series by linear dependency. Stat Comput 29, 655–676 (2019). https://doi.org/10.1007/s11222-018-9830-6

Received: 25 April 2018
Accepted: 27 August 2018
Published: 05 September 2018
Issue Date: 15 July 2019
DOI: https://doi.org/10.1007/s11222-018-9830-6
