Skip to main content
Log in

Clustering time series by linear dependency

  • Published:
Statistics and Computing Aims and scope Submit manuscript

Abstract

We present a new way to find clusters in large vectors of time series by using a measure of similarity between two time series, the generalized cross correlation. This measure compares the determinant of the correlation matrix until some lag k of the bivariate vector with those of the two univariate time series. A matrix of similarities among the series based on this measure is used as input of a clustering algorithm. The procedure is automatic, can be applied to large data sets and it is useful to find groups in dynamic factor models. The cluster method is illustrated with some Monte Carlo experiments and a real data example.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10

Similar content being viewed by others

References

  • Aghabozorgi, S., Wah, T.Y.: Clustering of large time series data sets. Intell. Data Anal. 18, 793–817 (2014)

    Article  Google Scholar 

  • Aghabozorgi, S., Shirkhorshidi, A.S., Wah, T.Y.: Time-series clustering—a decade review. Inf. Syst. 53, 16–38 (2015)

    Article  Google Scholar 

  • Alonso, A.M., Berrendero, J.R., Hernández, A., Justel, A.: Time series clustering based on forecast densities. Comput. Stat. Data Anal. 51, 762–766 (2006)

    Article  MathSciNet  MATH  Google Scholar 

  • Anderson, T.W.: An Introduction to Multivariate Statistical Analysis, 2nd edn. Wiley, New York (1984)

    MATH  Google Scholar 

  • Ando, T., Bai, J.: Panel data models with grouped factor structure under unknown group membership. J. Appl. Econom. 31, 163–191 (2016)

    Article  MathSciNet  Google Scholar 

  • Ando, T., Bai, J.: Clustering huge number of financial time series: a panel data approach with high-dimensional predictor and factor structures. J. Am. Stat. Assoc. 112, 1182–1198 (2017)

    Article  MathSciNet  Google Scholar 

  • Caiado, J., Crato, N., Peña, D.: A periodogram-based metric for time series classification. Comput. Stat. Data Anal. 50, 2668–2684 (2006)

    Article  MathSciNet  MATH  Google Scholar 

  • Caiado, J., Maharaj, E.A., D’Urso, P.: Time Series Clustering. Handbook of Cluster Analysis. Chapman and Hall/CRC, Boca Raton (2015)

    MATH  Google Scholar 

  • Corduas, M., Piccolo, D.: Time series clustering and classification by the autoregressive metric. Comput. Stat. Data Anal. 52, 1860–1872 (2008)

    Article  MathSciNet  MATH  Google Scholar 

  • Davidson, J.: Stochastic Limit Theory. An Introduction for Econometricians. Oxford University Press, London (1994)

    Book  MATH  Google Scholar 

  • Douzal-Chouakria, A., Nagabhushan, P.N.: Adaptive dis- similarity index for measuring time series proximity. Adv. Data Anal. Classif. 1, 5–21 (2007)

    Article  MathSciNet  MATH  Google Scholar 

  • D’Urso, P., Maharaj, E.A.: Autocorrelation-based fuzzy clustering of time series. Fuzzy Sets Syst. 160, 3565–3589 (2009)

    Article  MathSciNet  Google Scholar 

  • D’Urso, P., Maharaj, E.A., Alonso, A.M.: Fuzzy clustering of time series using extremes. Fuzzy Sets Syst. 318, 56–79 (2017)

    Article  MathSciNet  MATH  Google Scholar 

  • Fruhwirth-Schnatter, S., Kaufmann, S.: Model-based clustering of multiple time series. J. Bus. Econ. Stat. 26, 78–89 (2008)

    Article  MathSciNet  Google Scholar 

  • García-Martos, C., Conejo, A.J.: Price forecasting techniques in power system. In: Webster, J. (ed.) Wiley Encyclopedia of Electrical and Electronics Engineering. Wiley, New York (2013)

    Google Scholar 

  • Golay, X., Kollias, S., Stoll, G., Meier, D., Valavanis, A., Boesiger, P.: A new correlation-based fuzzy logic clustering algorithm for FMRI. Magn. Reson. Med. 40, 249–260 (2005)

    Article  Google Scholar 

  • Granger, C.W., Morris, M.J.: Time series modelling and interpretation. J. R. Stat. Soc. A 139, 246–257 (1976)

    Article  MathSciNet  Google Scholar 

  • Hallin, M., Lippi, M.: Factor models in high-dimensional time series—a time-domain approach. Stoch. Process. Appl. 123, 2678–2695 (2013)

    Article  MathSciNet  MATH  Google Scholar 

  • Hamilton, J.D.: Time Series Analysis. Princeton University Press, New Jersey (1994)

    MATH  Google Scholar 

  • Hannan, E.J.: Multiple Time Series. Wiley, New York (1970)

    Book  MATH  Google Scholar 

  • Hennig, C.: Clustering strategy and method selection. In: Hennig, C., Meila, M., Murtagh, F., Rocci, R. (eds.) Handbook of Cluster Analysis, pp. 703–730. Chapman and Hall/CRC, Boca Raton (2015)

    Chapter  Google Scholar 

  • Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2, 193–218 (1985)

    Article  MATH  Google Scholar 

  • Kakizawa, Y., Shumway, R.H., Taniguchi, M.: Discrimination and clustering for multivariate time series. J. Am. Stat. Assoc. 93, 328–340 (1998)

    Article  MathSciNet  MATH  Google Scholar 

  • Knapp, C., Carter, G.: The generalized correlation method for estimation of time delay. IEEE Trans. Acoust. Speech 24, 320–327 (1976)

    Article  Google Scholar 

  • Koopman, S.J., Ooms, M., Carnero, M.A.: Periodic seasonal Reg-ARFIMA-GARCH models for daily electricity spot prices. J. Am. Stat. Assoc. 102, 16–27 (2007)

    Article  MathSciNet  MATH  Google Scholar 

  • Kullback, S.: Information Theory and Statistics. Dover, New York (1968)

    MATH  Google Scholar 

  • Lafuente-Rego, B., Vilar, J.A.: Clustering of time series using quantile autocovariances. Adv. Data Anal. Classif. 10, 391–415 (2015)

    Article  MathSciNet  Google Scholar 

  • Lam, C., Yao, Q.: Factor modeling for high-dimensional time series: inference for the number of factors. Ann. Stat. 40, 694–726 (2012)

    Article  MathSciNet  MATH  Google Scholar 

  • Larsen, B., Aone, C.: Fast and effective text mining using linear time document clustering. In: Proceedings of the Conference on Knowledge Discovery and Data Mining, pp. 16–22 (1999)

  • Liao, T.W.: Clustering of time series data—a survey. Pattern Recogn. 38, 1857–1874 (2005)

    Article  MATH  Google Scholar 

  • Lütkepohl, H.: Handbook of Matrices. Wiley, New York (1996)

    MATH  Google Scholar 

  • Maharaj, E.A.: Comparison of non-stationary time series in the frequency domain. Comput. Stat. Data Anal 40, 131–141 (2002)

    Article  MathSciNet  MATH  Google Scholar 

  • Maharaj, E.A., D’Urso, P.: Fuzzy clustering of time series in the frequency domain. Inf. Sci. 181, 1187–1211 (2011)

    Article  MATH  Google Scholar 

  • Mahdi, E., McLeod, I.A.: Improved multivariate portmanteau test. J. Time Ser. Anal. 33, 211–222 (2012)

    Article  MathSciNet  MATH  Google Scholar 

  • Meilă, M.: Comparing clusterings—an information based distance. J. Multivar. Anal. 98, 873–895 (2007)

    Article  MathSciNet  MATH  Google Scholar 

  • Montero, P., Vilar, J.: TSclust: an R package for time series clustering. J. Stat. Softw. 62, 1–43 (2014)

    Article  Google Scholar 

  • Pamminger, C., Fruhwirth-Schnatter, S.: Model-based clustering of categorical time series. Bayesian Anal. 2, 345–368 (2010)

    Article  MathSciNet  MATH  Google Scholar 

  • Peña, D., Box, G.E.P.: Identifying a simplifying structure in time series. J. Am. Stat. Assoc. 82, 836–843 (1987)

    MathSciNet  MATH  Google Scholar 

  • Peña, D., Rodríguez, J.: A powerful portmanteau test of lack of test for time series. J. Am. Stat. Assoc. 97, 601–610 (2002)

    Article  MATH  Google Scholar 

  • Peña, D., Rodríguez, J.: Descriptive measures of multivariate scatter and linear dependence. J. Multivar. Anal. 85, 361–374 (2003)

    Article  MathSciNet  MATH  Google Scholar 

  • Pértega, S., Vilar, J.A.: Comparing several parametric and nonparametric approaches to time series clustering: a simulation study. J. Classif. 27, 333–362 (2010)

    Article  MathSciNet  MATH  Google Scholar 

  • Piccolo, D.: A distance measure for classifying ARMA models. J. Time Ser. Anal. 2, 153–163 (1990)

    Article  MATH  Google Scholar 

  • Robbins, M.W., Fisher, T.J.: Cross-correlation matrices for tests of independence and causality between two multivariate time series. J. Bus. Econ. Stat. 33, 459–473 (2015)

    Article  MathSciNet  Google Scholar 

  • Rousseeuw, P.J.: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Comput. Appl. Math. 20, 53–65 (1987)

    Article  MATH  Google Scholar 

  • Sadahiro, Y., Kobayashi, T.: Exploratory analysis of time series data: detection of partial similarities, clustering, and visualization. Comput. Environ. Urban 45, 24–33 (2014)

    Article  Google Scholar 

  • Scotto, M.G., Barbosa, S.M., Alonso, A.M.: Extreme value and cluster analysis of European daily temperature series. J. Appl. Stat. 38, 2793–2804 (2011)

    Article  MathSciNet  Google Scholar 

  • Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58, 267–288 (1996)

    MathSciNet  MATH  Google Scholar 

  • Tibshirani, R., Walther, G., Hastie, T.: Estimating the number of clusters in a data set via the gap statistic. J. R. Stat. Soc. B 63, 411–423 (2001)

    Article  MathSciNet  MATH  Google Scholar 

  • Vilar-Fernández, J.A., Alonso, A.M., Vilar-Fernández, J.M.: Nonlinear time series clustering based on nonparametric forecast densities. Comput. Stat. Data Anal. 54, 2850–2865 (2010)

    Article  MATH  Google Scholar 

  • Vilar, J.A., Lafuente-Rego, B., D’Urso, P.: Quantile autocovariances: a powerful tool for hard and soft partitional clustering of time series. Fuzzy Sets Syst. 340, 38–72 (2018)

    Article  MathSciNet  MATH  Google Scholar 

  • Wang, Y., Tsay, R.S., Ledolter, J., Shrestha, K.M.: Forecasting simultaneously high-dimensional time series: a robust model-based clustering approach. J. Forecast. 32, 673–684 (2013)

    Article  MathSciNet  MATH  Google Scholar 

  • Xiong, Y., Yeung, D.: Time series clustering with ARMA mixtures. Pattern Recogn. 37, 1675–1689 (2004)

    Article  MATH  Google Scholar 

  • Zhang, X., Liu, J., Du, Y., Lv, T.: A novel clustering method on time series data. Expert Syst. Appl. 38, 11891–11900 (2011)

    Article  Google Scholar 

  • Zhang, T.: Clustering high-dimensional time series based on parallelism. J. Am. Stat. Assoc. 108, 577–588 (2013)

    Article  MathSciNet  MATH  Google Scholar 

Download references

Acknowledgements

This research has been supported by Consejo Superior de Investigaciones Científicas (Grant No. ECO2015-66593-P) of MINECO/FEDER/UE. We thanks to professor Tomohiro Ando for making available their code and kindly answer some questions regarding the implementation.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Andrés M. Alonso.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Alonso, A.M., Peña, D. Clustering time series by linear dependency. Stat Comput 29, 655–676 (2019). https://doi.org/10.1007/s11222-018-9830-6

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11222-018-9830-6

Keywords

Navigation