Clustering time series by linear dependency

Abstract

We present a new way to find clusters in large vectors of time series using a measure of similarity between two time series, the generalized cross correlation. This measure compares the determinant of the correlation matrix up to some lag k of the bivariate vector with the determinants of the corresponding matrices of the two univariate time series. A matrix of similarities among the series based on this measure is used as input to a clustering algorithm. The procedure is automatic, can be applied to large data sets, and is useful for finding groups in dynamic factor models. The clustering method is illustrated with some Monte Carlo experiments and a real data example.
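The idea in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes that the measure takes the form 1 - (|R_xy| / (|R_xx| |R_yy|))^(1/(k+1)), where R_xy is the sample correlation matrix of the stacked lagged vectors of both series and R_xx, R_yy are its univariate blocks; the exponent normalization, the estimator of the correlation matrices, the helper names (`lagged_matrix`, `gcc`, `gcc_matrix`, `threshold_groups`), the simulated data and the cut value 0.3 are all assumptions made for this example. The grouping step is a simple threshold rule (connected components of the similarity graph), standing in for whatever clustering algorithm is actually used.

```python
import numpy as np

def lagged_matrix(x, k):
    """Columns are x_t, x_{t-1}, ..., x_{t-k}; rows are the usable time points."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    return np.column_stack([x[k - j: T - j] for j in range(k + 1)])

def gcc(x, y, k):
    """Assumed similarity: 1 - (|R_xy| / (|R_xx| |R_yy|))^(1/(k+1))."""
    p = k + 1
    Z = np.hstack([lagged_matrix(x, k), lagged_matrix(y, k)])
    R = np.corrcoef(Z, rowvar=False)          # joint 2(k+1) x 2(k+1) matrix
    det_xx = np.linalg.det(R[:p, :p])         # univariate block for x
    det_yy = np.linalg.det(R[p:, p:])         # univariate block for y
    ratio = np.linalg.det(R) / (det_xx * det_yy)
    ratio = float(np.clip(ratio, 0.0, 1.0))   # guard against rounding error
    return 1.0 - ratio ** (1.0 / p)

def gcc_matrix(series, k):
    """Symmetric matrix of pairwise similarities (diagonal set to 1)."""
    n = len(series)
    S = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            S[i, j] = S[j, i] = gcc(series[i], series[j], k)
    return S

def threshold_groups(S, cut):
    """Label connected components of the graph with edges where S >= cut."""
    n = S.shape[0]
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and S[i, j] >= cut:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Simulated data (hypothetical): two groups of three series,
# each group driven by its own common factor plus idiosyncratic noise.
rng = np.random.default_rng(0)
T = 500
factors = rng.standard_normal((2, T))
series = [factors[g] + 0.5 * rng.standard_normal(T)
          for g in (0, 0, 0, 1, 1, 1)]

S = gcc_matrix(series, k=1)
labels = threshold_groups(S, cut=0.3)
print(np.round(S, 2))
print(labels)
```

In this sketch, series that share a factor obtain a similarity well above the cut, while series from different groups stay close to zero, so the threshold rule separates the two groups. The dissimilarity matrix `1 - S` could equally be passed to a hierarchical clustering routine.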

Acknowledgements

This research has been supported by Consejo Superior de Investigaciones Científicas (Grant No. ECO2015-66593-P) of MINECO/FEDER/UE. We thank Professor Tomohiro Ando for making the code available and for kindly answering some questions regarding the implementation.

Author information

Corresponding author

Correspondence to Andrés M. Alonso.

About this article

Cite this article

Alonso, A.M., Peña, D. Clustering time series by linear dependency. Stat Comput 29, 655–676 (2019). https://doi.org/10.1007/s11222-018-9830-6

Keywords

  • Unsupervised learning
  • Dynamic factor models
  • Correlation matrix
  • Correlation coefficient