
Statistics and Computing, Volume 29, Issue 4, pp 655–676

Clustering time series by linear dependency

  • Andrés M. Alonso
  • Daniel Peña
Article

Abstract

We present a new way to find clusters in large vectors of time series using a measure of similarity between two time series, the generalized cross correlation. This measure compares the determinant of the correlation matrix of the bivariate vector, up to some lag k, with the determinants of the corresponding matrices for the two univariate time series. A matrix of similarities among the series based on this measure is used as input to a clustering algorithm. The procedure is automatic, can be applied to large data sets, and is useful for finding groups in dynamic factor models. The clustering method is illustrated with Monte Carlo experiments and a real data example.
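The two steps described in the abstract (a pairwise similarity built from determinants of lagged correlation matrices, then clustering on the resulting matrix) can be sketched as follows. This is a minimal illustration, not the authors' code. The abstract does not state the formula, so the form 1 - (|R_yx| / (|R_xx| |R_yy|))^(1/(k+1)) used in `gcc` is an assumption, as are the helper names, the simulated data, the lag k = 2, and the choice of average linkage.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def lagged_matrix(x, k):
    """Columns are x_t, x_{t-1}, ..., x_{t-k}; the result has len(x) - k rows."""
    T = len(x)
    return np.column_stack([x[k - j : T - j] for j in range(k + 1)])


def gcc(x, y, k):
    """Similarity in [0, 1] from determinants of lagged correlation matrices.

    ASSUMED form: 1 - (det(R_yx) / (det(R_xx) * det(R_yy))) ** (1 / (k + 1)).
    """
    X, Y = lagged_matrix(x, k), lagged_matrix(y, k)
    R = np.corrcoef(np.hstack([X, Y]), rowvar=False)  # 2(k+1) x 2(k+1)
    Rxx, Ryy = R[: k + 1, : k + 1], R[k + 1 :, k + 1 :]
    ratio = np.linalg.det(R) / (np.linalg.det(Rxx) * np.linalg.det(Ryy))
    ratio = np.clip(ratio, 0.0, 1.0)  # guard against floating-point round-off
    return 1.0 - ratio ** (1.0 / (k + 1))


def gcc_matrix(series, k):
    """Symmetric matrix of pairwise similarities, with ones on the diagonal."""
    n = len(series)
    S = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            S[i, j] = S[j, i] = gcc(series[i], series[j], k)
    return S


def ar1(rng, T, phi):
    """Simulate a zero-mean AR(1) process (used here as a toy common factor)."""
    e = rng.standard_normal(T)
    f = np.zeros(T)
    for t in range(1, T):
        f[t] = phi * f[t - 1] + e[t]
    return f


# Toy data (hypothetical): two groups of three series, each group driven by
# its own AR(1) factor plus independent noise.
rng = np.random.default_rng(0)
T, k = 400, 2
f1, f2 = ar1(rng, T, 0.7), ar1(rng, T, 0.7)
series = [f1 + 0.5 * rng.standard_normal(T) for _ in range(3)]
series += [f2 + 0.5 * rng.standard_normal(T) for _ in range(3)]

S = gcc_matrix(series, k)
D = 1.0 - S  # turn similarities into dissimilarities
Z = linkage(squareform(D), method="average")  # linkage choice is an assumption
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

Under this assumed form, the measure is near 0 for series with no linear dependence up to lag k and approaches 1 as the dependence becomes exact, so `1 - S` behaves as a dissimilarity that standard hierarchical clustering routines accept.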

Keywords

  • Unsupervised learning
  • Dynamic factor models
  • Correlation matrix
  • Correlation coefficient

Notes

Acknowledgements

This research has been supported by Consejo Superior de Investigaciones Científicas (Grant No. ECO2015-66593-P) of MINECO/FEDER/UE. We thank Professor Tomohiro Ando for making the code available and for kindly answering some questions regarding the implementation.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Statistics and Institute Flores de Lemus, Universidad Carlos III de Madrid, Getafe, Spain
  2. Department of Statistics and Institute UC3M-BS of Financial Big Data, Universidad Carlos III de Madrid, Getafe, Spain
