Clustering of time series using quantile autocovariances

Lafuente-Rego, Borja; Vilar, José A.

doi:10.1007/s11634-015-0208-8

Regular Article
Published: 26 May 2015

Volume 10, pages 391–415 (2016)

Advances in Data Analysis and Classification

Borja Lafuente-Rego¹ &
José A. Vilar¹

Abstract

Time series clustering is an active research topic with applications in many fields. Unlike conventional clustering of multivariate data, time series evolve over time, so the notion of similarity between objects must take the dynamics of the series into account. In this paper, a distance measure designed to compare quantile autocovariance functions is proposed to perform clustering of time series. Quantile autocovariances provide information about the serial dependence structure at different pairs of quantile levels, require no moment conditions, and allow the identification of dependence features that covariance-based methods are unable to detect. Results from an extensive simulation study show that the proposed metric outperforms or is highly competitive with a range of dissimilarities reported in the literature, and that it is particularly capable of clustering time series generated from a broad range of dependence models. Estimation of the optimal number of clusters is also addressed. For illustrative purposes, our methodology is applied to a real dataset of financial time series.
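
To make the idea concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the usual definition of the lag-l quantile autocovariance at levels (τ, τ′) as the covariance between the indicators 1{X_t ≤ q_τ} and 1{X_{t+l} ≤ q_τ′}, which equals P(X_t ≤ q_τ, X_{t+l} ≤ q_τ′) − ττ′, and it assumes that the dissimilarity is the squared Euclidean distance between vectors of sample quantile autocovariances. The function names, the lags, and the quantile levels are illustrative choices and do not come from the abstract.

```python
import numpy as np


def quantile_autocovariance(x, lag, tau1, tau2):
    """Sample covariance between 1{X_t <= q_tau1} and 1{X_{t+lag} <= q_tau2}."""
    x = np.asarray(x, dtype=float)
    if lag < 1 or lag >= len(x):
        raise ValueError("lag must satisfy 1 <= lag < len(x)")
    # Empirical quantiles of the whole series (assumed stationary).
    q1, q2 = np.quantile(x, [tau1, tau2])
    head = x[:-lag] <= q1   # indicator at time t
    tail = x[lag:] <= q2    # indicator at time t + lag
    # Joint exceedance frequency minus the product of the nominal levels.
    return np.mean(head & tail) - tau1 * tau2


def qaf_features(x, lags=(1,), taus=(0.1, 0.5, 0.9)):
    """Stack the estimates over all lags and all pairs of quantile levels."""
    return np.array([
        quantile_autocovariance(x, lag, a, b)
        for lag in lags for a in taus for b in taus
    ])


def qaf_distance(x, y, lags=(1,), taus=(0.1, 0.5, 0.9)):
    """Squared Euclidean distance between the two feature vectors (assumed form)."""
    diff = qaf_features(x, lags, taus) - qaf_features(y, lags, taus)
    return float(np.sum(diff ** 2))


# Illustrative data: white noise versus a simulated AR(1) series.
rng = np.random.default_rng(0)
noise = rng.standard_normal(500)
ar1 = np.zeros(500)
for t in range(1, 500):
    ar1[t] = 0.8 * ar1[t - 1] + rng.standard_normal()

print(qaf_distance(noise, ar1))    # distance between two different dynamics
print(qaf_distance(noise, noise))  # identical series give zero distance
```

Because the features depend only on indicator events, no moments of the series need to exist, which is consistent with the moment-free property stated in the abstract. A matrix of such pairwise distances could then be passed to any standard clustering routine.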

Notes

http://www.bancaditalia.it/banca_centrale/cambi/rif;internal&action=_set-language.action?LANGUAGE=en.

Acknowledgments

The authors wish to thank the three anonymous reviewers and the Editors for their helpful comments and valuable suggestions, which have allowed us to improve the quality of this work. This research was supported by the Spanish grants MTM2011-22392 and MTM2014-52876-R from the Ministerio de Economía y Competitividad.

Author information

Authors and Affiliations

Research Group on Modeling, Optimization and Statistical Inference (MODES), Department of Mathematics, Computer Science Faculty, University of A Coruña, 15071, A Coruña, Spain
Borja Lafuente-Rego & José A. Vilar

Corresponding author

Correspondence to Borja Lafuente-Rego.

About this article

Cite this article

Lafuente-Rego, B., Vilar, J.A. Clustering of time series using quantile autocovariances. Adv Data Anal Classif 10, 391–415 (2016). https://doi.org/10.1007/s11634-015-0208-8

Received: 15 September 2014
Revised: 23 April 2015
Accepted: 10 May 2015
Published: 26 May 2015
Issue Date: September 2016
DOI: https://doi.org/10.1007/s11634-015-0208-8
