Abstract
Robustness to the presence of outliers in time series clustering is addressed. Assuming that the clustering principle is to group realizations of series generated from similar dependence structures, three robust versions of a fuzzy C-medoids model based on comparing sample quantile autocovariances are proposed, following the so-called metric, noise, and trimmed approaches, respectively. Each method achieves its robustness against outliers in a different manner: the metric approach applies a suitable transformation of the distance to smooth the effect of the outliers, the noise approach gathers the outliers into a separate artificial cluster, and the trimmed approach removes a fraction of the time series. All the proposed approaches take advantage of the high capability of the quantile autocovariances to discriminate between independent realizations from a broad range of stationary processes, including linear, non-linear and conditional heteroskedastic models. An extensive simulation study is performed, involving scenarios with different generating models contaminated with outliers. Robustness is evaluated against (i) outliers generated from different generating patterns, and (ii) outliers characterized by isolated, temporary or persistent level changes. The influence of the input parameters required by the different algorithms is also analyzed. Regardless of the models considered, the results show that the proposed robust procedures are able to neutralize the effect of the anomalous series while preserving the true clustering structure, and that they fairly outperform other robust algorithms based on alternative metrics. Two applications to financial data sets illustrate the usefulness of the proposed models.
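As a rough illustration of the features these models compare, the sketch below estimates sample quantile autocovariances and a squared Euclidean distance between the resulting feature vectors. It assumes the usual definition, the covariance between the indicators I(X_t ≤ q_τ) and I(X_{t+l} ≤ q_τ'), which the abstract itself does not spell out. The function names, the default lags and the default probability levels are hypothetical choices made for this example, not the authors' settings.

```python
import numpy as np

def quantile_autocovariances(x, lags=(1, 2), probs=(0.1, 0.5, 0.9)):
    """Sample quantile autocovariances of a series, stacked into one vector.

    Assumed definition: for lag l >= 1 and levels (tau, tau'), the estimate is
    mean(I(x[t] <= q_tau) * I(x[t + l] <= q_tau')) - tau * tau'.
    """
    x = np.asarray(x, dtype=float)
    q = np.quantile(x, probs)                       # empirical quantiles
    ind = (x[None, :] <= q[:, None]).astype(float)  # ind[i, t] = I(x[t] <= q[i])
    feats = []
    for lag in lags:                                # each lag must be >= 1
        joint = ind[:, :-lag] @ ind[:, lag:].T / (x.size - lag)
        feats.append((joint - np.outer(probs, probs)).ravel())
    return np.concatenate(feats)

def qaf_distance(x, y, **kwargs):
    """Squared Euclidean distance between quantile autocovariance vectors."""
    diff = quantile_autocovariances(x, **kwargs) - quantile_autocovariances(y, **kwargs)
    return float(diff @ diff)

# Illustrative use: two white-noise series versus a strongly autocorrelated AR(1).
rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
z = rng.standard_normal(2000)
e = rng.standard_normal(2000)
y = np.empty(2000)
y[0] = e[0]
for t in range(1, 2000):
    y[t] = 0.9 * y[t - 1] + e[t]

d_same = qaf_distance(x, z)   # both white noise: expected to be small
d_diff = qaf_distance(x, y)   # different dependence structures: expected larger
```

A pairwise matrix of such distances is the kind of input a medoid-based fuzzy procedure operates on. The metric, noise, and trimmed variants described above would then differ in how large distances are handled, which this sketch does not implement.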
Acknowledgements
The authors are grateful to anonymous referees whose suggestions and comments helped to enhance this paper. The research carried out by the authors José A. Vilar and Borja Lafuente-Rego has been supported by MINECO Grants MTM2014-52876-R and MTM2017-82724-R and by the Xunta de Galicia (Grupos de Referencia Competitiva ED431C-2016-015 and Centro Singular de Investigación de Galicia ED431G/01), all of them through the European Regional Development Fund (ERDF).
Cite this article
Lafuente-Rego, B., D’Urso, P. & Vilar, J.A. Robust fuzzy clustering based on quantile autocovariances. Stat Papers 61, 2393–2448 (2020). https://doi.org/10.1007/s00362-018-1053-6
DOI: https://doi.org/10.1007/s00362-018-1053-6