Robust fuzzy clustering based on quantile autocovariances

  • Regular Article
Statistical Papers

Abstract

Robustness to the presence of outliers in time series clustering is addressed. Assuming that the clustering principle is to group realizations of series generated from similar dependence structures, three robust versions of a fuzzy C-medoids model based on comparing sample quantile autocovariances are proposed by considering, respectively, the so-called metric, noise, and trimmed approaches. Each method achieves its robustness against outliers in a different manner. The metric approach applies a suitable transformation of the distance aimed at smoothing the effect of the outliers, the noise approach gathers the outliers into a separate artificial cluster, and the trimmed approach removes a fraction of the time series. All the proposed approaches take advantage of the high capability of the quantile autocovariances to discriminate between independent realizations from a broad range of stationary processes, including linear, non-linear and conditionally heteroskedastic models. An extensive simulation study is performed, involving scenarios with different generating models that are contaminated with outliers. Robustness is evaluated against (i) outliers generated from different generating patterns, and (ii) outliers characterized by isolated, temporary or persistent level changes. The influence of the input parameters required by the different algorithms is also analyzed. Regardless of the models considered, the results show that the proposed robust procedures are able to neutralize the effect of the anomalous series while preserving the true clustering structure, and that they fairly outperform other robust algorithms based on alternative metrics. Two applications to financial data sets illustrate the usefulness of the proposed models.
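To make the notion of comparing sample quantile autocovariances concrete, the following is a minimal sketch and not the authors' implementation. It assumes the usual estimator, in which the quantile autocovariance at lag l for probability levels (τ, τ') is the sample covariance between the indicators I(X_t ≤ q_τ) and I(X_{t+l} ≤ q_τ'), and it assumes that two series are compared through the squared Euclidean distance between their vectors of estimates. The function names, the default lags and the default probability levels are hypothetical choices made for illustration only.

```python
import numpy as np


def quantile_autocovariances(x, lags=(1, 2), probs=(0.1, 0.5, 0.9)):
    """Vector of sample quantile autocovariances of a univariate series.

    For each lag l and each pair (tau, tau') of probability levels, estimate
    P(X_t <= q_tau, X_{t+l} <= q_tau') - tau * tau',
    where q_tau is the empirical quantile of the series (assumed estimator).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    p = np.asarray(probs, dtype=float)
    q = np.quantile(x, p)  # empirical quantiles of the series
    # ind[i, t] = 1.0 if x[t] <= q[i], else 0.0
    ind = (x[None, :] <= q[:, None]).astype(float)
    feats = []
    for lag in lags:
        # joint[i, j] estimates P(X_t <= q_i, X_{t+lag} <= q_j)
        joint = ind[:, : n - lag] @ ind[:, lag:].T / (n - lag)
        feats.append((joint - np.outer(p, p)).ravel())
    return np.concatenate(feats)


def qaf_distance(x, y, **kwargs):
    """Squared Euclidean distance between quantile autocovariance vectors."""
    diff = quantile_autocovariances(x, **kwargs) - quantile_autocovariances(y, **kwargs)
    return float(diff @ diff)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(500)
    # AR(1) series with coefficient 0.8, simulated only for illustration
    ar = np.zeros(500)
    for t in range(1, 500):
        ar[t] = 0.8 * ar[t - 1] + rng.standard_normal()
    print("white noise vs AR(1):", qaf_distance(noise, ar))
    print("white noise vs itself:", qaf_distance(noise, noise))
```

A pairwise matrix of such distances is the kind of input a fuzzy C-medoids procedure would work with. The metric, noise and trimmed variants described in the abstract would then modify how those distances enter the objective function, which this sketch does not attempt to reproduce.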

Notes

  1. https://www.bancaditalia.it/compiti/operazioni-cambi/archivio-cambi/index.html.

  2. https://finance.yahoo.com/.


Acknowledgements

The authors are grateful to anonymous referees whose suggestions and comments helped to enhance this paper. The research carried out by the authors José A. Vilar and Borja Lafuente-Rego has been supported by MINECO Grants MTM2014-52876-R and MTM2017-82724-R and by the Xunta de Galicia (Grupos de Referencia Competitiva ED431C-2016-015 and Centro Singular de Investigación de Galicia ED431G/01), all of them through the European Regional Development Fund (ERDF).

Corresponding author

Correspondence to B. Lafuente-Rego.

Cite this article

Lafuente-Rego, B., D’Urso, P. & Vilar, J.A. Robust fuzzy clustering based on quantile autocovariances. Stat Papers 61, 2393–2448 (2020). https://doi.org/10.1007/s00362-018-1053-6

  • DOI: https://doi.org/10.1007/s00362-018-1053-6
