Robust fuzzy clustering based on quantile autocovariances

Lafuente-Rego, B.; D’Urso, P.; Vilar, J. A.

doi:10.1007/s00362-018-1053-6

Regular Article
Published: 25 October 2018

Volume 61, pages 2393–2448 (2020)
Statistical Papers

B. Lafuente-Rego¹,
P. D’Urso² &
J. A. Vilar¹

Abstract

Robustness to the presence of outliers in time series clustering is addressed. Assuming that the clustering principle is to group realizations of series generated from similar dependence structures, three robust versions of a fuzzy C-medoids model based on comparing sample quantile autocovariances are proposed by considering, respectively, the so-called metric, noise, and trimmed approaches. Each method achieves robustness against outliers in a different manner. The metric approach applies a suitable transformation of the distance aimed at smoothing the effect of the outliers, the noise approach gathers the outliers into a separate artificial cluster, and the trimmed approach removes a fraction of the time series. All the proposed approaches take advantage of the high capability of the quantile autocovariances to discriminate between independent realizations from a broad range of stationary processes, including linear, non-linear and conditionally heteroskedastic models. An extensive simulation study involving scenarios with different generating models and contaminated with outliers is performed. Robustness is evaluated against (i) outliers generated from different generating patterns, and (ii) outliers characterized by isolated, temporary or persistent level changes. The influence of the input parameters required by the different algorithms is analyzed. Regardless of the models considered, the results show that the proposed robust procedures are able to neutralize the effect of the anomalous series while preserving the true clustering structure, and that they outperform other robust algorithms based on alternative metrics. Two applications to financial data sets illustrate the usefulness of the proposed models.
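As a rough illustration of the dissimilarity underlying these models, the sketch below computes sample quantile autocovariances and compares two series through them. It assumes the usual definition, in which the quantile autocovariance at lag l for probability levels (τ, τ') is the covariance between the indicators I(X_t ≤ q_τ) and I(X_{t+l} ≤ q_τ'), and it assumes a squared Euclidean distance between the resulting feature vectors. The function names, the default lags and probability levels, and the exponential transformation shown for the metric approach are illustrative assumptions; the abstract does not specify them.

```python
import numpy as np


def qaf_features(x, lags=(1, 2), probs=(0.1, 0.5, 0.9)):
    """Vector of sample quantile autocovariances of one series.

    For each lag l and each pair (tau, tau') the entry estimates
    P(X_t <= q_tau, X_{t+l} <= q_tau') - tau * tau'.
    Default lags and probability levels are illustrative assumptions.
    """
    x = np.asarray(x, dtype=float)
    p = np.asarray(probs, dtype=float)
    q = np.quantile(x, p)                        # empirical quantiles
    below = (x[:, None] <= q[None, :]).astype(float)  # indicators, shape (T, r)
    feats = []
    for lag in lags:
        a = below[:-lag]                         # I(X_t <= q_tau)
        b = below[lag:]                          # I(X_{t+lag} <= q_tau')
        joint = a.T @ b / (len(x) - lag)         # estimated joint probabilities
        feats.append((joint - np.outer(p, p)).ravel())
    return np.concatenate(feats)


def qaf_sq_distance(x, y, lags=(1, 2), probs=(0.1, 0.5, 0.9)):
    """Squared Euclidean distance between the two feature vectors."""
    d = qaf_features(x, lags, probs) - qaf_features(y, lags, probs)
    return float(d @ d)


def exp_transform(d2, beta):
    """Bounded transformation 1 - exp(-beta * d2).

    Assumption: shown only as one common way to damp the influence of
    very distant (outlying) series; not taken from the abstract.
    """
    return 1.0 - np.exp(-beta * np.asarray(d2, dtype=float))
```

Under these assumptions, a white noise series yields features close to zero, while a strongly autocorrelated series yields clearly positive values at the central quantile pair, so the distance separates the two dependence structures. A fuzzy C-medoids procedure would then operate on the pairwise distance matrix, with the transformed distance replacing the raw one in the metric variant.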

Acknowledgements

The authors are grateful to anonymous referees whose suggestions and comments helped to enhance this paper. The research carried out by the authors José A. Vilar and Borja Lafuente-Rego has been supported by MINECO Grants MTM2014-52876-R and MTM2017-82724-R and by the Xunta de Galicia (Grupos de Referencia Competitiva ED431C-2016-015 and Centro Singular de Investigación de Galicia ED431G/01), all of them through the European Regional Development Fund (ERDF).

Author information

Authors and Affiliations

Research Group on Modeling, Optimization and Statistical Inference (MODES), Department of Mathematics, Computer Science Faculty, University of A Coruña, 15071, A Coruña, Spain
B. Lafuente-Rego & J. A. Vilar
Dipartimento di Scienze Sociali ed Economiche, Sapienza University of Rome, Pza. Aldo Moro, 5, 00185, Rome, Italy
P. D’Urso

Corresponding author

Correspondence to B. Lafuente-Rego.

About this article

Cite this article

Lafuente-Rego, B., D’Urso, P. & Vilar, J.A. Robust fuzzy clustering based on quantile autocovariances. Stat Papers 61, 2393–2448 (2020). https://doi.org/10.1007/s00362-018-1053-6

Received: 17 October 2017
Revised: 26 September 2018
Published: 25 October 2018
Issue Date: December 2020
DOI: https://doi.org/10.1007/s00362-018-1053-6
