Advances in Data Analysis and Classification

Volume 13, Issue 1, pp 201–225

Robust clustering for functional data based on trimming and constraints

  • Diego Rivera-García
  • Luis A. García-Escudero
  • Agustín Mayo-Iscar
  • Joaquín Ortega
Regular Article

Abstract

Many clustering algorithms for data consisting of curves or functions have recently been proposed. However, contamination in the sample of curves can degrade the performance of most of them. In this work we propose a robust, model-based clustering method that relies on an approximation to the “density function” for functional data. The robustness follows from the joint application of data-driven trimming, which reduces the effect of contaminated observations, and constraints on the variances, which avoid spurious clusters in the solution. The algorithm is designed to perform clustering and outlier detection simultaneously by maximizing a trimmed “pseudo” likelihood. The proposed method has been evaluated and compared with other existing methods through a simulation study. The proposed methodology performs better than the compared methods when a fraction of contaminating curves is added to a non-contaminated sample. Finally, an application to a real data set previously considered in the literature is given.
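
To make the combination of trimming and variance constraints more concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes each curve has already been reduced to a finite vector of scores (for example, functional principal component scores), models each cluster as a Gaussian with diagonal covariance and equal weight, and uses a crude clipping rule in place of a proper constrained variance estimate. The names `clip_variances` and `trimmed_cluster` and the parameters `alpha` (trimming fraction) and `c` (bound on the variance ratio) are hypothetical.

```python
import numpy as np

def clip_variances(var, c):
    """Crude stand-in for a variance constraint: enforce max(var) / min(var) <= c."""
    # Assumption: small variances are simply raised to max(var) / c.
    floor = var.max() / c
    return np.maximum(var, floor)

def trimmed_cluster(scores, k, alpha, c, n_iter=20, seed=0):
    """Alternate hard assignment, trimming and constrained updates on score vectors."""
    rng = np.random.default_rng(seed)
    n, p = scores.shape
    n_keep = n - int(np.floor(alpha * n))      # number of untrimmed curves
    means = scores[rng.choice(n, size=k, replace=False)].copy()
    var = np.ones((k, p))
    labels = np.full(n, -1)
    for _ in range(n_iter):
        # Diagonal Gaussian log-density of every curve under every cluster.
        diff = scores[:, None, :] - means[None, :, :]
        logd = -0.5 * np.sum(np.log(2 * np.pi * var)[None] + diff**2 / var[None], axis=2)
        best = logd.argmax(axis=1)
        best_val = logd.max(axis=1)
        # Trimming: keep the n_keep curves with the largest best-cluster density.
        keep = np.argsort(best_val)[-n_keep:]
        labels = np.full(n, -1)                # -1 marks a trimmed curve
        labels[keep] = best[keep]
        # Update each cluster from its untrimmed members only.
        for g in range(k):
            members = scores[labels == g]
            if len(members) < 2:
                continue                       # leave a near-empty cluster unchanged
            means[g] = members.mean(axis=0)
            var[g] = members.var(axis=0)
        # Constraint: bound the ratio between largest and smallest variance.
        var = clip_variances(np.maximum(var, 1e-12), c)
    return labels
```

Curves labelled `-1` are the trimmed ones and play the role of detected outliers. A small `c` forces the clusters toward similar scatter, while a large `c` leaves the variances nearly unconstrained.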

Keywords

  • Functional data analysis
  • Clustering
  • Robustness
  • Trimming
  • Functional principal components analysis

Mathematics Subject Classification

62G35 62H30 68T10 

Notes

Acknowledgements

We would like to thank the Associate Editor and two anonymous reviewers for their helpful suggestions and comments. This work was partly done while DR and JO visited the Departamento de Estadística e I.O., Universidad de Valladolid, Spain, with support from Conacyt, Mexico (DR as visiting graduate student, JO by Projects 169175 Análisis Estadístico de Olas Marinas, Fase II and 234057 Análisis Espectral, Datos Funcionales y Aplicaciones), CIMAT, A.C. and the Universidad de Valladolid. Their hospitality and support are gratefully acknowledged. Research by LA G-E and A M-I was partially supported by the Spanish Ministerio de Economía y Competitividad, grant MTM2017-86061-C2-1-P, and by Consejería de Educación de la Junta de Castilla y León and FEDER, grant VA005P17.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. CIMAT, Guanajuato, Mexico
  2. Dept. de Estadística e Investigación Operativa, Universidad de Valladolid, Valladolid, Spain
