A divisive clustering method for functional data with special consideration of outliers

Abstract

This paper presents DivClusFD, a new divisive hierarchical method for the unsupervised classification of functional data. Data of this type have the peculiarity that differences among clusters may be caused by changes in level as well as in shape. Different clusters can be separated in different subregions, and there may be no subregion in which all clusters are separated. In each division step, the DivClusFD method explores the functions and their derivatives at several fixed points, seeking the subregion in which the highest number of clusters can be separated. The number of clusters is estimated via the gap statistic. The functions are assigned to the new clusters by combining the k-means algorithm with functional boxplots, which are used to identify functions that have been incorrectly classified because of their atypical local behavior. The DivClusFD method provides the number of clusters, the classification of the observed functions into the clusters, and guidelines that may be used for interpreting the clusters. A simulation study using synthetic data and tests of the performance of the DivClusFD method on real data sets indicate that this method is able to classify functions accurately.
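
The search for the most separating point can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the function names (`kmeans_1d`, `estimate_k`, `most_separating_point`) are hypothetical, curves are taken as lists of values on a common grid, forward differences stand in for derivative estimates, each grid point and each feature is examined separately, and the reassignment of atypical functions via functional boxplots is omitted. The gap statistic follows the usual uniform-reference form with the one-standard-error rule.

```python
import math
import random


def kmeans_1d(values, k, iters=100):
    """Lloyd's algorithm on scalars; returns (labels, within-cluster sum of squares)."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start: centers at evenly spaced quantiles of the data.
    centers = [ordered[min(n - 1, int((j + 0.5) * n / k))] for j in range(k)]
    labels = [0] * n
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: (v - centers[j]) ** 2) for v in values]
        for j in range(k):
            members = [v for v, lab in zip(values, new_labels) if lab == j]
            if members:  # an emptied cluster keeps its previous center
                centers[j] = sum(members) / len(members)
        converged = new_labels == labels
        labels = new_labels
        if converged:
            break
    wss = sum((v - centers[lab]) ** 2 for v, lab in zip(values, labels))
    return labels, wss


def _log_wss(values, k):
    # Floor at a tiny constant so a perfect fit does not produce log(0).
    return math.log(max(kmeans_1d(values, k)[1], 1e-12))


def estimate_k(values, k_max=4, n_ref=20, seed=0):
    """Gap statistic with uniform reference samples over the data range."""
    rng = random.Random(seed)
    lo, hi, n = min(values), max(values), len(values)
    gaps, sds = [], []
    for k in range(1, k_max + 1):
        ref = [_log_wss([rng.uniform(lo, hi) for _ in range(n)], k) for _ in range(n_ref)]
        mean_ref = sum(ref) / n_ref
        sd = math.sqrt(sum((r - mean_ref) ** 2 for r in ref) / n_ref)
        gaps.append(mean_ref - _log_wss(values, k))
        sds.append(sd * math.sqrt(1 + 1 / n_ref))
    # Smallest k with Gap(k) >= Gap(k + 1) - s_{k + 1}; fall back to k_max.
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - sds[k]:
            return k
    return k_max


def most_separating_point(curves, k_max=4):
    """Return (estimated k, grid index, feature name) with the largest estimated k."""
    best = None
    for t in range(len(curves[0]) - 1):
        level = [c[t] for c in curves]
        slope = [c[t + 1] - c[t] for c in curves]  # crude stand-in for a derivative
        for name, feature in (("level", level), ("slope", slope)):
            k = estimate_k(feature, k_max)
            if best is None or k > best[0]:
                best = (k, t, name)  # ties keep the earliest point found
    return best
```

Calling `most_separating_point(curves)` on a list of equally sampled curves returns the grid index and feature at which the gap statistic suggests the most clusters. A fuller treatment would follow this with a k-means split at that point and a boxplot-based check for misassigned functions, as the abstract describes.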

Acknowledgements

We wish to thank the editors and four anonymous referees who have carefully reviewed the paper. Their suggestions and comments have helped us to improve the quality of this paper.

Author information

Corresponding author

Correspondence to Marcela Svarc.

Additional information

This work was supported by the Spanish Agencia Estatal de Investigación (AEI) and Fondo Europeo de Desarrollo Regional (FEDER), Grant CTM2016-79741-R for MICROAIPOLAR project (to A. Justel and M. Svarc) and Spanish Ministerio de Economía y Competitividad, Grant CTM2011-28736 (to A. Justel).

About this article

Cite this article

Justel, A., Svarc, M. A divisive clustering method for functional data with special consideration of outliers. Adv Data Anal Classif 12, 637–656 (2018). https://doi.org/10.1007/s11634-017-0290-1

Keywords

  • Hierarchical clustering
  • Functional boxplot
  • Gap statistic

Mathematics Subject Classification

  • 62H30