
Clustering multivariate functional data in group-specific functional subspaces

  • Amandine Schmutz
  • Julien Jacques
  • Charles Bouveyron
  • Laurence Chèze
  • Pauline Martin
Original paper

Abstract

With the emergence of digital sensors in many aspects of everyday life, there is an increasing need to analyze multivariate functional data. This work focuses on clustering such data in order to ease their modeling and understanding. To this end, a novel clustering technique for multivariate functional data is presented. The method is based on a functional latent mixture model that fits the data into group-specific functional subspaces through a multivariate functional principal component analysis. A family of parsimonious models is obtained by constraining model parameters within and between groups. An Expectation-Maximization (EM) algorithm is proposed for model inference, and the hyper-parameters are chosen through model selection. Numerical experiments on simulated datasets show that the proposed methodology performs well compared with existing methods. The algorithm is then applied to the analysis of pollution in French cities over one year.
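To make the idea of "group-specific subspaces fitted by EM" more concrete, here is a minimal sketch in Python/NumPy. It is not the authors' implementation. It assumes that each multivariate curve has been discretized on a common grid and its variables concatenated into one vector (instead of the basis expansion and multivariate functional PCA used in the paper), that every group uses the same fixed intrinsic dimension `d`, and that the number of groups is compared with BIC. The function name `hddc_fit`, the farthest-point initialization and the simulated curves are all hypothetical choices made for illustration. Within each group k, the covariance is modeled as a d-dimensional principal subspace with group-specific variances plus isotropic residual noise b_k outside that subspace.

```python
import numpy as np


def hddc_fit(X, K, d=2, n_iter=50, tol=1e-6):
    """Hypothetical sketch: EM for a Gaussian mixture whose covariances are
    Sigma_k = Q_k diag(a_k1..a_kd) Q_k^T + b_k (I - Q_k Q_k^T).

    X: (n, p) array; each row is one multivariate curve discretized on a common
    grid with its variables concatenated (an assumption of this sketch).
    """
    n, p = X.shape
    # Farthest-point initialisation of K centres, then hard nearest-centre labels.
    centres = [X[0]]
    for _ in range(1, K):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(X[np.argmax(dist)])
    dist = np.stack([((X - c) ** 2).sum(axis=1) for c in centres], axis=1)
    T = np.eye(K)[np.argmin(dist, axis=1)]  # (n, K) responsibilities
    prev = -np.inf
    for _ in range(n_iter):
        # M-step: proportions, means, and group-specific subspaces.
        nk = T.sum(axis=0)  # assumes no group becomes empty
        pi = nk / n
        logdens = np.empty((n, K))
        for k in range(K):
            mu = T[:, k] @ X / nk[k]
            Xc = X - mu
            W = (T[:, k, None] * Xc).T @ Xc / nk[k]  # weighted covariance
            lam, Q = np.linalg.eigh(W)  # eigenvalues in ascending order
            lam, Q = lam[::-1], Q[:, ::-1]
            a = lam[:d]  # variances inside the group subspace
            b = (np.trace(W) - a.sum()) / (p - d)  # residual noise variance
            # Gaussian log-density using the subspace structure of Sigma_k.
            proj = Xc @ Q[:, :d]
            resid = (Xc ** 2).sum(axis=1) - (proj ** 2).sum(axis=1)
            logdens[:, k] = np.log(pi[k]) - 0.5 * (
                (proj ** 2 / a).sum(axis=1) + resid / b
                + np.log(a).sum() + (p - d) * np.log(b) + p * np.log(2 * np.pi))
        # E-step: normalise with log-sum-exp for numerical stability.
        mx = logdens.max(axis=1, keepdims=True)
        lse = mx[:, 0] + np.log(np.exp(logdens - mx).sum(axis=1))
        T = np.exp(logdens - lse[:, None])
        loglik = lse.sum()
        if abs(loglik - prev) < tol * abs(loglik):
            break
        prev = loglik
    # Free parameters: proportions, means, orientations, a_kj and b_k.
    m = (K - 1) + K * p + K * (d * (p - (d + 1) / 2) + d + 1)
    bic = 2 * loglik - m * np.log(n)  # larger is better
    return T.argmax(axis=1), T, loglik, bic


# Hypothetical simulated data: two groups of bivariate curves on 30 grid points.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 30)


def simulate(sign, n):
    xi = rng.normal(size=(n, 1))
    x1 = sign * np.sin(2 * np.pi * t) + 0.3 * xi * np.cos(2 * np.pi * t)
    x1 = x1 + 0.1 * rng.normal(size=(n, t.size))
    x2 = sign * t + 0.1 * rng.normal(size=(n, t.size))
    return np.hstack([x1, x2])  # concatenate the two variables


X = np.vstack([simulate(+1, 50), simulate(-1, 50)])
truth = np.repeat([0, 1], 50)
labels, T, loglik, bic = hddc_fit(X, K=2)
agreement = max((labels == truth).mean(), (labels != truth).mean())
```

To mimic the model-selection step in this simplified setting, one could call `hddc_fit` for several values of `K` (and `d`) and keep the fit with the largest `bic`; the paper's own criteria and parsimonious constraints may differ.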

Keywords

Multivariate functional curves; Multivariate functional principal component analysis; Model-based clustering; EM algorithm

Notes

Compliance with ethical standards

Conflicts of interest

The authors declare that they have no conflict of interest.


Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Authors and Affiliations

  1. Lim France, Nontron, France
  2. CWD-VetLab, Ecole Nationale Vétérinaire d’Alfort, Maisons-Alfort, France
  3. ERIC EA3083, Université de Lyon, Lyon 2, Lyon, France
  4. Inria, CNRS, LJAD, Maasai team, Université Côte d’Azur, Nice, France
  5. LBMC UMR T9406, Université de Lyon, Lyon 1, Lyon, France
