Advances in Data Analysis and Classification, Volume 5, Issue 4, pp 281–300

Model-based clustering of time series in group-specific functional subspaces

  • Charles Bouveyron
  • Julien Jacques
Regular Article

Abstract

This work develops a general procedure for clustering functional data by adapting high dimensional data clustering (HDDC), a method originally proposed in the multivariate context. The resulting clustering method, called funHDDC, is based on a functional latent mixture model which fits the functional data in group-specific functional subspaces. Constraining the model parameters within and between groups yields a family of parsimonious models that can be fitted to a variety of situations. An estimation procedure based on the EM algorithm is proposed for determining both the model parameters and the group-specific functional subspaces. Experiments on real-world datasets show that the proposed approach performs as well as or better than classical two-step clustering methods, while providing useful interpretations of the groups and avoiding the difficult choice of a discretization technique. In particular, funHDDC appears to consistently outperform HDDC applied to spline coefficients.
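
For orientation, the kind of two-step baseline the abstract compares against can be sketched as follows: each sampled curve is first summarized by basis coefficients, and the coefficients are then clustered with a standard multivariate method. This is a minimal illustration and not funHDDC itself. The Fourier basis (the abstract mentions spline coefficients), the plain k-means step, the farthest-point initialisation, the synthetic curves, and all function names are assumptions made for this sketch.

```python
import numpy as np


def fourier_basis(t, n_harmonics):
    """Design matrix on [0, 1]: a constant plus sine/cosine pairs (assumed basis)."""
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        cols.append(np.sin(2 * np.pi * k * t))
        cols.append(np.cos(2 * np.pi * k * t))
    return np.column_stack(cols)


def basis_coefficients(curves, t, n_harmonics=3):
    """Step 1: least-squares projection of each sampled curve onto the basis."""
    B = fourier_basis(t, n_harmonics)                     # (n_points, n_basis)
    coef, *_ = np.linalg.lstsq(B, curves.T, rcond=None)   # (n_basis, n_curves)
    return coef.T                                         # (n_curves, n_basis)


def kmeans(X, k, n_iter=50):
    """Step 2: plain k-means on the coefficient vectors."""
    # Deterministic farthest-point initialisation (a choice made for this sketch).
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each point to its nearest center, then recompute the centers.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


# Synthetic example: two groups of noisy curves observed on a common grid.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
group_a = np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal((10, 50))
group_b = np.cos(2 * np.pi * t) + 0.1 * rng.standard_normal((10, 50))
curves = np.vstack([group_a, group_b])

coefs = basis_coefficients(curves, t)   # one coefficient vector per curve
labels = kmeans(coefs, k=2)             # cluster labels for the 20 curves
```

In such a pipeline the result depends on the basis and on the number of coefficients chosen in the first step, which is the kind of upstream choice the abstract says the proposed approach avoids.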

Keywords

Functional data; Time series clustering; Model-based clustering; Group-specific functional subspaces; Functional PCA

Mathematics Subject Classification (2010)

62H30; 62M10; 62F99

Copyright information

© Springer-Verlag 2011

Authors and Affiliations

  1. Laboratoire SAMM, EA 4543, Université Paris 1 Panthéon-Sorbonne, Paris, France
  2. Laboratoire Paul Painlevé, UMR CNRS 8524, INRIA Lille-Nord Europe, U.F.R. de Mathématiques, Université Lille 1, Villeneuve d’Ascq Cedex, France
