Model-based clustering of time series in group-specific functional subspaces

Abstract

This work develops a general procedure for clustering functional data by adapting high-dimensional data clustering (HDDC), a clustering method originally proposed in the multivariate context. The resulting method, called funHDDC, is based on a functional latent mixture model which fits the functional data in group-specific functional subspaces. Constraining the model parameters within and between groups yields a family of parsimonious models that can be fitted to a variety of situations. An estimation procedure based on the EM algorithm is proposed to determine both the model parameters and the group-specific functional subspaces. Experiments on real-world datasets show that the proposed approach performs better than, or similarly to, classical two-step clustering methods, while providing useful interpretations of the groups and avoiding the difficult choice of a discretization technique. In particular, funHDDC appears to always outperform HDDC applied to spline coefficients.
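The idea of a group-specific subspace can be illustrated with a minimal Python sketch. It assumes an HDDC-style parameterization in which the covariance of one group keeps its d leading eigenvalues a_1, ..., a_d (the variances inside the group's subspace) and replaces all remaining eigenvalues by a single noise variance b. It also assumes that these eigenvalues have already been computed, for instance by a functional PCA of that group's curves. The intrinsic dimension d is chosen with one common automation of Cattell's scree test. The function names and the 0.2 threshold are hypothetical illustrations, not the paper's implementation; funHDDC itself estimates these quantities inside its EM iterations.

```python
from typing import List, Tuple


def cattell_dimension(eigvals: List[float], threshold: float = 0.2) -> int:
    """Hypothetical helper: scree-test choice of the intrinsic dimension d.

    Keeps every direction up to the last eigenvalue gap whose size is at
    least `threshold` times the largest gap.
    """
    ev = sorted(eigvals, reverse=True)
    if len(ev) < 2:
        raise ValueError("need at least two eigenvalues")
    gaps = [abs(ev[j] - ev[j + 1]) for j in range(len(ev) - 1)]
    max_gap = max(gaps)
    if max_gap == 0:
        return 1  # flat spectrum: no visible elbow, keep a single direction
    # Gap j (0-based) lies between eigenvalues j and j + 1, so keeping all
    # directions before that gap means d = j + 1.
    return max(j + 1 for j, g in enumerate(gaps) if g / max_gap >= threshold)


def subspace_params(eigvals: List[float], d: int) -> Tuple[List[float], float]:
    """Hypothetical helper: split a group's spectrum into signal and noise.

    Returns the d leading eigenvalues (variances inside the subspace) and
    the mean of the remaining ones (a single noise variance b).
    """
    ev = sorted(eigvals, reverse=True)
    if not 0 < d < len(ev):
        raise ValueError("d must satisfy 0 < d < number of eigenvalues")
    a = ev[:d]
    rest = ev[d:]
    b = sum(rest) / len(rest)
    return a, b


if __name__ == "__main__":
    # Illustrative spectrum for one group: two strong directions, then noise.
    spectrum = [10.0, 8.0, 1.0, 0.9, 0.8, 0.7]
    d = cattell_dimension(spectrum, threshold=0.2)
    a, b = subspace_params(spectrum, d)
    print("d =", d, "a =", a, "b =", b)
```

On the spectrum above, the sketch keeps d = 2 directions with a = [10.0, 8.0] and a noise variance b of about 0.85. In HDDC-style model families, forcing the a_j to be equal within a group, or sharing b across groups, gives more constrained variants; the sketch only covers the unconstrained per-group case.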

Author information

Corresponding author

Correspondence to Julien Jacques.

About this article

Cite this article

Bouveyron, C., Jacques, J. Model-based clustering of time series in group-specific functional subspaces. Adv Data Anal Classif 5, 281–300 (2011). https://doi.org/10.1007/s11634-011-0095-6

Keywords

  • Functional data
  • Time series clustering
  • Model-based clustering
  • Group-specific functional subspaces
  • Functional PCA

Mathematics Subject Classification (2010)

  • 62H30
  • 62M10
  • 62F99