Functional data clustering by projection into latent generalized hyperbolic subspaces

Abstract

We introduce a latent subspace model that facilitates model-based clustering of functional data. Flexible clustering is attained by imposing jointly generalized hyperbolic distributions on projections of basis expansion coefficients into group-specific subspaces. The model achieves parsimony by assuming that these subspaces are of relatively low dimension. Parameters are estimated with a multicycle ECM algorithm. Applications to simulated and real datasets illustrate competitive clustering performance and demonstrate the model's general applicability.
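The two preprocessing ideas named in the abstract, basis expansion of curves and projection of the resulting coefficients into a low-dimensional subspace, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the Fourier basis, the helper names (`fourier_basis`, `basis_coefficients`, `project_to_subspace`), the toy data, and in particular the use of a plain PCA projection for a single group. It does not implement the generalized hyperbolic mixture or the multicycle ECM algorithm, in which the subspaces would be estimated per group as part of the model fit.

```python
import numpy as np

def fourier_basis(t, n_basis):
    """Evaluate a Fourier basis (constant, then sin/cos pairs) at points t in [0, 1]."""
    cols = [np.ones_like(t)]
    k = 1
    while len(cols) < n_basis:
        cols.append(np.sqrt(2.0) * np.sin(2.0 * np.pi * k * t))
        if len(cols) < n_basis:
            cols.append(np.sqrt(2.0) * np.cos(2.0 * np.pi * k * t))
        k += 1
    return np.column_stack(cols)

def basis_coefficients(curves, t, n_basis):
    """Least-squares basis coefficients, one row per observed curve."""
    phi = fourier_basis(t, n_basis)                 # shape (n_points, n_basis)
    coef, *_ = np.linalg.lstsq(phi, curves.T, rcond=None)
    return coef.T                                   # shape (n_curves, n_basis)

def project_to_subspace(coefs, center, d):
    """Project centered coefficients onto their top-d principal directions.

    PCA is a stand-in here (an assumption), not the estimator from the paper.
    """
    centered = coefs - center
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[:d].T                             # (n_basis, d), orthonormal columns
    return centered @ loadings, loadings

# Toy data: 20 noisy sine curves with random amplitudes (hypothetical example).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)
amplitudes = rng.normal(1.0, 0.3, size=(20, 1))
curves = amplitudes * np.sin(2.0 * np.pi * t)[None, :]
curves = curves + rng.normal(0.0, 0.05, size=(20, 101))

coefs = basis_coefficients(curves, t, n_basis=7)
scores, loadings = project_to_subspace(coefs, coefs.mean(axis=0), d=2)
print(coefs.shape, scores.shape)
```

In a clustering setting, each group would have its own center and loading matrix, and the low-dimensional `scores` are the quantities on which a flexible distribution would be placed.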



Author information

Corresponding author

Correspondence to Alex Sharp.

About this article

Cite this article

Sharp, A., Browne, R. Functional data clustering by projection into latent generalized hyperbolic subspaces. Adv Data Anal Classif (2021). https://doi.org/10.1007/s11634-020-00432-5

Keywords

  • Model-based clustering
  • Functional data analysis
  • Dimension reduction
  • Functional principal component analysis
  • EM algorithm

Mathematics Subject Classification

  • 62R10