Benchmarking different clustering algorithms on functional data

  • Regular Article
Advances in Data Analysis and Classification

Abstract

Theoretical knowledge of clustering functions is still scarce, and only a few models are available in the form of applicable code. In the literature, most methods are based on projecting the functions onto a basis and building fixed or random effects models of the basis coefficients. They involve various parameters, among them the number of basis functions, the projection dimension, and the number of iterations. They usually work well on the data presented in the respective articles, but in most cases their performance has not been tested objectively on other data sets, nor against each other. The purpose of this paper is to give an overview of several existing methods for clustering functional data. An outline of their theoretical concepts is given and the meaning of their hyperparameters is explained. A simulation study was set up to analyze the parameters’ efficiency and sensitivity on different types of data sets that were recorded on regular and on irregular grids. For each method, a linear model of the clustering results was evaluated with different parameter levels as predictors. Later, the methods’ performances were compared with each other using a visualization tool to identify which method works best on a specific kind of data.
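The basis-projection idea and the sensitivity to the number of basis functions can be illustrated with a minimal sketch. It is not one of the benchmarked methods: plain k-means on Fourier coefficients stands in for the fixed or random effects models, the simulated sine and cosine groups are invented for illustration, and the adjusted Rand index is an assumed choice of agreement measure. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def simulate(n_per_group=20, n_grid=50, noise=0.3, seed=1):
    """Two groups of noisy curves on the regular grid t_j = j / n_grid."""
    rng = random.Random(seed)
    curves, truth = [], []
    for group, mean in enumerate((math.sin, math.cos)):
        for _ in range(n_per_group):
            curves.append([mean(2 * math.pi * j / n_grid) + rng.gauss(0, noise)
                           for j in range(n_grid)])
            truth.append(group)
    return curves, truth


def fourier_coefficients(y, n_basis):
    """Least-squares projection onto 1, cos, sin, cos, sin, ...
    On a uniform grid these functions are orthogonal (frequencies below
    n/2), so each coefficient is a scaled inner product."""
    n = len(y)
    coefs = [sum(y) / n]  # constant term
    k = 1
    while len(coefs) < n_basis:
        for f in (math.cos, math.sin):
            if len(coefs) < n_basis:
                coefs.append(2.0 / n * sum(v * f(2 * math.pi * k * j / n)
                                           for j, v in enumerate(y)))
        k += 1
    return coefs


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=20):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def adjusted_rand_index(a, b):
    """Chance-corrected agreement between two partitions."""
    sum_ij = sum(math.comb(v, 2) for v in Counter(zip(a, b)).values())
    sum_a = sum(math.comb(v, 2) for v in Counter(a).values())
    sum_b = sum(math.comb(v, 2) for v in Counter(b).values())
    expected = sum_a * sum_b / math.comb(len(a), 2)
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)


curves, truth = simulate()
results = {}
for n_basis in (1, 3, 5):  # hyperparameter levels to compare
    coefs = [fourier_coefficients(y, n_basis) for y in curves]
    results[n_basis] = adjusted_rand_index(truth, kmeans(coefs, k=2))
    print(f"n_basis={n_basis}: ARI={results[n_basis]:.2f}")
```

With a single basis function only the curve mean is retained, which does not distinguish the two simulated groups, so the agreement with the true partition should be close to chance. Once the first cosine and sine terms are included, the groups separate clearly. Scores collected over a grid of such parameter levels are the kind of response that a linear model with parameter levels as predictors could then summarize.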

Author information

Corresponding author

Correspondence to Christina Yassouridis.

Cite this article

Yassouridis, C., Leisch, F. Benchmarking different clustering algorithms on functional data. Adv Data Anal Classif 11, 467–492 (2017). https://doi.org/10.1007/s11634-016-0261-y

  • DOI: https://doi.org/10.1007/s11634-016-0261-y
