Advances in Data Analysis and Classification, Volume 11, Issue 3, pp 467–492

Benchmarking different clustering algorithms on functional data

Regular Article

Abstract

Theoretical knowledge of clustering functions is still scarce, and only a few models are available in the form of applicable code. In the literature, most methods are based on projecting the functions onto a basis and building fixed or random effects models of the basis coefficients. They involve various parameters, among them the number of basis functions, the projection dimension, and the number of iterations. They usually work well on the data presented in the articles, but in most cases their performance has not been tested objectively on other data sets, nor against each other. The purpose of this paper is to give an overview of several existing methods for clustering functional data. An outline of their theoretical concepts is given and the meaning of their hyperparameters is explained. A simulation study was set up to analyze the parameters’ efficiency and sensitivity on different types of data sets that were registered on regular and on irregular grids. For each method, a linear model of the clustering results was evaluated with different parameter levels as predictors. Later, the methods’ performances were compared to each other with the help of a visualization tool to identify which method works best on a specific kind of data.
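As a rough illustration of the basis-projection idea described in the abstract, the following Python sketch projects simulated curves onto a small basis, clusters the resulting coefficients, and scores the partition against the known groups. It is not one of the benchmarked methods: the polynomial basis, the plain k-means step, the adjusted Rand index as agreement measure, the shared regular grid, and all function names are assumptions made for this example only.

```python
import numpy as np
from math import comb


def project_onto_basis(t, curves, n_basis):
    """Least-squares coefficients of each curve on a polynomial basis (assumed basis)."""
    basis = np.vander(t, n_basis, increasing=True)        # shape (n_points, n_basis)
    coef, *_ = np.linalg.lstsq(basis, curves.T, rcond=None)
    return coef.T                                         # shape (n_curves, n_basis)


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd iterations on the coefficient vectors (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)

    def assign(c):
        dist = ((X[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        return dist.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centers)
        for j in range(k):
            members = X[labels == j]
            if len(members):                              # keep old center if cluster is empty
                centers[j] = members.mean(axis=0)
    return assign(centers)


def adjusted_rand_index(a, b):
    """Chance-corrected agreement between two partitions of the same objects."""
    _, ai = np.unique(np.asarray(a), return_inverse=True)
    _, bi = np.unique(np.asarray(b), return_inverse=True)
    table = np.zeros((ai.max() + 1, bi.max() + 1), dtype=int)
    np.add.at(table, (ai, bi), 1)                         # contingency table
    sum_cells = sum(comb(int(n), 2) for n in table.ravel())
    sum_rows = sum(comb(int(n), 2) for n in table.sum(axis=1))
    sum_cols = sum(comb(int(n), 2) for n in table.sum(axis=0))
    expected = sum_rows * sum_cols / comb(len(ai), 2)
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:                             # degenerate case, e.g. one cluster each
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def simulate_and_cluster(n_basis=4, noise=0.2, seed=0):
    """Two groups of noisy curves on a regular grid; returns the agreement score."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, 30)
    truth = np.repeat([0, 1], 20)
    means = np.where(truth[:, None] == 0, np.sin(2 * np.pi * t), np.cos(2 * np.pi * t))
    curves = means + rng.normal(scale=noise, size=means.shape)
    coef = project_onto_basis(t, curves, n_basis)
    return adjusted_rand_index(truth, kmeans(coef, k=2, seed=seed))


if __name__ == "__main__":
    print("adjusted Rand index:", simulate_and_cluster())
```

Varying `n_basis`, `noise`, or `n_iter` in `simulate_and_cluster` and recording the score for each setting gives a toy version of the kind of parameter sensitivity study the abstract describes.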

Keywords

Functional clustering, Benchmarking, Hyperparameters

Mathematics Subject Classification

46 Functional analysis, 62 Statistics

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  1. University of Natural Resources and Life Sciences, Institute of Applied Statistics and Computing, Vienna, Austria
