Abstract
Theoretical knowledge about the clustering of functions is still scarce, and only a few models are available in the form of applicable code. In the literature, most methods project the functions onto a basis and build fixed- or random-effects models of the basis coefficients. They involve various parameters, among them the number of basis functions, the projection dimension, and the number of iterations. They usually work well on the data presented in the respective articles, but in most cases their performance has not been tested objectively on other data sets, nor against each other. The purpose of this paper is to give an overview of several existing methods for clustering functional data. An outline of their theoretical concepts is given and the meaning of their hyperparameters is explained. A simulation study was set up to analyze the parameters' efficiency and sensitivity on different types of data sets recorded on regular and on irregular grids. For each method, a linear model of the clustering results was fitted with the different parameter levels as predictors. The methods' performances were then compared with each other using a visualization tool, to identify which method works best on a specific kind of data.
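The basis-projection idea described above can be illustrated with a minimal sketch. It is not one of the methods benchmarked in the paper: it assumes a small Fourier basis, fits the coefficients by ordinary least squares, and replaces the fixed- or random-effects models with plain k-means on the coefficients. The function names (`fourier_design`, `project_curves`, `kmeans`) and the synthetic curves are hypothetical and chosen only for illustration.

```python
import numpy as np


def fourier_design(t, n_basis):
    """Design matrix of a Fourier basis on [0, 1]: constant, then sin/cos pairs."""
    cols = [np.ones_like(t)]
    k = 1
    while len(cols) < n_basis:
        cols.append(np.sin(2 * np.pi * k * t))
        if len(cols) < n_basis:
            cols.append(np.cos(2 * np.pi * k * t))
        k += 1
    return np.column_stack(cols)


def project_curves(curves, n_basis):
    """Least-squares basis coefficients per curve; each curve has its own grid."""
    coefs = []
    for t, y in curves:
        B = fourier_design(t, n_basis)
        c, *_ = np.linalg.lstsq(B, y, rcond=None)
        coefs.append(c)
    return np.vstack(coefs)


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd iterations (assumption: stands in for a model-based step)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Synthetic toy data (assumption): two groups of noisy curves, each observed
# on its own irregular grid of 30 points in [0, 1].
rng = np.random.default_rng(1)
curves = []
for i in range(40):
    t = np.sort(rng.uniform(0.0, 1.0, size=30))
    signal = np.sin(2 * np.pi * t) if i < 20 else np.cos(2 * np.pi * t)
    curves.append((t, signal + 0.1 * rng.normal(size=30)))

coefs = project_curves(curves, n_basis=3)   # one coefficient vector per curve
labels = kmeans(coefs, k=2)                 # cluster in coefficient space
```

Because every curve is reduced to a coefficient vector of the same length, curves sampled on different grids become directly comparable. The hyperparameters named in the abstract appear here as `n_basis` and `n_iter`.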
Cite this article
Yassouridis, C., Leisch, F. Benchmarking different clustering algorithms on functional data. Adv Data Anal Classif 11, 467–492 (2017). https://doi.org/10.1007/s11634-016-0261-y
DOI: https://doi.org/10.1007/s11634-016-0261-y