Abstract
To find optimal clusters of functional objects in a lower-dimensional subspace of the data, a sequential method called tandem analysis is often used, although this approach is problematic. A new procedure is developed that simultaneously finds optimal clusters of functional objects and an optimal subspace for clustering. The method is based on the k-means criterion for functional data and seeks the subspace that is maximally informative about the clustering structure in the data. An efficient alternating least-squares algorithm is described, and the proposed method is extended to a regularized version. Analyses of artificial and real data examples demonstrate that the proposed method gives correct and interpretable results.
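The idea of updating cluster memberships and the subspace in alternation can be illustrated with a minimal sketch. It is not the algorithm of the paper: it assumes the curves have been sampled on a common grid and stored as rows of a matrix, and it applies a factorial-k-means-style criterion to those sampled values rather than to a basis-function representation. The function name `subspace_kmeans_sketch` and all of its parameters are hypothetical.

```python
import numpy as np


def subspace_kmeans_sketch(X, n_clusters, n_dims, n_iter=30, seed=0):
    """Alternate between cluster assignment and subspace update.

    X is an (n_curves, n_points) array of curves sampled on a common grid
    (an assumption of this sketch). Returns the labels, an orthonormal
    (n_points, n_dims) loading matrix A, and the loss after each iteration.
    """
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)                      # center each grid point
    n, p = X.shape
    # Random orthonormal starting basis and random starting labels.
    A, _ = np.linalg.qr(rng.standard_normal((p, n_dims)))
    labels = rng.integers(n_clusters, size=n)
    losses = []
    for _ in range(n_iter):
        # Step 1: with A fixed, do one k-means pass on the projected scores.
        Y = X @ A
        dist = np.full((n, n_clusters), np.inf)  # empty clusters stay at inf
        for k in range(n_clusters):
            members = labels == k
            if members.any():
                centroid = Y[members].mean(axis=0)
                dist[:, k] = ((Y - centroid) ** 2).sum(axis=1)
        labels = dist.argmin(axis=1)
        # Step 2: with labels fixed, choose A to minimize the projected
        # within-cluster scatter tr(A' R' R A) subject to A' A = I.
        U = np.eye(n_clusters)[labels]           # indicator matrix
        R = X - U @ (np.linalg.pinv(U) @ X)      # within-cluster residuals
        eigvals, eigvecs = np.linalg.eigh(R.T @ R)
        A = eigvecs[:, :n_dims]                  # smallest eigenvalues first
        losses.append(float(eigvals[:n_dims].sum()))
    return labels, A, losses
```

Each step can only lower or keep the criterion, so the recorded losses should be non-increasing. The sketch uses random starts, so in practice it would be run from several seeds, and it becomes degenerate when there are more grid points than curves, because the residual scatter matrix then has directions with zero within-cluster variance.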
Cite this article
Yamamoto, M. Clustering of functional data in a low-dimensional subspace. Adv Data Anal Classif 6, 219–247 (2012). https://doi.org/10.1007/s11634-012-0113-3