Advances in Data Analysis and Classification, Volume 6, Issue 3, pp 219–247

Clustering of functional data in a low-dimensional subspace

  • Michio Yamamoto
Regular Article

Abstract

To find optimal clusters of functional objects in a lower-dimensional subspace of the data, a sequential method called tandem analysis is often used, although this approach is problematic. A new procedure is developed that simultaneously finds optimal clusters of functional objects and an optimal subspace for clustering. The method is based on the k-means criterion for functional data and seeks the subspace that is maximally informative about the clustering structure in the data. An efficient alternating least-squares algorithm is described, and the proposed method is extended to a regularized version. Analyses of artificial and real data examples demonstrate that the proposed method gives correct and interpretable results.
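
The abstract does not spell out the algorithm, so the following is only a rough finite-dimensional sketch of the general idea of alternating between a subspace update and a cluster update under a k-means-type criterion. It is not the article's functional or regularized method. It assumes each curve has already been reduced to a row vector (for example, values on a common grid or basis coefficients), and the function name `subspace_kmeans`, its parameters, and the toy data are all hypothetical. In the toy data the cluster separation lies along a low-variance direction, which is the kind of situation where reducing dimension first by total variance and clustering afterwards can miss the structure.

```python
import numpy as np

def subspace_kmeans(X, n_clusters, n_dims, n_iter=100, seed=0):
    """Alternate between a subspace update and a cluster update (illustrative)."""
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)                      # column-centre the data
    n = X.shape[0]
    labels = rng.integers(n_clusters, size=n)   # random initial partition
    for _ in range(n_iter):
        # Cluster means in the full space; an empty cluster borrows a random row.
        means = np.vstack([
            X[labels == k].mean(axis=0) if np.any(labels == k)
            else X[rng.integers(n)]
            for k in range(n_clusters)
        ])
        resid = X - means[labels]               # within-cluster residuals
        # Subspace step: orthonormal directions with the smallest
        # within-cluster scatter (eigh returns eigenvalues in ascending order).
        _, vecs = np.linalg.eigh(resid.T @ resid)
        A = vecs[:, :n_dims]
        # Cluster step: assign each object to the nearest projected centroid.
        Z, C = X @ A, means @ A
        dist = ((Z[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, A

# Toy data (an assumption for illustration): two groups separated along the
# first coordinate, which has far less variance than the four noise coordinates.
rng = np.random.default_rng(1)
truth = np.repeat([0, 1], 60)
X = rng.normal(scale=2.0, size=(120, 5))
X[:, 0] = np.where(truth == 0, -1.0, 1.0) + rng.normal(scale=0.1, size=120)

labels, A = subspace_kmeans(X, n_clusters=2, n_dims=1)
# Agreement with the true grouping, ignoring the arbitrary label order.
agreement = max(np.mean(labels == truth), np.mean(labels != truth))
print(f"agreement with true partition: {agreement:.2f}")
```

Given the partition, the subspace step minimizes the projected within-cluster sum of squares, and given the subspace and centroids, the cluster step does the same over assignments, so the criterion does not increase from one pass to the next. Like ordinary k-means, the scheme can stop at a local optimum, so several random starts would normally be used.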

Keywords

Functional data, Clustering, Low-dimensional space, Dimension reduction, Smoothing

Mathematics Subject Classification (2000)

62H30, 91C20

Copyright information

© Springer-Verlag 2012

Authors and Affiliations

  1. Division of Mathematical Science, Graduate School of Engineering Science, Osaka University, Toyonaka, Japan