Computational Statistics, Volume 25, Issue 2, pp 317–328

KmL: k-means for longitudinal data

Original Paper

DOI: 10.1007/s00180-009-0178-4

Cite this article as:
Genolini, C. & Falissard, B. Comput Stat (2010) 25: 317. doi:10.1007/s00180-009-0178-4


Cohort studies are becoming essential tools in epidemiological research. In these studies, measurements are not restricted to single variables but can be seen as trajectories. Statistical methods used to identify homogeneous groups of patient trajectories fall into two families: model-based methods (such as Proc Traj) and partitional clustering (non-parametric algorithms such as k-means). KmL is a new implementation of k-means designed specifically for longitudinal data. It can handle missing values and runs the algorithm several times, varying the starting conditions and/or the number of clusters sought; its graphical interface helps the user choose the appropriate number of clusters when the classic criterion is not effective. To assess KmL's efficiency, we compare its performance with that of Proc Traj on both artificial and real data. The two techniques give very similar clusterings when trajectories follow polynomial curves. KmL gives much better results on non-polynomial trajectories.
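The procedure described above (k-means applied to whole trajectories, rerun from several starting conditions and for several numbers of clusters, then compared with a quality criterion) can be illustrated with a minimal sketch. This is not the KmL implementation. It assumes Euclidean distance between trajectories measured at common time points, and it assumes the Calinski-Harabasz index as the selection criterion, since the abstract does not name the "classic criterion". The function names `kmeans_trajectories`, `calinski_harabasz` and `best_partition` are hypothetical, the toy data are simulated, and missing-value handling and the graphical interface are left out.

```python
import numpy as np


def kmeans_trajectories(traj, k, rng, n_iter=100):
    """Plain k-means; each row of `traj` is one individual's trajectory."""
    # Starting condition: k distinct trajectories drawn at random.
    centers = traj[rng.choice(len(traj), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every trajectory to every center.
        dist = ((traj[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = traj[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old center
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the final centers.
    dist = ((traj[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)


def calinski_harabasz(traj, labels, k):
    """Between-cluster over within-cluster dispersion (assumed criterion)."""
    n = len(traj)
    overall = traj.mean(axis=0)
    between, within = 0.0, 0.0
    for j in range(k):
        members = traj[labels == j]
        if len(members) == 0:
            continue
        mean_j = members.mean(axis=0)
        between += len(members) * ((mean_j - overall) ** 2).sum()
        within += ((members - mean_j) ** 2).sum()
    return (between / (k - 1)) / (within / (n - k))


def best_partition(traj, k_values=(2, 3, 4, 5), n_restarts=20, seed=0):
    """Rerun k-means for each k and each restart; keep the best-scoring run."""
    rng = np.random.default_rng(seed)
    best = None
    for k in k_values:
        for _ in range(n_restarts):
            labels = kmeans_trajectories(traj, k, rng)
            score = calinski_harabasz(traj, labels, k)
            if best is None or score > best[0]:
                best = (score, k, labels)
    return best


# Toy data (simulated): 30 flat and 30 linearly rising trajectories, 10 time points.
data_rng = np.random.default_rng(1)
times = np.arange(10)
flat = 2.0 + data_rng.normal(0.0, 0.3, size=(30, 10))
rising = 2.0 + 1.0 * times + data_rng.normal(0.0, 0.3, size=(30, 10))
traj = np.vstack([flat, rising])

score, k, labels = best_partition(traj)
print("selected number of clusters:", k)
```

Keeping the highest-scoring run across both restarts and values of k is one simple way to combine the two sources of variation the abstract mentions. A tool that lets the user inspect several candidate partitions, as the abstract describes for KmL, would keep all runs rather than only the best one.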


Keywords: Functional analysis, Longitudinal data, k-means, Cluster analysis, Non-parametric algorithm

Copyright information

© Springer-Verlag 2009

Authors and Affiliations

  1. Inserm, U669, Paris, France
  2. Modal’X, Univ Paris Ouest Nanterre La Défense, Paris, France
  3. Univ Paris-Sud and Univ Paris Descartes, UMR-S0669, Paris, France
  4. Département de santé publique, AP-HP, Hôpital Paul Brousse, Villejuif, France