Advances in Data Analysis and Classification, Volume 5, Issue 4, pp 301–321

Model-based clustering and segmentation of time series with changes in regime

  • Allou Samé
  • Faicel Chamroukhi
  • Gérard Govaert
  • Patrice Aknin
Regular Article

Abstract

Mixture model-based clustering, usually applied to multidimensional data, has become a popular approach in many data analysis problems, both for its sound statistical properties and for the ease of implementing the Expectation–Maximization (EM) algorithm. Motivated by a railway application, this paper introduces a novel mixture model for time series that are subject to changes in regime. The proposed approach, called ClustSeg, models each cluster by a regression model whose polynomial coefficients vary according to a discrete hidden process. In particular, it uses logistic functions to model the (smooth or abrupt) transitions between regimes. The model parameters are estimated by maximum likelihood, using an EM algorithm. The approach can also be viewed as a clustering method that finds groups of time series sharing common changes in regime. Consequently, in addition to partitioning the time series, it also segments them. The numbers of clusters and segments are selected using the Bayesian Information Criterion (BIC). The ClustSeg approach is shown to be efficient on a variety of simulated time series and on real-world time series of electrical power consumption recorded during rail switching operations.
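The abstract describes the model only in words. The sketch below writes down one plausible reading of it in Python with NumPy: a series belongs to one of K clusters; within cluster k, each observation at time t_j comes from one of L polynomial regression regimes, chosen with logistic (softmax) probabilities that depend on t_j. It computes the quantities needed by the E-step (log-likelihood and posterior cluster probabilities) and a BIC score. Several details are assumptions not stated in the abstract: logistic functions linear in time, series sampled at common time points, observations conditionally independent given the cluster, the parameter count used in the BIC, and taking n as the number of series. All function names are hypothetical, and the M-step (which would refit the logistic parameters by an iterative weighted multinomial logistic regression) is omitted.

```python
import numpy as np


def _logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def log_regime_probs(t, W):
    """Log of the logistic (softmax) regime probabilities pi_l(t) of one cluster.

    t: (m,) time points. W: (L, 2) rows of (intercept, slope), i.e. a logistic
    function assumed linear in time. Returns an (m, L) array of log-probabilities.
    """
    scores = W[:, 0][None, :] + np.outer(t, W[:, 1])  # (m, L)
    return scores - _logsumexp(scores, axis=1)[:, None]


def cluster_log_density(Y, t, W, beta, sigma2):
    """log f_k(y_i) of every series under one cluster.

    Y: (n, m) series sampled at common times t. beta: (L, p+1) polynomial
    coefficients per regime. sigma2: (L,) noise variance per regime.
    """
    T = np.vander(t, beta.shape[1], increasing=True)  # columns 1, t, t^2, ...
    means = T @ beta.T                                  # (m, L) regime mean curves
    resid = Y[:, :, None] - means[None, :, :]           # (n, m, L)
    log_norm = -0.5 * (np.log(2 * np.pi * sigma2) + resid ** 2 / sigma2)
    log_pi = log_regime_probs(t, W)[None, :, :]         # (1, m, L)
    # Mix the regimes at each time point, then sum log-densities over time.
    return _logsumexp(log_pi + log_norm, axis=2).sum(axis=1)  # (n,)


def e_step(Y, t, alpha, params):
    """Total log-likelihood and posterior cluster probabilities tau (n, K).

    alpha: (K,) mixing proportions. params: K tuples (W, beta, sigma2).
    """
    log_f = np.column_stack([cluster_log_density(Y, t, *p) for p in params])
    log_joint = np.log(alpha)[None, :] + log_f          # (n, K)
    log_lik = _logsumexp(log_joint, axis=1)             # (n,)
    return log_lik.sum(), np.exp(log_joint - log_lik[:, None])


def bic(log_lik, K, L, p, n):
    """BIC in the form log L - (nu / 2) log n (larger is better).

    nu = (K - 1) proportions plus, per cluster, 2(L - 1) logistic parameters
    (one regime taken as reference), L(p + 1) coefficients and L variances.
    """
    nu = (K - 1) + K * (2 * (L - 1) + L * (p + 1) + L)
    return log_lik - 0.5 * nu * np.log(n)


# Usage: two clusters with an abrupt switch near t = 0.5, in opposite directions.
t = np.linspace(0, 1, 50)
W = np.array([[0.0, 0.0], [-50.0, 100.0]])  # regime 2 takes over near t = 0.5
up = (W, np.array([[0.0, 0.0], [5.0, 0.0]]), np.array([0.1, 0.1]))
down = (W, np.array([[5.0, 0.0], [0.0, 0.0]]), np.array([0.1, 0.1]))
Y = np.vstack([np.where(t < 0.5, 0.0, 5.0), np.where(t < 0.5, 5.0, 0.0)])
log_lik, tau = e_step(Y, t, np.array([0.5, 0.5]), [up, down])
print(tau.argmax(axis=1))  # [0 1]
```

A smaller slope in a row of `W` would give a smoother transition between regimes, which is how a single logistic parameterization can cover both the smooth and the abrupt changes mentioned above. Model selection would then amount to fitting the model for several (K, L) pairs and keeping the one with the largest `bic` value.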

Keywords

Clustering · Time series · Change in regime · Mixture model · Regression mixture · Hidden logistic process · EM algorithm

Mathematics Subject Classification (2010)

62-07 · 62M10 · 62H30

Copyright information

© Springer-Verlag 2011

Authors and Affiliations

  • Allou Samé (1)
  • Faicel Chamroukhi (1)
  • Gérard Govaert (2)
  • Patrice Aknin (1)

  1. Université Paris-Est, IFSTTAR, GRETTIA, Noisy-le-Grand, France
  2. Université de Technologie de Compiègne, UMR CNRS 6599, Compiègne, France