Data Mining and Knowledge Discovery, Volume 27, Issue 3, pp 421–441

Fast sequence segmentation using log-linear models

  • Nikolaj Tatti


Sequence segmentation is a well-studied problem: given a sequence of elements, an integer K, and some measure of homogeneity, the task is to split the sequence into K contiguous segments that are maximally homogeneous. The classic approach to finding the optimal solution is a dynamic program. Unfortunately, the execution time of this program is quadratic with respect to the length of the input sequence, which makes the algorithm slow for sequences of non-trivial length. In this paper we study segmentations whose measure of goodness is based on log-linear models, a rich family that contains many of the standard distributions. We present a theoretical result that allows us to prune many suboptimal segmentations. Using this result, we modify the standard dynamic program for 1D log-linear models and thereby reduce the computational time. We demonstrate empirically that this approach can significantly reduce the computational burden of finding the optimal segmentation.
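As a point of reference, the standard (unpruned) dynamic program mentioned in the abstract can be sketched as follows. This is only an illustration of the quadratic baseline, not the pruned algorithm developed in the paper. It assumes a squared-error cost, which corresponds to a Gaussian model with fixed variance; the function name `segment` and its return format are hypothetical choices made for this sketch.

```python
from itertools import accumulate


def segment(xs, k):
    """Optimal split of xs into k contiguous segments under squared error.

    Baseline dynamic program only (assumed cost: sum of squared deviations
    from the segment mean); no pruning is applied.
    """
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(xs)")

    # Prefix sums give the cost of any segment xs[i:j] in O(1).
    s1 = [0.0] + list(accumulate(float(x) for x in xs))
    s2 = [0.0] + list(accumulate(float(x) * float(x) for x in xs))

    def cost(i, j):
        # Sum of squared deviations of xs[i:j] from its mean.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[m][j]: minimal cost of covering xs[:j] with m segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            # Try every start i of the last segment: this inner loop is
            # what makes the running time quadratic in n.
            for i in range(m - 1, j):
                c = best[m - 1][i] + cost(i, j)
                if c < best[m][j]:
                    best[m][j] = c
                    back[m][j] = i

    # Walk the back-pointers to recover the segment boundaries.
    bounds = [n]
    j = n
    for m in range(k, 0, -1):
        j = back[m][j]
        bounds.append(j)
    bounds.reverse()
    return best[k][n], bounds
```

For example, `segment([1, 1, 1, 5, 5, 5, 9, 9], 3)` returns a cost of zero with boundaries `[0, 3, 6, 8]`, meaning the segments `xs[0:3]`, `xs[3:6]`, and `xs[6:8]`. The three nested loops give a running time of O(K n²), which is the cost the paper aims to reduce by pruning candidates for the start of the last segment.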


Segmentation, Pruning, Change-point detection, Dynamic program



Nikolaj Tatti was partly supported by a Post-Doctoral Fellowship of the Research Foundation-Flanders (FWO).



Copyright information

© The Author(s) 2013

Authors and Affiliations

  1. Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
  2. Department of Computer Science, Katholieke Universiteit Leuven, Leuven, Belgium
