Data Mining and Knowledge Discovery, Volume 27, Issue 3, pp 421–441

Fast sequence segmentation using log-linear models

Article

Abstract

Sequence segmentation is a well-studied problem: given a sequence of elements, an integer K, and some measure of homogeneity, the task is to split the sequence into K contiguous segments that are maximally homogeneous. The classic approach to finding the optimal solution is a dynamic program. Unfortunately, the execution time of this program is quadratic with respect to the length of the input sequence, which makes the algorithm slow for sequences of non-trivial length. In this paper we study segmentations whose measure of goodness is based on log-linear models, a rich family that contains many of the standard distributions. We present a theoretical result that allows us to prune many suboptimal segmentations. Using this result, we modify the standard dynamic program for 1D log-linear models and thereby reduce the computation time. We demonstrate empirically that this approach can significantly reduce the computational burden of finding the optimal segmentation.
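For orientation, the classic dynamic program mentioned in the abstract can be sketched as follows. This is only the unpruned baseline, not the modified algorithm proposed in the paper. As an assumption for illustration, the homogeneity measure is the sum of squared deviations from the segment mean, which corresponds to a Gaussian model with fixed variance, one simple member of the log-linear family. The names `segment`, `cost`, `best`, and `back` are hypothetical.

```python
from itertools import accumulate


def segment(seq, k):
    """Split seq into k contiguous segments with minimal total squared error.

    Returns (total_cost, boundaries), where boundaries holds the k - 1
    indices at which a new segment starts.
    """
    n = len(seq)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(seq)")

    # Prefix sums of values and squared values: any segment cost in O(1).
    s1 = [0.0] + list(accumulate(float(x) for x in seq))
    s2 = [0.0] + list(accumulate(float(x) ** 2 for x in seq))

    def cost(i, j):
        """Sum of squared deviations from the mean over seq[i:j], with j > i."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[m][j]: minimal cost of covering seq[:j] with exactly m segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    # back[m][j]: start index of the last segment in that optimal solution.
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0

    for m in range(1, k + 1):
        for j in range(m, n + 1):
            # Trying every start i of the last segment is what makes the
            # running time quadratic in the sequence length.
            for i in range(m - 1, j):
                c = best[m - 1][i] + cost(i, j)
                if c < best[m][j]:
                    best[m][j] = c
                    back[m][j] = i

    # Walk the back-pointers from the end to recover the segment starts.
    starts = []
    j = n
    for m in range(k, 0, -1):
        j = back[m][j]
        starts.append(j)
    starts.reverse()
    return best[k][n], starts[1:]  # drop the leading 0 of the first segment


if __name__ == "__main__":
    total, boundaries = segment([1, 1, 1, 5, 5, 5, 9, 9, 9], 3)
    print(total, boundaries)
```

The three nested loops give a running time proportional to K times the square of the sequence length. The pruning result described in the abstract aims to skip many of the candidate start positions tried in the innermost loop.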

Keywords

Segmentation, Pruning, Change-point detection, Dynamic program

Copyright information

© The Author(s) 2013

Authors and Affiliations

  1. Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
  2. Department of Computer Science, Katholieke Universiteit Leuven, Leuven, Belgium
