Advances in Intelligent Data Analysis

Volume 2189 of the series Lecture Notes in Computer Science, pp. 198-207


An Algorithm for Segmenting Categorical Time Series into Meaningful Episodes

  • Paul Cohen, Department of Computer Science, University of Massachusetts
  • Niall Adams, Department of Mathematics, Imperial College



This paper describes an unsupervised algorithm for segmenting categorical time series. The algorithm first collects statistics about the frequency and boundary entropy of n-grams, then passes a window over the series and has two “expert methods” decide where within the window boundaries should be drawn. The algorithm successfully segments text into words in three languages. We claim that the algorithm finds meaningful episodes in categorical time series because it exploits two statistical characteristics of meaningful episodes.
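
The two-stage procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract names only the two statistics (n-gram frequency and boundary entropy), the sliding window, and the two experts, so everything else here is an assumption: the function names `ngram_stats`, `boundary_entropy`, and `segment`, the `window` and `threshold` parameters, the use of raw (unnormalized) counts, and the rule that a boundary is drawn where the combined votes form a local maximum above the threshold. The series is assumed to be a Python string of categorical symbols.

```python
from collections import Counter, defaultdict
from math import log2


def ngram_stats(series, max_n):
    """Count every n-gram up to length max_n and the symbols that follow it."""
    freq = Counter()
    followers = defaultdict(Counter)
    for n in range(1, max_n + 1):
        for i in range(len(series) - n + 1):
            gram = series[i:i + n]
            freq[gram] += 1
            if i + n < len(series):
                followers[gram][series[i + n]] += 1
    return freq, followers


def boundary_entropy(followers, gram):
    """Entropy (in bits) of the distribution of symbols that follow gram."""
    counts = followers.get(gram, {})
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total) for c in counts.values())


def segment(series, window=3, threshold=1):
    """Split series where the two experts' votes peak (assumed decision rule)."""
    freq, followers = ngram_stats(series, window)
    votes = [0] * (len(series) + 1)  # votes[i] is for a cut just before series[i]
    for start in range(len(series) - window + 1):
        w = series[start:start + window]
        cuts_in_window = range(1, window)
        # Frequency expert: prefer the cut whose two halves are both common.
        k_freq = max(cuts_in_window, key=lambda k: freq[w[:k]] + freq[w[k:]])
        votes[start + k_freq] += 1
        # Entropy expert: prefer the cut after the least predictable prefix.
        k_ent = max(cuts_in_window,
                    key=lambda k: boundary_entropy(followers, w[:k]))
        votes[start + k_ent] += 1
    # Assumed rule: cut at local maxima of the vote count above the threshold.
    cuts = [i for i in range(1, len(series))
            if votes[i] > threshold
            and votes[i] >= votes[i - 1]
            and votes[i] >= votes[i + 1]]
    edges = [0] + cuts + [len(series)]
    return [series[a:b] for a, b in zip(edges, edges[1:])]


if __name__ == "__main__":
    text = "thecatsatonthematthedogsatonthecat"
    print(segment(text, window=4, threshold=2))
```

Whatever boundaries the sketch chooses, concatenating the returned pieces reproduces the input series. How well the boundaries match words depends on the amount of data and on the assumed `window` and `threshold` values.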