Date: 03 Sep 2001

An Algorithm for Segmenting Categorical Time Series into Meaningful Episodes


This paper describes an unsupervised algorithm for segmenting categorical time series. The algorithm first collects statistics about the frequency and boundary entropy of n-grams, then passes a window over the series and has two “expert methods” decide where in the window boundaries should be drawn. The algorithm successfully segments text into words in three languages. We claim that the algorithm finds meaningful episodes in categorical time series because it exploits two statistical characteristics of meaningful episodes.
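The two-stage procedure described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`ngram_stats`, `boundary_entropy`, `standardize`, `segment`), the z-score normalization by n-gram length, the one-vote-per-expert rule, and the `window` and `threshold` defaults are all assumptions made for illustration. The series is represented as a Python string, so each character is one categorical symbol.

```python
import math
from collections import Counter, defaultdict


def ngram_stats(seq, max_n):
    """Count every n-gram up to length max_n and the symbols that follow it."""
    freq = Counter()
    followers = defaultdict(Counter)
    for n in range(1, max_n + 1):
        for i in range(len(seq) - n + 1):
            gram = seq[i:i + n]
            freq[gram] += 1
            if i + n < len(seq):
                followers[gram][seq[i + n]] += 1
    return freq, followers


def boundary_entropy(followers, gram):
    """Shannon entropy (in bits) of the symbol that follows `gram`."""
    counts = followers.get(gram)
    if not counts:
        return 0.0
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def standardize(values):
    """Z-score each value against n-grams of the same length (assumed scheme)."""
    by_len = defaultdict(list)
    for gram, v in values.items():
        by_len[len(gram)].append(v)
    out = {}
    for gram, v in values.items():
        group = by_len[len(gram)]
        mean = sum(group) / len(group)
        var = sum((x - mean) ** 2 for x in group) / len(group)
        out[gram] = (v - mean) / math.sqrt(var) if var > 0 else 0.0
    return out


def segment(seq, window=3, threshold=1):
    """Slide a window over seq, let two experts vote, cut at vote peaks."""
    freq, followers = ngram_stats(seq, window)
    z_freq = standardize(freq)
    z_ent = standardize({g: boundary_entropy(followers, g) for g in freq})
    votes = [0] * (len(seq) + 1)
    for start in range(len(seq) - window + 1):
        w = seq[start:start + window]
        splits = range(1, window)  # split k cuts w into w[:k] and w[k:]
        # Frequency expert: prefer the split whose two parts are most frequent.
        k_freq = max(splits, key=lambda k: z_freq[w[:k]] + z_freq[w[k:]])
        # Entropy expert: prefer the split whose left part is least predictable
        # in what follows it.
        k_ent = max(splits, key=lambda k: z_ent[w[:k]])
        votes[start + k_freq] += 1
        votes[start + k_ent] += 1
    # Keep positions that exceed the threshold and are local vote maxima.
    cuts = [i for i in range(1, len(seq))
            if votes[i] > threshold
            and votes[i] > votes[i - 1] and votes[i] >= votes[i + 1]]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]
```

Calling `segment("thecatsatonthematthecatsat", window=3)` returns a list of substrings that concatenate back to the input. On a series this short the statistics are too sparse to expect word-like episodes; the sketch is meant only to show how the collected statistics feed the two experts.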