An Algorithm for Segmenting Categorical Time Series into Meaningful Episodes
This paper describes an unsupervised algorithm for segmenting categorical time series. The algorithm first collects statistics about the frequency and boundary entropy of n-grams. It then passes a window over the series and has two “expert methods” decide where in the window boundaries should be drawn. The algorithm successfully segments text into words in three languages. We claim that the algorithm finds meaningful episodes in categorical time series because it exploits two statistical characteristics of meaningful episodes.
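The two-phase procedure in the abstract (collect n-gram statistics, then let two experts vote on boundaries inside a sliding window) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. The window length, the z-score standardization among n-grams of equal length, each expert's exact voting rule, and the "threshold plus local maximum" cut rule are all assumed for illustration. The function names `ngram_stats`, `boundary_entropy`, `standardize`, and `segment` are hypothetical.

```python
import math
from collections import Counter, defaultdict
from statistics import mean, pstdev


def ngram_stats(seq, max_n):
    """Count every n-gram (1 <= n <= max_n) and tally the symbol that follows it."""
    freq = Counter()
    followers = defaultdict(Counter)
    for i in range(len(seq)):
        for n in range(1, max_n + 1):
            if i + n > len(seq):
                break
            gram = tuple(seq[i:i + n])
            freq[gram] += 1
            if i + n < len(seq):
                followers[gram][seq[i + n]] += 1
    return freq, followers


def boundary_entropy(counter):
    """Shannon entropy (bits) of the distribution of symbols that follow an n-gram."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counter.values())


def standardize(values_by_gram):
    """Assumption: z-score each n-gram's value against n-grams of the same length."""
    by_len = defaultdict(list)
    for gram, v in values_by_gram.items():
        by_len[len(gram)].append(v)
    stats = {n: (mean(vs), pstdev(vs)) for n, vs in by_len.items()}
    z = {}
    for gram, v in values_by_gram.items():
        mu, sd = stats[len(gram)]
        z[gram] = (v - mu) / sd if sd > 0 else 0.0
    return z


def segment(seq, window=4, threshold=2):
    """Hypothetical two-expert voting segmenter; window and threshold are assumed defaults."""
    # Phase 1: frequency and boundary-entropy statistics for n-grams up to the window length.
    freq, followers = ngram_stats(seq, window)
    z_freq = standardize(freq)
    z_ent = standardize({g: boundary_entropy(followers[g]) for g in freq})

    # votes[k] counts votes for a boundary between seq[k-1] and seq[k].
    votes = [0] * (len(seq) + 1)
    for start in range(len(seq) - window + 1):
        w = tuple(seq[start:start + window])
        splits = range(1, window)  # candidate cut after w[:i]
        # Frequency expert (assumed rule): both halves of the split should be frequent.
        best_f = max(splits, key=lambda i: z_freq[w[:i]] + z_freq[w[i:]])
        # Entropy expert (assumed rule): the prefix should have high boundary entropy.
        best_e = max(splits, key=lambda i: z_ent[w[:i]])
        votes[start + best_f] += 1
        votes[start + best_e] += 1

    # Phase 2 (assumed rule): cut where votes reach the threshold and form a local peak;
    # the strict left comparison keeps only the first position of a plateau.
    cuts = [k for k in range(1, len(seq))
            if votes[k] >= threshold
            and votes[k] > votes[k - 1]
            and votes[k] >= votes[k + 1]]
    pieces, prev = [], 0
    for k in cuts:
        pieces.append(seq[prev:k])
        prev = k
    pieces.append(seq[prev:])
    return pieces
```

For example, `segment("itwasacolddayitwasawarmday" * 5)` returns a list of substrings that concatenate back to the input. Whether those substrings line up with English words depends on the corpus size and on the assumed parameters above. Standardizing within each n-gram length is one way to make scores for short and long n-grams comparable when an expert adds or compares them.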