Abstract
This paper describes an unsupervised algorithm for segmenting categorical time series into episodes. The Voting-Experts algorithm first collects statistics about the frequency and boundary entropy of n-grams, then passes a window over the series and has two “expert methods” decide where in the window boundaries should be drawn. The algorithm successfully segments text into words in four languages. It also segments time series of robot sensor data into subsequences that represent episodes in the life of the robot. We claim that Voting-Experts finds meaningful episodes in categorical time series because it exploits two statistical characteristics of meaningful episodes.
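The two-stage procedure summarized above (collect n-gram statistics, then let two experts vote inside a sliding window) can be illustrated with a minimal Python sketch. The abstract does not give implementation details, so several choices here are assumptions rather than the authors' method: the function names (`ngram_stats`, `boundary_entropy`, `standardize`, `segment`), the standardization of scores within each n-gram length, the default `window` and `threshold` values, and the rule of cutting at local maxima of the vote count.

```python
import math
from collections import Counter, defaultdict


def ngram_stats(seq, max_len):
    """Count each n-gram (n <= max_len) and the symbols that follow it."""
    freq = Counter()
    followers = defaultdict(Counter)
    for i in range(len(seq)):
        for n in range(1, max_len + 1):
            if i + n > len(seq):
                break
            gram = seq[i:i + n]
            freq[gram] += 1
            if i + n < len(seq):
                followers[gram][seq[i + n]] += 1
    return freq, followers


def boundary_entropy(gram, followers):
    """Entropy (in bits) of the symbols observed immediately after gram."""
    counts = followers.get(gram)
    if not counts:
        return 0.0
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def standardize(scores):
    """Z-score values within each n-gram length (an assumption of this sketch)."""
    by_len = defaultdict(list)
    for gram, value in scores.items():
        by_len[len(gram)].append(value)
    params = {}
    for n, values in by_len.items():
        mean = sum(values) / len(values)
        sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        params[n] = (mean, sd)
    out = {}
    for gram, value in scores.items():
        mean, sd = params[len(gram)]
        out[gram] = (value - mean) / sd if sd > 0 else 0.0
    return out


def segment(seq, window=3, threshold=1):
    """Split a string into pieces using two voting experts (simplified)."""
    freq, followers = ngram_stats(seq, window)
    z_freq = standardize(dict(freq))
    z_ent = standardize({g: boundary_entropy(g, followers) for g in freq})

    # votes[k] counts votes for a boundary just before seq[k].
    votes = [0] * (len(seq) + 1)
    for start in range(len(seq) - window + 1):
        win = seq[start:start + window]
        splits = range(1, window)
        # Frequency expert: prefer the split whose two parts are most frequent.
        best_f = max(splits, key=lambda k: z_freq[win[:k]] + z_freq[win[k:]])
        # Entropy expert: prefer the split after the least predictable prefix.
        best_e = max(splits, key=lambda k: z_ent[win[:k]])
        votes[start + best_f] += 1
        votes[start + best_e] += 1

    # Cut where the vote count is a local maximum above the threshold.
    cuts = [k for k in range(1, len(seq))
            if votes[k] > threshold
            and votes[k] > votes[k - 1]
            and votes[k] >= votes[k + 1]]

    pieces, prev = [], 0
    for k in cuts:
        pieces.append(seq[prev:k])
        prev = k
    pieces.append(seq[prev:])
    return pieces


if __name__ == "__main__":
    print(segment("thecatsatonthematthecatsat"))
```

On a toy string this short the statistics are too sparse for the cuts to line up with words reliably; the sketch is meant only to show how the frequency and boundary-entropy experts cast votes and how those votes become boundaries.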
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cohen, P., Heeringa, B., Adams, N.M. (2002). An Unsupervised Algorithm for Segmenting Categorical Timeseries into Episodes. In: Hand, D.J., Adams, N.M., Bolton, R.J. (eds) Pattern Detection and Discovery. Lecture Notes in Computer Science, vol 2447. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45728-3_5
DOI: https://doi.org/10.1007/3-540-45728-3_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44148-9
Online ISBN: 978-3-540-45728-2
eBook Packages: Springer Book Archive