Abstract
Similarity or distance between objects is one of the central concepts in data mining. In this paper we consider the following problem: given a set of event sequences, define a useful notion of similarity between the different types of events occurring in the sequences. We approach the problem by considering two event types to be similar if they occur in similar contexts. The context of an occurrence of an event type is defined as the set of types of the events happening within a certain time limit before the occurrence. Then two event types are similar if their sets of contexts are similar. We quantify this by using a simple approach of computing centroids of sets of contexts and using the L1 distance. We present empirical results on telecommunications alarm sequences and student enrollment data, showing that the method produces intuitively appealing results.
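The approach described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes an event sequence is a time-sorted list of (time, event type) pairs, that a context contains the types of events occurring strictly earlier than the occurrence and at most `window` time units before it, and that the centroid of a set of contexts is the average of their 0/1 indicator vectors over all event types. The function names and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def contexts(sequence, window):
    """Map each event type to the list of contexts of its occurrences.

    sequence: list of (time, event_type) pairs sorted by time.
    A context is the set of event types seen strictly before the
    occurrence and no more than `window` time units earlier (assumption).
    """
    result = defaultdict(list)
    for i, (t, event_type) in enumerate(sequence):
        ctx = set()
        j = i - 1
        while j >= 0 and t - sequence[j][0] <= window:
            if sequence[j][0] < t:  # skip events with the same timestamp
                ctx.add(sequence[j][1])
            j -= 1
        result[event_type].append(frozenset(ctx))
    return dict(result)


def centroid(context_list, alphabet):
    """Average the 0/1 indicator vectors of the contexts over `alphabet`."""
    n = len(context_list)
    return [sum(1 for c in context_list if a in c) / n for a in alphabet]


def l1_distance(u, v):
    """L1 (Manhattan) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(u, v))


def event_type_distances(sequence, window):
    """Return {(a, b): distance} for every pair of event types a < b."""
    ctxs = contexts(sequence, window)
    alphabet = sorted(ctxs)
    cents = {a: centroid(ctxs[a], alphabet) for a in alphabet}
    return {
        (a, b): l1_distance(cents[a], cents[b])
        for a, b in combinations(alphabet, 2)
    }


# Toy sequence: B and C each occur once, both shortly after an A.
toy_sequence = [(1, "A"), (2, "B"), (10, "A"), (11, "C"), (20, "D"), (21, "A")]
toy_distances = event_type_distances(toy_sequence, window=2)
```

In this toy sequence, B and C each have the single context {A}, so their centroids coincide and their distance is zero, while D (which has an empty context) is at distance 1 from both.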
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mannila, H., Moen, P. (1999). Similarity between Event Types in Sequences. In: Mohania, M., Tjoa, A.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 1999. Lecture Notes in Computer Science, vol 1676. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48298-9_29
DOI: https://doi.org/10.1007/3-540-48298-9_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66458-1
Online ISBN: 978-3-540-48298-7
eBook Packages: Springer Book Archive