Similarity between Event Types in Sequences

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1676)

Abstract

Similarity or distance between objects is one of the central concepts in data mining. In this paper we consider the following problem: given a set of event sequences, define a useful notion of similarity between the different types of events occurring in the sequences. We approach the problem by considering two event types to be similar if they occur in similar contexts. The context of an occurrence of an event type is defined as the set of types of the events happening within a certain time limit before the occurrence. Then two event types are similar if their sets of contexts are similar. We quantify this by using a simple approach of computing centroids of sets of contexts and using the L1 distance. We present empirical results on telecommunications alarm sequences and student enrollment data, showing that the method produces intuitively appealing results.
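
The approach described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the half-open time window `[t - window, t)`, and the choice to represent each context as a 0/1 indicator vector over all event types are assumptions made for illustration, since the paper's exact definitions are not reproduced on this page.

```python
from collections import defaultdict


def contexts(sequence, window):
    """Collect the contexts of every occurrence of every event type.

    sequence: list of (event_type, time) pairs sorted by time.
    The context of an occurrence at time t is taken here to be the set of
    event types occurring in [t - window, t) (an assumed boundary convention).
    """
    result = defaultdict(list)
    start = 0
    for i, (etype, t) in enumerate(sequence):
        # Advance the left edge of the window past events that are too old.
        while sequence[start][1] < t - window:
            start += 1
        ctx = frozenset(e for e, s in sequence[start:i] if s < t)
        result[etype].append(ctx)
    return result


def centroid(ctxs, alphabet):
    """Mean of the 0/1 indicator vectors of a non-empty list of contexts."""
    n = len(ctxs)
    return [sum(a in c for c in ctxs) / n for a in alphabet]


def l1_distance(u, v):
    """L1 (Manhattan) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(u, v))


def event_type_distance(sequence, window, a, b):
    """L1 distance between the context centroids of event types a and b."""
    alphabet = sorted({e for e, _ in sequence})
    ctxs = contexts(sequence, window)
    return l1_distance(centroid(ctxs[a], alphabet),
                       centroid(ctxs[b], alphabet))


# Toy sequence (hypothetical data): X and Y each follow an A, Z follows a B.
seq = [("A", 1), ("X", 2), ("A", 5), ("Y", 6), ("B", 10), ("Z", 11)]
print(event_type_distance(seq, 2, "X", "Y"))
print(event_type_distance(seq, 2, "X", "Z"))
```

In this toy sequence, X and Y have identical contexts and so come out at distance zero, while X and Z share no context elements. A smaller distance means more similar event types under this sketch.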

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mannila, H., Moen, P. (1999). Similarity between Event Types in Sequences. In: Mohania, M., Tjoa, A.M. (eds) DataWarehousing and Knowledge Discovery. DaWaK 1999. Lecture Notes in Computer Science, vol 1676. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48298-9_29

  • DOI: https://doi.org/10.1007/3-540-48298-9_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-66458-1

  • Online ISBN: 978-3-540-48298-7

  • eBook Packages: Springer Book Archive
