A Qualitative Study of Similarity Measures in Event-Based Data

  • Katerina Vrotsou
  • Camilla Forsell
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6771)


This paper presents an interview-based study of how sequence similarity is defined in different application areas of event-based data. The applicability of nine identified measures across these areas is investigated and discussed. The work highlights the core characteristics sought when analysing event-based data and provides a first validation of them across disciplines. The results of the study form a solid basis for follow-up evaluations of the practical applicability and usability of the similarity measures.


Event-based data, event-sequences, evaluation, qualitative study, similarity measures





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Katerina Vrotsou (1)
  • Camilla Forsell (1)
  1. Department of Science and Technology, Linköping University, Norrköping, Sweden
