
Efficiently mining cohesion-based patterns and rules in event sequences

  • Boris Cule
  • Len Feremans
  • Bart Goethals
Article

Abstract

Discovering patterns in long event sequences is an important data mining task. Traditionally, research has focused on frequency-based quality measures, which allow algorithms to use the anti-monotonicity property to prune the search space and efficiently discover the most frequent patterns. In this work, we step away from such measures and evaluate patterns using cohesion, a measure of how close to each other, on average, the items making up a pattern appear in the sequence. We tackle the fact that cohesion is not an anti-monotonic measure by developing an upper bound on cohesion that we use to prune the search space. By doing so, we are able to efficiently unearth rare but strongly cohesive patterns that existing methods often fail to discover. Furthermore, having found the occurrences of cohesive itemsets in the input sequence, we use them to discover the representative sequential patterns and the dominant partially ordered episodes, without going through the computationally expensive candidate generation procedures typically associated with sequential pattern and episode mining. Experiments show that our method efficiently discovers important patterns that existing state-of-the-art methods miss.
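The abstract describes cohesion only informally. The sketch below assumes one formalisation, the one we understand the authors' earlier work on cohesive itemsets to use: N(X) is the set of positions at which some item of X occurs, W(X, t) is the length of the shortest window that contains position t and every item of X, and C(X) = |X| divided by the average of W(X, t) over N(X). All function names are hypothetical. The window search is brute force, and the enumeration performs no pruning at all, so it does not implement the paper's upper bound. It does show why such a bound is needed. Under this definition, on the sequence `a c b` the pair {a, b} has cohesion 2/3, while its superset {a, b, c} has cohesion 1, so cohesion is not anti-monotonic.

```python
import math
from itertools import combinations
from typing import FrozenSet, Iterable, List, Sequence, Tuple


def min_window(seq: Sequence[str], items: FrozenSet[str], t: int) -> float:
    """Assumed W(X, t): length t2 - t1 + 1 of the shortest window [t1, t2]
    with t1 <= t <= t2 that contains every item of X. Returns math.inf if
    no such window exists. Brute force, only meant for short sequences."""
    best = math.inf
    for t1 in range(t, -1, -1):
        if t - t1 + 1 >= best:
            break  # windows starting this early can no longer be shorter
        seen = {e for e in seq[t1:t + 1] if e in items}
        t2 = t
        # Extend the window to the right until every item of X is covered.
        while seen != items and t2 + 1 < len(seq):
            t2 += 1
            if seq[t2] in items:
                seen.add(seq[t2])
        if seen == items:
            best = min(best, t2 - t1 + 1)
    return best


def cohesion(seq: Sequence[str], itemset: Iterable[str]) -> float:
    """Assumed C(X) = |X| / mean of W(X, t) over N(X)."""
    items = frozenset(itemset)
    occurrences = [t for t, e in enumerate(seq) if e in items]  # N(X)
    if not occurrences:
        return 0.0
    total = sum(min_window(seq, items, t) for t in occurrences)
    mean_window = total / len(occurrences)
    # If some item of X never occurs, no window exists and cohesion is 0.
    return 0.0 if math.isinf(mean_window) else len(items) / mean_window


def cohesive_itemsets(
    seq: Sequence[str], min_coh: float, max_size: int = 3
) -> List[Tuple[FrozenSet[str], float]]:
    """Hypothetical naive miner: checks every itemset of size 2..max_size.
    No pruning is done; the paper's upper bound is NOT implemented here."""
    alphabet = sorted(set(seq))
    result = []
    for k in range(2, max_size + 1):
        for combo in combinations(alphabet, k):
            c = cohesion(seq, combo)
            if c >= min_coh:
                result.append((frozenset(combo), c))
    return result


if __name__ == "__main__":
    seq = list("acb")
    print(cohesion(seq, "ab"))   # 2/3: a and b are always two steps apart
    print(cohesion(seq, "abc"))  # 1.0: the superset is more cohesive
    print(cohesive_itemsets(seq, min_coh=0.9))
```

Because adding an item can raise cohesion, a miner cannot simply discard every superset of a non-cohesive itemset, as it would with frequency. This is the gap that an upper bound on cohesion is meant to close.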

Keywords

Cohesive itemsets, Sequential patterns, Episodes, Association rules

Notes

Acknowledgements

The authors would like to thank the VLAIO SBO HYMOP project for funding this research.

Copyright information

© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2019

Authors and Affiliations

  1. University of Antwerp, Antwerp, Belgium
  2. Monash University, Melbourne, Australia
