Skip to main content
Log in

Mining closed strict episodes

  • Published:
Data Mining and Knowledge Discovery Aims and scope Submit manuscript

Abstract

Discovering patterns in a sequence is an important aspect of data mining. One popular choice of such patterns are episodes, patterns in sequential data describing events that often occur in the vicinity of each other. Episodes also enforce in which order the events are allowed to occur. In this work we introduce a technique for discovering closed episodes. Adopting existing approaches for discovering traditional patterns, such as closed itemsets, to episodes is not straightforward. First of all, we cannot define a unique closure based on frequency because an episode may have several closed superepisodes. Moreover, to define a closedness concept for episodes we need a subset relationship between episodes, which is not trivial to define. We approach these problems by introducing strict episodes. We argue that this class is general enough, and at the same time we are able to define a natural subset relationship within it and use it efficiently. In order to mine closed episodes we define an auxiliary closure operator. We show that this closure satisfies the needed properties so that we can use the existing framework for mining closed patterns. Discovering the true closed episodes can be done as a post-processing step. We combine these observations into an efficient mining algorithm and demonstrate empirically its performance in practice.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  • Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of the 20th international conference on very large data bases (VLDB 1994), pp 487–499

  • Agrawal R, Srikant R (1995) Mining sequential patterns. In: 11th international conference on data engineering (ICDE 1995), pp 3–14

  • Calders T, Dexters N, Goethals B (2007) Mining frequent itemsets in a stream. In: Proceedings of the 7th IEEE international conference on data mining (ICDM 2007), pp 83–92

  • Casas-Garriga G (2003) Discovering unbounded episodes in sequential data. In: Knowledge discovery in databases: PKDD 2003, 7th European conference on principles and practice of knowledge discovery in databases, pp 83–94

  • Casas-Garriga G (2005) Summarizing sequential data with closed partial orders. In: Proceedings of the SIAM international conference on data mining (SDM 2005), pp 380–391

  • Cule B, Goethals B, Robardet C (2009) A new constraint for mining sets in sequences. In: Proceedings of the SIAM international conference on data mining (SDM 2009), pp 317–328

  • Garofalakis M, Rastogi R, Shim K (2002) Mining sequential patterns with regular expression constraints. IEEE Trans Knowl Data Eng 14(3): 530–552

    Article  Google Scholar 

  • Gwadera R, Atallah MJ, Szpankowski W (2005a) Markov models for identification of significant episodes. In: Proceedings of the SIAM international conference on data mining (SDM 2005), pp 404–414

  • Gwadera R, Atallah MJ, Szpankowski W (2005) Reliable detection of episodes in event sequences. Knowl Inf Syst 7(4): 415–437

    Article  Google Scholar 

  • Laxman S, Sastry PS (2006) A survey of temporal data mining. SADHANA Acad Proc Eng Sci 31(2): 173–198

    MathSciNet  MATH  Google Scholar 

  • Laxman S, Sastry PS, Unnikrishnan KP (2007) A fast algorithm for finding frequent episodes in event streams. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 2007), pp 410–419

  • Mannila H, Toivonen H, Verkamo AI (1997) Discovery of frequent episodes in event sequences. Data Min Knowl Discov 1(3): 259–289

    Article  Google Scholar 

  • Méger N, Rigotti C (2004) Constraint-based mining of episode rules and optimal window sizes. In: Knowledge discovery in databases: PKDD 2004, 8th European conference on principles and practice of knowledge discovery in databases, pp 313–324

  • Pasquier N, Bastide Y, Taouil R, Lakhal L (1999) Discovering frequent closed itemsets for association rules. In: ICDT ’99: proceedings of the 7th international conference on database theory, pp 398–416

  • Pei J, Wang H, Liu J, Wang K, Wang J, Yu PS (2006) Discovering frequent closed partial orders from strings. IEEE Trans Knowl Data Eng 18(11): 1467–1481

    Article  Google Scholar 

  • Tatti N (2009) Significance of episodes based on minimal windows. In: Proceedings of the 9th IEEE international conference on data mining (ICDM 2009), pp 513–522

  • Tatti N, Cule B (2010) Mining closed strict episodes. In: Proceedings of the 10th IEEE international conference on data mining (ICDM 2010)

  • Tzvetkov P, Yan X, Han J (2003) Tsp: mining top-k closed sequential patterns. In: Proceedings of the 3rd IEEE international conference on data mining (ICDM 2003), pp 347–354

  • Wang J, Han J (2004) Bide: efficient mining of frequent closed sequences. In: 20th international conference on data engineering (ICDE 2004), p 79

  • Wang JT-L, Chirn G-W, Marr TG, Shapiro B, Shasha D, Zhang K (1994) Combinatorial pattern discovery for scientific data: some preliminary results. ACM SIGMOD Rec 23(2): 115–125

    Article  Google Scholar 

  • Yan X, Han J, Afshar R (2003) Clospan: mining closed sequential patterns in large datasets. In: Proceedings of the SIAM international conference on data mining (SDM 2003), pp 166–177

  • Zhou W, Liu H, Cheng H (2010) Mining closed episodes from event sequences efficiently. In: Proceedings of the 14th Pacific-Asia conference on knowledge discovery and data mining, vol 1, pp 310–318

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Nikolaj Tatti.

Additional information

Responsible editor: M. J. Zaki.

The research described in this paper builds upon and extends the work appearing in Proceedings of Tenth IEEE International Conference on Data Mining (ICDM 2010), 2010 (Tatti and Cule 2010).

Rights and permissions

Reprints and permissions

About this article

Cite this article

Tatti, N., Cule, B. Mining closed strict episodes. Data Min Knowl Disc 25, 34–66 (2012). https://doi.org/10.1007/s10618-011-0232-z

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10618-011-0232-z

Keywords

Navigation