
Anytime mining of sequential discriminative patterns in labeled sequences

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Labeled datasets are useful not only for learning models and performing predictive analytics but also for improving our understanding of a domain and its target classes. The subgroup discovery task, studied for more than two decades, concerns the discovery of patterns that cover sets of objects with interesting properties, e.g., patterns that characterize or discriminate a given target class. Although many subgroup discovery algorithms have been proposed for transactional and numerical data, discovering subgroups within labeled sequential data has been much less studied. First, we propose SeqScout, an anytime algorithm that discovers interesting subgroups w.r.t. a chosen quality measure. It is a sampling algorithm that mines discriminative sequential patterns using a multi-armed bandit model. For a given budget, it finds a collection of local optima in the search space of descriptions and thus of subgroups. It requires little configuration and is independent of the quality measure used for pattern scoring. Second, we introduce MCTSExtent, another anytime algorithm that pursues a better trade-off between exploration and exploitation when sampling the search space. To the best of our knowledge, this is the first time that the Monte Carlo Tree Search framework has been exploited in a sequential data mining setting. We have conducted a thorough evaluation of our algorithms on several datasets to illustrate their added value, and we discuss their qualitative and quantitative results.
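To make the two ingredients named in the abstract more concrete (a quality measure for scoring a pattern, and a bandit rule for choosing where to sample next), here is a minimal sketch. It is not the authors' implementation. It assumes weighted relative accuracy (WRAcc) as the quality measure and UCB1 as the bandit policy; the abstract fixes neither. It also simplifies sequences to strings of single items, and the names `is_subsequence`, `wracc`, `ucb1`, and the toy `data` are hypothetical.

```python
import math


def is_subsequence(pattern, sequence):
    """Return True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator, so order is respected.
    return all(item in it for item in pattern)


def wracc(pattern, data, target):
    """Weighted relative accuracy of `pattern` for class `target`.

    `data` is a list of (sequence, label) pairs. WRAcc is one possible
    quality measure, chosen here only for illustration.
    """
    n = len(data)
    covered = [label for seq, label in data if is_subsequence(pattern, seq)]
    if not covered:
        return 0.0
    p_target = sum(1 for _, label in data if label == target) / n
    p_target_covered = sum(1 for label in covered if label == target) / len(covered)
    # Coverage of the pattern times the gain in target frequency.
    return (len(covered) / n) * (p_target_covered - p_target)


def ucb1(mean_reward, pulls, total_pulls, c=math.sqrt(2)):
    """UCB1 score of one arm: average reward plus an exploration bonus."""
    if pulls == 0:
        return math.inf  # arms never tried are selected first
    return mean_reward + c * math.sqrt(math.log(total_pulls) / pulls)


# Toy labeled database (assumption): each string is a sequence of items.
data = [("abc", "+"), ("acb", "+"), ("bca", "-"), ("cab", "-")]

print(wracc("ab", data, "+"))  # covers 3 sequences, 2 of them positive
print(wracc("ac", data, "+"))  # covers exactly the 2 positive sequences
```

In a bandit-style sampler of this kind, each arm could be a database sequence, a pull could generalize that sequence into a candidate pattern, and the reward could be the pattern's quality. Because `wracc` is only called through its return value, it could be swapped for another measure without changing the selection rule.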




Notes

  1. https://www.rocketleague.com/.

  2. https://github.com/Romathonat/MCTSExtent.

  3. In the context of sequential pattern mining, the search space is a priori infinite. However, we can define the border of the search space (the bottom border in Fig. 5) by excluding patterns having a null support. We can easily prove that each element of this border is a sequence within the database. Therefore, the search space shape depends on the data.

  4. https://github.com/Romathonat/MCTSExtent.

  5. https://www.rocketleague.com/.


Author information


Corresponding author

Correspondence to Romain Mathonat.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Mathonat, R., Nurbakova, D., Boulicaut, JF. et al. Anytime mining of sequential discriminative patterns in labeled sequences. Knowl Inf Syst 63, 439–476 (2021). https://doi.org/10.1007/s10115-020-01523-7


  • DOI: https://doi.org/10.1007/s10115-020-01523-7
