Abstract
Labeled datasets are useful not only for learning models and performing predictive analytics but also for improving our understanding of a domain and its target classes. The subgroup discovery task has been studied for more than two decades. It concerns the discovery of patterns that cover sets of objects with interesting properties, e.g., patterns that characterize or discriminate a given target class. Though many subgroup discovery algorithms have been proposed for both transactional and numerical data, discovering subgroups within labeled sequential data has been much less studied. First, we propose an anytime algorithm, SeqScout, that discovers interesting subgroups w.r.t. a chosen quality measure. It is a sampling algorithm that mines discriminant sequential patterns using a multi-armed bandit model. For a given budget, it finds a collection of local optima in the search space of descriptions and thus of subgroups. It requires little configuration and is independent of the quality measure used for pattern scoring. We also introduce a second anytime algorithm, MCTSExtent, that pursues a better trade-off between exploration and exploitation when sampling the search space. To the best of our knowledge, this is the first time that the Monte Carlo Tree Search framework has been exploited in a sequential data mining setting. We have conducted a thorough evaluation of our algorithms on several datasets to illustrate their added value, and we discuss their qualitative and quantitative results.
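To make the sampling idea concrete, the following is a minimal sketch of bandit-driven pattern sampling, not the authors' implementation. It assumes sequences of single items rather than itemsets, uses weighted relative accuracy (WRAcc) as one possible quality measure, treats each sequence of the target class as an arm, and selects arms with UCB1. The names `is_subsequence`, `wracc`, `random_generalization`, and `bandit_sample` are hypothetical, and the local optimization step is omitted.

```python
import math
import random


def is_subsequence(pattern, sequence):
    """True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def wracc(pattern, data, target):
    """Weighted relative accuracy of `pattern` for class `target`.

    `data` is a list of (sequence, label) pairs.
    """
    n = len(data)
    covered = [label for seq, label in data if is_subsequence(pattern, seq)]
    if not covered:
        return 0.0
    p_target = sum(label == target for _, label in data) / n
    p_target_cov = sum(label == target for label in covered) / len(covered)
    return (len(covered) / n) * (p_target_cov - p_target)


def random_generalization(sequence, rng):
    """Drop a random subset of items, keeping at least one (assumed scheme)."""
    kept = [x for x in sequence if rng.random() < 0.5]
    return tuple(kept) if kept else (rng.choice(sequence),)


def bandit_sample(data, target, budget, seed=0):
    """Spend `budget` iterations sampling patterns; return them sorted by quality."""
    rng = random.Random(seed)
    arms = [seq for seq, label in data if label == target]
    counts = [0] * len(arms)
    means = [0.0] * len(arms)
    found = {}
    for t in range(1, budget + 1):
        # UCB1: play every arm once, then maximize mean + exploration bonus.
        untried = [i for i, c in enumerate(counts) if c == 0]
        if untried:
            i = untried[0]
        else:
            i = max(
                range(len(arms)),
                key=lambda k: means[k] + math.sqrt(2 * math.log(t) / counts[k]),
            )
        pattern = random_generalization(arms[i], rng)
        quality = wracc(pattern, data, target)
        counts[i] += 1
        means[i] += (quality - means[i]) / counts[i]  # incremental mean
        found[pattern] = quality
    return sorted(found.items(), key=lambda kv: kv[1], reverse=True)


# Toy labeled sequences (invented for illustration only).
data = [
    (("a", "b", "c"), "+"),
    (("a", "c"), "+"),
    (("b", "d"), "-"),
    (("d", "b"), "-"),
]
results = bandit_sample(data, "+", budget=20)
```

Because the loop can be stopped after any iteration and still return the patterns found so far, the sketch has the anytime flavor described above. Swapping `wracc` for another scoring function requires no other change, which illustrates the independence from the quality measure. Note that UCB1 is usually stated for rewards in [0, 1], whereas WRAcc lies in [-0.25, 0.25]; the sketch leaves the reward unscaled.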
Notes
In the context of sequential pattern mining, the search space is a priori infinite. However, we can define the border of the search space (the bottom border in Fig. 5) by excluding patterns with zero support. It is easy to prove that each element of this border is a sequence of the database. Therefore, the shape of the search space depends on the data.
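The claim in this note can be checked on a toy example. The sketch below assumes sequences of single items (a simplification of the general itemset setting) and uses hypothetical helper names. It enumerates every pattern with non-zero support, which is exactly the set of subsequences of database sequences, and keeps the maximal ones as the border.

```python
from itertools import combinations


def is_subsequence(pattern, sequence):
    """True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def supported_patterns(database):
    """All non-empty patterns with non-zero support in `database`."""
    patterns = set()
    for seq in database:
        for k in range(1, len(seq) + 1):
            # Index combinations are in increasing order, so item order is kept.
            for idx in combinations(range(len(seq)), k):
                patterns.add(tuple(seq[i] for i in idx))
    return patterns


def bottom_border(database):
    """Supported patterns that are not a strict subsequence of another one."""
    patterns = supported_patterns(database)
    return {
        p
        for p in patterns
        if not any(p != q and is_subsequence(p, q) for q in patterns)
    }


# Toy database (invented for illustration only).
database = [("a", "b", "c"), ("a", "c"), ("b", "d")]
border = bottom_border(database)
# Every border element is itself a sequence of the database.
assert border <= set(database)
```

On this toy database the border contains `("a", "b", "c")` and `("b", "d")`, while `("a", "c")` is excluded because it is a subsequence of `("a", "b", "c")`. The enumeration is exponential in sequence length, so it is only meant to illustrate the finiteness argument, not to serve as a mining procedure.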
Cite this article
Mathonat, R., Nurbakova, D., Boulicaut, JF. et al. Anytime mining of sequential discriminative patterns in labeled sequences. Knowl Inf Syst 63, 439–476 (2021). https://doi.org/10.1007/s10115-020-01523-7
DOI: https://doi.org/10.1007/s10115-020-01523-7