Anytime discovery of a diverse set of patterns with Monte Carlo tree search

Data Mining and Knowledge Discovery

Abstract

The discovery of patterns that accurately discriminate one class label from another remains a challenging data mining task. Subgroup discovery (SD) is one of the frameworks that make it possible to elicit such interesting patterns from labeled data. One question remains largely open: how should an accurate heuristic search technique be selected when exhaustive enumeration of the pattern space is infeasible? Existing approaches use beam search, sampling, and genetic algorithms to discover a pattern set that is non-redundant and of high quality w.r.t. a pattern quality measure. We argue that such approaches produce pattern sets that lack diversity: only a few patterns that are both of high quality and sufficiently different from one another are discovered. Our main contribution is to formally define pattern mining as a game and to solve it with Monte Carlo tree search (MCTS). MCTS can be seen as an exhaustive search guided by random simulations, which can be stopped early (under a limited budget) by virtue of its best-first search property. We show through a comprehensive set of experiments how MCTS enables the anytime discovery of a diverse pattern set of high quality. It outperforms other approaches when dealing with a large pattern search space and for different quality measures. Thanks to its genericity, our MCTS approach can be used for SD but also for many other pattern mining tasks.
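To make the idea of pattern mining as a single-player game more concrete, the following is a minimal sketch, not the authors' MCTS4DM implementation. It assumes itemset patterns, weighted relative accuracy (WRAcc) as the quality measure, UCB1 for child selection, and rollouts rewarded by the best quality seen along the random path. All names (`wracc`, `refinements`, `Node`, `mcts`, `DATA`) and the toy dataset are hypothetical and chosen only for illustration.

```python
import math
import random

# Toy labeled transactions (assumption): label 1 only when both "a" and "b" occur.
DATA = [
    ({"a", "b"}, 1), ({"a", "b", "c"}, 1), ({"a", "b", "d"}, 1), ({"a", "c"}, 0),
    ({"b", "d"}, 0), ({"c", "d"}, 0), ({"a", "d"}, 0), ({"b", "c"}, 0),
]

def wracc(pattern, data):
    """Weighted relative accuracy of an itemset w.r.t. the positive class."""
    covered = [label for items, label in data if pattern <= items]
    if not covered:
        return 0.0
    p_all = sum(label for _, label in data) / len(data)
    p_cov = sum(covered) / len(covered)
    return (len(covered) / len(data)) * (p_cov - p_all)

def refinements(pattern, items):
    """Add one item larger than all present ones, so each itemset is generated once."""
    start = max(pattern) if pattern else ""
    return [pattern | {i} for i in items if i > start]

class Node:
    def __init__(self, pattern, parent=None):
        self.pattern, self.parent = pattern, parent
        self.children, self.untried = [], None
        self.visits, self.total = 0, 0.0

def mcts(data, budget=200, c=0.5, seed=0):
    """Run `budget` iterations; return all evaluated patterns sorted by quality."""
    rng = random.Random(seed)
    items = sorted(set().union(*(t for t, _ in data)))
    root, found = Node(frozenset()), {}
    for _ in range(budget):
        node = root
        # Selection: descend with UCB1 while the node is fully expanded.
        while True:
            if node.untried is None:
                node.untried = refinements(node.pattern, items)
                rng.shuffle(node.untried)
            if node.untried or not node.children:
                break
            parent = node
            node = max(parent.children, key=lambda ch: ch.total / ch.visits
                       + c * math.sqrt(2 * math.log(parent.visits) / ch.visits))
        # Expansion: add one unexplored refinement, if any is left.
        if node.untried:
            child = Node(node.untried.pop(), node)
            node.children.append(child)
            node = child
        # Simulation: random walk of refinements; reward is the best quality seen.
        pattern = node.pattern
        reward = found[pattern] = wracc(pattern, data)
        options = refinements(pattern, items)
        while options:
            pattern = rng.choice(options)
            found[pattern] = wracc(pattern, data)
            reward = max(reward, found[pattern])
            options = refinements(pattern, items)
        # Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.total += reward
            node = node.parent
    return sorted(found.items(), key=lambda kv: -kv[1])

if __name__ == "__main__":
    for pattern, quality in mcts(DATA)[:3]:
        print(sorted(pattern), round(quality, 4))
```

Because every evaluated pattern is stored in `found`, the loop can be interrupted after any iteration and still return the best patterns seen so far, which is the anytime behaviour the abstract refers to. This sketch ranks patterns by quality only; it does not filter redundant patterns or measure the diversity of the result set.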

Figs. 1–3 (images not reproduced)

Taken from Browne et al. (2012)

Figs. 4–25 (images not reproduced)

Notes

  1. We consider the finite set of all intervals from the data, without greedy discretization. As shown later, better patterns can be found in that case, when using only MCTS on large datasets.

  2. https://github.com/guillaume-bosc/MCTS4DM.

References

  • Abramson B (1990) Expected-outcome: a general model of static evaluation. IEEE Trans Pattern Anal Mach Intell 12(2):182–193. https://doi.org/10.1109/34.44404

  • Abudawood T, Flach PA (2009) Evaluation measures for multi-class subgroup discovery. In: Buntine WL, Grobelnik M, Mladenic D, Shawe-Taylor J (eds) Machine learning and knowledge discovery in databases, European Conference, ECML PKDD 2009, Bled, Slovenia, September 7–11, 2009, Proceedings, Part I, Springer, Lecture Notes in Computer Science, vol 5781, pp 35–50. https://doi.org/10.1007/978-3-642-04180-8_20

  • Atzmüller M, Lemmerich F (2009) Fast subgroup discovery for continuous target concepts. In: Rauch J, Ras ZW, Berka P, Elomaa T (eds) Foundations of intelligent systems, 18th international symposium, ISMIS 2009, Prague, Czech Republic, September 14–17, 2009. Proceedings, Springer, Lecture Notes in Computer Science, vol 5722, pp 35–44. https://doi.org/10.1007/978-3-642-04125-9_7

  • Atzmüller M, Puppe F (2006) Sd-map—a fast algorithm for exhaustive subgroup discovery. In: Fürnkranz J, Scheffer T, Spiliopoulou M (eds) Knowledge discovery in databases: PKDD 2006, 10th European conference on principles and practice of knowledge discovery in databases, Berlin, Germany, September 18–22, 2006, Proceedings, Springer, Lecture Notes in Computer Science, vol 4213, pp 6–17. https://doi.org/10.1007/11871637_6

  • Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Mach Learn 47(2–3):235–256. https://doi.org/10.1023/A:1013689704352

  • Belfodil A, Kuznetsov SO, Robardet C, Kaytoue M (2017) Mining convex polygon patterns with formal concept analysis. In: Sierra C (ed) Proceedings of the twenty-sixth international joint conference on artificial intelligence, IJCAI 2017, Melbourne, Australia, August 19–25, 2017, ijcai.org, pp 1425–1432. https://doi.org/10.24963/ijcai.2017/197

  • Bendimerad AA, Plantevit M, Robardet C (2016) Unsupervised exceptional attributed sub-graph mining in urban data. In: Bonchi F, Domingo-Ferrer J, Baeza-Yates RA, Zhou Z, Wu X (eds) IEEE 16th international conference on data mining, ICDM 2016, December 12–15, 2016, Barcelona, Spain, IEEE, pp 21–30. https://doi.org/10.1109/ICDM.2016.0013

  • Björnsson Y, Finnsson H (2009) Cadiaplayer: a simulation-based general game player. IEEE Trans Comput Intell AI Games 1(1):4–15. https://doi.org/10.1109/TCIAIG.2009.2018702

  • Boley M, Lucchese C, Paurat D, Gärtner T (2011) Direct local pattern sampling by efficient two-step random procedures. In: Apté C, Ghosh J, Smyth P (eds) Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining, San Diego, CA, USA, August 21–24, 2011, ACM, pp 582–590. https://doi.org/10.1145/2020408.2020500

  • Bosc G, Golebiowski J, Bensafi M, Robardet C, Plantevit M, Boulicaut J, Kaytoue M (2016) Local subgroup discovery for eliciting and understanding new structure-odor relationships. In: Calders T, Ceci M, Malerba D (eds) Discovery science—19th international conference, DS 2016, Bari, Italy, October 19–21, 2016, Proceedings, Lecture Notes in Computer Science, vol 9956, pp 19–34. https://doi.org/10.1007/978-3-319-46307-0_2

  • Boulicaut J, Jeudy B (2010) Constraint-based data mining. In: Maimon O, Rokach L (eds) Data mining and knowledge discovery handbook, 2nd edn. Springer, Berlin, pp 339–354. https://doi.org/10.1007/978-0-387-09823-4_17

  • Bringmann B, Zimmermann A (2009) One in a million: picking the right patterns. Knowl Inf Syst 18(1):61–81. https://doi.org/10.1007/s10115-008-0136-4

  • Browne C, Powley EJ, Whitehouse D, Lucas SM, Cowling PI, Rohlfshagen P, Tavener S, Liebana DP, Samothrakis S, Colton S (2012) A survey of monte carlo tree search methods. IEEE Trans Comput Intell AI Games 4(1):1–43. https://doi.org/10.1109/TCIAIG.2012.2186810

  • Carmona CJ, González P, del Jesús MJ, Herrera F (2010) NMEEF-SD: non-dominated multiobjective evolutionary algorithm for extracting fuzzy rules in subgroup discovery. IEEE Trans Fuzzy Syst 18(5):958–970. https://doi.org/10.1109/TFUZZ.2010.2060200

  • del Jesús MJ, González P, Herrera F, Mesonero M (2007) Evolutionary fuzzy rule induction process for subgroup discovery: a case study in marketing. IEEE Trans Fuzzy Syst 15(4):578–592. https://doi.org/10.1109/TFUZZ.2006.890662

  • Downar L, Duivesteijn W (2017) Exceptionally monotone models—the rank correlation model class for exceptional model mining. Knowl Inf Syst 51(2):369–394. https://doi.org/10.1007/s10115-016-0979-z

  • Duivesteijn W, Knobbe AJ (2011) Exploiting false discoveries—statistical validation of patterns and quality measures in subgroup discovery. In: Cook DJ, Pei J, Wang W, Zaïane OR, Wu X (eds) 11th IEEE international conference on data mining, ICDM 2011, Vancouver, BC, Canada, December 11–14, 2011, IEEE Computer Society, pp 151–160. https://doi.org/10.1109/ICDM.2011.65

  • Duivesteijn W, Knobbe AJ, Feelders A, van Leeuwen M (2010) Subgroup discovery meets bayesian networks—an exceptional model mining approach. In: Webb GI, Liu B, Zhang C, Gunopulos D, Wu X (eds) ICDM 2010, The 10th ieee international conference on data mining, Sydney, Australia, 14–17 December 2010, IEEE Computer Society, pp 158–167. https://doi.org/10.1109/ICDM.2010.53

  • Duivesteijn W, Feelders A, Knobbe AJ (2016) Exceptional model mining–supervised descriptive local pattern mining with complex target concepts. Data Min Knowl Discov 30(1):47–98. https://doi.org/10.1007/s10618-015-0403-4

  • Egho E, Gay D, Boullé M, Voisine N, Clérot F (2015) A parameter-free approach for mining robust sequential classification rules. In: Aggarwal CC, Zhou Z, Tuzhilin A, Xiong H, Wu X (eds) 2015 IEEE international conference on data mining, ICDM 2015, Atlantic City, NJ, USA, November 14–17, 2015, IEEE Computer Society, pp 745–750. https://doi.org/10.1109/ICDM.2015.87

  • Egho E, Gay D, Boullé M, Voisine N, Clérot F (2017) A user parameter-free approach for mining robust sequential classification rules. Knowl Inf Syst 52(1):53–81. https://doi.org/10.1007/s10115-016-1002-4

  • Fürnkranz J, Gamberger D, Lavrac N (2012) Foundations of rule learning. Cognitive Technologies. Springer, Berlin. https://doi.org/10.1007/978-3-540-75197-7

  • Gaudel R, Sebag M (2010) Feature selection as a one-player game. In: Fürnkranz J, Joachims T (eds) Proceedings of the 27th international conference on machine learning (ICML-10), June 21–24, 2010, Haifa, Israel, Omnipress, pp 359–366. http://www.icml2010.org/papers/247.pdf

  • Gay D, Boullé M (2012) A bayesian approach for classification rule mining in quantitative databases. In: Flach PA, Bie TD, Cristianini N (eds) Machine learning and knowledge discovery in databases—European conference, ECML PKDD 2012, Bristol, UK, September 24–28, 2012. Proceedings, Part II, Springer, Lecture Notes in Computer Science, vol 7524, pp 243–259. https://doi.org/10.1007/978-3-642-33486-3_16

  • Gelly S, Silver D (2007) Combining online and offline knowledge in UCT. In: Ghahramani Z (ed) Machine learning, proceedings of the twenty-fourth international conference (ICML 2007), Corvallis, Oregon, USA, June 20–24, 2007, ACM, ACM International Conference Proceeding Series, vol 227, pp 273–280. https://doi.org/10.1145/1273496.1273531

  • Grosskreutz H, Rüping S, Wrobel S (2008) Tight optimistic estimates for fast subgroup discovery. In: Daelemans W, Goethals B, Morik K (eds) Machine learning and knowledge discovery in databases, European conference, ECML/PKDD 2008, Antwerp, Belgium, September 15–19, 2008, Proceedings, Part I, Springer, Lecture Notes in Computer Science, vol 5211, pp 440–456. https://doi.org/10.1007/978-3-540-87479-9_47

  • Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Chen W, Naughton JF, Bernstein PA (eds) Proceedings of the 2000 ACM SIGMOD international conference on management of data, May 16–18, 2000, Dallas, Texas, USA., ACM, pp 1–12. https://doi.org/10.1145/342009.335372

  • Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min Knowl Discov 8(1):53–87. https://doi.org/10.1023/B:DAMI.0000005258.31418.83

  • Helmbold DP, Parker-Wood A (2009) All-moves-as-first heuristics in Monte-Carlo go. In: Arabnia HR, de la Fuente D, Olivas JA (eds) Proceedings of the 2009 international conference on artificial intelligence, ICAI 2009, July 13–16, 2009, Las Vegas Nevada, USA, 2 Volumes, CSREA Press, pp 605–610

  • Holland JH (1975) Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence. University Michigan Press, Ann Arbor

  • Kavsek B, Lavrac N (2006) APRIORI-SD: adapting association rule learning to subgroup discovery. Appl Artif Intell 20(7):543–583. https://doi.org/10.1080/08839510600779688

  • Kaytoue M, Kuznetsov SO, Napoli A (2011) Revisiting numerical pattern mining with formal concept analysis. In: Walsh (2011), pp 1342–1347. https://doi.org/10.5591/978-1-57735-516-8/IJCAI11-227

  • Kaytoue M, Plantevit M, Zimmermann A, Bendimerad AA, Robardet C (2017) Exceptional contextual subgraph mining. Mach Learn 106(8):1171–1211. https://doi.org/10.1007/s10994-016-5598-0

  • Klösgen W (1996) Explora: a multipattern and multistrategy discovery assistant. In: Fayyad UM, Piatetsky-Shapiro G, Smyth P, Uthurusamy R (eds) Advances in knowledge discovery and data mining. AAAI/MIT Press, Cambridge, pp 249–271

  • Kocsis L, Szepesvári C (2006) Bandit based monte-carlo planning. In: Fürnkranz J, Scheffer T, Spiliopoulou M (eds) Machine learning: ECML 2006, 17th European conference on machine learning, Berlin, Germany, September 18–22, 2006, Proceedings, Springer, Lecture Notes in Computer Science, vol 4212, pp 282–293. https://doi.org/10.1007/11871842_29

  • Lavrac N, Flach PA, Zupan B (1999) Rule evaluation measures: A unifying view. In: Dzeroski S, Flach PA (eds) Inductive logic programming, 9th international workshop, ILP-99, Bled, Slovenia, June 24–27, 1999, Proceedings, Springer, Lecture Notes in Computer Science, vol 1634, pp 174–185. https://doi.org/10.1007/3-540-48751-4_17

  • Lavrac N, Cestnik B, Gamberger D, Flach PA (2004) Decision support through subgroup discovery: three case studies and the lessons learned. Mach Learn 57(1–2):115–143. https://doi.org/10.1023/B:MACH.0000035474.48771.cd

  • Leman D, Feelders A, Knobbe AJ (2008) Exceptional model mining. In: Daelemans W, Goethals B, Morik K (eds) Machine learning and knowledge discovery in databases, European conference, ECML/PKDD 2008, Antwerp, Belgium, September 15–19, 2008, Proceedings, Part II, Springer, Lecture Notes in Computer Science, vol 5212, pp 1–16. https://doi.org/10.1007/978-3-540-87481-2_1

  • Lemmerich F, Atzmueller M, Puppe F (2016) Fast exhaustive subgroup discovery with numerical target concepts. Data Min Knowl Discov 30(3):711–762. https://doi.org/10.1007/s10618-015-0436-8

  • Lowerre BT (1976) The harpy speech recognition system. Ph.D. Thesis, Carnegie-Mellon University, Pittsburgh, PA, Department of Computer Science

  • Lucas T, Silva TCPB, Vimieiro R, Ludermir TB (2017) A new evolutionary algorithm for mining top-k discriminative patterns in high dimensional data. Appl Soft Comput 59:487–499. https://doi.org/10.1016/j.asoc.2017.05.048

  • Meeng M, Duivesteijn W, Knobbe AJ (2014) Rocsearch—an roc-guided search strategy for subgroup discovery. In: Zaki MJ, Obradovic Z, Tan P, Banerjee A, Kamath C, Parthasarathy S (eds) Proceedings of the 2014 SIAM international conference on data mining, Philadelphia, Pennsylvania, USA, April 24–26, 2014, SIAM, pp 704–712. https://doi.org/10.1137/1.9781611973440.81

  • Moens S, Boley M (2014) Instant exceptional model mining using weighted controlled pattern sampling. In: Blockeel H, van Leeuwen M, Vinciotti V (eds) Advances in intelligent data analysis XIII—13th international symposium, IDA 2014, Leuven, Belgium, October 30–November 1, 2014. Proceedings, Springer, Lecture Notes in Computer Science, vol 8819, pp 203–214. https://doi.org/10.1007/978-3-319-12571-8_18

  • Mueller M, Rosales R, Steck H, Krishnan S, Rao B, Kramer S (2009) Subgroup discovery for test selection: A novel approach and its application to breast cancer diagnosis. In: Adams NM, Robardet C, Siebes A, Boulicaut J (eds) Advances in intelligent data analysis VIII, 8th international symposium on intelligent data analysis, IDA 2009, Lyon, France, August 31–September 2, 2009. Proceedings, Springer, Lecture Notes in Computer Science, vol 5772, pp 119–130. https://doi.org/10.1007/978-3-642-03915-7_11

  • Novak PK, Lavrac N, Webb GI (2009) Supervised descriptive rule discovery: a unifying survey of contrast set, emerging pattern and subgroup mining. J Mach Learn Res 10:377–403. https://doi.org/10.1145/1577069.1577083

  • Pachón V, Vázquez JM, Domínguez JL, López MJM (2011) Multi-objective evolutionary approach for subgroup discovery. In: Corchado E, Kurzynski M, Wozniak M (eds) Hybrid artificial intelligent systems—6th international conference, HAIS 2011, Wroclaw, Poland, May 23–25, 2011, Proceedings, Part II, Springer, Lecture Notes in Computer Science, vol 6679, pp 271–278. https://doi.org/10.1007/978-3-642-21222-2_33

  • Rodríguez D, Ruiz R, Riquelme JC, Aguilar-Ruiz JS (2012) Searching for rules to detect defective modules: a subgroup discovery approach. Inf Sci 191:14–30. https://doi.org/10.1016/j.ins.2011.01.039

  • Russell SJ, Norvig P (2010) Artificial intelligence—a modern approach (3. internat. ed). Pearson Education. http://vig.pearsoned.com/store/product/1,1207,store-12521_isbn-0136042597,00.html

  • Schadd MPD, Winands MHM, van den Herik HJ, Chaslot G, Uiterwijk JWHM (2008) Single-player monte-carlo tree search. In: van den Herik HJ, Xu X, Ma Z, Winands MHM (eds) Computers and games, 6th international conference, CG 2008, Beijing, China, September 29–October 1, 2008. Proceedings, Springer, Lecture Notes in Computer Science, vol 5131, pp 1–12. https://doi.org/10.1007/978-3-540-87608-3_1

  • Silver D, Huang A, Maddison CJ, Guez A, Sifre L, van den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M, Dieleman S, Grewe D, Nham J, Kalchbrenner N, Sutskever I, Lillicrap TP, Leach M, Kavukcuoglu K, Graepel T, Hassabis D (2016) Mastering the game of go with deep neural networks and tree search. Nature 529(7587):484–489

  • van Leeuwen M, Galbrun E (2015) Association discovery in two-view data. IEEE Trans Knowl Data Eng 27(12):3190–3202. https://doi.org/10.1109/TKDE.2015.2453159

  • van Leeuwen M, Knobbe AJ (2011) Non-redundant subgroup discovery in large and complex data. In: Gunopulos D, Hofmann T, Malerba D, Vazirgiannis M (eds) Machine learning and knowledge discovery in databases—European conference, ECML PKDD 2011, Athens, Greece, September 5–9, 2011, Proceedings, Part III, Springer, Lecture Notes in Computer Science, vol 6913, pp 459–474. https://doi.org/10.1007/978-3-642-23808-6_30

  • van Leeuwen M, Knobbe AJ (2012) Diverse subgroup set discovery. Data Min Knowl Discov 25(2):208–242. https://doi.org/10.1007/s10618-012-0273-y

  • van Leeuwen M, Ukkonen A (2013) Discovering skylines of subgroup sets. In: Blockeel H, Kersting K, Nijssen S, Zelezný F (eds) Machine learning and knowledge discovery in databases—European conference, ECML PKDD 2013, Prague, Czech Republic, September 23–27, 2013, Proceedings, Part III, Springer, Lecture Notes in Computer Science, vol 8190, pp 272–287. https://doi.org/10.1007/978-3-642-40994-3_18

  • Walsh T (ed) (2011) IJCAI 2011, Proceedings of the 22nd international joint conference on artificial intelligence, Barcelona, Catalonia, Spain, July 16–22, 2011, IJCAI/AAAI. http://ijcai.org/proceedings/2011

  • Wrobel S (1997) An algorithm for multi-relational discovery of subgroups. In: Komorowski HJ, Zytkow JM (eds) Principles of data mining and knowledge discovery, first European symposium, PKDD ’97, Trondheim, Norway, June 24–27, 1997, Proceedings, Springer, Lecture Notes in Computer Science, vol 1263, pp 78–87. https://doi.org/10.1007/3-540-63223-9_108

Acknowledgements

The authors would like to thank the anonymous reviewers for their constructive and insightful comments. They also warmly thank Sandy Moens, Mario Boley, Tarcísio Lucas, Renato Vimieiro, Albrecht Zimmermann, Marc Plantevit, Aimene Belfodil, Abdallah Saffidine, Dave Ritchie and especially Céline Robardet for discussions, advice or code sharing. This work has been partially supported by the European Union (GRAISearch, FP7-PEOPLE-2013-IAPP) and the Institut rhônalpin des systèmes complexes (IXXI).

Author information

Corresponding author

Correspondence to Mehdi Kaytoue.

Additional information

Responsible editor: Johannes Fürnkranz.

About this article

Cite this article

Bosc, G., Boulicaut, JF., Raïssi, C. et al. Anytime discovery of a diverse set of patterns with Monte Carlo tree search. Data Min Knowl Disc 32, 604–650 (2018). https://doi.org/10.1007/s10618-017-0547-5

  • DOI: https://doi.org/10.1007/s10618-017-0547-5
