Learning What Matters – Sampling Interesting Patterns

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10234)

Abstract

In the field of exploratory data mining, local structure in data can be described by patterns and discovered by mining algorithms. Although many solutions have been proposed to address the redundancy problems in pattern mining, most of them either provide succinct pattern sets or take the interests of the user into account—but not both. Consequently, the analyst has to invest substantial effort in identifying those patterns that are relevant to her specific interests and goals.

To address this problem, we propose a novel approach that combines pattern sampling with interactive data mining. In particular, we introduce the LetSIP algorithm, which builds upon recent advances in (1) weighted sampling in SAT and (2) learning to rank in interactive pattern mining. Specifically, it exploits user feedback to directly learn the parameters of the sampling distribution that represents the user’s interests.

We compare the performance of the proposed algorithm to the state of the art in interactive pattern mining by emulating the interests of a user. The resulting system allows efficient, interleaved learning and sampling, and thereby supports user-specific anytime data exploration. Finally, LetSIP demonstrates favourable trade-offs concerning both quality–diversity and exploitation–exploration when compared to existing methods.
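
The interleaving of sampling and learning described in the abstract can be illustrated with a small toy loop. This is only a sketch under stated assumptions, not the LetSIP implementation: the pattern names, the two-dimensional feature vectors, the logistic weight function, the perceptron-style pairwise update, and the emulated user (who ranks by the first feature) are all hypothetical stand-ins. In particular, plain weighted sampling over an explicit list replaces the SAT-based weighted sampler, and the pairwise update replaces the learning-to-rank method used in the paper.

```python
import math
import random

# Hypothetical toy data: pattern name -> feature vector (e.g. frequency, length).
PATTERNS = {
    "a": (0.9, 0.1),
    "b": (0.7, 0.8),
    "c": (0.2, 0.9),
    "d": (0.1, 0.2),
}

def sampling_weight(features, w):
    """Map a feature vector to a positive sampling weight (logistic of a linear score)."""
    score = sum(wi * xi for wi, xi in zip(w, features))
    return 1.0 / (1.0 + math.exp(-score))

def sample_patterns(patterns, w, k, rng):
    """Draw k patterns (with replacement), each with probability proportional to its weight."""
    names = list(patterns)
    weights = [sampling_weight(patterns[n], w) for n in names]
    return rng.choices(names, weights=weights, k=k)

def update_from_ranking(patterns, w, ranking, lr=0.5):
    """Pairwise update: for each (better, worse) pair, shift w if the model does not agree."""
    w = list(w)
    for i, better in enumerate(ranking):
        for worse in ranking[i + 1:]:
            diff = [b - c for b, c in zip(patterns[better], patterns[worse])]
            margin = sum(wi * di for wi, di in zip(w, diff))
            if margin <= 0:  # model ranks the pair wrongly or is undecided
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w

def emulated_user_ranking(sampled):
    """Assumed user: prefers patterns with a high first feature (best first)."""
    return sorted(set(sampled), key=lambda p: PATTERNS[p][0], reverse=True)

# Interleave sampling and learning for a few rounds.
rng = random.Random(0)
w = [0.0, 0.0]
for _ in range(10):
    batch = sample_patterns(PATTERNS, w, k=3, rng=rng)
    w = update_from_ranking(PATTERNS, w, emulated_user_ranking(batch))
```

Each round samples from the current distribution, asks the emulated user to rank the sample, and feeds the ranking straight back into the weights that define the next round's distribution, so exploration can stop after any round.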

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Department of Computer Science, KU Leuven, Leuven, Belgium
  2. LIACS, Leiden University, Leiden, The Netherlands
