Abstract
In the field of exploratory data mining, local structure in data can be described by patterns and discovered by mining algorithms. Although many solutions have been proposed to address the redundancy problems in pattern mining, most of them either provide succinct pattern sets or take the interests of the user into account—but not both. Consequently, the analyst has to invest substantial effort in identifying those patterns that are relevant to her specific interests and goals.
To address this problem, we propose a novel approach that combines pattern sampling with interactive data mining. In particular, we introduce the LetSIP algorithm, which builds upon recent advances in (1) weighted sampling in SAT and (2) learning to rank in interactive pattern mining. Specifically, it exploits user feedback to directly learn the parameters of the sampling distribution that represents the user’s interests.
We compare the performance of the proposed algorithm to the state of the art in interactive pattern mining by emulating the interests of a user. The resulting system allows efficient, interleaved learning and sampling, and thus enables user-specific anytime data exploration. Finally, LetSIP demonstrates favourable trade-offs concerning both quality–diversity and exploitation–exploration when compared to existing methods.
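The interplay of sampling and learning from feedback described in the abstract can be illustrated with a toy loop. The sketch below is not the LetSIP algorithm: it replaces SAT-based weighted sampling with exact sampling over a small, fully enumerated pattern space, and it uses a plain pairwise hinge-loss update as a stand-in for the learning-to-rank component. The feature map, the exponential form of the sampling weights, the emulated user, and all function names are assumptions made for illustration only.

```python
import itertools
import math
import random


def all_itemsets(n_items):
    """Enumerate every non-empty itemset over items 0..n_items-1 (toy pattern space)."""
    items = range(n_items)
    return [frozenset(c) for r in range(1, n_items + 1)
            for c in itertools.combinations(items, r)]


def features(pattern, n_items):
    """Assumed feature map: item-membership indicators plus normalised length."""
    return [1.0 if i in pattern else 0.0 for i in range(n_items)] + [len(pattern) / n_items]


def score(w, phi):
    """Linear score w . phi."""
    return sum(wi * xi for wi, xi in zip(w, phi))


def sample_patterns(patterns, w, n_items, k, rng):
    """Draw k distinct patterns, each with probability proportional to exp(w . phi(p))."""
    pool = list(patterns)
    chosen = []
    for _ in range(k):
        weights = [math.exp(score(w, features(p, n_items))) for p in pool]
        pick = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(pick))  # remove so the query has no duplicates
    return chosen


def emulate_feedback(query, user_quality):
    """Emulated user: return (better, worse) pairs for strictly ordered patterns."""
    return [(a, b) for a in query for b in query if user_quality(a) > user_quality(b)]


def pairwise_update(w, pairs, n_items, lr=0.1):
    """Pairwise hinge-loss SGD step: push 'better' above 'worse' by a margin of 1."""
    for better, worse in pairs:
        phi_b = features(better, n_items)
        phi_w = features(worse, n_items)
        if score(w, phi_b) - score(w, phi_w) < 1.0:
            w = [wi + lr * (b - c) for wi, b, c in zip(w, phi_b, phi_w)]
    return w


def run_session(patterns, n_items, user_quality, rounds=30, k=4, seed=0):
    """Interleave sampling and learning: sample a query, collect feedback, update w."""
    rng = random.Random(seed)
    w = [0.0] * (n_items + 1)
    for _ in range(rounds):
        query = sample_patterns(patterns, w, n_items, k, rng)
        pairs = emulate_feedback(query, user_quality)
        w = pairwise_update(w, pairs, n_items)
    return w
```

With a hypothetical emulated user who prefers patterns containing item 0, for example `run_session(all_itemsets(5), 5, lambda p: 1.0 if 0 in p else 0.0)`, the learned weight for item 0 grows over the session, so later queries are drawn from a distribution that favours such patterns while every other pattern keeps a non-zero probability of being sampled.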
Acknowledgements
Vladimir Dzyuba is supported by FWO-Vlaanderen. The authors would like to thank the anonymous reviewers for their helpful feedback.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Dzyuba, V., van Leeuwen, M. (2017). Learning What Matters – Sampling Interesting Patterns. In: Kim, J., Shim, K., Cao, L., Lee, JG., Lin, X., Moon, YS. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2017. Lecture Notes in Computer Science, vol 10234. Springer, Cham. https://doi.org/10.1007/978-3-319-57454-7_42
DOI: https://doi.org/10.1007/978-3-319-57454-7_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57453-0
Online ISBN: 978-3-319-57454-7
eBook Packages: Computer Science, Computer Science (R0)