Interactive Data Exploration Using Pattern Mining

  • Matthijs van Leeuwen

Abstract

We live in the era of data and need tools to discover valuable information in large amounts of data. The goal of exploratory data mining is to provide as much insight into given data as possible. Within this field, pattern set mining aims at revealing structure in the form of sets of patterns. Although pattern set mining has been shown to be an effective solution to the infamous pattern explosion, important challenges remain.

One of the key challenges is to develop principled methods that allow user- and task-specific information to be taken into account, by directly involving the user in the discovery process. This way, the resulting patterns will be more relevant and interesting to the user. To achieve this, pattern mining algorithms will need to be combined with techniques from both visualisation and human-computer interaction. Another challenge is to establish techniques that perform well under constrained resources, as existing methods are usually computationally intensive. Consequently, they are only applied to relatively small datasets and on fast computers.

The ultimate goal is to make pattern mining more useful in practice, by enabling the user to interactively explore the data and identify interesting structure. In this paper we describe the state of the art, discuss open problems, and outline promising future directions.

Keywords

Interactive Data Exploration, Pattern Mining, Data Mining

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Matthijs van Leeuwen (1)
  1. Machine Learning group, KU Leuven, Leuven, Belgium
