Subjective Interestingness in Exploratory Data Mining

  • Tijl De Bie
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8207)

Abstract

Exploratory data mining aims to assist a user in improving their understanding of the data. Given this aim, it seems self-evident that optimizing this process requires considering the user as well as the data. Yet the vast majority of exploratory data mining methods (including most methods for clustering, itemset and association rule mining, subgroup discovery, dimensionality reduction, etc.) formalize the interestingness of patterns in an objective manner, disregarding the user altogether. More often than not, this leads to subjectively uninteresting patterns being reported.

Here I will discuss a general mathematical framework for formalizing interestingness in a subjective manner. I will further demonstrate how it can be successfully instantiated for a variety of exploratory data mining problems. Finally, I will highlight some connections to other work, and outline some of the challenges and research opportunities ahead.

Keywords

Background Model; Belief State; Prior Belief; Association Rule Mining; Background Distribution
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Tijl De Bie
    • 1
  1. Intelligent Systems Lab, University of Bristol, UK
