Formalizing Complex Prior Information to Quantify Subjective Interestingness of Frequent Pattern Sets

  • Kleanthis-Nikolaos Kontonasios
  • Tijl De Bie
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7619)

Abstract

In this paper, we are concerned with the problem of modelling prior information of a data miner about the data, with the purpose of quantifying subjective interestingness of patterns. Recent results have achieved this for the specific case of prior expectations on the row and column marginals, based on the Maximum Entropy principle [2,9]. In the current paper, we extend these ideas to make them applicable to more general prior information, such as knowledge of frequencies of itemsets, a cluster structure in the data, or the presence of dense areas in the database. As in [2,9], we show how information theory can be used to quantify subjective interestingness against this model, in particular the subjective interestingness of tile patterns [3]. Our method presents an efficient, flexible, and rigorous alternative to the randomization approach presented in [5]. We demonstrate our method by searching for interesting patterns in real-life data with respect to various realistic types of prior information.
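The row and column marginal case mentioned above can be illustrated with a small sketch. It is an assumption-laden toy, not the authors' implementation: it covers only the marginal-based prior of [2,9], not the more general priors this paper introduces. Under expected row and column sums, the Maximum Entropy distribution over a binary database factorizes into independent cells with P(D_ij = 1) = sigmoid(lam_i + mu_j). The information content of a tile is then the negative log-probability of all its cells being 1. The function names, the plain gradient fitting loop, and the stand-in description length (number of rows plus number of columns in the tile) are all hypothetical choices made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_maxent(row_sums, col_sums, iters=5000, lr=1.0):
    """Fit cell probabilities p[i][j] = sigmoid(lam[i] + mu[j]) whose
    expected row and column sums match the given targets.

    Plain gradient ascent on the dual; assumes every target lies strictly
    between 0 and the row/column length, otherwise the parameters diverge.
    """
    n, m = len(row_sums), len(col_sums)
    lam = [0.0] * n
    mu = [0.0] * m
    for _ in range(iters):
        p = [[sigmoid(lam[i] + mu[j]) for j in range(m)] for i in range(n)]
        # Gradient for each parameter: target sum minus current expected sum.
        for i in range(n):
            lam[i] += lr * (row_sums[i] - sum(p[i])) / m
        for j in range(m):
            mu[j] += lr * (col_sums[j] - sum(p[i][j] for i in range(n))) / n
    return [[sigmoid(lam[i] + mu[j]) for j in range(m)] for i in range(n)]


def tile_information(p, rows, cols):
    """Self-information (in bits) of observing all ones in the tile rows x cols."""
    return sum(-math.log2(p[i][j]) for i in rows for j in cols)


def interestingness(p, rows, cols):
    """Information content per unit of description length.

    The description length used here (|rows| + |cols|) is an illustrative
    stand-in, not the encoding used in the paper.
    """
    return tile_information(p, rows, cols) / (len(rows) + len(cols))


# Toy binary database: rows are transactions, columns are items.
data = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
]
row_sums = [sum(row) for row in data]
col_sums = [sum(col) for col in zip(*data)]
p = fit_maxent(row_sums, col_sums)

# A tile in the dense corner is expected under the prior, so it carries
# little information; a one in the sparse corner is more surprising.
print("dense tile :", interestingness(p, [0, 1], [0, 1]))
print("sparse cell:", interestingness(p, [3], [3]))
```

The design point this sketch tries to convey is that the same tile can score differently depending on what the data miner already expects: cells that the prior makes likely contribute few bits, so tiles covering them are less interesting.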

References

  1. Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley-Interscience (2005)
  2. De Bie, T.: Maximum entropy models and subjective interestingness: an application to tiles in binary databases. In: Data Mining and Knowledge Discovery (2010)
  3. Geerts, F., Goethals, B., Mielikainen, T.: Tiling databases. In: Discovery Science (2004)
  4. Gionis, A., Mannila, H., Mielikainen, T., Tsaparas, P.: Assessing data mining results via swap randomization. ACM Transactions on Knowledge Discovery from Data (TKDD) 1(3) (2007)
  5. Hanhijarvi, S., Ojala, M., Vuokko, N., Puolamaki, K., Tatti, N., Mannila, H.: Tell me something I don’t know: Randomization strategies for iterative data mining. In: Proc. of the 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2009 (2009)
  6. Jaynes, E.T.: On the rationale of maximum-entropy methods. Proceedings of the IEEE 70 (1982)
  7. Khuller, S., Moss, A., Naor, J.: The budgeted maximum coverage problem. Information Processing Letters 70 (1999)
  8. Koller, D., Friedman, N.: Probabilistic Graphical Models, Principles and Techniques. MIT Press (2009)
  9. Kontonasios, K.-N., De Bie, T.: An information-theoretic approach to finding informative noisy tiles in binary databases. In: SDM, pp. 153–164. SIAM (2010)
  10. Kontonasios, K.-N., De Bie, T.: Formalizing complex prior information to quantify subjective interestingness of frequent pattern sets (supplementary document). Technical report, University of Bristol (2011), https://patterns.enm.bris.ac.uk/projects/Mining, subjectively interesting patterns using prior knowledge
  11. Silberschatz, A., Tuzhilin, A.: What makes patterns interesting in knowledge discovery systems. IEEE Trans. on Knowl. and Data Eng. 8(6), 970–974 (1996)
  12. Tatti, N., Mampaey, M.: Using background knowledge to rank itemsets. Data Min. Knowl. Discov. 21, 293–309 (2010)
  13. Wang, C., Han, J., Jia, Y., Tang, J., Zhang, D., Yu, Y.: Mining advisor-advisee relationships from research publication networks. In: KDD 2010 (2010)

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Kleanthis-Nikolaos Kontonasios (1)
  • Tijl De Bie (1)
  1. Intelligent Systems Laboratory, University of Bristol, Bristol, UK
