Discovering Descriptive Tile Trees

By Mining Optimal Geometric Subtiles
  • Nikolaj Tatti
  • Jilles Vreeken
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7523)

Abstract

When analysing binary data, the ease with which one can interpret results is very important. Many existing methods, however, either discover models that are difficult to read or return so many results that interpretation becomes impossible. Here, we study a fully automated approach for mining easily interpretable models for binary data. We model data hierarchically with noisy tiles, that is, rectangles whose density differs significantly from that of their parent tile. To identify good trees, we employ the Minimum Description Length (MDL) principle.

We propose Stijl, a greedy any-time algorithm for mining good tile trees from binary data. Iteratively, it finds the locally optimal addition to the current tree, allowing overlap with tiles of the same parent. A major result of this paper is that we find the optimal tile in only Θ(NM min(N,M)) time. Stijl can be employed either as a top-k miner or, by using MDL, to identify the tree that describes the data best.

Experiments show that we find succinct models that accurately summarise the data and that, owing to their hierarchical structure, are easily interpretable.
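
To make the idea of a noisy tile concrete, the following is a minimal sketch rather than the Stijl algorithm itself. It assumes a simplified score that counts only the bits needed to encode the data at its empirical density and ignores the cost of encoding the tile, which a full MDL score would include. It searches all rectangles by brute force, so it runs in O(N²M²) time and does not reproduce the Θ(NM min(N,M)) result stated above. All function names are hypothetical.

```python
from math import log2


def prefix_sums(data):
    """Summed-area table: S[i][j] = number of ones in the first i rows and j columns."""
    n, m = len(data), len(data[0])
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m):
            S[i + 1][j + 1] = data[i][j] + S[i][j + 1] + S[i + 1][j] - S[i][j]
    return S


def ones_in(S, r0, r1, c0, c1):
    """Number of ones in rows r0..r1-1 and columns c0..c1-1."""
    return S[r1][c1] - S[r0][c1] - S[r1][c0] + S[r0][c0]


def bernoulli_bits(ones, cells):
    """Bits to encode `cells` binary entries at their empirical density."""
    if cells == 0 or ones == 0 or ones == cells:
        return 0.0
    p = ones / cells
    return -(ones * log2(p) + (cells - ones) * log2(1 - p))


def best_subtile(data):
    """Brute-force search for the rectangle whose separate encoding saves most bits.

    Assumption: the model cost of the tile is ignored, so this is only
    the data-cost part of an MDL-style score.
    """
    n, m = len(data), len(data[0])
    S = prefix_sums(data)
    total_ones, total_cells = S[n][m], n * m
    base = bernoulli_bits(total_ones, total_cells)  # cost with a single density
    best_gain, best_tile = 0.0, None
    for r0 in range(n):
        for r1 in range(r0 + 1, n + 1):
            for c0 in range(m):
                for c1 in range(c0 + 1, m + 1):
                    cells = (r1 - r0) * (c1 - c0)
                    ones = ones_in(S, r0, r1, c0, c1)
                    # Encode the tile and the remainder with separate densities.
                    cost = bernoulli_bits(ones, cells) + bernoulli_bits(
                        total_ones - ones, total_cells - cells
                    )
                    if base - cost > best_gain:
                        best_gain, best_tile = base - cost, (r0, r1, c0, c1)
    return best_gain, best_tile
```

Calling `best_subtile` on a small 0/1 matrix returns the saving in bits and the half-open row and column bounds of the chosen rectangle. Applying the same search recursively inside a chosen rectangle would give a hierarchy in the spirit of a tile tree, although the stopping rule and overlap handling described in the abstract are not modelled here.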

Keywords

Minimum Description Length; Kolmogorov Complexity; Tile Tree; Frequent Pattern Mining; Minimum Description Length Principle
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Nikolaj Tatti (1)
  • Jilles Vreeken (1)
  1. Advanced Database Research and Modeling, Universiteit Antwerpen, Belgium
