Fast and Memory-Efficient Discovery of the Top-k Relevant Subgroups in a Reduced Candidate Space

  • Henrik Grosskreutz
  • Daniel Paurat
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6911)

Abstract

We consider a modified version of the top-k subgroup discovery task, where subgroups dominated by other subgroups are discarded. The advantage of this modified task, known as relevant subgroup discovery, is that it avoids redundancy in the outcome. Although it has been used in many applications, no efficient exact algorithm for this task has been proposed so far. Most existing solutions do not guarantee an exact solution because they use non-admissible heuristics, while the only exact solution relies on explicitly storing the whole search space, which results in prohibitively large memory requirements.

In this paper, we present a new top-k relevant subgroup discovery algorithm which overcomes these shortcomings. Our solution is based on the fact that if an iterative deepening approach is applied, the relevance check – which is the root of the problems of all other approaches – can be realized based solely on the best k subgroups visited so far. The approach also allows for the integration of admissible pruning techniques like optimistic estimate pruning. The result is a fast, memory-efficient algorithm which clearly outperforms existing top-k relevant subgroup discovery approaches. Moreover, we analytically and empirically show that it is competitive with simpler approaches which do not consider the relevance criterion.
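
The ingredients named above (depth-limited passes of increasing length, a relevance check against only the k best subgroups kept so far, and optimistic estimate pruning) can be illustrated with a minimal Python sketch. It is not the authors' algorithm and carries no exactness guarantee. Everything in it is an assumption made for illustration: the item-set data layout, the function names, the toy data, the choice of weighted relative accuracy (WRAcc) as quality function with its standard optimistic estimate, the usual definition of domination (a dominates b if a covers every positive example b covers and no negative example that b does not), the restriction to subgroups of positive quality, and the eviction of stored entries that a newcomer dominates.

```python
def wracc(p, n, P, N):
    """WRAcc of a subgroup covering p positives and n negatives (P positives among N rows)."""
    size = p + n
    return 0.0 if size == 0 else (size / N) * (p / size - P / N)


def optimistic(p, P, N):
    """Upper bound on the WRAcc of any refinement: keep all p positives, drop every negative."""
    return (p / N) * (1 - P / N)


def dominates(a, b):
    """Assumed definition: a covers all positives of b and no negatives beyond those of b."""
    return b["pos"] <= a["pos"] and a["neg"] <= b["neg"]


def top_k_relevant(rows, k, max_len):
    """rows: list of (frozenset_of_items, is_positive). Returns at most k result entries."""
    N = len(rows)
    P = sum(1 for _, label in rows if label)
    items = sorted(set().union(*(r[0] for r in rows)))
    all_pos = frozenset(i for i, (_, label) in enumerate(rows) if label)
    all_neg = frozenset(range(N)) - all_pos
    best = []  # the only stored state: at most k entries, best quality first

    def threshold():
        # Quality a candidate must exceed; 0.0 (assumption) while the list is not full.
        return best[-1]["q"] if len(best) == k else 0.0

    def consider(desc, pos, neg):
        cand = {"desc": desc, "pos": pos, "neg": neg,
                "q": wracc(len(pos), len(neg), P, N)}
        # Relevance check against the current result list only.
        if cand["q"] <= threshold() or any(dominates(b, cand) for b in best):
            return
        # Assumption of this sketch: drop stored entries the newcomer dominates.
        best[:] = [b for b in best if not dominates(cand, b)]
        best.append(cand)
        best.sort(key=lambda b: -b["q"])
        del best[k:]

    def dfs(desc, pos, neg, start, limit):
        if len(desc) == limit:
            consider(desc, pos, neg)
            return
        for j in range(start, len(items)):
            item = items[j]
            new_pos = frozenset(i for i in pos if item in rows[i][0])
            # Optimistic estimate pruning: no refinement can beat the threshold.
            if optimistic(len(new_pos), P, N) <= threshold():
                continue
            new_neg = frozenset(i for i in neg if item in rows[i][0])
            dfs(desc + (item,), new_pos, new_neg, j + 1, limit)

    # Iterative deepening: one depth-first pass per description length.
    for limit in range(1, max_len + 1):
        dfs((), all_pos, all_neg, 0, limit)
    return best


# Made-up toy data: (items, is_positive).
DEMO_ROWS = [
    (frozenset("ab"), True), (frozenset("ab"), True), (frozenset("a"), True),
    (frozenset("ac"), False), (frozenset("c"), False), (frozenset("bc"), False),
]
```

Calling `top_k_relevant(DEMO_ROWS, k=2, max_len=2)` runs the sketch on the toy data. Apart from the recursion stack, the search stores only the k result entries, which is the point of checking relevance against the result list rather than against a stored search space.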

Keywords

Quality Function; Closure Operator; Closed Subgroup; Negative Support; Subgroup Discovery
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Henrik Grosskreutz (1)
  • Daniel Paurat (1)
  1. Schloss Birlinghoven, Fraunhofer IAIS, St. Augustin, Germany