Non-redundant Subgroup Discovery Using a Closure System

  • Mario Boley
  • Henrik Grosskreutz
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5781)

Abstract

Subgroup discovery is a local pattern discovery task in which descriptions of subpopulations of a database are evaluated against some quality function. As standard quality functions are functions of the described subpopulation, we propose to search for equivalence classes of descriptions with respect to their extension in the database rather than for individual descriptions. These equivalence classes have unique maximal representatives forming a closure system. We show that minimum cardinality representatives of each equivalence class can be found during the enumeration process of that closure system without additional cost, while finding a minimum representative of a single equivalence class is NP-hard. Using several real-world datasets, we demonstrate that search space and output are significantly reduced by considering equivalence classes instead of individual descriptions, and that the minimum representatives constitute a family of subgroup descriptions of the same or better expressive power than those generated by traditional methods.
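The notions in the abstract (extension, closure, equivalence class, minimum representative) can be illustrated on a toy database. The Python sketch below is not the authors' algorithm. It assumes that a description is a set of boolean items and that a record satisfies a description when it contains all of its items. Under that assumption, the maximal representative of a class is the intersection of all records the description covers. The helper names `extension`, `closure`, `closed_sets_with_min_generators` and `greedy_representative` are hypothetical. The enumeration is a naive breadth-first scheme: level k only builds descriptions of size k, so the first description that reaches a closed set is one of minimum cardinality. For a single class, a minimum representative is a set-cover instance (every record outside the class's extension must be excluded by some chosen item it lacks). The sketch therefore uses the standard greedy approximation there rather than an exact solver.

```python
from typing import Dict, FrozenSet, List

Item = str
Record = FrozenSet[Item]


def extension(desc: FrozenSet[Item], db: List[Record]) -> FrozenSet[int]:
    """Indices of the records that contain every item of the description."""
    return frozenset(i for i, rec in enumerate(db) if desc <= rec)


def closure(desc: FrozenSet[Item], db: List[Record]) -> FrozenSet[Item]:
    """Maximal description with the same extension as desc.

    Assumes desc covers at least one record: the closure is then the
    intersection of all covered records.
    """
    covered = [db[i] for i in extension(desc, db)]
    return frozenset.intersection(*covered)


def closed_sets_with_min_generators(
    db: List[Record], items: FrozenSet[Item]
) -> Dict[FrozenSet[Item], FrozenSet[Item]]:
    """Map each closed description with non-empty extension to one
    minimum-cardinality description of the same equivalence class.

    Breadth-first: level k only builds generators of size k, so the first
    generator that reaches a closed set is a smallest one.
    """
    bottom = closure(frozenset(), db)
    found = {bottom: frozenset()}
    level = [(bottom, frozenset())]
    while level:
        next_level = []
        for closed, gen in level:
            # Adding an item already in the closure would not change the class.
            for x in sorted(items - closed):
                cand = gen | {x}
                if not extension(cand, db):
                    continue  # skip descriptions that cover no record
                c = closure(cand, db)
                if c not in found:
                    found[c] = cand
                    next_level.append((c, cand))
        level = next_level
    return found


def greedy_representative(closed: FrozenSet[Item], db: List[Record]) -> FrozenSet[Item]:
    """Greedy set-cover approximation of a minimum description with the same
    extension as `closed`: every record outside the target extension must be
    excluded by a chosen item that the record lacks."""
    target = extension(closed, db)
    uncovered = {i for i in range(len(db)) if i not in target}
    desc = set()
    while uncovered:
        # Pick the item that excludes the most still-uncovered records.
        best = max(
            sorted(closed - desc),
            key=lambda x: sum(1 for i in uncovered if x not in db[i]),
        )
        desc.add(best)
        uncovered = {i for i in uncovered if best in db[i]}
    return frozenset(desc)


# Toy database (hypothetical): each record is the set of items it satisfies.
db = [frozenset(r) for r in ("abc", "ab", "ac", "bc", "abcd")]
items = frozenset("abcd")

found = closed_sets_with_min_generators(db, items)
for closed, gen in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(closed), "<-", sorted(gen))
print(sorted(greedy_representative(frozenset("abc"), db)))
```

On this toy data, the closed description {a, b, c, d} covers only the last record and is reached through the one-item generator {d}, whereas {a, b, c} needs all three of its items. The sketch recomputes extensions from scratch and is exponential in the worst case. It only illustrates the definitions and does not reproduce the cost guarantees stated in the abstract.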

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Mario Boley (1)
  • Henrik Grosskreutz (1)
  1. Fraunhofer IAIS, Sankt Augustin, Germany