Tight Optimistic Estimates for Fast Subgroup Discovery

  • Henrik Grosskreutz
  • Stefan Rüping
  • Stefan Wrobel
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5211)

Abstract

Subgroup discovery is the task of finding subgroups of a population which exhibit both distributional unusualness and high generality. Due to the non-monotonicity of the corresponding evaluation functions, standard pruning techniques cannot be used for subgroup discovery, requiring the use of optimistic estimate techniques instead. So far, however, optimistic estimate pruning has only been considered for the extremely simple case of a binary target attribute, and no attempt has been made to move beyond suboptimal heuristic optimistic estimates. In this paper, we show that optimistic estimate pruning can be developed into a sound and highly effective pruning approach for subgroup discovery. Based on a precise definition of optimality, we show that previous estimates have been tight only in special cases. Thereafter, we present tight optimistic estimates for the most popular binary and multi-class quality functions, and present a family of increasingly efficient approximations to these optimal functions. As we show in empirical experiments, the use of our newly proposed optimistic estimates can lead to a speedup of an order of magnitude compared to previous approaches.
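
To make the idea of optimistic estimate pruning concrete, the following is a minimal sketch and not the authors' implementation. It assumes a binary target and the Piatetsky-Shapiro quality n(p - p0), where n is the subgroup size, p its share of positives, and p0 the share of positives in the whole population; the abstract does not name a quality function, so this choice is an assumption. Writing the quality as pos - n * p0, any subset of the covered records that keeps k of the positives has quality at most k * (1 - p0), so pos * (1 - p0) bounds the quality of every refinement. The function names and the toy data are hypothetical.

```python
def ps_quality(n, pos, p0):
    """Piatetsky-Shapiro quality n * (p - p0), written as pos - n * p0."""
    return pos - n * p0


def ps_optimistic_estimate(pos, p0):
    """Upper bound on the quality of any subset of the covered records.

    The best conceivable subset keeps all positives and drops all negatives.
    """
    return pos * (1 - p0)


def best_subgroup(rows, labels, use_pruning=True):
    """Depth-first search over conjunctions of items.

    rows   -- list of sets of items (hypothetical attribute=value tokens)
    labels -- list of booleans (binary target)
    Returns ((best quality, best description), number of evaluated subgroups).
    """
    p0 = sum(labels) / len(labels)
    items = sorted(set().union(*rows))
    best = (0.0, frozenset())
    visited = 0

    def search(description, covered, start):
        nonlocal best, visited
        for i in range(start, len(items)):
            new_covered = [r for r in covered if items[i] in rows[r]]
            if not new_covered:
                continue
            visited += 1
            pos = sum(labels[r] for r in new_covered)
            refined = description | {items[i]}
            quality = ps_quality(len(new_covered), pos, p0)
            if quality > best[0]:
                best = (quality, refined)
            # Prune: no refinement of this subgroup can beat the current best.
            if use_pruning and ps_optimistic_estimate(pos, p0) <= best[0]:
                continue
            search(refined, new_covered, i + 1)

    search(frozenset(), list(range(len(rows))), 0)
    return best, visited


# Toy data, invented for illustration only.
rows = [{"a", "b"}, {"a", "b"}, {"a"}, {"b", "c"},
        {"c"}, {"a", "c"}, {"b"}, {"c"}]
labels = [True, True, True, False, False, False, False, False]

pruned, visited_pruned = best_subgroup(rows, labels, use_pruning=True)
exhaustive, visited_all = best_subgroup(rows, labels, use_pruning=False)
print(pruned, visited_pruned, visited_all)
```

Because the estimate never underestimates the quality reachable below a node, the pruned search returns the same best quality as the exhaustive one while evaluating fewer candidates. The tighter the estimate, the earlier branches can be discarded.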

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Henrik Grosskreutz (1)
  • Stefan Rüping (1)
  • Stefan Wrobel (1)
  1. Fraunhofer IAIS, Schloss Birlinghoven, St. Augustin, Germany
