Data Mining and Knowledge Discovery, Volume 19, Issue 2, pp 210–226

On subgroup discovery in numerical domains

Article

Abstract

Subgroup discovery is a knowledge discovery task that aims at finding subgroups of a population with high generality and distributional unusualness. While several subgroup discovery algorithms have been presented in the past, they focus on databases with nominal attributes or use discretization to eliminate the numerical attributes. In this paper, we illustrate why replacing numerical attributes with nominal attributes can lead to suboptimal results. Thereafter, we present a new subgroup discovery algorithm that prunes large parts of the search space by exploiting bounds between related numerical subgroup descriptions. The same algorithm can also be applied to ordinal attributes. In an experimental section, we show that our new pruning scheme yields a large performance gain when more than just a few split-points are considered for the numerical attributes.
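The claim that discretization can lead to suboptimal results can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the toy data, the helper names `quality` and `best_interval`, and the quality function n · (p − p0), a common Piatetsky-Shapiro style measure that the abstract does not name. The sketch uses plain exhaustive search and does not reproduce the paper's pruning scheme.

```python
# Toy data (hypothetical): (numeric attribute value, binary target label).
# The target is positive exactly for values 4..7.
rows = [(x, 1 if 4 <= x <= 7 else 0) for x in range(1, 13)]


def quality(rows, lo, hi):
    """Assumed quality n * (p - p0) of the subgroup lo <= x <= hi.

    n  = number of covered rows (generality),
    p  = target share inside the subgroup,
    p0 = target share in the whole population (unusualness is p - p0).
    """
    covered = [label for x, label in rows if lo <= x <= hi]
    if not covered:
        return 0.0
    p0 = sum(label for _, label in rows) / len(rows)
    p = sum(covered) / len(covered)
    return len(covered) * (p - p0)


def best_interval(rows, split_points):
    """Exhaustively score every interval with endpoints in split_points."""
    pts = sorted(split_points)
    candidates = [(lo, hi) for i, lo in enumerate(pts) for hi in pts[i:]]
    return max(candidates, key=lambda iv: quality(rows, *iv))


# Discretization: four equal-width bins, each treated as one nominal value,
# so only a single whole bin can describe a subgroup.
bins = [(1, 3), (4, 6), (7, 9), (10, 12)]
best_bin = max(bins, key=lambda iv: quality(rows, *iv))

# Numerical search: every observed value is allowed as a split point.
best_numeric = best_interval(rows, [x for x, _ in rows])
```

On this toy data the best single bin covers only part of the positive region, while the search over all split points recovers the interval 4..7 with a strictly higher quality. The number of candidate intervals grows quadratically with the number of split points, which is the cost that a pruning scheme would aim to reduce.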

Keywords

Pattern mining, Subgroup discovery, Performance, Pruning



Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Fraunhofer IAIS, Schloss Birlinghoven, Sankt Augustin, Germany
