On subgroup discovery in numerical domains

Grosskreutz, Henrik; Rüping, Stefan

doi:10.1007/s10618-009-0136-3

Published: 22 July 2009

Volume 19, pages 210–226 (2009)

Data Mining and Knowledge Discovery

Henrik Grosskreutz¹ &
Stefan Rüping¹

Abstract

Subgroup discovery is a knowledge discovery task that aims at finding subgroups of a population with high generality and distributional unusualness. While several subgroup discovery algorithms have been presented in the past, they focus on databases with nominal attributes or rely on discretization to eliminate the numerical attributes. In this paper, we illustrate why replacing numerical attributes with nominal attributes can lead to suboptimal results. Thereafter, we present a new subgroup discovery algorithm that prunes large parts of the search space by exploiting bounds between related numerical subgroup descriptions. The same algorithm can also be applied to ordinal attributes. In an experimental section, we show that our new pruning scheme yields a large performance gain when more than just a few split-points are considered for the numerical attributes.
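The claim that discretization can lead to suboptimal subgroups can be illustrated with a small sketch. It is not the algorithm of the paper, whose pruning bounds are not given in this preview. It assumes weighted relative accuracy (WRAcc) as the quality function, which is a common choice in subgroup discovery but is not named in the abstract, and it uses a hypothetical toy data set and hypothetical helper names (`wracc`, `best_interval`). The search is a plain exhaustive scan over all intervals formed by pairs of split-points, with no pruning.

```python
from itertools import combinations

def wracc(covered, labels):
    """Assumed quality function: coverage * (positive rate in subgroup - overall positive rate)."""
    n = len(labels)
    n_sub = sum(covered)
    if n_sub == 0:
        return 0.0
    p_all = sum(labels) / n
    p_sub = sum(l for c, l in zip(covered, labels) if c) / n_sub
    return (n_sub / n) * (p_sub - p_all)

def best_interval(values, labels, split_points):
    """Score every half-open interval [lo, hi) built from two split-points; no pruning."""
    best_q, best_iv = float("-inf"), None
    for lo, hi in combinations(sorted(split_points), 2):
        covered = [lo <= v < hi for v in values]
        q = wracc(covered, labels)
        if q > best_q:
            best_q, best_iv = q, (lo, hi)
    return best_q, best_iv

# Hypothetical toy data: one numerical attribute, positives only for values 3..6.
values = list(range(10))
labels = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]

# Coarse discretization into two bins versus one split-point per distinct value.
coarse_q, coarse_iv = best_interval(values, labels, [0, 5, 10])
fine_q, fine_iv = best_interval(values, labels, list(range(11)))

print("coarse:", coarse_iv, round(coarse_q, 3))
print("fine:  ", fine_iv, round(fine_q, 3))
```

With two bins, no candidate interval separates the positives, so every subgroup scores zero. With one split-point per value, the interval [3, 7) covers exactly the positives. The price is that the number of candidate intervals grows quadratically with the number of split-points, which is the setting in which the abstract reports that pruning pays off.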

Author information

Authors and Affiliations

Fraunhofer IAIS, Schloss Birlinghoven, Sankt Augustin, Germany
Henrik Grosskreutz & Stefan Rüping

Corresponding author

Correspondence to Henrik Grosskreutz.

Additional information

Responsible editor: Aleksander Kołcz, Wray Buntine, Marko Grobelnik, Dunja Mladenic, and John Shawe-Taylor.

About this article

Cite this article

Grosskreutz, H., Rüping, S. On subgroup discovery in numerical domains. Data Min Knowl Disc 19, 210–226 (2009). https://doi.org/10.1007/s10618-009-0136-3

Received: 10 June 2009
Accepted: 20 June 2009
Published: 22 July 2009
Issue Date: October 2009
DOI: https://doi.org/10.1007/s10618-009-0136-3
