Fast exhaustive subgroup discovery with numerical target concepts

Published in: Data Mining and Knowledge Discovery

Abstract

Subgroup discovery is a key data mining method that aims at identifying descriptions of subsets of the data that show an interesting distribution with respect to a pre-defined target concept. For practical applications, the integration of numerical data is crucial. Therefore, a wide variety of interestingness measures has been proposed in the literature that use a numerical attribute as the target concept. However, efficient mining in this setting is still an open issue. In this paper, we present novel techniques for fast exhaustive subgroup discovery with a numerical target concept. We initially survey previously proposed measures in this setting. Then, we explore options for pruning the search space using optimistic estimate bounds. Specifically, we introduce novel bounds in closed form as well as ordering-based bounds, a new technique for deriving estimates for several types of interestingness measures with no previously known bounds. In addition, we investigate efficient data structures, namely adapted FP-trees and bitset-based data representations, and discuss their interdependencies with interestingness measures and pruning schemes. The presented techniques are incorporated into two novel algorithms. Finally, the benefits of the proposed pruning bounds and algorithms are assessed and compared in an extensive experimental evaluation on 24 publicly available datasets. The novel algorithms reduce runtimes consistently by more than one order of magnitude.
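The pruning scheme at the heart of these techniques can be illustrated with a minimal sketch. The code below is not from the paper — it omits the FP-tree and bitset data structures entirely, and the naive depth-first enumeration and all names are illustrative. It uses the mean-based measure with \(a=1\), for which summing only the positive deviations from the global mean yields a valid optimistic estimate: no refinement of a pattern can score higher.

```python
def quality(values, mu0):
    """Mean-based quality for a = 1: q(P) = i_P * (mean_P - mu0) = sum(T(c) - mu0)."""
    return sum(v - mu0 for v in values)

def optimistic_estimate(values, mu0):
    """Upper bound on the quality of every subset of the covered instances:
    keep only the instances whose target value exceeds the global mean."""
    return sum(v - mu0 for v in values if v > mu0)

def subgroup_discovery(data, target, selectors, max_depth=2):
    """Exhaustive depth-first search over conjunctions of selectors.
    data: list of dicts, target: name of the numeric target attribute,
    selectors: (attribute, value) pairs. Branches whose optimistic
    estimate cannot beat the incumbent best quality are pruned."""
    mu0 = sum(r[target] for r in data) / len(data)
    best = (float("-inf"), None)

    def refine(pattern, covered, start):
        nonlocal best
        q = quality([r[target] for r in covered], mu0)
        if q > best[0]:
            best = (q, pattern)
        if len(pattern) >= max_depth:
            return
        for i in range(start, len(selectors)):
            attr, val = selectors[i]
            sub = [r for r in covered if r[attr] == val]
            if not sub:
                continue
            if optimistic_estimate([r[target] for r in sub], mu0) <= best[0]:
                continue  # prune: no refinement of this branch can beat the incumbent
            refine(pattern + [(attr, val)], sub, i + 1)

    refine([], data, 0)
    return best

data = [{"x": "a", "t": 5}, {"x": "a", "t": 7},
        {"x": "b", "t": 1}, {"x": "b", "t": 3}]
best_q, best_pattern = subgroup_discovery(data, "t", [("x", "a"), ("x", "b")])
```

This bound is tight for \(a=1\): it would be attained by a refinement covering exactly the instances with above-average target values.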

Notes

  1. Available at www.vikamine.org.

References

  • Alcala-Fernandez J, Fernandez A, Luengo J, Derrac J, Garcia S, Sanchez L, Herrera F (2011) KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J Mult Valued Logic Soft Comput 17(2–3):255–287

  • Atzmueller M (2015) Subgroup discovery—advanced review. WIREs Data Mining Knowl Discov 5(1):35–49

  • Atzmueller M, Lemmerich F (2009) Fast subgroup discovery for continuous target concepts. In: Proceedings of the 18th international symposium on foundations of intelligent systems (ISMIS), p 35–44

  • Atzmueller M, Lemmerich F (2012) VIKAMINE—Open-source subgroup discovery, pattern mining, and analytics. In: Proceedings of the European conference on machine learning and knowledge discovery in databases (ECML/PKDD), p 842–845

  • Atzmueller M, Lemmerich F (2013) Exploratory pattern mining on social media using geo-references and social tagging information. Int J Web Sci 2(1–2):80–112

  • Atzmueller M, Lemmerich F, Krause B, Hotho A (2009) Who are the spammers? Understandable local patterns for concept description. In: Proceedings of the 7th conference on computer methods and systems

  • Atzmueller M, Mueller J, Becker M (2015) Exploratory subgroup analytics on ubiquitous data. In: Atzmueller M, Chin A, Scholz C, Trattner C (eds) Mining, modeling and recommending 'things' in social media, p 1–20. Springer

  • Atzmueller M, Puppe F (2006) SD-Map—a fast algorithm for exhaustive subgroup discovery. In: Proceedings of the 10th European conference on principles and practice of knowledge discovery in databases (PKDD), p 6–17

  • Atzmueller M, Puppe F (2009) A knowledge-intensive approach for semi-automatic causal subgroup discovery. In: Berendt B et al (eds) Knowledge discovery enhanced with semantic and social information, vol 220. Springer, Berlin, pp 19–36

  • Aumann Y, Lindell Y (1999) A statistical theory for quantitative association rules. In: Proceedings of the 5th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), p 261–270

  • Aumann Y, Lindell Y (2003) A statistical theory for quantitative association rules. J Intell Inf Syst 20(3):255–283

  • Batal I, Hauskrecht M (2010) A concise representation of association rules using minimal predictive rules. In: Proceedings of the 2010 European conference on machine learning and knowledge discovery in databases (ECML/PKDD), p 87–102

  • Bay SD, Pazzani MJ (2001) Detecting group differences: mining contrast sets. Data Min Knowl Discov 5(3):213–246

  • Bayardo RJ (1998) Efficiently mining long patterns from databases. In: Proceedings of the 1998 ACM SIGMOD international conference on management of data, p 85–93

  • Bayardo RJ, Agrawal R, Gunopulos D (1999) Constraint-based rule mining in large, dense databases. Data Min Knowl Discov 4(2–3):217–240

  • Box GEP (1953) Non-normality and tests on variances. Biometrika 40:318–335

  • Breiman L, Friedman JH, Stone CJ, Olshen RA (1984) Classification and regression trees. Chapman & Hall, Boca Raton

  • Brin S, Rastogi R, Shim K (2003) Mining optimized gain rules for numeric attributes. IEEE Trans Knowl Data Eng 15(2):324–338

  • Cheng H, Yan X, Han J, Yu PS (2008) Direct discriminative pattern mining for effective classification. In: Proceedings of the 24th international conference on data engineering (ICDE), p 169–178

  • Dong G, Li J (1999) Efficient mining of emerging patterns: discovering trends and differences. In: Proceedings of the 5th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), p 43–52

  • Duivesteijn W, Knobbe AJ, Feelders A, van Leeuwen M (2010) Subgroup discovery meets bayesian networks—an exceptional model mining approach. In: Proceedings of the 10th international conference on data mining (ICDM), p 158–167

  • El-Qawasmeh E (2003) Beating the popcount. Int J Inf Technol 9(1):1–18

  • Fawcett T (2006) An introduction to ROC analysis. Pattern Recognit Lett 27(8):861–874

  • Fayyad UM, Irani KB (1993) Multi-interval discretization of continuous-valued attributes for classification learning. In: Proceedings of the 13th international joint conference on artificial intelligence (IJCAI), p 1022–1027

  • Freidlin B, Gastwirth JL (2000) Should the median test be retired from general use? Am Stat 54(3):161–164

  • Fukuda T, Morimoto Y, Morishita S, Tokuyama T (1996) Mining optimized association rules for numeric attributes. In: Proceedings of the 15th ACM symposium on principles of database systems (PODS), p 182–191

  • García S, Luengo J, Saez JA, Lopez V, Herrera F (2013) A survey of discretization techniques: taxonomy and empirical analysis in supervised learning. IEEE Trans Knowl Data Eng 25(4):734–750

  • Geng L, Hamilton HJ (2006) Interestingness measures for data mining: a survey. ACM Comput Surv 38(3):9

  • Grosskreutz H (2008) Cascaded subgroups discovery with an application to regression. In: From local patterns to global models, workshop at the ECML/PKDD, p 275–286

  • Grosskreutz H, Rüping S (2009) On subgroup discovery in numerical domains. Data Min Knowl Discov 19(2):210–226

  • Grosskreutz H, Rüping S, Wrobel S (2008) Tight optimistic estimates for fast subgroup discovery. In: Proceedings of the 2008 European conference on machine learning and knowledge discovery in databases (ECML/PKDD), p 440–456

  • Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. ACM SIGMOD Rec 29(2):1–12

  • Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min Knowl Discov 8(1):53–87

  • Hart PE, Nilsson NJ, Raphael B (1968) A formal basis for the heuristic determination of minimum cost paths. IEEE Trans Syst Sci Cybernet 4(2):100–107

  • Jorge AM, Azevedo PJ, Pereira F (2006) Distribution rules with numeric attributes of interest. In: Proceedings of the 10th European conference on principles and practice of knowledge discovery in databases (PKDD), p 247–258

  • Kavšek B, Lavrač N (2006) Apriori-SD: adapting association rule learning to subgroup discovery. Appl Artif Intell 20:543–583

  • Klösgen W (1994) Exploration of simulation experiments by discovery. Technical Report WS-04-03

  • Klösgen W (1995) Efficient discovery of interesting statements in databases. J Intell Inf Syst 4(1):53–69

  • Klösgen W (1996) Explora: a multipattern and multistrategy discovery assistant. In: Fayyad UM, Piatetsky-Shapiro G, Smyth P, Uthurusamy R (eds) Advances in knowledge discovery and data mining. MIT Press, Cambridge, pp 249–271

  • Klösgen W (2002) Data mining tasks and methods: subgroup discovery: deviation analysis. In: Klösgen W, Zytkow JM (eds) Handbook of data mining and knowledge discovery, p 354–361

  • Klösgen W, May M (2002) Census data mining—an application. In: Proceedings of the 6th European conference on principles and practice of knowledge discovery in databases (PKDD)

  • Kotsiantis S, Kanellopoulos D (2006) Discretization techniques: a recent survey. GESTS Int Trans Comput Sci Eng 32(1):47–58

  • Kralj Novak P, Lavrač N, Webb GI (2009) Supervised descriptive rule discovery: a unifying survey of contrast set, emerging pattern and subgroup mining. J Mach Learn Res 10:377–403

  • Lavrač N, Kavšek B, Flach PA, Todorovski L (2004) Subgroup discovery with CN2-SD. J Mach Learn Res 5:153–188

  • Leman D, Feelders A, Knobbe AJ (2008) Exceptional model mining. In: Proceedings of the European conference on machine learning and knowledge discovery in databases (ECML/PKDD), p 1–16

  • Lemmerich F (2014) Novel techniques for efficient and effective subgroup discovery. PhD thesis, Universität Würzburg

  • Lemmerich F, Atzmueller M (2012) Describing locations using tags and images: explorative pattern mining in social media. In: Revised selected papers from the workshops on modeling and mining ubiquitous social media, p 77–96

  • Lemmerich F, Becker M, Atzmueller M (2012) Generic pattern trees for exhaustive exceptional model mining. In: Proceedings of the European conference on machine learning and knowledge discovery in databases (ECML/PKDD), p 277–292

  • Lemmerich F, Becker M, Puppe F (2013) Difference-based estimates for generalization-aware subgroup discovery. In: Proceedings of the European conference on machine learning and knowledge discovery in databases (ECML/PKDD), p 288–303

  • Lemmerich F, Puppe F (2011) Local models for expectation-driven subgroup discovery. In: Proceedings of the 11th international conference on data mining (ICDM), p 360–369

  • Lemmerich F, Rohlfs M, Atzmueller M (2010) Fast discovery of relevant subgroup patterns. In: Proceedings of the 23rd Florida artificial intelligence research society conference (FLAIRS), p 428–433

  • Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml

  • Lucas JP, Jorge AM, Pereira F, Pernas AM, Machado AA (2007) A tool for interactive subgroup discovery using distribution rules. In: Proceedings of the artificial intelligence 13th Portuguese conference on progress in artificial intelligence (EPIA), p 426–436

  • Mampaey M, Nijssen S, Feelders A, Knobbe AJ (2012) Efficient algorithms for finding richer subgroup descriptions in numeric and nominal data. In: Proceedings of the 12th international conference on data mining (ICDM), p 499–508

  • Moreland K, Truemper K (2009) Discretization of target attributes for subgroup discovery. In: Proceedings of the 6th international conference on machine learning and data mining in pattern recognition (MLDM), p 44–52

  • Morishita S (1998) On classification and regression. In: Proceedings of the first international conference on discovery science, p 40–57

  • Morishita S, Sese J (2000) Traversing itemset lattices with statistical metric pruning. In: Proceedings of the 19th ACM symposium on principles of database systems (PODS), p 226–236

  • Pieters BFI (2010) Subgroup discovery on numeric and ordinal targets, with an application to biological data aggregation. Technical report, Universiteit Utrecht

  • Pieters BFI, Knobbe AJ, Džeroski S (2010) Subgroup discovery in ranked data, with an application to gene set enrichment. In: Preference learning, workshop at the ECML/PKDD, vol. 10, p 1–18

  • Rastogi R, Shim K (2002) Mining optimized association rules with categorical and numeric attributes. IEEE Trans Knowl Data Eng 14(1):29–50

  • Webb GI (1995) OPUS: an efficient admissible algorithm for unordered search. J Artif Intell Res 3(1):431–465

  • Webb GI (2001) Discovering associations with numeric variables. In: Proceedings of the 7th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), p 383–388

  • Wrobel S (1997) An algorithm for multi-relational discovery of subgroups. In: Proceedings of the 1st European symposium on principles of data mining and knowledge discovery (PKDD), p 78–87

  • Zaki MJ (2000) Scalable algorithms for association mining. IEEE Trans Knowl Data Eng 12(3):372–390

  • Zimmermann A, De Raedt L (2009) Cluster-grouping: from subgroup discovery to clustering. Mach Learn 77(1):125–159

Acknowledgments

This work has been partially supported by the VENUS research cluster at the interdisciplinary Research Center for Information System Design (ITeG) at Kassel University.

Author information

Correspondence to Florian Lemmerich.

Additional information

Responsible editor: M.J. Zaki.

This paper summarizes and extends contents of the dissertation of the first author (Lemmerich 2014). A small part of this work, namely the SD-Map* algorithm restricted to mean-based interestingness measures, has been described in a previous publication (Atzmueller and Lemmerich 2009).

Appendix

Lemma 1

Using the notations of Theorem 9, the function \(f^a (x) = (n+x)^a \cdot \left( \frac{\sigma + x \cdot \theta }{n+x}-\mu _\emptyset \right) \) has no local maxima inside its domain of definition; in particular,

$$\begin{aligned} f^a(x) \le \max \left( f^a(0), f^a(x_{max})\right) \end{aligned}$$

Proof

We distinguish three cases by the parameter a of the applied generic mean interestingness measure:

first, for \(a = 1\), it holds that

$$\begin{aligned} f^1 (x)&= (n+x)^1 \cdot \left( \frac{\sigma + x \cdot \theta }{n+x}-\mu _\emptyset \right) \\&= \sigma + \theta x - \mu _\emptyset n - \mu _\emptyset x \\&= \left( \theta - \mu _\emptyset \right) \cdot x + \sigma - \mu _\emptyset n \end{aligned}$$

As this is a linear function in x, the function \(f^1(x)\) is strictly increasing for \(\theta > \mu _\emptyset \) and strictly decreasing otherwise. Thus, the lemma holds for \(a=1\).

Second, we consider the case \((a \ne 1) \wedge (\sigma = \theta n)\), that is, the first n instances all had the same target value. In this case, the function \(f^a(x)\) is given by \(f^a(x) = (n+x)^a (\theta -\mu _\emptyset )\). This is strictly monotone since \(n > 0, x > 0\). Thus, again \(f^a(x)\) has no local maximum.

Third, the case \((a \ne 1) \wedge (\sigma \ne \theta n)\) is considered in detail: since \(\sigma \) was computed as a sum of n values that are each at least as large as \(\theta \), it follows that \(\theta \cdot n < \sigma \). In the following, the extrema of \(f^a(x)\) are determined by differentiating this function twice.

$$\begin{aligned} f^a\,'(x)&= \frac{d}{d x} f^a(x) = \frac{d}{d x} \left( (n+x)^a \cdot \left( \frac{\sigma + x \cdot \theta }{n+x}-\mu _\emptyset \right) \right) \\&= (n+x)^a \left( \frac{d}{dx} \left( \frac{\theta x+\sigma }{n+x}-\mu _\emptyset \right) \right) + \left( \frac{\theta x +\sigma }{n+x}-\mu _\emptyset \right) \cdot \left( \frac{d}{dx}(n+x)^a\right) \\&= (n+x)^a \left( \frac{d}{dx} \left( \frac{\theta x+\sigma }{n+x} \right) \right) + \left( \frac{\theta x+\sigma }{n+x}-\mu _\emptyset \right) \cdot a (n+x)^{a-1} \\&= (n+x)^a \left( \frac{\theta }{n+x} - \frac{\theta x+\sigma }{(n+x)^2}\right) + \left( \frac{\theta x+\sigma }{n+x}-\mu _\emptyset \right) \cdot a(n+x)^{a-1} \\&= (n+x)^{a-2} \left( \theta (n+x)- (\theta x+\sigma ) + a\left( \theta x+\sigma - \mu _\emptyset (n+x)\right) \right) \\&= (n+x)^{a-2} \left( \theta n - \sigma + a\theta x+ a\sigma - a \mu _\emptyset n - a \mu _\emptyset x\right) \\&= (n+x)^{a-2} \left( x (a\theta -a \mu _\emptyset ) + a\sigma -a \mu _\emptyset n + \theta n - \sigma \right) \end{aligned}$$

In line 2, the product rule is used. In line 3, the chain rule is applied to \((n+x)^a\); \(\mu _\emptyset \) can be omitted inside the remaining derivative, as it is constant with respect to x. In line 4, the quotient rule is used. Finally, in line 5, \((n+x)^{a-2}\) is factored out.

Since \(x > 0, n > 0\) by definition, the first factor is positive for any valid x. For \(a = 0\) or \(\theta = \mu _\emptyset \), the second factor is independent of x, so it has no root; thus \(f^a(x)\) has no maxima except at the boundaries of its domain in this case. Otherwise, the root of this factor, and therefore the only candidate for a maximum of \(f^a(x)\), is given at the point

$$\begin{aligned} x^* = \frac{-a \sigma + an \mu _\emptyset - \theta n + \sigma }{a(\theta - \mu _\emptyset )}. \end{aligned}$$

In the following, it is shown that \(x^*\) cannot be a maximum in our setting. For that purpose, the second derivative of \(f^a(x)\) is evaluated at the point \(x^*\):

$$\begin{aligned} f^a\,''(x)&= \frac{d}{d x} f^a\,'(x) \\&= (n+x)^{a-3}(a-2)\left( x(a\theta -a \mu _\emptyset ) + a \sigma -an \mu _\emptyset +\theta n-\sigma \right) + (a\theta -a \mu _\emptyset )(n+x)^{a-2} \\&= (n+x)^{a-3} \left( (a-2) \left( x(a\theta -a \mu _\emptyset ) + a \sigma -an \mu _\emptyset +\theta n-\sigma \right) + (a\theta -a \mu _\emptyset ) (n+x)\right) \\&= (n+x)^{a-3} \big ( a^2x\theta -a^2x \mu _\emptyset + a^2\sigma -a^2 \mu _\emptyset n + a\theta n-a\sigma - 2ax\theta \\&\quad + 2ax \mu _\emptyset -2a\sigma + 2an \mu _\emptyset - 2\theta n + 2\sigma + a\theta n - an \mu _\emptyset + a\theta x - ax \mu _\emptyset \big ) \\&= (n+x)^{a-3} (a-1) \left( a\theta x+a\sigma -an\mu _\emptyset -ax\mu _\emptyset +2\theta n-2\sigma \right) \\&= (n+x)^{a-3} (a-1) \left( x(a\theta -a\mu _\emptyset ) +a\sigma -an\mu _\emptyset +2\theta n-2\sigma \right) \end{aligned}$$

We can now determine the second derivative of \(f^a\) at \(x^*\):

$$\begin{aligned} f^a\,''(x^*)&= (n+x^*)^{a-3} (a-1) \left( x^*(a\theta -a \mu _\emptyset ) +a\sigma - an\mu _\emptyset +2\theta n-2\sigma \right) \\&= (n+x^*)^{a-3} (a-1) \\&\quad \left( \frac{-a \sigma + an \mu _\emptyset - \theta n + \sigma }{a(\theta - \mu _\emptyset )}\, a(\theta -\mu _\emptyset ) +a\sigma -an \mu _\emptyset +2\theta n-2\sigma \right) \\&= (n+x^*)^{a-3} (a-1) \left( -a \sigma + an \mu _\emptyset - \theta n + \sigma +a \sigma -an \mu _\emptyset + 2 \theta n-2 \sigma \right) \\&= (n+x^*)^{a-3} (a-1) (\theta n - \sigma ) \end{aligned}$$

Since \(f^a(x)\) is defined only for positive x, the first factor is always positive. Since by premise \(a < 1\) and \(\theta n < \sigma \), both remaining factors are negative, so the second derivative at the point \(x^*\) is always positive. Thus, if \(x^*\) is an extreme value of \(f^a(x)\), it is a local minimum. Since it was shown above that \(f^a(x)\) has no other candidates for extreme values besides \(x^*\), this proves the lemma. \(\square \)
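The argument can also be checked numerically. The sketch below is illustrative and not from the paper; the parameters \(n=5, \theta =1, \sigma =20, \mu _\emptyset =0, a=0.5\) are an arbitrary choice satisfying \(\theta n < \sigma \) and \(a \ne 1\). It evaluates \(f^a\) on a grid to confirm the maximum lies on the boundary, and uses a finite-difference second derivative to confirm that the critical point \(x^*\) is a local minimum:

```python
n, theta, sigma, mu0, a, x_max = 5.0, 1.0, 20.0, 0.0, 0.5, 20.0

def f(x):
    """f^a(x) = (n+x)^a * ((sigma + x*theta)/(n+x) - mu0) from Lemma 1."""
    return (n + x) ** a * ((sigma + x * theta) / (n + x) - mu0)

# Closed-form critical point x* derived in the proof.
x_star = (-a * sigma + a * n * mu0 - theta * n + sigma) / (a * (theta - mu0))

# Finite-difference second derivative at x*: positive => local minimum.
h = 1e-4
second = (f(x_star + h) - 2 * f(x_star) + f(x_star - h)) / h ** 2

# The maximum over (0, x_max] is attained at a boundary of the domain.
grid = [x_max * i / 1000 for i in range(1, 1001)]
assert max(f(x) for x in grid) <= max(f(1e-9), f(x_max)) + 1e-9
assert second > 0  # x* = 10 for these parameters, and it is a minimum
```

Varying the parameters (while keeping \(\theta n < \sigma \)) either moves \(x^*\) outside the domain or keeps it a local minimum, matching the case analysis above.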

Lemma 2

The generic mean-based measures \(q_{mean}^a\) are convex for \(a=1\) in the \((\sum T(c), i_P)\) space. They are not convex for arbitrary a.

Fig. 3 A surface plot of the mean test interestingness measure \(q_{mean}^{0.5}\) for \(\mu _\emptyset =0\) shows the non-convexity of this measure

Proof

For \(a=1\), the interestingness measure is given by \(q_{mean}^1 (P) = i_P \cdot (\mu _P - \mu _\emptyset ) = i_P \cdot (\frac{\sum _{c \in P} T(c)}{i_P} - \mu _\emptyset ) = \sum _{c \in P} T(c) - i_P \mu _\emptyset \). This function is linear in both \(\sum _{c \in P} T(c)\) and \(i_P\). Since linear functions are known to be convex, \(q_{mean}^1\) is convex.

To show that generic mean-based measures are not convex in general, we show an example where the definition of convexity for a function f(x), that is, \(\forall x,y, \lambda \in (0,1): f((1-\lambda ) x + \lambda y) \le (1-\lambda ) f (x) + \lambda f (y)\), is violated. In our case, x and y are each two-dimensional points in the \((\sum T(c), i_P)\) space. In that regard, we consider a dataset with \(\mu _\emptyset = 0\) and the mean test interestingness measure \(q_{mean}^{0.5}\). Then, the considered interestingness measure is given by \(q_{mean}^{0.5} = i_P^{0.5} \cdot (\mu _P - \mu _\emptyset ) = \frac{\sum _{c \in P} T(c)}{\sqrt{i_P}} := f(x)\). As two points in the \((\sum T(c), i_P)\) space for which the convexity condition is violated we choose \(x=(-100, 2)\) and \(y=(-100, 10)\). Additionally, we choose \(\lambda =0.5\). Then, the convexity inequality is violated:

$$\begin{aligned} f((1-\lambda ) x + \lambda y)&\le (1-\lambda ) f (x) + \lambda f (y) \\ f(0.5 x + 0.5 y)&\le 0.5\cdot f (x) + 0.5 \cdot f (y) \\ f((-100,6))&\le 0.5 \cdot f ((-100,2)) + 0.5 \cdot f ((-100,10)) \\ \frac{-100}{\sqrt{6}}&\le 0.5 \cdot \frac{-100}{\sqrt{2}} + 0.5 \cdot \frac{-100}{\sqrt{10}} \\ -40.82&\not \le -51.17 \end{aligned}$$

Since the definition of convexity is violated in at least one example, the mean test interestingness measure \(q_{mean}^{0.5}\) is not convex.

The non-convexity of \(q_{mean}^{0.5}\) is also evident by a surface plot of the function for \(\mu _\emptyset =0\), see Fig. 3. \(\square \)
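The counterexample is also easy to verify numerically. The following sketch (illustrative, not from the paper) recomputes both sides of the convexity inequality for the same two points:

```python
import math

def f(s, i):
    """q_mean^{0.5} expressed in the (sum T(c), i_P) space with mu0 = 0."""
    return s / math.sqrt(i)

x, y, lam = (-100.0, 2.0), (-100.0, 10.0), 0.5
mid = ((1 - lam) * x[0] + lam * y[0], (1 - lam) * x[1] + lam * y[1])

lhs = f(*mid)                          # f(-100, 6) ~ -40.82
rhs = (1 - lam) * f(*x) + lam * f(*y)  # ~ -51.17
assert lhs > rhs  # convexity would require lhs <= rhs, so it is violated
```

Since the left-hand side exceeds the right-hand side, the midpoint lies above the chord, which is exactly the violation shown in the derivation above.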

Cite this article

Lemmerich, F., Atzmueller, M. & Puppe, F. Fast exhaustive subgroup discovery with numerical target concepts. Data Min Knowl Disc 30, 711–762 (2016). https://doi.org/10.1007/s10618-015-0436-8
