Fast exhaustive subgroup discovery with numerical target concepts

Published in: Data Mining and Knowledge Discovery

Abstract

Subgroup discovery is a key data mining method that aims at identifying descriptions of subsets of the data that show an interesting distribution with respect to a pre-defined target concept. For practical applications, the integration of numerical data is crucial. Therefore, a wide variety of interestingness measures has been proposed in the literature that use a numerical attribute as the target concept. However, efficient mining in this setting is still an open issue. In this paper, we present novel techniques for fast exhaustive subgroup discovery with a numerical target concept. We initially survey previously proposed measures in this setting. Then, we explore options for pruning the search space using optimistic estimate bounds. Specifically, we introduce novel bounds in closed form as well as ordering-based bounds, a new technique for deriving estimates for several types of interestingness measures with no previously known bounds. In addition, we investigate efficient data structures, namely adapted FP-trees and bitset-based data representations, and discuss their interdependencies with interestingness measures and pruning schemes. The presented techniques are incorporated into two novel algorithms. Finally, the benefits of the proposed pruning bounds and algorithms are assessed and compared in an extensive experimental evaluation on 24 publicly available datasets. The novel algorithms reduce runtimes consistently by more than one order of magnitude.
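The pruning scheme at the heart of these techniques can be illustrated with a minimal sketch. The code below is not from the paper — it omits the FP-tree and bitset data structures entirely, and the naive depth-first enumeration and all names are illustrative. It uses the mean-based measure with \(a=1\), for which summing only the positive deviations from the global mean yields a valid optimistic estimate: no refinement of a pattern can score higher.

```python
def quality(values, mu0):
    """Mean-based quality for a = 1: q(P) = i_P * (mean_P - mu0) = sum(T(c) - mu0)."""
    return sum(v - mu0 for v in values)

def optimistic_estimate(values, mu0):
    """Upper bound on the quality of every subset of the covered instances:
    keep only the instances whose target value exceeds the global mean."""
    return sum(v - mu0 for v in values if v > mu0)

def subgroup_discovery(data, target, selectors, max_depth=2):
    """Exhaustive depth-first search over conjunctions of selectors.
    data: list of dicts, target: name of the numeric target attribute,
    selectors: (attribute, value) pairs. Branches whose optimistic
    estimate cannot beat the incumbent best quality are pruned."""
    mu0 = sum(r[target] for r in data) / len(data)
    best = (float("-inf"), None)

    def refine(pattern, covered, start):
        nonlocal best
        q = quality([r[target] for r in covered], mu0)
        if q > best[0]:
            best = (q, pattern)
        if len(pattern) >= max_depth:
            return
        for i in range(start, len(selectors)):
            attr, val = selectors[i]
            sub = [r for r in covered if r[attr] == val]
            if not sub:
                continue
            if optimistic_estimate([r[target] for r in sub], mu0) <= best[0]:
                continue  # prune: no refinement of this branch can beat the incumbent
            refine(pattern + [(attr, val)], sub, i + 1)

    refine([], data, 0)
    return best

data = [{"x": "a", "t": 5}, {"x": "a", "t": 7},
        {"x": "b", "t": 1}, {"x": "b", "t": 3}]
best_q, best_pattern = subgroup_discovery(data, "t", [("x", "a"), ("x", "b")])
```

This bound is tight for \(a=1\): it would be attained by a refinement covering exactly the instances with above-average target values.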

Notes

  1. Available at www.vikamine.org.

References

  • Alcala-Fernandez J, Fernandez A, Luengo J, Derrac J, Garcia S, Sanchez L, Herrera F (2011) KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J Mult Valued Logic Soft Comput 17(2–3):255–287

  • Atzmueller M (2015) Subgroup discovery—advanced review. WIREs Data Mining Knowl Discov 5(1):35–49

  • Atzmueller M, Lemmerich F (2009) Fast subgroup discovery for continuous target concepts. In: Proceedings of the 18th international symposium on foundations of intelligent systems (ISMIS), p 35–44

  • Atzmueller M, Lemmerich F (2012) VIKAMINE—Open-source subgroup discovery, pattern mining, and analytics. In: Proceedings of the European conference on machine learning and knowledge discovery in databases (ECML/PKDD), p 842–845

  • Atzmueller M, Lemmerich F (2013) Exploratory pattern mining on social media using geo-references and social tagging information. Int J Web Sci 2(1–2):80–112

  • Atzmueller M, Lemmerich F, Krause B, Hotho A (2009) Who are the spammers? Understandable local patterns for concept description. In: Proceedings of the 7th conference on computer methods and systems

  • Atzmueller M, Mueller J, Becker M (2015) Exploratory subgroup analytics on ubiquitous data. In: Atzmueller M, Chin A, Scholz C, Trattner C (eds) Mining, modeling and recommending 'things' in social media, p 1–20. Springer

  • Atzmueller M, Puppe F (2006) SD-Map—a fast algorithm for exhaustive subgroup discovery. In: Proceedings of the 10th European conference on principles and practice of knowledge discovery in databases (PKDD), p 6–17

  • Atzmueller M, Puppe F (2009) A knowledge-intensive approach for semi-automatic causal subgroup discovery. In: Berendt B et al (eds) Knowledge discovery enhanced with semantic and social information, vol 220. Springer, Berlin, pp 19–36

  • Aumann Y, Lindell Y (1999) A statistical theory for quantitative association rules. In: Proceedings of the 5th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), p 261–270

  • Aumann Y, Lindell Y (2003) A statistical theory for quantitative association rules. J Intell Inf Syst 20(3):255–283

  • Batal I, Hauskrecht M (2010) A concise representation of association rules using minimal predictive rules. In: Proceedings of the 2010 European conference on machine learning and knowledge discovery in databases (ECML/PKDD), p 87–102

  • Bay SD, Pazzani MJ (2001) Detecting group differences: mining contrast sets. Data Min Knowl Discov 5(3):213–246

  • Bayardo RJ (1998) Efficiently mining long patterns from databases. In: Proceedings of the 1998 ACM SIGMOD international conference on management of data, p 85–93

  • Bayardo RJ, Agrawal R, Gunopulos D (1999) Constraint-based rule mining in large, dense databases. Data Min Knowl Discov 4(2–3):217–240

  • Box GEP (1953) Non-normality and tests on variances. Biometrika 40:318–335

  • Breiman L, Friedman JH, Stone CJ, Olshen RA (1984) Classification and regression trees. Chapman & Hall, Boca Raton

  • Brin S, Rastogi R, Shim K (2003) Mining optimized gain rules for numeric attributes. IEEE Trans Knowl Data Eng 15(2):324–338

  • Cheng H, Yan X, Han J, Yu PS (2008) Direct discriminative pattern mining for effective classification. In: Proceedings of the 24th international conference on data engineering (ICDE), p 169–178

  • Dong G, Li J (1999) Efficient mining of emerging patterns: discovering trends and differences. In: Proceedings of the 5th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), p 43–52

  • Duivesteijn W, Knobbe AJ, Feelders A, van Leeuwen M (2010) Subgroup discovery meets bayesian networks—an exceptional model mining approach. In: Proceedings of the 10th international conference on data mining (ICDM), p 158–167

  • El-Qawasmeh E (2003) Beating the popcount. Int J Inf Technol 9(1):1–18

  • Fawcett T (2006) An introduction to ROC analysis. Pattern Recognit Lett 27(8):861–874

  • Fayyad UM, Irani KB (1993) Multi-interval discretization of continuous-valued attributes for classification learning. In: Proceedings of the 13th international joint conference on artificial intelligence (IJCAI), p 1022–1027

  • Freidlin B, Gastwirth JL (2000) Should the median test be retired from general use? Am Stat 54(3):161–164

  • Fukuda T, Morimoto Y, Morishita S, Tokuyama T (1996) Mining optimized association rules for numeric attributes. In: Proceedings of the 15th ACM symposium on principles of database systems (PODS), p 182–191

  • García S, Luengo J, Saez JA, Lopez V, Herrera F (2013) A survey of discretization techniques: taxonomy and empirical analysis in supervised learning. IEEE Trans Knowl Data Eng 25(4):734–750

  • Geng L, Hamilton HJ (2006) Interestingness measures for data mining: a survey. ACM Comput Surv 38(3):9

  • Grosskreutz H (2008) Cascaded subgroups discovery with an application to regression. In: From local patterns to global models, workshop at the ECML/PKDD, p 275–286

  • Grosskreutz H, Rüping S (2009) On subgroup discovery in numerical domains. Data Min Knowl Discov 19(2):210–226

  • Grosskreutz H, Rüping S, Wrobel S (2008) Tight optimistic estimates for fast subgroup discovery. In: Proceedings of the 2008 European conference on machine learning and knowledge discovery in databases (ECML/PKDD), p 440–456

  • Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. ACM SIGMOD Rec 29(2):1–12

  • Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min Knowl Discov 8(1):53–87

  • Hart PE, Nilsson NJ, Raphael B (1968) A formal basis for the heuristic determination of minimum cost paths. IEEE Trans Syst Sci Cybernet 4(2):100–107

  • Jorge AM, Azevedo PJ, Pereira F (2006) Distribution rules with numeric attributes of interest. In: Proceedings of the 10th European conference on principles and practice of knowledge discovery in databases (PKDD), p 247–258

  • Kavšek B, Lavrač N (2006) Apriori-SD: adapting association rule learning to subgroup discovery. Appl Artif Intell 20:543–583

  • Klösgen W (1994) Exploration of simulation experiments by discovery. Technical Report WS-04-03

  • Klösgen W (1995) Efficient discovery of interesting statements in databases. J Intell Inf Syst 4(1):53–69

  • Klösgen W (1996) Explora: a multipattern and multistrategy discovery assistant. In: Fayyad UM, Piatetsky-Shapiro G, Smyth P, Uthurusamy R (eds) Advances in knowledge discovery and data mining. MIT Press, Cambridge, pp 249–271

  • Klösgen W (2002) Data mining tasks and methods: subgroup discovery: deviation analysis. In: Klösgen W, Zytkow JM (eds) Handbook of data mining and knowledge discovery, p 354–361

  • Klösgen W, May M (2002) Census data mining—an application. In: Proceedings of the 6th European conference on principles and practice of knowledge discovery in databases (PKDD)

  • Kotsiantis S, Kanellopoulos D (2006) Discretization techniques: a recent survey. GESTS Int Trans Comput Sci Eng 32(1):47–58

  • Kralj Novak P, Lavrač N, Webb GI (2009) Supervised descriptive rule discovery: a unifying survey of contrast set, emerging pattern and subgroup mining. J Mach Learn Res 10:377–403

  • Lavrač N, Kavšek B, Flach PA, Todorovski L (2004) Subgroup discovery with CN2-SD. J Mach Learn Res 5:153–188

  • Leman D, Feelders A, Knobbe AJ (2008) Exceptional model mining. In: Proceedings of the European conference on machine learning and knowledge discovery in databases (ECML/PKDD), p 1–16

  • Lemmerich F (2014) Novel techniques for efficient and effective subgroup discovery. PhD thesis, Universität Würzburg

  • Lemmerich F, Atzmueller M (2012) Describing locations using tags and images: explorative pattern mining in social media. In: Revised selected papers from the workshops on modeling and mining ubiquitous social media, p 77–96

  • Lemmerich F, Becker M, Atzmueller M (2012) Generic pattern trees for exhaustive exceptional model mining. In: Proceedings of the European conference on machine learning and knowledge discovery in databases (ECML/PKDD), p 277–292

  • Lemmerich F, Becker M, Puppe F (2013) Difference-based estimates for generalization-aware subgroup discovery. In: Proceedings of the European conference on machine learning and knowledge discovery in databases (ECML/PKDD), p 288–303

  • Lemmerich F, Puppe F (2011) Local models for expectation-driven subgroup discovery. In: Proceedings of the 11th international conference on data mining (ICDM), p 360–369

  • Lemmerich F, Rohlfs M, Atzmueller M (2010) Fast discovery of relevant subgroup patterns. In: Proceedings of the 23rd Florida artificial intelligence research society conference (FLAIRS), p 428–433

  • Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml

  • Lucas JP, Jorge AM, Pereira F, Pernas AM, Machado AA (2007) A tool for interactive subgroup discovery using distribution rules. In: Proceedings of the artificial intelligence 13th Portuguese conference on progress in artificial intelligence (EPIA), p 426–436

  • Mampaey M, Nijssen S, Feelders A, Knobbe AJ (2012) Efficient algorithms for finding richer subgroup descriptions in numeric and nominal data. In: Proceedings of the 12th international conference on data mining (ICDM), p 499–508

  • Moreland K, Truemper K (2009) Discretization of target attributes for subgroup discovery. In: Proceedings of the 6th international conference on machine learning and data mining in pattern recognition (MLDM), p 44–52

  • Morishita S (1998) On classification and regression. In: Proceedings of the first international conference on discovery science, p 40–57

  • Morishita S, Sese J (2000) Traversing itemset lattices with statistical metric pruning. In: Proceedings of the 19th ACM symposium on principles of database systems (PODS), p 226–236

  • Pieters BFI (2010) Subgroup discovery on numeric and ordinal targets, with an application to biological data aggregation. Technical report, Universiteit Utrecht

  • Pieters BFI, Knobbe AJ, Džeroski S (2010) Subgroup discovery in ranked data, with an application to gene set enrichment. In: Preference learning, workshop at the ECML/PKDD, vol. 10, p 1–18

  • Rastogi R, Shim K (2002) Mining optimized association rules with categorical and numeric attributes. IEEE Trans Knowl Data Eng 14(1):29–50

  • Webb GI (1995) OPUS: an efficient admissible algorithm for unordered search. J Artif Intell Res 3(1):431–465

  • Webb GI (2001) Discovering associations with numeric variables. In: Proceedings of the 7th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), p 383–388

  • Wrobel S (1997) An algorithm for multi-relational discovery of subgroups. In: Proceedings of the 1st European symposium on principles of data mining and knowledge discovery (PKDD), p 78–87

  • Zaki MJ (2000) Scalable algorithms for association mining. IEEE Trans Knowl Data Eng 12(3):372–390

  • Zimmermann A, De Raedt L (2009) Cluster-grouping: from subgroup discovery to clustering. Mach Learn 77(1):125–159

Acknowledgments

This work has been partially supported by the VENUS research cluster at the interdisciplinary Research Center for Information System Design (ITeG) at Kassel University.

Author information

Correspondence to Florian Lemmerich.

Additional information

Responsible editor: M.J. Zaki.

This paper summarizes and extends contents of the dissertation of the first author (Lemmerich 2014). A small part of this work, namely the SD-Map* algorithm restricted to mean-based interestingness measures, has been described in a previous publication (Atzmueller and Lemmerich 2009).

Appendix

Lemma 1

Using the notations of Theorem 9, the function \(f^a (x) = (n+x)^a \cdot \left( \frac{\sigma + x \cdot \theta }{n+x}-\mu _\emptyset \right) \) has no local maxima inside its domain of definition; in particular,

$$\begin{aligned} f^a(x) \le \max \left( f^a(0), f^a(x_{max})\right) \end{aligned}$$

Proof

We distinguish three cases by the parameter a of the applied generic mean interestingness measure:

first, for \(a = 1\), it holds that

$$\begin{aligned} f^1 (x)&= (n+x)^1 \cdot \left( \frac{\sigma + x \cdot \theta }{n+x}-\mu _\emptyset \right) \\&= \sigma + \theta x - \mu _\emptyset n - \mu _\emptyset x \\&= \left( \theta - \mu _\emptyset \right) \cdot x + \sigma - \mu _\emptyset n \end{aligned}$$

As this is a linear function in x, the function \(f^1(x)\) is strictly increasing for \(\theta > \mu _\emptyset \) and strictly decreasing otherwise. Thus, the lemma holds for \(a=1\).

Second, we consider the case \((a \ne 1) \wedge (\sigma = \theta n)\), that is, the first n instances all had the same target value. In this case, the function \(f^a(x)\) is given by \(f^a(x) = (n+x)^a (\theta -\mu _\emptyset )\). This is strictly monotone since \(n > 0, x > 0\). Thus, again \(f^a(x)\) has no local maximum.

Third, the case \((a \ne 1) \wedge (\sigma \ne \theta n)\) is considered in detail: since \(\sigma \) was computed as a sum of n values that are each at least as large as \(\theta \), it follows that \(\theta \cdot n < \sigma \). In the following, the extrema of \(f^a(x)\) are determined by differentiating this function twice.

$$\begin{aligned} f^a\,'(x)&= \frac{d}{d x} f^a(x) = \frac{d}{d x} \left( (n+x)^a \cdot \left( \frac{\sigma + x \cdot \theta }{n+x}-\mu _\emptyset \right) \right) \\&= (n+x)^a \left( \frac{d}{dx} \left( \frac{\theta x+\sigma }{n+x}-\mu _\emptyset \right) \right) + \left( \frac{\theta x +\sigma }{n+x}-\mu _\emptyset \right) \cdot \left( \frac{d}{dx}(n+x)^a\right) \\&= (n+x)^a \left( \frac{d}{dx} \left( \frac{\theta x+\sigma }{n+x} \right) \right) + \left( \frac{\theta x+\sigma }{n+x}-\mu _\emptyset \right) \cdot a (n+x)^{a-1} \\&= (n+x)^a \left( \frac{\theta }{n+x} - \frac{\theta x+\sigma }{(n+x)^2}\right) + \left( \frac{\theta x+\sigma }{n+x}-\mu _\emptyset \right) \cdot a(n+x)^{a-1} \\&= (n+x)^{a-2} \left( \theta (n+x)- (\theta x+\sigma ) + a\left( \theta x+\sigma - \mu _\emptyset (n+x)\right) \right) \\&= (n+x)^{a-2} \left( \theta n - \sigma + a\theta x+ a\sigma - a \mu _\emptyset n - a \mu _\emptyset x\right) \\&= (n+x)^{a-2} \left( x (a\theta -a \mu _\emptyset ) + a\sigma -a \mu _\emptyset n + \theta n - \sigma \right) \end{aligned}$$

In line 2, the product rule is used. In line 3, the chain rule is applied to \((n+x)^a\); \(\mu _\emptyset \) can be omitted inside the remaining derivative, as it is constant with respect to x. In line 4, the quotient rule is used. Finally, in line 5, \((n+x)^{a-2}\) is factored out.

Since \(x > 0, n > 0\) by definition, the first factor is positive for any valid x. For \(a = 0\) or \(\theta = \mu _\emptyset \), the second factor is independent of x, so it has no root; thus \(f^a(x)\) has no maxima except at the boundaries of its domain in this case. Otherwise, the root of this factor, and therefore the only candidate for a maximum of \(f^a(x)\), is given at the point

$$\begin{aligned} x^* = \frac{-a \sigma + an \mu _\emptyset - \theta n + \sigma }{a(\theta - \mu _\emptyset )}. \end{aligned}$$

In the following, it is shown that \(x^*\) cannot be a maximum in our setting. For that purpose, the second derivative of \(f^a(x)\) is evaluated at the point \(x^*\):

$$\begin{aligned} f^a\,''(x)&= \frac{d}{d x} f^a\,'(x) \\&= (n+x)^{a-3}(a-2)\left( x(a\theta -a \mu _\emptyset ) + a \sigma -an \mu _\emptyset +\theta n-\sigma \right) + (a\theta -a \mu _\emptyset )(n+x)^{a-2} \\&= (n+x)^{a-3} \left( (a-2) \left( x(a\theta -a \mu _\emptyset ) + a \sigma -an \mu _\emptyset +\theta n-\sigma \right) + (a\theta -a \mu _\emptyset ) (n+x)\right) \\&= (n+x)^{a-3} \big ( a^2x\theta -a^2x \mu _\emptyset + a^2\sigma -a^2 \mu _\emptyset n + a\theta n-a\sigma - 2ax\theta \\&\quad + 2ax \mu _\emptyset -2a\sigma + 2an \mu _\emptyset - 2\theta n + 2\sigma + a\theta n - an \mu _\emptyset + a\theta x - ax \mu _\emptyset \big ) \\&= (n+x)^{a-3} (a-1) \left( a\theta x+a\sigma -an\mu _\emptyset -ax\mu _\emptyset +2\theta n-2\sigma \right) \\&= (n+x)^{a-3} (a-1) \left( x(a\theta -a\mu _\emptyset ) +a\sigma -an\mu _\emptyset +2\theta n-2\sigma \right) \end{aligned}$$

We can now determine the second derivative of \(f^a\) at \(x^*\):

$$\begin{aligned} f^a\,''(x^*)&= (n+x^*)^{a-3} (a-1) \left( x^*(a\theta -a \mu _\emptyset ) +a\sigma - an\mu _\emptyset +2\theta n-2\sigma \right) \\&= (n+x^*)^{a-3} (a-1) \\&\quad \left( \frac{-a \sigma + an \mu _\emptyset - \theta n + \sigma }{a(\theta - \mu _\emptyset )}\, a(\theta -\mu _\emptyset ) +a\sigma -an \mu _\emptyset +2\theta n-2\sigma \right) \\&= (n+x^*)^{a-3} (a-1) \left( -a \sigma + an \mu _\emptyset - \theta n + \sigma +a \sigma -an \mu _\emptyset + 2 \theta n-2 \sigma \right) \\&= (n+x^*)^{a-3} (a-1) (\theta n - \sigma ) \end{aligned}$$

Since \(f^a(x)\) is defined only for positive x, the first factor is always positive. Since by premise \(a < 1\) and \(\theta n < \sigma \), both remaining factors are negative, so the second derivative at the point \(x^*\) is always positive. Thus, if \(x^*\) is an extreme value of \(f^a(x)\), it is a local minimum. Since it was shown above that \(f^a(x)\) has no other candidates for extreme values besides \(x^*\), this proves the lemma. \(\square \)
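The argument can also be checked numerically. The sketch below is illustrative and not from the paper; the parameters \(n=5, \theta =1, \sigma =20, \mu _\emptyset =0, a=0.5\) are an arbitrary choice satisfying \(\theta n < \sigma \) and \(a \ne 1\). It evaluates \(f^a\) on a grid to confirm the maximum lies on the boundary, and uses a finite-difference second derivative to confirm that the critical point \(x^*\) is a local minimum:

```python
n, theta, sigma, mu0, a, x_max = 5.0, 1.0, 20.0, 0.0, 0.5, 20.0

def f(x):
    """f^a(x) = (n+x)^a * ((sigma + x*theta)/(n+x) - mu0) from Lemma 1."""
    return (n + x) ** a * ((sigma + x * theta) / (n + x) - mu0)

# Closed-form critical point x* derived in the proof.
x_star = (-a * sigma + a * n * mu0 - theta * n + sigma) / (a * (theta - mu0))

# Finite-difference second derivative at x*: positive => local minimum.
h = 1e-4
second = (f(x_star + h) - 2 * f(x_star) + f(x_star - h)) / h ** 2

# The maximum over (0, x_max] is attained at a boundary of the domain.
grid = [x_max * i / 1000 for i in range(1, 1001)]
assert max(f(x) for x in grid) <= max(f(1e-9), f(x_max)) + 1e-9
assert second > 0  # x* = 10 for these parameters, and it is a minimum
```

Varying the parameters (while keeping \(\theta n < \sigma \)) either moves \(x^*\) outside the domain or keeps it a local minimum, matching the case analysis above.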

Lemma 2

The generic mean-based measures \(q_{mean}^a\) are convex for \(a=1\) in the \((\sum T(c), i_P)\) space. They are not convex for arbitrary a.

Fig. 3 A surface plot of the mean test interestingness measure \(q_{mean}^{0.5}\) for \(\mu _\emptyset =0\) shows the non-convexity of this measure

Proof

For \(a=1\), the interestingness measure is given by \(q_{mean}^1 (P) = i_P \cdot (\mu _P - \mu _\emptyset ) = i_P \cdot (\frac{\sum _{c \in P} T(c)}{i_P} - \mu _\emptyset ) = \sum _{c \in P} T(c) - i_P \mu _\emptyset \). This function is linear in both \(\sum _{c \in P} T(c)\) and \(i_P\). Since linear functions are known to be convex, \(q_{mean}^1\) is convex.

To show that generic mean-based measures are not convex in general, we show an example where the definition of convexity for a function f(x), that is, \(\forall x,y, \lambda \in (0,1): f((1-\lambda ) x + \lambda y) \le (1-\lambda ) f (x) + \lambda f (y)\), is violated. In our case, x and y are each two-dimensional points in the \((\sum T(c), i_P)\) space. In that regard, we consider a dataset with \(\mu _\emptyset = 0\) and the mean test interestingness measure \(q_{mean}^{0.5}\). Then, the considered interestingness measure is given by \(q_{mean}^{0.5} = i_P^{0.5} \cdot (\mu _P - \mu _\emptyset ) = \frac{\sum _{c \in P} T(c)}{\sqrt{i_P}} := f(x)\). As two points in the \((\sum T(c), i_P)\) space for which the convexity condition is violated we choose \(x=(-100, 2)\) and \(y=(-100, 10)\). Additionally, we choose \(\lambda =0.5\). Then, the convexity inequality is violated:

$$\begin{aligned} f((1-\lambda ) x + \lambda y)&\le (1-\lambda ) f (x) + \lambda f (y) \\ f(0.5 x + 0.5 y)&\le 0.5\cdot f (x) + 0.5 \cdot f (y) \\ f((-100,6))&\le 0.5 \cdot f ((-100,2)) + 0.5 \cdot f ((-100,10)) \\ \frac{-100}{\sqrt{6}}&\le 0.5 \cdot \frac{-100}{\sqrt{2}} + 0.5 \cdot \frac{-100}{\sqrt{10}} \\ -40.82&\not \le -51.17 \end{aligned}$$

Since the definition of convexity is violated in at least one example, the mean test interestingness measure \(q_{mean}^{0.5}\) is not convex.

The non-convexity of \(q_{mean}^{0.5}\) is also evident by a surface plot of the function for \(\mu _\emptyset =0\), see Fig. 3. \(\square \)
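The counterexample is also easy to verify numerically. The following sketch (illustrative, not from the paper) recomputes both sides of the convexity inequality for the same two points:

```python
import math

def f(s, i):
    """q_mean^{0.5} expressed in the (sum T(c), i_P) space with mu0 = 0."""
    return s / math.sqrt(i)

x, y, lam = (-100.0, 2.0), (-100.0, 10.0), 0.5
mid = ((1 - lam) * x[0] + lam * y[0], (1 - lam) * x[1] + lam * y[1])

lhs = f(*mid)                          # f(-100, 6) ~ -40.82
rhs = (1 - lam) * f(*x) + lam * f(*y)  # ~ -51.17
assert lhs > rhs  # convexity would require lhs <= rhs, so it is violated
```

Since the left-hand side exceeds the right-hand side, the midpoint lies above the chord, which is exactly the violation shown in the derivation above.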

Cite this article

Lemmerich, F., Atzmueller, M. & Puppe, F. Fast exhaustive subgroup discovery with numerical target concepts. Data Min Knowl Disc 30, 711–762 (2016). https://doi.org/10.1007/s10618-015-0436-8
