Efficient redundancy reduced subgroup discovery via quadratic programming

Journal of Intelligent Information Systems

Abstract

Subgroup discovery is a task at the intersection of predictive and descriptive induction, aiming to identify subgroups with the most unusual statistical (distributional) characteristics with respect to a property of interest. Although a great deal of work has been devoted to the topic, one remaining problem is the redundancy of subgroup descriptions, which often convey very similar information. In this paper, we propose a quadratic programming based approach to reduce the amount of redundancy in subgroup rules. Experimental results on 12 datasets show that the resulting subgroups are in fact less redundant than those found by standard methods. In addition, our experiments show that the computational costs are significantly lower than those of the other methods compared in the paper.

Notes

  1. Partial correlation may be useful if we want to investigate the dependency between two variables after removing the effect of all others.

  2. Quadratic programming has been used for feature selection before by Rodriguez-Lujan et al. (2010); a sketch of that formulation is given after this list.

  3. Note that, in contrast to a previous publication (Schmidt et al. 2010), the target variable is AD/non-AD, not cluster membership in image clusters.
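
For orientation, the quadratic programming feature selection (QPFS) formulation of Rodriguez-Lujan et al. (2010), which the approach in this paper appears to adapt from features to subgroups, trades the individual quality of candidates against their pairwise redundancy. A minimal sketch, with notation assumed rather than taken from the paper: let \(H_{ij}\) denote the redundancy (e.g., mutual information) between candidate subgroups \(i\) and \(j\), \(f_i\) the quality of subgroup \(i\), and \(\alpha \in [0,1]\) the trade-off parameter:

$$ \min_{x \in \mathbb{R}^n} \; (1-\alpha)\, x^T H x - \alpha\, f^T x \quad \text{subject to} \quad x \ge 0, \quad \sum\nolimits_{i=1}^{n} x_i = 1. $$

The sketch below solves such a program with an off-the-shelf local solver and keeps the k highest-weighted subgroups; all inputs are synthetic placeholders. Since the appendix shows that a mutual information matrix \(H\) need not be positive definite, the program is not necessarily convex, so a local solver may return a local optimum.

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic placeholders: H[i, j] = pairwise redundancy between candidate
# subgroups i and j (symmetric), f[i] = individual quality of subgroup i.
rng = np.random.default_rng(0)
n = 10
A = rng.random((n, n))
H = (A + A.T) / 2                  # symmetric redundancy matrix
f = rng.random(n)                  # subgroup qualities
alpha = 0.5                        # quality/redundancy trade-off

def objective(x):
    # (1 - alpha) * x^T H x  -  alpha * f^T x
    return (1 - alpha) * x @ H @ x - alpha * f @ x

res = minimize(objective, np.full(n, 1.0 / n), method="SLSQP",
               bounds=[(0.0, None)] * n,
               constraints=[{"type": "eq", "fun": lambda x: x.sum() - 1.0}])

# Redundant candidates split weight between them, so taking the k
# highest-weighted subgroups yields a less redundant selection.
k = 3
top_k = np.argsort(res.x)[::-1][:k]
print(top_k, res.x[top_k])
```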

References

  • Bache, K., & Lichman, M. (2013). UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences. http://archive.ics.uci.edu/ml.

  • Bay, S., & Pazzani, M. (2001). Detecting group differences: mining contrast sets. Data Mining and Knowledge Discovery, 5, 213–246.

  • Bouckaert, R., & Frank, E. (2004). Evaluating the replicability of significance tests for comparing learning algorithms. In The 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) (pp. 3–12). Springer.

  • Bringmann, B., & Zimmermann, A. (2009). One in a million: picking the right patterns. Knowledge and Information Systems, 18, 61–81.

  • Cohen, W., & Singer, Y. (1999). A simple, fast, and effective rule learner. In Proceedings of the sixteenth national conference on artificial intelligence (pp. 335–342). AAAI Press.

  • Dong, G., & Li, J. (1999). Efficient mining of emerging patterns: discovering trends and differences. In Proceedings of the fifth ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’99.

  • Drzezga, A. (2009). Diagnosis of Alzheimer’s disease with [18F] PET in mild and asymptomatic stages. Behavioural Neurology, 21, 101–115.

  • Friedman, J., & Popescu, B. (2008). Predictive learning via rule ensembles. The Annals of Applied Statistics, 2, 916–954.

  • Grosskreutz, H., & Paurat, D. (2011). Fast and memory-efficient discovery of the top-k relevant subgroups in a reduced candidate space. In Proceedings of the 21st European conference on machine learning and principles and practice of knowledge discovery in databases (pp. 533–548). Springer-Verlag.

  • Grosskreutz, H., Rüping, S., Wrobel, S. (2008). Tight optimistic estimates for fast subgroup discovery. In Proceedings of the 18th European conference on machine learning and principles and practice of knowledge discovery in databases (pp. 440–456). Springer-Verlag.

  • Herrera, F., Carmona, C., González, P., Jesus, M. (2011). An overview on subgroup discovery: foundations and applications. Knowledge and Information Systems, 29, 495–525.

  • Kaufman, L., & Rousseeuw, P. (1990). Finding groups in data: an introduction to cluster analysis. New York: Wiley.

  • Klösgen, W. (1996). Explora: A multipattern and multistrategy discovery assistant. In Advances in knowledge discovery and data mining.

  • Klösgen, W., & May, M. (2002). Census data mining-an application. In Mining official data (pp. 65–79).

  • Lavrac, N., Kavsek, B., Flach, P., Todorovski, L., Wrobel, S. (2004). Subgroup discovery with CN2-SD. Journal of Machine Learning Research, 5, 153–188.

  • Atzmueller, M., & Puppe, F. (2006). SD-Map – a fast algorithm for exhaustive subgroup discovery. In Proceedings of the 10th European conference on principles and practice of knowledge discovery in databases (pp. 6–17).

  • Morishita, S., & Sese, J. (2000). Traversing itemset lattices with statistical metric pruning. In Proceedings of the 19th ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems (pp. 226–236). ACM.

  • Novak, P., Lavrac, N., Webb, G. (2009). Supervised descriptive rule discovery: a unifying survey of contrast set, emerging pattern and subgroup mining. Journal of Machine Learning Research, 10, 377–403.

  • Rodriguez-Lujan, I., Huerta, R., Elkan, C., Santa Cruz, C. (2010). Quadratic programming feature selection. Journal of Machine Learning Research, 11, 1491–1516.

  • Rückert, U., & Kramer, S. (2008). Margin-based first-order rule learning. Machine Learning, 70, 189–206.

  • Rüping, S. (2009). Ranking interesting subgroups. In Proceedings of the 26th annual International Conference on Machine Learning, ICML ’09 (pp. 913–920). New York: ACM.

  • Schmidt, J., Hapfelmeier, A., Müller, M., Perneczky, R., Drzezga, A., Kurz, A., Kramer, S. (2010). Interpreting PET scans by structured patient data: a data mining case study in dementia research. Journal of Knowledge and Information Systems (KAIS), 24, 149–170.

  • van Leeuwen, M., & Knobbe, A. (2011). Non-redundant subgroup discovery in large and complex data. In Proceedings of the 21st European conference on machine learning and principles and practice of knowledge discovery in databases (pp. 459–474). Springer-Verlag.

  • Vreeken, J., van Leeuwen, M., Siebes, A. (2011). Krimp: mining itemsets that compress. Data Mining and Knowledge Discovery, 23, 169–214.

  • Wrobel, S. (1997). An algorithm for multi-relational discovery of subgroups. In Proceedings of the first European symposium on principles of data mining and knowledge discovery.

  • Xin, D., Cheng, H., Yan, X., Han, J. (2006). Extracting redundancy-aware top-k patterns. In Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 444–453).

  • Zimmermann, A., & De Raedt, L. (2004). CorClass: Correlated association rule mining for classification. In Proceedings of discovery science (pp. 60–72). Springer.

Acknowledgement

The first author acknowledges the support of the TUM Graduate School of Information Science in Health (GSISH), Technische Universität München.

Author information

Correspondence to Stefan Kramer.

Appendix: mutual information matrix is not positive definite

Proposition 2

The mutual information matrix is not, in general, positive definite.

Proof

Linear algebra states that a symmetric matrix \(H\) is positive definite if \(x^T H x > 0\) for all \(x \ne 0\). The mutual information matrix \(H \in \mathbb{R}^{n \times n}\) is symmetric, with entries denoted \(H_{ij}\). Let \(x \in \mathbb{R}^{n \times 1}\) be an \(n \times 1\) column vector. Then \(x^T H x\) can be written in matrix form as:

$$ x^T H x = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix} \begin{bmatrix} H_{11} & H_{12} & \cdots & H_{1n}\\ H_{21} & H_{22} & \cdots & H_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ H_{n1} & H_{n2} & \cdots & H_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \tag{15} $$

$$ = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix} \begin{bmatrix} \sum_{i=1}^{n} H_{1i}x_i \\ \sum_{i=1}^{n} H_{2i}x_i \\ \vdots \\ \sum_{i=1}^{n} H_{ni}x_i \end{bmatrix} \tag{16} $$

$$ \begin{aligned} = {} & H_{11}x_1^2 + H_{12}x_1x_2 + \ldots + H_{1n}x_1x_n \\ & + H_{21}x_2x_1 + H_{22}x_2^2 + \ldots + H_{2n}x_2x_n \\ & + \ldots \\ & + H_{n1}x_nx_1 + H_{n2}x_nx_2 + \ldots + H_{nn}x_n^2 \end{aligned} \tag{17} $$

$$ = \sum_{i,j=1}^{n} H_{ij}\, x_i x_j. \tag{18} $$

If \(n = 4\), then \(x^T H x\) expands to \(H_{11}x_1^2 + H_{22}x_2^2 + H_{33}x_3^2 + H_{44}x_4^2 + 2H_{12}x_1x_2 + 2H_{13}x_1x_3 + 2H_{14}x_1x_4 + 2H_{23}x_2x_3 + 2H_{24}x_2x_4 + 2H_{34}x_3x_4\), since \(H_{ij} = H_{ji}\) for a symmetric matrix. Exhibiting a single counterexample completes the proof. For example, if a data set is \(\begin{bmatrix} 3 & 4 & 2 & 5\\ 1 & 4 & 4 & 4\\ 5 & 4 & 4 & 3\\ 1 & 3 & 2 & 2 \end{bmatrix}\), the corresponding mutual information matrix is \(H = \begin{bmatrix} 1.5 & 0.3 & 0.5 & 1.5\\ 0.3 & 0.8 & 0.3 & 0.8\\ 0.5 & 0.3 & 1.0 & 1.0\\ 1.5 & 0.8 & 1.0 & 2.0 \end{bmatrix}\), and the column vector is \(x = \begin{bmatrix} 7.2 & 5.4 & 4.9 & -9.9 \end{bmatrix}^T\), then \(x^T H x = -0.19 < 0\), which violates the necessary condition for a matrix to be positive definite. Therefore the mutual information matrix is not positive definite in general. □
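
A quick numerical check of this counterexample (a minimal NumPy sketch; note that the entries of \(H\) above are printed rounded, so the quadratic form evaluates to roughly -0.79 here instead of the -0.19 reported in the text, presumably computed from unrounded values, but its sign is negative either way, which is all the proof requires):

```python
import numpy as np

# Mutual information matrix and vector from the counterexample above.
H = np.array([[1.5, 0.3, 0.5, 1.5],
              [0.3, 0.8, 0.3, 0.8],
              [0.5, 0.3, 1.0, 1.0],
              [1.5, 0.8, 1.0, 2.0]])
x = np.array([7.2, 5.4, 4.9, -9.9])

q = x @ H @ x                     # the quadratic form x^T H x
print(q)                          # approx. -0.79 with these rounded entries
assert q < 0                      # H therefore cannot be positive definite

# Equivalent spectral check: a symmetric matrix is positive definite
# iff all of its eigenvalues are positive.
print(np.linalg.eigvalsh(H))      # the smallest eigenvalue is negative
```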

About this article

Cite this article

Li, R., Perneczky, R., Drzezga, A. et al. Efficient redundancy reduced subgroup discovery via quadratic programming. J Intell Inf Syst 44, 271–288 (2015). https://doi.org/10.1007/s10844-013-0284-1
