Efficient redundancy reduced subgroup discovery via quadratic programming

Journal of Intelligent Information Systems

Abstract

Subgroup discovery is a task at the intersection of predictive and descriptive induction, aiming to identify subgroups with the most unusual statistical (distributional) characteristics with respect to a property of interest. Although a great deal of work has been devoted to the topic, one remaining problem is the redundancy of subgroup descriptions, which often convey very similar information. In this paper, we propose a quadratic programming based approach to reduce the amount of redundancy in subgroup rules. Experimental results on 12 datasets show that the resulting subgroups are in fact less redundant than those found by standard methods. In addition, our experiments show that the computational costs are significantly lower than those of the other methods compared in the paper.

Notes

  1. Partial correlation may be useful if we want to investigate the dependency between two variables after removing the effect of all others.

  2. Quadratic programming has been used for feature selection before by Rodriguez-Lujan et al. (2010); a sketch of that formulation is given after this list.

  3. Note that, in contrast to a previous publication (Schmidt et al. 2010), the target variable is AD/non-AD, not cluster membership in image clusters.
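
For orientation, the quadratic programming feature selection (QPFS) formulation of Rodriguez-Lujan et al. (2010), which the approach in this paper appears to adapt from features to subgroups, trades the individual quality of candidates against their pairwise redundancy. A minimal sketch, with notation assumed rather than taken from the paper: let \(H_{ij}\) denote the redundancy (e.g., mutual information) between candidate subgroups \(i\) and \(j\), \(f_i\) the quality of subgroup \(i\), and \(\alpha \in [0,1]\) the trade-off parameter:

$$ \min_{x \in \mathbb{R}^n} \; (1-\alpha)\, x^T H x - \alpha\, f^T x \quad \text{subject to} \quad x \ge 0, \quad \sum\nolimits_{i=1}^{n} x_i = 1. $$

The sketch below solves such a program with an off-the-shelf local solver and keeps the k highest-weighted subgroups; all inputs are synthetic placeholders. Since the appendix shows that a mutual information matrix \(H\) need not be positive definite, the program is not necessarily convex, so a local solver may return a local optimum.

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic placeholders: H[i, j] = pairwise redundancy between candidate
# subgroups i and j (symmetric), f[i] = individual quality of subgroup i.
rng = np.random.default_rng(0)
n = 10
A = rng.random((n, n))
H = (A + A.T) / 2                  # symmetric redundancy matrix
f = rng.random(n)                  # subgroup qualities
alpha = 0.5                        # quality/redundancy trade-off

def objective(x):
    # (1 - alpha) * x^T H x  -  alpha * f^T x
    return (1 - alpha) * x @ H @ x - alpha * f @ x

res = minimize(objective, np.full(n, 1.0 / n), method="SLSQP",
               bounds=[(0.0, None)] * n,
               constraints=[{"type": "eq", "fun": lambda x: x.sum() - 1.0}])

# Redundant candidates split weight between them, so taking the k
# highest-weighted subgroups yields a less redundant selection.
k = 3
top_k = np.argsort(res.x)[::-1][:k]
print(top_k, res.x[top_k])
```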

References

  • Bache, K., & Lichman, M. (2013). UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences. http://archive.ics.uci.edu/ml.

  • Bay, S., & Pazzani, M. (2001). Detecting group differences: mining contrast sets. Data Mining and Knowledge Discovery, 5, 213–246.

  • Bouckaert, R., & Frank, E. (2004). Evaluating the replicability of significance tests for comparing learning algorithms. In The 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) (pp. 3–12). Springer.

  • Bringmann, B., & Zimmermann, A. (2009). One in a million: picking the right patterns. Knowledge and Information Systems, 18, 61–81.

  • Cohen, W., & Singer, Y. (1999). A simple, fast, and effective rule learner. In Proceedings of the sixteenth national conference on artificial intelligence (pp. 335–342). AAAI Press.

  • Dong, G., & Li, J. (1999). Efficient mining of emerging patterns: discovering trends and differences. In Proceedings of the fifth ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’99.

  • Drzezga, A. (2009). Diagnosis of Alzheimer’s disease with [18F] PET in mild and asymptomatic stages. Behavioural Neurology, 21, 101–115.

  • Friedman, J., & Popescu, B. (2008). Predictive learning via rule ensembles. The Annals of Applied Statistics, 2, 916–954.

  • Grosskreutz, H., & Paurat, D. (2011). Fast and memory-efficient discovery of the top-k relevant subgroups in a reduced candidate space. In Proceedings of the 21st European conference on machine learning and principles and practice of knowledge discovery in databases (pp. 533–548). Springer-Verlag.

  • Grosskreutz, H., Rüping, S., Wrobel, S. (2008). Tight optimistic estimates for fast subgroup discovery. In Proceedings of the 18th European conference on machine learning and principles and practice of knowledge discovery in databases (pp. 440–456). Springer-Verlag.

  • Herrera, F., Carmona, C., González, P., Jesus, M. (2011). An overview on subgroup discovery: foundations and applications. Knowledge and Information Systems, 29, 495–525.

  • Kaufman, L., & Rousseeuw, P. (1990). Finding groups in data: an introduction to cluster analysis. New York: Wiley.

  • Klösgen, W. (1996). Explora: A multipattern and multistrategy discovery assistant. In Advances in knowledge discovery and data mining.

  • Klösgen, W., & May, M. (2002). Census data mining-an application. In Mining official data (pp. 65–79).

  • Lavrac, N., Kavsek, B., Flach, P., Todorovski, L., Wrobel, S. (2004). Subgroup discovery with CN2-SD. Journal of Machine Learning Research, 5, 153–188.

  • Atzmueller, M., & Puppe, F. (2006). SD-Map – a fast algorithm for exhaustive subgroup discovery. In Proceedings of the 10th European conference on principles and practice of knowledge discovery in databases (pp. 6–17).

  • Morishita, S., & Sese, J. (2000). Traversing itemset lattices with statistical metric pruning. In Proceedings of the 19th ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems (pp. 226–236). ACM.

  • Novak, P., Lavrac, N., Webb, G. (2009). Supervised descriptive rule discovery: a unifying survey of contrast set, emerging pattern and subgroup mining. Journal of Machine Learning Research, 10, 377–403.

  • Rodriguez-Lujan, I., Huerta, R., Elkan, C., Santa Cruz, C. (2010). Quadratic programming feature selection. Journal of Machine Learning Research, 11, 1491–1516.

  • Rückert, U., & Kramer, S. (2008). Margin-based first-order rule learning. Machine Learning, 70, 189–206.

  • Rüping, S. (2009). Ranking interesting subgroups. In Proceedings of the 26th annual International Conference on Machine Learning, ICML ’09 (pp. 913–920). New York: ACM.

  • Schmidt, J., Hapfelmeier, A., Müller, M., Perneczky, R., Drzezga, A., Kurz, A., Kramer, S. (2010). Interpreting PET scans by structured patient data: a data mining case study in dementia research. Journal of Knowledge and Information Systems (KAIS), 24, 149–170.

  • van Leeuwen, M., & Knobbe, A. (2011). Non-redundant subgroup discovery in large and complex data. In Proceedings of the 21st European conference on machine learning and principles and practice of knowledge discovery in databases (pp. 459–474). Springer-Verlag.

  • Vreeken, J., van Leeuwen, M., Siebes, A. (2011). Krimp: mining itemsets that compress. Data Mining and Knowledge Discovery, 23, 169–214.

  • Wrobel, S. (1997). An algorithm for multi-relational discovery of subgroups. In Proceedings of the first European symposium on principles of data mining and knowledge discovery.

  • Xin, D., Cheng, H., Yan, X., Han, J. (2006). Extracting redundancy-aware top-k patterns. In Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 444–453).

  • Zimmermann, A., & De Raedt, L. (2004). CorClass: Correlated association rule mining for classification. In Proceedings of discovery science (pp. 60–72). Springer.

Acknowledgement

The first author acknowledges the support of the TUM Graduate School of Information Science in Health (GSISH), Technische Universität München.

Author information

Correspondence to Stefan Kramer.

Appendix: mutual information matrix is not positive definite

Proposition 2

The mutual information matrix is not, in general, positive definite.

Proof

Linear algebra states that a symmetric matrix \(H\) is positive definite if \(x^T H x > 0\) for all \(x \ne 0\). The mutual information matrix \(H \in \mathbb{R}^{n \times n}\) is symmetric, with entries denoted \(H_{ij}\). Let \(x \in \mathbb{R}^{n \times 1}\) be an \(n \times 1\) column vector. Then \(x^T H x\) can be written in matrix form as:

$$ x^T H x = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix} \begin{bmatrix} H_{11} & H_{12} & \cdots & H_{1n}\\ H_{21} & H_{22} & \cdots & H_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ H_{n1} & H_{n2} & \cdots & H_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \tag{15} $$

$$ = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix} \begin{bmatrix} \sum_{i=1}^{n} H_{1i}x_i \\ \sum_{i=1}^{n} H_{2i}x_i \\ \vdots \\ \sum_{i=1}^{n} H_{ni}x_i \end{bmatrix} \tag{16} $$

$$ \begin{aligned} = {} & H_{11}x_1^2 + H_{12}x_1x_2 + \ldots + H_{1n}x_1x_n \\ & + H_{21}x_2x_1 + H_{22}x_2^2 + \ldots + H_{2n}x_2x_n \\ & + \ldots \\ & + H_{n1}x_nx_1 + H_{n2}x_nx_2 + \ldots + H_{nn}x_n^2 \end{aligned} \tag{17} $$

$$ = \sum_{i,j=1}^{n} H_{ij}\, x_i x_j. \tag{18} $$

If \(n = 4\), then \(x^T H x\) expands to \(H_{11}x_1^2 + H_{22}x_2^2 + H_{33}x_3^2 + H_{44}x_4^2 + 2H_{12}x_1x_2 + 2H_{13}x_1x_3 + 2H_{14}x_1x_4 + 2H_{23}x_2x_3 + 2H_{24}x_2x_4 + 2H_{34}x_3x_4\), since \(H_{ij} = H_{ji}\) for a symmetric matrix. Exhibiting a single counterexample completes the proof. For example, if a data set is \(\begin{bmatrix} 3 & 4 & 2 & 5\\ 1 & 4 & 4 & 4\\ 5 & 4 & 4 & 3\\ 1 & 3 & 2 & 2 \end{bmatrix}\), the corresponding mutual information matrix is \(H = \begin{bmatrix} 1.5 & 0.3 & 0.5 & 1.5\\ 0.3 & 0.8 & 0.3 & 0.8\\ 0.5 & 0.3 & 1.0 & 1.0\\ 1.5 & 0.8 & 1.0 & 2.0 \end{bmatrix}\), and the column vector is \(x = \begin{bmatrix} 7.2 & 5.4 & 4.9 & -9.9 \end{bmatrix}^T\), then \(x^T H x = -0.19 < 0\), which violates the necessary condition for a matrix to be positive definite. Therefore the mutual information matrix is not positive definite in general. □
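
A quick numerical check of this counterexample (a minimal NumPy sketch; note that the entries of \(H\) above are printed rounded, so the quadratic form evaluates to roughly -0.79 here instead of the -0.19 reported in the text, presumably computed from unrounded values, but its sign is negative either way, which is all the proof requires):

```python
import numpy as np

# Mutual information matrix and vector from the counterexample above.
H = np.array([[1.5, 0.3, 0.5, 1.5],
              [0.3, 0.8, 0.3, 0.8],
              [0.5, 0.3, 1.0, 1.0],
              [1.5, 0.8, 1.0, 2.0]])
x = np.array([7.2, 5.4, 4.9, -9.9])

q = x @ H @ x                     # the quadratic form x^T H x
print(q)                          # approx. -0.79 with these rounded entries
assert q < 0                      # H therefore cannot be positive definite

# Equivalent spectral check: a symmetric matrix is positive definite
# iff all of its eigenvalues are positive.
print(np.linalg.eigvalsh(H))      # the smallest eigenvalue is negative
```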

About this article

Cite this article

Li, R., Perneczky, R., Drzezga, A. et al. Efficient redundancy reduced subgroup discovery via quadratic programming. J Intell Inf Syst 44, 271–288 (2015). https://doi.org/10.1007/s10844-013-0284-1
