Exceptionally monotone models—the rank correlation model class for Exceptional Model Mining

Downar, Lennart; Duivesteijn, Wouter

doi:10.1007/s10115-016-0979-z

Regular Paper
Published: 18 August 2016

Volume 51, pages 369–394, (2017)

Knowledge and Information Systems

Abstract

Exceptional Model Mining strives to find coherent subgroups of the dataset where multiple target attributes interact in an unusual way. One instance of such an investigated form of interaction is Pearson’s correlation coefficient between two targets. EMM then finds subgroups with an exceptionally linear relation between the targets. In this paper, we enrich the EMM toolbox by developing the more general rank correlation model class. We find subgroups with an exceptionally monotone relation between the targets. Apart from catering for this richer set of relations, the rank correlation model class does not necessarily require the assumption of target normality, which is implicitly invoked in the Pearson’s correlation model class. Furthermore, it is less sensitive to outliers. We provide pseudocode for the employed algorithm and analyze its computational complexity, and experimentally illustrate what the rank correlation model class for EMM can find for you on six datasets from an eclectic variety of domains.
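
To make the contrast between a linear and a monotone relation concrete, the following is a minimal sketch in plain Python. It is not the algorithm or the quality measure from the paper, neither of which is reproduced in this excerpt. The function names (`average_ranks`, `pearson`, `spearman`, `rank_correlation_gap`) and the choice of scoring a subgroup by the absolute difference between its Spearman coefficient and that of its complement are assumptions made for illustration only.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson's correlation coefficient (undefined for constant input)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson's coefficient on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rank_correlation_gap(x, y, in_subgroup):
    """Hypothetical quality score: |rho(subgroup) - rho(complement)|."""
    sub = [(a, b) for a, b, s in zip(x, y, in_subgroup) if s]
    com = [(a, b) for a, b, s in zip(x, y, in_subgroup) if not s]
    return abs(spearman(*zip(*sub)) - spearman(*zip(*com)))


# A monotone but non-linear relation: y = x^3.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]
print(pearson(x, y), spearman(x, y))

# Two targets that increase together inside the subgroup and move in
# opposite directions outside it.
t1 = [1, 2, 3, 4, 1, 2, 3, 4]
t2 = [1, 8, 27, 64, 4, 3, 2, 1]
member = [True] * 4 + [False] * 4
print(rank_correlation_gap(t1, t2, member))
```

On the cubic example the rank-based coefficient reports a perfect monotone relation while Pearson's coefficient stays below one, which is the kind of relation the rank correlation model class is meant to capture. The gap score is only one possible way to compare a subgroup against its complement.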

Notes

Whether this model class falls under the spirit of EMM is debatable; having only a single target prohibits investigating target interaction. Careful reading of EMM literature [4, 5] reveals that the framework (accidentally) allows model classes where \(m=1\). Hence, we cannot formally say that this model class does not fall under the letter of EMM. Since the authors of [12] introduced this model class as an EMM instance, and we cannot formally reject it as such, we adopt it into the EMM canon.

References

1. Downar L, Duivesteijn W (2015) Exceptionally monotone models—the rank correlation model class for exceptional model mining. ICDM, to appear, Proc
2. Downar L (2014) A rank correlation model class for exceptional model mining. Bachelor’s thesis, TU Dortmund
3. Duivesteijn W (2013) Exceptional model mining. PhD thesis, Leiden University
4. Duivesteijn W, Feelders AJ, Knobbe A (2016) Exceptional model mining—supervised descriptive local pattern mining with complex target concepts. Data Min Knowl Disc 30:47–98
5. Leman D, Feelders A, Knobbe AJ (2008) Exceptional model mining. In: Proceedings of ECML/PKDD, vol 2, pp 1–16
6. Kendall MG (1938) A new measure of rank correlation. Biometrika 30(1):81–93
7. Spearman C (1904) The proof and measurement of association between two things. Am J Psychol 15(1):72–101
8. Balasubramaniyan R, Hüllermeier E, Weskamp N, Kämper J (2005) Clustering of gene expression data using a local shape-based similarity measure. Bioinformatics 21(7):1069–1077
9. Yilmaz E, Aslam JA, Robertson S (2008) A new rank correlation coefficient for information retrieval. In: Proceedings of SIGIR, pp 587–594
10. Breese JS, Heckerman D, Kadie CM (1998) Empirical analysis of predictive algorithms for collaborative filtering. In: Proceedings of UAI, pp 43–52
11. Li WK, Lee SY (1980) Application of rank correlation to lanthanide induced shift data. Organ Magn Reson 13(2):97–99
12. Lemmerich F, Becker M, Atzmüller M (2012) Generic pattern trees for exhaustive exceptional model mining. In: Proceedings of ECML-PKDD, vol 2, pp 277–292
13. Adam-Bourdarios C, Cowan G, Cécile Germain IG, Kégl B, Rousseau D (2014) Learning to discover: the Higgs boson machine learning challenge. http://higgsml.lal.in2p3.fr/documentation/. Accessed 7 Aug
14. Hand D, Adams N, Bolton R (eds) (2002) Pattern detection and discovery. Springer, New York
15. Morik K, Boulicaut JF, Siebes A (eds) (2005) Local pattern detection. Springer, New York
16. Mannila H, Toivonen H (1997) Levelwise search and borders of theories in knowledge discovery. Data Min Knowl Disc 1(3):241–258
17. Agrawal R, Mannila H, Srikant R, Toivonen H, Verkamo AI (1996) Fast discovery of association rules. Advances in Knowledge Discovery and Data Mining, pp 307–328
18. Herrera F, Carmona CJ, González P, Del Jesus MJ (2011) An overview on subgroup discovery: foundations and applications. Knowl Inf Syst 29(3):495–525
19. Moens S, Boley M (2014) Instant exceptional model mining using weighted controlled pattern sampling. In: Proceedings of IDA, pp 203–214
20. Duivesteijn W, Knobbe A, Feelders A, Van Leeuwen M (2010) Subgroup discovery meets Bayesian networks—an exceptional model mining approach. In: Proceedings of ICDM, pp 158–167
21. Duivesteijn W, Feelders A, Knobbe A (2012) Different slopes for different folks—mining for exceptional regression models with Cook’s distance. In: Proceedings of KDD, pp 868–876
22. Duivesteijn W, Thaele J (2014) Understanding where your classifier does (not) work—the SCaPE model class for EMM. In: Proceedings of ICDM, pp 809–814
23. Kowalski CJ (1972) On the effects of non-normality on the distribution of the sample product-moment correlation coefficient. J R Stat Soc Ser C (Appl Stat) 21(1):1–12
24. Anscombe FJ (1973) Graphs in statistical analysis. Am Stat 27(1):17–21
25. Bay SD, Pazzani MJ (2001) Detecting group differences: mining contrast sets. Data Min Knowl Disc 5(3):213–246
26. Dong G, Li J (1999) Efficient mining of emerging patterns: discovering trends and differences. In: Proceedings of KDD, pp 43–52
27. Kralj Novak P, Lavrač N, Webb GI (2009) Supervised descriptive rule discovery: a unifying survey of contrast set, emerging pattern and subgroup mining. J Mach Learn Res 10:377–403
28. Jorge AM, Azevedo PJ, Pereira F (2006) Distribution rules with numeric attributes of interest. In: Proceedings of PKDD, pp 247–258
29. Umek L, Zupan B (2011) Subgroup discovery in data sets with multi-dimensional responses. Intell Data Anal 15(4):533–549
30. Galbrun E, Miettinen P (2012) From black and white to full color: extending redescription mining outside the Boolean world. Stat Anal Data Min 5(4):284–303
31. Fisher DH, Langley PW (1986) Conceptual clustering and its relation to numerical taxonomy. In: Gale WA (ed) Artificial intelligence and statistics, reading. Addison-Wesley, Boston, pp 77–116
32. Tsoumakas G, Katakis I (2007) Multi-label classification: an overview. Int J Data Wareh Min 3(3):1–13
33. Duivesteijn W, Loza Mencía E, Fürnkranz J, Knobbe A (2012) Multi-label LeGo—enhancing multi-label classifiers with local patterns. Technical report TUD-KE-2012-02, TU Darmstadt
34. Clark M (2013) A comparison of correlation measures. Technical report, University of Notre Dame
35. Hoeffding W (1948) A non-parametric test of independence. Ann Math Stat 19(4):546–557
36. Blum JR, Kiefer J, Rosenblatt M (1961) Distribution free tests of independence based on the sample distribution function. Ann Math Stat 32(2):485–498
37. Hollander M, Wolfe D (1999) Nonparametric statistical methods. Series in probability and statistics, 2nd edn. Wiley, Hoboken
38. Székely GJ, Rizzo ML, Bakirov NK (2007) Measuring and testing dependence by correlation of distances. Ann Stat 35(6):2769–2794
39. Reshef DN, Reshef YA, Finucane HK, Grossman SR, McVean G, Turnbaugh PJ, Lander ES, Mitzenmacher M, Sabeti PC (2011) Detecting novel associations in large data sets. Science 334:1518–1524
40. Kinney JB, Atwal GS (2014) Equitability, mutual information, and the maximal information coefficient. Proc Natl Acad Sci USA 111(9):3354–3359
41. Gretton A, Bousquet O, Smola A, Schölkopf B (2005) Measuring statistical dependence with Hilbert–Schmidt norms. In: Proceedings of ALT, pp 63–77
42. Lopez-Paz D, Hennig P, Schölkopf B (2013) The randomized dependence coefficient. Advances in Neural Information Processing Systems, pp 1–9
43. Gebelein H (1941) Das statistische problem der Korrelation als Variations- und Eigenwertproblem und sein Zusammenhang mit der Ausgleichsrechnung. Z Angew Math Mech 21:364–379
44. Conover WJ (1971) Practical nonparametric statistics. Wiley, Hoboken
45. Fisher RAS (1970) Statistical methods for research workers, 14th edn. Oliver and Boyd, London
46. Fieller EC, Hartley HO, Pearson ES (1957) Tests for rank correlation coefficients. I. Biometrika 44(4):470–481
47. Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Proceedings of SIGMOD, pp 1–12
48. Mierswa I, Wurst M, Klinkenberg R, Scholz M, Euler T (2006) YALE: rapid prototyping for complex data mining tasks. In: Proceedings of KDD, pp 935–940
49. Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml. University of California, School of Information and Computer Science, Irvine, CA
50. Anglin PM, Gençay R (1996) Semiparametric estimation of a hedonic price function. J Appl Econom 11(6):633–648
51. Rousseauw J, du Plessis J, Benade A, Jordaan P, Kotze J, Jooste P, Ferreira J (1983) Coronary risk factor screening in three rural communities. S Afr Med J 64:430–436
52. Hastie T, Tibshirani R, Friedman J (2010) The elements of statistical learning. Springer, Stanford
53. Lim TS, Loh WY, Shih YS (2000) A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Mach Learn 40:203–228
54. Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugen 7(2):179–188
55. Keller F, Müller E, Böhm K (2012) HiCS: high contrast subspaces for density-based outlier ranking. In: Proceedings of ICDE, pp 1037–1048
56. Nguyen HV, Müller E, Böhm K (2013) 4S: scalable subspace search scheme overcoming traditional apriori processing. In: Proceedings of BigData, pp 359–367
57. Nguyen HV, Müller E, Vreeken J, Efros P, Böhm K (2014) Multivariate maximal correlation analysis. In: Proceedings of ICML, pp 775–783

Acknowledgments

We would like to thank Dr. Johannes Albrecht (Emmy Noether group leader at the TU Dortmund, department of experimental physics, with research focus on the CERN LHCb Experiment) for fruitful discussion and helpful comments. This research is supported in part by the Deutsche Forschungsgemeinschaft (DFG) within the Collaborative Research Center SFB 876 “Providing Information by Resource-Constrained Analysis,” Project A1. This work was supported by the European Union through the ERC Consolidator Grant FORSIED (Project Reference 615517).

Author information

Authors and Affiliations

Fakultät für Informatik, LS VIII, Technische Universität Dortmund, Dortmund, Germany
Lennart Downar
Data Science Lab and iMinds, Universiteit Gent, Gent, Belgium
Wouter Duivesteijn

Corresponding author

Correspondence to Wouter Duivesteijn.

About this article

Cite this article

Downar, L., Duivesteijn, W. Exceptionally monotone models—the rank correlation model class for Exceptional Model Mining. Knowl Inf Syst 51, 369–394 (2017). https://doi.org/10.1007/s10115-016-0979-z

Received: 13 December 2015
Revised: 27 June 2016
Accepted: 28 July 2016
Published: 18 August 2016
Issue Date: May 2017
DOI: https://doi.org/10.1007/s10115-016-0979-z
