Data Mining and Knowledge Discovery, Volume 30, Issue 1, pp 47–98

Exceptional Model Mining

Supervised descriptive local pattern mining with complex target concepts
  • Wouter Duivesteijn
  • Ad J. Feelders
  • Arno Knobbe


Finding subsets of a dataset that somehow deviate from the norm, i.e. where something interesting is going on, is a classical Data Mining task. In traditional local pattern mining methods, such deviations are measured in terms of a relatively high occurrence (frequent itemset mining), or an unusual distribution for one designated target attribute (common use of subgroup discovery). These, however, do not encompass all forms of “interesting”. To capture a more general notion of interestingness in subsets of a dataset, we develop Exceptional Model Mining (EMM). This is a supervised local pattern mining framework, where several target attributes are selected, and a model over these targets is chosen to be the target concept. Then, we strive to find subgroups: subsets of the dataset that can be described by a few conditions on single attributes. Such subgroups are deemed interesting when the model over the targets on the subgroup is substantially different from the model on the whole dataset. For instance, we can find subgroups where two target attributes have an unusual correlation, a classifier has a deviating predictive performance, or a Bayesian network fitted on several target attributes has an exceptional structure. We give an algorithmic solution for the EMM framework, and analyze its computational complexity. We also discuss some illustrative applications of EMM instances, including using the Bayesian network model to identify meteorological conditions under which food chains are displaced, and using a regression model to find the subset of households in the Chinese province of Hunan that do not follow the general economic law of demand.
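To make the core idea concrete, the following is a minimal Python sketch of one EMM instance with a correlation model over two targets. It is an illustration under stated assumptions, not the algorithm given in the paper: the function names (`pearson`, `rank_subgroups`), the restriction to single-condition descriptions of the form `attribute == value`, the quality measure (absolute difference between the subgroup correlation and the whole-dataset correlation), and the toy price and demand records are all hypothetical choices made for this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(xs)
    if n < 2:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def rank_subgroups(rows, descriptors, target_x, target_y, min_size=3):
    """Score every single-condition subgroup 'attribute == value'.

    Quality (an assumption of this sketch): absolute difference between the
    target correlation inside the subgroup and on the whole dataset.
    """
    overall = pearson([r[target_x] for r in rows], [r[target_y] for r in rows])
    results = []
    for attr in descriptors:
        for value in sorted({r[attr] for r in rows}):
            cover = [r for r in rows if r[attr] == value]
            if len(cover) < min_size:
                continue  # skip subgroups too small to fit a model on
            local = pearson([r[target_x] for r in cover],
                            [r[target_y] for r in cover])
            results.append((abs(local - overall), f"{attr} == {value!r}", len(cover)))
    # Most exceptional subgroup first.
    return sorted(results, reverse=True)


# Made-up toy records: demand falls with price in the north, rises in the south.
rows = (
    [{"region": "north", "price": p, "demand": 10 - p} for p in range(1, 6)]
    + [{"region": "south", "price": p, "demand": 2 + p} for p in range(1, 4)]
)

ranking = rank_subgroups(rows, ["region"], "price", "demand")
best_quality, best_description, best_size = ranking[0]
print(best_description, best_size, round(best_quality, 3))
```

On this toy data the overall correlation between price and demand is weakly negative, so the subgroup in which demand rises with price ranks first. A fuller treatment would search over conjunctions of conditions and could swap in other model classes, such as a regression model or a Bayesian network, as the abstract describes.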


Exceptional Model Mining, Subgroup Discovery, Supervised Local Pattern Mining, Regression, Bayesian Networks

Mathematics Subject Classification

H.2.8: Data mining 



This research is supported in part by the Deutsche Forschungsgemeinschaft (DFG) within the Collaborative Research Center SFB 876 “Providing Information by Resource-Constrained Analysis”, project C1, and in part by the Netherlands Organisation for Scientific Research (NWO) under project number 612.065.822 (Exceptional Model Mining).



Copyright information

© The Author(s) 2015

Authors and Affiliations

  • Wouter Duivesteijn (1)
  • Ad J. Feelders (2)
  • Arno Knobbe (3)
  1. Fakultät für Informatik, LS VIII, Technische Universität Dortmund, Dortmund, Germany
  2. ICS, Utrecht University, Utrecht, the Netherlands
  3. LIACS, Leiden University, Leiden, the Netherlands
