Subgroup Discovery with Proper Scoring Rules

  • Hao Song
  • Meelis Kull
  • Peter Flach
  • Georgios Kalogridis
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9852)


Subgroup Discovery is the process of finding and describing sufficiently large subsets of a given population that have unusual distributional characteristics with regard to some target attribute. Such subgroups can be used as a statistical summary which improves on the default summary of stating the overall distribution in the population. A natural way to evaluate such summaries is to quantify the difference between the predicted and empirical distributions of the target. In this paper we propose to use proper scoring rules, a well-known family of evaluation measures for assessing the goodness of probability estimators, to obtain theoretically well-founded evaluation measures for subgroup discovery. From this perspective, one subgroup is better than another if, on average, its target probability estimates diverge less from the actual labels. We demonstrate empirically on both synthetic and real-world data that this leads to higher-quality statistical summaries than existing methods based on measures such as Weighted Relative Accuracy.
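The abstract's central idea is to judge a subgroup by how much better its own target estimate scores under a proper scoring rule than the population default does. The short sketch below illustrates one plausible instantiation of that idea under stated assumptions, and it is not necessarily the exact measure defined in the paper:

- the target is binary;
- only the subgroup's members are scored;
- the reduction in average loss is weighted by coverage.

All function names (`brier`, `log_loss`, `mean_score`, `scoring_rule_quality`, `wracc`) are hypothetical. Weighted Relative Accuracy is included for comparison.

```python
import math
from typing import Callable, Sequence

# A scoring rule here maps (predicted P(y = 1), true label in {0, 1}) to a loss;
# lower is better. Binary targets are an assumption of this sketch.
ScoringRule = Callable[[float, int], float]


def brier(p: float, y: int) -> float:
    """Brier score on the positive class: (p - y)^2."""
    return (p - y) ** 2


def log_loss(p: float, y: int) -> float:
    """Logarithmic score: minus the log of the probability given to the true label."""
    q = p if y == 1 else 1.0 - p
    return -math.log(q) if q > 0 else math.inf


def mean_score(p: float, labels: Sequence[int], rule: ScoringRule) -> float:
    """Average loss of predicting the same probability p for every label."""
    return sum(rule(p, y) for y in labels) / len(labels)


def scoring_rule_quality(labels: Sequence[int], in_subgroup: Sequence[bool],
                         rule: ScoringRule) -> float:
    """Hypothetical quality measure: coverage-weighted drop in average loss on the
    subgroup's members when the subgroup's own positive rate replaces the
    population (default) positive rate."""
    n = len(labels)
    members = [y for y, m in zip(labels, in_subgroup) if m]
    if not members:
        return 0.0
    p_default = sum(labels) / n
    p_subgroup = sum(members) / len(members)
    gain = mean_score(p_default, members, rule) - mean_score(p_subgroup, members, rule)
    return len(members) / n * gain


def wracc(labels: Sequence[int], in_subgroup: Sequence[bool]) -> float:
    """Weighted Relative Accuracy: coverage * (subgroup positive rate - default rate)."""
    n = len(labels)
    members = [y for y, m in zip(labels, in_subgroup) if m]
    if not members:
        return 0.0
    return len(members) / n * (sum(members) / len(members) - sum(labels) / n)


# Toy data: 10 examples, 4 positives; the subgroup covers the first 4 (3 positives).
labels = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
in_subgroup = [True] * 4 + [False] * 6
print(wracc(labels, in_subgroup))                        # about 0.4 * (0.75 - 0.4) = 0.14
print(scoring_rule_quality(labels, in_subgroup, brier))  # about 0.4 * 0.35**2 = 0.049
```

With the Brier score, this quantity simplifies to coverage * (p_S - p_0)^2. With the log loss, it simplifies to coverage * KL(p_S || p_0), the binary KL divergence. Here p_S is the subgroup's positive rate and p_0 the population rate. Both versions are non-negative, because the empirical rate minimises a proper scoring rule's average loss on the members. Unlike WRAcc, they therefore also reward subgroups whose positive rate is unusually low.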





This work was supported by the SPHERE Interdisciplinary Research Collaboration, funded by the UK Engineering and Physical Sciences Research Council under grant EP/K031910/1; and the REFRAME project granted by the European Coordinated Research on Long-Term Challenges in Information and Communication Sciences & Technologies ERA-Net (CHIST-ERA), and funded by the Engineering and Physical Sciences Research Council in the UK under grant EP/K018728/1. Hao Song would like to thank Toshiba Research Europe Ltd, Telecommunications Research Laboratory, for funding his doctoral research within SPHERE.



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Hao Song (1)
  • Meelis Kull (1)
  • Peter Flach (1)
  • Georgios Kalogridis (2)

  1. Intelligent Systems Laboratory, University of Bristol, Bristol, UK
  2. Toshiba Research Europe Ltd., Telecommunications Research Laboratory, Bristol, UK
