Explaining Deviating Subsets Through Explanation Networks

  • Antti Ukkonen
  • Vladimir Dzyuba
  • Matthijs van Leeuwen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10535)

Abstract

We propose a novel approach to finding explanations of deviating subsets, often called subgroups. Existing approaches for subgroup discovery rely on various quality measures, yet they often fail to find subgroup sets that are diverse, of high quality, and, most importantly, good explanations of the deviations that occur in the data.

To tackle this issue we introduce explanation networks, which provide a holistic view on all candidate subgroups and how they relate to each other, offering elegant ways to select high-quality yet diverse subgroup sets. Explanation networks are constructed by representing subgroups by nodes and having weighted edges represent the extent to which one subgroup explains another. Explanatory strength is defined by extending ideas from database causality, in which interventions are used to quantify the effect of one query on another.
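The construction described above can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so everything here is an assumption made for illustration: subgroups are predicates over rows, the target is a numeric field `y`, the deviation of a subgroup is the absolute difference between its mean target and the mean over all rows, and the intervention for subgroup A deletes the rows A covers. The edge weight from A to B is then the relative drop in B's deviation after that deletion. The names `deviation` and `explanation_network` are hypothetical.

```python
from statistics import mean


def deviation(rows, covers):
    """Assumed quality measure: |mean target in subgroup - mean target overall|."""
    targets = [r["y"] for r in rows if covers(r)]
    if not rows or not targets:
        return 0.0  # nothing left to deviate
    return abs(mean(targets) - mean(r["y"] for r in rows))


def explanation_network(rows, subgroups):
    """Return {(a, b): weight}, where weight is how much removing a's rows
    reduces b's deviation (relative reduction, clipped at zero)."""
    edges = {}
    for a, covers_a in subgroups.items():
        # Intervention (an assumption): drop every row covered by subgroup a.
        remaining = [r for r in rows if not covers_a(r)]
        for b, covers_b in subgroups.items():
            if a == b:
                continue
            before = deviation(rows, covers_b)
            if before == 0:
                continue  # b does not deviate, so there is nothing to explain
            after = deviation(remaining, covers_b)
            weight = max(0.0, (before - after) / before)
            if weight > 0:
                edges[(a, b)] = weight
    return edges


# Toy data: the target is high exactly when x1 == 1; x2 merely correlates with x1.
rows = (
    [{"x1": 1, "x2": 1, "y": 10}] * 3
    + [{"x1": 1, "x2": 0, "y": 10}]
    + [{"x1": 0, "x2": 1, "y": 0}]
    + [{"x1": 0, "x2": 0, "y": 0}] * 3
)
subgroups = {
    "x1=1": lambda r: r["x1"] == 1,
    "x2=1": lambda r: r["x2"] == 1,
}
edges = explanation_network(rows, subgroups)
```

In this toy data, removing the rows with `x1 == 1` eliminates the deviation of `x2 = 1`, whereas removing the rows with `x2 == 1` does not reduce the deviation of `x1 = 1`, so the sketch yields a single directed edge from the generating subgroup to the correlated one.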

Given an explanation network, existing network analysis techniques can be used for subgroup discovery. In particular, we study the use of PageRank for pattern ranking and seed selection (from influence maximization) for pattern set selection. Experiments on synthetic and real data show that the proposed approach finds subgroup sets that are more likely to capture the generative processes of the data than other methods.
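As a rough sketch of these two steps, the code below ranks nodes with weighted PageRank computed by power iteration and then selects a seed set greedily. Several choices are assumptions rather than the paper's method: an edge `(a, b)` is taken to mean "a explains b", PageRank is run on the reversed edges so that strong explainers collect rank, and seed selection uses a deterministic one-hop simplification of the independent cascade model (weights in [0, 1] read as activation probabilities) instead of a full Monte Carlo simulation. All function names and the toy weights are hypothetical.

```python
def pagerank(nodes, edges, damping=0.85, iters=100):
    """Weighted PageRank by power iteration; rank flows along edge direction.
    Edge weights are assumed to be strictly positive."""
    n = len(nodes)
    out_weight = {u: 0.0 for u in nodes}
    for (u, _), w in edges.items():
        out_weight[u] += w
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[u] for u in nodes if out_weight[u] == 0.0)
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for (u, v), w in edges.items():
            new[v] += damping * rank[u] * w / out_weight[u]
        rank = new
    return rank


def coverage(nodes, edges, seeds):
    """Expected number of nodes explained by the seeds, one hop only."""
    total = 0.0
    for v in nodes:
        if v in seeds:
            total += 1.0
            continue
        miss = 1.0  # probability that no seed explains v
        for s in seeds:
            miss *= 1.0 - edges.get((s, v), 0.0)
        total += 1.0 - miss
    return total


def greedy_seeds(nodes, edges, k):
    """Greedy seed selection: repeatedly add the node with the best coverage."""
    seeds = []
    for _ in range(min(k, len(nodes))):
        best = max(
            (u for u in nodes if u not in seeds),
            key=lambda u: coverage(nodes, edges, seeds + [u]),
        )
        seeds.append(best)
    return seeds


# Toy explanation network; (a, b) means "a explains b" with the given strength.
nodes = ["A", "B", "C", "D"]
edges = {("A", "B"): 0.9, ("A", "C"): 0.8, ("D", "C"): 0.5}

# Reverse the edges so that being a good explainer attracts rank (an assumption).
reversed_edges = {(v, u): w for (u, v), w in edges.items()}
rank = pagerank(nodes, reversed_edges)
ranking = sorted(nodes, key=rank.get, reverse=True)
seeds = greedy_seeds(nodes, edges, k=2)
```

The greedy step here prefers a second seed that covers nodes the first seed leaves unexplained, which is how seed selection encourages diversity in the selected set; a ranking alone would not penalize redundant subgroups.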

Notes

Acknowledgements

Antti Ukkonen was partially supported by Tekes (project Re:Know2) and Academy of Finland (decision 288814).

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Antti Ukkonen (1)
  • Vladimir Dzyuba (2)
  • Matthijs van Leeuwen (3)

  1. Department of Computer Science, University of Helsinki, Helsinki, Finland
  2. Department of Computer Science, KU Leuven, Leuven, Belgium
  3. LIACS, Leiden University, Leiden, The Netherlands
