Rank correlated subgroup discovery

  • Mohamed Ali Hammal
  • Hélène Mathian
  • Luc Merchez
  • Marc Plantevit
  • Céline Robardet
Article

Abstract

Subgroup discovery (SD) and exceptional model mining (EMM), its generalization to handle more complex targets, are two mature fields at the frontier of data mining and machine learning. More precisely, EMM aims to find coherent subgroups of a dataset where multiple targets interact in an unusual way. Correlation model classes have already been defined to discover interesting subgroups when dealing with two numerical targets. However, in this supervised setting, the two numerical targets are fixed before the subgroup search. To make unsupervised exploration possible, we propose to search for arbitrary subsets of numerical targets whose correlation is exceptional for an automatically found subgroup. This involves solving two challenges: the definition of a model that evaluates the interest of a subgroup for a subset of numerical targets, and the definition of a pattern language that enumerates both subgroups and targets and lends itself to effective search strategies. We propose an integrated solution to both challenges. We introduce the problem of rank-correlated subgroup discovery with an arbitrary subset of numerical targets. A rank-correlated subgroup is identified by both conditions on descriptive attributes, whether numeric or nominal, and a pattern on numeric attributes that captures (positive or negative) rank correlations based on a generalization of Kendall's τ. We define a new branch-and-bound algorithm that exploits pruning properties based on two upper bounds and a closure property. An empirical study on several datasets demonstrates the efficiency and the effectiveness of the algorithm.
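
To make the notion of a rank-correlated subgroup more concrete, here is a minimal Python sketch. It assumes one common way of generalizing Kendall's τ to several attributes, namely the fraction of row pairs that are strictly ordered in the same direction on every attribute of the pattern. This is an illustrative assumption and not necessarily the exact measure defined in the article. The function name `concordant_fraction`, the toy rows, and the attribute names `zone`, `x`, and `y` are all hypothetical.

```python
from itertools import combinations


def concordant_fraction(rows, pattern):
    """Fraction of row pairs ordered consistently with `pattern`.

    rows: list of dicts mapping attribute name -> value.
    pattern: dict mapping numeric attribute name -> +1 (increasing)
             or -1 (decreasing). Hypothetical encoding for this sketch.
    """
    pairs = list(combinations(rows, 2))
    if not pairs:
        return 0.0
    hits = 0
    for a, b in pairs:
        # Signed differences, flipped for attributes expected to decrease.
        diffs = [sign * (b[attr] - a[attr]) for attr, sign in pattern.items()]
        # A pair counts only if every signed difference has the same strict sign.
        if all(d > 0 for d in diffs) or all(d < 0 for d in diffs):
            hits += 1
    return hits / len(pairs)


# Toy data (invented for illustration): one nominal descriptor, two numeric targets.
rows = [
    {"zone": "urban", "x": 1, "y": 10},
    {"zone": "urban", "x": 2, "y": 8},
    {"zone": "urban", "x": 3, "y": 5},
    {"zone": "rural", "x": 1, "y": 1},
    {"zone": "rural", "x": 2, "y": 7},
    {"zone": "rural", "x": 3, "y": 3},
]

# Pattern: x increases while y decreases (a negative rank correlation).
pattern = {"x": +1, "y": -1}

# Subgroup described by a condition on the nominal attribute.
urban = [r for r in rows if r["zone"] == "urban"]

whole_score = concordant_fraction(rows, pattern)
urban_score = concordant_fraction(urban, pattern)
print(whole_score, urban_score)
```

In this toy example the pattern holds for every pair of rows in the `urban` subgroup but only for some pairs in the whole dataset, which is the kind of contrast a rank-correlated subgroup is meant to capture. The pairwise loop is quadratic in the number of rows; a practical search would rely on bounds and pruning of the kind the abstract mentions.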

Keywords

Subgroup discovery · Exceptional model mining · Gradual patterns

Notes

Acknowledgments

This work was supported by the Labex IMU Université de Lyon (project RESALI). It was also partially supported by the CNRS project APRC Conf Pap Grant APQ-04224-16 (Multilateral Cooperation FAPEMIG-CNRS).

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Université de Lyon, INSA Lyon, LIRIS CNRS UMR 5205, Lyon, France
  2. École Normale Supérieure de Lyon, UMR 5600 Environnement Ville Société, Lyon, France
  3. Université de Lyon, Université Lyon 1, LIRIS CNRS UMR 5205, Lyon, France