Machine Learning, Volume 106, Issue 8, pp 1171–1211

Exceptional contextual subgraph mining

  • Mehdi Kaytoue
  • Marc Plantevit
  • Albrecht Zimmermann
  • Anes Bendimerad
  • Céline Robardet

Abstract

Many relational datasets result from the aggregation of individual behaviors, each described by some characteristics. For instance, a bike-sharing system may be modeled as a graph where vertices stand for bike-share stations and edges represent bike trips made by users from one station to another. Stations and trips are described by additional information, such as the geographical environment of the stations (business vs. residential area, closeness to POI, elevation, urbanization density, etc.) or properties of the bike trips (timestamp, user profile, weather, events and other special conditions about the trip). Identifying highly connected components (such as communities or quasi-cliques) in this graph provides interesting insights into global usage but does not capture mobility profiles that characterize a subpopulation. To tackle this problem, we propose an approach rooted in exceptional model mining to find exceptional contextual subgraphs, i.e., subgraphs generated from a context (a description of the individual behaviors) that is exceptional, meaning that it behaves differently from the whole augmented graph. The dependency between a context and an edge is assessed by a \(\chi^2\) test, and the weighted relative accuracy measure is used to retain only contexts that strongly characterize connected subgraphs. We present an original algorithm that uses sophisticated pruning techniques to restrict the search space of vertices, context refinements, and edges to be considered. An experimental evaluation on synthetic data and two real-life datasets demonstrates the effectiveness of the proposed pruning mechanisms, as well as the relevance of the discovered patterns.
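To make the two measures named in the abstract concrete, here is a minimal Python sketch. It is not the authors' algorithm: it only computes the textbook Pearson \(\chi^2\) statistic on a 2x2 table (trips on a given edge vs. elsewhere, inside a context vs. outside it) and the standard weighted relative accuracy, WRAcc = P(cond) · (P(class | cond) − P(class)). The function names, the toy counts, and the 5% significance threshold are assumptions made for illustration; the abstract does not specify them.

```python
# Illustrative sketch only: standard definitions, hypothetical names and counts.

# Critical value of the chi-squared distribution with 1 degree of freedom
# at the 5% level (the significance level is an assumption for this example).
CHI2_CRITICAL_1DF_5PCT = 3.841


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]].

    a: trips in the context and on the edge
    b: trips in the context, not on the edge
    c: trips outside the context, on the edge
    d: trips outside the context, not on the edge
    Assumes every row and column total is non-zero.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


def wracc(n_cond_and_class, n_cond, n_class, n_total):
    """Weighted relative accuracy: coverage times gain in class probability."""
    coverage = n_cond / n_total
    return coverage * (n_cond_and_class / n_cond - n_class / n_total)


# Toy counts (hypothetical): 60 trips, 30 of them in the context,
# 30 of them on the edge of interest, 20 in both.
stat = chi2_2x2(20, 10, 10, 20)
edge_depends_on_context = stat > CHI2_CRITICAL_1DF_5PCT
score = wracc(n_cond_and_class=20, n_cond=30, n_class=30, n_total=60)
```

With these toy counts, the edge would be kept as dependent on the context at the assumed 5% level, and the positive WRAcc indicates that the edge is over-represented inside the context relative to the whole graph.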

Keywords

Attributed graphs, Exceptional Model Mining, Subgroup discovery, Supervised pattern mining

Notes

Acknowledgements

The authors would like to thank the anonymous reviewers for their frank, fruitful, constructive and insightful comments, and the authors of the MiMaG and DSSD algorithms for providing us with their prototypes. They also gratefully acknowledge Pierre Houdyer for the development of the pattern visualization platform on VELOV data. This work has been partially supported by the projects GRAISearch (FP7-PEOPLE-2013-IAPP) and VEL’INNOV (ANR INOV 2012).

Copyright information

© The Author(s) 2017

Authors and Affiliations

  • Mehdi Kaytoue (1)
  • Marc Plantevit (2)
  • Albrecht Zimmermann (1)
  • Anes Bendimerad (1)
  • Céline Robardet (1)

  1. CNRS, LIRIS UMR5205, INSA de Lyon, Lyon, France
  2. CNRS, LIRIS UMR5205, Université Lyon 1, Lyon, France