Knowledge and Information Systems, Volume 54, Issue 1, pp 123–149

Resling: a scalable and generic framework to mine top-k representative subgraph patterns

  • Dheepikaa Natarajan
  • Sayan Ranu
Regular Paper

Abstract

Mining subgraph patterns is an active area of research due to its wide-ranging applications. Examples include frequent subgraph mining, discriminative subgraph mining, and statistically significant subgraph mining. Existing research has primarily focused on mining all subgraph patterns in the database. However, because the subgraph search space is exponential, the number of patterns mined is typically too large for any human-mediated analysis. Consequently, deriving insights from the mined patterns is hard for domain scientists. In addition, subgraph pattern mining is posed in multiple forms: the function that models whether a subgraph is a pattern varies with the application, and the database may consist of multiple graphs or of a single, large graph. In this paper, we ask the following question: Given a subgraph importance function and a budget k, which are the k subgraph patterns that best represent all other patterns of interest? We show that the problem is NP-hard, and propose a generic framework called Resling that adapts to arbitrary subgraph importance functions and generalizes to both transactional graph databases and single, large graphs. Resling derives its power by structuring the search space in the form of an edit map, where each subgraph is a node, and two subgraphs are connected if they have an edit distance of one. We rank nodes in the edit map through two random-walk-based algorithms: vertex-reinforced random walks (Resling-VR) and negative-reinforced random walks (Resling-NR). Experiments show that Resling-VR is up to 20 times more representative of the pattern space and two orders of magnitude faster than the state-of-the-art techniques. Resling-NR further improves the running time while maintaining comparable or better performance in representative power.
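
To make the edit-map idea concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes each pattern is a frozenset of labelled edges, so that "edit distance of one" reduces to a single edge insertion or deletion (real subgraph edit distance must also handle isomorphism and node edits). The reinforcement rule, the restart probability, and the names `build_edit_map`, `vertex_reinforced_walk`, and `top_k` are illustrative assumptions; the abstract does not specify how Resling-VR or Resling-NR are formulated, and the subgraph importance function is omitted here.

```python
import random
from collections import Counter
from itertools import combinations


def build_edit_map(patterns):
    """Connect two patterns when they differ by exactly one edge.

    Assumption: a pattern is a frozenset of labelled edges, so one edge
    insertion or deletion stands in for "edit distance of one".
    """
    edit_map = {p: [] for p in patterns}
    for a, b in combinations(patterns, 2):
        if len(a ^ b) == 1:  # symmetric difference of size one
            edit_map[a].append(b)
            edit_map[b].append(a)
    return edit_map


def vertex_reinforced_walk(edit_map, steps=10000, restart=0.15, seed=0):
    """Simulate a vertex-reinforced random walk and return visit counts."""
    rng = random.Random(seed)
    nodes = list(edit_map)
    # Every node starts with one pseudo-visit so all weights are positive.
    visits = Counter({v: 1 for v in nodes})
    current = nodes[0]
    for _ in range(steps):
        neighbours = edit_map[current]
        if not neighbours or rng.random() < restart:
            # Teleport uniformly (hypothetical choice; also escapes dead ends).
            current = rng.choice(nodes)
        else:
            # Reinforcement: neighbours visited more often so far are
            # proportionally more likely to be visited again.
            weights = [visits[n] for n in neighbours]
            current = rng.choices(neighbours, weights=weights, k=1)[0]
        visits[current] += 1
    return visits


def top_k(edit_map, k, **kwargs):
    """Return the k most-visited patterns as a stand-in for the top-k ranking."""
    visits = vertex_reinforced_walk(edit_map, **kwargs)
    return [pattern for pattern, _ in visits.most_common(k)]


# Toy pattern space: four related patterns and one isolated pattern.
patterns = [
    frozenset({"ab"}),
    frozenset({"ab", "bc"}),
    frozenset({"ab", "bc", "cd"}),
    frozenset({"ab", "cd"}),
    frozenset({"xy"}),
]
edit_map = build_edit_map(patterns)
print(top_k(edit_map, k=2))
```

Because visit counts feed back into the transition weights, a node that is visited early tends to absorb visits from its neighbourhood, which is the intuition for why reinforced walks favour a few representatives per region of the edit map rather than many near-duplicates.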

Keywords

Graph mining, Random walk, Diversity, Representative patterns

Copyright information

© Springer-Verlag London Ltd. 2017

Authors and Affiliations

  1. Machine Learning Team, Amazon, Seattle, USA
  2. Department of CSE, IIT Delhi, New Delhi, India
