Data Mining and Knowledge Discovery, Volume 32, Issue 4, pp 913–948

Graph sampling with applications to estimating the number of pattern embeddings and the parameters of a statistical relational model

  • Irma Ravkic (email author)
  • Martin Žnidaršič
  • Jan Ramon
  • Jesse Davis
Part of the following topical collections:
  1. Journal Track of ECML PKDD 2018


Counting the number of times a pattern occurs in a database is a fundamental data mining problem. It is a subroutine in a diverse set of tasks ranging from pattern mining to supervised learning and probabilistic model learning. While a pattern and a database can take many forms, this paper focuses on the case where both the pattern and the database are graphs (networks). Unfortunately, in general, the problem of counting graph occurrences is #P-complete. In contrast to earlier work, which focused on exact counting for simple (i.e., very short) patterns, we present a sampling approach for estimating the statistics of larger graph pattern occurrences. We perform an empirical evaluation on synthetic and real-world data that validates the proposed algorithm, illustrates its practical behavior and provides insight into the trade-off between its accuracy of estimation and computational efficiency.
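
To make the idea of estimating occurrence counts by sampling concrete, the following is a minimal sketch of a naive Monte Carlo estimator. It is not the algorithm proposed in the paper; it only illustrates the general principle under simple assumptions. Here an "embedding" is assumed to mean an injective mapping of pattern nodes to graph nodes that preserves every pattern edge, and the names `count_embeddings_exact`, `estimate_embeddings`, `GRAPH` and `TRIANGLE` are hypothetical, chosen for this example only.

```python
import math
import random
from itertools import permutations

def count_embeddings_exact(pattern_edges, k, adj):
    """Brute-force count of injective maps from pattern nodes 0..k-1
    to graph nodes that send every pattern edge onto a graph edge."""
    nodes = list(adj)
    return sum(
        all(m[v] in adj[m[u]] for u, v in pattern_edges)
        for m in permutations(nodes, k)
    )

def estimate_embeddings(pattern_edges, k, adj, num_samples, rng):
    """Naive Monte Carlo estimate (illustrative assumption, not the paper's method):
    draw uniform random injective maps and scale the hit rate by the
    total number of injective maps, n! / (n - k)!."""
    nodes = list(adj)
    total_maps = math.perm(len(nodes), k)
    hits = 0
    for _ in range(num_samples):
        m = rng.sample(nodes, k)  # k distinct graph nodes in random order
        if all(m[v] in adj[m[u]] for u, v in pattern_edges):
            hits += 1
    return total_maps * hits / num_samples

# Toy undirected graph: a 4-cycle 0-1-2-3 with the chord 0-2.
GRAPH = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
# Pattern: a triangle on pattern nodes 0, 1, 2.
TRIANGLE = [(0, 1), (1, 2), (0, 2)]

if __name__ == "__main__":
    print("exact:", count_embeddings_exact(TRIANGLE, 3, GRAPH))
    print("estimate:", estimate_embeddings(TRIANGLE, 3, GRAPH, 20000, random.Random(1)))
```

The estimator is unbiased, but its variance grows quickly when embeddings are rare among all injective maps, which is one reason more refined sampling schemes are needed for larger patterns and graphs.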


Graph sampling; Graph pattern matching; Parameter estimation; Statistical relational learning



IR was partially supported by the KU Leuven Research Fund (OT/11/051) and is currently affiliated with the University of California, Los Angeles. MZ was partially supported by the KU Leuven Research Fund (OT/11/051) and the Slovenian Research Agency (P2-0103). JD is partially supported by the KU Leuven Research Fund (OT/11/051, C14/17/070, C22/15/015, C32/17/036) and FWO-Vlaanderen (G.0356.12, SBO-150033).



Copyright information

© The Author(s) 2018

Authors and Affiliations

  • Irma Ravkic (1), email author
  • Martin Žnidaršič (2)
  • Jan Ramon (1)
  • Jesse Davis (1)
  1. Department of Computer Science, KU Leuven, Heverlee, Leuven, Belgium
  2. Jožef Stefan Institute, Ljubljana, Slovenia