
An efficiently computable subgraph pattern support measure: counting independent observations

Data Mining and Knowledge Discovery

Abstract

Graph support measures are functions measuring how frequently a given subgraph pattern occurs in a given database graph. An important class of support measures relies on overlap graphs. A major advantage of overlap-graph based approaches is that they combine anti-monotonicity with counting the occurrences of a subgraph pattern which are independent according to certain criteria. However, existing overlap-graph based support measures are expensive to compute. In this paper, we propose a new support measure which is based on a new notion of independence. We show that our measure is the solution to a sparse linear program, which can be computed efficiently using interior point methods. We study the anti-monotonicity and other properties of this new measure, and relate it to the statistical power of a sample of embeddings in a network. We show experimentally that, in contrast to earlier overlap-graph based proposals, our support measure makes it feasible to mine subgraph patterns in large networks.
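The abstract does not state the linear program itself, so the following is only a minimal sketch of the overlap-graph idea it builds on, not the measure proposed in the paper. It assumes that two occurrences of a pattern are independent when they share no vertex, and it counts the largest set of pairwise independent occurrences by brute force. The function names `embeddings` and `mis_support` and the toy graph are hypothetical, chosen for illustration. The exhaustive search is exponential in the number of occurrences, which hints at why such measures are expensive to compute.

```python
from itertools import combinations, permutations


def embeddings(pattern_edges, graph_edges):
    """Return the distinct images (as vertex sets) of a pattern in an undirected graph."""
    p_nodes = sorted({v for e in pattern_edges for v in e})
    g_nodes = sorted({v for e in graph_edges for v in e})
    g_adj = {frozenset(e) for e in graph_edges}
    images = set()
    # Try every injective assignment of pattern vertices to graph vertices.
    for target in permutations(g_nodes, len(p_nodes)):
        mapping = dict(zip(p_nodes, target))
        if all(frozenset((mapping[a], mapping[b])) in g_adj for a, b in pattern_edges):
            images.add(frozenset(target))
    return sorted(images, key=sorted)


def mis_support(images):
    """Size of the largest set of pairwise vertex-disjoint images (brute force)."""
    for k in range(len(images), 0, -1):
        for subset in combinations(images, k):
            if all(a.isdisjoint(b) for a, b in combinations(subset, 2)):
                return k
    return 0


# Toy database graph (assumed for illustration): a star centred on vertex 0 plus one separate edge.
star_plus_edge = [(0, 1), (0, 2), (0, 3), (4, 5)]
images = embeddings([("a", "b")], star_plus_edge)  # pattern: a single edge
print(len(images), mis_support(images))  # 4 occurrences, but only 2 are independent
```

In this toy graph the three star edges all share vertex 0, so at most one of them can be counted alongside the separate edge. Simply counting occurrences would report 4, while the independence-based count is 2.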

Notes

  1. This system is part of the MIPS project, available from http://people.cs.kuleuven.be/~jan.ramon/MiGraNT/MIPS/

  2. http://odysseas.calit2.uci.edu/doku.php/public:online_social_networks

Acknowledgments

This work was supported by ERC Starting Grant 240186 “MiGraNT: Mining Graphs and Networks: a Theory-based approach”.

Author information

Corresponding author

Correspondence to Yuyi Wang.

Additional information

Communicated by Tijl De Bie and Peter Flach.

About this article

Cite this article

Wang, Y., Ramon, J. & Fannes, T. An efficiently computable subgraph pattern support measure: counting independent observations. Data Min Knowl Disc 27, 444–477 (2013). https://doi.org/10.1007/s10618-013-0318-x

  • DOI: https://doi.org/10.1007/s10618-013-0318-x
