Abstract
Graph support measures are functions that measure how frequently a given subgraph pattern occurs in a given database graph. An important class of support measures relies on overlap graphs. A major advantage of overlap-graph-based approaches is that they combine anti-monotonicity with counting only those occurrences of a subgraph pattern that are independent according to certain criteria. However, existing overlap-graph-based support measures are expensive to compute. In this paper, we propose a new support measure based on a new notion of independence. We show that our measure is the solution to a sparse linear program, which can be computed efficiently using interior-point methods. We study the anti-monotonicity and other properties of this new measure, and relate it to the statistical power of a sample of embeddings in a network. We show experimentally that, in contrast to earlier overlap-graph-based proposals, our support measure makes it feasible to mine subgraph patterns in large networks.
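To make the idea of counting independent occurrences concrete, the following is a minimal sketch of an overlap-graph support computation. It assumes, for illustration only, that two embeddings overlap when their images share a database vertex, and that support is the size of a maximum independent set of the overlap graph, as in earlier overlap-graph measures. It is not the linear-programming measure proposed in the paper, whose formulation the abstract does not give. The function names `overlap_graph` and `mis_support` and the toy data are hypothetical.

```python
from itertools import combinations


def overlap_graph(embeddings):
    """Return overlap-graph edges as index pairs (i, j) with i < j.

    Assumption: two embeddings overlap when their vertex sets intersect.
    """
    return {
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(embeddings), 2)
        if a & b
    }


def mis_support(embeddings):
    """Size of a maximum independent set of the overlap graph.

    Exhaustive search over subsets, largest first; only usable on tiny inputs.
    """
    edges = overlap_graph(embeddings)
    n = len(embeddings)
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            # subset is sorted, so every pair (i, j) has i < j, matching edges
            if all(pair not in edges for pair in combinations(subset, 2)):
                return size
    return 0


# Toy data: a single-edge pattern matched on the path 1-2-3-4-5.
embeddings = [
    frozenset({1, 2}),
    frozenset({2, 3}),
    frozenset({3, 4}),
    frozenset({4, 5}),
]
support = mis_support(embeddings)
```

Here the pattern has four embeddings, but at most two of them are pairwise vertex-disjoint, so the sketch reports a support of 2. The exhaustive search grows exponentially with the number of embeddings, which illustrates why measures of this kind are expensive to compute on large networks.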
Notes
This system is part of the MIPS project, available from http://people.cs.kuleuven.be/~jan.ramon/MiGraNT/MIPS/
Acknowledgments
This work was supported by ERC Starting Grant 240186 “MiGraNT: Mining Graphs and Networks: a Theory-based approach”.
Additional information
Communicated by Tijl De Bie and Peter Flach.
About this article
Cite this article
Wang, Y., Ramon, J. & Fannes, T. An efficiently computable subgraph pattern support measure: counting independent observations. Data Min Knowl Disc 27, 444–477 (2013). https://doi.org/10.1007/s10618-013-0318-x
DOI: https://doi.org/10.1007/s10618-013-0318-x