Abstract
Developing algorithms that discover all frequently occurring subgraphs in a large graph database is computationally expensive, as graph and subgraph isomorphism tests play a key role throughout the computation. Since subgraph isomorphism testing is a hard problem, fragment miners are exponential in runtime. To alleviate this complexity, we propose to introduce a bias in the projection operator: instead of the costly subgraph isomorphism projection, one can use a polynomial projection that has a semantically valid structural interpretation. In this paper, we present LC-mine, a generic and efficient framework for mining frequent subgraphs by means of local consistency techniques used in the constraint programming field. Two instances of the framework based on the arc consistency technique are developed and presented. The first instance follows a breadth-first order, while the second is a pattern-growth approach that follows a depth-first search space exploration strategy. We then show experimentally that an important performance gain can be achieved with no loss, or only a nonsignificant loss, in the quality of the discovered patterns.
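To make the idea of a polynomial projection concrete, the following is a minimal sketch of an arc-consistency-based projection test. It assumes undirected, node-labeled graphs and an AC-3-style propagation loop. The function name `ac_projection` and the dictionary/edge-list data layout are illustrative assumptions, not LC-mine's actual implementation.

```python
from collections import deque


def ac_projection(p_labels, p_edges, g_labels, g_edges):
    """Return arc-consistent domains {pattern node: set of target nodes},
    or None if some domain becomes empty (no projection exists).

    p_labels / g_labels: dict mapping node -> label (hypothetical layout).
    p_edges / g_edges: iterable of undirected edges (u, v).
    """
    # Build undirected adjacency sets for the target graph and the pattern.
    g_adj = {n: set() for n in g_labels}
    for a, b in g_edges:
        g_adj[a].add(b)
        g_adj[b].add(a)
    p_adj = {n: set() for n in p_labels}
    for u, v in p_edges:
        p_adj[u].add(v)
        p_adj[v].add(u)

    # Initial domain of a pattern node: all target nodes with the same label.
    dom = {u: {a for a, lab in g_labels.items() if lab == p_labels[u]}
           for u in p_labels}
    if any(not d for d in dom.values()):
        return None

    # Each pattern edge yields two arcs (u, v) and (v, u) to revise.
    queue = deque((u, v) for u in p_adj for v in p_adj[u])
    while queue:
        u, v = queue.popleft()
        # Keep a target node a for u only if a has a neighbour in dom[v].
        supported = {a for a in dom[u] if g_adj[a] & dom[v]}
        if supported != dom[u]:
            dom[u] = supported
            if not supported:
                return None
            # dom[u] shrank, so arcs pointing at u must be rechecked.
            queue.extend((w, u) for w in p_adj[u])
    return dom


# A triangle is not a subgraph of a 6-cycle, yet it projects onto it under
# arc consistency: this is the kind of bias the relaxed operator introduces.
triangle = ({0: "C", 1: "C", 2: "C"}, [(0, 1), (1, 2), (2, 0)])
hexagon = ({i: "C" for i in range(6)}, [(i, (i + 1) % 6) for i in range(6)])
domains = ac_projection(*triangle, *hexagon)
```

Under this sketch, the support of a pattern would be the number of database graphs for which `ac_projection` returns a non-`None` result. Each revision only shrinks a domain, so the loop terminates after a polynomial number of steps, in contrast to an exhaustive search for subgraph isomorphism embeddings.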
Notes
An embedding is a mapping of the nodes and edges of a subgraph to the corresponding nodes and edges in the graph the subgraph occurs in.
A frequent graph pattern is said to be closed if no frequent supergraph pattern has the same support.
About this article
Cite this article
Douar, B., Liquiere, M., Latiri, C. et al. LC-mine: a framework for frequent subgraph mining with local consistency techniques. Knowl Inf Syst 44, 1–25 (2015). https://doi.org/10.1007/s10115-014-0769-4
DOI: https://doi.org/10.1007/s10115-014-0769-4