Abstract
In recent years there has been increased interest in frequent pattern discovery in large databases of graph-structured objects. While the frequent connected subgraph mining problem can be solved in incremental polynomial time for tree datasets, it becomes intractable for arbitrary graph databases. Existing approaches have therefore resorted to various heuristic strategies and restrictions of the search space, but have not identified a practically relevant tractable graph class beyond trees. In this paper, we consider the class of outerplanar graphs, a strict generalization of trees. We develop a frequent subgraph mining algorithm for outerplanar graphs and show that it works in incremental polynomial time for the practically relevant subclass of well-behaved outerplanar graphs, i.e., those that have only polynomially many simple cycles. We evaluate the algorithm empirically on chemo- and bioinformatics applications.
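To make the "polynomially many simple cycles" condition concrete, the following is a minimal illustrative sketch in Python; it is not the mining algorithm of the paper. It counts the simple cycles of a small undirected graph by brute-force path extension and applies the count to a ladder graph, a standard example of an outerplanar graph. The function names `count_simple_cycles` and `ladder` are hypothetical and chosen here for illustration only.

```python
from typing import Dict, List, Set


def count_simple_cycles(adj: Dict[int, List[int]]) -> int:
    """Count simple cycles of an undirected simple graph by brute force.

    Each cycle is enumerated from its smallest vertex, once per
    direction, so the raw count is halved at the end.
    Illustration only: the running time is exponential in general.
    """
    total = 0

    def extend(start: int, current: int, visited: Set[int]) -> None:
        nonlocal total
        for nxt in adj[current]:
            if nxt == start and len(visited) >= 3:
                # Closed a cycle with at least three vertices.
                total += 1
            elif nxt > start and nxt not in visited:
                # Only visit vertices larger than the start vertex, so
                # that every cycle is found from its smallest vertex.
                visited.add(nxt)
                extend(start, nxt, visited)
                visited.remove(nxt)

    for s in adj:
        extend(s, s, {s})
    return total // 2


def ladder(k: int) -> Dict[int, List[int]]:
    """Build a ladder graph with k squares (2 * (k + 1) vertices)."""
    n = k + 1
    adj: Dict[int, List[int]] = {v: [] for v in range(2 * n)}

    def add_edge(u: int, v: int) -> None:
        adj[u].append(v)
        adj[v].append(u)

    for i in range(n):
        add_edge(i, n + i)  # rung between top vertex i and bottom vertex n + i
        if i + 1 < n:
            add_edge(i, i + 1)          # top rail
            add_edge(n + i, n + i + 1)  # bottom rail
    return adj


if __name__ == "__main__":
    for k in range(1, 6):
        print(k, count_simple_cycles(ladder(k)))
```

In a ladder with k squares, every simple cycle is fixed by the choice of its leftmost and rightmost rung, so the count is k(k + 1)/2, which grows quadratically in the number of vertices. A family of graphs with this kind of polynomial growth is an instance of the well-behaved condition stated in the abstract.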
Additional information
Responsible editor: M.J. Zaki.
A preliminary version of this paper appeared in: T. Eliassi-Rad, L. Ungar, M. Craven, and D. Gunopulos (Eds.), Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 197–206, ACM Press, New York, NY, 2006.
About this article
Cite this article
Horváth, T., Ramon, J. & Wrobel, S. Frequent subgraph mining in outerplanar graphs. Data Min Knowl Disc 21, 472–508 (2010). https://doi.org/10.1007/s10618-009-0162-1
DOI: https://doi.org/10.1007/s10618-009-0162-1