Fast Frequent Free Tree Mining in Graph Databases

Zhao, Peixiang; Yu, Jeffrey Xu

doi:10.1007/s11280-007-0031-z

Published: 15 August 2007

World Wide Web, Volume 11, pages 71–92 (2008)

Abstract

A free tree, that is, an undirected, acyclic, and connected graph, is used extensively in computational biology, pattern recognition, computer networks, XML databases, and other areas. In this paper, we present a computationally efficient algorithm, F3TM (Fast Frequent Free Tree Mining), that finds all frequently occurring free trees in a graph database \({\cal D} = \{g_1, g_2, \cdots, g_N\}\). The two key steps of F3TM are candidate generation and frequency counting. The frequency counting step computes how many graphs in \(\cal D\) contain a candidate frequent free tree; this is shown to be, in essence, the subgraph isomorphism problem, which is NP-complete. Therefore, the key issue becomes how to reduce the number of false positives in the candidate generation step. Based on our observations, the cost of false-positive reduction can itself be prohibitive. We therefore focus on reducing the candidate generation cost and minimizing the number of infrequent candidates generated. We prove that the complete set of frequent free trees can be discovered from a graph database by growing vertices on a limited range of positions of a free tree. We propose two pruning algorithms, automorphism-based pruning and canonical mapping-based pruning, which significantly reduce the candidate generation cost. We conducted extensive experimental studies using a real application dataset and a synthetic dataset. The experimental results show that F3TM outperforms state-of-the-art algorithms by an order of magnitude in mining frequent free trees in large graph databases.
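To make the frequency counting step concrete, the following is a minimal sketch of how the support of one candidate free tree could be counted over a small graph database. It is not the F3TM procedure: it uses naive backtracking for the subgraph isomorphism test, it assumes vertex labels only (edge labels are ignored), and the names `Graph`, `contains`, and `support` are hypothetical, introduced purely for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple


class Graph:
    """Undirected graph with labeled vertices (hypothetical helper type)."""

    def __init__(self, labels: Dict[int, str], edges: List[Tuple[int, int]]):
        self.labels = labels
        self.adj: Dict[int, Set[int]] = {v: set() for v in labels}
        for u, v in edges:
            self.adj[u].add(v)
            self.adj[v].add(u)


def _bfs_order(tree: Graph) -> List[Tuple[int, Optional[int]]]:
    """List (vertex, parent) pairs of a non-empty tree in BFS order."""
    root = next(iter(tree.labels))
    order: List[Tuple[int, Optional[int]]] = [(root, None)]
    seen = {root}
    i = 0
    while i < len(order):
        v, _ = order[i]
        for w in sorted(tree.adj[v]):
            if w not in seen:
                seen.add(w)
                order.append((w, v))
        i += 1
    return order


def contains(tree: Graph, graph: Graph) -> bool:
    """True if `tree` occurs in `graph` as a (non-induced) subgraph.

    Because the pattern is a tree, every non-root vertex has exactly one
    parent in BFS order, so it is enough to place each vertex on an unused,
    equally labeled neighbor of its parent's image.
    """
    order = _bfs_order(tree)

    def extend(mapping: Dict[int, int], i: int) -> bool:
        if i == len(order):
            return True  # every tree vertex has been placed
        t, parent = order[i]
        if parent is None:
            candidates = list(graph.labels)          # root: try any vertex
        else:
            candidates = list(graph.adj[mapping[parent]])
        used = set(mapping.values())
        for g in candidates:
            if g in used or graph.labels[g] != tree.labels[t]:
                continue
            mapping[t] = g
            if extend(mapping, i + 1):
                return True
            del mapping[t]                           # backtrack
        return False

    return extend({}, 0)


def support(tree: Graph, database: List[Graph]) -> int:
    """Number of database graphs that contain the candidate tree."""
    return sum(1 for g in database if contains(tree, g))


# Candidate free tree: the path A - B - C.
pattern = Graph({0: "A", 1: "B", 2: "C"}, [(0, 1), (1, 2)])

database = [
    # Triangle on A, B, C: contains the path A - B - C.
    Graph({0: "A", 1: "B", 2: "C"}, [(0, 1), (1, 2), (2, 0)]),
    # Path A - C - B: A and B are not adjacent, so no match.
    Graph({0: "A", 1: "C", 2: "B"}, [(0, 1), (1, 2)]),
    # Star with center B and leaves A, C, A: contains the path.
    Graph({0: "B", 1: "A", 2: "C", 3: "A"}, [(0, 1), (0, 2), (0, 3)]),
]

min_support = 2  # illustrative threshold, not taken from the paper
is_frequent = support(pattern, database) >= min_support
```

Each call to `contains` may explore exponentially many partial mappings in the worst case, which is why generating fewer infrequent candidates, as the abstract argues, matters more than speeding up any single containment test.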

Author information

Authors and Affiliations

The Chinese University of Hong Kong, Hong Kong, China
Peixiang Zhao & Jeffrey Xu Yu

Corresponding author

Correspondence to Peixiang Zhao.

About this article

Cite this article

Zhao, P., Yu, J.X. Fast Frequent Free Tree Mining in Graph Databases. World Wide Web 11, 71–92 (2008). https://doi.org/10.1007/s11280-007-0031-z

Received: 05 January 2007
Accepted: 10 May 2007
Published: 15 August 2007
Issue Date: March 2008
DOI: https://doi.org/10.1007/s11280-007-0031-z
