Graph Mining

Hadzic, Fedja; Tan, Henry; Dillon, Tharam S.

doi:10.1007/978-3-642-17557-2_11

Part of the book series: Studies in Computational Intelligence (SCI, volume 333)

Abstract

The contents of the book have so far focused on mining data whose underlying structure is characterized by special types of graphs in which cycles are not allowed, i.e. acyclic graphs or trees. This chapter focuses on the frequent pattern mining problem where the underlying structure of the data can be a general graph in which cycles are allowed. Such representations make it possible to model complex aspects of domains such as chemical compounds, networks, the Web and bioinformatics. Generally speaking, graphs have many undesirable theoretical properties with respect to algorithmic complexity. A common requirement in graph mining is the systematic enumeration of the subgraphs that occur frequently in given graph data, known as the frequent subgraph mining problem. Of the available graph analysis methods, we narrow our focus to this problem, as it is a prerequisite for detecting interesting associations among graph-structured data objects and has many important applications. For an extensive overview of graph mining in a general context, including different laws, data generators and algorithms, please refer to (Chakrabarti & Faloutsos 2006; Washio & Motoda 2003; Han & Kamber 2006). Because graphs may contain cycles, the frequent subgraph mining problem is much more complex than the frequent subtree mining problem. Although the problem is intractable in theory (the subgraph isomorphism test at its core is NP-complete), in practice a number of approaches are well suited to the analysis of real-world graph data. We will look at a number of different approaches to the frequent subgraph mining problem, as well as approaches for the analysis of graph data in general.
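Whatever enumeration strategy a frequent subgraph miner uses, each candidate pattern must be checked for frequency: a pattern is frequent if it occurs as a subgraph in at least a minimum number of graphs in the database. The Python sketch below illustrates only that support-counting step and is not the chapter's method. It assumes undirected graphs with vertex labels only (no edge labels), transaction-based support (the number of database graphs containing the pattern) and non-induced subgraph isomorphism. The names `Graph`, `make_graph`, `embeds`, `support`, the toy database and the threshold `min_sup` are hypothetical. The naive backtracking test is exponential in the worst case, which is where the NP-completeness of subgraph isomorphism shows up; practical miners reduce this cost with canonical labelling and pruning rather than brute force.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Graph:
    """Undirected graph with vertex labels only (edge labels omitted for brevity)."""
    labels: Dict[int, str]
    adj: Dict[int, Set[int]] = field(default_factory=dict)

    def add_edge(self, u: int, v: int) -> None:
        self.adj.setdefault(u, set()).add(v)
        self.adj.setdefault(v, set()).add(u)

    def neighbours(self, u: int) -> Set[int]:
        return self.adj.get(u, set())


def make_graph(labels: Dict[int, str], edges: List[Tuple[int, int]]) -> Graph:
    g = Graph(dict(labels))
    for u, v in edges:
        g.add_edge(u, v)
    return g


def embeds(pattern: Graph, target: Graph) -> bool:
    """True if some injective, label-preserving vertex mapping sends every
    pattern edge to a target edge (non-induced subgraph isomorphism).
    Plain backtracking: exponential in the worst case."""
    order = list(pattern.labels)
    mapping: Dict[int, int] = {}  # pattern vertex -> target vertex
    used: Set[int] = set()        # target vertices already taken

    def extend(i: int) -> bool:
        if i == len(order):
            return True  # every pattern vertex mapped consistently
        p = order[i]
        for t, label in target.labels.items():
            if t in used or label != pattern.labels[p]:
                continue
            # each already-mapped neighbour of p must map to a neighbour of t
            if all(mapping[q] in target.neighbours(t)
                   for q in pattern.neighbours(p) if q in mapping):
                mapping[p] = t
                used.add(t)
                if extend(i + 1):
                    return True
                del mapping[p]  # backtrack
                used.discard(t)
        return False

    return extend(0)


def support(pattern: Graph, database: List[Graph]) -> int:
    """Transaction-based support: number of database graphs containing pattern."""
    return sum(1 for g in database if embeds(pattern, g))


# Hypothetical toy database of three small labelled graphs.
DATABASE = [
    make_graph({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2), (2, 0)]),  # contains a cycle
    make_graph({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)]),          # a path (tree)
    make_graph({0: "C", 1: "N", 2: "O"}, [(0, 1), (1, 2)]),
]
TRIANGLE = make_graph({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2), (0, 2)])
PATH = make_graph({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])

if __name__ == "__main__":
    min_sup = 2  # hypothetical minimum support threshold
    for name, pat in [("triangle", TRIANGLE), ("path", PATH)]:
        s = support(pat, DATABASE)
        print(name, s, "frequent" if s >= min_sup else "infrequent")
```

Run as a script, this prints `triangle 1 infrequent` and `path 2 frequent`. Because the path is a subgraph of the triangle, every graph containing the triangle also contains the path, so the cyclic pattern's support can never exceed that of the path. This anti-monotone property is what Apriori-style levelwise approaches (e.g. Inokuchi et al. 2000) use to prune candidates before their support is counted.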

References

Bern, M., Eppstein, D.: Approximation Algorithms For Geometric Problems. In: Hochbaum, D.S. (ed.) Approximation Algorithms for NP-Hard Problems, pp. 296–345. PWS Publishing Company (1996)
Borgelt, C., Berthold, M.R.: Mining Molecular Fragments: Finding Relevant Substructures of Molecules. Paper presented at the Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM), Maebashi City, Japan, December 9-12 (2002)
Chakrabarti, D., Faloutsos, C.: Graph mining: Laws, Generators and Algorithms. ACM Computing Surveys 38(1), 2-es (2006)
Cook, D.J., Holder, L.B.: Substructure Discovery Using Minimum Description Length and Background Knowledge. Journal of Artificial Intelligence Research 1(1), 231–255 (1993)
Cook, D.J., Holder, L.B.: Graph-Based Data Mining. IEEE Intelligent Systems 15(2), 32–41 (2000)
Cook, D.J., Holder, L.B., Galal, G., Maglothin, R.: Approaches to Parallel Graph-Based Knowledge Discovery. Journal of Parallel and Distributed Computing 61(3), 427–446 (2001)
De Raedt, L., Kramer, S.: The levelwise version space algorithm and its application to molecular fragment finding. Paper presented at the Proceedings of the 17th International Joint Conference on Artificial intelligence, Seattle, WA, USA, August 4-10 (2001)
Dehaspe, L., Toivonen, H.: Discovery of frequent DATALOG patterns. Data Mining and Knowledge Discovery 3(1), 7–36 (1999)
Flake, G.W., Tarjan, R.E., Tsioutsiouliklis, K.: Graph Clustering and Minimum Cut Trees. Internet Mathematics 1(4), 385–408 (2004)
Han, J., Kamber, M.: Data Mining: Concepts and Techniques, 2nd edn. Elsevier, Morgan Kaufmann Publishers, San Francisco, CA, USA (2006)
Hartuv, E., Shamir, R.: A Clustering Algorithm Based on Graph Connectivity. Information Processing Letters 76(4-6), 175–181 (2000)
Holder, L.B., Cook, D.J., Djoko, S.: Substructure Discovery in the SUBDUE System. Paper presented at the Proceedings of the AAAI Workshop on Knowledge Discovery in Databases, Seattle, Washington, USA, July 31–August 4 (1994)
Holder, L., Cook, D., Gonzalez, J., Jonyer, I.: Structural Pattern Recognition in Graphs. In: Chen, D., Chen, X. (eds.) Pattern Recognition and String Matching, pp. 255–279. Kluwer Academic Publishers, Dordrecht (2003)
Huan, J., Wang, W., Prins, J.: Efficient mining of frequent subgraph in the presence of isomorphism. Paper presented at the Proceedings of the 3rd IEEE International Conference on Data Mining (ICDM 2003), Melbourne, Florida, USA, December 19-22 (2003)
Inokuchi, A., Washio, T., Motoda, H.: An apriori-based algorithm for mining frequent substructures from graph data. Paper presented at the Proceedings of the 4th European Conference on Principles and Practice of Knowledge Discovery in Databases, Lyon, France, September 13-16 (2000)
Jonyer, I., Holder, L.B., Cook, D.J.: Graph-based hierarchical conceptual clustering. Journal of Machine Learning Research 2, 19–43 (2002)
Kashima, H., Tsuda, K., Inokuchi, A.: Marginalized kernels between labeled graphs. Paper presented at the Proceedings of the 20th International Conference on Machine Learning (ICML 2003), Washington, DC, USA, August 21-24 (2003)
Ketkar, N.S., Holder, L.B., Cook, D.J.: Subdue: compression-based frequent pattern discovery in graph data. Paper presented at the Proceedings of the ACM SIGKDD 1st International Workshop on Open source Data Mining, Chicago, Illinois, USA, August 21-24 (2005)
Kuramochi, M., Karypis, G.: Frequent Subgraph Discovery. Paper presented at the Proceedings of the IEEE International Conference on Data Mining (ICDM 2001), San Jose, California, USA, November 29–December 2 (2001)
Kuramochi, M., Karypis, G.: Discovering Frequent Geometric Subgraphs. Paper presented at the Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM 2002), Maebashi City, Japan, December 9-12 (2002)
Lisi, F.A., Malerba, D.: Inducing Multi-Level Association Rules from Multiple Relations. Machine Learning 55(2), 175–210 (2004)
Mancoridis, S., Mitchell, B., Rorres, C., Chen, Y., Gansner, E.: Using Automatic Clustering to Produce High-Level System Organizations of Source Code. Paper presented at the Proceedings of the 6th International Workshop on Program Comprehension (IWPC 1998), Los Alamitos, CA, USA, June 26 (1998)
Nijssen, S., Kok, J.N.: A Quickstart in Frequent Structure Mining Can Make a Difference. Paper presented at the Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD 2004), Seattle, WA, USA, August 22-25 (2004)
Noble, C.C., Cook, D.J.: Graph-based anomaly detection. Paper presented at the Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 24-27 (2003)
Saigo, H., Tsuda, K.: Iterative Subgraph Mining for Principal Component Analysis. Paper presented at the Proceedings of the 8th IEEE International Conference on Data Mining (ICDM 2008), Pisa, Italy, December 15-19 (2008)
Thomas, S., Sarawagi, S.: Mining Generalized Association Rules and Sequential Patterns using SQL Queries. In: Proc. 4th Intl. Conf. on Knowledge Discovery and Data Mining (KDD 1998), pp. 344–348 (1998)
Vanetik, N., Gudes, E., Shimony, S.E.: Computing Frequent Graph Patterns from Semistructured Data. Paper presented at the Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM 2002), Maebashi City, Japan, December 9-12 (2002)
Wang, C.W., Pei, J., Zhu, Y., Shi, B.: Scalable Mining of Large Disk-Based Graph Databases. Paper presented at the Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, USA, August 22-25 (2004)
Wang, W., Wang, C., Zhu, Y., Shi, B., Pei, J., Yan, X., Han, J.: GraphMiner: a structural pattern-mining system for large disk-based graph databases and its applications. Paper presented at the Proceedings of the ACM SIGMOD International Conference on Management of Data, Baltimore, Maryland, USA, June 14-16 (2005)
Washio, T., Motoda, H.: State of the art of graph-based data mining. ACM SIGKDD Explorations Newsletter 5(1), 59–68 (2003)
Wilson, R., Hancock, E., Luo, B.: Pattern vectors from algebraic graph theory. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(7), 1112–1124 (2005)
Yan, X., Han, J.: gSpan: Graph-based substructure pattern mining. Paper presented at the Proceedings of the IEEE International Conference on Data Mining (ICDM 2002), Maebashi City, Japan, December 9-12 (2002)
Yan, X., Han, J.: CloseGraph: mining closed frequent graph patterns. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 24-27, pp. 286–295 (2003)
Yan, X., Zhou, X.J., Han, J.: Mining Closed Relational Graphs with Connectivity Constraints. Paper presented at the Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2005), Chicago, Illinois, USA, August 21-24 (2005)
Yoshida, K., Motoda, H., Indurkhya, N.: Graph-based induction as a unified learning framework. Journal of Applied Intelligence 4(3), 297–316 (1994)
Zhang, S., Yang, J., Cheedella, V.: Monkey: Approximate Graph Mining Based on Spanning Trees. Paper presented at the Proceedings of the IEEE 23rd International Conference on Data Engineering (ICDE 2007), Istanbul, Turkey, April 15-20 (2007)

About this chapter

Cite this chapter

Hadzic, F., Tan, H., Dillon, T.S. (2011). Graph Mining. In: Mining of Data with Complex Structures. Studies in Computational Intelligence, vol 333. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17557-2_11

DOI: https://doi.org/10.1007/978-3-642-17557-2_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17556-5
Online ISBN: 978-3-642-17557-2
eBook Packages: Engineering (R0)