Abstract
Basket analysis, a standard data-mining method, derives frequent itemsets from a database. However, its mining ability is limited to transaction data consisting of items. In reality, many applications describe their data in a more structural way, e.g., chemical compounds and Web browsing histories. A few approaches in the field of machine learning can discover characteristic patterns from graph-structured data. However, almost none of them are suitable for applications that require a complete search for all frequent subgraph patterns in the data. In this paper, we propose a novel principle and an algorithm that derive the characteristic patterns which frequently appear in graph-structured data. Our algorithm can derive all frequent induced subgraphs from both directed and undirected graph-structured data having loops (including self-loops), with labeled or unlabeled nodes and links. Its performance is evaluated through applications to Web browsing pattern analysis and chemical carcinogenesis analysis.
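The abstract's central notion, a frequent induced subgraph, can be illustrated with a deliberately brute-force Python sketch. This is not the authors' algorithm. It enumerates every node subset up to a size bound, canonicalizes each induced subgraph by trying all node orderings of its adjacency matrix, and counts support as the number of database graphs that contain a pattern. That counting rule is an assumption chosen to mirror how basket analysis counts transactions. The graph encoding (a node-label list plus a dict of ordered-pair links), the `NO_LINK` marker, and the helpers `undirected`, `canonical_code`, and `frequent_induced_subgraphs` are hypothetical names used only for illustration. The enumeration is exponential, so it suits toy inputs only.

```python
from collections import Counter
from itertools import combinations, permutations

# Hypothetical encoding: a graph is (node_labels, links), where node_labels[i]
# is the label of node i and links maps an ordered pair (u, v) to a link label.
# A directed link stores one pair, an undirected link stores both, and (u, u)
# is a self-loop. Unlabeled nodes or links can share a single constant label.
NO_LINK = ""  # marks the absence of a link in the adjacency matrix


def undirected(labels, links):
    """Build a graph from undirected links given as (u, v, label) triples."""
    edges = {}
    for u, v, label in links:
        edges[(u, v)] = label
        edges[(v, u)] = label
    return labels, edges


def canonical_code(graph, nodes):
    """Return the smallest (labels, adjacency matrix) code of the subgraph
    induced by `nodes`, minimized over all node orderings.

    Isomorphic induced subgraphs get identical codes. Brute force: k! orderings.
    """
    labels, edges = graph
    best = None
    for order in permutations(nodes):
        code = (
            tuple(labels[n] for n in order),
            # Row-major matrix over the chosen nodes; absent links are recorded
            # too, which is what makes the pattern an *induced* subgraph.
            tuple(edges.get((a, b), NO_LINK) for a in order for b in order),
        )
        if best is None or code < best:
            best = code
    return best


def frequent_induced_subgraphs(database, min_support, max_size):
    """Map each induced-subgraph code with at most `max_size` nodes to the
    number of database graphs containing it, keeping those >= min_support."""
    support = Counter()
    for graph in database:
        labels, _ = graph
        found = set()  # count each pattern at most once per graph
        for k in range(1, max_size + 1):
            for nodes in combinations(range(len(labels)), k):
                found.add(canonical_code(graph, nodes))
        support.update(found)
    return {code: count for code, count in support.items() if count >= min_support}


if __name__ == "__main__":
    db = [
        undirected(["C", "C", "O"], [(0, 1, "s"), (1, 2, "d")]),
        undirected(["C", "O"], [(0, 1, "d")]),
        undirected(["C", "C"], [(0, 1, "s")]),
    ]
    result = frequent_induced_subgraphs(db, min_support=2, max_size=2)
    for code, count in sorted(result.items()):
        print(count, code)
```

On the three-graph example, the sketch keeps four patterns with a minimum support of 2: the single C and O nodes, and the C-s-C and C-d-O pairs. The unlinked C/O pair occurs only in the first graph, so it is dropped. Because the matrix stores every ordered pair, including the diagonal, the same code also handles directed links and self-loops.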
Cite this article
Inokuchi, A., Washio, T. & Motoda, H. Complete Mining of Frequent Patterns from Graphs: Mining Graph Data. Machine Learning 50, 321–354 (2003). https://doi.org/10.1023/A:1021726221443