Abstract
Owing to the rapid growth of electronic data with graph structures, such as HTML and XML documents and chemical compounds, many researchers have become interested in data mining and machine learning techniques for finding useful patterns in graph-structured data (graph data). Because graph data contain a huge number of substructures, and deciding whether such data have given structural features tends to be computationally expensive, graph mining problems face computational difficulties. Let \({\mathcal{C}}\) be a graph class that satisfies a connected hereditary property and contains infinitely many different biconnected graphs, and for which a special kind of the graph isomorphism problem can be solved in polynomial time. In this paper, we consider learning and mining problems for \({\mathcal{C}}\). First, we define a new graph pattern, called a block preserving graph pattern (bp-graph pattern) for \({\mathcal{C}}\). Second, we present a polynomial time algorithm for deciding whether a given bp-graph pattern matches a given graph in \({\mathcal{C}}\). Third, by giving refinement operators over bp-graph patterns, we present a polynomial time algorithm for finding a minimally generalized bp-graph pattern for \({\mathcal{C}}\). Outerplanar graphs are planar graphs that can be embedded in the plane in such a way that all vertices lie on the outer boundary. Many pharmacological chemical compounds are known to be representable by outerplanar graphs. The class of connected outerplanar graphs \({\mathcal{O}}\) satisfies the above conditions for \({\mathcal{C}}\). We then propose two incremental polynomial time algorithms for enumerating all frequent bp-graph patterns with respect to a given finite set of graphs in \({\mathcal{O}}\). Finally, we evaluate the performance of these two graph mining algorithms by reporting experimental results obtained by applying them to a subset of the NCI dataset.
Additional information
Editors: Filip Zelezny and Nada Lavrac.
Cite this article
Yamasaki, H., Sasaki, Y., Shoudai, T. et al. Learning block-preserving graph patterns and its application to data mining. Mach Learn 76, 137–173 (2009). https://doi.org/10.1007/s10994-009-5115-9
DOI: https://doi.org/10.1007/s10994-009-5115-9