Learning block-preserving graph patterns and its application to data mining

Yamasaki, Hitoshi; Sasaki, Yosuke; Shoudai, Takayoshi; Uchida, Tomoyuki; Suzuki, Yusuke

doi:10.1007/s10994-009-5115-9

Published: 12 June 2009

Volume 76, pages 137–173 (2009)

Machine Learning

Hitoshi Yamasaki¹,
Yosuke Sasaki¹,
Takayoshi Shoudai¹,
Tomoyuki Uchida² &
Yusuke Suzuki²

599 Accesses
18 Citations

Abstract

Recently, due to the rapid growth of electronic data with graph structures, such as HTML and XML texts and chemical compounds, many researchers have become interested in data mining and machine learning techniques for finding useful patterns in graph-structured data (graph data). Since graph data contain a huge number of substructures, and since deciding whether such data have given structural features tends to be computationally expensive, graph mining problems face computational difficulties. Let \({\mathcal{C}}\) be a graph class that satisfies a connected hereditary property, contains infinitely many different biconnected graphs, and for which a special kind of the graph isomorphism problem can be solved in polynomial time. In this paper, we consider learning and mining problems for \({\mathcal{C}}\). Firstly, we define a new graph pattern, called a block-preserving graph pattern (bp-graph pattern) for \({\mathcal{C}}\). Secondly, we present a polynomial time algorithm for deciding whether or not a given bp-graph pattern matches a given graph in \({\mathcal{C}}\). Thirdly, by giving refinement operators over bp-graph patterns, we present a polynomial time algorithm for finding a minimally generalized bp-graph pattern for \({\mathcal{C}}\). Outerplanar graphs are planar graphs that can be embedded in the plane in such a way that all vertices lie on the outer boundary. Many pharmacologic chemical compounds are known to be representable by outerplanar graphs. The class of connected outerplanar graphs \({\mathcal{O}}\) satisfies the above conditions for \({\mathcal{C}}\). Next, we propose two incremental polynomial time algorithms for enumerating all frequent bp-graph patterns with respect to a given finite set of graphs in \({\mathcal{O}}\). Finally, we evaluate the performance of the two graph mining algorithms by reporting experimental results obtained by applying them to a subset of the NCI dataset.
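The abstract refers to blocks and biconnected graphs without defining them. Assuming the standard graph-theoretic meaning (a block is a maximal biconnected subgraph), the following minimal Python sketch decomposes a simple undirected graph into its blocks using the classic depth-first search with an edge stack. It is only an illustration of the notion of a block, not the matching or mining algorithm of the paper; the function name `blocks` and the example graph are hypothetical.

```python
from collections import defaultdict


def blocks(edges):
    """Return the blocks of a simple undirected graph as sets of vertices.

    Illustrative sketch only: `edges` is a list of (u, v) pairs with no
    self-loops or parallel edges. Recursion depth grows with the graph,
    so this is meant for small examples.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    disc = {}      # discovery time of each vertex
    low = {}       # lowest discovery time reachable from the DFS subtree
    stack = []     # edges visited but not yet assigned to a block
    result = []
    counter = [0]

    def dfs(u, parent):
        disc[u] = low[u] = counter[0]
        counter[0] += 1
        for v in adj[u]:
            if v == parent:
                continue
            if v not in disc:
                stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= disc[u]:
                    # u separates the subtree of v: pop one complete block
                    block = set()
                    while True:
                        a, b = stack.pop()
                        block.update((a, b))
                        if (a, b) == (u, v):
                            break
                    result.append(block)
            elif disc[v] < disc[u]:
                # back edge to an ancestor
                stack.append((u, v))
                low[u] = min(low[u], disc[v])

    for s in list(adj):
        if s not in disc:
            dfs(s, None)
    return result


# Hypothetical example: a triangle and a 4-cycle joined by a bridge 3-4.
example = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (5, 6), (6, 7), (7, 4)]
print(sorted(sorted(b) for b in blocks(example)))
# prints [[1, 2, 3], [3, 4], [4, 5, 6, 7]]
```

In this example the bridge 3-4 forms a block of its own, and the vertices 3 and 4 each belong to two blocks because they are cut vertices.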

Author information

Authors and Affiliations

Department of Informatics, Kyushu University, Fukuoka, 819-0395, Japan
Hitoshi Yamasaki, Yosuke Sasaki & Takayoshi Shoudai
Department of Intelligent Systems, Hiroshima City University, Hiroshima, 731-3194, Japan
Tomoyuki Uchida & Yusuke Suzuki

Corresponding author

Correspondence to Takayoshi Shoudai.

Additional information

Editors: Filip Zelezny and Nada Lavrac.

About this article

Cite this article

Yamasaki, H., Sasaki, Y., Shoudai, T. et al. Learning block-preserving graph patterns and its application to data mining. Mach Learn 76, 137–173 (2009). https://doi.org/10.1007/s10994-009-5115-9

Received: 15 November 2008
Revised: 03 April 2009
Accepted: 17 April 2009
Published: 12 June 2009
Issue Date: July 2009
DOI: https://doi.org/10.1007/s10994-009-5115-9
