Machine Learning, Volume 82, Issue 2, pp 95–121

Efficiently mining δ-tolerance closed frequent subgraphs

Open Access
Article

Abstract

Frequent pattern mining typically outputs a huge number of frequent patterns, many of them redundant, which makes the results hard to interpret. We focus on mining frequent subgraphs, for which well-considered approaches to reducing redundancy are limited because of the complex nature of graphs. The two standard solutions are closed and maximal frequent subgraphs, but closed frequent subgraphs are still redundant and maximal frequent subgraphs are too specific. A more promising solution is δ-tolerance closed frequent subgraphs, whose number decreases monotonically in δ and which coincide with closed frequent subgraphs for δ=0 and with maximal frequent subgraphs for δ=1. However, the current algorithm for mining δ-tolerance closed frequent subgraphs is a naive two-step approach: all frequent subgraphs are enumerated and then sifted according to δ-tolerance closedness. We propose an efficient algorithm based on the idea of “reverse search”, which guarantees the completeness of enumeration and into which new pruning conditions are incorporated. We empirically demonstrate that our approach significantly reduces real computation time compared with two competing algorithms for mining δ-tolerance closed frequent subgraphs, and that the reduction is more pronounced in practical settings.
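
As a rough illustration of the naive two-step approach (enumerate everything, then sift), the following Python sketch uses itemsets as a stand-in for subgraphs, so that plain set inclusion replaces subgraph isomorphism. It assumes one common formulation of δ-tolerance closedness, which the abstract does not spell out: a frequent pattern is kept unless some frequent proper super-pattern has support of at least (1 − δ) times its own. Under that assumption δ=0 yields the closed patterns and δ=1 the maximal ones. The function names and the toy data are hypothetical, and this is not the reverse-search algorithm proposed in the article.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Step 1 (naive): enumerate every itemset with support >= min_support."""
    items = sorted(set().union(*transactions))
    supports = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            # Support = number of transactions containing the candidate.
            count = sum(1 for t in transactions if cset <= t)
            if count >= min_support:
                supports[cset] = count
    return supports


def delta_tolerance_closed(supports, delta):
    """Step 2: sift frequent patterns by the ASSUMED delta-tolerance condition.

    A pattern is dropped if some frequent proper superset has support
    >= (1 - delta) * support(pattern). This condition is an assumption of
    the sketch, not a definition quoted from the article.
    """
    kept = {}
    for pattern, sup in supports.items():
        threshold = (1.0 - delta) * sup
        absorbed = any(
            pattern < other and other_sup >= threshold
            for other, other_sup in supports.items()
        )
        if not absorbed:
            kept[pattern] = sup
    return kept


# Hypothetical toy database of four transactions.
transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"a"}]
supports = frequent_itemsets(transactions, min_support=2)

for delta in (0.0, 0.3, 1.0):
    result = delta_tolerance_closed(supports, delta)
    print(delta, sorted("".join(sorted(p)) for p in result))
```

On this toy database all seven non-empty itemsets are frequent. The sift keeps three of them for δ=0, two for δ=0.3, and one for δ=1, which illustrates the monotone decrease in δ. Both steps are exhaustive here, which is exactly the cost that a pruned enumeration is meant to avoid.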

Keywords

Frequent subgraph mining; δ-tolerance closedness; Partial reverse search

Copyright information

© The Author(s) 2010

Authors and Affiliations

  1. Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Japan
  2. Institute for Bioinformatics Research and Development (BIRD), Japan Science and Technology Agency (JST), Tokyo, Japan
