Duplicate Candidate Elimination and Fast Support Calculation for Frequent Subgraph Mining

  • Andrés Gago-Alonso
  • Jesús Ariel Carrasco-Ochoa
  • José Eladio Medina-Pagola
  • José Fco. Martínez-Trinidad
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5788)

Abstract

Frequent connected subgraph mining (FCSM) is a task with wide real-world applications. Most previous studies focus on pruning the search space or optimizing subgraph isomorphism (SI) tests. This paper introduces a new property that removes all duplicate candidates during enumeration in FCSM. Based on this property, a new FCSM algorithm called gdFil is proposed. Because the candidate space contains no duplicates, a fast evaluation strategy can be used to reduce the cost of SI tests without wasting memory, and a data structure is introduced for this purpose. The performance of gdFil is compared against other reported algorithms.
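
To make the terms in the abstract concrete, the following is a minimal sketch of what "support" and an SI test mean in FCSM: the support of a candidate pattern is the number of graphs in the collection that contain it. This is not the gdFil algorithm or its data structure, which the abstract does not describe; it is a brute-force backtracking baseline. The `Graph` class and the functions `contains` and `support` are hypothetical names chosen for this illustration.

```python
class Graph:
    """Labeled undirected graph (representation assumed for this sketch)."""

    def __init__(self, vertex_labels, edges):
        # vertex_labels: {vertex_id: label}; edges: [(u, v, edge_label), ...]
        self.vlabel = dict(vertex_labels)
        self.elabel = {frozenset((u, v)): lab for u, v, lab in edges}
        self.adj = {v: set() for v in self.vlabel}
        for u, v, _ in edges:
            self.adj[u].add(v)
            self.adj[v].add(u)


def contains(pattern, graph):
    """Naive SI test: is `pattern` isomorphic to some subgraph of `graph`?"""
    p_vertices = list(pattern.vlabel)

    def extend(mapping):
        # mapping: pattern vertex -> graph vertex, built one vertex at a time
        if len(mapping) == len(p_vertices):
            return True
        pv = p_vertices[len(mapping)]
        used = set(mapping.values())
        for gv in graph.vlabel:
            if gv in used or graph.vlabel[gv] != pattern.vlabel[pv]:
                continue
            # Each pattern edge to an already mapped vertex must exist in
            # the graph with the same edge label.
            ok = True
            for pu in pattern.adj[pv]:
                if pu in mapping:
                    key = frozenset((gv, mapping[pu]))
                    want = pattern.elabel[frozenset((pv, pu))]
                    if key not in graph.elabel or graph.elabel[key] != want:
                        ok = False
                        break
            if ok:
                mapping[pv] = gv
                if extend(mapping):
                    return True
                del mapping[pv]  # backtrack
        return False

    return extend({})


def support(pattern, collection):
    """Number of graphs in the collection that contain the pattern."""
    return sum(1 for g in collection if contains(pattern, g))


# Toy collection of three labeled graphs.
g1 = Graph({0: "A", 1: "B", 2: "C"}, [(0, 1, "x"), (1, 2, "x")])
g2 = Graph({0: "A", 1: "B"}, [(0, 1, "y")])
g3 = Graph({0: "B", 1: "A", 2: "A"}, [(0, 1, "x"), (0, 2, "x"), (1, 2, "x")])
collection = [g1, g2, g3]

edge_ab = Graph({0: "A", 1: "B"}, [(0, 1, "x")])
min_support = 2  # assumed threshold for the example
is_frequent = support(edge_ab, collection) >= min_support
```

Every candidate evaluated this way costs one SI test per graph in the collection, so evaluating the same pattern twice under different encodings repeats all of that work. This is the cost that removing duplicate candidates during enumeration avoids.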


Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Andrés Gago-Alonso (1, 2)
  • Jesús Ariel Carrasco-Ochoa (2)
  • José Eladio Medina-Pagola (1)
  • José Fco. Martínez-Trinidad (2)

  1. Advanced Technologies Application Center (CENATAV), La Habana, Cuba
  2. National Institute of Astrophysics, Optics and Electronics (INAOE), Puebla, Mexico
