TKG: Efficient Mining of Top-K Frequent Subgraphs

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11932)

Abstract

Frequent subgraph mining is a popular data mining task, which consists of finding all subgraphs that appear in at least minsup graphs of a graph database. An important limitation of traditional frequent subgraph mining algorithms is that the minsup parameter is hard to set. If it is set too high, few patterns are found and useful information may be missed. If it is set too low, runtimes can become very long and a huge number of patterns may be found. Choosing a minsup value that yields just enough patterns can thus be very time-consuming. This paper addresses this limitation by proposing an efficient algorithm named TKG to find the top-k frequent subgraphs, where the only parameter is k, the number of patterns to be found. The algorithm utilizes a dynamic search procedure to always explore the most promising patterns first. An extensive experimental evaluation shows that TKG has excellent performance and that it provides a valuable alternative to traditional frequent subgraph mining algorithms.
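
The general idea of top-k mining with a rising internal threshold can be illustrated with a minimal Python sketch. This is not the TKG algorithm from the paper, whose details are not given in the abstract. It is a generic best-first search under a strong simplifying assumption: every vertex label is unique, so a graph is just a set of labeled edges and a pattern occurs in a graph exactly when its edges are a subset of the graph's edges (no subgraph isomorphism test is needed). The function name `top_k_patterns`, the toy `database`, and the tie-handling rule are all hypothetical choices made for illustration.

```python
import heapq
from itertools import count


def top_k_patterns(database, k):
    """Return up to k connected edge-set patterns with the highest support.

    Toy assumption (not from the paper): vertex labels are unique, so a
    pattern occurs in a graph exactly when its edge set is a subset of the
    graph's edge set. Assumes k >= 1.
    """
    all_edges = sorted({e for g in database for e in g})
    minsup = 1        # internal threshold, raised as patterns are found
    top = []          # min-heap of (support, tiebreak, pattern)
    candidates = []   # max-heap (negated support) of patterns to extend
    seen = set()
    tie = count()     # unique tiebreak so patterns are never compared

    def support(pattern):
        # Number of graphs in the database that contain the pattern.
        return sum(1 for g in database if pattern <= g)

    def consider(pattern):
        nonlocal minsup
        if pattern in seen:
            return
        seen.add(pattern)
        s = support(pattern)
        full = len(top) == k
        # Skip patterns below the threshold, or merely tying the weakest
        # kept pattern once k patterns are already held.
        if s < minsup or (full and s == minsup):
            return
        heapq.heappush(top, (s, next(tie), pattern))
        if len(top) > k:
            heapq.heappop(top)
        if len(top) == k:
            minsup = top[0][0]  # raise the threshold
        heapq.heappush(candidates, (-s, next(tie), pattern))

    for e in all_edges:
        consider(frozenset([e]))

    while candidates:
        neg_s, _, pattern = heapq.heappop(candidates)
        if -neg_s < minsup:
            break  # every remaining candidate has even lower support
        vertices = {v for e in pattern for v in e}
        # Extend by one edge that touches the pattern, keeping it connected.
        for e in all_edges:
            if e not in pattern and (e[0] in vertices or e[1] in vertices):
                consider(pattern | {e})

    return sorted(((s, sorted(p)) for s, _, p in top), reverse=True)


database = [
    frozenset({("A", "B"), ("B", "C"), ("C", "D")}),
    frozenset({("A", "B"), ("B", "C")}),
    frozenset({("A", "B"), ("C", "D")}),
]
print(top_k_patterns(database, 2))
```

The only user-facing parameter is k. The threshold starts at 1 and rises to the support of the weakest pattern kept once k patterns are held, and the search can stop early because support never increases when a pattern is extended. When several patterns tie at the threshold, this sketch keeps whichever it met first.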

Notes

  1. Here, generating a candidate means combining two subgraphs to obtain another subgraph that may or may not exist in the database [20]. Algorithms such as AGM [9] and FSG [11] do this to explore the search space.

References

  1. Borgwardt, K.M., Ong, C.S., Schönauer, S., Vishwanathan, S.V.N., Smola, A.J., Kriegel, H.P.: Protein function prediction via graph kernels. Bioinformatics 21(Suppl 1), 47–56 (2005)

  2. Cheng, Z., Flouvat, F., Selmaoui-Folcher, N.: Mining recurrent patterns in a dynamic attributed graph. In: Kim, J., Shim, K., Cao, L., Lee, J.-G., Lin, X., Moon, Y.-S. (eds.) PAKDD 2017. LNCS (LNAI), vol. 10235, pp. 631–643. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-57529-2_49

  3. Duong, V.T.T., Khan, K.U., Jeong, B.S., Lee, Y.K.: Top-k frequent induced subgraph mining using sampling. In: Proceedings 6th International Conference on Emerging Databases: Technologies, Applications, and Theory (2016)

  4. Duong, V.T.T., Khan, K.U., Lee, Y.K.: Top-k frequent induced subgraph mining on a sliding window using sampling. In: Proceedings 11th International Conference on Ubiquitous Information Management and Communication (2017)

  5. Fournier-Viger, P., et al.: The SPMF open-source data mining library version 2. In: Berendt, B., et al. (eds.) ECML PKDD 2016. LNCS (LNAI), vol. 9853, pp. 36–40. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46131-1_8

  6. Fournier-Viger, P., Lin, J.C.W., Kiran, U.R., Koh, Y.S.: A survey of sequential pattern mining. Data Sci. Pattern Recogn. 1(1), 54–77 (2017)

  7. Fournier-Viger, P., Chun-Wei Lin, J., Truong-Chi, T., Nkambou, R.: A survey of high utility itemset mining. In: Fournier-Viger, P., Lin, J.C.-W., Nkambou, R., Vo, B., Tseng, V.S. (eds.) High-Utility Pattern Mining. SBD, vol. 51, pp. 1–45. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-04921-8_1

  8. Fournier-Viger, P., Lin, J.C.W., Vo, B., Chi, T.T., Zhang, J., Le, B.: A survey of itemset mining. WIREs Data Min. Knowl. Discov. (2017)

  9. Inokuchi, A., Washio, T., Motoda, H.: An apriori-based algorithm for mining frequent substructures from graph data. In: Zighed, D.A., Komorowski, J., Żytkow, J. (eds.) PKDD 2000. LNCS (LNAI), vol. 1910, pp. 13–23. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-45372-5_2

  10. Jiang, C., Coenen, F., Zito, M.: A survey of frequent subgraph mining algorithms. Knowl. Eng. Rev. 28, 75–105 (2013)

  11. Kuramochi, M., Karypis, G.: Frequent subgraph discovery. In: Proceedings 1st IEEE International Conference on Data Mining (2001)

  12. Lee, G., Yun, U., Kim, D.: A weight-based approach: frequent graph pattern mining with length-decreasing support constraints using weighted smallest valid extension. Adv. Sci. Lett. 22(9), 2480–2484 (2016)

  13. Li, Y., Lin, Q., Li, R., Duan, D.: TGP: mining top-k frequent closed graph pattern without minimum support. In: Proceedings 6th International Conference on Advanced Data Mining and Applications (2010)

  14. Mrzic, A., et al.: Grasping frequent subgraph mining for bioinformatics applications. In: BioData Mining (2018)

  15. Nguyen, D., Luo, W., Nguyen, T.D., Venkatesh, S., Phung, D.Q.: Learning graph representation via frequent subgraphs. In: Proceedings 2018 SIAM International Conference on Data Mining, pp. 306–314 (2018)

  16. Nijssen, S., Kok, J.N.: The gaston tool for frequent subgraph mining. Electron. Notes Theor. Comput. Sci. 127, 77–87 (2005)

  17. Saha, T.K., Hasan, M.A.: FS3: a sampling based method for top-k frequent subgraph mining. In: Proceedings 2014 IEEE International Conference on Big Data, pp. 72–79 (2014)

  18. Sankar, A., Ranu, S., Raman, K.: Predicting novel metabolic pathways through subgraph mining. Bioinformatics 33(24), 3955–3963 (2017)

  19. Wale, N., Watson, I.A., Karypis, G.: Comparison of descriptor spaces for chemical compound retrieval and classification. In: Proceedings 6th International Conference on Data Mining, pp. 678–689 (2006)

  20. Yan, X., Han, J.: gSpan: graph-based substructure pattern mining. In: Proceedings 2nd IEEE International Conference on Data Mining (2002)

  21. Yan, X., Han, J.: CloseGraph: mining closed frequent graph patterns. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2003)

  22. Yan, X., Yu, P.S., Han, J.: Graph indexing: a frequent structure-based approach. In: Proceedings of the 2004 SIGMOD Conference (2004)

  23. Yun, U., Lee, G., Kim, C.H.: The smallest valid extension-based efficient, rare graph pattern mining, considering length-decreasing support constraints and symmetry characteristics of graphs. Symmetry 8(5), 32 (2016)

  24. Zhu, F., Yan, X., Han, J., Yu, P.S.: gPrune: a constraint pushing framework for graph pattern mining. In: Proceedings of the 11th Pacific-Asia Conference on Knowledge Discovery and Data Mining (2007)

Acknowledgements

The work presented in this paper has been partly funded by the National Science Foundation of China.

Author information

Corresponding author

Correspondence to Philippe Fournier-Viger.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Fournier-Viger, P., Cheng, C., Lin, J.C.-W., Yun, U., Kiran, R.U. (2019). TKG: Efficient Mining of Top-K Frequent Subgraphs. In: Madria, S., Fournier-Viger, P., Chaudhary, S., Reddy, P. (eds) Big Data Analytics. BDA 2019. Lecture Notes in Computer Science, vol 11932. Springer, Cham. https://doi.org/10.1007/978-3-030-37188-3_13

  • DOI: https://doi.org/10.1007/978-3-030-37188-3_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-37187-6

  • Online ISBN: 978-3-030-37188-3

  • eBook Packages: Computer Science (R0)
