Frequent Subgraph Mining in Graph Databases Based on MapReduce

  • Conference paper
Advances in Services Computing (APSCC 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10065)

Abstract

In recent years, graph mining has become a popular research direction in data mining. Frequent subgraph mining is an important graph mining technique that can be used in many fields, such as chemical informatics, bioinformatics, and the social sciences. The increasing size of graph databases challenges traditional subgraph mining methods. In this paper, we propose a new MapReduce-based approach to mine frequent subgraph patterns from large vertex-classified graph databases. The approach runs two MapReduce rounds. The first round mines the locally frequent subgraphs on each node; the results from all nodes are then collected and redundant graphs are filtered out to obtain a global set of candidate frequent subgraphs. The second round computes the global frequency of each graph in the candidate set generated by the first round. Some topical frequent subgraphs are then filtered according to special requirements. The experimental results show that this approach reduces execution time when dealing with large graph databases.
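The two-round scheme described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: to stay short, it assumes that patterns are restricted to single edges identified by their pair of vertex labels, whereas a real system would run a full subgraph miner in the first round. The graph representation and the names `edge_patterns`, `local_mine`, `count_candidates`, and `mine` are hypothetical, and the local threshold (the same support ratio applied to each partition) is an assumption.

```python
from collections import Counter
from math import ceil

# Assumed representation: a graph is (vertex_labels, edges), where
# vertex_labels maps vertex id -> label and edges is a list of (u, v) pairs.

def edge_patterns(graph):
    """Return the set of single-edge patterns (label pairs) in one graph."""
    labels, edges = graph
    return {tuple(sorted((labels[u], labels[v]))) for u, v in edges}

def local_mine(partition, min_ratio):
    """Round 1 (map side): patterns frequent within one partition."""
    counts = Counter()
    for graph in partition:
        counts.update(edge_patterns(graph))  # each graph counts a pattern once
    threshold = ceil(min_ratio * len(partition))
    return {p for p, c in counts.items() if c >= threshold}

def count_candidates(partition, candidates):
    """Round 2 (map side): support of each candidate within one partition."""
    counts = Counter()
    for graph in partition:
        counts.update(edge_patterns(graph) & candidates)
    return counts

def mine(partitions, min_ratio):
    """Run both rounds in sequence and return globally frequent patterns."""
    # Round 1 (reduce side): the set union drops duplicate candidates.
    candidates = set().union(*(local_mine(p, min_ratio) for p in partitions))
    # Round 2 (reduce side): sum the per-partition counts, then filter.
    total = Counter()
    for partition in partitions:
        total.update(count_candidates(partition, candidates))
    n_graphs = sum(len(p) for p in partitions)
    threshold = ceil(min_ratio * n_graphs)
    return {p: c for p, c in total.items() if c >= threshold}
```

Under these assumptions, a pattern that is frequent in the whole database must be frequent in at least one partition, so the first round cannot miss it; the second round only removes candidates that are frequent locally but not globally.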

Acknowledgments

This work is supported by the NSFC under grant No. 61433019 and by the Science and Technology Project of Guangdong Province (Nos. 2016B030306003 and 2016B030305002).

Author information

Corresponding author

Correspondence to Xia Xie.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Wang, K., Xie, X., Jin, H., Yuan, P., Lu, F., Ke, X. (2016). Frequent Subgraph Mining in Graph Databases Based on MapReduce. In: Wang, G., Han, Y., Martínez Pérez, G. (eds) Advances in Services Computing. APSCC 2016. Lecture Notes in Computer Science, vol 10065. Springer, Cham. https://doi.org/10.1007/978-3-319-49178-3_35

  • DOI: https://doi.org/10.1007/978-3-319-49178-3_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49177-6

  • Online ISBN: 978-3-319-49178-3

  • eBook Packages: Computer Science, Computer Science (R0)
