Mining Graph Patterns

  • Hong Cheng
  • Xifeng Yan
  • Jiawei Han
Chapter

Abstract

Graph pattern mining has become increasingly crucial to applications in a variety of domains, including bioinformatics, cheminformatics, social network analysis, computer vision, and multimedia. In this chapter, we first examine existing frequent subgraph mining algorithms and discuss their computational bottleneck. We then introduce recent studies on mining various types of graph patterns, including significant, representative, and dense subgraph patterns. We also discuss mining tasks in new problem settings, such as graph streams and uncertain graph models. These new mining algorithms represent state-of-the-art graph mining techniques: they not only avoid an exponentially large mining result, but also significantly improve the applicability of graph patterns.

Keywords

Apriori, Frequent subgraph, Graph pattern, Significant pattern, Representative pattern, Dense pattern, Graph stream, Uncertain graph

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong, China
  2. Department of Computer Science, University of California at Santa Barbara, Santa Barbara, USA
  3. Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, USA
